BPNet_Dmel_OreR_2to3hr_ZDTBCG

Authors: Melanie Weilert and Kaelan Brennan

License: MIT

Contributed by: Melanie Weilert and Kaelan Brennan

Cite as: https://doi.org/10.1101/2022.12.20.520743

Type: None

Postprocessing: None

Trained on: 2-3hr OreR D.mel embryos (validation chrom=chr2L, test chrom=chrX)

Source files

BPNet model predicting the ChIP-nexus profiles of Zelda, Dorsal, Twist, GAGA-factor, Caudal and Bicoid in 2-3hr OreR D.mel embryos.
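The chromosome hold-out listed under "Trained on" can be reproduced when preparing your own intervals. The following is a minimal sketch, assuming that every chromosome other than chr2L and chrX belongs to the training set (the page does not state this explicitly); the function names and example intervals are hypothetical.

```python
# Hold-out chromosomes as listed under "Trained on".
VALIDATION_CHROMS = {"chr2L"}
TEST_CHROMS = {"chrX"}


def assign_split(chrom):
    """Return 'valid', 'test', or 'train' for a chromosome name."""
    if chrom in VALIDATION_CHROMS:
        return "valid"
    if chrom in TEST_CHROMS:
        return "test"
    # Assumption: all remaining chromosomes are used for training.
    return "train"


def split_intervals(intervals):
    """Group (chrom, start, end) tuples by the split they fall into."""
    groups = {"train": [], "valid": [], "test": []}
    for chrom, start, end in intervals:
        groups[assign_split(chrom)].append((chrom, start, end))
    return groups


# Hypothetical intervals, for illustration only.
example = [("chr2L", 100, 1100), ("chrX", 500, 1500), ("chr3R", 0, 1000)]
splits = split_intervals(example)
```

Evaluating the model only on intervals from the test chromosome avoids scoring it on regions it may have seen during training.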

Create a new conda environment with all dependencies installed
kipoi env create BPNet_Dmel_OreR_2to3hr_ZDTBCG
source activate kipoi-BPNet_Dmel_OreR_2to3hr_ZDTBCG
Test the model
kipoi test BPNet_Dmel_OreR_2to3hr_ZDTBCG --source=kipoi
Make a prediction
kipoi get-example BPNet_Dmel_OreR_2to3hr_ZDTBCG -o example
kipoi predict BPNet_Dmel_OreR_2to3hr_ZDTBCG \
  --dataloader_args='{"intervals_file": "example/intervals_file", "fasta_file": "example/fasta_file"}' \
  -o '/tmp/BPNet_Dmel_OreR_2to3hr_ZDTBCG.example_pred.tsv'
# check the results
head '/tmp/BPNet_Dmel_OreR_2to3hr_ZDTBCG.example_pred.tsv'
Get the model
import kipoi
model = kipoi.get_model('BPNet_Dmel_OreR_2to3hr_ZDTBCG')
Make a prediction for example files
pred = model.pipeline.predict_example(batch_size=4)
Use dataloader and model separately
# Download example dataloader kwargs
dl_kwargs = model.default_dataloader.download_example('example')
# Get the dataloader and instantiate it
dl = model.default_dataloader(**dl_kwargs)
# get a batch iterator
batch_iterator = dl.batch_iter(batch_size=4)
for batch in batch_iterator:
    # predict for a batch
    batch_pred = model.predict_on_batch(batch['inputs'])
Make predictions for custom files directly
pred = model.pipeline.predict(dl_kwargs, batch_size=4)
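The batch loop above keeps only the most recent `batch_pred`. A minimal sketch of how per-batch results could be accumulated follows, assuming `predict_on_batch` returns a dictionary keyed by task name, as the Targets schema suggests. `collect_batches` and the stand-in data are hypothetical and not part of the kipoi API.

```python
from collections import defaultdict


def collect_batches(batch_preds):
    """Merge per-batch prediction dicts into one dict of per-example lists."""
    merged = defaultdict(list)
    for batch_pred in batch_preds:
        for task, values in batch_pred.items():
            # Iterating over a batch yields one entry per example; this works
            # for nested lists and for numpy arrays alike.
            merged[task].extend(values)
    return dict(merged)


# Hypothetical stand-in for two batches of model output (nested lists here,
# numpy arrays in practice).
fake_batches = [
    {"Zld": [[0.1], [0.2]], "Dl": [[1.0], [2.0]]},
    {"Zld": [[0.3]], "Dl": [[3.0]]},
]
merged = collect_batches(fake_batches)
```

In the loop above, the same effect could be obtained by appending each `batch_pred` to a list and passing that list to `collect_batches` afterwards.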
Get the model
library(reticulate)
kipoi <- import('kipoi')
model <- kipoi$get_model('BPNet_Dmel_OreR_2to3hr_ZDTBCG')
Make a prediction for example files
predictions <- model$pipeline$predict_example()
Use dataloader and model separately
# Download example dataloader kwargs
dl_kwargs <- model$default_dataloader$download_example('example')
# Get the dataloader and instantiate it, passing the kwargs as named arguments
dl <- do.call(model$default_dataloader, dl_kwargs)
# get a batch iterator
it <- dl$batch_iter(batch_size=4L)
# predict for a batch
batch <- iter_next(it)
model$predict_on_batch(batch$inputs)
Make predictions for custom files directly
pred <- model$pipeline$predict(dl_kwargs, batch_size=4L)
Get the docker image
Not available yet
Get the full sized docker image
Not available yet
Get the activated conda environment inside the container
Not available yet
Test the model
Not available yet
Make prediction for custom files directly
Not available yet
Install apptainer
https://apptainer.org/docs/user/main/quick_start.html#quick-installation-steps
Make prediction for custom files directly
Not available yet

Schema

Inputs

Single numpy array

Name: None

    Shape: (1000, 4) 

    Doc: One-hot encoded DNA sequence.
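To illustrate what a (1000, 4) one-hot input looks like, here is a minimal pure-Python sketch. The dataloader performs this encoding internally; the A, C, G, T column order and the all-zero handling of other bases used here are assumptions for illustration and may differ from the dataloader's actual behavior.

```python
ALPHABET = "ACGT"  # assumed column order


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows, one row per base."""
    rows = []
    for base in seq.upper():
        row = [0.0, 0.0, 0.0, 0.0]
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1.0
        # Assumption: bases outside ACGT (for example N) stay all-zero.
        rows.append(row)
    return rows


encoded = one_hot("ACGTN")
```

A 1000 bp sequence encoded this way has 1000 rows of 4 values, matching the input shape listed above.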


Targets

Dictionary of numpy arrays

Name: Bcd

    Shape: (1000, 2) 

    Doc: Strand-specific ChIP-nexus data for Bicoid.

Name: Cad

    Shape: (1000, 2) 

    Doc: Strand-specific ChIP-nexus data for Caudal.

Name: Dl

    Shape: (1000, 2) 

    Doc: Strand-specific ChIP-nexus data for Dorsal.

Name: GAF

    Shape: (1000, 2) 

    Doc: Strand-specific ChIP-nexus data for GAGA-factor.

Name: Twi

    Shape: (1000, 2) 

    Doc: Strand-specific ChIP-nexus data for Twist.

Name: Zld

    Shape: (1000, 2) 

    Doc: Strand-specific ChIP-nexus data for Zelda.
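Each target is a profile with one row per position and one column per strand. The sketch below shows one way such a profile could be summarized, assuming a single example's prediction is available as a (length, 2) array or nested list. The helper names are hypothetical, and which column corresponds to which strand is not stated on this page.

```python
TASKS = ["Bcd", "Cad", "Dl", "GAF", "Twi", "Zld"]


def missing_tasks(pred):
    """List the task names from the schema that are absent from a prediction dict."""
    return [task for task in TASKS if task not in pred]


def summarize_profile(profile):
    """Return (total signal, summit position) for each of the two strand columns."""
    summary = []
    for strand in range(2):
        track = [row[strand] for row in profile]
        total = sum(track)
        summit = max(range(len(track)), key=track.__getitem__)
        summary.append((total, summit))
    return summary


# Hypothetical 5-position profile for illustration; real profiles have 1000 rows.
toy = [[0.0, 0.1], [0.5, 0.0], [2.0, 0.3], [0.5, 1.5], [0.0, 0.2]]
result = summarize_profile(toy)
```

The summit positions of the two strand columns are reported separately, so the offset between them can be inspected per task.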


Dataloader

Defined as: kipoiseq.dataloaders.SeqIntervalDl

Doc: Dataloader for a combination of fasta and tab-delimited input files such as bed files. The dataloader extracts regions from the fasta file as defined in the tab-delimited `intervals_file` and converts them into one-hot encoded format. Returned sequences are of the type np.array with the shape inferred from the arguments: `alphabet_axis` and `dummy_axis`.

Authors: Ziga Avsec, Roman Kreuzhuber

Type: Dataset

License: MIT


Arguments

intervals_file : Path to a bed3+<columns> file containing intervals and, optionally, labels.

fasta_file : Reference genome FASTA file path.

num_chr_fasta (optional): if True, the dataloader will make sure that the chromosomes don't start with chr.

use_strand (optional): reverse-complement the FASTA sequence if the BED file defines the negative strand. Requires a bed6 file.
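As an illustration of preparing an `intervals_file`, the sketch below builds bed3 lines for 1000 bp windows around given center positions. It assumes the intervals should be 1000 bp wide to match the (1000, 4) input shape; whether the dataloader resizes intervals on its own is not stated here. The function names and coordinates are hypothetical, and chromosome ends are not checked.

```python
def centered_window(chrom, center, width=1000):
    """Return a BED-style (chrom, start, end) window of `width` bp around `center`."""
    start = center - width // 2
    if start < 0:
        # Shift windows that would begin before the start of the chromosome.
        start = 0
    return chrom, start, start + width


def to_bed_lines(windows):
    """Format (chrom, start, end) tuples as tab-delimited bed3 lines."""
    return ["{}\t{}\t{}".format(*window) for window in windows]


# Hypothetical center positions, for illustration only.
windows = [centered_window("chr3R", 5000), centered_window("chrX", 100)]
lines = to_bed_lines(windows)
```

Writing `"\n".join(lines)` to a file gives a bed3 file whose path can be passed as `intervals_file`.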


Model dependencies
conda:
  • python=3.7
  • bioconda::pybedtools>=0.7.10
  • bioconda::bedtools>=2.27.1
  • bioconda::pybigwig>=0.3.10
  • bioconda::pysam>=0.14.0
  • bioconda::genomelake==0.1.4
  • pytorch::pytorch=1.4.0
  • cython=0.29.22
  • h5py=2.10.0
  • numpy=1.19.2
  • pandas=1.1.5
  • fastparquet=0.5.0
  • python-snappy=0.6.0
  • pip=21.0.1
  • nb_conda=2.2.1
  • tensorflow=1.14
  • keras=2.2.4

pip:
  • git+https://github.com/kundajelab/DeepExplain.git
  • git+https://github.com/kundajelab/bpnet.git@0cb7277b736260f8b4084c9b0c5bd62b9edb5266
  • protobuf==3.20

Dataloader dependencies
conda:
  • bioconda::pybedtools
  • bioconda::pyfaidx
  • bioconda::pyranges
  • numpy
  • pandas

pip:
  • kipoiseq