DeepBind/Naegleria_gruberi/RBP/D00270.001_RNAcompete_Ng_0261
Authors: Babak Alipanahi, Andrew Delong, Matthew T Weirauch, Brendan J Frey
License: BSD 3-Clause
Contributed by: Johnny Israeli
Cite as: https://doi.org/10.1038/nbt.3300
Type: keras
Postprocessing: variant_effects
Trained on: ?All chromosomes? Data from protein binding microarrays (Mukherjee et al., 2004), RNAcompete assays (Ray et al., 2009), ChIP-seq (Kharchenko et al., 2008), and HT-SELEX (Jolma et al., 2010)
Abstract: Knowing the sequence specificities of DNA- and RNA-binding proteins is essential for developing models of the regulatory processes in biological systems and for identifying causal disease variants. Here we show that sequence specificities can be ascertained from experimental data with 'deep learning' techniques, which offer a scalable, flexible and unified computational approach for pattern discovery. Using a diverse array of experimental data and evaluation metrics, we find that deep learning outperforms other state-of-the-art methods, even when training on in vitro data and testing on in vivo data. We call this approach DeepBind and have built a stand-alone software tool that is fully automatic and handles millions of sequences per experiment. Specificities determined by DeepBind are readily visualized as a weighted ensemble of position weight matrices or as a 'mutation map' that indicates how variations affect binding within a specific sequence.
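The 'mutation map' mentioned in the abstract can be illustrated with a minimal sketch: substitute every base at every position and record how a score changes relative to the unmodified sequence. The function names `mutation_map` and `toy_score` are hypothetical, and `toy_score` is a stand-in motif counter, not DeepBind itself; in practice the scoring callable would wrap a real model prediction.

```python
from typing import Callable, Dict, List

BASES = "ACGT"  # assumed alphabet for this illustration


def mutation_map(seq: str, score: Callable[[str], float]) -> List[Dict[str, float]]:
    """Return, for each position, the score change caused by each substitution."""
    ref_score = score(seq)
    rows = []
    for i, ref_base in enumerate(seq):
        row = {}
        for alt in BASES:
            if alt == ref_base:
                # The reference base leaves the sequence unchanged.
                row[alt] = 0.0
                continue
            mutated = seq[:i] + alt + seq[i + 1:]
            row[alt] = score(mutated) - ref_score
        rows.append(row)
    return rows


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a model: counts occurrences of the motif 'TGT'."""
    return float(sum(1 for i in range(len(seq) - 2) if seq[i:i + 3] == "TGT"))


# Each row is one position; each key is the substituted base.
for position, deltas in enumerate(mutation_map("ATGTA", toy_score)):
    print(position, deltas)
```

Negative entries mark substitutions that lower the score, which in this toy case are the ones that break the motif.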
kipoi env create DeepBind
source activate kipoi-DeepBind
kipoi test DeepBind/Naegleria_gruberi/RBP/D00270.001_RNAcompete_Ng_0261 --source=kipoi
kipoi get-example DeepBind/Naegleria_gruberi/RBP/D00270.001_RNAcompete_Ng_0261 -o example
kipoi predict DeepBind/Naegleria_gruberi/RBP/D00270.001_RNAcompete_Ng_0261 \
--dataloader_args='{"intervals_file": "example/intervals_file", "fasta_file": "example/fasta_file"}' \
-o '/tmp/DeepBind|Naegleria_gruberi|RBP|D00270.001_RNAcompete_Ng_0261.example_pred.tsv'
# check the results
head '/tmp/DeepBind|Naegleria_gruberi|RBP|D00270.001_RNAcompete_Ng_0261.example_pred.tsv'
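As an alternative to `head`, the prediction file can be inspected from Python. The sketch below uses only the standard library and makes no assumption about the column names in the TSV; `read_predictions` is a hypothetical helper name, not part of Kipoi.

```python
import csv
from typing import Dict, List


def read_predictions(path: str, limit: int = 5) -> List[Dict[str, str]]:
    """Return up to `limit` rows of a tab-separated file as dicts keyed by the header."""
    rows: List[Dict[str, str]] = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            if len(rows) >= limit:
                break
            rows.append(row)
    return rows
```

Calling `read_predictions` with the path passed to `-o` above returns the first few rows; all values come back as strings and need an explicit `float(...)` conversion before numeric use.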
kipoi env create DeepBind
source activate kipoi-DeepBind
import kipoi
model = kipoi.get_model('DeepBind/Naegleria_gruberi/RBP/D00270.001_RNAcompete_Ng_0261')
pred = model.pipeline.predict_example(batch_size=4)
# Download example dataloader kwargs
dl_kwargs = model.default_dataloader.download_example('example')
# Get the dataloader and instantiate it
dl = model.default_dataloader(**dl_kwargs)
# get a batch iterator
batch_iterator = dl.batch_iter(batch_size=4)
for batch in batch_iterator:
    # predict for a batch
    batch_pred = model.predict_on_batch(batch['inputs'])
pred = model.pipeline.predict(dl_kwargs, batch_size=4)
library(reticulate)
kipoi <- import('kipoi')
model <- kipoi$get_model('DeepBind/Naegleria_gruberi/RBP/D00270.001_RNAcompete_Ng_0261')
predictions <- model$pipeline$predict_example()
# Download example dataloader kwargs
dl_kwargs <- model$default_dataloader$download_example('example')
# Get the dataloader
dl <- do.call(model$default_dataloader, dl_kwargs)
# get a batch iterator
it <- dl$batch_iter(batch_size=4)
# predict for a batch
batch <- iter_next(it)
model$predict_on_batch(batch$inputs)
pred <- model$pipeline$predict(dl_kwargs, batch_size=4)
docker pull kipoi/kipoi-docker:sharedpy3keras2tf1-slim
docker pull kipoi/kipoi-docker:sharedpy3keras2tf1
docker run -it kipoi/kipoi-docker:sharedpy3keras2tf1-slim
docker run kipoi/kipoi-docker:sharedpy3keras2tf1-slim kipoi test DeepBind/Naegleria_gruberi/RBP/D00270.001_RNAcompete_Ng_0261 --source=kipoi
# Create an example directory containing the data
mkdir -p $PWD/kipoi-example
# You can replace $PWD/kipoi-example with a different absolute path containing the data
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:sharedpy3keras2tf1-slim \
kipoi get-example DeepBind/Naegleria_gruberi/RBP/D00270.001_RNAcompete_Ng_0261 -o /app/example
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:sharedpy3keras2tf1-slim \
kipoi predict DeepBind/Naegleria_gruberi/RBP/D00270.001_RNAcompete_Ng_0261 \
--dataloader_args='{"intervals_file": "/app/example/intervals_file", "fasta_file": "/app/example/fasta_file"}' \
-o '/app/DeepBind_Naegleria_gruberi_RBP_D00270.001_RNAcompete_Ng_0261.example_pred.tsv'
# check the results
head $PWD/kipoi-example/DeepBind_Naegleria_gruberi_RBP_D00270.001_RNAcompete_Ng_0261.example_pred.tsv
Install Apptainer: https://apptainer.org/docs/user/main/quick_start.html#quick-installation-steps
kipoi get-example DeepBind/Naegleria_gruberi/RBP/D00270.001_RNAcompete_Ng_0261 -o example
kipoi predict DeepBind/Naegleria_gruberi/RBP/D00270.001_RNAcompete_Ng_0261 \
--dataloader_args='{"intervals_file": "example/intervals_file", "fasta_file": "example/fasta_file"}' \
-o 'DeepBind_Naegleria_gruberi_RBP_D00270.001_RNAcompete_Ng_0261.example_pred.tsv' \
--singularity
# check the results
head DeepBind_Naegleria_gruberi_RBP_D00270.001_RNAcompete_Ng_0261.example_pred.tsv
Defined as: kipoiseq.dataloaders.SeqIntervalDl
Doc: Dataloader for a combination of fasta and tab-delimited input files such as bed files. The dataloader extracts regions from the fasta file as defined in the tab-delimited `intervals_file` and converts them into one-hot encoded format. Returned sequences are of the type np.array with the shape inferred from the arguments: `alphabet_axis` and `dummy_axis`.
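To make the one-hot encoding step concrete, here is a minimal pure-Python sketch. It is not the kipoiseq implementation: the column order `ACGT`, the uniform value 0.25 for ambiguous bases, and the function name `one_hot` are assumptions chosen for illustration, and the result is a nested list rather than an np.array with configurable `alphabet_axis` and `dummy_axis`.

```python
from typing import List

ALPHABET = "ACGT"  # assumed column order for this illustration


def one_hot(seq: str, neutral_value: float = 0.25) -> List[List[float]]:
    """Encode a DNA string as a (length x 4) nested list.

    Bases outside the alphabet (for example N) get a uniform row of
    `neutral_value`, which is an assumption of this sketch.
    """
    encoded = []
    for base in seq.upper():
        if base in ALPHABET:
            row = [1.0 if base == letter else 0.0 for letter in ALPHABET]
        else:
            row = [neutral_value] * len(ALPHABET)
        encoded.append(row)
    return encoded
```

Each row corresponds to one sequence position and each column to one letter of the alphabet.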
Authors: Ziga Avsec, Roman Kreuzhuber
Type: Dataset
License: MIT
Arguments
intervals_file : bed3+<columns> file path containing intervals + (optionally) labels
fasta_file : Reference genome FASTA file path.
num_chr_fasta (optional): if True, the dataloader will make sure that the chromosomes don't start with chr.
label_dtype (optional): None, datatype of the task labels taken from the intervals_file. Example: str, int, float, np.float32
use_strand (optional): reverse-complement fasta sequence if bed file defines negative strand. Requires a bed6 file
ignore_targets (optional): if True, don't return any target variables
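The effect of `num_chr_fasta` and `use_strand` can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the kipoiseq code: the names `Interval`, `parse_bed_line`, `extract`, and the `strip_chr` parameter are hypothetical, and the genome is a plain dict of strings instead of an indexed FASTA file. BED coordinates are 0-based and half-open, so they map directly onto Python slices.

```python
from typing import Dict, NamedTuple

# Translation table used to complement bases before reversing.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+", "-", or "." when the file has fewer than 6 columns


def parse_bed_line(line: str, strip_chr: bool = False) -> Interval:
    """Parse one tab-delimited BED line; optionally drop a leading 'chr'."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if strip_chr and chrom.startswith("chr"):
        chrom = chrom[3:]
    strand = fields[5] if len(fields) >= 6 else "."
    return Interval(chrom, start, end, strand)


def extract(genome: Dict[str, str], iv: Interval, use_strand: bool = False) -> str:
    """Slice the interval and reverse-complement it on the negative strand."""
    seq = genome[iv.chrom][iv.start:iv.end]
    if use_strand and iv.strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq
```

Strand information lives in the sixth BED column, which is why `use_strand` requires a bed6 file: with only three columns the sketch falls back to "." and never reverse-complements.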
- h5py=2.10.0
- tensorflow=2.7.0
- keras=2.7.0
- python=3.7
- bioconda::pysam=0.18.0
- pip=20.2.4
- bioconda::pybedtools
- bioconda::pyfaidx
- bioconda::pyranges
- numpy
- pandas
- kipoiseq