DeepSEA/predict
Authors: Jian Zhou, Olga G. Troyanskaya
License: CC-BY 3.0
Contributed by: Roman Kreuzhuber
Cite as: https://doi.org/10.1038/nmeth.3547
Type: pytorch
Postprocessing: variant_effects
Trained on: Chromosomes 8 and 9 were excluded from training; the remaining autosomes were used for training and validation. The validation set consisted of 4,000 samples on chromosome 7 spanning the genomic coordinates 30,508,751-35,296,850. The data were ENCODE and Roadmap Epigenomics chromatin profiles; see https://www.nature.com/articles/nmeth.3547#methods
This CNN is based on the DeepSEA model from Zhou and Troyanskaya (2015). It categorically predicts 919 cell type-specific epigenetic features from DNA sequence. The model was trained on publicly available ENCODE and Roadmap Epigenomics data, using DNA sequences of 1000 bp. It was converted to a PyTorch model using a modified version of https://github.com/clcarwin/convert_torch_to_pytorch

Use this model only for predictions on sequences, not for variant effect prediction. To predict variant effects, use the DeepSEA/variantEffects model. The model automatically generates the reverse complement of each input and averages the forward and reverse-complement predictions, in line with the results from the website.

The input tensor must have shape (N, 4, 1, 1000): N samples, 4 nucleotides, and a 1000 bp window. For each sample, the model predicts 919 probabilities, one per epigenetic feature.
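The input layout and the strand averaging described above can be sketched in NumPy. This is a minimal illustration, not the model's or dataloader's own code: the helper names `one_hot_encode`, `reverse_complement`, and `average_strands`, the A, C, G, T channel order, and the all-zero encoding of unknown bases are assumptions made for the example, and `predict_fn` is a hypothetical stand-in for a model call.

```python
import numpy as np

# Assumed channel order; the real order is defined by the model's dataloader.
ALPHABET = "ACGT"
SEQ_LEN = 1000


def one_hot_encode(seqs):
    """Encode equal-length DNA strings into an array of shape (N, 4, 1, L)."""
    length = len(seqs[0])
    out = np.zeros((len(seqs), len(ALPHABET), 1, length), dtype=np.float32)
    for i, seq in enumerate(seqs):
        if len(seq) != length:
            raise ValueError("all sequences must have the same length")
        for pos, base in enumerate(seq.upper()):
            channel = ALPHABET.find(base)
            if channel >= 0:  # unknown bases such as N stay all-zero
                out[i, channel, 0, pos] = 1.0
    return out


def reverse_complement(batch):
    """Reverse-complement a one-hot batch laid out as (N, 4, 1, L).

    With channels ordered A, C, G, T, flipping the channel axis swaps
    A<->T and C<->G; flipping the last axis reverses the sequence.
    """
    return np.ascontiguousarray(batch[:, ::-1, :, ::-1])


def average_strands(predict_fn, batch):
    """Average predictions over the forward and reverse-complement strands.

    predict_fn is a hypothetical callable mapping a batch to predictions.
    """
    return 0.5 * (predict_fn(batch) + predict_fn(reverse_complement(batch)))


# Two random 1000 bp sequences as placeholder input.
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list(ALPHABET), size=SEQ_LEN)) for _ in range(2)]
x = one_hot_encode(seqs)
print(x.shape)
```

The DeepSEA/predict model performs the strand averaging internally, so `average_strands` is shown only to make the operation concrete; it should not be applied on top of this model's output.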
kipoi env create DeepSEA/predict
source activate kipoi-DeepSEA__predict
kipoi test DeepSEA/predict --source=kipoi
kipoi get-example DeepSEA/predict -o example
kipoi predict DeepSEA/predict \
--dataloader_args='{"intervals_file": "example/intervals_file", "fasta_file": "example/fasta_file"}' \
-o '/tmp/DeepSEA|predict.example_pred.tsv'
# check the results
head '/tmp/DeepSEA|predict.example_pred.tsv'
kipoi env create DeepSEA/predict
source activate kipoi-DeepSEA__predict
import kipoi
model = kipoi.get_model('DeepSEA/predict')
pred = model.pipeline.predict_example(batch_size=4)
# Download example dataloader kwargs
dl_kwargs = model.default_dataloader.download_example('example')
# Get the dataloader and instantiate it
dl = model.default_dataloader(**dl_kwargs)
# get a batch iterator
batch_iterator = dl.batch_iter(batch_size=4)
for batch in batch_iterator:
    # predict for a batch
    batch_pred = model.predict_on_batch(batch['inputs'])
pred = model.pipeline.predict(dl_kwargs, batch_size=4)
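Because each sample yields 919 probabilities, a natural next step is to rank the features per sample. The following is a minimal sketch, assuming the predictions are available as a NumPy array of shape (N, 919). The random array stands in for real model output, and `top_k_features` is a hypothetical helper, not part of Kipoi.

```python
import numpy as np

N_FEATURES = 919  # number of epigenetic features predicted per sample


def top_k_features(pred, k=5):
    """Return indices and scores of the k highest-probability features per sample."""
    pred = np.asarray(pred)
    if pred.ndim != 2 or pred.shape[1] != N_FEATURES:
        raise ValueError(f"expected shape (N, {N_FEATURES}), got {pred.shape}")
    # Sort each row in descending order and keep the first k columns.
    idx = np.argsort(-pred, axis=1)[:, :k]
    scores = np.take_along_axis(pred, idx, axis=1)
    return idx, scores


# Placeholder predictions: random values in [0, 1), NOT real model output.
rng = np.random.default_rng(0)
fake_pred = rng.random((4, N_FEATURES))

idx, scores = top_k_features(fake_pred, k=3)
for sample, (cols, vals) in enumerate(zip(idx, scores)):
    print(sample, cols.tolist(), np.round(vals, 3).tolist())
```

The column indices refer to positions in the model's output vector; mapping them to feature names requires the model's own target annotation, which is not reproduced here.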
library(reticulate)
kipoi <- import('kipoi')
model <- kipoi$get_model('DeepSEA/predict')
predictions <- model$pipeline$predict_example()
# Download example dataloader kwargs
dl_kwargs <- model$default_dataloader$download_example('example')
# Get the dataloader
dl <- do.call(model$default_dataloader, dl_kwargs)
# get a batch iterator
it <- dl$batch_iter(batch_size=4)
# predict for a batch
batch <- iter_next(it)
model$predict_on_batch(batch$inputs)
pred <- model$pipeline$predict(dl_kwargs, batch_size=4)
docker pull kipoi/kipoi-docker:sharedpy3keras2tf2-slim
docker pull kipoi/kipoi-docker:sharedpy3keras2tf2
docker run -it kipoi/kipoi-docker:sharedpy3keras2tf2-slim
docker run kipoi/kipoi-docker:sharedpy3keras2tf2-slim kipoi test DeepSEA/predict --source=kipoi
# Create an example directory containing the data
mkdir -p $PWD/kipoi-example
# You can replace $PWD/kipoi-example with a different absolute path containing the data
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:sharedpy3keras2tf2-slim \
kipoi get-example DeepSEA/predict -o /app/example
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:sharedpy3keras2tf2-slim \
kipoi predict DeepSEA/predict \
--dataloader_args='{"intervals_file": "/app/example/intervals_file", "fasta_file": "/app/example/fasta_file"}' \
-o '/app/DeepSEA_predict.example_pred.tsv'
# check the results
head $PWD/kipoi-example/DeepSEA_predict.example_pred.tsv
Install Apptainer: https://apptainer.org/docs/user/main/quick_start.html#quick-installation-steps
kipoi get-example DeepSEA/predict -o example
kipoi predict DeepSEA/predict \
--dataloader_args='{"intervals_file": "example/intervals_file", "fasta_file": "example/fasta_file"}' \
-o 'DeepSEA_predict.example_pred.tsv' \
--singularity
# check the results
head DeepSEA_predict.example_pred.tsv
Defined as: kipoiseq.dataloaders.SeqIntervalDl
Doc: Dataloader for a combination of fasta and tab-delimited input files such as bed files. The dataloader extracts regions from the fasta file as defined in the tab-delimited `intervals_file` and converts them into one-hot encoded format. Returned sequences are of the type np.array with the shape inferred from the arguments: `alphabet_axis` and `dummy_axis`.
Authors: Ziga Avsec, Roman Kreuzhuber
Type: Dataset
License: MIT
Arguments
intervals_file : bed3+<columns> file path containing intervals + (optionally) labels
fasta_file : Reference genome FASTA file path.
num_chr_fasta (optional): if True, the dataloader will make sure that the chromosome names don't start with chr.
label_dtype (optional): None, datatype of the task labels taken from the intervals_file. Example: str, int, float, np.float32
use_strand (optional): reverse-complement fasta sequence if bed file defines negative strand. Requires a bed6 file
ignore_targets (optional): if True, don't return any target variables
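Since the model expects 1000 bp inputs, the intervals in `intervals_file` would typically be 1000 bp wide. The following is a minimal sketch of writing such a bed3 file, assuming windows centered on positions of interest. The helpers `centered_interval` and `write_bed3` and the example coordinates are hypothetical. BED coordinates are 0-based with an exclusive end, so end minus start equals the window width.

```python
import os
import tempfile

WINDOW = 1000  # window size expected by the model, in bp


def centered_interval(chrom, center, width=WINDOW):
    """Return a (chrom, start, end) BED interval of the given width.

    center is a 0-based position; the interval is 0-based, half-open.
    """
    start = center - width // 2
    if start < 0:
        raise ValueError("window would extend past the chromosome start")
    return chrom, start, start + width


def write_bed3(intervals, path):
    """Write (chrom, start, end) tuples as a tab-delimited bed3 file."""
    with open(path, "w") as handle:
        for chrom, start, end in intervals:
            handle.write(f"{chrom}\t{start}\t{end}\n")


# Hypothetical positions of interest, chosen only for illustration.
sites = [("chr1", 10_500), ("chr2", 250_000)]
intervals = [centered_interval(chrom, pos) for chrom, pos in sites]

bed_path = os.path.join(tempfile.mkdtemp(), "intervals.bed")
write_bed3(intervals, bed_path)
print(open(bed_path).read())
```

This sketch does not check intervals against chromosome lengths; windows that run past the end of a chromosome would need to be filtered against the FASTA index separately.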
- python=3.8
- h5py=3.9.0
- pytorch::pytorch=2.0.1
- pip=22.0.4
- cython=3.0.0
- kipoiseq
- bioconda::pybedtools
- bioconda::pyfaidx
- bioconda::pyranges
- numpy
- pandas
- kipoiseq