MPRA-DragoNN/DeepFactorizedModel
Authors: Rajiv Movva, Surag Nair
License: MIT
Contributed by: Rajiv Movva, Surag Nair
Cite as: https://doi.org/10.1101/393926
Type: None
Postprocessing: variant_effects
Trained on: Sharpr-MPRA dataset; chr8 was held out for validation, chr18 for testing, and all other chromosomes were used for training (see the split sketch below).
Deep factorized convolutional neural network for predicting Sharpr-MPRA activity of arbitrary 145bp sequences. Architecture based on https://doi.org/10.1101/229385.
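A minimal sketch of the chromosome-based split described above; the intervals file name and BED-like column layout are assumptions for illustration, not part of the model.

import pandas as pd

# Hypothetical BED-like intervals table; only the chromosome column (col 0) is used.
intervals = pd.read_csv("sharpr_intervals.bed", sep="\t", header=None)
chrom = intervals[0]

valid_set = intervals[chrom == "chr8"]                    # chr8 -> validation
test_set = intervals[chrom == "chr18"]                    # chr18 -> test
train_set = intervals[~chrom.isin(["chr8", "chr18"])]     # everything else -> train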
kipoi env create MPRA-DragoNN/DeepFactorizedModel
source activate kipoi-MPRA-DragoNN__DeepFactorizedModel
kipoi test MPRA-DragoNN/DeepFactorizedModel --source=kipoi
kipoi get-example MPRA-DragoNN/DeepFactorizedModel -o example
kipoi predict MPRA-DragoNN/DeepFactorizedModel \
--dataloader_args='{"intervals_file": "example/intervals_file", "fasta_file": "example/fasta_file"}' \
-o '/tmp/MPRA-DragoNN|DeepFactorizedModel.example_pred.tsv'
# check the results
head '/tmp/MPRA-DragoNN|DeepFactorizedModel.example_pred.tsv'
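To inspect the predictions beyond `head`, the TSV can be loaded with pandas. This is a minimal sketch assuming kipoi's usual layout of metadata columns plus one preds column per task; the column-name prefix is an assumption, so check the header printed by `head`.

import pandas as pd

# Path matches the -o argument above; the "preds" column prefix is an assumption.
df = pd.read_csv("/tmp/MPRA-DragoNN|DeepFactorizedModel.example_pred.tsv", sep="\t")
pred_cols = [c for c in df.columns if c.startswith("preds")]
print(df.shape)
print(df[pred_cols].describe())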
kipoi env create MPRA-DragoNN/DeepFactorizedModel
source activate kipoi-MPRA-DragoNN__DeepFactorizedModel
import kipoi
model = kipoi.get_model('MPRA-DragoNN/DeepFactorizedModel')
pred = model.pipeline.predict_example(batch_size=4)
# Download example dataloader kwargs
dl_kwargs = model.default_dataloader.download_example('example')
# Get the dataloader and instantiate it
dl = model.default_dataloader(**dl_kwargs)
# get a batch iterator
batch_iterator = dl.batch_iter(batch_size=4)
for batch in batch_iterator:
    # predict for a batch
    batch_pred = model.predict_on_batch(batch['inputs'])
# Or predict for the whole dataset at once
pred = model.pipeline.predict(dl_kwargs, batch_size=4)
library(reticulate)
kipoi <- import('kipoi')
model <- kipoi$get_model('MPRA-DragoNN/DeepFactorizedModel')
predictions <- model$pipeline$predict_example()
# Download example dataloader kwargs
dl_kwargs <- model$default_dataloader$download_example('example')
# Get the dataloader
dl <- model$default_dataloader(intervals_file = dl_kwargs$intervals_file,
                               fasta_file = dl_kwargs$fasta_file)
# get a batch iterator
it <- dl$batch_iter(batch_size=4)
# predict for a batch
batch <- iter_next(it)
model$predict_on_batch(batch$inputs)
# Or predict for the whole dataset at once
pred <- model$pipeline$predict(dl_kwargs, batch_size=4)
docker pull kipoi/kipoi-docker:mpra-dragonn-slim
docker pull kipoi/kipoi-docker:mpra-dragonn
docker run -it kipoi/kipoi-docker:mpra-dragonn-slim
docker run kipoi/kipoi-docker:mpra-dragonn-slim kipoi test MPRA-DragoNN/DeepFactorizedModel --source=kipoi
# Create an example directory containing the data
mkdir -p $PWD/kipoi-example
# You can replace $PWD/kipoi-example with a different absolute path containing the data
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:mpra-dragonn-slim \
kipoi get-example MPRA-DragoNN/DeepFactorizedModel -o /app/example
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:mpra-dragonn-slim \
kipoi predict MPRA-DragoNN/DeepFactorizedModel \
--dataloader_args='{"intervals_file": "/app/example/intervals_file", "fasta_file": "/app/example/fasta_file"}' \
-o '/app/MPRA-DragoNN_DeepFactorizedModel.example_pred.tsv'
# check the results
head $PWD/kipoi-example/MPRA-DragoNN_DeepFactorizedModel.example_pred.tsv
Install Apptainer/Singularity first: https://apptainer.org/docs/user/main/quick_start.html#quick-installation-steps
kipoi get-example MPRA-DragoNN/DeepFactorizedModel -o example
kipoi predict MPRA-DragoNN/DeepFactorizedModel \
--dataloader_args='{"intervals_file": "example/intervals_file", "fasta_file": "example/fasta_file"}' \
-o 'MPRA-DragoNN_DeepFactorizedModel.example_pred.tsv' \
--singularity
# check the results
head MPRA-DragoNN_DeepFactorizedModel.example_pred.tsv
Inputs
Single numpy array
Name: None
Doc: 145bp one-hot encoded ACGT sequences (e.g. [1,0,0,0] = 'A')
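As a sketch of how a raw 145 bp sequence maps onto this input: the dummy sequence and the (batch, 145, 4) axis order below are assumptions based on the schema note above, not values taken from the model files.

import numpy as np
import kipoi

# Dummy 145 bp sequence; encoding follows the schema note ([1,0,0,0] = 'A').
seq = "ACGT" * 36 + "A"                      # 145 bp
index = {"A": 0, "C": 1, "G": 2, "T": 3}
one_hot = np.zeros((len(seq), 4), dtype=np.float32)
for i, base in enumerate(seq):
    one_hot[i, index[base]] = 1.0

model = kipoi.get_model("MPRA-DragoNN/DeepFactorizedModel")
preds = model.predict_on_batch(one_hot[np.newaxis])  # assumed shape (1, 145, 4) -> (1, 12)
print(preds.shape)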
Targets
Single numpy array
Name: None
Doc: predicts 12 tasks: K562 minP replicate 1, K562 minP replicate 2, K562 minP pooled, K562 SV40P replicate 1, K562 SV40P replicate 2, K562 SV40P pooled, HepG2 minP replicate 1, HepG2 minP replicate 2, HepG2 minP pooled, HepG2 SV40P replicate 1, HepG2 SV40P replicate 2, HepG2 SV40P pooled.
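The 12 output columns follow the order listed above. The short task names in this sketch are paraphrased labels for readability, not identifiers exported by the model.

import pandas as pd

# Paraphrased task labels, in the order given in the target description above.
TASKS = [
    "K562_minP_rep1", "K562_minP_rep2", "K562_minP_pooled",
    "K562_SV40P_rep1", "K562_SV40P_rep2", "K562_SV40P_pooled",
    "HepG2_minP_rep1", "HepG2_minP_rep2", "HepG2_minP_pooled",
    "HepG2_SV40P_rep1", "HepG2_SV40P_rep2", "HepG2_SV40P_pooled",
]

def label_predictions(batch_pred):
    """Wrap a (batch, 12) prediction array into a DataFrame with readable column names."""
    return pd.DataFrame(batch_pred, columns=TASKS)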
Defined as: kipoiseq.dataloaders.SeqIntervalDl
Doc: Dataloader for a combination of fasta and tab-delimited input files such as bed files. The dataloader extracts regions from the fasta file as defined in the tab-delimited `intervals_file` and converts them into one-hot encoded format. Returned sequences are of the type np.array with the shape inferred from the arguments: `alphabet_axis` and `dummy_axis`.
Authors: Ziga Avsec, Roman Kreuzhuber
Type: Dataset
License: MIT
Arguments
intervals_file: bed3+<columns> file path containing intervals and (optionally) labels.
fasta_file: Reference genome FASTA file path.
num_chr_fasta (optional): if True, the dataloader will make sure that the chromosome names don't start with 'chr'.
label_dtype (optional): datatype of the task labels taken from the intervals_file (default: None). Example: str, int, float, np.float32.
use_strand (optional): reverse-complement the FASTA sequence if the BED file defines a negative strand. Requires a bed6 file.
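The dataloader can also be instantiated directly from kipoiseq, without going through the model wrapper. A minimal sketch using the documented arguments above and the files fetched by `kipoi get-example`; the expected batch shape is an assumption based on the model's 145 bp input.

from kipoiseq.dataloaders import SeqIntervalDl

# Direct use of the dataloader class named above; the example file paths
# assume `kipoi get-example MPRA-DragoNN/DeepFactorizedModel -o example` has been run.
dl = SeqIntervalDl(
    intervals_file="example/intervals_file",
    fasta_file="example/fasta_file",
    label_dtype=None,     # as documented above: dtype of the optional labels
    use_strand=False,     # as documented above: requires a bed6 file if True
)

batch = next(dl.batch_iter(batch_size=4))
print(batch["inputs"].shape)   # expected (4, 145, 4) for this model's example intervals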
- cython=0.28.5
- python=3.7
- h5py=2.8.0
- pip=20.3.3
- keras=2.3
- tensorflow=1.14
- protobuf==3.20
- bioconda::pybedtools
- bioconda::pyfaidx
- bioconda::pyranges
- numpy
- pandas
- kipoiseq