Authors: Jian Zhou, Olga Troyanskaya

Version: 0.1

License: MIT

Contributed by: Lara Urban

Cite as: Zhou, J. & Troyanskaya, O. G. (2015). Predicting effects of noncoding variants with deep learning-based sequence model. Nature Methods 12, 931-934.

Trained on: ENCODE and Roadmap Epigenomics chromatin profiles

Type: keras

Postprocessing: variant_effects

This CNN is based on the DeepSEA model from Zhou and Troyanskaya (2015). It predicts 919 cell type-specific epigenetic features (transcription factor binding, DNase I hypersensitivity, and histone marks) from DNA sequence. The model is trained on publicly available ENCODE and Roadmap Epigenomics data using DNA sequences of length 1000 bp. The input tensor must have shape (N, 1000, 4) for N samples, a 1000 bp window, and 4 nucleotides. For each sample, 919 probabilities are predicted, one per epigenetic feature.
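The (N, 1000, 4) input described above is a one-hot encoding of the DNA sequence. As a minimal sketch (the A/C/G/T column order is an assumption here; the model's dataloader defines the exact convention it uses):

```python
import numpy as np

def one_hot_encode(seq):
    """One-hot encode a DNA string into an array of shape (len(seq), 4)."""
    mapping = {'A': 0, 'C': 1, 'G': 2, 'T': 3}  # assumed column order
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        if base in mapping:  # ambiguous bases (e.g. N) stay all-zero
            out[i, mapping[base]] = 1.0
    return out

# Batch of N=2 random 1000 bp sequences -> input tensor of shape (2, 1000, 4)
rng = np.random.default_rng(0)
seqs = [''.join(rng.choice(list('ACGT'), size=1000)) for _ in range(2)]
batch = np.stack([one_hot_encode(s) for s in seqs])
print(batch.shape)  # (2, 1000, 4)
```

In practice the Kipoi dataloader produces this encoding for you from a BED intervals file and a FASTA reference.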

Create a new conda environment with all dependencies installed
kipoi env create DeepSEAKeras
source activate kipoi-DeepSEAKeras
Install model dependencies into current environment
kipoi env install DeepSEAKeras
Test the model
kipoi test DeepSEAKeras --source=kipoi
Make a prediction
cd ~/.kipoi/models/DeepSEAKeras
kipoi predict DeepSEAKeras \
  --dataloader_args='{"intervals_file": "example_files/intervals.bed", "fasta_file": "example_files/hg38_chr22.fa"}' \
  -o '/tmp/DeepSEAKeras.example_pred.tsv'
# check the results
head '/tmp/DeepSEAKeras.example_pred.tsv'
Get the model
import kipoi
model = kipoi.get_model('DeepSEAKeras')
Make a prediction for example files
pred = model.pipeline.predict_example()
Use dataloader and model separately
# setup the example dataloader kwargs
dl_kwargs = {'intervals_file': 'example_files/intervals.bed', 'fasta_file': 'example_files/hg38_chr22.fa'}
import os; os.chdir(os.path.expanduser('~/.kipoi/models/DeepSEAKeras'))
# Get the dataloader and instantiate it
dl = model.default_dataloader(**dl_kwargs)
# get a batch iterator
it = dl.batch_iter(batch_size=4)
# predict for a batch
batch = next(it)
pred = model.predict_on_batch(batch['inputs'])
Make predictions for custom files directly
pred = model.pipeline.predict(dl_kwargs, batch_size=4)
Get the model
kipoi <- import('kipoi')
model <- kipoi$get_model('DeepSEAKeras')
Make a prediction for example files
predictions <- model$pipeline$predict_example()
Use dataloader and model separately
# Get the dataloader
dl <- model$default_dataloader(intervals_file='example_files/intervals.bed', fasta_file='example_files/hg38_chr22.fa')
# get a batch iterator
it <- dl$batch_iter(batch_size=4)
# predict for a batch
batch <- iter_next(it)
pred <- model$predict_on_batch(batch$inputs)
Make predictions for custom files directly
dl_kwargs <- list(intervals_file='example_files/intervals.bed', fasta_file='example_files/hg38_chr22.fa')
pred <- model$pipeline$predict(dl_kwargs, batch_size=4)



Input schema: single numpy array

Name: seq

    Shape: (1000, 4) 

    Doc: DNA sequence


Output schema: single numpy array

Name: TFBS_DHS_probs

    Shape: (919,) 

    Doc: Probability of a specific epigenetic feature


Relative path: .

Version: 0.1

Doc: Dataloader for the DeepSEA model.

Authors: Lara Urban, Ziga Avsec

Type: Dataset

License: MIT


intervals_file : BED file with columns `chrom start end id score strand`

fasta_file : Reference genome sequence

target_file (optional): path to the targets (.tsv) file

use_linecache (optional): if True, use linecache to access bed file rows
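The dataloader expects each interval in `intervals_file` to span exactly 1000 bp, matching the model's input window. A minimal sketch of building such a file (the chromosome, coordinates, and interval names below are made up for illustration):

```python
import os
import tempfile

# Hypothetical example records with the columns described above:
# chrom, start, end, id, score, strand. Each interval is 1000 bp wide.
records = [
    ("chr22", 20000000, 20001000, "interval1", 0, "+"),
    ("chr22", 30000000, 30001000, "interval2", 0, "-"),
]

path = os.path.join(tempfile.gettempdir(), "intervals.bed")
with open(path, "w") as f:
    for rec in records:
        # BED is tab-separated with 0-based, half-open coordinates
        f.write("\t".join(map(str, rec)) + "\n")
```

A file like this can then be passed as `intervals_file` alongside a matching reference `fasta_file`.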

Model dependencies
  • python=3.5
  • h5py
  • tensorflow==1.4.1
  • keras==1.2.2

Dataloader dependencies
  • python=3.5
  • numpy
  • pandas
  • cython
  • genomelake
  • pybedtools