DeepCpG_DNA/Hou2016_HCC_dna

Authors: Christof Angermueller

License: MIT

Contributed by: Roman Kreuzhuber

Cite as: https://doi.org/10.1186/s13059-017-1189-z
https://doi.org/10.5281/zenodo.1094823

Type: keras

Postprocessing: None

Trained on: scBS-seq and scRRBS-seq datasets, https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1189-z#Sec7

This model is the DNA part extracted from a pretrained DeepCpG model by Christof Angermueller. The DeepCpG models were trained on the following data. The scBS-seq-profiled cells comprised 18 serum and 12 2i mESCs, which were pre-processed as described in Smallwood et al. (2014), with reads mapped to the GRCm38 mouse genome. Two serum cells (RSC27_4, RSC27_7) were excluded because their methylation patterns deviated strongly from those of the remaining serum cells. The scRRBS-seq-profiled cells were downloaded from the Gene Expression Omnibus (GEO; GSE65364) and comprised 25 human HCCs, six human hepatoblastoma-derived cells (HepG2) and six mESCs. Following Hou et al. (2013), one HCC was excluded (Ca26) and the analysis was restricted to CpG sites covered by at least four reads. For HCCs and HepG2 cells, CpG positions were lifted from GRCh37 to GRCh38, and for mESCs from NCBIM37 to GRCm38, using the liftOver tool from the UCSC Genome Browser.
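
The restriction to CpG sites covered by at least four reads can be pictured with a short sketch. This is only an illustration, not the authors' preprocessing pipeline: the record layout (`CpGSite`) and the helper name `filter_by_coverage` are hypothetical, and the example sites are made up.

```python
from typing import List, NamedTuple


class CpGSite(NamedTuple):
    """Hypothetical per-cell CpG record; the field layout is an assumption."""
    chrom: str
    pos: int
    methylated_reads: int
    total_reads: int


def filter_by_coverage(sites: List[CpGSite], min_reads: int = 4) -> List[CpGSite]:
    """Keep only CpG sites covered by at least `min_reads` reads."""
    return [s for s in sites if s.total_reads >= min_reads]


# Made-up example sites, used only to exercise the filter.
sites = [
    CpGSite("chr1", 10468, 3, 5),
    CpGSite("chr1", 10470, 1, 2),  # dropped: covered by only two reads
    CpGSite("chr1", 10483, 0, 4),
]
kept = filter_by_coverage(sites)
print([(s.chrom, s.pos) for s in kept])
```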

Create a new conda environment with all dependencies installed
kipoi env create DeepCpG_DNA/Hou2016_HCC_dna
source activate kipoi-DeepCpG_DNA__Hou2016_HCC_dna
Test the model
kipoi test DeepCpG_DNA/Hou2016_HCC_dna --source=kipoi
Make a prediction
kipoi get-example DeepCpG_DNA/Hou2016_HCC_dna -o example
kipoi predict DeepCpG_DNA/Hou2016_HCC_dna \
  --dataloader_args='{"fasta_file": "example/fasta_file", "intervals_file": "example/intervals_file"}' \
  -o '/tmp/DeepCpG_DNA|Hou2016_HCC_dna.example_pred.tsv'
# check the results
head '/tmp/DeepCpG_DNA|Hou2016_HCC_dna.example_pred.tsv'
Get the model
import kipoi
model = kipoi.get_model('DeepCpG_DNA/Hou2016_HCC_dna')
Make a prediction for example files
pred = model.pipeline.predict_example(batch_size=4)
Use dataloader and model separately
# Download example dataloader kwargs
dl_kwargs = model.default_dataloader.download_example('example')
# Get the dataloader and instantiate it
dl = model.default_dataloader(**dl_kwargs)
# get a batch iterator
batch_iterator = dl.batch_iter(batch_size=4)
for batch in batch_iterator:
    # predict for a batch
    batch_pred = model.predict_on_batch(batch['inputs'])
Make predictions for custom files directly
pred = model.pipeline.predict(dl_kwargs, batch_size=4)
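
The batch loop above overwrites `batch_pred` on every iteration. A minimal sketch of how the per-batch results could be accumulated is shown here, assuming that `predict_on_batch` returns one array-like per target (25 in total, as listed in the schema). `collect_predictions` and `fake_predict_on_batch` are hypothetical names. The latter is a stand-in so the example runs without the model installed.

```python
from typing import Callable, Dict, Iterable, List, Sequence

N_OUTPUTS = 25  # one model output per HCC cell: cpg/Ca01 ... cpg/Ca25


def collect_predictions(
    batches: Iterable[Dict],
    predict_on_batch: Callable[[Dict], Sequence[Sequence]],
    n_outputs: int = N_OUTPUTS,
) -> List[list]:
    """Run `predict_on_batch` on every batch and join the results per output."""
    collected: List[list] = [[] for _ in range(n_outputs)]
    for batch in batches:
        batch_pred = predict_on_batch(batch["inputs"])
        if len(batch_pred) != n_outputs:
            raise ValueError(f"expected {n_outputs} outputs, got {len(batch_pred)}")
        for out_idx, out in enumerate(batch_pred):
            collected[out_idx].extend(out)  # append this batch's rows
    return collected


def fake_predict_on_batch(inputs: Dict) -> List[list]:
    """Hypothetical stand-in for model.predict_on_batch; returns constant 0.5."""
    n = len(inputs["dna"])
    return [[[0.5] for _ in range(n)] for _ in range(N_OUTPUTS)]


# Two dummy batches of 4 and 3 sequences (sequence content is irrelevant here).
batches = [{"inputs": {"dna": [None] * 4}}, {"inputs": {"dna": [None] * 3}}]
all_pred = collect_predictions(batches, fake_predict_on_batch)
print(len(all_pred), len(all_pred[0]))
```

With the real model, `dl.batch_iter(batch_size=4)` and `model.predict_on_batch` would take the place of `batches` and `fake_predict_on_batch`. Joining the arrays of each output with `numpy.concatenate` is likely the more common choice than plain lists.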
Get the model
library(reticulate)
kipoi <- import('kipoi')
model <- kipoi$get_model('DeepCpG_DNA/Hou2016_HCC_dna')
Make a prediction for example files
predictions <- model$pipeline$predict_example()
Use dataloader and model separately
# Download example dataloader kwargs
dl_kwargs <- model$default_dataloader$download_example('example')
# Get the dataloader and instantiate it (named list entries become keyword arguments)
dl <- do.call(model$default_dataloader, dl_kwargs)
# get a batch iterator
it <- dl$batch_iter(batch_size=4)
# predict for a batch
batch <- iter_next(it)
model$predict_on_batch(batch$inputs)
Make predictions for custom files directly
pred <- model$pipeline$predict(dl_kwargs, batch_size=4)
Get the docker image
docker pull kipoi/kipoi-docker:sharedpy3keras1.2-slim
Get the full sized docker image
docker pull kipoi/kipoi-docker:sharedpy3keras1.2
Get the activated conda environment inside the container
docker run -it kipoi/kipoi-docker:sharedpy3keras1.2-slim
Test the model
docker run kipoi/kipoi-docker:sharedpy3keras1.2-slim kipoi test DeepCpG_DNA/Hou2016_HCC_dna --source=kipoi
Make predictions for custom files directly
# Create an example directory containing the data
mkdir -p $PWD/kipoi-example 
# You can replace $PWD/kipoi-example with a different absolute path containing the data 
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:sharedpy3keras1.2-slim \
kipoi get-example DeepCpG_DNA/Hou2016_HCC_dna -o /app/example 
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:sharedpy3keras1.2-slim \
kipoi predict DeepCpG_DNA/Hou2016_HCC_dna \
--dataloader_args='{"fasta_file": "/app/example/fasta_file", "intervals_file": "/app/example/intervals_file"}' \
-o '/app/DeepCpG_DNA_Hou2016_HCC_dna.example_pred.tsv' 
# check the results
head $PWD/kipoi-example/DeepCpG_DNA_Hou2016_HCC_dna.example_pred.tsv
    
Install apptainer
https://apptainer.org/docs/user/main/quick_start.html#quick-installation-steps
Make predictions for custom files directly
kipoi get-example DeepCpG_DNA/Hou2016_HCC_dna -o example
kipoi predict DeepCpG_DNA/Hou2016_HCC_dna \
--dataloader_args='{"fasta_file": "example/fasta_file", "intervals_file": "example/intervals_file"}' \
-o 'DeepCpG_DNA_Hou2016_HCC_dna.example_pred.tsv' \
--singularity 
# check the results
head DeepCpG_DNA_Hou2016_HCC_dna.example_pred.tsv

Schema

Inputs

Dictionary of numpy arrays

Name: dna

    Shape: (1001, 4) 

    Doc: DNA sequence
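
To make the `(1001, 4)` input shape concrete, here is a minimal sketch of one-hot encoding a 1001 bp window. The dataloader performs the encoding itself, so this is for illustration only. The channel order `ACGT` and the all-zero encoding of unknown bases such as `N` are assumptions that the source does not confirm, and `one_hot_encode` is a hypothetical helper.

```python
from typing import List

SEQ_LEN = 1001     # window length taken from the input schema
ALPHABET = "ACGT"  # assumed channel order; not confirmed by the source


def one_hot_encode(seq: str, alphabet: str = ALPHABET) -> List[List[int]]:
    """Encode a DNA string as len(seq) rows with one column per base."""
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:  # bases outside the alphabet (e.g. N) stay all-zero
            row[index[base]] = 1
        encoded.append(row)
    return encoded


seq = "ACGTN" * 200 + "C"  # 1001 bases of dummy sequence
x = one_hot_encode(seq)
print(len(x), len(x[0]))
```

Wrapping the result in `numpy.asarray` and adding a leading batch dimension would give an array of shape `(1, 1001, 4)`.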


Targets

List of numpy arrays

Name: cpg/Ca01

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca01

Name: cpg/Ca02

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca02

Name: cpg/Ca03

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca03

Name: cpg/Ca04

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca04

Name: cpg/Ca05

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca05

Name: cpg/Ca06

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca06

Name: cpg/Ca07

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca07

Name: cpg/Ca08

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca08

Name: cpg/Ca09

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca09

Name: cpg/Ca10

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca10

Name: cpg/Ca11

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca11

Name: cpg/Ca12

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca12

Name: cpg/Ca13

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca13

Name: cpg/Ca14

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca14

Name: cpg/Ca15

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca15

Name: cpg/Ca16

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca16

Name: cpg/Ca17

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca17

Name: cpg/Ca18

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca18

Name: cpg/Ca19

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca19

Name: cpg/Ca20

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca20

Name: cpg/Ca21

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca21

Name: cpg/Ca22

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca22

Name: cpg/Ca23

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca23

Name: cpg/Ca24

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca24

Name: cpg/Ca25

    Shape: (None, 1) 

    Doc: Methylation probability for cpg/Ca25
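
Because the targets come back as a plain list, it can help to attach the target names to the arrays. The sketch below assumes that the list order follows the order of the names above (Ca01 first, Ca25 last). The source does not state this explicitly. `label_predictions` is a hypothetical helper and the predictions are dummy values.

```python
from typing import Dict, List, Sequence

# Target names as listed in the schema: cpg/Ca01 ... cpg/Ca25
TARGET_NAMES: List[str] = [f"cpg/Ca{i:02d}" for i in range(1, 26)]


def label_predictions(
    pred_list: Sequence, names: Sequence[str] = TARGET_NAMES
) -> Dict[str, object]:
    """Pair each output array with its target name (order is an assumption)."""
    if len(pred_list) != len(names):
        raise ValueError(f"expected {len(names)} outputs, got {len(pred_list)}")
    return dict(zip(names, pred_list))


# Dummy predictions: 25 outputs, each with two rows of shape (1,)
dummy_pred = [[[i / 100], [i / 100]] for i in range(25)]
labeled = label_predictions(dummy_pred)
print(labeled["cpg/Ca01"], labeled["cpg/Ca25"])
```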


Dataloader

Defined as: .

Doc: Dataloader for the DeepCpG.

Authors: Ziga Avsec, Roman Kreuzhuber

Type: Dataset

License: MIT


Arguments

fasta_file : Reference genome sequence

intervals_file : bed3 file with `chrom start end id score strand`
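
A minimal sketch of building an intervals file for a list of CpG sites is given below. It assumes that each interval should span 1001 bp (inferred from the input shape) and be centred on the cytosine of the CpG. Neither assumption is stated in the source. Coordinates follow the usual BED convention (0-based start, exclusive end), and `cpg_windows` and `write_bed` are hypothetical helpers.

```python
import io
from typing import Iterable, List, TextIO, Tuple

WINDOW = 1001  # assumed interval width, matching the (1001, 4) input shape


def cpg_windows(
    chrom: str, positions: Iterable[int], window: int = WINDOW
) -> List[Tuple[str, int, int]]:
    """Build windows centred on 0-based CpG positions; skip any that start before 0."""
    flank = window // 2
    rows = []
    for pos in positions:
        start = pos - flank
        if start < 0:
            continue  # window would run off the start of the chromosome
        rows.append((chrom, start, start + window))
    return rows


def write_bed(rows: Iterable[Tuple[str, int, int]], handle: TextIO) -> None:
    """Write tab-separated chrom, start, end lines."""
    for chrom, start, end in rows:
        handle.write(f"{chrom}\t{start}\t{end}\n")


rows = cpg_windows("chr1", [200, 10468, 10470])
buf = io.StringIO()  # stands in for a file opened with open(path, "w")
write_bed(rows, buf)
print(buf.getvalue(), end="")
```

The sketch does not check the chromosome end. A real workflow would also drop windows that extend past the chromosome length.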


Model dependencies
conda:
  • python=3.7
  • h5py=2.10.0
  • pip=20.2.4

pip:
  • tensorflow==1.13.1
  • keras==1.2.2
  • protobuf==3.20

Dataloader dependencies
conda:
  • bioconda::genomelake=0.1.4
  • bioconda::pybedtools=0.8.1
  • python=3.7
  • numpy=1.19.2
  • pandas=1.1.3

pip: