Generated from notebooks/tf_binding_models.ipynb

Model benchmarking with Kipoi

This tutorial shows how to easily benchmark TF-binding models in Kipoi. Because Kipoi provides unified access to models, it takes the same effort to run a simple PWM scanning model as it does to run a more complicated model (DeepBind in this example).

Load software tools

Let's start by loading the software for this tutorial: the Kipoi model zoo, NumPy, and the auROC metric from scikit-learn.

import kipoi
import numpy as np
from sklearn.metrics import roc_auc_score

Prepare data files

Next, we specify a labeled BED-format interval file and a genome FASTA file:

intervals_file = 'example_data/chr22.101bp.2000_intervals.JUND.HepG2.tsv'
fasta_file = 'example_data/hg19_chr22.fa'
dl_kwargs = {'intervals_file': intervals_file, 'fasta_file': fasta_file}

Let's look at the first few lines of the intervals file:

!head $intervals_file
chr22   20208963    20209064    0
chr22   29673572    29673673    0
chr22   28193720    28193821    0
chr22   43864274    43864375    0
chr22   18261550    18261651    0
chr22   7869409     7869510     0
chr22   49798024    49798125    0
chr22   43088594    43088695    0
chr22   35147671    35147772    0
chr22   49486843    49486944    0

The four columns in this file contain the chromosome, the interval start coordinate, the interval end coordinate, and the label. The file contains 2000 examples: 1000 positives and 1000 negatives.
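
As a quick sanity check on this column layout, the following sketch parses a few rows copied from the `head` output and confirms that each interval spans 101 bp, matching the `101bp` in the file name. The helper `parse_interval` is a hypothetical name introduced for illustration; it is not part of Kipoi.

```python
# Rows copied from the `head` output of the intervals file (whitespace-separated).
sample_rows = """\
chr22 20208963 20209064 0
chr22 29673572 29673673 0
chr22 7869409 7869510 0
"""


def parse_interval(line):
    """Split one line into (chrom, start, end, label), converting numeric fields."""
    chrom, start, end, label = line.split()
    return chrom, int(start), int(end), int(label)


intervals = [parse_interval(line) for line in sample_rows.splitlines() if line.strip()]

# Collect the distinct interval widths; a single value means all rows agree.
widths = {end - start for _, start, end, _ in intervals}
print(widths)  # {101}
```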

Let's load the labels from the last column:

labels = np.loadtxt(intervals_file, usecols=(3,))
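
The stated class balance can be verified right after loading. The sketch below uses a small in-memory stand-in (made-up coordinates and labels, not rows from the real file) so that it runs on its own. With the real data you would apply the same `np.bincount` call to `labels`, assuming the label column holds only 0 and 1.

```python
import io

import numpy as np

# Hypothetical four-row stand-in for the intervals file (tab-separated).
toy_file = io.StringIO(
    "chr22\t100\t201\t0\n"
    "chr22\t300\t401\t1\n"
    "chr22\t500\t601\t1\n"
    "chr22\t700\t801\t0\n"
)

# Same call as in the tutorial: read only the fourth column (index 3).
toy_labels = np.loadtxt(toy_file, usecols=(3,))

# bincount needs integers; index 0 counts negatives, index 1 counts positives.
n_neg, n_pos = np.bincount(toy_labels.astype(int), minlength=2)
print(f"positives: {n_pos}, negatives: {n_neg}")
```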

Next, to evaluate a DeepBind model on these JUND intervals, we will 1) install the software requirements to run the model, 2) load the model, and 3) get model predictions using our intervals and FASTA file.

Install DeepBind model software requirements

kipoi env create DeepBind --source=kipoi

source activate kipoi-DeepBind

Load DeepBind model

deepbind_model_name = "DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF"
deepbind_model = kipoi.get_model(deepbind_model_name)
Using downloaded and verified file: /Users/b260/.kipoi/models/DeepBind/downloaded/model_files/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF/arch/0d6747991a525b94a1ac9174459c2bf4
Using downloaded and verified file: /Users/b260/.kipoi/models/DeepBind/downloaded/model_files/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF/weights/838eb7287139a2542f21984e692a9be2

Get DeepBind predictions

deepbind_predictions = deepbind_model.pipeline.predict(dl_kwargs, batch_size=1000)
2it [00:02,  1.16s/it]

Evaluate DeepBind predictions

Let's check the auROC of the DeepBind predictions:

roc_auc_score(labels, deepbind_predictions)
0.614856
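
To make the metric concrete: auROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by brute force on toy data. `pairwise_auroc` is an illustrative helper, not a scikit-learn function, and for real use `roc_auc_score` is the better choice because this version is quadratic in the number of examples.

```python
def pairwise_auroc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties contribute 0.5, which matches the usual definition of the area
    under the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: three of the four positive/negative pairs are ordered correctly.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auroc(example_labels, example_scores))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of positives from negatives.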

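Now repeat the same three steps for the HOCOMOCO PWM model of JUND: install its requirements, load the model, and score the same intervals.

kipoi env create pwm_HOCOMOCO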

source activate kipoi-pwm_HOCOMOCO

pwm_model_name = "pwm_HOCOMOCO/human/JUND"
pwm_model = kipoi.get_model(pwm_model_name)
pwm_predictions = pwm_model.pipeline.predict(dl_kwargs, batch_size=1000)
print("PWM auROC:")
roc_auc_score(labels, pwm_predictions)
0.00B [00:00, ?B/s]
Downloading https://zenodo.org/record/1466139/files/human-JUND.h5?download=1 to /Users/b260/.kipoi/models/pwm_HOCOMOCO/downloaded/model_files/human/JUND/weights/bb64a335f37cff4537b1bde4c11cab8b
16.4kB [00:01, 16.0kB/s]
2it [00:01,  1.51it/s]
PWM auROC:
0.6431155

In this example, the HOCOMOCO PWM reaches an auROC of 0.643 and outperforms the DeepBind model, which reaches 0.615.
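
When more than two models are benchmarked, it can help to collect the scores in one place. A minimal sketch, using the two auROC values reported above and hypothetical variable names:

```python
# auROC values reported in this tutorial, keyed by the model identifiers used.
auroc_by_model = {
    "DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF": 0.614856,
    "pwm_HOCOMOCO/human/JUND": 0.6431155,
}

# Sort from best to worst so the ranking is visible at a glance.
ranking = sorted(auroc_by_model.items(), key=lambda item: item[1], reverse=True)
for name, auroc in ranking:
    print(f"{name:<52} {auroc:.3f}")

best_model = ranking[0][0]
```

Further models can be added as extra dictionary entries once their predictions have been scored with `roc_auc_score` against the same `labels`.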