Generated from notebooks/tf_binding_models.ipynb
Model benchmarking with Kipoi
This tutorial shows how to easily benchmark TF-binding models in Kipoi. Because Kipoi provides unified access to models, running a simple PWM scanning model takes the same effort as running a more complicated one (DeepBind in this example).
Load software tools
Let's start by loading the software for this tutorial: the Kipoi model zoo, NumPy, and the auROC metric from scikit-learn.
import kipoi
import numpy as np
from sklearn.metrics import roc_auc_score
Prepare data files
Next, we introduce a labeled BED-format interval file and a genome FASTA file:
intervals_file = 'example_data/chr22.101bp.2000_intervals.JUND.HepG2.tsv'
fasta_file = 'example_data/hg19_chr22.fa'
dl_kwargs = {'intervals_file': intervals_file, 'fasta_file': fasta_file}
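Both paths are relative to the working directory, so a missing file would otherwise surface only when the dataloader runs. A minimal sketch of an up-front check using only the standard library; `missing_paths` is a hypothetical helper name and not part of Kipoi:

```python
import os

def missing_paths(kwargs):
    """Return the keys of `kwargs` whose values are not existing files."""
    return [key for key, path in kwargs.items() if not os.path.isfile(path)]

# Same dictionary layout as dl_kwargs in this tutorial.
example_kwargs = {
    'intervals_file': 'example_data/chr22.101bp.2000_intervals.JUND.HepG2.tsv',
    'fasta_file': 'example_data/hg19_chr22.fa',
}
problems = missing_paths(example_kwargs)
if problems:
    print("Missing input files for:", ", ".join(problems))
```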
Let's look at the first few lines in the intervals file:
!head $intervals_file
chr22 20208963 20209064 0
chr22 29673572 29673673 0
chr22 28193720 28193821 0
chr22 43864274 43864375 0
chr22 18261550 18261651 0
chr22 7869409 7869510 0
chr22 49798024 49798125 0
chr22 43088594 43088695 0
chr22 35147671 35147772 0
chr22 49486843 49486944 0
The four columns in this file contain the chromosome, the interval start coordinate, the interval end coordinate, and the label. The file contains 2000 examples: 1000 positives and 1000 negatives.
Let's load the labels from the last column:
labels = np.loadtxt(intervals_file, usecols=(3,))
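The ten lines printed by `head` all carry label 0, so it can be worth confirming the class balance described above before scoring any model. A minimal sketch with the standard library, assuming the file is whitespace-delimited with the label in the fourth column as shown; `count_labels` is a hypothetical helper and the three-line sample is made up for illustration:

```python
from collections import Counter

def count_labels(lines, label_column=3):
    """Count label values (kept as text) in whitespace-delimited interval lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        counts[fields[label_column]] += 1
    return counts

# Illustrative sample in the same four-column layout (not real data).
sample = [
    "chr22 100 201 0",
    "chr22 300 401 1",
    "chr22 500 601 1",
]
print(count_labels(sample))
```

On the real file, `with open(intervals_file) as handle: count_labels(handle)` should report 1000 entries for each label if the description above holds.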
Next, to evaluate the DeepBind model for JUND, we will 1) install software requirements to run the model, 2) load the model, and 3) get model predictions using our intervals and fasta file.
Install DeepBind model software requirements
kipoi env create DeepBind --source=kipoi
source activate kipoi-DeepBind
Load DeepBind model
deepbind_model_name = "DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF"
deepbind_model = kipoi.get_model(deepbind_model_name)
Using downloaded and verified file: /Users/b260/.kipoi/models/DeepBind/downloaded/model_files/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF/arch/0d6747991a525b94a1ac9174459c2bf4
Using downloaded and verified file: /Users/b260/.kipoi/models/DeepBind/downloaded/model_files/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF/weights/838eb7287139a2542f21984e692a9be2
Get DeepBind predictions
deepbind_predictions = deepbind_model.pipeline.predict(dl_kwargs, batch_size=1000)
2it [00:02, 1.16s/it]
Evaluate DeepBind predictions
Let's check the auROC of the DeepBind predictions:
roc_auc_score(labels, deepbind_predictions)
0.614856
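The auROC can be read as the probability that a randomly chosen positive interval receives a higher score than a randomly chosen negative one, with ties counted as one half. A minimal pure-Python sketch of that pairwise definition follows. `pairwise_auroc` is a hypothetical helper with made-up toy scores, and it is quadratic in the number of examples, so `roc_auc_score` remains the practical choice:

```python
def pairwise_auroc(labels, scores):
    """auROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("auROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Toy data for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auroc(toy_labels, toy_scores))  # 3 of 4 pairs correct -> 0.75
```

A value of 0.5 corresponds to random ranking, which puts the 0.61 above in context.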
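Next, we repeat the same steps for the HOCOMOCO PWM model for JUND: create and activate its environment, load the model, get predictions, and compute the auROC.
kipoi env create pwm_HOCOMOCO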
source activate kipoi-pwm_HOCOMOCO
pwm_model_name = "pwm_HOCOMOCO/human/JUND"
pwm_model = kipoi.get_model(pwm_model_name)
pwm_predictions = pwm_model.pipeline.predict(dl_kwargs, batch_size=1000)
print("PWM auROC:")
roc_auc_score(labels, pwm_predictions)
0.00B [00:00, ?B/s]
Downloading https://zenodo.org/record/1466139/files/human-JUND.h5?download=1 to /Users/b260/.kipoi/models/pwm_HOCOMOCO/downloaded/model_files/human/JUND/weights/bb64a335f37cff4537b1bde4c11cab8b
16.4kB [00:01, 16.0kB/s]
2it [00:01, 1.51it/s]
PWM auROC:
0.6431155
In this example, the HOCOMOCO PWM's auROC of 64.3% exceeds the DeepBind auROC of 61.5%.
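Because both models are driven through the same `pipeline.predict` call, the comparison can be folded into one loop. The following is a minimal sketch and not part of Kipoi: `benchmark_models` is a hypothetical helper that takes the loader and the metric as arguments. It assumes the dependencies of every listed model are available in the active environment, whereas this tutorial uses one environment per model.

```python
def benchmark_models(model_names, dl_kwargs, labels, load_model, score_fn,
                     batch_size=1000):
    """Score each named model on the same inputs and return {name: score}."""
    results = {}
    for name in model_names:
        model = load_model(name)  # e.g. kipoi.get_model
        predictions = model.pipeline.predict(dl_kwargs, batch_size=batch_size)
        results[name] = score_fn(labels, predictions)  # e.g. roc_auc_score
    return results
```

With the objects defined in this tutorial, a call might look like `benchmark_models([deepbind_model_name, pwm_model_name], dl_kwargs, labels, kipoi.get_model, roc_auc_score)`.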