Generated from notebooks/tf_binding_models.ipynb

Model benchmarking with Kipoi

This tutorial shows how to benchmark transcription factor (TF) binding models in Kipoi. Because Kipoi provides unified access to models, running a simple PWM scanning model takes the same effort as running a more complex model (DeepBind in this example).

Load software tools

Let's start by loading the software for this tutorial: the Kipoi model zoo, NumPy for reading the labels, and scikit-learn's roc_auc_score for evaluation:

import kipoi
import numpy as np
from sklearn.metrics import roc_auc_score

Prepare data files

Next, we specify a labeled BED-format interval file and a FASTA file with the hg19 sequence of chromosome 22:

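# Labeled intervals (chrom, start, end, label) and the matching hg19 chr22 sequence
intervals_file = 'example_data/chr22.101bp.2000_intervals.JUND.HepG2.tsv'
fasta_file = 'example_data/hg19_chr22.fa'
# Keyword arguments forwarded to the model's dataloader
dl_kwargs = {'intervals_file': intervals_file, 'fasta_file': fasta_file}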

Let's look at the first ten lines of the intervals file:

!head $intervals_file
chr22   20208963    20209064    0
chr22   29673572    29673673    0
chr22   28193720    28193821    0
chr22   43864274    43864375    0
chr22   18261550    18261651    0
chr22   7869409 7869510 0
chr22   49798024    49798125    0
chr22   43088594    43088695    0
chr22   35147671    35147772    0
chr22   49486843    49486944    0

The four columns contain the chromosome, the interval start coordinate, the interval end coordinate, and the binary label. Each interval spans 101 bp, and the file contains 2000 examples: 1000 positives (label 1) and 1000 negatives (label 0).
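Because the file follows the BED convention (0-based start, end-exclusive), each interval's width is simply end - start; the rows shown above all span 101 bp, matching the "101bp" in the filename. A minimal sketch that parses rows in this four-column layout and tallies widths and labels, using the first three rows of the head output as inline sample data. The helper name summarize_intervals is hypothetical, not part of Kipoi.

```python
from collections import Counter
from io import StringIO


def summarize_intervals(handle):
    """Tally interval widths and labels in a 4-column BED-like file.

    Columns: chromosome, start (0-based), end (exclusive), label.
    """
    width_counts = Counter()
    label_counts = Counter()
    for line in handle:
        fields = line.split()  # tolerate tabs or runs of spaces
        if not fields:
            continue  # skip blank lines
        _chrom, start, end, label = fields[:4]
        # BED intervals are half-open, so the width is end - start
        width_counts[int(end) - int(start)] += 1
        label_counts[int(label)] += 1
    return width_counts, label_counts


# First three rows of the `head` output above
sample = StringIO(
    "chr22\t20208963\t20209064\t0\n"
    "chr22\t29673572\t29673673\t0\n"
    "chr22\t28193720\t28193821\t0\n"
)
width_counts, label_counts = summarize_intervals(sample)
print(width_counts)  # Counter({101: 3})
print(label_counts)  # Counter({0: 3})
```

On the full file (for example inside `with open(intervals_file) as f:`), the label tally should reflect the 1000/1000 split described above.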

Let's load the labels from the last column:

labels = np.loadtxt(intervals_file, usecols=(3,))
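np.loadtxt splits each line on whitespace by default, and usecols=(3,) keeps only the fourth field (0-indexed column 3), returning a 1-D float array. A small self-contained check on inline rows; the rows are hypothetical, with positives included for illustration, and the result is named y so it does not overwrite the tutorial's labels variable.

```python
from io import StringIO

import numpy as np

# Hypothetical rows in the same 4-column layout: chrom, start, end, label
text = StringIO(
    "chr22\t100\t201\t0\n"
    "chr22\t500\t601\t1\n"
    "chr22\t900\t1001\t1\n"
)
y = np.loadtxt(text, usecols=(3,))  # only the label column is parsed
print(y.dtype, y.shape)             # float64 (3,)
# Labels are 0/1, so the sum counts positives
print(int(y.sum()), "positives,", int((y == 0).sum()), "negatives")
```

Applied to the real file, labels.sum() should therefore equal the number of positives stated above.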

Next, to evaluate the DeepBind model for JUND, we will 1) install the model's software requirements, 2) load the model, and 3) get its predictions for our intervals and FASTA file.

Install DeepBind model software requirements

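# NOTE: this identifier names a DeepBind CTCF ChIP-seq model; for a like-for-like
# JUND benchmark, substitute a DeepBind JUND model from the Kipoi model list
deepbind_model_name = "DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF"
# To install the GPU version of the dependencies, run from the command line
# (replace <model> with the value of deepbind_model_name):
# $ kipoi env install <model> --gpu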

Load DeepBind model

deepbind_model = kipoi.get_model(deepbind_model_name)
Downloading to /home/avsec/.kipoi/models/DeepBind/downloaded/model_files/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF/arch/0d6747991a525b94a1ac9174459c2bf4

8.19kB [00:00, 11.9kB/s]

Downloading to /home/avsec/.kipoi/models/DeepBind/downloaded/model_files/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF/weights/838eb7287139a2542f21984e692a9be2

32.8kB [00:02, 11.0kB/s]                   

Get DeepBind predictions

deepbind_predictions = deepbind_model.pipeline.predict(dl_kwargs, batch_size=1000)
2it [00:01,  1.04s/it]
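The "2it" in the progress output reflects batching: 2000 intervals at batch_size=1000 give two batches. The sketch below mimics that loop with a hypothetical predict_fn standing in for the model; it illustrates the idea only and is not Kipoi's internal implementation.

```python
import math

import numpy as np


def predict_in_batches(predict_fn, inputs, batch_size):
    """Run predict_fn on consecutive slices of inputs and stack the results.

    Returns (predictions, number_of_batches).
    """
    n_batches = math.ceil(len(inputs) / batch_size)
    outputs = [
        predict_fn(inputs[i * batch_size:(i + 1) * batch_size])
        for i in range(n_batches)
    ]
    return np.concatenate(outputs), n_batches


# Hypothetical stand-in for a model: the score is the mean of each row
inputs = np.arange(8000, dtype=float).reshape(2000, 4)
preds, n_batches_run = predict_in_batches(lambda x: x.mean(axis=1), inputs, batch_size=1000)
print(preds.shape, n_batches_run)  # (2000,) 2
```

A larger batch_size means fewer iterations but more memory per batch; the predictions themselves do not depend on it.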

Evaluate DeepBind predictions

Let's check the auROC of the DeepBind predictions:

print("DeepBind auROC:", roc_auc_score(labels, deepbind_predictions))
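The auROC can be read as the probability that a randomly chosen positive interval receives a higher score than a randomly chosen negative one, with ties counting as one half. A brute-force pairwise sketch of that definition, for intuition only: it costs O(P*N) comparisons, so roc_auc_score remains the practical choice on real data. The function name pairwise_auroc is hypothetical.

```python
def pairwise_auroc(labels, scores):
    """auROC as P(score_pos > score_neg), counting ties as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


print(pairwise_auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

On these inputs roc_auc_score also returns 0.75; a random scorer sits near 0.5 and a perfect one at 1.0.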

Load, run, and evaluate a HOCOMOCO PWM model

pwm_model_name = "pwm_HOCOMOCO/human/JUND"
# Use `$ kipoi env install pwm_HOCOMOCO/human/JUND --gpu` from the command-line to install the gpu version of the dependencies
pwm_model = kipoi.get_model(pwm_model_name)
pwm_predictions = pwm_model.pipeline.predict(dl_kwargs, batch_size=1000)
print("PWM auROC:", roc_auc_score(labels, pwm_predictions))
Downloading to /home/avsec/.kipoi/models/pwm_HOCOMOCO/downloaded/model_files/human/JUND/weights/bb64a335f37cff4537b1bde4c11cab8b

16.4kB [00:00, 28.8kB/s]                   
2it [00:00,  2.19it/s]
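A PWM scanning model scores a sequence by sliding a position weight matrix along it and keeping the best-matching window. The sketch below shows that idea with a made-up 3-position log-odds matrix and a forward-strand-only scan. The matrix is not the HOCOMOCO JUND motif, and the HOCOMOCO Kipoi model may score sequences differently (for example, by also scanning the reverse strand).

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) array in A, C, G, T order.

    Non-ACGT characters such as N are not handled in this sketch.
    """
    idx = [BASES.index(b) for b in seq.upper()]
    return np.eye(4)[idx]


def max_pwm_score(seq, pwm):
    """Best window score of a (motif_len, 4) log-odds matrix on the forward strand."""
    x = one_hot(seq)
    k = pwm.shape[0]
    if len(seq) < k:
        raise ValueError("sequence shorter than motif")
    # Each window score sums the matrix entries for the bases present
    scores = [np.sum(x[i:i + k] * pwm) for i in range(len(seq) - k + 1)]
    return max(scores)


# Made-up motif favoring "TGA": +1 for the preferred base, -1 otherwise
toy_pwm = np.full((3, 4), -1.0)
toy_pwm[0, BASES.index("T")] = 1.0
toy_pwm[1, BASES.index("G")] = 1.0
toy_pwm[2, BASES.index("A")] = 1.0

print(max_pwm_score("CCTGACC", toy_pwm))  # 3.0
print(max_pwm_score("CCCCCCC", toy_pwm))  # -3.0
```

Because the model only asks whether the single best window matches, it ignores wider sequence context, which is one plausible reason a learned model such as DeepBind can rank intervals better.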



In this example, DeepBind (auROC 80.8%) outperforms the HOCOMOCO PWM model (auROC 64.3%).
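Since both models were evaluated with the same two calls (kipoi.get_model and pipeline.predict), the comparison generalizes to a loop over model names. A minimal sketch with a hypothetical benchmark helper that takes the model loader and the metric as arguments, so it can be exercised with stub models instead of downloads. With this tutorial's objects, the real call would be benchmark([deepbind_model_name, pwm_model_name], kipoi.get_model, roc_auc_score, dl_kwargs, labels).

```python
from types import SimpleNamespace


def benchmark(model_names, get_model, score_fn, dl_kwargs, labels, batch_size=1000):
    """Return (model_name, score) pairs sorted from best to worst.

    get_model: e.g. kipoi.get_model
    score_fn:  e.g. sklearn.metrics.roc_auc_score
    """
    results = []
    for name in model_names:
        model = get_model(name)
        preds = model.pipeline.predict(dl_kwargs, batch_size=batch_size)
        results.append((name, score_fn(labels, preds)))
    return sorted(results, key=lambda r: r[1], reverse=True)


# --- Hypothetical stubs so the sketch runs without Kipoi or downloads ---
def fake_get_model(name):
    # Each stub "model" returns fixed scores for four toy intervals
    fixed = {"good": [0.1, 0.2, 0.8, 0.9], "weak": [0.6, 0.2, 0.4, 0.9]}
    return SimpleNamespace(
        pipeline=SimpleNamespace(predict=lambda dl_kwargs, batch_size: fixed[name])
    )


def toy_auroc(y, p):
    # Pairwise auROC: fraction of (positive, negative) pairs ranked correctly
    pos = [s for t, s in zip(y, p) if t == 1]
    neg = [s for t, s in zip(y, p) if t == 0]
    return sum((a > b) + 0.5 * (a == b) for a in pos for b in neg) / (len(pos) * len(neg))


ranking = benchmark(["weak", "good"], fake_get_model, toy_auroc,
                    dl_kwargs={}, labels=[0, 0, 1, 1])
print(ranking)  # [('good', 1.0), ('weak', 0.75)]
```

Injecting get_model and score_fn keeps the loop identical for every model, which is the point of the unified interface, and makes it easy to swap in another metric.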