Kipoi: model zoo for genomics


1. Install miniconda/anaconda

Kipoi requires conda to manage model dependencies. Make sure you have either anaconda (download page) or miniconda (download page) installed. If you are using OSX, see Installing python on OSX.

2. Install Git LFS

For downloading models, Kipoi uses git and Git Large File Storage (LFS). See how to install git here. To install git-lfs on Ubuntu, run:

curl -s <git-lfs install-script URL> | sudo bash
sudo apt-get install -y git git-lfs
git-lfs install

Alternatively, you can install git-lfs through conda:

conda install -c conda-forge git-lfs && git lfs install

3. Install Kipoi

Next, install Kipoi using pip:

pip install kipoi


If you wish to develop kipoi, run instead:

conda install pytorch-cpu
pip install -e '.[develop]'

This will install some additional packages like pytest. You can test the package by running py.test.

If you wish to run tests in parallel, run py.test -n 6.

Quick start


List available models

import kipoi

kipoi.list_models()  # list all the available models


Hint: For an overview of the available models, also check the model overview on our website, where you can find example commands showing how to use the models from the CLI, Python, and R.

Load the model from model source or local directory

# Load the model from the Kipoi model source
model = kipoi.get_model("rbp_eclip/UPF1", source="kipoi") # source="kipoi" is the default

# Load the model from a local directory
model = kipoi.get_model("~/mymodels/rbp", source="dir")  
# Note: Custom model sources are defined in ~/.kipoi/config.yaml

# Load the model via github permalink for a particular commit:
model = kipoi.get_model("<github permalink URL to the model directory>", source='github-permalink')

Main model attributes and methods

# See the information about the model and its author:
model.info

# Access the default dataloader
model.default_dataloader

# Access the Keras model
model.model

# Predict on batch - implemented by all the models regardless of the framework
# (i.e. works with sklearn, Keras, tensorflow, ...)
model.predict_on_batch(x['inputs'])  # x = a batch of numpy arrays from the dataloader

# Get predictions for the raw files
# Kipoi runs: raw files -[dataloader]-> numpy arrays -[model]-> predictions 
model.pipeline.predict({"dataloader_arg1": "inputs.csv"})
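The raw files -> dataloader -> model flow can be sketched in plain Python. This is a toy illustration only, not Kipoi's actual implementation; `toy_dataloader`, `toy_model`, and `pipeline_predict` are made-up names:

```python
# Toy sketch of the pipeline idea: raw records -> batches -> predictions.
# All names here are hypothetical stand-ins for Kipoi's model.pipeline.

def toy_dataloader(rows, batch_size=2):
    """Turn raw records into model-ready batches (stand-in featurization)."""
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        yield {"inputs": [len(r) for r in batch]}

def toy_model(inputs):
    """Stand-in for model.predict_on_batch."""
    return [x * 2 for x in inputs]

def pipeline_predict(rows):
    """Chain the dataloader and the model, collecting all predictions."""
    preds = []
    for batch in toy_dataloader(rows):
        preds.extend(toy_model(batch["inputs"]))
    return preds

print(pipeline_predict(["ACGT", "AC", "ACGTACGT"]))  # -> [8, 4, 16]
```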

Load the dataloader

Dl = kipoi.get_dataloader_factory("rbp_eclip/UPF1") # returns a class that needs to be instantiated
dl = Dl(dataloader_arg1="inputs.csv")  # Create/instantiate an object

Dataloader attributes and methods

# batch_iter - common to all dataloaders
# Returns an iterator generating batches of model-ready numpy.arrays
it = dl.batch_iter(batch_size=32)
out = next(it)  # {"inputs": np.array, (optional) "targets": np.arrays.., "metadata": np.arrays...}
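The shape of `batch_iter` can be mimicked with a stdlib-only generator. This is a hypothetical sketch of the interface, not Kipoi's dataloader code:

```python
# Minimal sketch of a batch_iter-style generator: slices parallel arrays
# into fixed-size batches and yields them as dictionaries.
def batch_iter(inputs, targets, batch_size=32):
    for i in range(0, len(inputs), batch_size):
        yield {"inputs": inputs[i:i + batch_size],
               "targets": targets[i:i + batch_size]}

it = batch_iter(list(range(100)), [x % 2 for x in range(100)], batch_size=32)
out = next(it)
print(len(out["inputs"]))  # -> 32 (the final batch would hold the 4 leftovers)
```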

# To get predictions for the batch, run
model.predict_on_batch(out['inputs'])

# Load the whole dataset into memory
dl.load_all()

Re-train the model

# re-train example for Keras
dl = Dl(dataloader_arg1="inputs.csv", targets_file="mytargets.csv")
it_train = dl.batch_train_iter(batch_size=32)  
# batch_train_iter is a convenience wrapper of batch_iter
# yielding (inputs, targets) tuples indefinitely
model.model.fit_generator(it_train, steps_per_epoch=len(dl)//32, epochs=10)
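The "indefinitely yielding (inputs, targets) tuples" behaviour can be sketched with `itertools.cycle`. This mirrors the idea of `batch_train_iter`, not its actual implementation:

```python
import itertools

# Hypothetical sketch: wrap a finite list of batch dicts into an endless
# stream of (inputs, targets) tuples, as Keras' fit_generator expects.
# steps_per_epoch then bounds how many tuples are consumed per epoch,
# e.g. 1000 samples with batch_size=32 -> len(dl) // 32 = 31 steps.
def batch_train_iter(batches):
    for b in itertools.cycle(batches):
        yield (b["inputs"], b["targets"])

batches = [{"inputs": [1, 2], "targets": [0, 1]},
           {"inputs": [3], "targets": [1]}]
it_train = batch_train_iter(batches)
first = next(it_train)
print(first)  # -> ([1, 2], [0, 1])
```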

For more information, see notebooks/python-api.ipynb and the getting-started guide in docs/using.


Command-line interface

$ kipoi
usage: kipoi <command> [-h] ...

    # Kipoi model-zoo command line tool. Available sub-commands:
    # - using models:
    ls               List all the available models
    predict          Run the model prediction
    pull             Download the directory associated with the model
    preproc          Run the dataloader and save the results to an hdf5 array
    postproc         Tools for model postprocessing like variant effect prediction
    env              Tools for managing Kipoi conda environments

    # - contributing models:
    init             Initialize a new Kipoi model
    test             Runs a set of unit-tests for the model
    test-source      Runs a set of unit-tests for many/all models in a source

Explore the CLI usage by running kipoi <command> -h. Also, see the getting-started CLI guide in docs/using for more information.

Configure Kipoi in ~/.kipoi/config.yaml

You can add your own (private) model sources. See docs/using/03_Model_sources/.
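As an illustration, a custom source entry might look like the following. The `my_models` name and the paths are placeholders, and the exact schema should be checked against docs/using/03_Model_sources/:

```yaml
model_sources:
  kipoi:  # the default public model source
    type: git-lfs
    remote_url: git@github.com:kipoi/models.git
    local_path: ~/.kipoi/models/
  my_models:  # hypothetical private source backed by a local directory
    type: local
    local_path: ~/mymodels/
```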

Contributing models

See docs/contributing getting started and docs/tutorials/contributing/models for more information.


SNV effect prediction

Functionality to predict the effect of SNVs is available both in the API and in the command-line interface. The input is a VCF file, which gets annotated with effect predictions and returned. For details on the requirements for models and dataloaders, please check docs/using/02_Variant_effect_prediction.
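The input/output flow (VCF in, annotated VCF out) can be illustrated with a toy, stdlib-only sketch. This is not Kipoi's variant-effect code; the `annotate_vcf_lines` helper, the scoring function, and the `KPEFF` INFO tag are all invented for illustration:

```python
# Toy illustration of VCF annotation: append a (fake) effect score to the
# INFO column of each variant line, leaving header lines untouched.
def annotate_vcf_lines(lines, score_fn):
    out = []
    for line in lines:
        if line.startswith("#"):          # header lines pass through unchanged
            out.append(line)
            continue
        fields = line.split("\t")
        ref, alt = fields[3], fields[4]   # REF and ALT alleles
        fields[7] += ";KPEFF={:.2f}".format(score_fn(ref, alt))
        out.append("\t".join(fields))
    return out

vcf = ["#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
       "chr1\t100\t.\tA\tG\t.\t.\tDP=10"]
annotated = annotate_vcf_lines(vcf, lambda ref, alt: 0.5)
print(annotated[1])  # the INFO column now ends with ";KPEFF=0.50"
```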


Documentation can be found here: