Contributing models - Getting started
Kipoi stores models (descriptions, parameter files, dataloader code, ...) as folders in the kipoi/models github repository. The minimum requirement for a model is that a model.yaml file is available in the model folder, which defines the type of the model, file paths / URLs, the dataloader, description, software dependencies, etc.
We have compiled the standard use-cases of model contribution here. Which steps apply to you depends on:
- which input data your model requires,
- in which framework your model is implemented, and
- whether you want to contribute a single model or a group of related models.
Preparation
Before you start, make sure you have installed kipoi.
Setting up your model
For this example let's assume the model you want to submit is called MyModel. To submit your model you will have to create the folder MyModel in your Kipoi model folder (default: ~/.kipoi/models). In this folder you will have to create the following file(s):
If you have trained multiple models that logically belong to one model group because they are similar in function, but each of them requires different preprocessing code, then this is the relevant use-case. To submit your models you will have to:
- Create a new local folder named after your model group, e.g.:
mkdir MyModel
and within this folder create a folder structure so that every individual trained model has its own sub-folder. Every folder that contains a model.yaml is then interpreted as an individual model by Kipoi.
- To make this clearer take a look at how FactorNet is structured: FactorNet. If you have files that are re-used in multiple models you can use symbolic links (ln -s) relative within the folder structure of your model group. A sketch of such a layout is shown after this list.
- Depending on your selections above, the files described in the following sections have to exist in every sub-folder that should act as an individual model.
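A minimal sketch of such a layout, assuming two hypothetical sub-models ModelA and ModelB that re-use a shared script via a symbolic link:

MyModel/
  shared/
    utils.py                         # code re-used by both sub-models
  ModelA/
    model.yaml
    dataloader.yaml
    dataloader.py
    utils.py -> ../shared/utils.py   # symbolic link created with ln -s
  ModelB/
    model.yaml
    dataloader.yaml
    dataloader.py
    utils.py -> ../shared/utils.py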
For this example let's assume the model you want to submit is called MyModel. To submit your model you will have to:
- Create a new local folder named after your model, e.g.:
mkdir MyModel
- In the MyModel folder you will have to create a model.yaml file: the model.yaml file acts as a configuration file for Kipoi. For an example take a look at Divergent421/model.yaml.
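To give a rough idea of the overall structure, here is a hypothetical, minimal model.yaml sketch for a Keras model; all URLs, checksums, shapes, names and version pins below are placeholders, and the individual sections are explained in the following steps:

defined_as: kipoi.model.KerasModel      # or TensorFlowModel, PyTorchModel, SklearnModel, a custom class
args:                                   # framework-specific, see below
  arch:
    url: https://zenodo.org/record/0000000/files/model.json   # placeholder URL
    md5: <md5 checksum of model.json>
  weights:
    url: https://zenodo.org/record/0000000/files/weights.h5   # placeholder URL
    md5: <md5 checksum of weights.h5>
default_dataloader:                     # see "Setting up your dataloader"
  defined_as: kipoiseq.dataloaders.SeqIntervalDl
  default_args:
    auto_resize_len: 1001
info:                                   # see "Info and model schema"
  authors:
    - name: Jane Doe                    # hypothetical author
  doc: Short description of what the model predicts and how it was trained.
  license: MIT
schema:                                 # see "Info and model schema"
  inputs:
    shape: (1001, 4)
    doc: one-hot encoded DNA sequence
  targets:
    shape: (1,)
    doc: predicted probability
dependencies:                           # see the dependencies step below
  conda:
    - python=3.7
  pip:
    - keras>=2.0.4,<3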
For this example let's assume you have trained one model architecture on multiple similar datasets and can use the same preprocessing code for all models. Let's assume you want to call the model group MyModel. To submit your models you will have to:
- Create a new local folder named after your model group, e.g.:
mkdir MyModel
- In the MyModel folder you will have to create a model-template.yaml file: the model-template.yaml file acts as a configuration file for Kipoi. For an example take a look at CpGenie/model-template.yaml.
- As you can see, instead of putting URLs and parameters directly in the .yaml file, you put {{ parameter_name }} placeholders in it. The values are then automatically loaded from a tab-delimited file called models.tsv that you also have to provide. For the previous example this would be: CpGenie/models.tsv. Using kipoi, those models are then accessible by the model group name together with the model name defined in models.tsv. Model names may contain /s. A sketch of how the template and the table fit together is shown below.
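As a rough sketch of how the template and the table fit together - the placeholder names weights_url and weights_md5, the model names ModelA and ModelB, and the assumption that a model column provides the model names are all hypothetical - the relevant part of model-template.yaml could look like:

args:
  weights:
    url: {{ weights_url }}
    md5: {{ weights_md5 }}

and the corresponding models.tsv (columns separated by tabs) like:

model    weights_url                                                 weights_md5
ModelA   https://zenodo.org/record/0000000/files/ModelA_weights.h5  <md5 of ModelA_weights.h5>
ModelB   https://zenodo.org/record/0000000/files/ModelB_weights.h5  <md5 of ModelB_weights.h5>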
- In the model definition yaml file you see the defined_as keyword: since your model is a Keras model, set it to kipoi.model.KerasModel.
- In the model definition yaml file you see the args keyword, which can be set as described in the KerasModel definition; a hedged sketch is given below.
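A hedged sketch of how the args block of a KerasModel could look (URLs and checksums are placeholders; arch can be omitted if the weights file already contains the full model):

defined_as: kipoi.model.KerasModel
args:
  arch:                                  # model architecture as a Keras JSON file
    url: https://zenodo.org/record/0000000/files/model.json   # placeholder
    md5: <md5 checksum of model.json>
  weights:                               # HDF5 file with the trained weights
    url: https://zenodo.org/record/0000000/files/weights.h5   # placeholder
    md5: <md5 checksum of weights.h5>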
- In the model definition yaml file you see the defined_as keyword: since your model is a TensorFlow model, set it to kipoi.model.TensorFlowModel.
- In the model definition yaml file you see the args keyword, which can be set as described in the TensorFlowModel definition; a hedged sketch is given below.
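A hedged sketch of a TensorFlowModel args block; the exact argument names and the way the checkpoint files are referenced are best taken from the TensorFlowModel definition linked above, so treat the layout below as an assumption:

defined_as: kipoi.model.TensorFlowModel
args:
  input_nodes: "inputs"                  # name(s) of the graph's input tensor(s)
  target_nodes: "preds"                  # name(s) of the output tensor(s) to predict
  checkpoint_path: model_files/model.ckpt   # prefix of the saved checkpoint files (local path while testing)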
- In the model definition yaml file you see the defined_as keyword: since your model is a PyTorch model, set it to kipoi.model.PyTorchModel.
- In the model definition yaml file you see the args keyword, which can be set as described in the PyTorchModel definition; a hedged sketch is given below.
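A rough, hypothetical sketch of a PyTorchModel args block; the argument names below (module_file, module_obj, weights) are an assumption on our side, so please check the PyTorchModel definition linked above for the authoritative list:

defined_as: kipoi.model.PyTorchModel
args:
  module_file: model_architecture.py     # assumption: python file defining the network
  module_obj: net                        # assumption: the model object defined in that file
  weights:                               # file with the trained parameters
    url: https://zenodo.org/record/0000000/files/weights.pth   # placeholder
    md5: <md5 checksum of weights.pth>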
- In the model definition yaml file you see the defined_as keyword: since your model is a scikit-learn model, set it to kipoi.model.SklearnModel.
- In the model definition yaml file you see the args keyword, which can be set as described in the SklearnModel definition; a hedged sketch is given below.
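A hedged sketch of a SklearnModel args block (pkl_file points to a pickled estimator; predict_method is an assumption for models that should use predict_proba instead of the default predict):

defined_as: kipoi.model.SklearnModel
args:
  pkl_file:                              # pickled scikit-learn estimator
    url: https://zenodo.org/record/0000000/files/model.pkl     # placeholder
    md5: <md5 checksum of model.pkl>
  predict_method: predict_proba          # assumption: estimator method to call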
- Your model is not implemented in Keras, TensorFlow, PyTorch, nor scikit-learn, so you will have to implement a custom python class inheriting from kipoi.model.Model. In the defined_as keyword of the model.yaml you will then have to refer to your definition by my_model_def.MyModel if the MyModel class is defined in the my_model_def.py that lies in the same folder as model.yaml (see the snippet below). For details please see: defining custom models in model.yaml and writing a model.py file.
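In model.yaml this boils down to referencing the class directly; the file name my_model_def.py and the class name MyModel below simply follow the example above:

defined_as: my_model_def.MyModel         # class MyModel defined in my_model_def.py next to model.yaml
args: {}                                 # keyword arguments passed to the class constructor; omit if none are needed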
- Now set the software requirements correctly. This happens in the dependencies section of the model.yaml file. As you can see in the example, the dependencies are split into conda and pip dependencies. Ideally you define version ranges for the packages your model supports - otherwise it may fail at some point in the future. If you need to specify a conda channel, use the <channel>::<package> notation for conda dependencies.
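For example, a dependencies section with version ranges and a conda channel could look like this (the package choices are purely illustrative):

dependencies:
  conda:
    - python>=3.6,<3.9
    - bioconda::pysam>=0.15              # <channel>::<package> notation
  pip:
    - keras>=2.0.4,<3
    - h5py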
As you have seen in the presented example and in the model definition links, all model files (except for python scripts and other configuration files) have to be published on zenodo or figshare prior to model contribution to ensure functionality and versioning of models.
If you want to test your model(s) locally before publishing them on zenodo or figshare, you can replace the pair of url and md5 tags in the model definition yaml by the local path on your filesystem, e.g.:
args:
  arch: path/to/my/arch.json
But keep in mind that local paths are only good for testing and for models that you want to keep only locally.
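For comparison, the published form of the same argument references the uploaded file via a url/md5 pair (placeholder values shown):

args:
  arch:
    url: https://zenodo.org/record/0000000/files/arch.json     # placeholder record URL
    md5: <md5 checksum of arch.json>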
Setting up your dataloader
Since your model uses DNA sequence input, we recommend using the kipoiseq dataloaders, as shown in the example model definition .yaml file above, which could for example be defined like this:
default_dataloader:
  defined_as: kipoiseq.dataloaders.SeqIntervalDl
  default_args:
    auto_resize_len: 1001
    alphabet_axis: 0
    dummy_axis: 1
To see all the parameters and functions of the off-the-shelf dataloaders please take a look at kipoiseq.
Since your model uses DNA sequence and additional annotation, you have to define your own dataloader function or class. Depending on your use-case you may find some of the dataloader implementations of existing models in the model zoo helpful. You may find the rbp_eclip dataloader or one of the FactorNet dataloaders relevant. Also consider taking advantage of elements implemented in the kipoiseq package. For your implementation you have to:
- set default_dataloader: . in the model.yaml file
- write a dataloader.yaml file as defined in writing dataloader.yaml. An example is this one.
- implement the dataloader in a dataloader.py file as defined in writing dataloader.py. An example is this one.
- put the dataloader.yaml and the dataloader.py in the same folder as model.yaml.
Since your model uses input other than what is covered by the default dataloaders, you have to define your own dataloader function or class. Depending on your use-case you may find some of the dataloader implementations of existing models in the model zoo helpful. You may find the rbp_eclip dataloader or one of the FactorNet dataloaders relevant. Also consider taking advantage of elements implemented in the kipoiseq package. For your implementation you have to:
- set default_dataloader: . in the model.yaml file
- write a dataloader.yaml file as defined in writing dataloader.yaml. An example is this one.
- implement the dataloader in a dataloader.py file as defined in writing dataloader.py. An example is this one.
- put the dataloader.yaml and the dataloader.py in the same folder as model.yaml.
Since your model is specialised in predicting properties of splice sites, you are encouraged to take a look at the dataloaders implemented for the kipoi models tagged as RNA splicing models, such as HAL, labranchor, or MMSplice.
If the MMSplice dataloader in the above example does not fit your needs, you have to:
- set default_dataloader: . in the model.yaml file
- write a dataloader.yaml file as defined in writing dataloader.yaml (a rough sketch is given below).
- implement the dataloader in a dataloader.py file as defined in writing dataloader.py.
- put the dataloader.yaml and the dataloader.py in the same folder as model.yaml.
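As a rough orientation for the custom-dataloader cases above, a dataloader.yaml could be structured as sketched below; the argument names (intervals_file, fasta_file), shapes and docs are hypothetical, and the authoritative field list is given in the writing dataloader.yaml documentation:

defined_as: dataloader.MyDataset         # assumption: class MyDataset implemented in dataloader.py
args:
  intervals_file:                        # hypothetical argument: genomic regions to load
    doc: bed file with the regions to load
    example:                             # small example file used by the automated tests
      url: https://zenodo.org/record/0000000/files/example_intervals.bed   # placeholder
      md5: <md5 checksum>
  fasta_file:                            # hypothetical argument: reference genome
    doc: reference genome fasta file
output_schema:
  inputs:
    shape: (1001, 4)
    doc: one-hot encoded DNA sequence
  targets:
    shape: (1,)
    doc: measured signal for the interval
  metadata:
    ranges:
      type: GenomicRanges
      doc: genomic ranges of the loaded sequences
dependencies:
  conda:
    - bioconda::pysam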
Info and model schema
Please update the model description, the authors, and the data the model was trained on in the info section of the model .yaml file. Please explain explicitly what your model does etc. Think about what you would want to know if you didn't know anything about the model.
Now fill out the model schema (schema tag) as explained here: model schema.
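As an illustration for a hypothetical single-task sequence model (shapes and descriptions are placeholders; column_labels is an optional field we assume can be used to name multiple output columns):

schema:
  inputs:
    shape: (1001, 4)
    doc: one-hot encoded DNA sequence, resized to 1001 bp
  targets:
    shape: (1,)
    doc: predicted probability of the modelled property
    # column_labels: target_labels.txt   # assumption: text file naming each output column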
License
Please make sure that the license defined in the license: tag of the yaml file is correct.
Also only contribute models for which you have the rights to do so and only contribute models that permit
redistribution.
Testing
Now it is time to test your model. From within your model folder run the command:
kipoi test .
to test whether the general setup is correct. When this was successful, run
kipoi test-source dir --all
to test whether all the software dependencies of the model are set up correctly and the automated tests will pass.
Testing
Now it is time to test your models. For the following, let's assume your model group is called MyModel and you have two models in the group, MyModel/ModelA and MyModel/ModelB. Make sure you are in the MyModel folder and run the commands:
kipoi test ./ModelA
kipoi test ./ModelB
When this was successful, run
kipoi test-source dir --all
to test whether all the software dependencies of the model and dataloader are set up correctly.
Forking and submitting
- Make sure your model repository is up to date:
git pull
- Commit your changes
git add MyModel/
git commit -m "Added <MyModel>"
- Fork the https://github.com/kipoi/models repo on github (click on the Fork button)
- Add your fork as a git remote to
~/.kipoi/models
git remote add fork https://github.com/<username>/models.git
- Push to your fork
git push fork master
- Submit a pull-request
- On github click the New pull request button on your github fork - https://github.com/<username>/models