Contributing models - Getting started
Kipoi stores models (descriptions, parameter files, dataloader code, ...) as folders in the kipoi/models github repository. The minimum requirement for a model is that a model.yaml file is available in the model folder, which defines the type of the model, file paths / URLs, the dataloader, description, software dependencies, etc.
We have compiled the standard use-cases of model contribution here. Which steps apply to you depends on:
- which input data your model requires,
- in which framework your model is implemented, and
- whether you want to contribute a single model or a group of related models.
Preparation
Before you start, make sure you have installed kipoi.
Setting up your model
For this example let's assume the model you want to submit is called MyModel. To submit your model you will have to create the folder MyModel in your Kipoi model folder (default: ~/.kipoi/models). In this folder you will have to create the following file(s):
If you have trained multiple models that logically belong to one model group because they are similar in function, but each of them requires different preprocessing code, then this is the relevant use-case. To submit your models you will have to:
- Create a new local folder named after your model group, e.g.:
mkdir MyModel
and within this folder create a folder structure so that every individual trained model has its own sub-folder. Every folder that contains a model.yaml is then interpreted as an individual model by Kipoi.
- To make this clearer take a look at how FactorNet is structured: FactorNet. If you have files that are re-used in multiple models you can use symbolic links (ln -s) relative within the folder structure of your model group. A sketch of such a layout is shown after this list.
- Depending on your selections above, the files described in the following sections have to exist in every sub-folder that should act as an individual model.
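A minimal sketch of such a layout, assuming two hypothetical sub-models ModelA and ModelB that re-use a shared script via a symbolic link:

MyModel/
  shared/
    utils.py                         # code re-used by both sub-models
  ModelA/
    model.yaml
    dataloader.yaml
    dataloader.py
    utils.py -> ../shared/utils.py   # symbolic link created with ln -s
  ModelB/
    model.yaml
    dataloader.yaml
    dataloader.py
    utils.py -> ../shared/utils.py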
For this example let's assume the model you want to submit is called MyModel. To submit your model you will have to:
- Create a new local folder named after your model, e.g.:
mkdir MyModel
- In the MyModel folder you will have to create a model.yaml file: the model.yaml file acts as a configuration file for Kipoi. For an example take a look at Divergent421/model.yaml.
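To give a rough idea of the overall structure, here is a hypothetical, minimal model.yaml sketch for a Keras model; all URLs, checksums, shapes, names and version pins below are placeholders, and the individual sections are explained in the following steps:

defined_as: kipoi.model.KerasModel      # or TensorFlowModel, PyTorchModel, SklearnModel, a custom class
args:                                   # framework-specific, see below
  arch:
    url: https://zenodo.org/record/0000000/files/model.json   # placeholder URL
    md5: <md5 checksum of model.json>
  weights:
    url: https://zenodo.org/record/0000000/files/weights.h5   # placeholder URL
    md5: <md5 checksum of weights.h5>
default_dataloader:                     # see "Setting up your dataloader"
  defined_as: kipoiseq.dataloaders.SeqIntervalDl
  default_args:
    auto_resize_len: 1001
info:                                   # see "Info and model schema"
  authors:
    - name: Jane Doe                    # hypothetical author
  doc: Short description of what the model predicts and how it was trained.
  license: MIT
schema:                                 # see "Info and model schema"
  inputs:
    shape: (1001, 4)
    doc: one-hot encoded DNA sequence
  targets:
    shape: (1,)
    doc: predicted probability
dependencies:                           # see the dependencies step below
  conda:
    - python=3.7
  pip:
    - keras>=2.0.4,<3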
For this example let's assume you have trained one model architecture on multiple similar datasets and can use the same preprocessing code for all models. Let's assume you want to call the model group MyModel. To submit your models you will have to:
- Create a new local folder named after your model group, e.g.:
mkdir MyModel
- In the MyModel folder you will have to create a model-template.yaml file: the model-template.yaml file acts as a configuration file for Kipoi. For an example take a look at CpGenie/model-template.yaml.
- As you can see, instead of putting URLs and parameters directly in the .yaml file, you put {{ parameter_name }} placeholders in it. The values are then automatically loaded from a tab-delimited file called models.tsv that you also have to provide. For the previous example this would be: CpGenie/models.tsv. Using kipoi, those models are then accessible by the model group name together with the model name defined in models.tsv. Model names may contain /s. A sketch of how the template and the table fit together is shown below.
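As a rough sketch of how the template and the table fit together - the placeholder names weights_url and weights_md5, the model names ModelA and ModelB, and the assumption that a model column provides the model names are all hypothetical - the relevant part of model-template.yaml could look like:

args:
  weights:
    url: {{ weights_url }}
    md5: {{ weights_md5 }}

and the corresponding models.tsv (columns separated by tabs) like:

model    weights_url                                                 weights_md5
ModelA   https://zenodo.org/record/0000000/files/ModelA_weights.h5  <md5 of ModelA_weights.h5>
ModelB   https://zenodo.org/record/0000000/files/ModelB_weights.h5  <md5 of ModelB_weights.h5>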
- In the model definition yaml file you see the defined_as keyword: since your model is a Keras model, set it to kipoi.model.KerasModel.
- In the model definition yaml file you see the args keyword, which can be set as described in the KerasModel definition; a hedged sketch is given below.
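A hedged sketch of how the args block of a KerasModel could look (URLs and checksums are placeholders; arch can be omitted if the weights file already contains the full model):

defined_as: kipoi.model.KerasModel
args:
  arch:                                  # model architecture as a Keras JSON file
    url: https://zenodo.org/record/0000000/files/model.json   # placeholder
    md5: <md5 checksum of model.json>
  weights:                               # HDF5 file with the trained weights
    url: https://zenodo.org/record/0000000/files/weights.h5   # placeholder
    md5: <md5 checksum of weights.h5>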
- In the model definition yaml file you see the defined_as keyword: since your model is a TensorFlow model, set it to kipoi.model.TensorFlowModel.
- In the model definition yaml file you see the args keyword, which can be set as described in the TensorFlowModel definition; a hedged sketch is given below.
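A hedged sketch of a TensorFlowModel args block; the exact argument names and the way the checkpoint files are referenced are best taken from the TensorFlowModel definition linked above, so treat the layout below as an assumption:

defined_as: kipoi.model.TensorFlowModel
args:
  input_nodes: "inputs"                  # name(s) of the graph's input tensor(s)
  target_nodes: "preds"                  # name(s) of the output tensor(s) to predict
  checkpoint_path: model_files/model.ckpt   # prefix of the saved checkpoint files (local path while testing)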
- In the model definition yaml file you see the defined_as keyword: since your model is a PyTorch model, set it to kipoi.model.PyTorchModel.
- In the model definition yaml file you see the args keyword, which can be set as described in the PyTorchModel definition; a hedged sketch is given below.
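A rough, hypothetical sketch of a PyTorchModel args block; the argument names below (module_file, module_obj, weights) are an assumption on our side, so please check the PyTorchModel definition linked above for the authoritative list:

defined_as: kipoi.model.PyTorchModel
args:
  module_file: model_architecture.py     # assumption: python file defining the network
  module_obj: net                        # assumption: the model object defined in that file
  weights:                               # file with the trained parameters
    url: https://zenodo.org/record/0000000/files/weights.pth   # placeholder
    md5: <md5 checksum of weights.pth>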
- In the model definition yaml file you see the defined_as keyword: since your model is a scikit-learn model, set it to kipoi.model.SklearnModel.
- In the model definition yaml file you see the args keyword, which can be set as described in the SklearnModel definition; a hedged sketch is given below.
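A hedged sketch of a SklearnModel args block (pkl_file points to a pickled estimator; predict_method is an assumption for models that should use predict_proba instead of the default predict):

defined_as: kipoi.model.SklearnModel
args:
  pkl_file:                              # pickled scikit-learn estimator
    url: https://zenodo.org/record/0000000/files/model.pkl     # placeholder
    md5: <md5 checksum of model.pkl>
  predict_method: predict_proba          # assumption: estimator method to call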
- Your model is not implemented in Keras, TensorFlow, PyTorch, nor scikit-learn, so you will have to implement a custom python class inheriting from kipoi.model.Model. In the defined_as keyword of the model.yaml you will then have to refer to your definition by my_model_def.MyModel if the MyModel class is defined in the my_model_def.py that lies in the same folder as model.yaml (see the snippet below). For details please see: defining custom models in model.yaml and writing a model.py file.
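In model.yaml this boils down to referencing the class directly; the file name my_model_def.py and the class name MyModel below simply follow the example above:

defined_as: my_model_def.MyModel         # class MyModel defined in my_model_def.py next to model.yaml
args: {}                                 # keyword arguments passed to the class constructor; omit if none are needed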
- Now set the software requirements correctly. This happens in the dependencies section of the model.yaml file. As you can see in the example, the dependencies are split into conda and pip dependencies. Ideally you define version ranges for the packages your model supports - otherwise it may fail at some point in the future. If you need to specify a conda channel, use the <channel>::<package> notation for conda dependencies.
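For example, a dependencies section with version ranges and a conda channel could look like this (the package choices are purely illustrative):

dependencies:
  conda:
    - python>=3.6,<3.9
    - bioconda::pysam>=0.15              # <channel>::<package> notation
  pip:
    - keras>=2.0.4,<3
    - h5py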
As you have seen in the presented example and in the model definition links, all model files (except for python scripts and other configuration files) have to be published on zenodo or figshare prior to model contribution to ensure functionality and versioning of models.
If you want to test your model(s) locally before publishing them on zenodo or figshare, you can replace the pair of url and md5 tags in the model definition yaml by the local path on your filesystem, e.g.:
args:
  arch: path/to/my/arch.json
But keep in mind that local paths are only good for testing and for models that you want to keep only locally.
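For comparison, the published form of the same argument references the uploaded file via a url/md5 pair (placeholder values shown):

args:
  arch:
    url: https://zenodo.org/record/0000000/files/arch.json     # placeholder record URL
    md5: <md5 checksum of arch.json>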
Setting up your dataloader
Since your model uses DNA sequence input, we recommend using the kipoiseq dataloaders, as shown in the example model definition .yaml file above, which could for example be defined like this:
default_dataloader:
  defined_as: kipoiseq.dataloaders.SeqIntervalDl
  default_args:
    auto_resize_len: 1001
    alphabet_axis: 0
    dummy_axis: 1
To see all the parameters and functions of the off-the-shelf dataloaders please take a look at kipoiseq.
Since your model uses DNA sequence and additional annotation, you have to define your own dataloader function or class. Depending on your use-case you may find some of the dataloader implementations of existing models in the model zoo helpful. You may find the rbp_eclip dataloader or one of the FactorNet dataloaders relevant. Also consider taking advantage of elements implemented in the kipoiseq package. For your implementation you have to:
- set default_dataloader: . in the model.yaml file
- write a dataloader.yaml file as defined in writing dataloader.yaml. An example is this one.
- implement the dataloader in a dataloader.py file as defined in writing dataloader.py. An example is this one.
- put the dataloader.yaml and the dataloader.py in the same folder as model.yaml.
Since your model uses input other than what is covered by the default dataloaders, you have to define your own dataloader function or class. Depending on your use-case you may find some of the dataloader implementations of existing models in the model zoo helpful. You may find the rbp_eclip dataloader or one of the FactorNet dataloaders relevant. Also consider taking advantage of elements implemented in the kipoiseq package. For your implementation you have to:
- set default_dataloader: . in the model.yaml file
- write a dataloader.yaml file as defined in writing dataloader.yaml. An example is this one.
- implement the dataloader in a dataloader.py file as defined in writing dataloader.py. An example is this one.
- put the dataloader.yaml and the dataloader.py in the same folder as model.yaml.
Since your model is specialised in predicting properties of splice sites, you are encouraged to take a look at the dataloaders implemented for the kipoi models tagged as RNA splicing models, such as HAL, labranchor, or MMSplice.
If the MMSplice dataloader in the above example does not fit your needs, you have to:
- set default_dataloader: . in the model.yaml file
- write a dataloader.yaml file as defined in writing dataloader.yaml (a rough sketch is given below).
- implement the dataloader in a dataloader.py file as defined in writing dataloader.py.
- put the dataloader.yaml and the dataloader.py in the same folder as model.yaml.
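As a rough orientation for the custom-dataloader cases above, a dataloader.yaml could be structured as sketched below; the argument names (intervals_file, fasta_file), shapes and docs are hypothetical, and the authoritative field list is given in the writing dataloader.yaml documentation:

defined_as: dataloader.MyDataset         # assumption: class MyDataset implemented in dataloader.py
args:
  intervals_file:                        # hypothetical argument: genomic regions to load
    doc: bed file with the regions to load
    example:                             # small example file used by the automated tests
      url: https://zenodo.org/record/0000000/files/example_intervals.bed   # placeholder
      md5: <md5 checksum>
  fasta_file:                            # hypothetical argument: reference genome
    doc: reference genome fasta file
output_schema:
  inputs:
    shape: (1001, 4)
    doc: one-hot encoded DNA sequence
  targets:
    shape: (1,)
    doc: measured signal for the interval
  metadata:
    ranges:
      type: GenomicRanges
      doc: genomic ranges of the loaded sequences
dependencies:
  conda:
    - bioconda::pysam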
Info and model schema
Please update the model description, the authors, and the data the model was trained on in the info section of the model .yaml file. Please explain explicitly what your model does etc. Think about what you would want to know if you didn't know anything about the model.
Now fill out the model schema (schema tag) as explained here: model schema.
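As an illustration for a hypothetical single-task sequence model (shapes and descriptions are placeholders; column_labels is an optional field we assume can be used to name multiple output columns):

schema:
  inputs:
    shape: (1001, 4)
    doc: one-hot encoded DNA sequence, resized to 1001 bp
  targets:
    shape: (1,)
    doc: predicted probability of the modelled property
    # column_labels: target_labels.txt   # assumption: text file naming each output column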
License
Please make sure that the license defined in the license: tag of the yaml file is correct.
Also only contribute models for which you have the rights to do so and only contribute models that permit
redistribution.
Testing
Now it is time to test your model. From within your model folder run the command:
kipoi test .
to test whether the general setup is correct. When this was successful, run
kipoi test-source dir --all
to test whether all the software dependencies of the model are set up correctly and the automated tests will pass.
Testing
Now it is time to test your models. For the following, let's assume your model group is called MyModel and you have two models in the group, MyModel/ModelA and MyModel/ModelB. Make sure you are in the MyModel folder and run the commands:
kipoi test ./ModelA
kipoi test ./ModelB
When this was successful, run
kipoi test-source dir --all
to test whether all the software dependencies of the model and dataloader are set up correctly.
Forking and submitting
- Make sure your model repository is up to date:
git pull
- Commit your changes
git add MyModel/
git commit -m "Added <MyModel>"
- Fork the https://github.com/kipoi/models repo on github (click on the Fork button)
- Add your fork as a git remote to
~/.kipoi/models
git remote add fork https://github.com/<username>/models.git
- Push to your fork
git push fork master
- Submit a pull-request
- On github click the New pull request button on your github fork - https://github.com/<username>/models