Variant effects

Main function:

predict_snvs

predict_snvs(model, dataloader, vcf_fpath, batch_size, num_workers=0, dataloader_args=None, vcf_to_region=None, vcf_id_generator_fn=default_vcf_id_gen, evaluation_function=analyse_model_preds, evaluation_function_kwargs={'diff_types': {'logit': kipoi.postprocessing.variant_effects.scores.Logit()}}, sync_pred_writer=None, use_dataloader_example_data=False, return_predictions=False, generated_seq_writer=None)

Predict the effect of SNVs

Predicts the effects of the SNVs listed in a VCF file. If desired, the VCF can be stored with the predicted values as annotations. For a detailed description of the requirements in the yaml files, see kipoi/nbs/variant_effect_prediction.ipynb.
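The default `evaluation_function_kwargs` in the signature above names a `logit` difference type. As a rough illustration of what a logit-based effect score usually measures, the sketch below computes the difference between the logit of the alternative-allele prediction and the logit of the reference-allele prediction. This is an assumption about the general idea only: the helper names `logit` and `logit_diff` are hypothetical, and Kipoi's `Logit` class may apply further steps that are not described here.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logit_diff(p_ref, p_alt):
    """Hypothetical helper: logit(alt prediction) - logit(ref prediction)."""
    return logit(p_alt) - logit(p_ref)


# Identical predictions for reference and alternative give no effect.
print(logit_diff(0.5, 0.5))

# A higher prediction for the alternative allele gives a positive score.
print(logit_diff(0.2, 0.8) > 0)
```

Working on the log-odds scale means that a shift from 0.01 to 0.02 scores higher than a shift from 0.50 to 0.51, which a plain difference of probabilities would not capture.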

Arguments

- __model__: A kipoi model handle, e.g. one returned by kipoi.get_model()
- __dataloader__: Dataloader factory, e.g. one returned by kipoi.get_dataloader_factory()
- __vcf_fpath__: Path of the VCF defining the positions that shall be assessed. Only SNVs will be tested.
- __batch_size__: Prediction batch size used for calling the data loader. Each batch is generated in 4
mutated states, so system RAM consumption is at least 4x that of a single batch.
- __num_workers__: Number of parallel workers for loading the dataset.
- __dataloader_args__: Arguments passed on to the dataloader for sequence generation. Arguments
listed under dataloader.yaml > postprocessing > variant_effects > bed_input will be overwritten
by this function.
- __vcf_to_region__: Callable that generates a region compatible with dataloader/model from a cyvcf2 record
- __vcf_id_generator_fn__: Callable that generates a unique ID from a cyvcf2 record
- __evaluation_function__: Effect evaluation function. Defaults to `analyse_model_preds`, which receives the
arguments defined in `evaluation_function_kwargs`.
- __evaluation_function_kwargs__: kwargs passed on to `evaluation_function`.
- __sync_pred_writer__: Single writer or list of writer objects like instances of `VcfWriter`. This object
will be called after effect prediction of a batch is done.
- __use_dataloader_example_data__: Fill out the missing dataloader arguments with the example values given in the
dataloader.yaml.
- __return_predictions__: Return all variant effect predictions as a dictionary. Setting this to False will
help maintain a low memory profile and is faster as it avoids concatenating batches after prediction.
- __generated_seq_writer__: Single writer or list of writer objects like instances of `SyncHdf5SeqWriter`.
This object will be called after the DNA sequence sets have been generated. If this parameter is
not None, no prediction is performed and only the DNA sequences are written. This is relevant
if you want to use `predict_snvs` to generate appropriate input DNA sequences for your model.
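As an illustration of the `vcf_id_generator_fn` argument, here is a minimal sketch of a callable that builds an ID from a record. It assumes only that the record exposes the `CHROM`, `POS`, `REF` and `ALT` attributes of a cyvcf2 record, with `ALT` being a list. The name `my_vcf_id_gen`, the `FakeRecord` stand-in, and the ID format are hypothetical and are not Kipoi's default. Whether such an ID is actually unique depends on the VCF, for example on whether it contains duplicate lines.

```python
from collections import namedtuple

# Stand-in for a cyvcf2 record, exposing only the attributes used below.
FakeRecord = namedtuple("FakeRecord", ["CHROM", "POS", "REF", "ALT"])


def my_vcf_id_gen(record):
    """Hypothetical ID generator: chromosome, position, and alleles."""
    alts = ",".join(record.ALT)  # ALT is treated as a list of alleles
    return "{}:{}:{}>{}".format(record.CHROM, record.POS, record.REF, alts)


record = FakeRecord(CHROM="chr1", POS=12345, REF="A", ALT=["G"])
print(my_vcf_id_gen(record))  # chr1:12345:A>G
```

A callable of this shape would then be passed as `vcf_id_generator_fn=my_vcf_id_gen` in the call to `predict_snvs`.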

Returns

If return_predictions is True: a dictionary of pandas DataFrames containing the calculated values
for each model output (target) column and each VCF SNV line. If return_predictions is False, returns None.