score_variants

score_variants(model, dl_args, input_vcf, output_vcf=None, output_writers=None, scores=['logit_ref', 'logit_alt', 'ref', 'alt', 'logit', 'diff'], score_kwargs=None, num_workers=0, batch_size=32, source='kipoi', seq_length=None, std_var_id=False, restriction_bed=None, return_predictions=False, model_outputs=None)

Score variants: annotate the VCF file using model predictions for the reference and alternative alleles.

The scoring functions that compute a score from the predictions for the reference and alternative alleles are selected with the scores argument.
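To illustrate what the default score names compute, the following minimal sketch applies each of them to a single pair of predictions. It assumes the predictions are probabilities in the open interval (0, 1) and that the scores follow the usual definitions (for example, `diff` as alt minus ref and `logit` as logit(alt) minus logit(ref)). The dictionary `SCORES` and the helper `score_pair` are hypothetical and are not part of kipoi_veff; the authoritative implementations live in kipoi_veff.scores.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))


# Hypothetical mapping from score name to a function of (ref, alt) predictions.
# These are assumed definitions; check kipoi_veff.scores for the real ones.
SCORES = {
    "ref": lambda ref, alt: ref,
    "alt": lambda ref, alt: alt,
    "diff": lambda ref, alt: alt - ref,
    "logit_ref": lambda ref, alt: logit(ref),
    "logit_alt": lambda ref, alt: logit(alt),
    "logit": lambda ref, alt: logit(alt) - logit(ref),
}


def score_pair(ref, alt,
               scores=("logit_ref", "logit_alt", "ref", "alt", "logit", "diff")):
    """Apply each requested score to one reference/alternative prediction pair."""
    return {name: SCORES[name](ref, alt) for name in scores}


result = score_pair(ref=0.2, alt=0.8)
print(result["diff"], result["logit"])
```

Because the logit transform stretches values close to 0 and 1, `logit` can rank variants differently from `diff` when predictions are near saturation.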

This function is the Python version of the score_variants command-line call and a convenience version of the predict_snvs function, which is described as follows:

Predict the effects of SNVs listed in a VCF file. Optionally, the VCF can be written out with the predicted values as annotations. For a detailed description of the requirements for the YAML files, see the core kipoi documentation on writing a dataloader.yaml file, or the kipoi-veff documentation, section overview/#model-and-dataloader-requirements.

Arguments

  • model: model name as a string, or a model class instance
  • dl_args: dataloader arguments as a dictionary
  • input_vcf: path to the input VCF file
  • output_vcf: path to the output VCF file
  • output_writers: list of output writers to use
  • scores: list of names of the scores to compute. See kipoi_veff.scores.
  • score_kwargs: optional list of keyword arguments, one entry per element of scores.
  • num_workers: number of parallel workers to use for data loading
  • batch_size: batch size for data loading
  • source: model source name
  • std_var_id: If True, variant IDs in the annotated VCF are replaced with a standardised, unique ID.
  • seq_length: Must be set if the model accepts input sequences of variable length.
  • restriction_bed: If the dataloader can run on regions generated from the VCF, only variants that overlap the regions defined in restriction_bed are tested.
  • return_predictions: If True, also return the generated predictions as pandas DataFrames.
  • model_outputs: If set, either a boolean filter or a named filter selecting which model outputs are reported.
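The two forms of the model_outputs filter can be pictured with the following minimal sketch. It assumes that a boolean filter has one entry per model output, in output order, and that a named filter is a list of output names. The helper `select_outputs` and the output names are hypothetical; they only illustrate the selection logic and are not kipoi_veff's internal code.

```python
from typing import List, Optional, Sequence, Union


def select_outputs(output_names: Sequence[str],
                   model_outputs: Optional[Union[Sequence[bool], Sequence[str]]]) -> List[str]:
    """Return the model output names that would be reported (hypothetical helper)."""
    if model_outputs is None:
        # No filter: every model output is reported.
        return list(output_names)
    if all(isinstance(x, bool) for x in model_outputs):
        # Boolean filter: one flag per model output, in the same order.
        if len(model_outputs) != len(output_names):
            raise ValueError("boolean filter must have one entry per model output")
        return [name for name, keep in zip(output_names, model_outputs) if keep]
    # Named filter: keep the listed outputs and reject unknown names.
    wanted = set(model_outputs)
    unknown = wanted - set(output_names)
    if unknown:
        raise ValueError(f"unknown model outputs: {sorted(unknown)}")
    return [name for name in output_names if name in wanted]


names = ["cell_A", "cell_B", "cell_C"]
print(select_outputs(names, [True, False, True]))  # ['cell_A', 'cell_C']
print(select_outputs(names, ["cell_B"]))            # ['cell_B']
```

Keeping the original output order in both branches means the reported columns line up with the model's own output ordering regardless of how the filter was written.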

Returns

dict: pandas DataFrames containing the calculated values, with one column per model output (target) and one row per VCF SNV line. If return_predictions == False, returns None.
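A minimal sketch of post-processing the returned dictionary, assuming it is keyed by score name (here `diff`) and that each DataFrame has one row per VCF SNV line and one column per model output. The variant IDs, output names, and values below are made up for illustration; in practice the dictionary would come from calling score_variants with return_predictions=True.

```python
import pandas as pd

# Made-up stand-in for the dictionary returned with return_predictions=True.
# Assumed layout: key = score name, rows = VCF SNV lines, columns = model outputs.
predictions = {
    "diff": pd.DataFrame(
        {"cell_A": [0.10, -0.45, 0.02], "cell_B": [0.30, 0.05, -0.01]},
        index=["var1", "var2", "var3"],
    ),
}

diff = predictions["diff"]
# Largest absolute effect of each variant across all model outputs.
max_abs_effect = diff.abs().max(axis=1)
# Variants ordered from strongest to weakest predicted effect.
ranked = max_abs_effect.sort_values(ascending=False)
print(ranked)
```

Taking the absolute value before the maximum ranks variants by effect size regardless of whether the alternative allele increases or decreases the prediction.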