MutationMap

MutationMap(self, model, dataloader, dataloader_args=None, use_dataloader_example_data=False)

Generate mutation map

Predicts the effect of every possible base at every position of the dataloader input sequences. The regions for which effect scores are calculated are primarily defined by the dataloader input.
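The idea of a mutation map can be illustrated without Kipoi. The following is a minimal sketch, not the library's implementation: `toy_score` is a hypothetical stand-in for a model prediction (here simply the GC fraction), and `mutation_map` is an illustrative helper that substitutes every alternative base at every position and records the change in score.

```python
BASES = "ACGT"


def toy_score(seq):
    """Hypothetical stand-in for a model prediction: the GC fraction of seq."""
    return sum(base in "GC" for base in seq) / len(seq)


def mutation_map(seq, score_fn=toy_score):
    """Return {position: {alt_base: score(alt) - score(ref)}} for every substitution."""
    ref_score = score_fn(seq)
    effects = {}
    for pos, ref_base in enumerate(seq):
        effects[pos] = {}
        for alt_base in BASES:
            if alt_base == ref_base:
                continue  # substituting the reference base changes nothing
            alt_seq = seq[:pos] + alt_base + seq[pos + 1:]
            effects[pos][alt_base] = score_fn(alt_seq) - ref_score
    return effects


effects = mutation_map("ACGT")
```

Each position ends up with three entries, one per alternative base. A real model would replace `toy_score`, and the difference would be replaced by whichever effect scores are requested.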

Arguments

  • model: A kipoi model handle generated by e.g.: kipoi.get_model()
  • dataloader: Dataloader factory generated by e.g.: kipoi.get_dataloader_factory()
  • dataloader_args: Arguments passed on to the dataloader for sequence generation. Arguments listed under dataloader.yaml > postprocessing > variant_effects > bed_input will be overwritten by the methods here.
  • use_dataloader_example_data: Fill out the missing dataloader arguments with the example values given in the dataloader.yaml.
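A construction call might look like the sketch below. The import path `from kipoi_veff import MutationMap` is an assumption, and `build_mutation_map` with its parameters is a hypothetical wrapper; only `kipoi.get_model()`, `kipoi.get_dataloader_factory()` and the constructor signature are taken from the description above.

```python
def build_mutation_map(model_name, dataloader_name, dataloader_args):
    """Sketch: construct a MutationMap from Kipoi model and dataloader names."""
    # Imported lazily so the sketch can be defined without kipoi installed.
    import kipoi
    from kipoi_veff import MutationMap  # assumed import path

    model = kipoi.get_model(model_name)
    dataloader = kipoi.get_dataloader_factory(dataloader_name)
    # dataloader_args is a dict of dataloader keyword arguments; entries that
    # the dataloader.yaml marks as bed_input are overwritten by the query methods.
    return MutationMap(model, dataloader, dataloader_args=dataloader_args)
```

The valid keys of `dataloader_args` depend on the chosen dataloader, so none are shown here.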

query_region

MutationMap.query_region(self, chrom, start, end, seq_length=None, scores=['logit_ref', 'logit_alt', 'ref', 'alt', 'logit', 'diff'], score_kwargs=None, **kwargs)

Generate mutation map

Predicts the effect of every possible base at every position of the dataloader input sequences. The regions for which effect scores are calculated are primarily defined by the dataloader input. If the dataloader accepts BED file input, that file is overwritten with regions defined here, of length seq_length or the model input sequence length. If the dataloader does not accept BED file input, all dataloader-generated regions that overlap the region defined here are investigated. Effect scores are returned as a MutationMapPlotter object, which can be saved to an HDF5 file and used for plotting. Note that the scored sequences are ordered as the dataloader produced them, restricted to those that intersect the region defined here.

Arguments

  • chrom: Chromosome of the region of interest. The assembly is defined by the dataloader arguments.
  • start: Start of the region of interest. The assembly is defined by the dataloader arguments.
  • end: End of the region of interest. The assembly is defined by the dataloader arguments.
  • seq_length: Optional argument of model sequence length to use if model accepts variable input sequence length.
  • scores: list of score names to compute. See kipoi_veff.scores
  • score_kwargs: Optional list of kwargs, one entry for each entry in scores.

Returns

MutationMapPlotter: object containing variant scores.
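The fallback behaviour for dataloaders without BED input (keep dataloader-generated regions that overlap the query, in dataloader order) can be sketched as follows. This is an illustration only: the helper names are hypothetical, and 0-based half-open coordinates are an assumption, not something the description above specifies.

```python
def overlaps(region, chrom, start, end):
    """True if region (chrom, start, end) overlaps the query interval.

    Assumes 0-based, half-open coordinates.
    """
    r_chrom, r_start, r_end = region
    return r_chrom == chrom and r_start < end and start < r_end


def select_overlapping(regions, chrom, start, end):
    """Keep overlapping regions without re-sorting them."""
    return [r for r in regions if overlaps(r, chrom, start, end)]


# Regions in the order a (hypothetical) dataloader produced them.
dataloader_regions = [
    ("chr1", 500, 600),
    ("chr1", 100, 200),
    ("chr2", 100, 200),
    ("chr1", 150, 250),
]
selected = select_overlapping(dataloader_regions, "chr1", 180, 520)
```

The result keeps `("chr1", 500, 600)` first because the dataloader emitted it first. The output is not sorted by genomic position.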

query_bed

MutationMap.query_bed(self, bed_fpath, seq_length=None, scores=['logit_ref', 'logit_alt', 'ref', 'alt', 'logit', 'diff'], score_kwargs=None, **kwargs)

Generate mutation map

Predicts the effect of every possible base at every position of the dataloader input sequences. The regions for which effect scores are calculated are primarily defined by the dataloader input. If the dataloader accepts BED file input, that file is overwritten with regions defined in bed_fpath, of length seq_length or the model input sequence length. If the dataloader does not accept BED file input, all dataloader-generated regions that overlap the regions in bed_fpath are investigated. Effect scores are returned as a MutationMapPlotter object, which can be saved to an HDF5 file and used for plotting. Note that the scored sequences are ordered as the dataloader produced them, restricted to those that intersect bed_fpath.

Arguments

  • bed_fpath: Only genomic regions overlapping regions in this BED file will be evaluated. If the dataloader accepts BED file input, the dataloader BED input file will be overwritten with regions based on this BED file. The assembly is defined by the dataloader arguments.
  • seq_length: Optional argument of model sequence length to use if model accepts variable input sequence length.
  • scores: list of score names to compute. See kipoi_veff.scores
  • score_kwargs: Optional list of kwargs, one entry for each entry in scores.

Returns

MutationMapPlotter: object containing variant scores.
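The default score names suggest how each score relates the reference and alternative predictions. The sketch below spells out one plausible reading for a single pair of predicted probabilities. The formulas are an assumption inferred from the names, and `SCORE_FNS` and `compute_scores` are hypothetical helpers; kipoi_veff.scores remains the authoritative definition.

```python
import math


def logit(p):
    """Log-odds of a probability p with 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Assumed meaning of each score name; not taken from kipoi_veff itself.
SCORE_FNS = {
    "ref": lambda ref, alt: ref,
    "alt": lambda ref, alt: alt,
    "diff": lambda ref, alt: alt - ref,
    "logit_ref": lambda ref, alt: logit(ref),
    "logit_alt": lambda ref, alt: logit(alt),
    "logit": lambda ref, alt: logit(alt) - logit(ref),
}


def compute_scores(ref, alt, scores):
    """Evaluate the requested score names for one ref/alt prediction pair."""
    return {name: SCORE_FNS[name](ref, alt) for name in scores}


result = compute_scores(0.5, 0.8, ["ref", "alt", "diff", "logit"])
```

Under this reading, a reference prediction of 0.5 and an alternative prediction of 0.8 give a diff of about 0.3 and a logit score of about ln(4).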

query_vcf

MutationMap.query_vcf(self, vcf_fpath, seq_length=None, scores=['logit_ref', 'logit_alt', 'ref', 'alt', 'logit', 'diff'], score_kwargs=None, var_centered_regions=True, **kwargs)

Generate mutation map

Predicts the effect of every possible base at every position of the dataloader input sequences. The regions for which effect scores are calculated are primarily defined by the dataloader input. If the dataloader accepts BED file input, that file is overwritten with regions generated from the SNVs in vcf_fpath in a variant-centered fashion. Sequence length is defined by seq_length or the model input sequence length. If the dataloader does not have a BED file input, all dataloader-generated regions that overlap SNVs in vcf_fpath are investigated. Effect scores are returned as a MutationMapPlotter object, which can be saved to an HDF5 file and used for plotting. Note that the scored sequences are ordered as the dataloader produced them, restricted to those that intersect vcf_fpath.

Arguments

  • vcf_fpath: Only genomic regions overlapping the variants in this VCF will be evaluated. Variants defined here will be highlighted in mutation map plots. Only SNVs will be used. If vcf_to_region is defined and the dataloader accepts bed file input then the dataloader bed input file will be overwritten with regions based on variant positions of this VCF.
  • seq_length: Optional argument of model sequence length to use if model accepts variable input sequence length.
  • var_centered_regions: Generate variant-centered regions if the model accepts that. If a custom vcf_to_region should be used then this can be set explicitly in the kwargs.
  • scores: list of score names to compute. See kipoi_veff.scores
  • score_kwargs: Optional list of kwargs, one entry for each entry in scores.

Returns

MutationMapPlotter: object containing variant scores.
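Variant-centered region generation can be pictured with the sketch below. It is a simplified illustration under stated assumptions, not the library's vcf_to_region logic: variants are plain tuples instead of parsed VCF records, the helper names are hypothetical, and the exact centering convention for even lengths and any clipping at chromosome ends are left out.

```python
def is_snv(ref, alt):
    """Only single-base substitutions count as SNVs."""
    return len(ref) == 1 and len(alt) == 1


def centered_region(chrom, pos, seq_length):
    """Return a 0-based half-open region of seq_length around a 1-based VCF POS.

    No clipping to chromosome bounds is performed in this sketch.
    """
    center = pos - 1  # convert 1-based VCF position to 0-based index
    start = center - seq_length // 2
    return (chrom, start, start + seq_length)


# (chrom, pos, ref, alt) tuples standing in for VCF records.
variants = [
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "AT", "A"),  # deletion, skipped because it is not an SNV
    ("chr2", 500, "C", "T"),
]
regions = [centered_region(c, p, 101) for c, p, r, a in variants if is_snv(r, a)]
```

With an odd seq_length such as 101, the variant sits exactly in the middle of its region, with 50 bases on each side.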