APARENT/veff

Authors: Nicholas Bogard, Johannes Linder, Alexander B. Rosenberg, Georg Seelig

License: MIT

Contributed by: Florian R. Hölzlwimmer, Shabnam Sadegharmaki, Muhammed Hasan Çelik, Ziga Avsec

Cite as: https://doi.org/10.1101/300061

Type: None

Postprocessing: None

Trained on: isoform expression data from over 3 million APA reporters, built by inserting random sequence into 12 distinct 3'UTR contexts.

Predicting the Impact of cis-Regulatory Variation on Alternative Polyadenylation

Abstract: Alternative polyadenylation (APA) is a major driver of transcriptome diversity in human cells. Here, we use deep learning to predict APA from DNA sequence alone. We trained our model (APARENT, APA REgression NeT) on isoform expression data from over three million APA reporters, built by inserting random sequence into twelve distinct 3′UTR contexts. Predictions are highly accurate across both synthetic and genomic contexts; when tasked with inferring APA in human 3′UTRs, APARENT outperforms models trained exclusively on endogenous data. Visualizing features learned across all network layers reveals that APARENT recognizes sequence motifs known to recruit APA regulators, discovers previously unknown sequence determinants of cleavage site selection, and integrates these features into a comprehensive, interpretable cis-regulatory code. Finally, we use APARENT to quantify the impact of genetic variants on APA. Our approach detects pathogenic variants in a wide range of disease contexts, expanding our understanding of the genetic origins of disease.

Create a new conda environment with all dependencies installed
kipoi env create APARENT/veff
source activate kipoi-APARENT__veff
Install model dependencies into current environment
kipoi env install APARENT/veff
Test the model
kipoi test APARENT/veff --source=kipoi
Make a prediction
kipoi get-example APARENT/veff -o example
kipoi predict APARENT/veff \
  --dataloader_args='{"fasta_file": "example/chr22.fa", "gtf_file": "example/chr22.gtf.gz", "vcf_file": "example/vcf_file", "vcf_file_tbi": "example/vcf_file.tbi", "vcf_lazy": true}' \
  -o '/tmp/APARENT|veff.example_pred.tsv'
# check the results
head '/tmp/APARENT|veff.example_pred.tsv'
Get the model (Python)
import kipoi
model = kipoi.get_model('APARENT/veff')
Make a prediction for example files
pred = model.pipeline.predict_example(batch_size=4)
Use dataloader and model separately
# Download example dataloader kwargs
dl_kwargs = model.default_dataloader.download_example('example')
# Get the dataloader and instantiate it
dl = model.default_dataloader(**dl_kwargs)
# get a batch iterator
it = dl.batch_iter(batch_size=4)
# predict for a batch
batch = next(it)
model.predict_on_batch(batch['inputs'])
Make predictions for custom files directly
pred = model.pipeline.predict(dl_kwargs, batch_size=4)
Get the model (R)
library(reticulate)
kipoi <- import('kipoi')
model <- kipoi$get_model('APARENT/veff')
Make a prediction for example files
predictions <- model$pipeline$predict_example()
Use dataloader and model separately
# Download example dataloader kwargs
dl_kwargs <- model$default_dataloader$download_example('example')
# Get the dataloader
dl <- do.call(model$default_dataloader, dl_kwargs)
# get a batch iterator
it <- dl$batch_iter(batch_size=4)
# predict for a batch
batch <- iter_next(it)
model$predict_on_batch(batch$inputs)
Make predictions for custom files directly
pred <- model$pipeline$predict(dl_kwargs, batch_size=4)
Get the docker image
docker pull kipoi/kipoi-docker:aparent-veff
Get the activated conda environment inside the container
docker run -it kipoi/kipoi-docker:aparent-veff
Test the model
docker run kipoi/kipoi-docker:aparent-veff kipoi test APARENT/veff --source=kipoi
Make predictions for custom files directly
# Create an example directory containing the data
mkdir -p $PWD/kipoi-example 
# You can replace $PWD/kipoi-example with a different absolute path containing the data 
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:aparent-veff \
kipoi get-example APARENT/veff -o /app/example 
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:aparent-veff \
kipoi predict APARENT/veff \
--dataloader_args='{"fasta_file": "/app/example/chr22.fa", "gtf_file": "/app/example/chr22.gtf.gz", "vcf_file": "/app/example/vcf_file", "vcf_file_tbi": "/app/example/vcf_file.tbi", "vcf_lazy": true}' \
-o '/app/APARENT_veff.example_pred.tsv' 
# check the results
head $PWD/kipoi-example/APARENT_veff.example_pred.tsv
Install singularity
conda install --yes -c conda-forge singularity
Make predictions for custom files directly
kipoi get-example APARENT/veff -o example
kipoi predict APARENT/veff \
--dataloader_args='{"fasta_file": "example/chr22.fa", "gtf_file": "example/chr22.gtf.gz", "vcf_file": "example/vcf_file", "vcf_file_tbi": "example/vcf_file.tbi", "vcf_lazy": true}' \
-o 'APARENT_veff.example_pred.tsv' \
--singularity 
# check the results
head APARENT_veff.example_pred.tsv

Schema

Inputs

Dictionary of numpy arrays

Name: ref_seq

    Shape: (205, 4) 

    Doc: 205 bp reference sequence at the polyA cut site

Name: alt_seq

    Shape: (205, 4) 

    Doc: 205 bp alternative sequence at the polyA cut site
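
The dataloader normally builds these arrays from the FASTA, GTF and VCF files, so the following is only a minimal sketch of what a (205, 4) input looks like. The channel order A, C, G, T, the all-zero encoding of unknown bases, and the example sequences are assumptions made for illustration and should be checked against the dataloader before use.

```python
import numpy as np

# Assumed channel order; confirm against the APARENT/veff dataloader before relying on it.
ALPHABET = "ACGT"
SEQ_LEN = 205  # sequence length given in the schema


def one_hot(seq, alphabet=ALPHABET):
    """Return a (len(seq), 4) float32 array; unknown bases such as N stay all-zero."""
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = np.zeros((len(seq), len(alphabet)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        col = index.get(base)
        if col is not None:
            encoded[pos, col] = 1.0
    return encoded


# Hypothetical 205 bp reference window with a single-nucleotide substitution at index 100.
ref = "ACGT" * 51 + "A"
alt = ref[:100] + "T" + ref[101:]

ref_seq = one_hot(ref)
alt_seq = one_hot(alt)
print(ref_seq.shape, alt_seq.shape)
```

Both arrays have the shape listed in the schema and differ in exactly one row, the substituted position.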


Targets

Dictionary of numpy arrays

Name: delta_logit_polya_prob

    Shape: (1,) 

    Doc: Predicted difference, in logit space and relative to the reference, in the probability of a polyA cut site within the specified DNA range

Name: delta_logit_site_probs

    Shape: (205,) 

    Doc: Predicted difference, in logit space and relative to the reference, in the probability of a polyA cut site at each position of the specified DNA range
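
A logit difference can be hard to read on its own. The sketch below shows one way to interpret it, assuming the value is defined as logit(alt) minus logit(ref); that sign convention is an assumption and is not stated in the schema. The reference probability is not an output of this model, so `ref_prob` and the example arrays are hypothetical values chosen for illustration.

```python
import numpy as np


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return np.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + np.exp(-x))


def alt_probability(ref_prob, delta_logit):
    """Probability implied for the alternative allele, assuming delta = logit(alt) - logit(ref)."""
    return sigmoid(logit(ref_prob) + delta_logit)


def odds_ratio(delta_logit):
    """Fold change in the odds of cleavage implied by a logit difference."""
    return np.exp(delta_logit)


def strongest_position(site_deltas):
    """Index of the position with the largest absolute per-site logit difference."""
    return int(np.argmax(np.abs(site_deltas)))


# Hypothetical values: a reference usage of 50% and a predicted delta of 1.0.
ref_prob = 0.5
delta = 1.0
print(alt_probability(ref_prob, delta), odds_ratio(delta))

# Hypothetical per-position deltas (a real array would have 205 entries).
print(strongest_position(np.array([0.1, -2.0, 0.5])))
```

Working in logit space keeps the effect size comparable across sites with very different baseline usage, which is why the same delta maps to different probability changes depending on `ref_prob`.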


Dataloader

Defined as: ./

Doc: Dataloader for APARENT variant effect prediction

Authors: Shabnam Sadegharmaki, Ziga Avsec, Muhammed Hasan Çelik, Florian R. Hölzlwimmer, Timon Schneider

Type: SampleIterator

License: MIT


Arguments

fasta_file : Reference genome sequence

gtf_file : file path; Genome annotation GTF file

vcf_file : bgzipped VCF file with the variants to investigate. Must be sorted and accompanied by a tabix index. Filter out any variants with non-DNA symbols!

vcf_file_tbi : tabix index of the VCF (only needed to make the Kipoi tests work; leave as None in normal usage)

vcf_lazy : decode the VCF lazily (see the cyvcf2 docs)
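
The requirement to filter out variants with non-DNA symbols can be met in several ways. The following is a minimal sketch that works on uncompressed VCF text lines and keeps only records whose REF and ALT alleles consist of A, C, G and T. Treating N, `*`, `.` and symbolic alleles such as `<DEL>` as "non-DNA" is an assumption about what the dataloader rejects, and the function names and example records are hypothetical.

```python
import re

# Assumption: only unambiguous bases count as DNA symbols here.
DNA_ALLELE = re.compile(r"^[ACGT]+$", re.IGNORECASE)


def has_only_dna_alleles(vcf_line):
    """True if REF and every ALT allele of a VCF data line contain only A, C, G, T."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref = fields[3]
    alts = fields[4].split(",")
    return all(DNA_ALLELE.match(allele) for allele in [ref] + alts)


def filter_vcf_lines(lines):
    """Yield header lines unchanged and data lines that pass the allele check."""
    for line in lines:
        if line.startswith("#") or has_only_dna_alleles(line):
            yield line


# Hypothetical records: one SNV, one symbolic deletion, one spanning-deletion allele.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr22\t100\t.\tA\tG\t.\t.\t.\n",
    "chr22\t200\t.\tC\t<DEL>\t.\t.\t.\n",
    "chr22\t300\t.\tG\tT,*\t.\t.\t.\n",
]
kept = list(filter_vcf_lines(example))
print("".join(kept), end="")
```

The filtered file still has to be sorted, bgzip-compressed and tabix-indexed (for example with the `bgzip` and `tabix` tools from htslib) before it is passed as `vcf_file`.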


Model dependencies
conda:
  • python=3.6
  • tensorflow=1.13
  • keras>=2.0.4,<3
  • numpy
  • scipy

pip:

Dataloader dependencies
conda:
  • python=3.6
  • bioconda::kipoi
  • bioconda::cyvcf2
  • bioconda::pyranges
  • pip>=21.0.1

pip: