APARENT/veff
Authors: Nicholas Bogard, Johannes Linder
License: MIT
Contributed by: Florian R. Hölzlwimmer, Shabnam Sadegharmaki, Muhammed Hasan Çelik, Ziga Avsec
Cite as: https://doi.org/10.1101/300061
Type: None
Postprocessing: None
Trained on: isoform expression data from over 3 million APA reporters, built by inserting random sequence into 12 distinct 3'UTR contexts.
Predicting the Impact of cis-Regulatory Variation on Alternative Polyadenylation
Abstract
Alternative polyadenylation (APA) is a major driver of transcriptome diversity in human cells. Here, we use deep learning to predict APA from DNA sequence alone. We trained our model (APARENT, APA REgression NeT) on isoform expression data from over three million APA reporters, built by inserting random sequence into twelve distinct 3′UTR contexts. Predictions are highly accurate across both synthetic and genomic contexts; when tasked with inferring APA in human 3′UTRs, APARENT outperforms models trained exclusively on endogenous data. Visualizing features learned across all network layers reveals that APARENT recognizes sequence motifs known to recruit APA regulators, discovers previously unknown sequence determinants of cleavage site selection, and integrates these features into a comprehensive, interpretable cis-regulatory code. Finally, we use APARENT to quantify the impact of genetic variants on APA. Our approach detects pathogenic variants in a wide range of disease contexts, expanding our understanding of the genetic origins of disease.
kipoi env create APARENT/veff
source activate kipoi-APARENT__veff
kipoi test APARENT/veff --source=kipoi
kipoi get-example APARENT/veff -o example
kipoi predict APARENT/veff \
--dataloader_args='{"fasta_file": "example/chr22.fa", "gtf_file": "example/chr22.gtf.gz", "vcf_file": "example/vcf_file", "vcf_file_tbi": "example/vcf_file.tbi", "vcf_lazy": true}' \
-o '/tmp/APARENT|veff.example_pred.tsv'
# check the results
head '/tmp/APARENT|veff.example_pred.tsv'
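The prediction file is tab-separated, so it can also be inspected from Python. The sketch below assumes that each prediction column's header ends with the target name (for example with a prefix such as `preds/`); that column naming is an assumption, and `load_target_columns` is a hypothetical helper, not a Kipoi function.

```python
import csv
from typing import Dict, List, Tuple

DEFAULT_TARGETS: Tuple[str, ...] = ("delta_logit_distal_prop", "delta_logit_proximal_prop")


def load_target_columns(tsv_path: str, targets: Tuple[str, ...] = DEFAULT_TARGETS) -> Dict[str, List[float]]:
    """Collect numeric values of every column whose header ends with a target name."""
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Assumption: prediction headers end with the target name; other columns (metadata) are skipped
        cols = [c for c in (reader.fieldnames or []) if c.endswith(targets)]
        values: Dict[str, List[float]] = {c: [] for c in cols}
        for row in reader:
            for c in cols:
                values[c].append(float(row[c]))
    return values
```

For example, `load_target_columns('/tmp/APARENT|veff.example_pred.tsv')` returns one list of floats per matching column.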
kipoi env create APARENT/veff
source activate kipoi-APARENT__veff
import kipoi
model = kipoi.get_model('APARENT/veff')
pred = model.pipeline.predict_example(batch_size=4)
# Download example dataloader kwargs
dl_kwargs = model.default_dataloader.download_example('example')
# Get the dataloader and instantiate it
dl = model.default_dataloader(**dl_kwargs)
# get a batch iterator
batch_iterator = dl.batch_iter(batch_size=4)
for batch in batch_iterator:
    # predict for a batch
    batch_pred = model.predict_on_batch(batch['inputs'])
pred = model.pipeline.predict(dl_kwargs, batch_size=4)
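When iterating over batches by hand, the per-batch outputs can be gathered into a single dictionary. This minimal sketch assumes, as stated under Targets, that `predict_on_batch` returns a dictionary of numpy arrays whose first axis indexes samples; `concat_batch_preds` is a hypothetical helper.

```python
from typing import Dict, Iterable, List

import numpy as np


def concat_batch_preds(batch_preds: Iterable[Dict[str, np.ndarray]]) -> Dict[str, np.ndarray]:
    """Merge {target_name: array} dicts from successive batches along the sample axis."""
    collected: Dict[str, List[np.ndarray]] = {}
    for pred in batch_preds:
        for name, arr in pred.items():
            collected.setdefault(name, []).append(np.asarray(arr))
    # Concatenate along axis 0, assumed to be the sample axis
    return {name: np.concatenate(parts, axis=0) for name, parts in collected.items()}


# Hypothetical usage with the model and dataloader created above:
# all_preds = concat_batch_preds(
#     model.predict_on_batch(batch['inputs']) for batch in dl.batch_iter(batch_size=4)
# )
```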
library(reticulate)
kipoi <- import('kipoi')
model <- kipoi$get_model('APARENT/veff')
predictions <- model$pipeline$predict_example()
# Download example dataloader kwargs
dl_kwargs <- model$default_dataloader$download_example('example')
# Get the dataloader
dl <- do.call(model$default_dataloader, dl_kwargs)
# get a batch iterator
it <- dl$batch_iter(batch_size=4)
# predict for a batch
batch <- iter_next(it)
model$predict_on_batch(batch$inputs)
pred <- model$pipeline$predict(dl_kwargs, batch_size=4)
docker pull kipoi/kipoi-docker:aparent-veff-slim
docker pull kipoi/kipoi-docker:aparent-veff
docker run -it kipoi/kipoi-docker:aparent-veff-slim
docker run kipoi/kipoi-docker:aparent-veff-slim kipoi test APARENT/veff --source=kipoi
# Create an example directory containing the data
mkdir -p $PWD/kipoi-example
# You can replace $PWD/kipoi-example with a different absolute path containing the data
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:aparent-veff-slim \
kipoi get-example APARENT/veff -o /app/example
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:aparent-veff-slim \
kipoi predict APARENT/veff \
--dataloader_args='{"fasta_file": "/app/example/chr22.fa", "gtf_file": "/app/example/chr22.gtf.gz", "vcf_file": "/app/example/vcf_file", "vcf_file_tbi": "/app/example/vcf_file.tbi", "vcf_lazy": true}' \
-o '/app/APARENT_veff.example_pred.tsv'
# check the results
head $PWD/kipoi-example/APARENT_veff.example_pred.tsv
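Nesting JSON inside a quoted shell argument is easy to get wrong. One option is to build the argument list in Python, let `json.dumps` produce valid JSON (including `true` for `vcf_lazy`), and let `shlex.join` (Python 3.8+) do the quoting. `build_predict_command` and its `prefix` parameter are illustrative assumptions, not part of Kipoi.

```python
import json
import shlex
from typing import Sequence


def build_predict_command(example_dir: str, output_tsv: str, prefix: Sequence[str] = ()) -> str:
    """Return a shell-safe `kipoi predict APARENT/veff` command line."""
    dataloader_args = {
        "fasta_file": f"{example_dir}/chr22.fa",
        "gtf_file": f"{example_dir}/chr22.gtf.gz",
        "vcf_file": f"{example_dir}/vcf_file",
        "vcf_file_tbi": f"{example_dir}/vcf_file.tbi",
        "vcf_lazy": True,  # json.dumps writes this as JSON `true`
    }
    args = [
        *prefix,  # e.g. ["docker", "run", "-v", "/abs/path/kipoi-example:/app/", "kipoi/kipoi-docker:aparent-veff-slim"]
        "kipoi", "predict", "APARENT/veff",
        "--dataloader_args=" + json.dumps(dataloader_args),
        "-o", output_tsv,
    ]
    # shlex.join quotes each argument so the JSON reaches kipoi unchanged
    return shlex.join(args)
```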
https://apptainer.org/docs/user/main/quick_start.html#quick-installation-steps
kipoi get-example APARENT/veff -o example
kipoi predict APARENT/veff \
--dataloader_args='{"fasta_file": "example/chr22.fa", "gtf_file": "example/chr22.gtf.gz", "vcf_file": "example/vcf_file", "vcf_file_tbi": "example/vcf_file.tbi", "vcf_lazy": true}' \
-o 'APARENT_veff.example_pred.tsv' \
--singularity
# check the results
head APARENT_veff.example_pred.tsv
Inputs
Dictionary of numpy arrays
Name: ref_seq
Doc: 205 bp reference sequence around the polyA cut site
Name: alt_seq
Doc: 205 bp alternative sequence around the polyA cut site
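The reference and alternative inputs differ only at the variant. The sketch below illustrates that relationship for a single-nucleotide variant, assuming the sequences are plain strings, the variant is given as a 0-based offset into the 205 bp window, and one-hot channels are ordered A, C, G, T. The actual dataloader and model encoding may differ, and `apply_snv` and `one_hot` are hypothetical helpers.

```python
import numpy as np

SEQ_LEN = 205   # window length stated in the input docs
BASES = "ACGT"  # assumed channel order, not confirmed for APARENT


def apply_snv(ref_seq: str, offset: int, ref: str, alt: str) -> str:
    """Return the alternative sequence for an SNV at a 0-based offset in the window."""
    if len(ref_seq) != SEQ_LEN:
        raise ValueError(f"expected {SEQ_LEN} bp, got {len(ref_seq)}")
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("this sketch only handles single-nucleotide variants")
    if ref_seq[offset].upper() != ref.upper():
        raise ValueError("REF allele does not match the reference sequence")
    return ref_seq[:offset] + alt + ref_seq[offset + 1:]


def one_hot(seq: str) -> np.ndarray:
    """One-hot encode a DNA string into a (len(seq), 4) float32 array."""
    index = {base: i for i, base in enumerate(BASES)}
    out = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base not in index:
            raise ValueError(f"non-DNA symbol {base!r} at position {pos}")
        out[pos, index[base]] = 1.0
    return out
```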
Targets
Dictionary of numpy arrays
Name: delta_logit_distal_prop
Doc: Predicted change in logit, relative to the reference sequence, of the probability of a polyA cut site **outside** the specified DNA range
Name: delta_logit_proximal_prop
Doc: Predicted change in logit, relative to the reference sequence, of the probability of a polyA cut site for each position within the specified DNA range
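Both targets are differences of log-odds. As a worked sketch, assume each delta is the alternative logit minus the reference logit, and that the distal and proximal proportions sum to one. Then the two targets are exact negatives of each other, because logit(1 - p) = -logit(p). Both assumptions are illustrative and not confirmed by the model documentation; `logit` and `delta_logit` are hypothetical helpers.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def delta_logit(p_ref: float, p_alt: float) -> float:
    """Change in log-odds from the reference to the alternative sequence (assumed alt - ref)."""
    return logit(p_alt) - logit(p_ref)


# Illustrative proximal proportions for one variant (made-up numbers)
p_prox_ref, p_prox_alt = 0.25, 0.75
d_prox = delta_logit(p_prox_ref, p_prox_alt)           # 2 * ln(3), about 2.197
# Assuming distal = 1 - proximal, the distal delta is the negative of the proximal one
d_dist = delta_logit(1 - p_prox_ref, 1 - p_prox_alt)
```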
Defined as: ./
Doc: Dataloader for APARENT variant effect prediction
Authors: Shabnam Sadegharmaki, Ziga Avsec, Muhammed Hasan Çelik, Florian R. Hölzlwimmer, Timon Schneider
Type: SampleIterator
License: MIT
Arguments
fasta_file : file path; Reference genome sequence (FASTA)
gtf_file : file path; Genome annotation GTF file
vcf_file : bgzipped VCF file with the variants to be investigated. It must be sorted and have a tabix index. Filter out any variants with non-DNA symbols.
vcf_file_tbi (optional): tabix index of the VCF (only needed to make the kipoi tests work; leave as None in normal usage)
vcf_lazy : decode the VCF lazily (see the cyvcf2 docs)
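Because the dataloader expects only DNA alleles, a VCF can be pre-filtered before compression and indexing. This sketch works on plain-text VCF lines and assumes that "DNA" means the letters A, C, G and T, so `N`, `*`, `.` and symbolic alleles such as `<DEL>` are dropped; `filter_dna_variants` is a hypothetical helper. Records keep their original order, so an already sorted input stays sorted.

```python
import re
from typing import Iterable, Iterator

# Assumption: only A/C/G/T (either case) count as DNA symbols
DNA_ALLELE = re.compile(r"[ACGTacgt]+")


def filter_dna_variants(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Pass header lines through; keep records whose REF and all ALT alleles are plain DNA."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")  # VCF columns: CHROM POS ID REF ALT ...
        if DNA_ALLELE.fullmatch(ref) and all(DNA_ALLELE.fullmatch(a) for a in alts):
            yield line
```

The filtered file can then be compressed and indexed with `bgzip` and `tabix -p vcf` from htslib before being passed as `vcf_file`.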
Model dependencies
- python=3.9
- tensorflow=2.7
- keras>=2.0.4,<3
- numpy=1.25.1
- scipy=1.10.1
Dataloader dependencies
- python=3.9
- bioconda::kipoi
- bioconda::kipoiseq>=0.7.1
- bioconda::cyvcf2=0.30
- bioconda::pyranges=0.0.129