APARENT/site_probabilities

Authors: Nicholas Bogard, Johannes Linder, Alexander B. Rosenberg, Georg Seelig

License: MIT

Contributed by: Shabnam Sadegharmaki, Ziga Avsec, Muhammed Hasan Çelik, Florian R. Hölzlwimmer

Cite as: https://doi.org/10.1101/300061

Type: None

Postprocessing: None

Trained on: isoform expression data from over 3 million APA reporters, built by inserting random sequence into 12 distinct 3'UTR contexts.

Predicting the Impact of cis-Regulatory Variation on Alternative Polyadenylation

Abstract: Alternative polyadenylation (APA) is a major driver of transcriptome diversity in human cells. Here, we use deep learning to predict APA from DNA sequence alone. We trained our model (APARENT, APA REgression NeT) on isoform expression data from over three million APA reporters, built by inserting random sequence into twelve distinct 3′UTR contexts. Predictions are highly accurate across both synthetic and genomic contexts; when tasked with inferring APA in human 3′UTRs, APARENT outperforms models trained exclusively on endogenous data. Visualizing features learned across all network layers reveals that APARENT recognizes sequence motifs known to recruit APA regulators, discovers previously unknown sequence determinants of cleavage site selection, and integrates these features into a comprehensive, interpretable cis-regulatory code. Finally, we use APARENT to quantify the impact of genetic variants on APA. Our approach detects pathogenic variants in a wide range of disease contexts, expanding our understanding of the genetic origins of disease.

Create a new conda environment with all dependencies installed
kipoi env create APARENT/site_probabilities
conda activate kipoi-APARENT__site_probabilities
Install model dependencies into current environment
kipoi env install APARENT/site_probabilities
Test the model
kipoi test APARENT/site_probabilities --source=kipoi
Make a prediction
kipoi get-example APARENT/site_probabilities -o example
kipoi predict APARENT/site_probabilities \
  --dataloader_args='{"fasta_file": "example/chr22.fa", "gtf_file": "example/chr22.gtf.gz"}' \
  -o '/tmp/APARENT|site_probabilities.example_pred.tsv'
# check the results
head '/tmp/APARENT|site_probabilities.example_pred.tsv'
Get the model (Python)
import kipoi
model = kipoi.get_model('APARENT/site_probabilities')
Make a prediction for example files
pred = model.pipeline.predict_example(batch_size=4)
Use dataloader and model separately
# Download example dataloader kwargs
dl_kwargs = model.default_dataloader.download_example('example')
# Get the dataloader and instantiate it
dl = model.default_dataloader(**dl_kwargs)
# get a batch iterator
it = dl.batch_iter(batch_size=4)
# predict for a batch
batch = next(it)
model.predict_on_batch(batch['inputs'])
Make predictions for custom files directly
pred = model.pipeline.predict(dl_kwargs, batch_size=4)
Get the model (R, via reticulate)
library(reticulate)
kipoi <- import('kipoi')
model <- kipoi$get_model('APARENT/site_probabilities')
Make a prediction for example files
predictions <- model$pipeline$predict_example()
Use dataloader and model separately
# Download example dataloader kwargs
dl_kwargs <- model$default_dataloader$download_example('example')
# Get the dataloader
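# dl_kwargs is a named list, so spread it into keyword arguments
dl <- do.call(model$default_dataloader, dl_kwargs)
# get a batch iterator (4L passes an integer rather than a double to Python)
it <- dl$batch_iter(batch_size=4L)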
# predict for a batch
batch <- iter_next(it)
model$predict_on_batch(batch$inputs)
Make predictions for custom files directly
pred <- model$pipeline$predict(dl_kwargs, batch_size=4L)
Get the docker image
docker pull kipoi/kipoi-docker:aparent-site_probabilities
Get the activated conda environment inside the container
docker run -it kipoi/kipoi-docker:aparent-site_probabilities
Test the model
docker run kipoi/kipoi-docker:aparent-site_probabilities kipoi test APARENT/site_probabilities --source=kipoi
Make predictions for custom files directly
# Create an example directory containing the data
mkdir -p $PWD/kipoi-example 
# You can replace $PWD/kipoi-example with a different absolute path containing the data 
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:aparent-site_probabilities \
kipoi get-example APARENT/site_probabilities -o /app/example 
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:aparent-site_probabilities \
kipoi predict APARENT/site_probabilities \
--dataloader_args='{"fasta_file": "/app/example/chr22.fa", "gtf_file": "/app/example/chr22.gtf.gz"}' \
-o '/app/APARENT_site_probabilities.example_pred.tsv' 
# check the results
head $PWD/kipoi-example/APARENT_site_probabilities.example_pred.tsv
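
The `head` check above can also be done from Python. The following is a minimal sketch that reads the header and the first few rows of a tab-separated prediction file without assuming any column names. `preview_tsv` is a hypothetical helper, not part of Kipoi, and the toy file it reads is made up purely for illustration; it does not reflect the real column layout of the prediction output.

```python
import csv
import os
import tempfile
from itertools import islice


def preview_tsv(path, n_rows=5):
    """Return (header, first n_rows data rows) of a tab-separated file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])          # empty list if the file is empty
        rows = list(islice(reader, n_rows))
    return header, rows


# Toy file with made-up columns, standing in for a real prediction TSV.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write("col_a\tcol_b\n1\t2\n3\t4\n5\t6\n")
    toy_path = tmp.name

header, rows = preview_tsv(toy_path, n_rows=2)
print(header)
print(rows)
os.remove(toy_path)
```

To inspect real output, pass the path given to `-o` in the `kipoi predict` call instead of the toy file.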

Schema

Inputs

Single numpy array

Name: seq

    Shape: (205, 4) 

    Doc: 205 bp sequence spanning the polyA cut site
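
As an illustration of the (205, 4) input shape, here is a minimal sketch of one-hot encoding a 205 bp sequence in plain Python. It is not the dataloader's own code: `one_hot_encode` is a hypothetical helper, and both the A, C, G, T channel order and the all-zero encoding of other bases (such as N) are assumptions that should be checked against the dataloader source before use.

```python
# Hypothetical helper, not part of Kipoi or APARENT.
SEQ_LEN = 205      # from the schema: input shape (205, 4)
ALPHABET = "ACGT"  # ASSUMPTION: channel order A, C, G, T


def one_hot_encode(seq, alphabet=ALPHABET, seq_len=SEQ_LEN):
    """Return a seq_len x len(alphabet) nested list of 0.0/1.0 values."""
    if len(seq) != seq_len:
        raise ValueError(f"expected {seq_len} bases, got {len(seq)}")
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for base in seq.upper():
        row = [0.0] * len(alphabet)
        # ASSUMPTION: bases outside the alphabet (e.g. N) stay all-zero.
        if base in index:
            row[index[base]] = 1.0
        encoded.append(row)
    return encoded


example = one_hot_encode("ACGTN" * 41)  # 5 * 41 = 205 bases
print(len(example), len(example[0]))    # number of rows, number of channels
```

The nested list can be converted with `numpy.asarray(example)` if an array is needed; in normal use the default dataloader builds these inputs from the FASTA and GTF files.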


Targets

Single numpy array

Name: None

    Shape: (206,) 

    Doc: Predicts 206 features: 1 prediction for % proximal isoform + 205 predictions for % cleavage at each position
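
To make the 206-value layout concrete, the sketch below splits one prediction vector into the isoform value and the 205 per-position cleavage values. The helper names are hypothetical, and the position of the isoform value inside the vector is an assumption: the default follows the order in which the schema text lists the outputs, but it may be the last element instead, so verify it against the model source.

```python
N_POSITIONS = 205  # from the schema: one cleavage value per input position


def split_prediction(pred, isoform_index=0):
    """Split one 206-value prediction into (isoform_value, cleavage_values).

    ASSUMPTION: isoform_index=0 follows the order of the schema text; pass
    isoform_index=-1 if the isoform value turns out to be the last element.
    """
    values = list(pred)
    if len(values) != N_POSITIONS + 1:
        raise ValueError(f"expected {N_POSITIONS + 1} values, got {len(values)}")
    isoform_index %= len(values)  # maps -1 to the last index
    isoform_value = values[isoform_index]
    cleavage_values = values[:isoform_index] + values[isoform_index + 1:]
    return isoform_value, cleavage_values


def top_cleavage_position(cleavage_values):
    """Return the 0-based position with the highest predicted cleavage."""
    return max(range(len(cleavage_values)), key=cleavage_values.__getitem__)


# Toy vector standing in for one row of model output (not real predictions).
toy = [0.25] + [0.0] * 100 + [0.75] + [0.0] * 104
isoform, cleavage = split_prediction(toy)
print(isoform, len(cleavage), top_cleavage_position(cleavage))
```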


Dataloader

Defined as: ./

Doc: Dataloader for APARENT sequence scoring

Authors: Shabnam Sadegharmaki, Ziga Avsec, Muhammed Hasan Çelik, Florian R. Hölzlwimmer

Type: SampleIterator

License: MIT


Arguments

fasta_file : file path; Reference genome sequence FASTA file

gtf_file : file path; Genome annotation GTF file
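
Because both arguments are passed to the command line as a JSON string, shell quoting is easy to get wrong. The sketch below builds the `kipoi predict` command with `json.dumps` and `shlex.quote` so that the JSON always uses double quotes and survives the shell intact. `build_predict_command` is a hypothetical helper, and the output path in the example is an arbitrary placeholder.

```python
import json
import shlex

MODEL = "APARENT/site_probabilities"


def build_predict_command(fasta_file, gtf_file, output_file, model=MODEL):
    """Return a shell-safe `kipoi predict` command line as a string."""
    dataloader_args = json.dumps({"fasta_file": fasta_file, "gtf_file": gtf_file})
    parts = [
        "kipoi", "predict", model,
        f"--dataloader_args={dataloader_args}",
        "-o", output_file,
    ]
    # shlex.quote wraps each part so JSON braces and quotes reach kipoi unchanged.
    return " ".join(shlex.quote(part) for part in parts)


command = build_predict_command(
    "example/chr22.fa",
    "example/chr22.gtf.gz",
    "/tmp/example_pred.tsv",  # placeholder output path
)
print(command)
```

The resulting string can be pasted into a POSIX shell; alternatively, the unquoted `parts` list could be handed directly to `subprocess.run`, which avoids shell quoting altogether.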


Model dependencies
conda:
  • python=3.9
  • tensorflow
  • keras>=2.0.4,<3

pip:

Dataloader dependencies
conda:
  • python=3.9
  • bioconda::kipoi
  • bioconda::cyvcf2
  • bioconda::pyranges
  • pip>=21.0.1

pip: