APARENT/site_probabilities

Authors: Nicholas Bogard, Johannes Linder

License: MIT

Contributed by: Shabnam Sadegharmaki, Ziga Avsec, Muhammed Hasan Çelik, Florian R. Hölzlwimmer

Cite as: https://doi.org/10.1101/300061

Type: None

Postprocessing: None

Trained on: isoform expression data from over 3 million APA reporters, built by inserting random sequence into 12 distinct 3'UTR contexts.

Source files

Predicting the Impact of cis-Regulatory Variation on Alternative Polyadenylation

Abstract

Alternative polyadenylation (APA) is a major driver of transcriptome diversity in human cells. Here, we use deep learning to predict APA from DNA sequence alone. We trained our model (APARENT, APA REgression NeT) on isoform expression data from over three million APA reporters, built by inserting random sequence into twelve distinct 3′UTR contexts. Predictions are highly accurate across both synthetic and genomic contexts; when tasked with inferring APA in human 3′UTRs, APARENT outperforms models trained exclusively on endogenous data. Visualizing features learned across all network layers reveals that APARENT recognizes sequence motifs known to recruit APA regulators, discovers previously unknown sequence determinants of cleavage site selection, and integrates these features into a comprehensive, interpretable cis-regulatory code. Finally, we use APARENT to quantify the impact of genetic variants on APA. Our approach detects pathogenic variants in a wide range of disease contexts, expanding our understanding of the genetic origins of disease.

Create a new conda environment with all dependencies installed
kipoi env create APARENT/site_probabilities
source activate kipoi-APARENT__site_probabilities
Test the model
kipoi test APARENT/site_probabilities --source=kipoi
Make a prediction
kipoi get-example APARENT/site_probabilities -o example
kipoi predict APARENT/site_probabilities \
  --dataloader_args='{"fasta_file": "example/chr22.fa", "gtf_file": "example/chr22.gtf.gz"}' \
  -o '/tmp/APARENT|site_probabilities.example_pred.tsv'
# check the results
head '/tmp/APARENT|site_probabilities.example_pred.tsv'
Create a new conda environment with all dependencies installed
kipoi env create APARENT/site_probabilities
source activate kipoi-APARENT__site_probabilities
Get the model
import kipoi
model = kipoi.get_model('APARENT/site_probabilities')
Make a prediction for example files
pred = model.pipeline.predict_example(batch_size=4)
Use dataloader and model separately
# Download example dataloader kwargs
dl_kwargs = model.default_dataloader.download_example('example')
# Get the dataloader and instantiate it
dl = model.default_dataloader(**dl_kwargs)
# get a batch iterator
batch_iterator = dl.batch_iter(batch_size=4)
for batch in batch_iterator:
    # predict for a batch
    batch_pred = model.predict_on_batch(batch['inputs'])
Make predictions for custom files directly
pred = model.pipeline.predict(dl_kwargs, batch_size=4)
Get the model
library(reticulate)
kipoi <- import('kipoi')
model <- kipoi$get_model('APARENT/site_probabilities')
Make a prediction for example files
predictions <- model$pipeline$predict_example()
Use dataloader and model separately
# Download example dataloader kwargs
dl_kwargs <- model$default_dataloader$download_example('example')
# Get the dataloader, passing the kwargs as named arguments
dl <- do.call(model$default_dataloader, dl_kwargs)
# get a batch iterator
it <- dl$batch_iter(batch_size=4)
# predict for a batch
batch <- iter_next(it)
model$predict_on_batch(batch$inputs)
Make predictions for custom files directly
pred <- model$pipeline$predict(dl_kwargs, batch_size=4)
Get the docker image
docker pull kipoi/kipoi-docker:aparent-site_probabilities-slim
Get the full sized docker image
docker pull kipoi/kipoi-docker:aparent-site_probabilities
Get the activated conda environment inside the container
docker run -it kipoi/kipoi-docker:aparent-site_probabilities-slim
Test the model
docker run kipoi/kipoi-docker:aparent-site_probabilities-slim kipoi test APARENT/site_probabilities --source=kipoi
Make prediction for custom files directly
# Create an example directory containing the data
mkdir -p $PWD/kipoi-example 
# You can replace $PWD/kipoi-example with a different absolute path containing the data 
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:aparent-site_probabilities-slim \
kipoi get-example APARENT/site_probabilities -o /app/example 
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:aparent-site_probabilities-slim \
kipoi predict APARENT/site_probabilities \
--dataloader_args='{"fasta_file": "/app/example/chr22.fa", "gtf_file": "/app/example/chr22.gtf.gz"}' \
-o '/app/APARENT_site_probabilities.example_pred.tsv' 
# check the results
head $PWD/kipoi-example/APARENT_site_probabilities.example_pred.tsv
    
Install apptainer
https://apptainer.org/docs/user/main/quick_start.html#quick-installation-steps
Make prediction for custom files directly
kipoi get-example APARENT/site_probabilities -o example
kipoi predict APARENT/site_probabilities \
--dataloader_args='{"fasta_file": "example/chr22.fa", "gtf_file": "example/chr22.gtf.gz"}' \
-o 'APARENT_site_probabilities.example_pred.tsv' \
--singularity 
# check the results
head APARENT_site_probabilities.example_pred.tsv

Schema

Inputs

Single numpy array

Name: seq

    Shape: (205, 4) 

    Doc: 205 bp sequence around the polyA cut site
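
The sketch below illustrates how a 205 bp sequence could map onto an array of shape (205, 4). It assumes one-hot encoding with channel order A, C, G, T and all-zero rows for ambiguous bases; neither assumption is stated in the schema, so check the dataloader before relying on them. The helper `one_hot_encode` is hypothetical and uses nested lists rather than numpy to stay dependency-free. In normal use the dataloader prepares this input.

```python
# Assumed channel order; the schema above does not specify it.
CHANNELS = "ACGT"
SEQ_LEN = 205  # input length taken from the schema


def one_hot_encode(seq, seq_len=SEQ_LEN):
    """Return a seq_len x 4 nested list of floats.

    Bases outside CHANNELS (for example N) become all-zero rows,
    which is an illustrative choice, not documented model behaviour.
    """
    seq = seq.upper()
    if len(seq) != seq_len:
        raise ValueError(f"expected {seq_len} bases, got {len(seq)}")
    return [
        [1.0 if base == channel else 0.0 for channel in CHANNELS]
        for base in seq
    ]


# Synthetic 205 bp sequence, for illustration only.
encoded = one_hot_encode("ACGTN" * 41)
shape = (len(encoded), len(encoded[0]))
print(shape)
```

Wrapping the nested list with `numpy.asarray(encoded)` would give an array with the same (205, 4) shape.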


Targets

Dictionary of numpy arrays

Name: distal_prop

    Shape: (1,) 

    Doc: Predicts the proportion of cleavage occurring outside of the specified DNA range

Name: site_props

    Shape: (205,) 

    Doc: Predicts the proportion of cleavage occurring at each position in the specified DNA range. The sum of all site_props plus distal_prop equals 1.
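
To make the relationship between the two targets concrete, here is a minimal sketch that checks that the proportions sum to 1 and reports the most likely cleavage position. The function `summarize_prediction`, its tolerance, and the input values are hypothetical; the values are synthetic and are not model output.

```python
def summarize_prediction(site_props, distal_prop, tol=1e-3):
    """Check the sum-to-one property and locate the strongest site.

    site_props: 205 per-position cleavage proportions.
    distal_prop: proportion of cleavage outside the 205 bp window.
    tol: illustrative tolerance for floating-point error.
    """
    proximal_total = sum(site_props)
    total = proximal_total + distal_prop
    if abs(total - 1.0) > tol:
        raise ValueError(f"proportions sum to {total}, expected 1")
    # Index (0-based) of the position with the highest proportion.
    top_position = max(range(len(site_props)), key=site_props.__getitem__)
    return {
        "top_position": top_position,
        "top_prop": site_props[top_position],
        "proximal_total": proximal_total,
    }


# Synthetic example: most cleavage at position 100, a quarter outside the window.
site_props = [0.0] * 205
site_props[100] = 0.5
site_props[101] = 0.25
distal_prop = 0.25

summary = summarize_prediction(site_props, distal_prop)
print(summary)
```

With real predictions, each array would first be converted to a plain list (for example with `.tolist()`), one sample at a time.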


Dataloader

Defined as: ./

Doc: Dataloader for APARENT sequence scoring

Authors: Shabnam Sadegharmaki, Ziga Avsec, Muhammed Hasan Çelik, Florian R. Hölzlwimmer

Type: SampleIterator

License: MIT


Arguments

fasta_file : Reference genome sequence

gtf_file : file path; Genome annotation GTF file
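
Because `--dataloader_args` expects these two arguments as a JSON object, hand-written quoting is easy to get wrong. A minimal sketch, assuming the example file paths used elsewhere on this page and a hypothetical output name `example_pred.tsv`, builds the string with `json.dumps` and quotes it for the shell with `shlex.quote`:

```python
import json
import shlex

# Paths follow the example files on this page; replace with your own.
dataloader_args = {
    "fasta_file": "example/chr22.fa",
    "gtf_file": "example/chr22.gtf.gz",
}

# json.dumps always emits double-quoted keys and values, i.e. valid JSON.
args_json = json.dumps(dataloader_args)

command = [
    "kipoi", "predict", "APARENT/site_probabilities",
    f"--dataloader_args={args_json}",
    "-o", "example_pred.tsv",  # hypothetical output file name
]

# Quote each part so the line can be pasted into a POSIX shell.
shell_line = " ".join(shlex.quote(part) for part in command)
print(shell_line)
```

Passing `command` as a list to `subprocess.run` avoids the shell altogether, so no quoting is needed in that case.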


Model dependencies
conda:
  • python=3.9
  • tensorflow
  • keras>=2.0.4,<3

pip:

Dataloader dependencies
conda:
  • python=3.9
  • bioconda::kipoi
  • bioconda::kipoiseq>=0.7.1
  • bioconda::cyvcf2
  • bioconda::pyranges

pip: