scbasset
Type: keras
Postprocessing: None
Trained on: Of 103,151 total peaks, 5,158 were randomly reserved for testing and 5,157 for validation, leaving 92,836 for training.
This is the scBasset model published by Han Yuan and David Kelley. It predicts a binary scATAC peak-by-cell accessibility matrix from DNA sequences. This model was trained on the scATAC binary matrix and peak set provided by Chen et al. (https://github.com/pinellolab/scATAC-benchmarking/blob/master/Real_Data/Buenrostro_2018/input), which contains 103,151 peaks and 2,034 cells after filtering out peaks accessible in <1% of cells. The model takes 1344 bp sequences as input. The input tensor must have shape (N, 1344, 4) for N samples, a 1344 bp window and 4 nucleotides. For each sample, the model predicts 2,034 probabilities of accessible chromatin, one per cell.
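To make the expected input shape concrete, here is a minimal sketch of one-hot encoding sequences into an (N, 1344, 4) array with NumPy. The helper names `one_hot_encode` and `make_batch` are hypothetical, the A, C, G, T channel order is an assumption, and unknown bases are simply left as zeros here, which may differ from what the default dataloader does. In normal use the dataloader produces this array for you.

```python
import numpy as np

SEQ_LEN = 1344     # input window length stated in the model description
ALPHABET = "ACGT"  # assumed channel order; verify against the dataloader


def one_hot_encode(seq, alphabet=ALPHABET):
    """Return a (len(seq), 4) float32 array; unknown bases such as N stay all-zero."""
    encoded = np.zeros((len(seq), len(alphabet)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        idx = alphabet.find(base)
        if idx >= 0:
            encoded[pos, idx] = 1.0
    return encoded


def make_batch(seqs):
    """Stack 1344 bp sequences into an (N, 1344, 4) array."""
    for seq in seqs:
        if len(seq) != SEQ_LEN:
            raise ValueError(f"expected {SEQ_LEN} bp, got {len(seq)}")
    return np.stack([one_hot_encode(s) for s in seqs])


# Two synthetic sequences, for illustration only
batch = make_batch(["ACGT" * 336, "N" * SEQ_LEN])
print(batch.shape)  # (2, 1344, 4)
```

An array built this way could be passed to `model.predict_on_batch`, provided the assumed channel order matches the one used in training.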
kipoi env create scbasset
source activate kipoi-scbasset
kipoi test scbasset --source=kipoi
kipoi get-example scbasset -o example
kipoi predict scbasset \
--dataloader_args='{"intervals_file": "example/intervals_file", "fasta_file": "example/fasta_file"}' \
-o '/tmp/scbasset.example_pred.tsv'
# check the results
head '/tmp/scbasset.example_pred.tsv'
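Beyond `head`, the prediction file can be read programmatically. The sketch below uses only the standard library and assumes that prediction columns in the TSV header start with `preds/` followed by an integer index; that prefix and the helper name `load_predictions` are assumptions, so inspect the header of your own file first.

```python
import csv

PRED_PREFIX = "preds/"  # assumed column prefix; check the header of your file


def load_predictions(path, prefix=PRED_PREFIX):
    """Return one list of floats per row, taken from columns starting with prefix."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        pred_cols = [c for c in reader.fieldnames if c.startswith(prefix)]
        # Sort by the integer suffix so that preds/10 follows preds/9
        pred_cols.sort(key=lambda c: int(c[len(prefix):]))
        return [[float(row[c]) for c in pred_cols] for row in reader]
```

For example, `rows = load_predictions('/tmp/scbasset.example_pred.tsv')` would give one list per interval, and each list should hold 2,034 values if the column assumption holds.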
kipoi env create scbasset
source activate kipoi-scbasset
import kipoi
model = kipoi.get_model('scbasset')
pred = model.pipeline.predict_example(batch_size=4)
# Download example dataloader kwargs
dl_kwargs = model.default_dataloader.download_example('example')
# Get the dataloader and instantiate it
dl = model.default_dataloader(**dl_kwargs)
# get a batch iterator
batch_iterator = dl.batch_iter(batch_size=4)
for batch in batch_iterator:
    # predict for a batch
    batch_pred = model.predict_on_batch(batch['inputs'])
pred = model.pipeline.predict(dl_kwargs, batch_size=4)
library(reticulate)
kipoi <- import('kipoi')
model <- kipoi$get_model('scbasset')
predictions <- model$pipeline$predict_example()
# Download example dataloader kwargs
dl_kwargs <- model$default_dataloader$download_example('example')
# Get the dataloader
dl <- model$default_dataloader(dl_kwargs)
# get a batch iterator
it <- dl$batch_iter(batch_size=4)
# predict for a batch
batch <- iter_next(it)
model$predict_on_batch(batch$inputs)
pred <- model$pipeline$predict(dl_kwargs, batch_size=4)
docker pull kipoi/kipoi-docker:sharedpy3keras2tf2-slim
docker pull kipoi/kipoi-docker:sharedpy3keras2tf2
docker run -it kipoi/kipoi-docker:sharedpy3keras2tf2-slim
docker run kipoi/kipoi-docker:sharedpy3keras2tf2-slim kipoi test scbasset --source=kipoi
# Create an example directory containing the data
mkdir -p $PWD/kipoi-example
# You can replace $PWD/kipoi-example with a different absolute path containing the data
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:sharedpy3keras2tf2-slim \
kipoi get-example scbasset -o /app/example
docker run -v $PWD/kipoi-example:/app/ kipoi/kipoi-docker:sharedpy3keras2tf2-slim \
kipoi predict scbasset \
--dataloader_args='{"intervals_file": "/app/example/intervals_file", "fasta_file": "/app/example/fasta_file"}' \
-o '/app/scbasset.example_pred.tsv'
# check the results
head $PWD/kipoi-example/scbasset.example_pred.tsv
https://apptainer.org/docs/user/main/quick_start.html#quick-installation-steps
kipoi get-example scbasset -o example
kipoi predict scbasset \
--dataloader_args='{"intervals_file": "example/intervals_file", "fasta_file": "example/fasta_file"}' \
-o 'scbasset.example_pred.tsv' \
--singularity
# check the results
head scbasset.example_pred.tsv
Defined as: kipoiseq.dataloaders.SeqIntervalDl
Doc: Dataloader for a combination of fasta and tab-delimited input files such as bed files. The dataloader extracts regions from the fasta file as defined in the tab-delimited `intervals_file` and converts them into one-hot encoded format. Returned sequences are of the type np.array with the shape inferred from the arguments: `alphabet_axis` and `dummy_axis`.
Authors: Ziga Avsec, Roman Kreuzhuber
Type: Dataset
License: MIT
Arguments
intervals_file : bed3+<columns> file path containing intervals + (optionally) labels
fasta_file : Reference genome FASTA file path.
num_chr_fasta (optional): if True, the dataloader will make sure that the chromosome names don't start with chr.
label_dtype (optional): None, datatype of the task labels taken from the intervals_file. Example: str, int, float, np.float32
use_strand (optional): reverse-complement fasta sequence if bed file defines negative strand. Requires a bed6 file
ignore_targets (optional): if True, don't return any target variables
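As an illustration of preparing an `intervals_file`, the following sketch resizes peaks to 1344 bp windows centred on each peak midpoint and writes them as tab-delimited BED3. It assumes the intervals should already be 1344 bp wide when passed in; whether the dataloader resizes intervals itself is not stated here. The helper names and the peak coordinates are hypothetical.

```python
import csv

SEQ_LEN = 1344  # window length the model expects


def resize_interval(chrom, start, end, width=SEQ_LEN):
    """Return a BED3 tuple of fixed width centred on the interval midpoint."""
    mid = (start + end) // 2
    new_start = mid - width // 2
    if new_start < 0:
        raise ValueError("window would start before the beginning of the chromosome")
    return chrom, new_start, new_start + width


def write_intervals(peaks, path):
    """Write resized peaks as a tab-delimited BED3 file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for chrom, start, end in peaks:
            writer.writerow(resize_interval(chrom, start, end))


# Hypothetical peak coordinates, for illustration only
peaks = [("chr1", 10000, 10500), ("chr2", 52000, 52300)]
resized = [resize_interval(*p) for p in peaks]
```

Calling `write_intervals(peaks, "intervals.bed")` would then produce a file that can be supplied as `intervals_file` together with a matching reference genome as `fasta_file`. The sketch does not check the upper chromosome bound, which would require the chromosome sizes.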
- python=3.7
- tensorflow=2.6.0
- keras
- h5py
- pip=22.0.4
- bioconda::pysam=0.17
- cython
- kipoi
- kipoiseq
- bioconda::pybedtools
- bioconda::pyfaidx
- bioconda::pyranges
- numpy
- pandas
- kipoiseq