kipoi.writers
Writers used in kipoi predict
- TsvBatchWriter
- BedBatchWriter
- HDF5BatchWriter
- ZarrBatchWriter
- RegionWriter
- BedGraphWriter
- BigWigWriter
TsvBatchWriter
TsvBatchWriter(self, file_path, nested_sep='/')
Tab-separated file writer
Arguments
- file_path (str): File path of the output tsv file
- nested_sep (str): Separator used when flattening the nested dictionary structure into a single key
batch_write
TsvBatchWriter.batch_write(self, batch)
Write a batch of data
Arguments
- batch: batch of data. Either a single np.array or a list/dict thereof.
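TsvBatchWriter flattens each nested batch dict with nested_sep before writing rows. A minimal stdlib sketch of that behavior (flatten is a hypothetical helper written here for illustration, not part of kipoi):

```python
import csv
import io

def flatten(d, parent_key="", sep="/"):
    """Flatten a nested dict into a single-level dict, joining keys with sep."""
    items = {}
    for key, value in d.items():
        new_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items

# A batch shaped like those kipoi dataloaders yield: a nested dict of columns.
batch = {"preds": {"a": [0.1, 0.2], "b": [0.3, 0.4]}}
flat = flatten(batch)  # {'preds/a': [0.1, 0.2], 'preds/b': [0.3, 0.4]}

out = io.StringIO()
writer = csv.writer(out, delimiter="\t")
writer.writerow(flat.keys())           # header built from the flattened keys
writer.writerows(zip(*flat.values()))  # one row per batch element
print(out.getvalue())
```

The real writer accepts np.arrays as leaf values; plain lists are used here to keep the sketch dependency-free.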
BedBatchWriter
BedBatchWriter(self, file_path, metadata_schema, header=True)
Bed-file writer
Arguments
- file_path (str): File path of the output bed file
- metadata_schema: Metadata schema of the dataloader. Used to find the ranges object
- header (bool): If True, write a header line to the output file
batch_write
BedBatchWriter.batch_write(self, batch)
Write a batch of data to bed file
Arguments
- batch: batch of data. Either a single np.array or a list/dict thereof.
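A rough sketch of how ranges metadata maps onto BED columns. The batch layout below is illustrative; the real writer locates the ranges object through the dataloader's metadata schema:

```python
# Hypothetical batch carrying a "metadata"/"ranges" entry, as kipoi
# dataloaders commonly provide, plus a prediction column.
batch = {
    "metadata": {"ranges": {"chr": ["chr1", "chr1"], "start": [0, 10],
                            "end": [4, 14], "id": ["r1", "r2"],
                            "strand": ["+", "-"]}},
    "preds": [0.5, 0.7],
}

ranges = batch["metadata"]["ranges"]
lines = []
for i in range(len(ranges["chr"])):
    # BED6 columns: chrom, start, end, name, score, strand;
    # prediction values are appended as extra columns.
    lines.append("\t".join(map(str, [ranges["chr"][i], ranges["start"][i],
                                     ranges["end"][i], ranges["id"][i],
                                     0, ranges["strand"][i],
                                     batch["preds"][i]])))
print("\n".join(lines))
```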
HDF5BatchWriter
HDF5BatchWriter(self, file_path, chunk_size=10000, compression='gzip')
HDF5 file writer
Arguments
- file_path (str): File path of the output .h5 file
- chunk_size (int): Chunk size for storing the arrays
- compression (str): Default compression to use for the HDF5 datasets. See also: http://docs.h5py.org/en/latest/high/dataset.html#dataset-compression
batch_write
HDF5BatchWriter.batch_write(self, batch)
Write a batch of data to the HDF5 file
Arguments
- batch: batch of data. Either a single np.array or a list/dict thereof.
close
HDF5BatchWriter.close(self)
Close the file handle
dump
HDF5BatchWriter.dump(file_path, batch)
Write a batch of data to a file and close the file in a single call.
Arguments
- file_path: file path
- batch: batch of data. Either a single np.array or a list/dict thereof.
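One plausible reading of chunk_size is that incoming batches are buffered and flushed in fixed-size blocks. ChunkedBuffer below is a hypothetical stand-in that illustrates only that buffering idea; the real writer appends to resizable h5py datasets on disk:

```python
class ChunkedBuffer:
    """Buffer rows and flush them in fixed-size chunks,
    mimicking chunked appends to an on-disk dataset."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.buffer = []
        self.flushed = []  # stands in for data already written to disk

    def batch_write(self, batch):
        self.buffer.extend(batch)
        while len(self.buffer) >= self.chunk_size:
            self.flushed.append(self.buffer[:self.chunk_size])
            self.buffer = self.buffer[self.chunk_size:]

    def close(self):
        # Flush whatever remains, then the "file" is complete.
        if self.buffer:
            self.flushed.append(self.buffer)
            self.buffer = []

w = ChunkedBuffer(chunk_size=4)
w.batch_write([1, 2, 3])
w.batch_write([4, 5, 6])
w.close()
print(w.flushed)  # [[1, 2, 3, 4], [5, 6]]
```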
ZarrBatchWriter
ZarrBatchWriter(self, file_path, chunk_size=10000, store=None, string_dtype=None, compressor=None)
Zarr file writer
Arguments
- file_path (str): File path of the output zarr file
- chunk_size (int): Chunk size for storing the arrays
- store: zarr.storage store. If not specified, it is inferred from the file name: .lmdb.zarr uses LMDB, .zip.zarr uses Zip, and any other name uses DirectoryStore
- compressor: Zarr compressor from numcodecs. Example: Blosc(cname='zstd', clevel=3, shuffle=Blosc.BITSHUFFLE)
- string_dtype: How to encode strings. If None, variable-length encoding is used
batch_write
ZarrBatchWriter.batch_write(self, batch)
Write a batch of data to the zarr file
Arguments
- batch: batch of data. Either a single np.array or a list/dict thereof.
close
ZarrBatchWriter.close(self)
Close the file handle
dump
ZarrBatchWriter.dump(file_path, batch)
Write a batch of data to a file and close the file in a single call.
Arguments
- file_path: file path
- batch: batch of data. Either a single np.array or a list/dict thereof.
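The suffix-based store inference described above can be sketched as a simple dispatch on the file name. The function below is illustrative only and returns store names as strings; the real writer constructs zarr.storage objects:

```python
def infer_store(file_path):
    """Pick a zarr store from the file-name suffix (illustrative;
    the real writer returns zarr.storage objects, not strings)."""
    if file_path.endswith(".lmdb.zarr"):
        return "LMDBStore"
    if file_path.endswith(".zip.zarr"):
        return "ZipStore"
    return "DirectoryStore"

print(infer_store("out.lmdb.zarr"))  # LMDBStore
print(infer_store("out.zip.zarr"))   # ZipStore
print(infer_store("out.zarr"))       # DirectoryStore
```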
BedGraphWriter
BedGraphWriter(self, file_path)
Arguments
- file_path (str): File path of the output bedgraph file
region_write
BedGraphWriter.region_write(self, region, data)
Write region to file.
Arguments
- region: Defines the region that will be written position by position. Example: {"chr": "chr1", "start": 0, "end": 4}
- data: a 1D or 2D numpy array with length "end" - "start". If a 2D array is passed, data.sum(axis=1) is applied to it first.
write_entry
BedGraphWriter.write_entry(self, chr, start, end, value)
Write a single entry to the file.
Arguments
- chr (str): Chromosome name
- start (int): Interval start (0-based)
- end (int): Interval end
- value: Value to write for the interval
close
BedGraphWriter.close(self)
Close the file
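The position-by-position expansion that region_write performs can be sketched without numpy. region_to_bedgraph is a hypothetical helper that mirrors the documented behavior, including summing a 2D input over its second axis:

```python
def region_to_bedgraph(region, data):
    """Expand a region position by position into bedGraph lines.
    If data is 2D, sum over the second axis first (no numpy needed here)."""
    if data and isinstance(data[0], (list, tuple)):
        data = [sum(row) for row in data]
    assert len(data) == region["end"] - region["start"]
    lines = []
    for offset, value in enumerate(data):
        pos = region["start"] + offset
        # bedGraph columns: chrom, start, end, value (one base per line).
        lines.append(f'{region["chr"]}\t{pos}\t{pos + 1}\t{value}')
    return lines

print(region_to_bedgraph({"chr": "chr1", "start": 0, "end": 4}, [1, 2, 3, 4]))
```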
BigWigWriter
BigWigWriter(self, file_path, genome_file=None, chrom_sizes=None, is_sorted=True)
Arguments
- file_path (str): File path of the output BigWig file
- genome_file: Genome file containing chromosome sizes. Can be None. Overridden by chrom_sizes.
- chrom_sizes: A list of tuples containing chromosome sizes. If not None, it overrides genome_file.
- is_sorted: If True, the provided entries need to be sorted beforehand
Note: one of genome_file or chrom_sizes must not be None.
region_write
BigWigWriter.region_write(self, region, data)
Write region to file. Note: the written regions need to be sorted beforehand.
Arguments
- region: a kipoi.metadata.GenomicRanges, a pybedtools.Interval, or a dictionary with at least the keys "chr", "start", "end" and list values. Example: {"chr": "chr1", "start": 0, "end": 4}
- data: a 1D array of values to be written, where the 0th entry corresponds to the 0-based "start"
close
BigWigWriter.close(self)
Close the file
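Assuming genome_file follows the common two-column chrom/size layout (as in UCSC .chrom.sizes files), turning it into the list of tuples that chrom_sizes expects is a one-liner. This sketch is illustrative; read_chrom_sizes is not part of kipoi:

```python
import io

def read_chrom_sizes(handle):
    """Parse a genome file (chrom<TAB>size per line) into (name, size) tuples."""
    return [(name, int(size)) for name, size in
            (line.split() for line in handle if line.strip())]

# An in-memory file stands in for a real genome file on disk.
genome_file = io.StringIO("chr1\t248956422\nchr2\t242193529\n")
print(read_chrom_sizes(genome_file))
```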