kipoi.writers

Writers used in kipoi predict

  • TsvBatchWriter
  • BedBatchWriter
  • HDF5BatchWriter
  • ZarrBatchWriter
  • RegionWriter
  • BedGraphWriter
  • BigWigWriter

TsvBatchWriter

TsvBatchWriter(self, file_path, nested_sep='/')

Tab-separated file writer

Arguments

  • file_path (str): File path of the output tsv file
  • nested_sep: What separator to use for flattening the nested dictionary structure into a single key
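To illustrate what nested_sep controls, here is a minimal sketch of flattening a nested batch dictionary into single-level keys. The helper flatten_keys is hypothetical and is not kipoi's own implementation; it only assumes that nested keys are joined with the separator.

```python
def flatten_keys(nested, nested_sep="/", prefix=""):
    """Flatten a nested dict into {joined_key: value} (hypothetical helper)."""
    flat = {}
    for key, value in nested.items():
        # Join the parent path and the current key with the separator
        full_key = f"{prefix}{nested_sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, extending the key path
            flat.update(flatten_keys(value, nested_sep, full_key))
        else:
            flat[full_key] = value
    return flat


batch = {"preds": {"task1": [0.1, 0.9], "task2": [0.4, 0.6]}, "id": ["a", "b"]}
flat = flatten_keys(batch)
print(sorted(flat))  # ['id', 'preds/task1', 'preds/task2']
```

Each flattened key can then serve as one column name in a tab-separated file.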

batch_write

TsvBatchWriter.batch_write(self, batch)

Write a batch of data

Arguments

  • batch: batch of data. Either a single np.array or a list/dict thereof.

BedBatchWriter

BedBatchWriter(self, file_path, metadata_schema, header=True)

Bed-file writer

Arguments

  • file_path (str): File path of the output bed file
  • metadata_schema: Schema of the dataloader metadata. Used to find the ranges object
  • header (bool): Whether to write a header line to the output file

batch_write

BedBatchWriter.batch_write(self, batch)

Write a batch of data to bed file

Arguments

  • batch: batch of data. Either a single np.array or a list/dict thereof.

HDF5BatchWriter

HDF5BatchWriter(self, file_path, chunk_size=10000, compression='gzip')

HDF5 file writer

Arguments

  • file_path (str): File path of the output .h5 file
  • chunk_size (int): Chunk size for storing the files
  • compression (str): Default compression to use for the hdf5 datasets. See also: http://docs.h5py.org/en/latest/high/dataset.html#dataset-compression

batch_write

HDF5BatchWriter.batch_write(self, batch)

Write a batch of data to the HDF5 file

Arguments

  • batch: batch of data. Either a single np.array or a list/dict thereof.

close

HDF5BatchWriter.close(self)

Close the file handle

dump

HDF5BatchWriter.dump(file_path, batch)

In a single shot write the batch/data to a file and close the file.

Arguments

  • file_path: file path
  • batch: batch of data. Either a single np.array or a list/dict thereof.
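The relationship between batch_write, close and dump can be sketched with a toy writer. ToyBatchWriter below is a hypothetical stand-in that writes plain text lines; it is not the HDF5 writer and only illustrates the pattern of streaming several batches versus writing everything in one shot.

```python
import os
import tempfile


class ToyBatchWriter:
    """Hypothetical stand-in for a batch writer: one text line per item."""

    def __init__(self, file_path):
        self.file_path = file_path
        self.f = open(file_path, "w")

    def batch_write(self, batch):
        for item in batch:
            self.f.write(f"{item}\n")

    def close(self):
        self.f.close()

    @classmethod
    def dump(cls, file_path, batch):
        # One-shot convenience: open, write a single batch, always close
        writer = cls(file_path)
        try:
            writer.batch_write(batch)
        finally:
            writer.close()


path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Streaming use: several batches, then an explicit close
writer = ToyBatchWriter(path)
writer.batch_write([1, 2])
writer.batch_write([3])
writer.close()
with open(path) as f:
    streamed = f.read().split()

# One-shot use: the whole data set in a single call
ToyBatchWriter.dump(path, [1, 2, 3])
with open(path) as f:
    dumped = f.read().split()
```

Streaming suits predictions produced batch by batch; the one-shot form is convenient when all the data is already in memory.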

ZarrBatchWriter

ZarrBatchWriter(self, file_path, chunk_size=10000, store=None, string_dtype=None, compressor=None)

Zarr file writer

Arguments

  • file_path (str): File path of the output zarr file
  • chunk_size (int): Chunk size for storing the files
  • store: zarr.storage. If not specified, it's inferred from the file name. For example, .lmdb.zarr uses LMDB, .zip.zarr uses Zip, and no special suffix uses DirectoryStore.
  • compressor: Zarr compressor from numcodecs. Example: Blosc(cname='zstd', clevel=3, shuffle=Blosc.BITSHUFFLE), with Blosc imported from numcodecs.
  • string_dtype: how to encode the string. If None, variable length is used
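The suffix rule for choosing a store when none is given can be sketched as follows. The function infer_store_kind is hypothetical and returns only a label; how the writer actually constructs the store object is not shown here.

```python
def infer_store_kind(file_path):
    """Return the store kind implied by the file name (hypothetical helper)."""
    if file_path.endswith(".lmdb.zarr"):
        return "LMDB"
    if file_path.endswith(".zip.zarr"):
        return "Zip"
    # No special suffix: fall back to a plain directory on disk
    return "DirectoryStore"


for name in ["preds.lmdb.zarr", "preds.zip.zarr", "preds.zarr"]:
    print(name, "->", infer_store_kind(name))
```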

batch_write

ZarrBatchWriter.batch_write(self, batch)

Write a batch of data to the zarr file

Arguments

  • batch: batch of data. Either a single np.array or a list/dict thereof.

close

ZarrBatchWriter.close(self)

Close the file handle

dump

ZarrBatchWriter.dump(file_path, batch)

In a single shot write the batch/data to a file and close the file.

Arguments

  • file_path: file path
  • batch: batch of data. Either a single np.array or a list/dict thereof.

BedGraphWriter

BedGraphWriter(self, file_path)

Arguments

  • file_path (str): File path of the output bedgraph file

region_write

BedGraphWriter.region_write(self, region, data)

Write region to file.

Arguments

  • region: Defines the region that will be written position by position. Example: {"chr":"chr1", "start":0, "end":4}.
  • data: a 1D or 2D numpy array whose length equals "end" - "start". If a 2D array is passed, data.sum(axis=1) is performed on it first.
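A minimal sketch of what writing a region "position by position" amounts to is given below. It uses plain Python lists instead of numpy arrays so that it runs without dependencies, and region_to_bedgraph_lines is a hypothetical helper rather than the writer's own code; summing each row stands in for data.sum(axis=1).

```python
def region_to_bedgraph_lines(region, data):
    """Expand one region into one bedGraph line per position (hypothetical)."""
    n_positions = region["end"] - region["start"]
    if len(data) != n_positions:
        raise ValueError("data length must equal end - start")
    # 2D input (list of rows): collapse each row to its sum
    if data and isinstance(data[0], (list, tuple)):
        data = [sum(row) for row in data]
    lines = []
    for offset, value in enumerate(data):
        start = region["start"] + offset
        # Each position becomes a one-base interval [start, start + 1)
        lines.append("\t".join([region["chr"], str(start), str(start + 1), str(value)]))
    return lines


lines_1d = region_to_bedgraph_lines({"chr": "chr1", "start": 0, "end": 4}, [1, 2, 3, 4])
lines_2d = region_to_bedgraph_lines({"chr": "chr1", "start": 10, "end": 12}, [[1, 2], [3, 4]])
print("\n".join(lines_1d))
```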

write_entry

BedGraphWriter.write_entry(self, chr, start, end, value)

Write a single entry to file.

Arguments

  • chr: chromosome name
  • start: start position of the interval
  • end: end position of the interval
  • value: value to write for the interval

close

BedGraphWriter.close(self)

Close the file

BigWigWriter

BigWigWriter(self, file_path, genome_file=None, chrom_sizes=None, is_sorted=True)

Arguments

  • file_path (str): File path of the output bigwig file
  • genome_file: genome file containing chromosome sizes. Can be None. Can be overridden by chrom_sizes.
  • chrom_sizes: a list of tuples containing chromosome sizes. If not None, it overrides genome_file.
  • is_sorted: if True, the provided entries need to be sorted beforehand

  • Note: At least one of genome_file or chrom_sizes must be provided.
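The precedence between genome_file and chrom_sizes can be sketched as below. The helper resolve_chrom_sizes is hypothetical, and the genome file layout (one "name<TAB>size" pair per line) is an assumption based on the common bedtools-style format, not something stated above.

```python
import os
import tempfile


def resolve_chrom_sizes(genome_file=None, chrom_sizes=None):
    """Return a list of (chromosome, size) tuples (hypothetical helper)."""
    if chrom_sizes is not None:
        # chrom_sizes takes precedence over genome_file
        return [(chrom, int(size)) for chrom, size in chrom_sizes]
    if genome_file is None:
        raise ValueError("Provide genome_file or chrom_sizes")
    sizes = []
    with open(genome_file) as f:
        for line in f:
            if line.strip():
                # Assumed layout: chromosome name, then size
                chrom, size = line.split()[:2]
                sizes.append((chrom, int(size)))
    return sizes


genome_path = os.path.join(tempfile.mkdtemp(), "genome.txt")
with open(genome_path, "w") as f:
    f.write("chr1\t1000\nchr2\t500\n")

from_file = resolve_chrom_sizes(genome_file=genome_path)
overridden = resolve_chrom_sizes(genome_file=genome_path, chrom_sizes=[("chrX", 42)])
```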

region_write

BigWigWriter.region_write(self, region, data)

Write region to file. Note: the written regions need to be sorted beforehand.

Arguments

  • region: a kipoi.metadata.GenomicRanges, pybedtools.Interval or a dictionary with at least the keys "chr", "start", "end" and list-values. Example: {"chr":"chr1", "start":0, "end":4}.
  • data: a 1D-array of values to be written - where the 0th entry is at 0-based "start"
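Since regions have to be sorted before they are written, one way to prepare them is sketched here. The exact ordering the writer expects is not spelled out above; this sketch assumes regions are ordered by a given chromosome order and then by start position, and sort_regions is a hypothetical helper.

```python
def sort_regions(regions, chrom_order):
    """Sort region dicts by chromosome rank, then start (hypothetical helper)."""
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    return sorted(regions, key=lambda r: (rank[r["chr"]], r["start"]))


regions = [
    {"chr": "chr2", "start": 0, "end": 4},
    {"chr": "chr1", "start": 8, "end": 12},
    {"chr": "chr1", "start": 0, "end": 4},
]
# Assumed chromosome order, e.g. the order used in chrom_sizes
ordered = sort_regions(regions, chrom_order=["chr1", "chr2"])
print([(r["chr"], r["start"]) for r in ordered])
```

The input list is left unchanged, so the original order stays available if the data arrays need to be matched to their regions afterwards.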

close

BigWigWriter.close(self)

Close the file