epitome.dataset.EpitomeDataset

class epitome.dataset.EpitomeDataset(data_dir=None, targets=None, cells=None, min_cells_per_target=3, min_targets_per_cell=2, similarity_targets=['DNase'], assembly=None)

Dataset for holding Epitome data. Data processing scripts can be found in epitome/data.

__init__(data_dir=None, targets=None, cells=None, min_cells_per_target=3, min_targets_per_cell=2, similarity_targets=['DNase'], assembly=None)

Initializes an EpitomeDataset.

Parameters
  • data_dir (str) – path to the data directory containing data.h5. By default, accesses data in ~/.epitome/data.

  • targets (list) – list of ChIP-seq targets to include in dataset

  • cells (list) – list of cell types to use in dataset

  • min_cells_per_target (int) – minimum number of cell types required for a given ChIP-seq target

  • min_targets_per_cell (int) – minimum number of ChIP-seq targets required for each cell type

  • similarity_targets (list) – list of targets used to compute similarity (i.e., DNase, H3K27ac, etc.)

  • assembly (str) – genome assembly to use (see list_genome_assemblies())
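The two thresholds interact: a target is kept only if enough cell types measure it, and a cell type is kept only if enough of the retained targets are measured in it. The sketch below illustrates that reading on a toy dictionary mapping cell types to measured targets. It is a hypothetical helper (filter_cells_and_targets is not part of Epitome), and applying the target filter first and then the cell filter, each in a single pass, is an assumption; Epitome's own filtering may differ.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def filter_cells_and_targets(
    cell_targets: Dict[str, Set[str]],
    min_cells_per_target: int = 3,
    min_targets_per_cell: int = 2,
) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Hypothetical sketch of the min_* thresholds (not Epitome's code).

    Targets are filtered first; cell types are then filtered on the
    targets that survived. Each filter runs once (no iteration).
    """
    # Count how many cell types measure each target.
    counts = Counter(t for targets in cell_targets.values() for t in targets)
    kept_targets = {t for t, n in counts.items() if n >= min_cells_per_target}

    # Keep only cell types that still have enough of the surviving targets.
    kept_cells: Dict[str, Set[str]] = {}
    for cell, targets in cell_targets.items():
        remaining = targets & kept_targets
        if len(remaining) >= min_targets_per_cell:
            kept_cells[cell] = remaining
    return kept_cells, kept_targets


# Toy input: RAD21 (2 cells) and EZH2 (1 cell) fall below
# min_cells_per_target=3, which leaves A549 with only DNase.
toy = {
    "K562": {"DNase", "CTCF", "RAD21"},
    "HepG2": {"DNase", "CTCF"},
    "GM12878": {"DNase", "CTCF", "RAD21"},
    "A549": {"DNase", "EZH2"},
}
cells, targets = filter_cells_and_targets(toy)
# cells keeps K562, HepG2 and GM12878; targets == {"DNase", "CTCF"}
```

Because each filter runs once, dropping a cell type here never re-triggers the target filter; an iterative variant would repeat both passes until nothing changes.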

Methods

__init__([data_dir, targets, cells, …])

Initializes an EpitomeDataset.

all_keys(obj[, keys])

Recursively finds all keys in an open h5py dataset.

contains_required_files(data_dir)

download_data_dir([data_dir, assembly])

Loads data processed from data/download_encode.py.

get_assays([targets, cells, data_dir, …])

Returns a matrix of cell types/targets that exist for a subset of cell types.

get_data(mode)

Lazily loads all data into memory.

get_data_dir([data_dir, assembly])

If both data_dir and assembly are set, returns the data_dir for the specified assembly.

get_parameter_dict()

Returns a dict of all parameters required to reconstruct this dataset.

get_y_indices_for_cell(matrix, cellmap, cell)

Gets indices for a cell.

get_y_indices_for_target(matrix, targetmap, …)

Gets indices for an assay.

list_genome_assemblies()

list_targets()

Returns the ChIP-seq and chromatin accessibility targets available in the current dataset.

order_by_similarity(cell, mode[, compare_target])

Orders the list of cellmap names by similarity to the comparison cell.

save(out_path, all_data, row_df, regions_df, …)

Saves an Epitome dataset.

saveToyData(toy_path)

Creates a toy dataset for test from this dataset.

set_train_validation_indices(chrom)

Removes and reserves a given chromosome from the TRAIN dataset into its own TRAIN_VALID dataset.

view()

Plots a matrix of available targets from available cells.
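Several of the methods listed above (get_assays, get_y_indices_for_cell, get_y_indices_for_target) revolve around a cell type by target matrix plus cellmap and targetmap lookups. The sketch below models that structure in plain Python under stated assumptions: each matrix entry is the row index of that (cell type, target) track in the data, -1 marks a combination that was not measured, and cellmap/targetmap map names to matrix rows/columns. The helper names are hypothetical and only mirror the documented methods; the real get_y_indices_for_target takes further arguments not shown here.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[int]]


def build_assay_matrix(
    tracks: Sequence[Tuple[str, str]],
) -> Tuple[Matrix, Dict[str, int], Dict[str, int]]:
    """Hypothetical analogue of get_assays.

    `tracks` lists (cell, target) pairs in storage order, so a pair's
    position in the list is that track's row index in the data.
    """
    cells = sorted({cell for cell, _ in tracks})
    targets = sorted({target for _, target in tracks})
    cellmap = {cell: i for i, cell in enumerate(cells)}
    targetmap = {target: j for j, target in enumerate(targets)}

    # -1 marks a (cell, target) combination with no data (assumed convention).
    matrix = [[-1] * len(targets) for _ in cells]
    for row, (cell, target) in enumerate(tracks):
        matrix[cellmap[cell]][targetmap[target]] = row
    return matrix, cellmap, targetmap


def y_indices_for_cell(matrix: Matrix, cellmap: Dict[str, int], cell: str) -> List[int]:
    """Data rows measured for one cell type, ordered by target column."""
    return [i for i in matrix[cellmap[cell]] if i >= 0]


def y_indices_for_target(matrix: Matrix, targetmap: Dict[str, int], target: str) -> List[int]:
    """Data rows for one target across all cell types, ordered by cell row."""
    col = targetmap[target]
    return [row[col] for row in matrix if row[col] >= 0]


tracks = [("K562", "DNase"), ("K562", "CTCF"), ("HepG2", "DNase")]
matrix, cellmap, targetmap = build_assay_matrix(tracks)
# matrix == [[-1, 2], [1, 0]]: HepG2 has no CTCF track.
```

Keeping missing combinations in the matrix as a sentinel, rather than dropping them, lets cell and target lookups stay simple row and column reads.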