epitome.dataset.EpitomeDataset
class epitome.dataset.EpitomeDataset(data_dir=None, targets=None, cells=None, min_cells_per_target=3, min_targets_per_cell=2, similarity_targets=['DNase'], assembly=None)
Dataset for holding Epitome data. Data processing scripts can be found in epitome/data.
__init__(data_dir=None, targets=None, cells=None, min_cells_per_target=3, min_targets_per_cell=2, similarity_targets=['DNase'], assembly=None)
Initializes an EpitomeDataset.
Parameters
data_dir (str) – path to data directory containing data.h5. By default, accesses data in ~/.epitome/data
targets (list) – list of ChIP-seq targets to include in dataset
cells (list) – list of cell types to use in dataset
min_cells_per_target (int) – minimum number of cell types required for a given ChIP-seq target
min_targets_per_cell (int) – minimum number of ChIP-seq targets required for each cell type
similarity_targets (list) – list of targets to be used to compute similarity (e.g. DNase, H3K27ac)
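The two thresholds above can be pictured as a filter over a cell type by target availability matrix. The following is a minimal sketch of that idea in plain Python, not Epitome's implementation: the function name `filter_availability`, the toy matrix, and the order of the two passes (targets first, then cell types) are all assumptions made for illustration.

```python
def filter_availability(matrix, cells, targets,
                        min_cells_per_target=3, min_targets_per_cell=2):
    """Hypothetical helper: matrix[i][j] is 1 if cells[i] has data for targets[j]."""
    # Pass 1: keep targets measured in at least min_cells_per_target cell types.
    keep_t = [j for j in range(len(targets))
              if sum(row[j] for row in matrix) >= min_cells_per_target]
    # Pass 2: keep cell types with at least min_targets_per_cell of the kept targets.
    keep_c = [i for i in range(len(cells))
              if sum(matrix[i][j] for j in keep_t) >= min_targets_per_cell]
    filtered = [[matrix[i][j] for j in keep_t] for i in keep_c]
    return filtered, [cells[i] for i in keep_c], [targets[j] for j in keep_t]


# Toy availability data (made up for this example).
cells = ["K562", "HepG2", "GM12878", "A549"]
targets = ["DNase", "CTCF", "RAD21"]
availability = [
    [1, 1, 1],  # K562
    [1, 1, 0],  # HepG2
    [1, 1, 1],  # GM12878
    [1, 0, 0],  # A549
]

kept, kept_cells, kept_targets = filter_availability(availability, cells, targets)
print(kept_cells, kept_targets)
```

With the default thresholds, RAD21 is dropped because only two cell types have it, and A549 is then dropped because it retains a single target. This sketch makes one pass only; a real implementation may need to re-check targets after cell types are removed.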
Methods
__init__([data_dir, targets, cells, …]) – Initializes an EpitomeDataset.
all_keys(obj[, keys]) – Recursively finds all keys in an open h5py dataset.
contains_required_files(data_dir)
download_data_dir([data_dir, assembly]) – Loads data processed from data/download_encode.py.
get_assays([targets, cells, data_dir, …]) – Returns a matrix of cell types/targets which exist for a subset of cell types.
get_data(mode) – Lazily loads all data into memory.
get_data_dir([data_dir, assembly]) – If both data_dir and assembly are set, it will return the data_dir with the specified assembly.
get_parameter_dict() – Returns a dict of all parameters required to reconstruct this dataset.
get_y_indices_for_cell(matrix, cellmap, cell) – Gets indices for a cell.
get_y_indices_for_target(matrix, targetmap, …) – Gets indices for an assay.
list_genome_assemblies()
list_targets() – Returns the ChIP-seq and chromatin accessibility targets available in the current dataset.
order_by_similarity(cell, mode[, compare_target]) – Orders list of cellmap names by similarity to comparison cell.
save(out_path, all_data, row_df, regions_df, …) – Saves an Epitome dataset.
saveToyData(toy_path) – Creates a toy dataset for testing from this dataset.
set_train_validation_indices(chrom) – Removes a given chromosome from the TRAIN dataset and reserves it as its own TRAIN_VALID dataset.
view() – Plots a matrix of available targets from available cells.
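The description of set_train_validation_indices(chrom) amounts to holding out one chromosome from the training regions. A minimal sketch of that kind of split is shown below, assuming each region carries a chromosome label and the training set is a list of region indices. The helper `split_train_validation` and the toy data are hypothetical and do not reflect how EpitomeDataset stores its indices internally.

```python
def split_train_validation(region_chroms, train_indices, chrom):
    """Hypothetical helper: move every training region on `chrom` into a validation set."""
    # Regions on the held-out chromosome form the validation set.
    valid = [i for i in train_indices if region_chroms[i] == chrom]
    # Everything else stays in the training set.
    train = [i for i in train_indices if region_chroms[i] != chrom]
    return train, valid


# Toy region labels (made up for this example), one chromosome name per region.
region_chroms = ["chr1", "chr1", "chr2", "chr22", "chr22", "chr3"]
train_indices = list(range(len(region_chroms)))

train, train_valid = split_train_validation(region_chroms, train_indices, "chr22")
print(train, train_valid)
```

Holding out a whole chromosome, rather than a random sample of regions, keeps neighbouring genomic regions from appearing on both sides of the split.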