epitome.generators.load_data
epitome.generators.load_data(data, label_cell_types, eval_cell_types, matrix, targetmap, cellmap, radii, similarity_targets=['DNase'], mode=<Dataset.TRAIN: 1>, similarity_matrix=None, indices=None, continuous=False, return_feature_names=False, **kwargs)

Takes DeepSEA data and calculates distance metrics between the cell types whose locations are specified by label_cell_indices and the other cell types in the set. The label space is only one cell type.

- Parameters
data – dictionary of matrices. Should have keys x and y. x contains n by 1000 rows. y contains n by 919 labels.
label_cell_types – list of cell types to be rotated through and used as labels (subset of eval_cell_types)
eval_cell_types – list of cell types to be used in evaluation (includes label_cell_types)
matrix – matrix of cell type, target positions
targetmap – map of column target positions in matrix
cellmap – map of row cell type positions in matrix
radii – radii to compute similarity distances from
similarity_targets – list of targets used to measure cell type similarity (default is DNase-seq)
mode – Dataset.TRAIN, VALID, TEST or RUNTIME
similarity_matrix – matrix with shape (len(similarity_targets), genome_size) containing binary 0/1 peak values for similarity_targets, to be compared in the CASV
indices – indices in the genome to generate records for
continuous – boolean that determines whether similarity_matrix has continuous values. If continuous, we do not calculate agreement in the decreasing_train_valid_iters. TODO: remove this eventually, if you can show agreement does not help performance
return_feature_names – boolean indicating whether to return string names of features
kwargs – additional keyword arguments
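The relationship between matrix, cellmap and targetmap can be illustrated with a small sketch. Everything here is a toy stand-in: the cell types, targets, stored positions, the `MISSING` sentinel and the `lookup_position` helper are hypothetical and are not taken from epitome itself. The sketch only assumes what the descriptions above state, namely that cellmap gives row positions and targetmap gives column positions in matrix.

```python
# Toy stand-ins for the cellmap, targetmap and matrix arguments.
# Assumption: -1 marks a (cell type, target) pair with no data; this
# sentinel is illustrative and is not confirmed by this page.
MISSING = -1

cellmap = {"A549": 0, "K562": 1}      # cell type -> row in matrix
targetmap = {"DNase": 0, "ATF3": 1}   # target -> column in matrix
matrix = [
    [0, MISSING],  # A549: DNase stored at position 0, no ATF3 data
    [1, 2],        # K562: DNase at position 1, ATF3 at position 2
]


def lookup_position(cell, target):
    """Return the stored position for (cell, target), or None if absent."""
    position = matrix[cellmap[cell]][targetmap[target]]
    return None if position == MISSING else position


print(lookup_position("K562", "ATF3"))  # 2
print(lookup_position("A549", "ATF3"))  # None
```

Keeping the two maps separate from the matrix means a cell type or target is resolved by name once, and the matrix itself stays a plain grid of positions.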
- Returns
generator of data with three elements:
1. record features
2. record labels for a given cell type
3. 0/1 mask of labels that have validation data. For example, if this record is for cell type A549, and A549 does not have data for ATF3, there will be a 0 in the corresponding position of the label space.
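As a rough illustration of how the three returned elements might be consumed, the sketch below iterates over a hypothetical stand-in generator, `toy_records`, that yields (features, labels, mask) tuples of the shape described above. It does not call load_data, and the values are made up; it only shows labels being skipped wherever the mask is 0.

```python
def toy_records():
    """Hypothetical stand-in for the generator returned by load_data."""
    # Each record is (features, labels, mask); mask is 0 where the
    # cell type has no data for that label.
    yield [0.1, 0.9, 0.0], [1, 0], [1, 0]
    yield [0.7, 0.2, 1.0], [0, 1], [1, 1]


def count_valid_positives(records):
    """Count labels that have data, and how many of those are positive."""
    valid = 0
    positives = 0
    for features, labels, mask in records:
        for label, keep in zip(labels, mask):
            if keep:  # ignore label positions with no data
                valid += 1
                positives += label
    return valid, positives


valid, positives = count_valid_positives(toy_records())
print(valid, positives)  # 3 2
```

Applying the mask before computing a loss or metric keeps positions without data from being counted as true negatives.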