epitome.models.PeakModel

class epitome.models.PeakModel(dataset, test_celltypes=[], single_cell=False, debug=False, batch_size=64, shuffle_size=10, prefetch_size=10, l1=0.0, l2=0.0, lr=0.001, radii=[1, 3, 10, 30], checkpoint=None, max_valid_batches=None)

Model for learning from ChIP-seq peaks.

__init__(dataset, test_celltypes=[], single_cell=False, debug=False, batch_size=64, shuffle_size=10, prefetch_size=10, l1=0.0, l2=0.0, lr=0.001, radii=[1, 3, 10, 30], checkpoint=None, max_valid_batches=None)

Initializes a PeakModel.

Parameters
  • dataset (EpitomeDataset) – EpitomeDataset containing the data used to train and evaluate the model

  • test_celltypes (list) – list of cell types to hold out for testing. Each should be present in the dataset's cellmap.

  • single_cell (boolean) – whether you are building a model to predict using scATAC-seq posteriors. Defaults to False.

  • debug (bool) – if True, prints intermediate validation values (default is False)

  • batch_size (int) – batch size (default is 64)

  • shuffle_size (int) – data shuffle size (default is 10)

  • prefetch_size (int) – data prefetch size (default is 10)

  • l1 (float) – l1 regularization (default is 0)

  • l2 (float) – l2 regularization (default is 0)

  • lr (float) – learning rate (default is 1e-3)

  • radii (list) – radii around a peak of interest within which DNase-seq signal is considered (default is [1, 3, 10, 30])

  • checkpoint (str) – path to a saved model to load (default is None)

  • max_valid_batches (int) – number of batches in the train-validation dataset (default is None, in which case no train-validation dataset is created and training does not stop early)
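To make the radii parameter more concrete, here is a minimal sketch of one way multi-radius accessibility features could be computed. It assumes, hypothetically, that a radius counts neighboring genomic regions and that each feature is the fraction of accessible regions within +/- r regions of the peak. The function name radius_features is illustrative only; Epitome's actual feature construction may differ.

```python
import numpy as np


def radius_features(accessibility, center, radii=(1, 3, 10, 30)):
    """Hypothetical sketch: one feature per radius, equal to the fraction of
    accessible regions within +/- r regions of `center`.

    Assumes a radius counts neighboring genomic regions. This illustrates
    the idea only; it is not Epitome's feature computation.
    """
    signal = np.asarray(accessibility, dtype=float)
    n = signal.shape[0]
    features = []
    for r in radii:
        # Clip the window [center - r, center + r] to the array bounds.
        lo = max(0, center - r)
        hi = min(n, center + r + 1)
        features.append(signal[lo:hi].mean())
    return np.array(features)


# Binary accessibility calls for 10 consecutive regions (toy data).
calls = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
print(radius_features(calls, center=3, radii=(1, 3)))
```

Larger radii summarize a wider neighborhood, so the same peak gets both a local and a broader-context view of accessibility.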

Methods

__init__(dataset[, test_celltypes, …])

Initializes a PeakModel.

body_fn()

eval_vector(matrix, indices)

Evaluates a new cell type based on its chromatin (DNase or ATAC-seq) vector, as well as any other similarity targets (acetylation, methylation, etc.).

g(p[, a, B, y])

Normalization function.

loss_fn(y_true, y_pred, weights)

Loss function for Epitome.
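The signature loss_fn(y_true, y_pred, weights) suggests a per-label weighted loss. The sketch below shows a weighted binary cross-entropy under two assumptions not stated in this reference: y_pred holds probabilities, and a weight of 0 masks out a missing label. The name weighted_bce is hypothetical, and Epitome's actual loss may be defined differently.

```python
import numpy as np


def weighted_bce(y_true, y_pred, weights, eps=1e-7):
    """Hypothetical sketch of a weighted binary cross-entropy.

    Assumes y_pred holds probabilities and that a weight of 0 masks out a
    missing label. This is an illustration, not Epitome's loss_fn.
    """
    y_true = np.asarray(y_true, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Clip probabilities away from 0 and 1 to avoid log(0).
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    per_label = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    total = w.sum()
    if total == 0:
        return 0.0
    # Weighted mean over the labels that carry weight.
    return float((w * per_label).sum() / total)


print(weighted_bce([1, 0], [0.5, 0.9], [1, 0]))  # only the first label counts
```

Normalizing by the total weight rather than the label count keeps the loss scale stable when many labels are masked out.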

run_predictions(num_samples, iter_[, …])

Runs predictions on num_samples records.
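run_predictions takes a sample count and an iterator (iter_). A minimal sketch of the bookkeeping this implies, assuming the iterator yields fixed-size batches (for example batch_size records each) and that surplus rows from the last batch are discarded. The helper collect_predictions is hypothetical and is not part of Epitome.

```python
import math

import numpy as np


def collect_predictions(num_samples, batches):
    """Hypothetical sketch: consume prediction batches until num_samples rows
    are available, then trim any surplus from the final batch."""
    collected = []
    count = 0
    for batch in batches:
        batch = np.asarray(batch)
        collected.append(batch)
        count += batch.shape[0]
        if count >= num_samples:
            break
    if not collected:
        raise ValueError("iterator produced no batches")
    return np.concatenate(collected, axis=0)[:num_samples]


batch_size = 64
num_samples = 150
print(math.ceil(num_samples / batch_size))  # 3 batches needed
toy_batches = (np.ones((batch_size, 2)) for _ in range(10))
print(collect_predictions(num_samples, toy_batches).shape)  # (150, 2)
```

Stopping as soon as enough rows are collected avoids pulling extra batches from a possibly expensive data pipeline.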

save(checkpoint_path)

Saves model.

score_matrix(accessilibility_peak_matrix, …)

Runs predictions on a matrix of accessibility peaks, where columns are samples and rows are regions from regions_peak_file.

score_peak_file(similarity_peak_files, …)

Runs predictions on a set of peaks defined in a bed or narrowPeak file.

score_whole_genome(similarity_peak_files, …)

Runs a whole-genome scan over all available genomic regions in the dataset (about 3.2 million regions). Takes about 1 hour on the entire genome.
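The figures above (about 3.2 million regions in about 1 hour) imply a throughput of roughly 890 regions per second. The sketch below uses that ratio to estimate runtime for a smaller set of regions, assuming runtime scales linearly with the number of regions scored. That assumption is hardware-dependent, and estimated_scan_seconds is a hypothetical helper.

```python
GENOME_REGIONS = 3_200_000  # approximate region count quoted above
GENOME_SECONDS = 60 * 60    # approximately 1 hour, as quoted above


def estimated_scan_seconds(num_regions):
    """Rough runtime estimate, assuming runtime scales linearly with the
    number of regions scored (hardware-dependent; not a measured figure)."""
    regions_per_second = GENOME_REGIONS / GENOME_SECONDS
    return num_regions / regions_per_second


print(round(GENOME_REGIONS / GENOME_SECONDS))       # 889 regions per second
print(round(estimated_scan_seconds(100_000) / 60))  # 2 minutes for 100k regions
```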

test(num_samples[, mode, calculate_metrics])

Tests the model on the validation and test dataset handlers.

test_from_generator(num_samples, ds[, …])

Runs a test using a specified data generator.

train(max_train_batches[, patience, min_delta])

Trains an Epitome model.
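train accepts patience and min_delta, and per the max_valid_batches parameter, early stopping only applies when a train-validation dataset is created. Their defaults and exact semantics are not shown here, so the sketch below assumes the common convention: a validation loss counts as an improvement only if it beats the best loss so far by more than min_delta, and training stops after patience consecutive evaluations without improvement. early_stop_index is a hypothetical helper, not Epitome's training loop.

```python
def early_stop_index(val_losses, patience, min_delta=0.0):
    """Hypothetical sketch of patience-based early stopping.

    Returns the index of the evaluation at which training would stop,
    or None if it never stops early.
    """
    best = float("inf")
    waited = 0
    for i, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Meaningful improvement: record it and reset the counter.
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return i
    return None


losses = [1.0, 0.9, 0.89, 0.895, 0.9]
print(early_stop_index(losses, patience=2, min_delta=0.05))  # 3
print(early_stop_index(losses, patience=2))                  # 4
```

A larger min_delta makes small gains (such as 0.9 to 0.89 above) count as stagnation, so training stops sooner.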

Attributes

predict_step_generator

predict_step_matrix