Predicting peaks for ChIP-seq targets¶
Once you have trained a model, you can predict on your own cell types. You need a peak file called from your DNase-seq or ATAC-seq data. This peak file can be in either bed or narrowpeak format.
To get predictions on the whole genome, run:
peak_result = model.score_whole_genome(peak_files, # list of bed files containing similarity data (either chromatin accessibility, histone modifications, or other)
output_path, # where to save results
chrs=["chr8","chr9"]) # chromosomes you would like to score. Leave blank to score the whole genome whole genome.
Note: Scoring on the whole genome scores about 7 million regions and takes about 1.5 hours.
Using histone modifications to compute cell type similarity¶
peak_files is a list of bed or narrowpeak files. Each file represents a different
assay from your cell type of interest that is used to compute cell type similarity.
If you just use DNase-seq to compute cell type similarity, peak_files should be a single
bed file of either ATAC-seq or DNase-seq peaks. If you use additional assays to compute
cell type similarity, such as histone modifications, you should include a separate bed file
for each assay used in computing cell type similarity.
For example, the following example trains an Epitome model using DNase-seq and H3K27ac to compute cell type
similarity, and then predicts using the score_whole_genome function:
# define the dataset, using DNase-seq and H3K27ac to compute similarity
targets = ['CTCF', 'RAD21', 'SMC3']
dataset = EpitomeDataset(targets, similarity_targets=['DNase', 'H3K27ac'])
# create and train model
model = EpitomeModel(dataset)
model.train(5000)
# list of paths to bed files for similarity assays for a cell type of interest
peak_files = ['my_DNase_peaks.bed', 'my_H3K27ac_peaks.bed']
peak_result = model.score_whole_genome(peak_files, # list of bed files containing similarity data (either chromatin accessibility, histone modifications, or other)
output_path, # where to save results
chrs=["chr8","chr9"]) # chromosomes you would like to score. Leave blank to score the whole genome whole genome.
You can also get predictions on specific genomic regions:
results = model.score_peak_file(peak_files, # chromatin accessibility peak file
regions_file) # bed file of regions to score
This method returns a dataframe of the scored predictions.