Build Comprehensive Epigenomic Profiles with ENCODE
When to Use
- User wants to build a comprehensive epigenomic profile for a tissue or cell type
- User asks about "chromatin states", "epigenome", or "histone landscape" for a biosample
- User wants to identify super-enhancers, bivalent domains, or regulatory elements
- User needs to assemble a panel of histone marks, accessibility, and TF binding data
- User wants to run ChromHMM segmentation on ENCODE data
- User asks "what epigenomic data does ENCODE have for [tissue]?"
Assemble a complete epigenomic profile for a tissue or cell type by systematically gathering histone modifications, chromatin accessibility, transcription factor binding, transcription, DNA methylation, and 3D chromatin structure data from ENCODE. Interpret the resulting profile using ChromHMM chromatin state segmentation.
Literature Foundation
| Reference | Year | Journal | DOI | Citations | Contribution |
|---|---|---|---|---|---|
| Roadmap Epigenomics Consortium (Kundaje et al.) | 2015 | Nature | 10.1038/nature14248 | ~5,810 | 111 reference epigenomes; 5-mark core model; 15/18/25-state ChromHMM |
| ENCODE Phase 3 (ENCODE Project Consortium) | 2020 | Nature | 10.1038/s41586-020-2493-4 | ~1,656 | Registry of candidate cis-regulatory elements (cCREs) across 1,310+ experiments |
| Ernst & Kellis | 2012 | Nat Methods | 10.1038/nmeth.1906 | ~2,294 | ChromHMM: multivariate HMM for chromatin state discovery and characterization |
| Barski et al. | 2007 | Cell | 10.1016/j.cell.2007.05.009 | ~4,800 | First genome-wide ChIP-Seq of 20 histone methylations in human CD4+ T cells |
| Mikkelsen et al. | 2007 | Nature | 10.1038/nature06008 | ~4,289 | Chromatin state maps in pluripotent and lineage-committed cells; H3K4me3/H3K27me3 discriminate expressed, poised, and repressed genes |
| Bernstein et al. | 2006 | Cell | 10.1016/j.cell.2006.02.041 | ~3,500 | Discovery of bivalent chromatin domains (H3K4me3+H3K27me3) in embryonic stem cells |
| Creyghton et al. | 2010 | PNAS | 10.1073/pnas.1016071107 | ~2,800 | H3K27ac distinguishes active enhancers from poised (H3K4me1-only) enhancers |
| Whyte et al. | 2013 | Cell | 10.1016/j.cell.2013.03.035 | ~2,500 | Master transcription factors and super-enhancer identification via ROSE algorithm |
| Buenrostro et al. | 2013 | Nat Methods | 10.1038/nmeth.2688 | ~5,000 | ATAC-seq: transposase-based chromatin accessibility profiling |
| Heintzman et al. | 2007 | Nat Genet | 10.1038/ng1966 | ~2,300 | H3K4me1 marks enhancers, H3K4me3 marks promoters — foundational chromatin signature for regulatory element classification |
| Rada-Iglesias et al. | 2011 | Nature | 10.1038/nature09692 | ~1,200 | Discovered "poised enhancers" (H3K4me1+H3K27me3, no H3K27ac) that activate during differentiation |
| ENCODE Blacklist (Amemiya et al.) | 2019 | Sci Rep | 10.1038/s41598-019-45839-z | ~1,372 | Comprehensive set of problematic genomic regions to exclude from all analyses |
Step 1: Choose the Target Biosample
Clarify the target biosample with the user. Check data availability across assay types:
encode_get_facets(organ="pancreas", biosample_type="tissue")
ENCODE Cell Line Tiers
| Tier | Cell Lines | Data Depth | Notes |
|---|---|---|---|
| Tier 1 (most data) | K562, GM12878, H1-hESC | Deep profiling across all assays | Preferred for methods development and benchmarking |
| Tier 2 (good coverage) | HeLa-S3, HepG2, HUVEC, A549, MCF-7 | Most core marks and accessibility | Suitable for tissue-specific profiling |
| Tier 3+ (variable) | 100+ additional cell lines and primary tissues | Variable coverage | Check availability per assay before committing |
For primary tissues, verify what biosamples are available:
encode_search_experiments(organ="pancreas", biosample_type="tissue", limit=50)
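Once search results are in hand, tallying them by assay and target makes coverage gaps easy to spot. The sketch below assumes each experiment has been reduced to a dictionary with `assay_title` and `target` keys. That record shape and the helper name `summarize_availability` are illustrative assumptions, not the documented return format of `encode_search_experiments`.

```python
from collections import Counter


def summarize_availability(experiments):
    """Count experiments per (assay_title, target) pair.

    `experiments` is assumed to be a list of dicts with an "assay_title"
    key and an optional "target" key (hypothetical record shape).
    """
    counts = Counter()
    for exp in experiments:
        assay = exp.get("assay_title", "unknown")
        target = exp.get("target") or "-"  # assays such as ATAC-seq have no target
        counts[(assay, target)] += 1
    return counts


# Made-up records for illustration only; real records come from the search call.
example = [
    {"assay_title": "Histone ChIP-seq", "target": "H3K4me3"},
    {"assay_title": "Histone ChIP-seq", "target": "H3K4me3"},
    {"assay_title": "Histone ChIP-seq", "target": "H3K27me3"},
    {"assay_title": "ATAC-seq", "target": None},
]

for (assay, target), n in sorted(summarize_availability(example).items()):
    print(f"{assay}\t{target}\t{n}")
```

A pair that is absent from the tally is a mark or assay to search for explicitly before committing to the biosample.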
ENCODE biosample types include tissue, primary cell, cell line, in vitro differentiated cells, and organoid. Cell lines offer the deepest profiling; primary tissues offer greater biological relevance but also greater heterogeneity.
Step 2: Assemble the Histone Modification Panel
Search for each histone mark in the target biosample. Organize the panel into three tiers of increasing depth.
Tier 1: Core 5-Mark Panel (ChromHMM Minimum)
This is the minimum set needed to reproduce the Roadmap core chromatin state model. All 111 Roadmap Epigenomics reference epigenomes were profiled for these five marks, and the Roadmap 15-state core model, learned with the ChromHMM software (Ernst & Kellis 2012), was trained on these five marks alone (Kundaje et al. 2015). That model captures the major functional categories.
| Mark | What It Marks | Genomic Location | Writers | Readers | Key Reference |
|---|---|---|---|---|---|
| H3K4me3 | Active and poised promoters | Sharp peaks at TSSs | SET1A/B (COMPASS), MLL1/2 | TAF3, ING proteins, CHD1 | Barski et al. 2007 |
| H3K4me1 | Enhancers (primed and active) | Distal regulatory elements | MLL3 (KMT2C), MLL4 (KMT2D) | CHD1, BPTF | Heintzman et al. 2007 |
| H3K27me3 | Polycomb-mediated repression | Broad domains over silent genes | EZH2 (PRC2), EZH1 | EED, CBX proteins (PRC1) | Bernstein et al. 2006 |
| H3K36me3 | Actively transcribed gene bodies | Gene bodies, 5'-to-3' gradient | SETD2 (sole trimethylase) | DNMT3B, MSH6 | Mikkelsen et al. 2007 |
| H3K9me3 | Constitutive heterochromatin | Repeats, TEs, ERVs, pericentromeric | SUV39H1/2, SETDB1 | HP1alpha/beta/gamma | Barski et al. 2007 |
Note on H3K27ac: While not in the Roadmap 5-mark core, H3K27ac is essential for distinguishing active from poised elements (Creyghton et al. 2010). It is included in the 18-state extended ChromHMM model. Always include H3K27ac if available.
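The mark combinations discussed so far can be turned into a coarse labeling rule. The sketch below is a simplified presence/absence heuristic based on the combinations cited in the literature table (Bernstein et al. 2006; Creyghton et al. 2010; Rada-Iglesias et al. 2011). The function name `label_element` and the label strings are illustrative assumptions, and the rule is not a substitute for ChromHMM segmentation.

```python
def label_element(marks):
    """Assign a coarse regulatory label from the marks present at a region.

    `marks` is any iterable of mark names, e.g. {"H3K4me1", "H3K27ac"}.
    Only presence or absence is considered; signal strength is ignored.
    """
    marks = set(marks)

    if "H3K4me3" in marks and "H3K27me3" in marks:
        return "bivalent promoter"  # H3K4me3 + H3K27me3 (Bernstein et al. 2006)
    if "H3K4me3" in marks:
        # Without H3K27ac, active and poised promoters cannot be separated.
        if "H3K27ac" in marks:
            return "active promoter"
        return "promoter (activity unresolved)"
    if "H3K4me1" in marks:
        if "H3K27ac" in marks:
            return "active enhancer"  # Creyghton et al. 2010
        if "H3K27me3" in marks:
            return "poised enhancer"  # Rada-Iglesias et al. 2011
        # H3K4me1 only; Creyghton et al. 2010 describe this class as poised.
        return "primed enhancer"
    return "unclassified"


for combo in (["H3K4me3", "H3K27me3"], ["H3K4me1", "H3K27ac"], ["H3K4me1"]):
    print(combo, "->", label_element(combo))
```

The "promoter (activity unresolved)" branch shows why H3K27ac matters: with only the five core marks, an H3K4me3 peak alone does not say whether the promoter is active.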
Search for each:
encode_search_experiments(
    assay_title="Histone ChIP-seq",
    target="H3K4me3",
    biosample_term_name="...",
    biosample_type="tissue"
)
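Repeating the search for every mark is easier with the arguments generated in a loop. The sketch below builds one keyword-argument dictionary per mark, mirroring the parameters of the call above. The helper name `build_mark_queries` and the placeholder biosample name are hypothetical; the snippet only prepares arguments and does not call the search tool itself.

```python
CORE_MARKS = ["H3K4me3", "H3K4me1", "H3K27me3", "H3K36me3", "H3K9me3"]


def build_mark_queries(biosample_term_name, biosample_type,
                       marks=CORE_MARKS, include_h3k27ac=True):
    """Return one keyword-argument dict per histone mark.

    H3K27ac is appended by default because it is recommended whenever
    it is available, even though it is not one of the five core marks.
    """
    wanted = list(marks)
    if include_h3k27ac and "H3K27ac" not in wanted:
        wanted.append("H3K27ac")
    return [
        {
            "assay_title": "Histone ChIP-seq",
            "target": mark,
            "biosample_term_name": biosample_term_name,
            "biosample_type": biosample_type,
        }
        for mark in wanted
    ]


# "BIOSAMPLE_PLACEHOLDER" is a stand-in; substitute the real term name.
queries = build_mark_queries("BIOSAMPLE_PLACEHOLDER", "tissue")
for q in queries:
    print(q)  # each dict could be passed on as encode_search_experiments(**q)
```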
Tier 2: Extended Panel
These marks provide finer-grained state resolution. The Ernst et al. (2011) 15-state model across nine cell types used several of them (H3K9ac, H4K20me1, and H3K27ac) alongside most of the core marks and CTCF to define insulator, active promoter, and transcription states more precisely.
| Mark | What It Marks | Genomic Location | Key Reference |
|---|---|---|---|
| H3K9ac | Active promoters and regulatory regions | TSSs, co-occurs with H3K4me3 | Wang et al. 2008 |
| H3K79me2 | Transcription elongation | Gene bodies (DOT1L-mediated) | Barski et al. 2007 |
| H2A.Z (H2AFZ) | Active regulatory elements | TSSs, enhancers, insulators | Barski et al. 2007 |
| H4K20me1 | Transcription and cell cycle | Gene bodies | Barski et al. 2007 |
| H3K27ac | Active enhancers and promoters | Active regulatory elements (mutually exclusive with H3K27me3) | Creyghton et al. 2010 |
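After the searches, it helps to check which panel tier the biosample actually supports. The sketch below compares the marks found against the core and extended panels listed above and suggests a Roadmap-style model size: 15-state for the five core marks, 18-state when H3K27ac is also present. The function name `assess_panel` and the returned dictionary layout are illustrative assumptions.

```python
CORE = {"H3K4me3", "H3K4me1", "H3K27me3", "H3K36me3", "H3K9me3"}
EXTENDED = {"H3K9ac", "H3K79me2", "H2A.Z", "H4K20me1", "H3K27ac"}


def assess_panel(available):
    """Report core-mark gaps and a suggested ChromHMM model size.

    `available` is an iterable of mark names found for the biosample.
    """
    available = set(available)
    missing_core = sorted(CORE - available)

    if missing_core:
        model = None  # core panel incomplete; no Roadmap-style model suggested
    elif "H3K27ac" in available:
        model = "18-state (core + H3K27ac)"
    else:
        model = "15-state (core 5 marks)"

    return {
        "missing_core": missing_core,
        "extended_present": sorted(EXTENDED & available),
        "suggested_model": model,
    }


report = assess_panel(
    ["H3K4me3", "H3K4me1", "H3K27me3", "H3K36me3", "H3K9me3", "H3K27ac"]
)
print(report)
```

A non-empty `missing_core` list is the signal to revisit the biosample choice or look for a closely related sample before attempting segmentation.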
Tier 3: Advanced Panel
These acetylation marks provide additional granularity for specialized analyses. They are rarely profiled outside Tier 1 cell lines but can distinguish subtypes of active chromatin.
| Mark | What It Marks | Genomic Location | Key Reference |
|---|---|---|---|
| H3K14ac | Active promoters, DNA damage response | Active TSSs, DNA double-strand break sites | Wang et al. 2008 |
| H3K18ac | Active transcription | Active promoters and enhancers | Wang et al. 2008 |
| H3K23ac | Active tran |