Multi-Omics Integration of ENCODE Data
When to Use
- User wants to integrate multiple ENCODE data types (RNA-seq + ATAC-seq + ChIP-seq) for a tissue
- User asks about "multi-omics", "integrative analysis", "regulatory landscape", or "layer epigenomic data"
- User needs to build a comprehensive view of active enhancers, promoters, and TF binding in a tissue
- User wants to combine expression with chromatin state to identify cell-type-specific regulatory networks
- Example queries: "integrate all ENCODE data for pancreas", "build a regulatory landscape for liver", "combine RNA-seq and ChIP-seq to find active enhancers"
Layer RNA-seq, ATAC-seq, Histone ChIP-seq, and TF ChIP-seq data from ENCODE to build a comprehensive regulatory landscape for a tissue or cell type.
Scientific Rationale
The question: "What regulatory elements are active in my tissue, and how do expression, chromatin accessibility, histone marks, and TF binding converge to define cell identity?"
No single assay captures the full picture of gene regulation. RNA-seq tells you what is expressed. ATAC-seq tells you where chromatin is open. Histone ChIP-seq tells you how chromatin is modified. TF ChIP-seq tells you who is binding. Each assay provides one dimension; integrating them reveals the regulatory logic.
The Framework (Mawla, van der Meulen & Huising 2023)
Mawla et al. (2023, BMC Genomics) demonstrated this integrative approach by comparing ATAC-seq chromatin accessibility between alpha, beta, and delta cells in mouse pancreatic islets. Key findings:
- Cell type-specific chromatin accessibility defines cell identity: Differentially accessible regions between alpha, beta, and delta cells map to cell type-specific enhancers. Both alpha and delta cells appear poised to become beta cells but are repressed from doing so.
- Distal-intergenic enrichment in beta cells: Regions that are differentially accessible in beta cells are preferentially enriched for distal-intergenic sequence compared with alpha or delta cells, indicating a larger enhancer repertoire.
- TF motif enrichment reveals regulatory logic: Differentially accessible regions are enriched for binding motifs of known lineage-defining TFs, connecting chromatin structure to transcriptional regulation.
- Cross-validation with expression: Common endocrine enhancers (accessible in all three cell types) map near genes expressed in all cell types, while cell type-specific enhancers map near differentially expressed genes.
- Enhancer databases as validation: Chromatin accessibility analysis confirmed previously reported enhancer regions from the literature and identified novel ones.
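The distinction between common and cell type-specific regions can be sketched in a few lines of Python. This is a simplified illustration, not the method of Mawla et al., who used differential accessibility testing rather than presence/absence calls. The function name `classify_regions`, the region IDs, and the input layout (one set of cell types per consensus region) are all hypothetical.

```python
def classify_regions(accessibility, cell_types):
    """Label each consensus region by the cell types in which it is accessible.

    accessibility: dict mapping a region ID to the collection of cell types
                   in which that region was called accessible (hypothetical input).
    cell_types:    all cell types being compared.
    """
    all_types = set(cell_types)
    labels = {}
    for region, present in accessibility.items():
        present = set(present) & all_types
        if present == all_types:
            labels[region] = "common"            # accessible everywhere
        elif len(present) == 1:
            labels[region] = next(iter(present)) + "-specific"
        elif present:
            labels[region] = "shared"            # some, but not all, cell types
        else:
            labels[region] = "not accessible"
    return labels


# Toy presence/absence calls for three made-up regions.
calls = {
    "region_1": {"alpha", "beta", "delta"},
    "region_2": {"beta"},
    "region_3": {"alpha", "delta"},
}
labels = classify_regions(calls, ["alpha", "beta", "delta"])
print(labels)
```

In practice the presence/absence calls would be replaced by a statistical test on read counts, but the resulting labels can be cross-referenced against expression in the same way: common regions near broadly expressed genes, specific regions near differentially expressed genes.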
Literature Support
- Mawla, van der Meulen & Huising 2023 (BMC Genomics): Integrated ATAC-seq across alpha, beta, and delta cells. Identified common and cell type-specific enhancers. Demonstrated that chromatin accessibility patterns predict cell identity and lineage plasticity.
- ChromHMM (Ernst & Kellis 2017, Nature Protocols, 711 citations): The standard tool for chromatin state segmentation. Uses combinatorial patterns of histone marks to annotate the genome into functional states (active promoter, enhancer, repressed, etc.).
- ENCODE Phase 3 (Gorkin et al. 2020, Nature, 301 citations): Created unified chromatin state annotations across 66 mouse epigenomes. 18 chromatin states annotated. Demonstrated that bivalent chromatin is enriched in silencers and polycomb targets.
- ENCODE cCRE Registry (ENCODE Project Consortium 2020, Nature): Defined ~926,000 candidate cis-regulatory elements (cCREs) in the human genome classified as promoter-like, enhancer-like, or CTCF-bound, using DNase, H3K4me3, H3K27ac, and CTCF signals.
- SCENIC+ (Bravo González-Blas et al. 2023, Nature Methods, 369 citations): Single-cell multi-omic inference of enhancers and gene regulatory networks. Predicts genomic enhancers, upstream TFs, and target genes from joint chromatin accessibility and expression data.
- Minnoye et al. 2021 (Nature Reviews Methods Primers, 125 citations): Comprehensive review of chromatin accessibility profiling methods. Discusses orthogonal assays needed to interpret accessible regions — enhancer-promoter proximity, TF binding, regulatory function.
- Roadmap Epigenomics (Kundaje et al. 2015, Nature, 4,800+ citations): Mapped chromatin states across 111 human reference epigenomes. Established the canonical histone mark signatures for functional annotation.
- ENCODE Blacklist (Amemiya et al. 2019, Scientific Reports, 1,372 citations): Defined problematic genomic regions to filter from all functional genomics analyses.
- ABC Model (Fulco et al. 2019, Nature Genetics, 800+ citations): Activity-By-Contact model for predicting enhancer-gene connections. Combines enhancer activity (H3K27ac) with Hi-C contact frequency. Outperforms proximity-based assignment.
- ROSE (Whyte et al. 2013, Cell, 3,000+ citations): Algorithm for identifying super-enhancers from H3K27ac or Med1 ChIP-seq signal. Regions above the inflection point in ranked signal are classified as super-enhancers.
- GREAT (McLean et al. 2010, Nature Biotechnology, 3,500+ citations): Genomic Regions Enrichment of Annotations Tool. Assigns biological meaning to sets of non-coding genomic regions (enhancers, accessible regions) by analyzing nearby gene annotations.
- GRaNIE (Kamal et al. 2023, Molecular Systems Biology): Enhancer-mediated gene regulatory network inference. Builds GRNs based on covariation of chromatin accessibility and RNA-seq across samples, connecting TFs → enhancers → target genes. Includes GRaNPA for unbiased GRN performance evaluation.
- Enformer (Avsec et al. 2021, Nature Methods): Deep learning model predicting gene expression from DNA sequence by integrating long-range interactions (up to 100 kb). Accurately predicts variant effects and enhancer-promoter interactions directly from sequence. Useful for validating regulatory element predictions.
- scVI/scANVI (Lopez et al. 2018; Xu et al. 2021, Nature Methods / MSB): Deep generative models for single-cell RNA-seq. scANVI extends scVI with semi-supervised cell type annotation. When used with scATAC-seq multiome data, provides probabilistic cell type assignments that improve regulatory element annotation.
- Luecken et al. 2022 (Nature Methods, scIB benchmark): Atlas-level integration benchmark of 16 methods across 68 method and preprocessing combinations. Found scANVI, scVI, Scanorama, and scGen perform best. Provides the standard framework (14 metrics) for evaluating single-cell integration quality.
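To make the Activity-By-Contact idea from the list above concrete, the sketch below computes ABC-style scores for one gene: each element's activity multiplied by its contact with the gene's promoter, divided by the sum of that product over all candidate elements. The function name `abc_scores`, the element IDs, and the toy numbers are illustrative assumptions. This is not the reference ABC implementation, which also handles signal normalization and candidate element selection.

```python
def abc_scores(activity, contact):
    """ABC-style score for each candidate element of a single gene.

    activity: dict element ID -> enhancer activity (e.g. an H3K27ac-based signal).
    contact:  dict element ID -> contact frequency with the gene's promoter
              (e.g. from Hi-C); missing elements are treated as zero contact.
    """
    products = {e: activity[e] * contact.get(e, 0.0) for e in activity}
    total = sum(products.values())
    if total == 0:
        # No element has both activity and contact; avoid dividing by zero.
        return {e: 0.0 for e in products}
    return {e: p / total for e, p in products.items()}


# Toy values: a strong but distant element vs. a weaker but well-contacted one.
scores = abc_scores(
    activity={"enh_1": 10.0, "enh_2": 5.0, "enh_3": 8.0},
    contact={"enh_1": 0.2, "enh_2": 0.4},
)
print(scores)
```

Because the score is a fraction of the gene's total activity-by-contact, the values for one gene sum to 1 whenever at least one element contributes, which is why a weak element with high contact can tie a strong element with low contact.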
Step 1: Define the Regulatory Question
Multi-omics integration is not a single workflow — the approach depends on the question:
| Question | Required Data Layers | Approach |
|---|---|---|
| "What enhancers are active in my tissue?" | ATAC-seq + H3K27ac + RNA-seq | Intersection of accessible + H3K27ac+ regions near expressed genes |
| "What chromatin states exist?" | H3K4me1 + H3K4me3 + H3K27ac + H3K27me3 + H3K36me3 | ChromHMM segmentation |
| "Which TFs drive cell identity?" | ATAC-seq + TF ChIP-seq + RNA-seq | TF footprinting + motif enrichment in accessible regions |
| "What distinguishes cell type A from B?" | Cell type-resolved ATAC-seq + RNA-seq | Differential accessibility + expression correlation |
| "Where are super-enhancers?" | H3K27ac + H3K4me1 + ATAC-seq | ROSE algorithm on H3K27ac + accessibility confirmation |
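As a rough illustration of the first row of the table, the sketch below intersects ATAC-seq peaks with H3K27ac peaks and keeps those near an expressed gene. Everything specific here is an assumption made for the example: the function names, the 1 TPM expression cutoff, the 50 kb linking window, the 2 kb promoter exclusion, and the toy coordinates. The nested loops are fine for a demonstration but a dedicated interval tool would be needed on real peak sets.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def candidate_active_enhancers(atac_peaks, h3k27ac_peaks, tss, tpm,
                               min_tpm=1.0, window=50_000, promoter_flank=2_000):
    """Return (peak, [genes]) for accessible, H3K27ac-marked peaks near expressed genes.

    tss: dict gene -> (chrom, position); tpm: dict gene -> expression in TPM.
    All thresholds are illustrative defaults, not recommended values.
    """
    expressed = {g: loc for g, loc in tss.items() if tpm.get(g, 0.0) >= min_tpm}
    results = []
    for peak in atac_peaks:
        # Layer 1 + 2: accessible AND marked by H3K27ac.
        if not any(overlaps(peak, mark) for mark in h3k27ac_peaks):
            continue
        chrom, start, end = peak
        midpoint = (start + end) // 2
        # Layer 3: within the window of an expressed gene's TSS, but not
        # promoter-proximal to that gene.
        nearby = sorted(
            gene for gene, (g_chrom, g_pos) in expressed.items()
            if g_chrom == chrom and promoter_flank < abs(midpoint - g_pos) <= window
        )
        if nearby:
            results.append((peak, nearby))
    return results


atac = [("chr1", 1000, 1500), ("chr1", 100000, 100500), ("chr2", 500, 900)]
k27ac = [("chr1", 1200, 2000), ("chr1", 100200, 100900), ("chr2", 5000, 6000)]
tss = {"GENE_A": ("chr1", 20000), "GENE_B": ("chr1", 100300)}
tpm = {"GENE_A": 12.0, "GENE_B": 0.0}

enhancers = candidate_active_enhancers(atac, k27ac, tss, tpm)
print(enhancers)
```

Only the first peak survives all three layers: the second is marked but has no expressed gene in range, and the third lacks H3K27ac. Nearest-gene windows are a crude linking rule; contact-based approaches such as the ABC model are the more principled alternative.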