Analyze ENCODE Functional Genomics Screens
When to Use
- User wants to find or analyze CRISPR screen, MPRA, or STARR-seq data from ENCODE
- User asks about "functional screens", "CRISPR perturbation", "reporter assay", or "enhancer validation"
- User needs to identify functionally validated regulatory elements from screen results
- User wants to integrate screen results with epigenomic annotations (ChIP-seq, ATAC-seq peaks)
- Example queries: "find CRISPR screen data in ENCODE", "analyze MPRA results for enhancer activity", "which regulatory elements have functional validation?"
Discover and interpret functional validation data from CRISPR screens, MPRA (Massively Parallel Reporter Assays), and STARR-seq experiments in the ENCODE catalog. These assays directly test whether candidate regulatory elements have functional activity, complementing the correlative evidence from ChIP-seq, ATAC-seq, and Hi-C.
Scientific Rationale
The question: "Which of the candidate regulatory elements identified by ENCODE actually have functional activity, and what genes do they regulate?"
The central challenge in regulatory genomics is that biochemical signatures (histone marks, chromatin accessibility, TF binding) are correlative — they identify candidate regulatory elements but cannot prove function. ENCODE Phase 4 addressed this gap by investing heavily in functional characterization: large-scale CRISPR perturbation screens, MPRA experiments testing thousands of candidate elements in parallel, and STARR-seq for genome-wide enhancer activity mapping.
The Validation Gap
ENCODE catalogs 926,535 human candidate cis-regulatory elements (cCREs). But how many of these are truly functional?
- CRISPR screens (Gasperini et al. 2019): Of 5,920 candidate enhancers tested by CRISPRi, only ~12% showed significant effects on nearby gene expression
- MPRA (Inoue et al. 2017; Tewhey et al. 2016): Reporter assays confirm activity for 40-60% of predicted enhancers, depending on cell type and element class
- STARR-seq (Arnold et al. 2013): Genome-wide enhancer assays identify thousands of active elements, but fragments are tested on plasmids, so episomal activity may not match activity in the native chromosomal context
These functional assays provide the strongest evidence (short of genetic studies in humans) that a regulatory element has biological activity. ENCODE4 has scaled these approaches: the Functional Characterization Centers (Yao et al. 2024) performed 108 CRISPRi screens comprising >540,000 perturbations and released a pre-designed sgRNA library targeting 3.27 million ENCODE SCREEN cCREs.
Assay Comparison
| Assay | Tests | Context | Scale | Confidence | Key Limitation |
|---|---|---|---|---|---|
| CRISPR screen (CRISPRi/CRISPRa) | Endogenous perturbation | Native chromatin | 5,000–500,000 elements | Highest | Limited to cell lines; delivery constraints |
| MPRA | Reporter activity | Episomal (plasmid) | 10,000–100,000 variants | High for activity | Removed from chromatin context |
| STARR-seq | Self-transcription | Episomal (plasmid) | Genome-wide library | High for activity | Episomal; position effects |
Literature Support
- Gasperini et al. 2019 (Cell, ~800 citations): CRISPRi screen of 5,920 candidate enhancers with single-cell RNA-seq readout in K562 cells. Identified 664 enhancer-gene pairs. Established the "crisprQTL" framework linking perturbation effects to gene expression at single-cell resolution.
- Fulco et al. 2019 (Nature Genetics, ~1,200 citations): CRISPRi tiling screen combined with the Activity-By-Contact (ABC) model. Demonstrated that ABC predictions outperform distance-based enhancer-gene assignment. Quantitative relationship between enhancer activity, contact frequency, and gene regulation.
- Shalem et al. 2014 (Science, ~5,500 citations): Genome-scale CRISPR-Cas9 knockout screening. The foundational paper for CRISPR loss-of-function screens. Established library design principles and analytical frameworks.
- Arnold et al. 2013 (Science, ~1,100 citations): STARR-seq (Self-Transcribing Active Regulatory Region sequencing). First genome-wide quantitative enhancer activity assay. Tests millions of fragments simultaneously in Drosophila; adapted to human.
- Inoue et al. 2017 (Genome Research, ~400 citations): MPRA for systematic variant effect prediction. Tested thousands of regulatory variants for allele-specific enhancer activity. Demonstrated that GWAS risk alleles frequently alter enhancer function.
- Li et al. 2014 (Genome Biology, ~2,800 citations): MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout). The standard computational tool for CRISPR screen analysis. Robust negative binomial model for guide RNA count data.
- Gordon et al. 2020 (Nature Protocols, ~200 citations): MPRAflow, a standardized computational pipeline for MPRA data analysis. Reproducible barcode counting, normalization, and activity scoring.
- Tewhey et al. 2016 (Cell, ~600 citations): High-throughput identification of regulatory variants using MPRA. Tested >30,000 allelic pairs across 3,642 GWAS loci. Identified hundreds of variants with allele-specific regulatory activity.
- Nasser et al. 2021 (Nature, ~700 citations): ABC model for enhancer-gene prediction using ENCODE data. Linked 5,036 GWAS signals to 2,249 genes across 131 cell types. Validated predictions against CRISPRi perturbation data.
- Klein et al. 2020 (Nature Genetics, ~350 citations): CRISPRi screen design principles for non-coding regulatory elements. Established that guide RNA positioning relative to regulatory element boundaries is critical. Quantified the "shadow" of CRISPRi repression (~1–2 kb).
- Yao et al. 2024 (Nature Methods, ~26 citations): ENCODE4 Functional Characterization Centers: 108 CRISPRi screens, >540,000 perturbations across multiple cell types. Pre-designed sgRNA library targeting 3.27M ENCODE SCREEN cCREs. Establishes the largest functional characterization dataset for regulatory elements.
- Lee et al. 2020 (Genome Biology, ~150 citations): STARRPeaker, a peak caller designed specifically for STARR-seq data. Handles input library normalization and identifies significant enhancer peaks from STARR-seq enrichment.
- Kim & Hart 2021 (Genome Medicine, ~200 citations): BAGEL2 (Bayesian Analysis of Gene Essentiality). Updated framework for identifying essential genes and functional elements from CRISPR screen data.
Finding ENCODE Screen Data
CRISPR Screens
encode_search_experiments(assay_title="CRISPR screen")
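Outside the `encode_search_experiments` tool, a roughly equivalent query can be sketched against the public ENCODE portal search endpoint. This is a minimal sketch, not the tool's implementation. The helper name `build_screen_search_url` is hypothetical, and the object type `FunctionalCharacterizationExperiment` and the filter field names are assumptions that should be checked against the portal before use. The code only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

ENCODE_BASE = "https://www.encodeproject.org"


def build_screen_search_url(assay_title="CRISPR screen", biosample=None, limit=25):
    """Build an ENCODE portal search URL for functional screen experiments.

    Assumption: screens are indexed under the
    FunctionalCharacterizationExperiment object type and can be filtered
    by assay_title and biosample_ontology.term_name.
    """
    params = [
        ("type", "FunctionalCharacterizationExperiment"),  # assumed object type
        ("assay_title", assay_title),
        ("status", "released"),
        ("format", "json"),
        ("limit", str(limit)),
    ]
    if biosample:
        # Assumed field name for the cell line or tissue filter
        params.append(("biosample_ontology.term_name", biosample))
    return f"{ENCODE_BASE}/search/?{urlencode(params)}"


url = build_screen_search_url(biosample="K562")
print(url)
```

Swapping `assay_title` for `"MPRA"` or `"STARR-seq"` would target the other assay types, again assuming the portal uses those titles.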
ENCODE contains CRISPRi (interference) and CRISPRa (activation) screens targeting regulatory elements:
| Screen Type | Mechanism | Effect on Target | Use Case |
|---|---|---|---|
| CRISPRi (dCas9-KRAB) | Transcriptional repression | Silences enhancer/promoter | Loss-of-function; identifies required elements |
| CRISPRa (dCas9-VP64/p65) | Transcriptional activation | Activates latent elements | Gain-of-function; identifies sufficient elements |
| CRISPR knockout | Cas9 nuclease cutting | Disrupts or deletes element | Irreversible loss-of-function |
Typical ENCODE CRISPR screen outputs:
- Guide RNA quantifications (sgRNA counts per condition)
- Element quantifications (aggregated guide effects per target element)
- Differential expression results
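The relationship between the first two output types can be illustrated with a simplified aggregation: per-guide log2 fold changes are computed from depth-normalized counts, then collapsed to one score per target element. This is only a sketch under stated assumptions. The function names, the toy counts, and the median aggregation are illustrative and do not describe the ENCODE pipeline; dedicated tools such as MAGeCK model count dispersion and report significance, which this sketch does not.

```python
import math
from collections import defaultdict
from statistics import median


def guide_log2fc(selected, reference, pseudocount=1.0):
    """Per-guide log2 fold change after scaling each sample to counts per million."""
    sel_total = sum(selected.values())
    ref_total = sum(reference.values())
    lfc = {}
    for guide, ref_count in reference.items():
        sel_cpm = selected.get(guide, 0) / sel_total * 1e6
        ref_cpm = ref_count / ref_total * 1e6
        # The pseudocount avoids log(0) for guides that drop out entirely
        lfc[guide] = math.log2((sel_cpm + pseudocount) / (ref_cpm + pseudocount))
    return lfc


def element_scores(lfc, guide_to_element):
    """Collapse guide-level effects to one median score per target element."""
    by_element = defaultdict(list)
    for guide, value in lfc.items():
        by_element[guide_to_element[guide]].append(value)
    return {element: median(values) for element, values in by_element.items()}


# Toy sgRNA counts (hypothetical): guides against enhancer_A are depleted
reference = {"gA1": 200, "gA2": 220, "gB1": 180, "gB2": 200}
selected = {"gA1": 40, "gA2": 60, "gB1": 190, "gB2": 210}
guide_to_element = {
    "gA1": "enhancer_A", "gA2": "enhancer_A",
    "gB1": "enhancer_B", "gB2": "enhancer_B",
}

scores = element_scores(guide_log2fc(selected, reference), guide_to_element)
for element, score in sorted(scores.items()):
    print(f"{element}\t{score:.2f}")
```

The median is used here because it is less sensitive than the mean to a single outlier guide; with only a few guides per element, real analyses typically rely on a statistical model instead.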
# List available files for a CRISPR screen experiment