Disease Research with ENCODE Functional Genomics
When to Use
- User wants to connect GWAS variants to ENCODE regulatory elements for disease mechanism research
- User asks about "disease", "pathology", "therapeutic targets", "GWAS interpretation", or "clinical variants"
- User needs to annotate disease-associated loci with functional genomics data from ENCODE
- User wants to identify drug targets from epigenomic evidence using Open Targets integration
- Example queries: "find enhancers disrupted by diabetes GWAS hits", "identify drug targets from ChIP-seq data", "connect my disease variants to regulatory elements"
Leverage ENCODE's 926,535 human candidate cis-regulatory elements (cCREs) and multi-layer functional data to understand disease mechanisms, interpret disease-associated variants, identify therapeutic targets, and connect genomic findings to clinical applications.
Scientific Rationale
The question: "How can ENCODE functional genomics help me understand a disease's molecular mechanisms and identify actionable targets?"
Over 90% of disease-associated variants from GWAS fall in non-coding regions (Maurano et al. 2012). Many are therefore thought to act by disrupting regulatory elements that control gene expression, rather than by altering protein sequences. ENCODE provides one of the most comprehensive catalogs of these elements, spanning hundreds of cell types and tissues. This skill connects three layers: (1) genetic association data, (2) ENCODE functional annotations, and (3) clinical and pharmacological databases that identify druggable targets.
Literature Foundation
| Reference | Year | Journal | Key Contribution | Citations | DOI |
|---|---|---|---|---|---|
| ENCODE Phase 3 | 2020 | Nature | 926,535 human cCREs across 400+ biosamples, SCREEN portal | ~1,656 | 10.1038/s41586-020-2493-4 |
| Maurano et al. | 2012 | Science | Disease variants enriched in regulatory DNA; 76.6% of noncoding GWAS SNPs lie in DNase I hypersensitive sites (DHSs) or are in complete LD with a SNP in a nearby DHS | ~3,500 | 10.1126/science.1222794 |
| Finucane et al. | 2015 | Nat Genet | S-LDSC partitions heritability into functional annotations; ENCODE categories explain disproportionate heritability | ~2,253 | 10.1038/ng.3404 |
| Nasser et al. | 2021 | Nature | ABC model links enhancers to genes in 131 cell types; connected 5,036 GWAS signals to 2,249 genes | ~468 | 10.1038/s41586-021-03446-x |
| Roadmap Epigenomics | 2015 | Nature | 111 reference epigenomes; tissue-specific chromatin states; disease variant enrichment in tissue-specific marks | ~5,810 | 10.1038/nature14248 |
| Visscher et al. | 2017 | Am J Hum Genet | GWAS review — 10 years of discoveries, statistical frameworks, shift toward functional interpretation | ~2,500 | 10.1016/j.ajhg.2017.06.005 |
| Buniello et al. | 2019 | Nucleic Acids Res | GWAS Catalog: curated repository of published variant-trait associations | ~3,000 | 10.1093/nar/gky1120 |
| Ochoa et al. | 2021 | Nucleic Acids Res | Open Targets Platform — integrates GWAS, functional genomics, drugs for systematic target identification | ~600 | 10.1093/nar/gkaa1027 |
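Finucane et al. (2015) quantify "disproportionate heritability" as an enrichment: the proportion of SNP heritability attributed to an annotation divided by the proportion of SNPs that fall in it. S-LDSC typically reports these two proportions as `Prop._h2` and `Prop._SNPs`. The minimal sketch below performs only that final division on values you have already obtained. It does not estimate heritability, and the example numbers are hypothetical.

```python
def heritability_enrichment(prop_h2: float, prop_snps: float) -> float:
    """Enrichment = Pr(h2 in annotation) / Pr(SNPs in annotation).

    Values above 1 mean the annotation explains more heritability than its
    share of SNPs alone would predict. prop_h2 is not range-checked because
    noisy S-LDSC estimates can fall outside [0, 1].
    """
    if not 0 < prop_snps <= 1:
        raise ValueError("prop_snps must be in (0, 1]")
    return prop_h2 / prop_snps


# Hypothetical numbers: an annotation holding 2% of SNPs and 20% of h2.
print(heritability_enrichment(0.20, 0.02))  # roughly 10-fold enrichment
```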
Step 1: Map Disease to Relevant Tissues
ENCODE regulatory elements are highly tissue-specific, so choosing the correct tissue is the single most important decision in this workflow.
Disease-Tissue Mapping Table
| Disease Category | Primary Tissues | Key Cell Types | ENCODE Cell Lines | Example Diseases |
|---|---|---|---|---|
| Neurological | brain (cortex, hippocampus, cerebellum) | neurons, astrocytes, microglia | SK-N-SH, SK-N-DZ, BE2C | Alzheimer's, Parkinson's, schizophrenia |
| Cardiovascular | heart, aorta, blood vessels | cardiomyocytes, endothelial, smooth muscle | HUVEC, HCASMC | coronary artery disease, heart failure |
| Metabolic | pancreas, liver, adipose, muscle | beta cells, hepatocytes, adipocytes | HepG2, Panc1 | type 2 diabetes, NAFLD, obesity |
| Cancer | tissue of origin | tumor cells, microenvironment | K562, HepG2, MCF-7, A549, HCT116, PC-3 | leukemia, breast cancer, lung cancer |
| Autoimmune | blood, immune organs, thymus | T cells, B cells, macrophages | GM12878, Jurkat | RA, lupus, MS, type 1 diabetes |
| Respiratory | lung, trachea | alveolar epithelial, bronchial | A549, IMR-90 | asthma, COPD, pulmonary fibrosis |
| Renal | kidney | podocytes, tubular epithelial | HEK293 | CKD, IgA nephropathy, FSGS |
| Hepatic | liver, bile duct | hepatocytes, cholangiocytes | HepG2, Hep3B | NAFLD, cirrhosis, hepatitis |
| Endocrine | thyroid, adrenal, pituitary, pancreas | thyrocytes, adrenal cortical, beta cells | — | hypothyroidism, Cushing's |
| Musculoskeletal | bone, cartilage, skeletal muscle | osteoblasts, chondrocytes, myocytes | — | osteoarthritis, osteoporosis |
| Gastrointestinal | intestine, colon, stomach | epithelial, goblet, Paneth cells | HCT116, Caco-2 | IBD (Crohn's disease, ulcerative colitis), celiac disease |
| Hematological | blood, bone marrow | HSCs, erythroid progenitors, megakaryocytes | K562, GM12878, CD34+ primary cells | sickle cell disease, thalassemia, AML |
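The table can be encoded as a lookup so that tissue choices stay consistent across analyses. The sketch below covers only a few rows, simplified from the table. `DISEASE_TISSUE_MAP` and `biosamples_for` are hypothetical names. The organ strings are copied from the table, so they may need to be mapped to ENCODE's own organ vocabulary before being passed to `encode_search_experiments`.

```python
# Hypothetical lookup transcribed (simplified) from a few rows of the table.
DISEASE_TISSUE_MAP = {
    "neurological": {"organs": ["brain"], "cell_lines": ["SK-N-SH", "SK-N-DZ", "BE2C"]},
    "metabolic": {"organs": ["pancreas", "liver", "adipose", "muscle"],
                  "cell_lines": ["HepG2", "Panc1"]},
    "autoimmune": {"organs": ["blood", "thymus"], "cell_lines": ["GM12878", "Jurkat"]},
    "endocrine": {"organs": ["thyroid", "adrenal", "pituitary", "pancreas"], "cell_lines": []},
}


def biosamples_for(category: str) -> dict:
    """Return organs and cell lines for a disease category (case-insensitive)."""
    key = category.strip().lower()
    if key not in DISEASE_TISSUE_MAP:
        raise KeyError(f"no tissue mapping for category {category!r}")
    entry = DISEASE_TISSUE_MAP[key]
    return {
        "organs": list(entry["organs"]),
        "cell_lines": list(entry["cell_lines"]),
        # No listed line means tissue data or a documented proxy is required.
        "cell_line_available": bool(entry["cell_lines"]),
    }


print(biosamples_for("Endocrine"))
```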
Check ENCODE availability:
```python
encode_get_facets(organ="pancreas")
encode_get_facets(organ="brain")
```
If the tissue has limited data, use the Tier 1 cell lines (K562, GM12878, H1-hESC) or Roadmap Epigenomics reference epigenomes as proxies, and document the mismatch explicitly.
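Deciding when data are "limited" is a judgment call. The hypothetical sketch below shows one possible rule. Before skipping proxies, it requires at least one open-chromatin assay (DNase-seq or ATAC-seq) plus histone ChIP-seq in the tissue. The assay counts are assumed to be tallied by you from the `encode_get_facets` results, because the function does not parse that tool's output. The `min_experiments` threshold is an arbitrary placeholder.

```python
TIER1_CELL_LINES = ["K562", "GM12878", "H1-hESC"]
OPEN_CHROMATIN = ("DNase-seq", "ATAC-seq")
KEY_ASSAYS = OPEN_CHROMATIN + ("Histone ChIP-seq",)


def assess_coverage(tissue: str, assay_counts: dict, min_experiments: int = 2) -> dict:
    """Flag missing key assays for a tissue and decide whether proxies are needed.

    assay_counts maps an ENCODE assay_title to the number of experiments you
    found for the tissue (hypothetical, hand-assembled input).
    """
    def enough(assay: str) -> bool:
        return assay_counts.get(assay, 0) >= min_experiments

    missing = [a for a in KEY_ASSAYS if not enough(a)]
    # Need some open-chromatin data AND histone data to avoid proxies.
    use_proxy = not any(enough(a) for a in OPEN_CHROMATIN) or not enough("Histone ChIP-seq")
    note = ""
    if use_proxy:
        note = (f"Limited {tissue} data (missing: {', '.join(missing)}); "
                f"proxies: {', '.join(TIER1_CELL_LINES)} or Roadmap Epigenomics")
    return {"missing": missing, "use_proxy": use_proxy, "note": note}


print(assess_coverage("thyroid", {"ATAC-seq": 1}))
```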
Step 2: Find Disease-Relevant ENCODE Data
For GWAS / Genetic Diseases
Open chromatin and active enhancers in disease tissue (Maurano et al. 2012):
```python
encode_search_experiments(assay_title="ATAC-seq", organ="...", biosample_type="tissue")
encode_search_experiments(assay_title="Histone ChIP-seq", target="H3K27ac", organ="...")
encode_search_experiments(assay_title="DNase-seq", organ="...", biosample_type="tissue")
```
Then use the variant-annotation skill to overlap variants with functional elements.
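Conceptually, the overlap step asks whether each variant's position falls inside a peak or element interval. The sketch below shows that logic for single-nucleotide variants under two stated assumptions. First, peaks are in BED coordinates (0-based, half-open), as in ENCODE peak files. Second, variant positions are 1-based, as in VCF. This is not the variant-annotation skill's implementation. For genome-scale data, a dedicated tool such as `bedtools intersect` is the usual choice. The variant IDs in the example are placeholders.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(peaks):
    """Group BED intervals (chrom, start, end) by chromosome, then sort and merge them."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:  # overlapping or book-ended: extend
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def snv_in_peaks(index, chrom, pos_1based):
    """True if a 1-based SNV position lies inside any merged interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    x = pos_1based - 1  # convert VCF 1-based position to 0-based BED coordinate
    i = bisect_right(starts, x) - 1  # last interval starting at or before x
    return i >= 0 and x < ends[i]


def variants_in_peaks(variants, index):
    """Keep (chrom, pos, id) variants that fall inside a peak."""
    return [v for v in variants if snv_in_peaks(index, v[0], v[1])]


index = build_index([("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 10, 20)])
variants = [("chr1", 100, "rsA"), ("chr1", 101, "rsB"), ("chr2", 15, "rsC"), ("chr3", 5, "rsD")]
print(variants_in_peaks(variants, index))  # rsB and rsC; rsA sits one base before the peak
```

Merging overlapping peaks first keeps the lookup a single binary search per variant. The trade-off is that the function reports only whether a variant overlaps a peak, not which peak it hits.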
For Cancer Research
ENCODE cancer cell lines (NOT tumors — see Cancer Epigenomics section below):
```python
encode_search_experiments(biosample_term_name="K562")   # CML
encode_search_experiments(biosample_term_name="HepG2")  # Hepatocellular carcinoma
encode_search_experiments(biosample_term_name="MCF-7")  # ER+ breast cancer
encode_search_experiments(biosample_term_name="A549")   # Lung adenocarcinoma
```
For Perturbation Studies
```python
encode_search_experiments(perturbed=True, organ="...")
encode_search_experiments(assay_title="CRISPR screen", organ="...")
```
For Rare Diseases
1. Search for the closest available tissue.
2. Check Roadmap Epigenomics.
3. Use Tier 1 cell lines as a baseline.
4. Document tissue-proxy limitations.
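The four steps can be recorded in a small planning helper, so that the chosen reference and its limitations travel with the results. This is a hypothetical sketch. `plan_reference` and its inputs (biosample lists you assemble from the ENCODE and Roadmap searches) are illustrative. The helper treats the Tier 1 cell lines as a baseline added alongside the closest tissue, not as a replacement for it.

```python
TIER1_CELL_LINES = ["K562", "GM12878", "H1-hESC"]


def plan_reference(target_tissue, encode_biosamples, encode_tissue, roadmap_epigenomes):
    """Pick a primary reference in the order: closest ENCODE tissue, then Roadmap.

    encode_tissue is the tissue the ENCODE biosamples actually come from;
    all inputs are hypothetical, hand-assembled search results.
    """
    limitations = []
    if encode_biosamples:  # step 1: closest available ENCODE tissue
        source, samples = "ENCODE", list(encode_biosamples)
        if encode_tissue != target_tissue:
            limitations.append(f"ENCODE proxy tissue '{encode_tissue}' used for '{target_tissue}'")
    elif roadmap_epigenomes:  # step 2: Roadmap Epigenomics
        source, samples = "Roadmap Epigenomics", list(roadmap_epigenomes)
        limitations.append(f"no ENCODE data for '{target_tissue}'; Roadmap epigenomes used")
    else:
        source, samples = "none", []
        limitations.append(f"no tissue-matched data for '{target_tissue}'")
    # Step 3: Tier 1 lines as a common baseline. Step 4: state what they cannot show.
    limitations.append("Tier 1 cell lines are a baseline, not a substitute for the disease tissue")
    return {"source": source, "biosamples": samples,
            "baseline": list(TIER1_CELL_LINES), "limitations": limitations}


plan = plan_reference("retina", ["sample_1"], "eye", [])
for note in plan["limitations"]:
    print(note)
```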
Step 3: Cross-Reference with Disease Databases
PubMed — Disease Literature
```python
search_articles(query="[DISEASE] AND (ENCODE OR regulatory element OR enhancer)")
```
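In PubMed, an unquoted multi-word term such as regulatory element goes through automatic term mapping, whereas a quoted phrase is searched as a phrase. The hypothetical helper below builds the query with multi-word terms quoted. It assumes that `search_articles` passes the string to PubMed unchanged.

```python
def quote_term(term: str) -> str:
    """Wrap multi-word terms in double quotes so PubMed treats them as phrases."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def build_pubmed_query(disease: str,
                       context_terms=("ENCODE", "regulatory element", "enhancer")) -> str:
    """Build '<disease> AND (<term1> OR <term2> ...)' with phrase quoting."""
    context = " OR ".join(quote_term(t) for t in context_terms)
    return f"{quote_term(disease)} AND ({context})"


print(build_pubmed_query("type 2 diabetes"))
# "type 2 diabetes" AND (ENCODE OR "regulatory element" OR enhancer)
```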
Track experiments and link papers:
```python
encode_track_experiment(accession="ENCSR...", notes="Disease research - [disease]")
encode_get_citations(accession="ENCSR...")
encode_link_reference(experiment_accession="ENCSR...", reference_type="pmid", reference_id="12345678")
```
ClinicalTrials.gov — Active Trials
```python
search_trials(condition="[DISEASE]", intervention="[TARGET_GENE or DRUG]", status=["RECRUITING"])
```
Link trials:
```python
encode_link_reference(experiment_accession="ENCSR...", reference_type="nct_id", reference_id="NCT...")
```
Open Targets — Target-Disease Associations
Look up the target's Ensembl gene ID (for example with `search_entities`), then query the target's known drugs:
```python
search_entities(query_strings=["[GENE_NAME]"])
query_open_targets_graphql(
    query_string="""query target($ensemblId: String!) {
      target(ensemblId: $ensemblId) {
        approvedSymbol
        knownDrugs { rows { drug { name } phase mechanismOfAction } }
      }
    }""",
    variables={"ensemblId": "ENSG..."},
)
```
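GraphQL responses mirror the shape of the query, wrapped in a top-level `data` object. The result of the query above can therefore be flattened into a ranked drug list. The sketch below assumes that `query_open_targets_graphql` returns that standard JSON as a Python dict; if the tool wraps the response differently, adjust the first lookup. `summarize_known_drugs` is a hypothetical helper, and the drug names in the example are placeholders.

```python
def summarize_known_drugs(response: dict) -> list:
    """Flatten knownDrugs rows into unique (drug, mechanism) records, highest phase first."""
    target = (response.get("data") or {}).get("target")
    if target is None:  # null target, e.g. an unrecognized Ensembl ID
        return []
    rows = (target.get("knownDrugs") or {}).get("rows") or []
    best = {}
    for row in rows:
        name = (row.get("drug") or {}).get("name")
        if not name:
            continue
        mechanism = row.get("mechanismOfAction") or ""
        phase = row.get("phase") or 0
        key = (name, mechanism)
        # Keep only the most advanced phase seen for each drug-mechanism pair.
        if key not in best or phase > best[key]["phase"]:
            best[key] = {"drug": name, "phase": phase, "mechanism": mechanism}
    return sorted(best.values(), key=lambda r: (-r["phase"], r["drug"]))


example = {"data": {"target": {"approvedSymbol": "GENE1", "knownDrugs": {"rows": [
    {"drug": {"name": "DRUG_A"}, "phase": 2, "mechanismOfAction": "inhibitor"},
    {"drug": {"name": "DRUG_A"}, "phase": 4, "mechanismOfAction": "inhibitor"},
    {"drug": {"name": "DRUG_B"}, "phase": 3, "mechanismOfAction": "agonist"},
]}}}}
for record in summarize_known_drugs(example):
    print(record)
```

The helper deduplicates on the drug-mechanism pair because a single drug can appear in several rows, for example once per indication.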
bioRxiv — Recent Preprints
```python
search_preprints(category="genetics", recent_days=90)
encode_link_reference(experiment_accession="ENCSR...", reference_type="preprint_doi", reference_id="10.1101/...")
```