Integrating GTEx Tissue Expression with ENCODE Regulatory Data
Use GTEx gene expression across 54 human tissues to validate ENCODE regulatory element activity, establish enhancer-gene links, and provide tissue-specific expression context for functional genomics findings.
Scientific Rationale
The question: "Is the gene near my ENCODE regulatory element actually expressed in the tissue where the element is active?"
ENCODE catalogs where regulatory elements exist (enhancers, promoters, insulators) but does not directly measure gene expression across a broad tissue panel. GTEx (Genotype-Tissue Expression) fills this gap by providing RNA-seq-based gene expression measurements across 54 human tissues from ~1,000 post-mortem donors. Integrating the two answers a fundamental question: does the regulatory landscape match the transcriptional output?
An active enhancer (H3K27ac+, ATAC-seq+) near a gene in pancreas tissue is much more meaningful if GTEx confirms the gene is highly expressed in pancreas. Conversely, an ENCODE enhancer near a gene with zero expression in the relevant tissue suggests the enhancer regulates a different gene, or acts in a cell-type subpopulation not captured by bulk GTEx.
What GTEx Provides
- Median gene expression (TPM) across 54 human tissues from ~1,000 donors
- Transcript-level expression for isoform analysis
- eQTLs — variants associated with gene expression in specific tissues (cis and trans)
- sQTLs — variants associated with alternative splicing
- Single-nucleus RNA-seq for selected tissues (GTEx v8+)
- Allele-specific expression data
Why Integrate with ENCODE
| ENCODE provides | GTEx provides | Together |
|---|---|---|
| Where regulatory elements are | Where genes are expressed | Regulatory element-expression correlation |
| Tissue-specific enhancers | Tissue-specific expression | Enhancer-gene validation |
| TF binding sites | eQTLs in those sites | Functional variant identification |
| Chromatin accessibility | Expression levels | Accessibility-expression concordance |
Key Literature
- GTEx Consortium 2020 "The GTEx Consortium atlas of genetic regulatory effects across human tissues" (Science, ~4,000 citations). The flagship publication describing the v8 release with 17,382 samples across 54 tissues from 948 donors. Identified cis-eQTLs for about 95% of protein-coding genes, and characterized tissue-sharing patterns and cell-type interaction eQTLs. DOI: 10.1126/science.aaz1776
- Aguet et al. 2017 "Genetic effects on gene expression across human tissues" (Nature, ~3,000 citations). The v6p analysis establishing the multi-tissue eQTL framework, demonstrating widespread tissue-specific genetic regulation. DOI: 10.1038/nature24277
- ENCODE Project Consortium 2020 (Nature, ~1,656 citations). Registry of 926,535 human cCREs. Provides the regulatory element catalog to cross-reference with GTEx expression. DOI: 10.1038/s41586-020-2493-4
- Nasser et al. 2021 (Nature, ~468 citations). ABC model linking enhancers to genes using ENCODE data. GTEx expression validates ABC-predicted enhancer-gene pairs. DOI: 10.1038/s41586-021-03446-x
When to Use This Skill
| Scenario | How GTEx helps |
|---|---|
| Found enhancer near gene X in ENCODE | Check if gene X is expressed in the matching tissue |
| GWAS variant in ENCODE peak | Query GTEx eQTLs to identify regulated gene |
| Comparing regulatory landscapes across tissues | Validate that differential enhancers correspond to differential expression |
| Designing functional validation | Confirm gene is expressed before investing in CRISPR/reporter assays |
| Interpreting TF ChIP-seq | Check if TF target genes show expected expression patterns |
| Choosing relevant ENCODE biosamples | Use GTEx to identify which tissues express your gene of interest |
GTEx REST API Reference
Base URL: https://gtexportal.org/api/v2
No authentication required. Responses are JSON.
Key Endpoints
| Endpoint | Purpose | Key Parameters |
|---|---|---|
| /expression/medianGeneExpression | Median TPM by tissue for a gene | gencodeId (versioned GENCODE ID), datasetId |
| /expression/medianTranscriptExpression | Transcript-level TPM by tissue | gencodeId (versioned GENCODE ID), datasetId |
| /association/singleTissueEqtl | eQTLs for a gene in a tissue | gencodeId, tissueSiteDetailId, datasetId |
| /expression/topExpressedGene | Most expressed genes in a tissue | tissueSiteDetailId, datasetId |
| /dataset/tissueSiteDetail | List all GTEx tissues with IDs | none required |
| /reference/gene | Gene metadata lookup | geneId (gene symbol or Ensembl/GENCODE ID) |
Dataset IDs
- gtex_v8: Current release (54 tissues, 948 donors, 17,382 samples)
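The base URL, endpoint paths, and query parameters above combine into a plain GET request. The following is a minimal sketch using only the standard library; `gtex_url` and `gtex_get` are hypothetical helper names introduced here for illustration, not part of any GTEx client, and the example assumes `Pancreas` is a valid `tissueSiteDetailId`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GTEX_BASE_URL = "https://gtexportal.org/api/v2"  # base URL from this reference


def gtex_url(endpoint: str, **params) -> str:
    """Build a full GTEx API URL from an endpoint path and query parameters."""
    query = urlencode(params, doseq=True)  # doseq lets a list value repeat its key
    url = f"{GTEX_BASE_URL}/{endpoint.strip('/')}"
    return f"{url}?{query}" if query else url


def gtex_get(endpoint: str, **params) -> dict:
    """Fetch one endpoint and return the decoded JSON body (makes a network call)."""
    with urlopen(gtex_url(endpoint, **params), timeout=30) as resp:
        return json.load(resp)


# Build (but do not send) a request for the most expressed genes in pancreas.
example_url = gtex_url(
    "/expression/topExpressedGene",
    tissueSiteDetailId="Pancreas",
    datasetId="gtex_v8",
)
print(example_url)
```

Keeping URL construction separate from the network call makes it easy to inspect a request before sending it.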
Step 1: Identify the Gene of Interest from ENCODE Data
The starting point is typically an ENCODE finding — an active regulatory element near a gene:
# Find enhancers in pancreas
encode_search_experiments(assay_title="Histone ChIP-seq", target="H3K27ac", organ="pancreas")
# Get peak files
encode_list_files(
    experiment_accession="ENCSR...",
    file_format="bed",
    output_type="IDR thresholded peaks",
    assembly="GRCh38"
)
From the peak file, identify the nearest gene(s) to the enhancer. You will need the Ensembl gene ID (ENSG...) for GTEx queries.
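One simple way to pick candidate genes is to rank them by distance from the peak midpoint to each transcription start site (TSS). The sketch below assumes you already have a gene annotation loaded as records with a chromosome and TSS position; the `Gene` record, the `nearest_genes` helper, and all coordinates are hypothetical and for illustration only. Nearest-gene assignment is a heuristic, since an enhancer can regulate a more distant gene.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    gene_id: str  # placeholder IDs here; use Ensembl gene IDs in practice
    chrom: str
    tss: int      # transcription start site position


def nearest_genes(chrom: str, start: int, end: int,
                  genes: List[Gene], k: int = 2) -> List[Tuple[str, int]]:
    """Return up to k (gene_id, distance) pairs closest to the peak midpoint."""
    midpoint = (start + end) // 2
    same_chrom = [g for g in genes if g.chrom == chrom]
    ranked = sorted(same_chrom, key=lambda g: abs(g.tss - midpoint))
    return [(g.gene_id, abs(g.tss - midpoint)) for g in ranked[:k]]


# Toy annotation with made-up coordinates.
toy_genes = [
    Gene("GENE_A", "chr11", 2_160_000),
    Gene("GENE_B", "chr11", 2_150_000),
    Gene("GENE_C", "chr12", 2_155_000),
]
hits = nearest_genes("chr11", 2_157_000, 2_158_000, toy_genes)
print(hits)
```

Returning more than one candidate keeps the second-nearest gene available for the GTEx expression check in the next step.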
Step 2: Query GTEx for Gene Expression
Get median expression across all tissues
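import requests

BASE_URL = "https://gtexportal.org/api/v2"

# INS (insulin). Expression endpoints expect the versioned GENCODE ID,
# so resolve it from the unversioned Ensembl ID first (GTEx v8 uses GENCODE v26).
lookup = requests.get(
    f"{BASE_URL}/reference/gene",
    params={"geneId": "ENSG00000254647", "gencodeVersion": "v26", "genomeBuild": "GRCh38/hg38"},
    timeout=30,
)
lookup.raise_for_status()
gene_id = lookup.json()["data"][0]["gencodeId"]

params = {
    "gencodeId": gene_id,
    "datasetId": "gtex_v8",
}
response = requests.get(f"{BASE_URL}/expression/medianGeneExpression", params=params, timeout=30)
response.raise_for_status()
data = response.json()
# Each entry has: tissueSiteDetailId, median, geneSymbol, etc.
for entry in sorted(data["data"], key=lambda x: x["median"], reverse=True)[:10]:
    print(f"{entry['tissueSiteDetailId']}: {entry['median']:.1f} TPM")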
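To answer the core question for a single tissue, it helps to pull out that tissue's median TPM and its rank among all tissues. The sketch below works on a list shaped like `data["data"]` from the query above, relying only on the `tissueSiteDetailId` and `median` fields; `expression_in_tissue` is a hypothetical helper, and the TPM values in the toy input are made up.

```python
def expression_in_tissue(entries, tissue_id):
    """Return (median TPM, rank) for one tissue; rank 1 is the highest-expressing tissue.

    Returns None when the tissue is absent from the entries.
    """
    ranked = sorted(entries, key=lambda e: e["median"], reverse=True)
    for rank, entry in enumerate(ranked, start=1):
        if entry["tissueSiteDetailId"] == tissue_id:
            return entry["median"], rank
    return None


# Toy entries shaped like items in data["data"]; values are illustrative only.
toy_entries = [
    {"tissueSiteDetailId": "Liver", "median": 0.2},
    {"tissueSiteDetailId": "Pancreas", "median": 85.0},
    {"tissueSiteDetailId": "Lung", "median": 3.5},
]
result = expression_in_tissue(toy_entries, "Pancreas")
print(result)
```

A high TPM with a top rank in the tissue where the ENCODE element is active supports the enhancer-gene link; a low value suggests checking other nearby genes.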
Get transcript-level expression
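url = "https://gtexportal.org/api/v2/expression/medianTranscriptExpression"
params = {
    "gencodeId": gene_id,  # versioned GENCODE ID is expected here
    "datasetId": "gtex_v8",
}
response = requests.get(url, params=params)
response.raise_for_status()
transcripts = response.json()["data"]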
Step 3: Map GTEx Tissues to ENCODE Biosamples
GTEx and ENCODE use different tissue nomenclature. Key mappings:
| GTEx tissueSiteDetailId | ENCODE organ/biosample | Notes |
|---|---|---|
| Pancreas | pancreas | Direct match |
| Liver | liver | Direct match |
| Brain_Cortex | brain | GTEx has 13 brain sub-regions |
| Brain_Hippocampus | brain | Map to specific brain region |
| Heart_Left_Ventricle | heart | GTEx separates left ventricle and atrial appendage |
| Heart_Atrial_Appendage | heart | See Heart_Left_Ventricle |
| Lung | lung | Direct match |
| Kidney_Cortex | kidney | GTEx v8 also lists Kidney_Medulla, but with very few samples |
| Whole_Blood | blood | ENCODE uses specific blood cell types |
| Skin_Sun_Exposed_Lower_leg | skin of body | GTEx has sun-exposed and not-sun-exposed skin |
| Adipose_Subcutaneous | adipose tissue | GTEx has subcutaneous and visceral adipose |
| Muscle_Skeletal | muscle | Direct match |
| Stomach | stomach | Direct match |
| Small_Intestine_Terminal_Ileum | intestine | Partial match |
| Colon_Sigmoid | large intestine | GTEx has sigmoid and transverse colon |
To get the full list of GTEx tissue IDs:
url = "https://gtexportal.org/api/v2/dataset/tissueSiteDetail"
response = requests.get(url)
tissues = response.json()["data"]
for t in tissues:
    print(f"{t['tissueSiteDetailId']}: {t['tissueSiteDetail']}")
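The mapping table above can be kept as a small lookup so that a GTEx tissue ID translates into an ENCODE organ term, and the reverse. This is a sketch that copies only the pairs listed in the table; the dictionary and both helper names are hypothetical, and the ENCODE terms should be checked against the values your ENCODE search actually accepts.

```python
# GTEx tissueSiteDetailId -> ENCODE organ/biosample term, copied from the table.
GTEX_TO_ENCODE_ORGAN = {
    "Pancreas": "pancreas",
    "Liver": "liver",
    "Brain_Cortex": "brain",
    "Brain_Hippocampus": "brain",
    "Heart_Left_Ventricle": "heart",
    "Heart_Atrial_Appendage": "heart",
    "Lung": "lung",
    "Kidney_Cortex": "kidney",
    "Whole_Blood": "blood",
    "Skin_Sun_Exposed_Lower_leg": "skin of body",
    "Adipose_Subcutaneous": "adipose tissue",
    "Muscle_Skeletal": "muscle",
    "Stomach": "stomach",
    "Small_Intestine_Terminal_Ileum": "intestine",
    "Colon_Sigmoid": "large intestine",
}


def encode_organ_for(gtex_tissue_id):
    """Return the ENCODE organ term for a GTEx tissue ID, or None if unmapped."""
    return GTEX_TO_ENCODE_ORGAN.get(gtex_tissue_id)


def gtex_tissues_for(encode_organ):
    """Return every mapped GTEx tissue ID for an ENCODE organ term."""
    return [t for t, organ in GTEX_TO_ENCODE_ORGAN.items() if organ == encode_organ]


print(encode_organ_for("Colon_Sigmoid"))
print(gtex_tissues_for("heart"))
```

The reverse lookup returns a list because one ENCODE organ can correspond to several GTEx sub-tissues, as with heart and brain.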
Step 4: Interpret TPM Values
Expression Thresholds
| TPM Range | Interpretation | ENCODE Implication |
|---|---|---|
| 0 | Not detected | Regulatory elements likely inactive or regulating a different gene |
| 0.1 - 1 | Low / noise threshold | May reflect rare cell-type expression in bulk tissue |
| 1 - 10 | Expressed | Regulatory elemen |