Query the Ensembl REST API
When to Use
- User wants to annotate variants with Ensembl VEP (Variant Effect Predictor) consequences
- User asks about "VEP", "Ensembl", "variant annotation", "regulatory build", or "gene annotation"
- User needs to convert coordinates between assemblies using Ensembl's liftover API
- User wants to check the Ensembl Regulatory Build for overlap with ENCODE elements
- Example queries: "run VEP on my variant list", "annotate SNPs with regulatory consequences", "check Ensembl regulatory build for my peaks"
Annotate variants, look up regulatory features, convert coordinates, and resolve gene identifiers using the Ensembl REST API.
Scientific Rationale
The question: "What does the Ensembl Regulatory Build say about this region, and what is the predicted effect of this variant?"
The Ensembl Regulatory Build integrates ENCODE, Roadmap Epigenomics, and Blueprint data into a unified annotation of regulatory features across human cell types. The Variant Effect Predictor (VEP) is the standard tool for variant consequence prediction, integrating 50+ annotation sources including CADD, REVEL, SpliceAI, and AlphaMissense.
Ensembl ↔ ENCODE Feedback Loop
Ensembl's Regulatory Build incorporates ENCODE ChIP-seq, DNase-seq, and CTCF data to define regulatory features. Querying Ensembl after an ENCODE analysis provides an independent, aggregated view of regulatory annotations — often including data from non-ENCODE sources (Blueprint, Roadmap) that may cover biosamples not in ENCODE.
Literature Support
- Cunningham et al. 2022 (Nucleic Acids Research): Ensembl 2022 update.
- McLaren et al. 2016 (Genome Biology, ~4,500 citations): The Ensembl Variant Effect Predictor.
- Zerbino et al. 2015 (Genome Biology): The Ensembl Regulatory Build.
API Reference
Base URL: https://rest.ensembl.org
Authentication: None required
Rate limit: 55,000 requests per hour (about 15 per second); exceeding it returns HTTP 429 with a Retry-After header. Region queries are capped at 5 Mb.
Formats: JSON (default), XML, GFF3, BED
Current version: Ensembl 114
Add a Content-type: application/json header to every request.
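The request conventions above can be wrapped in a small helper. This is a minimal sketch rather than an official client: the names ensembl_url and ensembl_get are hypothetical, the retry count of three is an arbitrary choice, and it assumes curl, awk, and mktemp are available.

```shell
#!/bin/sh
# Sketch only: ensembl_url and ensembl_get are hypothetical helper names.
ENSEMBL_BASE="https://rest.ensembl.org"

# Join the base URL, an endpoint path, and an optional query string.
ensembl_url() {
    _path="${1#/}"                      # tolerate a leading slash
    if [ -n "${2:-}" ]; then
        printf '%s/%s?%s\n' "$ENSEMBL_BASE" "$_path" "$2"
    else
        printf '%s/%s\n' "$ENSEMBL_BASE" "$_path"
    fi
}

# GET an endpoint as JSON. On HTTP 429, sleep for the Retry-After value
# (default 1 s) and retry, up to three attempts in total (arbitrary choice).
ensembl_get() {
    _url=$(ensembl_url "$1" "${2:-}")
    _try=1
    while [ "$_try" -le 3 ]; do
        _hdr=$(mktemp) || return 1
        _body=$(mktemp) || return 1
        _code=$(curl -sS -D "$_hdr" -o "$_body" -w '%{http_code}' \
            -H 'Content-type: application/json' "$_url")
        if [ "$_code" != 429 ]; then
            cat "$_body"
            rm -f "$_hdr" "$_body"
            if [ "$_code" = 200 ]; then return 0; else return 1; fi
        fi
        _wait=$(awk 'tolower($1) == "retry-after:" { print $2 }' "$_hdr" | tr -d '\r')
        rm -f "$_hdr" "$_body"
        sleep "${_wait:-1}"
        _try=$((_try + 1))
    done
    return 1
}
```

For example, ensembl_get /lookup/id/ENSG00000157764 expand=1 would fetch the gene record used later in this guide. Building the URL in a separate function keeps that part checkable without network access.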
Step 1: Regulatory Feature Overlap
Query what regulatory features the Ensembl Regulatory Build assigns to a region:
# Get regulatory features in a region
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/overlap/region/human/7:140424943-140624564?feature=regulatory"
# Also get TF binding motifs
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/overlap/region/human/7:140424943-140624564?feature=regulatory;feature=motif"
Regulatory Feature Types
| Type | Description | ENCODE Equivalent |
|---|---|---|
| Promoter | Active promoter region | cCRE PLS |
| Enhancer | Active enhancer region | cCRE pELS/dELS |
| Open chromatin | Accessible region without H3K27ac | DNase-only sites |
| CTCF binding site | CTCF-occupied region | cCRE CTCF-only |
| TF binding site | Other TF binding | TF ChIP-seq peaks |
| Promoter flanking | Region flanking a promoter | ChromHMM TssAFlnk state (no exact cCRE class) |
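Because region queries are capped at 5 Mb, a longer interval has to be split before it is sent to the overlap endpoint from Step 1. The sketch below prints one URL per window; overlap_urls is a hypothetical helper name, and it assumes 1-based inclusive coordinates and that a window of exactly 5,000,000 bp is accepted.

```shell
# Sketch only: overlap_urls is a hypothetical helper name.
# Print one overlap/region URL per window of at most 5 Mb (1-based, inclusive).
# Arguments: chromosome, start, end, optional feature type (default: regulatory).
overlap_urls() {
    _chr=$1; _s=$2; _e=$3; _feat=${4:-regulatory}
    _max=5000000
    while [ "$_s" -le "$_e" ]; do
        _we=$((_s + _max - 1))
        if [ "$_we" -gt "$_e" ]; then _we=$_e; fi
        printf 'https://rest.ensembl.org/overlap/region/human/%s:%s-%s?feature=%s\n' \
            "$_chr" "$_s" "$_we" "$_feat"
        _s=$((_we + 1))
    done
}
```

A feature that spans a window boundary may be returned by both neighbouring queries, so results should be de-duplicated by feature ID after merging.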
Step 2: Variant Effect Prediction (VEP)
VEP provides consequence predictions for variants:
Single Variant (by region notation)
# VEP annotation for a single-base substitution (start equals end)
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/vep/human/region/9:22125503-22125503:1/C"
# By rs ID
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/vep/human/id/rs699"
Batch VEP (POST, up to 200 variants)
curl -X POST -H "Content-type: application/json" \
"https://rest.ensembl.org/vep/human/region" \
-d '{"variants": ["1 230710048 . A G . . .", "2 241533886 . T C . . ."]}'
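Variant lists longer than the 200-variant batch limit need to be split into several POST bodies. The following sketch reads VCF-style lines on stdin and prints one JSON payload per batch. vep_payloads and vep_post are hypothetical helper names, tabs are converted to spaces to match the example above, and the input is assumed to contain no double quotes or backslashes.

```shell
# Sketch only: vep_payloads and vep_post are hypothetical helper names.
# Read VCF-style lines on stdin; print one JSON body per batch (default 200).
vep_payloads() {
    awk -v max="${1:-200}" '
        /^#/ || NF == 0 { next }            # skip VCF headers and blank lines
        {
            gsub(/\t/, " ")                 # match the space-separated example
            buf = buf (n ? ", " : "") "\"" $0 "\""
            if (++n == max) { print "{\"variants\": [" buf "]}"; buf = ""; n = 0 }
        }
        END { if (n) print "{\"variants\": [" buf "]}" }
    '
}

# POST one payload to the batch VEP endpoint.
vep_post() {
    curl -sS -X POST -H 'Content-type: application/json' \
        "https://rest.ensembl.org/vep/human/region" -d "$1"
}

# Usage (not run here):
#   vep_payloads < variants.vcf | while IFS= read -r p; do vep_post "$p"; sleep 1; done
```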
Key VEP Parameters
| Parameter | Description | Default |
|---|---|---|
| CADD=1 | Include CADD scores | Off |
| Enformer=1 | Include Enformer predictions | Off |
| AlphaMissense=1 | Include AlphaMissense pathogenicity | Off |
| REVEL=1 | Include REVEL scores | Off |
| SpliceAI=1 | Include SpliceAI splicing predictions | Off |
| regulatory=1 | Include regulatory feature overlap | Off |
| cell_type= | Cell type for regulatory annotations | All |
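The on/off parameters in this table are appended to the request URL as name=1 pairs joined by semicolons. A minimal sketch of that assembly follows; vep_id_url is a hypothetical helper name, and it only handles flags that take the value 1 (not cell_type, which needs a value).

```shell
# Sketch only: vep_id_url is a hypothetical helper name.
# Build a VEP-by-ID URL, switching on each named flag with "=1".
vep_id_url() {
    _id=$1; shift
    _q=""
    for _flag in "$@"; do
        _q="${_q:+$_q;}$_flag=1"
    done
    printf 'https://rest.ensembl.org/vep/human/id/%s%s\n' "$_id" "${_q:+?$_q}"
}
```

When the result is passed to curl it must stay quoted, because an unquoted semicolon ends the shell command.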
VEP Consequence Hierarchy (most to least severe)
| Consequence | Impact | Description |
|---|---|---|
| transcript_ablation | HIGH | Deletion of entire transcript |
| splice_donor_variant | HIGH | Essential splice donor site |
| stop_gained | HIGH | Premature stop codon |
| frameshift_variant | HIGH | Reading frame change |
| missense_variant | MODERATE | Amino acid change |
| splice_region_variant | LOW | Near splice site |
| synonymous_variant | LOW | No amino acid change |
| regulatory_region_variant | MODIFIER | In regulatory element |
| intergenic_variant | MODIFIER | Between genes |
For ENCODE regulatory variants: Most will be classified as regulatory_region_variant (MODIFIER impact). The VEP consequence alone does not capture regulatory impact — combine with ENCODE cCRE class, tissue activity, and TF disruption data.
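When triaging VEP output it can help to map each consequence term back to its impact class. The sketch below encodes only the nine terms listed in the table above (the full VEP hierarchy is longer); vep_impact and vep_worst are hypothetical helper names.

```shell
# Sketch only: vep_impact and vep_worst are hypothetical helper names.
# Map a consequence term to its impact class, using the table above.
vep_impact() {
    case $1 in
        transcript_ablation|splice_donor_variant|stop_gained|frameshift_variant)
            echo HIGH ;;
        missense_variant)
            echo MODERATE ;;
        splice_region_variant|synonymous_variant)
            echo LOW ;;
        regulatory_region_variant|intergenic_variant)
            echo MODIFIER ;;
        *)
            echo UNKNOWN ;;      # term not covered by this abbreviated table
    esac
}

# Read consequence terms on stdin (one per line) and print the term with the
# most severe impact class. Ties within a class are not ranked further.
vep_worst() {
    while IFS= read -r _t; do
        case $(vep_impact "$_t") in
            HIGH) _r=1 ;; MODERATE) _r=2 ;; LOW) _r=3 ;; MODIFIER) _r=4 ;; *) _r=5 ;;
        esac
        printf '%s %s\n' "$_r" "$_t"
    done | sort -n -k1,1 | head -n 1 | cut -d' ' -f2
}
```

A variant that resolves to MODIFIER here is a candidate for the ENCODE-based follow-up described above, not one to discard.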
Step 3: Coordinate Conversion (LiftOver)
Convert between GRCh37 (hg19) and GRCh38 (hg38):
# GRCh37 → GRCh38
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/map/human/GRCh37/17:1000000..1000100:1/GRCh38"
# GRCh38 → GRCh37
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/map/human/GRCh38/17:1000000..1000100:1/GRCh37"
When needed: Older GWAS studies report variants on GRCh37. ENCODE data uses GRCh38. Always liftOver before intersecting.
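Peaks and GWAS intervals are often stored as BED, which is 0-based and half-open, while the map endpoint above takes 1-based inclusive coordinates and chromosome names without the chr prefix. This sketch converts BED lines on stdin into map URLs; bed_to_map_urls is a hypothetical helper name, and strand is assumed to be 1 for every interval.

```shell
# Sketch only: bed_to_map_urls is a hypothetical helper name.
# Read BED on stdin; print one map URL per interval.
# Arguments: source assembly (default GRCh37), target assembly (default GRCh38).
bed_to_map_urls() {
    awk -v from="${1:-GRCh37}" -v to="${2:-GRCh38}" '
        /^(#|track|browser)/ || NF < 3 { next }   # skip headers and short lines
        {
            chr = $1
            sub(/^chr/, "", chr)                  # Ensembl omits the "chr" prefix
            if (chr == "M") chr = "MT"            # Ensembl names the mitochondrion MT
            # BED start is 0-based, so add 1; BED end is already the inclusive end.
            printf "https://rest.ensembl.org/map/human/%s/%s:%d..%d:1/%s\n", \
                from, chr, $2 + 1, $3, to
        }'
}
```

An interval may map to several target segments or to none, so each response should be checked before the converted coordinates are used for intersection.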
Step 4: Gene Lookup and Cross-References
Gene Information
# By Ensembl ID
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/lookup/id/ENSG00000157764?expand=1"
# By symbol
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/lookup/symbol/homo_sapiens/BRAF"
Cross-References (Ensembl ↔ External DBs)
# Get all external references for a gene
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/xrefs/id/ENSG00000157764"
# Filter by external DB
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/xrefs/id/ENSG00000157764?external_db=HGNC"
ENCODE integration: ENCODE target names are typically HGNC symbols or Ensembl IDs. Use this endpoint to resolve between identifier systems.
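Resolving many ENCODE target names one request at a time is slow. The Ensembl REST documentation also describes a POST form of lookup/symbol that takes a JSON list of symbols; the sketch below builds that body. symbols_payload is a hypothetical helper name, and the endpoint's batch size limit should be checked in the Ensembl documentation before relying on it.

```shell
# Sketch only: symbols_payload is a hypothetical helper name.
# Build the JSON body for a batch symbol lookup from the given gene symbols.
symbols_payload() {
    _items=""
    for _s in "$@"; do
        _items="${_items:+$_items, }\"$_s\""
    done
    printf '{"symbols": [%s]}\n' "$_items"
}

# Usage (not run here):
#   curl -sS -X POST -H 'Content-type: application/json' \
#       "https://rest.ensembl.org/lookup/symbol/homo_sapiens" \
#       -d "$(symbols_payload BRAF BRCA2)"
```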
Step 5: Phenotype/Disease Associations
# Get phenotype associations for a gene
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/phenotype/gene/homo_sapiens/BRCA2"
# Get phenotype associations for a region
curl -H "Content-type: application/json" \
"https://rest.ensembl.org/phenotype/region/homo_sapiens/9:22125500-22136000"
Integrated ENCODE + Ensembl Workflow
1. Find ENCODE regulatory variants:
→ Intersect GWAS variants with ENCODE cCREs
2. Annotate with VEP:
curl "https://rest.ensembl.org/vep/human/id/rs699?CADD=1;regulatory=1;REVEL=1"
→ Get consequence, CADD score, regulatory overlap
3. Check Ensembl Regulatory Build for independent confirmation:
curl "https://rest.ensembl.org/overlap/region/human/CHR:START-END?feature=regulatory"
→ Compare with ENCODE cCRE classification
4. If working with GRCh37 data, liftOver:
curl "https://rest.ensembl.org/map/human/GRCh37/CHR:POS..POS:1/GRCh38"
5. Resolve gene identifiers for ENCODE targets:
curl "https://rest.ensembl.org/lookup/symbol/homo_sapiens/GENE_SYMBOL"
6. Check disease associations:
curl "https://rest.ensembl.org/phenotype/gene/homo_sapiens/GENE"
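The GET steps of this workflow (2, 3, 5, and 6) can be chained in a short script. This is a sketch under stated assumptions: workflow_urls and run_workflow are hypothetical helper names, the input is already on GRCh38 so the liftOver step is skipped, and the one-second pause is an arbitrary courtesy delay.

```shell
# Sketch only: workflow_urls and run_workflow are hypothetical helper names.
# Print the GET URLs for workflow steps 2, 3, 5, and 6 (GRCh38 input assumed).
# Arguments: rsID, region as CHR:START-END, gene symbol.
workflow_urls() {
    _rsid=$1; _region=$2; _gene=$3
    _base=https://rest.ensembl.org
    printf '%s/vep/human/id/%s?CADD=1;regulatory=1;REVEL=1\n' "$_base" "$_rsid"
    printf '%s/overlap/region/human/%s?feature=regulatory\n' "$_base" "$_region"
    printf '%s/lookup/symbol/homo_sapiens/%s\n' "$_base" "$_gene"
    printf '%s/phenotype/gene/homo_sapiens/%s\n' "$_base" "$_gene"
}

# Fetch each URL in turn, writing one response per line to stdout.
run_workflow() {
    workflow_urls "$@" | while IFS= read -r _u; do
        curl -sS -H 'Content-type: application/json' "$_u"
        echo
        sleep 1        # arbitrary pause to stay well under the rate limit
    done
}
```

Printing the URLs first (workflow_urls alone) gives a dry run that can be reviewed before any request is sent.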
Pitfalls and Caveats
- Ensembl Regulatory Build ≠ ENCODE cCREs: The Regulatory Build incorporates ENCODE data but uses different classification criteria. Annotations may not perfectly overlap with ENCODE cCRE calls.
- **VEP MODIFIER impact for regulat