When to Use
- User wants to check if variants in ENCODE regulatory peaks have clinical significance in ClinVar
- User asks about "ClinVar", "pathogenic variants", "clinical significance", or "variant classification"
- User needs to annotate ENCODE-derived regulatory variants with disease associations
- User wants to find clinically relevant variants within enhancers, promoters, or open chromatin regions
- Example queries: "check ClinVar for variants in my ATAC-seq peaks", "find pathogenic variants in pancreas enhancers", "annotate regulatory variants with clinical significance"
Annotating ENCODE Regulatory Variants with ClinVar Clinical Significance
Cross-reference ENCODE functional genomic elements with ClinVar clinical variant classifications to identify pathogenic variants in regulatory regions and understand non-coding disease mechanisms.
Scientific Rationale
The question: "Do any clinically significant variants fall within my ENCODE regulatory elements, and can ENCODE data explain their pathogenic mechanism?"
ClinVar is NCBI's public archive of variant-disease associations, aggregating submissions from clinical laboratories, research groups, and expert panels. Most ClinVar annotations focus on coding variants, but a growing number of non-coding variants are being classified. ENCODE provides the functional context to explain WHY a non-coding variant is pathogenic — by showing that it disrupts an active enhancer, promoter, or insulator in disease-relevant tissue.
This bidirectional integration serves two use cases:
- Forward: Start from ENCODE peaks, find clinically significant variants within them
- Reverse: Start from ClinVar pathogenic variants, use ENCODE to explain their mechanism
The Non-Coding Variant Challenge
- ~90% of GWAS-associated variants are in non-coding regions (Maurano et al. 2012)
- ClinVar increasingly includes non-coding variants, but most lack mechanistic annotation
- ENCODE regulatory annotations provide the "why" behind non-coding pathogenicity
- A variant classified as VUS (variant of uncertain significance) may be reclassified with ENCODE functional evidence
Key Literature
- Landrum et al. 2018 "ClinVar: improving access to variant interpretations and supporting evidence" (Nucleic Acids Research, ~2,000 citations). Describes the ClinVar database architecture, submission standards, and the star-rating review system for variant classifications. DOI: 10.1093/nar/gkx1153
- Riggs et al. 2020 "Technical standards for the interpretation and reporting of constitutional copy-number variants: a joint consensus recommendation of the ACMG and ClinGen" (Genetics in Medicine, ~500 citations). Framework for interpreting structural variants, relevant when ENCODE elements overlap CNVs. DOI: 10.1038/s41436-019-0686-8
- Richards et al. 2015 "Standards and guidelines for the interpretation of sequence variants: ACMG/AMP joint consensus recommendation" (Genetics in Medicine, ~12,000 citations). The ACMG variant classification framework (pathogenic through benign). ENCODE functional data can provide evidence for PS3/BS3 (functional studies) criteria. DOI: 10.1038/gim.2015.30
- ENCODE Project Consortium 2020 (Nature, ~1,656 citations). Registry of 926,535 human cCREs — the functional annotation layer for interpreting non-coding ClinVar variants. DOI: 10.1038/s41586-020-2493-4
ClinVar Clinical Significance Categories
| Classification | Meaning | ENCODE Relevance |
|---|---|---|
| Pathogenic | Causes disease | If in a regulatory region, ENCODE may explain the mechanism |
| Likely pathogenic | Strong evidence for disease causation | ENCODE evidence may support reclassification to pathogenic |
| Uncertain significance (VUS) | Not enough evidence to classify | ENCODE functional data may help resolve |
| Likely benign | Strong evidence against pathogenicity | — |
| Benign | Does not cause disease | — |
| Conflicting interpretations | Labs disagree on classification | ENCODE data may resolve conflict |
| Risk factor | Increases disease risk | May overlap ENCODE regulatory elements |
ClinVar Star Ratings
| Stars | Review Status | Confidence |
|---|---|---|
| 0 | No assertion criteria | Very low — treat with caution |
| 1 | Single submitter with criteria | Low-moderate |
| 2 | Multiple submitters, no conflict | Moderate |
| 3 | Expert panel reviewed | High |
| 4 | Practice guideline | Highest |
Always check star ratings. A 0-star "pathogenic" classification is far less reliable than a 3-star one.
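The star check can be automated with a small lookup. The sketch below is a minimal illustration, not ClinVar's own code: the function names are hypothetical, and the status strings are assumed to follow ClinVar's review-status wording (older records may say "conflicting interpretations", and the VCF writes underscores instead of spaces).

```python
# Assumed review-status strings mapped to star counts (see table above).
REVIEW_STATUS_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, single submitter": 1,
    "criteria provided, conflicting classifications": 1,
    "criteria provided, conflicting interpretations": 1,  # older wording
    "no assertion criteria provided": 0,
    "no classification provided": 0,
}


def review_stars(review_status):
    """Return the star count for a review-status string (0 if unrecognized)."""
    # VCF CLNREVSTAT values use underscores in place of spaces.
    key = review_status.replace("_", " ").strip().lower()
    return REVIEW_STATUS_STARS.get(key, 0)


def passes_star_filter(review_status, min_stars=2):
    """True when the record meets the minimum star threshold."""
    return review_stars(review_status) >= min_stars
```

Treating unrecognized strings as 0 stars is a conservative choice: an unknown status is never promoted above the filter threshold.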
NCBI E-utilities API Reference
Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/
No authentication required for low-volume use. Rate limit: 3 requests/second without API key, 10/second with NCBI API key.
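One way to stay under these limits is a small throttle shared by all requests. This is a sketch under assumptions: `RateLimiter` and `eutils_params` are hypothetical helper names, while `api_key` is the E-utilities query parameter for an NCBI API key.

```python
import time


class RateLimiter:
    """Block so that successive calls are at least 1/rate seconds apart."""

    def __init__(self, requests_per_second=3.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock    # injectable for testing
        self._sleep = sleep
        self._last = None      # time of the previous call, if any

    def wait(self):
        """Sleep just long enough to respect the minimum interval."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def eutils_params(base_params, api_key=None):
    """Return a copy of base_params with api_key added when provided."""
    params = dict(base_params)
    if api_key:
        params["api_key"] = api_key
    return params
```

Call `limiter.wait()` immediately before each `requests.get(...)`; with an API key, construct the limiter with `requests_per_second=10`.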
Key Endpoints
| Endpoint | Purpose | Example |
|---|---|---|
| esearch.fcgi?db=clinvar&term=... | Search ClinVar | Search by gene, variant, condition |
| efetch.fcgi?db=clinvar&id=... | Fetch full record | Get complete variant details |
| esummary.fcgi?db=clinvar&id=... | Summary record | Get classification, review status |
| elink.fcgi?db=clinvar&dbfrom=... | Cross-database links | Link to PubMed, Gene, etc. |
ClinVar VCF Downloads
For bulk intersection with ENCODE peaks, download the ClinVar VCF:
- GRCh38: https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
- GRCh37: https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar.vcf.gz
Updated monthly on the first Thursday.
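A minimal sketch of the bulk intersection in pure Python follows. It assumes the ClinVar VCF names chromosomes without a "chr" prefix (ENCODE BED files use "chr1", "chrM"), that `CLNSIG` and `CLNREVSTAT` are the INFO keys of interest, and that testing only the POS base is acceptable (long deletions that merely extend into a peak are missed). The function names are hypothetical; for production use, a dedicated tool such as bedtools is likely a better fit.

```python
from bisect import bisect_right
from collections import defaultdict


def load_peaks(bed_lines):
    """Parse BED lines into {chrom: merged, sorted [(start, end)]} (0-based, half-open)."""
    raw = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        raw[chrom].append((int(start), int(end)))
    peaks = {}
    for chrom, intervals in raw.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        peaks[chrom] = merged
    return peaks


def in_peak(peaks, chrom, pos0):
    """True if 0-based position pos0 lies inside a merged peak on chrom."""
    intervals = peaks.get(chrom, [])
    i = bisect_right(intervals, (pos0, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]


def clinvar_variants_in_peaks(vcf_lines, peaks):
    """Yield ClinVar VCF records whose POS falls inside a peak."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        chrom, pos, vid, ref, alt, _qual, _filt, info = line.rstrip("\n").split("\t")[:8]
        fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        # Assumption: convert ClinVar naming ("11", "MT") to UCSC style ("chr11", "chrM").
        if not chrom.startswith("chr"):
            chrom = "chrM" if chrom == "MT" else "chr" + chrom
        if in_peak(peaks, chrom, int(pos) - 1):  # VCF is 1-based, BED is 0-based
            yield {"chrom": chrom, "pos": int(pos), "id": vid, "ref": ref, "alt": alt,
                   "clnsig": fields.get("CLNSIG"), "revstat": fields.get("CLNREVSTAT")}
```

Both inputs can be streamed from compressed files with `gzip.open(path, "rt")`, so neither the VCF nor the BED file needs to be decompressed on disk.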
Step 1: Define the Scope
Determine which direction the analysis runs:
Forward: ENCODE Peaks to ClinVar Variants
Starting from ENCODE regulatory elements, find clinically significant variants within them.
# Get ENCODE peaks for target tissue
encode_search_experiments(
    assay_title="Histone ChIP-seq",
    target="H3K27ac",
    organ="pancreas",
    biosample_type="tissue"
)

encode_list_files(
    experiment_accession="ENCSR...",
    file_format="bed",
    output_type="IDR thresholded peaks",
    assembly="GRCh38"
)
Reverse: ClinVar Variants to ENCODE Context
Starting from ClinVar pathogenic variants, determine if they overlap ENCODE regulatory elements.
import requests

# Search ClinVar for pathogenic variants in a gene
url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {
    "db": "clinvar",
    "term": "INS[gene] AND pathogenic[clinical significance]",
    "retmax": 50,
    "retmode": "json"
}
response = requests.get(url, params=params)
result = response.json()
variant_ids = result["esearchresult"]["idlist"]
Step 2: Query ClinVar via E-utilities
Search for Variants by Gene
import requests
import time


def search_clinvar(gene_symbol, significance="pathogenic"):
    """Search ClinVar for variants in a gene with given clinical significance."""
    url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    term = f"{gene_symbol}[gene] AND {significance}[clinical significance]"
    params = {
        "db": "clinvar",
        "term": term,
        "retmax": 100,
        "retmode": "json"
    }
    response = requests.get(url, params=params)
    time.sleep(0.34)  # Rate limit: 3/sec
    return response.json()["esearchresult"]["idlist"]
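A gene search can match more records than a single `retmax` page returns. The sketch below pages through results using the E-utilities `retstart` parameter and the `count` field of the esearch JSON response. `search_clinvar_all` and its `get_json` hook are hypothetical names introduced here for illustration; the hook exists so the paging logic can be exercised without network access.

```python
import time

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def _http_get_json(params):
    """Default fetcher: one throttled esearch request, returned as parsed JSON."""
    import requests  # imported lazily so the paging logic has no hard dependency

    response = requests.get(ESEARCH_URL, params=params, timeout=30)
    response.raise_for_status()
    time.sleep(0.34)  # stay under 3 requests/second without an API key
    return response.json()


def search_clinvar_all(term, page_size=100, max_ids=1000, get_json=_http_get_json):
    """Collect up to max_ids ClinVar IDs for term, following retstart pages."""
    ids = []
    retstart = 0
    while len(ids) < max_ids:
        params = {"db": "clinvar", "term": term, "retmode": "json",
                  "retmax": page_size, "retstart": retstart}
        result = get_json(params)["esearchresult"]
        page = result.get("idlist", [])
        ids.extend(page)
        retstart += len(page)
        # Stop on an empty page or once every matching record has been seen.
        if not page or retstart >= int(result.get("count", 0)):
            break
    return ids[:max_ids]
```

The `max_ids` cap guards against runaway loops on very broad search terms.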
Get Variant Details
def get_clinvar_summary(variant_ids):
    """Get summary for ClinVar variant IDs (first 20 only)."""
    url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
    params = {
        "db": "clinvar",
        # Batch cap of 20 per request: only the first 20 IDs are sent,
        # so pass later slices in separate calls.
        "id": ",".join(variant_ids[:20]),
        "retmode": "json"
    }
    response = requests.get(url, params=params)
    time.sleep(0.34)  # Rate limit: 3/sec
    return response.json()["result"]
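Because the summary call handles at most 20 IDs at a time, longer ID lists need to be split into batches. A minimal sketch follows, assuming the esummary JSON `result` object carries a `uids` list alongside one entry per ID; `chunked` and `summarize_in_batches` are hypothetical helper names.

```python
def chunked(items, size=20):
    """Yield consecutive slices of items, each with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def summarize_in_batches(variant_ids, fetch_summary, batch_size=20):
    """Call fetch_summary on each batch and merge the per-ID records."""
    merged = {}
    for batch in chunked(list(variant_ids), batch_size):
        result = fetch_summary(batch)
        # Fall back to the requested IDs if the response lacks a "uids" list.
        for uid in result.get("uids", batch):
            if uid in result:
                merged[uid] = result[uid]
    return merged
```

Usage: `summaries = summarize_in_batches(variant_ids, get_clinvar_summary)` returns one dictionary keyed by ClinVar ID, without the `uids` bookkeeping entry.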
Search by Genomic Region
# Se