Cross-Reference ENCODE with Other Databases
When to Use
- User wants to connect ENCODE data to publications, clinical trials, or other databases
- User asks to "cross-reference", "link", or "connect" ENCODE with PubMed, bioRxiv, GEO, etc.
- User wants to find clinical trials related to ENCODE genomic targets
- User needs to build translational pipelines from ENCODE regulatory data to disease context
- User asks about drugs targeting genes identified in ENCODE experiments
- User wants to find variant annotations (ClinVar, gnomAD) for ENCODE regulatory regions
Help the user connect ENCODE genomics data to the broader scientific literature, clinical research, drug target discovery, and variant interpretation. This skill is the central hub for all multi-database workflows in the ENCODE Toolkit.
Cross-Reference Workflows
ENCODE + PubMed
- Track an ENCODE experiment to extract PMIDs from publications
- Use encode_get_citations or encode_get_references to get PMIDs
- Pass PMIDs to PubMed tools (search_articles, get_article_metadata, find_related_articles)
- Find related literature about the same targets, biosamples, or biological questions
ENCODE + bioRxiv
- Search bioRxiv for preprints in relevant categories (genomics, genetics, cell biology)
- Look for preprints that reference ENCODE accession IDs or targets
- Link discovered preprint DOIs to tracked experiments using encode_link_reference
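The "look for preprints that reference ENCODE accession IDs or targets" step can be partly automated once preprint records are in hand. The following is a minimal sketch, not part of the toolkit: it assumes each preprint is a dict with `doi`, `title`, and `abstract` keys (hypothetical field names; the actual shape returned by search_preprints may differ), and the second record is a made-up placeholder.

```python
import re

def find_mentions(preprints, terms):
    """Return (doi, matched_terms) for preprints mentioning any term.

    `preprints` is assumed to be a list of dicts with 'doi', 'title',
    and 'abstract' keys (hypothetical field names).
    """
    # Match whole tokens only, so "CTCF" does not match "CTCFL".
    patterns = {
        term: re.compile(
            r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])",
            re.IGNORECASE,
        )
        for term in terms
    }
    hits = []
    for preprint in preprints:
        text = f"{preprint.get('title', '')} {preprint.get('abstract', '')}"
        matched = [term for term, pat in patterns.items() if pat.search(text)]
        if matched:
            hits.append((preprint.get("doi"), matched))
    return hits

# Illustrative placeholder records, not real search results.
example = [
    {"doi": "10.1101/2024.06.15.598765",
     "title": "H3K27me3 dynamics",
     "abstract": "We reanalyzed ENCSR000AKS."},
    {"doi": "10.1101/0000.00.00.000000",
     "title": "Unrelated work",
     "abstract": "No overlap here."},
]
print(find_mentions(example, ["ENCSR000AKS", "H3K27me3"]))
```

Each DOI returned this way is a candidate for encode_link_reference, but a text match is only a hint; the abstract still needs a human read before linking.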
ENCODE + ClinicalTrials.gov
- Identify disease-relevant ENCODE data (e.g., pancreatic tissue data for diabetes trials)
- Use encode_get_references to find linked NCT IDs
- Search ClinicalTrials.gov for trials targeting the same genes/proteins as ENCODE experiments
- Link relevant trial NCT IDs to experiments using encode_link_reference
ENCODE + Open Targets
- Identify ENCODE ChIP-seq or CRISPR screen targets of interest
- Resolve gene symbols to Ensembl Gene IDs via search_entities
- Query Open Targets for disease associations, tractability, and drug candidates
- Chain from ENCODE functional data to therapeutic hypotheses
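If the MCP tools are unavailable, the disease-association query can also be sent straight to the Open Targets Platform GraphQL endpoint. The sketch below only builds the request payload and sends nothing. The endpoint URL and the field names (`target`, `approvedSymbol`, `associatedDiseases`, `rows`, `score`) reflect the public schema as far as I know it and should be checked against the current schema before use; `build_open_targets_request` is a hypothetical helper name.

```python
import json

# Assumed public endpoint; verify against current Open Targets documentation.
OPEN_TARGETS_GRAPHQL = "https://api.platform.opentargets.org/api/v4/graphql"

# Field names are an assumption based on the public schema.
ASSOCIATED_DISEASES_QUERY = """
query targetDiseases($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    id
    approvedSymbol
    associatedDiseases {
      rows {
        disease { id name }
        score
      }
    }
  }
}
"""

def build_open_targets_request(ensembl_gene_id):
    """Build a GraphQL payload for one Ensembl Gene ID (ENSG + 11 digits)."""
    if not (ensembl_gene_id.startswith("ENSG") and len(ensembl_gene_id) == 15):
        raise ValueError(f"not an Ensembl Gene ID: {ensembl_gene_id!r}")
    return {
        "query": ASSOCIATED_DISEASES_QUERY,
        "variables": {"ensemblId": ensembl_gene_id},
    }

payload = build_open_targets_request("ENSG00000102974")
print(json.dumps(payload["variables"]))
# To send (requires the third-party `requests` package and network access):
#   requests.post(OPEN_TARGETS_GRAPHQL, json=payload, timeout=30).json()
```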
ENCODE + GTEx
- Find ENCODE regulatory experiments in a tissue of interest
- Annotate peaks with nearest genes using GENCODE annotations
- Query GTEx for expression of those genes in the matching tissue
- Validate that putative regulatory elements sit near actively transcribed genes
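The nearest-gene annotation step can be sketched in plain Python. This is an illustration under stated assumptions, not the toolkit's method: peaks are `(chrom, start, end)` tuples, genes are `(chrom, tss, symbol)` tuples already extracted from a GENCODE annotation, and distance is measured from the peak midpoint to the TSS. All coordinates and gene names below are toy values.

```python
import bisect

def nearest_genes(peaks, genes):
    """Assign each peak the gene whose TSS is closest to the peak midpoint.

    peaks: list of (chrom, start, end)
    genes: list of (chrom, tss, symbol)
    Returns a list of (chrom, start, end, symbol, distance); symbol and
    distance are None when the chromosome has no annotated genes.
    """
    by_chrom = {}
    for chrom, tss, symbol in genes:
        by_chrom.setdefault(chrom, []).append((tss, symbol))
    for entries in by_chrom.values():
        entries.sort()

    results = []
    for chrom, start, end in peaks:
        entries = by_chrom.get(chrom)
        if not entries:
            results.append((chrom, start, end, None, None))
            continue
        mid = (start + end) // 2
        # Index of the first TSS at or after the midpoint.
        i = bisect.bisect_left(entries, (mid,))
        # Only the neighbours on either side can be the nearest.
        candidates = entries[max(i - 1, 0): i + 1]
        tss, symbol = min(candidates, key=lambda e: abs(e[0] - mid))
        results.append((chrom, start, end, symbol, abs(tss - mid)))
    return results

# Toy coordinates and gene names for illustration only.
toy_genes = [("chr1", 1000, "GENE_A"), ("chr1", 5000, "GENE_B")]
toy_peaks = [("chr1", 1200, 1400), ("chr1", 4000, 4200), ("chr2", 10, 20)]
for row in nearest_genes(toy_peaks, toy_genes):
    print(row)
```

The gene symbols from this step are what get passed to GTEx. For genome-scale peak sets, a dedicated tool such as `bedtools closest` is the more usual choice.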
ENCODE + GWAS Catalog + ClinVar
- Obtain trait-associated variants from the GWAS Catalog
- Check clinical significance in ClinVar
- Intersect variant coordinates with ENCODE peak files and cCREs
- Determine whether disease-associated variants fall in regulatory elements
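The intersection step is where coordinate conventions tend to cause errors: BED peak files are 0-based and half-open, while variant positions in VCF-style sources are 1-based. A minimal sketch of the overlap test follows, assuming variants as `(variant_id, chrom, pos)` and peaks as `(chrom, start, end)` on the same genome assembly. All values are toy data.

```python
import bisect

def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def variants_in_peaks(variants, peaks):
    """Return IDs of variants that fall inside any peak.

    variants: list of (variant_id, chrom, pos) with 1-based positions
    peaks:    list of (chrom, start, end) in BED 0-based half-open form
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    merged = {c: merge_intervals(iv) for c, iv in by_chrom.items()}
    starts = {c: [s for s, _ in iv] for c, iv in merged.items()}

    hits = []
    for variant_id, chrom, pos in variants:
        if chrom not in merged:
            continue
        pos0 = pos - 1  # convert 1-based position to 0-based
        # Rightmost merged interval starting at or before pos0.
        i = bisect.bisect_right(starts[chrom], pos0) - 1
        if i >= 0 and pos0 < merged[chrom][i][1]:
            hits.append(variant_id)
    return hits

# Toy data: v2 sits one base past the end of the first peak.
toy_peaks = [("chr1", 100, 200), ("chr1", 300, 400)]
toy_variants = [("v1", "chr1", 200), ("v2", "chr1", 201),
                ("v3", "chr1", 301), ("v4", "chr2", 50)]
print(variants_in_peaks(toy_variants, toy_peaks))
```

Check that GWAS Catalog, ClinVar, and ENCODE coordinates share one assembly (for example GRCh38) before intersecting; a mismatch produces plausible-looking but wrong overlaps.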
ENCODE + Consensus (Academic Search)
- Search for high-quality research papers about ENCODE targets (H3K27me3, CTCF, etc.)
- Find systematic reviews and meta-analyses relevant to ENCODE data types
- Cross-validate ENCODE quality metrics against published benchmarks
ENCODE + GEO
- Check ENCODE experiment dbxrefs for GEO accessions (format: GEO:GSExxxxx)
- Search GEO E-utilities for complementary datasets in the same tissue/assay
- Link GEO accessions to tracked experiments using encode_link_reference
- See geo-connector skill for detailed GEO API usage
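Pulling GEO accessions out of dbxrefs is a simple prefix filter. The sketch below assumes the experiment metadata is available as a dict with a top-level `dbxrefs` list of strings. Whether encode_get_experiment returns exactly this shape is an assumption, and the second entry in the example is a placeholder.

```python
import re

# Matches "GEO:GSE12345" or "GEO:GSM12345" and captures the accession.
GEO_XREF = re.compile(r"^GEO:(GS[EM]\d+)$")

def geo_accessions(experiment):
    """Return de-duplicated GEO accessions found in experiment['dbxrefs']."""
    found = []
    for xref in experiment.get("dbxrefs", []):
        match = GEO_XREF.match(xref.strip())
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found

# Assumed metadata shape; "OTHER:placeholder" stands in for non-GEO xrefs.
experiment = {
    "accession": "ENCSR133RZO",
    "dbxrefs": ["GEO:GSE125066", "OTHER:placeholder", "GEO:GSE125066"],
}
print(geo_accessions(experiment))
```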
ENCODE + SRA
Raw sequencing reads for ENCODE experiments are deposited in NCBI SRA. GEO records link to SRA via E-utilities elink. For reprocessing ENCODE data or accessing raw reads not on the ENCODE Portal, query SRA via:
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=gds&db=sra&id=GDS_UID&tool=encode_mcp&email=YOUR_EMAIL"
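The same elink request can be assembled in Python so that the UID and email are URL-encoded correctly. This sketch only builds the URL and mirrors the parameters of the curl command above. The UID and email shown are placeholders, and `sra_elink_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def sra_elink_url(gds_uid, email, tool="encode_mcp"):
    """Build an E-utilities elink URL linking a GEO DataSets UID to SRA."""
    params = {
        "dbfrom": "gds",       # source database: GEO DataSets
        "db": "sra",           # target database: SRA
        "id": str(gds_uid),
        "tool": tool,
        "email": email,
    }
    return f"{EUTILS_ELINK}?{urlencode(params)}"

# Placeholder UID and email for illustration only.
print(sra_elink_url("GDS_UID", "you@example.org"))
```

The URL can then be fetched with `urllib.request.urlopen` or any HTTP client.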
ENCODE + Ensembl
Use the Ensembl REST API to cross-reference ENCODE targets and regulatory elements with Ensembl annotations. See ensembl-annotation skill for VEP, Regulatory Build, and gene lookup endpoints.
Identifier Format Quick Reference
| Identifier | Format | Example | Database | MCP Tool / Skill |
|---|---|---|---|---|
| PMID | numeric string | "35486828" | PubMed | get_article_metadata |
| DOI | 10.xxxx/... | "10.1038/s41586-020-2493-4" | CrossRef | convert_article_ids |
| NCT ID | NCT + 8 digits | "NCT04567890" | ClinicalTrials.gov | get_trial_details |
| GEO Series | GSE + digits | "GSE123456" | GEO/NCBI | geo-connector skill |
| GEO Sample | GSM + digits | "GSM1234567" | GEO/NCBI | geo-connector skill |
| bioRxiv DOI | 10.1101/YYYY.MM.DD.xxx | "10.1101/2024.06.15.598765" | bioRxiv | get_preprint |
| ENCODE Experiment | ENCSR + 3 digits + 3 letters | "ENCSR123ABC" | ENCODE | encode_get_experiment |
| ENCODE File | ENCFF + 3 digits + 3 letters | "ENCFF001AAA" | ENCODE | encode_get_file_info |
| Ensembl Gene | ENSG + 11 digits | "ENSG00000102974" | Ensembl | ensembl-annotation skill |
| Ensembl Transcript | ENST + 11 digits | "ENST00000264010" | Ensembl | ensembl-annotation skill |
| SRA Study | SRP + digits | "SRP123456" | SRA/NCBI | E-utilities |
| SRA Run | SRR + digits | "SRR1234567" | SRA/NCBI | E-utilities |
| ROR ID | 9 chars | "021nxhr62" | ROR | search_by_funder |
| ChEMBL ID | CHEMBL + digits | "CHEMBL25" | Open Targets | search_entities |
| rsID | rs + digits | "rs7903146" | dbSNP/ClinVar | clinvar-annotation skill |
| ClinVar Accession | RCV + digits | "RCV000012345" | ClinVar | clinvar-annotation skill |
| JASPAR Matrix | MA + digits + version | "MA0139.1" | JASPAR | jaspar-motifs skill |
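The formats in this table can be turned into a small classifier that decides which database an identifier belongs to before it is passed to a tool. This is a sketch built only from the table, not an exhaustive validator: the ENCODE patterns use the loose "6 alphanumeric characters" reading, and ROR IDs are omitted because a bare 9-character rule is too easy to confuse with numeric strings such as PMIDs.

```python
import re

# Order matters: specific patterns come before general ones
# (bioRxiv DOI before generic DOI, prefixed IDs before bare digits).
ID_PATTERNS = [
    ("bioRxiv DOI", r"10\.1101/\S+"),
    ("DOI", r"10\.\d{4,9}/\S+"),
    ("NCT ID", r"NCT\d{8}"),
    ("GEO Series", r"GSE\d+"),
    ("GEO Sample", r"GSM\d+"),
    ("ENCODE Experiment", r"ENCSR[0-9A-Z]{6}"),
    ("ENCODE File", r"ENCFF[0-9A-Z]{6}"),
    ("Ensembl Gene", r"ENSG\d{11}"),
    ("Ensembl Transcript", r"ENST\d{11}"),
    ("SRA Study", r"SRP\d+"),
    ("SRA Run", r"SRR\d+"),
    ("ChEMBL ID", r"CHEMBL\d+"),
    ("rsID", r"rs\d+"),
    ("ClinVar Accession", r"RCV\d+"),
    ("JASPAR Matrix", r"MA\d+\.\d+"),
    ("PMID", r"\d+"),
]
COMPILED = [(label, re.compile(pattern)) for label, pattern in ID_PATTERNS]

def classify_identifier(identifier):
    """Return the identifier type from the table, or None if unrecognized."""
    value = identifier.strip()
    for label, pattern in COMPILED:
        if pattern.fullmatch(value):
            return label
    return None

for example in ["35486828", "10.1101/2024.06.15.598765", "ENCSR123ABC",
                "ENSG00000102974", "MA0139.1", "not-an-id"]:
    print(example, "->", classify_identifier(example))
```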
Code Examples
1. PubMed: "Find the original paper for this ENCODE experiment and link it"
Step 1: Track the experiment to extract publications
encode_track_experiment(accession="ENCSR133RZO", fetch_publications=True)
-> Extracts PMIDs from experiment metadata
Step 2: Get the PMIDs
encode_get_references(experiment_accession="ENCSR133RZO", reference_type="pmid")
-> Returns: [{"reference_id": "32728249", "reference_type": "pmid"}]
Step 3: Fetch full article metadata from PubMed
get_article_metadata(pmids=["32728249"])
-> Returns title, authors, journal, abstract, DOI
Step 4: Find related papers
find_related_articles(pmids=["32728249"], link_type="pubmed_pubmed", max_results=10)
-> Returns similar articles for further reading
2. bioRxiv: "Check if any preprints reference this ChIP-seq dataset"
Step 1: Get experiment details
encode_get_experiment(accession="ENCSR000AKS")
-> Note the target (e.g., H3K27me3), biosample, and any linked DOIs
Step 2: Search bioRxiv for related preprints
search_preprints(category="genomics", recent_days=90, limit=20)
-> Review abstracts for mentions of the target or ENCODE accession
Step 3: Link discovered preprint
encode_link_reference(
    experiment_accession="ENCSR000AKS",
    reference_type="preprint_doi",
    reference_id="10.1101/2024.06.15.598765",
    description="Preprint analyzing H3K27me3 patterns in same tissue"
)
3. GEO: "Link the GEO submission that corresponds to this ENCODE experiment"
Step 1: Check experiment metadata for GEO cross-references
encode_get_experiment(accession="ENCSR133RZO")
-> Look in dbxrefs for "GEO:GSExxxxxx"
Step 2: Link the GEO accession
encode_link_reference(
    experiment_accession="ENCSR133RZO",
    reference_type="geo_accession",
    reference_id="GSE125066",
    description="GEO submission with same raw data and supplementary files"
)
4. ClinicalTrials.gov: "Find clinical trials studying the same gene target"
Step 1: Get the target from an ENCODE experiment
encode_get_experiment(accession="ENCSR...")
-> Target: "TP53" (a TF ChIP-seq experiment)
Step 2: Search ClinicalTrials.gov for trials involving the target
search_trials(condition="cancer", intervention="TP53", status=["RECRUITING"])
-> Returns active trials targeting TP53
Step 3: Link a relevant trial
encode_link_reference(
    experiment_accession="ENCSR...",
    reference_type="nct_id",
    reference_id="NCT04567890",
    description="Phase 2 trial targeting TP53 in solid tumors"
)