Track ENCODE Experiments
When to Use
- User wants to save/bookmark ENCODE experiments for later reference
- User needs to build a collection of experiments for a project
- User asks to "track", "save", or "bookmark" an experiment
- User wants to manage citations and publications for ENCODE data
- User needs to compare experiments for compatibility
- User wants to export their experiment collection as CSV/TSV/JSON
- User asks about data provenance (linking derived files to ENCODE sources)
Help the user manage their local collection of ENCODE experiments. This skill covers the full lifecycle of experiment management: discovery, tracking, annotation, citation, comparison, provenance, and export.
Tracking Capabilities
- Track an experiment: Use encode_track_experiment to save experiment metadata, publications, and pipeline info locally.
  - Automatically extracts GEO accessions and PMIDs from experiment metadata
  - Fetches associated publications with authors, journal, DOI
  - Stores 18 metadata fields per experiment (see schema below)
  - Idempotent: re-tracking the same accession updates metadata without creating duplicates
- View tracked collection: Use encode_list_tracked to see all tracked experiments. Filter by assay, organism, or organ.
- Get citations: Use encode_get_citations to export publication data.
  - "json": Structured data
  - "bibtex": For LaTeX/reference managers
  - "ris": For Endnote, Zotero, Mendeley
- Compare experiments: Use encode_compare_experiments to check if two experiments are compatible for combined analysis (same organism, assembly, assay, biosample, etc.).
- Collection overview: Use encode_summarize_collection for grouped statistics across your tracked experiments.
- Export data: Use encode_export_data to export tracked experiments as CSV, TSV, or JSON for use in R, pandas, Excel.
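The logic inside encode_compare_experiments is not described here. As a rough illustration of a field-by-field compatibility check, the sketch below compares two tracked records on a few of the stored metadata fields. The function name compare_records and the choice of fields are assumptions for illustration, not the tool's actual behavior.

```python
# Hypothetical sketch: field-by-field compatibility check between two
# tracked records. The field list is an assumption drawn from the stored
# metadata fields, not the logic used by encode_compare_experiments.
COMPARE_FIELDS = ("organism", "assembly", "assay_title", "biosample_type", "organ")


def compare_records(a: dict, b: dict, fields=COMPARE_FIELDS) -> dict:
    """Report which fields agree and which differ between two records."""
    matches = []
    mismatches = {}
    for field in fields:
        if a.get(field) == b.get(field):
            matches.append(field)
        else:
            mismatches[field] = (a.get(field), b.get(field))
    return {
        "compatible": not mismatches,
        "matches": matches,
        "mismatches": mismatches,
    }


exp_a = {
    "organism": "Homo sapiens",
    "assembly": "GRCh38",
    "assay_title": "Histone ChIP-seq",
    "biosample_type": "tissue",
    "organ": "pancreas",
}
# Same experiment metadata, but aligned to a different assembly.
exp_b = dict(exp_a, assembly="hg19")

report = compare_records(exp_a, exp_b)
print(report["compatible"], report["mismatches"])
```

Returning the mismatching values alongside the verdict lets the user see why two experiments should not be combined (here, a differing assembly) rather than only that they should not.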
Stored Metadata
When you track an experiment, the following fields are captured from the ENCODE Portal API and stored locally:
| Field | Description | Example |
|---|---|---|
| accession | ENCODE accession (primary key) | ENCSR123ABC |
| assay_title | Assay type | Histone ChIP-seq |
| target | Antibody target (ChIP/eCLIP) | H3K27ac-human |
| biosample_summary | Full biosample description | pancreas tissue male adult (54 years) |
| organism | Species | Homo sapiens |
| organ | Organ or tissue of origin | pancreas |
| biosample_type | Biosample classification | tissue, primary cell, cell line |
| status | ENCODE release status | released |
| date_released | Portal release date | 2020-07-15 |
| description | Experiment description (from PI) | H3K27ac ChIP-seq on human pancreatic islets |
| lab | Submitting laboratory | /labs/bradley-bernstein/ |
| award | Funding award | /awards/U01HG007610/ |
| assembly | Genome assembly | GRCh38 |
| replication_type | Replicate strategy | isogenic, anisogenic |
| life_stage | Developmental stage | adult, embryonic, child |
| url | ENCODE Portal URL | https://www.encodeproject.org/experiments/ENCSR123ABC/ |
| notes | User-provided notes | H3K27ac reference for islet enhancer study |
| raw_metadata | Full JSON from API (up to 512KB) | (stored for future queries) |
Additionally, the tracker stores timestamps (tracked_at, updated_at) for audit trail purposes.
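To make the field list concrete, here is a minimal sketch of flattening tracked records into CSV or TSV text with Python's standard csv module. It is not the implementation of encode_export_data. The helper name records_to_delimited and the chosen column subset are assumptions for illustration.

```python
import csv
import io

# A subset of the stored fields. raw_metadata is left out of this sketch
# because a large JSON blob is awkward to read in a spreadsheet.
EXPORT_FIELDS = ["accession", "assay_title", "target", "organism",
                 "organ", "assembly", "notes"]


def records_to_delimited(records, delimiter=","):
    """Serialize record dicts to delimited text, one row per experiment."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=EXPORT_FIELDS,
        delimiter=delimiter,
        extrasaction="ignore",   # silently skip fields not being exported
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # missing fields are written as empty cells
    return buffer.getvalue()


records = [{
    "accession": "ENCSR123ABC",
    "assay_title": "Histone ChIP-seq",
    "target": "H3K27ac-human",
    "organism": "Homo sapiens",
    "organ": "pancreas",
    "assembly": "GRCh38",
    "notes": "H3K27ac reference for islet enhancer study",
    "status": "released",  # not in EXPORT_FIELDS, so it is dropped
}]

csv_text = records_to_delimited(records)
tsv_text = records_to_delimited(records, delimiter="\t")
print(csv_text)
```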
SQLite Schema Overview
The tracker uses a local SQLite database with WAL journal mode and foreign keys enabled. The schema consists of six tables:
- tracked_experiments: One row per ENCODE experiment. The accession column is the primary key. Indexes on assay_title, organism, and organ for fast filtered queries.
- publications: Publications linked to experiments. Stores PMID, DOI, title, authors (first 10), journal, year, abstract. Unique constraint on (experiment_accession, pmid) prevents duplicates.
- pipeline_info: ENCODE uniform processing pipeline details. Stores pipeline title, version, software list (as JSON array), and analysis status.
- quality_metrics: Per-file quality metrics from ENCODE audits. Stores file accession, metric type, and metric data (as JSON).
- derived_files: User-created files derived from ENCODE data. Stores file path, source accessions (as JSON array), tool used, parameters, and description. This is the backbone of provenance tracking.
- external_references: Cross-database links. Stores reference type (pmid, doi, geo_accession, nct_id, biorxiv_doi, dbgap), reference ID, and description. Unique constraint on (experiment_accession, reference_type, reference_id).
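The following sketch shows how two of these tables and their constraints might fit together, using Python's built-in sqlite3 module. It is a reduced, hypothetical version of the schema: the column lists are shortened, the index name and helper functions are invented for illustration, and the real DDL may differ. The upsert uses ON CONFLICT, which requires SQLite 3.24 or newer.

```python
import sqlite3

# In-memory database for the sketch; the real tracker uses a file on disk,
# where "PRAGMA journal_mode = WAL" would also apply.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE tracked_experiments (
    accession   TEXT PRIMARY KEY,
    assay_title TEXT,
    notes       TEXT
);
CREATE INDEX idx_tracked_assay ON tracked_experiments (assay_title);

CREATE TABLE publications (
    id                   INTEGER PRIMARY KEY,
    experiment_accession TEXT NOT NULL
        REFERENCES tracked_experiments (accession),
    pmid                 TEXT,
    title                TEXT,
    UNIQUE (experiment_accession, pmid)
);
""")


def track(conn, accession, assay_title, notes):
    """Insert a new experiment, or update it if the accession already exists."""
    conn.execute(
        """INSERT INTO tracked_experiments (accession, assay_title, notes)
           VALUES (?, ?, ?)
           ON CONFLICT(accession) DO UPDATE SET
               assay_title = excluded.assay_title,
               notes = excluded.notes""",
        (accession, assay_title, notes),
    )


def add_publication(conn, accession, pmid, title):
    """Link a publication; the UNIQUE constraint makes repeats a no-op."""
    conn.execute(
        """INSERT OR IGNORE INTO publications (experiment_accession, pmid, title)
           VALUES (?, ?, ?)""",
        (accession, pmid, title),
    )


# Tracking the same accession twice updates the row instead of duplicating it.
track(conn, "ENCSR123ABC", "Histone ChIP-seq", "first note")
track(conn, "ENCSR123ABC", "Histone ChIP-seq", "H3K27ac reference, revised note")
# "00000000" is a placeholder PMID, not a real publication.
add_publication(conn, "ENCSR123ABC", "00000000", "Placeholder title")
add_publication(conn, "ENCSR123ABC", "00000000", "Placeholder title")

experiment_count = conn.execute("SELECT COUNT(*) FROM tracked_experiments").fetchone()[0]
publication_count = conn.execute("SELECT COUNT(*) FROM publications").fetchone()[0]
stored_notes = conn.execute(
    "SELECT notes FROM tracked_experiments WHERE accession = ?", ("ENCSR123ABC",)
).fetchone()[0]
print(experiment_count, publication_count, stored_notes)
```

An upsert is used instead of INSERT OR REPLACE because a replace deletes and re-inserts the parent row, which interacts badly with child rows once foreign keys are enforced.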
The database location is ~/.encode_connector/tracker.db (macOS/Linux) or %USERPROFILE%\.encode_connector\tracker.db (Windows). The directory is created automatically on first use.
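If you want to inspect the database yourself, the location above can be resolved in a platform-independent way with pathlib. This is a small sketch; the helper names are hypothetical and not part of the tracker.

```python
from pathlib import Path


def tracker_db_path(home=None) -> Path:
    """Build the tracker database path under the given (or current) home directory."""
    # Path.home() resolves to the user's home on macOS/Linux and to the
    # user profile directory on Windows.
    base = Path(home) if home is not None else Path.home()
    return base / ".encode_connector" / "tracker.db"


def ensure_db_dir(db_path: Path) -> Path:
    """Create the parent directory if it does not exist yet."""
    db_path.parent.mkdir(parents=True, exist_ok=True)
    return db_path


# Printing the path has no side effects; nothing is created here.
print(tracker_db_path())
```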
Data Provenance
- Log derived files: Use encode_log_derived_file when the user creates files from ENCODE data (filtered peaks, merged signals, etc.).
- View provenance: Use encode_get_provenance to trace derived files back to source ENCODE data.
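To illustrate how storing source accessions as a JSON array supports tracing in both directions, here is a minimal sketch against an in-memory table shaped like derived_files. The column names, helper functions, and file paths are assumptions for illustration and do not reflect how encode_log_derived_file or encode_get_provenance are implemented.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE derived_files (
        id                INTEGER PRIMARY KEY,
        file_path         TEXT NOT NULL,
        source_accessions TEXT NOT NULL,  -- JSON array of ENCODE accessions
        tool_used         TEXT,
        description       TEXT
    )
""")


def log_derived_file(conn, file_path, source_accessions, tool_used="", description=""):
    """Record a user-created file together with the accessions it came from."""
    conn.execute(
        """INSERT INTO derived_files
               (file_path, source_accessions, tool_used, description)
           VALUES (?, ?, ?, ?)""",
        (file_path, json.dumps(list(source_accessions)), tool_used, description),
    )


def sources_of(conn, file_path):
    """Trace a derived file back to its source accessions."""
    row = conn.execute(
        "SELECT source_accessions FROM derived_files WHERE file_path = ?",
        (file_path,),
    ).fetchone()
    return json.loads(row[0]) if row else []


def files_derived_from(conn, accession):
    """List every derived file that used the given accession as input."""
    rows = conn.execute(
        "SELECT file_path, source_accessions FROM derived_files ORDER BY id"
    ).fetchall()
    return [path for path, sources in rows if accession in json.loads(sources)]


# Example paths are made up for the sketch.
log_derived_file(conn, "results/h3k27ac_filtered.bed", ["ENCSR123ABC"],
                 tool_used="bedtools intersect", description="filtered peaks")
log_derived_file(conn, "results/enhancer_candidates.bed",
                 ["ENCSR123ABC", "ENCSR456DEF"], description="merged signals")

print(sources_of(conn, "results/enhancer_candidates.bed"))
print(files_derived_from(conn, "ENCSR123ABC"))
```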
Cross-References
- Link external references: Use encode_link_reference to attach PubMed IDs, DOIs, ClinicalTrials NCT IDs, bioRxiv DOIs, or GEO accessions to tracked experiments.
- Get references: Use encode_get_references to retrieve linked external identifiers. These IDs can be passed to PubMed, bioRxiv, or ClinicalTrials MCP servers for further analysis.
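One way to hand retrieved identifiers to downstream servers is to group them by reference type first. The sketch below assumes references arrive as (reference_type, reference_id) pairs. That shape, the routing table, and the destination labels are all assumptions for illustration, and the IDs are placeholders rather than real records.

```python
from collections import defaultdict

# Hypothetical routing table: reference type -> label for a downstream server.
# Types without an entry (for example geo_accession) are kept as "unrouted".
ROUTES = {
    "pmid": "pubmed",
    "biorxiv_doi": "biorxiv",
    "nct_id": "clinicaltrials",
}


def group_for_lookup(references):
    """Group reference IDs by the destination that could resolve them."""
    grouped = defaultdict(list)
    for ref_type, ref_id in references:
        grouped[ROUTES.get(ref_type, "unrouted")].append(ref_id)
    return dict(grouped)


# Placeholder identifiers, not real records.
references = [
    ("pmid", "00000000"),
    ("nct_id", "NCT00000000"),
    ("geo_accession", "GSE000000"),
    ("pmid", "00000001"),
]

grouped = group_for_lookup(references)
print(grouped)
```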
Walkthrough 1: Building a Pancreatic Islet Epigenome Reference Collection
Goal: Curate a comprehensive set of histone modification ChIP-seq, ATAC-seq, and RNA-seq experiments from human pancreatic islets for enhancer analysis. This is the foundational workflow for any tissue-specific integrative analysis.
Step 1: Discover what data exists
Before tracking anything, survey the landscape. Use facets to understand the breadth of available data for your tissue of interest.
encode_get_facets(facet_field="assay_title", organ="pancreas", organism="Homo sapiens")
Expected output (example):
Histone ChIP-seq: 15 experiments
ATAC-seq: 3 experiments
RNA-seq: 8 experiments
TF ChIP-seq: 4 experiments
WGBS: 2 experiments
DNase-seq: 1 experiment
This tells you that pancreatic tissue has strong histone ChIP-seq coverage (15 experiments across multiple marks), adequate ATAC-seq (3), and solid RNA-seq (8). The 2 WGBS experiments are a bonus for methylation analysis.
Step 2: Search for histone ChIP-seq experiments
Now retrieve the actual experiments. Focus on one assay type at a time to keep notes organized.
encode_search_experiments(assay_title="Histone ChIP-seq", organ="pancreas", organism="Homo sapiens")
Expected return: 15 experiments with targets including H3K27ac, H3K4me1, H3K4me3, H3K27me3, H3K36me3. Review the biosample summaries -- some may be whole pancreas tissue, others isolated islets, and others acinar or ductal cells. This distinction matters for enhancer analysis.
Step 3: Track each histone experiment with descriptive notes
Notes are your lab notebook. Record the histone mark, the specific biosample, and the intended analytical role. This context is invaluable weeks later when you revisit the collection.
encode_track_experiment(accession="ENCSR123ABC", notes="H3K27ac pancreatic islets - active enhancers and super-enhancers")
encode_track_experiment(accession="ENCSR456DEF", notes="H3K4me1 pancreatic islets - primed/poised enhancers")
encode_track_experiment(accession="ENCSR789GHI", notes="H3K4me3 pancreatic islets - active promoters, CpG islands")
encode_track_experiment(accession="ENCSR012JKL", notes="H3K27me3 pancreatic islets - Polycomb repression, bivalent domains")
encode_track_experiment(accession="ENCSR345MNO", notes="H3K36me3 pancreatic islets - gene body transcription elongation")
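If you drive the tool from a script rather than issuing the calls one by one, the same five calls can be expressed as a loop over a notes table. In this sketch, track stands in for however your client invokes encode_track_experiment. The track_all helper and the recording stub are hypothetical and exist only to make the example runnable.

```python
# Accessions and notes copied from the calls above.
HISTONE_NOTES = {
    "ENCSR123ABC": "H3K27ac pancreatic islets - active enhancers and super-enhancers",
    "ENCSR456DEF": "H3K4me1 pancreatic islets - primed/poised enhancers",
    "ENCSR789GHI": "H3K4me3 pancreatic islets - active promoters, CpG islands",
    "ENCSR012JKL": "H3K27me3 pancreatic islets - Polycomb repression, bivalent domains",
    "ENCSR345MNO": "H3K36me3 pancreatic islets - gene body transcription elongation",
}


def track_all(track, notes_by_accession):
    """Call `track` once per accession and collect failures instead of stopping."""
    failures = {}
    for accession, notes in notes_by_accession.items():
        try:
            track(accession=accession, notes=notes)
        except Exception as exc:  # keep going so one bad accession does not block the rest
            failures[accession] = str(exc)
    return failures


# Stub that records calls; replace it with a real client call in practice.
recorded = []


def fake_track(accession, notes):
    recorded.append((accession, notes))


failures = track_all(fake_track, HISTONE_NOTES)
print(len(recorded), failures)
```

Keeping the notes in one table also gives you a single place to review the intended role of each mark before anything is tracked.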
Why these five marks? Together they define the core chromatin states:
- H3K27ac marks active enhancers and promoters (the