Cite ENCODE Data Properly
When to Use
- User wants to generate proper citations for ENCODE data, tools, and consortium papers
- User asks about "citing ENCODE", "BibTeX", "references", "bibliography", or "data citation"
- User needs to create a Key Resources Table (STAR Methods) for Cell-family journals
- User wants to export citations in BibTeX, RIS, or other reference manager formats
- Example queries: "cite the ENCODE experiments I used", "generate BibTeX for my tracked experiments", "how do I cite ENCODE in my paper?"
Help the user generate correct citations for ENCODE data following official guidelines. This is the definitive guide to citing ENCODE data in manuscripts, grants, presentations, and supplementary materials.
ENCODE Citation Requirements
ENCODE's data use policy requires that publications cite the data they use. Data are released with no embargo and may be used without restriction upon release. Proper attribution is nonetheless both a scientific obligation and a practical necessity: reviewers check that data sources are cited correctly, and incomplete citations are a common reason for revision requests.
Step 0: Assess Publication Trust Before Citing
Before citing any study, check its scientific integrity using the publication-trust skill. This step catches:
- Formally retracted papers still in circulation
- Key findings contradicted by independent groups
- Expressions of concern from journal editors
- Authors with patterns of problematic publications
If a study scores Trust Level 1 (Compromised) or 2 (Reliability concerns), flag it prominently in the citation list and note the issue. A compromised citation undermines the entire analysis built on it.
# For each paper you plan to cite:
# 1. Get metadata: get_article_metadata(pmids=["PMID"])
# 2. Check retractions: search_articles(query="PMID[PMID] AND Retracted Publication[pt]")
# 3. Check contradictions: search for citing articles with refutation language
# See publication-trust skill for full workflow
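The flagging rule above can be sketched in Python as follows. The record shape (`citation`, `trust_level`, `note`) and the function name are hypothetical, since the publication-trust skill's actual output format is not shown here. The sketch also assumes that any level other than 1 or 2 passes through unflagged.

```python
# Hypothetical record shape; the publication-trust skill's real output may differ.
TRUST_FLAGS = {
    1: "COMPROMISED",
    2: "RELIABILITY CONCERNS",
}

def annotate_citations(citations):
    """Return citation lines, prefixing entries at Trust Level 1 or 2 with a flag."""
    lines = []
    for c in citations:
        flag = TRUST_FLAGS.get(c["trust_level"])
        if flag:
            lines.append(f"[{flag}: {c['note']}] {c['citation']}")
        else:
            lines.append(c["citation"])
    return lines

# Placeholder entries for illustration only (not real papers or real scores).
example = [
    {"citation": "Paper A (placeholder)", "trust_level": 1, "note": "retracted"},
    {"citation": "Paper B (placeholder)", "trust_level": 4, "note": ""},
]
for line in annotate_citations(example):
    print(line)
```

Putting the flag at the front of the line keeps it visible even when the citation list is later sorted or truncated.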
Step 1: Identify What to Cite
Determine what the user needs to cite:
Individual Experiments
For specific experiments used in analysis:
- Track the experiment: encode_track_experiment(accession="ENCSR...")
- Get associated publications: encode_get_citations(accession="ENCSR...")
- The experiment's own publications should be cited
The ENCODE Project Itself
When referencing ENCODE as a data source, cite the consortium papers:
ENCODE Phase 3 (2020):
- ENCODE Project Consortium et al. "Expanded encyclopaedias of DNA elements in the human and mouse genomes." Nature 583, 699-710 (2020). PMID: 32728249. DOI: 10.1038/s41586-020-2493-4
ENCODE Phase 2 (2012):
- ENCODE Project Consortium. "An integrated encyclopedia of DNA elements in the human genome." Nature 489, 57-74 (2012). PMID: 22955616. DOI: 10.1038/nature11247
Original ENCODE (2007):
- ENCODE Project Consortium. "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project." Nature 447, 799-816 (2007). PMID: 17571346. DOI: 10.1038/nature05874
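If a consortium entry has to be written by hand rather than exported, a minimal Python sketch such as the one below can format it as BibTeX. The function name and dictionary layout are illustrative assumptions. The metadata comes from the Phase 2 reference above. The double braces around the author are a common BibTeX convention for keeping a corporate name from being parsed as "Consortium, ENCODE Project".

```python
def to_bibtex(key, ref):
    """Format one journal article as a BibTeX @article entry."""
    fields = [
        ("author", ref["author"]),
        ("title", ref["title"]),
        ("journal", ref["journal"]),
        ("volume", ref["volume"]),
        ("pages", ref["pages"].replace("-", "--")),  # BibTeX page ranges use "--"
        ("year", ref["year"]),
        ("doi", ref["doi"]),
        ("pmid", ref["pmid"]),  # non-standard field; standard styles ignore it
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@article{{{key},\n{body}\n}}"

ENCODE_2012 = {
    "author": "{ENCODE Project Consortium}",  # extra braces protect the corporate author
    "title": "An integrated encyclopedia of DNA elements in the human genome",
    "journal": "Nature",
    "volume": "489",
    "pages": "57-74",
    "year": "2012",
    "doi": "10.1038/nature11247",
    "pmid": "22955616",
}

print(to_bibtex("encode2012", ENCODE_2012))
```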
Specific Data Standards
When your methods rely on ENCODE standards:
- ChIP-seq guidelines: Landt et al. "ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia." Genome Res 22, 1813-1831 (2012). PMID: 22955991. DOI: 10.1101/gr.136184.111
- ENCODE uniform pipelines: Hitz et al. "The ENCODE Uniform Analysis Pipelines." Nucleic Acids Res 51, D1014-D1024 (2023). DOI: 10.1093/nar/gkac1067
- ENCODE Blacklist: Amemiya et al. "The ENCODE Blacklist: Identification of Problematic Regions of the Genome." Sci Rep 9, 9354 (2019). DOI: 10.1038/s41598-019-45839-z
Step 2: Export Citations
Use encode_get_citations with appropriate format:
- export_format="bibtex" -- For LaTeX, Overleaf, BibDesk
- export_format="ris" -- For EndNote, Zotero, Mendeley, Papers
- export_format="json" -- For programmatic use
For all tracked experiments:
encode_get_citations(export_format="bibtex")
For a specific experiment:
encode_get_citations(accession="ENCSR133RZO", export_format="bibtex")
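When per-experiment exports are merged with the all-experiments export, the same publication can appear twice. The sketch below is one way to check a combined .bib file for duplicate keys. It assumes the export is conventional BibTeX with each entry starting at the beginning of a line, because the exact output of encode_get_citations is not documented here. The helper names are hypothetical.

```python
import re
from collections import Counter

# Matches the start of an entry such as "@article{encode2012,".
ENTRY_START = re.compile(r"^@(\w+)\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)

def bibtex_keys(text):
    """Return every citation key in the order it appears."""
    return [m.group(2) for m in ENTRY_START.finditer(text)]

def duplicate_keys(text):
    """Return the sorted list of keys that occur more than once."""
    counts = Counter(bibtex_keys(text))
    return sorted(key for key, n in counts.items() if n > 1)
```

LaTeX stops with an error on repeated keys, so it is worth running a check like this before handing the bibliography to co-authors.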
Step 3: Generate Data Availability Statement
For the Data Availability section of a publication:
Template:
"[Assay type] data for [biosample] were obtained from the ENCODE Project (https://www.encodeproject.org). Experiment accessions: [list ENCSR accessions]. All ENCODE data are freely available under an unrestricted-use policy."
Use encode_export_data(format="csv") to generate a supplementary table listing all experiments used, with columns for accession, assay, biosample, target, lab, and date released.
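For cases where the statement and table are assembled by hand, the following Python sketch fills the template and writes a table with the columns listed above. The function names, the `date_released` column key, and the placeholder accessions are assumptions for illustration. The real encode_export_data output may use different column names.

```python
import csv
import io

COLUMNS = ["accession", "assay", "biosample", "target", "lab", "date_released"]

def data_availability(assay, biosample, accessions):
    """Fill the Data Availability template for one assay/biosample pair."""
    listed = ", ".join(sorted(accessions))
    return (
        f"{assay} data for {biosample} were obtained from the ENCODE Project "
        f"(https://www.encodeproject.org). Experiment accessions: {listed}. "
        "All ENCODE data are freely available under an unrestricted-use policy."
    )

def supplementary_csv(rows):
    """Serialize experiment rows (dicts keyed by COLUMNS) to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Placeholder accessions, not real experiments.
print(data_availability("ATAC-seq", "K562", ["ENCSR000XXB", "ENCSR000XXA"]))
```

Sorting the accessions keeps the statement identical across reruns, which makes manuscript diffs easier to review.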
Step 4: Write Acknowledgments
Template:
"This work used data generated by the ENCODE Consortium (encodeproject.org). The ENCODE Project is funded by the National Human Genome Research Institute (NHGRI)."
If using data from specific labs, consider acknowledging them:
"We thank [Lab Name] for generating the [assay type] data used in this study (ENCODE accession [ENCSR...])."
Step 5: Cross-Reference with Literature
Use encode_get_references to find all linked PMIDs and DOIs for tracked experiments. These can be:
- Passed to PubMed tools for full metadata
- Used to find related articles
- Included in the bibliography
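Several experiments often link to the same paper, so the collected identifiers usually need deduplicating before they reach the bibliography. The sketch below shows one way to do that. It assumes a hypothetical input shape (a mapping from accession to a list of `{"pmid": ..., "doi": ...}` records), which may not match what encode_get_references actually returns.

```python
def merge_references(refs_by_experiment):
    """Collapse per-experiment references into one entry per publication."""
    merged = {}
    for accession, refs in refs_by_experiment.items():
        for ref in refs:
            # Prefer the PMID as identity; fall back to a lower-cased DOI.
            key = ref.get("pmid") or (ref.get("doi") or "").lower()
            if not key:
                continue  # nothing to identify the publication by
            entry = merged.setdefault(
                key,
                {"pmid": ref.get("pmid"), "doi": ref.get("doi"), "experiments": []},
            )
            if accession not in entry["experiments"]:
                entry["experiments"].append(accession)
    return list(merged.values())
```

One limitation: a publication that appears once with only a PMID and once with only a DOI will not be merged. Resolving that requires a metadata lookup through the PubMed tools.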
Step 6: Supplementary Materials
For reproducibility, include in supplements:
- Full experiment accession list: encode_export_data(format="tsv")
- File accessions used: list specific ENCFF accessions
- Pipeline versions and parameters
- Quality metrics for each experiment used
- Any derived files with provenance: encode_get_provenance
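Before the accession lists go into a supplement, a quick format check can catch truncated or mistyped identifiers. The sketch below assumes accessions follow the shape seen in ENCSR133RZO: a type prefix (ENCSR for experiments, ENCFF for files), then three digits and three uppercase letters. Confirm that pattern against the ENCODE portal before relying on it. The function name is hypothetical.

```python
import re

# Assumed pattern: ENCSR/ENCFF + three digits + three uppercase letters.
ACCESSION = re.compile(r"^ENC(SR|FF)\d{3}[A-Z]{3}$")

def classify_accessions(accessions):
    """Split accessions into experiments, files, and entries that fail the pattern."""
    result = {"experiments": [], "files": [], "invalid": []}
    for acc in accessions:
        cleaned = acc.strip()
        match = ACCESSION.match(cleaned)
        if match is None:
            result["invalid"].append(acc)
        elif match.group(1) == "SR":
            result["experiments"].append(cleaned)
        else:
            result["files"].append(cleaned)
    return result
```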
Walkthrough: End-to-End From Analysis to Submitted Manuscript
This walkthrough covers generating ALL citation content needed for a manuscript that uses ENCODE data. Follow each phase in order.
Phase 1: Gather All Experiments Used
encode_list_tracked()
Review the output. A typical multi-omic study might track 12 experiments: 5 histone ChIP-seq, 3 ATAC-seq, 2 RNA-seq, and 2 WGBS. Verify that every experiment has publications fetched (the publications column should show a count greater than zero). If any show zero publications, run encode_get_citations(accession="ENCSR...") for those experiments individually to trigger a fresh lookup.
Check that every experiment you actually used in the analysis is tracked. A common oversight is forgetting to track experiments that were used only for quality comparison or as controls. If you generated a figure or statistic from an experiment, it must be tracked and cited.
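Both checks in this phase, untracked experiments and tracked experiments with zero publications, can be run as a simple set comparison. The sketch below assumes the tracked list has been converted to dictionaries with `accession` and `publications` keys. That is a hypothetical shape, since the exact output of encode_list_tracked is not shown here.

```python
def audit_tracking(tracked, used_accessions):
    """Report experiments used but not tracked, and tracked ones lacking publications."""
    tracked_ids = {t["accession"] for t in tracked}
    return {
        "untracked": sorted(set(used_accessions) - tracked_ids),
        "no_publications": sorted(
            t["accession"] for t in tracked if t["publications"] == 0
        ),
    }
```

The `used_accessions` list should come from the analysis itself (for example, sample sheets or figure scripts), so that controls and QC-only experiments are not overlooked.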
Phase 2: Generate Methods Section Citations
Every bioinformatics tool used in your analysis pipeline must be cited. Reviewers routinely check for these. Common tools and their canonical citations:
| Tool | Citation | DOI |
|---|---|---|
| MACS2 v2.2.7.1 | Zhang et al. Genome Biol 2008 | 10.1186/gb-2008-9-9-r137 |
| STAR v2.7.10b | Dobin et al. Bioinformatics 2013 | 10.1093/bioinformatics/bts635 |
| DESeq2 v1.38 | Love et al. Genome Biol 2014 | 10.1186/s13059-014-0550-8 |
| deepTools v3.5.4 | Ramirez et al. Nucleic Acids Res 2016 | 10.1093/nar/gkw257 |
| bedtools v2.31.0 | Quinlan & Hall. Bioinformatics 2010 | 10.1093/bioinformatics/btq033 |
| samtools v1.17 | Danecek et al. GigaScience 2021 | 10.1093/gigascience/giab008 |
| Bowtie2 v2.5.1 | Langmead & Salzberg. Nat Methods 2012 | 10.1038/nmeth.1923 |
| featureCounts (Subread) | Liao et al. Bioinformatics 2014 | 10.1093/bioinformatics/btt656 |
| Bismark | Krueger & Andrews. Bioinformatics 2011 | 10.1093/bioinformatics/btr167 |
| HOMER | Heinz et al. Mol Cell 2010 | 10.1016/j.molcel.2010.05.004 |
| IGV | Robinson et al. Nat Biotechnol 2011 | 10.1038/nbt.1754 |
| Picard | Broad Institute (URL-only citation) | https://broadinstitute.github.io/picard/ |
| BWA-MEM | Li. arXiv 2013 | arXiv:1303.3997 |
See the bioinformatics-installer and scientific-writing skills for the complete citation list covering 134+ tools.
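As a rough illustration of how the table turns into Methods text, the Python sketch below builds "tool vX (citation)" mentions and reports any tool that lacks a version. The data structure and function name are hypothetical. The two entries are taken from the table above, where featureCounts is listed without a version.

```python
TOOLS = [
    {"name": "MACS2", "version": "2.2.7.1", "cite": "Zhang et al., 2008"},
    {"name": "featureCounts", "version": None, "cite": "Liao et al., 2014"},
]

def tool_mentions(tools):
    """Return (mentions, missing): in-text mentions plus tools with no version recorded."""
    mentions, missing = [], []
    for tool in tools:
        if tool["version"]:
            mentions.append(f"{tool['name']} v{tool['version']} ({tool['cite']})")
        else:
            mentions.append(f"{tool['name']} ({tool['cite']})")
            missing.append(tool["name"])
    return mentions, missing

mentions, missing = tool_mentions(TOOLS)
print("; ".join(mentions))
print("Missing versions:", missing)
```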
Always include tool vers