Published skills
bioinformatics-installer
Install bioinformatics tools for ENCODE data analysis. Covers CLI tools (BWA, STAR, samtools, MACS2), R/Bioconductor packages (DESeq2, Seurat, ChIPseeker), Python packages (Scanpy, deeptools), and Nextflow pipeline infrastructure. Generates conda environments, R install scripts, and Python requirements. Use when the user needs to set up a bioinformatics workstation, install tools for a specific as
download-encode
Download ENCODE genomics files (BED, FASTQ, BAM, bigWig, etc.) to the user's machine. Use when the user wants to download data files from ENCODE experiments.
functional-screen-analysis
Analyze ENCODE functional genomics screens including CRISPR screens, MPRA (Massively Parallel Reporter Assays), and STARR-seq. Find screen data in ENCODE, process results, identify functional elements, and integrate with epigenomic annotations.
integrative-analysis
Plan and execute integrative analysis combining multiple ENCODE experiments for cross-dataset or multi-omic workflows. Use when the user wants to combine experiments, perform cross-dataset comparison, multi-omic integration, peak overlap analysis, differential binding, signal correlation, chromatin state segmentation, enhancer-gene linkage, or any analysis that requires merging or comparing data f
accessibility-aggregation
Build comprehensive chromatin accessibility maps by aggregating ATAC-seq and DNase-seq narrowPeak data across multiple ENCODE experiments, donors, and labs. Use when the user wants to answer "where is chromatin accessible in my tissue?" by combining peak calls into a union peak set. Handles cross-lab variation, ATAC vs DNase platform differences, and ENCODE blocklist filtering.
compare-biosamples
Compare ENCODE experiments across different biosamples, tissues, or cell lines to identify tissue-specific regulatory patterns. Use when the user wants cross-tissue comparison, cell-type comparison, tissue-specific elements, differential chromatin, biosample matching, disease vs normal comparison, developmental time course, constitutive vs variable regulation, or multi-tissue data availability map
clinvar-annotation
Guide for annotating ENCODE regulatory variants with ClinVar clinical significance. Use when users need to check if variants in ENCODE peaks have clinical associations, find pathogenic variants in regulatory regions, or assess variant clinical impact. Trigger on: ClinVar, clinical significance, pathogenic variant, variant classification, clinical variant, disease variant, VUS, benign, likely patho
cross-reference
Cross-reference ENCODE data with PubMed, bioRxiv, ClinicalTrials.gov, Open Targets, GTEx, ClinVar, GWAS Catalog, gnomAD, Ensembl, and other scientific databases. Use when the user wants to find publications, preprints, or clinical trials related to ENCODE experiments, chain ENCODE data with other scientific MCP servers, or build translational pipelines from genomic data to clinical application.
gwas-catalog
Guide for integrating NHGRI-EBI GWAS Catalog associations with ENCODE regulatory data. Use when users need to find GWAS variants in ENCODE peaks, connect regulatory elements to disease associations, or prioritize functional variants using ENCODE annotations. Trigger on: GWAS, genome-wide association, SNP association, trait association, GWAS Catalog, disease association, risk variant, lead SNP, LD
histone-aggregation
Build comprehensive histone mark maps by aggregating narrowPeak data across multiple ENCODE experiments, donors, and labs. Use when the user wants to answer "where is this histone mark present in my tissue?" by combining peak calls from multiple studies into a union peak set with confidence annotations. Handles cross-lab batch effects, broad vs narrow marks, and ENCODE blocklist filtering.
motif-analysis
Guide for de novo and known motif enrichment analysis of ENCODE ChIP-seq and ATAC-seq peaks using HOMER and MEME Suite. Use when users need to discover TF binding motifs in peaks, validate ChIP-seq targets, or find co-binding partners. Trigger on: motif analysis, HOMER, MEME, de novo motif, motif enrichment, findMotifsGenome, AME, MEME-ChIP, known motif, TF binding motif, co-factor, motif discover
pipeline-atacseq
Execute ENCODE ATAC-seq processing pipeline from FASTQ to peaks and signal tracks. Child of pipeline-guide. Provides stage-by-stage Nextflow execution with Docker containers and cloud deployment. Handles Tn5 transposase offset correction, mitochondrial read removal, nucleosome-free fragment selection, and TSS enrichment scoring. Use when users need to process ATAC-seq data following ENCODE standar
pipeline-cutandrun
Execute CUT&RUN processing pipeline from FASTQ to peaks and signal tracks. Child of pipeline-guide. Provides Nextflow execution with Docker and cloud deployment. Use when processing CUT&RUN or CUT&Tag data, an alternative to ChIP-seq with lower background. Trigger on: CUT&RUN pipeline, CUT&Tag, SEACR, Henikoff, targeted chromatin, pA-MNase, process CUT&RUN.
pipeline-guide
Access ENCODE uniform analysis pipelines, generate user-specific Nextflow/WDL pipelines, manage compute resources, and integrate with cloud platforms. Use when the user wants to understand ENCODE pipelines, run pipelines on their own data, generate custom Nextflow workflows from ENCODE pipeline code, check compute requirements (CPU/GPU/memory), run pipelines in background, or integrate with Google
track-experiments
Track ENCODE experiments locally with publications, citations, and provenance. Use when the user wants to build a collection of experiments, manage citations, compare experiments, or track data provenance.
batch-analysis
Guide for multi-experiment batch operations: QC screening, batch download, comparison, and report generation across many ENCODE experiments simultaneously. Use when users need to process 5+ experiments together, create experiment comparison tables, perform batch quality checks, or generate summary reports. Trigger on: batch analysis, multiple experiments, bulk processing, experiment comparison, ba
cellxgene-context
Guide for integrating CellxGene Census single-cell data with ENCODE bulk experiments. Use when users need cell-type-specific expression context for ENCODE regulatory data, want to deconvolve bulk ENCODE signals, or validate regulatory elements at single-cell resolution. Trigger on: CellxGene, single-cell atlas, cell type expression, Census, cell type specificity, single-cell context, scRNA-seq atl
epigenome-profiling
Build comprehensive epigenomic profiles for tissues or cell types using ENCODE data. Use when the user wants to characterize chromatin states, assemble histone modification panels, create epigenomic landscapes, run ChromHMM segmentation, identify super-enhancers or bivalent domains, profile regulatory elements across a biosample, or understand epigenetic regulation in a specific biological context
jaspar-motifs
Guide for using JASPAR transcription factor binding profiles with ENCODE ChIP-seq data. Use when users need to find TF binding motifs in ENCODE peaks, validate ChIP-seq targets with known motifs, or scan regulatory regions for TF binding potential. Trigger on: JASPAR, motif database, binding profile, PWM, position weight matrix, TF motif, motif enrichment, motif scanning, binding site prediction.
data-provenance
Track exact provenance for every operation on ENCODE data — tool versions, reference files, scripts, parameters, and timestamps — to enable publication-ready methods writing. Use when the user processes ENCODE files, runs any bioinformatics tool, creates filtered/merged datasets, runs pipelines, performs liftover, uses R/Python/Bash for analysis, or needs to document their analysis chain for repro
disease-research
Use ENCODE functional genomics data for disease mechanism research. Use when the user wants to connect GWAS variants to regulatory elements, annotate disease-associated loci with functional data, identify therapeutic targets from epigenomic data, build disease regulatory models, cross-reference with clinical trials and drug databases, or conduct any disease-focused, pathology-driven, or clinical v
gnomad-variants
Query gnomAD (Genome Aggregation Database) for population allele frequencies, gene constraint scores, and variant annotations to interpret ENCODE regulatory variants. Use when the user needs allele frequencies for variants in ENCODE regulatory elements, wants to assess gene constraint (pLI, LOEUF) for ENCODE target genes, needs population-specific frequencies for GWAS variants overlapping cCREs, w
hic-aggregation
Build comprehensive chromatin contact maps by aggregating Hi-C loop calls (BEDPE) across multiple ENCODE experiments, donors, and labs. Use when the user wants to answer "what regions are in 3D contact in my tissue?" by creating a union catalog of chromatin loops. Handles resolution-aware anchor matching, cross-lab variation, and Hi-C-specific quality metrics.
publication-trust
Assess the scientific integrity and trustworthiness of publications before relying on their findings. Use this skill whenever evaluating a paper for a workflow, citing a study, building an analysis on published methods, or when a user asks about the reliability of a study. Checks for formal retractions, corrections, expressions of concern, and — critically — informal contradictions where subsequen
scrna-meta-analysis
Conduct rigorous cross-study meta-analysis of scRNA-seq data from ENCODE, integrating multiple single-cell transcriptomic datasets for a tissue/cell type. Use when the user wants to answer "what cell types exist in my tissue and what genes define them?" by combining scRNA-seq data across donors, labs, and platforms. Follows the Mawla et al. 2019 framework for assessing cross-study reproducibility,
setup
Set up the ENCODE Toolkit server connection. Use when the user needs help installing, configuring, or troubleshooting the ENCODE connector.
ensembl-annotation
Query the Ensembl REST API for regulatory feature annotations, variant effect prediction (VEP), coordinate liftover, gene lookups, and cross-references. Use when the user needs to annotate variants with VEP (consequence, CADD, REVEL, SpliceAI), check Ensembl Regulatory Build overlap for ENCODE regions, convert coordinates between GRCh37 and GRCh38, resolve gene IDs (Ensembl ↔ symbol ↔ RefSeq), loo
geo-connector
Search, query, and cross-reference NCBI GEO (Gene Expression Omnibus) datasets with ENCODE experiments. Use when the user wants to find GEO accessions for ENCODE experiments, search GEO for complementary datasets, download GEO metadata or series matrices, cross-reference ENCODE and GEO data, find supplementary files from GEO, or link GEO series to ENCODE experiments for provenance tracking. Also u
gtex-expression
Guide for integrating GTEx tissue expression data with ENCODE regulatory elements. Use when users need to check if a gene is expressed in a tissue, correlate regulatory elements with expression, or validate ENCODE findings against GTEx. Trigger on: GTEx, tissue expression, gene expression levels, expression atlas, eQTL, tissue-specific expression, TPM values.
methylation-aggregation
Build comprehensive DNA methylation maps by aggregating WGBS (Whole Genome Bisulfite Sequencing) data across multiple ENCODE experiments, donors, and labs. Use when the user wants to answer "where is DNA methylated/unmethylated in my tissue?" by combining per-CpG methylation data into tissue-level methylation profiles. Handles coverage filtering, identifies hypomethylated regions (HMRs) and partia
peak-annotation
Guide for annotating ENCODE peaks with genomic features using ChIPseeker and GREAT. Use when users need to assign peaks to genes, determine genomic feature distribution (promoter, intron, intergenic), or perform gene ontology enrichment of peak-associated genes. Trigger on: peak annotation, ChIPseeker, GREAT, peak to gene, genomic feature, promoter enrichment, gene ontology, peak distribution, TSS
scientific-writing
Generate publication-ready methods sections, figure legends, supplementary tables, and data availability statements from ENCODE analysis provenance. Implements the scientific documentation standards requiring complete metadata reporting. Use when the user needs to write methods, generate figure legends, create supplementary tables, draft data availability statements, compile tool citations, or aut
search-encode
Search and explore ENCODE Project genomics data. Use when the user wants to find experiments, files, or explore what data is available for specific assays, organs, cell lines, or targets.
cite-encode
Generate proper ENCODE citations for publications, grants, and presentations. Use when the user needs to cite ENCODE data, create bibliography entries, write acknowledgment sections, or ensure compliance with ENCODE data use policy.
pipeline-chipseq
Execute ENCODE ChIP-seq processing pipeline from FASTQ to peaks and signal tracks. Child of pipeline-guide. Provides stage-by-stage Nextflow execution with Docker containers and cloud deployment. Use when users need to process ChIP-seq data following ENCODE standards, run peak calling with MACS2, perform IDR analysis, or generate signal tracks. Trigger on: ChIP-seq pipeline, run ChIP-seq, process
pipeline-dnaseseq
Execute ENCODE DNase-seq pipeline from FASTQ to hotspots and footprints. Child of pipeline-guide. Provides Nextflow execution with Docker and cloud deployment. Use when processing DNase-seq data, calling DNase hypersensitive sites, performing footprinting analysis. Trigger on: DNase-seq pipeline, DNase hypersensitive, DHS, Hotspot2, footprinting, DNase I, chromatin accessibility DNase.
quality-assessment
Evaluate ENCODE experiment quality using standard metrics and audit flags. Use when the user asks about data quality, wants to filter for high-quality experiments, needs to interpret quality metrics (FRiP, NSC, RSC, NRF, IDR, TSS enrichment, fragment size), wants to understand ENCODE audit warnings, needs to compare quality across experiments, or is deciding whether data is usable for their analys
single-cell-encode
Find and work with ENCODE single-cell genomics data including scRNA-seq and scATAC-seq. Use when the user asks about single-cell experiments, cell type resolution, clustering from ENCODE data, deconvolution of bulk signals using single-cell references, or comparing single-cell vs bulk profiles. Covers platform differences (10X Chromium, Smart-seq2, Drop-seq), quality limitations of single-cell dat
multi-omics-integration
Integrate multiple ENCODE data types (RNA-seq, ATAC-seq, Histone ChIP-seq, TF ChIP-seq) for a tissue/cell type to build a comprehensive regulatory landscape. Use when the user wants to answer "what are the enhancers, promoters, and regulatory elements active in my tissue, and which transcription factors control them?" by layering expression, chromatin accessibility, histone marks, and TF binding d
pipeline-hic
Execute ENCODE Hi-C pipeline from FASTQ to contact matrices and loop calls. Child of pipeline-guide. Provides Nextflow execution with Docker and cloud deployment. Use when processing Hi-C data, generating contact matrices, calling loops or TADs. Trigger on: Hi-C pipeline, chromatin conformation, contact matrix, loop calling, TAD detection, Juicer, HiCCUPS, 3D genome.
pipeline-rnaseq
Execute ENCODE RNA-seq pipeline from FASTQ to gene quantification and signal tracks. Child of pipeline-guide. Provides Nextflow execution with Docker and cloud deployment. Use when processing RNA-seq data with STAR alignment, RSEM/Kallisto quantification, or generating expression matrices. Trigger on: RNA-seq pipeline, gene expression, STAR alignment, RSEM quantification, transcript quantification