Using JASPAR Transcription Factor Binding Profiles with ENCODE ChIP-seq Data
Integrate JASPAR position weight matrices (PWMs) with ENCODE ChIP-seq peaks to validate TF binding targets, discover co-binding partners, and scan regulatory elements for TF binding potential.
Scientific Rationale
The question: "Does the expected TF binding motif appear in my ENCODE ChIP-seq peaks, and what other TF motifs are enriched?"
ENCODE TF ChIP-seq experiments identify where a transcription factor binds in the genome, but the peak coordinates alone do not confirm direct DNA binding or reveal the binding sequence specificity. JASPAR provides curated position weight matrices (PWMs) — mathematical representations of TF binding preferences — that enable two critical analyses:
- Target validation: If CTCF ChIP-seq peaks are enriched for the CTCF motif (JASPAR MA0139.1), that is strong evidence the experiment captured genuine CTCF binding. If they are NOT enriched, something may be wrong with the antibody, crosslinking, or peak calling.
- Co-factor discovery: Motif enrichment analysis in ChIP-seq peaks often reveals motifs for co-binding TFs that were not the ChIP target, pointing to candidate regulatory complexes.
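Both analyses come down to asking whether a motif occurs more often in peaks than in a matched background set. The sketch below shows one common way to quantify that, a one-sided Fisher's exact test on hit counts. The function name and the example counts are hypothetical, and dedicated tools use more refined background models.

```python
from math import comb

def fisher_enrichment(peaks_with_motif, n_peaks, bg_with_motif, n_bg):
    """One-sided Fisher's exact test: is the motif over-represented in peaks?

    Returns (fold_enrichment, p_value).
    """
    total = n_peaks + n_bg
    hits = peaks_with_motif + bg_with_motif
    # Hypergeometric upper tail: P(X >= peaks_with_motif) when drawing n_peaks
    # sequences from a pool of `total` that contains `hits` motif-positive ones.
    upper = min(hits, n_peaks)
    tail = sum(
        comb(hits, k) * comb(total - hits, n_peaks - k)
        for k in range(peaks_with_motif, upper + 1)
    )
    p_value = tail / comb(total, n_peaks)
    peak_frac = peaks_with_motif / n_peaks
    bg_frac = bg_with_motif / n_bg
    fold = peak_frac / bg_frac if bg_frac > 0 else float("inf")
    return fold, p_value

# Hypothetical counts: 8 of 10 peaks vs 2 of 10 background sequences contain the motif
fold, p = fisher_enrichment(8, 10, 2, 10)
print(f"fold enrichment = {fold:.1f}, p = {p:.4f}")
```

Exact binomial coefficients keep the example dependency-free. For tens of thousands of peaks, a library routine such as a SciPy hypergeometric test would likely be faster.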
What JASPAR Provides
- 900+ curated TF binding profiles across 7 taxonomic groups
- Position Frequency Matrices (PFMs), Position Weight Matrices (PWMs), and sequence logos
- Multiple profile versions reflecting binding mode diversity
- Quality scores (based on validation evidence)
- Taxonomic classification and TF structural class annotation
- REST API for programmatic access
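To make the PFM/PWM distinction concrete, the following is a minimal sketch of turning raw counts into log2-odds scores. The pseudocount of 0.8 and the uniform background are assumptions chosen for illustration, not values JASPAR prescribes, and `pfm_to_pwm` is a hypothetical helper name.

```python
import math

def pfm_to_pwm(pfm, pseudocount=0.8, background=None):
    """Convert a PFM (dict: base -> counts per position) into a log2-odds PWM."""
    bases = ["A", "C", "G", "T"]
    background = background or {b: 0.25 for b in bases}
    length = len(pfm["A"])
    pwm = {b: [] for b in bases}
    for i in range(length):
        column_total = sum(pfm[b][i] for b in bases)
        for b in bases:
            # Spread the pseudocount across bases in proportion to the background
            freq = (pfm[b][i] + pseudocount * background[b]) / (column_total + pseudocount)
            pwm[b].append(math.log2(freq / background[b]))
    return pwm

# Toy 2-position PFM (made-up counts, not a JASPAR profile)
toy_pfm = {"A": [10, 0], "C": [0, 0], "G": [0, 10], "T": [0, 0]}
toy_pwm = pfm_to_pwm(toy_pfm)
print({b: [round(v, 2) for v in toy_pwm[b]] for b in toy_pwm})
```

The pseudocount prevents taking the log of zero for bases never observed at a position. Larger pseudocounts flatten the matrix, which matters most for profiles built from few sites.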
The ENCODE-JASPAR Synergy
| ENCODE provides | JASPAR provides | Together |
|---|---|---|
| Where a TF binds (peak coordinates) | How a TF recognizes DNA (binding motif) | Validated binding sites with sequence specificity |
| TF binding in specific tissues | Universal binding preferences | Tissue-specific motif usage |
| Co-occupancy data (multiple ChIP-seq) | Co-factor motif profiles | Regulatory complex architecture |
| Chromatin context (accessibility, marks) | Motif sequence requirements | Context-dependent binding rules |
Key Literature
- Castro-Mondragon et al. 2022 "JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles" (Nucleic Acids Research, ~1,400 citations). The JASPAR 2022 release, with 1,956 profiles across 7 taxonomic groups, including profiles in the UNVALIDATED collection. Introduced TFBSTools integration and improved REST API. DOI: 10.1093/nar/gkab1113
- Sandelin et al. 2004 "JASPAR: an open-access database for eukaryotic transcription factor binding profiles" (Nucleic Acids Research, ~2,000 citations). The founding JASPAR publication establishing the curated, open-access model for TF binding profiles. DOI: 10.1093/nar/gkh012
- Grant et al. 2011 "FIMO: scanning for occurrences of a given motif" (Bioinformatics, ~2,500 citations). FIMO (Find Individual Motif Occurrences) — the standard tool for scanning sequences with PWMs. Part of the MEME Suite. DOI: 10.1093/bioinformatics/btr064
- Heinz et al. 2010 "Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities" (Molecular Cell, ~5,000 citations). Introduced HOMER motif analysis — the most widely used tool for de novo and known motif enrichment in ChIP-seq peaks. DOI: 10.1016/j.molcel.2010.05.004
- ENCODE Project Consortium 2020 "Expanded encyclopaedias of DNA elements in the human and mouse genomes" (Nature, ~1,656 citations). Source of the TF ChIP-seq experiments whose peaks JASPAR motifs help validate and interpret. DOI: 10.1038/s41586-020-2493-4
When to Use This Skill
| Scenario | How JASPAR Helps |
|---|---|
| Validating ENCODE TF ChIP-seq | Check if target TF motif is enriched in peaks |
| Finding co-binding TFs | Scan peaks for additional enriched motifs |
| Interpreting ENCODE enhancers | Identify which TFs can bind enhancer sequences |
| Variant in TF binding site | Check if variant disrupts a JASPAR motif |
| Comparing TF binding across tissues | Determine if same motif is used in different contexts |
| Planning CRISPR validation | Identify core motif bases to mutate |
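For the variant and CRISPR scenarios, the core operation is scoring a reference and an altered sequence against the same PWM and comparing the two. The sketch below illustrates this with a made-up 3-bp matrix; the function names are hypothetical, and it assumes the sequence is exactly motif-length and contains only A, C, G, T.

```python
def score_sequence(pwm, seq):
    """Sum the per-position log-odds scores of seq under the PWM."""
    return sum(pwm[base][i] for i, base in enumerate(seq.upper()))

def variant_delta(pwm, ref_seq, offset, alt_base):
    """Score change (alt minus ref) when ref_seq[offset] is replaced by alt_base."""
    alt_seq = ref_seq[:offset] + alt_base + ref_seq[offset + 1:]
    return score_sequence(pwm, alt_seq) - score_sequence(pwm, ref_seq)

# Toy 3-bp PWM with made-up log-odds values (not a JASPAR profile)
toy_pwm = {
    "A": [1.5, -2.0, -1.0],
    "C": [-1.0, 1.8, -1.0],
    "G": [-1.0, -2.0, 1.2],
    "T": [-2.0, -2.0, -0.5],
}

delta = variant_delta(toy_pwm, "ACG", 1, "T")
print(f"Score change for C>T at motif position 2: {delta:.1f}")
```

A strongly negative change suggests the substitution weakens the predicted site. In practice both strands and every window overlapping the variant would need to be scored, not just one alignment.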
JASPAR REST API Reference
Base URL: https://jaspar.genereg.net/api/v1/
No authentication required. Responses are JSON.
Key Endpoints
| Endpoint | Purpose | Key Parameters |
|---|---|---|
| /matrix/ | List/search all profiles | name, collection, tax_group, tf_class |
| /matrix/{id}/ | Get specific profile | Matrix ID (e.g., MA0139.1) |
| /matrix/{id}/?format=pfm | Get PFM (counts) | — |
| /matrix/{id}/?format=pwm | Get PWM (log-odds) | — |
| /matrix/{id}/?format=jaspar | Get JASPAR format | — |
| /matrix/{id}/?format=meme | Get MEME format | Ready for FIMO scanning |
| /taxon/ | List taxonomic groups | — |
| /tfclass/ | List TF structural classes | — |
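The endpoints in the table can be assembled and walked with a couple of small helpers. This is a sketch, not an official client: `build_url` and `iter_results` are hypothetical names, and the assumption that listing responses carry a `next` link alongside `results` should be checked against a live response.

```python
from urllib.parse import urlencode

BASE_URL = "https://jaspar.genereg.net/api/v1/"

def build_url(endpoint, **params):
    """Join the base URL, an endpoint path, and optional query parameters."""
    url = BASE_URL + endpoint.strip("/") + "/"
    return f"{url}?{urlencode(params)}" if params else url

def iter_results(first_url, fetch_json):
    """Yield every record from a paginated listing.

    fetch_json is any callable mapping a URL to the decoded JSON body,
    e.g. lambda u: requests.get(u, timeout=30).json().
    Assumes each page has a "results" list and an optional "next" URL.
    """
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page["results"]
        url = page.get("next")

print(build_url("matrix", name="CTCF", collection="CORE"))
print(build_url("matrix/MA0139.1", format="meme"))
```

Passing the fetch function in as an argument keeps the paging logic testable without network access and lets callers add retries or caching.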
Common Matrix IDs for ENCODE TFs
| TF | JASPAR ID | Class | Notes |
|---|---|---|---|
| CTCF | MA0139.1 | C2H2 zinc finger | Most common ENCODE TF ChIP-seq target |
| TP53 (p53) | MA0106.3 | p53 family | Tumor suppressor |
| SP1 | MA0079.5 | C2H2 zinc finger | GC-rich promoter binding |
| FOXA1 | MA0148.4 | Forkhead | Pioneer factor |
| FOXA2 | MA0047.3 | Forkhead | Liver, pancreas |
| HNF4A | MA0114.4 | Nuclear receptor | Hepatocyte-enriched |
| NRF1 | MA0506.2 | bZIP | Mitochondrial regulation |
| REST (NRSF) | MA0138.2 | C2H2 zinc finger | Neuronal gene repressor |
| MYC | MA0147.3 | bHLH | Oncogene, E-box binding |
| JUN (AP-1) | MA0488.1 | bZIP | Immediate early response |
| GATA4 | MA0482.2 | GATA | Cardiac, endoderm |
| PAX6 | MA0069.1 | Paired box | Eye, brain development |
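The IDs in this table combine a stable base ID with a version number after the dot. A small helper for splitting them can be useful when pinning a specific version for reproducibility or when checking whether a newer version exists; `split_matrix_id` is a hypothetical name used for illustration.

```python
def split_matrix_id(matrix_id):
    """Split 'MA0139.1' into its base ID ('MA0139') and integer version (1)."""
    base_id, _, version = matrix_id.partition(".")
    if not base_id or not version.isdigit():
        raise ValueError(f"Unexpected matrix ID format: {matrix_id!r}")
    return base_id, int(version)

base_id, version = split_matrix_id("MA0139.1")
print(f"base ID: {base_id}, version: {version}")
```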
Step 1: Retrieve ENCODE ChIP-seq Peaks
# Find TF ChIP-seq experiments
encode_search_experiments(
    assay_title="TF ChIP-seq",
    target="CTCF",
    organ="pancreas",
    biosample_type="tissue"
)
# Get IDR thresholded peaks (highest confidence)
encode_list_files(
    experiment_accession="ENCSR...",
    file_format="bed",
    output_type="IDR thresholded peaks",
    assembly="GRCh38",
    preferred_default=True
)
Track the experiment:
encode_track_experiment(accession="ENCSR...", notes="CTCF ChIP-seq for motif analysis")
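Once a peak file is downloaded, motif analysis usually works on short windows around each peak summit rather than on whole peaks. The sketch below assumes the file follows the narrowPeak layout (tab-separated, with the summit offset in the tenth column) and falls back to the peak midpoint when no summit is recorded. `summit_windows` and the 50 bp flank are illustrative choices.

```python
def summit_windows(bed_lines, flank=50):
    """Yield (chrom, start, end, name) windows centered on each peak summit."""
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        # narrowPeak column 10 is the summit offset from start; -1 means "not called"
        offset = int(fields[9]) if len(fields) > 9 else -1
        summit = start + offset if offset >= 0 else (start + end) // 2
        # Clamp at 0 so windows near the chromosome start stay valid
        yield chrom, max(0, summit - flank), summit + flank, name

example = ["chr1\t1000\t1400\tpeak1\t1000\t.\t50.0\t-1\t3.2\t150\n"]
for window in summit_windows(example):
    print(window)
```

Downloaded peak files are typically gzip-compressed, so in practice the lines would come from something like `gzip.open(path, "rt")`. The window end is not clamped to the chromosome length, which would require a chromosome sizes file.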
Step 2: Get JASPAR Motif Profiles
Query by TF Name
import requests

def get_jaspar_matrix(tf_name, tax_group="vertebrates", collection="CORE"):
    """Get the latest-version JASPAR matrix summary for a TF, or None if not found."""
    url = "https://jaspar.genereg.net/api/v1/matrix/"
    params = {
        "name": tf_name,
        "tax_group": tax_group,
        "collection": collection,
        "format": "json"
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    results = response.json()["results"]
    if results:
        # Return the highest-version profile; int() guards against string versions
        return max(results, key=lambda x: int(x["version"]))
    return None

ctcf_profile = get_jaspar_matrix("CTCF")
if ctcf_profile:
    print(f"ID: {ctcf_profile['matrix_id']}, Version: {ctcf_profile['version']}")
Get the Position Frequency Matrix (PFM)
def get_pfm(matrix_id):
    """Get Position Frequency Matrix from JASPAR."""
    url = f"https://jaspar.genereg.net/api/v1/matrix/{matrix_id}/"
    params = {"format": "json"}
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    data = response.json()
    pfm = data["pfm"]
    # pfm is a dict with keys A, C, G, T, each a list of counts per position
    return pfm

pfm = get_pfm("MA0139.1")
print(f"Motif length: {len(pfm['A'])} bp")
for base in ["A", "C", "G", "T"]:
    print(f"{base}: {pfm[base]}")
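With the PFM in hand, per-position information content is a quick way to see which motif positions are most constrained, and so which bases are the strongest candidates to mutate in a validation experiment. This is a minimal sketch assuming a uniform background and a small pseudocount. It omits the small-sample correction that logo tools often apply, and `information_content` is a hypothetical helper.

```python
import math

def information_content(pfm, pseudocount=0.8):
    """Per-position information content in bits, assuming a uniform background."""
    bases = ["A", "C", "G", "T"]
    ic = []
    for i in range(len(pfm["A"])):
        total = sum(pfm[b][i] for b in bases) + pseudocount
        entropy = 0.0
        for b in bases:
            # Pseudocount keeps every probability above zero, so log2 is defined
            p = (pfm[b][i] + pseudocount / 4) / total
            entropy -= p * math.log2(p)
        ic.append(2.0 - entropy)  # 2 bits is the maximum for a 4-letter alphabet
    return ic

# Toy PFM (made-up counts): position 1 is uninformative, position 2 is fixed to A
toy_pfm = {"A": [5, 20], "C": [5, 0], "G": [5, 0], "T": [5, 0]}
for pos, bits in enumerate(information_content(toy_pfm), start=1):
    print(f"position {pos}: {bits:.2f} bits")
```

The same function accepts the `pfm` dict returned by `get_pfm`, since both use A, C, G, T keys with one count per position.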
Get MEME Format (for FIMO Scanning)
def get_meme_format(matrix_id):
    """Get motif in MEME format for use with FIMO."""
    url = f"htt