ClinPGx (PharmGKB) Pharmacogenomics Database
Overview
PharmGKB rebranded as ClinPGx in 2024 and the API moved from api.pharmgkb.org to api.clinpgx.org. The old host now returns 404/405; every example here uses the new endpoints. Two complementary APIs are used together:
- ClinPGx Data API (api.clinpgx.org/v1): record-style access to genes, drugs, variants, clinical annotations, guideline annotations, drug labels, and pathways. Responses wrap data as {"data": [...], "status": "success"}. Filters use dotted property paths (e.g. relatedChemicals.name=clopidogrel, levelOfEvidence.term=1A).
- CPIC PostgREST API (api.cpicpgx.org/v1): relational lookup of genotype → drug recommendation rows. Uses PostgREST filter syntax (column=eq.value, and cs.{...} for jsonb containment). Returns flat JSON arrays.
Use ClinPGx for what is known about a gene, drug, or variant (annotations); use CPIC for how to prescribe given a phenotype (recommendations).
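The PostgREST filter values described above can be built by small helpers instead of hand-written strings. The following is a minimal sketch: eq_filter and phenotype_filter are hypothetical helper names, not part of either API, and they assume only the column=eq.value and cs.{...} forms mentioned in this section.

```python
import json


def eq_filter(value):
    """Build a PostgREST equality filter value, e.g. 'eq.clopidogrel'."""
    return f"eq.{value}"


def phenotype_filter(phenotypes):
    """Build a jsonb containment filter value from a gene -> phenotype dict.

    Compact separators keep the JSON free of spaces, matching the
    hand-written form cs.{"CYP2C19":"Poor Metabolizer"}.
    """
    return "cs." + json.dumps(phenotypes, separators=(",", ":"))


# Query parameters for the two CPIC calls used in this guide.
drug_params = {"name": eq_filter("clopidogrel")}
rec_params = {"phenotypes": phenotype_filter({"CYP2C19": "Poor Metabolizer"})}
print(drug_params)
print(rec_params)
```

Passing these dicts as params= to requests.get leaves URL encoding to the HTTP library, which avoids quoting mistakes in the JSON fragment.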
When to Use
- Retrieving CPIC genotype-specific dosing recommendations for a gene-drug pair (e.g., CYP2C19 + clopidogrel): use CPIC
- Looking up all pharmacogenomic clinical annotations for a drug or evidence level: use ClinPGx data/clinicalAnnotation
- Finding all CPIC/DPWG guideline annotations for a pharmacogene: use ClinPGx data/guidelineAnnotation
- Resolving a gene symbol, drug name, or rsID to ClinPGx PA identifiers: use data/{gene,drug,variant}
- Free-text search across all ClinPGx record types (genes, drugs, variants, annotations): use POST /site/search
- Retrieving FDA/EMA pharmacogenomic drug label annotations: use ClinPGx data/label
- Building precision-medicine prescribing workflows that combine annotation evidence with phenotype-specific recommendations
- For germline disease pathogenicity (not PGx) use clinvar-database
- For somatic cancer pharmacogenomics use cosmic-database or opentargets-database
Prerequisites
- Python packages: requests, pandas (both already in standard environments)
- Data requirements: HGNC gene symbols, drug names (lowercase generic), dbSNP rsIDs, or PA identifiers
- Environment: internet connection; no authentication required for either host
- Rate limits: the ClinPGx host occasionally returns HTTP 429; insert time.sleep(0.3–0.5) between sequential calls. CPIC is more permissive.
If you are inside a pixi/conda environment that already provides requests and pandas, skip the install — invoke scripts with pixi run python ....
pip install requests pandas
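The rate-limit advice above can also be handled reactively by retrying when a 429 arrives. This is a sketch under stated assumptions: call_with_backoff is a hypothetical helper, the doubling pause is an illustrative choice rather than anything either host documents, and fetch is any zero-argument callable that returns an object with a status_code attribute (a requests.Response fits).

```python
import time


def call_with_backoff(fetch, retries=3, pause=0.5, sleep=time.sleep):
    """Call fetch() and retry while it reports HTTP 429.

    The pause doubles after each 429. The last response is returned
    even if it is still a 429, so callers should check it.
    """
    response = fetch()
    for _ in range(retries):
        if response.status_code != 429:
            break
        sleep(pause)   # wait before trying again
        pause *= 2     # assumed backoff policy, not documented by the host
        response = fetch()
    return response
```

A call might look like call_with_backoff(lambda: requests.get(url, params=params, timeout=15)), followed by raise_for_status() on the returned response.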
Quick Start
import requests
CLINPGX = "https://api.clinpgx.org/v1"
CPIC = "https://api.cpicpgx.org/v1"
# CPIC genotype → recommendation: clopidogrel + CYP2C19 Poor Metabolizer
drug = requests.get(f"{CPIC}/drug", params={"name": "eq.clopidogrel"}).json()[0]
recs = requests.get(f"{CPIC}/recommendation",
                    params={"drugid": f"eq.{drug['drugid']}",
                            "phenotypes": 'cs.{"CYP2C19":"Poor Metabolizer"}'}).json()
print(f"clopidogrel CYP2C19=PM: {len(recs)} recommendation(s)")
for rec in recs[:2]:
    print(f" [{rec['classification']}] {rec['drugrecommendation'][:80]}…")
# ClinPGx side: how many CPIC guideline annotations cover CYP2C19?
glines = requests.get(f"{CLINPGX}/data/guidelineAnnotation",
                      params={"relatedGenes.symbol": "CYP2C19",
                              "source": "CPIC", "view": "base"}).json()["data"]
print(f"CYP2C19 CPIC guidelines: {len(glines)}")
Core API
Module 1: Free-text site search
POST /site/search with a JSON body {"query": "<term>"} is the canonical entry point when you don't know the PA ID. It searches across drugs, genes, variants, clinical annotations, guideline annotations, and labels in one shot.
import requests
CLINPGX = "https://api.clinpgx.org/v1"
r = requests.post(f"{CLINPGX}/site/search",
                  json={"query": "rs4149056"}, timeout=15)
r.raise_for_status()
payload = r.json()["data"]
hits = payload["hits"]
print(f"Total hits: {payload['total']}")
for h in hits[:5]:
    # name may be missing or null, so fall back to "" before slicing
    print(f" id={h.get('id')} name={(h.get('name') or '')[:80]}")
# Broader concept search
r = requests.post(f"{CLINPGX}/site/search",
                  json={"query": "TPMT azathioprine"}, timeout=15)
r.raise_for_status()
hits = r.json()["data"]["hits"]
print(f"TPMT+azathioprine hits: {len(hits)}")
for h in hits[:5]:
    # str() keeps the width format spec valid even when id is None
    print(f" {str(h.get('id')):>15} {(h.get('name') or '')[:80]}")
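The hit handling above can be pulled into a reusable function that tolerates missing fields. This is a sketch only: summarize_hits is a hypothetical helper, and the response shape it assumes (data.total, data.hits, and per-hit id and name) is taken from the field names used in this module rather than from a published schema. The sample payload is hand-written with placeholder ids, not real API output.

```python
def summarize_hits(payload, limit=5, width=80):
    """Return (total, rows) from a parsed /site/search response body."""
    data = payload.get("data") or {}
    hits = data.get("hits") or []
    rows = [
        # Missing or null names become "" so slicing never fails.
        {"id": h.get("id"), "name": (h.get("name") or "")[:width]}
        for h in hits[:limit]
    ]
    return data.get("total", len(hits)), rows


# Hand-written sample in the assumed shape (placeholder ids).
sample = {"data": {"total": 2, "hits": [
    {"id": "X1", "name": "first hit"},
    {"id": "X2", "name": None},
]}}
total, rows = summarize_hits(sample)
print(total, rows)
```

With a live response, the call would be summarize_hits(r.json()) after r.raise_for_status().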
Module 2: Gene, drug, and variant record lookup
The /data/{type} endpoints accept simple property filters. All return {"data": [...], "status": "success"} — use view=base for summary, view=max for full nested objects.
import requests
CLINPGX = "https://api.clinpgx.org/v1"
# Gene by HGNC symbol
gene = requests.get(f"{CLINPGX}/data/gene",
                    params={"symbol": "CYP2D6", "view": "base"}).json()["data"][0]
print(f"{gene['symbol']} id={gene['id']} {gene['name']}")
# Drug by name (lowercase generic preferred)
drug = requests.get(f"{CLINPGX}/data/drug",
                    params={"name": "warfarin", "view": "base"}).json()["data"][0]
print(f"{drug['name']} id={drug['id']}")
# Variant by rsID
var = requests.get(f"{CLINPGX}/data/variant",
                   params={"name": "rs4149056", "view": "base"}).json()["data"][0]
print(f"{var['name']} id={var['id']} significance={var.get('clinicalSignificance')}")
# Direct record fetch when you already have a PA ID
r = requests.get(f"{CLINPGX}/data/drug/PA449088", params={"view": "max"}).json()
d = r["data"]
print(f"PA449088 → {d['name']} (objCls={d['objCls']})")
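The lookups above index ["data"][0] directly, which raises on an empty result or an unexpected body. A defensive variant could look like the sketch below. unwrap and first_or_none are hypothetical helpers; the only format assumed is the {"data": ..., "status": "success"} envelope described in this guide, and what a non-success body contains is not known here, so it is simply reported back in the error.

```python
def unwrap(payload):
    """Return the 'data' member of a ClinPGx response, or raise ValueError."""
    if not isinstance(payload, dict) or payload.get("status") != "success":
        # The failure format is an unknown; echo a truncated repr for debugging.
        raise ValueError(f"unexpected ClinPGx response: {payload!r}"[:200])
    return payload.get("data")


def first_or_none(records):
    """Return the first record of a list result, or None when it is empty."""
    return records[0] if records else None


# Hand-written envelope in the documented shape (placeholder record).
ok = {"data": [{"symbol": "CYP2D6"}], "status": "success"}
print(first_or_none(unwrap(ok)))
```

In the examples above this would replace .json()["data"][0] with first_or_none(unwrap(resp.json())), leaving the caller to handle None.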
Module 3: Clinical annotations
data/clinicalAnnotation records associate a variant (location) with one or more drugs (relatedChemicals) and an evidence level (levelOfEvidence.term). The two supported filters are relatedChemicals.name= and levelOfEvidence.term=. There is no working gene= filter on this endpoint — see Module 4 for gene-driven access.
import requests, pandas as pd
CLINPGX = "https://api.clinpgx.org/v1"
# All clinical annotations for clopidogrel
data = requests.get(f"{CLINPGX}/data/clinicalAnnotation",
                    params={"relatedChemicals.name": "clopidogrel",
                            "view": "base"}).json()["data"]
print(f"clopidogrel annotations: {len(data)}")
rows = []
for ann in data[:10]:
    loc = ann.get("location") or {}
    # "or []" also covers an explicit null in the response
    drugs = ", ".join(c.get("name", "") for c in ann.get("relatedChemicals") or [])
    rows.append({
        "id": ann["id"],
        "variant": loc.get("displayName"),
        "gene": (loc.get("genes") or [{}])[0].get("symbol"),
        "drug": drugs,
        "level": (ann.get("levelOfEvidence") or {}).get("term"),
        "score": ann.get("score"),
    })
print(pd.DataFrame(rows).to_string(index=False))
# All Level 1A clinical annotations (highest evidence)
data = requests.get(f"{CLINPGX}/data/clinicalAnnotation",
                    params={"levelOfEvidence.term": "1A",
                            "view": "base"}).json()["data"]
print(f"Level 1A annotations: {len(data)}")
# Count annotations per drug
drug_to_count = {}
for ann in data:
    for c in ann.get("relatedChemicals") or []:
        drug_to_count[c["name"]] = drug_to_count.get(c["name"], 0) + 1
top = sorted(drug_to_count.items(), key=lambda x: -x[1])[:10]
for d, n in top:
    print(f" {n:3} {d}")
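Since this endpoint has no working gene= filter, one option is to fetch by drug or evidence level and narrow the results client-side. The sketch below assumes the location.genes[].symbol path used in the row-building code above; annotations_for_gene is a hypothetical helper, and the sample records are hand-written placeholders rather than real annotations.

```python
def annotations_for_gene(annotations, symbol):
    """Keep annotations whose location lists the given gene symbol."""
    wanted = symbol.upper()
    kept = []
    for ann in annotations:
        genes = (ann.get("location") or {}).get("genes") or []
        if any((g.get("symbol") or "").upper() == wanted for g in genes):
            kept.append(ann)
    return kept


# Hand-written records in the assumed shape (placeholder ids).
sample = [
    {"id": "A1", "location": {"genes": [{"symbol": "CYP2C19"}]}},
    {"id": "A2", "location": {"genes": [{"symbol": "ABCB1"}]}},
    {"id": "A3", "location": None},
]
print([a["id"] for a in annotations_for_gene(sample, "cyp2c19")])
```

This filters only what has already been downloaded, so it complements rather than replaces the gene-driven guideline access in Module 4.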
Module 4: Guideline annotations (gene-driven access)
data/guidelineAnnotation supports both relatedGenes.symbol= and relatedChemicals.name=, plus source= (CPIC, DPWG, CPNDS, RNPGx). This is the canonical way to get gene→guideline coverage.
import requests
CLINPGX = "https://api.clinpgx.org/v1"
# All CPIC guidelines mentioning CYP2C19
data = requests.get(f"{CLINPGX}/data/guidelineAnnotation",
params={"relatedGenes.symbol": "CYP2C19",
"source": "CPIC",