ClinPGx (PharmGKB) Pharmacogenomics Database

Overview

PharmGKB rebranded as ClinPGx in 2024 and the API moved from api.pharmgkb.org to api.clinpgx.org. The old host now returns 404/405; every example here uses the new endpoints. Two complementary APIs are used together:

ClinPGx Data API (api.clinpgx.org/v1) — record-style access to genes, drugs, variants, clinical annotations, guideline annotations, drug labels, and pathways. Responses wrap data as {"data": [...], "status": "success"}. Filters use dotted property paths (e.g. relatedChemicals.name=clopidogrel, levelOfEvidence.term=1A).
CPIC PostgREST API (api.cpicpgx.org/v1) — relational lookup of genotype → drug recommendation rows. PostgREST filter syntax (column=eq.value, JSON cs.{...} for jsonb containment). Returns flat JSON arrays.

Use ClinPGx for what is known about a gene, drug, or variant (annotations); use CPIC for how to prescribe given a phenotype (recommendations).
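The two filter dialects described above are easy to mix up, so here is a minimal sketch of helpers that build each kind of query and unwrap each kind of response. The function names (`cpic_filter`, `unwrap`) are hypothetical and belong to neither API; the only things taken from this document are the `eq.` and `cs.{...}` operators and the `{"data": ..., "status": "success"}` envelope.

```python
import json


def cpic_filter(equals=None, contains=None):
    """Build PostgREST query params (hypothetical helper).

    equals:   {column: value}  -> column=eq.value
    contains: {column: object} -> column=cs.{...} (jsonb containment)
    """
    params = {}
    for column, value in (equals or {}).items():
        params[column] = f"eq.{value}"
    for column, obj in (contains or {}).items():
        # Compact separators reproduce the hand-written form used in Quick Start.
        params[column] = "cs." + json.dumps(obj, separators=(",", ":"))
    return params


def unwrap(payload):
    """Return the data payload from either API's parsed JSON body."""
    if isinstance(payload, dict):
        # ClinPGx wraps results in an envelope with a status field.
        if payload.get("status") != "success":
            raise ValueError(f"ClinPGx status: {payload.get('status')!r}")
        return payload["data"]
    # CPIC (PostgREST) already returns a flat JSON array.
    return payload


# Same filter as the Quick Start recommendation query.
print(cpic_filter(contains={"phenotypes": {"CYP2C19": "Poor Metabolizer"}}))
```

ClinPGx filters need no such helper: a plain dict of dotted property paths, such as `{"relatedChemicals.name": "clopidogrel", "view": "base"}`, can be passed straight to `requests.get(..., params=...)`.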

When to Use

Retrieving CPIC genotype-specific dosing recommendations for a gene-drug pair (e.g., CYP2C19 + clopidogrel) — use CPIC
Looking up all pharmacogenomic clinical annotations for a drug or evidence level — use ClinPGx data/clinicalAnnotation
Finding all CPIC/DPWG guideline annotations for a pharmacogene — use ClinPGx data/guidelineAnnotation
Resolving a gene symbol, drug name, or rsID to ClinPGx PA identifiers — use data/{gene,drug,variant}
Free-text search across all ClinPGx record types (genes, drugs, variants, annotations) — use POST /site/search
Retrieving FDA/EMA pharmacogenomic drug label annotations — use ClinPGx data/label
Building precision-medicine prescribing workflows that combine annotation evidence with phenotype-specific recommendations
For germline disease pathogenicity (not PGx) use clinvar-database
For somatic cancer pharmacogenomics use cosmic-database or opentargets-database

Prerequisites

Python packages: requests and pandas (both are preinstalled in most data-science environments)
Data requirements: HGNC gene symbols, drug names (lowercase generic), dbSNP rsIDs, or PA identifiers
Environment: internet connection; no authentication required for either host
Rate limits: the ClinPGx host occasionally returns HTTP 429; pause 0.3–0.5 s between sequential calls (for example, time.sleep(0.5)). CPIC is more permissive.

If you are inside a pixi/conda environment that already provides requests and pandas, skip the install — invoke scripts with pixi run python ....

pip install requests pandas
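For the occasional HTTP 429 mentioned under rate limits, a fixed sleep between calls is usually enough. When it is not, one option is to retry with a growing delay. The sketch below is an assumption, not something either API prescribes: `get_with_backoff`, the doubling schedule, and the default of four attempts are all hypothetical. It takes any zero-argument callable that returns an object with a `status_code` attribute, so it works with `requests` without importing it.

```python
import time


def get_with_backoff(fetch, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until the response is not HTTP 429 or attempts run out.

    fetch: zero-argument callable returning an object with .status_code
    sleep: injectable so the delay schedule can be inspected without waiting
    """
    response = None
    for attempt in range(max_attempts):
        response = fetch()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Delays grow 0.5 s, 1 s, 2 s, ... with the default base_delay.
            sleep(base_delay * 2 ** attempt)
    # Still rate-limited after the last attempt; let the caller decide.
    return response
```

With `requests`, a call would look like `get_with_backoff(lambda: requests.get(url, params=params, timeout=15))`, followed by `raise_for_status()` on the result.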

Quick Start

import requests

CLINPGX = "https://api.clinpgx.org/v1"
CPIC    = "https://api.cpicpgx.org/v1"

# CPIC genotype → recommendation: clopidogrel + CYP2C19 Poor Metabolizer
drug = requests.get(f"{CPIC}/drug", params={"name": "eq.clopidogrel"}).json()[0]
recs = requests.get(f"{CPIC}/recommendation",
                    params={"drugid": f"eq.{drug['drugid']}",
                            "phenotypes": 'cs.{"CYP2C19":"Poor Metabolizer"}'}).json()
print(f"clopidogrel CYP2C19=PM: {len(recs)} recommendation(s)")
for rec in recs[:2]:
    print(f"  [{rec['classification']}] {rec['drugrecommendation'][:80]}…")

# ClinPGx side: how many CPIC guideline annotations cover CYP2C19?
glines = requests.get(f"{CLINPGX}/data/guidelineAnnotation",
                      params={"relatedGenes.symbol": "CYP2C19",
                              "source": "CPIC", "view": "base"}).json()["data"]
print(f"CYP2C19 CPIC guidelines: {len(glines)}")

Core API

Module 1: Free-text site search

POST /site/search with a JSON body {"query": "<term>"} is the canonical entry point when you don't know the PA ID. It searches across drugs, genes, variants, clinical annotations, guideline annotations, and labels in one shot.

import requests

CLINPGX = "https://api.clinpgx.org/v1"

r = requests.post(f"{CLINPGX}/site/search",
                  json={"query": "rs4149056"}, timeout=15)
r.raise_for_status()
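result = r.json()["data"]          # parse the body once
hits = result["hits"]
print(f"Total hits: {result['total']}")
for h in hits[:5]:
    # name may be missing or null, so fall back to "" before slicing
    print(f"  id={h.get('id')}  name={(h.get('name') or '')[:80]}")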

# Broader concept search
r = requests.post(f"{CLINPGX}/site/search",
                  json={"query": "TPMT azathioprine"}, timeout=15)
r.raise_for_status()
hits = r.json()["data"]["hits"]
print(f"TPMT+azathioprine hits: {len(hits)}")
for h in hits[:5]:
    # str() and "or ''" keep the formatting from failing on missing fields
    print(f"  {str(h.get('id')):>15}  {(h.get('name') or '')[:80]}")

Module 2: Gene, drug, and variant record lookup

The /data/{type} endpoints accept simple property filters. All return {"data": [...], "status": "success"} — use view=base for summary, view=max for full nested objects.

import requests

CLINPGX = "https://api.clinpgx.org/v1"

# Gene by HGNC symbol
gene = requests.get(f"{CLINPGX}/data/gene",
                    params={"symbol": "CYP2D6", "view": "base"}).json()["data"][0]
print(f"{gene['symbol']}  id={gene['id']}  {gene['name']}")

# Drug by name (lowercase generic preferred)
drug = requests.get(f"{CLINPGX}/data/drug",
                    params={"name": "warfarin", "view": "base"}).json()["data"][0]
print(f"{drug['name']}  id={drug['id']}")

# Variant by rsID
var = requests.get(f"{CLINPGX}/data/variant",
                   params={"name": "rs4149056", "view": "base"}).json()["data"][0]
print(f"{var['name']}  id={var['id']}  significance={var.get('clinicalSignificance')}")

# Direct record fetch when you already have a PA ID
r = requests.get(f"{CLINPGX}/data/drug/PA449088", params={"view": "max"}).json()
d = r["data"]
print(f"PA449088 → {d['name']}  (objCls={d['objCls']})")
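The lookups above index `["data"][0]` directly, which raises `IndexError` when a symbol, drug name, or rsID matches nothing. A minimal sketch of a more forgiving resolver follows. The helper names `first_record` and `pa_id` are hypothetical, and the sample payloads use a placeholder identifier rather than a real PA ID. It assumes only what this module shows: list endpoints return a list under `data`, and a direct `/data/{type}/{id}` fetch returns a single object.

```python
def first_record(payload):
    """Return the first record from a parsed ClinPGx response, or None."""
    data = payload.get("data")
    if isinstance(data, dict):
        # Direct fetch by PA ID: data is already the single record.
        return data
    return data[0] if data else None


def pa_id(payload):
    """Return the record's id field, or None when the lookup found nothing."""
    record = first_record(payload)
    return record.get("id") if record else None


# Placeholder payloads shaped like the responses described above.
found = {"data": [{"id": "PA_EXAMPLE", "symbol": "CYP2D6"}], "status": "success"}
empty = {"data": [], "status": "success"}
print(pa_id(found), pa_id(empty))
```

Returning `None` instead of raising lets a batch resolver log unmatched inputs and keep going.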

Module 3: Clinical annotations

data/clinicalAnnotation records associate a variant (location) with one or more drugs (relatedChemicals) and an evidence level (levelOfEvidence.term). The two supported filters are relatedChemicals.name= and levelOfEvidence.term=. There is no working gene= filter on this endpoint — see Module 4 for gene-driven access.

import requests, pandas as pd

CLINPGX = "https://api.clinpgx.org/v1"

# All clinical annotations for clopidogrel
data = requests.get(f"{CLINPGX}/data/clinicalAnnotation",
                    params={"relatedChemicals.name": "clopidogrel",
                            "view": "base"}).json()["data"]
print(f"clopidogrel annotations: {len(data)}")

rows = []
for ann in data[:10]:
    loc = ann.get("location") or {}
    drugs = ", ".join(c.get("name", "") for c in ann.get("relatedChemicals", []))
    rows.append({
        "id": ann["id"],
        "variant": loc.get("displayName"),
        "gene": (loc.get("genes") or [{}])[0].get("symbol"),
        "drug": drugs,
        "level": (ann.get("levelOfEvidence") or {}).get("term"),
        "score": ann.get("score"),
    })
print(pd.DataFrame(rows).to_string(index=False))

# All Level 1A clinical annotations (highest evidence)
data = requests.get(f"{CLINPGX}/data/clinicalAnnotation",
                    params={"levelOfEvidence.term": "1A",
                            "view": "base"}).json()["data"]
print(f"Level 1A annotations: {len(data)}")

drug_to_count = {}
for ann in data:
    for c in ann.get("relatedChemicals") or []:
        drug_to_count[c["name"]] = drug_to_count.get(c["name"], 0) + 1
top = sorted(drug_to_count.items(), key=lambda x: -x[1])[:10]
for d, n in top:
    print(f"  {n:3}  {d}")
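Because this endpoint has no working gene filter, one workaround is to fetch annotations by drug or evidence level and narrow them by gene on the client. The sketch below assumes the record shape that the loop above already reads (`location.genes[].symbol`); `annotations_for_gene` is a hypothetical helper, and the sample records are invented placeholders, not real annotations.

```python
def annotations_for_gene(annotations, symbol):
    """Keep annotations whose location lists a gene with the given symbol."""
    matched = []
    for ann in annotations:
        genes = (ann.get("location") or {}).get("genes") or []
        if any(g.get("symbol") == symbol for g in genes):
            matched.append(ann)
    return matched


# Placeholder records shaped like data/clinicalAnnotation items.
sample = [
    {"id": 1, "location": {"genes": [{"symbol": "CYP2C19"}]}},
    {"id": 2, "location": {"genes": [{"symbol": "ABCB1"}]}},
    {"id": 3, "location": None},
]
print([a["id"] for a in annotations_for_gene(sample, "CYP2C19")])
```

This filters only what was already downloaded, so it cannot find annotations for a gene outside the drug or evidence-level query that produced the list.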

Module 4: Guideline annotations (gene-driven access)

data/guidelineAnnotation supports both relatedGenes.symbol= and relatedChemicals.name=, plus source= (CPIC, DPWG, CPNDS, RNPGx). This is the canonical way to get gene→guideline coverage.

import requests

CLINPGX = "https://api.clinpgx.org/v1"

# All CPIC guidelines mentioning CYP2C19
data = requests.get(f"{CLINPGX}/data/guidelineAnnotation",
                    params={"relatedGenes.symbol": "CYP2C19",
                            "source": "CPIC",
                            "view": "base"}).json()["data"]  # same call as in Quick Start
