KEGG Database — Biological Pathway & Molecular Network Queries
Overview
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a bioinformatics resource for biological pathway analysis, molecular interaction networks, and cross-database ID conversion. Access is via a REST API that requires no authentication: every operation is a plain HTTP GET request. Most operations (list, find, conv, link, ddi) return tab-delimited text, while get returns flat-file entries or the requested format (FASTA, MOL, PNG, KGML).
When to Use
- Mapping genes to biological pathways (e.g., "which pathways involve TP53?")
- Retrieving metabolic pathway details, gene lists, or compound structures
- Converting identifiers between KEGG, NCBI Gene, UniProt, and PubChem
- Checking drug-drug interactions from KEGG's pharmacological database
- Building pathway enrichment context (all genes per pathway for an organism)
- Cross-referencing compounds, reactions, enzymes, and pathways
- For Python-native multi-database queries (KEGG + UniProt + Ensembl in one script), prefer bioservices instead
- For pathway visualization, use KEGG Mapper (https://www.kegg.jp/kegg/mapper/) directly
Prerequisites
pip install requests
API constraints:
- Academic use only — commercial use requires a separate KEGG license
- Max 10 entries per get/list/conv/link/ddi call (image/kgml/json: 1 entry only)
- No explicit rate limit, but add time.sleep(0.5) between batch requests to avoid server-side throttling
- Base URL: https://rest.kegg.jp/
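The 10-entry limit means longer ID lists have to be split before they are sent. A minimal sketch of that batching step follows; `batch_ids` is a hypothetical helper name, not part of any KEGG client, and the gene IDs are placeholders.

```python
def batch_ids(ids, size=10):
    """Split ids into '+'-joined query strings of at most `size` entries each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    ids = list(ids)
    return ["+".join(ids[i:i + size]) for i in range(0, len(ids), size)]


# 23 placeholder gene IDs (hsa:1 ... hsa:23), used only to show the split
gene_ids = [f"hsa:{n}" for n in range(1, 24)]
batches = batch_ids(gene_ids)

print(len(batches))   # 3
print(batches[-1])    # hsa:21+hsa:22+hsa:23
```

Each string in `batches` can be appended to a `get`, `list`, `conv`, `link`, or `ddi` URL, with a `time.sleep(0.5)` between requests.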
Quick Start
import requests
import time
BASE = "https://rest.kegg.jp"
def kegg_get(operation, *args):
    """Generic KEGG REST API caller."""
    url = f"{BASE}/{operation}/{'/'.join(args)}"
    resp = requests.get(url)
    resp.raise_for_status()
    return resp.text
# Find pathways linked to human gene TP53
pathways = kegg_get("link", "pathway", "hsa:7157")
print(pathways[:200])
# hsa:7157 path:hsa04010
# hsa:7157 path:hsa04110
# ...
# Get pathway details
detail = kegg_get("get", "hsa04110")
print(detail[:300])
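`raise_for_status()` treats every non-200 response as an error. The KEGG API documentation lists 200 (success), 400 (bad request) and 404 (not found) as status codes, so a caller may prefer to treat "not found" as an empty result. The sketch below shows one way to do that; `rows_from_response` is a hypothetical helper, and mapping 404 to an empty list is a design assumption rather than KEGG-prescribed behavior.

```python
def rows_from_response(status_code, text):
    """Turn a KEGG REST response into a list of non-empty lines.

    Assumption: HTTP 404 means "no matching entry" and yields an empty list;
    any other non-200 status raises RuntimeError.
    """
    if status_code == 404:
        return []
    if status_code != 200:
        raise RuntimeError(f"KEGG request failed with HTTP {status_code}")
    return [line for line in text.splitlines() if line.strip()]


# Sample body in the shape shown in the Quick Start comments
body = "hsa:7157\tpath:hsa04010\nhsa:7157\tpath:hsa04110\n"
print(len(rows_from_response(200, body)))  # 2
print(rows_from_response(404, ""))         # []
```

With requests, the two arguments would come from `resp.status_code` and `resp.text`.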
Core API
1. Database Information — kegg_info
Retrieve metadata and statistics about KEGG databases.
import requests
BASE = "https://rest.kegg.jp"
# Database-level info
info = requests.get(f"{BASE}/info/pathway").text
print(info[:200])
# pathway Pathway
# Release 112.0, Dec 2025
# Kanehisa Laboratories
# ...
# Organism-level info
hsa_info = requests.get(f"{BASE}/info/hsa").text
print(hsa_info[:200])
Common databases: kegg, pathway, module, brite, genes, genome, compound, glycan, reaction, enzyme, disease, drug
2. Listing Entries — kegg_list
List entry identifiers and names from any KEGG database.
import requests
BASE = "https://rest.kegg.jp"
# All human pathways
hsa_pathways = requests.get(f"{BASE}/list/pathway/hsa").text
for line in hsa_pathways.strip().split("\n")[:5]:
    pathway_id, name = line.split("\t")
    print(f"{pathway_id}: {name}")
# path:hsa00010: Glycolysis / Gluconeogenesis - Homo sapiens (human)
# ...
# Specific entries (max 10, joined with +)
genes = requests.get(f"{BASE}/list/hsa:10458+hsa:10459").text
print(genes)
Common organism codes: hsa (human), mmu (mouse), dme (fruit fly), sce (yeast), eco (E. coli)
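For repeated lookups it can be convenient to load list output into a dictionary. The following is a small sketch under the assumption that each line holds an ID, a tab, and a name; `parse_list` and its `strip_organism` flag are hypothetical names introduced here.

```python
def parse_list(text, strip_organism=False):
    """Parse tab-delimited list output into {entry_id: name}.

    With strip_organism=True, a trailing " - <organism>" suffix is removed,
    which suits organism-specific pathway names.
    """
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        entry_id, _, name = line.partition("\t")
        if strip_organism and " - " in name:
            name = name.rsplit(" - ", 1)[0]
        entries[entry_id] = name
    return entries


# Sample line taken from the example output above
sample = "path:hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
print(parse_list(sample, strip_organism=True))
# {'path:hsa00010': 'Glycolysis / Gluconeogenesis'}
```

Leave `strip_organism` off for gene lists, where " - " may be part of the description.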
3. Keyword Search — kegg_find
Search databases by keywords or molecular properties.
import requests
import time
BASE = "https://rest.kegg.jp"
# Keyword search in genes
results = requests.get(f"{BASE}/find/genes/p53").text
print(f"Found {len(results.strip().split(chr(10)))} entries")
time.sleep(0.5)
# Chemical formula search (partial match, atom order ignored)
compounds = requests.get(f"{BASE}/find/compound/C7H10N4O2/formula").text
print(compounds[:200])
time.sleep(0.5)
# Exact mass range search
drugs = requests.get(f"{BASE}/find/drug/300-310/exact_mass").text
print(drugs[:200])
Search options: append /formula (partial match, atom order ignored), /exact_mass, or /mol_weight (each accepts a single value or a min-max range) to compound/drug queries.
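Because the option suffix is only valid for compound and drug searches, it can help to validate it before building the URL. This is a sketch only: `build_find_url` and `FIND_OPTIONS` are hypothetical names, and the option set is limited to the three options named above.

```python
from urllib.parse import quote

BASE = "https://rest.kegg.jp"
FIND_OPTIONS = {"formula", "exact_mass", "mol_weight"}  # options named in this section


def build_find_url(database, query, option=None):
    """Build a /find URL, rejecting options that do not apply to the database."""
    if option is not None:
        if option not in FIND_OPTIONS:
            raise ValueError(f"unknown find option: {option}")
        if database not in {"compound", "drug"}:
            raise ValueError("find options apply to compound/drug queries only")
    # Percent-encode the query but keep '+' so joined keywords survive
    url = f"{BASE}/find/{database}/{quote(query, safe='+')}"
    return f"{url}/{option}" if option else url


print(build_find_url("genes", "p53"))
# https://rest.kegg.jp/find/genes/p53
print(build_find_url("drug", "300-310", "exact_mass"))
# https://rest.kegg.jp/find/drug/300-310/exact_mass
```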
4. Entry Retrieval — kegg_get
Retrieve complete database entries or specific data formats.
import requests
import time
BASE = "https://rest.kegg.jp"
# Full pathway entry (text format)
pathway = requests.get(f"{BASE}/get/hsa00010").text
print(pathway[:500])
time.sleep(0.5)
# Multiple entries (max 10, joined with +)
genes = requests.get(f"{BASE}/get/hsa:10458+hsa:10459").text
# Protein sequence (FASTA)
fasta = requests.get(f"{BASE}/get/hsa:10458/aaseq").text
print(fasta[:200])
time.sleep(0.5)
# Compound structure (MOL format)
mol = requests.get(f"{BASE}/get/cpd:C00002/mol").text # ATP
# Pathway image (PNG, single entry only)
img_resp = requests.get(f"{BASE}/get/hsa05130/image")
with open("pathway.png", "wb") as f:
    f.write(img_resp.content)
print(f"Saved pathway image: {len(img_resp.content)} bytes")
Output formats: aaseq (protein FASTA), ntseq (nucleotide FASTA), mol (MOL), kcf (KCF), image (PNG), kgml (XML), json (BRITE hierarchy JSON). Image/KGML/JSON accept one entry only.
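The aaseq and ntseq formats return FASTA text, which usually needs to be split into header and sequence before further use. A minimal parser sketch follows; `parse_fasta` is a hypothetical helper and the sample records are made-up placeholders, not real KEGG entries.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder records for illustration only
sample = ">entry1 first placeholder\nMKT\nAYI\n>entry2 second placeholder\nGGS\n"
for header, seq in parse_fasta(sample):
    print(header, len(seq))
# entry1 first placeholder 6
# entry2 second placeholder 3
```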
5. ID Conversion — kegg_conv
Convert identifiers between KEGG and external databases.
import requests
import time
BASE = "https://rest.kegg.jp"
# KEGG gene → NCBI Gene ID (specific gene)
ncbi = requests.get(f"{BASE}/conv/ncbi-geneid/hsa:10458").text
print(ncbi.strip())
# hsa:10458 ncbi-geneid:10458
time.sleep(0.5)
# KEGG gene → UniProt
uniprot = requests.get(f"{BASE}/conv/uniprot/hsa:10458").text
print(uniprot.strip())
time.sleep(0.5)
# Bulk conversion: all human genes → NCBI Gene IDs
all_conv = requests.get(f"{BASE}/conv/ncbi-geneid/hsa").text
lines = all_conv.strip().split("\n")
print(f"Total conversions: {len(lines)}")
# Reverse: NCBI Gene ID → KEGG
reverse = requests.get(f"{BASE}/conv/hsa/ncbi-geneid:7157").text
print(reverse.strip()) # TP53
Supported external databases: ncbi-geneid, ncbi-proteinid, uniprot, pubchem, chebi
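Conversion output is a two-column table, and one source ID can map to several targets. The sketch below collects the pairs into a dictionary of lists; `parse_conv` is a hypothetical helper, the first sample line matches the example output above, and the `extdb:` lines are invented placeholders to show the one-to-many case.

```python
from collections import defaultdict


def parse_conv(text, strip_prefix=False):
    """Parse two-column conv output into {source_id: [target_id, ...]}."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        source, target = parts
        if strip_prefix:
            target = target.split(":", 1)[-1]
        mapping[source].append(target)
    return dict(mapping)


sample = "hsa:10458\tncbi-geneid:10458\n"
print(parse_conv(sample, strip_prefix=True))
# {'hsa:10458': ['10458']}

# Placeholder lines showing one source with two targets
multi = "hsa:10458\textdb:A\nhsa:10458\textdb:B\n"
print(parse_conv(multi))
# {'hsa:10458': ['extdb:A', 'extdb:B']}
```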
6. Cross-Referencing — kegg_link
Find related entries within and between KEGG databases.
import requests
import time
BASE = "https://rest.kegg.jp"
# Genes in glycolysis pathway
genes = requests.get(f"{BASE}/link/genes/hsa00010").text
gene_list = [line.split("\t")[1] for line in genes.strip().split("\n") if line]
print(f"Glycolysis genes: {len(gene_list)}")
time.sleep(0.5)
# Pathways containing a specific gene
pathways = requests.get(f"{BASE}/link/pathway/hsa:7157").text # TP53
print(pathways[:300])
time.sleep(0.5)
# Compounds in a pathway
compounds = requests.get(f"{BASE}/link/compound/hsa00010").text
print(f"Compounds in glycolysis: {len(compounds.strip().split(chr(10)))}")
# Map genes to KO (orthology) groups
ko = requests.get(f"{BASE}/link/ko/hsa:10458").text
print(ko.strip())
Common links: genes ↔ pathway, pathway ↔ compound, pathway ↔ enzyme, genes ↔ ko (orthology)
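For enrichment-style work, gene-pathway pairs are typically regrouped into one gene set per pathway. The following sketch does that for link output in either column order, assuming pathway IDs carry the `path:` prefix as in the examples above; `pathway_gene_sets` is a hypothetical helper and the sample pairs are illustrative.

```python
from collections import defaultdict


def pathway_gene_sets(text):
    """Group two-column link output into {pathway_id: set(gene_ids)}."""
    gene_sets = defaultdict(set)
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue
        left, right = parts
        # The pathway may be in either column depending on the query direction
        if left.startswith("path:"):
            pathway, gene = left, right
        elif right.startswith("path:"):
            pathway, gene = right, left
        else:
            continue
        gene_sets[pathway].add(gene)
    return dict(gene_sets)


sample = (
    "hsa:7157\tpath:hsa04010\n"
    "hsa:7157\tpath:hsa04110\n"
    "hsa:1026\tpath:hsa04110\n"
)
for pathway, genes in sorted(pathway_gene_sets(sample).items()):
    print(pathway, sorted(genes))
# path:hsa04010 ['hsa:7157']
# path:hsa04110 ['hsa:1026', 'hsa:7157']
```

Feeding it the text of a single `/link/pathway/hsa` request would give gene sets for all human pathways at once.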
7. Drug-Drug Interactions — kegg_ddi
Check pharmacological interactions between drugs.
import requests
BASE = "https://rest.kegg.jp"
# Single drug — all known interactions
interactions = requests.get(f"{BASE}/ddi/D00001").text
print(f"Interactions: {len(interactions.strip().split(chr(10)))}")
# Pairwise check (max 10 drugs, joined with +)
pair = requests.get(f"{BASE}/ddi/D00001+D00002+D00003").text
print(pair[:300])
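Interaction output may list the same pair in both directions, so deduplicating helps when counting. The sketch below assumes only that each line is tab-delimited with the two interacting entries in the first two columns; any further columns are ignored. `interaction_pairs` is a hypothetical helper and the drug IDs are placeholders, not real KEGG entries.

```python
def interaction_pairs(text):
    """Return sorted unique unordered pairs from the first two columns of each line."""
    pairs = set()
    for line in text.splitlines():
        cols = line.split("\t")
        if len(cols) < 2 or not cols[0] or not cols[1]:
            continue
        pairs.add(tuple(sorted(cols[:2])))
    return sorted(pairs)


# Placeholder rows; the trailing columns are assumptions about the layout
sample = (
    "dr:D90001\tdr:D90002\tP\tplaceholder\n"
    "dr:D90002\tdr:D90001\tP\tplaceholder\n"
    "dr:D90001\tdr:D90003\tCI\tplaceholder\n"
)
print(interaction_pairs(sample))
# [('dr:D90001', 'dr:D90002'), ('dr:D90001', 'dr:D90003')]
```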
Key Concepts
Identifier Formats
| Type | Format | Example |
|---|---|---|
| Reference pathway | map##### | map00010 (Glycolysis, generic) |
| Organism pathway | {org}##### | hsa00010 (Glycolysis, human) |
| Gene | {org}:{number} | hsa:7157 (TP53) |
| Compound | cpd:C##### | cpd:C00002 (ATP) |
| Drug | dr:D##### | dr:D00001 |
| Enzyme | ec:{EC_number} | ec:1.1.1.1 |
| KO (orthology) | ko:K##### | ko:K00001 |
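The formats in the table can be checked with a few regular expressions before an ID is sent to the API. This is a rough sketch: `classify_kegg_id` is a hypothetical helper, the patterns cover only the rows above, and the assumption that organism codes are 3 to 4 lowercase letters may not hold for every organism.

```python
import re

# Order matters: specific prefixes are tested before the generic organism patterns
_PATTERNS = [
    ("reference pathway", re.compile(r"^map\d{5}$")),
    ("compound", re.compile(r"^cpd:C\d{5}$")),
    ("drug", re.compile(r"^dr:D\d{5}$")),
    ("enzyme", re.compile(r"^ec:\d+(\.(\d+|-)){3}$")),
    ("ko", re.compile(r"^ko:K\d{5}$")),
    ("organism pathway", re.compile(r"^[a-z]{3,4}\d{5}$")),
    ("gene", re.compile(r"^[a-z]{3,4}:\S+$")),
]


def classify_kegg_id(identifier):
    """Return the identifier type from the table above, or None if unrecognized."""
    for kind, pattern in _PATTERNS:
        if pattern.match(identifier):
            return kind
    return None


for example in ["map00010", "hsa00010", "hsa:7157", "cpd:C00002", "ec:1.1.1.1"]:
    print(example, "->", classify_kegg_id(example))
# map00010 -> reference pathway
# hsa00010 -> organism pathway
# hsa:7157 -> gene
# cpd:C00002 -> compound
# ec:1.1.1.1 -> enzyme
```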
Pathway Categories
KEGG organizes pathways into seven major categories:
- Metabolism: map00xxx (Glycolysis, TCA cycle, amino acid metabolism)
- Genetic Information Processing: map030xx (Ribosome, Spliceosome, DN