GWAS Catalog Database
Overview
The GWAS Catalog is a comprehensive repository of published genome-wide association studies maintained by the National Human Genome Research Institute (NHGRI) and the European Bioinformatics Institute (EBI). The catalog contains curated SNP-trait associations from thousands of GWAS publications, including genetic variants, associated traits and diseases, p-values, effect sizes, and full summary statistics for many studies.
When to Use This Skill
This skill should be used when queries involve:
- Genetic variant associations: Finding SNPs associated with diseases or traits
- SNP lookups: Retrieving information about specific genetic variants (rs IDs)
- Trait/disease searches: Discovering genetic associations for phenotypes
- Gene associations: Finding variants in or near specific genes
- GWAS summary statistics: Accessing complete genome-wide association data
- Study metadata: Retrieving publication and cohort information
- Population genetics: Exploring ancestry-specific associations
- Polygenic risk scores: Identifying variants for risk prediction models
- Functional genomics: Understanding variant effects and genomic context
- Systematic reviews: Comprehensive literature synthesis of genetic associations
Core Capabilities
1. Understanding GWAS Catalog Data Structure
The GWAS Catalog is organized around four core entities:
- Studies: GWAS publications with metadata (PMID, author, cohort details)
- Associations: SNP-trait associations with statistical evidence (the Catalog includes associations with p < 1×10⁻⁵; genome-wide significance is conventionally p ≤ 5×10⁻⁸)
- Variants: Genetic markers (SNPs) with genomic coordinates and alleles
- Traits: Phenotypes and diseases (mapped to EFO ontology terms)
Key Identifiers:
- Study accessions: GCST IDs (e.g., GCST001234)
- Variant IDs: rs numbers (e.g., rs7903146) or variant_id format
- Trait IDs: EFO terms (e.g., EFO_0001360 for type 2 diabetes)
- Gene symbols: HGNC approved names (e.g., TCF7L2)
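The identifier formats listed above are regular enough to recognise with simple patterns. The following is a minimal sketch rather than part of any GWAS Catalog tooling: the function name `classify_identifier` is hypothetical, and the regular expressions are assumptions inferred only from the example IDs in this list.

```python
import re

# Patterns inferred from the example identifiers above (assumptions, not an
# official specification of GWAS Catalog ID formats).
_ID_PATTERNS = [
    ("study", re.compile(r"GCST\d+")),
    ("variant", re.compile(r"rs\d+")),
    ("trait", re.compile(r"EFO_\d+")),
]


def classify_identifier(value: str) -> str:
    """Return 'study', 'variant', 'trait', or 'unknown' for an identifier."""
    text = value.strip()
    for kind, pattern in _ID_PATTERNS:
        if pattern.fullmatch(text):
            return kind
    # Gene symbols such as TCF7L2 have no fixed shape, so they end up here.
    return "unknown"
```

A caller could use the result to decide which endpoint to query, treating "unknown" input as a possible gene symbol or free-text trait name.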
2. Web Interface Searches
The web interface at https://www.ebi.ac.uk/gwas/ supports multiple search modes:
By Variant (rs ID):
rs7903146
Returns all trait associations for this SNP.
By Disease/Trait:
type 2 diabetes
Parkinson disease
body mass index
Returns all associated genetic variants.
By Gene:
APOE
TCF7L2
Returns variants in or near the gene region.
By Chromosomal Region:
10:114000000-115000000
Returns variants in the specified genomic interval.
By Publication:
PMID:20581827
Author: McCarthy MI
GCST001234
Returns study details and all reported associations.
3. REST API Access
The GWAS Catalog provides two REST APIs for programmatic access:
Base URLs:
- GWAS Catalog API: https://www.ebi.ac.uk/gwas/rest/api
- Summary Statistics API: https://www.ebi.ac.uk/gwas/summary-statistics/api
API Documentation:
- Main API docs: https://www.ebi.ac.uk/gwas/rest/docs/api
- Summary stats docs: https://www.ebi.ac.uk/gwas/summary-statistics/docs/
Core Endpoints:
- Studies endpoint - /studies/{accessionID}
import requests
# Get a specific study
url = "https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795"
response = requests.get(url, headers={"Content-Type": "application/json"})
study = response.json()
- Associations endpoint - /associations
# Find associations for a variant
variant = "rs7903146"
url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{variant}/associations"
params = {"projection": "associationBySnp"}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
associations = response.json()
- Variants endpoint - /singleNucleotidePolymorphisms/{rsID}
# Get variant details
url = "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/rs7903146"
response = requests.get(url, headers={"Content-Type": "application/json"})
variant_info = response.json()
- Traits endpoint - /efoTraits/{efoID}
# Get trait information
url = "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360"
response = requests.get(url, headers={"Content-Type": "application/json"})
trait_info = response.json()
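Because the endpoints share one base URL and differ only in the resource path, the URLs can be built in one place. The sketch below uses only the paths named in the endpoint list; the helper name `resource_url` and the `kind` labels are hypothetical conveniences, not part of the API.

```python
from urllib.parse import quote

BASE_URL = "https://www.ebi.ac.uk/gwas/rest/api"

# Resource paths copied from the endpoint list above.
RESOURCE_PATHS = {
    "study": "studies",
    "variant": "singleNucleotidePolymorphisms",
    "trait": "efoTraits",
}


def resource_url(kind: str, identifier: str, sub_resource: str = "") -> str:
    """Build the URL for one resource, optionally with a sub-resource."""
    if kind not in RESOURCE_PATHS:
        raise ValueError(f"unknown resource kind: {kind!r}")
    # Percent-encode the identifier so unexpected characters cannot alter the path.
    url = f"{BASE_URL}/{RESOURCE_PATHS[kind]}/{quote(identifier, safe='')}"
    if sub_resource:
        url = f"{url}/{sub_resource}"
    return url
```

The returned string can be passed to `requests.get` exactly as in the snippets above, for example `resource_url("variant", "rs7903146", "associations")`.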
4. Query Examples and Patterns
Example 1: Find all associations for a disease
import requests
trait = "EFO_0001360" # Type 2 diabetes
base_url = "https://www.ebi.ac.uk/gwas/rest/api"
# Query associations for this trait
url = f"{base_url}/efoTraits/{trait}/associations"
response = requests.get(url, headers={"Content-Type": "application/json"})
associations = response.json()
# Process results
for assoc in associations.get('_embedded', {}).get('associations', []):
    variant = assoc.get('rsId')
    pvalue = assoc.get('pvalue')
    risk_allele = assoc.get('strongestAllele')
    print(f"{variant}: p={pvalue}, risk allele={risk_allele}")
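Example 1 processes only the records in a single response. If the API pages its results in HAL style, with `_links.next.href` pointing to the next page (an assumption that should be checked against the API documentation), the pages could be walked with a generator like the one below. The names `iter_associations`, `fetch`, and `max_pages` are hypothetical and introduced only for illustration.

```python
from typing import Callable, Iterator


def iter_associations(
    first_url: str,
    fetch: Callable[[str], dict],
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield association records page by page.

    `fetch` takes a URL and returns the decoded JSON body. With requests it
    could be: lambda u: requests.get(u, timeout=30).json()
    """
    url = first_url
    for _ in range(max_pages):  # hard cap so a bad link cannot loop forever
        if not url:
            break
        page = fetch(url)
        yield from page.get("_embedded", {}).get("associations", [])
        # Assumed HAL paging link; expected to be absent on the last page.
        url = page.get("_links", {}).get("next", {}).get("href")
```

Passing the fetch function in as an argument keeps the paging logic independent of the HTTP library and makes it easy to exercise with canned responses.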
Example 2: Get variant information and all trait associations
import requests
variant = "rs7903146"
base_url = "https://www.ebi.ac.uk/gwas/rest/api"
# Get variant details
url = f"{base_url}/singleNucleotidePolymorphisms/{variant}"
response = requests.get(url, headers={"Content-Type": "application/json"})
variant_data = response.json()
# Get all associations for this variant
url = f"{base_url}/singleNucleotidePolymorphisms/{variant}/associations"
params = {"projection": "associationBySnp"}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
associations = response.json()
# Extract trait names and p-values
for assoc in associations.get('_embedded', {}).get('associations', []):
    trait = assoc.get('efoTrait')
    pvalue = assoc.get('pvalue')
    print(f"Trait: {trait}, p-value: {pvalue}")
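When a variant has many associations, ranking them by significance makes the output easier to read. This is a minimal sketch that assumes each record carries the `efoTrait` and `pvalue` keys used in Example 2; the function name `rank_by_pvalue` is hypothetical and the sample payload is made up for illustration.

```python
def rank_by_pvalue(payload: dict) -> list:
    """Return (trait, p-value) pairs sorted from most to least significant."""
    records = payload.get("_embedded", {}).get("associations", [])
    ranked = []
    for record in records:
        pvalue = record.get("pvalue")
        if pvalue is None:
            continue  # skip records without a p-value
        ranked.append((str(record.get("efoTrait")), float(pvalue)))
    return sorted(ranked, key=lambda pair: pair[1])


# Made-up payload for illustration only; not real GWAS Catalog output.
sample = {
    "_embedded": {
        "associations": [
            {"efoTrait": "trait B", "pvalue": 3e-9},
            {"efoTrait": "trait A", "pvalue": "1e-20"},
            {"efoTrait": "trait C"},
        ]
    }
}
ranked = rank_by_pvalue(sample)
```

Converting with `float` lets the sort work whether the p-value arrives as a number or as a string.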
Example 3: Access summary statistics
import requests
# Query summary statistics API
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
# Find associations by trait with p-value threshold
trait = "EFO_0001360" # Type 2 diabetes
p_upper = "0.000000001" # p < 1e-9
url = f"{base_url}/traits/{trait}/associations"
params = {
    "p_upper": p_upper,
    "size": 100  # Number of results
}
response = requests.get(url, params=params)
results = response.json()
# Process genome-wide significant hits
for hit in results.get('_embedded', {}).get('associations', []):
    variant_id = hit.get('variant_id')
    chromosome = hit.get('chromosome')
    position = hit.get('base_pair_location')
    pvalue = hit.get('p_value')
    print(f"{chromosome}:{position} ({variant_id}): p={pvalue}")
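Example 3 passes the threshold as a plain decimal string. Whether the API also accepts scientific notation is not stated here, so the sketch below conservatively converts a float threshold into the same plain form. The helper names `plain_decimal` and `sumstats_params` are hypothetical; only the `p_upper` and `size` parameters come from the example.

```python
from decimal import Decimal


def plain_decimal(p_value: float) -> str:
    """Format a p-value without scientific notation."""
    # str() gives the shortest float representation; Decimal's "f" format
    # then writes it in fixed notation without rounding.
    return format(Decimal(str(p_value)), "f")


def sumstats_params(p_upper: float, size: int = 100) -> dict:
    """Build the query parameters used in Example 3."""
    if not 0 < p_upper <= 1:
        raise ValueError("p_upper must be in the interval (0, 1]")
    return {"p_upper": plain_decimal(p_upper), "size": size}
```

For instance, `sumstats_params(1e-9)` produces the same `p_upper` string that Example 3 writes by hand.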
Example 4: Query by chromosomal region
import requests
# Find variants in a specific genomic region
chromosome = "10"
start_pos = 114000000
end_pos = 115000000
base_url = "https://www.ebi.ac.uk/gwas/rest/api"
url = f"{base_url}/singleNucleotidePolymorphisms/search/findByChromBpLocationRange"
params = {
    "chrom": chromosome,
    "bpStart": start_pos,
    "bpEnd": end_pos
}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
variants_in_region = response.json()
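Regions are often written as a single string such as 10:114000000-115000000, the form accepted by the web interface. A small parser, sketched here under the assumption that the string is strictly CHR:START-END, can turn that into the `chrom`, `bpStart`, and `bpEnd` parameters used in Example 4. The function name `region_params` is hypothetical.

```python
import re

# Strict CHR:START-END form; other spellings (e.g. a "chr" prefix or
# thousands separators) are deliberately rejected in this sketch.
_REGION = re.compile(r"(\w+):(\d+)-(\d+)")


def region_params(region: str) -> dict:
    """Convert 'CHR:START-END' into the query parameters from Example 4."""
    match = _REGION.fullmatch(region.strip())
    if match is None:
        raise ValueError(f"not a CHR:START-END region: {region!r}")
    chrom = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("region start must not exceed region end")
    return {"chrom": chrom, "bpStart": start, "bpEnd": end}
```

Validating the interval before sending the request avoids a round trip for input that could never match anything.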
5. Working with Summary Statistics
The GWAS Catalog hosts full summary statistics for many studies, providing access to all tested variants (not just genome-wide significant hits).
Access Methods:
- FTP download: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/
- REST API: Query-based access to summary statistics
- Web interface: Browse and download via the website
Summary Statistics API Features:
- Filter by chromosome, position, p-value
- Query specific variants across studies
- Retrieve effect sizes and allele frequencies
- Access harmonized and standardized data
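The same kind of p-value filtering can be applied locally to a downloaded file. The sketch below assumes a gzip-compressed, tab-separated file whose header includes a `p_value` column, matching the field name returned by the Summary Statistics API in Example 3; actual file layouts may differ between studies and should be checked. The function names `significant_rows` and `read_gzip_tsv` are hypothetical.

```python
import csv
import gzip
from typing import Iterator, TextIO


def significant_rows(handle: TextIO, p_max: float = 5e-8) -> Iterator[dict]:
    """Yield rows of a tab-separated file whose p_value is <= p_max."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        try:
            p_value = float(row["p_value"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric p-value
        if p_value <= p_max:
            yield row


def read_gzip_tsv(path_or_file, p_max: float = 5e-8) -> list:
    """Open a gzip-compressed TSV (path or binary file object) and filter it."""
    with gzip.open(path_or_file, "rt", newline="") as handle:
        return list(significant_rows(handle, p_max))
```

Streaming the rows through a generator keeps memory use low, which matters because full summary statistics files cover every tested variant.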
Example: Download summary statistics for a study
import requests
import gzip
# Get available summary statistics
base_url = "https://www.ebi.ac.uk/gwas/summary-sta