ZINC Chemical Library Database

Overview

ZINC (ZINC Is Not Commercial) is a free database of commercially available compounds curated for virtual screening. ZINC20 lists about 1.4 billion compounds, including purchasable compounds with 3D conformers, and ZINC22 extends the collection further. Compounds are organized by molecular property filters (lead-like, fragment-like, drug-like) and reactivity class. The ZINC15 REST API used throughout this document supports SMILES-based searches, property-filtered downloads, and compound subset exports for docking campaigns.

When to Use

Downloading a purchasable, drug-like or lead-like compound library for virtual screening or docking campaigns
Filtering compounds by Lipinski/lead-like properties (MW, logP, HBD, HBA) to build focused screening sets
Searching ZINC for commercially available analogs of a query molecule via SMILES similarity
Retrieving purchasable fragments (MW < 300, logP < 3) for fragment-based drug discovery
Building compound diversity libraries for high-throughput screening (HTS) campaigns
For known drug bioactivity data, use chembl-database-bioactivity; for approved drug structures, use drugbank-database-access; for RDKit property calculation, use rdkit-cheminformatics

Prerequisites

Python packages: requests, pandas
Data requirements: SMILES strings, MW/logP ranges, or ZINC subset IDs
Environment: internet connection; no API key needed for ZINC15; large downloads may take minutes
Rate limits: reasonable use; avoid crawling all 1.4B records in automated loops

pip install requests pandas
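
The "reasonable use" guidance above can be made concrete by spacing requests out. The following is a minimal sketch, not an official ZINC client: `Throttle` and `polite_get` are hypothetical helper names, and the one-second interval is an arbitrary assumption because no numeric limit is stated here.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls (hypothetical helper)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds; assumed value, not a documented limit
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        """Sleep just long enough to keep calls at least min_interval apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def polite_get(session, url, throttle, **kwargs):
    """Throttled GET. `session` is any object with a requests-style .get() method."""
    throttle.wait()
    kwargs.setdefault("timeout", 30)  # avoid hanging forever on a stalled connection
    response = session.get(url, **kwargs)
    response.raise_for_status()
    return response
```

With `requests` installed, a call would look like `polite_get(requests.Session(), url, Throttle(1.0), params={...})`. The clock and sleep functions are injectable so the pacing logic can be checked without waiting.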

Quick Start

import requests

# Search ZINC15 REST API for drug-like compounds
BASE = "https://zinc15.docking.org"

r = requests.get(f"{BASE}/substances.json",
                 params={"mwt__gte": 250, "mwt__lte": 350,
                         "logp__gte": 0, "logp__lte": 3,
                         "availability": "for-sale", "count": 5})
r.raise_for_status()
compounds = r.json()
print(f"Returned {len(compounds)} compounds")
for c in compounds[:3]:
    print(f"  ZINC: {c['zinc_id']:20s} MW: {c['mwt']:.1f}  logP: {c['logp']:.2f}  SMILES: {c['smiles'][:40]}")

Core API

Query 1: Property-Filtered Compound Search

Search ZINC15 by molecular property ranges (Lipinski, lead-like, fragment-like criteria).

import requests, pandas as pd

BASE = "https://zinc15.docking.org"

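def zinc_search(params, max_results=500, page_size=100):
    """Search ZINC15 with property filters. Returns a DataFrame of up to max_results rows."""
    all_results = []
    params = dict(params)  # copy so the caller's dict is not mutated
    params["count"] = min(page_size, max_results)
    page = 1
    while len(all_results) < max_results:
        # Assumption: the endpoint accepts a 1-based `page` parameter alongside `count`
        params["page"] = page
        r = requests.get(f"{BASE}/substances.json", params=params, timeout=60)
        r.raise_for_status()
        compounds = r.json()
        if not compounds:
            break  # no more results
        all_results.extend(compounds)
        if len(compounds) < params["count"]:
            break  # a short page means this was the last one
        page += 1
    return pd.DataFrame(all_results[:max_results])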

# Lead-like set: MW 250-350, logP 1-3, HBD ≤ 3, HBA ≤ 7
df_leads = zinc_search({
    "mwt__gte": 250, "mwt__lte": 350,
    "logp__gte": 1, "logp__lte": 3,
    "hbd__lte": 3, "hba__lte": 7,
    "availability": "for-sale",
})
print(f"Lead-like compounds: {len(df_leads)}")
print(df_leads[["zinc_id", "mwt", "logp", "smiles"]].head())

# Fragment-like set: MW ≤ 300, logP ≤ 3, HBD ≤ 3 (Rule of Three)
df_frags = zinc_search({
    "mwt__lte": 300,
    "logp__lte": 3,
    "hbd__lte": 3,
    "availability": "for-sale",
})
print(f"\nFragment-like compounds: {len(df_frags)}")
print(df_frags[["zinc_id", "mwt", "logp", "smiles"]].head())
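
Because the server applies the filters, it can be worth re-checking the returned records locally before building a library on them. The sketch below assumes each record carries the `mwt`, `logp`, `hbd`, and `hba` keys used elsewhere in this document. `within_ranges` and the two range dictionaries are hypothetical names, and the sample records are made up for illustration.

```python
def within_ranges(record, ranges):
    """Return True if every property lies inside its inclusive (low, high) bounds.

    A bound of None means "unbounded on that side". A missing property fails the check.
    """
    for key, (low, high) in ranges.items():
        value = record.get(key)
        if value is None:
            return False
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


# Same cut-offs as the lead-like and fragment-like queries above
LEAD_LIKE = {"mwt": (250, 350), "logp": (1, 3), "hbd": (None, 3), "hba": (None, 7)}
FRAGMENT_LIKE = {"mwt": (None, 300), "logp": (None, 3), "hbd": (None, 3)}

# Made-up records, for illustration only
records = [
    {"zinc_id": "example-1", "mwt": 301.4, "logp": 2.1, "hbd": 1, "hba": 4},
    {"zinc_id": "example-2", "mwt": 412.5, "logp": 4.2, "hbd": 2, "hba": 6},
]
lead_like = [r for r in records if within_ranges(r, LEAD_LIKE)]
print([r["zinc_id"] for r in lead_like])
```

The same function works row-wise on a DataFrame, for example `df_leads[df_leads.apply(lambda row: within_ranges(row, LEAD_LIKE), axis=1)]`.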

Query 2: Retrieve Compound by ZINC ID

Fetch full compound data for a known ZINC identifier.

import requests

BASE = "https://zinc15.docking.org"

zinc_id = "ZINC000000029632"

r = requests.get(f"{BASE}/substances/{zinc_id}.json")
r.raise_for_status()
c = r.json()

print(f"ZINC ID  : {c['zinc_id']}")
print(f"SMILES   : {c['smiles']}")
print(f"MW       : {c['mwt']:.2f}")
print(f"logP     : {c['logp']:.2f}")
print(f"HBD      : {c['hbd']}")
print(f"HBA      : {c['hba']}")
print(f"TPSA     : {c.get('tpsa', 'n/a')}")
print(f"Rotatable: {c.get('rotatable_bonds', 'n/a')}")
print(f"Suppliers: {len(c.get('suppliers', []))}")
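
Identifiers copied from papers or spreadsheets often drop the leading zeros. The helper below is a small sketch that pads them back before building the URL. It assumes the 12-digit zero-padded form seen in the example ID above; `normalize_zinc_id` is a hypothetical name, not part of any ZINC client.

```python
import re

# Optional "ZINC" prefix, any leading zeros, then the significant digits
_ZINC_ID = re.compile(r"^(?:ZINC)?0*(\d+)$", re.IGNORECASE)


def normalize_zinc_id(raw, width=12):
    """Return "ZINC" plus a zero-padded number, e.g. "ZINC000000029632".

    `width` is an assumption inferred from the example ID in this document.
    """
    match = _ZINC_ID.match(str(raw).strip())
    if match is None:
        raise ValueError(f"not a ZINC identifier: {raw!r}")
    digits = match.group(1)
    if len(digits) > width:
        raise ValueError(f"identifier has more than {width} digits: {raw!r}")
    return "ZINC" + digits.zfill(width)


print(normalize_zinc_id("ZINC29632"))
```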

Query 3: Download Compound Subsets (Tranches)

ZINC organizes compounds into "tranches" by MW and logP. Download pre-built SDF/SMILES files.

import requests

# ZINC15 tranche download (MW 200-250, logP 1-2 range)
# Tranche naming: letters encode MW range (A-K) and logP range (A-J)
# See https://zinc15.docking.org/tranches/home

def download_zinc_tranche(tranche_name, dest_file, fmt="smi"):
    """Download a ZINC tranche file in the given format (e.g. "smi" or "sdf")."""
    url = f"https://zinc15.docking.org/tranches/{tranche_name}.{fmt}"
    r = requests.get(url, stream=True, timeout=60)
    r.raise_for_status()
    with open(dest_file, "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
    print(f"Downloaded {dest_file}")

# Download one tranche as SMILES
download_zinc_tranche("AABA", "zinc_AABA.smi", fmt="smi")
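
Once a tranche file is on disk, it can be read back without extra dependencies. This is a minimal sketch that assumes a whitespace-separated layout with the SMILES in the first column and the ZINC ID in the second, possibly preceded by a header row. `read_smi` is a hypothetical helper name.

```python
def read_smi(path, limit=None):
    """Read a .smi file into a list of {"smiles", "zinc_id"} dicts.

    Blank lines and a header row starting with "smiles" are skipped.
    """
    records = []
    with open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            parts = line.split()
            if len(parts) < 2 or parts[0].lower() == "smiles":
                continue
            records.append({"smiles": parts[0], "zinc_id": parts[1]})
            if limit is not None and len(records) >= limit:
                break
    return records
```

For example, `read_smi("zinc_AABA.smi", limit=10)` would return the first ten compounds of the file downloaded above.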

Query 4: SMILES Similarity Search

Find ZINC compounds similar to a query molecule.

import requests, pandas as pd

BASE = "https://zinc15.docking.org"

query_smiles = "c1ccc(NC(=O)c2ccccc2)cc1"  # benzanilide (N-phenylbenzamide)

r = requests.get(f"{BASE}/substances.json",
                 params={
                     "smiles": query_smiles,
                     "similarity": 0.6,       # Tanimoto similarity threshold
                     "count": 20,
                     "availability": "for-sale"
                 })
r.raise_for_status()
results = r.json()
print(f"Similar compounds found: {len(results)}")
# Passing columns= keeps this from raising a KeyError when no hits are returned
df = pd.DataFrame(results, columns=["zinc_id", "smiles", "mwt", "logp"])
print(df.head())
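
To make the `similarity` threshold concrete, the Tanimoto coefficient of two fingerprints is the number of shared "on" bits divided by the number of bits set in either. The sketch below uses toy bit sets. The fingerprint type used by ZINC is not specified here, so these values only illustrate the arithmetic.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


# Toy fingerprints (made up): 3 shared bits, 6 distinct bits overall
fp_query = {1, 4, 7, 9}
fp_hit = {1, 4, 9, 12, 15}
score = tanimoto(fp_query, fp_hit)
print(f"Tanimoto = {score:.2f}")  # 3 / 6 = 0.50, below the 0.6 threshold used above
```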

Query 5: Catalog and Supplier Information

Retrieve purchasability and supplier catalog data for compounds.

import requests

BASE = "https://zinc15.docking.org"

# Check purchasability and catalog info
zinc_id = "ZINC000000029632"
r = requests.get(f"{BASE}/substances/{zinc_id}/suppliers.json")
r.raise_for_status()
suppliers = r.json()

print(f"Suppliers for {zinc_id}: {len(suppliers)}")
for sup in suppliers[:5]:
    print(f"  {sup.get('name', 'n/a'):30s} | Catalog: {sup.get('catalognum', 'n/a')}")

Query 6: Bulk Download via ZINC Slices

For large-scale virtual screening, download entire ZINC subsets as compressed SMILES.

import gzip
import requests, pandas as pd

# ZINC15 drug-like purchasable slice (public URL pattern)
# Full drug-like: https://zinc15.docking.org/substances/subsets/drug-like.smi.gz

def download_zinc_subset(subset_name, max_lines=1000):
    """Stream a ZINC subset SMILES file and return the first max_lines rows as a DataFrame."""
    url = f"https://zinc15.docking.org/substances/subsets/{subset_name}.smi.gz"
    rows = []
    # Stream so only the first max_lines records are read, not the whole file
    with requests.get(url, stream=True, timeout=60) as r:
        r.raise_for_status()
        with gzip.open(r.raw, "rt") as f:
            for line in f:
                if len(rows) >= max_lines:
                    break
                parts = line.split()
                # Skip blank lines and a header row, if the file has one
                if len(parts) < 2 or parts[0].lower() == "smiles":
                    continue
                rows.append(parts[:2])  # keep SMILES and ZINC ID, drop extra columns
    return pd.DataFrame(rows, columns=["smiles", "zinc_id"])

# Load first 1000 from lead-like subset
df_sample = download_zinc_subset("lead-like", max_lines=1000)
print(f"Loaded {len(df_sample)} compounds from lead-like subset")
print(df_sample.head())
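
A sample loaded this way usually needs deduplicating before it is handed to a docking program. The following is a minimal sketch with hypothetical helper names (`dedupe_by_smiles`, `write_smi`). It compares SMILES as plain strings, so two different spellings of the same molecule are not merged; canonicalizing first (for example with RDKit) would be needed for that.

```python
def dedupe_by_smiles(rows):
    """Keep the first (smiles, zinc_id) pair seen for each exact SMILES string."""
    seen = set()
    unique = []
    for smiles, zinc_id in rows:
        if smiles not in seen:
            seen.add(smiles)
            unique.append((smiles, zinc_id))
    return unique


def write_smi(rows, path):
    """Write (smiles, zinc_id) pairs as a space-separated .smi file, one per line."""
    with open(path, "wt", encoding="utf-8") as handle:
        for smiles, zinc_id in rows:
            handle.write(f"{smiles} {zinc_id}\n")
    return len(rows)
```

With the DataFrame from above, the pairs can be obtained via `df_sample.itertuples(index=False, name=None)` and passed straight to `dedupe_by_smiles`.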

Key Concepts

ZINC Tranches

Compounds are organized into a 2D grid of "tranches" based on MW (rows A–K: <200 to >600 Da) and logP (columns A–J: <-1 to >5). Each tranche can be downloaded as a SMILES or SDF file. This tranching enables targeted downloads of specific property spaces for docking.
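
The grid described above can be enumerated programmatically, which helps when looping over a block of property space. This sketch builds only the two-letter MW/logP prefix from the letter ranges given in this section. Tranche names such as "AABA" carry additional letters that are not explained here, so they are left to the caller. `tranche_prefixes` is a hypothetical helper name.

```python
from string import ascii_uppercase

MW_LETTERS = ascii_uppercase[:11]    # A-K, per the MW rows described above
LOGP_LETTERS = ascii_uppercase[:10]  # A-J, per the logP columns described above


def tranche_prefixes(mw_letters=MW_LETTERS, logp_letters=LOGP_LETTERS):
    """Return every MW-letter + logP-letter combination, MW varying slowest."""
    return [mw + logp for mw in mw_letters for logp in logp_letters]


all_prefixes = tranche_prefixes()
print(len(all_prefixes))             # 11 rows x 10 columns = 110
print(tranche_prefixes("AB", "CD"))  # ['AC', 'AD', 'BC', 'BD']
```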

Availability Classes

for-sale: Purchasable from ≥1 supplier
in-stock: Available for immediate purchase
wait-ok: Longer lead time acceptable
on-demand: Custom synthesis required
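
A typo in the availability value is easy to make and hard to spot in a results table, so it can help to validate it before sending a query. The sketch below checks against the four classes listed above. `build_params` is a hypothetical helper, and whether the API accepts every one of these values for the `availability` parameter is an assumption; only "for-sale" is used in the queries in this document.

```python
AVAILABILITY_CLASSES = ("for-sale", "in-stock", "wait-ok", "on-demand")


def build_params(availability="for-sale", count=100, **filters):
    """Assemble a query-parameter dict, rejecting unknown availability classes.

    Filters whose value is None are dropped so optional bounds can be passed through.
    """
    if availability not in AVAILABILITY_CLASSES:
        raise ValueError(
            f"unknown availability {availability!r}; expected one of {AVAILABILITY_CLASSES}"
        )
    params = {key: value for key, value in filters.items() if value is not None}
    params["availability"] = availability
    params["count"] = count
    return params


print(build_params("in-stock", count=5, mwt__lte=300, logp__lte=None))
```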

Common Workflows

Workflow 1: Build a Focused Docking Library

Goal: Curate a purchasable, lead-like compound library within specific property ranges, deduplicate, and export for docking.

import requests, pandas as pd

BASE = "https://zin
