Medchem
Overview
Medchem is a Python library for molecular filtering and prioritization in drug discovery workflows. It provides hundreds of established and novel molecular filters, structural alerts, and medicinal chemistry rules for triaging and prioritizing compound libraries at scale. Rules and filters are context-specific: treat them as guidelines and combine them with domain expertise.
When to Use This Skill
This skill should be used when:
- Applying drug-likeness rules (Lipinski, Veber, etc.) to compound libraries
- Filtering molecules by structural alerts or PAINS patterns
- Prioritizing compounds for lead optimization
- Assessing compound quality and medicinal chemistry properties
- Detecting reactive or problematic functional groups
- Calculating molecular complexity metrics
Installation
uv pip install medchem
Core Capabilities
1. Medicinal Chemistry Rules
Apply established drug-likeness rules to molecules using the medchem.rules module.
Available Rules:
- Rule of Five (Lipinski)
- Rule of Oprea
- Rule of CNS
- Rule of leadlike (soft and strict)
- Rule of three
- Rule of Reos
- Rule of drug
- Rule of Veber
- Golden triangle
- PAINS filters
Single Rule Application:
import medchem as mc
# Apply Rule of Five to a SMILES string
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O" # Aspirin
passes = mc.rules.basic_rules.rule_of_five(smiles)
# Returns: True
# Check specific rules
passes_oprea = mc.rules.basic_rules.rule_of_oprea(smiles)
passes_cns = mc.rules.basic_rules.rule_of_cns(smiles)
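For intuition about what such a rule check evaluates, the sketch below applies the classic Lipinski thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 acceptors) to precomputed descriptors in plain Python. It is not medchem's implementation: the `Descriptors` class, the function name, and the approximate aspirin values are assumptions made for illustration, and real descriptor values depend on the toolkit that computes them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one molecule (hypothetical container)."""
    mw: float    # molecular weight in Da
    logp: float  # calculated octanol-water partition coefficient
    n_hbd: int   # hydrogen-bond donor count
    n_hba: int   # hydrogen-bond acceptor count


def passes_rule_of_five(d: Descriptors) -> bool:
    """Return True only when all four Lipinski thresholds are satisfied."""
    return d.mw <= 500 and d.logp <= 5 and d.n_hbd <= 5 and d.n_hba <= 10


# Approximate, illustrative values for aspirin
aspirin = Descriptors(mw=180.16, logp=1.3, n_hbd=1, n_hba=4)
# A made-up heavy, lipophilic compound that breaks two thresholds
greasy = Descriptors(mw=650.0, logp=6.2, n_hbd=2, n_hba=8)

print(passes_rule_of_five(aspirin))
print(passes_rule_of_five(greasy))
```

This is the strict form with no violations allowed. Some formulations of the rule tolerate a single violation, so check which variant a given tool applies before comparing results.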
Multiple Rules with RuleFilters:
import datamol as dm
import medchem as mc
# Load molecules (example input; replace with your own SMILES)
smiles_list = ["CC(=O)OC1=CC=CC=C1C(=O)O", "c1ccccc1"]
mols = [dm.to_mol(smiles) for smiles in smiles_list]
# Create filter with multiple rules
rfilter = mc.rules.RuleFilters(
    rule_list=[
        "rule_of_five",
        "rule_of_oprea",
        "rule_of_cns",
        "rule_of_leadlike_soft"
    ]
)
# Apply filters with parallelization
results = rfilter(
    mols=mols,
    n_jobs=-1,  # Use all CPU cores
    progress=True
)
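Result Format: Results are tabular, with one entry per molecule giving the pass/fail status for each rule. The exact container type and column names depend on the medchem version, so inspect the returned object before indexing into it.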
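One way to picture a multi-rule result is as one row per molecule holding a boolean per rule, plus summary flags. The plain-Python sketch below builds such rows from hypothetical per-rule outcomes. The `summarize` helper, the molecule labels, and the `pass_all`/`pass_any` keys are illustrative names, not a statement of medchem's output schema.

```python
def summarize(rule_outcomes: dict[str, bool]) -> dict[str, bool]:
    """Add summary flags to one molecule's per-rule outcomes."""
    row = dict(rule_outcomes)
    row["pass_all"] = all(rule_outcomes.values())
    row["pass_any"] = any(rule_outcomes.values())
    return row


# Hypothetical outcomes for two molecules (labels and values are made up)
outcomes = {
    "mol_a": {"rule_of_five": True, "rule_of_cns": True},
    "mol_b": {"rule_of_five": True, "rule_of_cns": False},
}
table = {label: summarize(flags) for label, flags in outcomes.items()}

for label, row in table.items():
    print(label, row)
```

Keeping both flags is useful in triage: `pass_all` gives the conservative subset, while `pass_any` highlights molecules that fail every rule and are rarely worth a second look.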
2. Structural Alert Filters
Detect potentially problematic structural patterns using the medchem.structural module.
Available Filters:
- Common Alerts - General structural alerts derived from ChEMBL curation and literature
- NIBR Filters - Novartis Institutes for BioMedical Research filter set
- Lilly Demerits - Eli Lilly's demerit-based system (275 rules, molecules rejected at >100 demerits)
Common Alerts:
import datamol as dm
import medchem as mc
# Create filter
alert_filter = mc.structural.CommonAlertsFilters()
# Check single molecule
mol = dm.to_mol("c1ccccc1")
has_alerts, details = alert_filter.check_mol(mol)
# Batch filtering with parallelization (mol_list is a list of RDKit molecules)
results = alert_filter(
    mols=mol_list,
    n_jobs=-1,
    progress=True
)
NIBR Filters:
import medchem as mc
# Apply NIBR filters
nibr_filter = mc.structural.NIBRFilters()
results = nibr_filter(mols=mol_list, n_jobs=-1)
Lilly Demerits:
import medchem as mc
# Calculate Lilly demerits
lilly = mc.structural.LillyDemeritsFilters()
results = lilly(mols=mol_list, n_jobs=-1)
# Each result includes demerit score and whether it passes (≤100 demerits)
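The demerit idea can be illustrated without the library: each matched rule contributes a penalty, and a molecule is rejected once the total exceeds 100. The sketch below is a plain-Python illustration under that stated threshold only. The rule names and demerit values are invented, and the real Lilly rule set and its scoring details are not reproduced here.

```python
REJECTION_THRESHOLD = 100  # per the text: rejected at more than 100 demerits

# Hypothetical demerit table (names and values are made up for illustration)
DEMERITS = {
    "hypothetical_alert_a": 60,
    "hypothetical_alert_b": 50,
    "hypothetical_alert_c": 30,
}


def score(matched_rules: list[str]) -> tuple[int, bool]:
    """Return (total demerits, passes) for the rules a molecule matched."""
    total = sum(DEMERITS[name] for name in matched_rules)
    return total, total <= REJECTION_THRESHOLD


print(score([]))
print(score(["hypothetical_alert_a", "hypothetical_alert_c"]))
print(score(["hypothetical_alert_a", "hypothetical_alert_b"]))
```

A cumulative score lets several mild liabilities add up to a rejection while a single mild one does not, which is the main difference from a binary alert list.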
3. Functional API for High-Level Operations
The medchem.functional module provides convenient functions for common workflows.
Quick Filtering:
import medchem as mc
# Apply NIBR filters to a list
filter_ok = mc.functional.nibr_filter(
    mols=mol_list,
    n_jobs=-1
)
# Apply common alerts
alert_results = mc.functional.common_alerts_filter(
    mols=mol_list,
    n_jobs=-1
)
4. Chemical Groups Detection
Identify specific chemical groups and functional groups using medchem.groups.
Available Groups:
- Hinge binders
- Phosphate binders
- Michael acceptors
- Reactive groups
- Custom SMARTS patterns
Usage:
import datamol as dm
import medchem as mc
# Create group detector
group = mc.groups.ChemicalGroup(groups=["hinge_binders"])
# Example molecule to test (replace with your own)
mol = dm.to_mol("c1ccccc1")
# Check for matches
has_matches = group.has_match(mol)
# Get detailed match information
matches = group.get_matches(mol)
5. Named Catalogs
Access curated collections of chemical structures through medchem.catalogs.
Available Catalogs:
- Functional groups
- Protecting groups
- Common reagents
- Standard fragments
Usage:
import medchem as mc
# Access named catalogs
catalogs = mc.catalogs.NamedCatalogs
# Use catalog for matching
catalog = catalogs.get("functional_groups")
matches = catalog.get_matches(mol)
6. Molecular Complexity
Calculate complexity metrics that approximate synthetic accessibility using medchem.complexity.
Common Metrics:
- Bertz complexity
- Whitlock complexity
- Barone complexity
Usage:
import medchem as mc
# Calculate complexity
complexity_score = mc.complexity.calculate_complexity(mol)
# Filter by complexity threshold
complex_filter = mc.complexity.ComplexityFilter(max_complexity=500)
results = complex_filter(mols=mol_list)
7. Constraints Filtering
Apply custom property-based constraints using medchem.constraints.
Example Constraints:
- Molecular weight ranges
- LogP bounds
- TPSA limits
- Rotatable bond counts
Usage:
import medchem as mc
# Define constraints
constraints = mc.constraints.Constraints(
    mw_range=(200, 500),
    logp_range=(-2, 5),
    tpsa_max=140,
    rotatable_bonds_max=10
)
# Apply constraints
results = constraints(mols=mol_list, n_jobs=-1)
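Property constraints of this kind reduce to named predicates over per-molecule descriptors. The sketch below expresses the same four bounds in plain Python over precomputed values. The helper names (`in_range`, `at_most`, `evaluate`), the descriptor keys, and the example values are assumptions for illustration and do not mirror medchem's `Constraints` class.

```python
from typing import Callable, Mapping

Props = Mapping[str, float]
Check = Callable[[Props], bool]


def in_range(key: str, low: float, high: float) -> Check:
    """Build a predicate that is True when low <= props[key] <= high."""
    return lambda props: low <= props[key] <= high


def at_most(key: str, limit: float) -> Check:
    """Build a predicate that is True when props[key] <= limit."""
    return lambda props: props[key] <= limit


CONSTRAINTS: dict[str, Check] = {
    "mw_range": in_range("mw", 200, 500),
    "logp_range": in_range("logp", -2, 5),
    "tpsa_max": at_most("tpsa", 140),
    "rotatable_bonds_max": at_most("n_rotatable_bonds", 10),
}


def evaluate(props: Props) -> dict[str, bool]:
    """Report which named constraints one molecule satisfies."""
    return {name: check(props) for name, check in CONSTRAINTS.items()}


# Made-up descriptor values for two hypothetical molecules
candidate = {"mw": 350.4, "logp": 2.1, "tpsa": 85.0, "n_rotatable_bonds": 6}
fragment = {"mw": 150.0, "logp": 0.5, "tpsa": 40.0, "n_rotatable_bonds": 1}

print(evaluate(candidate))
print(evaluate(fragment))
```

Returning a per-constraint report, not just one boolean, makes it easy to see which bound removed a compound when tuning thresholds.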
8. Medchem Query Language
Use a specialized query language for complex filtering criteria.
Query Examples:
# Molecules passing Ro5 AND not having common alerts
"rule_of_five AND NOT common_alerts"
# CNS-like molecules with low complexity
"rule_of_cns AND complexity < 400"
# Leadlike molecules without Lilly demerits
"rule_of_leadlike AND lilly_demerits == 0"
Usage:
import medchem as mc
# Parse and apply query
query = mc.query.parse("rule_of_five AND NOT common_alerts")
results = query.apply(mols=mol_list, n_jobs=-1)
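To make the semantics of such queries concrete, here is a minimal sketch of an evaluator for boolean queries over one molecule's precomputed properties. It is a toy written for this explanation, not medchem's parser: `evaluate_query`, the supported grammar (AND, OR, NOT, parentheses, numeric comparisons), and the example property values are all assumptions.

```python
import re

_TOKEN = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|==|!=|<=|>=|[<>()]")
_KEYWORDS = {"AND": "and", "OR": "or", "NOT": "not"}


def evaluate_query(query: str, properties: dict) -> bool:
    """Evaluate a boolean query against one molecule's properties."""
    tokens = _TOKEN.findall(query)
    # Reject any character the tokenizer did not recognise
    if "".join(tokens) != re.sub(r"\s+", "", query):
        raise ValueError(f"unsupported syntax in query: {query!r}")
    translated = []
    for token in tokens:
        if token in _KEYWORDS:
            translated.append(_KEYWORDS[token])
        elif token[0].isalpha() or token[0] == "_":
            # Identifiers must name a known property
            if token not in properties:
                raise ValueError(f"unknown property: {token}")
            translated.append(token)
        else:
            translated.append(token)
    # Names resolve only against `properties`; builtins are disabled
    expression = " ".join(translated)
    return bool(eval(expression, {"__builtins__": {}}, dict(properties)))


# Made-up property values for one hypothetical molecule
props = {
    "rule_of_five": True,
    "rule_of_cns": True,
    "common_alerts": False,
    "complexity": 350,
    "lilly_demerits": 0,
}

print(evaluate_query("rule_of_five AND NOT common_alerts", props))
print(evaluate_query("rule_of_cns AND complexity < 400", props))
print(evaluate_query("rule_of_cns AND complexity < 300", props))
```

Translating the query into a Python expression keeps the sketch short. For untrusted input, a dedicated parser with an explicit grammar would be the safer design.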
Workflow Patterns
Pattern 1: Initial Triage of Compound Library
Filter a large compound collection to identify drug-like candidates.
import datamol as dm
import medchem as mc
import pandas as pd
# Load compound library
df = pd.read_csv("compounds.csv")
mols = [dm.to_mol(smi) for smi in df["smiles"]]
# Apply primary filters
rule_filter = mc.rules.RuleFilters(rule_list=["rule_of_five", "rule_of_veber"])
rule_results = rule_filter(mols=mols, n_jobs=-1, progress=True)
# Apply structural alerts
alert_filter = mc.structural.CommonAlertsFilters()
alert_results = alert_filter(mols=mols, n_jobs=-1, progress=True)
# Combine results
df["passes_rules"] = rule_results["pass"]
df["has_alerts"] = alert_results["has_alerts"]
df["drug_like"] = df["passes_rules"] & ~df["has_alerts"]
# Save filtered compounds
filtered_df = df[df["drug_like"]]
filtered_df.to_csv("filtered_compounds.csv", index=False)
Pattern 2: Lead Optimization Filtering
Apply stricter criteria during lead optimization.
import medchem as mc
import numpy as np
# Create comprehensive filter
filters = {
    "rules": mc.rules.RuleFilters(rule_list=["rule_of_leadlike_strict"]),
    "alerts": mc.structural.NIBRFilters(),
    "lilly": mc.structural.LillyDemeritsFilters(),
    "complexity": mc.complexity.ComplexityFilter(max_complexity=400)
}
# Apply all filters (candidate_mols is a list of RDKit molecules)
results = {}
for name, filt in filters.items():
    results[name] = filt(mols=candidate_mols, n_jobs=-1)
# Identify compounds passing all filters: combine the per-molecule "pass" flags
# element-wise (a bare all() over whole result columns would not do this)
passes_all = np.logical_and.reduce([np.asarray(r["pass"], dtype=bool) for r in results.values()])
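The combination step is worth spelling out, because the goal is one verdict per molecule, not one verdict for the whole batch. The sketch below shows the element-wise AND in plain Python over hypothetical per-filter flag lists; `passes_all_filters` and the example flags are illustrative, and real filter outputs would first need to be reduced to one boolean per molecule.

```python
def passes_all_filters(results: dict[str, list[bool]]) -> list[bool]:
    """Combine per-filter pass flags into one flag per molecule."""
    lengths = {len(flags) for flags in results.values()}
    if len(lengths) > 1:
        raise ValueError("every filter must report one flag per molecule")
    # zip(*...) groups the flags of molecule i across all filters
    return [all(flags) for flags in zip(*results.values())]


# Made-up flags for three molecules across three filters
example_results = {
    "rules": [True, True, False],
    "alerts": [True, False, True],
    "complexity": [True, True, True],
}

combined = passes_all_filters(example_results)
print(combined)  # [True, False, False]
```

Only the first molecule survives, since the second fails the alert filter and the third fails the rule filter.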
Pattern 3: Identify Specific Chemical Groups
Find molecules containing specific functional groups or scaffolds.
import medchem as mc
# Create group detector for multiple groups
group_detector = mc.groups.ChemicalG