ChEMBL Database
ChEMBL is the European Bioinformatics Institute's repository of bioactive compound data, containing over 2 million compounds, 19 million bioactivity measurements, and 13,000+ drug targets.
Use Cases
- Find potent inhibitors for a protein target
- Search for compounds similar to a known drug
- Retrieve drug mechanism of action data
- Filter compounds by molecular properties (Lipinski, etc.)
- Export bioactivity data for ML or analysis
Installation
uv pip install chembl_webresource_client
Basic Usage
from chembl_webresource_client.new_client import new_client
# Fetch compound by identifier
mol = new_client.molecule.get('CHEMBL192')
# Retrieve target data
tgt = new_client.target.get('CHEMBL203')
# Query activity measurements
acts = new_client.activity.filter(
target_chembl_id='CHEMBL203',
standard_type='IC50',
standard_value__lte=50
)
Available Endpoints
| Resource | Description |
|---|
molecule | Compound structures and properties |
target | Biological targets |
activity | Bioassay measurements |
assay | Experimental protocols |
drug | Approved drug data |
mechanism | Drug mechanisms of action |
drug_indication | Therapeutic indications |
similarity | Structure similarity search |
substructure | Substructure search |
document | Literature references |
cell_line | Cell line data |
protein_class | Protein classifications |
image | SVG molecular images |
Query Operators
The client uses Django-style filtering:
| Operator | Function | Example |
|---|
__exact | Exact match | pref_name__exact='Aspirin' |
__icontains | Case-insensitive substring | pref_name__icontains='kinase' |
__lte, __gte | Less/greater than or equal | standard_value__lte=10 |
__lt, __gt | Less/greater than | pchembl_value__gt=7 |
__range | Value within range | alogp__range=[-1, 5] |
__in | Value in list | target_chembl_id__in=['CHEMBL203'] |
__isnull | Null check | pchembl_value__isnull=False |
__startswith | Prefix match | pref_name__startswith='Proto' |
__regex | Regular expression | pref_name__regex='^[A-Z]{3}' |
Common Workflows
Find Target Inhibitors
from chembl_webresource_client.new_client import new_client
activity = new_client.activity
# Get potent BRAF inhibitors (IC50 < 100 nM)
braf_hits = activity.filter(
target_chembl_id='CHEMBL5145',
standard_type='IC50',
standard_value__lte=100,
standard_units='nM'
)
for hit in braf_hits:
print(f"{hit['molecule_chembl_id']}: {hit['standard_value']} nM")
Search by Target Name
from chembl_webresource_client.new_client import new_client
target = new_client.target
activity = new_client.activity
# Find CDK targets
cdk_targets = target.filter(
pref_name__icontains='cyclin-dependent kinase',
target_type='SINGLE PROTEIN'
)
target_ids = [t['target_chembl_id'] for t in cdk_targets]
# Get activities for these targets
cdk_activities = activity.filter(
target_chembl_id__in=target_ids[:5],
standard_type='IC50',
standard_value__lte=100,
standard_units='nM'
)
Structure Similarity Search
from chembl_webresource_client.new_client import new_client
sim = new_client.similarity
# Find molecules 80% similar to ibuprofen
ibuprofen_smiles = 'CC(C)Cc1ccc(cc1)C(C)C(=O)O'
matches = sim.filter(smiles=ibuprofen_smiles, similarity=80)
for m in matches:
print(f"{m['molecule_chembl_id']}: {m['similarity']}%")
Substructure Search
from chembl_webresource_client.new_client import new_client
sub = new_client.substructure
# Find compounds with benzimidazole core
benzimidazole = 'c1ccc2[nH]cnc2c1'
compounds = sub.filter(smiles=benzimidazole)
Filter by Molecular Properties
from chembl_webresource_client.new_client import new_client
mol = new_client.molecule
# Lipinski-compliant fragments
fragments = mol.filter(
molecule_properties__mw_freebase__lte=300,
molecule_properties__alogp__lte=3,
molecule_properties__hbd__lte=3,
molecule_properties__hba__lte=3
)
Drug Mechanisms of Action
from chembl_webresource_client.new_client import new_client
mech = new_client.mechanism
drug_ind = new_client.drug_indication
# Get mechanism of metformin
metformin_id = 'CHEMBL1431'
mechanisms = mech.filter(molecule_chembl_id=metformin_id)
for m in mechanisms:
print(f"Target: {m['target_chembl_id']}")
print(f"Action: {m['action_type']}")
# Get approved indications
indications = drug_ind.filter(molecule_chembl_id=metformin_id)
Generate Molecule Images
from chembl_webresource_client.new_client import new_client
img = new_client.image
# Get SVG of caffeine
caffeine_svg = img.get('CHEMBL113')
with open('caffeine.svg', 'w') as f:
f.write(caffeine_svg)
Key Response Fields
Molecule Properties
| Field | Description |
|---|
molecule_chembl_id | ChEMBL identifier |
pref_name | Preferred name |
molecule_structures.canonical_smiles | SMILES string |
molecule_structures.standard_inchi_key | InChI key |
molecule_properties.mw_freebase | Molecular weight |
molecule_properties.alogp | Calculated LogP |
molecule_properties.hba / hbd | H-bond acceptors/donors |
molecule_properties.psa | Polar surface area |
molecule_properties.rtb | Rotatable bonds |
molecule_properties.num_ro5_violations | Lipinski violations |
molecule_properties.qed_weighted | QED drug-likeness |
Activity Fields
| Field | Description |
|---|
molecule_chembl_id | Compound ID |
target_chembl_id | Target ID |
standard_type | Measurement type (IC50, Ki, EC50) |
standard_value | Numeric value |
standard_units | Units (nM, uM) |
pchembl_value | Normalized -log10 value |
data_validity_comment | Quality flag |
potential_duplicate | Duplicate indicator |
Target Fields
| Field | Description |
|---|
target_chembl_id | ChEMBL target ID |
pref_name | Preferred name |
target_type | SINGLE PROTEIN, PROTEIN COMPLEX, etc. |
organism | Species |
Mechanism Fields
| Field | Description |
|---|
molecule_chembl_id | Drug ID |
target_chembl_id | Target ID |
mechanism_of_action | Description |
action_type | INHIBITOR, AGONIST, ANTAGONIST, etc. |
Export to DataFrame
import pandas as pd
from chembl_webresource_client.new_client import new_client
activity = new_client.activity
results = activity.filter(
target_chembl_id='CHEMBL279',
standard_type='Ki',
pchembl_value__isnull=False
)
df = pd.DataFrame(list(results))
df.to_csv('dopamine_d2_ligands.csv', index=False)
Configuration
from chembl_webresource_client.settings import Settings
cfg = Settings.Instance()
cfg.CACHING = True # Enable response caching
cfg.CACHE_EXPIRE = 43200 # Cache TTL (12 hours)
cfg.TIMEOUT = 60 # Request timeout
cfg.TOTAL_RETRIES = 5 # Retry attempts
Data Quality Notes
- ChEMBL data is manually curated but verify
data_validity_comment fields
- Check
potential_duplicate flags when aggregating results
- Use
pchembl_value for normalized comparisons across assay types
- Activity values without
standard_units should be used cautiously
Best Practices
- Use caching - Reduces API load and improves performance
- Filter early - Apply filters to reduce data transfer
- Limit results - Use
[:n] slicing for testing
- Check validity - Inspect
data_validity_comment fields
- Use pchembl_value - Normalized values enable cross-assay comparison
- Batch queries - Use
__in operator for multiple IDs
Error Handling
from chembl_webresource_client.new_client import new_client
mol = new_client.molecule
try:
result = mol.get('INVALID_ID')
except Exception as e:
if '404' in str(e):
print("Compound not found")
elif '503' in str(e):
print("Service unavailable - retry later")
else:
raise
External Links