ChEMBL Database

ChEMBL is the European Bioinformatics Institute's repository of bioactive compound data, containing over 2 million compounds, 19 million bioactivity measurements, and 13,000+ drug targets.

Use Cases

Find potent inhibitors for a protein target
Search for compounds similar to a known drug
Retrieve drug mechanism of action data
Filter compounds by molecular properties (Lipinski, etc.)
Export bioactivity data for ML or analysis

Installation

uv pip install chembl_webresource_client

Basic Usage

from chembl_webresource_client.new_client import new_client

# Fetch compound by identifier
mol = new_client.molecule.get('CHEMBL192')

# Retrieve target data
tgt = new_client.target.get('CHEMBL203')

# Query activity measurements (IC50 <= 50 nM)
acts = new_client.activity.filter(
    target_chembl_id='CHEMBL203',
    standard_type='IC50',
    standard_units='nM',
    standard_value__lte=50
)

Available Endpoints

Resource	Description
`molecule`	Compound structures and properties
`target`	Biological targets
`activity`	Bioassay measurements
`assay`	Experimental protocols
`drug`	Approved drug data
`mechanism`	Drug mechanisms of action
`drug_indication`	Therapeutic indications
`similarity`	Structure similarity search
`substructure`	Substructure search
`document`	Literature references
`cell_line`	Cell line data
`protein_class`	Protein classifications
`image`	SVG molecular images

Query Operators

The client uses Django-style filtering:

Operator	Function	Example
`__exact`	Exact match	`pref_name__exact='Aspirin'`
`__icontains`	Case-insensitive substring	`pref_name__icontains='kinase'`
`__lte`, `__gte`	Less/greater than or equal	`standard_value__lte=10`
`__lt`, `__gt`	Less/greater than	`pchembl_value__gt=7`
`__range`	Value within range (inclusive)	`molecule_properties__alogp__range=[-1, 5]`
`__in`	Value in list	`target_chembl_id__in=['CHEMBL203']`
`__isnull`	Null check	`pchembl_value__isnull=False`
`__startswith`	Prefix match	`pref_name__startswith='Proto'`
`__regex`	Regular expression	`pref_name__regex='^[A-Z]{3}'`
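
To make the suffix convention concrete, the sketch below applies a subset of these lookups to plain dictionaries. It is only an illustration: the real filtering happens on the ChEMBL server, and the helper `matches` is a hypothetical name that is not part of the client. It handles flat fields only, not nested ones such as `molecule_properties__alogp`.

```python
import re

# Hypothetical local mirror of a few lookup suffixes (not the client's code).
_OPS = {
    "exact": lambda v, arg: v == arg,
    "icontains": lambda v, arg: arg.lower() in str(v).lower(),
    "lte": lambda v, arg: v <= arg,
    "gte": lambda v, arg: v >= arg,
    "lt": lambda v, arg: v < arg,
    "gt": lambda v, arg: v > arg,
    "range": lambda v, arg: arg[0] <= v <= arg[1],
    "in": lambda v, arg: v in arg,
    "isnull": lambda v, arg: (v is None) == arg,
    "startswith": lambda v, arg: str(v).startswith(arg),
    "regex": lambda v, arg: re.search(arg, str(v)) is not None,
}


def matches(record, **lookups):
    """Return True if `record` satisfies every `field__op=value` lookup."""
    for key, arg in lookups.items():
        field, sep, op = key.rpartition("__")
        if not sep or op not in _OPS:
            # No recognised suffix: treat the whole key as an exact match.
            field, op = key, "exact"
        value = record.get(field)
        # A missing value fails every comparison except the null check.
        if value is None and op != "isnull":
            return False
        if not _OPS[op](value, arg):
            return False
    return True
```

Several lookups passed together are combined with AND, which is also how multiple keyword arguments to `filter()` are used in the examples in this document.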

Common Workflows

Find Target Inhibitors

from chembl_webresource_client.new_client import new_client

activity = new_client.activity

# Get potent BRAF inhibitors (IC50 <= 100 nM)
braf_hits = activity.filter(
    target_chembl_id='CHEMBL5145',
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)

for hit in braf_hits:
    print(f"{hit['molecule_chembl_id']}: {hit['standard_value']} nM")
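
A nanomolar cutoff like the one above can also be expressed on the -log10 scale used by `pchembl_value`. The sketch below shows the arithmetic; `nm_to_p` is a hypothetical helper name, and it assumes the input is already a number in nM (the API may return `standard_value` as a string, in which case wrap it in `float()` first).

```python
import math


def nm_to_p(value_nm: float) -> float:
    """Convert a concentration in nM to its -log10 molar value."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm)
    return 9.0 - math.log10(value_nm)
```

Under this conversion, an IC50 of 100 nM corresponds to a value of 7, so `standard_value__lte=100` with units of nM covers roughly the same potency range as `pchembl_value__gte=7`.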

Search by Target Name

from chembl_webresource_client.new_client import new_client

target = new_client.target
activity = new_client.activity

# Find CDK targets
cdk_targets = target.filter(
    pref_name__icontains='cyclin-dependent kinase',
    target_type='SINGLE PROTEIN'
)

target_ids = [t['target_chembl_id'] for t in cdk_targets]

# Get activities for these targets
cdk_activities = activity.filter(
    target_chembl_id__in=target_ids[:5],
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)

Structure Similarity Search

from chembl_webresource_client.new_client import new_client

sim = new_client.similarity

# Find molecules at least 80% similar to ibuprofen
ibuprofen_smiles = 'CC(C)Cc1ccc(cc1)C(C)C(=O)O'
matches = sim.filter(smiles=ibuprofen_smiles, similarity=80)

for m in matches:
    print(f"{m['molecule_chembl_id']}: {m['similarity']}%")

Substructure Search

from chembl_webresource_client.new_client import new_client

sub = new_client.substructure

# Find compounds with benzimidazole core
benzimidazole = 'c1ccc2[nH]cnc2c1'
compounds = sub.filter(smiles=benzimidazole)

Filter by Molecular Properties

from chembl_webresource_client.new_client import new_client

mol = new_client.molecule

# Rule-of-three fragments (MW <= 300, ALogP <= 3, HBD <= 3, HBA <= 3)
fragments = mol.filter(
    molecule_properties__mw_freebase__lte=300,
    molecule_properties__alogp__lte=3,
    molecule_properties__hbd__lte=3,
    molecule_properties__hba__lte=3
)
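
The same property fields can be checked locally once records have been fetched. As a minimal sketch, the function below counts Lipinski Rule-of-Five violations (MW > 500, logP > 5, HBD > 5, HBA > 10) from a `molecule_properties` dictionary. The name `ro5_violations` is hypothetical, `alogp` is used as the logP estimate, and values are coerced with `float()` on the assumption that the API may return them as strings.

```python
def ro5_violations(props: dict) -> int:
    """Count Lipinski Rule-of-Five violations in a molecule_properties dict."""
    limits = {"mw_freebase": 500, "alogp": 5, "hbd": 5, "hba": 10}
    count = 0
    for field, limit in limits.items():
        value = props.get(field)
        if value is None:
            continue  # missing property: not counted as a violation here
        if float(value) > limit:
            count += 1
    return count
```

ChEMBL already reports a precomputed count in `molecule_properties.num_ro5_violations`; a local function like this is mainly useful when the thresholds need to be changed.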

Drug Mechanisms of Action

from chembl_webresource_client.new_client import new_client

mech = new_client.mechanism
drug_ind = new_client.drug_indication

# Get mechanism of metformin
metformin_id = 'CHEMBL1431'
mechanisms = mech.filter(molecule_chembl_id=metformin_id)

for m in mechanisms:
    print(f"Target: {m['target_chembl_id']}")
    print(f"Action: {m['action_type']}")

# Get approved indications
indications = drug_ind.filter(molecule_chembl_id=metformin_id)

Generate Molecule Images

from chembl_webresource_client.new_client import new_client

img = new_client.image

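# Request SVG output, then fetch the image of caffeine
img.set_format('svg')
caffeine_svg = img.get('CHEMBL113')

# The payload may arrive as bytes or str depending on the client version
mode = 'wb' if isinstance(caffeine_svg, bytes) else 'w'
with open('caffeine.svg', mode) as f:
    f.write(caffeine_svg)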

Key Response Fields

Molecule Properties

Field	Description
`molecule_chembl_id`	ChEMBL identifier
`pref_name`	Preferred name
`molecule_structures.canonical_smiles`	SMILES string
`molecule_structures.standard_inchi_key`	InChI key
`molecule_properties.mw_freebase`	Molecular weight
`molecule_properties.alogp`	Calculated LogP
`molecule_properties.hba` / `hbd`	H-bond acceptors/donors
`molecule_properties.psa`	Polar surface area
`molecule_properties.rtb`	Rotatable bonds
`molecule_properties.num_ro5_violations`	Lipinski violations
`molecule_properties.qed_weighted`	QED drug-likeness
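
The dotted names in this table describe nesting: `molecule_structures` and `molecule_properties` are sub-dictionaries of the molecule record, and either may be absent or null for some entries. A small accessor along the following lines avoids repeated `None` checks; `get_path` is a hypothetical helper, not a client function, and the record in the usage note is illustrative.

```python
def get_path(record: dict, dotted: str, default=None):
    """Follow a dotted path such as 'molecule_properties.alogp' through nested dicts."""
    current = record
    for part in dotted.split("."):
        # Stop as soon as a level is missing or is not a dictionary.
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current
```

For example, `get_path(mol, 'molecule_structures.canonical_smiles')` returns the SMILES string when it is present and `None` when the record has no structure block.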

Activity Fields

Field	Description
`molecule_chembl_id`	Compound ID
`target_chembl_id`	Target ID
`standard_type`	Measurement type (IC50, Ki, EC50)
`standard_value`	Numeric value
`standard_units`	Units (nM, uM)
`pchembl_value`	-log10 of the molar activity value (e.g. 7 for 100 nM)
`data_validity_comment`	Quality flag
`potential_duplicate`	Duplicate indicator

Target Fields

Field	Description
`target_chembl_id`	ChEMBL target ID
`pref_name`	Preferred name
`target_type`	SINGLE PROTEIN, PROTEIN COMPLEX, etc.
`organism`	Species

Mechanism Fields

Field	Description
`molecule_chembl_id`	Drug ID
`target_chembl_id`	Target ID
`mechanism_of_action`	Description
`action_type`	INHIBITOR, AGONIST, ANTAGONIST, etc.

Export to DataFrame

import pandas as pd
from chembl_webresource_client.new_client import new_client

activity = new_client.activity

# Ki data for the dopamine D2 receptor (CHEMBL217)
results = activity.filter(
    target_chembl_id='CHEMBL217',
    standard_type='Ki',
    pchembl_value__isnull=False
)

df = pd.DataFrame(list(results))
df.to_csv('dopamine_d2_ligands.csv', index=False)

Configuration

from chembl_webresource_client.settings import Settings

cfg = Settings.Instance()

cfg.CACHING = True           # Enable response caching
cfg.CACHE_EXPIRE = 43200     # Cache TTL (12 hours)
cfg.TIMEOUT = 60             # Request timeout (seconds)
cfg.TOTAL_RETRIES = 5        # Retry attempts

Data Quality Notes

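ChEMBL data is manually curated, but records can still carry a warning in data_validity_comment; inspect it before trusting a value
Check the potential_duplicate flag when aggregating results, so repeated measurements are not counted twice
Use pchembl_value for comparisons across measurement types (IC50, Ki, EC50), since it puts them on one -log10 molar scale
Treat activity values without standard_units cautiously, since their magnitude cannot be interpreted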
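
The notes above can be combined into one cleaning step before aggregation. The following is a sketch under stated assumptions: activity records are plain dictionaries with the field names listed under Activity Fields, `potential_duplicate` may be encoded as `0`/`1`, `"0"`/`"1"`, or a boolean, and `pchembl_value` may arrive as a string. The function name `median_pchembl` is hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_pchembl(records):
    """Median pchembl_value per molecule, skipping flagged or unusable rows."""
    by_molecule = defaultdict(list)
    for rec in records:
        if rec.get("data_validity_comment"):
            continue  # a validity warning is attached to this measurement
        if str(rec.get("potential_duplicate") or "0") != "0":
            continue  # flagged as a likely duplicate
        if rec.get("pchembl_value") is None:
            continue  # no normalized value to aggregate
        by_molecule[rec["molecule_chembl_id"]].append(float(rec["pchembl_value"]))
    return {mol: median(vals) for mol, vals in by_molecule.items()}
```

The median is used here because it is less sensitive than the mean to a single outlying assay; that is a design choice for the sketch, not a ChEMBL recommendation.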

Best Practices

Use caching - Reduces API load and improves performance
Filter early - Apply filters to reduce data transfer
Limit results - Use [:n] slicing for testing
Check validity - Inspect data_validity_comment fields
Use pchembl_value - Normalized values enable cross-assay comparison
Batch queries - Use __in operator for multiple IDs
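
When batching with `__in`, a long ID list is easier to handle in fixed-size pieces. The generator below is a minimal sketch; the name `chunked` and the default batch size of 50 are arbitrary assumptions, not limits documented by ChEMBL.

```python
def chunked(ids, size=50):
    """Yield successive lists of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

Each chunk can then be passed to a filter call, for example as `target_chembl_id__in=chunk`, and the per-chunk results concatenated.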

Error Handling

from chembl_webresource_client.new_client import new_client

mol = new_client.molecule

try:
    result = mol.get('INVALID_ID')
except Exception as e:
    if '404' in str(e):
        print("Compound not found")
    elif '503' in str(e):
        print("Service unavailable - retry later")
    else:
        raise
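
For transient failures such as a 503, one option is to wrap the call in a retry loop with exponential backoff. The sketch below is generic and makes several assumptions: `with_retries` is a hypothetical helper, it reuses the string matching shown above to avoid retrying a 404, and the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time


def with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `fetch()` and retry with exponential backoff on failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:
            # Do not retry a missing record, and give up after the last attempt.
            if "404" in str(exc) or attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

A call such as `with_retries(lambda: mol.get('CHEMBL192'))` would then make up to three attempts. This works at the application level and is separate from the client's own `TOTAL_RETRIES` setting.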

External Links

ChEMBL: https://www.ebi.ac.uk/chembl/
API Documentation: https://chembl.gitbook.io/chembl-interface-documentation
Python Client: https://github.com/chembl/chembl_webresource_client
