Medchem Rules
Medchem rules can be useful to quickly flag or triage compounds that do not meet basic molecular property criteria (such as MW, TPSA, cLogP, etc.).
You can find more information about each rule and its origin in its docstring or in the API documentation.
This tutorial shows how to apply those rules to a list of molecules.
??? warning: Do not apply medchem filters blindly. Depending on your application, you may discard valuable compounds or let toxic ones through.
import datamol as dm
import pandas as pd
import medchem as mc
Filtering with rules
You can list all the available rules.
mc.rules.RuleFilters.list_available_rules()
|  | name | rules | description |
---|---|---|---|
0 | rule_of_five | MW <= 500 & logP <= 5 & HBD <= 5 & HBA <= 10 | leadlike;druglike;small molecule;library design |
1 | rule_of_five_beyond | MW <= 1000 & logP in [-2, 10] & HBD <= 6 & HBA... | leadlike;druglike;small molecule;library design |
2 | rule_of_four | MW >= 400 & logP >= 4 & RINGS >=4 & HBA >= 4 | PPI inhibitor;druglike |
3 | rule_of_three | MW <= 300 & logP <= 3 & HBA <= 3 & HBD <= 3 & ... | fragment;building block |
4 | rule_of_three_extended | MW <= 300 & logP in [-3, 3] & HBA <= 6 & HBD <... | fragment;building block |
5 | rule_of_two | MW <= 200 & logP <= 2 & HBA <= 4 & HBD <= 2 | fragment;reagent;building block |
6 | rule_of_ghose | MW in [160, 480] & logP in [-0.4, 5.6] & Natom... | leadlike;druglike;small molecule;library design |
7 | rule_of_veber | rotatable bond <= 10 & TPSA < 140 | druglike;leadlike;small molecule;oral |
8 | rule_of_reos | MW in [200, 500] & logP in [-5, 5] & HBA in [0... | druglike;small molecule;library design;HTS |
9 | rule_of_chemaxon_druglikeness | MW < 400 & logP < 5 & HBA <= 10 & HBD <= 5 & r... | leadlike;druglike;small molecule |
10 | rule_of_egan | TPSA in [0, 132] & logP in [-1, 6] | druglike;small molecule;admet;absorption;perme... |
11 | rule_of_pfizer_3_75 | not (TPSA < 75 & logP > 3) | druglike;toxicity;invivo;small molecule |
12 | rule_of_gsk_4_400 | MW <= 400 & logP <= 4 | druglike;admet;small molecule |
13 | rule_of_oprea | HBD in [0, 2] & HBA in [2, 9] & ROTBONDS in [2... | druglike;small molecule |
14 | rule_of_xu | HBD <= 5 & HBA <= 10 & ROTBONDS in [2, 35] & R... | druglike;small molecule;library design |
15 | rule_of_cns | MW in [135, 582] & logP in [-0.2, 6.1] & TPSA ... | druglike;CNS;BBB;small molecule |
16 | rule_of_respiratory | MW in [240, 520] & logP in [-2, 4.7] & HBONDS... | druglike;respiratory;small molecule;nasal;inha... |
17 | rule_of_zinc | MW in [60, 600] & logP < in [-4, 6] & HBD <= 6... | druglike;small molecule;library design;zinc |
18 | rule_of_leadlike_soft | MW in [150, 400] & logP < in [-3, 4] & HBD <= ... | leadlike;small molecule;library design;admet |
19 | rule_of_druglike_soft | MW in [100, 600] & logP < in [-3, 6] & HBD <= ... | druglike;small molecule;library design |
20 | rule_of_generative_design | MW in [200, 600] & logP < in [-3, 6] & HBD <= ... | druglike;small molecule;de novo design;generat... |
21 | rule_of_generative_design_strict | MW in [200, 600] & logP < in [-3, 6] & HBD <= ... | druglike;small molecule;de novo design;generat... |
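Each rule in the table is a boolean combination of descriptor thresholds. To make that concrete, the sketch below re-expresses two of the listed definitions (rule_of_five and rule_of_pfizer_3_75) as plain Python predicates. It assumes the descriptors have already been computed and stored in a dict with hypothetical keys (MW, logP, HBD, HBA, TPSA); the descriptor values are made up for illustration. This is not medchem's implementation, which computes the descriptors from an RDKit molecule itself.

```python
# Illustrative only: plain-Python versions of two rule definitions from the table above.
# Assumes descriptors were precomputed into a dict; this is NOT medchem's own code.


def rule_of_five(props: dict) -> bool:
    # "MW <= 500 & logP <= 5 & HBD <= 5 & HBA <= 10"
    return (
        props["MW"] <= 500
        and props["logP"] <= 5
        and props["HBD"] <= 5
        and props["HBA"] <= 10
    )


def rule_of_pfizer_3_75(props: dict) -> bool:
    # "not (TPSA < 75 & logP > 3)": fails compounds that are both lipophilic and low-TPSA
    return not (props["TPSA"] < 75 and props["logP"] > 3)


# Hypothetical descriptor values, chosen only to exercise both predicates.
greasy = {"MW": 420.0, "logP": 4.2, "HBD": 1, "HBA": 4, "TPSA": 40.0}
polar = {"MW": 610.0, "logP": 1.0, "HBD": 6, "HBA": 12, "TPSA": 160.0}

for name, props in [("greasy", greasy), ("polar", polar)]:
    print(name, rule_of_five(props), rule_of_pfizer_3_75(props))
# greasy True False
# polar False True
```

Note that the two rules can disagree on the same compound, which is why it usually makes sense to apply several of them and inspect the per-rule results rather than a single verdict.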
You can also filter the rules using the tags in their descriptions:
mc.rules.RuleFilters.list_available_rules("building block")
|  | name | rules | description |
---|---|---|---|
3 | rule_of_three | MW <= 300 & logP <= 3 & HBA <= 3 & HBD <= 3 & ... | fragment;building block |
4 | rule_of_three_extended | MW <= 300 & logP in [-3, 3] & HBA <= 6 & HBD <... | fragment;building block |
5 | rule_of_two | MW <= 200 & logP <= 2 & HBA <= 4 & HBD <= 2 | fragment;reagent;building block |
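The tags are the ";"-separated entries of the description column. As a rough sketch of how such a tag query can work, the code below matches a query against whole tags, using a hypothetical RULE_DESCRIPTIONS dict copied from a few rows of the table above. This is one plausible approach and not necessarily how list_available_rules matches its argument (it may, for example, use substring matching instead).

```python
# A minimal sketch of tag-based filtering, assuming tags are the ";"-separated
# entries of the description column. Not necessarily medchem's matching logic.
RULE_DESCRIPTIONS = {
    "rule_of_five": "leadlike;druglike;small molecule;library design",
    "rule_of_three": "fragment;building block",
    "rule_of_three_extended": "fragment;building block",
    "rule_of_two": "fragment;reagent;building block",
    "rule_of_veber": "druglike;leadlike;small molecule;oral",
}


def rules_with_tag(tag: str, descriptions: dict) -> list:
    """Return the names of rules whose description contains `tag` as a whole tag."""
    return [
        name
        for name, desc in descriptions.items()
        if tag in (t.strip() for t in desc.split(";"))
    ]


print(rules_with_tag("building block", RULE_DESCRIPTIONS))
# ['rule_of_three', 'rule_of_three_extended', 'rule_of_two']
```

With whole-tag matching, a partial query such as "block" matches nothing; a substring-based implementation would behave differently.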
Given a list of rules, you can create a RuleFilters object to filter a list of molecules.
# Create the filter object
rfilter = mc.rules.RuleFilters(
    # You can specify a rule as a string or as a callable
    rule_list=["rule_of_five", "rule_of_oprea", "rule_of_cns", "rule_of_leadlike_soft"],
    # You can specify a custom name for each rule
    rule_list_names=["rule_of_five", "rule_of_oprea", "rule_of_cns", "rule_of_leadlike_soft"],
)
# Load a dataset
data = dm.data.solubility()
data = data.sample(50, random_state=20)
dm.to_image(data.iloc[:8]["mol"].tolist(), mol_size=(300, 250))
Apply the rule filters to the list of molecules.
results = rfilter(
    mols=data["mol"].tolist(),
    n_jobs=-1,  # use all available CPU cores
    progress=True,
    progress_leave=True,
    scheduler="auto",
    keep_props=False,
    fail_if_invalid=True,
)
results.head()
|  | mol | pass_all | pass_any | rule_of_five | rule_of_oprea | rule_of_cns | rule_of_leadlike_soft |
---|---|---|---|---|---|---|---|
0 | <rdkit.Chem.rdchem.Mol object at 0x15e7597e0> | False | True | True | False | False | False |
1 | <rdkit.Chem.rdchem.Mol object at 0x15e90d000> | False | True | True | False | True | True |
2 | <rdkit.Chem.rdchem.Mol object at 0x15e75f530> | False | True | True | False | False | False |
3 | <rdkit.Chem.rdchem.Mol object at 0x15e75d8c0> | True | True | True | True | True | True |
4 | <rdkit.Chem.rdchem.Mol object at 0x15e766030> | False | True | True | False | True | True |
You will notice that the pass_all and pass_any columns indicate whether a molecule passed all of the rules or at least one of them.
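In other words, pass_all is the logical AND of the per-rule columns and pass_any is their logical OR. The sketch below recomputes both values for rows 0, 1 and 3 of the results table above, with each row transcribed by hand into a plain dict (the summarize helper and the dict layout are illustrative, not part of medchem).

```python
# Recomputes pass_all / pass_any from the per-rule columns of rows 0, 1 and 3
# of the results table above (values transcribed by hand).
RULE_COLUMNS = ["rule_of_five", "rule_of_oprea", "rule_of_cns", "rule_of_leadlike_soft"]

rows = {
    0: {"rule_of_five": True, "rule_of_oprea": False, "rule_of_cns": False, "rule_of_leadlike_soft": False},
    1: {"rule_of_five": True, "rule_of_oprea": False, "rule_of_cns": True, "rule_of_leadlike_soft": True},
    3: {"rule_of_five": True, "rule_of_oprea": True, "rule_of_cns": True, "rule_of_leadlike_soft": True},
}


def summarize(row: dict) -> tuple:
    """Return (pass_all, pass_any) for one row of per-rule booleans."""
    pass_all = all(row[c] for c in RULE_COLUMNS)  # every rule passed
    pass_any = any(row[c] for c in RULE_COLUMNS)  # at least one rule passed
    return pass_all, pass_any


for idx, row in rows.items():
    print(idx, *summarize(row))
# 0 False True
# 1 False True
# 3 True True
```

On the pandas DataFrame itself, the same aggregation corresponds to results[RULE_COLUMNS].all(axis=1) and results[RULE_COLUMNS].any(axis=1).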
Display the first eight molecules, annotated with their pass_all and pass_any values.
rows = results.iloc[:8]
mols = rows["mol"].tolist()
# Access the values by label: positional x[0] on a labeled Series is deprecated in recent pandas
legends = rows[["pass_all", "pass_any"]].apply(lambda x: f"pass_all={x['pass_all']} - pass_any={x['pass_any']}", axis=1).tolist()
dm.to_image(mols, legends=legends, mol_size=(300, 250))
Low level API
You can also use the low-level API to apply individual rules directly to molecules.
List the available rules by name.
rule_names = mc.rules.RuleFilters.list_available_rules()["name"].tolist()
rule_names
['rule_of_five', 'rule_of_five_beyond', 'rule_of_four', 'rule_of_three', 'rule_of_three_extended', 'rule_of_two', 'rule_of_ghose', 'rule_of_veber', 'rule_of_reos', 'rule_of_chemaxon_druglikeness', 'rule_of_egan', 'rule_of_pfizer_3_75', 'rule_of_gsk_4_400', 'rule_of_oprea', 'rule_of_xu', 'rule_of_cns', 'rule_of_respiratory', 'rule_of_zinc', 'rule_of_leadlike_soft', 'rule_of_druglike_soft', 'rule_of_generative_design', 'rule_of_generative_design_strict']
The rules are available under mc.rules.basic_rules.*. You can retrieve a rule either:
- by direct attribute access: mc.rules.basic_rules.rule_of_five
- by its name: getattr(mc.rules.basic_rules, "rule_of_five")
Let's do it!
# Get some rules
rule_1 = mc.rules.basic_rules.rule_of_five
rule_2 = getattr(mc.rules.basic_rules, "rule_of_chemaxon_druglikeness")
# Build a molecule from its SMILES (diphenhydramine)
mol = dm.to_mol("CN(C)CCOC(C1=CC=CC=C1)C1=CC=CC=C1")
mol
# Apply rule #1
rule_1(mol)
True
# Apply rule #2
rule_2(mol)
False
-- The End :-)