Structural Alert Filters¶
Structural filters can be used to triage or flag compounds that contains a specific molecular pattern or featre that can often be unwanted or undesired. Medchem proposes 3 types of structural filters:
- Common Alerts: a list of filters curated and aggregated from ChEMBL and the public litterature. The majority of this list has been curated by Patrick Walters and proposed at https://github.com/PatWalters/rd_filters.
- NIBR Filters: Novartis screening deck originally proposed in Evolution of Novartis’ Small Molecule Screening Deck Design.
- Eli Lilly Demerits Filters: A set of 275 rules used to identify compounds that may interfere with biological assays. Originally proposed in Rules for Identifying Potentially Reactive or Promiscuous Compounds.
In [1]:
Copied!
import datamol as dm
import pandas as pd
import medchem as mc
import datamol as dm
import pandas as pd
import medchem as mc
Common Alerts¶
You can list all the available default filters.
In [ ]:
Copied!
mc.structural.CommonAlertsFilters.list_default_available_alerts()
mc.structural.CommonAlertsFilters.list_default_available_alerts()
Out[ ]:
rule_set_name | smarts | catalog_description | rule_set | source | |
---|---|---|---|---|---|
0 | Glaxo | 55 | Glaxo Wellcome Hard filters | 1 | ChEMBL |
1 | Dundee | 105 | University of Dundee NTD Screening Library Fil... | 2 | ChEMBL |
2 | BMS | 180 | Bristol-Myers Squibb HTS Deck filters | 3 | ChEMBL |
3 | PAINS | 481 | PAINS filters | 4 | ChEMBL |
4 | SureChEMBL | 166 | SureChEMBL Non-MedChem Friendly SMARTS | 5 | ChEMBL |
5 | MLSMR | 116 | NIH MLSMR Excluded Functionality filters (MLSMR) | 6 | ChEMBL |
6 | Inpharmatica | 91 | Unwanted fragments derived by Inpharmatica Ltd. | 7 | ChEMBL |
7 | LINT | 57 | Pfizer lint filters (lint) | 8 | ChEMBL |
8 | Alarm-NMR | 75 | Reactive False Positives in Biochemical Screen... | 9 | Litterature |
9 | AlphaScreen-Hitters | 6 | Structural filters for compounds that may be a... | 10 | Litterature |
10 | GST-Hitters | 34 | Structural filters for compounds may prevent G... | 11 | Litterature |
11 | HIS-Hitters | 19 | Structural filters for compounds prevents the ... | 12 | Litterature |
12 | LuciferaseInhibitor | 3 | Structural filters for compounds that may inhi... | 13 | Litterature |
13 | DNABinder | 78 | Structural filters for compounds that may bind... | 14 | Litterature |
14 | Chelator | 55 | Structural filters for compounds that may inhi... | 15 | Litterature |
15 | Frequent-Hitter | 15 | Structural filters for compounds that are freq... | 16 | Litterature |
16 | Electrophilic | 119 | Structural filters for compounds that could ta... | 17 | Litterature |
17 | Genotoxic-Carcinogenicity | 117 | Structural filters for compounds that may caus... | 18 | Litterature |
18 | LD50-Oral | 20 | Structural filters for compounds that may caus... | 19 | Litterature |
19 | Non-Genotoxic-Carcinogenicity | 22 | Structural filters for compounds that may caus... | 20 | Litterature |
20 | Reactive-Unstable-Toxic | 335 | General very reactive/unstable or Toxic compounds | 21 | Litterature |
21 | Skin | 155 | Skin Sensitization filters (irritables) | 22 | Litterature |
22 | Toxicophore | 154 | General Toxicophores | 23 | Litterature |
Create a CommonAlertsFilters
object in order to filter a list of molecules.
In [ ]:
Copied!
alerts = mc.structural.CommonAlertsFilters()
alerts = mc.structural.CommonAlertsFilters()
By default only the "BMS" set is used but you can specify your own set(s):
In [ ]:
Copied!
alerts2 = mc.structural.CommonAlertsFilters(alerts_set=["LINT", "Toxicophore"])
alerts2 =
mc.structural.CommonAlertsFilters(alerts_set=["LINT", "Toxicophore"])
Let's load a few molecules.
In [ ]:
Copied!
# Load a dataset
data = dm.data.solubility()
data = data.sample(50, random_state=20)
dm.to_image(data.iloc[:8]["mol"].tolist(), mol_size=(300, 250))
# Load a dataset
data = dm.data.solubility()
data = data.sample(50, random_state=20)
dm.to_image(data.iloc[:8]["mol"].tolist(), mol_size=(300, 250))
Out[ ]:
Apply the filters on the list of molecules.
In [ ]:
Copied!
results = alerts(
mols=data["mol"].tolist(),
n_jobs=-1,
progress=True,
progress_leave=True,
scheduler="auto",
)
results.head()
results = alerts(
mols=data["mol"].tolist(),
n_jobs=-1,
progress=True,
progress_leave=True,
scheduler="auto",
)
results.head()
Filter by alerts: 0%| | 0/50 [00:00<?, ?it/s]
Out[ ]:
mol | pass_filter | status | reasons | |
---|---|---|---|---|
0 | <rdkit.Chem.rdchem.Mol object at 0x7f4e549e5690> | True | ok | None |
1 | <rdkit.Chem.rdchem.Mol object at 0x7f4e53fc5d20> | False | exclude | Polycyclic aromatic hydrocarbon |
2 | <rdkit.Chem.rdchem.Mol object at 0x7f4e549ccba0> | False | exclude | aniline |
3 | <rdkit.Chem.rdchem.Mol object at 0x7f4e549ce8f0> | True | ok | None |
4 | <rdkit.Chem.rdchem.Mol object at 0x7f4e549b9ee0> | True | ok | None |
Display the results.
In [ ]:
Copied!
rows = results.iloc[:8]
mols = rows["mol"].iloc[:8].tolist()
legends = (
rows[["pass_filter", "reasons"]].apply(lambda x: f"pass_filter={x[0]}\nreasons={x[1]}", axis=1).tolist()
)
dm.to_image(mols, legends=legends, mol_size=(300, 250))
rows = results.iloc[:8]
mols = rows["mol"].iloc[:8].tolist()
legends = (
rows[["pass_filter", "reasons"]].apply(lambda x: f"pass_filter={x[0]}\nreasons={x[1]}",
axis=1).tolist()
)
dm.to_image(mols, legends=legends, mol_size=(300, 250))
Out[ ]:
NIBR Filters¶
Load the NIBR filters.
In [ ]:
Copied!
nibr_filters = mc.structural.NIBRFilters()
nibr_filters = mc.structural.NIBRFilters()
Let's load a few molecules.
In [ ]:
Copied!
# Load a dataset
data = dm.data.solubility()
data = data.sample(50, random_state=20)
dm.to_image(data.iloc[:8]["mol"].tolist(), mol_size=(300, 250))
# Load a dataset
data = dm.data.solubility()
data = data.sample(50, random_state=20)
dm.to_image(data.iloc[:8]["mol"].tolist(), mol_size=(300, 250))
Out[ ]:
Apply the filters on the list of molecules.
In [ ]:
Copied!
results = nibr_filters(
mols=data["mol"].tolist(),
n_jobs=-1,
progress=True,
progress_leave=True,
scheduler="threads",
keep_details=True,
)
results.head()
results = nibr_filters(
mols=data["mol"].tolist(),
n_jobs=-1,
progress=True,
progress_leave=True,
scheduler="threads",
keep_details=True,
)
results.head()
NIBR filtering: 0%| | 0/50 [00:00<?, ?it/s]
Out[ ]:
mol | reasons | severity | status | n_covalent_motif | special_mol | pass_filter | details | |
---|---|---|---|---|---|---|---|---|
0 | <rdkit.Chem.rdchem.Mol object at 0x7f4e517642e0> | ketals _or_acetals_min(1); steroid_non_arom_mi... | 10 | exclude | 0 | 2 | False | {'name': {0: 'ketals _or_acetals_min(1)', 1: '... |
1 | <rdkit.Chem.rdchem.Mol object at 0x7f4e515b71b0> | polycyclic_systems_14_atoms_min(1) | 0 | annotations | 0 | 1 | True | {'name': {0: 'polycyclic_systems_14_atoms_min(... |
2 | <rdkit.Chem.rdchem.Mol object at 0x7f4e5176e030> | None | 0 | ok | 0 | 0 | True | NaN |
3 | <rdkit.Chem.rdchem.Mol object at 0x7f4e5176c3c0> | None | 0 | ok | 0 | 0 | True | NaN |
4 | <rdkit.Chem.rdchem.Mol object at 0x7f4e515bcb30> | None | 0 | ok | 0 | 0 | True | NaN |
In [ ]:
Copied!
results.columns.tolist()
results.columns.tolist()
Out[ ]:
['mol', 'reasons', 'severity', 'status', 'n_covalent_motif', 'special_mol', 'pass_filter', 'details']
Display the results.
In [ ]:
Copied!
rows = results.iloc[:8]
mols = rows["mol"].iloc[:8].tolist()
legends = (
rows[["pass_filter", "status"]].apply(lambda x: f"pass_filter={x[0]} - status={x[1]}", axis=1).tolist()
)
dm.to_image(mols, legends=legends, mol_size=(300, 250))
rows = results.iloc[:8]
mols = rows["mol"].iloc[:8].tolist()
legends = (
rows[["pass_filter", "status"]].apply(lambda x: f"pass_filter={x[0]} - status={x[1]}",
axis=1).tolist()
)
dm.to_image(mols, legends=legends, mol_size=(300, 250))
Out[ ]:
Eli Lilly Demerits Filters¶
In [ ]:
Copied!
# Load a dataset
data = dm.data.solubility()
data = data.sample(50, random_state=20)
dm.to_image(data.iloc[:8]["mol"].tolist(), mol_size=(300, 250))
# Load a dataset
data = dm.data.solubility()
data = data.sample(50, random_state=20)
dm.to_image(data.iloc[:8]["mol"].tolist(), mol_size=(300, 250))
Out[ ]:
In [ ]:
Copied!
from medchem.structural.lilly_demerits import LillyDemeritsFilters
dfilter = LillyDemeritsFilters()
from medchem.structural.lilly_demerits import
LillyDemeritsFilters
dfilter = LillyDemeritsFilters()
In [ ]:
Copied!
results = dfilter(
mols=data["mol"].tolist(),
n_jobs=-1,
progress=True,
progress_leave=True,
scheduler="threads",
)
results.head()
results = dfilter(
mols=data["mol"].tolist(),
n_jobs=-1,
progress=True,
progress_leave=True,
scheduler="threads",
)
results.head()
Out[ ]:
smiles | reasons | step | demerit_score | status | pass_filter | mol | |
---|---|---|---|---|---|---|---|
0 | CC1(C)OC2CC3C4CCC5=CC(=O)C=CC5(C)C4(F)C(O)CC3(... | michael_rejected | 2 | NaN | exclude | False | <rdkit.Chem.rdchem.Mol object at 0x7f58d6f23530> |
1 | C1=CC2=C(C=C1)C1=NC=CN=C1C=C2 | phenanthrene_het:D60 | 2 | 60.0 | flag | True | <rdkit.Chem.rdchem.Mol object at 0x7f58d66f9c40> |
2 | NC1=CC=CC=C1Cl | aniline_h_newd:D50,aniline_h_ewd:D10 | 2 | 60.0 | flag | True | <rdkit.Chem.rdchem.Mol object at 0x7f5a7da61ee0> |
3 | O=C(O)CCC(=O)C1=CC=C(C2=CC=CC=C2)C=C1 | NaN | 2 | 0.0 | ok | True | <rdkit.Chem.rdchem.Mol object at 0x7f58d6da4c10> |
4 | CC(C)N1C(=O)C2=CC=CC=C2NS1(=O)=O | NaN | 2 | 0.0 | ok | True | <rdkit.Chem.rdchem.Mol object at 0x7f58d5a5b610> |
-- The End :-)