Molecular Complexity
The ComplexityFilter lets you filter molecules according to a structural complexity metric. Structural complexity is often a good proxy for synthetic accessibility and lead-likeness, depending on your discovery stage.
Warning: Avoid blindly applying Medchem filters; you may discard valuable compounds or let toxic ones through in your specific application.
In [2]:
import datamol as dm
import pandas as pd
import medchem as mc
Available filters
In [3]:
mc.complexity.ComplexityFilter.list_default_available_filters()
Out[3]:
['bertz', 'sas', 'qed', 'clogp', 'whitlock', 'barone', 'smcm', 'twc']
The complexity filter applies percentile-based thresholds to the computed metrics: it discards molecules that would be outliers on those metrics relative to a very large catalog of commercially available molecules.
In [4]:
# The default percentiles available for filtering are the following
mc.complexity.ComplexityFilter.list_default_percentile()
Out[4]:
['99', '999', 'max']
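To make the percentile idea concrete, here is a minimal sketch, independent of Medchem, of how such cutoffs could be derived from a reference distribution. The synthetic `scores` array stands in for a complexity metric computed over a large catalog, and the `passes` helper is hypothetical. Reading '99' as the 99th percentile and '999' as the 99.9th percentile is an assumption based on the labels, not something this page confirms.

```python
import numpy as np

# Synthetic stand-in for a complexity metric computed on a large reference catalog.
rng = np.random.default_rng(0)
scores = rng.lognormal(mean=3.0, sigma=0.5, size=100_000)

# Assumed reading of the labels: "99" -> 99th percentile, "999" -> 99.9th percentile.
thresholds = {
    "99": float(np.percentile(scores, 99)),
    "999": float(np.percentile(scores, 99.9)),
    "max": float(scores.max()),
}


def passes(value, limit="99"):
    """Return True when the value falls strictly below the chosen cutoff."""
    return value < thresholds[limit]


# Fraction of the reference set that each cutoff would keep.
kept = {name: float((scores < cutoff).mean()) for name, cutoff in thresholds.items()}
```

A stricter limit such as '99' rejects more of the tail, while 'max' only rejects values at or beyond anything seen in the reference set.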
In [5]:
# You can also have a look at the file containing
# the computed statistics per metric
mc.complexity.ComplexityFilter.load_threshold_stats_file().head()
Out[5]:
| | bertz | whitlock | barone | smcm | mw_bins | percentile |
|---|---|---|---|---|---|---|
| 0 | 257.0 | 14.0 | 234.0 | 21.7 | 150.0 | 99 |
| 1 | 394.0 | 17.0 | 309.0 | 28.8 | 200.0 | 99 |
| 2 | 525.0 | 20.0 | 384.0 | 35.0 | 250.0 | 99 |
| 3 | 679.0 | 23.0 | 462.0 | 40.2 | 300.0 | 99 |
| 4 | 864.0 | 26.0 | 540.0 | 44.0 | 350.0 | 99 |
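The statistics are stored per molecular-weight bin, so a threshold depends on the weight of the molecule being tested. The sketch below illustrates one way such a lookup could work, using only the five rows shown above. The helper names `whitlock_threshold` and `is_simple_enough` are hypothetical, and both the binning rule (treating `mw_bins` as upper bin edges) and the strict comparison are assumptions; the actual Medchem logic may differ.

```python
import numpy as np
import pandas as pd

# The five rows displayed above (99th percentile), restricted to two columns.
stats = pd.DataFrame(
    {
        "whitlock": [14.0, 17.0, 20.0, 23.0, 26.0],
        "mw_bins": [150.0, 200.0, 250.0, 300.0, 350.0],
    }
)


def whitlock_threshold(mw):
    """Look up the cutoff of the bin whose upper edge covers `mw` (assumed rule)."""
    idx = int(np.digitize(mw, stats["mw_bins"].to_numpy(), right=True))
    idx = min(idx, len(stats) - 1)  # clamp weights above the last bin shown here
    return float(stats["whitlock"].iloc[idx])


def is_simple_enough(mw, whitlock_score):
    """Return True when the score is strictly below the cutoff for this weight."""
    return whitlock_score < whitlock_threshold(mw)
```

Under these assumptions, a molecule weighing 180 Da falls in the 200 Da bin and is compared against a Whitlock cutoff of 17.0.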
Usage
Load some molecules.
In [6]:
data = dm.data.cdk2()
data = data.iloc[:8]
# Let's remove the conformers since they are not important here.
data["mol"].apply(lambda x: x.RemoveAllConformers())
dm.to_image(data["mol"].tolist(), mol_size=(300, 200))
Out[6]:
Load the complexity filter.
In [7]:
cfilter = mc.complexity.ComplexityFilter(threshold_stats_file="zinc_12", complexity_metric="whitlock")
cfilter.complexity_metric
Out[7]:
'whitlock'
Apply the filter to our list of molecules. True means the molecule passes the filter and False means the molecule is too complex.
In [8]:
data["pass_cfilter"] = data["mol"].apply(cfilter)
data["pass_cfilter"]
Out[8]:
0     True
1    False
2    False
3     True
4    False
5     True
6     True
7     True
Name: pass_cfilter, dtype: bool
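The boolean column can then be used as a mask to split the set. The sketch below rebuilds the result as a standalone pandas Series with the values copied from the output above, so it runs without Medchem; the variable names are illustrative only. With the real DataFrame, `data[data["pass_cfilter"]]` would keep the passing rows in the same way.

```python
import pandas as pd

# Boolean results copied from the output above.
pass_cfilter = pd.Series(
    [True, False, False, True, False, True, True, True], name="pass_cfilter"
)

# Count the molecules that pass, and split the row labels by outcome.
n_pass = int(pass_cfilter.sum())
kept_index = pass_cfilter[pass_cfilter].index.tolist()
rejected_index = pass_cfilter[~pass_cfilter].index.tolist()
```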
In [9]:
legends = data["pass_cfilter"].apply(lambda x: f"Pass={x}").tolist()
dm.to_image(data["mol"].tolist(), legends=legends, mol_size=(300, 200))
Out[9]:
-- The End :-)