Molecular Complexity

The ComplexityFilter lets you filter molecules according to a structural complexity metric. Depending on your discovery stage, such a metric is often a good proxy for synthetic accessibility and lead-likeness.

Warning: Avoid blindly applying medchem filters; depending on your application, you may discard valuable compounds or let toxic ones through.

In [2]:

import datamol as dm
import pandas as pd

import medchem as mc

Available filters

In [3]:

mc.complexity.ComplexityFilter.list_default_available_filters()

Out[3]:

['bertz', 'sas', 'qed', 'clogp', 'whitlock', 'barone', 'smcm', 'twc']

The complexity filter applies percentile-based thresholds to the computed metrics: it discards molecules that would be outliers on those metrics relative to a very large catalog of commercially available molecules.
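The idea behind percentile-based filtering can be sketched in a few lines of NumPy. This is a minimal illustration, not medchem's implementation: the reference scores are randomly generated stand-ins for a real catalog, and the pass rule (score at or below the 99th percentile) and the helper name passes_complexity are assumptions made for the example.

```python
import numpy as np

# Synthetic stand-in for complexity scores computed on a large reference catalog.
# These numbers are randomly generated for illustration; they are NOT real data.
rng = np.random.default_rng(seed=0)
reference_scores = rng.normal(loc=20.0, scale=4.0, size=100_000)

# The cutoff is the 99th percentile of the reference distribution:
# about 1% of the reference molecules score above it.
threshold_99 = float(np.percentile(reference_scores, 99))


def passes_complexity(score: float, threshold: float = threshold_99) -> bool:
    """Hypothetical rule: a molecule passes if it is not an outlier (score <= threshold)."""
    return score <= threshold


print(f"99th percentile threshold: {threshold_99:.1f}")
print(passes_complexity(22.0))  # a typical score
print(passes_complexity(40.0))  # a score far in the upper tail
```

Raising the percentile makes the filter more permissive, since fewer reference molecules lie above the cutoff.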

In [4]:

# The default percentiles available for filtering are:
mc.complexity.ComplexityFilter.list_default_percentile()

Out[4]:

['99', '999', 'max']
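Judging by the labels, '99' and '999' presumably select the 99th and 99.9th percentiles, and 'max' the largest value observed in the reference set; this reading is an assumption, not something stated in this notebook. Under that assumption, the cutoffs become more permissive in that order, as the sketch below shows on toy data (PERCENTILE_LABELS and cutoff_for_label are hypothetical names).

```python
import numpy as np

# Assumed meaning of each percentile label (not confirmed by the medchem docs).
PERCENTILE_LABELS = {"99": 99.0, "999": 99.9, "max": 100.0}


def cutoff_for_label(reference_scores, label: str) -> float:
    """Hypothetical helper: turn a percentile label into a numeric cutoff."""
    q = PERCENTILE_LABELS[label]
    # np.percentile with q=100 returns the maximum of the data.
    return float(np.percentile(reference_scores, q))


# Toy reference scores 1, 2, ..., 1000 (illustrative only).
toy_scores = np.arange(1, 1001)
cutoffs = {label: cutoff_for_label(toy_scores, label) for label in PERCENTILE_LABELS}
print(cutoffs)
```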

In [5]:

# You can also inspect the file containing
# the precomputed statistics for each metric.
mc.complexity.ComplexityFilter.load_threshold_stats_file().head()

Out[5]:

	bertz	whitlock	barone	smcm	mw_bins	percentile
0	257.0	14.0	234.0	21.7	150.0	99
1	394.0	17.0	309.0	28.8	200.0	99
2	525.0	20.0	384.0	35.0	250.0	99
3	679.0	23.0	462.0	40.2	300.0	99
4	864.0	26.0	540.0	44.0	350.0	99
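The table stores one threshold per metric for each molecular-weight bin (mw_bins) and percentile, and the thresholds grow with molecular weight, so larger molecules are allowed higher complexity scores. The sketch below rebuilds the five rows shown above and looks up a cutoff. Treating mw_bins as the upper edge of each bin is an assumption, and lookup_threshold is a hypothetical helper, not part of medchem.

```python
import pandas as pd

# The five rows displayed above (percentile "99"); the full file contains more rows.
stats = pd.DataFrame(
    {
        "bertz": [257.0, 394.0, 525.0, 679.0, 864.0],
        "whitlock": [14.0, 17.0, 20.0, 23.0, 26.0],
        "barone": [234.0, 309.0, 384.0, 462.0, 540.0],
        "smcm": [21.7, 28.8, 35.0, 40.2, 44.0],
        "mw_bins": [150.0, 200.0, 250.0, 300.0, 350.0],
        "percentile": ["99"] * 5,
    }
)


def lookup_threshold(stats, metric, mol_weight, percentile="99"):
    """Hypothetical helper; ASSUMES mw_bins is the upper edge of each bin."""
    mask = (stats["percentile"] == percentile) & (stats["mw_bins"] >= mol_weight)
    rows = stats[mask]
    if rows.empty:
        raise ValueError("molecular weight outside the tabulated bins")
    # Pick the smallest bin edge that is >= the molecular weight.
    row = rows.sort_values("mw_bins").iloc[0]
    return float(row[metric])


print(lookup_threshold(stats, "whitlock", 230.0))
```

Under this assumption, a 230 Da molecule falls in the 250 bin, so its Whitlock score is compared against 20.0.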

Usage

Load some molecules.

In [6]:

data = dm.data.cdk2()
data = data.iloc[:8]

# Remove the conformers since they are not needed here
# (RemoveAllConformers() modifies each molecule in place).
for mol in data["mol"]:
    mol.RemoveAllConformers()

dm.to_image(data["mol"].tolist(), mol_size=(300, 200))

Out[6]:


Load the complexity filter.

In [7]:

cfilter = mc.complexity.ComplexityFilter(threshold_stats_file="zinc_12", complexity_metric="whitlock")

cfilter.complexity_metric

Out[7]:

'whitlock'

Apply the filter to our molecules. True means the molecule passes the filter; False means it is too complex.

In [8]:

data["pass_cfilter"] = data["mol"].apply(cfilter)

data["pass_cfilter"]

Out[8]:

0     True
1    False
2    False
3     True
4    False
5     True
6     True
7     True
Name: pass_cfilter, dtype: bool
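Because pass_cfilter is an ordinary boolean column, standard pandas indexing is enough to count and keep the passing molecules. The sketch below rebuilds the column from the output above so that it runs on its own; in the notebook itself you would index the DataFrame directly, for example data[data["pass_cfilter"]].

```python
import pandas as pd

# Boolean results copied from the output above (one entry per molecule).
pass_cfilter = pd.Series(
    [True, False, False, True, False, True, True, True], name="pass_cfilter"
)

# Summing booleans counts the True values.
n_pass = int(pass_cfilter.sum())

# Boolean indexing keeps only the rows that passed the filter.
kept_rows = pass_cfilter[pass_cfilter].index.tolist()

print(f"{n_pass} of {len(pass_cfilter)} molecules pass the complexity filter")
print("kept rows:", kept_rows)
```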

In [9]:

legends = data["pass_cfilter"].apply(lambda x: f"Pass={x}").tolist()

dm.to_image(data["mol"].tolist(), legends=legends, mol_size=(300, 200))

Out[9]:

-- The End :-)