
medchem.complexity

medchem.complexity.ComplexityFilter

Complexity-based molecular filters that compare molecular complexity metrics against thresholds precomputed on large datasets of commercially available compounds.

The default thresholds were calculated using all commercially available compounds in the ZINC-15 dataset.

Usage

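from medchem.complexity import ComplexityFilter

# Use the 99th-percentile threshold of the Bertz complexity index
complexity_filter = ComplexityFilter(limit="99", complexity_metric="bertz")
# Paracetamol is not too complex, so the filter returns True
complexity_filter("CC(=O)Nc1ccc(cc1)O")  # True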

__call__(mol)

Check whether the input structure is too complex for this instance of the complexity filter. Returns False if the molecule is too complex, else True.

Parameters:

Name Type Description Default
mol Union[Mol, str]

input molecule

required

__init__(limit='99', complexity_metric='bertz', threshold_stats_file='zinc_15_available')

Default complexity limit is set on at least 1 exceeding metric on the 999th permille level

Parameters:

Name Type Description Default
limit str

The complexity percentile outlier limit to use (a string containing an integer, such as '99')

'99'
complexity_metric str

The name of the complexity metric to use. Use ComplexityFilter.list_default_available_filters to list the default filters. The following complexity metrics are supported by default:
* "bertz": Bertz complexity index
* "sas": synthetic accessibility score (zinc_15_available only)
* "qed": QED score (zinc_15_available only)
* "clogp": cLogP, indicating how greasy a molecule is compared to others in the same MW range (zinc_15_available only)
* "whitlock": Whitlock complexity index
* "barone": Barone complexity index
* "smcm": synthetic and molecular complexity
* "twc": total walk count complexity (zinc_15_available only)

'bertz'
threshold_stats_file Optional[str]

The path to, or the name of, the threshold file to use. The default available threshold stats files are:
* "zinc_12"
* "zinc_15_available"

'zinc_15_available'

list_default_available_filters() classmethod

Return a list of unique filter names

list_default_percentile(threshold_stats_file=None) cached classmethod

Return the default percentile list for the threshold file

load_threshold_stats_file(path=None) classmethod

Load the threshold file used to compute the percentiles, which depend on the MW, for each complexity_metric.

Args: path: path to the threshold file
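As a rough illustration of what an MW-dependent percentile threshold means, the sketch below looks up a threshold by molecular-weight bin and compares a metric value against it. The bin edges, threshold values, and the name `passes_filter` are all hypothetical and are not taken from medchem or its threshold files; the actual file layout and comparison rule may differ.

```python
from bisect import bisect_right

# Hypothetical upper MW bounds of each bin (illustrative values only).
MW_BIN_EDGES = [200.0, 300.0, 400.0, 500.0]

# Hypothetical 99th-percentile thresholds of some complexity metric,
# one per MW bin, plus a final entry for molecules above the last edge.
THRESHOLDS_99 = [300.0, 600.0, 900.0, 1200.0, 1500.0]


def passes_filter(metric_value: float, mw: float) -> bool:
    """Return True when the metric does not exceed the threshold of its MW bin.

    Assumption: a molecule passes when its value is <= the threshold.
    """
    bin_index = bisect_right(MW_BIN_EDGES, mw)  # which MW bin the molecule falls in
    return metric_value <= THRESHOLDS_99[bin_index]


if __name__ == "__main__":
    print(passes_filter(250.0, 151.0))  # light molecule, low complexity
    print(passes_filter(700.0, 250.0))  # same MW range, higher complexity
```

The point of binning by MW is that the same raw metric value can be ordinary for a heavy molecule and an outlier for a light one.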

medchem.complexity.WhitlockCT(mol, ringval=4, unsatval=2, heteroval=1, chiralval=2)

A chemically intuitive measure of molecular complexity. This complexity measure is described in: H. W. Whitlock, J. Org. Chem., 1998, 63, 7982-7989. Benzyls, phenyls, etc. are not treated at all.

On the ZINC-15 commercially available dataset, the range of this score is [0, 172] with a median of 25.

Parameters:

Name Type Description Default
mol Mol

The input molecule.

required
ringval float

The contribution of rings

4
unsatval float

The contribution of the unsaturated bond.

2
heteroval float

The contribution of the heteroatom.

1
chiralval float

The contribution of the chiral center.

2
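The four weights read naturally as coefficients of a weighted sum over feature counts. The sketch below shows that reading with counts supplied by hand. It assumes the score is a plain weighted sum, which this page does not state explicitly, and it says nothing about how medchem counts rings, unsaturations, heteroatoms, or chiral centers. The function name `whitlock_like_score` and the example counts are hypothetical.

```python
def whitlock_like_score(
    n_rings: float,
    n_unsat: float,
    n_hetero: float,
    n_chiral: float,
    ringval: float = 4,
    unsatval: float = 2,
    heteroval: float = 1,
    chiralval: float = 2,
) -> float:
    """Weighted sum of feature counts (assumed form, not medchem's code)."""
    return (
        ringval * n_rings
        + unsatval * n_unsat
        + heteroval * n_hetero
        + chiralval * n_chiral
    )


if __name__ == "__main__":
    # Hypothetical counts: 1 ring, 3 unsaturations, 2 heteroatoms, no chiral center.
    print(whitlock_like_score(1, 3, 2, 0))  # 4*1 + 2*3 + 1*2 + 2*0 = 12
```

Raising one weight, for example `chiralval`, makes that feature count more heavily toward the final score while leaving the others unchanged.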

medchem.complexity.BaroneCT(mol, chiral=False)

Compute the Barone complexity measure for a molecule as described in: R. Barone and M. Chanon, J. Chem. Inf. Comput. Sci., 2001, 41 (2), pp 269–272

On the ZINC-15 commercially available dataset, the range of this score is [30, 4266] with a median of 538.

Parameters:

Name Type Description Default
mol Mol

The input molecule.

required
chiral bool

Whether to include chirality in the calculation.

False

medchem.complexity.SMCM(mol)

Compute synthetic and molecular complexity as described in: TK Allu, TI Oprea, J. Chem. Inf. Model. 2005, 45(5), pp. 1237-1243

On the ZINC-15 commercially available dataset, the range of this score is [1.93, 192.00] with a median of 42.23.

Parameters:

Name Type Description Default
mol Mol

the input molecule

required

medchem.complexity.TWC(mol, log10=True)

Compute the total walk count of a molecule as a proxy for complexity. This score is described in: Gerta Rücker and Christoph Rücker, J. Chem. Inf. Comput. Sci. 1993, 33, 683-695

The total walk count is defined as: \(\text{twc} = \frac{1}{2} \sum_{k=1}^{n-1} \sum_{i=1}^{N_{\text{atoms}}} \text{awc}(k,i)\), where \(\text{awc}(k,i)\) is the number of walks of length \(k\) starting at atom \(i\).
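To make the definition concrete, here is a minimal pure-Python sketch that evaluates the formula on a molecular graph given as a bond list. It assumes that n equals the number of atoms, which the formula above does not spell out, and it omits the optional log10 transform. The function name `total_walk_count` is hypothetical and this is not medchem's implementation.

```python
from typing import List, Tuple


def total_walk_count(bonds: List[Tuple[int, int]], n_atoms: int) -> float:
    """Evaluate twc = 1/2 * sum_{k=1}^{n-1} sum_i awc(k, i) on an undirected graph.

    Assumption: n in the formula is the number of atoms.
    """
    # Build the neighbor list of the (hydrogen-suppressed) molecular graph.
    neighbors: List[List[int]] = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # walks[i] holds awc(k, i); for k = 0 there is one trivial walk per atom.
    walks = [1] * n_atoms
    total = 0
    for _ in range(1, n_atoms):  # k = 1 .. n-1
        # A walk of length k from i is a step to a neighbor j
        # followed by a walk of length k-1 from j.
        walks = [sum(walks[j] for j in neighbors[i]) for i in range(n_atoms)]
        total += sum(walks)
    return total / 2


if __name__ == "__main__":
    # Propane skeleton C-C-C: k=1 contributes 1+2+1 = 4, k=2 contributes 2+2+2 = 6.
    print(total_walk_count([(0, 1), (1, 2)], 3))  # (4 + 6) / 2 = 5.0
```

The recurrence avoids explicit matrix powers: each pass is equivalent to one multiplication by the adjacency matrix.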

On the ZINC-15 commercially available dataset, the range of this score is [1.20, 39.08] with a median of 10.65.

Parameters:

Name Type Description Default
mol Mol

the input molecule

required
log10 bool

whether to return the log10 of the values

True