medchem.complexity
medchem.complexity.ComplexityFilter
Complexity-based molecular filters that compare molecular complexity metrics against thresholds precomputed on large datasets of commercially available compounds.
The default thresholds were computed from all commercially available compounds in the ZINC15 dataset.
Usage
from medchem.complexity import ComplexityFilter
complexity_filter = ComplexityFilter(limit="99", complexity_metric="bertz")
complexity_filter("CC(=O)Nc1ccc(cc1)O") # True
__call__(mol)
Check whether the input structure is too complex according to this complexity filter instance. Returns False if the molecule is too complex, otherwise True.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mol | Union[Mol, str] | The input molecule. | required |
__init__(limit='99', complexity_metric='bertz', threshold_stats_file='zinc_15_available')
By default, the complexity limit flags a molecule when at least one metric exceeds the 999th permille (99.9th percentile) level.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| limit | str | The complexity percentile outlier limit to use, given as a string holding an integer (for example "99"). | '99' |
| complexity_metric | str | The name of the complexity filter to use. Use | 'bertz' |
| threshold_stats_file | Optional[str] | The path to, or the name of, the threshold file to use. The default available threshold stats files are "zinc_12" and "zinc_15_available". | 'zinc_15_available' |
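Because the thresholds depend on molecular weight (see load_threshold_stats_file), the check behind __call__ can be pictured as a per-MW-bin threshold lookup followed by a comparison. The sketch below only illustrates that idea under stated assumptions and is not medchem's implementation: the table layout, the helper names threshold_for and passes_complexity_filter, the bin edges, and every threshold value are hypothetical, and it assumes a value exactly at the threshold still passes.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Hypothetical layout: (metric, limit) -> sorted list of (MW upper bound, threshold).
# The numbers are illustrative placeholders, NOT values from any ZINC threshold file.
ThresholdTable = Dict[Tuple[str, str], List[Tuple[float, float]]]

EXAMPLE_THRESHOLDS: ThresholdTable = {
    ("bertz", "99"): [(200.0, 450.0), (400.0, 1100.0), (float("inf"), 2000.0)],
}


def threshold_for(table: ThresholdTable, metric: str, limit: str, mw: float) -> float:
    """Return the threshold of the first MW bin whose upper bound is >= mw."""
    bins = table[(metric, limit)]
    upper_bounds = [upper for upper, _ in bins]
    return bins[bisect_left(upper_bounds, mw)][1]


def passes_complexity_filter(
    metric_value: float,
    mw: float,
    table: ThresholdTable,
    metric: str = "bertz",
    limit: str = "99",
) -> bool:
    """Mirror the documented contract: False if too complex, else True."""
    # Assumption: a value exactly at the threshold is not considered too complex.
    return metric_value <= threshold_for(table, metric, limit, mw)


print(passes_complexity_filter(230.0, 151.2, EXAMPLE_THRESHOLDS))  # True: below 450 for MW <= 200
print(passes_complexity_filter(600.0, 151.2, EXAMPLE_THRESHOLDS))  # False: above 450 for MW <= 200
```

Binning by an upper MW bound with bisect keeps the lookup logarithmic in the number of bins; a final bin with an infinite upper bound guarantees every molecular weight maps to some threshold.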

list_default_available_filters()
classmethod
Return a list of unique filter names
list_default_percentile(threshold_stats_file=None)
cached
classmethod
Return the default percentile list for the threshold file
load_threshold_stats_file(path=None)
classmethod
Load the threshold file used to compute the percentiles as a function of molecular weight (MW) for each complexity_metric.
Args:
path: Path to the threshold file.
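A threshold file of this kind can be thought of as a table keyed by metric, percentile, and MW bin. As a hedged sketch, loading such a table and deriving the unique filter names and percentiles (the kind of information list_default_available_filters and list_default_percentile report) could look like the following. The CSV column names, the values, and the helper names load_thresholds, unique_filters, and unique_percentiles are all hypothetical and may not match the real "zinc_12" or "zinc_15_available" files.

```python
import csv
import io
from typing import Dict, List

# Hypothetical file layout and placeholder values, for illustration only.
EXAMPLE_CSV = """complexity_metric,percentile,mw_upper,threshold
bertz,99,200,450
bertz,99,400,1100
bertz,999,200,600
whitlock,99,200,40
"""


def load_thresholds(text: str) -> List[Dict[str, str]]:
    """Parse the CSV text into one dict per row (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def unique_filters(rows: List[Dict[str, str]]) -> List[str]:
    """Sorted unique metric names, analogous to listing the available filters."""
    return sorted({row["complexity_metric"] for row in rows})


def unique_percentiles(rows: List[Dict[str, str]]) -> List[str]:
    """Unique percentile labels sorted numerically but kept as strings, like `limit`."""
    return sorted({row["percentile"] for row in rows}, key=int)


rows = load_thresholds(EXAMPLE_CSV)
print(unique_filters(rows))      # ['bertz', 'whitlock']
print(unique_percentiles(rows))  # ['99', '999']
```

For a file on disk, the same parsing works with `open(path, newline="")` in place of the in-memory `io.StringIO` buffer.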
medchem.complexity.WhitlockCT(mol, ringval=4, unsatval=2, heteroval=1, chiralval=2)
A chemically intuitive measure of molecular complexity, as described in: H. W. Whitlock, J. Org. Chem., 1998, 63, 7982–7989. Benzyls, phenyls, etc. receive no special treatment.
On the ZINC15 commercially available dataset, the range of this score is [0, 172] with a median of 25.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mol | Mol | The input molecule. | required |
| ringval | float | The contribution of rings. | 4 |
| unsatval | float | The contribution of the unsaturated bond. | 2 |
| heteroval | float | The contribution of the heteroatom. | 1 |
| chiralval | float | The contribution of the chiral center. | 2 |
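The parameter list suggests the score is a weighted sum of four structural counts. The minimal sketch below shows only that weighting step and assumes the counts have already been determined by hand; how medchem itself counts rings, unsaturated bonds, heteroatoms, and chiral centers (for example in aromatic systems) is not specified here, and the helper name whitlock_score is hypothetical.

```python
def whitlock_score(
    n_rings: int,
    n_unsaturated: int,
    n_hetero: int,
    n_chiral: int,
    ringval: float = 4,
    unsatval: float = 2,
    heteroval: float = 1,
    chiralval: float = 2,
) -> float:
    """Weighted sum of precomputed counts, using the same default weights as WhitlockCT."""
    return (
        ringval * n_rings
        + unsatval * n_unsaturated
        + heteroval * n_hetero
        + chiralval * n_chiral
    )


# Hand-supplied counts: 1 ring, 1 unsaturated bond, 1 heteroatom, no chiral centers.
print(whitlock_score(1, 1, 1, 0))  # 7
```

Changing a weight, for example `ringval=5`, rescales only that term, which is how the keyword arguments of WhitlockCT let each structural feature be emphasized independently.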

medchem.complexity.BaroneCT(mol, chiral=False)
Compute the Barone complexity measure for a molecule as described in: R. Barone and M. Chanon, J. Chem. Inf. Comput. Sci., 2001, 41 (2), pp 269–272
On the ZINC15 commercially available dataset, the range of this score is [30, 4266] with a median of 538.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mol | Mol | The input molecule. | required |
| chiral | bool | Whether to include chirality in the calculation. | False |

medchem.complexity.SMCM(mol)
Compute synthetic and molecular complexity as described in: T. K. Allu and T. I. Oprea, J. Chem. Inf. Model., 2005, 45 (5), pp. 1237–1243
On the ZINC15 commercially available dataset, the range of this score is [1.93, 192.00] with a median of 42.23.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mol | Mol | The input molecule. | required |
medchem.complexity.TWC(mol, log10=True)
Compute the total walk count of a molecule as a proxy for complexity. This score is described in: Gerta Rücker and Christoph Rücker, J. Chem. Inf. Comput. Sci., 1993, 33, 683–695
The total walk count is defined as:
\(twc = \frac{1}{2} \sum_{k=1}^{n-1} \sum_{i=1}^{N_{atoms}} \text{awc}(k,i)\)
where \(n = N_{atoms}\) is the number of atoms and \(\text{awc}(k,i)\) is the number of walks of length \(k\) starting at atom \(i\).
On the ZINC15 commercially available dataset, the range of this score is [1.20, 39.08] with a median of 10.65.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mol | Mol | The input molecule. | required |
| log10 | bool | Whether to return the log10 of the value. | True |
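The definition maps directly onto powers of the adjacency matrix \(A\): \(\text{awc}(k,i)\) is the sum of row \(i\) of \(A^k\), because entry \((i, j)\) of \(A^k\) counts the walks of length \(k\) from atom \(i\) to atom \(j\). The dependency-free sketch below computes the score from a heavy-atom adjacency matrix given as nested lists. It illustrates the formula and is not medchem's implementation; the helper names are hypothetical, and raising an error when log10 is requested for a zero count (a single isolated atom) is an assumption of this sketch.

```python
import math
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two square integer matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def total_walk_count(adj: Matrix, log10: bool = True) -> float:
    """twc = 1/2 * sum over k = 1..n-1 of the entry sum of A^k."""
    n = len(adj)
    power = [row[:] for row in adj]  # A^1
    total = 0
    for k in range(1, n):
        if k > 1:
            power = matmul(power, adj)  # A^k
        # sum_i awc(k, i): each row sum of A^k counts the walks of length k from atom i
        total += sum(sum(row) for row in power)
    twc = total / 2
    if log10:
        if twc == 0:
            raise ValueError("log10 is undefined for a zero walk count")
        return math.log10(twc)
    return twc


# Propane carbon skeleton C-C-C: walks of length 1 and 2 are counted.
propane = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(total_walk_count(propane, log10=False))  # 5.0
```

For propane, \(A\) sums to 4 and \(A^2\) sums to 6, so the total is \((4 + 6)/2 = 5\). The repeated matrix product is cubic per step, which is acceptable for drug-sized molecules; a NumPy version would replace matmul with `@`.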
