medchem.complexity
medchem.complexity.ComplexityFilter
Complexity-based molecular filters that compare molecular complexity metrics against thresholds precomputed on large datasets of commercially available compounds.
The default thresholds were calculated using all commercially available compounds in the ZINC 15 dataset.
Usage
from medchem.complexity import ComplexityFilter
complexity_filter = ComplexityFilter(limit="99", complexity_metric="bertz")
complexity_filter("CC(=O)Nc1ccc(cc1)O") # True
__call__(mol)
Check whether the input structure is too complex according to this instance of the complexity filter. Returns False if the molecule is too complex, else True.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mol | Union[Mol, str] | input molecule | required |
__init__(limit='99', complexity_metric='bertz', threshold_stats_file='zinc_15_available')
By default, the complexity limit is triggered when at least one metric exceeds the 999th permille level.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
limit | str | The complexity percentile outlier limit to be used (a string expressing an integer) | '99' |
complexity_metric | str | The complexity filter name to be used. Use ComplexityFilter.list_default_available_filters() to list the available names. | 'bertz' |
threshold_stats_file | Optional[str] | The path to, or the name of, the threshold file to be used. The default available threshold stats files are "zinc_12" and "zinc_15_available". | 'zinc_15_available' |
list_default_available_filters()
classmethod
Return a list of unique filter names
list_default_percentile(threshold_stats_file=None)
cached
classmethod
Return the default percentile list for the threshold file
load_threshold_stats_file(path=None)
classmethod
Load the threshold file used to compute the percentile, as a function of molecular weight (MW), for each complexity_metric.
Args: path: path to the threshold file
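To illustrate how MW-dependent thresholds can drive a pass/fail decision, here is a minimal sketch in plain Python. It does not reproduce the library's file format or lookup logic: the bin boundaries, the threshold values, the helper name `passes_complexity_filter`, and the choice of "less than or equal" as the passing condition are all assumptions made for illustration.

```python
from bisect import bisect_left

# Hypothetical data: upper bound of each molecular-weight bin (in Da) and the
# matching 99th-percentile value of some complexity metric within that bin.
# These numbers are invented for illustration only.
MW_BIN_UPPER_BOUNDS = [200.0, 300.0, 400.0, 500.0]
P99_THRESHOLDS = [350.0, 600.0, 900.0, 1200.0]


def passes_complexity_filter(score: float, mol_weight: float) -> bool:
    """Return True if `score` does not exceed the threshold of its MW bin."""
    # Index of the first bin whose upper bound is >= mol_weight.
    # Molecules heavier than the last bound fall back to the last bin.
    idx = min(bisect_left(MW_BIN_UPPER_BOUNDS, mol_weight), len(P99_THRESHOLDS) - 1)
    return score <= P99_THRESHOLDS[idx]


# A light molecule with a modest score passes; a score above its bin's
# threshold is rejected as too complex.
print(passes_complexity_filter(250.0, 151.2))
print(passes_complexity_filter(700.0, 250.0))
```

The point of binning by MW is that a score that is unremarkable for a large molecule can be an outlier for a small one, so a single global cutoff would be too lenient at low MW.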
medchem.complexity.WhitlockCT(mol, ringval=4, unsatval=2, heteroval=1, chiralval=2)
A chemically intuitive measure of molecular complexity. This complexity measure is described in: H. W. Whitlock, J. Org. Chem., 1998, 63, 7982-7989. Benzyls, phenyls, etc. are not treated at all.
On the zinc 15 commercially available dataset, the range of this score is [0, 172] with a median of 25
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mol | Mol | The input molecule. | required |
ringval | float | The contribution of rings. | 4 |
unsatval | float | The contribution of the unsaturated bond. | 2 |
heteroval | float | The contribution of the heteroatom. | 1 |
chiralval | float | The contribution of the chiral center. | 2 |
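The four weights can be read as coefficients of a weighted sum over feature counts. The sketch below assumes exactly that form; it is not the library's implementation, and it takes the counts as plain integers rather than deriving them from a molecule. The function name `whitlock_style_score` and its count arguments are hypothetical, and how WhitlockCT actually counts rings, unsaturations, heteroatoms, and chiral centers is not shown here.

```python
def whitlock_style_score(
    n_rings: int,
    n_unsat: int,
    n_hetero: int,
    n_chiral: int,
    ringval: float = 4,
    unsatval: float = 2,
    heteroval: float = 1,
    chiralval: float = 2,
) -> float:
    """Weighted sum of feature counts (assumed form, for illustration)."""
    return (
        ringval * n_rings
        + unsatval * n_unsat
        + heteroval * n_hetero
        + chiralval * n_chiral
    )


# Hypothetical counts: one ring, one unsaturation, two heteroatoms, no chiral
# centers. With the default weights: 4*1 + 2*1 + 1*2 + 2*0 = 8.
print(whitlock_style_score(n_rings=1, n_unsat=1, n_hetero=2, n_chiral=0))
```

Under this reading, raising a weight makes the score more sensitive to that feature, and setting it to 0 removes the feature from the score entirely.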
medchem.complexity.BaroneCT(mol, chiral=False)
Compute the Barone complexity measure for a molecule as described in: R. Barone and M. Chanon, J. Chem. Inf. Comput. Sci., 2001, 41 (2), pp 269–272
On zinc 15 commercially available dataset, the range of this score is [30, 4266] with a median of 538
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mol | Mol | The input molecule. | required |
chiral | bool | Whether to include chirality in the calculation. | False |
medchem.complexity.SMCM(mol)
Compute synthetic and molecular complexity as described in: TK Allu, TI Oprea, J. Chem. Inf. Model. 2005, 45(5), pp. 1237-1243
On zinc 15 commercially available dataset, the range of this score is [1.93, 192.00] with a median of 42.23
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mol | Mol | the input molecule | required |
medchem.complexity.TWC(mol, log10=True)
Compute the total walk count of a molecule as a proxy for complexity. This score is described in: Gerta Rücker and Christoph Rücker, J. Chem. Inf. Comput. Sci. 1993, 33, 683-695
The total walk count is defined as: \(twc = \frac{1}{2} \sum_{k=1}^{n-1} \sum_{i=1}^{N_{\text{atoms}}} \text{awc}(k,i)\)
where \(\text{awc}(k,i)\) is the number of walks of length k starting at atom i.
On zinc 15 commercially available dataset, the range of this score is [1.20, 39.08] with a median of 10.65
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mol | Mol | the input molecule | required |
log10 | bool | whether to return the log10 of the values | True |
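The total walk count formula can be worked through on a small graph without any chemistry toolkit. The sketch below is an illustration in plain Python, not the library's code: it assumes that n in the formula is the number of atoms, represents the molecule as a hypothetical adjacency mapping of heavy atoms, and returns the raw count without the log10 transform.

```python
from typing import Dict, List


def total_walk_count(adjacency: Dict[int, List[int]]) -> float:
    """Half the number of walks of length 1..n-1 over all starting atoms."""
    n_atoms = len(adjacency)
    # walks[i] holds awc(k, i); for k = 0 there is one trivial walk per atom.
    walks = {atom: 1 for atom in adjacency}
    total = 0
    for _ in range(1, n_atoms):
        # A walk of length k from atom i is a step to a neighbor j followed
        # by a walk of length k - 1 from j.
        walks = {
            atom: sum(walks[nb] for nb in neighbors)
            for atom, neighbors in adjacency.items()
        }
        total += sum(walks.values())
    return total / 2


# Three atoms in a chain (0-1-2): 4 walks of length 1 and 6 of length 2,
# so twc = (4 + 6) / 2 = 5.
chain = {0: [1], 1: [0, 2], 2: [1]}
print(total_walk_count(chain))

# Three atoms in a ring: 6 walks of length 1 and 12 of length 2,
# so twc = (6 + 12) / 2 = 9.
ring = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(total_walk_count(ring))
```

The ring scores higher than the chain with the same number of atoms, which matches the intuition that branching and cyclicity increase the walk count.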