medchem.functional
medchem.functional.alert_filter(mols, alerts, alerts_db=None, n_jobs=1, progress=False, return_idx=False)
Filter a dataset of molecules based on common structural alerts and specific rules.
True is good
Returning True means the molecule does not match any of the structural alerts.
See Also
alert_filter is a convenient functional API for the medchem.structural.CommonAlertsFilters class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | List of molecules to filter | required |
alerts | List[str] | List of alert collections to screen for. See | required |
alerts_db | Optional[Union[PathLike, str]] | Path to the alert file name. The internal default file ( | None |
n_jobs | Optional[int] | Number of workers to use | 1 |
progress | bool | Whether to show progress bar | False |
return_idx | bool | Whether to return the filtered index | False |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule IS OK (not found in the alert catalog). |
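The two return forms described above (boolean mask versus index array) are used the same way for every filter in this module. The following is a minimal plain-Python sketch of how each form selects the surviving molecules. It does not call medchem: the `mask` values are made up for illustration and stand in for the array a filter would return.

```python
from itertools import compress

# Hypothetical inputs: SMILES strings plus a made-up mask standing in for the
# array a filter is described to return when return_idx=False.
smiles = ["CCO", "c1ccccc1N=Nc1ccccc1", "CC(=O)O"]
mask = [True, False, True]  # True means "keep" (no alert matched)

# Mask form: keep the molecules whose flag is True.
kept = list(compress(smiles, mask))

# Index form, analogous to what return_idx=True is described to return.
kept_idx = [i for i, ok in enumerate(mask) if ok]
kept_from_idx = [smiles[i] for i in kept_idx]

print(kept)      # ['CCO', 'CC(=O)O']
print(kept_idx)  # [0, 2]
```

With a real NumPy mask the same selection is usually written as boolean indexing on an array of molecules.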
medchem.functional.nibr_filter(mols, n_jobs=None, max_severity=10, progress=False, return_idx=False)
Filter a set of molecules based on the Novartis Institutes for BioMedical Research (NIBR) screening deck curation process. See Schuffenhauer, A. et al., "Evolution of Novartis' small molecule screening deck design", J. Med. Chem. (2020).
The max_severity argument corresponds to the accumulated severity of a compound across all patterns in the catalog.
True is good
Returning True means the molecule does not match any of the structural alerts.
See Also
nibr_filter is a convenient functional API for the medchem.structural.NIBRFilters class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | list of input molecules | required |
n_jobs | Optional[int] | number of parallel jobs to run. Sequential by default | None |
max_severity | int | maximum severity allowed. Default is <10 | 10 |
progress | bool | whether to show progress bar | False |
return_idx | bool | Whether to return the filtered index | False |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule IS NOT REJECTED (i.e., not found in the alert catalog). |
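To make the notion of accumulated severity concrete, here is a small sketch of a strict severity cutoff. It rests on two assumptions that the text above only hints at: that the per-pattern severities are summed, and that the comparison is strict (the "Default is <10" wording). The helper `passes_severity` and the severity values are hypothetical and are not part of medchem.

```python
def passes_severity(matched_severities, max_severity=10):
    """Return True when the summed severity stays strictly below max_severity.

    Assumption: accumulation is a plain sum over the matched patterns.
    """
    return sum(matched_severities) < max_severity


# Hypothetical severities of the patterns matched by four molecules.
examples = {
    "no_match": [],
    "two_mild": [1, 1],
    "one_severe": [10],
    "many_mild": [2, 3, 5],
}
results = {name: passes_severity(sev) for name, sev in examples.items()}
print(results)
```

Under these assumptions a molecule can be rejected either by one severe pattern or by several mild ones whose severities add up to the cutoff.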
medchem.functional.catalog_filter(mols, catalogs, return_idx=False, n_jobs=-1, progress=False, progress_leave=False, scheduler='processes', batch_size=100)
Filter a list of compounds according to a catalog of structural alerts and patterns.
True is good
Returning True means the molecule does not match any of the structural alerts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | list of input molecules | required |
catalogs | List[Union[str, FilterCatalog]] | list of catalogs (name or FilterCatalog) | required |
return_idx | bool | whether to return index or a boolean mask | False |
n_jobs | Optional[int] | number of parallel jobs to run. The default of -1 follows the joblib convention of using all available cores | -1 |
progress | bool | whether to show progress bar | False |
progress_leave | bool | whether to leave the progress bar after completion | False |
scheduler | str | joblib scheduler to use | 'processes' |
batch_size | int | batch size for parallel processing. Note that | 100 |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule is not found in the catalog. |
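The batch_size parameter controls how many molecules are handed to a worker at a time. The sketch below shows the general idea of batching a list of inputs. It is an illustration of the concept only, not medchem's internal implementation, and the helper name `batched` is hypothetical.

```python
def batched(items, batch_size=100):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder identifiers, not real molecules.
mols = [f"mol_{i}" for i in range(250)]
batches = list(batched(mols, batch_size=100))
sizes = [len(b) for b in batches]
print(sizes)  # [100, 100, 50]
```

Larger batches generally reduce scheduling overhead per molecule, while smaller batches spread work more evenly across workers.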
medchem.functional.chemical_group_filter(mols, chemical_group, exact_match=False, return_idx=False, n_jobs=None, progress=False, progress_leave=False, scheduler='threads')
Filter a list of compounds according to a chemical group instance.
Warning
This function will return the list of molecules that DO NOT match the chemical group.
See Also
Consider exploring the medchem.groups.ChemicalGroup class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | list of input molecules | required |
chemical_group | ChemicalGroup | a chemical group instance with the required functional groups to use. | required |
exact_match | bool | whether to use an exact match of the chemical group patterns (will switch to SMILES) | False |
return_idx | bool | whether to return index or a boolean mask | False |
n_jobs | Optional[int] | number of parallel jobs to run. Sequential by default | None |
progress | bool | whether to show progress bar | False |
progress_leave | bool | whether to leave the progress bar after completion | False |
scheduler | str | joblib scheduler to use | 'threads' |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule DOES NOT MATCH the groups. |
medchem.functional.rules_filter(mols, rules, return_idx=False, n_jobs=None, progress=False, progress_leave=False, scheduler='processes')
Filter a list of compounds according to a predefined set of rules.
True is good
Returning True means the molecule passes all the rules.
See Also
Consider exploring the medchem.rules.RuleFilters class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | list of input molecules | required |
rules | Union[List[Any], RuleFilters] | list of rules to apply to the input molecules. | required |
return_idx | bool | whether to return index or a boolean mask | False |
n_jobs | Optional[int] | number of parallel jobs to run. Sequential by default | None |
progress | bool | whether to show progress bar | False |
progress_leave | bool | whether to leave the progress bar after completion | False |
scheduler | str | joblib scheduler to use | 'processes' |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule MATCHES the rule constraints. |
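The "passes all the rules" semantics can be sketched as a logical AND over rule predicates. In the example below the two rules, the descriptor values, and the helper `passes_all` are all hypothetical. The sketch works on precomputed descriptor dictionaries rather than on molecules, and it does not reproduce medchem's rule set.

```python
from typing import Callable, Dict, List

Descriptor = Dict[str, float]


def mw_at_most_500(d: Descriptor) -> bool:
    """Hypothetical rule: molecular weight no greater than 500."""
    return d["mw"] <= 500


def logp_at_most_5(d: Descriptor) -> bool:
    """Hypothetical rule: logP no greater than 5."""
    return d["logp"] <= 5


def passes_all(d: Descriptor, rules: List[Callable[[Descriptor], bool]]) -> bool:
    """A molecule is kept only if every rule returns True."""
    return all(rule(d) for rule in rules)


# Made-up descriptor values for three molecules.
molecules = [
    {"mw": 180.2, "logp": 1.2},
    {"mw": 650.0, "logp": 3.0},
    {"mw": 320.5, "logp": 6.1},
]
rules = [mw_at_most_500, logp_at_most_5]
mask = [passes_all(d, rules) for d in molecules]
print(mask)  # [True, False, False]
```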
medchem.functional.complexity_filter(mols, complexity_metric='bertz', threshold_stats_file='zinc_15_available', limit='99', return_idx=False, n_jobs=None, progress=False, progress_leave=False, scheduler='processes')
Filter a list of compounds according to a complexity metric.
True is good
Returning True means the molecule passes the complexity filters.
See Also
Consider exploring the medchem.complexity.ComplexityFilter class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | list of input molecules | required |
complexity_metric | str | complexity metric to use. Use | 'bertz' |
threshold_stats_file | str | complexity threshold statistics file to use | 'zinc_15_available' |
limit | str | complexity outlier percentile to use | '99' |
return_idx | bool | whether to return index or a boolean mask | False |
n_jobs | Optional[int] | number of parallel jobs to run. Sequential by default | None |
progress | bool | whether to show progress bar | False |
progress_leave | bool | whether to leave the progress bar after completion | False |
scheduler | str | joblib scheduler to use | 'processes' |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule PASSES the complexity filter. |
medchem.functional.bredt_filter(mols, return_idx=False, n_jobs=None, progress=False, progress_leave=False, scheduler='threads', batch_size=100)
Filter a list of compounds according to Bredt's rules.
True is good
Returning True means the molecule does not violate Bredt's rules.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | list of input molecules | required |
return_idx | bool | whether to return index or a boolean mask | False |
n_jobs | Optional[int] | number of parallel jobs to run. Sequential by default | None |
progress | bool | whether to show progress bar | False |
progress_leave | bool | whether to leave the progress bar after completion | False |
scheduler | str | joblib scheduler to use | 'threads' |
batch_size | int | batch size for parallel processing. Note that | 100 |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule does not violate Bredt's rules. |
medchem.functional.molecular_graph_filter(mols, max_severity=5, return_idx=False, n_jobs=None, progress=False, progress_leave=False, scheduler='threads')
Filter a list of compounds according to unstable molecular graph patterns. This list was compiled from observations of technically valid but unstable molecular graphs produced by deep generative models.
The disallowed graphs are:
- K3,3 or K2,4 structures
- Cone of P4 or K4 with 3-ear
- Node in more than one ring of length 3 or 4
True is good
Returning True means the molecule does not violate the molecular graph instability rules.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | list of input molecules | required |
max_severity | Optional[int] | maximum acceptable severity (1-10). Default is <5 | 5 |
return_idx | bool | whether to return index or a boolean mask | False |
n_jobs | Optional[int] | number of parallel jobs to run. Sequential by default | None |
progress | bool | whether to show progress bar | False |
progress_leave | bool | whether to leave the progress bar after completion | False |
scheduler | str | joblib scheduler to use | 'threads' |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule does not violate the molecular graph instability rules. |
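Of the disallowed graphs listed for this filter, "node in more than one ring of length 3 or 4" is the easiest to express directly. The sketch below checks that condition on rings given as tuples of atom indices. It is a plain-Python illustration, assuming the rings come from some ring-perception step. The helper `has_shared_small_ring_atom` is hypothetical and says nothing about how medchem assigns severities.

```python
from collections import Counter


def has_shared_small_ring_atom(rings, small_sizes=(3, 4)):
    """Return True if any atom index belongs to more than one ring of size 3 or 4."""
    counts = Counter(
        atom
        for ring in rings
        if len(ring) in small_sizes
        for atom in set(ring)
    )
    return any(n > 1 for n in counts.values())


# Hypothetical ring sets, each ring a tuple of atom indices.
spiro_like = [(0, 1, 2), (2, 3, 4, 5)]    # atom 2 sits in a 3-ring and a 4-ring
benzene_like = [(0, 1, 2, 3, 4, 5)]       # a single 6-ring
separate_small = [(0, 1, 2), (3, 4, 5)]   # two 3-rings with no shared atom

print(has_shared_small_ring_atom(spiro_like))      # True
print(has_shared_small_ring_atom(benzene_like))    # False
print(has_shared_small_ring_atom(separate_small))  # False
```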
medchem.functional.lilly_demerit_filter(mols, max_demerits=160, return_idx=False, n_jobs=None, progress=False, progress_leave=False, scheduler='threads', batch_size=5000, **kwargs)
Run Eli Lilly's demerit filter on a list of molecules.
True is good
Returning True means the molecule does not violate the demerit rules.
See Also
Consider exploring the LillyDemeritsFilters class in medchem.structural.lilly_demerits.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | list of input molecules, preferably as SMILES | required |
max_demerits | Optional[int] | Cutoff to reject molecules. Defaults to 160. | 160 |
return_idx | bool | whether to return a mask or a list of valid indexes | False |
n_jobs | Optional[int] | number of parallel jobs to run | None |
progress | bool | whether to show progress bar | False |
progress_leave | bool | whether to leave the progress bar after completion | False |
scheduler | str | joblib scheduler to use | 'threads' |
batch_size | int | batch size for parallel processing. | 5000 |
kwargs | Any | parameters specific to the | {} |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule is ok. |
medchem.functional.protecting_groups_filter(mols, return_idx=False, protecting_groups=['fmoc', 'tert-butoxymethyl', 'tert-butyl carbamate', 'tert-butyloxycarbonyl'], n_jobs=None, progress=False, progress_leave=False, scheduler='threads')
Filter a list of compounds according to whether they match known protecting groups.
Warning
This function will return the list of molecules that DO NOT have the protecting groups.
See Also
This is syntactic sugar for calling chemical_group_filter with the protecting groups subset.
Parameters:
Name | Description | Default |
---|---|---|
mols | list of input molecules | required |
protecting_groups | type of protection group to consider; if not provided, will use all (not advised) | ['fmoc', 'tert-butoxymethyl', 'tert-butyl carbamate', 'tert-butyloxycarbonyl'] |
return_idx | whether to return index or a boolean mask | False |
n_jobs | number of parallel jobs to run. Sequential by default | None |
progress | whether to show progress bar | False |
progress_leave | whether to leave the progress bar after completion | False |
scheduler | joblib scheduler to use | 'threads' |

Returns:
Name | Description |
---|---|
filtered_mask | boolean array (or index array) where True means the molecule DOES NOT MATCH the groups. |
medchem.functional.macrocycle_filter(mols, max_cycle_size=10, return_idx=False, n_jobs=None, progress=False, scheduler='processes')
Find valid molecules that do not infringe the strict maximum cycle size.
True is good
Returning True means the molecule does not have rings larger than max_cycle_size.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | list of input molecules | required |
max_cycle_size | int | strict maximum macrocycle size | 10 |
return_idx | bool | whether to return index or a boolean mask | False |
n_jobs | Optional[int] | number of parallel jobs to run. Sequential by default | None |
progress | bool | whether to show progress bar | False |
scheduler | str | joblib scheduler to use | 'processes' |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule is ok. |
medchem.functional.atom_list_filter(mols, unwanted_atom_list=None, wanted_atom_list=None, return_idx=False, n_jobs=None, progress=False, scheduler='processes')
Find molecules that contain no atom from a set of unwanted atomic symbols and whose atoms all belong to the set of wanted atomic symbols.
True is good
Returning True means the molecule only has desirable atom types.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | list of input molecules | required |
unwanted_atom_list | Optional[Sequence] | list of undesirable atomic symbols | None |
wanted_atom_list | Optional[Sequence] | list of desirable atomic symbols | None |
return_idx | bool | whether to return index or a boolean mask | False |
n_jobs | Optional[int] | number of parallel jobs to run. Sequential by default | None |
progress | bool | whether to show progress bar | False |
scheduler | str | joblib scheduler to use | 'processes' |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule is ok. |
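The combined condition (no unwanted symbol, and every atom in the wanted set) can be sketched with set operations on atomic symbols. The helper `atoms_ok` below is hypothetical and works on plain lists of symbols rather than molecules. Treating `None` as "this check is disabled" is an assumption, not documented behaviour.

```python
def atoms_ok(symbols, unwanted=None, wanted=None):
    """Return True if no symbol is unwanted and all symbols are in the wanted set.

    Assumption: passing None for a list disables the corresponding check.
    """
    present = set(symbols)
    if unwanted is not None and present & set(unwanted):
        return False
    if wanted is not None and not present <= set(wanted):
        return False
    return True


# Atomic symbols of three made-up molecules.
mols = [
    ["C", "C", "O"],
    ["C", "Cl", "Cl", "Cl"],
    ["C", "B", "O"],
]

mask = [atoms_ok(m, unwanted=["B"], wanted=["C", "N", "O", "Cl"]) for m in mols]
mask_wanted_only = [atoms_ok(m, wanted=["C", "N", "O"]) for m in mols]
print(mask)              # [True, True, False]
print(mask_wanted_only)  # [True, False, False]
```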
medchem.functional.ring_infraction_filter(mols, hetcycle_min_size=4, return_idx=False, n_jobs=None, progress=False, scheduler='processes')
Find molecules that do not infringe the ring infraction filter. This filter checks for rings that are too small to contain a heteroatom.
True is good
Returning True means the molecule does not infringe the ring infraction filter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | list of input molecules | required |
hetcycle_min_size | int | Minimum ring size above which more than one heteroatom or any non-single bond is allowed. This is a strict threshold (>) | 4 |
return_idx | bool | whether to return index or a boolean mask | False |
n_jobs | Optional[int] | number of parallel jobs to run. Sequential by default | None |
progress | bool | whether to show progress bar | False |
scheduler | str | joblib scheduler to use | 'processes' |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule is ok. |
medchem.functional.num_atom_filter(mols, min_atoms=None, max_atoms=None, return_idx=False, n_jobs=None, progress=False, scheduler='processes')
Find molecules whose number of atoms falls within the given range constraints.
True is good
Returning True means the molecule does not infringe the number of atom filter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | list of input molecules | required |
min_atoms | Optional[int] | strict minimum number of atoms (atoms > min_atoms) | None |
max_atoms | Optional[int] | strict maximum number of atoms (atoms < max_atoms) | None |
return_idx | bool | whether to return index or a boolean mask | False |
n_jobs | Optional[int] | number of parallel jobs to run. Sequential by default | None |
progress | bool | whether to show progress bar | False |
scheduler | str | joblib scheduler to use | 'processes' |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule is ok. |
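Because both bounds are documented as strict, a molecule whose atom count equals min_atoms or max_atoms is rejected. A minimal sketch of that comparison follows. The helper `num_atoms_ok` is hypothetical and operates on precomputed atom counts, and treating `None` as "no bound" is an assumption.

```python
def num_atoms_ok(n_atoms, min_atoms=None, max_atoms=None):
    """Strict range check: min_atoms < n_atoms < max_atoms.

    Assumption: a bound set to None is not enforced.
    """
    if min_atoms is not None and not n_atoms > min_atoms:
        return False
    if max_atoms is not None and not n_atoms < max_atoms:
        return False
    return True


# Made-up atom counts for four molecules.
counts = [5, 10, 25, 50]
mask = [num_atoms_ok(n, min_atoms=5, max_atoms=50) for n in counts]
print(mask)  # [False, True, True, False]
```

To include the boundary values, widen the bounds by one (for example min_atoms=4 to accept molecules with exactly 5 atoms).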
medchem.functional.num_stereo_center_filter(mols, max_stereo_centers=4, max_undefined_stereo_centers=2, return_idx=False, n_jobs=None, progress=False, scheduler='processes')
Find molecules that match the number of stereo center constraints. In general, molecules with too many undefined stereo centers are not desirable. This filter is useful for generated molecules.
True is good
Returning True means the molecule does not have issues with stereo centers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | list of input molecules | required |
max_stereo_centers | int | strict maximum number of stereo centers (<). Default is 4 | 4 |
max_undefined_stereo_centers | int | strict maximum number of undefined stereo centers (<). Default is 2 | 2 |
return_idx | bool | whether to return index or a boolean mask | False |
n_jobs | Optional[int] | number of parallel jobs to run. Sequential by default | None |
progress | bool | whether to show progress bar | False |
scheduler | str | joblib scheduler to use | 'processes' |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule is ok. |
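Both stereo center limits are documented as strict (<), so with the defaults a molecule with exactly 4 stereo centers, or exactly 2 undefined ones, is rejected. The sketch below applies the two limits to precomputed counts. The helper `stereo_ok` and the counts are hypothetical, and combining the two conditions with a logical AND is an assumption.

```python
def stereo_ok(n_centers, n_undefined,
              max_stereo_centers=4, max_undefined_stereo_centers=2):
    """Both counts must stay strictly below their respective maxima."""
    return (n_centers < max_stereo_centers
            and n_undefined < max_undefined_stereo_centers)


# (total stereo centers, undefined stereo centers) for four made-up molecules.
counts = [(0, 0), (3, 1), (4, 0), (2, 2)]
mask = [stereo_ok(total, undefined) for total, undefined in counts]
print(mask)  # [True, True, False, False]
```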
medchem.functional.halogenicity_filter(mols, thresh_F=6, thresh_Br=3, thresh_Cl=3, return_idx=False, n_jobs=None, progress=False, scheduler='processes')
Find molecules that do not exceed the halogen count thresholds. This filter is useful for removing halogen biases in generated or enumerated chemical space during goal-directed optimization. The default thresholds are:
- 6 for fluorine
- 3 for bromine
- 3 for chlorine
True is good
Returning True means the molecule does not have too many halogen atoms.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | list of input molecules | required |
thresh_F | int | maximum number of fluorine atoms | 6 |
thresh_Br | int | maximum number of bromine atoms | 3 |
thresh_Cl | int | maximum number of chlorine atoms | 3 |
return_idx | bool | whether to return index or a boolean mask | False |
n_jobs | Optional[int] | number of parallel jobs to run. Sequential by default | None |
progress | bool | whether to show progress bar | False |
scheduler | str | joblib scheduler to use | 'processes' |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule is ok. |
medchem.functional.symmetry_filter(mols, symmetry_threshold=0.8, return_idx=False, n_jobs=None, progress=False, scheduler='processes')
Find molecules that are not symmetrical, given a symmetry threshold. This filter was designed to offset the symmetry issue in molecular design, where some models tend to generate highly symmetrical molecules due to substructure bias.
True is good
Returning True means the molecule is not too symmetrical.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mols | Sequence[Union[str, Mol]] | list of input molecules | required |
symmetry_threshold | float | threshold to consider a molecule highly symmetrical | 0.8 |
return_idx | bool | whether to return index or a boolean mask | False |
n_jobs | Optional[int] | number of parallel jobs to run. Sequential by default | None |
progress | bool | whether to show progress bar | False |
scheduler | str | joblib scheduler to use | 'processes' |

Returns:
Name | Type | Description |
---|---|---|
filtered_mask | ndarray | boolean array (or index array) where True means the molecule is ok. |