Medchem Query Language
The Medchem query language is a simple, intuitive language for expressing a filtering procedure built on the Medchem API. It is particularly convenient outside of Python: for example, you can build a frontend application that filters compounds while giving the user full flexibility in how they specify the filtering procedure. Because a query is plain text, it is also an efficient way to serialize a filtering process and share it with other Medchem users.
The query language is based on lark.
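As a minimal sketch of the serialization point above: a query is an ordinary string, so it can be stored in JSON (or any text format) and restored unchanged. The payload layout and the `name` field below are assumptions made for illustration, not a format defined by Medchem; the query text is the one used in Example #1.

```python
import json

# Query text taken from Example #1 of this page.
query = """
(
    HASPROP("n_lipinski_hba" < 3) AND ! HASALERT("pains")
)
OR
(
    HASGROUP("Alcohols")
    OR
    HASSUBSTRUCTURE("[CX3](=[OX1])O", True, 1)
)
"""

# Hypothetical payload layout: "name" and "query" are illustrative keys only.
payload = json.dumps({"name": "example-filter", "query": query})

# The receiver restores exactly the same query string.
restored = json.loads(payload)["query"]
```

The restored string can then be handed to whatever builds the filter on the receiving side, such as `mc.query.QueryFilter` in the examples on this page.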
In [1]:
import datamol as dm
import pandas as pd
import medchem as mc
We start with some simple examples of building a query, then describe the grammar and syntax used by the Medchem query language.
Example #1
In [2]:
# note that whitespace and newlines are ignored in the query
query = """
(
HASPROP("n_lipinski_hba" < 3) AND ! HASALERT("pains")
)
OR
(
HASGROUP("Alcohols")
OR
HASSUBSTRUCTURE("[CX3](=[OX1])O", True, 1)
)
"""
data = dm.freesolv()
data = data.iloc[:12]
data["mol"] = data["smiles"].apply(dm.to_mol)
dm.to_image(data["mol"].tolist(), n_cols=4, mol_size=(300, 150))
Out[2]:
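To make the boolean structure of the Example #1 query easier to read, here is a rough pure-Python sketch of the same logic. It assumes that `!` negates the statement that follows it and that the parentheses group terms as written. The function and argument names are hypothetical stand-ins for the four statements in the query; the sketch does not call Medchem.

```python
def example_1_logic(
    few_hba: bool,             # stands in for HASPROP("n_lipinski_hba" < 3)
    has_pains_alert: bool,     # stands in for HASALERT("pains")
    has_alcohol_group: bool,   # stands in for HASGROUP("Alcohols")
    has_substructure: bool,    # stands in for HASSUBSTRUCTURE("[CX3](=[OX1])O", True, 1)
) -> bool:
    """Mirror the parenthesised AND / OR / NOT structure of the query."""
    first_branch = few_hba and not has_pains_alert
    second_branch = has_alcohol_group or has_substructure
    return first_branch or second_branch


# A molecule with few acceptors and no PAINS alert passes via the first branch.
passes_first = example_1_logic(True, False, False, False)
# A PAINS alert blocks the first branch, but an alcohol group still passes.
passes_second = example_1_logic(True, True, True, False)
# Nothing matches, so the molecule is rejected.
rejected = example_1_logic(False, False, False, False)
```

A molecule passes when either branch holds, so a PAINS alert alone does not reject it.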
In [3]:
query_filter = mc.query.QueryFilter(query)
out = query_filter(data["smiles"], n_jobs=-1, progress=True)
out
Loading Mols: 0%| | 0/12 [00:00<?, ?it/s]
Out[3]:
[False, True, True, True, True, True, True, True, True, True, True, True]
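The output above is a list of booleans, one per input molecule and in the same order. A short sketch of how such a mask might be used to split the inputs follows. The SMILES strings and mask values here are made-up placeholders, not the FreeSolv results.

```python
from itertools import compress

# Hypothetical inputs and mask, for illustration only.
smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
mask = [True, False, True]

# Keep the entries whose mask value is True.
kept = list(compress(smiles, mask))

# Collect the entries that did not satisfy the query.
rejected = [s for s, keep in zip(smiles, mask) if not keep]
```

With a pandas DataFrame such as `data`, the same list can be used for boolean indexing, provided its length matches the number of rows.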
Example #2
In [4]:
data = dm.data.cdk2()
data = data.iloc[:8]
dm.to_image(data["mol"].apply(dm.sanitize_mol).tolist(), n_cols=4, mol_size=(300, 150))
Out[4]: