MCSS in OEChem<br />
Krisztina Boda, Matthew Stahl<br />
OpenEye CUP IX, Santa Fe, March 2008
Overview<br />
• What is MCSS?<br />
• Why might chemists need MCSS?<br />
• Defining MCSS in OEChem<br />
– How to define atom and bond equivalency?<br />
– How to evaluate detected common substructures?<br />
– How to traverse the search tree?<br />
• Example code
Maximum Common Substructure<br />
The largest sub-graph shared by two molecular structures.<br />
NP-complete problem<br />
$\dfrac{m!\,n!}{(m-k)!\,(n-k)!\,k!}$<br />
n, m = number of nodes in the compared graphs<br />
k = number of nodes in the common sub-graph
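Read as a count, the expression m! n! / ((m − k)! (n − k)! k!) equals C(m, k) · C(n, k) · k!, that is, the number of ways to choose k nodes in each graph and pair them one-to-one. The sketch below evaluates it for a few sizes to show how fast it grows; the function name and the example sizes are illustrative assumptions, not values from the talk.

```python
from math import comb, factorial


def candidate_mappings(m: int, n: int, k: int) -> int:
    """Ways to pick k nodes in each graph and pair them one-to-one."""
    # C(m, k) * C(n, k) * k!  ==  m! n! / ((m - k)! (n - k)! k!)
    return comb(m, k) * comb(n, k) * factorial(k)


# Illustrative sizes only: two graphs with 20 and 25 nodes.
for k in (5, 10, 15):
    print(k, candidate_mappings(20, 25, k))
```

Only a small fraction of these pairings are chemically valid, connected matches, but the count gives a sense of why a brute-force enumeration is impractical.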
Possible Applications of MCSS<br />
• clustering compound libraries<br />
• diversity measures of a compound library<br />
• mapping chemical reactions<br />
• R-group decomposition
Atom/Bond Equivalency<br />
OEExprOpts::DefaultAtoms = AtomicNumber|Aromaticity|FormalCharge<br />
OEExprOpts::DefaultBonds = BondOrder|Aromaticity
OEExprOpts Examples<br />
• Atoms: DefaultAtoms|EqAromatic, bonds: DefaultBonds<br />
• Atoms: DefaultAtoms|EqHalides, bonds: DefaultBonds<br />
• Atoms: DefaultAtoms, bonds: DefaultBonds|EqSingleDouble
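The option values are combined with `|`, so each one behaves like a bit flag that switches a comparison on or relaxes it. A minimal pure-Python sketch of that idea follows. The `AtomOpts` enum, the `Atom` tuple and `atoms_equivalent` are hypothetical names that only mirror the slide labels; they are not the OEChem constants, and the way `EQ_HALIDES` relaxes the atomic-number test is an assumption based on its name.

```python
from enum import IntFlag
from typing import NamedTuple


class AtomOpts(IntFlag):
    # Hypothetical flags mirroring the slide labels (not the OEChem constants).
    ATOMIC_NUMBER = 1
    AROMATICITY = 2
    FORMAL_CHARGE = 4
    EQ_HALIDES = 8  # assumption: F, Cl, Br and I count as one class


DEFAULT_ATOMS = AtomOpts.ATOMIC_NUMBER | AtomOpts.AROMATICITY | AtomOpts.FORMAL_CHARGE
HALOGENS = {9, 17, 35, 53}


class Atom(NamedTuple):
    atomic_number: int
    aromatic: bool
    formal_charge: int


def atoms_equivalent(a: Atom, b: Atom, opts: AtomOpts) -> bool:
    """Compare two atoms using only the properties enabled in opts."""
    if opts & AtomOpts.ATOMIC_NUMBER and a.atomic_number != b.atomic_number:
        both_halogens = a.atomic_number in HALOGENS and b.atomic_number in HALOGENS
        if not (opts & AtomOpts.EQ_HALIDES and both_halogens):
            return False
    if opts & AtomOpts.AROMATICITY and a.aromatic != b.aromatic:
        return False
    if opts & AtomOpts.FORMAL_CHARGE and a.formal_charge != b.formal_charge:
        return False
    return True


chlorine = Atom(17, False, 0)
fluorine = Atom(9, False, 0)
print(atoms_equivalent(chlorine, fluorine, DEFAULT_ATOMS))
print(atoms_equivalent(chlorine, fluorine, DEFAULT_ATOMS | AtomOpts.EQ_HALIDES))
```

Looser equivalency makes more atom pairs eligible for matching, which generally yields larger common substructures at the cost of a larger search space.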
MCS Scoring Functions<br />
OEChem built-in scoring functions<br />
– OEMCSMaxAtoms [default]<br />
– OEMCSMaxBonds<br />
– OEMCSMaxAtomsCompleteCycle<br />
– OEMCSMaxBondsCompleteCycle<br />
A user-defined scoring function can also be used
OEMCSMaxBonds scoring function<br />
$\text{score} = N_B + \dfrac{N_A}{100}$<br />
$N_B$ = number of bonds<br />
$N_A$ = number of atoms<br />
Scores of the three example substructures on the slide: 5.06, 5.06, 5.05
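The formula makes the bond count the primary criterion and uses the atom count only as a tie-breaker (for substructures with fewer than 100 atoms). A small sketch, with a hypothetical function name; that the slide's 5.06 corresponds to 5 bonds and 6 atoms is an inference from the arithmetic, not something the slide states.

```python
def max_bonds_score(n_bonds: int, n_atoms: int) -> float:
    """Score = NB + NA / 100: bonds dominate, atoms break ties."""
    return n_bonds + n_atoms / 100


# 5 bonds with 6 atoms versus 5 bonds with 5 atoms.
print(round(max_bonds_score(5, 6), 2))
print(round(max_bonds_score(5, 5), 2))
```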
OEMCSMaxBondsCompleteCycle scoring function<br />
$\text{score} = N_B - P \cdot N_{UCB} + \dfrac{N_A}{100}$<br />
$N_B$ = number of bonds<br />
$N_A$ = number of atoms<br />
$P$ = penalty (default 1.0)<br />
$N_{UCB}$ = number of unmapped cyclic query bonds
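The penalty term lowers the score of matches that cut through a ring of the query. The sketch below assumes the formula reads score = NB − P · NUCB + NA / 100 and compares two made-up candidates; the function name and the counts are hypothetical.

```python
def max_bonds_complete_cycle_score(
    n_bonds: int,
    n_atoms: int,
    n_unmapped_cyclic_bonds: int,
    penalty: float = 1.0,
) -> float:
    """Score = NB - P * NUCB + NA / 100."""
    return n_bonds - penalty * n_unmapped_cyclic_bonds + n_atoms / 100


# Hypothetical candidates: a larger match that leaves 3 ring bonds of the
# query unmapped, and a smaller match that keeps every mapped ring complete.
partial_ring = max_bonds_complete_cycle_score(6, 7, 3)
complete_ring = max_bonds_complete_cycle_score(5, 6, 0)
print(round(partial_ring, 2), round(complete_ring, 2))
```

With the default penalty the smaller match scores higher here, even though it has fewer bonds.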
Unique/Non-Unique MCSS<br />
An MCSS match is considered unique:<br />
• if it differs from all other matches found previously by at least one atom or bond<br />
An MCSS match is considered non-unique:<br />
• if it only represents a different mapping of the same atoms and bonds<br />
[Figure: example of unique maximum common substructures]
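A sketch of the "differs by at least one atom or bond" test: matches that cover the same atoms and bonds in a different order are treated as duplicates. The data layout (lists of pattern/target index pairs) and the function name are assumptions made for this illustration.

```python
def unique_matches(matches):
    """Keep only matches that cover a new set of atoms or bonds."""
    seen = set()
    kept = []
    for atom_pairs, bond_pairs in matches:
        # The key ignores which pattern atom maps to which target atom,
        # so a mirrored mapping of the same substructure is a duplicate.
        key = (
            frozenset(p for p, _ in atom_pairs),
            frozenset(t for _, t in atom_pairs),
            frozenset(p for p, _ in bond_pairs),
            frozenset(t for _, t in bond_pairs),
        )
        if key not in seen:
            seen.add(key)
            kept.append((atom_pairs, bond_pairs))
    return kept


# Hypothetical (pattern index, target index) pairs for atoms and bonds.
forward = ([(0, 0), (1, 1), (2, 2)], [(0, 0), (1, 1)])
mirrored = ([(0, 2), (1, 1), (2, 0)], [(0, 1), (1, 0)])
shifted = ([(0, 1), (1, 2), (2, 3)], [(0, 1), (1, 2)])

print(len(unique_matches([forward, mirrored, shifted])))
```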
Exhaustive MCSS<br />
Pro:<br />
- guaranteed to find the maximum common substructure(s)<br />
Con:<br />
- complex structures cannot be mapped efficiently<br />
Search tree:<br />
- for each matching atom pair, extend the common sub-graph in all possible ways<br />
- when the common sub-graph cannot be extended any more, evaluate it
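The search tree can be sketched as a plain backtracking routine: seed with every matching atom pair, extend the mapping in all possible ways, and evaluate it once it cannot grow further. This is a generic, naive illustration and not OEChem's implementation; it assumes atoms match when their labels are equal, looks for a connected induced common sub-graph, and scores by atom count only.

```python
def exhaustive_mcs(p_labels, p_adj, t_labels, t_adj):
    """Return the largest pattern->target atom mapping found by backtracking."""
    best = {}

    def compatible(p, t, mapping):
        if p_labels[p] != t_labels[t]:
            return False
        # Bonds to already mapped atoms must exist on both sides or on neither.
        return all((q in p_adj[p]) == (u in t_adj[t]) for q, u in mapping.items())

    def extend(mapping):
        nonlocal best
        used_targets = set(mapping.values())
        extended = False
        for q, u in list(mapping.items()):
            for p in p_adj[q]:
                if p in mapping:
                    continue
                for t in t_adj[u]:
                    if t in used_targets or not compatible(p, t, mapping):
                        continue
                    extended = True
                    mapping[p] = t      # extend the sub-graph
                    extend(mapping)
                    del mapping[p]      # backtrack
        # Evaluate only when the sub-graph cannot be extended any more.
        if not extended and len(mapping) > len(best):
            best = dict(mapping)

    # Seed the search with every matching atom pair.
    for p in range(len(p_labels)):
        for t in range(len(t_labels)):
            if p_labels[p] == t_labels[t]:
                extend({p: t})
    return best


# Toy graphs: pattern C-C-O and target O-C-C-C (labels plus adjacency sets).
pattern = (["C", "C", "O"], {0: {1}, 1: {0, 2}, 2: {1}})
target = (["O", "C", "C", "C"], {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}})
result = exhaustive_mcs(*pattern, *target)
print(sorted(result.items()))
```

The routine revisits the same atom sets in many orders, which is exactly the cost the slide points to: the guarantee of finding the maximum comes with poor scaling on complex structures.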
Approximate MCSS<br />
• pre-defined paths in the pattern<br />
• try to follow the same path in the target
Comparison: Speed<br />
On average, the approximate search is 3-6 times faster.<br />
Example from the slide:<br />
• approximate: 0.4 sec (CPU), 3 matches<br />
• exhaustive: 100.1 sec (CPU), 59 matches
Exhaustive & Approximate MCSS<br />
[Figure: common substructures found by the approximate and the exhaustive search]
Comparison: Accuracy<br />
• 3000 compounds from Maybridge<br />
• 9M comparisons<br />
• OEMCSMaxAtoms()<br />
• approximate 3× faster
MCSS Example<br />
#!/usr/bin/env python<br />
from openeye.oechem import *<br />
import sys<br />
# Build the pattern and the target from SMILES<br />
pattern = OEGraphMol()<br />
target = OEGraphMol()<br />
OEParseSmiles(pattern, "c1cc(O)c(O)cc1CCN")<br />
OEParseSmiles(target, "c1c(O)c(O)c(Cl)cc1CCCBr")<br />
# Atom/bond equivalency and search method<br />
atomexpr = OEExprOpts_DefaultAtoms<br />
bondexpr = OEExprOpts_DefaultBonds<br />
mcstype = OEMCSType_Exhaustive<br />
mcss = OEMCSSearch(pattern, atomexpr, bondexpr, mcstype)<br />
# Scoring function and minimum size of a reported match<br />
mcss.SetMCSFunc(OEMCSMaxAtoms())<br />
mcss.SetMinAtoms(6)<br />
unique = True<br />
count = 1<br />
for match in mcss.Match(target, unique):<br />
    sys.stdout.write("\nMatch %d:" % count)<br />
    sys.stdout.write("\npattern atoms: ")<br />
    for ma in match.GetAtoms():<br />
        sys.stdout.write("%d " % ma.pattern.GetIdx())<br />
    sys.stdout.write("\ntarget atoms: ")<br />
    for ma in match.GetAtoms():<br />
        sys.stdout.write("%d " % ma.target.GetIdx())<br />
    count += 1<br />
    # Extract the matched part of the pattern as its own molecule<br />
    m = OEGraphMol()<br />
    OESubsetMol(m, match, True)<br />
    smi = OECreateCanSmiString(m)<br />
    sys.stdout.write("\nmatch smiles = %s \n" % smi)<br />
Output:<br />
Match 1:<br />
pattern atoms: 0 1 2 3 4 5 6 7 8 9<br />
target atoms: 7 5 3 4 1 2 0 8 9 10<br />
match smiles = CCc1ccc(c(c1)O)O<br />
[Figure: pattern and target structures labelled with atom indices]
Conclusion<br />
• Adaptable MCSS<br />
– define atom/bond equivalency<br />
– set scoring function<br />
– choose search method<br />
• The approximate search provides a good trade-off between accuracy and speed
Future Development<br />
Automatic R-group decomposition<br />
[Figure: example structures decomposed into a common core and substituents R1, R2, R3]
Link To Resources<br />
• http://www.eyesopen.com/docs/<br />
– OEChem – API (html/pdf)<br />
– OEChem – C++ Theory Manual (html/pdf)<br />
• Example: Chapter 17, Pattern Matching