MCSS in OEChem

Krisztina Boda, Matthew Stahl

OpenEye CUP IX, Santa Fe, March 2008


Overview

• What is MCSS?
• Why might chemists need MCSS?
• Defining MCSS in OEChem
  – How to define atom and bond equivalency?
  – How to evaluate detected common substructures?
  – How to traverse the search tree?
• Example code


Maximum Common Substructure

The largest sub-graph shared by two molecular structures.

NP-complete problem

$$N = \frac{m!\,n!}{(m-k)!\,(n-k)!\,k!}$$

n, m = number of nodes in the compared graphs
k = number of nodes in the common sub-graph

[Figure: two example molecular structures]
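To get a feel for how quickly the count N on this slide grows, it can be evaluated directly. The sketch below is illustrative only: the function name `count_k_node_mappings` is hypothetical and not part of OEChem, and it reads N as the number of ways to choose k nodes in each graph and pair them up, which is the same expression written as C(m,k) · C(n,k) · k!.

```python
from math import comb, factorial


def count_k_node_mappings(m: int, n: int, k: int) -> int:
    """Number of ways to pick k nodes from each graph and pair them one-to-one.

    Equals m! n! / ((m-k)! (n-k)! k!), computed as C(m,k) * C(n,k) * k!.
    """
    if k < 0 or k > min(m, n):
        return 0  # a common sub-graph cannot be larger than either graph
    return comb(m, k) * comb(n, k) * factorial(k)


# Candidate 10-node pairings between an 11-node and a 13-node graph.
print(count_k_node_mappings(11, 13, 10))
```

Even for graphs this small the count is on the order of 10^10, which is why a practical search prunes by atom and bond equivalency rather than enumerating every pairing.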


Possible Applications of MCSS

• clustering compound libraries
• diversity measures of a compound library
• mapping chemical reactions
• R-group decomposition


Atom/Bond Equivalency

OEExprOpts::DefaultAtoms = AtomicNumber|Aromaticity|FormalCharge

OEExprOpts::DefaultBonds = BondOrder|Aromaticity

[Figure: example structures illustrating the default atom and bond equivalency]
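The idea behind an expression such as AtomicNumber|Aromaticity|FormalCharge is that two atoms count as equivalent when they agree on every property whose flag is set. The following is a conceptual model only, written in plain Python: the `AtomExpr` flags, the `Atom` record, and `atoms_equivalent` are hypothetical names invented for this illustration and are not the OEChem API or its implementation.

```python
from dataclasses import dataclass
from enum import IntFlag


class AtomExpr(IntFlag):
    """Hypothetical stand-ins for the atom properties named on the slide."""
    ATOMIC_NUMBER = 1
    AROMATICITY = 2
    FORMAL_CHARGE = 4


# Mirrors the composition of the default atom expression shown on the slide.
DEFAULT_ATOMS = (AtomExpr.ATOMIC_NUMBER
                 | AtomExpr.AROMATICITY
                 | AtomExpr.FORMAL_CHARGE)


@dataclass(frozen=True)
class Atom:
    atomic_number: int
    aromatic: bool
    formal_charge: int


def atoms_equivalent(a: Atom, b: Atom, expr: AtomExpr) -> bool:
    """Compare only the properties whose flag is set in expr."""
    if expr & AtomExpr.ATOMIC_NUMBER and a.atomic_number != b.atomic_number:
        return False
    if expr & AtomExpr.AROMATICITY and a.aromatic != b.aromatic:
        return False
    if expr & AtomExpr.FORMAL_CHARGE and a.formal_charge != b.formal_charge:
        return False
    return True


aromatic_c = Atom(atomic_number=6, aromatic=True, formal_charge=0)
aliphatic_c = Atom(atomic_number=6, aromatic=False, formal_charge=0)

# With all three flags set, aromatic and aliphatic carbon differ.
print(atoms_equivalent(aromatic_c, aliphatic_c, DEFAULT_ATOMS))
# Dropping the aromaticity flag makes them equivalent.
print(atoms_equivalent(aromatic_c, aliphatic_c, AtomExpr.ATOMIC_NUMBER))
```

Loosening the expression (fewer flags) generally lets more atom pairs match, so the common substructure can only stay the same size or grow.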


OEExprOpts Examples

• atoms: DefaultAtom|EqAromatic, bonds: DefaultBond
• atoms: DefaultAtom|EqHalides, bonds: DefaultBond
• atoms: DefaultAtom, bonds: DefaultBond|EqSingleDouble

[Figure: common substructures found under each of the three settings]


MCS Scoring Functions

OEChem built-in scoring functions:
– OEMCSMaxAtoms [default]
– OEMCSMaxBonds
– OEMCSMaxAtomsCompleteCycle
– OEMCSMaxBondsCompleteCycle

A user-defined scoring function can also be used.


OEMCSMaxBonds scoring function

$$\text{score} = NB + \frac{NA}{100}$$

NB = number of bonds
NA = number of atoms

[Figure: three candidate matches with scores 5.06, 5.06, and 5.05]
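As a quick arithmetic check, the score NB + NA/100 can be computed by hand: the bond count dominates and the atom count only breaks ties. The sketch below is plain Python, not the OEChem class; `max_bonds_score` is a hypothetical helper, and the bond and atom counts are assumptions chosen to be consistent with the scores 5.06 and 5.05 shown on the slide.

```python
def max_bonds_score(num_bonds: int, num_atoms: int) -> float:
    """score = NB + NA / 100, as given on the slide."""
    return num_bonds + num_atoms / 100.0


# Assumed counts: 5 bonds over 6 atoms (an acyclic fragment) and
# 5 bonds over 5 atoms (a closed five-membered ring).
print("%.2f" % max_bonds_score(5, 6))
print("%.2f" % max_bonds_score(5, 5))
```

Because the atom term is divided by 100, a match with one more bond always outranks a match with more atoms but fewer bonds, as long as the atom counts stay well below 100.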


OEMCSMaxBondsCompleteCycle scoring function

$$\text{score} = NB - P \cdot NUCB + \frac{NA}{100}$$

NB = number of bonds
NA = number of atoms
P = penalty (default 1.0)
NUCB = number of unmapped cyclic query bonds

[Figure: example match]
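Reading the slide's formula as score = NB − P · NUCB + NA/100 (a reconstruction from the symbols listed, so treat it as an assumption), the effect of the penalty can be seen with a small calculation. The helper name and the example counts below are hypothetical and are not taken from OEChem.

```python
def max_bonds_complete_cycle_score(num_bonds: int,
                                   num_atoms: int,
                                   num_unmapped_cyclic_query_bonds: int,
                                   penalty: float = 1.0) -> float:
    """score = NB - P * NUCB + NA / 100 (formula as reconstructed from the slide)."""
    return (num_bonds
            - penalty * num_unmapped_cyclic_query_bonds
            + num_atoms / 100.0)


# A match covering a complete ring: no cyclic query bond is left unmapped.
print("%.2f" % max_bonds_complete_cycle_score(5, 5, 0))
# Same number of bonds, but the match cuts through a ring and leaves
# three cyclic query bonds unmapped (hypothetical counts).
print("%.2f" % max_bonds_complete_cycle_score(5, 6, 3))
```

With the default penalty, each unmapped cyclic query bond cancels one mapped bond, so matches that break rings are pushed below matches that keep rings intact.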


Unique/Non-Unique MCSS

An MCSS match is considered unique:
• if it differs from all other matches found previously by at least one atom or bond
• if it represents a different mapping

[Figure: example of unique maximum common substructures]


Exhaustive MCSS

Pro:
- guaranteed to find the maximum common substructure(s)

Contra:
- complex structures cannot be mapped efficiently

Search tree (from the figure):
• for each matching atom pair, extend the subgraph in all possible ways
• when the common sub-graph cannot be extended any more, evaluate it
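The search outlined on this slide (start from each matching atom pair, extend in all possible ways, evaluate when no extension is left) can be sketched as a generic backtracking routine. This is a toy illustration under simplifying assumptions, not OEChem's implementation: atoms are equivalent when their string labels are equal, bonds are untyped, and a mapping is scored by its atom count. The names `make_graph` and `exhaustive_mcs` are hypothetical.

```python
def make_graph(labels, bonds):
    """Return (labels, adjacency) for an undirected graph with untyped bonds."""
    adjacency = {i: set() for i in range(len(labels))}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return list(labels), adjacency


def exhaustive_mcs(pattern, target):
    """Return the largest connected atom mapping {pattern_idx: target_idx}."""
    p_labels, p_adj = pattern
    t_labels, t_adj = target
    best = {}

    def extend(mapping):
        nonlocal best
        used_targets = set(mapping.values())
        extended = False
        # Try to grow the mapping along a bond from every mapped atom pair.
        for p, t in list(mapping.items()):
            for p_next in p_adj[p]:
                if p_next in mapping:
                    continue
                for t_next in t_adj[t]:
                    if t_next in used_targets:
                        continue
                    if p_labels[p_next] != t_labels[t_next]:
                        continue
                    extended = True
                    mapping[p_next] = t_next
                    extend(mapping)
                    del mapping[p_next]  # backtrack
        # Cannot be extended any more: evaluate (here, by atom count).
        if not extended and len(mapping) > len(best):
            best = dict(mapping)

    # Seed the search with every matching atom pair.
    for p in range(len(p_labels)):
        for t in range(len(t_labels)):
            if p_labels[p] == t_labels[t]:
                extend({p: t})
    return best


# Two four-atom toy graphs that share a C-C-O fragment.
toy_pattern = make_graph(["C", "C", "O", "N"], [(0, 1), (1, 2), (1, 3)])
toy_target = make_graph(["C", "C", "O", "Cl"], [(0, 1), (1, 2), (1, 3)])
print(exhaustive_mcs(toy_pattern, toy_target))
```

The routine revisits the same sub-graph through many different extension orders, which is harmless for toy graphs but illustrates why the exhaustive approach becomes slow on complex structures.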


Approximate MCSS

• pre-defined paths in the pattern
• try to follow the same path in the target

[Figure: a pre-defined path in the pattern followed in the target]


Comparison: Speed

On average, the approximate search is 3-6 times faster.

Example from the figure:
• approximate: 0.4 sec (CPU), 3 matches
• exhaustive: 100.1 sec (CPU), 59 matches


Exhaustive & Approximate MCSS

[Figure: matches found by the approximate and by the exhaustive search]


Comparison: Accuracy

• 3000 compounds from Maybridge
• 9M comparisons
• OEMCSMaxAtoms()
• approximate: 3× faster


MCSS Example

#!/usr/bin/env python
from openeye.oechem import *
import sys

# Pattern (query) and target molecules
pattern = OEGraphMol()
target = OEGraphMol()
OEParseSmiles(pattern, "c1cc(O)c(O)cc1CCN")
OEParseSmiles(target, "c1c(O)c(O)c(Cl)cc1CCCBr")

# Atom/bond equivalency and search method
atomexpr = OEExprOpts_DefaultAtoms
bondexpr = OEExprOpts_DefaultBonds
mcstype = OEMCSType_Exhaustive
mcss = OEMCSSearch(pattern, atomexpr, bondexpr, mcstype)

# Scoring function and minimum size of a reported match
mcss.SetMCSFunc(OEMCSMaxAtoms())
mcss.SetMinAtoms(6)

unique = True
count = 1
for match in mcss.Match(target, unique):
    sys.stdout.write("\nMatch %d :" % count)
    sys.stdout.write("\npattern atoms: ")
    for ma in match.GetAtoms():
        sys.stdout.write("%d " % ma.pattern.GetIdx())
    sys.stdout.write("\ntarget atoms: ")
    for ma in match.GetAtoms():
        sys.stdout.write("%d " % ma.target.GetIdx())
    count += 1
    # Build a molecule from the matched atoms and print it as SMILES
    m = OEGraphMol()
    OESubsetMol(m, match, True)
    smi = OECreateCanSmiString(m)
    sys.stdout.write("\nmatch smiles = %s \n" % smi)

Output:

Match 1:
pattern atoms: 0 1 2 3 4 5 6 7 8 9
target atoms: 7 5 3 4 1 2 0 8 9 10
match smiles = CCc1ccc(c(c1)O)O

[Figure: pattern and target structures labeled with atom indices]


Conclusion

• Adaptable MCSS
  – define atom/bond equivalency
  – set scoring function
  – choose search method
• The approximate search provides a good trade-off between accuracy and speed


Future Development

Automatic R-group decomposition

[Figure: structures decomposed into a common core and substituents R1, R2, R3]


Link To Resources

• http://www.eyesopen.com/docs/
  – OEChem – API (html/pdf)
  – OEChem – C++ Theory Manual (html/pdf)
• Example
  Chapter 17. Pattern Matching
