11 Toolkits. What can you do with them?


Phillip Sawunyama

EuroCUP IV, April 2010


Who is having fun?




Molecular modeling and cheminformatics toolkits

• Zap TK: Poisson-Boltzmann electrostatics solver
• Spicoli TK: molecular surface and volume generation
• Szybki TK: general-purpose function optimization
• Quacpac TK: tautomer/protomer enumeration and charge assignment
• Shape TK: molecular shape descriptors for overlay and similarity
• Omega TK: conformer ensembles containing bioactive conformations
• Docking TK: docking ligands into a receptor
• OEChem TK: programming library for chemistry and cheminformatics
• GraphSim TK: fingerprints for 2D similarity
• MolProp TK: molecular property calculation
• Lexichem/OEIUPAC TK: conversion of chemical names to chemical structures and vice versa
• Ogham/OEDepict TK: rendering of 2D molecular drawings


OEChem TK (1.7.2): required in order to use the other toolkits

• OEChem: molecules, file formats, searching
• OEMath: geometry handling, constants
• OEBio: sequence alignment, crystal symmetry
• OEGrid: data container for points in a 3D lattice
• OEPlatform: I/O, synchronization, machine info
• OESystem: error handling, data structures


If you just have OEChem…

• Data processing
– SD data handling
– Small in-memory molecules
• 2D
– Exact and maximum common substructure search
– LINGOs
– Normalization
– Library generation
• 3D
– Conformers
– Translation/rotation/torsions
– RMSD fitting (including automorphisms)


OEChem: Molecules

• Easy way to manage molecules, atoms, bonds, conformers, queries, and reactions.
• Iterators for traversing atoms and bonds:

from openeye.oechem import *
mol = OEGraphMol()
OESmilesToMol(mol, "CCO")  # any molecule will do
for atom in mol.GetAtoms():
    for bond in atom.GetBonds():
        print(atom.GetName(), bond.IsRotor())


OEChem file format conversion

• SMILES (Daylight chemistry), MOL2 (Tripos chemistry), SDF (MDL chemistry), PDB (PDB chemistry), OEB (OpenEye chemistry)
• Aromaticity, valence model, kekulization, atom types and implicit H's


Multiple aromaticity models

MDL:       Yes   No    No    No    No
Tripos:    Yes   No    No    N/A   No
MMFF:      Yes   Yes   No    N/A   No
Daylight:  Yes   Yes   Yes   No    No/Yes
OpenEye:   Yes   Yes   Yes   Yes   Yes

OEAssignAromaticFlags(mol, OEAroModelTripos)


High Level and Low Level

High level: OEWriteMolecule(ofs, mol)

Low level: OECreateSmiString(str, mol, AtomMap|Kekule|Canonical)


Simple File Conversion Example

import sys
from openeye.oechem import *

# Open input & output streams
ifs = oemolistream(sys.argv[1])
ofs = oemolostream(sys.argv[2])

# Write out data
for mol in ifs.GetOEGraphMols():
    OEWriteMolecule(ofs, mol)

The OEWriteMolecule function can change your molecules. Use the OEWriteConstMolecule function if you do not want your molecules to change.


Canonical Isomeric SMILES: solved problems

• (R)/(S): CC([C@@H](C)N)[C@H](C)N and CC([C@H](C)N)[C@@H](C)N
• C/C=C/C=C\C and C/C=C\C=C\C
• C1C[C@@H](CC[C@@H]1O)O, C1C[C@H](CC[C@H]1O)O, C1C[C@H](CC[C@@H]1O)O, C1C[C@@H](CC[C@H]1O)O
• Test set: 10M compounds (bond or atom stereo), atoms reordered 10 times per compound: 78 failures


SD Data Manipulation

• Get data, change data, set data
• SDF and OEB files
• Example SD data fields: Registry Number, Binding Energy, CYP 3A4 Data


Simple SDData Example

import sys
from openeye.oechem import *

ifs = oemolistream(sys.argv[1])
ofs = oemolostream(sys.argv[2])
mol = OEGraphMol()
while OEReadMolecule(ifs, mol):
    regid = OEGetSDData(mol, "ID")   # get data
    regid = "OpenEye" + regid        # change data
    OESetSDData(mol, "ID", regid)    # set data
    OEWriteMolecule(ofs, mol)
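The get/change/set pattern above operates on the SD tags stored after each connection table. As a rough illustration of what that means at the file level, here is a pure-Python sketch (no OpenEye calls; split_record, join_record and prefix_sd_tag are hypothetical helpers) that prefixes the ID tag of every record, assuming V2000 records that contain an "M  END" line and are terminated by "$$$$".

```python
import re
from typing import List, Tuple

# Matches SD data header lines such as "> <ID>" or ">  <ID> (1)".
TAG_RE = re.compile(r"^>.*<([^>]+)>")


def split_record(record: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split one SD record into its molfile block and its (tag, value) pairs."""
    molblock, _, data = record.partition("M  END")
    items: List[Tuple[str, str]] = []
    tag, value_lines = None, []
    for line in data.splitlines():
        match = TAG_RE.match(line)
        if match:
            tag, value_lines = match.group(1), []
        elif tag is not None and line.strip() == "":
            items.append((tag, "\n".join(value_lines)))  # a blank line ends a value
            tag = None
        elif tag is not None:
            value_lines.append(line)
    if tag is not None:
        items.append((tag, "\n".join(value_lines)))
    return molblock + "M  END", items


def join_record(molblock: str, items: List[Tuple[str, str]]) -> str:
    """Rebuild an SD record from a molfile block and (tag, value) pairs."""
    parts = [molblock] + [f"> <{tag}>\n{value}\n" for tag, value in items]
    return "\n".join(parts) + "\n$$$$\n"


def prefix_sd_tag(sdf_text: str, tag: str = "ID", prefix: str = "OpenEye") -> str:
    """Get, change, and set one SD tag in every record of an SD file."""
    out = []
    # Assumption: every record is terminated by a "$$$$" line.
    for record in sdf_text.split("$$$$\n"):
        if not record.strip():
            continue
        molblock, items = split_record(record)
        items = [(t, prefix + v if t == tag else v) for t, v in items]
        out.append(join_record(molblock, items))
    return "".join(out)
```

Other tags, such as a binding energy field, pass through unchanged; only the requested tag is rewritten.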


Small in-memory molecules

• OEMolBase::Compress
• OEMolBase::UnCompress
• MiniMol
– Graph only
– Only call Compress once
• DBMols: can be even smaller
– Coordinates and all
– Call UnCompress and Compress every time


Substructure Search

• Loop through a database writing out matches

from openeye.oechem import *

# Initialize a substructure search, then write out every matching molecule
def SubSearch(ifs, ofs, qmol):
    subsearch = OESubSearch()
    atomexpr = OEExprOpts_DefaultAtoms
    bondexpr = OEExprOpts_DefaultBonds
    if not subsearch.Init(qmol, atomexpr, bondexpr):
        OEThrow.Fatal("Unable to initialize substructure search")
    for mol in ifs.GetOEGraphMols():
        if subsearch.SingleMatch(mol):
            OEWriteMolecule(ofs, mol)


MDL Query File Support


Maximum Common Substructure Search

• Maximum common substructure searches can be performed in OEChem using the OEMCSSearch class. The OEMCSSearch class can be initialized with a SMARTS pattern, an OEQMolBase query molecule, or a molecule with expression options.


Exhaustive MCSS

• Pro: guaranteed to find the maximum common substructure(s)
• Con: complex structures cannot be mapped efficiently
• For each matching atom pair, extend the common subgraph in all possible ways; once the common subgraph cannot be extended any more, evaluate it.
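The exhaustive strategy can be sketched in plain Python on a toy graph format (atom labels plus a bond-order map, an assumption of this sketch; it is not how OEMCSSearch is implemented). Every matching atom pair seeds a mapping, each mapping is extended in all possible ways, and a mapping is evaluated only when it cannot be extended any more. A new pair is accepted only if its bonds and non-bonds to the already mapped atoms agree, so the result is a connected, induced common substructure. The search is exponential, which is the "complex structures" drawback listed above.

```python
from typing import Dict, List, Optional, Tuple

# Toy molecule graph (hypothetical format): atom labels plus a bond map
# {(i, j): bond_order} with i < j.
Graph = Tuple[List[str], Dict[Tuple[int, int], int]]


def bond(g: Graph, i: int, j: int) -> Optional[int]:
    """Bond order between atoms i and j, or None if they are not bonded."""
    return g[1].get((min(i, j), max(i, j)))


def neighbors(g: Graph, i: int) -> List[int]:
    return [b if a == i else a for (a, b) in g[1] if i in (a, b)]


def exhaustive_mcs(g1: Graph, g2: Graph) -> Dict[int, int]:
    """Largest connected atom mapping g1 -> g2, found by exhaustive extension."""
    best: Dict[int, int] = {}

    def extend(mapping: Dict[int, int]) -> None:
        nonlocal best
        used = set(mapping.values())
        extended = False
        for a, b in mapping.items():
            for na in neighbors(g1, a):
                if na in mapping:
                    continue
                for nb in neighbors(g2, b):
                    if nb in used or g1[0][na] != g2[0][nb]:
                        continue
                    # Bonds (and non-bonds) to every mapped atom must agree.
                    if all(bond(g1, na, x) == bond(g2, nb, y) for x, y in mapping.items()):
                        extended = True
                        extend({**mapping, na: nb})
        # The common subgraph cannot be extended any more: evaluate it.
        if not extended and len(mapping) > len(best):
            best = dict(mapping)

    # Seed the search with every matching atom pair.
    for i, label1 in enumerate(g1[0]):
        for j, label2 in enumerate(g2[0]):
            if label1 == label2:
                extend({i: j})
    return best
```

For example, propane maps completely onto butane (3 atoms), while cyclopropane shares only a 2-atom fragment with propane, because the third ring bond has no counterpart in the chain.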


Approximate MCSS

• Take pre-defined paths in the pattern and try to follow the same paths in the target.


Clique Search

A clique search can also be performed using the OECliqueSearch class, which identifies all common substructures between a reference molecule and a target molecule.
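A common textbook way to turn common-substructure search into a clique problem is the compatibility (modular product) graph: each node pairs a reference atom with a target atom of the same label, and two nodes are joined when the corresponding bond relationships agree (same bond order, or unbonded in both). Every maximal clique is then a maximal common, possibly disconnected, substructure. The sketch below uses Bron-Kerbosch with pivoting on a hypothetical toy graph format; it illustrates the idea only and makes no claim about OECliqueSearch internals.

```python
from typing import Dict, FrozenSet, List, Optional, Sequence, Set, Tuple

Pair = Tuple[int, int]
Bonds = Dict[Tuple[int, int], int]  # {(i, j): bond_order} with i < j


def bond(bonds: Bonds, i: int, j: int) -> Optional[int]:
    return bonds.get((min(i, j), max(i, j)))


def compatibility_graph(labels1: Sequence[str], bonds1: Bonds,
                        labels2: Sequence[str], bonds2: Bonds) -> Dict[Pair, Set[Pair]]:
    nodes = [(i, j) for i, a in enumerate(labels1)
             for j, b in enumerate(labels2) if a == b]
    adj: Dict[Pair, Set[Pair]] = {n: set() for n in nodes}
    for x, (i, j) in enumerate(nodes):
        for (k, l) in nodes[x + 1:]:
            # Compatible: distinct atoms on both sides and the same bond (or no bond).
            if i != k and j != l and bond(bonds1, i, k) == bond(bonds2, j, l):
                adj[(i, j)].add((k, l))
                adj[(k, l)].add((i, j))
    return adj


def maximal_cliques(adj: Dict[Pair, Set[Pair]]) -> List[FrozenSet[Pair]]:
    """Bron-Kerbosch with pivoting; each clique is one atom mapping."""
    cliques: List[FrozenSet[Pair]] = []

    def expand(r: FrozenSet[Pair], p: Set[Pair], x: Set[Pair]) -> None:
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques
```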


Generate Molecular Framework

Morphine: scaffold or reduced graph
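One common definition of a molecular framework, the Bemis-Murcko style scaffold (ring systems plus the linkers between them), can be computed by repeatedly pruning terminal atoms. The plain-Python sketch below works on atom indices and a bond list (hypothetical inputs, not an OEChem API). Acyclic molecules prune to an empty framework, and exocyclic double-bonded atoms are removed too; a reduced graph would need further steps.

```python
from typing import Iterable, Set, Tuple


def framework_atoms(n_atoms: int, bonds: Iterable[Tuple[int, int]]) -> Set[int]:
    """Atoms left after repeatedly pruning terminal (degree <= 1) atoms."""
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    changed = True
    while changed:
        changed = False
        # Side-chain atoms are terminal; removing them exposes the next layer.
        for atom in [a for a, nbrs in adj.items() if len(nbrs) <= 1]:
            for nbr in adj.pop(atom):
                adj[nbr].discard(atom)
            changed = True
    return set(adj)
```

For toluene (a six-membered ring with one methyl), the methyl carbon is pruned and the six ring atoms remain.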


OEChem: Normalization

Normalization is specified with SMIRKS.



Inter-conversion of the nitro group

The nitro group is typically represented either with a pentavalent nitrogen, "*N(=O)=O", or in charge-separated form, "*[N+](=O)[O-]".

SMIRKS patterns

To convert to the charge-separated form:
"[*:1][N:2](=[O:3])=[O:4]>>[*:1][N+:2](=[O:3])[O-:4]"

To convert to the pentavalent form:
"[*:1][N+:2]([O-:3])=[O:4]>>[*:1][N+0:2](=[O+0:3])=[O:4]"


Library Generation

Supported MDL query features:
• generic and alternative atom types
• number of substituents, ring bonds, hydrogens
• unsaturated atom
• alternative bond types, topology, stereo care

Example: amide.rxn (substructure search followed by transformation)


Conformers and Geometry

• Easy handling of multi-conformer molecules
• Torsions
• Translations
• Rotations
• Moments of inertia
• RMSD: automatically handles automorphisms
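Handling automorphisms means the RMSD is minimized over the symmetry-equivalent atom orderings of the molecule, so listing two equivalent atoms in swapped order does not inflate it. A brute-force sketch, assuming the two conformers are already superimposed (no fitting step) and the molecule is small enough to try every permutation:

```python
import itertools
import math
from typing import Iterator, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def automorphisms(labels: Sequence[str],
                  bonds: Sequence[Tuple[int, int]]) -> Iterator[Tuple[int, ...]]:
    """Yield atom permutations that preserve labels and bonds (brute force)."""
    edges = {frozenset(b) for b in bonds}
    n = len(labels)
    for perm in itertools.permutations(range(n)):
        if all(labels[perm[i]] == labels[i] for i in range(n)) and \
                {frozenset((perm[a], perm[b])) for a, b in bonds} == edges:
            yield perm


def rmsd(ref: Sequence[Vec3], fit: Sequence[Vec3], order: Sequence[int]) -> float:
    """RMSD with atom i of ref paired to atom order[i] of fit (no superposition)."""
    n = len(ref)
    return math.sqrt(sum(math.dist(ref[i], fit[order[i]]) ** 2 for i in range(n)) / n)


def symmetric_rmsd(labels, bonds, ref, fit) -> float:
    """Minimum RMSD over all automorphisms of the molecular graph."""
    return min(rmsd(ref, fit, perm) for perm in automorphisms(labels, bonds))
```

For a linear O=C=O model with 1 Å bonds and the two oxygens listed in opposite order, the naive RMSD is sqrt(8/3), about 1.63 Å, while the symmetry-corrected value is 0. Real toolkits avoid the factorial enumeration by computing automorphisms with graph-matching algorithms.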


OEGrid

• Bundled with OEChem
• Supports lattices as first-class objects
– OEScalarGrid
– OESkewGrid
• Features
– I/O from several formats (gridbabel)
– Attach as generic data to molecules
– Regularization
– Masking
– Interpolation
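Interpolation on a lattice usually means trilinear interpolation between the eight corners of the enclosing cell. A sketch for a cubic grid stored as a flat list in x-major order (the storage layout and the function name are assumptions here, not the OEScalarGrid API):

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def trilinear(values: Sequence[float], dims: Tuple[int, int, int],
              origin: Vec3, spacing: float, point: Vec3) -> float:
    """Trilinear interpolation on a regular grid stored x-major:
    values[(ix * ny + iy) * nz + iz]."""
    nx, ny, nz = dims
    fx, fy, fz = ((p - o) / spacing for p, o in zip(point, origin))
    i, j, k = math.floor(fx), math.floor(fy), math.floor(fz)
    # Need a full cell (8 corners); points on the upper faces are rejected.
    if not (0 <= i < nx - 1 and 0 <= j < ny - 1 and 0 <= k < nz - 1):
        raise ValueError("point lies outside the interpolation region")
    dx, dy, dz = fx - i, fy - j, fz - k
    result = 0.0
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                weight = ((dx if a else 1 - dx) * (dy if b else 1 - dy)
                          * (dz if c else 1 - dz))
                result += weight * values[((i + a) * ny + (j + b)) * nz + (k + c)]
    return result
```

Because trilinear interpolation reproduces linear functions exactly, sampling f(x, y, z) = x + 2y + 3z on a 3 × 3 × 3 grid with 0.5 spacing and interpolating at (0.3, 0.7, 0.2) gives 2.3 up to rounding.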


Wedge bond direction

• Cleans up structures with incorrectly assigned bond stereochemistry (a common problem), in which the narrow end of the wedge bond is incorrectly oriented toward an sp2 atom.
• The bond stereo specified by the MDL/SDF file is stored as generic data on the bond using the "OEProperty_BondStereo" tag.


Wedge bond direction

• Demo!
• This corrects the bond direction when it is obviously wrong. However, in some cases, such as when both atoms of a bond are chiral, the direction cannot be corrected automatically. In these cases the program only gives a warning and leaves the bond direction unchanged.


OEBio

• Hierarchy view
– Represent a biopolymer molecule as a hierarchy of its components: OEHierChain, OEHierFragment, and OEHierResidue.
• Extract the ligand from a protein-ligand complex
• Extract the protein backbone
• Calculate phi/psi torsion angles for residues (Ramachandran plot)
• Alternate locations
• Sequence alignment
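Phi and psi are ordinary torsion angles over backbone atoms: phi(i) uses C(i-1), N(i), CA(i), C(i), and psi(i) uses N(i), CA(i), C(i), N(i+1). A plain-Python sketch, assuming residues are given as hypothetical dicts of "N", "CA" and "C" coordinates, consecutive and without chain breaks; angles follow the IUPAC sign convention and lie between -180 and 180 degrees.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    cos_term = _dot(v, w)
    sin_term = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(sin_term, cos_term))


def phi_psi(residues: List[Dict[str, Vec3]]) -> List[Tuple[float, float]]:
    """(phi, psi) for every residue that has both neighbors."""
    angles = []
    for i in range(1, len(residues) - 1):
        prev, cur, nxt = residues[i - 1], residues[i], residues[i + 1]
        phi = dihedral(prev["C"], cur["N"], cur["CA"], cur["C"])
        psi = dihedral(cur["N"], cur["CA"], cur["C"], nxt["N"])
        angles.append((phi, psi))
    return angles
```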


… of PDB Structures Contain Alternate Location Codes

• Not molecules but ensembles
• To compute molecular properties, you must restrict to a subset of atoms
– (e.g. just the 'A's)
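Restricting to one alternate location can also be done directly on PDB text: in ATOM and HETATM records the altLoc indicator is column 17. This sketch keeps atoms with a blank altLoc or the chosen code ("A" by default) and passes all other records through unchanged; it is an illustration, not the OEBio API.

```python
from typing import Iterable, List


def select_altloc(pdb_lines: Iterable[str], keep: str = "A") -> List[str]:
    """Drop ATOM/HETATM records whose altLoc (column 17) is neither blank nor `keep`."""
    selected = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            altloc = line[16] if len(line) > 16 else " "
            if altloc not in (" ", keep):
                continue
        selected.append(line)
    return selected
```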


OEBio

• Create a subset of residues (demo)


The Shape Toolkit

• Greatly simplifies some tasks
– Using multiple queries
• Exclusions
– Using different color files in the same run
• Filtering with constraints
• Some tasks that cannot be done in ROCS
– NxN comparisons/clustering
– Shape multipoles
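NxN comparisons reduce to filling a similarity matrix. The sketch below uses a first-order Gaussian overlap volume in the style of Grant and Pickup (amplitude 2√2 per atom, exponent chosen so each atomic Gaussian has its hard-sphere volume) and the shape Tanimoto V_AB / (V_AA + V_BB - V_AB). It assumes the molecules are already overlaid (optimizing the overlay, the hard part in ROCS and the Shape TK, is omitted) and represents each molecule as a list of (coordinates, radius) pairs; none of the names are Shape TK API.

```python
import math
from typing import List, Sequence, Tuple

Atom = Tuple[Tuple[float, float, float], float]  # (center, radius)
P = 2.0 * math.sqrt(2.0)  # Gaussian amplitude per atom


def gaussian_exponent(radius: float) -> float:
    """Exponent chosen so one atomic Gaussian has the volume of its hard sphere."""
    return math.pi * (3.0 * P / (4.0 * math.pi * radius ** 3)) ** (2.0 / 3.0)


def overlap(mol_a: Sequence[Atom], mol_b: Sequence[Atom]) -> float:
    """First-order Gaussian overlap volume (sum over all atom pairs)."""
    total = 0.0
    for ca, ra in mol_a:
        alpha_a = gaussian_exponent(ra)
        for cb, rb in mol_b:
            alpha_b = gaussian_exponent(rb)
            s = alpha_a + alpha_b
            d2 = math.dist(ca, cb) ** 2
            total += P * P * (math.pi / s) ** 1.5 * math.exp(-alpha_a * alpha_b / s * d2)
    return total


def shape_tanimoto(a: Sequence[Atom], b: Sequence[Atom]) -> float:
    v_ab = overlap(a, b)
    return v_ab / (overlap(a, a) + overlap(b, b) - v_ab)


def tanimoto_matrix(mols: Sequence[Sequence[Atom]]) -> List[List[float]]:
    """All-against-all (NxN) shape Tanimoto matrix, e.g. as input for clustering."""
    return [[shape_tanimoto(m1, m2) for m2 in mols] for m1 in mols]
```

The diagonal is 1 and, since the Gaussian overlap is a positive-definite kernel, every entry stays between 0 and 1; the matrix can be fed to any clustering routine.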


OEQuacPac & OESzybki

• On-the-fly protonation state sampling
– Load protein and ligand
– OEAddExplicitHydrogens
– Loop over pKa and tautomer states
• OESet3DHydrogenGeom
• Perform Szybki minimization
– Write out the state with the best score


OESpicoli

• Surfaces
– Molecular
– Accessible
– Primitives
• Properties
– Area
– Volume
• Void volume
– Curvature
– Depth
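Accessible surface area is classically estimated with the Shrake-Rupley method: place test points on each atom's sphere inflated by the probe radius and count the points that are not buried inside any other inflated sphere. A pure-Python sketch; the 1.4 Å probe and 960 points per atom are assumed defaults, not Spicoli settings.

```python
import math
from typing import List, Sequence, Tuple

Atom = Tuple[Tuple[float, float, float], float]  # (center, van der Waals radius)


def sphere_points(n: int) -> List[Tuple[float, float, float]]:
    """Approximately uniform points on the unit sphere (golden-section spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * k), r * math.sin(golden_angle * k), z))
    return points


def sasa(atoms: Sequence[Atom], probe: float = 1.4, n_points: int = 960) -> float:
    """Shrake-Rupley solvent-accessible surface area (input length units, squared)."""
    unit = sphere_points(n_points)
    total = 0.0
    for i, (center, radius) in enumerate(atoms):
        big = radius + probe
        exposed = 0
        for ux, uy, uz in unit:
            point = (center[0] + big * ux, center[1] + big * uy, center[2] + big * uz)
            # A test point is buried if it lies inside another atom's inflated sphere.
            if not any(j != i and math.dist(point, other) < r_other + probe
                       for j, (other, r_other) in enumerate(atoms)):
                exposed += 1
        total += 4.0 * math.pi * big * big * exposed / n_points
    return total
```

An isolated atom returns the full sphere area 4π(r + probe)², and overlapping atoms return less than the sum of their individual areas.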


Calculate binding site surface




ZAP: PB Electrostatics

• Solvation energy & potential grids:
– zap.CalcSolvationEnergy()
– zap.CalcPotentialGrid()
• Both properties on a grid
• Accessible surface area: OEArea
• Binding properties: OEBind
– OEBindResults() allows dissection of all components
• Electrostatic similarity: OEET
– In situ calculation of electrostatic Tanimoto


OEIUPAC (Lexichem)

• The languages of chemistry: images, names, structures

Methylsalicylate / 2-methoxybenzoate: COc1ccccc1C(=O)[O-]


Lexichem: Naming

• Names structures
– Multi-language support
• German, Japanese, Welsh, Klingon...
• 100% round-trip translation
• Name -> structure -> name
– 250,000-compound NCI screening database
– 92.5% round-trip

DOI: 10.1021/ci800243w; J. Chem. Inf. Model., 49, 519 (2009)


Naming and Depiction (OEDepict)

• Lexichem: benzene -> c1ccccc1 (nam2mol) and c1ccccc1 -> benzene (mol2nam)
• Ogham: c1ccccc1 -> 2D drawing (depict, mol2gif)

Molfile (2D):

  -OEChem-05210615262D
  6  6  0  0  0  0  0  0  0  0999 V2000
   -0.8674    1.5027    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.8674    0.4976    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.8674    0.4976    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.8674    1.5027    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    2.0102    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  6  2  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  3  4  1  0  0  0  0
  4  5  2  0  0  0  0
  5  6  1  0  0  0  0
M  END


Picto

• Depiction/naming tool
– OEChem + Lexichem + Ogham
• Ships with the toolkits and also with VIDA 4.0; must be compiled (C++)
• Enter a SMILES string or a name
– The structure is depicted
• Atom selection -> SMARTS


Picto


OEGraphSim

• Fingerprints
– MACCS
– LINGO
– Path-based
• Storing fingerprints
• Similarity search using OEFPDatabase
• Performance & validation
• Fingerprint similarity measures
– Built-in, user-defined
• User-defined path fingerprint
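Common fingerprint similarity measures include Tanimoto, Dice, Cosine and Tversky. On fingerprints represented as Python sets of on-bit indices (an assumption of this sketch, not the OEFingerPrint type) they can be written as follows; a user-defined measure is just another function with the same signature, passed to the hypothetical rank helper.

```python
import math
from typing import Callable, Dict, List, Set

Measure = Callable[[Set[int], Set[int]], float]


def tanimoto(a: Set[int], b: Set[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a: Set[int], b: Set[int]) -> float:
    total = len(a) + len(b)
    return 2.0 * len(a & b) / total if total else 0.0


def cosine(a: Set[int], b: Set[int]) -> float:
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def tversky(a: Set[int], b: Set[int], alpha: float = 0.5, beta: float = 0.5) -> float:
    """Asymmetric measure; alpha = beta = 1 gives Tanimoto, 0.5 gives Dice."""
    common = len(a & b)
    denom = alpha * len(a - b) + beta * len(b - a) + common
    return common / denom if denom else 0.0


def rank(query: Set[int], database: Dict[str, Set[int]],
         measure: Measure = tanimoto) -> List[str]:
    """Database entry names sorted by decreasing similarity to the query."""
    return sorted(database, key=lambda name: measure(query, database[name]), reverse=True)
```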


MACCS keys

• 166-bit fingerprint
• Each bit is associated with a predefined SMARTS pattern
• MACCS FP generation = substructure search

OEMakeFP(fp, mol, OEFPType_MACCS166)
OEMakeMACCS166FP(fp, mol)
…


LINGO

• Based on the fragmentation of canonical isomeric SMILES into overlapping substrings
• Similarity concept: similar SMILES imply similar structures
• LINGO FP generation = SMILES generation

OEMakeFP(fp, mol, OEFPType_Lingo)
OEMakeLingoFP(fp, mol)

Example: c1cc([nH]c1)N vs c1cc([nH]c1)O
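A minimal sketch of the LINGO idea: cut a SMILES string into overlapping substrings of length 4 and compare the resulting sets with a Tanimoto coefficient. Published LINGO variants add SMILES preprocessing and a count-based similarity; those refinements are omitted here, and the input is assumed to be canonical isomeric SMILES already.

```python
from typing import Set


def lingos(smiles: str, q: int = 4) -> Set[str]:
    """All overlapping substrings of length q (the LINGOs of a SMILES string)."""
    return {smiles[i:i + q] for i in range(len(smiles) - q + 1)}


def lingo_tanimoto(smiles1: str, smiles2: str, q: int = 4) -> float:
    """Set-based Tanimoto similarity of two LINGO sets."""
    a, b = lingos(smiles1, q), lingos(smiles2, q)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

For the two SMILES on the slide, each string yields 10 distinct LINGOs and they share 9 of the 11 in their union (only "c1)N" and "c1)O" differ), so the similarity is 9/11, about 0.82.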


Path-based fingerprint

• No predefined pattern dictionary
• Exhaustively enumerates all paths in a molecule
• Path FP generation = enumerating paths + hashing (different paths can collide on the same bit)

OEMakeFP(fp, mol, OEFPType_Path)
OEMakePathFP(fp, mol)
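The enumerate-then-hash recipe fits in a few lines of Python. This sketch walks every simple path of up to five bonds (matching the 0-5 path lengths quoted for the default Path FP), writes each path as alternating atom labels and bond orders, and folds it into 4096 bits with CRC-32. The graph format and the string encoding are assumptions, not OEGraphSim's actual atom and bond typing; because distinct paths may hash to the same bit, collisions are possible.

```python
import zlib
from typing import Dict, List, Set, Tuple


def path_fingerprint(labels: List[str], bonds: Dict[Tuple[int, int], int],
                     max_bonds: int = 5, nbits: int = 4096) -> Set[int]:
    """On-bits for every simple path of 0..max_bonds bonds, hashed into nbits."""
    adj: Dict[int, List[Tuple[int, int]]] = {i: [] for i in range(len(labels))}
    for (a, b), order in bonds.items():
        adj[a].append((b, order))
        adj[b].append((a, order))
    bits: Set[int] = set()

    def emit(tokens: List[str]) -> None:
        # A path and its reverse describe the same fragment: use one canonical key.
        key = min(".".join(tokens), ".".join(reversed(tokens)))
        # crc32 is deterministic across runs (unlike Python's salted str hash).
        bits.add(zlib.crc32(key.encode()) % nbits)

    def walk(path: List[int], tokens: List[str]) -> None:
        emit(tokens)
        if len(path) - 1 == max_bonds:
            return
        for nbr, order in adj[path[-1]]:
            if nbr not in path:
                walk(path + [nbr], tokens + [str(order), labels[nbr]])

    for start in range(len(labels)):
        walk([start], [labels[start]])
    return bits
```

Since every path of a substructure is also a path of the larger molecule, the bits of propane are always a subset of the bits of butane.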


Fingerprint memory usage (chart: size in MB for MACCS, LINGO, and PATH fingerprints; RSS, OEB, and OEB GZ)

• Dataset: MDDR (~110,000 compounds)
• OEB GZ, OEB: disk space
• RSS: size in physical memory
– Using compressed OEDBMols
• Path FPs are generated with default parameters
– 4096-bit fingerprint

Estimated memory usage for 1M compounds:
MACCS: 430 MB, LINGO: 470 MB, PATH: 900 MB
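A back-of-the-envelope check on the raw fingerprint bits alone, assuming dense bit storage and ignoring the stored molecules and any index overhead:

$$
\frac{4096\ \text{bits}}{8\ \text{bits/byte}} = 512\ \text{bytes},\qquad
512\ \text{bytes} \times 10^{6} = 5.12 \times 10^{8}\ \text{bytes} \approx 512\ \text{MB}\ \ \text{(PATH)}
$$

$$
\left\lceil \frac{166\ \text{bits}}{8\ \text{bits/byte}} \right\rceil = 21\ \text{bytes},\qquad
21\ \text{bytes} \times 10^{6} \approx 21\ \text{MB}\ \ \text{(MACCS)}
$$

The raw path bits thus account for roughly 512 MB of the ~900 MB estimate, while MACCS bits need only about 21 MB, which suggests that most of the remaining memory goes to the compressed molecules held alongside the fingerprints.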


Fingerprint generation time (chart: time in seconds for MACCS, LINGO, and PATH on Intel Core 2 and Intel Xeon)

• Dataset: MDDR (~110,000)
• Path FPs with default parameters
– 4096-bit fingerprint
– Path lengths 0-5
• Intel® Core 2 Quad CPU Q6600 (2.4 GHz)
• Intel® Xeon® CPU X5560 (2.8 GHz)

Estimated time to generate FPs for 1M compounds (ISM to OEB):
MACCS: ~12 min, LINGO: ~2 min, PATH: ~5 min


OEMolProp

• Calculates 2D molecular properties
– XlogP
– XlogS
– PSA
– Hydrogen bond donor and acceptor count
– Rotatable bonds, ring size and number
• Generates custom filters
– ADME filters such as Lipinski
• Provides graph-based protonation state assignment for consistency and speed
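Lipinski's rule of five flags compounds with molecular weight above 500, logP above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors; compounds are commonly kept if they violate at most one rule. The sketch assumes the properties have already been computed (by OEMolProp or any other calculator) and only applies the filter; the Properties class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed 2D properties for one molecule (hypothetical container)."""
    mol_weight: float
    logp: float
    hbond_donors: int
    hbond_acceptors: int


def lipinski_violations(p: Properties) -> int:
    """Count violations of Lipinski's rule of five."""
    return sum([
        p.mol_weight > 500.0,
        p.logp > 5.0,
        p.hbond_donors > 5,
        p.hbond_acceptors > 10,
    ])


def passes_lipinski(p: Properties, max_violations: int = 1) -> bool:
    """Common convention: allow at most one violation."""
    return lipinski_violations(p) <= max_violations
```

Tightening max_violations to 0 turns the same function into a strict filter, which is one way to build the kind of custom filter mentioned above.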


Link OE Libraries: DEMO

1. Link libraries: sys, oechem, oeiupac, oeomega, oegrid
2. Open output streams: sys
3. Read compound names (IUPAC) from a text file and convert the names to structures: sys, oeiupac
4. Count rotors: oechem
5. Generate conformers: oeomega
6. Generate a grid and write output: oegrid, oechem


Examples built on OEChem

• Registry systems
• Pharmacophore tools
• PubChem
• Mix and match
– RPy
– Database cartridges
– Web services


Give your users this UI

Designing the application UI

Now that you're armed with your 11 toolkits…

And not this UI


For more information, please contact us.

business@eyesopen.com
support@eyesopen.com
www.eyesopen.com
505-473-7385

That's it folks!

• http://www.eyesopen.com/docs/: toolkit manuals with code examples (C++, Python, Java)
• ~/openeye/toolkits/examples (C++)
• ~/openeye/python/examples (Python)
• ~/openeye/java/openeye/examples (Java)
