11 Toolkits. What can you do with them?
Phillip Sawunyama
EuroCUP IV, April 2010
Who is having fun?
Molecular modeling and cheminformatics toolkits
• Zap TK: Poisson-Boltzmann electrostatics solver
• Spicoli TK: Molecular surfaces & volume generation
• Szybki TK: General purpose function optimization
• Quacpac TK: Tautomer/protomer enumeration & charge assignment
• Shape TK: Molecular descriptors for shape overlay & similarity
• Omega TK: Conformer ensembles containing bioactive conformations
• Docking TK: Docking ligands into a receptor
• OEChem TK: Programming library for chemistry and cheminformatics
• GraphSim TK: Fingerprints for 2D similarity
• MolProp TK: Molecular property calculation
• Lexichem/OEIUPAC TK: Conversion of chemical names to chemical structures & vice-versa
• Ogham/OEDepict TK: Rendering of 2D molecular drawings
OEChem TK (1.7.2)
Required in order to use other toolkits
• OEMath: Geometry Handling, Constants
• OEBio: Sequence Alignment, Crystal Symmetry
• OEChem: Molecules, File Formats, Searching
• OEGrid: Data container for points in 3D lattice
• OEPlatform: IO, Synchronization, Machine Info
• OESystem: Error Handling, Data Structures
If you just have OEChem…
• Data Processing
– SD data handling
– Small in-memory molecules
• 2D
– Exact and Maximum common substructure
– LINGOS
– Normalization
– Library Generation
• 3D
– Conformers
– Translation/Rotation/Torsions
– RMSD fitting (including automorphisms)
OEChem: Molecules
• Easy way to manage molecules, atoms, bonds, conformers, queries, and reactions.
• Iterators for traversing atoms and bonds
    for atom in mol.GetAtoms():
        for bond in atom.GetBonds():
            print(atom.GetName(), bond.IsRotor())
OEChem file format conversion
• SMILES: Daylight chemistry
• MOL2: Tripos chemistry
• SDF: MDL chemistry
• PDB: PDB chemistry
• OEB: OpenEye chemistry
• Aromaticity, valence model, kekulization, atom types & implicit H’s
Multiple aromaticity models
(one Yes/No entry per example structure, in slide order)
• MDL: Yes, No, No, No, No
• Tripos: Yes, No, No, N/A, No
• MMFF: Yes, Yes, No, N/A, No
• Daylight: Yes, Yes, Yes, No, No/Yes
• OpenEye: Yes, Yes, Yes, Yes, Yes
OEAssignAromaticFlags(mol, OEAroModelTripos)
High Level and Low Level
High Level
    OEWriteMolecule(ofs, mol)
Low Level
    OECreateSmiString(str, mol, AtomMap|Kekule|Canonical)
Simple File Conversion Example
    import sys
    from openeye.oechem import *

    # Open input & output streams
    ifs = oemolistream(sys.argv[1])
    ofs = oemolostream(sys.argv[2])

    # Write out data
    for mol in ifs.GetOEGraphMols():
        OEWriteMolecule(ofs, mol)
The OEWriteMolecule function may change your molecules.
Use the OEWriteConstMolecule function if you do not want your molecules to change.
Canonical Isomeric SMILES
Solved problems
• (R) (S)
    CC([C@@H](C)N)[C@H](C)N
    CC([C@H](C)N)[C@@H](C)N
•   C/C=C/C=C\C
    C/C=C\C=C\C
•   C1C[C@@H](CC[C@@H]1O)O
    C1C[C@H](CC[C@H]1O)O
    C1C[C@H](CC[C@@H]1O)O
    C1C[C@@H](CC[C@H]1O)O
Test set (10M, bond or atom stereo), reordering atoms (10 times/compound): 78 failures
SD Data Manipulation
• Get Data, Change Data, Set Data
• SDF and OEB files
• Example data fields: Registry Number, Binding Energy, CYP 3A4 Data
Simple SDData Example
    import sys
    from openeye.oechem import *

    ifs = oemolistream(sys.argv[1])
    ofs = oemolostream(sys.argv[2])
    mol = OEGraphMol()
    while OEReadMolecule(ifs, mol):
        ident = OEGetSDData(mol, "ID")    # Get Data
        ident = "OpenEye" + ident         # Change Data
        OESetSDData(mol, "ID", ident)     # Set Data
        OEWriteMolecule(ofs, mol)
Small in-memory molecules
• OEMolBase::Compress
• OEMolBase::UnCompress
• MiniMol
– Graph only
– Only call Compress once
• DBMols – can be even smaller
– Coordinates and all
– Call UnCompress and Compress every time
Substructure Search
• Loop through a database writing out matches
    # Define a function that initializes a substructure search and writes out matches
    def SubSearch(ifs, ofs, qmol):
        subsearch = OESubSearch()
        atomexpr = OEExprOpts_DefaultAtoms
        bondexpr = OEExprOpts_DefaultBonds
        if not subsearch.Init(qmol, atomexpr, bondexpr):
            OEThrow.Fatal("Unable to initialize substructure search")
        for mol in ifs.GetOEGraphMols():
            if subsearch.SingleMatch(mol):
                OEWriteMolecule(ofs, mol)
MDL Query File Support
Maximum Common Substructure Search
• Maximum common substructure searches can be performed in OEChem using the OEMCSSearch class. The OEMCSSearch class can be initialized with a SMARTS pattern, an OEQMolBase query molecule, or a molecule with expression options.
Exhaustive MCSS
• For each matching atom pair: extend the subgraph in all possible ways
• Evaluate once the common sub-graph cannot be extended any more
Pro:
- guaranteed to find the maximum common substructure(s)
Con:
- complex structures cannot be mapped efficiently
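The exhaustive strategy can be sketched in plain Python on tiny labelled graphs. This is a toy illustration only, not how OEMCSSearch is implemented: the graph encoding (element labels plus index pairs for bonds), the function name `exhaustive_mcs`, and the choice to ignore bond orders are all assumptions made for the example.

```python
def exhaustive_mcs(labels1, bonds1, labels2, bonds2):
    """Largest common connected induced subgraph, as {atom in 1: atom in 2}."""
    adj1 = {i: set() for i in range(len(labels1))}
    adj2 = {i: set() for i in range(len(labels2))}
    for a, b in bonds1:
        adj1[a].add(b)
        adj1[b].add(a)
    for a, b in bonds2:
        adj2[a].add(b)
        adj2[b].add(a)

    best = {}

    def compatible(mapping, a, b):
        # Same element, and bonds to already mapped atoms must agree.
        if labels1[a] != labels2[b]:
            return False
        return all((m1 in adj1[a]) == (m2 in adj2[b])
                   for m1, m2 in mapping.items())

    def extend(mapping):
        nonlocal best
        # Evaluate the current common subgraph, then try every extension.
        if len(mapping) > len(best):
            best = dict(mapping)
        used2 = set(mapping.values())
        # Candidates: unmapped neighbours of the current subgraph.
        frontier = {n for a in mapping for n in adj1[a]} - set(mapping)
        for a in frontier:
            for b in range(len(labels2)):
                if b not in used2 and compatible(mapping, a, b):
                    mapping[a] = b
                    extend(mapping)
                    del mapping[a]

    # Seed the search from every matching atom pair.
    for a in range(len(labels1)):
        for b in range(len(labels2)):
            if labels1[a] == labels2[b]:
                extend({a: b})
    return best


# Hypothetical heavy-atom graphs: ethanol (C-C-O) and 1-propanol (C-C-C-O)
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
common = exhaustive_mcs(*ethanol, *propanol)
```

The nested loops revisit the same subgraph many times, which is the "Con" above: the cost grows very quickly with molecule size.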
Approximate MCSS
• Pre-defined paths in the pattern
• Try to follow the same path in the target
Clique Search
• Can also perform a clique search using the OECliqueSearch class.
• Identifies all common substructures between a reference molecule and a target molecule
Generate Molecular Framework<br />
Morphine Scaffold or reduced graph
OEChem: Normalization
Uses SMIRKS to specify normalization
Inter-conversion of the nitro group
The nitro group is typically represented either with pentavalent Nitrogen “*N(=O)=O” or as a charge-separated trivalent Nitrogen “*[N+](=O)[O-]”
SMIRKS PATTERN
To convert to the trivalent form:
"[*:1][N:2](=[O:3])=[O:4]>>[*:1][N+:2](=[O:3])[O-:4]"
To convert to the pentavalent form:
"[*:1][N+:2]([O-:3])=[O:4]>>[*:1][N+0:2](=[O+0:3])=[O:4]"
Library Generation
Supported MDL query features
• generic and alternative atom types
• number of substituents, ring bonds, hydrogens
• unsaturated atom
• alternative bond types, topology, stereo care
Workflow: substructure search, then transformation (amide.rxn)
Conformers and Geometry
• Easy handling of multi-conformer molecules
• Torsions
• Translations
• Rotations
• Moments of inertia
• RMSD: Automatically handles automorphisms
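Why automorphisms matter for RMSD can be shown with a small plain-Python sketch. It does not use OEChem: the list of automorphisms is written out by hand and no overlay (rotation/translation) is performed, both of which are simplifying assumptions, and the coordinates are made up.

```python
import math


def rmsd(ref, fit, order):
    """RMSD between ref[i] and fit[order[i]], without any overlay."""
    total = 0.0
    for i, j in enumerate(order):
        total += sum((r - f) ** 2 for r, f in zip(ref[i], fit[j]))
    return math.sqrt(total / len(ref))


def automorph_rmsd(ref, fit, automorphisms):
    """Smallest RMSD over all symmetry-equivalent atom orderings."""
    return min(rmsd(ref, fit, order) for order in automorphisms)


# Hypothetical three-atom fragment in which atoms 1 and 2 are equivalent
ref = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, -1.0, 0.0)]
# Same pose, but atoms 1 and 2 are listed in the opposite order
fit = [(0.0, 0.0, 0.0), (1.0, -1.0, 0.0), (1.0, 1.0, 0.0)]
automorphisms = [(0, 1, 2), (0, 2, 1)]

naive = rmsd(ref, fit, (0, 1, 2))                  # penalizes the relabeling
best = automorph_rmsd(ref, fit, automorphisms)     # sees the identical pose
```

Without the automorphism the two identical poses appear to differ; with it the RMSD drops to zero.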
OEGrid
• Bundled with OEChem
• Supports lattices as first class objects
– OEScalarGrid
– OESkewGrid
• Features
– IO from several formats (gridbabel)
– Attach as generic data to molecules
– Regularization
– Masking
– Interpolation
Wedge bond direction
• Clean up structures with incorrectly assigned bond stereochemistry (a common problem): the narrow end of the wedge bond is incorrectly oriented toward an sp2 atom.
• The bond stereo specified by the MDL/SDF file is stored as generic data on the bond using the "OEProperty_BondStereo" tag.
• Demo!
Wedge bond direction
• This corrects the bond direction when it's obviously wrong. However, there are cases, such as when both atoms of a bond are chiral, when the direction cannot be automatically corrected. In these cases the program only gives a warning but leaves the bond direction unchanged.
OEBIO
• Hierarchy View
– Represents a biopolymer molecule as a hierarchy of components: OEHierChain, OEHierFragment, and OEHierResidue.
• Extract ligand from a protein-ligand complex
• Extract protein backbone
• Calculate phi/psi torsion angles for residues (Ramachandran plot)
• Alternate locations
• Sequence alignment
! of PDB Structures Contain<br />
Alternate Location Codes
• Not molecules but ensembles
• To compute molecular properties, must restrict to a subset of atoms
– (e.g. just the ‘A’s)
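Restricting to "just the A's" can be sketched directly on PDB text, without OEBio. The only format fact relied on is that the alternate location code sits in column 17 of an ATOM/HETATM record; the helper names `atom_line` and `keep_altloc` and the truncated records (no coordinates) are hypothetical and exist only for this example.

```python
def atom_line(serial, name, altloc, resname="SER", chain="A", resseq=1):
    """Build a truncated, hypothetical PDB ATOM record (no coordinates)."""
    return f"ATOM  {serial:>5} {name:<4}{altloc}{resname:>3} {chain}{resseq:>4}"


def keep_altloc(pdb_lines, keep="A"):
    """Keep atoms whose alternate location code is blank or equals `keep`."""
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            altloc = line[16:17] or " "  # column 17 of the fixed-width record
            if altloc not in (" ", keep):
                continue
        kept.append(line)
    return kept


records = [
    atom_line(1, " N  ", " "),   # no alternate location
    atom_line(2, " CA ", "A"),   # conformation A
    atom_line(3, " CA ", "B"),   # conformation B, dropped
]
subset = keep_altloc(records)
```

Atoms with a blank code belong to every member of the ensemble, so they are always kept.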
OEBIO<br />
• Create a subset of residues (demo)
The shape toolkit
• Greatly simplifies some tasks
– Using multiple queries
• Exclusions
– Using different color files in the same run
• Filtering with constraints
• Some tasks that cannot be done in ROCS
– NxN comparisons/clustering
– Shape multipoles
OEQuacPac & OESzybki
• On-the-fly protonation state sampling
– Load protein and ligand
– OEAddExplicitHydrogens
– Loop over pKa and tautomer states
• OESet3DHydrogenGeom
• Perform Szybki minimization
– Write out the state with the best score
OESpicoli
• Surfaces
– Molecular
– Accessible
– Primitives
• Properties
– Area
– Volume
• Void volume
– Curvature
– Depth
Calculate binding site surface
ZAP: PB Electrostatics
• Solvation energy & potential grids:
– zap.CalcSolvationEnergy()
– zap.CalcPotentialGrid()
• Both properties on a grid
• Accessible surface area: OEArea
• Binding properties: OEBind
– OEBindResults() allows dissection of all components
• Electrostatic similarity: OEET
– In situ calculation of electrostatic Tanimoto
OEIUPAC (Lexichem)
• The Languages of Chemistry: Images, Names, Structures
• Methylsalicylate / 2-methoxybenzoate: COc1ccccc1C(=O)[O-]
Lexichem: Naming
• Names structures
– Multi-language support
• German, Japanese, Welsh, Klingon...
• 100% round-trip translation
• Name -> structure -> name
– 250,000 NCI screening database
– 92.5% round-trip
DOI: 10.1021/ci800243w; J. Chem. Inf. Model., 49, 519 (2009)
Naming and Depiction (OEDepict)
• Lexichem: benzene -> c1ccccc1 (nam2mol), c1ccccc1 -> benzene (mol2nam)
• Ogham: depict, mol2gif
-OEChem-05210615262D
6 6 0 0 0 0 0 0 0999 V2000
-0.8674 1.5027 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.8674 0.4976 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.8674 0.4976 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.8674 1.5027 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 2.0102 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 6 2 0 0 0 0
1 2 1 0 0 0 0
2 3 2 0 0 0 0
3 4 1 0 0 0 0
4 5 2 0 0 0 0
5 6 1 0 0 0 0
M END
Picto
• Depiction/naming tool
– OEChem + Lexichem + Ogham
• Ships with toolkits and also with VIDA 4.0, must be compiled (C++)
• SMILES name
– Structure is depicted
• Atom selection -> SMARTS
• OEGraphSim Fingerprints
– MACCS
– LINGO
– Path-based
• Storing Fingerprints
• Similarity Search using OEFPDatabase
• Performance & validation
• Fingerprint Similarity Measures
– Built-in, user-defined
• User-defined path fingerprint
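The idea of interchangeable similarity measures can be sketched in plain Python, treating a fingerprint as the set of its on-bits. This is not the OEGraphSim API: the function names, the `rank` helper standing in for a database search, and the toy bit sets are all assumptions for illustration.

```python
import math


def tanimoto(a, b):
    """Tanimoto = |A & B| / |A | B| for two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """Dice = 2 |A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Stand-in for a 'user-defined' measure with the same call signature."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def rank(query, database, measure=tanimoto):
    """Score every database fingerprint against the query, best first."""
    scored = ((measure(query, fp), name) for name, fp in database.items())
    return sorted(scored, reverse=True)


# Hypothetical on-bit sets
query = {1, 2, 3, 4}
database = {"a": {1, 2, 3, 4}, "b": {1, 2, 5}, "c": {7, 8}}
hits = rank(query, database)
hits_dice = rank(query, database, measure=dice)
```

Because every measure takes the same two arguments, swapping a built-in for a user-defined one is just a matter of passing a different function.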
• 166 bit fingerprint
• Each bit is associated with a predefined SMARTS pattern
MACCS FP generation = substructure search
    OEMakeFP(fp, mol, OEFPType_MACCS166)
    OEMakeMACCS166FP(fp, mol)
• Based on the fragmentation of canonical isomeric SMILES into overlapping substrings
• Similarity concept (similar SMILES equal similar structures)
LINGO FP generation = SMILES generation
    OEMakeFP(fp, mol, OEFPType_Lingo)
    OEMakeLingoFP(fp, mol)
Example: c1cc([nH]c1)N vs. c1cc([nH]c1)O
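The fragmentation into overlapping substrings can be sketched in plain Python using the two SMILES from the slide. The substring length of 4 and the set-based Tanimoto are assumptions made for the example; the toolkit's own LINGO fingerprint and scoring may differ in detail (for instance in how ring-closure digits and repeated substrings are treated).

```python
def lingos(smiles, q=4):
    """All overlapping substrings of length q of a SMILES string."""
    return {smiles[i:i + q] for i in range(len(smiles) - q + 1)}


def lingo_tanimoto(smi_a, smi_b, q=4):
    """Set-based Tanimoto over the substrings of two SMILES strings."""
    a, b = lingos(smi_a, q), lingos(smi_b, q)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


amine = "c1cc([nH]c1)N"
alcohol = "c1cc([nH]c1)O"
shared = lingos(amine) & lingos(alcohol)
similarity = lingo_tanimoto(amine, alcohol)
```

The two strings differ only in the last character, so only the final substring of each (`c1)N` vs. `c1)O`) is not shared.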
• No predefined pattern dictionary
• Exhaustively enumerate all paths in a molecule
Path FP generation = enumerating paths + hashing
    OEMakeFP(fp, mol, OEFPType_Path)
    OEMakePathFP(fp, mol)
• Different paths can hash to the same bit (collision)
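Path enumeration plus hashing can be sketched in plain Python on a small labelled graph. This is an illustration of the idea rather than the OEGraphSim algorithm: the graph encoding, the use of `zlib.crc32` as the hash, and the decision to ignore bond orders are assumptions; the 4096 bits and 0-5 bond path lengths mirror the defaults quoted elsewhere in the slides.

```python
import zlib


def enumerate_paths(labels, bonds, max_bonds=5):
    """All simple paths with 0..max_bonds bonds, as direction-independent strings."""
    adj = {i: [] for i in range(len(labels))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    found = set()

    def walk(path):
        forward = [labels[i] for i in path]
        # A path and its reverse describe the same fragment.
        found.add("-".join(min(forward, forward[::-1])))
        if len(path) - 1 < max_bonds:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(labels)):
        walk([start])
    return found


def path_fingerprint(labels, bonds, nbits=4096, max_bonds=5):
    """Hash every enumerated path onto one of nbits positions."""
    bits = set()
    for path in enumerate_paths(labels, bonds, max_bonds):
        # Different paths may land on the same bit: a collision.
        bits.add(zlib.crc32(path.encode()) % nbits)
    return bits


# Hypothetical heavy-atom graph of ethanol: C-C-O
labels, bonds = ["C", "C", "O"], [(0, 1), (1, 2)]
paths = enumerate_paths(labels, bonds)
fp = path_fingerprint(labels, bonds)
tiny = path_fingerprint(labels, bonds, nbits=1)  # every path collides on bit 0
```

Shrinking `nbits` makes collisions more likely, which is the trade-off between fingerprint size and resolution.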
[Chart: size in Mbyte of MACCS, LINGO and PATH fingerprints, stored as OEB GZ, OEB and RSS]
• Dataset: MDDR (~110,000 compounds)
• OEB GZ, OEB (disk space)
• RSS (size in physical memory)
– Using compressed OEDBMols
• Path FPs are generated with default parameters
– 4096 bit fingerprint
Estimated memory usage for 1M compounds
• MACCS: 430M
• LINGO: 470M
• PATH: 900M
[Chart: time in sec to generate MACCS, LINGO and PATH fingerprints on the Core and XEON machines]
• Dataset: MDDR (~110,000)
• Path FPs with default parameters
– 4096 bit fingerprint
– 0-5 path lengths
• Core: Intel® Core 2 Quad CPU Q6600 (2.4 GHz)
• XEON: Intel® XEON® CPU X5560 (2.8 GHz)
Estimated time to generate FP for 1M compounds (ISM -> OEB)
• MACCS: ~12min
• LINGO: ~2min
• PATH: ~5min
OEMolProp
• Calculates 2D molecular properties
– XlogP
– XlogS
– PSA
– hydrogen bond donor and acceptor count
– rotatable bonds, ring size and number
• Generates custom filters
– ADME filters such as Lipinski
• Provides graph-based protonation state assignment for consistency and speed
Link OE Libraries: DEMO
1. Link Libraries: sys, oechem, oeiupac, oeomega, oegrid
2. Open output streams: sys
3. Read compound names (IUPAC) from a text file and convert names to structures: sys, oeiupac
4. Count Rotors: oechem
5. Generate Conformers: oeomega
6. Generate Grid and Write Output: oegrid, oechem
Examples built on OEChem
• Registry Systems
• Pharmacophore tools
• PubChem
• Mix and match
– Rpy
– Database cartridges
– Web services
Designing the application UI
Now that you’re armed with your 11 toolkits…
Give your users this UI
And not this UI
For more information, please contact us.
business@eyesopen.com
support@eyesopen.com
www.eyesopen.com
505-473-7385
That’s it folks!
• http://www.eyesopen.com/docs/: Toolkit Manuals with code examples (C++, Python, Java)
• ~/openeye/toolkits/examples (C++)
• ~/openeye/python/examples (Python)
• ~/openeye/java/openeye/examples (Java)