
OEChem -- Python Theory Manual


OEChem – Python Theory Manual
version 1.6.1

OpenEye Scientific Software, Inc.
June 4, 2008

9 Bisbee Ct, Suite D
Santa Fe, NM 87508
www.eyesopen.com
support@eyesopen.com


Copyright © 1997-2008 OpenEye Scientific Software, Santa Fe, New Mexico. All rights reserved. This material contains proprietary information of OpenEye Scientific Software. Use of copyright notice is precautionary only and does not imply publication or disclosure.

The information supplied in this document is believed to be true but no liability is assumed for its use or the infringement of the rights of others resulting from its use. Information in this document is subject to change without notice and does not represent a commitment on the part of OpenEye Scientific Software.

This package is sold/licensed/distributed subject to the condition that it shall not, by way of trade or otherwise, be lent, re-sold, hired out or otherwise circulated without OpenEye Scientific Software's prior consent, in any form of packaging or cover other than that in which it was produced. No part of this manual or accompanying documentation may be reproduced, stored in a retrieval system on optical or magnetic disk, tape, CD, DVD or other medium, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise for any purpose other than for the purchaser's personal use without a legal agreement or other written permission granted by OpenEye.

This product should not be used in the planning, construction, maintenance, operation or use of any nuclear facility nor the flight, navigation or communication of aircraft or ground support equipment. OpenEye Scientific Software shall not be liable, in whole or in part, for any claims arising from such use, including death, bankruptcy or outbreak of war.

Windows is a registered trademark of Microsoft Corporation. Apple and Macintosh are registered trademarks of Apple Computer, Inc. AIX and IBM are registered trademarks of International Business Machines Corporation. UNIX is a registered trademark of the Open Group. RedHat is a registered trademark of RedHat, Inc. Linux is a registered trademark of Linus Torvalds. Alpha is a trademark of Digital Equipment Corporation. SPARC is a registered trademark of SPARC International Inc.

SYBYL is a registered trademark of TRIPOS, Inc. MDL is a registered trademark and ISIS is a trademark of MDL Information Systems, Inc. SMILES, SMARTS, and SMIRKS may be trademarks of Daylight Chemical Information Systems. Macromodel is a trademark of Schrödinger, Inc. Schrödinger, Inc may be a wholly owned subsidiary of the Columbia University, New York.

Python is a trademark of the Python Software Foundation. Java is a trademark or registered trademark of Sun Microsystems, Inc. in the U.S. and other countries.

"The forefront of chemoinformatics" is a trademark of Daylight Chemical Information Systems, Inc.

Other products and software packages referenced in this document are trademarks and registered trademarks of their respective vendors or manufacturers.


CONTENTS

1 Python
    1.1 A Bit About Learning Python
    1.2 Python versions
    1.3 Platform Notes
    1.4 Installing Python from Source
2 Introduction
    2.1 OEChem and Informatics
    2.2 How to Read this Manual
    2.3 Getting Started with OEChem Molecules
    2.4 Atoms and Bonds
    2.5 Getting Started with Python-OEChem
3 Manipulating Molecules
    3.1 Creating Molecule Objects
    3.2 Reusing Molecules
    3.3 Creating a Molecule from SMILES
    3.4 Generating a SMILES from a Molecule
4 Reading and Writing Molecules
    4.1 Using OEChem oemolstreams
    4.2 Reading Molecules with a Generator Method
    4.3 Molecular File Formats
    4.4 Molecule Input and Output with Files
    4.5 Compressed Molecule Input and Output
    4.6 Format control from the command line
    4.7 Flavored Reading and Writing of Molecules
    4.8 Writing Const Molecules
    4.9 Writing molecules to string streams
5 Properties of Molecules
    5.1 Generic Data
    5.2 Stored Properties of Molecules
    5.3 Derived Properties of Molecules
    5.4 Manipulation of Tagged Data
6 OEMols and OEGraphMols
    6.1 Preparation
    6.2 Multi-conformer and single-conformer molecules
    6.3 Conformers
    6.4 Properties of Multi-Conformer Molecules
    6.5 Reading Multi-conformer molecules
    6.6 Dude, where's my SD Data?
7 Traversing the Atoms and Bonds of a Molecule
    7.1 Looping over the Atoms and Bonds of a Molecule
    7.2 Looping over the Bonds of an Atom
    7.3 Looping over the Neighbors of an Atom
    7.4 Looping over subsets of Atoms or Bonds
    7.5 Using OEChem C++ Iterators Directly
8 Properties of Atoms
    8.1 Stored Properties of Atoms
    8.2 Derived Properties of Atoms
9 Properties of Bonds
    9.1 Stored Properties of Bonds
    9.2 Derived Properties of Bonds
10 Atom, Bond and Conformer Indices
11 Creating Atoms, Bonds and Conformers
    11.1 Using NewAtom and NewBond
    11.2 Implicit vs. Explicit Hydrogen Atoms
    11.3 Updating Implicit Hydrogen Counts
    11.4 Making Hydrogen Atoms Implicit
    11.5 Making Hydrogen Atoms Explicit
    11.6 Sprouting Hydrogens in 3D
    11.7 Using NewConf
12 Connectivity Processing
    12.1 Determining Bonds From 3D Coordinates
    12.2 Kekule Form Assignment
    12.3 Perceiving Bond Orders
13 Ring Processing
    13.1 Cycle Membership
    13.2 Number of Ring Bonds to an Atom
    13.3 Testing for Membership in a Given Ring Size
    13.4 Determining Smallest Ring Membership
    13.5 Identifying Connected Components
    13.6 Identifying Ring Systems
    13.7 Smallest Set of Smallest Rings (SSSR) considered Harmful
14 Aromaticity Processing
    14.1 Aromaticity and Hückel's 4n+2 rule
    14.2 Aromaticity Models in OEChem
    14.3 Clearing Aromaticity
15 Stereochemistry Processing
    15.1 Atom Stereochemistry
    15.2 Bond Stereochemistry
16 Atom and Bond Typing
    16.1 Integer Atom Types and Type Names
    16.2 Tripos Atom Typing
    16.3 Tripos Bond Typing
    16.4 Generic Tripos Type Functions
    16.5 Writing a Sybyl mol2 file using OEWriteMol2File
    16.6 MacroModel Atom Typing
    16.7 Generic MacroModel Type Functions
17 Formal and Partial Charges
    17.1 Assigning Formal Charges
    17.2 Working with Partial Charges
    17.3 Determining Net Charge on a Molecule
    17.4 Calculating Gasteiger Partial Charges
18 Pattern Matching
    18.1 Substructure Search
    18.2 Maximum Common Substructure Search
    18.3 Clique Search
    18.4 OEExprOpts Namespace
19 Coordinate Handling
    19.1 Getting and Setting Coordinates of Atoms and Molecules
    19.2 Coordinate Manipulation
20 Logging and Error Handling
    20.1 OEErrorHandler
21 Periodic Table Functions
    21.1 Obtaining the Atomic Symbol of an Atom/Element
    21.2 Obtaining the Atomic Number from an Atomic Symbol
    21.3 Properties of the Elements
    21.4 Handling Isotopes
    21.5 Calculating Molecular Weight of a Compound
22 Predicate Functions
    22.1 Callbacks
    22.2 Predefined OEChem Functors
    22.3 Writing your own Functors in Python
    22.4 Composition Functors in OEChem
23 Molecular File Formats
    23.1 SMILES File Format
    23.2 MDL File Format (SD and Mol)
    23.3 Sybyl Mol2 File Format
    23.4 PDB File Format
    23.5 MacroModel File Format
    23.6 XYZ File Format
    23.7 FASTA Sequence File Format
24 Miscellaneous Utilities
    24.1 OEStopWatch
    24.2 OEDots
25 The SMILES Line Notation
    25.1 Daylight SMILES
    25.2 Extensions to Daylight SMILES
26 Biopolymer Residues
    26.1 Ontology and Schema Fragility
    26.2 Stored Properties of Residues
    26.3 A Hierarchy View of Residue Data
27 Valence Models
    27.1 The MDL Valence Model
    27.2 The OpenEye Valence Model
28 SMARTS Pattern Matching
    28.1 SMARTS Syntax
29 SMARTS Pattern Matching
    29.1 SMARTS Syntax
30 Reactions
    30.1 Normalization Reactions
    30.2 Library Generation
31 License handling
32 OEChem Class Hierarchy: Why in the world are there 6 molecules?!
    32.1 Atoms, Bond, Conformers, and Molecules
    32.2 Objects and Free-Functions
    32.3 Programming Layers: The Deep and Twisted Path
33 Bibliography
34 Release Notes
    34.1 OEChem 1.6.1
    34.2 OEChem 1.6.0
    34.3 OEChem 1.5.1
    34.4 OEChem 1.5.0
    34.5 OEChem 1.4.2
    34.6 OEChem 1.4.1
    34.7 OEChem 1.4.0
    34.8 OEChem 1.3.4
    34.9 OEChem 1.3.3
    34.10 OEChem 1.3.2
    34.11 OEChem 1.3.1
    34.12 OEChem 1.3.0
Index


30.1 Normalization
30.2 Strict SMIRKS Reaction Handling
30.3 Reactions Using Implicit Hydrogens
30.4 Reactions Using Automatic Valence Correction


1.2 Python versions

Python-OEChem distributions are built to work with the latest version of Python at the time of release, i.e., this release supports Python 2.5. However, to allow for easier installation a default system Python is also supported if it is Python 2.3 or higher. On systems where Python 2.2 is still the default, we build for Python 2.5 using the system compiler. Other versions may also be made available for user convenience.

Python supports binary compatibility between patch versions, but not between minor versions. In other words, PyOEChem built for Python 2.3.4 can be expected to work with Python 2.3.x, but not backward to Python 2.2.x or forward to Python 2.4.x.

1.3 Platform Notes

1.3.1 *nix

Python-OEChem is provided in tar.gz format on non-Windows operating systems, which when un-tarred creates a directory tree just like all other OpenEye applications. If you already have an OpenEye tree, un-tar the Python distribution into the same directory. If not, un-tarring will create an entire directory structure starting with openeye. All the Python-related code will be found under openeye/python.

For Python to locate OEChem we need to set the PYTHONPATH environment variable. In your shell startup script (.bashrc for example) add the following two lines. (The syntax may vary if you use a shell other than bash.)

    PYTHONPATH=/usr/local/openeye/python
    export PYTHONPATH

This is equivalent to the following Python code.

    import sys
    sys.path.append("/usr/local/openeye/python")

Obviously, if you un-tarred in a different parent directory, you would use that actual location.

PyOEChem versions prior to OEChem version 1.6.1 required setting the LD_LIBRARY_PATH environment variable. Since version 1.6.1 the OEChem shared libraries have been enhanced to not require this environment variable. It is strongly recommended to unset this environment variable as it can cause the dynamic linker to link an incompatible library at runtime.

1.3.2 Windows

On Windows, we support the Python 2.3, 2.4, and 2.5 binary distributions from www.python.org. If you use Cygwin for a shell environment, please note that we do not support the Python installed by Cygwin. You can however use the Win32 Python in Cygwin by adding /cygdrive/c/Python25 to your PATH under Cygwin.

Installation is a simple double-click installer. The installer will install all the necessary OEChem files into Python's site-packages directory. The documentation and examples will be installed into C:/OpenEye/python/.

Please make sure to un-install any previous versions of PyOEChem before installing a new version. Make sure Python is installed before installing PyOEChem.
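The two checks described in Sections 1.2 and 1.3.1 (matching minor versions, and the openeye directory being on Python's search path) can be sketched with the standard library alone. This is only an illustration: the helper names same_minor_version and on_search_path are hypothetical and not part of OEChem, the directory is the example location used in the text, and the code is written to run on a current Python interpreter.

```python
import sys


def same_minor_version(built_for, running):
    """Return True when two (major, minor, patch) tuples share major.minor."""
    return tuple(built_for[:2]) == tuple(running[:2])


def on_search_path(directory, search_path=None):
    """Return True when `directory` is one of the entries searched for imports."""
    if search_path is None:
        search_path = sys.path
    wanted = directory.rstrip("/")
    return any(entry.rstrip("/") == wanted for entry in search_path)


if __name__ == "__main__":
    # Example location from the text; adjust to where the distribution was un-tarred.
    openeye_dir = "/usr/local/openeye/python"
    print("openeye directory on sys.path: %s" % on_search_path(openeye_dir))
    # A build for Python 2.3.4 is expected to load under 2.3.x but not 2.4.x.
    print("2.3.4 build with 2.3.5: %s" % same_minor_version((2, 3, 4), (2, 3, 5)))
    print("2.3.4 build with 2.4.0: %s" % same_minor_version((2, 3, 4), (2, 4, 0)))
```

If on_search_path reports False for the directory you un-tarred into, the PYTHONPATH setting has probably not been picked up by the shell that started Python.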


1.3.3 Mac OS X

Mac OS X 10.4 (Tiger) ships with Python 2.3 installed. Mac OS X 10.5 (Leopard) ships with Python 2.5 installed. PyOEChem is designed to work with these default versions.

PyOEChem versions prior to OEChem version 1.6.1 required setting the DYLD_LIBRARY_PATH environment variable. Since version 1.6.1 the OEChem shared libraries have been enhanced to not require this environment variable. It is strongly recommended to unset this environment variable as it can cause the dynamic linker to link an incompatible library at runtime.

1.4 Installing Python from Source

It is very likely that if you have a Unix system (other than Linux), Python will not be installed. However, downloading, building and installing Python is very simple and works on every platform on which OEChem is available.

Go to http://www.python.org/download and get the latest source distribution.

Following that link will bring you to a version-specific download page so you can download Python-2.*.*tgz (or similar for a different version number). Save this file in an empty directory and then follow the simplified directions below (or better yet, follow the INSTALL directions found in the .tgz file).

1. cd to the directory containing the Python-2.5.2.tgz file.

2. gunzip Python-2.5.2.tgz

3. tar -xvf Python-2.5.2.tar

4. cd Python-2.5.2

5. ./configure

   The main options to configure are the root of the install tree and whether or not to use gcc as the compiler.

   By default, configure will choose /usr/local, so that the application ends up in /usr/local/bin/python and the associated library files end up in /usr/local/lib/python2.5. Additionally, if gcc is in your path, the default configure will use gcc to build Python. It is important that the C compiler used to build Python match the version of the C compiler used to build PyOEChem.

   If you are happy with the default then just run:

       ./configure

   If you have an alternate place (such as /apps or /sw) then run configure as:

       ./configure --prefix=/apps

   To use the native, system compiler instead of gcc:

       ./configure --without-gcc

6. make

7. As root: make install

8. As a normal user, start a new shell and type:

       which python

   and make sure it points to /usr/local/bin/python if you ran configure without a prefix option, or to /apps/bin/python, /sw/bin/python, etc. depending on the actual prefix you fed configure in step 5.
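The check in step 8 can also be scripted. The sketch below is an illustration only: expected_interpreter and first_on_path are hypothetical helper names, the lookup mimics what the which command does rather than calling it, and the code targets a current Python interpreter.

```python
import os
import posixpath


def expected_interpreter(prefix="/usr/local"):
    """Path where `make install` places the interpreter for a given --prefix."""
    return posixpath.join(prefix, "bin", "python")


def first_on_path(name, path_value):
    """Mimic `which`: return the first PATH entry holding an executable `name`."""
    for directory in path_value.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if directory and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    found = first_on_path("python", os.environ.get("PATH", ""))
    wanted = expected_interpreter()  # pass the --prefix used in step 5, if any
    print("found:    %s" % found)
    print("expected: %s" % wanted)
```

If the two paths differ, an older interpreter earlier on PATH is probably shadowing the newly installed one.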


CHAPTER TWO

Introduction

2.1 OEChem and Informatics

Chemical information processing is the science of representing molecules in computers. Hence the fundamental "object" or data structure within a chemical information system is that of the molecule, its atoms and its bonds. A significant problem encountered in such systems is that different applications place differing requirements or constraints on how a molecule is represented. In protein biochemistry, molecules are divided into amino acid residues with specific atom naming and conformational information such as alpha helix or beta sheet. Inorganic chemistry requires isotopic and coordination information on each atom, along with complex chiralities. One possible solution is to prescribe a single data structure that encodes all of the potential information required of an atom. However, such an approach suffers from the fact that "you can't please all of the chemists, all of the time." A requirement in the field of chemical databases and substructure searching is that a molecule representation be as compact as possible, to allow as much information to be held in memory as possible and maximize the performance of processing databases from disk.

The alternative, even complementary, approach taken by OEChem is to concentrate on the similarities in molecule processing, rather than on the differences. Using the concept of "polymorphism" from object-oriented programming, it is possible to write algorithms that are mostly independent of the actual data structure used to store a molecule.

2.2 How to Read this Manual

This is OEChem's Python Programming manual. It is a collection of prose covering many of the important topics which can be addressed by the OEChem library. This manual is meant to be read from front to back at least once. Each topic in this manual is introduced assuming the knowledge presented earlier in the manual. Further, the complexity of topics as well as the complexity of the example code grows as the text progresses. While the initial listings are effectively the "Hello World" of OEChem, later examples may require some time to comprehend fully.

This manual is filled with example programs. We encourage you to test and modify the examples we present. For a reference to OEChem's complete functionality, please see the associated API manual. For a more in-depth discussion of the underlying C++ library, see the C++ Theory Manual.


2.3 Getting Started with OEChem Molecules

For those of you who just picked up the OEChem manual for the first time and are looking for somewhere to get started, let us consider the OEGraphMol. This is the molecule used in most example programs you will find in OEChem's example directories, or in the listings of this manual. An OEGraphMol is a concrete class which can be declared and used for most molecular functions in OEChem. An OEGraphMol contains atoms and bonds which can be accessed through iterators via the molecule API. Much of the OEGraphMol's API is defined by the OEMolBase abstract base-class. An OEGraphMol can be passed to any function which takes an OEMolBase argument. For efficiency, the OEGraphMol does not inherit from OEMolBase. This pattern will be discussed in more detail later in this manual. This is an over-simplistic view of the OEChem molecular hierarchy, but it provides a useful starting point for understanding and using OEChem.

Simplistic OEChem inheritance scheme:

    OEBase
      |
      |
    OEMolBase ---------------- OEGraphMol


where all methods and objects in the oechem namespace are pulled directly into the current script's namespace. Since OEChem's methods and objects all have unique names, there is little chance of a name clash with this particular import * call.

Once the package is imported, objects can be created and methods can be called without the addition of namespace prefixes, resulting in simpler code.
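The effect of an import * call on a namespace can be observed directly. The sketch below uses the standard library math module as a stand-in for openeye.oechem, since it needs no installation or license; names_after is a hypothetical helper written for this illustration.

```python
def names_after(import_statement):
    """Run an import statement in a fresh namespace; return the public names bound."""
    namespace = {}
    exec(import_statement, namespace)
    return sorted(name for name in namespace if not name.startswith("_"))


star_names = names_after("from math import *")
module_names = names_after("import math")

# With "import *" the functions themselves are bound, so no prefix is needed.
print("sqrt" in star_names)
# With a plain import only the module name is bound, so calls need the math. prefix.
print(module_names)
```

The same trade-off applies to any package: import * shortens the code at the cost of adding many names to the script's namespace.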


CHAPTER THREE

Manipulating Molecules

3.1 Creating Molecule Objects

At the center of OEChem are the molecule objects. The example below represents the smallest possible Python/OEChem script. This program creates an OEMol called mol when run. When the program ends, Python automatically cleans up the object.

    from openeye.oechem import *
    mol = OEMol()

There may be times when you want to delete (destroy) a molecule before the end of the script. This can easily be done with the built-in del statement.

    from openeye.oechem import *
    mol = OEMol()
    del mol

3.2 Reusing Molecules

OEChem also provides a mechanism for reusing a molecule. For example, when processing multiple, sequential molecules in a database or input file, instead of requiring a new molecule to be allocated and destroyed for each iteration, OEMolBase provides a Clear method to reset a molecule to its initial (empty) state.

The following example demonstrates calling the Clear method of our molecule. Note that the Clear in the following example is not required, as the molecule is already initialized (empty) by the creation function. The code does demonstrate that the OEMol behaves as an OEMolBase, allowing it to be used with any of OEChem's OEMolBase methods or functions.

    from openeye.oechem import *
    mol = OEMol()
    mol.Clear()
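The reuse pattern from Section 3.2 amounts to allocating one molecule and calling Clear before each record. The sketch below shows the shape of such a loop. So that it runs without OEChem installed, it uses a stand-in class: process_records, FakeMol and fake_fill are hypothetical names for this illustration, and only the Clear method mirrors the call described in the text.

```python
def process_records(records, make_mol, fill):
    """Process every record with a single molecule object, cleared before each use."""
    mol = make_mol()  # allocated once, outside the loop
    outcomes = []
    for record in records:
        mol.Clear()  # reset to the empty state before refilling
        outcomes.append(fill(mol, record))
    return outcomes


class FakeMol(object):
    """Stand-in for a molecule so the sketch runs without OEChem."""

    created = 0  # counts how many objects were allocated

    def __init__(self):
        FakeMol.created += 1
        self.atoms = []

    def Clear(self):
        self.atoms = []


def fake_fill(mol, record):
    """Pretend to parse: add one 'atom' per character and report the atom count."""
    mol.atoms.extend(record)
    return len(mol.atoms)
```

With OEChem available, make_mol could be OEMol and fill a parsing function such as OEParseSmiles (Section 3.3). Without the Clear call, each record would be appended to the contents left over from the previous one.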


3.3 Creating a Molecule from SMILES

A common method of creating a molecule in OEChem is via the SMILES representation. SMILES notation is commonly used in chemical information systems, as it provides a convenient string representation of a molecule. An introduction to SMILES syntax is provided later in this manual. For examples we'll use the SMILES "c1ccccc1", which describes the benzene molecule.

A molecule can be created from a SMILES string using the function OEParseSmiles.

    from openeye.oechem import *

    # create a new molecule
    mol = OEGraphMol()

    # convert the string into a molecule
    OEParseSmiles(mol, "c1ccccc1")

The OEParseSmiles function actually returns true or false (1 or 0), indicating whether the input string was a valid SMILES string. It is good programming practice to check the return value and report an error message if anything went wrong. The following example adds a check on the return status of OEParseSmiles and prints an error message to sys.stderr.

    from openeye.oechem import *
    import sys

    # create a new molecule
    mol = OEGraphMol()

    if OEParseSmiles(mol, "c1ccccc1") == 1:
        # do something interesting with mol
        pass
    else:
        sys.stderr.write("SMILES string was invalid!\n")

3.4 Generating a SMILES from a Molecule

To produce a SMILES string from a molecule, we use a function. The next two examples will use OECreateCanSmiString. OECreateCanSmiString converts the given OEMolBase into a canonical SMILES string and returns that string. Note the difference in the syntax between Python and C++. C++ sends an empty string as an argument, whereas in Python the SMILES string is the return value of the function.

    from openeye.oechem import *
    import sys

    mol = OEGraphMol()

    if OEParseSmiles(mol, "c1ccccc1") == 1:
        smi = OECreateCanSmiString(mol)
        sys.stdout.write("Canonical SMILES is %s\n" % smi)
    else:
        sys.stderr.write("SMILES string was invalid!\n")

The following more complicated example reads SMILES from stdin and writes the canonical SMILES to stdout.


    #!/usr/bin/env python
    # ch3-1.py
    from openeye.oechem import *
    import sys

    mol = OEGraphMol()

    smilein = raw_input()
    while smilein:
        # reset the molecule before parsing the next SMILES into it
        mol.Clear()
        if OEParseSmiles(mol, smilein):
            smi = OECreateCanSmiString(mol)
            sys.stdout.write("%s\n" % smi)
        else:
            sys.stderr.write("%s is an invalid SMILES!\n" % smilein)
        smilein = raw_input()

Listing 3.1: Converting SMILES to canonical SMILES

Notice that this example makes use of the OEMolBase Clear method to reuse the molecule. OEParseSmiles adds the given SMILES to the current molecule. If the line mol.Clear() were removed from the program, the output would contain longer and longer SMILES containing disconnected fragments.

The above example is a very simple canonical SMILES creation program, but it probably doesn't do what most users expect. The molecule returned by OEParseSmiles preserves the aromaticity present in the input SMILES string. For example, if benzene is expressed as "c1ccccc1", all atoms and bonds are marked as aromatic, but if it is expressed in a Kekulé form, "C1=CC=CC=C1", all atoms and bonds are kept aliphatic.

    Input           Output
    cc              c=c
    C1=CC=CC=C1     C1=CC=CC=C1
    C1=CN=CC=C1     C1=CC=NC=C1

A common task after creating a molecule from SMILES is to normalize its aromaticity with OEAssignAromaticFlags. The following example produces canonical SMILES, including perception of aromaticity from the connection table.

    #!/usr/bin/env python
    # ch3-2.py
    from openeye.oechem import *
    import sys

    mol = OEGraphMol()
    smilein = raw_input()
    while smilein:
        mol.Clear()
        if OEParseSmiles(mol, smilein):
            # perceive aromaticity from the connection table
            OEAssignAromaticFlags(mol)
            smi = OECreateCanSmiString(mol)
            sys.stdout.write("%s\n" % smi)
        else:
            sys.stderr.write("%s is an invalid SMILES!\n" % smilein)
        smilein = raw_input()

Listing 3.2: A better SMILES to canonical SMILES converter

And here are the results of this new version:


    Input           Output
    cc              C=C
    C1=CC=CC=C1     c1ccccc1
    C1=CN=CC=C1     c1ccncc1

This same program could also be written to construct a new molecule each time through the loop:

    #!/usr/bin/env python
    # ch3-3.py
    from openeye.oechem import *
    import sys

    smilein = raw_input()
    while smilein:
        # a fresh molecule for every input line, so no Clear is needed
        mol = OEGraphMol()
        if OEParseSmiles(mol, smilein):
            OEAssignAromaticFlags(mol)
            smi = OECreateCanSmiString(mol)
            sys.stdout.write("%s\n" % smi)
        else:
            sys.stderr.write("%s is an invalid SMILES!\n" % smilein)
        smilein = raw_input()

Listing 3.3: Creating a new molecule for each input SMILES
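The raw_input loops in this chapter stop at the first empty line, and raw_input raises EOFError when piped input runs out. A minimal sketch of an end-of-file-safe loop follows. It is written in plain Python so that it runs without OEChem: fake_canonicalize is a hypothetical stand-in for the OEParseSmiles, OEAssignAromaticFlags and OECreateCanSmiString sequence, and convert_lines is an illustrative name, not part of OEChem.

```python
import io


def convert_lines(instream, outstream, errstream, canonicalize):
    """Read one SMILES per line until end of file; skip blank lines."""
    for line in instream:          # iteration ends cleanly at end of file
        smiles = line.strip()
        if not smiles:
            continue               # a blank line no longer ends the run
        result = canonicalize(smiles)
        if result is None:
            errstream.write("%s is an invalid SMILES!\n" % smiles)
        else:
            outstream.write("%s\n" % result)


def fake_canonicalize(smiles):
    # Hypothetical stand-in, NOT OEChem: rejects anything containing a
    # space and returns everything else unchanged.
    return None if " " in smiles else smiles


out, err = io.StringIO(), io.StringIO()
convert_lines(io.StringIO("c1ccccc1\n\nbad smiles\nCCO\n"), out, err,
              fake_canonicalize)
```

In a real script the three streams would be sys.stdin, sys.stdout and sys.stderr, and the stand-in would be replaced by the OEChem calls shown in Listing 3.2.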


CHAPTER FOUR

Reading and Writing Molecules

4.1 Using OEChem oemolstreams

The previous example demonstrated reading and writing SMILES strings from the command line. This requires the programmer to perform the I/O explicitly. While this may be reasonable for SMILES strings, which can be read on a single line, it is unsuitable for more complex file formats. To ease this task, OEChem provides the "molstream" abstraction. The classes oemolistream and oemolostream allow input and output of molecules from files or strings.

The first interface to stream I/O uses functions to read and write molecules. This is provided by the functions OEReadMolecule and OEWriteMolecule, which both take a molstream and an OEMolBase as arguments. As a high-level function, OEReadMolecule calls mol.Clear() automatically for each incoming molecule.

    #!/usr/bin/env python
    # ch4-1.py
    from openeye.oechem import *

    ifs = oemolistream()
    ofs = oemolostream()

    mol = OEGraphMol()

    while OEReadMolecule(ifs, mol):
        OEWriteMolecule(ofs, mol)

Listing 4.1: High-level molecule I/O using molstreams

In this example, the script reads molecules from stdin in SMILES format and writes them to stdout in (absolute) SMILES format. Notice that there is no need to call the Clear method to reset the molecule, or OEAssignAromaticFlags to normalize aromaticity. Both are done automatically by the OEReadMolecule function.

4.2 Reading Molecules with a Generator Method

The preferred way to read molecules in OEChem is to use the generator methods provided by oemolistreams. These methods provide syntax similar to using a for x in y loop to iterate over the elements of a list.


    #!/usr/bin/env python
    # ch4-2.py
    from openeye.oechem import *

    ifs = oemolistream()
    ofs = oemolostream()

    for mol in ifs.GetOEGraphMols():
        OEWriteMolecule(ofs, mol)

Listing 4.2: Generator method for reading molecules

Note that with this syntax there is no need to create a molecule object. A single molecule object is created by the generator method (GetOEGraphMols) and is reused on each loop iteration. As such, this syntax should not be used to put molecules into a list or other persistent container. If you need to create a molecule object that is persistent and can be used after the loop completes, there are a couple of alternatives.

Probably the most efficient is to change the looping criteria slightly and create a new molecule object each time through the loop. This first example tests the state of the input stream to determine when the loop is finished.

    #!/usr/bin/env python
    # ch4-3.py
    from openeye.oechem import *

    ifs = oemolistream()
    ofs = oemolostream()

    # create an empty list
    mollist = []

    # loop over input
    while ifs.IsValid():
        mol = OEGraphMol()
        # keep the molecule only if the read succeeded, so that the
        # failed read at end of input does not append an empty molecule
        if OEReadMolecule(ifs, mol):
            mollist.append(mol)

    for mol in mollist:
        OEWriteMolecule(ofs, mol)

Listing 4.3: Reading molecules into a list

Alternatively, you can use the generator syntax and then construct a new molecule object from the current one. The OEGraphMol constructor can be used, except this time we pass mol as an argument, creating a new molecule from our temporary one. In the next example, each time through the loop a new molecule is created and stored in a Python list. Iteration over the list is then used to write the molecules back out.

    from openeye.oechem import *

    ifs = oemolistream()
    ofs = oemolostream()

    # create an empty list
    mollist = []

    # loop over input
    for mol in ifs.GetOEGraphMols():
        newmol = OEGraphMol(mol)   # create a new molecule
        mollist.append(newmol)     # append to list

    for mol in mollist:
        OEWriteMolecule(ofs, mol)
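The reason the generator syntax must not be used to fill a list directly can be shown with a small sketch in plain Python. ToyMol and reusing_reader are hypothetical stand-ins invented for this illustration, not OEChem classes; the only behaviour they borrow from the text is that the generator hands back the same object on every iteration.

```python
import copy


class ToyMol(object):
    """Hypothetical stand-in for a molecule; NOT an OEChem class."""

    def __init__(self, title=""):
        self.title = title


def reusing_reader(titles):
    """Yield the SAME object each time, refilled with the next record."""
    mol = ToyMol()
    for title in titles:
        mol.title = title
        yield mol


records = ["aspirin", "caffeine", "ibuprofen"]

# Storing the yielded object keeps three references to one molecule,
# so every list entry ends up showing the last record read.
aliased = [mol for mol in reusing_reader(records)]
aliased_titles = [mol.title for mol in aliased]

# Copying inside the loop gives each list entry its own state.
copied = [copy.copy(mol) for mol in reusing_reader(records)]
copied_titles = [mol.title for mol in copied]
```

Here copy.copy plays the role that the OEGraphMol(mol) copy constructor plays in the example above.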


Note for C++ Users: In the C++ theory manual, this same syntax is in the section describing reading molecules with iterators. Generators were introduced in Python 2.2. In OEChem they are used to provide the C++ iterator functionality without requiring the Python user to explicitly create an iterator object.

4.3 Molecular File Formats

In addition to SMILES strings, OEChem is able to read numerous other molecular file formats, including MDL SD files, Tripos Mol2 files and PDB files. The format of an input file or stream may be associated with an oemolstream using the SetFormat method, and may be retrieved with GetFormat. These take (or return) an integer constant defined in C++. The following table shows the constants and the corresponding file formats supported by OEChem. A value of OEFormat_UNDEFINED (zero) means that there is no file format associated with the molstream. Note that the default format associated with an oemolstream is OEFormat_SMI.

    File Format       Description                          Read?  Write?
    OEFormat_OEB      New Style OpenEye OEBinary           Yes    Yes
    OEFormat_BIN      Old Style OEBinary                   Yes    Yes
    OEFormat_CAN      Canonical SMILES                     Yes    Yes
    OEFormat_FASTA    FASTA protein sequence               Yes    Yes
    OEFormat_ISM      Isomeric SMILES                      Yes    Yes
    OEFormat_MDL      MDL Mol File                         Yes    Yes
    OEFormat_MF       Molecular Formula (Hill order)       No     Yes
    OEFormat_MOL2     Tripos Sybyl mol2 file               Yes    Yes
    OEFormat_MOL2H    Sybyl mol2 with explicit hydrogens   Yes    Yes
    OEFormat_MOPAC    MOPAC file format(s)                 Yes    Yes
    OEFormat_PDB      Protein Databank PDB file            Yes    Yes
    OEFormat_SDF      MDL SD File                          Yes    Yes
    OEFormat_SMI      Absolute SMILES                      Yes    Yes
    OEFormat_XYZ      XMol XYZ format                      Yes    Yes

The following example shows how to use oemolstreams to convert MDL SD files into Tripos Mol2 files.

    #!/usr/bin/env python
    # ch4-4.py
    from openeye.oechem import *

    ifs = oemolistream()
    ifs.open()
    ofs = oemolostream()
    ofs.open()

    # set the formats before any molecule is read or written
    ifs.SetFormat(OEFormat_SDF)
    ofs.SetFormat(OEFormat_MOL2)

    for mol in ifs.GetOEMols():
        OEWriteMolecule(ofs, mol)

Listing 4.4: Converting SDF to MOL2 using stdin/stdout

In general, the SetFormat method should only be called on an oemolistream before the first connection table is read.


4.4 Molecule Input and Output with Files

In addition to stdin and stdout, OEChem's oemolstreams also support reading from and writing to files. To open a file, use the filename as a constructor argument or call the open method with the filename as an argument. For input (oemolistream), if the file doesn't exist, the open fails and returns 0 (false). For output (oemolostream), the output file is created if it didn't previously exist and is overwritten if it did. If no filename is passed as an argument to the constructor (or to the open method), an oemolistream will use stdin and an oemolostream will use stdout.

Much like standard file I/O in Python, oemolstreams can be closed after use with the close method. When an oemolstream goes out of scope and is deleted by Python, it is automatically closed as well.

    #!/usr/bin/env python
    # ch4-5.py
    from openeye.oechem import *
    import sys

    ifs = oemolistream()
    ofs = oemolostream()

    if ifs.open("drugs.sdf"):
        if ofs.open("drugs.mol2"):
            for mol in ifs.GetOEMols():
                print(mol.NumConfs())
                OEWriteMolecule(ofs, mol)
        else:
            sys.stderr.write("Unable to open output file\n")
    else:
        sys.stderr.write("Unable to open input file\n")

Listing 4.5: Reading and writing from files

One convenience of the open method of molstreams is that it sets the file format associated with the stream from the file extension of the filename used as the argument. The example above converts the file "drugs.sdf" in MDL SD format into the file "drugs.mol2" in Tripos Mol2 format. This behavior can be overridden by calling SetFormat after the call to open but before the first molecule is read from or written to the stream.

4.5 Compressed Molecule Input and Output

For any of the molecular file formats supported by OEChem, it is often convenient to read and write compressed files or strings. Molecule streams support gzipped input and output via the zlib library. The ".gz" suffix on any filename used to open a stream is recognized, and the stream is read or written in compressed format. This mechanism does not interfere with format perception. For instance, "fn.sdf.gz" is recognized as a gzipped file in MDL's SD format.

The following example demonstrates the use of compressed input and output.

    #!/usr/bin/env python
    # ch4-6.py
    from openeye.oechem import *

    ifs = oemolistream()
    ofs = oemolostream()

    if ifs.open("drugs.sdf.gz"):
        if ofs.open("drugs.oeb.gz"):
            for mol in ifs.GetOEGraphMols():
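The suffix handling described in sections 4.4 and 4.5 can be sketched in plain Python. This is only an illustration of the idea: the extension table is an assumed, partial mapping built from filenames used in this chapter, the format names are plain strings rather than the integer constants, and the function name perceive and the fallback to OEFormat_UNDEFINED for unknown extensions are assumptions, not OEChem's documented behaviour.

```python
import os

# Assumed, partial extension table for illustration only.
EXTENSION_TO_FORMAT = {
    ".sdf": "OEFormat_SDF",
    ".mol2": "OEFormat_MOL2",
    ".oeb": "OEFormat_OEB",
    ".smi": "OEFormat_SMI",
}


def perceive(filename):
    """Return (format_name, is_gzipped) guessed from a filename."""
    root, ext = os.path.splitext(filename)
    gzipped = ext.lower() == ".gz"
    if gzipped:
        # Strip ".gz" first, then look at the extension underneath it,
        # so compression does not interfere with format perception.
        root, ext = os.path.splitext(root)
    return EXTENSION_TO_FORMAT.get(ext.lower(), "OEFormat_UNDEFINED"), gzipped
```

With this sketch, perceive("fn.sdf.gz") reports the SD format with compression switched on, matching the example given in the text.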


This convert command will take the SMILES format output from GetFromDatabase, send it to foo.py on stdin with the default format of OEFormat_SMI and generate an SD format file.

However, to make this concept of using stdin and stdout for piping data really useful, one needs to be able to control the format of stdin and stdout in the same way it would be controlled for temporary files. To facilitate this, oemolstreams interpret filenames which are ONLY format extensions as format control for stdin and stdout.

Now, using our program foo.py from listing 4.7 above:

    prompt> foo.py .smi .mol2

This command opens stdin with SMILES format and opens stdout with MOL2 format.

Now we have complete format control of stdin and stdout from the command line. If we have a program GenerateStructures, which only writes MOL2 format, and another program GenerateData, which only reads SD format, we can use them from the command line with any OEChem program which uses command-line arguments for file specification.

    prompt> GenerateStructures | foo.py .mol2 .sd | GenerateData

This command demonstrates how any OEChem program with command-line file specification can be used to pipe formatted input and output.

4.7 Flavored Reading and Writing of Molecules

The general goal of the oemolstream input and output classes in OEChem is to provide the user with transparent access to the very complex task of reading and writing molecules in a wide variety of formats. Occasionally, however, a programmer may want to tweak the behavior of specific readers or writers without abandoning the oemolstreams in favor of the low-level functions (such as OEWriteMDLFile). For these instances, oemolstreams provide the SetFlavor and GetFlavor methods.

The SetFlavor function takes two integer arguments: the first is the format for which the flavor is being specified and the second is the flavor itself. The formats are specified as discussed above for SetFormat. The input flavors are specified in the OEChem namespace OEIFlavor and the output flavors are specified in the OEChem namespace OEOFlavor. Unlike the formats, the flavors are bitmasks and may be or'ed together. Under the OEIFlavor and OEOFlavor namespaces, there is a namespace for each format as well as a generic namespace. The generic namespace is used to control aromaticity perception and other properties common to all formats. To completely specify a flavor, one would typically binary-OR a generic flag and a format-specific flag and pass the resultant value to SetFlavor.

The default behavior of the PDB reader is that TER specifies the termination of a disconnected fragment within the same molecule, while END specifies the termination of a connection table (see the API manual for details). However, some users may want the reader to split PDB input files into different molecules every time a TER appears.

The following code is an example of changing the PDB reader flavor.

    #!/usr/bin/env python
    # ch4-9.py

    from openeye.oechem import *
    import sys

    ifs = oemolistream('input.pdb')
    ofs = oemolostream('output.mol2')

    # combine the generic and PDB defaults with the TER-splitting flag
    flavor = OEIFlavor_Generic_Default | OEIFlavor_PDB_Default | OEIFlavor_PDB_TER
    ifs.SetFlavor(OEFormat_PDB, flavor)

    for mol in ifs.GetOEMols():
        OEWriteMolecule(ofs, mol)

Listing 4.8: Changing oemolstream reader flavor

Similar low-level control can be exerted over both input and output streams using the SetFlavor method. See the API documentation of each low-level reader or writer for details on the effects of specific flavor flags.

4.8 Writing Const Molecules

The high-level OEReadMolecule and OEWriteMolecule functions standardize the molecule according to the output type for uniformity. For writing molecules without changing them, there are two options. If you would like the data to appear in the file exactly as it is in the molecule (perhaps Tripos atom names in a .pdb format), then you should use a low-level writer (e.g. OEWriteMol2File). On the other hand, if you would like to write a standardized molecule (e.g. Tripos atom types in a MOL2 file), but don't want your molecule changed, you can use OEWriteConstMolecule.

4.9 Writing molecules to string streams

The OEReadMolecule and OEWriteMolecule functions take a generic version of the OEChem stream. This means that while it is more common to use file streams, a user can certainly pass a string stream to these functions. The OEChem string stream objects are oeisstream and oeosstream.
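Returning to the flavors of section 4.7: because a flavor is a bitmask, flags are combined with binary OR, tested with binary AND, and removed by AND-ing with the complement. The sketch below illustrates this in plain Python. The three constants and their bit values are made up for the illustration and do not correspond to the real OEIFlavor values; has_flag is likewise a hypothetical helper.

```python
# Made-up bit values for illustration; the real OEIFlavor constants differ.
GENERIC_AROMATICITY = 0x0001   # hypothetical generic flag
PDB_DEFAULT = 0x0100           # hypothetical format-specific default
PDB_TER = 0x0200               # hypothetical "split on TER" flag

# Binary-OR a generic flag with format-specific flags.
flavor = GENERIC_AROMATICITY | PDB_DEFAULT | PDB_TER


def has_flag(value, flag):
    """True when every bit of flag is set in value."""
    return (value & flag) == flag


# Clear one bit and keep the rest.
without_ter = flavor & ~PDB_TER
```

The same pattern applies to a value returned by GetFlavor: test it with AND, adjust individual bits, and pass the result back to SetFlavor.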


CHAPTER FIVE

Properties of Molecules

5.1 Generic Data

Most objects created in OEChem are able to store relatively arbitrary data, or generic data. This allows for easy tagging of atoms, bonds and molecules for processing and annotation. In fact, molecules can also store other molecules as generic data, as well as other high-level OEChem objects such as OESurfaces, OEScalarGrids and OESkewGrids.

The basic interface for generic data is through the GetData and SetData methods of molecules, atoms and bonds. For example:

    from openeye.oechem import *

    target = OEGraphMol()
    query = OEGraphMol()

    # do something with target and query

    target.SetData('original query', query)

    # and later on...
    query = target.GetData('original query')

Generic data is also saved when molecules are written out as OEB files, as described in Section 4.3. This makes it particularly useful for purposes of annotation.

By default, SetData converts the given data into the closest C++ data type possible:

    Example           Python Data Type    C++ Data Type
    1                 Integer             int
    1.0               Number              double
    'foo'             String              std::string
    True              Boolean             bool
    (1,2,3)           Tuple of Integers   int *
    (1.0,2.0,3.0)     Tuple of Numbers    double *
    (False, True)     Tuple of Booleans   bool *
    [1,2,3]           List of Integers    std::vector<int>
    [1.0,2.0,3.0]     List of Numbers     std::vector<double>
    [False, True]     List of Booleans    std::vector<bool>
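The default conversion table can be restated as a small lookup function. The sketch below is plain Python and does not call OEChem; closest_cpp_type is a hypothetical name, the element types inside std::vector are inferred from the row descriptions, and raising TypeError for anything the table does not list (including empty or mixed sequences) is an assumption made for the illustration.

```python
def closest_cpp_type(value):
    """Name the C++ type the conversion table pairs with a Python value."""
    scalar_names = {bool: "bool", int: "int", float: "double"}
    if isinstance(value, str):
        return "std::string"
    # type() rather than isinstance() keeps True from matching int.
    if type(value) in scalar_names:
        return scalar_names[type(value)]
    if isinstance(value, (tuple, list)) and value:
        kinds = set(type(item) for item in value)
        if len(kinds) == 1 and kinds <= set(scalar_names):
            element = scalar_names[kinds.pop()]
            if isinstance(value, tuple):
                return element + " *"              # tuples become arrays
            return "std::vector<%s>" % element     # lists become vectors
    raise TypeError("no conversion listed for %r" % (value,))
```

The function only reports the name of the target type; it is meant as a reading aid for the table, not as a description of how SetData is implemented.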


Conversely, GetData automatically converts any C++ data type into the closest Python data type.

Sometimes more control is required when setting internal data types. For this purpose, SetData optionally takes a third argument specifying the desired stored data type. This is useful when converting C++ examples to Python or when making sure that the stored data is of a particular type. As with the automatic SetData conversion, Python tuples are converted to arrays and Python lists are converted to std::vectors. Note that not all types are supported in arrays or vectors.

    Type code   Meaning          Note
    'c'         char             (vector only)
    'b'         signed char      (vector only)
    'B'         unsigned char    (vector only)
    'h'         signed short     (vector only)
    'H'         unsigned short   (vector only)
    'i'         signed int
    'I'         unsigned int     (vector only)
    'f'         float
    'd'         double
    's'         string           (no vector or array)
    '?'         boolean

Example usage:

    m.SetData('foo', (1, 2, 3, 4, 5), 'h')   # saves an array of shorts
    m.SetData('bar', [1, 2, 3, 4, 5], 'h')   # saves a std::vector of shorts

Note: Setting two different data types to the same name will cause an internal warning to be sent to the current error stream, and the second call will not set the internal data. Unfortunately, no exception is raised.

5.2 Stored Properties of Molecules

Molecules in OEChem are represented by OEMolBases. In addition to keeping track of the atoms and bonds that constitute a molecule, the OEMolBase is also used to store global information about the molecule. The stored (read-write) properties of a molecule are listed below:

    Property Name   Type    Set Method     Get Method
    Dimension       int     SetDimension   GetDimension
    Energy          float   SetEnergy      GetEnergy
    Rxn             bool    SetRxn         IsRxn
    Title           str     SetTitle       GetTitle

Table 5.1: Stored properties of molecules


5.2.1 Dimension

The "Dimension" property is an unsigned integer representing the dimensionality of the coordinates. It has the default value zero, for unknown or no coordinates, 2 for 2-dimensional coordinates (such as depictions) and 3 for 3-dimensional coordinates. This property is typically set by the file format reader to indicate the dimensionality of the input file (0 for SMILES, 3 for MOL2, and 2 or 3 for MDL SD files, etc.).

5.2.2 Energy

The "Energy" property is a float representing the energy of the structure (in unspecified units). It has the default value zero. Higher values indicate higher energies and therefore less favorable or more strained structures.

5.2.3 Rxn

The "Rxn" property is a boolean representing whether this molecule represents a reaction or not. The default value is false. A true value indicates that the molecule represents a reaction, while false indicates that the molecule is a simple connection table (and that the "Role" property of each atom is OERxnRole_None).

5.2.4 Title

The "Title" property is a string used to represent the name of the molecule. The default value is an empty string. This field may be used to store a registry number or other identifier instead of a common name. The string is typically trimmed of white space by most file format readers.

The following code uses the OEMolBase.GetTitle method to list the names of the molecules in a file. The input file is read from standard input and the list of identifiers (molecule names) is written to standard output.

    #!/usr/bin/env python
    # GetTitle.py
    from openeye.oechem import *

    mol = OEGraphMol()
    ims = oemolistream()

    while OEReadMolecule(ims, mol):
        print(mol.GetTitle())

Listing 5.1: Printing Molecule Titles

Much more data can be stored in generic data containers associated with the molecules. The most common is SD file tagged data, for which there are several convenience methods. See 'Manipulation of Tagged Data' on page 22.


5.3 Derived Properties of Molecules

In addition to the stored properties listed above, the OEMolBase interface also contains several methods for getting derived (or read-only) properties of the molecule.

5.3.1 NumAtoms, NumBonds

The NumAtoms method returns the number of atoms in the molecule, and the NumBonds method returns the number of bonds in the molecule. Note that these methods return the number of explicit atoms and explicit bonds of a molecule, and don't include any implicit hydrogens (or bonds to them). For more details, see 'Implicit vs. Explicit Hydrogen Atoms' on page 48.

5.3.2 GetMaxAtomIdx, GetMaxBondIdx

The GetMaxAtomIdx method returns the largest allocated atom index plus one, and the GetMaxBondIdx method returns the largest allocated bond index plus one. For more details, see 'Atom, Bond and Conformer Indices' on page 46.

5.3.3 GetAtom, GetBond

The GetAtom and GetBond methods return a single OEAtomBase or OEBondBase respectively. The object returned is the first atom or bond in the molecule for which the predicate passed as an argument returns true (see 'Predicate Functions' on page 92). If no match is found, None is returned.

Thus mol.GetBond(OEHasOrder(2)) will return the first double bond in the molecule.

The GetBond method can also take two OEAtomBases and return the bond between them, or None if one does not exist.

5.3.4 GetAtoms, GetBonds

These two methods provide the primary mode of access to the atoms and bonds of a molecule. They return an object which is an iterator over the OEAtomBases or OEBondBases of the molecule respectively.

The use of the iterators is covered in great detail in 'Traversing the Atoms and Bonds of a Molecule' on page 33.
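The first-match behaviour of GetAtom and GetBond described in section 5.3.3 can be sketched in plain Python. Nothing here calls OEChem: first_match and has_order are hypothetical helpers, and the bonds are toy dictionaries standing in for OEBondBase objects.

```python
def first_match(items, predicate):
    """Return the first item for which predicate(item) is true, else None."""
    return next((item for item in items if predicate(item)), None)


def has_order(order):
    """Build a predicate, loosely in the spirit of OEHasOrder."""
    return lambda bond: bond["order"] == order


# Toy bonds as dictionaries; real bonds are OEBondBase objects.
bonds = [
    {"idx": 0, "order": 1},
    {"idx": 1, "order": 2},
    {"idx": 2, "order": 2},
]

first_double = first_match(bonds, has_order(2))   # the bond with idx 1
no_triple = first_match(bonds, has_order(3))      # nothing matches
```

As with GetBond, the caller must be prepared for a None result when no item satisfies the predicate.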


5.4 Manipulation of Tagged Data

5.4.1 Manipulating SD Tagged Data

Meta information about a molecule is stored in what is known as "tagged data." The most common example of this is the data fields found in SD files. Since SD files are a common form of data storage and transfer from one system to another, OEChem provides several functions to manipulate this data. A simple class, OESDDataPair, is used to set or retrieve these pairs. OESDDataPair objects provide SetTag/GetTag and SetValue/GetValue methods for access to each half of the pair.

If you wish to store a numeric value, use Python's str() function to convert it to a string, and then use int() or float() on the value when retrieving the data.

The following functions provide access to the SD data.

Storing SD Data on a Molecule

Use the OESetSDData function to set a tag and value data pair. Both the tag and the value must be strings. If an item with the same tag already exists, it is replaced. The second form is the same as the first but uses an OESDDataPair instance.

    OESetSDData(mol, tag, value)
    OESetSDData(mol, dp)

Use the OEAddSDData function to add a tag and value data pair. Both the tag and the value must be strings. If an item with the same tag already exists, another one is added. The second form is the same as the first but uses an OESDDataPair instance.

    OEAddSDData(mol, tag, value)
    OEAddSDData(mol, dp)

Retrieving SD Data from a Molecule

Use the OEHasSDData function to determine whether a molecule has an item with a given tag:

    OEHasSDData(mol, tag)

Use the OEGetSDData function to get the value for the given tag. If the molecule does not have that tag, an empty string is returned.

    OEGetSDData(mol, tag)

OEGetSDDataPairs returns an OESDDataIter (an iterator over OESDDataPairs), which can be used in a loop as shown in the SD Data Example below.

    OEGetSDDataPairs(mol)

Copying SD Data

Use OECopySDData to copy the entire set of SD data from a source (src) molecule to a destination (dest) molecule.

    OECopySDData(dest, src)


Deleting SD Data from a Molecule

Use OEDeleteSDData to delete a tagged data item. All data items with the specified tag will be deleted.

    OEDeleteSDData(mol, tag)

Use OEClearSDData to clear all SD data from a molecule.

    OEClearSDData(mol)

SD Data Example

The following example shows how to use the tagged data functions.

    #!/usr/bin/env python
    # ch5-2.py
    from openeye.oechem import *
    import sys

    mol = OEGraphMol()
    OEParseSmiles(mol, "c1ccccc1")
    mol.SetTitle("benzene")

    # now set some tagged data
    OESetSDData(mol, 'color', 'brown')
    OESetSDData(mol, 'size', 'small')
    OESetSDData(mol, 'natoms', str(mol.NumAtoms()))

    # loop over data and print it out
    for dp in OEGetSDDataPairs(mol):
        sys.stdout.write('%s : %s\n' % (dp.GetTag(), dp.GetValue()))

    # check for existence of a field, then delete it
    if OEHasSDData(mol, 'color'):
        OEDeleteSDData(mol, 'color')

    # one last loop shows no 'color' field
    for dp in OEGetSDDataPairs(mol):
        sys.stdout.write('%s : %s\n' % (dp.GetTag(), dp.GetValue()))

Listing 5.2: Manipulating SD tagged data

Note that SD tagged data is specific to MDL's SD file format. Any data added to a molecule will only be written out to SD files or OEBinary files. The SD data fields will only be filled when reading from SD files that contain SD tagged data or from OEBinary files previously created to contain this data.

Two more examples are provided that deal specifically with tagged data. sdf2csv.py takes an SD file as input and outputs a comma-delimited file (.csv) for importing into Excel or other spreadsheet programs. The other, mergecsv.py, takes a csv file and adds the data as tags to molecules in an input stream. This simple script assumes that the first column is the molecule title, matching titles found in the incoming molecule file. It also assumes that the first row contains the names to be used as the tags.

5.4.2 Manipulating PDB Tagged Data

The OESDDataPair class is also used to set or retrieve PDB data pairs. In PDB files, this data is stored in header lines where the first field is the tag and the remainder of the line is the data. OESDDataPair objects provide SetTag/GetTag and SetValue/GetValue methods for access to each half of PDB pairs.


If you wish to store a numeric value, use Python's str() function to convert it to a string, and then use int() or float() on the value when retrieving the data.

The following functions provide access to the PDB data.

Storing PDB Data on a Molecule

Use OESetPDBData to set a tag and value data pair. Both tag and value must be strings. If an item with the same tag already exists, it is replaced. The second form is the same as the first but uses an OESDDataPair instance.

    OESetPDBData(mol, tag, value)
    OESetPDBData(mol, dp)

Use OEAddPDBData to add a tag and value data pair. Both tag and value must be strings. If an item with the same tag already exists, another one is added. The second form is the same as the first but uses an OESDDataPair instance.

Note that for PDB header items like REMARK, each line is treated as a separate instance, so to add multiple REMARK lines be sure to use OEAddPDBData instead of OESetPDBData.

    OEAddPDBData(mol, tag, value)
    OEAddPDBData(mol, dp)

Retrieving PDB Data from a Molecule

To determine whether a molecule has an item with a given tag:

    OEHasPDBData(mol, tag)

Use OEGetPDBData to get the value for the given tag. If the molecule does not have that tag, an empty string is returned. Note that if there are multiple items with the same tag, this will only return the first instance. Using the iterator access shown below allows all of them to be retrieved.

    OEGetPDBData(mol, tag)

To get access to all PDB data, an iterator over OESDDataPair objects can be used.

    OEGetPDBDataPairs(mol)

Copying PDB Data

To copy the entire set of PDB data from a source (src) molecule to a destination (dest) molecule, use OECopyPDBData.

    OECopyPDBData(dest, src)

Deleting PDB Data from a Molecule

Use OEDeletePDBData to delete a tagged data item. All data items with the specified tag will be deleted.

    OEDeletePDBData(mol, tag)

To clear all PDB data from a molecule, use OEClearPDBData.

    OEClearPDBData(mol)
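The set, add, get and delete semantics described for SD and PDB tagged data can be modelled with an ordered list of pairs. The sketch below is plain Python and does not use OEChem; the TaggedData class and its method names are invented for this illustration, and placing a replaced item at the end of the list is an assumption, since the text only says that the existing item is replaced.

```python
class TaggedData(object):
    """Ordered (tag, value) pairs; a toy model, NOT an OEChem class."""

    def __init__(self):
        self.pairs = []

    def set(self, tag, value):
        """Replace any existing items with this tag by a single new one."""
        self.delete(tag)
        self.pairs.append((tag, value))

    def add(self, tag, value):
        """Append another item even if the tag is already present."""
        self.pairs.append((tag, value))

    def has(self, tag):
        return any(t == tag for t, _ in self.pairs)

    def get(self, tag):
        """Value of the first item with this tag, or '' if absent."""
        for t, value in self.pairs:
            if t == tag:
                return value
        return ""

    def delete(self, tag):
        """Remove every item carrying this tag."""
        self.pairs = [(t, v) for t, v in self.pairs if t != tag]


data = TaggedData()
data.add("REMARK", "first line")       # add keeps both REMARK items
data.add("REMARK", "second line")
data.set("natoms", str(12))            # values are always strings
data.set("natoms", str(6))             # set replaces the earlier value
natoms = int(data.get("natoms"))       # convert back when retrieving
```

The two REMARK items illustrate why multi-line PDB headers need the add form, and the natoms round trip shows the str() and int() conversion recommended above.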


5.4.3 Multi-conformer molecules

For using tag data with multi-conformer molecules, see 'Dude, where's my SD Data?' on page 31.


CHAPTER SIX

OEMols and OEGraphMols

6.1 Preparation

This chapter builds upon important concepts introduced in 'Getting Started with OEChem Molecules' on page 5. It may be beneficial to review that section before proceeding.

6.2 Multi-conformer and single-conformer molecules

Up to this point in the manual, all of the examples have involved using concrete OEGraphMol molecules. These molecules have been utilizing the functionality defined in the API of the OEMolBase abstract base-class. At this point we introduce another layer of abstraction in OEChem's representation of molecules. In OEChem, we draw a distinction between molecules which are limited to a single conformer and those which may have any number of conformers. While this may be an arbitrary decision, it is a pragmatic one which allows more efficient implementation of both classes. The single-conformer molecule's API is defined by the OEMolBase abstract base-class, with which you are already familiar. The multi-conformer molecule's API is defined by another abstract base-class, the OEMCMolBaseT (here the MC stands for Multi-Conformer, and the T indicates that this class is a template). The OEMCMolBaseT class inherits publicly from OEMolBase, thus the multi-conformer molecule supports the single-conformer API but adds additional functions to manage conformers. Both the single-conformer and the multi-conformer molecules contain atoms and bonds, but only the multi-conformer molecule contains conformers as first-class objects.

You are already familiar with the OEGraphMol, which you have learned is a concrete class which supports the OEMolBase API and can be passed to functions which take an OEMolBase as an argument. You will discover that an OEMol provides the API of the OEMCMolBaseT class in addition to that of the OEMolBase. Further, an OEMol can be passed to any function which takes either an OEMolBase or an OEMCMolBaseT as an argument.

An OEGraphMol is a concrete class similar to the OEMol, but it provides access to only the OEMolBase API. An OEGraphMol can be passed to any function which takes an OEMolBase argument, but not to a function which takes an OEMCMolBaseT argument. An OEGraphMol does not inherit from OEMolBase. This is analogous to an OEMol not inheriting from an OEMCMolBaseT. In both cases, this is for efficiency.

Simplistic OEChem inheritance scheme:

         OEBase
           /|\
            |
            |
        OEMolBase ---------------- OEGraphMol
           /|\
            |
            |
       OEMCMolBaseT ------------- OEMol

6.3 Conformers

One must be cautious when utilizing this OEMolBase inheritance functionality. Each multi-conformer molecule has only a single heavy-atom graph. For functions which query the graph portion of a molecule, a conformer will reflect the graph properties of its parent multi-conformer molecule. Graph properties include the connection table of atoms and bonds, as well as any properties stored by the atoms and bonds. A conformer is only independent of its parent for non-graph (e.g. conformational) properties. The logical extension of this principle is that changes made to the graph properties of one conformer will affect its parent multi-conformer molecule and thus all the other conformers in that molecule as well.
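The consequence of sharing one graph among all conformers can be illustrated with a toy model in plain Python. ToyMCMol and ToyConf are invented for this sketch and are not the OEChem classes; they only mimic the rule stated above, namely that graph data lives once in the parent while each conformer owns its coordinates.

```python
class ToyMCMol(object):
    """Toy multi-conformer molecule; NOT the OEChem implementation."""

    def __init__(self, atom_names):
        self.atom_names = list(atom_names)   # graph data, stored once
        self.confs = []

    def new_conf(self, coords):
        conf = ToyConf(self, coords)
        self.confs.append(conf)
        return conf


class ToyConf(object):
    """Conformer: owns its coordinates, borrows the graph from its parent."""

    def __init__(self, parent, coords):
        self.parent = parent
        self.coords = [tuple(xyz) for xyz in coords]

    def get_atom_name(self, idx):
        return self.parent.atom_names[idx]

    def set_atom_name(self, idx, name):
        self.parent.atom_names[idx] = name   # writes through to the parent


mol = ToyMCMol(["C1", "O1"])
conf_a = mol.new_conf([(0.0, 0.0, 0.0), (1.4, 0.0, 0.0)])
conf_b = mol.new_conf([(0.0, 0.0, 0.0), (0.0, 1.4, 0.0)])

# A graph change made through one conformer is seen by every other one,
# while the coordinates of the two conformers stay independent.
conf_a.set_atom_name(1, "OXT")
```

After the last line, conf_b reports the new atom name as well, even though only conf_a was touched.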


6.4 Properties of Multi-Conformer Molecules

Multi-conformer molecules in OEChem are represented by the abstract base class OEMCMolBase. OEMCMolBase derives from OEMolBase and thus contains all of the properties of molecules discussed in 'Properties of Molecules' on page 18. In addition to storing the atoms and bonds of a molecule, an OEMCMolBase is able to store multiple conformations of a molecule. Conformations of a molecule share the same heavy-atom graph. Here, the word "graph" should be thought of in the graph theory sense, such that a molecule is modeled as vertices (atoms) and edges connecting pairs of vertices (bonds). Therefore, all conformers share a common connection table of the same atoms and bonds. Furthermore, because they share the same atoms and bonds, they also share any properties stored by the atoms and bonds (see 'Properties of Atoms' on page 38 and 'Properties of Bonds' on page 44). The sharing of a common connection table prevents tautomers from being modeled together with an OEMCMolBase. Coordinate information is stored by the molecule, not the shared atoms and bonds, which allows the conformers to share the same heavy-atom graph but have different spatial configurations. In OEChem, these conformers are represented by OEConfBases, which are first-class objects.

OEMols can be used either as a molecule with a particular one of its conformers represented as its current "state," or as a container of conformers with access to many of them at once.

6.4.1 GetConf and GetConfs

Conformers of OEMCMolBases are accessed via the GetConf and GetConfs methods. GetConf returns the first conformer in the molecule for which the predicate function passed as the argument returns true (see 'Predicate Functions' on page 92 for more information on predicates). GetConfs returns an iterator over the conformers of the molecule.

6.4.2 Use of the conformer state

OEMCMolBases have four functions which control the current state of the molecule with respect to conformers. SetActive takes an OEConfBase as an argument and makes the OEMCMolBase act exactly like an OEMolBase with the "Active" conformer as the only conformer. The GetActive function returns a pointer to the currently "Active" conformation. There are many OEMolBase functions which access the single-conformer coordinates of a molecule. When these functions are called on an OEMCMolBase, the coordinates of the "Active" conformer are returned. Similarly, if the OEMCMolBase does not have a title or energy of its own, the title or energy of the active conformer will be returned. This is particularly convenient when passing the molecule to a function which has been written to use or manipulate the coordinates of an OEMolBase.

    #!/usr/bin/env python
    # ch5-3.py
    from openeye.oechem import *
    import sys

    def GetMaxX(mol):
        # Largest x-coordinate among the atoms of mol
        maxX = 0.0
        first = 1
        for atom in mol.GetAtoms():
            xyz = mol.GetCoords(atom)
            if first:
                maxX = xyz[0]
                first = None
            else:
                if xyz[0] > maxX:
                    maxX = xyz[0]

        return maxX

    ifs = oemolistream(sys.argv[1])

    for mol in ifs.GetOEMols():
        print mol.GetTitle()
        for conf in mol.GetConfs():
            # Make conf the active conformer so that mol reports its coordinates
            mol.SetActive(conf)
            print "maxX = ", GetMaxX(mol)

Listing 6.1: Setting the conformer state

While the SetActive and GetActive interface is sufficient for most uses, it is sometimes necessary to think of a more complex representation of the state of the molecule. The OEMCMolBase also has PushActive and PopActive functions which extend the control over the active conformation. All four of these functions work together to determine which conformation is the current active conformation. The active conformation is the top conformation in a stack of OEConfBases held by the molecule. SetActive changes the top conformation on the stack, while GetActive returns the top conformation on the stack. PushActive puts a new conformation in the top position of the stack, pushing all other members of the stack down. PopActive removes the top conformer in the stack (allowing the next lower conformer to become the active conformer). The conformer stack is helpful for using the state of an OEMCMolBase within a function while restoring the molecule to its original state before returning it.

6.4.3 Use of the conformers as first-class objects

Alternatively, a programmer may wish to use the conformers as first-class objects rather than via the state of the OEMCMolBase. This allows one to have multiple conformation objects at once and to treat the OEMCMolBase as a container of single-conformer molecules. Each conformer is represented by an OEConfBase which inherits from OEMolBase. Thus, each conformer can be treated as an independent molecule with respect to its coordinates, as shown in the example code below.

    #!/usr/bin/env python
    # ch5-3.py
    from openeye.oechem import *
    import sys

    def GetMaxX(mol):
        # Largest x-coordinate among the atoms of mol
        maxX = 0.0
        first = 1
        for atom in mol.GetAtoms():
            xyz = mol.GetCoords(atom)
            if first:
                maxX = xyz[0]
                first = None
            else:
                if xyz[0] > maxX:
                    maxX = xyz[0]

        return maxX

    ifs = oemolistream(sys.argv[1])

    maxconf = None
    xmax = 0.0

    for mol in ifs.GetOEMols():
        for conf in mol.GetConfs():
            # Each conformer is passed directly, as if it were a molecule
            xtmp = GetMaxX(conf)
            if xtmp > xmax:
                if maxconf:
                    print conf.GetTitle(), "has larger x than", maxconf
                maxconf = conf.GetTitle()
                xmax = xtmp

Listing 6.2: Conformers as first class objects

In the listing above, the function GetMaxX returns the maximum x-coordinate of a molecule. The main routine loops over all of the conformers of each molecule and compares the maximum x-coordinate to a running maximum of the x-coordinate of every conformer. If there is a new maximum, the title of the associated conformer is stored and the user is notified.

6.5 Reading Multi-conformer molecules

Molecule streams, which were introduced in 'Using OEChem oemolstreams' on page 11, can read both single and multi-conformer molecules from any file format. Many of the file formats supported by OEChem are inherently single-conformer formats (SDF and MOL2, for example). However, a common practice is to store multiple conformers in these files. OEChem supports a rather advanced mechanism for recovering these separate conformers into a single, multi-conformer OEMol. Note that this does not apply to file formats where conformers are stored together. For example, all molecules in an old OEBinary file are multi-conformer molecules. New OEBinary (.oeb) files store either single-conformer or multi-conformer molecules explicitly, so the file itself determines how to deal with conformers. Additionally, file formats that have no notion of conformers (e.g. SMILES files) are also unaffected by this feature.

In early versions of OEChem, the default behavior for reading into an OEMol was to attempt to combine conformers of the same molecules together into a single OEMol.
This is no longer the default, but is instead something controlled by the programmer.

oemolistreams have a method, SetConfTest, that sets a functor used to compare the graphs of incoming molecules in order to determine whether to combine them. These functors are instances of OEConfTest. Several predefined versions include:

OEDefaultConfTest
    Never combine connection tables into multi-conformer molecules.

OEIsomericConfTest
    This implementation of OEConfTest combines subsequent connection tables into a multi-conformer molecule if they:
    1. Have the same title (optional)
    2. Have the same numbers of atoms and bonds in the same order
    3. Have atoms and bonds whose properties are identical to those of their order correspondents in the subsequent connection table
    4. Have the same atom and bond stereochemistry
    No changes are made to the connection table.
    The constructor for OEIsomericConfTest has a default argument for whether or not to compare titles. If the constructor is called with no arguments or with the argument true, the titles will be required to be the same. Otherwise, the titles will not be compared. In the latter instance, each conformer will have the individual title of its original connection table and the multi-conformer molecule will reflect the title of the active conformer.

OEAbsoluteConfTest
    This implementation of OEConfTest combines subsequent connection tables into a multi-conformer molecule if they:
    1. Have the same title (optional)
    2. Have the same number of atoms and bonds in the same order
    3. Have atoms and bonds whose properties are identical to those of their order correspondents in the subsequent connection table
    This conformer test sets all fully specified isomeric values to UNDEFINED.
    The constructor for OEAbsoluteConfTest has a default argument for whether or not to compare titles. If the constructor is called with no arguments or with the argument true, the titles will be required to be the same. Otherwise, the titles will not be compared. In the latter instance, each conformer will have the individual title of its original connection table and the multi-conformer molecule will reflect the title of the active conformer.

OEAbsCanonicalConfTest
    This implementation of OEConfTest combines subsequent connection tables into a multi-conformer molecule if they:
    1. Have the same absolute (non-isomeric) graph
    This conformer test puts all of the molecules in their canonical atom order. In addition, all fully specified isomeric values are set to UNDEFINED.

Listing 6.3 will attempt to read multi-conformer molecules from an input file based on OEAbsoluteConfTest. Note that the instance of OEAbsoluteConfTest is created with the constructor argument false (0), which allows conformers to be combined when they have different titles. This is very useful when dealing with files created by programs that modify molecule titles to indicate conformer number (e.g. acetsali_1, acetsali_2, acetsali_3). The resulting multi-conformer molecule will have the title associated with the first connection table read from the file.

    #!/usr/bin/env python
    from openeye.oechem import *

    ifs = oemolistream('mcdrugs.sdf')
    # False: do not require identical titles when combining conformers
    ifs.SetConfTest(OEAbsoluteConfTest(False))

    for mol in ifs.GetOEMols():
        print '%s has %d conformers' % (mol.GetTitle(), mol.NumConfs())

Listing 6.3: Reading in multi-conformer molecules

6.6 Dude, where's my SD Data?

SD tag data can be attached to OEMolBases, OEMCMolBases, or OEConfBases. Generally, OEChem will never lose any data when reading or writing. However, there are constraints placed on OEChem as to where the data must go based upon the file format being used.


A problem occurs when setting SD tag data on an OEMCMolBase and then writing it to SDF. SDF files do not support multiple conformers. However, OEChem can automatically read consecutive conformers out of an SDF file into an OEMCMolBase. To preserve the generic data, OEChem has no choice but to push the data onto the conformers.

OEB files do not have this restriction because they do support multi-conformer molecules. The following table shows how to round trip SD tag data through the SDF and OEB formats.

    Attached To        Written To    Read Into      Attached To
    (before writing)                                (after reading)
    OEMCMolBase        sdf           OEMCMolBase    OEConfBase
    OEMCMolBase        sdf           OEMolBase      OEMolBase
    OEMCMolBase        oeb           OEMCMolBase    OEMCMolBase
    OEMCMolBase        oeb           OEMolBase      OEMolBase
    OEConfBase         sdf           OEMCMolBase    OEConfBase
    OEConfBase         sdf           OEMolBase      OEMolBase
    OEConfBase         oeb           OEMCMolBase    OEConfBase
    OEConfBase         oeb           OEMolBase      OEMolBase
    OEMolBase          sdf           OEMCMolBase    OEConfBase
    OEMolBase          sdf           OEMolBase      OEMolBase
    OEMolBase          oeb           OEMCMolBase    OEConfBase
    OEMolBase          oeb           OEMolBase      OEMolBase

Practically speaking, it is best never to attach SD tag data to an OEMCMolBase. This should only be done as a space optimization when it is assured that the multi-conformer molecule will only be written to OEB.

To this end, when an OEMol is copy constructed from an OEGraphMol, the SD tag data (and any other generic data) is attached to the first conformer.
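
The combination of consecutive connection tables described in section 6.5 can be sketched without OEChem. The code below is a toy model only: the record dictionaries, conf_key and group_conformers are hypothetical names, the coordinate values are placeholders, and stereochemistry and atom properties are ignored. It assumes nothing beyond the stated rule that only subsequent records with the same atoms and bonds in the same order (and, optionally, the same title) are merged.

```python
from itertools import groupby


def conf_key(record, compare_titles=True):
    """Build the comparison key for one connection table.

    `record` is a hypothetical dict with 'title', 'atoms' (atomic numbers in
    file order) and 'bonds' ((begin, end, order) tuples in file order).
    """
    key = (tuple(record["atoms"]), tuple(record["bonds"]))
    if compare_titles:
        key = (record["title"],) + key
    return key


def group_conformers(records, compare_titles=True):
    """Merge runs of consecutive records that share the same key."""
    groups = []
    for _, run in groupby(records, key=lambda r: conf_key(r, compare_titles)):
        run = list(run)
        # The merged molecule takes the title of the first record in the run.
        groups.append({"title": run[0]["title"],
                       "coords": [r["coords"] for r in run]})
    return groups


water = {"atoms": [8, 1, 1], "bonds": [(0, 1, 1), (0, 2, 1)]}
records = [
    dict(water, title="water_1", coords="A"),   # coords are placeholders
    dict(water, title="water_2", coords="B"),
    {"title": "methane", "atoms": [6], "bonds": [], "coords": "C"},
]

strict = group_conformers(records, compare_titles=True)
relaxed = group_conformers(records, compare_titles=False)
```

With titles compared, the two water records stay separate because their titles differ; with titles ignored, they collapse into one two-conformer entry that keeps the first title.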


CHAPTER SEVEN

Traversing the Atoms and Bonds of a Molecule

7.1 Looping over the Atoms and Bonds of a Molecule

The Python wrapper to OEChem provides generator methods for traversing the atoms and bonds in a molecule. Since the underlying storage mechanism is not exposed, there is no longer the notion of an array of atoms or an array of bonds inside a molecule. In order to traverse the atoms or bonds in C++, one uses iterators, but in Python we have wrapped the C++ iterators into generator methods, much like the loop over molecules in an oemolistream or over the (tag, value) pairs of SD data.

The two methods are GetAtoms() and GetBonds(). For each successive iteration through the for loop, these methods return the next atom and bond respectively.

The following example shows how to traverse the atoms and print out their atomic numbers, and then traverse the bonds and print their bond orders.

    #!/usr/bin/env python
    # ch6-1.py
    from openeye.oechem import *

    mol = OEGraphMol()
    OEParseSmiles(mol, "c1ccccc1")

    print "atoms"
    for atom in mol.GetAtoms():
        print atom.GetAtomicNum()
    print "bonds"
    for bond in mol.GetBonds():
        print bond.GetOrder()

Listing 7.1: Looping over the atoms and bonds

Note that this example also introduces a couple of new methods: the OEAtomBase member function GetAtomicNum() and the OEBondBase member function GetOrder(). These and other member functions of these two classes will be covered in more detail in subsequent chapters.


7.2 Looping over the Bonds of an Atom

The exact same idiom is used for iterating over the bonds attached to an atom. The OEAtomBase method GetBonds is a generator method that allows looping over the bonds connected to the atom. The example below shows how to use this method to determine the explicit degree of an atom, i.e. the number of bonds to it, not counting bonds to implicit hydrogen atoms.

    #!/usr/bin/env python
    # ch6-2.py
    from openeye.oechem import *

    def MyGetExplicitDegree(atom):
        # Count the explicit bonds attached to atom
        result = 0
        for bond in atom.GetBonds():
            result += 1
        return result

    mol = OEGraphMol()
    OEParseSmiles(mol, "c1ccccc1")

    for atom in mol.GetAtoms():
        print MyGetExplicitDegree(atom)

Listing 7.2: Looping over the bonds of an atom

7.3 Looping over the Neighbors of an Atom

Often it is not the bonds around the atoms that you wish to loop over, but the neighboring atoms. One way to do this would be to use the GetBonds method described in the previous section and use the GetNbr method of each OEBondBase to get the atom across the bond from the input atom.

    #!/usr/bin/env python
    # ch6-3.py
    from openeye.oechem import *

    def ShowNeighbors(atom):
        for bond in atom.GetBonds():
            # GetNbr returns the atom at the other end of the bond
            nbor = bond.GetNbr(atom)
            print nbor.GetIdx(),
        print

    mol = OEGraphMol()
    OEParseSmiles(mol, "c1ccccc1")

    for atom in mol.GetAtoms():
        print atom.GetIdx(),
        ShowNeighbors(atom)

Listing 7.3: Finding the neighbors of an atom version 1

However, this can be done even more conveniently using the GetAtoms method of an OEAtomBase directly, which loops over the neighbor atoms.

    #!/usr/bin/env python
    # ch6-4.py
    from openeye.oechem import *

    def ShowNeighbors(atom):
        for nbor in atom.GetAtoms():
            print nbor.GetIdx(),
        print

    mol = OEGraphMol()
    OEParseSmiles(mol, "c1ccccc1")

    for atom in mol.GetAtoms():
        print atom.GetIdx(),
        ShowNeighbors(atom)

Listing 7.4: Finding the neighbors of an atom version 2

7.4 Looping over subsets of Atoms or Bonds

It can sometimes be useful to loop over a subset of the atoms or bonds of a molecule. Traditionally this can be done with "if" statements inside a loop, but it can sometimes be cleaner and more convenient to subset the members being looped over inside the iterator. To do this, many of OEChem's iterator generation functions (such as OEMolBase::GetAtoms) can take an argument which determines which subset of the object to loop over (these arguments are functions called predicates, as detailed in the chapter "Predicate Functions" below). The details of these functions are not important here. Instead, a programmer can simply use the predefined functors to control their loops.

The following example shows the use of the predicate IsCarbon() to loop over only carbon atoms in a molecule.

    #!/usr/bin/env python
    # ch6-5.py
    from openeye.oechem import *

    mol = OEGraphMol()
    OEParseSmiles(mol, "c1ccccc1CCCBr")

    print "Carbon atoms:",
    for atom in mol.GetAtoms(IsCarbon()):
        print atom.GetIdx(),
    print

Listing 7.5: Looping over carbon atoms only

Some of the common predefined functors in OEChem are listed below. Predicate functions can be trivial, such as IsHydrogen(), or quite complex, such as Match(string), which returns atoms which match the SMARTS string passed to the constructor. For a complete listing, please see the chapter on predicate functions or the API manual. Many predicates take intuitive construction arguments. For instance, HasAtomName has a string argument which is the atom's name (e.g. mol.GetAtoms(HasAtomName("CA"))).

Atoms
    HasAtomName(string)
    HasAtomicNum(int)
    IsHalogen
    IsAromaticAtom
    AtomIsInRing
    IsChiralAtom
    HasResidueNumber(int)

Bonds
    Match(string)
    HasBondIdx(int)
    HasOrder(int)
    BondIsInRing
    IsRotor

Conformers
    HasConfIdx(unsigned int)

These predicates can be particularly helpful when used in conjunction with functions which take OEIters as arguments as seen in the example below. This use of predicates allows factorization of the loop in a way not easily possible with if statements.

7.5 Using OEChem C++ Iterators Directly

The standard way of processing each item or member of a set or collection in OEChem is by use of a C++ iterator. The use of iterators is a common design pattern in object-oriented programming that hides the way the collection/container is implemented internally. Hence a set of atoms could be implemented internally as an array, a linked list, a hash table, or any similar data structure, but its behavior to the programmer is independent of the actual implementation. An iterator can be thought of as a current position indicator.

In C++, OEChem iterators make use of C++'s template mechanism. The use of templates allows the functionality of an iterator to be specified independently of the type of the collection being iterated over. An iterator over atoms is defined as type "OEIter<OEAtomBase>" and an iterator over bonds has type "OEIter<OEBondBase>".

In the Python wrapper, these instantiations are replaced with method calls on the OEMolBase. To get an iterator over the atoms in an OEMolBase "mol", one would use mol.GetAtomIter(), and likewise a call to mol.GetBondIter() would return a reference to an iterator over the bonds of the molecule.
The generator methods described in the previous section make use of these iterator methods, but simplify them for simple looping over the atoms and bonds.

If a programmer wants to use iterators directly in Python, the following table describes the translation from C++ OEIterBase operators and methods to the Python wrapper methods.

    Method               C++                                       Python
    Creation             OEIter<OEAtomBase> i = mol.GetAtoms();    i = mol.GetAtomIter()
    Increment            ++i;                                      i.Next()
    Decrement            --i;                                      i.Prev()
    Increment by n       i += n;                                   i.Next(n)
    Decrement by n       i -= n;                                   i.Prev(n)
    Go to first          i.ToFirst();                              i.ToFirst()
    Go to last           i.ToLast();                               i.ToLast()
    De-reference         operator ->                               i.Target()
      (access the object pointed to, i.e. OEAtomBase)
    Validity             operator bool                             i.IsValid()

The next example shows how to use an OEAtomBase iterator directly to loop over the atoms in a molecule in reverse order and print their atomic numbers.

    #!/usr/bin/env python
    # ch6-5.py
    from openeye.oechem import *

    mol = OEGraphMol()
    OEParseSmiles(mol, "n1ccccc1")

    iter = mol.GetAtomIter()
    # Start at the last atom and step backwards until the iterator runs off the front
    iter.ToLast()
    while iter.IsValid():
        print iter.Target().GetAtomicNum()
        iter.Prev()

Listing 7.6: Using C++ iterators directly from Python
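
The idea of an iterator as a "current position indicator", and of a generator method built on top of it, can be sketched in plain Python. ToyIter below is a hypothetical stand-in that borrows the wrapper method names from the table above. It is not the OEChem implementation, and its behavior when stepped past either end is an assumption made for illustration.

```python
class ToyIter(object):
    """Cursor over a sequence, mimicking the wrapper method names (toy class)."""
    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def IsValid(self):
        # Valid only while the cursor points at an existing element
        return 0 <= self._pos < len(self._items)

    def Target(self):
        return self._items[self._pos]

    def Next(self, n=1):
        self._pos += n

    def Prev(self, n=1):
        self._pos -= n

    def ToFirst(self):
        self._pos = 0

    def ToLast(self):
        self._pos = len(self._items) - 1


def as_generator(it):
    """Wrap a cursor in a generator so that it can drive a plain for loop."""
    it.ToFirst()
    while it.IsValid():
        yield it.Target()
        it.Next()


atomic_nums = [7, 6, 6, 6, 6, 6]   # atomic numbers for "n1ccccc1"

# Forward traversal through the generator wrapper
forward = list(as_generator(ToyIter(atomic_nums)))

# Reverse traversal using the cursor directly, as in Listing 7.6
it = ToyIter(atomic_nums)
it.ToLast()
backward = []
while it.IsValid():
    backward.append(it.Target())
    it.Prev()
```

The generator covers the common forward loop, while the explicit cursor is only needed for patterns such as reverse traversal or stepping by more than one.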


CHAPTER EIGHT

Properties of Atoms

8.1 Stored Properties of Atoms

The OEAtomBase class is the workhorse of the OEChem library, representing the atoms of a molecule. The following table shows the properties stored in each OEAtomBase along with the member methods to Set and Get that property.

    Property Name              Type      Get Method           Set Method
    Atomic Number              int       GetAtomicNum         SetAtomicNum
    Formal Charge              int       GetFormalCharge      SetFormalCharge
    Implicit Hydrogen Count    int       GetImplicitHCount    SetImplicitHCount
    Isotopic Mass              int       GetIsotope           SetIsotope
    Partial Charge             float     GetPartialCharge     SetPartialCharge
    Atomic Hybridization       int       GetHyb               SetHyb
    Integer Atom Type          int       GetIntType           SetIntType
    Atom Name                  string    GetName              SetName
    Atom Type Name             string    GetType              SetType
    Radius                     double    GetRadius            SetRadius
    Reaction Role              int       GetRxnRole           SetRxnRole
    Reaction Map Index         int       GetMapIdx            SetMapIdx
    Ring Membership            0 or 1    IsInRing             SetInRing
    Symmetry Class             int       GetSymmetryClass     SetSymmetryClass
    Aromaticity                0 or 1    IsAromatic           SetAromatic

8.1.1 AtomicNum

The "atomic number" property of an atom is an unsigned integer representing the atomic number, or element, of that atom. The default value is zero.

8.1.2 FormalCharge

The "formal charge" property of an atom is an integer representing the formal charge on an atom. The default value is zero, indicating a neutral atom.

8.1.3 ImplicitHCount

The "implicit hydrogen count" property of an atom is an unsigned integer denoting the number of hydrogens implicitly attached to an atom. The default value is zero.

8.1.4 Isotope

The "isotope" property of an atom is an unsigned integer used to indicate whether the atom is mono-isotopic and, if so, the atomic mass of the relevant isotope. The default value is zero, indicating that the atom has a typical population of isotopes (based upon standard abundances) for that element.

8.1.5 PartialCharge

The "partial charge" property of an atom is a float used to hold the partial charge assigned to an atom. The default value is 0.0.

8.1.6 Hyb

The "hybridization" property of an atom is an unsigned int used to hold the atomic hybridization or geometry of an atom. The default value is OEHybridization_Unknown, which is zero.

8.1.7 IntType

The "integer type" property of an atom is an integer that holds the numeric atom type assigned to an atom. The default value is zero.

8.1.8 Name

The "name" property of an atom is a string used to record the atom name associated with an atom. The default value is the empty string.

8.1.9 Type

The "type" property of an atom is a string used to store the symbolic atom type assigned to an atom. The default value is the empty string.

8.1.10 Radius

The "radius" property of an atom is a float used to hold the radius assigned to an atom. The default value is 0.0.

8.1.11 RxnRole

The "reaction role" property of an atom is an unsigned integer used to record the role, if any, of an atom in a reaction or transform. The default value is OERxnRole_None, which is zero.

8.1.12 MapIdx

The "reaction map index" property of an atom is an unsigned integer used to represent the atom equivalences in reactions and transforms, or to label R-groups/attachment points in molecules. The default value is zero.

8.1.13 InRing

The "in ring" property of an atom is a boolean used to represent whether the atom is a member of a cycle/ring. The default value is false.

8.1.14 SymmetryClass

The "symmetry class" property of an atom is an unsigned integer used to represent the topological symmetry group of an atom. The default value is zero.

8.1.15 Aromatic

The "aromatic" property of an atom is a boolean used to represent whether the atom is considered a member of an aromatic ring/cycle. The default value is false.


8.2 Derived Properties of Atoms

8.2.1 GetDegree

The "GetDegree" method returns the degree, or total number of bonds attached to an atom, including those to implicit hydrogens.

8.2.2 GetExplicitDegree

The "GetExplicitDegree" method returns the explicit degree, or total number of explicit bonds attached to an atom.

8.2.3 GetExplicitHCount

The "GetExplicitHCount" method returns the number of explicit hydrogens attached to an atom, i.e. the number of neighbors with atomic number OEElemNo_H.

8.2.4 GetExplicitValence

The "GetExplicitValence" method returns the sum of the bond orders of all of the explicit bonds attached to an atom.

8.2.5 GetHvyDegree

The "GetHvyDegree" method returns the heavy atom degree of an atom, i.e. the number of non-hydrogen atoms bonded to an atom.

8.2.6 GetHvyValence

The "GetHvyValence" method returns the heavy atom valence of an atom, i.e. the sum of the bond orders of bonds to non-hydrogen neighbors.

8.2.7 GetParent

The "GetParent" method returns an OEMolBase pointer to the parent of a given atom.

8.2.8 GetTotalHCount

The "GetTotalHCount" method returns the total hydrogen count, i.e. the total number of hydrogens bonded to an atom, including implicit hydrogens.

8.2.9 GetValence

The "GetValence" method returns the sum of all bond orders to an atom, including those to implicit hydrogen atoms.

8.2.10 GetIdx

The "GetIdx" method returns the unique atom index of an atom within its parent molecule.

8.2.11 IsCarbon

The "IsCarbon" method tests whether the given atom is a carbon, i.e. has atomic number of six (OEElemNo_C).

8.2.12 IsConnected

The "IsConnected" method tests whether a bond exists from the calling atom to the atom passed as an argument to the method. If a bond between the two atoms exists, the method returns true. If no bond exists, the method returns false.

8.2.13 IsHalogen

The "IsHalogen" method tests whether the given atom is a fluorine, chlorine, bromine, or iodine, i.e. has one of the following atomic numbers: OEElemNo_F, OEElemNo_Cl, OEElemNo_Br, OEElemNo_I.

8.2.14 IsHydrogen

The "IsHydrogen" method tests whether the given atom is a hydrogen, i.e. has atomic number of one (OEElemNo_H).

8.2.15 IsMetal

The "IsMetal" method tests whether the given atom is a metal, i.e. has an atomic number which is classified as a 'B' group element (IB through VIII).

8.2.16 IsNitrogen

The "IsNitrogen" method tests whether the given atom is a nitrogen, i.e. has atomic number of seven (OEElemNo_N).

8.2.17 IsOxygen

The "IsOxygen" method tests whether the given atom is an oxygen, i.e. has atomic number of eight (OEElemNo_O).

8.2.18 IsPhosphorus

The "IsPhosphorus" method tests whether the given atom is a phosphorus, i.e. has atomic number of 15 (OEElemNo_P).

8.2.19 IsPolarHydrogen

The "IsPolarHydrogen" method tests whether the given atom is a hydrogen, i.e. has atomic number of one (OEElemNo_H), and is not connected to either a carbon or another hydrogen.

8.2.20 IsSulfur

The "IsSulfur" method tests whether the given atom is a sulfur, i.e. has atomic number of 16 (OEElemNo_S).
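
The relationships among the counting methods in sections 8.2.1 to 8.2.9 can be made concrete with a toy model. The sketch below does not call OEChem: ToyAtom and the helper functions are hypothetical, and it assumes that each implicit hydrogen contributes one bond of order one, which is how the definitions above read.

```python
class ToyAtom(object):
    """Minimal atom record: element, implicit H count, explicit bonds (toy class)."""
    def __init__(self, atomic_num, implicit_h=0):
        self.atomic_num = atomic_num
        self.implicit_h = implicit_h
        self.bonds = []                  # list of (neighbor, bond_order) pairs


def connect(a, b, order=1):
    a.bonds.append((b, order))
    b.bonds.append((a, order))


def explicit_degree(atom):
    return len(atom.bonds)


def degree(atom):
    return explicit_degree(atom) + atom.implicit_h


def explicit_h_count(atom):
    return sum(1 for nbr, _ in atom.bonds if nbr.atomic_num == 1)


def total_h_count(atom):
    return explicit_h_count(atom) + atom.implicit_h


def explicit_valence(atom):
    return sum(order for _, order in atom.bonds)


def valence(atom):
    # Assumption: every implicit hydrogen counts as a single bond
    return explicit_valence(atom) + atom.implicit_h


def hvy_degree(atom):
    return sum(1 for nbr, _ in atom.bonds if nbr.atomic_num != 1)


def hvy_valence(atom):
    return sum(order for nbr, order in atom.bonds if nbr.atomic_num != 1)


# Formaldehyde with one hydrogen explicit and the other implicit
c = ToyAtom(6, implicit_h=1)
o = ToyAtom(8)
h = ToyAtom(1)
connect(c, o, 2)
connect(c, h, 1)

carbon_summary = {
    "degree": degree(c),
    "explicit_degree": explicit_degree(c),
    "explicit_h_count": explicit_h_count(c),
    "total_h_count": total_h_count(c),
    "explicit_valence": explicit_valence(c),
    "valence": valence(c),
    "hvy_degree": hvy_degree(c),
    "hvy_valence": hvy_valence(c),
}
```

For the carbon atom this gives a degree of 3 but an explicit degree of 2, and a valence of 4 but an explicit valence of 3; in each case the difference is the single implicit hydrogen.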


CHAPTER NINE

Properties of Bonds

9.1 Stored Properties of Bonds

    Property Name        Type          Get Method    Set Method
    Bond Order           int           GetOrder      SetOrder
    Aromaticity          0 or 1        IsAromatic    SetAromatic
    Ring Membership      0 or 1        IsInRing      SetInRing
    Integer Bond Type    int           GetIntType    SetIntType
    Bond Type Name       string        GetType       SetType
    Begin Atom           OEAtomBase    GetBgn        SetBgn
    End Atom             OEAtomBase    GetEnd        SetEnd

9.1.1 Order

The "order" property of a bond is an unsigned integer representing its formal bond order, i.e. single, double, triple, quadruple, etc.

9.1.2 Aromatic

The "aromatic" property of a bond is a boolean used to denote whether the bond has been determined to be a member of an aromatic ring/cycle. The default value is false.

9.1.3 InRing

The "in ring" property of a bond is a boolean used to represent whether the bond is a member of a cycle/ring. The default value is false.

9.1.4 IntType

The "integer type" property of a bond is an integer used to record the numeric bond type assigned to a bond. The default value is zero.

9.1.5 Type

The "type" property of a bond is a string used to record the symbolic bond type assigned to a bond. The default value is the empty string.

9.1.6 Bgn

The "begin atom" property of a bond is an OEAtomBase* used to represent the atom at the start of a bond.

9.1.7 End

The "end atom" property of a bond is an OEAtomBase* used to represent the atom at the end of a bond.

9.2 Derived Properties of Bonds

9.2.1 GetBgnIdx

9.2.2 GetEndIdx

9.2.3 GetNbr

9.2.4 GetParent

9.2.5 IsRotor


CHAPTER TEN

Atom, Bond and Conformer Indices

OEChem assigns each OEAtomBase and each OEBondBase of an OEMolBase an index when it is created. This index can be used to distinguish one OEAtomBase from another, in that it is unique among the OEAtomBases of the same molecule. Atom and bond indices are also stable. A given OEAtomBase will have the same index throughout its lifetime, independent of the reordering of atoms of a molecule, or the creation or deletion of other atoms (with the single exception of the Sweep method of OEMolBases and the SweepConfs method of OEMCMolBases). Molecules instantiated via copy constructors will have indices which correspond with those of the original molecule.

Note that atom and bond indices are not guaranteed to be sequential, or even created sequentially, hence atom indices can't easily be used to retrieve all of the atoms of a molecule. They can however be assumed to be dense small integers greater than or equal to zero and less than GetMaxAtomIdx() for atoms (or GetMaxBondIdx() for bonds, or GetMaxConfIdx() for conformers).

OEAtomBases and OEBondBases both implement the GetIdx method to return their unique index in the molecule, and also a SetIdx method for setting a unique number. SetIdx should probably only be used in writing file readers or other low-level methods. The following example loops over the atoms and bonds and prints their indices.

    #!/usr/bin/env python
    # ch9-1.py
    from openeye.oechem import *

    mol = OEGraphMol()
    OEParseSmiles(mol, "c1ccccc1")

    print "atoms"
    for atom in mol.GetAtoms():
        print atom.GetIdx()

    print "bonds"
    for bond in mol.GetBonds():
        print bond.GetIdx()

Listing 10.1: Atom and bond indices

Please never, never, ever do this:

    # Never ever, ever do this!
    for i in xrange(mol.NumAtoms()):
        atom = mol.GetAtom(HasAtomIdx(i))
        # pretend atom is valid
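
Why counting from zero to NumAtoms() fails can be seen in a toy model of stable indices. The class below is hypothetical and does not reproduce OEChem's internals; it only assumes the properties stated above, namely that an index is never renumbered after a deletion and that every live index is less than the maximum index.

```python
class ToyMol(object):
    """Hands out atom indices that are never reused or renumbered (toy class)."""
    def __init__(self):
        self._atoms = {}        # index -> atomic number
        self._next_idx = 0

    def new_atom(self, atomic_num):
        idx = self._next_idx
        self._next_idx += 1
        self._atoms[idx] = atomic_num
        return idx

    def delete_atom(self, idx):
        del self._atoms[idx]    # surviving atoms keep their old indices

    def num_atoms(self):
        return len(self._atoms)

    def max_atom_idx(self):
        return self._next_idx   # strict upper bound on every live index

    def atom_indices(self):
        return sorted(self._atoms)


mol = ToyMol()
o = mol.new_atom(8)
h1 = mol.new_atom(1)
h2 = mol.new_atom(1)
mol.delete_atom(h1)

surviving = mol.atom_indices()              # indices that really exist
guessed = list(range(mol.num_atoms()))      # what the "never do this" loop assumes

# An array sized by the maximum index can be addressed by any live index.
labels = [None] * mol.max_atom_idx()
for idx in surviving:
    labels[idx] = "atom %d" % idx
```

After the deletion the live indices are 0 and 2, whereas the naive loop would ask for 0 and 1: it requests an atom that no longer exists and never reaches one that does. Sizing per-atom arrays by the maximum index, rather than by the atom count, avoids the problem.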


CHAPTER ELEVEN

Creating Atoms, Bonds and Conformers

11.1 Using NewAtom and NewBond

While using SMILES is a convenient method of specifying a molecule, OEChem contains functions that allow molecules to be constructed from atoms and bonds explicitly. The following example shows how to create the molecule water.

Atoms are created by calling the OEMolBase method NewAtom, and bonds are created by calling the OEMolBase method NewBond. NewAtom takes the atomic number of the atom to create and returns a pointer to the new OEAtomBase. NewBond takes two OEAtomBases and an integer bond order as arguments, and returns a pointer to the new OEBondBase.

The atoms and bonds of a molecule are automatically deleted when their parent molecule is destroyed.

    #!/usr/bin/env python
    # ch10-1.py
    from openeye.oechem import *

    mol = OEGraphMol()

    o = mol.NewAtom(8)
    h1 = mol.NewAtom(1)
    h2 = mol.NewAtom(1)

    b1 = mol.NewBond(o, h1, 1)
    b2 = mol.NewBond(o, h2, 1)

    print mol.NumAtoms()
    print mol.NumBonds()

Listing 11.1: Creating new atoms and bonds version 1

In the example source code, the atomic numbers of oxygen, 8, and hydrogen, 1, are explicitly encoded in the program. To make this code easier to read and less error prone, OEChem provides symbolic constants for the first 109 elements. These are defined in the C++ namespace OEElemNo with the appropriate values, and appear in Python with an OEElemNo_ prefix. The following example uses these constants instead of just numbers.

    #!/usr/bin/env python
    # ch10-2.py
    from openeye.oechem import *

    mol = OEGraphMol()

    o = mol.NewAtom(OEElemNo_O)
    h1 = mol.NewAtom(OEElemNo_H)
    h2 = mol.NewAtom(OEElemNo_H)

    b1 = mol.NewBond(o, h1, 1)
    b2 = mol.NewBond(o, h2, 1)

    print mol.NumAtoms()
    print mol.NumBonds()

Listing 11.2: Creating new atoms and bonds version 2

11.2 Implicit vs. Explicit Hydrogen Atoms

In many chemistry applications it is useful to treat the hydrogen atoms implicitly. Typically, the number of heavy (non-hydrogen) atoms is approximately half the total number of atoms in a molecule. This means that if hydrogens are maintained as an implicit hydrogen count associated with each atom, instead of being represented explicitly, it is possible to halve the amount of memory required to represent a molecule (on average). For applications such as computational chemistry, where algorithms are often worse than linear in the number of atoms, this more than doubles their performance. In molecular mechanics, this is often referred to as a united atom representation of a molecule. Due to its convenience, this is the representation of choice for Daylight SMILES and MDL connection tables.

OEChem supports the united atom representation of molecules by providing SetImplicitHCount and GetImplicitHCount functions for OEAtomBases. Returning to the water example above, the molecule can be created and represented by an OEMolBase containing only a single OEAtomBase.

    #!/usr/bin/env python

    from openeye.oechem import *

    mol = OEGraphMol()
    Oatom = mol.NewAtom(OEElemNo_O)
    Oatom.SetImplicitHCount(2)

Listing 11.3: Using implicit hydrogens

To determine whether a molecule is represented by an all atom or united atom representation (or possibly some combination of both, such as polar hydrogens), OEChem provides the two functions OEHasExplicitHydrogens and OEHasImplicitHydrogens, which return true or false (1 or 0).

11.3 Updating Implicit Hydrogen Counts

For most simple molecules, OEChem can deduce the number of implicit hydrogens that are required to fill valence on each atom. Provided that the molecule doesn't contain radicals, unusual charge states, underfilled or overfilled


valences, the number of implicit hydrogens can be deduced from the atomic number and formal charges on atoms, and the bond orders of the bonds of a molecule.

The function to call to automatically update the implicit hydrogen counts of each atom is called OEAssignImplicitHydrogens.

    #!/usr/bin/env python
    # ch10-4.py
    from openeye.oechem import *

    mol = OEGraphMol()
    o = mol.NewAtom(OEElemNo_O)
    OEAssignImplicitHydrogens(mol)

Listing 11.4: Adding implicit hydrogens

The OEAssignMDLHydrogens function implements a simple model of valence suitable for guessing the number of hydrogens expected to be present on each atom when the formal charge has correctly been assigned.

11.4 Making Hydrogen Atoms Implicit

A molecule which contains explicit hydrogens may be converted to a hydrogen suppressed graph using the OESuppressHydrogens function. The OESuppressHydrogens function deletes all explicit hydrogens in the molecule. The implicit hydrogen count fields of the remaining atoms are incremented by the number of previously attached explicit hydrogens which have been deleted. The valence state of the remaining atoms is preserved by the retention of implicit hydrogen counts.

11.5 Making Hydrogen Atoms Explicit

Implicit hydrogens contained in a molecule can be made explicit using one of the overloaded OEAddExplicitHydrogens functions. Implicit hydrogens may also be made explicit for only a single atom of a molecule. The OEAddExplicitHydrogens functions which take either a single-conformer or multi-conformer molecule as an argument convert either polar or all implicit hydrogens into explicit hydrogens. These functions take an argument which specifies whether energetically reasonable geometries should be assigned for the newly added hydrogens. If this option is not desired, the Cartesian coordinates of the sprouted hydrogens will be set to the coordinates of the parent atom to which they are attached.

11.6 Sprouting Hydrogens in 3D

Energetically reasonable coordinates for hydrogen atoms may be assigned automatically using the OESet3DHydrogenGeom functions. The overloaded functions which take only a molecule as an argument traverse molecules and identify hydrogen atoms which are bonded to a single atom and have the same Cartesian coordinates as that atom. The functions then attempt to compute an energetically reasonable position for the identified hydrogen atom based on a pre-assigned hybridization value for the parent atom. If no hybridization


value is found for the parent atom then the hybridization will be assigned, if possible, from the molecule graph. If hybridization assignment fails then new coordinates will not be assigned for the hydrogen atom. An additional function is provided which allows specification of the hydrogen for which new coordinates are to be assigned. New coordinates for the hydrogen atom will be computed regardless of whether the current coordinates of the hydrogen are identical to those of the atom to which it is attached. If coordinates are assigned correctly the function will return boolean True. If the function fails to compute a geometrically reasonable position it will return boolean False.

11.7 Using NewConf

The most common method to create conformers in a molecule is by reading a molecule from a file (see the chapter "Reading and Writing Molecules"). However, when manipulating molecules it is often necessary to create conformers on-the-fly. In OEChem, this is done with the NewConf method of OEMCMolBases. There are five prominent overloads of NewConf. All of the versions create conformers with the capacity to store coordinates for the current number of atoms in the molecule. NewAtom and NewBond adjust this capacity as necessary.

The default OEMCMolBase constructor puts the molecule in a state with a single empty conformer (as does the OEMCMolBase::Clear function). The DeleteConfs function of the OEMCMolBase removes all of the conformers of the molecule.

    #!/usr/bin/env python
    # ch10-5.py
    from openeye.oechem import *

    mol = OEMol()

    print "Default NumConfs =", mol.NumConfs()

    mol.NewConf()
    print "After one additional, NumConfs =", mol.NumConfs()

    mol.DeleteConfs()

    print "After deletion, NumConfs =", mol.NumConfs()

Listing 11.5: Deleting Conformers

The code above will produce the output:

    Default NumConfs = 1
    After one additional, NumConfs = 2
    After deletion, NumConfs = 0

The versions of the NewConf method are:

    NewConf()
    NewConf(OEFloatArray)
    NewConf(OEMolBase)
    NewConf(OEConfBase)

After the NewConf with no arguments has been called, the coordinates of individual atoms can be set using the SetCoords method which takes an atom, or all of the atoms can be set at once with the SetCoords which takes only an OEFloatArray or only an OEDoubleArray.

The NewConf overload which takes an OEFloatArray argument expects an OEFloatArray of size 3*GetMaxAtomIdx(), with the Cartesian coordinates of each atom of the new conformer starting at coords[atom.GetIdx()*3].
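The flat coordinate layout described above can be sketched in plain Python. This is only an illustration of the indexing arithmetic: a Python list stands in for OEFloatArray, and the helper names make_coords, set_xyz and get_xyz are hypothetical, not OEChem functions.

```python
# Flat coordinate layout: x, y, z for atom index i live at 3*i, 3*i+1, 3*i+2.
# Pure Python sketch; a plain list stands in for OEFloatArray.

def make_coords(max_atom_idx):
    """Return a zero-filled flat list with room for every possible index."""
    return [0.0] * (3 * max_atom_idx)

def set_xyz(coords, atom_idx, xyz):
    base = 3 * atom_idx
    coords[base:base + 3] = [float(v) for v in xyz]

def get_xyz(coords, atom_idx):
    base = 3 * atom_idx
    return tuple(coords[base:base + 3])

# Water with atom indices 0 (O), 1 (H) and 2 (H); coordinates in Angstroms.
coords = make_coords(3)
set_xyz(coords, 0, (0.000, 0.000, 0.000))
set_xyz(coords, 1, (0.957, 0.000, 0.000))
set_xyz(coords, 2, (-0.240, 0.927, 0.000))

print(len(coords))
print(get_xyz(coords, 2))
```

Sizing the array by the maximum atom index rather than the atom count means that gaps left by deleted atoms simply stay unused.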


The NewConf which takes an OEMolBase expects the molecule passed in to have the same graph as the OEMCMolBase which is the parent of the new conformer. It is important to note that this version of NewConf can take an OEGraphMol, OEMol, or OEMCMol. In the latter two cases, the coordinates of the new conformer will come from the active conformation of the molecule passed in.

Finally, there is an overload which takes a conformer. This function behaves the same as the overload which takes an OEMolBase.

    #!/usr/bin/env python
    # ch10-5.py
    from openeye.oechem import *

    def GetGoodMol(destination, source):
        # keep only the conformers of source below the energy cutoff
        destination.DeleteConfs()
        for conf in source.GetConfs():
            if conf.GetEnergy() < -15.5:
                newconf = destination.NewConf(conf)
                newconf.SetTitle("Low Energy Conformer: %.3f" % newconf.GetEnergy())

    ifs = oemolistream("input.oeb")
    ofs = oemolostream("output.sdf")

    goodmol = OEMol()
    for mol in ifs.GetOEMols():
        GetGoodMol(goodmol, mol)
        OEWriteMolecule(ofs, goodmol)

Listing 11.6: Use of NewConf to copy conformers

The example above demonstrates copying conformers from one OEMol to another using the NewConf and DeleteConfs functions. The main routine reads all of the molecules from the file "input.oeb" and writes the molecules with only their low-energy conformations to "output.sdf". The function GetGoodMol generates a destination molecule that contains only the low-energy conformations of the source molecule. The title of each new conformer is set to reflect its energy.


CHAPTER TWELVE

Connectivity Processing

OEChem provides several functions for determining the connectivity and/or bond orders from various input file formats. For correct molecule processing, OEChem requires all the covalent bonds to be represented in a molecule and each bond to have a defined bond order: 1 for single, 2 for double, 3 for triple and 4 for quadruple. Given this explicit Kekulé representation of a molecule, OEChem can perceive and re-perceive higher-order attributes such as ring membership or aromaticity as defined by different aromaticity models (definitions of aromaticity).

Alas, unlike MDL's SD file format, not all file formats explicitly specify a Kekulé form of a molecule with explicit bond orders. The routines below attempt to deduce such a representation from the information that is available in such file formats.

12.1 Determining Bonds From 3D Coordinates

For file formats that provide 3D coordinates, but not explicit bond information (or only partial bond information), OEChem uses the OEDetermineConnectivity function. This function deduces the pattern of covalent bonding in a molecule from the proximity of atoms. Two atoms are considered bonded if they are within the sum of their covalent radii plus an additional "slop" factor of 0.45 Angstroms. The covalent radii used are those prescribed by the Cambridge Crystallographic Database. The values used for the common organic subset of elements are given in Table 12.1.

These values may also be retrieved from OEChem using the OEGetCovalentRadius function.

OEDetermineConnectivity will not create a bond between two atoms that are less than 0.4 Angstroms apart. Such unreasonably short bond lengths indicate the structure is either severely distorted, or doesn't have coordinate information at all. All bonds created by OEDetermineConnectivity have bond orders set to one. To perceive bond order information, see OEChem's OEPerceiveBondOrders function described below.

The OEDetermineConnectivity function checks whether a bond already exists between two atoms before creating a new bond. This allows the function to be used with file formats that may specify partial connectivity, such as only multiple (double, triple or quadruple) bonds.


Element      Symbol  Number  Covalent Radius (Angstroms)
Hydrogen     H        1      0.23
Boron        B        5      0.83
Carbon       C        6      0.68
Nitrogen     N        7      0.68
Oxygen       O        8      0.68
Fluorine     F        9      0.64
Silicon      Si      14      1.20
Phosphorus   P       15      1.05
Sulfur       S       16      1.02
Chlorine     Cl      17      0.99
Arsenic      As      33      1.21
Selenium     Se      34      1.22
Bromine      Br      35      1.21
Tellurium    Te      52      1.47
Iodine       I       53      1.40

Table 12.1: Covalent Radii in OEChem

12.2 Kekule Form Assignment

A number of file formats don't represent a connection table as a single representative Kekulé form but instead denote some bonds, such as those in benzene, as aromatic. OEChem provides a method for determining a valid, but arbitrary, Kekulé form for such connection tables using the OEKekulize function. On input to OEKekulize, the integer bond type property of each bond represents either the bond order (1 for single, 2 for double, 3 for triple or 4 for quadruple) or the value 5, indicating that the bond is aromatic or resonant. The algorithm sets the bond order property from the bond type property, with the exception of bond type 5, which is assigned a bond order of either 1 or 2 representing either a single or double bond. The Boolean return value indicates whether a valid Kekulé form could be assigned.

Note that OEKekulize requires that the implicit hydrogen counts and formal charges have been correctly set on all atoms before being called.

OEKekulize is normally only used by low-level file readers for interpreting input connection tables. To write out a Kekulé SMILES string, use the OEChem function OEClearAromaticFlags, which clears the aromaticity property of all atoms and bonds in a molecule, causing the molecule to be written out as aliphatic with explicit bond orders.

12.3 Perceiving Bond Orders

The OEPerceiveBondOrders function is used to deduce bond orders from the 3D coordinates and simple connectivity of a molecule. If the simple connectivity, i.e. bonds without bond orders, isn't specified in the input file, OEDetermineConnectivity should be called first to deduce this information from the 3D coordinates.


CHAPTER THIRTEEN

Ring Processing

13.1 Cycle Membership

The simplest form of ring processing in OEChem is testing for ring/cycle membership. The OEChem function OEFindRingAtomsAndBonds is used to determine which atoms and bonds are members of one or more cycles and which are acyclic. This function uses an efficient O(n) algorithm. Once OEFindRingAtomsAndBonds has been called, an atom or bond can be tested for being in a ring by calling either the OEAtomBase IsInRing or the OEBondBase IsInRing methods respectively.

Because it is common to test IsInRing (or aromaticity) in user applications, the function OEFindRingAtomsAndBonds is called automatically by the high-level file I/O functions (including the OEChem oemolistream GetOEMols method and OEReadMolecule). However, whenever you modify a molecule by adding or deleting bonds, you'll need to call OEFindRingAtomsAndBonds yourself.

13.2 Number of Ring Bonds to an Atom

To provide an example of how one might use the above functions, the code below returns the number of ring bonds attached to an atom. This is useful for identifying ring fusion atoms (ring bond count = 3) and potential spiro-atoms (ring bond count ≥ 4).

    def MyAtomRingBondCount(atom):
        count = 0
        for bond in atom.GetBonds():
            if bond.IsInRing():
                count += 1
        return count

13.3 Testing for Membership in a Given Ring Size

It is also easy to use OEChem to determine whether an atom or bond is in a ring or cycle of a given size, using the OEAtomIsInRingSize and OEBondIsInRingSize functions. Both of these functions require that OEFindRingAtomsAndBonds has previously been called on the molecule. Both of these functions take the query ring size as an argument, which should be greater than or equal to three. The definition of ring or cycle is not based upon the SSSR, and these functions return true if there is a bonded path of 'size' unique atoms where each atom is bonded to the next and the last is bonded to the first.

It is often the case that atoms may be in different sized cycles at the same time. For example, one way to identify the ring fusion atoms in indole (the fusion of a five-membered pyrrole ring and a six-membered benzene ring) is to test both OEAtomIsInRingSize(atm,5) and OEAtomIsInRingSize(atm,6). Of course, the MyAtomRingBondCount function given in the previous section would be a more efficient way to solve the same problem.

OEChem also provides an additional pair of functions, OEAtomIsInAromaticRingSize and OEBondIsInAromaticRingSize, to determine whether an atom or bond is in an aromatic ring or cycle of a given size. These behave identically to OEAtomIsInRingSize and OEBondIsInRingSize except that each ring bond in the path/cycle must be aromatic. In addition to OEFindRingAtomsAndBonds, these functions also require the user to have called OEAssignAromaticFlags.

13.4 Determining Smallest Ring Membership

In addition to determining whether an atom or a bond is in a ring or cycle of a given size, it's often useful to know the size of the smallest ring or cycle that an atom or bond is in. To do this OEChem provides the functions OEAtomGetSmallestRingSize and OEBondGetSmallestRingSize. For acyclic atoms and bonds, these functions return the value zero. For cyclic atoms and bonds, they return a value greater than or equal to three.

13.5 Identifying Connected Components

To aid in splitting molecules into discrete connected components, for example to separate a parent compound from its salt, or a ligand from a protein, OEChem provides the function OEDetermineComponents. This function arbitrarily assigns an integer index, starting from one, to each disconnected part in the OEMolBase. On return this provides a mapping from each atom's index, obtained by GetIdx, to its component index. Unused atom indices are mapped to zero. The function itself also returns the total number of components found, i.e. the maximum part index stored in the array.

The following provides a short example of how to use this function.

    def MyReportParts(mol):
        count, parts = OEDetermineComponents(mol)
        print "The molecule has %d components" % count
        for atom in mol.GetAtoms():
            print "atom %d is in part %d" % (atom.GetIdx(), parts[atom.GetIdx()])
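The mapping that Section 13.5 describes (atom index to one-based part number, with unused indices mapped to zero) can be reproduced in plain Python with a simple graph traversal. The sketch below is not the OEDetermineComponents implementation; determine_components and its argument layout are hypothetical, chosen only to mirror the return values described in the text.

```python
def determine_components(max_atom_idx, atom_indices, bonds):
    """Return (count, parts); parts[idx] is the 1-based part of atom idx.

    Indices that belong to no atom are left at zero.
    """
    neighbors = dict((idx, []) for idx in atom_indices)
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    parts = [0] * max_atom_idx
    count = 0
    for start in atom_indices:
        if parts[start]:
            continue              # already reached from an earlier atom
        count += 1
        parts[start] = count
        stack = [start]
        while stack:              # depth-first walk over bonded neighbours
            current = stack.pop()
            for nbr in neighbors[current]:
                if not parts[nbr]:
                    parts[nbr] = count
                    stack.append(nbr)
    return count, parts

# Toy input: atoms 0-3 are bonded together, atom 5 is a lone counter-ion,
# and index 4 is unused (for example, a deleted atom).
count, parts = determine_components(6, [0, 1, 2, 3, 5],
                                    [(0, 1), (1, 2), (1, 3)])
print(count)
print(parts)
```

Here the unused index 4 keeps the value zero, which is why the array is sized by the maximum atom index rather than by the number of atoms.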


13.6 Identifying Ring Systems

The OEChem function OEDetermineRingSystems behaves very similarly to the OEDetermineComponents function described in the previous section. However, this function returns a mapping from atom indices to a ring system index denoting a ring system, or "component" connected only by ring bonds. This function requires that OEFindRingAtomsAndBonds has been called previously. All acyclic atoms are mapped to the value zero.

13.7 Smallest Set of Smallest Rings (SSSR) considered Harmful

In 1968, Edsger Dijkstra, one of the great pioneers of computer science, wrote a classic paper, "Go To Statement Considered Harmful", Communications of the ACM, Vol. 11, No. 3, pp. 147-148, March 1968. The first sentence of that paper contains "the observation that the quality of a programmer is a decreasing function of the density of GOTO statements in the programs they produce". This paper had such dramatic impact that 35 years later most programmers know they should avoid using "GOTO", but would have difficulty explaining why.

Dijkstra's argument was not that GOTO was evil per se, but that its use showed that the programmer probably had not given the problem enough thought to discover a cleaner, more elegant solution without it. This argument is equally valid for the "Smallest Set of Smallest Rings" in chemical information processing. The use of SSSR probably shouldn't be forbidden, but it is almost always used in algorithms for which it is inappropriate. Both the relatively high computational cost, O(n^2 log n), and the non-deterministic ambiguity in choosing SSSR membership lead to real bugs in almost all chemoinformatics uses.
Indeed, the OEChem library itself demonstrates that it's possible to develop state-of-the-art algorithms for cycle perception, aromaticity perception, symmetry perception and 2D depiction, without once using SSSR.

The fundamental problem is that Plotkin's original definition of Smallest Set of Smallest Rings is not unique. For example, the molecule cubane has five rings in its SSSR, as determined by the Frèrejacque number (no. of rings = no. of bonds - no. of atoms + 1). This means that although all eight atoms are symmetric, four will be members of three SSSR rings and four will be members of two SSSR rings. Obviously SSSR membership can't be used as a graph theoretical invariant in symmetry perception. Indeed the choice of which rings are part of the SSSR and which aren't is arbitrary, and often dependent upon the input order of the molecule. For example, in aromaticity perception the non-determinism of ring membership can result in the same atom being aromatic in one input ordering and aliphatic in another. Because of this many alternative definitions to SSSR have been proposed over the years, including Extended SSSR, the set of "synthetically important" rings, K-rings, etc.

We believe that it is a great service to our customers that we do not include any SSSR functionality in OEChem. This is a conscious decision. The forerunners of OEChem, babel and OELib, both contained efficient algorithms for determining SSSR, and these remain freely available on the Internet today. Furthermore, many useful ring perception routines are available in OEChem, including the ability to determine whether an atom or bond is acyclic or part of a ring, the smallest ring size that an atom or bond is in, the size of the smallest aromatic ring an atom or bond is in, and whether an atom or bond is in a ring of a particular size.

1. Renzo Balducci and Robert S. Pearlman, "Efficient Exact Solution of the Ring Perception Problem", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 34, No. 4, pp. 822-831, 1994.

2. L. Baumer, G. Sala and G. Sello, "Ring Perception in Organic Structures: A New Algorithm for Finding SSSR", Computational Chemistry, Vol. 15, pp. 293-299, 1991.

3. Geoffrey M. Downs, Valerie J. Gillet, John D. Holliday and Michael F. Lynch, "Review of Ring Perception Algorithms for Chemical Graphs", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 29, No. 3, pp. 172-187, 1989.

4. John Figueras, "Ring Perception Using Breadth-First Search", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 36, No. 5, pp. 986-991, 1996.

5. Johann Gasteiger and Clemens Jochum, "An Algorithm for the Perception of Synthetically Important Rings", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 29, No. 1, pp. 43-48, 1979.

6. M. J. Plotkin, Journal of Chemical Documentation, Vol. 11, pp. 60-63, 1971.

7. C. Qian, W. Fisanick, D. E. Hartzler and S. W. Chapman, "Enhanced Algorithm for Finding Smallest Set of Smallest Rings", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 30, pp. 105-110, 1990.

8. Barbara L. Roos-Kozel and William L. Jorgensen, "Computer-Aided Mechanistic Evaluation of Organic Reactions. 2. Perception of Rings, Aromaticity and Tautomers", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 21, pp. 101-111, 1981.

9. A. Zamora, "An Algorithm for Finding the Smallest Set of Smallest Rings", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 16, pp. 43-48, 1979.
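The cubane argument of Section 13.7 can be checked with a few lines of plain Python. The sketch below is only an illustration and uses no OEChem calls: the function names are hypothetical, the cube's vertices are numbered 0 to 7 by their binary coordinates, and the ring-count formula is extended with the number of connected components (a standard graph-theory generalisation that is not stated in the text). It assumes that any five of the six faces of the cube can serve as an SSSR, and shows how the per-atom membership counts change with the face that is left out.

```python
def frerejacque_number(num_atoms, num_bonds, num_components=1):
    """Number of independent rings: bonds - atoms + components."""
    return num_bonds - num_atoms + num_components

def cube_faces():
    """The six four-membered faces of a cube with vertices 0..7."""
    faces = []
    for axis in range(3):
        for value in (0, 1):
            faces.append([v for v in range(8)
                          if ((v >> axis) & 1) == value])
    return faces

def membership_counts(rings, num_atoms=8):
    """How many of the given rings each atom belongs to."""
    counts = [0] * num_atoms
    for ring in rings:
        for atom in ring:
            counts[atom] += 1
    return counts

rings_needed = frerejacque_number(8, 12)    # cubane: 8 atoms, 12 bonds
faces = cube_faces()
drop_first = membership_counts(faces[1:])   # one possible choice of 5 faces
drop_last = membership_counts(faces[:-1])   # another equally valid choice

print(rings_needed)
print(drop_first)
print(drop_last)
```

Both choices give four atoms in three rings and four atoms in two rings, but which atoms fall into which group depends entirely on the face that was omitted.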


CHAPTER FOURTEEN

Aromaticity Processing

14.1 Aromaticity and Hückel's 4n+2 rule

OEChem's aromaticity perception routines are based around Hückel's 4n+2 electron counting rule.

OEAssignAromaticFlags

14.2 Aromaticity Models in OEChem

• OEAroModelOpenEye
• OEAroModelDaylight
• OEAroModelTripos
• OEAroModelMDL
• OEAroModelMMFF

14.3 Clearing Aromaticity

The aromatic property of all atoms and bonds in a molecule can conveniently be reset by calling the OEChem function OEClearAromaticFlags. This is useful, for example, for writing the Kekulé form of a SMILES string, by calling OEClearAromaticFlags before calling OECreateAbsSmiString.

The OEClearAromaticFlags function is equivalent to the following code.

    def MyClearAromaticFlags(mol):
        for atom in mol.GetAtoms():
            atom.SetAromatic(0)
        for bond in mol.GetBonds():
            bond.SetAromatic(0)
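Returning to the electron counting rule of Section 14.1, the arithmetic behind "4n+2" is easy to state in plain Python. The sketch below only tests whether a given pi electron count fits the rule; it is not how OEAssignAromaticFlags works, the function name is hypothetical, and deciding how many electrons each atom contributes (which is where the aromaticity models listed above differ) is left out entirely.

```python
def satisfies_huckel(pi_electrons):
    """True if pi_electrons equals 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0

examples = [("benzene", 6),
            ("cyclobutadiene", 4),
            ("cyclooctatetraene", 8),
            ("cyclopentadienyl anion", 6)]

for name, electrons in examples:
    print("%s: %s" % (name, satisfies_huckel(electrons)))
```

Six electrons fit the rule with n = 1, while four and eight do not fit for any n.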


CHAPTER FIFTEEN

Stereochemistry Processing

OEChem has the ability to store and retrieve stereochemical information for atoms and bonds independent of two or three dimensional coordinates. The current version of OEChem supports stereochemistry definitions of handedness around tetrahedral centers, and cis/trans configuration around bonds. Future versions of OEChem may support definitions such as exo/endo, axial/equatorial, square planar, pyramidal, trigonal bipyramidal, and octahedral.

15.1 Atom Stereochemistry

The OEAtomBase member function HasStereoSpecified(type) returns a boolean value which indicates whether stereochemical information of a particular type has been stored for an atom. The integer type argument must be a constant listed in the OEAtomStereo namespace. If an atom has associated stereochemistry data, it can be retrieved using the OEAtomBase member function GetStereo, which takes a vector of neighboring atoms and the stereochemistry type as arguments. Multiple possible values may be associated with each class of stereochemistry. For instance, if an atom has associated tetrahedral stereochemistry, the possible values are Undefined, RightHanded (or just Right), or LeftHanded (or just Left). The following code sample demonstrates looping over atoms, testing for atoms which have tetrahedral stereochemistry, and printing out the value of the tetrahedral stereochemistry.

    # ch14-1.py
    from openeye.oechem import *
    import os, sys

    mol = OECreateOEMol()
    OEParseSmiles(mol, "F[C@H](Cl)Br")

    for atom in mol.GetAtoms():
        if atom.HasStereoSpecified(OEAtomStereo_Tetrahedral):
            # collect the neighbors in the order that defines the handedness
            v = []
            for nbr in atom.GetAtoms():
                v.append(nbr)
            stereovalue = atom.GetStereo(v, OEAtomStereo_Tetrahedral)
            print "Atom:", atom.GetIdx(), " "
            if stereovalue == OEAtomStereo_RightHanded:
                print "Right Handed"
            elif stereovalue == OEAtomStereo_LeftHanded:
                print "Left Handed"


The definition of "handedness" for tetrahedral stereochemistry does not imply chirality around a tetrahedral center, but rather indicates relative positions of neighboring atoms. Note that the function GetStereo() requires a vector containing the neighboring atoms as the first argument (a Python list, as in the sample above). The "handedness" value returned from GetStereo() will depend on the order of the neighboring atoms as they appear in the vector passed to GetStereo(). The definition of "handedness" in OEChem is demonstrated pictorially in Figure 14-2.

Figure 14-2

Looking down the bond between atom number one and the central atom, handedness is defined as the direction of travel from atom number two to atom number three. The direction of travel must always be along the acute angle formed by atom two, the central atom, and atom three. Right handed and left handed directions of travel can also be thought of as clockwise and counterclockwise, respectively. The first neighbor atom in the vector passed to GetStereo() is taken as atom number one in the determination of "handedness". Likewise, subsequent atoms in the neighbor atom vector are assigned sequentially to positions in the "handedness" definition. Although three neighboring atoms are sufficient to determine the "handedness" around a trigonal pyramidal or tetrahedral center, either three or four atoms can be provided to the GetStereo() function when requesting a value for tetrahedral chirality.

Setting the relative stereochemistry around a particular center is accomplished using the function SetStereo(AtomVector, type, value). Just as in GetStereo(), the AtomVector of neighbor atoms provides the references about which the handedness is defined. The first of the unsigned integer arguments is the stereochemistry type (e.g. OEAtomStereo_Tetra), and the second is the associated value (e.g. OEAtomStereo_Right).

15.2 Bond Stereochemistry

Stereochemistry around bonds can be specified in a similar fashion to stereochemistry about atom centers. The request for stereochemistry of a particular class can be made of an OEBondBase using the function HasStereoSpecified(type). The type argument provided to HasStereoSpecified() indicates the type of stereochemistry requested, and must be one of the constants listed in the OEBondStereo namespace. The most commonly requested stereochemistry value of bonds is whether the configuration of two atoms around a non-rotatable bond is cis or trans. The following code sample demonstrates a loop over bonds which tests for bonds with associated stereochemistry, and retrieval of whether the neighboring atoms are cis or trans relative to one another.


CHAPTER SIXTEEN

Atom and Bond Typing

16.1 Integer Atom Types and Type Names

Atom typing, or the task of classifying each atom in a molecule by its immediate chemical environment, is a common task in computational chemistry applications. Atoms of the same "type" are assumed to behave similarly or have similar properties within different molecules or at different locations within the same molecule. The most common uses of atom types are for molecular mechanics force fields and some property prediction algorithms.

The OEChem library supports both atom and bond typing with two independent properties on each OEAtomBase and OEBondBase respectively. These are the signed integer form of the atom type, and the string form of the atom type name. Typically, the integer property tends to be used computationally and is often derived algorithmically, whereas the string name is often the "type" as read in from an input file, or set based upon the perceived integer type before writing to a particular file format.

The integer atom type is set using the OEAtomBase SetIntType method and is retrieved with the OEAtomBase GetIntType method. The atom type name string is set with OEAtomBase SetType and retrieved with OEAtomBase GetType. The equivalent functions also exist in OEBondBase for setting and retrieving the integer bond type and the bond type name string.

Note that the atom type name string is not the same as the atom name. To give an example, a Sybyl .mol2 file may contain an atom with name "N9" but with atom type name "N.pl3". The use of two separate properties allows these fields to be preserved, and a mol2 file to be written out with exactly the same atom name and atom type name. Example bond type names might include "am" or "ar".

One major use of integer bond types is encoding the aromatic form of a connection table before calling OEKekulize. The semantics of OEKekulize require that integer bond types 1 through 4 represent single, double, triple and quadruple bonds, and that integer bond type 5 represents an aromatic bond. Given these integer bond types as input, OEKekulize determines a (possibly arbitrary) Kekulé form, and sets the bond order properties of the bonds of a molecule appropriately (using the OEBondBase SetOrder method).

16.2 Tripos Atom Typing

The OEChem library contains several routines to perform Tripos atom typing. Tripos atom types are commonly used in computational chemistry, from the Tripos force field itself to Gasteiger's partial charge algorithm, xlogp and even the writing of Sybyl .mol2 format files.


62 Chapter 16. Atom and Bond Typing16.2.1 OETriposAtomTypesThis function sets the integer atom type field of each atom in a molecule to its Tripos atom type. This functiontypically requires OEAssignAromaticFlags has been called to perceive aromaticity using the Tripos aromaticitymodel, i.e. OEAssignAromaticFlags(mol,OEAroModelTripos). This is required as Tripos considerscompounds such as pyrole, to be aliphatic and so the carbon atoms of pyrole should be correctly typed as ”C.2”not ”C.ar”. Using the Tripos aromaticity model, however, is not a strong requirement and other aromaticity modelscan be used (for example if C.ar is desired in pyrole).The integer atom type properties are set to the values defined in the OETriposType namespace. So for example,planar nitrogen with Tripos atom type N.pl3 will be assigned the value OETriposType Npl3. Atoms are cannotbe assigned a Tripos atom type, are assigned value zero, a.k.a. OETriposType Du. This function returns true ifall the atoms could be assigned at type.Tripos integer atom types are required by the Gasteiger partial charge calculation functions.16.2.2 OETriposAtomTypeNamesThis function is similar to OETriposAtomTypes, but instead of setting the integer atom type property it sets theatom type name string property using OEAtomBase SetType. The integer atom type property is left unchanged.Once again, this function typically requires OEAssignAromaticFlags has been called to perceive Tripos aromaticitymodel aromaticity, i.e. OEAssignAromaticFlags(mol,OEAroModelTripos). See OETriposAtom-Types for details.The atom type name string property should typically be set to a tripos type name before writing a Sybyl .mol2 fileusing the low-level OEWriteMol2File function.16.2.3 OETriposTypeNamesThis function is very similar to OETriposAtomTypeNames, and sets the atom type name string property usingOEAtomBase SetType. 
However, instead of doing graph matching to perceive the Tripos atom types, it assumes OETriposAtomTypes has already been called. This is a simple optimization that avoids duplicating effort: it just sets the atom type name string property from the value stored in the integer atom type property.

The atom type name string property should typically be set to a Tripos type name before writing a Sybyl .mol2 file using the low-level OEWriteMol2File function.

16.2.4 OETriposAtomNames

This function doesn't involve atom or bond typing at all, but is included here as it is related. It sets the atom name field to "XY", where X is the atomic symbol of the atom and Y is a sequential index per atomic number. The atom name property should typically be set to a Tripos atom name before writing a Sybyl .mol2 file using the low-level OEWriteMol2File function.
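The "XY" naming scheme described for OETriposAtomNames can be illustrated with a few lines of plain Python. This is only a sketch of the idea, not OEChem's implementation: the function name sequential_atom_names is hypothetical, and starting each per-element counter at 1 is an assumption, since the text does not state the first index.

```python
from collections import defaultdict


def sequential_atom_names(symbols):
    """Return names such as C1, C2, N1 for a list of element symbols.

    Hypothetical helper: each element keeps its own running counter,
    assumed here to start at 1.
    """
    counts = defaultdict(int)  # running count per element symbol
    names = []
    for symbol in symbols:
        counts[symbol] += 1
        names.append("%s%d" % (symbol, counts[symbol]))
    return names


# Illustrative input: element symbols in atom order.
names = sequential_atom_names(["C", "N", "C", "H", "H"])
print(names)
```

Because the counter is kept per element, the second carbon is named independently of the nitrogen that precedes it.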


16.3 Tripos Bond Typing

OETriposBondTypeNames

16.4 Generic Tripos Type Functions

OETriposTypeName
OETriposTypeElement
OETriposTypeIndex

The next example creates a caffeine molecule from SMILES, adds hydrogens explicitly, assigns Tripos atom names and type names, and prints them all out.

    # ch15-1.py
    from openeye.oechem import *

    mol = OECreateOEMol()
    caffeine = 'Cn1cnc2n(C)c(=O)n(C)c(=O)c12'
    OEParseSmiles(mol, caffeine)
    mol.SetTitle('caffeine')
    print '#atoms=', mol.NumAtoms()

    # add explicit hydrogens
    OEAddExplicitHydrogens(mol)
    print '#atoms=', mol.NumAtoms()

    # now loop over atoms, printing their atomic number
    for atom in mol.GetAtoms():
        print atom.GetAtomicNum(),
    print

    # assign tripos atom names and types
    OETriposAtomNames(mol)
    OETriposAtomTypeNames(mol)

    # now loop over atoms, printing their name
    for atom in mol.GetAtoms():
        print atom.GetName(),
    print

    # now loop over heavy atoms, printing their types
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() != 1:
            print atom.GetType(),
    print


16.5 Writing a Sybyl mol2 file using OEWriteMol2File

To demonstrate how all of these atom and bond typing routines are used together, the following example shows how to write a Sybyl mol2 file using low-level I/O.

    from openeye.oechem import *

    def MyWriteMol2Molecule(ofs, mol):
        # perceive rings, then aromaticity using the Tripos model
        OEFindRingAtomsAndBonds(mol)
        OEAssignAromaticFlags(mol, OEAroModelTripos)
        # set the Tripos atom type names, bond type names and atom names
        OETriposAtomTypeNames(mol)
        OETriposBondTypeNames(mol)
        OETriposAtomNames(mol)
        # write the molecule with the low-level mol2 writer
        OEWriteMol2File(ofs, mol, 0)

16.6 MacroModel Atom Typing

The set of functions for MacroModel atom typing is very similar to those available for Tripos atom typing.

OEMacroModelAtomTypes
OEMacroModelAtomTypeNames
OEMacroModelTypeNames

16.7 Generic MacroModel Type Functions

OEMacroModelTypeName
OEMacroModelTypeElement


CHAPTER SEVENTEEN

Formal and Partial Charges

Each OEAtomBase keeps track of two types of charge. The first, formal charge, is an integer property that is essential for the correct valence representation of a molecule. Together with atomic valences, bond orders and connectivity, this field defines the identity of a molecule. The second, partial charge, is a floating point property used in computational chemistry and molecular modeling. It represents the electronic distribution (wave function) of a molecule by approximating the molecule's electrostatic field with a set of point charges located at each atom.

The formal charge on an atom may be stored and retrieved using the SetFormalCharge and GetFormalCharge methods respectively. Similarly, the partial charge is stored and retrieved with the SetPartialCharge and GetPartialCharge methods.

Neither the formal charge nor the partial charge is a directly observable property of an atom. The same molecule may be represented by different valence representations, each placing the formal charges in different locations (for example, benzene written as [cH+]1[cH-][cH+][cH-][cH+][cH-]1), and different partial charging algorithms may assign significantly different partial charges to the same atom.

17.1 Assigning Formal Charges

Normally, file formats such as SMILES, SLN or MDL's SDF format specify the formal charge on each atom of a connection table. However, when reading from lesser file formats or when repairing "broken" molecules, it may be convenient to assign formal charges to each atom such that the atomic valence is consistent. OEChem provides this functionality via the OEAssignFormalCharges function. This function requires that bond orders and implicit hydrogen counts have been set on a molecule. It then adjusts the formal charge on each uncharged atom to correct common valence model mismatches.
For example, quaternary nitrogens are assigned a +1 formal charge, and a terminal oxygen connected only by a single bond (with no implicit hydrogens) is assigned a -1 formal charge. A more technical discussion of the formal charges assigned by this function is given in the "OpenEye Charge Model" section of the "Valence Models" chapter of this document.
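The two cases named above can be written down as a toy rule in plain Python. This is a sketch under stated assumptions, not the OpenEye charge model: the function assign_formal_charge and its argument layout are hypothetical, only the two examples from the text are covered, and "quaternary" is approximated as a nitrogen with four neighbours, counting implicit hydrogens.

```python
def assign_formal_charge(element, bond_orders, implicit_h, charge=0):
    """Return an adjusted formal charge for a single atom (toy rules only).

    element      element symbol, e.g. "N"
    bond_orders  list with one integer bond order per explicit neighbour
    implicit_h   implicit hydrogen count
    charge       formal charge already present on the atom
    """
    if charge != 0:
        return charge  # only uncharged atoms are adjusted
    neighbours = len(bond_orders) + implicit_h
    if element == "N" and neighbours == 4:
        return 1       # four-coordinate nitrogen (covers the quaternary case)
    if element == "O" and bond_orders == [1] and implicit_h == 0:
        return -1      # terminal oxygen: one single bond, no hydrogens
    return 0


print(assign_formal_charge("N", [1, 1, 1, 1], 0))  # tetramethylammonium N
print(assign_formal_charge("O", [1], 0))           # alkoxide-like oxygen
print(assign_formal_charge("O", [1], 1))           # hydroxyl oxygen, unchanged
```

Atoms that already carry a charge are returned untouched, mirroring the statement that only uncharged atoms are adjusted.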


17.2 Working with Partial Charges

OEChem also provides several functions that simplify the task of working with partial charges, independent of any partial charging scheme.

The OEClearPartialCharges function may be used to reset the partial charge of all atoms in an OEMolBase to zero. By default, OEAtomBases are created with zero partial charge, so this function is only really required to zero the partial charges after values have been assigned.

The OEFormalPartialCharges function provides a convenient way to set the partial charge on each OEAtomBase of an OEMolBase to its formal charge.

Finally, the OEHasPartialCharges function examines an OEMolBase to see whether any of its OEAtomBases has a non-zero partial charge.

17.3 Determining Net Charge on a Molecule

The total (or net) charge on a molecule can be conveniently calculated by calling OEChem's OENetCharge function on a molecule. If any of the atoms in the molecule has a non-zero partial charge, i.e. the function OEHasPartialCharges returns true, this function returns the sum of the partial charges rounded to the nearest integer. If the molecule doesn't have partial charges, this function returns the sum of the formal charges on each atom.

This logic should return the total charge on the molecule if either formal or partial charges are present. When both partial and formal charges are present, partial charges take priority. For "valid" partial charges, the sum of the partial charges should always be an integer equal to the sum of the formal charges, so this preference typically won't matter. However, when reading from file formats that contain partial charges, such as .mol2 or Delphi PDB, OEChem may be unable to correctly assign formal charges to each atom, in which case the partial charges are often more trustworthy.

The following example contains two functions that determine whether a molecule is a cation or an anion, i.e. carries a net positive or negative charge respectively.

    def MyIsCation(mol):
        if OENetCharge(mol) > 0:
            return 1
        return 0

    def MyIsAnion(mol):
        if OENetCharge(mol) < 0:
            return 1
        return 0

17.4 Calculating Gasteiger Partial Charges

To assign Marsili-Gasteiger partial charges to a molecule, OEChem provides the OEGasteigerPartialCharges function. This sets the partial charge property of each atom, using the OEAtomBase::SetPartialCharge method. The algorithm itself reproduces the partial charges as calculated by Tripos Inc's Sybyl software, with default parameter settings, which is the de facto reference implementation for Gasteiger charges.


The Gasteiger partial charge algorithm currently assumes that all hydrogen atoms are represented explicitly, for example by calling OEAddExplicitHydrogens.

The current version of OEGasteigerPartialCharges should return the same results independent of the currently assigned aromaticity model and the value of each atom's "integer atom type" property. Early versions of OEChem allowed customization of the Gasteiger charge calculation by explicitly assigning the Tripos atom type of each atom of the molecule. Unfortunately, this required assigning Tripos aromaticity and Tripos atom types before each call to OEGasteigerPartialCharges. The more recent behavior is less error-prone (as it's no longer possible to forget to prepare a molecule) and greatly simplifies common usage.

The first stage of the Marsili-Gasteiger "Partial Equalization of Orbital Electronegativities (PEOE)" calculation is the assignment of seed charges to each atom. Typically the partial charge of a neutral atom is seeded with zero but, for example, each oxygen in a carboxylate is assigned -0.5, and the net formal charge on a conjugated ring is shared equally amongst the atoms of the ring system. These seed charges may also be useful in some applications, and can be assigned using the OEGasteigerInitialCharges function.
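The seed-charge idea and the net-charge rule of Section 17.3 fit together, as the following plain-Python sketch tries to show. It does not call OEChem: seed_charges and net_charge are hypothetical helpers, the atom groups that share charge are supplied by hand rather than perceived, and leaving ungrouped atoms at their formal charge is an assumption.

```python
def seed_charges(formal_charges, groups):
    """Spread the net formal charge of each group equally over its atoms.

    formal_charges  list of integer formal charges, one per atom
    groups          list of atom-index lists, e.g. the two oxygens of a
                    carboxylate; atoms outside every group keep their
                    formal charge as the seed (an assumption of this sketch)
    """
    seeds = [float(q) for q in formal_charges]
    for group in groups:
        shared = sum(formal_charges[i] for i in group) / float(len(group))
        for i in group:
            seeds[i] = shared
    return seeds


def net_charge(formal_charges, partial_charges):
    """Mimic the net-charge rule: partial charges win when any is non-zero."""
    if any(q != 0.0 for q in partial_charges):
        return int(round(sum(partial_charges)))
    return sum(formal_charges)


# Acetate heavy atoms in the order C, C, O, O(-); indices 2 and 3 are the
# carboxylate oxygens.
formal = [0, 0, 0, -1]
seeds = seed_charges(formal, [[2, 3]])
print(seeds)
print(net_charge(formal, seeds))
```

Each carboxylate oxygen receives -0.5, and the seeds still sum to the formal charge of -1, which is the consistency the text expects of "valid" partial charges.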


CHAPTER EIGHTEEN

Pattern Matching

OEChem includes facilities to perform different types of pattern (graph) matching. Graph matching is based on node (atom) and edge (bond) correspondences. An atom which satisfies the conditions of a node in a query graph is said to match. Likewise, a bond which satisfies the conditions of an edge in a query graph is said to match. Pattern matching is the process of identifying groupings of matching nodes and edges. Substructure search, or subgraph isomorphism, is the process of finding a match of a query graph that is smaller than or equal to the graph being searched. Maximum common substructure search is the process of identifying the maximal graph correspondence between two graphs. Clique detection is the process of finding all possible correspondences between two graphs within a set of bounds.

18.1 Substructure Search

Substructure searches can be done in OEChem using the OESubSearch class. The OESubSearch class can be initialized with a SMARTS pattern, an OEQMolBase query molecule, or a molecule with expression options. The following example demonstrates how to initialize an OESubSearch instance with a SMARTS pattern, and perform a substructure search.

    #!/usr/bin/env python

    from openeye.oechem import *
    import os, sys

    mol = OEGraphMol()
    OEParseSmiles(mol, "c1ccccc1C")
    # create a substructure search object
    ss = OESubSearch("c1ccccc1")

    if ss.SingleMatch(mol):
        print "benzene matches toluene"
    else:
        print "benzene does not match toluene"

Listing 18.1: Substructure search example

In Listing 18.1, the query pattern is benzene and the molecule in which the substructure is being searched for is toluene. Since benzene is a substructure of toluene, the OESubSearch.SingleMatch method will return true.
The OESubSearch.SingleMatch method returns true if a single subgraph isomorphism is detected in the molecule passed as the function argument.

The OESubSearch class is able to identify the atom and bond correspondences of the pattern and target structures. The program in Listing 18.2 extends the simple match example to write out all atom correspondences between benzene and toluene.

    #!/usr/bin/env python

    from openeye.oechem import *
    import os, sys

    mol = OEGraphMol()
    OEParseSmiles(mol, "c1ccccc1C")
    # create a substructure search object
    ss = OESubSearch("c1ccccc1")

    count = 1
    # loop over matches
    for match in ss.Match(mol):
        sys.stdout.write("\nMatch %d : " % count)
        sys.stdout.write("pattern atoms: ")
        for ma in match.GetAtoms():
            sys.stdout.write("%d " % ma.pattern.GetIdx())
        sys.stdout.write("target atoms: ")
        for ma in match.GetAtoms():
            sys.stdout.write("%d " % ma.target.GetIdx())
        count += 1

Listing 18.2: Atom map example

The output of Listing 18.2 is the following:

    Match 1 : pattern atoms: 0 1 2 3 4 5 target atoms: 0 1 2 3 4 5
    Match 2 : pattern atoms: 0 1 2 3 4 5 target atoms: 0 5 4 3 2 1
    Match 3 : pattern atoms: 0 1 2 3 4 5 target atoms: 1 2 3 4 5 0
    Match 4 : pattern atoms: 0 1 2 3 4 5 target atoms: 1 0 5 4 3 2
    Match 5 : pattern atoms: 0 1 2 3 4 5 target atoms: 2 3 4 5 0 1
    Match 6 : pattern atoms: 0 1 2 3 4 5 target atoms: 2 1 0 5 4 3
    Match 7 : pattern atoms: 0 1 2 3 4 5 target atoms: 3 4 5 0 1 2
    Match 8 : pattern atoms: 0 1 2 3 4 5 target atoms: 3 2 1 0 5 4
    Match 9 : pattern atoms: 0 1 2 3 4 5 target atoms: 4 5 0 1 2 3
    Match 10 : pattern atoms: 0 1 2 3 4 5 target atoms: 4 3 2 1 0 5
    Match 11 : pattern atoms: 0 1 2 3 4 5 target atoms: 5 0 1 2 3 4
    Match 12 : pattern atoms: 0 1 2 3 4 5 target atoms: 5 4 3 2 1 0

The OESubSearch.Match method performs subgraph isomorphism determination for instances of OEMolBase or OEQMolBase and returns an iterator over all detected subgraphs. Each of the subgraphs can be queried for its atom and bond correspondences. In this particular example, the benzene substructure is identified twelve times in toluene.
There are twelve matches because the benzene ring can be rotated around for 6 matches, and then flipped and rotated around for another 6 matches, yielding a total of twelve. The matches differ from one another in their atom and bond correspondences to the pattern substructure.

A match or subgraph is considered unique if it differs from all other subgraphs found previously by at least one atom or bond. When doing unique matching, two subgraph matches which cover the same atoms and bonds, albeit in different orders, are called duplicates and the later one is discarded. In order to retrieve only unique matches, the Match method has to be called with its second argument set to true. In the Listing 18.2 example, using a unique search would yield only a single match for benzene in toluene.

An OESubSearch may be initialized using a SMARTS pattern or a query molecule (OEQMolBase). Query molecules must have atom and bond expressions built for the entire molecule to be able to initialize the search object (see OEQMolBase.BuildExpressions in the API document).

OESubSearch.GetPattern returns a read-only reference to the query molecule contained in an instance of OESubSearch. Const OEQMolBase methods can be used to interrogate the returned OEQMolBase reference.
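The count of twelve matches, and its reduction to one under unique matching, can be checked with a short plain-Python enumeration. This is a sketch of the counting argument only, not of OEChem's search: ring_mappings and unique_matches are hypothetical helpers, and uniqueness is judged here on the set of covered target atoms alone, ignoring bonds.

```python
def ring_mappings(n):
    """All mappings of an n-membered ring pattern onto an n-membered ring.

    Position i of each tuple holds the target index matched by pattern
    atom i. For every starting atom one rotated and one flipped mapping
    is generated.
    """
    mappings = []
    for start in range(n):
        mappings.append(tuple((start + i) % n for i in range(n)))  # rotated
        mappings.append(tuple((start - i) % n for i in range(n)))  # flipped
    return mappings


def unique_matches(mappings):
    """Keep the first mapping for each distinct set of covered target atoms."""
    seen = set()
    kept = []
    for mapping in mappings:
        covered = frozenset(mapping)
        if covered not in seen:
            seen.add(covered)
            kept.append(mapping)
    return kept


all_matches = ring_mappings(6)
print(len(all_matches))                  # number of non-unique matches
print(len(unique_matches(all_matches)))  # number of unique matches
```

For a six-membered ring the enumeration yields six rotations and six flipped rotations, all covering the same six target atoms, so only the first survives the uniqueness filter.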


The OESubSearch.SetMaxMatches method sets the maximum number of subgraphs to be returned by the OESubSearch.Match methods. Once the maximum number of subgraphs has been found, the search is terminated. By default, an OESubSearch is constructed with the maximum number of matches set to 1024. The constraint on the maximum number of matches can be removed by calling SetMaxMatches with a value of zero.

18.2 Maximum Common Substructure Search

The maximum common substructure (henceforth MCS) of two molecular graphs can be identified using the OEMCSSearch class. Listing 18.3 demonstrates how to initialize an OEMCSSearch object, perform a maximum common substructure search, and then retrieve the matches.

     1  #!/usr/bin/env python
     2
     3  from openeye.oechem import *
     4  import os, sys
     5
     6  pattern = OEGraphMol()
     7  target = OEGraphMol()
     8  OEParseSmiles(pattern, "c1cc(O)c(O)cc1CCN")
     9  OEParseSmiles(target, "c1c(O)c(O)c(Cl)cc1CCCBr")
    10
    11  atomexpr = OEExprOpts_DefaultAtoms
    12  bondexpr = OEExprOpts_DefaultBonds
    13  # create maximum common substructure object
    14  mcss = OEMCSSearch(pattern, atomexpr, bondexpr, OEMCSType_Exhaustive)
    15  # set scoring function
    16  mcss.SetMCSFunc(OEMCSMaxAtoms())
    17  # ignore matches smaller than 6 atoms
    18  mcss.SetMinAtoms(6)
    19
    20  unique = True
    21  count = 1
    22  # loop over matches
    23  for match in mcss.Match(target, unique):
    24      sys.stdout.write("\nMatch %d :" % count)
    25      sys.stdout.write("\npattern atoms: ")
    26      for ma in match.GetAtoms():
    27          sys.stdout.write("%d " % ma.pattern.GetIdx())
    28      sys.stdout.write("\ntarget atoms: ")
    29      for ma in match.GetAtoms():
    30          sys.stdout.write("%d " % ma.target.GetIdx())
    31      count += 1
    32      # create match subgraph
    33      m = OEGraphMol()
    34      OESubsetMol(m, match, True)
    35      smi = OECreateCanSmiString(m)
    36      sys.stdout.write("\nmatch smiles = %s \n" % smi)

Listing 18.3: Maximum common substructure search example

The first molecule, pattern, is dopamine, and the second molecule, target, is a dopamine analog. The OEMCSSearch instance is initialized with dopamine, atom and bond expressions for the maximum common substructure query, and the type of the search (see line 14 in Listing 18.3).

The atom and bond expressions define criteria for atom and bond equivalence used during the search and are defined in the OEExprOpts namespace. See 'OEExprOpts Namespace' (Section 18.4) for more information.


An OEMCSSearch object can also be constructed from a SMARTS string directly. In this case standard SMARTS matching rules apply for what constitutes a match. If it is constructed with an OEQMolBase, then whatever atom and bond expressions have been applied to the OEQMolBase will apply in the MCS search.

The last argument of the initialization defines the search type, either OEMCSType_Approximate or OEMCSType_Exhaustive. The difference between the two search types is detailed in 'Exhaustive and approximate MCSS' (Section 18.2.1). This argument is optional; if it is not specified, the faster approximate method is employed.

During the search process, each identified common substructure is evaluated by a scoring function and only substructures with the best score are retained. The SetMCSFunc method (line 16) sets the scoring function of an OEMCSSearch object, thereby influencing the result of the maximum common substructure search. See 'MCS scoring functions: OEMCSFunc' (Section 18.2.2) for more information.

The OEMCSSearch.Match method (line 23) returns an iterator over the maximum common substructure(s). Each OEMatchBase is then passed as an argument to the OESubsetMol function (line 34), and subsequently converted into a SMILES string.

The detected maximum common substructure of the example program is depicted in Figure 18.1; the output of the program is shown below.

    Match 1 :
    pattern atoms: 0 1 2 3 4 5 6 7 8 9
    target atoms: 7 5 3 4 1 2 0 8 9 10
    match smiles = CCc1ccc(c(c1)O)O

Figure 18.1: The maximum common substructure (highlighted in red) of dopamine and dopamine analog

The maximum common substructure search can perform unique or non-unique substructure searching, depending on the second argument of the OEMCSSearch.Match method (see line 23 in Listing 18.3).
The default is a non-unique search.

By definition, a match or subgraph is considered unique if it differs from all other subgraphs found previously by at least one atom or bond. Additionally, it is also considered unique if the pattern subgraph is mapped to a different part of the target.

Figure 18.2 shows an example in which two matches are identified using the unique search method. Even though the two obtained subgraphs are identical, they represent different mappings between the pattern and the target, therefore they are both considered unique. Using a non-unique search would result in four matches, since the phenol can flip, yielding two additional matches.

Figure 18.2: Example for unique maximum common substructures

The search space of a maximum common subgraph determination can be restricted by constraining pairs of atoms or bonds to be mapped onto one another in all subgraph solutions. This is done using the OEMCSSearch.AddConstraint method. Failure to satisfy atom or bond pairwise constraints will prevent any subgraph solutions from being identified. Constraints are considered satisfied in subgraphs which do not contain any constrained atoms or bonds in either the pattern or target molecules. The AddConstraint method returns true if a constraint is added successfully. If the pattern atom or bond in the OEMatchPair does not exist as part of the query molecule created in the initialization of the OEMCSSearch object, then AddConstraint will return false. Multiple calls to AddConstraint using the same pattern atom or bond will cause previously stored constraints to be overwritten, as constraints are mutually exclusive: it is impossible to satisfy multiple simultaneous constraints for a single pattern atom or bond.

A read-only reference to the query molecule (OEQMol) contained in an instance of OEMCSSearch can be obtained with the GetPattern method. Note that if the OEMCSSearch was constructed with an OEMolBase, the returned OEQMol is a separate object. Const OEQMolBase methods can be used to interrogate the returned OEQMolBase reference.

The SetMaxMatches method alters the maximum number of maximum common subgraph matches that will be returned by the OEMCSSearch.Match method. The search for maximum common substructures will not terminate immediately upon reaching this limit. The size of the maximum common subgraph cannot be known in advance unless a match covers all atoms and bonds of at least one of the graphs being compared, so the limit on the number of subgraphs may be reached with a subgraph smaller than the maximum. In such a case the search continues for larger subgraphs until the search is exhausted. OEMCSSearch.Match will return the first N maximum common subgraphs, where N is less than or equal to the maximum match limit. The default limit set upon construction of an OEMCSSearch instance is 1024 matches.

The SetMinAtoms method sets the minimum number of atoms required of a subgraph match to be returned by an MCS search.
For example, changing the argument of SetMinAtoms in line 18 of Listing 18.3 to 11 would result in no solution, since the largest common substructure contains only 10 atoms (see Figure 18.1). A single atom can be a perfectly valid maximum common subgraph; however, for many applications such a small subgraph may not be considered useful. Setting the minimum number of atoms to a useful size prevents unproductive subgraph matches from being returned by the OEMCSSearch.Match method. The default minimum number of atoms set upon construction of an OEMCSSearch instance is one.

18.2.1 Exhaustive and approximate MCSS

The maximum common substructure search can be performed in two different modes: a very fast method (OEMCSType_Approximate) or a more comprehensive method (OEMCSType_Exhaustive). The type of the OEMCSSearch can be set at initialization. The default is OEMCSType_Approximate.


The approximate method is based on traversing pre-defined paths of the query structure and trying to map the visited query atoms onto target atoms. Because these pre-defined paths represent only a fraction of all possible paths of a compound, the approximate method is not guaranteed to find the largest common substructure, or all common substructures. Significant differences between the matches detected by the two methods can arise when the query or target structure contains complex ring systems (fused or bridged) or stereo centers. However, comparing the two methods for thousands of structures revealed that these cases are rare, and the approximate method provides a good trade-off between identifying MCS matches accurately and doing so 3 to 6 times faster than the exhaustive method.

Figure 18.3 and Figure 18.4 show an example where the substructure identified by the approximate method is smaller by one atom than the solution identified by the exhaustive method.

Figure 18.3: Example for maximum common substructure identified by the approximate method

Using the approximate MCS is recommended if the speed of the search is crucial.

18.2.2 MCS scoring functions: OEMCSFunc

OEMCSFunc is an abstract base class that defines the API used for scoring subgraph matches. The scores generated by implementations of OEMCSFunc influence the sorting and retention of maximum common subgraph matches generated by the OEMCSSearch class.

It is important to mention that using different scoring functions does not alter the way the search space is traversed to identify common substructures. It affects only how these identified substructures are evaluated.

Four implementations of the OEMCSFunc class are available in OEChem:

1. OEMCSMaxAtoms (See example in Figure 18.5)

The OEMCSMaxAtoms class is designed to order maximum common substructure matches by the maximum number of atoms included in the graph match.
If two matches have the same number of atoms, then the tie is split based on the number of bonds contained in the match.

Figure 18.4: Example for maximum common substructure identified by the exhaustive method


Scoring function: number of mapped atoms + number of mapped bonds/100

Figure 18.5: Retrieved matches using OEMCSMaxAtoms as scoring function

2. OEMCSMaxBonds (See example in Figure 18.6)

The OEMCSMaxBonds class is designed to order maximum common substructure matches by the maximum number of bonds included in the graph match. If two matches have the same number of bonds, then the tie is split based on the number of atoms contained in the match.

Scoring function: number of mapped bonds + number of mapped atoms/100

3. OEMCSMaxAtomsCompleteCycles (See example in Figure 18.7)

The OEMCSMaxAtomsCompleteCycles class is the same as OEMCSMaxAtoms with the addition of penalizing cyclic query bonds that are not mapped to any target bonds, thereby giving priority to matches which contain complete cycles common to both the pattern and the target structure.

Scoring function: number of mapped atoms + number of mapped bonds/100 - penalty × number of unmapped cyclic query bonds

The default penalty for each unmapped cyclic query bond is 1.0.

4. OEMCSMaxBondsCompleteCycles (See example in Figure 18.8)

The OEMCSMaxBondsCompleteCycles class is the same as OEMCSMaxBonds with the addition of penalizing cyclic query bonds that are not mapped to any target bonds, thereby giving priority to matches which contain complete cycles common to both the pattern and the target structure.

Scoring function: number of mapped bonds + number of mapped atoms/100 - penalty × number of unmapped cyclic query bonds

The default penalty for each unmapped cyclic query bond is 1.0.

The default scoring function, OEMCSMaxAtoms, can be changed using the SetMCSFunc method of the OEMCSSearch class.

Matches with the highest score are not the only ones retained: matches with scores higher than the best score rounded down to an integer are kept as well. In the example shown in Figure 18.6, three common substructures are detected using the OEMCSMaxBonds scoring function.
The first two matches are scored 5.06, since they are composed of 5 mapped bonds and 6 mapped atoms. There is only one other match which scored higher than 5.0: the third retained match, with a score of 5.05.
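The scoring formulas and the retention rule above are simple enough to reproduce in plain Python. The sketch below is not the OEMCSFunc API: the function names are hypothetical, the penalty argument stands in for the CompleteCycles variants, and the fourth (atoms, bonds) pair is made up to show a match that falls below the threshold.

```python
import math


def max_atoms_score(atoms, bonds, unmapped_cyclic=0, penalty=1.0):
    """Atoms dominate, bonds break ties, unmapped ring bonds are penalised."""
    return atoms + bonds / 100.0 - penalty * unmapped_cyclic


def max_bonds_score(atoms, bonds, unmapped_cyclic=0, penalty=1.0):
    """Bonds dominate, atoms break ties, unmapped ring bonds are penalised."""
    return bonds + atoms / 100.0 - penalty * unmapped_cyclic


def retained(scores):
    """Keep scores higher than the best score rounded down to an integer."""
    threshold = math.floor(max(scores))
    return [s for s in scores if s > threshold]


# (mapped atoms, mapped bonds): the first three mirror the 5.06, 5.06 and
# 5.05 scores quoted in the text; the last pair is invented for contrast.
matches = [(6, 5), (6, 5), (5, 5), (5, 4)]
scores = [max_bonds_score(a, b) for a, b in matches]
kept = retained(scores)
print(len(kept))
```

With unmapped_cyclic left at zero the penalty term vanishes, so the same two functions cover the plain and the CompleteCycles scoring variants.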


Figure 18.6: Retrieved matches using OEMCSMaxBonds as scoring function

Figure 18.7: Retrieved matches using OEMCSMaxAtomsCompleteCycles as scoring function

18.3 Clique Search

Clique detection is a bounded common structure search. It is a useful search method in cases where common substructure(s) other than the maximum common substructure(s) need to be identified. The following example demonstrates a clique search.


Figure 18.8: Retrieved match using OEMCSMaxBondsCompleteCycles as scoring function

    #!/usr/bin/env python

    from openeye.oechem import *
    import os, sys

    pattern = OEGraphMol()
    target = OEGraphMol()
    OEParseSmiles(pattern, "c1cc(O)c(O)cc1CCN")
    OEParseSmiles(target, "c1c(O)c(O)c(Cl)cc1CCCBr")
    # create clique search object
    cs = OECliqueSearch(pattern, OEExprOpts_DefaultAtoms, OEExprOpts_DefaultBonds)
    # ignore cliques that differ by more than 5 atoms from MCS
    cs.SetSaveRange(5)

    count = 1
    # loop over matches
    for match in cs.Match(target):
        sys.stdout.write("\nMatch %d :" % count)
        sys.stdout.write("\npattern atoms: ")
        for ma in match.GetAtoms():
            sys.stdout.write("%d " % ma.pattern.GetIdx())
        sys.stdout.write("\ntarget atoms: ")
        for ma in match.GetAtoms():
            sys.stdout.write("%d " % ma.target.GetIdx())
        count += 1

Listing 18.4: Clique search example

The same molecules and expression options are used as in Listing 18.3; however, an iterator over all identified cliques is returned by the OECliqueSearch.Match method. The OECliqueSearch.SetSaveRange method bounds the search. In this case, the cliques returned will differ by at most five nodes from the maximum common substructure. The atom correspondences for each of the returned cliques are printed in the example program.

18.4 OEExprOpts Namespace

Pattern matching in OEChem is always done using query molecules or query graphs. Non-query molecules, i.e. those that are derived directly from OEMolBase or OEMCMolBase, must be converted into a query molecule. Conversion into a query molecule is controlled using the values in the OEExprOpts namespace.
Expression options can either be specified in the constructor for an OEQMol, or using the convenience constructors in the pattern matching classes (OESubSearch, OEMCSSearch, and OECliqueSearch), which take expression options as arguments.


Figure 18.9 shows an example where a maximum common substructure search is performed using the OEExprOpts_DefaultAtoms and OEExprOpts_DefaultBonds options.

Figure 18.9: Example of maximum common substructure search with DefaultAtoms and DefaultBonds

The OEExprOpts_DefaultAtoms option means that two atoms are considered to be equivalent, i.e. they can be mapped to each other, if they have the same atomic number, aromaticity, and formal charge. The OEExprOpts_DefaultBonds option means that two bonds can be mapped to each other if they have the same bond order and aromaticity.

     1  #!/usr/bin/env python
     2
     3  from openeye.oechem import *
     4  import os, sys
     5
     6  pattern = OEGraphMol()
     7  target = OEGraphMol()
     8  OEParseSmiles(pattern, "c1(cc(nc2c1C(CCC2)Cl)CCl)O")
     9  OEParseSmiles(target, "c1(c2c(nc(n1)CF)COC=C2)N")
    10
    11  atomexpr = OEExprOpts_DefaultAtoms
    12  bondexpr = OEExprOpts_DefaultBonds
    13
    14  patternQ = OEQMol(pattern)
    15  # generate query with atom and bond expression options
    16  patternQ.BuildExpressions(atomexpr, bondexpr)
    17  mcss = OEMCSSearch(patternQ)
    18
    19  unique = True
    20  count = 1
    21  # loop over matches
    22  for match in mcss.Match(target, unique):
    23      sys.stdout.write("\nMatch %d :" % count)
    24      sys.stdout.write("\nNumber of matched atoms: %d " % match.NumAtoms())
    25      sys.stdout.write("\nNumber of matched bonds: %d " % match.NumBonds())
    26      # create match subgraph
    27      m = OEGraphMol()
    28      OESubsetMol(m, match, True)
    29      smi = OECreateCanSmiString(m)
    30      sys.stdout.write("\nmatch smiles = %s \n" % smi)
    31      count += 1

Listing 18.5: MCSS with atom and bond expression

The best way to understand how various atom and bond expressions influence the pattern matching is to change the atom (line 11) and bond expressions (line 12) in Listing 18.5 and compare the obtained matches.

After constructing the pattern molecule, the OEQMolBase.BuildExpressions method (line 16) defines the level of atom and bond matching between the pattern molecule and any target molecule.

By modifying the atom and bond expression options, very diverse pattern matching can be performed. Figures 18.10 to 18.14 show several examples where maximum common substructure searches are performed for the same query and target molecules, but with various atom and bond expression options.

Figure 18.10: ExactAtoms and DefaultBonds

Figure 18.11: DefaultAtoms|EqAromatic and DefaultBonds

In the first example in Figure 18.10, the OEExprOpts_ExactAtoms expression option is used to give a higher degree of discrimination of the equivalence of atoms, i.e. atoms can only be mapped to each other if they have the same degree, number of hydrogens, chirality, mass, and ring membership in addition to the requirements of the OEExprOpts_DefaultAtoms option.

Figures 18.11 to 18.14 show examples where the discrimination capability of OEExprOpts_DefaultAtoms is decreased by adding various modifiers. For example, using the OEExprOpts_EqAromatic modifier, atoms in any aromatic ring systems are considered equivalent. As a result, the pyridine and pyrimidine rings can be mapped to each other in Figure 18.11.

Similarly, OEExprOpts_EqHalogen (Figure 18.12) and OEExprOpts_EqONS (Figure 18.13) define equivalency between halogen atoms and oxygen-nitrogen-sulfur atoms, respectively. Using OEExprOpts_EqCAliphaticONS (Figure 18.14), an aliphatic query carbon atom is considered equivalent to any oxygen, nitrogen, or sulfur atom.

Figure 18.12: DefaultAtoms|EqHalogen and DefaultBonds

Figure 18.13: DefaultAtoms|EqONS and DefaultBonds

Figure 18.14: DefaultAtoms|EqCAliphaticONS and DefaultBonds

Similar modifiers exist for altering bond equivalency. Figure 18.15 shows an example where single and double bonds are considered identical when the OEExprOpts_EqSingleDouble modifier is utilized.

The last example in Figure 18.16 represents a very unrestrained search, where both the atom and bond expression options have weak discrimination power.

Even though only maximum common substructure search examples are presented here, atom and bond expression options can be similarly used with substructure searches or clique detections. For a full description of expression options and their usage please refer to the OEExprOpts namespace section in the OEChem namespaces of the OEChem C++ API document.

Figure 18.15: DefaultAtoms and DefaultBonds|EqSingleDouble


80 Chapter 18. Pattern MatchingFigure 18.16: DefaultAtoms|EqAromatic|EqCAliphaticONS|EqHalogen|EqONS andDefaultBonds|EqSingleDouble


CHAPTER NINETEEN

Coordinate Handling

19.1 Getting and Setting Coordinates of Atoms and Molecules

All molecules that conform to the OEMolBase API have a set of methods for getting and setting coordinates. In Python, there are several versions of each, used depending on the level of access desired and the performance required.

Member functions for getting coordinates:
• GetCoords() returns dictionary
• GetCoords(atom) returns (x,y,z) tuple
• GetCoords(OEDoubleArray)
• GetCoords(OEAtomBase, OEDoubleArray)

Member functions for setting coordinates:
• SetCoords( dictionary )
• SetCoords( OEAtomBase, (x,y,z) )
• SetCoords(OEDoubleArray)
• SetCoords(OEAtomBase, OEDoubleArray)

The first two methods in each section use Python data structures for output and input. This has a potential performance cost, as the structures are created and populated in the Get methods and then converted to C data structures in the Set methods. However, for many applications the cost is small and more than offset by the gain in using native data structures.

Here is a simple example, looping over the atoms in a molecule and printing each atom's coordinates:

    ifs = oemolistream('drugs.sdf')
    mol = OEGraphMol()
    OEReadMolecule(ifs, mol)
    for atom in mol.GetAtoms():
        print atom.GetIdx(), mol.GetCoords(atom)


If we want to format the output a bit better, we can send the tuple returned from GetCoords directly to a string interpolation operator:

    ifs = oemolistream('drugs.sdf')
    mol = OEGraphMol()
    OEReadMolecule(ifs, mol)
    for atom in mol.GetAtoms():
        print atom.GetIdx(),
        print 'x = %6.3f y = %6.3f z = %6.3f' % mol.GetCoords(atom)

To get the coordinates of the entire molecule in one call, we can use the following example. Note that the return from GetCoords() is a dictionary of triples indexed by atom index as returned from atom.GetIdx().

    ifs = oemolistream('drugs.sdf')
    mol = OEGraphMol()
    OEReadMolecule(ifs, mol)
    coords = mol.GetCoords()
    for atom in mol.GetAtoms():
        print atom.GetIdx(),
        print 'x = %6.3f y = %6.3f z = %6.3f' % coords[atom.GetIdx()]

Setting of coordinates can be done in a completely analogous way using the corresponding SetCoords methods. To zero all the coordinates on an atom-by-atom basis:

    ifs = oemolistream('drugs.sdf')
    mol = OEGraphMol()
    OEReadMolecule(ifs, mol)
    for atom in mol.GetAtoms():
        print 'x = %6.3f y = %6.3f z = %6.3f' % mol.GetCoords(atom)
        mol.SetCoords(atom, (0.0, 0.0, 0.0))
        print 'x = %6.3f y = %6.3f z = %6.3f' % mol.GetCoords(atom)

or you can set them all at once with a dictionary:

    ifs = oemolistream('drugs.sdf')
    mol = OEGraphMol()
    OEReadMolecule(ifs, mol)
    # create an empty dictionary
    coords = {}
    # loop over atoms, adding entry to dictionary
    for atom in mol.GetAtoms():
        i = atom.GetIdx()
        coords[i] = (0.0, 0.0, 0.0)
    # set coords
    mol.SetCoords(coords)
    # loop again to verify it worked
    for atom in mol.GetAtoms():
        print mol.GetCoords(atom)


19.1.1 C Array Wrappers

While the above methods may be all the beginning user needs, sometimes a more direct approach is needed for speed and memory savings. In C++, arrays of floating point numbers are passed between routines as pointers to raw memory locations. While this procedure can lead to crashes and errors from accessing memory outside the allocated space, it maximizes performance since no extra copies of the values have to be made and passed around.

While it is possible to create a "pointer to float" or "pointer to double" in Python, it is opaque to the rest of the Python code and can only be passed to other C routines. As a compromise, the PyOEChem wrappers introduce two classes that are very thin wrappers around C-style arrays. OEFloatArray wraps a float* pointer of a given size and OEDoubleArray wraps a double* pointer of a given size. These classes can be passed into any OEChem method that takes a float pointer or double pointer, resulting in much less overhead than the methods above, yet they provide a __len__() method so that the user can determine their size. They also provide __getitem__ and __setitem__ access with bounds checking, so that members can be accessed like members of a Python list, but an exception will be thrown if the user tries to access outside the size of the array.

While not used for coordinate handling, Python-OEChem also creates OEIntArray for wrapping a C int*, OEUIntArray for wrapping a C unsigned int* and OEUCharArray for wrapping a C unsigned char*.

Since floating point numbers in Python are usually stored as doubles, all the methods that follow are designed to use an OEDoubleArray.

To get molecule or atom coordinates into an OEDoubleArray, you must first create an instance of the correct size. For an atom's coordinates you need an array of size 3:

    xyz = OEDoubleArray(3)
    mol.GetCoords(atom, xyz)
    print 'x= ', xyz[0], 'y= ', xyz[1], 'z= ', xyz[2]

To get the coordinates of the entire molecule, create an OEDoubleArray of 3 times the return from GetMaxAtomIdx(). Then, to access the coordinates of a specific atom, index into the array using atom.GetIdx(). The next example gets all the coordinates into a single array and then prints them out, atom by atom.

    coords = OEDoubleArray(3*mol.GetMaxAtomIdx())
    mol.GetCoords(coords)
    for atom in mol.GetAtoms():
        idx = 3*atom.GetIdx()
        print ('i= %3d x= %8.4f y= %8.4f z= %8.4f' %
               (atom.GetIdx(), coords[idx], coords[idx+1], coords[idx+2]))

Setting coordinates using OEDoubleArrays is performed in a completely analogous fashion. For a single atom, create an OEDoubleArray of size 3, fill it with values (it defaults to all zeroes) and then pass it to SetCoords. This array could also be the result of a previous call to GetCoords. For a whole molecule, you again need an array of size 3 times the return from GetMaxAtomIdx().

While these molecule member functions provide access to molecule coordinates, the next few sections describe various methods to manipulate these coordinates, either inside OEDoubleArrays directly (low-level routines) or by operations directly on the coordinates inside the molecule object.
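The flat layout used for whole-molecule coordinate arrays (three consecutive values per atom, located at 3 times the atom index) can be tried out without OEChem. The following is a minimal sketch in which a plain Python list stands in for an OEDoubleArray; the helper names make_coord_array, get_xyz and set_xyz are hypothetical and are not part of OEChem.

```python
# Plain-Python stand-in for the flat coordinate layout described above.
# A list of floats plays the role of an OEDoubleArray; no OEChem calls are made.

def make_coord_array(max_atom_idx):
    """Return a zero-filled flat array with room for three values per atom index."""
    return [0.0] * (3 * max_atom_idx)

def get_xyz(coords, atom_idx):
    """Read the (x, y, z) triple stored for atom_idx."""
    base = 3 * atom_idx
    return (coords[base], coords[base + 1], coords[base + 2])

def set_xyz(coords, atom_idx, xyz):
    """Overwrite the (x, y, z) triple stored for atom_idx."""
    base = 3 * atom_idx
    coords[base:base + 3] = [float(value) for value in xyz]

# Room for atom indices 0..3, as if GetMaxAtomIdx() had returned 4.
coords = make_coord_array(4)
set_xyz(coords, 2, (1.0, 2.5, -3.0))
print("atom 2: x= %8.4f y= %8.4f z= %8.4f" % get_xyz(coords, 2))
```

Sizing the array from the maximum atom index rather than the atom count means that gaps in the index sequence simply leave unused slots, so every valid atom index still maps to its own triple.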


19.2 Coordinate Manipulation

19.2.1 High-level Transform Objects

19.2.2 Molecular Geometry Routines

In all the subsections below, mol is used to indicate an instance of a molecule (OEMol or OEGraphMol for example), conf is used to indicate an instance of OEConfBase and atom is used to indicate an instance of an OEAtomBase. rmat is a rotation matrix (OEDoubleArray(9)) and trans is a translation vector (OEDoubleArray(3)).

OEGetDistance
• OEGetDistance(mol, atom1, atom2)
• OEGetDistance(mol1, atom1, mol2, atom2)
• OEGetDistance(conf, atom1, atom2)
• OEGetDistance(conf1, atom1, conf2, atom2)

OEGetDistance2
• OEGetDistance2(mol, atom1, atom2)
• OEGetDistance2(mol1, atom1, mol2, atom2)
• OEGetDistance2(conf, atom1, atom2)
• OEGetDistance2(conf1, atom1, conf2, atom2)

OEGetAngle
• OEGetAngle(mol, atom1, atom2, atom3)
• OEGetAngle(mol1, atom1, mol2, atom2, mol3, atom3)
• OEGetAngle(conf, atom1, atom2, atom3)
• OEGetAngle(conf1, atom1, conf2, atom2, conf3, atom3)

OEGetTorsion
• OEGetTorsion(mol, atom1, atom2, atom3, atom4)
• OEGetTorsion(conf, atom1, atom2, atom3, atom4)

OEGetAbsTorsion
• OEGetAbsTorsion(mol, atom1, atom2, atom3, atom4)
• OEGetAbsTorsion(conf, atom1, atom2, atom3, atom4)

OESetTorsion
• OESetTorsion(mol, atom1, atom2, atom3, atom4, angle)
• OESetTorsion(conf, atom1, atom2, atom3, atom4, angle)

OETranslate
• OETranslate(mol, trans)
• OETranslate(conf, trans)
• OETranslate(mcmol, trans)

OERotate
• OERotate(mol, rmat)
• OERotate(conf, rmat)
• OERotate(mcmol, rmat)

OEEulerRotate
• OEEulerRotate(mol, angles)
• OEEulerRotate(conf, angles)
• OEEulerRotate(mcmol, angles)

OECenter
• OECenter(mol, trans=None)
• OECenter(conf, trans=None)
• OECenter(mcmol)

OERMSD
• OERMSD(mol1, mol2, automorphflag=True, heavyOnly=True, overlayflag=False, rmat=None, trans=None)
• OERMSD(mol1, mol2, OEMatchBase, overlayflag=False, rmat=None, trans=None)
• OERMSD(OEDoubleArray1, OEDoubleArray2, overlayflag=False, rmat=None, trans=None)

OEGetTorsions

OECopyCoords

19.2.3 Low-level Geometry Routines

Not yet exposed in Python.
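For readers who want to see what the distance, squared-distance, angle and torsion quantities of Section 19.2.2 look like as arithmetic, here is a minimal plain-Python sketch operating on (x, y, z) tuples such as those returned by GetCoords(atom). It does not call OEChem, and its conventions are assumptions of the sketch: angles are in radians and the torsion is signed using the common atan2 formulation. The units and sign conventions of the OEChem functions themselves should be checked against the API documentation.

```python
import math

def _sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return tuple(p - q for p, q in zip(a, b))

def _dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(p * q for p, q in zip(a, b))

def _cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def distance2(a, b):
    """Squared distance; avoids the square root when only comparisons are needed."""
    d = _sub(a, b)
    return _dot(d, d)

def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(distance2(a, b))

def angle(a, b, c):
    """Angle a-b-c at the middle point b, in radians."""
    u = _sub(a, b)
    v = _sub(c, b)
    cosine = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))

def torsion(a, b, c, d):
    """Signed dihedral angle a-b-c-d about the b-c axis, in radians."""
    b1 = _sub(b, a)
    b2 = _sub(c, b)
    b3 = _sub(d, c)
    n1 = _cross(b1, b2)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)

# Four points forming a right-angle bend followed by a quarter-turn twist.
p1, p2, p3, p4 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)
print("distance = %.4f" % distance(p1, p3))
print("angle    = %.4f rad" % angle(p1, p2, p3))
print("torsion  = %.4f rad" % torsion(p1, p2, p3, p4))
```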


CHAPTER TWENTY

Logging and Error Handling

20.1 OEErrorHandler

OEChem provides the OEErrorHandler class for generating error messages and warnings. OEErrorHandler provides the following main methods:

• SetOutputStream(stream)
  By default, output from an OEErrorHandler instance will go to stderr. However, there are times when the user may want to use stdout instead or to log directly to a file. To use stdout, call SetOutputStream(cvar.oeout) and for stderr use SetOutputStream(cvar.oeerr). To use a file instead, create an instance of an oeofstream and then call SetOutputStream with that.
• Info(message)
  Used for purely informational messages.
• Fatal(message)
  Used to log a fatal error and die. The program will exit after logging the message.
• Error(message)
  Used to log an error.
• Warning(message)
  Used to log a warning.
• Usage(message)
  Many programs use a command-line switch like "-h" or "-help" to show the user a list of command-line options. Use Usage(message) to send the message to the output stream and then exit the program.

A new OEErrorHandler class instance can be created and then the output can be set at runtime to any of the available output streams, including stdout (cvar.oeout), stderr (cvar.oeerr), a file, or a string. All of this functionality is available using built-in Python methods, but programmers wanting to write code for possible future porting to C++ may want to use this method of printing messages to make the task of porting more straightforward.

Here is an example of creating an OEErrorHandler and using it to write to a log file or to the terminal, depending on whether a filename is provided on the command line.

    #!/usr/bin/env python
    import os, sys
    from openeye.oechem import *

    Log = OEErrorHandler()

    # log to the named file if one was given, otherwise to stderr
    if len(sys.argv) == 2:
        ofs = oeofstream(sys.argv[1])
    else:
        ofs = cvar.oeerr

    Log.SetOutputStream(ofs)
    Log.Info("Here is an information message")
    Log.Warning("Here is a warning")
    Log.Fatal("Here is a fatal message")
    # program dies after Fatal above so this doesn't get called
    Log.Info("Shouldn't see this one")
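The remark that the same functionality is available through built-in Python facilities can be illustrated with the standard logging module. The sketch below is not OEErrorHandler and makes no OEChem calls; the function names make_logger and fatal, the logger name and the message format are all assumptions chosen for illustration. It mirrors the pattern above: pick a destination at runtime, emit informational and warning messages, and exit after a fatal one.

```python
import io
import logging
import sys

def make_logger(stream=None, filename=None):
    """Return a logger writing to filename if given, else to stream (stderr by default)."""
    logger = logging.getLogger("oe_sketch")
    logger.setLevel(logging.INFO)
    # Drop handlers from any earlier call so the destination can be switched at runtime.
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)
    if filename is not None:
        handler = logging.FileHandler(filename)
    else:
        handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger

def fatal(logger, message):
    """Log at CRITICAL level, then exit with status 1."""
    logger.critical(message)
    sys.exit(1)

# Demonstration: capture output in memory instead of writing to the terminal.
buffer = io.StringIO()
log = make_logger(stream=buffer)
log.info("Here is an information message")
log.warning("Here is a warning")
print(buffer.getvalue())
```

Calling fatal(log, "Here is a fatal message") would log the message and then raise SystemExit, so any statements after it are not reached.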


CHAPTER TWENTY-ONE

Periodic Table Functions

21.1 Obtaining the Atomic Symbol of an Atom/Element

To simplify the task of dealing with the elements of the periodic table, OEChem contains several functions to obtain useful properties of the elements.

A common task is to obtain or display the atomic symbol of an atom represented by an OEAtomBase. To save space and to avoid redundancy and consistency issues, the OEAtomBase class contains only an unsigned integer representing the atom's atomic number. This value may be obtained using the OEAtomBase method GetAtomicNum. It can be converted into an atomic symbol using the OEChem function OEGetAtomicSymbol.

    # ch20_1.py
    from openeye.oechem import *

    symb = OEGetAtomicSymbol(OEElemNo_C)
    print "The atomic symbol for carbon is", symb

The example above uses the integer constant OEElemNo_C from the OEElemNo namespace. As a convenience, this namespace provides the atomic numbers of the 109 elements as constants named by their symbols.

OEGetAtomicSymbol returns a string. For atomic number zero and for values greater than or equal to OEElemNo_MAXELEM (currently 110), OEGetAtomicSymbol returns an empty string. For all other (valid) values, the returned string contains one or two characters. The first character is always upper case, and the second character, if it exists, is lower case.

21.2 Obtaining the Atomic Number from an Atomic Symbol

The inverse of OEGetAtomicSymbol, i.e. obtaining the atomic number from an atomic symbol, is performed by OEChem's OEGetAtomicNum function. This function takes a string argument and returns an integer representing the atomic number. All the string values returned by OEGetAtomicSymbol are legitimate inputs. Empty symbols (of length zero) or symbols longer than two characters return the value zero. The one or two characters of the symbol are treated as case insensitive (i.e. the first character is converted to upper case and the second, if it exists, is converted to lower case for matching). The symbols "D" and "T" (representing deuterium and tritium) return the value one (for hydrogen). If the atomic symbol isn't recognized, a value of zero is returned. OEGetAtomicNum always returns a value less than OEElemNo_MAXELEM.
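The lookup rules just described (case normalization, "D" and "T" mapping to hydrogen, and zero or an empty string for anything unrecognized) can be sketched in plain Python. The table below is a deliberately tiny, hypothetical subset of the periodic table, and the function names are illustrative only; this is not how OEGetAtomicNum or OEGetAtomicSymbol are implemented.

```python
# Hypothetical, deliberately tiny symbol table; a real table would cover every element.
_SYMBOL_TO_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "Cl": 17, "Br": 35}
_NUMBER_TO_SYMBOL = dict((num, sym) for sym, num in _SYMBOL_TO_NUMBER.items())

def atomic_num_from_symbol(symbol):
    """Return the atomic number for symbol, or 0 if it is not recognized."""
    # Empty symbols and symbols longer than two characters are rejected outright.
    if len(symbol) == 0 or len(symbol) > 2:
        return 0
    # Normalize case: first character upper, second (if any) lower.
    normalized = symbol[0].upper() + symbol[1:].lower()
    # Deuterium and tritium are isotopes of hydrogen.
    if normalized in ("D", "T"):
        return 1
    return _SYMBOL_TO_NUMBER.get(normalized, 0)

def symbol_from_atomic_num(number):
    """Return the symbol for an atomic number, or an empty string if out of range."""
    return _NUMBER_TO_SYMBOL.get(number, "")

for text in ("cl", "BR", "D", "", "Xyz"):
    print("%r -> %d" % (text, atomic_num_from_symbol(text)))
```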


21.3 Properties of the Elements

OEChem provides several functions for obtaining properties of the elements, all of which take an integer argument representing the element's atomic number.

21.3.1 OEGetAverageWeight

This function returns a floating point value representing the average atomic weight.

21.3.2 OEGetDefaultMass

OEGetDefaultMass returns an unsigned integer representing the most common (or default) isotope for an element. For elements such as bromine that have two or more almost equally common isotopes, OEChem uses the same default mass as MDL. This value is used when converting between OEChem's internal mass value, OEAtomBase.GetIsotope, and MDL's mass differences. For atomic number zero and for values greater than or equal to OEElemNo_MAXELEM, this function returns zero. Remember that in OEChem an isotopic mass of zero represents a composition of isotopes based upon their natural abundance (with average isotopic mass), but an explicit non-zero mass represents a single isotope (with monoisotopic mass).

21.3.3 OEGetCovalentRadius

OEGetCovalentRadius returns the covalent radius of an element, as parameterized by the Cambridge Crystallographic Database (CCD). Currently, this table contains values for the organic subset of elements, with inorganic and unparameterized elements returning the value 0.0.

21.4 Handling Isotopes

The OEChem library also provides two functions for dealing with specific isotopes. Both functions take an integer representing the isotope's atomic number (number of protons) and an integer representing the isotope's mass (number of protons plus neutrons).

21.4.1 OEIsCommonIsotope

This predicate (a function returning a bool) determines whether the combination of atomic number and mass is reasonable. Obviously the mass must be greater than or equal to the atomic number. Additionally, this function tabulates the complete list of isotopes accepted as having been observed in high energy physics experiments in particle accelerators. A mass of zero always returns false.


21.4.2 OEGetIsotopicWeight

This function returns a floating point value representing the atomic weight of the specified isotope. All values for which OEIsCommonIsotope returns true are encoded. The weight of a neutron can be determined by using atomic number zero and mass one. Whenever OEIsCommonIsotope returns false, this function returns 0.0.

21.5 Calculating Molecular Weight of a Compound

The following example demonstrates how to use OEChem's periodic table functions to perform the common task of determining the molecular weight of a compound. Average molecular weight is commonly used in filtering (Lipinski's Rules) and as a descriptor in QSAR. The use of inaccurate values for molecular weight in these applications may help explain their limited success.

    # ch20-2.py
    from openeye.oechem import *

    def CalculateMolecularWeight(mol):
        result = 0.0
        for atom in mol.GetAtoms():
            elem = atom.GetAtomicNum()
            mass = atom.GetIsotope()
            # an explicit isotope uses its own weight, otherwise the average
            if (elem != 0 and mass != 0):
                result += OEGetIsotopicWeight(elem, mass)
            else:
                result += OEGetAverageWeight(elem)
        return result

    ims = oemolistream("drugs.sdf")
    for mol in ims.GetOEMols():
        print mol.GetTitle(), " mw = ", CalculateMolecularWeight(mol)
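To see the effect of the isotope test in CalculateMolecularWeight without an OEChem installation, the same branch can be exercised on hand-written data. In the sketch below, atoms are (atomic number, isotope) pairs and the two small tables hold standard atomic weights rounded to a few decimals; the tables, the function name molecular_weight and the data layout are assumptions of the sketch and cover only the few entries needed here.

```python
# Rounded standard values for a handful of entries; illustrative subset only.
AVERAGE_WEIGHT = {1: 1.008, 6: 12.011, 8: 15.999}
ISOTOPIC_WEIGHT = {
    (1, 1): 1.007825, (1, 2): 2.014102,
    (6, 12): 12.0, (6, 13): 13.003355,
    (8, 16): 15.994915,
}

def molecular_weight(atoms):
    """Sum atomic weights; isotope 0 means natural abundance, as in the text above."""
    result = 0.0
    for elem, mass in atoms:
        if elem != 0 and mass != 0:
            # Explicit isotope: use the weight of that single isotope.
            result += ISOTOPIC_WEIGHT.get((elem, mass), 0.0)
        else:
            # No isotope specified: use the abundance-weighted average.
            result += AVERAGE_WEIGHT.get(elem, 0.0)
    return result

methane = [(6, 0)] + [(1, 0)] * 4        # CH4, natural abundance
labelled = [(6, 13)] + [(1, 0)] * 4      # 13C-labelled methane

print("methane      mw = %.4f" % molecular_weight(methane))
print("13C-methane  mw = %.4f" % molecular_weight(labelled))
```

Labelling a single carbon shifts the result by roughly one mass unit, which is the kind of difference that is lost if isotope information is ignored.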


CHAPTER TWENTY-TWO

Predicate Functions

22.1 Callbacks

Simply stated, predicate functors (boolean functors) are functions which return "true" or "false". In OEChem, these functors are often passed into another function and are then called from inside that second function. This is the concept of a "callback": the second function provides the argument and "calls back" to the functor which was passed into it. We've already seen examples of this for the generator methods like GetAtoms and GetBonds. However, you can write your own functions and pass predicates as function arguments.

    #!/usr/bin/env python
    # ch21-1.py

    from openeye.oechem import *

    def Count(fcn, mol):
        count = 0
        for atom in mol.GetAtoms():
            if fcn(atom) == 1:
                count += 1
        return count

    mol = OEGraphMol()
    OEParseSmiles(mol, "c1c(O)c(O)c(Cl)cc1CCCBr")

    print "Number of Oxygens = ", Count(IsOxygen(), mol)
    print "Number of Carbons = ", Count(HasAtomicNum(6), mol)
    print "Number of Halides = ", Count(IsHalide(), mol)

Listing 22.1: Using a functor callback

In the example above, the function Count loops over the atoms and performs a callback to the predicate functor fcn for each atom. If the predicate returns true, a counter is incremented. The main program passes three of OEChem's predefined atom predicates to the Count function, allowing the same function to calculate the number of atoms in the molecule which satisfy the functor passed to it.


22.2 Predefined OEChem Functors

There are many useful functors already defined in OEChem. These can be used by programmers with little or no understanding of the details of how functors work. A programmer can simply pass them to one of the many OEChem functions which take predicates as arguments, with the expectation that they will behave as described in the API manual.

The predefined functors in OEChem include:

OEAtomBase Functors
    OEHasAtomIdx(int)
    OEHasAtomName(string)
    OEHasMapIdx(int=0)
    OEHasAtomicNum(int)
    OEIsRGroup(int=0)
    OENthAtom(int, int)
    OEMatchFunc(string)
    OEAtomIsInRing
    OEIsChiralAtom
    OEHasStereoSpecified
    OEHasAlphaBetaUnsat
    OEAtomIsInResidue
    OEIsHydrogen
    OEIsHeavy
    OEIsPolar
    OEIsPolarHydrogen
    OEIsCarbon
    OEIsNitrogen
    OEIsOxygen
    OEIsHalogen
    OEIsSulfur
    OEIsPhosphorus
    OEIsAromaticAtom

Residue data Functors in OEAtomBases
    OEHasChainID(string)
    OEHasResidueNumber(int)
    OEHasFragmentNumber(int)

OEBondBase Functors
    OEHasBondIdx(int)
    OEHasOrder(int)
    OEBondIsInRing
    OEIsRotor
    OEIsChiralBond
    OEHasBondStereoSpecified
    OEIsAromaticBond

OEConfBase Functors
    OEHasConfIdx(int)


22.3 Writing your own Functors in Python

Deriving new instances of OEUnaryFunctions and OEUnaryPredicates requires C++, but for many cases a special case of these has been instantiated for atoms and bonds. OEPyAtomPredicate and OEPyBondPredicate are special-case predicates that take a Python function as the single argument. In essence, we are creating a callback that itself holds a callback. This Python function must be defined in a certain way. First, it can only take a single argument: an atom for an OEPyAtomPredicate and a bond for an OEPyBondPredicate. Second, it must return 1 if the atom (or bond) satisfies the condition and 0 otherwise. Since it may sometimes be necessary to create predicates that hold state, a class instance method can be used. Just make sure that the method (to which a pointer will be stored inside the C++ predicate) stays in scope while the predicate is used.

The following example shows a user-defined functor which screens for atoms whose atomic mass is greater than 15.

    #!/usr/bin/env python
    # ch21-2.py

    from openeye.oechem import *

    def AtomWgtGT15(atom):
        if OEGetAverageWeight(atom.GetAtomicNum()) > 15:
            return 1
        return 0

    WeightGT15 = PyAtomPredicate(AtomWgtGT15)

    mol = OEGraphMol()
    OEParseSmiles(mol, "c1c(O)c(O)c(Cl)cc1CCCBr")
    OETriposAtomNames(mol)

    for atom in mol.GetAtoms(WeightGT15):
        print atom.GetName(), "has weight > 15."

Listing 22.2: User defined functors can be simple

22.4 Composition Functors in OEChem

Occasionally, one may want to use a logical operator to join two or more functors. While it is certainly possible to write a quick functor which wraps two or more functors, this is not necessary. The functors OEAndAtom, OEOrAtom and OENotAtom are already defined. They each have constructors which take the appropriate number of predicates as arguments and generate a single unary predicate. Similar logical predicates are defined for bonds, residues, etc.

The following example demonstrates use of the OEAndAtom and OENotAtom composition predicates with two of the predefined atom predicates.

    #!/usr/bin/env python
    # ch21-3.py

    from openeye.oechem import *

    def Count(fcn, mol):
        count = 0
        for atom in mol.GetAtoms():
            if fcn(atom) == 1:
                count += 1
        return count

    mol = OEGraphMol()
    OEParseSmiles(mol, "c1c(O)c(O)c(Cl)cc1CCCBr")

    print "Number of Aromatic Oxygens = ",
    print Count(OEAndAtom(IsOxygen(), IsAromaticAtom()), mol)

    print "Number of Non-Carbons = ",
    print Count(OENotAtom(HasAtomicNum(6)), mol)

Listing 22.3: Composition Functors in OEChem
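The idea behind composition functors, and behind predicates that hold state, can be shown in plain Python without OEChem. In the sketch below every name (HasAtomicNum as a local class, AndPred, OrPred, NotPred, count) is a stand-in written for illustration, not the OEChem class of similar name, and atoms are represented as hypothetical (atomic number, is_aromatic) tuples written out by hand for the SMILES used in the listings above.

```python
class HasAtomicNum(object):
    """Stateful predicate: remembers the atomic number to test for."""
    def __init__(self, number):
        self.number = number
    def __call__(self, atom):
        return atom[0] == self.number

def is_aromatic(atom):
    """Stateless predicate on the second field of the atom tuple."""
    return bool(atom[1])

class AndPred(object):
    """True when both wrapped predicates are true."""
    def __init__(self, first, second):
        self.first, self.second = first, second
    def __call__(self, item):
        return bool(self.first(item)) and bool(self.second(item))

class OrPred(object):
    """True when at least one wrapped predicate is true."""
    def __init__(self, first, second):
        self.first, self.second = first, second
    def __call__(self, item):
        return bool(self.first(item)) or bool(self.second(item))

class NotPred(object):
    """Inverts the wrapped predicate."""
    def __init__(self, inner):
        self.inner = inner
    def __call__(self, item):
        return not self.inner(item)

def count(pred, items):
    """Call back to pred for each item and count the matches."""
    return sum(1 for item in items if pred(item))

# Heavy atoms of c1c(O)c(O)c(Cl)cc1CCCBr as (atomic number, is_aromatic).
atoms = ([(6, True)] * 6 + [(8, False)] * 2 + [(17, False)]
         + [(6, False)] * 3 + [(35, False)])

print("oxygens          = %d" % count(HasAtomicNum(8), atoms))
print("aromatic oxygens = %d" % count(AndPred(HasAtomicNum(8), is_aromatic), atoms))
print("non-carbons      = %d" % count(NotPred(HasAtomicNum(6)), atoms))
```

Because each composed object is itself callable with a single argument, it can be passed anywhere a simple predicate is accepted, including into another composition.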


CHAPTER TWENTY-THREE

Molecular File Formats

In addition to the high-level file readers and writers, OEReadMolecule and OEWriteMolecule, that are used by molecule streams, OEChem also provides low-level access to the readers and writers for the major file formats. The following examples show the steps involved in each of the low-level reader and writer calls.

23.1 SMILES File Format

23.1.1 OEParseSmiles

    def MyReadSmilesMolecule(pyfile, mol):
        mol.Clear()
        # try each line in turn until one parses successfully
        for line in pyfile:
            if OEParseSmiles(mol, line):
                OEFindRingAtomsAndBonds(mol)
                OEAssignAromaticFlags(mol)
                return 1
            mol.Clear()
        return 0

23.1.2 OECreateCanSmiString

Note that the canonical SMILES generated by this function remains dependent on the state of the molecule, especially its aromaticity state. Thus, to generate a canonical SMILES suitable for purposes such as a database key, the programmer must ensure that the state of the molecule has been standardized. In particular, aromaticity should be perceived according to the preferred model. The SMILES canonicalization flag OESMILESFlag_Canonical refers specifically to canonical ordering of atoms. In contrast, the high-level output function OEWriteMolecule, when writing the canonical SMILES format (OEFormat::CAN), does invoke OEFindRingAtomsAndBonds and OEAssignAromaticFlags, and is equivalent to the following code.

    def MyWriteCanSmilesMolecule(pyfile, mol):
        OEFindRingAtomsAndBonds(mol)
        OEAssignAromaticFlags(mol)
        smiles = OECreateCanSmiString(mol)
        pyfile.write("%s %s\n" % (smiles, mol.GetTitle()))


23.2 MDL File Format (SD and Mol)

OEChem provides low-level file readers and writers to assist in reading and writing MDL mol and SD files.

23.2.1 OEReadMDLFile

    def MyReadMDLMolecule(ifs, mol):
        mol.Clear()
        if OEReadMDLFile(ifs, mol):
            OEFindRingAtomsAndBonds(mol)
            OEAssignAromaticFlags(mol)
            return 1
        mol.Clear()
        return 0

23.2.2 OEWriteMDLFile

    def MyWriteMDLMolecule(ofs, mol):
        OEClearAromaticFlags(mol)
        if not OEMDLHasParity(mol):
            OEMDLPerceiveParity(mol)
        OEWriteMDLFile(ofs, mol)
        # SD record terminator
        ofs.write("$$$$\n")

23.3 Sybyl Mol2 File Format

23.3.1 OEReadMol2File

    def MyReadMol2Molecule(ifs, mol):
        mol.Clear()
        if OEReadMol2File(ifs, mol):
            OEFindRingAtomsAndBonds(mol)
            OEAssignAromaticFlags(mol)
            return 1
        mol.Clear()
        return 0

23.3.2 OEWriteMol2File

    def MyWriteMol2Molecule(ofs, mol):
        OEFindRingAtomsAndBonds(mol)
        OEAssignAromaticFlags(mol, OEAroModelTripos)
        OETriposAtomTypeNames(mol)
        OETriposBondTypeNames(mol)
        OETriposAtomNames(mol)
        OEWriteMol2File(ofs, mol, False)


23.4 PDB File Format

23.4.1 OEReadPDBFile

    def MyReadPDBMolecule(ifs, mol):
        mol.Clear()
        if OEReadPDBFile(ifs, mol):
            OEDetermineConnectivity(mol)
            OEFindRingAtomsAndBonds(mol)
            OEPerceiveBondOrders(mol)
            OEAssignImplicitHydrogens(mol)
            OEAssignFormalCharges(mol)
            OEAssignAromaticFlags(mol)
            return 1
        mol.Clear()
        return 0

23.4.2 OEWritePDBFile

    def MyWritePDBMolecule(ofs, mol):
        if OEHasResidues(mol):
            OEPDBOrderAtoms(mol)
        else:
            OEPerceiveResidues(mol)
        if mol.GetDimension() < 3:
            # If no co-ordinates, write out bonds and bond orders.
            flags = OEPDBOFlag_ORDERS | OEPDBOFlag_BONDS
        else:
            flags = OEPDBOFlag_DEFAULT
        OEWritePDBFile(ofs, mol, flags)

23.5 MacroModel File Format

23.5.1 OEReadMacroModelFile

    def MyReadMacroModelMolecule(ifs, mol):
        mol.Clear()
        if OEReadMacroModelFile(ifs, mol):
            OEAssignFormalCharges(mol)
            OEFindRingAtomsAndBonds(mol)
            OEAssignAromaticFlags(mol)
            return 1
        mol.Clear()
        return 0

23.5.2 OEWriteMacroModelFile

    def MyWriteMacroModelMolecule(ofs, mol):
        if not OEHasResidues(mol):
            OEPerceiveResidues(mol)
        OEMacroModelAtomTypes(mol)
        OEWriteMacroModelFile(ofs, mol)

23.6 XYZ File Format

23.6.1 OEReadXYZFile

    def MyReadXYZMolecule(ifs, mol):
        mol.Clear()
        if OEReadXYZFile(ifs, mol):
            OEDetermineConnectivity(mol)
            OEFindRingAtomsAndBonds(mol)
            OEPerceiveBondOrders(mol)
            OEAssignImplicitHydrogens(mol)
            OEAssignFormalCharges(mol)
            OEAssignAromaticFlags(mol)
            return 1
        mol.Clear()
        return 0

23.6.2 OEWriteXYZFile

    def MyWriteXYZMolecule(ofs, mol):
        OEWriteXYZFile(ofs, mol)

23.7 FASTA Sequence File Format

23.7.1 OEReadFastaFile

    def MyReadFASTAMolecule(ifs, mol):
        mol.Clear()
        if OEReadFASTAFile(ifs, mol):
            OEFindRingAtomsAndBonds(mol)
            OEAssignAromaticFlags(mol)
            return 1
        mol.Clear()
        return 0

23.7.2 OEWriteFastaFile

    def MyWriteFASTAMolecule(ofs, mol):
        if not OEHasResidues(mol):
            OEPerceiveResidues(mol)
        OEWriteFASTAFile(ofs, mol)


CHAPTER TWENTY-FOUR

Miscellaneous Utilities

The OpenEye libraries also contain a number of utility classes and functions that, although unrelated to chemistry, are often useful for writing applications.

24.1 OEStopwatch

This is a simple class for timing scripts and programs. Simply create an instance of the OEStopwatch class. It has a Start() and a Stop() method. Elapsed() is used to output the time elapsed since the start. For example:

    sw = OEStopwatch()
    sw.Start()
    # ... do something that takes some time ...
    print "That took", sw.Elapsed(), "seconds."

One quirk of the OEStopwatch class is that, to work around operating system limitations, values below 2147 seconds (about 35 minutes) are based on a high-resolution CPU-usage timer (using "clock(3)"), while values above that are wall-clock times accurate to about a second (using "time(3)").

24.2 OEDots

This is a utility class that provides a dot-based progress bar display on the console, to show the status of scripts that iterate over large input files. The constructor takes three arguments, the third of which is optional.

    dots = OEDots(1000, 50, "molecules")

The first argument is how many count events make up each line of dots, and the second is how many counts each dot represents. The last argument is simply a text label, so that the text at the end of each line says what is being counted. The example above will print a dot (to stderr) every 50 events. After every 1000 events, it will end the line of dots with "1000 molecules processed", "2000 molecules processed", etc. If either of the numeric arguments is set to zero (0), there will be no output but the internal counter will still count. This is useful if you want to count the number of times through a loop but don't care about a status update.

In order to increment the counter, call the Update() method each time through the loop. If you loop in groups of more than one, you can call Update(n), where n is a positive integer.


After the end of the loop, call the Total() member function to write the total number of items counted to stderr. The following example shows how to use OEDots:

    from openeye.oechem import *
    import os, sys

    ifs = oemolistream()
    ifs.open("nci.smi")

    # create a dots instance
    dots = OEDots(10000, 500, "SMILES")

    # loop over the input file, call update for each molecule
    for mol in ifs.GetOEGraphMols():
        DoSomething(mol)
        dots.Update()

    # output a summary
    dots.Total()
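The counting behaviour described for OEDots (one dot every so many events, a labelled line ending every so many events, and silent counting when either number is zero) can be sketched in plain Python. The class below is a hypothetical stand-in named DotsSketch, not the OEDots implementation; in particular, the exact wording of the line endings and of the total is an assumption, and an optional stream argument is added purely so the output can be captured.

```python
import io
import sys

class DotsSketch(object):
    """Illustrative dot-based progress counter; not the OEDots class."""

    def __init__(self, per_line, per_dot, label="", stream=None):
        self.per_line = per_line      # events per line of dots
        self.per_dot = per_dot        # events represented by each dot
        self.label = label            # what is being counted
        self.stream = stream if stream is not None else sys.stderr
        self.count = 0

    def update(self, step=1):
        """Advance the counter by step events, writing dots as thresholds pass."""
        for _ in range(step):
            self.count += 1
            if self.per_line == 0 or self.per_dot == 0:
                continue              # silent mode: count but write nothing
            if self.count % self.per_dot == 0:
                self.stream.write(".")
            if self.count % self.per_line == 0:
                self.stream.write(" %d %s processed\n" % (self.count, self.label))

    def total(self):
        """Write a one-line summary of everything counted so far."""
        self.stream.write("Total %s = %d\n" % (self.label, self.count))

# Demonstration with small numbers and an in-memory stream.
captured = io.StringIO()
dots = DotsSketch(4, 2, "molecules", stream=captured)
for _ in range(5):
    dots.update()
dots.total()
print(captured.getvalue())
```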


CHAPTER TWENTY-FIVE

The SMILES Line Notation

This section provides a brief introduction to SMILES.

25.1 Daylight SMILES

Unfortunately, there are a number of ambiguities in the original paper describing the Daylight SMILES syntax that have led to different SMILES being accepted or rejected by independent SMILES parser implementations.

    SMILES      Daylight  Corina  Corina  Concord  COBRA  Synopsis  OEChem
                4.41      1.6     WWW     3.2.1    3.21A  4.0       1.5
    C1.C1       Y         Y       Y       N        N      Y         Y
    C%00CC%00   Y         Y       Y       N        N      N         Y
    C(C.C)C     Y         Y       Y       N        N      Y         Y
    C(C)1CC1    Y         N       N       N        Y      N         Y
    C(.C)       Y         Y       Y       N        N      Y         Y
    C()         Y         Y       N       Y        Y      Y         Y
    (CO)=O      N         N       N       N        N      Y         N
    (C)         N         N       N       N        N      Y         Y
    .C          N         N       N       Y        Y      N         Y
    C..C        N         Y       N       Y        Y      N         Y
    C.          N         Y       Y       Y        Y      Y         Y
    C=(O)C      N         Y       N       N        Y      N         N
    C((C))      N         Y       N       Y        N      Y         Y
    C.(C)       N         Y       N       Y        N      N         Y
    C1CC(=1)    N         Y       N       N        N      N         Y
    C1CC(1)     N         N       N       N        N      N         Y
    C(C.)       N         Y       N       N        N      N         Y
    C==C        N         Y       N       N        N      N         Y
    C(1CC1)     N         N       N       N        N      N         Y
    C(1)CC1     N         N       N       N        N      N         Y

The OEChem SMILES parser actually has two modes. The default is relaxed, which produces the results above and enables the SMILES extensions described in the next section. It also has a strict mode, which may be used for validating SMILES strings and is far less forgiving of dubious SMILES strings.


25.2 Extensions to Daylight SMILES

The OEChem SMILES parsers support several minor extensions to Daylight syntax. Each of these extensions and its motivation is listed below.

Quadruple Bond: In addition to "-", "=" and "#" for specifying single, double and triple bonds respectively, OEChem also supports "$" for specifying quadruple bonds. An example would be octachlorodirhenate(III), which is written as "[Re-](Cl)(Cl)(Cl)(Cl)$[Re-](Cl)(Cl)(Cl)Cl".

Unquoted and Additional Elements: In addition to the standard Daylight unquoted elements, B, C, N, O, F, P, S, Cl, Br and I, OEChem's SMILES readers also allow H, D and T to specify hydrogen, deuterium and tritium. Additionally, to support Syracuse SMILES, "CL" and "BR" are treated as "Cl" and "Br". The periodic table is also extended from 102 to 109 elements, e.g. [Sg] for seaborgium, with the addition of [D] and [T] representing [2H] and [3H] respectively. OEChem may support "Na", "Li" and "K" as unquoted elements to support Syracuse SMILES at some point in the future.

Aromatic Tellurium: In order to support OpenEye's aromaticity model, which allows tellurium to be aromatic, the SMILES parser has been extended to support "[te]", such as in tellurophene, "[te]1cccc1", which follows in the sequence furan ("o1cccc1"), thiophene ("s1cccc1") and selenophene ("[se]1cccc1").

Atom Maps in Molecules: Traditionally, SMILES atom maps, e.g. [Pb:1], are only ever used and specified in reaction molecules, [Pb:1]>>[Au:1]. However, OEChem extends this notion to allow atom maps to be used in discrete molecules. Hence, both [1*] and [*:1] may be used to mark significant sites in a molecule.

RGroup Attachment Points: As a shorthand for specifying templates for combinatorial libraries, and to support existing Cactus and JChem/Marvin usage, OEChem allows "[R2]" to be used as shorthand for [*:2]. For enquiring minds, the SMILES [R2:3] is interpreted as [*:3] or [R3], with the last specification taking priority.

External Bond Attachment Points: OEChem SMILES also allows specification of attachment points as external closures. The syntax is an ampersand followed by a ring closure. Hence the SMILES CC&1 is equivalent to the RGroup attachment SMILES CC[R1], which is equivalent to the atom mapped molecule CC[*:1]. As with ring closures, bond orders may be specified after the ampersand and before the closure index, C&=1, and two digit closures are indicated by a '%' prefix, e.g. C&%12 or C&=%12.

One major advantage of this notation is that the SMILES parser will fuse attachment points present in a SMILES string, in the same way as it fuses ring closures. Hence, C&1C&2.Cl&1.Br&2, when parsed, produces the molecule ClCCBr. This provides a convenient method of enumerating combinatorial libraries using string concatenation.
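The string-concatenation approach to library enumeration described above can be sketched in plain Python. This is only an illustration of how the SMILES strings might be assembled: the helper names `closure` and `enumerate_library` are hypothetical and not part of OEChem, and the resulting strings would still need to be parsed by a SMILES reader that supports external bond closures.

```python
from itertools import product


def closure(site):
    """Return the external closure suffix for a site, e.g. '&1' or '&%12'."""
    return "&%d" % site if site < 10 else "&%%%d" % site


def enumerate_library(scaffold, substituents_by_site):
    """Yield one dot-separated SMILES per combination of substituents.

    scaffold             -- SMILES already carrying external closures, e.g. 'C&1C&2'
    substituents_by_site -- maps a closure index to a list of fragment SMILES;
                            the closure is appended to the last atom of each fragment
    """
    sites = sorted(substituents_by_site)
    for choice in product(*(substituents_by_site[s] for s in sites)):
        fragments = [frag + closure(site) for site, frag in zip(sites, choice)]
        yield ".".join([scaffold] + fragments)


library = list(enumerate_library("C&1C&2", {1: ["Cl", "F"], 2: ["Br"]}))
# library[0] is the example from the text: 'C&1C&2.Cl&1.Br&2'
```

Because each fragment is just text, the enumeration itself needs no molecule objects; the fusing of matching closures happens only when a string is parsed.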


CHAPTER TWENTY-SIX

Biopolymer Residues

The OEChem toolkit also provides functionality for handling residue information, such as the amino acids of a protein or the bases of a nucleic acid.

26.1 Ontology and Schema Fragility

One of the major problems in computer science is the question of how best to represent or organize data, for example by a data structure or a database schema. Indeed, volumes have been written on data modeling, and numerous methodologies have been developed to codify this task: object-oriented design, relational design, object-relational design, and so on. Two measures by which these design styles can be evaluated are (i) how well they can present the same information in multiple views or organizations and (ii) the related problem of how well they can compensate for changes or modifications to their schema/organization.

These two challenges form the basis of "Schema Fragility". Different users or applications may require the same data to be organized differently for ease of processing, to reflect different models or levels of detail, or to provide different user communities with views customized for their perspective. This ability is critical not only to the "applicability" of a representation, but also to its "longevity". One of the most important aspects of modeling a "real-world" system is that both it and our knowledge of it are changing continually. A robust design is one that can easily adapt to such inevitable changes.

26.1.1 Biological Hierarchies

Consider, for example, the issues involved in modeling a hierarchy. In such organizations, records are linked together like a family tree, such that each record has only one owner, e.g. an order is owned by only one customer. Indeed, hierarchical structures were widely used in the first mainframe management systems in the early 1970s.

As a case study, we'll consider the classic hierarchy from biology, that of phylogeny. Most biologists are aware that all living organisms can be placed in a hierarchy of kingdom, genus and species. If it were truly as simple as this, it might be reasonable to organize the data as a three-level tree. The Biota bioinformatics database, for example, goes further with kingdom, phylum, class, order, family, genus and species as seven different tables. But even this is the tip of the iceberg, as the NCBI taxonomy database, as used by GenBank, records things as superkingdom, kingdom, subkingdom, superphylum, phylum, subphylum, superclass, class, subclass, infraclass, cohort, subcohort, superorder, order, suborder, infraorder, parvorder, superfamily, family, subfamily, tribe, subtribe, genus, subgenus, species group, species subgroup, species, subspecies, varietas and forma.


Hence if NCBI used a top-down hierarchical data organization, looping over all the organisms in a database would require only 30 nested loops. The problem is exacerbated by the fact that the other major bioinformatics databases use a different hierarchy, and that different subtrees and branches have different levels.

Clearly, modeling such a hierarchy explicitly (even without the ambiguity of organisms belonging to multiple leaves of the hierarchy) has serious limitations.

More relevant to OEChem is the related problem of how to represent biomolecules. Once again, a naive structural biologist could be forgiven for assuming it's a simple matter of organizing atoms into residues, and residues into chains, for a simple three-level hierarchy. Indeed, this was a fundamental mistake made by the immensely popular RasMol molecular graphics program, which had exactly such a three-level structure. In reality, the organization of a PDB file also requires multiple NMR models, crystal related symmetries, secondary and tertiary structural elements, folding domains, (active) sites, connected components, XPLOR segment IDs, distinctions between proteins, nucleic acids, ligands and solvent, the distinction between backbone and side-chain atoms, and categorization by ring system and cycle membership, heavy atoms and hydrogens. Finally, let's not forget the alternate conformation indicators for each atom or residue!

Notice that this hierarchy also fails to be a "strict" hierarchy. A single syntactic chain may be split into multiple connected components, and multiple PDB chains may be covalently bonded into a single connected component. TER records normally serve as chain terminators, but in several PDB files they occur with a single residue. Most chains are either ATOM or HETATM, but peptidic inhibitors and post-translationally modified proteins are mixtures of both. A single strand of a beta-sheet is always formed from a single chain, but a beta-sheet may be formed from strands from multiple chains.

26.1.2 Set Theory to the Rescue

It turns out that there are simple computer science solutions to these problems. Indeed, it was Codd's exposition of these principles for "relational" database systems that completely killed off use of hierarchical and network database management systems within a decade of their introduction.

The premise is that rather than encode a single fragile hierarchy explicitly, each leaf or record instead maintains its identity or position within the organization. This allows the representation of arbitrary sets and/or partitions of a set of records. All ligand atoms of a molecule are denoted by the fact that they have a ligand property that is true, rather than it being implicit from where the atom is stored (in the abstract sense of an access path). Of course, each record may possess more than one property, allowing it to exist simultaneously in more than one set. Hence, this representation is generic enough to handle arbitrary Venn diagrams. Strict hierarchies and trees are therefore just an emergent property, where some sets are strict subsets of others. This allows elements to be organized simultaneously in more than one hierarchy, and allows levels of the hierarchy to be elided or introduced.

The next realization is that once sets, or levels in a hierarchy, are represented by boolean properties or predicates, there's no need to have an explicit "name" or placeholder for a set. Instead, a set or partition can be defined/named by providing a representative member, and a binary predicate that determines whether another member/record is in the same set/partition as it. For example, to represent a protein chain, it is sufficient to specify an arbitrary atom in the appropriate chain, and provide a SameChain function.
Similarly, a residue can be specified by providing the exact same atom and a SameResidue function.

26.1.3 OEChem Examples

The above explanation should go some way to explaining OEChem's decision to attach biopolymer information to each atom, rather than have container classes for residues and chains (and presumably connected components, NMR models, etc.). The OEResidue class is therefore an additional set of fields that may be associated with an atom. It does not denote or prescribe an amino or nucleic acid but instead stores atom-specific data such as atom serial number, B-factor and occupancy, in addition to residue information, chain information, fragment information, NMR model information, etc.

The residue information associated with an atom can be set with the OEAtomSetResidue function, and is retrieved with the OEAtomGetResidue function. The PDB and Macromodel file format readers parse this information from the input file format. Additionally, OEChem allows residue information to be perceived directly from the connection table using the OEPerceiveResidues function.

For many algorithms processing biomolecules, it is convenient to maintain the atoms of the OEMolBase in sorted order, so that atoms in the same residue are grouped next to one another and residues in the same chain appear sequentially. This can be done conveniently in OEChem using the OEPDBOrderAtoms function. Note that OEPerceiveResidues calls OEPDBOrderAtoms automatically.

A common idiom is therefore the following code snippet:

    # All snippets in this section assume this import.
    from openeye.oechem import *

    def MyPrepareProtein(mol):
        if OEHasResidues(mol):
            # Residue data already present: only the atom order needs fixing.
            OEPDBOrderAtoms(mol)
        else:
            # Perceiving residues also orders the atoms.
            OEPerceiveResidues(mol)

As a teaching example, the following code demonstrates one way of reporting the number of different chains in an OEMolBase.

    def MyCountChains(mol):
        result = 0
        first = True
        prev = None
        for atom in mol.GetAtoms():
            res = OEAtomGetResidue(atom)
            chain = res.GetChainID()
            # A new chain starts whenever the chain identifier changes.
            if first or chain != prev:
                result += 1
                first = False
                prev = chain
        return result

A slightly improved version would be to use OEChem's OESameChain function.

    def MyCountChains2(mol):
        result = 0
        prev = None
        for atom in mol.GetAtoms():
            res = OEAtomGetResidue(atom)
            if prev is not None and OESameChain(res, prev):
                continue
            prev = res
            result += 1
        return result

Clearly, a MyCountResidues function would look almost identical but use the OESameResidue function instead of OESameChain. The slightly more complicated example below reports the number of residues in each chain.

    def MyReportResidues1(mol):
        prevchain = None
        for chain in mol.GetAtoms():
            chainres = OEAtomGetResidue(chain)
            if prevchain is None or not OESameChain(chainres, prevchain):
                prevres = None
                count = 0
                for residue in mol.GetAtoms():
                    resres = OEAtomGetResidue(residue)
                    if OESameChain(resres, chainres):
                        # Count each distinct residue within this chain once.
                        if prevres is None or not OESameResidue(resres, prevres):
                            prevres = resres
                            count += 1
                print("%d residues in chain %s" % (count, chainres.GetChainID()))
                prevchain = chainres

Whilst the above example contains the doubly nested loops that some structural biologists like to see, the same output can be generated even more efficiently by:

    def MyReportResidues2(mol):
        count = 0
        residue = None
        chain = None
        for atom in mol.GetAtoms():
            res = OEAtomGetResidue(atom)
            if chain is None:
                chain = res
            elif not OESameChain(res, chain):
                # Chain boundary: report the finished chain and start the next.
                print("%d residues in chain %s" % (count, chain.GetChainID()))
                count = 0
                chain = res
            if residue is None or not OESameResidue(res, residue):
                residue = res
                count += 1
        if count > 0:
            print("%d residues in chain %s" % (count, chain.GetChainID()))

Of course, just because OEChem uses an extremely advanced representation of biopolymers, there's absolutely nothing to prevent a user slurping this information into a FORTRAN common block or whichever representation best suits their way of thinking about the problem.

26.1.4 Data Modeling Bibliography

1. Chris J. Date, "An Introduction to Database Systems", Addison-Wesley Publishers, 8th Edition, July 2003.
2. Tim Denvir, "Introduction to Discrete Mathematics for Software Engineering", MacMillan Publishers, 1986.
3. Douglas B. Lenat and R.V. Guha, "Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project", Addison-Wesley Publishers, February 1990.


26.2 Stored Properties of Residues

Property Name         Type         Get Method             Set Method
Alternate Location    char         GetAlternateLocation   SetAlternateLocation
Temperature Factor    float        GetBFactor             SetBFactor
Chain Identifier      char         GetChainID             SetChainID
Fragment Number       int          GetFragmentNumber      SetFragmentNumber
HETATM Record         bool         IsHetAtom              SetHetAtom
Insertion Code        char         GetInsertCode          SetInsertCode
NMR Model Number      int          GetModelNumber         SetModelNumber
Residue Name          std::string  GetName                SetName
Residue Number        int          GetResidueNumber       SetResidueNumber
Occupancy             float        GetOccupancy           SetOccupancy
Secondary Structure   int          GetSecondaryStructure  SetSecondaryStructure
Atom Serial Number    int          GetSerialNumber        SetSerialNumber

26.2.1 AlternateLocation

The "alternate location" property of a residue is a character used to denote the alternate location of its parent atom. The default value is a space, indicating that the parent atom is the primary location.

26.2.2 BFactor

The "B-factor" property of a residue is a float used to represent the temperature factor (B-factor) of its parent atom. The default value is 0.0.

26.2.3 ChainID

The "chain id" property of a residue is a character used to represent the chain identifier of the residue/parent atom. The default value is a space.

26.2.4 FragmentNumber

The "fragment number" property is an integer used to represent the logical fragment/segment/chain of which the residue/parent atom is part. The default value is zero.

26.2.5 HetAtom

The "hetatom" property of a residue is a boolean used to represent whether the atom is not part of a standard amino or nucleic acid, i.e. whether the atom is stored in a HETATM record of a PDB file. The default value is false.


26.2.6 InsertCode

The "insertion code" property of a residue is a character used to represent the sequence insertion code of a residue. The default value is a space.

26.2.7 ModelNumber

The "model number" property of a residue is an integer used to represent the NMR model number of the residue. The default value is zero.

26.2.8 Name

The "name" property of a residue is a string used to represent the residue name of the residue. The default value is "MOL".

26.2.9 ResidueNumber

The "residue number" property of a residue is an integer used to represent the residue sequence number of a residue. The default value is one.

26.2.10 Occupancy

The "occupancy" property of a residue is a float used to represent the crystallographic occupancy of the residue's parent atom. The default value is 1.0.

26.2.11 SecondaryStructure

The "secondary structure" property of a residue is an integer that represents the secondary structure of that residue. The default value is zero.

26.2.12 SerialNumber

The "atom serial number" property of a residue is an integer used to represent the PDB atom serial number associated with the residue's parent atom. The default value is zero.
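To make the "fields attached to each atom, plus predicates" idea concrete without requiring the toolkit, here is a minimal plain-Python sketch. The `ResidueRecord` class is a hypothetical stand-in whose field defaults mirror the defaults listed in this section; it is not the OEResidue class. The `same_chain` and `same_residue` predicates are assumed definitions chosen for illustration and may not compare the same fields as OEChem's OESameChain and OESameResidue.

```python
from dataclasses import dataclass


@dataclass
class ResidueRecord:
    """Hypothetical per-atom record; defaults follow section 26.2."""
    alternate_location: str = " "
    b_factor: float = 0.0
    chain_id: str = " "
    fragment_number: int = 0
    het_atom: bool = False
    insert_code: str = " "
    model_number: int = 0
    name: str = "MOL"
    residue_number: int = 1
    occupancy: float = 1.0
    secondary_structure: int = 0
    serial_number: int = 0


def same_chain(a, b):
    """Assumed predicate: same NMR model and same chain identifier."""
    return (a.model_number, a.chain_id) == (b.model_number, b.chain_id)


def same_residue(a, b):
    """Assumed predicate: same chain, residue name, number and insertion code."""
    return same_chain(a, b) and (
        (a.name, a.residue_number, a.insert_code)
        == (b.name, b.residue_number, b.insert_code)
    )


def count_groups(records, same):
    """Count runs of consecutive records that the predicate puts in one set."""
    count = 0
    prev = None
    for rec in records:
        if prev is None or not same(rec, prev):
            count += 1
        prev = rec
    return count


atoms = [
    ResidueRecord(chain_id="A", name="ALA", residue_number=1, serial_number=1),
    ResidueRecord(chain_id="A", name="ALA", residue_number=1, serial_number=2),
    ResidueRecord(chain_id="A", name="GLY", residue_number=2, serial_number=3),
    ResidueRecord(chain_id="B", name="ALA", residue_number=1, serial_number=4),
]
n_chains = count_groups(atoms, same_chain)
n_residues = count_groups(atoms, same_residue)
```

Note that `count_groups` relies on the records being sorted so that members of the same chain or residue are adjacent, which is the same assumption the OEChem examples make after ordering atoms.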


26.3 A Hierarchy View of Residue Data

While a fragile data model is not desirable, a good data model does not prevent the data from being viewed as if it were in a hierarchy. For the convenience of users who are unwilling to understand the benefits of a non-hierarchy, OEChem provides a method to view the data as if it were in a hierarchy via the OEHierView class.
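The general idea of deriving a hierarchical view from flat per-atom data can be illustrated in plain Python. The sketch below is not the OEHierView class and does not use its interface; it assumes each atom is a hypothetical `(chain_id, residue_number, atom_name)` tuple and that the atoms are already sorted by chain and residue.

```python
from itertools import groupby
from operator import itemgetter


def hierarchy_view(atoms):
    """Build {chain_id: {residue_number: [atom names]}} from sorted flat records."""
    view = {}
    for chain_id, chain_atoms in groupby(atoms, key=itemgetter(0)):
        residues = view.setdefault(chain_id, {})
        for resnum, res_atoms in groupby(chain_atoms, key=itemgetter(1)):
            names = [name for _, _, name in res_atoms]
            residues.setdefault(resnum, []).extend(names)
    return view


flat_atoms = [
    ("A", 1, "N"),
    ("A", 1, "CA"),
    ("A", 2, "N"),
    ("B", 1, "O"),
]
view = hierarchy_view(flat_atoms)
```

The flat list remains the underlying data; the nested dictionary is a derived view that can be rebuilt with different grouping keys whenever a different hierarchy is wanted.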


CHAPTER TWENTY-SEVEN

Valence Models

This chapter describes the three valence models currently implemented by OEChem. For molecules that have fully specified formal charges, the MDL valence model may be used to assign hydrogen counts. For molecules that have fully specified hydrogen counts, the OpenEye "charge" model may be used to assign formal charges. Finally, for molecules with neither formal charges nor hydrogen counts, OEChem uses the OpenEye HCount model to assign both hydrogen counts and formal charges.

27.1 The MDL Valence Model

The MDL valence model was developed by MDL to allow hydrogen counts to be implicit in the MDL SD and MOL file formats. It assumes that the bond orders to an atom are specified (ExplicitValence), and that the atomic number and formal charge are correctly set. The MDL valence model then prescribes the number of implicit hydrogens on a particular atom. Table 27.1 shows the MDL valence model as implemented in OEChem.

In OEChem, the MDL valence model is used for calls to the OEAssignMDLHydrogens function.

All the remaining elements not listed in Table 27.1 are assumed to have no implicit hydrogens.

27.2 The OpenEye Valence Model

27.2.1 The OpenEye Charge Model

The OpenEye formal charge model assigns formal charges to elements based upon their total valence. In OEChem, this functionality is invoked by the OEAssignFormalCharges function. If the formal charge on an atom is non-zero, it is left unchanged.

Hydrogen: If the valence isn't one, the formal charge is +1.

Boron: If the valence is four, the formal charge is +1.

Carbon: If the valence is three, the formal charge is +1 if the atom has a polar neighbor, i.e. N, O or S, and -1 otherwise.


At#  Symbol  -3  -2  -1  0  +1  +2  +3  +4  +5
1   H   0  1  0
3   Li  0  1  0
4   Be  0  2  1  0
5   B   2  3,5  4  3  2  1  0
6   C   2  3,5  4  3  2  1
7   N   1  2  3,5  4  3
8   O   0  1  2  3,5  4  3  2  1
9   F   0  1  2
11  Na  0  1  0
12  Mg  0  2  1  0
13  Al  2,4,6  3,5  4  3  2  1  0
14  Si  2,4,6  3,5  4  3  2  1  0
15  P   2,4,6  3,5  4  3  2  1  0
16  S   0  1,3,5,7  2,4,6  3,5  4  3
17  Cl  0  1,3,5,7  2,4,6  3,5  4
19  K   0  1  0
20  Ca  0  2  1  0
31  Ga  4  3  0  1  0
32  Ge  2,4,6  3,5  4  3  0  1
33  As  2,4,6  3,5  4  3  0  1  0
34  Se  0  1,3,5,7  2,4,6  3,5  4  3
35  Br  0  1,3,5,7  2,4,6  3,5  4
37  Rb  00  1  0
38  Sr  0  2  1  0
49  In  2,4,6  3,5  2,4  3  0  1  0
50  Sn  2,4,6  3,5  2,4  3  0  1
51  Sb  0  1,3,5,7  2,4,6  3,5  2,4  3  0  1  0
52  Te  0  1,3,5,7  2,4,6  3,5  2,4  3
53  I   0  1,3,5,7  2,4,6  3,5  2,4
55  Cs  0  1  0
56  Ba  0  2  1  0
81  Tl  2,4  1,3  0  0  0
82  Pb  2,4,6  3,5  2,4  3  0  1
83  Bi  0  1,3,5,7  2,4,6  3,5  2,4  3  0  1  0
84  Po  0  1,3,5,7  2,4,6  3,5  2,4  3
85  At  0  1,3,5,7  2,4,6  3,5  2,4
87  Fr  0  1  0
88  Ra  0  2  1  0

Table 27.1: MDL Valence Model


Nitrogen: If the valence is two, the formal charge is -1, and if the valence is four the formal charge is +1.

Oxygen: If the valence is one, the formal charge is -1, and if the valence is three the formal charge is +1.

Phosphorus: If the valence is four, the formal charge is +1.

Sulfur: If the valence is one, the formal charge is -1; if the valence is three, the formal charge is +1; if the valence is five, the formal charge is -1; if the valence is four and the degree is four, the formal charge is +2.

Chlorine: If the valence is zero, the formal charge is -1; if the valence is four, the formal charge is +3.

Fluorine, Bromine, Iodine: If the valence is zero, the formal charge is -1.

Magnesium, Calcium, Zinc: If the valence is zero, the formal charge is +2.

Lithium, Sodium, Potassium: If the valence is zero, the formal charge is +1.

Iron: If the valence is zero, the formal charge is +3 if the partial charge is 3.0, and +2 otherwise.

Copper: If the valence is zero, the formal charge is +2 if the partial charge is 2.0, and +1 otherwise.

For the remaining elements, if the valence of an atom is zero, its formal charge is set from its partial charge.

27.2.2 The OpenEye Hydrogen Count Model

OpenEye's hydrogen count valence model is used by OEChem when neither hydrogen counts nor valence are specified. The typical use is reading molecules from PDB or XYZ format files without explicit hydrogens. This functionality is invoked by OEAssignImplicitHydrogens, which must always be followed by a call to OEAssignFormalCharges. This valence model is unique in that it only partially updates hydrogen counts, assuming the unfilled valences will be corrected by OpenEye's charge valence model above. In MDL's model, for example, a neutral sodium atom is assumed to have one implicit hydrogen, i.e. sodium hydride instead of sodium metal. In OpenEye's hydrogen count valence model, a disconnected sodium atom is assumed to be a sodium cation, [Na+]. When reading from PDB files, this is a very reasonable assumption.

Note that although the OpenEye hydrogen count valence model often sets charge and protonation states to those found under physiological conditions, it is intended to be neither a pKa predictor nor an ionization state predictor. Instead, it is a normalization. Much as many registry systems (and the MDL valence model) will convert C(=O)[O-] to C(=O)O for registration purposes, this valence model converts in the opposite direction, to C(=O)[O-].

• Carbon is always assumed to be four valent, and therefore neutral.
• Nitrogens that are conjugated (have double bonds, or have neighbors that have double bonds, in their Kekulé representations) are assumed to be three valent and neutral, whilst all other nitrogens are assumed to be four valent, with a +1 formal charge.
• Oxygens are assumed to be two valent and neutral, unless they have a single bond to an atom that is doubly bonded to oxygen, in which case they are assumed to be one valent, with a -1 formal charge.
• Sulfur is always assumed to be two valent.

All other elements are assumed to have no implicit hydrogens, and the formal charge specified by the OpenEye charge model. This models all disconnected halogens as halide anions and, when disconnected, the metals listed above as cations.


These rules are sufficient to reasonably protonate proteins read from PDB files. However, as described above, these rules are not intended to be a comprehensive rule-based pKa predictor. Users interested in predicting physiological ionization, and in protonation/dissociation state enumeration, should contact OpenEye Scientific Software about our tools for exactly this task.
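The flavor of the charge rules in section 27.2.1 can be sketched as a small lookup in plain Python. This is a hedged illustration covering only a subset of the elements, using only rules that depend on the element and its valence; the function name `assign_formal_charge` and the table `CHARGE_BY_VALENCE` are hypothetical, and this is not the implementation behind OEAssignFormalCharges. Rules that need extra context (carbon's polar neighbors, sulfur's degree, iron and copper partial charges) are deliberately left out.

```python
# Formal charge implied by valence, transcribed from section 27.2.1 for a
# subset of elements. Valences not listed for an element leave it neutral.
CHARGE_BY_VALENCE = {
    "N": {2: -1, 4: +1},
    "O": {1: -1, 3: +1},
    "F": {0: -1},
    "Cl": {0: -1, 4: +3},
    "Br": {0: -1},
    "I": {0: -1},
    "Mg": {0: +2},
    "Ca": {0: +2},
    "Zn": {0: +2},
    "Li": {0: +1},
    "Na": {0: +1},
    "K": {0: +1},
}


def assign_formal_charge(symbol, valence, formal_charge=0):
    """Return the formal charge for an atom under the subset of rules above."""
    # A non-zero formal charge is left unchanged.
    if formal_charge != 0:
        return formal_charge
    # Hydrogen: any valence other than one gives +1.
    if symbol == "H":
        return 0 if valence == 1 else +1
    if symbol not in CHARGE_BY_VALENCE:
        raise ValueError("element %r is outside this illustrative subset" % symbol)
    return CHARGE_BY_VALENCE[symbol].get(valence, 0)


ammonium_n = assign_formal_charge("N", 4)    # four valent nitrogen
carboxylate_o = assign_formal_charge("O", 1) # one valent oxygen
sodium = assign_formal_charge("Na", 0)       # disconnected sodium
```

Raising an error for uncovered elements, rather than silently returning zero, keeps the sketch from suggesting that the full model treats those elements as neutral.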


CHAPTER TWENTY-EIGHT

SMARTS Pattern Matching

28.1 SMARTS Syntax


CHAPTER TWENTY-NINE

SMARTS Pattern Matching

29.1 SMARTS Syntax

SMARTS is a line notation developed by Daylight Chemical Information Systems for compactly representing molecular substructure queries. The SMARTS language can be considered an extension or generalization of Daylight's SMILES notation for representing discrete molecules.

29.1.1 Atom Primitives

Symbol  Description                       Arg?      Default Value
A       Non-aromatic (aliphatic) atom     No
a       Aromatic atom                     No
Dn      Degree (explicit connections)     Yes       (No default)
Hn      Total hydrogen count              Optional  Exactly one
hn      Implicit hydrogen count           Optional  Exactly one
Rn      Ring bond count [1]               Optional  Any ring atom
xn      Ring bond count [1]               Optional  Any ring atom
rn      Smallest ring size                Optional  Any ring atom
vn      Valence (total bond order)        Yes       (No default)
Xn      Connectivity (total connections)  Yes       (No default)
#n      Atomic number                     Yes       (No default)
+n      Positive charge                   Optional  +1 cation (++ is +2, etc.)
-n      Negative charge                   Optional  -1 anion (-- is -2, etc.)
^n      Atomic hybridization [2]          Yes       (No default)
@       Anticlockwise local chirality     No
@@      Clockwise local chirality         No
@n      Chirality class                   Optional  Anticlockwise
n       Explicit atomic mass              No


29.1.2 Bond Primitives

Symbol  Description
(none)  Default (single or aromatic)
-       Single bond (not aromatic)
=       Double bond (not aromatic)
#       Triple bond
~       Any bond (wildcard)
:       Aromatic bond
@       Ring bond
/       Directional single "up" bond
\       Directional single "down" bond
/?      Directional "up" or unspecified
\?      Directional "down" or unspecified

29.1.3 Logical Operators

Syntax   Description
!e       Not e
e1 & e2  e1 and e2 (high precedence)
e1 , e2  e1 or e2
e1 ; e2  e1 and e2 (low precedence)

[1] The semantics of the ring count primitive, R, differ slightly between Daylight SMARTS and OpenEye SMARTS. In Daylight semantics, Rn means that an atom is in n rings of the chosen SSSR. As the choice of SSSR is non-deterministic, this interpretation can cause an arbitrary set of atoms to match depending upon input order. For example, in the symmetric molecule cubane, four of the eight atoms will appear in two SSSR rings, and the other half of the atoms appear in three, but the choice is made almost randomly. Rather than attempt to reproduce these weak semantics, OpenEye strengthens the definition of Rn to mean the number of ring bonds to an atom, which is graph invariant and therefore independent of a molecule's input order. Notice that the interpretation of [R] and [R0], i.e. ring membership, remains the same. Similarly, Daylight [R1] is approximately equivalent to OpenEye [R2], and Daylight [R2] is approximately equivalent to OpenEye [R3]. Note that [x] was implemented by Daylight v4.9 and OEChem 1.5, and is exactly synonymous with OEChem's [R].

[2] The atomic hybridization primitive, ^, is an OpenEye extension that is not available in Daylight SMARTS, but can be implemented using recursive SMARTS.
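The operator precedence in the logical operators table (! binds tightest, then &, then the comma, then the semicolon) can be illustrated with a tiny evaluator in plain Python. This is a sketch under stated assumptions, not a SMARTS parser: the function names are hypothetical, an atom is a hand-built dictionary, only explicit operators are handled (no implicit "and" between adjacent primitives), and only the primitives a, A, #n, Dn, Hn and Xn with an explicit numeric argument are supported.

```python
import re

# Maps a primitive symbol to the (hypothetical) atom dictionary key it tests.
PRIMITIVE_KEYS = {
    "#": "atomic_number",
    "D": "degree",
    "H": "hcount",
    "X": "connectivity",
}


def match_primitive(token, atom):
    """Evaluate one supported primitive against an atom dictionary."""
    if token == "a":
        return atom["aromatic"]
    if token == "A":
        return not atom["aromatic"]
    m = re.fullmatch(r"([#DHX])(\d+)", token)
    if m is None:
        raise ValueError("unsupported primitive: %r" % token)
    return atom[PRIMITIVE_KEYS[m.group(1)]] == int(m.group(2))


def match_term(token, atom):
    """Apply any leading '!' negations, then evaluate the primitive."""
    negate = False
    while token.startswith("!"):
        negate = not negate
        token = token[1:]
    return match_primitive(token, atom) != negate


def match_expr(expr, atom):
    """Evaluate with ';' as lowest precedence, then ',', then '&'."""
    return all(
        any(
            all(match_term(term, atom) for term in or_part.split("&"))
            for or_part in low_and_part.split(",")
        )
        for low_and_part in expr.split(";")
    )


# A methyl carbon: aliphatic, one heavy-atom neighbor, three hydrogens.
methyl_c = {"atomic_number": 6, "aromatic": False,
            "degree": 1, "hcount": 3, "connectivity": 4}

low_and = match_expr("#6,#7;H1", methyl_c)   # (C or N) and H1
high_and = match_expr("#6,#7&H1", methyl_c)  # C or (N and H1)
```

The two expressions differ only in which "and" operator is used, yet they group differently: the first requires exactly one hydrogen on either element, while the second requires it only of nitrogen.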


CHAPTER THIRTY

Reactions

Reaction processing in OEChem is divided into two categories: unimolecular reactions and library generation. Unimolecular reactions are useful for (although not limited to) normalization reactions. The OEUniMolecularRxn class in OEChem applies chemical transformations to individual molecules. Reactions can also be used to generate combinatorial libraries using OEChem. Both 'clipping' and reaction based enumeration can be achieved using the OELibraryGen class.

Reactions are represented in OEChem as query molecules (OEQMol). Sets of chemical transform operations are derived from the differences between the reactant and product patterns in the reaction molecule. For example, atoms and bonds that appear in the reactant pattern but are absent in the product pattern are 'destroyed'. Atoms and bonds that appear in the product pattern but not in the reactant pattern are 'created'. Atoms are tracked between reactants and products by means of atom maps. Atom maps are stored and retrieved using the SetMapIdx and GetMapIdx methods. Product atoms that have the same map index as reactant atoms originate from their reactant counterpart. Reactions are completely defined by fields in a query molecule. Reaction molecules can be created from virtually any reaction file format (SMIRKS, MDL RXN, etc.), and can even be constructed programmatically.

30.1 Normalization Reactions

The OEUniMolecularRxn class is designed to apply a set of transformations defined by a reaction to exactly one reactant molecule. All possible transformations are applied to the initial set of atoms and bonds of an input molecule. For example, a reaction that affects a particular type of functional group will automatically be applied twice to a bifunctional molecule. The number of transformations applied by the OEUniMolecularRxn class is limited in order to prevent infinite loops. Consider a hypothetical reaction that methylates a methyl group. If a methyl group added in a reaction were allowed to react again, the methyl groups of a molecule would be methylated ad infinitum. The first protection against infinite loops provided by OEUniMolecularRxn is that only original atoms and bonds of the input molecule are allowed to react. Atoms and bonds created by a reaction are excluded from involvement in further reactions. A more subtle source of potential infinite loops is reactions where product atoms still match the reactant pattern after they have been involved in a chemical transformation. The OEUniMolecularRxn class allows a set of atoms that match a reactant pattern to react only a single time.

The following code demonstrates the use of the OEUniMolecularRxn class. The OEUniMolecularRxn in this case is initialized using a SMIRKS pattern. The example reaction protonates and charges an amine nitrogen. When the OEUniMolecularRxn class is applied to 1,2-ethanediamine, both nitrogens are charged and protonated to yield 1,2-ethanediaminium. The example reaction was intentionally written to demonstrate the protection mechanisms in place to prevent underspecified reactions from causing infinite loops. The product 1,2-ethanediaminium still matches the reactant pattern; however, subsequent reactions terminate when no unreacted atoms are identified.

    #!/usr/bin/env python
    # ch28-1.py

    from openeye.oechem import *

    # Protonate and charge every amine nitrogen matched by the reactant pattern.
    umr = OEUniMolecularRxn("[N:1]>>[Nh3+:1]")

    mol = OEGraphMol()
    OEParseSmiles(mol, "NCCN")

    # Apply the transformation in place.
    umr(mol)

    smi = OECreateSmiString(mol)
    print("smiles = " + smi)

Listing 30.1: Normalization

30.2 Library Generation

The OELibraryGen class was designed to give programmers a high degree of control when applying chemical transformations. It was also designed for efficiency. Potentially costly preprocessing is performed a single time before transformations can be carried out. The relative setup cost of an OELibraryGen instance may be high, and the memory use large, as preprocessed reactants are stored in memory. Subsequent generation of products, however, is very efficient because setup costs are paid in advance. The OELibraryGen class serves a dual purpose of managing sets of preprocessed starting materials, and storing a list of chemical transform operations defined by a reaction molecule.

Chemical transform operations are carried out on starting materials. Starting materials provide most of the virtual matter that goes into making virtual product molecules. The OELibraryGen class provides an interface to associate starting materials with reactant patterns using the OELibraryGen::SetStartingMaterial and OELibraryGen::AddStartingMaterial methods. These methods associate starting materials with reactant patterns using the index (reactant number) of the pattern. Reactant patterns are numbered starting at zero for the pattern containing the lowest atom index and all atoms that are members of the same connected component. The next reactant pattern begins with the next lowest atom index that is not a member of the first component. In a SMIRKS pattern the first reactant (reactant number zero) is the furthest reactant on the left.
Disconnected reactant patterns may be grouped into a single component using component level grouping in SMIRKS, denoted by parentheses.

Once a reaction has been defined, and starting materials have been associated with each of the reactant patterns, chemical transformations can be applied to combinations of starting materials. To achieve a chemically reasonable output, attention should be given to choosing the mode of valence (or hydrogen count) correction that matches the reaction. The OELibraryGen class has three possible modes of valence correction: explicit hydrogen, implicit hydrogen, and automatic. The default mode for valence correction and SMIRKS interpretation is to emulate the Daylight Reaction Toolkit. Hydrogen counts are adjusted using explicit hydrogens in SMIRKS patterns. Reactions are carried out using explicit hydrogens, and valence correction occurs when explicit hydrogens are added or deleted as defined by a reaction. The following example demonstrates strict SMIRKS and explicit hydrogen handling.

    #!/usr/bin/env python
    # ch28-2.py

    from openeye.oechem import *

    # The amine hydrogen [H:5] is unmapped in the product and is therefore deleted.
    libgen = OELibraryGen("[O:1]=[C:2][Cl:3].[N:4][H:5]>>[O:1]=[C:2][N:4]")

    mol = OEGraphMol()
    OEParseSmiles(mol, "CC(=O)Cl")
    libgen.SetStartingMaterial(mol, 0)

    mol.Clear()
    OEParseSmiles(mol, "NCC")
    libgen.SetStartingMaterial(mol, 1)

    for product in libgen.GetProducts():
        smi = OECreateCanSmiString(product)
        print(smi)

Listing 30.2: Strict SMIRKS Reaction Handling

In the amide bond forming reaction, a hydrogen atom attached to the nitrogen in the amine pattern is explicitly deleted when forming the product. When executed, the example generates two products in total, one for each of the two equivalent protons attached to the amine nitrogen. If a unique set of products is desired, canonical SMILES strings may be stored to verify that the products generated are indeed unique.

The following demonstrates how the same basic reaction given in the previous example can be carried out using the implicit hydrogen correction mode. Notice that no explicit hydrogens appear in the reaction. Instead, the SMARTS implicit hydrogen count operator appears on the right hand side of the reaction and is used to assign the implicit hydrogen count of the product nitrogen.

    #!/usr/bin/env python
    # ch28-3.py

    from openeye.oechem import *

    # The lowercase 'h' primitive sets the product nitrogen's implicit hydrogen count.
    libgen = OELibraryGen("[O:1]=[C:2][Cl:3].[N:4]>>[O:1]=[C:2][Nh1:4]")
    libgen.SetExplicitHydrogens(False)

    mol = OEGraphMol()
    OEParseSmiles(mol, "CC(=O)Cl")
    libgen.SetStartingMaterial(mol, 0)

    mol.Clear()
    OEParseSmiles(mol, "NCC")
    libgen.SetStartingMaterial(mol, 1)

    for product in libgen.GetProducts():
        smi = OECreateCanSmiString(product)
        print(smi)

Listing 30.3: Reactions Using Implicit Hydrogens

The reaction is written to work with implicit hydrogens (using the lowercase 'h' primitive), and the OELibraryGen instance is set to work in implicit hydrogen mode using the OELibraryGen::SetExplicitHydrogens method.

The final example demonstrates automatic valence correction. In implicit hydrogen mode (set using the OELibraryGen::SetExplicitHydrogens method), automatic valence correction attempts to add or subtract implicit hydrogens in order to retain the valence state observed in the starting materials. Before chemical transformations commence, the valence state for each reacting atom is recorded. After the transform operations are complete, the implicit hydrogen count is adjusted to match the beginning state of the reacting atoms. Changes in formal charge are taken into account during the valence correction.


    #!/usr/bin/env python
    # ch28-4.py

    from openeye.oechem import *

    # No hydrogens appear in the SMIRKS; valence correction adjusts the counts
    libgen = OELibraryGen("[O:1]=[C:2][Cl:3].[N:4]>>[O:1]=[C:2][N:4]")
    libgen.SetExplicitHydrogens(False)
    libgen.SetValenceCorrection(True)

    mol = OEGraphMol()
    OEParseSmiles(mol, "CC(=O)Cl")
    libgen.SetStartingMaterial(mol, 0)

    mol.Clear()
    OEParseSmiles(mol, "NCC")
    libgen.SetStartingMaterial(mol, 1)

    for product in libgen.GetProducts():
        smi = OECreateCanSmiString(product)
        print(smi)

Listing 30.4: Reactions Using Automatic Valence Correction

In general, automatic valence correction is a convenience that allows straightforward reactions to be written in a simplified manner and reduces the onus of valence state bookkeeping. Reactions that alter the preferred valence state of an atom, oxidation for example, may not be automatically correctable.

OELibraryGen objects are normally initialized with a SMIRKS pattern. A boolean argument is used to specify whether the SMIRKS string should be interpreted using strict SMIRKS semantics. Here strict means in full compliance with the SMIRKS language defined by its originator, Daylight CIS, Inc. If the default value of true is used, the SMIRKS string must have corresponding reaction mapped reactant and product atoms. Mapped product atoms that do not have corresponding mapped reactant atoms are considered invalid SMIRKS and will result in a failure to initialize the OELibraryGen instance. Strict SMIRKS also requires unmapped reactant atoms to be destroyed in the reaction. Passing a boolean value of false as the second argument relaxes both of the strict SMIRKS restrictions.

The AddStartingMaterial and SetStartingMaterial methods are used to initialize the starting materials corresponding to a reaction component (reactant). An iterator over molecules or a single molecule may be passed as the first argument to the methods.
Subsequent calls to the AddStartingMaterial method append to the list of starting materials set in prior calls. The second argument specifies the reactant by number, starting with zero, to which the starting materials correspond. These numbers correspond with the left to right lexical ordering of reactants in the SMIRKS. The final argument is used to control the pattern matching of the reactant pattern to the starting material. If the value passed is true, only matches that contain a unique set of atoms relative to previously identified matches are used. If the value is false, every possible match, including those related by symmetry, will be used. Reactant patterns are uniquely matched by default.

The SetExplicitHydrogens method sets the hydrogen handling mode for the OELibraryGen instance. OELibraryGen instances are constructed by default with the explicit hydrogen mode set to true. Reactions may be executed using either implicit or explicit hydrogens represented in the starting materials for a reaction. If the value is true, the OELibraryGen instance will add explicit hydrogens to reactant molecules when they are initialized using either of the SetStartingMaterial methods. If the value is false, then both of the SetStartingMaterial methods will suppress any explicit hydrogens in the reactant molecules, and simply retain the implicit hydrogen counts for the remaining non-hydrogen atoms. The hydrogen handling mode must be assigned prior to calling SetStartingMaterial. Calling SetExplicitHydrogens after SetStartingMaterial will have no effect. Note that the explicit hydrogen setting in effect modifies the semantics of SMIRKS. If the programmer wishes to implement strict SMIRKS according to the Daylight standard in full, explicit hydrogens should be set on.
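When symmetry-related matches are allowed, or when equivalent hydrogens are matched as in Listing 30.2, the same product can be generated more than once. The following is a minimal sketch of the bookkeeping suggested above, storing canonical SMILES strings to keep only unique products. It is plain Python and does not call OEChem: the helper name unique_smiles is hypothetical, and the input strings are placeholders standing in for the output of OECreateCanSmiString.

```python
def unique_smiles(smiles_iter):
    """Return the distinct strings from smiles_iter, preserving first-seen order."""
    seen = set()
    unique = []
    for smi in smiles_iter:
        if smi not in seen:
            seen.add(smi)
            unique.append(smi)
    return unique


# Placeholder strings standing in for OECreateCanSmiString(product) output;
# they are illustrative only, not actual OEChem results.
generated = ["CCNC(C)=O", "CCNC(C)=O"]
print(unique_smiles(generated))
```

In a real script the list would presumably be filled inside the loop over libgen.GetProducts(), so that each product is written out only the first time its canonical SMILES is seen.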


The SetValenceCorrection method controls the valence correction mode setting of an OELibraryGen instance. OELibraryGen instances are constructed by default with the valence correction mode set to false. Valence correction mode can be turned on by passing a boolean true value to an OELibraryGen instance using this method. When valence correction mode is enabled, the OELibraryGen instance will attempt to adjust the hydrogen count on atoms in the product molecule that are involved in the reaction to match the original valence state of the reactant. For product atoms that do not undergo a nuclear reaction (atomic number is retained), the hydrogen count is either increased or decreased to match the initial valence state of the corresponding reactant atom. Formal charge is taken into account during the hydrogen count adjustment. Note that valence correction in effect modifies the semantics of SMIRKS. Thus, if the programmer wishes to implement strict SMIRKS according to the Daylight standard in full, valence correction should be set off.
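The arithmetic behind valence correction can be illustrated with a toy calculation. The sketch below is an assumption-laden simplification, not OEChem's algorithm: the function name corrected_hcount is hypothetical, valence is taken to be the sum of bond orders plus the hydrogen count, and formal charge changes (which the toolkit does account for) are ignored.

```python
def corrected_hcount(initial_valence, bond_order_sum_after):
    """Hydrogen count that restores an atom's initial valence.

    initial_valence: bond-order sum plus hydrogen count before the reaction.
    bond_order_sum_after: bond-order sum to non-hydrogen neighbours afterwards.
    Formal charge is deliberately ignored in this toy model.
    """
    return max(0, initial_valence - bond_order_sum_after)


# Amine nitrogen of ethylamine (NCC): one single bond to carbon plus two hydrogens.
initial_valence = 1 + 2
# After amide formation the nitrogen has single bonds to two carbons.
print(corrected_hcount(initial_valence, 2))
```

For the amide example this gives a hydrogen count of one on the product nitrogen, which is the same count that Listing 30.3 assigns by hand with the [Nh1:4] pattern.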


CHAPTER THIRTY-ONE

License handling

With version 1.3.4 of the Python wrappers, there are 2 new functions provided to handle licenses.

The first can be used to query the status of a valid license.

    OEChemIsLicensed()

will return True if a valid OEChem license can be found, and False otherwise. Note that for Python, we also need to check for a valid "python" feature, so the command to use would be:

    OEChemIsLicensed('python')

These commands are intended to provide developers a way to test licenses before actually instantiating OEChem objects and invoking the internal license check.

A second function is provided to allow addition of license keys inside the script instead of from a license file.

    OEAddLicenseKey(key)

can be used to install the key provided in an oe_license.txt file such that the script does not depend on OE_DIR or OE_LICENSE or the existence of an external license file. The "key" is the single line of ASCII under the specific feature that you are adding the license for. If you want to install both an OEChem and an OEShape license, then 2 function calls, one for each key, are required. Note that this will effectively put a time-bomb into the program, since an embedded key expires along with the license, so it should be used with care.
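The two functions can be combined into a small start-up check. The sketch below is one possible arrangement, not a prescribed pattern: the helper ensure_license is hypothetical, and the license functions are passed in as arguments so that the logic can be read and exercised without the toolkit installed. The return value of OEAddLicenseKey is not described here, so the helper ignores it and simply queries the license status again.

```python
def ensure_license(is_licensed, add_key, key=None, feature="python"):
    """Return True if `feature` is licensed, installing `key` first if needed.

    is_licensed and add_key are injected callables; with OEChem installed
    they would be OEChemIsLicensed and OEAddLicenseKey respectively.
    """
    if is_licensed(feature):
        return True
    if key is None:
        return False
    add_key(key)                      # install the embedded key
    return bool(is_licensed(feature))  # re-check after installing it
```

With the wrappers imported, a script might call ensure_license(OEChemIsLicensed, OEAddLicenseKey) before creating any molecules and exit with a clear message if it returns False.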


CHAPTER THIRTY-TWO

OEChem Class Hierarchy: Why in the world are there 6 molecules?!

32.1 Atoms, Bonds, Conformers, and Molecules

The most important data types in the OEChem library are OEMolBase, OEAtomBase and OEBondBase. These three classes describe the behaviors of molecules, atoms and bonds respectively. However, these types are "abstract" classes, describing the methods and semantics of molecules, atoms and bonds, but without defining an actual implementation. Several implementations of OEMolBase are defined by the OEChem library, including OESCMol (Single Conformer Molecule) for simple molecule processing, OEQMol (Query Molecule) for specifying substructure, superstructure and pattern matching, and OEDBMol (Database Molecule) for minimal memory usage. Additional OEMolBase implementations may also be provided by the user, which can then be used by many of the functions in the OEChem library.

An OEMCMolBase is a multi-conformer extension of an OEMolBase. The OEMCMolBase inherits from an OEMolBase and also contains OEConfBases, which are additional sets of coordinates to represent conformers, depictions, etc. There may soon be multiple implementations of an OEMCMolBase, but currently the only implementation is the OEMCMol, which holds a 3-dimensional Cartesian representation of the coordinates in floats.

An OEMolBase representing a molecule can be thought of as containing a (possibly empty) set of atoms represented by OEAtomBases, and a (possibly empty) set of bonds represented by OEBondBases. Each OEAtomBase and OEBondBase belongs to a single OEMolBase, its parent. It is not possible for an OEAtomBase or an OEBondBase to be simultaneously part of two distinct molecules. Similarly, an OEMCMolBase, which inherits from an OEMolBase, may contain atoms and bonds. In addition, it can contain a (possibly empty) set of conformers represented by OEConfBases. Each OEConfBase belongs to a single parent.
The OEAtomBases, OEBondBases, and OEConfBases currently implemented cannot be created outside of the context of a parent molecule.

An OEBondBase (typically) represents a covalent/dative bond between two OEAtomBases.

OEMolBase, OEAtomBase, OEBondBase, OEMCMolBase, and OEConfBase are all themselves derived from a more primitive class, OEBase. The methods of the OEBase class allow arbitrary data to be associated with an object, using a tag-value pair mechanism. Thus for simple extensions of molecules, atoms, or bonds, one can simply use this arbitrary data method to associate additional data with the classes, rather than needing to derive or wrap a new molecule, atom, or bond class.

Since all of the classes discussed so far are abstract (i.e., they define only an interface), they cannot be explicitly instantiated by a user. However, for ease of use we have designed some concrete wrapper classes. These classes are the OEGraphMol, OEQMol and OEMol. OEGraphMol and OEMol correspond to the OEMolBase and OEMCMolBase respectively, and OEQMol is the corresponding wrapper for query molecules.
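The relationship between an abstract interface, its implementations, and a concrete wrapper can be sketched in a few lines of Python. This is a toy model only: the class names MolBase, SimpleMolImpl and GraphMol are hypothetical stand-ins chosen to echo the description above, and nothing here reflects how OEChem is implemented internally.

```python
from abc import ABC, abstractmethod


class MolBase(ABC):
    """Toy abstract molecule interface (a stand-in, not OEMolBase itself)."""

    @abstractmethod
    def NumAtoms(self):
        """Return the number of atoms in the molecule."""


class SimpleMolImpl(MolBase):
    """One hypothetical implementation of the interface."""

    def __init__(self, atoms):
        self._atoms = list(atoms)

    def NumAtoms(self):
        return len(self._atoms)


class GraphMol(MolBase):
    """Concrete wrapper: owns an implementation and forwards calls to it."""

    def __init__(self, impl=None):
        # A default implementation is chosen when none is requested.
        self._impl = impl if impl is not None else SimpleMolImpl([])

    def NumAtoms(self):
        return self._impl.NumAtoms()


print(GraphMol(SimpleMolImpl(["C", "O"])).NumAtoms())
```

The abstract class cannot be instantiated, while the wrapper can be created directly and passed anywhere the interface is expected, which is the role the text assigns to the concrete wrapper classes.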


OEGraphMols and OEMols can be declared by users and should be the primary molecules used in most high-level OEChem code. Any OEChem function which is defined for OEMolBases is also defined for OEGraphMols, and an OEGraphMol can be passed to any function which takes an OEMolBase argument. Similarly, OEMols have the same API as OEMCMolBases and can be passed to any function which takes an OEMCMolBase or OEMolBase.

Above we spoke of several different implementations of OEMolBases. The different implementations can be obtained by using an unsigned int argument in the OEGraphMol, OEQMol, and OEMol constructors. Finally, these concrete classes act as smart pointers around the implementations of the abstract classes, so there is no need for a user to clean up memory used by these versions of the molecules.

Since OEAtomBases, OEBondBases, and OEConfBases can only be accessed through their parent molecules, there is no need for concrete instances of these classes. In OEChem, these three classes are accessed via pointers to the respective base classes, or through the iterator interface discussed elsewhere.

32.2 Objects and Free-Functions

OEChem is an object-oriented library. However, we have taken a design philosophy that objects are primarily data containers with data access member functions. Most powerful data analysis and manipulation routines in OEChem are implemented as free functions rather than member functions. We have used namespaces to keep this decision from cluttering the global namespace. This decision to avoid the "kitchen-sink" model allows simple light-weight objects which are relatively easy to derive from and implement.
A user can take advantage of the parts of the API they need without using objects with enormous and confusing APIs.

32.3 Programming Layers: The Deep and Twisted Path

In designing OEChem, we strove to provide a library which puts powerful algorithms in the hands of novice users without hand-cuffing the expert. For this reason, OEChem can at the same time seem trivial and overwhelming. There are often several ways to carry out certain tasks in OEChem, each with its subtle advantages, which can benefit an experienced user. There are very few algorithms we have shied away from including in OEChem, and many of the methods are new, unique and powerful. This gives OEChem a very rich interface, yet to gain this efficiency and power OEChem may force you to think about problems in ways you are not accustomed to. We hope that you benefit from the experience. If you find that you cannot carry out a task with OEChem, or that it is difficult to do so, please contact us.

OEChem has several layers of interfaces to most of its functionality. There are "high-level" interfaces, which provide the user with an enormous amount of power with minimal code. This level is exemplified by the second code listing in the chapter "Reading and Writing Molecules" (see below). In this example, the functionality of the "babel" file-format conversion program is re-implemented in about 10 lines of code. While this is trivial to write and understand (maybe after you've finished the manual), it should not belie the fact that OEChem is carrying out an enormous amount of work under the surface.

A perhaps "mid-level" interface in OEChem is the generic data interface. Once you are familiar with OEChem's molecules, you may find that none of them are specifically tailored to your task.
Rather than having to delve deeply into the molecule API and derive your own molecule for the task, you can simply extend the API of molecules at run-time to suit your needs. While this functionality is perhaps not for the first-day user, it certainly doesn't require a stout heart.
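The idea of extending an object at run-time through tag-value pairs, instead of deriving a new class, can be sketched as follows. This is a toy model, not the OEBase implementation: the classes TaggedBase and ToyAtom and the tag "my_score" are hypothetical, and the method names SetData and GetData are borrowed from the generic data functions mentioned in the release notes purely for familiarity. Their real signatures and behavior may differ.

```python
class TaggedBase(object):
    """Toy model of tag-value storage on a base class."""

    def __init__(self):
        self._data = {}

    def SetData(self, tag, value):
        """Associate an arbitrary value with a string tag."""
        self._data[tag] = value

    def GetData(self, tag):
        """Return the value stored under tag, or None if absent."""
        return self._data.get(tag)


class ToyAtom(TaggedBase):
    """Hypothetical atom class that inherits the tag-value mechanism."""

    def __init__(self, symbol):
        TaggedBase.__init__(self)
        self.symbol = symbol


atom = ToyAtom("N")
atom.SetData("my_score", 0.5)  # attach run-time data, no new subclass needed
print(atom.GetData("my_score"))
```

The design point is that the atom class itself never needs to know about "my_score"; the extra property lives alongside the object and can be added or ignored by each application.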


Finally, for advanced programmers, OEChem provides access to nearly all of the detailed control of our functions. OEChem molecules have a simple API which can be used to derive your own molecules. The free-function heavy API discussed above then lets you have access to much of OEChem's power without having to implement an enormous molecule API yourself. Similarly, many of the functions that are wrapped in high-level functions (like the molecule readers and writers) are also available directly to the user at the low level. For instance, while you may use oemolostreams to write molecules to output, you can also use the OEWriteMDLFile functions for more direct access.

If you ever have difficulty implementing the functionality you desire with the OEChem interface you already know, look deeper. It will often be the case that a lower level function will allow you the control you are seeking.
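The benefit of a free-function heavy design for user-derived molecules can be shown with a small sketch. Everything in it is hypothetical: MinimalMol, its GetAtomicNums method and the count_heavy_atoms function are invented for illustration and are not OEChem names. The point is only that an algorithm written as a free function needs nothing from the molecule beyond a small data-access interface.

```python
class MinimalMol(object):
    """Hypothetical user-defined molecule exposing only a tiny interface."""

    def __init__(self, atomic_numbers):
        self._atomic_numbers = list(atomic_numbers)

    def GetAtomicNums(self):
        """Iterate over the atomic numbers of the atoms."""
        return iter(self._atomic_numbers)


def count_heavy_atoms(mol):
    """Free function: works with any object providing GetAtomicNums()."""
    return sum(1 for z in mol.GetAtomicNums() if z > 1)


# Ethanol written out atom by atom: two carbons, one oxygen, six hydrogens.
ethanol = MinimalMol([6, 6, 8, 1, 1, 1, 1, 1, 1])
print(count_heavy_atoms(ethanol))
```

Because the analysis lives outside the class, a new molecule type gains it for free as soon as it supplies the data-access method, rather than having to reimplement a large member-function API.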


CHAPTER THIRTY-THREE

Bibliography

1. Sheila Ash, Malcolm A. Cline, R. Webster Homer, Tad Hurst and Gregory B. Smith, "SYBYL Line Notation (SLN): A Versatile Language for Chemical Structure Representation", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 37, No. 1, pp. 71–79, 1997.

2. Wolf Dietrich Ihlenfeldt and Johann Gasteiger, "Hash Codes for the Identification and Classification of Molecular Structure Elements", Journal of Computational Chemistry, Vol. 15, No. 8, pp. 793–813, 1994.

3. Geoffrey M. Downs, Valerie Gillet, John D. Holliday and Michael F. Lynch, "Review of Ring Perception Algorithms for Chemical Graphs", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 29, No. 3, pp. 172–187, 1989.

4. Johann Gasteiger and Clemens Jochum, "An Algorithm for the Perception of Synthetically Important Rings", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 19, No. 1, pp. 43–48, 1979.

5. Paul Labute, "Automatic Assignment of Bond Order", Journal of the Chemical Computing Group, On-line, May 1996.

6. Andrew R. Leach, Daniel P. Dolata and Keith Prout, "Automated Conformational Analysis and Structure Generation: Algorithms for Molecular Perception", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 30, No. 3, pp. 316–324, 1990.

7. Douglas Lloyd, "What Is Aromaticity?", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 36, No. 3, pp. 442–447, 1996.

8. Simon K. Kearsley, "A Quick Robust Method for Assigning A Kekulé Structure", Computers in Chemistry, Vol. 17, No. 1, pp. 1–10, 1993.

9. H.L. Morgan, "The Generation of a Unique Machine Description for Chemical Structures Developed at Chemical Abstracts Service", Journal of Chemical Documentation, Vol. 5, pp. 107–113, 1965.

10. Barbara L. Roos-Kozel and William L. Jorgensen, "Computer-Assisted Mechanistic Evaluation of Organic Reactions 2: Perception of Rings, Aromaticity and Tautomers", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 21, No. 2, pp. 101–111, 1981.

11. Craig A. Shelley and Morton E. Munk, "Computer Perception of Topological Symmetry", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 17, No. 2, pp. 110–113, 1977.

12. Craig A. Shelley and Morton E. Munk, "An Approach to the Assignment of Canonical Connection Tables and Topological Symmetry Perception", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 19, No. 4, pp. 247–250, 1979.

13. David Weininger, Arthur Weininger and Joseph L. Weininger, "SMILES. 2. Algorithm for Generation of Unique SMILES Notation", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 29, No. 2, pp. 97–101, 1989.

14. H. Bunke, P. Foggia, C. Guidobaldi, C. Sansone and M. Vento, "A Comparison of Algorithms for Maximum Common Subgraph on Randomly Connected Graphs", Lecture Notes in Computer Science, Vol. 2396, Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, pp. 123–132, 2002. (The OEChem MCSS is based on the "state space search" method described therein.)


CHAPTER THIRTY-FOUR

Release Notes

34.1 OEChem 1.6.1

34.1.1 OEChem

New features

1. RegisterMolParameters renamed to OERegisterMolParameters. However, the function is now automatically called at link time, relieving the user of having to call this function in order to use molecules with OEInterfaces.
2. OEReadXYZFile uses the "Unichem" numeric convention for elements that don't have a one or two-character IUPAC symbol.
3. Added const string& variants of the OEParseSmiles and OESmilesAtomCount functions to the OEChem API.

Major bug fixes

1. The OEPerceiveBondOrders() function is improved by:
   (a) distinguishing unsaturated allene (C=C=C) from propyne (C#CC)
   (b) assigning the double bond to the shorter of the bonds to a terminal oxygen in functional groups such as *C(=O)O or *C(=O)[O-]

Minor bug fixes

1. Hydrogen bond lengths for As, Ge, Se and Te elements are added to improve the 3D coordinates assigned for explicit hydrogens by the OESet3DHydrogenGeom function.
2. Optimization of OEDBMol compression.
3. A memory leak was fixed in OERMSD.
4. A bug was fixed in the binary write routine of OEGraphMol and OEMol parameters that caused problems during a multiprocessor run.


5. A memory problem that occurred when reading molecules with a large number of conformations was fixed by allocating memory from the heap rather than the stack in such cases.
6. Using OEMol::operator= (const OEMolBase &) on itself will no longer crash.

34.1.2 OESystem

New features

1. A new constructor was added to the OEInterface class that takes the interface data along with the argc and argv arguments of main(). Using this constructor, an OEInterface object is configured, parses the command line, and prints the help message if requested.

Major bug fixes

1. The direction of the rotation matrix given by the OEGeom3DEulerToRotMatrix and OEGeom3DQuaternionToRotMatrix functions is now consistent with the other geometry handling routines.
2. A memory leak that could occur in a multi-threaded environment (such as Java) was fixed.

Minor bug fixes

1. OEParseCommandLine prints a more sensible error message.

34.1.3 OEPlatform

Minor bug fixes

1. Retrieving the host identifier for the Mac Darwin platform is improved.

34.2 OEChem 1.6.0

34.2.1 OEChem

New features

1. OEChem's PDB residue perception code now follows the changes required by the PDB version 3.0 standard. This includes disambiguation of the DNA residues "DA", "DC" and "DG" from the RNA residues "A", "C" and "G". Nucleic acid backbone atom names now end in an apostrophe instead of a star/asterisk. The default ligand name in OEChem is now "UNL" instead of "MOL" or "LIG".


2. There have been significant improvements to OEChem's bond order perception code, including phosphates, thiophosphates, dithioic acids, oximes, aldoximes, sulfur oxides, sulfites and iron-sulfur clusters.
3. The cis/trans detection logic in OE3DToBondStereo has been made more robust for 2D depictions containing bonds that are co-linear with a chiral double bond or have zero length. The code now continues searching for additional incident bonds that have non-zero length and aren't co-linear.
4. Significant improvements have been made to OEChem's DNA/RNA perception code. The code now recognizes truncated RNA biopolymers, and recognizes the bases "1MA", "2MC", "5MC", "5MU", "7MG", "M2G", "OMG", "OMC", "PSU", etc.
5. OEChem's residue perception code now recognizes a much larger set of common co-factors and ligands, including "ADP", "ATP", "DMS", "EDO", "FAD", "HEM", "NAD", "NAG", "PEO", etc.
6. OEChem's residue perception code now handles the non-standard amino acid ornithine (ORN).
7. A new OEAssignFormalCharges function variant has been added to the OEChem API to allow assignment of formal charges on a specific OEAtomBase*. The existing OEAssignFormalCharges function continues to operate on the whole molecule.
8. A new OEPreserveResInfo::AtomName flavor has been added to the OEPreserveResInfo namespace to allow the OEPerceiveResidues function to preserve the original PDB atom name.
9. The OpenEye formal charge model has been extended such that a four-valent aluminum (aluminium) now has an implicit negative charge, and aluminum ions have a +3 formal charge. The model has also been tweaked to consider the sulfurs in iron-sulfur clusters as neutral [S] radicals.
10. The OpenEye hydrogen count model has been tweaked to prefer five-valent phosphorus, such as O=[PH2]O, over three-valent O=PO.
11. The values returned by OEGetAverageWeight have been updated and revised to follow the latest (2007) recommendations of the IUPAC Commission on Isotopic Abundances and Atomic Weights.
12. The OEParseSmarts and OEParseSmirks functions have been enhanced to allow a TAB character '\t' to be treated as a separator after a SMARTS pattern. This matches the behavior of the SMILES parser, OEParseSmiles, and simplifies the task of writing "patty"-like applications.
13. The OEChem SMILES and SMARTS parsers have been tweaked to allow the backslash used in specifying cis/trans stereochemistry to be duplicated in the input string, i.e. "C\\C=C\\C" is now interpreted as "C\C=C\C". This is convenient when working with programming languages such as C and C++ where the backslash is used as an escape character. Embedding SMILES in C/C++ source files requires that the strings look like "C\\C=C\\C", which previously couldn't be cut and pasted like regular SMILES strings.
14. The interpretation of acyclic aromatic elements by OEChem's SMILES reader now more accurately follows the Daylight toolkit. For example, "n" is interpreted as "[NH2]" and not "[N]", and "nn" now means "N=N" instead of "[N]=[N]".
15. Fixed an obscure corner case in the OEChem SMILES writer, when not performing aromaticity perception and using the low-level SMILES writer. We need to preserve the explicit single bond (hyphen/minus) in "[cH2]-[cH2]", otherwise "[cH2][cH2]" would get interpreted like cc and result in c=c and [cH2]=[cH2].
16. The MDL file format reader has been enhanced to allow TABs in addition to spaces as separators in "M  CHG", "M  RAD" and "M  ISO" lines.
17. The Sybyl .mol2 file format reader has been enhanced to recognize pyrylium-like ring systems containing charged oxygen atoms. A minor bug has also been fixed that could assign inappropriate formal charges to substituted nitrates.


18. In Sybyl .mol2 format files, the atom types 'd' and 't' are now treated identically to 'D' and 'T' and interpreted as Deuterium and Tritium respectively. Previously, they'd be interpreted as hydrogen atoms, but the isotope specification wasn't getting set.
19. The Tripos bond types in Sybyl .mol2 format files are now treated as case-insensitive. We now treat "AR" and "Ar" as identical to "ar", and "AM" and "Am" as identical to "am", etc.
20. The CambridgeSoft CDX file format reader has been significantly rewritten to address bugs in the reading/writing of 3D co-ordinates.
21. The OpenEye OEB file format reader is now more robust to invalid, corrupt and/or truncated input files.
22. Added versions of OEIsReadable and OEIsWritable that can directly take a filename or extension.
23. Significant work was done on SD data robustness.
    (a) If SD data was attached to an OEMCMolBase and an OEConfBase, then written to OEB and read back into an OEMolBase, the data from the OEConfBase would appear to disappear. This would result in losing the data if then written to SDF.
    (b) SD data attached to an OEMolBase, written to OEB, then read into an OEMCMolBase will now be attached to the OEConfBase instead of the OEMCMolBase.
    (c) Significant speed improvements were made to SD data (e.g. csv2sdf saw a 4-fold speed improvement).
    (d) See 'Dude, where's my SD Data?' on page 31 in the programming theory manual for more details.
24. The OEReadPDBFile and OEWritePDBFile functions are able to read and write ANISOU records, respectively. ANISOU records, which are atom properties representing anisotropic temperature factors in PDB, are scaled by a factor of 10^4 and represented as integers.
25. The OEGetCenterOfMass function, which computes the center of mass of a molecule (with or without atomic weights), was added to the OEChem namespace.

Major bug fixes

1. The algorithm that generates canonical SMILES did not ignore cis/trans stereo hydrogens and produced '[H]N=CC', rather than the correct 'N=CC' canonical SMILES. Even though this bug fix has affected only a small percentage of canonical SMILES, we highly recommend the regeneration of all canonical SMILES.
2. Small improvements have been made to the generation of canonical isomeric SMILES.
3. A problem has been fixed in OEChem's Kekulization algorithms for large molecules (with between 250 and 1000 atoms) that can't be assigned a valid Kekulé form. The changes in OEChem 1.5.0 that attempted to assign as much of a Kekulé form as possible upon failure could occasionally lead to OEKekulize returning true for an invalid molecule.
4. A performance problem in OEChem's aromaticity perception has been resolved. Previously, pathological substituted fullerenes and PAHs could cause OEChem's aromaticity routines to take over a minute to perceive all of the conjugated cycles. Algorithmic improvements to OEChem's aromaticity perception now allow all of the reported cases to be processed in a fraction of a second.
5. A rare problem interpreting the stereo from wedge/hash bonds around atoms of degree three has been resolved. When we have two bonds in the plane, and the third marked as a wedge or a hash, we need to determine whether the raised/lowered bond is in the larger or smaller sector subtended by the two in-plane bonds. A bug in this code failed to handle the case when all three bonds lay in the same half-circle. This problem is extremely rare; for example, no cases were found in the 250,251 MDL connection tables distributed by the NCI as the NCI August 2000 database.


6. A memory leak problem was fixed in OELibraryGen.

Minor bug fixes

1. The OEChem MDL mol file reader has been improved to allow the dimension field in the connection table header line to be omitted, and still correctly decide whether to process wedge/hash bonds or determine chirality from 3D co-ordinates. Previously, the molecule's stereochemistry would be set incorrectly if the optional header line was missing.
2. The MDL file reader now perceives aromatic cycles using the MDL aromaticity model prior to calling OEPerceiveChirality. This ensures that alternate Kekulé forms of substituted phenyl rings (for example) don't inappropriately split symmetry groups, causing achiral double bonds to acquire specified cis/trans stereochemistry.
3. An aesthetic improvement has been made to the rules used in the OEMDLPerceiveBondStereo function that assigns wedge and hash bonds to depictions. For acyclic bonds, we now prefer to place the wedge or hash on bonds to non-ring atoms. A typo in the previous rules reversed this priority.
4. The OEChem SMILES writer was being miscompiled on IBM AIX 5.x, resulting in canonical SMILES that differed from those on other platforms. The code has been rewritten to avoid the issue in IBM's xlC compiler, so the SMILES are once again identical to the other platforms.
5. The OEChem PDB file parser has been updated to reflect the latest atom name exceptions in the RCSB/wwPDB database. These changes should eliminate the spurious Holmium and Helium atoms perceived in recently added ligand residues.
6. Numerous small performance improvements have been made to OEChem.
7. The torsion cutoff values for perceiving cis/trans bond stereo from 3D are relaxed in the OE3DToBondStereo function. The cis cutoff is increased to 30 from 15, and the trans cutoff is lowered to 150 from 165.
8. Even when the maximum number of matches is set, the MCS search cannot be terminated upon reaching this limit, since there is no guarantee that the maximum common substructure has been detected. Instead, the search continues, then the best N matches are returned, where N is set by OEMCSSearch.SetMaxMatches.
9. The exhaustive and the approximate MCS algorithms no longer use different functions to determine whether a match is unique or not. Several other small problems were fixed in order to ensure that all matches located by the approximate method are also detected by the exhaustive one.
10. After changing the atomic number of an atom with OEAtomBase.SetAtomicNum, the aromaticity and chirality of the molecule have to be reperceived with the OEAssignAromaticFlags and OEPerceiveChiral functions, respectively.
11. Improved the stability of OEPerceiveResidues to not reorder atoms.
12. An API point was added to OELibraryGen to change the character used to separate product molecule titles when concatenating reaction molecule titles together. For more information see OELibraryGen.SetTitleSeparator and OELibraryGen.GetTitleSeparator in the API manual.
13. A rare problem occurred in the substructure search when hydrogen atoms were matched first. This problem has been solved by rearranging the order in which atoms are taken into consideration, moving hydrogens to the end of the match order. Other small modifications have been made to improve the performance of substructure search.


34.2. OEChem 1.6.0

14. OEFindRingAtomsAndBonds is automatically called to perceive rings in structures returned by OEUniMolecularRxn or OELibraryGen.
15. A bug was fixed that caused OEGraphMol and OEMol parameters to always fail to load in RegisterMolParameters.

Changes in this documentation

1. OEDeleteSDData was improperly documented in the theory manual (‘Manipulation of Tagged Data’ on page 22). The old documentation stated that only the first instance of a tag was deleted when all instances of the tag were actually deleted. The documentation has been corrected to state that all instances of a tag are deleted.
2. ‘Maximum Common Substructure Search’ on page 70 has been revised, adding new examples, explaining the difference between the exhaustive and the approximate methods and providing more details about the built-in MCS scoring functions.
3. Figures have been added to ‘OEExprOpts Namespace’ on page 76 in order to demonstrate the effect of various atom and bond expression options on pattern matching.
4. C++, Python, and Java manuals brought into closer alignment with each other.

34.2.2 OESystem

New features

1. OEInterface allows multiple values per parameter. The values can be accessed by calling the GetList template member function of OEInterface.
2. The OEAnnotate class provides the ability to attach various graphical objects (sphere, box, surface, etc.) to classes derived from OEBase by using the generic data functions OEBase.SetData and OEBase.GetData.

Minor bug fixes

1. The memory allocation performance of multi-threaded OEChem applications on both Windows and recent Linux/UNIX distributions (that use pthreads) has been dramatically improved. A new thread hashing algorithm is now used in OESystem’s memory pooling code which should dramatically reduce contention in allocation heavy multi-threaded code.
2. A number of minor performance and numerical stability improvements have been made to OEMath’s geometry routines.
3. When parsing the command line, --help -foo is no longer sensitive to the case of -foo.
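The case-insensitive help lookup in the last item can be pictured with a small stand-alone sketch. It does not use OEInterface or any OpenEye call; the `find_help` function and the `PARAMETERS` table are hypothetical names invented for illustration, and the real parser may resolve names differently.

```python
def find_help(topic, parameters):
    """Look up help text for a parameter name, ignoring letter case."""
    wanted = topic.lower()
    for name, text in parameters.items():
        if name.lower() == wanted:
            return text
    return None  # no parameter with that name, in any casing


# Hypothetical parameter table, for illustration only.
PARAMETERS = {
    "-foo": "Description of the foo parameter.",
    "-maxMatches": "Maximum number of matches to report.",
}
```

With this table, `find_help("-FOO", PARAMETERS)` and `find_help("-foo", PARAMETERS)` return the same help text.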


Chapter 34. Release Notes

34.2.3 OEPlatform

Minor bug fixes

1. The Microsoft Windows implementation of OEMutex has been rewritten to use interlocked intrinsics and semaphores instead of the Windows Mutex objects, which were much slower.
2. The OETryMutex class is now more efficient, using a platform-specific implementation on each supported target.

34.3 OEChem 1.5.1

34.3.1 Bug fixes

1. Fixed two bugs in kekulization of large molecules. First, some large molecules would fail kekulization when they were actually ok. Second, even when they were kekulized correctly, the method would still return false.
2. This tweaks the MDL mol file reader to use the test "dimension != 3" instead of "dimension == 2" when deciding to honor the wedge/hash bonds or to determine the chirality from 3D co-ordinates. The subtle difference is that at the point this code is called, the dimension is not necessarily "2" or "3" if the (optional) header line is missing. If the header has been omitted, we treat the molfile like a 2D file (which it most probably is).
3. Changed SMARTS parser to allow a TAB character ('\t') to be treated as a separator following a SMARTS pattern. This reflects similar functionality in the SMILES parser and simplifies the task of writing "patty"-like applications.

34.3.2 New features

1. Added new OEAssignFormalCharges function that operates on a single atom instead of just a version for the entire molecule.
2. Renamed the OESystem::ParamVis namespace to OESystem::OEParameterVisibility to make it consistent with other OpenEye namespaces.

34.4 OEChem 1.5.0

OEChem 1.5.0 is a new release including many major and minor bug fixes along with several new features. This is also a continuation of a complete release of all OpenEye toolkits as a consolidated set so that there are no chances of incompatibilities between libraries.

Note that in this release the directory structure has been changed to allow multiple versions of the toolkits to be installed in the same directory tree without conflicts. From this release on, all C++ releases will be under the openeye/toolkits main directory. There is then a directory specific to the version of the release and then below that, directories for each architecture/compiler combination. To simplify end user Makefiles, “openeye/toolkits/lib”, “openeye/toolkits/include”, and “openeye/toolkits/examples” are all symlinks to the specific last version and architecture that was installed.

New users should look in “openeye/toolkits/examples” for all the examples. Existing users updating existing Makefiles should change their include directory from “openeye/include” to “openeye/toolkits/include”. As well, existing Makefiles should change the library directory from “openeye/lib” to “openeye/toolkits/lib”.

34.4.1 New features

1. OEChem now has a 2D similarity implementation using the Lingos method of similarity. Lingos compares Isomeric SMILES strings instead of pre-computed fingerprints. This combination leads to very rapid 2D similarity calculation without any upfront cost to calculate fingerprints and without any storage requirements to store fingerprints.
2. MMFF94 charges are now available in OEChem. While we recommend AM1-BCC charges as the best available charge model, having MMFF94 charges available at the OEChem level means that decent charges are available to all toolkit users.
3. In OEChem 1.2, there was an alternate implementation of MCS that used a fast, approximate method for determining the MCS. While it is less than exhaustive, the speed does have some appealing uses. In OEChem 1.5, we’ve restored this older algorithm and now both are available.

    namespace OEMCSType
    {
      static const unsigned int Exhaustive = 0;
      static const unsigned int Approximate = 1;
      static const unsigned int Default = Exhaustive;
    }

   OEMCSType_Exhaustive implies the current, exhaustive algorithm from OEChem 1.3 and later, while OEMCSType_Approximate implements the older, fast but approximate algorithm.
4. The ability to get the license expiration date when calling OEChemIsLicensed has been added.
5. Molecules (OEMol, OEGraphMol) can now be attached to an existing OEBase as generic data and they will be written to OEB and read back in. Additional support for attaching grids and surfaces to molecules has been added to Grid and Spicoli.
6. There is a new retainIsotope flag to OESuppressHydrogens. If false, [2H] and [3H] will also be removed by this call. By default, this is true so that the current behavior of OESuppressHydrogens is identical to the previous version.
7. The OEChem CDX file reader can now Kekulize aromatic (single) bonds in the input ChemDraw file. It switches the internal bond order processing to use the bond’s integer type field, and then calls "OEKekulize" to do all of the heavy lifting.
8. Tweaks to the algorithm used for determining which bond(s) around a stereocenter should bear a wedge or a hash. The bug fixed here includes an example where all three neighbors are stereocenters, but two are in a ring and one isn’t.
9. There are new versions of OEIsReadable and OEIsWriteable that take a filename directly.
10. More exceptional atom naming support for the PDB residues CO6 (pdb2ii5), SFC (pdb2gce), RFC (pdb2gce), MRR (pdb2gci), MRS (pdb2gd0), FSM (pdb2cgy) and YE1 (pdb2np9) has been added.
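The idea behind the Lingos feature in item 1 of this list (comparing SMILES text directly rather than precomputed fingerprints) can be sketched in a few lines of plain Python. This is a simplified illustration, not OpenEye's implementation: the substring length q = 4, the multiset Tanimoto scoring, and the function names `lingos` and `lingo_tanimoto` are all assumptions made for the example, and no SMILES normalization is performed.

```python
from collections import Counter


def lingos(smiles, q=4):
    """Return the multiset of overlapping q-character substrings of a SMILES."""
    if len(smiles) < q:
        # Strings shorter than q contribute themselves as a single "lingo".
        return Counter([smiles]) if smiles else Counter()
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_tanimoto(smiles_a, smiles_b, q=4):
    """Tanimoto coefficient over the two substring multisets (0.0 to 1.0)."""
    a, b = lingos(smiles_a, q), lingos(smiles_b, q)
    shared = sum((a & b).values())   # multiset intersection size
    total = sum((a | b).values())    # multiset union size
    return shared / total if total else 0.0
```

For example, `lingo_tanimoto("CCCCO", "CCCCN")` compares the substrings {CCCC, CCCO} with {CCCC, CCCN}: one shared substring out of three distinct ones. Because the score depends on the literal text, both inputs should be written in the same canonical form, which is why the notes above refer to Isomeric SMILES strings.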


34.4.2 Major bug fixes

1. The Bondi VdW radii tables have been completely updated and fixed. Additionally, any element not covered by the original Bondi paper is assigned a radius of 2.0 Angstroms.
2. Corrected a serious glitch in the OEChem SMILES parser, where "[X-2]" would be incorrectly interpreted as "[X+2]". This bug was an unintentional side-effect of the recent changes to warn about "[X+0]" and "[X-0]" when the parser is in strict mode.
3. Fixed a bug in Kekulization of very large molecules. Almost all small molecules continue to use the faster light-weight method, but we avoid exponential behavior for the large non-Kekulizable cases.
4. Fixed a bug dealing with the semantics of wedge/hash bonds when the thin end is on a terminal atom. Clearly, in such cases the author probably intended to specify some aspect of the chirality of the atom at the other end of the bond. First, when drawing such structures into MDL’s ISIS/Draw, it does not perceive specified chirality at the thick end of a wedge/hash bond from a hydrogen. However, this patch adds support for conveniently handling this case via OEMDLHasIncorrectBondStereo and OEMDLCorrectBondStereo, which notice this case, and either invert the sense of the bond if the other end has explicit degree three or four, or remove the ignored "wedge/hash" bond annotation.
5. Bug-fix improvement to CorrectMDLBondStereo such that whenever we introduce a new wedge/hash bond (on a previously unmarked bond), we now ensure that we correctly set the thin end on the appropriate chiral atom.
6. Fixed a bug in OEB reading/writing that makes sure that any unknown tags are simply passed from input to output without loss.
7. Fixed a bug in reactions where the final valence was being computed incorrectly, such that the bond orders aren’t modified directly until after valence correction is applied.

34.4.3 Minor bug fixes

1. Tweaked the way the SMILES parser calculates implicit hydrogen counts on unbracketed aromatic atoms in SMILES strings. This fixes "Scc" and "S1cccc1".
2. OEParseCommandLine now lists the legal and illegal values for a parameter if the user specifies an invalid value for the parameter.
3. Fixed a bug where a warning was issued when writing >999 atoms to a V3000 MOL file. V3000 MOL files can contain >999 atoms, unlike V2000 files.
4. Fixed a bug where round-off differences between Windows and Linux could result in stereo perception differences for 2D coordinates with angles very nearly equal to 180 degrees.
5. Corrected the atomic number perception in the PDB file reader for the residues "G5P", "COK" and "COZ" (from the recently added PDB files pdb2f35, pdb2ges and pdb2g2z).
6. Added residue perception support for the non-standard amino acids "LYZ" and "MEN", which denote 5-hydroxylysine and N-methylasparagine.
7. Fixed a bug where OEGetHybridization() failed to classify the sulfur in C=S as sp2 hybridized.
8. Completed the coverage of the MDL valence model to all atoms to handle charges less than or equal to -2, or greater than or equal to +2. Many of the common cases of these strange charge states were already handled.


9. The generic data reader now calls AddData instead of SetData. This means that multiple objects with the same tag can be handled appropriately.
10. Added checks to prevent division-by-zero in several geometry routines in OEMath.
11. OESetComment is now safe when given either a NULL pointer for the comment or a NULL pointer for the OEAtomBase.
12. Fixed copy constructor and assignment operator for OECoordArray.

34.5 OEChem 1.4.2

OEChem 1.4.2 is a bug fix release including dozens of minor and major bug fixes. It also includes numerous new minor features, and also brings significant practical changes. OEChem 1.4.2 is the first OEChem that is being released along with all the other public OpenEye toolkits. This will be a major step forward for inter-operability between different OpenEye toolkits and will open the way for easier support of customer applications that span multiple OpenEye toolkits.

OpenEye continues to be committed to maintaining a stable OEChem API and thus any programs written with previous versions of OEChem should re-build with the 1.4.2 version of OEChem. However, as is the nature of C/C++ code, particularly C++ template code, some of the bug fixes are in header files, so dependent libraries such as most other OpenEye libraries may require a re-link.

OEChem 1.4.2 will be the first release in which OpenEye is beginning to move toward synchronizing our library releases. This will remedy problems some people have experienced with link-problems when combining three or more OpenEye libraries into a single application.

34.5.1 OEPlatform

Minor bug fixes

1. Improved support for both MSVC and mingw on 32 bit Windows.
2. Improved 64 bit support.
3. Improved compatibility with HPUX and IRIX64.
4. Improved support for gcc 4.x.

Major bug fixes

1. Fixed bug in oeistream::oeclose that prevented oemolistreams::close from returning the molstream to a default state.
2. Fixed a size-check bug in low-level binary i/o routines.


34.5.2 OESystem

Minor bug fixes

1. Improved efficiency of sorting in OEIter class and fixed bug in the OEBinaryPredicate sort implementation.
2. Added Push, Sort and the NonConst constructor to the primitive specializations of OEIter.
3. Allowed weighting of coordinates in inertial tensor and inertial alignment routines.
4. Fix OEBitvector::FromHexString function to skip over end of line characters in the input.
5. Allow OEBitvector to properly read std::vector<bool>.
6. Allow quoting parameter values in OEInterface parameter files.
7. Fixed bug in wildcard matching for OEInterface string parameters.
8. Fixed auto-link warnings resulting from “--help html”.
9. Added OEFixedGrid::SetValues (note, the size of the grid is not mutable).
10. Fixed float/double precision bug in OEStringToNumber.

Major bug fixes

1. Fixed bug that caused all the OEErrorHandler parameters to be reset to default values when the stream was reset.
2. Increased error handler buffers to 8K to prevent crashes with very large warning and error messages.
3. Added specific handling of OEErrorLevel::Verbose that was previously missing.
4. Protect Hanoi sort routine from sorting lists of size 0 or 1.
5. Prevent parameter mismatches in OEMultiGrid’s owned grids.

34.5.3 OEChem

Minor bug fixes

1. Dramatic improvement in the search speed of disconnected SMARTS such as (*).(*).
2. Speed optimizations were added to molecular geometry manipulations.
3. Fixed very rare search-path bug in aromaticity perception.
4. Allow support for C[N@H]O and C[N@@H]O in SMILES output.
5. Fixed side-effects of OECanonicalOrderBonds on stereochemistry.
6. Added BondIntType I/O in OEBinary format.
7. Improve OEKekulize fall-backs so it fails gracefully.
8. Add support for fully explicit Hydrogen flavor in SMILES (i.e. [CH4]).


9. Fixed bug in the over-application of constraint for MCS searches.
10. Added support for aromatic lead, aluminum, germanium and tin.
11. Improved support for MDL’s V3000 format.
12. Improved perception of cationic N.pl3 atoms with partial charge greater than one.
13. Added protection to SMILES writer for molecules with a terminal single bond that has stereochemistry specified.
14. Fixed initialization of OEBinaryHandlers.
15. Added support for the aromaticity model being used to affect chirality perception.
16. Improved carbon dioxide, acetonitrile, cyanamide, cyanic acid and thiocyanic acid fragment recognition in bond perception algorithms.
17. Fixed rounding error in OENetCharge function.
18. Fixed endian bug in CDX reader for OS X.
19. Added protection from molecule titles >4K in length.
20. Added proper support for comments in MOPAC files.
21. Improved copy construction and assignment of OEQAtoms and OEQBonds.
22. Added HETBONDS flavor to the PDB writer that causes any bond to or from a HETATOM to be written explicitly. The HETBONDS flavor is now part of the default output.
23. Added support for “UNK” atoms attached to a backbone and “LIG” atoms that are covalently bound ligands or post-translational modifications to be given the same chain ID and residue number as the residue to which they are bonded.
24. Added OESymmetryNumber to calculate the symmetry number of a given conformer within a user-defined threshold.
25. Fixed bug that lost SD data attached to an OEMol (not its conformers) when the molecule was written to OEB (Note, SD data is not generally stored on the parent molecule).
26. Fixed OEMCMolBase::OrderConfs OS X memory bug.
27. Fixed strip salts to only count heavy atoms in determining the fragment with the largest size. This makes the behavior remain consistent with implicit and explicit hydrogens.
28. Added support for ’x’ in SMARTS as defined in v4.9 of Daylight’s toolkit.
29. Prevent PDB writer from generating CONECT records that have the same serial number for the source and destination.
30. Fixed bug in automorphism calculations on molecules with hydrogen-defined cis-trans bond stereo.
31. Improved algorithm for perception of 3D bond stereo.
32. Added support for stable serial numbers, alternate locations, insertion codes and heteroatoms during calls to OEPerceiveResidues.
33. Improved heuristics for residue perception in the presence of unusual chain terminations.
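Item 27 above (counting only heavy atoms when choosing the largest fragment) is easy to see in a toy example. The sketch below uses no OEChem calls: fragments are represented, purely as an assumption for illustration, by lists of atomic numbers, and the names `heavy_atom_count` and `largest_fragment` are hypothetical.

```python
def heavy_atom_count(fragment):
    """Count atoms whose atomic number is greater than 1 (not hydrogen)."""
    return sum(1 for atomic_num in fragment if atomic_num > 1)


def largest_fragment(fragments):
    """Return the fragment with the most heavy atoms (first one wins ties)."""
    return max(fragments, key=heavy_atom_count)


# Hypothetical input: each fragment is a list of atomic numbers.
ammonium = [7, 1, 1, 1, 1]   # explicit hydrogens: 5 atoms, 1 heavy atom
formate = [6, 8, 8]          # implicit hydrogens: 3 atoms, 3 heavy atoms
```

Ranking by total atom count would pick `ammonium` here only because its hydrogens happen to be explicit, while ranking by heavy atoms picks `formate` regardless of how hydrogens are represented.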


Major bug fixes

1. Added safeguards against stack over-runs in all molecule implementations.
2. Fixed a major bug in rotor-offset compressed OEBinary file handling that made these molecules fragile with respect to molecule copies.
3. Fixed thread-safety of reading SMILES.
4. Fixed crash-bug when OEPerceiveResidues was called on unusual small-molecule peptide mimics such as “O=CC(C)NC(C)C=O”.
5. Fixed dummy-atom handling in the CDX reader. Also improves implicit-hydrogen handling, but limits the reader to reading a single connection table.
6. Corrected failure to include Isotope information in OEBinary files.
7. Fixed crash-bug from processing stereochemistry on MDL atoms with degree three having hashes and wedges.
8. A major oemolistream bug was fixed by fixing a stream bug (see OEPlatform’s Major bug fixes above).
9. Fixed a bug where OELibraryGen failed to convert an aromatic ring to aliphatic accurately.

34.5.4 OEBio

Minor bug fixes

1. Made command-line usage more explicit for subsetres example.
2. Improved output format at water categorization for rescount example.
3. Converted GetChis to OEGetChis.
4. Improved support for g++ 4.x.
5. Corrected bug so that a new chain automatically starts a new fragment and a new chain or new fragment automatically starts a new residue.
6. Fixed bug in OEGetAtoms from a residue’s flood-fill algorithm.

Major bug fixes

1. Changed OEHierView constructor to assume initial perception has already occurred. This prevents the constructor from over-aggressively re-perceiving residue information.

34.6 OEChem 1.4.1

OEChem 1.4.1 was an internal OpenEye release only. All associated release notes have been incorporated into the OEChem 1.4.2 release notes (see above).


34.7 OEChem 1.4.0

OEChem 1.4.0 is a major new feature release. OpenEye is introducing OEBio, a new programming library extending OEChem’s convenience in handling biopolymers. In this initial release, OEBio’s API is small but useful. Over the life of the 1.4.x OEChem release series the OEBio API will grow. The purpose of OEBio is not to cover Bioinformatics, but to extend OEChem’s strong cheminformatics foundation to conveniently support protein modeling.

The source-code and examples in /openeye/examples/oechem have long been caught in a conflict. They served both as very useful tools and as didactic coding examples. To fulfill the role as tools, they needed good command-line interfaces and error reporting. Unfortunately these features lead to more complex code. To fulfill a role as code examples, these programs need to be as simple as possible, highlighting one or two programming principles. In order to better serve both purposes, the example programs have now been split into /openeye/utilities and /openeye/examples: the first includes programs with more complex code and better interfaces and the latter simple OEChem code examples. In addition, nine new example programs have been included to demonstrate common uses of the OEBio api.

In addition to OEBio, the 1.4.0 release includes many new features and bug fixes in the OEChem, OESystem and OEPlatform libraries.

34.7.1 OEPlatform

New Features

1. Improved binary data handling in streams.
2. Significant improvements for user convenience in licensing code will allow future versions of OpenEye applications to manage licensing failures in a friendly manner.
3. New pipe streams (oepstream) added to the beta public api.

Minor bug fixes

1. Fixed bug in cross-platform directory searching and checking for files on a file system.
2. Fixed bug in oeigzstream::size that reported incorrect sizes in some instances.
3. Added the ability to detect moved home directories under Windows.

Major bug fixes

1. Fixed bug that prevented reading the final molecule in a file and then seeking to other positions in the file.
2. Fixed a 64bit stream seek and read bug that could cause memory overflows and crashes.


34.7.2 OESystem

New Features

1. Moved superpos and tensor2mat API points from OEChem to OESystem. Added deprecated support for their use in OEChem.
2. Added ability to assign an OEIterBase* to an OEIter object. This allows much wider use of iterators of const objects.
3. Made OEIter::Sort a stable sort.
4. Additional physical constants added to OEMath::OEConst.
5. Added the ability to parse OEInterface parameter files without use of command-line parsing.
6. New ELEMENT and FORMALCHARGE flavors for pdb writer. ELEMENT adds the atomic symbol to columns 77-78 and FORMALCHARGE adds non-zero formal charges in columns 79-80.
7. Extended the ExtBonds option from the “.smi” writer to the “.can” and “.ism” writers.

Minor bug fixes

1. Fixed OEGrid and OEMultiGrid constructor bug that could cause no memory to be allocated for the grid elements.
2. Corrected behavior of OEGrid::Clear to clear the OEBase data, remove the title and reinitialize all the elements of the grid.
3. Fixed rotation bug in inertial-frame alignment.
4. Fixed bug in the atom index into coordinates used while calculating the center of mass.
5. Fixed bug in the calculation of OEMultiGrid::SetSpacing and OEMultiGrid::SetMid functions.
6. Fixed OEInterface category name bug, !KEYLESS bug and unterminated category bug.

Major bug fixes

1. Protected the OEIter::Sort function from NaN (not a number) members.

34.7.3 OEChem

New Features

1. New support for highly compact “rotor-offset compressed” oeb files.
2. Added support for MDL ISIS Sketch file format with the “.skc” suffix.
3. Added support for writing hydrogens that are required for specifying cis-trans stereo.
4. Added support for “[Ds]” and “[Rg]” in SMILES and SMARTS.


5. Added support for writing high-atomic number atoms in SMILES using “[#123]” notation.
6. New OEWriteConstMolecule function class to support high-level writing of const molecules. Introduced return-codes for the high-level writers that reflect that some molecules are inherently not supported by certain file formats (e.g. >999 atoms in .sdf).
7. Add an OEOFlavor::MOL2::Substructure high-level writer flavor to force an “@TRIPOS” idiom in the .mol2 file.
8. New OEHasStereoHydrogens(OEAtomBase *) function that determines if an atom has a proton that is required to specify stereochemistry.
9. Added retainStereo=false default argument to OESuppressHydrogens that keeps hydrogens indicated by the OEHasStereoHydrogens function.
10. Added OEMatchBase::Clear function.
11. Dramatically improved efficiency of OEMCMolBase::DeleteConf for deleting large numbers of conformers in order. Worst case behavior of the algorithm was changed from N ∗ N to N.
12. Allow the SD file reader to handle a blank line between the “M END” and the “$$$$” lines.
13. Added convenience functions for getting and setting the MDL parity on atoms.
14. Added new bitmask initialization parameters to OEInitDefaultHandler that allow easy specification of which handlers to initialize.
15. New support for “h”, “d”, “t”, “[T]” and “[t]” non-standard SMILES representations.
16. Improved support for multiple NMR models in PDB files by reading, retaining and writing model number.
17. Added fully supported OEPDBData and OEPDBDataPair classes as well as the necessary functions to store and retrieve them from molecules.
18. Three new convenience functions for clearing tag data: OEClearTagData, OEClearSDData and OEClearPDBData.
19. Added support for determining whether the library is properly licensed with the OEChemIsLicensed function.
20. Added OEResidueHydrogens(OEAtomBase *) function that will rename hydrogens on a heavy atom to their proper PDB atom names.

Minor bug fixes

1. Added PDBData readers and writers to OEBinary file handlers.
2. Added defensive code to OEMolBase::DeleteAtom and OEMolBase::DeleteBond to confirm that the atom or bond are owned by that specific molecule.
3. Fixed rotation bug in inertial frame alignment.
4. Converted inconsistent “/” and “\” into a warning rather than an error, allowing the molecule to be parsed in a racemic fashion.
5. Added an upper bound to the degree of the atoms at either end of a cis-trans chiral double bond.
6. Added defensive code to prevent creation of atoms with atomic number greater than 255.


7. Improved perception of non-aromatic exo double-bonds. This corrects a problem perceiving the progesterone in pdb1a28.
8. Improved the exo-cyclic double bonds to sulfur. This improves the connectivity perception in 1hnv, 1rev, 1usn, 1uwb, 2usn, 3usn and 1zxv.
9. Improved the bond order perception of nitroso, oxime, azide, and arylhydroxylamine functional groups.
10. Improved bond order perception of clashed structures by allowing hydrogens to only bond to their nearest heavy atoms.
11. Prevent alternate conformation representations from being bonded to one another during bond perception.
12. Made Up/Down choice for the first stereo bond in each resonance system canonical for writing isomeric smiles files.
13. Made OECanonicalOrderBonds also order the bonds obtained with the OEAtomBase::GetBonds function call.
14. Fixed bug in binary search for atomic number “0” used in OEIsCommonIsotope, OEGetAverageWeight and OEGetIsotopicWeight.
15. Fixed the high-level pdb writer to preserve residue information found on the molecule.
16. Corrected OEIsReadable to return false for the MOPAC file format.
17. Added MOPAC flavors to the high-level molecule writers.
18. Changed the hybridization assignment of negatively charged resonant nitrogens such as *S(=O)(=O)[N-]C(=O)*.
19. Fix bug in OESet3DHydrogenGeometry that could use a hydrogen’s own coordinates as a reference for determining its geometry.
20. Fix ring perception bug in OEMCSMaxAtomsCompleteCycles.
21. Eliminate the redundancy between OEMDLSetBondStereo and OE3DToBondStereo by allowing OE3DToBondStereo to take an optional bond mask and work on 2D as well as 3D molecules.
22. Correct a bug in the OEChem interpretation of MDL wedge and hash bonds. In MDL connection tables, wedges and hashes only imply a specified stereo-center at the thin end (i.e. OEBondBase::GetBgn). This has been confirmed by comparing the wedge/hash bonds with the atom stereo parity bit in MDL ISIS output (including large vendor databases such as the entire Asinex 2005 collection).
23. Fixed MDL reader bug where unrecognized atomic symbols would ignore subsequent fields in the atom block such as stereo parity, reaction role and valence.
24. Added copy constructors and assignment operators to OEMiniMols, OEMiniBonds and OEMiniAtoms.
25. Fixed a sign error in OESetAngle.
26. Added a length==0.0 check for OESetDistance and OESetAngle.

Major bug fixes

1. Fixed oemolistream::seek and oemolistream::tell to take into account any cached molecules that may exist in the stream.
2. Fixed low-level MDL reader to accept multiple SD tags with the same tag. Note: It is not clear from the SD file specification if this is a valid SD format.
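The reason for the length==0.0 check in minor bug fix 26 above becomes clear when the distance-setting operation is written out: rescaling the vector between two atoms divides by its current length. The following is a minimal stand-alone sketch, not the OESetDistance implementation; the function name `set_distance`, the tuple-based coordinates, and the choice to return None for the degenerate case are assumptions made for illustration.

```python
import math


def set_distance(a, b, target):
    """Return a new position for b so that |b - a| equals target.

    b is moved along the existing a->b direction. If the two points
    coincide, that direction is undefined and None is returned instead
    of dividing by zero.
    """
    dx, dy, dz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        return None  # guard: no direction to scale along
    scale = target / length
    return (a[0] + dx * scale, a[1] + dy * scale, a[2] + dz * scale)
```

Without the guard, two atoms with identical coordinates (or a molecule with no coordinates at all) would raise a ZeroDivisionError in Python; in floating-point C++ code the same division would instead produce infinities or NaN values in the coordinates.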


34.7.4 OEBio

New Features

1. Added OESeqAlignment class with associated features for pairwise sequence alignment (including PAM250, BLOSUM62 and GONNET), writing an alignment to an oeostream and carrying out RMSD alignment between two proteins based on the sequence alignment.
2. Simple methods for accessing and manipulating the torsion angles of biopolymers.
3. Introduce classes that allow a hierarchical view of the Chains, Fragments and Residues of a protein while maintaining the efficient OEChem internal data structures.
4. Added facility for swapping the terminal atoms of residues that are commonly ambiguous in protein crystal structures (e.g. terminal N,O of ASN).
5. Added nine new example programs demonstrating the use of the new OEBio api points. These include: backbone, cischeck, makealpha, phipsi, rescount, reshist, seqalign, subsetres and swapaieres.

New Example Programs

These examples show the best feature of OEChem. Though most are less than 100 lines of simple code they demonstrate protein-protein sequence alignment, 2D and 3D structure manipulation, residue perception, robust multi-format I/O, stl integration, canonicalization, chirality perception and manipulation and many other complex cheminformatics tasks. While the main loop of each program is often only 30 lines long, it brings to bear thousands of lines of OEChem code and years of cumulative cheminformatics experience to easily combine 2D and 3D structure analysis and manipulation.

1. backbone.cpp: Code to show the use of functors to select and write the backbone atoms of a protein.
2. cischeck.cpp: Demonstrates how to loop over residues and check the omega torsion for cis amides.
3. makealpha.cpp: A code example of protein structure manipulation. This example modifies any protein into an alpha-helical structure with extended side-chains.
4. phipsi.cpp: Simple code to report the phi-psi angles of a protein.
5. rescount.cpp: Demonstrates an easy way to loop over the residues of a protein and query their information.
6. reshist.cpp: Demonstrates an easy way to loop over a protein’s residues and integrate the acquired data into an STL “dictionary” class.
7. seqalign.cpp: This is perhaps the most complex program of the examples. It carries out protein-protein sequence alignment, alignment evaluation and printing as well as 3D structural alignment.
8. subsetres.cpp: Simple code example of how to pull a specific residue out of a protein using its common name (e.g. ARG B 52).
9. swapaieres.cpp: Demonstrates how a user can select a residue using its common name (e.g. GLN 252) and swap the ambiguous iso-electronic atoms.
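The geometric test at the heart of a program like cischeck (item 2 above) is a torsion angle calculation. The sketch below is not the cischeck.cpp source and uses no OEBio calls; it shows, in plain Python, one standard way to compute the omega torsion (CA-C-N-CA) from four coordinates and flag it as cis. The function names and the 30 degree tolerance are assumptions chosen for illustration, and the four atoms are assumed to be at distinct positions.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def torsion_degrees(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    b2_len = math.sqrt(_dot(b2, b2))
    m1 = _cross(n1, tuple(c / b2_len for c in b2))
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


def is_cis_amide(ca_prev, c_prev, n, ca, tolerance=30.0):
    """True when omega (CA-C-N-CA) lies within `tolerance` degrees of 0."""
    return abs(torsion_degrees(ca_prev, c_prev, n, ca)) <= tolerance
```

A trans peptide bond has omega near 180 degrees and a cis one near 0, so only the magnitude of the angle matters for this check and the sign convention of the torsion does not affect the result.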


34.7.5 Common

New Features

1. Split the programs previously in the examples directory into examples and utilities. The utilities directory will contain programs or versions of programs that may be useful and convenient for modelers to carry out common tasks. The examples directory will contain programs that may also be useful, but their primary purpose will be to provide didactic code examples of how to program common tasks using the OEChem library.
2. Numerous additions to the API and Theory manuals.

34.8 OEChem 1.3.4

34.8.1 OEPlatform

Minor bug fixes

1. Fixed a problem in the oeigzstream constructor that could cause problems determining the size of a file using size, as zlib was not being correctly initialized.
2. Fixed a minor bug in OEFileCreate that was failing to close the file descriptor for the newly created file.
3. The return type of the OEFile::Size method and the OEFileSize function have been changed from size_t to oefpos_t to allow correct handling of large files on 32-bit platforms.
4. Fixed some minor bugs in oeistream and oeostream when these classes are used directly.

New Features

1. New functions OEFileDetermineName and OEFileDeterminePath to extract the basename and path of a filename respectively.

34.8.2 OESystem

Minor bug fixes

1. A complete new implementation of OEConcatIter to avoid potential problems caused by the fact that appending items to an STL std::vector invalidates iterators over it. The new implementation uses offsets rather than pointers to avoid problems. This also fixes a crash of OEConcatIter::operator++ on an empty iterator.
2. Fixed a bug in OEIter::Push that caused problems if the iterator was previously empty.
3. Fixed a bug in OEFileExtension that was causing crashes when no extension was present.


4. Fixed a bug in OEInterface::GetParameters that caused parameters in sub sub interfaces to appear multiple times in the returned iterator.
5. The OEWriteSettings function now takes a const OEInterface&.

New Features

1. New OEMath template functions OEGeom3DGetCenterAndExtents and OEGeom3DGetCenterOfMass have been added to calculate the center, extents and (weighted) center of mass of a set of 3D coordinates.
2. The OEInterface class now has new GetTypedParameter template member functions.
3. A new function OEGetCenterAndExtents has been added to OESystem’s grid handling code, to retrieve the center and extents of a grid.

34.8.3 OEChem

Minor bug fixes

1. Fixed a segmentation fault when calling either IsCommonIsotope or OEGetIsotopicWeight with both atomic number and atomic mass of zero. If ever IsCommonIsotope is called with a mass of zero (for any element) it now returns false, and whenever OEGetIsotopicWeight is called with a mass of zero, it now returns the corresponding isotopically averaged atomic weight, i.e. via OEGetAverageWeight.
2. The OEMatch class was missing an explicit assignment operator, which could cause memory corruption if one OEMatch was ever assigned to another. This has now been implemented.
3. Improved OEChem’s OEDetermineConnectivity functionality to restrict proximity-based bonding of hydrogen atoms preferably to their nearest suitable heavy (non-hydrogen) neighbor. Previously, hydrogens only bonded to their nearest neighbor, which for all-atom molecules with bad clashes caused the overlapping protons to sublimate as molecular hydrogen, leaving their original parents in strange charge (or radical) states.
4. The generation of @TRIPOS records in Sybyl .mol2 files has been rewritten to ensure that ligands, solvent molecules and non-standard amino acids are correctly placed in their own Tripos substructure. Previously, PDB HETATMs would be placed in substructure 1, sharing it with a valid RESIDUE. Each OEResidue is now allocated a unique Tripos substructure, and non-standard (or unrecognized) residues are marked as GROUP.
5. The geometry functions OESetAngle and OESetDistance have both been made more robust to molecules without coordinates (and zero bond lengths).
6. A refactoring of the SMILES generation routines in the OEChem source code means that the non-canonical SMILES created by the function OECreateAbsSmiString are now identical to those created by OECreateSmiString with just the AtomMaps and RGroups flags.
7. Tweaks OEDetermineConnectivity to avoid generating bonds between atoms in alternate locations/conformations. This is only a problem for PDB files read with the ALL input flavor. There are still multiple bonds between invariant “hinge” atoms and the multiple copies of their neighbors in alternate conformations.
8. Fixed a potential performance problem in OEMolBase’s += operator, and the equivalent OEAddMols function calls, when concatenating large molecules with co-ordinates.


9. The performance of reading MDL SD files containing a large quantity of SD tag data, and of reading PDB files containing a large amount of header information (with the DATA flavor), has been improved. Instead of attaching each data item to the molecule as it’s encountered, the file format readers now accumulate the data first, and then attach it to the molecule on encountering the end of the connection table.
10. The performance of handling generic data tags when reading OEB files containing large numbers of atoms or bonds has been improved.
11. The consistency of file format flavors has been improved. The SMILES ExtBonds flavor that was previously only available to the SMI file format is now also available to the CAN and ISM formats. Likewise, the flavors available to the MOL2 and MOL2H file formats have been synchronized, with MOL2H becoming a flavor of MOL2 (i.e. the file extension effectively specifies a different default flavor).
12. The OEMCSMaxAtomsCompleteCycles objective function requires that ring atoms and bonds of the query molecule have been perceived (using OEFindRingAtomsAndBonds) before the corresponding OEMCSSearch instance is constructed. To avoid unexpected behavior, the OEMCSSearch constructors now call OEFindRingAtomsAndBonds themselves if allowed to make a copy of the query molecule (the default), or issue a warning to OEThrow otherwise.

Major bug fixes

1. A problem with the generation of canonical isomeric SMILES has been fixed. For pathological molecules with specified cis/trans stereochemistry, it could cause the canonical SMILES string to be written using ‘/’ (forward slash) for some input orderings and ‘\’ (backslash) for others. The resulting SMILES weren’t incorrect (and the perceived symmetry invariants were fine), but this allowed equivalent isomers to be given non-identical isomeric SMILES strings.
2. Fixed a bug in the 1.3.3 changes to the OEPerceiveResidues function. The new functionality to identify an acetyl as an N-terminal blocking group, PDB code ACE, could get confused when pattern matching some pathological graphs, such as proteins with N-terminal proline residues.
3. Fixed a bug in the OEB file format writer that could result in the co-ordinates of a multiconformer molecule becoming scrambled if the atoms of the OEMolBase had been reordered with OEMolBase::OrderAtoms (and no atoms had been deleted) before writing the molecule. This could also affect the OEMolBase::Compress and OEMolBase::UnCompress methods.
4. Fixed a bug that could cause a core dump when making a copy of an OESubSearch instance that contains atom or bond stereo.

New Features

1. A new OEChemIsLicensed API point has been provided to allow an application to check whether the OEChem, OESystem and OEPlatform libraries have managed to find a suitable run-time license. This function can safely be called from programs to determine a priori whether using OEChem functionality may trigger a fatal error, allowing a warning to be issued and/or calls to the affected functionality to be avoided.
2. A new MOL2::Substructure flavor has been added to the high-level writers (and a new default argument to the low-level OEWriteMol2File) to force the writer to emit a suitable @<TRIPOS>SUBSTRUCTURE record even for molecules having only a single substructure. The default behavior remains the same (to omit the record for “small” molecules) to save space in the common case.
3. OEChem’s SMILES format parser, OEParseSmiles, has been tweaked to treat SMILES with inconsistent cis/trans bond stereo, such as C/C=C(/C)/C, as a soft warning rather than as a hard error. Previously,


OEChem followed the Daylight toolkits, treating this as a serious error, and returning an empty molecule. The new behavior is that a warning is now thrown to OEThrow, but the molecule is returned, just without specified stereochemistry for the problematic bond(s). This allows OEChem to be used to correct/recover SMILES generated by buggy SMILES writers.
4. The functions OEAtomSetMDLParity and OEAtomGetMDLParity have been introduced, to simplify the task of storing and retrieving the MDL parity value associated with each atom.
5. The OEB file format readers and writers will now preserve PDB file header records, if they are present on the original molecule.
6. A new function OEGetCenterAndExtents can be used to retrieve the center and 3D extents of a given OEMolBase.
7. The file format extension “.isosmi” is now treated as a synonym for “.ism”, indicating isomeric SMILES.
8. The OEQMolBase::BuildExpressions function now avoids constructing an atomic hybridization constraint if the query atom’s OEAtomBase::GetHyb() method returns zero.

34.8.4 Common

Minor bug fixes

1. Numerous improvements and refinements to the API documentation.

34.8.5 Python and Java wrappers

The Python and Java wrappers have been split out from the OEChem distribution, into a distribution of their own. Release notes for changes to the Python and Java language bindings for OEChem, OESystem and OEPlatform can from now on be found in their own separate release notes document.

34.9 OEChem 1.3.3

34.9.1 OEPlatform

Minor bug fixes

1. Modified OEFileDeterminePathAndName to canonicalize directory separators to the appropriate form for the host operating system.
2. Improved the performance of OEMutex when using g++, by using the low-level gthr API, rather than using the higher-level locking primitives used by the libstdc++ STL library.
3. Fixes to oestream classes to prevent accidentally closing stdin. Minor bug fix to oeiwrapperstream’s implementation.


34.9.2 OESystem

Minor bug fixes

1. Fixed a potential memory leak in OEBinaryNot.
2. OEInterface’s methods DeleteInterface and DeleteParameter now recursively search through sub-interfaces for the object to delete.
3. The OEStringTokenize and OEStringTokenizeQuoted functions have been completely rewritten. Both previous implementations could potentially throw C++ exceptions, and the latter was just plain broken.
4. A minor bug in the OEInterface class, that in some cases caused the detailed description to end with !END, has been fixed.
5. The behavior of the !REQUIRED keyword has changed in OEInterface files. If an option has a default value, specified by the !DEFAULT keyword, then the !REQUIRED option is ignored.

Major bug fixes

1. The semantics of how quaternions are represented within the OpenEye toolkits have now been standardized, as scalar-first. Hence, of the four floating point values that define a quaternion, the first represents the scalar component and the final three values represent the vector component. The failure to explicitly document which of the two possible forms was used resulted in some OEMath functions assuming scalar-first whilst others assumed scalar-last. [The quaternion functions in OELib, for example, used scalar-last]. Functions affected by this include OEGeomQuaternionMultiply, OEGeomGetUnitQuaternionConjugate, OEGeom3DQuaternionToRotMatrix and OEGeom3DRotMatrixToQuaternion.

New Features

1. By default the OpenEye toolkits now use thread-safe memory management internally to allow multiple molecules (and other objects) to be manipulated by different concurrent threads. Modifying the same object concurrently is still unsafe. On some operating systems, OEChem intensive applications may experience a slight overhead, which may be explicitly disabled with the new OESetThreadSafe function call. Timings on modern GNU/Linux systems show almost no overhead, and the performance benefits of upgrading to g++ 3.4.x mean that most applications should run faster with OEChem 1.3.3 than with previous releases even with thread-safety enabled.
2. The --help functionality of the OEInterface class has been improved to indent and wrap the on-line help text at 80 columns. The default screen width can be controlled by specifying the column width on the command line, for example --help all 100.
3. The OEInterface parser has been improved to allow !CATEGORY names to be quoted, allowing names to contain spaces.
4. The OEFizzGrid class now has an operator bool() method, which returns true if either floats or integers have been set.
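
The scalar-first convention described under the major bug fix above can be illustrated with a small, toolkit-independent sketch. The helper names below are hypothetical and are not the OEMath functions; they only show what "scalar-first" means for multiplication, conjugation and conversion to a rotation matrix, and how a scalar-last quaternion would be reordered.

```python
import math


def quat_multiply(a, b):
    """Hamilton product of two scalar-first quaternions (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_conjugate(q):
    """Conjugate of a scalar-first quaternion: negate the vector part."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def scalar_last_to_scalar_first(q):
    """Reorder (x, y, z, w) into the scalar-first form (w, x, y, z)."""
    x, y, z, w = q
    return (w, x, y, z)


def quat_to_rot_matrix(q):
    """3x3 rotation matrix (row-major lists) for a unit scalar-first quaternion."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


# A 90 degree rotation about the z axis, written scalar-first.
half = math.pi / 4
q_z90 = (math.cos(half), 0.0, 0.0, math.sin(half))
rot = quat_to_rot_matrix(q_z90)
```

Passing the same four numbers in scalar-last order to these helpers would silently describe a different rotation, which is the kind of inconsistency the standardization removes.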


34.9.3 OEChem

Minor bug fixes

1. Fixed a bug in the OEChem SMARTS parser that failed to follow the Daylight semantics for patterns such as “[H]”, “[2H]” and “[H+]”, where the “H” specifies that the pattern must match a hydrogen, and not the expected hydrogen count on an atom.
2. The OEChem SMILES writers have been modified to prevent them generating atoms such as “[C@H2]” or “[C@@H2]” for centers that have stereo explicitly specified (on non-chiral centers) with explicit hydrogens, when the hydrogens are being automatically suppressed by the output SMILES flavor.
3. The methods OEAtomBase::SetStereo, OEAtomBase::GetStereo, OEBondBase::SetStereo and OEBondBase::GetStereo have been enhanced such that the internal representation of stereochemistry is invariant of hydrogen suppression. The functions OESuppressHydrogens and OEAddExplicitHydrogens no longer invalidate stereochemistry.
4. The old-style OE binary, .bin, file format reader now automatically sets the dimension property of molecules and conformers to 3. Whilst new-style OE binary, .oeb, files explicitly record the dimensionality of the stored co-ordinates, the old format didn’t, and its contents should be assumed to be 3-dimensional.
5. Corrected a minor logic problem in OEQMolBase::BuildExpressions when constructing the expressions to match bond orders but not aromaticity.
6. Fixed a problem in the SMILES parser, which would cause a segmentation fault if ever a SMILES string longer than 4096 characters encountered a syntax or Kekulization error. We no longer try to report the location of the syntax error for SMILES strings longer than 2048 characters.
7. A bug in OEPerceiveBondOrders that assumed/required that the incoming molecule not have any aromaticity specified has been fixed by calling OEClearAromaticFlags on the incoming molecule. This assumption was valid for its existing use by the high-level file format readers, but meant that calling OEPerceiveBondOrders twice in a row could sometimes produce different results.
8. Fixed a potential problem in several file format readers that caused a run-time abort in Microsoft’s runtime libraries on Windows when reading corrupt or binary files. The Microsoft implementation of the standard <ctype.h> functions, such as isdigit and isupper, will abort when passed negative values, such as when interpreting the bytes of a file as (signed) char.
9. Fixed a segmentation fault in OEScrambleMolecule that was triggered by chiral molecules.
10. Fixed a bug in OEMDLCorrectBondStereo that could cause that routine to crash if the chiral atom on which the stereochemistry needed to be corrected was degree three instead of degree four. This routine has been made more robust, and can now correct wedges and hashes around degree three atoms that conflict with the specified MDL parity bit.
11. The OEChem MDL mol file reader has been made more robust by checking for negative values in the atom count, bond count and list count fields. These are now interpreted as being zero. Corrupted SD files could previously cause OEChem to crash.
12. Calling close on an oemolistream that wraps oein will now correctly make oemolistream::operator bool() return false, and stop it reading (even though oein itself shouldn’t be closed).
13. The OEChem SMILES parser, the OEParseSmiles function, has been fixed to set the default bond order of unspecified external bonds, i.e. “C&1”, to be single. Previously these were left initialized as bond order zero, although “C&=1” and “C&#1” were correctly handled as double and triple bonds respectively.


14. The function OEPDBOrderAtoms has been improved to only compare atom names for recognized residues when sorting. This prevents atoms being needlessly reordered for no good reason.
15. OEPerceiveResidues has been improved to assign unique atom names to every atom within an unknown or unrecognized residue. Previously, all six atoms in benzene would be given the same atom name “ C ”, which confuses software that assumes PDB atom names are unique within a residue. OEChem now assigns “ C1 ”, “ C2 ”, etc...
16. Add goof-proofing to return calls to OEInvertCenter where the specified atom is not trivially invertible (i.e. a center with 3 or more ring bonds).
17. Improved handling of the hydrogen isotopes “D” and “T” when reading MDL connection tables. These symbols now automatically set the isotope field appropriately. Previous versions of OEChem interpreted these symbols as forms of hydrogen, but relied on the MDL’s mass field or “M ISO” line being correctly set to specify a/which isotope.
18. A very minor bug in OEPerceiveResidues has been fixed that prevented residue information from being assigned to lone protons. The algorithm previously assumed all hydrogens were bonded to a heavy atom parent.
19. In OESubsetMol the dummy atoms used to represent attachment points are now assigned map indices starting from one, i.e. R1, R2, R3, instead of from zero, i.e. *, R1, R2.
20. OESubsetMol now attempts to preserve or undefine the specified stereochemistry at atoms and bonds affected by attachment points.
21. The performance of OEDetermineConnectivity has been dramatically improved for very large molecules. This speeds up the reading of proteins like pdb1jj2.ent (which contains 98,543 atoms) several fold.
22. Replaced an inefficient O(n^2) algorithm in the OEMolBaseImpl::OrderAtoms method that checked that the input vector was a valid permutation of a subset of the atoms in the molecule. This dramatically improves the performance of writing large PDB files.
23. The performance of many of the OEMolBase, OEAtomBase and OEBondBase methods has been improved in OEChem 1.3.3.
24. The methods oemolistream::operator bool, oemolostream::operator bool and oemolistream::eof have been marked “const” to enable better compiler optimization.

Major bug fixes

1. A problem in OEChem’s graph canonicalization algorithm was identified by the NCBI’s PubChem project for the single molecule: C12C3C4C3C5C4C1C25. This problem has been fixed in OEChem 1.3.3. Unfortunately, this failure didn’t show up in our testing of 100 random permutations of a 2.5 million compound test set. Efforts are now on-going to validate OpenEye’s canonicalization against all theoretical connection tables with less than N atoms, for some N > 10.
2. A bug in the OEB file format readers and writers that could cause the titles and/or comments attached to molecules or conformers to be lost has been corrected.
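
The kind of validity check mentioned in minor bug fix 22 above (is the input a valid permutation of a subset of the atoms?) can be done in a single linear pass. The following is only a sketch of that idea on plain integer indices, with a hypothetical function name; it is not the OEChem implementation, which operates on atom objects.

```python
def is_valid_partial_order(order, num_atoms):
    """Return True if `order` lists distinct atom indices in [0, num_atoms).

    A boolean "seen" table makes the check linear in len(order), instead of
    comparing every entry against every other entry (quadratic).
    """
    seen = [False] * num_atoms
    for idx in order:
        if not isinstance(idx, int) or idx < 0 or idx >= num_atoms:
            return False  # index does not refer to an atom of this molecule
        if seen[idx]:
            return False  # the same atom was listed twice
        seen[idx] = True
    return True
```

For a molecule with three atoms, `[2, 0, 1]` and `[1]` are accepted, while `[0, 0]` (duplicate) and `[3]` (out of range) are rejected.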


New Features

1. Several enhancements have been made to the protein perception algorithms used in OEPerceiveResidues. These allow OEChem to recognize the N-terminal capping group “ACE”, and the non-standard amino acid residues “ABA”, “CGU”, “CME”, “CSD”, “MLY”, “MSE”, “PCA”, “PTR”, “SEP” and “TPO”. Support for these additional amino acid types has also been added to OEGetResidueIndex and friends. The sidechain pattern matching algorithm now has improved “fallback” functionality for better handling of modified/substituted residues.
2. Improved support for aromatic boron and aromatic silicon in OEKekulize. The OEChem toolkit currently doesn’t perceive either boron or silicon to be aromatic (with any aromaticity model), but this enhancement allows us to Kekulize structures so specified.
3. Added improved support for parsing SMILES containing aromatic boron and aromatic silicon, allowing the OEChem toolkit to parse “b1ccccc1” (borinine).
4. A new OEGetDelphiRadius function has been added to OEChem to return the default radius for a given element used by Accelrys’ Delphi program for electrostatics calculations.
5. A new function OEGetAminoAcidCode can be used to convert an index from the OEResidueIndex namespace to a IUMB single character code (’A’ for alanine, ’R’ for arginine, etc...).
6. Several new convenience functions, OEAssignCovalentRadii, OEAssignDelphiRadii, OEAssignBondVdWRadii, OEAssignPaulingVdWRadii and OEAssignHonigIonicCavityRadii, are now provided to set the radius property on each atom of a molecule to the value specified by the corresponding OEGet...Radius function.
7. A new function OEIsBinary is provided to determine whether the specified file format is binary or not, for example, .oeb, .bin and .cdx.
8. The new function OEGetFormatExtension can be used to return a comma separated list of lowercase file format extensions that can be used to aid implementing directory scans and file format dialog boxes.
9. A new OEMCSFunc functor, OEMCSMaxBondsCompleteCycles, can be used as an objective function to OEChem’s maximal common subgraph matching algorithms.

34.9.4 Python wrappers

Major bug fixes

1. Fixed a memory leak in OENot*, OEAnd* and OEOr* predicates.
2. Fixed a bug in PyAtomPredicate, PyBondPredicate and PyConfPredicate where a syntax error in the Python callable function would silently fail. Now, if there is an error in the Python function, the exception will propagate back to the Python interpreter.

New Features

1. The OEInterface class and associated machinery for creating and parsing command lines is now available in Python. While Python has native command line argument support, this provides an alternative that is functionally similar to the C++ OEChem version. The example program molextract.py has been updated to demonstrate this new feature.


34.9.5 Java wrappers

New Features

1. With this release of OEChem, Java wrappers are now provided. This first version only supports Sun’s JVM version 1.4.2.

34.9.6 Common

Minor bug fixes

1. Numerous improvements and refinements to the API documentation.

New Features

1. The preferred version of the GNU C/C++ compiler has been upgraded from 3.3.3 to 3.4.3. To ease any potential transition issues, both g++ 3.3 and g++ 3.4 versions of the previous release of OEChem, version 1.3.2, are available from OpenEye’s download page.
2. Added support for the native SunStudio C++ compilers on SUN Solaris 2.8 running on SPARC processors, for both 32-bit and 64-bit ABIs. Previously, Solaris support required the use of the GNU g++ compilers due to bugs in the /opt/SUNWspro/bin/CC compiler, which have now been fixed by recent patches.

Caveats

1. We are no longer able to support the native cxx compiler on HP/Compaq Tru64 running on Alpha processors. This is caused by HP’s official “end-of-life” of their Alpha server line, including the developer support program. For the time being, we continue to support Alpha Tru64 but only using the GNU g++ compiler.

34.10 OEChem 1.3.2

34.10.1 OEPlatform

Minor bug fixes

1. Fixed the STL standard file I/O wrappers, oeistdstream and oeostdstream, that wrap/transform C++’s std::istreams and std::ostreams into the oeistreams and oeostreams required by OEChem’s file functions.


34.10.2 OESystem

Minor bug fixes

1. Made OEBitVector’s destructor virtual to allow classes to be derived from it.

Major bug fixes

1. Exposed the baseimpl member of OEBase. This allows the OEChem Python wrappers to associate generic data correctly with a molecule.

New Features

1. A new two argument variant of OEGeom3DAngle allows the calculation of the angle between two vectors. This is an efficient form of the three argument variant, where the middle point is defined to be the origin.
2. Added a new OELinearInterpolate function (template) that can be used to linearly interpolate the value at an arbitrary point inside an OEFixedGrid.
3. Two new functors, OEBinaryAnd and OEBinaryOr, for constructing a single binary predicate from two binary predicates. These are equivalent to the OEAnd and OEOr functors for unary predicates.

34.10.3 OEChem

Minor bug fixes

1. Fixed a problem with reading multi-conformer molecules from MDL SD file format, where we wouldn’t correctly record that each conformer already had MDL atom parity information (so we’d re-perceive it upon output, which wouldn’t preserve the original input values).
2. The dimension of an OEMolBase (as returned by the OEMolBase::GetDimension method) is now read and written to OEB binary files. Previously, all molecules in OEB were implicitly 3D (which remains the default), but we now explicitly record when the dimension has a value other than three.
3. Fixed a bug in OEPerceiveChiral that failed to recognize that double bonds in rings of size seven or greater are potentially chiral.
4. Fixed an obscure bug in the MDL file reader. We were doing chirality and bond stereochemistry processing before we’d kekulized any aromatic bonds (from MDL substructure queries) or set the implicit hydrogen counts. This occasionally confused OEPerceiveSymmetry’s graph invariants and OEPerceiveChiral’s tests for potential chiral atoms.
5. Corrected the high-level file writer, OEWriteMolecule, such that when calculating MDL atom parity bits prior to writing MDL file formats, it first extracts chirality from 3D co-ordinates if present. This is now consistent with similar logic prior to writing isomeric SMILES.
6. When writing isomeric SMILES, the OECreateSmiString function was suppressing explicit hydrogens necessary to represent double bond stereochemistry. These necessary “stereo” hydrogens are now written out explicitly.


7. Fixed some issues in the OEChem SMILES parser where there was insufficient validity checking in the processing of external bonds. These problems caused SMILES strings such as “C&1&1”, “C&1C&1” and even “&=9” to crash the parser. We now more politely generate a warning message and return false.
8. Improvements to OESet3DHydrogenGeom to improve the geometry of protons added to carboxylic acids (which are now guaranteed to be cis). OESet3DHydrogenGeom also avoids calling the function OEAssignHybridization on the molecule, using a user-assigned hybridization if available and calling OEGetHybridization otherwise.
9. Several fixes and improvements to the SMARTS parser. We weren’t correctly handling SMARTS with double digit ring closures, and we didn’t recognize “[te]” as aromatic tellurium. Also fixes to cis/trans stereochemistry in a few corner cases.
10. Fixed a bug in the reaction handling of the OELibraryGen class that caused a segmentation fault when using a reaction to delete unmapped atoms.
11. Fixed a bug in OERMSD where if asked to overlay two sets of co-ordinates, and the user didn’t ask for either the rotation matrix or translation vector, and the two structures overlaid perfectly, we’d generate a segmentation fault (we were writing a unit matrix to a NULL pointer).
12. Fixed a bug in OEMCMolBase::SweepConfs where we’d fail to renumber conformer indices if there were no deleted conformers.
13. Minor tweak to OEGetHybridization for uncharged sulfur, selenium, tellurium and polonium. These are now always considered sp3 unless they’re aromatic, in which case they are sp2.
14. The OEQBase::SetExpr method has been changed to make a copy of the passed const OEExprBase*, which fixes issues with copying/assigning OEQMolBases.
15. Fixed a rare bug in OEAtomBase::GetStereo that could occasionally return either Left or Right for an atom without HasStereoSpecified. This has now been fixed such that whenever HasStereoSpecified returns false, GetStereo will always return OEAtomStereo::Undefined.
16. A minor tweak to OEGetFileType to perform case-insensitive string comparison allows this code to recognize file format extensions independent of capitalization, i.e. “.mol2” and “.MOL2”.
17. Improvements to the heuristics used by OEMDLPerceiveBondStereo for placing wedge and hash bonds on a connection table. These include avoiding placing the wedge/hash on the fusion bond for chiral bridgehead atoms, and arbitrarily choosing amongst the best bonds when no unique best is found (previously a tie caused the algorithm to choose randomly between all neighboring bonds).
18. Tweak to OEDetermineConnectivity such that atoms that are marked as PDB residues “CL”, “BR” and “IOD” (i.e. chloride, bromide and iodide ions) are treated like solvent, and are never bonded to other residues (in this case other atoms).
19. Routine maintenance improvements to OEChem’s PDB file format readers to handle files recently released by RCSB. For example, residue “GCP” in pdb1rca.ent, residue “783” in pdb1o2t, and similar fixes in PDB codes 1ta2, 1ta6, 1o3m, 1o3n and 1o3o.
20. Improvement to OEReadMacroModelFile to read the atomic partial charge out of the charge-charge interaction column, and fixed a bug in extracting the atom name field.
21. Enhancement to the OEWriteMacroModelFile function to write out the atomic partial charge to both the charge-charge and charge-multipole fields of the MacroModel connection table. Previously, we only stored the partial charge in the charge-charge field, and wrote 0.0 to the charge-multipole field.
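
The case-insensitive extension matching described in minor bug fix 16 above can be sketched in a few lines of plain Python. The lookup table and function name here are hypothetical stand-ins for illustration and do not reflect how OEGetFileType is implemented or which constants it returns.

```python
import os

# Hypothetical extension table for illustration only; keys are lowercase.
KNOWN_EXTENSIONS = {
    ".mol2": "MOL2",
    ".sdf": "SDF",
    ".ism": "ISM",
    ".pdb": "PDB",
}


def guess_format(filename):
    """Return a format name for `filename`, ignoring extension capitalization.

    Returns None when the extension is missing or not in the table.
    """
    _, ext = os.path.splitext(filename)
    return KNOWN_EXTENSIONS.get(ext.lower())
```

Lowercasing the extension once before the lookup means “ligand.mol2” and “ligand.MOL2” resolve to the same entry, while unknown or missing extensions yield None.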


New Features

1. New function OEReadRxnFile to read MDL RXN (reaction) file format.
2. New functions OEGetAtomComment and OESetAtomComment that allow arbitrary text strings to be associated with atoms. This is currently used to preserve/manipulate atom aliases from MDL file formats.
3. Changes to OEReadMDLFile and OEWriteMDLFile to read and write atom alias information. Atom alias information is stored on an atom by the reader using OESetAtomComment, and any such atom comments are written to the MDL connection table on output.
4. New function OEInvertCenter to invert a tetrahedral center. For chiral atoms, this function flips the molecule from one isomeric form to the other. This function returns false if the center can’t easily be inverted.
5. Query molecules, represented by an OEQMolBase, can now be read from and written to OEB binary files.
6. New function(s) OECorrectAcidProtonGeometry to expose the new functionality in OESet3DHydrogenGeom to ensure that protons on carboxylic acid groups are cis.
7. A new SMILES flavor, OESmilesFlag::ExtBonds, allows OEChem to write out SMILES strings using the external bond notation, i.e. “[*:1]CC[*:2]” (a.k.a. “[R1]CC[R2]”) can now be written out as “C&1C&2”.
8. Constructors that take an STL std::string have been added to both oemolistreams and oemolostreams, allowing them to take C++ strings, in addition to const char*.
9. A rewind method has been added to oemolistreams to match the functionality available with OESystem::oeistreams. This method rewinds the stream to the beginning (if possible) and is equivalent to oemolistream::seek(0).
10. Added a new SetMCSFunc method to OECliqueSearch to allow callers to customize the MCS function used in clique searching.
11. A new constructor to the OEIsMember functor allows it to be used with an STL std::set. Additionally, OEIsMember and OEIsMemberPtr now have assignment operators and Set methods.
12. Added new function OEGetHonigIonicCavityRadius to return the effective ionic radius of each element to be used in solvation calculations. These values are described in Alexander A. Rashin and Barry Honig, “Reevaluation of the Born Model of Ion Hydration”, Journal of Physical Chemistry, Vol. 89, pp. 5588-5593, 1985.
13. New example programs “catmols”, “movemol”, “pdbdata”, “rings”, and “rmsd_selftest”.

34.10.4 Python wrappers

Major bug fixes

1. Fixed a major bug in SD tag handling for OEGraphMols. In OEChem 1.3.1, re-using an OEGraphMol and reading from an SD file could result in one molecule getting SD tags from the previous molecule.


Minor bug fixes

1. Alternate implementations of OEGraphMol are now exposed by passing a constant to the constructor. This was exposed in v1.3.1, but the C++ constants from molfactory.h were not wrapped.
2. molchunk.py stopped working in v1.3.1 due to changes to seek() on a stream. A new method rewind() has been added to oemolistream to take the place of the old seek(0).

34.10.5 Common

Minor bug fixes

1. Numerous improvements and refinements to the API documentation.

New Features

1. Added support for x86_64/AMD64 processor families (i.e. AMD Opteron and Athlon64) running SuSE Linux 9.1 (or later).

34.11 OEChem 1.3.1

34.11.1 OEPlatform

There are no significant changes to OEPlatform in this release.

34.11.2 OESystem

Minor bug fixes

1. A fix to the function OECheckHelp that prevents displaying on-line help for “HIDDEN” command line interface options.
2. Fixed a bug in OEInterface::GetInterface that would fail to find the correct interface by name.
3. Corrected the return value for OEInterface::DeleteInterface. We now return false if we failed to delete the given interface.
4. Fixed an off-by-one error with the ToLast method on some kinds of iterators (specifically, OEPredVectorPtrIters).


34.11.3 OEChem

Minor bug fixes

1. OEChem’s zmatrix handling functions were accidentally documented in the OEChem documentation but missing from the distributed list of header files. This oversight has been corrected and the classes OECartesianToInternal and OEInternalToCartesian and the functions OECalcCartesianCoord and OECalcInternalCoord are now publicly available.
2. Handle (and optimize) the case where OECanonicalOrderAtoms or OECanonicalOrderBonds is called with fewer than two atoms.
3. Fixes OEGetSmallestSubtree to correctly return the smallest set of atoms on either side of a given bond when the molecule contains disconnected components.
4. Performance improvements to OEGraphMolParameter, to avoid using OEIsomericConfTest when loading single conformer molecules.
5. Bug fix to OEMCMolBase::DeleteConf that could occasionally result in that multi-conformer molecule’s active conformer being corrupted.
6. Tweak to OEMDLPerceiveBondStereo to preferentially place the wedge or hash bond on the bond to an explicit hydrogen, for chiral stereo centers of degree three.
7. Enhancements to OEMDLPerceiveParity to support chiral atoms with two heavy atom neighbors and an explicit hydrogen. These aren’t supported by MDL software (including ISIS/Draw) but this allows OEChem to convert X[N@H]Y to an MDL mol file and back to isomeric SMILES without loss of information.
8. Improvements to OEMDLStereoFromParity to set OEChem’s atomic chirality for the MDL stereo parity flag for both degree three and degree four atom stereo centers.
9. Corrected a problem in OEMiniAtom::Copy where duplicating an atom would preserve the explicit degree of the original, as returned by OEAtomBase::GetExplicitDegree.
10. Fixes to both OEMiniBond::SetOrder and OEMiniBond::SetIntType that would incorrectly reset the aromaticity and in-ring flags of a bond as a side-effect.
11. Miscellaneous fixes and improvements to OEMiniMol::Copy.
12. Fix to oemolostream::openstring to allow an oemolostream to be reused multiple times, rather than constructed and destructed for each use.
13. OEPerceiveSymmetry would previously segmentation fault on some platforms if passed a molecule with no atoms. This routine has now been idiot-proofed.
14. Maintenance updates to OEReadPDBFile to handle PDB files deposited at RCSB, up to July 2004. For example, correctly handling the hetero atoms of the “NZQ” residue of pdb1oj7.ent.
15. A serious performance regression in OERMSD has been addressed. A change to OERMSD for OEChem 1.3 unexpectedly resulted in a significant drop in superposition speed. This has now been resolved and OERMSD is as fast as (or faster than) it was in OEChem 1.2.
16. The string reference argument to OESetComment is now correctly marked as const. This value is never modified by this function.
17. The sanity checking in OESetTorsion has been improved such that we now return false for dubious calls to that function. For example, when the first and fourth atom pointers refer to the same atom. Previously, we’d do nothing, but not indicate a failure with the return value.


18. Improvements to the match limit test in OESubSearch that prevent excessive run-times for pathological substructure search patterns. Previously, substructure searching of “*.*.*.*” wouldn’t honor the match limit setting, and would spew multiple “match limit reached” warnings.
19. Enhancement to OEWritePDBFile to allow it to honor the OEPDBOFlag::RADIUS flag to write an atom’s radius in the PDB occupancy field, even when writing a molecule without residue information.
20. Better handling of out of range co-ordinates in OEWriteMDLFile, OEWritePDBFile and OEWriteMacroModelFile. Previously, a Cartesian co-ordinate larger than these file formats’ fixed width fields would corrupt the following fields on the same line.

Major bug fixes

1. Fix cis/trans bond stereochemistry perception when reading MDL mol and SD files via OEReadMDLFile. We were overly aggressive when marking double bonds in rings as cis vs. trans due to a missing call to OEFindRingAtomsAndBonds. We also no longer attempt to attribute E/Z stereochemistry to connection tables without co-ordinates, or when any of the relevant bond lengths are zero.

34.11.4 Common

Minor bug fixes

1. Numerous improvements and refinements to the API and theory documentation.
2. Miscellaneous minor improvements to the OEChem python wrappers.

New Features

1. Added support for “Linux for IBM pSeries” running on PowerPC processors, including both RedHat AdvancedServer 3.2 and SuSE SLES 8.
2. Added support for the 64-bit ABI on SUN Solaris for SPARC processors when using the GNU g++ compiler.

34.12 OEChem 1.3.0

34.12.1 OEPlatform

Minor bug fixes

1. Made gzip set and initialization functions work with std::cin.
2. Fixed a bug in the gzip seek function.
3. Fixed a few minor issues with the handling of trailing separators in path names.
4. Fix bug in OEStreamManager’s openInput function. When there was no protocol part of the “filename” parameter, the internal protocol string was not properly NULL-terminated.


5. Added code to properly initialize OEMutex.
6. Used “&&” to replace “and” for more consistent compiler support in OEMutex.
7. Implemented copy constructor and assignment operator for oeisstream.

Major bug fixes

1. Defined NONMINMAX in openeye.h under win32, which prevents conflicts between STL min/max and windows min/max macros.

Improvements and Optimizations

1. OEMutex implementation now uses pthreads for icc 8.0.
2. Add default 64 bit file support on linux.

New Features

1. Added an OETryMutex class that allows for an “Attempted lock” through the function OETryMutex::Try().
2. Added OEFileCorrectSeparators and OEFileDeterminePathAndName functions.
3. Added new platform-independent bool OEIsNaN() function which is overloaded for float and double. As the name implies, this function returns a boolean which is true if the number passed is a NaN.

34.12.2 OESystem

Minor bug fixes

1. Removed a bug where OEFizzGrid::IsDataType calls OEBase::IsDataType as well.
2. Fixed a bug in OEConcatIter that would cause the iterator to appear to have only part of its contents if two empty iterators were pushed back onto the concat iterator.
3. OEB read and write functions now return meaningful values (previously the writers always returned false and the readers always returned success).
4. Fixed a problem with some interface parameters calling new on zero bytes, then resetting the pointer, causing small leaks on some platforms.
5. Corrected the error descriptor for FATAL errors.
6. Fixed MSVC 7.1 ambiguity in operator &&(const OEIter&, bool).
7. Added endianness test for short integers to fix problems with a small number of OEB files.
8. Fixed recurring compile problem in cygwin where ’ssize_t’ isn’t defined.
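The “Attempted lock” offered by OETryMutex, listed under New Features above, is the familiar non-blocking acquire pattern. As a rough illustration only, the sketch below builds the same behaviour on Python’s threading.Lock; the TryMutex class and its method names are hypothetical and are not the OEChem interface.

```python
import threading


class TryMutex:
    """Hypothetical helper illustrating an attempted lock (not the OEChem class)."""

    def __init__(self):
        self._lock = threading.Lock()

    def try_lock(self):
        # Non-blocking acquire: returns True if the lock was obtained,
        # False immediately if someone already holds it.
        return self._lock.acquire(blocking=False)

    def release(self):
        self._lock.release()


mutex = TryMutex()
got_first = mutex.try_lock()    # lock is free, so this attempt succeeds
got_second = mutex.try_lock()   # lock is already held, so this attempt fails
mutex.release()
got_third = mutex.try_lock()    # free again after the release
mutex.release()
```

The point of the pattern is that a caller can do other work when the attempt fails instead of waiting on the lock.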


Major bug fixes

1. Fixed the previously broken FirstBit, LastBit, NextBit, and PrevBit functions.
2. Fixed OEBinary writer bug where atom types were being written as atom names.
3. Added handling for unknown data in OEB files for lossless data conversion even when older software versions are unable to parse data.

Improvements and Optimizations

1. CheckHelp now also checks for “--help defaults”, which will list the default arguments of all parameters.
2. Added a new function for configuring OEInterface which takes an unsigned char* rather than a char*.
3. The WriteSettings function now defaults to using OEERR rather than OEOUT.
4. Added the non-virtual function CreatePredicateCopy() to the OEUnaryPredicate and OEBinaryPredicate base classes, which allows users to use this virtual constructor method without unnecessary casts.

New Features

1. Added two functions OEFizzGrid::GetFloatsSet and OEFizzGrid::GetIntegersSet to determine if the floats and integers are set for this grid.
2. Added the new functions SetRangeOn and SetRangeOff to OEBitVector.
3. Classes to handle grid I/O in OEB files.
4. Add default 64 bit file support on linux.
5. Added Sort(const OEBinaryPredicate &) member function to iterators. This allows arbitrary sorting of the order in which objects come out of any OEIter.
6. Added two functions: OEFileExtension and OEFileExtensions, which return a string containing the specified filename’s file extension or list of extensions respectively.

34.12.3 OEChem

Minor bug fixes

1. OEB files now properly store and read conformer energy.
2. Fixed bug that prevented OEB’s large molecules (>255 atoms) from being cross platform.
3. Reading and writing molecules in OEB format no longer fails if a string of 0 length has been tagged to the molecule’s generic data.
4. Fixed bug in Sweep to ensure that atoms/bonds are reindexed sequentially even if there have been no deletions.
5. Fixed bug in OEChem’s SMILES parser that incorrectly parsed atom map indices.


6. Fixed bug where SetStereo was returning ’false’ even when the stereo was set correctly.
7. Fixed a bug in the copy constructor of OEMolTmplt that was only apparent for OEDBMols.
8. Bug fixes to MDL superatom expansion to avoid problems with the order in which superatoms expanded.
9. Fixed the return value of the DeleteAtom and DeleteBond member functions of OEMCMolBase and OEMol. The behavior of these functions was not changed.
10. Added missing IsDeleted(OEAtomBase*) and IsDeleted(OEBondBase*) methods to OEMol, OEGraphMol and OEQMol.
11. Fixed API misspelling of OESuppressHydrogens.
12. Fixed OEAddExplicitHydrogens() to return bool rather than void to reflect its failures.
13. Fixed OESet3DHydrogenGeom() to return an appropriate boolean value rather than always returning true.
14. In bool OERMSD(const OEMolBase&, const OEMCMolBase&, double *rmsdArray, const OEMatchBase &match, bool overlay = false, double *rot = 0, double *trans = 0) fixed a bug where the first conformer of the OEMCMolBase was used repeatedly rather than using each conformer in succession.
15. Fixed behavior of oemolostream::GetString() when the stream is wrapping a gzstream.
16. Corrected behavior of OESubsetMolecule() in the case when a ring bond is the only item removed from the original molecule.
17. Fixed a bug in OEChem’s default OEAtomBase implementation that caused problems for an atom after 65536 neighboring bonds had been deleted. This occurred in code that repeatedly created temporary bonds then deleted them without calling Sweep.
18. Fixes to atom naming bugs in OEChem’s peptide residue perception routines. Also adds support for the PTR residue, representing phosphotyrosine. Additional fixes to tie splitting of C-terminal serine residues. Use “H’ ” to name C-terminal aldehyde hydrogens (“ H ” denotes the backbone amide nitrogen’s hydrogen).
19. Changed OEFuzzy::operator!= and OEFuzzy::operator== to return bool instead of OEFuzzy.
20. Removed overloaded OEMatch::operator new as it was incompatible with some versions of stl.

Major bug fixes

1. Fixed bug in OECalcCartesianCoord when rotating around the y axis. Prior implementation would give wildly incorrect results.
2. Removed dangerous allocation of too small a temporary array for use in writing conformer coordinates to OEBinary files.
3. Fixed bug in .mol2 file format reader that resulted in connection tables occasionally being skipped when reading files from stdin, pipes and/or sockets. The OEReadMol2File function no longer requires the use of tell and seek on its input stream.
4. Modified the heuristic in OERMSD for recognizing and dealing with degenerate roots. When the heuristic failed, inappropriate rotation matrices were generated.
5. Fixed bug in OESet3DHydrogenGeom where memcmp was comparing coordinates of type double, but using sizeof(float), so only a partial comparison was being carried out.


6. Modifications which provide the ability to read OEBinary v2 files which contain unrecognized data, without losing the data.
7. Added 1 byte at the end of the OEMolCT record for molecules with >255 && <65536 bonds to store the endianness of short integers which are used to store bond records. The extra byte won’t break old format readers, although oeb files written with old version writers won’t be portable. This change doesn’t break anything in either direction, but does allow for portability of oeb files.

Improvements and Optimizations

1. Performance enhancement to OESetDistance.
2. Modified the implementation of the OEMCMolBase member function SetActive() for significant performance gains in binary i/o.
3. Optimization of copy construction of OEMCMolBase.
4. Added warning when oemolistream and oemolostream fail to allocate a gzstream.
5. Made OEMol and OEMCMolBase contain a separate dimensionality (e.g. the value returned by OEMolBase::GetDimension()) for each conformer rather than a single value for the molecule. This allows depictions and conformers to be stored on the same molecules and still be handled correctly by the .sd writers.
6. Numerous improvements/enhancements to OEChem’s PDB file reader. The list of exceptional residue names has been updated to match the May 2004 list of public PDB files from RCSB.
7. The MDL mol file reader is tweaked to record unrecognized atomic symbols (just like we do for 3-letter superatom codes), storing them in the atom’s name property/field.
8. The routine OEMDLPerceiveParity (and thereby the MDL mol file writer) is improved to record chiral centers that don’t have stereo specified as the mdl parity value “3” (unspecified chiral center).
9. Improved performance for OEMolBase member functions GetAtom, GetBond and Sweep.
10. All parity values were being stored for atoms, including value == 0, which was unnecessary and was consuming both memory and space on disk in the form of OEB files.
11. Added call to seed random number generator from the clock in OERandomizeTorsions.
12. Added more float multi-grid handlers to be initialized by default.
13. A warning message was added to the SMARTS parser to alert users in cases of attempted assignment of map idx to zero.
14. Moved implementation of OEInSamePart to a template class so that parts can be tested for bonds (and other objects) as well as atoms.
15. Extended the OEIsRotor constructor to allow the option of hydrogen rotors.
16. Created and exposed OEMDLStereoFromBondStereo function from previously internal code.
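The endianness byte described under the major bug fixes above lets a reader decode short integers that were written on a machine with a different byte order. The sketch below shows the general technique only. The marker values (1 for little-endian, 0 for big-endian), the trailing position of the marker, and the function names are all assumptions made for illustration; the actual OEB record layout is not described here.

```python
import struct
import sys


def pack_shorts(values, byteorder=sys.byteorder):
    """Pack unsigned 16-bit values and append a one-byte byte-order marker.

    Assumed marker convention (hypothetical): 1 = little-endian, 0 = big-endian.
    """
    prefix = "<" if byteorder == "little" else ">"
    payload = struct.pack(f"{prefix}{len(values)}H", *values)
    marker = b"\x01" if byteorder == "little" else b"\x00"
    return payload + marker


def unpack_shorts(blob):
    """Decode a record written by pack_shorts on any platform."""
    payload, marker = blob[:-1], blob[-1:]
    prefix = "<" if marker == b"\x01" else ">"
    count = len(payload) // 2
    return list(struct.unpack(f"{prefix}{count}H", payload))


bond_records = [1, 300, 65535]
from_little = unpack_shorts(pack_shorts(bond_records, "little"))
from_big = unpack_shorts(pack_shorts(bond_records, "big"))
```

Because the reader consults the marker rather than its own native byte order, the same values come back regardless of which platform wrote the record.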


New Features

1. Added Reaction parsing and processing including: OELibraryGen class for generating combinatorial libraries; OEUniMolecularRxn class primarily for applying normalizations; OEParseSMIRKS function for converting Daylight’s SMIRKS format to an OEQMolBase; and OEReadRxnFile that takes an OEMolBase for reading MDL’s rxn files.
2. Added OEVectorBindings class to handle SMARTS vector bindings.
3. Added ChemDraw CDX reader and writer functions.
4. Added ability to perceive symmetry classes useful for automorphism detection.
5. Added factory methods for allocating molecules. Implemented graphmols, oemols and oeqmols in terms of the factory methods.
6. Added int and float grid handlers to OEBinaryIOHandler.
7. Added OEDisassembleExpressions function which converts query expressions into data directly on atoms/bonds.
8. Added OEMiniMol, a small memory footprint molecule useful for in-memory graph searching.
9. Added utility functors OEUnaryToBinaryAnd and OEUnaryToBinaryOr that turn a unary functor into a binary ’or’ or ’and’ functor.
10. Add default 64 bit file support on linux.
11. Added functions to set and get comments on molecules. These comments are written to the appropriate fields of .sdf, .mol2 and .oeb file formats.
12. New function OEMDLStereoFromParity that sets the OEChem stereochemistry of each atom from the MDL stereo parity flag stored in generic data.
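The OEUnaryToBinaryAnd and OEUnaryToBinaryOr functors listed above lift a one-argument predicate into a two-argument one. A plain-Python sketch of that idea follows. It assumes the binary functor applies the unary predicate to each argument and combines the two results with ’and’ or ’or’; the function names are hypothetical and this is not the OEChem implementation.

```python
def unary_to_binary_and(pred):
    """Return a binary predicate that is true when pred holds for both arguments."""
    def both(a, b):
        return pred(a) and pred(b)
    return both


def unary_to_binary_or(pred):
    """Return a binary predicate that is true when pred holds for either argument."""
    def either(a, b):
        return pred(a) or pred(b)
    return either


def is_heavy(atomic_number):
    # Toy unary predicate: anything heavier than hydrogen counts as "heavy".
    return atomic_number > 1


both_heavy = unary_to_binary_and(is_heavy)
either_heavy = unary_to_binary_or(is_heavy)

carbon_oxygen_and = both_heavy(6, 8)
carbon_hydrogen_and = both_heavy(6, 1)
carbon_hydrogen_or = either_heavy(6, 1)
hydrogen_hydrogen_or = either_heavy(1, 1)
```

This kind of adapter is useful when an interface expects a predicate over a pair of objects but the test at hand is naturally written for a single object.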


INDEX

A
AddContraint, 71
API, 26, 27, 79

B
benzene, 68, 69
BuildExpressions, 69, 77

G
GetActive, 28, 29
GetAtom, 21
GetBond, 21
GetConf, 28
GetConfs, 28
GetMaxAtomIdx, 21
GetMaxX, 30
GetPattern, 69, 72
GetTag, 22, 23
GetTitle, 20
GetValue, 22, 23
graph, 27, 28, 30, 31

I
implicit hydrogen, 21

M
Match, 69, 70, 72, 76
MDL, 20, 23
MOL2, 20, 30

N
namespace, 70, 76, 79
NumAtoms, 21
NumBonds, 21

O
OEAbsCanonicalConfTest, 31
OEAbsoluteConfTest, 31
OEAddPDBData, 24
OEAddSDData, 22
OEAtomBase, 21
OEB, 30
OEBinary, 23, 30
OEBondBase, 21
OEClearPDBData, 24
OEClearSDData, 23
OECliqueSearch, 76
OEConfBase, 28, 29, 31
OEConfBaseT, 27
OEConfTest, 30, 31
OECopyPDBData, 24
OECopySDData, 22
OEDefaultConfTest, 30
OEDeletePDBData, 24
OEDeleteSDData, 23
OEExprOpts, 76, 79
OEGetSDData, 22
OEGraphMol, 26, 27
OEHasOrder, 21
OEHasSDData, 22
OEIsomericConfTest, 30
OEMatchPair, 72
OEMCMolBAse, 76
OEMCMolBase, 28, 29, 31, 32
OEMCMolBaseT, 26, 27
OEMCSFunc, 73
OEMCSMaxAtoms, 73
OEMCSMaxAtomsCompleteCycles, 74
OEMCSMaxBonds, 74
OEMCSMaxBondsCompleteCycles, 74
OEMCSSearch, 70, 72, 76
OEMCSSsearch, 73
OEMol, 26–28, 30
OEMolBase, 19–21, 26–29, 31, 69, 72, 76
oemolistream, 30
OEQMol, 72, 76
OEQMolBase, 69, 72


OESDDataIter, 22
OESDDataPair, 22–24
OESetPDBData, 24
OESetSDData, 22
OESubSearch, 68, 69, 72, 76
OESubsetMol, 71

P
PDB, 23, 24
PEGetPDBData, 24
PopActive, 29
predicate, 21, 28
PushActive, 29

S
SD, 20, 22, 23
SDDataPairs, 22
SDF, 30
SetActive, 28, 29
SetConfTest, 30
SetMaxMatches, 72
SetMCSFunc, 71
SetMinAtoms, 72
SetSaveRangs, 76
SetTag, 22, 23
SetValue, 22, 23
SingleMatch, 68
SMARTS, 68, 69
SMILES, 20, 30
