CHAPTER 1<br />
Computer-Aided Molecular Diversity<br />
Analysis and Combinatorial Library<br />
Design<br />
Richard A. Lewis,* Stephen D. Pickett,†‡ and<br />
David E. Clark†<br />
*Computational Chemistry, Eli Lilly and Company Ltd., Lilly<br />
Research Centre, Erl Wood Manor, Sunninghill Road,<br />
Windlesham, Surrey, GU20 6PH, United Kingdom, and<br />
†Computer-Aided Drug Design, Aventis Pharma Ltd. (formerly<br />
Rhône-Poulenc Rorer Ltd.), Dagenham Research Centre,<br />
Rainham Road South, Dagenham, Essex, RM10 7XS, United<br />
Kingdom, (present address): ‡Roche Products Ltd., Roche<br />
Discovery Welwyn, 40 Broadwater Road, Welwyn Garden City,<br />
Hertfordshire, AL7 3AY, United Kingdom<br />
Reviews in Computational Chemistry, Volume 16<br />
Kenny B. Lipkowitz and Donald B. Boyd, Editors<br />
Wiley-VCH, John Wiley and Sons, Inc., New York, © 2000<br />
INTRODUCTION<br />
The roots of combinatorial chemistry can be traced back to Merrifield's<br />
work on the solid phase synthesis of peptides during the 1960s.1 Methods for<br />
rapidly synthesizing large libraries of peptides on solid phase were developed<br />
during the 1980s, making use of the combinatorial relationship between the<br />
length of a peptide and the number of possible amino acids at each position in<br />
the sequence (i.e., an n-residue peptide with X possible amino acids at each<br />
position can be used as the basis for a library of X^n compounds).2 A number of<br />
groups reported protocols for what has become known as combinatorial<br />
synthesis.3-5 At about the same time, the pharmaceutical industry began to come<br />
under greater economic pressure to increase the speed of drug discovery, and so<br />
the prospect of being able to rapidly synthesize large numbers of compounds for<br />
testing was seized upon with enthusiasm. However, peptides generally make<br />
poor drug candidates because they are rapidly metabolized in the body.<br />
Therefore, much effort was expended to develop analogous combinatorial synthetic<br />
methods for producing small organic molecules. By the mid-1990s,<br />
these efforts began to bear fruit. Thus, the discipline of combinatorial chemistry,<br />
in its present-day form, was born and quickly integrated into the drug discovery<br />
efforts of the majority of pharmaceutical companies. For more details on<br />
combinatorial chemistry and its application to drug discovery, the reader is referred<br />
to the reviews from the mid- and late 1990s.6-13<br />
The most common form of combinatorial synthesis for small molecules<br />
involves the combination of a core or scaffold moiety with various reagents,<br />
which provide the substituents for the variable R positions (Figure 1). Assuming<br />
that no combinations are ruled out for synthetic reasons, all combinations of<br />
reagents at each of the positions may be generated. Thus, the potential size of the<br />
combinatorial library is given by the product of the number of possible reagents<br />
at each of the variable R positions. For example, if a scaffold has three variable<br />
positions and there are 100 possible reagents for each of those positions, then<br />
the combinatorial library generated would contain 100^3 (1 million)<br />
compounds. Since it often happens that many more than 100 possible reagents are<br />
readily available for a given reaction, and because the number of variable groups<br />
may exceed three, it is easy to see how combinatorial library sizes may rapidly<br />
exceed current capabilities for synthesis, screening, and storage.<br />
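
The arithmetic above can be made concrete with a short Python sketch. The function names and the reagent labels are hypothetical, chosen only for illustration; the sketch assumes, as the text does, that no reagent combination is excluded for synthetic reasons.

```python
from itertools import product
from math import prod


def library_size(reagent_counts):
    """Size of a full combinatorial library: the product of the number
    of reagents available at each variable position."""
    return prod(reagent_counts)


def enumerate_library(reagent_lists):
    """Lazily yield every combination of one reagent per position."""
    return product(*reagent_lists)


# Three variable positions with 100 reagents each (the example in the text).
print(library_size([100, 100, 100]))  # 1000000

# Peptide case: X = 20 amino acids at each of n = 5 positions gives X**n.
print(library_size([20] * 5))  # 3200000

# Hypothetical reagent labels, small enough to enumerate explicitly.
tiny = [["R1a", "R1b"], ["R2a", "R2b", "R2c"]]
print(len(list(enumerate_library(tiny))))  # 6
```

Because the size is a product, adding a fourth position with 100 reagents multiplies the library by 100 rather than adding 100 to it, which is why full enumeration quickly becomes impractical.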
Given that, for many libraries, a full combinatorial synthesis using all<br />
available reagents is impractical, one of the outstanding challenges to<br />
computer-aided molecular design practitioners in recent years has been to develop<br />
computer-based techniques to help design combinatorial libraries that<br />
encompass as much molecular diversity as possible in the smallest number of<br />
compounds. Analogous methods have also been applied to analyze the molecular<br />
diversity of compound collections (e.g., combinatorial libraries, corporate<br />
repositories, or commercial directories) to find areas of overlap or<br />
complementarity, thereby providing information for compound acquisition or further<br />
synthesis. The application of computational methods to combinatorial libraries<br />
and the study of molecular diversity has been the subject of a number of<br />
reviews14-17 and special issues of journals;18 however, the field is still at best<br />
adolescent and continues to evolve rapidly.<br />
Figure 1 Combinatorial libraries built around a benzodiazepine scaffold (left) and a<br />
diketopiperazine scaffold (right).<br />
This chapter reviews the field of computer-aided combinatorial library<br />
design and molecular diversity analysis. The first section of the chapter provides<br />
the foundation for all that follows by examining the nature of the forces<br />
governing molecular recognition and introducing the concepts of molecular similarity<br />
and molecular diversity. Following on from that, we critically review the types<br />
of descriptor used in molecular diversity studies, as well as methods for the<br />
analysis of “diversity space.” The question of how descriptors of molecular<br />
diversity can be validated is also addressed. After these topics are covered, we<br />
shall review published applications of computational methodologies for library<br />
design and diversity analysis, seeking to highlight their relative strengths and<br />
weaknesses. This leads naturally into the final section, which comprises a<br />
discussion of some of the current issues facing those working in this area and<br />
suggestions regarding possible directions for future research.<br />
MOLECULAR RECOGNITION:<br />
SIMILARITY AND DIVERSITY<br />
There is no universally agreed-upon definition of chemical diversity,19,20<br />
and there are several approaches for designing chemically diverse combinatorial<br />
libraries, which differ not only in the methods and descriptors used but also in<br />
the objectives of the design. We therefore start by defining our terms. By a<br />
“general diverse” library we mean a combinatorial library that covers as wide a<br />
range of values as possible relative to some molecular descriptor derived from<br />
its members. A “general representative” library is here defined as a library that<br />
is designed to mirror the distribution of values for some descriptor shown by a<br />
reference collection (e.g., the World Drug Index21). A “focused” library, on the<br />
other hand, is a library that is constrained to match closely a small set of<br />
compounds or the receptor site of a protein. Each definition corresponds to a<br />
level in a hierarchy of increasing information used for drug discovery, with the<br />
detailed three-dimensional structural information provided by a model of the<br />
binding site being at the top. It seems sensible to try to use the knowledge we<br />
have about ligand-receptor complexes and propagate this understanding right<br />
down to the design of general diverse libraries, if possible. The reader should<br />
not take these definitions too literally, as they are not the only ones used in<br />
the literature.<br />
It is appropriate at this point to explain also the semantics of similarity<br />
and diversity. Similarity is a property of pairs of objects (A is similar to B).<br />
Diversity is a property of collections of objects either with respect to that<br />
collection (as in a general diverse library) or with respect to some external frame<br />
of reference (as in representative or focused libraries). Diversity is therefore not<br />
necessarily the complement of similarity; we reserve the term dissimilarity for<br />
that concept.<br />
Similarity, diversity, and compound libraries relate to the effort of<br />
pharmaceutical discovery chemists to invent molecules that will be recognized by a<br />
biological target playing a key role in a disease process. The molecules must be<br />
able to interact with the target and favorably alter the course of the disease.<br />
Our goal in design is to increase the rate, and reduce the cost, at which new<br />
leads are discovered. In a broad sense, this will be achieved if libraries are<br />
synthesized or compounds bought that complement the physicochemical and/or<br />
structural properties already well represented within the set of compounds<br />
available for screening: that is, if the diversity of the screening set is<br />
increased. The assumption here is that the properties we use are relevant to<br />
drug-receptor interactions. It is sometimes the case that one or more leads are<br />
known. The aim of the design is then to focus on the important properties of the<br />
leads. If the structure of the protein target is known, then the design should use<br />
this information and focus the library toward compounds likely both to fit<br />
sterically and to interact favorably with the protein. This philosophy is well<br />
illustrated by Martin and coworkers, who describe the design of four different<br />
libraries for different purposes and with different levels of information to<br />
direct them.22<br />
We shall start at the top of the information hierarchy, the receptor site of a<br />
protein target, to try to understand what drives the formation of a tightly<br />
binding protein-ligand complex. We can then assess our molecular descriptors<br />
in the light of this understanding. There have been several successful<br />
applications of site-directed ligand design,23,24 so we can try to build on these past<br />
efforts. Most of what we say in this chapter assumes that the biological target is<br />
a protein, but similar concepts apply to nucleic acids, which are less frequently<br />
the site of drug action. We use the term "drug" rather loosely; in reality, we are<br />
dealing with ligands, some of which, we hope, will have the necessary attributes<br />
to become drugs.<br />
Our current understanding of the specificity of biological function is<br />
based on the principles of molecular recognition,25 which, details aside, have not<br />
changed greatly in the last few years. Indeed, the successes of structure-based<br />
drug design have reinforced this orthodoxy. The binding and actions of a ligand<br />
are controlled by the patterns of molecular fields found in the vicinity of the<br />
contact surface of the receptor. In other words, the amino acids of the protein<br />
create an environment that the functional groups of the ligand complement.<br />
There should be multiple contacts between the ligand and the receptor to<br />
maximize specificity and affinity of the overall interaction. It is still a very difficult<br />
task to design conformationally sensible, synthetically accessible target<br />
molecules that have the properties required for tight binding. The advantage of<br />
combinatorial chemistry is that we can make many compounds that are<br />
approximately complementary to our target in shape, in hydrogen-bonding pattern,<br />
and so on, and use this extra coverage of compound space to find leads in more<br />
situations.<br />
The reduction of the rotational and translational motion of a mobile<br />
molecule that occurs on binding to the receptor site and the fixing of certain<br />
receptor side chains implies loss of entropy in both the ligand and the receptor.<br />
This must be balanced by the utilization of enthalpic binding energy between<br />
the ligand and the receptor,26 and the energy of desolvation. Favorable<br />
enthalpic intermolecular interactions can be divided into three main groups:<br />
hydrogen bonding, electrostatic, and polarization. This division is perhaps<br />
arbitrary, but it is convenient, because it allows us to associate functional groups<br />
with interactions and to make up classes of hydrogen bond donors, hydrogen<br />
bond acceptors, deprotonated acids (at physiological pH), protonated bases,<br />
aromatic rings, and hydrophobes (lipophilic portions of a molecule). These<br />
favorable interactions are counteracted by steric repulsion caused by a poor fit<br />
of the ligand and noncomplementarity between ligand functional groups and<br />
the receptor (e.g., the positioning of acidic ligand groups in negatively charged<br />
regions of the receptor). It is not our purpose to discuss this issue in great detail,<br />
and the reader is directed to several excellent reviews in this area.27-31<br />
However, several points are pertinent to the discussion that follows.<br />
The in vacuo strength of a hydrogen bond can be modeled with accuracy,<br />
but the energetics of hydrogen bond formation in solution are not well<br />
understood, as yet. Studies by Fersht and coworkers32 indicate that the free energies<br />
for processes of the type<br />
$\mathrm{X{-}H_{aq}} + \mathrm{Y_{aq}} = (\mathrm{X{-}H} \cdots \mathrm{Y}) + \mathrm{aq}$<br />
range from 2 to 6 kJ/mol for uncharged groups and to approximately 12 kJ/mol<br />
for charged groups. The values are strongly affected by the degree of solvent<br />
exposure of the interaction; that is, surface hydrogen bonds are worth very<br />
little, even in salt bridges.33 It would thus seem likely that hydrogen bonds do<br />
not contribute greatly to the enthalpic stability of a ligand-receptor complex.<br />
Their role in drug-receptor binding seems to be more related to specificity,<br />
especially when the interaction is between charged groups. It should be noted,<br />
however, that even this view is in dispute: work by Doig and Williams34 suggests<br />
that hydrogen bonds can, through entropy, contribute more strongly to the free<br />
energy of binding than is often supposed.<br />
The binding site will have a distinct electrostatic profile owing to the<br />
differing electronegativities and bonding environments of the receptor atoms.<br />
Electrostatic interactions may take the form of charge-charge pairs, for<br />
instance, salt bridges, or interactions involving one or more permanent dipoles.<br />
The affinity of the ligand will be enhanced if the pattern of ligand partial charges<br />
can be made to complement that of the receptor.35-37 It is emphasized that<br />
complementarity does not simply imply that positive charge on the ligand<br />
should be matched by negative charge on the receptor. Complementarity should<br />
also be taken to imply a matching of the magnitudes of the charges. A<br />
highly polar area should not be matched to a slightly polar area, since the energy<br />
of desolvation will not be recouped. This is the same argument as for hydrogen<br />
bonds.<br />
In regions of low polarity, the drug-receptor interaction is influenced<br />
more by entropic and weak dispersive effects. Complementarity is achieved by<br />
placing nonpolar regions of the ligand and receptor next to each other. The<br />
work of Eisenberg and McLachlan38 has provided an approximate means of<br />
quantifying the free energy of hydrophobic interactions involved in protein<br />
folding, using a simple atomic solvation potential,<br />
$\Delta G = \sum_i \sigma_i A_i$, where $\sigma_i$ is an<br />
empirically determined partition coefficient for the atom class and $A_i$ is the<br />
surface area of atom i in the protein.<br />
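
As a worked illustration of the atomic solvation sum, the following Python sketch multiplies a per-class coefficient by each atom's surface area and adds the terms. The function name, the atom classes, and especially the numerical coefficients are placeholders invented for this example; they are not the published Eisenberg and McLachlan parameters.

```python
def solvation_free_energy(atoms, sigma):
    """Atomic solvation potential: sum of sigma[atom_class] * area
    over (atom_class, surface_area) pairs."""
    return sum(sigma[atom_class] * area for atom_class, area in atoms)


# Placeholder coefficients (energy per unit area) for two atom classes.
# These numbers are illustrative only, not fitted parameters.
sigma = {"C": 16.0, "N/O": -6.0}

# Hypothetical surface areas for three atoms.
atoms = [("C", 10.0), ("C", 5.0), ("N/O", 20.0)]

print(solvation_free_energy(atoms, sigma))  # 120.0
```

The sign convention follows whatever convention the coefficients are fitted under; the sketch only shows that the potential is additive over atoms and linear in surface area.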
The free energy of binding can also be strongly influenced by entropic<br />
effects. Any solute in water causes a local ordering of the water molecules in the<br />
first hydration sheath and a loss of mobility.39 Removal of the solute by<br />
complexation will lead to an increase in the solvent entropy. A similar result is<br />
obtained by displacing weakly bound water from the binding site. In contrast,<br />
entropy is lost through the fixing of the ligand upon complexation. The loss of<br />
Brownian entropy of rotation and translation is inevitable. The loss of internal<br />
conformational entropy, caused by the enthalpic interactions between the site<br />
groups and the ligand atoms, can be reduced by chemically bracing (rigidifying)<br />
the ligand, that is, through the introduction of ring systems in place of flexible<br />
chains. An excellent illustration of this is the work of Alberg and Schreiber.40<br />
More recently, studies by Khan et al.41 have given a further vivid example:<br />
X-ray structures of both the flexible and the braced ligand showed that the<br />
extra binding of the braced ligand was due almost entirely to the fixing of the<br />
bound orientation. NMR experiments have shed light on many aspects of protein<br />
dynamics and the effect of ligand binding.42 Indeed, it has been suggested<br />
that in some cases the loss of protein conformational entropy at its binding site<br />
may be compensated for by increased conformational flexibility in other<br />
regions.43<br />
The conformational changes that occur on formation of a complex have<br />
further implications for the process of library design. Many current methods<br />
assume an essentially static picture of the receptor. This assumption is clearly<br />
unsound, but the nature of the conformational changes that occur upon<br />
complexation cannot be predicted until a ligand has been fully designed. It is often<br />
assumed that the uncomplexed conformations of the receptor and the ligand are<br />
low energy states and, as such, will be reasonably well populated in the complex<br />
and will provide a good starting model for the design process. HIV-1 protease44<br />
and the retinoic acid ligand binding domains45 provide worrying<br />
counterexamples to this assumption; a number of others have been cataloged recently.46<br />
Nevertheless, modeling studies have still proved very useful in the case of HIV-1<br />
protease when coupled with X-ray or NMR data.47 Several conformations of<br />
the receptor and the ligand may be examined, but owing to the computational<br />
expense, it is not possible at present to examine all the low energy states. It is<br />
possible to perform good conformational analyses on large numbers of small<br />
molecules, and on the binding site itself, but at present the two cannot be<br />
combined except in an approximate or limited manner.48-51<br />
It is easy, when discussing the energetics of complex formation, to forget<br />
the crucial role played by water. It cannot be emphasized enough that water<br />
plays a vital part in the energetics of complexation, both entropically and<br />
enthalpically. Another function of water molecules is the mediation of contacts<br />
between the ligand and the receptor. There are many examples in which this<br />
behavior has been observed in crystallographic complexes. One study that<br />
specifically investigates this phenomenon is the work of Quiocho et al. on<br />
L-arabinose binding protein.52 It is not clear which of the molecules of water that are<br />
observed in the crystal structure of a receptor are going to be important in<br />
subsequent interactions with an incoming ligand. There are no firm rules for<br />
deciding a priori which water molecules are structural and integral to the site,<br />
but progress has been made in this direction with the CONSOLV program53<br />
and more recent work by Pettitt and coworkers.54,55 The docking program<br />
FlexX56 has been extended to allow automatic inclusion of water molecules in<br />
the docking. However, the difficulties in this area are shown by the final overall<br />
results: only a slight improvement was obtained over calculations without<br />
water, some dockings being greatly improved and others worsened.57<br />
In any set of ligands, it is possible to have multiple modes of binding to the<br />
same active site; it is very difficult to distinguish a priori between the different<br />
modes with confidence using existing methodologies. Examples of potential<br />
multiple binding modes can be found in several well-characterized systems.58<br />
These systems show large-scale changes among the different binding modes. In<br />
the human rhinovirus-14 system, two binding modes are equally populated and<br />
so cannot be distinguished.59 In other cases, the binding mode may be poorly<br />
defined (giving disorder in the crystal). Multiple binding modes do not affect<br />
the process of library design in principle. However, methods should be able to<br />
consider all reasonable binding modes for which the correct answer is not<br />
known a priori, e.g., by similarity to a docked ligand. The interpretation of<br />
binding studies can also be complicated if members of the same library bind in<br />
different manners, giving rise to what are in effect two or more structure-activity<br />
relationships.<br />
DESCRIBING DIVERSITY SPACE<br />
The key to any analysis of molecular diversity or library design is the<br />
descriptors used. From the discussion above, it is clear that the descriptors must<br />
in some way represent, or be correlated with, the important factors governing<br />
pharmaceutical efficacy, such as receptor binding or drug transport. The<br />
descriptors to be chosen will depend on several factors, such as the number of<br />
compounds to be analyzed and what information is available for the target. It<br />
may be that different descriptors are used at various stages of the design process<br />
as described later in the section on Applications. Here we begin by summarizing<br />
the many different descriptors available for diversity analysis/library design;<br />
then we shall discuss the best choice of descriptors for different design tasks.<br />
Finally, we present a discussion on descriptor validation. Descriptors for<br />
diversity analysis have also been reviewed by Brown.60<br />
Types of Descriptor<br />
Most available descriptors can be divided into two broad classes<br />
depending on whether they can be calculated from the two-dimensional (2-D)<br />
connection table or a three-dimensional (3-D) structure, which is usually generated<br />
from a connection table by programs such as CONCORD61 or CORINA.62 In<br />
the 3-D case, conformational flexibility of the molecules should also be<br />
considered, since the generated conformation is unlikely to correspond precisely to<br />
that bound at the biological target. In this instance, descriptor calculation can<br />
be a time-consuming exercise. A second classification of descriptors may be<br />
made according to the way that the information is encoded and similarities<br />
calculated: bit strings or fingerprints versus data reduction of many real-valued<br />
descriptors.<br />
2-D Bit Strings<br />
Molecules are not well described by single descriptors, and thus as many<br />
descriptors as is practical should be used. This necessitates mechanisms for<br />
encoding the descriptor information as efficiently as possible, to allow more<br />
parameters to be used. The most obvious method is to use a binary key (or “bit<br />
string”), in which bits are set on or off depending on the presence or absence of<br />
a feature or some other binary condition. Apart from compact storage, binary<br />
keys can also be operated on very quickly. If a sufficient number of features is<br />
encoded in it, a key can serve as a unique descriptor, or “fingerprint,” for the<br />
molecule. The fingerprint profile for a library can be built up by using the<br />
Boolean AND or OR operation for all the molecule fingerprints in the library.<br />
The AND operation gives an idea of what features are common throughout the<br />
library; the OR operation gives the diversity of features. The power of the AND<br />
operation can be extended to give modal fingerprints,63 in which the feature bit<br />
is set if the feature occurs in at least a threshold percentage of the<br />
compounds (the normal AND key corresponds to a threshold of 100%). This is useful<br />
when one is trying to analyze a series of screening hits to create a constraint<br />
profile to guide library generation.<br />
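
The AND, OR, and modal profiles can be sketched in a few lines of Python, using plain integers as bit strings. The function name and the four-bit example keys are hypothetical, and the sketch assumes a non-empty library and a threshold expressed as a fraction between 0 and 1, with a bit set when the feature occurs in at least that fraction of the compounds.

```python
def library_profiles(fingerprints, n_bits, threshold=1.0):
    """Return (and_key, or_key, modal_key) for integer bit strings.

    modal_key has bit i set when at least `threshold` (a fraction) of
    the fingerprints have bit i set. Assumes a non-empty list.
    """
    and_key = (1 << n_bits) - 1  # start with every bit on
    or_key = 0
    counts = [0] * n_bits
    for fp in fingerprints:
        and_key &= fp            # features common to all molecules
        or_key |= fp             # features present in any molecule
        for i in range(n_bits):
            counts[i] += (fp >> i) & 1
    modal_key = 0
    for i, count in enumerate(counts):
        if count >= threshold * len(fingerprints):
            modal_key |= 1 << i
    return and_key, or_key, modal_key


# Four hypothetical 4-bit keys for a small set of screening hits.
hits = [0b1011, 0b1001, 0b1101, 0b0001]
and_key, or_key, modal_key = library_profiles(hits, n_bits=4, threshold=0.75)
print(format(and_key, "04b"), format(or_key, "04b"), format(modal_key, "04b"))
# 0001 1111 1001
```

With a threshold of 1.0 the modal key reduces to the AND key; lowering it to 0.75 also keeps the leftmost bit, which is set in three of the four hits.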
Two approaches have been adopted for encoding structural information<br />
in bit strings. The first uses a predefined set (or “dictionary”) of substructural<br />
features, and a bit is set on only if a particular feature is present in the molecule<br />
(Figure 2a). Such keys were originally developed in the context of substructural<br />
searching systems; Willett et al.64 were the first to use them to analyze screening<br />
sets. The MACCS keys,65 one of the most commonly used implementations of the<br />
first approach, have been used frequently for diversity studies.66,67<br />
Brown and Martin have shown that adding a frequency count (i.e., storing the<br />
number of times a feature occurs in the molecule) gives improved performance<br />
and that such keys correlate reasonably well with calculated physical properties<br />
such as octanol-water partition coefficients (ClogP).68 The alternative<br />
approach involves an exhaustive enumeration of all bond paths through a<br />
molecule, starting with paths of zero length (the atoms) and continuing up to a<br />
length of seven bonds. This method encodes not just the standard substructural<br />
features (e.g., a carboxylate group is covered by paths of length 0, 1, and 2) but<br />
their relationship in the molecule. The most well-known implementation of this<br />
method is in the Daylight software.69 To enable the use of a fixed-length string,<br />
the occurrence of a particular path is taken as the seed to a pseudo-random<br />
number generator, which generates a number of bits. These bits are then OR’ed<br />
into the fingerprint for the molecule (Figure 2b). This process is known as<br />
hashing. The advantages of the path-based approach are that it is exhaustive<br />
and no predefinition of fragments is necessary. In principle, this should lead to<br />
better retrieval performance in substructure or similarity searching, whatever<br />
the query. The disadvantage is that a particular bit in a hashed fingerprint has<br />
no particular meaning, and several paths may set the same bit by chance. This<br />
may be an issue when using hashed fingerprints for similarity- and<br />
diversity-related tasks. Two recent discussions of bit string similarity measures are<br />
recommended reading.70,71<br />
Figure 2 Simple illustration of bit string encoding of chemical structure. (a) Sample<br />
of a fragment dictionary-based approach. (b) Sample of a hashing scheme using a<br />
path-based decomposition of the structure. The asterisk denotes an element in the bit<br />
string where a collision has resulted from the hashing procedure.<br />
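
The path-based hashing scheme can be illustrated with a minimal Python sketch. This is not the Daylight implementation: the graph representation (a list of element symbols plus a list of bonded index pairs), the function names, the 64-bit length, the two bits per path, and the use of a CRC-32 checksum as the seed are all assumptions made for the example, and bond orders and ring closures are ignored.

```python
import random
import zlib


def enumerate_paths(atoms, bonds, max_bonds=7):
    """Return the set of linear atom paths with 0..max_bonds bonds.

    Each path is written as element symbols joined by '-', and a path
    and its reverse are treated as the same path.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    paths = set()

    def walk(path):
        labels = [atoms[i] for i in path]
        forward = "-".join(labels)
        backward = "-".join(reversed(labels))
        paths.add(min(forward, backward))
        if len(path) - 1 == max_bonds:
            return
        for nxt in neighbours[path[-1]]:
            if nxt not in path:  # no atom is visited twice
                walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return paths


def hashed_fingerprint(atoms, bonds, n_bits=64, bits_per_path=2, max_bonds=7):
    """Seed a pseudo-random generator with each path and OR the
    generated bit positions into a fixed-length integer bit string."""
    fingerprint = 0
    for path in enumerate_paths(atoms, bonds, max_bonds):
        rng = random.Random(zlib.crc32(path.encode("ascii")))
        for _ in range(bits_per_path):
            fingerprint |= 1 << rng.randrange(n_bits)
    return fingerprint


# Heavy-atom graphs of methanol (C-O) and ethanol (C-C-O).
methanol = hashed_fingerprint(["C", "O"], [(0, 1)])
ethanol = hashed_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])

# Every path in methanol also occurs in ethanol, so every bit set for
# methanol is also set for ethanol (the substructure screening property).
print(methanol & ethanol == methanol)  # True
```

Because several paths may map onto the same bit, a set bit cannot be traced back to a single path; shortening `n_bits` in the sketch makes such collisions more frequent.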
Topological Indices and Other Properties Derived<br />
from 2-D Structures<br />
A large number of topological descriptors can be calculated from a 2-D<br />
connection table. These represent such molecular attributes as shape,<br />
branching, flexibility, and electronic properties.72,73 Such descriptors have been used<br />
by several groups for library design or compound selection.74-76 The difficulty<br />
here is in combining the descriptors, because many of them will be correlated. A<br />
variety of techniques exists to tackle this problem, including principal<br />
components analysis (PCA)77 and multidimensional scaling (MDS).78,79 In the Chiron<br />
work,75 both PCA and MDS were used on different families of descriptors such<br />
as topological indices, ClogP,80-82 2-D structural similarities, and specific atom<br />
layer descriptors derived to represent the distribution of key chemical features<br />
around a key point (such as the point of attachment to the core) using bond<br />
counts. These analyses provided a total of 16 composite descriptors for analysis<br />
by D-optimal design techniques.83 Lewis et al.74 took the approach of searching<br />
for six noncorrelated descriptors and used these to partition the corporate<br />
database at what was Rhône-Poulenc Rorer (RPR). Compared to the Chiron<br />
work, the latter approach offers greater interpretability.<br />
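
One simple way to look for noncorrelated descriptors is a greedy filter on pairwise Pearson correlation, sketched below in Python. This is an illustrative stand-in, not the procedure used in the work cited above: the function names, the cutoff of 0.8, and the descriptor table are hypothetical, and the sketch assumes that no descriptor column is constant.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.
    Assumes neither sequence is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_uncorrelated(table, max_abs_r=0.8):
    """Greedily keep descriptors whose |r| with every descriptor
    already kept does not exceed max_abs_r (insertion order decides ties)."""
    chosen = []
    for name, values in table.items():
        if all(abs(pearson(values, table[kept])) <= max_abs_r for kept in chosen):
            chosen.append(name)
    return chosen


# Hypothetical descriptor values for four molecules.
table = {
    "mol_weight": [100.0, 200.0, 300.0, 400.0],
    "heavy_atoms": [7.0, 14.0, 22.0, 28.0],   # nearly proportional to weight
    "logp": [1.0, -0.5, 2.0, 0.2],
}
print(select_uncorrelated(table))  # ['mol_weight', 'logp']
```

Keeping original descriptors, rather than composite axes from PCA or MDS, preserves the interpretability noted above, at the cost of discarding the information carried by the rejected columns.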
Pearlman and Smith84,85 have developed novel molecular descriptors<br />
termed BCUTs based on an initial idea by Burden.86 A number of different atom<br />
level matrices are generated in which the diagonal represents a property such as<br />
atom charge while the off-diagonal elements contain information such as the<br />
2-D (or single-conformer 3-D) distance between two atoms. It is suggested that<br />
the lowest and highest eigenvalues of such matrices contain information that is<br />
useful with regard to molecular diversity. Five or six eigenvalues are selected by<br />
means of a χ² test such that the favored descriptors give an even distribution of<br />
molecules across the five- or six-dimensional space. Again, partitioning is used<br />
to divide the space. This method is applicable to very large data sets (hundreds<br />
of thousands of molecules) and can be used to rapidly compare two large sets of<br />
compounds or to select a representative set of reagents for library design (based<br />
on whole molecule properties). Recent work87 has extended this approach to<br />
use a nonuniform binning scheme. Furthermore, Pearlman and Smith88<br />
describe how this methodology can be used to define what they have termed a<br />
receptor-relevant subspace. In this case, the metrics are chosen so as to group<br />
sets of actives in the same region of space. The BCUT descriptors have also been<br />
shown to be useful for studies of quantitative structure-activity and<br />
structure-property relationships (QSAR and QSPR).89
Property Fingerprints<br />
A natural extension to the substructural fingerprint is the property<br />
fingerprint. Bemis and Kuntz90 have described a method for combining the<br />
distances between points on a molecular surface into a histogram, which can be<br />
regarded as a fingerprint with frequencies. Moreau and Turpin91 have used<br />
autocorrelation vectors based on the values of properties at the atomic centers<br />
in a molecule. Gasteiger and coworkers92 have taken this idea further by<br />
looking at the values of some defined property calculated at the surface of a<br />
molecule. An autocorrelation coefficient is constructed from the property values at<br />
several pairs of points (at the atomic centers or randomly distributed on the<br />
surface of the molecule) and the distance separating the points. A fingerprint is<br />
obtained by binning the pairs into preset distance intervals. For reasons of<br />
computational expediency, however, these approaches consider only one<br />
conformation of each molecule. In the Moreau approach,91 where the number of<br />
points to be sampled is much smaller, the distance intervals also have an<br />
important effect on the amount of useful information contained within the vector.<br />
This is also a critical factor in pharmacophore keys, as we discuss below.<br />
Moreau also computes eight separate vectors based on the connectivity, size,<br />
π-bonds, heteroaromaticity, hydrogen bond donor and acceptor capability, and<br />
the contribution to ClogP of each atom. These vectors are concatenated to give<br />
the overall property fingerprint.<br />
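The autocorrelation idea can be sketched as follows. The fragment assumes that inter-atom distance is counted in bonds and that one property value per atom is supplied; the toy chain and its property values are made up, and surface-based variants would replace the bond count with binned through-space distances.<br />

```python
from collections import deque

def topological_distances(adjacency, start):
    """Bond-count distance from `start` to every reachable atom (breadth-first)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist

def autocorrelation(adjacency, prop, max_dist):
    """A(d) = sum of prop[i] * prop[j] over atom pairs that are d bonds apart."""
    vector = [0.0] * (max_dist + 1)
    for i in adjacency:
        for j, d in topological_distances(adjacency, i).items():
            if i <= j and d <= max_dist:  # count each pair once; d = 0 is the atom itself
                vector[d] += prop[i] * prop[j]
    return vector

# Hypothetical four-atom chain 0-1-2-3 with invented atomic property values.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
values = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
print(autocorrelation(chain, values, max_dist=3))
```

Running the same function once per atomic property and concatenating the resulting vectors gives a fingerprint of the concatenated kind described above.<br />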
3-D Descriptors<br />
Following the early work of Willett and coworkers93,94 and Sheridan et<br />
al.,95 searching databases of 3-D structures of organic compounds has become<br />
an essential tool in the pharmaceutical industry.96,97 Results of 3-D flexible<br />
searching within databases of known compounds have proven this in a practical<br />
sense (see, e.g., Refs. 98 and 99).<br />
These successes have led to the suggestion that descriptors based on<br />
three-point pharmacophores could be useful in assessing the pharmacophoric<br />
diversity of large data sets and in library design.100-104 The principle is illustrated in<br />
Figure 3. The Abbott implementation used fixed-width 1 Å bins up to 15 Å and<br />
considered only the CONCORD-generated conformation.104 In the<br />
implementation at RPR using the ChemDiverse software,105 all potential pharmacophore<br />
triangles or quadrangles are formed from seven types of interaction center<br />
(hydrogen bond acceptor, hydrogen bond donor, tautomeric groups, aromatic<br />
centroids, hydrophobes, acids, and bases) over a range of distances of 2-24 Å<br />
with variable-width bins. Conformational flexibility is taken into account by<br />
means of a systematic search procedure including a bump-check to eliminate<br />
high energy conformers.100,101 With three points (triangles), there are over<br />
250,000 potential pharmacophores; this number rises to 24 × 10^6 if four points<br />
(quadrangles) are considered. The presence or absence of these pharmacophores<br />
in a molecule is encoded in a bit string, often referred to as the molecule’s<br />
“pharmacophore key.”
Figure 3 Illustration of the creation of a pharmacophore key. As the conformation of<br />
a molecule changes, so do the distances between the pharmacophoric groups<br />
(spheres). Each of the two different three-point pharmacophores shown sets its own<br />
particular bit in the pharmacophore key.<br />
The relevance of such descriptors to drug-receptor interactions is evident.<br />
The bit string represents the triangles formed between key interaction points<br />
over a range of accessible conformations. Two key elements in this approach are<br />
correct atom typing (distinguishing basic nitrogens, tautomeric groups, etc.)<br />
and the conformational analysis.100,101 Both these aspects have been the subject<br />
of extensive in-house development at RPR. The recent extension to four-point<br />
pharmacophores has been shown to give even greater discrimination between<br />
compounds.101 One drawback is the time needed to perform the conforma-<br />
tional analysis. Given the availability of several machines on a network, how-<br />
ever, even crude parallelization allows the corporate database to be analyzed<br />
within a few days.<br />
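A pharmacophore key of the kind described above might be assembled along the following lines. The sketch is a toy under stated assumptions, not the ChemDiverse algorithm: the bin edges, the feature labels, and the use of a Python set of canonical triangle labels in place of a fixed-length bit string are all illustrative choices, and the conformers are supplied by hand rather than generated by a systematic search.<br />

```python
from itertools import combinations
from math import dist

# Illustrative variable-width distance bins in angstroms (assumed values).
BIN_EDGES = [2.0, 4.0, 7.0, 11.0, 16.0, 24.0]

def bin_index(distance):
    """Return the index of the bin containing `distance`, or None if out of range."""
    for k in range(len(BIN_EDGES) - 1):
        if BIN_EDGES[k] <= distance < BIN_EDGES[k + 1]:
            return k
    return None

def triangle_label(f1, f2, f3):
    """Order-independent label: the three edges as (type pair, distance bin)."""
    edges = []
    for (type_a, pos_a), (type_b, pos_b) in ((f1, f2), (f1, f3), (f2, f3)):
        k = bin_index(dist(pos_a, pos_b))
        if k is None:
            return None
        edges.append((tuple(sorted((type_a, type_b))), k))
    return tuple(sorted(edges))

def pharmacophore_key(conformers):
    """OR together the triangles found in every conformer of one molecule."""
    key = set()
    for features in conformers:
        for trio in combinations(features, 3):
            label = triangle_label(*trio)
            if label is not None:
                key.add(label)
    return key

# Two hand-made conformers of one hypothetical molecule.
conformer_1 = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)),
               ("aromatic", (0.0, 5.0, 0.0))]
conformer_2 = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)),
               ("aromatic", (0.0, 9.0, 0.0))]
key = pharmacophore_key([conformer_1, conformer_2])
print(len(key))  # the two conformers contribute two different triangles
```

Mapping each distinct label to a fixed bit position would turn the set into the bit string described in the text; the choice of bin edges decides how much of the geometric information survives.<br />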
Cramer et al.106 developed a methodology based on comparative molecular<br />
field analysis (CoMFA). Rules are used to align R groups (hence the method is<br />
not applicable to all diversity tasks) in a single conformation (which may<br />
include intramolecular contacts). An interaction energy is calculated with a probe<br />
positioned at all points on a grid around the molecules. Since conformational<br />
flexibility is ignored, these “topomeric” descriptors are essentially “2.5-D.”<br />
Mount et al.107 have recently published the IcePick methodology<br />
developed at Axys. A small set of low energy conformers is generated for each<br />
molecule. Pairwise comparisons are performed, flexibly fitting a conformation<br />
of molecule B onto a fixed conformer of molecule A and vice versa, using a<br />
modified version of the Hammerhead docking algorithm.108 The scoring<br />
function utilizes the molecular surface scoring of the Compass program,109 which<br />
considers hydrophobic and hydrogen-bonding properties at a set of discrete<br />
points projected onto two shells at 6 and 9 Å around the molecule. The overall<br />
similarity is the average of these measures over all pairs of matches of A onto B<br />
and B onto A. The dissimilarity is computed as (1 - similarity). Each<br />
dissimilarity calculation can take about 40 seconds on a DEC (now Compaq)
Alpha workstation, and so the results are stored in a database for future use.<br />
This time-consuming method has been used primarily for reagent selection,<br />
assuming that the presence of an acid, for example, would define how the<br />
reagents would fit to a common core.<br />
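The practical weight of a 40-second comparison is easy to estimate. The short calculation below assumes that every pair in a set is compared once, giving N(N - 1)/2 pairs, on a single processor; both assumptions are made here only to show the scaling.<br />

```python
SECONDS_PER_PAIR = 40  # per-comparison time quoted in the text

def pairwise_hours(n_molecules, seconds_per_pair=SECONDS_PER_PAIR):
    """Hours needed to compare every pair of n_molecules once, on one processor."""
    pairs = n_molecules * (n_molecules - 1) // 2
    return pairs * seconds_per_pair / 3600

for n in (100, 1000, 10000):
    print(f"{n:>6} molecules: {pairwise_hours(n):,.0f} hours")
```

Under these assumptions a hundred reagents need about 55 hours, whereas ten thousand molecules would need decades, which is consistent with the method being applied to reagents and with the results being stored for reuse.<br />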
A further method for analyzing the geometric diversity of functional<br />
groups in chemical structure databases has been reported by Hubbard and<br />
coworkers.110 Their program, HookSpace, analyses the spatial relationship<br />
between pairs of functional groups and provides both qualitative and quantitative<br />
diversity measures. The utility of the method was demonstrated by comparing<br />
the diversity of two commercially available databases and a<br />
benzodiazepine-based combinatorial library. In a similar vein, Bartlett and Lauri have used the<br />
CAVEAT program to assess the diversity of different combinatorial core groups<br />
based on a comparison of bond vectors at the substituent positions.111<br />
Chapman112 has proposed an elegant formalism for expressing the<br />
diversity of a collection of molecules, based on molecular entropy and the<br />
three-dimensional arrangement of steric bulk and polar functionalities. The method<br />
addresses molecular flexibility by means of a conformational search to identify<br />
a set of low energy conformers. The similarity of two conformers is given by<br />
computing the best steric overlap, then computing the sum of the distances<br />
between each atom in conformer 1 and its corresponding nearest neighbor in<br />
conformer 2. An analogous function is used to compute a distance based on<br />
polar functionalities (hydrogen bond donors, acceptors, etc.). Note that all<br />
pairs of conformers for all molecules are compared. The diversity function<br />
comprises a sum of minimum dissimilarities together with an entropic penalty<br />
term based on the number of rotatable bonds in a molecule. It will come as no<br />
surprise to learn that this approach is very computationally expensive. Thus, in<br />
practice, this method is probably restricted to cases in which the superposition<br />
is fixed, that is, looking at the position of side chains relative to a fixed core.<br />
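The steric part of the conformer comparison can be sketched as below, assuming the two conformers have already been superimposed and are given as lists of Cartesian coordinates. The overlay optimization, the polar term, and the entropic penalty are omitted, and the coordinates are invented.<br />

```python
from math import dist

def nearest_neighbour_sum(conformer_1, conformer_2):
    """Sum, over atoms of conformer_1, of the distance to the closest atom of conformer_2."""
    return sum(min(dist(a, b) for b in conformer_2) for a in conformer_1)

# Hypothetical pre-aligned conformers (coordinates in angstroms).
conf_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
conf_b = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.0)]
print(nearest_neighbour_sum(conf_a, conf_b))  # only the third atom of conf_a is unmatched
print(nearest_neighbour_sum(conf_b, conf_a))  # the reverse direction differs
```

The sum as written is not symmetric, so a symmetric score needs both directions. Each evaluation costs on the order of the product of the two atom counts, and it is repeated for all pairs of conformers of all molecules, which is where the expense noted above comes from.<br />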
Receptor-Based Descriptors<br />
When a crystal structure is available, the additional information ironically<br />
makes the task of design more time-consuming. It is not currently feasible to<br />
perform detailed calculations on every member of a library within the proposed<br />
active site, including all the important factors described in the above section on<br />
Molecular Recognition: Similarity and Diversity. Indeed, methods for the<br />
flexible docking of ligands are still being developed, although some (e.g., those<br />
described in Refs. 56, 113, and 114) are beginning to show promising success<br />
rates. However, such methods are quite computationally expensive; thus,<br />
approaches that make more approximations are probably necessary. Some recent<br />
publications in this area use one particular approximation: specifically, holding<br />
the template or scaffold fixed and considering each R group independently. The<br />
PROSELECT115 strategy builds on the earlier de novo design program<br />
PRO-LIGAND.116 Several potential template positions are chosen and<br />
substituents assessed by means of an empirical scoring function.117 The Kuntz group
has developed an approach (CombiBUILD) based around the program DOCK,<br />
which assesses mainly the steric fit of a ligand with an approximate force field<br />
score for ranking. The template is kept fixed and substituents at each position<br />
are evaluated while allowing for possible intramolecular interactions between<br />
substituents at different positions using conformational probability maps. This<br />
method has been used with success to select reagents for a library against<br />
cathepsin D.118 More recently, the DOCK program itself has been used in an<br />
“anchor-and-grow” mode to design libraries targeted against plasmepsin II.119<br />
Another DOCK variant for library design, CombiDOCK, has been described,<br />
but no applications have yet been published.120 In another approach, Bohm<br />
adapted the LUDI de novo design program121,122 to allow the structure-based<br />
selection of reagents and has recently applied this methodology to design<br />
inhibitors of thrombin.123<br />
Chemical Design Ltd. has developed software (“Design in Receptor”124,125)<br />
that allows the virtual screening of tens to hundreds of thousands of<br />
compounds against all potential three- or four-point pharmacophores within<br />
the binding site of a protein. This program, which extends the<br />
pharmacophore-based methodology to embrace the concept of site-directed library design, was<br />
developed in collaboration with a small number of pharmaceutical industry<br />
partners. The method operates as follows: first, key interaction sites (donor,<br />
acceptor, acid, base, hydrophobe, or aromatic) are defined in the receptor site.<br />
Then, all possible three- or four-point pharmacophore queries are derived from<br />
these sites. The number of queries can be restricted by applying user-definable<br />
criteria, which may specify, for instance, minimum and maximum distances<br />
between points and/or groups of points that must be included in all<br />
pharmacophores. Finally, the derived set of pharmacophores (perhaps several<br />
hundred to more than a thousand) is used to search the database of virtual products,<br />
with the protein active site acting as a steric constraint. The search is performed<br />
as a standard 3-D pharmacophore search, with each hit conformer being fitted<br />
back onto the matching query pharmacophore. However, matching each<br />
pharmacophore in turn against every molecule would require repeating the<br />
conformational analysis of each compound for every query. Speed is gained by inverting the<br />
matching loop: performing the conformational analysis only once and comparing<br />
each conformer against all query pharmacophores. The same proprietary<br />
conformational analysis scripts and atom typing can be used as for standard<br />
pharmacophore key calculations.101 It is possible to save three pharmacophore keys:<br />
(1) the key of the site pharmacophore matched, (2) the key of the ligand atoms<br />
matching site pharmacophores, and (3) the full pharmacophore key of the<br />
ligand OR’ed over all conformations that fit the site. Such methodology should<br />
open the way for full product-based design taking account of the ability of<br />
the molecules to fit the receptor with no a priori assumptions about binding<br />
modes and selecting products such that the library will cover all potential site<br />
pharmacophores.
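<br />
The loop inversion described above can be sketched as follows. Conformers are reduced here to sets of pharmacophore labels and the conformational analysis to a stand-in function, so the names `screen` and `toy_conformers` and the label strings are hypothetical; the point is only that the expensive step runs once per molecule rather than once per molecule per query.<br />

```python
def screen(molecules, queries, enumerate_conformers):
    """Conformer-outer loop: each conformer is tested against every query."""
    hits = {}
    for name, molecule in molecules.items():
        matched = set()
        for conformer in enumerate_conformers(molecule):  # expensive step, once per molecule
            matched |= {q for q in queries if q in conformer}
        if matched:
            hits[name] = matched  # site pharmacophores matched by this molecule
    return hits

analysis_calls = 0

def toy_conformers(molecule):
    """Stand-in for conformational analysis; counts how often it is invoked."""
    global analysis_calls
    analysis_calls += 1
    return molecule

# Hypothetical virtual products, each a list of conformers (sets of labels).
library = {
    "cpd_1": [{"DA5", "DH7"}, {"DA6"}],  # two conformers
    "cpd_2": [{"AH3"}],                  # one conformer
}
site_queries = {"DA6", "DH7", "HH9"}
hits = screen(library, site_queries, toy_conformers)
print(hits, analysis_calls)
```

With the query loop on the outside, the stand-in would have been called once per molecule for each of the three queries; here it is called once per molecule.<br />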
Choosing Appropriate Descriptors<br />
The choice of descriptor will depend on a number of factors, including any<br />
personal biases of the modeler! Perhaps the most important considerations are<br />
the amount of information available about the target and whether lead<br />
compounds have been discovered. There are several possible scenarios:<br />
1. Little information is available, and we are in the realm of general library<br />
design.<br />
2. Several leads are available, and the descriptors must in some way utilize the<br />
information in these leads.<br />
3. A crystal structure is available, and descriptors/methods are needed to utilize<br />
this information.<br />
The scale of the problem (i.e., number of compounds to be processed) can be<br />
significant, because some of the descriptors described above will be applicable<br />
to only a few hundred thousand compounds rather than millions. Thus we face<br />
several questions of vital importance in the design of drug molecules: To what<br />
extent can and should pharmacological and pharmaceutical properties be taken<br />
into account/predicted? How can the plethora of available descriptors be<br />
sensibly weighted? Finally, how can the various descriptors and methods of design be<br />
validated? These are all active areas of current research. Overriding all these<br />
considerations, however, is the requirement that the descriptors be calculable<br />
for a wide range of structural classes in a time frame applicable to the problem<br />
at hand. Several months may be needed for selecting subsets from a corporate<br />
database or assessing compounds for purchase, but the turn-around time for<br />
library design is generally a few weeks or less. Of course, it would also be<br />
advantageous if the same descriptors could be used to tackle a variety of prob-<br />
lems. For example, screening hits from a general library could be analyzed<br />
within the descriptor space used to design the library, which immediately pro-<br />
vides insight into the type of molecules required for focused lead follow-up<br />
libraries. Thus, descriptor interpretability may also be an issue.<br />
In summary, the choice of descriptors will depend on the problem at hand<br />
and the constraints of time imposed on the designer. Issues of descriptor<br />
validation are discussed in the next section, though there is no consensus at this time<br />
on the best descriptors to use. We have had success in applying the<br />
pharmacophore-based 3-D descriptors to a variety of design tasks. We favor these<br />
descriptors because they represent key aspects of intermolecular interactions<br />
and take account of conformational flexibility. The pharmacophore descriptors<br />
can be applied to diverse subset selection, general library design, and focused<br />
library design. Site-directed design is in its infancy, but, as described above, the<br />
methods are being developed to apply the pharmacophore descriptors in this<br />
area too.
Validation of Descriptors<br />
The validation of descriptors is an unsolved problem, fraught with difficulties.<br />
Validation implies the comparison of theoretical results against some absolute<br />
truth, provided by experimental data or by the universe of all possible results.<br />
Our stated goal is that design should enhance the process of lead generation and<br />
optimization. It would seem appropriate to use hit rates as a measure of how<br />
well our diversity analysis does in comparison to chance: “simulated<br />
screening.” This approach has been investigated by a number of researchers including<br />
the authors of Refs. 126-129. However, there are a number of issues concerning<br />
this type of approach. First, it assumes that the universe of chemical space can<br />
be neatly divided into actives and inactives, according to some biological test.<br />
However, membership of a set depends on the threshold defined for activity. If<br />
we return to our ideas about molecular recognition, we see that binding with<br />
micromolar affinity may indicate some degree of recognition, possibly mixed in<br />
with some solvophobic effects. As the activity improves, we are getting more of<br />
the features right, until at low nanomolar levels, we have compounds that fill<br />
the active site in a complementary manner. Thus, membership of the actives<br />
club becomes more exclusive as the threshold is raised and fewer chemical<br />
families are able to gain entrance.<br />
The next issue is that of sampling. The entire universe of compounds<br />
cannot be assayed and split into the active/inactive sets. How do we know that<br />
we have used a representative sample to test? Are the contents of the Spresi<br />
database130 representative of the chemical universe, or those of the World Drug<br />
Index21 of active drugs? Both questions probably have a negative answer, so<br />
methods that use this approach to validation must be viewed with caution. Even<br />
the term “hit rate” can be misleading. From a lead generation viewpoint, the<br />
aim should be to cover as many distinct structural classes as possible rather than<br />
concentrating on crude counts of hits (prompting the question of how to define<br />
a distinct structural class!). The “quality” of the hits is also important: that is,<br />
how amenable are they to optimization by medicinal chemistry. These consider-<br />
ations imply that the most efficient approach involves screening a well-designed<br />
set, followed up by screening close analogs of the hits.<br />
A number of studies have used an alternative approach to assess descriptor<br />
quality for diversity profiling. In these studies, descriptors were ranked by their<br />
ability to discriminate active and inactive compounds within a number of<br />
medicinal chemistry project data sets. In the work of Brown and Martin,66 this<br />
discrimination involved the ability to separate one class of compounds from a<br />
general pool of compounds. The approach put forward by Patterson et al.131<br />
(see also Refs. 132 and 133) introduced the concept of “neighborhood<br />
behavior”: that is, compounds close in biological space should have a small<br />
difference in descriptor values. In these studies, it was suggested that 2-D fingerprints<br />
and simple shape descriptors make better descriptors than other alternatives
such as the primitive 3-D pharmacophore fingerprints studied. From our own<br />
perspective, such assertions regarding descriptor quality are rather sweeping.<br />
Two-dimensional substructure searches are used routinely to extract analogs<br />
from databases.134 Similarly, measurement of shape variation provides one of<br />
the staple descriptors of 3-D QSAR calculations.135-137 A capacity to distin-<br />
guish active from inactive analogs from a single biological screen at a nanomolar<br />
level is hardly proof of an ability to discriminate between heterogeneous activity<br />
classes. Within a single activity class, differences as small as a methyl group can<br />
have significant effects on activity. This well-known piece of medicinal<br />
chemistry lore can be verified by a careful reading of many SAR papers. Jacobsen et<br />
al.138 provide a recent example in which two compounds (Figure 4) differ by one<br />
methyl group and have a 70-fold difference in their relative activities. The<br />
structural differences that exist between different receptors will tend to be much<br />
larger, however. Thus, to some extent, the results of such studies could have been<br />
predicted. In fact, there are any number of examples in which such approaches<br />
would break down. Many targets of pharmaceutical relevance involve the<br />
competition of a small-molecule ligand for a binding site with a natural ligand such<br />
as a small peptide or even a protein. The structurally diverse endothelin<br />
antagonists discovered by a number of companies offer a case in point.99,139,140 All<br />
have a low 2-D similarity according to Daylight fingerprints (Figure 5), yet<br />
maintain the arrangement of essential pharmacophoric features.<br />
Fibrinogen receptor antagonists represent another example. In this<br />
instance, the natural ligand is (in part) the RGD (Arg-Gly-Asp) loop. As can be<br />
seen from Figure 6, different antagonists may show a high degree of structural<br />
diversity, exhibiting Daylight fingerprint similarities of less than 0.6. As an<br />
experiment, a database of 100,000 compounds taken from the RPR collection<br />
was seeded with 12 diverse RGD antagonists taken from the literature.141<br />
Performing a similarity search in this database with a multipharmacophore key<br />
derived from a flexible conformational analysis of the RGD tripeptide retrieves<br />
all 12 antagonists within the top 3% of the database (Table 1).142 Alternatively,<br />
Figure 4 Illustration of the effect of adding a single methyl group to a compound's<br />
activity. In the source paper (Ref. 138), compound 41 (R = H) has a mean binding<br />
affinity of 6.67 nM against [3H]flunitrazepam. The corresponding value for<br />
compound 54 (R = Me) is 470 nM.
SB 209670<br />
RPR109353<br />
Figure 5 Structurally diverse endothelin antagonists exhibiting low 2-D similarity<br />
while maintaining common pharmacophoric elements crucial to activity.<br />
using one of the synthetic antagonists (BIBU52) as the probe retrieves the other<br />
11 antagonists in the top 855 compounds. While this result is not proof of the<br />
validity of pharmacophore descriptors for library design, it certainly shows that<br />
the descriptors capture many of the important features of ligand-receptor<br />
interactions.<br />
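Summary figures of the kind reported in Table 1 (best rank, worst rank, and the number of seeded compounds found within the top N) can be computed from any ranked hit list as sketched below. The identifiers and ranks are invented for the example and are not the data behind Table 1; the function name and cutoffs are likewise assumptions.<br />

```python
def retrieval_stats(ranked_ids, seeded, cutoffs=(100, 500, 1000)):
    """Rank-based summary of where the seeded compounds fall in a hit list."""
    ranks = sorted(pos for pos, cid in enumerate(ranked_ids, start=1) if cid in seeded)
    return {
        "top": ranks[0],      # best-ranked seeded compound
        "lowest": ranks[-1],  # worst-ranked seeded compound
        "found_within": {n: sum(r <= n for r in ranks) for n in cutoffs},
    }

ranked = [f"c{i}" for i in range(2000)]  # hypothetical hit list, best match first
seeded = {"c0", "c249", "c854"}          # hypothetical known actives
stats = retrieval_stats(ranked, seeded)
print(stats)
```

The function assumes at least one seeded compound appears in the hit list; an empty match would need separate handling.<br />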
Perhaps the best lesson to be drawn from these descriptor comparisons is<br />
that most of the proposed descriptors provide some discrimination pertinent to<br />
the problem at hand, and, as stated earlier, the final choice will depend on many<br />
factors relating to the nature of the problem. Two-dimensional descriptors can<br />
be very efficient at removing close analogs from screening sets, whereas to<br />
design small-organic molecule libraries based on peptide leads, or indeed on any<br />
structurally diverse compound set, or to achieve diversity in a biologically<br />
relevant space, requires descriptors (namely, 3-D ones) that capture the essence<br />
of drug-receptor interactions.<br />
A further philosophical problem is that many of the descriptors used to<br />
date are derived from the field of similarity analysis.143 Two-dimensional<br />
fingerprints lose relevance once outside a defined structural family. It is an<br />
accepted fact that similarity values below about 0.5 are not reliable/significant.<br />
This is not a problem for clustering similar compounds, when one simply wants<br />
to know that compound A is not similar to compound B, but problems arise<br />
when it is important to know how dissimilar two compounds are. A pertinent<br />
critique of 2-D bit string descriptors has been presented by Flower.70
TAK029<br />
MK383<br />
BIBU52<br />
Figure 6 Some structurally diverse RGD antagonists.<br />
APPLICATIONS<br />
With the necessary theory and background now in place, we move on to<br />
examine how to use the descriptors. In addition to what follows, the reader may<br />
wish to consult a special issue of Perspectives in Drug Discovery and Design<br />
from a few years ago entitled “Computational Tools for the Analysis of<br />
Molecular Diversity.”15 It contains review articles covering many of the issues<br />
discussed below: cluster-based selection, partition-based selection, and
Table 1 Use of a Pharmacophore Key Derived from the RGD Tripeptide to Retrieve<br />
12 Seeded RGD Antagonists from a Random Collection of 100,000 Molecules<br />
Probe | Number of Hits | Top (a) | Lowest (b) | N (c): 100 | N (c): 500 | N (c): 1000<br />
RGD | 23,884 | 8 | 3,044 | 3 | 5 | 7<br />
MK383 | 57,846 | 13 | 11,252 | 2 | 5 | 5<br />
SB214857 | 48,210 | 10 | 18,086 | 3 | 4 | 6<br />
TAK029 | 38,728 | 1 | 2,275 | 5 | 6 | 9<br />
BIBU52 (d) | 37,805 | 1 | 855 | 4 | 6 | 11<br />
(a) Position in the hit list of the highest ranking of the 12 seeded compounds.<br />
(b) Location of the lowest ranking of the seeded compounds.<br />
at this stage.127,146<br />
This is especially true when one is simply looking for hits<br />
showing some activity that can be followed up by screening similar compounds<br />
from the corporate database. A maximally diverse set is to be preferred to a<br />
purely random selection for the following reasons. The maximally diverse set<br />
should maximize the structure-activity information gained from the screen by<br />
minimizing the redundancy in the set of compounds tested. A purely random<br />
selection, rather than a maximally diverse one, will not guarantee the absence of<br />
close homologs. Further, although empirical evidence suggests that the number<br />
of hits obtained from a random selection may approach that obtained from a<br />
maximally diverse set, the latter should ensure that structurally and<br />
physicochemically diverse leads are found, giving medicinal chemists a better chance<br />
of finding suitable compounds to follow up for lead optimization.146 Once one<br />
or more leads have been selected for a project, it might be desirable to select<br />
follow-up sets for screening. In this case, compounds that are similar to the<br />
lead(s) in some sense will be sought.<br />
Both these types of selection may be accomplished by either clustering or<br />
partitioning methods. For a diverse selection, one might cluster the collection<br />
and then test only the cluster centroids, whereas in a follow-up similarity<br />
search, other compounds from within the clusters containing the leads could be<br />
tested. If a partitioning approach were to be used, a diverse selection could be<br />
obtained by choosing one compound from each occupied cell in the grid,<br />
whereas compounds similar to a lead could be found by examining the cell that<br />
contains it, together with immediately adjacent cells. A diverse set can also be<br />
constructed by means of a maximum dissimilarity selection algorithm, whereas<br />
a follow-up set could be identified by simply ranking compounds by similarity<br />
to the lead(s). Finally, experimental design techniques, autocorrelation<br />
methods, and a variety of stochastic algorithms may also be applied to subset<br />
selection.<br />
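Two of the diverse-selection strategies just listed can be sketched in a few lines. The fragment below assumes compounds are points in a low-dimensional descriptor space with Euclidean distance, seeds the maximum dissimilarity selection with the first compound, and takes the first compound encountered in each occupied cell; all of these are illustrative choices, and the coordinates are invented.<br />

```python
from math import dist

def maxmin_select(points, k, distance=dist):
    """Greedy maximum-dissimilarity selection of k items (k <= len(points))."""
    chosen = [0]  # arbitrary seed: the first compound
    nearest = [distance(points[0], p) for p in points]
    while len(chosen) < k:
        # Next pick: the compound farthest from everything chosen so far.
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(old, distance(points[nxt], p)) for old, p in zip(nearest, points)]
    return chosen

def one_per_cell(points, cell_width):
    """Partition-based selection: the first compound met in each occupied cell."""
    cells = {}
    for index, point in enumerate(points):
        cell = tuple(int(coord // cell_width) for coord in point)
        cells.setdefault(cell, index)
    return sorted(cells.values())

# Hypothetical compounds in a two-dimensional descriptor space.
compounds = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.9), (0.15, 0.2), (0.9, 0.1)]
print(maxmin_select(compounds, 3))
print(one_per_cell(compounds, 0.5))
```

A follow-up set for a lead can be drawn from the same structures: sort by `distance` to the lead, or list the members of the lead's cell and its neighbors.<br />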
Clustering Subset selection by clustering has been a st<strong>and</strong>ard approach<br />
for many years. Perhaps the seminal paper in this regard is that of Willett <strong>and</strong><br />
coworkers.64 In this work, the nonhierarchical clustering algorithm due to<br />
Jarvis <strong>and</strong> Patrick147 was employed to cluster the Pfizer chemical stores file<br />
(approximately 8500 available compounds) with the aim of selecting small<br />
subsets for screening. The same techniques were also employed to group the<br />
output from substructure searches, again with the intent of reducing the number<br />
of compounds to be screened, while maximizing the information gained from<br />
the screening. A drawback to this nonhierarchical method is the lack of control<br />
over the size of the largest cluster <strong>and</strong> the number of singletons. Slight variations<br />
in the control parameters can lead to the formation of one very large, probably<br />
unrealistic, cluster, or at the other extreme, a high fraction of clusters with a<br />
single compound. Menard and coworkers148 tried to address this issue through<br />
their cascaded clustering approach, in which prior knowledge about the poten-<br />
tial size of the largest cluster in the database was used to set the clustering<br />
parameters.
The small clusters (< 5 members) were extracted <strong>and</strong> reclustered. When<br />
the results were checked by medicinal chemists, this strategy seemed to have<br />
reduced the number of singletons to an acceptable level. An alternative approach<br />
developed by Doman et al.149 employed a fuzzy clustering technique150<br />
combined with the Jarvis-Patrick method.147 The methodology has no user-defined<br />
parameters <strong>and</strong> allows compounds to belong to more than one cluster.<br />
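The neighbor-sharing idea behind Jarvis-Patrick clustering can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the work cited above: the function name, the full distance-matrix input, and the two control parameters `k` and `k_min` are our own choices, and published variants differ in whether a compound counts as its own neighbor.<br />

```python
from itertools import combinations


def jarvis_patrick(dist, k, k_min):
    """Cluster items 0..n-1 from a full distance matrix (list of lists).

    Two items are joined when each is in the other's k-nearest-neighbor
    list and the two lists share at least k_min members.
    """
    n = len(dist)
    # k nearest neighbors of each item, excluding the item itself
    nn = [
        set(sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k])
        for i in range(n)
    ]
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if j in nn[i] and i in nn[j] and len(nn[i] & nn[j]) >= k_min:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy 1-D "descriptor" values: two tight groups and one outlier
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2, 50.0]
dist = [[abs(a - b) for b in points] for a in points]
clusters = jarvis_patrick(dist, k=2, k_min=1)
```

Raising `k_min` in this sketch produces more singletons, while lowering it merges groups, which is the sensitivity to control parameters described above.<br />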
Hierarchical clustering methods are not as greatly affected by the issue of<br />
singletons, but they do impose higher computational demands. If N is the<br />
number of compounds to be processed, the dissimilarity matrix can require up<br />
to O(N^2) disk space for storage, and the standard clustering algorithm requires<br />
O(N^3) time.151 Some workers have achieved improved performance by use of<br />
Murtagh’s reciprocal nearest-neighbor algorithm,152 which requires only O(N)<br />
disk space and O(N^2) time, allowing the clustering of up to 200,000 structures<br />
in a reasonable time.71,151<br />
Partitioning A good example of a partitioning approach to screening <strong>and</strong><br />
follow-up set selection is the diverse property-derived (DPD) method described<br />
by Lewis et al.74 The following molecular attributes were used to construct a<br />
six-dimensional property space: number of H-bond acceptors, number of<br />
H-bond donors, molecular flexibility, Hall and Kier’s electrotopological state<br />
index,153 ClogP, and an “aromatic density” measure. Compounds from the<br />
corporate database were then partitioned across this space, <strong>and</strong> each compound<br />
was assigned an identifier (DPD code) according to the partition to which it was<br />
allotted. When the compounds had been partitioned, a rational, general screening<br />
set was created by selecting one compound from each of the partitions. This<br />
screening set has been in regular use at RPR for a number of years <strong>and</strong> has<br />
yielded several weak leads (1-50 μM) in a variety of assays. A particular instance<br />
of this concerned a project to find inhibitors of low density lipoprotein<br />
(LDL) production. In this case, the general DPD set yielded one hit, but a<br />
follow-up set containing compounds with the same DPD code (i.e., occupying<br />
the same cell) gave further hits. These were refined in conjunction with an<br />
existing lead to give a query for use in 3-D searching. Searches of the corporate<br />
database resulted in compounds having low nanomolar activity.154 Lewis et<br />
al.74 make the point that, in general, the DPD set does not give rise to high<br />
quality leads, but rather to hits. However, since the DPD set represents a diversity<br />
of molecular properties rather than of structural features, the DPD set is<br />
likely to be especially useful with new screens where leads have not yet been<br />
identified.<br />
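A cell-based selection of this kind can be sketched in a few lines. The example below is only an illustration: the two descriptors, the cut points, and the function names are hypothetical, and the actual DPD ranges are not given in the text.<br />

```python
from bisect import bisect_right


def cell_code(descriptors, cut_points):
    """Map a descriptor vector to a tuple of bin indices (a DPD-style code)."""
    return tuple(bisect_right(cuts, v) for v, cuts in zip(descriptors, cut_points))


def partition(compounds, cut_points):
    """Group compound names by the cell they fall into."""
    cells = {}
    for name, descriptors in compounds.items():
        cells.setdefault(cell_code(descriptors, cut_points), []).append(name)
    return cells


def diverse_set(cells):
    """General screening set: one compound from each occupied cell."""
    return [members[0] for _, members in sorted(cells.items())]


def follow_up_set(cells, lead_code):
    """Follow-up set: every compound sharing the lead's cell."""
    return list(cells.get(lead_code, []))


# Hypothetical two-descriptor example: (H-bond donors, ClogP)
compounds = {"a": (1, 0.5), "b": (1, 0.7), "c": (4, 3.0)}
cut_points = [(2, 5), (1.0, 2.0)]  # assumed bin boundaries per descriptor
cells = partition(compounds, cut_points)
screening = diverse_set(cells)
followers = follow_up_set(cells, cell_code(compounds["a"], cut_points))
```

Because the code of a compound depends only on the descriptors and the cut points, the same function can be applied unchanged to any other collection.<br />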
Maximum Dissimilarity-Based Selection The original algorithm for<br />
dissimilarity ranking in the chemical structure context seems to have been proposed<br />
by Bawden,155 although the basic algorithm may be due to Kennard and<br />
Stone.156 The basic operation of a dissimilarity selection algorithm is to start<br />
with a compound selected at random and make this the first selected compound.<br />
Subsequent compounds are selected so that they are maximally dissimilar<br />
to all those in the currently selected set. Dissimilarity may be measured by
the maximum sum of dissimilarities to all selected molecules (MaxSum) or the<br />
largest nearest neighbor distance (MaxMin). The final diversity of the N-molecule<br />
subset is given by Eq. [1] or [2], where sim(i, j) is the similarity between<br />
molecules i and j, and d_ij is the Euclidean distance between molecules i and j in the<br />
descriptor space.<br />
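The displayed equations are not reproduced in this copy. As a reading aid only, one common way of writing the two measures is given below; the normalization and the exact form are our assumption and should not be taken as the authors' Eqs. [1] and [2].<br />

```latex
% Assumed general forms, for orientation only
\text{MaxSum:}\quad D_{\mathrm{sum}} = \sum_{i=1}^{N}\sum_{j>i}^{N}\bigl[\,1-\mathrm{sim}(i,j)\,\bigr]
\qquad
\text{MaxMin:}\quad D_{\mathrm{min}} = \min_{i\neq j}\, d_{ij}
```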
This type of methodology was embraced by researchers at Upjohn in their<br />
COUSIN system.126 The Willett group developed fast algorithms based<br />
on the MaxSum dissimilarity measure in combination with the cosine coefficient.<br />
This algorithm was applied by Pickett et al.102 in conjunction with multipharmacophore<br />
descriptors to the task of selecting diverse reagents. Willett’s<br />
group has looked extensively at both definitions of dissimilarity159 <strong>and</strong> al-<br />
gorithms for dissimilarity-based compound selection.160 In the former case,<br />
they concluded that it was impossible to identify any of the four definitions<br />
studied as being superior to the others.<br />
When the algorithms were compared, however, the MaxMin algorithm<br />
gave better results than the alternatives under study. In fact, several<br />
workers107,161 have highlighted a problem with the MaxSum procedure. The<br />
measure is based on the distance of the point from the centroid of the set <strong>and</strong> so<br />
tends to select molecules from the corners of diversity space, <strong>and</strong> duplicate<br />
selections can appear to add to the diversity. This situation is clearly a problem<br />
with traditional descriptors, because the extremes of space tend to be less rele-<br />
vant chemical compounds (very high or very low log P, etc.).<br />
It is interesting to consider why using “corner” compounds is a less press-<br />
ing issue when applied to pharmacophore keys. First, the pharmacophore space<br />
is very high-dimensional, <strong>and</strong> it is not uncommon to have a number of reagents<br />
or molecules that have no (or only very few) pharmacophores in common.<br />
Mount et al.107 note that in higher dimensional spaces, more of the points are<br />
near the periphery, rendering the difference in behavior less pronounced. Second,<br />
the molecules are not randomly spread throughout space but tend to<br />
cluster; thus inclusion of a similarity threshold to prevent selection of molecules<br />
similar to those already selected avoids revisiting areas of space. Provided the<br />
number of compounds to be selected is small compared to the size of the set, the<br />
time overhead for this additional constraint is not too great. Third, it is also<br />
possible to monitor how many new pharmacophores a selected molecule would<br />
add to the set.100 Thus, the similarity measure ensures that pharmacophores are<br />
presented in different combinations, while the monitoring of the addition of<br />
new pharmacophores ensures that, overall, all pharmacophores within the set
are covered (i.e., by combining a partitioning <strong>and</strong> a distance-based approach).<br />
These arguments notwithstanding, the MaxMin procedure would appear to be<br />
the method of choice today. Agrafiotis and Lubanov161 have shown how k-d<br />
trees can provide an efficient way to calculate nearest neighbor distances for<br />
input to a MaxMin selection procedure. They use a simulated annealing procedure<br />
to select an n-molecule subset that maximizes Eq. [3]. This expression<br />
provides a smoother function compared to the standard MaxMin expression<br />
(Eq. [2]).<br />
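A greedy MaxMin selection can be sketched as follows. This is a plain O(N) update per pick on a precomputed distance matrix, written for illustration only; it is not the k-d tree or simulated annealing procedure cited above, and the function names and the optional `start` argument are our own.<br />

```python
import random


def maxmin_select(dist, n_pick, start=None, seed=0):
    """Greedy MaxMin: repeatedly add the compound whose nearest selected
    neighbor is farthest away."""
    n = len(dist)
    if start is None:
        start = random.Random(seed).randrange(n)  # random first compound
    selected = [start]
    # nearest[i] = distance from compound i to its closest selected compound
    nearest = list(dist[start])
    while len(selected) < min(n_pick, n):
        chosen = set(selected)
        cand = max((i for i in range(n) if i not in chosen), key=lambda i: nearest[i])
        selected.append(cand)
        nearest = [min(a, b) for a, b in zip(nearest, dist[cand])]
    return selected


def min_separation(dist, subset):
    """Smallest pairwise distance within the subset (the MaxMin score)."""
    return min(dist[i][j] for i in subset for j in subset if i < j)


# Toy 1-D descriptor values
points = [0, 1, 2, 10, 20]
dist = [[abs(a - b) for b in points] for a in points]
picked = maxmin_select(dist, n_pick=3, start=0)
```

Keeping the running `nearest` list avoids recomputing all distances to the selected set at every step.<br />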
A general dissimilarity selection algorithm was recently reported by<br />
Clark.162,163 There is an adjustable parameter in the algorithm that controls the<br />
balance between representativeness <strong>and</strong> diversity. Other functions for maximiz-<br />
ing dissimilarity have been suggested by Hassan et al.164 In their work, the<br />
(dis)similarity function is derived from a large number of 2-D <strong>and</strong> single-<br />
conformer 3-D descriptors, the dimensionality being reduced by means of prin-<br />
cipal components analysis (PCA). Multidimensional scaling is used to generate<br />
a 3-D coordinate plot for the library. The library design is a “cherry-picking”<br />
procedure: a random selection of compounds is taken, and compounds are<br />
added to and removed from this selection by means of a Monte Carlo method<br />
combined with a maximal dissimilarity function based on the sum of the dis-<br />
tances between molecules in the PCA descriptor space. It seems from Hassan’s<br />
paper,164 that the principal components are recalculated for, <strong>and</strong> are particular<br />
to, each library, making the performance of interlibrary comparisons a non-<br />
trivial task. Hudson et al.165 have also reported the development of<br />
dissimilarity-based methods for the selection of diverse subsets.<br />
Experimental Design In addition to a maximal dissimilarity selection algorithm,<br />
similar in spirit to those described above, Higgs et al.166 have experimented<br />
with the use of a D-optimal design algorithm to generate what they term<br />
an “edge design.” By this they mean a design that tends to select molecules on<br />
the edge of the descriptor space, filling the corners first and then populating the<br />
edges. Experimental design has also been used for reagent selection by the<br />
Chiron group,75 who claim that it can generate “maximal overall diversity.”<br />
However, Higgs et al. criticize this assumption. In their experience, the D-optimal<br />
design algorithm does not explicitly seek to avoid previously sampled areas<br />
of space, even with the inclusion of additional (quadratic) terms. The Lilly<br />
group166 much prefers the maximal dissimilarity selection algorithm (what they<br />
term a “spread design”), which is able to sample descriptor space thoroughly,<br />
including molecules from the edges <strong>and</strong> throughout the space. A further type of<br />
design (a “coverage design”), suitable for lead follow-up, is mentioned in their<br />
work.166 The coverage design algorithm identifies a subset of molecules that is<br />
maximally similar to a candidate set.<br />
Kohonen Maps Kohonen maps are essentially a projection technique,<br />
providing a lower dimensional (usually 2-D) view of a higher dimensional<br />
descriptor space. Objects close in the higher dimensional space will be placed in<br />
the same or neighboring neurons, <strong>and</strong> so the method could be classed as a<br />
partitioning technique. Gasteiger <strong>and</strong> coworkers167 applied this technique in<br />
conjunction with spatial autocorrelation vectors <strong>and</strong> were able to differentiate<br />
dopamine <strong>and</strong> benzodiazepine agonists.168 The method has also been proposed<br />
as a means of assessing the diversity of combinatorial libraries.92 Agrafiotis has<br />
described the application of a similar technique, Sammon mapping, for visualizing<br />
the results of diversity analyses.169<br />
Spanning Trees The IcePick program,107 mentioned earlier in connection<br />
with 3-D descriptors, utilizes a minimum weight spanning tree (MWST) to<br />
obtain a spread of molecules. The MWST can be thought of as the shortest way<br />
of indirectly connecting a set of points. When the MWST is large, the set will be<br />
diverse because the points are spread out.<br />
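The total weight of a minimum weight spanning tree is straightforward to compute from a distance matrix. The sketch below uses Prim's algorithm as a generic illustration; it is our own example and makes no claim about how IcePick itself is implemented.<br />

```python
def mst_weight(dist):
    """Total edge weight of the minimum spanning tree (Prim's algorithm)."""
    n = len(dist)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge linking each point to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
    return total


def distance_matrix(points):
    return [[abs(a - b) for b in points] for a in points]


tight = mst_weight(distance_matrix([0, 1, 3]))    # clustered points
spread = mst_weight(distance_matrix([0, 5, 10]))  # well-separated points
```

In this toy case the more spread-out set gives the larger tree weight, in line with the interpretation above.<br />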
It is worth noting that in all the methods described in this section, diversity<br />
is being equated to dissimilarity between compounds, <strong>and</strong> dissimilarity is being<br />
assessed as (1 - similarity). In other words, the methods require a comparison<br />
metric that is meaningful for measurement of distance between quite dissimilar<br />
objects. This is not the case for 2-D fingerprints, for example, which were<br />
developed for 2-D substructure searching <strong>and</strong>, as mentioned earlier, tend to lose<br />
meaning below similarities of about 0.5. In the authors’ opinion, not enough<br />
consideration has been given to this issue. It is for this reason that validating<br />
metrics on quite structurally homogeneous data sets (where such assumptions<br />
may apply) is not the same as validating them on very structurally inhomogeneous<br />
sets (see above section on Validation of Descriptors).<br />
Partitioning Versus Distance-based Methods There are several methods<br />
available for selecting representative subsets from large sets. Each method has<br />
its good <strong>and</strong> bad points, <strong>and</strong> the specifics of the application should determine<br />
the most appropriate method to select. The methods are fairly independent of<br />
the nature of the descriptor but are affected by whether the descriptor is discrete<br />
(e.g., binary fingerprints) or continuous (e.g., molecular weight). Techniques for<br />
clustering chemical objects have been well reviewed by other researchers144,170,171<br />
and have been applied by several groups to select representative<br />
screening sets from large compound collections. Despite these successful<br />
applications, we think that clustering should be used with great care. The<br />
application of a clustering method makes the assumption that the data are in<br />
fact amenable to clustering: in other words, most clustering methods will pro-<br />
duce a clustering, whatever the data. To the authors’ knowledge, there are no<br />
simple ways of testing whether this assumption is justified for a very large data<br />
set. Certainly, cluster significance tests have been proposed,172, but they are<br />
quite computationally expensive <strong>and</strong> not practical to apply to very large data<br />
sets. The second <strong>and</strong> most important factor is the lack of generality when one is<br />
applying distance-based measures. If the subset is defined by the clustering of
one database or combinatorial library, it is hard to define the descriptors for<br />
compounds in a second database or library without a large number of expensive<br />
distance calculations, as well as some arbitrary definitions of cluster dimensions.<br />
Perhaps the best application of clustering in the context of library design<br />
is to remove redundancy in reagent sets.<br />
Partitioning is best described as a boxing algorithm: each descriptor is<br />
divided into ranges; a combination of descriptor ranges makes a partition or<br />
box. The composite descriptor is then effectively the coordinate vector of one of<br />
the vertices of the box. The complete set of partitions is formed by taking all<br />
combinations of all the ranges into which the molecular descriptors have been<br />
divided. This approach also has the useful property of being space filling. It is<br />
completely portable between different databases, designs, or applications, provided<br />
the same descriptors <strong>and</strong> ranges are used, thus allowing comparison<br />
between compound sets from different sources or different combinatorial libraries<br />
(see next section). Other advantages are easy control of granularity <strong>and</strong>,<br />
perhaps most important, the ability to identify property space not represented<br />
by any molecule. Disadvantages of the partitioning algorithm include the arbitrary<br />
way in which the ranges must be set, <strong>and</strong> the introduction of edge effects<br />
when a partition boundary slices between two very similar compounds; an<br />
answer to this issue may come through the application of fuzzy logic.173 These<br />
edge effects have implications for follow-up screening of molecules in the same<br />
partition as a lead: to avoid missing compounds that fall just outside a partition,<br />
the surrounding partitions should also be tested. However, for a six-dimensional<br />
classification like the DPD system,74 with perhaps 50 compounds<br />
per partition, this could necessitate screening a further potential 36,400 [i.e.,<br />
50(3^6 - 1)] compounds, a number almost large enough to defeat the object of the<br />
exercise. However, the portability of the descriptor outweighs this negative<br />
factor, in our opinion. More work remains to be done to reach a consensus on<br />
the question of which method, clustering or partitioning, gives the better performance.<br />
At present, we must conclude that choosing the method best suited to<br />
the task at hand is preferable to modifying the task to suit a favored methodology.<br />
Thus the application of these methods by the practicing computational<br />
chemist may require some trial <strong>and</strong> error.<br />
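The count of surrounding partitions quoted above can be checked directly. The sketch below enumerates every cell that differs from a given cell by at most one bin in each dimension; the function name is our own, and for simplicity the sketch ignores the boundaries of the grid.<br />

```python
from itertools import product


def neighbor_cells(code):
    """All cells within one bin of `code` in every dimension, excluding
    `code` itself (grid boundaries are ignored in this sketch)."""
    offsets = product((-1, 0, 1), repeat=len(code))
    return [
        tuple(c + o for c, o in zip(code, offset))
        for offset in offsets
        if any(offset)  # skip the all-zero offset, i.e., the cell itself
    ]


lead_cell = (2, 2, 2, 2, 2, 2)           # a cell in a six-dimensional grid
n_neighbors = len(neighbor_cells(lead_cell))
extra_compounds = 50 * n_neighbors        # at 50 compounds per partition
```

For six dimensions this gives 3^6 - 1 = 728 surrounding cells and hence 36,400 additional compounds.<br />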
Comparison of Compound Collections with a<br />
View to Acquisition or Combinatorial Libraries<br />
with a View to Synthesis<br />
As mentioned above, corporate chemical structure databases are replete<br />
with analog series <strong>and</strong> are thus far from representative of the full range of<br />
structural or physicochemical diversity. There is therefore much interest in,<br />
first, locating the “diversity voids” within a particular collection, <strong>and</strong> then<br />
analyzing external collections to see which compounds could be purchased to<br />
occupy those holes. In this way, the molecular diversity of a corporate collection<br />
can be enhanced, <strong>and</strong> this in turn should lead to better results from high-
throughput screening experiments for the reasons outlined in the preceding<br />
section. Clearly, identical techniques can be used for the comparison of com-<br />
binatorial libraries to ensure that synthetic effort is not being wasted in the<br />
generation of redundant compounds.<br />
As an example of compound collection comparison, Shemetulskis et al.174<br />
carried out clustering experiments to see how much diversity would be added to<br />
the Parke-Davis corporate database (CBI, 117,459 compounds) by the inclusion<br />
of the Chemical Abstracts Service (CAST-3-D, 379,847 compounds) and<br />
the Maybridge (MAY, 41,912 compounds) databases.175,176 The approach<br />
used was to cluster the CBI database with each of the MAY <strong>and</strong> CAST-3-D<br />
databases in turn <strong>and</strong> to examine what percentage of the resulting clusters<br />
contained only (or more than 95%) MAY or CAST-3-D compounds. The MAY<br />
compounds in these clusters could then be considered as candidates for purchase.<br />
The clustering experiments were carried out on the basis of both structural<br />
attributes and physicochemical properties using the Jarvis-Patrick algorithm147<br />
as implemented in the Daylight software.69 With the large numbers<br />
of compounds involved, the clustering effort [requiring an O(N^2) nearest-neighbor<br />
table calculation] was immense. As an illustration, the generation of<br />
the nearest-neighbor table for the CAST-3-D database took 64 CPU days on an<br />
SGI 4D/480 workstation!<br />
Apart from the large amount of CPU time required for clustering (or<br />
distance-based) experiments of the type mentioned above, such methods are<br />
generally not well suited to diversity void location, simply because they can deal<br />
only with space that is covered by the compounds being clustered. So, in the<br />
work above for instance, if there were regions of diversity space not occupied by<br />
any compound in CBI, MAY, or CAST-3-D, there would be no way of discover-<br />
ing these voids or of choosing compounds to fill them. Thus, partitioning (cell-<br />
based) approaches are generally considered to be preferable for this kind of<br />
analysis, provided, of course, that a suitable diversity space for partitioning is<br />
defined.84<br />
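The advantage of a cell-based scheme for locating voids can be illustrated with a short sketch. It assumes that each compound has already been assigned a cell code in a common grid; the data and function names are hypothetical.<br />

```python
from itertools import product


def empty_cells(occupied, bins_per_dim):
    """Cells of the full grid that contain no compound (the diversity voids)."""
    every_cell = product(*(range(b) for b in bins_per_dim))
    return [cell for cell in every_cell if cell not in occupied]


def void_fillers(in_house, external):
    """External compounds that fall in cells unoccupied by the in-house set.

    Both arguments map a compound name to its cell code.
    """
    occupied = set(in_house.values())
    return sorted(name for name, code in external.items() if code not in occupied)


# Hypothetical 2 x 2 grid
in_house = {"a": (0, 0), "b": (0, 1)}
external = {"x": (0, 1), "y": (1, 1)}
voids = empty_cells(set(in_house.values()), bins_per_dim=(2, 2))
to_buy = void_fillers(in_house, external)
```

Cell (1, 0) is reported as a void even though no compound in either collection occupies it, which is precisely what a clustering of the existing compounds cannot reveal.<br />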
Cummins et al.76 used a cell-based approach to compare the molecular<br />
diversity in five databases: the Comprehensive Medicinal Chemistry (CMC177)<br />
and MACCS Drug Data Report (MDDR178) (each representing medicinal<br />
chemistry knowledge bases), the Available Chemicals Directory (ACD179) and<br />
SPECS180 (representing commercially available compounds), <strong>and</strong> the Wellcome<br />
Registry. The compounds in these databases (totaling more than 300,000) were<br />
mapped into a molecular descriptor space describing molecular diversity in<br />
terms of the free energy of solvation <strong>and</strong> 60 topological indices. This number of<br />
descriptors was reduced to four by factor analysis, <strong>and</strong> a partitioning method<br />
was used to analyze the resulting space. It was found that the superpopulation<br />
of structures occupied only a very small volume of the available space; attention<br />
was focused on the densely populated part by removing outliers (cells with no<br />
or few representatives). In any event, only about 7000 compounds were deleted<br />
in this process, at which point it became possible to compare the databases in
detail. For example, the MDDR <strong>and</strong> ACD databases were found to overlap<br />
each other’s volume by around 70%, reflecting the fact that many biologically<br />
active molecules are of commercial interest <strong>and</strong> vice versa.<br />
More recently, Willett’s group has extended its methodology for diverse<br />
subset selection to the analysis of the relative diversity of compound collec-<br />
tions.158 The six databases compared comprised five publicly available collec-<br />
tions <strong>and</strong> a combinatorial library. The individual diversities of the databases<br />
were assessed, <strong>and</strong> also the changes in diversity that occurred when one<br />
database was merged with another. Interestingly, the union of two databases<br />
does not always result in an increase in diversity! For instance, the diversity of<br />
the Maybridge collection was found to decrease markedly when it was merged<br />
with a simple combinatorial library constructed from the condensation of 400<br />
primary amines <strong>and</strong> 400 carboxylic acids selected from the World Drug Index21<br />
(WDI) database. In other words, according to the metrics used, the molecules in<br />
the resulting database are more similar to each other than those just in Maybridge.<br />
Pickett et al.102 have adopted a similar kind of methodology but using a<br />
different descriptor, 3-D pharmacophores rather than 2-D bit strings. In this<br />
work a number of potential combinatorial libraries were compared, <strong>and</strong> the<br />
results were used to select the subset that added the most pharmacophore<br />
diversity in comparison to screening libraries previously synthesized.<br />
A rather different tack has been taken by Nilakantan et al.,181 who<br />
describe a method for comparing large chemical databases. Their approach<br />
relies on categorizing each database according to its ring system content, based<br />
on some earlier work.182 Each ring system in each molecule is assigned a hash<br />
code, <strong>and</strong> these codes are summed for each molecule to generate what the<br />
investigators term a ring-cluster hash code. By comparing the resulting hash<br />
codes for two databases, it is possible to gain some idea about how similar they<br />
are. Nilakantan et al. used this metric to compare a number of public databases<br />
[Cambridge Structural Database (CSD),183 ACD, WDI, and the National<br />
Cancer Institute (NCI-3-D)184 database] <strong>and</strong> discovered that the CSD has the<br />
richest collection of ring systems <strong>and</strong> ring clusters. The same paper presents a<br />
different method for the estimation of database diversity. The program DIVPIK<br />
simply tries to pick a certain number of dissimilar compounds from a database.<br />
Intuitively, the more diverse a database, the fewer attempted selections will be<br />
required. A measure of diversity can be gained by considering the ratio<br />
NTRIES/NPICK. Nilakantan et al. used this measure to demonstrate that the<br />
diversity of the four databases increased in the order WDI < ACD = NCI-3-D <<br />
CSD (essentially the same result obtained by a consideration of the ring cluster/<br />
system hash codes). The two independent methods thus serve to validate each<br />
other to some extent, although the DIVPIK method is significantly more com-<br />
putationally expensive in practice.<br />
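The NTRIES/NPICK idea can be illustrated with a simple sketch. The details of DIVPIK are not described here, so the acceptance rule below (a candidate is kept only if it is dissimilar to every compound already picked), the sampling without replacement, and all names are assumptions made purely for illustration.<br />

```python
import random


def tries_per_pick(items, dissimilar, n_pick, seed=0):
    """Ratio of attempted selections to successful picks.

    Candidates are drawn in random order without replacement; a candidate is
    accepted only if `dissimilar(candidate, p)` holds for every pick p so far.
    """
    order = list(items)
    random.Random(seed).shuffle(order)
    picked, tries = [], 0
    for candidate in order:
        if len(picked) == n_pick:
            break
        tries += 1
        if all(dissimilar(candidate, p) for p in picked):
            picked.append(candidate)
    return tries / len(picked) if picked else float("inf")


def far_apart(a, b):
    # Hypothetical dissimilarity test on 1-D descriptor values
    return abs(a - b) > 5


diverse_ratio = tries_per_pick([0, 10, 20, 30, 40], far_apart, n_pick=3)
redundant_ratio = tries_per_pick([0, 0, 0, 0, 0], far_apart, n_pick=2)
```

A ratio close to 1 indicates that almost every attempt succeeds, as expected for a diverse collection; a redundant collection gives a larger ratio.<br />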
We attempted a practical application of these ideas in a project to select<br />
1000 compounds from one agrochemical-biased corporate collection (CC1) to<br />
supplement the diversity of a representative pharmaceutical-biased screening
set (PSS) derived from another independent corporate collection. These experiments<br />
used the Chem-X105 pharmacophore key overlaps as the similarity metric.<br />
We found that we could achieve better results by using diversity analysis<br />
tools, but that prefiltering had a very important role to play (a sobering thought<br />
for those of us caught up in the mathematics of diversity analysis). The follow-<br />
ing filters were used:<br />
• Remove compounds containing potentially reactive or toxic groups.<br />
• Remove molecules with a molecular weight outside the range 200-600 Da.<br />
• Remove molecules with a ClogP value outside the 0-6 range.<br />
• Remove all molecules expressing a number of pharmacophores outside the range 1-1000.<br />
• Remove all molecules with more than 100,000 conformations.<br />
• Remove all instances of “near-duplicate” molecules. (This was achieved by taking each molecule in turn and removing all molecules with a Daylight fingerprint similarity > 0.95 to it.)<br />
• Remove compounds with heavy atom counts outside 20-45, excluding halogens.<br />
While the filters are fairly stringent, we did not expect them to remove 83% of<br />
the corporate collection! Use of the HARPick program185 (see below) increased<br />
the number of pharmacophores present in the selected set from around 13,000<br />
for the first random pick to 15,711 and increased the number of pharmacophores<br />
unique to the selected set (as compared to PSS) from 535 to 850.<br />
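The property filters listed above translate naturally into a simple predicate. The sketch below is illustrative only: the record layout and field names are hypothetical, the descriptor values are assumed to have been computed elsewhere, and the similarity function passed to the near-duplicate step merely stands in for a fingerprint comparison.<br />

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float          # Da
    clogp: float
    n_pharmacophores: int
    n_conformations: int
    heavy_atoms: int           # heavy atom count, halogens excluded
    has_reactive_group: bool = False


def passes_filters(c):
    """Apply the property ranges quoted in the list above."""
    return (
        not c.has_reactive_group
        and 200 <= c.mol_weight <= 600
        and 0 <= c.clogp <= 6
        and 1 <= c.n_pharmacophores <= 1000
        and c.n_conformations <= 100_000
        and 20 <= c.heavy_atoms <= 45
    )


def drop_near_duplicates(compounds, similarity, threshold=0.95):
    """Keep a compound only if it is not too similar to one already kept."""
    kept = []
    for c in compounds:
        if all(similarity(c, k) <= threshold for k in kept):
            kept.append(c)
    return kept


ok = Compound("ok", 350.0, 2.5, 400, 5000, 25)
heavy = Compound("heavy", 750.0, 2.5, 400, 5000, 50)
survivors = [c.name for c in (ok, heavy) if passes_filters(c)]
```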
Combinatorial Library Design<br />
The key task in library design, in which molecular diversity analysis can<br />
play a central role, is the selection of reagents. In general, these reagents will<br />
give rise to R groups attached to a conserved scaffold or template. The need for<br />
reagent selection arises because in many instances, the product of the number of<br />
available reagents at each variable position rapidly outstrips the synthetic<br />
capability of even high-throughput, robotic synthesis units. From arguments<br />
similar to those advanced in the preceding section, it is obviously sensible to<br />
choose a diverse subset of the available reagents at each position for general<br />
library design. In some instances, there will be additional information that can<br />
focus or constrain the design. We shall deal with these two scenarios separately.<br />
General Library Design<br />
Broadly speaking, there are three approaches to reagent selection. In<br />
reagent-based selection, a subset is chosen to maximize the diversity of the<br />
reagents at each position without considering the reagents at the other posi-<br />
tions, or the scaffold. A good example of such a method is that reported by the
Chiron group.75 Of course, almost any of the techniques for diverse subset<br />
selection may be applied to reagent-based selection of reagents. Alternatively, a<br />
product-based scheme can be envisaged, in which reagents are selected at all<br />
positions so that the diversity of the generated products is maximized. This type<br />
of approach has been championed by Gillet et al.186 <strong>and</strong> by Good <strong>and</strong> Lewis.185<br />
Finally, one may pick the most diverse set of products <strong>and</strong> then deconvolute to<br />
find the sets of reagents required to make that set. This kind of approach,<br />
sometimes called cherry-picking, is exemplified by the methods embodied in the<br />
ChemDiverse package.105<br />
There are some advantages <strong>and</strong> disadvantages to each of these ap-<br />
proaches, <strong>and</strong> each may be appropriate in certain design situations. In general,<br />
the cherry-picking approach will result in the most diverse set of products;<br />
however, this approach has the serious disadvantage of not resulting in a syn-<br />
thetically efficient combinatorial library. That is, it is likely to be necessary to<br />
synthesize a number of “unwanted” molecules in addition to the desired prod-<br />
ucts. Reagent-based selection is fast, since one is not considering the enumer-<br />
ated combinatorial products in the analysis, <strong>and</strong> thus this method may be<br />
suitable when the enumerated virtual library is very large. However, experiments<br />
by Gillet et al.186 have shown that a product-based reagent selection<br />
approach gives diversity superior to that obtainable from a reagent-based<br />
method. Van Drie <strong>and</strong> Lajiness report a similar experience.187 Balanced against<br />
this we note that most product-based schemes can deal only with enumerated<br />
libraries of the order of 100,000 molecules, a number that is easily attainable,<br />
particularly with more than two variable positions on the template. In practice,<br />
one is likely to need to combine the reagent-based <strong>and</strong> product-based ap-<br />
proaches. The reagent-based selection methods can be used to filter the initial<br />
reagent lists to a size at which the virtual library becomes tractable for analysis<br />
by a product-based method. This kind of hybrid approach has been used suc-<br />
cessfully by Good <strong>and</strong> Lewis in applying their HARPick program.185<br />
We have already discussed the work of Chapman112 from the perspective<br />
of molecular descriptors. We will now look at it in terms of library design.<br />
Chapman computes diversity as the sum of all pairwise dissimilarities between<br />
the molecules in the set. A bias may be introduced to weight against excessive<br />
flexibility in the molecules by a function based on the number of rotatable<br />
bonds. A standard “greedy” algorithm that adds the molecule that will most<br />
increase the diversity of the current set of molecules is used to build up a library<br />
design. This implies a cherry-picking strategy. Even so, the diversity measure is<br />
still very computationally intensive, and at present this method can handle only<br />
libraries in the low thousands.<br />
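The greedy procedure described above can be sketched as follows. This is a minimal illustration and not Chapman's implementation: the dissimilarity used here is a Tanimoto distance on sets of feature indices (a stand-in for the 3-D descriptor of the original work), and the linear rotatable-bond penalty with its weight `flex_weight` is an assumed form of the flexibility bias.

```python
def tanimoto_dissimilarity(a, b):
    """1 - Tanimoto coefficient for two sets of 'on' feature indices."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union

def greedy_diverse_subset(features, rot_bonds, k, flex_weight=0.05):
    """Grow a subset one molecule at a time, always adding the candidate
    that most increases the sum of pairwise dissimilarities, less an
    assumed linear penalty on its number of rotatable bonds."""
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        def gain(i):
            spread = sum(tanimoto_dissimilarity(features[i], features[j])
                         for j in selected)
            return spread - flex_weight * rot_bonds[i]
        best = max(remaining, key=gain)   # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected

features = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
rot_bonds = [2, 2, 2, 2]
print(greedy_diverse_subset(features, rot_bonds, k=2))   # [0, 2]
```

Each step evaluates every remaining candidate against every selected molecule, so the cost grows quickly with library size, which is consistent with the practical limit noted in the text.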
The nature of product-based library design lends itself naturally to the<br />
application of heuristic search methods such as simulated annealing188 <strong>and</strong><br />
genetic algorithms.189 Several groups have published applications in the latter<br />
area, which has been recently reviewed.190-192 While all methods differ some-<br />
what in their technical implementations of the different algorithms, by far the
most important factor affecting the final choice of reagents is the scoring func-<br />
tion. As always, there is a need to use descriptors pertinent to lig<strong>and</strong>-receptor<br />
interactions. The HARPick program of Good <strong>and</strong> Lewis185 uses a fitness func-<br />
tion based on multipharmacophore molecular descriptors. Both simulated an-<br />
nealing <strong>and</strong> genetic algorithms have been studied.193 The scoring function in<br />
HARPick is very flexible <strong>and</strong> is made up from a weighted combination of the<br />
following terms: the number of pharmacophores expressed <strong>and</strong> their frequency,<br />
some crude shape measures, molecular flexibility, <strong>and</strong> the degree of match to<br />
the pharmacophore profile of a reference library. The method was tested by<br />
means of a variety of weighting combinations <strong>and</strong> libraries, <strong>and</strong> the results were<br />
compared with the data obtained with ChemDiverse,105 which, as mentioned<br />
earlier, uses a cherry-picking strategy. Both ChemDiverse <strong>and</strong> HARPick were<br />
able to improve considerably molecular selection based on pharmacophore<br />
count, compared to r<strong>and</strong>om selections, but HARPick calculations, which were<br />
set to purely maximize pharmacophore diversity, were able to find around twice<br />
the number of pharmacophores obtained by the comparable ChemDiverse runs.<br />
As expected, however, the molecules chosen were substantially more flexible<br />
<strong>and</strong> “promiscuous.” Inclusion of the “quality” terms (which penalize undesir-<br />
able characteristics such as excessive conformational flexibility in the library<br />
members) reduced the pharmacophore scores of the final selections but not<br />
drastically (still better than r<strong>and</strong>om). As one might expect, selections made at<br />
r<strong>and</strong>om or via ChemDiverse gave sets of molecules that broadly followed the<br />
distribution of properties (such as the number of rotatable bonds in a molecule)<br />
observed in the whole St<strong>and</strong>ard Drugs File (now known as the World Drug<br />
Index21). HARPick managed to produce a much more even distribution. In<br />
another evaluation of HARPick reported in Ref. 185, the program outper-<br />
formed r<strong>and</strong>om selections from the perspective of filling diversity voids in a<br />
reference library. Given our remarks about the difficulties in measuring general<br />
diversity, this is probably the best way in which such selection methods should<br />
be applied.<br />
The primary feature emphasized by the calculations above is the control<br />
afforded to the user over both the components of the scoring function <strong>and</strong> the<br />
weights applied to them. In principle, any descriptor could be applied to the<br />
scoring functions. One could envisage maximizing functions (e.g., 3-D phar-<br />
macophore or 2-D fingerprint coverage, reagent supplier reliability), minimiz-<br />
ing functions (e.g., cost per reagent), partition functions (e.g., general shape,<br />
ClogP), <strong>and</strong> bounding functions (assigning a score of zero to products with<br />
properties outside specified bounds, e.g., minimum/maximum ClogP). In principle,<br />
a totally customizable scoring function could be devised, with the user<br />
able to choose the properties included in the scoring routine, and the functions<br />
used on them. Similar ideas are envisaged by Agrafiotis169 and have been implemented<br />
by groups at various pharmaceutical companies. With careful application<br />
of user weightings for each component function, the result would be a<br />
totally flexible profiling paradigm.
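A customizable scoring function of this kind might be sketched as below. The property names, weights, and ClogP bounds are assumptions chosen for illustration, not values from HARPick or any other program. The sketch combines one maximizing term (pharmacophore coverage), one minimizing term (reagent cost), and one bounding term (products outside the ClogP window contribute nothing).

```python
def within_bounds(value, low, high):
    """Bounding function: 1.0 inside the window, 0.0 outside."""
    return 1.0 if low <= value <= high else 0.0

def library_score(products, weights, clogp_bounds=(-1.0, 5.0)):
    """Weighted combination of user-selected terms for a candidate library.

    products: list of dicts with hypothetical keys
      'pharmacophores' (set of pharmacophore ids), 'cost', 'clogp'.
    """
    # Products outside the ClogP window are given zero weight in every term
    kept = [p for p in products if within_bounds(p["clogp"], *clogp_bounds)]
    coverage = len(set().union(*(p["pharmacophores"] for p in kept)))
    cost = sum(p["cost"] for p in kept)
    return weights["coverage"] * coverage - weights["cost"] * cost

products = [
    {"pharmacophores": {1, 2}, "cost": 10.0, "clogp": 2.0},
    {"pharmacophores": {2, 3}, "cost": 5.0, "clogp": 6.5},   # out of bounds
    {"pharmacophores": {4}, "cost": 1.0, "clogp": 0.0},
]
print(library_score(products, {"coverage": 1.0, "cost": 0.1}))
```

Changing the weights shifts the balance between coverage and cost without touching the search algorithm, which is the kind of user control the text describes.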
Gillet et al.194 have recently reported on the SELECT program, which is<br />
similar in philosophy to HARPick but uses a genetic algorithm rather than<br />
simulated annealing. A product-based program, SELECT utilizes the Daylight<br />
structural fingerprints to optimize either the sum of dissimilarities or the aver-<br />
age nearest-neighbor distance of selected compounds. Interestingly, the pro-<br />
gram can also select the best configuration for a multicomponent library. Be-<br />
cause of the nature of the descriptors used, the program can be applied to<br />
virtual libraries of hundreds of thous<strong>and</strong>s of products. Additional terms in the<br />
scoring function allow libraries to be designed with respect to an external<br />
reference <strong>and</strong> to have an appropriate spread of physicochemical properties.<br />
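The two diversity measures mentioned above behave differently, as the following sketch suggests. It is not the SELECT code: fingerprints are represented as sets of "on" bit positions in place of Daylight fingerprints, and Tanimoto distance is assumed as the dissimilarity.

```python
from itertools import combinations

def tanimoto_distance(a, b):
    """1 - Tanimoto coefficient for two sets of 'on' bit positions."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union

def sum_of_dissimilarities(fps):
    """Sum of all pairwise distances in the selected set."""
    return sum(tanimoto_distance(a, b) for a, b in combinations(fps, 2))

def mean_nearest_neighbor_distance(fps):
    """Average, over the set, of each member's distance to its closest
    other member."""
    if len(fps) < 2:
        return 0.0
    total = 0.0
    for i, a in enumerate(fps):
        total += min(tanimoto_distance(a, b)
                     for j, b in enumerate(fps) if j != i)
    return total / len(fps)

# Two identical compounds plus one unrelated compound
fps = [frozenset({1, 2}), frozenset({1, 2}), frozenset({3, 4})]
print(sum_of_dissimilarities(fps))          # 2.0
print(mean_nearest_neighbor_distance(fps))
```

In this toy set the duplicate pair still contributes to a high sum of dissimilarities through its distance to the third compound, whereas the nearest-neighbor measure is pulled down by the duplicates.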
Constrained/Focused/Biased Library Design<br />
In designing a library, it is of paramount importance to take account of all<br />
the available information. A general library design assumes no particular prior<br />
knowledge, but in many cases, there will be information that can be used. For<br />
instance, it might be desirable to bias a library away from a previous collection<br />
or library, or toward a set of compounds known to be active. In one case,195<br />
Sheridan <strong>and</strong> Kearsley constrained their design to select tripeptoids similar to<br />
two tetrapeptide cholecystokinin (CCK) antagonists. In a second example, scor-<br />
ing was based on an angiotensin converting enzyme (ACE) “trend vector”<br />
summarizing the chemical features shared by known ACE inhibitors that differ<br />
from those of a general population of druglike molecules.195 Similar work has<br />
been reported by Cho et al. with their FOCUS-2-D method.196 Good <strong>and</strong> Lewis<br />
have shown how the HARPick program can be used in this context, selecting a<br />
set of reagents such that the generated products would fill diversity voids in the<br />
space occupied by the St<strong>and</strong>ard Drugs File.185<br />
In related work, Pickett197 has used a genetic algorithm whose objective<br />
function was the overlap in pharmacophores between one or more lead com-<br />
pounds <strong>and</strong> members of the proposed library. In the context of an ongoing<br />
medicinal chemistry program, Brown et al.198 have described the design of<br />
libraries biased toward the family of peroxisome proliferator-activated recep-<br />
tors (PPARs). In this instance, a phenoxybutyric acid group (present in known<br />
PPAR lig<strong>and</strong>s) was incorporated as a “privileged” fragment at one diversity<br />
position. At the other two variable positions, molecular weight <strong>and</strong> synthetic<br />
considerations were used to filter reagents before subjecting them to an experi-<br />
mental design procedure to select a diverse set at each point. Deconvolution of<br />
the resulting library led to the identification of GW 2433 (Figure 7) as the first<br />
high affinity PPARδ ligand.<br />
The most exciting situation, however, is where there is information con-<br />
cerning the structure of the receptor site that is being targeted. In this case,<br />
structure-based design and combinatorial chemistry can combine synergistically<br />
to give enormous benefits.199,200 The structural information provides<br />
a strong constraint for reagent selection, while combinatorial library<br />
design ensures the rapid provision of synthetically accessible compounds, thus
Figure 7 Identification of GW 2433. The biased library comprised a biasing fibrate<br />
monomer at R1. R2 <strong>and</strong> R3, derived from carboxylic acids <strong>and</strong> isocyanates, were<br />
chosen for diversity by means of experimental design techniques.<br />
overcoming a debilitating bottleneck in de novo/structure-based drug<br />
design.201,202 There is a growing number of published examples of structure-based<br />
library design (see, e.g., Refs. 119 and 203-214). Perhaps the most<br />
compelling example is that of Kick et al.118 In this work, the active site of<br />
cathepsin D was used to constrain the selection of reagents at four variable<br />
positions on a scaffold based on a known inhibitor, pepstatin. The resulting<br />
library (1000 compounds) yielded a hit rate of 6-7% when screened at 1 μM<br />
with 7 compounds being active at 100 nM or less. The information gained from<br />
this initial library was used to design <strong>and</strong> synthesize a follow-up library yielding<br />
inhibitors in the range 9-15 nM. As a control, Kick et al. also designed a<br />
general, diverse library (also 1000 compounds) using 2-D similarity measures<br />
for screening against the enzyme. This library produced a hit rate of 2-3% at 1<br />
μM with only one compound being active at 100 nM. From this example, the<br />
incorporation of structural information into the library design can be seen to be<br />
extremely valuable. A similar method for structure-based library design, called<br />
PRO-SELECT, has been reported by Murray and coworkers.115 This program<br />
was used to design inhibitors of thrombin based around a scaffold from a<br />
known covalent inhibitor, PPACK (D-Phe-Pro-Arg-chloromethylketone).<br />
About half the designed molecules were found to have micromolar activity, the<br />
best being a close PPACK analog (D-Phe-Pro-agmatine) which showed an inhib-<br />
itory concentration (IC50) of 40 nM. Thrombin also provided the target for the<br />
structure-based combinatorial library design described by Graybill et al.,215<br />
although few computational details are given.<br />
DIVERSITY IS NOT THE BE-ALL AND<br />
END-ALL!<br />
In all work on the selection of compounds or reagents by means of molecular<br />
diversity techniques, it is vital not to lose sight of other considerations.216<br />
As Higgs et al. put it: “compounds must not be so diverse as to be<br />
pharmaceutically unreasonable.”166 In their early work with a maximal<br />
dissimilarity selection algorithm, Higgs et al. found that nearly all the com-<br />
pounds selected were deemed pharmaceutically unreasonable by medicinal che-<br />
mists. They thus implemented a series of rules based on substructural queries,<br />
molecular weight, <strong>and</strong> ClogP cutoffs, which they use to assign “demerits” to<br />
compounds. If any compound gains too many demerits, it is rejected, a fate<br />
that may be suffered by up to half of the molecules initially selected! The fact<br />
that 90% of the molecules in the CMC database (i.e., known drugs) caused one<br />
or more of the rules to fire underlines the need not to be too zealous in rejecting<br />
compounds with only one poor feature.<br />
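A demerit scheme of this type could look roughly like the sketch below. The rules, cutoffs, point values, and rejection threshold are hypothetical and are not those used by Higgs et al.; the point is only that a single fired rule need not reject a compound.

```python
# Hypothetical rules: (description, predicate, demerit points).
# Cutoffs and points are illustrative only.
RULES = [
    ("high molecular weight", lambda m: m["mol_weight"] > 600.0, 2),
    ("high ClogP", lambda m: m["clogp"] > 6.0, 2),
    ("reactive substructure", lambda m: bool(m["alerts"]), 3),
]

def demerits(mol, rules=RULES):
    """Total demerit points from every rule that fires for this molecule."""
    return sum(points for _, fires, points in rules if fires(mol))

def is_acceptable(mol, max_demerits=2, rules=RULES):
    """Reject only when the accumulated demerits exceed the threshold."""
    return demerits(mol, rules) <= max_demerits

one_flaw = {"mol_weight": 650.0, "clogp": 3.0, "alerts": []}
two_flaws = {"mol_weight": 650.0, "clogp": 6.5, "alerts": []}
print(is_acceptable(one_flaw))    # True
print(is_acceptable(two_flaws))   # False
```

With the threshold set this way, a compound with one poor feature survives, in line with the caution against being too zealous.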
In a similar vein, Lewis et al.74 describe a series of substructural filters<br />
applied during the creation of the diverse property-derived sets. These rules are<br />
designed to eliminate molecules containing toxic or very reactive substructures<br />
such as reactive epoxides, acyclic aminals or acid anhydrides.217 Also rejected<br />
are other molecules that exhibit a wide range of biological activities (e.g., pros-<br />
tagl<strong>and</strong>ins, prostacyclins, or thromboxanes) <strong>and</strong> are thus unsuitable for general<br />
screening. A similar “badlist” was developed by Lajiness at Pharmacia <strong>and</strong><br />
Upjohn.145 More recently, at RPR, we have implemented a set of alerting rules<br />
for compounds that contain chromophores that absorb in the range above 300<br />
nm. Such compounds may interfere with certain assays <strong>and</strong> thereby reduce the<br />
accuracy of high-throughput screening (HTS) data.<br />
With increasing importance being attached to the early detection of compounds<br />
likely to be problematic from an absorption, distribution, metabolism,<br />
and excretion (ADME) viewpoint,218-221 at RPR we sought to apply computational<br />
measures for the prediction of intestinal absorption (a key requirement<br />
for an orally bioavailable compound) during the design of lead optimization<br />
libraries. To this end, we implemented the popular “rule-of-5” criteria<br />
described by Lipinski et al.222 A compound is deemed to fail the rule-of-5 check<br />
(and thereby to be possibly deficient from an oral absorption/permeability aspect)<br />
if it possesses two or more of the following features:<br />
• more than 5 hydrogen bond donors (i.e., N-H or O-H bonds)<br />
• more than 10 hydrogen bond acceptors (i.e., any N or O, including those<br />
in donors)<br />
• a ClogP value of greater than 5.0 (or an MlogP223 value > 4.15)<br />
• a molecular weight of greater than 500.0<br />
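The check listed above translates directly into code. The sketch below assumes the four descriptor values have already been computed by some other tool; the function and argument names are illustrative, and only the ClogP form of the lipophilicity criterion is shown.

```python
def rule_of_5_violations(h_donors, h_acceptors, clogp, mol_weight):
    """Count how many of the four rule-of-5 criteria are exceeded."""
    return sum([
        h_donors > 5,        # N-H or O-H bonds
        h_acceptors > 10,    # any N or O
        clogp > 5.0,
        mol_weight > 500.0,
    ])

def fails_rule_of_5(h_donors, h_acceptors, clogp, mol_weight):
    """A compound fails the check when two or more criteria are exceeded."""
    return rule_of_5_violations(h_donors, h_acceptors, clogp, mol_weight) >= 2

print(fails_rule_of_5(6, 11, 3.0, 400.0))   # True  (two violations)
print(fails_rule_of_5(2, 5, 5.5, 450.0))    # False (one violation)
```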
At RPR we also developed computational alerts based on the work of Palm et<br />
al.224-226 <strong>and</strong> Winiwarter <strong>and</strong> coworkers.227 Both these groups demonstrated<br />
a strong correlation between polar molecular surface area (PSA) <strong>and</strong> human<br />
intestinal absorption. Of particular interest is the observation that molecules<br />
with a PSA of greater than 140 Å² are likely to show poor (< 10%) fractional<br />
absorption. Our own research has confirmed this observation, <strong>and</strong> we have
extended the methods to develop a QSAR model for predicting blood-brain<br />
barrier penetration.228,229 Our implementation of the polar surface area<br />
calculations is sufficiently rapid to allow the profiling of large (virtual) com-<br />
pound collections on a routine basis. This permits the inclusion of ADME-<br />
related parameters in the process of product-based reagent selection.142 In this<br />
way, we can attempt to ensure that the library compounds will have good<br />
pharmacokinetic properties, thus facilitating the hit-to-lead transition.<br />
CURRENT ISSUES AND FUTURE<br />
DIRECTIONS<br />
In a field that is far from mature, there are necessarily many issues to be<br />
addressed <strong>and</strong> myriad possible future directions that research must explore.18<br />
Here, we highlight a few of the current issues/debates in the field and suggest<br />
possible avenues for future work. We have touched on several issues above, <strong>and</strong><br />
the reader is also directed to the reviews by Martin230 <strong>and</strong> Mason <strong>and</strong><br />
Hermsmeier.231<br />
<strong>Diversity</strong> Descriptors<br />
There are many issues surrounding the way that “diversity space” is<br />
described. As we have mentioned, the popular 2-D bit string or fingerprint<br />
descriptors were originally designed for 2-D substructure-searching applica-<br />
tions, <strong>and</strong> it remains unclear whether these are truly optimal for diversity<br />
calculations.70 The debate that has raged over 2-D versus 3-D descriptors has,<br />
perhaps, generated more heat than light. It is likely that each type of descriptor<br />
has its place in the process of diversity analysis <strong>and</strong> library design, but a con-<br />
sensus on this matter has yet to be reached. Nonetheless, it would appear that<br />
several groups are trying three-dimensional measures of diversity which more<br />
accurately reflect lig<strong>and</strong>-receptor interactions. Unfortunately, this leads to in-<br />
creased computational effort, limits in the description of conformational space<br />
(e.g., neglect of solvent effects in most cases), <strong>and</strong> the need for tailored diversity<br />
measures.<br />
In terms of 3-D descriptors, there remains the need for a useful, computa-<br />
tionally expedient descriptor of molecular shape. Another question is whether<br />
complementary site points should be included in 3-D descriptors as advocated<br />
by some workers?230,232 Can molecular field information be included in 3-D<br />
descriptors in a manner similar to the way it has been incorporated into experimental<br />
3-D similarity searching systems?233 How should tautomeric and ionization<br />
states be handled? These are all questions worthy of future research.<br />
With both 2-D <strong>and</strong> 3-D descriptors, the thorny issue of how to validate<br />
descriptors is still an open question. It is clear that we would like to have
descriptors that relate better to biological activity,230 but proving that this is<br />
indeed the case for a given descriptor is a task fraught with difficulties. A key<br />
issue in descriptor validation is how to define a reference set that is meant to<br />
typify the universal set of actives, <strong>and</strong> possibly inactives. One approach has<br />
been to use the World Drug Index21 to define the set of active compounds <strong>and</strong><br />
the Spresi database130 to define the inactives. The WDI must be used carefully<br />
<strong>and</strong> selectively because it contains many classes that are inappropriate (e.g.,<br />
disinfectants, dentifrices). The next question is, How valid is it to compare<br />
central nervous system (CNS) drugs with topical steroids with anticancer<br />
drugs? The danger is that the analysis will tend to produce the lowest common<br />
denominator (like the rule of 5),222 rather than a stunning insight into molecu-<br />
lar diversity. There is also the issue of reverse sampling: How valid is it to deduce<br />
the properties of the universal set of biologically active molecules from a subset?<br />
The properties of previous drugs may have been driven mainly by bio-<br />
availability, or toward making analogs of a natural substrate. Using these data<br />
forces an unnatural conservatism into our diversity models.<br />
It is also interesting to reflect on what is meant by activity <strong>and</strong> inactivity.<br />
Any molecule will bind to any receptor, although the affinity may have any<br />
value between picomolar <strong>and</strong> gigamolar. If the binding event is viewed in terms<br />
of molecular interactions, then interesting, specific binding can be characterized<br />
by affinity constants lower than 1000 nM. However, it is not uncommon to find<br />
affinity constants of 1000 nM that are mainly due to solvophobic interactions<br />
forcing the lig<strong>and</strong> to associate with the receptor (particularly for hydrophobic<br />
compounds like steroids). At 100 nM, some specific noncovalent interactions<br />
are being formed, <strong>and</strong> at levels below 10 nM, there are many highly specific<br />
interactions present. It should be clear that the activity is a continuous phenom-<br />
enon, <strong>and</strong> that drawing an arbitrary division is a hazardous ploy. Furthermore,<br />
while one can be fairly sure why a compound is active, it is much harder to say<br />
precisely why a compound is inactive. Was it the wrong pharmacophore, a steric<br />
bump, poor solubility, metabolic alteration, or something else? Despite all these<br />
caveats, several research groups have followed such an approach <strong>and</strong> claim to<br />
be able to distinguish a potential active from a potential inactive, with reason-<br />
able confidence. Such results cannot be ignored, <strong>and</strong> they will be of use in the<br />
early phases of library design, where the basic feasibility of the library <strong>and</strong> the<br />
reaction are being considered.<br />
The realization that “mere diversity”216 is not sufficient in practical li-<br />
brary design has driven much recent work in the direction of biasing design<br />
toward compounds with more “druglike” properties. The challenge here is<br />
defining the term “druglike.” Several groups have attempted to tackle this<br />
problem,136,234-236 but some of the arguments used earlier (see section on<br />
Validation of Descriptors) also apply here. How can the non-druglike space be<br />
adequately defined? Physical properties or other measures such as polar surface<br />
area can be included in the design, but how should these be weighted with<br />
respect to diversity? Should compounds falling outside the bounds simply be
excluded from further consideration? If such hard cutoffs are applied, it is not<br />
always possible to identify a truly combinatorial subset of a virtual library.<br />
Pickett142 has implemented a simulated annealing procedure that attempts to<br />
find the solution closest to a true combinatorial subset within a number of user-<br />
defined constraints.<br />
As a final note in this section, several years ago Martin230 suggested a<br />
competition (similar to the CASP competition for protein structure predic-<br />
tion237) for assessing descriptors. This would presumably involve the computa-<br />
tion of the diversity of a defined library by several different research teams, each<br />
using its own favored approaches. The results of each team would then be<br />
compared to some pre-agreed experimental determination of diversity. This would<br />
be interesting if it could ever be arranged!<br />
Library Design<br />
In terms of sampling diversity space, it would seem that stochastic selec-<br />
tion algorithms are becoming popular for combinatorial library design. Ad-<br />
vances in technology now allow many robots to h<strong>and</strong>le noncombinatorial li-<br />
braries, but reagent cost remains a big issue. It is possible to include cost within<br />
the selection process, but again this has to be carefully balanced with diversity<br />
(or similarity in a focused library). Product-based reagent selection would seem<br />
to be demonstrably superior to reagent-based approaches186 but, depending on<br />
the type of descriptors used, may still be problematic in terms of CPU time for<br />
very large libraries. Thus, from a practical point of view, a two-step process of<br />
reagent selection may constitute a workable compromise, with an initial<br />
reagent-based filtering step preceding the full product-based selection.<br />
The area of structure-based library design is one that promises much in the<br />
coming years. Currently, most reported approaches use the approximation of a<br />
fixed scaffold in the site (see, e.g., Refs. 115 and 118). This could be overcome<br />
by allowing some limited relaxation or docking after the attachment of each<br />
combination of R groups. Of crucial importance is the continuing search for<br />
better binding affinity prediction algorithms.230 Approaches to this problem<br />
range from empirical scoring functions117,238,239 to more detailed treatments<br />
based on Monte Carlo240 or molecular dynamics241 simulations to full free<br />
energy perturbation methods.242 In realistic terms, it is likely that only empirical<br />
approaches will be applicable to library design in the near future. But continuing<br />
theoretical and methodological improvements, coupled with the increases<br />
in computer speed combined with parallelization, should eventually lead<br />
to improved structure-based designs.<br />
Finally, even in cases where we may be able to show that our designed<br />
libraries are “better” than r<strong>and</strong>om, how close are they to being optimal? To<br />
answer this question, we need to have an external definition of optimality,<br />
which does not exist at present. What is required is accurate screening results on<br />
a large library, from which we try to select a sublibrary. It should be noted that
the optimality test will be valid only for that library <strong>and</strong> that set of screening<br />
data.<br />
Speed Requirement<br />
As we mentioned earlier, the time that is available for each diversity task<br />
will likely depend on the nature of the task. Reagent selection may need to be<br />
done in a hurry, whereas compound acquisition studies may be afforded rather<br />
more time. In the former case, it is clear that the computer time required for<br />
diversity analysis/library design must not exceed that available (possibly only<br />
days if the library chemistry is already developed, longer if the chemistry is<br />
new). For many product-based reagent selection approaches, CPU time is at<br />
present a very real obstacle to what might be done. It is to be hoped that more<br />
efficient algorithms <strong>and</strong> exploitation of parallel computation techniques will<br />
help alleviate the current difficulties. More fundamentally, the development of<br />
approaches based on Markush representations may offer a solution in instances<br />
where only simple 2-D descriptors are employed.243<br />
“Quick <strong>and</strong> Dirty” QSAR<br />
The process of library design is an iterative rather than a “one-off ” pro-<br />
cedure. Once the first library has been assayed, the next question is, What to<br />
make next? In the modern pharmaceutical discovery milieu, the computational<br />
chemist needs to answer this question quickly to have an effective input in<br />
selecting the next synthetic targets. Clearly, there is a requirement for quantita-<br />
tive structure-activity relationships <strong>and</strong> other data-mining techniques to extract<br />
relationships from the HTS data resulting from large libraries. Martin230 suggests<br />
that QSAR techniques need to be able to handle 10⁵ compounds rather<br />
than the relatively small data sets (ca. 10²) usually studied at present. Methods<br />
are also required to cope with noisy, incomplete, or binary (results simply<br />
expressed as “+” or “-” ) biological activity data. Hence the expression “quick<br />
<strong>and</strong> dirty QSAR” has come into use. Some approaches to these problems are<br />
being reported,244,245 and it is possible that fuzzy methods may also have<br />
a part to play. Certainly, there is much room for further research in this<br />
area.<br />
Integration with Other Modeling Tools<br />
A further issue is how to link diversity tools effectively with extant modeling<br />
programs. For instance, if a partitioning scheme were being used for analyzing<br />
diversity space, it might be possible to use de novo design techniques to<br />
suggest compounds to fill currently empty cells.18,230 Indeed, Pearlman246 is<br />
working on a program called EAInventor to do just this in conjunction with his<br />
DiverseSolutions2~~ package.
Persuading the Customers<br />
Last but not at all least, there is the issue of getting buy-in from the<br />
medicinal chemists. It is not always easy to convince those tasked with library<br />
synthesis of the benefits of computational reagent selection. Many still prefer to<br />
stick with their experience <strong>and</strong> intuition as to “what will work.” Of course, this<br />
accumulated wisdom should not be ignored <strong>and</strong>, in practice, a compromise<br />
between human <strong>and</strong> computer selection may be the best way forward. Yet<br />
nothing succeeds like success, <strong>and</strong> it has already been demonstrated at various<br />
pharmaceutical companies that the adoption of library design will accelerate<br />
when it is associated with the discovery of novel leads at a rate far faster than<br />
that which can be simply explained away by its detractors. The analogous<br />
situation existed a few years ago in the field of structure-based drug design,<br />
which really took off only after the publication of potent new leads, particularly<br />
by groups working on HIV-1 protease.47<br />
CONCLUSIONS<br />
The term “diversity” is hard to define conceptually. In a practical sense,<br />
diversity analysis is a design strategy that attempts to maximize the hit rate of<br />
HTS experiments, <strong>and</strong> validation should be in terms of this goal. It is important<br />
to maintain a pragmatic approach187: “diversity” is not the be-all and end-all.<br />
This is especially so when one is designing structure-based libraries, where<br />
diversity is perhaps only a weak contributor to a good design. The best selection<br />
is likely to be neither arbitrary nor maximally diverse.14<br />
Finally, we reemphasize that this research area is still young: developments<br />
are occurring rapidly, driven by other new technologies in drug discovery re-<br />
search. This chapter represents a personal snapshot taken by the authors. “It is<br />
impossible to predict the contents of an article written in 10 years on the subject<br />
of molecular diversity.”230<br />
ACKNOWLEDGMENTS<br />
We thank our colleagues, past and present, for their help and insights in the field of molecular<br />
diversity and combinatorial library design. In particular, we acknowledge the contributions of<br />
present and past coworkers at Rhône-Poulenc Rorer (Aventis): Iain McLay (now at Glaxo Wellcome),<br />
Paul Menard, Claude Luttmann, Isabelle Morize, Jon Mason, and Andrew Good (the last<br />
two now at Bristol-Myers Squibb).<br />
REFERENCES<br />
1. B. Merrifield, J. Am. Chem. Soc., 85, 2149 (1963). Solid Phase Peptide Synthesis. I. The<br />
Synthesis of a Tetrapeptide.
2. C. Desai, R. N. Zuckermann, <strong>and</strong> W. H. Moos, Drug Dev. Res., 33, 174 (1994). Recent<br />
Advances in the Generation of Chemical <strong>Diversity</strong> Libraries.<br />
3. M. Geysen, S. Barteling, and R. Moelen, Proc. Natl. Acad. Sci. USA, 81, 3998 (1984). Use of<br />
Peptide Synthesis to Probe Viral Antigens for Epitopes to a Resolution of a Single Amino<br />
Acid.<br />
4. R. A. Houghten, Proc. Natl. Acad. Sci. USA, 82, 5131 (1985). General Method for the Rapid<br />
Solid-Phase Synthesis of Large Numbers of Peptides: Specificity of Antigen-Antibody Inter-<br />
action at the Level of Individual Amino Acids.<br />
5. K. S. Lam, S. E. Salmon, E. M. Hersh, V. J. Hruby, W. M. Kazmierski, <strong>and</strong> R. J. Knapp,<br />
Nature, 354, 82 (1991). A New Type of Synthetic Peptide Library for Identifying Lig<strong>and</strong>-<br />
Binding Activity.<br />
6. L. A. Thompson and J. A. Ellman, Chem. Rev., 96, 555 (1996). Synthesis and Applications of<br />
Small Molecule Libraries.<br />
7. E. M. Gordon, M. A. Gallop, <strong>and</strong>D. V. Patel, Acc. Chem. Res., 29,144 (1996). Strategy <strong>and</strong><br />
Tactics in Combinatorial Organic Synthesis. Applications to Drug Discovery.<br />
8. F. Balkenhohl, C. von dem Bussche-Huennefeld, A. Lansky, <strong>and</strong> C. Zechel, Angew. Cbem.<br />
Int. Ed. Engl., 35, 2288 (1996). Combinatorial Synthesis of Small Organic Molecules.<br />
9. E. R. Felder <strong>and</strong> D. Poppinger, Adv. Drug Res., 30, 111 (1997). Combinatorial Compound<br />
Libraries for Enhanced Drug Discovery Approaches.<br />
10. D. Brown, Mol. <strong>Diversity</strong>, 2, 217 (1997). Future Pathways for Combinatorial Chemistry.<br />
11. P. L. Myers, Curr. Opin. Biotechnol., 8, 701 (1997). Will Combinatorial Chemistry Deliver<br />
Real Medicines?<br />
12. R. E. Dolle, Mol. <strong>Diversity</strong>, 3, 199 (1998). Comprehensive Survey of Chemical Libraries<br />
Yielding Enzyme Inhibitors, Receptor Agonists <strong>and</strong> Antagonists, <strong>and</strong> Other Biologically<br />
Active Agents: 1992 Through 1997.<br />
13. J.-L. Fauchere, J. A. Boutin, J.-M. Henlin, N. Kucharczyk, <strong>and</strong> J.-C. Ortuno, Chemom. Intell.<br />
Lab. Syst., 43 (1,2), 43 (1998). Combinatorial Chemistry for the Generation of <strong>Molecular</strong><br />
<strong>Diversity</strong> <strong>and</strong> the Discovery of Bioactive Leads.<br />
14. J. M. Blaney <strong>and</strong> E. J. Martin, Curr. Opin. Chem. Biol., 1, 54 (1997). Computational<br />
Approaches for Combinatorial Library Design <strong>and</strong> <strong>Molecular</strong> <strong>Diversity</strong> <strong>Analysis</strong>.<br />
15. E. J. Martin, D. C. Spellmeyer, R. E. Critchlow Jr., <strong>and</strong> J. M. Blaney, in Reviews in Computa-<br />
tional Chemistry, K. B. Lipkowitz <strong>and</strong> D. B. Boyd, Eds., VCH Publishers, New York, 1997,<br />
Vol. 10, pp. 75-100. Does Combinatorial Chemistry Obviate <strong>Computer</strong>-<strong>Aided</strong> Drug<br />
Design?<br />
16. M. G. Bures <strong>and</strong> Y. C. Martin, Curr. Opin. Chem. Biol., 2, 376 (1998). Computational<br />
Methods in <strong>Molecular</strong> <strong>Diversity</strong> <strong>and</strong> Combinatorial Chemistry.<br />
17. D. K. Agrafiotis, J. C. Myslik, <strong>and</strong> F. R. Salemme, Mol. <strong>Diversity</strong>, 4, 1 (1999). Advances in<br />
<strong>Diversity</strong> Profiling <strong>and</strong> Combinatorial Series Design.<br />
18. P. Willett, Perspect. Drug Discovery Des., 7/8, 1 (1997). Computational Tools for the<br />
Analysis of <strong>Molecular</strong> <strong>Diversity</strong>. For more recent material, see: D. K. Agrafiotis <strong>and</strong> E. J. Martin,<br />
J. Mol. Graphics Modell., 18 (3/4), in press (2000). Combinatorial Library Design.<br />
19. H. Kubinyi, Perspect. Drug Discovery Des., 9/10/11, 225 (1998). Similarity <strong>and</strong><br />
Dissimilarity: A Medicinal Chemist's View.<br />
20. G. Sello, J. Chem. Inf. Comput. Sci., 38, 691 (1998). Similarity Measures: Is It Possible to<br />
Compare Dissimilar Structures?<br />
21. World Drug Index. Derwent Information, http://www.derwent.com/.<br />
22. E. J. Martin, R. E. Critchlow Jr., D. C. Spellmeyer, S. Rosenberg, K. L. Spear, <strong>and</strong> J. M.<br />
Blaney, Pharmacochem. Libr., 29, 133 (1998). Diverse Approaches to Combinatorial<br />
Library Design.<br />
23. R. S. Bohacek, C. McMartin, <strong>and</strong> W. C. Guida, Med. Res. Rev., 16, 3 (1996). The Art <strong>and</strong><br />
Practice of Structure-Based Drug Design.<br />
24. H. Kubinyi, Curr. Opin. Drug Discovery Dev., 1, 4 (1998). Structure-Based Design of<br />
Enzyme Inhibitors <strong>and</strong> Receptor Lig<strong>and</strong>s.<br />
25. P. M. Dean, <strong>Molecular</strong> Foundations of Drug-Receptor Interaction, Cambridge University<br />
Press, Cambridge, 1987.<br />
26. W. P. Jencks, in Chemical Recognition in Biology, F. Chapeville <strong>and</strong> A.-L. Haenni, Eds.,<br />
Springer-Verlag, Berlin, 1980, pp. 3-25. What Everyone Wanted to Know About Tight<br />
Binding <strong>and</strong> Catalysis, But Never Thought of Asking.<br />
27. H.-J. Böhm <strong>and</strong> G. Klebe, Angew. Chem. Int. Ed. Engl., 35, 2588 (1996). What Can We<br />
Learn from <strong>Molecular</strong> Recognition in Protein-Lig<strong>and</strong> Complexes for the Design of New<br />
Drugs?<br />
28. R. L. Babine <strong>and</strong> S. L. Bender, Chem. Rev., 97, 1359 (1997). <strong>Molecular</strong> Recognition of<br />
Protein-Lig<strong>and</strong> Complexes: Application to Drug Design.<br />
29. G. Klebe <strong>and</strong> H.-J. Böhm, J. Recept. Signal. Transduction Res., 17, 459 (1997). Energetic <strong>and</strong><br />
Entropic Factors Determining Binding Affinity in Protein-Lig<strong>and</strong> Complexes.<br />
30. D. H. Williams, Chem. Soc. Rev., 28, 57 (1998). Aspects of Weak Interactions.<br />
31. J. R. H. Tame, J. Comput.-<strong>Aided</strong> Mol. Des., 13, 99 (1999). Scoring Functions: A View from<br />
the Bench.<br />
32. A. R. Fersht, J.-P. Shi, J. Knill-Jones, D. M. Lowe, A. J. Wilkinson, D. M. Blow, P. Brick, P.<br />
Carter, M. M. Y. Waye, <strong>and</strong> G. Winter, Nature, 314, 235 (1985). Hydrogen Bonding <strong>and</strong><br />
Biological Specificity Analyzed by Protein Engineering.<br />
33. A. Horovitz, L. Serrano, B. Avron, M. Bycroft, <strong>and</strong> A. R. Fersht, J. Mol. Biol., 216, 1031<br />
(1990). Strength <strong>and</strong> Cooperativity of Contributions of Surface Salt Bridges to Protein<br />
Stability.<br />
34. A. J. Doig <strong>and</strong> D. H. Williams, J. Am. Chem. Soc., 114, 338 (1992). Binding Energy of an<br />
Amide-Amide Hydrogen Bond in Aqueous <strong>and</strong> Nonpolar Solvents.<br />
35. P. L. Chau <strong>and</strong> P. M. Dean, J. Comput.-<strong>Aided</strong> Mol. Des., 8, 513 (1994). Electrostatic<br />
Complementarity Between Proteins <strong>and</strong> Lig<strong>and</strong>s. 1. Charge Disposition, Dielectric <strong>and</strong><br />
Interface Effects.<br />
36. P. L. Chau <strong>and</strong> P. M. Dean, J. Comput.-<strong>Aided</strong> Mol. Des., 8, 527 (1994). Electrostatic<br />
Complementarity Between Proteins <strong>and</strong> Lig<strong>and</strong>s. 2. Lig<strong>and</strong> Moieties.<br />
37. P. L. Chau <strong>and</strong> P. M. Dean, J. Comput.-<strong>Aided</strong> Mol. Des., 8, 545 (1994). Electrostatic<br />
Complementarity Between Proteins <strong>and</strong> Lig<strong>and</strong>s. 3. Structural Basis.<br />
38. D. Eisenberg <strong>and</strong> A. D. McLachlan, Nature, 319, 199 (1986). Solvation Energy in Protein<br />
Folding <strong>and</strong> Binding.<br />
39. A. Ben-Naim, Hydrophobic Interactions, Plenum Press, New York, 1980.<br />
40. D. G. Alberg <strong>and</strong> S. L. Schreiber, Science, 262, 248 (1993). Structure-Based Design of a<br />
Cyclophilin-Calcineurin Bridging Lig<strong>and</strong>.<br />
41. A. R. Khan, J. C. Parrish, M. E. Fraser, W. W. Smith, P. A. Bartlett, <strong>and</strong> M. N. G. James,<br />
Biochemistry, 37, 16839 (1998). Lowering of the Entropic Barrier for Binding<br />
Conformationally Flexible Inhibitors to Enzymes.<br />
42. B. J. Stockman, Prog. Nucl. Magn. Reson. Spectrosc., 33, 109 (1998). NMR Spectroscopy as<br />
a Tool for Structure-Based Drug Design.<br />
43. J. T. Stivers, C. Abeygunawardana, A. S. Mildvan, <strong>and</strong> C. P. Whitman, Biochemistry, 35,<br />
16036 (1996). 15N NMR Relaxation Studies of Free <strong>and</strong> Inhibitor-Bound 4-Oxalocrotonate<br />
Tautomerase: Backbone Dynamics <strong>and</strong> Entropy Changes of an Enzyme upon Inhibitor<br />
Binding.<br />
44. L. K. Nicholson, T. Yamazaki, D. A. Torchia, S. Grzesiek, A. Bax, S. J. Stahl, J. D. Kaufman,<br />
P. T. Wingfield, P. Y. S. Lam, P. K. Jadhav, C. N. Hodge, P. J. Domaille, <strong>and</strong> C.-H. Chang,<br />
Nat. Struct. Biol., 2, 274 (1995). Flexibility <strong>and</strong> Function in HIV-1 Protease.<br />
45. X. Leng, S. Y. Tsai, B. W. O'Malley, <strong>and</strong> M. J. Tsai, J. Steroid Biochem. Mol. Biol., 46, 643<br />
(1993). Lig<strong>and</strong>-Dependent Conformational Changes in Thyroid Hormone <strong>and</strong> Retinoic<br />
Acid Receptors Are Potentially Enhanced by Heterodimerization with Retinoic X Receptor.<br />
46. A. M. Davis <strong>and</strong> S. J. Teague, Angew. Chem. Int. Ed. Engl., 38, 736 (1999). Hydrogen<br />
Bonding, Hydrophobic Interactions, <strong>and</strong> Failure of the Rigid Receptor Hypothesis.<br />
47. A. Wlodawer <strong>and</strong> J. Vondrasek, Annu. Rev. Biophys. Biomol. Struct., 27, 249 (1998).<br />
Inhibitors of HIV-1 Protease: A Major Success of Structure-Assisted Drug Design.<br />
48. A. R. Leach, J. Mol. Biol., 235, 345 (1994). Lig<strong>and</strong> Docking to Proteins with Discrete<br />
Sidechain Flexibility.<br />
49. G. Jones, P. Willett, <strong>and</strong> R. C. Glen, J. Mol. Biol., 245, 43 (1995). <strong>Molecular</strong> Recognition of<br />
Receptor Sites Using a Genetic Algorithm with a Description of Desolvation.<br />
50. V. Schnecke, C. A. Swanson, E. D. Getzoff, J. A. Tainer, <strong>and</strong> L. A. Kuhn, Proteins: Struct.,<br />
Funct., Genet., 33, 74 (1998). Screening a Peptidyl Database for Potential Lig<strong>and</strong>s to Pro-<br />
teins with Side-Chain Flexibility.<br />
51. B. S<strong>and</strong>ak, R. Nussinov, <strong>and</strong> H. J. Wolfson, J. Comput. Biol., 5, 631 (1998). A Method for<br />
Biomolecular Structural Recognition <strong>and</strong> Docking Allowing Conformational Flexibility.<br />
52. F. A. Quiocho, D. K. Wilson, <strong>and</strong> N. K. Vyas, Nature, 340, 404 (1989). Substrate Specificity<br />
<strong>and</strong> Affinity of a Protein Modulated by Bound Water Molecules.<br />
53. M. L. Raymer, P. C. Sanschagrin, W. F. Punch, S. Venkataraman, E. D. Goodman, <strong>and</strong> L. A.<br />
Kuhn, J. Mol. Biol., 265, 445 (1997). Predicting Conserved Water-Mediated <strong>and</strong> Polar<br />
Lig<strong>and</strong> Interactions in Proteins Using a K-Nearest-Neighbors Genetic Algorithm.<br />
54. V. A. Makarov, B. K. Andrews, <strong>and</strong> B. M. Pettitt, Biopolymers, 45, 469 (1998).<br />
Reconstructing the Protein-Water Interface.<br />
55. M. Feig <strong>and</strong> B. M. Pettitt, Structure, 6, 1351 (1998). Crystallographic Water Sites from a<br />
Theoretical Perspective.<br />
56. M. Rarey, B. Kramer, T. Lengauer, <strong>and</strong> G. Klebe, J. Mol. Biol., 261, 470 (1996). A Fast<br />
Flexible Docking Method Using an Incremental Construction Algorithm.<br />
57. M. Rarey, B. Kramer, <strong>and</strong> T. Lengauer, Proteins: Struct., Funct., Genet., 34, 17 (1999). The<br />
Particle Concept: Placing Discrete Water Molecules During Protein-Lig<strong>and</strong> Docking<br />
Predictions.<br />
58. E. F. Meyer, I. Botos, L. Scapozza, <strong>and</strong> D. Zhang, Perspect. Drug Discovery Des., 3, 168<br />
(1995). Backward Binding <strong>and</strong> Other Structural Surprises.<br />
59. G. D. Diana, A. M. Treasurywala, T. R. Bailey, R. C. Oglesby, D. C. Pevear, <strong>and</strong> F. J. Dutko,<br />
J. Med. Chem., 33, 1306 (1990). A Model for Compounds Active Against Human Rhi-<br />
novirus-14 Based on X-Ray Crystallography Data.<br />
60. R. D. Brown, Perspect. Drug Discovery Des., 7/8, 31 (1997). Descriptors for <strong>Diversity</strong><br />
<strong>Analysis</strong>.<br />
61. R. S. Pearlman, Chem. Des. Autom. News, 2 (1), 1 (1987). Rapid Generation of High Quality<br />
Approximate 3D <strong>Molecular</strong> Structures.<br />
62. J. Sadowski <strong>and</strong> J. Gasteiger, Chem. Rev., 93, 2567 (1993). From Atoms <strong>and</strong> Bonds to Three-<br />
Dimensional Atomic Coordinates.<br />
63. N. E. Shemetulskis, D. Weininger, C. J. Blankley, J. J. Yang, <strong>and</strong> C. Humblet, J. Chem. Inf.<br />
Comput. Sci., 36, 862 (1996). Stigmata: An Algorithm to Determine Structural Com-<br />
monalities in Diverse Datasets.<br />
64. P. Willett, V. Winterman, <strong>and</strong> D. Bawden, J. Chem. Inf. Comput. Sci., 26, 109 (1986).<br />
Implementation of Nonhierarchical Cluster <strong>Analysis</strong> Methods in Chemical Information<br />
Systems: Selection of Compounds for Biological Testing <strong>and</strong> Clustering of Substructure<br />
Search Output.<br />
65. SSKEYS Gateway, MDL Information Systems Inc., 14600 Catalina St., San Le<strong>and</strong>ro, CA<br />
94577. http://www.mdli.com/.<br />
66. R. D. Brown <strong>and</strong> Y. C. Martin, J. Chem. Inf. Comput. Sci., 36, 572 (1996). Use of Structure-<br />
Activity Data to Compare Structure-Based Clustering Methods <strong>and</strong> Descriptors for Use in<br />
Compound Selection.<br />
67. M. J. McGregor <strong>and</strong> P. V. Pallai, J. Chem. Inf. Comput. Sci., 37, 443 (1997). Clustering of<br />
Large Databases of Compounds: Using the MDL Keys as Structural Descriptors.<br />
68. R. D. Brown <strong>and</strong> Y. C. Martin, J. Chem. Inf. Comput. Sci., 37, 1 (1997). The Information<br />
Content of 2-D <strong>and</strong> 3-D Structural Descriptors Relevant to Lig<strong>and</strong>-Receptor Binding.<br />
69. Daylight Chemical Information Software, version 4.62. Daylight Chemical Information<br />
Systems Inc., 27401 Los Altos, Suite 370, Mission Viejo, CA 92691.<br />
http://www.daylight.com/.<br />
70. D. R. Flower, J. Chem. Inf. Comput. Sci., 38, 379 (1998). On the Properties of Bit String-<br />
Based Measures of Chemical Similarity.<br />
71. P. Willett, J. M. Barnard, <strong>and</strong> G. M. Downs, J. Chem. Inf. Comput. Sci., 38, 983 (1998).<br />
Chemical Similarity Searching.<br />
72. L. H. Hall <strong>and</strong> L. B. Kier, in Reviews in Computational Chemistry, K. B. Lipkowitz <strong>and</strong> D. B.<br />
Boyd, Eds., VCH Publishers, New York, 1991, Vol. 2, pp. 367-422. The <strong>Molecular</strong> Connec-<br />
tivity Chi Indexes <strong>and</strong> Kappa Shape Indexes in Structure-Property Modeling.<br />
73. A. T. Balaban, SAR QSAR Environ. Res., 8, 1 (1998). Topological <strong>and</strong> Stereochemical<br />
<strong>Molecular</strong> Descriptors for Databases Useful in QSAR, Similarity/Dissimilarity <strong>and</strong> Drug<br />
Design.<br />
74. R. A. Lewis, J. S. Mason, <strong>and</strong> I. M. McLay, J. Chem. Inf. Comput. Sci., 37, 599 (1997).<br />
Similarity Measures for Rational Set Selection <strong>and</strong> <strong>Analysis</strong> of Combinatorial Libraries: The<br />
Diverse Property-Derived (DPD) Approach.<br />
75. E. J. Martin, J. M. Blaney, M. A. Siani, D. C. Spellmeyer, A. K. Wong, <strong>and</strong> W. H. Moos, J.<br />
Med. Chem., 38, 1431 (1995). Measuring <strong>Diversity</strong>: Experimental Design of Combinatorial<br />
Libraries for Drug Discovery.<br />
76. D. J. Cummins, C. W. Andrews, J. A. Bentley, <strong>and</strong> M. Cory, J. Chem. Inf. Comput. Sci., 36,<br />
750 (1996). <strong>Molecular</strong> <strong>Diversity</strong> in Chemical Databases: Comparison of Medicinal Chemis-<br />
try Knowledge Bases <strong>and</strong> Databases of Commercially Available Compounds.<br />
77. S. Wold, K. Esbensen, <strong>and</strong> P. Geladi, Chemom. Intell. Lab. Syst., 2, 37 (1987). Principal<br />
Component <strong>Analysis</strong>.<br />
78. B. S. Everitt <strong>and</strong> G. Dunn, Applied Multivariate Data <strong>Analysis</strong>, Oxford University Press,<br />
New York, 1992.<br />
79. W. S. Dillon <strong>and</strong> M. Goldstein, Multivariate <strong>Analysis</strong>: Methods <strong>and</strong> Applications, Wiley,<br />
New York, 1984.<br />
80. CLOGP. Daylight Chemical Information Systems Inc., 27401 Los Altos, Suite 370, Mission<br />
Viejo, CA 92691. http://www.daylight.com/; see also http://biobyte.com/.<br />
81. A. J. Leo, Chem. Rev., 93, 1281 (1993). Calculating log Poct from Structures.<br />
82. P.-A. Carrupt, B. Testa, <strong>and</strong> P. Gaillard, in Reviews in Computational Chemistry, K. B.<br />
Lipkowitz <strong>and</strong> D. B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 241-315.<br />
Computational Approaches to Lipophilicity: Methods <strong>and</strong> Applications.<br />
83. P. F. de Aguiar, B. Bourguignon, M. S. Khots, D. L. Massart, <strong>and</strong> R. Phan-Than-Luu,<br />
Chemom. Intell. Lab. Syst., 30, 199 (1992). D-Optimal Designs.<br />
84. R. S. Pearlman <strong>and</strong> K. M. Smith, Perspect. Drug Discovery Des., 9/10/11, 355 (1997). Novel<br />
Software Tools for Chemical <strong>Diversity</strong>.<br />
85. R. S. Pearlman <strong>and</strong> K. M. Smith, Drugs Future, 23, 885 (1998). Software for Chemical<br />
<strong>Diversity</strong> in the Context of Accelerated Drug Discovery.<br />
86. F. R. Burden, J. Chem. Inf. Comput. Sci., 29, 225 (1989). <strong>Molecular</strong> Identification Number<br />
for Substructure Searches.<br />
87. P. R. Menard, J. S. Mason, I. Morize, <strong>and</strong> S. Bauerschmidt, J. Chem. Inf. Comput. Sci., 38,<br />
1204 (1998). Chemistry Space Metrics in <strong>Diversity</strong> <strong>Analysis</strong>, Library Design, <strong>and</strong><br />
Compound Selection.<br />
88. R. S. Pearlman <strong>and</strong> K. M. Smith, J. Chem. Inf. Comput. Sci., 39, 28 (1999). Metric Validation<br />
<strong>and</strong> the Receptor-Relevant Subspace Concept.<br />
89. D. Stanton, J. Chem. Inf. Comput. Sci., 39, 11 (1999). Evaluation <strong>and</strong> Use of BCUT<br />
Descriptors in QSAR <strong>and</strong> QSPR Studies.<br />
90. G. W. Bemis <strong>and</strong> I. D. Kuntz, J. Comput.-<strong>Aided</strong> Mol. Des., 6, 607 (1992). A Fast <strong>and</strong> Efficient<br />
Method for 2D <strong>and</strong> 3D <strong>Molecular</strong> Shape Description.<br />
91. G. Moreau <strong>and</strong> C. Turpin, Analusis, 24, 17 (1996). Use of Similarity <strong>Analysis</strong> to Reduce<br />
Large <strong>Molecular</strong> Libraries to Smaller Sets of Representative Molecules.<br />
92. J. Sadowski, M. Wagener, <strong>and</strong> J. Gasteiger, Angew. Chem. Int. Ed. Engl., 34, 2674 (1996).<br />
Assessing Similarity <strong>and</strong> <strong>Diversity</strong> of Combinatorial Libraries by Spatial Autocorrelation<br />
Functions <strong>and</strong> Neural Networks.<br />
93. S. E. Jakes <strong>and</strong> P. Willett, J. Mol. Graphics, 4, 12 (1986). Pharmacophoric Pattern Matching<br />
in Files of 3-D Chemical Structures: Selection of Interatomic Distance Screens.<br />
94. S. E. Jakes, N. Watts, P. Willett, D. Bawden, <strong>and</strong> J. D. Fisher, J. Mol. Graphics, 5, 41 (1987).<br />
Pharmacophoric Pattern Matching in Files of 3-D Chemical Structures: Evaluation of Search<br />
Performance.<br />
95. R. P. Sheridan, R. Nilakantan, A. Rusinko III, N. Bauman, K. S. Haraki, <strong>and</strong> R.<br />
Venkataraghavan, J. Chem. Inf. Comput. Sci., 29, 255 (1989). 3-DSEARCH: A System for<br />
Three-Dimensional Substructure Searching.<br />
96. Y. C. Martin, M. G. Bures, <strong>and</strong> P. Willett, in Reviews in Computational Chemistry, K. B.<br />
Lipkowitz <strong>and</strong> D. B. Boyd, Eds., VCH Publishers, New York, 1990, Vol. 1, pp. 213-263.<br />
Searching Databases of Three-Dimensional Structures. Y. C. Martin, J. Med. Chem., 35,<br />
2145 (1992). 3-D Database Searching in Drug Design.<br />
97. A. C. Good <strong>and</strong> J. S. Mason, in Reviews in Computational Chemistry, K. B. Lipkowitz <strong>and</strong> D.<br />
B. Boyd, Eds., VCH Publishers, New York, 1995, Vol. 7, pp. 67-117. Three-Dimensional<br />
Structure Database Searches.<br />
98. S. Wang, G. W. A. Milne, X. Yan, I. Posey, M. C. Nicklaus, L. Graham, <strong>and</strong> W. G. Rice, J.<br />
Med. Chem., 39, 2047 (1996). Discovery of Novel, Non-Peptide HIV-1 Protease Inhibitors<br />
by Pharmacophore Searching.<br />
99. P. C. Astles, T. J. Brown, C. M. H<strong>and</strong>scombe, M. F. Harper, N. V. Harris, R. A. Lewis, P. M.<br />
Lockey, C. McCarthy, I. M. McLay, B. Porter, A. G. Roach, C. Smith, <strong>and</strong> R. J. A. Walsh,<br />
Eur. J. Med. Chem., 32, 409 (1997). Selective Endothelin A Receptor Lig<strong>and</strong>s. 1. Discovery<br />
<strong>and</strong> Structure-Activity of 2,4-Disubstituted Benzoic Acid Derivatives.<br />
100. S. D. Pickett, J. S. Mason, <strong>and</strong> I. M. McLay, J. Chem. Inf. Comput. Sci., 36, 1214 (1996).<br />
<strong>Diversity</strong> Profiling <strong>and</strong> Design Using 3-D Pharmacophores: Pharmacophore-Derived<br />
Queries (PDQ).<br />
101. J. S. Mason <strong>and</strong> S. D. Pickett, Perspect. Drug Discovery Des., 7/8, 85 (1997). Partition-Based<br />
Selection.<br />
102. S. D. Pickett, C. Luttmann, V. Guerin, A. Laoui, <strong>and</strong> E. James, J. Chem. Inf. Comput. Sci., 38,<br />
144 (1998). DIVSEL <strong>and</strong> COMPLIB - Strategies for the Design <strong>and</strong> Comparison of<br />
Combinatorial Libraries Using Pharmacophoric Descriptors.<br />
103. E. K. Davies, in <strong>Molecular</strong> <strong>Diversity</strong> <strong>and</strong> Combinatorial Chemistry: Libraries <strong>and</strong> Drug<br />
Discovery, I. M. Chaiken <strong>and</strong> K. D. J<strong>and</strong>a, Eds., American Chemical Society, Washington,<br />
DC, 1996, pp. 309-316. Using Pharmacophore <strong>Diversity</strong> to Select Molecules to Test from<br />
Commercial Catalogues.<br />
104. R. D. Brown <strong>and</strong> Y. C. Martin, J. Med. Chem., 40, 2304 (1997). Designing Combinatorial<br />
Library Mixtures Using a Genetic Algorithm.<br />
105. ChemDiverse. Oxford <strong>Molecular</strong> Group plc, The Medawar Centre, Oxford Science Park,<br />
Oxford, OX4 4GA, United Kingdom. http://www.oxmol.com/.<br />
106. R. D. Cramer, R. D. Clark, D. E. Patterson, <strong>and</strong> A. M. Ferguson, J. Med. Chem., 39, 3060<br />
(1996). Bioisosterism as a <strong>Molecular</strong> <strong>Diversity</strong> Descriptor: Steric Fields of Single Topomeric<br />
Conformers.<br />
107. J. Mount, J. Ruppert, W. Welch, <strong>and</strong> A. N. Jain, J. Med. Chem., 42, 60 (1999). IcePick: A<br />
Flexible Surface-Based System for <strong>Molecular</strong> <strong>Diversity</strong>.<br />
108. W. Welch, J. Ruppert, <strong>and</strong> A. N. Jain, Chem. Biol., 3, 449 (1996). Hammerhead: Fast, Fully<br />
Automated Docking of Flexible Lig<strong>and</strong>s to Protein Binding Sites.<br />
109. A. N. Jain, K. Koile, <strong>and</strong> D. Chapman, J. Med. Chem., 37, 2315 (1994). Compass: Predicting<br />
Biological Activities from <strong>Molecular</strong> Surface Properties. Performance Comparisons on a<br />
Steroid Benchmark.<br />
110. S. M. Boyd, M. Beverley, L. Norskov, <strong>and</strong> R. E. Hubbard, J. Comput.-<strong>Aided</strong> Mol. Des., 9,<br />
417 (1995). Characterising the Geometric <strong>Diversity</strong> of Functional Groups in Chemical<br />
Databases.<br />
111. P. A. Bartlett <strong>and</strong> G. Lauri, in Book of Abstracts, 211th ACS National Meeting, New<br />
Orleans, LA, March 24-28, 1996, American Chemical Society, Washington, DC, 1996,<br />
COMP-014. The CAVEAT Vector Approach for Structure-Based Design <strong>and</strong> Combinatorial<br />
Chemistry.<br />
112. D. Chapman, J. Comput.-<strong>Aided</strong> Mol. Des., 10, 501 (1996). The Measurement of <strong>Molecular</strong><br />
<strong>Diversity</strong>: A Three-Dimensional Approach.<br />
113. G. Jones, P. Willett, R. C. Glen, A. R. Leach, <strong>and</strong> R. Taylor, J. Mol. Biol., 267, 727 (1997).<br />
Development <strong>and</strong> Validation of a Genetic Algorithm for Flexible Docking.<br />
114. C. A. Baxter, C. W. Murray, D. E. Clark, D. R. Westhead, <strong>and</strong> M. D. Eldridge, Proteins:<br />
Struct., Funct., Genet., 33, 367 (1998). Flexible Docking Using Tabu Search <strong>and</strong> an<br />
Empirical Estimate of Binding Affinity.<br />
115. C. W. Murray, D. E. Clark, T. R. Auton, M. A. Firth, J. Li, R. A. Sykes, B. Waszkowycz, D. R.<br />
Westhead, <strong>and</strong> S. C. Young, J. Comput.-<strong>Aided</strong> Mol. Des., 11, 193 (1997). PROSELECT:<br />
Combining Structure-Based Drug Design <strong>and</strong> Combinatorial Chemistry for Rapid Lead<br />
Discovery. 1. Technology.<br />
116. D. E. Clark, D. Frenkel, S. A. Levy, J. Li, C. W. Murray, B. Robson, B. Waszkowycz, <strong>and</strong> D. R.<br />
Westhead, J. Comput.-<strong>Aided</strong> Mol. Des., 9, 13 (1995). PRO-LIGAND: An Approach to De<br />
Novo <strong>Molecular</strong> Design. 1. Application to the Design of Organic Molecules.<br />
117. M. D. Eldridge, C. W. Murray, T. R. Auton, G. V. Paolini, <strong>and</strong> R. P. Mee, J. Comput.-<strong>Aided</strong><br />
Mol. Des., 11, 425 (1997). Empirical Scoring Functions. I. The Development of a Fast<br />
Empirical Scoring Function to Estimate the Binding Affinity of Lig<strong>and</strong>s in Receptor<br />
Complexes.<br />
118. E. K. Kick, D. C. Roe, A. G. Skillman, G. Liu, T. J. A. Ewing, Y. Sun, I. D. Kuntz, <strong>and</strong> J. A.<br />
Ellman, Chem. Biol., 4, 297 (1997). Structure-Based Design <strong>and</strong> Combinatorial Chemistry<br />
Yield Low-Nanomolar Inhibitors of Cathepsin D.<br />
119. T. S. Haque, A. G. Skillman, C. E. Lee, H. Habashita, I. Y. Gluzman, T. J. A. Ewing, D. E.<br />
Goldberg, I. D. Kuntz, <strong>and</strong> J. A. Ellman, J. Med. Chem., 42, 1428 (1999). Potent, Low-<br />
<strong>Molecular</strong>-Weight Non-Peptide Inhibitors of Malarial Aspartyl Protease Plasmepsin II.<br />
120. Y. Sun, T. J. A. Ewing, A. G. Skillman, <strong>and</strong> I. D. Kuntz, J. Comput.-<strong>Aided</strong> Mol. Des., 12, 597<br />
(1998). CombiDOCK: Structure-Based Combinatorial Docking <strong>and</strong> Library Design.<br />
121. H.-J. Böhm, J. Comput.-<strong>Aided</strong> Mol. Des., 6, 61 (1992). The <strong>Computer</strong> Program LUDI: A<br />
New Method for the De Novo Design of Enzyme Inhibitors.<br />
122. H.-J. Böhm, J. Comput.-<strong>Aided</strong> Mol. Des., 10, 265 (1996). Towards the Automatic Design of<br />
Synthetically Accessible Protein Lig<strong>and</strong>s: Peptides, Amides <strong>and</strong> Peptidomimetics.<br />
123. H.-J. Böhm, D. W. Banner, <strong>and</strong> L. Weber, J. Comput.-<strong>Aided</strong> Mol. Des., 13, 51 (1999).<br />
Combinatorial Docking <strong>and</strong> Combinatorial Chemistry: Design of Potent Non-peptide<br />
Thrombin Inhibitors.<br />
124. Design in Receptor. Oxford <strong>Molecular</strong> Group plc, The Medawar Centre, Oxford Science<br />
Park, Oxford, OX4 4GA, United Kingdom. http://www.oxmol.co.uk/.<br />
125. C. M. Murray <strong>and</strong> S. J. Cato, J. Chem. Inf. Comput. Sci., 39, 46 (1999). Design of Libraries<br />
to Explore Receptor Sites.<br />
126. M. Lajiness, in QSAR: Rational Approaches to the Design of Bioactive Compounds, C.<br />
Silipo <strong>and</strong> A. Vittoria, Eds., ESCOM, Leiden, 1991, pp. 201-204. Evaluation of the<br />
Performance of Dissimilarity Selection Methodology.<br />
127. R. Taylor, J. Chem. Inf. Comput. Sci., 35, 59 (1995). Simulation <strong>Analysis</strong> of Experimental<br />
Design Strategies for Screening R<strong>and</strong>om Compounds as Potential New Drugs <strong>and</strong><br />
Agrochemicals.<br />
128. S. K. Kearsley, S. Sallamack, E. M. Fluder, J. D. Andose, R. T. Mosley, <strong>and</strong> R. P. Sheridan,<br />
J. Chem. Inf. Comput. Sci., 36, 118 (1996). Chemical Similarity Using Physiochemical<br />
Property Descriptors.<br />
129. V. J. Gillet, P. Willett, <strong>and</strong> J. Bradshaw, J. Chem. Inf. Comput. Sci., 38, 165 (1998).<br />
Identification of Biological Activity Profiles Using Substructural <strong>Analysis</strong> <strong>and</strong> Genetic Algorithms.<br />
130. Spresi database. Daylight Chemical Information Systems Inc., 27401 Los Altos, Suite 370,<br />
Mission Viejo, CA 92691. http://www.daylight.com/.<br />
131. D. E. Patterson, R. D. Cramer, A. M. Ferguson, R. D. Clark, <strong>and</strong> L. E. Weinberger, J. Med.<br />
Chem., 39, 3049 (1996). Neighborhood Behavior: A Useful Concept for Validation of<br />
<strong>Molecular</strong> <strong>Diversity</strong> Descriptors.<br />
132. H. Matter, J. Med. Chem., 40, 1219 (1997). Selecting Optimally Diverse Compounds from<br />
Structure Databases: A Validation Study of Two-Dimensional <strong>and</strong> Three-Dimensional<br />
Descriptors.<br />
133. H. Matter, J. Peptide Res., 52, 305 (1998). A Validation Study of <strong>Molecular</strong> Descriptors for<br />
the Rational Design of Peptide Libraries.<br />
134. G. M. Downs <strong>and</strong> P. Willett, in Reviews in Computational Chemistry, K. B. Lipkowitz <strong>and</strong> D.<br />
B. Boyd, Eds., VCH Publishers, New York, 1995, Vol. 7, pp. 1-66. Similarity Searching in<br />
Databases of Chemical Structures.<br />
135. R. D. Cramer, S. A. DePriest, D. E. Patterson, <strong>and</strong> P. Hecht, in 3-D QSAR in Drug Design, H.<br />
Kubinyi, Ed., ESCOM, Leiden, 1993, pp. 443-485. The Developing Practice of Comparative<br />
<strong>Molecular</strong> Field <strong>Analysis</strong>.<br />
136. T. I. Oprea <strong>and</strong> C. L. Waller, in Reviews in Computational Chemistry, K. B. Lipkowitz <strong>and</strong> D.<br />
B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 127-182. Theoretical <strong>and</strong> Practical<br />
Aspects of Three-Dimensional Quantitative Structure-Activity Relationships.<br />
137. G. Greco, E. Novellino, <strong>and</strong> Y. C. Martin, in Reviews in Computational Chemistry, K. B.<br />
Lipkowitz <strong>and</strong> D. B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 183-240.<br />
Approaches to Three-Dimensional Quantitative Structure-Activity Relationships.<br />
138. E. J. Jacobsen, L. S. Stelzer, R. E. TenBrink, K. L. Belonga, D. B. Carter, H. K. Im, W. B. Im, V.<br />
H. Sethy, A. H. Tang, P. F. Von Voigtl<strong>and</strong>er, J. D. Petke, W.-Z. Zhong, <strong>and</strong> J. W. Mickelson, J.<br />
Med. Chem., 42, 1123 (1999). Piperazine Imidazo[1,5-a]quinoxaline Ureas as High-Affinity<br />
GABAA Lig<strong>and</strong>s of Dual Functionality.<br />
139. J. D. Elliott, M. A. Lago, R. D. Cousins, A. Gao, J. D. Leber, K. F. Erhard, P. Nambi, N. A.<br />
Elshourbagy, C. Kumar, J. A. Lee, J. W. Bean, C. W. DeBrosse, D. S. Eggleston, D. P. Brooks,<br />
G. Feuerstein, R. R. Ruffolo Jr., J. Weinstock, J. G. Gleason, C. E. Peishoff, <strong>and</strong> E. H.<br />
Ohlstein, J. Med. Chem., 37, 1553 (1994). 1,3-Diarylindan-2-carboxylic Acids, Potent <strong>and</strong><br />
Selective Non-peptide Endothelin Receptor Antagonists.<br />
140. T. F. Walsh, K. J. Fitch, D. L. Williams Jr., K. L. Murphy, N. A. Nolan, D. J. Pettibone, S. L.<br />
Raymond, S. S. O’Malley, B. V. Clineschmidt, D. F. Veber, <strong>and</strong> W. J. Greenlee, Bioorg. Med.<br />
Chem. Lett., 5, 1155 (1995). Potent Dual Antagonists of Endothelin <strong>and</strong> Angiotensin II<br />
Receptors Derived from α-Phenoxyphenylacetic Acids. III.<br />
141. S. A. Mousa <strong>and</strong> D. A. Cheresh, Drug Discovery Today, 2, 187 (1997). Recent Advances in<br />
Cell Adhesion Molecules <strong>and</strong> Extracellular Matrix Proteins: Potential Clinical Implications.<br />
142. S. D. Pickett, I. M. McLay, <strong>and</strong> D. E. Clark, J. Chem. Inf. Comput. Sci., 40, 263 (2000).<br />
Enhancing the Hit-to-Lead Properties of Lead Optimization Libraries.<br />
143. M. A. Johnson <strong>and</strong> G. M. Maggiora, Eds., Concepts <strong>and</strong> Applications of <strong>Molecular</strong> Similarity,<br />
Wiley-Interscience, New York, 1990.<br />
144. J. B. Dunbar, Perspect. Drug Discovery Des., 7/8, 51 (1997). Cluster-Based Selection.<br />
145. M. S. Lajiness, Perspect. Drug Discovery Des., 7/8, 65 (1997). Dissimilarity-Based Compound<br />
Selection Techniques.<br />
146. J. H. Wikel <strong>and</strong> R. E. Higgs, J. Biomol. Screening, 2, 65 (1997). Applications of <strong>Molecular</strong><br />
<strong>Diversity</strong> <strong>Analysis</strong> in High Throughput Screening.<br />
147. R. A. Jarvis <strong>and</strong> E. A. Patrick, IEEE Trans. Comput., C-22, 1025 (1973). Clustering Using a<br />
Similarity Measure Based on Shared Nearest Neighbors.<br />
148. P. R. Menard, R. A. Lewis, <strong>and</strong> J. S. Mason, J. Chem. Inf. Comput. Sci., 38, 497 (1998).<br />
Rational Screening Set Design <strong>and</strong> Compound Selection: Cascaded Clustering.<br />
149. T. N. Doman, J. M. Cibulskis, M. J. Cibulskis, P. D. McCray, <strong>and</strong> D. P. Spangler, J. Chem. Inf.<br />
Comput. Sci., 36, 1195 (1996). Algorithm5: A Technique for Fuzzy Similarity Clustering of<br />
Chemical Inventories.<br />
150. R. Dubes <strong>and</strong> A. K. Jain, Adv. Comput., 19, 113 (1980). Clustering Methodologies in<br />
Exploratory Data <strong>Analysis</strong>.<br />
151. J. M. Barnard <strong>and</strong> G. M. Downs, J. Chem. Inf. Comput. Sci., 37, 141 (1997). Chemical<br />
Fragment Generation <strong>and</strong> Clustering Software.<br />
152. F. Murtagh, Multidimensional Clustering Algorithms, Physica-Verlag, Vienna, 1985.<br />
153. L. H. Hall, L. B. Kier, <strong>and</strong> B. B. Brown, J. Chem. Inf. Comput. Sci., 35, 1074 (1995).<br />
<strong>Molecular</strong> Similarity Based on Novel Atom-Type Electrotopological State Indices.<br />
154. M. J. Ashton, M. C. Jaye, <strong>and</strong> J. S. Mason, Drug Discovery Today, 1, 71 (1996). New<br />
Perspectives in Lead Generation. II. Evaluating <strong>Molecular</strong> <strong>Diversity</strong>.<br />
155. D. Bawden, in Chemical Structures 2: The International Language of Chemistry, W. A. Warr,<br />
Ed., Springer-Verlag, Berlin, 1993, pp. 383-388. <strong>Molecular</strong> Dissimilarity in Chemical<br />
Information Systems.<br />
156. R. W. Kennard <strong>and</strong> L. A. Stone, Technometrics, 11, 137 (1969). <strong>Computer</strong> <strong>Aided</strong> Design of<br />
Experiments.<br />
157. J. D. Holliday, S. S. Ranade, <strong>and</strong> P. Willett, Quant. Struct.-Act. Relat., 14, 501 (1995). A Fast<br />
Algorithm for Selecting Sets of Dissimilar Molecules from Large Chemical Databases.<br />
158. D. B. Turner, S. M. Tyrrell, <strong>and</strong> P. Willett, J. Chem. Inf. Comput. Sci., 37, 18 (1997). Rapid<br />
Quantification of <strong>Molecular</strong> <strong>Diversity</strong> for Selective Database Acquisition.<br />
159. J. D. Holliday <strong>and</strong> P. Willett, J. Biomol. Screening, 1, 145 (1996). Definitions of Dissimilarity<br />
for Dissimilarity-Based Compound Selection.<br />
160. M. Snarey, N. K. Terrett, P. Willett, <strong>and</strong> D. J. Wilton, J. Mol. Graphics, 15, 372 (1997).<br />
Comparison of Algorithms for Dissimilarity-Based Compound Selection.<br />
161. D. K. Agrafiotis <strong>and</strong> V. S. Lobanov, J. Chem. Inf. Comput. Sci., 39, 51 (1999). An Efficient<br />
Implementation of Distance-Based <strong>Diversity</strong> Measures Based on k-d Trees.<br />
162. R. D. Clark, J. Chem. Inf. Comput. Sci., 37, 1181 (1997). OptiSim: An Extended<br />
Dissimilarity Selection Method for Finding Diverse Representative Subsets.<br />
163. R. D. Clark <strong>and</strong> W. J. Langton, J. Chem. Inf. Comput. Sci., 38, 1079 (1998). Balancing<br />
Representativeness Against <strong>Diversity</strong> Using Optimizable K-Dissimilarity <strong>and</strong> Hierarchical<br />
Clustering.<br />
164. M. Hassan, J. P. Bielawski, J. C. Hempel, <strong>and</strong> M. Waldman, Mol. <strong>Diversity</strong>, 2, 64 (1996).<br />
Optimisation <strong>and</strong> Visualisation of <strong>Molecular</strong> <strong>Diversity</strong> of Combinatorial Libraries.<br />
165. B. D. Hudson, R. M. Hyde, E. Rahr, J. Wood, <strong>and</strong> J. Osman, Quant. Struct.-Act. Relat., 15,<br />
283 (1996). Parameter Based Methods for Compound Selection from Chemical Data Bases.<br />
166. R. E. Higgs, K. G. Bemis, I. A. Watson, <strong>and</strong> J. H. Wikel, J. Chem. Inf. Comput. Sci., 37, 861<br />
(1997). Experimental Designs for Selecting Molecules from Large Chemical Databases.<br />
167. S. Anzali, J. Gasteiger, U. Holzgrabe, J. Polanski, J. Sadowski, A. Teckentrup, <strong>and</strong> M. Wagener,<br />
Perspect. Drug Discovery Des., 9/10/11, 273 (1998). The Use of Self-organizing Neural<br />
Networks in Drug Design.<br />
168. H. Bauknecht, A. Zell, H. Bayer, P. Levi, M. Wagener, J. Sadowski, <strong>and</strong> J. Gasteiger, J. Chem.<br />
Inf. Comput. Sci., 36, 1205 (1996). Locating Biologically Active Compounds in Medium-Sized<br />
Heterogeneous Datasets by Topological Autocorrelation Vectors: Dopamine <strong>and</strong><br />
Benzodiazepine Agonists.<br />
169. D. K. Agrafiotis, J. Chem. Inf. Comput. Sci., 37, 841 (1997). Stochastic Algorithms for<br />
Maximizing <strong>Molecular</strong> <strong>Diversity</strong>.<br />
170. P. Willett, Similarity <strong>and</strong> Clustering in Chemical Information Systems, Research Studies<br />
Press, Letchworth, 1987.<br />
171. J. M. Barnard <strong>and</strong> G. M. Downs, J. Chem. Inf. Comput. Sci., 32, 644 (1992). Clustering of<br />
Chemical Structures on the Basis of Two-Dimensional Similarity Measures.<br />
172. J. W. MacFarlane <strong>and</strong> D. J. Gans, in Chemometric Methods in <strong>Molecular</strong> Design, H. van de<br />
Waterbeemd, Ed., VCH, Weinheim, 1995, pp. 295-308. Cluster Significance <strong>Analysis</strong>.<br />
173. D. H. Rouvray, Fuzzy Logic in Chemistry, Academic Press, San Diego, CA, 1997.<br />
174. N. E. Shemetulskis, J. B. Dunbar Jr., B. W. Dunbar, D. W. Morel<strong>and</strong>, <strong>and</strong> C. Humblet, J.<br />
Comput.-<strong>Aided</strong> Mol. Des., 9, 407 (1995). Enhancing the <strong>Diversity</strong> of a Corporate Database<br />
Using Chemical Database Clustering <strong>and</strong> <strong>Analysis</strong>.<br />
175. CAST-3D Database. Chemical Abstracts Services, Columbus, OH. http://www.cas.org/.<br />
176. Maybridge Database. Daylight Chemical Information Systems Inc., 27401 Los Altos, Suite<br />
370, Mission Viejo, CA 92691. http://www.daylight.com/.<br />
177. Comprehensive Medicinal Chemistry (CMC), <strong>Molecular</strong> Design Limited, San Le<strong>and</strong>ro, CA<br />
94577. An electronic database version of the Drug Compendium that is Volume 6 of<br />
Comprehensive Medicinal Chemistry published by Pergamon Press in March 1990. Contains<br />
drugs already on the market.<br />
178. MACCS-II Drug Data Report (MDDR), <strong>Molecular</strong> Design Limited, San Le<strong>and</strong>ro, CA 94577.<br />
An electronic database version of the Prous Science Publishers journal Drug Data Report,<br />
extracted from issues starting mid-1988. Contains biologically active compounds in the<br />
early stages of drug development.<br />
179. Available Chemicals Directory (ACD), <strong>Molecular</strong> Design Limited, San Le<strong>and</strong>ro, CA 94577.<br />
Contains speciality <strong>and</strong> bulk chemicals from commercial sources.<br />
180. SPECS/BioSPECS Database; Br<strong>and</strong>on Associates, Merrimack, NH 03054. Contains<br />
chemicals from private sources.<br />
181. R. Nilakantan, N. Bauman, <strong>and</strong> K. S. Haraki, J. Comput.-<strong>Aided</strong> Mol. Des., 11, 447 (1997).<br />
<strong>Diversity</strong> Database Assessment: New Ideas, Concepts <strong>and</strong> Tools.<br />
182. R. Nilakantan, N. Bauman, K. S. Haraki, <strong>and</strong> R. Venkataraghavan, J. Chem. Inf. Comput.<br />
Sci., 30, 65 (1990). A Ring-Based Chemical Structural Query System: Use of a Novel Ring-<br />
Complexity Heuristic.<br />
183. F. H. Allen, J. E. Davies, J. J. Galloy, O. Johnson, O. Kennard, C. F. Macrae, E. M. Mitchell,<br />
G. F. Mitchell, J. M. Smith, <strong>and</strong> D. G. Watson, J. Chem. Inf. Comput. Sci., 31, 187 (1991).<br />
The Development of Version 3 <strong>and</strong> Version 4 of the Cambridge Structural Database System.<br />
184. G. W. A. Milne, M. C. Nicklaus, J. S. Driscoll, S. Wang, <strong>and</strong> D. J. Zaharevitz, J. Chem. Inf.<br />
Comput. Sci., 34, 1219 (1994). National Cancer Institute Drug Information System 3D<br />
Database.<br />
185. A. C. Good <strong>and</strong> R. A. Lewis, J. Med. Chem., 40, 3926 (1997). New Methodology for<br />
Profiling Combinatorial Libraries <strong>and</strong> Screening Sets: Cleaning Up the Design Process with<br />
HARPick.<br />
186. V. J. Gillet, P. Willett, <strong>and</strong> J. Bradshaw, J. Chem. Inf. Comput. Sci., 37, 731 (1997). The<br />
Effectiveness of Reactant Pools for Generating Structurally-Diverse Combinatorial<br />
Libraries.<br />
187. J. H. van Drie <strong>and</strong> M. S. Lajiness, Drug Discovery Today, 3, 274 (1998). Approaches to<br />
Virtual Library Design.<br />
188. J. H. Kalivas, Chemom. Intell. Lab. Syst., 15, 1 (1992). Optimization Using Variations of<br />
Simulated Annealing.<br />
189. R. Judson, in Reviews in Computational Chemistry, K. B. Lipkowitz <strong>and</strong> D. B. Boyd, Eds.,<br />
VCH Publishers, New York, 1997, Vol. 10, pp. 1-73. Genetic Algorithms <strong>and</strong> Their Use in<br />
Chemistry.<br />
190. R. D. Brown <strong>and</strong> D. E. Clark, Expert Opin. Ther. Patents, 8, 1447 (1998). Genetic <strong>Diversity</strong>:<br />
Applications of Evolutionary Algorithms to Combinatorial Library Design.<br />
191. L. Weber, Curr. Opin. Chem. Biol., 2, 381 (1998). Applications of Genetic Algorithms in<br />
<strong>Molecular</strong> <strong>Diversity</strong>.<br />
192. L. Weber, Drug Discovery Today, 3, 379 (1998). Evolutionary Combinatorial Chemistry:<br />
Application of Genetic Algorithms.<br />
193. R. A. Lewis, A. C. Good, <strong>and</strong> S. D. Pickett, in <strong>Computer</strong>-Assisted Lead Finding <strong>and</strong><br />
Optimization: Current Tools for Medicinal Chemistry, H. van de Waterbeemd, B. Testa, <strong>and</strong> G.<br />
Folkers, Eds., Wiley-VCH, Weinheim, 1997, pp. 135-156. Quantification of <strong>Molecular</strong><br />
Similarity <strong>and</strong> Its Application to Combinatorial Chemistry.<br />
194. V. J. Gillet, P. Willett, J. Bradshaw, <strong>and</strong> D. V. S. Green, J. Chem. Inf. Comput. Sci., 39, 169<br />
(1999). Selecting Combinatorial Libraries to Optimize <strong>Diversity</strong> <strong>and</strong> Physical Properties.<br />
195. R. P. Sheridan <strong>and</strong> S. K. Kearsley, J. Chem. Inf. Comput. Sci., 35, 310 (1995). Using a Genetic<br />
Algorithm to Suggest Combinatorial Libraries.<br />
196. S. J. Cho, W. Zheng, <strong>and</strong> A. Tropsha, J. Chem. Inf. Comput. Sci., 38, 259 (1998). Rational<br />
Combinatorial Library Design. 2. Rational Design of Targeted Combinatorial Peptide<br />
Libraries Using Chemical Similarity Probe <strong>and</strong> the Inverse QSAR Approaches.<br />
197. S. D. Pickett, unpublished work, 1999.<br />
198. P. J. Brown, T. A. Smith-Oliver, P. S. Charifson, N. C. O. Tomkinson, A. M. Fivush, D. D.<br />
Sternbach, L. E. Wade, L. Orb<strong>and</strong>-Miller, D. J. Parks, S. G. Blanchard, S. A. Kliewer, J. H.<br />
Lehmann, <strong>and</strong> T. M. Willson, Chem. Biol., 4, 909 (1997). Identification of Peroxisome<br />
Proliferator-Activated Receptor Lig<strong>and</strong>s from a Biased Chemical Library.<br />
199. F. R. Salemme, J. Spurlino, <strong>and</strong> R. Bone, Structure, 5, 319 (1997). Serendipity Meets Precision:<br />
The Integration of Structure-Based Drug Design <strong>and</strong> Combinatorial Chemistry for<br />
Efficient Drug Discovery.<br />
200. J. Li, C. W. Murray, B. Waszkowycz, <strong>and</strong> S. C. Young, Drug Discovery Today, 3, 105 (1998).<br />
Targeted <strong>Molecular</strong> <strong>Diversity</strong> in Drug Discovery: Integration of Structure-Based Design <strong>and</strong><br />
Combinatorial Chemistry.<br />
201. M. A. Murcko, in Reviews in Computational Chemistry, K. B. Lipkowitz <strong>and</strong> D. B. Boyd,<br />
Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 1-66. Recent Advances in Lig<strong>and</strong> Design<br />
Methods.<br />
202. D. E. Clark, C. W. Murray, <strong>and</strong> J. Li, in Reviews in Computational Chemistry, K. B.<br />
Lipkowitz <strong>and</strong> D. B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 67-125.<br />
Current Issues in De Novo <strong>Molecular</strong> Design.<br />
203. A. Rockwell, M. Melden, R. A. Copel<strong>and</strong>, K. Hardman, C. P. Decicco, <strong>and</strong> W. F. DeGrado,<br />
J. Am. Chem. Soc., 118, 10337 (1996). Complementarity of Combinatorial Chemistry <strong>and</strong><br />
Structure-Based Lig<strong>and</strong> Design: Application to the Discovery of Novel Inhibitors of Matrix<br />
Metalloproteinases.<br />
204. A. P. Combs, T. M. Kapoor, S. B. Feng, J. K. Chen, L. F. Daudesnow, <strong>and</strong> S. L. Schreiber,<br />
J. Am. Chem. Soc., 118, 287 (1996). Protein Structure-Based Combinatorial Chemistry:<br />
Discovery of Non-peptide Binding Elements to Src SH3 Domain.<br />
205. T. C. Norman, N. S. Gray, J. T. Koh, <strong>and</strong> P. G. Schultz, J. Am. Chem. Soc., 118, 7430 (1996).<br />
A Structure-Based Library Approach to Kinase Inhibitors.<br />
206. T. M. Kapoor, A. H. Andreotti, <strong>and</strong> S. L. Schreiber, J. Am. Chem. Soc., 120, 23 (1998).<br />
Exploring the Specificity Pockets of Two Homologous SH3 Domains Using Structure-Based,<br />
Split-Pool Synthesis <strong>and</strong> Affinity-Based Selection.<br />
207. J. P. Morken, T. M. Kapoor, S. Feng, F. Shirai, <strong>and</strong> S. L. Schreiber, J. Am. Chem. Soc., 120, 30<br />
(1998). Exploring the Leucine-Proline Binding Pocket of the Src SH3 Domain Using<br />
Structure-Based, Split-Pool Synthesis <strong>and</strong> Affinity-Based Selection.<br />
208. S. F. Brady, K. J. Stauffer, W. C. Lumma, G. M. Smith, H. G. Ramjit, S. D. Lewis, B. J. Lucas,<br />
S. J. Gardell, E. A. Lyle, S. D. Appleby, J. J. Cook, M. A. Holahan, M. T. Stranieri, J. J. Lynch<br />
Jr., J. H. Lin, I.-W. Chen, K. Vastag, A. M. Naylor-Olsen, <strong>and</strong> J. P. Vacca, J. Med. Chem., 41,<br />
401 (1998). Discovery <strong>and</strong> Development of the Novel Potent Orally Active Thrombin<br />
Inhibitor N-(9-Hydroxy-9-fluorenecarboxy)prolyl trans-4-Aminocyclohexylmethyl Amide<br />
(L-372,460): Coapplication of Structure-Based Design <strong>and</strong> Rapid Multiple Analog Synthesis<br />
on Solid Support.<br />
209. C. Illig, S. Eisennagel, R. Bone, A. Radzicka, L. Murphy, T. R<strong>and</strong>le, J. Spurlino, F. R.<br />
Salemme, <strong>and</strong> R. M. Soll, Med. Chem. Res., 4/5, 244 (1998). Exp<strong>and</strong>ing the Envelope of<br />
Structure-Based Drug Design Using Chemical Libraries: Application to Small Molecule<br />
Inhibitors of Thrombin.<br />
210. D. S. Dhanoa, R. M. Soll, Z. Wu, N. Subasinghe, J. Rinker, J. Hoffman, S. Eisennagel, T.<br />
Graybill, R. Bone, A. Radzicka, L. Murphy, <strong>and</strong> F. R. Salemme, Med. Chem. Res., 4/5, 187<br />
(1998). Serine Proteases-Directed Small Molecule Probe Libraries.<br />
211. S.-H. Kim, Pure Appl. Chem., 70, 555 (1998). Structure-Based Inhibitor Design for CDK2, a<br />
Cell Cycle Controlling Protein.<br />
212. M. Whittaker, Curr. Opin. Chem. Biol., 2, 386 (1998). Discovery of Protease Inhibitors<br />
Using Targeted Libraries.<br />
213. A. K. Szardenings, D. Harris, S. Lam, L. Shi, D. Tien, Y. Wang, D. V. Patel, M. Navre, <strong>and</strong> D.<br />
A. Campbell, J. Med. Chem., 41, 2194 (1998). Rational Design <strong>and</strong> Combinatorial<br />
Evaluation of Enzyme Inhibitor Scaffolds: Identification of Novel Inhibitors of Matrix<br />
Metalloproteinases.<br />
214. K. D. Stewart, S. Loren, L. Frey, E. Otis, V. Klinghofer, <strong>and</strong> K. I. Hulkower, Bioorg. Med.<br />
Chem. Lett., 8, 529 (1998). Discovery of a New Cyclooxygenase-2 Lead Compound<br />
Through 3-D Database Searching <strong>and</strong> Combinatorial Chemistry.<br />
215. T. L. Graybill, D. K. Agrafiotis, R. Bone, C. R. Illig, E. P. Jaeger, K. T. Locke, T. Lu, J. M.<br />
Salvino, R. M. Soll, J. C. Spurlino, N. Subasinghe, B. E. Tomczuk, <strong>and</strong> F. R. Salemme, in<br />
<strong>Molecular</strong> <strong>Diversity</strong> <strong>and</strong> Combinatorial Chemistry: Libraries <strong>and</strong> Drug Discovery, I. M.<br />
Chaiken <strong>and</strong> K. D. J<strong>and</strong>a, Eds., American Chemical Society, Washington, DC, 1996,<br />
pp. 16-27. Enhancing the Drug Discovery Process by Integration of High-Throughput Chemistry<br />
<strong>and</strong> Structure-Based Drug Design.<br />
216. E. J. Martin <strong>and</strong> R. E. Critchlow, J. Comb. Chem., 1, 32 (1999). Beyond Mere <strong>Diversity</strong>:<br />
Tailoring Combinatorial Libraries for Drug Discovery.<br />
217. G. M. Rishton, Drug Discovery Today, 2, 382 (1997). Reactive Compounds <strong>and</strong> In Vitro<br />
False Positives in HTS.<br />
218. A. D. Rodrigues, Pharm. Res., 14, 1504 (1997). Preclinical Drug Metabolism in the Age of<br />
High-Throughput Screening: An Industrial Perspective.<br />
219. J. H. Lin <strong>and</strong> A. Y. H. Lu, Pharmacol. Rev., 49, 403 (1997). Pharmacokinetics <strong>and</strong><br />
Metabolism in Drug Discovery <strong>and</strong> Development.<br />
220. M. H. Tarbit <strong>and</strong> J. Berman, Curr. Opin. Chem. Biol., 2, 411 (1998). High-Throughput<br />
Approaches for Evaluating Absorption, Distribution, Metabolism <strong>and</strong> Excretion Properties<br />
of Lead Compounds.<br />
221. P. J. Sinko, Curr. Opin. Drug Discovery Dev., 2, 42 (1999). Drug Selection in Early Drug<br />
Development: Screening for Acceptable Pharmacokinetic Properties Using Combined In<br />
Vitro <strong>and</strong> Computational Approaches.<br />
222. C. A. Lipinski, F. Lombardo, B. W. Dominy, <strong>and</strong> P. J. Feeney, Adv. Drug Delivery Rev., 23, 3<br />
(1997). Experimental <strong>and</strong> Computational Approaches to Estimate Solubility <strong>and</strong><br />
Permeability in Drug Discovery <strong>and</strong> Development Settings.<br />
223. I. Moriguchi, S. Hirono, Q. Liu, I. Nakagome, <strong>and</strong> Y. Matsushita, Chem. Pharm. Bull., 40,<br />
127 (1992). Simple Method of Calculating Octanol/Water Partition Coefficient.<br />
224. K. Palm, K. Luthman, A.-L. Ungell, G. Str<strong>and</strong>lund, <strong>and</strong> P. Artursson, J. Pharm. Sci., 85, 32<br />
(1996). Correlation of Drug Absorption with <strong>Molecular</strong> Surface Properties.<br />
225. K. Palm, P. Stenberg, K. Luthman, <strong>and</strong> P. Artursson, Pharm. Res., 14, 568 (1997). Polar<br />
<strong>Molecular</strong> Surface Properties Predict the Intestinal Absorption of Drugs in Humans.<br />
226. K. Palm, K. Luthman, A.-L. Ungell, G. Str<strong>and</strong>lund, F. Beigi, P. Lundahl, <strong>and</strong> P. Artursson, J.<br />
Med. Chem., 41, 5382 (1998). Evaluation of Dynamic Polar <strong>Molecular</strong> Surface Area as<br />
Predictor of Drug Absorption: Comparison with Other Computational <strong>and</strong> Experimental<br />
Predictors.<br />
227. S. Winiwarter, N. M. Bonham, F. Ax, A. Hallberg, H. Lennernas, <strong>and</strong> A. Karlen, J. Med.<br />
Chem., 41, 4939 (1998). Correlation of Human Jejunal Permeability (In Vivo) of Drugs with<br />
Experimentally <strong>and</strong> Theoretically Derived Parameters. A Multivariate Data <strong>Analysis</strong><br />
Approach.<br />
228. D. E. Clark, J. Pharm. Sci., 88, 807 (1999). Rapid Calculation of Polar <strong>Molecular</strong> Surface<br />
Area <strong>and</strong> Its Application to the Prediction of Transport Phenomena. 1. Prediction of<br />
Intestinal Absorption.<br />
229. D. E. Clark, J. Pharm. Sci., 88, 815 (1999). Rapid Calculation of Polar <strong>Molecular</strong> Surface<br />
Area <strong>and</strong> Its Application to the Prediction of Transport Phenomena. 2. Prediction of Blood-<br />
Brain Barrier Penetration.<br />
230. Y. C. Martin, Perspect. Drug Discovery Des., 7/8, 159 (1997). Challenges <strong>and</strong> Prospects for<br />
Computational Aids to <strong>Molecular</strong> <strong>Diversity</strong>.<br />
231. J. S. Mason <strong>and</strong> M. A. Hermsmeier, Curr. Opin. Chem. Biol., 3, 342 (1999). <strong>Diversity</strong><br />
Assessment.<br />
232. C. A. Parks, G. M. Crippen, <strong>and</strong> J. G. Topliss, J. Comput.-<strong>Aided</strong> Mol. Des., 12, 441 (1998).<br />
The Measurement of <strong>Molecular</strong> <strong>Diversity</strong> by Receptor Site Interaction Simulation.<br />
233. D. A. Thorner, D. J. Wild, P. Willett, <strong>and</strong> P. M. Wright, Perspect. Drug Discovery Des.,<br />
9/10/11, 301 (1998). Calculation of Structural Similarity by the Alignment of <strong>Molecular</strong><br />
Electrostatic Potentials.<br />
234. Ajay, W. P. Walters, <strong>and</strong> M. A. Murcko, J. Med. Chem., 41, 3314 (1998). Can We Learn to<br />
Distinguish Between Drug-like <strong>and</strong> Non-drug-like Molecules?<br />
235. J. Sadowski <strong>and</strong> H. Kubinyi, J. Med. Chem., 41, 3325 (1998). A Scoring Scheme for<br />
Discriminating Between Drugs <strong>and</strong> Nondrugs.<br />
236. A. K. Ghose, V. N. Viswanadhan, <strong>and</strong> J. J. Wendoloski, J. Comb. Chem., 1, 55 (1999). A<br />
Knowledge-Based Approach in Designing Combinatorial or Medicinal Chemistry Libraries<br />
for Drug Discovery. 1. A Qualitative <strong>and</strong> Quantitative Characterization of Known Drug<br />
Databases.<br />
237. J. Moult, T. Hubbard, S. H. Bryant, K. Fidelis, <strong>and</strong> J. T. Pedersen, Proteins: Struct., Funct.,<br />
Genet., Suppl. 1, 2 (1997). Critical Assessment of Methods of Protein Structure Prediction<br />
(CASP): Round II.<br />
238. H.-J. Bohm, J. Comput.-<strong>Aided</strong> Mol. Des., 12, 309 (1998). Prediction of Binding Constants of<br />
Protein Lig<strong>and</strong>s: A Fast Method for the Prioritization of Hits Obtained from De Novo<br />
Design or 3-D Database Search Programs.<br />
239. I. Muegge <strong>and</strong> Y. C. Martin, J. Med. Chem., 42, 791 (1999). A General <strong>and</strong> Fast Scoring<br />
Function for Protein-Lig<strong>and</strong> Interactions: A Simplified Potential Approach.<br />
240. R. H. Smith Jr., W. L. Jorgensen, J. Tirado-Rives, M. L. Lamb, P. A. J. Janssen, C. J. Michejda,<br />
<strong>and</strong> M. B. K. Smith, J. Med. Chem., 41, 5272 (1998). Prediction of Binding Affinities for<br />
TIBO Inhibitors of HIV-1 Reverse Transcriptase Using Monte Carlo Simulations in a Linear<br />
Response Method.<br />
241. T. Hansson, J. Marelius, <strong>and</strong> J. Aqvist, J. Comput.-<strong>Aided</strong> Mol. Des., 12, 27 (1998). Lig<strong>and</strong><br />
Binding Affinity Prediction by Linear Interaction Energy Methods.<br />
242. T. P. Straatsma, in Reviews in Computational Chemistry, K. B. Lipkowitz <strong>and</strong> D. B. Boyd,<br />
Eds., VCH Publishers, New York, 1996, Vol. 9, pp. 81-127. Free Energy by <strong>Molecular</strong><br />
Simulation.<br />
243. J. M. Barnard <strong>and</strong> G. M. Downs, Perspect. Drug Discovery Des., 7/8, 13 (1997). <strong>Computer</strong><br />
Representation <strong>and</strong> Manipulation of Combinatorial Libraries.<br />
244. X. Chen, A. Rusinko, <strong>and</strong> S. S. Young, J. Chem. Inf. Comput. Sci., 38, 1054 (1998).<br />
Recursive Partitioning Analysis of a Large Structure-Activity Data Set Using Three-Dimensional<br />
Descriptors.<br />
245. H. Gao, C. Williams, P. Labute, <strong>and</strong> J. Bajorath, J. Chem. Inf. Comput. Sci., 39, 164 (1999).<br />
Binary Quantitative Structure-Activity Relationship (QSAR) <strong>Analysis</strong> of Estrogen Receptor<br />
Lig<strong>and</strong>s.<br />
246. R. S. Pearlman (University of Texas at Austin), private communication, 1999.<br />
247. DiverseSolutions. Distributed by Tripos, Inc., 1699 South Hanley Road, St. Louis, MO<br />
63144, on behalf of the Laboratory for <strong>Molecular</strong> Graphics <strong>and</strong> Theoretical Modeling,<br />
College of Pharmacy, University of Texas at Austin, Austin, TX, 78712.