
CHAPTER 1

Computer-Aided Molecular Diversity Analysis and Combinatorial Library Design

Richard A. Lewis,* Stephen D. Pickett,†‡ and David E. Clark†

*Computational Chemistry, Eli Lilly and Company Ltd., Lilly Research Centre,
Erl Wood Manor, Sunninghill Road, Windlesham, Surrey, GU20 6PH, United Kingdom,
and †Computer-Aided Drug Design, Aventis Pharma Ltd. (formerly Rhône-Poulenc
Rorer Ltd.), Dagenham Research Centre, Rainham Road South, Dagenham, Essex,
RM10 7XS, United Kingdom, (present address): ‡Roche Products Ltd., Roche
Discovery Welwyn, 40 Broadwater Road, Welwyn Garden City, Hertfordshire,
AL7 3AY, United Kingdom

Reviews in Computational Chemistry, Volume 16
Kenny B. Lipkowitz and Donald B. Boyd, Editors
Wiley-VCH, John Wiley and Sons, Inc., New York, © 2000

INTRODUCTION

The roots of combinatorial chemistry can be traced back to Merrifield's
work on the solid phase synthesis of peptides during the 1960s.1 Methods for
rapidly synthesizing large libraries of peptides on solid phase were developed
during the 1980s, making use of the combinatorial relationship between the
length of a peptide and the number of possible amino acids at each position in
the sequence (i.e., an n-residue peptide with X possible amino acids at each
position can be used as the basis for a library of X^n compounds).2 A number of
groups reported protocols for what has become known as combinatorial
synthesis.3-5 At about the same time, the pharmaceutical industry began to come
under greater economic pressure to increase the speed of drug discovery, and so
the prospect of being able to rapidly synthesize large numbers of compounds for
testing was seized upon with enthusiasm. However, peptides generally make
poor drug candidates because they are rapidly metabolized in the body.
Therefore, much effort was expended to develop analogous combinatorial synthetic
methods applicable for producing small organic molecules. By the mid-1990s,
these efforts began to bear fruit. Thus, the discipline of combinatorial chemistry,
in its present-day form, was born and quickly integrated into the drug discovery
efforts of the majority of pharmaceutical companies. For more details on
combinatorial chemistry and its application to drug discovery, the reader is referred
to the reviews from the mid- and late 1990s.6-13

The most common form of combinatorial synthesis for small molecules
involves the combination of a core or scaffold moiety with various reagents,
which provide the substituents for the variable R positions (Figure 1). Assuming
that there are no prohibitions for synthetic reasons, all combinations of reagents
at each of the positions may be generated. Thus, the potential size of the
combinatorial library is given by the product of the number of possible reagents
at each of the variable R positions. For example, if a scaffold has three variable
positions and there are 100 possible reagents for each of those positions, then
the combinatorial library generated would contain 100^3 (1 million)
compounds. Since it often happens that many more than 100 possible reagents are
readily available for a given reaction, and because the number of variable groups
may exceed three, it is easy to see how combinatorial library sizes may rapidly
exceed current capabilities for synthesis, screening, and storage.
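The size calculation described above can be sketched in a few lines of Python. The function name `library_size` and the second set of reagent counts are illustrative assumptions, not part of the original text.

```python
from math import prod

def library_size(reagents_per_position):
    """Size of a full combinatorial library: the product of the number
    of available reagents at each variable R position."""
    return prod(reagents_per_position)

# Three variable positions with 100 reagents each (the example in the text).
print(library_size([100, 100, 100]))

# Hypothetical case: four positions and more reagents per position.
print(library_size([250, 250, 250, 250]))
```

The same function covers the peptide case mentioned earlier: an n-residue peptide with X possible amino acids at each position corresponds to a list of n entries, each equal to X.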

Given that, for many libraries, a full combinatorial synthesis using all
available reagents is impractical, one of the outstanding challenges to
computer-aided molecular design practitioners in recent years has been to develop
computer-based techniques to help design combinatorial libraries that
encompass as much molecular diversity as possible in the smallest number of
compounds. Analogous methods have also been applied to analyze the molecular
diversity of compound collections (e.g., combinatorial libraries, corporate
repositories, or commercial directories) to find areas of overlap or
complementarity, thereby providing information for compound acquisition or further
synthesis. The application of computational methods to combinatorial libraries
and the study of molecular diversity has been the subject of a number of
reviews14-17 and special issues of journals;18 however, the field is still at best
adolescent and continues to evolve rapidly.

Figure 1 Combinatorial libraries built around a benzodiazepine scaffold (left) and a
diketopiperazine scaffold (right).

This chapter reviews the field of computer-aided combinatorial library
design and molecular diversity analysis. The first section of the chapter provides
the foundation for all that follows by examining the nature of the forces
governing molecular recognition and introducing the concepts of molecular similarity
and molecular diversity. Following on from that, we critically review the types
of descriptor used in molecular diversity studies, as well as methods for the
analysis of “diversity space.” The question of how descriptors of molecular
diversity can be validated is also addressed. After these topics are covered, we
shall review published applications of computational methodologies for library
design and diversity analysis, seeking to highlight their relative strengths and
weaknesses. This leads naturally into the final section, which comprises a
discussion of some of the current issues facing those working in this area and
suggestions regarding possible directions for future research.

MOLECULAR RECOGNITION: SIMILARITY AND DIVERSITY

There is no universally agreed-upon definition of chemical diversity,19,20
and there are several approaches for designing chemically diverse combinatorial
libraries, which differ not only in the methods and descriptors used but also in
the objectives of the design. We therefore start by defining our terms: by a
“general diverse” library we mean a combinatorial library that covers as wide a
range of values as possible relative to some molecular descriptor derived from
its members. A “general representative” library is here defined as a library that
is designed to mirror the distribution of values for some descriptor shown by a
reference collection (e.g., the World Drug Index21). A “focused” library, on the
other hand, is a library that is constrained to match closely a small set of
compounds or the receptor site of a protein. Each definition is relevant to an
increasing hierarchy of information used for drug discovery, with the detailed
three-dimensional structural information provided by a model of the binding
site being at the top. It seems sensible to try to use the knowledge we have about
ligand-receptor complexes and propagate this understanding right down to the
design of general diverse libraries, if possible. The reader should not take these
definitions too literally, as they are not the only ones used in the literature.

It is appropriate at this point to explain also the semantics of similarity
and diversity. Similarity is a property of pairs of objects (A is similar to B).
Diversity is a property of collections of objects either with respect to that
collection (as in a general diverse library) or with respect to some external frame
of reference (as in representative or focused libraries). Diversity is therefore not
necessarily the complement of similarity; we reserve the term dissimilarity for
that concept.

Similarity, diversity, and compound libraries relate to the effort of
pharmaceutical discovery chemists to invent molecules that will be recognized by a
biological target playing a key role in a disease process. The molecules must be
able to interact with the target and favorably alter the course of the disease.
Our goal in design is to improve the rate and cost at which new leads are
discovered. In a broad sense, this will be achieved if libraries are synthesized or
compounds bought that complement the physicochemical and/or structural
properties already well represented within the set of compounds available for
screening: that is, if the diversity of the screening set is increased. The
assumption here is that the properties we use are relevant to drug-receptor
interactions. It is sometimes the case that one or more leads are known. The aim of the
design is then to focus on the important properties of the leads. If the structure
of the protein target is known, then the design should use this information and
focus the library toward compounds likely both to fit sterically and to interact
favorably with the protein. This philosophy is well illustrated by Martin and
coworkers, who describe the design of four different libraries for different
purposes and with different levels of information to direct them.22

We shall start at the top of the information hierarchy, the receptor site of a
protein target, to try to understand what drives the formation of a tightly
binding protein-ligand complex. We can then assess our molecular descriptors
in the light of this understanding. There have been several successful
applications of site-directed ligand design,23,24 so we can try to build on these past
efforts. Most of what we say in this chapter assumes that the biological target is
a protein, but similar concepts apply to nucleic acids, which are less frequently
the site of drug action. We use the term "drug" rather loosely; in reality, we are
dealing with ligands, some of which will hopefully have the necessary attributes
to become drugs.

Our current understanding of the specificity of biological function is
based on the principles of molecular recognition25 which, details aside, have not
changed greatly in the last few years. Indeed, the successes of structure-based
drug design have reinforced this orthodoxy. The binding and actions of a ligand
are controlled by the patterns of molecular fields found in the vicinity of the
contact surface of the receptor. In other words, the amino acids of the protein
create an environment that the functional groups of the ligand complement.
There should be multiple contacts between the ligand and the receptor to
maximize specificity and affinity of the overall interaction. It is still a very difficult
task to design conformationally sensible, synthetically accessible target
molecules that have the properties required for tight binding. The advantage of
combinatorial chemistry is that we can make many compounds that are
approximately complementary to our target in shape, in hydrogen-bonding pattern,
and so on, and use this extra coverage of compound space to find leads in more
situations.

The reduction of the rotational and translational motion of a mobile
molecule that occurs on binding to the receptor site and the fixing of certain
receptor side chains implies loss of entropy in both the ligand and the receptor.
This must be balanced by the utilization of enthalpic binding energy between
the ligand and the receptor26 and the energy of desolvation. Favorable
enthalpic intermolecular interactions can be divided into three main groups:
hydrogen bonding, electrostatic, and polarization. This division is perhaps
arbitrary, but it is convenient, because it allows us to associate functional groups
with interactions and to make up classes of hydrogen bond donors, hydrogen
bond acceptors, deprotonated acids (at physiological pH), protonated bases,
aromatic rings, and hydrophobes (lipophilic portions of a molecule). These
favorable interactions are counteracted by steric repulsion caused by a poor fit
of the ligand and noncomplementarity between ligand functional groups and
the receptor (e.g., the positioning of acidic ligand groups in negatively charged
regions of the receptor). It is not our purpose to discuss this issue in great detail,
and the reader is directed to several excellent reviews in this area.27-31
However, several points are pertinent to the discussion that follows.

The in vacuo strength of a hydrogen bond can be modeled with accuracy,
but the energetics of hydrogen bond formation in solution are not well
understood, as yet. Studies by Fersht and coworkers32 indicate that the free energies
for processes of the type $\mathrm{X{-}H_{aq} + Y_{aq} = (X{-}H \cdots Y) + aq}$ range from 2 to 6
kJ/mol for uncharged groups and to approximately 12 kJ/mol for charged
groups. The values are strongly affected by the degree of solvent exposure of the
interaction; that is, surface hydrogen bonds are worth very little, even in salt
bridges.33 It would thus seem likely that hydrogen bonds do not contribute
greatly to the enthalpic stability of a ligand-receptor complex. Their role in
drug-receptor binding seems to be more related to specificity, especially when
the interaction is between charged groups. It should be noted, however, that
even this view is in dispute: work by Doig and Williams34 suggests that
hydrogen bonds can, through entropy, contribute more strongly to the free energy of
binding than is often supposed.
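To put the quoted free energies in perspective, they can be converted into the corresponding change in a binding constant through the standard relation ΔG = RT ln K. The short sketch below does this conversion; the temperature of 298.15 K and the function name `affinity_factor` are assumptions made for illustration and do not come from the cited studies.

```python
import math

R = 8.314462618  # gas constant in J/(mol K)

def affinity_factor(delta_g_kj_per_mol, temperature_k=298.15):
    """Multiplicative change in a binding constant that corresponds to a
    free energy difference, using delta_G = R * T * ln(K_ratio).
    The default temperature is an assumed value."""
    return math.exp(delta_g_kj_per_mol * 1000.0 / (R * temperature_k))

# Free energy values quoted in the text for hydrogen bond formation.
for dg in (2.0, 6.0, 12.0):
    print(f"{dg:4.1f} kJ/mol corresponds to a factor of {affinity_factor(dg):.1f}")
```

Under these assumptions, an uncharged hydrogen bond changes the binding constant by roughly one order of magnitude at most, whereas a charged interaction near 12 kJ/mol corresponds to about two orders of magnitude.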

The binding site will have a distinct electrostatic profile owing to the
differing electronegativities and bonding environments of the receptor atoms.
Electrostatic interactions may take the form of charge-charge pairs, for
instance, salt bridges, or interactions involving one or more permanent dipoles.
The affinity of the ligand will be enhanced if the pattern of ligand partial charges
can be made to complement that of the receptor.35-37 It is emphasized that
complementarity does not simply imply that positive charge on the ligand
should be matched by negative charge on the receptor. Complementarity should
also be taken to imply a matching of the magnitudes of the charges as well. A
highly polar area should not be matched to a slightly polar area, since the energy
of desolvation will not be recouped. This is the same argument as for hydrogen
bonds.

In regions of low polarity, the drug-receptor interaction is influenced
more by entropic and weak dispersive effects. Complementarity is achieved by
placing nonpolar regions of the ligand and receptor next to each other. The
work of Eisenberg and McLachlan38 has provided an approximate means of
quantifying the free energy of hydrophobic interactions involved in protein
folding, using a simple atomic solvation potential, $G = \sum_i \sigma_i A_i$, where $\sigma_i$ is an
empirically determined partition coefficient for the atom class and $A_i$ is the
surface area of atom i in the protein.
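The atomic solvation potential is a simple weighted sum and can be sketched as follows. The atom classes and the numerical parameters in `SIGMA` are placeholder values chosen only to make the example run; they are not the published parameters, and the function name is likewise an assumption.

```python
# Placeholder solvation parameters (free energy per unit surface area) for
# each atom class. Illustrative values only, NOT the published parameters.
SIGMA = {"C": 10.0, "N/O": -5.0, "S": 20.0}

def solvation_free_energy(atoms, sigma=SIGMA):
    """Evaluate G = sum_i sigma_i * A_i.

    atoms: iterable of (atom_class, surface_area) pairs, one per atom i.
    """
    return sum(sigma[atom_class] * area for atom_class, area in atoms)

# Hypothetical fragment: two carbons and one polar atom with assumed areas.
fragment = [("C", 2.0), ("C", 1.5), ("N/O", 1.0)]
print(solvation_free_energy(fragment))
```

Because each term depends only on the atom class and the surface area, the sum can be evaluated separately for a folded and an unfolded (or bound and unbound) state and the two results subtracted.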

The free energy of binding can also be strongly influenced by entropic
effects. Any solute in water causes a local ordering of the water molecules in the
first hydration sheath and a loss of mobility.39 Removal of the solute by
complexation will lead to an increase in the solvent entropy. A similar result is
obtained by displacing weakly bound water from the binding site. In contrast,
entropy is lost through the fixing of the ligand upon complexation. The loss of
Brownian entropy of rotation and translation is inevitable. The loss of internal
conformational entropy, caused by the enthalpic interactions between the site
groups and the ligand atoms, can be reduced by chemically bracing (rigidifying)
the ligand, that is, through the introduction of ring systems in place of flexible
chains. An excellent illustration of this is the work of Alberg and Schreiber.40
More recently, studies by Khan et al.41 have given a further vivid example:
X-ray structures of both the flexible and the braced ligand showed that the
extra binding of the braced ligand was due almost entirely to the fixing of the
bound orientation. NMR experiments have shed light on many aspects of protein
dynamics and the effect of ligand binding.42 Indeed, it has been suggested
that in some cases the loss of protein conformational entropy at its binding site
may be compensated for by increased conformational flexibility in other
regions.43

The conformational changes that occur on formation of a complex have
further implications for the process of library design. Many current methods
assume an essentially static picture of the receptor. This assumption is clearly
unsound, but the nature of the conformational changes that occur upon
complexation cannot be predicted until a ligand has been fully designed. It is often
assumed that the uncomplexed conformations of the receptor and the ligand are
low energy states and, as such, will be reasonably well populated in the complex
and will provide a good starting model for the design process. HIV-1 protease44
and the retinoic acid ligand binding domains45 provide worrying
counterexamples to this assumption; a number of others have been cataloged recently.46
Nevertheless, modeling studies have still proved very useful in the case of HIV-1
protease when coupled with X-ray or NMR data.47 Several conformations of
the receptor and the ligand may be examined, but owing to the computational
expense, it is not possible at present to examine all the low energy states. It is
possible to perform good conformational analyses on large numbers of small
molecules, and on the binding site itself, but at present the two cannot be
combined except in an approximate or limited manner.48-51

It is easy, when discussing the energetics of complex formation, to forget
the crucial role played by water. It cannot be emphasized enough that water
plays a vital part in the energetics of complexation, both entropically and
enthalpically. Another function of water molecules is the mediation of contacts
between the ligand and the receptor. There are many examples in which this
behavior has been observed in crystallographic complexes. One study that
specifically investigates this phenomenon is the work of Quiocho et al. on
L-arabinose binding protein.52 It is not clear which of the molecules of water that are
observed in the crystal structure of a receptor are going to be important in
subsequent interactions with an incoming ligand. There are no firm rules for
deciding a priori which water molecules are structural and integral to the site,
but progress has been made in this direction with the CONSOLV program53
and more recent work by Pettitt and coworkers.54,55 The docking program
FlexX56 has been extended to allow automatic inclusion of water molecules in
the docking. However, the difficulties in this area are shown by the final overall
results: only a slight improvement was obtained over calculations without
water, some dockings being greatly improved and others worsened.57

In any set of ligands, it is possible to have multiple modes of binding to the
same active site; it is very difficult to distinguish a priori between the different
modes with confidence using existing methodologies. Examples of potential
multiple binding modes can be found in several well-characterized systems.58
These systems show large-scale changes among the different binding modes. In
the human rhinovirus-14 system, two binding modes are equally populated and
so cannot be distinguished.59 In other cases, the binding mode may be poorly
defined (giving disorder in the crystal). Multiple binding modes do not affect
the process of library design in principle. However, methods should be able to
consider all reasonable binding modes for which the correct answer is not
known a priori, e.g., by similarity to a docked ligand. The interpretation of
binding studies can also be complicated if members of the same library bind in
different manners, giving rise to what are in effect two or more structure-activity
relationships.

DESCRIBING DIVERSITY SPACE

The key to any analysis of molecular diversity or library design is the
descriptors used. From the discussion above, it is clear that the descriptors must
in some way represent, or be correlated with, the important factors governing
pharmaceutical efficacy, such as receptor binding or drug transport. The
descriptors to be chosen will depend on several factors, such as the number of
compounds to be analyzed and what information is available for the target. It
may be that different descriptors are used at various stages of the design process
as described later in the section on Applications. Here we begin by summarizing
the many different descriptors available for diversity analysis/library design;
then we shall discuss the best choice of descriptors for different design tasks.
Finally, we present a discussion on descriptor validation. Descriptors for
diversity analysis have also been reviewed by Brown.60

Types of Descriptor

Most available descriptors can be divided into two broad classes
depending on whether they can be calculated from the two-dimensional (2-D)
connection table or a three-dimensional (3-D) structure, which is usually generated
from a connection table by programs such as CONCORD61 or CORINA.62 In
the 3-D case, conformational flexibility of the molecules should also be
considered, since the generated conformation is unlikely to correspond precisely to
that bound at the biological target. In this instance, descriptor calculation can
be a time-consuming exercise. A second classification of descriptors may be
made according to the way that the information is encoded and similarities
calculated: bit strings or fingerprints versus data reduction of many real-valued
descriptors.

2-D Bit Strings

Molecules are not well described by single descriptors, and thus as many
descriptors as is practical should be used. This necessitates mechanisms for
encoding the descriptor information as efficiently as possible, to allow more
parameters to be used. The most obvious method is to use a binary key (or “bit
string”), in which bits are set on or off depending on the presence or absence of
a feature or some other binary condition. Apart from compact storage, binary
keys can also be operated on very quickly. If a sufficient number of features is
encoded in it, a key can serve as a unique descriptor, or “fingerprint,” for the
molecule. The fingerprint profile for a library can be built up by using the
Boolean AND or OR operation for all the molecule fingerprints in the library.
The AND operation gives an idea of what features are common throughout the
library; the OR operation shows the full range of features present anywhere in
the library. The power of the AND
operation can be extended to give modal fingerprints,63 in which the feature bit
is set if the feature occurs in more than a threshold percentage of the
compounds (the normal AND key would have a threshold of 100%). This is useful
when one is trying to analyze a series of screening hits to create a constraint
profile to guide library generation.
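The three library profiles described above can be sketched with Python integers standing in for bit strings. This is a minimal illustration under stated assumptions: the function names are hypothetical, the fingerprints are toy 4-bit values, and the modal fingerprint uses an "at least" comparison so that a threshold of 100% reproduces the ordinary AND key.

```python
def and_profile(fingerprints):
    """Bits set in every fingerprint: features common to the whole library."""
    result = fingerprints[0]
    for fp in fingerprints[1:]:
        result &= fp
    return result

def or_profile(fingerprints):
    """Bits set in at least one fingerprint: all features seen in the library."""
    result = 0
    for fp in fingerprints:
        result |= fp
    return result

def modal_fingerprint(fingerprints, n_bits, threshold):
    """Set a bit if it is on in at least `threshold` (a fraction between 0
    and 1) of the fingerprints; threshold=1.0 gives the AND profile."""
    result = 0
    for bit in range(n_bits):
        count = sum((fp >> bit) & 1 for fp in fingerprints)
        if count / len(fingerprints) >= threshold:
            result |= 1 << bit
    return result

# Toy library of three 4-bit fingerprints.
library = [0b1011, 0b0011, 0b1101]
print(format(and_profile(library), "04b"))
print(format(or_profile(library), "04b"))
print(format(modal_fingerprint(library, 4, 0.6), "04b"))
```

Storing each key as a single integer keeps the AND and OR profiles to one machine-level operation per molecule, which reflects the speed advantage of binary keys noted in the text.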

Two approaches have been adopted for encoding structural information
in bit strings. The first uses a predefined set (or “dictionary”) of substructural
features, and a bit is set on only if a particular feature is present in the molecule
(Figure 2a). Such keys were originally developed in the context of substructural
searching systems; Willett et al.64 were the first to use them to analyze screening
sets. One of the most commonly used implementations of the first approach, the
MACCS keys,65 has been used quite frequently for diversity studies.66,67
Brown and Martin have shown that adding a frequency count (i.e., storing the
number of times a feature occurs in the molecule) gives improved performance
and that such keys correlate reasonably well with calculated physical properties
such as octanol-water partition coefficients (ClogP).68 The alternative
approach involves an exhaustive enumeration of all bond paths through a
molecule, starting with paths of zero length (the atoms) and continuing up to a
length of seven bonds. This method encodes not just the standard substructural
features (e.g., a carboxylate group is covered by paths of length 0, 1, and 2) but
their relationship in the molecule. The best-known implementation of this
method is in the Daylight software.69 To enable the use of a fixed-length string,
the occurrence of a particular path is taken as the seed to a pseudo-random
number generator, which generates a number of bits. These bits are then OR'ed
into the fingerprint for the molecule (Figure 2b). This process is known as
hashing. The advantages of the path-based approach are that it is exhaustive
and no predefinition of fragments is necessary. In principle, this should lead to
better retrieval performance in substructure or similarity searching, whatever
the query. The disadvantage is that a particular bit in a hashed fingerprint has
no particular meaning, and several paths may set the same bit by chance. This
may be an issue when using hashed fingerprints for similarity- and
diversity-related tasks. Two recent discussions of bit string similarity measures are
recommended reading.70,71

Figure 2 Simple illustration of bit string encoding of chemical structure. (a) Sample
of a fragment dictionary-based approach. (b) Sample of a hashing scheme using a
path-based decomposition of the structure. The asterisk denotes an element in the bit
string where a collision has resulted from the hashing procedure.
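The path-based hashing scheme can be illustrated with a small sketch. It is not the Daylight implementation: the molecule representation (a list of element symbols plus a list of bonded atom index pairs), the path string format, the fingerprint length, and the number of bits per path are all assumptions made for the example. Bond orders and hydrogens are ignored for brevity.

```python
import random

def enumerate_paths(atoms, bonds, max_bonds=7):
    """Return the set of linear atom paths containing 0..max_bonds bonds.
    Each path is written as element symbols joined by '-', and a path and
    its reverse are treated as the same path."""
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    paths = set()

    def extend(path):
        labels = [atoms[k] for k in path]
        forward = "-".join(labels)
        backward = "-".join(reversed(labels))
        paths.add(min(forward, backward))
        if len(path) - 1 < max_bonds:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:  # do not revisit atoms within a path
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return paths

def hashed_fingerprint(atoms, bonds, n_bits=64, bits_per_path=2, max_bonds=7):
    """Seed a pseudo-random generator with each path and OR the generated
    bit positions into a fixed-length fingerprint (stored as an integer)."""
    fingerprint = 0
    for path in enumerate_paths(atoms, bonds, max_bonds):
        rng = random.Random(path)  # the path string is the seed
        for _ in range(bits_per_path):
            fingerprint |= 1 << rng.randrange(n_bits)
    return fingerprint

# Heavy-atom graphs of methanol (C-O) and ethanol (C-C-O).
methanol = hashed_fingerprint(["C", "O"], [(0, 1)])
ethanol = hashed_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
# Every path in methanol also occurs in ethanol, so its bits are a subset.
print(methanol & ethanol == methanol)
```

Because distinct paths may be mapped to the same bit position, the number of bits set can be smaller than the number of paths multiplied by `bits_per_path`; this is the collision effect marked by the asterisk in Figure 2.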

Topological Indices and Other Properties Derived from 2-D Structures

A large number of topological descriptors can be calculated from a 2-D
connection table. These represent such molecular attributes as shape,
branching, flexibility, and electronic properties.72,73 Such descriptors have been used
by several groups for library design or compound selection.74-76 The difficulty
here is in combining the descriptors, because many of them will be correlated. A
variety of techniques exists to tackle this problem, including principal
components analysis (PCA)77 and multidimensional scaling (MDS).78,79 In the Chiron
work,75 both PCA and MDS were used on different families of descriptors such
as topological indices, ClogP,80-82 2-D structural similarities, and specific atom
layer descriptors derived to represent the distribution of key chemical features
around a key point (such as the point of attachment to the core) using bond
counts. These analyses provided a total of 16 composite descriptors for analysis
by D-optimal design techniques.83 Lewis et al.74 took the approach of searching
for six noncorrelated descriptors and used these to partition the corporate
database at what was Rhône-Poulenc Rorer (RPR). Compared to the Chiron
work, the latter approach offers greater interpretability.

Pearlman and Smith84,85 have developed novel molecular descriptors
termed BCUTs based on an initial idea by Burden.86 A number of different atom
level matrices are generated in which the diagonal represents a property such as
atom charge while the off-diagonal elements contain information such as the
2-D (or single-conformer 3-D) distance between two atoms. It is suggested that
the lowest and highest eigenvalues of such matrices contain information that is
useful with regard to molecular diversity. Five or six eigenvalues are selected by
means of a χ² test such that the favored descriptors give an even distribution of
molecules across the five- or six-dimensional space. Again, partitioning is used
to divide the space. This method is applicable to very large data sets (hundreds
of thousands of molecules) and can be used to rapidly compare two large sets of
compounds or to select a representative set of reagents for library design (based
on whole molecule properties). Recent work87 has extended this approach to
use a nonuniform binning scheme. Furthermore, Pearlman and Smith88
describe how this methodology can be used to define what they have termed a
receptor-relevant subspace. In this case, the metrics are chosen so as to group
sets of actives in the same region of space. The BCUT descriptors have also been
shown to be useful for studies of quantitative structure-activity and structure-property
relationships (QSAR and QSPR).89
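One way to read the χ² criterion is as a measure of how evenly a candidate descriptor fills a set of bins. The sketch below is an assumption about the form of such a test, not the published BCUT procedure: it scores each descriptor separately over equal-width bins, whereas the method described above judges the distribution across the joint five- or six-dimensional space. The function names and the descriptor table are hypothetical.

```python
def chi_square_uniformity(values, n_bins, low, high):
    """Chi-square statistic against uniform occupancy of equal-width bins."""
    if high <= low:
        return float("inf")  # a constant descriptor cannot spread molecules out
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for value in values:
        index = int((value - low) / width)
        index = max(0, min(index, n_bins - 1))  # clamp edge and out-of-range values
        counts[index] += 1
    expected = len(values) / n_bins
    return sum((count - expected) ** 2 / expected for count in counts)


def most_even_descriptors(table, n_keep, n_bins=4):
    """Keep the n_keep descriptors with the most even (lowest chi-square) spread.

    table maps a hypothetical descriptor name to its values over the data set.
    """
    def score(name):
        values = table[name]
        return chi_square_uniformity(values, n_bins, min(values), max(values))

    return sorted(table, key=score)[:n_keep]
```

A statistic of zero means every bin holds the same number of molecules; larger values indicate clumping.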



Property Fingerprints

A natural extension to the substructural fingerprint is the property fingerprint.
Bemis and Kuntz90 have described a method for combining the distances
between points on a molecular surface into a histogram, which can be
regarded as a fingerprint with frequencies. Moreau and Turpin91 have used
autocorrelation vectors based on the values of properties at the atomic centers
in a molecule. Gasteiger and coworkers92 have taken this idea further by looking
at the values of some defined property calculated at the surface of a molecule.
An autocorrelation coefficient is constructed from the property values at
several pairs of points (at the atomic centers or randomly distributed on the
surface of the molecule) and the distance separating the points. A fingerprint is
obtained by binning the pairs into preset distance intervals. For reasons of
computational expediency, however, these approaches consider only one conformation
of each molecule. In the Moreau approach,91 where the number of
points to be sampled is much smaller, the distance intervals also have an important
effect on the amount of useful information contained within the vector.
This is also a critical factor in pharmacophore keys, as we discuss below. Moreau
also computes eight separate vectors based on the connectivity, size,
π-bonds, heteroaromaticity, hydrogen bond donor and acceptor capability, and
the contribution to ClogP of each atom. These vectors are concatenated to give
the overall property fingerprint.
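The construction of an autocorrelation vector can be illustrated with a short sketch. It assumes atom-centered points, Euclidean distances from a single conformation, and a product-of-properties contribution for each pair; the function name, the coordinates, and the bin edges are hypothetical and are not taken from the cited work.

```python
import math
from itertools import combinations


def autocorrelation_vector(coords, props, bin_edges):
    """Sum property products over point pairs, binned by interpoint distance.

    coords    -- list of (x, y, z) positions, one per point
    props     -- property value at each point (e.g., a partial charge)
    bin_edges -- ascending distance boundaries; bin k covers [edge k, edge k+1)
    """
    vector = [0.0] * (len(bin_edges) - 1)
    for i, j in combinations(range(len(coords)), 2):
        distance = math.dist(coords[i], coords[j])
        for k in range(len(vector)):
            if bin_edges[k] <= distance < bin_edges[k + 1]:
                vector[k] += props[i] * props[j]
                break  # each pair contributes to at most one interval
    return vector
```

Computing one such vector per property and concatenating the results gives a fingerprint of the kind described above. Widening or narrowing the intervals changes how much geometric detail survives, which is why the choice of bins matters.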

3-D Descriptors

Following the early work of Willett and coworkers93,94 and Sheridan et
al.,95 searching databases of 3-D structures of organic compounds has become
an essential tool in the pharmaceutical industry.96,97 Results of 3-D flexible
searching within databases of known compounds have proven this in a practical
sense (see, e.g., Refs. 98 and 99).

These successes have led to the suggestion that descriptors based on three-point
pharmacophores could be useful in assessing the pharmacophoric diversity
of large data sets and in library design.100-104 The principle is illustrated in
Figure 3. The Abbott implementation used fixed-width 1 Å bins up to 15 Å and
considered only the CONCORD-generated conformation.104 In the implementation
at RPR using the ChemDiverse software,105 all potential pharmacophore
triangles or quadrangles are formed from seven types of interaction center
(hydrogen bond acceptor, hydrogen bond donor, tautomeric groups, aromatic
centroids, hydrophobes, acids, and bases) over a range of distances of 2-24 Å
with variable-width bins. Conformational flexibility is taken into account by
means of a systematic search procedure including a bump-check to eliminate
high energy conformers.100,101 With three points (triangles), there are over
250,000 potential pharmacophores; this number rises to 24 × 10⁶ if four points
(quadrangles) are considered. The presence or absence of these pharmacophores
in a molecule is encoded in a bit string, often referred to as the molecule’s
“pharmacophore key.”
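A three-point pharmacophore key can be sketched as follows. This is an illustrative toy, not the ChemDiverse implementation: the key is stored as a set of triangle labels rather than a packed bit string, the feature names and bin edges are hypothetical, and conformers are supplied by the caller instead of being generated by a systematic search.

```python
import math
from itertools import combinations, permutations


def bin_index(distance, edges):
    """Return the index of the distance bin, or None if out of range."""
    for k in range(len(edges) - 1):
        if edges[k] <= distance < edges[k + 1]:
            return k
    return None


def canonical_triangle(points, edges):
    """Order-independent label for three (feature_type, xyz) points."""
    best = None
    for a, b, c in permutations(points):
        bins = (
            bin_index(math.dist(a[1], b[1]), edges),
            bin_index(math.dist(b[1], c[1]), edges),
            bin_index(math.dist(a[1], c[1]), edges),
        )
        if None in bins:
            return None  # at least one side falls outside the binned range
        label = (a[0], b[0], c[0]) + bins
        if best is None or label < best:
            best = label
    return best


def pharmacophore_key(conformers, edges):
    """Union (logical OR) of triangle labels over all supplied conformers."""
    key = set()
    for conformer in conformers:
        for triplet in combinations(conformer, 3):
            label = canonical_triangle(triplet, edges)
            if label is not None:
                key.add(label)
    return key
```

Each distinct label corresponds to one bit of the key. Taking the smallest label over all orderings of the three points ensures that the same triangle sets the same bit regardless of the order in which its features are listed.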



Figure 3 Illustration of the creation of a pharmacophore key. As the conformation of
a molecule changes, so do the distances between the pharmacophoric groups
(spheres). Each of the two different three-point pharmacophores shown sets its own
particular bit in the pharmacophore key.

The relevance of such descriptors to drug-receptor interactions is evident.
The bit string represents the triangles formed between key interaction points
over a range of accessible conformations. Two key elements in this approach are
correct atom typing (distinguishing basic nitrogens, tautomeric groups, etc.)
and the conformational analysis.100,101 Both these aspects have been the subject
of extensive in-house development at RPR. The recent extension to four-point
pharmacophores has been shown to give even greater discrimination between
compounds.101 One drawback is the time needed to perform the conformational
analysis. Given the availability of several machines on a network, however,
even crude parallelization allows the corporate database to be analyzed
within a few days.

Cramer et al.106 developed a methodology called comparative molecular
field analysis (CoMFA). Rules are used to align R groups (hence the method is
not applicable to all diversity tasks) in a single conformation (which may include
intramolecular contacts). An interaction energy is calculated with a probe
positioned at all points on a grid around the molecules. Since conformational
flexibility is ignored, these “topomeric” descriptors are essentially “2.5-D.”

Mount et al.107 have recently published the IcePick methodology
developed at Axys. A small set of low energy conformers is generated for each
molecule. Pairwise comparisons are performed, flexibly fitting a conformation
of molecule B onto a fixed conformer of molecule A and vice versa, using a
modified version of the Hammerhead docking algorithm.108 The scoring function
utilizes the molecular surface scoring of the Compass program,109 which
considers hydrophobic and hydrogen-bonding properties at a set of discrete
points projected onto two shells at 6 and 9 Å around the molecule. The overall
similarity is the average of these measures over all pairs of matches of A onto B
and B onto A. The dissimilarity is computed as (1 - similarity). Each
dissimilarity calculation can take about 40 seconds on a DEC (now Compaq)
Alpha workstation, and so the results are stored in a database for future use.
This time-consuming method has been used primarily for reagent selection,
assuming that the presence of an acid, for example, would define how the
reagents would fit to a common core.

A further method for analyzing the geometric diversity of functional
groups in chemical structure databases has been reported by Hubbard and coworkers.110
Their program, HookSpace, analyzes the spatial relationship between
pairs of functional groups and provides both qualitative and quantitative
diversity measures. The utility of the method was demonstrated by comparing
the diversity of two commercially available databases and a benzodiazepine-based
combinatorial library. In a similar vein, Bartlett and Lauri have used the
CAVEAT program to assess the diversity of different combinatorial core groups
based on a comparison of bond vectors at the substituent positions.111

Chapman112 has proposed an elegant formalism for expressing the diversity
of a collection of molecules, based on molecular entropy and the three-dimensional
arrangement of steric bulk and polar functionalities. The method
addresses molecular flexibility by means of a conformational search to identify
a set of low energy conformers. The similarity of two conformers is given by
computing the best steric overlap, then computing the sum of the distances
between each atom in conformer 1 and its corresponding nearest neighbor in
conformer 2. An analogous function is used to compute a distance based on
polar functionalities (hydrogen bond donors, acceptors, etc.). Note that all
pairs of conformers for all molecules are compared. The diversity function
comprises a sum of minimum dissimilarities together with an entropic penalty
term based on the number of rotatable bonds in a molecule. It will come as no
surprise to learn that this approach is very computationally expensive. Thus, in
practice, this method is probably restricted to cases in which the superposition
is fixed, that is, looking at the position of side chains relative to a fixed core.

Receptor-Based Descriptors

When a crystal structure is available, the additional information ironically
makes the task of design more time-consuming. It is not currently feasible to
perform detailed calculations on every member of a library within the proposed
active site, including all the important factors described in the above section on
Molecular Recognition: Similarity and Diversity. Indeed, methods for the flexible
docking of ligands are still being developed, although some (e.g., those
described in Refs. 56, 113, and 114) are beginning to show promising success
rates. However, such methods are quite computationally expensive; thus, approaches
that make more approximations are probably necessary. Some recent
publications in this area use one particular approximation: specifically, holding
the template or scaffold fixed and considering each R group independently. The
PROSELECT115 strategy builds on the earlier de novo design program
PRO-LIGAND.116 Several potential template positions are chosen and substituents
assessed by means of an empirical scoring function.117 The Kuntz group
has developed an approach (CombiBUILD) based around the program DOCK,
which assesses mainly the steric fit of a ligand with an approximate force field
score for ranking. The template is kept fixed and substituents at each position
are evaluated while allowing for possible intramolecular interactions between
substituents at different positions using conformational probability maps. This
method has been used with success to select reagents for a library against
cathepsin D.118 More recently, the DOCK program itself has been used in an
“anchor-and-grow” mode to design libraries targeted against plasmepsin II.119
Another DOCK variant for library design, CombiDOCK, has been described,
but no applications have yet been published.120 In another approach, Böhm
adapted the LUDI de novo design program121,122 to allow the structure-based
selection of reagents and has recently applied this methodology to design inhibitors
of thrombin.123

Chemical Design Ltd. has developed software (“Design in Receptor”124,125)
that allows the virtual screening of tens to hundreds of thousands of
compounds against all potential three- or four-point pharmacophores within
the binding site of a protein. This program, which extends the pharmacophore-based
methodology to embrace the concept of site-directed library design, was
developed in collaboration with a small number of pharmaceutical industry
partners. The method operates as follows: first, key interaction sites (donor,
acceptor, acid, base, hydrophobe, or aromatic) are defined in the receptor site.
Then, all possible three- or four-point pharmacophore queries are derived from
these sites. The number of queries can be restricted by applying user-definable
criteria, which may specify, for instance, minimum and maximum distances
between points and/or groups of points that must be included in all pharmacophores.
Finally, the derived set of pharmacophores (perhaps several hundred
to more than a thousand) is used to search the database of virtual products,
with the protein active site acting as a steric constraint. The search is performed
as a standard 3-D pharmacophore search, with each hit conformer being fitted
back onto the matching query pharmacophore. However, matching each pharmacophore
in turn against every molecule would require repeating the conformational
analysis for each compound. Speed is gained by inverting the matching
loop: performing the conformational analysis only once and comparing
each conformer against all query pharmacophores. The same proprietary conformational
analysis scripts and atom typing can be used as for standard pharmacophore
key calculations.101 It is possible to save three pharmacophore keys:
(1) the key of the site pharmacophore matched, (2) the key of the ligand atoms
matching site pharmacophores, and (3) the full pharmacophore key of the
ligand OR’ed over all conformations that fit the site. Such methodology should
open the way for full product-based design taking account of the ability of
the molecules to fit the receptor with no a priori assumptions about binding
modes and selecting products such that the library will cover all potential site
pharmacophores.
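The gain from inverting the matching loop can be made concrete with a small sketch. Everything here is hypothetical scaffolding: `generate_conformers` and `matches` stand in for the conformational analysis and the pharmacophore fit, and neither reflects the actual software interface.

```python
def naive_screen(molecules, queries, generate_conformers, matches):
    """Query-outer loop: conformers are regenerated for every query."""
    hits = set()
    for query in queries:
        for molecule in molecules:
            for conformer in generate_conformers(molecule):
                if matches(conformer, query):
                    hits.add((molecule, query))
                    break  # one fitting conformer is enough for this pair
    return hits


def inverted_screen(molecules, queries, generate_conformers, matches):
    """Molecule-outer loop: conformers are generated once per molecule."""
    hits = set()
    for molecule in molecules:
        conformers = list(generate_conformers(molecule))  # the expensive step
        for query in queries:
            if any(matches(conformer, query) for conformer in conformers):
                hits.add((molecule, query))
    return hits
```

Both functions return the same hits. With M molecules and Q queries, the first calls the conformer generator M × Q times and the second only M times, which matters when Q runs from several hundred to more than a thousand.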


Choosing Appropriate Descriptors

The choice of descriptor will depend on a number of factors, including any
personal biases of the modeler! Perhaps the most important considerations are
the amount of information available about the target and whether lead compounds
have been discovered. There are several possible scenarios:

1. Little information is available, and we are in the realm of general library
   design.
2. Several leads are available, and the descriptors must in some way utilize the
   information in these leads.
3. A crystal structure is available, and descriptors/methods are needed to utilize
   this information.

The scale of the problem (i.e., number of compounds to be processed) can be
significant, because some of the descriptors described above will be applicable
to only a few hundred thousand compounds rather than millions. Thus we face
several questions of vital importance in the design of drug molecules: To what
extent can and should pharmacological and pharmaceutical properties be taken
into account/predicted? How can the plethora of available descriptors be sensibly
weighted? Finally, how can the various descriptors and methods of design be
validated? These are all active areas of current research. Overriding all these
considerations, however, is the requirement that the descriptors be calculable
for a wide range of structural classes in a time frame applicable to the problem
at hand. Several months may be needed for selecting subsets from a corporate
database or assessing compounds for purchase, but the turn-around time for
library design is generally a few weeks or less. Of course, it would also be
advantageous if the same descriptors could be used to tackle a variety of problems.
For example, screening hits from a general library could be analyzed
within the descriptor space used to design the library, which immediately provides
insight into the type of molecules required for focused lead follow-up
libraries. Thus, descriptor interpretability may also be an issue.

In summary, the choice of descriptors will depend on the problem at hand
and the constraints of time imposed on the designer. Issues of descriptor validation
are discussed in the next section, though there is no consensus at this time
on the best descriptors to use. We have had success in applying the
pharmacophore-based 3-D descriptors to a variety of design tasks. We favor the
descriptors because they represent key aspects of intermolecular interactions
and take account of conformational flexibility. The pharmacophore descriptors
can be applied to diverse subset selection, general library design, and focused
library design. Site-directed design is in its infancy, but, as described above, the
methods are being developed to apply the pharmacophore descriptors in this
area too.



Validation of Descriptors

The validation of descriptors is an unsolved problem, fraught with difficulties.
Validation implies the comparison of theoretical results against some absolute
truth, provided by experimental data or by the universe of all possible results.
Our stated goal is that design should enhance the process of lead generation and
optimization. It would seem appropriate to use hit rates as a measure of how
well our diversity analysis does in comparison to chance: “simulated screening.”
This approach has been investigated by a number of researchers including
the authors of Refs. 126-129. However, there are a number of issues concerning
this type of approach. First, it assumes that the universe of chemical space can
be neatly divided into actives and inactives, according to some biological test.
However, membership of a set depends on the threshold defined for activity. If
we return to our ideas about molecular recognition, we see that binding with
micromolar affinity may indicate some degree of recognition, possibly mixed in
with some solvophobic effects. As the activity improves, we are getting more of
the features right, until at low nanomolar levels, we have compounds that fill
the active site in a complementary manner. Thus, membership of the actives
club becomes more exclusive as the threshold is raised and fewer chemical
families are able to gain entrance.

The next issue is that of sampling. The entire universe of compounds
cannot be assayed and split into the active/inactive sets. How do we know that
we have used a representative sample to test? Are the contents of the Spresi
database130 representative of the chemical universe, or those of the World Drug
Index21 of active drugs? Both questions probably have a negative answer, so
methods that use this approach to validation must be viewed with caution. Even
the term “hit rate” can be misleading. From a lead generation viewpoint, the
aim should be to cover as many distinct structural classes as possible rather than
concentrating on crude counts of hits (prompting the question of how to define
a distinct structural class!). The “quality” of the hits is also important: that is,
how amenable are they to optimization by medicinal chemistry. These considerations
imply that the most efficient approach involves screening a well-designed
set, followed up by screening close analogs of the hits.

A number of studies have used an alternative approach to assess descriptor
quality for diversity profiling. In these studies, descriptors were ranked by their
ability to discriminate active and inactive compounds within a number of medicinal
chemistry project data sets. In the work of Brown and Martin,66 this
discrimination involved the ability to separate one class of compounds from a
general pool of compounds. The approach put forward by Patterson et al.
(see also Refs. 132 and 133) introduced the concept of “neighborhood behavior”:
that is, compounds close in biological space should have a small difference
in descriptor values. In these studies, it was suggested that 2-D fingerprints
and simple shape descriptors make better descriptors than other alternatives
such as the primitive 3-D pharmacophore fingerprints studied. From our own
perspective, such assertions regarding descriptor quality are rather sweeping.
Two-dimensional substructure searches are used routinely to extract analogs
from databases.134 Similarly, measurement of shape variation provides one of
the staple descriptors of 3-D QSAR calculations.135-137 A capacity to distinguish
active from inactive analogs from a single biological screen at a nanomolar
level is hardly proof of an ability to discriminate between heterogeneous activity
classes. Within a single activity class, differences as small as a methyl group can
have significant effects on activity. This well-known piece of medicinal chemistry
lore can be verified by a careful reading of many SAR papers. Jacobsen et
al.138 provide a recent example in which two compounds (Figure 4) differ by one
methyl group and have a 70-fold difference in their relative activities. The structural
differences that exist between different receptors will tend to be much
larger, however. Thus, to some extent, the results of such studies could have been
predicted. In fact, there are any number of examples in which such approaches
would break down. Many targets of pharmaceutical relevance involve the competition
of a small-molecule ligand for a binding site with a natural ligand such
as a small peptide or even a protein. The structurally diverse endothelin antagonists
discovered by a number of companies offer a case in point.99,139,140 All
have a low 2-D similarity according to Daylight fingerprints (Figure 5), yet
maintain the arrangement of essential pharmacophoric features.

Figure 4 Illustration of the effect of adding a single methyl group to a compound's
activity. In the source paper (Ref. 138), compound 41 (R = H) has a mean binding
affinity of 6.67 nM against [3H]flunitrazepam. The corresponding value for
compound 54 (R = Me) is 470 nM.

Figure 5 Structurally diverse endothelin antagonists exhibiting low 2-D similarity
while maintaining common pharmacophoric elements crucial to activity.
(Structure labels recovered from the figure: SB 209670, RPR109353.)

Fibrinogen receptor antagonists represent another example. In this instance,
the natural ligand is (in part) the RGD (Arg-Gly-Asp) loop. As can be
seen from Figure 6, different antagonists may show a high degree of structural
diversity, exhibiting Daylight fingerprint similarities of less than 0.6. As an
experiment, a database of 100,000 compounds taken from the RPR collection
was seeded with 12 diverse RGD antagonists taken from the literature.141
Performing a similarity search in this database with a multipharmacophore key
derived from a flexible conformational analysis of the RGD tripeptide retrieves
all 12 antagonists within the top 3% of the database (Table 1).142 Alternatively,
using one of the synthetic antagonists (BIBU52) as the probe retrieves the other
11 antagonists in the top 855 compounds. While this result is not proof of the
validity of pharmacophore descriptors for library design, it certainly shows that
the descriptors capture many of the important features of ligand-receptor
interactions.
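A seeding experiment of this kind reduces to a few rank statistics on the sorted hit list: the best rank reached by a seeded compound, the worst rank, and the number of seeded compounds recovered within a given cutoff. The sketch below shows one way to compute them; the function name, the identifiers, and the cutoffs are illustrative assumptions, not part of the original study.

```python
def retrieval_summary(ranked_ids, seeded_ids, cutoffs=(100, 500, 1000)):
    """Summarize where seeded compounds fall in a similarity-ranked hit list.

    ranked_ids -- compound identifiers, most similar to the probe first
    seeded_ids -- set of identifiers for the seeded (known active) compounds
    """
    ranks = [
        position
        for position, compound in enumerate(ranked_ids, start=1)
        if compound in seeded_ids
    ]
    if not ranks:
        return None  # none of the seeded compounds were retrieved
    return {
        "top": ranks[0],        # best (highest) rank of any seeded compound
        "lowest": ranks[-1],    # rank needed to recover every retrieved seed
        "found": len(ranks),
        "within": {cutoff: sum(r <= cutoff for r in ranks) for cutoff in cutoffs},
    }
```

Dividing the lowest rank by the database size gives the fraction of the collection that must be screened to recover every seed.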

Perhaps the best lesson to be drawn from these descriptor comparisons is
that most of the proposed descriptors provide some discrimination pertinent to
the problem at hand, and, as stated earlier, the final choice will depend on many
factors relating to the nature of the problem. Two-dimensional descriptors can
be very efficient at removing close analogs from screening sets, whereas to
design small organic molecule libraries based on peptide leads, or indeed on any
structurally diverse compound set, or to achieve diversity in a biologically
relevant space, requires descriptors (namely, 3-D ones) that capture the essence
of drug-receptor interactions.

A further philosophical problem is that many of the descriptors used to
date are derived from the field of similarity analysis.143 Two-dimensional fingerprints
lose relevance once outside a defined structural family. It is an accepted
fact that similarity values below about 0.5 are not reliable/significant.
This is not a problem for clustering similar compounds, when one simply wants
to know that compound A is not similar to compound B, but problems arise
when it is important to know how dissimilar two compounds are. A pertinent
critique of 2-D bit string descriptors has been presented by Flower.70


Figure 6 Some structurally diverse RGD antagonists.
(Structure labels recovered from the figure: TAK029, MK383, BIBU52.)

APPLICATIONS

With the necessary theory and background now in place, we move on to
examine how to use the descriptors. In addition to what follows, the reader may
wish to consult a special issue of Perspectives in Drug Discovery and Design
from a few years ago entitled “Computational Tools for the Analysis of Molecular
Diversity.”15 It contains review articles covering many of the issues
discussed below: cluster-based selection, partition-based selection, and



Table 1 Use of a Pharmacophore Key Derived from the RGD Tripeptide to Retrieve
12 Seeded RGD Antagonists from a Random Collection of 100,000 Molecules

                                                       N(c)
Probe        Number of Hits   Top(a)   Lowest(b)   100   500   1000
RGD              23,884          8       3,044       3     5      7
MK383            57,846         13      11,252       2     5      5
SB214857         48,210         10      18,086       3     4      6
TAK029           38,728          1       2,275       5     6      9
BIBU52(d)        37,805          1         855       4     6     11

(a) Position in the hit list of the highest ranking of the 12 seeded compounds.
(b) Location of the lowest ranking of the seeded compounds.



at this stage.127,146 This is especially true when one is simply looking for hits
showing some activity that can be followed up by screening similar compounds
from the corporate database. A maximally diverse set is to be preferred to a
purely random selection for the following reasons. The maximally diverse set
should maximize the structure-activity information gained from the screen by
minimizing the redundancy in the set of compounds tested. A simply random
selection, rather than a maximally diverse one, will not guarantee the absence of
close homologs. Further, although empirical evidence suggests that the number
of hits obtained from a random selection may approach that obtained from a
maximally diverse set, the latter should ensure that structurally and physicochemically
diverse leads are found, giving medicinal chemists a better chance
of finding suitable compounds to follow up for lead optimization.146 Once one
or more leads have been selected for a project, it might be desirable to select
follow-up sets for screening. In this case, compounds that are similar to the
lead(s) in some sense will be sought.

Both these types of selection may be accomplished by either clustering or
partitioning methods. For a diverse selection, one might cluster the collection
and then test only the cluster centroids, whereas in a follow-up similarity
search, other compounds from within the clusters containing the leads could be
tested. If a partitioning approach were to be used, a diverse selection could be
obtained by choosing one compound from each occupied cell in the grid,
whereas compounds similar to a lead could be found by examining the cell that
contains it, together with immediately adjacent cells. A diverse set can also be
constructed by means of a maximum dissimilarity selection algorithm, whereas
a follow-up set could be identified by simply ranking compounds by similarity
to the lead(s). Finally, experimental design techniques, autocorrelation
methods, and a variety of stochastic algorithms may also be applied to subset
selection.

Clustering Subset selection by clustering has been a st<strong>and</strong>ard approach<br />

for many years. Perhaps the seminal paper in this regard is that of Willett <strong>and</strong><br />

coworkers.64 In this work, the nonhierarchical clustering algorithm due to<br />

Jarvis <strong>and</strong> Patrick147 was employed to cluster the Pfizer chemical stores file<br />

(approximately 8500 available compounds) with the aim of selecting small<br />

subsets for screening. The same techniques were also employed to group the<br />

output from substructure searches, again with the intent of reducing the number<br />

of compounds to be screened, while maximizing the information gained from<br />

the screening. A drawback to this nonhierarchical method is the lack of control<br />

over the size of the largest cluster <strong>and</strong> the number of singletons. Slight variations<br />

in the control parameters can lead to the formation of one very large, probably<br />

unrealistic, cluster, or at the other extreme, a high fraction of clusters with a<br />

single compound. Menard <strong>and</strong> coworkers148 tried to address this issue through<br />

their cascaded clustering approach, in which prior knowledge about the poten-<br />

tial size of the largest cluster in the database was used to set the clustering<br />

parameters.



The small clusters (< 5 members) were extracted <strong>and</strong> reclustered. When<br />

the results were checked by medicinal chemists, this strategy seemed to have<br />

reduced the number of singletons to an acceptable level. An alternative approach<br />

developed by Doman et al.149 employed a fuzzy clustering technique150<br />

combined with the Jarvis-Patrick method.147 The methodology has no user-defined<br />

parameters <strong>and</strong> allows compounds to belong to more than one cluster.<br />
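The sensitivity of the Jarvis-Patrick method to its control parameters is easy to see in a small sketch. The version below assumes one common formulation (two compounds are joined when each is in the other's list of k nearest neighbors and the lists share at least k_min members); the function name, the parameter names, and the use of a full distance matrix are illustrative choices, not details taken from the cited work.

```python
from itertools import combinations


def jarvis_patrick(dist, k, k_min):
    """Cluster items given a full symmetric distance matrix (list of lists).

    Assumed rule: i and j are merged when each is among the other's k
    nearest neighbors and their neighbor lists share >= k_min members.
    """
    n = len(dist)
    # k nearest neighbors of each item (the item itself is excluded).
    neighbors = [
        set(sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k])
        for i in range(n)
    ]

    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        mutual = i in neighbors[j] and j in neighbors[i]
        if mutual and len(neighbors[i] & neighbors[j]) >= k_min:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy one-dimensional "descriptor": two tight groups and one outlier.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 20.0]
matrix = [[abs(a - b) for b in values] for a in values]
print(jarvis_patrick(matrix, k=2, k_min=1))
print(len(jarvis_patrick(matrix, k=2, k_min=2)))
```

With k = 2, raising k_min from 1 to 2 turns two sensible clusters plus a singleton into seven singletons, which is the kind of parameter sensitivity described above.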

Hierarchical clustering methods are not as greatly affected by the issue of<br />

singletons, but they do impose higher computational dem<strong>and</strong>s. If N is the<br />

number of compounds to be processed, the dissimilarity matrix can require up<br />

to O(N²) disk space for storage, <strong>and</strong> the standard clustering algorithm requires<br />

O(N³) time.151 Some workers have achieved improved performance by use of<br />

Murtagh’s reciprocal nearest-neighbor algorithm,152 which requires only O(N)<br />

disk space <strong>and</strong> O(N²) time, allowing the clustering of up to 200,000 structures<br />

in a reasonable time.71,151<br />
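To give a feel for these scaling figures, the following back-of-the-envelope sketch compares the storage for a full dissimilarity matrix with that for a per-compound nearest-neighbor record. The 4-byte value size and the "one index plus one distance per compound" layout are assumptions made for illustration only.

```python
def dissimilarity_matrix_bytes(n, bytes_per_value=4):
    """Bytes needed for the strict upper triangle of an n x n matrix."""
    return n * (n - 1) // 2 * bytes_per_value


def neighbor_list_bytes(n, bytes_per_value=4):
    """Bytes for one nearest-neighbor index and one distance per compound."""
    return 2 * n * bytes_per_value


n = 200_000
print(dissimilarity_matrix_bytes(n) / 1e9)  # gigabytes, roughly 80
print(neighbor_list_bytes(n) / 1e6)         # megabytes, 1.6
```

Under these assumptions, the O(N²) matrix for 200,000 structures needs about 80 GB, whereas an O(N) record fits in a couple of megabytes.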

Partitioning A good example of a partitioning approach to screening <strong>and</strong><br />

follow-up set selection is the diverse property-derived (DPD) method described<br />

by Lewis et al.74 The following molecular attributes were used to construct a<br />

six-dimensional property space: number of H-bond acceptors, number of<br />

H-bond donors, molecular flexibility, Hall <strong>and</strong> Kier’s electrotopological state<br />

index,153 ClogP, <strong>and</strong> an “aromatic density” measure. Compounds from the<br />

corporate database were then partitioned across this space, <strong>and</strong> each compound<br />

was assigned an identifier (DPD code) according to the partition to which it was<br />

allotted. When the compounds had been partitioned, a rational, general screening<br />

set was created by selecting one compound from each of the partitions. This<br />

screening set has been in regular use at RPR for a number of years <strong>and</strong> has<br />

yielded several weak leads (1-50 µM) in a variety of assays. A particular instance<br />

of this concerned a project to find inhibitors of low density lipoprotein<br />

(LDL) production. In this case, the general DPD set yielded one hit, but a<br />

follow-up set containing compounds with the same DPD code (i.e., occupying<br />

the same cell) gave further hits. These were refined in conjunction with an<br />

existing lead to give a query for use in 3-D searching. Searches of the corporate<br />

database resulted in compounds having low nanomolar activity.154 Lewis et<br />

al.74 make the point that, in general, the DPD set does not give rise to high<br />

quality leads, but rather to hits. However, since the DPD set represents a diversity<br />

of molecular properties rather than of structural features, the DPD set is<br />

likely to be especially useful with new screens where leads have not yet been<br />

identified.<br />
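The mechanics of a cell-based scheme of this kind can be sketched in a few lines. The example below is not the DPD implementation: the two descriptors, the cut points, and all function names are hypothetical, and the rule for choosing a representative (the first compound listed in each cell) is an arbitrary placeholder.

```python
from bisect import bisect_right


def cell_code(descriptors, boundaries):
    """Return a tuple of bin indices, one per descriptor.

    boundaries[k] is the sorted list of cut points for descriptor k, so a
    descriptor with m cut points is divided into m + 1 ranges.
    """
    return tuple(bisect_right(cuts, value)
                 for value, cuts in zip(descriptors, boundaries))


def partition(compounds, boundaries):
    """Group compound names by cell code."""
    cells = {}
    for name, descriptors in compounds.items():
        cells.setdefault(cell_code(descriptors, boundaries), []).append(name)
    return cells


def screening_set(cells):
    """Pick one compound per occupied cell (here simply the first listed)."""
    return sorted(members[0] for members in cells.values())


def follow_up(cells, compounds, boundaries, hit):
    """Other compounds sharing the hit's cell code."""
    code = cell_code(compounds[hit], boundaries)
    return [name for name in cells[code] if name != hit]


# Hypothetical two-descriptor space: (H-bond donors, ClogP).
boundaries = [[2, 4], [1.0, 3.0]]
compounds = {"A": (1, 0.5), "B": (1, 0.8), "C": (3, 2.0), "D": (5, 4.2)}
cells = partition(compounds, boundaries)
print(screening_set(cells))
print(follow_up(cells, compounds, boundaries, "A"))
```

Because the code of a compound depends only on its own descriptors and the fixed cut points, a new compound can be assigned to a cell without reference to the rest of the collection.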

Maximum Dissimilarity-Based Selection The original algorithm for<br />

dissimilarity ranking in the chemical structure context seems to have been proposed<br />

by Bawden,155 although the basic algorithm may be due to Kennard <strong>and</strong><br />

Stone.156 The basic operation of a dissimilarity selection algorithm is to start<br />

with a compound selected at r<strong>and</strong>om <strong>and</strong> make this the first selected compound.<br />

Subsequent compounds are selected so that they are maximally dissimilar<br />

to all those in the currently selected set. Dissimilarity may be measured by



the maximum sum of dissimilarities to all selected molecules (MaxSum) or the<br />

largest nearest neighbor distance (MaxMin). The final diversity of the N-molecule<br />

subset is given by Eq. [1] or [2], where sim(i, j) is the similarity between<br />

molecules i <strong>and</strong> j, <strong>and</strong> d_ij is the Euclidean distance between molecules i <strong>and</strong> j in the<br />

descriptor space.<br />
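The two selection rules can be contrasted in a minimal greedy sketch. It assumes Euclidean distance in a low-dimensional descriptor space and a fixed (rather than random) starting compound so that the result is reproducible; the function and parameter names are illustrative only.

```python
from math import dist


def select_diverse(points, n_select, rule="maxmin", start=0):
    """Greedy dissimilarity selection; returns indices into points.

    rule="maxmin": pick the candidate whose nearest selected neighbor is
                   farthest away.
    rule="maxsum": pick the candidate with the largest summed distance to
                   all selected points.
    """
    combine = min if rule == "maxmin" else sum
    selected = [start]
    while len(selected) < n_select:
        candidates = [i for i in range(len(points)) if i not in selected]
        best = max(
            candidates,
            key=lambda i: combine(dist(points[i], points[s]) for s in selected),
        )
        selected.append(best)
    return selected


points = [(0, 0), (10, 0), (5, 1), (9, 3)]
print(select_diverse(points, 3, rule="maxmin"))
print(select_diverse(points, 3, rule="maxsum"))
```

In this toy set both rules first add the far point (10, 0). MaxMin then takes the central point (5, 1), whereas MaxSum prefers (9, 3), which lies close to a compound already selected.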

This type of methodology was embraced by researchers at Upjohn in their<br />

COUSIN system.126 The Willett group developed fast algorithms based<br />

on the MaxSum dissimilarity measure in combination with the cosine<br />

coefficient. This algorithm was applied by Pickett et al.102 in conjunction with<br />

multi-pharmacophore descriptors to the task of selecting diverse reagents. Willett’s<br />

group has looked extensively at both definitions of dissimilarity159 <strong>and</strong> al-<br />

gorithms for dissimilarity-based compound selection.160 In the former case,<br />

they concluded that it was impossible to identify any of the four definitions<br />

studied as being superior to the others.<br />

When the algorithms were compared, however, the MaxMin algorithm<br />

gave better results than the alternatives under study. In fact, several<br />

workers107,161 have highlighted a problem with the MaxSum procedure. The<br />

measure is based on the distance of the point from the centroid of the set <strong>and</strong> so<br />

tends to select molecules from the corners of diversity space, <strong>and</strong> duplicate<br />

selections can appear to add to the diversity. This situation is clearly a problem<br />

with traditional descriptors, because the extremes of space tend to be less rele-<br />

vant chemical compounds (very high or very low log P, etc.).<br />
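The link between a summed-distance score and the centroid can be checked numerically. The sketch below assumes squared Euclidean distance (not necessarily the coefficient used in the cited studies), for which the sum of squared distances from a candidate x to n selected points equals n times the squared distance from x to their centroid, plus a term that does not depend on x.

```python
def sum_sq_dist(x, selected):
    """Sum of squared Euclidean distances from x to every selected point."""
    return sum(sum((a - b) ** 2 for a, b in zip(x, s)) for s in selected)


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def via_centroid(x, selected):
    """Same quantity, computed from the centroid of the selected set."""
    c = centroid(selected)
    spread = sum_sq_dist(c, selected)          # does not depend on x
    return len(selected) * sum_sq_dist(x, [c]) + spread


corners = [(0, 0), (4, 0), (0, 4), (4, 4)]
print(sum_sq_dist((5, 1), corners), via_centroid((5, 1), corners))
print(sum_sq_dist((4, 4), corners), sum_sq_dist((2, 2), corners))
```

The second line of output shows the effect noted above: a duplicate of an already selected corner, (4, 4), scores 64, twice the score of the unexplored center (2, 2).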

It is interesting to consider why using “corner” compounds is a less press-<br />

ing issue when applied to pharmacophore keys. First, the pharmacophore space<br />

is very high-dimensional, <strong>and</strong> it is not uncommon to have a number of reagents<br />

or molecules that have no (or only very few) pharmacophores in common.<br />

Mount et al.107 note that in higher dimensional spaces, more of the points are<br />

near the periphery, rendering the difference in behavior less pronounced. Sec-<br />

ond, the molecules are not r<strong>and</strong>omly spread throughout space but tend to<br />

cluster; thus inclusion of a similarity threshold to prevent selection of molecules<br />

similar to those already selected avoids revisiting areas of space. Provided the<br />

number of compounds to be selected is small compared to the size of the set, the<br />

time overhead for this additional constraint is not too great. Third, it is also<br />

possible to monitor how many new pharmacophores a selected molecule would<br />

add to the set.100 Thus, the similarity measure ensures that pharmacophores are<br />

presented in different combinations, while the monitoring of the addition of<br />

new pharmacophores ensures that, overall, all pharmacophores within the set



are covered (i.e., by combining a partitioning <strong>and</strong> a distance-based approach).<br />

These arguments notwithstanding, the MaxMin procedure would appear to be<br />

the method of choice today. Agrafiotis <strong>and</strong> Lubanov161 have shown how k-d<br />

trees can provide an efficient way to calculate nearest neighbor distances for<br />

input to a MaxMin selection procedure. They use a simulated annealing pro-<br />

cedure to select an n-molecule subset that maximizes Eq. [3]. This expression<br />

provides a smoother function compared to the st<strong>and</strong>ard MaxMin expression<br />

(Eq. [2]).<br />

A general dissimilarity selection algorithm was recently reported by<br />

Clark.162,163 There is an adjustable parameter in the algorithm that controls the<br />

balance between representativeness <strong>and</strong> diversity. Other functions for maximiz-<br />

ing dissimilarity have been suggested by Hassan et al.164 In their work, the<br />

(dis)similarity function is derived from a large number of 2-D <strong>and</strong> single-<br />

conformer 3-D descriptors, the dimensionality being reduced by means of prin-<br />

cipal components analysis (PCA). Multidimensional scaling is used to generate<br />

a 3-D coordinate plot for the library. The library design is a “cherry-picking”<br />

procedure: a r<strong>and</strong>om selection of compounds is taken, <strong>and</strong> compounds are<br />

added <strong>and</strong> removed from this selection by means of a Monte Carlo method<br />

combined with a maximal dissimilarity function based on the sum of the dis-<br />

tances between molecules in the PCA descriptor space. It seems from Hassan’s<br />

paper,164 that the principal components are recalculated for, <strong>and</strong> are particular<br />

to, each library, making the performance of interlibrary comparisons a non-<br />

trivial task. Hudson et al.165 have also reported the development of<br />

dissimilarity-based methods for the selection of diverse subsets.<br />

Experimental Design In addition to a maximal dissimilarity selection<br />

algorithm, similar in spirit to those described above, Higgs et al.166 have<br />

experimented with the use of a D-optimal design algorithm to generate what they term<br />

an “edge design.” By this they mean a design that tends to select molecules on<br />

the edge of the descriptor space, filling the corners first <strong>and</strong> then populating the<br />

edges. Experimental design has also been used for reagent selection by the<br />

Chiron group,75 who claim that it can generate “maximal overall diversity.”<br />

However, Higgs et al. criticize this assumption. In their experience, the<br />

D-optimal design algorithm does not explicitly seek to avoid previously sampled areas<br />

of space, even with the inclusion of additional (quadratic) terms. The Lilly<br />

group166 much prefers the maximal dissimilarity selection algorithm (what they<br />

term a “spread design”), which is able to sample descriptor space thoroughly,<br />

including molecules from the edges <strong>and</strong> throughout the space. A further type of<br />

design (a “coverage design”), suitable for lead follow-up, is mentioned in their<br />

work.166 The coverage design algorithm identifies a subset of molecules that is<br />

maximally similar to a c<strong>and</strong>idate set.



Kohonen Maps Kohonen maps are essentially a projection technique,<br />

providing a lower dimensional (usually 2-D) view of a higher dimensional<br />

descriptor space. Objects close in the higher dimensional space will be placed in<br />

the same or neighboring neurons, <strong>and</strong> so the method could be classed as a<br />

partitioning technique. Gasteiger <strong>and</strong> coworkers167 applied this technique in<br />

conjunction with spatial autocorrelation vectors <strong>and</strong> were able to differentiate<br />

dopamine <strong>and</strong> benzodiazepine agonists.168 The method has also been proposed<br />

as a means of assessing the diversity of combinatorial libraries.92 Agrafiotis has<br />

described the application of a similar technique, Sammon mapping, for<br />

visualizing the results of diversity analyses.169<br />

Spanning Trees The IcePick program,107 mentioned earlier in connection<br />

with 3-D descriptors, utilizes a minimum weight spanning tree (MWST) to<br />

obtain a spread of molecules. The MWST can be thought of as the shortest way<br />

of indirectly connecting a set of points. When the MWST is large, the set will be<br />

diverse because the points are spread out.<br />
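The idea that a larger spanning tree indicates a more spread-out set can be illustrated as follows. This is a plain implementation of Prim's algorithm on Euclidean distances, offered only as a sketch; it is not taken from IcePick, whose internal details are not described here.

```python
from math import dist


def mst_weight(points):
    """Total edge length of the minimum spanning tree (Prim's algorithm)."""
    if len(points) < 2:
        return 0.0
    # best[i] = shortest distance from point i to the growing tree
    best = {i: dist(points[0], points[i]) for i in range(1, len(points))}
    total = 0.0
    while best:
        nearest = min(best, key=best.get)
        total += best.pop(nearest)
        for i in best:
            best[i] = min(best[i], dist(points[nearest], points[i]))
    return total


tight = [(0, 0), (1, 0), (0, 1), (1, 1)]
spread = [(0, 0), (10, 0), (0, 10), (10, 10)]
print(mst_weight(tight), mst_weight(spread))
```

The four clustered points give a tree of length 3, whereas the same pattern scaled tenfold gives 30, so the tree weight tracks how widely the set is dispersed.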

It is worth noting that in all the methods described in this section, diversity<br />

is being equated to dissimilarity between compounds, <strong>and</strong> dissimilarity is being<br />

assessed as (1 - similarity). In other words, the methods require a comparison<br />

metric that is meaningful for measurement of distance between quite dissimilar<br />

objects. This is not the case for 2-D fingerprints, for example, which were<br />

developed for 2-D substructure searching <strong>and</strong>, as mentioned earlier, tend to lose<br />

meaning below similarities of about 0.5. In the authors’ opinion, not enough<br />

consideration has been given to this issue. It is for this reason that validating<br />

metrics on quite structurally homogeneous data sets (where such assumptions<br />

may apply) is not the same as validating them on very structurally inhom-<br />

ogeneous sets (see above section on Validation of Descriptors).<br />

Partitioning Versus Distance-based Methods There are several methods<br />

available for selecting representative subsets from large sets. Each method has<br />

its good <strong>and</strong> bad points, <strong>and</strong> the specifics of the application should determine<br />

the most appropriate method to select. The methods are fairly independent of<br />

the nature of the descriptor but are affected by whether the descriptor is discrete<br />

(e.g., binary fingerprints) or continuous (e.g., molecular weight). Techniques for<br />

clustering chemical objects have been well reviewed by other<br />

researchers144,170,171 <strong>and</strong> have been applied by several groups to select<br />

representative screening sets from large compound collections. Despite these<br />

successful applications, we think that clustering should be used with great care. The<br />

application of a clustering method makes the assumption that the data are in<br />

fact amenable to clustering: in other words, most clustering methods will pro-<br />

duce a clustering, whatever the data. To the authors’ knowledge, there are no<br />

simple ways of testing whether this assumption is justified for a very large data<br />

set. Certainly, cluster significance tests have been proposed,172 but they are<br />

quite computationally expensive <strong>and</strong> not practical to apply to very large data<br />

sets. The second <strong>and</strong> most important factor is the lack of generality when one is<br />

applying distance-based measures. If the subset is defined by the clustering of



one database or combinatorial library, it is hard to define the descriptors for<br />

compounds in a second database or library without a large number of expensive<br />

distance calculations, as well as some arbitrary definitions of cluster dimensions.<br />

Perhaps the best application of clustering in the context of library design<br />

is to remove redundancy in reagent sets.<br />

Partitioning is best described as a boxing algorithm: each descriptor is<br />

divided into ranges; a combination of descriptor ranges makes a partition or<br />

box. The composite descriptor is then effectively the coordinate vector of one of<br />

the vertices of the box. The complete set of partitions is formed by taking all<br />

combinations of all the ranges into which the molecular descriptors have been<br />

divided. This approach also has the useful property of being space filling. It is<br />

completely portable between different databases, designs, or applications, provided<br />

the same descriptors <strong>and</strong> ranges are used, thus allowing comparison<br />

between compound sets from different sources or different combinatorial libraries<br />

(see next section). Other advantages are easy control of granularity <strong>and</strong>,<br />

perhaps most important, the ability to identify property space not represented<br />

by any molecule. Disadvantages of the partitioning algorithm include the arbitrary<br />

way in which the ranges must be set, <strong>and</strong> the introduction of edge effects<br />

when a partition boundary slices between two very similar compounds; an<br />

answer to this issue may come through the application of fuzzy logic.173 These<br />

edge effects have implications for follow-up screening of molecules in the same<br />

partition as a lead: to avoid missing compounds that fall just outside a partition,<br />

the surrounding partitions should also be tested. However, for a six-dimensional<br />

classification like the DPD system,74 with perhaps 50 compounds<br />

per partition, this could necessitate screening a further potential 36,400 [i.e.,<br />

50(3⁶ − 1)] compounds, a number almost large enough to defeat the object of the<br />

exercise. However, the portability of the descriptor outweighs this negative<br />

factor, in our opinion. More work remains to be done to reach a consensus on<br />

the question of which method, clustering or partitioning, gives the better performance.<br />

At present, we must conclude that choosing the method best suited to<br />

the task at h<strong>and</strong> is preferable to modifying the task to suit a favored methodology.<br />

Thus the application of these methods by the practicing computational<br />

chemist may require some trial <strong>and</strong> error.<br />
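The figure of 36,400 follows directly from counting the cells that touch a given cell in six dimensions. The short sketch below enumerates them; the function name is illustrative, and no check is made for cells at the boundary of the grid, which have fewer neighbors.

```python
from itertools import product


def adjacent_cells(code):
    """All cells whose index differs by at most one along every dimension,
    excluding the cell itself."""
    offsets = product((-1, 0, 1), repeat=len(code))
    return [tuple(c + o for c, o in zip(code, off))
            for off in offsets if any(off)]


neighbors = adjacent_cells((2, 2, 2, 2, 2, 2))   # an interior cell
print(len(neighbors))        # 3**6 - 1 = 728
print(50 * len(neighbors))   # 36400
```

Each added descriptor triples the number of cells to be examined, so the cost of a "surrounding partitions" follow-up grows quickly with the dimensionality of the property space.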

Comparison of Compound Collections with a<br />

View to Acquisition or Combinatorial Libraries<br />

with a View to Synthesis<br />

As mentioned above, corporate chemical structure databases are replete<br />

with analog series <strong>and</strong> are thus far from representative of the full range of<br />

structural or physicochemical diversity. There is therefore much interest in,<br />

first, locating the “diversity voids” within a particular collection, <strong>and</strong> then<br />

analyzing external collections to see which compounds could be purchased to<br />

occupy those holes. In this way, the molecular diversity of a corporate collection<br />

can be enhanced, <strong>and</strong> this in turn should lead to better results from high-


throughput screening experiments for the reasons outlined in the preceding<br />

section. Clearly, identical techniques can be used for the comparison of com-<br />

binatorial libraries to ensure that synthetic effort is not being wasted in the<br />

generation of redundant compounds.<br />

As an example of compound collection comparison, Shemetulskis et al.174<br />

carried out clustering experiments to see how much diversity would be added to<br />

the Parke-Davis corporate database (CBI, 117,459 compounds) by the<br />

inclusion of the Chemical Abstracts Service (CAST-3-D, 379,847 compounds) <strong>and</strong><br />

the Maybridge (MAY, 41,912 compounds) databases.175,176 The approach<br />

used was to cluster the CBI database with each of the MAY <strong>and</strong> CAST-3-D<br />

databases in turn <strong>and</strong> to examine what percentage of the resulting clusters<br />

contained only (or more than 95%) MAY or CAST-3-D compounds. The MAY<br />

compounds in these clusters could then be considered as candidates for<br />

purchase. The clustering experiments were carried out on the basis of both<br />

structural attributes <strong>and</strong> physicochemical properties using the Jarvis-Patrick<br />

algorithm147 as implemented in the Daylight software.69 With the large numbers<br />

of compounds involved, the clustering effort [requiring an O(N²) nearest-neighbor<br />

table calculation] was immense. As an illustration, the generation of<br />

the nearest-neighbor table for the CAST-3-D database took 64 CPU days on an<br />

SGI 4D/480 workstation!<br />
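The bookkeeping step of such a comparison, once the merged databases have been clustered, amounts to counting the clusters dominated by the external collection. The sketch below assumes that each cluster is available as a list of source labels; the function name and data layout are hypothetical.

```python
from collections import Counter


def external_only_fraction(cluster_sources, external="MAY", threshold=0.95):
    """Fraction of clusters in which more than `threshold` of the members
    come from the external collection.

    cluster_sources: one list of source labels per cluster.
    """
    def dominated(sources):
        return Counter(sources)[external] / len(sources) > threshold

    flagged = sum(1 for sources in cluster_sources if dominated(sources))
    return flagged / len(cluster_sources)


clusters = [
    ["CBI", "CBI", "MAY"],       # mixed cluster
    ["MAY", "MAY"],              # external compounds only
    ["CBI"],                     # in-house singleton
    ["MAY"] * 20 + ["CBI"],      # 20 of 21 external (about 95.2%)
]
print(external_only_fraction(clusters))
```

Here two of the four clusters pass the 95% test, so their MAY members would be the candidates for purchase.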

Apart from the large amount of CPU time required for clustering (or<br />

distance-based) experiments of the type mentioned above, such methods are<br />

generally not well suited to diversity void location, simply because they can deal<br />

only with space that is covered by the compounds being clustered. So, in the<br />

work above for instance, if there were regions of diversity space not occupied by<br />

any compound in CBI, MAY, or CAST-3-D, there would be no way of discover-<br />

ing these voids or of choosing compounds to fill them. Thus, partitioning (cell-<br />

based) approaches are generally considered to be preferable for this kind of<br />

analysis, provided, of course, that a suitable diversity space for partitioning is<br />

defined.84<br />
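A cell-based search for diversity voids can be sketched as a set difference between the full grid and the occupied cells. The cell codes are assumed to have been computed beforehand with a fixed set of descriptor ranges; the function names and the tiny two-by-two grid are illustrative only.

```python
from itertools import product


def voids(in_house_codes, bins_per_dimension):
    """Cells of the full grid that no in-house compound occupies."""
    grid = set(product(*(range(b) for b in bins_per_dimension)))
    return grid - set(in_house_codes)


def purchase_candidates(external, in_house_codes, bins_per_dimension):
    """External compounds that fall in a void of the in-house collection."""
    empty = voids(in_house_codes, bins_per_dimension)
    return sorted(name for name, code in external.items() if code in empty)


in_house = [(0, 0), (0, 0), (1, 1)]           # cell codes of in-house compounds
external = {"X": (0, 1), "Y": (1, 1), "Z": (1, 0)}
print(sorted(voids(in_house, (2, 2))))
print(purchase_candidates(external, in_house, (2, 2)))
```

Unlike a clustering of the merged collections, the grid also reports cells that remain empty after every external source has been considered, because the voids are defined by the space rather than by the compounds.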

Cummins et al.76 used a cell-based approach to compare the molecular<br />

diversity in five databases: the Comprehensive Medicinal Chemistry (CMC177)<br />

<strong>and</strong> MACCS Drug Data Report (MDDR178) (each representing medicinal<br />

chemistry knowledge bases), the Available Chemicals Directory (ACD179) <strong>and</strong><br />

SPECS180 (representing commercially available compounds), <strong>and</strong> the Wellcome<br />

Registry. The compounds in these databases (totaling more than 300,000) were<br />

mapped into a molecular descriptor space describing molecular diversity in<br />

terms of the free energy of solvation <strong>and</strong> 60 topological indices. This number of<br />

descriptors was reduced to four by factor analysis, <strong>and</strong> a partitioning method<br />

was used to analyze the resulting space. It was found that the superpopulation<br />

of structures occupied only a very small volume of the available space; attention<br />

was focused on the densely populated part by removing outliers (cells with no<br />

or few representatives). In any event, only about 7000 compounds were deleted<br />

in this process, at which point it became possible to compare the databases in



detail. For example, the MDDR <strong>and</strong> ACD databases were found to overlap<br />

each other’s volume by around 70%, reflecting the fact that many biologically<br />

active molecules are of commercial interest <strong>and</strong> vice versa.<br />

More recently, Willett’s group has extended its methodology for diverse<br />

subset selection to the analysis of the relative diversity of compound collec-<br />

tions.158 The six databases compared comprised five publicly available collec-<br />

tions <strong>and</strong> a combinatorial library. The individual diversities of the databases<br />

were assessed, <strong>and</strong> also the changes in diversity that occurred when one<br />

database was merged with another. Interestingly, the union of two databases<br />

does not always result in an increase in diversity! For instance, the diversity of<br />

the Maybridge collection was found to decrease markedly when it was merged<br />

with a simple combinatorial library constructed from the condensation of 400<br />

primary amines <strong>and</strong> 400 carboxylic acids selected from the World Drug Index21<br />

(WDI) database. In other words, according to the metrics used, the molecules in<br />

the resulting database are more similar to each other than those just in<br />

Maybridge. Pickett et al.102 have adopted a similar kind of methodology but using a<br />

different descriptor, 3-D pharmacophores rather than 2-D bit strings. In this<br />

work a number of potential combinatorial libraries were compared, <strong>and</strong> the<br />

results were used to select the subset that added the most pharmacophore<br />

diversity in comparison to screening libraries previously synthesized.<br />

A rather different tack has been taken by Nilakantan et al.,181 who<br />

describe a method for comparing large chemical databases. Their approach<br />

relies on categorizing each database according to its ring system content, based<br />

on some earlier work.182 Each ring system in each molecule is assigned a hash<br />

code, <strong>and</strong> these codes are summed for each molecule to generate what the<br />

investigators term a ring-cluster hash code. By comparing the resulting hash<br />

codes for two databases, it is possible to gain some idea about how similar they<br />

are. Nilakantan et al. used this metric to compare a number of public databases<br />

[Cambridge Structural Database (CSD),183 ACD, WDI, <strong>and</strong> the National<br />

Cancer Institute (NCI-3-D)184 database] <strong>and</strong> discovered that the CSD has the<br />

richest collection of ring systems <strong>and</strong> ring clusters. The same paper presents a<br />

different method for the estimation of database diversity. The program DIVPIK<br />

simply tries to pick a certain number of dissimilar compounds from a database.<br />

Intuitively, the more diverse a database, the fewer attempted selections will be<br />

required. A measure of diversity can be gained by considering the ratio<br />

NTRIES/NPICK. Nilakantan et al. used this measure to demonstrate that the<br />

diversity of the four databases increased in the order WDI < ACD = NCI-3-D <<br />

CSD (essentially the same result obtained by a consideration of the ring cluster/<br />

system hash codes). The two independent methods thus serve to validate each<br />

other to some extent, although the DIVPIK method is significantly more com-<br />

putationally expensive in practice.<br />
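The NTRIES/NPICK idea can be mimicked with a simple rejection loop. The sketch below is not the DIVPIK program: the random proposal order, the Euclidean distance threshold, the stopping rule, and all names are assumptions introduced purely for illustration.

```python
import random
from math import dist


def tries_per_pick(points, n_pick, min_dist, max_tries=10_000, seed=0):
    """Randomly propose compounds; accept one only if it lies at least
    min_dist from every compound accepted so far.

    Returns (n_tries, n_picked). A low n_tries / n_picked ratio suggests a
    more diverse database.
    """
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)                      # propose without replacement
    picked, tries = [], 0
    for i in order:
        if len(picked) == n_pick or tries == max_tries:
            break
        tries += 1
        if all(dist(points[i], points[p]) >= min_dist for p in picked):
            picked.append(i)
    return tries, len(picked)


diverse = [(10.0 * i,) for i in range(20)]   # every compound far from the rest
redundant = [(0.0,)] * 20                    # twenty copies of one compound
print(tries_per_pick(diverse, 5, 5.0))
print(tries_per_pick(redundant, 2, 5.0))
```

In the diverse toy set every proposal is accepted, giving a ratio of 1, whereas the fully redundant set exhausts all 20 proposals for a single pick.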

We attempted a practical application of these ideas in a project to select<br />

1000 compounds from one agrochemical-biased corporate collection (CC1) to<br />

supplement the diversity of a representative pharmaceutical-biased screening



set (PSS) derived from another independent corporate collection. These<br />

experiments used the Chem-X105 pharmacophore key overlaps as the similarity<br />

metric. We found that we could achieve better results by using diversity analysis<br />

tools, but that prefiltering had a very important role to play (a sobering thought<br />

for those of us caught up in the mathematics of diversity analysis). The follow-<br />

ing filters were used:<br />

• Remove compounds containing potentially reactive or toxic groups.<br />

• Remove molecules with a molecular weight outside the range 200-600 Da.<br />

• Remove molecules with a ClogP value outside the 0-6 range.<br />

• Remove all molecules expressing a number of pharmacophores outside the range 1-1000.<br />

• Remove all molecules with more than 100,000 conformations.<br />

• Remove all instances of “near-duplicate” molecules. (This was achieved by taking each molecule in turn <strong>and</strong> removing all molecules with a Daylight fingerprint similarity > 0.95 to it.)<br />

• Remove compounds with heavy atom counts outside 20-45, excluding halogens.<br />

While the filters are fairly stringent, we did not expect them to remove 83% of<br />

the corporate collection! Use of the HARPick program185 (see below) increased<br />

the number of pharmacophores present in the selected set from around 13,000<br />

for the first r<strong>and</strong>om pick to 15,711 <strong>and</strong> increased the number of phar-<br />

macophores unique to the selected set (as compared to PSS) from 535 to 850.<br />
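The prefiltering stage described above can be expressed as a simple predicate plus a near-duplicate pass. In the sketch below the property values are assumed to be precomputed, all dictionary keys are hypothetical names, "excluding halogens" is read as "halogens are not counted as heavy atoms", and a Tanimoto coefficient on sets of bit positions stands in for the Daylight fingerprint similarity.

```python
def passes_filters(mol):
    """mol is a dict of precomputed properties (all key names hypothetical)."""
    heavy_non_halogen = mol["heavy_atoms"] - mol["halogens"]
    return (
        not mol["reactive_or_toxic"]
        and 200 <= mol["mol_weight"] <= 600
        and 0 <= mol["clogp"] <= 6
        and 1 <= mol["n_pharmacophores"] <= 1000
        and mol["n_conformations"] <= 100_000
        and 20 <= heavy_non_halogen <= 45
    )


def tanimoto(a, b):
    """Tanimoto coefficient between two non-empty sets of 'on' bit positions."""
    return len(a & b) / len(a | b)


def drop_near_duplicates(fingerprints, cutoff=0.95):
    """Keep a molecule only if no molecule kept earlier is more similar
    to it than the cutoff."""
    kept = []
    for name, bits in fingerprints.items():
        if all(tanimoto(bits, fingerprints[k]) <= cutoff for k in kept):
            kept.append(name)
    return kept


fps = {"m1": set(range(40)), "m2": set(range(40)) | {99}, "m3": {200, 201}}
print(drop_near_duplicates(fps))
```

Applying inexpensive property filters before the near-duplicate pass keeps the number of pairwise similarity calculations down, which matters when the filters remove most of the collection.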

Combinatorial Library Design<br />

The key task in library design, in which molecular diversity analysis can<br />

play a central role, is the selection of reagents. In general, these reagents will<br />

give rise to R groups attached to a conserved scaffold or template. The need for<br />

reagent selection arises because in many instances, the product of the number of<br />

available reagents at each variable position rapidly outstrips the synthetic<br />

capability of even high-throughput, robotic synthesis units. From arguments<br />

similar to those advanced in the preceding section, it is obviously sensible to<br />

choose a diverse subset of the available reagents at each position for general<br />

library design. In some instances, there will be additional information that can<br />

focus or constrain the design. We shall deal with these two scenarios separately.<br />

General Library Design<br />

Broadly speaking, there are three approaches to reagent selection. In<br />

reagent-based selection, a subset is chosen to maximize the diversity of the<br />

reagents at each position without considering the reagents at the other posi-<br />

tions, or the scaffold. A good example of such a method is that reported by the



Chiron group.75 Of course, almost any of the techniques for diverse subset<br />

selection may be applied to reagent-based selection of reagents. Alternatively, a<br />

product-based scheme can be envisaged, in which reagents are selected at all<br />

positions so that the diversity of the generated products is maximized. This type<br />

of approach has been championed by Gillet et al.186 <strong>and</strong> by Good <strong>and</strong> Lewis.185<br />

Finally, one may pick the most diverse set of products <strong>and</strong> then deconvolute to<br />

find the sets of reagents required to make that set. This kind of approach,<br />

sometimes called cherry-picking, is exemplified by the methods embodied in the<br />

ChemDiverse package.105<br />

There are some advantages <strong>and</strong> disadvantages to each of these ap-<br />

proaches, <strong>and</strong> each may be appropriate in certain design situations. In general,<br />

the cherry-picking approach will result in the most diverse set of products;<br />

however, this approach has the serious disadvantage of not resulting in a syn-<br />

thetically efficient combinatorial library. That is, it is likely to be necessary to<br />

synthesize a number of “unwanted” molecules in addition to the desired prod-<br />

ucts. Reagent-based selection is fast, since one is not considering the enumer-<br />

ated combinatorial products in the analysis, <strong>and</strong> thus this method may be<br />

suitable when the enumerated virtual library is very large. However,<br />

experiments by Gillet et al.186 have shown that a product-based reagent selection<br />

approach gives diversity superior to that obtainable from a reagent-based<br />

method. Van Drie <strong>and</strong> Lajiness report a similar experience.187 Balanced against<br />

this we note that most product-based schemes can deal only with enumerated<br />

libraries of the order of 100,000 molecules, a number that is easily attainable,<br />

particularly with more than two variable positions on the template. In practice,<br />

one is likely to need to combine the reagent-based <strong>and</strong> product-based ap-<br />

proaches. The reagent-based selection methods can be used to filter the initial<br />

reagent lists to a size at which the virtual library becomes tractable for analysis<br />

by a product-based method. This kind of hybrid approach has been used suc-<br />

cessfully by Good <strong>and</strong> Lewis in applying their HARPick program.185<br />
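The size argument behind the hybrid approach can be made concrete with a small sketch. The code below is illustrative only and is not taken from HARPick or any other program discussed here: the function names, the 100,000-product ceiling used as a default, and the assumption that each reagent list has already been ranked (best first) by some reagent-based criterion are all hypothetical.

```python
from math import prod


def virtual_library_size(reagent_lists):
    """Number of enumerated products: one reagent is chosen per variable position."""
    return prod(len(reagents) for reagents in reagent_lists)


def trim_to_tractable(reagent_lists, max_products=100_000):
    """Shrink reagent lists until the enumerated library fits within max_products.

    Assumption (hypothetical): every list is pre-sorted, best reagent first,
    by a reagent-based ranking, so the last entry is the least desirable one.
    """
    trimmed = [list(reagents) for reagents in reagent_lists]
    while virtual_library_size(trimmed) > max_products:
        longest = max(trimmed, key=len)
        if len(longest) <= 1:
            break  # nothing left to remove
        longest.pop()  # drop the lowest-ranked reagent from the longest list
    return trimmed


if __name__ == "__main__":
    lists = [range(200), range(150), range(80)]   # three variable positions
    print(virtual_library_size(lists))            # 200 * 150 * 80 products
    small = trim_to_tractable(lists)
    print([len(r) for r in small], virtual_library_size(small))
```

With three variable positions, even modest reagent lists multiply into millions of products, which is why a cheap reagent-level filter is applied before any product-based analysis in this sketch.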

We have already discussed the work of Chapman112 from the perspective of molecular descriptors. We will now look at it in terms of library design. Chapman computes diversity as the sum of all pairwise dissimilarities between the molecules in the set. A bias may be introduced to weight against excessive flexibility in the molecules by a function based on the number of rotatable bonds. A standard “greedy” algorithm that adds the molecule that will most increase the diversity of the current set of molecules is used to build up a library design. This implies a cherry-picking strategy. Even so, the diversity measure is still very computationally intensive, and at present this method can handle only libraries in the low thousands.
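A greedy build-up of this kind can be sketched in a few lines. The example below is a minimal illustration rather than Chapman's implementation: it swaps in a Tanimoto dissimilarity on sets of feature indices for the conformer-based dissimilarity discussed above, omits the flexibility bias, and seeds the selection with the first molecule. All of these choices are assumptions made for brevity.

```python
def tanimoto_dissimilarity(a, b):
    """1 - Tanimoto coefficient for two sets of 'on' feature indices."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def greedy_diverse_subset(fingerprints, k, dissimilarity=tanimoto_dissimilarity):
    """Pick k items; each step adds the item that most increases the
    sum of pairwise dissimilarities of the selected set."""
    n = len(fingerprints)
    if k <= 0 or n == 0:
        return []
    selected = [0]  # assumption: seed with the first item
    # gain[i] = sum of dissimilarities between candidate i and every selected
    # item, i.e. the increase in the diversity score if i were added next.
    gain = [dissimilarity(fingerprints[i], fingerprints[0]) for i in range(n)]
    while len(selected) < min(k, n):
        candidates = (i for i in range(n) if i not in selected)
        best = max(candidates, key=lambda i: gain[i])
        selected.append(best)
        for i in range(n):
            gain[i] += dissimilarity(fingerprints[i], fingerprints[best])
    return selected


if __name__ == "__main__":
    fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3}]
    print(greedy_diverse_subset(fps, 3))
```

Keeping a running gain for every candidate avoids recomputing the full pairwise sum at each step, but each addition still requires one dissimilarity evaluation per molecule, which is consistent with the cost concerns noted above when the dissimilarity itself is expensive.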

The nature of product-based library design lends itself naturally to the application of heuristic search methods such as simulated annealing188 and genetic algorithms.189 Several groups have published applications in the latter area, which has been recently reviewed.190-192 While all methods differ somewhat in their technical implementations of the different algorithms, by far the most important factor affecting the final choice of reagents is the scoring function. As always, there is a need to use descriptors pertinent to ligand-receptor interactions. The HARPick program of Good and Lewis185 uses a fitness function based on multipharmacophore molecular descriptors. Both simulated annealing and genetic algorithms have been studied.193 The scoring function in HARPick is very flexible and is made up from a weighted combination of the following terms: the number of pharmacophores expressed and their frequency, some crude shape measures, molecular flexibility, and the degree of match to the pharmacophore profile of a reference library. The method was tested by means of a variety of weighting combinations and libraries, and the results were compared with the data obtained with ChemDiverse,105 which, as mentioned earlier, uses a cherry-picking strategy. Both ChemDiverse and HARPick were able to improve considerably molecular selection based on pharmacophore count, compared to random selections, but HARPick calculations, which were set to purely maximize pharmacophore diversity, were able to find around twice the number of pharmacophores obtained by the comparable ChemDiverse runs. As expected, however, the molecules chosen were substantially more flexible and “promiscuous.” Inclusion of the “quality” terms (which penalize undesirable characteristics such as excessive conformational flexibility in the library members) reduced the pharmacophore scores of the final selections but not drastically (still better than random). As one might expect, selections made at random or via ChemDiverse gave sets of molecules that broadly followed the distribution of properties (such as the number of rotatable bonds in a molecule) observed in the whole Standard Drugs File (now known as the World Drug Index21). HARPick managed to produce a much more even distribution. In another evaluation of HARPick reported in Ref. 185, the program outperformed random selections from the perspective of filling diversity voids in a reference library. Given our remarks about the difficulties in measuring general diversity, this is probably the best way in which such selection methods should be applied.

The primary feature emphasized by the calculations above is the control afforded to the user over both the components of the scoring function and the weights applied to them. In principle, any descriptor could be applied to the scoring functions. One could envisage maximizing functions (e.g., 3-D pharmacophore or 2-D fingerprint coverage, reagent supplier reliability), minimizing functions (e.g., cost per reagent), partition functions (e.g., general shape, ClogP), and bounding functions (assigning a score of zero to products with properties outside specified bounds, e.g., minimum/maximum ClogP). In principle, a totally customizable scoring function could be devised, with the user able to choose the properties included in the scoring routine, and the functions used on them. Similar ideas are envisaged by Agrafiotis169 and have been implemented by groups at various pharmaceutical companies. With careful application of user weightings for each component function, the result would be a totally flexible profiling paradigm.
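One way such a customizable scoring function might be organized is sketched below. This is a hypothetical design, not the HARPick scoring function or any published implementation: the `Term` class, the helper names, and the property keys (`clogp`, `cost`) are invented for illustration, and each product is assumed to carry its properties as precomputed numbers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Library = List[Dict[str, float]]  # each product: a dict of precomputed properties


@dataclass
class Term:
    name: str
    weight: float                     # user-chosen; negative to minimize a term
    func: Callable[[Library], float]  # maps a candidate library to a raw score


def within_bounds(prop, low, high):
    """Bounding function: products outside [low, high] score 0, others 1;
    the term returns the mean over the library."""
    def f(library):
        if not library:
            return 0.0
        return sum(1 for p in library if low <= p[prop] <= high) / len(library)
    return f


def mean_of(prop):
    """Mean value of a property, usable as a maximizing or minimizing term."""
    def f(library):
        if not library:
            return 0.0
        return sum(p[prop] for p in library) / len(library)
    return f


def score(library, terms):
    """Weighted combination of all component functions."""
    return sum(t.weight * t.func(library) for t in terms)


if __name__ == "__main__":
    candidate = [{"clogp": 2.0, "cost": 10.0}, {"clogp": 6.0, "cost": 30.0}]
    terms = [
        Term("clogp_in_bounds", 1.0, within_bounds("clogp", 0.0, 5.0)),
        Term("mean_cost", -0.01, mean_of("cost")),  # negative weight: minimize
    ]
    print(score(candidate, terms))
```

Because every component is just a function of the candidate library, new descriptors can be added without touching the search algorithm; the difficult part, as noted above, is choosing the weights.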



Gillet et al.194 have recently reported on the SELECT program, which is similar in philosophy to HARPick but uses a genetic algorithm rather than simulated annealing. A product-based program, SELECT utilizes the Daylight structural fingerprints to optimize either the sum of dissimilarities or the average nearest-neighbor distance of selected compounds. Interestingly, the program can also select the best configuration for a multicomponent library. Because of the nature of the descriptors used, the program can be applied to virtual libraries of hundreds of thousands of products. Additional terms in the scoring function allow libraries to be designed with respect to an external reference and to have an appropriate spread of physicochemical properties.
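The two objective functions named above can be illustrated on a toy combinatorial sublibrary. The sketch below is not the SELECT code and does not use the Daylight toolkit: fingerprints are represented as plain Python sets of feature indices, and a product fingerprint is taken to be the union of its reagents' features, which is a deliberate simplification. Only the fitness evaluation is shown; the genetic algorithm that would vary the chosen reagents is omitted.

```python
from itertools import combinations, product


def tanimoto_distance(a, b):
    """1 - Tanimoto coefficient of two feature sets."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def enumerate_products(reagent_fps, chosen):
    """Toy product fingerprints for a combinatorial sublibrary.

    reagent_fps[pos][r] is the feature set of reagent r at position pos;
    chosen[pos] lists the reagent indices selected at that position.
    Assumption: a product fingerprint is the union of its reagents' features.
    """
    pools = [[reagent_fps[pos][r] for r in idxs] for pos, idxs in enumerate(chosen)]
    return [frozenset().union(*combo) for combo in product(*pools)]


def sum_of_dissimilarities(fps):
    return sum(tanimoto_distance(a, b) for a, b in combinations(fps, 2))


def mean_nearest_neighbor_distance(fps):
    if len(fps) < 2:
        return 0.0
    nearest = [
        min(tanimoto_distance(a, b) for j, b in enumerate(fps) if j != i)
        for i, a in enumerate(fps)
    ]
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    reagent_fps = [[{1}, {2}], [{10}, {11}]]   # two positions, two reagents each
    products = enumerate_products(reagent_fps, chosen=[[0, 1], [0, 1]])
    print(len(products))
    print(sum_of_dissimilarities(products))
    print(mean_nearest_neighbor_distance(products))
```

The two measures reward different things: the sum of dissimilarities can remain high for a set containing near-duplicates as long as other members are far apart, whereas the mean nearest-neighbor distance falls whenever any compound has a close neighbor in the set.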

Constrained/Focused/Biased Library Design

In designing a library, it is of paramount importance to take account of all the available information. A general library design assumes no particular prior knowledge, but in many cases, there will be information that can be used. For instance, it might be desirable to bias a library away from a previous collection or library, or toward a set of compounds known to be active. In one case,195 Sheridan and Kearsley constrained their design to select tripeptoids similar to two tetrapeptide cholecystokinin (CCK) antagonists. In a second example, scoring was based on an angiotensin converting enzyme (ACE) “trend vector” summarizing the chemical features shared by known ACE inhibitors that differ from those of a general population of druglike molecules.195 Similar work has been reported by Cho et al. with their FOCUS-2-D method.196 Good and Lewis have shown how the HARPick program can be used in this context, selecting a set of reagents such that the generated products would fill diversity voids in the space occupied by the Standard Drugs File.185

In related work, Pickett197 has used a genetic algorithm whose objective function was the overlap in pharmacophores between one or more lead compounds and members of the proposed library. In the context of an ongoing medicinal chemistry program, Brown et al.198 have described the design of libraries biased toward the family of peroxisome proliferator-activated receptors (PPARs). In this instance, a phenoxybutyric acid group (present in known PPAR ligands) was incorporated as a “privileged” fragment at one diversity position. At the other two variable positions, molecular weight and synthetic considerations were used to filter reagents before subjecting them to an experimental design procedure to select a diverse set at each point. Deconvolution of the resulting library led to the identification of GW 2433 (Figure 7) as the first high affinity PPARδ ligand.

The most exciting situation, however, is where there is information concerning the structure of the receptor site that is being targeted. In this case, structure-based design and combinatorial chemistry can combine synergistically to give enormous benefits.199,200 The structural information provides a strong constraint for reagent selection, while combinatorial library design ensures the rapid provision of synthetically accessible compounds, thus overcoming a debilitating bottleneck in de novo/structure-based drug design.201,202 There is a growing number of published examples of structure-based library design (see, e.g., Refs. 119 and 203-214). Perhaps the most compelling example is that of Kick et al.118 In this work, the active site of cathepsin D was used to constrain the selection of reagents at four variable positions on a scaffold based on a known inhibitor, pepstatin. The resulting library (1000 compounds) yielded a hit rate of 6-7% when screened at 1 µM, with 7 compounds being active at 100 nM or less. The information gained from this initial library was used to design and synthesize a follow-up library yielding inhibitors in the range 9-15 nM. As a control, Kick et al. also designed a general, diverse library (also 1000 compounds) using 2-D similarity measures for screening against the enzyme. This library produced a hit rate of 2-3% at 1 µM, with only one compound being active at 100 nM. From this example, the incorporation of structural information into the library design can be seen to be extremely valuable. A similar method for structure-based library design, called PRO-SELECT, has been reported by Murray and coworkers.115 This program was used to design inhibitors of thrombin based around a scaffold from a known covalent inhibitor, PPACK (D-Phe-Pro-Arg-chloromethylketone). About half the designed molecules were found to have micromolar activity, the best being a close PPACK analog (D-Phe-Pro-agmatine) which showed an inhibitory concentration (IC50) of 40 nM. Thrombin also provided the target for the structure-based combinatorial library design described by Graybill et al.,215 although few computational details are given.

Figure 7 Identification of GW 2433. The biased library comprised a biasing fibrate monomer at R1. R2 and R3, derived from carboxylic acids and isocyanates, were chosen for diversity by means of experimental design techniques.

DIVERSITY IS NOT THE BE-ALL AND END-ALL!

In all work on the selection of compounds or reagents by means of molecular diversity techniques, it is vital not to lose sight of other considerations.216 As Higgs et al. put it: “compounds must not be so diverse as to be pharmaceutically unreasonable.”166 In their early work with a maximal dissimilarity selection algorithm, Higgs et al. found that nearly all the compounds selected were deemed pharmaceutically unreasonable by medicinal chemists. They thus implemented a series of rules based on substructural queries, molecular weight, and ClogP cutoffs, which they use to assign “demerits” to compounds. If any compound gains too many demerits, it is rejected, a fate that may be suffered by up to half of the molecules initially selected! The fact that 90% of the molecules in the CMC database (i.e., known drugs) caused one or more of the rules to fire underlines the need not to be too zealous in rejecting compounds with only one poor feature.

In a similar vein, Lewis et al.74 describe a series of substructural filters applied during the creation of the diverse property-derived sets. These rules are designed to eliminate molecules containing toxic or very reactive substructures such as reactive epoxides, acyclic aminals or acid anhydrides.217 Also rejected are other molecules that exhibit a wide range of biological activities (e.g., prostaglandins, prostacyclins, or thromboxanes) and are thus unsuitable for general screening. A similar “badlist” was developed by Lajiness at Pharmacia and Upjohn.145 More recently, at RPR, we have implemented a set of alerting rules for compounds that contain chromophores that absorb in the range above 300 nm. Such compounds may interfere with certain assays and thereby reduce the accuracy of high-throughput screening (HTS) data.

With increasing importance being attached to the early detection of compounds likely to be problematic from an absorption, distribution, metabolism, and excretion (ADME) viewpoint,218-221 at RPR we sought to apply computational measures for the prediction of intestinal absorption, a key requirement for an orally bioavailable compound, during the design of lead optimization libraries. To this end, we implemented the popular “rule-of-5” criteria described by Lipinski et al.222 A compound is deemed to fail the rule-of-5 check (and thereby to be possibly deficient from an oral absorption/permeability aspect) if it possesses two or more of the following features:

• more than 5 hydrogen bond donors (i.e., N-H or O-H bonds)

• more than 10 hydrogen bond acceptors (i.e., any N or O, including those in donors)

• a ClogP value of greater than 5.0 (or an MlogP223 value > 4.15)

• a molecular weight of greater than 500.0
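The check as stated above reduces to counting violations. The following sketch assumes that the four properties have already been computed by some other tool; the function name and argument names are hypothetical, and the MlogP alternative is left out for brevity.

```python
def fails_rule_of_5(h_donors, h_acceptors, clogp, mol_weight):
    """Return True if two or more of the four criteria are violated.

    Inputs are assumed to be precomputed: counts of hydrogen bond donors and
    acceptors, a ClogP value, and the molecular weight.
    """
    violations = sum([
        h_donors > 5,
        h_acceptors > 10,
        clogp > 5.0,
        mol_weight > 500.0,
    ])
    return violations >= 2


if __name__ == "__main__":
    print(fails_rule_of_5(h_donors=6, h_acceptors=11, clogp=3.0, mol_weight=450.0))
    print(fails_rule_of_5(h_donors=2, h_acceptors=5, clogp=5.5, mol_weight=300.0))
```

A compound with a single violation passes, which matches the caution expressed earlier about rejecting compounds with only one poor feature.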

At RPR we also developed computational alerts based on the work of Palm et al.224-226 and Winiwarter and coworkers.227 Both these groups demonstrated a strong correlation between polar molecular surface area (PSA) and human intestinal absorption. Of particular interest is the observation that molecules with a PSA of greater than 140 Å² are likely to show poor (< 10%) fractional absorption. Our own research has confirmed this observation, and we have extended the methods to develop a QSAR model for predicting blood-brain barrier penetration.228,229 Our implementation of the polar surface area calculations is sufficiently rapid to allow the profiling of large (virtual) compound collections on a routine basis. This permits the inclusion of ADME-related parameters in the process of product-based reagent selection.142 In this way, we can attempt to ensure that the library compounds will have good pharmacokinetic properties, thus facilitating the hit-to-lead transition.

CURRENT ISSUES AND FUTURE DIRECTIONS

In a field that is far from mature, there are necessarily many issues to be addressed and myriad possible future directions that research must explore.18 Here, we highlight a few of the current issues/debates in the field and suggest possible avenues for future work. We have touched on several issues above, and the reader is also directed to the reviews by Martin230 and Mason and Hermsmeier.231

Diversity Descriptors

There are many issues surrounding the way that “diversity space” is described. As we have mentioned, the popular 2-D bit string or fingerprint descriptors were originally designed for 2-D substructure-searching applications, and it remains unclear whether these are truly optimal for diversity calculations.70 The debate that has raged over 2-D versus 3-D descriptors has, perhaps, generated more heat than light. It is likely that each type of descriptor has its place in the process of diversity analysis and library design, but a consensus on this matter has yet to be reached. Nonetheless, it would appear that several groups are trying three-dimensional measures of diversity which more accurately reflect ligand-receptor interactions. Unfortunately, this leads to increased computational effort, limits in the description of conformational space (e.g., neglect of solvent effects in most cases), and the need for tailored diversity measures.

In terms of 3-D descriptors, there remains the need for a useful, computationally expedient descriptor of molecular shape. Another question is whether complementary site points should be included in 3-D descriptors, as advocated by some workers.230,232 Can molecular field information be included in 3-D descriptors in a manner similar to the way it has been incorporated into experimental 3-D similarity searching systems?233 How should tautomeric and ionization states be handled? These are all questions worthy of future research.

With both 2-D and 3-D descriptors, the thorny issue of how to validate descriptors remains open. It is clear that we would like to have descriptors that relate better to biological activity,230 but proving that this is indeed the case for a given descriptor is a task fraught with difficulties. A key issue in descriptor validation is how to define a reference set that is meant to typify the universal set of actives, and possibly inactives. One approach has been to use the World Drug Index21 to define the set of active compounds and the Spresi database130 to define the inactives. The WDI must be used carefully and selectively because it contains many classes that are inappropriate (e.g., disinfectants, dentifrices). The next question is, How valid is it to compare central nervous system (CNS) drugs with topical steroids with anticancer drugs? The danger is that the analysis will tend to produce the lowest common denominator (like the rule of 5),222 rather than a stunning insight into molecular diversity. There is also the issue of reverse sampling: How valid is it to deduce the properties of the universal set of biologically active molecules from a subset? The properties of previous drugs may have been driven mainly by bioavailability, or toward making analogs of a natural substrate. Using these data forces an unnatural conservatism into our diversity models.

It is also interesting to reflect on what is meant by activity and inactivity. Any molecule will bind to any receptor, although the affinity may have any value between picomolar and gigamolar. If the binding event is viewed in terms of molecular interactions, then interesting, specific binding can be characterized by affinity constants lower than 1000 nM. However, it is not uncommon to find affinity constants of 1000 nM that are mainly due to solvophobic interactions forcing the ligand to associate with the receptor (particularly for hydrophobic compounds like steroids). At 100 nM, some specific noncovalent interactions are being formed, and at levels below 10 nM, there are many highly specific interactions present. It should be clear that the activity is a continuous phenomenon, and that drawing an arbitrary division is a hazardous ploy. Furthermore, while one can be fairly sure why a compound is active, it is much harder to say precisely why a compound is inactive. Was it the wrong pharmacophore, a steric bump, poor solubility, metabolic alteration, or something else? Despite all these caveats, several research groups have followed such an approach and claim to be able to distinguish a potential active from a potential inactive, with reasonable confidence. Such results cannot be ignored, and they will be of use in the early phases of library design, where the basic feasibility of the library and the reaction are being considered.
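The continuous nature of activity can be seen by converting the affinity thresholds mentioned above into binding free energies with the standard relation ΔG = RT ln Kd. The sketch below is an illustration added for this purpose; it assumes that the quoted affinity constants can be read as dissociation constants in mol/L and that the temperature is 298.15 K, neither of which is stated in the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Assumption: kd_molar is a dissociation constant expressed in mol/L.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)


if __name__ == "__main__":
    for label, kd in [("1000 nM", 1e-6), ("100 nM", 1e-7), ("10 nM", 1e-8)]:
        print(label, round(binding_free_energy(kd), 2), "kcal/mol")
```

Each tenfold improvement in affinity corresponds to the same increment of roughly 1.4 kcal/mol at room temperature, so the 1000 nM, 100 nM, and 10 nM levels are evenly spaced points on a smooth scale rather than natural class boundaries.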

The realization that “mere diversity”216 is not sufficient in practical library design has driven much recent work in the direction of biasing design toward compounds with more “druglike” properties. The challenge here is defining the term “druglike.” Several groups have attempted to tackle this problem,136,234-236 but some of the arguments used earlier (see section on Validation of Descriptors) also apply here. How can the non-druglike space be adequately defined? Physical properties or other measures such as polar surface area can be included in the design, but how should these be weighted with respect to diversity? Should compounds falling outside the bounds simply be excluded from further consideration? If such hard cutoffs are applied, it is not always possible to identify a truly combinatorial subset of a virtual library. Pickett142 has implemented a simulated annealing procedure that attempts to find the solution closest to a true combinatorial subset within a number of user-defined constraints.

As a final note in this section, several years ago Martin230 suggested a competition (similar to the CASP competition for protein structure prediction237) for assessing descriptors. This would presumably involve the computation of the diversity of a defined library by several different research teams, each using its own favored approaches. The results of each team would then be compared to some pre-agreed experimental determination of diversity. This would be interesting if it could ever be arranged!

Library Design

In terms of sampling diversity space, it would seem that stochastic selection algorithms are becoming popular for combinatorial library design. Advances in technology now allow many robots to handle noncombinatorial libraries, but reagent cost remains a big issue. It is possible to include cost within the selection process, but again this has to be carefully balanced with diversity (or similarity in a focused library). Product-based reagent selection would seem to be demonstrably superior to reagent-based approaches186 but, depending on the type of descriptors used, may still be problematic in terms of CPU time for very large libraries. Thus, from a practical point of view, a two-step process of reagent selection may constitute a workable compromise, with an initial reagent-based filtering step preceding the full product-based selection.

The area of structure-based library design is one that promises much in the coming years. Currently, most reported approaches use the approximation of a fixed scaffold in the site (see, e.g., Refs. 115 and 118). This could be overcome by allowing some limited relaxation or docking after the attachment of each combination of R groups. Of crucial importance is the continuing search for better binding affinity prediction algorithms.230 Approaches to this problem range from empirical scoring functions117,238,239 to more detailed treatments based on Monte Carlo240 or molecular dynamics241 simulations to full free energy perturbation methods.242 In realistic terms, it is likely that only empirical approaches will be applicable to library design in the near future. But continuing theoretical and methodological improvements, coupled with the increases in computer speed combined with parallelization, should eventually lead to improved structure-based designs.

Finally, even in cases where we may be able to show that our designed libraries are “better” than random, how close are they to being optimal? To answer this question, we need to have an external definition of optimality, which does not exist at present. What is required are accurate screening results on a large library, from which we try to select a sublibrary. It should be noted that the optimality test will be valid only for that library and that set of screening data.

Speed Requirement

As we mentioned earlier, the time that is available for each diversity task will likely depend on the nature of the task. Reagent selection may need to be done in a hurry, whereas compound acquisition studies may be afforded rather more time. In the former case, it is clear that the computer time required for diversity analysis/library design must not exceed that available (possibly only days if the library chemistry is already developed, longer if the chemistry is new). For many product-based reagent selection approaches, CPU time is at present a very real obstacle to what might be done. It is to be hoped that more efficient algorithms and exploitation of parallel computation techniques will help alleviate the current difficulties. More fundamentally, the development of approaches based on Markush representations may offer a solution in instances where only simple 2-D descriptors are employed.243

“Quick and Dirty” QSAR

The process of library design is an iterative rather than a “one-off” procedure. Once the first library has been assayed, the next question is, What to make next? In the modern pharmaceutical discovery milieu, the computational chemist needs to answer this question quickly to have an effective input in selecting the next synthetic targets. Clearly, there is a requirement for quantitative structure-activity relationships and other data-mining techniques to extract relationships from the HTS data resulting from large libraries. Martin230 suggests that QSAR techniques need to be able to handle 10⁵ compounds rather than the relatively small data sets (ca. 10²) usually studied at present. Methods are also required to cope with noisy, incomplete, or binary (results simply expressed as “+” or “-”) biological activity data. Hence the expression “quick and dirty QSAR” has come into use. Some approaches to these problems are being reported,244,245 and it is possible that fuzzy methods may also have a part to play. Certainly, there is much room for further research in this area.

Integration with Other Modeling Tools

A further issue is how to link diversity tools effectively with extant modeling programs. For instance, if a partitioning scheme were being used for analyzing diversity space, it might be possible to use de novo design techniques to suggest compounds to fill currently empty cells.18,230 Indeed, Pearlman246 is working on a program called EAInventor to do just this in conjunction with his DiverseSolutions247 package.


Persuading the Customers


Last but not at all least, there is the issue of getting buy-in from the medicinal chemists. It is not always easy to convince those tasked with library synthesis of the benefits of computational reagent selection. Many still prefer to stick with their experience and intuition as to “what will work.” Of course, this accumulated wisdom should not be ignored and, in practice, a compromise between human and computer selection may be the best way forward. Yet nothing succeeds like success, and it has already been demonstrated at various pharmaceutical companies that the adoption of library design will accelerate when it is associated with the discovery of novel leads at a rate far faster than that which can be simply explained away by its detractors. The analogous situation existed a few years ago in the field of structure-based drug design, which really took off only after the publication of potent new leads, particularly by groups working on HIV-1 protease.47

CONCLUSIONS

The term “diversity” is hard to define conceptually. In a practical sense, diversity analysis is a design strategy that attempts to maximize the hit rate of HTS experiments, and validation should be in terms of this goal. It is important to maintain a pragmatic approach187: “diversity” is not the be-all and end-all. This is especially so when one is designing structure-based libraries, where diversity is perhaps only a weak contributor to a good design. The best selection is likely to be neither arbitrary nor maximally diverse.14

Finally, we reemphasize that this research area is still young: developments are occurring rapidly, driven by other new technologies in drug discovery research. This chapter represents a personal snapshot taken by the authors. “It is impossible to predict the contents of an article written in 10 years on the subject of molecular diversity.”230

ACKNOWLEDGMENTS

We thank our colleagues, past and present, for their help and insights in the field of molecular diversity and combinatorial library design. In particular, we acknowledge the contributions of present and past coworkers at Rhône-Poulenc Rorer (Aventis): Iain McLay (now at Glaxo Wellcome), Paul Menard, Claude Luttmann, Isabelle Morize, Jon Mason, and Andrew Good (the last two now at Bristol-Myers Squibb).

REFERENCES<br />

1. B. Merrifield, J. Am. Chem. Soc., 85, 2149 (1963). Solid Phase Peptide Synthesis. I. The<br />

Synthesis of a Tetrapeptide.


2. C. Desai, R. N. Zuckermann, <strong>and</strong> W. H. Moos, Drug Dev. Res., 33, 174 (1994). Recent<br />

Advances in the Generation of Chemical <strong>Diversity</strong> Libraries.<br />

3. M. Geysen, S. Barteling, and R. Moelen, Proc. Natl. Acad. Sci. USA, 81, 3998 (1984). Use of<br />

Peptide Synthesis to Probe Viral Antigens for Epitopes to a Resolution of a Single Amino<br />

Acid.<br />

4. R. A. Houghten, Proc. Natl. Acad. Sci. USA, 82, 5131 (1985). General Method for the Rapid<br />

Solid-Phase Synthesis of Large Numbers of Peptides: Specificity of Antigen-Antibody Inter-<br />

action at the Level of Individual Amino Acids.<br />

5. K. S. Lam, S. E. Salmon, E. M. Hersh, V. J. Hruby, W. M. Kazmierski, <strong>and</strong> R. J. Knapp,<br />

Nature, 354, 82 (1991). A New Type of Synthetic Peptide Library for Identifying Ligand-<br />

Binding Activity.<br />

6. L. A. Thompson and J. A. Ellman, Chem. Rev., 96, 555 (1996). Synthesis and Applications of<br />

Small Molecule Libraries.<br />

7. E. M. Gordon, M. A. Gallop, and D. V. Patel, Acc. Chem. Res., 29, 144 (1996). Strategy and<br />

Tactics in Combinatorial Organic Synthesis. Applications to Drug Discovery.<br />

8. F. Balkenhohl, C. von dem Bussche-Huennefeld, A. Lansky, and C. Zechel, Angew. Chem.<br />

Int. Ed. Engl., 35, 2288 (1996). Combinatorial Synthesis of Small Organic Molecules.<br />

9. E. R. Felder <strong>and</strong> D. Poppinger, Adv. Drug Res., 30, 111 (1997). Combinatorial Compound<br />

Libraries for Enhanced Drug Discovery Approaches.<br />

10. D. Brown, Mol. <strong>Diversity</strong>, 2, 217 (1997). Future Pathways for Combinatorial Chemistry.<br />

11. P. L. Myers, Curr. Opin. Biotechnol., 8, 701 (1997). Will Combinatorial Chemistry Deliver<br />

Real Medicines?<br />

12. R. E. Dolle, Mol. <strong>Diversity</strong>, 3, 199 (1998). Comprehensive Survey of Chemical Libraries<br />

Yielding Enzyme Inhibitors, Receptor Agonists <strong>and</strong> Antagonists, <strong>and</strong> Other Biologically<br />

Active Agents: 1992 Through 1997.<br />

13. J.-L. Fauchere, J. A. Boutin, J.-M. Henlin, N. Kucharczyk, <strong>and</strong> J.-C. Ortuno, Chemom. Intell.<br />

Lab. Syst., 43 (1,2), 43 (1998). Combinatorial Chemistry for the Generation of <strong>Molecular</strong><br />

<strong>Diversity</strong> <strong>and</strong> the Discovery of Bioactive Leads.<br />

14. J. M. Blaney and E. J. Martin, Curr. Opin. Chem. Biol., 1, 54 (1997). Computational<br />

Approaches for Combinatorial Library Design <strong>and</strong> <strong>Molecular</strong> <strong>Diversity</strong> <strong>Analysis</strong>.<br />

15. E. J. Martin, D. C. Spellmeyer, R. E. Critchlow Jr., <strong>and</strong> J. M. Blaney, in Reviews in Computa-<br />

tional Chemistry, K. B. Lipkowitz <strong>and</strong> D. B. Boyd, Eds., VCH Publishers, New York, 1997,<br />

Vol. 10, pp. 75-100. Does Combinatorial Chemistry Obviate <strong>Computer</strong>-<strong>Aided</strong> Drug<br />

Design?<br />

16. M. G. Bures <strong>and</strong> Y. C. Martin, Curr. Opin. Chem. Biol., 2, 376 (1998). Computational<br />

Methods in <strong>Molecular</strong> <strong>Diversity</strong> <strong>and</strong> Combinatorial Chemistry.<br />

17. D. K. Agrafiotis, J. C. Myslik, <strong>and</strong> F. R. Salemme, Mol. <strong>Diversity</strong>, 4, 1 (1999). Advances in<br />

<strong>Diversity</strong> Profiling <strong>and</strong> Combinatorial Series Design.<br />

18. P. Willett, Perspect. Drug Discovery Des., 7/8, 1 (1997). Computational Tools for the Analysis<br />

of Molecular Diversity. For more recent material, see: D. K. Agrafiotis and E. J. Martin,<br />

J. Mol. Graphics Modell., 18, (3/4), in press (2000). Combinatorial Library Design.<br />

19. H. Kubinyi, Perspect. Drug Discovery Des., 9/10/11, 225 (1998). Similarity <strong>and</strong><br />

Dissimilarity: A Medicinal Chemist's View.<br />

20. G. Sello, J. Chem. Inf. Comput. Sci., 38, 691 (1998). Similarity Measures: Is It Possible to<br />

Compare Dissimilar Structures?<br />

21. World Drug Index. Derwent Information, http://www.derwent.com/.<br />

22. E. J. Martin, R. E. Critchlow Jr., D. C. Spellmeyer, S. Rosenberg, K. L. Spear, and J. M.<br />

Blaney, Pharmacochem. Libr., 29, 133 (1998). Diverse Approaches to Combinatorial Library<br />

Design.<br />

23. R. S. Bohacek, C. McMartin, <strong>and</strong> W. C. Guida, Med. Res. Rev., 16, 3 (1996). The Art <strong>and</strong><br />

Practice of Structure-Based Drug Design.<br />

24. H. Kubinyi, Curr. Opin. Drug Discovery Dev., 1, 4 (1998). Structure-Based Design of En-<br />

zyme Inhibitors <strong>and</strong> Receptor Lig<strong>and</strong>s.


25. P. M. Dean, Molecular Foundations of Drug-Receptor Interaction, Cambridge University<br />

Press, Cambridge, 1987.<br />

26. W. P. Jencks, in Chemical Recognition in Biology, F. Chapeville and A.-L. Haenni, Eds.,<br />

Springer-Verlag, Berlin, 1980, pp. 3-25. What Everyone Wanted to Know About Tight<br />

Binding and Catalysis, But Never Thought of Asking.<br />

27. H.-J. Bohm and G. Klebe, Angew. Chem. Int. Ed. Engl., 35, 2588 (1996). What Can We<br />

Learn from Molecular Recognition in Protein-Ligand Complexes for the Design of New<br />

Drugs?<br />

28. R. L. Babine and S. L. Bender, Chem. Rev., 97, 1359 (1997). Molecular Recognition of<br />

Protein-Ligand Complexes: Application to Drug Design.<br />

29. G. Klebe and H.-J. Bohm, J. Recept. Signal. Transduction Res., 17, 459 (1997). Energetic and<br />

Entropic Factors Determining Binding Affinity in Protein-Ligand Complexes.<br />

30. D. H. Williams, Chem. Soc. Rev., 28, 57 (1998). Aspects of Weak Interactions.<br />

31. J. R. H. Tame, J. Comput.-Aided Mol. Des., 13, 99 (1999). Scoring Functions: A View from<br />

the Bench.<br />

32. A. R. Fersht, J.-P. Shi, J. Knill-Jones, D. M. Lowe, A. J. Wilkinson, D. M. Blow, P. Brick, P.<br />

Carter, M. M. Y. Waye, and G. Winter, Nature, 314, 235 (1985). Hydrogen Bonding and<br />

Biological Specificity Analyzed by Protein Engineering.<br />

33. A. Horovitz, L. Serrano, B. Avron, M. Bycroft, and A. R. Fersht, J. Mol. Biol., 216, 1031<br />

(1990). Strength and Cooperativity of Contributions of Surface Salt Bridges to Protein<br />

Stability.<br />

34. A. J. Doig and D. H. Williams, J. Am. Chem. Soc., 114, 338 (1992). Binding Energy of an<br />

Amide-Amide Hydrogen Bond in Aqueous and Nonpolar Solvents.<br />

35. P. L. Chau and P. M. Dean, J. Comput.-Aided Mol. Des., 8, 513 (1994). Electrostatic<br />

Complementarity Between Proteins and Ligands. 1. Charge Disposition, Dielectric and<br />

Interface Effects.<br />

36. P. L. Chau and P. M. Dean, J. Comput.-Aided Mol. Des., 8, 527 (1994). Electrostatic<br />

Complementarity Between Proteins and Ligands. 2. Ligand Moieties.<br />

37. P. L. Chau and P. M. Dean, J. Comput.-Aided Mol. Des., 8, 545 (1994). Electrostatic<br />

Complementarity Between Proteins and Ligands. 3. Structural Basis.<br />

38. D. Eisenberg and A. D. McLachlan, Nature, 319, 199 (1986). Solvation Energy in Protein<br />

Folding and Binding.<br />

39. A. Ben-Naim, Hydrophobic Interactions, Plenum Press, New York, 1980.<br />

40. D. G. Alberg and S. L. Schreiber, Science, 262, 248 (1993). Structure-Based Design of a<br />

Cyclophilin-Calcineurin Bridging Ligand.<br />

41. A. R. Khan, J. C. Parrish, M. E. Fraser, W. W. Smith, P. A. Bartlett, and M. N. G. James,<br />

Biochemistry, 37, 16839 (1998). Lowering of the Entropic Barrier for Binding Conformationally<br />

Flexible Inhibitors to Enzymes.<br />

42. B. J. Stockman, Prog. Nucl. Magn. Reson. Spectrosc., 33, 109 (1998). NMR Spectroscopy as<br />

a Tool for Structure-Based Drug Design.<br />

43. J. T. Stivers, C. Abeygunawardana, A. S. Mildvan, and C. P. Whitman, Biochemistry, 35,<br />

16036 (1996). 15N NMR Relaxation Studies of Free and Inhibitor-Bound 4-Oxalocrotonate<br />

Tautomerase: Backbone Dynamics and Entropy Changes of an Enzyme upon Inhibitor<br />

Binding.<br />

44. L. K. Nicholson, T. Yamazaki, D. A. Torchia, S. Grzesiek, A. Bax, S. J. Stahl, J. D. Kaufman,<br />

P. T. Wingfield, P. Y. S. Lam, P. K. Jadhav, C. N. Hodge, P. J. Domaille, and C.-H. Chang,<br />

Nat. Struct. Biol., 2, 274 (1995). Flexibility and Function in HIV-1 Protease.<br />

45. X. Leng, S. Y. Tsai, B. W. O'Malley, and M. J. Tsai, J. Steroid Biochem. Mol. Biol., 46, 643<br />

(1993). Ligand-Dependent Conformational Changes in Thyroid Hormone and Retinoic<br />

Acid Receptors Are Potentially Enhanced by Heterodimerization with Retinoic X Receptor.<br />

46. A. M. Davis and S. J. Teague, Angew. Chem. Int. Ed. Engl., 38, 736 (1999). Hydrogen<br />

Bonding, Hydrophobic Interactions, and Failure of the Rigid Receptor Hypothesis.


47. A. Wlodawer <strong>and</strong> J. Vondrasek, Annu. Rev. Biophys. Biomol. Struct., 27, 249 (1998).<br />

Inhibitors of HIV-1 Protease: A Major Success of Structure-Assisted Drug Design.<br />

48. A. R. Leach, J. Mol. Biol., 235, 345 (1994). Lig<strong>and</strong> Docking to Proteins with Discrete<br />

Sidechain Flexibility.<br />

49. G. Jones, P. Willett, and R. C. Glen, J. Mol. Biol., 245, 43 (1995). Molecular Recognition of<br />

Receptor Sites Using a Genetic Algorithm with a Description of Desolvation.<br />

50. V. Schnecke, C. A. Swanson, E. D. Getzoff, J. A. Tainer, <strong>and</strong> L. A. Kuhn, Proteins: Struct.,<br />

Funct., Genet., 33, 74 (1998). Screening a Peptidyl Database for Potential Lig<strong>and</strong>s to Pro-<br />

teins with Side-Chain Flexibility.<br />

51. B. Sandak, R. Nussinov, and H. J. Wolfson, J. Comput. Biol., 5, 631 (1998). A Method for<br />

Biomolecular Structural Recognition and Docking Allowing Conformational Flexibility.<br />

52. F. A. Quiocho, D. K. Wilson, and N. K. Vyas, Nature, 340, 404 (1989). Substrate Specificity<br />

and Affinity of a Protein Modulated by Bound Water Molecules.<br />

53. M. L. Raymer, P. C. Sanschagrin, W. F. Punch, S. Venkataraman, E. D. Goodman, <strong>and</strong> L. A.<br />

Kuhn, J. Mol. Biol., 265, 445 (1997). Predicting Conserved Water-Mediated <strong>and</strong> Polar<br />

Lig<strong>and</strong> Interactions in Proteins Using a K-Nearest-Neighbors Genetic Algorithm.<br />

54. V. A. Makarov, B. K. Andrews, and B. M. Pettitt, Biopolymers, 45, 469 (1998). Reconstructing<br />

the Protein-Water Interface.<br />

55. M. Feig <strong>and</strong> B. M. Pettitt, Structure, 6, 1351 (1998). Crystallographic Water Sites from a<br />

Theoretical Perspective.<br />

56. M. Rarey, B. Kramer, T. Lengauer, <strong>and</strong> G. Klebe, J. Mol. Biol., 261, 470 (1996). A Fast<br />

Flexible Docking Method Using an Incremental Construction Algorithm.<br />

57. M. Rarey, B. Kramer, and T. Lengauer, Proteins: Struct., Funct., Genet., 34, 17 (1999). The<br />

Particle Concept: Placing Discrete Water Molecules During Protein-Lig<strong>and</strong> Docking<br />

Predictions.<br />

58. E. F. Meyer, I. Botos, L. Scapozza, <strong>and</strong> D. Zhang, Perspect. Drug Discovery Des., 3, 168<br />

(1995). Backward Binding <strong>and</strong> Other Structural Surprises.<br />

59. G. D. Diana, A. M. Treasurywala, T. R. Bailey, R. C. Oglesby, D. C. Pevear, <strong>and</strong> F. J. Dutko,<br />

J. Med. Chem., 33, 1306 (1990). A Model for Compounds Active Against Human Rhi-<br />

novirus-14 Based on X-Ray Crystallography Data.<br />

60. R. D. Brown, Perspect. Drug Discovery Des., 7/8, 31 (1997). Descriptors for <strong>Diversity</strong><br />

<strong>Analysis</strong>.<br />

61. R. S. Pearlman, Chem. Des. Autom. News, 2 (1), 1 (1987). Rapid Generation of High Quality<br />

Approximate 3D <strong>Molecular</strong> Structures.<br />

62. J. Sadowski and J. Gasteiger, Chem. Rev., 93, 2567 (1993). From Atoms and Bonds to Three-<br />

Dimensional Atomic Coordinates.<br />

63. N. E. Shemetulskis, D. Weininger, C. J. Blankley, J. J. Yang, <strong>and</strong> C. Humblet, J. Chem. Inf.<br />

Comput. Sci., 36, 862 (1996). Stigmata: An Algorithm to Determine Structural Com-<br />

monalities in Diverse Datasets.<br />

64. P. Willett, V. Winterman, <strong>and</strong> D. Bawden, J. Chem. Inf. Comput. Sci., 26, 109 (1986).<br />

Implementation of Nonhierarchical Cluster <strong>Analysis</strong> Methods in Chemical Information<br />

Systems: Selection of Compounds for Biological Testing <strong>and</strong> Clustering of Substructure<br />

Search Output.<br />

65. SSKEYS Gateway, MDL Information Systems Inc., 14600 Catalina St., San Leandro, CA<br />

94577. http://www.mdli.com/.<br />

66. R. D. Brown and Y. C. Martin, J. Chem. Inf. Comput. Sci., 36, 572 (1996). Use of Structure-<br />

Activity Data to Compare Structure-Based Clustering Methods and Descriptors for Use in<br />

Compound Selection.<br />

67. M. J. McGregor and P. V. Pallai, J. Chem. Inf. Comput. Sci., 37, 443 (1997). Clustering of<br />

Large Databases of Compounds: Using the MDL Keys as Structural Descriptors.<br />

68. R. D. Brown and Y. C. Martin, J. Chem. Inf. Comput. Sci., 37, 1 (1997). The Information<br />

Content of 2-D <strong>and</strong> 3-D Structural Descriptors Relevant to Lig<strong>and</strong>-Receptor Binding.


69. Daylight Chemical Information Software, version 4.62. Daylight Chemical Information<br />

Systems Inc., 27401 Los Altos, Suite 370, Mission Viejo, CA 92691.<br />

http://www.daylight.com/.<br />

70. D. R. Flower, J. Chem. Inf. Comput. Sci., 38, 379 (1998). On the Properties of Bit String-<br />

Based Measures of Chemical Similarity.<br />

71. P. Willett, J. M. Barnard, and G. M. Downs, J. Chem. Inf. Comput. Sci., 38, 983 (1998).<br />

Chemical Similarity Searching.<br />

72. L. H. Hall <strong>and</strong> L. B. Kier, in Reviews in Computational Chemistry, K. B. Lipkowitz <strong>and</strong> D. B.<br />

Boyd, Eds., VCH Publishers, New York, 1991, Vol. 2, pp. 367-422. The <strong>Molecular</strong> Connec-<br />

tivity Chi Indexes <strong>and</strong> Kappa Shape Indexes in Structure-Property Modeling.<br />

73. A. T. Balaban, SAR QSAR Environ. Res., 8, 1 (1998). Topological <strong>and</strong> Stereochemical<br />

Molecular Descriptors for Databases Useful in QSAR Similarity/Dissimilarity and Drug<br />

Design.<br />

74. R. A. Lewis, J. S. Mason, and I. M. McLay, J. Chem. Inf. Comput. Sci., 37, 599 (1997).<br />

Similarity Measures for Rational Set Selection <strong>and</strong> <strong>Analysis</strong> of Combinatorial Libraries: The<br />

Diverse Property-Derived (DPD) Approach.<br />

75. E. J. Martin, J. M. Blaney, M. A. Siani, D. C. Spellmeyer, A. K. Wong, <strong>and</strong> W. H. Moos, J.<br />

Med. Chem., 38, 1431 (1995). Measuring Diversity: Experimental Design of Combinatorial<br />

Libraries for Drug Discovery.<br />

76. D. J. Cummins, C. W. Andrews, J. A. Bentley, <strong>and</strong> M. Cory, J. Chem. Inf. Comput. Sci., 36,<br />

750 (1996). <strong>Molecular</strong> <strong>Diversity</strong> in Chemical Databases: Comparison of Medicinal Chemis-<br />

try Knowledge Bases <strong>and</strong> Databases of Commercially Available Compounds.<br />

77. S. Wold, K. Esbensen, <strong>and</strong> P. Geladi, Chemom. Intell. Lab. Syst., 2, 37 (1987). Principal<br />

Component <strong>Analysis</strong>.<br />

78. B. S. Everitt and G. Dunn, Applied Multivariate Data Analysis, Oxford University Press,<br />

New York, 1992.<br />

79. W. S. Dillon <strong>and</strong> M. Goldstein, Multivariate <strong>Analysis</strong>: Methods <strong>and</strong> Applications, Wiley,<br />

New York, 1984.<br />

80. CLOGP. Daylight Chemical Information Systems Inc., 27401 Los Altos, Suite 370, Mission<br />

Viejo, CA 92691. http://www.daylight.com/; see also http://biobyte.com/.<br />

81. A. J. Leo, Chem. Rev., 93, 1281 (1993). Calculating log Poct from Structures.<br />

82. P.-A. Carrupt, B. Testa, and P. Gaillard, in Reviews in Computational Chemistry, K. B.<br />

Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 241-315.<br />

Computational Approaches to Lipophilicity: Methods <strong>and</strong> Applications.<br />

83. P. F. de Aguiar, B. Bourguignon, M. S. Khots, D. L. Massart, <strong>and</strong> R. Phan-Than-Luu,<br />

Chemom. Intell. Lab. Syst., 30, 199 (1992). D-Optimal Designs.<br />

84. R. S. Pearlman and K. M. Smith, Perspect. Drug Discovery Des., 9/10/11, 355 (1997). Novel<br />

Software Tools for Chemical <strong>Diversity</strong>.<br />

85. R. S. Pearlman <strong>and</strong> K. M. Smith, Drugs Future, 23, 885 (1998). Software for Chemical<br />

<strong>Diversity</strong> in the Context of Accelerated Drug Discovery.<br />

86. F. R. Burden, J. Chem. Inf. Comput. Sci., 29, 225 (1989). Molecular Identification Number<br />

for Substructure Searches.<br />

87. P. R. Menard, J. S. Mason, I. Morize, and S. Bauerschmidt, J. Chem. Inf. Comput. Sci., 38,<br />

1204 (1998). Chemistry Space Metrics in Diversity Analysis, Library Design, and Compound<br />

Selection.<br />

88. R. S. Pearlman and K. M. Smith, J. Chem. Inf. Comput. Sci., 39, 28 (1999). Metric Validation<br />

and the Receptor-Relevant Subspace Concept.<br />

89. D. Stanton, J. Chem. Inf. Comput. Sci., 39, 11 (1999). Evaluation and Use of BCUT Descriptors<br />

in QSAR and QSPR Studies.<br />

90. G. W. Bemis and I. D. Kuntz, J. Comput.-Aided Mol. Des., 6, 607 (1992). A Fast and Efficient<br />

Method for 2D and 3D Molecular Shape Description.<br />

91. G. Moreau <strong>and</strong> C. Turpin, Analusis, 24, 17 (1996). Use of Similarity <strong>Analysis</strong> to Reduce<br />

Large <strong>Molecular</strong> Libraries to Smaller Sets of Representative Molecules.


92. J. Sadowski, M. Wagener, and J. Gasteiger, Angew. Chem. Int. Ed. Engl., 34, 2674 (1996).<br />

Assessing Similarity <strong>and</strong> <strong>Diversity</strong> of Combinatorial Libraries by Spatial Autocorrelation<br />

Functions <strong>and</strong> Neural Networks.<br />

93. S. E. Jakes <strong>and</strong> P. Willett, J. Mol. Graphics, 4, 12 (1986). Pharmacophoric Pattern Matching<br />

in Files of 3-D Chemical Structures: Selection of Interatomic Distance Screens.<br />

94. S. E. Jakes, N. Watts, P. Willett, D. Bawden, and J. D. Fisher, J. Mol. Graphics, 5, 41 (1987).<br />

Pharmacophoric Pattern Matching in Files of 3-D Chemical Structures: Evaluation of Search<br />

Performance.<br />

95. R. P. Sheridan, R. Nilakantan, A. Rusinko III, N. Bauman, K. S. Haraki, and R. Venkataraghavan,<br />

J. Chem. Inf. Comput. Sci., 29, 255 (1989). 3-DSEARCH: A System for<br />

Three-Dimensional Substructure Searching.<br />

96. Y. C. Martin, M. G. Bures, and P. Willett, in Reviews in Computational Chemistry, K. B.<br />

Lipkowitz <strong>and</strong> D. B. Boyd, Eds., VCH Publishers, New York, 1990, Vol. 1, pp. 213-263.<br />

Searching Databases of Three-Dimensional Structures. Y. C. Martin, J. Med. Chem., 35,<br />

2145 (1992). 3-D Database Searching in Drug Design.<br />

97. A. C. Good <strong>and</strong> J. S. Mason, in Reviews in Computational Chemistry, K. B. Lipkowitz <strong>and</strong> D.<br />

B. Boyd, Eds., VCH Publishers, New York, 1995, Vol. 7, pp. 67-117. Three-Dimensional<br />

Structure Database Searches.<br />

98. S. Wang, G. W. A. Milne, X. Yan, I. Posey, M. C. Nicklaus, L. Graham, and W. G. Rice, J.<br />

Med. Chem., 39, 2047 (1996). Discovery of Novel, Non-Peptide HIV-1 Protease Inhibitors<br />

by Pharmacophore Searching.<br />

99. P. C. Astles, T. J. Brown, C. M. Handscombe, M. F. Harper, N. V. Harris, R. A. Lewis, P. M.<br />

Lockey, C. McCarthy, I. M. McLay, B. Porter, A. G. Roach, C. Smith, and R. J. A. Walsh,<br />

Eur. J. Med. Chem., 32, 409 (1997). Selective Endothelin A Receptor Ligands. 1. Discovery<br />

<strong>and</strong> Structure-Activity of 2,4-Disubstituted Benzoic Acid Derivatives.<br />

100. S. D. Pickett, J. S. Mason, <strong>and</strong> I. M. McLay, J. Chem. Inf. Comput. Sci., 36, 1214 (1996).<br />

Diversity Profiling and Design Using 3-D Pharmacophores: Pharmacophore-Derived Queries<br />

(PDQ).<br />

101. J. S. Mason and S. D. Pickett, Perspect. Drug Discovery Des., 7/8, 85 (1997). Partition-Based<br />

Selection.<br />

102. S. D. Pickett, C. Luttmann, V. Guerin, A. Laoui, and E. James, J. Chem. Inf. Comput. Sci., 38,<br />

144 (1998). DIVSEL <strong>and</strong> COMPLIB-Strategies for the Design <strong>and</strong> Comparison of Com-<br />

binatorial Libraries Using Pharmacophoric Descriptors.<br />

103. E. K. Davies, in <strong>Molecular</strong> <strong>Diversity</strong> <strong>and</strong> Combinatorial Chemistry: Libraries <strong>and</strong> Drug<br />

Discovery, I. M. Chaiken and K. D. Janda, Eds., American Chemical Society, Washington,<br />

DC, 1996, pp. 309-316. Using Pharmacophore Diversity to Select Molecules to Test from<br />

Commercial Catalogues.<br />

104. R. D. Brown and Y. C. Martin, J. Med. Chem., 40, 2304 (1997). Designing Combinatorial<br />

Library Mixtures Using a Genetic Algorithm.<br />

105. ChemDiverse. Oxford Molecular Group plc, The Medawar Centre, Oxford Science Park,<br />

Oxford, OX4 4GA, United Kingdom. http://www.oxmol.com/.<br />

106. R. D. Cramer, R. D. Clark, D. E. Patterson, <strong>and</strong> A. M. Ferguson, J. Med. Chem., 39, 3060<br />

(1996). Bioisosterism as a <strong>Molecular</strong> <strong>Diversity</strong> Descriptor: Steric Fields of Single Topomeric<br />

Conformers.<br />

107. J. Mount, J. Ruppert, W. Welch, <strong>and</strong> A. N. Jain, J. Med. Chem., 42, 60 (1999). IcePick: A<br />

Flexible Surface-Based System for <strong>Molecular</strong> <strong>Diversity</strong>.<br />

108. W. Welch, J. Ruppert, and A. N. Jain, Chem. Biol., 3, 449 (1996). Hammerhead: Fast, Fully<br />

Automated Docking of Flexible Ligands to Protein Binding Sites.<br />

109. A. N. Jain, K. Koile, and D. Chapman, J. Med. Chem., 37, 2315 (1994). Compass: Predicting<br />

Biological Activities from Molecular Surface Properties. Performance Comparisons on a<br />

Steroid Benchmark.<br />

110. S. M. Boyd, M. Beverley, L. Norskov, and R. E. Hubbard, J. Comput.-Aided Mol. Des., 9,<br />

417 (1995). Characterising the Geometric <strong>Diversity</strong> of Functional Groups in Chemical<br />

Databases.


111. P. A. Bartlett and G. Lauri, in Book of Abstracts, 211th ACS National Meeting, New<br />

Orleans, LA, March 24-28, 1996, American Chemical Society, Washington, DC, 1996,<br />

COMP-014. The CAVEAT Vector Approach for Structure-Based Design and Combinatorial<br />

Chemistry.<br />

112. D. Chapman, J. Comput.-Aided Mol. Des., 10, 501 (1996). The Measurement of Molecular<br />

Diversity: A Three-Dimensional Approach.<br />

113. G. Jones, P. Willett, R. C. Glen, A. R. Leach, and R. Taylor, J. Mol. Biol., 267, 727 (1997).<br />

Development and Validation of a Genetic Algorithm for Flexible Docking.<br />

114. C. A. Baxter, C. W. Murray, D. E. Clark, D. R. Westhead, and M. D. Eldridge, Proteins:<br />

Struct., Funct., Genet., 33, 367 (1998). Flexible Docking Using Tabu Search and an Empirical<br />

Estimate of Binding Affinity.<br />

115. C. W. Murray, D. E. Clark, T. R. Auton, M. A. Firth, J. Li, R. A. Sykes, B. Waszkowycz, D. R.<br />

Westhead, and S. C. Young, J. Comput.-Aided Mol. Des., 11, 193 (1997). PROSELECT:<br />

Combining Structure-Based Drug Design and Combinatorial Chemistry for Rapid Lead<br />

Discovery. 1. Technology.<br />

116. D. E. Clark, D. Frenkel, S. A. Levy, J. Li, C. W. Murray, B. Robson, B. Waszkowycz, and D. R.<br />

Westhead, J. Comput.-Aided Mol. Des., 9, 13 (1995). PRO-LIGAND: An Approach to De<br />

Novo Molecular Design. 1. Application to the Design of Organic Molecules.<br />

117. M. D. Eldridge, C. W. Murray, T. R. Auton, G. V. Paolini, and R. P. Mee, J. Comput.-Aided<br />

Mol. Des., 11, 425 (1997). Empirical Scoring Functions. I. The Development of a Fast<br />

Empirical Scoring Function to Estimate the Binding Affinity of Ligands in Receptor<br />

Complexes.<br />

118. E. K. Kick, D. C. Roe, A. G. Skillman, G. Liu, T. J. A. Ewing, Y. Sun, I. D. Kuntz, and J. A.<br />

Ellman, Chem. Biol., 4, 297 (1997). Structure-Based Design and Combinatorial Chemistry<br />

Yield Low-Nanomolar Inhibitors of Cathepsin D.<br />

119. T. S. Haque, A. G. Skillman, C. E. Lee, H. Hahashita, I. Y. Gluzman, T. J. A. Ewing, D. E.<br />

Goldberg, I. D. Kuntz, and J. A. Ellman, J. Med. Chem., 42, 1428 (1999). Potent, Low-<br />

Molecular-Weight Non-Peptide Inhibitors of Malarial Aspartyl Protease Plasmepsin II.<br />

120. Y. Sun, T. J. A. Ewing, A. G. Skillman, and I. D. Kuntz, J. Comput.-Aided Mol. Des., 12, 597<br />

(1998). CombiDOCK: Structure-Based Combinatorial Docking and Library Design.<br />

121. H.-J. Bohm, J. Comput.-Aided Mol. Des., 6, 61 (1992). The Computer Program LUDI: A<br />

New Method for the De Novo Design of Enzyme Inhibitors.<br />

122. H.-J. Bohm, J. Comput.-Aided Mol. Des., 10, 265 (1996). Towards the Automatic Design of<br />

Synthetically Accessible Protein Ligands: Peptides, Amides and Peptidomimetics.<br />

123. H.-J. Bohm, D. W. Banner, and L. Weber, J. Comput.-Aided Mol. Des., 13, 51 (1999).<br />

Combinatorial Docking and Combinatorial Chemistry: Design of Potent Non-peptide<br />

Thrombin Inhibitors.<br />

124. Design in Receptor. Oxford Molecular Group plc, The Medawar Centre, Oxford Science<br />

Park, Oxford, OX4 4GA, United Kingdom. http://www.oxmol.co.uk/.<br />

125. C. M. Murray and S. J. Cato, J. Chem. Inf. Comput. Sci., 39, 46 (1999). Design of Libraries<br />

to Explore Receptor Sites.<br />

126. M. Lajiness, in QSAR: Rational Approaches to the Design of Bioactive Compounds, C.<br />

Silipo and A. Vittoria, Eds., ESCOM, Leiden, 1991, pp. 201-204. Evaluation of the Performance<br />

of Dissimilarity Selection Methodology.<br />

127. R. Taylor, J. Chem. Inf. Comput. Sci., 35, 59 (1995). Simulation Analysis of Experimental<br />

Design Strategies for Screening Random Compounds as Potential New Drugs and<br />

Agrochemicals.<br />

128. S. K. Kearsley, S. Sallamack, E. M. Fluder, J. D. Andose, R. T. Mosley, and R. P. Sheridan,<br />

J. Chem. Inf. Comput. Sci., 36, 118 (1996). Chemical Similarity Using Physiochemical<br />

Property Descriptors.<br />

129. V. J. Gillet, P. Willett, and J. Bradshaw, J. Chem. Inf. Comput. Sci., 38, 165 (1998). Identification<br />

of Biological Activity Profiles Using Substructural Analysis and Genetic Algorithms.


130. Spresi database. Daylight Chemical Information Systems Inc., 27401 Los Altos, Suite 370,<br />

Mission Viejo, CA 92691. http://www.daylight.com/.<br />

131. D. E. Patterson, R. D. Cramer, A. M. Ferguson, R. D. Clark, and L. E. Weinberger, J. Med.<br />

Chem., 39, 3049 (1996). Neighborhood Behavior: A Useful Concept for Validation of<br />

<strong>Molecular</strong> <strong>Diversity</strong> Descriptors.<br />

132. H. Matter, J. Med. Chem., 40, 1219 (1997). Selecting Optimally Diverse Compounds from<br />

Structure Databases: A Validation Study of Two-Dimensional <strong>and</strong> Three-Dimensional<br />

Descriptors.<br />

133. H. Matter, J. Peptide Res., 52, 305 (1998). A Validation Study of Molecular Descriptors for<br />

the Rational Design of Peptide Libraries.<br />

134. G. M. Downs <strong>and</strong> P. Willett, in Reviews in Computational Chemistry, K. B. Lipkowitz <strong>and</strong> D.<br />

B. Boyd, Eds., VCH Publishers, New York, 1995, Vol. 7, pp. 1-66. Similarity Searching in<br />

Databases of Chemical Structures.<br />

135. R. D. Cramer, S. A. DePriest, D. E. Patterson, and P. Hecht, in 3-D QSAR in Drug Design, H.<br />

Kubinyi, Ed., ESCOM, Leiden, 1993, pp. 443-485. The Developing Practice of Comparative<br />

<strong>Molecular</strong> Field <strong>Analysis</strong>.<br />

136. T. I. Oprea <strong>and</strong> C. L. Waller, in Reviews in Computational Chemistry, K. B. Lipkowitz <strong>and</strong> D.<br />

B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 127-182. Theoretical <strong>and</strong> Practical<br />

Aspects of Three-Dimensional Quantitative Structure-Activity Relationships.<br />

137. G. Greco, E. Novellino, <strong>and</strong> Y. C. Martin, in Reviews in Computational Chemistry, K. B.<br />

Lipkowitz <strong>and</strong> D. B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 183-240.<br />

Approaches to Three-Dimensional Quantitative Structure-Activity Relationships.<br />

138. E. J. Jacobsen, L. S. Stelzer, R. E. TenBrink, K. L. Belonga, D. B. Carter, H. K. Im, W. B. Im, V. H. Sethy, A. H. Tang, P. F. Von Voigtlander, J. D. Petke, W.-Z. Zhong, and J. W. Mickelson, J. Med. Chem., 42, 1123 (1999). Piperazine Imidazo[1,5-a]quinoxaline Ureas as High-Affinity GABA(A) Ligands of Dual Functionality.

139. J. D. Elliott, M. A. Lago, R. D. Cousins, A. Gao, J. D. Leber, K. F. Erhard, P. Nambi, N. A. Elshourbagy, C. Kumar, J. A. Lee, J. W. Bean, C. W. DeBrosse, D. S. Eggleston, D. P. Brooks, G. Feuerstein, R. R. Ruffolo Jr., J. Weinstock, J. G. Gleason, C. E. Peishoff, and E. H. Ohlstein, J. Med. Chem., 37, 1553 (1994). 1,3-Diarylindan-2-carboxylic Acids, Potent and Selective Non-peptide Endothelin Receptor Antagonists.

140. T. F. Walsh, K. J. Fitch, D. L. Williams Jr., K. L. Murphy, N. A. Nolan, D. J. Pettibone, S. L. Raymond, S. S. O'Malley, B. V. Clineschmidt, D. F. Veber, and W. J. Greenlee, Bioorg. Med. Chem. Lett., 5, 1155 (1995). Potent Dual Antagonists of Endothelin and Angiotensin II Receptors Derived from α-Phenoxyphenylacetic Acids. III.

141. S. A. Mousa and D. A. Cheresh, Drug Discovery Today, 2, 187 (1997). Recent Advances in Cell Adhesion Molecules and Extracellular Matrix Proteins: Potential Clinical Implications.

142. S. D. Pickett, I. M. McLay, and D. E. Clark, J. Chem. Inf. Comput. Sci., 40, 263 (2000). Enhancing the Hit-to-Lead Properties of Lead Optimization Libraries.

143. M. A. Johnson and G. M. Maggiora, Eds., Concepts and Applications of Molecular Similarity, Wiley-Interscience, New York, 1990.

144. J. B. Dunbar, Perspect. Drug Discovery Des., 7/8, 51 (1997). Cluster-Based Selection.

145. M. S. Lajiness, Perspect. Drug Discovery Des., 7/8, 65 (1997). Dissimilarity-Based Compound Selection Techniques.

146. J. H. Wikel and R. E. Higgs, J. Biomol. Screening, 2, 65 (1997). Applications of Molecular Diversity Analysis in High Throughput Screening.

147. R. A. Jarvis and E. A. Patrick, IEEE Trans. Comput., C-22, 1025 (1973). Clustering Using a Similarity Measure Based on Shared Nearest Neighbors.

148. P. R. Menard, R. A. Lewis, and J. S. Mason, J. Chem. Inf. Comput. Sci., 38, 497 (1998). Rational Screening Set Design and Compound Selection: Cascaded Clustering.

149. T. N. Doman, J. M. Cibulskis, M. J. Cibulskis, P. D. McCray, and D. P. Spangler, J. Chem. Inf. Comput. Sci., 36, 1195 (1996). Algorithm5: A Technique for Fuzzy Similarity Clustering of Chemical Inventories.


150. R. Dubes and A. K. Jain, Adv. Comput., 19, 113 (1980). Clustering Methodologies in Exploratory Data Analysis.

151. J. M. Barnard and G. M. Downs, J. Chem. Inf. Comput. Sci., 37, 141 (1997). Chemical Fragment Generation and Clustering Software.

152. F. Murtagh, Multidimensional Clustering Algorithms, Physica-Verlag, Vienna, 1985.

153. L. H. Hall, L. B. Kier, and B. B. Brown, J. Chem. Inf. Comput. Sci., 35, 1074 (1995). Molecular Similarity Based on Novel Atom-Type Electrotopological State Indices.

154. M. J. Ashton, M. C. Jaye, and J. S. Mason, Drug Discovery Today, 1, 71 (1996). New Perspectives in Lead Generation. II. Evaluating Molecular Diversity.

155. D. Bawden, in Chemical Structures 2: The International Language of Chemistry, W. A. Warr, Ed., Springer-Verlag, Berlin, 1993, pp. 383-388. Molecular Dissimilarity in Chemical Information Systems.

156. R. W. Kennard and L. A. Stone, Technometrics, 11, 137 (1969). Computer Aided Design of Experiments.

157. J. D. Holliday, S. S. Ranade, and P. Willett, Quant. Struct.-Act. Relat., 14, 501 (1995). A Fast Algorithm for Selecting Sets of Dissimilar Molecules from Large Chemical Databases.

158. D. B. Turner, S. M. Tyrrell, and P. Willett, J. Chem. Inf. Comput. Sci., 37, 18 (1997). Rapid Quantification of Molecular Diversity for Selective Database Acquisition.

159. J. D. Holliday and P. Willett, J. Biomol. Screening, 1, 145 (1996). Definitions of Dissimilarity for Dissimilarity-Based Compound Selection.

160. M. Snarey, N. K. Terrett, P. Willett, and D. J. Wilton, J. Mol. Graphics, 15, 372 (1997). Comparison of Algorithms for Dissimilarity-Based Compound Selection.

161. D. K. Agrafiotis and V. S. Lobanov, J. Chem. Inf. Comput. Sci., 39, 51 (1999). An Efficient Implementation of Distance-Based Diversity Measures Based on k-d Trees.

162. R. D. Clark, J. Chem. Inf. Comput. Sci., 37, 1181 (1997). OptiSim: An Extended Dissimilarity Selection Method for Finding Diverse Representative Subsets.

163. R. D. Clark and W. J. Langton, J. Chem. Inf. Comput. Sci., 38, 1079 (1998). Balancing Representativeness Against Diversity Using Optimizable K-Dissimilarity and Hierarchical Clustering.

164. M. Hassan, J. P. Bielawski, J. C. Hempel, and M. Waldman, Mol. Diversity, 2, 64 (1996). Optimisation and Visualisation of Molecular Diversity of Combinatorial Libraries.

165. B. D. Hudson, R. M. Hyde, E. Rahr, J. Wood, and J. Osman, Quant. Struct.-Act. Relat., 15, 283 (1996). Parameter Based Methods for Compound Selection from Chemical Data Bases.

166. R. E. Higgs, K. G. Bemis, I. A. Watson, and J. H. Wikel, J. Chem. Inf. Comput. Sci., 37, 861 (1997). Experimental Designs for Selecting Molecules from Large Chemical Databases.

167. S. Anzali, J. Gasteiger, U. Holzgrabe, J. Polanski, J. Sadowski, A. Teckentrup, and M. Wagener, Perspect. Drug Discovery Des., 9/10/11, 273 (1998). The Use of Self-organizing Neural Networks in Drug Design.

168. H. Bauknecht, A. Zell, H. Bayer, P. Levi, M. Wagener, J. Sadowski, and J. Gasteiger, J. Chem. Inf. Comput. Sci., 36, 1205 (1996). Locating Biologically Active Compounds in Medium-Sized Heterogeneous Datasets by Topological Autocorrelation Vectors: Dopamine and Benzodiazepine Agonists.

169. D. K. Agrafiotis, J. Chem. Inf. Comput. Sci., 37, 841 (1997). Stochastic Algorithms for Maximizing Molecular Diversity.

170. P. Willett, Similarity and Clustering in Chemical Information Systems, Research Studies Press, Letchworth, 1987.

171. J. M. Barnard and G. M. Downs, J. Chem. Inf. Comput. Sci., 32, 644 (1992). Clustering of Chemical Structures on the Basis of Two-Dimensional Similarity Measures.

172. J. W. MacFarlane and D. J. Gans, in Chemometric Methods in Molecular Design, H. van de Waterbeemd, Ed., VCH, Weinheim, 1995, pp. 295-308. Cluster Significance Analysis.

173. D. H. Rouvray, Fuzzy Logic in Chemistry, Academic Press, San Diego, CA, 1997.



174. N. E. Shemetulskis, J. B. Dunbar Jr., B. W. Dunbar, D. W. Moreland, and C. Humblet, J. Comput.-Aided Mol. Des., 9, 407 (1995). Enhancing the Diversity of a Corporate Database Using Chemical Database Clustering and Analysis.

175. CAST-3D Database. Chemical Abstracts Services, Columbus, OH. http://www.cas.org/.

176. Maybridge Database. Daylight Chemical Information Systems Inc., 27401 Los Altos, Suite 370, Mission Viejo, CA 92691. http://www.daylight.com/.

177. Comprehensive Medicinal Chemistry (CMC), Molecular Design Limited, San Leandro, CA 94577. An electronic database version of the Drug Compendium that is Volume 6 of Comprehensive Medicinal Chemistry published by Pergamon Press in March 1990. Contains drugs already on the market.

178. MACCS-II Drug Data Report (MDDR), Molecular Design Limited, San Leandro, CA 94577. An electronic database version of the Prous Science Publishers journal Drug Data Report, extracted from issues starting mid-1988. Contains biologically active compounds in the early stages of drug development.

179. Available Chemicals Directory (ACD), Molecular Design Limited, San Leandro, CA 94577. Contains speciality and bulk chemicals from commercial sources.

180. SPECS/BioSPECS Database; Brandon Associates, Merrimack, NH 03054. Contains chemicals from private sources.

181. R. Nilakantan, N. Bauman, and K. S. Haraki, J. Comput.-Aided Mol. Des., 11, 447 (1997). Diversity Database Assessment: New Ideas, Concepts and Tools.

182. R. Nilakantan, N. Bauman, K. S. Haraki, and R. Venkataraghavan, J. Chem. Inf. Comput. Sci., 30, 65 (1990). A Ring-Based Chemical Structural Query System: Use of a Novel Ring-Complexity Heuristic.

183. F. H. Allen, J. E. Davies, J. J. Galloy, O. Johnson, O. Kennard, C. F. Macrae, E. M. Mitchell, G. F. Mitchell, J. M. Smith, and D. G. Watson, J. Chem. Inf. Comput. Sci., 31, 187 (1991). The Development of Version 3 and Version 4 of the Cambridge Structural Database System.

184. G. W. A. Milne, M. C. Nicklaus, J. S. Driscoll, S. Wang, and D. J. Zaharevitz, J. Chem. Inf. Comput. Sci., 34, 1219 (1994). National Cancer Institute Drug Information System 3D Database.

185. A. C. Good and R. A. Lewis, J. Med. Chem., 40, 3926 (1997). New Methodology for Profiling Combinatorial Libraries and Screening Sets: Cleaning Up the Design Process with HARPick.

186. V. J. Gillet, P. Willett, and J. Bradshaw, J. Chem. Inf. Comput. Sci., 37, 731 (1997). The Effectiveness of Reactant Pools for Generating Structurally-Diverse Combinatorial Libraries.

187. J. H. van Drie and M. S. Lajiness, Drug Discovery Today, 3, 274 (1998). Approaches to Virtual Library Design.

188. J. H. Kalivas, Chemom. Intell. Lab. Syst., 15, 1 (1992). Optimization Using Variations of Simulated Annealing.

189. R. Judson, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1997, Vol. 10, pp. 1-73. Genetic Algorithms and Their Use in Chemistry.

190. R. D. Brown and D. E. Clark, Expert Opin. Ther. Patents, 8, 1447 (1998). Genetic Diversity: Applications of Evolutionary Algorithms to Combinatorial Library Design.

191. L. Weber, Curr. Opin. Chem. Biol., 2, 381 (1998). Applications of Genetic Algorithms in Molecular Diversity.

192. L. Weber, Drug Discovery Today, 3, 379 (1998). Evolutionary Combinatorial Chemistry: Application of Genetic Algorithms.

193. R. A. Lewis, A. C. Good, and S. D. Pickett, in Computer-Assisted Lead Finding and Optimization: Current Tools for Medicinal Chemistry, H. van de Waterbeemd, B. Testa, and G. Folkers, Eds., Wiley-VCH, Weinheim, 1997, pp. 135-156. Quantification of Molecular Similarity and Its Application to Combinatorial Chemistry.



194. V. J. Gillet, P. Willett, J. Bradshaw, and D. V. S. Green, J. Chem. Inf. Comput. Sci., 39, 169 (1999). Selecting Combinatorial Libraries to Optimize Diversity and Physical Properties.

195. R. P. Sheridan and S. K. Kearsley, J. Chem. Inf. Comput. Sci., 35, 310 (1995). Using a Genetic Algorithm to Suggest Combinatorial Libraries.

196. S. J. Cho, W. Zheng, and A. Tropsha, J. Chem. Inf. Comput. Sci., 38, 259 (1998). Rational Combinatorial Library Design. 2. Rational Design of Targeted Combinatorial Peptide Libraries Using Chemical Similarity Probe and the Inverse QSAR Approaches.

197. S. D. Pickett, unpublished work, 1999.

198. P. J. Brown, T. A. Smith-Oliver, P. S. Charifson, N. C. O. Tomkinson, A. M. Fivush, D. D. Sternbach, L. E. Wade, L. Orband-Miller, D. J. Parks, S. G. Blanchard, S. A. Kliewer, J. H. Lehmann, and T. M. Willson, Chem. Biol., 4, 909 (1997). Identification of Peroxisome Proliferator-Activated Receptor Ligands from a Biased Chemical Library.

199. F. R. Salemme, J. Spurlino, and R. Bone, Structure, 5, 319 (1997). Serendipity Meets Precision: The Integration of Structure-Based Drug Design and Combinatorial Chemistry for Efficient Drug Discovery.

200. J. Li, C. W. Murray, B. Waszkowycz, and S. C. Young, Drug Discovery Today, 3, 105 (1998). Targeted Molecular Diversity in Drug Discovery: Integration of Structure-Based Design and Combinatorial Chemistry.

201. M. A. Murcko, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 1-66. Recent Advances in Ligand Design Methods.

202. D. E. Clark, C. W. Murray, and J. Li, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 67-125. Current Issues in De Novo Molecular Design.

203. A. Rockwell, M. Melden, R. A. Copeland, K. Hardman, C. P. Decicco, and W. F. DeGrado, J. Am. Chem. Soc., 118, 10337 (1996). Complementarity of Combinatorial Chemistry and Structure-Based Ligand Design: Application to the Discovery of Novel Inhibitors of Matrix Metalloproteinases.

204. A. P. Combs, T. M. Kapoor, S. B. Feng, J. K. Chen, L. F. Daudesnow, and S. L. Schreiber, J. Am. Chem. Soc., 118, 287 (1996). Protein Structure-Based Combinatorial Chemistry: Discovery of Non-peptide Binding Elements to Src SH3 Domain.

205. T. C. Norman, N. S. Gray, J. T. Koh, and P. G. Schultz, J. Am. Chem. Soc., 118, 7430 (1996). A Structure-Based Library Approach to Kinase Inhibitors.

206. T. M. Kapoor, A. H. Andreotti, and S. L. Schreiber, J. Am. Chem. Soc., 120, 23 (1998). Exploring the Specificity Pockets of Two Homologous SH3 Domains Using Structure-Based, Split-Pool Synthesis and Affinity-Based Selection.

207. J. P. Morken, T. M. Kapoor, S. Feng, F. Shirai, and S. L. Schreiber, J. Am. Chem. Soc., 120, 30 (1998). Exploring the Leucine-Proline Binding Pocket of the Src SH3 Domain Using Structure-Based, Split-Pool Synthesis and Affinity-Based Selection.

208. S. F. Brady, K. J. Stauffer, W. C. Lumma, G. M. Smith, H. G. Ramjit, S. D. Lewis, B. J. Lucas, S. J. Gardell, E. A. Lyle, S. D. Appleby, J. J. Cook, M. A. Holahan, M. T. Stranieri, J. J. Lynch Jr., J. H. Lin, I.-W. Chen, K. Vastag, A. M. Naylor-Olsen, and J. P. Vacca, J. Med. Chem., 41, 401 (1998). Discovery and Development of the Novel Potent Orally Active Thrombin Inhibitor N-(9-Hydroxy-9-fluorenecarboxy)prolyl trans-4-Aminocyclohexylmethyl Amide (L-372,460): Coapplication of Structure-Based Design and Rapid Multiple Analog Synthesis on Solid Support.

209. C. Illig, S. Eisennagel, R. Bone, A. Radzicka, L. Murphy, T. Randle, J. Spurlino, F. R. Salemme, and R. M. Soll, Med. Chem. Res., 4/5, 244 (1998). Expanding the Envelope of Structure-Based Drug Design Using Chemical Libraries: Application to Small Molecule Inhibitors of Thrombin.

210. D. S. Dhanoa, R. M. Soll, Z. Wu, N. Subasinghe, J. Rinker, J. Hoffman, S. Eisennagel, T. Graybill, R. Bone, A. Radzicka, L. Murphy, and F. R. Salemme, Med. Chem. Res., 4/5, 187 (1998). Serine Proteases-Directed Small Molecule Probe Libraries.



211. S.-H. Kim, Pure Appl. Chem., 70, 555 (1998). Structure-Based Inhibitor Design for CDK2, a Cell Cycle Controlling Protein.

212. M. Whittaker, Curr. Opin. Chem. Biol., 2, 386 (1998). Discovery of Protease Inhibitors Using Targeted Libraries.

213. A. K. Szardenings, D. Harris, S. Lam, L. Shi, D. Tien, Y. Wang, D. V. Patel, M. Navre, and D. A. Campbell, J. Med. Chem., 41, 2194 (1998). Rational Design and Combinatorial Evaluation of Enzyme Inhibitor Scaffolds: Identification of Novel Inhibitors of Matrix Metalloproteinases.

214. K. D. Stewart, S. Loren, L. Frey, E. Otis, V. Klinghofer, and K. I. Hulkower, Bioorg. Med. Chem. Lett., 8, 529 (1998). Discovery of a New Cyclooxygenase-2 Lead Compound Through 3-D Database Searching and Combinatorial Chemistry.

215. T. L. Graybill, D. K. Agrafiotis, R. Bone, C. R. Illig, E. P. Jaeger, K. T. Locke, T. Lu, J. M. Salvino, R. M. Soll, J. C. Spurlino, N. Subasinghe, B. E. Tomczuk, and F. R. Salemme, in Molecular Diversity and Combinatorial Chemistry: Libraries and Drug Discovery, I. M. Chaiken and K. D. Janda, Eds., American Chemical Society, Washington, DC, 1996, pp. 16-27. Enhancing the Drug Discovery Process by Integration of High-Throughput Chemistry and Structure-Based Drug Design.

216. E. J. Martin and R. E. Critchlow, J. Comb. Chem., 1, 32 (1999). Beyond Mere Diversity: Tailoring Combinatorial Libraries for Drug Discovery.

217. G. M. Rishton, Drug Discovery Today, 2, 382 (1997). Reactive Compounds and In Vitro False Positives in HTS.

218. A. D. Rodrigues, Pharm. Res., 14, 1504 (1997). Preclinical Drug Metabolism in the Age of High-Throughput Screening: An Industrial Perspective.

219. J. H. Lin and A. Y. H. Lu, Pharmacol. Rev., 49, 403 (1997). Pharmacokinetics and Metabolism in Drug Discovery and Development.

220. M. H. Tarbit and J. Berman, Curr. Opin. Chem. Biol., 2, 411 (1998). High-Throughput Approaches for Evaluating Absorption, Distribution, Metabolism and Excretion Properties of Lead Compounds.

221. P. J. Sinko, Curr. Opin. Drug Discovery Dev., 2, 42 (1999). Drug Selection in Early Drug Development: Screening for Acceptable Pharmacokinetic Properties Using Combined In Vitro and Computational Approaches.

222. C. A. Lipinski, F. Lombardo, B. W. Dominy, and P. J. Feeney, Adv. Drug Delivery Rev., 23, 3 (1997). Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings.

223. I. Moriguchi, S. Hirono, Q. Liu, I. Nakagome, and Y. Matsushita, Chem. Pharm. Bull., 40, 127 (1992). Simple Method of Calculating Octanol/Water Partition Coefficient.

224. K. Palm, K. Luthman, A.-L. Ungell, G. Strandlund, and P. Artursson, J. Pharm. Sci., 85, 32 (1996). Correlation of Drug Absorption with Molecular Surface Properties.

225. K. Palm, P. Stenberg, K. Luthman, and P. Artursson, Pharm. Res., 14, 568 (1997). Polar Molecular Surface Properties Predict the Intestinal Absorption of Drugs in Humans.

226. K. Palm, K. Luthman, A.-L. Ungell, G. Strandlund, F. Beigi, P. Lundahl, and P. Artursson, J. Med. Chem., 41, 5382 (1998). Evaluation of Dynamic Polar Molecular Surface Area as Predictor of Drug Absorption: Comparison with Other Computational and Experimental Predictors.

227. S. Winiwarter, N. M. Bonham, F. Ax, A. Hallberg, H. Lennernas, and A. Karlen, J. Med. Chem., 41, 4939 (1998). Correlation of Human Jejunal Permeability (In Vivo) of Drugs with Experimentally and Theoretically Derived Parameters. A Multivariate Data Analysis Approach.

228. D. E. Clark, J. Pharm. Sci., 88, 807 (1999). Rapid Calculation of Polar Molecular Surface Area and Its Application to the Prediction of Transport Phenomena. 1. Prediction of Intestinal Absorption.

229. D. E. Clark, J. Pharm. Sci., 88, 815 (1999). Rapid Calculation of Polar Molecular Surface Area and Its Application to the Prediction of Transport Phenomena. 2. Prediction of Blood-Brain Barrier Penetration.


230. Y. C. Martin, Perspect. Drug Discovery Des., 7/8, 159 (1997). Challenges and Prospects for Computational Aids to Molecular Diversity.

231. J. S. Mason and M. A. Hermsmeier, Curr. Opin. Chem. Biol., 3, 342 (1999). Diversity Assessment.

232. C. A. Parks, G. M. Crippen, and J. G. Topliss, J. Comput.-Aided Mol. Des., 12, 441 (1998). The Measurement of Molecular Diversity by Receptor Site Interaction Simulation.

233. D. A. Thorner, D. J. Wild, P. Willett, and P. M. Wright, Perspect. Drug Discovery Des., 9/10/11, 301 (1998). Calculation of Structural Similarity by the Alignment of Molecular Electrostatic Potentials.

234. Ajay, W. P. Walters, and M. A. Murcko, J. Med. Chem., 41, 3314 (1998). Can We Learn to Distinguish Between Drug-like and Non-drug-like Molecules?

235. J. Sadowski and H. Kubinyi, J. Med. Chem., 41, 3325 (1998). A Scoring Scheme for Discriminating Between Drugs and Nondrugs.

236. A. K. Ghose, V. N. Viswanadhan, and J. J. Wendoloski, J. Comb. Chem., 1, 55 (1999). A Knowledge-Based Approach in Designing Combinatorial or Medicinal Chemistry Libraries for Drug Discovery. 1. A Qualitative and Quantitative Characterization of Known Drug Databases.

237. J. Moult, T. Hubbard, S. H. Bryant, K. Fidelis, and J. T. Pedersen, Proteins: Struct., Funct., Genet., Suppl. 1, 2 (1997). Critical Assessment of Methods of Protein Structure Prediction (CASP): Round II.

238. H.-J. Bohm, J. Comput.-Aided Mol. Des., 12, 309 (1998). Prediction of Binding Constants of Protein Ligands: A Fast Method for the Prioritization of Hits Obtained from De Novo Design or 3-D Database Search Programs.

239. I. Muegge and Y. C. Martin, J. Med. Chem., 42, 791 (1999). A General and Fast Scoring Function for Protein-Ligand Interactions: A Simplified Potential Approach.

240. R. H. Smith Jr., W. L. Jorgensen, J. Tirado-Rives, M. L. Lamb, P. A. J. Janssen, C. J. Michejda, and M. B. K. Smith, J. Med. Chem., 41, 5272 (1998). Prediction of Binding Affinities for TIBO Inhibitors of HIV-1 Reverse Transcriptase Using Monte Carlo Simulations in a Linear Response Method.

241. T. Hansson, J. Marelius, and J. Aqvist, J. Comput.-Aided Mol. Des., 12, 27 (1998). Ligand Binding Affinity Prediction by Linear Interaction Energy Methods.

242. T. P. Straatsma, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1996, Vol. 9, pp. 81-127. Free Energy by Molecular Simulation.

243. J. M. Barnard and G. M. Downs, Perspect. Drug Discovery Des., 7/8, 13 (1997). Computer Representation and Manipulation of Combinatorial Libraries.

244. X. Chen, A. Rusinko, and S. S. Young, J. Chem. Inf. Comput. Sci., 38, 1054 (1998). Recursive Partitioning Analysis of a Large Structure-Activity Data Set Using Three-Dimensional Descriptors.

245. H. Gao, C. Williams, P. Labute, and J. Bajorath, J. Chem. Inf. Comput. Sci., 39, 164 (1999). Binary Quantitative Structure-Activity Relationship (QSAR) Analysis of Estrogen Receptor Ligands.

246. R. S. Pearlman (University of Texas at Austin), private communication, 1999.

247. DiverseSolutions. Distributed by Tripos, Inc., 1699 South Hanley Road, St. Louis, MO 63144, on behalf of the Laboratory for Molecular Graphics and Theoretical Modeling, College of Pharmacy, University of Texas at Austin, Austin, TX, 78712.
