CHAPTER 1 

Computer-Aided Molecular Diversity 

Analysis and Combinatorial Library 

Design 

Richard A. Lewis,* Stephen D. Pickett,†‡ and

David E. Clark†

*Computational Chemistry, Eli Lilly and Company Ltd., Lilly 

Research Centre, Erl Wood Manor, Sunninghill Road, 

Windlesham, Surrey, GU20 6PH, United Kingdom, and

†Computer-Aided Drug Design, Aventis Pharma Ltd. (formerly

Rhône-Poulenc Rorer Ltd.), Dagenham Research Centre,

Rainham Road South, Dagenham, Essex, RM10 7XS, United

Kingdom, ‡(present address): Roche Products Ltd., Roche

Discovery Welwyn, 40 Broadwater Road, Welwyn Garden City,

Hertfordshire, AL7 3AY, United Kingdom

INTRODUCTION 

The roots of combinatorial chemistry can be traced back to Merrifield's 

work on the solid phase synthesis of peptides during the 1960s.1 Methods for

rapidly synthesizing large libraries of peptides on solid phase were developed 

during the 1980s, making use of the combinatorial relationship between the 

length of a peptide and the number of possible amino acids at each position in 

Reviews in Computational Chemistry, Volume 16 

Kenny B. Lipkowitz and Donald B. Boyd, Editors 

Wiley-VCH, John Wiley and Sons, Inc., New York, © 2000

the sequence (i.e., an n-residue peptide with X possible amino acids at each 

position can be used as the basis for a library of $X^n$ compounds).2 A number of

groups reported protocols for what has become known as combinatorial synthesis.3-5

At about the same time, the pharmaceutical industry began to come

under greater economic pressure to increase the speed of drug discovery, and so 

the prospect of being able to synthesize rapidly large numbers of compounds for 

testing was seized upon with enthusiasm. However, peptides generally make 

poor drug candidates because they are rapidly metabolized in the body. There- 

fore, much effort was expended to develop analogous combinatorial synthetic 

methods applicable for producing small organic molecules. By the mid-1990s,

these efforts began to bear fruit. Thus, the discipline of combinatorial chemistry, 

in its present-day form, was born and quickly integrated into the drug discovery 

efforts of the majority of pharmaceutical companies. For more details on com- 

binatorial chemistry and its application to drug discovery, the reader is referred 

to the reviews from the mid- and late 1990s.6-13

The most common form of combinatorial synthesis for small molecules 

involves the combination of a core or scaffold moiety with various reagents, 

which provide the substituents for the variable R positions (Figure 1). Assuming 

that there are no prohibitions for synthetic reasons, all combinations of reagents 

at each of the positions may be generated. Thus, the potential size of the 

combinatorial library is given by the product of the number of possible reagents 

at each of the variable R positions. For example, if a scaffold has three variable 

positions and there are 100 possible reagents for each of those positions, then 

the combinatorial library generated would contain $100^3$ (1 million) compounds.

Since it often happens that many more than 100 possible reagents are

readily available for a given reaction, and because the number of variable groups 

may exceed three, it is easy to see how combinatorial library sizes may rapidly 

exceed current capabilities for synthesis, screening, and storage. 
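
The size estimate above is simply a product over the variable positions. The following is a minimal sketch of that arithmetic; the function name `library_size` and the second set of reagent counts are our own illustrative assumptions.

```python
from math import prod

def library_size(reagents_per_position):
    """Number of products in a full combinatorial library.

    reagents_per_position: the number of available reagents at each
    variable R position of the scaffold.
    """
    return prod(reagents_per_position)

# Three variable positions with 100 reagents each (the example in the text).
small = library_size([100, 100, 100])
print(small)   # 1000000

# Hypothetical larger case: four positions with 500 reagents each.
large = library_size([500, 500, 500, 500])
print(large)   # 62500000000
```

Even this modest increase in reagent count and number of positions raises the library size by more than four orders of magnitude.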

Given that, for many libraries, a full combinatorial synthesis using all 

available reagents is impractical, one of the outstanding challenges to computer- 

aided molecular design practitioners in recent years has been to develop 

computer-based techniques to help design combinatorial libraries that encom- 

pass as much molecular diversity as possible in the smallest number of com- 

pounds. Analogous methods have also been applied to analyze the molecular 

Figure 1 Combinatorial libraries built around a benzodiazepine scaffold (left) and a 

diketopiperazine scaffold (right).

diversity of compound collections (e.g., combinatorial libraries, corporate re- 

positories, or commercial directories) to find areas of overlap or complemen- 

tarity, thereby providing information for compound acquisition or further syn- 

thesis. The application of computational methods to combinatorial libraries 

and the study of molecular diversity has been the subject of a number of reviews14-17

and special issues of journals;18 however, the field is still at best

adolescent and continues to evolve rapidly. 

This chapter reviews the field of computer-aided combinatorial library 

design and molecular diversity analysis. The first section of the chapter provides 

the foundation for all that follows by examining the nature of the forces govern- 

ing molecular recognition and introducing the concepts of molecular similarity 

and molecular diversity. Following on from that, we critically review the types 

of descriptor used in molecular diversity studies, as well as methods for the 

analysis of “diversity space.” The question of how descriptors of molecular 

diversity can be validated is also addressed. After these topics are covered, we 

shall review published applications of computational methodologies for library 

design and diversity analysis, seeking to highlight their relative strengths and 

weaknesses. This leads naturally into the final section, which comprises a 

discussion of some of the current issues facing those working in this area and 

suggestions regarding possible directions for future research. 

MOLECULAR RECOGNITION: 

SIMILARITY AND DIVERSITY 

There is no universally agreed-upon definition of chemical diversity,19,20

and there are several approaches for designing chemically diverse combinatorial 

libraries, which differ not only in the methods and descriptors used but also in 

the objectives of the design. We therefore start by defining our terms: by a

“general diverse” library we mean a combinatorial library that covers as wide a

range of values as possible relative to some molecular descriptor derived from 

its members. A “general representative” library is here defined as a library that 

is designed to mirror the distribution of values for some descriptor shown by a 

reference collection (e.g., the World Drug Index21). A “focused” library, on the 

other hand, is a library that is constrained to match closely a small set of 

compounds or the receptor site of a protein. Each definition is relevant to an 

increasing hierarchy of information used for drug discovery, with the detailed 

three-dimensional structural information provided by a model of the binding 

site being at the top. It seems sensible to try to use the knowledge we have about 

ligand-receptor complexes and propagate this understanding right down to the 

design of general diverse libraries, if possible. The reader should not take these 

definitions too literally, as they are not the only ones used in the literature. 

It is appropriate at this point to explain also the semantics of similarity


and diversity. Similarity is a property of pairs of objects (A is similar to B). 

Diversity is a property of collections of objects either with respect to that 

collection (as in a general diverse library) or with respect to some external frame 

of reference (as in representative or focused libraries). Diversity is therefore not 

necessarily the complement of similarity; we reserve the term dissimilarity for 

that concept. 

Similarity, diversity, and compound libraries relate to the effort of phar- 

maceutical discovery chemists to invent molecules that will be recognized by a 

biological target playing a key role in a disease process. The molecules must be 

able to interact with the target and favorably alter the course of the disease. 

Our goal in design is to improve the rate and cost at which new leads are 

discovered. In a broad sense, this will be achieved if libraries are synthesized or 

compounds bought that complement the physicochemical and/or structural 

properties already well represented within the set of compounds available for 

screening: that is, if the diversity of the screening set is increased. The assump- 

tion here is that the properties we use are relevant to drug-receptor interac- 

tions. It is sometimes the case that one or more leads are known. The aim of the 

design is then to focus on the important properties of the leads. If the structure 

of the protein target is known, then the design should use this information and 

focus the library toward compounds likely both to fit sterically and to interact 

favorably with the protein. This philosophy is well illustrated by Martin and 

coworkers, who describe the design of four different libraries for different 

purposes and with different levels of information to direct them.22 

We shall start at the top of the information hierarchy, the receptor site of a 

protein target, to try to understand what drives the formation of a tightly 

binding protein-ligand complex. We can then assess our molecular descriptors 

in the light of this understanding. There have been several successful applications

of site-directed ligand design,23,24 so we can try to build on these past

efforts. Most of what we say in this chapter assumes that the biological target is 

a protein, but similar concepts apply to nucleic acids, which are less frequently 

the site of drug action. We use the term "drug" rather loosely; in reality, we are 

dealing with ligands, some of which will hopefully have the necessary attributes

to become drugs. 

Our current understanding of the specificity of biological function is 

based on the principles of molecular recognition25 which, details aside, have not 

changed greatly in the last few years. Indeed, the successes of structure-based 

drug design have reinforced this orthodoxy. The binding and actions of a ligand 

are controlled by the patterns of molecular fields found in the vicinity of the 

contact surface of the receptor. In other words, the amino acids of the protein 

create an environment that the functional groups of the ligand complement. 

There should be multiple contacts between the ligand and the receptor to maxi- 

mize specificity and affinity of the overall interaction. It is still a very difficult 

task to design conformationally sensible, synthetically accessible target mole- 

cules that have the properties required for tight binding. The advantage of

combinatorial chemistry is that we can make many compounds that are approx- 

imately complementary to our target in shape, in hydrogen-bonding pattern, 

and so on, and use this extra coverage of compound space to find leads in more 

situations. 

The reduction of the rotational and translational motion of a mobile 

molecule that occurs on binding to the receptor site and the fixing of certain 

receptor side chains implies loss of entropy in both the ligand and the receptor. 

This must be balanced by the utilization of enthalpic binding energy between 

the ligand and the receptor,26 and the energy of desolvation. Favorable enthalpic

intermolecular interactions can be divided into three main groups: hydrogen

bonding, electrostatic, and polarization. This division is perhaps arbitrary,

but it is convenient, because it allows us to associate functional groups

with interactions and to make up classes of hydrogen bond donors, hydrogen- 

bond acceptors, deprotonated acids (at physiological pH), protonated bases, 

aromatic rings, and hydrophobes (lipophilic portions of a molecule). These 

favorable interactions are counteracted by steric repulsion caused by a poor fit 

of the ligand and noncomplementarity between ligand functional groups and 

the receptor (e.g., the positioning of acidic ligand groups in negatively charged 

regions of the receptor). It is not our purpose to discuss this issue in great detail, 

and the reader is directed to several excellent reviews in this area.27-31 How- 

ever, several points are pertinent to the discussion that follows. 

The in vacuo strength of a hydrogen bond can be modeled with accuracy, 

but the energetics of hydrogen bond formation in solution are not well under- 

stood, as yet. Studies by Fersht and coworkers32 indicate that the free energies 

for processes of the type $\mathrm{X{-}H}_{aq} + \mathrm{Y}_{aq} \rightleftharpoons (\mathrm{X{-}H} \cdots \mathrm{Y}) + \mathrm{aq}$ range from 2 to 6

kJ/mol for uncharged groups and to approximately 12 kJ/mol for charged 

groups. The values are strongly affected by the degree of solvent exposure of the 

interaction; that is, surface hydrogen bonds are worth very little, even in salt 

bridges.33 It would thus seem likely that hydrogen bonds do not contribute 

greatly to the enthalpic stability of a ligand-receptor complex. Their role in 

drug-receptor binding seems to be more related to specificity, especially when 

the interaction is between charged groups. It should be noted, however, that 

even this view is in dispute: work by Doig and Williams34 suggests that hydro- 

gen bonds can, through entropy, contribute more strongly to the free energy of 

binding than is often supposed. 

The binding site will have a distinct electrostatic profile owing to the 

differing electronegativities and bonding environments of the receptor atoms. 

Electrostatic interactions may take the form of charge-charge pairs, for in- 

stance, salt bridges, or interactions involving one or more permanent dipoles. 

The affinity of the ligand will be enhanced if the pattern of ligand partial charges 

can be made to complement that of the receptor.35-37 It is emphasized that

complementarity does not simply imply that positive charge on the ligand 

should be matched by negative charge on the receptor. Complementarity should 

also be taken to imply a matching of the magnitudes of the charges as well. A

highly polar area should not be matched to a slightly polar area, since the energy 

of desolvation will not be recouped. This is the same argument as for hydrogen

bonds. 

In regions of low polarity, the drug-receptor interaction is influenced 

more by entropic and weak dispersive effects. Complementarity is achieved by 

placing nonpolar regions of the ligand and receptor next to each other. The 

work of Eisenberg and McLachlan38 has provided an approximate means of 

quantifying the free energy of hydrophobic interactions involved in protein 

folding, using a simple atomic solvation potential, $\Delta G = \sum_i \sigma_i A_i$, where $\sigma_i$ is an

empirically determined partition coefficient for the atom class and $A_i$ is the

surface area of atom i in the protein. 
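
The atomic solvation potential is a weighted sum of atomic surface areas, each area multiplied by the parameter for its atom class. The sketch below illustrates only the arithmetic. The parameter values in `SIGMA`, the atom class labels, and the surface areas are hypothetical placeholders and are not the published parameter set.

```python
def solvation_free_energy(atoms, sigma):
    """Return the sum of sigma_i * A_i over all atoms.

    atoms: list of (atom_class, surface_area) pairs.
    sigma: dict mapping atom class -> solvation parameter.
    """
    return sum(sigma[atom_class] * area for atom_class, area in atoms)

# Hypothetical parameters (energy per unit area), for illustration only.
SIGMA = {"C": 16.0, "N/O": -6.0, "S": 21.0}

# Hypothetical surface areas for three atoms.
atoms = [("C", 30.0), ("C", 12.5), ("N/O", 20.0)]

total = solvation_free_energy(atoms, SIGMA)
print(total)   # 560.0
```

Atom classes with a positive parameter raise the total in proportion to their surface area, while classes with a negative parameter lower it.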

The free energy of binding can also be strongly influenced by entropic 

effects. Any solute in water causes a local ordering of the water molecules in the 

first hydration sheath and a loss of mobility.39 Removal of the solute by complexation 

will lead to an increase in the solvent entropy. A similar result is 

obtained by displacing weakly bound water from the binding site. In contrast, 

entropy is lost through the fixing of the ligand upon complexation. The loss of 

Brownian entropy of rotation and translation is inevitable. The loss of internal 

conformational entropy, caused by the enthalpic interactions between the site 

groups and the ligand atoms, can be reduced by chemically bracing (rigidifying) 

the ligand, that is, through the introduction of ring systems in place of flexible 

chains. An excellent illustration of this is the work of Alberg and Schreiber.40 

More recently, studies by Khan et al.41 have given a further vivid example: 

X-ray structures of both the flexible and the braced ligand showed that the 

extra binding of the braced ligand was due almost entirely to the fixing of the 

bound orientation. NMR experiments have shed light on many aspects of protein 

dynamics and the effect of ligand binding.42 Indeed, it has been suggested 

that in some cases the loss of protein conformational entropy at its binding site 

may be compensated for by increased conformational flexibility in other 

regions.43 

The conformational changes that occur on formation of a complex have 

further implications for the process of library design. Many current methods 

assume an essentially static picture of the receptor. This assumption is clearly 

unsound, but the nature of the conformational changes that occur upon complexation 

cannot be predicted until a ligand has been fully designed. It is often 

assumed that the uncomplexed conformations of the receptor and the ligand are 

low energy states and, as such, will be reasonably well populated in the complex 

and will provide a good starting model for the design process. HIV-1 protease44 

and the retinoic acid ligand binding domains45 provide worrying counterexamples 

to this assumption; a number of others have been cataloged recently.46 

Nevertheless, modeling studies have still proved very useful in the case of HIV-1 

protease when coupled with X-ray or NMR data.47 Several conformations of 

the receptor and the ligand may be examined, but owing to the computational 

expense, it is not possible at present to examine all the low energy states. It is

possible to perform good conformational analyses on large numbers of small 

molecules, and on the binding site itself, but at present the two cannot be 

combined except in an approximate or limited manner.48-51 

It is easy, when discussing the energetics of complex formation, to forget 

the crucial role played by water. It cannot be emphasized enough that water 

plays a vital part in the energetics of complexation, both entropically and 

enthalpically. Another function of water molecules is the mediation of contacts 

between the ligand and the receptor. There are many examples in which this 

behavior has been observed in crystallographic complexes. One study that spe- 

cifically investigates this phenomenon is the work of Quiocho et al. on L-ara- 

binose binding protein.52 It is not clear which of the molecules of water that are 

observed in the crystal structure of a receptor are going to be important in 

subsequent interactions with an incoming ligand. There are no firm rules for 

deciding a priori which water molecules are structural and integral to the site, 

but progress has been made in this direction with the CONSOLV program53

and more recent work by Pettitt and coworkers.54,55 The docking program

FlexX56 has been extended to allow automatic inclusion of water molecules in 

the docking. However, the difficulties in this area are shown by the final overall 

results: only a slight improvement was obtained over calculations without wa- 

ter, some dockings being greatly improved and others worsened.57 

In any set of ligands, it is possible to have multiple modes of binding to the 

same active site; it is very difficult to distinguish a priori between the different 

modes with confidence using existing methodologies. Examples of potential 

multiple binding modes can be found in several well-characterized systems.58

These systems show large-scale changes among the different binding modes. In 

the human rhinovirus-14 system, two binding modes are equally populated and 

so cannot be distinguished.59 In other cases, the binding mode may be poorly 

defined (giving disorder in the crystal). Multiple binding modes do not affect 

the process of library design in principle. However, methods should be able to

consider all reasonable binding modes for which the correct answer is not 

known a priori, e.g., by similarity to a docked ligand. The interpretation of 

binding studies can also be complicated if members of the same library bind in a 

different manner, giving rise to what is in effect two or more structure-activity 

relationships. 

DESCRIBING DIVERSITY SPACE 

The key to any analysis of molecular diversity or library design is the 

descriptors used. From the discussion above, it is clear that the descriptors must 

in some way represent, or be correlated with, the important factors governing 

pharmaceutical efficacy, such as receptor binding or drug transport. The 

descriptors to be chosen will depend on several factors, such as the number of

compounds to be analyzed and what information is available for the target. It 

may be that different descriptors are used at various stages of the design process 

as described later in the section on Applications. Here we begin by summarizing 

the many different descriptors available for diversity analysis/library design; 

then we shall discuss the best choice of descriptors for different design tasks. 

Finally, we present a discussion on descriptor validation. Descriptors for diver- 

sity analysis have also been reviewed by Brown.60 

Types of Descriptor 

Most available descriptors can be divided into two broad classes depend- 

ing on whether they can be calculated from the two-dimensional (2-D) connec- 

tion table or a three-dimensional (3-D) structure, which is usually generated 

from a connection table by programs such as CONCORD61 or CORINA.62 In 

the 3-D case, conformational flexibility of the molecules should also be con- 

sidered, since the generated conformation is unlikely to correspond precisely to 

that bound at the biological target. In this instance, descriptor calculation can 

be a time-consuming exercise. A second classification of descriptors may be 

made according to the way that the information is encoded and similarities 

calculated: bit strings or fingerprints versus data reduction of many real-valued 

descriptors. 

2-D Bit Strings

Molecules are not well described by single descriptors, and thus as many 

descriptors as is practical should be used. This necessitates mechanisms for 

encoding the descriptor information as efficiently as possible, to allow more 

parameters to be used. The most obvious method is to use a binary key (or “bit 

string”), in which bits are set on or off depending on the presence or absence of 

a feature or some other binary condition. Apart from compact storage, binary 

keys can also be operated on very quickly. If a sufficient number of features is 

encoded in it, a key can serve as a unique descriptor, or “fingerprint,” for the 

molecule. The fingerprint profile for a library can be built up by using the 

Boolean AND or OR operation for all the molecule fingerprints in the library. 

The AND operation gives an idea of what features are common throughout the 

library; the OR operation gives the diversity of features. The power of the AND 

operation can be extended to give modal fingerprints,63 in which the feature bit 

is set if the feature occurs in more than a threshold percentage of the com- 

pounds (the normal AND key would have a threshold of 100%). This is useful 

when one is trying to analyze a series of screening hits to create a constraint 

profile to guide library generation. 
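
The library-level AND, OR, and modal fingerprints can be sketched with ordinary integer bit operations. This is a minimal illustration, assuming each fingerprint is stored as a Python integer; the function names and the four-bit example keys are our own. The modal threshold is tested with "at least" rather than "more than" so that a threshold of 100% reproduces the AND key.

```python
def library_and(fps):
    """Bits set in every fingerprint: features common to the whole library."""
    result = fps[0]
    for fp in fps[1:]:
        result &= fp
    return result

def library_or(fps):
    """Bits set in any fingerprint: all features present in the library."""
    result = 0
    for fp in fps:
        result |= fp
    return result

def modal_fingerprint(fps, n_bits, threshold_percent):
    """Set a bit if it is on in at least threshold_percent of the fingerprints."""
    modal = 0
    for bit in range(n_bits):
        count = sum((fp >> bit) & 1 for fp in fps)
        if count * 100 >= threshold_percent * len(fps):
            modal |= 1 << bit
    return modal

# Four hypothetical 4-bit keys.
fps = [0b1011, 0b0011, 0b0111, 0b0010]

print(format(library_and(fps), "04b"))                # 0010
print(format(library_or(fps), "04b"))                 # 1111
print(format(modal_fingerprint(fps, 4, 75), "04b"))   # 0011
print(format(modal_fingerprint(fps, 4, 100), "04b"))  # 0010
```

Lowering the threshold from 100% to 75% recovers a feature that is missing from only one of the four keys, which is the behavior that makes modal fingerprints useful for summarizing a series of screening hits.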

Two approaches have been adopted for encoding structural information 

in bit strings. The first uses a predefined set (or “dictionary”) of substructural 

features, and a bit is set on only if a particular feature is present in the molecule 

(Figure 2a). Such keys were originally developed in the context of substructural

Figure 2 Simple illustration of bit string encoding of chemical structure. (a) Sample 

of a fragment dictionary-based approach. (b) Sample of a hashing scheme using a 

path-based decomposition of the structure. The asterisk denotes an element in the bit 

string where a collision has resulted from the hashing procedure. 

searching systems; Willett et al.64 were the first to use them to analyze screening 

sets. One of the most commonly used implementations of the first approach, the

MACCS keys,65 has been used quite frequently for diversity studies.66,67

Brown and Martin have shown that adding a frequency count (i.e., storing the 

number of times a feature occurs in the molecule) gives improved performance 

and that such keys correlate reasonably well with calculated physical properties 

such as octanol-water partition coefficients (ClogP) etc.68 The alternative ap- 

proach involves an exhaustive enumeration of all bond paths through a mole- 

cule, starting with paths of zero length (the atoms) and continuing up to a 

length of seven bonds. This method encodes not just the standard substructural 

features (e.g., a carboxylate group is covered by paths of length 0, 1, and 2) but

their relationship in the molecule. The most well-known implementation of this 

method is in the Daylight software.69 To enable the use of a fixed-length string, 

the occurrence of a particular path is taken as the seed to a pseudo-random 

number generator, which generates a number of bits. These bits are then OR’ed 

into the fingerprint for the molecule (Figure 2b). This process is known as 

hashing. The advantages of the path-based approach are that it is exhaustive 

and no predefinition of fragments is necessary. In principle, this should lead to


better retrieval performance in substructure or similarity searching, whatever 

the query. The disadvantage is that a particular bit in a hashed fingerprint has 

no particular meaning, and several paths may set the same bit by chance. This 

may be an issue when using hashed fingerprints for similarity- and diversity-related

tasks. Two recent discussions of bit string similarity measures are recommended

reading.70,71
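
The path-based hashing scheme can be illustrated with a deliberately simplified sketch. It is not the Daylight algorithm: we assume a molecule given as a list of element symbols plus an adjacency list, we ignore bond orders and hydrogens, and the fingerprint length, the number of bits per path, and the use of SHA-256 to derive the seed are all our own choices.

```python
import hashlib
import random

def enumerate_paths(atoms, neighbors, max_bonds=7):
    """Return the set of linear atom paths containing 0..max_bonds bonds.

    atoms: list of element symbols; neighbors: adjacency list of atom indices.
    Each path is stored as a string that does not depend on direction.
    """
    paths = set()

    def extend(path):
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        paths.add(min(forward, backward))
        if len(path) - 1 == max_bonds:
            return
        for nxt in neighbors[path[-1]]:
            if nxt not in path:          # simple paths only, no atom revisited
                extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return paths

def hashed_fingerprint(atoms, neighbors, n_bits=64, bits_per_path=2):
    """OR pseudo-random bits, seeded by each path, into a fixed-length key."""
    fp = 0
    for path in enumerate_paths(atoms, neighbors):
        digest = hashlib.sha256(path.encode()).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
        for _ in range(bits_per_path):
            fp |= 1 << rng.randrange(n_bits)
    return fp

# Heavy-atom graphs of methanol (C-O) and ethanol (C-C-O).
methanol = (["C", "O"], [[1], [0]])
ethanol = (["C", "C", "O"], [[1], [0, 2], [1]])

print(sorted(enumerate_paths(*ethanol)))  # ['C', 'C-C', 'C-C-O', 'C-O', 'O']

fp_methanol = hashed_fingerprint(*methanol)
fp_ethanol = hashed_fingerprint(*ethanol)
# Every path in methanol also occurs in ethanol, so its bits are a subset.
print((fp_methanol & fp_ethanol) == fp_methanol)  # True
```

Because different paths may be hashed to the same bit position, the number of bits set can be smaller than the number of paths multiplied by `bits_per_path`; this is the collision effect noted in Figure 2b.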

Topological Indices and Other Properties Derived

from 2-D Structures 

A large number of topological descriptors can be calculated from a 2-D 

connection table. These represent such molecular attributes as shape, branching,

flexibility, and electronic properties.72,73 Such descriptors have been used

by several groups for library design or compound selection.74-76 The difficulty 

here is in combining the descriptors, because many of them will be correlated. A 

variety of techniques exists to tackle this problem, including principal components

analysis (PCA)77 and multidimensional scaling (MDS).78,79 In the Chiron

work,75 both PCA and MDS were used on different families of descriptors such 

as topological indices, ClogP,80-82 2-D structural similarities, and specific atom 

layer descriptors derived to represent the distribution of key chemical features 

around a key point (such as the point of attachment to the core) using bond 

counts. These analyses provided a total of 16 composite descriptors for analysis 

by D-optimal design techniques.83 Lewis et al.74 took the approach of searching

for six noncorrelated descriptors and used these to partition the corporate

database at what was Rhône-Poulenc Rorer (RPR). Compared to the Chiron

work, the latter approach offers greater interpretability. 
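
One simple way to arrive at a small set of weakly correlated descriptors is a greedy filter on pairwise correlation coefficients. The sketch below is our own illustration of the general idea and is not the published procedure; the descriptor names, the data values, and the cutoff of 0.7 are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def select_uncorrelated(descriptors, max_abs_r=0.7):
    """Keep a descriptor only if |r| with every descriptor already kept is low.

    descriptors: dict of name -> values (one per molecule), in priority order.
    """
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < max_abs_r for k in kept):
            kept.append(name)
    return kept

# Hypothetical descriptor values for five molecules.
DESCRIPTORS = {
    "mol_weight":  [180, 250, 310, 400, 450],
    "heavy_atoms": [13, 18, 22, 29, 32],      # tracks mol_weight closely
    "clogp":       [2.1, 0.5, 3.3, 1.0, 2.7],
    "rot_bonds":   [2, 3, 3, 1, 2],
}

selected = select_uncorrelated(DESCRIPTORS)
print(selected)   # ['mol_weight', 'clogp', 'rot_bonds']
```

Unlike composite descriptors produced by PCA or MDS, each retained column is still an original property, which is the interpretability advantage mentioned above.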

Pearlman and Smith84,85 have developed novel molecular descriptors

termed BCUTs based on an initial idea by Burden.86 A number of different atom 

level matrices are generated in which the diagonal represents a property such as 

atom charge while the off-diagonal elements contain information such as the 

2-D (or single-conformer 3-D) distance between two atoms. It is suggested that 

the lowest and highest eigenvalues of such matrices contain information that is 

useful with regard to molecular diversity. Five or six eigenvalues are selected by 

means of a $\chi^2$ test such that the favored descriptors give an even distribution of

molecules across the five- or six-dimensional space. Again, partitioning is used 

to divide the space. This method is applicable to very large data sets (hundreds 

of thousands of molecules) and can be used to rapidly compare two large sets of 

compounds or to select a representative set of reagents for library design (based 

on whole molecule properties). Recent work87 has extended this approach to 

use a nonuniform binning scheme. Furthermore, Pearlman and Smith88

describe how this methodology can be used to define what they have termed a 

receptor-relevant subspace. In this case, the metrics are chosen so as to group 

sets of actives in the same region of space. The BCUT descriptors have also been 

shown to be useful for studies of quantitative structure-activity and structure- 

property relationships (QSAR and QSPR).89
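
Partitioning a low-dimensional descriptor space into cells makes it straightforward to compare two compound sets by the cells they occupy. The following sketch assumes uniform bins and a two-dimensional space purely for brevity; the function names, axis ranges, and descriptor values are hypothetical, and the calculation of the descriptors themselves is not shown.

```python
def cell_index(point, lows, highs, n_bins):
    """Map a descriptor vector to a tuple of bin indices (uniform bins)."""
    cell = []
    for value, lo, hi in zip(point, lows, highs):
        frac = (value - lo) / (hi - lo)
        # Clamp so that values on or beyond the limits fall in the end bins.
        cell.append(min(n_bins - 1, max(0, int(frac * n_bins))))
    return tuple(cell)

def occupied_cells(points, lows, highs, n_bins):
    """Set of cells containing at least one compound."""
    return {cell_index(p, lows, highs, n_bins) for p in points}

# Two hypothetical compound sets in a 2-D descriptor space scaled to [0, 1].
lows, highs, n_bins = (0.0, 0.0), (1.0, 1.0), 4
set_a = [(0.10, 0.20), (0.15, 0.22), (0.60, 0.90)]
set_b = [(0.12, 0.21), (0.90, 0.10)]

cells_a = occupied_cells(set_a, lows, highs, n_bins)
cells_b = occupied_cells(set_b, lows, highs, n_bins)

print(sorted(cells_a))             # [(0, 0), (2, 3)]
print(sorted(cells_a & cells_b))   # overlap: [(0, 0)]
print(sorted(cells_b - cells_a))   # cells only set B fills: [(3, 0)]
```

The cost of assigning cells grows linearly with the number of compounds, which is why cell-based comparisons remain practical for very large data sets.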


Property Fingerprints 

A natural extension to the substructural fingerprint is the property fin- 

gerprint. Bemis and Kuntz90 have described a method for combining the dis- 

tances between points on a molecular surface into a histogram, which can be 

regarded as a fingerprint with frequencies. Moreau and Turpin91 have used 

autocorrelation vectors based on the values of properties at the atomic centers 

in a molecule. Gasteiger and coworkers92 have taken this idea further by look- 

ing at the values of some defined property calculated at the surface of a mole- 

cule. An autocorrelation coefficient is constructed from the property values at 

several pairs of points (at the atomic centers or randomly distributed on the 

surface of the molecule) and the distance separating the points. A fingerprint is 

obtained by binning the pairs into preset distance intervals. For reasons of 

computational expediency, however, these approaches consider only one con- 

formation of each molecule. In the Moreau approach,91 where the number of 

points to be sampled is much smaller, the distance intervals also have an impor- 

tant effect on the amount of useful information contained within the vector. 

This is also a critical factor in pharmacophore keys, as we discuss below. Moreau

also computes eight separate vectors based on the connectivity, size,

π-bonds, heteroaromaticity, hydrogen bond donor and acceptor capability, and

the contribution to ClogP of each atom. These vectors are concatenated to give 

the overall property fingerprint. 
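
The construction of an autocorrelation vector from property values and binned distances can be sketched as follows. This is a generic illustration, assuming one property value per point and Cartesian coordinates for a single conformation; the coordinates, property values, and bin limits are hypothetical, and the published methods differ in how points and properties are chosen.

```python
from itertools import combinations
from math import dist

def autocorrelation_vector(points, properties, bin_edges):
    """Sum p_i * p_j over point pairs, binned by inter-point distance.

    points: list of (x, y, z) coordinates.
    properties: one property value per point.
    bin_edges: ascending limits; bin k covers bin_edges[k] <= d < bin_edges[k + 1].
    """
    vector = [0.0] * (len(bin_edges) - 1)
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        for k in range(len(vector)):
            if bin_edges[k] <= d < bin_edges[k + 1]:
                vector[k] += properties[i] * properties[j]
                break
    return vector

# Three hypothetical points on a line, with hypothetical property values.
points = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
properties = [1.0, -0.5, 2.0]
bin_edges = [0.0, 2.0, 4.0]

vector = autocorrelation_vector(points, properties, bin_edges)
print(vector)   # [-1.5, 2.0]
```

The choice of `bin_edges` controls how much distance information survives in the vector, which is the sensitivity to distance intervals noted above. Vectors computed for different properties can be concatenated to give an overall property fingerprint.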

3-D Descriptors

Following the early work of Willett and coworkers93,94 and Sheridan et

al.,95 searching databases of 3-D structures of organic compounds has become

an essential tool in the pharmaceutical industry.96,97 Results of 3-D flexible

searching within databases of known compounds have proven this in a practical 

sense (see, e.g., Refs. 98 and 99). 

These successes have led to the suggestion that descriptors based on three- 

point pharmacophores could be useful in assessing the pharmacophoric diver- 

sity of large data sets and in library design.100-104 The principle is illustrated in 

Figure 3. The Abbott implementation used fixed-width 1 Å bins up to 15 Å and

considered only the CONCORD-generated conformation.104 In the implementation

at RPR using the ChemDiverse software,105 all potential pharmacophore

triangles or quadrangles are formed from seven types of interaction center 

(hydrogen bond acceptor, hydrogen bond donor, tautomeric groups, aromatic 

centroids, hydrophobes, acids, and bases) over a range of distances of 2-24 Å

with variable-width bins. Conformational flexibility is taken into account by 

means of a systematic search procedure including a bump-check to eliminate 

high energy conformers.100,101 With three points (triangles), there are over

250,000 potential pharmacophores; this number rises to $24 \times 10^6$ if four points

(quadrangles) are considered. The presence or absence of these pharmacophores 

in a molecule is encoded in a bit string, often referred to as the molecule’s 

“pharmacophore key.”
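
The construction of a three-point pharmacophore key can be sketched in a few lines. This is a simplified illustration and not the ChemDiverse or Abbott implementation: we assume that the interaction centers have already been typed and positioned for each conformer, we use three hypothetical distance bins, and we store the key as a set of canonical triangle codes, each of which would correspond to one bit set in a real bit string.

```python
from itertools import combinations, permutations
from math import dist

def bin_distance(d, bin_edges):
    """Index of the distance bin containing d, or None if d is out of range."""
    for k in range(len(bin_edges) - 1):
        if bin_edges[k] <= d < bin_edges[k + 1]:
            return k
    return None

def triangle_code(features, bin_edges):
    """Canonical code for one three-point pharmacophore.

    features: three (type, (x, y, z)) tuples.
    Returns (types, distance bins) minimized over vertex orderings,
    or None if any side falls outside the binned range.
    """
    codes = []
    for a, b, c in permutations(features):
        bins = tuple(bin_distance(dist(p[1], q[1]), bin_edges)
                     for p, q in ((a, b), (a, c), (b, c)))
        if None in bins:
            return None
        codes.append(((a[0], b[0], c[0]), bins))
    return min(codes)

def pharmacophore_key(conformers, bin_edges):
    """Union of triangle codes over all conformers of one molecule."""
    key = set()
    for features in conformers:
        for triple in combinations(features, 3):
            code = triangle_code(triple, bin_edges)
            if code is not None:
                key.add(code)
    return key

# Hypothetical molecule: donor (D), acceptor (A), hydrophobe (H), two conformers.
bin_edges = [2.0, 4.0, 6.0, 8.0]   # distance bins in angstroms (assumed)
conformer_1 = [("D", (0, 0, 0)), ("A", (3, 0, 0)), ("H", (0, 4, 0))]
conformer_2 = [("D", (0, 0, 0)), ("A", (3, 0, 0)), ("H", (0, 7, 0))]

key = pharmacophore_key([conformer_1, conformer_2], bin_edges)
print(sorted(key))
# [(('A', 'D', 'H'), (0, 1, 1)), (('A', 'D', 'H'), (0, 2, 2))]
```

Moving the hydrophobe changes two of the three binned distances, so the second conformer contributes a different triangle to the key, as illustrated in Figure 3.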


Figure 3 Illustration of the creation of a pharmacophore key. As the conformation of 

a molecule changes, so do the distances between the pharmacophoric groups 

(spheres). Each of the two different three-point pharmacophores shown sets its own 

particular bit in the pharmacophore key. 
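The construction of a pharmacophore key can be sketched in a few lines of Python. The sketch is illustrative only: the bin boundaries (BIN_EDGES), the coordinates, and the function names are assumptions introduced here and are not taken from the ChemDiverse implementation. The key is held as a set of triangle identifiers rather than as a literal bit string. Each conformer is a list of (feature type, coordinates) pairs, every triple of features whose three distances fall inside the binned range contributes one entry, and the entries are OR'ed over all conformers.

```python
from itertools import combinations, permutations
from math import dist

# Hypothetical variable-width bin edges (in angstroms) spanning 2-24 A.
BIN_EDGES = [2.0, 4.0, 6.0, 9.0, 13.0, 18.0, 24.0]


def distance_bin(d):
    """Return the index of the bin containing distance d, or None if out of range."""
    for i in range(len(BIN_EDGES) - 1):
        if BIN_EDGES[i] <= d < BIN_EDGES[i + 1]:
            return i
    return None


def triangle_id(points):
    """Canonical identifier of a three-point pharmacophore.

    points: three (feature_type, (x, y, z)) pairs.
    Returns None if any inter-feature distance falls outside the bins.
    """
    bins = {}
    for i, j in combinations(range(3), 2):
        index = distance_bin(dist(points[i][1], points[j][1]))
        if index is None:
            return None
        bins[(i, j)] = bins[(j, i)] = index
    # Take the smallest tuple over all vertex orderings so that the
    # identifier does not depend on the order in which features are listed.
    return min(
        (points[p][0], points[q][0], points[r][0],
         bins[(p, q)], bins[(p, r)], bins[(q, r)])
        for p, q, r in permutations(range(3))
    )


def pharmacophore_key(conformers):
    """Union (logical OR) of the triangles found in every conformer."""
    key = set()
    for features in conformers:
        for triple in combinations(features, 3):
            tid = triangle_id(triple)
            if tid is not None:
                key.add(tid)
    return key
```

Two conformers that differ only in the position of one feature, as in Figure 3, set different entries in the key because at least one binned distance changes.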

The relevance of such descriptors to drug-receptor interactions is evident. 

The bit string represents the triangles formed between key interaction points 

over a range of accessible conformations. Two key elements in this approach are 

correct atom typing (distinguishing basic nitrogens, tautomeric groups, etc.) 

and the conformational analysis.100,101 Both these aspects have been the subject

of extensive in-house development at RPR. The recent extension to four-point 

pharmacophores has been shown to give even greater discrimination between 

compounds.101 One drawback is the time needed to perform the conforma- 

tional analysis. Given the availability of several machines on a network, how- 

ever, even crude parallelization allows the corporate database to be analyzed 

within a few days. 

Cramer et al.106 developed a methodology called comparative molecular

field analysis (CoMFA). Rules are used to align R groups (hence the method is 

not applicable to all diversity tasks) in a single conformation (which may in- 

clude intramolecular contacts). An interaction energy is calculated with a probe 

positioned at all points on a grid around the molecules. Since conformational 

flexibility is ignored, these “topomeric” descriptors are essentially “2.5-D.” 

Mount et al.107 have recently published the IcePick methodology 

developed at Axys. A small set of low energy conformers is generated for each 

molecule. Pairwise comparisons are performed, flexibly fitting a conformation 

of molecule B onto a fixed conformer of molecule A and vice versa, using a 

modified version of the Hammerhead docking algorithm.108 The scoring function

utilizes the molecular surface scoring of the Compass program,109 which

considers hydrophobic and hydrogen-bonding properties at a set of discrete

points projected onto two shells at 6 and 9 Å around the molecule. The overall

similarity is the average of these measures over all pairs of matches of A onto B 

and B onto A. The dissimilarity is computed as (1 - similarity). Each 

dissimilarity calculation can take about 40 seconds on a DEC (now Compaq)


Alpha workstation, and so the results are stored in a database for future use. 

This time-consuming method has been used primarily for reagent selection, 

assuming that the presence of an acid, for example, would define how the 

reagents would fit to a common core. 

A further method for analyzing the geometric diversity of functional 

groups in chemical structure databases has been reported by Hubbard and co- 

workers.110 Their program, HookSpace, analyses the spatial relationship be- 

tween pairs of functional groups and provides both qualitative and quantitative 

diversity measures. The utility of the method was demonstrated by comparing 

the diversity of two commercially available databases and a benzodiazepam- 

based combinatorial library. In a similar vein, Bartlett and Lauri have used the 

CAVEAT program to assess the diversity of different combinatorial core groups 

based on a comparison of bond vectors at the substituent positions.111 

Chapman112 has proposed an elegant formalism for expressing the diver- 

sity of a collection of molecules, based on molecular entropy and the three- 

dimensional arrangement of steric bulk and polar functionalities. The method 

addresses molecular flexibility by means of a conformational search to identify 

a set of low energy conformers. The similarity of two conformers is given by 

computing the best steric overlap, then computing the sum of the distances 

between each atom in conformer 1 and its corresponding nearest neighbor in 

conformer 2. An analogous function is used to compute a distance based on 

polar functionalities (hydrogen bond donors, acceptors, etc.). Note that all 

pairs of conformers for all molecules are compared. The diversity function 

comprises a sum of minimum dissimilarities together with an entropic penalty 

term based on the number of rotatable bonds in a molecule. It will come as no 

surprise to learn that this approach is very computationally expensive. Thus, in 

practice, this method is probably restricted to cases in which the superposition 

is fixed, that is, looking at the position of side chains relative to a fixed core. 
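The nearest-neighbor distance sum described above can be written down directly. The following is a minimal sketch, assuming that the two conformers have already been superimposed (the overlay step is omitted) and that each conformer is simply a list of atomic coordinates; the function name is hypothetical, and the entropic term of the full diversity function is not included.

```python
from math import dist


def nearest_neighbor_distance_sum(conformer_1, conformer_2):
    """Sum, over the atoms of conformer_1, of the distance to the
    nearest atom of conformer_2 (both already superimposed)."""
    return sum(
        min(dist(atom_1, atom_2) for atom_2 in conformer_2)
        for atom_1 in conformer_1
    )
```

Note that the measure as sketched is not symmetric: swapping the two conformers can change the value. Because every atom of one conformer is compared with every atom of the other, and every pair of conformers of every pair of molecules must be treated, the cost grows quickly with library size.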

Receptor-Based Descriptors 

When a crystal structure is available, the additional information ironically 

makes the task of design more time-consuming. It is not currently feasible to 

perform detailed calculations on every member of a library within the proposed 

active site, including all the important factors described in the above section on 

Molecular Recognition: Similarity and Diversity. Indeed, methods for the flex- 

ible docking of ligands are still being developed, although some (e.g., those 

described in Refs. 56, 113, and 114) are beginning to show promising success 

rates. However, such methods are quite computationally expensive; thus, ap- 

proaches that make more approximations are probably necessary. Some recent 

publications in this area use one particular approximation: specifically, holding 

the template or scaffold fixed and considering each R group independently. The 

PROSELECT115 strategy builds on the earlier de novo design program

PRO-LIGAND.116 Several potential template positions are chosen and substi- 

tuents assessed by means of an empirical scoring function.117 The Kuntz group


has developed an approach (CombiBUILD) based around the program DOCK, 

which assesses mainly the steric fit of a ligand with an approximate force field 

score for ranking. The template is kept fixed and substituents at each position 

are evaluated while allowing for possible intramolecular interactions between 

substituents at different positions using conformational probability maps. This 

method has been used with success to select reagents for a library against 

cathepsin D.118 More recently, the DOCK program itself has been used in an 

“anchor-and-grow” mode to design libraries targeted against plasmepsin II.119

Another DOCK variant for library design, CombiDOCK, has been described, 

but no applications have yet been published.120 In another approach, Böhm

adapted the LUDI de novo design program121,122 to allow the structure-based

selection of reagents and has recently applied this methodology to design inhibi- 

tors of thrombin.123 

Chemical Design Ltd. has developed software (“Design in Receptor”124,125)

that allows the virtual screening of tens to hundreds of thousands of

compounds against all potential three- or four-point pharmacophores within 

the binding site of a protein. This program, which extends the pharmacophore- 

based methodology to embrace the concept of site-directed library design, was 

developed in collaboration with a small number of pharmaceutical industry 

partners. The method operates as follows: first, key interaction sites (donor, 

acceptor, acid, base, hydrophobe, or aromatic) are defined in the receptor site. 

Then, all possible three- or four-point pharmacophore queries are derived from 

these sites. The number of queries can be restricted by applying user-definable 

criteria, which may specify, for instance, minimum and maximum distances 

between points and/or groups of points that must be included in all phar- 

macophores. Finally, the derived set of pharmacophores (perhaps several hun- 

dred to more than a thousand) is used to search the database of virtual products, 

with the protein active site acting as a steric constraint. The search is performed 

as a standard 3-D pharmacophore search, with each hit conformer being fitted 

back onto the matching query pharmacophore. However, matching each phar- 

macophore in turn against every molecule would require repeating the confor- 

mational analysis for each compound. Speed is gained by inverting the match- 

ing loop: performing the conformational analysis only once and comparing 

each conformer against all query pharmacophores. The same proprietary con- 

formational analysis scripts and atom typing can be used as for standard phar- 

macophore key calculations.101 It is possible to save three pharmacophore keys: 

(1) the key of the site pharmacophore matched, (2) the key of the ligand atoms 

matching site pharmacophores, and (3) the full pharmacophore key of the 

ligand OR’ed over all conformations that fit the site. Such methodology should 

open the way for full product-based design taking account of the ability of 

the molecules to fit the receptor with no a priori assumptions about binding 

modes and selecting products such that the library will cover all potential site 

pharmacophores.
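The gain from inverting the matching loop can be illustrated with a toy model. In this sketch, which is not the Design in Receptor code, each conformer is represented as a set of pharmacophore identifiers, a query "matches" if its identifier is present, and CountingConformerGenerator is a hypothetical stand-in for the expensive conformational analysis. The steric constraint imposed by the active site and the fitting of hits back onto the query are left out.

```python
class CountingConformerGenerator:
    """Hypothetical stand-in for an expensive conformational analysis."""

    def __init__(self, library):
        self.library = library  # name -> list of conformers (sets of ids)
        self.calls = 0

    def __call__(self, name):
        self.calls += 1  # count how often the costly step is run
        return self.library[name]


def screen_query_outer(molecules, queries, conformers_of):
    """Query-by-query search: conformers are regenerated for every query."""
    hits = set()
    for query in queries:
        for name in molecules:
            for conformer in conformers_of(name):
                if query in conformer:
                    hits.add((name, query))
    return hits


def screen_molecule_outer(molecules, queries, conformers_of):
    """Inverted loop: one conformational analysis per molecule."""
    hits = set()
    for name in molecules:
        for conformer in conformers_of(name):
            for query in queries:
                if query in conformer:
                    hits.add((name, query))
    return hits
```

Both functions return the same hits, but for M molecules and Q queries the first runs the conformational analysis M × Q times and the second only M times.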

Choosing Appropriate Descriptors 


The choice of descriptor will depend on a number of factors, including any 

personal biases of the modeler! Perhaps the most important considerations are 

the amount of information available about the target and whether lead com- 

pounds have been discovered. There are several possible scenarios: 

1. Little information is available, and we are in the realm of general library 

design. 

2. Several leads are available, and the descriptors must in some way utilize the 

information in these leads. 

3. A crystal structure is available, and descriptors/methods are needed to utilize 

this information. 

The scale of the problem (i.e., number of compounds to be processed) can be

significant, because some of the descriptors described above will be applicable 

to only a few hundred thousand compounds rather than millions. Thus we face 

several questions of vital importance in the design of drug molecules: To what 

extent can and should pharmacological and pharmaceutical properties be taken 

into account/predicted? How can the plethora of available descriptors be sensi- 

bly weighted? Finally, how can the various descriptors and methods of design be 

validated? These are all active areas of current research. Overriding all these 

considerations, however, is the requirement that the descriptors be calculable 

for a wide range of structural classes in a time frame applicable to the problem 

at hand. Several months may be needed for selecting subsets from a corporate 

database or assessing compounds for purchase, but the turn-around time for 

library design is generally a few weeks or less. Of course, it would also be 

advantageous if the same descriptors could be used to tackle a variety of prob- 

lems. For example, screening hits from a general library could be analyzed 

within the descriptor space used to design the library, which immediately pro- 

vides insight into the type of molecules required for focused lead follow-up 

libraries. Thus, descriptor interpretability may also be an issue. 

In summary, the choice of descriptors will depend on the problem at hand 

and the constraints of time imposed on the designer. Issues of descriptor valida- 

tion are discussed in the next section, though there is no consensus at this time 

on the best descriptors to use. We have had success in applying the 

pharmacophore-based 3-D descriptors to a variety of design tasks. We favor the 

descriptors because they represent key aspects of intermolecular interactions 

and take account of conformational flexibility. The pharmacophore descriptors 

can be applied to diverse subset selection, general library design, and focused 

library design. Site-directed design is in its infancy, but, as described above, the 

methods are being developed to apply the pharmacophore descriptors in this 

area too.


Validation of Descriptors 

The validation of descriptors is an unsolved problem, fraught with difficulties. 

Validation implies the comparison of theoretical results against some absolute 

truth, provided by experimental data or by the universe of all possible results. 

Our stated goal is that design should enhance the process of lead generation and 

optimization. It would seem appropriate to use hit rates as a measure of how 

well our diversity analysis does in comparison to chance: “simulated screen- 

ing.” This approach has been investigated by a number of researchers including 

the authors of Refs. 126-129. However, there are a number of issues concerning

this type of approach. First, it assumes that the universe of chemical space can 

be neatly divided into actives and inactives, according to some biological test. 

However, membership of a set depends on the threshold defined for activity. If 

we return to our ideas about molecular recognition, we see that binding with 

micromolar affinity may indicate some degree of recognition, possibly mixed in 

with some solvophobic effects. As the activity improves, we are getting more of 

the features right, until at low nanomolar levels, we have compounds that fill 

the active site in a complementary manner. Thus, membership of the actives 

club becomes more exclusive as the threshold is raised and fewer chemical 

families are able to gain entrance. 

The next issue is that of sampling. The entire universe of compounds 

cannot be assayed and split into the active/inactive sets. How do we know that

we have used a representative sample to test? Are the contents of the Spresi 

database130 representative of the chemical universe, or those of the World Drug 

Index21 of active drugs? Both questions probably have a negative answer, so 

methods that use this approach to validation must be viewed with caution. Even 

the term “hit rate” can be misleading. From a lead generation viewpoint, the 

aim should be to cover as many distinct structural classes as possible rather than 

concentrating on crude counts of hits (prompting the question of how to define 

a distinct structural class!). The “quality” of the hits is also important: that is, 

how amenable are they to optimization by medicinal chemistry. These consider- 

ations imply that the most efficient approach involves screening a well-designed 

set, followed up by screening close analogs of the hits. 

A number of studies have used an alternative approach to assess descriptor 

quality for diversity profiling. In these studies, descriptors were ranked by their 

ability to discriminate active and inactive compounds within a number of medic- 

inal chemistry project data sets. In the work of Brown and Martin,66 this 

discrimination involved the ability to separate one class of compounds from a 

general pool of compounds. The approach put forward by Patterson et al.131

(see also Refs. 132 and 133) introduced the concept of “neighborhood be- 

havior”: that is, compounds close in biological space should have a small differ- 

ence in descriptor values. In these studies, it was suggested that 2-D fingerprints 

and simple shape descriptors make better descriptors than other alternatives


such as the primitive 3-D pharmacophore fingerprints studied. From our own 

perspective, such assertions regarding descriptor quality are rather sweeping. 

Two-dimensional substructure searches are used routinely to extract analogs 

from databases.134 Similarly, measurement of shape variation provides one of 

the staple descriptors of 3-D QSAR calculations.135-137 A capacity to distin- 

guish active from inactive analogs from a single biological screen at a nanomolar 

level is hardly proof of an ability to discriminate between heterogeneous activity 

classes. Within a single activity class, differences as small as a methyl group can 

have significant effects on activity. This well-known piece of medicinal chemis- 

try lore can be verified by a careful reading of many SAR papers. Jacobsen et 

al.138 provide a recent example in which two compounds (Figure 4) differ by one 

methyl group and have 70-fold difference in their relative activities. The struc- 

tural differences that exist between different receptors will tend to be much 

larger, however. Thus, to some extent, the results of such studies could have been 

predicted. In fact, there are any number of examples in which such approaches 

would break down. Many targets of pharmaceutical relevance involve the com- 

petition of a small-molecule ligand for a binding site with a natural ligand such 

as a small peptide or even a protein. The structurally diverse endothelin antago- 

nists discovered by a number of companies offer a case in point.99,139,140 All 

have a low 2-D similarity according to Daylight fingerprints (Figure 5), yet 

maintain the arrangement of essential pharmacophoric features. 

Fibrinogen receptor antagonists represent another example. In this in- 

stance, the natural ligand is (in part) the RGD (Arg-Gly-Asp) loop. As can be 

seen from Figure 6, different antagonists may show a high degree of structural 

diversity, exhibiting Daylight fingerprint similarities of less than 0.6. As an 

experiment, a database of 100,000 compounds taken from the RPR collection 

was seeded with 12 diverse RGD antagonists taken from the literature.141 

Performing a similarity search in this database with a multipharmacophore key 

derived from a flexible conformational analysis of the RGD tripeptide retrieves 

all 12 antagonists within the top 3% of the database (Table 1).142 Alternatively,


Figure 4 Illustration of the effect of adding a single methyl group to a compound's 

activity. In the source paper (Ref. 138), compound 41 (R = H) has a mean binding 

affinity of 6.67 nM against [3H]flunitrazepam. The corresponding value for 

compound 54 (R = Me) is 470 nM.

SB 209670 

RPR109353

Figure 5 Structurally diverse endothelin antagonists exhibiting low 2-D similarity 

while maintaining common pharmacophoric elements crucial to activity. 

using one of the synthetic antagonists (BIBU52) as the probe retrieves the other 

11 antagonists in the top 855 compounds. While this result is not proof of the 

validity of pharmacophore descriptors for library design, it certainly shows that 

the descriptors capture many of the important features of ligand-receptor 

interactions. 
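A seeded retrieval experiment of this kind reduces to ranking the database by similarity to the probe key and noting where the seeded compounds fall. The sketch below assumes that keys are stored as sets of set bits and uses the Tanimoto coefficient; the coefficient actually used for the pharmacophore key searches is not stated here, so that choice, like the function names, is an assumption.

```python
def tanimoto(key_a, key_b):
    """Tanimoto coefficient between two keys held as sets of set bits."""
    union = key_a | key_b
    if not union:
        return 0.0
    return len(key_a & key_b) / len(union)


def rank_database(probe_key, database):
    """Compound names sorted by decreasing similarity to the probe key."""
    return sorted(
        database,
        key=lambda name: tanimoto(probe_key, database[name]),
        reverse=True,
    )


def seeded_positions(ranking, seeded):
    """1-based positions in the hit list of the seeded compounds."""
    return [pos for pos, name in enumerate(ranking, start=1) if name in seeded]
```

The first and last entries of the list returned by seeded_positions correspond to the "Top" and "Lowest" columns of a table such as Table 1, and counting the entries below a cutoff gives the number of seeded compounds retrieved in the top 100, 500, or 1000.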

Perhaps the best lesson to be drawn from these descriptor comparisons is 

that most of the proposed descriptors provide some discrimination pertinent to 

the problem at hand, and, as stated earlier, the final choice will depend on many 

factors relating to the nature of the problem. Two-dimensional descriptors can 

be very efficient at removing close analogs from screening sets, whereas to 

design small-organic molecule libraries based on peptide leads, or indeed on any 

structurally diverse compound set, or to achieve diversity in a biologically 

relevant space, requires descriptors (namely, 3-D ones) that capture the essence 

of drug-receptor interactions. 

A further philosophical problem is that many of the descriptors used to 

date are derived from the field of similarity analysis.143 Two-dimensional fingerprints

lose relevance once outside a defined structural family. It is an accepted

fact that similarity values below about 0.5 are not reliable/significant.

This is not a problem for clustering similar compounds, when one simply wants 

to know that compound A is not similar to compound B, but problems arise 

when it is important to know how dissimilar two compounds are. A pertinent 

critique of 2-D bit string descriptors has been presented by Flower.70

TAK029

MK383 

BIBU52 

Figure 6 Some structurally diverse RGD antagonists. 

APPLICATIONS 


With the necessary theory and background now in place, we move on to 

examine how to use the descriptors. In addition to what follows, the reader may 

wish to consult a special issue of Perspectives in Drug Discovery and Design 

from a few years ago entitled “Computational Tools for the Analysis of Molecular

Diversity.”15 It contains review articles covering many of the issues

discussed below: cluster-based selection, partition-based selection, and


Table 1 Use of a Pharmacophore Key Derived from the RGD Tripeptide to Retrieve

12 Seeded RGD Antagonists from a Random Collection of 100,000 Molecules

                                                    N(c)
Probe        Number of Hits   Top(a)   Lowest(b)    100   500   1000
RGD              23,884          8        3,044       3     5      7
MK383            57,846         13       11,252       2     5      5
SB214857         48,210         10       18,086       3     4      6
TAK029           38,728          1        2,275       5     6      9
BIBU52(d)        37,805          1          855       4     6     11

(a) Position in the hit list of the highest ranking of the 12 seeded compounds.

(b) Location of the lowest ranking of the seeded compounds.


at this stage.127,146

This is especially true when one is simply looking for hits 

showing some activity that can be followed up by screening similar compounds 

from the corporate database. A maximally diverse set is to be preferred to a 

purely random selection for the following reasons. The maximally diverse set 

should maximize the structure-activity information gained from the screen by 

minimizing the redundancy in the set of compounds tested. A simply random 

selection, rather than a maximally diverse one, will not guarantee the absence of 

close homologs. Further, although empirical evidence suggests that the number 

of hits obtained from a random selection may approach that obtained from a 

maximally diverse set, the latter should ensure that structurally and phys- 

icochemically diverse leads are found, giving medicinal chemists a better chance 

of finding suitable compounds to follow up for lead optimization.146 Once one 

or more leads have been selected for a project, it might be desirable to select 

follow-up sets for screening. In this case, compounds that are similar to the 

lead(s) in some sense will be sought. 

Both these types of selection may be accomplished by either clustering or 

partitioning methods. For a diverse selection, one might cluster the collection 

and then test only the cluster centroids, whereas in a follow-up similarity 

search, other compounds from within the clusters containing the leads could be 

tested. If a partitioning approach were to be used, a diverse selection could be 

obtained by choosing one compound from each occupied cell in the grid, 

whereas compounds similar to a lead could be found by examining the cell that 

contains it, together with immediately adjacent cells. A diverse set can also be 

constructed by means of a maximum dissimilarity selection algorithm, whereas 

a follow-up set could be identified by simply ranking compounds by similarity 

to the lead(s). Finally, experimental design techniques, autocorrelation

methods, and a variety of stochastic algorithms may also be applied to subset 

selection. 

Clustering Subset selection by clustering has been a standard approach 

for many years. Perhaps the seminal paper in this regard is that of Willett and 

coworkers.64 In this work, the nonhierarchical clustering algorithm due to 

Jarvis and Patrick147 was employed to cluster the Pfizer chemical stores file 

(approximately 8500 available compounds) with the aim of selecting small 

subsets for screening. The same techniques were also employed to group the 

output from substructure searches, again with the intent of reducing the number 

of compounds to be screened, while maximizing the information gained from 

the screening. A drawback to this nonhierarchical method is the lack of control 

over the size of the largest cluster and the number of singletons. Slight variations 

in the control parameters can lead to the formation of one very large, probably 

unrealistic, cluster, or at the other extreme, a high fraction of clusters with a 

single compound. Menard and coworkers148 tried to address this issue through

their cascaded clustering approach, in which prior knowledge about the poten- 

tial size of the largest cluster in the database was used to set the clustering 

parameters.


The small clusters (< 5 members) were extracted and reclustered. When 

the results were checked by medicinal chemists, this strategy seemed to have 

reduced the number of singletons to an acceptable level. An alternative approach 

developed by Doman et al.149 employed a fuzzy clustering technique150

combined with the Jarvis-Patrick method.147 The methodology has no user-defined

parameters and allows compounds to belong to more than one cluster. 
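For readers unfamiliar with the Jarvis-Patrick method, a compact sketch may help to show where the control parameters enter. The version below assumes points in a Euclidean descriptor space and uses the usual criterion that two compounds are joined when each appears in the other's list of k nearest neighbors and the two lists share at least k_min members. The parameter names and the brute-force neighbor search are choices made here for brevity, not details taken from the cited work.

```python
from math import dist


def jarvis_patrick(points, k, k_min):
    """Cluster points; returns a sorted list of clusters (lists of indices)."""
    n = len(points)
    # Brute-force k-nearest-neighbor lists (excluding the point itself).
    neighbors = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )
        neighbors.append(set(others[:k]))

    # Union-find structure used to merge compounds into clusters.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            mutual = i in neighbors[j] and j in neighbors[i]
            if mutual and len(neighbors[i] & neighbors[j]) >= k_min:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Raising k_min relative to k produces more singletons, whereas lowering it encourages chaining into one very large cluster, which is the sensitivity to control parameters noted above.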

Hierarchical clustering methods are not as greatly affected by the issue of 

singletons, but they do impose higher computational demands. If N is the 

number of compounds to be processed, the dissimilarity matrix can require up

to O(N²) disk space for storage, and the standard clustering algorithm requires

O(N³) time.151 Some workers have achieved improved performance by use of

Murtagh’s reciprocal nearest-neighbor algorithm,152 which requires only O(N)

disk space and O(N²) time, allowing the clustering of up to 200,000 structures

in a reasonable time.71,151

Partitioning A good example of a partitioning approach to screening and 

follow-up set selection is the diverse property-derived (DPD) method described 

by Lewis et al.74 The following molecular attributes were used to construct a 

six-dimensional property space: number of H-bond acceptors, number of 

H-bond donors, molecular flexibility, Hall and Kier’s electrotopological state 

index,153 ClogP, and an “aromatic density” measure. Compounds from the

corporate database were then partitioned across this space, and each compound 

was assigned an identifier (DPD code) according to the partition to which it was 

allotted. When the compounds had been partitioned, a rational, general screening 

set was created by selecting one compound from each of the partitions. This 

screening set has been in regular use at RPR for a number of years and has 

yielded several weak leads (1-50 μM) in a variety of assays. A particular instance

of this concerned a project to find inhibitors of low density lipoprotein 

(LDL) production. In this case, the general DPD set yielded one hit, but a 

follow-up set containing compounds with the same DPD code (i.e., occupying

the same cell) gave further hits. These were refined in conjunction with an 

existing lead to give a query for use in 3-D searching. Searches of the corporate 

database resulted in compounds having low nanomolar activity.154 Lewis et 

al.74 make the point that, in general, the DPD set does not give rise to high

quality leads, but rather to hits. However, since the DPD set represents a diversity 

of molecular properties rather than of structural features, the DPD set is 

likely to be especially useful with new screens where leads have not yet been 

identified. 
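The logic of a partition-based selection is simple enough to sketch. The example below is not the DPD implementation: it uses only two of the six properties, the bin boundaries in BOUNDARIES are invented for illustration, and the representative of each cell is simply the first compound encountered, since the actual selection criteria are not given here.

```python
from bisect import bisect_right

# Hypothetical bin boundaries for two properties (illustration only).
BOUNDARIES = {
    "hbond_acceptors": [2, 5],   # bins: <2, 2-4, >=5
    "clogp": [1.0, 3.0],         # bins: <1, 1-3, >=3
}


def cell_code(properties):
    """Tuple of bin indices identifying the cell a compound falls into."""
    return tuple(
        bisect_right(BOUNDARIES[name], properties[name])
        for name in sorted(BOUNDARIES)
    )


def partition(compounds):
    """Map each occupied cell code to the names of the compounds in it."""
    cells = {}
    for name, properties in compounds.items():
        cells.setdefault(cell_code(properties), []).append(name)
    return cells


def screening_set(cells):
    """One compound per occupied cell (here, the first one stored)."""
    return sorted(members[0] for members in cells.values())


def follow_up_set(cells, compounds, hit):
    """Other compounds that share the cell code of a screening hit."""
    return [name for name in cells[cell_code(compounds[hit])] if name != hit]
```

Because each compound's code depends only on its own properties, new compounds can be assigned to cells without reprocessing the rest of the database, which is one practical attraction of partitioning over clustering.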

Maximum Dissimilarity-Based Selection The original algorithm for 

dissimilarity ranking in the chemical structure context seems to have been proposed 

by Bawden,155 although the basic algorithm may be due to Kennard and

Stone.156 The basic operation of a dissimilarity selection algorithm is to start 

with a compound selected at random and make this the first selected compound. 

Subsequent compounds are selected so that they are maximally dissimilar 

to all those in the currently selected set. Dissimilarity may be measured by


the maximum sum of dissimilarities to all selected molecules (MaxSum) or the

largest nearest neighbor distance (MaxMin). The final diversity of the N-molecule

subset is given by Eq. [1] or [2], where sim(i, j) is the similarity between

molecules i and j, and d_ij is the Euclidean distance between molecules i and j in the

descriptor space. 
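The basic greedy selection loop can be sketched as follows. This is a minimal illustration that assumes Euclidean distance in the descriptor space as the dissimilarity, takes a fixed starting compound (rather than a random one) so that the result is reproducible, and breaks ties in favor of the lowest index; the function and argument names are hypothetical.

```python
from math import dist


def dissimilarity_selection(points, n_select, mode="maxmin", start=0):
    """Greedy dissimilarity-based selection; returns selected indices in order.

    mode="maxmin": pick the candidate whose nearest selected neighbor is farthest.
    mode="maxsum": pick the candidate with the largest summed distance
                   to all selected compounds.
    """
    selected = [start]
    candidates = set(range(len(points))) - {start}

    while len(selected) < n_select and candidates:

        def score(candidate):
            distances = [dist(points[candidate], points[s]) for s in selected]
            return min(distances) if mode == "maxmin" else sum(distances)

        # Iterating in sorted order makes ties resolve to the lowest index.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)

    return selected
```

For five compounds on a line at 0, 1, 5, 9, and 10, both criteria first pick the compound at 10. MaxMin then picks the midpoint at 5, whereas under MaxSum every remaining compound has the same summed distance (10), so the criterion cannot distinguish the midpoint from a near-duplicate of an already selected compound.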

This type of methodology was embraced by researchers at Upjohn in their 

COUSIN system.126 The Willett group developed fast algorithms based

on the MaxSum dissimilarity measure in combination with the cosine coefficient.

This algorithm was applied by Pickett et al.102 in conjunction with multipharmacophore

descriptors to the task of selecting diverse reagents. Willett’s

group has looked extensively at both definitions of dissimilarity159 and al- 

gorithms for dissimilarity-based compound selection.160 In the former case, 

they concluded that it was impossible to identify any of the four definitions 

studied as being superior to the others. 

When the algorithms were compared, however, the MaxMin algorithm 

gave better results than the alternatives under study. In fact, several 

workers107,161 have highlighted a problem with the MaxSum procedure. The

measure is based on the distance of the point from the centroid of the set and so 

tends to select molecules from the corners of diversity space, and duplicate 

selections can appear to add to the diversity. This situation is clearly a problem 

with traditional descriptors, because the extremes of space tend to be less rele- 

vant chemical compounds (very high or very low log P, etc.). 

It is interesting to consider why using “corner” compounds is a less press- 

ing issue when applied to pharmacophore keys. First, the pharmacophore space 

is very high-dimensional, and it is not uncommon to have a number of reagents 

or molecules that have no (or only very few) pharmacophores in common. 

Mount et al.107 note that in higher dimensional spaces, more of the points are 

near the periphery, rendering the difference in behavior less pronounced. Sec- 

ond, the molecules are not randomly spread throughout space but tend to 

cluster; thus inclusion of a similarity threshold to prevent selection of molecules 

similar to those already selected avoids revisiting areas of space. Provided the 

number of compounds to be selected is small compared to the size of the set, the 

time overhead for this additional constraint is not too great. Third, it is also 

possible to monitor how many new pharmacophores a selected molecule would 

add to the set.100 Thus, the similarity measure ensures that pharmacophores are 

presented in different combinations, while the monitoring of the addition of 

new pharmacophores ensures that, overall, all pharmacophores within the set


are covered (i.e., by combining a partitioning and a distance-based approach). 

These arguments notwithstanding, the MaxMin procedure would appear to be

the method of choice today. Agrafiotis and Lubanov161 have shown how k-d

trees can provide an efficient way to calculate nearest neighbor distances for 

input to a MaxMin selection procedure. They use a simulated annealing pro- 

cedure to select an n-molecule subset that maximizes Eq. [3]. This expression 

provides a smoother function compared to the standard MaxMin expression 

(Eq. [2]).

A general dissimilarity selection algorithm was recently reported by 

Clark.162,163 There is an adjustable parameter in the algorithm that controls the

balance between representativeness and diversity. Other functions for maximiz- 

ing dissimilarity have been suggested by Hassan et al.164 In their work, the 

(dis)similarity function is derived from a large number of 2-D and single- 

conformer 3-D descriptors, the dimensionality being reduced by means of prin- 

cipal components analysis (PCA). Multidimensional scaling is used to generate 

a 3-D coordinate plot for the library. The library design is a “cherry-picking” 

procedure: a random selection of compounds is taken, and compounds are 

added and removed from this selection by means of a Monte Carlo method 

combined with a maximal dissimilarity function based on the sum of the dis- 

tances between molecules in the PCA descriptor space. It seems from Hassan’s 

paper,164 that the principal components are recalculated for, and are particular 

to, each library, making the performance of interlibrary comparisons a non- 

trivial task. Hudson et al.165 have also reported the development of 

dissimilarity-based methods for the selection of diverse subsets. 

Experimental Design In addition to a maximal dissimilarity selection algorithm,

similar in spirit to those described above, Higgs et al.166 have experimented

with the use of a D-optimal design algorithm to generate what they term

an “edge design.” By this they mean a design that tends to select molecules on 

the edge of the descriptor space, filling the corners first and then populating the 

edges. Experimental design has also been used for reagent selection by the 

Chiron group,75 who claim that it can generate “maximal overall diversity.”

However, Higgs et al. criticize this assumption. In their experience, the D-optimal

design algorithm does not explicitly seek to avoid previously sampled areas

of space, even with the inclusion of additional (quadratic) terms. The Lilly

group166 much prefers the maximal dissimilarity selection algorithm (what they 

term a “spread design”), which is able to sample descriptor space thoroughly, 

including molecules from the edges and throughout the space. A further type of 

design (a “coverage design”), suitable for lead follow-up, is mentioned in their 

work.166 The coverage design algorithm identifies a subset of molecules that is 

maximally similar to a candidate set.


Kohonen Maps Kohonen maps are essentially a projection technique, 

providing a lower dimensional (usually 2-D) view of a higher dimensional 

descriptor space. Objects close in the higher dimensional space will be placed in 

the same or neighboring neurons, and so the method could be classed as a 

partitioning technique. Gasteiger and coworkers167 applied this technique in 

conjunction with spatial autocorrelation vectors and were able to differentiate 

dopamine and benzodiazepine agonists.168 The method has also been proposed 

as a means of assessing the diversity of combinatorial libraries.92 Agrafiotis has 

described the application of a similar technique, Sammon mapping, for visualizing

the results of diversity analyses.169

Spanning Trees The IcePick program,107 mentioned earlier in connection

with 3-D descriptors, utilizes a minimum weight spanning tree (MWST) to 

obtain a spread of molecules. The MWST can be thought of as the shortest way 

of indirectly connecting a set of points. When the MWST is large, the set will be 

diverse because the points are spread out. 
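To make the spanning-tree measure concrete, the sketch below computes the total edge length of a minimum weight spanning tree with Prim's algorithm. It is not the IcePick implementation; the function name `mst_weight`, the Euclidean metric, and the two toy point sets are assumptions chosen only to show that a more spread-out set gives a larger tree.

```python
import math

def mst_weight(points):
    """Total edge length of the minimum spanning tree (Prim's algorithm, O(N^2))."""
    if len(points) < 2:
        return 0.0
    # best[i] = shortest distance from point i to any point already in the tree
    best = {i: math.dist(points[0], points[i]) for i in range(1, len(points))}
    total = 0.0
    while best:
        nxt = min(best, key=best.get)   # closest point not yet in the tree
        total += best.pop(nxt)
        for i in best:
            best[i] = min(best[i], math.dist(points[nxt], points[i]))
    return total

# Hypothetical descriptor coordinates: the same square at two scales
tight = [(0, 0), (0, 1), (1, 0), (1, 1)]
spread = [(0, 0), (0, 3), (3, 0), (3, 3)]
tight_weight = mst_weight(tight)
spread_weight = mst_weight(spread)
```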

It is worth noting that in all the methods described in this section, diversity 

is being equated to dissimilarity between compounds, and dissimilarity is being 

assessed as (1 - similarity). In other words, the methods require a comparison 

metric that is meaningful for measurement of distance between quite dissimilar 

objects. This is not the case for 2-D fingerprints, for example, which were 

developed for 2-D substructure searching and, as mentioned earlier, tend to lose 

meaning below similarities of about 0.5. In the authors’ opinion, not enough 

consideration has been given to this issue. It is for this reason that validating 

metrics on quite structurally homogeneous data sets (where such assumptions 

may apply) is not the same as validating them on very structurally inhomogeneous

sets (see above section on Validation of Descriptors).

Partitioning Versus Distance-based Methods There are several methods 

available for selecting representative subsets from large sets. Each method has 

its good and bad points, and the specifics of the application should determine 

the most appropriate method to select. The methods are fairly independent of 

the nature of the descriptor but are affected by whether the descriptor is discrete 

(e.g., binary fingerprints) or continuous (e.g., molecular weight). Techniques for 

clustering chemical objects have been well reviewed by other researchers144,170,171

and have been applied by several groups to select representative screening sets

from large compound collections. Despite these successful applications, we think

that clustering should be used with great care. The

application of a clustering method makes the assumption that the data are in 

fact amenable to clustering: in other words, most clustering methods will pro- 

duce a clustering, whatever the data. To the authors’ knowledge, there are no 

simple ways of testing whether this assumption is justified for a very large data 

set. Certainly, cluster significance tests have been proposed,172 but they are

quite computationally expensive and not practical to apply to very large data 

sets. The second and most important factor is the lack of generality when one is 

applying distance-based measures. If the subset is defined by the clustering of


one database or combinatorial library, it is hard to define the descriptors for 

compounds in a second database or library without a large number of expensive 

distance calculations, as well as some arbitrary definitions of cluster dimensions. 

Perhaps the best application of clustering in the context of library design 

is to remove redundancy in reagent sets. 

Partitioning is best described as a boxing algorithm: each descriptor is 

divided into ranges; a combination of descriptor ranges makes a partition or 

box. The composite descriptor is then effectively the coordinate vector of one of 

the vertices of the box. The complete set of partitions is formed by taking all 

combinations of all the ranges into which the molecular descriptors have been 

divided. This approach also has the useful property of being space filling. It is 

completely portable between different databases, designs, or applications, provided 

the same descriptors and ranges are used, thus allowing comparison 

between compound sets from different sources or different combinatorial libraries 

(see next section). Other advantages are easy control of granularity and, 

perhaps most important, the ability to identify property space not represented 

by any molecule. Disadvantages of the partitioning algorithm include the arbitrary 

way in which the ranges must be set, and the introduction of edge effects 

when a partition boundary slices between two very similar compounds; an 

answer to this issue may come through the application of fuzzy logic.173 These

edge effects have implications for follow-up screening of molecules in the same 

partition as a lead: to avoid missing compounds that fall just outside a partition, 

the surrounding partitions should also be tested. However, for a six-dimensional

classification like the DPD system,74 with perhaps 50 compounds

per partition, this could necessitate screening potentially a further 36,400 [i.e.,

50 × (3^6 − 1)] compounds, a number almost large enough to defeat the object of the

exercise. However, the portability of the descriptor outweighs this negative 

factor, in our opinion. More work remains to be done to reach a consensus on 

the question of which method, clustering or partitioning, gives the better performance. 

At present, we must conclude that choosing the method best suited to 

the task at hand is preferable to modifying the task to suit a favored methodology. 

Thus the application of these methods by the practicing computational 

chemist may require some trial and error. 
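The boxing idea and the neighbor-cell arithmetic can be sketched as follows. This is a minimal illustration, not the DPD implementation: the function names, the two descriptors, and the cut points are hypothetical. Each descriptor is divided into ranges by a list of cut points, a molecule is assigned to the cell given by one range index per descriptor, and unoccupied cells are reported as voids.

```python
from itertools import product

def cell_of(descriptor, edges):
    """Map a descriptor vector to its cell: one range index per dimension."""
    # For each dimension, count the cut points at or below the value
    return tuple(sum(value >= c for c in cuts) for value, cuts in zip(descriptor, edges))

def empty_cells(molecules, edges):
    """Cells of the full grid that no molecule occupies (diversity voids)."""
    occupied = {cell_of(m, edges) for m in molecules}
    grid = product(*(range(len(cuts) + 1) for cuts in edges))
    return [c for c in grid if c not in occupied]

def neighbor_cell_count(n_dims):
    """Cells adjacent to an interior cell, counting diagonal neighbors."""
    return 3 ** n_dims - 1

# Hypothetical two-descriptor space: (molecular weight, ClogP), three ranges each
edges = [[300.0, 450.0], [2.0, 4.0]]
molecules = [(250.0, 1.0), (320.0, 3.5), (500.0, 5.0)]
voids = empty_cells(molecules, edges)

# Six dimensions, about 50 compounds per cell, as in the estimate in the text
extra_compounds = 50 * neighbor_cell_count(6)
```

Because the cell index depends only on the descriptor values and the fixed cut points, the same function can be applied unchanged to a second database, which is the portability property noted above.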

Comparison of Compound Collections with a 

View to Acquisition or Combinatorial Libraries 

with a View to Synthesis 

As mentioned above, corporate chemical structure databases are replete 

with analog series and are thus far from representative of the full range of 

structural or physicochemical diversity. There is therefore much interest in, 

first, locating the “diversity voids” within a particular collection, and then 

analyzing external collections to see which compounds could be purchased to 

occupy those holes. In this way, the molecular diversity of a corporate collection 

can be enhanced, and this in turn should lead to better results from high-

throughput screening experiments for the reasons outlined in the preceding 

section. Clearly, identical techniques can be used for the comparison of com- 

binatorial libraries to ensure that synthetic effort is not being wasted in the 

generation of redundant compounds. 

As an example of compound collection comparison, Shemetulskis et al.174

carried out clustering experiments to see how much diversity would be added to 

the Parke-Davis corporate database (CBI, 117,459 compounds) by the inclusion

of the Chemical Abstracts Service (CAST-3-D, 379,847 compounds) and

the Maybridge (MAY, 41,912 compounds) databases.175,176 The approach

used was to cluster the CBI database with each of the MAY and CAST-3-D 

databases in turn and to examine what percentage of the resulting clusters 

contained only (or more than 95%) MAY or CAST-3-D compounds. The MAY 

compounds in these clusters could then be considered as candidates for purchase.

The clustering experiments were carried out on the basis of both structural

attributes and physicochemical properties using the Jarvis-Patrick algorithm147

as implemented in the Daylight software.69 With the large numbers

of compounds involved, the clustering effort [requiring an O(N^2) nearest-neighbor

table calculation] was immense. As an illustration, the generation of

the nearest-neighbor table for the CAST-3-D database took 64 CPU days on an 

SGI 4D/480 workstation! 
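Once a joint clustering is available, the composition analysis described above is straightforward. The sketch below assumes the clustering has already been done (the Jarvis-Patrick step itself is not shown) and that each compound is represented by a hypothetical `(cluster_id, source)` pair; it reports the clusters made up only, or more than 95%, of external compounds.

```python
from collections import Counter, defaultdict

def external_rich_clusters(assignments, external="MAY", threshold=0.95):
    """Return ids of clusters in which more than `threshold` of members are external."""
    counts = defaultdict(Counter)
    for cluster_id, source in assignments:
        counts[cluster_id][source] += 1
    rich = []
    for cluster_id, by_source in sorted(counts.items()):
        fraction = by_source[external] / sum(by_source.values())
        if fraction > threshold:
            rich.append(cluster_id)
    return rich

# Hypothetical assignments from clustering CBI together with MAY
assignments = [(1, "CBI"), (1, "CBI"), (1, "MAY"),
               (2, "MAY"), (2, "MAY"),
               (3, "CBI")]
candidate_clusters = external_rich_clusters(assignments)
```

External compounds in the reported clusters would be the purchase candidates.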

Apart from the large amount of CPU time required for clustering (or 

distance-based) experiments of the type mentioned above, such methods are 

generally not well suited to diversity void location, simply because they can deal 

only with space that is covered by the compounds being clustered. So, in the 

work above for instance, if there were regions of diversity space not occupied by 

any compound in CBI, MAY, or CAST-3-D, there would be no way of discover- 

ing these voids or of choosing compounds to fill them. Thus, partitioning (cell- 

based) approaches are generally considered to be preferable for this kind of 

analysis, provided, of course, that a suitable diversity space for partitioning is 

defined.84 

Cummins et al.76 used a cell-based approach to compare the molecular 

diversity in five databases: the Comprehensive Medicinal Chemistry (CMC177) 

and MACCS Drug Data Report (MDDR178) (each representing medicinal

chemistry knowledge bases), the Available Chemicals Directory (ACD179) and

SPECS180 (representing commercially available compounds), and the Wellcome 

Registry. The compounds in these databases (totaling more than 300,000) were 

mapped into a molecular descriptor space describing molecular diversity in 

terms of the free energy of solvation and 60 topological indices. This number of 

descriptors was reduced to four by factor analysis, and a partitioning method 

was used to analyze the resulting space. It was found that the superpopulation 

of structures occupied only a very small volume of the available space; attention 

was focused on the densely populated part by removing outliers (cells with no 

or few representatives). In any event, only about 7000 compounds were deleted 

in this process, at which point it became possible to compare the databases in


detail. For example, the MDDR and ACD databases were found to overlap 

each other’s volume by around 70%, reflecting the fact that many biologically 

active molecules are of commercial interest and vice versa. 
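A cell-based overlap of this kind can be estimated by comparing the sets of occupied cells. The following sketch is an assumption-laden illustration rather than the procedure of Cummins et al.: it uses equal-width bins of a hypothetical width, ignores outlier removal, and defines the overlap of A by B as the fraction of A's occupied cells that B also occupies.

```python
def occupied_cells(descriptors, bin_width):
    """Set of grid cells (one integer index per dimension) containing at least one compound."""
    return {tuple(int(v // bin_width) for v in d) for d in descriptors}

def overlap_fraction(cells_a, cells_b):
    """Fraction of the cells occupied by A that are also occupied by B."""
    return len(cells_a & cells_b) / len(cells_a)

# Hypothetical two-factor coordinates for two small collections
db_a = [(0.1, 0.1), (1.2, 0.3), (2.5, 2.5)]
db_b = [(0.4, 0.9), (1.9, 0.1), (5.0, 5.0), (6.1, 6.1)]
cells_a = occupied_cells(db_a, bin_width=1.0)
cells_b = occupied_cells(db_b, bin_width=1.0)
a_covered_by_b = overlap_fraction(cells_a, cells_b)
b_covered_by_a = overlap_fraction(cells_b, cells_a)
```

Note that the measure is not symmetric: the two collections generally occupy different numbers of cells.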

More recently, Willett’s group has extended its methodology for diverse 

subset selection to the analysis of the relative diversity of compound collec- 

tions.158 The six databases compared comprised five publicly available collec- 

tions and a combinatorial library. The individual diversities of the databases 

were assessed, and also the changes in diversity that occurred when one 

database was merged with another. Interestingly, the union of two databases 

does not always result in an increase in diversity! For instance, the diversity of 

the Maybridge collection was found to decrease markedly when it was merged 

with a simple combinatorial library constructed from the condensation of 400 

primary amines and 400 carboxylic acids selected from the World Drug Index21 

(WDI) database. In other words, according to the metrics used, the molecules in 

the resulting database are more similar to each other than those just in Maybridge.

Pickett et al.102 have adopted a similar kind of methodology but using a

different descriptor, 3-D pharmacophores rather than 2-D bit strings. In this 

work a number of potential combinatorial libraries were compared, and the 

results were used to select the subset that added the most pharmacophore 

diversity in comparison to screening libraries previously synthesized. 

A rather different tack has been taken by Nilakantan et al.,181 who

describe a method for comparing large chemical databases. Their approach 

relies on categorizing each database according to its ring system content, based 

on some earlier work.182 Each ring system in each molecule is assigned a hash 

code, and these codes are summed for each molecule to generate what the 

investigators term a ring-cluster hash code. By comparing the resulting hash 

codes for two databases, it is possible to gain some idea about how similar they 

are. Nilakantan et al. used this metric to compare a number of public databases 

[Cambridge Structural Database (CSD),183 ACD, WDI, and the National

Cancer Institute (NCI-3-D)184 database] and discovered that the CSD has the 

richest collection of ring systems and ring clusters. The same paper presents a 

different method for the estimation of database diversity. The program DIVPIK 

simply tries to pick a certain number of dissimilar compounds from a database. 

Intuitively, the more diverse a database, the fewer attempted selections will be 

required. A measure of diversity can be gained by considering the ratio 

NTRIES/NPICK. Nilakantan et al. used this measure to demonstrate that the 

diversity of the four databases increased in the order WDI < ACD = NCI-3-D < 

CSD (essentially the same result obtained by a consideration of the ring cluster/ 

system hash codes). The two independent methods thus serve to validate each 

other to some extent, although the DIVPIK method is significantly more com- 

putationally expensive in practice. 
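The NTRIES/NPICK ratio can be illustrated with a short sketch. The details of DIVPIK are not given here, so everything below is an assumption: compounds are visited in database order, fingerprints are represented as Python sets, similarity is the Tanimoto coefficient, and a compound is picked only if its similarity to every earlier pick does not exceed a hypothetical cutoff `max_sim`.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def tries_per_pick(fingerprints, npick, max_sim=0.5):
    """Return NTRIES/NPICK, or None if fewer than npick dissimilar compounds exist."""
    picked, ntries = [], 0
    for fp in fingerprints:
        ntries += 1
        if all(tanimoto(fp, other) <= max_sim for other in picked):
            picked.append(fp)
            if len(picked) == npick:
                return ntries / npick
    return None  # database exhausted before npick picks were made

# Hypothetical toy databases: one with no redundancy, one with repeated analogs
diverse_db = [{1}, {2}, {3}, {4}]
redundant_db = [{1, 2}, {1, 2}, {1, 2}, {3, 4}, {3, 4}, {5, 6}]
ratio_diverse = tries_per_pick(diverse_db, 3)
ratio_redundant = tries_per_pick(redundant_db, 3)
```

A ratio close to 1 indicates that almost every attempted selection succeeded; larger values indicate a more redundant collection.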

We attempted a practical application of these ideas in a project to select 

1000 compounds from one agrochemical-biased corporate collection (CC1) to 

supplement the diversity of a representative pharmaceutical-biased screening


set (PSS) derived from another independent corporate collection. These experiments

used the Chem-X105 pharmacophore key overlaps as the similarity metric.

We found that we could achieve better results by using diversity analysis

tools, but that prefiltering had a very important role to play (a sobering thought 

for those of us caught up in the mathematics of diversity analysis). The following

filters were used:

• Remove compounds containing potentially reactive or toxic groups.

• Remove molecules with a molecular weight outside the range 200-600 Da.

• Remove molecules with a ClogP value outside the 0-6 range.

• Remove all molecules expressing a number of pharmacophores outside the range 1-1000.

• Remove all molecules with more than 100,000 conformations.

• Remove all instances of “near-duplicate” molecules. (This was achieved by taking each molecule in turn and removing all molecules with a Daylight fingerprint similarity > 0.95 to it.)

• Remove compounds with heavy atom counts outside 20-45, excluding halogens.

While the filters are fairly stringent, we did not expect them to remove 83% of 

the corporate collection! Use of the HARPick program185 (see below) increased

the number of pharmacophores present in the selected set from around 13,000 

for the first random pick to 15,711 and increased the number of phar- 

macophores unique to the selected set (as compared to PSS) from 535 to 850. 
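The prefilters listed above amount to a set of range checks followed by near-duplicate removal. The sketch below assumes that every property has been precomputed and stored in a dictionary; the field names, the helper `make`, and the use of Python sets as stand-ins for Daylight fingerprints are all hypothetical, and `n_heavy_atoms` is taken to exclude halogens already.

```python
def passes_property_filters(mol):
    """Range checks from the filter list; `mol` is a dict of precomputed properties."""
    return (
        not mol["has_reactive_or_toxic_group"]
        and 200 <= mol["mol_weight"] <= 600
        and 0 <= mol["clogp"] <= 6
        and 1 <= mol["n_pharmacophores"] <= 1000
        and mol["n_conformations"] <= 100_000
        and 20 <= mol["n_heavy_atoms"] <= 45
    )

def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def remove_near_duplicates(mols, cutoff=0.95):
    """Keep a molecule only if no previously kept molecule is more similar than `cutoff`."""
    kept = []
    for mol in mols:
        if all(tanimoto(mol["fingerprint"], k["fingerprint"]) <= cutoff for k in kept):
            kept.append(mol)
    return kept

def make(name, mw, fingerprint, reactive=False):
    """Build a toy record with otherwise acceptable property values."""
    return {"name": name, "has_reactive_or_toxic_group": reactive, "mol_weight": mw,
            "clogp": 2.5, "n_pharmacophores": 300, "n_conformations": 5000,
            "n_heavy_atoms": 28, "fingerprint": fingerprint}

library = [
    make("a", 350.0, frozenset(range(40))),
    make("b", 352.0, frozenset(range(40))),            # near-duplicate of "a"
    make("c", 720.0, frozenset(range(50, 90))),        # too heavy
    make("d", 410.0, frozenset(range(100, 140)), reactive=True),
    make("e", 298.0, frozenset(range(20, 60))),
]
survivors = remove_near_duplicates([m for m in library if passes_property_filters(m)])
survivor_names = [m["name"] for m in survivors]
```

Applying the cheap property checks before the pairwise similarity step keeps the quadratic near-duplicate comparison as small as possible.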

Combinatorial Library Design 

The key task in library design, in which molecular diversity analysis can 

play a central role, is the selection of reagents. In general, these reagents will 

give rise to R groups attached to a conserved scaffold or template. The need for 

reagent selection arises because in many instances, the product of the number of 

available reagents at each variable position rapidly outstrips the synthetic 

capability of even high-throughput, robotic synthesis units. From arguments 

similar to those advanced in the preceding section, it is obviously sensible to 

choose a diverse subset of the available reagents at each position for general 

library design. In some instances, there will be additional information that can 

focus or constrain the design. We shall deal with these two scenarios separately. 

General Library Design 

Broadly speaking, there are three approaches to reagent selection. In 

reagent-based selection, a subset is chosen to maximize the diversity of the 

reagents at each position without considering the reagents at the other posi- 

tions, or the scaffold. A good example of such a method is that reported by the


Chiron group.75 Of course, almost any of the techniques for diverse subset 

selection may be applied to reagent-based selection of reagents. Alternatively, a 

product-based scheme can be envisaged, in which reagents are selected at all 

positions so that the diversity of the generated products is maximized. This type 

of approach has been championed by Gillet et al.186 and by Good and Lewis.185 

Finally, one may pick the most diverse set of products and then deconvolute to 

find the sets of reagents required to make that set. This kind of approach, 

sometimes called cherry-picking, is exemplified by the methods embodied in the 

ChemDiverse package.105 

There are some advantages and disadvantages to each of these ap- 

proaches, and each may be appropriate in certain design situations. In general, 

the cherry-picking approach will result in the most diverse set of products; 

however, this approach has the serious disadvantage of not resulting in a syn- 

thetically efficient combinatorial library. That is, it is likely to be necessary to 

synthesize a number of “unwanted” molecules in addition to the desired prod- 

ucts. Reagent-based selection is fast, since one is not considering the enumer- 

ated combinatorial products in the analysis, and thus this method may be 

suitable when the enumerated virtual library is very large. However, experiments

by Gillet et al.186 have shown that a product-based reagent selection

approach gives diversity superior to that obtainable from a reagent-based 

method. Van Drie and Lajiness report a similar experience.187 Balanced against 

this we note that most product-based schemes can deal only with enumerated 

libraries of the order of 100,000 molecules, a size that a virtual library easily reaches,

particularly with more than two variable positions on the template. In practice,

one is likely to need to combine the reagent-based and product-based ap- 

proaches. The reagent-based selection methods can be used to filter the initial 

reagent lists to a size at which the virtual library becomes tractable for analysis 

by a product-based method. This kind of hybrid approach has been used suc- 

cessfully by Good and Lewis in applying their HARPick program.185 
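Two of the points above are simple arithmetic and can be shown in a few lines: the size of the enumerated virtual library is the product of the reagent counts, and a cherry-picked product set implies a full combinatorial array that usually contains unwanted products. The function names and reagent labels below are hypothetical.

```python
from math import prod

def virtual_library_size(reagent_counts):
    """Number of products in the fully enumerated library."""
    return prod(reagent_counts)

def deconvolute(cherry_picked):
    """Reagents needed at each position, the full array size, and the unwanted products."""
    n_positions = len(cherry_picked[0])
    reagents = [sorted({p[i] for p in cherry_picked}) for i in range(n_positions)]
    full_array = prod(len(r) for r in reagents)
    return reagents, full_array, full_array - len(set(cherry_picked))

# Three variable positions with 100 candidate reagents each
size = virtual_library_size([100, 100, 100])

# Three cherry-picked products, each written as (reagent at R1, reagent at R2)
picked = [("A1", "B1"), ("A2", "B2"), ("A3", "B3")]
reagents, full_array, unwanted = deconvolute(picked)
```

Here three desired products require a 3 × 3 array, so six of the nine compounds made would be unwanted.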

We have already discussed the work of Chapman112 from the perspective 

of molecular descriptors. We will now look at it in terms of library design. 

Chapman computes diversity as the sum of all pairwise dissimilarities between 

the molecules in the set. A bias may be introduced to weight against excessive 

flexibility in the molecules by a function based on the number of rotatable 

bonds. A standard “greedy” algorithm that adds the molecule that will most 

increase the diversity of the current set of molecules is used to build up a library 

design. This implies a cherry-picking strategy. Even so, the diversity measure is

still very computationally intensive, and at present this method can handle only 

libraries in the low thousands. 
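A greedy build-up of this kind can be sketched as follows. This is not Chapman's diversity measure, which is far more elaborate: the sketch assumes a simple Euclidean distance between hypothetical descriptor vectors and a linear penalty, `flex_weight` times the number of rotatable bonds, as the flexibility bias.

```python
import math

def greedy_diverse_set(descriptors, rotatable_bonds, n, flex_weight=0.0):
    """Greedily add the molecule with the largest gain in summed pairwise distance,
    less an assumed linear penalty on its number of rotatable bonds."""
    chosen = []
    candidates = set(range(len(descriptors)))
    while candidates and len(chosen) < n:
        def gain(i):
            added = sum(math.dist(descriptors[i], descriptors[j]) for j in chosen)
            return added - flex_weight * rotatable_bonds[i]
        best = max(sorted(candidates), key=gain)  # ties go to the lowest index
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Hypothetical descriptor vectors and rotatable-bond counts for four molecules
descriptors = [(0, 0), (4, 0), (0, 3), (1, 1)]
rotatable_bonds = [2, 10, 3, 1]
unbiased = greedy_diverse_set(descriptors, rotatable_bonds, 2)
biased = greedy_diverse_set(descriptors, rotatable_bonds, 2, flex_weight=0.5)
```

With the penalty switched on, the most distant but most flexible molecule is no longer selected.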

The nature of product-based library design lends itself naturally to the 

application of heuristic search methods such as simulated annealing188 and 

genetic algorithms.189 Several groups have published applications in the latter 

area, which has been recently reviewed.190-192 While all methods differ some- 

what in their technical implementations of the different algorithms, by far the


most important factor affecting the final choice of reagents is the scoring func- 

tion. As always, there is a need to use descriptors pertinent to ligand-receptor 

interactions. The HARPick program of Good and Lewis185 uses a fitness func- 

tion based on multipharmacophore molecular descriptors. Both simulated an- 

nealing and genetic algorithms have been studied.193 The scoring function in 

HARPick is very flexible and is made up from a weighted combination of the 

following terms: the number of pharmacophores expressed and their frequency, 

some crude shape measures, molecular flexibility, and the degree of match to 

the pharmacophore profile of a reference library. The method was tested by 

means of a variety of weighting combinations and libraries, and the results were 

compared with the data obtained with ChemDiverse,105 which, as mentioned

earlier, uses a cherry-picking strategy. Both ChemDiverse and HARPick were 

able to improve considerably molecular selection based on pharmacophore 

count, compared to random selections, but HARPick calculations, which were 

set to purely maximize pharmacophore diversity, were able to find around twice 

the number of pharmacophores obtained by the comparable ChemDiverse runs.

As expected, however, the molecules chosen were substantially more flexible 

and “promiscuous.” Inclusion of the “quality” terms (which penalize undesir- 

able characteristics such as excessive conformational flexibility in the library 

members) reduced the pharmacophore scores of the final selections but not 

drastically (still better than random). As one might expect, selections made at 

random or via ChemDiverse gave sets of molecules that broadly followed the 

distribution of properties (such as the number of rotatable bonds in a molecule) 

observed in the whole Standard Drugs File (now known as the World Drug 

Index21). HARPick managed to produce a much more even distribution. In 

another evaluation of HARPick reported in Ref. 185, the program outper- 

formed random selections from the perspective of filling diversity voids in a 

reference library. Given our remarks about the difficulties in measuring general 

diversity, this is probably the best way in which such selection methods should 

be applied. 
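Product-based reagent selection by simulated annealing can be sketched in a few lines. This is a toy illustration, not HARPick: the score is simply the number of distinct pharmacophores expressed by the enumerated products, the pharmacophore sets are invented, and the linear cooling schedule and all function and parameter names are assumptions.

```python
import math
import random
from itertools import product

def product_diversity(sel_a, sel_b, pharmacophores):
    """Number of distinct pharmacophores expressed by the enumerated products."""
    covered = set()
    for a, b in product(sel_a, sel_b):
        covered |= pharmacophores[(a, b)]
    return len(covered)

def anneal(reagents_a, reagents_b, k, pharmacophores, steps=2000, t0=1.0, seed=0):
    """Select k reagents at each of two positions, keeping the best selection seen."""
    rng = random.Random(seed)
    pools = [reagents_a, reagents_b]
    sel = [rng.sample(reagents_a, k), rng.sample(reagents_b, k)]
    score = product_diversity(sel[0], sel[1], pharmacophores)
    best, best_score = [list(s) for s in sel], score
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9          # linear cooling
        pos = rng.randrange(2)
        unused = [r for r in pools[pos] if r not in sel[pos]]
        if not unused:
            continue
        slot, new = rng.randrange(k), rng.choice(unused)
        old = sel[pos][slot]
        sel[pos][slot] = new                           # trial move: swap one reagent
        trial = product_diversity(sel[0], sel[1], pharmacophores)
        if trial >= score or rng.random() < math.exp((trial - score) / temp):
            score = trial
            if score > best_score:
                best, best_score = [list(s) for s in sel], score
        else:
            sel[pos][slot] = old                       # reject: undo the swap
    return best, best_score

# Invented pharmacophore sets for an 8 x 8 virtual library
reagents = list(range(8))
pharmacophores = {(a, b): {("a", a // 2), ("b", b // 2), ("ab", (a + b) % 5)}
                  for a in reagents for b in reagents}
best, best_score = anneal(reagents, reagents, 3, pharmacophores, steps=500)
```

Because the score is evaluated on the products rather than on the reagents, swapping a single reagent changes the contribution of a whole row or column of the array, which is what makes a stochastic search attractive here.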

The primary feature emphasized by the calculations above is the control 

afforded to the user over both the components of the scoring function and the 

weights applied to them. In principle, any descriptor could be applied to the 

scoring functions. One could envisage maximizing functions (e.g., 3-D phar- 

macophore or 2-D fingerprint coverage, reagent supplier reliability), minimiz- 

ing functions (e.g., cost per reagent), partition functions (e.g., general shape, 

ClogP), and bounding functions (assigning a score of zero to products with 

properties outside specified bounds, e.g., minimum/maximum ClogP). In principle,

a totally customizable scoring function could be devised, with the user

able to choose the properties included in the scoring routine, and the functions 

used on them. Similar ideas are envisaged by Agrafiotis169 and have been implemented

by groups at various pharmaceutical companies. With careful application

of user weightings for each component function, the result would be a

totally flexible profiling paradigm.
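A user-weighted scoring function of the kind envisaged might look like the sketch below. The choice of terms, the dictionary field names, and the weights are hypothetical; the example combines one maximizing term (pharmacophore coverage), one minimizing term (reagent cost), and one bounding function (products outside the ClogP bounds contribute nothing).

```python
def library_score(products, weights, clogp_bounds=(0.0, 6.0)):
    """Weighted sum of a coverage term (maximize) and a cost term (minimize);
    products outside the ClogP bounds are ignored (bounding function)."""
    lo, hi = clogp_bounds
    in_bounds = [p for p in products if lo <= p["clogp"] <= hi]
    covered = set().union(*(p["pharmacophores"] for p in in_bounds))
    cost = sum(p["reagent_cost"] for p in in_bounds)
    return weights["coverage"] * len(covered) - weights["cost"] * cost

# Hypothetical products with precomputed properties
products = [
    {"pharmacophores": {1, 2, 3}, "clogp": 2.0, "reagent_cost": 10.0},
    {"pharmacophores": {3, 4}, "clogp": 3.0, "reagent_cost": 5.0},
    {"pharmacophores": {5, 6, 7, 8}, "clogp": 7.5, "reagent_cost": 1.0},  # out of bounds
]
score = library_score(products, {"coverage": 1.0, "cost": 0.1})
coverage_only = library_score(products, {"coverage": 1.0, "cost": 0.0})
```

Changing the weights changes the profile of the selected library without any change to the search procedure that calls the function.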


Gillet et al.194 have recently reported on the SELECT program, which is

similar in philosophy to HARPick but uses a genetic algorithm rather than 

simulated annealing. A product-based program, SELECT utilizes the Daylight 

structural fingerprints to optimize either the sum of dissimilarities or the aver- 

age nearest-neighbor distance of selected compounds. Interestingly, the pro- 

gram can also select the best configuration for a multicomponent library. Be- 

cause of the nature of the descriptors used, the program can be applied to 

virtual libraries of hundreds of thousands of products. Additional terms in the 

scoring function allow libraries to be designed with respect to an external 

reference and to have an appropriate spread of physicochemical properties. 

Constrained/Focused/Biased Library Design

In designing a library, it is of paramount importance to take account of all 

the available information. A general library design assumes no particular prior 

knowledge, but in many cases, there will be information that can be used. For 

instance, it might be desirable to bias a library away from a previous collection 

or library, or toward a set of compounds known to be active. In one case,195

Sheridan and Kearsley constrained their design to select tripeptoids similar to 

two tetrapeptide cholecystokinin (CCK) antagonists. In a second example, scor- 

ing was based on an angiotensin converting enzyme (ACE) “trend vector” 

summarizing the chemical features shared by known ACE inhibitors that differ 

from those of a general population of druglike molecules.195 Similar work has 

been reported by Cho et al. with their FOCUS-2-D method.196 Good and Lewis 

have shown how the HARPick program can be used in this context, selecting a 

set of reagents such that the generated products would fill diversity voids in the 

space occupied by the Standard Drugs File.185 

In related work, Pickett197 has used a genetic algorithm whose objective

function was the overlap in pharmacophores between one or more lead com- 

pounds and members of the proposed library. In the context of an ongoing 

medicinal chemistry program, Brown et al.198 have described the design of 

libraries biased toward the family of peroxisome proliferator-activated recep- 

tors (PPARs). In this instance, a phenoxybutyric acid group (present in known 

PPAR ligands) was incorporated as a “privileged” fragment at one diversity 

position. At the other two variable positions, molecular weight and synthetic 

considerations were used to filter reagents before subjecting them to an experi- 

mental design procedure to select a diverse set at each point. Deconvolution of 

the resulting library led to the identification of GW 2433 (Figure 7) as the first 

high affinity PPARδ ligand.

The most exciting situation, however, is where there is information con- 

cerning the structure of the receptor site that is being targeted. In this case, 

structure-based design and combinatorial chemistry can combine synergistically

to give enormous benefits.199,200 The structural information provides

a strong constraint for reagent selection, while combinatorial library

design ensures the rapid provision of synthetically accessible compounds, thus


Figure 7 Identification of GW 2433 (panels: biased library; GW 2433). The biased library

comprised a biasing fibrate monomer at R1. R2 and R3, derived from carboxylic acids

and isocyanates, were chosen for diversity by means of experimental design techniques.

overcoming a debilitating bottleneck in de novo/structure-based drug 

design.201,202 There is a growing number of published examples of structure-based

library design (see, e.g., Refs. 119 and 203-214). Perhaps the most

compelling example is that of Kick et al.118 In this work, the active site of

cathepsin D was used to constrain the selection of reagents at four variable 

positions on a scaffold based on a known inhibitor, pepstatin. The resulting 

library (1000 compounds) yielded a hit rate of 6-7% when screened at 1 µM,

with 7 compounds being active at 100 nM or less. The information gained from 

this initial library was used to design and synthesize a follow-up library yielding 

inhibitors in the range 9-15 nM. As a control, Kick et al. also designed a 

general, diverse library (also 1000 compounds) using 2-D similarity measures 

for screening against the enzyme. This library produced a hit rate of 2-3% at 1 µM,

with only one compound being active at 100 nM. From this example, the

incorporation of structural information into the library design can be seen to be 

extremely valuable. A similar method for structure-based library design, called 

PRO-SELECT has been reported by Murray and coworkers.lls This program 

was used to design inhibitors of thrombin based around a scaffold from a 

known covalent inhibitor, PPACK (D-Phe-Pro-Arg-chloromethylketone). 

About half the designed molecules were found to have micromolar activity, the 

best being a close PPACK analog (D-Phe-Pro-agmatine) which showed an inhib- 

itory concentration (IC50) of 40 nM. Thrombin also provided the target for the 

structure-based combinatorial library design described by Graybill et al.,215

although few computational details are given. 

DIVERSITY IS NOT THE BE-ALL AND 

END-ALL! 

In all work on the selection of compounds or reagents by means of molecular

diversity techniques, it is vital not to lose sight of other considerations.216

As Higgs et al. put it: “compounds must not be so diverse as to be

pharmaceutically unreasonable.”166 In their early work with a maximal

dissimilarity selection algorithm, Higgs et al. found that nearly all the compounds

selected were deemed pharmaceutically unreasonable by medicinal chemists.

They thus implemented a series of rules based on substructural queries,

molecular weight, and ClogP cutoffs, which they use to assign “demerits” to 

compounds. If any compound gains too many demerits, it is rejected-a fate 

that may be suffered by up to half of the molecules initially selected! The fact 

that 90% of the molecules in the CMC database (i.e., known drugs) caused one 

or more of the rules to fire underlines the need not to be too zealous in rejecting 

compounds with only one poor feature. 
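A demerit scheme of this general kind can be sketched as below. The rules, point values, and rejection threshold are invented for illustration and are not those used by Higgs et al.; the point of the sketch is only that a single poor feature accumulates some demerits without causing outright rejection.

```python
# Each rule: (description, test on a dict of precomputed properties, demerit points).
# All rules and point values here are hypothetical.
RULES = [
    ("molecular weight above 600", lambda m: m["mol_weight"] > 600, 50),
    ("ClogP above 6", lambda m: m["clogp"] > 6, 50),
    ("reactive substructure present", lambda m: m["has_reactive_group"], 100),
]

def total_demerits(mol, rules=RULES):
    """Sum the demerit points of every rule that fires for this molecule."""
    return sum(points for _, test, points in rules if test(mol))

def is_rejected(mol, max_demerits=100, rules=RULES):
    """Reject only when the accumulated demerits reach the threshold."""
    return total_demerits(mol, rules) >= max_demerits

one_flaw = {"mol_weight": 650.0, "clogp": 3.0, "has_reactive_group": False}
two_flaws = {"mol_weight": 650.0, "clogp": 6.8, "has_reactive_group": False}
```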

In a similar vein, Lewis et al.74 describe a series of substructural filters

applied during the creation of the diverse property-derived sets. These rules are 

designed to eliminate molecules containing toxic or very reactive substructures 

such as reactive epoxides, acyclic aminals or acid anhydrides.217 Also rejected 

are other molecules that exhibit a wide range of biological activities (e.g., pros- 

taglandins, prostacyclins, or thromboxanes) and are thus unsuitable for general 

screening. A similar “badlist” was developed by Lajiness at Pharmacia and 

Upjohn.145 More recently, at RPR, we have implemented a set of alerting rules 

for compounds that contain chromophores that absorb in the range above 300 

nm. Such compounds may interfere with certain assays and thereby reduce the 

accuracy of high-throughput screening (HTS) data. 

With increasing importance being attached to the early detection of com- 

pounds likely to be problematic from an absorption, distribution, metabolism, 

and excretion (ADME) viewpoint,218-221 at RPR we sought to apply computational

measures for the prediction of intestinal absorption (a key requirement

for an orally bioavailable compound) during the design of lead optimization

libraries. To this end, we implemented the popular “rule-of-5” criteria 

described by Lipinski et al.222 A compound is deemed to fail the rule-of-5 check 

(and thereby to be possibly deficient from an oral absorption/permeability as- 

pect) if it possesses two or more of the following features: 

• more than 5 hydrogen bond donors (i.e., N-H or O-H bonds) 

• more than 10 hydrogen bond acceptors (i.e., any N or O, including those in donors) 

• a ClogP value of greater than 5.0 (or an MlogP223 value > 4.15) 

• a molecular weight of greater than 500.0 
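The check just described is easy to express in code. The following is a minimal sketch, assuming that the four properties have already been computed by some external descriptor package; the function names are ours and purely illustrative.

```python
def rule_of_5_violations(h_donors, h_acceptors, clogp, mol_weight):
    """Return the rule-of-5 features that a compound exhibits."""
    checks = [
        ("more than 5 hydrogen bond donors", h_donors > 5),
        ("more than 10 hydrogen bond acceptors", h_acceptors > 10),
        ("ClogP greater than 5.0", clogp > 5.0),
        ("molecular weight greater than 500.0", mol_weight > 500.0),
    ]
    return [label for label, violated in checks if violated]


def fails_rule_of_5(h_donors, h_acceptors, clogp, mol_weight):
    """A compound fails the check if two or more features are present."""
    violations = rule_of_5_violations(h_donors, h_acceptors, clogp, mol_weight)
    return len(violations) >= 2
```

Returning the list of violated features, rather than only a pass/fail flag, makes it straightforward to report why a given library member was flagged.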

At RPR we also developed computational alerts based on the work of Palm et 

al.224-226 and Winiwarter and coworkers.227 Both these groups demonstrated 

a strong correlation between polar molecular surface area (PSA) and human 

intestinal absorption. Of particular interest is the observation that molecules 

with a PSA of greater than 140 Å² are likely to show poor (< 10%) fractional 

absorption. Our own research has confirmed this observation, and we have

extended the methods to develop a QSAR model for predicting blood-brain 

barrier penetration.228,229 Our implementation of the polar surface area 

calculations is sufficiently rapid to allow the profiling of large (virtual) com- 

pound collections on a routine basis. This permits the inclusion of ADME- 

related parameters in the process of product-based reagent selection.142 In this 

way, we can attempt to ensure that the library compounds will have good 

pharmacokinetic properties, thus facilitating the hit-to-lead transition. 
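As an illustration of routine profiling, the sketch below flags members of a virtual library whose PSA exceeds the 140 Å² level noted above. It assumes the PSA values have already been calculated and are supplied as a mapping from compound identifier to PSA; the names are hypothetical and this is not the RPR implementation.

```python
PSA_ALERT_THRESHOLD = 140.0  # in square angstroms, the level quoted in the text


def psa_alert(psa):
    """True if the polar surface area suggests poor fractional absorption."""
    return psa > PSA_ALERT_THRESHOLD


def profile_library(psa_by_id):
    """Return the flagged identifiers (sorted) and the fraction flagged."""
    flagged = sorted(cid for cid, psa in psa_by_id.items() if psa_alert(psa))
    fraction = len(flagged) / len(psa_by_id) if psa_by_id else 0.0
    return flagged, fraction
```

The flagged fraction gives a quick summary that can be compared across candidate reagent sets before any product-based selection is attempted.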

CURRENT ISSUES AND FUTURE DIRECTIONS 

In a field that is far from mature, there are necessarily many issues to be 

addressed and myriad possible future directions that research must explore.18 

Here, we highlight a few of the current issues/debates in the field and suggest 

possible avenues for future work. We have touched on several issues above, and 

the reader is also directed to the reviews by Martin230 and Mason and 

Hermsmeier.231 

Diversity Descriptors 

There are many issues surrounding the way that “diversity space” is 

described. As we have mentioned, the popular 2-D bit string or fingerprint 

descriptors were originally designed for 2-D substructure-searching applica- 

tions, and it remains unclear whether these are truly optimal for diversity 

calculations.70 The debate that has raged over 2-D versus 3-D descriptors has, 

perhaps, generated more heat than light. It is likely that each type of descriptor 

has its place in the process of diversity analysis and library design, but a con- 

sensus on this matter has yet to be reached. Nonetheless, it would appear that 

several groups are trying three-dimensional measures of diversity which more 

accurately reflect ligand-receptor interactions. Unfortunately, this leads to in- 

creased computational effort, limits in the description of conformational space 

(e.g., neglect of solvent effects in most cases), and the need for tailored diversity 

measures. 
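For readers less familiar with bit-string descriptors, their similarity is commonly scored with the Tanimoto coefficient, and a dissimilarity for diversity work is taken as its complement. The sketch below assumes, purely for illustration, that a fingerprint is held as a set of "on" bit positions rather than in any particular vendor format.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared 'on' bits divided by total distinct 'on' bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / union


def dissimilarity(bits_a, bits_b):
    """Complement of the Tanimoto coefficient, as often used in diversity selection."""
    return 1.0 - tanimoto(bits_a, bits_b)
```

Whether a measure of this kind, built on descriptors designed for substructure searching, is the best basis for diversity calculations is exactly the open question raised above.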

In terms of 3-D descriptors, there remains the need for a useful, computa- 

tionally expedient descriptor of molecular shape. Another question is whether 

complementary site points should be included in 3-D descriptors as advocated 

by some workers.230,232 Can molecular field information be included in 3-D 

descriptors in a manner similar to the way it has been incorporated into experi- 

mental 3-D similarity searching systems?233 How should tautomeric and ioniza- 

tion states be handled? These are all questions worthy of future research. 

With both 2-D and 3-D descriptors, the thorny issue of how to validate 

descriptors is still an open question. It is clear that we would like to have


descriptors that relate better to biological activity,230 but proving that this is 

indeed the case for a given descriptor is a task fraught with difficulties. A key 

issue in descriptor validation is how to define a reference set that is meant to 

typify the universal set of actives, and possibly inactives. One approach has 

been to use the World Drug Index21 to define the set of active compounds and 

the Spresi database130 to define the inactives. The WDI must be used carefully 

and selectively because it contains many classes that are inappropriate (e.g., 

disinfectants, dentifrices). The next question is, How valid is it to compare 

central nervous system (CNS) drugs with topical steroids with anticancer 

drugs? The danger is that the analysis will tend to produce the lowest common 

denominator (like the rule of 5),222 rather than a stunning insight into molecu- 

lar diversity. There is also the issue of reverse sampling: How valid is it to deduce 

the properties of the universal set of biologically active molecules from a subset? 

The properties of previous drugs may have been driven mainly by bio- 

availability, or toward making analogs of a natural substrate. Using these data 

forces an unnatural conservatism into our diversity models. 

It is also interesting to reflect on what is meant by activity and inactivity. 

Any molecule will bind to any receptor, although the affinity may have any 

value between picomolar and gigamolar. If the binding event is viewed in terms 

of molecular interactions, then interesting, specific binding can be characterized 

by affinity constants lower than 1000 nM. However, it is not uncommon to find 

affinity constants of 1000 nM that are mainly due to solvophobic interactions 

forcing the ligand to associate with the receptor (particularly for hydrophobic 

compounds like steroids). At 100 nM, some specific noncovalent interactions 

are being formed, and at levels below 10 nM, there are many highly specific 

interactions present. It should be clear that the activity is a continuous phenom- 

enon, and that drawing an arbitrary division is a hazardous ploy. Furthermore, 

while one can be fairly sure why a compound is active, it is much harder to say 

precisely why a compound is inactive. Was it the wrong pharmacophore, a steric 

bump, poor solubility, metabolic alteration, or something else? Despite all these 

caveats, several research groups have followed such an approach and claim to 

be able to distinguish a potential active from a potential inactive, with reason- 

able confidence. Such results cannot be ignored, and they will be of use in the 

early phases of library design, where the basic feasibility of the library and the 

reaction are being considered. 
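The continuous nature of activity is easier to appreciate on a free energy scale. The sketch below converts the affinity levels mentioned above into standard binding free energies using the textbook relation ΔG° = RT ln Kd. It assumes that the quoted affinity constants can be treated as dissociation constants, a 1 M standard state, and a temperature of 298.15 K; the function name is ours.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)


def binding_free_energy(kd_nanomolar, temperature_k=298.15):
    """Standard binding free energy (kJ/mol) for a dissociation constant in nM."""
    kd_molar = kd_nanomolar * 1e-9
    return R_KJ_PER_MOL_K * temperature_k * math.log(kd_molar)


for kd in (1000.0, 100.0, 10.0):
    print(f"Kd = {kd:6.0f} nM  ->  dG = {binding_free_energy(kd):6.1f} kJ/mol")
```

Each tenfold improvement in affinity corresponds to the same increment of RT ln 10 in free energy, so there is no natural break in the scale at which a compound switches from inactive to active.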

The realization that “mere diversity”216 is not sufficient in practical li- 

brary design has driven much recent work in the direction of biasing design 

toward compounds with more “druglike” properties. The challenge here is 

defining the term “druglike.” Several groups have attempted to tackle this 

problem,136,234-236 but some of the arguments used earlier (see section on 

Validation of Descriptors) also apply here. How can the non-druglike space be 

adequately defined? Physical properties or other measures such as polar surface 

area can be included in the design, but how should these be weighted with 

respect to diversity? Should compounds falling outside the bounds simply be

excluded from further consideration? If such hard cutoffs are applied, it is not 

always possible to identify a truly combinatorial subset of a virtual library. 

Pickett142 has implemented a simulated annealing procedure that attempts to 

find the solution closest to a true combinatorial subset within a number of user- 

defined constraints. 
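To make the idea concrete, the following is a generic simulated annealing sketch for a two-component library. It is not the published procedure: the move set, the cooling schedule, and the objective (the number of products in the selected array that satisfy the constraints) are all assumptions made for illustration, and the set of acceptable products is taken as given.

```python
import math
import random


def score(sel_a, sel_b, acceptable):
    """Number of products in the sel_a x sel_b array that pass the constraints."""
    return sum((a, b) in acceptable for a in sel_a for b in sel_b)


def anneal_subset(reagents_a, reagents_b, acceptable, n_a, n_b,
                  steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Search for n_a x n_b reagents whose full array is as acceptable as possible."""
    rng = random.Random(seed)
    sel_a = rng.sample(reagents_a, n_a)
    sel_b = rng.sample(reagents_b, n_b)
    current = score(sel_a, sel_b, acceptable)
    best = (current, list(sel_a), list(sel_b))
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Move: swap one selected reagent for an unselected one, in A or in B.
        if rng.random() < 0.5:
            pool, sel = reagents_a, sel_a
        else:
            pool, sel = reagents_b, sel_b
        unused = [r for r in pool if r not in sel]
        if not unused:
            continue
        i = rng.randrange(len(sel))
        old = sel[i]
        sel[i] = rng.choice(unused)
        trial = score(sel_a, sel_b, acceptable)
        # Metropolis criterion: always accept improvements, sometimes accept losses.
        if trial >= current or rng.random() < math.exp((trial - current) / t):
            current = trial
            if current > best[0]:
                best = (current, list(sel_a), list(sel_b))
        else:
            sel[i] = old  # reject the move
    return best
```

Because every candidate solution is itself a full array of reagents, the result is always a true combinatorial subset; the constraints are handled softly through the score rather than by excluding compounds in advance.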

As a final note in this section, several years ago Martin230 suggested a 

competition (similar to the CASP competition for protein structure predic- 

tion237) for assessing descriptors. This would presumably involve the computa- 

tion of the diversity of a defined library by several different research teams, each 

using its own favored approaches. The results of each team would then be com- 

pared to some pre-agreed experimental determination of diversity. This would 

be interesting if it could ever be arranged! 

Library Design 

In terms of sampling diversity space, it would seem that stochastic selec- 

tion algorithms are becoming popular for combinatorial library design. Ad- 

vances in technology now allow many robots to handle noncombinatorial li- 

braries, but reagent cost remains a big issue. It is possible to include cost within 

the selection process, but again this has to be carefully balanced with diversity 

(or similarity in a focused library). Product-based reagent selection would seem 

to be demonstrably superior to reagent-based approaches186 but, depending on 

the type of descriptors used, may still be problematic in terms of CPU time for 

very large libraries. Thus, from a practical point of view, a two-step process of 

reagent selection may constitute a workable compromise, with an initial 

reagent-based filtering step preceding the full product-based selection. 
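One way to picture such a two-step process is sketched below. The reagent filter and the product scoring function are supplied by the caller and are hypothetical, and the product-based step is reduced here to a simple heuristic (ranking each surviving reagent by the mean score of the products in which it takes part) rather than a full optimization.

```python
from itertools import product
from statistics import mean


def two_step_selection(reagents_a, reagents_b, reagent_ok, product_score, n_a, n_b):
    """Filter reagents cheaply, then rank the survivors by their products."""
    # Step 1: reagent-based filter, evaluated once per reagent.
    kept_a = [a for a in reagents_a if reagent_ok(a)]
    kept_b = [b for b in reagents_b if reagent_ok(b)]
    if not kept_a or not kept_b:
        return [], [], 0
    # Step 2: product-based scoring, restricted to the reduced virtual library.
    scores = {(a, b): product_score(a, b) for a, b in product(kept_a, kept_b)}

    def rank(kept, axis, n):
        avg = {r: mean(s for pair, s in scores.items() if pair[axis] == r)
               for r in kept}
        return sorted(kept, key=lambda r: avg[r], reverse=True)[:n]

    return rank(kept_a, 0, n_a), rank(kept_b, 1, n_b), len(scores)
```

The third returned value is the number of products that actually had to be scored, which shows directly how much work the initial filter saved.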

The area of structure-based library design is one that promises much in the 

coming years. Currently, most reported approaches use the approximation of a 

fixed scaffold in the site (see, e.g., Refs. 115 and 118). This could be overcome 

by allowing some limited relaxation or docking after the attachment of each 

combination of R groups. Of crucial importance is the continuing search for 

better binding affinity prediction algorithms.230 Approaches to this problem 

range from empirical scoring functions117,238,239 to more detailed treatments 

based on Monte Carlo240 or molecular dynamics241 simulations to full free 

energy perturbation methods.242 In realistic terms, it is likely that only empiri- 

cal approaches will be applicable to library design in the near future, but con- 

tinuing theoretical and methodological improvements, coupled with the in- 

creases in computer speed combined with parallelization, should eventually lead 

to improved structure-based designs. 

Finally, even in cases where we may be able to show that our designed 

libraries are “better” than random, how close are they to being optimal? To 

answer this question, we need to have an external definition of optimality, 

which does not exist at present. What is required is accurate screening results on 

a large library, from which we try to select a sublibrary. It should be noted that

the optimality test will be valid only for that library and that set of screening 

data. 
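Given complete screening data of this kind, the comparison could be carried out along the following lines. This is only a sketch under the assumption that the screen yields a simple set of hits; the function names are ours, and "optimal" is taken to mean the highest hit rate attainable for a sublibrary of the same size.

```python
def hit_rate(compounds, hits):
    """Fraction of the given compounds that are hits."""
    if not compounds:
        raise ValueError("no compounds supplied")
    return sum(c in hits for c in compounds) / len(compounds)


def enrichment(selected, library, hits):
    """Hit rate of the selection relative to the whole library (random expectation)."""
    base = hit_rate(library, hits)
    if base == 0.0:
        raise ValueError("library contains no hits")
    return hit_rate(selected, hits) / base


def fraction_of_optimum(selected, hits):
    """Hit rate of the selection relative to the best possible same-size selection."""
    best = min(len(selected), len(hits)) / len(selected)
    if best == 0.0:
        raise ValueError("library contains no hits")
    return hit_rate(selected, hits) / best
```

An enrichment above 1 shows that a design is better than random, whereas the second ratio indicates how far it still is from the best that could have been achieved on that particular library and assay.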

Speed Requirement 

As we mentioned earlier, the time that is available for each diversity task 

will likely depend on the nature of the task. Reagent selection may need to be 

done in a hurry, whereas compound acquisition studies may be afforded rather 

more time. In the former case, it is clear that the computer time required for 

diversity analysis/library design must not exceed that available (possibly only 

days if the library chemistry is already developed, longer if the chemistry is 

new). For many product-based reagent selection approaches, CPU time is at 

present a very real obstacle to what might be done. It is to be hoped that more 

efficient algorithms and exploitation of parallel computation techniques will 

help alleviate the current difficulties. More fundamentally, the development of 

approaches based on Markush representations may offer a solution in instances 

where only simple 2-D descriptors are employed.243 

“Quick and Dirty” QSAR 

The process of library design is an iterative rather than a “one-off” pro- 

cedure. Once the first library has been assayed, the next question is, What to 

make next? In the modern pharmaceutical discovery milieu, the computational 

chemist needs to answer this question quickly to have an effective input in 

selecting the next synthetic targets. Clearly, there is a requirement for quantita- 

tive structure-activity relationships and other data-mining techniques to extract 

relationships from the HTS data resulting from large libraries. Martin230 sug- 

gests that QSAR techniques need to be able to handle 10⁵ compounds rather 

than the relatively small data sets (ca. 10²) usually studied at present. Methods 

are also required to cope with noisy, incomplete, or binary (results simply 

expressed as “+” or “-” ) biological activity data. Hence the expression “quick 

and dirty QSAR” has come into use. Some approaches to these problems are 

being reported,244,245 and it is possible that fuzzy methods may also have 

a part to play. Certainly, there is much room for further research in this 

area. 
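As a deliberately crude example of what can be done with binary data, the sketch below tabulates, for each structural feature, how many tested compounds carry it and what fraction of those were active. It is not one of the published approaches cited above; the feature names and the minimum-count cutoff are assumptions for illustration.

```python
from collections import defaultdict


def feature_hit_rates(records, min_count=2):
    """records: iterable of (set_of_features, is_active) pairs.

    Returns {feature: (n_compounds, fraction_active)} for features seen
    in at least min_count compounds.
    """
    counts = defaultdict(lambda: [0, 0])  # feature -> [n_compounds, n_active]
    for features, active in records:
        for f in features:
            counts[f][0] += 1
            if active:
                counts[f][1] += 1
    return {f: (n, n_active / n)
            for f, (n, n_active) in counts.items() if n >= min_count}
```

A single pass over the data suffices, so the cost grows only linearly with library size, and features that occur too rarely to be trusted in noisy data are simply dropped.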

Integration with Other Modeling Tools 

A further issue is how to link diversity tools effectively with extant modeling 

programs. For instance, if a partitioning scheme were being used for analyzing 

diversity space, it might be possible to use de novo design techniques to 

suggest compounds to fill currently empty cells.18,230 Indeed, Pearlman246 is 

working on a program called EAInventor to do just this in conjunction with his 

DiverseSolutions2~~ package.
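Locating the empty cells is the simple part of such a scheme, as the following sketch suggests. It assumes a low-dimensional descriptor space divided into equal-width bins along each axis; the function names are ours and do not correspond to any of the programs mentioned.

```python
from itertools import product


def cell_of(point, lows, highs, bins):
    """Map a descriptor vector to the index tuple of the cell that contains it."""
    cell = []
    for x, lo, hi in zip(point, lows, highs):
        i = int((x - lo) / (hi - lo) * bins)
        cell.append(min(max(i, 0), bins - 1))  # clamp values at or beyond the edges
    return tuple(cell)


def empty_cells(points, lows, highs, bins):
    """List the cells of the partitioned space that contain no compound."""
    occupied = {cell_of(p, lows, highs, bins) for p in points}
    all_cells = product(range(bins), repeat=len(lows))
    return sorted(c for c in all_cells if c not in occupied)
```

The list of empty cells could then be handed to a structure generator as a set of target property ranges, although the number of cells grows rapidly with the number of axes and bins.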

Persuading the Customers 

Last but not at all least, there is the issue of getting buy-in from the 

medicinal chemists. It is not always easy to convince those tasked with library 

synthesis of the benefits of computational reagent selection. Many still prefer to 

stick with their experience and intuition as to “what will work.” Of course, this 

accumulated wisdom should not be ignored and, in practice, a compromise 

between human and computer selection may be the best way forward. Yet 

nothing succeeds like success, and it has already been demonstrated at various 

pharmaceutical companies that the adoption of library design will accelerate 

when it is associated with the discovery of novel leads at a rate far faster than 

that which can be simply explained away by its detractors. The analogous 

situation existed a few years ago in the field of structure-based drug design, 

which really took off only after the publication of potent new leads, particularly 

by groups working on HIV-1 protease.47 

CONCLUSIONS 

The term “diversity” is hard to define conceptually. In a practical sense, 

diversity analysis is a design strategy that attempts to maximize the hit rate of 

HTS experiments, and validation should be in terms of this goal. It is important 

to maintain a pragmatic approach187: “diversity” is not the be-all and end-all. 

This is especially so when one is designing structure-based libraries, where 

diversity is perhaps only a weak contributor to a good design. The best selection 

is likely to be neither arbitrary nor maximally diverse.14 

Finally, we reemphasize that this research area is still young: developments 

are occurring rapidly, driven by other new technologies in drug discovery re- 

search. This chapter represents a personal snapshot taken by the authors. “It is 

impossible to predict the contents of an article written in 10 years on the subject 

of molecular diversity.”230 

ACKNOWLEDGMENTS 

We thank our colleagues, past and present, for their help and insights in the field of molecu- 

lar diversity and combinatorial library design. In particular, we acknowledge the contributions of 

present and past coworkers at Rhône-Poulenc Rorer (Aventis): Iain McLay (now at Glaxo Well- 

come), Paul Menard, Claude Luttmann, Isabelle Morize, Jon Mason, and Andrew Good (the last 

two now at Bristol-Myers Squibb). 

REFERENCES 

1. B. Merrifield, J. Am. Chem. Soc., 85, 2149 (1963). Solid Phase Peptide Synthesis. I. The 

Synthesis of a Tetrapeptide.


2. C. Desai, R. N. Zuckermann, and W. H. Moos, Drug Dev. Res., 33, 174 (1994). Recent 

Advances in the Generation of Chemical Diversity Libraries. 

3. M. Geysen, S. Barteling, and R. Moelen, Proc. Natl. Acad. Sci. USA, 81, 3998 (1984). Use of 

Peptide Synthesis to Probe Viral Antigens for Epitopes to a Resolution of a Single Amino 

Acid. 

4. R. A. Houghten, Proc. Natl. Acad. Sci. USA, 82, 5131 (1985). General Method for the Rapid 

Solid-Phase Synthesis of Large Numbers of Peptides: Specificity of Antigen-Antibody Inter- 

action at the Level of Individual Amino Acids. 

5. K. S. Lam, S. E. Salmon, E. M. Hersh, V. J. Hruby, W. M. Kazmierski, and R. J. Knapp, 

Nature, 354, 82 (1991). A New Type of Synthetic Peptide Library for Identifying Ligand- 

Binding Activity. 

6. L. A. Thompson and J. A. Ellman, Chem. Rev., 96, 555 (1996). Synthesis and Applications of 

Small Molecule Libraries. 

7. E. M. Gordon, M. A. Gallop, and D. V. Patel, Acc. Chem. Res., 29, 144 (1996). Strategy and 

Tactics in Combinatorial Organic Synthesis. Applications to Drug Discovery. 

8. F. Balkenhohl, C. von dem Bussche-Huennefeld, A. Lansky, and C. Zechel, Angew. Chem. 

Int. Ed. Engl., 35, 2288 (1996). Combinatorial Synthesis of Small Organic Molecules. 

9. E. R. Felder and D. Poppinger, Adv. Drug Res., 30, 111 (1997). Combinatorial Compound 

Libraries for Enhanced Drug Discovery Approaches. 

10. D. Brown, Mol. Diversity, 2, 217 (1997). Future Pathways for Combinatorial Chemistry. 

11. P. L. Myers, Curr. Opin. Biotechnol., 8, 701 (1997). Will Combinatorial Chemistry Deliver 

Real Medicines? 

12. R. E. Dolle, Mol. Diversity, 3, 199 (1998). Comprehensive Survey of Chemical Libraries 

Yielding Enzyme Inhibitors, Receptor Agonists and Antagonists, and Other Biologically 

Active Agents: 1992 Through 1997. 

13. J.-L. Fauchere, J. A. Boutin, J.-M. Henlin, N. Kucharczyk, and J.-C. Ortuno, Chemom. Intell. 

Lab. Syst., 43 (1,2), 43 (1998). Combinatorial Chemistry for the Generation of Molecular 

Diversity and the Discovery of Bioactive Leads. 

14. J. M. Blaney and E. J. Martin, Curr. Opin. Chem. Biol., 1, 54 (1997). Computational 

Approaches for Combinatorial Library Design and Molecular Diversity Analysis. 

15. E. J. Martin, D. C. Spellmeyer, R. E. Critchlow Jr., and J. M. Blaney, in Reviews in Computa- 

tional Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1997, 

Vol. 10, pp. 75-100. Does Combinatorial Chemistry Obviate Computer-Aided Drug 

Design? 

16. M. G. Bures and Y. C. Martin, Curr. Opin. Chem. Biol., 2, 376 (1998). Computational 

Methods in Molecular Diversity and Combinatorial Chemistry. 

17. D. K. Agrafiotis, J. C. Myslik, and F. R. Salemme, Mol. Diversity, 4, 1 (1999). Advances in 

Diversity Profiling and Combinatorial Series Design. 

18. P. Willett, Perspect. Drug Discovery Des., 7/8, 1 (1997). Computational Tools for the Analy- 

sis of Molecular Diversity. For more recent material, see: D. K. Agrafiotis and E. J. Martin, 

J. Mol. Graphics Modell., 18, (3/4), in press (2000). Combinatorial Library Design. 

19. H. Kubinyi, Perspect. Drug Discovery Des., 9/10/11, 225 (1998). Similarity and 

Dissimilarity: A Medicinal Chemist's View. 

20. G. Sello, J. Chem. Inf. Comput. Sci., 38, 691 (1998). Similarity Measures: Is It Possible to 

Compare Dissimilar Structures? 

21. World Drug Index. Derwent Information, http://www.derwent.com/. 

22. E. J. Martin, R. E. Critchlow Jr., D. C. Spellmeyer, S. Rosenberg, K. L. Spear, and J. M. 

Blaney, Pharmacochem. Libr., 29, 133 (1998). Diverse Approaches to Combinatorial Li- 

brary Design. 

23. R. S. Bohacek, C. McMartin, and W. C. Guida, Med. Res. Rev., 16, 3 (1996). The Art and 

Practice of Structure-Based Drug Design. 

24. H. Kubinyi, Curr. Opin. Drug Discovery Dev., 1, 4 (1998). Structure-Based Design of En- 

zyme Inhibitors and Receptor Ligands.

25. P. M. Dean, Molecular Foundations of Drug-Receptor Interaction, Cambridge University 

Press, Cambridge, 1987. 

26. W. P. Jencks, in Chemical Recognition in Biology, F. Chapeville and A.-L. Haenni, Eds., 

Springer-Verlag, Berlin, 1980, pp. 3-25. What Everyone Wanted to Know About Tight 

Binding and Catalysis, But Never Thought of Asking. 

27. H.-J. Böhm and G. Klebe, Angew. Chem. Int. Ed. Engl., 35, 2588 (1996). What Can We 

Learn from Molecular Recognition in Protein-Ligand Complexes for the Design of New 

Drugs? 

28. R. L. Babine and S. L. Bender, Chem. Rev., 97, 1359 (1997). Molecular Recognition of 

Protein-Ligand Complexes: Application to Drug Design. 

29. G. Klebe and H.-J. Böhm, J. Recept. Signal. Transduction Res., 17, 459 (1997). Energetic and 

Entropic Factors Determining Binding Affinity in Protein-Ligand Complexes. 

30. D. H. Williams, Chem. Soc. Rev., 28, 57 (1998). Aspects of Weak Interactions. 

31. J. R. H. Tame, J. Comput.-Aided Mol. Des., 13, 99 (1999). Scoring Functions: A View from 

the Bench. 

32. A. R. Fersht, J.-P. Shi, J. Knill-Jones, D. M. Lowe, A. J. Wilkinson, D. M. Blow, P. Brick, P. 

Carter, M. M. Y. Waye, and G. Winter, Nature, 314, 235 (1985). Hydrogen Bonding and 

Biological Specificity Analyzed by Protein Engineering. 

33. A. Horovitz, L. Serrano, B. Avron, M. Bycroft, and A. R. Fersht, J. Mol. Biol., 216, 1031 

(1990). Strength and Cooperativity of Contributions of Surface Salt Bridges to Protein 

Stability. 

34. A. J. Doig and D. H. Williams, J. Am. Chem. Soc., 114, 338 (1992). Binding Energy of an 

Amide-Amide Hydrogen Bond in Aqueous and Nonpolar Solvents. 

35. P. L. Chau and P. M. Dean, J. Comput.-Aided Mol. Des., 8, 513 (1994). Electrostatic 

Complementarity Between Proteins and Ligands. 1. Charge Disposition, Dielectric and 

Interface Effects. 

36. P. L. Chau and P. M. Dean, J. Comput.-Aided Mol. Des., 8, 527 (1994). Electrostatic 

Complementarity Between Proteins and Ligands. 2. Ligand Moieties. 

37. P. L. Chau and P. M. Dean, J. Comput.-Aided Mol. Des., 8, 545 (1994). Electrostatic 

Complementarity Between Proteins and Ligands. 3. Structural Basis. 

38. D. Eisenberg and A. D. McLachlan, Nature, 319, 199 (1986). Solvation Energy in Protein 

Folding and Binding. 

39. A. Ben-Naim, Hydrophobic Interactions, Plenum Press, New York, 1980. 

40. D. G. Alberg and S. L. Schreiber, Science, 262, 248 (1993). Structure-Based Design of a 

Cyclophilin-Calcineurin Bridging Ligand. 

41. A. R. Khan, J. C. Parrish, M. E. Fraser, W. W. Smith, P. A. Bartlett, and M. N. G. James, 

Biochemistry, 37, 16839 (1998). Lowering of the Entropic Barrier for Binding Conforma- 

tionally Flexible Inhibitors to Enzymes. 

42. B. J. Stockman, Prog. Nucl. Magn. Reson. Spectrosc., 33, 109 (1998). NMR Spectroscopy as 

a Tool for Structure-Based Drug Design. 

43. J. T. Stivers, C. Abeygunawardana, A. S. Mildvan, and C. P. Whitman, Biochemistry, 35, 

16036 (1996). 15N NMR Relaxation Studies of Free and Inhibitor-Bound 4-Oxalocrotonate 

Tautomerase: Backbone Dynamics and Entropy Changes of an Enzyme upon Inhibitor 

Binding. 

44. L. K. Nicholson, T. Yamazaki, D. A. Torchia, S. Grzesiek, A. Bax, S. J. Stahl, J. D. Kaufman, 

P. T. Wingfield, P. Y. S. Lam, P. K. Jadhav, C. N. Hodge, P. J. Domaille, and C.-H. Chang, 

Nat. Struct. Biol., 2, 274 (1995). Flexibility and Function in HIV-1 Protease. 

45. X. Leng, S. Y. Tsai, B. W. O'Malley, and M. J. Tsai, J. Steroid Biochem. Mol. Biol., 46, 643 

(1993). Ligand-Dependent Conformational Changes in Thyroid Hormone and Retinoic 

Acid Receptors Are Potentially Enhanced by Heterodimerization with Retinoic X Receptor. 

46. A. M. Davis and S. J. Teague, Angew. Chem. Int. Ed. Engl., 38, 736 (1999). Hydrogen 

Bonding, Hydrophobic Interactions, and Failure of the Rigid Receptor Hypothesis.

47. A. Wlodawer and J. Vondrasek, Annu. Rev. Biophys. Biomol. Struct., 27, 249 (1998). 

Inhibitors of HIV-1 Protease: A Major Success of Structure-Assisted Drug Design. 

48. A. R. Leach, J. Mol. Biol., 235, 345 (1994). Ligand Docking to Proteins with Discrete 

Sidechain Flexibility. 

49. G. Jones, P. Willett, and R. C. Glen, J. Mol. Biol., 245, 43 (1995). Molecular Recognition of 

Receptor Sites Using a Genetic Algorithm with a Description of Desolvation. 

50. V. Schnecke, C. A. Swanson, E. D. Getzoff, J. A. Tainer, and L. A. Kuhn, Proteins: Struct., 

Funct., Genet., 33, 74 (1998). Screening a Peptidyl Database for Potential Ligands to Pro- 

teins with Side-Chain Flexibility. 

51. B. Sandak, R. Nussinov, and H. J. Wolfson, J. Comput. Biol., 5, 631 (1998). A Method for 

Biomolecular Structural Recognition and Docking Allowing Conformational Flexibility. 

52. F. A. Quiocho, D. K. Wilson, and N. K. Vyas, Nature, 340, 404 (1989). Substrate Specificity 

and Affinity of a Protein Modulated by Bound Water Molecules. 

53. M. L. Raymer, P. C. Sanschagrin, W. F. Punch, S. Venkataraman, E. D. Goodman, and L. A. 

Kuhn, J. Mol. Biol., 265, 445 (1997). Predicting Conserved Water-Mediated and Polar 

Ligand Interactions in Proteins Using a K-Nearest-Neighbors Genetic Algorithm. 

54. V. A. Makarov, B. K. Andrews, and B. M. Pettitt, Biopolymers, 45, 469 (1998). Reconstruct- 

ing the Protein-Water Interface. 

55. M. Feig and B. M. Pettitt, Structure, 6, 1351 (1998). Crystallographic Water Sites from a 

Theoretical Perspective. 

56. M. Rarey, B. Kramer, T. Lengauer, and G. Klebe, J. Mol. Biol., 261, 470 (1996). A Fast 

Flexible Docking Method Using an Incremental Construction Algorithm. 

57. M. Rarey, B. Kramer, and T. Lengauer, Proteins: Struct., Funct., Genet., 34, 17 (1999). The 

Particle Concept: Placing Discrete Water Molecules During Protein-Ligand Docking 

Predictions. 

58. E. F. Meyer, I. Botos, L. Scapozza, and D. Zhang, Perspect. Drug Discovery Des., 3, 168 

(1995). Backward Binding and Other Structural Surprises. 

59. G. D. Diana, A. M. Treasurywala, T. R. Bailey, R. C. Oglesby, D. C. Pevear, and F. J. Dutko, 

J. Med. Chem., 33, 1306 (1990). A Model for Compounds Active Against Human Rhi- 

novirus-14 Based on X-Ray Crystallography Data. 

60. R. D. Brown, Perspect. Drug Discovery Des., 7/8, 31 (1997). Descriptors for Diversity 

Analysis. 

61. R. S. Pearlman, Chem. Des. Autom. News, 2 (1), 1 (1987). Rapid Generation of High Quality 

Approximate 3D Molecular Structures. 

62. J. Sadowski and J. Gasteiger, Chem. Rev., 93, 2567 (1993). From Atoms and Bonds to Three- 

Dimensional Atomic Coordinates. 

63. N. E. Shemetulskis, D. Weininger, C. J. Blankley, J. J. Yang, and C. Humblet, J. Chem. Inf. 

Comput. Sci., 36, 862 (1996). Stigmata: An Algorithm to Determine Structural Com- 

monalities in Diverse Datasets. 

64. P. Willett, V. Winterman, and D. Bawden, J. Chem. Inf. Comput. Sci., 26, 109 (1986). 

Implementation of Nonhierarchical Cluster Analysis Methods in Chemical Information 

Systems: Selection of Compounds for Biological Testing and Clustering of Substructure 

Search Output. 

65. SSKEYS Gateway, MDL Information Systems Inc., 14600 Catalina St., San Leandro, CA 

94577. http://www.mdli.com/. 

66. R. D. Brown and Y. C. Martin, J. Chem. Inf. Comput. Sci., 36, 572 (1996). Use of Structure- 

Activity Data to Compare Structure-Based Clustering Methods and Descriptors for Use in 

Compound Selection. 

67. M. J. McGregor and P. V. Pallai, J. Chem. Inf. Comput. Sci., 37, 443 (1997). Clustering of 

Large Databases of Compounds: Using the MDL Keys as Structural Descriptors. 

68. R. D. Brown and Y. C. Martin, J. Chem. Inf. Comput. Sci., 37, 1 (1997). The Information 

Content of 2-D and 3-D Structural Descriptors Relevant to Ligand-Receptor Binding.

69. Daylight Chemical Information Software, version 4.62. Daylight Chemical Information 

Systems Inc., 27401 Los Altos, Suite 370, Mission Viejo, CA 92691. 

http://www.daylight.com/. 

70. D. R. Flower, J. Chem. Inf. Comput. Sci., 38, 379 (1998). On the Properties of Bit String- 

Based Measures of Chemical Similarity. 

71. P. Willett, J. M. Barnard, and G. M. Downs, J. Chem. Inf. Comput. Sci., 38, 983 (1998). 

Chemical Similarity Searching. 

72. L. H. Hall and L. B. Kier, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. 

Boyd, Eds., VCH Publishers, New York, 1991, Vol. 2, pp. 367-422. The Molecular Connec- 

tivity Chi Indexes and Kappa Shape Indexes in Structure-Property Modeling. 

73. A. T. Balaban, SAR QSAR Environ. Res., 8, 1 (1998). Topological and Stereochemical 

Molecular Descriptors for Databases Useful in QSAR Similarity/Dissimilarity and Drug 

Design. 

74. R. A. Lewis, J. S. Mason, and I. M. McLay, J. Chem. Inf. Comput. Sci., 37, 599 (1997). 

Similarity Measures for Rational Set Selection and Analysis of Combinatorial Libraries: The 

Diverse Property-Derived (DPD) Approach. 

75. E. J. Martin, J. M. Blaney, M. A. Siani, D. C. Spellmeyer, A. K. Wong, and W. H. Moos, J. 

Med. Chem., 38, 1431 (1995). Measuring Diversity: Experimental Design of Combinatorial 

Libraries for Drug Discovery. 

76. D. J. Cummins, C. W. Andrews, J. A. Bentley, and M. Cory, J. Chem. Inf. Comput. Sci., 36, 

750 (1996). Molecular Diversity in Chemical Databases: Comparison of Medicinal Chemis- 

try Knowledge Bases and Databases of Commercially Available Compounds. 

77. S. Wold, K. Esbensen, and P. Geladi, Chemom. Intell. Lab. Syst., 2, 37 (1987). Principal 

Component Analysis. 

78. B. S. Everitt and G. Dunn, Applied Multivariate Data Analysis, Oxford University Press, 

New York, 1992. 

79. W. S. Dillon and M. Goldstein, Multivariate Analysis: Methods and Applications, Wiley, 

New York, 1984. 

80. CLOGP. Daylight Chemical Information Systems Inc., 27401 Los Altos, Suite 370, Mission 

Viejo, CA 92691. http://www.daylight.com/; see also http://biobyte.com/. 

81. A. J. Leo, Chem. Rev., 93, 1281 (1993). Calculating log Poct from Structures. 

82. P.-A. Carrupt, B. Testa, and P. Gaillard, in Reviews in Computational Chemistry, K. B. 

Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 241-315. 

Computational Approaches to Lipophilicity: Methods and Applications. 

83. P. F. de Aguiar, B. Bourguignon, M. S. Khots, D. L. Massart, and R. Phan-Than-Luu, 

Chemom. Intell. Lab. Syst., 30, 199 (1992). D-Optimal Designs. 

84. R. S. Pearlman and K. M. Smith, Perspect. Drug Discovery Des., 9/10/11, 355 (1997). Novel 

Software Tools for Chemical Diversity. 

85. R. S. Pearlman and K. M. Smith, Drugs Future, 23, 885 (1998). Software for Chemical 

Diversity in the Context of Accelerated Drug Discovery. 

86. F. R. Burden, J. Chem. Inf. Comput. Sci., 29, 225 (1989). Molecular Identification Number 

for Substructure Searches. 

87. P. R. Menard, J. S. Mason, I. Morize, and S. Bauerschmidt, J. Chem. Inf. Comput. Sci., 38, 

1204 (1998). Chemistry Space Metrics in Diversity Analysis, Library Design, and Com- 

pound Selection. 

88. R. S. Pearlman and K. M. Smith, J. Chem. Inf. Comput. Sci., 39, 28 (1999). Metric Validation 

and the Receptor-Relevant Subspace Concept. 

89. D. Stanton, J. Chem. Inf. Comput. Sci., 39, 11 (1999). Evaluation and Use of BCUT Descrip- 

tors in QSAR and QSPR Studies. 

90. G. W. Bemis and I. D. Kuntz, J. Comput.-Aided Mol. Des., 6, 607 (1992). A Fast and Efficient 

Method for 2D and 3D Molecular Shape Description. 

91. G. Moreau and C. Turpin, Analusis, 24, 17 (1996). Use of Similarity Analysis to Reduce 

Large Molecular Libraries to Smaller Sets of Representative Molecules.


92. J. Sadowski, M. Wagener, and J. Gasteiger, Angew. Chem. Int. Ed. Engl., 34, 2674 (1996). 

Assessing Similarity and Diversity of Combinatorial Libraries by Spatial Autocorrelation 

Functions and Neural Networks. 

93. S. E. Jakes and P. Willett, J. Mol. Graphics, 4, 12 (1986). Pharmacophoric Pattern Matching 

in Files of 3-D Chemical Structures: Selection of Interatomic Distance Screens. 

94. S. E. Jakes, N. Watts, P. Willett, D. Bawden, and J. D. Fisher, J. Mol. Graphics, 5, 41 (1987). 

Pharmacophoric Pattern Matching in Files of 3-D Chemical Structures: Evaluation of Search 

Performance. 

95. R. P. Sheridan, R. Nilakantan, A. Rusinko III, N. Bauman, K. S. Haraki, and R. Ven- 

kataraghavan, J. Chem. Inf. Comput. Sci., 29, 255 (1989). 3-DSEARCH: A System for 

Three-Dimensional Substructure Searching. 

96. Y. C. Martin, M. G. Bures, and P. Willett, in Reviews in Computational Chemistry, K. B.

Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1990, Vol. 1, pp. 213-263. 

Searching Databases of Three-Dimensional Structures. Y. C. Martin, J. Med. Chem., 35, 

2145 (1992). 3-D Database Searching in Drug Design. 

97. A. C. Good and J. S. Mason, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. 

B. Boyd, Eds., VCH Publishers, New York, 1995, Vol. 7, pp. 67-117. Three-Dimensional 

Structure Database Searches. 

98. S. Wang, G. W. A. Milne, X. Yan, I. Posey, M. C. Nicklaus, L. Graham, and W. G. Rice, J.

Med. Chem., 39, 2047 (1996). Discovery of Novel, Non-Peptide HIV-1 Protease Inhibitors

by Pharmacophore Searching. 

99. P. C. Astles, T. J. Brown, C. M. Handscombe, M. F. Harper, N. V. Harris, R. A. Lewis, P. M.

Lockey, C. McCarthy, I. M. McLay, B. Porter, A. G. Roach, C. Smith, and R. J. A. Walsh,

Eur. J. Med. Chem., 32, 409 (1997). Selective Endothelin A Receptor Ligands. 1. Discovery

and Structure-Activity of 2,4-Disubstituted Benzoic Acid Derivatives. 

100. S. D. Pickett, J. S. Mason, and I. M. McLay, J. Chem. Inf. Comput. Sci., 36, 1214 (1996). 

Diversity Profiling and Design Using 3-D Pharmacophores: Pharmacophore-Derived Queries (PDQ).

101. J. S. Mason and S. D. Pickett, Perspect. Drug Discovery Des., 7/8, 85 (1997). Partition-Based

Selection. 

102. S. D. Pickett, C. Luttmann, V. Guerin, A. Laoui, and E. James, J. Chem. Inf. Comput. Sci., 38,

144 (1998). DIVSEL and COMPLIB-Strategies for the Design and Comparison of Com- 

binatorial Libraries Using Pharmacophoric Descriptors. 

103. E. K. Davies, in Molecular Diversity and Combinatorial Chemistry: Libraries and Drug 

Discovery, I. M. Chaiken and K. D. Janda, Eds., American Chemical Society, Washington,

DC, 1996, pp. 309-316. Using Pharmacophore Diversity to Select Molecules to Test from

Commercial Catalogues. 

104. R. D. Brown and Y. C. Martin, J. Med. Chem., 40, 2304 (1997). Designing Combinatorial

Library Mixtures Using a Genetic Algorithm. 

105. ChemDiverse. Oxford Molecular Group plc, The Medawar Centre, Oxford Science Park,

Oxford, OX4 4GA, United Kingdom. http://www.oxmol.coml. 

106. R. D. Cramer, R. D. Clark, D. E. Patterson, and A. M. Ferguson, J. Med. Chem., 39, 3060 

(1996). Bioisosterism as a Molecular Diversity Descriptor: Steric Fields of Single Topomeric 

Conformers. 

107. J. Mount, J. Ruppert, W. Welch, and A. N. Jain, J. Med. Chem., 42, 60 (1999). IcePick: A 

Flexible Surface-Based System for Molecular Diversity. 

108. W. Welch, J. Ruppert, and A. N. Jain, Chem. Biol., 3, 449 (1996). Hammerhead: Fast, Fully

Automated Docking of Flexible Ligands to Protein Binding Sites. 

109. A. N. Jain, K. Koile, and D. Chapman, J. Med. Chem., 37, 2315 (1994). Compass: Predicting

Biological Activities from Molecular Surface Properties. Performance Comparisons on a 

Steroid Benchmark. 

110. S. M. Boyd, M. Beverley, L. Norskov, and R. E. Hubbard, J. Comput.-Aided Mol. Des., 9,

417 (1995). Characterising the Geometric Diversity of Functional Groups in Chemical 

Databases.

111. P. A. Bartlett and G. Lauri, in Book of Abstracts, 211th ACS National Meeting, New Orleans, LA, March 24-28, 1996, American Chemical Society, Washington, DC, 1996, COMP-014. The CAVEAT Vector Approach for Structure-Based Design and Combinatorial Chemistry.

112. D. Chapman, J. Comput.-Aided Mol. Des., 10, 501 (1996). The Measurement of Molecular Diversity: A Three-Dimensional Approach.

113. G. Jones, P. Willett, R. C. Glen, A. R. Leach, and R. Taylor, J. Mol. Biol., 267, 727 (1997). Development and Validation of a Genetic Algorithm for Flexible Docking.

114. C. A. Baxter, C. W. Murray, D. E. Clark, D. R. Westhead, and M. D. Eldridge, Proteins: Struct., Funct., Genet., 33, 367 (1998). Flexible Docking Using Tabu Search and an Empirical Estimate of Binding Affinity.

115. C. W. Murray, D. E. Clark, T. R. Auton, M. A. Firth, J. Li, R. A. Sykes, B. Waszkowycz, D. R. Westhead, and S. C. Young, J. Comput.-Aided Mol. Des., 11, 193 (1997). PROSELECT: Combining Structure-Based Drug Design and Combinatorial Chemistry for Rapid Lead Discovery. 1. Technology.

116. D. E. Clark, D. Frenkel, S. A. Levy, J. Li, C. W. Murray, B. Robson, B. Waszkowycz, and D. R. Westhead, J. Comput.-Aided Mol. Des., 9, 13 (1995). PRO-LIGAND: An Approach to De Novo Molecular Design. 1. Application to the Design of Organic Molecules.

117. M. D. Eldridge, C. W. Murray, T. R. Auton, G. V. Paolini, and R. P. Mee, J. Comput.-Aided Mol. Des., 11, 425 (1997). Empirical Scoring Functions. I. The Development of a Fast Empirical Scoring Function to Estimate the Binding Affinity of Ligands in Receptor Complexes.

118. E. K. Kick, D. C. Roe, A. G. Skillman, G. Liu, T. J. A. Ewing, Y. Sun, I. D. Kuntz, and J. A. Ellman, Chem. Biol., 4, 297 (1997). Structure-Based Design and Combinatorial Chemistry Yield Low-Nanomolar Inhibitors of Cathepsin D.

119. T. S. Haque, A. G. Skillman, C. E. Lee, H. Hahashita, I. Y. Gluzman, T. J. A. Ewing, D. E. Goldberg, I. D. Kuntz, and J. A. Ellman, J. Med. Chem., 42, 1428 (1999). Potent, Low-Molecular-Weight Non-Peptide Inhibitors of Malarial Aspartyl Protease Plasmepsin II.

120. Y. Sun, T. J. A. Ewing, A. G. Skillman, and I. D. Kuntz, J. Comput.-Aided Mol. Des., 12, 597 (1998). CombiDOCK: Structure-Based Combinatorial Docking and Library Design.

121. H.-J. Bohm, J. Comput.-Aided Mol. Des., 6, 61 (1992). The Computer Program LUDI: A New Method for the De Novo Design of Enzyme Inhibitors.

122. H.-J. Bohm, J. Comput.-Aided Mol. Des., 10, 265 (1996). Towards the Automatic Design of Synthetically Accessible Protein Ligands: Peptides, Amides and Peptidomimetics.

123. H.-J. Bohm, D. W. Banner, and L. Weber, J. Comput.-Aided Mol. Des., 13, 51 (1999). Combinatorial Docking and Combinatorial Chemistry: Design of Potent Non-peptide Thrombin Inhibitors.

124. Design in Receptor. Oxford Molecular Group plc, The Medawar Centre, Oxford Science Park, Oxford, OX4 4GA, United Kingdom. http://www.oxmol.co.uk/.

125. C. M. Murray and S. J. Cato, J. Chem. Inf. Comput. Sci., 39, 46 (1999). Design of Libraries to Explore Receptor Sites.

126. M. Lajiness, in QSAR: Rational Approaches to the Design of Bioactive Compounds, C. Silipo and A. Vittoria, Eds., ESCOM, Leiden, 1991, pp. 201-204. Evaluation of the Performance of Dissimilarity Selection Methodology.

127. R. Taylor, J. Chem. Inf. Comput. Sci., 35, 59 (1995). Simulation Analysis of Experimental Design Strategies for Screening Random Compounds as Potential New Drugs and Agrochemicals.

128. S. K. Kearsley, S. Sallamack, E. M. Fluder, J. D. Andose, R. T. Mosley, and R. P. Sheridan, J. Chem. Inf. Comput. Sci., 36, 118 (1996). Chemical Similarity Using Physiochemical Property Descriptors.

129. V. J. Gillet, P. Willett, and J. Bradshaw, J. Chem. Inf. Comput. Sci., 38, 165 (1998). Identification of Biological Activity Profiles Using Substructural Analysis and Genetic Algorithms.


130. Spresi database. Daylight Chemical Information Systems Inc., 27401 Los Altos, Suite 370, 

Mission Viejo, CA 92691. http://www.daylight.com/. 

131. D. E. Patterson, R. D. Cramer, A. M. Ferguson, R. D. Clark, and L. E. Weinberger, J. Med.

Chem., 39, 3049 (1996). Neighborhood Behavior: A Useful Concept for Validation of 

Molecular Diversity Descriptors. 

132. H. Matter, J. Med. Chem., 40, 1219 (1997). Selecting Optimally Diverse Compounds from 

Structure Databases: A Validation Study of Two-Dimensional and Three-Dimensional 

Descriptors. 

133. H. Matter, J. Peptide Res., 52, 305 (1998). A Validation Study of Molecular Descriptors for

the Rational Design of Peptide Libraries. 

134. G. M. Downs and P. Willett, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. 

B. Boyd, Eds., VCH Publishers, New York, 1995, Vol. 7, pp. 1-66. Similarity Searching in 

Databases of Chemical Structures. 

135. R. D. Cramer, S. A. DePriest, D. E. Patterson, and P. Hecht, in 3-D QSAR in Drug Design, H.

Kubinyi, Ed., ESCOM, Leiden, 1993, pp. 443-485. The Developing Practice of Comparative 

Molecular Field Analysis. 

136. T. I. Oprea and C. L. Waller, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. 

B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 127-182. Theoretical and Practical 

Aspects of Three-Dimensional Quantitative Structure-Activity Relationships. 

137. G. Greco, E. Novellino, and Y. C. Martin, in Reviews in Computational Chemistry, K. B. 

Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 183-240. 

Approaches to Three-Dimensional Quantitative Structure-Activity Relationships. 

138. E. J. Jacobsen, L. S. Stelzer, R. E. TenBrink, K. L. Belonga, D. B. Carter, H. K. Im, W. B. Im, V. 

H. Sethy, A. H. Tang, P. F. Von Voigtlander, J. D. Petke, W.-Z. Zhong, and J. W. Mickelson, J.

Med. Chem., 42, 1123 (1999). Piperazine Imidazo[1,5-a]quinoxaline Ureas as High-Affinity

GABA-A Ligands of Dual Functionality.

139. J. D. Elliott, M. A. Lago, R. D. Cousins, A. Gao, J. D. Leber, K. F. Erhard, P. Nambi, N. A. 

Elshourbagy, C. Kumar, J. A. Lee, J. W. Bean, C. W. DeBrosse, D. S. Eggleston, D. P. Brooks, 

G. Feuerstein, R. R. Ruffolo Jr., J. Weinstock, J. G. Gleason, C. E. Peishoff, and E. H. 

Ohlstein, J. Med. Chem., 37, 1553 (1994). 1,3-Diarylindan-2-carboxylic Acids, Potent and

Selective Non-peptide Endothelin Receptor Antagonists. 

140. T. F. Walsh, K. J. Fitch, D. L. Williams Jr., K. L. Murphy, N. A. Nolan, D. J. Pettibone, S. L. 

Raymond, S. S. O’Malley, B. V. Clineschmidt, D. F. Veber, and W. J. Greenlee, Bioorg. Med. 

Chem. Lett., 5, 1155 (1995). Potent Dual Antagonists of Endothelin and Angiotensin II

Receptors Derived from α-Phenoxyphenylacetic Acids. III.

141. S. A. Mousa and D. A. Cheresh, Drug Discovery Today, 2, 187 (1997). Recent Advances in 

Cell Adhesion Molecules and Extracellular Matrix Proteins: Potential Clinical Implications. 

142. S. D. Pickett, I. M. McLay, and D. E. Clark, J. Chem. Inf. Comput. Sci., 40, 263 (2000).

Enhancing the Hit-to-Lead Properties of Lead Optimization Libraries. 

143. M. A. Johnson and G. M. Maggiora, Eds., Concepts and Applications of Molecular Similarity. 

Wiley-Interscience, New York, 1990. 

144. J. B. Dunbar, Perspect. Drug Discovery Des., 7/8, 51 (1997). Cluster-Based Selection. 

145. M. S. Lajiness, Perspect. Drug Discovery Des., 7/8, 65 (1997). Dissimilarity-Based Compound

Selection Techniques. 

146. J. H. Wikel and R. E. Higgs, J. Biomol. Screening, 2, 65 (1997). Applications of Molecular

Diversity Analysis in High Throughput Screening. 

147. R. A. Jarvis and E. A. Patrick, IEEE Trans. Comput., C-22, 1025 (1973). Clustering Using a

Similarity Measure Based on Shared Nearest Neighbors. 

148. P. R. Menard, R. A. Lewis, and J. S. Mason, J. Chem. Inf. Comput. Sci., 38, 497 (1998). 

Rational Screening Set Design and Compound Selection: Cascaded Clustering. 

149. T. N. Doman, J. M. Cibulskis, M. J. Cibulskis, P. D. McCray, and D. P. Spangler, J. Chem. Inf.

Comput. Sci., 36, 1195 (1996). Algorithm5: A Technique for Fuzzy Similarity Clustering of

Chemical Inventories.

150. R. Dubes and A. K. Jain, Adv. Comput., 19, 113 (1980). Clustering Methodologies in Exploratory Data Analysis.

151. J. M. Barnard and G. M. Downs, J. Chem. Inf. Comput. Sci., 37, 141 (1997). Chemical Fragment Generation and Clustering Software.

152. F. Murtagh, Multidimensional Clustering Algorithms, Physica-Verlag, Vienna, 1985.

153. L. H. Hall, L. B. Kier, and B. B. Brown, J. Chem. Inf. Comput. Sci., 35, 1074 (1995). Molecular Similarity Based on Novel Atom-Type Electrotopological State Indices.

154. M. J. Ashton, M. C. Jaye, and J. S. Mason, Drug Discovery Today, 1, 71 (1996). New Perspectives in Lead Generation. II. Evaluating Molecular Diversity.

155. D. Bawden, in Chemical Structures 2: The International Language of Chemistry, W. A. Warr, Ed., Springer-Verlag, Berlin, 1993, pp. 383-388. Molecular Dissimilarity in Chemical Information Systems.

156. R. W. Kennard and L. A. Stone, Technometrics, 11, 137 (1969). Computer Aided Design of Experiments.

157. J. D. Holliday, S. S. Ranade, and P. Willett, Quant. Struct.-Act. Relat., 14, 501 (1995). A Fast Algorithm for Selecting Sets of Dissimilar Molecules from Large Chemical Databases.

158. D. B. Turner, S. M. Tyrrell, and P. Willett, J. Chem. Inf. Comput. Sci., 37, 18 (1997). Rapid Quantification of Molecular Diversity for Selective Database Acquisition.

159. J. D. Holliday and P. Willett, J. Biomol. Screening, 1, 145 (1996). Definitions of Dissimilarity for Dissimilarity-Based Compound Selection.

160. M. Snarey, N. K. Terrett, P. Willett, and D. J. Wilton, J. Mol. Graphics, 15, 372 (1997). Comparison of Algorithms for Dissimilarity-Based Compound Selection.

161. D. K. Agrafiotis and V. S. Lobanov, J. Chem. Inf. Comput. Sci., 39, 51 (1999). An Efficient Implementation of Distance-Based Diversity Measures Based on k-d Trees.

162. R. D. Clark, J. Chem. Inf. Comput. Sci., 37, 1181 (1997). OptiSim: An Extended Dissimilarity Selection Method for Finding Diverse Representative Subsets.

163. R. D. Clark and W. J. Langton, J. Chem. Inf. Comput. Sci., 38, 1079 (1998). Balancing Representativeness Against Diversity Using Optimizable K-Dissimilarity and Hierarchical Clustering.

164. M. Hassan, J. P. Bielawski, J. C. Hempel, and M. Waldman, Mol. Diversity, 2, 64 (1996). Optimisation and Visualisation of Molecular Diversity of Combinatorial Libraries.

165. B. D. Hudson, R. M. Hyde, E. Rahr, J. Wood, and J. Osman, Quant. Struct.-Act. Relat., 15, 283 (1996). Parameter Based Methods for Compound Selection from Chemical Data Bases.

166. R. E. Higgs, K. G. Bemis, I. A. Watson, and J. H. Wikel, J. Chem. Inf. Comput. Sci., 37, 861 (1997). Experimental Designs for Selecting Molecules from Large Chemical Databases.

167. S. Anzali, J. Gasteiger, U. Holzgrabe, J. Polanski, J. Sadowski, A. Teckentrup, and M. Wagener, Perspect. Drug Discovery Des., 9/10/11, 273 (1998). The Use of Self-organizing Neural Networks in Drug Design.

168. H. Bauknecht, A. Zell, H. Bayer, P. Levi, M. Wagener, J. Sadowski, and J. Gasteiger, J. Chem. Inf. Comput. Sci., 36, 1205 (1996). Locating Biologically Active Compounds in Medium-Sized Heterogeneous Datasets by Topological Autocorrelation Vectors: Dopamine and Benzodiazepine Agonists.

169. D. K. Agrafiotis, J. Chem. Inf. Comput. Sci., 37, 841 (1997). Stochastic Algorithms for Maximizing Molecular Diversity.

170. P. Willett, Similarity and Clustering in Chemical Information Systems, Research Studies Press, Letchworth, 1987.

171. J. M. Barnard and G. M. Downs, J. Chem. Inf. Comput. Sci., 32, 644 (1992). Clustering of Chemical Structures on the Basis of Two-Dimensional Similarity Measures.

172. J. W. MacFarlane and D. J. Gans, in Chemometric Methods in Molecular Design, H. van de Waterbeemd, Ed., VCH, Weinheim, 1995, pp. 295-308. Cluster Significance Analysis.

173. D. H. Rouvray, Fuzzy Logic in Chemistry, Academic Press, San Diego, CA, 1997.

174. N. E. Shemetulskis, J. B. Dunbar Jr., B. W. Dunbar, D. W. Moreland, and C. Humblet, J. Comput.-Aided Mol. Des., 9, 407 (1995). Enhancing the Diversity of a Corporate Database Using Chemical Database Clustering and Analysis.

175. CAST-3D Database. Chemical Abstracts Services, Columbus, OH. http://www.cas.org/.

176. Maybridge Database. Daylight Chemical Information Systems Inc., 27401 Los Altos, Suite 370, Mission Viejo, CA 92691. http://www.daylight.com/.

177. Comprehensive Medicinal Chemistry (CMC), Molecular Design Limited, San Leandro, CA 94577. An electronic database version of the Drug Compendium that is Volume 6 of Comprehensive Medicinal Chemistry published by Pergamon Press in March 1990. Contains drugs already on the market.

178. MACCS-II Drug Data Report (MDDR), Molecular Design Limited, San Leandro, CA 94577. An electronic database version of the Prous Science Publishers journal Drug Data Report, extracted from issues starting mid-1988. Contains biologically active compounds in the early stages of drug development.

179. Available Chemicals Directory (ACD), Molecular Design Limited, San Leandro, CA 94577. Contains speciality and bulk chemicals from commercial sources.

180. SPECS/BioSPECS Database; Brandon Associates, Merrimack, NH 03054. Contains chemicals from private sources.

181. R. Nilakantan, N. Bauman, and K. S. Haraki, J. Comput.-Aided Mol. Des., 11, 447 (1997). Diversity Database Assessment: New Ideas, Concepts and Tools.

182. R. Nilakantan, N. Bauman, K. S. Haraki, and R. Venkataraghavan, J. Chem. Inf. Comput. Sci., 30, 65 (1990). A Ring-Based Chemical Structural Query System: Use of a Novel Ring-Complexity Heuristic.

183. F. H. Allen, J. E. Davies, J. J. Galloy, O. Johnson, O. Kennard, C. F. Macrae, E. M. Mitchell, G. F. Mitchell, J. M. Smith, and D. G. Watson, J. Chem. Inf. Comput. Sci., 31, 187 (1991). The Development of Version 3 and Version 4 of the Cambridge Structural Database System.

184. G. W. A. Milne, M. C. Nicklaus, J. S. Driscoll, S. Wang, and D. J. Zaharevitz, J. Chem. Inf. Comput. Sci., 34, 1219 (1994). National Cancer Institute Drug Information System 3D Database.

185. A. C. Good and R. A. Lewis, J. Med. Chem., 40, 3926 (1997). New Methodology for Profiling Combinatorial Libraries and Screening Sets: Cleaning Up the Design Process with HARPick.

186. V. J. Gillet, P. Willett, and J. Bradshaw, J. Chem. Inf. Comput. Sci., 37, 731 (1997). The Effectiveness of Reactant Pools for Generating Structurally-Diverse Combinatorial Libraries.

187. J. H. van Drie and M. S. Lajiness, Drug Discovery Today, 3, 274 (1998). Approaches to Virtual Library Design.

188. J. H. Kalivas, Chemom. Intell. Lab. Syst., 15, 1 (1992). Optimization Using Variations of Simulated Annealing.

189. R. Judson, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1997, Vol. 10, pp. 1-73. Genetic Algorithms and Their Use in Chemistry.

190. R. D. Brown and D. E. Clark, Expert Opin. Ther. Patents, 8, 1447 (1998). Genetic Diversity: Applications of Evolutionary Algorithms to Combinatorial Library Design.

191. L. Weber, Curr. Opin. Chem. Biol., 2, 381 (1998). Applications of Genetic Algorithms in Molecular Diversity.

192. L. Weber, Drug Discovery Today, 3, 379 (1998). Evolutionary Combinatorial Chemistry: Application of Genetic Algorithms.

193. R. A. Lewis, A. C. Good, and S. D. Pickett, in Computer-Assisted Lead Finding and Optimization: Current Tools for Medicinal Chemistry, H. van de Waterbeemd, B. Testa, and G. Folkers, Eds., Wiley-VCH, Weinheim, 1997, pp. 135-156. Quantification of Molecular Similarity and Its Application to Combinatorial Chemistry.


194. V. J. Gillet, P. Willett, J. Bradshaw, and D. V. S. Green, J. Chem. Inf. Comput. Sci., 39, 169 

(1999). Selecting Combinatorial Libraries to Optimize Diversity and Physical Properties. 

195. R. P. Sheridan and S. K. Kearsley, J. Chem. Inf. Comput. Sci., 35, 310 (1995). Using a Genetic

Algorithm to Suggest Combinatorial Libraries. 

196. S. J. Cho, W. Zheng, and A. Tropsha, J. Chem. Inf. Comput. Sci., 38, 259 (1998). Rational

Combinatorial Library Design. 2. Rational Design of Targeted Combinatorial Peptide Libraries Using Chemical Similarity Probe and the Inverse QSAR Approaches.

197. S. D. Pickett, unpublished work, 1999. 

198. P. J. Brown, T. A. Smith-Oliver, P. S. Charifson, N. C. O. Tomkinson, A. M. Fivush, D. D.

Sternbach, L. E. Wade, L. Orband-Miller, D. J. Parks, S. G. Blanchard, S. A. Kliewer, J. H. 

Lehmann, and T. M. Willson, Chem. Biol., 4, 909 (1997). Identification of Peroxisome 

Proliferator-Activated Receptor Ligands from a Biased Chemical Library. 

199. F. R. Salemme, J. Spurlino, and R. Bone, Structure, 5, 319 (1997). Serendipity Meets Precision: 

The Integration of Structure-Based Drug Design and Combinatorial Chemistry for 

Efficient Drug Discovery. 

200. J. Li, C. W. Murray, B. Waszkowycz, and S. C. Young, Drug Discovery Today, 3, 105 (1998).

Targeted Molecular Diversity in Drug Discovery: Integration of Structure-Based Design and 

Combinatorial Chemistry. 

201. M. A. Murcko, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, 

Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 1-66. Recent Advances in Ligand Design 

Methods. 

202. D. E. Clark, C. W. Murray, and J. Li, in Reviews in Computational Chemistry, K. B. 

Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 67-125. 

Current Issues in De Novo Molecular Design. 

203. A. Rockwell, M. Melden, R. A. Copeland, K. Hardman, C. P. Decicco, and W. F. DeGrado, 

J. Am. Chem. Soc., 118, 10337 (1996). Complementarity of Combinatorial Chemistry and

Structure-Based Ligand Design: Application to the Discovery of Novel Inhibitors of Matrix 

Metalloproteinases. 

204. A. P. Combs, T. M. Kapoor, S. B. Feng, J. K. Chen, L. F. Daudesnow, and S. L. Schreiber, 

J. Am. Chem. Soc., 118, 287 (1996). Protein Structure-Based Combinatorial Chemistry:

Discovery of Non-peptide Binding Elements to Src SH3 Domain. 

205. T. C. Norman, N. S. Gray, J. T. Koh, and P. G. Schultz, J. Am. Chem. Soc., 118, 7430 (1996).

A Structure-Based Library Approach to Kinase Inhibitors. 

206. T. M. Kapoor, A. H. Andreotti, and S. L. Schreiber, J. Am. Chem. Soc., 120, 23 (1998).

Exploring the Specificity Pockets of Two Homologous SH3 Domains Using Structure-Based, 

Split-Pool Synthesis and Affinity-Based Selection. 

207. J. P. Morken, T. M. Kapoor, S. Feng, F. Shirai, and S. L. Schreiber, J. Am. Chem. Soc., 120, 30

(1998). Exploring the Leucine-Proline Binding Pocket of the Src SH3 Domain Using 

Structure-Based, Split-Pool Synthesis and Affinity-Based Selection. 

208. S. F. Brady, K. J. Stauffer, W. C. Lumma, G. M. Smith, H. G. Ramjit, S. D. Lewis, B. J. Lucas, 

S. J. Gardell, E. A. Lyle, S. D. Appleby, J. J. Cook, M. A. Holahan, M. T. Stranieri, J. J. Lynch 

Jr., J. H. Lin, I.-W. Chen, K. Vastag, A. M. Naylor-Olsen, and J. P. Vacca, J. Med. Chem., 41,

401 (1998). Discovery and Development of the Novel Potent Orally Active Thrombin

Inhibitor N-(9-Hydroxy-9-fluorenecarboxy)prolyl trans-4-Aminocyclohexylmethyl Amide

(L-372,460): Coapplication of Structure-Based Design and Rapid Multiple Analog Synthesis 

on Solid Support. 

209. C. Illig, S. Eisennagel, R. Bone, A. Radzicka, L. Murphy, T. Randle, J. Spurlino, F. R. 

Salemme, and R. M. Soll, Med. Chem. Res., 4/5, 244 (1998). Expanding the Envelope of

Structure-Based Drug Design Using Chemical Libraries: Application to Small Molecule 

Inhibitors of Thrombin. 

210. D. S. Dhanoa, R. M. Soll, Z. Wu, N. Subasinghe, J. Rinker, J. Hoffman, S. Eisennagel, T.

Graybill, R. Bone, A. Radzicka, L. Murphy, and F. R. Salemme, Med. Chem. Res., 4/5, 187

(1998). Serine Proteases-Directed Small Molecule Probe Libraries.

211. S.-H. Kim, Pure Appl. Chem., 70, 555 (1998). Structure-Based Inhibitor Design for CDK2, a Cell Cycle Controlling Protein.

212. M. Whittaker, Curr. Opin. Chem. Biol., 2, 386 (1998). Discovery of Protease Inhibitors Using Targeted Libraries.

213. A. K. Szardenings, D. Harris, S. Lam, L. Shi, D. Tien, Y. Wang, D. V. Patel, M. Navre, and D. A. Campbell, J. Med. Chem., 41, 2194 (1998). Rational Design and Combinatorial Evaluation of Enzyme Inhibitor Scaffolds: Identification of Novel Inhibitors of Matrix Metalloproteinases.

214. K. D. Stewart, S. Loren, L. Frey, E. Otis, V. Klinghofer, and K. I. Hulkower, Bioorg. Med. Chem. Lett., 8, 529 (1998). Discovery of a New Cyclooxygenase-2 Lead Compound Through 3-D Database Searching and Combinatorial Chemistry.

215. T. L. Graybill, D. K. Agrafiotis, R. Bone, C. R. Illig, E. P. Jaeger, K. T. Locke, T. Lu, J. M. Salvino, R. M. Soll, J. C. Spurlino, N. Subasinghe, B. E. Tomczuk, and F. R. Salemme, in Molecular Diversity and Combinatorial Chemistry: Libraries and Drug Discovery, I. M. Chaiken and K. D. Janda, Eds., American Chemical Society, Washington, DC, 1996, pp. 16-27. Enhancing the Drug Discovery Process by Integration of High-Throughput Chemistry and Structure-Based Drug Design.

216. E. J. Martin and R. E. Critchlow, J. Comb. Chem., 1, 32 (1999). Beyond Mere Diversity: Tailoring Combinatorial Libraries for Drug Discovery.

217. G. M. Rishton, Drug Discovery Today, 2, 382 (1997). Reactive Compounds and In Vitro False Positives in HTS.

218. A. D. Rodrigues, Pharm. Res., 14, 1504 (1997). Preclinical Drug Metabolism in the Age of High-Throughput Screening: An Industrial Perspective.

219. J. H. Lin and A. Y. H. Lu, Pharmacol. Rev., 49, 403 (1997). Pharmacokinetics and Metabolism in Drug Discovery and Development.

220. M. H. Tarbit and J. Berman, Curr. Opin. Chem. Biol., 2, 411 (1998). High-Throughput Approaches for Evaluating Absorption, Distribution, Metabolism and Excretion Properties of Lead Compounds.

221. P. J. Sinko, Curr. Opin. Drug Discovery Dev., 2, 42 (1999). Drug Selection in Early Drug Development: Screening for Acceptable Pharmacokinetic Properties Using Combined In Vitro and Computational Approaches.

222. C. A. Lipinski, F. Lombardo, B. W. Dominy, and P. J. Feeney, Adv. Drug Delivery Rev., 23, 3 (1997). Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings.

223. I. Moriguchi, S. Hirono, Q. Liu, I. Nakagome, and Y. Matsushita, Chem. Pharm. Bull., 40, 127 (1992). Simple Method of Calculating Octanol/Water Partition Coefficient.

224. K. Palm, K. Luthman, A.-L. Ungell, G. Strandlund, and P. Artursson, J. Pharm. Sci., 85, 32 (1996). Correlation of Drug Absorption with Molecular Surface Properties.

225. K. Palm, P. Stenberg, K. Luthman, and P. Artursson, Pharm. Res., 14, 568 (1997). Polar Molecular Surface Properties Predict the Intestinal Absorption of Drugs in Humans.

226. K. Palm, K. Luthman, A.-L. Ungell, G. Strandlund, F. Beigi, P. Lundahl, and P. Artursson, J. Med. Chem., 41, 5382 (1998). Evaluation of Dynamic Polar Molecular Surface Area as Predictor of Drug Absorption: Comparison with Other Computational and Experimental Predictors.

227. S. Winiwarter, N. M. Bonham, F. Ax, A. Hallberg, H. Lennernas, and A. Karlen, J. Med. Chem., 41, 4939 (1998). Correlation of Human Jejunal Permeability (In Vivo) of Drugs with Experimentally and Theoretically Derived Parameters. A Multivariate Data Analysis Approach.

228. D. E. Clark, J. Pharm. Sci., 88, 807 (1999). Rapid Calculation of Polar Molecular Surface Area and Its Application to the Prediction of Transport Phenomena. 1. Prediction of Intestinal Absorption.

229. D. E. Clark, J. Pharm. Sci., 88, 815 (1999). Rapid Calculation of Polar Molecular Surface Area and Its Application to the Prediction of Transport Phenomena. 2. Prediction of Blood-Brain Barrier Penetration.

230. Y. C. Martin, Perspect. Drug Discovery Des., 7/8, 159 (1997). Challenges and Prospects for Computational Aids to Molecular Diversity.

231. J. S. Mason and M. A. Hermsmeier, Curr. Opin. Chem. Biol., 3, 342 (1999). Diversity Assessment.

232. C. A. Parks, G. M. Crippen, and J. G. Topliss, J. Comput.-Aided Mol. Des., 12, 441 (1998). The Measurement of Molecular Diversity by Receptor Site Interaction Simulation.

233. D. A. Thorner, D. J. Wild, P. Willett, and P. M. Wright, Perspect. Drug Discovery Des., 9/10/11, 301 (1998). Calculation of Structural Similarity by the Alignment of Molecular Electrostatic Potentials.

234. Ajay, W. P. Walters, and M. A. Murcko, J. Med. Chem., 41, 3314 (1998). Can We Learn to Distinguish Between Drug-like and Non-drug-like Molecules?

235. J. Sadowski and H. Kubinyi, J. Med. Chem., 41, 3325 (1998). A Scoring Scheme for Discriminating Between Drugs and Nondrugs.

236. A. K. Ghose, V. N. Viswanadhan, and J. J. Wendoloski, J. Comb. Chem., 1, 55 (1999). A Knowledge-Based Approach in Designing Combinatorial or Medicinal Chemistry Libraries for Drug Discovery. 1. A Qualitative and Quantitative Characterization of Known Drug Databases.

237. J. Moult, T. Hubbard, S. H. Bryant, K. Fidelis, and J. T. Pedersen, Proteins: Struct., Funct., Genet., Suppl. 1, 2 (1997). Critical Assessment of Methods of Protein Structure Prediction (CASP): Round II.

238. H.-J. Bohm, J. Comput.-Aided Mol. Des., 12, 309 (1998). Prediction of Binding Constants of Protein Ligands: A Fast Method for the Prioritization of Hits Obtained from De Novo Design or 3-D Database Search Programs.

239. I. Muegge and Y. C. Martin, J. Med. Chem., 42, 791 (1999). A General and Fast Scoring Function for Protein-Ligand Interactions: A Simplified Potential Approach.

240. R. H. Smith Jr., W. L. Jorgensen, J. Tirado-Rives, M. L. Lamb, P. A. J. Janssen, C. J. Michejda, and M. B. K. Smith, J. Med. Chem., 41, 5272 (1998). Prediction of Binding Affinities for TIBO Inhibitors of HIV-1 Reverse Transcriptase Using Monte Carlo Simulations in a Linear Response Method.

241. T. Hansson, J. Marelius, and J. Aqvist, J. Comput.-Aided Mol. Des., 12, 27 (1998). Ligand Binding Affinity Prediction by Linear Interaction Energy Methods.

242. T. P. Straatsma, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1996, Vol. 9, pp. 81-127. Free Energy by Molecular Simulation.

243. J. M. Barnard and G. M. Downs, Perspect. Drug Discovery Des., 7/8, 13 (1997). Computer Representation and Manipulation of Combinatorial Libraries.

244. X. Chen, A. Rusinko, and S. S. Young, J. Chem. Inf. Comput. Sci., 38, 1054 (1998). Recursive Partitioning Analysis of a Large Structure-Activity Data Set Using Three-Dimensional Descriptors.

245. H. Gao, C. Williams, P. Labute, and J. Bajorath, J. Chem. Inf. Comput. Sci., 39, 164 (1999). Binary Quantitative Structure-Activity Relationship (QSAR) Analysis of Estrogen Receptor Ligands.

246. R. S. Pearlman (University of Texas at Austin), private communication, 1999.

247. DiverseSolutions. Distributed by Tripos, Inc., 1699 South Hanley Road, St. Louis, MO 63144, on behalf of the Laboratory for Molecular Graphics and Theoretical Modeling, College of Pharmacy, University of Texas at Austin, Austin, TX, 78712.
