27.08.2013 Views

Extracting SAR Rules from Compound Data


Extraction of SAR Rules from Compound Data

Jürgen Bajorath

Department of Life Science Informatics

LIMES Program Chemical Biology and Medicinal Chemistry

University of Bonn

How to extract SAR information from compound data sets?

Systematically

With the aid of graphical representations


Basic SAR Concepts

SAR continuity: distinct structures with similar potency

SAR discontinuity: similar structures with highly different potency (“activity cliff”)

[Figure: example compounds with potencies of 6 nM, 6 nM, and 2.3 µM, labeled “continuity” and “discontinuity”]

Concept of Activity Landscapes

“Activity landscapes” are biological activity hypersurfaces within chemical space, visualized as a 2D projection of chemical space with compound potency as the third dimension


Idealized Activity Landscapes and SARs

Continuous SAR: gradual changes in structure result in moderate changes in activity (“rolling hills”)

Discontinuous SAR: small changes in structure have dramatic effects on activity (“activity cliffs”)

Basic SAR Concepts

Coexistence of continuous and discontinuous SAR components: SAR heterogeneity, corresponding to variable activity landscapes

[Figure: example compounds with potencies of 6 nM, 6 nM, and 2.3 µM, labeled “continuity” and “discontinuity”]


Variable Activity Landscapes

Cathepsin S inhibitors (not idealized, but calculated)

“Coordinate-free” chemical space (MACCS Tanimoto coefficient distances)

2D projection through multidimensional scaling

xy-plane: MACCS Tanimoto similarity-based projection

z-axis: interpolation of potency values

color code: surface elevation

[Figure: 3D activity landscape with axes x, y, and potency]

Activity Landscapes and SARs

Cathepsin S inhibitors

2D projection of an activity landscape

points represent molecules

color: potency (red: high, green: low)

area shaded according to interpolated potency
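The "coordinate-free" chemical space above rests on pairwise Tanimoto distances between fingerprints. As a minimal sketch, assuming each fingerprint is given as a set of on-bit indices (the toy bit sets below are hypothetical and are not real MACCS keys), such a distance matrix could be computed as follows. The multidimensional scaling and potency interpolation steps are not shown.

```python
from itertools import combinations


def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def distance_matrix(fingerprints: list) -> list:
    """Symmetric matrix of Tanimoto distances (1 - similarity)."""
    n = len(fingerprints)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = 1.0 - tanimoto(fingerprints[i], fingerprints[j])
        dist[i][j] = dist[j][i] = d
    return dist


# Toy fingerprints (hypothetical bit indices, not real MACCS keys).
toy_fps = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({7, 8})]
toy_dist = distance_matrix(toy_fps)
```

A matrix of this kind is the usual input for a multidimensional scaling routine that produces the xy-plane coordinates.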


Activity Landscapes and SARs

Cathepsin S inhibitors: 2D vs. 3D landscape representation

Activity cliff formed by highly and weakly potent molecules (figure labels: 0.1 nM and 10 μM)


Systematic SAR Analysis

What would we like to learn?

Activity cliffs

SAR microenvironments: subsets of compounds representing different local SARs

Discontinuous SAR components: lead optimization

Continuous SAR components: QSAR, lead hopping

Graph Representations

Basic data: a list of compounds with potency values, analyzed by pairwise comparison


Graph Representations

Common features: edges determined by 2D structural similarity; potency used as node annotation

SAR Index Scoring - Annotation

Numerical function to characterize SAR features

GLOBAL SCORE: all possible compound pairs

continuity score: emphasizes structurally diverse compounds having similar potency

$$\mathrm{cont} = \operatorname*{weighted\ mean}_{\{(i,j)\,\mid\, i>j\}} \left( \frac{1}{1+\mathrm{sim}(i,j)} \right), \qquad \mathrm{weight}_{ij} = \frac{P_i \cdot P_j}{1+|P_i-P_j|}$$

discontinuity score: emphasizes similar compounds with large potency differences

$$\mathrm{disc} = \operatorname*{mean}_{\{(i,j)\,\mid\, i>j,\ |P_i-P_j|>1,\ \mathrm{sim}(i,j)>0.65\}} \left( |P_i-P_j| \cdot \mathrm{sim}(i,j) \right)$$

The SAR Index (SARI) balances the two parts:

$$\mathrm{SARI} = \frac{1}{2}\left( \mathrm{cont} + (1-\mathrm{disc}) \right)$$

(P, potency; sim, pairwise 2D similarity)
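The global continuity and discontinuity scores can be illustrated with a short sketch. It assumes that potencies are positive logarithmic values (for example pIC50) and that `sim` is a precomputed symmetric similarity matrix; the toy numbers are made up. The slides do not state how the raw scores are scaled before they are combined, so the hypothetical `sari` helper simply expects inputs that already lie in [0, 1].

```python
from itertools import combinations


def raw_continuity(potency, sim):
    """Potency-weighted mean of 1 / (1 + sim(i, j)) over all compound pairs."""
    num = den = 0.0
    for i, j in combinations(range(len(potency)), 2):
        weight = potency[i] * potency[j] / (1.0 + abs(potency[i] - potency[j]))
        num += weight / (1.0 + sim[i][j])
        den += weight
    return num / den if den else 0.0


def raw_discontinuity(potency, sim, sim_cutoff=0.65, min_diff=1.0):
    """Mean of |P_i - P_j| * sim(i, j) over similar pairs with a large potency gap."""
    terms = [
        abs(potency[i] - potency[j]) * sim[i][j]
        for i, j in combinations(range(len(potency)), 2)
        if sim[i][j] > sim_cutoff and abs(potency[i] - potency[j]) > min_diff
    ]
    return sum(terms) / len(terms) if terms else 0.0


def sari(cont, disc):
    """Combine scores that are assumed to be already scaled to [0, 1]."""
    return 0.5 * (cont + (1.0 - disc))


# Toy data (hypothetical): three compounds with pIC50-like potencies.
toy_potency = [9.0, 8.5, 6.0]
toy_sim = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.7],
    [0.3, 0.7, 1.0],
]
toy_cont = raw_continuity(toy_potency, toy_sim)
toy_disc = raw_discontinuity(toy_potency, toy_sim)
```

In the toy set only the pair of compounds 1 and 2 passes both filters of the discontinuity score, so the raw value is driven entirely by that one pair.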


SAR Index Scoring

Numerical function to characterize SAR features

LOCAL SCORE: pairs formed by a given compound

discontinuity score: emphasizes similar compounds with large potency differences

$$\mathrm{disc}(i) = \operatorname*{mean}_{\{j \,\mid\, \mathrm{sim}(i,j)>t,\ i \neq j\}} \left( |P_i-P_j| \cdot \mathrm{sim}(i,j) \right)$$

The score reflects the potency deviation of a compound from its structurally similar neighbors

node size scaling: high compound score vs. low compound score
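A minimal sketch of the local (per-compound) discontinuity score follows. The similarity threshold t is not given on the slide; the default of 0.65 is an assumption borrowed from the global score, and the toy data are made up.

```python
def compound_discontinuity(i, potency, sim, threshold=0.65):
    """Mean of |P_i - P_j| * sim(i, j) over neighbors j with sim(i, j) > threshold."""
    terms = [
        abs(potency[i] - potency[j]) * sim[i][j]
        for j in range(len(potency))
        if j != i and sim[i][j] > threshold
    ]
    # A compound without similar neighbors gets a score of 0 in this sketch.
    return sum(terms) / len(terms) if terms else 0.0


# Toy data (hypothetical): compound 1 is similar to both of the others.
local_potency = [9.0, 8.5, 6.0]
local_sim = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.7],
    [0.3, 0.7, 1.0],
]
local_scores = [
    compound_discontinuity(i, local_potency, local_sim)
    for i in range(len(local_potency))
]
```

Here compound 2 receives the highest score because its only similar neighbor differs from it by 2.5 log units.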

Network-like Similarity Graph (NSG)

Exemplary graphical SAR analysis method


Network-like Similarity Graph

NSG for a set of squalene synthase inhibitors

Annotated graph representation of similarity relationships in compound data sets

Nodes: represent all compounds in the data set

Edges: connect nodes with high pairwise similarity

Clusters: Ward’s hierarchical clustering (gray background)

Layout: Fruchterman-Reingold

Annotations: node size, cluster scores, global scores
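The node and edge definitions above can be sketched as a small data structure. This is only an illustration: the edge threshold of 0.65 is an assumption (the slide says "high pairwise similarity" without a number), the compound names are hypothetical, and Ward's clustering and the Fruchterman-Reingold layout are left out.

```python
from itertools import combinations


def build_nsg(names, potency, sim, edge_threshold=0.65):
    """Return (nodes, edges): nodes carry potency, edges link similar compounds."""
    nodes = {name: {"potency": p} for name, p in zip(names, potency)}
    edges = [
        (names[i], names[j], sim[i][j])
        for i, j in combinations(range(len(names)), 2)
        if sim[i][j] > edge_threshold  # keep only highly similar pairs
    ]
    return nodes, edges


# Toy data (hypothetical compound names and values).
nsg_names = ["c1", "c2", "c3"]
nsg_potency = [9.0, 8.5, 6.0]
nsg_sim = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.7],
    [0.3, 0.7, 1.0],
]
nsg_nodes, nsg_edges = build_nsg(nsg_names, nsg_potency, nsg_sim)
```

Node-level annotations such as the compound discontinuity score could be stored next to the potency in each node dictionary.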


NSG – Score Annotations

node size: compound discontinuity score; highlights compounds that introduce SAR discontinuity/activity cliffs in a data set

cluster scores: SAR Index for compound clusters; indicates the level of continuity/discontinuity in a group of similar compounds

global scores: SAR Index for the entire compound set; indicates the level of continuity/discontinuity in the data set

NSG Interpretation - Local SAR Features


Activity Cliff Index

NSGs provide interactive graphical access to prominent activity cliffs

Cliff Index (CI) enables systematic mining and ranking of activity cliffs

CI prioritizes pairs of similar compounds having large potency differences; CI(i, j) is computed from the potencies P_i and P_j and the similarity sim(i, j) [formula not recoverable from the extracted text]

[Figure: activity cliff marker; labels CI = 15.2, CI = 13.4, 7 μM, 1 μM, 0.015 nM]

SAR Pathways from NSGs

Pathways are annotated with compound discontinuity scores to emphasize compounds forming activity cliffs

A SAR pathway is a sequence of pairwise similar compounds with balanced chemical and activity similarity

potency increases from start to end node (potency gradient)

smooth gradients are preferred


SAR Pathways

Cytochrome P450 2C19 PubChem screening data set

Preferred pathways with:

pairwise similar compounds

small increase in potency per compound

large potency difference between start- and endpoint

smooth potency gradient

many compounds

[Figure label: scaffold hop]

Pathway SAR Model

Pathways are based on a predefined SAR model

[Figure: model of potency along a pathway, annotated with the deviation from a linear potency increase, the number of compounds in the pathway, and the potency difference between start- and endpoint]
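The three quantities named in the pathway model (number of compounds, potency difference between start- and endpoint, deviation from a linear potency increase) can be computed for a candidate pathway as sketched below. How the slides combine them into a ranking is not stated, so the hypothetical function only reports the individual descriptors.

```python
def pathway_descriptors(potencies):
    """Describe a pathway given the potencies of its compounds in path order."""
    n = len(potencies)
    if n < 2:
        raise ValueError("a pathway needs at least two compounds")
    start, end = potencies[0], potencies[-1]
    step = (end - start) / (n - 1)  # per-compound increase of an ideal linear gradient
    deviations = [abs(p - (start + k * step)) for k, p in enumerate(potencies)]
    return {
        "n_compounds": n,
        "potency_difference": end - start,
        "mean_deviation_from_linear": sum(deviations) / n,
        "monotonic_increase": all(b >= a for a, b in zip(potencies, potencies[1:])),
    }


# Toy pathways (hypothetical pIC50-like values).
smooth_path = pathway_descriptors([5.0, 6.0, 7.0, 8.0])
bumpy_path = pathway_descriptors([5.0, 7.5, 7.0, 8.0])
```

Both toy pathways span the same potency difference, but only the first follows the smooth, linear gradient that the slide describes as preferred.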


SAR Trees

Cytochrome P450 2C19 PubChem screening data set

SAR Trees provide a structural context for individual pathways

Activity cliff pathways can be monitored

A SAR Tree is a set of pathways organized in a tree:

root: all pathways begin at (or lead to) the same compound

branches: identical pathway sections are fused into one branch

leaves: endpoints of potency gradients (most/least potent compounds)

[Figure label: activity cliff]
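Fusing identical pathway sections into shared branches is, in essence, a prefix-tree construction. The sketch below assumes each pathway is a list of compound identifiers starting at the common root; the identifiers are hypothetical, and the nested-dictionary representation is an illustrative choice rather than the authors' data structure.

```python
def build_sar_tree(pathways):
    """Fuse pathways that share a root and leading sections into a nested dict."""
    roots = {path[0] for path in pathways}
    if len(roots) != 1:
        raise ValueError("all pathways must start at the same root compound")
    tree = {}
    for path in pathways:
        node = tree
        for compound in path:
            node = node.setdefault(compound, {})  # shared prefixes reuse one branch
    return tree


def leaf_paths(tree, prefix=()):
    """Yield each root-to-leaf pathway of the tree as a tuple of compounds."""
    for compound, subtree in tree.items():
        path = prefix + (compound,)
        if subtree:
            yield from leaf_paths(subtree, path)
        else:
            yield path


# Toy pathways (hypothetical compound identifiers) sharing the root "A".
toy_tree = build_sar_tree([["A", "B", "C"], ["A", "B", "D"], ["A", "E"]])
toy_leaves = list(leaf_paths(toy_tree))
```

The first two toy pathways share the section A to B, which therefore appears only once in the tree.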


Advanced Application: Studying Multi-target SARs Using NSGs

NSGs can also be utilized to compare SAR behavior for multiple targets

Node color reflects compound selectivity instead of potency

From Activity Cliffs to Selectivity Cliffs

Multi-target SARs

Target-pair selectivity: difference between logarithmic potency values

$$S_{A/B}(i) = -S_{B/A}(i) = P_A(i) - P_B(i)$$

Structure-selectivity relationships (SSRs)

Example: cathepsin L pIC50 = 9, cathepsin B pIC50 = 7, so S_{L/B} = 2
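The target-pair selectivity is a plain difference of logarithmic potencies, which a few lines make concrete. The function names are illustrative, and the conversion helper assumes potencies given in mol/L.

```python
import math


def p_value(concentration_molar):
    """Negative decadic logarithm of a potency value given in mol/L."""
    return -math.log10(concentration_molar)


def selectivity(p_a, p_b):
    """Target-pair selectivity S_A/B = P_A - P_B on the logarithmic scale."""
    return p_a - p_b


# Slide example: pIC50 of 9 against cathepsin L and 7 against cathepsin B.
s_l_over_b = selectivity(9.0, 7.0)
s_b_over_l = selectivity(7.0, 9.0)  # swapping the targets flips the sign
```

A positive value indicates selectivity for the first target of the pair, a negative value for the second.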


SAR and SSR Network Analysis

[Figure: potency-based NSG for cathepsin L. Legend: potency color scale from 3.0 to 10.4; compound discontinuity score from 0 to 1; activity cliff markers; cluster discontinuity scores (values shown: 0.48, 0.05)]

[Figure: potency-based NSGs for cathepsin L and cathepsin B with regions labeled “rough” SAR and “smooth” SAR; cluster discontinuity scores shown: 0.48, 0.05, 0.10, 0.27]


SAR and SSR Network Analysis

[Figure: selectivity-based NSG for cathepsin L / cathepsin B. Legend: selectivity color scale from 3.2 (L) to 3.2 (B); compound discontinuity score from 0 to 1; selectivity cliff markers; cluster discontinuity scores (values shown: 0.73, 0.72); regions labeled “rough” SSR and “smooth” SSR]

Local SSR Environments

[Figure: cathepsin L / cathepsin B selectivity-based NSG (cluster discontinuity scores shown: 0.73, 0.72) with a region labeled discontinuous SSR]


Activity Cliffs vs. Selectivity Cliffs

[Figure: NSG panels L, B, and L/B showing discontinuous SAR, continuous SAR, and discontinuous SSR, with activity cliff markers and selectivity cliff markers. Compound annotations: L: 15 nM, B: 3.5 μM, L/B: 1.4; L: 10 μM, B: 170 nM, L/B: -1.8]

Local SSR Environments

[Figure: cathepsin L / cathepsin B NSG (cluster discontinuity scores shown: 0.73, 0.72) with a region labeled discontinuous SSR]


Activity Cliffs vs. Selectivity Cliffs

[Figure: NSG panels L, B, and L/B showing continuous SAR, continuous SAR, and discontinuous SSR, with selectivity cliff markers. Compound annotations: L: 3.6 μM, B: 102 μM, L/B: 1.5; L: 26 μM, B: 5.3 μM, L/B: -0.7]

Selectivity Determinants (L/B)

Molecules with different selectivity are found in the neighborhood of selectivity cliff markers

Selectivity rules can be formulated

Example rule: halogens with increasing bulkiness and decreasing electronegativity shift selectivity toward cat L (figure labels: sel: -0.7, sel: 0.1, sel: 1.5, sel: 2.0)


Selectivity Determinants (K/L)

bulkier substituents shift selectivity towards cat L (figure labels: sel: 2.3, sel: -0.6, sel: -0.8, sel: -0.8)

Conclusions

Numerical and graphical analysis tools have been developed for mining SAR information in compound data sets

Annotated similarity-based compound networks play an important role in graphical SAR analysis

NSGs enable a systematic comparison of global and local SAR features in compound data sets and the identification of activity cliffs

SAR Trees are based on a predefined SAR model

NSGs enable a comparative analysis of multi-target SARs
