
Prognostic Models and Data Mining in Medicine, part II

Instance-Based Reasoning

Overview

1. Introduction
2. Case-based reasoning
3. Example: content-based image retrieval
4. k-NN classification
5. Case study: ICU prognosis
6. Summary

Includes material from:
- Tan, Steinbach, Kumar: Introduction to Data Mining

1. Introduction

Instance-Based Reasoning

[Figure: a set of stored cases, each with attributes Atr1 ... AtrN and a class label (A, B or C), and an unseen case with attributes Atr1 ... AtrN but no class label]

• Store the training records
• Select and use similar training records to predict the class label of unseen cases
• No model!

The Basic Idea ...

If it walks like a duck, quacks like a duck, then it's probably a duck.

[Figure: training records → determine similarity → choose the most similar records ← test record]

The Flavors ...

• Case-based reasoning (CBR)
  – alternative to reasoning with explicit knowledge (e.g. IF-THEN rules)
  – used in decision support systems
• k-Nearest Neighbours (k-NN) classification
  – "lazy" machine learning method
  – few assumptions, adapts to problem domain


2. Case-Based Reasoning (CBR)

The CBR paradigm

• Utilize specific knowledge of previously solved problems to solve new problems.
• No need to formulate general rules about the problem domain.
• Use intra-domain analogical reasoning.

The CBR Cycle

[Figure: a new problem is matched against the case library (RETRIEVE); the retrieved cases are adapted into retrieved solutions (REUSE); the revised solution is checked (REVISE); and the retained experience is stored back in the case library (RETAIN). Our focus here is the retrieve step.]

Retrieve Step

• Find cases that are most similar to the current problem.
• Requires a notion of similarity ("nearness") with the following general properties:
  1. s(x,x') ≥ 0 for all x and x'. (positiveness)
  2. s(x,x') = 1 only if x = x'.
  3. s(x,x') = s(x',x) for all x and x'. (symmetry)
• A common approach is to construct a distance metric d for the feature space and use s(x,x') = 1 − d(x,x')/d_max, where d_max is an upper limit on the distance.
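As a small illustration (not from the slides), the following Python sketch derives a similarity score from a Euclidean distance in exactly this way; the case attributes and the value of d_max are made up for the example:

```python
import math

def euclidean(x, y):
    """Plain Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def similarity(x, y, d_max):
    """s(x, x') = 1 - d(x, x') / d_max, with d_max an upper bound on the distance."""
    return 1.0 - euclidean(x, y) / d_max

# Hypothetical cases described by (age, heart rate), with an assumed d_max.
print(similarity((65, 80), (70, 95), d_max=200.0))   # close to 1 for similar cases
```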

Distance Metrics (1)

Formally, for all objects x, x', x'' we should have that
– d(x,x') ≥ 0 (positiveness)
– d(x,x') = 0 iff x = x'
– d(x,x') = d(x',x) (symmetry)
– d(x,x'') ≤ d(x,x') + d(x',x'') (triangle inequality)

[Figure: illustrations of the symmetric nature of distance functions and of the triangle inequality]

Distance Metrics (2)

The best-known metric that satisfies these properties is the Euclidean distance for objects in R^m:

d(x, x') = √( Σ_{i=1..m} (x_i − x_i')² )

[Figure: equal-distance "contours"]


Minkowski Distance

The Minkowski distance (also called power metric) is a generalization of the Euclidean distance:

d(x, x') = ( Σ_{i=1..m} |x_i − x_i'|^r )^(1/r)

• r = 1. City block (Manhattan, taxicab, L1 norm) distance.
  – Called Hamming distance for binary vectors
• r = 2. Euclidean distance
• r → ∞. "Supremum" (Lmax norm, L∞ norm) distance.
  – This is the maximum difference between any single component of the two objects.

Example: Distance Matrix

point  x  y
p1     0  2
p2     2  0
p3     3  1
p4     5  1

L1   p1  p2  p3  p4
p1    0   4   4   6
p2    4   0   2   4
p3    4   2   0   2
p4    6   4   2   0

L2   p1     p2     p3     p4
p1   0      2.828  3.162  5.099
p2   2.828  0      1.414  3.162
p3   3.162  1.414  0      2
p4   5.099  3.162  2      0

L∞   p1  p2  p3  p4
p1    0   2   3   5
p2    2   0   1   3
p3    3   1   0   2
p4    5   3   2   0
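The distance matrices above can be reproduced with a few lines of NumPy; this is an illustrative sketch rather than anything referenced in the slides:

```python
import numpy as np

# The four 2-D points from the example.
points = np.array([[0, 2],   # p1
                   [2, 0],   # p2
                   [3, 1],   # p3
                   [5, 1]])  # p4

def minkowski_matrix(X, r):
    """Pairwise Minkowski distances d(x, x') = (sum_i |x_i - x_i'|^r)^(1/r)."""
    diff = np.abs(X[:, None, :] - X[None, :, :])   # shape (n, n, m)
    if np.isinf(r):
        return diff.max(axis=-1)                   # supremum (L-infinity) norm
    return (diff ** r).sum(axis=-1) ** (1.0 / r)

print(np.round(minkowski_matrix(points, 1), 3))       # L1 (city block) matrix
print(np.round(minkowski_matrix(points, 2), 3))       # L2 (Euclidean) matrix
print(np.round(minkowski_matrix(points, np.inf), 3))  # L-infinity matrix
```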

Non-numeric attributes

• For non-numeric attributes, a dedicated distance measure is required
• A simple solution for categorical attributes is to take d(x_i, x_i') = 0 if x_i = x_i', and d(x_i, x_i') = 1 otherwise (Manhattan distance)
• More sophisticated solutions are possible when there exists a (partial or complete) order on the categories

Scaling

• Weighted Minkowski distance:

d(x, x') = ( Σ_{i=1..m} w_i |x_i − x_i'|^r )^(1/r)

• Use w_i = 0 for attribute selection
• Problem: there exists no general method for assessing the weights w_i
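A minimal sketch of both ideas; the attribute values and weights below are invented for illustration:

```python
import numpy as np

def weighted_minkowski(x, y, w, r=2):
    """Weighted Minkowski distance (sum_i w_i * |x_i - y_i|^r)^(1/r);
    setting w_i = 0 drops attribute i entirely (attribute selection)."""
    x, y, w = (np.asarray(v, float) for v in (x, y, w))
    return float((w * np.abs(x - y) ** r).sum() ** (1.0 / r))

def categorical_distance(x, y):
    """0/1 distance per categorical attribute, summed over the attributes."""
    return sum(0 if xi == yi else 1 for xi, yi in zip(x, y))

# Weight 0 on the second attribute means it is ignored in the distance.
print(weighted_minkowski([1.75, 70.0], [1.60, 95.0], w=[1.0, 0.0]))   # ≈ 0.15
print(categorical_distance(["male", "O+"], ["female", "O+"]))         # -> 1
```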

Case-based reasoning in medicine

CBR is a popular methodology for building decision support systems in the health sciences.

• Case histories are essential in the training of healthcare professionals.
• The medical literature is filled with anecdotal accounts of the treatments of individual patients.
• Many diseases are not well enough understood for formal models or general guidelines.
• Reasoning from examples is natural for healthcare professionals.

Advantages of CBR

• Intuitive problem-solving method.
• No need to formulate domain knowledge.
• Supports an interactive way of finding solutions.

Disadvantages of CBR

• Can lead to copying mistakes from the past.
• Cases do not include knowledge of the domain, and this handicaps explanation facilities.


3. Example: Content-based image retrieval

Content-based image retrieval

• Digital images for diagnostics and therapy are produced in ever-increasing quantities in medicine
• Access to relevant medical images can improve clinical decisions
• The most convenient solution (for the user) is content-based retrieval of relevant images
• Content = colors, shapes, textures, or any other information that can be derived from the image
• Sometimes also called visual query-by-example

The ASSERT system (1)

• Computer-aided diagnosis with computed tomographic (CT) images of the chest (heart and lungs).
• When presented with a new image, the system retrieves similar images with known diagnoses from the database.

[Figure: a query chest CT image ("?") and retrieved images labeled Emphysema, Emphysema, Macro nodules, Micro nodules]

The ASSERT system (2)

• Preliminary validation: the percentage of correct diagnoses by inexperienced doctors increased from 29% to 62% with computer assistance.

A.M. Aisen et al., Radiology 2003;228:265-70.

The ASSERT system (3)

[Figure: example of the ASSERT system in use]

4. k-Nearest Neighbor (k-NN) Classification


Nearest-Neighbor Classification

• Classification method from ML that resembles case-based reasoning.
• Key idea: to classify an object x, locate the nearest object(s) in the training set, and look at its/their class(es)
• Avoids constructing a model, instead classifies directly from the data ("lazy learning")

Nearest-Neighbor Classifiers

• Requires three things
  – The set of stored objects
  – A distance metric to compute the distance between objects
  – The value of k, the number of nearest neighbors to retrieve
• To classify a new object:
  – Compute its distance to the training objects
  – Identify the k nearest neighbors
  – Use the class labels of the nearest neighbors to determine the class label of the new object (e.g., by taking a majority vote)

Definition of Nearest Neighbor

The k nearest neighbors of an object x are the data points that have the k smallest distances to x.

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a query point X]

Example

Nine objects with numeric attributes x1 and x2, a binary attribute x3, and class label y; objects 8 and 9 are unlabeled and must be classified:

obj  x1  x2  x3  y
1     0   0   T  −
2     0  -3   F  −
3    -2  -2   F  −
4     0   5   T  +
5     2   3   T  +
6    -2   2   T  +
7    -5   0   F  +
8     1   1   T  ?
9    -2   0   F  ?

For object 8, the distances to objects 1–7 are 2, 18, 19, 17, 5, 10 and 38 (squared Euclidean, with x3 contributing 0 or 1). The nearest object is object 1 (class −), so 1-NN predicts −; the three nearest objects are 1, 5 and 6 (−, +, +), so 3-NN predicts +.
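A small Python sketch (not part of the original slides) that reproduces these predictions; x3 is encoded as 1 for T and 0 for F, which yields exactly the distances listed above:

```python
import numpy as np
from collections import Counter

# The seven labeled objects from the example table (x1, x2, x3 with T=1, F=0).
X = np.array([[ 0,  0, 1], [ 0, -3, 0], [-2, -2, 0], [ 0,  5, 1],
              [ 2,  3, 1], [-2,  2, 1], [-5,  0, 0]], dtype=float)
y = np.array(["-", "-", "-", "+", "+", "+", "+"])

def knn_predict(x_query, k):
    """Majority vote among the k training objects nearest to x_query."""
    d2 = ((X - np.asarray(x_query, float)) ** 2).sum(axis=1)  # squared Euclidean distances
    nearest = np.argsort(d2)[:k]                              # indices of the k nearest objects
    return Counter(y[nearest]).most_common(1)[0][0]

print(knn_predict([ 1, 1, 1], k=1), knn_predict([ 1, 1, 1], k=3))   # object 8: '-' then '+'
print(knn_predict([-2, 0, 0], k=1), knn_predict([-2, 0, 0], k=3))   # object 9: '-' then '-'
```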

Example

The same nine objects, now classifying object 9: both the nearest labeled object and the majority among the three nearest labeled objects have class −, so 1-NN predicts − and 3-NN predicts −.

1-Nearest Neighbor

[Figure: Voronoi diagram of the training objects; each cell is the region of the feature space assigned by the 1-NN classifier to the class of its training object]


How many Neighbors? (1)

• Choosing the value of k:
  – If k is too small, the classifier is sensitive to noise points
  – If k is too large, the neighborhood may include points from other classes

How many Neighbors? (2)

• A possible solution is to determine the optimal value of k using the data
• That is, we try different values of k, and choose the one that performs best ...
• ... on a test set, or with cross-validation (Why?)
• This is basically the same solution as choosing the optimal size of a decision tree by post-pruning
• If we also want to evaluate the final k-NN classifier, we have to use another, separate test set ...
• ... or an outer cross-validation loop
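For concreteness, a sketch of this model-selection step using scikit-learn (the data here is synthetic and purely illustrative; any k-NN implementation and resampling scheme would do):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical data: 200 objects with 3 numeric attributes and a binary class.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Try several values of k and keep the one with the best cross-validated accuracy.
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in (1, 3, 5, 7, 9, 15)}
best_k = max(scores, key=scores.get)
print(scores, best_k)   # evaluating best_k fairly still requires a separate test set
```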

Classification rules

Determine the class from the nearest-neighbor list:

• Take the majority vote of the class labels among the k nearest neighbors
• Or weigh the votes according to distance:

P(y | x_q) = Σ_{j=1..k} w_j y_j / Σ_{j=1..k} w_j, where w_j = 1 / d(x_q, x_j)²

(here y_j indicates whether neighbor j carries class y)

Note: with distance weighting it makes sense to use all training objects, instead of just k (Shepard's method).

Scaling issues

• Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes
• Example:
  – height of a person may vary from 1.5 m to 1.8 m
  – weight of a person may vary from 90 lb to 300 lb
  – income of a person may vary from $10K to $1M
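The sketch below combines both points: attributes are standardized first, and neighbors then vote with weight 1/d² as in the formula above (data and class labels are invented for the example):

```python
import numpy as np

def weighted_knn_vote(X_train, y_train, x_query, k=3, eps=1e-12):
    """Distance-weighted k-NN vote with w_j = 1 / d(x_q, x_j)^2; attributes are
    z-score scaled so no single attribute dominates the distance, and eps guards
    against division by zero when the query coincides with a training object."""
    X = np.asarray(X_train, float)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs, qs = (X - mu) / sd, (np.asarray(x_query, float) - mu) / sd
    d = np.linalg.norm(Xs - qs, axis=1)
    idx = np.argsort(d)[:k]                         # the k nearest neighbors
    w = 1.0 / (d[idx] ** 2 + eps)
    y = np.asarray(y_train)
    probs = {c: w[y[idx] == c].sum() / w.sum() for c in np.unique(y)}
    return max(probs, key=probs.get), probs         # predicted class and P(y | x_q)

# Height (m) and weight (lb) live on very different scales.
X = [[1.60, 120], [1.75, 180], [1.80, 250], [1.55, 110]]
y = ["A", "B", "B", "A"]
print(weighted_knn_vote(X, y, [1.70, 150], k=3))
```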

Lazy learning

• k-NN classifiers are lazy learners
  – No models are built, unlike eager learners such as decision tree induction and rule-based systems.
  – Classifying new objects is relatively expensive.

Curse of Dimensionality

• When dimensionality increases, data becomes increasingly sparse in the space that it occupies
• Definitions of density and distance between points, which are critical for k-NN, become less meaningful
• k-NN is easily misled in high-dimensional spaces
• Illustration: randomly generate 500 points and compute the difference between the maximum and minimum distance between any pair of points; relative to the minimum distance, this difference shrinks as the dimensionality grows
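A quick way to see this effect yourself (a sketch of the experiment described above; the exact setup behind the original figure is assumed):

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for dim in (2, 10, 50, 200):
    X = rng.random((500, dim))                  # 500 random points in the unit hypercube
    d = pdist(X)                                # all pairwise Euclidean distances
    print(dim, (d.max() - d.min()) / d.min())   # relative spread shrinks with dimension
```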


Advantages of k-NN

• No assumptions on model form (no inductive bias): very flexible ML method.
• Training is very fast.
• No loss of information.

Disadvantages of k-NN

• Requires more data than eager learners to obtain the same performance.
• Classifying new objects can be slow.
• Curse of dimensionality: does not work in high-dimensional spaces.

5. Case study: ICU prognosis

Joint work with Clarence Tan and Linda Peelen.

Prognosis in intensive care

• Case-mix correction of outcomes (mortality) for benchmarking and institutional comparison.
• Prediction: probability of hospital death.
• Based on the APACHE II score.

[Figure: patient data sheet → scoring → score → LR model → probability; in the instance-based variant, the LR model is replaced by k-NN]

Why k-NN?

• Logistic regression assumes a fixed relationship between case-mix (score) and outcome, which may not hold.
• A logistic regression model becomes outdated after time (sensitive to drift).

But ...
• k-NN requires a large dataset
• k-NN works only in low-dimensional domains

k-NN regression

[Figure: a query point X among + and − cases in a two-dimensional feature space (a1, a2); the 1-NN regression estimate for X is 1, the 5-NN regression estimate is 0.4]


Kernel regression

• Similar to weighted k-NN, but uses a predefined kernel function to transform (normalized) distances into weights.
• Typical kernels: uniform, tri-cube, Epanechnikov.

How many neighbors?

• Problem: how many neighbors should we have in the neighborhood?
• Solution: depends on the problem, so learn it from the data!
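A minimal sketch of kernel-weighted regression with these three kernels (the kernel formulas are the standard textbook ones; the data and bandwidth in the usage lines are invented):

```python
import numpy as np

# Kernel functions mapping a normalized distance u = d / bandwidth to a weight.
def uniform(u):      return np.where(u <= 1, 0.5, 0.0)
def epanechnikov(u): return np.where(u <= 1, 0.75 * (1.0 - u**2), 0.0)
def tricube(u):      return np.where(u <= 1, (1.0 - u**3)**3, 0.0)

def kernel_regress(X, y, x_query, bandwidth, kernel=epanechnikov):
    """Kernel-weighted average of the outcomes y; the bandwidth plays the role
    of the neighborhood size in k-NN regression."""
    d = np.linalg.norm(np.asarray(X, float) - np.asarray(x_query, float), axis=1)
    w = kernel(d / bandwidth)                 # distances -> weights, 0 outside the bandwidth
    return float(np.sum(w * np.asarray(y, float)) / np.sum(w))

X = [[0.0], [1.0], [2.0], [3.0]]              # one attribute, four cases
y = [0, 0, 1, 1]                              # binary outcome (e.g. hospital death)
print(kernel_regress(X, y, [1.5], bandwidth=2.0))   # -> 0.5
```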

Choosing the neighborhood size

[Figure: R-squared of the instance-based model plotted against the neighborhood size]

Model validation

Validating a prognostic model means establishing that the model works satisfactorily for patients other than those from whose data it was derived.

As for any other scientific hypothesis, the validity of a model is established by gathering incremental evidence across diverse settings.

Types of validity

• Internal validity: the model is valid for patients from the same population and in the same setting.
• Prospective validity: the model is valid for future patients from the same population and in the same setting.
• External validity: the model is valid for patients from another population or another setting.

Prospective validation

Does the k-NN prediction method generalize well to prospective data?

[Figure: the training set supplies the instance base and the kernel parameter settings of the IBR method; queries from a separate validation set are answered against the instance base, yielding the results]


Prospective validation: results

AUC       Internal validation   Prospective validation   LR model
APACHE    0.792                 0.784                    0.804
SAPS      0.860                 0.867                    0.877

Incremental prospective validation

Does the method generalize well while adding data?
– ICU admissions with known outcomes become examples for new instances

[Figure: as in the prospective validation setup, but validation-set admissions are added to the instance base as soon as their outcome is known]

Incremental prospective validation: results

AUC       Plain prospective validation   Incremental prospective validation   LR model
APACHE    0.784                          0.809                                0.804
SAPS      0.867                          0.867                                0.877

6. Summary

Summary: CBR and k-NN

• Case-based reasoning is a methodology to build advice systems. It utilizes experience of previously solved problems to solve new problems.
• k-NN is a supervised machine learning method that avoids constructing a model and classifies new objects directly from the training data.
• Both methods are very flexible because they do not try to exploit general rules – but this also means that they provide no new insights.
• The notion of similarity/distance is central to both approaches.


Prognostic Models and Data Mining in Medicine, part II

Association Rule Discovery

Overview

1. Introduction
2. Frequent Itemset Generation
3. Rule Generation
4. Interpretation and Evaluation
5. Application: Hospital Infection Control
6. Summary

Includes material from:
- Tan, Steinbach, Kumar: Introduction to Data Mining
- Witten & Frank: Data Mining. Practical Machine Learning Tools and Techniques

1. Introduction

Association Rule Discovery: Definition

• Given a set of records, each of which contains some number of items ("transaction") from a given collection:
• Produce dependency rules which will predict the occurrence of an item based on occurrences of other items.

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules Discovered:
{Milk} → {Coke}
{Diaper, Milk} → {Beer}

Implication means co-occurrence, not causality!

Associational learning

• Can be applied if no class is specified and any kind of structure is considered "interesting"
• Difference to classification learning:
  – Can predict any attribute's value, not just the class, and more than one attribute's value at a time
  – Hence: far more association rules than classification rules
  – Thus: constraints are necessary, e.g. minimum coverage and minimum accuracy

Definition: Frequent Itemset

• Itemset
  – A collection of one or more items, e.g. {Milk, Bread, Diaper}
  – k-itemset: an itemset that contains k items
• Support count (σ)
  – Frequency of occurrence of an itemset
  – E.g. σ({Milk, Bread, Diaper}) = 2
• Support
  – Fraction of transactions that contain an itemset
  – E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset
  – An itemset whose support is greater than or equal to a minsup threshold

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke


Definition: Association Rule

• Association Rule
  – An implication expression of the form X → Y, where X and Y are itemsets
  – Example: {Milk, Diaper} → {Beer}
• Rule Evaluation Metrics
  – Support (s): fraction of transactions that contain both X and Y
  – Confidence (c): measures how often items in Y appear in transactions that contain X

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example: {Milk, Diaper} ⇒ {Beer}

s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
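These two metrics are straightforward to compute directly; the following sketch reproduces the numbers above for the example transactions:

```python
# The five example transactions from the slide.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset):
    """sigma(itemset): number of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions)

def support(itemset):
    """s(itemset): fraction of transactions containing the itemset."""
    return support_count(itemset) / len(transactions)

def confidence(lhs, rhs):
    """c(lhs -> rhs) = sigma(lhs union rhs) / sigma(lhs)."""
    return support_count(lhs | rhs) / support_count(lhs)

print(support({"Milk", "Diaper", "Beer"}))        # 0.4
print(confidence({"Milk", "Diaper"}, {"Beer"}))   # 0.666...
```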

Association Rule Mining Task

• Given a set of transactions T, the goal of association rule mining is to find all rules having
  – support ≥ minsup threshold
  – confidence ≥ minconf threshold
• Brute-force approach:
  – List all possible association rules
  – Compute the support and confidence for each rule
  – Prune rules that fail the minsup and minconf thresholds
  → Computationally prohibitive!

Mining Association Rules

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of Rules:
{Milk, Diaper} → {Beer}  (s=0.4, c=0.67)
{Milk, Beer} → {Diaper}  (s=0.4, c=1.0)
{Diaper, Beer} → {Milk}  (s=0.4, c=0.67)
{Beer} → {Milk, Diaper}  (s=0.4, c=0.67)
{Diaper} → {Milk, Beer}  (s=0.4, c=0.5)
{Milk} → {Diaper, Beer}  (s=0.4, c=0.5)

Observations:
• All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but can have different confidence
• Thus, we may decouple the support and confidence requirements

Mining Association Rules

• Two-step approach:
  1. Frequent Itemset Generation
     – Generate all itemsets whose support ≥ minsup
  2. Rule Generation
     – Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
• Frequent itemset generation is still computationally expensive

2. Frequent Itemset Generation

Frequent Itemset Generation

[Figure: the lattice of all itemsets over the items A, B, C, D, E, from the empty (null) itemset down to ABCDE]

Given d items, there are 2^d possible candidate itemsets.


Reducing Number of Candidates

• Apriori principle:
  – If an itemset is frequent, then all of its subsets must also be frequent
• The Apriori principle holds due to the following property of the support measure:

  ∀ X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

  – The support of an itemset never exceeds the support of its subsets
  – This is known as the anti-monotone property of support

Illustrating Apriori Principle

[Figure: in the itemset lattice, once an itemset is found to be infrequent, all of its supersets can be pruned]

Illustrating Apriori Principle

Minimum Support = 3

Items (1-itemsets):
Item     Count
Bread    4
Coke     2
Milk     4
Beer     3
Diaper   4
Eggs     1

Pairs (2-itemsets) — no need to generate candidates involving Coke or Eggs:
Itemset           Count
{Bread, Milk}     3
{Bread, Beer}     2
{Bread, Diaper}   3
{Milk, Beer}      2
{Milk, Diaper}    3
{Beer, Diaper}    3

Triplets (3-itemsets):
Itemset                 Count
{Bread, Milk, Diaper}   3

If every subset is considered: 6C1 + 6C2 + 6C3 = 41 candidates.
With support-based pruning: 6 + 6 + 1 = 13.

Apriori Algorithm

• Method:
  – Let k = 1
  – Generate frequent itemsets of length 1
  – Repeat until no new frequent itemsets are identified:
    · Generate length (k+1) candidate itemsets from the length-k frequent itemsets
    · Prune candidate itemsets containing subsets of length k that are infrequent
    · Count the support of each candidate by scanning the DB
    · Eliminate candidates that are infrequent, leaving only those that are frequent
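A compact (and deliberately unoptimized) Python sketch of this level-wise procedure; it follows the method outlined above but is not the implementation used in the course:

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Level-wise frequent itemset generation in the spirit of the Apriori method."""
    items = sorted({i for t in transactions for i in t})
    frequent, candidates = {}, [frozenset([i]) for i in items]
    while candidates:
        # Count the support of each candidate by scanning the transactions.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= minsup_count}
        frequent.update(level)
        # Generate (k+1)-candidates from frequent k-itemsets, pruning any
        # candidate with an infrequent k-subset (anti-monotone property).
        k = len(candidates[0]) + 1
        merged = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [c for c in merged
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

transactions = [{"Bread", "Milk"}, {"Bread", "Diaper", "Beer", "Eggs"},
                {"Milk", "Diaper", "Beer", "Coke"}, {"Bread", "Milk", "Diaper", "Beer"},
                {"Bread", "Milk", "Diaper", "Coke"}]
print(apriori(transactions, minsup_count=3))   # frequent itemsets with support count >= 3
```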

Example

TID  Items
1    A, B, C, D
2    A, C, D, F
3    C, D, E, G, A
4    A, D, F, B
5    B, C, G
6    D, F, G
7    A, B, G
8    C, D, F, G

Itemset support counts with minsup = 33%:
• A:5, B:4, C:5, D:6, E:1, F:4, G:5
• AB:3, AC:3, AD:3, AF:2, AG:2, BC:2, BD:2, BF:1, BG:2, CD:4, CF:2, CG:3, DF:4, DG:3, FG:2
• ABC:1, ABD:2, ACD:3, CDG:2, CDF:2, DFG:2
• Done

Maximal Frequent Itemset

An itemset is maximal frequent if none of its immediate supersets is frequent.

[Figure: itemset lattice with the border between frequent and infrequent itemsets; the maximal frequent itemsets lie just inside this border]


Closed Itemset

An itemset is closed if none of its immediate supersets has the same support as the itemset.

TID  Items
1    {A,B}
2    {B,C,D}
3    {A,B,C,D}
4    {A,B,D}
5    {A,B,C,D}

Itemset      Support
{A}          4
{B}          5
{C}          3
{D}          4
{A,B}        4
{A,C}        2
{A,D}        3
{B,C}        3
{B,D}        4
{C,D}        3
{A,B,C}      2
{A,B,D}      3
{A,C,D}      2
{B,C,D}      3
{A,B,C,D}    2

Maximal vs Closed Itemsets

TID  Items
1    ABC
2    ABCD
3    BCE
4    ACDE
5    DE

[Figure: the itemset lattice annotated with the transaction IDs supporting each itemset; itemsets such as ABCDE are not supported by any transaction]

Maximal vs Closed Frequent Itemsets

Minimum support = 2

[Figure: the same lattice, now marking which itemsets are closed but not maximal and which are both closed and maximal]

# Closed = 9
# Maximal = 4

3. Rule Generation

Rule Generation

• Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f → L − f satisfies the minimum confidence requirement
  – If {A,B,C,D} is a frequent itemset, the candidate rules are:
    ABC → D, ABD → C, ACD → B, BCD → A,
    A → BCD, B → ACD, C → ABD, D → ABC,
    AB → CD, AC → BD, AD → BC, BC → AD,
    BD → AC, CD → AB
• If |L| = k, then there are 2^k − 2 candidate association rules (ignoring L → ∅ and ∅ → L)
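Enumerating these candidates amounts to looping over the non-empty proper subsets of L; a small sketch:

```python
from itertools import combinations

def candidate_rules(L):
    """All 2^|L| - 2 candidate rules f -> L - f for a frequent itemset L
    (the empty antecedent and the empty consequent are ignored)."""
    L = frozenset(L)
    rules = []
    for size in range(1, len(L)):                 # every non-empty proper subset f
        for f in combinations(sorted(L), size):
            f = frozenset(f)
            rules.append((f, L - f))
    return rules

print(len(candidate_rules({"A", "B", "C", "D"})))   # 14 = 2**4 - 2 candidate rules
```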

Prognostic Models <strong>and</strong> Data Mining, part II <strong>Association</strong> Rule Discovery 24


Rule Generation

• How to efficiently generate rules from frequent itemsets?
  – In general, confidence does not have an anti-monotone property:
    c(ABC → D) can be larger or smaller than c(AB → D)
  – But the confidence of rules generated from the same itemset does have an anti-monotone property
  – E.g., for L = {A,B,C,D}: c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)
  – Confidence is anti-monotone w.r.t. the number of items on the RHS of the rule

Rule Generation for Apriori Algorithm

[Figure: lattice of rules derived from one frequent itemset; once a low-confidence rule is found, the rules below it in the lattice are pruned]

• A candidate rule is generated by merging two rules that share the same prefix in the rule consequent
• E.g., join(CD ⇒ AB, BD ⇒ AC) would produce the candidate rule D ⇒ ABC
• Prune rule D ⇒ ABC if its subset AD ⇒ BC does not have high confidence

4. Interpretation & Evaluation

Interpreting association rules

• Interpretation is not obvious:
  If windy = false and play = no then outlook = sunny and humidity = high
  is not the same as
  If windy = false and play = no then outlook = sunny
  If windy = false and play = no then humidity = high
• It means that the following also holds:
  If humidity = high and windy = false and play = no then outlook = sunny

Evaluating Association Rules

• Association rule algorithms tend to produce too many rules
  – many of them are uninteresting or redundant
  – e.g. redundant if {A,B,C} → {D} and {A,B} → {D} have the same support & confidence
• Interestingness measures can be used to prune/rank the derived patterns


Computing Interestingness Measure

• Given a rule X → Y, the information needed to compute rule interestingness can be obtained from a contingency table

Contingency table for X → Y:

        Y     ¬Y
X       f11   f10   f1+
¬X      f01   f00   f0+
        f+1   f+0   |T|

f11: support of X and Y
f10: support of X and ¬Y
f01: support of ¬X and Y
f00: support of ¬X and ¬Y

The table is used to define various measures: support, confidence, lift, Gini, J-measure, etc.

Drawback of Confidence

        Coffee   ¬Coffee
Tea     15       5         20
¬Tea    75       5         80
        90       10        100

Association Rule: Tea → Coffee

Confidence = P(Coffee | Tea) = 15/20 = 0.75
but P(Coffee) = 0.9
⇒ Although confidence is high, the rule is misleading:
⇒ P(Coffee | ¬Tea) = 75/80 = 0.9375

Other measures of association

Lift = P(Y | X) / P(Y)   (sometimes called "interest")

PS = P(X,Y) − P(X) P(Y)

φ-coefficient = ( P(X,Y) − P(X) P(Y) ) / √( P(X)[1 − P(X)] · P(Y)[1 − P(Y)] )

Example: Lift

        Coffee   ¬Coffee
Tea     15       5         20
¬Tea    75       5         80
        90       10        100

Association Rule: Tea → Coffee

Confidence = P(Coffee | Tea) = 0.75
but P(Coffee) = 0.9
⇒ Lift = 0.75 / 0.9 = 0.8333 (< 1, therefore Tea and Coffee are negatively associated)
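The same numbers fall straight out of the contingency table; a tiny sketch:

```python
# Tea/Coffee contingency table from the slide (100 transactions in total).
n, n_tea, n_coffee, n_tea_and_coffee = 100, 20, 90, 15

confidence = n_tea_and_coffee / n_tea        # P(Coffee | Tea) = 0.75
lift = confidence / (n_coffee / n)           # 0.75 / 0.9 ≈ 0.83

print(confidence, lift)   # lift < 1: Tea and Coffee are negatively associated
```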

5. Application: Biosurveillance systems

Biosurveillance systems

• Biosurveillance systems are computer programs for the early detection of infectious outbreaks.
• Hospital environments (and especially ICUs) are liable to outbreaks of infections, but outbreaks can also occur elsewhere.
• Traditionally, biosurveillance systems assume a predefined event whose incidence is to be monitored.


Traditional Approaches (1)

• Various methods have been developed for monitoring event data:
  – Time series analysis
  – Regression techniques
  – Statistical quality control methods

[Figure: time series of the number of ED visits per day over roughly 100 days]

• Problem: we need to know in advance which events should be monitored.
• We cannot focus on other characteristics (e.g. spatial or demographic) of an epidemic.

Traditional Approaches (2)

We would need to build a univariate detector to monitor each interesting combination of attributes:

• Diarrhea cases among children
• Respiratory syndrome cases among females
• Number of cases involving teenage girls living in the western part of the city
• Viral syndrome cases involving senior citizens from the eastern part of the city
• Botulinic syndrome cases
• Number of children from the downtown hospital
• Number of cases involving people working in the southern part of the city
• And so on ...

You'll need hundreds of univariate detectors! Instead, we would like to identify the groups with the strangest behavior in recent events.

The approach from Brossette et al. (1)

1. For each time window (e.g. one month), discover all high-support association rules.
2. The confidence of each rule discovered in the current slice is compared with its confidence in previous slices.
3. If the confidence has changed significantly, this is reported to the user.

The approach from Brossette et al. (2)

Advantages:
• Not limited to a single event.
• Can take other characteristics of an epidemic (e.g. location) into account.

Disadvantages:
• The method is statistically poorer than traditional approaches. Some "significant" changes in rule confidence will occur due to chance.
• The patterns identified by the analysis are only potentially interesting; further examination will be needed.

6. Summary

Summary

• Association rules describe co-occurrence of items within transaction data.
• Association rules are discovered in two steps:
  1. Frequent Itemset Generation (focus on support)
  2. Rule Generation (focus on confidence)
• Often, additional measures of "interestingness" are needed to filter the discovered rules. This is non-trivial.
