27.03.2013 Views

SPSS® 12.0 Command Syntax Reference


Measures for Binary Data


Different binary measures emphasize different aspects of the relationship between sets of binary values. However, all the measures are specified in the same way. Each measure has two optional integer-valued parameters, p (present) and np (not present).

• If both parameters are specified, CLUSTER uses the value of the first as an indicator that a characteristic is present and the value of the second as an indicator that a characteristic is absent. CLUSTER skips all other values.
• If only the first parameter is specified, CLUSTER uses that value to indicate presence and all other values to indicate absence.
• If no parameters are specified, CLUSTER assumes that 1 indicates presence and 0 indicates absence.
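The three rules above can be sketched in Python as a small helper. This is only an illustration, not SPSS code: the function name `code_binary` and the parameter names `present` and `not_present` (standing in for p and np) are hypothetical. The sketch also assumes that the no-parameter case behaves like p = 1, np = 0 with other values skipped, which the text does not state explicitly.

```python
from typing import Optional


def code_binary(value: int,
                present: Optional[int] = None,
                not_present: Optional[int] = None) -> Optional[bool]:
    """Classify one value as present (True), absent (False), or skipped (None).

    `present` and `not_present` are hypothetical names for the p and np
    parameters described in the text.
    """
    if present is None and not_present is None:
        # No parameters given: 1 indicates presence, 0 indicates absence.
        # Assumption: values other than 0 and 1 are skipped in this case.
        present, not_present = 1, 0
    if value == present:
        return True
    if not_present is None:
        # Only p given: every other value indicates absence.
        return False
    if value == not_present:
        return False
    # Both p and np given (or defaulted): all other values are skipped.
    return None
```

For instance, `code_binary(5, present=2)` treats 5 as absent because only the first parameter is given, whereas `code_binary(3, present=1, not_present=2)` returns `None` because 3 matches neither indicator.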

Using the indicators for presence and absence within each item (case or variable), CLUSTER constructs a 2 × 2 contingency table for each pair of items in turn. It uses this table to compute a proximity measure for the pair.

                            Item 2 characteristics
                            Present     Absent
Item 1 characteristics
    Present                 a           b
    Absent                  c           d

CLUSTER computes all binary measures from the values of a, b, c, and d. These values are tallied across variables (when the items are cases) or across cases (when the items are variables). For example, if variables V, W, X, Y, Z have values 0, 1, 1, 0, 1 for case 1 and values 0, 1, 1, 0, 0 for case 2 (where 1 indicates presence and 0 indicates absence), the contingency table is as follows:

                            Case 2 characteristics
                            Present     Absent
Case 1 characteristics
    Present                 2           1
    Absent                  0           2

The contingency table indicates that the characteristic is present in both cases for two variables (W and X), absent in both cases for two variables (V and Y), and present in case 1 but absent in case 2 for one variable (Z). There are no variables for which the characteristic is absent in case 1 and present in case 2.
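The tally for this example can be reproduced with a short Python sketch. It illustrates the counting rule only and is not how CLUSTER is implemented; the function name `contingency_table` and its parameters are hypothetical, and skipping a pair when either value is neither indicator is an assumption about how "skipped" values are handled.

```python
from typing import Sequence, Tuple


def contingency_table(item1: Sequence[int],
                      item2: Sequence[int],
                      present: int = 1,
                      absent: int = 0) -> Tuple[int, int, int, int]:
    """Tally (a, b, c, d) for two items coded with presence/absence values."""
    a = b = c = d = 0
    valid = (present, absent)
    for x, y in zip(item1, item2):
        # Assumption: a pair is skipped if either value is neither indicator.
        if x not in valid or y not in valid:
            continue
        if x == present and y == present:
            a += 1  # joint presence
        elif x == present and y == absent:
            b += 1  # present in item 1 only
        elif x == absent and y == present:
            c += 1  # present in item 2 only
        else:
            d += 1  # joint absence
    return a, b, c, d


# Values of variables V, W, X, Y, Z for the two cases in the example.
case1 = [0, 1, 1, 0, 1]
case2 = [0, 1, 1, 0, 0]
print(contingency_table(case1, case2))  # (2, 1, 0, 2)
```

The returned tuple matches the table: a = 2 (W and X), b = 1 (Z), c = 0, and d = 2 (V and Y).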

The available binary measures include matching coefficients, conditional probabilities, predictability measures, and others.

Matching Coefficients. Table 1 shows a classification scheme for matching coefficients. In this scheme, matches are joint presences (value a in the contingency table) or joint absences (value d). Nonmatches are equal in number to value b plus value c. Matches and nonmatches may be weighted equally or not. The three coefficients JACCARD, DICE, and SS2 are related monotonically, as are SM, SS1, and RT. All coefficients in Table 1 are similarity measures,
