29.06.2015 Views

SPSS Complex Samples™ 13.0 - Docs.is.ed.ac.uk


SPSS Complex Samples 13.0

For more information about SPSS® software products, please visit our Web site at http://www.spss.com or contact

SPSS Inc.
233 South Wacker Drive, 11th Floor
Chicago, IL 60606-6412
Tel: (312) 651-3000
Fax: (312) 651-3668

SPSS is a registered trademark and the other product names are the trademarks of SPSS Inc. for its proprietary computer software. No material describing such software may be produced or distributed without the written permission of the owners of the trademark and license rights in the software and the copyrights in the published materials.

The SOFTWARE and documentation are provided with RESTRICTED RIGHTS. Use, duplication, or disclosure by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of The Rights in Technical Data and Computer Software clause at 52.227-7013. Contractor/manufacturer is SPSS Inc., 233 South Wacker Drive, 11th Floor, Chicago, IL 60606-6412.

General notice: Other product names mentioned herein are used for identification purposes only and may be trademarks of their respective companies.

TableLook is a trademark of SPSS Inc.

Windows is a registered trademark of Microsoft Corporation.

DataDirect, DataDirect Connect, INTERSOLV, and SequeLink are registered trademarks of DataDirect Technologies.

Portions of this product were created using LEADTOOLS © 1991–2000, LEAD Technologies, Inc. ALL RIGHTS RESERVED.

LEAD, LEADTOOLS, and LEADVIEW are registered trademarks of LEAD Technologies, Inc.

Sax Basic is a trademark of Sax Software Corporation. Copyright © 1993–2004 by Polar Engineering and Consulting. All rights reserved.

Portions of this product were based on the work of the FreeType Team (http://www.freetype.org).

A portion of the SPSS software contains zlib technology. Copyright © 1995–2002 by Jean-loup Gailly and Mark Adler. The zlib software is provided “as is,” without express or implied warranty.

A portion of the SPSS software contains Sun Java Runtime libraries. Copyright © 2003 by Sun Microsystems, Inc. All rights reserved. The Sun Java Runtime libraries include code licensed from RSA Security, Inc. Some portions of the libraries are licensed from IBM and are available at http://oss.software.ibm.com/icu4j/.

SPSS Complex Samples 13.0
Copyright © 2004 by SPSS Inc.
All rights reserved.
Printed in the United States of America.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.

1234567890 060504

ISBN 1-56827-353-3


Preface

SPSS 13.0 is a comprehensive system for analyzing data. The Complex Samples optional add-on module provides the additional analytic techniques described in this manual. The Complex Samples add-on module must be used with the SPSS 13.0 Base system and is completely integrated into that system.

Installation

To install the Complex Samples add-on module, run the License Authorization Wizard using the authorization code that you received from SPSS Inc. For more information, see the installation instructions supplied with the SPSS Base system.

Compatibility

SPSS is designed to run on many computer systems. See the installation instructions that came with your system for specific information on minimum and recommended requirements.

Serial Numbers

Your serial number is your identification number with SPSS Inc. You will need this serial number when you contact SPSS Inc. for information regarding support, payment, or an upgraded system. The serial number was provided with your Base system.

Customer Service

If you have any questions concerning your shipment or account, contact your local office, listed on the SPSS Web site at http://www.spss.com/worldwide. Please have your serial number ready for identification.



Training Seminars

SPSS Inc. provides both public and onsite training seminars. All seminars feature hands-on workshops. Seminars will be offered in major cities on a regular basis. For more information on these seminars, contact your local office, listed on the SPSS Web site at http://www.spss.com/worldwide.

Technical Support

The services of SPSS Technical Support are available to registered customers. Customers may contact Technical Support for assistance in using SPSS or for installation help for one of the supported hardware environments. To reach Technical Support, see the SPSS Web site at http://www.spss.com, or contact your local office, listed on the SPSS Web site at http://www.spss.com/worldwide. Be prepared to identify yourself, your organization, and the serial number of your system.

Additional Publications

Additional copies of SPSS product manuals may be purchased directly from SPSS Inc. Visit the SPSS Web Store at http://www.spss.com/estore, or contact your local SPSS office, listed on the SPSS Web site at http://www.spss.com/worldwide. For telephone orders in the United States and Canada, call SPSS Inc. at 800-543-2185. For telephone orders outside of North America, contact your local office, listed on the SPSS Web site.

The SPSS Statistical Procedures Companion, by Marija Norusis, has been published by Prentice Hall. A new version of this book, updated for SPSS 13.0, is planned. The SPSS Advanced Statistical Procedures Companion, also based on SPSS 13.0, is forthcoming. The SPSS Guide to Data Analysis for SPSS 13.0 is also in development. Announcements of publications available exclusively through Prentice Hall will be available on the SPSS Web site at http://www.spss.com/estore (select your home country, and then click Books).

Tell Us Your Thoughts

Your comments are important. Please let us know about your experiences with SPSS products. We especially like to hear about new and interesting applications using the SPSS system. Please send e-mail to suggest@spss.com or write to SPSS Inc., Attn.: Director of Product Planning, 233 South Wacker Drive, 11th Floor, Chicago, IL 60606-6412.

About This Manual

This manual documents the graphical user interface for the procedures included in the Complex Samples add-on module. Illustrations of dialog boxes are taken from SPSS for Windows. Dialog boxes in other operating systems are similar. Detailed information about the command syntax for features in this module is provided in the SPSS Command Syntax Reference, available from the Help menu.

Contacting SPSS

If you would like to be on our mailing list, contact one of our offices, listed on our Web site at http://www.spss.com/worldwide.



Contents

Part I: User's Guide

1 Introduction to SPSS Complex Samples Procedures
Properties of Complex Samples
Usage of Complex Samples Procedures
Plan Files
Further Readings

2 Sampling from a Complex Design
Creating a New Sample Plan
Sampling Wizard: Design Variables
Tree Controls for Navigating the Sampling Wizard
Sampling Wizard: Sampling Method
Sampling Wizard: Sample Size
Define Unequal Sizes
Sampling Wizard: Output Variables
Sampling Wizard: Plan Summary
Sampling Wizard: Draw Sample Selection Options
Sampling Wizard: Draw Sample Output Files
Sampling Wizard: Finish
Modifying an Existing Sample Plan
Sampling Wizard: Plan Summary
Running an Existing Sample Plan
CSPLAN and CSSELECT Commands Additional Features

3 Preparing a Complex Sample for Analysis
Creating a New Analysis Plan
Analysis Preparation Wizard: Design Variables
Tree Controls for Navigating the Analysis Wizard
Analysis Preparation Wizard: Estimation Method
Analysis Preparation Wizard: Size
Define Unequal Sizes

Analys<strong>is</strong>PreparationWizard:StageSummary ...................... 30<br />

Analysis Preparation Wizard: Finish
Modifying an Existing Analysis Plan
Analysis Preparation Wizard: Plan Summary

4 Complex Samples Plan

5 Complex Samples Frequencies
Complex Samples Frequencies Statistics
Complex Samples Missing Values
Complex Samples Options

6 Complex Samples Descriptives
Complex Samples Descriptives Statistics
Complex Samples Descriptives Missing Values
Complex Samples Options

7 Complex Samples Crosstabs
Complex Samples Crosstabs Statistics
Complex Samples Missing Values
Complex Samples Options

8 Complex Samples Ratios
Complex Samples Ratios Statistics
Complex Samples Ratios Missing Values
Complex Samples Options

9 Complex Samples General Linear Model
Complex Samples General Linear Model Statistics
Complex Samples Hypothesis Tests
Complex Samples General Linear Model Estimated Means
Complex Samples General Linear Model Save
Complex Samples General Linear Model Options
CSGLM Command Additional Features

10 Complex Samples Logistic Regression
Complex Samples Logistic Regression Reference Category
Complex Samples Logistic Regression Model
Complex Samples Logistic Regression Statistics
Complex Samples Hypothesis Tests
Complex Samples Logistic Regression Odds Ratios
Complex Samples Logistic Regression Save
Complex Samples Logistic Regression Options
CSLOGISTIC Command Additional Features

Part II: Examples

11 Complex Samples Sampling Wizard
Obtaining a Sample from a Full Sampling Frame
Using the Wizard
Plan Summary
Sampling Summary
Sample Results
Obtaining a Sample from a Partial Sampling Frame
Using the Wizard to Sample from the First Partial Frame
Sample Results
Using the Wizard to Sample from the Second Partial Frame
Sample Results
Related Procedures

12 Complex Samples Analysis Preparation Wizard
Using the Complex Samples Analysis Preparation Wizard to Ready NHIS Public Data
Using the Wizard
Summary
Preparing for Analysis When Sampling Weights Are Not in the Data File
Computing Inclusion Probabilities and Sampling Weights
Using the Wizard
Summary
Related Procedures

13 Complex Samples Frequencies
Using Complex Samples Frequencies to Analyze Nutritional Supplement Usage
Running the Analysis
Frequency Table
Frequency by Subpopulation
Summary
Related Procedures

14 Complex Samples Descriptives
Using Complex Samples Descriptives to Analyze Activity Levels
Running the Analysis
Univariate Statistics
Univariate Statistics by Subpopulation
Summary
Related Procedures

15 Complex Samples Crosstabs
Using Complex Samples Crosstabs to Measure the Relative Risk of an Event
Running the Analysis
Crosstabulation
Risk Estimate
Risk Estimate by Subpopulation
Summary
Related Procedures

16 Complex Samples Ratios
Using Complex Samples Ratios to Aid Property Value Assessment
Running the Analysis
Ratios
Pivoted Ratios Table
Summary
Related Procedures

17 Complex Samples General Linear Model
Using Complex Samples General Linear Model to Fit a Two-Factor ANOVA
Running the Analysis
Model Summary
Tests of Model Effects
Parameter Estimates
Estimated Marginal Means
Summary
Related Procedures

18 Complex Samples Logistic Regression
Using Complex Samples Logistic Regression to Assess Credit Risk
Running the Analysis
Pseudo R-Squares
Classification
Tests of Model Effects
Parameter Estimates
Odds Ratios
Summary
Related Procedures

Bibliography

Index


Part I: User's Guide


Chapter 1

Introduction to SPSS Complex Samples Procedures

An inherent assumption of analytical procedures in traditional software packages is that the observations in a data file represent a simple random sample from the population of interest. This assumption is untenable for an increasing number of companies and researchers who find it both cost-effective and convenient to obtain samples in a more structured way.

The SPSS Complex Samples option allows you to select a sample according to a complex design and incorporate the design specifications into the data analysis, thus ensuring that your results are valid.

Properties of Complex Samples

A complex sample can differ from a simple random sample in many ways. In a simple random sample, individual sampling units are selected at random with equal probability and without replacement (WOR) directly from the entire population. By contrast, a given complex sample can have some or all of the following features:

Stratification. Stratified sampling involves selecting samples independently within non-overlapping subgroups of the population, or strata. For example, strata may be socioeconomic groups, job categories, age groups, or ethnic groups. With stratification, you can ensure adequate sample sizes for subgroups of interest, improve the precision of overall estimates, and use different sampling methods from stratum to stratum.
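As an illustration outside SPSS, the idea of drawing independent samples within strata can be sketched in a few lines of Python. This is a minimal sketch, not the module's implementation: the function name `stratified_sample`, the employee frame, and the per-stratum sample sizes are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=0):
    """Draw an independent simple random sample (WOR) within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # partition the frame into strata
    sample = {}
    for name in sorted(strata):
        members = strata[name]
        n = min(n_per_stratum[name], len(members))
        sample[name] = rng.sample(members, n)   # selection without replacement
    return sample

# Hypothetical frame: 200 employees, every tenth one a manager.
frame = [("emp%03d" % i, "manager" if i % 10 == 0 else "staff") for i in range(200)]
picked = stratified_sample(frame, lambda u: u[1], {"manager": 5, "staff": 20})
```

Fixing the sample size per stratum is what guarantees an adequate number of managers here, even though they make up only a tenth of the frame.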

Clustering. Cluster sampling involves the selection of groups of sampling units, or clusters. For example, clusters may be schools, hospitals, or geographical areas, and sampling units may be students, patients, or citizens. Clustering is common in multistage designs and area (geographic) samples.


Multiple stages. In multistage sampling, you select a first-stage sample based on clusters. Then you create a second-stage sample by drawing subsamples from the selected clusters. If the second-stage sample is based on subclusters, you can then add a third stage to the sample. For example, in the first stage of a survey, a sample of cities could be drawn. Then, from the selected cities, households could be sampled. Finally, from the selected households, individuals could be polled. The Sampling and Analysis Preparation wizards allow you to specify three stages in a design.
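The cities, households, and individuals example can be sketched as nested draws. The following Python sketch assumes simple random sampling without replacement at every stage and a hypothetical nested-dictionary frame. The name `three_stage_sample` is illustrative only and does not correspond to anything in SPSS.

```python
import random

def three_stage_sample(cities, n_cities, n_households, n_people, seed=0):
    """Sample cities, then households within them, then people within those."""
    rng = random.Random(seed)
    sample = []
    for city in rng.sample(sorted(cities), n_cities):              # stage 1
        households = cities[city]
        k = min(n_households, len(households))
        for hh in rng.sample(sorted(households), k):               # stage 2
            people = households[hh]
            for person in rng.sample(people, min(n_people, len(people))):  # stage 3
                sample.append((city, hh, person))
    return sample

# Hypothetical frame: 6 cities, 10 households per city, 3 people per household.
frame = {
    "city%d" % c: {
        "c%d-hh%d" % (c, h): ["c%d-hh%d-p%d" % (c, h, p) for p in range(3)]
        for h in range(10)
    }
    for c in range(6)
}
sample = three_stage_sample(frame, n_cities=2, n_households=4, n_people=1)
```

Only the selected cities need a household list, and only the selected households need a list of residents, which is the practical appeal of staging the design.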

Nonrandom sampling. When selection at random is difficult to obtain, units can be sampled systematically (at a fixed interval) or sequentially.
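Systematic selection at a fixed interval can be sketched as follows. This is a minimal Python sketch under the common textbook assumption of a random start within the first interval. The function name `systematic_sample` and the frame of 100 units are hypothetical, and the module's own systematic method may differ in detail.

```python
import random

def systematic_sample(frame, n, seed=0):
    """Select n units at a fixed interval, starting from a random offset."""
    size = len(frame)
    if not 0 < n <= size:
        raise ValueError("n must be between 1 and the frame size")
    interval = size / n                               # sampling interval
    start = random.Random(seed).random() * interval   # random start in [0, interval)
    # Guard the index against floating-point rounding at the end of the frame.
    return [frame[min(int(start + k * interval), size - 1)] for k in range(n)]

frame = list(range(100))          # hypothetical ordered list of 100 units
chosen = systematic_sample(frame, n=10)
```

With 100 units and a sample of 10, every tenth unit after the random start is taken, so the whole sample is determined by a single random number.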

Unequal selection probabilities. When sampling clusters that contain unequal numbers of units, you can use probability-proportional-to-size (PPS) sampling to make a cluster’s selection probability equal to the proportion of units it contains. PPS sampling can also use more general weighting schemes to select units.
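To make the proportionality concrete, the sketch below computes each cluster's single-draw selection probability from its size and then draws clusters with replacement in proportion to size. It is an illustration only: the school sizes are invented, the helper names are hypothetical, and drawing with replacement is chosen for simplicity rather than to mirror any particular PPS method offered by the module.

```python
import random

def pps_probabilities(sizes):
    """Single-draw selection probability of each cluster: size / total size."""
    total = sum(sizes.values())
    return {cluster: size / total for cluster, size in sizes.items()}

def pps_draw_with_replacement(sizes, n, seed=0):
    """Draw n clusters with replacement, with probability proportional to size."""
    rng = random.Random(seed)
    clusters = list(sizes)
    return rng.choices(clusters, weights=[sizes[c] for c in clusters], k=n)

# Hypothetical clusters: three schools with unequal enrolment.
school_sizes = {"A": 1200, "B": 300, "C": 500}
probs = pps_probabilities(school_sizes)       # A: 0.6, B: 0.15, C: 0.25
draws = pps_draw_with_replacement(school_sizes, n=4)
```

School A holds 60 percent of the students, so it is four times as likely as school B to be picked on any one draw.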

Unrestricted sampling. Unrestricted sampling selects units with replacement (WR). Thus, an individual unit can be selected for the sample more than once.

Sampling weights. Sampling weights are automatically computed while drawing a complex sample and ideally correspond to the “frequency” that each sampling unit represents in the target population. Therefore, the sum of the weights over the sample should estimate the population size. Complex Samples analysis procedures require sampling weights in order to properly analyze a complex sample. Note that these weights should be used entirely within the Complex Samples option and should not be used with other analytical procedures via the Weight Cases procedure, which treats weights as case replications.
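The statement that the weights sum to an estimate of the population size can be checked with a small worked example. The sketch assumes the standard convention that a unit's sampling weight is the reciprocal of its inclusion probability. The two strata, their sizes, and the function names are hypothetical.

```python
def sampling_weights(inclusion_probs):
    """Weight of each sampled unit, taken as 1 / inclusion probability (assumed)."""
    return [1.0 / p for p in inclusion_probs]

def estimated_population_size(weights):
    """The sum of the sampling weights estimates the population size."""
    return sum(weights)

def estimated_total(values, weights):
    """Weighted estimate of a population total."""
    return sum(w * y for w, y in zip(weights, values))

# Hypothetical design: 10 of 1000 units sampled in stratum 1 (p = 0.01),
# and 20 of 200 units sampled in stratum 2 (p = 0.1).
probs = [10 / 1000] * 10 + [20 / 200] * 20
weights = sampling_weights(probs)
size_estimate = estimated_population_size(weights)   # 10 * 100 + 20 * 10
```

Each unit in the first stratum stands for 100 population units and each unit in the second stands for 10, so the weights add up to the 1,200 units in the two strata combined.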

Usage of Complex Samples Procedures

Your usage of Complex Samples procedures depends on your particular needs. The primary types of users are those who:

• Plan and carry out surveys according to complex designs, possibly analyzing the sample later. The primary tool for surveyors is the Sampling Wizard.

• Analyze sample data files previously obtained according to complex designs. Before using the Complex Samples analysis procedures, you may need to use the Analysis Preparation Wizard.



Regardless of which type of user you are, you need to supply design information to Complex Samples procedures. This information is stored in a plan file for easy reuse.

Plan Files

A plan file contains complex sample specifications. There are two types of plan files:

Sampling plan. The specifications given in the Sampling Wizard define a sample design that is used to draw a complex sample. The sampling plan file contains those specifications. The sampling plan file also contains a default analysis plan that uses estimation methods suitable for the specified sample design.

Analysis plan. This plan file contains information needed by Complex Samples analysis procedures to properly compute variance estimates for a complex sample. The plan includes the sample structure, estimation methods for each stage, and references to required variables, such as sample weights. The Analysis Preparation Wizard allows you to create and edit analysis plans.

There are several advantages to saving your specifications in a plan file, including:

• A surveyor can specify the first stage of a multistage sampling plan and draw first-stage units now, collect information on sampling units for the second stage, and then modify the sampling plan to include the second stage.

• An analyst who doesn’t have access to the sampling plan file can specify an analysis plan and refer to that plan from each Complex Samples analysis procedure.

• A designer of large-scale public use samples can publish the sampling plan file, which simplifies the instructions for analysts and avoids the need for each analyst to specify his or her own analysis plans.

Further Readings

For more information on sampling techniques, see the following texts:

Cochran, W. G. 1977. Sampling Techniques. New York: John Wiley and Sons.

Kish, L. 1965. Survey Sampling. New York: John Wiley and Sons.

Kish, L. 1987. Statistical Design for Research. New York: John Wiley and Sons.

Murthy, M. N. 1967. Sampling Theory and Methods. Calcutta, India: Statistical Publishing Society.

Särndal, C., B. Swensson, and J. Wretman. 1992. Model Assisted Survey Sampling. New York: Springer-Verlag.


Chapter 2

Sampling from a Complex Design

Figure 2-1
Sampling Wizard, Welcome step

The Sampling Wizard guides you through the steps for creating, modifying, or executing a sampling plan file. Before using the Wizard, you should have a well-defined target population, a list of sampling units, and an appropriate sample design in mind.


Creating a New Sample Plan

► From the menus choose:

Analyze
  Complex Samples
    Select a Sample...

► Select Design a sample and choose a plan filename to save the sample plan.

► Click Next to continue through the Wizard.

► Optionally, in the Design Variables step, you can define strata, clusters, and input sample weights. After you define these, click Next.

► Optionally, in the Sampling Method step, you can choose a method for selecting items.

If you select PPS Brewer or PPS Murthy, you can click Finish to draw the sample. Otherwise, click Next and then:

► In the Sample Size step, specify the number or proportion of units to sample.

You can now click Finish to draw the sample. Optionally, in further steps, you can:

• Choose output variables to save.

• Add a second or third stage to the design.

• Set various selection options, including which stages to draw samples from, the random number seed, and whether to treat user-missing values as valid values of design variables.

• Choose where to save output data.

• Paste your selections as command syntax.



Sampling Wizard: Design Variables

Figure 2-2

Sampling Wizard, Design Variables step

This step allows you to select stratification and clustering variables and to define input sample weights. You can also specify a label for the stage.

Stratify By. The cross-classification of stratification variables defines distinct subpopulations, or strata. Separate samples are obtained for each stratum. To improve the precision of your estimates, units within strata should be as homogeneous as possible for the characteristics of interest.
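To illustrate what stratifying means in practice, the following minimal Python sketch draws a separate simple random sample without replacement from each cell of a cross-classification. It is not the Wizard's implementation; the function name `stratified_sample`, the frame layout, and the variables `region` and `urban` are assumptions made for this example.

```python
import random
from collections import defaultdict

def stratified_sample(frame, strata_keys, n_per_stratum, seed=12345):
    """Draw a simple random sample without replacement inside each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        # One stratum = one cell of the cross-classification of the stratification variables.
        strata[tuple(unit[key] for key in strata_keys)].append(unit)
    sample = []
    for key in sorted(strata):
        units = strata[key]
        # Never request more units than the stratum contains.
        sample.extend(rng.sample(units, min(n_per_stratum, len(units))))
    return sample

# Hypothetical sampling frame with two stratification variables.
frame = [
    {"id": i, "region": "North" if i % 2 else "South", "urban": i % 3 == 0}
    for i in range(60)
]
sample = stratified_sample(frame, ["region", "urban"], n_per_stratum=3)
```

Here the two stratification variables define four strata, so requesting three units per stratum yields a sample of 12 units.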

Clusters. Cluster variables define groups of observational units, or clusters. Clusters are useful when directly sampling observational units from the population is expensive or impossible; instead, you can sample clusters from the population and then sample observational units from the selected clusters. However, the use of clusters can introduce correlations among sampling units, resulting in a loss of precision. To minimize this effect, units within clusters should be as heterogeneous as possible for the characteristics of interest. You must define at least one cluster variable in order to plan a multistage design. Clusters are also necessary in the use of several different sampling methods. For more information, see “Sampling Wizard: Sampling Method.”

Input Sample Weight. If the current sample design is part of a larger sample design, you may have sample weights from a previous stage of the larger design. You can specify a numeric variable containing these weights in the first stage of the current design. Sample weights are computed automatically for subsequent stages of the current design.

Stage Label. You can specify an optional string label for each stage. This is used in the output to help identify stagewise information.

Note: The source variable list has the same content across steps of the Wizard. In other words, variables removed from the source list in a particular step are removed from the list in all steps. Variables returned to the source list appear in the list in all steps.

Tree Controls for Navigating the Sampling Wizard

On the left side of each step in the Sampling Wizard is an outline of all the steps. You can navigate the Wizard by clicking on the name of an enabled step in the outline. Steps are enabled as long as all previous steps are valid, that is, as long as each previous step has been given the minimum required specifications for that step. See the Help for individual steps for more information on why a given step may be invalid.



Sampling Wizard: Sampling Method

Figure 2-3

Sampling Wizard, Method step

This step allows you to specify how to select cases from the working data file.

Method. Controls in this group are used to choose a selection method. Some sampling types allow you to choose whether to sample with replacement (WR) or without replacement (WOR). See the type descriptions for more information. Note that some probability-proportional-to-size (PPS) types are available only when clusters have been defined and that all PPS types are available only in the first stage of a design. Moreover, WR methods are available only in the last stage of a design.

• Simple Random Sampling. Units are selected with equal probability. They can be selected with or without replacement.

• Simple Systematic. Units are selected at a fixed interval throughout the sampling frame (or strata, if they have been specified) and extracted without replacement. A randomly selected unit within the first interval is chosen as the starting point.

• Simple Sequential. Units are selected sequentially with equal probability and without replacement.

• PPS. This is a first-stage method that selects units at random with probability proportional to size. Any units can be selected with replacement; only clusters can be sampled without replacement.

• PPS Systematic. This is a first-stage method that systematically selects units with probability proportional to size. They are selected without replacement.

• PPS Sequential. This is a first-stage method that sequentially selects units with probability proportional to cluster size and without replacement.

• PPS Brewer. This is a first-stage method that selects two clusters from each stratum with probability proportional to cluster size and without replacement. A cluster variable must be specified to use this method.

• PPS Murthy. This is a first-stage method that selects two clusters from each stratum with probability proportional to cluster size and without replacement. A cluster variable must be specified to use this method.

• PPS Sampford. This is a first-stage method that selects more than two clusters from each stratum with probability proportional to cluster size and without replacement. It is an extension of Brewer’s method. A cluster variable must be specified to use this method.

• Use WR estimation for analysis. By default, an estimation method is specified in the plan file that is consistent with the selected sampling method. This option allows you to use with-replacement estimation even if the sampling method implies WOR estimation. This option is available only in stage 1.
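The Simple Systematic method in the list above can be sketched as follows. This is a textbook version of systematic selection, assuming the frame is held in a Python list; the function name `systematic_sample` and the use of a fractional interval are choices made for this illustration and do not describe how SPSS draws the sample.

```python
import random

def systematic_sample(frame, n, seed=12345):
    """Select n units at a fixed interval, starting at a random point in the first interval."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    rng = random.Random(seed)
    interval = len(frame) / n          # fractional interval keeps the sample size at exactly n
    start = rng.random() * interval    # random starting point inside the first interval
    return [frame[int(start + i * interval)] for i in range(n)]

frame = list(range(100))               # hypothetical frame of 100 unit identifiers
sample = systematic_sample(frame, n=10)
```

Because the interval is at least one unit wide, no unit can be selected twice, which matches the without-replacement behavior described for this method.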

Measure of Size (MOS). If a PPS method is selected, you must specify a measure of size that defines the size of each unit. These sizes can be explicitly defined in a variable or they can be computed from the data. Optionally, you can set lower and upper bounds on the MOS, overriding any values found in the MOS variable or computed from the data. These options are available only in stage 1.
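As a rough illustration of how a bounded measure of size feeds a PPS draw, the sketch below clamps each size to the optional bounds and then selects units with replacement, with draw probability proportional to the bounded size. The helper names `bound_mos` and `pps_wr_sample`, the school data, and the bound values are all hypothetical; the without-replacement PPS variants (Brewer, Murthy, Sampford) are more involved and are not shown.

```python
import random

def bound_mos(sizes, lower=None, upper=None):
    """Apply optional lower and upper bounds to the measure of size."""
    bounded = []
    for size in sizes:
        if lower is not None:
            size = max(size, lower)
        if upper is not None:
            size = min(size, upper)
        bounded.append(size)
    return bounded

def pps_wr_sample(units, sizes, n, seed=12345):
    """Draw n units with replacement; each draw is proportional to the unit's size."""
    rng = random.Random(seed)
    return rng.choices(units, weights=sizes, k=n)

schools = ["A", "B", "C", "D"]
enrollment = [50, 200, 1000, 5000]                     # hypothetical MOS variable
mos = bound_mos(enrollment, lower=100, upper=1000)     # -> [100, 200, 1000, 1000]
draw_probabilities = [m / sum(mos) for m in mos]       # per-draw selection probabilities
sample = pps_wr_sample(schools, mos, n=3)
```

The bounds keep a very small or very large unit from receiving an extreme selection probability, which is the practical reason for overriding the raw MOS values.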



Sampling Wizard: Sample Size

Figure 2-4

Sampling Wizard, Sample Size step

This step allows you to specify the number or proportion of units to sample within the current stage. The sample size can be fixed or it can vary across strata. For the purpose of specifying sample size, clusters chosen in previous stages can be used to define strata.

Units. You can specify an exact sample size or a proportion of units to sample.

• Value. A single value is applied to all strata. If Counts is selected as the unit metric, you should enter a positive integer. If Proportions is selected, you should enter a non-negative value. Unless sampling with replacement, proportion values should also be no greater than 1.

• Unequal values for strata. Allows you to enter size values on a per-stratum basis via the Define Unequal Sizes dialog box.

• Read values from variable. Allows you to select a numeric variable that contains size values for strata.

If Proportions is selected, you have the option to set lower and upper bounds on the number of units sampled.
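The interaction between a sampling proportion and the optional bounds can be sketched as below. The rounding rule (round half up) and the function name `units_to_sample` are assumptions made for this example; the source does not state how the Wizard rounds a fractional count.

```python
def units_to_sample(stratum_size, proportion, min_count=None, max_count=None):
    """Turn a sampling proportion into a unit count, honoring optional bounds."""
    if proportion < 0:
        raise ValueError("proportion must be non-negative")
    count = int(stratum_size * proportion + 0.5)   # assumed rounding rule: round half up
    if min_count is not None:
        count = max(count, min_count)
    if max_count is not None:
        count = min(count, max_count)
    # Without replacement, a stratum cannot supply more units than it contains.
    return min(count, stratum_size)

stratum_sizes = {"A": 40, "B": 400, "C": 4000}     # hypothetical strata
counts = {
    name: units_to_sample(size, 0.05, min_count=5, max_count=100)
    for name, size in stratum_sizes.items()
}
```

With a 5% rate, the small stratum is raised to the lower bound of 5 units and the large stratum is capped at the upper bound of 100 units, while the middle stratum keeps its proportional allocation of 20.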

Define Unequal Sizes

Figure 2-5

Define Unequal Sizes dialog box

The Define Unequal Sizes dialog box allows you to enter sizes on a per-stratum basis.

Size Specifications grid. The grid displays the cross-classifications of up to five strata or cluster variables, with one stratum/cluster combination per row. Eligible grid variables include all stratification variables from the current and previous stages and all cluster variables from previous stages. Variables can be reordered within the grid or moved to the Exclude list. Enter sizes in the rightmost column. Click Labels or Values to toggle the display of value labels and data values for stratification and cluster variables in the grid cells. Cells that contain unlabeled values always show values. Click Refresh Strata to repopulate the grid with each combination of labeled data values for variables in the grid.

Exclude. To specify sizes for a subset of stratum/cluster combinations, move one or more variables to the Exclude list. These variables are not used to define sample sizes.

Sampling Wizard: Output Variables

Figure 2-6

Sampling Wizard, Output Variables step

This step allows you to choose variables to save when the sample is drawn.

Population size. The estimated number of units in the population for a given stage. The root name for the saved variable is PopulationSize_.

Sample proportion. The sampling rate at a given stage. The root name for the saved variable is SamplingRate_.

Sample size. The number of units drawn at a given stage. The root name for the saved variable is SampleSize_.

Sample weight. The inverse of the inclusion probabilities. The root name for the saved variable is SampleWeight_.

Some stagewise variables are generated automatically. These include:

Inclusion probabilities. The proportion of units drawn at a given stage. The root name for the saved variable is InclusionProbability_.

Cumulative weight. The cumulative sample weight over stages previous to and including the current one. The root name for the saved variable is SampleWeightCumulative_.

Index. Identifies units selected multiple times within a given stage. The root name for the saved variable is Index_.

Note: Saved variable root names include an integer suffix that reflects the stage number; for example, PopulationSize_1_ for the saved population size for stage 1.
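The relationship among these saved quantities can be illustrated with a short calculation. The sketch assumes equal-probability selection within each stage, so that the inclusion probability is simply the sample size divided by the population size; the stage sizes are made up, and the dictionary keys only mimic the naming pattern described in the note above.

```python
def stage_weight(population_size, sample_size):
    """Return (inclusion probability, sample weight) for one equal-probability stage."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    inclusion_probability = sample_size / population_size
    sample_weight = population_size / sample_size      # inverse of the inclusion probability
    return inclusion_probability, sample_weight

# Hypothetical two-stage design: 10 of 50 clusters, then 25 of 100 units per cluster.
stages = [(50, 10), (100, 25)]

saved = {}
cumulative = 1.0   # assumption: an input sample weight, if any, would start the product here
for stage, (population_size, sample_size) in enumerate(stages, start=1):
    probability, weight = stage_weight(population_size, sample_size)
    cumulative *= weight                                # product over stages up to this one
    saved[f"InclusionProbability_{stage}_"] = probability
    saved[f"SampleWeight_{stage}_"] = weight
    saved[f"SampleWeightCumulative_{stage}_"] = cumulative
```

In this example each sampled unit carries a stage 1 weight of 5 and a stage 2 weight of 4, so its cumulative weight through stage 2 is 20.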



Sampling Wizard: Plan Summary

Figure 2-7

Sampling Wizard, Plan Summary step

This is the last step within each stage, providing a summary of the sample design specifications through the current stage. From here, you can either proceed to the next stage (creating it, if necessary) or set options for drawing the sample.



Sampling Wizard: Draw Sample Selection Options

Figure 2-8

Sampling Wizard, Draw Sample, Selection Options step

This step allows you to choose whether to draw a sample. You can also control other sampling options, such as the random seed and missing-value handling.

Draw sample. In addition to choosing whether to draw a sample, you can also choose to execute part of the sampling design. Stages must be drawn in order; that is, stage 2 cannot be drawn unless stage 1 is also drawn. When editing or executing a plan, you cannot resample locked stages.

Seed. This allows you to choose a seed value for random number generation.

Include user-missing values. This determines whether user-missing values are valid. If so, user-missing values are treated as a separate category.

Data already sorted. If your sample frame is presorted by the values of the stratification variables, this option allows you to speed the selection process.


Sampling Wizard: Draw Sample Output Files

Figure 2-9

Sampling Wizard, Draw Sample, Output Files step

This step allows you to choose where to direct sampled cases, weight variables, joint probabilities, and case selection rules.

Sample data. These options let you determine where sample output is written. It can be added to the working data file or saved to an external file. If an external file is specified, the sampling output variables and variables in the working data file for the selected cases are saved to the file.

Joint probabilities. These options let you determine where joint probabilities are written. Joint probabilities are produced if the PPS WOR, PPS Brewer, PPS Sampford, or PPS Murthy method is selected and WR estimation is not specified.

Case selection rules. If you are constructing your sample one stage at a time, you may want to save the case selection rules to a text file. They are useful for constructing the subframe for subsequent stages.

Sampling Wizard: Finish

Figure 2-10

Sampling Wizard, Finish step

This is the final step. You can save the plan file and draw the sample now or paste your selections into a syntax window.

When making changes to stages in the existing plan file, you can save the edited plan to a new file or overwrite the existing file. When adding stages without making changes to existing stages, the Wizard automatically overwrites the existing plan file. If you want to save the plan to a new file, select Paste the syntax generated by the Wizard into a syntax window and change the filename in the syntax commands.



Modifying an Existing Sample Plan

► From the menus choose:

Analyze
  Complex Samples
    Select a Sample...

► Select Edit a sample design and choose a plan file to edit.

► Click Next to continue through the Wizard.

► Review the sampling plan in the Plan Summary step, and then click Next.

Subsequent steps are largely the same as for a new design. See the Help for individual steps for more information.

► Navigate to the Finish step, and specify a new name for the edited plan file or choose to overwrite the existing plan file.

Optionally, you can:

• Specify stages that have already been sampled.

• Remove stages from the plan.



Sampling Wizard: Plan Summary

Figure 2-11

Sampling Wizard, Plan Summary step

This step allows you to review the sampling plan and indicate stages that have already been sampled. If editing a plan, you can also remove stages from the plan.

Previously sampled stages. If an extended sampling frame is not available, you will have to execute a multistage sampling design one stage at a time. Select which stages have already been sampled from the drop-down list. Any stages that have been executed are locked; they are not available in the Draw Sample Selection Options step, and they cannot be altered when editing a plan.

Remove stages. You can remove stages 2 and 3 from a multistage design.



Running an Existing Sample Plan

► From the menus choose:

Analyze
  Complex Samples
    Select a Sample...

► Select Draw a sample and choose a plan file to run.

► Click Next to continue through the Wizard.

► Review the sampling plan in the Plan Summary step, and then click Next.

► The individual steps containing stage information are skipped when executing a sample plan. You can now go on to the Finish step at any time.

Optionally, you can:

• Specify stages that have already been sampled.

CSPLAN and CSSELECT Commands Additional Features

The SPSS command language also allows you to:

• Specify custom names for output variables.

• Control the output in the Viewer. For example, you can suppress the stagewise summary of the plan that is displayed if a sample is designed or modified, suppress the summary of the distribution of sampled cases by strata that is shown if the sample design is executed, and request a case processing summary.

• Choose a subset of variables in the working data file to write to an external sample file.

See the SPSS Command Syntax Reference for complete syntax information.


Chapter 3

Preparing a Complex Sample for Analysis

Figure 3-1

Analysis Preparation Wizard, Welcome step

The Analysis Preparation Wizard guides you through the steps for creating or modifying an analysis plan for use with the various Complex Samples analysis procedures. Before using the Wizard, you should have a sample drawn according to a complex design.

Creating a new plan is most useful when you do not have access to the sampling plan file used to draw the sample (recall that the sampling plan contains a default analysis plan). If you do have access to the sampling plan file used to draw the sample, you can use the default analysis plan contained in the sampling plan file or override the default analysis specifications and save your changes to a new file.

Creating a New Analysis Plan

► From the menus choose:

Analyze
  Complex Samples
    Prepare for Analysis...

► Select Create a plan file, and choose a plan filename to which you will save the analysis plan.

► Click Next to continue through the Wizard.

► Specify the variable containing sample weights in the Design Variables step, optionally defining strata and clusters.

You can now click Finish to save the plan. Optionally, in further steps you can:

• Select the method for estimating standard errors in the Estimation Method step.

• Specify the number of units sampled or the inclusion probability per unit in the Size step.

• Add a second or third stage to the design.

• Paste your selections as command syntax.


Analysis Preparation Wizard: Design Variables

Figure 3-2

Analysis Preparation Wizard, Design Variables step (stage 1)

This step allows you to identify the stratification and clustering variables and define sample weights. You can also provide a label for the stage.

Strata. The cross-classification of stratification variables defines distinct subpopulations, or strata. Your total sample represents the combination of independent samples from each stratum.

Clusters. Cluster variables define groups of observational units, or clusters. Samples drawn in multiple stages select clusters in the earlier stages and then subsample units from the selected clusters. When analyzing a data file obtained by sampling clusters with replacement, you should include the duplication index as a cluster variable.

Sample Weights. You must provide sample weights in the first stage. Sample weights are computed automatically for subsequent stages of the current design.

Stage Label. You can specify an optional string label for each stage. This is used in the output to help identify stagewise information.

Note: The source variable list has the same contents across steps of the Wizard. In other words, variables removed from the source list in a particular step are removed from the list in all steps. Variables returned to the source list show up in all steps.

Tree Controls for Navigating the Analysis Wizard

At the left side of each step of the Analysis Wizard is an outline of all the steps. You can navigate the Wizard by clicking on the name of an enabled step in the outline. Steps are enabled as long as all previous steps are valid, that is, as long as each previous step has been given the minimum required specifications for that step. For more information on why a given step may be invalid, see the Help for individual steps.


Analysis Preparation Wizard: Estimation Method

Figure 3-3

Analysis Preparation Wizard, Estimation Method step (stage 1)

This step allows you to specify an estimation method for the stage.

WR (sampling with replacement). WR estimation does not include a correction for sampling from a finite population, since it assumes that the sample was taken from an infinite population. When the population for the stage is much larger than the sample, this is a reasonable assumption. WR estimation can be specified only in the final stage of a design; the Wizard will not allow you to add another stage if you select WR estimation.

Equal WOR (equal probability sampling without replacement). Equal WOR estimation includes the finite population correction and assumes that units are sampled with equal probability. Equal WOR can be specified in any stage of a design.

Unequal WOR (unequal probability sampling without replacement). In addition to using the finite population correction, Unequal WOR accounts for sampling units (usually clusters) selected with unequal probability. This estimation method is available only in the first stage.
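The practical difference between WR and Equal WOR estimation comes down to the finite population correction. The sketch below uses the textbook variance of a sample mean under simple random sampling, which is an assumption made for illustration; the Complex Samples procedures use their own estimators for complex designs, and the function name `variance_of_mean` is hypothetical.

```python
import statistics

def variance_of_mean(values, population_size=None):
    """Estimated variance of the sample mean, with an optional finite population correction."""
    n = len(values)
    variance = statistics.variance(values) / n          # WR: no correction applied
    if population_size is not None:
        variance *= 1 - n / population_size             # Equal WOR: multiply by (1 - n/N)
    return variance

sample = [2.0, 4.0, 4.0, 6.0]                           # made-up observations
wr = variance_of_mean(sample)
wor_small = variance_of_mean(sample, population_size=8)
wor_large = variance_of_mean(sample, population_size=4000)
```

When the sample is half of the population, the correction halves the estimated variance; when the population is a thousand times larger than the sample, the corrected and uncorrected values are nearly identical, which is why WR estimation is a reasonable shortcut for large populations.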

Analysis Preparation Wizard: Size

Figure 3-4

Analysis Preparation Wizard, Size step (stage 1)

This step is used to specify inclusion probabilities or population sizes for the current stage. Sizes can be fixed or can vary across strata. For the purpose of specifying sizes, clusters specified in previous stages can be used to define strata.

Units. You can specify exact population sizes or the probabilities with which units were sampled.

• Value. A single value is applied to all strata. If Population Sizes is selected as the unit metric, you should enter a non-negative integer. If Inclusion Probabilities is selected, you should enter a value between 0 and 1, inclusive.

• Unequal values for strata. Allows you to enter size values on a per-stratum basis via the Define Unequal Sizes dialog box.

• Read values from variable. Allows you to select a numeric variable that contains size values for strata.
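Population sizes and inclusion probabilities carry the same information once the number of sampled units is known. The following sketch, which assumes equal-probability selection within each stratum and uses made-up stratum names and counts, converts per-stratum population sizes into inclusion probabilities and checks that the result falls in the permitted range. The function name `inclusion_probabilities` is hypothetical.

```python
def inclusion_probabilities(population_sizes, sample_counts):
    """Convert per-stratum population sizes to inclusion probabilities (equal-probability case)."""
    probabilities = {}
    for stratum, population in population_sizes.items():
        if not isinstance(population, int) or population <= 0:
            raise ValueError(f"population size for {stratum!r} must be a positive integer")
        sampled = sample_counts[stratum]
        if not 0 <= sampled <= population:
            raise ValueError(f"sample count for {stratum!r} must be between 0 and {population}")
        probabilities[stratum] = sampled / population    # always between 0 and 1, inclusive
    return probabilities

population_sizes = {"North": 200, "South": 50}           # hypothetical strata
sample_counts = {"North": 20, "South": 10}
probabilities = inclusion_probabilities(population_sizes, sample_counts)
```

Either form could then be supplied on a per-stratum basis, since one can be derived from the other under these assumptions.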

Define Unequal Sizes

Figure 3-5

Define Unequal Sizes dialog box

The Define Unequal Sizes dialog box allows you to enter sizes on a per-stratum basis.

Size Specifications grid. The grid displays the cross-classifications of up to five strata or cluster variables, with one stratum/cluster combination per row. Eligible grid variables include all stratification variables from the current and previous stages and all cluster variables from previous stages. Variables can be reordered within the grid or moved to the Exclude list. Enter sizes in the rightmost column. Click Labels or Values to toggle the display of value labels and data values for stratification and cluster variables in the grid cells. Cells that contain unlabeled values always show values. Click Refresh Strata to repopulate the grid with each combination of labeled data values for variables in the grid.

Exclude. To specify sizes for a subset of stratum/cluster combinations, move one or more variables to the Exclude list. These variables are not used to define sample sizes.

Analysis Preparation Wizard: Stage Summary

Figure 3-6

Analysis Preparation Wizard, Plan Summary step (stage 1)

This is the last step within each stage, providing a summary of the analysis design specifications through the current stage. From here, you can either proceed to the next stage (creating it if necessary) or save the analysis specifications.

If you cannot add another stage, it is likely because:

• No cluster variable was specified in the Design Variables step.

• You selected WR estimation in the Estimation Method step.

• This is the third stage of the analysis, and the Wizard supports a maximum of three stages.

Analysis Preparation Wizard: Finish

Figure 3-7

Analysis Preparation Wizard, Finish step

This is the final step. You can save the plan file now or paste your selections to a syntax window.

When making changes to stages in the existing plan file, you can save the edited plan to a new file or overwrite the existing file. When adding stages without making changes to existing stages, the Wizard automatically overwrites the existing plan file. If you want to save the plan to a new file, choose to Paste the syntax generated by the Wizard into a syntax window and change the filename in the syntax commands.



Modifying an Existing Analysis Plan

▶ From the menus choose:

Analyze
  Complex Samples
    Prepare for Analysis...

▶ Select Edit a plan file, and choose a plan filename to which you will save the analysis plan.

▶ Click Next to continue through the Wizard.

▶ Review the analysis plan in the Plan Summary step, and then click Next.

Subsequent steps are largely the same as for a new design. For more information, see the Help for individual steps.

▶ Navigate to the Finish step, and specify a new name for the edited plan file, or choose to overwrite the existing plan file.

Optionally, you can:

• Remove stages from the plan.


Analysis Preparation Wizard: Plan Summary

Figure 3-8. Analysis Preparation Wizard, Plan Summary step

This step allows you to review the analysis plan and remove stages from the plan.

Remove Stages. You can remove stages 2 and 3 from a multistage design. Since a plan must have at least one stage, you can edit but not remove stage 1 from the design.


Chapter 4

Complex Samples Plan

Complex Samples analysis procedures require analysis specifications from an analysis or sample plan file in order to provide valid results.

Figure 4-1. Complex Samples Plan dialog box

Plan. Specify the path of an analysis or sample plan file.

Joint Probabilities. In order to use Unequal WOR estimation for clusters drawn using a PPS WOR method, you need to specify a separate file containing the joint probabilities. This file is created by the Sampling Wizard during sampling.



Chapter 5

Complex Samples Frequencies

The Complex Samples Frequencies procedure produces frequency tables for selected variables and displays univariate statistics. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.

Example. Using the Complex Samples Frequencies procedure, you can obtain univariate tabular statistics for vitamin usage among U.S. citizens, based on the results of the National Health Interview Survey (NHIS) and with an appropriate analysis plan for this public-use data.

Statistics. The procedure produces estimates of cell population sizes and table percentages, plus standard errors, confidence intervals, coefficients of variation, design effects, square roots of design effects, cumulative values, and unweighted counts for each estimate. Additionally, chi-square and likelihood-ratio statistics are computed for the test of equal cell proportions.

Data. Variables for which frequency tables are produced should be categorical. Subpopulation variables can be string or numeric but should be categorical.

Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.

Obtaining Complex Samples Frequencies

▶ From the menus choose:

Analyze
  Complex Samples
    Frequencies...

▶ Select a plan file and, optionally, select a custom joint probabilities file.

▶ Click Continue.

Figure 5-1. Frequencies dialog box

▶ Select at least one frequency variable.

Optionally, you can:

• Specify variables to define subpopulations. Statistics are computed separately for each subpopulation.


Complex Samples Frequencies Statistics

Figure 5-2. Frequencies Statistics dialog box

Cells. This group allows you to request estimates of the cell population sizes and table percentages.

Statistics. This group produces statistics associated with the population size or table percentage.

• Standard error. The standard error of the estimate.

• Confidence interval. A confidence interval for the estimate, using the specified level.

• Coefficient of variation. The ratio of the standard error of the estimate to the estimate.

• Unweighted count. The number of units used to compute the estimate.

• Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.

• Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.

• Cumulative values. The cumulative estimate through each value of the variable.
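Several of the statistics defined above are simple functions of an estimate and its variance. The Python sketch below restates those definitions; the function names are hypothetical, and both variances are assumed to be already available (this section does not describe how the procedure computes them).

```python
import math


def coefficient_of_variation(estimate, standard_error):
    # Ratio of the standard error of the estimate to the estimate.
    return standard_error / estimate


def design_effect(variance_complex, variance_srs):
    # Variance under the complex design divided by the variance obtained
    # by assuming a simple random sample.
    return variance_complex / variance_srs


def sqrt_design_effect(variance_complex, variance_srs):
    return math.sqrt(design_effect(variance_complex, variance_srs))


def cumulative_values(estimates):
    # Running total of the estimates through each value of the variable.
    total = 0.0
    running = []
    for value in estimates:
        total += value
        running.append(total)
    return running
```

A design effect of 4, for instance, corresponds to a square root of 2: the standard error is about twice what a simple random sample of the same size would give.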


Test of equal cell proportions. This produces chi-square and likelihood-ratio tests of the hypothesis that the categories of a variable have equal frequencies. Separate tests are performed for each variable.
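As a rough illustration of the hypothesis being tested, the sketch below computes the textbook Pearson chi-square and likelihood-ratio statistics for equal cell frequencies. It is an unadjusted version written for this example: the function name is hypothetical, and any design-based adjustment the procedure applies is not described in this section and is not reproduced here.

```python
import math


def equal_proportion_statistics(cell_sizes):
    """Unadjusted Pearson and likelihood-ratio statistics for equal cells."""
    total = sum(cell_sizes)
    expected = total / len(cell_sizes)  # same expected size in every cell
    pearson = sum((c - expected) ** 2 / expected for c in cell_sizes)
    likelihood_ratio = 2.0 * sum(
        c * math.log(c / expected) for c in cell_sizes if c > 0
    )
    return pearson, likelihood_ratio
```

Both statistics are 0 when every category has the same size and grow as the categories become more unequal.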

Complex Samples Missing Values

Figure 5-3. Missing Values dialog box

Tables. This group determines which cases are used in the analysis.

• Use all available data. Missing values are determined on a table-by-table basis. Thus, the cases used to compute statistics may vary across frequency or crosstabulation tables.

• Use consistent case base. Missing values are determined across all variables. Thus, the cases used to compute statistics are consistent across tables.

Categorical Design Variables. This group determines whether user-missing values are valid or invalid.
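The difference between the two Tables options can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, cases are assumed to be dictionaries, and None is assumed to mark a missing value.

```python
def cases_per_table(cases, variables, consistent_case_base):
    """Return the cases used for each variable's table.

    cases: list of dicts mapping variable name to value (None = missing).
    """
    if consistent_case_base:
        # A case is kept only if it is complete on every listed variable.
        complete = [c for c in cases if all(c[v] is not None for v in variables)]
        return {v: complete for v in variables}
    # Otherwise each table keeps every case that is valid for its own variable.
    return {v: [c for c in cases if c[v] is not None] for v in variables}
```

With "all available data" the number of cases can differ from table to table; with a consistent case base every table uses the same, usually smaller, set of cases.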


Complex Samples Options

Figure 5-4. Options dialog box

Subpopulation Display. You can choose to have subpopulations displayed in the same table or in separate tables.


Chapter 6

Complex Samples Descriptives

The Complex Samples Descriptives procedure displays univariate summary statistics for several variables. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.

Example. Using the Complex Samples Descriptives procedure, you can obtain univariate descriptive statistics for the activity levels of U.S. citizens, based on the results of the National Health Interview Survey (NHIS) and with an appropriate analysis plan for this public-use data.

Statistics. The procedure produces means and sums, plus t tests, standard errors, confidence intervals, coefficients of variation, unweighted counts, population sizes, design effects, and square roots of design effects for each estimate.

Data. Measures should be scale variables. Subpopulation variables can be string or numeric but should be categorical.

Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.

Obtaining Complex Samples Descriptives

▶ From the menus choose:

Analyze
  Complex Samples
    Descriptives...

▶ Select a plan file and, optionally, select a custom joint probabilities file.

▶ Click Continue.

Figure 6-1. Descriptives dialog box

▶ Select at least one measure variable.

Optionally, you can:

• Specify variables to define subpopulations. Statistics are computed separately for each subpopulation.


Complex Samples Descriptives Statistics

Figure 6-2. Descriptives Statistics dialog box

Summaries. This group allows you to request estimates of the means and sums of the measure variables. Additionally, you can request t tests of the estimates against a specified value.

Statistics. This group produces statistics associated with the mean or sum.

• Standard error. The standard error of the estimate.

• Confidence interval. A confidence interval for the estimate, using the specified level.

• Coefficient of variation. The ratio of the standard error of the estimate to the estimate.

• Unweighted count. The number of units used to compute the estimate.

• Population size. The estimated number of units in the population.

• Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.

• Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
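To make the relationship between sums, means, population sizes, and unweighted counts concrete, here is a minimal Python sketch. It assumes each case carries a final sampling weight and uses the usual weighted estimators; the function name and the weights argument are assumptions for illustration, since this section does not give the procedure's formulas.

```python
def weighted_summaries(values, weights):
    """Weighted sum and mean of a measure, with population size and count."""
    population_size = sum(weights)  # estimated number of units in the population
    total = sum(w * y for y, w in zip(values, weights))  # estimated sum
    return {
        "sum": total,
        "mean": total / population_size,
        "population_size": population_size,
        "unweighted_count": len(values),  # units used to compute the estimate
    }
```

For two sampled cases with values 2 and 4 and weights 10 and 30, the estimated population size is 40, the estimated sum is 140, and the estimated mean is 3.5.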


Complex Samples Descriptives Missing Values

Figure 6-3. Descriptives Missing Values dialog box

Statistics for Measure Variables. This group determines which cases are used in the analysis.

• Use all available data. Missing values are determined on a variable-by-variable basis. Thus, the cases used to compute statistics may vary across measure variables.

• Ensure consistent case base. Missing values are determined across all variables. Thus, the cases used to compute statistics are consistent.

Categorical Design Variables. This group determines whether user-missing values are valid or invalid.

Complex Samples Options

Figure 6-4. Options dialog box

Subpopulation Display. You can choose to have subpopulations displayed in the same table or in separate tables.


Chapter 7

Complex Samples Crosstabs

The Complex Samples Crosstabs procedure produces crosstabulation tables for pairs of selected variables and displays two-way statistics. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.

Example. Using the Complex Samples Crosstabs procedure, you can obtain cross-classification statistics for smoking frequency by vitamin usage of U.S. citizens, based on the results of the National Health Interview Survey (NHIS) and with an appropriate analysis plan for this public-use data.

Statistics. The procedure produces estimates of cell population sizes and row, column, and table percentages, plus standard errors, confidence intervals, coefficients of variation, expected values, design effects, square roots of design effects, residuals, adjusted residuals, and unweighted counts for each estimate. The odds ratio, relative risk, and risk difference are computed for 2-by-2 tables. Additionally, Pearson and likelihood-ratio statistics are computed for the test of independence of the row and column variables.

Data. Row and column variables should be categorical. Subpopulation variables can be string or numeric but should be categorical.

Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.

Obtaining Complex Samples Crosstabs

▶ From the menus choose:

Analyze
  Complex Samples
    Crosstabs...

▶ Select a plan file and, optionally, select a custom joint probabilities file.

▶ Click Continue.

Figure 7-1. Crosstabs dialog box

▶ Select at least one row variable and one column variable.

Optionally, you can:

• Specify variables to define subpopulations. Statistics are computed separately for each subpopulation.


Complex Samples Crosstabs Statistics

Figure 7-2. Crosstabs Statistics dialog box

Cells. This group allows you to request estimates of the cell population size and row, column, and table percentages.

Statistics. This group produces statistics associated with the population size and row, column, and table percentages.

• Standard error. The standard error of the estimate.

• Confidence interval. A confidence interval for the estimate, using the specified level.

• Coefficient of variation. The ratio of the standard error of the estimate to the estimate.

• Expected values. The expected value of the estimate, under the hypothesis of independence of the row and column variable.

• Unweighted count. The number of units used to compute the estimate.

• Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.

• Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.

• Residuals. The expected value is the number of cases that you would expect in the cell if there were no relationship between the two variables. A positive residual indicates that there are more cases in the cell than there would be if the row and column variables were independent.

• Adjusted residuals. The residual for a cell (observed minus expected value) divided by an estimate of its standard error. The resulting standardized residual is expressed in standard deviation units above or below the mean.
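Expected values and residuals under independence can be illustrated with a short Python sketch. The function name is hypothetical, the table is assumed to hold estimated cell sizes as a list of rows, and the standard-error estimate needed for adjusted residuals is not described in this section, so it is left out.

```python
def expected_and_residuals(table):
    """Expected cell values under row/column independence, and residuals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Under independence: expected = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Residual = observed minus expected.
    residuals = [
        [obs - exp for obs, exp in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    return expected, residuals
```

For the table [[30, 10], [20, 40]] the expected values are [[20, 20], [30, 30]], so the first cell has a positive residual of 10: more cases than independence would predict.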

Summaries for 2-by-2 Tables. This group produces statistics for tables in which the row and column variable each have two categories. Each is a measure of the strength of the association between the presence of a factor and the occurrence of an event.

• Odds ratio. The odds ratio can be used as an estimate of relative risk when the occurrence of the factor is rare.

• Relative risk. The ratio of the risk of an event in the presence of the factor to the risk of the event in the absence of the factor.

• Risk difference. The difference between the risk of an event in the presence of the factor and the risk of the event in the absence of the factor.
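The three 2-by-2 summaries follow directly from the four cell estimates. The Python sketch below uses the standard definitions; the function and argument names are hypothetical, and the cell layout (factor present or absent by event or no event) is an assumption made for the example.

```python
def two_by_two_summaries(present_event, present_no_event,
                         absent_event, absent_no_event):
    """Odds ratio, relative risk, and risk difference for a 2-by-2 table."""
    # Risk of the event with and without the factor.
    risk_present = present_event / (present_event + present_no_event)
    risk_absent = absent_event / (absent_event + absent_no_event)
    return {
        "odds_ratio": (present_event * absent_no_event)
        / (present_no_event * absent_event),
        "relative_risk": risk_present / risk_absent,
        "risk_difference": risk_present - risk_absent,
    }
```

With cells 20, 80, 10, and 90, the risks are 0.2 and 0.1, giving a relative risk of 2, a risk difference of 0.1, and an odds ratio of 2.25.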

Test of independence of rows and columns. This produces chi-square and likelihood-ratio tests of the hypothesis that a row and column variable are independent. Separate tests are performed for each pair of variables.


Complex Samples Missing Values

Figure 7-3. Missing Values dialog box

Tables. This group determines which cases are used in the analysis.

• Use all available data. Missing values are determined on a table-by-table basis. Thus, the cases used to compute statistics may vary across frequency or crosstabulation tables.

• Use consistent case base. Missing values are determined across all variables. Thus, the cases used to compute statistics are consistent across tables.

Categorical Design Variables. This group determines whether user-missing values are valid or invalid.

Complex Samples Options

Figure 7-4. Options dialog box

Subpopulation Display. You can choose to have subpopulations displayed in the same table or in separate tables.


Chapter 8

Complex Samples Ratios

The Complex Samples Ratios procedure displays univariate summary statistics for ratios of variables. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.

Example. Using the Complex Samples Ratios procedure, you can obtain descriptive statistics for the ratio of current property value to last assessed value, based on the results of a statewide survey carried out according to a complex design and with an appropriate analysis plan for the data.

Statistics. The procedure produces ratio estimates, t tests, standard errors, confidence intervals, coefficients of variation, unweighted counts, population sizes, design effects, and square roots of design effects.

Data. Numerators and denominators should be positive-valued scale variables. Subpopulation variables can be string or numeric but should be categorical.

Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.

Obtaining Complex Samples Ratios

▶ From the menus choose:

Analyze
  Complex Samples
    Ratios...

▶ Select a plan file and, optionally, select a custom joint probabilities file.

▶ Click Continue.

Figure 8-1. Ratios dialog box

▶ Select at least one numerator variable and one denominator variable.

Optionally, you can:

• Specify variables to define subgroups for which statistics are produced.


Complex Samples Ratios Statistics

Figure 8-2. Ratios Statistics dialog box

Statistics. This group produces statistics associated with the ratio estimate.

• Standard error. The standard error of the estimate.

• Confidence interval. A confidence interval for the estimate, using the specified level.

• Coefficient of variation. The ratio of the standard error of the estimate to the estimate.

• Unweighted count. The number of units used to compute the estimate.

• Population size. The estimated number of units in the population.

• Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.

• Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.

t-test. You can request t tests of the estimates against a specified value.
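A ratio estimate and the t statistic for testing it against a specified value can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, each case is assumed to carry a sampling weight, the ratio is taken as the weighted numerator total over the weighted denominator total, and the standard error is assumed to be supplied by the procedure.

```python
def ratio_estimate(numerators, denominators, weights):
    """Weighted total of the numerator over weighted total of the denominator."""
    num_total = sum(w * y for y, w in zip(numerators, weights))
    den_total = sum(w * x for x, w in zip(denominators, weights))
    return num_total / den_total


def t_statistic(estimate, standard_error, test_value):
    # Distance of the estimate from the specified value, in standard errors.
    return (estimate - test_value) / standard_error
```

For the property-value example, a ratio estimate of 1.1 with a standard error of 0.05 tested against 1 gives a t statistic of about 2.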


Complex Samples Ratios Missing Values

Figure 8-3. Ratios Missing Values dialog box

Ratios. This group determines which cases are used in the analysis.

• Use all available data. Missing values are determined on a ratio-by-ratio basis. Thus, the cases used to compute statistics may vary across numerator-denominator pairs.

• Ensure consistent case base. Missing values are determined across all variables. Thus, the cases used to compute statistics are consistent.

Categorical Design Variables. This group determines whether user-missing values are valid or invalid.

Complex Samples Options

Figure 8-4. Options dialog box

Subpopulation Display. You can choose to have subpopulations displayed in the same table or in separate tables.


Chapter 9

Complex Samples General Linear Model

The Complex Samples General Linear Model (CSGLM) procedure performs linear regression analysis, as well as analysis of variance and covariance, for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.

Example. A grocery store chain surveyed a set of customers concerning their purchasing habits, according to a complex design. Given the survey results and how much each customer spent in the previous month, the store wants to see if the frequency with which customers shop is related to the amount they spend in a month, controlling for the gender of the customer and incorporating the sampling design.

Statistics. The procedure produces estimates, standard errors, confidence intervals, t tests, design effects, and square roots of design effects for model parameters, as well as the correlations and covariances between parameter estimates. Measures of model fit and descriptive statistics for the dependent and independent variables are also available. Additionally, you can request estimated marginal means for levels of model factors and factor interactions.

Data. The dependent variable is quantitative. Factors are categorical. Covariates are quantitative variables that are related to the dependent variable. Subpopulation variables can be string or numeric but should be categorical.

Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.


Obtaining a Complex Samples General Linear Model

▶ From the menus choose:

Analyze
  Complex Samples
    General Linear Model...

▶ Select a plan file and, optionally, select a custom joint probabilities file.

▶ Click Continue.

Figure 9-1. General Linear Model dialog box

▶ Select a dependent variable.

Optionally, you can:

• Select variables for Factors and Covariates, as appropriate for your data.

• Specify a variable to define a subpopulation. The analysis is performed only for the selected category of the subpopulation variable.

Figure 9-2. Model dialog box

Specify Model Effects. By default, the procedure builds a main-effects model using the factors and covariates specified in the main dialog box. Alternatively, you can build a custom model that includes interaction effects and nested terms.


Non-Nested Terms

For the selected factors and covariates:

Interaction. Creates the highest-level interaction term for all selected variables.

Main effects. Creates a main-effects term for each variable selected.

All 2-way. Creates all possible two-way interactions of the selected variables.

All 3-way. Creates all possible three-way interactions of the selected variables.

All 4-way. Creates all possible four-way interactions of the selected variables.

All 5-way. Creates all possible five-way interactions of the selected variables.
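The term types above amount to enumerating combinations of the selected variables. The Python sketch below shows the idea; the function names are hypothetical, and the A*B notation for an interaction is borrowed from the Limitations list in this chapter.

```python
from itertools import combinations


def all_k_way(variables, k):
    """All possible k-way interaction terms of the selected variables."""
    return ["*".join(combo) for combo in combinations(variables, k)]


def highest_level_interaction(variables):
    """The single interaction term involving every selected variable."""
    return "*".join(variables)
```

With three selected variables A, B, and C, `all_k_way(["A", "B", "C"], 2)` returns `["A*B", "A*C", "B*C"]`, and using k = 1 reproduces the main-effects terms.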

Nest<strong>ed</strong> Terms<br />

You can build nest<strong>ed</strong> terms for your model in th<strong>is</strong> proc<strong>ed</strong>ure. Nest<strong>ed</strong> terms are<br />

useful for modeling the effect of a f<strong>ac</strong>tor or covariate whose values do not inter<strong>ac</strong>t<br />

with the levels of another f<strong>ac</strong>tor. For example, a grocery store chain may follow the<br />

spending habits of its customers at several store locations. Since e<strong>ac</strong>h customer<br />

frequents only one of these locations, the Customer effect can be said to be nest<strong>ed</strong><br />

within the Store location effect.<br />

Additionally, you can include inter<strong>ac</strong>tion effects or add multiple levels of nesting<br />

to the nest<strong>ed</strong> term.<br />

Limitations. Nest<strong>ed</strong> terms have the following restrictions:<br />

• All f<strong>ac</strong>tors within an inter<strong>ac</strong>tion must be unique. Thus, if A <strong>is</strong> a f<strong>ac</strong>tor, then<br />

specifying A*A <strong>is</strong> invalid.<br />

• All f<strong>ac</strong>tors within a nest<strong>ed</strong> effect must be unique. Thus, if A <strong>is</strong> a f<strong>ac</strong>tor, then<br />

specifying A(A) <strong>is</strong> invalid.<br />

• No effect can be nest<strong>ed</strong> within a covariate. Thus, if A <strong>is</strong> a f<strong>ac</strong>tor and X <strong>is</strong> a<br />

covariate, then specifying A(X) <strong>is</strong> invalid.<br />
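The three restrictions above can be checked mechanically. The following is a minimal sketch only: the `Term` representation and the `check_term` function are hypothetical names introduced for this illustration and are not part of SPSS. Because the restrictions are stated for factors, a covariate that interacts with itself (for example, X*X) is not flagged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical model term, e.g. A*B(C) -> variables=("A", "B"), within=("C",)."""
    variables: tuple      # variables joined by "*"
    within: tuple = ()    # variables the term is nested within


def check_term(term, covariates):
    """Return a list of violated restrictions (empty if the term is valid)."""
    problems = []
    # Restriction 1: factors within an interaction must be unique (A*A is invalid).
    factors = [v for v in term.variables if v not in covariates]
    if len(set(factors)) != len(factors):
        problems.append("factors within an interaction must be unique")
    # Restriction 2: factors within a nested effect must be unique (A(A) is invalid).
    repeated_nesting = len(set(term.within)) != len(term.within)
    if set(term.variables) & set(term.within) or repeated_nesting:
        problems.append("factors within a nested effect must be unique")
    # Restriction 3: nothing can be nested within a covariate (A(X) is invalid).
    if any(v in covariates for v in term.within):
        problems.append("no effect can be nested within a covariate")
    return problems


# Customer nested within Store, as in the grocery example, passes all checks.
print(check_term(Term(("Customer",), ("Store",)), covariates=set()))
```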

Intercept. The intercept <strong>is</strong> usually includ<strong>ed</strong> in the model. If you can assume the<br />

data pass through the origin, you can exclude the intercept. Even if you include the<br />

intercept in the model, you can choose to suppress stat<strong>is</strong>tics relat<strong>ed</strong> to it.



<strong>Complex</strong> Samples General Linear Model Stat<strong>is</strong>tics<br />

Figure 9-3<br />

General Linear Model Stat<strong>is</strong>tics dialog box<br />

Model Parameters. Th<strong>is</strong> group allows you to control the d<strong>is</strong>play of stat<strong>is</strong>tics relat<strong>ed</strong> to<br />

the model parameters.<br />

• Estimate. D<strong>is</strong>plays estimates of the coefficients.<br />

• Standard error. D<strong>is</strong>plays the standard error for e<strong>ac</strong>h coefficient estimate.<br />

• Confidence interval. D<strong>is</strong>plays a confidence interval for e<strong>ac</strong>h coefficient estimate.<br />

The confidence level for the interval <strong>is</strong> set in the Options dialog box.<br />

• t-test. D<strong>is</strong>plays a t test of e<strong>ac</strong>h coefficient estimate. The null hypothes<strong>is</strong> for e<strong>ac</strong>h<br />

test <strong>is</strong> that the value of the coefficient <strong>is</strong> 0.<br />

• Covariances of parameter estimates. D<strong>is</strong>plays an estimate of the covariance matrix<br />

for the model coefficients.<br />

• Correlations of parameter estimates. D<strong>is</strong>plays an estimate of the correlation matrix<br />

for the model coefficients.



• Design effect. The ratio of the variance of the estimate to the variance obtain<strong>ed</strong><br />

by assuming that the sample is a simple random sample. This is a measure of<br />

the effect of specifying a complex design, where values further from 1 indicate<br />

greater effects.<br />

• Square root of design effect. Th<strong>is</strong> <strong>is</strong> a measure of the effect of specifying a complex<br />

design, where values further from 1 indicate greater effects.<br />
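The two design-effect statistics can be illustrated with a short calculation. This is a sketch under stated assumptions: the function name and the variance values are invented for the example, and in practice both variances come from the estimation procedure.

```python
import math


def design_effect(var_complex, var_srs):
    """Return (design effect, square root of design effect).

    var_complex: variance of the estimate under the complex design
    var_srs:     variance obtained by assuming a simple random sample
    """
    deff = var_complex / var_srs
    return deff, math.sqrt(deff)


# Hypothetical variances: the complex design makes the variance 2.25 times larger,
# so the standard error is 1.5 times larger than under simple random sampling.
deff, deft = design_effect(var_complex=9.0, var_srs=4.0)
print(deff, deft)  # 2.25 1.5
```

Because the square root is on the scale of the standard error, it is often the easier of the two values to relate to the width of a confidence interval.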

Model fit. Displays R² and root mean squared error statistics.<br />

Population means of dependent variable and covariates. D<strong>is</strong>plays summary information<br />

about the dependent variable, covariates, and f<strong>ac</strong>tors.<br />

Sample design information. D<strong>is</strong>plays summary information about the sample,<br />

including the unweight<strong>ed</strong> count and the population size.<br />

<strong>Complex</strong> Samples Hypothes<strong>is</strong> Tests<br />

Figure 9-4<br />

Hypothes<strong>is</strong> Tests dialog box



Test Stat<strong>is</strong>tic. Th<strong>is</strong> group allows you to select the type of stat<strong>is</strong>tic us<strong>ed</strong> for testing<br />

hypotheses. You can choose between F, adjust<strong>ed</strong> F, chi-square, and adjust<strong>ed</strong><br />

chi-square.<br />

Sampling Degrees of Fre<strong>ed</strong>om. Th<strong>is</strong> group gives you control over the sampling design<br />

degrees of fre<strong>ed</strong>om us<strong>ed</strong> to compute p values for all test stat<strong>is</strong>tics. If bas<strong>ed</strong> on the<br />

sampling design, the value <strong>is</strong> the difference between the number of primary sampling<br />

units and the number of strata in the first stage of sampling. Alternatively, you can set<br />

a custom value for the degrees of freedom by specifying a positive integer.<br />
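The design-based value can be sketched as a simple count. The function name, the township identifiers, and the county labels below are hypothetical and serve only to illustrate the rule of primary sampling units minus first-stage strata.

```python
def sampling_design_df(psu_to_stratum):
    """Degrees of freedom = number of first-stage PSUs minus number of first-stage strata.

    psu_to_stratum maps each primary sampling unit to the stratum it was drawn from.
    """
    n_psu = len(psu_to_stratum)
    n_strata = len(set(psu_to_stratum.values()))
    return n_psu - n_strata


# Hypothetical design: five sampled townships (PSUs) in two counties (strata).
design = {"T1": "North", "T2": "North", "T3": "South", "T4": "South", "T5": "South"}
print(sampling_design_df(design))  # 3
```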

Adjustment for Multiple Compar<strong>is</strong>ons. When performing many hypothes<strong>is</strong> tests, you<br />

run the r<strong>is</strong>k of increas<strong>ed</strong> overall Type I error (the probability of incorrectly rejecting<br />

a null hypothes<strong>is</strong>). Th<strong>is</strong> group allows you to choose the method of adjusting the<br />

significance level.<br />

• Least significant difference. Th<strong>is</strong> method does not control the overall probability<br />

of rejecting the hypotheses that some linear contrasts are different from the<br />

null hypothes<strong>is</strong> values.<br />

• Sequential Sidak. Th<strong>is</strong> <strong>is</strong> a sequentially step-down rejective Sidak proc<strong>ed</strong>ure<br />

that <strong>is</strong> much less conservative in terms of rejecting individual hypotheses but<br />

maintains the same overall significance level.<br />

• Sequential Bonferroni. Th<strong>is</strong> <strong>is</strong> a sequentially step-down rejective Bonferroni<br />

proc<strong>ed</strong>ure that <strong>is</strong> much less conservative in terms of rejecting individual<br />

hypotheses but maintains the same overall significance level.<br />

• Sidak. Th<strong>is</strong> method provides tighter bounds than the Bonferroni appro<strong>ac</strong>h.<br />

• Bonferroni. Th<strong>is</strong> method adjusts the observ<strong>ed</strong> significance level for the f<strong>ac</strong>t that<br />

multiple contrasts are being test<strong>ed</strong>.
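The five methods can be compared through the adjusted significance values they produce. The sketch below uses common textbook formulations (the sequential methods follow the Holm step-down scheme); the function name and method labels are assumptions made for this example, and the exact computation used by SPSS may differ in detail.

```python
def adjust_p_values(p_values, method):
    """Return adjusted p values, in the original order, for m simultaneous tests."""
    m = len(p_values)
    if method == "lsd":
        return list(p_values)                        # no adjustment at all
    if method == "bonferroni":
        return [min(1.0, m * p) for p in p_values]
    if method == "sidak":
        return [1.0 - (1.0 - p) ** m for p in p_values]
    if method in ("sequential_bonferroni", "sequential_sidak"):
        order = sorted(range(m), key=lambda i: p_values[i])
        adjusted = [0.0] * m
        running_max = 0.0
        for rank, i in enumerate(order):
            k = m - rank                             # hypotheses still under test
            p = p_values[i]
            if method == "sequential_bonferroni":
                value = min(1.0, k * p)
            else:
                value = 1.0 - (1.0 - p) ** k
            running_max = max(running_max, value)    # keep adjusted values monotone
            adjusted[i] = running_max
        return adjusted
    raise ValueError(f"unknown method: {method}")


p = [0.01, 0.04, 0.03]
for name in ("lsd", "bonferroni", "sidak", "sequential_bonferroni", "sequential_sidak"):
    print(name, adjust_p_values(p, name))
```

In this formulation the sequential variants never produce a larger adjusted value than their single-step counterparts, which is the sense in which they are less conservative.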



<strong>Complex</strong> Samples General Linear Model Estimat<strong>ed</strong> Means<br />

Figure 9-5<br />

General Linear Model Estimat<strong>ed</strong> Means dialog box<br />

The Estimat<strong>ed</strong> Means dialog box allows you to d<strong>is</strong>play the model-estimat<strong>ed</strong> marginal<br />

means for levels of f<strong>ac</strong>tors and f<strong>ac</strong>tor inter<strong>ac</strong>tions specifi<strong>ed</strong> in the Model subdialog<br />

box. You can also request that the overall population mean be d<strong>is</strong>play<strong>ed</strong>.<br />

Term. Estimat<strong>ed</strong> means are comput<strong>ed</strong> for the select<strong>ed</strong> f<strong>ac</strong>tors and f<strong>ac</strong>tor inter<strong>ac</strong>tions.<br />

Contrast. The contrast determines how hypothes<strong>is</strong> tests are set up to compare the<br />

estimat<strong>ed</strong> means.<br />

• Simple. Compares the mean of e<strong>ac</strong>h level to the mean of a specifi<strong>ed</strong> level. Th<strong>is</strong><br />

type of contrast <strong>is</strong> useful when there <strong>is</strong> a control group.<br />

• Deviation. Compares the mean of e<strong>ac</strong>h level (except a reference category) to the<br />

mean of all of the levels (grand mean). The levels of the f<strong>ac</strong>tor can be in any order.<br />

• Difference. Compares the mean of e<strong>ac</strong>h level (except the first) to the mean of<br />

previous levels. They are sometimes call<strong>ed</strong> reverse Helmert contrasts.<br />

• Helmert. Compares the mean of e<strong>ac</strong>h level of the f<strong>ac</strong>tor (except the last) to the<br />

mean of subsequent levels.



• Repeat<strong>ed</strong>. Compares the mean of e<strong>ac</strong>h level (except the last) to the mean of<br />

the subsequent level.<br />

• Polynomial. Compares the linear effect, quadratic effect, cubic effect, and so on.<br />

The first degree of fre<strong>ed</strong>om contains the linear effect <strong>ac</strong>ross all categories; the<br />

second degree of fre<strong>ed</strong>om, the quadratic effect; and so on. These contrasts are<br />

often used to estimate polynomial trends.<br />

Reference Category. The simple and deviation contrasts require a reference category,<br />

or f<strong>ac</strong>tor level against which the others are compar<strong>ed</strong>.<br />
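Each contrast type corresponds to a set of coefficient rows applied to the level means. The sketch below builds those rows for every type except polynomial, which requires orthogonal polynomial coefficients and is omitted. The function name, the zero-based level indexing, and the choice of the last level as the default reference are assumptions made for this illustration.

```python
from fractions import Fraction


def contrast_rows(kind, k, ref=None):
    """Return contrast coefficient rows for a factor with k levels (indexed 0..k-1)."""
    if ref is None:
        ref = k - 1  # assumption for this sketch: the last level is the reference
    zero, one = Fraction(0), Fraction(1)
    rows = []
    if kind == "simple":          # each level versus the reference level
        for i in range(k):
            if i != ref:
                row = [zero] * k
                row[i], row[ref] = one, -one
                rows.append(row)
    elif kind == "deviation":     # each level (except reference) versus the grand mean
        for i in range(k):
            if i != ref:
                row = [Fraction(-1, k)] * k
                row[i] += one
                rows.append(row)
    elif kind == "difference":    # each level (except first) versus mean of previous levels
        for i in range(1, k):
            rows.append([Fraction(-1, i)] * i + [one] + [zero] * (k - i - 1))
    elif kind == "helmert":       # each level (except last) versus mean of subsequent levels
        for i in range(k - 1):
            rows.append([zero] * i + [one] + [Fraction(-1, k - i - 1)] * (k - i - 1))
    elif kind == "repeated":      # each level (except last) versus the next level
        for i in range(k - 1):
            rows.append([zero] * i + [one, -one] + [zero] * (k - i - 2))
    else:
        raise ValueError(f"unsupported contrast type: {kind}")
    return rows


for row in contrast_rows("helmert", 3):
    print([str(c) for c in row])
```

Every row sums to zero, so each contrast compares means rather than estimating their overall level.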

<strong>Complex</strong> Samples General Linear Model Save<br />

Figure 9-6<br />

General Linear Model Save dialog box<br />

Save Variables. Th<strong>is</strong> group allows you to save the model pr<strong>ed</strong>ict<strong>ed</strong> values and<br />

residuals as new variables in the working file.



Export Model as <strong>SPSS</strong> data. Writes an <strong>SPSS</strong> data file containing a covariance (or<br />

correlation, if select<strong>ed</strong>) matrix of the parameter estimates in the model. Also, for e<strong>ac</strong>h<br />

dependent variable, there will be a row of parameter estimates, a row of standard<br />

errors, a row of significance values for the t stat<strong>is</strong>tics corresponding to the parameter<br />

estimates, and a row of sampling design degrees of fre<strong>ed</strong>om. You can use th<strong>is</strong> matrix<br />

file in other proc<strong>ed</strong>ures that read an <strong>SPSS</strong> matrix file.<br />

Export Model as XML. Saves the parameter estimates and the parameter covariance<br />

matrix, if select<strong>ed</strong>, in XML (PMML) format. SmartScore and the server version of<br />

<strong>SPSS</strong> (a separate product) can use th<strong>is</strong> model file to apply the model information<br />

to other data files for scoring purposes.<br />

<strong>Complex</strong> Samples General Linear Model Options<br />

Figure 9-7<br />

General Linear Model Options dialog box<br />

User-M<strong>is</strong>sing Values. All design variables, as well as the dependent variable and<br />

any covariates, must have valid data. Cases with invalid data for any of these<br />

variables are delet<strong>ed</strong> from the analys<strong>is</strong>. These controls allow you to decide whether<br />

user-m<strong>is</strong>sing values are treat<strong>ed</strong> as valid among the strata, cluster, subpopulation, and<br />

f<strong>ac</strong>tor variables.<br />

Confidence Interval. Th<strong>is</strong> <strong>is</strong> the confidence interval level for coefficient estimates<br />

and estimat<strong>ed</strong> marginal means. Specify a value greater than or equal to 50 and less<br />

than 100.


CSGLM Command Additional Features<br />

The <strong>SPSS</strong> command language also allows you to:<br />


• Specify custom tests of effects versus a linear combination of effects or a value<br />

(using the CUSTOM subcommand).<br />

• Fix covariates at values other than their means when computing estimat<strong>ed</strong><br />

marginal means (using the EMMEANS subcommand).<br />

• Specify a metric for polynomial contrasts (using the EMMEANS subcommand).<br />

• Specify a tolerance value for checking singularity (using the CRITERIA<br />

subcommand).<br />

• Create user-specifi<strong>ed</strong> names for sav<strong>ed</strong> variables (using the SAVE subcommand).<br />

• Produce a general estimable function table (using the PRINT subcommand).<br />

See the <strong>SPSS</strong> Command Syntax Reference for complete syntax information.


Chapter 10<br />

Complex Samples Logistic Regression<br />

The <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression proc<strong>ed</strong>ure performs log<strong>is</strong>tic regression<br />

analys<strong>is</strong> on a binary or multinomial dependent variable for samples drawn by complex<br />

sampling methods. Optionally, you can request analyses for a subpopulation.<br />

Example. A loan officer has collect<strong>ed</strong> past records of customers given loans at several<br />

different branches, <strong>ac</strong>cording to a complex design. While incorporating the sample<br />

design, the officer wants to see if the probability with which a customer defaults <strong>is</strong><br />

relat<strong>ed</strong> to age, employment h<strong>is</strong>tory, and amount of cr<strong>ed</strong>it debt.<br />

Stat<strong>is</strong>tics. The proc<strong>ed</strong>ure produces estimates, exponentiat<strong>ed</strong> estimates, standard<br />

errors, confidence intervals, t tests, design effects, and square roots of design effects<br />

for model parameters, as well as the correlations and covariances between parameter<br />

estimates. Pseudo R² statistics, classification tables, and descriptive statistics for the<br />

dependent and independent variables are also available.<br />

Data. The dependent variable <strong>is</strong> categorical. F<strong>ac</strong>tors are categorical. Covariates<br />

are quantitative variables that are relat<strong>ed</strong> to the dependent variable. Subpopulation<br />

variables can be string or numeric but should be categorical.<br />

Assumptions. The cases in the data file represent a sample from a complex design<br />

that should be analyz<strong>ed</strong> <strong>ac</strong>cording to the specifications in the file select<strong>ed</strong> in the<br />

<strong>Complex</strong> Samples Plan dialog box.<br />

Obtaining <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression<br />

From the menus choose:<br />

Analyze<br />

<strong>Complex</strong> Samples<br />

Log<strong>is</strong>tic Regression...<br />




► Select a plan file and, optionally, select a custom joint probabilities file.<br />

► Click Continue.<br />

Figure 10-1<br />

Log<strong>is</strong>tic Regression dialog box<br />

► Select a dependent variable.<br />

Optionally, you can:<br />

• Select variables for f<strong>ac</strong>tors and covariates, as appropriate for your data.<br />

• Specify a variable to define a subpopulation. The analys<strong>is</strong> <strong>is</strong> perform<strong>ed</strong> only for<br />

the select<strong>ed</strong> category of the subpopulation variable.



<strong>Complex</strong> Samples Log<strong>is</strong>tic Regression Reference Category<br />

Figure 10-2<br />

Log<strong>is</strong>tic Regression Reference Category dialog box<br />

By default, the <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression proc<strong>ed</strong>ure makes the<br />

highest-valu<strong>ed</strong> category the reference category. Th<strong>is</strong> dialog box allows you to specify<br />

the highest, lowest, or a custom category as the reference category.



<strong>Complex</strong> Samples Log<strong>is</strong>tic Regression Model<br />

Figure 10-3<br />

Log<strong>is</strong>tic Regression Model dialog box<br />

Specify Model Effects. By default, the proc<strong>ed</strong>ure builds a main-effects model using the<br />

f<strong>ac</strong>tors and covariates specifi<strong>ed</strong> in the main dialog box. Alternatively, you can build a<br />

custom model that includes inter<strong>ac</strong>tion effects and nest<strong>ed</strong> terms.<br />

Non-Nest<strong>ed</strong> Terms<br />

For the select<strong>ed</strong> f<strong>ac</strong>tors and covariates:<br />

Inter<strong>ac</strong>tion. Creates the highest-level inter<strong>ac</strong>tion term for all select<strong>ed</strong> variables.<br />

Main effects. Creates a main-effects term for e<strong>ac</strong>h variable select<strong>ed</strong>.



All 2-way. Creates all possible two-way inter<strong>ac</strong>tions of the select<strong>ed</strong> variables.<br />

All 3-way. Creates all possible three-way inter<strong>ac</strong>tions of the select<strong>ed</strong> variables.<br />

All 4-way. Creates all possible four-way inter<strong>ac</strong>tions of the select<strong>ed</strong> variables.<br />

All 5-way. Creates all possible five-way inter<strong>ac</strong>tions of the select<strong>ed</strong> variables.<br />

Nest<strong>ed</strong> Terms<br />

You can build nest<strong>ed</strong> terms for your model in th<strong>is</strong> proc<strong>ed</strong>ure. Nest<strong>ed</strong> terms are<br />

useful for modeling the effect of a f<strong>ac</strong>tor or covariate whose values do not inter<strong>ac</strong>t<br />

with the levels of another f<strong>ac</strong>tor. For example, a grocery store chain may follow the<br />

spending habits of its customers at several store locations. Since e<strong>ac</strong>h customer<br />

frequents only one of these locations, the Customer effect can be said to be nest<strong>ed</strong><br />

within the Store location effect.<br />

Additionally, you can include inter<strong>ac</strong>tion effects or add multiple levels of nesting<br />

to the nest<strong>ed</strong> term.<br />

Limitations. Nest<strong>ed</strong> terms have the following restrictions:<br />

• All f<strong>ac</strong>tors within an inter<strong>ac</strong>tion must be unique. Thus, if A <strong>is</strong> a f<strong>ac</strong>tor, then<br />

specifying A*A <strong>is</strong> invalid.<br />

• All f<strong>ac</strong>tors within a nest<strong>ed</strong> effect must be unique. Thus, if A <strong>is</strong> a f<strong>ac</strong>tor, then<br />

specifying A(A) <strong>is</strong> invalid.<br />

• No effect can be nest<strong>ed</strong> within a covariate. Thus, if A <strong>is</strong> a f<strong>ac</strong>tor and X <strong>is</strong> a<br />

covariate, then specifying A(X) <strong>is</strong> invalid.<br />

Intercept. The intercept <strong>is</strong> usually includ<strong>ed</strong> in the model. If you can assume the<br />

data pass through the origin, you can exclude the intercept. Even if you include the<br />

intercept in the model, you can choose to suppress stat<strong>is</strong>tics relat<strong>ed</strong> to it.



<strong>Complex</strong> Samples Log<strong>is</strong>tic Regression Stat<strong>is</strong>tics<br />

Figure 10-4<br />

Log<strong>is</strong>tic Regression Stat<strong>is</strong>tics dialog box<br />

Model Fit. Controls the display of statistics that measure the overall model<br />

performance.<br />

• Pseudo R-square. The R² statistic from linear regression does not have an exact<br />

counterpart among logistic regression models. There are, instead, multiple<br />

measures that attempt to mimic the properties of the R² statistic.<br />

• Classification table. D<strong>is</strong>plays the tabulat<strong>ed</strong> cross-classifications of the observ<strong>ed</strong><br />

category by the model-pr<strong>ed</strong>ict<strong>ed</strong> category on the dependent variable.<br />
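A classification table is a cross-tabulation of observed and predicted categories. The following is a minimal sketch: the function name is hypothetical, and the optional `weights` argument (for example, sampling weights) is an assumption of this illustration rather than a description of how SPSS builds its table.

```python
from collections import Counter


def classification_table(observed, predicted, weights=None):
    """Cross-classify observed by predicted categories.

    Returns (table, proportion_correct), where table maps
    (observed, predicted) pairs to (weighted) counts.
    """
    if weights is None:
        weights = [1.0] * len(observed)
    table = Counter()
    for obs, pred, w in zip(observed, predicted, weights):
        table[(obs, pred)] += w
    total = sum(table.values())
    correct = sum(w for (obs, pred), w in table.items() if obs == pred)
    return dict(table), correct / total


# Hypothetical loan-default outcomes for four customers.
observed = ["no", "no", "yes", "yes"]
predicted = ["no", "yes", "yes", "yes"]
table, accuracy = classification_table(observed, predicted)
print(table, accuracy)
```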

Parameters. Th<strong>is</strong> group allows you to control the d<strong>is</strong>play of stat<strong>is</strong>tics relat<strong>ed</strong> to the<br />

model parameters.<br />

• Estimate. D<strong>is</strong>plays estimates of the coefficients.<br />

• Exponentiat<strong>ed</strong> estimate. D<strong>is</strong>plays the base of the natural logarithm ra<strong>is</strong><strong>ed</strong> to the<br />

power of the estimates of the coefficients. While the estimate has nice properties<br />

for stat<strong>is</strong>tical testing, the exponentiat<strong>ed</strong> estimate, or exp(B), <strong>is</strong> easier to interpret.



• Standard error. D<strong>is</strong>plays the standard error for e<strong>ac</strong>h coefficient estimate.<br />

• Confidence interval. D<strong>is</strong>plays a confidence interval for e<strong>ac</strong>h coefficient estimate.<br />

The confidence level for the interval <strong>is</strong> set in the Options dialog box.<br />

• t-test. D<strong>is</strong>plays a t test of e<strong>ac</strong>h coefficient estimate. The null hypothes<strong>is</strong> for e<strong>ac</strong>h<br />

test <strong>is</strong> that the value of the coefficient <strong>is</strong> 0.<br />

• Covariances of parameter estimates. D<strong>is</strong>plays an estimate of the covariance matrix<br />

for the model coefficients.<br />

• Correlations of parameter estimates. D<strong>is</strong>plays an estimate of the correlation matrix<br />

for the model coefficients.<br />

• Design effect. The ratio of the variance of the estimate to the variance obtain<strong>ed</strong><br />

by assuming that the sample <strong>is</strong> a simple random sample. Th<strong>is</strong> <strong>is</strong> a measure of<br />

the effect of specifying a complex design, where values further from 1 indicate<br />

greater effects.<br />

• Square root of design effect. Th<strong>is</strong> <strong>is</strong> a measure of the effect of specifying a complex<br />

design, where values further from 1 indicate greater effects.<br />

Summary stat<strong>is</strong>tics for model variables. D<strong>is</strong>plays summary information about the<br />

dependent variable, covariates, and f<strong>ac</strong>tors.<br />

Sample design information. D<strong>is</strong>plays summary information about the sample,<br />

including the unweight<strong>ed</strong> count and the population size.



<strong>Complex</strong> Samples Hypothes<strong>is</strong> Tests<br />

Figure 10-5<br />

Hypothes<strong>is</strong> Tests dialog box<br />

Test Stat<strong>is</strong>tic. Th<strong>is</strong> group allows you to select the type of stat<strong>is</strong>tic us<strong>ed</strong> for testing<br />

hypotheses. You can choose between F, adjust<strong>ed</strong> F, chi-square, and adjust<strong>ed</strong><br />

chi-square.<br />

Sampling Degrees of Fre<strong>ed</strong>om. Th<strong>is</strong> group gives you control over the sampling design<br />

degrees of fre<strong>ed</strong>om us<strong>ed</strong> to compute p values for all test stat<strong>is</strong>tics. If bas<strong>ed</strong> on the<br />

sampling design, the value <strong>is</strong> the difference between the number of primary sampling<br />

units and the number of strata in the first stage of sampling. Alternatively, you can set<br />

a custom degrees of fre<strong>ed</strong>om by specifying a positive integer.<br />

Adjustment for Multiple Compar<strong>is</strong>ons. When performing many hypothes<strong>is</strong> tests, you<br />

run the r<strong>is</strong>k of increas<strong>ed</strong> overall Type I error (the probability of incorrectly rejecting<br />

a null hypothes<strong>is</strong>). Th<strong>is</strong> group allows you to choose the method of adjusting the<br />

significance level.



• Least significant difference. Th<strong>is</strong> method does not control the overall probability<br />

of rejecting the hypotheses that some linear contrasts are different from the<br />

null hypothes<strong>is</strong> values.<br />

• Sequential Sidak. Th<strong>is</strong> <strong>is</strong> a sequentially step-down rejective Sidak proc<strong>ed</strong>ure<br />

that <strong>is</strong> much less conservative in terms of rejecting individual hypotheses but<br />

maintains the same overall significance level.<br />

• Sequential Bonferroni. Th<strong>is</strong> <strong>is</strong> a sequentially step-down rejective Bonferroni<br />

proc<strong>ed</strong>ure that <strong>is</strong> much less conservative in terms of rejecting individual<br />

hypotheses but maintains the same overall significance level.<br />

• Sidak. Th<strong>is</strong> method provides tighter bounds than the Bonferroni appro<strong>ac</strong>h.<br />

• Bonferroni. Th<strong>is</strong> method adjusts the observ<strong>ed</strong> significance level for the f<strong>ac</strong>t that<br />

multiple contrasts are being test<strong>ed</strong>.<br />

<strong>Complex</strong> Samples Log<strong>is</strong>tic Regression Odds Ratios<br />

Figure 10-6<br />

Log<strong>is</strong>tic Regression Odds Ratios dialog box



The Odds Ratios dialog box allows you to d<strong>is</strong>play the model-estimat<strong>ed</strong> odds ratios for<br />

specifi<strong>ed</strong> f<strong>ac</strong>tors and covariates. A separate set of odds ratios <strong>is</strong> comput<strong>ed</strong> for e<strong>ac</strong>h<br />

category of the dependent variable except the reference category.<br />

F<strong>ac</strong>tors. For e<strong>ac</strong>h select<strong>ed</strong> f<strong>ac</strong>tor, d<strong>is</strong>plays the ratio of the odds at e<strong>ac</strong>h category of the<br />

f<strong>ac</strong>tor to the odds at the specifi<strong>ed</strong> reference category.<br />

Covariates. For e<strong>ac</strong>h select<strong>ed</strong> covariate, d<strong>is</strong>plays the ratio of the odds at the covariate’s<br />

mean value plus the specifi<strong>ed</strong> units of change to the odds at the mean.<br />

When computing odds ratios for a f<strong>ac</strong>tor or covariate, the proc<strong>ed</strong>ure fixes all other<br />

f<strong>ac</strong>tors at their highest levels and all other covariates at their means. If a f<strong>ac</strong>tor or<br />

covariate inter<strong>ac</strong>ts with other pr<strong>ed</strong>ictors in the model, then the odds ratios depend not<br />

only on the change in the specifi<strong>ed</strong> variable but also on the values of the variables<br />

with which it inter<strong>ac</strong>ts. If a specifi<strong>ed</strong> covariate inter<strong>ac</strong>ts with itself in the model (for<br />

example, age*age), then the odds ratios depend on both the change in the covariate<br />

and the value of the covariate.
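The dependence of a covariate's odds ratio on its own value, when the covariate interacts with itself, can be seen in a small calculation. This is a sketch for a binary model in which the log-odds contain the terms b_linear·x + b_squared·x²; the function name and all coefficient values are hypothetical.

```python
import math


def covariate_odds_ratio(b_linear, units, b_squared=0.0, at_value=0.0):
    """Odds at (at_value + units) divided by odds at at_value.

    The log-odds are assumed to contain b_linear * x + b_squared * x**2;
    all other predictors are held fixed and cancel out of the ratio.
    """
    change_in_log_odds = (
        b_linear * units
        + b_squared * ((at_value + units) ** 2 - at_value ** 2)
    )
    return math.exp(change_in_log_odds)


# Without an age*age term the ratio depends only on the units of change.
print(covariate_odds_ratio(b_linear=0.05, units=10))

# With an age*age term the same 10-unit change gives different ratios
# depending on the age at which it is evaluated.
print(covariate_odds_ratio(0.05, 10, b_squared=0.001, at_value=30))
print(covariate_odds_ratio(0.05, 10, b_squared=0.001, at_value=50))
```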


<strong>Complex</strong> Samples Log<strong>is</strong>tic Regression Save<br />

Figure 10-7<br />

Log<strong>is</strong>tic Regression Save dialog box<br />


Save Variables. Th<strong>is</strong> group allows you to save the model-pr<strong>ed</strong>ict<strong>ed</strong> category and<br />

pr<strong>ed</strong>ict<strong>ed</strong> probabilities as new variables in the working file.<br />

Export Model as <strong>SPSS</strong> data. Writes an <strong>SPSS</strong> data file containing a covariance (or<br />

correlation, if select<strong>ed</strong>) matrix of the parameter estimates in the model. Also, for e<strong>ac</strong>h<br />

dependent variable, there will be a row of parameter estimates, a row of standard<br />

errors, a row of significance values for the t stat<strong>is</strong>tics corresponding to the parameter<br />

estimates, and a row of sampling design degrees of fre<strong>ed</strong>om. You can use th<strong>is</strong> matrix<br />

file in other proc<strong>ed</strong>ures that read an <strong>SPSS</strong> matrix file.



Export Model as XML. Saves the parameter estimates and the parameter covariance<br />

matrix, if select<strong>ed</strong>, in XML (PMML) format. SmartScore and the server version of<br />

<strong>SPSS</strong> (a separate product) can use th<strong>is</strong> model file to apply the model information<br />

to other data files for scoring purposes.<br />

<strong>Complex</strong> Samples Log<strong>is</strong>tic Regression Options<br />

Figure 10-8<br />

Log<strong>is</strong>tic Regression Options dialog box<br />

Estimation. Th<strong>is</strong> group gives you control of various criteria us<strong>ed</strong> in the model<br />

estimation.<br />

• Maximum Iterations. The maximum number of iterations the algorithm will<br />

execute. Specify a non-negative integer.<br />

• Maximum Step-Halving. At e<strong>ac</strong>h iteration, the step size <strong>is</strong> r<strong>ed</strong>uc<strong>ed</strong> by a f<strong>ac</strong>tor<br />

of 0.5 until the log-likelihood increases or maximum step-halving <strong>is</strong> re<strong>ac</strong>h<strong>ed</strong>.<br />

Specify a positive integer.



• Limit iterations bas<strong>ed</strong> on change in parameter estimates. When select<strong>ed</strong>, the<br />

algorithm stops after an iteration in which the absolute or relative change in the<br />

parameter estimates <strong>is</strong> less than the value specifi<strong>ed</strong>, which must be non-negative.<br />

• Limit iterations bas<strong>ed</strong> on change in log-likelihood. When select<strong>ed</strong>, the algorithm<br />

stops after an iteration in which the absolute or relative change in the<br />

log-likelihood function <strong>is</strong> less than the value specifi<strong>ed</strong>, which must be<br />

non-negative.<br />

• Check for complete separation of data points. When select<strong>ed</strong>, the algorithm<br />

performs tests to ensure that the parameter estimates have unique values.<br />

Separation occurs when the proc<strong>ed</strong>ure can produce a model that correctly<br />

classifies every case.<br />

• D<strong>is</strong>play iteration h<strong>is</strong>tory. D<strong>is</strong>plays parameter estimates and stat<strong>is</strong>tics at every n<br />

iterations beginning with the 0th iteration (the initial estimates). If you choose to<br />

print the iteration h<strong>is</strong>tory, the last iteration <strong>is</strong> always print<strong>ed</strong> regardless of the<br />

value of n.<br />
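The interplay of maximum iterations, step-halving, and the parameter-change criterion can be sketched with a deliberately small model. The code below fits a single slope (no intercept, no sampling weights) by Newton-Raphson; it is an illustration of the general technique only, not the algorithm SPSS implements, and every name and default value in it is an assumption.

```python
import math


def log_likelihood(b, xs, ys):
    """Log-likelihood of a one-parameter logistic model with log-odds b * x."""
    return sum(y * b * x - math.log1p(math.exp(b * x)) for x, y in zip(xs, ys))


def fit_slope(xs, ys, max_iterations=100, max_step_halving=5, tolerance=1e-6):
    """Return (estimate, iterations used, converged flag)."""
    b = 0.0
    ll = log_likelihood(b, xs, ys)
    for iteration in range(1, max_iterations + 1):
        ps = [1.0 / (1.0 + math.exp(-b * x)) for x in xs]
        gradient = sum(x * (y - p) for x, y, p in zip(xs, ys, ps))
        information = sum(x * x * p * (1.0 - p) for x, p in zip(xs, ps))
        step = gradient / information            # full Newton-Raphson step
        if abs(step) < tolerance:                # change in the estimate is below the limit
            return b + step, iteration, True
        for _ in range(max_step_halving + 1):
            candidate_ll = log_likelihood(b + step, xs, ys)
            if candidate_ll > ll:                # accept once the log-likelihood increases
                break
            step *= 0.5                          # otherwise halve the step and retry
        else:
            return b, iteration, False           # maximum step-halving reached
        b, ll = b + step, candidate_ll
    return b, max_iterations, False


# Hypothetical data without complete separation (x = -1 and x = 1 each show both outcomes).
xs = [-2, -1, -1, 0, 1, 1, 2, 2]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
print(fit_slope(xs, ys))
```

If the data were completely separated, the information term would shrink toward zero and the estimate would grow without bound, which is why a separate check for separation is useful.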

User-M<strong>is</strong>sing Values. All design variables, as well as the dependent variable and<br />

any covariates, must have valid data. Cases with invalid data for any of these<br />

variables are delet<strong>ed</strong> from the analys<strong>is</strong>. These controls allow you to decide whether<br />

user-m<strong>is</strong>sing values are treat<strong>ed</strong> as valid among the strata, cluster, subpopulation, and<br />

f<strong>ac</strong>tor variables.<br />

Confidence Interval. Th<strong>is</strong> <strong>is</strong> the confidence interval level for coefficient estimates,<br />

exponentiat<strong>ed</strong> coefficient estimates, and odds ratios. Specify a value greater than or<br />

equal to 50 and less than 100.<br />

CSLOGISTIC Command Additional Features<br />

The <strong>SPSS</strong> command language also allows you to:<br />

• Specify custom tests of effects versus a linear combination of effects or a value<br />

(using the CUSTOM subcommand).<br />

• Fix values of other model variables when computing odds ratios for f<strong>ac</strong>tors and<br />

covariates (using the ODDSRATIOS subcommand).<br />

• Specify a tolerance value for checking singularity (using the CRITERIA<br />

subcommand).



• Create user-specifi<strong>ed</strong> names for sav<strong>ed</strong> variables (using the SAVE subcommand).<br />

• Produce a general estimable function table (using the PRINT subcommand).<br />

See the <strong>SPSS</strong> Command Syntax Reference for complete syntax information.


Part 2: Examples


Chapter 11

Complex Samples Sampling Wizard

The Sampling Wizard guides you through the steps for creating, modifying, or executing a sampling plan file. Before using the Wizard, you should have a well-defined target population, a list of sampling units, and an appropriate sample design in mind.

Obtaining a Sample from a Full Sampling Frame

A state agency is charged with ensuring fair property taxes from county to county. Taxes are based on the appraised value of the property, so the agency wants to survey a sample of properties by county to be sure that each county’s records are equally up-to-date. However, resources for obtaining current appraisals are limited, so it is important that the available resources are used wisely. The agency decides to employ complex sampling methodology to select a sample of properties.

A listing of properties is collected in property_assess_cs.sav, found in the \tutorial\sample_files\ subdirectory of the directory in which you installed SPSS. Use the Complex Samples Sampling Wizard to select a sample.

Using the Wizard

► To run the Complex Samples Sampling Wizard, from the menus choose:

Analyze
Complex Samples
Select a Sample...

Figure 11-1
Sampling Wizard, Welcome step

► Select Design a sample and type c:\property_assess.csplan as the name of the plan file.

► Click Next.


Figure 11-2
Sampling Wizard, Design Variables step (stage 1)

► Select County as a stratification variable.

► Select Township as a cluster variable.

► Click Next, and then click Next in the Method step.

This design structure means that independent samples are drawn for each county. In this stage, townships are drawn as the primary sampling unit using the default method, simple random sampling.


Figure 11-3
Sampling Wizard, Sample Size step (stage 1)

► Type 4 as the value for the number of clusters to select in this stage.

► Click Next, and then click Next in the Output Variables step.


Figure 11-4
Sampling Wizard, Plan Summary step (stage 1)

► Select Yes, add stage 2 now.

► Click Next.


Figure 11-5
Sampling Wizard, Design Variables step (stage 2)

► Select Neighborhood as a stratification variable.

► Click Next, and then click Next in the Method step.

This design structure means that independent samples are drawn for each neighborhood of the townships drawn in stage 1. In this stage, properties are drawn as the primary sampling unit using simple random sampling.
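The two-stage structure described above can be sketched in a few lines of Python. This is only an illustration of the design, not the algorithm the Sampling Wizard runs: the function name, the dictionary keys (county, township, neighborhood), and the rounding rule for the 20% proportion are all assumptions, and reusing the seed 241972 here will not reproduce the SPSS sample.

```python
import random
from collections import defaultdict


def draw_two_stage_sample(properties, clusters_per_stratum=4,
                          unit_proportion=0.2, seed=241972):
    """Draw a two-stage sample from a list of property records.

    Each record is a dict with the hypothetical keys 'county',
    'township', and 'neighborhood'.
    """
    rng = random.Random(seed)

    # Stage 1: within each county (stratum), draw townships (clusters)
    # by simple random sampling without replacement.
    townships_by_county = defaultdict(set)
    for record in properties:
        townships_by_county[record["county"]].add(record["township"])

    chosen = set()
    for county in sorted(townships_by_county):
        towns = sorted(townships_by_county[county])
        k = min(clusters_per_stratum, len(towns))
        chosen.update((county, town) for town in rng.sample(towns, k))

    # Stage 2: within each neighborhood (stratum) of a chosen township,
    # draw a fixed proportion of the properties.
    by_neighborhood = defaultdict(list)
    for record in properties:
        if (record["county"], record["township"]) in chosen:
            key = (record["county"], record["township"],
                   record["neighborhood"])
            by_neighborhood[key].append(record)

    sample = []
    for key in sorted(by_neighborhood):
        units = by_neighborhood[key]
        # Assumed rounding rule: nearest integer, but at least one unit.
        n = max(1, round(unit_proportion * len(units)))
        sample.extend(rng.sample(units, n))
    return chosen, sample
```

Because each county is processed separately in stage 1 and each neighborhood separately in stage 2, the draws in different strata are independent, which is the property the design structure above describes.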


Figure 11-6
Sampling Wizard, Sample Size step (stage 2)

► Select Proportions from the Units drop-down list.

► Type 0.2 as the value of the proportion of units to sample from each stratum.

► Click Next, and then click Next in the Output Variables step.


Figure 11-7
Sampling Wizard, Plan Summary step (stage 2)

► Look over the sampling design, and then click Next.


Figure 11-8
Sampling Wizard, Draw Sample, Selection Options step

► Select Custom value for the type of random seed to use, and type 241972 as the value.

Using a custom value allows you to replicate the results of this example exactly.

► Click Next, and then click Next in the Draw Sample, Output Files step.


Figure 11-9
Sampling Wizard, Finish step

► Click Finish.

These selections produce the sampling plan file property_assess.csplan and draw a sample according to that plan.


Plan Summary

Figure 11-10
Plan summary

The summary table reviews your sampling plan, and it is useful for making sure that the plan represents your intentions.


Sampling Summary

Figure 11-11
Stage summary

This summary table reviews the first stage of sampling, and it is useful for checking that the sampling went according to plan. Four townships were sampled from each county, as requested.


Figure 11-12
Stage summary

This summary table (the top part of which is shown here) reviews the second stage of sampling. It is also useful for checking that the sampling went according to plan. Approximately 20% of the properties were sampled from each neighborhood from each township sampled in the first stage, as requested.


Sample Results

Figure 11-13
Data Editor with sample results

You can see the sampling results in the Data Editor. Five new variables were saved to the working file, representing the inclusion probabilities and cumulative sampling weights for each stage, plus the final sampling weights.

• Cases with values for these variables were selected to the sample.

• Cases with system-missing values for the variables were not selected.
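To make the relationship between the saved variables concrete, the sketch below computes stagewise inclusion probabilities and cumulative weights for equal-probability sampling without replacement. The function names and the counts in the example (10 townships in a county, 50 properties in a neighborhood) are hypothetical and are not taken from property_assess_cs.sav.

```python
def stage_inclusion_probability(n_selected, n_available):
    """Inclusion probability of one unit when n_selected units are drawn
    with equal probability, without replacement, from n_available."""
    if not 0 < n_selected <= n_available:
        raise ValueError("need 0 < n_selected <= n_available")
    return n_selected / n_available


def cumulative_weight(inclusion_probabilities):
    """Cumulative sampling weight through a stage: the reciprocal of the
    product of the inclusion probabilities of all stages so far."""
    weight = 1.0
    for p in inclusion_probabilities:
        weight /= p
    return weight


# Hypothetical counts: 4 of 10 townships drawn in stage 1,
# then 10 of 50 properties drawn in one neighborhood in stage 2.
p1 = stage_inclusion_probability(4, 10)
p2 = stage_inclusion_probability(10, 50)
weight_stage1 = cumulative_weight([p1])
weight_final = cumulative_weight([p1, p2])
```

In a two-stage design the cumulative weight through the last stage is the final sampling weight, so each sampled property in this hypothetical neighborhood would stand for roughly 12.5 properties in the frame.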

The agency will now use its resources to collect current valuations for the properties selected in the sample. Once those valuations are available, you can process the sample with Complex Samples analysis procedures, using the sampling plan property_assess.csplan to provide the sampling specifications.


Obtaining a Sample from a Partial Sampling Frame

A company is interested in compiling and selling a database of high-quality survey information. The survey sample should be representative but efficiently carried out, so complex sampling methods are used. The full sampling design calls for the following structure:

Stage   Strata        Clusters
1       Region        Province
2       District      City
3       Subdivision

In the third stage, households are the primary sampling unit, and selected households will be surveyed. However, information is easily available only to the city level, so the company plans to execute the first two stages of the design now, and then collect information on the numbers of subdivisions and households from the sampled cities. The available information to the city level is collected in demo_cs_1.sav, found in the \tutorial\sample_files\ subdirectory of the directory in which you installed SPSS. Use the Complex Samples Sampling Wizard to draw the first two stages.

Using the Wizard to Sample from the First Partial Frame

► To run the Complex Samples Sampling Wizard, from the menus choose:

Analyze
Complex Samples
Select a Sample...

Figure 11-14
Sampling Wizard, Welcome step

► Select Design a sample and type c:\demo_1.csplan as the name of the plan file.

► Click Next.


Figure 11-15
Sampling Wizard, Design Variables step (stage 1)

► Select Region as a stratification variable.

► Select Province as a cluster variable.

► Click Next, and then click Next in the Method step.

This design structure means that independent samples are drawn for each region. In this stage, provinces are drawn as the primary sampling unit using the default method, simple random sampling.


Figure 11-16
Sampling Wizard, Sample Size step (stage 1)

► Type 3 as the value for the number of clusters to select in this stage.

► Click Next, and then click Next in the Output Variables step.


Figure 11-17
Sampling Wizard, Plan Summary step (stage 1)

► Select Yes, add stage 2 now.

► Click Next.


Figure 11-18
Sampling Wizard, Design Variables step (stage 2)

► Select District as a stratification variable.

► Select City as a cluster variable.

► Click Next, and then click Next in the Method step.

This design structure means that independent samples are drawn for each district. In this stage, cities are drawn as the primary sampling unit using the default method, simple random sampling.


Figure 11-19
Sampling Wizard, Sample Size step (stage 2)

► Select Proportions from the Units drop-down list.

► Type 0.1 as the value of the proportion of units to sample from each stratum.

► Click Next, and then click Next in the Output Variables step.


Figure 11-20
Sampling Wizard, Plan Summary step (stage 2)

► Look over the sampling design, and then click Next.


Figure 11-21
Sampling Wizard, Draw Sample, Selection Options step

► Select Custom value for the type of random seed to use, and type 241972 as the value.

Using a custom value allows you to replicate the results of this example exactly.

► Click Next, and then click Next in the Draw Sample, Output Files step.


Figure 11-22
Sampling Wizard, Finish step

► Click Finish.

These selections produce the sampling plan file demo_1.csplan and draw a sample according to that plan.


Sample Results

Figure 11-23
Data Editor with sample results

You can see the sampling results in the Data Editor. Five new variables were saved to the working file, representing the inclusion probabilities and cumulative sampling weights for each stage, plus the “final” sampling weights for the first two stages.

• Cities with values for these variables were selected to the sample.

• Cities with system-missing values for the variables were not selected.

For each city selected, the company acquired subdivision and household unit information and placed it in demo_cs_2.sav. Use this file and the Sampling Wizard to sample the third stage of this design.


Using the Wizard to Sample from the Second Partial Frame

► To run the Complex Samples Sampling Wizard, from the menus choose:

Analyze
Complex Samples
Select a Sample...

Figure 11-24
Sampling Wizard, Welcome step

► Select Design a sample and type c:\demo_2.csplan as the name of the plan file.

► Click Next.


Figure 11-25
Sampling Wizard, Design Variables step (stage 1)

► Select Subdivision as a stratification variable.

► Select Cumulative Sampling Weight for Stage 2 as the input sample weight variable.

► Click Next, and then click Next in the Method step.

This design structure means that independent samples are drawn for each subdivision. In this stage, household units are drawn as the primary sampling unit using the default method, simple random sampling.


Figure 11-26
Sampling Wizard, Sample Size step (stage 1)

► Type 0.2 as the value for the proportion of units to select in this stage.

► Click Next, and then click Next in the Output Variables step.


Figure 11-27
Sampling Wizard, Plan Summary step (stage 1)

► Look over the sampling design, and then click Next.


Figure 11-28
Sampling Wizard, Draw Sample, Selection Options step

► Select Custom value for the type of random seed to use, and type 4231946 as the value.

► Click Next, and then click Next in the Draw Sample, Output Files step.


Figure 11-29
Sampling Wizard, Finish step

► Click Finish.

These selections produce the sampling plan file demo_2.csplan and draw a sample according to that plan.


Sample Results

Figure 11-30
Data Editor with sample results

You can see the sampling results in the Data Editor. Three new variables were saved to the working file, representing the inclusion probabilities and cumulative sampling weights for the third stage, plus the final sampling weights. These new weights take into account the weights computed during the sampling of the first two stages.

• Units with values for these variables were selected to the sample.

• Units with system-missing values for the variables were not selected.
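One way to picture how the new weights take the earlier stages into account: the input sample weight (here, the cumulative weight from stage 2) is divided by the inclusion probability of the stage being drawn. The sketch below assumes this multiplicative rule; the function name and the input weight of 30 are hypothetical, and the exact computation performed by SPSS may differ in detail.

```python
def weight_with_input(input_weight, stage_inclusion_probability):
    """Combine a weight carried over from earlier stages with the
    inclusion probability of the stage being drawn now."""
    if input_weight <= 0:
        raise ValueError("input_weight must be positive")
    if not 0 < stage_inclusion_probability <= 1:
        raise ValueError("inclusion probability must be in (0, 1]")
    return input_weight / stage_inclusion_probability


# Hypothetical city whose cumulative stage 2 weight is 30; households
# are then drawn with proportion 0.2 within each subdivision.
final_weight = weight_with_input(30.0, 0.2)
```

Under these assumptions each sampled household carries a final weight of about 150, that is, five times the weight of its city.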

The company will now use its resources to obtain survey information for the housing units selected in the sample. Once the surveys are collected, you can process the sample with Complex Samples analysis procedures, using the sampling plan demo_2.csplan to provide the sampling specifications.

Related Procedures

The Complex Samples Sampling Wizard procedure is a useful tool for creating a sampling plan file and drawing a sample.

• To ready a sample for analysis when you do not have access to the sampling plan file, use the Analysis Preparation Wizard.


Chapter 12

Complex Samples Analysis Preparation Wizard

The Analysis Preparation Wizard guides you through the steps for creating or modifying an analysis plan for use with the various Complex Samples analysis procedures. It is most useful when you do not have access to the sampling plan file used to draw the sample.

Using the Complex Samples Analysis Preparation Wizard to Ready NHIS Public Data

The National Health Interview Survey (NHIS) is a large, population-based survey of the U.S. civilian population. Interviews are carried out face-to-face in a nationally representative sample of households. Demographic information and observations about health behavior and status are obtained for members of each household. A subset of the 2000 survey is collected in nhis2000_subset.sav, found in the \tutorial\sample_files\ subdirectory of the directory in which you installed SPSS. Use the Complex Samples Analysis Preparation Wizard to create an analysis plan for this data file so that it can be processed by Complex Samples analysis procedures.

Using the Wizard

► To prepare a sample using the Complex Samples Analysis Preparation Wizard, from the menus choose:

Analyze
Complex Samples
Prepare for Analysis...

Figure 12-1
Analysis Preparation Wizard, Welcome step

► Enter c:\nhis2000_subset.csaplan as the name for the analysis plan file.

► Click Next.


Figure 12-2
Analysis Preparation Wizard, Design Variables step (stage 1)

The data are obtained using a complex multistage sample. However, for end users, the original NHIS design variables were transformed to a simplified set of design and weight variables whose results approximate those of the original design structures.

► Select Stratum for variance estimation as a strata variable.

► Select PSU for variance estimation as a cluster variable.

► Select Weight - Final Annual as the sample weight variable.

► Click Finish.


Summary

Figure 12-3
Summary

The summary table reviews your analysis plan. The plan consists of one stage with a design of one stratification variable and one cluster variable. With-replacement (WR) estimation is used, and the plan is saved to c:\nhis2000_subset.csaplan. You can now use this plan file to process nhis2000_subset.sav with Complex Samples analysis procedures.
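For readers who want to see what with-replacement estimation means for a plan with one stratification variable and one cluster variable, the following sketch implements the textbook "ultimate cluster" estimator of a weighted total: weighted values are summed within each cluster, and the variance is built from the spread of those cluster totals within each stratum. It is an illustration under that assumption, not a statement of the exact computations SPSS performs, and the record layout (stratum, psu, weight, y) is hypothetical.

```python
from collections import defaultdict


def wr_total_and_variance(records):
    """Estimate a population total and its with-replacement variance.

    records: iterable of (stratum, psu, weight, y) tuples.
    """
    # Weighted total of y within each cluster (PSU).
    psu_totals = defaultdict(float)
    for stratum, psu, weight, y in records:
        psu_totals[(stratum, psu)] += weight * y

    # Group the cluster totals by stratum.
    by_stratum = defaultdict(list)
    for (stratum, _psu), z in psu_totals.items():
        by_stratum[stratum].append(z)

    total = sum(psu_totals.values())

    # Sum over strata of n_h / (n_h - 1) * sum_i (z_hi - mean_h)^2.
    variance = 0.0
    for z_values in by_stratum.values():
        n_h = len(z_values)
        if n_h < 2:
            raise ValueError("each stratum needs at least two clusters")
        mean = sum(z_values) / n_h
        variance += n_h / (n_h - 1) * sum((z - mean) ** 2 for z in z_values)
    return total, variance
```

Only the first-stage clusters enter the variance formula, which is why a single-stage analysis plan can approximate a more complex multistage design.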

Preparing for Analysis When Sampling Weights Are Not in the Data File

A loan officer has a collection of customer records, taken according to a complex design; however, the sampling weights are not included in the file. This information is contained in bankloan_cs_noweights.sav, found in the \tutorial\sample_files\ subdirectory of the directory in which you installed SPSS. Starting with what she knows about the sampling design, the officer wants to use the Complex Samples Analysis Preparation Wizard to create an analysis plan for this data file so that it can be processed by Complex Samples analysis procedures.

The loan officer knows that the records were selected in two stages, with 15 out of 100 bank branches selected with equal probability and without replacement in the first stage. One hundred customers were then selected from each of those banks with equal probability and without replacement in the second stage, and information on the number of customers at each bank is included in the data file. The first step to creating an analysis plan is to compute the stagewise inclusion probabilities and final sampling weights.

Computing Inclusion Probabilities and Sampling Weights

► To compute the inclusion probabilities for the first stage, from the menus choose:

Transform
Compute...

Figure 12-4
Compute Variable dialog box

Fifteen out of one hundred bank branches were selected without replacement in the first stage; thus, the probability that a given bank was selected is 15/100 = 0.15.

► Type inclprob_s1 as the target variable.

► Type 0.15 as the numeric expression.

► Click OK.

Figure 12-5
Compute Variable dialog box

One hundred customers were selected from each branch in the second stage; thus, the stage 2 inclusion probability for a given customer at a given bank is 100/the number of customers at that bank.

► Recall the Compute Variable dialog box.

► Type inclprob_s2 as the target variable.

► Type 100/ncust as the numeric expression.

► Click OK.


Figure 12-6
Compute Variable dialog box

Now that you have the inclusion probabilities for each stage, it’s easy to compute the final sampling weights.

► Recall the Compute Variable dialog box.

► Type finalweight as the target variable.

► Type 1/(inclprob_s1 * inclprob_s2) as the numeric expression.

► Click OK.

You are now ready to create the analysis plan.
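The three Compute Variable steps amount to the small calculation below, shown in Python as a cross-check. The variable names inclprob_s1, inclprob_s2, finalweight, and ncust come from the steps above; the function name and the branch size of 2000 customers are hypothetical.

```python
def bankloan_weights(ncust, branches_selected=15, branches_total=100,
                     customers_per_branch=100):
    """Return (inclprob_s1, inclprob_s2, finalweight) for one customer
    at a branch that has ncust customers."""
    if ncust < customers_per_branch:
        raise ValueError("a branch cannot supply more customers than it has")
    # Stage 1: 15 of 100 branches, equal probability without replacement.
    inclprob_s1 = branches_selected / branches_total
    # Stage 2: 100 customers out of the ncust customers at the branch.
    inclprob_s2 = customers_per_branch / ncust
    # Final weight: reciprocal of the overall inclusion probability.
    finalweight = 1 / (inclprob_s1 * inclprob_s2)
    return inclprob_s1, inclprob_s2, finalweight


# Hypothetical branch with 2000 customers.
p1, p2, weight = bankloan_weights(2000)
```

For this hypothetical branch the stage 2 probability is 100/2000 = 0.05, so each sampled customer represents about 133 customers in the population of all branches.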


Using the Wizard

► To prepare a sample using the Complex Samples Analysis Preparation Wizard, from the menus choose:

Analyze
Complex Samples
Prepare for Analysis...

Figure 12-7
Analysis Preparation Wizard, Welcome step

► Enter c:\bankloan.csaplan as the name for the analysis plan file.

► Click Next.


Figure 12-8
Analysis Preparation Wizard, Design Variables step (stage 1)

► Select Branch as a cluster variable.

► Select finalweight as the sample weight variable.

► Click Next.


Figure 12-9
Analysis Preparation Wizard, Estimation Method step (stage 1)

► Select Equal WOR as the first stage estimation method.

► Click Next.


Figure 12-10
Analysis Preparation Wizard, Size step (stage 1)

► Select Read values from variable and select inclprob_s1 as the variable containing the first stage inclusion probabilities.

► Click Next.


Figure 12-11
Analysis Preparation Wizard, Plan Summary step (stage 1)

► Select Yes, add stage 2 now.

► Click Next, and then click Next in the Design step.


Figure 12-12
Analysis Preparation Wizard, Estimation Method step (stage 2)

► Select Equal WOR as the second stage estimation method.

► Click Next.


Figure 12-13
Analysis Preparation Wizard, Size step (stage 2)

► Select Read values from variable and select inclprob_s2 as the variable containing the second stage inclusion probabilities.

► Click Finish.


Summary

Figure 12-14
Summary

The summary table reviews your analysis plan. The plan consists of two stages with a design of one cluster variable. Equal probability without replacement (WOR) estimation is used, and the plan is saved to c:\bankloan.csaplan. You can now use this plan file to process bankloan_cs_noweights.sav (with the inclusion probabilities and sampling weights you’ve computed) with Complex Samples analysis procedures.

Related Procedures

The Complex Samples Analysis Preparation Wizard procedure is a useful tool for readying a sample for analysis when you do not have access to the sampling plan file.

• To create a sampling plan file and draw a sample, use the Sampling Wizard.


Chapter 13

Complex Samples Frequencies

The Complex Samples Frequencies procedure produces frequency tables for selected variables and displays univariate statistics. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.

Using Complex Samples Frequencies to Analyze Nutritional Supplement Usage

A researcher wants to study the use of nutritional supplements among U.S. citizens, using the results of the National Health Interview Survey (NHIS) and a previously created analysis plan. For more information, see “Using the Complex Samples Analysis Preparation Wizard to Ready NHIS Public Data” in Chapter 12.

A subset of the 2000 survey is collected in nhis2000_subset.sav, found in the \tutorial\sample_files\ subdirectory of the directory in which you installed SPSS. The analysis plan is stored in nhis2000_subset.csaplan. Use Complex Samples Frequencies to produce statistics for nutritional supplement usage.

Running the Analysis

► To run a Complex Samples Frequencies analysis, from the menus choose:

Analyze
Complex Samples
Frequencies...

Figure 13-1
Complex Samples Plan dialog box

► Browse to the \tutorial\sample_files\ subdirectory of the directory in which you installed SPSS and select nhis2000_subset.csaplan.

► Click Continue.


Figure 13-2
Frequencies dialog box

► Select Vitamin/mineral supplmnts-past 12 m as a frequency variable.

► Select Age category as a subpopulation variable.

► Click Statistics.


Figure 13-3
Frequencies Statistics dialog box

► Select Table percent in the Cells group.

► Select Confidence interval in the Statistics group.

► Click Continue.

► Click OK in the Frequencies dialog box.

Frequency Table

Figure 13-4
Frequency table for variable/situation

Each selected statistic is computed for each selected cell measure. The first column contains estimates of the number and percentage of the population that do or do not take vitamin/mineral supplements. The confidence intervals are non-overlapping; thus, you can conclude that, overall, more Americans take vitamin/mineral supplements than not.
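The estimates in the first column are weighted quantities. As a rough sketch of the idea, an estimated population count for a category is the sum of the sampling weights of the cases in that category, and the table percent is that count divided by the sum over all categories. The function name and the (category, weight) input layout below are assumptions made for illustration, and the sketch does not attempt the standard errors or confidence intervals, which depend on the analysis plan.

```python
from collections import defaultdict


def weighted_frequencies(cases):
    """Return {category: (estimated_count, table_percent)}.

    cases: iterable of (category, sampling_weight) pairs.
    """
    counts = defaultdict(float)
    for category, weight in cases:
        counts[category] += weight
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("the weights must sum to a positive value")
    return {category: (count, 100.0 * count / total)
            for category, count in counts.items()}
```

For example, two sampled cases answering "Yes" with weights of 150 each and one case answering "No" with a weight of 100 give estimated counts of 300 and 100, or 75% and 25% of the estimated population.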

Frequency by Subpopulation

Figure 13-5
Frequency table by subpopulation

When computing statistics by subpopulation, each selected statistic is computed for each selected cell measure by value of Age category. The first column contains estimates of the number and percentage of the population of each category that do or do not take vitamin/mineral supplements. The confidence intervals for the table percentages are all non-overlapping; thus, you can conclude that with increasing age category, greater proportions of Americans take vitamin/mineral supplements.
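The overlap check used in this reasoning is simple enough to automate when a table has many categories. The sketch below is a hypothetical helper, not part of SPSS; intervals are given as (lower, upper) pairs. Note that non-overlap is a conservative, informal criterion: intervals that overlap slightly can still correspond to a statistically significant difference.

```python
def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi)
    share at least one point."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return a_lo <= b_hi and b_lo <= a_hi


def all_non_overlapping(intervals):
    """True if no two of the given intervals overlap.

    After sorting by lower bound it is enough to compare neighbors.
    """
    ordered = sorted(intervals)
    return all(prev[1] < nxt[0] for prev, nxt in zip(ordered, ordered[1:]))
```

A call such as all_non_overlapping([(40.1, 44.0), (52.3, 55.9), (61.0, 66.2)]) returns True for these made-up percentage intervals, which would support reading the point estimates as a genuine ordering.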

Summary<br />

Using the <strong>Complex</strong> Samples Frequencies proc<strong>ed</strong>ure, you have obtain<strong>ed</strong> stat<strong>is</strong>tics for<br />

the use of nutritional supplements among U.S. citizens.<br />

• Overall, more Americans take vitamin/mineral supplements than not.<br />

• When broken down by age category, greater proportions of Americans take<br />

vitamin/mineral supplements with increasing age.<br />

Relat<strong>ed</strong> Proc<strong>ed</strong>ures<br />

The <strong>Complex</strong> Samples Frequencies proc<strong>ed</strong>ure <strong>is</strong> a useful tool for obtaining univariate<br />

descriptive stat<strong>is</strong>tics of categorical variables for observations obtain<strong>ed</strong> via a complex<br />

sampling design.<br />

• The <strong>Complex</strong> Samples Sampling Wizard <strong>is</strong> us<strong>ed</strong> to specify complex sampling<br />

design specifications and obtain a sample. The sampling plan file creat<strong>ed</strong> by the<br />

Sampling Wizard contains a default analys<strong>is</strong> plan and can be specifi<strong>ed</strong> in the Plan<br />

dialog box when you are analyzing the sample obtain<strong>ed</strong> <strong>ac</strong>cording to that plan.<br />

• The <strong>Complex</strong> Samples Analys<strong>is</strong> Preparation Wizard <strong>is</strong> us<strong>ed</strong> to set analys<strong>is</strong><br />

specifications for an ex<strong>is</strong>ting complex sample. The analys<strong>is</strong> plan file creat<strong>ed</strong> by<br />

the Analys<strong>is</strong> Preparation Wizard can be specifi<strong>ed</strong> in the Plan dialog box when you are<br />

analyzing the sample corresponding to that plan.<br />

• The <strong>Complex</strong> Samples Crosstabs proc<strong>ed</strong>ure provides descriptive stat<strong>is</strong>tics for the<br />

crosstabulation of categorical variables.<br />

• The <strong>Complex</strong> Samples Descriptives proc<strong>ed</strong>ure provides univariate descriptive<br />

stat<strong>is</strong>tics for scale variables.


Chapter 14<br />

<strong>Complex</strong> Samples Descriptives<br />

The <strong>Complex</strong> Samples Descriptives proc<strong>ed</strong>ure d<strong>is</strong>plays univariate summary stat<strong>is</strong>tics<br />

for several variables. Optionally, you can request stat<strong>is</strong>tics by subgroups, defin<strong>ed</strong><br />

by one or more categorical variables.<br />

Using <strong>Complex</strong> Samples Descriptives to Analyze Activity Levels<br />

A researcher wants to study the <strong>ac</strong>tivity levels of U.S. citizens, using the results of the<br />

National Health Interview Survey (NHIS) and a previously creat<strong>ed</strong> analys<strong>is</strong> plan. For<br />

more information, see “Using the <strong>Complex</strong> Samples Analys<strong>is</strong> Preparation Wizard<br />

to Ready NHIS Public Data” in Chapter 12 on p. 123.<br />

A subset of the 2000 survey <strong>is</strong> collect<strong>ed</strong> in nh<strong>is</strong>2000_subset.sav, found in the<br />

\tutorial\sample_files\ subdirectory of the directory in which you install<strong>ed</strong> <strong>SPSS</strong>.<br />

The analys<strong>is</strong> plan <strong>is</strong> stor<strong>ed</strong> in nh<strong>is</strong>2000_subset.csaplan. Use <strong>Complex</strong> Samples<br />

Descriptives to produce univariate descriptive stat<strong>is</strong>tics for <strong>ac</strong>tivity levels.<br />

Running the Analys<strong>is</strong><br />


To run a <strong>Complex</strong> Samples Descriptives analys<strong>is</strong>, from the menus choose:<br />

Analyze<br />

<strong>Complex</strong> Samples<br />

Descriptives...<br />

Figure 14-1<br />

<strong>Complex</strong> Samples Plan dialog box<br />

Browse to the \tutorial\sample_files\ subdirectory of the directory in which you<br />

install<strong>ed</strong> <strong>SPSS</strong>, and select nh<strong>is</strong>2000_subset.csaplan.<br />

Click Continue.


Figure 14-2<br />

Descriptives dialog box<br />

Select Freq vigorous <strong>ac</strong>tivity (times per wk) through Freq strength <strong>ac</strong>tivity (times<br />

per wk) as measure variables.<br />

Select Age category as a subpopulation variable.<br />

Click Stat<strong>is</strong>tics.


Figure 14-3<br />

Descriptives Stat<strong>is</strong>tics dialog box<br />

Select Confidence interval in the Stat<strong>is</strong>tics group.<br />

Click Continue.<br />

Click OK in the <strong>Complex</strong> Samples Descriptives dialog box.<br />

Univariate Stat<strong>is</strong>tics<br />

Figure 14-4<br />

Univariate stat<strong>is</strong>tics<br />

E<strong>ac</strong>h select<strong>ed</strong> stat<strong>is</strong>tic <strong>is</strong> comput<strong>ed</strong> for e<strong>ac</strong>h measure variable. The first column<br />

contains estimates of the average number of times per week that a person engages<br />

in a particular type of <strong>ac</strong>tivity. The confidence intervals for the means are



non-overlapping. Thus, you can conclude that, overall, Americans engage in a<br />

strength <strong>ac</strong>tivity less often than vigorous <strong>ac</strong>tivity, and they engage in vigorous <strong>ac</strong>tivity<br />

less often than moderate <strong>ac</strong>tivity.<br />
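The mean estimates in this table are weighted by the sampling design. A minimal sketch of a weighted mean follows, using hypothetical activity counts and sampling weights rather than the NHIS data. The function name is illustrative, and the design-based standard errors are not computed.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of weight * value divided by the sum of weights."""
    total_weight = sum(weights)
    weighted_total = sum(value * weight for value, weight in zip(values, weights))
    return weighted_total / total_weight


# Hypothetical times-per-week values and sampling weights for four respondents.
times_per_week = [0, 2, 3, 5]
sampling_weights = [100.0, 200.0, 100.0, 100.0]

mean_estimate = weighted_mean(times_per_week, sampling_weights)
```

Respondents with larger weights represent more of the population, so they pull the estimate further toward their own values than an unweighted average would.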

Univariate Stat<strong>is</strong>tics by Subpopulation<br />

Figure 14-5<br />

Univariate stat<strong>is</strong>tics by subpopulation<br />

E<strong>ac</strong>h select<strong>ed</strong> stat<strong>is</strong>tic <strong>is</strong> comput<strong>ed</strong> for e<strong>ac</strong>h measure variable by values of Age<br />

category. The first column contains estimates of the average number of times per<br />

week that people of e<strong>ac</strong>h category engage in a particular type of <strong>ac</strong>tivity. The<br />

confidence intervals for the means allow you to make some interesting conclusions.<br />

• In terms of vigorous and moderate <strong>ac</strong>tivities, 25–44-year-olds are less <strong>ac</strong>tive than<br />

those 18–24 and 45–64, and 45–64-year-olds are less <strong>ac</strong>tive than those 65 or older.



• In terms of strength <strong>ac</strong>tivity, 25–44-year-olds are less <strong>ac</strong>tive than those 45–64,<br />

and 18–24 and 45–64-year-olds are less <strong>ac</strong>tive than those 65 or older.<br />

Summary<br />

Using the <strong>Complex</strong> Samples Descriptives proc<strong>ed</strong>ure, you have obtain<strong>ed</strong> stat<strong>is</strong>tics<br />

for the <strong>ac</strong>tivity levels of U.S. citizens.<br />

• Overall, Americans spend varying amounts of time at different types of <strong>ac</strong>tivities.<br />

• When broken down by age, it roughly appears that post-collegiate Americans are<br />

initially less <strong>ac</strong>tive than they were while in school but become more conscientious<br />

about exerc<strong>is</strong>ing as they age.<br />

Relat<strong>ed</strong> Proc<strong>ed</strong>ures<br />

The <strong>Complex</strong> Samples Descriptives proc<strong>ed</strong>ure <strong>is</strong> a useful tool for obtaining univariate<br />

descriptive stat<strong>is</strong>tics of scale measures for observations obtain<strong>ed</strong> via a complex<br />

sampling design.<br />

• The <strong>Complex</strong> Samples Sampling Wizard <strong>is</strong> us<strong>ed</strong> to specify complex sampling<br />

design specifications and obtain a sample. The sampling plan file creat<strong>ed</strong> by the<br />

Sampling Wizard contains a default analys<strong>is</strong> plan and can be specifi<strong>ed</strong> in the Plan<br />

dialog box when you are analyzing the sample obtain<strong>ed</strong> <strong>ac</strong>cording to that plan.<br />

• The <strong>Complex</strong> Samples Analys<strong>is</strong> Preparation Wizard <strong>is</strong> us<strong>ed</strong> to set analys<strong>is</strong><br />

specifications for an ex<strong>is</strong>ting complex sample. The analys<strong>is</strong> plan file creat<strong>ed</strong> by<br />

the Analys<strong>is</strong> Preparation Wizard can be specifi<strong>ed</strong> in the Plan dialog box when you are<br />

analyzing the sample corresponding to that plan.<br />

• The <strong>Complex</strong> Samples Ratios proc<strong>ed</strong>ure provides descriptive stat<strong>is</strong>tics for ratios<br />

of scale measures.<br />

• The <strong>Complex</strong> Samples Frequencies proc<strong>ed</strong>ure provides univariate descriptive<br />

stat<strong>is</strong>tics of categorical variables.


Chapter 15<br />

<strong>Complex</strong> Samples Crosstabs<br />

The <strong>Complex</strong> Samples Crosstabs proc<strong>ed</strong>ure produces crosstabulation tables for pairs<br />

of select<strong>ed</strong> variables and d<strong>is</strong>plays two-way stat<strong>is</strong>tics. Optionally, you can request<br />

stat<strong>is</strong>tics by subgroups, defin<strong>ed</strong> by one or more categorical variables.<br />

Using <strong>Complex</strong> Samples Crosstabs to Measure the Relative<br />

R<strong>is</strong>k of an Event<br />

A company that sells magazine subscriptions traditionally sends monthly mailings<br />

to a purchas<strong>ed</strong> database of names. The response rate <strong>is</strong> typically low, so you ne<strong>ed</strong><br />

to find a way to better target prospective customers. One suggestion was to focus<br />

mailings on people with newspaper subscriptions, on the assumption that people who<br />

read newspapers are more likely to subscribe to magazines.<br />

Use the <strong>Complex</strong> Samples Crosstabs proc<strong>ed</strong>ure to test th<strong>is</strong> theory by constructing a<br />

two-by-two table of Newspaper subscription by Response and compute the relative<br />

r<strong>is</strong>k that a person with a newspaper subscription will respond to the mailing. Th<strong>is</strong><br />

information <strong>is</strong> collect<strong>ed</strong> in demo_cs.sav and should be analyz<strong>ed</strong> using the sampling<br />

plan file demo_2.csplan, found in the \tutorial\sample_files\ subdirectory of the<br />

directory in which you install<strong>ed</strong> <strong>SPSS</strong>.<br />

Running the Analys<strong>is</strong><br />

To run a <strong>Complex</strong> Samples Crosstabs analys<strong>is</strong>, from the menus choose:<br />

Analyze<br />

<strong>Complex</strong> Samples<br />

Crosstabs...<br />

Figure 15-1<br />

<strong>Complex</strong> Samples Plan dialog box<br />

Browse to the \tutorial\sample_files\ subdirectory of the directory in which you<br />

install<strong>ed</strong> <strong>SPSS</strong> and select demo_2.csplan.<br />

Click Continue.


Figure 15-2<br />

Crosstabs dialog box<br />

Select Newspaper subscription as a row variable.<br />

Select Response as the column variable.<br />

There <strong>is</strong> also some interest in seeing the results broken down by income categories, so<br />

select Income category in thousands as a subpopulation variable.<br />

Click Stat<strong>is</strong>tics.


Figure 15-3<br />

Crosstabs Stat<strong>is</strong>tics dialog box<br />

Deselect Population size and select Row percent in the Cells group.<br />

Select Odds ratio and Relative r<strong>is</strong>k in the Summaries for 2-by-2 Tables group.<br />

Click Continue.<br />

Click OK in the <strong>Complex</strong> Samples Crosstabs dialog box.<br />

These selections produce a crosstabulation table and r<strong>is</strong>k estimate for Newspaper<br />

subscription by Response. Separate tables with results split by Income category<br />

in thousands are also creat<strong>ed</strong>.



Crosstabulation<br />

Figure 15-4<br />

Crosstabulation for newspaper subscription by response<br />

The crosstabulation shows that, overall, not very many people respond<strong>ed</strong> to the<br />

mailing. However, a greater proportion of newspaper subscribers respond<strong>ed</strong>.<br />

R<strong>is</strong>k Estimate<br />

Figure 15-5<br />

R<strong>is</strong>k estimate for newspaper subscription by response<br />

The relative r<strong>is</strong>k <strong>is</strong> a ratio of event probabilities. The relative r<strong>is</strong>k of a response to<br />

the mailing <strong>is</strong> the ratio of the probability that a newspaper subscriber responds, to<br />

the probability that a nonsubscriber responds. Thus, the estimate of the relative<br />

r<strong>is</strong>k <strong>is</strong> simply 18.6%/10.6% = 1.743. Likew<strong>is</strong>e, the relative r<strong>is</strong>k of nonresponse <strong>is</strong><br />

the ratio of the probability that a subscriber does not respond, to the probability<br />

that a nonsubscriber does not respond. Your estimate of th<strong>is</strong> relative r<strong>is</strong>k <strong>is</strong> 0.911.<br />

Given these results, you can estimate that a newspaper subscriber <strong>is</strong> 1.743 times as<br />

likely to respond to the mailing as a nonsubscriber, or 0.911 times as likely as a<br />

nonsubscriber not to respond.<br />

The odds ratio <strong>is</strong> a ratio of event odds. The odds of an event <strong>is</strong> the ratio of the<br />

probability that the event occurs, to the probability that the event does not occur.<br />

Thus, the estimate of the odds that a newspaper subscriber responds to the mailing



<strong>is</strong> 18.6%/81.4% = 0.229. Likew<strong>is</strong>e, the estimate of the odds that a nonsubscriber<br />

responds <strong>is</strong> 10.6%/89.4% = 0.118. The estimate of the odds ratio <strong>is</strong> therefore<br />

0.229/0.118 = 1.912. Note that the odds ratio <strong>is</strong> the ratio of the relative r<strong>is</strong>k of<br />

responding, to the relative r<strong>is</strong>k of not responding, or 1.743/0.911 = 1.912.<br />
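The arithmetic above can be retraced in a few lines. This sketch starts from the rounded row percentages quoted in the text (18.6% and 10.6%), so its results differ slightly from the table values of 1.743 and 1.912, presumably because the table is computed from unrounded estimates. The function names are illustrative only.

```python
def relative_risk(p_group, p_reference):
    """Ratio of two event probabilities."""
    return p_group / p_reference


def odds(p):
    """Odds of an event: probability it occurs over probability it does not."""
    return p / (1.0 - p)


def odds_ratio(p_group, p_reference):
    """Ratio of the odds in one group to the odds in the reference group."""
    return odds(p_group) / odds(p_reference)


# Rounded response rates quoted in the text.
p_subscriber = 0.186
p_nonsubscriber = 0.106

rr_response = relative_risk(p_subscriber, p_nonsubscriber)
rr_nonresponse = relative_risk(1.0 - p_subscriber, 1.0 - p_nonsubscriber)
or_response = odds_ratio(p_subscriber, p_nonsubscriber)

# The odds ratio equals the ratio of the two relative risks.
ratio_of_relative_risks = rr_response / rr_nonresponse
```

The last line illustrates the identity noted in the text: dividing the relative risk of responding by the relative risk of not responding reproduces the odds ratio.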

Odds Ratio versus Relative R<strong>is</strong>k<br />

Since it <strong>is</strong> a ratio of ratios, the odds ratio <strong>is</strong> very difficult to interpret. The relative r<strong>is</strong>k<br />

<strong>is</strong> easier to interpret, so the odds ratio alone <strong>is</strong> not very helpful. However, there are<br />

certain commonly occurring situations in which the estimate of the relative r<strong>is</strong>k <strong>is</strong> not<br />

very good, and the odds ratio can be us<strong>ed</strong> to approximate the relative r<strong>is</strong>k of the event<br />

of interest. The odds ratio should be us<strong>ed</strong> as an approximation to the relative r<strong>is</strong>k of<br />

the event of interest when both of the following conditions are met:<br />

• The probability of the event of interest <strong>is</strong> small (< 0.1). Th<strong>is</strong> condition guarantees<br />

that the odds ratio will make a good approximation to the relative r<strong>is</strong>k. In th<strong>is</strong><br />

example, the event of interest <strong>is</strong> a response to the mailing.<br />

• The design of the study <strong>is</strong> case control. Th<strong>is</strong> condition signals that the usual<br />

estimate of the relative r<strong>is</strong>k will likely not be good. A case-control study <strong>is</strong><br />

retrospective, most often us<strong>ed</strong> when the event of interest <strong>is</strong> unlikely or when the<br />

design of a prospective experiment <strong>is</strong> impr<strong>ac</strong>tical or unethical.<br />

Neither condition <strong>is</strong> met in th<strong>is</strong> example, since the overall proportion of respondents<br />

was 13.5% and the design of the study was not case control, so it’s safer to report<br />

1.743 as the relative r<strong>is</strong>k, rather than the value of the odds ratio.
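To see why the size of the event probability matters, the sketch below compares the odds ratio with the relative risk for a rare event and for a common one. Both pairs of probabilities are hypothetical and chosen so that the relative risk is 2 in each case.

```python
def relative_risk_and_odds_ratio(p_group, p_reference):
    """Return (relative risk, odds ratio) for two event probabilities."""
    rr = p_group / p_reference
    odds_group = p_group / (1.0 - p_group)
    odds_reference = p_reference / (1.0 - p_reference)
    return rr, odds_group / odds_reference


# Hypothetical rare event: probabilities well below 0.1.
rare_rr, rare_or = relative_risk_and_odds_ratio(0.02, 0.01)

# Hypothetical common event: same relative risk, much larger probabilities.
common_rr, common_or = relative_risk_and_odds_ratio(0.40, 0.20)

rare_gap = abs(rare_or - rare_rr)        # small: odds ratio tracks relative risk
common_gap = abs(common_or - common_rr)  # large: odds ratio overstates relative risk
```

When the event is rare, the "does not occur" probabilities are both close to 1, so the odds are nearly equal to the probabilities and the two measures nearly coincide.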


R<strong>is</strong>k Estimate by Subpopulation<br />


Figure 15-6<br />

R<strong>is</strong>k estimate for newspaper subscription by response, controlling for income category<br />

Relative r<strong>is</strong>k estimates are comput<strong>ed</strong> separately for e<strong>ac</strong>h income category. Note<br />

that the relative r<strong>is</strong>k of a positive response for newspaper subscribers appears to<br />

gradually decrease with increasing income, which indicates that you may be able<br />

to further target the mailings.<br />

Summary<br />

Using <strong>Complex</strong> Samples Crosstabs r<strong>is</strong>k estimates, you found that you can increase<br />

your response rate to direct mailings by targeting newspaper subscribers. Further,<br />

you found some evidence that the r<strong>is</strong>k estimates may not be constant <strong>ac</strong>ross Income<br />

category, so you may be able to increase your response rate even more by targeting<br />

lower-income newspaper subscribers.



Relat<strong>ed</strong> Proc<strong>ed</strong>ures<br />

The <strong>Complex</strong> Samples Crosstabs proc<strong>ed</strong>ure <strong>is</strong> a useful tool for obtaining descriptive<br />

stat<strong>is</strong>tics of the crosstabulation of categorical variables for observations obtain<strong>ed</strong><br />

via a complex sampling design.<br />

• The <strong>Complex</strong> Samples Sampling Wizard <strong>is</strong> us<strong>ed</strong> to specify complex sampling<br />

design specifications and obtain a sample. The sampling plan file creat<strong>ed</strong> by the<br />

Sampling Wizard contains a default analys<strong>is</strong> plan and can be specifi<strong>ed</strong> in the Plan<br />

dialog box when analyzing the sample obtain<strong>ed</strong> <strong>ac</strong>cording to that plan.<br />

• The <strong>Complex</strong> Samples Analys<strong>is</strong> Preparation Wizard <strong>is</strong> us<strong>ed</strong> to set analys<strong>is</strong><br />

specifications for an ex<strong>is</strong>ting complex sample. The analys<strong>is</strong> plan file creat<strong>ed</strong> by<br />

the Analys<strong>is</strong> Preparation Wizard can be specifi<strong>ed</strong> in the Plan dialog box when analyzing the<br />

sample corresponding to that plan.<br />

• The <strong>Complex</strong> Samples Frequencies proc<strong>ed</strong>ure provides univariate descriptive<br />

stat<strong>is</strong>tics for categorical variables.


Chapter 16<br />

<strong>Complex</strong> Samples Ratios<br />

The <strong>Complex</strong> Samples Ratios proc<strong>ed</strong>ure d<strong>is</strong>plays univariate summary stat<strong>is</strong>tics for<br />

ratios of variables. Optionally, you can request stat<strong>is</strong>tics by subgroups, defin<strong>ed</strong><br />

by one or more categorical variables.<br />

Using <strong>Complex</strong> Samples Ratios to Aid Property Value<br />

Assessment<br />

A state agency <strong>is</strong> charg<strong>ed</strong> with ensuring that property taxes are fairly assess<strong>ed</strong> from<br />

county to county. Taxes are bas<strong>ed</strong> on the appra<strong>is</strong><strong>ed</strong> value of the property, so the agency<br />

wants to tr<strong>ac</strong>k property values <strong>ac</strong>ross counties to be sure that e<strong>ac</strong>h county’s records<br />

are equally up-to-date. Since resources for obtaining current appra<strong>is</strong>als are limit<strong>ed</strong>,<br />

the agency chose to employ complex sampling methodology to select properties.<br />

The sample of properties select<strong>ed</strong> and their current appra<strong>is</strong>al information <strong>is</strong><br />

collect<strong>ed</strong> in property_assess_cs_sample.sav, found in the \tutorial\sample_files\<br />

subdirectory of the directory in which you install<strong>ed</strong> <strong>SPSS</strong>. Use <strong>Complex</strong> Samples<br />

Ratios to assess the change in property values since the last appra<strong>is</strong>al <strong>ac</strong>ross the<br />

five counties.<br />
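A ratio estimate of this kind is commonly formed as the weighted total of the numerator divided by the weighted total of the denominator. The sketch below assumes that form and uses hypothetical property values and sampling weights. It does not read property_assess_cs_sample.sav, and it omits the design-based standard errors.

```python
def ratio_estimate(numerators, denominators, weights):
    """Weighted ratio: sum(w * numerator) / sum(w * denominator)."""
    weighted_numerator = sum(w * y for w, y in zip(weights, numerators))
    weighted_denominator = sum(w * x for w, x in zip(weights, denominators))
    return weighted_numerator / weighted_denominator


# Hypothetical properties (values in thousands) with sampling weights.
current_value = [130.0, 260.0, 120.0]
value_at_last_appraisal = [100.0, 200.0, 100.0]
sampling_weights = [10.0, 20.0, 5.0]

county_ratio = ratio_estimate(current_value, value_at_last_appraisal, sampling_weights)
```

A ratio above 1 means that properties are currently worth more than their last appraised value, so a larger ratio suggests that more time or appreciation has passed since the records were updated.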

Running the Analys<strong>is</strong><br />


To run a <strong>Complex</strong> Samples Ratios analys<strong>is</strong>, from the menus choose:<br />

Analyze<br />

<strong>Complex</strong> Samples<br />

Ratios...<br />

Figure 16-1<br />

<strong>Complex</strong> Samples Plan dialog box<br />

Browse to the \tutorial\sample_files\ subdirectory of the directory in which you<br />

install<strong>ed</strong> <strong>SPSS</strong> and select property_assess.csplan.<br />

Click Continue.


Figure 16-2<br />

<strong>Complex</strong> Samples Ratios dialog box<br />

Select Current value as a numerator variable.<br />

Select Value at last appra<strong>is</strong>al as the denominator variable.<br />

Select County as a subpopulation variable.<br />

Click Stat<strong>is</strong>tics.


Figure 16-3<br />

Ratios Stat<strong>is</strong>tics dialog box<br />

Select Confidence interval, Unweight<strong>ed</strong> count, and Population size in the Stat<strong>is</strong>tics group.<br />

Select t-test and enter 1.3 as the test value.<br />

Click Continue.<br />

Click OK in the <strong>Complex</strong> Samples Ratios dialog box.<br />

Ratios<br />

Figure 16-4<br />

Ratios table<br />

The default d<strong>is</strong>play of the table <strong>is</strong> very wide, so you will ne<strong>ed</strong> to pivot it for a<br />

better view.


Pivoting the Ratios Table<br />

Double-click the table to <strong>ac</strong>tivate it.<br />

From the Viewer menus choose:<br />

Pivot<br />

Pivoting Trays<br />


Drag Numerator and then Denominator from the row to the layer.<br />

Drag County from the row to the column.<br />

Drag Stat<strong>is</strong>tics from the column to the row.<br />

Close the pivoting trays window.<br />

Pivot<strong>ed</strong> Ratios Table<br />

Figure 16-5<br />

Pivot<strong>ed</strong> ratios table<br />

The ratios table <strong>is</strong> now pivot<strong>ed</strong> so that stat<strong>is</strong>tics are easier to compare <strong>ac</strong>ross counties.



• The ratio estimates range from a low of 1.195 in the Southern county to a high of<br />

1.524 in the Western county.<br />

• There <strong>is</strong> also quite a bit of variability in the standard errors, which range from a<br />

low of 0.029 in the Southern county to 0.068 in the Eastern county.<br />

• Some of the confidence intervals do not overlap; thus, you can conclude that<br />

the ratios for the Western county are higher than the ratios for the Northern<br />

and Southern counties.<br />

• Finally, as a more objective measure, note that the significance values for the t<br />

tests for the Western and Southern counties are less than 0.05. Thus, you can<br />

conclude that the ratio for the Western county <strong>is</strong> greater than 1.3 and the ratio for<br />

the Southern county <strong>is</strong> less than 1.3.<br />
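The t test compares each ratio estimate with the test value of 1.3. As a sketch, assuming the usual form of the statistic (estimate minus test value, divided by the standard error), the Southern county figures quoted above give the value below. The degrees of freedom and the significance value depend on the sampling design and are not reproduced here.

```python
def t_statistic(estimate, test_value, standard_error):
    """Distance of the estimate from the test value, in standard-error units."""
    return (estimate - test_value) / standard_error


# Southern county: ratio estimate and standard error quoted in the text.
southern_t = t_statistic(estimate=1.195, test_value=1.3, standard_error=0.029)
```

A negative statistic this far from 0 is consistent with the conclusion that the Southern county ratio is below 1.3.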

Summary<br />

Using the <strong>Complex</strong> Samples Ratios proc<strong>ed</strong>ure, you have obtain<strong>ed</strong> various stat<strong>is</strong>tics<br />

for the ratios of Current value to Value at last appra<strong>is</strong>al. The results suggest that<br />

there may be certain inequities in the assessment of property taxes from county to<br />

county, namely:<br />

• The ratios for the Western county are high, indicating that their records are not as<br />

up-to-date as other counties with respect to the appreciation of property values.<br />

Property taxes are probably too low in th<strong>is</strong> county.<br />

• The ratios for the Southern county are low, indicating that their records are more<br />

up-to-date than the other counties with respect to the appreciation of property<br />

values. Property taxes are probably too high in th<strong>is</strong> county.<br />

• The ratios for the Southern county are lower than those of the Western county but<br />

are still within the objective goal of 1.3.<br />

Resources us<strong>ed</strong> to tr<strong>ac</strong>k property values in the Southern county will be reassign<strong>ed</strong><br />

to the Western county to bring these counties’ ratios in line with the others and<br />

with the goal of 1.3.



Relat<strong>ed</strong> Proc<strong>ed</strong>ures<br />

The <strong>Complex</strong> Samples Ratios proc<strong>ed</strong>ure <strong>is</strong> a useful tool for obtaining univariate<br />

descriptive stat<strong>is</strong>tics of the ratio of scale measures for observations obtain<strong>ed</strong> via a<br />

complex sampling design.<br />

• The <strong>Complex</strong> Samples Sampling Wizard <strong>is</strong> us<strong>ed</strong> to specify complex sampling<br />

design specifications and obtain a sample. The sampling plan file creat<strong>ed</strong> by the<br />

Sampling Wizard contains a default analys<strong>is</strong> plan and can be specifi<strong>ed</strong> in the Plan<br />

dialog box when you are analyzing the sample obtain<strong>ed</strong> <strong>ac</strong>cording to that plan.<br />

• The <strong>Complex</strong> Samples Analys<strong>is</strong> Preparation Wizard <strong>is</strong> us<strong>ed</strong> to set analys<strong>is</strong><br />

specifications for an ex<strong>is</strong>ting complex sample. The analys<strong>is</strong> plan file creat<strong>ed</strong> by<br />

the Analys<strong>is</strong> Preparation Wizard can be specifi<strong>ed</strong> in the Plan dialog box when you are<br />

analyzing the sample corresponding to that plan.<br />

• The <strong>Complex</strong> Samples Descriptives proc<strong>ed</strong>ure provides descriptive stat<strong>is</strong>tics<br />

for individual scale measures.


Chapter 17<br />

<strong>Complex</strong> Samples General Linear Model<br />

The <strong>Complex</strong> Samples General Linear Model (CSGLM) proc<strong>ed</strong>ure performs linear<br />

regression analys<strong>is</strong>, as well as analys<strong>is</strong> of variance and covariance, for samples<br />

drawn by complex sampling methods. Optionally, you can request analyses for<br />

a subpopulation.<br />

Using <strong>Complex</strong> Samples General Linear Model to Fit a<br />

Two-F<strong>ac</strong>tor ANOVA<br />

A grocery store chain survey<strong>ed</strong> a set of customers concerning their purchasing<br />

habits, <strong>ac</strong>cording to a complex design. Given the survey results and how much e<strong>ac</strong>h<br />

customer spent in the previous month, the store wants to see if the frequency with<br />

which customers shop <strong>is</strong> relat<strong>ed</strong> to the amount they spend in a month, controlling for<br />

the gender of the customer and incorporating the sampling design.<br />

Th<strong>is</strong> information <strong>is</strong> collect<strong>ed</strong> in grocery_1month_sample.sav, found in the<br />

\tutorial\sample_files\ subdirectory of the directory in which you install<strong>ed</strong> <strong>SPSS</strong>. Use<br />

the <strong>Complex</strong> Samples General Linear Model proc<strong>ed</strong>ure to perform a two-f<strong>ac</strong>tor (or<br />

two-way) ANOVA on the amounts spent.<br />

Running the Analys<strong>is</strong><br />


To run a <strong>Complex</strong> Samples General Linear Model analys<strong>is</strong>, from the menus choose:<br />

Analyze<br />

<strong>Complex</strong> Samples<br />

General Linear Model...<br />

Figure 17-1<br />

<strong>Complex</strong> Samples Plan dialog box<br />

Browse to the \tutorial\sample_files\ subdirectory of the directory in which you<br />

install<strong>ed</strong> <strong>SPSS</strong> and select grocery.csplan.<br />

Click Continue.


Figure 17-2<br />

General Linear Model dialog box<br />

Select Amount spent as the dependent variable.<br />

Select Who shopping for and Use coupons as f<strong>ac</strong>tors.<br />

Click Model.


Figure 17-3<br />

Model dialog box<br />

Choose to build a Custom model.<br />

Select Main effects as the type of term to build and select shopfor and usecoup<br />

as model terms.<br />

Select Inter<strong>ac</strong>tion as the type of term to build and add the shopfor*usecoup inter<strong>ac</strong>tion<br />

as a model term.<br />

Click Continue.<br />

Click Stat<strong>is</strong>tics in the General Linear Model dialog box.


Figure 17-4<br />

General Linear Model Stat<strong>is</strong>tics dialog box<br />

Select Estimate, Standard error, Confidence interval, and Design effect in the Model<br />

Parameters group.<br />

Click Continue.<br />

Click Estimat<strong>ed</strong> Means in the General Linear Model dialog box.


Figure 17-5<br />

General Linear Model Estimat<strong>ed</strong> Means dialog box<br />

Choose to d<strong>is</strong>play means for shopfor, usecoup, and the shopfor*usecoup inter<strong>ac</strong>tion.<br />

Select a Simple contrast and 3 Self and family as the reference category for shopfor.<br />

Note that, once select<strong>ed</strong>, the category appears as “3” in the dialog box.<br />

Select a Simple contrast and 1 No as the reference category for usecoup.<br />

Click Continue.<br />

Click OK in the General Linear Model dialog box.<br />

Model Summary<br />

Figure 17-6<br />

Tests of between-subjects effects



R Square, the coefficient of determination, <strong>is</strong> a measure of the strength of the model<br />

fit. It shows that about 60% of the variation in Amount spent <strong>is</strong> explain<strong>ed</strong> by the<br />

model, which gives you good explanatory ability. You may still want to add other<br />

pr<strong>ed</strong>ictors to the model to further improve the fit.<br />
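As a rough illustration of what "variation explained" means, the sketch below computes a weighted coefficient of determination as 1 minus the ratio of the weighted residual sum of squares to the weighted total sum of squares. The data, the weights, and this exact weighting scheme are assumptions made for illustration; the procedure's own computation may differ in detail.

```python
def weighted_r_square(observed, predicted, weights):
    """1 - (weighted residual sum of squares / weighted total sum of squares)."""
    total_weight = sum(weights)
    weighted_mean = sum(w * y for w, y in zip(weights, observed)) / total_weight
    residual_ss = sum(
        w * (y - y_hat) ** 2 for w, y, y_hat in zip(weights, observed, predicted)
    )
    total_ss = sum(w * (y - weighted_mean) ** 2 for w, y in zip(weights, observed))
    return 1.0 - residual_ss / total_ss


# Hypothetical amounts spent, model predictions, and sampling weights.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
weights = [2.0, 1.0, 1.0, 2.0]

r_square = weighted_r_square(observed, predicted, weights)
```

A value near 1 means the predictions leave little weighted variation unexplained; a value near 0 means the model does little better than the weighted mean.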

Tests of Model Effects<br />

Figure 17-7<br />

Tests of between-subjects effects<br />

E<strong>ac</strong>h term in the model, plus the model as a whole, <strong>is</strong> test<strong>ed</strong> for whether the value<br />

of its effect equals 0. Terms with significance values of less than 0.05 have some<br />

d<strong>is</strong>cernible effect. Thus, all model terms contribute to the model.



Parameter Estimates<br />

Figure 17-8<br />

Parameter estimates<br />

The parameter estimates show the effect of e<strong>ac</strong>h pr<strong>ed</strong>ictor on Amount spent. The<br />

value of 518.249 for the intercept term indicates that the grocery chain can expect a<br />

shopper with a family who uses coupons from the newspaper and target<strong>ed</strong> mailings to<br />

spend $518.25, on average. You can tell that the intercept <strong>is</strong> associat<strong>ed</strong> with these<br />

f<strong>ac</strong>tor levels because those are the f<strong>ac</strong>tor levels whose parameters are r<strong>ed</strong>undant.<br />

• The shopfor coefficients suggest that, among customers who use both mail<strong>ed</strong><br />

coupons and newspaper coupons, those without family tend to spend less than<br />

those with spouses, who in turn spend less than those with dependents at home.<br />

Since the tests of model effects show<strong>ed</strong> that th<strong>is</strong> term contributes to the model,<br />

these differences are not due to chance.


175<br />

<strong>Complex</strong> Samples General Linear Model<br />

• The usecoup coefficients suggest that spending among customers with dependents<br />

at home decreases with decreas<strong>ed</strong> coupon usage. There <strong>is</strong> a moderate amount of<br />

uncertainty in the estimates, but the confidence intervals do not include 0.<br />

• The inter<strong>ac</strong>tion coefficients suggest that customers who do not use coupons or<br />

only clip from the newspaper and do not have dependents tend to spend more<br />

than you would otherw<strong>is</strong>e expect. If any portion of an inter<strong>ac</strong>tion parameter <strong>is</strong><br />

r<strong>ed</strong>undant, the inter<strong>ac</strong>tion parameter <strong>is</strong> r<strong>ed</strong>undant.<br />

• The deviation in the values of the design effects from 1 indicate that some of the<br />

standard errors comput<strong>ed</strong> for these parameter estimates are larger than those<br />

you would obtain if you assum<strong>ed</strong> that these observations came from a simple<br />

random sample, while others are smaller. It <strong>is</strong> vitally important to incorporate the<br />

sampling design information in your analys<strong>is</strong> because you might otherw<strong>is</strong>e infer,<br />

for example, that the usecoup=3 coefficient <strong>is</strong> not different from 0!<br />
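To see how the intercept and the non-redundant parameters combine, the sketch below builds a predicted cell mean from reference-coded coefficients. Only the intercept (518.249) comes from the table; every other coefficient, the level numbering, and the function name are hypothetical placeholders chosen to match the signs described above.

```python
# Intercept from the parameter estimates table. All other coefficients are
# HYPOTHETICAL placeholders, not values from the output.
INTERCEPT = 518.249

# Assumed coding: the last level of each factor is the redundant
# (reference) level, so its parameter is 0.
SHOPFOR = {1: -100.0, 2: -60.0, 3: 0.0}
USECOUP = {1: -150.0, 2: -90.0, 3: -40.0, 4: 0.0}

# Interaction terms; any cell that involves a reference level is redundant
# and is therefore treated as 0.
INTERACTION = {(1, 1): 30.0, (1, 2): 20.0, (2, 1): 10.0}


def predicted_mean(shopfor, usecoup):
    """Predicted Amount spent for one combination of factor levels."""
    return (INTERCEPT
            + SHOPFOR[shopfor]
            + USECOUP[usecoup]
            + INTERACTION.get((shopfor, usecoup), 0.0))


# At the reference levels of both factors, only the intercept remains.
print(predicted_mean(3, 4))
# For any other cell, the main effects and interaction shift the prediction.
print(predicted_mean(1, 1))
```

This is why the intercept can be read as the expected spending for the combination of factor levels whose parameters are redundant.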

The parameter estimates are useful for quantifying the effect of each model term, but the estimated marginal means tables can make it easier to interpret the model results.<br />

Estimated Marginal Means<br />

Figure 17-9<br />

Estimated marginal means by levels of gender<br />

This table displays the model-estimated marginal means and standard errors of Amount spent at the factor levels of Who shopping for. This table is useful for exploring the differences between the levels of this factor. In this example, a customer who shops for themselves is expected to spend about $308.53, while a customer with a spouse is expected to spend $370.33, and a customer with dependents will spend $459.44. To see whether this represents a real difference or is due to chance variation, look at the test results.<br />



Figure 17-10<br />

Individual test results for estimated marginal means of gender<br />

The individual tests table displays two simple contrasts in spending.<br />

• The contrast estimate is the difference in spending for the listed levels of Who shopping for.<br />

• The hypothesized value of 0.00 represents the belief that there is no difference in spending.<br />

• The Wald F statistic, with the displayed degrees of freedom, is used to test whether the difference between a contrast estimate and hypothesized value is due to chance variation.<br />

• Since the significance values are less than 0.05, you can conclude that there are differences in spending.<br />

The values of the contrast estimates are different from the parameter estimates. This is because there is an interaction term containing the Who shopping for effect. As a result, the parameter estimate for shopfor=1 is a simple contrast between the levels Self and Self and Family at the level From both of the variable Use coupons. The contrast estimate in this table is averaged over the levels of Use coupons.<br />
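The difference between a contrast that is averaged over Use coupons and one taken at a single level of Use coupons can be sketched with a small table of cell means. The cell means below are hypothetical, only two levels of each factor are shown, and the marginal mean is assumed to be an unweighted average of the cell means.

```python
# HYPOTHETICAL cell means of Amount spent, keyed by (shopfor, usecoup).
cell_means = {
    ("Self", "No"): 250.0,
    ("Self", "From both"): 400.0,
    ("Self and Family", "No"): 380.0,
    ("Self and Family", "From both"): 520.0,
}
coupon_levels = ["No", "From both"]


def marginal_mean(shopfor):
    """Unweighted average of the cell means over the levels of Use coupons."""
    values = [cell_means[(shopfor, level)] for level in coupon_levels]
    return sum(values) / len(values)


# Contrast averaged over Use coupons (the kind of value the individual
# tests table reports).
averaged_contrast = marginal_mean("Self") - marginal_mean("Self and Family")

# Contrast at the reference level From both only (the kind of value the
# shopfor=1 parameter estimate represents).
reference_contrast = (cell_means[("Self", "From both")]
                      - cell_means[("Self and Family", "From both")])

print(averaged_contrast, reference_contrast)
```

The two values coincide only when there is no interaction, that is, when the gap between the shopfor levels is the same at every level of Use coupons.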

Figure 17-11<br />

Overall test results for estimated marginal means of gender<br />

The overall test table reports the results of a test of all of the contrasts in the individual test table. Its significance value of less than 0.05 confirms that there is a difference in spending among the levels of Who shopping for.<br />



Figure 17-12<br />

Estimated marginal means by levels of shopping style<br />

This table displays the model-estimated marginal means and standard errors of Amount spent at the factor levels of Use coupons. This table is useful for exploring the differences between the levels of this factor. In this example, a customer who does not use coupons is expected to spend about $319.65, and those who do use coupons are expected to spend considerably more.<br />

Figure 17-13<br />

Individual test results for estimated marginal means of shopping style<br />

The individual tests table displays three simple contrasts, comparing the spending of customers who do not use coupons to those who do.<br />

Since the significance values of the tests are less than 0.05, you can conclude that customers who use coupons tend to spend more than those who don’t.<br />

Figure 17-14<br />

Overall test results for estimated marginal means of shopping style<br />

The overall test table reports the results of a test of all the contrasts in the individual test table. Its significance value of less than 0.05 confirms that there is a difference in spending among the levels of Use coupons. Note that the overall tests for Use coupons and Who shopping for are equivalent to the tests of model effects because the hypothesized contrast values are equal to 0.<br />

Figure 17-15<br />

Estimated marginal means by levels of gender by shopping style<br />

This table displays the model-estimated marginal means, standard errors, and confidence intervals of Amount spent at the factor combinations of Who shopping for and Use coupons. This table is useful for exploring the interaction effect between these two factors that was found in the tests of model effects.<br />

Summary<br />

In this example, the estimated marginal means revealed differences in spending between customers at varying levels of Who shopping for and Use coupons. The tests of model effects confirmed this, as well as the fact that there appears to be a Who shopping for*Use coupons interaction effect. The model summary table revealed that the present model explains somewhat more than half of the variation in the data, and could likely be improved by adding more predictors.<br />



Related Procedures<br />

The Complex Samples General Linear Model procedure is a useful tool for modeling a scale variable when the cases have been drawn according to a complex sampling scheme.<br />

• The Complex Samples Sampling Wizard is used to specify complex sampling design specifications and obtain a sample. The sampling plan file created by the Sampling Wizard contains a default analysis plan and can be specified in the Plan dialog box when you are analyzing the sample obtained according to that plan.<br />

• The Complex Samples Analysis Preparation Wizard is used to specify analysis specifications for an existing complex sample. The analysis plan file created by the Analysis Preparation Wizard can be specified in the Plan dialog box when you are analyzing the sample corresponding to that plan.<br />

• The Complex Samples Logistic Regression procedure allows you to model a categorical response.<br />


Chapter 18<br />

Complex Samples Logistic Regression<br />

The Complex Samples Logistic Regression procedure performs logistic regression analysis on a binary or multinomial dependent variable for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.<br />

Using Complex Samples Logistic Regression to Assess Credit Risk<br />

If you are a loan officer at a bank, then you want to be able to identify characteristics that are indicative of people who are likely to default on loans, and use those characteristics to identify good and bad credit risks.<br />

Suppose that a loan officer has collected past records of customers given loans at several different branches, according to a complex design. This information is contained in bankloan_cs.sav, found in the \tutorial\sample_files\ subdirectory of the directory in which you installed SPSS. The officer wants to see if the probability with which a customer defaults is related to age, employment history, and amount of credit debt, while incorporating the sampling design.<br />

Running the Analysis<br />

To create the logistic regression model, from the menus choose:<br />

Analyze<br />

Complex Samples<br />

Logistic Regression...<br />



Figure 18-1<br />

Complex Samples Plan dialog box<br />

Browse to the \tutorial\sample_files\ subdirectory of the directory in which you installed SPSS and select bankloan.csaplan.<br />

Click Continue.<br />



Figure 18-2<br />

Logistic Regression dialog box<br />

Select Previously defaulted as the dependent variable.<br />

Select Level of education as a factor.<br />

Select Age in years through Other debt in thousands as covariates.<br />

Select Previously defaulted and click Reference Category.<br />



Figure 18-3<br />

Logistic Regression Reference Category dialog box<br />

Select Lowest value as the reference category.<br />

This sets the “did not default” category as the reference category; thus, the odds ratios reported in the output will have the property that increasing odds ratios correspond to increasing probability of default.<br />

Click Continue.<br />

Click Statistics in the Logistic Regression dialog box.<br />



Figure 18-4<br />

Logistic Regression Statistics dialog box<br />

Select Classification table in the Model Fit group.<br />

Select Estimate, Exponentiated estimate, Standard error, Confidence interval, and Design effect in the Parameters group.<br />

Click Continue.<br />

Click Odds Ratios in the Logistic Regression dialog box.<br />



Figure 18-5<br />

Logistic Regression Odds Ratios dialog box<br />

Choose to create odds ratios for the factor ed and the covariates employ and debtinc.<br />

Click Continue.<br />

Click OK in the Logistic Regression dialog box.<br />



Pseudo R-Squares<br />

Figure 18-6<br />

Pseudo R-square statistics<br />

There is no single statistic for logistic regression models that duplicates the properties of the R-square statistic for linear regression models, so these approximations are computed instead. Larger pseudo R-square statistics indicate that more of the variation is explained by the model, to a maximum of 1.<br />

Classification<br />

Figure 18-7<br />

Classification table<br />

The classification table shows the practical results of using the logistic regression model. For each case, the predicted response is Yes if that case’s model-predicted logit is greater than 0. Cases are weighted by finalweight, so that the classification table reports the expected model performance in the population.<br />

• Cells on the diagonal are correct predictions.<br />

• Cells off the diagonal are incorrect predictions.<br />
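The classification rule described above can be sketched as follows. A logit greater than 0 is the same as a predicted probability greater than 0.5, and each case contributes its weight, rather than a count of 1, to its cell. The cases, weights, and function names are hypothetical and serve only to illustrate the rule.

```python
import math


def predicted_probability(logit):
    """Convert a logit to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-logit))


def classify(logit):
    """Predict Yes (default) when the logit exceeds 0, i.e. probability > 0.5."""
    return "Yes" if logit > 0 else "No"


def weighted_classification_table(cases):
    """cases: iterable of (observed, logit, weight). Returns weighted cell totals."""
    table = {}
    for observed, logit, weight in cases:
        key = (observed, classify(logit))
        table[key] = table.get(key, 0.0) + weight
    return table


# HYPOTHETICAL cases: (observed response, model-predicted logit, final weight).
cases = [
    ("No", -1.2, 10.0),
    ("No", 0.4, 5.0),
    ("Yes", 0.9, 8.0),
    ("Yes", -0.3, 4.0),
]

table = weighted_classification_table(cases)
correct = table[("No", "No")] + table[("Yes", "Yes")]   # diagonal cells
percent_correct = 100.0 * correct / sum(table.values())
print(f"{percent_correct:.1f}% classified correctly")
```

Because the cells hold sums of weights, the percentages estimate performance in the population rather than in the sample.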


Based upon the cases used to create the model, you can expect to correctly classify 85.5% of the non-defaulters in the population using this model. Likewise, you can expect to correctly classify 60.9% of the defaulters. Overall, you can expect to classify 76.5% of the cases correctly; however, because this table was constructed with the cases used to create the model, these estimates are likely to be overly optimistic.<br />

Tests of Model Effects<br />

Figure 18-8<br />

Tests of between-subjects effects<br />

Each term in the model, plus the model as a whole, is tested for whether its effect equals 0. Terms with significance values less than 0.05 have some discernible effect. Thus, age, employ, debtinc, and creddebt contribute to the model, while the other main effects do not. In a further analysis of the data, you would probably remove ed, address, income, and othdebt from model consideration.<br />



Parameter Estimates<br />

Figure 18-9<br />

Parameter estimates<br />

The parameter estimates table summarizes the effect of each predictor. Note that parameter values affect the likelihood of the “did default” category relative to the “did not default” category. Thus, parameters with positive coefficients increase the likelihood of default, while parameters with negative coefficients decrease the likelihood of default.<br />

The meaning of a logistic regression coefficient is not as straightforward as that of a linear regression coefficient. While B is convenient for testing the model effects, Exp(B) is easier to interpret. Exp(B) represents the ratio change in the odds of the event of interest attributable to a one-unit increase in the predictor for predictors that are not part of interaction terms. For example, Exp(B) for employ is equal to 0.798, which means that the odds of default for people who have been with their current employer for two years are 0.798 times the odds of default for those who have been with their current employer for one year, all other things being equal.<br />
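The relationship between B and Exp(B) can be made concrete with a short sketch. The value 0.798 is taken from the text; the baseline odds and the function names are hypothetical and chosen only for illustration.

```python
import math

EXP_B_EMPLOY = 0.798                # Exp(B) for employ, as quoted in the text
B_EMPLOY = math.log(EXP_B_EMPLOY)   # corresponding coefficient on the logit scale


def odds_multiplier(extra_years):
    """Factor by which the odds of default change after extra_years more years."""
    return math.exp(B_EMPLOY * extra_years)


def probability_from_odds(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)


# HYPOTHETICAL baseline: odds of default of 0.5 after one year with an employer.
baseline_odds = 0.5
odds_after_two_years = baseline_odds * odds_multiplier(1)

print(f"B = {B_EMPLOY:.4f}")
print(f"probability at one year:  {probability_from_odds(baseline_odds):.3f}")
print(f"probability at two years: {probability_from_odds(odds_after_two_years):.3f}")
```

Note that the multiplier applies to the odds, not to the probability, and that it compounds: five additional years multiply the odds by 0.798 raised to the fifth power.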

The design effects indicate that some of the standard errors computed for these parameter estimates are larger than those you would obtain if you assumed that these observations came from a simple random sample, while others are smaller. It is vitally important to incorporate the sampling design information in your analysis because you might otherwise infer, for example, that the age coefficient is no different from 0!<br />
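The effect of ignoring the design can be sketched numerically. The design effect is taken here to be the ratio of the variance under the complex design to the variance under simple random sampling, so the standard errors differ by its square root. The estimate, standard error, and design effect below are hypothetical, and the 1.96 cutoff is a rough large-sample rule of thumb rather than the test the procedure performs.

```python
import math


def design_effect(se_complex, se_srs):
    """Ratio of the design-based variance to the simple random sample variance."""
    return (se_complex / se_srs) ** 2


def srs_standard_error(se_complex, deff):
    """Standard error you would get by assuming a simple random sample."""
    return se_complex / math.sqrt(deff)


# HYPOTHETICAL coefficient, design-based standard error, and design effect.
estimate = -0.05
se_complex = 0.02
deff = 0.25

se_srs = srs_standard_error(se_complex, deff)
z_complex = estimate / se_complex
z_srs = estimate / se_srs

print(f"design-based: SE = {se_complex:.3f}, z = {z_complex:.2f}")
print(f"assuming SRS: SE = {se_srs:.3f}, z = {z_srs:.2f}")
```

With a design effect below 1, the simple random sample assumption overstates the standard error, and a coefficient that is distinguishable from 0 under the design can appear not to be.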

Odds Ratios<br />

Figure 18-10<br />

Odds ratios for level of education<br />

This table displays the odds ratios of Previously defaulted at the factor levels of Level of education. The reported values are the ratios of the odds of default for Did not complete high school through College degree, compared to the odds of default for Post-undergraduate degree. Thus, the odds ratio of 2.054 in the first row of the table means that the odds of default for a person who did not complete high school are 2.054 times the odds of default for a person who has a post-undergraduate degree.<br />



Figure 18-11<br />

Odds ratios for years with current employer<br />

This table displays the odds ratio of Previously defaulted for a unit change in the covariate Years with current employer. The reported value is the ratio of the odds of default for a person with 7.99 years at their current job compared to the odds of default for a person with 6.99 years (the mean).<br />

Figure 18-12<br />

Odds ratios for debt to income ratio<br />

This table displays the odds ratio of Previously defaulted for a unit change in the covariate Debt to income ratio. The reported value is the ratio of the odds of default for a person with a debt/income ratio of 10.9341 compared to the odds of default for a person with 9.9341 (the mean).<br />

Note that because none of these predictors are part of interaction terms, the values of the odds ratios reported in these tables are equal to the values of the exponentiated parameter estimates. When a predictor is part of an interaction term, its odds ratio as reported in these tables will also depend on the values of the other predictors that make up the interaction.<br />
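The dependence on the other predictors can be illustrated with a small sketch. If the logit contains a main effect for x plus an x-by-z interaction, the odds ratio for a one-unit increase in x is the exponential of the main-effect coefficient plus the interaction coefficient times z. The interaction coefficient and the values of z below are hypothetical; the model in this chapter contains no interaction terms.

```python
import math


def unit_odds_ratio(b_main, b_interaction=0.0, other_value=0.0):
    """Odds ratio for a one-unit increase in x when the logit contains
    b_main * x + b_interaction * x * z, with z held at other_value."""
    return math.exp(b_main + b_interaction * other_value)


# Coefficient implied by Exp(B) = 0.798 for employ, as quoted in the text.
b_employ = math.log(0.798)

# Without an interaction, the odds ratio is the exponentiated estimate.
no_interaction = unit_odds_ratio(b_employ)

# HYPOTHETICAL interaction coefficient, to show the dependence on z.
at_low_z = unit_odds_ratio(b_employ, b_interaction=0.01, other_value=5.0)
at_high_z = unit_odds_ratio(b_employ, b_interaction=0.01, other_value=20.0)

print(f"{no_interaction:.3f} {at_low_z:.3f} {at_high_z:.3f}")
```

In the interaction case there is no single odds ratio for x; the reported value applies only at the stated values of the other predictors.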



Summary<br />

Using the Complex Samples Logistic Regression procedure, you have constructed a model for predicting the probability that a given customer will default on a loan.<br />

A critical issue for loan officers is the cost of Type I and Type II errors. That is, what is the cost of classifying a defaulter as a non-defaulter (Type I)? What is the cost of classifying a non-defaulter as a defaulter (Type II)? If bad debt is the primary concern, then you want to lower your Type I error and maximize your sensitivity. If growing your customer base is the priority, then you want to lower your Type II error and maximize your specificity. Usually, both are major concerns, so you have to choose a decision rule for classifying customers that gives the best mix of sensitivity and specificity.<br />
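One way to think about choosing a decision rule is to compare candidate probability cutoffs by the cost of the errors each one produces. The sketch below is a minimal illustration under stated assumptions: the cases, cutoffs, and cost values are hypothetical, and the procedure itself classifies at a fixed cutoff rather than searching for one.

```python
def error_summary(cases, cutoff):
    """cases: iterable of (defaulted, predicted_probability, weight).
    Returns (sensitivity, specificity, missed_defaulters, rejected_good)."""
    tp = fn = tn = fp = 0.0
    for defaulted, prob, weight in cases:
        flagged = prob > cutoff          # predicted to default
        if defaulted and flagged:
            tp += weight
        elif defaulted:
            fn += weight                 # defaulter classified as non-defaulter
        elif flagged:
            fp += weight                 # non-defaulter classified as defaulter
        else:
            tn += weight
    return tp / (tp + fn), tn / (tn + fp), fn, fp


def best_cutoff(cases, cutoffs, cost_missed_defaulter, cost_rejected_good):
    """Return the cutoff with the lowest total weighted error cost."""
    def total_cost(cutoff):
        _, _, missed, rejected = error_summary(cases, cutoff)
        return missed * cost_missed_defaulter + rejected * cost_rejected_good
    return min(cutoffs, key=total_cost)


# HYPOTHETICAL cases: (defaulted, predicted probability, weight).
cases = [
    (True, 0.80, 1.0), (True, 0.45, 1.0), (True, 0.30, 1.0),
    (False, 0.60, 1.0), (False, 0.35, 1.0), (False, 0.10, 1.0),
    (False, 0.05, 1.0),
]
cutoffs = [0.25, 0.40, 0.50]

# Bad debt is the main concern: missed defaulters are costly.
cautious = best_cutoff(cases, cutoffs, cost_missed_defaulter=5.0, cost_rejected_good=1.0)
# Growth is the main concern: rejecting good customers is costly.
growth = best_cutoff(cases, cutoffs, cost_missed_defaulter=1.0, cost_rejected_good=5.0)
print(cautious, growth)
```

A lower cutoff flags more customers as likely defaulters, which raises sensitivity at the expense of specificity; a higher cutoff does the reverse.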

Related Procedures<br />

The Complex Samples Logistic Regression procedure is a useful tool for modeling a categorical variable when the cases have been drawn according to a complex sampling scheme.<br />

• The Complex Samples Sampling Wizard is used to specify complex sampling design specifications and obtain a sample. The sampling plan file created by the Sampling Wizard contains a default analysis plan and can be specified in the Plan dialog box when you are analyzing the sample obtained according to that plan.<br />

• The Complex Samples Analysis Preparation Wizard is used to specify analysis specifications for an existing complex sample. The analysis plan file created by the Analysis Preparation Wizard can be specified in the Plan dialog box when you are analyzing the sample corresponding to that plan.<br />

• The Complex Samples General Linear Model procedure allows you to model a scale response.<br />


Bibliography<br />

Cochran, W. G. 1977. Sampling Techniques. New York: John Wiley and Sons.<br />

Kish, L. 1965. Survey Sampling. New York: John Wiley and Sons.<br />

Kish, L. 1987. Statistical Design for Research. New York: John Wiley and Sons.<br />

Murthy, M. N. 1967. Sampling Theory and Methods. Calcutta, India: Statistical Publishing Society.<br />

Särndal, C., B. Swensson, and J. Wretman. 1992. Model Assisted Survey Sampling. New York: Springer-Verlag.<br />


Index<br />

adjust<strong>ed</strong> chi-square stat<strong>is</strong>tic<br />

in <strong>Complex</strong> Samples, 66, 80<br />

adjust<strong>ed</strong> F stat<strong>is</strong>tic<br />

in <strong>Complex</strong> Samples, 66, 80<br />

adjust<strong>ed</strong> residuals<br />

in <strong>Complex</strong> Samples Crosstabs, 51<br />

analys<strong>is</strong> plan, 23<br />

Bonferroni<br />

in <strong>Complex</strong> Samples, 66, 80<br />

Brewer’s sampling method<br />

in Sampling Wizard, 9<br />

chi-square stat<strong>is</strong>tic<br />

in <strong>Complex</strong> Samples, 66, 80<br />

classification tables<br />

in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 78<br />

clusters<br />

in Analys<strong>is</strong> Preparation Wizard, 25<br />

in Sampling Wizard, 7<br />

coefficient of variation (COV)<br />

in <strong>Complex</strong> Samples Crosstabs, 51<br />

in <strong>Complex</strong> Samples Descriptives, 45<br />

in <strong>Complex</strong> Samples Frequencies, 39<br />

in <strong>Complex</strong> Samples Ratios, 57<br />

column percentages<br />

in <strong>Complex</strong> Samples Crosstabs, 51<br />

<strong>Complex</strong> Samples<br />

hypothes<strong>is</strong> tests, 66, 80<br />

m<strong>is</strong>sing values, 40, 53<br />

options, 41, 46, 53, 58<br />

<strong>Complex</strong> Samples Analys<strong>is</strong> Preparation Wizard,<br />

123<br />

public data, 123<br />

relat<strong>ed</strong> proc<strong>ed</strong>ures, 137<br />

sampling weights not available, 126<br />

summary, 126, 137<br />

<strong>Complex</strong> Samples Crosstabs, 49, 151<br />

crosstabulation table, 155<br />

relat<strong>ed</strong> proc<strong>ed</strong>ures, 158<br />

relative r<strong>is</strong>k, 151, 155, 157<br />

stat<strong>is</strong>tics, 51<br />

<strong>Complex</strong> Samples Descriptives, 43, 145<br />

m<strong>is</strong>sing values, 46<br />

public data, 145<br />

relat<strong>ed</strong> proc<strong>ed</strong>ures, 150<br />

stat<strong>is</strong>tics, 45, 148<br />

stat<strong>is</strong>tics by subpopulation, 149<br />

<strong>Complex</strong> Samples Frequencies, 37, 139<br />

frequency table, 142<br />

frequency table by subpopulation, 143<br />

relat<strong>ed</strong> proc<strong>ed</strong>ures, 144<br />

stat<strong>is</strong>tics, 39<br />

<strong>Complex</strong> Samples General Linear Model, 61, 167<br />

command additional features,71<br />

estimat<strong>ed</strong> means, 68<br />

marginal means, 175<br />

model, 63<br />

model summary, 172<br />

options, 70<br />

parameter estimates, 174<br />

relat<strong>ed</strong> proc<strong>ed</strong>ures, 179<br />

save variables, 69<br />

stat<strong>is</strong>tics, 65<br />

tests of model effects, 173<br />

<strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 73, 181<br />

command additional features,85<br />

model, 76<br />

odds ratios, 81, 190<br />

options, 84<br />

parameter estimates, 189<br />

pseudo R 2 stat<strong>is</strong>tics, 187<br />

reference category, 75<br />

relat<strong>ed</strong> proc<strong>ed</strong>ures, 192<br />

save variables, 83<br />


stat<strong>is</strong>tics, 78<br />

tests of model effects, 188<br />

<strong>Complex</strong> Samples Ratios, 55, 159<br />

m<strong>is</strong>sing values, 58<br />

ratios, 162<br />

relat<strong>ed</strong> proc<strong>ed</strong>ures, 165<br />

stat<strong>is</strong>tics, 57<br />

<strong>Complex</strong> Samples Sampling Wizard, 89<br />

relat<strong>ed</strong> proc<strong>ed</strong>ures, 121<br />

sampling frame, full, 89<br />

sampling frame, partial, 103<br />

summary, 99, 100<br />

complex sampling<br />

analys<strong>is</strong> plan, 23<br />

sample plan, 5<br />

confidence intervals<br />

in <strong>Complex</strong> Samples Crosstabs, 51<br />

in <strong>Complex</strong> Samples Descriptives, 45, 148, 149<br />

in <strong>Complex</strong> Samples Frequencies, 39, 142, 143<br />

in <strong>Complex</strong> Samples General Linear Model, 65,<br />

70<br />

in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 78<br />

in <strong>Complex</strong> Samples Ratios, 57<br />

confidence level<br />

in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 84<br />

contrasts<br />

in <strong>Complex</strong> Samples General Linear Model, 68<br />

correlations of parameter estimates<br />

in <strong>Complex</strong> Samples General Linear Model, 65<br />

in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 78<br />

covariances of parameter estimates<br />

in <strong>Complex</strong> Samples General Linear Model, 65<br />

in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 78<br />

crosstabulation table<br />

in <strong>Complex</strong> Samples Crosstabs, 155<br />

cumulative values<br />

in Complex Samples Frequencies, 39<br />

degrees of freedom<br />

in Complex Samples, 66, 80<br />

design effect<br />

in Complex Samples Crosstabs, 51<br />

in Complex Samples Descriptives, 45<br />

in Complex Samples Frequencies, 39<br />

in Complex Samples General Linear Model, 65<br />

in Complex Samples Logistic Regression, 78<br />

in Complex Samples Ratios, 57<br />

deviation contrasts<br />

in Complex Samples General Linear Model, 68<br />

difference contrasts<br />

in Complex Samples General Linear Model, 68<br />

estimated marginal means<br />

in Complex Samples General Linear Model, 68<br />

expected values<br />

in Complex Samples Crosstabs, 51<br />

F statistic<br />

in Complex Samples, 66, 80<br />

Helmert contrasts<br />

in Complex Samples General Linear Model, 68<br />

inclusion probabilities<br />

in Sampling Wizard, 13<br />

input sample weights<br />

in Sampling Wizard, 7<br />

iteration history<br />

in Complex Samples Logistic Regression, 84<br />

iterations<br />

in Complex Samples Logistic Regression, 84<br />

least significance difference<br />

in Complex Samples, 66, 80<br />

likelihood convergence<br />

in Complex Samples Logistic Regression, 84<br />



marginal means<br />

in GLM Univariate, 175<br />

mean<br />

in <strong>Complex</strong> Samples Descriptives, 45, 148, 149<br />

measure of size<br />

in Sampling Wizard, 9<br />

m<strong>is</strong>sing values<br />

in <strong>Complex</strong> Samples, 40, 53<br />

in <strong>Complex</strong> Samples Descriptives, 46<br />

in <strong>Complex</strong> Samples General Linear Model, 70<br />

in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 84<br />

in <strong>Complex</strong> Samples Ratios, 58<br />

Murthy’s sampling method<br />

in Sampling Wizard, 9<br />

odds ratios<br />

in <strong>Complex</strong> Samples Crosstabs, 51, 151<br />

in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 81,<br />

190<br />

parameter convergence<br />

in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 84<br />

parameter estimates<br />

in <strong>Complex</strong> Samples General Linear Model, 65,<br />

174<br />

in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 78,<br />

189<br />

polynomial contrasts<br />

in <strong>Complex</strong> Samples General Linear Model, 68<br />

population size<br />

in <strong>Complex</strong> Samples Crosstabs, 51<br />

in <strong>Complex</strong> Samples Descriptives, 45<br />

in <strong>Complex</strong> Samples Frequencies, 39, 142, 143<br />

in <strong>Complex</strong> Samples Ratios, 57<br />

in Sampling Wizard, 13<br />

PPS sampling<br />

in Sampling Wizard, 9<br />

predicted categories<br />

in Complex Samples Logistic Regression, 83<br />

predicted probabilities<br />

in Complex Samples Logistic Regression, 83<br />

predicted values<br />

in Complex Samples General Linear Model, 69<br />

pseudo R² statistics<br />

in Complex Samples Logistic Regression, 78, 187<br />

public data<br />

in Analysis Preparation Wizard, 123<br />

in Complex Samples Descriptives, 145<br />

R²<br />

in Complex Samples General Linear Model, 65, 172<br />

ratios<br />

in Complex Samples Ratios, 162<br />

reference category<br />

in Complex Samples General Linear Model, 68<br />

in Complex Samples Logistic Regression, 75<br />

relative risk<br />

in Complex Samples Crosstabs, 51, 151, 155, 157<br />

repeated contrasts<br />

in Complex Samples General Linear Model, 68<br />

residuals<br />

in Complex Samples Crosstabs, 51<br />

in Complex Samples General Linear Model, 69<br />

risk difference<br />

in Complex Samples Crosstabs, 51<br />

row percentages<br />

in Complex Samples Crosstabs, 51<br />

Sampford’s sampling method<br />

in Sampling Wizard, 9<br />

sample plan, 5<br />

sample proportion<br />

in Sampling Wizard, 13<br />

sample size<br />

in Sampling Wizard, 11, 13<br />

sample weights<br />

in Analysis Preparation Wizard, 25<br />

in Sampling Wizard, 13<br />



sampling<br />

complex design, 5<br />

sampling estimation<br />

in Analysis Preparation Wizard, 27<br />

sampling frame, full<br />

in Sampling Wizard, 89<br />

sampling frame, partial<br />

in Sampling Wizard, 103<br />

sampling method<br />

in Sampling Wizard, 9<br />

separation<br />

in Complex Samples Logistic Regression, 84<br />

sequential Bonferroni<br />

in Complex Samples, 66, 80<br />

sequential sampling<br />

in Sampling Wizard, 9<br />

sequential Sidak<br />

in Complex Samples, 66, 80<br />

Sidak<br />

in Complex Samples, 66, 80<br />

simple contrasts<br />

in Complex Samples General Linear Model, 68<br />

simple random sampling<br />

in Sampling Wizard, 9<br />

square root of design effect<br />

in Complex Samples Crosstabs, 51<br />

in Complex Samples Descriptives, 45<br />

in Complex Samples Frequencies, 39<br />

in Complex Samples General Linear Model, 65<br />

in Complex Samples Logistic Regression, 78<br />

in Complex Samples Ratios, 57<br />

standard error<br />

in Complex Samples Crosstabs, 51<br />

in Complex Samples Descriptives, 45, 148, 149<br />

in Complex Samples Frequencies, 39, 142, 143<br />

in Complex Samples General Linear Model, 65<br />

in Complex Samples Logistic Regression, 78<br />

in Complex Samples Ratios, 57<br />

step-halving<br />

in Complex Samples Logistic Regression, 84<br />

stratification<br />

in Analysis Preparation Wizard, 25<br />

in Sampling Wizard, 7<br />

sum<br />

in Complex Samples Descriptives, 45<br />

summary<br />

in Analysis Preparation Wizard, 126, 137<br />

in Sampling Wizard, 99, 100<br />

systematic sampling<br />

in Sampling Wizard, 9<br />

table percentages<br />

in Complex Samples Crosstabs, 51<br />

in Complex Samples Frequencies, 39, 142, 143<br />

tests of model effects<br />

in Complex Samples General Linear Model, 173<br />

in Complex Samples Logistic Regression, 188<br />

t tests<br />

in Complex Samples General Linear Model, 65<br />

in Complex Samples Logistic Regression, 78<br />

unweighted count<br />

in Complex Samples Crosstabs, 51<br />

in Complex Samples Descriptives, 45<br />

in Complex Samples Frequencies, 39<br />

in Complex Samples Ratios, 57<br />
