SPSS Complex Samples™ 13.0 - Docs.is.ed.ac.uk
For more information about SPSS® software products, please visit our Web site at http://www.spss.com or contact

SPSS Inc.
233 South Wacker Drive, 11th Floor
Chicago, IL 60606-6412
Tel: (312) 651-3000
Fax: (312) 651-3668

SPSS is a registered trademark and the other product names are the trademarks of SPSS Inc. for its proprietary computer
software. No material describing such software may be produced or distributed without the written permission of the owners of
the trademark and license rights in the software and the copyrights in the published materials.

The SOFTWARE and documentation are provided with RESTRICTED RIGHTS. Use, duplication, or disclosure by the
Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of The Rights in Technical Data and Computer Software
clause at 52.227-7013. Contractor/manufacturer is SPSS Inc., 233 South Wacker Drive, 11th Floor, Chicago, IL 60606-6412.

General notice: Other product names mentioned herein are used for identification purposes only and may be trademarks of
their respective companies.

TableLook is a trademark of SPSS Inc.
Windows is a registered trademark of Microsoft Corporation.
DataDirect, DataDirect Connect, INTERSOLV, and SequeLink are registered trademarks of DataDirect Technologies.
Portions of this product were created using LEADTOOLS © 1991–2000, LEAD Technologies, Inc. ALL RIGHTS RESERVED.
LEAD, LEADTOOLS, and LEADVIEW are registered trademarks of LEAD Technologies, Inc.
Sax Basic is a trademark of Sax Software Corporation. Copyright © 1993–2004 by Polar Engineering and Consulting.
All rights reserved.
Portions of this product were based on the work of the FreeType Team (http://www.freetype.org).
A portion of the SPSS software contains zlib technology. Copyright © 1995–2002 by Jean-loup Gailly and Mark Adler. The zlib
software is provided “as is,” without express or implied warranty.
A portion of the SPSS software contains Sun Java Runtime libraries. Copyright © 2003 by Sun Microsystems, Inc. All rights
reserved. The Sun Java Runtime libraries include code licensed from RSA Security, Inc. Some portions of the libraries are
licensed from IBM and are available at http://oss.software.ibm.com/icu4j/.

SPSS Complex Samples 13.0
Copyright © 2004 by SPSS Inc.
All rights reserved.
Printed in the United States of America.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.
1234567890 060504
ISBN 1-56827-353-3
Preface

SPSS 13.0 is a comprehensive system for analyzing data. The Complex Samples
optional add-on module provides the additional analytic techniques described in this
manual. The Complex Samples add-on module must be used with the SPSS 13.0
Base system and is completely integrated into that system.

Installation

To install the Complex Samples add-on module, run the License Authorization
Wizard using the authorization code that you received from SPSS Inc. For more
information, see the installation instructions supplied with the SPSS Base system.

Compatibility

SPSS is designed to run on many computer systems. See the installation instructions
that came with your system for specific information on minimum and recommended
requirements.

Serial Numbers

Your serial number is your identification number with SPSS Inc. You will need
this serial number when you contact SPSS Inc. for information regarding support,
payment, or an upgraded system. The serial number was provided with your Base
system.

Customer Service

If you have any questions concerning your shipment or account, contact your local
office, listed on the SPSS Web site at http://www.spss.com/worldwide. Please have
your serial number ready for identification.
Training Seminars

SPSS Inc. provides both public and onsite training seminars. All seminars feature
hands-on workshops. Seminars will be offered in major cities on a regular basis. For
more information on these seminars, contact your local office, listed on the SPSS
Web site at http://www.spss.com/worldwide.

Technical Support

The services of SPSS Technical Support are available to registered customers.
Customers may contact Technical Support for assistance in using SPSS or for
installation help for one of the supported hardware environments. To reach Technical
Support, see the SPSS Web site at http://www.spss.com, or contact your local office,
listed on the SPSS Web site at http://www.spss.com/worldwide. Be prepared to
identify yourself, your organization, and the serial number of your system.

Additional Publications

Additional copies of SPSS product manuals may be purchased directly from SPSS
Inc. Visit the SPSS Web Store at http://www.spss.com/estore, or contact your local
SPSS office, listed on the SPSS Web site at http://www.spss.com/worldwide. For
telephone orders in the United States and Canada, call SPSS Inc. at 800-543-2185.
For telephone orders outside of North America, contact your local office, listed
on the SPSS Web site.

The SPSS Statistical Procedures Companion, by Marija Norusis, has been
published by Prentice Hall. A new version of this book, updated for SPSS 13.0, is
planned. The SPSS Advanced Statistical Procedures Companion, also based on SPSS
13.0, is forthcoming. The SPSS Guide to Data Analysis for SPSS 13.0 is also in
development. Announcements of publications available exclusively through Prentice
Hall will be available on the SPSS Web site at http://www.spss.com/estore (select
your home country, and then click Books).
Tell Us Your Thoughts

Your comments are important. Please let us know about your experiences with SPSS
products. We especially like to hear about new and interesting applications using
the SPSS system. Please send e-mail to suggest@spss.com or write to SPSS Inc.,
Attn.: Director of Product Planning, 233 South Wacker Drive, 11th Floor, Chicago,
IL 60606-6412.

About This Manual

This manual documents the graphical user interface for the procedures included in
the Complex Samples add-on module. Illustrations of dialog boxes are taken from
SPSS for Windows. Dialog boxes in other operating systems are similar. Detailed
information about the command syntax for features in this module is provided in the
SPSS Command Syntax Reference, available from the Help menu.

Contacting SPSS

If you would like to be on our mailing list, contact one of our offices, listed on our
Web site at http://www.spss.com/worldwide.
Contents

Part I: User's Guide

1 Introduction to SPSS Complex Samples Procedures
  Properties of Complex Samples
  Usage of Complex Samples Procedures
  Plan Files
  Further Readings

2 Sampling from a Complex Design
  Creating a New Sample Plan
  Sampling Wizard: Design Variables
  Tree Controls for Navigating the Sampling Wizard
  Sampling Wizard: Sampling Method
  Sampling Wizard: Sample Size
  Define Unequal Sizes
  Sampling Wizard: Output Variables
  Sampling Wizard: Plan Summary
  Sampling Wizard: Draw Sample Selection Options
  Sampling Wizard: Draw Sample Output Files
  Sampling Wizard: Finish
  Modifying an Existing Sample Plan
  Sampling Wizard: Plan Summary
  Running an Existing Sample Plan
  CSPLAN and CSSELECT Commands Additional Features

3 Preparing a Complex Sample for Analysis
  Creating a New Analysis Plan
  Analysis Preparation Wizard: Design Variables
  Tree Controls for Navigating the Analysis Wizard
  Analysis Preparation Wizard: Estimation Method
  Analysis Preparation Wizard: Size
  Define Unequal Sizes
  Analysis Preparation Wizard: Stage Summary
  Analysis Preparation Wizard: Finish
  Modifying an Existing Analysis Plan
  Analysis Preparation Wizard: Plan Summary

4 Complex Samples Plan

5 Complex Samples Frequencies
  Complex Samples Frequencies Statistics
  Complex Samples Missing Values
  Complex Samples Options

6 Complex Samples Descriptives
  Complex Samples Descriptives Statistics
  Complex Samples Descriptives Missing Values
  Complex Samples Options

7 Complex Samples Crosstabs
  Complex Samples Crosstabs Statistics
  Complex Samples Missing Values
  Complex Samples Options

8 Complex Samples Ratios
  Complex Samples Ratios Statistics
  Complex Samples Ratios Missing Values
  Complex Samples Options

9 Complex Samples General Linear Model
  Complex Samples General Linear Model Statistics
  Complex Samples Hypothesis Tests
  Complex Samples General Linear Model Estimated Means
  Complex Samples General Linear Model Save
  Complex Samples General Linear Model Options
  CSGLM Command Additional Features

10 Complex Samples Logistic Regression
  Complex Samples Logistic Regression Reference Category
  Complex Samples Logistic Regression Model
  Complex Samples Logistic Regression Statistics
  Complex Samples Hypothesis Tests
  Complex Samples Logistic Regression Odds Ratios
  Complex Samples Logistic Regression Save
  Complex Samples Logistic Regression Options
  CSLOGISTIC Command Additional Features

Part II: Examples

11 Complex Samples Sampling Wizard
  Obtaining a Sample from a Full Sampling Frame
  Using the Wizard
  Plan Summary
  Sampling Summary
  Sample Results
  Obtaining a Sample from a Partial Sampling Frame
  Using the Wizard to Sample from the First Partial Frame
  Sample Results
  Using the Wizard to Sample from the Second Partial Frame
  Sample Results
  Related Procedures

12 Complex Samples Analysis Preparation Wizard
  Using the Complex Samples Analysis Preparation Wizard to Ready NHIS Public Data
  Using the Wizard
  Summary
  Preparing for Analysis When Sampling Weights Are Not in the Data File
  Computing Inclusion Probabilities and Sampling Weights
  Using the Wizard
  Summary
  Related Procedures

13 Complex Samples Frequencies
  Using Complex Samples Frequencies to Analyze Nutritional Supplement Usage
  Running the Analysis
  Frequency Table
  Frequency by Subpopulation
  Summary
  Related Procedures

14 Complex Samples Descriptives
  Using Complex Samples Descriptives to Analyze Activity Levels
  Running the Analysis
  Univariate Statistics
  Univariate Statistics by Subpopulation
  Summary
  Related Procedures

15 Complex Samples Crosstabs
  Using Complex Samples Crosstabs to Measure the Relative Risk of an Event
  Running the Analysis
  Crosstabulation
  Risk Estimate
  Risk Estimate by Subpopulation
  Summary
  Related Procedures

16 Complex Samples Ratios
  Using Complex Samples Ratios to Aid Property Value Assessment
  Running the Analysis
  Ratios
  Pivoted Ratios Table
  Summary
  Related Procedures

17 Complex Samples General Linear Model
  Using Complex Samples General Linear Model to Fit a Two-Factor ANOVA
  Running the Analysis
  Model Summary
  Tests of Model Effects
  Parameter Estimates
  Estimated Marginal Means
  Summary
  Related Procedures

18 Complex Samples Logistic Regression
  Using Complex Samples Logistic Regression to Assess Credit Risk
  Running the Analysis
  Pseudo R-Squares
  Classification
  Tests of Model Effects
  Parameter Estimates
  Odds Ratios
  Summary
  Related Procedures

Bibliography

Index
Part I: User's Guide

Chapter 1
Introduction to SPSS Complex Samples Procedures

An inherent assumption of analytical procedures in traditional software packages
is that the observations in a data file represent a simple random sample from the
population of interest. This assumption is untenable for an increasing number of
companies and researchers who find it both cost-effective and convenient to obtain
samples in a more structured way.

The SPSS Complex Samples option allows you to select a sample according to
a complex design and incorporate the design specifications into the data analysis,
thus ensuring that your results are valid.
Properties of Complex Samples

A complex sample can differ from a simple random sample in many ways. In a
simple random sample, individual sampling units are selected at random with equal
probability and without replacement (WOR) directly from the entire population. By
contrast, a given complex sample can have some or all of the following features:

Stratification. Stratified sampling involves selecting samples independently within
non-overlapping subgroups of the population, or strata. For example, strata may
be socioeconomic groups, job categories, age groups, or ethnic groups. With
stratification, you can ensure adequate sample sizes for subgroups of interest,
improve the precision of overall estimates, and use different sampling methods from
stratum to stratum.

Clustering. Cluster sampling involves the selection of groups of sampling units, or
clusters. For example, clusters may be schools, hospitals, or geographical areas,
and sampling units may be students, patients, or citizens. Clustering is common in
multistage designs and area (geographic) samples.
Multiple stages. In multistage sampling, you select a first-stage sample based on
clusters. Then you create a second-stage sample by drawing subsamples from the
selected clusters. If the second-stage sample is based on subclusters, you can then add
a third stage to the sample. For example, in the first stage of a survey, a sample of
cities could be drawn. Then, from the selected cities, households could be sampled.
Finally, from the selected households, individuals could be polled. The Sampling and
Analysis Preparation wizards allow you to specify up to three stages in a design.

Nonrandom sampling. When selection at random is difficult to obtain, units can be
sampled systematically (at a fixed interval) or sequentially.

Unequal selection probabilities. When sampling clusters that contain unequal numbers
of units, you can use probability-proportional-to-size (PPS) sampling to make a
cluster’s selection probability equal to the proportion of units it contains. PPS
sampling can also use more general weighting schemes to select units.
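
As a rough illustration of the PPS idea for a single draw, the sketch below computes each cluster's selection probability as its share of all units and then draws one cluster with those probabilities. It uses only the Python standard library; the cluster names, sizes, and function names are made up for the example, and it is not meant to reproduce how the SPSS PPS methods are implemented.

```python
import random


def pps_probabilities(cluster_sizes):
    """Single-draw selection probability of each cluster: its size / total size."""
    total = sum(cluster_sizes.values())
    return {name: size / total for name, size in cluster_sizes.items()}


def draw_one_pps(cluster_sizes, rng):
    """Draw one cluster with probability proportional to its size."""
    names = list(cluster_sizes)
    weights = [cluster_sizes[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical clusters (schools) and their numbers of units (students).
sizes = {"school_a": 500, "school_b": 300, "school_c": 200}
probs = pps_probabilities(sizes)   # school_a holds half of all students
rng = random.Random(13)            # fixed seed so the draw is repeatable
chosen = draw_one_pps(sizes, rng)
```

Here the largest cluster is the most likely to be chosen, which is the point of PPS: clusters are selected in proportion to how much of the population they contain.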
Unrestricted sampling. Unrestricted sampling selects units with replacement (WR).
Thus, an individual unit can be selected for the sample more than once.

Sampling weights. Sampling weights are automatically computed while drawing a
complex sample and ideally correspond to the “frequency” that each sampling unit
represents in the target population. Therefore, the sum of the weights over the sample
should estimate the population size. Complex Samples analysis procedures require
sampling weights in order to properly analyze a complex sample. Note that these
weights should be used entirely within the Complex Samples option and should not
be used with other analytical procedures via the Weight Cases procedure, which treats
weights as case replications.
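
To make the statement about weights concrete, here is a minimal sketch, assuming a stratified design with simple random sampling without replacement inside each stratum. Under that assumption each sampled unit gets the weight N_h / n_h (stratum population size over stratum sample size), and the weights sum to the frame size. The frame, stratum labels, and function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, n_per_stratum, seed=0):
    """Draw a simple random sample without replacement inside each stratum.

    Returns (unit, weight) pairs, where weight = N_h / n_h is the number of
    population units that each sampled unit represents.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for label in sorted(strata):
        units = strata[label]
        n_h = min(n_per_stratum[label], len(units))
        weight = len(units) / n_h
        for unit in rng.sample(units, n_h):
            sample.append((unit, weight))
    return sample


# Hypothetical frame of 1,000 units: 800 urban and 200 rural.
frame = [("urban", i) for i in range(800)] + [("rural", i) for i in range(200)]
sample = stratified_sample(frame, lambda unit: unit[0], {"urban": 40, "rural": 20})

# Summing the weights over the sample recovers the size of the frame.
estimated_population = sum(weight for _, weight in sample)
```

In this sketch the 40 urban units each carry a weight of 20 and the 20 rural units a weight of 10, so the 60 sampled units together stand in for all 1,000 units of the frame.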
Usage of Complex Samples Procedures

Your usage of Complex Samples procedures depends on your particular needs. The
primary types of users are those who:
• Plan and carry out surveys according to complex designs, possibly analyzing the
sample later. The primary tool for surveyors is the Sampling Wizard.
• Analyze sample data files previously obtained according to complex designs.
Before using the Complex Samples analysis procedures, you may need to use the
Analysis Preparation Wizard.

Regardless of which type of user you are, you need to supply design information to
Complex Samples procedures. This information is stored in a plan file for easy reuse.
Plan Files

A plan file contains complex sample specifications. There are two types of plan files:

Sampling plan. The specifications given in the Sampling Wizard define a sample
design that is used to draw a complex sample. The sampling plan file contains those
specifications. The sampling plan file also contains a default analysis plan that uses
estimation methods suitable for the specified sample design.

Analysis plan. This plan file contains information needed by Complex Samples
analysis procedures to properly compute variance estimates for a complex sample.
The plan includes the sample structure, estimation methods for each stage, and
references to required variables, such as sample weights. The Analysis Preparation
Wizard allows you to create and edit analysis plans.

There are several advantages to saving your specifications in a plan file, including:
• A surveyor can specify the first stage of a multistage sampling plan and draw
first-stage units now, collect information on sampling units for the second stage,
and then modify the sampling plan to include the second stage.
• An analyst who doesn’t have access to the sampling plan file can specify an
analysis plan and refer to that plan from each Complex Samples analysis
procedure.
• A designer of large-scale public use samples can publish the sampling plan file,
which simplifies the instructions for analysts and avoids the need for each analyst
to specify his or her own analysis plans.
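
The reuse workflow in the first bullet (save a one-stage plan, then reopen it and add a second stage) can be pictured with a small sketch. This is purely illustrative: the manual does not describe the actual plan file format here, so the JSON layout, class names, and field names below are all hypothetical stand-ins for "design information saved to a file".

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Stage:
    """One stage of a hypothetical sample design (field names are illustrative)."""
    label: str
    strata: List[str] = field(default_factory=list)
    clusters: List[str] = field(default_factory=list)
    method: str = "simple_wor"      # simple random sampling without replacement
    size: Optional[float] = None    # count or proportion of units to sample


@dataclass
class Plan:
    """A hypothetical plan: at most three stages plus a weight variable name."""
    weight_variable: str
    stages: List[Stage] = field(default_factory=list)

    def add_stage(self, stage: Stage) -> None:
        if len(self.stages) >= 3:
            raise ValueError("this sketch allows at most three stages")
        self.stages.append(stage)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Plan":
        data = json.loads(text)
        stages = [Stage(**stage) for stage in data["stages"]]
        return cls(weight_variable=data["weight_variable"], stages=stages)


# Stage 1 is specified and saved now...
plan = Plan(weight_variable="sample_weight")
plan.add_stage(Stage(label="cities", strata=["region"], clusters=["city"], size=10))
saved = plan.to_json()

# ...and the saved plan is later reloaded and extended with stage 2.
reloaded = Plan.from_json(saved)
reloaded.add_stage(Stage(label="households", clusters=["household"], size=0.1))
```

The design choice being illustrated is only that the design lives in one reusable file, so every later step refers to the same specification instead of restating it.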
Further Readings

For more information on sampling techniques, see the following texts:

Cochran, W. G. 1977. Sampling Techniques. New York: John Wiley and Sons.
Kish, L. 1965. Survey Sampling. New York: John Wiley and Sons.
Kish, L. 1987. Statistical Design for Research. New York: John Wiley and Sons.
Murthy, M. N. 1967. Sampling Theory and Methods. Calcutta, India: Statistical
Publishing Society.
Särndal, C., B. Swensson, and J. Wretman. 1992. Model Assisted Survey Sampling.
New York: Springer-Verlag.
Chapter 2
Sampling from a Complex Design

Figure 2-1
Sampling Wizard, Welcome step

The Sampling Wizard guides you through the steps for creating, modifying, or
executing a sampling plan file. Before using the Wizard, you should have a
well-defined target population, a list of sampling units, and an appropriate sample
design in mind.
Creating a New Sample Plan

► From the menus choose:
  Analyze
    Complex Samples
      Select a Sample...
► Select Design a sample and choose a plan filename to save the sample plan.
► Click Next to continue through the Wizard.
► Optionally, in the Design Variables step, you can define strata, clusters, and input
sample weights. After you define these, click Next.
► Optionally, in the Sampling Method step, you can choose a method for selecting items.

If you select PPS Brewer or PPS Murthy, you can click Finish to draw the sample.
Otherwise, click Next and then:

► In the Sample Size step, specify the number or proportion of units to sample.

You can now click Finish to draw the sample. Optionally, in further steps, you can:
• Choose output variables to save.
• Add a second or third stage to the design.
• Set various selection options, including which stages to draw samples from, the
random number seed, and whether to treat user-missing values as valid values of
design variables.
• Choose where to save output data.
• Paste your selections as command syntax.
Sampling Wizard: Design Variables<br />
Figure 2-2<br />
Sampling Wizard, Design Variables step<br />
This step allows you to select stratification and clustering variables and to define input<br />
sample weights. You can also specify a label for the stage.<br />
Stratify By. The cross-classification of stratification variables defines distinct<br />
subpopulations, or strata. Separate samples are obtained for each stratum. To improve<br />
the precision of your estimates, units within strata should be as homogeneous as<br />
possible for the characteristics of interest.<br />
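The idea of drawing a separate sample in each stratum can be sketched in Python. This is a minimal illustration only, assuming simple random sampling without replacement inside every stratum. The function name `stratified_sample`, its parameters, and the layout of the frame are hypothetical and are not part of SPSS.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample without replacement inside each stratum.

    frame          -- list of sampling units (any objects)
    stratum_of     -- function mapping a unit to its stratum label
    n_per_stratum  -- number of units to draw in every stratum
    """
    rng = random.Random(seed)

    # Group the frame by stratum label.
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    # Sample each stratum independently of the others.
    sample = {}
    for label in sorted(strata):
        units = strata[label]
        n = min(n_per_stratum, len(units))  # cannot draw more than exist
        sample[label] = rng.sample(units, n)
    return sample


# Hypothetical frame: 100 units, the first 60 in "north", the rest in "south".
frame = [{"id": i, "region": "north" if i < 60 else "south"} for i in range(100)]
picked = stratified_sample(frame, lambda u: u["region"], n_per_stratum=5, seed=1)
```

Each stratum contributes its own sample, so a small stratum cannot be missed by chance as it could be in a single unstratified draw.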
Clusters. Cluster variables define groups of observational units, or clusters. Clusters<br />
are useful when directly sampling observational units from the population is<br />
expensive or impossible; instead, you can sample clusters from the population and<br />
then sample observational units from the selected clusters. However, the use of<br />
clusters can introduce correlations among sampling units, resulting in a loss of
precision. To minimize this effect, units within clusters should be as heterogeneous<br />
as possible for the characteristics of interest. You must define at least one cluster<br />
variable in order to plan a multistage design. Clusters are also necessary in the use of<br />
several different sampling methods. For more information, see “Sampling Wizard:<br />
Sampling Method.”<br />
Input Sample Weight. If the current sample design is part of a larger sample design,<br />
you may have sample weights from a previous stage of the larger design. You<br />
can specify a numeric variable containing these weights in the first stage of the<br />
current design. Sample weights are computed automatically for subsequent stages<br />
of the current design.<br />
Stage Label. You can specify an optional string label for each stage. This is used in the<br />
output to help identify stagewise information.<br />
Note: The source variable list has the same content across steps of the Wizard. In other<br />
words, variables removed from the source list in a particular step are removed from<br />
the list in all steps. Variables returned to the source list appear in the list in all steps.<br />
Tree Controls for Navigating the Sampling Wizard<br />
On the left side of each step in the Sampling Wizard is an outline of all the steps. You<br />
can navigate the Wizard by clicking on the name of an enabled step in the outline.<br />
Steps are enabled as long as all previous steps are valid; that is, if each previous step<br />
has been given the minimum required specifications for that step. See the Help for<br />
individual steps for more information on why a given step may be invalid.<br />
Sampling Wizard: Sampling Method<br />
Figure 2-3<br />
Sampling Wizard, Method step<br />
This step allows you to specify how to select cases from the working data file.<br />
Method. Controls in this group are used to choose a selection method. Some sampling<br />
types allow you to choose whether to sample with replacement (WR) or without<br />
replacement (WOR). See the type descriptions for more information. Note that some<br />
probability-proportional-to-size (PPS) types are available only when clusters have<br />
been defined and that all PPS types are available only in the first stage of a design.<br />
Moreover, WR methods are available only in the last stage of a design.<br />
• Simple Random Sampling. Units are selected with equal probability. They can be<br />
selected with or without replacement.<br />
• Simple Systematic. Units are selected at a fixed interval throughout the sampling<br />
frame (or strata, if they have been specified) and extracted without replacement.<br />
A randomly selected unit within the first interval is chosen as the starting point.<br />
• Simple Sequential. Units are selected sequentially with equal probability and<br />
without replacement.<br />
• PPS. This is a first-stage method that selects units at random with probability<br />
proportional to size. Any units can be selected with replacement; only clusters<br />
can be sampled without replacement.<br />
• PPS Systematic. This is a first-stage method that systematically selects units with<br />
probability proportional to size. They are selected without replacement.<br />
• PPS Sequential. This is a first-stage method that sequentially selects units with<br />
probability proportional to cluster size and without replacement.<br />
• PPS Brewer. This is a first-stage method that selects two clusters from each<br />
stratum with probability proportional to cluster size and without replacement. A<br />
cluster variable must be specified to use this method.<br />
• PPS Murthy. This is a first-stage method that selects two clusters from each<br />
stratum with probability proportional to cluster size and without replacement. A<br />
cluster variable must be specified to use this method.<br />
• PPS Sampford. This is a first-stage method that selects more than two clusters<br />
from each stratum with probability proportional to cluster size and without<br />
replacement. It is an extension of Brewer’s method. A cluster variable must be<br />
specified to use this method.<br />
• Use WR estimation for analysis. By default, an estimation method is specified in<br />
the plan file that is consistent with the selected sampling method. This allows you<br />
to use with-replacement estimation even if the sampling method implies WOR<br />
estimation. This option is available only in stage 1.<br />
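To make the Simple Systematic method in the list above concrete, here is a minimal Python sketch: a random start inside the first interval, then one unit per interval. It assumes a fractional interval so that exactly `n` units are returned. The name `systematic_sample` and its parameters are hypothetical, and the sketch is not a description of how SPSS implements the method.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select n units at a fixed interval after a random start."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    rng = random.Random(seed)

    interval = len(frame) / n           # fixed step between selected positions
    start = rng.random() * interval     # random point inside the first interval

    last = len(frame) - 1
    # int() truncates each position to a frame index; min() guards against a
    # floating-point sum that lands exactly on len(frame).
    return [frame[min(int(start + k * interval), last)] for k in range(n)]


# Example: pick 10 of 100 numbered units.
chosen = systematic_sample(list(range(100)), 10, seed=3)
```

Because the interval is at least 1, no unit is selected twice, which matches the without-replacement behavior described for this method.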
Measure of Size (MOS). If a PPS method is selected, you must specify a measure of<br />
size that defines the size of each unit. These sizes can be explicitly defined in a<br />
variable or they can be computed from the data. Optionally, you can set lower and<br />
upper bounds on the MOS, overriding any values found in the MOS variable or<br />
computed from the data. These options are available only in stage 1.<br />
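The role of the measure of size can be illustrated with a short Python sketch of PPS selection with replacement. It assumes that the lower and upper bounds simply clip each unit's size before selection probabilities are formed; that reading of "overriding" is an assumption, and the function name and parameters are hypothetical.

```python
import random


def pps_with_replacement(units, sizes, n, lower=None, upper=None, seed=None):
    """Draw n units with replacement, with probability proportional to size."""
    # Apply the optional bounds to each measure of size (assumed to clip).
    bounded = []
    for size in sizes:
        if lower is not None:
            size = max(size, lower)
        if upper is not None:
            size = min(size, upper)
        bounded.append(size)

    total = sum(bounded)
    probs = [size / total for size in bounded]  # per-draw selection probabilities

    rng = random.Random(seed)
    draws = rng.choices(units, weights=bounded, k=n)  # independent draws
    return draws, probs


# Three hypothetical clusters whose sizes are clipped to the range 10..1000.
draws, probs = pps_with_replacement(
    ["A", "B", "C"], [1, 100, 5000], n=4, lower=10, upper=1000, seed=7
)
```

With the bounds applied, cluster A is treated as size 10 and cluster C as size 1000, so C is still the most likely to be drawn but no longer dominates the selection completely.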
Sampling Wizard: Sample Size<br />
Figure 2-4<br />
Sampling Wizard, Sample Size step<br />
This step allows you to specify the number or proportion of units to sample within<br />
the current stage. The sample size can be fixed or it can vary across strata. For the<br />
purpose of specifying sample size, clusters chosen in previous stages can be used to<br />
define strata.<br />
Units. You can specify an exact sample size or a proportion of units to sample.<br />
• Value. A single value is applied to all strata. If Counts is selected as the unit<br />
metric, you should enter a positive integer. If Proportions is selected, you should<br />
enter a non-negative value. Unless sampling with replacement, proportion values<br />
should also be no greater than 1.<br />
• Unequal values for strata. Allows you to enter size values on a per-stratum basis<br />
via the Define Unequal Sizes dialog box.<br />
• Read values from variable. Allows you to select a numeric variable that contains<br />
size values for strata.<br />
If Proportions is selected, you have the option to set lower and upper bounds on<br />
the number of units sampled.<br />
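The interaction between a proportion and the optional lower and upper bounds can be sketched as follows. The rounding rule (round half up) and the cap at the stratum size for sampling without replacement are assumptions made for illustration; the function `units_to_sample` and its parameters are hypothetical.

```python
import math


def units_to_sample(stratum_size, proportion, min_count=None, max_count=None):
    """Turn a sampling proportion into a unit count, honoring optional bounds."""
    if proportion < 0:
        raise ValueError("proportion must be non-negative")

    # Assumed rounding rule: round half up to the nearest whole unit.
    n = math.floor(proportion * stratum_size + 0.5)

    if min_count is not None:
        n = max(n, min_count)   # lower bound on units sampled
    if max_count is not None:
        n = min(n, max_count)   # upper bound on units sampled

    # Without replacement, a stratum cannot yield more units than it holds.
    return min(n, stratum_size)


print(units_to_sample(1000, 0.1))                 # plain 10% sample
print(units_to_sample(20, 0.1, min_count=5))      # small stratum, lower bound applies
print(units_to_sample(5000, 0.1, max_count=200))  # large stratum, upper bound applies
```

Bounds of this kind keep small strata from being sampled too thinly and large strata from dominating the sample when a single proportion is applied everywhere.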
Define Unequal Sizes<br />
Figure 2-5<br />
Define Unequal Sizes dialog box<br />
The Define Unequal Sizes dialog box allows you to enter sizes on a per-stratum basis.<br />
Size Specifications grid. The grid displays the cross-classifications of up to five strata<br />
or cluster variables, with one stratum/cluster combination per row. Eligible grid variables<br />
include all stratification variables from the current and previous stages and all cluster<br />
variables from previous stages. Variables can be reordered within the grid or moved<br />
to the Exclude list. Enter sizes in the rightmost column. Click Labels or Values<br />
to toggle the display of value labels and data values for stratification and cluster<br />
variables in the grid cells. Cells that contain unlabeled values always show values.<br />
Click Refresh Strata to repopulate the grid with each combination of labeled data<br />
values for variables in the grid.<br />
Exclude. To specify sizes for a subset of stratum/cluster combinations, move one or<br />
more variables to the Exclude list. These variables are not used to define sample sizes.<br />
Sampling Wizard: Output Variables<br />
Figure 2-6<br />
Sampling Wizard, Output Variables step<br />
This step allows you to choose variables to save when the sample is drawn.<br />
Population size. The estimated number of units in the population for a given stage.<br />
The root name for the saved variable is PopulationSize_.<br />
Sample proportion. The sampling rate at a given stage. The root name for the saved<br />
variable is SamplingRate_.<br />
Sample size. The number of units drawn at a given stage. The root name for the<br />
saved variable is SampleSize_.<br />
Sample weight. The inverse of the inclusion probabilities. The root name for the<br />
saved variable is SampleWeight_.<br />
Some stagewise variables are generated automatically. These include:<br />
Inclusion probabilities. The proportion of units drawn at a given stage. The root name<br />
for the saved variable is InclusionProbability_.<br />
Cumulative weight. The cumulative sample weight over stages previous to<br />
and including the current one. The root name for the saved variable is<br />
SampleWeightCumulative_.<br />
Index. Identifies units selected multiple times within a given stage. The root name for<br />
the saved variable is Index_.<br />
Note: Saved variable root names include an integer suffix that reflects the stage<br />
number; for example, PopulationSize_1_ for the saved population size for stage 1.<br />
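The relationship between inclusion probabilities, stage weights, and the cumulative weight can be illustrated with a small Python sketch. It assumes that the cumulative weight is the running product of the stage weights (starting from any input sample weight), which is the usual convention in multistage sampling; the function names are hypothetical and do not correspond to SPSS variables.

```python
def stage_weight(inclusion_probability):
    """Sample weight for one stage: the inverse of the inclusion probability."""
    if not 0 < inclusion_probability <= 1:
        raise ValueError("inclusion probability must be in (0, 1]")
    return 1.0 / inclusion_probability


def cumulative_weights(stage_probabilities, input_weight=1.0):
    """Running product of stage weights, one value per stage.

    input_weight stands in for a weight carried over from a larger design.
    """
    weights = []
    running = input_weight
    for p in stage_probabilities:
        running *= stage_weight(p)
        weights.append(running)
    return weights


# A unit whose cluster was drawn with probability 0.5 in stage 1 and which was
# itself drawn with probability 0.25 in stage 2.
print(cumulative_weights([0.5, 0.25]))  # [2.0, 8.0]
```

Read this way, a final cumulative weight of 8 means the sampled unit stands in for 8 units of the population.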
Sampling Wizard: Plan Summary<br />
Figure 2-7<br />
Sampling Wizard, Plan Summary step<br />
This is the last step within each stage, providing a summary of the sample design<br />
specifications through the current stage. From here, you can either proceed to the<br />
next stage (creating it, if necessary) or set options for drawing the sample.<br />
Sampling Wizard: Draw Sample Selection Options<br />
Figure 2-8<br />
Sampling Wizard, Draw Sample, Selection Options step<br />
This step allows you to choose whether to draw a sample. You can also control other<br />
sampling options, such as the random seed and missing-value handling.<br />
Draw sample. In addition to choosing whether to draw a sample, you can also choose<br />
to execute part of the sampling design. Stages must be drawn in order; that is,<br />
stage 2 cannot be drawn unless stage 1 is also drawn. When editing or executing a<br />
plan, you cannot resample locked stages.<br />
Seed. This allows you to choose a seed value for random number generation.<br />
Include user-missing values. This determines whether user-missing values are valid. If<br />
so, user-missing values are treated as a separate category.<br />
Data already sorted. If your sample frame is presorted by the values of the stratification<br />
variables, this option allows you to speed the selection process.<br />
Sampling Wizard: Draw Sample Output Files<br />
Figure 2-9<br />
Sampling Wizard, Draw Sample, Output Files step<br />
This step allows you to choose where to direct sampled cases, weight variables, joint<br />
probabilities, and case selection rules.<br />
Sample data. These options let you determine where sample output is written. It can<br />
be added to the working data file or saved to an external file. If an external file is<br />
specified, the sampling output variables and variables in the working data file for<br />
the selected cases are saved to the file.<br />
Joint probabilities. These options let you determine where joint probabilities are<br />
written. Joint probabilities are produced if the PPS WOR, PPS Brewer, PPS<br />
Sampford, or PPS Murthy method is selected and WR estimation is not specified.<br />
Case selection rules. If you are constructing your sample one stage at a time, you may<br />
want to save the case selection rules to a text file. They are useful for constructing the<br />
subframe for subsequent stages.<br />
Sampling Wizard: Finish<br />
Figure 2-10<br />
Sampling Wizard, Finish step<br />
This is the final step. You can save the plan file and draw the sample now or paste<br />
your selections into a syntax window.<br />
When making changes to stages in the existing plan file, you can save the edited<br />
plan to a new file or overwrite the existing file. When adding stages without making<br />
changes to existing stages, the Wizard automatically overwrites the existing plan file.<br />
If you want to save the plan to a new file, select Paste the syntax generated by the<br />
Wizard into a syntax window and change the filename in the syntax commands.<br />
Modifying an Existing Sample Plan<br />
From the menus choose:<br />
Analyze<br />
Complex Samples<br />
Select a Sample...<br />
Select Edit a sample design and choose a plan file to edit.<br />
Click Next to continue through the Wizard.<br />
Review the sampling plan in the Plan Summary step, and then click Next.<br />
Subsequent steps are largely the same as for a new design. See the Help for individual<br />
steps for more information.<br />
Navigate to the Finish step, and specify a new name for the edited plan file or choose<br />
to overwrite the existing plan file.<br />
Optionally, you can:<br />
• Specify stages that have already been sampled.<br />
• Remove stages from the plan.<br />
Sampling Wizard: Plan Summary<br />
Figure 2-11<br />
Sampling Wizard, Plan Summary step<br />
This step allows you to review the sampling plan and indicate stages that have already<br />
been sampled. If editing a plan, you can also remove stages from the plan.<br />
Previously sampled stages. If an extended sampling frame is not available, you will<br />
have to execute a multistage sampling design one stage at a time. Select which stages<br />
have already been sampled from the drop-down list. Any stages that have been<br />
executed are locked; they are not available in the Draw Sample Selection Options<br />
step, and they cannot be altered when editing a plan.<br />
Remove stages. You can remove stages 2 and 3 from a multistage design.<br />
Running an Existing Sample Plan<br />
From the menus choose:<br />
Analyze<br />
Complex Samples<br />
Select a Sample...<br />
Select Draw a sample and choose a plan file to run.<br />
Click Next to continue through the Wizard.<br />
Review the sampling plan in the Plan Summary step, and then click Next.<br />
The individual steps containing stage information are skipped when executing a<br />
sample plan. You can now go on to the Finish step at any time.<br />
Optionally, you can:<br />
• Specify stages that have already been sampled.<br />
CSPLAN and CSSELECT Commands Additional Features<br />
The SPSS command language also allows you to:<br />
• Specify custom names for output variables.<br />
• Control the output in the Viewer. For example, you can suppress the stagewise<br />
summary of the plan that is displayed if a sample is designed or modified,<br />
suppress the summary of the distribution of sampled cases by strata that is shown<br />
if the sample design is executed, and request a case processing summary.<br />
• Choose a subset of variables in the working data file to write to an external<br />
sample file.<br />
See the SPSS Command Syntax Reference for complete syntax information.<br />
Chapter 3<br />
Preparing a Complex Sample for Analysis<br />
Figure 3-1<br />
Analysis Preparation Wizard, Welcome step<br />
The Analysis Preparation Wizard guides you through the steps for creating or<br />
modifying an analysis plan for use with the various Complex Samples analysis<br />
procedures. Before using the Wizard, you should have a sample drawn according to a<br />
complex design.<br />
Creating a new plan is most useful when you do not have access to the sampling<br />
plan file used to draw the sample (recall that the sampling plan contains a default<br />
analysis plan). If you do have access to the sampling plan file used to draw the<br />
sample, you can use the default analysis plan contained in the sampling plan file or<br />
override the default analysis specifications and save your changes to a new file.<br />
Creating a New Analysis Plan<br />
From the menus choose:<br />
Analyze<br />
Complex Samples<br />
Prepare for Analysis...<br />
Select Create a plan file, and choose a plan filename to which you will save the<br />
analysis plan.<br />
Click Next to continue through the Wizard.<br />
Specify the variable containing sample weights in the Design Variables step,<br />
optionally defining strata and clusters.<br />
You can now click Finish to save the plan. Optionally, in further steps you can:<br />
• Select the method for estimating standard errors in the Estimation Method step.<br />
• Specify the number of units sampled or the inclusion probability per unit in<br />
the Size step.<br />
• Add a second or third stage to the design.<br />
• Paste your selections as command syntax.<br />
Analysis Preparation Wizard: Design Variables<br />
Figure 3-2<br />
Analysis Preparation Wizard, Design Variables step (stage 1)<br />
This step allows you to identify the stratification and clustering variables and define<br />
sample weights. You can also provide a label for the stage.<br />
Strata. The cross-classification of stratification variables defines distinct<br />
subpopulations, or strata. Your total sample represents the combination of<br />
independent samples from each stratum.<br />
Clusters. Cluster variables define groups of observational units, or clusters. Samples<br />
drawn in multiple stages select clusters in the earlier stages and then subsample units<br />
from the selected clusters. When analyzing a data file obtained by sampling clusters<br />
with replacement, you should include the duplication index as a cluster variable.<br />
Sample Weights. You must provide sample weights in the first stage. Sample weights<br />
are computed automatically for subsequent stages of the current design.<br />
Stage Label. You can specify an optional string label for each stage. This is used in the<br />
output to help identify stagewise information.<br />
Note: The source variable list has the same contents across steps of the Wizard. In<br />
other words, variables removed from the source list in a particular step are removed<br />
from the list in all steps. Variables returned to the source list show up in all steps.<br />
Tree Controls for Navigating the Analysis Wizard<br />
At the left side of each step of the Analysis Wizard is an outline of all the steps. You<br />
can navigate the Wizard by clicking on the name of an enabled step in the outline.<br />
Steps are enabled as long as all previous steps are valid; that is, as long as each<br />
previous step has been given the minimum required specifications for that step. For<br />
more information on why a given step may be invalid, see the Help for individual<br />
steps.<br />
Analysis Preparation Wizard: Estimation Method<br />
Figure 3-3<br />
Analysis Preparation Wizard, Estimation Method step (stage 1)<br />
This step allows you to specify an estimation method for the stage.<br />
WR (sampling with replacement). WR estimation does not include a correction for<br />
sampling from a finite population, since it assumes that the sample was taken from<br />
an infinite population. When the population for the stage is much larger than the<br />
sample, this is a reasonable assumption. WR estimation can be specified only in the<br />
final stage of a design; the Wizard will not allow you to add another stage if you<br />
select WR estimation.<br />
Equal WOR (equal probability sampling without replacement). Equal WOR estimation<br />
includes the finite population correction and assumes that units are sampled with<br />
equal probability. Equal WOR can be specified in any stage of a design.<br />
Unequal WOR (unequal probability sampling without replacement). In addition to using<br />
the finite population correction, Unequal WOR accounts for sampling units (usually<br />
clusters) selected with unequal probability. This estimation method is available<br />
only in the first stage.<br />
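The effect of the finite population correction can be seen in a small Python sketch for the variance of a sample mean under equal-probability sampling. It uses the textbook formulas s²/n for WR estimation and (1 − n/N)·s²/n for Equal WOR estimation; these are offered as an illustration of the idea and are not confirmed to be the exact computations SPSS performs. The function name and arguments are hypothetical.

```python
import statistics


def variance_of_mean(sample, population_size=None):
    """Estimated variance of the sample mean.

    Without population_size the WR formula s^2 / n is used.
    With population_size the finite population correction (1 - n/N) is applied.
    """
    n = len(sample)
    s2 = statistics.variance(sample)  # sample variance, n - 1 denominator
    var = s2 / n
    if population_size is not None:
        var *= 1 - n / population_size
    return var


sample = [2.0, 4.0, 4.0, 6.0]
wr = variance_of_mean(sample)                       # treats population as infinite
wor = variance_of_mean(sample, population_size=8)   # half the population was sampled
```

Here half of the population is in the sample, so the correction halves the variance; when the population is much larger than the sample, the factor is close to 1 and the two estimates nearly coincide.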
Analysis Preparation Wizard: Size<br />
Figure 3-4<br />
Analysis Preparation Wizard, Size step (stage 1)<br />
This step is used to specify inclusion probabilities or population sizes for the current<br />
stage. Sizes can be fixed or can vary across strata. For the purpose of specifying sizes,<br />
clusters specified in previous stages can be used to define strata.<br />
Units. You can specify exact population sizes or the probabilities with which units<br />
were sampled.<br />
• Value. A single value is applied to all strata. If Population Sizes is selected as the<br />
unit metric, you should enter a non-negative integer. If Inclusion Probabilities is<br />
selected, you should enter a value between 0 and 1, inclusive.<br />
• Unequal values for strata. Allows you to enter size values on a per-stratum basis<br />
via the Define Unequal Sizes dialog box.<br />
• Read values from variable. Allows you to select a numeric variable that contains<br />
size values for strata.<br />
Define Unequal Sizes<br />
Figure 3-5<br />
Define Unequal Sizes dialog box<br />
The Define Unequal Sizes dialog box allows you to enter sizes on a per-stratum basis.<br />
Size Specifications grid. The grid displays the cross-classifications of up to five strata<br />
or cluster variables, with one stratum/cluster combination per row. Eligible grid variables<br />
include all stratification variables from the current and previous stages and all cluster<br />
variables from previous stages. Variables can be reordered within the grid or moved<br />
to the Exclude list. Enter sizes in the rightmost column. Click Labels or Values<br />
to toggle the display of value labels and data values for stratification and cluster<br />
variables in the grid cells. Cells that contain unlabeled values always show values.<br />
Click Refresh Strata to repopulate the grid with each combination of labeled data<br />
values for variables in the grid.<br />
Exclude. To specify sizes for a subset of stratum/cluster combinations, move one or<br />
more variables to the Exclude list. These variables are not used to define sample sizes.<br />
Analysis Preparation Wizard: Stage Summary<br />
Figure 3-6<br />
Analysis Preparation Wizard, Plan Summary step (stage 1)<br />
This is the last step within each stage, providing a summary of the analysis design<br />
specifications through the current stage. From here, you can either proceed to the<br />
next stage (creating it if necessary) or save the analysis specifications.<br />
If you cannot add another stage, it is likely because:<br />
• No cluster variable was specified in the Design Variables step.<br />
• You selected WR estimation in the Estimation Method step.<br />
• This is the third stage of the analysis, and the Wizard supports a maximum of<br />
three stages.<br />
Analysis Preparation Wizard: Finish<br />
Figure 3-7<br />
Analysis Preparation Wizard, Finish step<br />
This is the final step. You can save the plan file now or paste your selections to<br />
a syntax window.<br />
When making changes to stages in the existing plan file, you can save the edited<br />
plan to a new file or overwrite the existing file. When adding stages without making<br />
changes to existing stages, the Wizard automatically overwrites the existing plan file.<br />
If you want to save the plan to a new file, choose to Paste the syntax generated by the<br />
Wizard into a syntax window and change the filename in the syntax commands.<br />
Modifying an Existing Analysis Plan<br />
From the menus choose:<br />
Analyze<br />
Complex Samples<br />
Prepare for Analysis...<br />
Select Edit a plan file, and choose a plan filename to which you will save the analysis<br />
plan.<br />
Click Next to continue through the Wizard.<br />
Review the analysis plan in the Plan Summary step, and then click Next.<br />
Subsequent steps are largely the same as for a new design. For more information, see<br />
the Help for individual steps.<br />
Navigate to the Finish step, and specify a new name for the edited plan file, or choose<br />
to overwrite the existing plan file.<br />
Optionally, you can:<br />
• Remove stages from the plan.<br />
Analys<strong>is</strong> Preparation Wizard: Plan Summary<br />
Figure 3-8<br />
Analys<strong>is</strong> Preparation Wizard, Plan Summary step<br />
Th<strong>is</strong> step allows you to review the analys<strong>is</strong> plan and remove stages from the plan.<br />
Remove Stages. You can remove stages 2 and 3 from a mult<strong>is</strong>tage design. Since a plan<br />
must have at least one stage, you can <strong>ed</strong>it but not remove stage 1 from the design.
Chapter 4<br />
<strong>Complex</strong> Samples Plan<br />
<strong>Complex</strong> Samples analys<strong>is</strong> proc<strong>ed</strong>ures require analys<strong>is</strong> specifications from an<br />
analys<strong>is</strong> or sample plan file in order to provide valid results.<br />
Figure 4-1<br />
<strong>Complex</strong> Samples Plan dialog box<br />
Plan. Specify the path of an analys<strong>is</strong> or sample plan file.<br />
Joint Probabilities. In order to use Unequal WOR estimation for clusters drawn<br />
using a PPS WOR method, you ne<strong>ed</strong> to specify a separate file containing the joint<br />
probabilities. Th<strong>is</strong> file <strong>is</strong> creat<strong>ed</strong> by the Sampling Wizard during sampling.<br />
Chapter 5<br />
<strong>Complex</strong> Samples Frequencies<br />
The <strong>Complex</strong> Samples Frequencies proc<strong>ed</strong>ure produces frequency tables for select<strong>ed</strong><br />
variables and d<strong>is</strong>plays univariate stat<strong>is</strong>tics. Optionally, you can request stat<strong>is</strong>tics by<br />
subgroups, defin<strong>ed</strong> by one or more categorical variables.<br />
Example. Using the <strong>Complex</strong> Samples Frequencies proc<strong>ed</strong>ure, you can obtain<br />
univariate tabular stat<strong>is</strong>tics for vitamin usage among U.S. citizens, bas<strong>ed</strong> on the<br />
results of the National Health Interview Survey (NHIS) and with an appropriate<br />
analys<strong>is</strong> plan for th<strong>is</strong> public use data.<br />
Stat<strong>is</strong>tics. The proc<strong>ed</strong>ure produces estimates of cell population sizes and table<br />
percentages, plus standard errors, confidence intervals, coefficients of variation,<br />
design effects, square roots of design effects, cumulative values, and unweight<strong>ed</strong><br />
counts for e<strong>ac</strong>h estimate. Additionally, chi-square and likelihood ratio stat<strong>is</strong>tics are<br />
comput<strong>ed</strong> for the test of equal cell proportions.<br />
Data. Variables for which frequency tables are produc<strong>ed</strong> should be categorical.<br />
Subpopulation variables can be string or numeric, but should be categorical.<br />
Assumptions. The cases in the data file represent a sample from a complex design<br />
that should be analyz<strong>ed</strong> <strong>ac</strong>cording to the specifications in the file select<strong>ed</strong> in the<br />
<strong>Complex</strong> Samples Plan dialog box.<br />
Obtaining <strong>Complex</strong> Samples Frequencies<br />
► From the menus choose:<br />
Analyze<br />
<strong>Complex</strong> Samples<br />
Frequencies...<br />
► Select a plan file and optionally select a custom joint probabilities file.<br />
► Click Continue.<br />
Figure 5-1<br />
Frequencies dialog box<br />
► Select at least one frequency variable.<br />
Optionally, you can:<br />
• Specify variables to define subpopulations. Stat<strong>is</strong>tics are comput<strong>ed</strong> separately for<br />
e<strong>ac</strong>h subpopulation.
<strong>Complex</strong> Samples Frequencies Stat<strong>is</strong>tics<br />
Figure 5-2<br />
Frequencies Stat<strong>is</strong>tics dialog box<br />
Cells. Th<strong>is</strong> group allows you to request estimates of the cell population sizes and<br />
table percentages.<br />
Stat<strong>is</strong>tics. Th<strong>is</strong> group produces stat<strong>is</strong>tics associat<strong>ed</strong> with the population size or table<br />
percentage.<br />
• Standard error. The standard error of the estimate.<br />
• Confidence interval. A confidence interval for the estimate, using the specifi<strong>ed</strong><br />
level.<br />
• Coefficient of variation. The ratio of the standard error of the estimate to the<br />
estimate.<br />
• Unweight<strong>ed</strong> count. The number of units us<strong>ed</strong> to compute the estimate.<br />
• Design effect. The ratio of the variance of the estimate to the variance obtain<strong>ed</strong><br />
by assuming that the sample <strong>is</strong> a simple random sample. Th<strong>is</strong> <strong>is</strong> a measure of<br />
the effect of specifying a complex design, where values further from 1 indicate<br />
greater effects.<br />
• Square root of design effect. Th<strong>is</strong> <strong>is</strong> a measure of the effect of specifying a complex<br />
design, where values further from 1 indicate greater effects.<br />
• Cumulative values. The cumulative estimate through e<strong>ac</strong>h value of the variable.
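The definitions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the procedure's implementation: the function names and all numbers are hypothetical, and the two standard errors are assumed to be supplied from elsewhere (one computed under the complex design, one computed as if the data were a simple random sample).

```python
import math
from itertools import accumulate

def coefficient_of_variation(estimate, standard_error):
    """Ratio of the standard error of the estimate to the estimate."""
    return standard_error / estimate

def design_effect(complex_variance, srs_variance):
    """Variance under the complex design divided by the variance obtained
    by assuming a simple random sample."""
    return complex_variance / srs_variance

# Hypothetical values, for illustration only.
estimate = 0.40      # estimated table proportion
se_complex = 0.03    # standard error under the complex design
se_srs = 0.02        # standard error assuming simple random sampling

cv = coefficient_of_variation(estimate, se_complex)
deff = design_effect(se_complex ** 2, se_srs ** 2)
deft = math.sqrt(deff)  # square root of the design effect

# Cumulative values: running total of the estimates through each category.
category_estimates = [120, 80, 50]
cumulative = list(accumulate(category_estimates))
```

A design effect above 1 means the complex design yields a larger variance than a simple random sample of the same size would; a value below 1 means a smaller one.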
Test of equal cell proportions. Th<strong>is</strong> produces chi-square and likelihood-ratio tests of<br />
the hypothes<strong>is</strong> that the categories of a variable have equal frequencies. Separate<br />
tests are perform<strong>ed</strong> for e<strong>ac</strong>h variable.<br />
<strong>Complex</strong> Samples M<strong>is</strong>sing Values<br />
Figure 5-3<br />
M<strong>is</strong>sing Values dialog box<br />
Tables. Th<strong>is</strong> group determines which cases are us<strong>ed</strong> in the analys<strong>is</strong>.<br />
• Use all available data. M<strong>is</strong>sing values are determin<strong>ed</strong> on a table-by-table bas<strong>is</strong>.<br />
Thus, the cases us<strong>ed</strong> to compute stat<strong>is</strong>tics may vary <strong>ac</strong>ross frequency or<br />
crosstabulation tables.<br />
• Use cons<strong>is</strong>tent case base. M<strong>is</strong>sing values are determin<strong>ed</strong> <strong>ac</strong>ross all variables.<br />
Thus, the cases us<strong>ed</strong> to compute stat<strong>is</strong>tics are cons<strong>is</strong>tent <strong>ac</strong>ross tables.<br />
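The difference between the two options can be illustrated with a small Python sketch. The case records, variable names, and helper functions below are hypothetical, and `None` stands in for a missing value; the sketch only shows which cases each rule keeps.

```python
# Hypothetical cases; None marks a missing value.
cases = [
    {"vitamin": "yes", "smoker": "no"},
    {"vitamin": None,  "smoker": "yes"},
    {"vitamin": "no",  "smoker": None},
    {"vitamin": "yes", "smoker": "yes"},
]
variables = ["vitamin", "smoker"]

def all_available(cases, variable):
    """Table-by-table rule: keep cases that are valid for this table's variable."""
    return [c for c in cases if c[variable] is not None]

def consistent_base(cases, variables):
    """Consistent rule: keep only cases that are valid on every variable."""
    return [c for c in cases if all(c[v] is not None for v in variables)]

# Number of cases behind each frequency table under the first rule.
per_table_counts = {v: len(all_available(cases, v)) for v in variables}

# Number of cases shared by all tables under the second rule.
consistent_count = len(consistent_base(cases, variables))
```

In this example each table uses three cases under the first rule, but they are not the same three cases; under the second rule both tables use the same two cases.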
Categorical Design Variables. Th<strong>is</strong> group determines whether user-m<strong>is</strong>sing values are<br />
valid or invalid.
<strong>Complex</strong> Samples Options<br />
Figure 5-4<br />
Options dialog box<br />
Subpopulation D<strong>is</strong>play. You can choose to have subpopulations d<strong>is</strong>play<strong>ed</strong> in the same<br />
table or in separate tables.
Chapter 6<br />
<strong>Complex</strong> Samples Descriptives<br />
The <strong>Complex</strong> Samples Descriptives proc<strong>ed</strong>ure d<strong>is</strong>plays univariate summary stat<strong>is</strong>tics<br />
for several variables. Optionally, you can request stat<strong>is</strong>tics by subgroups, defin<strong>ed</strong><br />
by one or more categorical variables.<br />
Example. Using the <strong>Complex</strong> Samples Descriptives proc<strong>ed</strong>ure, you can obtain<br />
univariate descriptive stat<strong>is</strong>tics for the <strong>ac</strong>tivity levels of U.S. citizens bas<strong>ed</strong> on the<br />
results of the National Health Interview Survey (NHIS) and with an appropriate<br />
analys<strong>is</strong> plan for th<strong>is</strong> public use data.<br />
Stat<strong>is</strong>tics. The proc<strong>ed</strong>ure produces means and sums, plus t tests, standard errors,<br />
confidence intervals, coefficients of variation, unweight<strong>ed</strong> counts, population sizes,<br />
design effects, and square roots of design effects for e<strong>ac</strong>h estimate.<br />
Data. Measures should be scale variables. Subpopulation variables can be string<br />
or numeric but should be categorical.<br />
Assumptions. The cases in the data file represent a sample from a complex design<br />
that should be analyz<strong>ed</strong> <strong>ac</strong>cording to the specifications in the file select<strong>ed</strong> in the<br />
<strong>Complex</strong> Samples Plan dialog box.<br />
Obtaining <strong>Complex</strong> Samples Descriptives<br />
► From the menus choose:<br />
Analyze<br />
<strong>Complex</strong> Samples<br />
Descriptives...<br />
► Select a plan file, and optionally select a custom joint probabilities file.<br />
► Click Continue.<br />
Figure 6-1<br />
Descriptives dialog box<br />
► Select at least one measure variable.<br />
Optionally, you can:<br />
• Specify variables to define subpopulations. Stat<strong>is</strong>tics are comput<strong>ed</strong> separately for<br />
e<strong>ac</strong>h subpopulation.
<strong>Complex</strong> Samples Descriptives Stat<strong>is</strong>tics<br />
Figure 6-2<br />
Descriptives Stat<strong>is</strong>tics dialog box<br />
Summaries. Th<strong>is</strong> group allows you to request estimates of the means and sums of the<br />
measure variables. Additionally, you can request t tests of the estimates against<br />
a specifi<strong>ed</strong> value.<br />
Stat<strong>is</strong>tics. Th<strong>is</strong> group produces stat<strong>is</strong>tics associat<strong>ed</strong> with the mean or sum.<br />
• Standard error. The standard error of the estimate.<br />
• Confidence interval. A confidence interval for the estimate, using the specifi<strong>ed</strong><br />
level.<br />
• Coefficient of variation. The ratio of the standard error of the estimate to the<br />
estimate.<br />
• Unweight<strong>ed</strong> count. The number of units us<strong>ed</strong> to compute the estimate.<br />
• Population size. The estimat<strong>ed</strong> number of units in the population.<br />
• Design effect. The ratio of the variance of the estimate to the variance obtain<strong>ed</strong><br />
by assuming that the sample <strong>is</strong> a simple random sample. Th<strong>is</strong> <strong>is</strong> a measure of<br />
the effect of specifying a complex design, where values further from 1 indicate<br />
greater effects.<br />
• Square root of design effect. Th<strong>is</strong> <strong>is</strong> a measure of the effect of specifying a complex<br />
design, where values further from 1 indicate greater effects.
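As a rough illustration of the point estimates described above, the following Python sketch computes a population size, sum, and mean from final sampling weights, and a t statistic against a specified value. It assumes the textbook weighted estimators; the function names and data are hypothetical, and the standard error is passed in rather than computed, because it depends on the sampling design recorded in the plan file.

```python
def weighted_summaries(values, weights):
    """Return (population size, sum, mean) using final sampling weights."""
    population_size = sum(weights)                       # estimated number of units
    total = sum(w * y for y, w in zip(values, weights))  # estimated population sum
    mean = total / population_size                       # estimated population mean
    return population_size, total, mean

def t_statistic(estimate, test_value, standard_error):
    """t statistic for testing the estimate against a specified value."""
    return (estimate - test_value) / standard_error

# Hypothetical sample of three cases, for illustration only.
values = [2.0, 4.0, 6.0]
weights = [100.0, 200.0, 100.0]

pop, total, mean = weighted_summaries(values, weights)

# The standard error here is an assumed input, not derived from the data.
t = t_statistic(mean, test_value=3.5, standard_error=0.25)
```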
<strong>Complex</strong> Samples Descriptives M<strong>is</strong>sing Values<br />
Figure 6-3<br />
Descriptives M<strong>is</strong>sing Values dialog box<br />
Stat<strong>is</strong>tics for Measure Variables. Th<strong>is</strong> group determines which cases are us<strong>ed</strong> in the<br />
analys<strong>is</strong>.<br />
• Use all available data. M<strong>is</strong>sing values are determin<strong>ed</strong> on a variable-by-variable<br />
bas<strong>is</strong>. Thus, the cases us<strong>ed</strong> to compute stat<strong>is</strong>tics may vary <strong>ac</strong>ross measure variables.<br />
• Ensure cons<strong>is</strong>tent case base. M<strong>is</strong>sing values are determin<strong>ed</strong> <strong>ac</strong>ross all variables.<br />
Thus, the cases us<strong>ed</strong> to compute stat<strong>is</strong>tics are cons<strong>is</strong>tent.<br />
Categorical Design Variables. Th<strong>is</strong> group determines whether user-m<strong>is</strong>sing values are<br />
valid or invalid.<br />
<strong>Complex</strong> Samples Options<br />
Figure 6-4<br />
Options dialog box
Subpopulation D<strong>is</strong>play. You can choose to have subpopulations d<strong>is</strong>play<strong>ed</strong> in the same<br />
table or in separate tables.
Chapter 7<br />
<strong>Complex</strong> Samples Crosstabs<br />
The <strong>Complex</strong> Samples Crosstabs proc<strong>ed</strong>ure produces crosstabulation tables for pairs<br />
of select<strong>ed</strong> variables and d<strong>is</strong>plays two-way stat<strong>is</strong>tics. Optionally, you can request<br />
stat<strong>is</strong>tics by subgroups, defin<strong>ed</strong> by one or more categorical variables.<br />
Example. Using the <strong>Complex</strong> Samples Crosstabs proc<strong>ed</strong>ure, you can obtain<br />
cross-classification stat<strong>is</strong>tics for smoking frequency by vitamin usage of U.S. citizens,<br />
bas<strong>ed</strong> on the results of the National Health Interview Survey (NHIS) and with an<br />
appropriate analys<strong>is</strong> plan for th<strong>is</strong> public-use data.<br />
Stat<strong>is</strong>tics. The proc<strong>ed</strong>ure produces estimates of cell population sizes and row, column,<br />
and table percentages, plus standard errors, confidence intervals, coefficients of<br />
variation, expect<strong>ed</strong> values, design effects, square roots of design effects, residuals,<br />
adjust<strong>ed</strong> residuals, and unweight<strong>ed</strong> counts for e<strong>ac</strong>h estimate. The odds ratio, relative<br />
r<strong>is</strong>k, and r<strong>is</strong>k difference are comput<strong>ed</strong> for 2-by-2 tables. Additionally, Pearson and<br />
likelihood-ratio stat<strong>is</strong>tics are comput<strong>ed</strong> for the test of independence of the row and<br />
column variables.<br />
Data. Row and column variables should be categorical. Subpopulation variables can<br />
be string or numeric but should be categorical.<br />
Assumptions. The cases in the data file represent a sample from a complex design<br />
that should be analyz<strong>ed</strong> <strong>ac</strong>cording to the specifications in the file select<strong>ed</strong> in the<br />
<strong>Complex</strong> Samples Plan dialog box.<br />
Obtaining <strong>Complex</strong> Samples Crosstabs<br />
► From the menus choose:<br />
Analyze<br />
<strong>Complex</strong> Samples<br />
Crosstabs...<br />
► Select a plan file and, optionally, select a custom joint probabilities file.<br />
► Click Continue.<br />
Figure 7-1<br />
Crosstabs dialog box<br />
► Select at least one row variable and one column variable.<br />
Optionally, you can:<br />
• Specify variables to define subpopulations. Stat<strong>is</strong>tics are comput<strong>ed</strong> separately for<br />
e<strong>ac</strong>h subpopulation.
<strong>Complex</strong> Samples Crosstabs Stat<strong>is</strong>tics<br />
Figure 7-2<br />
Crosstabs Stat<strong>is</strong>tics dialog box<br />
Cells. Th<strong>is</strong> group allows you to request estimates of the cell population size and row,<br />
column, and table percentages.<br />
Stat<strong>is</strong>tics. Th<strong>is</strong> group produces stat<strong>is</strong>tics associat<strong>ed</strong> with the population size and row,<br />
column, and table percentages.<br />
• Standard error. The standard error of the estimate.<br />
• Confidence interval. A confidence interval for the estimate, using the specifi<strong>ed</strong><br />
level.<br />
• Coefficient of variation. The ratio of the standard error of the estimate to the<br />
estimate.<br />
• Expect<strong>ed</strong> values. The expect<strong>ed</strong> value of the estimate, under the hypothes<strong>is</strong> of<br />
independence of the row and column variable.<br />
• Unweight<strong>ed</strong> count. The number of units us<strong>ed</strong> to compute the estimate.
• Design effect. The ratio of the variance of the estimate to the variance obtain<strong>ed</strong><br />
by assuming that the sample <strong>is</strong> a simple random sample. Th<strong>is</strong> <strong>is</strong> a measure of<br />
the effect of specifying a complex design, where values further from 1 indicate<br />
greater effects.<br />
• Square root of design effect. Th<strong>is</strong> <strong>is</strong> a measure of the effect of specifying a complex<br />
design, where values further from 1 indicate greater effects.<br />
• Residuals. The residual <strong>is</strong> the observ<strong>ed</strong> value minus the expect<strong>ed</strong> value. The expect<strong>ed</strong> value <strong>is</strong> the number of cases that you would expect in the<br />
cell if there were no relationship between the two variables. A positive residual<br />
indicates that there are more cases in the cell than there would be if the row and<br />
column variables were independent.<br />
• Adjust<strong>ed</strong> residuals. The residual for a cell (observ<strong>ed</strong> minus expect<strong>ed</strong> value)<br />
divid<strong>ed</strong> by an estimate of its standard error. The resulting standardiz<strong>ed</strong> residual <strong>is</strong><br />
express<strong>ed</strong> in standard deviation units above or below the mean.<br />
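The expected values and residuals described above can be sketched as follows. The function name and the table are hypothetical; the cell values are assumed to be estimated cell population sizes, and the expected value under independence is taken as row total times column total divided by the grand total. The adjusted residual is not computed here, because its standard error depends on the sampling design.

```python
def expected_and_residuals(table):
    """Return expected values under independence and residuals
    (observed minus expected) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    residuals = [
        [obs - exp for obs, exp in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    return expected, residuals

# Hypothetical 2-by-2 table of estimated cell population sizes.
observed = [[30.0, 20.0],
            [20.0, 30.0]]

expected, residuals = expected_and_residuals(observed)
```

Cells with a positive residual hold more cases than independence would predict; cells with a negative residual hold fewer.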
Summaries for 2-by-2 Tables. Th<strong>is</strong> group produces stat<strong>is</strong>tics for tables in which the row<br />
and column variable e<strong>ac</strong>h have two categories. E<strong>ac</strong>h <strong>is</strong> a measure of the strength of<br />
the association between the presence of a f<strong>ac</strong>tor and the occurrence of an event.<br />
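• Odds ratio. The ratio of the odds of an event in the presence of the f<strong>ac</strong>tor to the odds of the event in its absence. The odds ratio can be us<strong>ed</strong> as an estimate of relative r<strong>is</strong>k when the<br />
occurrence of the event <strong>is</strong> rare.<br />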
• Relative r<strong>is</strong>k. The ratio of the r<strong>is</strong>k of an event in the presence of the f<strong>ac</strong>tor to the<br />
r<strong>is</strong>k of the event in the absence of the f<strong>ac</strong>tor.<br />
• R<strong>is</strong>k difference. The difference between the r<strong>is</strong>k of an event in the presence of the<br />
f<strong>ac</strong>tor and the r<strong>is</strong>k of the event in the absence of the f<strong>ac</strong>tor.<br />
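A minimal Python sketch of the three summaries follows, using the usual textbook formulas for a 2-by-2 table. The function name, the cell labels `a` through `d`, and the counts are hypothetical; standard errors and confidence intervals, which depend on the sampling design, are not shown.

```python
def two_by_two_summaries(a, b, c, d):
    """Summaries for a 2-by-2 table.

    a: factor present, event occurred     b: factor present, no event
    c: factor absent,  event occurred     d: factor absent,  no event
    """
    risk_present = a / (a + b)  # risk of the event when the factor is present
    risk_absent = c / (c + d)   # risk of the event when the factor is absent
    return {
        "odds_ratio": (a * d) / (b * c),
        "relative_risk": risk_present / risk_absent,
        "risk_difference": risk_present - risk_absent,
    }

# Hypothetical estimated cell sizes, for illustration only.
summaries = two_by_two_summaries(a=20, b=80, c=10, d=90)
```

In this example the risk is 0.2 with the factor and 0.1 without it, so the relative risk is 2 while the odds ratio is somewhat larger.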
Test of independence of rows and columns. Th<strong>is</strong> produces chi-square and<br />
likelihood-ratio tests of the hypothes<strong>is</strong> that a row and column variable are<br />
independent. Separate tests are perform<strong>ed</strong> for e<strong>ac</strong>h pair of variables.
<strong>Complex</strong> Samples M<strong>is</strong>sing Values<br />
Figure 7-3<br />
M<strong>is</strong>sing Values dialog box<br />
Tables. Th<strong>is</strong> group determines which cases are us<strong>ed</strong> in the analys<strong>is</strong>.<br />
• Use all available data. M<strong>is</strong>sing values are determin<strong>ed</strong> on a table-by-table bas<strong>is</strong>.<br />
Thus, the cases us<strong>ed</strong> to compute stat<strong>is</strong>tics may vary <strong>ac</strong>ross frequency or<br />
crosstabulation tables.<br />
• Use cons<strong>is</strong>tent case base. M<strong>is</strong>sing values are determin<strong>ed</strong> <strong>ac</strong>ross all variables.<br />
Thus, the cases us<strong>ed</strong> to compute stat<strong>is</strong>tics are cons<strong>is</strong>tent <strong>ac</strong>ross tables.<br />
Categorical Design Variables. Th<strong>is</strong> group determines whether user-m<strong>is</strong>sing values are<br />
valid or invalid.<br />
<strong>Complex</strong> Samples Options<br />
Figure 7-4<br />
Options dialog box
Subpopulation D<strong>is</strong>play. You can choose to have subpopulations d<strong>is</strong>play<strong>ed</strong> in the same<br />
table or in separate tables.
Chapter 8<br />
<strong>Complex</strong> Samples Ratios<br />
The <strong>Complex</strong> Samples Ratios proc<strong>ed</strong>ure d<strong>is</strong>plays univariate summary stat<strong>is</strong>tics for<br />
ratios of variables. Optionally, you can request stat<strong>is</strong>tics by subgroups, defin<strong>ed</strong><br />
by one or more categorical variables.<br />
Example. Using the <strong>Complex</strong> Samples Ratios proc<strong>ed</strong>ure, you can obtain descriptive<br />
stat<strong>is</strong>tics for the ratio of current property value to last assess<strong>ed</strong> value, bas<strong>ed</strong> on the<br />
results of a statewide survey carri<strong>ed</strong> out <strong>ac</strong>cording to a complex design and with an<br />
appropriate analys<strong>is</strong> plan for the data.<br />
Stat<strong>is</strong>tics. The proc<strong>ed</strong>ure produces ratio estimates, t tests, standard errors, confidence<br />
intervals, coefficients of variation, unweight<strong>ed</strong> counts, population sizes, design<br />
effects, and square roots of design effects.<br />
Data. Numerators and denominators should be positive-valu<strong>ed</strong> scale variables.<br />
Subpopulation variables can be string or numeric but should be categorical.<br />
Assumptions. The cases in the data file represent a sample from a complex design<br />
that should be analyz<strong>ed</strong> <strong>ac</strong>cording to the specifications in the file select<strong>ed</strong> in the<br />
<strong>Complex</strong> Samples Plan dialog box.<br />
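To make the idea of a ratio estimate concrete, here is a small Python sketch. It assumes the standard survey ratio estimator (the weighted total of the numerator variable divided by the weighted total of the denominator variable); the function name, variable names, and data are hypothetical, and this is not presented as the procedure's exact computation.

```python
def ratio_estimate(numerators, denominators, weights):
    """Weighted total of the numerator divided by the weighted total
    of the denominator, using final sampling weights."""
    weighted_num = sum(w * y for y, w in zip(numerators, weights))
    weighted_den = sum(w * x for x, w in zip(denominators, weights))
    return weighted_num / weighted_den

# Hypothetical property data (in thousands), for illustration only.
current_value = [220.0, 150.0, 310.0]
assessed_value = [200.0, 150.0, 250.0]
weights = [10.0, 20.0, 10.0]

ratio = ratio_estimate(current_value, assessed_value, weights)
```

Note that this is a ratio of totals, not the average of the per-case ratios; the two generally differ.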
Obtaining <strong>Complex</strong> Samples Ratios<br />
► From the menus choose:<br />
Analyze<br />
<strong>Complex</strong> Samples<br />
Ratios...<br />
► Select a plan file and, optionally, select a custom joint probabilities file.<br />
► Click Continue.<br />
Figure 8-1<br />
Ratios dialog box<br />
► Select at least one numerator variable and one denominator variable.<br />
Optionally, you can:<br />
• Specify variables to define subgroups for which stat<strong>is</strong>tics are produc<strong>ed</strong>.
<strong>Complex</strong> Samples Ratios Stat<strong>is</strong>tics<br />
Figure 8-2<br />
Ratios Stat<strong>is</strong>tics dialog box<br />
Stat<strong>is</strong>tics. Th<strong>is</strong> group produces stat<strong>is</strong>tics associat<strong>ed</strong> with the ratio estimate.<br />
• Standard error. The standard error of the estimate.<br />
• Confidence interval. A confidence interval for the estimate, using the specifi<strong>ed</strong><br />
level.<br />
• Coefficient of variation. The ratio of the standard error of the estimate to the<br />
estimate.<br />
• Unweight<strong>ed</strong> count. The number of units us<strong>ed</strong> to compute the estimate.<br />
• Population size. The estimat<strong>ed</strong> number of units in the population.<br />
• Design effect. The ratio of the variance of the estimate to the variance obtain<strong>ed</strong><br />
by assuming that the sample <strong>is</strong> a simple random sample. Th<strong>is</strong> <strong>is</strong> a measure of<br />
the effect of specifying a complex design, where values further from 1 indicate<br />
greater effects.<br />
• Square root of design effect. Th<strong>is</strong> <strong>is</strong> a measure of the effect of specifying a complex<br />
design, where values further from 1 indicate greater effects.<br />
t-test. You can request t tests of the estimates against a specifi<strong>ed</strong> value.
<strong>Complex</strong> Samples Ratios M<strong>is</strong>sing Values<br />
Figure 8-3<br />
Ratios M<strong>is</strong>sing Values dialog box<br />
Ratios. Th<strong>is</strong> group determines which cases are us<strong>ed</strong> in the analys<strong>is</strong>.<br />
• Use all available data. M<strong>is</strong>sing values are determin<strong>ed</strong> on a ratio-by-ratio bas<strong>is</strong>.<br />
Thus, the cases us<strong>ed</strong> to compute stat<strong>is</strong>tics may vary <strong>ac</strong>ross numerator-denominator<br />
pairs.<br />
• Ensure cons<strong>is</strong>tent case base. M<strong>is</strong>sing values are determin<strong>ed</strong> <strong>ac</strong>ross all variables.<br />
Thus, the cases us<strong>ed</strong> to compute stat<strong>is</strong>tics are cons<strong>is</strong>tent.<br />
Categorical Design Variables. Th<strong>is</strong> group determines whether user-m<strong>is</strong>sing values are<br />
valid or invalid.<br />
<strong>Complex</strong> Samples Options<br />
Figure 8-4<br />
Options dialog box
Subpopulation D<strong>is</strong>play. You can choose to have subpopulations d<strong>is</strong>play<strong>ed</strong> in the same<br />
table or in separate tables.
Chapter 9<br />
<strong>Complex</strong> Samples General Linear Model<br />
The <strong>Complex</strong> Samples General Linear Model (CSGLM) proc<strong>ed</strong>ure performs linear<br />
regression analys<strong>is</strong>, as well as analys<strong>is</strong> of variance and covariance, for samples<br />
drawn by complex sampling methods. Optionally, you can request analyses for<br />
a subpopulation.<br />
Example. A grocery store chain survey<strong>ed</strong> a set of customers concerning their<br />
purchasing habits, <strong>ac</strong>cording to a complex design. Given the survey results and<br />
how much e<strong>ac</strong>h customer spent in the previous month, the store wants to see if the<br />
frequency with which customers shop <strong>is</strong> relat<strong>ed</strong> to the amount they spend in a month,<br />
controlling for the gender of the customer and incorporating the sampling design.<br />
Stat<strong>is</strong>tics. The proc<strong>ed</strong>ure produces estimates, standard errors, confidence intervals,<br />
t tests, design effects, and square roots of design effects for model parameters, as<br />
well as the correlations and covariances between parameter estimates. Measures of<br />
model fit and descriptive stat<strong>is</strong>tics for the dependent and independent variables are<br />
also available. Additionally, you can request estimat<strong>ed</strong> marginal means for levels of<br />
model f<strong>ac</strong>tors and f<strong>ac</strong>tor inter<strong>ac</strong>tions.<br />
Data. The dependent variable <strong>is</strong> quantitative. F<strong>ac</strong>tors are categorical. Covariates<br />
are quantitative variables that are relat<strong>ed</strong> to the dependent variable. Subpopulation<br />
variables can be string or numeric but should be categorical.<br />
Assumptions. The cases in the data file represent a sample from a complex design<br />
that should be analyz<strong>ed</strong> <strong>ac</strong>cording to the specifications in the file select<strong>ed</strong> in the<br />
<strong>Complex</strong> Samples Plan dialog box.<br />
Obtaining a <strong>Complex</strong> Samples General Linear Model<br />
► From the menus choose:<br />
Analyze<br />
<strong>Complex</strong> Samples<br />
General Linear Model...<br />
► Select a plan file and, optionally, select a custom joint probabilities file.<br />
► Click Continue.<br />
Figure 9-1<br />
General Linear Model dialog box<br />
► Select a dependent variable.
Optionally, you can:<br />
• Select variables for F<strong>ac</strong>tors and Covariates, as appropriate for your data.<br />
• Specify a variable to define a subpopulation. The analys<strong>is</strong> <strong>is</strong> perform<strong>ed</strong> only for<br />
the select<strong>ed</strong> category of the subpopulation variable.<br />
Figure 9-2<br />
Model dialog box<br />
Specify Model Effects. By default, the proc<strong>ed</strong>ure builds a main-effects model using the<br />
f<strong>ac</strong>tors and covariates specifi<strong>ed</strong> in the main dialog box. Alternatively, you can build a<br />
custom model that includes inter<strong>ac</strong>tion effects and nest<strong>ed</strong> terms.
Non-Nest<strong>ed</strong> Terms<br />
For the select<strong>ed</strong> f<strong>ac</strong>tors and covariates:<br />
Inter<strong>ac</strong>tion. Creates the highest-level inter<strong>ac</strong>tion term for all select<strong>ed</strong> variables.<br />
Main effects. Creates a main-effects term for e<strong>ac</strong>h variable select<strong>ed</strong>.<br />
All 2-way. Creates all possible two-way inter<strong>ac</strong>tions of the select<strong>ed</strong> variables.<br />
All 3-way. Creates all possible three-way inter<strong>ac</strong>tions of the select<strong>ed</strong> variables.<br />
All 4-way. Creates all possible four-way inter<strong>ac</strong>tions of the select<strong>ed</strong> variables.<br />
All 5-way. Creates all possible five-way inter<strong>ac</strong>tions of the select<strong>ed</strong> variables.<br />
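The term types listed above amount to enumerating combinations of the selected variables. The following Python sketch shows this with the standard `itertools.combinations` function; the helper name and the variable names `A`, `B`, `C` are hypothetical, and terms are written with `*` as in the interaction notation used in this chapter.

```python
from itertools import combinations

def k_way_interactions(variables, k):
    """All possible k-way interaction terms among the selected variables."""
    return ["*".join(combo) for combo in combinations(variables, k)]

selected = ["A", "B", "C"]  # hypothetical selected factors and covariates

main_effects = k_way_interactions(selected, 1)  # one term per variable
all_two_way = k_way_interactions(selected, 2)   # every pair of variables
interaction = "*".join(selected)                # highest-level interaction term
```

With three selected variables there are three main-effects terms, three two-way interactions, and a single three-way term, which is also the highest-level interaction.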
Nest<strong>ed</strong> Terms<br />
You can build nest<strong>ed</strong> terms for your model in th<strong>is</strong> proc<strong>ed</strong>ure. Nest<strong>ed</strong> terms are<br />
useful for modeling the effect of a f<strong>ac</strong>tor or covariate whose values do not inter<strong>ac</strong>t<br />
with the levels of another f<strong>ac</strong>tor. For example, a grocery store chain may follow the<br />
spending habits of its customers at several store locations. Since e<strong>ac</strong>h customer<br />
frequents only one of these locations, the Customer effect can be said to be nest<strong>ed</strong><br />
within the Store location effect.<br />
Additionally, you can include inter<strong>ac</strong>tion effects or add multiple levels of nesting<br />
to the nest<strong>ed</strong> term.<br />
Limitations. Nest<strong>ed</strong> terms have the following restrictions:<br />
• All f<strong>ac</strong>tors within an inter<strong>ac</strong>tion must be unique. Thus, if A <strong>is</strong> a f<strong>ac</strong>tor, then<br />
specifying A*A <strong>is</strong> invalid.<br />
• All f<strong>ac</strong>tors within a nest<strong>ed</strong> effect must be unique. Thus, if A <strong>is</strong> a f<strong>ac</strong>tor, then<br />
specifying A(A) <strong>is</strong> invalid.<br />
• No effect can be nest<strong>ed</strong> within a covariate. Thus, if A <strong>is</strong> a f<strong>ac</strong>tor and X <strong>is</strong> a<br />
covariate, then specifying A(X) <strong>is</strong> invalid.<br />
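The three restrictions can be expressed as simple checks. The Python sketch below uses a hypothetical representation of a term with a single level of nesting: `effect` lists the names outside the parentheses, `within` lists the names inside them, and `covariates` is the set of covariate names. It illustrates the rules only and does not handle multiple levels of nesting.

```python
def has_duplicates(names):
    """True if any name appears more than once."""
    return len(set(names)) != len(names)

def nested_term_problems(effect, within, covariates):
    """Return a list of the restrictions that a term such as effect(within) violates."""
    problems = []
    factors_in_effect = [n for n in effect if n not in covariates]
    factors_within = [n for n in within if n not in covariates]

    # A*A is invalid: factors within an interaction must be unique.
    if has_duplicates(factors_in_effect) or has_duplicates(factors_within):
        problems.append("factors within an interaction must be unique")

    # A(A) is invalid: factors within a nested effect must be unique.
    if set(factors_in_effect) & set(factors_within):
        problems.append("factors within a nested effect must be unique")

    # A(X) is invalid when X is a covariate.
    if any(n in covariates for n in within):
        problems.append("no effect can be nested within a covariate")

    return problems

covariates = {"X"}  # hypothetical covariate; A and B are factors
```

For example, `nested_term_problems(["A"], ["B"], covariates)` returns an empty list, because A(B) satisfies all three restrictions.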
Intercept. The intercept <strong>is</strong> usually includ<strong>ed</strong> in the model. If you can assume the<br />
data pass through the origin, you can exclude the intercept. Even if you include the<br />
intercept in the model, you can choose to suppress stat<strong>is</strong>tics relat<strong>ed</strong> to it.
Complex Samples General Linear Model Statistics<br />
Figure 9-3<br />
General Linear Model Statistics dialog box<br />
Model Parameters. This group allows you to control the display of statistics related to<br />
the model parameters.<br />
• Estimate. Displays estimates of the coefficients.<br />
• Standard error. Displays the standard error for each coefficient estimate.<br />
• Confidence interval. Displays a confidence interval for each coefficient estimate.<br />
The confidence level for the interval is set in the Options dialog box.<br />
• t-test. Displays a t test of each coefficient estimate. The null hypothesis for each<br />
test is that the value of the coefficient is 0.<br />
• Covariances of parameter estimates. Displays an estimate of the covariance matrix<br />
for the model coefficients.<br />
• Correlations of parameter estimates. Displays an estimate of the correlation matrix<br />
for the model coefficients.
• Design effect. The ratio of the variance of the estimate to the variance obtained<br />
by assuming that the sample is a simple random sample. This is a measure of<br />
the effect of specifying a complex design, where values further from 1 indicate<br />
greater effects.<br />
• Square root of design effect. This is a measure of the effect of specifying a complex<br />
design, where values further from 1 indicate greater effects.<br />
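As a rough numerical illustration of the two design-effect statistics, the following Python sketch computes both from a pair of variance estimates. The function names and the example variances are hypothetical; the procedure itself derives these variances from the sampling plan.

```python
import math

def design_effect(var_complex: float, var_srs: float) -> float:
    """Ratio of the design-based variance to the variance under simple random sampling."""
    if var_srs <= 0:
        raise ValueError("SRS variance must be positive")
    return var_complex / var_srs

def sqrt_design_effect(var_complex: float, var_srs: float) -> float:
    """Square root of the design effect (the same ratio on the standard-error scale)."""
    return math.sqrt(design_effect(var_complex, var_srs))

# Hypothetical variance estimates for a single coefficient.
deff = design_effect(var_complex=0.09, var_srs=0.04)
deft = sqrt_design_effect(var_complex=0.09, var_srs=0.04)
print(round(deff, 2), round(deft, 2))
```

A value above 1 means the complex design yields a larger variance than a simple random sample of the same size would; a value below 1 means a smaller one.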
Model fit. Displays R² and root mean squared error statistics.<br />
Population means of dependent variable and covariates. Displays summary information<br />
about the dependent variable, covariates, and factors.<br />
Sample design information. Displays summary information about the sample,<br />
including the unweighted count and the population size.<br />
Complex Samples Hypothesis Tests<br />
Figure 9-4<br />
Hypothesis Tests dialog box
Test Statistic. This group allows you to select the type of statistic used for testing<br />
hypotheses. You can choose among F, adjusted F, chi-square, and adjusted<br />
chi-square.<br />
Sampling Degrees of Freedom. This group gives you control over the sampling design<br />
degrees of freedom used to compute p values for all test statistics. If based on the<br />
sampling design, the value is the difference between the number of primary sampling<br />
units and the number of strata in the first stage of sampling. Alternatively, you can set<br />
custom degrees of freedom by specifying a positive integer.<br />
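The design-based degrees of freedom described above can be sketched in a few lines of Python. The function name and the example design (three strata with four sampled units each) are hypothetical and only illustrate the stated rule.

```python
def sampling_degrees_of_freedom(stage1_units):
    """Number of distinct first-stage PSUs minus number of distinct first-stage strata.

    stage1_units: iterable of (stratum, psu) pairs; repeats are allowed.
    """
    units = list(stage1_units)
    strata = {stratum for stratum, _ in units}
    psus = set(units)  # a PSU is identified by its (stratum, psu) pair
    return len(psus) - len(strata)

# Hypothetical design: 3 strata, 4 sampled primary sampling units in each.
units = [(stratum, psu) for stratum in "ABC" for psu in range(4)]
df = sampling_degrees_of_freedom(units)
print(df)
```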
Adjustment for Multiple Comparisons. When performing many hypothesis tests, you<br />
run the risk of increased overall Type I error (the probability of incorrectly rejecting<br />
a null hypothesis). This group allows you to choose the method of adjusting the<br />
significance level.<br />
• Least significant difference. This method does not control the overall probability<br />
of rejecting the hypotheses that some linear contrasts are different from the<br />
null hypothesis values.<br />
• Sequential Sidak. This is a sequentially step-down rejective Sidak procedure<br />
that is much less conservative in terms of rejecting individual hypotheses but<br />
maintains the same overall significance level.<br />
• Sequential Bonferroni. This is a sequentially step-down rejective Bonferroni<br />
procedure that is much less conservative in terms of rejecting individual<br />
hypotheses but maintains the same overall significance level.<br />
• Sidak. This method provides tighter bounds than the Bonferroni approach.<br />
• Bonferroni. This method adjusts the observed significance level for the fact that<br />
multiple contrasts are being tested.
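To make the differences among these adjustments concrete, here is a minimal Python sketch of the textbook forms of the four corrections, expressed as adjusted p values. This guide does not give the formulas, so treat the step-down versions (commonly known as Holm and Holm-Sidak) as an assumption about what the sequential methods compute; the function names and p values are hypothetical.

```python
def bonferroni(pvals):
    """Single-step Bonferroni: multiply each p value by the number of tests."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]

def sidak(pvals):
    """Single-step Sidak: 1 - (1 - p)^m, never larger than the Bonferroni value."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]

def _step_down(pvals, adjust):
    """Adjust p values from smallest to largest, using the number of tests remaining."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        remaining = m - rank
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, adjust(pvals[i], remaining))
        adjusted[i] = min(1.0, running_max)
    return adjusted

def sequential_bonferroni(pvals):
    return _step_down(pvals, lambda p, k: k * p)

def sequential_sidak(pvals):
    return _step_down(pvals, lambda p, k: 1.0 - (1.0 - p) ** k)

pvals = [0.01, 0.04, 0.03]  # hypothetical unadjusted p values
for name, fn in [("Bonferroni", bonferroni), ("Sidak", sidak),
                 ("Sequential Bonferroni", sequential_bonferroni),
                 ("Sequential Sidak", sequential_sidak)]:
    print(name, [round(p, 4) for p in fn(pvals)])
```

In this example every sequential adjusted value is no larger than its single-step counterpart, which is the sense in which the sequential procedures are less conservative.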
Complex Samples General Linear Model Estimated Means<br />
Figure 9-5<br />
General Linear Model Estimated Means dialog box<br />
The Estimated Means dialog box allows you to display the model-estimated marginal<br />
means for levels of factors and factor interactions specified in the Model subdialog<br />
box. You can also request that the overall population mean be displayed.<br />
Term. Estimated means are computed for the selected factors and factor interactions.<br />
Contrast. The contrast determines how hypothesis tests are set up to compare the<br />
estimated means.<br />
• Simple. Compares the mean of each level to the mean of a specified level. This<br />
type of contrast is useful when there is a control group.<br />
• Deviation. Compares the mean of each level (except a reference category) to the<br />
mean of all of the levels (grand mean). The levels of the factor can be in any order.<br />
• Difference. Compares the mean of each level (except the first) to the mean of<br />
previous levels. These are sometimes called reverse Helmert contrasts.<br />
• Helmert. Compares the mean of each level of the factor (except the last) to the<br />
mean of subsequent levels.
• Repeated. Compares the mean of each level (except the last) to the mean of<br />
the subsequent level.<br />
• Polynomial. Compares the linear effect, quadratic effect, cubic effect, and so on.<br />
The first degree of freedom contains the linear effect across all categories; the<br />
second degree of freedom, the quadratic effect; and so on. These contrasts are<br />
often used to estimate polynomial trends.<br />
Reference Category. The simple and deviation contrasts require a reference category,<br />
or factor level against which the others are compared.<br />
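The contrast types above (other than polynomial) can be written as coefficient rows that are applied to the vector of estimated means. The Python sketch below uses the usual textbook coefficients; the scaling the procedure uses internally is not stated here, so the rows, function names, and example means are illustrative assumptions.

```python
from fractions import Fraction

def simple(k, ref):
    """Each level (except ref) minus the reference level."""
    return [[Fraction(1) if j == i else Fraction(-1) if j == ref else Fraction(0)
             for j in range(k)] for i in range(k) if i != ref]

def deviation(k, ref):
    """Each level (except ref) minus the mean of all k levels."""
    return [[Fraction(1) - Fraction(1, k) if j == i else Fraction(-1, k)
             for j in range(k)] for i in range(k) if i != ref]

def difference(k):
    """Each level (except the first) minus the mean of the previous levels."""
    return [[Fraction(-1, i) if j < i else Fraction(1) if j == i else Fraction(0)
             for j in range(k)] for i in range(1, k)]

def helmert(k):
    """Each level (except the last) minus the mean of the subsequent levels."""
    return [[Fraction(1) if j == i else Fraction(-1, k - 1 - i) if j > i else Fraction(0)
             for j in range(k)] for i in range(k - 1)]

def repeated(k):
    """Each level (except the last) minus the next level."""
    return [[Fraction(1) if j == i else Fraction(-1) if j == i + 1 else Fraction(0)
             for j in range(k)] for i in range(k - 1)]

def apply_contrasts(matrix, means):
    """Contrast estimates: one weighted sum of the means per coefficient row."""
    return [sum(c * m for c, m in zip(row, means)) for row in matrix]

means = [10, 12, 17]  # hypothetical estimated means for a three-level factor
print(apply_contrasts(helmert(3), means))
print(apply_contrasts(difference(3), means))
print(apply_contrasts(repeated(3), means))
```

Each row sums to zero, so every contrast compares means rather than estimating their overall level.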
Complex Samples General Linear Model Save<br />
Figure 9-6<br />
General Linear Model Save dialog box<br />
Save Variables. This group allows you to save the model-predicted values and<br />
residuals as new variables in the working file.
Export Model as SPSS data. Writes an SPSS data file containing a covariance (or<br />
correlation, if selected) matrix of the parameter estimates in the model. Also, for each<br />
dependent variable, there will be a row of parameter estimates, a row of standard<br />
errors, a row of significance values for the t statistics corresponding to the parameter<br />
estimates, and a row of sampling design degrees of freedom. You can use this matrix<br />
file in other procedures that read an SPSS matrix file.<br />
Export Model as XML. Saves the parameter estimates and the parameter covariance<br />
matrix, if selected, in XML (PMML) format. SmartScore and the server version of<br />
SPSS (a separate product) can use this model file to apply the model information<br />
to other data files for scoring purposes.<br />
Complex Samples General Linear Model Options<br />
Figure 9-7<br />
General Linear Model Options dialog box<br />
User-Missing Values. All design variables, as well as the dependent variable and<br />
any covariates, must have valid data. Cases with invalid data for any of these<br />
variables are deleted from the analysis. These controls allow you to decide whether<br />
user-missing values are treated as valid among the strata, cluster, subpopulation, and<br />
factor variables.<br />
Confidence Interval. This is the confidence interval level for coefficient estimates<br />
and estimated marginal means. Specify a value greater than or equal to 50 and less<br />
than 100.
CSGLM Command Additional Features<br />
The SPSS command language also allows you to:<br />
• Specify custom tests of effects versus a linear combination of effects or a value<br />
(using the CUSTOM subcommand).<br />
• Fix covariates at values other than their means when computing estimated<br />
marginal means (using the EMMEANS subcommand).<br />
• Specify a metric for polynomial contrasts (using the EMMEANS subcommand).<br />
• Specify a tolerance value for checking singularity (using the CRITERIA<br />
subcommand).<br />
• Create user-specified names for saved variables (using the SAVE subcommand).<br />
• Produce a general estimable function table (using the PRINT subcommand).<br />
See the SPSS Command Syntax Reference for complete syntax information.
Chapter 10<br />
Complex Samples Logistic Regression<br />
The Complex Samples Logistic Regression procedure performs logistic regression<br />
analysis on a binary or multinomial dependent variable for samples drawn by complex<br />
sampling methods. Optionally, you can request analyses for a subpopulation.<br />
Example. A loan officer has collected past records of customers given loans at several<br />
different branches, according to a complex design. While incorporating the sample<br />
design, the officer wants to see if the probability with which a customer defaults is<br />
related to age, employment history, and amount of credit debt.<br />
Statistics. The procedure produces estimates, exponentiated estimates, standard<br />
errors, confidence intervals, t tests, design effects, and square roots of design effects<br />
for model parameters, as well as the correlations and covariances between parameter<br />
estimates. Pseudo R² statistics, classification tables, and descriptive statistics for the<br />
dependent and independent variables are also available.<br />
Data. The dependent variable is categorical. Factors are categorical. Covariates<br />
are quantitative variables that are related to the dependent variable. Subpopulation<br />
variables can be string or numeric but should be categorical.<br />
Assumptions. The cases in the data file represent a sample from a complex design<br />
that should be analyzed according to the specifications in the file selected in the<br />
Complex Samples Plan dialog box.<br />
Obtaining Complex Samples Logistic Regression<br />
From the menus choose:<br />
Analyze<br />
Complex Samples<br />
Logistic Regression...<br />
► Select a plan file and, optionally, select a custom joint probabilities file.<br />
► Click Continue.<br />
Figure 10-1<br />
Logistic Regression dialog box<br />
► Select a dependent variable.<br />
Optionally, you can:<br />
• Select variables for factors and covariates, as appropriate for your data.<br />
• Specify a variable to define a subpopulation. The analysis is performed only for<br />
the selected category of the subpopulation variable.
Complex Samples Logistic Regression Reference Category<br />
Figure 10-2<br />
Logistic Regression Reference Category dialog box<br />
By default, the Complex Samples Logistic Regression procedure makes the<br />
highest-valued category the reference category. This dialog box allows you to specify<br />
the highest, lowest, or a custom category as the reference category.
Complex Samples Logistic Regression Model<br />
Figure 10-3<br />
Logistic Regression Model dialog box<br />
Specify Model Effects. By default, the procedure builds a main-effects model using the<br />
factors and covariates specified in the main dialog box. Alternatively, you can build a<br />
custom model that includes interaction effects and nested terms.<br />
Non-Nested Terms<br />
For the selected factors and covariates:<br />
Interaction. Creates the highest-level interaction term for all selected variables.<br />
Main effects. Creates a main-effects term for each variable selected.
All 2-way. Creates all possible two-way interactions of the selected variables.<br />
All 3-way. Creates all possible three-way interactions of the selected variables.<br />
All 4-way. Creates all possible four-way interactions of the selected variables.<br />
All 5-way. Creates all possible five-way interactions of the selected variables.<br />
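The term-building options above amount to enumerating combinations of the selected variables. A small Python sketch follows, with hypothetical variable names taken from the loan example and the A*B notation used in this chapter for interactions.

```python
from itertools import combinations

def all_n_way(variables, n):
    """Every n-way interaction term of the selected variables, written A*B*..."""
    return ["*".join(combo) for combo in combinations(variables, n)]

selected = ["age", "employ", "debt"]  # hypothetical selected variables
main_effects = all_n_way(selected, 1)                      # Main effects
two_way = all_n_way(selected, 2)                           # All 2-way
highest_interaction = all_n_way(selected, len(selected))   # Interaction
print(main_effects, two_way, highest_interaction)
```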
Nested Terms<br />
You can build nested terms for your model in this procedure. Nested terms are<br />
useful for modeling the effect of a factor or covariate whose values do not interact<br />
with the levels of another factor. For example, a grocery store chain may follow the<br />
spending habits of its customers at several store locations. Since each customer<br />
frequents only one of these locations, the Customer effect can be said to be nested<br />
within the Store location effect.<br />
Additionally, you can include interaction effects or add multiple levels of nesting<br />
to the nested term.<br />
Limitations. Nested terms have the following restrictions:<br />
• All factors within an interaction must be unique. Thus, if A is a factor, then<br />
specifying A*A is invalid.<br />
• All factors within a nested effect must be unique. Thus, if A is a factor, then<br />
specifying A(A) is invalid.<br />
• No effect can be nested within a covariate. Thus, if A is a factor and X is a<br />
covariate, then specifying A(X) is invalid.<br />
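The three restrictions can be expressed as simple checks. The Python sketch below uses a hypothetical representation in which a term is given as the list of variables in the effect plus the list of variables it is nested within; it is not how the dialog box stores terms.

```python
def validate_term(effect, within, covariates):
    """Return the rule violations for the term effect(within).

    effect:     variables joined by * in the effect, e.g. ["A", "B"] for A*B
    within:     variables the effect is nested within, e.g. ["C"] for A*B(C)
    covariates: set of variable names that are covariates (all others are factors)
    """
    problems = []
    factors = [v for v in effect if v not in covariates]
    if len(set(factors)) != len(factors):
        problems.append("factor repeated within an interaction")
    if set(effect) & set(within) or len(set(within)) != len(within):
        problems.append("factor repeated within a nested effect")
    if set(within) & set(covariates):
        problems.append("effect nested within a covariate")
    return problems

covariates = {"X"}
print(validate_term(["A", "A"], [], covariates))     # A*A
print(validate_term(["A"], ["A"], covariates))       # A(A)
print(validate_term(["A"], ["X"], covariates))       # A(X)
print(validate_term(["A", "B"], ["C"], covariates))  # A*B(C)
```

Only factors are checked for repetition within an interaction, since a covariate may legitimately interact with itself (for example, a squared term).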
Intercept. The intercept is usually included in the model. If you can assume the<br />
data pass through the origin, you can exclude the intercept. Even if you include the<br />
intercept in the model, you can choose to suppress statistics related to it.
Complex Samples Logistic Regression Statistics<br />
Figure 10-4<br />
Logistic Regression Statistics dialog box<br />
Model Fit. Controls the display of statistics that measure the overall model<br />
performance.<br />
• Pseudo R-square. The R² statistic from linear regression does not have an exact<br />
counterpart among logistic regression models. There are, instead, multiple<br />
measures that attempt to mimic the properties of the R² statistic.<br />
• Classification table. Displays the tabulated cross-classifications of the observed<br />
category by the model-predicted category on the dependent variable.<br />
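The text does not name the pseudo R² measures. As an illustration only, the Python sketch below computes three widely used ones (Cox and Snell, Nagelkerke, and McFadden) in their textbook unweighted form from two log-likelihoods; whether the procedure reports exactly these, and how it handles sampling weights, is an assumption here. The function name and input values are hypothetical.

```python
import math

def pseudo_r_squares(ll_null, ll_model, n):
    """Three common pseudo R-square measures.

    ll_null:  log-likelihood of the intercept-only model
    ll_model: log-likelihood of the fitted model
    n:        number of cases
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)  # upper bound of Cox and Snell
    nagelkerke = cox_snell / max_cox_snell
    mcfadden = 1.0 - ll_model / ll_null
    return {"Cox and Snell": cox_snell, "Nagelkerke": nagelkerke, "McFadden": mcfadden}

# Hypothetical log-likelihoods for 200 cases.
r2 = pseudo_r_squares(ll_null=-100.0, ll_model=-80.0, n=200)
for name, value in r2.items():
    print(name, round(value, 3))
```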
Parameters. This group allows you to control the display of statistics related to the<br />
model parameters.<br />
• Estimate. Displays estimates of the coefficients.<br />
• Exponentiated estimate. Displays the base of the natural logarithm raised to the<br />
power of the estimates of the coefficients. While the estimate has nice properties<br />
for statistical testing, the exponentiated estimate, or exp(B), is easier to interpret.
• Standard error. Displays the standard error for each coefficient estimate.<br />
• Confidence interval. Displays a confidence interval for each coefficient estimate.<br />
The confidence level for the interval is set in the Options dialog box.<br />
• t-test. Displays a t test of each coefficient estimate. The null hypothesis for each<br />
test is that the value of the coefficient is 0.<br />
• Covariances of parameter estimates. Displays an estimate of the covariance matrix<br />
for the model coefficients.<br />
• Correlations of parameter estimates. Displays an estimate of the correlation matrix<br />
for the model coefficients.<br />
• Design effect. The ratio of the variance of the estimate to the variance obtained<br />
by assuming that the sample is a simple random sample. This is a measure of<br />
the effect of specifying a complex design, where values further from 1 indicate<br />
greater effects.<br />
• Square root of design effect. This is a measure of the effect of specifying a complex<br />
design, where values further from 1 indicate greater effects.<br />
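To show how the exponentiated estimate relates to the estimate and its confidence interval, here is a minimal Python sketch. The coefficient, standard error, and critical value are hypothetical; in practice the critical value would come from a t distribution with the sampling design degrees of freedom rather than the fixed 1.96 assumed below.

```python
import math

def exponentiate(estimate, lower, upper):
    """exp(B) and its interval, obtained by exponentiating B and the interval endpoints."""
    return math.exp(estimate), math.exp(lower), math.exp(upper)

b, se = 0.405, 0.15   # hypothetical coefficient and standard error
t_crit = 1.96         # assumed critical value, for illustration only
lower, upper = b - t_crit * se, b + t_crit * se
exp_b, exp_lower, exp_upper = exponentiate(b, lower, upper)
print(round(exp_b, 3), round(exp_lower, 3), round(exp_upper, 3))
```

Because the exponential function is increasing, the interval for exp(B) contains 1 exactly when the interval for B contains 0.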
Summary statistics for model variables. Displays summary information about the<br />
dependent variable, covariates, and factors.<br />
Sample design information. Displays summary information about the sample,<br />
including the unweighted count and the population size.
Complex Samples Hypothesis Tests<br />
Figure 10-5<br />
Hypothesis Tests dialog box<br />
Test Statistic. This group allows you to select the type of statistic used for testing<br />
hypotheses. You can choose among F, adjusted F, chi-square, and adjusted<br />
chi-square.<br />
Sampling Degrees of Freedom. This group gives you control over the sampling design<br />
degrees of freedom used to compute p values for all test statistics. If based on the<br />
sampling design, the value is the difference between the number of primary sampling<br />
units and the number of strata in the first stage of sampling. Alternatively, you can set<br />
custom degrees of freedom by specifying a positive integer.<br />
Adjustment for Multiple Comparisons. When performing many hypothesis tests, you<br />
run the risk of increased overall Type I error (the probability of incorrectly rejecting<br />
a null hypothesis). This group allows you to choose the method of adjusting the<br />
significance level.
• Least significant difference. This method does not control the overall probability<br />
of rejecting the hypotheses that some linear contrasts are different from the<br />
null hypothesis values.<br />
• Sequential Sidak. This is a sequentially step-down rejective Sidak procedure<br />
that is much less conservative in terms of rejecting individual hypotheses but<br />
maintains the same overall significance level.<br />
• Sequential Bonferroni. This is a sequentially step-down rejective Bonferroni<br />
procedure that is much less conservative in terms of rejecting individual<br />
hypotheses but maintains the same overall significance level.<br />
• Sidak. This method provides tighter bounds than the Bonferroni approach.<br />
• Bonferroni. This method adjusts the observed significance level for the fact that<br />
multiple contrasts are being tested.<br />
Complex Samples Logistic Regression Odds Ratios<br />
Figure 10-6<br />
Logistic Regression Odds Ratios dialog box
The Odds Ratios dialog box allows you to display the model-estimated odds ratios for<br />
specified factors and covariates. A separate set of odds ratios is computed for each<br />
category of the dependent variable except the reference category.<br />
Factors. For each selected factor, displays the ratio of the odds at each category of the<br />
factor to the odds at the specified reference category.<br />
Covariates. For each selected covariate, displays the ratio of the odds at the covariate’s<br />
mean value plus the specified units of change to the odds at the mean.<br />
When computing odds ratios for a factor or covariate, the procedure fixes all other<br />
factors at their highest levels and all other covariates at their means. If a factor or<br />
covariate interacts with other predictors in the model, then the odds ratios depend not<br />
only on the change in the specified variable but also on the values of the variables<br />
with which it interacts. If a specified covariate interacts with itself in the model (for<br />
example, age*age), then the odds ratios depend on both the change in the covariate<br />
and the value of the covariate.
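The dependence of a covariate's odds ratio on its starting value when the covariate interacts with itself can be seen in a short Python sketch. The coefficients, starting values, and function names below are hypothetical; the model is assumed to contain only the covariate (and, in the second function, its square) on the log-odds scale.

```python
import math

def odds_ratio_linear(b, units):
    """Odds ratio for a change of `units` when the covariate enters only linearly."""
    return math.exp(b * units)

def odds_ratio_quadratic(b1, b2, start, units):
    """Odds ratio for moving from `start` to `start + units` when the model
    contains both the covariate (b1) and its square (b2), as in age*age."""
    end = start + units
    return math.exp(b1 * units + b2 * (end ** 2 - start ** 2))

b1, b2 = 0.1, -0.001  # hypothetical coefficients for age and age*age
or_at_30 = odds_ratio_quadratic(b1, b2, start=30, units=5)
or_at_50 = odds_ratio_quadratic(b1, b2, start=50, units=5)
print(round(odds_ratio_linear(b1, 5), 3), round(or_at_30, 3), round(or_at_50, 3))
```

With these values the same five-unit change raises the odds when starting from 30 but lowers them when starting from 50, whereas the purely linear model gives one odds ratio regardless of the starting value.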
Complex Samples Logistic Regression Save<br />
Figure 10-7<br />
Logistic Regression Save dialog box<br />
Save Variables. This group allows you to save the model-predicted category and<br />
predicted probabilities as new variables in the working file.<br />
Export Model as SPSS data. Writes an SPSS data file containing a covariance (or<br />
correlation, if selected) matrix of the parameter estimates in the model. Also, for each<br />
dependent variable, there will be a row of parameter estimates, a row of standard<br />
errors, a row of significance values for the t statistics corresponding to the parameter<br />
estimates, and a row of sampling design degrees of freedom. You can use this matrix<br />
file in other procedures that read an SPSS matrix file.
Export Model as XML. Saves the parameter estimates and the parameter covariance<br />
matrix, if selected, in XML (PMML) format. SmartScore and the server version of<br />
SPSS (a separate product) can use this model file to apply the model information<br />
to other data files for scoring purposes.<br />
Complex Samples Logistic Regression Options<br />
Figure 10-8<br />
Logistic Regression Options dialog box<br />
Estimation. This group gives you control of various criteria used in the model<br />
estimation.<br />
• Maximum Iterations. The maximum number of iterations the algorithm will<br />
execute. Specify a non-negative integer.<br />
• Maximum Step-Halving. At each iteration, the step size is reduced by a factor<br />
of 0.5 until the log-likelihood increases or maximum step-halving is reached.<br />
Specify a positive integer.
• Limit iterations based on change in parameter estimates. When selected, the<br />
algorithm stops after an iteration in which the absolute or relative change in the<br />
parameter estimates is less than the value specified, which must be non-negative.<br />
• Limit iterations based on change in log-likelihood. When selected, the algorithm<br />
stops after an iteration in which the absolute or relative change in the<br />
log-likelihood function is less than the value specified, which must be<br />
non-negative.<br />
• Check for complete separation of data points. When selected, the algorithm<br />
performs tests to ensure that the parameter estimates have unique values.<br />
Separation occurs when the procedure can produce a model that correctly<br />
classifies every case.<br />
• Display iteration history. Displays parameter estimates and statistics at every n<br />
iterations beginning with the 0th iteration (the initial estimates). If you choose to<br />
print the iteration history, the last iteration is always printed regardless of the<br />
value of n.<br />
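The interplay of maximum iterations, step-halving, and the parameter-change criterion can be sketched with a deliberately tiny model. The Python example below fits an intercept-only logistic model by Newton steps; it is an illustrative assumption about how such options typically work, not the procedure's actual algorithm, and every name and default value in it is hypothetical.

```python
import math

def log_likelihood(b, successes, n):
    """Log-likelihood of an intercept-only logistic model with P(y=1) = 1/(1+exp(-b))."""
    return successes * b - n * math.log1p(math.exp(b))

def fit_intercept(successes, n, max_iter=100, max_halvings=5, tol=1e-8):
    if not 0 < successes < n:
        # All cases in one category: no finite estimate exists.
        raise ValueError("need at least one case in each category")
    b = 0.0
    history = [b]  # iteration 0 holds the initial estimate
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-b))
        step = (successes - n * p) / (n * p * (1.0 - p))  # Newton step
        ll_old = log_likelihood(b, successes, n)
        halvings = 0
        # Step-halving: shrink the step until the log-likelihood increases
        # or the maximum number of halvings is reached.
        while log_likelihood(b + step, successes, n) <= ll_old and halvings < max_halvings:
            step *= 0.5
            halvings += 1
        b_new = b + step
        history.append(b_new)
        converged = abs(b_new - b) < tol  # absolute change in the estimate
        b = b_new
        if converged:
            break
    return b, history

estimate, history = fit_intercept(successes=30, n=100)
print(round(estimate, 4), len(history) - 1)
```

The guard at the top of `fit_intercept` reflects the same concern as the separation check: when the data can be classified perfectly, the likelihood has no finite maximizer and the iterations would not settle on unique estimates.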
User-Missing Values. All design variables, as well as the dependent variable and<br />
any covariates, must have valid data. Cases with invalid data for any of these<br />
variables are deleted from the analysis. These controls allow you to decide whether<br />
user-missing values are treated as valid among the strata, cluster, subpopulation, and<br />
factor variables.<br />
Confidence Interval. This is the confidence interval level for coefficient estimates,<br />
exponentiated coefficient estimates, and odds ratios. Specify a value greater than or<br />
equal to 50 and less than 100.<br />
CSLOGISTIC Command Additional Features<br />
The SPSS command language also allows you to:<br />
• Specify custom tests of effects versus a linear combination of effects or a value<br />
(using the CUSTOM subcommand).<br />
• Fix values of other model variables when computing odds ratios for factors and<br />
covariates (using the ODDSRATIOS subcommand).<br />
• Specify a tolerance value for checking singularity (using the CRITERIA<br />
subcommand).
• Create user-specified names for saved variables (using the SAVE subcommand).<br />
• Produce a general estimable function table (using the PRINT subcommand).<br />
See the SPSS Command Syntax Reference for complete syntax information.
Part 2: Examples
Chapter 11<br />
Complex Samples Sampling Wizard<br />
The Sampling Wizard guides you through the steps for creating, modifying, or<br />
executing a sampling plan file. Before using the Wizard, you should have a<br />
well-defined target population, a list of sampling units, and an appropriate sample<br />
design in mind.<br />
Obtaining a Sample from a Full Sampling Frame<br />
A state agency is charged with ensuring fair property taxes from county to county.<br />
Taxes are based on the appraised value of the property, so the agency wants to survey<br />
a sample of properties by county to be sure that each county’s records are equally<br />
up-to-date. However, resources for obtaining current appraisals are limited, so it’s<br />
important that what is available is used wisely. The agency decides to employ<br />
complex sampling methodology to select a sample of properties.<br />
A listing of properties is collected in property_assess_cs.sav, found in the<br />
\tutorial\sample_files\ subdirectory of the directory in which you installed SPSS. Use<br />
the Complex Samples Sampling Wizard to select a sample.<br />
Using the Wizard<br />
► To run the Complex Samples Sampling Wizard, from the menus choose:<br />
Analyze<br />
Complex Samples<br />
Select a Sample...<br />
Figure 11-1<br />
Sampling Wizard, Welcome step<br />
► Select Design a sample and type c:\property_assess.csplan as the name of the plan<br />
file.<br />
► Click Next.
Figure 11-2<br />
Sampling Wizard, Design Variables step (stage 1)<br />
E<br />
E<br />
E<br />
Select County as a stratification variable.<br />
Select Township as a cluster variable.<br />
Click Next, and then click Next in the Method step.<br />
Th<strong>is</strong> design structure means that independent samples are drawn for e<strong>ac</strong>h county.<br />
In th<strong>is</strong> stage, townships are drawn as the primary sampling unit using the default<br />
method, simple random sampling.
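The idea of drawing clusters independently within each stratum can be illustrated outside SPSS. The following is a minimal Python sketch, not the Wizard's own algorithm: the function name `draw_clusters`, the toy frame, and the county and township labels are all hypothetical, and Python's random number generator will not reproduce the SPSS selections even with the same seed value.

```python
import random
from collections import defaultdict


def draw_clusters(frame, stratum_key, cluster_key, n_clusters, seed):
    """Draw n_clusters clusters by simple random sampling without
    replacement, independently within each stratum."""
    rng = random.Random(seed)
    clusters = defaultdict(set)
    for row in frame:
        clusters[row[stratum_key]].add(row[cluster_key])
    # Sort so that the draw depends only on the seed, not on set ordering.
    return {
        stratum: sorted(rng.sample(sorted(names), min(n_clusters, len(names))))
        for stratum, names in sorted(clusters.items())
    }


# Hypothetical frame: two counties, six townships each, three properties per township.
frame = [
    {"county": county, "township": f"{county}-T{t}", "property": p}
    for county in ("A", "B")
    for t in range(1, 7)
    for p in range(3)
]

stage1 = draw_clusters(frame, "county", "township", n_clusters=4, seed=241972)
for county, townships in stage1.items():
    print(county, townships)
```

Each county contributes its own four townships, which is what "independent samples are drawn for each county" means in practice.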
Figure 11-3<br />
Sampling Wizard, Sample Size step (stage 1)<br />
Type 4 as the value for the number of clusters to select in this stage.<br />
Click Next, and then click Next in the Output Variables step.<br />
Figure 11-4<br />
Sampling Wizard, Plan Summary step (stage 1)<br />
Select Yes, add stage 2 now.<br />
Click Next.<br />
Figure 11-5<br />
Sampling Wizard, Design Variables step (stage 2)<br />
Select Neighborhood as a stratification variable.<br />
Click Next, and then click Next in the Method step.<br />
This design structure means that independent samples are drawn for each<br />
neighborhood of the townships drawn in stage 1. In this stage, properties are drawn as<br />
the primary sampling unit using simple random sampling.<br />
Figure 11-6<br />
Sampling Wizard, Sample Size step (stage 2)<br />
Select Proportions from the Units drop-down list.<br />
Type 0.2 as the value of the proportion of units to sample from each stratum.<br />
Click Next, and then click Next in the Output Variables step.<br />
Figure 11-7<br />
Sampling Wizard, Plan Summary step (stage 2)<br />
Look over the sampling design, and then click Next.<br />
Figure 11-8<br />
Sampling Wizard, Draw Sample, Selection Options step<br />
Select Custom value for the type of random seed to use, and type 241972 as the value.<br />
Using a custom value allows you to replicate the results of this example exactly.<br />
Click Next, and then click Next in the Draw Sample, Output Files step.<br />
Figure 11-9<br />
Sampling Wizard, Finish step<br />
Click Finish.<br />
These selections produce the sampling plan file property_assess.csplan and draw a<br />
sample according to that plan.<br />
Plan Summary<br />
Figure 11-10<br />
Plan summary<br />
The summary table reviews your sampling plan, and it is useful for making sure<br />
that the plan represents your intentions.<br />
Sampling Summary<br />
Figure 11-11<br />
Stage summary<br />
This summary table reviews the first stage of sampling, and it is useful for checking<br />
that the sampling went according to plan. Four townships were sampled from each<br />
county, as requested.<br />
Figure 11-12<br />
Stage summary<br />
This summary table (the top part of which is shown here) reviews the second stage of<br />
sampling. It is also useful for checking that the sampling went according to plan.<br />
Approximately 20% of the properties were sampled from each neighborhood from<br />
each township sampled in the first stage, as requested.<br />
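The realized proportion is only approximately 20% because a whole number of units must be drawn from each stratum. The sketch below illustrates this in Python under stated assumptions: the neighborhood sizes are hypothetical, and rounding the requested count to the nearest integer is an assumption made for illustration, not a documented SPSS rule.

```python
import random


def sample_within_strata(strata, proportion, seed):
    """Draw a simple random sample of roughly `proportion` of the units,
    without replacement, independently within each stratum."""
    rng = random.Random(seed)
    sample = {}
    for name in sorted(strata):
        units = strata[name]
        # Assumption: the requested count is rounded to the nearest integer.
        n = round(proportion * len(units))
        sample[name] = rng.sample(units, n)
    return sample


# Hypothetical neighborhoods with 50 and 33 properties.
neighborhoods = {
    "N1": [f"N1-P{i}" for i in range(50)],
    "N2": [f"N2-P{i}" for i in range(33)],
}

stage2 = sample_within_strata(neighborhoods, 0.2, seed=1)
realized = {
    name: len(stage2[name]) / len(units) for name, units in neighborhoods.items()
}
print(realized)
```

With 33 properties, 20% is 6.6 units, so 7 are drawn and the realized proportion is slightly above 0.2.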
Sample Results<br />
Figure 11-13<br />
Data Editor with sample results<br />
You can see the sampling results in the Data Editor. Five new variables were saved to<br />
the working file, representing the inclusion probabilities and cumulative sampling<br />
weights for each stage, plus the final sampling weights.<br />
• Cases with values for these variables were selected for the sample.<br />
• Cases with system-missing values for the variables were not selected.<br />
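The relationship among these five variables can be sketched as follows. This is an illustration under the usual design-weight convention (the cumulative weight at a stage is the reciprocal of the product of the inclusion probabilities up to that stage); the function name, dictionary keys, and the example probabilities are hypothetical and are not the variable names that SPSS writes to the file.

```python
def stage_weights(stage_probs):
    """Return per-stage inclusion probabilities with cumulative weights,
    plus the final weight (the cumulative weight after the last stage)."""
    rows = []
    cumulative = 1.0
    for stage, prob in enumerate(stage_probs, start=1):
        if not 0 < prob <= 1:
            raise ValueError(f"stage {stage}: probability must be in (0, 1]")
        cumulative /= prob  # multiply the weight by 1 / inclusion probability
        rows.append(
            {"stage": stage, "inclusion_prob": prob, "cumulative_weight": cumulative}
        )
    return rows, cumulative


# Hypothetical property: 4 of 10 townships drawn in its county (0.4),
# then 20% of the properties drawn in its neighborhood (0.2).
rows, final_weight = stage_weights([0.4, 0.2])
for row in rows:
    print(row)
print("final weight:", final_weight)
```

Two stages yield two inclusion probabilities, two cumulative weights, and one final weight, which matches the five saved variables.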
The agency will now use its resources to collect current valuations for the properties<br />
selected in the sample. Once those valuations are available, you can process the<br />
sample with Complex Samples analysis procedures, using the sampling plan<br />
property_assess.csplan to provide the sampling specifications.<br />
Obtaining a Sample from a Partial Sampling Frame<br />
A company is interested in compiling and selling a database of high-quality survey<br />
information. The survey sample should be representative but efficiently carried<br />
out, so complex sampling methods are used. The full sampling design calls for<br />
the following structure:<br />
Stage Strata Clusters<br />
1 Region Province<br />
2 District City<br />
3 Subdivision (none)<br />
In the third stage, households are the primary sampling unit, and selected households<br />
will be surveyed. However, information is easily available only to the city level, so<br />
the company plans to execute the first two stages of the design now, and then collect<br />
information on the numbers of subdivisions and households from the sampled cities.<br />
The available information to the city level is collected in demo_cs_1.sav, found in the<br />
\tutorial\sample_files\ subdirectory of the directory in which you installed SPSS. Use<br />
the Complex Samples Sampling Wizard to draw the first two stages.<br />
Using the Wizard to Sample from the First Partial Frame<br />
To run the Complex Samples Sampling Wizard, from the menus choose:<br />
Analyze<br />
Complex Samples<br />
Select a Sample...<br />
Figure 11-14<br />
Sampling Wizard, Welcome step<br />
Select Design a sample and type c:\demo_1.csplan as the name of the plan file.<br />
Click Next.<br />
Figure 11-15<br />
Sampling Wizard, Design Variables step (stage 1)<br />
Select Region as a stratification variable.<br />
Select Province as a cluster variable.<br />
Click Next, and then click Next in the Method step.<br />
This design structure means that independent samples are drawn for each region. In<br />
this stage, provinces are drawn as the primary sampling unit using the default method,<br />
simple random sampling.<br />
Figure 11-16<br />
Sampling Wizard, Sample Size step (stage 1)<br />
Type 3 as the value for the number of clusters to select in this stage.<br />
Click Next, and then click Next in the Output Variables step.<br />
Figure 11-17<br />
Sampling Wizard, Plan Summary step (stage 1)<br />
Select Yes, add stage 2 now.<br />
Click Next.<br />
Figure 11-18<br />
Sampling Wizard, Design Variables step (stage 2)<br />
Select District as a stratification variable.<br />
Select City as a cluster variable.<br />
Click Next, and then click Next in the Method step.<br />
This design structure means that independent samples are drawn for each district. In<br />
this stage, cities are drawn as the primary sampling unit using the default method,<br />
simple random sampling.<br />
Figure 11-19<br />
Sampling Wizard, Sample Size step (stage 2)<br />
Select Proportions from the Units drop-down list.<br />
Type 0.1 as the value of the proportion of units to sample from each stratum.<br />
Click Next, and then click Next in the Output Variables step.<br />
Figure 11-20<br />
Sampling Wizard, Plan Summary step (stage 2)<br />
Look over the sampling design, and then click Next.<br />
Figure 11-21<br />
Sampling Wizard, Draw Sample, Selection Options step<br />
Select Custom value for the type of random seed to use, and type 241972 as the value.<br />
Using a custom value allows you to replicate the results of this example exactly.<br />
Click Next, and then click Next in the Draw Sample, Output Files step.<br />
Figure 11-22<br />
Sampling Wizard, Finish step<br />
Click Finish.<br />
These selections produce the sampling plan file demo_1.csplan and draw a sample<br />
according to that plan.<br />
Sample Results<br />
Figure 11-23<br />
Data Editor with sample results<br />
You can see the sampling results in the Data Editor. Five new variables were saved to<br />
the working file, representing the inclusion probabilities and cumulative sampling<br />
weights for each stage, plus the “final” sampling weights for the first two stages.<br />
• Cities with values for these variables were selected for the sample.<br />
• Cities with system-missing values for the variables were not selected.<br />
For each city selected, the company acquired subdivision and household unit<br />
information and placed it in demo_cs_2.sav. Use this file and the Sampling Wizard to<br />
sample the third stage of this design.<br />
Using the Wizard to Sample from the Second Partial Frame<br />
To run the Complex Samples Sampling Wizard, from the menus choose:<br />
Analyze<br />
Complex Samples<br />
Select a Sample...<br />
Figure 11-24<br />
Sampling Wizard, Welcome step<br />
Select Design a sample and type c:\demo_2.csplan as the name of the plan file.<br />
Click Next.<br />
Figure 11-25<br />
Sampling Wizard, Design Variables step (stage 1)<br />
Select Subdivision as a stratification variable.<br />
Select Cumulative Sampling Weight for Stage 2 as the input sample weight variable.<br />
Click Next, and then click Next in the Method step.<br />
This design structure means that independent samples are drawn for each subdivision.<br />
In this stage, household units are drawn as the primary sampling unit using the default<br />
method, simple random sampling.<br />
Figure 11-26<br />
Sampling Wizard, Sample Size step (stage 1)<br />
Type 0.2 as the value for the proportion of units to select in this stage.<br />
Click Next, and then click Next in the Output Variables step.<br />
Figure 11-27<br />
Sampling Wizard, Plan Summary step (stage 1)<br />
Look over the sampling design, and then click Next.<br />
Figure 11-28<br />
Sampling Wizard, Draw Sample, Selection Options step<br />
Select Custom value for the type of random seed to use, and type 4231946 as the value.<br />
Click Next, and then click Next in the Draw Sample, Output Files step.<br />
Figure 11-29<br />
Sampling Wizard, Finish step<br />
Click Finish.<br />
These selections produce the sampling plan file demo_2.csplan and draw a sample<br />
according to that plan.<br />
Sample Results<br />
Figure 11-30<br />
Data Editor with sample results<br />
You can see the sampling results in the Data Editor. Three new variables were saved<br />
to the working file, representing the inclusion probabilities and cumulative sampling<br />
weights for the third stage, plus the final sampling weights. These new weights take<br />
into account the weights computed during the sampling of the first two stages.<br />
• Units with values for these variables were selected for the sample.<br />
• Units with system-missing values for the variables were not selected.<br />
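One way to picture how an input sample weight carries earlier stages forward is sketched below. It assumes the usual convention that the new weight equals the input weight divided by the current-stage inclusion probability; the function name and the numbers are hypothetical and are not taken from demo_cs_2.sav.

```python
def carry_forward_weight(input_weight, inclusion_prob):
    """Combine a weight from earlier stages with the inclusion
    probability of the stage that is being sampled now."""
    if input_weight <= 0:
        raise ValueError("input weight must be positive")
    if not 0 < inclusion_prob <= 1:
        raise ValueError("inclusion probability must be in (0, 1]")
    return input_weight / inclusion_prob


# Hypothetical household: its city carried a cumulative stage 2 weight of 30,
# and 20% of the household units in its subdivision were drawn.
final_weight = carry_forward_weight(30.0, 0.2)
print(final_weight)
```

Under this convention, each sampled household in the example stands in for about 150 households in the population.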
The company will now use its resources to obtain survey information for the housing<br />
units selected in the sample. Once the surveys are collected, you can process<br />
the sample with Complex Samples analysis procedures, using the sampling plan<br />
demo_2.csplan to provide the sampling specifications.<br />
Related Procedures<br />
The Complex Samples Sampling Wizard procedure is a useful tool for creating a<br />
sampling plan file and drawing a sample.<br />
• To ready a sample for analysis when you do not have access to the sampling plan<br />
file, use the Analysis Preparation Wizard.<br />
Chapter 12<br />
Complex Samples Analysis Preparation Wizard<br />
The Analysis Preparation Wizard guides you through the steps for creating or<br />
modifying an analysis plan for use with the various Complex Samples analysis<br />
procedures. It is most useful when you do not have access to the sampling plan<br />
file used to draw the sample.<br />
Using the Complex Samples Analysis Preparation Wizard to<br />
Ready NHIS Public Data<br />
The National Health Interview Survey (NHIS) is a large, population-based survey of<br />
the U.S. civilian population. Interviews are carried out face-to-face in a nationally<br />
representative sample of households. Demographic information and observations<br />
about health behavior and status are obtained for members of each household.<br />
A subset of the 2000 survey is collected in nhis2000_subset.sav, found in the<br />
\tutorial\sample_files\ subdirectory of the directory in which you installed SPSS. Use<br />
the Complex Samples Analysis Preparation Wizard to create an analysis plan for this<br />
data file so that it can be processed by Complex Samples analysis procedures.<br />
Using the Wizard<br />
To prepare a sample using the Complex Samples Analysis Preparation Wizard, from<br />
the menus choose:<br />
Analyze<br />
Complex Samples<br />
Prepare for Analysis...<br />
Figure 12-1<br />
Analysis Preparation Wizard, Welcome step<br />
Enter c:\nhis2000_subset.csaplan as the name for the analysis plan file.<br />
Click Next.<br />
Figure 12-2<br />
Analysis Preparation Wizard, Design Variables step (stage 1)<br />
The data are obtained using a complex multistage sample. However, for end users,<br />
the original NHIS design variables were transformed to a simplified set of design and<br />
weight variables whose results approximate those of the original design structures.<br />
Select Stratum for variance estimation as a strata variable.<br />
Select PSU for variance estimation as a cluster variable.<br />
Select Weight - Final Annual as the sample weight variable.<br />
Click Finish.<br />
Summary<br />
Figure 12-3<br />
Summary<br />
The summary table reviews your analysis plan. The plan consists of one stage with a<br />
design of one stratification variable and one cluster variable. With-replacement<br />
(WR) estimation is used, and the plan is saved to c:\nhis2000_subset.csaplan. You<br />
can now use this plan file to process nhis2000_subset.sav with Complex Samples<br />
analysis procedures.<br />
Preparing for Analysis When Sampling Weights Are Not in<br />
the Data File<br />
A loan officer has a collection of customer records, taken according to a complex<br />
design; however, the sampling weights are not included in the file. This information<br />
is contained in bankloan_cs_noweights.sav, found in the \tutorial\sample_files\<br />
subdirectory of the directory in which you installed SPSS. Starting with what she<br />
knows about the sampling design, the officer wants to use the Complex Samples<br />
Analysis Preparation Wizard to create an analysis plan for this data file so that it can<br />
be processed by Complex Samples analysis procedures.<br />
The loan officer knows that the records were selected in two stages, with 15 out of<br />
100 bank branches selected with equal probability and without replacement in the<br />
first stage. One hundred customers were then selected from each of those banks with<br />
equal probability and without replacement in the second stage, and information on<br />
the number of customers at each bank is included in the data file. The first step to<br />
creating an analysis plan is to compute the stagewise inclusion probabilities and<br />
final sampling weights.<br />
Computing Inclusion Probabilities and Sampling Weights<br />
To compute the inclusion probabilities for the first stage, from the menus choose:<br />
Transform<br />
Compute...<br />
Figure 12-4<br />
Compute Variable dialog box<br />
Fifteen out of one hundred bank branches were selected without replacement in the<br />
first stage; thus, the probability that a given bank was selected is 15/100 = 0.15.<br />
Type inclprob_s1 as the target variable.<br />
Type 0.15 as the numeric expression.<br />
Click OK.<br />
Figure 12-5<br />
Compute Variable dialog box<br />
One hundred customers were selected from each branch in the second stage; thus,<br />
the stage 2 inclusion probability for a given customer at a given bank is 100 divided<br />
by the number of customers at that bank.<br />
Recall the Compute Variable dialog box.<br />
Type inclprob_s2 as the target variable.<br />
Type 100/ncust as the numeric expression.<br />
Click OK.<br />
Figure 12-6<br />
Compute Variable dialog box<br />
Now that you have the inclusion probabilities for each stage, it’s easy to compute the<br />
final sampling weights.<br />
Recall the Compute Variable dialog box.<br />
Type finalweight as the target variable.<br />
Type 1/(inclprob_s1 * inclprob_s2) as the numeric expression.<br />
Click OK.<br />
You are now ready to create the analysis plan.<br />
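The three Compute Variable expressions can be checked by hand. The Python sketch below mirrors them using the same variable names (inclprob_s1, inclprob_s2, finalweight, ncust); the function name `bank_weights` and the branch size of 2,500 customers are hypothetical values chosen for illustration.

```python
def bank_weights(ncust, branches_sampled=15, branches_total=100, customers_sampled=100):
    """Mirror the tutorial's Compute Variable expressions for one branch."""
    if ncust < customers_sampled:
        raise ValueError("a branch cannot yield more customers than it has")
    inclprob_s1 = branches_sampled / branches_total   # 15/100
    inclprob_s2 = customers_sampled / ncust           # 100/ncust
    finalweight = 1 / (inclprob_s1 * inclprob_s2)     # 1/(inclprob_s1 * inclprob_s2)
    return inclprob_s1, inclprob_s2, finalweight


# Hypothetical branch with 2,500 customers.
inclprob_s1, inclprob_s2, finalweight = bank_weights(ncust=2500)
print(inclprob_s1, inclprob_s2, round(finalweight, 2))
```

For this hypothetical branch, each sampled customer represents roughly 167 customers, because only 15% of branches and 4% of that branch's customers were drawn.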
Using the Wizard<br />
To prepare a sample using the Complex Samples Analysis Preparation Wizard, from<br />
the menus choose:<br />
Analyze<br />
Complex Samples<br />
Prepare for Analysis...<br />
Figure 12-7<br />
Analysis Preparation Wizard, Welcome step<br />
Enter c:\bankloan.csaplan as the name for the analysis plan file.<br />
Click Next.<br />
Figure 12-8<br />
Analysis Preparation Wizard, Design Variables step (stage 1)<br />
Select Branch as a cluster variable.<br />
Select finalweight as the sample weight variable.<br />
Click Next.<br />
Figure 12-9<br />
Analysis Preparation Wizard, Estimation Method step (stage 1)<br />
Select Equal WOR as the first stage estimation method.<br />
Click Next.<br />
Figure 12-10<br />
Analysis Preparation Wizard, Size step (stage 1)<br />
Select Read values from variable and select inclprob_s1 as the variable containing<br />
the first stage inclusion probabilities.<br />
Click Next.<br />
Figure 12-11<br />
Analysis Preparation Wizard, Plan Summary step (stage 1)<br />
Select Yes, add stage 2 now.<br />
Click Next, and then click Next in the Design step.<br />
Figure 12-12<br />
Analysis Preparation Wizard, Estimation Method step (stage 2)<br />
Select Equal WOR as the second stage estimation method.<br />
Click Next.<br />
Figure 12-13<br />
Analysis Preparation Wizard, Size step (stage 2)<br />
Select Read values from variable and select inclprob_s2 as the variable containing<br />
the second stage inclusion probabilities.<br />
Click Finish.<br />
Summary<br />
Figure 12-14<br />
Summary<br />
The summary table reviews your analysis plan. The plan consists of two stages with<br />
a design of one cluster variable. Equal probability without replacement (WOR)<br />
estimation is used, and the plan is saved to c:\bankloan.csaplan. You can now use<br />
this plan file to process bankloan_cs_noweights.sav (with the inclusion probabilities and<br />
sampling weights you’ve computed) with Complex Samples analysis procedures.<br />
Related Procedures<br />
The Complex Samples Analysis Preparation Wizard procedure is a useful tool for<br />
readying a sample for analysis when you do not have access to the sampling plan file.<br />
• To create a sampling plan file and draw a sample, use the Sampling Wizard.<br />
Chapter 13<br />
Complex Samples Frequencies<br />
The Complex Samples Frequencies procedure produces frequency tables for selected<br />
variables and displays univariate statistics. Optionally, you can request statistics by<br />
subgroups, defined by one or more categorical variables.<br />
Using Complex Samples Frequencies to Analyze Nutritional<br />
Supplement Usage<br />
A researcher wants to study the use of nutritional supplements among U.S. citizens,<br />
using the results of the National Health Interview Survey (NHIS) and a previously<br />
created analysis plan. For more information, see “Using the Complex Samples<br />
Analysis Preparation Wizard to Ready NHIS Public Data” in Chapter 12.<br />
A subset of the 2000 survey is collected in nhis2000_subset.sav, found in the<br />
\tutorial\sample_files\ subdirectory of the directory in which you installed SPSS.<br />
The analysis plan is stored in nhis2000_subset.csaplan. Use Complex Samples<br />
Frequencies to produce statistics for nutritional supplement usage.<br />
Running the Analysis<br />
To run a Complex Samples Frequencies analysis, from the menus choose:<br />
Analyze<br />
Complex Samples<br />
Frequencies...<br />
Figure 13-1<br />
Complex Samples Plan dialog box<br />
Browse to the \tutorial\sample_files\ subdirectory of the directory in which you<br />
installed SPSS, and select nhis2000_subset.csaplan.<br />
Click Continue.<br />
Figure 13-2<br />
Frequencies dialog box<br />
Select Vitamin/mineral supplmnts-past 12 m as a frequency variable.<br />
Select Age category as a subpopulation variable.<br />
Click Statistics.<br />
Figure 13-3<br />
Frequencies Statistics dialog box<br />
Select Table percent in the Cells group.<br />
Select Confidence interval in the Statistics group.<br />
Click Continue.<br />
Click OK in the Frequencies dialog box.<br />
Frequency Table<br />
Figure 13-4<br />
Frequency table for variable/situation<br />
Each selected statistic is computed for each selected cell measure. The first column<br />
contains estimates of the number and percentage of the population that do or do not<br />
take vitamin/mineral supplements. The confidence intervals are non-overlapping;<br />
thus, you can conclude that, overall, more Americans take vitamin/mineral<br />
supplements than not.<br />
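The non-overlap reasoning can be made explicit with a small check. The sketch below is illustrative only: the function name and the interval bounds are hypothetical and are not values from the frequency table. Note that non-overlapping intervals are a conservative indication of a difference; overlapping intervals do not by themselves prove that two estimates are equal.

```python
def intervals_overlap(a, b):
    """Return True if closed intervals a = (low, high) and b = (low, high)
    share at least one point."""
    a_low, a_high = a
    b_low, b_high = b
    if a_low > a_high or b_low > b_high:
        raise ValueError("each interval must satisfy low <= high")
    return a_low <= b_high and b_low <= a_high


# Hypothetical 95% confidence intervals for table percentages.
takes_supplements = (52.1, 55.3)
does_not_take = (44.7, 47.9)

print(intervals_overlap(takes_supplements, does_not_take))
```

When the function returns False, as it does for these hypothetical bounds, the estimate with the higher interval can be taken as the larger one.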
Frequency by Subpopulation<br />
Figure 13-5<br />
Frequency table by subpopulation<br />
When computing statistics by subpopulation, each selected statistic is computed for<br />
each selected cell measure by value of Age category. The first column contains<br />
estimates of the number and percentage of the population of each category that do<br />
or do not take vitamin/mineral supplements. The confidence intervals for the table<br />
percentages are all non-overlapping; thus, you can conclude that with increasing age<br />
category, greater proportions of Americans take vitamin/mineral supplements.<br />
Summary<br />
Using the Complex Samples Frequencies procedure, you have obtained statistics for<br />
the use of nutritional supplements among U.S. citizens.<br />
• Overall, more Americans take vitamin/mineral supplements than not.<br />
• When broken down by age category, greater proportions of Americans take<br />
vitamin/mineral supplements with increasing age.<br />
Related Procedures<br />
The Complex Samples Frequencies procedure is a useful tool for obtaining univariate<br />
descriptive statistics of categorical variables for observations obtained via a complex<br />
sampling design.<br />
• The Complex Samples Sampling Wizard is used to specify complex sampling<br />
design specifications and obtain a sample. The sampling plan file created by the<br />
Sampling Wizard contains a default analysis plan and can be specified in the Plan<br />
dialog box when you are analyzing the sample obtained according to that plan.<br />
• The Complex Samples Analysis Preparation Wizard is used to set analysis<br />
specifications for an existing complex sample. The analysis plan file created by<br />
the Analysis Preparation Wizard can be specified in the Plan dialog box when you are<br />
analyzing the sample corresponding to that plan.<br />
• The Complex Samples Crosstabs procedure provides descriptive statistics for the<br />
crosstabulation of categorical variables.<br />
• The Complex Samples Descriptives procedure provides univariate descriptive<br />
statistics for scale variables.<br />
Chapter<br />
14<br />
<strong>Complex</strong> Samples Descriptives<br />
The <strong>Complex</strong> Samples Descriptives proc<strong>ed</strong>ure d<strong>is</strong>plays univariate summary stat<strong>is</strong>tics<br />
for several variables. Optionally, you can request stat<strong>is</strong>tics by subgroups, defin<strong>ed</strong><br />
by one or more categorical variables.<br />
Using <strong>Complex</strong> Samples Descriptives to Analyze Activity Levels<br />
A researcher wants to study the <strong>ac</strong>tivity levels of U.S. citizens, using the results of the<br />
National Health Interview Survey (NHIS) and a previously creat<strong>ed</strong> analys<strong>is</strong> plan. For<br />
more information, see “Using the <strong>Complex</strong> Samples Analys<strong>is</strong> Preparation Wizard<br />
to Ready NHIS Public Data” in Chapter 12 on p. 123.<br />
A subset of the 2000 survey <strong>is</strong> collect<strong>ed</strong> in nh<strong>is</strong>2000_subset.sav, found in the<br />
\tutorial\sample_files\ subdirectory of the directory in which you install<strong>ed</strong> <strong>SPSS</strong>.<br />
The analys<strong>is</strong> plan <strong>is</strong> stor<strong>ed</strong> in nh<strong>is</strong>2000_subset.csaplan. Use <strong>Complex</strong> Samples<br />
Descriptives to produce univariate descriptive stat<strong>is</strong>tics for <strong>ac</strong>tivity levels.<br />
Running the Analys<strong>is</strong><br />
► To run a Complex Samples Descriptives analysis, from the menus choose:<br />
Analyze<br />
<strong>Complex</strong> Samples<br />
Descriptives...<br />
Figure 14-1<br />
<strong>Complex</strong> Samples Plan dialog box<br />
► Browse to the \tutorial\sample_files\ subdirectory of the directory in which you<br />
installed SPSS, and select nhis2000_subset.csaplan.<br />
► Click Continue.<br />
Figure 14-2<br />
Descriptives dialog box<br />
► Select Freq vigorous activity (times per wk) through Freq strength activity (times<br />
per wk) as measure variables.<br />
► Select Age category as a subpopulation variable.<br />
► Click Statistics.<br />
Figure 14-3<br />
Descriptives Stat<strong>is</strong>tics dialog box<br />
► Select Confidence interval in the Statistics group.<br />
► Click Continue.<br />
► Click OK in the Complex Samples Descriptives dialog box.<br />
Univariate Stat<strong>is</strong>tics<br />
Figure 14-4<br />
Univariate stat<strong>is</strong>tics<br />
E<strong>ac</strong>h select<strong>ed</strong> stat<strong>is</strong>tic <strong>is</strong> comput<strong>ed</strong> for e<strong>ac</strong>h measure variable. The first column<br />
contains estimates of the average number of times per week that a person engages<br />
in a particular type of <strong>ac</strong>tivity. The confidence intervals for the means are
non-overlapping. Thus, you can conclude that, overall, Americans engage in a<br />
strength activity less often than vigorous activity, and they engage in vigorous activity<br />
less often than moderate <strong>ac</strong>tivity.<br />
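The overlap check described above can be sketched as follows. The interval bounds here are hypothetical placeholders, not the values in Figure 14-4, and `intervals_overlap` is an illustrative helper rather than an SPSS function.

```python
def intervals_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical 95% confidence intervals for mean times per week
ci = {
    "strength": (0.60, 0.70),
    "vigorous": (0.85, 0.95),
    "moderate": (1.20, 1.35),
}
names = list(ci)
# Compare every pair of activities once
all_disjoint = all(
    not intervals_overlap(ci[x], ci[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
)
print(all_disjoint)
```

Non-overlapping intervals are a conservative sign of a difference. Intervals that overlap slightly do not by themselves show that two means are equal.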
Univariate Stat<strong>is</strong>tics by Subpopulation<br />
Figure 14-5<br />
Univariate stat<strong>is</strong>tics by subpopulation<br />
E<strong>ac</strong>h select<strong>ed</strong> stat<strong>is</strong>tic <strong>is</strong> comput<strong>ed</strong> for e<strong>ac</strong>h measure variable by values of Age<br />
category. The first column contains estimates of the average number of times per<br />
week that people of e<strong>ac</strong>h category engage in a particular type of <strong>ac</strong>tivity. The<br />
confidence intervals for the means allow you to make some interesting conclusions.<br />
• In terms of vigorous and moderate <strong>ac</strong>tivities, 25–44-year-olds are less <strong>ac</strong>tive than<br />
those 18–24 and 45–64, and 45–64-year-olds are less <strong>ac</strong>tive than those 65 or older.
• In terms of strength <strong>ac</strong>tivity, 25–44-year-olds are less <strong>ac</strong>tive than those 45–64,<br />
and 18–24 and 45–64-year-olds are less <strong>ac</strong>tive than those 65 or older.<br />
Summary<br />
Using the <strong>Complex</strong> Samples Descriptives proc<strong>ed</strong>ure, you have obtain<strong>ed</strong> stat<strong>is</strong>tics<br />
for the <strong>ac</strong>tivity levels of U.S. citizens.<br />
• Overall, Americans spend varying amounts of time at different types of <strong>ac</strong>tivities.<br />
• When broken down by age, it roughly appears that post-collegiate Americans are<br />
initially less <strong>ac</strong>tive than they were while in school but become more conscientious<br />
about exerc<strong>is</strong>ing as they age.<br />
Relat<strong>ed</strong> Proc<strong>ed</strong>ures<br />
The <strong>Complex</strong> Samples Descriptives proc<strong>ed</strong>ure <strong>is</strong> a useful tool for obtaining univariate<br />
descriptive stat<strong>is</strong>tics of scale measures for observations obtain<strong>ed</strong> via a complex<br />
sampling design.<br />
• The Complex Samples Sampling Wizard is used to specify complex sampling<br />
design specifications and obtain a sample. The sampling plan file creat<strong>ed</strong> by the<br />
Sampling Wizard contains a default analys<strong>is</strong> plan and can be specifi<strong>ed</strong> in the Plan<br />
dialog box when you are analyzing the sample obtain<strong>ed</strong> <strong>ac</strong>cording to that plan.<br />
• The <strong>Complex</strong> Samples Analys<strong>is</strong> Preparation Wizard <strong>is</strong> us<strong>ed</strong> to specify analys<strong>is</strong><br />
specifications for an existing complex sample. The analysis plan file created by<br />
the Analysis Preparation Wizard can be specified in the Plan dialog box when you are<br />
analyzing the sample corresponding to that plan.<br />
• The <strong>Complex</strong> Samples Ratios proc<strong>ed</strong>ure provides descriptive stat<strong>is</strong>tics for ratios<br />
of scale measures.<br />
• The <strong>Complex</strong> Samples Frequencies proc<strong>ed</strong>ure provides univariate descriptive<br />
stat<strong>is</strong>tics of categorical variables.
Chapter 15<br />
Complex Samples Crosstabs<br />
The <strong>Complex</strong> Samples Crosstabs proc<strong>ed</strong>ure produces crosstabulation tables for pairs<br />
of select<strong>ed</strong> variables and d<strong>is</strong>plays two-way stat<strong>is</strong>tics. Optionally, you can request<br />
stat<strong>is</strong>tics by subgroups, defin<strong>ed</strong> by one or more categorical variables.<br />
Using <strong>Complex</strong> Samples Crosstabs to Measure the Relative<br />
R<strong>is</strong>k of an Event<br />
A company that sells magazine subscriptions traditionally sends monthly mailings<br />
to a purchas<strong>ed</strong> database of names. The response rate <strong>is</strong> typically low, so you ne<strong>ed</strong><br />
to find a way to better target prospective customers. One suggestion was to focus<br />
mailings on people with newspaper subscriptions, on the assumption that people who<br />
read newspapers are more likely to subscribe to magazines.<br />
Use the <strong>Complex</strong> Samples Crosstabs proc<strong>ed</strong>ure to test th<strong>is</strong> theory by constructing a<br />
two-by-two table of Newspaper subscription by Response and computing the relative<br />
r<strong>is</strong>k that a person with a newspaper subscription will respond to the mailing. Th<strong>is</strong><br />
information <strong>is</strong> collect<strong>ed</strong> in demo_cs.sav and should be analyz<strong>ed</strong> using the sampling<br />
plan file demo_2.csplan, found in the \tutorial\sample_files\ subdirectory of the<br />
directory in which you install<strong>ed</strong> <strong>SPSS</strong>.<br />
Running the Analys<strong>is</strong><br />
► To run a Complex Samples Crosstabs analysis, from the menus choose:<br />
Analyze<br />
<strong>Complex</strong> Samples<br />
Crosstabs...<br />
Figure 15-1<br />
<strong>Complex</strong> Samples Plan dialog box<br />
► Browse to the \tutorial\sample_files\ subdirectory of the directory in which you<br />
installed SPSS and select demo_2.csplan.<br />
► Click Continue.<br />
Figure 15-2<br />
Crosstabs dialog box<br />
► Select Newspaper subscription as a row variable.<br />
► Select Response as the column variable.<br />
► There is also some interest in seeing the results broken down by income categories, so<br />
select Income category in thousands as a subpopulation variable.<br />
► Click Statistics.<br />
Figure 15-3<br />
Crosstabs Stat<strong>is</strong>tics dialog box<br />
► Deselect Population size and select Row percent in the Cells group.<br />
► Select Odds ratio and Relative risk in the Summaries for 2-by-2 Tables group.<br />
► Click Continue.<br />
► Click OK in the Complex Samples Crosstabs dialog box.<br />
These selections produce a crosstabulation table and r<strong>is</strong>k estimate for Newspaper<br />
subscription by Response. Separate tables with results split by Income category<br />
in thousands are also creat<strong>ed</strong>.
Crosstabulation<br />
Figure 15-4<br />
Crosstabulation for newspaper subscription by response<br />
The crosstabulation shows that, overall, not very many people respond<strong>ed</strong> to the<br />
mailing. However, a greater proportion of newspaper subscribers respond<strong>ed</strong>.<br />
R<strong>is</strong>k Estimate<br />
Figure 15-5<br />
R<strong>is</strong>k estimate for newspaper subscription by response<br />
The relative r<strong>is</strong>k <strong>is</strong> a ratio of event probabilities. The relative r<strong>is</strong>k of a response to<br />
the mailing <strong>is</strong> the ratio of the probability that a newspaper subscriber responds, to<br />
the probability that a nonsubscriber responds. Thus, the estimate of the relative<br />
r<strong>is</strong>k <strong>is</strong> simply 18.6%/10.6% = 1.743. Likew<strong>is</strong>e, the relative r<strong>is</strong>k of nonresponse <strong>is</strong><br />
the ratio of the probability that a subscriber does not respond, to the probability<br />
that a nonsubscriber does not respond. Your estimate of this relative risk is 0.911.<br />
Given these results, you can estimate that a newspaper subscriber <strong>is</strong> 1.743 times as<br />
likely to respond to the mailing as a nonsubscriber, or 0.911 times as likely as a<br />
nonsubscriber not to respond.<br />
The odds ratio <strong>is</strong> a ratio of event odds. The odds of an event <strong>is</strong> the ratio of the<br />
probability that the event occurs, to the probability that the event does not occur.<br />
Thus, the estimate of the odds that a newspaper subscriber responds to the mailing
<strong>is</strong> 18.6%/81.4% = 0.229. Likew<strong>is</strong>e, the estimate of the odds that a nonsubscriber<br />
responds <strong>is</strong> 10.6%/89.4% = 0.118. The estimate of the odds ratio <strong>is</strong> therefore<br />
0.229/0.118 = 1.912. Note that the odds ratio <strong>is</strong> the ratio of the relative r<strong>is</strong>k of<br />
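responding, to the relative risk of not responding, or 1.743/0.911 = 1.912. These values are<br />
the table estimates; repeating the divisions with the rounded percentages shown here gives<br />
slightly different results (for example, 18.6%/10.6% is about 1.755).<br />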
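The arithmetic in this section can be retraced with a short sketch. It starts from the rounded row percentages quoted in the text (18.6% and 10.6%), so its results come out near 1.75, 0.91, and 1.93 rather than matching the table values of 1.743, 0.911, and 1.912 to the last digit. The variable names are illustrative only.

```python
# Rounded row percentages quoted in the text
p_sub = 0.186      # newspaper subscribers who responded
p_non = 0.106      # nonsubscribers who responded

# Relative risks: ratios of event probabilities
rr_response = p_sub / p_non
rr_nonresponse = (1 - p_sub) / (1 - p_non)

# Odds: probability the event occurs over probability it does not
odds_sub = p_sub / (1 - p_sub)
odds_non = p_non / (1 - p_non)
odds_ratio = odds_sub / odds_non

print(round(rr_response, 3), round(rr_nonresponse, 3), round(odds_ratio, 3))

# The odds ratio equals the ratio of the two relative risks
print(abs(odds_ratio - rr_response / rr_nonresponse) < 1e-12)
```

The final line checks the identity stated above: dividing the relative risk of responding by the relative risk of not responding reproduces the odds ratio.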
Odds Ratio versus Relative R<strong>is</strong>k<br />
Since it <strong>is</strong> a ratio of ratios, the odds ratio <strong>is</strong> very difficult to interpret. The relative r<strong>is</strong>k<br />
<strong>is</strong> easier to interpret, so the odds ratio alone <strong>is</strong> not very helpful. However, there are<br />
certain commonly occurring situations in which the estimate of the relative r<strong>is</strong>k <strong>is</strong> not<br />
very good, and the odds ratio can be us<strong>ed</strong> to approximate the relative r<strong>is</strong>k of the event<br />
of interest. The odds ratio should be us<strong>ed</strong> as an approximation to the relative r<strong>is</strong>k of<br />
the event of interest when both of the following conditions are met:<br />
• The probability of the event of interest <strong>is</strong> small (< 0.1). Th<strong>is</strong> condition guarantees<br />
that the odds ratio will make a good approximation to the relative r<strong>is</strong>k. In th<strong>is</strong><br />
example, the event of interest <strong>is</strong> a response to the mailing.<br />
• The design of the study <strong>is</strong> case control. Th<strong>is</strong> condition signals that the usual<br />
estimate of the relative r<strong>is</strong>k will likely not be good. A case-control study <strong>is</strong><br />
retrospective, most often us<strong>ed</strong> when the event of interest <strong>is</strong> unlikely or when the<br />
design of a prospective experiment <strong>is</strong> impr<strong>ac</strong>tical or unethical.<br />
Neither condition <strong>is</strong> met in th<strong>is</strong> example, since the overall proportion of respondents<br />
was 13.5% and the design of the study was not case control, so it’s safer to report<br />
1.743 as the relative r<strong>is</strong>k, rather than the value of the odds ratio.
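To see why the first condition matters, the sketch below compares the odds ratio with the relative risk for two hypothetical pairs of event probabilities that share a relative risk of 2. The probabilities are invented for illustration; only the formulas come from the definitions above.

```python
def relative_risk(p1, p0):
    """Ratio of event probabilities in the two groups."""
    return p1 / p0

def odds_ratio(p1, p0):
    """Ratio of event odds in the two groups."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Hypothetical probabilities: a rare event, then a common one
rare = (0.02, 0.01)
common = (0.40, 0.20)

for p1, p0 in (rare, common):
    print(p0, round(relative_risk(p1, p0), 3), round(odds_ratio(p1, p0), 3))
```

For the rare event the odds ratio stays close to the relative risk of 2, while for the common event it moves well away from it. This is why the odds ratio is a poor stand-in for the relative risk when the event is not rare.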
R<strong>is</strong>k Estimate by Subpopulation<br />
Figure 15-6<br />
R<strong>is</strong>k estimate for newspaper subscription by response, controlling for income category<br />
Relative r<strong>is</strong>k estimates are comput<strong>ed</strong> separately for e<strong>ac</strong>h income category. Note<br />
that the relative r<strong>is</strong>k of a positive response for newspaper subscribers appears to<br />
gradually decrease with increasing income, which indicates that you may be able<br />
to further target the mailings.<br />
Summary<br />
Using <strong>Complex</strong> Samples Crosstabs r<strong>is</strong>k estimates, you found that you can increase<br />
your response rate to direct mailings by targeting newspaper subscribers. Further,<br />
you found some evidence that the r<strong>is</strong>k estimates may not be constant <strong>ac</strong>ross Income<br />
category, so you may be able to increase your response rate even more by targeting<br />
lower-income newspaper subscribers.
Relat<strong>ed</strong> Proc<strong>ed</strong>ures<br />
The <strong>Complex</strong> Samples Crosstabs proc<strong>ed</strong>ure <strong>is</strong> a useful tool for obtaining descriptive<br />
stat<strong>is</strong>tics of the crosstabulation of categorical variables for observations obtain<strong>ed</strong><br />
via a complex sampling design.<br />
• The Complex Samples Sampling Wizard is used to specify complex sampling<br />
design specifications and obtain a sample. The sampling plan file creat<strong>ed</strong> by the<br />
Sampling Wizard contains a default analys<strong>is</strong> plan and can be specifi<strong>ed</strong> in the Plan<br />
dialog box when analyzing the sample obtain<strong>ed</strong> <strong>ac</strong>cording to that plan.<br />
• The <strong>Complex</strong> Samples Analys<strong>is</strong> Preparation Wizard <strong>is</strong> us<strong>ed</strong> to specify analys<strong>is</strong><br />
specifications for an existing complex sample. The analysis plan file created by<br />
the Analysis Preparation Wizard can be specified in the Plan dialog box when analyzing the<br />
sample corresponding to that plan.<br />
• The <strong>Complex</strong> Samples Frequencies proc<strong>ed</strong>ure provides univariate descriptive<br />
stat<strong>is</strong>tics for categorical variables.
Chapter 16<br />
Complex Samples Ratios<br />
The <strong>Complex</strong> Samples Ratios proc<strong>ed</strong>ure d<strong>is</strong>plays univariate summary stat<strong>is</strong>tics for<br />
ratios of variables. Optionally, you can request stat<strong>is</strong>tics by subgroups, defin<strong>ed</strong><br />
by one or more categorical variables.<br />
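A ratio estimate of this kind is commonly formed as the ratio of two weighted totals. The sketch below shows that point estimate only, under the assumption that each case carries a sampling weight. The property values, the weights, and the helper name `ratio_estimate` are hypothetical, and the design-based standard errors that the procedure reports are not computed here.

```python
def ratio_estimate(numer, denom, weights):
    """Ratio of weighted totals: sum(w * y) / sum(w * x)."""
    top = sum(w * y for y, w in zip(numer, weights))
    bottom = sum(w * x for x, w in zip(denom, weights))
    return top / bottom

# Hypothetical property values (in thousands) and sampling weights
current = [130.0, 260.0, 90.0]
last_appraisal = [100.0, 200.0, 75.0]
weights = [10.0, 20.0, 10.0]

r = ratio_estimate(current, last_appraisal, weights)
print(round(r, 3))
```

Note that a ratio of totals is generally not the same as the average of the individual ratios, because high-valued and heavily weighted properties contribute more to both totals.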
Using <strong>Complex</strong> Samples Ratios to Aid Property Value<br />
Assessment<br />
A state agency <strong>is</strong> charg<strong>ed</strong> with ensuring that property taxes are fairly assess<strong>ed</strong> from<br />
county to county. Taxes are bas<strong>ed</strong> on the appra<strong>is</strong><strong>ed</strong> value of the property, so the agency<br />
wants to tr<strong>ac</strong>k property values <strong>ac</strong>ross counties to be sure that e<strong>ac</strong>h county’s records<br />
are equally up-to-date. Since resources for obtaining current appra<strong>is</strong>als are limit<strong>ed</strong>,<br />
the agency chose to employ complex sampling methodology to select properties.<br />
The sample of properties selected and their current appraisal information are<br />
collected in property_assess_cs_sample.sav, found in the \tutorial\sample_files\<br />
subdirectory of the directory in which you install<strong>ed</strong> <strong>SPSS</strong>. Use <strong>Complex</strong> Samples<br />
Ratios to assess the change in property values since the last appra<strong>is</strong>al <strong>ac</strong>ross the<br />
five counties.<br />
Running the Analys<strong>is</strong><br />
► To run a Complex Samples Ratios analysis, from the menus choose:<br />
Analyze<br />
<strong>Complex</strong> Samples<br />
Ratios...<br />
Figure 16-1<br />
<strong>Complex</strong> Samples Plan dialog box<br />
► Browse to the \tutorial\sample_files\ subdirectory of the directory in which you<br />
installed SPSS and select property_assess.csplan.<br />
► Click Continue.<br />
Figure 16-2<br />
<strong>Complex</strong> Samples Ratios dialog box<br />
► Select Current value as a numerator variable.<br />
► Select Value at last appraisal as the denominator variable.<br />
► Select County as a subpopulation variable.<br />
► Click Statistics.<br />
Figure 16-3<br />
Ratios Stat<strong>is</strong>tics dialog box<br />
► Select Confidence interval, Unweighted count, and Population size in the Statistics group.<br />
► Select t-test and enter 1.3 as the test value.<br />
► Click Continue.<br />
► Click OK in the Complex Samples Ratios dialog box.<br />
Ratios<br />
Figure 16-4<br />
Ratios table<br />
The default d<strong>is</strong>play of the table <strong>is</strong> very wide, so you will ne<strong>ed</strong> to pivot it for a<br />
better view.
Pivoting the Ratios Table<br />
► Double-click the table to activate it.<br />
► From the Viewer menus choose:<br />
Pivot<br />
Pivoting Trays<br />
► Drag Numerator and then Denominator from the row to the layer.<br />
► Drag County from the row to the column.<br />
► Drag Statistics from the column to the row.<br />
► Close the pivoting trays window.<br />
Pivot<strong>ed</strong> Ratios Table<br />
Figure 16-5<br />
Pivot<strong>ed</strong> ratios table<br />
The ratios table <strong>is</strong> now pivot<strong>ed</strong> so that stat<strong>is</strong>tics are easier to compare <strong>ac</strong>ross counties.
• The ratio estimates range from a low of 1.195 in the Southern county to a high of<br />
1.524 in the Western county.<br />
• There <strong>is</strong> also quite a bit of variability in the standard errors, which range from a<br />
low of 0.029 in the Southern county to 0.068 in the Eastern county.<br />
• Some of the confidence intervals do not overlap; thus, you can conclude that<br />
the ratios for the Western county are higher than the ratios for the Northern<br />
and Southern counties.<br />
• Finally, as a more objective measure, note that the significance values for the t<br />
tests for the Western and Southern counties are less than 0.05. Thus, you can<br />
conclude that the ratio for the Western county <strong>is</strong> greater than 1.3 and the ratio for<br />
the Southern county <strong>is</strong> less than 1.3.<br />
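The t test against the value 1.3 can be sketched from the figures quoted above for the Southern county (estimate 1.195, standard error 0.029). The helper name `t_statistic` is illustrative. The significance value also depends on the degrees of freedom implied by the sampling design, which the text does not state, so no p-value is computed here.

```python
def t_statistic(estimate, std_error, test_value):
    """Distance of the estimate from the test value, in standard errors."""
    return (estimate - test_value) / std_error

# Southern county figures quoted in the text; test value from the dialog box
t_south = t_statistic(1.195, 0.029, 1.3)
print(round(t_south, 2))
```

The statistic is about 3.6 standard errors below the test value, which is consistent with the conclusion that the Southern county's ratio is less than 1.3.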
Summary<br />
Using the <strong>Complex</strong> Samples Ratios proc<strong>ed</strong>ure, you have obtain<strong>ed</strong> various stat<strong>is</strong>tics<br />
for the ratios of Current value to Value at last appraisal. The results suggest that<br />
there may be certain inequities in the assessment of property taxes from county to<br />
county, namely:<br />
• The ratios for the Western county are high, indicating that their records are not as<br />
up-to-date as other counties with respect to the appreciation of property values.<br />
Property taxes are probably too low in th<strong>is</strong> county.<br />
• The ratios for the Southern county are low, indicating that their records are more<br />
up-to-date than the other counties with respect to the appreciation of property<br />
values. Property taxes are probably too high in th<strong>is</strong> county.<br />
• The ratios for the Southern county are lower than those of the Western county but<br />
are still within the objective goal of 1.3.<br />
Resources us<strong>ed</strong> to tr<strong>ac</strong>k property values in the Southern county will be reassign<strong>ed</strong><br />
to the Western county to bring these counties’ ratios in line with the others and<br />
with the goal of 1.3.
Relat<strong>ed</strong> Proc<strong>ed</strong>ures<br />
The <strong>Complex</strong> Samples Ratios proc<strong>ed</strong>ure <strong>is</strong> a useful tool for obtaining univariate<br />
descriptive stat<strong>is</strong>tics of the ratio of scale measures for observations obtain<strong>ed</strong> via a<br />
complex sampling design.<br />
• The Complex Samples Sampling Wizard is used to specify complex sampling<br />
design specifications and obtain a sample. The sampling plan file creat<strong>ed</strong> by the<br />
Sampling Wizard contains a default analys<strong>is</strong> plan and can be specifi<strong>ed</strong> in the Plan<br />
dialog box when you are analyzing the sample obtain<strong>ed</strong> <strong>ac</strong>cording to that plan.<br />
• The <strong>Complex</strong> Samples Analys<strong>is</strong> Preparation Wizard <strong>is</strong> us<strong>ed</strong> to specify analys<strong>is</strong><br />
specifications for an existing complex sample. The analysis plan file created by<br />
the Analysis Preparation Wizard can be specified in the Plan dialog box when you are<br />
analyzing the sample corresponding to that plan.<br />
• The <strong>Complex</strong> Samples Descriptives proc<strong>ed</strong>ure provides descriptive stat<strong>is</strong>tics<br />
for individual scale measures.
Chapter 17<br />
Complex Samples General Linear Model<br />
The <strong>Complex</strong> Samples General Linear Model (CSGLM) proc<strong>ed</strong>ure performs linear<br />
regression analys<strong>is</strong>, as well as analys<strong>is</strong> of variance and covariance, for samples<br />
drawn by complex sampling methods. Optionally, you can request analyses for<br />
a subpopulation.<br />
Using <strong>Complex</strong> Samples General Linear Model to Fit a<br />
Two-F<strong>ac</strong>tor ANOVA<br />
A grocery store chain surveyed a set of customers concerning their purchasing<br />
habits, according to a complex design. Given the survey results and how much each<br />
customer spent in the previous month, the store wants to see if the amount customers<br />
spend in a month is related to whom they shop for and whether they use coupons,<br />
incorporating the sampling design.<br />
Th<strong>is</strong> information <strong>is</strong> collect<strong>ed</strong> in grocery_1month_sample.sav, found in the<br />
\tutorial\sample_files\ subdirectory of the directory in which you install<strong>ed</strong> <strong>SPSS</strong>. Use<br />
the <strong>Complex</strong> Samples General Linear Model proc<strong>ed</strong>ure to perform a two-f<strong>ac</strong>tor (or<br />
two-way) ANOVA on the amounts spent.<br />
Running the Analys<strong>is</strong><br />
► To run a Complex Samples General Linear Model analysis, from the menus choose:<br />
Analyze<br />
<strong>Complex</strong> Samples<br />
General Linear Model...<br />
Figure 17-1<br />
<strong>Complex</strong> Samples Plan dialog box<br />
► Browse to the \tutorial\sample_files\ subdirectory of the directory in which you<br />
installed SPSS and select grocery.csplan.<br />
► Click Continue.<br />
Figure 17-2<br />
General Linear Model dialog box<br />
► Select Amount spent as the dependent variable.<br />
► Select Who shopping for and Use coupons as factors.<br />
► Click Model.<br />
Figure 17-3<br />
Model dialog box<br />
► Choose to build a Custom model.<br />
► Select Main effects as the type of term to build and select shopfor and usecoup<br />
as model terms.<br />
► Select Interaction as the type of term to build and add the shopfor*usecoup interaction<br />
as a model term.<br />
► Click Continue.<br />
► Click Statistics in the General Linear Model dialog box.<br />
Figure 17-4<br />
General Linear Model Stat<strong>is</strong>tics dialog box<br />
► Select Estimate, Standard error, Confidence interval, and Design effect in the Model<br />
Parameters group.<br />
► Click Continue.<br />
► Click Estimated Means in the General Linear Model dialog box.<br />
Figure 17-5<br />
General Linear Model Estimat<strong>ed</strong> Means dialog box<br />
► Choose to display means for shopfor, usecoup, and the shopfor*usecoup interaction.<br />
► Select a Simple contrast and 3 Self and family as the reference category for shopfor.<br />
Note that, once selected, the category appears as “3” in the dialog box.<br />
► Select a Simple contrast and 1 No as the reference category for usecoup.<br />
► Click Continue.<br />
► Click OK in the General Linear Model dialog box.<br />
Model Summary<br />
Figure 17-6<br />
Tests of between-subjects effects
R Square, the coefficient of determination, <strong>is</strong> a measure of the strength of the model<br />
fit. It shows that about 60% of the variation in Amount spent is explained by the<br />
model, which gives you good explanatory ability. You may still want to add other<br />
pr<strong>ed</strong>ictors to the model to further improve the fit.<br />
Tests of Model Effects<br />
Figure 17-7<br />
Tests of between-subjects effects<br />
E<strong>ac</strong>h term in the model, plus the model as a whole, <strong>is</strong> test<strong>ed</strong> for whether the value<br />
of its effect equals 0. Terms with significance values of less than 0.05 have some<br />
d<strong>is</strong>cernible effect. Thus, all model terms contribute to the model.
Parameter Estimates<br />
Figure 17-8<br />
Parameter estimates<br />
The parameter estimates show the effect of e<strong>ac</strong>h pr<strong>ed</strong>ictor on Amount spent. The<br />
value of 518.249 for the intercept term indicates that the grocery chain can expect a<br />
shopper with a family who uses coupons from the newspaper and target<strong>ed</strong> mailings to<br />
spend $518.25, on average. You can tell that the intercept <strong>is</strong> associat<strong>ed</strong> with these<br />
f<strong>ac</strong>tor levels because those are the f<strong>ac</strong>tor levels whose parameters are r<strong>ed</strong>undant.<br />
• The shopfor coefficients suggest that, among customers who use both mailed<br />
coupons and newspaper coupons, those without family tend to spend less than<br />
those with spouses, who in turn spend less than those with dependents at home.<br />
Since the tests of model effects showed that this term contributes to the model,<br />
these differences are unlikely to be due to chance.<br />
• The usecoup coefficients suggest that spending among customers with dependents<br />
at home decreases with decreas<strong>ed</strong> coupon usage. There <strong>is</strong> a moderate amount of<br />
uncertainty in the estimates, but the confidence intervals do not include 0.<br />
• The inter<strong>ac</strong>tion coefficients suggest that customers who do not use coupons or<br />
only clip from the newspaper and do not have dependents tend to spend more<br />
than you would otherw<strong>is</strong>e expect. If any portion of an inter<strong>ac</strong>tion parameter <strong>is</strong><br />
r<strong>ed</strong>undant, the inter<strong>ac</strong>tion parameter <strong>is</strong> r<strong>ed</strong>undant.<br />
• The deviation in the values of the design effects from 1 indicates that some of the<br />
standard errors comput<strong>ed</strong> for these parameter estimates are larger than those<br />
you would obtain if you assum<strong>ed</strong> that these observations came from a simple<br />
random sample, while others are smaller. It <strong>is</strong> vitally important to incorporate the<br />
sampling design information in your analys<strong>is</strong> because you might otherw<strong>is</strong>e infer,<br />
for example, that the usecoup=3 coefficient <strong>is</strong> not different from 0!<br />
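The design effect in the last bullet can be read as the ratio of the variance under the complex design to the variance under simple random sampling. A minimal sketch, using hypothetical standard errors and an illustrative helper name `design_effect`:

```python
def design_effect(se_complex, se_srs):
    """Variance under the complex design over variance under simple random sampling."""
    return (se_complex / se_srs) ** 2

# Hypothetical standard errors for two parameter estimates
larger = design_effect(12.0, 8.0)   # complex-design standard error is larger
smaller = design_effect(6.0, 8.0)   # complex-design standard error is smaller
print(larger, smaller)
```

A value above 1 means that treating the data as a simple random sample would understate the standard error, and a value below 1 means it would overstate it.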
The parameter estimates are useful for quantifying the effect of e<strong>ac</strong>h model term, but<br />
the estimat<strong>ed</strong> marginal means tables can make it easier to interpret the model results.<br />
Estimat<strong>ed</strong> Marginal Means<br />
Figure 17-9<br />
Estimated marginal means by levels of Who shopping for<br />
Th<strong>is</strong> table d<strong>is</strong>plays the model-estimat<strong>ed</strong> marginal means and standard errors of<br />
Amount spent at the f<strong>ac</strong>tor levels of Who shopping for. Th<strong>is</strong> table <strong>is</strong> useful for<br />
exploring the differences between the levels of th<strong>is</strong> f<strong>ac</strong>tor. In th<strong>is</strong> example, a customer<br />
who shops for themselves <strong>is</strong> expect<strong>ed</strong> to spend about $308.53, while a customer<br />
with a spouse <strong>is</strong> expect<strong>ed</strong> to spend $370.33, and a customer with dependents will<br />
spend $459.44. To see whether th<strong>is</strong> represents a real difference or <strong>is</strong> due to chance<br />
variation, look at the test results.
Figure 17-10<br />
Individual test results for estimated marginal means of Who shopping for<br />
The individual tests table d<strong>is</strong>plays two simple contrasts in spending.<br />
• The contrast estimate <strong>is</strong> the difference in spending for the l<strong>is</strong>t<strong>ed</strong> levels of Who<br />
shopping for.<br />
• The hypothesiz<strong>ed</strong> value of 0.00 represents the belief that there <strong>is</strong> no difference<br />
in spending.<br />
• The Wald F stat<strong>is</strong>tic, with the d<strong>is</strong>play<strong>ed</strong> degrees of fre<strong>ed</strong>om, <strong>is</strong> us<strong>ed</strong> to test<br />
whether the difference between a contrast estimate and hypothesiz<strong>ed</strong> value <strong>is</strong> due<br />
to chance variation.<br />
• Since the significance values are less than 0.05, you can conclude that there<br />
are differences in spending.<br />
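For a contrast with a single numerator degree of freedom, the Wald F statistic described above can be sketched as the squared ratio of the distance between the estimate and the hypothesized value to the standard error. The numbers below are hypothetical and are not read from the figure.

```python
def wald_f(contrast_estimate, hypothesized_value, standard_error):
    """Wald F statistic for a single-degree-of-freedom contrast."""
    return ((contrast_estimate - hypothesized_value) / standard_error) ** 2

# Hypothetical contrast estimate and standard error (illustrative only).
f_stat = wald_f(contrast_estimate=-150.0, hypothesized_value=0.0, standard_error=30.0)
print(f_stat)  # 25.0
```

The larger the statistic relative to its reference F distribution, the smaller the significance value reported in the table.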
The values of the contrast estimates are different from the parameter estimates. Th<strong>is</strong><br />
<strong>is</strong> because there <strong>is</strong> an inter<strong>ac</strong>tion term containing the Who shopping for effect. As a<br />
result, the parameter estimate for shopfor=1 <strong>is</strong> a simple contrast between the levels<br />
Self and Self and Family at the level From both of the variable Use coupons. The<br />
contrast estimate in th<strong>is</strong> table <strong>is</strong> averag<strong>ed</strong> over the levels of Use coupons.<br />
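The distinction between a contrast at the reference level and a contrast averaged over levels can be illustrated with a small Python sketch. The cell means and the two middle coupon level names are hypothetical, and the sketch assumes that marginal means are unweighted averages of the cell means.

```python
# Hypothetical cell means of Amount spent (illustrative only, not the tutorial's output).
# Outer keys: Who shopping for; inner keys: Use coupons.
cell_means = {
    "Self":            {"No": 250.0, "From newspaper": 300.0,
                        "From mailings": 320.0, "From both": 370.0},
    "Self and Family": {"No": 400.0, "From newspaper": 440.0,
                        "From mailings": 470.0, "From both": 490.0},
}

def marginal_mean(row):
    """Unweighted average of one factor level's cell means over the other factor."""
    return sum(row.values()) / len(row)

# Contrast at the reference level of Use coupons (what the shopfor=1 parameter measures).
at_reference = cell_means["Self"]["From both"] - cell_means["Self and Family"]["From both"]

# Contrast averaged over all levels of Use coupons (what the marginal means table compares).
averaged = marginal_mean(cell_means["Self"]) - marginal_mean(cell_means["Self and Family"])

print(at_reference)  # -120.0
print(averaged)      # -140.0
```

The two numbers differ whenever the gap between the shopping groups changes across coupon levels, which is exactly what an interaction effect means.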
Figure 17-11<br />
Overall test results for estimat<strong>ed</strong> marginal means of gender<br />
The overall test table reports the results of a test of all of the contrasts in the individual<br />
test table. Its significance value of less than 0.05 confirms that there <strong>is</strong> a difference in<br />
spending among the levels of Who shopping for.
Figure 17-12<br />
Estimat<strong>ed</strong> marginal means by levels of shopping style<br />
Th<strong>is</strong> table d<strong>is</strong>plays the model-estimat<strong>ed</strong> marginal means and standard errors of<br />
Amount spent at the f<strong>ac</strong>tor levels of Use coupons. Th<strong>is</strong> table <strong>is</strong> useful for exploring<br />
the differences between the levels of th<strong>is</strong> f<strong>ac</strong>tor. In th<strong>is</strong> example, a customer who does<br />
not use coupons <strong>is</strong> expect<strong>ed</strong> to spend about $319.65, and those who do use coupons<br />
are expect<strong>ed</strong> to spend considerably more.<br />
Figure 17-13<br />
Individual test results for estimat<strong>ed</strong> marginal means of shopping style<br />
The individual tests table d<strong>is</strong>plays three simple contrasts, comparing the spending of<br />
customers who do not use coupons to those who do.<br />
Since the significance values of the tests are less than 0.05, you can conclude that<br />
customers who use coupons tend to spend more than those who don’t.<br />
Figure 17-14<br />
Overall test results for estimat<strong>ed</strong> marginal means of shopping style
The overall test table reports the results of a test of all the contrasts in the individual<br />
test table. Its significance value of less than 0.05 confirms that there <strong>is</strong> a difference<br />
in spending among the levels of Use coupons. Note that the overall tests for Use<br />
coupons and Who shopping for are equivalent to the tests of model effects because the<br />
hypothesiz<strong>ed</strong> contrast values are equal to 0.<br />
Figure 17-15<br />
Estimat<strong>ed</strong> marginal means by levels of gender by shopping style<br />
Th<strong>is</strong> table d<strong>is</strong>plays the model-estimat<strong>ed</strong> marginal means, standard errors, and<br />
confidence intervals of Amount spent at the f<strong>ac</strong>tor combinations of Who shopping for<br />
and Use coupons. Th<strong>is</strong> table <strong>is</strong> useful for exploring the inter<strong>ac</strong>tion effect between<br />
these two f<strong>ac</strong>tors that was found in the tests of model effects.<br />
Summary<br />
In th<strong>is</strong> example, the estimat<strong>ed</strong> marginal means reveal<strong>ed</strong> differences in spending<br />
between customers at varying levels of Who shopping for and Use coupons. The tests<br />
of model effects confirm<strong>ed</strong> th<strong>is</strong>, as well as the f<strong>ac</strong>t that there appears to be a Who<br />
shopping for*Use coupons inter<strong>ac</strong>tion effect. The model summary table reveal<strong>ed</strong> that<br />
the present model explains somewhat more than half of the variation in the data, and<br />
could likely be improv<strong>ed</strong> by adding more pr<strong>ed</strong>ictors.
Relat<strong>ed</strong> Proc<strong>ed</strong>ures<br />
The <strong>Complex</strong> Samples General Linear Model proc<strong>ed</strong>ure <strong>is</strong> a useful tool for modeling<br />
a scale variable when the cases have been drawn <strong>ac</strong>cording to a complex sampling<br />
scheme.<br />
• The <strong>Complex</strong> Samples Sampling Wizard <strong>is</strong> us<strong>ed</strong> to specify complex sampling<br />
design specifications and obtain a sample. The sampling plan file creat<strong>ed</strong> by the<br />
Sampling Wizard contains a default analys<strong>is</strong> plan and can be specifi<strong>ed</strong> in the Plan<br />
dialog box when you are analyzing the sample obtain<strong>ed</strong> <strong>ac</strong>cording to that plan.<br />
• The <strong>Complex</strong> Samples Analys<strong>is</strong> Preparation Wizard <strong>is</strong> us<strong>ed</strong> to specify analys<strong>is</strong><br />
specifications for an ex<strong>is</strong>ting complex sample. The analys<strong>is</strong> plan file creat<strong>ed</strong> by<br />
the Analys<strong>is</strong> Preparation Wizard can be specifi<strong>ed</strong> in the Plan dialog box when you are<br />
analyzing the sample corresponding to that plan.<br />
• The <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression proc<strong>ed</strong>ure allows you to model a<br />
categorical response.
Chapter 18<br />
<strong>Complex</strong> Samples Log<strong>is</strong>tic Regression<br />
The <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression proc<strong>ed</strong>ure performs log<strong>is</strong>tic regression<br />
analys<strong>is</strong> on a binary or multinomial dependent variable for samples drawn by complex<br />
sampling methods. Optionally, you can request analyses for a subpopulation.<br />
Using <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression to Assess Cr<strong>ed</strong>it R<strong>is</strong>k<br />
If you are a loan officer at a bank, then you want to be able to identify char<strong>ac</strong>ter<strong>is</strong>tics<br />
that are indicative of people who are likely to default on loans, and use those<br />
char<strong>ac</strong>ter<strong>is</strong>tics to identify good and bad cr<strong>ed</strong>it r<strong>is</strong>ks.<br />
Suppose that a loan officer has collect<strong>ed</strong> past records of customers given loans<br />
at several different branches, <strong>ac</strong>cording to a complex design. Th<strong>is</strong> information <strong>is</strong><br />
contain<strong>ed</strong> in bankloan_cs.sav, found in the \tutorial\sample_files\ subdirectory of<br />
the directory in which you install<strong>ed</strong> <strong>SPSS</strong>. The officer wants to see if the probability<br />
with which a customer defaults <strong>is</strong> relat<strong>ed</strong> to age, employment h<strong>is</strong>tory, and amount<br />
of cr<strong>ed</strong>it debt, while incorporating the sampling design.<br />
Running the Analys<strong>is</strong><br />
To create the log<strong>is</strong>tic regression model, from the menus choose:<br />
Analyze<br />
<strong>Complex</strong> Samples<br />
Log<strong>is</strong>tic Regression...<br />
Figure 18-1<br />
<strong>Complex</strong> Samples Plan dialog box<br />
Browse to the \tutorial\sample_files\ subdirectory of the directory in which you<br />
install<strong>ed</strong> <strong>SPSS</strong> and select bankloan.csaplan.<br />
Click Continue.
Figure 18-2<br />
Log<strong>is</strong>tic Regression dialog box<br />
Select Previously default<strong>ed</strong> as the dependent variable.<br />
Select Level of <strong>ed</strong>ucation as a f<strong>ac</strong>tor.<br />
Select Age in years through Other debt in thousands as covariates.<br />
Select Previously default<strong>ed</strong> and click Reference Category.
Figure 18-3<br />
Log<strong>is</strong>tic Regression Reference Category dialog box<br />
Select Lowest value as the reference category.<br />
Th<strong>is</strong> sets the “did not default” category as the reference category; thus, the odds ratios<br />
report<strong>ed</strong> in the output will have the property that increasing odds ratios correspond to<br />
increasing probability of default.<br />
Click Continue.<br />
Click Stat<strong>is</strong>tics in the Log<strong>is</strong>tic Regression dialog box.
Figure 18-4<br />
Log<strong>is</strong>tic Regression Stat<strong>is</strong>tics dialog box<br />
Select Classification table in the Model Fit group.<br />
Select Estimate, Exponentiat<strong>ed</strong> estimate, Standard error, Confidence interval, and Design<br />
effect in the Parameters group.<br />
Click Continue.<br />
Click Odds Ratios in the Log<strong>is</strong>tic Regression dialog box.
Figure 18-5<br />
Log<strong>is</strong>tic Regression Odds Ratios dialog box<br />
Choose to create odds ratios for the f<strong>ac</strong>tor <strong>ed</strong> and the covariates employ and debtinc.<br />
Click Continue.<br />
Click OK in the Log<strong>is</strong>tic Regression dialog box.
Pseudo R-Squares<br />
Figure 18-6<br />
Pseudo R-square stat<strong>is</strong>tics<br />
There <strong>is</strong> no single stat<strong>is</strong>tic for log<strong>is</strong>tic regression models that duplicates the properties<br />
of the R-square stat<strong>is</strong>tic for linear regression models, so these approximations are<br />
comput<strong>ed</strong> instead. Larger pseudo R-square stat<strong>is</strong>tics indicate that more of the<br />
variation <strong>is</strong> explain<strong>ed</strong> by the model, to a maximum of 1.<br />
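As an illustration of how such approximations are built, the Python sketch below computes three commonly used pseudo R-square measures (Cox and Snell, Nagelkerke, and McFadden) from the log-likelihoods of an intercept-only model and a fitted model. That these are the statistics shown in the figure is an assumption, and the log-likelihoods and case count are hypothetical.

```python
import math

def pseudo_r_squares(ll_null, ll_model, n):
    """Return Cox and Snell, Nagelkerke, and McFadden pseudo R-square values.

    ll_null  -- log-likelihood of the intercept-only model
    ll_model -- log-likelihood of the fitted model
    n        -- number of cases (or sum of weights) behind the likelihoods
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)   # upper bound of Cox and Snell
    nagelkerke = cox_snell / max_cox_snell              # rescaled so the maximum is 1
    mcfadden = 1.0 - ll_model / ll_null
    return cox_snell, nagelkerke, mcfadden

# Hypothetical log-likelihoods (illustrative only).
cs, nk, mf = pseudo_r_squares(ll_null=-600.0, ll_model=-450.0, n=1000)
print(round(cs, 3), round(nk, 3), round(mf, 3))
```

The three measures generally disagree in magnitude, which is why they are best used to compare competing models for the same data rather than read as a share of variance explained.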
Classification<br />
Figure 18-7<br />
Classification table<br />
The classification table shows the pr<strong>ac</strong>tical results of using the log<strong>is</strong>tic regression<br />
model. For e<strong>ac</strong>h case, the pr<strong>ed</strong>ict<strong>ed</strong> response <strong>is</strong> Yes if that case’s model-pr<strong>ed</strong>ict<strong>ed</strong><br />
logit <strong>is</strong> greater than 0. Cases are weight<strong>ed</strong> by finalweight, so that the classification<br />
table reports the expect<strong>ed</strong> model performance in the population.<br />
• Cells on the diagonal are correct pr<strong>ed</strong>ictions.<br />
• Cells off the diagonal are incorrect pr<strong>ed</strong>ictions.
Bas<strong>ed</strong> upon the cases us<strong>ed</strong> to create the model, you can expect to correctly classify<br />
85.5% of the non-defaulters in the population using th<strong>is</strong> model. Likew<strong>is</strong>e, you can<br />
expect to correctly classify 60.9% of the defaulters. Overall, you can expect to<br />
classify 76.5% of the cases correctly; however, because th<strong>is</strong> table was<br />
construct<strong>ed</strong> with the cases us<strong>ed</strong> to create the model, these estimates are likely to<br />
be overly optim<strong>is</strong>tic.<br />
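The classification rule described above (predict Yes when the model-predicted logit exceeds 0, with each case counted by its weight) can be sketched in Python as follows. The cases and weights are hypothetical; in the tutorial the weights come from the finalweight variable.

```python
def classification_table(cases):
    """Build a weighted 2x2 table from (observed_default, logit, weight) tuples."""
    table = {(obs, pred): 0.0 for obs in (False, True) for pred in (False, True)}
    for observed, logit, weight in cases:
        predicted = logit > 0          # logit > 0 is equivalent to probability > 0.5
        table[(observed, predicted)] += weight
    return table

def percent_correct(table):
    """Weighted share of cases on the diagonal of the table."""
    correct = table[(False, False)] + table[(True, True)]
    return 100.0 * correct / sum(table.values())

# Hypothetical cases: (observed default?, model-predicted logit, sampling weight).
cases = [
    (False, -1.2, 10.0),
    (False, 0.4, 5.0),
    (True, 0.9, 8.0),
    (True, -0.3, 2.0),
]
table = classification_table(cases)
print(percent_correct(table))  # 72.0
```

Because the same cases are used to fit and to evaluate the model, a figure computed this way tends to flatter the model; evaluating on held-out cases gives a more honest estimate.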
Tests of Model Effects<br />
Figure 18-8<br />
Tests of between-subjects effects<br />
E<strong>ac</strong>h term in the model, plus the model as a whole, <strong>is</strong> test<strong>ed</strong> for whether its effect<br />
equals 0. Terms with significance values less than 0.05 have some d<strong>is</strong>cernable effect.<br />
Thus, age, employ, debtinc, and cr<strong>ed</strong>debt contribute to the model, while the other<br />
main effects do not. In a further analys<strong>is</strong> of the data, you would probably remove <strong>ed</strong>,<br />
address, income, and othdebt from model consideration.
Parameter Estimates<br />
Figure 18-9<br />
Parameter estimates<br />
The parameter estimates table summarizes the effect of e<strong>ac</strong>h pr<strong>ed</strong>ictor. Note that<br />
parameter values affect the likelihood of the “did default” category relative to the<br />
“did not default” category. Thus, parameters with positive coefficients increase<br />
the likelihood of default, while parameters with negative coefficients decrease the<br />
likelihood of default.<br />
The meaning of a log<strong>is</strong>tic regression coefficient <strong>is</strong> not as straightforward as that of<br />
a linear regression coefficient. While B <strong>is</strong> convenient for testing the model effects,<br />
Exp(B) <strong>is</strong> easier to interpret. Exp(B) represents the ratio change in the odds of the<br />
event of interest attributable to a one-unit increase in the pr<strong>ed</strong>ictor for pr<strong>ed</strong>ictors that<br />
are not part of inter<strong>ac</strong>tion terms. For example, Exp(B) for employ <strong>is</strong> equal to 0.798,<br />
which means that the odds of default for people who have been with their current<br />
employer for two years are 0.798 times the odds of default for those who have been<br />
with their current employer for one year, all other things being equal.<br />
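The multiplicative reading of Exp(B) can be checked with a short Python sketch. It uses the Exp(B) of 0.798 reported for employ; the five-year comparison is an added illustration and assumes that all other predictors are held fixed.

```python
import math

exp_b = 0.798                 # Exp(B) for employ, as reported in the text
b = math.log(exp_b)           # the underlying coefficient B (negative, since Exp(B) < 1)

def odds_multiplier(b, increase):
    """Factor by which the odds change when the predictor rises by `increase` units."""
    return math.exp(b * increase)

print(round(odds_multiplier(b, 1), 3))   # 0.798
print(round(odds_multiplier(b, 5), 3))   # 0.324
```

Each additional year multiplies the odds by the same factor, so five more years with the current employer multiply the odds of default by 0.798 raised to the fifth power.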
The design effects indicate that some of the standard errors comput<strong>ed</strong> for these<br />
parameter estimates are larger than those you would obtain if you assum<strong>ed</strong> that these<br />
observations came from a simple random sample, while others are smaller. It <strong>is</strong> vitally
important to incorporate the sampling design information in your analys<strong>is</strong> because<br />
you might otherw<strong>is</strong>e infer, for example, that the age coefficient <strong>is</strong> no different from 0!<br />
Odds Ratios<br />
Figure 18-10<br />
Odds ratios for level of <strong>ed</strong>ucation<br />
Th<strong>is</strong> table d<strong>is</strong>plays the odds ratios of Previously default<strong>ed</strong> at the f<strong>ac</strong>tor levels of Level<br />
of <strong>ed</strong>ucation. The report<strong>ed</strong> values are the ratios of the odds of default for Did not<br />
complete high school through College degree, compar<strong>ed</strong> to the odds of default for<br />
Post-undergraduate degree. Thus, the odds ratio of 2.054 in the first row of the table<br />
means that the odds of default for a person who did not complete high school are<br />
2.054 times the odds of default for a person who has a post-undergraduate degree.
Figure 18-11<br />
Odds ratios for years with current employer<br />
Th<strong>is</strong> table d<strong>is</strong>plays the odds ratio of Previously default<strong>ed</strong> for a unit change in the<br />
covariate Years with current employer. The report<strong>ed</strong> value <strong>is</strong> the ratio of the odds<br />
of default for a person with 7.99 years at their current job compar<strong>ed</strong> to the odds of<br />
default for a person with 6.99 years (the mean).<br />
Figure 18-12<br />
Odds ratios for debt to income ratio<br />
Th<strong>is</strong> table d<strong>is</strong>plays the odds ratio of Previously default<strong>ed</strong> for a unit change in the<br />
covariate Debt to income ratio. The report<strong>ed</strong> value <strong>is</strong> the ratio of the odds of default<br />
for a person with a debt/income ratio of 10.9341 compar<strong>ed</strong> to the odds of default<br />
for a person with 9.9341 (the mean).<br />
Note that because none of these pr<strong>ed</strong>ictors are part of inter<strong>ac</strong>tion terms, the values<br />
of the odds ratios report<strong>ed</strong> in these tables are equal to the values of the exponentiat<strong>ed</strong><br />
parameter estimates. When a pr<strong>ed</strong>ictor <strong>is</strong> part of an inter<strong>ac</strong>tion term, its odds ratio<br />
as report<strong>ed</strong> in these tables will also depend on the values of the other pr<strong>ed</strong>ictors<br />
that make up the inter<strong>ac</strong>tion.
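The dependence of an odds ratio on the other predictors in an interaction can be sketched in Python. The interaction coefficients below are hypothetical, since the model in this example contains no interaction terms; only the 0.798 value comes from the text.

```python
import math

def odds_ratio_for_unit_change(b_main, b_interaction=0.0, other_value=0.0):
    """Odds ratio for a one-unit increase in x when the logit contains
    b_main * x + b_interaction * x * z, evaluated at z = other_value."""
    return math.exp(b_main + b_interaction * other_value)

# Without an interaction, the odds ratio is simply Exp(B).
print(round(odds_ratio_for_unit_change(b_main=math.log(0.798)), 3))  # 0.798

# With a hypothetical interaction coefficient, the odds ratio changes with z.
for z in (0.0, 10.0, 20.0):
    print(z, round(odds_ratio_for_unit_change(-0.2, 0.01, z), 3))
# 0.0 0.819
# 10.0 0.905
# 20.0 1.0
```

In the second case no single number summarizes the effect of x, which is why the odds ratio must be reported at stated values of the interacting predictor.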
Summary<br />
Using the <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression Proc<strong>ed</strong>ure, you have construct<strong>ed</strong> a<br />
model for pr<strong>ed</strong>icting the probability that a given customer will default on a loan.<br />
A critical <strong>is</strong>sue for loan officers <strong>is</strong> the cost of Type I and Type II errors. That <strong>is</strong>,<br />
what <strong>is</strong> the cost of classifying a defaulter as a non-defaulter (Type I)? What <strong>is</strong> the<br />
cost of classifying a non-defaulter as a defaulter (Type II)? If bad debt <strong>is</strong> the primary<br />
concern, then you want to lower your Type I error and maximize your sensitivity.<br />
If growing your customer base <strong>is</strong> the priority, then you want to lower your Type<br />
II error and maximize your specificity. Usually, both are major concerns, so you<br />
have to choose a dec<strong>is</strong>ion rule for classifying customers that gives the best mix<br />
of sensitivity and specificity.<br />
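The trade-off between sensitivity and specificity can be explored by varying the probability cutoff used to classify a customer as a defaulter. The Python sketch below treats default as the positive class; the predicted probabilities and the cutoffs are hypothetical.

```python
def sensitivity_specificity(cases, cutoff):
    """cases: (observed_default, predicted_probability) pairs; default is the positive class."""
    tp = sum(1 for obs, p in cases if obs and p > cutoff)        # defaulters flagged
    fn = sum(1 for obs, p in cases if obs and p <= cutoff)       # defaulters missed
    tn = sum(1 for obs, p in cases if not obs and p <= cutoff)   # non-defaulters cleared
    fp = sum(1 for obs, p in cases if not obs and p > cutoff)    # non-defaulters flagged
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical predicted probabilities of default (illustrative only).
cases = [
    (True, 0.9), (True, 0.6), (True, 0.4), (True, 0.2),
    (False, 0.7), (False, 0.3), (False, 0.2), (False, 0.1),
]

for cutoff in (0.25, 0.5, 0.75):
    sens, spec = sensitivity_specificity(cases, cutoff)
    print(cutoff, sens, spec)
# 0.25 0.75 0.5
# 0.5 0.5 0.75
# 0.75 0.25 1.0
```

Lowering the cutoff catches more defaulters at the price of turning away more good customers; raising it does the reverse.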
Relat<strong>ed</strong> Proc<strong>ed</strong>ures<br />
The <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression proc<strong>ed</strong>ure <strong>is</strong> a useful tool for modeling<br />
a categorical variable when the cases have been drawn <strong>ac</strong>cording to a complex<br />
sampling scheme.<br />
• The <strong>Complex</strong> Samples Sampling Wizard <strong>is</strong> us<strong>ed</strong> to specify complex sampling<br />
design specifications and obtain a sample. The sampling plan file creat<strong>ed</strong> by the<br />
Sampling Wizard contains a default analys<strong>is</strong> plan and can be specifi<strong>ed</strong> in the Plan<br />
dialog box when you are analyzing the sample obtain<strong>ed</strong> <strong>ac</strong>cording to that plan.<br />
• The <strong>Complex</strong> Samples Analys<strong>is</strong> Preparation Wizard <strong>is</strong> us<strong>ed</strong> to specify analys<strong>is</strong><br />
specifications for an ex<strong>is</strong>ting complex sample. The analys<strong>is</strong> plan file creat<strong>ed</strong> by<br />
the Analys<strong>is</strong> Preparation Wizard can be specifi<strong>ed</strong> in the Plan dialog box when you are<br />
analyzing the sample corresponding to that plan.<br />
• The <strong>Complex</strong> Samples General Linear Model proc<strong>ed</strong>ure allows you to model a<br />
scale response.
Bibliography<br />
Cochran, W. G. 1977. Sampling Techniques. New York: John Wiley and Sons.<br />
K<strong>is</strong>h, L. 1965. Survey Sampling. New York: John Wiley and Sons.<br />
K<strong>is</strong>h, L. 1987. Stat<strong>is</strong>tical Design for Research. New York: John Wiley and Sons.<br />
Murthy, M. N. 1967. Sampling Theory and Methods. Calcutta, India: Stat<strong>is</strong>tical<br />
Publ<strong>is</strong>hing Society.<br />
Särndal, C., B. Swensson, and J. Wretman. 1992. Model Ass<strong>is</strong>t<strong>ed</strong> Survey Sampling.<br />
New York: Springer-Verlag.<br />
Index<br />
adjust<strong>ed</strong> chi-square stat<strong>is</strong>tic<br />
in <strong>Complex</strong> Samples, 66, 80<br />
adjust<strong>ed</strong> F stat<strong>is</strong>tic<br />
in <strong>Complex</strong> Samples, 66, 80<br />
adjust<strong>ed</strong> residuals<br />
in <strong>Complex</strong> Samples Crosstabs, 51<br />
analys<strong>is</strong> plan, 23<br />
Bonferroni<br />
in <strong>Complex</strong> Samples, 66, 80<br />
Brewer’s sampling method<br />
in Sampling Wizard, 9<br />
chi-square stat<strong>is</strong>tic<br />
in <strong>Complex</strong> Samples, 66, 80<br />
classification tables<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 78<br />
clusters<br />
in Analys<strong>is</strong> Preparation Wizard, 25<br />
in Sampling Wizard, 7<br />
coefficient of variation (COV)<br />
in <strong>Complex</strong> Samples Crosstabs, 51<br />
in <strong>Complex</strong> Samples Descriptives, 45<br />
in <strong>Complex</strong> Samples Frequencies, 39<br />
in <strong>Complex</strong> Samples Ratios, 57<br />
column percentages<br />
in <strong>Complex</strong> Samples Crosstabs, 51<br />
<strong>Complex</strong> Samples<br />
hypothes<strong>is</strong> tests, 66, 80<br />
m<strong>is</strong>sing values, 40, 53<br />
options, 41, 46, 53, 58<br />
<strong>Complex</strong> Samples Analys<strong>is</strong> Preparation Wizard,<br />
123<br />
public data, 123<br />
relat<strong>ed</strong> proc<strong>ed</strong>ures, 137<br />
sampling weights not available, 126<br />
summary, 126, 137<br />
<strong>Complex</strong> Samples Crosstabs, 49, 151<br />
crosstabulation table, 155<br />
relat<strong>ed</strong> proc<strong>ed</strong>ures, 158<br />
relative r<strong>is</strong>k, 151, 155, 157<br />
stat<strong>is</strong>tics, 51<br />
<strong>Complex</strong> Samples Descriptives, 43, 145<br />
m<strong>is</strong>sing values, 46<br />
public data, 145<br />
relat<strong>ed</strong> proc<strong>ed</strong>ures, 150<br />
stat<strong>is</strong>tics, 45, 148<br />
stat<strong>is</strong>tics by subpopulation, 149<br />
<strong>Complex</strong> Samples Frequencies, 37, 139<br />
frequency table, 142<br />
frequency table by subpopulation, 143<br />
relat<strong>ed</strong> proc<strong>ed</strong>ures, 144<br />
stat<strong>is</strong>tics, 39<br />
<strong>Complex</strong> Samples General Linear Model, 61, 167<br />
command additional features, 71<br />
estimat<strong>ed</strong> means, 68<br />
marginal means, 175<br />
model, 63<br />
model summary, 172<br />
options, 70<br />
parameter estimates, 174<br />
relat<strong>ed</strong> proc<strong>ed</strong>ures, 179<br />
save variables, 69<br />
stat<strong>is</strong>tics, 65<br />
tests of model effects, 173<br />
<strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 73, 181<br />
command additional features, 85<br />
model, 76<br />
odds ratios, 81, 190<br />
options, 84<br />
parameter estimates, 189<br />
pseudo R 2 stat<strong>is</strong>tics, 187<br />
reference category, 75<br />
relat<strong>ed</strong> proc<strong>ed</strong>ures, 192<br />
save variables, 83<br />
stat<strong>is</strong>tics, 78<br />
tests of model effects, 188<br />
<strong>Complex</strong> Samples Ratios, 55, 159<br />
m<strong>is</strong>sing values, 58<br />
ratios, 162<br />
relat<strong>ed</strong> proc<strong>ed</strong>ures, 165<br />
stat<strong>is</strong>tics, 57<br />
<strong>Complex</strong> Samples Sampling Wizard, 89<br />
relat<strong>ed</strong> proc<strong>ed</strong>ures, 121<br />
sampling frame, full, 89<br />
sampling frame, partial, 103<br />
summary, 99, 100<br />
complex sampling<br />
analys<strong>is</strong> plan, 23<br />
sample plan, 5<br />
confidence intervals<br />
in <strong>Complex</strong> Samples Crosstabs, 51<br />
in <strong>Complex</strong> Samples Descriptives, 45, 148, 149<br />
in <strong>Complex</strong> Samples Frequencies, 39, 142, 143<br />
in <strong>Complex</strong> Samples General Linear Model, 65,<br />
70<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 78<br />
in <strong>Complex</strong> Samples Ratios, 57<br />
confidence level<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 84<br />
contrasts<br />
in <strong>Complex</strong> Samples General Linear Model, 68<br />
correlations of parameter estimates<br />
in <strong>Complex</strong> Samples General Linear Model, 65<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 78<br />
covariances of parameter estimates<br />
in <strong>Complex</strong> Samples General Linear Model, 65<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 78<br />
crosstabulation table<br />
in <strong>Complex</strong> Samples Crosstabs, 155<br />
cumulative values<br />
in <strong>Complex</strong> Samples Frequencies, 39<br />
degrees of fre<strong>ed</strong>om<br />
in <strong>Complex</strong> Samples, 66, 80<br />
design effect<br />
in <strong>Complex</strong> Samples Crosstabs, 51<br />
in <strong>Complex</strong> Samples Descriptives, 45<br />
in <strong>Complex</strong> Samples Frequencies, 39<br />
in <strong>Complex</strong> Samples General Linear Model, 65<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 78<br />
in <strong>Complex</strong> Samples Ratios, 57<br />
deviation contrasts<br />
in <strong>Complex</strong> Samples General Linear Model, 68<br />
difference contrasts<br />
in <strong>Complex</strong> Samples General Linear Model, 68<br />
estimat<strong>ed</strong> marginal means<br />
in <strong>Complex</strong> Samples General Linear Model, 68<br />
expect<strong>ed</strong> values<br />
in <strong>Complex</strong> Samples Crosstabs, 51<br />
F stat<strong>is</strong>tic<br />
in <strong>Complex</strong> Samples, 66, 80<br />
Helmert contrasts<br />
in <strong>Complex</strong> Samples General Linear Model, 68<br />
inclusion probabilities<br />
in Sampling Wizard, 13<br />
input sample weights<br />
in Sampling Wizard, 7<br />
iteration h<strong>is</strong>tory<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 84<br />
iterations<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 84<br />
least significance difference<br />
in <strong>Complex</strong> Samples, 66, 80<br />
likelihood convergence<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 84
marginal means<br />
in GLM Univariate, 175<br />
mean<br />
in <strong>Complex</strong> Samples Descriptives, 45, 148, 149<br />
measure of size<br />
in Sampling Wizard, 9<br />
m<strong>is</strong>sing values<br />
in <strong>Complex</strong> Samples, 40, 53<br />
in <strong>Complex</strong> Samples Descriptives, 46<br />
in <strong>Complex</strong> Samples General Linear Model, 70<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 84<br />
in <strong>Complex</strong> Samples Ratios, 58<br />
Murthy’s sampling method<br />
in Sampling Wizard, 9<br />
odds ratios<br />
in <strong>Complex</strong> Samples Crosstabs, 51, 151<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 81,<br />
190<br />
parameter convergence<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 84<br />
parameter estimates<br />
in <strong>Complex</strong> Samples General Linear Model, 65,<br />
174<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 78,<br />
189<br />
polynomial contrasts<br />
in <strong>Complex</strong> Samples General Linear Model, 68<br />
population size<br />
in <strong>Complex</strong> Samples Crosstabs, 51<br />
in <strong>Complex</strong> Samples Descriptives, 45<br />
in <strong>Complex</strong> Samples Frequencies, 39, 142, 143<br />
in <strong>Complex</strong> Samples Ratios, 57<br />
in Sampling Wizard, 13<br />
PPS sampling<br />
in Sampling Wizard, 9<br />
pr<strong>ed</strong>ict<strong>ed</strong> categories<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 83<br />
pr<strong>ed</strong>ict<strong>ed</strong> probabilities<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 83<br />
pr<strong>ed</strong>ict<strong>ed</strong> values<br />
in <strong>Complex</strong> Samples General Linear Model, 69<br />
pseudo R 2 stat<strong>is</strong>tics<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 78,<br />
187<br />
public data<br />
in Analys<strong>is</strong> Preparation Wizard, 123<br />
in <strong>Complex</strong> Samples Descriptives, 145<br />
R 2<br />
in <strong>Complex</strong> Samples General Linear Model, 65, 172<br />
ratios<br />
in <strong>Complex</strong> Samples Ratios, 162<br />
reference category<br />
in <strong>Complex</strong> Samples General Linear Model, 68<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 75<br />
relative r<strong>is</strong>k<br />
in <strong>Complex</strong> Samples Crosstabs, 51, 151, 155,<br />
157<br />
repeat<strong>ed</strong> contrasts<br />
in <strong>Complex</strong> Samples General Linear Model, 68<br />
residuals<br />
in <strong>Complex</strong> Samples Crosstabs, 51<br />
in <strong>Complex</strong> Samples General Linear Model, 69<br />
r<strong>is</strong>k difference<br />
in <strong>Complex</strong> Samples Crosstabs, 51<br />
row percentages<br />
in <strong>Complex</strong> Samples Crosstabs, 51<br />
Sampford’s sampling method<br />
in Sampling Wizard, 9<br />
sample plan, 5<br />
sample proportion<br />
in Sampling Wizard, 13<br />
sample size<br />
in Sampling Wizard, 11, 13<br />
sample weights<br />
in Analys<strong>is</strong> Preparation Wizard, 25<br />
in Sampling Wizard, 13
sampling<br />
complex design, 5<br />
sampling estimation<br />
in Analys<strong>is</strong> Preparation Wizard, 27<br />
sampling frame, full<br />
in Sampling Wizard, 89<br />
sampling frame, partial<br />
in Sampling Wizard, 103<br />
sampling method<br />
in Sampling Wizard, 9<br />
separation<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 84<br />
sequential Bonferroni<br />
in <strong>Complex</strong> Samples, 66, 80<br />
sequential sampling<br />
in Sampling Wizard, 9<br />
sequential Sidak<br />
in <strong>Complex</strong> Samples, 66, 80<br />
Sidak<br />
in <strong>Complex</strong> Samples, 66, 80<br />
simple contrasts<br />
in <strong>Complex</strong> Samples General Linear Model, 68<br />
simple random sampling<br />
in Sampling Wizard, 9<br />
square root of design effect<br />
in <strong>Complex</strong> Samples Crosstabs, 51<br />
in <strong>Complex</strong> Samples Descriptives, 45<br />
in <strong>Complex</strong> Samples Frequencies, 39<br />
in <strong>Complex</strong> Samples General Linear Model, 65<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 78<br />
in <strong>Complex</strong> Samples Ratios, 57<br />
standard error<br />
in <strong>Complex</strong> Samples Crosstabs, 51<br />
in <strong>Complex</strong> Samples Descriptives, 45, 148, 149<br />
in <strong>Complex</strong> Samples Frequencies, 39, 142, 143<br />
in <strong>Complex</strong> Samples General Linear Model, 65<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 78<br />
in <strong>Complex</strong> Samples Ratios, 57<br />
step-halving<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 84<br />
stratification<br />
in Analys<strong>is</strong> Preparation Wizard, 25<br />
in Sampling Wizard, 7<br />
sum<br />
in <strong>Complex</strong> Samples Descriptives, 45<br />
summary<br />
in Analys<strong>is</strong> Preparation Wizard, 126, 137<br />
in Sampling Wizard, 99, 100<br />
systematic sampling<br />
in Sampling Wizard, 9<br />
table percentages<br />
in <strong>Complex</strong> Samples Crosstabs, 51<br />
in <strong>Complex</strong> Samples Frequencies, 39, 142, 143<br />
tests of model effects<br />
in <strong>Complex</strong> Samples General Linear Model, 173<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 188<br />
t tests<br />
in <strong>Complex</strong> Samples General Linear Model, 65<br />
in <strong>Complex</strong> Samples Log<strong>is</strong>tic Regression, 78<br />
unweight<strong>ed</strong> count<br />
in <strong>Complex</strong> Samples Crosstabs, 51<br />
in <strong>Complex</strong> Samples Descriptives, 45<br />
in <strong>Complex</strong> Samples Frequencies, 39<br />
in <strong>Complex</strong> Samples Ratios, 57