How to use HLM 6 for hierarchical linear modeling<br />
(aka “mixed modeling” or “multilevel modeling”; closely related to, but not the same as, “generalized estimating equations”)<br />
Use HLM when you have lower-level observations (e.g., repeated measurements of an outcome over time)<br />
nested within higher-level units (e.g., participants).<br />
Options for the procedure include “PROC MIXED” in SAS, “PROC GLM” with a<br />
“RANDOM” statement in SAS, “Repeated Measures” under the GLM<br />
menu in SPSS, or the HLM program.<br />
The main disadvantage of using a GLM-based procedure is that it is not able to deal<br />
with missing data: if not all participants have data at all time intervals (especially if<br />
observations are scattered over varying amounts of time or at different intervals for<br />
each person, e.g., in the case of cued diary-type measures), then GLM-based analyses<br />
will drop the time intervals at which not all participants have data (or else drop the<br />
participants who do not have data at all time intervals). This weakness is overcome by<br />
PROC MIXED in SAS (Littell, Milliken, Stroup, & Wolfinger, 1996) or by the HLM<br />
program designed specifically for hierarchical linear modeling (Raudenbush, Bryk,<br />
Cheong, & Congdon, 2001). “The basic theory on which PROC MIXED [in SAS] is<br />
based holds even with unbalanced and missing data, so long as the missing data are<br />
random” (Littell et al., 1996). With multiple repeated measures for specific<br />
participants, another issue is that the observations for a given participant are serially<br />
autocorrelated with each other, violating the assumption of independence of<br />
observations which is fundamental to GLM procedures. For these reasons, it is<br />
preferable to use HLM or SAS PROC MIXED when dealing with multiple repeated<br />
measures, rather than the GLM-based Repeated Measures ANOVA option in SPSS.<br />
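To make the missing-data point concrete, here is a minimal Python sketch using made-up scores (None marks a missed time point). It compares how many participants a complete-case analysis would keep with how many observations remain usable when the data are stored one row per observation. The data and function names are hypothetical and are not part of HLM, SAS, or SPSS.<br />

```python
# Hypothetical scores at four time points; None marks a missing observation.
wide = {
    "p1": [18, 14, 11, 6],
    "p2": [19, None, 9, 4],
    "p3": [17, 15, None, None],
}

def complete_cases(data):
    """Participants retained when anyone with a missing time point is dropped."""
    return [pid for pid, scores in data.items() if None not in scores]

def long_format(data):
    """One (participant, time, score) row per observed value."""
    return [
        (pid, time, score)
        for pid, scores in data.items()
        for time, score in enumerate(scores, start=1)
        if score is not None
    ]

kept = complete_cases(wide)
rows = long_format(wide)
print(len(kept), "of", len(wide), "participants are complete cases")
print(len(rows), "observations are available in long format")
```

In this toy data set only one participant has complete data, yet most of the individual observations are still available to a procedure that works on one row per observation.<br />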
The basic HLM procedure is not specifically limited by sample size, although some<br />
procedures do require larger samples in order to be reliable (e.g., use of “robust<br />
standard errors” to improve estimates of beta- and gamma-weights in the HLM<br />
program). Assumptions of the procedure include:<br />
• Residuals at each level are normally distributed<br />
• Level-2 cases are independent of one another (Level-1 cases are expected to<br />
be dependent)<br />
• There is homogeneity of variance for the variability in the Level-1 cases (the<br />
HLM program has an option to test for whether this assumption is violated)<br />
UCDHSC Center for Nursing Research Updated 5/20/06<br />
STEPS FOR USING HLM:<br />
1. Configure data:<br />
– The Level 1 file has multiple observations per case (change over time for the case,<br />
with a “time” or “sequence” variable, and an “outcome” variable)<br />
– The Level 2 file has only one observation per case (additional<br />
descriptors for the case)<br />
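The two-file layout can be sketched in Python as follows. This is only an illustration: the wide-format records and their values are made up, and the variable names (ID, TRIAL, SCORE, ANXIETY, TENSION) are borrowed from the sample output later in this handout. The rows would still need to be saved in a format your statistics package can read before importing them into HLM.<br />

```python
# Hypothetical wide-format records: one row per participant, four trial scores.
wide = [
    {"ID": 2, "ANXIETY": 1, "TENSION": 2, "scores": [19, 13, 9, 4]},
    {"ID": 1, "ANXIETY": 1, "TENSION": 1, "scores": [18, 14, 11, 6]},
]

def split_levels(records):
    """Return (level1_rows, level2_rows), both ordered by ID."""
    level1, level2 = [], []
    for rec in sorted(records, key=lambda r: r["ID"]):
        # Level 2: exactly one row per case, holding the case descriptors.
        level2.append(
            {"ID": rec["ID"], "ANXIETY": rec["ANXIETY"], "TENSION": rec["TENSION"]}
        )
        # Level 1: one row per observation, with a sequence variable and the outcome.
        for trial, score in enumerate(rec["scores"], start=1):
            level1.append({"ID": rec["ID"], "TRIAL": trial, "SCORE": score})
    return level1, level2

level1, level2 = split_levels(wide)
print(len(level1), "Level 1 rows;", len(level2), "Level 2 rows")
```

Both files carry the same ID variable, which is what links each Level 1 observation to its Level 2 case.<br />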
2. Import into HLM program and “make new MDM” file. There are a number of<br />
steps here, and it’s important to do them all in the right order:<br />
— select “stat package input”<br />
— select “HLM2” for the type of analysis (2 levels)<br />
— set “input file type” to “SPSS”<br />
— make up a name for the MDM file (with “.mdm” extension)<br />
— identify the files using “browse” buttons<br />
— use “choose variables” to define the Level 1 & Level 2<br />
variables of interest<br />
— select the variable that is the primary key between the<br />
Level 1 & Level 2 files – it gets flagged as “ID” in both<br />
— if there are any missing data, select “missing data” in the<br />
Level 1 file. If you are using the student version, select<br />
“delete missing data when making MDM” because this<br />
makes the analysis less complex. Otherwise, select “delete<br />
missing data when running analyses,” because this will<br />
conserve statistical power as much as possible.<br />
— click “save mdmt file” and give it a file name.<br />
— click the “make MDM” button.<br />
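Before importing, it can help to check the two things this step depends on: that the ID key matches across the two files, and how many Level 1 rows would be lost if rows with missing data were deleted. The Python sketch below shows one way to do that under stated assumptions: the rows are hypothetical, the helper names are made up, and the code only mimics the idea of deleting incomplete Level 1 rows. It is not what the HLM program runs internally.<br />

```python
def check_ids(level1, level2, key="ID"):
    """Return (Level 1 IDs with no Level 2 row, Level 2 IDs with no Level 1 rows)."""
    ids1 = {row[key] for row in level1}
    ids2 = {row[key] for row in level2}
    return sorted(ids1 - ids2), sorted(ids2 - ids1)

def drop_missing(level1, fields):
    """Keep only the Level 1 rows that have a value for every listed field."""
    return [row for row in level1 if all(row.get(f) is not None for f in fields)]

# Hypothetical rows for illustration only.
level1 = [
    {"ID": 1, "TRIAL": 1, "SCORE": 18},
    {"ID": 1, "TRIAL": 2, "SCORE": None},  # missed session
    {"ID": 2, "TRIAL": 1, "SCORE": 19},
    {"ID": 3, "TRIAL": 1, "SCORE": 17},    # no matching Level 2 row
]
level2 = [
    {"ID": 1, "ANXIETY": 1, "TENSION": 1},
    {"ID": 2, "ANXIETY": 2, "TENSION": 1},
]

unmatched_l1, unmatched_l2 = check_ids(level1, level2)
usable = drop_missing(level1, ["TRIAL", "SCORE"])
print("Level 1 IDs without a Level 2 row:", unmatched_l1)
print("Level 1 rows with complete data:", len(usable), "of", len(level1))
```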
3. Click on the “check stats” button to check descriptive stats for each variable:<br />
LEVEL-1 DESCRIPTIVE STATISTICS<br />
VARIABLE NAME N MEAN SD MINIMUM MAXIMUM<br />
SCORE 48 10.00 5.17 1.00 19.00<br />
TRIAL 48 2.50 1.13 1.00 4.00<br />
LEVEL-2 DESCRIPTIVE STATISTICS<br />
VARIABLE NAME N MEAN SD MINIMUM MAXIMUM<br />
ANXIETY 12 1.50 0.52 1.00 2.00<br />
TENSION 12 1.50 0.52 1.00 2.00<br />
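The same summary columns can be reproduced outside HLM as a cross-check on the import. The Python sketch below assumes a balanced design (12 participants with 4 trials each, and six participants at each ANXIETY level); that design is an assumption chosen to be consistent with the N, MINIMUM, and MAXIMUM values in the table, not something the output states.<br />

```python
import statistics

def describe(values):
    """N, mean, sample SD, minimum, and maximum, rounded as in the HLM listing."""
    return {
        "N": len(values),
        "MEAN": round(statistics.mean(values), 2),
        "SD": round(statistics.stdev(values), 2),  # sample SD (n - 1 denominator)
        "MINIMUM": min(values),
        "MAXIMUM": max(values),
    }

# Assumed balanced design: trials 1-4 for each of 12 participants.
trial = [1, 2, 3, 4] * 12
# Assumed split: six participants at each of the two anxiety levels.
anxiety = [1] * 6 + [2] * 6

trial_stats = describe(trial)
anxiety_stats = describe(anxiety)
print("TRIAL", trial_stats)
print("ANXIETY", anxiety_stats)
```

Under these assumptions the computed means and standard deviations agree with the TRIAL and ANXIETY rows above, which suggests the listing reports the sample standard deviation.<br />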
4. Click on the “Done” button to go to the next screen<br />
5. Specify the HLM model, with beta coefficients for Level 1 variables and gamma<br />
coefficients for Level 2 variables. Start by specifying the “outcome” variable at<br />
Level 1, then add other variables to the model at Level 1 and Level 2. (Left-click<br />
on each variable name in the list on the left-hand side of the screen to<br />
specify its role in the equation.)<br />
6. Clicking either “Basic Settings” or “Outcome” lets you say where to save the<br />
output file, whether to graph results, etc.<br />
7. Save the model, under the “File” menu.<br />
8. Click on “Run Analysis” to see results.<br />
9. “View Output” is under the “File” menu. Results include the model coefficients<br />
and tests for the statistical significance of each predictor. The output also gives<br />
you the level-1 regression equation (intercept and slope) for each level-2 unit:<br />
SAMPLE OUTPUT<br />
Summary of the model specified (in equation format)<br />
---------------------------------------------------<br />
Level-1 Model<br />
Y = B0 + B1*(TRIAL) + R<br />
Level-2 Model<br />
B0 = G00 + U0<br />
B1 = G10 + G11*(ANXIETY) + G12*(TENSION) + U1<br />
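For readers who prefer standard multilevel notation, the model printed above can be written as follows. The subscripts are added here as an assumption (i indexes trials, j indexes participants); the HLM listing omits them. The last line substitutes the two Level-2 equations into the Level-1 equation to give the combined model.<br />

```latex
\begin{align*}
Y_{ij} &= \beta_{0j} + \beta_{1j}\,\mathrm{TRIAL}_{ij} + r_{ij} \\
\beta_{0j} &= \gamma_{00} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}\,\mathrm{ANXIETY}_{j} + \gamma_{12}\,\mathrm{TENSION}_{j} + u_{1j} \\
Y_{ij} &= \gamma_{00} + \gamma_{10}\,\mathrm{TRIAL}_{ij}
        + \gamma_{11}\,\mathrm{ANXIETY}_{j}\,\mathrm{TRIAL}_{ij}
        + \gamma_{12}\,\mathrm{TENSION}_{j}\,\mathrm{TRIAL}_{ij}
        + u_{0j} + u_{1j}\,\mathrm{TRIAL}_{ij} + r_{ij}
\end{align*}
```

The combined form shows that ANXIETY and TENSION enter this model only through their interaction with TRIAL, i.e., as predictors of each person's rate of change.<br />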
Level-1 OLS regressions<br />
-----------------------<br />
Level-2 Unit INTRCPT1 TRIAL slope<br />
------------------------------------------------------------------------------<br />
1 22.00000 -3.80000<br />
2 23.00000 -4.90000<br />
3 18.00000 -4.00000<br />
4 20.00000 -3.80000<br />
5 15.00000 -3.20000<br />
6 22.50000 -5.60000<br />
7 19.00000 -3.80000<br />
8 21.50000 -5.50000<br />
9 21.00000 -4.80000<br />
10 23.00000 -3.90000<br />
The average OLS level-1 coefficient for INTRCPT1 = 20.12500<br />
The average OLS level-1 coefficient for TRIAL = -4.05000<br />
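Each row of this listing is an ordinary least-squares line fitted to one participant's data. The Python sketch below fits such a line for a single hypothetical participant (the four scores are made up) and then averages the ten intercepts and slopes copied from the listing above. The means of the ten listed rows come out different from the two reported averages, presumably because the listing shows only some of the 12 level-2 units; that explanation is an inference, not something the output states.<br />

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical participant: scores on trials 1 to 4.
intercept, slope = ols_line([1, 2, 3, 4], [18, 14, 11, 6])
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")

# Values copied from the ten rows of the listing above.
listed_intercepts = [22.0, 23.0, 18.0, 20.0, 15.0, 22.5, 19.0, 21.5, 21.0, 23.0]
listed_slopes = [-3.8, -4.9, -4.0, -3.8, -3.2, -5.6, -3.8, -5.5, -4.8, -3.9]
mean_intercept = sum(listed_intercepts) / len(listed_intercepts)
mean_slope = sum(listed_slopes) / len(listed_slopes)
print(f"mean of listed intercepts = {mean_intercept:.3f}")
print(f"mean of listed slopes = {mean_slope:.3f}")
```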
Least Squares Estimates<br />
-----------------------<br />
sigma_squared = 5.60202<br />
The outcome variable is<br />
SCORE<br />
Least-squares estimates of fixed effects<br />
----------------------------------------------------------------------------<br />
Standard<br />
Fixed Effect Coefficient Error T-ratio d.f. P-value<br />
----------------------------------------------------------------------------<br />
For INTRCPT1, B0<br />
INTRCPT2, G00 20.125000 0.836811 24.050 44 0.000<br />
For TRIAL slope, B1<br />
INTRCPT2, G10 -5.216667 0.611120 -8.536 44 0.000<br />
ANXIETY, G11 0.361111 0.249489 1.447 44 0.155<br />
TENSION, G12 0.416667 0.249489 1.670 44 0.102<br />
----------------------------------------------------------------------------<br />
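The T-ratio column is each coefficient divided by its standard error. The short Python check below recomputes it from the coefficients and standard errors printed in the table above; only the dictionary layout is an invention for illustration. P-values are not recomputed here because that needs a t distribution, which the Python standard library does not provide.<br />

```python
# (coefficient, standard error) pairs copied from the fixed-effects table above.
fixed_effects = {
    "G00": (20.125000, 0.836811),
    "G10": (-5.216667, 0.611120),
    "G11": (0.361111, 0.249489),
    "G12": (0.416667, 0.249489),
}

# T-ratio = coefficient / standard error.
t_ratios = {name: coef / se for name, (coef, se) in fixed_effects.items()}

for name, t in t_ratios.items():
    print(f"{name}: T-ratio = {t:.3f}")
```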
Interpretation: In this example, the level-2 constant (intercept 2, G00) is a significant<br />
predictor of the level-1 constant (beta-zero), which is the participant’s initial level of<br />
performance on the SCORE variable (strictly, the predicted SCORE when TRIAL = 0).<br />
The level-2 constant (intercept 2, G10) is also a significant predictor of the level-1 slope<br />
(beta-one). Neither anxiety level (G11, p = .155) nor tension level (G12, p = .102) had a<br />
strong enough effect to be considered statistically significant as a<br />
predictor of the within-person change in the SCORE variable over time (i.e., beta-one).<br />
SAMPLE OUTPUT (CONTINUED)<br />
Final estimation of variance components:<br />
-----------------------------------------------------------------------------<br />
Random Effect Standard Variance df Chi-square P-value<br />
Deviation Component<br />
-----------------------------------------------------------------------------<br />
INTRCPT1, U0 1.87129 3.50174 11 73.95178 0.000<br />
level-1, R 1.56446 2.44754<br />
-----------------------------------------------------------------------------<br />
Interpretation: The variance of the level-1 intercept (i.e., people’s starting score) is<br />
significantly greater than zero (chi-square = 73.95, df = 11, P-value printed as 0.000).<br />
This means that people are significantly different from one another (there is variability<br />
among the level-2 units), even though the<br />
level-2 predictors weren’t able to account for this variability.<br />
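One way to put a number on how much people differ is the share of the remaining variance that lies between people, i.e., the intercept variance divided by the sum of the intercept and level-1 variances. The Python sketch below computes it from the two variance components printed above. This ratio is an addition for illustration and does not appear in the HLM output; because the model already includes TRIAL, it is a conditional (residual) intraclass correlation rather than one from an empty model.<br />

```python
# Variance components and standard deviations copied from the table above.
tau00, sd_u0 = 3.50174, 1.87129   # between-person variance of the intercept (U0)
sigma2, sd_r = 2.44754, 1.56446   # within-person (level-1) variance (R)

# Each variance component is the square of its standard deviation.
var_from_sd_u0 = sd_u0 ** 2
var_from_sd_r = sd_r ** 2

# Share of the remaining variance that lies between people.
icc = tau00 / (tau00 + sigma2)
print(f"between-person share of variance = {icc:.2f}")
```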
Statistics <strong>for</strong> current covariance components model<br />
--------------------------------------------------<br />
Deviance = 198.591730<br />
Number of estimated parameters = 2<br />
Interpretation: The deviance statistic is the same as a -2 log likelihood, and the larger it<br />
is, the worse the fit between the model and the data. Deviance has no absolute cutoff,<br />
because it grows with the number of observations, so this value (198.59) cannot show by<br />
itself whether the model fits adequately; it is meaningful only in comparison with the<br />
deviance of another model fit to the same data. Because the level-2 predictors here were<br />
not significant, other predictors or other<br />
combinations of variables should be considered in trying to account for individual<br />
participants’ outcomes on the SCORE variable.<br />
You can also test one HLM model against another, by using the “hypothesis testing”<br />
command under the “other settings” menu. Just put in this model’s deviance and number of<br />
estimated parameters (from the output above), specify a different model, and re-run the<br />
analysis to compare them.<br />
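The comparison is a chi-square test on the difference between the two deviances, with degrees of freedom equal to the difference in the number of estimated parameters. The Python sketch below works through the arithmetic. The first model's values come from the output above; the second model's deviance (190.0) and parameter count (4) are hypothetical, and the function name is made up. The test assumes the two models are nested and fit to the same data.<br />

```python
# Upper 5% critical values of the chi-square distribution for small df.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def compare_deviance(dev_simple, k_simple, dev_complex, k_complex):
    """Return (deviance difference, df, significant at .05?) for nested models."""
    diff = dev_simple - dev_complex
    df = k_complex - k_simple
    return diff, df, diff > CHI2_CRIT_05[df]

# Model from the output above: deviance 198.591730 with 2 estimated parameters.
# Comparison model: hypothetical deviance of 190.0 with 4 estimated parameters.
diff, df, significant = compare_deviance(198.591730, 2, 190.0, 4)
print(f"difference = {diff:.3f} on {df} df; significant at .05: {significant}")
```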
One other great new feature in HLM 6 is that you can graph each individual participant’s<br />
level-1 regression line to see the overall pattern and any outliers. Here’s how: use the<br />
“graph equations – level 1 equation graphing” command in the “file” menu.<br />
In the next screen, again select your outcome variable, and the level-1 predictor and<br />
level-2 predictor that you specifically want to focus on. You can select either a subset of<br />
cases (e.g., “first ten cases”) if the sample size is very large, or all cases. In this example<br />
the total n was only 12 cases, so we included all of them.<br />
The output graph looks like this. It shows you each individual participant’s score (y-axis)<br />
over time (x-axis) as a separate regression line. It further highlights people with the two<br />
different levels of tension (the level-2 predictor) in different colors. This graph is consistent with<br />
our statistical results, showing that the “tension” variable didn’t significantly differentiate<br />
among people, even though there was a significant association for everyone between<br />
“score” and “time.”<br />