
Version: September 3, 2010

Notes on the Multivariate Normal and Related Topics

Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions and sampling distributions. One might say that anyone worth knowing knows these distinctions. We start with the simple univariate framework and then move on to the multivariate context.

A) Begin by positing a population of interest that is operationalized by some random variable, say X. In this Theory World framework, X is characterized by parameters, such as the expectation of X, µ = E(X), or its variance, σ² = V(X). The random variable X has a (population) distribution, which for us will typically be assumed normal.

B) A sample is generated by taking observations on X, say, X_1, ..., X_n, considered independent and identically distributed as X, i.e., they are exact copies of X. In this Data World context, statistics are functions of the sample and therefore characterize the sample: the sample mean, µ̂ = (1/n)∑_{i=1}^{n} X_i; the sample variance, σ̂² = (1/n)∑_{i=1}^{n} (X_i − µ̂)², with some possible variation of dividing by n − 1 to generate an unbiased estimator of σ². The statistics, µ̂ and σ̂², are point estimators of µ and σ². They are themselves random variables, so they have distributions that are called sampling distributions.

The general problem of statistical inference is to ask what the sample statistics, µ̂ and σ̂², tell us about their population counterparts, such as µ and σ². Can we obtain some notion of accuracy from the sampling distribution, e.g., confidence intervals?
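To make these Theory World/Data World distinctions concrete, here is a minimal numerical sketch (in Python with numpy, which is not part of these notes; the population values and sample size are purely illustrative) that draws a sample, computes the point estimators µ̂ and σ̂², and forms the usual normal-theory 95% confidence interval for µ.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 10.0, 2.0, 50          # population parameters, known here only because we simulate

x = rng.normal(mu, sigma, size=n)     # the sample X_1, ..., X_n

mu_hat = x.mean()                     # sample mean
sigma2_hat = x.var()                  # ML variance (divides by n); x.var(ddof=1) divides by n - 1

# Normal-theory 95% confidence interval for mu, using the sampling
# distribution mu_hat ~ N(mu, sigma^2 / n) with sigma treated as known.
half_width = 1.96 * sigma / np.sqrt(n)
ci = (mu_hat - half_width, mu_hat + half_width)
print(mu_hat, sigma2_hat, ci)
```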

The multivariate problem is generally the same but with more notation:

A) The population (Theory World) is characterized by a collection of p random variables, X′ = [X_1, X_2, ..., X_p], with parameters: µ_i = E(X_i); σ_i² = V(X_i); σ_ij = Cov(X_i, X_j) = E[(X_i − E(X_i))(X_j − E(X_j))]; or the correlation between X_i and X_j, ρ_ij = σ_ij/(σ_i σ_j).

B) The sample (Data World) is defined by n independent observations on the random vector X, with the observations placed into an n × p data matrix (e.g., subject by variable) that we also (with a slight abuse of notation) denote by a bold-face capital letter, X_{n×p}:

X_{n×p} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix} = \begin{pmatrix} x′_1 \\ x′_2 \\ \vdots \\ x′_n \end{pmatrix} .

The statistics corresponding to the parameters of the population are: µ̂_i = (1/n)∑_{k=1}^{n} x_{ki}; σ̂_i² = (1/n)∑_{k=1}^{n} (x_{ki} − µ̂_i)², with again some possible variation to a division by n − 1 for an unbiased estimate; σ̂_ij = (1/n)∑_{k=1}^{n} (x_{ki} − µ̂_i)(x_{kj} − µ̂_j); and ρ̂_ij = σ̂_ij/(σ̂_i σ̂_j).

To obtain a good sense of what the estimators tell us about the population parameters, we will have to make some assumption about the population, e.g., that [X_1, ..., X_p] has a multivariate normal distribution. As we will see, this assumption leads to some very nice results.
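As a small illustration of these Data World formulas (a sketch only, assuming Python with numpy; the population mean vector and covariance matrix below are made up for the example), the sample mean vector, the maximum-likelihood variance-covariance matrix (division by n), and the correlation matrix can be computed directly from an n × p data matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.multivariate_normal(mean=[0, 1, 2],
                            cov=[[4, 1, 0], [1, 2, 1], [0, 1, 3]],
                            size=n)                  # an n x p data matrix

mu_hat = X.mean(axis=0)                              # mu_hat_i = (1/n) sum_k x_ki
Xc = X - mu_hat                                      # column-centered data
Sigma_hat = (Xc.T @ Xc) / n                          # sigma_hat_ij, division by n (use n - 1 for unbiasedness)
sd = np.sqrt(np.diag(Sigma_hat))
R_hat = Sigma_hat / np.outer(sd, sd)                 # rho_hat_ij = sigma_hat_ij / (sigma_hat_i * sigma_hat_j)

print(mu_hat)
print(Sigma_hat)
print(R_hat)                                         # agrees with np.corrcoef(X, rowvar=False)
```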



0.1 Developing the Multivariate Normal Distribution

Suppose X_1, ..., X_p are p continuous random variables with density functions, f_1(x_1), ..., f_p(x_p), and distribution functions, F_1(x_1), ..., F_p(x_p), where

P(X_i ≤ x_i) = F_i(x_i) = ∫_{−∞}^{x_i} f_i(x_i) dx_i .

We define a p-dimensional random variable (or random vector) as the vector, X′ = [X_1, ..., X_p]; X has the joint cumulative distribution function

F(x_1, ..., x_p) = ∫_{−∞}^{x_p} · · · ∫_{−∞}^{x_1} f(x_1, ..., x_p) dx_1 · · · dx_p .

If the random variables, X_1, ..., X_p, are independent, then the joint density and cumulative distribution functions factor: f(x_1, ..., x_p) = f_1(x_1) · · · f_p(x_p) and F(x_1, ..., x_p) = F_1(x_1) · · · F_p(x_p). The independence property will not be assumed in the following; in fact, the whole idea is to investigate what type of dependency exists in the random vector X.

When X′ = [X_1, X_2, ..., X_p] ∼ MVN(µ, Σ), the joint density function has the form

φ(x_1, ..., x_p) = (1/((2π)^{p/2} |Σ|^{1/2})) exp{−(1/2)(x − µ)′ Σ^{−1} (x − µ)} .

In the univariate case, the density provides the usual bell-shaped curve: φ(x) = (1/(σ√(2π))) exp{−(1/2)[(x − µ)/σ]²}.
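As a quick numerical check on this density formula (a sketch assuming Python with numpy and scipy, neither of which appears in the notes; µ, Σ, and x are arbitrary), one can evaluate φ directly and compare it with a library implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 0.7])
p = len(mu)

# Direct evaluation of the MVN density formula.
dev = x - mu
quad = dev @ np.linalg.solve(Sigma, dev)             # (x - mu)' Sigma^{-1} (x - mu)
phi = np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

print(phi, multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # the two values agree
```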

Given the MVN assumption on the generation of the data matrix, X_{n×p}, we know the sampling distribution of the vector of means:

µ̂ = \begin{pmatrix} µ̂_1 \\ µ̂_2 \\ \vdots \\ µ̂_p \end{pmatrix} ∼ MVN(µ, (1/n)Σ) .

In the univariate case, we have that µ̂ ∼ N(µ, (1/n)σ²). As might be expected, a Central Limit Theorem (CLT) states that these results also hold asymptotically when the MVN assumption is relaxed. Also, if we estimate the variance-covariance matrix, Σ, with divisions by n − 1 rather than n, and denote the result as Σ̂, then E(Σ̂) = Σ.
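A short simulation can illustrate both facts, i.e., that the mean vectors µ̂ scatter with covariance close to (1/n)Σ and that Σ̂ averages out to roughly Σ (a sketch assuming Python with numpy; µ, Σ, n, and the number of replications are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([0.0, 2.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])
n, reps = 40, 5000

mean_vectors = []
Sigma_hats = []
for _ in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)
    mean_vectors.append(X.mean(axis=0))
    Sigma_hats.append(np.cov(X, rowvar=False))       # divides by n - 1

print(np.cov(np.array(mean_vectors), rowvar=False))  # approximately (1/n) * Sigma
print(Sigma / n)
print(np.mean(Sigma_hats, axis=0))                   # approximately Sigma, i.e., E(Sigma_hat) = Sigma
```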

Linear combinations of random variables form the backbone of all of multivariate statistics. For example, suppose X′ = [X_1, X_2, ..., X_p] ∼ MVN(µ, Σ), and consider the following q linear combinations:

Z_1 = c_{11}X_1 + · · · + c_{1p}X_p
⋮
Z_q = c_{q1}X_1 + · · · + c_{qp}X_p

Then the vector Z′ = [Z_1, Z_2, ..., Z_q] ∼ MVN(Cµ, CΣC′), where

C_{q×p} = \begin{pmatrix} c_{11} & \cdots & c_{1p} \\ \vdots & & \vdots \\ c_{q1} & \cdots & c_{qp} \end{pmatrix} .

These same results hold in the sample if we observe the q linear combinations, [Z_1, Z_2, ..., Z_q]: i.e., the sample mean vector is Cµ̂ and the sample variance-covariance matrix is CΣ̂C′.
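In the sample, these facts reduce to matrix arithmetic. The following sketch (assuming Python with numpy; the coefficient matrix C and the simulated data are arbitrary) verifies that the observed Z scores have sample mean vector Cµ̂ and sample variance-covariance matrix CΣ̂C′:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 500, 4, 2
X = rng.multivariate_normal(np.zeros(p), np.eye(p) + 0.5, size=n)
C = rng.normal(size=(q, p))                           # an arbitrary q x p coefficient matrix

Z = X @ C.T                                           # each row is [Z_1, ..., Z_q] for one observation

mu_hat = X.mean(axis=0)
Sigma_hat = np.cov(X, rowvar=False)

print(np.allclose(Z.mean(axis=0), C @ mu_hat))                     # sample mean of Z equals C mu_hat
print(np.allclose(np.cov(Z, rowvar=False), C @ Sigma_hat @ C.T))   # sample covariance equals C Sigma_hat C'
```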


0.2 Multiple Regression and Partial Correlation (in the Population)

Suppose we partition our vector of p random variables as follows:

X′ = [X_1, X_2, ..., X_p] = [[X_1, X_2, ..., X_k], [X_{k+1}, X_{k+2}, ..., X_p]] ≡ [X′_1, X′_2] .

Partitioning the mean vector and variance-covariance matrix in the same way,

µ = \begin{pmatrix} µ_1 \\ µ_2 \end{pmatrix} ; Σ = \begin{pmatrix} Σ_{11} & Σ_{12} \\ Σ′_{12} & Σ_{22} \end{pmatrix} ,

we have

X′_1 ∼ MVN(µ_1, Σ_{11}) ; X′_2 ∼ MVN(µ_2, Σ_{22}) .

X′_1 and X′_2 are statistically independent vectors if and only if Σ_{12} = 0_{k×(p−k)}, the k × (p − k) zero matrix.

What I call the Master Theorem refers to the conditional density of X_1 given X_2 = x_2; it is multivariate normal with mean vector of order k × 1,

µ_1 + Σ_{12} Σ_{22}^{−1} (x_2 − µ_2) ,

and variance-covariance matrix of order k × k,

Σ_{11} − Σ_{12} Σ_{22}^{−1} Σ′_{12} .

If we denote the (i, j)th partial covariance element in this latter matrix ('holding' all variables in the second set 'constant') as σ_{ij·(k+1)···p}, then the partial correlation of variables i and j, 'holding' all variables in the second set 'constant', is

ρ_{ij·(k+1)···p} = σ_{ij·(k+1)···p} / √(σ_{ii·(k+1)···p} σ_{jj·(k+1)···p}) .
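For a concrete (hypothetical) Σ partitioned with k = 2, the conditional mean vector, the matrix of partial covariances, and the partial correlations can be computed exactly as written above (a sketch assuming Python with numpy):

```python
import numpy as np

# A hypothetical 4 x 4 population covariance matrix and mean vector, partitioned with k = 2.
Sigma = np.array([[4.0, 1.0, 1.5, 0.5],
                  [1.0, 3.0, 0.8, 1.2],
                  [1.5, 0.8, 2.0, 0.3],
                  [0.5, 1.2, 0.3, 1.5]])
mu = np.array([0.0, 1.0, 2.0, 3.0])
k = 2

S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
S22 = Sigma[k:, k:]
mu1, mu2 = mu[:k], mu[k:]

x2 = np.array([2.5, 2.0])                                    # a particular value of X_2

cond_mean = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)       # mu_1 + Sigma_12 Sigma_22^{-1} (x_2 - mu_2)
cond_cov = S11 - S12 @ np.linalg.solve(S22, S12.T)           # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_12'

sd = np.sqrt(np.diag(cond_cov))
partial_corr = cond_cov / np.outer(sd, sd)                   # rho_ij.(k+1)...p

print(cond_mean, cond_cov, partial_corr, sep="\n")
```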



Notice that in the formulas just given, the inverse of Σ_{22} must exist; otherwise, we have what is called "multicollinearity," and solutions don't exist (or are very unstable numerically if the inverse 'almost' doesn't exist). So, as one moral: don't use both total and subtest scores in the same set of "independent" variables.

If the set X_1 contains just one random variable, X_1, then the mean of the conditional distribution can be given as

E(X_1) + σ′_{12} Σ_{22}^{−1} (x_2 − µ_2) ,

where σ′_{12} is 1 × (p − 1) and of the form [Cov(X_1, X_2), ..., Cov(X_1, X_p)]. This is nothing but our old regression equation written out in matrix notation. If we let β′ = σ′_{12} Σ_{22}^{−1}, then the conditional mean (predicted X_1) is equal to E(X_1) + β′(x_2 − µ_2) = E(X_1) + β_1(x_2 − µ_2) + · · · + β_{p−1}(x_p − µ_p). The covariance matrix, Σ_{11} − Σ_{12} Σ_{22}^{−1} Σ′_{12}, takes on the form Var(X_1) − σ′_{12} Σ_{22}^{−1} σ_{12}, i.e., the variation in X_1 that is not explained. If we take the explained variation, σ′_{12} Σ_{22}^{−1} σ_{12}, and consider its proportion of the total variance, Var(X_1), we define the squared multiple correlation coefficient:

ρ²_{1·2···p} = σ′_{12} Σ_{22}^{−1} σ_{12} / Var(X_1) .

In fact, the linear combination, β′X_2, has the highest correlation with X_1 of any linear combination of X_2; and this correlation is the positive square root of the squared multiple correlation coefficient.
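With one dependent variable the computation is only a few lines. This sketch (assuming Python with numpy; the covariance matrix is hypothetical, with X_1 placed first) reads off β′ = σ′_{12} Σ_{22}^{−1} and the squared multiple correlation:

```python
import numpy as np

# Hypothetical covariance matrix with X_1 in the first position.
Sigma = np.array([[4.0, 1.0, 1.5, 0.5],
                  [1.0, 3.0, 0.8, 1.2],
                  [1.5, 0.8, 2.0, 0.3],
                  [0.5, 1.2, 0.3, 1.5]])

var_x1 = Sigma[0, 0]                         # Var(X_1)
sigma_12 = Sigma[0, 1:]                      # [Cov(X_1, X_2), ..., Cov(X_1, X_p)]
Sigma_22 = Sigma[1:, 1:]

beta = np.linalg.solve(Sigma_22, sigma_12)   # beta = Sigma_22^{-1} sigma_12, so beta' = sigma_12' Sigma_22^{-1}
explained = sigma_12 @ beta                  # sigma_12' Sigma_22^{-1} sigma_12
R2 = explained / var_x1                      # squared multiple correlation rho^2_{1.2...p}

print(beta, R2)
```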

0.3 Moving to the Sample

Up to this point, the concepts of regression and kindred ideas have been discussed only in terms of population parameters. We now have the task of obtaining estimates of these various quantities based on a sample; also, associated inference procedures need to be developed. Generally, we will rely on maximum-likelihood estimation and the related likelihood-ratio tests.

To define how maximum-likelihood estimation proceeds, we will first give a series of general steps, and then operationalize them for a univariate normal distribution example.

A) Let X_1, ..., X_n be univariate observations on X (independent and continuous), depending on some parameters, θ_1, ..., θ_k. The density function of X_i is denoted as f(x_i; θ_1, ..., θ_k).

B) The likelihood of observing x_1, ..., x_n as the values of X_1, ..., X_n is the joint density of the n observations, and because the observations are independent,

L(θ_1, ..., θ_k) ≡ ∏_{i=1}^{n} f(x_i; θ_1, ..., θ_k) ,

i.e., we assume x_1, ..., x_n are already observed, and that L is a function of the parameters only.

Now, choose parameter values such that L(θ_1, ..., θ_k) is at a maximum, i.e., the probability of observing that particular sample is maximized by the choice of θ_1, ..., θ_k. Generally, it is easier, and equivalent, to maximize the log-likelihood, l(θ_1, ..., θ_k) ≡ log(L(θ_1, ..., θ_k)).

C) Generally, the maximum values are found through differentiation, by setting the partial derivatives equal to zero. Explicitly, we find θ_1, ..., θ_k such that

∂l(θ_1, ..., θ_k)/∂θ_j = 0 ,

for j = 1, ..., k. (We should probably check second derivatives to see whether we have a maximum, but this is almost always true anyway.)

Example:

Suppose X_i ∼ N(µ, σ²), with density

f(x_i; µ, σ²) = (1/(σ√(2π))) exp{−(x_i − µ)²/2σ²} .

The likelihood has the form

L(µ, σ²) = ∏_{i=1}^{n} (1/(σ√(2π))) exp{−(x_i − µ)²/2σ²} = (1/√(2π))^n (1/σ²)^{n/2} exp{−(1/2σ²) ∑_{i=1}^{n} (x_i − µ)²} .

The log-likelihood reduces to

l(µ, σ²) = log L(µ, σ²) = log(1/√(2π))^n + log(1/σ²)^{n/2} − (1/2σ²) ∑_{i=1}^{n} (x_i − µ)² = constant − (n/2) log σ² − (1/2σ²) ∑_{i=1}^{n} (x_i − µ)² .

The partial derivatives have the form

∂l(µ, σ²)/∂µ = −(1/2σ²) ∑_{i=1}^{n} 2(x_i − µ)(−1) ,

and

∂l(µ, σ²)/∂σ² = −(n/2)(1/σ²) − (1/2(σ²)²)(−1) ∑_{i=1}^{n} (x_i − µ)² .

Setting these two expressions to zero gives

µ̂ = ∑_{i=1}^{n} x_i / n ; σ̂² = (1/n) ∑_{i=1}^{n} (x_i − µ̂)² .
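The closed-form answers can be checked numerically by minimizing the negative log-likelihood (a sketch assuming Python with numpy and scipy.optimize, which the notes do not use; the simulated data and the (µ, log σ²) parameterization are illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
x = rng.normal(loc=5.0, scale=2.0, size=100)
n = len(x)

def neg_loglik(theta):
    mu, log_sigma2 = theta                    # parameterize by log(sigma^2) to keep sigma^2 > 0
    sigma2 = np.exp(log_sigma2)
    return (n / 2) * np.log(sigma2) + np.sum((x - mu) ** 2) / (2 * sigma2)

res = minimize(neg_loglik, x0=np.array([0.0, 0.0]))

print(res.x[0], x.mean())                     # numerical mu_hat vs the closed form
print(np.exp(res.x[1]), x.var())              # numerical sigma2_hat vs (1/n) sum (x_i - mu_hat)^2
```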

Maximum likelihood (ML) estimates are generally consistent, asymptotically normal (for large n), and efficient; they are functions of the sufficient statistics; and they are invariant under transformations, e.g., the ML estimate for σ is just √(σ̂²). As the ML estimator σ̂² shows, ML estimators are not necessarily unbiased.

Another Example:

Suppose X_1, ..., X_n are observations on the Poisson discrete random variable X (having outcomes 0, 1, ...):

P(X_i = x_i) = exp(−λ)λ^{x_i} / x_i! .

The likelihood is

L(λ) = ∏_{i=1}^{n} P(X_i = x_i) = exp(−nλ)λ^{∑_i x_i} / ∏_i x_i! ,

and the log-likelihood

l(λ) = log L(λ) = −nλ + log(λ^{∑_i x_i}) − log ∏_i x_i! .

The partial derivative is

∂l(λ)/∂λ = −n + (1/λ) ∑_{i=1}^{n} x_i ,


and when set to zero, gives

λ̂ = ∑_{i=1}^{n} x_i / n .

0.3.1 Likelihood Ratio Tests

Besides using the likelihood concept to find good point estimates, the likelihood can also be used to find good tests of hypotheses. We will develop this idea in terms of a simple example: Suppose X_1 = x_1, ..., X_n = x_n denote values obtained on n independent observations of a N(µ, σ²) random variable, where σ² is assumed known. The standard test of H_0: µ = µ_0 versus H_1: µ ≠ µ_0 would compare (x̄ − µ_0)²/(σ²/n) to a χ² random variable with one degree of freedom. Or, if we chose the significance level to be .05, we could reject H_0 if |x̄ − µ_0|/(σ/√n) ≥ 1.96. We now develop this same test using likelihood ideas:

The likelihood of observing x_1, ..., x_n, L(µ), is only a function of µ (because σ² is assumed to be known):

L(µ) = (1/(σ√(2π)))^n exp{−(1/2σ²) ∑_{i=1}^{n} (x_i − µ)²} .

Under H_0, L(µ) is at a maximum for µ = µ_0; under H_1, L(µ) is at a maximum for µ = µ̂, the maximum likelihood estimator. If H_0 is true, L(µ̂) and L(µ_0) should be close to each other; if H_0 is false, L(µ̂) should be much larger than L(µ_0). Thus, the decision rule would be to reject H_0 if L(µ_0)/L(µ̂) ≤ λ, where λ is some number less than 1.00 and chosen to obtain a particular α level. The ratio is

L(µ_0)/L(µ̂) = exp{−(1/2)(µ̂ − µ_0)²/(σ²/n)} .

Thus, we could rephrase the decision rule: reject H_0 if (µ̂ − µ_0)²/(σ²/n) ≥ −2 log(λ), or if |x̄ − µ_0|/(σ/√n) ≥ √(−2 log λ). Thus, for a .05 level of significance, choose λ so that 1.96 = √(−2 log λ). Generally, we can phrase likelihood ratio tests as

−2 log(likelihood ratio) ∼ χ²_{ν−ν_0} (at least asymptotically),

where ν is the dimension of the parameter space generally, and ν_0 is the dimension under H_0.
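Here is the likelihood-ratio recipe applied to the known-σ² normal example just described (a sketch assuming Python with numpy and scipy.stats; the data are simulated and µ_0, σ, and n are arbitrary):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
sigma, n, mu0 = 2.0, 30, 10.0
x = rng.normal(10.5, sigma, size=n)            # data; H_0: mu = 10 is slightly false here

mu_hat = x.mean()

def loglik(mu):
    return -np.sum((x - mu) ** 2) / (2 * sigma ** 2)   # terms not involving mu cancel in the ratio

lr_stat = -2 * (loglik(mu0) - loglik(mu_hat))  # -2 log(L(mu_0)/L(mu_hat))
z_sq = (mu_hat - mu0) ** 2 / (sigma ** 2 / n)  # (mu_hat - mu_0)^2 / (sigma^2 / n)

print(lr_stat, z_sq)                           # identical, as the algebra above shows
print(1 - chi2.cdf(lr_stat, df=1))             # p-value from chi^2 with nu - nu_0 = 1 degree of freedom
```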

0.3.2 Estimation

To obtain estimates of the various quantities we need, merely replace the variances and covariances by their maximum likelihood estimates. This process generates sample partial correlations or covariances; sample squared multiple correlations; sample regression parameters; and so on.
