Computational Methods and Advanced Statistics Tools

III - MULTIVARIATE RANDOM VARIABLES

A random vector, or multivariate random variable, is a vector of n scalar random variables. The random vector is written

X = (X_1, X_2, ..., X_n)^T.   (1)

A random vector is also called an n-dimensional random variable.
1 Joint and marginal densities

The glory of God is man fully alive; moreover, man's life is the vision of God. (St. Irenaeus of Lyons, ca. 135-202)
Every random variable has a distribution function.

DEFINITION III.1A - (DISTRIBUTION) - The n-dimensional random variable X has the distribution function

F(x_1, ..., x_n) = P{X_1 ≤ x_1, ..., X_n ≤ x_n}.   (2)

A random vector X is called absolutely continuous if there exists a joint probability density f such that

F(x_1, ..., x_n) = ∫_{-∞}^{x_1} ... ∫_{-∞}^{x_n} f(t_1, ..., t_n) dt_1 ... dt_n.   (3)

A random vector X is called discrete if it takes values in a discrete (i.e. finite or countable) sample space. This means that X : Ω → R^n, where Ω is discrete. In this case the joint probability function (or, less properly, the joint discrete density) is defined as

f(x_1, ..., x_n) = P{X_1 = x_1, ..., X_n = x_n}.   (4)
The joint distribution and the joint probability function are related by

F(x_1, ..., x_n) = Σ_{t_1 ≤ x_1} ... Σ_{t_n ≤ x_n} f(t_1, ..., t_n).   (5)
Ex. III.1 (n = 2) The joint distribution function of the random vector (X, Y) is defined by the relation

F(x, y) = P(X ≤ x, Y ≤ y).

The function F(x, y) is defined on R^2 and, as a function of each variable, is monotone non-decreasing and such that

lim_{x,y→+∞} F(x, y) = 1,   lim_{x→−∞} F(x, y) = lim_{y→−∞} F(x, y) = 0.

Knowing F(x, y) allows us to know the probability that (X, Y) falls in any given rectangle (a, b] × (c, d]:

P[(X, Y) ∈ (a, b] × (c, d]] = F(b, d) − F(a, d) − F(b, c) + F(a, c).

Since rectangles approximate any measurable set of R^2, if we know F then we know P[(X, Y) ∈ A] for any measurable set A ⊂ R^2; that is, the distribution function F(x, y) thoroughly describes the distribution measure of the couple (X, Y).
If the random vector (X, Y) is absolutely continuous, i.e. if there is an integrable function f(x, y) such that

f(x, y) ≥ 0,   ∫_{R^2} f(x, y) dx dy = 1,   F(x, y) = ∫_{-∞}^{x} ∫_{-∞}^{y} f(ξ, η) dξ dη,

then

∫_A f(x, y) dx dy = P[(X, Y) ∈ A],   ∀ measurable set A ⊂ R^2,

which illustrates the meaning of the joint probability density f(x, y).

An example of an absolutely continuous random vector: (X, Y) uniformly distributed on a disk of radius r, with probability density

f(x, y) = 1/(πr^2) if x^2 + y^2 ≤ r^2, and 0 elsewhere.

For example, F(0, 0) = 1/4.
An example of a discrete random vector is (X, Y) with X = score got by tossing a die and Y = score got by tossing another die. Then the joint probability function has the value 1/36 for any (x_i, y_j) ∈ {1, ..., 6}^2 and zero elsewhere. The distribution function

F(x, y) = Σ_{x_i ≤ x} Σ_{y_j ≤ y} P(X = x_i, Y = y_j)

is computed by counting and dividing by 36. For example F(2, 4) = 2/9, F(3, 3) = 1/4, ..., F(6, 6) = 1. □
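The counting argument can be checked directly; a small Python sketch (exact arithmetic via fractions):

```python
from fractions import Fraction

def F(x, y):
    # Joint CDF of two fair dice: count favourable outcomes out of 36
    hits = sum(1 for i in range(1, 7) for j in range(1, 7) if i <= x and j <= y)
    return Fraction(hits, 36)

print(F(2, 4), F(3, 3), F(6, 6))  # 2/9 1/4 1
```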
For a sub-vector S = (X_1, ..., X_k)^T (k < n) of the random vector X, the marginal density function is

f_S(x_1, ..., x_k) = ∫_{-∞}^{+∞} ... ∫_{-∞}^{+∞} f(x_1, ..., x_n) dx_{k+1} ... dx_n   (6)

in the continuous case, and

f_S(x_1, ..., x_k) = Σ_{x_{k+1}} ... Σ_{x_n} f(x_1, ..., x_n)   (7)

if X is discrete. In both cases the marginal distribution function is

F_S(x_1, ..., x_k) = F(x_1, ..., x_k, +∞, ..., +∞).   (8)

For the sake of notation, we will write f_X(x) and F_X(x) instead of f(x) and F(x), respectively, whenever it is necessary to emphasize to which random variable the function belongs.
Ex. III.2 As in Ex. III.1, let f(x, y) = 1/(πr^2) · I_{D_r}(x, y), where I_{D_r} is the indicator of the disk of radius r. Let us compute the marginal densities of X and Y:

f_X(x) = ∫_R f(x, y) dy = ∫_{-√(r^2 − x^2)}^{√(r^2 − x^2)} 1/(πr^2) dy = 2√(r^2 − x^2) / (πr^2)

for |x| ≤ r, and zero elsewhere. By symmetry

f_Y(y) = 2√(r^2 − y^2) / (πr^2)

if |y| ≤ r, and 0 elsewhere.

Another example:

f_{X,Y}(x, y) = 1/8 if 0 ≤ y ≤ x ≤ 4, and 0 else.

Then the marginal density function of X is

f_X(x) = ∫_R f_{X,Y}(x, y) dy = ∫_0^x 1/8 dy = x/8 for x ∈ [0, 4], and 0 for x ∉ [0, 4].
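The marginal f_X of the disk example can be sanity-checked numerically; the sketch below (plain Python, midpoint rule, r = 1 chosen for concreteness) verifies that it integrates to 1:

```python
import math

def f_X(x, r=1.0):
    # Marginal density of X when (X, Y) is uniform on the disk of radius r
    return 2.0 * math.sqrt(max(r * r - x * x, 0.0)) / (math.pi * r * r)

# Midpoint-rule approximation of the integral of f_X over [-r, r]
n, r = 100000, 1.0
h = 2 * r / n
total = sum(f_X(-r + (k + 0.5) * h, r) for k in range(n)) * h
print(round(total, 4))
```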
2 Conditional distributions

Does God guide the world and my life? Yes, but in a mysterious way; God guides everything along paths that only he knows, leading it to its perfection. "Even the hairs of your head are all numbered" (Mt 10:30)
Let A and B denote some events. If P(B) > 0 then the conditional probability of A occurring, given that B occurs, is

P(A|B) := P(A ∩ B) / P(B).   (9)

We interpret P(A|B) as "the probability of A given B".
By use of (9) the conditional probability function of a discrete r.v. Y, given a discrete r.v. X, is defined:

f_{Y|X}(y_j | x_i) := P(X = x_i, Y = y_j) / P(X = x_i) ≡ f(x_i, y_j) / f_X(x_i).
Now suppose that the continuous random variables X and Y have joint density f_{X,Y}. If we wish to use (9) to determine the conditional distribution of Y given that X takes the value x, we have the problem that the probability P(Y ≤ y | X = x) is undefined, as we may only condition on events which have a strictly positive probability, and P(X = x) = 0. However, f_X(x) is positive for some x values. Therefore for both discrete and continuous random variables we use the following definition:

DEFINITION III.2A (CONDITIONAL DENSITY)
The conditional density function of Y given X = x is

f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x),   ∀x : f_X(x) > 0,   (10)

where f_{X,Y} is the joint density function of X and Y. Both X and Y may be multivariate random variables.
The conditional distribution function is then found by integration or summation, as previously described in (3) and (5) for the continuous and discrete case, respectively.
It follows from (10) that

f_{X,Y}(x, y) = f_{Y|X=x}(y) · f_X(x)   (11)
and by interchanging X and Y on the right-hand side of (11) we get Bayes' rule

f_{Y|X}(y|x) = f_{X|Y}(x|y) · f_Y(y) / f_X(x).   (12)
Ex. III.3 Let us recall that an exponential variable T with parameter λ > 0 is an absolutely continuous variable with density f(t) = λ e^{−λt} · I_{R+}(t).
Now let Y be an exponential variable with parameter x > 0, which is in turn a value of X ~ exp(3). In other words, the conditional density of Y given X and the marginal of X are known:

f_{Y|X}(y|x) = x · e^{−xy} · I_{R+}(y);   f_X(x) = 3 e^{−3x} · I_{R+}(x).

Which is the conditional density of X given Y? The answer is

f_{X|Y}(x|y) = f_{Y|X}(y|x) · f_X(x) / f_Y(y),

where

f_Y(y) = ∫_0^∞ f_{Y|X}(y|x) · f_X(x) dx = ∫_0^∞ 3x e^{−x(3+y)} dx
= [ −3x e^{−x(3+y)} / (3 + y) ]_0^∞ + ∫_0^∞ 3 e^{−x(3+y)} / (3 + y) dx = 3 / (3 + y)^2,   y > 0.

So

f_{X|Y}(x|y) = x (3 + y)^2 e^{−x(3+y)},   x > 0, y > 0. □

DEFINITION III.2B (INDEPENDENCE)
The random vectors X and Y are independent¹ if

f_{X,Y}(x, y) = f_X(x) · f_Y(y),   ∀x, y,   (13)

which corresponds to

F_{X,Y}(x, y) = F_X(x) · F_Y(y),   ∀x, y.   (14)

¹ In most textbooks the n-dimensional and m-dimensional r.v.'s X, Y are said to be independent if
P(X ∈ A, Y ∈ B) = P(X ∈ A) · P(Y ∈ B), ∀ measurable A ⊂ R^n, B ⊂ R^m.
A suitable choice of A, B leads directly to the equivalent definition (14).
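The computation of f_Y in Ex. III.3 can be verified numerically; the following Python sketch approximates the integral of 3 x e^{-x(3+y)} over x in (0, ∞) by a truncated midpoint rule and compares it with 3/(3+y)^2:

```python
import math

def f_Y_numeric(y, upper=60.0, n=200000):
    # Truncated midpoint-rule approximation of integral_0^inf 3 x exp(-x(3+y)) dx
    h = upper / n
    return sum(3.0 * ((k + 0.5) * h) * math.exp(-((k + 0.5) * h) * (3.0 + y))
               for k in range(n)) * h

for y in (0.5, 1.0, 2.0):
    print(y, round(f_Y_numeric(y), 4), round(3.0 / (3.0 + y) ** 2, 4))
```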
If X and Y are independent, then

f_{Y|X=x}(y) = f_Y(y).   (15)

Last but not least: if two random variables are independent, then they are also uncorrelated, but uncorrelated variables are not necessarily independent.
3 Moments of scalar random variables

Evolution presupposes the existence of something that can develop. The theory says nothing about where this "something" came from. Furthermore, questions about the being, essence, dignity, mission, meaning and wherefore of the world and man cannot be answered in biological terms. (YouCat 42)
An average of the possible values of X, each value being weighted by its probability,

E[X] = Σ_x x · P{X = x},

is literally the notion of expectation for a discrete variable X (provided that such a series is absolutely convergent). For continuous variables, the sums are replaced by integrals.
DEFINITION III.3A (EXPECTATION)
The expectation (or mean value) of an absolutely continuous variable X with density function f_X is

E[X] = ∫_{-∞}^{∞} x · f_X(x) dx   (16)

provided that the integral is absolutely convergent, i.e. E[|X|] < ∞.
If X is a random variable and g is a function, then Y = g(X) is also a random variable. To calculate the expectation of Y, we could find f_Y and use (16). However, the process of finding f_Y can be complicated; instead, we can simply use

E[Y] = E[g(X)] = ∫_R g(x) f_X(x) dx.   (17)

This provides a method for calculating the "moments" of a distribution.
DEFINITION III.3B (MOMENTS)
The n-th moment of X is

E[X^n] = ∫_R x^n f_X(x) dx   (18)

and the n-th central moment is

E[(X − E[X])^n] = ∫_R (x − E[X])^n f_X(x) dx.   (19)

The second central moment is also called the variance, and the variance of X is given by

Var[X] = E[(X − E[X])^2] = E[X^2] − (E[X])^2.   (20)
Now let X be an n-dimensional random variable and put Y = g(X), where g is a function. As in (17) we have

E[Y] = E[g(X)] = ∫_{R^n} g(x_1, ..., x_n) f_X(x_1, ..., x_n) dx_1 ... dx_n.

This gives sense to the mean of a linear combination of variables, and to any type of moments. Since

E[aX_1 + bX_2] = a E[X_1] + b E[X_2],

the expectation operator is a linear operator.
Of special interest is the covariance of X_i and X_j:

DEF. III.3C - Covariance - The covariance of two scalar r.v.'s X_1, X_2 is

Cov[X_1, X_2] = E[(X_1 − E[X_1])(X_2 − E[X_2])] = E[X_1 X_2] − E[X_1] E[X_2].   (21)

The covariance gives information about the simultaneous variation of two variables and is useful for finding interdependencies. Since the expectation is linear, the covariance is a bilinear operator, i.e. linear in each of the two arguments, whence its calculation rule:

Cov[aX_1 + bX_2, cX_3 + dX_4] = ac Cov[X_1, X_3] + ad Cov[X_1, X_4] + bc Cov[X_2, X_3] + bd Cov[X_2, X_4]   (22)

for constants a, b, c, d and r.v.'s X_1, ..., X_4.
In particular, for two scalar random variables X, Y we have the formulae:

Cov(X, Y) = E(XY) − μ_X μ_Y   (23)

Var(X + Y) = Var(X) + 2 · Cov(X, Y) + Var(Y)   (24)
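These calculation rules can be checked exactly on a toy discrete distribution; in the Python sketch below the joint pmf is made up for illustration:

```python
# Exact check of Var(X+Y) = Var(X) + 2 Cov(X,Y) + Var(Y) on a toy joint pmf
# (the pmf below is made up for illustration)
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def E(g):
    return sum(p * g(x, y) for (x, y), p in pmf.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: x * y) - EX * EY
var_x = E(lambda x, y: x * x) - EX ** 2
var_y = E(lambda x, y: y * y) - EY ** 2
var_sum = E(lambda x, y: (x + y) ** 2) - (EX + EY) ** 2
print(var_sum, var_x + 2 * cov + var_y)
```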
DEF. III.3C - Correlation coefficient - The correlation coefficient, or correlation, of two scalar r.v.'s X_1, X_2 is

ρ ≡ ρ(X_1, X_2) := E[ (X_1 − μ_1)/σ_1 · (X_2 − μ_2)/σ_2 ].
THM. III.3D - The covariance and the correlation of two scalar r.v.'s X, Y satisfy

Cov(X, Y)^2 ≤ Var(X) Var(Y)   (Schwarz's inequality),
−1 ≤ ρ(X, Y) ≤ 1,

respectively.

Proof
Since the expectation of a nonnegative variable is always nonnegative,

0 ≤ E[(θ|X| + |Y|)^2] = θ^2 E[X^2] + 2θ E[|XY|] + E[Y^2],   ∀θ.

Thus a polynomial of degree 2 (in θ) does not take negative values, so its discriminant must be ≤ 0:

E[|XY|]^2 − E[X^2] E[Y^2] ≤ 0,   or   E[|XY|]^2 ≤ E[X^2] E[Y^2].

Replacing X and Y by X − μ_X and Y − μ_Y, and using |∫ f(t)g(t) dt| ≤ ∫ |f(t)g(t)| dt, this implies

Cov(X, Y)^2 ≤ E[|(X − μ_X)(Y − μ_Y)|]^2 ≤ E[(X − μ_X)^2] E[(Y − μ_Y)^2] = Var(X) Var(Y).

Since ρ = Cov(X, Y) / √(Var(X) Var(Y)), this also means

[ρ(X, Y)]^2 ≤ 1,   i.e.   −1 ≤ ρ ≤ +1. □
DEFINITION III.3E - Uncorrelation - Two scalar random variables X, Y are said to be uncorrelated, or orthogonal, when Cov(X, Y) = 0.
Since Cov(X, Y) = E(XY) − μ_X μ_Y,

X, Y independent ⇒ X, Y uncorrelated.

However,

X, Y uncorrelated does not imply X, Y independent,

as we see in the following example.
Ex. III.4 Let U be uniformly distributed in [0, 1] and consider the r.v.'s

X = sin 2πU,   Y = cos 2πU.

X and Y are not independent, since Y has only two possible values if X is known. However, they are orthogonal, or uncorrelated:

E(X) = ∫_0^1 sin 2πt dt = 0,   E(Y) = ∫_0^1 cos 2πt dt = 0,

so that

Cov(X, Y) = E(X · Y) = ∫_0^1 sin 2πt cos 2πt dt = (1/2) ∫_0^1 sin 4πt dt = 0.

In this case the covariance is zero, but the variables are not independent.
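A numerical sketch of the moments in this example (midpoint rule on [0, 1]):

```python
import math

# Midpoint-rule approximations of E(X), E(Y) and E(XY) in Ex. III.4
n = 100000
ts = [(k + 0.5) / n for k in range(n)]
EX = sum(math.sin(2 * math.pi * t) for t in ts) / n
EY = sum(math.cos(2 * math.pi * t) for t in ts) / n
EXY = sum(math.sin(2 * math.pi * t) * math.cos(2 * math.pi * t) for t in ts) / n
print(abs(EX) < 1e-6, abs(EY) < 1e-6, abs(EXY) < 1e-6)  # True True True
```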
4 Moments of multivariate random variables

What role does man play in God's providence? The completion of creation through divine providence is not something that happens above and beyond us. God invites us to collaborate in the completion of creation.

We are interested in defining a (vector) first moment and a (matrix) second moment of a multivariate random variable.
DEFINITION III.4A (EXPECTATION OF A RANDOM VECTOR)
The expectation (or the mean value) of the random vector X is

μ = E[X] = (E[X_1], E[X_2], ..., E[X_n])^T.   (25)
DEFINITION III.4B (VARIANCE MATRIX) The variance (matrix) of the random vector X is

Σ_X = Var[X] = E[(X − μ)(X − μ)^T] =

    ⎛ Var[X_1]       Cov[X_1, X_2]  ...  Cov[X_1, X_n] ⎞
    ⎜ Cov[X_2, X_1]  Var[X_2]       ...  Cov[X_2, X_n] ⎟
    ⎜ ...                                              ⎟
    ⎝ Cov[X_n, X_1]  Cov[X_n, X_2]  ...  Var[X_n]      ⎠   (26)

Sometimes we shall use the notation

    ⎛ σ_1^2  σ_12   ...  σ_1n  ⎞
    ⎜ σ_21   σ_2^2  ...  σ_2n  ⎟
    ⎜ ...                      ⎟
    ⎝ σ_n1   σ_n2   ...  σ_n^2 ⎠

Both σ_ii and σ_i^2 are possible notations for the variance of X_i.
By the calculation rule (22), the variance of a linear combination of X_1, X_2 is known from the variance matrix of (X_1, X_2):

Var[z_1 X_1 + z_2 X_2] = z_1^2 Var(X_1) + 2 z_1 z_2 Cov(X_1, X_2) + z_2^2 Var(X_2),

and in general, for any n ≥ 2, it is the quadratic form of Σ calculated in the coefficients z_1, ..., z_n:

Var[z_1 X_1 + ... + z_n X_n] = z^T Σ z.
Ex. III.4
Let (X_1, X_2) be a random vector with mean (0, 0) and variance matrix

Σ = ⎛  2  −1 ⎞
    ⎝ −1   3 ⎠

Compute the variance of the projection of (X_1, X_2) on the direction (1/2, √3/2).
We deal with the scalar variance of a linear combination:

Var[(X_1, X_2) · (1/2, √3/2)] = Var[(1/2) X_1 + (√3/2) X_2]
= (1/2)^2 Var[X_1] + 2 · (1/2)(√3/2) Cov(X_1, X_2) + (√3/2)^2 Var[X_2]
= (1/4) · 2 − 2√3/4 + (3/4) · 3 = (11 − 2√3)/4. □
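The same number is obtained at once from the quadratic form z^T Σ z; a minimal Python check:

```python
import math

# Var[z^T X] = z^T Sigma z for the direction z = (1/2, sqrt(3)/2)
Sigma = [[2.0, -1.0], [-1.0, 3.0]]
z = [0.5, math.sqrt(3.0) / 2.0]
var = sum(z[i] * Sigma[i][j] * z[j] for i in range(2) for j in range(2))
print(var, (11.0 - 2.0 * math.sqrt(3.0)) / 4.0)
```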
The correlation of two random variables, X_i and X_j, is a standardization of the covariance and can be written

ρ_ij = Cov[X_i, X_j] / √(Var[X_i] Var[X_j]) = σ_ij / (σ_i σ_j).   (27)

Equations (26), (27) lead to the definition

DEFINITION III.4C
The correlation matrix for X is

    ⎛ 1     ρ_12  ...  ρ_1n ⎞
R = ⎜ ρ_21  1     ...  ρ_2n ⎟
    ⎜ ...                   ⎟
    ⎝ ρ_n1  ρ_n2  ...  1    ⎠   (28)
THEOREM III.4D
The variance matrix Σ and the correlation matrix R are a) symmetric and b) positive semi-definite.

Proof. (a) The symmetry is obvious. (b) By definition of variance, Var(z_1 X_1 + ... + z_n X_n) ≥ 0 for any z ∈ R^n. Hence the quadratic form z^T Σ z ≡ Var[z^T X] ≥ 0, ∀z, i.e. the matrix Σ is positive semidefinite. □

Remark III.4E
If Σ is positive definite, i.e.

z^T Σ z ≥ 0 and z^T Σ z = 0 ⇒ z = (0, ..., 0),

we often write Σ > 0.
DEFINITION III.4F
The covariance matrix of a p-dimensional random vector X with mean μ and a q-dimensional random vector Y with mean ν is

Σ_XY = C[X, Y] = E[(X − μ)(Y − ν)^T] =

    ⎛ Cov[X_1, Y_1]  ...  Cov[X_1, Y_q] ⎞
    ⎜ ...                               ⎟
    ⎝ Cov[X_p, Y_1]  ...  Cov[X_p, Y_q] ⎠   (29)

In particular, C[X, X] = Var[X] = Σ_X.
COROLLARY III.4G - Linear transformations of random vectors
Let X = (X_1, ..., X_n)^T have mean μ and covariance Σ_X. Now we introduce the new random variable Y = (Y_1, ..., Y_k)^T by the linear transformation

Y = a + BX,

where a is a (k × 1) vector and B is a (k × n) matrix. Then the expectation and variance matrix of Y are:

E[Y] = E[a + BX] = a + B E[X]   (30)
Σ_Y = Var[a + BX] = Var[BX] = B Var[X] B^T   (31)

Proof
The first formula is obvious since the expectation operator is linear. As for the variance of BX we have:

(Var[a + BX])_{i,j} = Cov[(BX)_i, (BX)_j] = Cov[Σ_l b_{i,l} X_l, Σ_m b_{j,m} X_m] = Σ_{l,m} b_{i,l} b_{j,m} Cov(X_l, X_m) = (B Σ_X B^T)_{i,j}. □
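A Monte Carlo sketch of (30)-(31), with a, B and Σ chosen arbitrarily for illustration: sampling X with Var[X] = Σ through a factor C with C C^T = Σ, the sample variance matrix of Y = a + BX should approach B Σ B^T.

```python
import math
import random

random.seed(7)
# Sample X with Var[X] = Sigma = [[2, -1], [-1, 3]] via a factor C with C C^T = Sigma,
# then form Y = a + B X; the sample variance matrix of Y should approach
# B Sigma B^T = [[10, -5], [-5, 3]].  (a and B are chosen arbitrarily.)
c11, c21, c22 = math.sqrt(2.0), -1.0 / math.sqrt(2.0), math.sqrt(2.5)
a = [3.0, -2.0]
B = [[1.0, 2.0], [0.0, -1.0]]
N = 200000
ys = []
for _ in range(N):
    u1, u2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    x1, x2 = c11 * u1, c21 * u1 + c22 * u2
    ys.append((a[0] + B[0][0] * x1 + B[0][1] * x2,
               a[1] + B[1][0] * x1 + B[1][1] * x2))
m1 = sum(y[0] for y in ys) / N
m2 = sum(y[1] for y in ys) / N
v11 = sum((y[0] - m1) ** 2 for y in ys) / N
v22 = sum((y[1] - m2) ** 2 for y in ys) / N
v12 = sum((y[0] - m1) * (y[1] - m2) for y in ys) / N
print(round(v11, 2), round(v12, 2), round(v22, 2))
```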
5 Conditional expectation

What is heaven? Heaven is God's milieu, the dwelling place of the angels and saints, and the goal of creation. Heaven is not a place in the universe. It is a condition in the next life. Heaven is where God's will is done without resistance. Heaven happens when life is present in its greatest intensity and blessedness - a kind of life that we do not find on earth.

The conditional expectation is the mean of a random variable, given values of another random variable. Mathematically, it is the mean calculated with the corresponding conditional density.
DEF. III.5A - Conditional expectation - The conditional expectation, or conditional mean, of the random variable Y given X = x is

E[Y | X = x] := ∫_{-∞}^{+∞} y f_{Y|X}(y|x) dy   (32)

where f_{Y|X}(y|x) is the conditional probability density of Y given X = x.
The equation y = E[Y | X = x] defines a curve, which is called the regression curve of Y on X.
Ex. III.5 Let the conditional density of Y given X = x be known:

f_{Y|X}(y|x) = x e^{−xy} · I_{R+}(y).

Then

E[Y | X = x] := ∫_0^{+∞} y · x e^{−xy} dy = x { [ y e^{−xy}/(−x) ]_0^∞ − ∫_0^∞ e^{−xy}/(−x) dy } = x · 1/x^2 = 1/x.

In particular, the regression curve of Y on X is y = 1/x for x > 0.
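A numerical sketch of this conditional mean (truncated midpoint rule):

```python
import math

def cond_mean(x, upper=80.0, n=200000):
    # Truncated midpoint-rule approximation of integral_0^inf y * x * exp(-x*y) dy
    h = upper / n
    return sum(((k + 0.5) * h) * x * math.exp(-x * (k + 0.5) * h)
               for k in range(n)) * h

for x in (0.5, 1.0, 2.0):
    print(x, round(cond_mean(x), 4))
```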
DEF. III.5B - Conditional variance - The conditional variance of Y given X = x is

Var[Y | X = x] := ∫_{-∞}^{+∞} (y − E[Y | X = x])^2 f_{Y|X}(y|x) dy.
6 The multivariate normal distribution

What is hell? Our faith calls "hell" the condition of the final separation from God. Anyone who sees love clearly in the face of God and, nevertheless, does not want it, freely decides to have this condition instead.
We assume that X_1, ..., X_n are independent random variables with means μ_1, ..., μ_n and variances σ_1^2, ..., σ_n^2. We write X_i ∈ N(μ_i, σ_i^2). Now, define the random vector X = (X_1, ..., X_n)^T. Because the random variables are independent, the joint density is the product of the marginal densities:

f_X(x_1, ..., x_n) = f_{X_1}(x_1) ··· f_{X_n}(x_n)
= ∏_{i=1}^n 1/(√(2π) σ_i) exp( −(x_i − μ_i)^2 / (2σ_i^2) )
= 1/( (∏_{i=1}^n σ_i) (2π)^{n/2} ) exp( −(1/2) Σ_{i=1}^n ((x_i − μ_i)/σ_i)^2 ).

By introducing the mean μ = (μ_1, ..., μ_n)^T and the covariance

Σ_X = Var[X] = diag(σ_1^2, ..., σ_n^2),

this is written

f_X(x) = 1/( (2π)^{n/2} √(det Σ_X) ) exp( −(1/2) (x − μ)^T Σ_X^{−1} (x − μ) ).   (33)
A generalization to the case where the covariance matrix is a full matrix leads to the following

DEF. III.6A - The multivariate normal distribution -
The joint density function for the n-dimensional random variable X with mean μ and covariance matrix Σ is

f_X(x) = 1/( (2π)^{n/2} √(det Σ) ) exp( −(1/2) (x − μ)^T Σ^{−1} (x − μ) )   (34)

where Σ > 0. We write X ∈ N(μ, Σ). If X ∈ N(0, I) we say that X is standardized normally distributed.
THM. III.6B - Jointly normally distributed variables X_1, ..., X_n are uncorrelated if and only if they are independent.

Proof
Cov(X_i, X_j) = 0, ∀i ≠ j, implies that the covariance matrix and the quadratic form in the exponent of f are diagonal, so that the density is the product of the marginal densities. □
THM. III.6C - Principal components of N(μ; Σ) -
Given X ∈ N(μ; Σ), with values in R^n, the associated joint density f(x) is constant on the ellipsoid

{x : (x − μ)^T Σ^{−1} (x − μ) = c^2}   (35)

in R^n, c being a constant. The axes of the ellipsoid are

±c √λ_i e_i,

where (λ_i, e_i) are the eigenvalue-eigenvector pairs of the covariance matrix Σ (i = 1, ..., n). These ellipsoids are called the contours of the distribution, or the ellipsoids of equal concentration.
An orthogonal matrix G such that G^T Σ G is diagonal determines a principal component transformation

Y = G^T (X − μ),   (36)

by which the X_i's are transformed into independent and normally distributed variables Y_i with mean zero and variance λ_i.

Proof
The covariance matrix Σ is symmetric and positive definite. Thus, by the spectral decomposition theorem,

G^T Σ G = Λ   (or G^T Σ^{−1} G = Λ^{−1}),

where Λ = diag(λ_1, ..., λ_n) is the diagonal matrix of (positive) eigenvalues of Σ, while G is an orthogonal matrix whose columns are the corresponding normalized eigenvectors. By the principal component transformation

y = G^T (x − μ),   i.e.   x − μ = G y,

the ellipsoid becomes

(Gy)^T Σ^{−1} G y = c^2,   i.e.   y^T G^T Σ^{−1} G y = c^2,   i.e.   y^T Λ^{−1} y = c^2,   i.e.   Σ_{i=1}^n y_i^2 / λ_i = c^2.

This is the equation of an ellipsoid with semi-axes of length c √λ_i. The coordinates y_1, ..., y_n represent the different axes of the ellipsoid. The directions of the axes are given by the eigenvectors of Σ, since y = G^T (x − μ) is a rotation and G is orthogonal with the eigenvectors of Σ as columns. □
EX. III.6D Principal components and contours of a two-dimensional normal vector.
To obtain the contours of a bivariate normal density we have to find the eigenvalues and eigenvectors of Σ. Suppose

μ_1 = 5,   μ_2 = 8,   Σ = ⎛ 12   9 ⎞
                          ⎝  9  25 ⎠

Here the eigenvalue equation becomes

0 = |Σ − λI| = det ⎛ 12 − λ    9    ⎞ = (12 − λ)(25 − λ) − 81 = 0.
                   ⎝   9     25 − λ ⎠

The equation gives λ_1 = 29.601, λ_2 = 7.398. The corresponding normalized eigenvectors are

e_1 = (.455, .890)^T,   e_2 = (.890, −.455)^T.

The ellipse of constant probability density (35) has center at μ = (5, 8)^T and axes ±5.44 c e_1 and ±2.72 c e_2. The eigenvector e_1 lies along the line which makes an angle of cos^{−1}(.455) = 62.92° with the X_1 axis and passes through the point (5, 8) in the (X_1, X_2) plane. By the principal component transformation y = G^T (x − μ), where G = (e_1, e_2), the ellipse (35) has the equation

{y : Σ_{i=1}^2 y_i^2 / λ_i = c^2}.

The ellipse has center at (y_1 = 0, y_2 = 0) and axes ±c √λ_i f_i, where f_i, i = 1, 2, are the eigenvectors of Λ = diag(λ_1, λ_2). In terms of the new coordinate axes Y_1, Y_2, the eigenvectors are f_1 = (1, 0)^T, f_2 = (0, 1)^T. The lengths of the semi-axes are c √λ_i, i = 1, 2. The axes Y_1, Y_2 are the principal components of Σ. □
In R:

> Sigma <- matrix(c(12, 9, 9, 25), 2, 2)
> eigen(Sigma)$values
[1] 29.601802  7.398198
> eigen(Sigma)$vectors
          [,1]       [,2]
[1,] 0.4552524  0.8903624
[2,] 0.8903624 -0.4552524
>
THM. III.6E
Let X be a normal random vector, with dimension n, with mean μ and positive definite variance Σ. Then X can be written as

X = μ + C U,

where U = (U_1, ..., U_n)^T ~ N(0; I).

Proof
Due to the symmetry of the positive definite matrix Σ, there always exists a nonsingular real matrix C such that Σ = C C^T (a matrix "square root" of Σ). Setting U = C^{−1}(X − μ), then from Corollary III.4G:

E[U] = C^{−1} E[X − μ] = 0,
Var[U] = C^{−1} Var[X] (C^{−1})^T = C^{−1} C C^T (C^T)^{−1} = I. □
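Numerically, such a C can be obtained from the Cholesky factorization; a sketch in Python, reusing the Σ of Ex. III.6D (the simulation tolerance is a loose illustrative choice):

```python
import numpy as np

# A matrix "square root" C with C C^T = Sigma via the Cholesky factorization,
# using the Sigma of Ex. III.6D
Sigma = np.array([[12.0, 9.0], [9.0, 25.0]])
C = np.linalg.cholesky(Sigma)          # lower triangular
print(np.allclose(C @ C.T, Sigma))     # True

# Simulated X = mu + C U then has (approximately) the prescribed moments
rng = np.random.default_rng(0)
mu = np.array([5.0, 8.0])
X = mu[:, None] + C @ rng.standard_normal((2, 100000))
print(np.allclose(np.cov(X), Sigma, atol=0.8))
```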
7 Distributions derived from the normal distribution

"I ask myself: What does hell mean? I maintain that it is the inability to love" (Fyodor M. Dostoyevsky, 1821-1881)

Most of the test quantities used in Statistics are based on distributions derived from the normal one.
Any linear combination of normally distributed random variables is normal. If, for instance, X ∈ N(μ; Σ), then the linear transformation Y = a + BX defines a normally distributed random variable

Y ∈ N(a + Bμ, B Σ B^T).   (37)

Compare with Corollary III.4G, where the transformation rule Σ → B Σ B^T of the variance of X, for any linear map X → a + BX, is established.
Let U = (U_1, ..., U_n)^T be a vector of independent N(0; 1) random variables. The (central) χ^2 distribution with n degrees of freedom is obtained as the sum of squares of n independent N(0; 1) random variables, i.e.

X^2 = Σ_{i=1}^n U_i^2 = U^T U ~ χ^2(n).   (38)

From this it is clear that if Y_1, ..., Y_n are independent N(μ_i; σ_i^2) r.v.'s, then

X^2 = Σ_{i=1}^n ((Y_i − μ_i)/σ_i)^2 ~ χ^2(n),

since U_i = (Y_i − μ_i)/σ_i is N(0; 1) distributed.
THM. III.7A If X_i, for i = 1, ..., n, are independent normal variables from a population N(μ; σ^2), then the sample mean and the sample variance follow a normal and a chi-squared distribution, respectively:

X̄ = (1/n) Σ_{i=1}^n X_i   ⇒   X̄ ~ N(μ, σ^2/n),

S^2 = (1/(n − 1)) Σ_i (X_i − X̄)^2   ⇒   (n − 1) S^2 / σ^2 ~ χ^2(n − 1).

Proof.
X̄ is normal as a linear transformation of normal variables. By linearity E[X̄] = (1/n) Σ_i E[X_i] = (1/n) · nμ = μ, and by independence

Var[(1/n) Σ_{i=1}^n X_i] = (1/n^2) Σ_i Var[X_i] = n σ^2 / n^2 = σ^2 / n,

whence X̄ ~ N(μ, σ^2/n).
Now using this fact and the definition of χ^2,

Σ_i ((X_i − μ)/σ)^2 ~ χ^2(n),   ((X̄ − μ)/(σ/√n))^2 = (X̄ − μ)^2 / (σ^2/n) ~ χ^2(1).

Now writing X_i − μ = X_i − X̄ + X̄ − μ, the following decomposition can be verified:

Σ_{i=1}^n ((X_i − μ)/σ)^2 = Σ_{i=1}^n (X_i − X̄)^2 / σ^2 + n ((X̄ − μ)/σ)^2 = (n − 1) S^2 / σ^2 + n ((X̄ − μ)/σ)^2.

Therefore the random variable

(n − 1) S^2 / σ^2

is the difference between a χ^2(n) r.v. and a χ^2(1) r.v., that is, a r.v. with distribution χ^2(n − 1). □
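A simulation sketch of the second claim: for samples of size n = 5, the variable (n − 1)S^2/σ^2 should have mean n − 1 = 4 and variance 2(n − 1) = 8, the mean and variance of χ^2(4). The population parameters below are arbitrary.

```python
import random
import statistics

random.seed(1)
# Simulate (n-1) S^2 / sigma^2 for samples of size n = 5 from N(mu, sigma^2);
# it should behave like chi^2(4): mean 4, variance 8
n, reps, mu, sigma = 5, 20000, 2.0, 3.0
vals = [(n - 1) * statistics.variance([random.gauss(mu, sigma) for _ in range(n)]) / sigma ** 2
        for _ in range(reps)]
print(round(statistics.fmean(vals), 2), round(statistics.pvariance(vals), 1))
```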
The noncentral χ^2 distribution with n degrees of freedom and noncentrality parameter λ appears when considering the sum of squared independent normal variables in which the means are not necessarily zero. Hence,

X^2 = Σ_{i=1}^n V_i^2 = V^T V ~ χ^2(n; λ),   (39)

where V_i ~ N(μ_i; 1) and λ = Σ_{i=1}^n μ_i^2 ≡ μ^T μ.

Let X_1^2, ..., X_m^2 denote independent χ^2(n_i, λ_i) distributed random variables. Then the reproduction property of the χ^2 distribution is

Σ_{i=1}^m X_i^2 ~ χ^2( Σ_{i=1}^m n_i, Σ_{i=1}^m λ_i ).   (40)
THM. III.7B If Y is N_p(μ; Σ), and Σ is non-singular, then

a) X^2 = (Y − μ)^T Σ^{−1} (Y − μ) ~ χ^2(p)   (41)
b) Y^T Σ^{−1} Y ~ χ^2(p, λ)   (42)

where λ = μ^T Σ^{−1} μ.

Proof
a) Σ is positive definite. By Thm. III.6E there exists a real nonsingular C such that

Y = μ + C U   (or U = C^{−1}(Y − μ)),   with U ~ N(0; I_p).

(Such C is the matrix satisfying C C^T = Σ.) So U^T U is a sum of squares of p independent N(0, 1) variables and we find:

χ^2(p) = U^T U = (Y − μ)^T (C^T)^{−1} C^{−1} (Y − μ) = (Y − μ)^T (C C^T)^{−1} (Y − μ) = (Y − μ)^T Σ^{−1} (Y − μ).

b) Omitted. □
PROPOSITION III.7C - The data matrix. The sample mean vector.
Suppose that X_1, ..., X_n is a multivariate sample, i.e. independent identically distributed random vectors from a normal population N_p(μ, Σ):

    ⎛ X_{1,1}  ...  X_{1,p} ⎞
    ⎜ X_{2,1}  ...  X_{2,p} ⎟
    ⎜ ...                   ⎟
    ⎝ X_{n,1}  ...  X_{n,p} ⎠

Let us consider the sample mean vector

X̄ ≡ (X̄_1, ..., X̄_p)^T,   X̄_j = Σ_{i=1}^n X_{ij}/n,   j = 1, ..., p.

Then the mean and the variance matrix of X̄ are μ and (1/n) Σ, respectively, and

X̄ ~ N_p(μ, (1/n) Σ).

Proof
For k, r = 1, ..., p, the (k, r) element of the variance matrix Var[X̄] is:

Cov(X̄_k, X̄_r) = Cov( (1/n) Σ_i X_{i,k}, (1/n) Σ_j X_{j,r} ).

In such a sum only the i = j terms survive, since X_i and X_j are independent for i ≠ j. On the other hand, Cov(X_{i,k}, X_{i,r}) is the same for i = 1, ..., n, so

Cov(X̄_k, X̄_r) = (1/n^2) Σ_i Cov(X_{i,k}, X_{i,r}) = (n/n^2) σ_{k,r},

whence Σ_{X̄} = (1/n) Σ. □
PROPOSITION III.7D - Testing for a multivariate population mean, if Σ is<br />
known.<br />
Suppose that X1, ..., Xn are a sample, i.e. independent identically distributed<br />
random vectors, from a normal distribution Np(µ, Σ) :<br />
⎛<br />
⎞<br />
X1,1 ... X1,p<br />
⎜<br />
⎟<br />
⎜ X2,1 ... X2,p ⎟<br />
⎜<br />
⎟<br />
⎜<br />
⎟<br />
⎝<br />
⎠<br />
.<br />
.<br />
Xn,1 ... Xn,p<br />
20<br />
.<br />
k
the variance of ¯ X is 1 Σ and n<br />
¯X » Np(µ, 1<br />
n Σ).<br />
A test of the null hypothesis H₀ : µ = µ₀ against H₁ : µ = µ₁ is performed
by the statistic

    W = n(X̄ − µ₀)^T Σ^{-1} (X̄ − µ₀) ∼ χ²(p).

H₀ is rejected if

    W > χ²_{1−α}(p)

where χ²_{1−α}(p) is the 1 − α quantile of the chi square distribution with p
degrees of freedom.
If the alternative µ = µ₁ is true, W follows a non-central χ²(p; λ), with
λ = n(µ₁ − µ₀)^T Σ^{-1} (µ₁ − µ₀), and the power of the test is given by the
probability

    P[ χ²(p, λ) ≥ χ²_{1−α}(p) ].
Proof
By Prop. III.3C the variance of X̄ is (1/n)Σ and, if H₀ holds,

    X̄ ∼ Np(µ₀, (1/n)Σ).

Thus, by Thm. III.7B, the statistic

    W = (X̄ − µ₀)^T (Σ/n)^{-1} (X̄ − µ₀)

follows a χ²(p) distribution. Hence, at the level α, the null hypothesis H₀ is
rejected if the experimental W belongs to the critical region:

    W > χ²_{1−α}(p).

Besides, we can recover the power of the test: under the alternative H₁ : µ = µ₁,

    (X̄ − µ₀) ∼ Np[(µ₁ − µ₀), n^{-1}Σ].

Hence, by the second part of Thm. III.7A, W obeys a χ²(p, λ) distribution
with non-centrality parameter λ = n(µ₁ − µ₀)^T Σ^{-1} (µ₁ − µ₀). The power of
the test is

    1 − β = 1 − P{χ²(p, λ) < χ²_{1−α}(p)} = P{χ²(p, λ) ≥ χ²_{1−α}(p)}.  □
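The whole procedure is easy to script. The following sketch (Python with NumPy/SciPy, with made-up illustrative numbers rather than data from the notes) computes W, the critical value, the p-value, and the power via the non-central chi square.

```python
# Sketch of the known-Sigma mean test of Prop. III.7D.
import numpy as np
from scipy.stats import chi2, ncx2

def w_test(xbar, mu0, Sigma, n, alpha=0.05):
    """W = n (xbar-mu0)^T Sigma^{-1} (xbar-mu0), critical value, p-value."""
    d = np.asarray(xbar) - np.asarray(mu0)
    W = n * d @ np.linalg.solve(Sigma, d)
    p = len(d)
    return W, chi2.ppf(1 - alpha, df=p), chi2.sf(W, df=p)

def w_power(mu1, mu0, Sigma, n, alpha=0.05):
    """Power against mu1: P[chi^2(p, lambda) >= chi^2_{1-alpha}(p)]."""
    d = np.asarray(mu1) - np.asarray(mu0)
    lam = n * d @ np.linalg.solve(Sigma, d)   # non-centrality parameter
    p = len(d)
    return ncx2.sf(chi2.ppf(1 - alpha, df=p), df=p, nc=lam)

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
W, crit, pval = w_test([0.6, 0.2], [0.0, 0.0], Sigma, n=30)
power = w_power([1.0, 0.0], [0.0, 0.0], Sigma, n=30)
print(W, crit, pval, power)
```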
EX. III.7E -<br />
Five dimensions of 19 sparrows have been measured and recorded in the
following table:
x1 x2 x3 x4 x5<br />
1 156 245 31.6 18.5 20.5<br />
2 154 240 30.4 17.9 19.6<br />
3 153 240 31.0 18.4 20.6<br />
4 153 236 30.9 17.7 20.2<br />
5 155 243 31.5 18.6 20.3<br />
6 163 247 32.0 19.0 20.9<br />
7 157 238 30.9 18.4 20.2<br />
8 155 239 32.8 18.6 21.2<br />
9 164 248 32.7 19.1 21.1<br />
10 158 238 31.0 18.8 22.0<br />
11 158 240 31.3 18.6 22.0<br />
12 160 244 31.1 18.6 20.5<br />
13 161 246 32.3 19.3 21.8<br />
14 157 245 32.0 19.1 20.0<br />
15 157 235 31.5 18.1 19.8<br />
16 156 237 30.9 18.0 20.3<br />
17 158 244 31.4 18.5 21.6<br />
18 153 238 30.5 18.2 20.9<br />
19 155 236 30.3 18.5 20.1<br />
From that, we extract a data matrix X1, X2, X3 corresponding to the first 3
columns (p = 3, n = 19). The sample mean and the sample variance are:

    x̄^T = (157, 241, 31.37),

        ⎛ 10.22   9     1.36 ⎞
    S = ⎜  9     16.66  1.84 ⎟
        ⎝  1.36   1.84  0.52 ⎠

Suppose we want to test

    H₀ : µ₀ = (156, 240, 30.9)^T   against the alternative   H₁ : µ₁ ≠ µ₀,

assuming the parametric variance to be known, i.e. Σ = S. The statistic W
falls in the critical region:

    Wexp = 8.99 > 7.81 ≡ χ²_{0.95}(3),

so that the null hypothesis is rejected. More precisely, the p-value is 0.029².
² The p-value is the probability area from the experimental value Wexp to infinity: if
p ≤ α, H₀ is rejected; if p > α, H₀ is not rejected at the level α. Actually p is the finest
significance level at which the experiment rejects H₀.
If an alternative hypothesis H₁ : µ₁ = (157, 141, 31.37)^T is true, the
non-centrality parameter λ of the chi square is computed and the power of the
test follows.
> # the sparrow table has been read into the data frame T
> T
   x1  x2   x3   x4   x5
1  156 245 31.6 18.5 20.5
2  154 240 30.4 17.9 19.6
3  153 240 31.0 18.4 20.6
4  153 236 30.9 17.7 20.2
5  155 243 31.5 18.6 20.3
6  163 247 32.0 19.0 20.9
7  157 238 30.9 18.4 20.2
8  155 239 32.8 18.6 21.2
9  164 248 32.7 19.1 21.1
10 158 238 31.0 18.8 22.0
11 158 240 31.3 18.6 22.0
12 160 244 31.1 18.6 20.5
13 161 246 32.3 19.3 21.8
14 157 245 32.0 19.1 20.0
15 157 235 31.5 18.1 19.8
16 156 237 30.9 18.0 20.3
17 158 244 31.4 18.5 21.6
18 153 238 30.5 18.2 20.9
19 155 236 30.3 18.5 20.1
> x1 <- T$x1; x2 <- T$x2; x3 <- T$x3
> xbar <- c(mean(x1), mean(x2), mean(x3))
> xbar
[1] 157.00000 241.00000  31.37368
> # computing the covariance of x1 and x2 (which have the same length)
> sum((x1-mean(x1))*(x2-mean(x2)))/(length(x1)-1)
[1] 9
> # it can be computed directly with the command "cov"
> cov(x1,x2)
[1] 9
> # computing the covariance matrix of x1, x2, x3 (a 3 x 3 matrix)
> Sigma <- matrix(0, nrow=3, ncol=3)
> Sigma[1,1] <- var(x1);    Sigma[1,2] <- cov(x1,x2); Sigma[1,3] <- cov(x1,x3)
> Sigma[2,1] <- cov(x2,x1); Sigma[2,2] <- var(x2);    Sigma[2,3] <- cov(x2,x3)
> Sigma[3,1] <- cov(x3,x1); Sigma[3,2] <- cov(x3,x2); Sigma[3,3] <- var(x3)
> Sigma
          [,1]      [,2]      [,3]
[1,] 10.222222  9.000000 1.3666667
[2,]  9.000000 16.666667 1.8444444
[3,]  1.366667  1.844444 0.5231579
> # there is also a direct command for the variance matrix:
> X <- matrix(c(x1, x2, x3), ncol=3)
> var(X)
          [,1]      [,2]      [,3]
[1,] 10.222222  9.000000 1.3666667
[2,]  9.000000 16.666667 1.8444444
[3,]  1.366667  1.844444 0.5231579
> # set the null hypothesis and compute the statistic W
> mu0 <- c(156, 240, 30.9)
> Wsper <- 19 * t(xbar - mu0) %*% solve(Sigma) %*% (xbar - mu0)
> Wsper
         [,1]
[1,] 8.993734
> qchisq(0.95, df=3)
[1] 7.814728
> pvalue <- 1 - pchisq(Wsper, df=3)
> pvalue
[1] 0.02942414
> # computing the power under an alternative mean mu1
> mu1 <- c(157, 141, 31.37)
> lambda <- 19 * t(mu1 - mu0) %*% solve(Sigma) %*% (mu1 - mu0)
> lambda
         [,1]
[1,] 25483.11
> potenza <- 1 - pchisq(qchisq(0.95, df=3), df=3, ncp=lambda)
> potenza
[1] 1
DEF. III.7F -
The (Student) t distribution with n degrees of freedom is obtained as

    T = Z / (K/n)^{1/2} ∼ t(n)                                        (43)

where Z ∼ N(0, 1), K ∼ χ²(n), and Z and K are independent.
The F distribution with (n1, n2) degrees of freedom appears as the following
ratio

    F = (K1/n1) / (K2/n2) ∼ F(n1, n2)                                 (44)

where K1 ∼ χ²(n1), K2 ∼ χ²(n2), and K1, K2 are independent. It is
clearly seen from (43) that T² ∼ F(1, n).
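The two defining ratios can be verified by simulation. The sketch below (Python with NumPy/SciPy; degrees of freedom are illustrative) builds T and F from independent normal and chi-square draws and compares them with the t, F and, for T², F(1, n) distributions via a Kolmogorov-Smirnov test.

```python
# Simulation of definitions (43)-(44): T = Z / sqrt(K/n) ~ t(n) and
# F = (K1/n1)/(K2/n2) ~ F(n1, n2); moreover T^2 ~ F(1, n).
import numpy as np
from scipy.stats import t as t_dist, f as f_dist, kstest

rng = np.random.default_rng(2)
n, n1, n2 = 8, 5, 12
N = 100_000

Z = rng.standard_normal(N)
K = rng.chisquare(n, N)
T = Z / np.sqrt(K / n)                # (43)

K1 = rng.chisquare(n1, N)
K2 = rng.chisquare(n2, N)
F = (K1 / n1) / (K2 / n2)             # (44)

print(kstest(T, t_dist(df=n).cdf).pvalue)       # compatible with t(n)
print(kstest(F, f_dist(n1, n2).cdf).pvalue)     # compatible with F(n1, n2)
print(kstest(T**2, f_dist(1, n).cdf).pvalue)    # compatible with F(1, n)
```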
PROP. III.7G - For i = 1, ..., n, let the Xi be i.i.d. variables from a normal
population N(µ, σ²). If S² = (1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)², then the
estimated standardization of the mean

    T = (X̄ − µ) / (S/√n)

obeys a Student t distribution with n − 1 degrees of freedom.
If two independent samples {Xi} and {Yj} (i = 1, ..., n1; j = 1, ..., n2) are
taken from normal populations with the same variance, the ratio

    F = S₁²/S₂²

obeys a Fisher F distribution with (n1 − 1, n2 − 1) degrees of freedom.
Proof Since X̄ ∼ N(µ, σ²/n), we have:

    T = (X̄ − µ)/(S/√n) = [(X̄ − µ)/(σ/√n)] · [(σ/√n)/(S/√n)]
      = U/(S/σ) = U / √(K/(n − 1))

where

    U = (X̄ − µ)/(σ/√n),    K = S²(n − 1)/σ²

are independent with U ∼ N(0, 1) and K ∼ χ²(n − 1).
Finally, under the assumption of equal variances σ²,

    S₁²/S₂² = [ (1/(n1−1)) S₁²(n1 − 1)/σ² ] / [ (1/(n2−1)) S₂²(n2 − 1)/σ² ]
            = (K1/(n1 − 1)) / (K2/(n2 − 1))

where K1 and K2 are independent, K1 ∼ χ²(n1 − 1), K2 ∼ χ²(n2 − 1).  □
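The first part of the proposition is easy to check against a library routine. A sketch (Python with NumPy/SciPy): the hand-made statistic and p-value coincide with scipy's one-sample t-test; the data are the first ten x1 values of the sparrow table.

```python
# Sketch of Prop. III.7G: T = (xbar - mu0)/(S/sqrt(n)) referred to t(n-1)
# agrees with a library one-sample t-test.
import numpy as np
from scipy.stats import ttest_1samp, t as t_dist

x = np.array([156., 154., 153., 153., 155., 163., 157., 155., 164., 158.])
mu0 = 155.5
n = len(x)

T = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))   # estimated standardization
p_two_sided = 2 * t_dist.sf(abs(T), df=n - 1)          # two-sided p-value

res = ttest_1samp(x, popmean=mu0)
print(T, res.statistic)        # identical
print(p_two_sided, res.pvalue) # identical
```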
> # confidence interval for the mean of x1 (the first column of the table above)
> t.test(x1)
        One Sample t-test
data:  x1
t = 214.0444, df = 18, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 155.459 158.541
sample estimates:
mean of x
      157
> # one-sided test on the mean: H_0: mu = 155.5; H_1: mu > 155.5
> t.test(x1, alternative="greater", mu=155.5)
        One Sample t-test
data:  x1
t = 2.045, df = 18, p-value = 0.02788
alternative hypothesis: true mean is greater than 155.5
95 percent confidence interval:
 155.7281      Inf
sample estimates:
mean of x
      157
> # "ratio test": test on the ratio of the variances of x2 and x1
> var(x1)/var(x2)
[1] 0.6133333
> var(x2)/var(x1)
[1] 1.630435
> # the 0.95 quantile of Fisher's F is
> qf(0.95, length(x2)-1, length(x1)-1)
[1] 2.217197
> pvalue <- 1 - pf(var(x2)/var(x1), length(x2)-1, length(x1)-1)
> pvalue
[1] 0.1545287
Given a sample x1, ..., xn from Np(µ, Σ), we want to test H₀ : µ = µ₀ against
H₁ : µ ≠ µ₀.
In Prop. III.7D we used the fact that under H₀ : µ = µ₀

    n(x̄ − µ₀)^T Σ^{-1} (x̄ − µ₀) ∼ χ²(p),

and hence a critical region for H₀ is

    {x : n(x̄ − µ₀)^T Σ^{-1} (x̄ − µ₀) ≥ χ²_{1−α}(p)}.

That testing is realistic for large samples: when n is large enough the sample
variance S is nearly equal to the parametric variance Σ. Now consider the
general case of unknown Σ.
PROP. III.7H - Testing for a multivariate population mean if Σ is unknown.
Let x1, ..., xn be a random sample from Np(µ, Σ), with unknown Σ. Under
the hypothesis H₀ : µ = µ₀, and up to a suitable factor,

    T₀² = n(x̄ − µ₀)^T S^{-1} (x̄ − µ₀)

obeys an F distribution with (p, n − p) degrees of freedom:

    F₀ = T₀² · (n − p) / ((n − 1)p) ∼ F(p, n − p).

The null hypothesis H₀ is rejected if

    F₀ ≥ F_{1−α}(p, n − p)

where P[F(p, n − p) ≤ F_{1−α}(p, n − p)] = 1 − α.
When H₀ is false, the F distribution is non-central³ with non-centrality
parameter λ = n(µ − µ₀)^T Σ^{-1} (µ − µ₀). Thus

    T₀² · (n − p) / ((n − 1)p) ∼ F(p, n − p; λ).

The power of the test for a critical region of size α is

    P[ F(p, n − p; λ) ≥ F_{1−α}(p, n − p) ].
Proof. Parimal Mukhopadhyay: Multivariate Statistical Analysis, World
Scientific 2009, pages 114 and 131.  □
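The same T₀² test can be sketched outside R. The following Python/NumPy/SciPy sketch uses the sparrow columns x1, x2, x3 from the table above and the null mean µ₀ = (160, 240, 30) of the notes.

```python
# Sketch of Prop. III.7H: T0^2 = n (xbar-mu0)^T S^{-1} (xbar-mu0), rescaled to
# F0 = T0^2 (n-p)/((n-1)p), is referred to F(p, n-p) under H0.
import numpy as np
from scipy.stats import f as f_dist

# columns x1, x2, x3 of the sparrow table
X = np.array([
    [156, 245, 31.6], [154, 240, 30.4], [153, 240, 31.0], [153, 236, 30.9],
    [155, 243, 31.5], [163, 247, 32.0], [157, 238, 30.9], [155, 239, 32.8],
    [164, 248, 32.7], [158, 238, 31.0], [158, 240, 31.3], [160, 244, 31.1],
    [161, 246, 32.3], [157, 245, 32.0], [157, 235, 31.5], [156, 237, 30.9],
    [158, 244, 31.4], [153, 238, 30.5], [155, 236, 30.3],
])
n, p = X.shape
mu0 = np.array([160.0, 240.0, 30.0])

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)            # sample variance matrix (divisor n-1)
d = xbar - mu0
T2 = n * d @ np.linalg.solve(S, d)
F0 = T2 * (n - p) / ((n - 1) * p)
pvalue = f_dist.sf(F0, p, n - p)
print(F0, pvalue)                      # about 57.109 and 9.09e-09
```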
> # test on a multivariate mean with unknown Sigma
> T
   x1  x2   x3   x4   x5
1  156 245 31.6 18.5 20.5
2  154 240 30.4 17.9 19.6
3  153 240 31.0 18.4 20.6
4  153 236 30.9 17.7 20.2
5  155 243 31.5 18.6 20.3
6  163 247 32.0 19.0 20.9
7  157 238 30.9 18.4 20.2
8  155 239 32.8 18.6 21.2
9  164 248 32.7 19.1 21.1
10 158 238 31.0 18.8 22.0
11 158 240 31.3 18.6 22.0
12 160 244 31.1 18.6 20.5
13 161 246 32.3 19.3 21.8
14 157 245 32.0 19.1 20.0
15 157 235 31.5 18.1 19.8
16 156 237 30.9 18.0 20.3
17 158 244 31.4 18.5 21.6
18 153 238 30.5 18.2 20.9
19 155 236 30.3 18.5 20.1
> x1 <- T$x1; x2 <- T$x2; x3 <- T$x3
> # test of H_0: mu = (160, 240, 30)
> mu0 <- c(160, 240, 30)
> xbar <- c(mean(x1), mean(x2), mean(x3))
> X <- matrix(c(x1, x2, x3), ncol=3)
> S <- var(X)
> S
          [,1]      [,2]      [,3]
[1,] 10.222222  9.000000 1.3666667
[2,]  9.000000 16.666667 1.8444444
[3,]  1.366667  1.844444 0.5231579
> solve(S)
            [,1]        [,2]       [,3]
[1,]  0.20277502 -0.08342674 -0.2355883
[2,] -0.08342674  0.13271132 -0.2499477
[3,] -0.23558829 -0.24994769  3.4081208
> Tquadro <- 19 * t(xbar - mu0) %*% solve(S) %*% (xbar - mu0)
> F0 <- Tquadro * (19 - 3) / ((19 - 1) * 3)
> F0
         [,1]
[1,] 57.10946
> # the 0.95 quantile of F(3, 19-3)
> qf(0.95, df1=3, df2=16)
[1] 3.238872
> pvalue <- 1 - pf(F0, df1=3, df2=16)
> pvalue
[1] 9.089796e-09
> # power of the test taking mu1 = (161, 240, 31)
> mu1 <- c(161, 240, 31)
> lambda <- 19 * t(mu1 - mu0) %*% solve(S) %*% (mu1 - mu0)
> potenza <- 1 - pf(qf(0.95, 3, 16), 3, 16, ncp=lambda)
> potenza
[1] 0.2781924
> # power taking mu1 = (161, 241, 31)
> mu1 <- c(161, 241, 31)
> lambda <- 19 * t(mu1 - mu0) %*% solve(S) %*% (mu1 - mu0)
> potenza <- 1 - pf(qf(0.95, 3, 16), 3, 16, ncp=lambda)
> potenza
[1] 0.9998438

³ The non-central F distribution with (n1, n2) degrees of freedom and
non-centrality parameter λ is obtained from

    F = (K1/n1) / (K2/n2)

where K1 ∼ χ²(n1, λ), K2 ∼ χ²(n2), and K1, K2 are independent. The non-central
F distribution is written F ∼ F(n1, n2; λ).
The test therefore strongly rejects the null hypothesis.
PROP. III.7I - CONFIDENCE REGION AND SIMULTANEOUS CONFIDENCE
INTERVALS
Let X1, ..., Xn be a random sample from Np(µ, Σ). A (1 − α) confidence
region for the mean µ is the ellipsoid defined by

    {µ ∈ R^p : n(µ − x̄)^T S^{-1} (µ − x̄) ≤ ((n − 1)p / (n − p)) F_{1−α}(p, n − p)}

where x̄ is the observed sample mean.
Simultaneous confidence intervals for the p variables are obtained as
projections of the ellipsoid onto the p axes. Alternatively, at the level α,
Bonferroni's simultaneous confidence intervals are obtained from the usual
univariate intervals with levels α/m instead of α.
Proof We know that

    P[ n(X̄ − µ)^T S^{-1} (X̄ − µ) ≤ ((n − 1)p / (n − p)) F_{1−α}(p, n − p) ] = 1 − α.

Hence the confidence region is the above ellipsoid with center x̄. Beginning
at the center, the half-axes of the ellipsoid are

    ± √λi · √( ((n − 1)p / (n(n − p))) F_{1−α}(p, n − p) ) · ei

where (λi, ei), i = 1, ..., p, are the (eigenvalue, eigenvector) pairs of S.
Now let m ≤ p and, for i = 1, ..., m, let Ai be the confidence statement

    Ai : the interval x̄i ± t_{1−α/2}(n − 1) si/√n contains µi
on the i-th component µi of µ. Then P(Ai) = 1 − α. By using De Morgan's
law and Boole's inequality, i.e.

    (∩_{i=1}^{m} Ai)^C = ∪_{i=1}^{m} Ai^C ,    P(∪_{i=1}^{m} Ai) ≤ Σ_{i=1}^{m} P(Ai),

we have

    P(∩_{i=1}^{m} Ai) = 1 − P[(∩_{i=1}^{m} Ai)^C] = 1 − P[∪_{i=1}^{m} Ai^C]
                      ≥ 1 − Σ_{i=1}^{m} P(Ai^C).

So the simultaneous confidence of the m independent intervals is bounded
below by 1 − mα:

    (1 − α)^m ≥ 1 − mα.

Therefore, in place of α we can choose a level α/m for each i = 1, ..., m, to
ensure the desired global confidence level 1 − α for the m means.
For example, denoting by x, y, z, ... the m variables, the simultaneous
Bonferroni intervals are:

    µx = x̄ ± t_{1−α/(2m)} sx/√n ,    µy = ȳ ± t_{1−α/(2m)} sy/√n ,  ... etc.  □
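The Bonferroni construction above can be sketched numerically. The following Python/NumPy/SciPy sketch computes the m = 3 simultaneous intervals at global level α = 0.05 for the sparrow columns x1, x2, x3, taking each univariate t-interval at level α/m.

```python
# Bonferroni simultaneous confidence intervals (Prop. III.7I): m univariate
# t-intervals, each at level alpha/m, give global confidence >= 1 - alpha.
import numpy as np
from scipy.stats import t as t_dist

# columns x1, x2, x3 of the sparrow table
X = np.array([
    [156, 245, 31.6], [154, 240, 30.4], [153, 240, 31.0], [153, 236, 30.9],
    [155, 243, 31.5], [163, 247, 32.0], [157, 238, 30.9], [155, 239, 32.8],
    [164, 248, 32.7], [158, 238, 31.0], [158, 240, 31.3], [160, 244, 31.1],
    [161, 246, 32.3], [157, 245, 32.0], [157, 235, 31.5], [156, 237, 30.9],
    [158, 244, 31.4], [153, 238, 30.5], [155, 236, 30.3],
])
n, m = X.shape
alpha = 0.05

tq = t_dist.ppf(1 - alpha / (2 * m), df=n - 1)   # t quantile at level alpha/m
xbar = X.mean(axis=0)
se = X.std(axis=0, ddof=1) / np.sqrt(n)          # s_j / sqrt(n)
lower, upper = xbar - tq * se, xbar + tq * se

for j in range(m):
    print(f"mu_{j+1} in [{lower[j]:.3f}, {upper[j]:.3f}]")
```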