
SOLUTION FOR HOMEWORK 3, STAT 4352

Welcome to your third homework. We finish point estimation; your Exam 1 is next week and it will be close to HW1-HW3.

Recall that X^n := (X_1, ..., X_n) denotes the vector of n observations.

Try to find mistakes (and get extra points) in my solutions. Typically they are silly arithmetic mistakes (not methodological ones). They allow me to check that you did your HW on your own. Please do not e-mail me about your findings; just mention them on the first page of your solution and count the extra points.

Now let us look at your problems.

1. Problem 10.51. Let X_1, ..., X_n be iid according to Expon(θ), so

f^X_θ(x) = (1/θ) e^{−x/θ} I(x > 0),   θ ∈ Ω := (0, ∞).

Please note that it is important to write this density with the indicator function showing its support. In some cases the support may depend on the parameter of interest, and then this fact is always very important. We shall see such an example in this homework.

For the exponential distribution we know that E_θ(X) = θ (you may check this by a direct calculation), so we get a simple method of moments estimator

Θ̂_MME = X̄.

This is the answer. But I would like to continue a bit. The method of moments estimator (or a generalized one) allows you to work with any moment (or any function). Let us consider the second moment and equate the sample second moment to the theoretical one. Recall that Var_θ(X) = θ², and thus

E_θ(X²) = Var_θ(X) + (E_θ(X))² = 2θ².

The sample second moment is n^{−1} Σ_{i=1}^n X_i², and we get another method of moments estimator

Θ̃_MME = [n^{−1} Σ_{i=1}^n X_i² / 2]^{1/2}.

Note that these MM estimators are different, and this is OK. Then a statistician should choose the better one. Which one do you think is better? You may use the notion of efficiency to resolve the issue: compare their MSEs (mean squared errors) E(θ̂ − θ)² and choose the estimator with the smaller MSE. By the way, which estimator is based on the sufficient statistic?
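As a quick illustration of such an MSE comparison (not part of the homework), here is a small Monte Carlo sketch; the true θ, the sample size, and the number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 25, 20_000      # arbitrary illustration values

mse1 = mse2 = 0.0
for _ in range(reps):
    x = rng.exponential(scale=theta, size=n)
    est1 = x.mean()                       # MME from the first moment
    est2 = np.sqrt(np.mean(x**2) / 2.0)   # MME from the second moment
    mse1 += (est1 - theta) ** 2
    mse2 += (est2 - theta) ** 2

print("MSE of the first-moment MME :", mse1 / reps)
print("MSE of the second-moment MME:", mse2 / reps)
```

In such experiments the first-moment MME (the sample mean, which is a function of the sufficient statistic Σ_i X_i) typically comes out with the smaller MSE.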

2. Problem 10.53. Here X_1, ..., X_n are Poisson(λ). Recall that E_λ(X) = λ and Var_λ(X) = λ.

The MME is easy to get via the first moment, and we have

λ̂_MME = X̄.



This is the answer. But again, as an extra example, I can suggest an MME based on the second moment. Indeed, E_λ(X²) = Var_λ(X) + (E_λ X)² = λ + λ², and this yields that

λ̃_MME + λ̃²_MME = n^{−1} Σ_{i=1}^n X_i².

Then you need to solve this equation to get the MME. Obviously it is a more complicated estimator, but it is yet another MME.
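For completeness (this step is left to the reader above), solving the quadratic and keeping the positive root, since λ > 0, gives

λ̃²_MME + λ̃_MME − m_2 = 0   ⟹   λ̃_MME = [−1 + (1 + 4m_2)^{1/2}] / 2,   where m_2 := n^{−1} Σ_{i=1}^n X_i².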

3. Problem 10.56. Let X_1, ..., X_n be iid according to the pdf

g_θ(x) = θ^{−1} e^{−(x−δ)/θ} I(x > δ).

Please note that this is a location-exponential family because

X = δ + Z,

where Z is a classical exponential RV with f^Z_θ(z) = θ^{−1} e^{−z/θ} I(z > 0). I can go even further by saying that we are dealing with a location-scale family because

X = δ + θ Z_0,

where f^{Z_0}(z) = e^{−z} I(z > 0).

So now we know the meaning of the parameters δ and θ: the former is the location (shift) and the latter is the scale (multiplier).

Note that this understanding simplifies all calculations because you can easily figure out (otherwise do the calculations) that

E_{δ,θ}(X) = δ + θ,   Var_{δ,θ}(X) = θ².

These two familiar results yield E_{δ,θ}(X²) = θ² + (δ + θ)², and we get the following system of two equations for the pair of MMEs:

δ̂ + θ̂ = X̄,
2θ̂² + 2δ̂θ̂ + δ̂² = n^{−1} Σ_{i=1}^n X_i².

To solve this system, we square both sides of the first equality and then subtract the obtained equality from the second one. We get a new system

δ̂ + θ̂ = X̄,
θ̂² = n^{−1} Σ_{i=1}^n X_i² − X̄².

This, together with simple algebra, yields the answer

δ̂_MME = X̄ − [n^{−1} Σ_{i=1}^n X_i² − X̄²]^{1/2},   θ̂_MME = [n^{−1} Σ_{i=1}^n X_i² − X̄²]^{1/2}.


Remark: We need to check that n^{−1} Σ_{i=1}^n X_i² − X̄² ≥ 0 for the estimator to be well defined. This may be done via the famous Hölder inequality

(Σ_{j=1}^m a_j)² ≤ m Σ_{j=1}^m a_j².
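Equivalently (a one-line check not spelled out above), the same nonnegativity follows from the identity

n^{−1} Σ_{i=1}^n X_i² − X̄² = n^{−1} Σ_{i=1}^n (X_i − X̄)² ≥ 0.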

4. Problem 10.59. Here X_1, ..., X_n are Poisson(λ), λ ∈ Ω = (0, ∞). Recall that E_λ(X) = λ and Var_λ(X) = λ. Then, by definition of the MLE,

λ̂_MLE := arg max_{λ∈Ω} Π_{l=1}^n f_λ(X_l) =: arg max_{λ∈Ω} L_{X^n}(λ)
        = arg max_{λ∈Ω} Σ_{l=1}^n ln(f_λ(X_l)) =: arg max_{λ∈Ω} ln L_{X^n}(λ).

For the Poisson pmf f_λ(x) = e^{−λ} λ^x / x! we get

ln L_{X^n}(λ) = −nλ + Σ_{l=1}^n X_l ln(λ) − Σ_{l=1}^n ln(X_l!).

Now we need to find the λ̂_MLE at which the above loglikelihood attains its maximum over all λ ∈ Ω. You can do this in the usual way: take the derivative with respect to λ (that is, calculate ∂ ln L_{X^n}(λ)/∂λ), equate it to zero, solve with respect to λ, and then check that the solution indeed maximizes the loglikelihood. Here equating the derivative to zero yields −n + Σ_{l=1}^n X_l/λ = 0, and we get

λ̂_MLE = X̄.

Note that for the Poisson setting the MME and MLE coincide; in general they may be different.
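A quick numerical sanity check of this MLE (illustration only; the sample, the rate used to generate it, and the optimization bounds are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(1)
x = rng.poisson(lam=3.7, size=50)        # arbitrary illustration sample

def neg_loglik(lam):
    # minus the Poisson loglikelihood: n*lam - (sum of x_l) ln(lam) + sum of ln(x_l!)
    return len(x) * lam - x.sum() * np.log(lam) + gammaln(x + 1).sum()

res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")
print("numerical maximizer:", res.x, "  sample mean:", x.mean())  # the two should agree
```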

5. Problem 10.62. Here X_1, ..., X_n are iid N(µ, σ²) with the mean µ being known and the parameter of interest being the variance σ². Note that σ² ∈ Ω = (0, ∞). Then we are interested in the MLE. Write:

σ̂²_MLE = arg max_{σ²∈Ω} ln L_{X^n}(σ²).

Here

ln L_{X^n}(σ²) = Σ_{l=1}^n ln( [2πσ²]^{−1/2} e^{−(X_l−µ)²/(2σ²)} ) = −(n/2) ln(2πσ²) − (2σ²)^{−1} Σ_{l=1}^n (X_l − µ)².

This expression takes on its maximum at

σ̂²_MLE = n^{−1} Σ_{l=1}^n (X_l − µ)².

Note that this is also the MME.
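The maximization step, written out (it is implicit above): differentiating with respect to σ² and equating to zero gives

∂ ln L_{X^n}(σ²)/∂σ² = −n/(2σ²) + (1/(2σ^4)) Σ_{l=1}^n (X_l − µ)² = 0   ⟹   σ² = n^{−1} Σ_{l=1}^n (X_l − µ)².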



6. Problem 10.66. Let X_1, ..., X_n be iid according to the pdf

g_θ(x) = θ^{−1} e^{−(x−δ)/θ} I(x > δ).

Then

L_{X^n}(δ, θ) = θ^{−n} e^{−Σ_{l=1}^n (X_l−δ)/θ} I(X_(1) > δ).

Recall that X_(1) = min(X_1, ..., X_n) is the minimal observation [the first ordered observation]. This is the case that I wrote you about earlier: it is absolutely crucial to take into account the indicator function (the support) because here the parameter δ defines the support.

By its definition,

(δ̂_MLE, θ̂_MLE) := arg max_{δ∈(−∞,∞), θ∈(0,∞)} ln(L_{X^n}(δ, θ)).

Note that

L(δ, θ) := ln(L_{X^n}(δ, θ)) = −n ln(θ) − θ^{−1} Σ_{l=1}^n (X_l − δ) + ln I(X_(1) ≥ δ).

Now the crucial step: you should graph the loglikelihood L as a function of δ and visualize that it takes on its maximum when δ = X_(1). So we get δ̂_MLE = X_(1). Then by taking the derivative with respect to θ we get θ̂_MLE = n^{−1} Σ_{l=1}^n (X_l − X_(1)).

Answer: (δ̂_MLE, θ̂_MLE) = (X_(1), n^{−1} Σ_{l=1}^n (X_l − X_(1))). Please note that δ̂_MLE is a biased estimator; this is a rather typical outcome.
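To see the bias explicitly (a step not spelled out above): X_(1) = δ + θ Z_(1), where Z_(1) is the minimum of n iid standard exponential RVs and hence is exponential with mean 1/n, so

E_{δ,θ}(X_(1)) = δ + θ/n.

Thus the bias of δ̂_MLE is θ/n, which vanishes as n → ∞.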

7. Problem 10.73. Consider iid uniform observations X_1, ..., X_n with the parametric pdf

f_θ(x) = I(θ − 1/2 < x < θ + 1/2).

As soon as the parameter is inside the indicator function you should be very cautious: typically a graph, and not differentiation, will help you to find the MLE. Also, it is very helpful to figure out the nature of the parameter. Here it is obviously a location parameter, and you can write

X = θ + Z,   Z ~ Uniform(−1/2, 1/2).

The latter helps you to guess a correct estimator, to check a suggested one and, if necessary, to simplify calculations of descriptive characteristics (mean, variance, etc.).

Well, now we need to write down the likelihood function (recall that this is just the joint density, only considered as a function of the parameter given the vector of observations):

L_{X^n}(θ) = Π_{l=1}^n I(θ − 1/2 < X_l < θ + 1/2) = I(θ − 1/2 < X_(1) ≤ X_(n) < θ + 1/2).

Note that the latter expression implies that (X_(1), X_(n)) is a sufficient statistic (due to the Factorization Theorem). As a result, any good estimator, and the MLE in particular, must be a function of only these two statistics. Another remark: it is possible to show (there exists a technique for doing this, which is beyond the objectives of this class) that this pair of extreme observations is also the minimal sufficient statistic. Please look at the situation: we have 1 parameter and need 2 univariate statistics (X_(1), X_(n)) to form the sufficient statistic; this is the limit of data reduction here. Nonetheless, this is a huge data reduction whenever n is large. Just think about this: to estimate θ you do not need any observation which is between the two extreme ones! This is not a trivial assertion.

Well, now let us return to the problem at hand. If you look at the graph of the likelihood function as a function of θ, then you may conclude that it attains its maximum at every θ such that

X_(n) − 1/2 < θ < X_(1) + 1/2.   (1)

As a result, we get a very curious MLE: any point within this interval can be declared the MLE (the MLE is not unique!).

Now we can consider the particular questions at hand.

(a) Let Θ̂_1 = (1/2)(X_(1) + X_(n)). We need to check that this estimator satisfies (1). We just plug this estimator into (1) and get

X_(n) − 1/2 < (1/2)(X_(1) + X_(n)) < X_(1) + 1/2.

The latter relation is true because it is equivalent to the following valid inequality:

X_(n) − X_(1) < 1.

(b) Let Θ̂_2 = (1/3)(X_(1) + 2X_(n)) be another candidate for the MLE. Then it should satisfy (1). In particular, if this is the MLE then

(1/3)(X_(1) + 2X_(n)) < X_(1) + 1/2

should hold. The latter inequality is equivalent to

X_(n) − X_(1) < 3/4,

which obviously may fail to hold. This contradiction shows that this estimator, despite being a function of the sufficient statistic, is not the MLE.
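A small simulation sketch of parts (a) and (b); θ, n, and the number of replications are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 0.0, 10, 10_000        # arbitrary illustration values
violations = 0
for _ in range(reps):
    x = rng.uniform(theta - 0.5, theta + 0.5, size=n)
    lo, hi = x.min(), x.max()
    t1 = 0.5 * (lo + hi)                # candidate from part (a)
    t2 = (lo + 2.0 * hi) / 3.0          # candidate from part (b)
    assert hi - 0.5 < t1 < lo + 0.5     # (a) always satisfies (1)
    if not (hi - 0.5 < t2 < lo + 0.5):  # (b) can violate (1)
        violations += 1
print("fraction of samples where the part-(b) candidate violates (1):", violations / reps)
```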

8. Problem 10.74. Here we are exploring the Bayesian approach, where the parameter of interest is considered as a realization of a random variable. For the problem at hand X ~ Binom(n, θ), and θ is a realization (which we do not directly observe) of a beta RV Θ ~ Beta(α, β).

[Please note that here your knowledge of basic/classical distributions becomes absolutely crucial: you cannot solve any problem without knowing formulae for pmf/pdf; so it is time to refresh them.]

In other words, here we are observing a binomial random variable whose parameter (the probability of success) has a beta prior.

To find a Bayesian estimator, we need to find the posterior distribution of the parameter of interest and then calculate its mean. [Please note that your knowledge of the means of classical distributions becomes very handy here: as soon as you recognize the underlying posterior distribution, you can use a formula for calculating its mean.]



Given this information, the posterior distribution of Θ given the observation X is

f^{Θ|X}(θ|x) = f^Θ(θ) f^{X|Θ}(x|θ) / f^X(x) = [Γ(n + α + β) / (Γ(x + α) Γ(n − x + β))] θ^{x+α−1} (1 − θ)^{(n−x+β)−1}.

The algebra leading to the last equality is explained on page 345.

Now you can recognize that the posterior distribution is again Beta(x + α, n − x + β). There are two consequences of this fact. First, by definition, if a prior density and the corresponding posterior density are from the same family of distributions, then the prior is called conjugate. This is the case that Bayesian statisticians like a lot because it methodologically supports the Bayesian approach and also simplifies formulae. Second, we know a formula for the mean of a beta RV, and using it we get the Bayesian estimator

Θ̂_B = E(Θ|X) = (X + α) / [(α + X) + (n − X + β)] = (X + α) / (α + n + β).

Now we actually can consider the exercise at hand. A general remark: the Bayesian estimator is typically a linear combination of the prior mean and the MLE, with weights depending on the variances of these two estimates. In general, as n → ∞, the Bayesian estimator approaches the MLE.

Let us check that this is the case for the problem at hand. Write

Θ̂_B = (X/n) · n/(α + β + n) + (α/(α + β)) · (α + β)/(α + β + n).

Now, if we denote

w := n/(α + β + n),

we get the wished presentation

Θ̂_B = w X̄ + (1 − w) θ0,

where θ0 = E(Θ) = α/(α + β) is the prior mean of Θ.

Now, the problem at hand asks us to work a bit further on the weight. The variance of the beta RV Θ is

Var(Θ) := σ0² = αβ / [(α + β)²(α + β + 1)].

Well, it is plain to see that

θ0(1 − θ0) = αβ / (α + β)².

Then simple algebra yields

σ0² = θ0(1 − θ0) / (α + β + 1),

which in its turn yields

α + β = θ0(1 − θ0)/σ0² − 1.

Using this we get the wished

w = n / [n + θ0(1 − θ0)/σ0² − 1].

Problem is solved.
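A small numerical sketch of this decomposition (the α, β, n, and x values are arbitrary illustration choices):

```python
# Posterior mean for the Beta-Binomial model, computed two ways.
alpha, beta, n, x = 2.0, 3.0, 30, 18        # arbitrary illustration values

post_mean_direct = (x + alpha) / (alpha + beta + n)

theta0 = alpha / (alpha + beta)             # prior mean
w = n / (alpha + beta + n)                  # weight on the MLE x/n
post_mean_weighted = w * (x / n) + (1 - w) * theta0

print(post_mean_direct, post_mean_weighted)  # the two values coincide
```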

9. Problem 10.76. Here X ~ N(µ, σ²) with σ² being known. A sample of size n is given. The parameter of interest is the population mean µ, and a Bayesian approach is considered with the normal prior M ~ N(µ0, σ0²). In other words, the Bayesian approach suggests thinking about the estimated µ as a realization of a random variable M which has a normal distribution with the given mean and variance.

As a result, we know that the Bayesian estimator is the mean of the posterior distribution. The posterior distribution is calculated in Th. 10.6, and it is again normal, N(µ1, σ1²), where

µ1 = (n σ0² X̄ + σ² µ0) / (n σ0² + σ²),    1/σ1² = n/σ² + 1/σ0².

Note that this theorem implies that the normal distribution is the conjugate prior: the prior is normal and the posterior is normal as well.

We can conclude that the Bayesian estimator is

M̂_B = E(M | X̄) = w X̄ + (1 − w) µ0,

that is, the Bayesian estimator is a linear combination of the MLE (here X̄) and the prior mean (the pure Bayesian estimator when no observations are available). Recall that this is a rather typical outcome, and the Bayesian estimator approaches the MLE as n → ∞.

A direct (simple) calculation shows that

w = n/[n + σ²/σ0²].

Problem is solved.
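The direct calculation, written out (a step skipped above): from the formula for µ1,

w = n σ0² / (n σ0² + σ²) = n / (n + σ²/σ0²),   1 − w = σ² / (n σ0² + σ²),

so that indeed µ1 = w X̄ + (1 − w) µ0.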

10. Problem 10.77. Here a Poisson RV X with an unknown intensity λ is observed. The problem is to estimate λ. A Bayesian approach is suggested with the prior distribution for the intensity Λ being Gamma(α, β). In other words, X ~ Poiss(Λ) and Λ ~ Gamma(α, β). To find a Bayesian estimator, we need to evaluate the posterior distribution of Λ given X and then calculate its mean; that mean will be the Bayesian estimator. We do this in two steps.

(a) To find the posterior distribution we begin with the joint pdf

f^{Λ,X}(λ, x) = f^Λ(λ) f^{X|Λ}(x|λ) = [Γ(α) β^α]^{−1} λ^{α−1} e^{−λ/β} · e^{−λ} λ^x [x!]^{−1} I(λ > 0) I(x ∈ {0, 1, ...}).

Then the posterior pdf is

f^{Λ|X}(λ|x) = f^{Λ,X}(λ, x) / f^X(x) = [λ^{(α+x)−1} e^{−λ(1+1/β)} / (Γ(α) β^α f^X(x) x!)] I(λ > 0).   (2)

Now let me explain what smart Bayesian statisticians do. They do not calculate f^X(x) or try to simplify (2); instead they look at (2) as a density in λ and try to guess what family it is from. Here it is plain to realize that the posterior pdf is again Gamma; more exactly, it is Gamma(α + x, β/(1 + β)). Note that the Gamma prior for the Poisson intensity parameter is the conjugate prior because the posterior is from the same family.

As soon as you realize the posterior distribution, you know what the Bayesian estimator is: it is the expected value of this Gamma RV, namely

Λ̂_B = E(Λ|X) = (α + X)[β/(1 + β)] = β(α + X)/(1 + β).

The problem is solved.
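As in the previous two problems, this estimator can also be read as a weighted combination of the single observation X (the MLE based on one observation) and the prior mean E(Λ) = αβ; this remark is not in the text above, but it follows by simple algebra:

β(α + X)/(1 + β) = [β/(1 + β)] X + [1/(1 + β)] αβ.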

11. Problem 10.94. This is a curious problem on application and analysis of the Bayesian approach. It is given that the observation X is a binomial RV Binom(n = 30, θ), and someone believes that the probability of success θ is a realization of a Beta random variable Θ ~ Beta(α, β). Parameters α and β are not given; instead it is given that EΘ = θ0 = .74 and Var(Θ) = σ0² = 3² = 9. [Do you think that this information is enough to find the parameters of the underlying beta distribution? If “yes”, then what are they?]
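For reference, the standard moment-matching step (not worked out above): whenever a prior mean θ0 and prior variance σ0² are consistent with a Beta distribution, the parameters are recovered from

α + β = θ0(1 − θ0)/σ0² − 1,   α = θ0(α + β),   β = (1 − θ0)(α + β).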

Now we are in a position to answer the questions.

(a) Using only the prior information (that is, when no observation is available), the best MSE estimate is the prior mean

Θ̂_prior = EΘ = .74.

(b) Based on the direct information, the MLE and the MME estimators are the same, and they are

Θ̂_MLE = Θ̂_MME = X̄ = X/n = 18/30.

[Please compare the answers in parts (a) and (b). Are they far enough apart?]

(c) The Bayesian estimator with Θ ~ Beta(α, β) is (see p. 345)

Θ̂_B = (X + α)/(α + β + n).

Now, we can either find α and β from the mean and variance information, or use the result of our homework problem 10.74 and get

Θ̂_B = w X̄ + (1 − w) E(Θ),

where

w = n / [n + θ0(1 − θ0)/σ0² − 1] = 30 / [30 + (.74)(.26)/9 − 1].


12. Problem 10.96. Let X be a grade, and assume that X ~ N(µ, σ²) with σ² = (7.4)². Then there is the professor's belief, based on prior knowledge, that the mean M ~ N(µ0 = 65.2, σ0² = (1.5)²). After the exam, X̄ = 72.9 is the observation.

(a) Denote by Z the standard normal random variable. Then z-scoring yields

P(63.0 < M < 68.0) = P( (63.0 − µ0)/σ0 < (M − µ0)/σ0 < (68.0 − µ0)/σ0 )
                   = P( (63 − 65.2)/1.5 < Z < (68 − 65.2)/1.5 ) = P( −2.2/1.5 < Z < 2.8/1.5 ).

Then you use the Table; I skip this step here.

(b) As we know from Theorem 10.6, M | X̄ is normally distributed with

µ1 = (n X̄ σ0² + µ0 σ²)/(n σ0² + σ²),    σ1² = σ² σ0²/(σ² + n σ0²).

Here: n = 40, X̄ = 72.9, σ0² = (1.5)², σ² = (7.4)², µ0 = 65.2. Plug in these numbers and then

P(63 < M < 68 | X̄ = 72.9) = P( (63 − µ1)/σ1 < Z < (68 − µ1)/σ1 ).

Find the numbers and use the Table.
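For readers who prefer to skip the table, a short numerical sketch of both probabilities, using the numbers given in the problem and the Theorem 10.6 formulas above:

```python
from scipy.stats import norm

mu0, sigma0 = 65.2, 1.5          # prior mean and sd of M
sigma, n, xbar = 7.4, 40, 72.9   # data sd, sample size, observed sample mean

# (a) prior probability that 63 < M < 68
p_prior = norm.cdf(68.0, mu0, sigma0) - norm.cdf(63.0, mu0, sigma0)

# (b) posterior parameters, then the posterior probability
mu1 = (n * xbar * sigma0**2 + mu0 * sigma**2) / (n * sigma0**2 + sigma**2)
sigma1 = (sigma**2 * sigma0**2 / (sigma**2 + n * sigma0**2)) ** 0.5
p_post = norm.cdf(68.0, mu1, sigma1) - norm.cdf(63.0, mu1, sigma1)

print("prior P(63 < M < 68):     ", round(p_prior, 4))
print("posterior P(63 < M < 68): ", round(p_post, 4))
```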
