
by the $P \times 1$ vector
\[
(3.1)\qquad
\eta_n(p) \;=\;
\begin{pmatrix}
\theta[p] + (X[p]'X[p])^{-1}X[p]'X[\neg p]\,\theta[\neg p] \\
(0,\dots,0)'
\end{pmatrix},
\]
with the conventions that $\eta_n(0) = (0,\dots,0)' \in \mathbb{R}^P$ and $\eta_n(P) = \theta$. Furthermore, let $\Phi_{n,p}(t)$, $t \in \mathbb{R}^k$, denote the cdf of $\sqrt{n}\,A(\tilde\theta(p) - \eta_n(p))$; that is, $\Phi_{n,p}(t)$ is the cdf of a centered Gaussian random vector with covariance matrix $\sigma^2 A[p](X[p]'X[p]/n)^{-1}A[p]'$ in case $p > 0$, and the cdf of point-mass at zero in $\mathbb{R}^k$ in case $p = 0$. If $p > 0$ and the matrix $A[p]$ has rank $k$, then $\Phi_{n,p}(t)$ has a density with respect to Lebesgue measure, which we denote by $\phi_{n,p}(t)$. We note that $\eta_n(p)$ depends on $\theta$ and that $\Phi_{n,p}(t)$ depends on $\sigma$ (in case $p > 0$), although these dependencies are not shown explicitly in the notation.
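As a purely illustrative aside, (3.1) can be computed with a few lines of NumPy, taking $X[p]$ to be the first $p$ columns of the $n \times P$ design matrix $X$; the function name and interface below are ours, a minimal sketch rather than anything prescribed by the paper.

import numpy as np

def eta_n(X, theta, p):
    # Sketch of (3.1): the first p coordinates are
    # theta[p] + (X[p]'X[p])^{-1} X[p]'X[~p] theta[~p]; the remaining P - p are zero.
    P = X.shape[1]
    eta = np.zeros(P)
    if p == 0:
        return eta                           # convention: eta_n(0) = (0,...,0)'
    if p == P:
        return np.array(theta, dtype=float)  # convention: eta_n(P) = theta
    Xp, Xrest = X[:, :p], X[:, p:]
    bias = np.linalg.solve(Xp.T @ Xp, Xp.T @ Xrest @ theta[p:])
    eta[:p] = theta[:p] + bias
    return eta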

For $p > 0$, the conditional distribution of $\sqrt{n}\,\tilde\theta_p(p)$ given $\sqrt{n}\,A(\tilde\theta(p) - \eta_n(p)) = z$ is a Gaussian distribution with mean $\sqrt{n}\,\eta_{n,p}(p) + b_{n,p}z$ and variance $\sigma^2\zeta^2_{n,p}$, where
\[
(3.2)\qquad b_{n,p} \;=\; C_n^{(p)\prime}\bigl(A[p](X[p]'X[p]/n)^{-1}A[p]'\bigr)^{-}, \quad\text{and}
\]
\[
(3.3)\qquad \zeta^2_{n,p} \;=\; \xi^2_{n,p} - b_{n,p}\,C_n^{(p)}.
\]
In the displays above, $C_n^{(p)}$ stands for $A[p](X[p]'X[p]/n)^{-1}e_p$, with $e_p$ denoting the $p$-th standard basis vector in $\mathbb{R}^p$, and $(A[p](X[p]'X[p]/n)^{-1}A[p]')^{-}$ denotes a generalized inverse of the matrix indicated (cf. Note 3(v) in Section 8a.2 of Rao [7]). Note that, in general, the quantity $b_{n,p}z$ depends on the choice of generalized inverse in (3.2); however, for $z$ in the column space of $A[p]$, $b_{n,p}z$ is invariant under the choice of inverse; cf. Lemma A.2 in Leeb [1]. Since $\sqrt{n}\,A(\tilde\theta(p) - \eta_n(p))$ lies in the column space of $A[p]$, the conditional distribution of $\sqrt{n}\,\tilde\theta_p(p)$ given $\sqrt{n}\,A(\tilde\theta(p) - \eta_n(p)) = z$ is thus well-defined by the above. Observe that the vector of covariances between $A\tilde\theta(p)$ and $\tilde\theta_p(p)$ is given by $\sigma^2 n^{-1}C_n^{(p)}$. In particular, $A\tilde\theta(p)$ and $\tilde\theta_p(p)$ are uncorrelated if and only if $\zeta^2_{n,p} = \xi^2_{n,p}$ (or, equivalently, if and only if $b_{n,p}z = 0$ for all $z$ in the column space of $A[p]$); again, see Lemma A.2 in Leeb [1].
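The quantities in (3.2) and (3.3) are straightforward to evaluate numerically. The sketch below uses the Moore-Penrose pseudoinverse as the generalized inverse (any admissible choice gives the same $b_{n,p}z$ for $z$ in the column space of $A[p]$), and it assumes that $\xi^2_{n,p}$ is the $(p,p)$-th element of $(X[p]'X[p]/n)^{-1}$, following the notation introduced earlier in the paper; that identification, like the function itself, is our assumption for illustration only.

import numpy as np

def conditional_moments(X, A, p):
    # Sketch of (3.2)-(3.3). X is n x P, A is k x P; X[p], A[p] are their
    # first p columns. Returns C_n^(p), b_{n,p}, zeta^2_{n,p}.
    n = X.shape[0]
    Xp, Ap = X[:, :p], A[:, :p]
    Q_inv = np.linalg.inv(Xp.T @ Xp / n)        # (X[p]'X[p]/n)^{-1}
    e_p = np.zeros(p)
    e_p[-1] = 1.0                               # p-th standard basis vector in R^p
    C = Ap @ Q_inv @ e_p                        # C_n^(p)
    b = C @ np.linalg.pinv(Ap @ Q_inv @ Ap.T)   # b_{n,p} (length-k array for the 1 x k row vector)
    xi2 = Q_inv[-1, -1]                         # xi^2_{n,p} (assumed definition)
    zeta2 = xi2 - b @ C                         # zeta^2_{n,p}
    return C, b, zeta2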

Finally, for $M$ denoting a univariate Gaussian random variable with zero mean and variance $s^2 \ge 0$, we abbreviate the probability $P(|M - a| < b)$ by $\Delta_s(a, b)$, $a \in \mathbb{R}\cup\{-\infty,\infty\}$, $b \in \mathbb{R}$. Note that $\Delta_s(\cdot,\cdot)$ is symmetric around zero in its first argument, and that $\Delta_s(-\infty, b) = \Delta_s(\infty, b) = 0$ holds. In case $s = 0$, $M$ is to be interpreted as being equal to zero, so that $\Delta_0(a, b)$ equals one if $|a| < b$ and zero otherwise; i.e., $\Delta_0(a, b)$ reduces to an indicator function.
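For $s > 0$, finite $a$, and $b > 0$, this amounts to $\Delta_s(a,b) = \Phi((a+b)/s) - \Phi((a-b)/s)$ with $\Phi$ the standard normal cdf; a small illustrative sketch (using SciPy's normal cdf, names ours):

import numpy as np
from scipy.stats import norm

def Delta(s, a, b):
    # Delta_s(a, b) = P(|M - a| < b) for M ~ N(0, s^2).
    if b <= 0 or np.isinf(a):
        return 0.0                          # empty event, or a = +/- infinity
    if s == 0.0:
        return 1.0 if abs(a) < b else 0.0   # M is identically zero
    return norm.cdf((a + b) / s) - norm.cdf((a - b) / s)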

3.1. The known-variance case<br />

The cdf $G^*_{n,\theta,\sigma}(t)$ can be expanded as a weighted sum of conditional cdfs, conditional on the outcome of the model selection step, where the weights are given by the corresponding model selection probabilities. To this end, let $G^*_{n,\theta,\sigma}(t\,|\,p)$ denote the conditional cdf of $\sqrt{n}\,A(\tilde\theta^* - \theta)$ given that $\hat p^*$ equals $p$, for $O \le p \le P$; that is, $G^*_{n,\theta,\sigma}(t\,|\,p) = P_{n,\theta,\sigma}\bigl(\sqrt{n}\,A(\tilde\theta^* - \theta) \le t \,\big|\, \hat p^* = p\bigr)$, with $t \in \mathbb{R}^k$. Moreover, let $\pi^*_{n,\theta,\sigma}(p) = P_{n,\theta,\sigma}(\hat p^* = p)$ denote the corresponding model selection probability. Then the unconditional cdf $G^*_{n,\theta,\sigma}(t)$ can be written as
\[
(3.4)\qquad G^*_{n,\theta,\sigma}(t) \;=\; \sum_{p=O}^{P} G^*_{n,\theta,\sigma}(t\,|\,p)\,\pi^*_{n,\theta,\sigma}(p).
\]
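Expansion (3.4) is the law of total probability applied to the outcomes of the model selector. The following Monte Carlo sketch illustrates the identity on simulated replications; the array names and interface are ours, and the selection procedure $\hat p^*$ (defined earlier in the paper) is simply taken as given through the `selected` argument.

import numpy as np

def decompose_cdf(draws, selected, t, orders):
    # draws: one row per replication of sqrt(n) A (theta_tilde* - theta), shape (R, k).
    # selected: the selected model order p_hat* in each replication, shape (R,).
    # orders: the candidate orders O, ..., P (must cover every value in `selected`).
    below = np.all(draws <= t, axis=1)      # event {sqrt(n) A(...) <= t}, coordinatewise
    unconditional = below.mean()            # empirical G*(t)
    weighted_sum = 0.0
    for p in orders:
        mask = (selected == p)
        pi_p = mask.mean()                  # empirical selection probability pi*(p)
        if pi_p > 0:
            G_p = below[mask].mean()        # empirical conditional cdf G*(t | p)
            weighted_sum += G_p * pi_p
    # Law of total probability: the two sides of (3.4) agree up to floating-point error.
    assert np.isclose(unconditional, weighted_sum)
    return unconditional, weighted_sum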
