Entropy Inference and the James-Stein Estimator, with Application to ...
HAUSSER AND STRIMMER
Acknowledgments

This work was partially supported by an Emmy Noether grant of the Deutsche Forschungsgemeinschaft (to K.S.). We thank the anonymous referees and the editor for very helpful comments.
Appendix A. Recipe For Constructing James-Stein-type Shrinkage Estimators
The original James-Stein estimator (James and Stein, 1961) was proposed to estimate the mean of a multivariate normal distribution from a single (n = 1!) vector observation. Specifically, if x is a sample from $N_p(\mu, I)$ then the James-Stein estimator is given by

$$\hat{\mu}_k^{\text{JS}} = \left(1 - \frac{p-2}{\sum_{k=1}^{p} x_k^2}\right) x_k.$$

Intriguingly, this estimator outperforms the maximum likelihood estimator $\hat{\mu}_k^{\text{ML}} = x_k$ in terms of mean squared error if the dimension is $p \geq 3$. Hence, the James-Stein estimator dominates the maximum likelihood estimator.
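This domination can be checked numerically. The following sketch (an illustration, not part of the paper; the dimension, true mean, and replication count are arbitrary choices) compares the Monte Carlo risk of the two estimators:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 10                        # dimension; domination requires p >= 3
mu = rng.normal(size=p)       # an arbitrary true mean for N_p(mu, I)

def james_stein(x):
    """James-Stein estimate of mu from a single observation x ~ N_p(mu, I)."""
    p = len(x)
    return (1 - (p - 2) / np.sum(x**2)) * x

# Monte Carlo estimate of the mean squared error of both estimators
n_rep = 20000
mse_ml, mse_js = 0.0, 0.0
for _ in range(n_rep):
    x = mu + rng.normal(size=p)               # one draw from N_p(mu, I)
    mse_ml += np.sum((x - mu)**2)             # ML estimate is x itself
    mse_js += np.sum((james_stein(x) - mu)**2)

print(mse_js / n_rep, "<", mse_ml / n_rep)    # James-Stein risk is smaller
```

For a single draw the James-Stein estimate simply rescales x towards the origin; the gain in risk grows with the dimension p.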
The above estimator can be slightly generalized by shrinking towards the component average $\bar{x} = \frac{1}{p}\sum_{k=1}^{p} x_k$ rather than towards zero, resulting in

$$\hat{\mu}_k^{\text{Shrink}} = \hat{\lambda}^\star \bar{x} + (1 - \hat{\lambda}^\star) x_k$$

with estimated shrinkage intensity

$$\hat{\lambda}^\star = \frac{p-3}{\sum_{k=1}^{p} (x_k - \bar{x})^2}.$$
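A minimal sketch of this variant (clipping the estimated intensity to [0, 1] is a safeguard added here for illustration, not stated in the formulas above):

```python
import numpy as np

def js_shrink_to_mean(x):
    """James-Stein estimate of mu from a single draw x ~ N_p(mu, I),
    shrinking towards the component average rather than towards zero."""
    x = np.asarray(x, dtype=float)
    p = x.size
    xbar = x.mean()                          # component average
    lam = (p - 3) / np.sum((x - xbar)**2)    # estimated shrinkage intensity
    lam = min(max(lam, 0.0), 1.0)            # clip to [0, 1] (illustrative safeguard)
    return lam * xbar + (1 - lam) * x

est = js_shrink_to_mean([1.0, 2.0, 3.0, 4.0, 5.0])
```

Note that the numerator p − 3 is non-positive for p ≤ 3, so this variant is only sensible for p ≥ 4.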
The James-Stein shrinkage principle is very general and can be put to use in many other high-dimensional settings. In the following we summarize a simple recipe for constructing James-Stein-type shrinkage estimators along the lines of Schäfer and Strimmer (2005b) and Opgen-Rhein and Strimmer (2007a).
In short, there are two key ideas at work in James-Stein shrinkage:

i) regularization of a high-dimensional estimator $\hat{\theta}$ by linear combination with a lower-dimensional target estimate $\hat{\theta}^{\text{Target}}$, and

ii) adaptive estimation of the shrinkage parameter $\lambda$ from the data by quadratic risk minimization.
A general form of a James-Stein-type shrinkage estimator is given by

$$\hat{\theta}^{\text{Shrink}} = \lambda \hat{\theta}^{\text{Target}} + (1 - \lambda)\hat{\theta}. \qquad (9)$$
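Equation (9) is simply a convex combination of the two estimates. As a minimal sketch (the cell-frequency example and the fixed value λ = 0.2 are illustrative choices; the data-driven estimate of λ is the subject of the recipe that follows):

```python
import numpy as np

def shrink_estimate(theta, theta_target, lam):
    """James-Stein-type shrinkage, Eq. (9): convex combination of a
    high-dimensional estimate theta and a low-dimensional target."""
    return lam * np.asarray(theta_target) + (1 - lam) * np.asarray(theta)

# Illustration: shrink empirical cell frequencies towards a uniform target.
counts = np.array([8, 1, 1, 0])
theta_ml = counts / counts.sum()                 # maximum likelihood frequencies
target = np.full(counts.size, 1 / counts.size)   # uniform target estimate
theta_shrink = shrink_estimate(theta_ml, target, lam=0.2)
```

Because the combination is convex, the shrinkage estimate inherits any linear constraint shared by both inputs; here both sum to one, so the shrunken frequencies do as well.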
Note that $\hat{\theta}$ and $\hat{\theta}^{\text{Target}}$ are two very different estimators (for the same underlying model!). As a high-dimensional estimate with many independent components, $\hat{\theta}$ has low bias but, for small samples, a potentially large variance. In contrast, the target estimate $\hat{\theta}^{\text{Target}}$ is low-dimensional and therefore generally less variable than $\hat{\theta}$, but at the same time also more biased. The James-Stein estimate