
ENTROPY INFERENCE AND THE JAMES-STEIN ESTIMATOR

2.1 Maximum Likelihood Estimate

The connection between observed counts $y_k$ and frequencies $\theta_k$ is given by the multinomial distribution

$$\text{Prob}(y_1, \ldots, y_p; \theta_1, \ldots, \theta_p) = \frac{n!}{\prod_{k=1}^p y_k!} \prod_{k=1}^p \theta_k^{y_k}. \qquad (3)$$

Note that $\theta_k > 0$ because otherwise the distribution is singular. In contrast, there may be (and often are) zero counts $y_k$. The ML estimator of $\theta_k$ maximizes the right hand side of Equation 3 for fixed $y_k$, leading to the observed frequencies $\hat{\theta}_k^{\text{ML}} = y_k / n$ with variances $\text{Var}(\hat{\theta}_k^{\text{ML}}) = \frac{1}{n} \theta_k (1 - \theta_k)$ and $\text{Bias}(\hat{\theta}_k^{\text{ML}}) = 0$ as $\text{E}(\hat{\theta}_k^{\text{ML}}) = \theta_k$.
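As a minimal sketch (not part of the original paper), the ML frequencies and the corresponding plugin entropy can be computed from a vector of counts as follows; the function names and example counts are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np

def theta_ml(y):
    """ML frequency estimates: theta_k = y_k / n (maximizer of Equation 3)."""
    y = np.asarray(y, dtype=float)
    return y / y.sum()

def entropy_plugin(theta):
    """Plugin entropy -sum_k theta_k log theta_k; zero cells contribute 0."""
    theta = np.asarray(theta, dtype=float)
    nz = theta > 0
    return -np.sum(theta[nz] * np.log(theta[nz]))

y = np.array([4, 2, 3, 0, 1])          # hypothetical counts, n = 10, p = 5
print(entropy_plugin(theta_ml(y)))     # H^ML in nats
```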

2.2 Miller-Madow Estimator

While $\hat{\theta}_k^{\text{ML}}$ is unbiased, the corresponding plugin entropy estimator $\hat{H}^{\text{ML}}$ is not. First order bias correction leads to

$$\hat{H}^{\text{MM}} = \hat{H}^{\text{ML}} + \frac{m_{>0} - 1}{2n},$$

where $m_{>0}$ is the number of cells with $y_k > 0$. This is known as the Miller-Madow estimator (Miller, 1955).
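The following self-contained sketch (an illustration, not the paper's reference implementation) applies this bias correction to a vector of counts; the counts shown are arbitrary.

```python
import numpy as np

def entropy_miller_madow(y):
    """Miller-Madow estimate: plugin entropy plus (m_>0 - 1) / (2n)."""
    y = np.asarray(y, dtype=float)
    n = y.sum()
    theta = y / n                                    # ML frequencies
    nz = theta > 0
    h_ml = -np.sum(theta[nz] * np.log(theta[nz]))    # plugin H^ML
    m_pos = np.count_nonzero(y)                      # m_>0: cells with y_k > 0
    return h_ml + (m_pos - 1) / (2.0 * n)

print(entropy_miller_madow([4, 2, 3, 0, 1]))         # hypothetical counts
```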

2.3 Bayesian Estimators

Bayesian regularization of cell counts may lead to vast improvements over the ML estimator (Agresti and Hitchcock, 2005). Using the Dirichlet distribution with parameters $a_1, a_2, \ldots, a_p$ as prior, the resulting posterior distribution is also Dirichlet with mean

$$\hat{\theta}_k^{\text{Bayes}} = \frac{y_k + a_k}{n + A},$$

where $A = \sum_{k=1}^p a_k$. The flattening constants $a_k$ play the role of pseudo-counts (compare with Equation 2), so that $A$ may be interpreted as the a priori sample size.

Some common choices for $a_k$ are listed in Table 1, along with references to the corresponding plugin entropy estimators,

$$\hat{H}^{\text{Bayes}} = - \sum_{k=1}^p \hat{\theta}_k^{\text{Bayes}} \log(\hat{\theta}_k^{\text{Bayes}}).$$

a_k        Cell frequency prior               Entropy estimator
0          no prior                           maximum likelihood
1/2        Jeffreys prior (Jeffreys, 1946)    Krichevsky and Trofimov (1981)
1          Bayes-Laplace uniform prior        Holste et al. (1998)
1/p        Perks prior (Perks, 1947)          Schürmann and Grassberger (1996)
√n/p       minimax prior (Trybula, 1958)

Table 1: Common choices for the parameters of the Dirichlet prior in the Bayesian estimators of cell frequencies, and corresponding entropy estimators.
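To make the role of the flattening constants concrete, the sketch below (illustrative only, not the paper's implementation) evaluates the Dirichlet-smoothed frequencies and the resulting plugin entropy for the $a_k$ choices of Table 1, using the same hypothetical counts as above.

```python
import numpy as np

def entropy_bayes(y, a):
    """Plugin entropy from Dirichlet-smoothed frequencies (y_k + a_k) / (n + A)."""
    y = np.asarray(y, dtype=float)
    a = np.broadcast_to(np.asarray(a, dtype=float), y.shape)
    theta = (y + a) / (y.sum() + a.sum())
    return -np.sum(theta * np.log(theta))    # all theta_k > 0 when a_k > 0

y = np.array([4, 2, 3, 0, 1])                # hypothetical counts
n, p = y.sum(), y.size
for label, a in [("Jeffreys, a_k = 1/2", 0.5),
                 ("Bayes-Laplace, a_k = 1", 1.0),
                 ("Perks, a_k = 1/p", 1.0 / p),
                 ("minimax, a_k = sqrt(n)/p", np.sqrt(n) / p)]:
    print(label, entropy_bayes(y, a))
```

Larger $a_k$ (equivalently, a larger prior sample size $A$) pulls the estimated frequencies more strongly toward uniformity, which in turn raises the plugin entropy estimate.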

