Entropy Inference and the James-Stein Estimator, with Application to ...

HAUSSER AND STRIMMER

that $p$ is fixed and known. In this setting, the Shannon entropy in natural units is given by^1

$$H = -\sum_{k=1}^{p} \theta_k \log(\theta_k). \qquad (1)$$

In practice, the underlying probability mass function is unknown; hence $H$ and the $\theta_k$ need to be estimated from the observed cell counts $y_k \geq 0$.

A particularly simple and widely used estimator of entropy is the maximum likelihood (ML) estimator

$$\hat{H}^{\text{ML}} = -\sum_{k=1}^{p} \hat{\theta}_k^{\text{ML}} \log(\hat{\theta}_k^{\text{ML}}),$$

constructed by plugging the ML frequency estimates

$$\hat{\theta}_k^{\text{ML}} = \frac{y_k}{n} \qquad (2)$$

into Equation 1, with $n = \sum_{k=1}^{p} y_k$ being the total number of counts.
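For concreteness, the plug-in ML estimator of Equations 1 and 2 can be sketched in a few lines of Python (the function name `entropy_ml` is ours, not from the paper; the convention $0\log 0 = 0$ is applied by skipping empty cells):

```python
import numpy as np

def entropy_ml(y):
    """Plug-in (maximum likelihood) entropy estimate in nats.

    y : sequence of non-negative cell counts y_k.
    """
    y = np.asarray(y, dtype=float)
    n = y.sum()                 # total number of counts
    theta_ml = y / n            # ML frequency estimates (Equation 2)
    nz = theta_ml > 0           # implement the convention 0 * log(0) = 0
    return -np.sum(theta_ml[nz] * np.log(theta_ml[nz]))
```

For example, `entropy_ml([5, 5])` returns $\log 2 \approx 0.693$, the entropy of a fair coin.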

In situations with $n \gg p$, that is, when the dimension is low and there are many observations, it is easy to infer entropy reliably, and it is well known that in this case the ML estimator is optimal. However, in high-dimensional problems with $n \ll p$ it becomes extremely challenging to estimate the entropy. Specifically, in the "small $n$, large $p$" regime the ML estimator performs very poorly and severely underestimates the true entropy.
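The severity of this downward bias is easy to see in a small simulation (an illustrative sketch of ours, not an experiment from the paper): with a uniform distribution over $p = 1000$ cells but only $n = 100$ observations, at most 100 cells can be occupied, so the plug-in estimate can never exceed $\log 100 \approx 4.61$, while the true entropy is $\log 1000 \approx 6.91$.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 1000                        # number of cells (high-dimensional)
n = 100                         # sample size, n << p
theta = np.full(p, 1.0 / p)     # uniform truth, so H = log(p)
true_H = np.log(p)

estimates = []
for _ in range(200):
    y = rng.multinomial(n, theta)      # simulated cell counts
    theta_ml = y / n                   # ML frequencies (Equation 2)
    nz = theta_ml > 0                  # convention 0 * log(0) = 0
    estimates.append(-np.sum(theta_ml[nz] * np.log(theta_ml[nz])))

print(f"true entropy: {true_H:.3f}")
print(f"mean ML estimate: {np.mean(estimates):.3f}")
```

The mean ML estimate falls well below the true value, illustrating the systematic underestimation in the "small $n$, large $p$" regime.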

While entropy estimation has a long history, tracing back more than 50 years, it is only recently that the specific issues arising in high-dimensional, undersampled data sets have attracted attention. This has led to two recent innovations, namely the NSB algorithm (Nemenman et al., 2002) and the Chao-Shen estimator (Chao and Shen, 2003), both of which are now widely considered benchmarks for the small-sample entropy estimation problem (Vu et al., 2007).

Here, we introduce a novel and highly efficient small-sample entropy estimator based on James-Stein shrinkage (Gruber, 1998). Our method is fully analytic and hence computationally inexpensive. Moreover, our procedure simultaneously provides estimates of the entropy and of the cell frequencies suitable for plugging into the Shannon entropy formula (Equation 1). Thus, in comparison the estimator we propose is simpler, very efficient, and at the same time more versatile than currently available entropy estimators.

2. Conventional Methods for Estimating Entropy

Entropy estimators can be divided into two groups: (i) methods that rely on estimates of the cell frequencies, and (ii) estimators that directly infer the entropy without estimating a compatible set of $\theta_k$. Most methods discussed below fall into the first group, except for the Miller-Madow and NSB approaches.

1. In this paper we use the following conventions: log denotes the natural logarithm (not base 2 or base 10), and we define $0\log 0 = 0$.

