The bivariate gamma distribution with Laplace transform $(1+as+bt+cst)^{-q}$: history, characterisations, estimation.

Gérard Letac, Université Paul Sabatier, Toulouse, France.

Conference at York, 11/30/07.

* with Jean-Yves Tourneret and Florent Chatelain, Irit-Enseeiht, Toulouse.



Vocabulary. While the names of distributions in $\mathbb{R}$ are generally unambiguous, in the jungle of distributions in $\mathbb{R}^n$ almost nothing is codified outside of the Wishart and Gaussian cases. The scenario is usually as follows: choose a one-dimensional thingy type (quite often an exponential dispersion model, namely a natural exponential family and all its real powers of convolution), such as gamma or negative binomial; then any law in $\mathbb{R}^n$ whose margins are of thingy type is said to be multidimensional thingy. Although the study of all distributions with given marginals belongs rather to the nonparametric domain, in practice each author who isolates some parametric family will declare that he has THE multidimensional thingy family.


This is what we are also doing today by calling bivariate gamma any distribution $\nu(dx,dy)$ on $(0,\infty)^2$ such that there exist four parameters $a,b,c,q$ with $q>0$ satisfying
$$\int_0^\infty\!\!\int_0^\infty e^{-sx-ty}\,\nu(dx,dy)=\frac{1}{(1+as+bt+cst)^q}.$$
(To be fair, it is called the Kibble and Moran distribution by Johnson and Kotz.) When $(X,Y)\sim\nu$, setting $t=0$ shows that $X$ is gamma distributed with shape parameter $q$ and scale parameter $a$:
$$\gamma_{q,a}(dx)=\frac{1}{\Gamma(q)}\,e^{-x/a}\left(\frac{x}{a}\right)^{q-1}\mathbf 1_{(0,\infty)}(x)\,\frac{dx}{a}.$$
Same for $Y$. This shows that $a$ and $b$ are $>0$: if $a$ or $b$ were nonpositive, the distribution $\nu$ would be concentrated on another quadrant, or on an axis. No generality is lost by postulating $a,b>0$ and therefore $\nu(dx,dy)$ concentrated on $(0,\infty)^2$. The hypothesis $q>0$ is more important: $q<0$ would lead us to distributions concentrated on a finite number of points, with binomial margins.


Constraints for parameters, and density. We are going to see later on that if $a,b,q>0$, the distribution $\nu$ does exist if and only if $0\le c\le ab$. If $c>2ab$ the function $(s,t)\mapsto -q\log(1+as+bt+cst)$ is not convex inside the suitable branch of hyperbola, but the proof that $ab<c\le 2ab$ is impossible is subtler. Note that $0\le c\le ab$ implies that $\nu$ is infinitely divisible (as observed by Vere-Jones (1967)). Note also that if $(X,Y)$ is bivariate gamma with parameters $a,b,c,q$ then $(X/a,Y/b)$ is bivariate gamma with parameters $1,1,c/ab,q$. If $c=ab$, obviously $X$ and $Y$ are independent.

The case $a=b=c$ (and thus $c\ge 1$, since $c\le ab$ forces $c\le c^2$) is called the standard case with parameter $r\in[0,1)$, where $c=\frac{1}{1-r}$. It seems to be more interesting than the $a=b=1$ case (although the $a=b=1$ case is natural in the context of Lancaster probabilities that we shall consider later on).

If $c=0$ then $(X,Y)\sim(aZ,bZ)$ with $Z\sim\gamma_{q,1}$. If $0<c\le ab$ the distribution $\nu$ has an unfamiliar density. To compute it, let us use natural exponential families (NEF).


The NEF $F(\mu_{\rho,q})$. Let $\rho>0$, let $q>0$ and consider the measure $\mu_{\rho,q}(dx,dy)$ with density
$$\frac{(xy)^{q-1}}{\Gamma(q)}\,f_q(\rho xy)$$
on $(0,\infty)^2$, where $f_q(z)=\sum_{n=0}^\infty \frac{z^n}{n!\,\Gamma(q+n)}$. Its Laplace transform
$$L_{\mu_{\rho,q}}(\theta_1,\theta_2)=\int\!\!\int e^{\theta_1 x+\theta_2 y}\,\mu_{\rho,q}(dx,dy)$$
is defined on
$$\Theta(\mu_{\rho,q})=\{(\theta_1,\theta_2)\,;\,\theta_1<0,\ \theta_2<0,\ \theta_1\theta_2-\rho>0\}$$
and is equal to $L_{\mu_{\rho,q}}(\theta_1,\theta_2)=(\theta_1\theta_2-\rho)^{-q}$ on this set. Consider the NEF $F(\mu_{\rho,q})$ generated by $\mu_{\rho,q}$.


An element $P(\theta_1,\theta_2)$ of this NEF has the form
$$P(\theta_1,\theta_2)(dx,dy)=(\theta_1\theta_2-\rho)^q\,e^{\theta_1x+\theta_2y}\,\frac{(xy)^{q-1}}{\Gamma(q)}\,f_q(\rho xy)\,\mathbf 1_{(0,\infty)^2}(x,y)\,dxdy.$$
Thus
$$\int\!\!\int e^{-sx-ty}\,P(\theta_1,\theta_2)(dx,dy)=\frac{L(\theta_1-s,\theta_2-t)}{L(\theta_1,\theta_2)}=\frac{(\theta_1\theta_2-\rho)^q}{((\theta_1-s)(\theta_2-t)-\rho)^q}.$$


Introduce $r=\frac{\rho}{\theta_1\theta_2}$. We are going to see that $r$ is the correlation coefficient of $P(\theta_1,\theta_2)$. Obviously the parameters
$$a=\frac{-\theta_2}{\theta_1\theta_2-\rho}=-\frac{1}{\theta_1(1-r)},\qquad b=\frac{-\theta_1}{\theta_1\theta_2-\rho}=-\frac{1}{\theta_2(1-r)},\qquad c=\frac{1}{\theta_1\theta_2-\rho}=\frac{1}{\theta_1\theta_2(1-r)}$$
seem more attractive for $P(\theta_1,\theta_2)$, since
$$\int\!\!\int e^{-sx-ty}\,P(\theta_1,\theta_2)(dx,dy)=\frac{1}{(1+as+bt+cst)^q},$$
with the conversion table
$$\theta_1=-\frac{b}{c},\qquad \theta_2=-\frac{a}{c},\qquad r=\frac{ab-c}{ab},\qquad \rho=\frac{ab-c}{c^2}.$$
In the standard case $a=b=c=\frac{1}{1-r}$ we have $\theta_1=\theta_2=-1$ and $\rho=r$. In the 'Lancaster' case $a=b=1$ and $c=1-r$ we have $\theta_1=\theta_2=-\frac{1}{1-r}$ and $\rho=\frac{r}{(1-r)^2}$.
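The conversion table can be checked numerically. This is a small sketch (the values of $a,b,c,q$ and of the test points are illustrative, any $0<c\le ab$, $q>0$ would do): it maps the classical parameters to $(\theta_1,\theta_2,\rho)$ and confirms that the two expressions of the Laplace transform agree.

```python
def lt_classical(s, t, a, b, c, q):
    # Laplace transform in the classical parameters (a, b, c, q)
    return (1 + a*s + b*t + c*s*t)**(-q)

def lt_canonical(s, t, th1, th2, rho, q):
    # the same transform written through the NEF element P(theta_1, theta_2):
    # L(th1 - s, th2 - t) / L(th1, th2), with L(th1, th2) = (th1*th2 - rho)^(-q)
    return ((th1*th2 - rho)/((th1 - s)*(th2 - t) - rho))**q

a, b, c, q = 2.0, 3.0, 4.0, 1.5          # admissible: 0 < c <= ab
th1, th2 = -b/c, -a/c                    # conversion table
rho = (a*b - c)/c**2
r = (a*b - c)/(a*b)

assert abs(rho - r*th1*th2) < 1e-12      # rho = r * theta_1 * theta_2
for s, t in [(0.1, 0.2), (1.0, 0.5), (2.5, 3.0)]:
    assert abs(lt_classical(s, t, a, b, c, q)
               - lt_canonical(s, t, th1, th2, rho, q)) < 1e-12
```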


The classical objects of the NEF $F(\mu_{\rho,q})$. We have seen that $\Theta(\mu_{\rho,q})$ is the interior of the hyperbola branch $\{\theta_1<0,\ \theta_2<0\,;\,\theta_1\theta_2-\rho>0\}$. The cumulant function is
$$k(\theta_1,\theta_2)=-q\log(\theta_1\theta_2-\rho).$$
Its differential is
$$k'(\theta_1,\theta_2)=\frac{-q}{\theta_1\theta_2-\rho}\,(\theta_2,\theta_1)=(qa,qb)=(m_1,m_2).$$
The domain of the means is $M(F)=(0,\infty)^2$. This gives $r$ as a function of $m_1,m_2$ and $\rho$:
$$1-r=\frac{q^2}{2\rho m_1m_2}\left(\sqrt{1+\frac{4\rho m_1m_2}{q^2}}-1\right),$$
or in a simpler form:
$$\frac{1}{1-r}=\frac12+\frac12\sqrt{1+\frac{4\rho m_1m_2}{q^2}}.$$
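The simpler form can be verified numerically: starting from arbitrary admissible canonical parameters (the values below are hypothetical), compute $r$, the mean parameters $m_1=qa$, $m_2=qb$, and check the identity.

```python
import math

# arbitrary admissible canonical parameters: theta_i < 0, theta_1*theta_2 > rho > 0
th1, th2, rho, q = -1.2, -0.8, 0.5, 2.0
r = rho/(th1*th2)
a = -th2/(th1*th2 - rho)                 # classical parameters
b = -th1/(th1*th2 - rho)
m1, m2 = q*a, q*b                        # mean parameters

lhs = 1.0/(1.0 - r)
rhs = 0.5 + 0.5*math.sqrt(1.0 + 4.0*rho*m1*m2/q**2)
assert abs(lhs - rhs) < 1e-12            # 1/(1-r) = 1/2 + (1/2) sqrt(1 + 4 rho m1 m2 / q^2)
```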


Here is the variance function, expressed either in the canonical parameters, or in the mean parameters, or in the classical parameters:
$$k''(\theta_1,\theta_2)=\frac{q}{(\theta_1\theta_2-\rho)^2}\begin{bmatrix}\theta_2^2&\rho\\ \rho&\theta_1^2\end{bmatrix}=\frac1q\begin{bmatrix}m_1^2&rm_1m_2\\ rm_1m_2&m_2^2\end{bmatrix}=q\begin{bmatrix}a^2&ab-c\\ ab-c&b^2\end{bmatrix}$$
(do not forget that $r$ is the complicated function of $m_1$ and $m_2$ given above). One characterisation of this bivariate gamma NEF is due to Barlev et al. (1994). It says that if a NEF concentrated on $(0,\infty)^2$ has a variance function of the form
$$\frac1q\begin{bmatrix}m_1^2&*\\ *&m_2^2\end{bmatrix},$$
then it is a bivariate gamma NEF.


How to get the density quickly. Start from $(1+as+bt+cst)^{-q}=E(e^{-sX-tY})$ with $a,b,c>0$.

1. We choose $\alpha$ and $\beta>0$ such that $X_1=\alpha X$ and $Y_1=\beta Y$ satisfy
$$(1+c_1s+c_1t+c_1st)^{-q}=E(e^{-sX_1-tY_1})$$
for some $c_1=\alpha a=\beta b=c\alpha\beta$. Thus $(X_1,Y_1)$ is standard with parameters $r$ and $q$. We get
$$\alpha=\frac{b}{c},\qquad \beta=\frac{a}{c},\qquad c_1=\frac{ab}{c}=\frac{1}{1-r}.$$

2. We now compute the density for the standard case.


$$\begin{aligned}
E(e^{-sX_1-tY_1})&=\frac{(1-r)^q}{(1-r+s+t+st)^q}\\
&=\frac{(1-r)^q}{((1+s)(1+t)-r)^q}\\
&=\frac{(1-r)^q}{\Bigl((1+s)(1+t)\bigl(1-\frac{r}{(1+s)(1+t)}\bigr)\Bigr)^q}\\
&=\frac{(1-r)^q}{(1+s)^q(1+t)^q}\sum_{n=0}^\infty\frac{(q)_n}{n!}\,\frac{r^n}{(1+s)^n(1+t)^n}\\
&=\frac{(1-r)^q}{\Gamma(q)}\sum_{n=0}^\infty\frac{\Gamma(q+n)}{n!}\,\frac{r^n}{(1+s)^{q+n}(1+t)^{q+n}}\\
&=\frac{(1-r)^q}{\Gamma(q)}\sum_{n=0}^\infty\frac{r^n}{n!\,\Gamma(q+n)}\int_0^\infty\!\!\int_0^\infty e^{-(1+s)x_1-(1+t)y_1}(x_1y_1)^{q+n-1}\,dx_1dy_1\\
&=\frac{(1-r)^q}{\Gamma(q)}\int_0^\infty\!\!\int_0^\infty e^{-sx_1-ty_1}\,(x_1y_1)^{q-1}f_q(rx_1y_1)\,e^{-x_1-y_1}\,dx_1dy_1.
\end{aligned}$$


Mixing. In other terms, if
$$N,\ X_0,\dots,X_n,\dots,\ Y_0,\dots,Y_n,\dots$$
are independent and such that $X_0\sim Y_0\sim\gamma_{q,1}$, $X_i\sim Y_i\sim\gamma_{1,1}$ for $i\ge1$, and if $N$ follows a negative binomial distribution,
$$\Pr(N=n)=\frac{(q)_n}{n!}\,(1-r)^q\,r^n,$$
denote
$$U_n=X_0+\dots+X_n,\qquad V_n=Y_0+\dots+Y_n.$$
Under these circumstances $(U_N,V_N)$ follows a standard bivariate gamma distribution with parameters $r$ and $q$: thus this distribution is a mixture of gamma distributions.
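The mixture representation gives an immediate simulation recipe; here is a sketch (parameter values, seed and sample size are illustrative). Conditionally on $N=n$, $U_N$ and $V_N$ are independent $\gamma_{q+n,1}$, and the correlation of the resulting pair should be $r$, with $E(X)=q/(1-r)$.

```python
import numpy as np

rng = np.random.default_rng(0)
q, r, size = 2.0, 0.5, 400_000

# N ~ negative binomial: Pr(N = n) = (q)_n / n! * (1-r)^q * r^n
N = rng.negative_binomial(q, 1.0 - r, size)

# given N = n, U = X_0 + ... + X_n ~ Gamma(q + n, 1), likewise V,
# with U and V independent conditionally on N
U = rng.gamma(q + N)
V = rng.gamma(q + N)

# standard bivariate gamma predictions:
# E(X) = q/(1-r), corr(X, Y) = r, E(XY) = q^2 (1 + r/q) / (1-r)^2
assert abs(U.mean() - q/(1 - r)) < 0.05
assert abs(np.corrcoef(U, V)[0, 1] - r) < 0.02
assert abs((U*V).mean() - q**2*(1 + r/q)/(1 - r)**2) < 0.3
```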


Why $0\le c\le ab$? (and not only $0\le c\le 2ab$, for which $\log(1+as+bt+cst)$ is concave). This is equivalent to asking: why $r\in[0,1]$ and not only $r\in[-1,1]$? The preceding calculation contains the answer: if $r<0$ the density would be
$$\frac{(1-r)^q}{\Gamma(q)}\,(x_1y_1)^{q-1}f_q(rx_1y_1)\,e^{-x_1-y_1}.$$
But
$$\left(\frac z2\right)^{q-1}f_q\!\left(\frac{z^2}{4}\right)=I_{q-1}(z).$$
Since the classical Bessel function $I_{q-1}(z)$ has its zeros on the imaginary axis, and since they are all simple and infinite in number, this implies that $f_q$ is not positive on $(-\infty,0)$.


Moments of the standard bivariate gamma. Recall that the distribution of $(X,Y)$ such that
$$E(e^{-sX-tY})=(1-r)^q(1-r+s+t+st)^{-q}$$
has been called the standard bivariate gamma distribution with correlation $r$ and shape parameter $q$, and therefore has density
$$(1-r)^q\,\frac{(xy)^{q-1}}{\Gamma(q)}\,f_q(rxy)\,e^{-x-y}.$$
We easily get from this the moments:
$$E(X^sY^t)=(1-r)^q\,\frac{\Gamma(q+s)\Gamma(q+t)}{\Gamma(q)^2}\,{}_2F_1(q+s,q+t;q;r)=(1-r)^{-s-t}\,\frac{\Gamma(q+s)\Gamma(q+t)}{\Gamma(q)^2}\,{}_2F_1(-s,-t;q;r),$$
and therefore
$$E((XY)^s)\le(1-r)^{-2s}\,\frac{\Gamma(q+s)^2}{\Gamma(q)^2}\,{}_2F_1(-s,-s;q;1)=(1-r)^{-2s}\,\frac{\Gamma(q+2s)}{\Gamma(q)}.$$


For instance:
$$E(X)=E(Y)=\frac{q}{1-r},\qquad E(X^2)=E(Y^2)=\frac{q(q+1)}{(1-r)^2},\qquad \sigma^2(X)=\sigma^2(Y)=\frac{q}{(1-r)^2},$$
$$E(XY)=\frac{q^2}{(1-r)^2}\left(1+\frac rq\right),\qquad E(X^2Y^2)=\frac{q^2(q+1)^2}{(1-r)^4}\left(1+\frac{4r}{q}+\frac{2r^2}{q(q+1)}\right).$$


Joint distribution of $(P,S)=(XY,X+Y)$, standard case. The distribution is
$$\frac{2(1-r)^q\,p^{q-1}}{\Gamma(q)}\,f_q(rp)\,e^{-s}\,\frac{\mathbf 1_{s>2\sqrt p}(p,s)}{\sqrt{s^2-4p}}\,dp\,ds.$$
Writing $h_q(z)=\sum_{n=0}^\infty\frac{(q)_n}{n!\,\Gamma(2q+2n)}z^n$ and imitating the way that the density of $(X,Y)$ has been found, one gets the law of $S$:
$$(1-r)^q\,s^{2q-1}\,h_q(rs^2)\,e^{-s}\,\mathbf 1_{s>0}(s)\,ds;$$
but for the density of $P=XY$ nothing can be obtained from the joint distribution of $(P,S)$ and from the Mellin transform
$$E(P^s)=(1-r)^{-2s}\,\frac{\Gamma(q+s)^2}{\Gamma(q)^2}\,{}_2F_1(-s,-s;q;r).$$


History. These distributions were introduced by Wicksell (1933) and independently by Kibble (1941), and mentioned by Moran (1967), Barndorff-Nielsen (1980), Wang (1982), Seshadri (1988), Barlev and 5 coauthors (1994) and Johnson and Kotz (1972). Angelo Koudou (1995) and Philippe Bernardorff (2003) studied them in their theses, defended at the Université Paul Sabatier at Toulouse.


Bivariate gamma and Wishart distributions. In one dimension, gamma distributions whose shape parameter is a half integer are known as chi squares. If the shape parameter is not a half integer, probabilistic interpretations are rather rare. Here is one, due to Yor (1990):
$$\left(\int_0^\infty e^{-\frac tp+B(t)}\,dt\right)^{-1}\sim\gamma_{p,1}$$
when $B$ is a standard Brownian motion. The most natural generalization of the chi square distribution is the Wishart distribution: if $Z_1,\dots,Z_N$ are i.i.d. $N(0,\Sigma)$ in $\mathbb R^r$ (written as columns), the Wishart distribution is the law of the random matrix
$$W=\frac12\,(Z_1Z_1^T+\dots+Z_NZ_N^T).$$
It satisfies
$$E(e^{-\operatorname{tr}(\theta W)})=\det(I+\Sigma^{-1}\theta)^{-N/2}$$
($\theta$ is a symmetric positive definite matrix).


For $r=2$ write $W=\begin{bmatrix}X&Z\\ Z&Y\end{bmatrix}$. Suppose that $\theta=\operatorname{diag}(s,t)$, denote $\Sigma^{-1}=\begin{bmatrix}a&\gamma\\ \gamma&b\end{bmatrix}$ and denote $c=ab-\gamma^2$. One gets exactly
$$E(e^{-sX-tY})=\frac{1}{(1+as+bt+cst)^{N/2}}.$$
As a result: the diagonal of a Wishart matrix on the $(2,2)$ real symmetric matrices is a bivariate gamma.
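This can be checked by simulation. One normalization caveat, flagged as an assumption: the sketch below samples $Z_i\sim N(0,\Sigma)$ and reads $a,b,\gamma$ directly from the matrix that makes the transform come out as $(1+as+bt+cst)^{-N/2}$ (the slide writes the parameters via $\Sigma^{-1}$; the two conventions differ only in which matrix carries $a,b,\gamma$). The predicted correlation of the diagonal pair is $(ab-c)/(ab)=\gamma^2/(ab)$.

```python
import numpy as np

rng = np.random.default_rng(1)
reps, n = 200_000, 5                     # n plays the role of N on the slide
a, b, gam = 1.0, 2.0, 0.9                # need gam**2 < a*b
c = a*b - gam**2
Sigma = np.array([[a, gam], [gam, b]])

# W = (1/2)(Z_1 Z_1^T + ... + Z_n Z_n^T); keep only the diagonal (X, Y)
Z = rng.multivariate_normal([0.0, 0.0], Sigma, size=(reps, n))
X = 0.5*(Z[:, :, 0]**2).sum(axis=1)
Y = 0.5*(Z[:, :, 1]**2).sum(axis=1)

# bivariate gamma with q = n/2 and parameters (a, b, c) predicts
# corr(X, Y) = (ab - c)/(ab) and E(exp(-sX - tY)) = (1+as+bt+cst)^(-n/2)
s, t = 0.3, 0.2
emp = np.exp(-s*X - t*Y).mean()
theo = (1 + a*s + b*t + c*s*t)**(-n/2)
assert abs(np.corrcoef(X, Y)[0, 1] - (a*b - c)/(a*b)) < 0.02
assert abs(emp - theo) < 0.005
```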


This result can be extended in several directions: by replacing $N/2$ by $q\ge1/2$; by replacing the $(2,2)$ positive definite matrices by the Lorentz cone
$$C_d=\{(x,y,z_1,\dots,z_d)\,;\,x,y>0,\ xy>z_1^2+\dots+z_d^2\}$$
and writing symbolically its elements as $w=\begin{bmatrix}x&z\\ z&y\end{bmatrix}$. The Wishart distributions on $C_d$ of shape parameter $q$ are the elements of the natural exponential family generated by the measure on $C_d$ with density
$$(xy-z_1^2-\dots-z_d^2)^{q-1-\frac d2}$$
if $q>1/2$ (and some singular distribution on $\partial C_d$ if $q=1/2$). If $(X,Y,Z)$ follows such a Wishart law on $C_d$ then $(X,Y)$ is bivariate gamma. The cases of the ordinary, complex and quaternionic Wishart distributions are the cases $d=1,2,4$.

Note that for $0<q<1/2$ the bivariate gamma does exist, but cannot be the marginal distribution of $(X,Y)$ for a Wishart distribution $(X,Y,Z_1,\dots,Z_d)$.


The bivariate gammas are Lancaster probabilities. Suppose that $\alpha(dx)$ and $\beta(dy)$ are distributions on $\mathbb R$ such that $\int e^{-a|x|}\alpha(dx)$ and $\int e^{-a|y|}\beta(dy)$ are finite for some $a>0$. Let $(P_n)$ and $(Q_n)$ be the corresponding sequences of orthonormal polynomials. We say that the distribution $\mu(dx,dy)$ on $\mathbb R^2$ with margins $\alpha$ and $\beta$ is of Lancaster type if either there exists a sequence $(\rho_n)$ of real numbers such that $\mu$ is absolutely continuous with respect to $\alpha(dx)\beta(dy)$ and is
$$\mu(dx,dy)=\left[\sum_{n=0}^\infty\rho_nP_n(x)Q_n(y)\right]\alpha(dx)\beta(dy),$$
or $\mu$ is the weak limit of such distributions. In both cases we have
$$E_\mu(P_n(X)\,|\,Y)=\rho_nQ_n(Y),\qquad E_\mu(Q_n(Y)\,|\,X)=\rho_nP_n(X).$$


D'jachenko (1962) observed that the bivariate gamma is a Lancaster probability with $\rho_n=r^n$, where $r\in[0,1]$ is the correlation coefficient. This is a not quite obvious fact, and relies on a classical formula on Laguerre polynomials. Note that this is not the standard bivariate gamma, but the one with $a=b=1$ and $c=1-r$. To this, add a theorem of Tyan and Thomas which says that in general the $(\rho_n)_{n\ge0}$ appearing in the definition of Lancaster probabilities are sequences of moments, and you get an elegant integral representation:

Proposition. $\mu(dx,dy)$ is a Lancaster probability for the margins $\alpha=\beta=\gamma_{q,1}$ if and only if there exists a probability $\nu(dr)$ on $[0,1]$ such that
$$\int_0^\infty\!\!\int_0^\infty e^{-sx-ty}\,\mu(dx,dy)=\int_0^1\frac{\nu(dr)}{(1+s+t+(1-r)st)^q}.$$
In this case $\int_0^1r^n\,\nu(dr)=\rho_n$.


Other details on Lancaster probabilities can be found in Koudou (1996) and in the paper on Gibbs sampling by Diaconis and two coauthors, to appear in Statistical Science in 2008.


The Wang characterisation (1982). Recall the characterisation by Lukacs (1956) of the ordinary gamma distributions: if $X$ and $Y$ are positive, independent, not constant and such that $Z=X/(X+Y)$ is independent of $X+Y$, then there exist $p,q,c>0$ such that $X\sim\gamma_{p,c}$ and $Y\sim\gamma_{q,c}$. Proof: let $a$ and $b\in(0,1)$ be such that $E(X\,|\,X+Y)=a(X+Y)$ and $E(X^2\,|\,X+Y)=b(X+Y)^2$. Then
$$E\bigl(E(e^{-s(X+Y)}X\,|\,X+Y)\bigr)=aE\bigl((X+Y)e^{-s(X+Y)}\bigr),$$
$$E\bigl(E(e^{-s(X+Y)}X^2\,|\,X+Y)\bigr)=bE\bigl((X+Y)^2e^{-s(X+Y)}\bigr).$$
Denoting $L_X(s)=E(e^{-sX})$, we get the differential system in $L_X$ and $L_Y$: $L_X'L_Y=a(L_XL_Y)'$ and $L_X''L_Y=b(L_XL_Y)''$, easily solved by introducing $k_X=\log L_X$.
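The direct half of the Lukacs property (gamma with a common scale implies the independence of $Z$ and $X+Y$) is easy to check by simulation; a sketch, with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, c, n = 2.0, 3.0, 1.5, 300_000

# independent X ~ gamma(p, c) and Y ~ gamma(q, c), same scale c
X = rng.gamma(p, c, n)
Y = rng.gamma(q, c, n)
S = X + Y
Z = X/S

# Z is independent of S (and is Beta(p, q) distributed)
assert abs(np.corrcoef(Z, S)[0, 1]) < 0.01
assert abs(np.corrcoef(Z**2, S)[0, 1]) < 0.01
assert abs(Z.mean() - p/(p + q)) < 0.005
# the regression constant of the proof: E(X | X+Y) = a(X+Y) with a = p/(p+q)
a_hat = (X*S).mean()/(S**2).mean()
assert abs(a_hat - p/(p + q)) < 0.01
```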


Wang does the same thing in two dimensions: if $X=(X_1,X_2)$ and $Y=(Y_1,Y_2)$ in $(0,\infty)^2$ are independent, not constant and such that
$$Z=\left(\frac{X_1}{X_1+Y_1},\frac{X_2}{X_2+Y_2}\right)$$
is independent of $X+Y$, then there exist $p,q,a,b,c>0$ such that $X$ and $Y$ are bivariate gamma with
$$E(e^{-sX_1-tX_2})=(1+as+bt+cst)^{-p},\qquad E(e^{-sY_1-tY_2})=(1+as+bt+cst)^{-q}.$$
The method of proof is the same as in one dimension. But see also a splendid paper by Letac and Wesołowski in TAMS (2008) which generalizes...


A parenthesis on the negative multinomial distribution in dimension 2. The distribution $NM(p,a,b,c)$ of $(X,Y)$ in $\mathbb N^2$ with parameters $p,a,b,c$ and constraints $0<p$, $0<a,b<1$ and $0<c<(1-a)(1-b)$ has generating function
$$E(x^Xy^Y)=\left[\frac{(1-a)(1-b)-c}{1-ax-by+(ab-c)xy}\right]^p.$$
The same little trick as for the gamma gives
$$\Pr(X=m,Y=n)=\bigl((1-a)(1-b)-c\bigr)^p\,a^mb^n\sum_{k=0}^{\min(m,n)}\frac{(p)_k}{k!}\,\frac{(p+k)_{m-k}}{(m-k)!}\,\frac{(p+k)_{n-k}}{(n-k)!}\left(\frac{c}{ab}\right)^k,$$
by the following calculation:


$$\begin{aligned}
\left[\frac{(1-a)(1-b)-c}{1-ax-by+(ab-c)xy}\right]^p
&=\left[\frac{(1-a)(1-b)-c}{(1-ax)(1-by)\Bigl(1-\frac{cxy}{(1-ax)(1-by)}\Bigr)}\right]^p\\
&=\left[\frac{(1-a)(1-b)-c}{(1-ax)(1-by)}\right]^p\sum_{k=0}^\infty\frac{(p)_k}{k!}\,\frac{c^kx^ky^k}{(1-ax)^k(1-by)^k}\\
&=\bigl((1-a)(1-b)-c\bigr)^p\sum_{k=0}^\infty\frac{(p)_k}{k!}\,\frac{c^kx^ky^k}{(1-ax)^{p+k}(1-by)^{p+k}}\\
&=\bigl((1-a)(1-b)-c\bigr)^p\sum_{k,r,s=0}^\infty\frac{(p)_k(p+k)_r(p+k)_s}{k!\,r!\,s!}\,a^rb^sc^kx^{r+k}y^{s+k}.
\end{aligned}$$


Although in one dimension we have
$$\lim_{n\to\infty}H_{1/n}\,NB\!\left(q,\frac{nc}{1+nc}\right)=\gamma_{q,c}$$
(here $H_{1/n}$ is the action on distributions of the dilation $x\mapsto H_{1/n}(x)=x/n$), in two dimensions it is impossible to find an affine transformation $A_n(x,y)=(\alpha_nx,\beta_ny)$ and a sequence $(a_n,b_n,c_n)$ of parameters such that $\lim_{n\to\infty}A_n\,NM(q,a_n,b_n,c_n)$ converges towards a non-singular bivariate gamma. The reason is: replacing $x$ and $y$ respectively by $x_n=e^{-\alpha_ns}$ and $y_n=e^{-\beta_nt}$, the limit of
$$\frac{(1-a_n)(1-b_n)-c_n}{1-a_nx_n-b_ny_n+(a_nb_n-c_n)x_ny_n}$$
necessarily has the form $\frac{1}{1+As+Bt}$ and never the form $\frac{1}{1+As+Bt+Cst}$.


Maximum likelihood estimation, standard case. Recall that the density of the standard bivariate gamma has the form
$$(1-r)^q\,f_q(rxy)$$
with respect to the reference measure
$$\frac{1}{\Gamma(q)}\,(xy)^{q-1}e^{-x-y}\,\mathbf 1_{(0,\infty)^2}(x,y)\,dxdy.$$
If we observe the sample $(X_1,Y_1),\dots,(X_N,Y_N)$ we get:

Proposition. In the standard case ($q$ known, $r$ unknown) the maximum likelihood estimator $\hat r$ of $r\in[0,1)$ always exists. We have $\hat r=0$ if and only if $\frac1N\sum_{i=1}^NX_iY_i\le q^2$. If not, $\hat r$ is the unique solution of
$$\frac{q}{1-r}=\frac1N\sum_{i=1}^NX_iY_i\,\frac{f_{q+1}}{f_q}(rX_iY_i),$$
which needs a numerical treatment.
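The numerical treatment is straightforward: sum the series for $f_q$ and $f_{q+1}$ termwise and solve the likelihood equation by bisection. A sketch (sample size, seed and the fixed number of series terms are illustrative choices, not part of the talk); the data are simulated with the mixing construction, so $\hat r$ should land near the true $r_0$:

```python
import math
import numpy as np

def fq_ratio(q, z):
    # f_{q+1}(z)/f_q(z), with f_q(z) = sum_n z^n/(n! Gamma(q+n)), summed termwise
    z = np.asarray(z, dtype=float)
    num = np.zeros_like(z)
    den = np.zeros_like(z)
    tn = np.full_like(z, 1.0/math.gamma(q + 1))   # n = 0 term of f_{q+1}
    td = np.full_like(z, 1.0/math.gamma(q))       # n = 0 term of f_q
    for n in range(400):
        num += tn
        den += td
        tn = tn*z/((n + 1)*(q + 1 + n))
        td = td*z/((n + 1)*(q + n))
    return num/den

def mle_r(xy, q, iters=60):
    # r_hat = 0 iff mean(X_i Y_i) <= q^2; otherwise solve
    # q/(1-r) = mean( X_i Y_i * (f_{q+1}/f_q)(r X_i Y_i) ) by bisection
    xy = np.asarray(xy, dtype=float)
    if xy.mean() <= q*q:
        return 0.0
    def g(r):
        return np.mean(xy*fq_ratio(q, r*xy)) - q/(1.0 - r)
    lo, hi = 0.0, 1.0 - 1e-9                      # g(lo) > 0 > g(hi)
    for _ in range(iters):
        mid = 0.5*(lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5*(lo + hi)

# sanity check on data simulated with the mixing construction
rng = np.random.default_rng(3)
q, r0, nobs = 2.0, 0.4, 2000
N = rng.negative_binomial(q, 1 - r0, nobs)
xy = rng.gamma(q + N)*rng.gamma(q + N)
r_hat = mle_r(xy, q)
assert abs(r_hat - r0) < 0.1
```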


The trick is to show that $x\mapsto\log f_q(x)$ is concave on $(0,\infty)$. First, $f_q'(x)=f_{q+1}(x)$ and $f_q''(x)=f_{q+2}(x)$. We change $q$ into $q-1$: therefore we have to show that
$$f_{q+1}(x)f_{q-1}(x)-f_q^2(x)<0$$
for all $q>1$ and $x>0$. Denote $u_n(q)=\frac{1}{n!\,\Gamma(q+n)}$. Thus the coefficient of $z^n$ of the entire function $f_{q+1}(z)f_{q-1}(z)-f_q^2(z)$ is
$$v_n(q)=\sum_{k=0}^n\bigl[u_k(q+1)u_{n-k}(q-1)-u_k(q)u_{n-k}(q)\bigr].$$
Let us show that $v_n(q)<0$ for $q>1$. We write:


$$\begin{aligned}
v_n(q)&=\sum_{k=0}^n\frac{1}{k!(n-k)!}\left[\frac{1}{\Gamma(q+k+1)\Gamma(q+n-k-1)}-\frac{1}{\Gamma(q+k)\Gamma(q+n-k)}\right]\\
&=-\frac{n+1}{\Gamma(q+n+1)\Gamma(q)}+\sum_{k=0}^{n-1}\frac{n-2k-1}{k!(n-k)!}\,\frac{1}{\Gamma(q+k+1)\Gamma(q+n-k)}\\
&=-\frac{n+1}{\Gamma(q+n+1)\Gamma(q)}+\frac12\sum_{k=0}^{n-1}\left[\frac{n-2k-1}{k!(n-k)!}+\frac{n-2(n-1-k)-1}{(n-1-k)!(k+1)!}\right]\frac{1}{\Gamma(q+k+1)\Gamma(q+n-k)}\\
&=-\frac{n+1}{\Gamma(q+n+1)\Gamma(q)}-\frac12\sum_{k=0}^{n-1}\frac{(n-2k-1)^2}{(k+1)!(n-k)!}\,\frac{1}{\Gamma(q+k+1)\Gamma(q+n-k)}<0.
\end{aligned}$$
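The negativity of the coefficients $v_n(q)$ is also easy to confirm numerically from the definition alone; a small check over a few hypothetical values of $q>1$:

```python
import math

def u(n, q):
    # u_n(q) = 1/(n! * Gamma(q + n))
    return 1.0/(math.factorial(n)*math.gamma(q + n))

def v(n, q):
    # coefficient of z^n in f_{q+1}(z) f_{q-1}(z) - f_q(z)^2
    return sum(u(k, q + 1)*u(n - k, q - 1) - u(k, q)*u(n - k, q)
               for k in range(n + 1))

# the computation above shows v_n(q) < 0 for every n >= 0 and q > 1
for q in (1.5, 2.0, 3.7):
    for n in range(12):
        assert v(n, q) < 0.0
```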


Maximum likelihood estimation, general case. Even less easy. It is meaningless to ask for concavity of the log of the density parameterized by $(\theta_1,\theta_2,\rho)$, since the domain of definition
$$\{(\theta_1,\theta_2,\rho)\,;\,\theta_i<0,\ \theta_1\theta_2>\rho>0\}$$
is not convex (watch the boundary $\theta_1=\theta_2$ and $\theta_1\theta_2=\rho$ to be convinced). Taking $\rho=r\theta_1\theta_2$, the domain $(-\infty,0)^2\times[0,1)$ for $(\theta_1,\theta_2,r)$ is more manageable, but for fixed $r$ the log of the density is not concave with respect to $(\theta_1,\theta_2)$ since, up to a linear function, it is a function of the product $\theta_1\theta_2$. This excludes concavity on the product space $(-\infty,0)^2$ (watch the determinant of the Hessian matrix of $(\theta_1,\theta_2)\mapsto h(\theta_1\theta_2)$ when $\theta_1\theta_2$ becomes close to zero).


$(\theta_1,\theta_2,r)$ can be estimated by writing the 3 equations cancelling the gradient of the log likelihood (the proof of the fact that the log likelihood goes to $-\infty$ at the boundary of the domain is quite delicate). One computes $\theta_1$ and $\theta_2$ in terms of $r$, which becomes the solution of the following equation, where $(\overline X,\overline Y)$ is the empirical mean:
$$0=g_N(r)=r-1+\frac{q}{\overline X\,\overline Y}\,\frac1N\sum_{i=1}^NX_iY_i\,\frac{f_{q+1}}{f_q}\!\left(\frac{rq^2X_iY_i}{(1-r)^2\,\overline X\,\overline Y}\right).$$
We content ourselves with studying it for large $N$. Using the law of large numbers, the equation becomes
$$0=g(r)=r-1+\frac{q}{E(X)E(Y)}\,E\!\left[XY\,\frac{f_{q+1}}{f_q}\!\left(\frac{rq^2XY}{(1-r)^2E(X)E(Y)}\right)\right].$$
An important remark is that $g(r)$ depends only on the distribution of $\bigl(\frac{X}{E(X)},\frac{Y}{E(Y)}\bigr)$. Therefore, without loss of generality we may assume that the distribution of $(X,Y)$ is standard with parameter $r_0$.


The crucial function $g_{r_0}$. As explained, the above function $g$ depends only on the parameter $r_0$ and is
$$g_{r_0}(r)=r-1+\frac{(1-r_0)^2}{q}\,E\!\left[XY\,\frac{f_{q+1}}{f_q}\!\left(\frac{r(1-r_0)^2XY}{(1-r)^2}\right)\right].$$
It is easily seen that $g_{r_0}(0)=\frac{r_0}{q}>0$. Knowing that $f_q(u)\sim\frac{1}{2\sqrt\pi}\,u^{-\frac q2+\frac14}\,e^{2\sqrt u}$ as $u\to\infty$, we get that $\frac{f_{q+1}(u)}{f_q(u)}\sim\frac{1}{\sqrt u}$, and therefore $g_{r_0}(r)\sim-c\,(1-r)$ as $r\to1$, with $c=1-\frac{1-r_0}{q}\,E(\sqrt{XY})$. To see that $c>0$, observe that
$$E(\sqrt{XY})=\frac{1}{1-r_0}\,\frac{\Gamma(q+\frac12)^2}{\Gamma(q)^2}\,{}_2F_1\!\left(-\tfrac12,-\tfrac12;q;r_0\right)\le\frac{1}{1-r_0}\,\frac{\Gamma(q+\frac12)^2}{\Gamma(q)^2}\,{}_2F_1\!\left(-\tfrac12,-\tfrac12;q;1\right)=\frac{q}{1-r_0}.$$


Finally, $g_{r_0}(r_0)=0$. Possibly $g_{r_0}$ is convex on $[0,1]$, but studying $g_{r_0}''$ is quite difficult, and convexity may fail when $r_0$ is close to 1. However, all simulations show that $g_{r_0}(r)>0$ if $0<r<r_0$ and $g_{r_0}(r)<0$ if $r_0<r<1$. In this case $r_0$ is the only solution of $g_{r_0}(r)=0$.
