The bivariate gamma distribution with Laplace transform $(1+as+bt+cst)^{-q}$: history, characterisations, estimation.

Gérard Letac, Université Paul Sabatier, Toulouse, France.

Conference at York, 11/30/07.

* with Jean-Yves Tourneret and Florent Chatelain, IRIT-ENSEEIHT, Toulouse.
Vocabulary. While the names of distributions on $\mathbb{R}$ are generally unambiguous, in the jungle of distributions on $\mathbb{R}^n$, on the contrary, almost nothing is codified outside of the Wishart and Gaussian cases. The scenario is usually as follows: choose a one-dimensional thingy type (quite often an exponential dispersion model, namely a natural exponential family together with all its real powers of convolution), such as gamma or negative binomial; then any law on $\mathbb{R}^n$ whose margins are of thingy type is said to be multidimensional thingy. Although the study of all distributions with given marginals belongs rather to the nonparametric domain, in practice each author who isolates some parametric family will declare that he has THE multidimensional thingy family.
This is what we are also doing today by calling bivariate gamma any distribution $\nu(dx,dy)$ on $(0,\infty)^2$ such that there exist four parameters $a,b,c,q$ with $q>0$ satisfying

$$\int_0^\infty\!\!\int_0^\infty e^{-sx-ty}\,\nu(dx,dy) = \frac{1}{(1+as+bt+cst)^q}.$$

(To be fair, it is called the Kibble and Moran distribution by Johnson and Kotz.) When $(X,Y)\sim\nu$, setting $t=0$ shows that $X$ is gamma distributed with shape parameter $q$ and scale parameter $a$:

$$\gamma_{q,a}(dx) = \frac{1}{\Gamma(q)}\,e^{-x/a}\left(\frac{x}{a}\right)^{q-1} 1_{(0,\infty)}(x)\,\frac{dx}{a}.$$

Same for $Y$. This shows that $a$ and $b$ are $>0$: if $a$ or $b$ were nonpositive, the distribution $\nu$ would be concentrated on another quadrant, or on an axis. No generality is lost by postulating $a,b>0$, and therefore $\nu(dx,dy)$ concentrated on $(0,\infty)^2$. The hypothesis $q>0$ is more important: $q<0$ would lead us to distributions concentrated on a finite number of points, with binomial margins.
Constraints on the parameters, and the density. We are going to see later on that if $a,b,q>0$, the distribution $\nu$ exists if and only if $0 \le c \le ab$. If $c>2ab$, the function $(s,t)\mapsto -q\log(1+as+bt+cst)$ is not convex inside the suitable branch of hyperbola, but the proof that $ab < c \le 2ab$ is impossible is subtler. Note that this implies that $\nu$ is infinitely divisible (as observed by Vere-Jones (1967)). Note also that if $(X,Y)$ is bivariate gamma with parameters $a,b,c,q$, then $(X/a, Y/b)$ is bivariate gamma with parameters $1,1,c/ab,q$. If $c=ab$, obviously $X$ and $Y$ are independent.

The case $a=b=c$ (and thus $c \ge 1$, since $c \le ab = c^2$) is called the standard case with parameter $r \in [0,1)$, where $c = \frac{1}{1-r}$. It seems to be more interesting than the $a=b=1$ case (although the $a=b=1$ case is natural in the context of Lancaster probabilities, which we shall consider later on).

If $c=0$ then $(X,Y) \sim (aZ, bZ)$ with $Z \sim \gamma_{q,1}$. If $0 < c \le ab$, the distribution $\nu$ has an unfamiliar density. To compute it, let us use natural exponential families (NEF).
The NEF $F(\mu_{\rho,q})$. Let $\rho>0$ and $q>0$, and consider the measure $\mu_{\rho,q}(dx,dy)$ with density

$$\frac{(xy)^{q-1}}{\Gamma(q)}\,f_q(\rho xy)$$

on $(0,\infty)^2$, where $f_q(z) = \sum_{n=0}^{\infty} \frac{z^n}{n!\,\Gamma(q+n)}$. Its Laplace transform

$$L_{\mu_{\rho,q}}(\theta_1,\theta_2) = \int\!\!\int e^{\theta_1 x + \theta_2 y}\,\mu_{\rho,q}(dx,dy)$$

is defined on

$$\Theta(\mu_{\rho,q}) = \{(\theta_1,\theta_2)\,;\ \theta_1<0,\ \theta_2<0,\ \theta_1\theta_2-\rho>0\}$$

and on this set equals $L_{\mu_{\rho,q}}(\theta_1,\theta_2) = (\theta_1\theta_2-\rho)^{-q}$. Consider the NEF $F(\mu_{\rho,q})$ generated by $\mu_{\rho,q}$.
An element $P(\theta_1,\theta_2)$ of this NEF has the form

$$P(\theta_1,\theta_2)(dx,dy) = (\theta_1\theta_2-\rho)^q\,e^{\theta_1 x+\theta_2 y}\,\frac{(xy)^{q-1}}{\Gamma(q)}\,f_q(\rho xy)\,1_{(0,\infty)^2}(x,y)\,dx\,dy.$$

Thus

$$\int\!\!\int e^{-sx-ty}\,P(\theta_1,\theta_2)(dx,dy) = \frac{L(\theta_1-s,\theta_2-t)}{L(\theta_1,\theta_2)} = \frac{(\theta_1\theta_2-\rho)^q}{\big((\theta_1-s)(\theta_2-t)-\rho\big)^q}.$$
Introduce $r = \frac{\rho}{\theta_1\theta_2}$. We are going to see that $r$ is the correlation coefficient of $P(\theta_1,\theta_2)$. Obviously the parameters

$$a = \frac{-\theta_2}{\theta_1\theta_2-\rho} = -\frac{1}{\theta_1(1-r)}, \qquad b = \frac{-\theta_1}{\theta_1\theta_2-\rho} = -\frac{1}{\theta_2(1-r)}, \qquad c = \frac{1}{\theta_1\theta_2-\rho} = \frac{1}{\theta_1\theta_2(1-r)}$$

seem more attractive for $P(\theta_1,\theta_2)$, since

$$\int\!\!\int e^{-sx-ty}\,P(\theta_1,\theta_2)(dx,dy) = \frac{1}{(1+as+bt+cst)^q},$$

with the conversion table

$$\theta_1 = -\frac{b}{c}, \qquad \theta_2 = -\frac{a}{c}, \qquad r = \frac{ab-c}{ab}, \qquad \rho = \frac{ab-c}{c^2}.$$

In the standard case $a=b=c=\frac{1}{1-r}$ we have $\theta_1=\theta_2=-1$ and $\rho=r$. In the 'Lancaster' case $a=b=1$ and $c=1-r$ we have $\theta_1=\theta_2=-\frac{1}{1-r}$ and $\rho=\frac{r}{(1-r)^2}$.
The classical objects of the NEF $F(\mu_{\rho,q})$. We have seen that $\Theta(\mu_{\rho,q})$ is the interior of the hyperbola branch $\{\theta_1<0,\ \theta_2<0\,;\ \theta_1\theta_2-\rho>0\}$. The cumulant function is

$$k(\theta_1,\theta_2) = -q\log(\theta_1\theta_2-\rho).$$

Its differential is

$$k'(\theta_1,\theta_2) = \frac{-q}{\theta_1\theta_2-\rho}\,(\theta_2,\theta_1) = (qa, qb) = (m_1, m_2).$$

The domain of the means is $M(F) = (0,\infty)^2$. This gives $r$ as a function of $m_1$, $m_2$ and $\rho$:

$$1-r = \frac{q^2}{2\rho m_1 m_2}\left(\sqrt{1+\frac{4\rho m_1 m_2}{q^2}}-1\right),$$

or in a simpler form:

$$\frac{1}{1-r} = \frac12 + \frac12\sqrt{1+\frac{4\rho m_1 m_2}{q^2}}.$$
Here is the variance function, expressed either in the canonical parameters, or in the mean parameters, or in the classical parameters:

$$k''(\theta_1,\theta_2) = \frac{q}{(\theta_1\theta_2-\rho)^2}\begin{bmatrix}\theta_2^2 & \rho\\ \rho & \theta_1^2\end{bmatrix} = \frac1q\begin{bmatrix}m_1^2 & r\,m_1m_2\\ r\,m_1m_2 & m_2^2\end{bmatrix} = q\begin{bmatrix}a^2 & ab-c\\ ab-c & b^2\end{bmatrix}$$

(do not forget that $r$ is the complicated function of $m_1$ and $m_2$ given above). One characterisation of this bivariate gamma NEF is due to Bar-Lev et al. (1994). It says that if a NEF concentrated on $(0,\infty)^2$ has a variance function of the form $\frac1q\begin{bmatrix}m_1^2 & *\\ * & m_2^2\end{bmatrix}$, then it is a bivariate gamma NEF.
How to get the density quickly. Starting from $(1+as+bt+cst)^{-q} = E(e^{-sX-tY})$ with $a,b,c>0$:

1. We choose $\alpha$ and $\beta>0$ such that $X_1=\alpha X$ and $Y_1=\beta Y$ satisfy

$$(1+c_1s+c_1t+c_1st)^{-q} = E(e^{-sX_1-tY_1})$$

for some $c_1 = \alpha a = \beta b = c\alpha\beta$. Thus $(X_1,Y_1)$ is standard with parameters $r$ and $q$. We get

$$\alpha = \frac{b}{c}, \qquad \beta = \frac{a}{c}, \qquad c_1 = \frac{ab}{c} = \frac{1}{1-r}.$$

2. We now compute the density in the standard case:
$$\begin{aligned}
E(e^{-sX_1-tY_1}) &= \frac{(1-r)^q}{(1-r+s+t+st)^q} = \frac{(1-r)^q}{\big((1+s)(1+t)-r\big)^q} \\
&= \frac{(1-r)^q}{\Big((1+s)(1+t)\big(1-\frac{r}{(1+s)(1+t)}\big)\Big)^q} \\
&= \frac{(1-r)^q}{(1+s)^q(1+t)^q}\sum_{n=0}^\infty \frac{(q)_n}{n!}\,\frac{r^n}{(1+s)^n(1+t)^n} \\
&= (1-r)^q \sum_{n=0}^\infty \frac{\Gamma(q+n)}{\Gamma(q)\,n!}\,\frac{r^n}{(1+s)^{q+n}(1+t)^{q+n}} \\
&= \frac{(1-r)^q}{\Gamma(q)}\sum_{n=0}^\infty \frac{r^n}{n!\,\Gamma(q+n)} \int_0^\infty\!\!\int_0^\infty e^{-sx_1-ty_1}\,e^{-x_1-y_1}(x_1y_1)^{q+n-1}\,dx_1\,dy_1 \\
&= \frac{(1-r)^q}{\Gamma(q)}\int_0^\infty\!\!\int_0^\infty e^{-sx_1-ty_1}\,e^{-x_1-y_1}(x_1y_1)^{q-1}\,f_q(rx_1y_1)\,dx_1\,dy_1
\end{aligned}$$

(the passage to the double integral uses $(1+s)^{-q-n} = \frac{1}{\Gamma(q+n)}\int_0^\infty e^{-sx}x^{q+n-1}e^{-x}\,dx$).
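This derivation can be checked numerically. The sketch below (an illustration, not part of the talk; the helper names `f` and `density` are ours) truncates the series defining $f_q$ and verifies by quadrature, assuming SciPy, that the resulting standard density reproduces the Laplace transform $(1-r)^q/((1+s)(1+t)-r)^q$:

```python
import math
from scipy.integrate import dblquad

def f(q, z, terms=40):
    # f_q(z) = sum_{n>=0} z^n / (n! Gamma(q+n)), truncated
    return sum(z**n / (math.factorial(n) * math.gamma(q + n))
               for n in range(terms))

def density(x, y, q, r):
    # standard bivariate gamma density on (0, inf)^2
    return ((1 - r)**q * (x * y)**(q - 1) / math.gamma(q)
            * f(q, r * x * y) * math.exp(-x - y))

q, r, s, t = 2.0, 0.4, 0.3, 0.7
# the e^{-x-y} factor makes the tail beyond 25 negligible
lhs, _ = dblquad(lambda y, x: math.exp(-s * x - t * y) * density(x, y, q, r),
                 0, 25, 0, 25)
rhs = (1 - r)**q / ((1 + s) * (1 + t) - r)**q
print(lhs, rhs)
```

The two printed numbers agree up to the quadrature and truncation tolerances.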
Mixing. In other terms: if

$$N,\ X_0,\dots,X_n,\dots,\ Y_0,\dots,Y_n,\dots$$

are independent, with $X_0\sim Y_0\sim\gamma_{q,1}$ and $X_i\sim Y_i\sim\gamma_{1,1}$ for $i\ge1$, and if $N$ follows a negative binomial distribution,

$$\Pr(N=n) = \frac{(q)_n}{n!}\,(1-r)^q r^n,$$

denote $U_n = X_0+\dots+X_n$ and $V_n = Y_0+\dots+Y_n$. Under these circumstances $(U_N, V_N)$ follows a standard bivariate gamma distribution with parameters $r$ and $q$: thus this distribution is a mixture of gamma distributions.
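A minimal simulation of this mixing representation, assuming NumPy (the identification of $\Pr(N=n)$ with NumPy's `negative_binomial(q, 1-r)` is ours), checks the standard moments $E(U)=q/(1-r)$ and $\mathrm{corr}(U,V)=r$:

```python
import numpy as np

rng = np.random.default_rng(0)
q, r, size = 2.0, 0.5, 200_000

# Pr(N = n) = (q)_n / n! * (1-r)^q * r^n is numpy's negative binomial
# with q "successes" and success probability 1 - r
N = rng.negative_binomial(q, 1 - r, size)

# given N = n, U_N = X_0 + ... + X_n ~ gamma(q + n, 1); same for V_N,
# conditionally independent of U_N given N
U = rng.gamma(q + N)
V = rng.gamma(q + N)

# standard bivariate gamma: E(U) = q/(1-r) and corr(U, V) = r
print(U.mean(), q / (1 - r))
print(np.corrcoef(U, V)[0, 1], r)
```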
Why $0\le c\le ab$? (and not only $0\le c\le 2ab$, for which $\log(1+as+bt+cst)$ is concave). This is equivalent to asking: why $r\in[0,1]$ and not only $r\in[-1,1]$? The preceding calculation contains the answer: if $r<0$, the density would be

$$\frac{(1-r)^q}{\Gamma(q)}\,(x_1y_1)^{q-1}\,f_q(rx_1y_1)\,e^{-x_1-y_1}.$$

But

$$\left(\frac{z}{2}\right)^{q-1} f_q\!\left(\frac{z^2}{4}\right) = I_{q-1}(z).$$

Since the classical Bessel function $I_{q-1}(z)$ has its zeros on the imaginary axis, and since they are all simple and infinite in number, this implies that $f_q$ is not positive on $(-\infty,0)$.
Moments of the standard bivariate gamma. Recall that the distribution of $(X,Y)$ such that

$$E(e^{-sX-tY}) = (1-r)^q(1-r+s+t+st)^{-q}$$

has been called the standard bivariate gamma distribution with correlation $r$ and shape parameter $q$, and therefore has density

$$(1-r)^q\,\frac{(xy)^{q-1}}{\Gamma(q)}\,f_q(rxy)\,e^{-x-y}.$$

From this we easily get the moments:

$$E(X^sY^t) = (1-r)^q\,\frac{\Gamma(q+s)\Gamma(q+t)}{\Gamma(q)^2}\,{}_2F_1(q+s,q+t;q;r) = (1-r)^{-s-t}\,\frac{\Gamma(q+s)\Gamma(q+t)}{\Gamma(q)^2}\,{}_2F_1(-s,-t;q;r),$$

and therefore

$$E\big((XY)^s\big) \le (1-r)^{-2s}\,\frac{\Gamma(q+s)^2}{\Gamma(q)^2}\,{}_2F_1(-s,-s;q;1) = (1-r)^{-2s}\,\frac{\Gamma(q+2s)}{\Gamma(q)}.$$
For instance:

$$E(X)=E(Y)=\frac{q}{1-r}, \qquad E(X^2)=E(Y^2)=\frac{q(q+1)}{(1-r)^2}, \qquad \sigma^2(X)=\sigma^2(Y)=\frac{q}{(1-r)^2},$$

$$E(XY) = \frac{q^2}{(1-r)^2}\left(1+\frac{r}{q}\right), \qquad E(X^2Y^2) = \frac{q^2(q+1)^2}{(1-r)^4}\left(1+\frac{4r}{q}+\frac{2r^2}{q(q+1)}\right).$$
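These closed forms can be checked against the general ${}_2F_1$ moment formula; the sketch below assumes SciPy's `hyp2f1`:

```python
from math import gamma
from scipy.special import hyp2f1

def moment(q, r, s, t):
    # E(X^s Y^t) = (1-r)^(-s-t) Gamma(q+s)Gamma(q+t)/Gamma(q)^2 * 2F1(-s,-t;q;r)
    return ((1 - r)**(-s - t) * gamma(q + s) * gamma(q + t) / gamma(q)**2
            * hyp2f1(-s, -t, q, r))

q, r = 2.5, 0.3
# E(XY) = q^2 (1 + r/q) / (1-r)^2
print(moment(q, r, 1, 1), q**2 * (1 + r / q) / (1 - r)**2)
# E(X^2 Y^2) = q^2 (q+1)^2 / (1-r)^4 * (1 + 4r/q + 2r^2/(q(q+1)))
print(moment(q, r, 2, 2),
      q**2 * (q + 1)**2 / (1 - r)**4
      * (1 + 4*r/q + 2*r**2 / (q * (q + 1))))
```

For integer $s,t$ the hypergeometric series terminates, so the two columns agree to machine precision.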
Joint distribution of $(P,S)=(XY,\,X+Y)$, standard case. The distribution is

$$(1-r)^q\,\frac{p^{q-1}}{\Gamma(q)}\,f_q(rp)\,e^{-s}\,\frac{2}{\sqrt{s^2-4p}}\,1_{s>2\sqrt p}(p,s)\,dp\,ds.$$

Writing $h_q(z) = \sum_{n=0}^\infty \frac{(q)_n}{n!\,\Gamma(2q+2n)}\,z^n$ and imitating the way the density of $(X,Y)$ was found, one gets the law of $S$:

$$(1-r)^q\,s^{2q-1}\,h_q(rs^2)\,e^{-s}\,1_{s>0}(s)\,ds,$$

but for the density of $P=XY$ no closed form can be obtained, either from the joint distribution of $(P,S)$ or from the Mellin transform

$$E(P^s) = (1-r)^{-2s}\,\frac{\Gamma(q+s)^2}{\Gamma(q)^2}\,{}_2F_1(-s,-s;q;r).$$
History. These distributions were introduced by Wicksell (1933) and independently by Kibble (1941), and mentioned by Moran (1967), Barndorff-Nielsen (1980), Wang (1982), Seshadri (1988), Bar-Lev and five coauthors (1994), and Johnson and Kotz (1972). Angelo Koudou (1995) and Philippe Bernardoff (2003) studied them in their theses, defended at the Université Paul Sabatier in Toulouse.
Bivariate gamma and Wishart distributions. In one dimension, gamma distributions whose shape parameter is a half integer are known as chi squares. If the shape parameter is not a half integer, probabilistic interpretations are rather rare. Here is one, due to Yor (1990):

$$\left(2\int_0^\infty e^{2(B(t)-pt)}\,dt\right)^{-1} \sim \gamma_{p,1},$$

where $B$ is a standard Brownian motion. The most natural generalization of the chi square distribution is the Wishart distribution: if $Z_1,\dots,Z_N$ are i.i.d. $N(0,\Sigma)$ in $\mathbb{R}^r$ (written as columns), the Wishart distribution is the law of the random matrix

$$W = \frac12\left(Z_1Z_1^T + \dots + Z_NZ_N^T\right).$$

It satisfies

$$E\big(e^{-\operatorname{tr}(\theta W)}\big) = \det(I+\Sigma\theta)^{-N/2}$$

($\theta$ is a symmetric positive definite matrix).
For $r=2$ write $W = \begin{bmatrix} X & Z\\ Z & Y\end{bmatrix}$. Suppose that $\theta = \operatorname{diag}(s,t)$, write $\Sigma = \begin{bmatrix} a & \gamma\\ \gamma & b\end{bmatrix}$, and denote $c = ab-\gamma^2$. One gets exactly

$$E(e^{-sX-tY}) = \frac{1}{(1+as+bt+cst)^{N/2}}.$$

As a result: the diagonal of a Wishart matrix on the $2\times2$ real symmetric matrices is a bivariate gamma.
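This identity can be illustrated by simulation; the sketch below (our construction, with an assumed $\Sigma$) compares the empirical Laplace transform of the diagonal of $W$ with $(1+as+bt+cst)^{-N/2}$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_samples = 3, 200_000
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])
a, b = Sigma[0, 0], Sigma[1, 1]
c = np.linalg.det(Sigma)          # c = ab - gamma^2

# W = (1/2)(Z_1 Z_1^T + ... + Z_N Z_N^T) with Z_i iid N(0, Sigma);
# only the diagonal (X, Y) of W is needed here
Z = rng.multivariate_normal([0.0, 0.0], Sigma, size=(n_samples, N))
X = 0.5 * (Z[:, :, 0] ** 2).sum(axis=1)
Y = 0.5 * (Z[:, :, 1] ** 2).sum(axis=1)

s, t = 0.2, 0.3
empirical = np.exp(-s * X - t * Y).mean()
formula = (1 + a * s + b * t + c * s * t) ** (-N / 2)
print(empirical, formula)
```

Here $q=N/2=3/2$, so this is a bivariate gamma with a non-integer shape as soon as $N$ is odd.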
This result can be extended in several directions: by replacing $N/2$ by $q\ge1/2$; by replacing the $2\times2$ positive definite matrices by the Lorentz cone

$$C_d = \{(x,y,z_1,\dots,z_d)\,;\ x,y>0,\ xy > z_1^2+\dots+z_d^2\},$$

writing its elements symbolically as $w = \begin{bmatrix} x & z\\ z & y\end{bmatrix}$. The Wishart distributions on $C_d$ with shape parameter $q$ are the elements of the natural exponential family generated by the measure on $C_d$ with density

$$\big(xy-z_1^2-\dots-z_d^2\big)^{q-1-\frac{d}{2}}$$

if $q>d/2$ (and some singular distribution on $\partial C_d$ if $q=d/2$). If $(X,Y,Z)$ follows such a Wishart law on $C_d$, then $(X,Y)$ is bivariate gamma. The ordinary, complex and quaternionic Wishart distributions are the cases $d=1,2,4$.

Note that for $0<q<d/2$ the bivariate gamma does exist, but cannot be the marginal distribution of $(X,Y)$ for a Wishart distribution $(X,Y,Z_1,\dots,Z_d)$.
The bivariate gammas are Lancaster probabilities. Suppose that $\alpha(dx)$ and $\beta(dy)$ are distributions on $\mathbb{R}$ such that $\int e^{a|x|}\alpha(dx)$ and $\int e^{a|y|}\beta(dy)$ are finite for some $a>0$. Let $(P_n)$ and $(Q_n)$ be the corresponding sequences of orthonormal polynomials. We say that a distribution $\mu(dx,dy)$ on $\mathbb{R}^2$ with margins $\alpha$ and $\beta$ is of Lancaster type if either there exists a sequence $(\rho_n)$ of real numbers such that $\mu$ is absolutely continuous with respect to $\alpha(dx)\beta(dy)$ and

$$\mu(dx,dy) = \left[\sum_{n=0}^\infty \rho_n P_n(x)Q_n(y)\right]\alpha(dx)\,\beta(dy),$$

or $\mu$ is the weak limit of such distributions. In both cases we have

$$E_\mu\big(P_n(X)\mid Y\big) = \rho_n Q_n(Y), \qquad E_\mu\big(Q_n(Y)\mid X\big) = \rho_n P_n(X).$$
D'jachenko (1962) observed that the bivariate gamma is a Lancaster probability with $\rho_n = r^n$, where $r\in[0,1]$ is the correlation coefficient. This fact is not quite obvious, and relies on a classical formula for Laguerre polynomials. Note that this is not the standard bivariate gamma but the one with $a=b=1$ and $c=1-r$. To this, add a theorem of Tyan and Thomas, which says that in general the sequences $(\rho_n)_{n\ge0}$ appearing in the definition of Lancaster probabilities are moment sequences, and you get an elegant integral representation:

Proposition. $\mu(dx,dy)$ is a Lancaster probability for the margins $\alpha=\beta=\gamma_{q,1}$ if and only if there exists a probability $\nu(dr)$ on $[0,1]$ such that

$$\int_0^\infty\!\!\int_0^\infty e^{-sx-ty}\,\mu(dx,dy) = \int_0^1 \frac{\nu(dr)}{(1+s+t+(1-r)st)^q}.$$

In this case $\int_0^1 r^n\,\nu(dr) = \rho_n$.
Other details on Lancaster probabilities can be found in Koudou (1996) and in the paper on Gibbs sampling by Diaconis and two coauthors, to appear in Statistical Science in 2008.
The Wang characterisation (1982). Recall the characterisation by Lukacs (1956) of the ordinary gamma distributions: if $X$ and $Y$ are positive, independent, not constant, and such that $Z = X/(X+Y)$ is independent of $X+Y$, then there exist $p,q,c>0$ such that $X\sim\gamma_{p,c}$ and $Y\sim\gamma_{q,c}$. Proof: let $a$ and $b\in(0,1)$ be such that $E(X\mid X+Y) = a(X+Y)$ and $E(X^2\mid X+Y) = b(X+Y)^2$. Then

$$E\big(e^{-s(X+Y)}X\big) = a\,E\big((X+Y)e^{-s(X+Y)}\big), \qquad E\big(e^{-s(X+Y)}X^2\big) = b\,E\big((X+Y)^2e^{-s(X+Y)}\big).$$

Denoting $L_X(s) = E(e^{-sX})$, we get the differential system in $L_X$ and $L_Y$: $L_X'L_Y = a(L_XL_Y)'$ and $L_X''L_Y = b(L_XL_Y)''$, easily solved by introducing $k_X = \log L_X$.
Wang does the same thing in two dimensions: if $X = (X_1,X_2)$ and $Y = (Y_1,Y_2)$ in $(0,\infty)^2$ are independent, not constant, and such that

$$Z = \left(\frac{X_1}{X_1+Y_1},\ \frac{X_2}{X_2+Y_2}\right)$$

is independent of $X+Y$, then there exist $p,q,a,b,c>0$ such that $X$ and $Y$ are bivariate gamma with

$$E(e^{-sX_1-tX_2}) = (1+as+bt+cst)^{-p}, \qquad E(e^{-sY_1-tY_2}) = (1+as+bt+cst)^{-q}.$$

The method of proof is the same as in one dimension. But see also a splendid paper by Letac and Wesołowski in TAMS (2008) which generalizes this.
A parenthesis on the negative multinomial distribution in dimension 2. The distribution $NM(p,a,b,c)$ of $(X,Y)$ in $\mathbb{N}^2$, with parameters $p,a,b,c$ and constraints $0<p$, $0<a,b<1$ and $0<c<(1-a)(1-b)$, has generating function

$$E(x^Xy^Y) = \left[\frac{(1-a)(1-b)-c}{1-ax-by+(ab-c)xy}\right]^p.$$

The same little trick as for the gamma gives

$$\Pr(X=m,\,Y=n) = \big((1-a)(1-b)-c\big)^p\,a^mb^n \sum_{k=0}^{\min(m,n)} \frac{(p)_k}{k!}\,\frac{(p+k)_{m-k}}{(m-k)!}\,\frac{(p+k)_{n-k}}{(n-k)!}\left(\frac{c}{ab}\right)^k,$$

by the following calculation:
$$\begin{aligned}
\left[\frac{(1-a)(1-b)-c}{1-ax-by+(ab-c)xy}\right]^p &= \left[\frac{(1-a)(1-b)-c}{(1-ax)(1-by)\left(1-\frac{cxy}{(1-ax)(1-by)}\right)}\right]^p \\
&= \left[\frac{(1-a)(1-b)-c}{(1-ax)(1-by)}\right]^p \sum_{k=0}^\infty \frac{(p)_k}{k!}\,\frac{c^kx^ky^k}{(1-ax)^k(1-by)^k} \\
&= \big((1-a)(1-b)-c\big)^p \sum_{k=0}^\infty \frac{(p)_k}{k!}\,\frac{c^kx^ky^k}{(1-ax)^{p+k}(1-by)^{p+k}} \\
&= \big((1-a)(1-b)-c\big)^p \sum_{k,r,s=0}^\infty \frac{(p)_k(p+k)_r(p+k)_s}{k!\,r!\,s!}\,a^rb^sc^k\,x^{r+k}y^{s+k}.
\end{aligned}$$
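A quick numerical check of the pmf against the generating function (the helper names `pochhammer` and `pmf` are ours):

```python
import math

def pochhammer(x, n):
    # (x)_n = x (x+1) ... (x+n-1)
    out = 1.0
    for i in range(n):
        out *= x + i
    return out

def pmf(m, n, p, a, b, c):
    # Pr(X=m, Y=n) for the bivariate negative multinomial NM(p, a, b, c)
    pref = ((1 - a) * (1 - b) - c) ** p * a ** m * b ** n
    s = sum(pochhammer(p, k) / math.factorial(k)
            * pochhammer(p + k, m - k) / math.factorial(m - k)
            * pochhammer(p + k, n - k) / math.factorial(n - k)
            * (c / (a * b)) ** k
            for k in range(min(m, n) + 1))
    return pref * s

p, a, b, c = 1.5, 0.3, 0.4, 0.1   # 0 < c < (1-a)(1-b) = 0.42
x, y = 0.6, 0.5
gf = (((1 - a) * (1 - b) - c)
      / (1 - a * x - b * y + (a * b - c) * x * y)) ** p
series = sum(pmf(m, n, p, a, b, c) * x**m * y**n
             for m in range(60) for n in range(60))
print(gf, series)
```

The truncated double sum matches the closed-form generating function to high accuracy, since the terms decay geometrically.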
Although in one dimension we have

$$\lim_{n\to\infty} H_{1/n}\,NB\!\left(q,\,\frac{nc}{1+nc}\right) = \gamma_{q,c}$$

(here $H_{1/n}$ is the action on distributions of the dilation $x\mapsto H_{1/n}(x) = x/n$), in two dimensions it is impossible to find an affine transformation $A_n(x,y) = (\alpha_nx,\,\beta_ny)$ and a sequence $(a_n,b_n,c_n)$ of parameters such that $\lim_{n\to\infty} A_n\,NM(q,a_n,b_n,c_n)$ converges to a non-singular bivariate gamma. The reason is: replacing $x$ and $y$ respectively by $x_n = e^{-\alpha_ns}$ and $y_n = e^{-\beta_nt}$, the limit of

$$\frac{(1-a_n)(1-b_n)-c_n}{1-a_nx_n-b_ny_n+(a_nb_n-c_n)x_ny_n}$$

necessarily has the form $\frac{1}{1+As+Bt}$, and never the form $\frac{1}{1+As+Bt+Cst}$.
Maximum likelihood estimation, standard case. Recall that the density of the standard bivariate gamma has the form

$$(1-r)^q f_q(rxy)$$

with respect to the reference measure

$$\frac{1}{\Gamma(q)}\,(xy)^{q-1}e^{-x-y}\,1_{(0,\infty)^2}(x,y)\,dx\,dy.$$

If we observe the sample $(X_1,Y_1),\dots,(X_N,Y_N)$, we get:

Proposition. In the standard case ($q$ known, $r$ unknown) the maximum likelihood estimator $\hat r$ of $r\in[0,1)$ always exists. We have $\hat r = 0$ if and only if $\frac1N\sum_{i=1}^N X_iY_i \le q^2$. If not, $\hat r$ is the unique solution of

$$\frac{q}{1-r} = \frac1N\sum_{i=1}^N X_iY_i\,\frac{f_{q+1}}{f_q}(rX_iY_i),$$

which needs a numerical treatment.
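A sketch of that numerical treatment (our code, not the speaker's): it evaluates $f_{q+1}/f_q$ through the numerically stable Bessel quotient $f_{q+1}(z)/f_q(z) = I_q(2\sqrt z)/(\sqrt z\,I_{q-1}(2\sqrt z))$, using SciPy's exponentially scaled `ive`, and solves the likelihood equation with Brent's method on a sample simulated via the mixing representation:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import ive   # exponentially scaled Bessel I_v

def ratio(q, z):
    # f_{q+1}(z) / f_q(z) = I_q(2 sqrt z) / (sqrt z * I_{q-1}(2 sqrt z));
    # the e^{-x} scaling of ive cancels in the quotient
    x = 2.0 * np.sqrt(z)
    return ive(q, x) / (np.sqrt(z) * ive(q - 1, x))

def mle_r(xy, q):
    # maximum likelihood estimator of r, standard case, q known
    if xy.mean() <= q**2:
        return 0.0
    score = lambda r: (xy * ratio(q, r * xy)).mean() - q / (1 - r)
    return brentq(score, 1e-8, 1 - 1e-8)

# simulate a standard bivariate gamma sample via the mixing representation
rng = np.random.default_rng(2)
q, r, size = 2.0, 0.6, 50_000
N = rng.negative_binomial(q, 1 - r, size)
xy = rng.gamma(q + N) * rng.gamma(q + N)
print(mle_r(xy, q))   # expected to be close to r = 0.6
```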
The trick is to show that $x\mapsto \log f_q(x)$ is concave on $(0,\infty)$. First, $f_q'(x) = f_{q+1}(x)$ and $f_q''(x) = f_{q+2}(x)$; changing $q$ into $q-1$, we therefore have to show that

$$f_{q+1}(x)f_{q-1}(x) - f_q^2(x) < 0$$

for all $q>1$ and $x>0$. Denote $u_n(q) = \frac{1}{n!\,\Gamma(q+n)}$. Thus the coefficient of $z^n$ in the entire function $f_{q+1}(z)f_{q-1}(z) - f_q^2(z)$ is

$$v_n(q) = \sum_{k=0}^n \big[u_k(q+1)\,u_{n-k}(q-1) - u_k(q)\,u_{n-k}(q)\big].$$

Let us show that $v_n(q)<0$ for $q>1$. We write:
$$\begin{aligned}
v_n(q) &= \sum_{k=0}^n \frac{1}{k!\,(n-k)!}\left[\frac{1}{\Gamma(q+k+1)\Gamma(q+n-k-1)} - \frac{1}{\Gamma(q+k)\Gamma(q+n-k)}\right] \\
&= \sum_{k=0}^n \frac{n-2k-1}{k!\,(n-k)!}\,\frac{1}{\Gamma(q+k+1)\Gamma(q+n-k)} \\
&= -\frac{n+1}{n!\,\Gamma(q+n+1)\Gamma(q)} + \sum_{k=0}^{n-1} \frac{n-2k-1}{k!\,(n-k)!}\,\frac{1}{\Gamma(q+k+1)\Gamma(q+n-k)} \\
&= -\frac{n+1}{n!\,\Gamma(q+n+1)\Gamma(q)} + \frac12\sum_{k=0}^{n-1} \left[\frac{n-2k-1}{k!\,(n-k)!} + \frac{n-2(n-1-k)-1}{(n-1-k)!\,(k+1)!}\right]\frac{1}{\Gamma(q+k+1)\Gamma(q+n-k)} \\
&= -\frac{n+1}{n!\,\Gamma(q+n+1)\Gamma(q)} - \frac12\sum_{k=0}^{n-1} \frac{(n-2k-1)^2}{(k+1)!\,(n-k)!}\,\frac{1}{\Gamma(q+k+1)\Gamma(q+n-k)} \;<\; 0.
\end{aligned}$$
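The inequality $f_{q+1}f_{q-1}-f_q^2<0$ can be spot-checked numerically by truncating the series (the values of $q$ and $x$ below are illustrative choices of ours):

```python
import math

def f(q, z, terms=100):
    # f_q(z) = sum_{n>=0} z^n / (n! Gamma(q+n)), truncated
    return sum(z**n / (math.factorial(n) * math.gamma(q + n))
               for n in range(terms))

# f_{q+1}(x) f_{q-1}(x) - f_q(x)^2 < 0 for q > 1 and x > 0
for q in (1.2, 2.0, 5.0):
    for x in (0.1, 1.0, 10.0, 100.0):
        print(q, x, f(q + 1, x) * f(q - 1, x) - f(q, x)**2 < 0)
```

Every line prints `True`; the defect is of relative order $1/(2\sqrt x)$ for large $x$, so it stays well above rounding error.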
Maximum likelihood estimation, general case. Even less easy. It is meaningless to ask for concavity of the log of the density parameterized by $(\theta_1,\theta_2,\rho)$, since the domain of definition

$$\{(\theta_1,\theta_2,\rho)\,;\ \theta_i<0,\ \theta_1\theta_2>\rho>0\}$$

is not convex (watch the boundary $\theta_1=\theta_2$ and $\theta_1\theta_2=\rho$ to be convinced). Taking $\rho = r\theta_1\theta_2$, the domain $(-\infty,0)^2\times[0,1)$ for $(\theta_1,\theta_2,r)$ is more manageable, but for fixed $r$ the log of the density is not concave with respect to $(\theta_1,\theta_2)$ since, up to a linear function, it is a function of the product $\theta_1\theta_2$. This excludes concavity on the product space $(-\infty,0)^2$ (watch the determinant of the Hessian matrix of $(\theta_1,\theta_2)\mapsto h(\theta_1\theta_2)$ when $\theta_1\theta_2$ becomes close to zero).
$(\theta_1,\theta_2,r)$ can be estimated by writing the three equations obtained by cancelling the gradient of the log likelihood (the proof that the log likelihood goes to $-\infty$ at the boundary of the domain is quite delicate). One computes $\theta_1$ and $\theta_2$ in terms of $r$, which becomes the solution of the following equation, where $(\overline X, \overline Y)$ is the empirical mean:

$$0 = g_N(r) = r-1+\frac{q}{\overline X\,\overline Y}\,\frac1N\sum_{i=1}^N X_iY_i\,\frac{f_{q+1}}{f_q}\!\left(\frac{r}{(1-r)^2}\,\frac{q^2X_iY_i}{\overline X\,\overline Y}\right).$$

We content ourselves with studying it for large $N$. Using the law of large numbers, the equation becomes

$$0 = g(r) = r-1+\frac{q}{E(X)E(Y)}\,E\!\left[XY\,\frac{f_{q+1}}{f_q}\!\left(\frac{r}{(1-r)^2}\,\frac{q^2XY}{E(X)E(Y)}\right)\right].$$

An important remark is that $g(r)$ depends only on the distribution of $\left(\frac{X}{E(X)},\,\frac{Y}{E(Y)}\right)$. Therefore, without loss of generality, we may assume that the distribution of $(X,Y)$ is standard with parameter $r_0$.
The crucial function $g_{r_0}$. As explained, the above function $g$ depends only on the parameter $r_0$, and is

$$g_{r_0}(r) = r-1+\frac{(1-r_0)^2}{q}\,E\!\left[XY\,\frac{f_{q+1}}{f_q}\!\left(\frac{r}{(1-r)^2}\,(1-r_0)^2XY\right)\right].$$

It is easily seen that $g_{r_0}(0) = \frac{r_0}{q} > 0$. Knowing that $f_q(u) \sim \frac{1}{2\sqrt\pi}\,u^{-\frac q2+\frac14}\,e^{2\sqrt u}$ as $u\to\infty$, we get $\frac{f_{q+1}(u)}{f_q(u)} \sim \frac{1}{\sqrt u}$, and therefore, as $r\to1$,

$$g_{r_0}(r) \sim -c\,(1-r) \quad\text{with}\quad c = 1-\frac{1-r_0}{q}\,E(\sqrt{XY}).$$

To see that $c>0$, observe that

$$E(\sqrt{XY}) = \frac{1}{1-r_0}\,\frac{\Gamma(q+\frac12)^2}{\Gamma(q)^2}\,{}_2F_1\!\left(-\tfrac12,-\tfrac12;q;r_0\right) \le \frac{1}{1-r_0}\,\frac{\Gamma(q+\frac12)^2}{\Gamma(q)^2}\,{}_2F_1\!\left(-\tfrac12,-\tfrac12;q;1\right) = \frac{q}{1-r_0}.$$
Finally, $g_{r_0}(r_0) = 0$. Possibly $g_{r_0}$ is convex on $[0,1]$, but studying $g_{r_0}''$ is quite difficult, and convexity may fail when $r_0$ is close to 1. However, all simulations show that $g_{r_0}(r)>0$ if $0<r<r_0$ and $g_{r_0}(r)<0$ if $r_0<r<1$. In this case $r_0$ is the only solution of $g_{r_0}(r)=0$.