Etude des marchés d'assurance non-vie à l'aide d'équilibres de ...
tel-00703797, version 2 - 7 Jun 2012
1.2. GLMs, a brief introduction

Model adequacy
The deviance, which is one way to measure the model's adequacy to the data and generalizes the R² measure of linear models, is defined by

D(y, ˆπ) = 2(ln(L(y1, . . . , yn, y1, . . . , yn)) − ln(L(ˆπ1, . . . , ˆπn, y1, . . . , yn))),

where ˆπ is the vector of fitted means obtained from the estimated coefficient vector ˆβ. The "best" model is the one having the lowest deviance. However, if all responses are binary data, the first term — the log-likelihood of the saturated model — is zero, since each term yi ln yi + (1 − yi) ln(1 − yi) vanishes for yi ∈ {0, 1}. So in practice, we consider the deviance simply as

D(y, ˆπ) = −2 ln(L(ˆπ1, . . . , ˆπn, y1, . . . , yn)).
Furthermore, the deviance is used as a relative measure to compare two models. Most software, in particular R, reports two deviances from the GLM fitting function: the null deviance and the (residual) deviance. The null deviance is the deviance of the model containing only an intercept (or only an offset, if one is specified), i.e. when p = 1 and X is a single column of ones∗. The second deviance is the deviance D(y, ˆπ) of the model with the p explanatory variables. Note that if there are as many parameters as there are observations, the deviance is the best possible, but the model does not explain anything.
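To make the two deviances concrete, here is a minimal sketch in Python (the text works with R, but the computation is identical); the binary responses and fitted probabilities below are made up for illustration:

```python
import math

def binary_deviance(y, pi_hat):
    """D(y, pi_hat) = -2 ln L for binary responses y in {0, 1}."""
    log_lik = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                  for yi, pi in zip(y, pi_hat))
    return -2 * log_lik

# Illustrative binary responses and hypothetical fitted probabilities
y = [1, 1, 0, 1, 0]
pi_hat = [0.8, 0.7, 0.3, 0.9, 0.2]

# Null deviance: the intercept-only model fits every observation
# with the empirical mean of y
pi_null = sum(y) / len(y)
null_deviance = binary_deviance(y, [pi_null] * len(y))

# Residual deviance of the model with explanatory variables
model_deviance = binary_deviance(y, pi_hat)

# The model with covariates achieves a lower (better) deviance
print(round(null_deviance, 3), round(model_deviance, 3))  # 6.73 2.53
```

The gap between the null deviance and the residual deviance measures how much of the heterogeneity the explanatory variables capture.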
Another criterion, introduced by Akaike in the 1970s, is the Akaike Information Criterion (AIC), which is also an adequacy measure for statistical models. Unlike the deviance, the AIC penalizes overfitted models, i.e. models with too many parameters compared to the size of the dataset. The AIC is defined by

AIC(y, ˆπ) = 2k − 2 ln(L(ˆπ1, . . . , ˆπn, y1, . . . , yn)),

where k is the number of parameters, i.e. the length of β. This criterion is a trade-off between the improvement in log-likelihood brought by additional variables and the cost of including them in the model. To compare two models with different numbers of parameters, we look for the one with the lowest AIC.
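The trade-off can be illustrated numerically; the log-likelihoods and parameter counts below are hypothetical:

```python
def aic(log_likelihood, k):
    """AIC = 2k - 2 ln L; lower is better."""
    return 2 * k - 2 * log_likelihood

# Hypothetical fitted log-likelihoods for two nested models
ll_small, k_small = -240.0, 3   # 3 parameters
ll_large, k_large = -238.5, 6   # 3 extra variables, slightly better fit

print(aic(ll_small, k_small))   # 486.0
print(aic(ll_large, k_large))   # 489.0
```

Here the gain of 1.5 in log-likelihood does not justify the 3 extra parameters: the smaller model has the lower AIC and is preferred.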
In a linear model, the analysis of residuals (which are assumed to be independent and identically distributed Gaussian variables) may reveal that the model is inappropriate. Typically, we can plot the residuals against the fitted values. For GLMs, the analysis of residuals is much more complex, because we lose the normality assumption. Furthermore, for binary (as opposed to binomial) data, the plot of residuals exhibits straight lines, which are hard to interpret, see Appendix 1.8.2. We believe that residual analysis is not appropriate for binary regressions.
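The straight lines can be seen directly from the raw residuals: for a binary response, the residual yi − ˆπi can only take one of two values at any fitted probability, namely −ˆπi (when yi = 0) or 1 − ˆπi (when yi = 1). A minimal sketch with an illustrative grid of fitted probabilities:

```python
# For binary y, the raw residual y - pi_hat takes only two possible values
# at each fitted probability: -pi_hat (y = 0) or 1 - pi_hat (y = 1).
# Plotted against the fitted values, the points therefore fall on two
# parallel lines of slope -1, which is what makes the plot hard to read.
pi_grid = [0.1, 0.3, 0.5, 0.7, 0.9]
residuals = {(pi, y): y - pi for pi in pi_grid for y in (0, 1)}

for pi in pi_grid:
    assert residuals[(pi, 0)] == -pi       # lower line
    assert residuals[(pi, 1)] == 1 - pi    # upper line
```

With binomial (grouped) data, each fitted value aggregates several responses, so the residuals spread out instead of collapsing onto two lines.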
Variable selection
From the asymptotic normal distribution of the maximum likelihood estimator, we can derive confidence intervals as well as hypothesis tests for the coefficients. Therefore, a p-value is available for each coefficient of the regression, which helps us keep only the most significant variables. However, since removing one variable impacts the significance of the others, it can be hard to find the optimal set of explanatory variables.

There are two approaches: forward selection, i.e. starting from the null model, we add the most significant variable at each step; or backward elimination, i.e. starting from the full model, we remove the least significant variable at each step.
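Forward selection can be sketched as a greedy loop over a selection criterion such as the AIC. In the sketch below, a hypothetical table of AIC values stands in for the GLM refit that each step would require in practice (e.g. R's glm() followed by AIC()); the variable names and all numbers are made up:

```python
# Hypothetical AIC for each candidate subset of explanatory variables,
# standing in for an actual model refit at each step.
AIC_TABLE = {
    frozenset(): 120.0,                           # null model
    frozenset({"age"}): 100.0,
    frozenset({"power"}): 110.0,
    frozenset({"region"}): 118.0,
    frozenset({"age", "power"}): 95.0,
    frozenset({"age", "region"}): 101.0,
    frozenset({"power", "region"}): 108.0,
    frozenset({"age", "power", "region"}): 96.0,  # full model
}

def forward_selection(candidates, aic_of):
    """Start from the null model; at each step add the variable that lowers
    the AIC the most, and stop when no addition improves it."""
    selected = frozenset()
    best_aic = aic_of(selected)
    while candidates - selected:
        trials = [(aic_of(selected | {v}), selected | {v})
                  for v in candidates - selected]
        trial_aic, trial_set = min(trials, key=lambda t: t[0])
        if trial_aic >= best_aic:
            break  # no further improvement: stop
        best_aic, selected = trial_aic, trial_set
    return selected, best_aic

selected, best = forward_selection({"age", "power", "region"}, AIC_TABLE.get)
print(sorted(selected), best)  # ['age', 'power'] 95.0
```

Backward elimination follows the same pattern in reverse, starting from the full model and removing the variable whose deletion lowers the AIC the most. Neither greedy scheme is guaranteed to find the globally optimal subset, which is why the two approaches can disagree.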
∗. It means that all the heterogeneity of the data comes from the random component.