Etude des marchés d'assurance non-vie à l'aide d'équilibres de ...
tel-00703797, version 2 - 7 Jun 2012
Binary regression and model selection
1.6. Other regression models
As for GLMs, binary regression means that we assume $Y_i$ follows a Bernoulli distribution $B(\pi_i)$, with $\pi_i$ linked to the explanatory variables. The model equation is
$$\pi_i = g^{-1}(\eta_i),$$
where $g$ is the link function and $\eta_i$ the predictor. Unlike GLMs, where the predictor is linear, for GAMs the predictor is a sum of smooth functions:
$$\alpha_0 + \sum_{j=1}^{p} f_j(X_j) \quad \text{or} \quad \alpha_0 + \sum_{i=1}^{p_1} \alpha_i X_i + \sum_{j=1}^{p_2} f_j(X_j),$$
the latter being a semi-parametric approach. As suggested in Hastie and Tibshirani (1995), linear terms can be used to avoid having too many smooth terms, which take longer to compute than linear terms. For instance, if a covariate represents the date or the time of events, it is "often" better to consider the effect as an increasing or decreasing trend with a single parameter $\alpha_i$.
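To make the semi-parametric predictor concrete, here is a minimal sketch in Python. It assumes a logit link (the text does not fix $g$ at this point), and the function `smooth_age`, the covariates `year` and `age`, and all coefficient values are hypothetical placeholders rather than estimates from the dataset.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link (an assumed choice of g): eta -> probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def smooth_age(age):
    """Hypothetical smooth effect f_j(X_j); in practice it is estimated from data."""
    return 0.5 * math.sin(age / 10.0)

def lapse_probability(year, age, alpha0=-2.0, alpha_year=0.1):
    """Semi-parametric predictor: intercept + one linear term + one smooth term."""
    eta = alpha0 + alpha_year * year + smooth_age(age)
    return inv_logit(eta)

if __name__ == "__main__":
    print(lapse_probability(year=3, age=40))
```

The date-like covariate enters through the single parameter `alpha_year`, while the other covariate keeps a flexible shape, which mirrors the second form of the predictor.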
As for GLMs, we are able to compute confidence intervals using the Gaussian asymptotic distribution of the estimators. Variable selection for GAMs is similar to that for GLMs. The true improvement is a higher degree of flexibility in modelling the effect of an explanatory variable on the response.
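As an illustration of an interval based on the Gaussian asymptotic distribution, the sketch below computes a Wald-type confidence interval for a single coefficient. The function name `wald_interval` and the numerical inputs are illustrative assumptions; the estimate and its standard error would come from the fitted model.

```python
from statistics import NormalDist

def wald_interval(estimate, std_error, level=0.95):
    """Confidence interval estimate +/- z * std_error under asymptotic normality."""
    # Two-sided Gaussian quantile, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error

if __name__ == "__main__":
    # Hypothetical coefficient estimate and standard error.
    print(wald_interval(0.10, 0.02))
```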
The procedure for variable selection is similar to the backward approach of GLMs, but a term is dropped only if neither a smooth function nor a linear function of this term is relevant. That is to say, a variable that appears poorly significant when modelled by a smooth function might be significant when modelled by a single linear term. We will use the following acceptance rules of Wood (2001) to drop an explanatory variable:
(a) Is the estimated degrees of freedom for the term close to 1?
(b) Does the plotted confidence interval band for the term include zero everywhere?
(c) Does the GCV score drop (or the REML score jump) when the term is dropped?
If the answer is "yes" to all three questions (a, b, c), then we should drop the term. If the answer is "yes" only to question (a), then we should try a linear term. Otherwise there is no general rule to apply. For all GAM computations, we use the recommended R package mgcv written by S. Wood.
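The acceptance rules above can be sketched as a small decision function. This is only an illustration of the logic stated in the text: the function name `wood_decision` and its boolean inputs are hypothetical, and judging whether each answer is "yes" (for example, how close to 1 the estimated degrees of freedom must be) remains a matter for the analyst.

```python
def wood_decision(edf_close_to_one, band_includes_zero, score_drops):
    """Map the answers to questions (a), (b), (c) to an action on the term."""
    if edf_close_to_one and band_includes_zero and score_drops:
        # "Yes" to all three questions: drop the term.
        return "drop"
    if edf_close_to_one and not band_includes_zero and not score_drops:
        # "Yes" to question (a) only: try a linear term.
        return "linear"
    # Any other combination: no general rule applies.
    return "no general rule"

if __name__ == "__main__":
    print(wood_decision(True, True, True))
    print(wood_decision(True, False, False))
    print(wood_decision(False, True, False))
```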
1.6.2 Application to the large dataset<br />
In Section 1.3.2, the GLM analysis of this large dataset revealed that the distribution channel strongly impacts the GLM outputs. In particular, the lapse gap between the tied-agent channel and the other channels is far larger than what we could expect. Moreover, the price sensitivity gap measured by the lapse deltas is also high. Let us see whether this still holds with GAM results.
For each channel and cover, we first estimate a GAM in which every term is modelled by a smooth function. We then apply Wood's rules to remove, linearize or categorize the explanatory variables. In Appendix 1.8.2, we provide the regression summary for one of the nine subsets.
Comments on regression summary<br />
In this subsection, we briefly comment on the nine regression summaries. Let us start with the Third-Party Liability cover. For the agent subset, for which we have a market proxy, we