Estimation, Evaluation, and Selection of Actuarial Models


01.08.2014 Views

CHAPTER 3. SAMPLING PROPERTIES OF ESTIMATORS

Constructing confidence intervals is usually very difficult. However, there is a method for constructing approximate confidence intervals that is often accessible. Suppose we have an estimator $\hat\theta$ of the parameter $\theta$ such that $E(\hat\theta) \doteq \theta$, $\mathrm{Var}(\hat\theta) \doteq v(\theta)$, and $\hat\theta$ has approximately a normal distribution. With all these approximations, we have

$$1 - \alpha \doteq \Pr\left(-z_{\alpha/2} \le \frac{\hat\theta - \theta}{\sqrt{v(\theta)}} \le z_{\alpha/2}\right) \qquad (3.2)$$

and solving for $\theta$ produces the desired interval. Sometimes this is difficult to do (due to the appearance of $\theta$ in the denominator) and so we may replace $v(\theta)$ in (3.2) with $v(\hat\theta)$ to obtain a further approximation

$$1 - \alpha \doteq \Pr\left(\hat\theta - z_{\alpha/2}\sqrt{v(\hat\theta)} \le \theta \le \hat\theta + z_{\alpha/2}\sqrt{v(\hat\theta)}\right) \qquad (3.3)$$

where $z_\alpha$ is the $100(1-\alpha)$th percentile of the standard normal distribution.

Example 3.24 Use (3.2) and (3.3) to construct approximate 95% confidence intervals for $p(2)$ using Data Set A.

From (3.2),

$$0.95 \doteq \Pr\left(-1.96 \le \frac{p_n(2) - p(2)}{\sqrt{p(2)[1 - p(2)]/n}} \le 1.96\right).$$

Solve this by making the inequality an equality and then squaring both sides to obtain (dropping the argument of (2) for simplicity),

$$\begin{aligned}
\frac{(p_n - p)^2\, n}{p(1 - p)} &= 1.96^2 \\
n p_n^2 - 2 n p\, p_n + n p^2 &= 1.96^2 p - 1.96^2 p^2 \\
0 &= (n + 1.96^2) p^2 - (2 n p_n + 1.96^2) p + n p_n^2 \\
p &= \frac{2 n p_n + 1.96^2 \pm \sqrt{(2 n p_n + 1.96^2)^2 - 4 (n + 1.96^2)\, n p_n^2}}{2 (n + 1.96^2)}
\end{aligned}$$

which provides the two endpoints of the confidence interval. Inserting the numbers from Data Set A ($p_n = 0.017043$, $n = 94{,}935$) produces a confidence interval of (0.016239, 0.017886). Equation (3.3) provides the confidence interval directly as

$$p_n \pm 1.96 \sqrt{\frac{p_n (1 - p_n)}{n}}.$$

Inserting the numbers from Data Set A gives 0.017043 ± 0.000823 for an interval of (0.016220, 0.017866). The answers for the two methods are very similar, which is the case when the sample size is large. The results are reasonable, because it is well known that the normal distribution is a reasonable approximation to the binomial. □
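The arithmetic in Example 3.24 can be reproduced with a short script. The following is a minimal sketch, not part of the original text: the function names `interval_from_3_2` and `interval_from_3_3` are made up for illustration, and the only inputs are the values of $p_n$ and $n$ quoted in the example for Data Set A.

```python
import math


def interval_from_3_2(p_n, n, z=1.96):
    """Endpoints obtained by solving (3.2) for p: the two roots of
    (n + z^2) p^2 - (2 n p_n + z^2) p + n p_n^2 = 0."""
    a = n + z ** 2
    b = -(2 * n * p_n + z ** 2)
    c = n * p_n ** 2
    root = math.sqrt(b * b - 4 * a * c)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)


def interval_from_3_3(p_n, n, z=1.96):
    """Endpoints from (3.3): the variance is evaluated at the estimate p_n."""
    half_width = z * math.sqrt(p_n * (1 - p_n) / n)
    return p_n - half_width, p_n + half_width


# Values quoted in Example 3.24 for Data Set A.
p_n, n = 0.017043, 94_935

# The text reports (0.016239, 0.017886) for (3.2)
# and (0.016220, 0.017866) for (3.3).
print([round(x, 6) for x in interval_from_3_2(p_n, n)])
print([round(x, 6) for x in interval_from_3_3(p_n, n)])
```

Rerunning the two functions with a much smaller `n` is a quick way to see the intervals drift apart, which is consistent with the remark that they agree closely when the sample size is large.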

3.3. VARIANCE AND CONFIDENCE INTERVALS

When data are censored or truncated the matter becomes more complex. Counts no longer have the binomial distribution and therefore the distribution of the estimator is harder to obtain. While there are proofs available to back up the results presented here, they will not be provided. Instead, an attempt will be made to indicate why the results are reasonable.

Consider the Kaplan-Meier product-limit estimator of $S(t)$. It is the product of a number of terms of the form $(r_j - s_j)/r_j$ where $r_j$ was viewed as the number available to die at age $y_j$ and $s_j$ is the number who actually did so. Assume that the death ages and the number available to die are fixed, so that the value of $s_j$ is the only random quantity. As a random variable, $S_j$ has a binomial distribution based on a sample of $r_j$ lives and success probability $[S(y_{j-1}) - S(y_j)]/S(y_{j-1})$. The probability arises from the fact that those available to die were known to be alive at the previous death age. For one of these terms,

$$E\left(\frac{r_j - S_j}{r_j}\right) = \frac{r_j - r_j [S(y_{j-1}) - S(y_j)]/S(y_{j-1})}{r_j} = \frac{S(y_j)}{S(y_{j-1})}.$$

That is, this ratio is an unbiased estimator of the probability of surviving from one death age to the next one. Furthermore,

$$\mathrm{Var}\left(\frac{r_j - S_j}{r_j}\right) = \frac{r_j \dfrac{S(y_{j-1}) - S(y_j)}{S(y_{j-1})} \left[1 - \dfrac{S(y_{j-1}) - S(y_j)}{S(y_{j-1})}\right]}{r_j^2} = \frac{[S(y_{j-1}) - S(y_j)]\, S(y_j)}{r_j\, S(y_{j-1})^2}.$$

Now consider the estimated survival probability at one of the death ages. Its expected value is

$$E[\hat S(y_j)] = E\left[\prod_{i=1}^{j} \left(\frac{r_i - S_i}{r_i}\right)\right] = \prod_{i=1}^{j} E\left(\frac{r_i - S_i}{r_i}\right) = \prod_{i=1}^{j} \frac{S(y_i)}{S(y_{i-1})} = \frac{S(y_j)}{S(y_0)}$$

where $y_0$ is the smallest observed age in the sample. In order to bring the expectation inside the product, it was assumed that the $S$-values are independent. The result demonstrates that at the death ages, the estimator is unbiased.

With regard to the variance, we first need a general result concerning the variance of a product of independent random variables.
Let $X_1, \ldots, X_n$ be independent random variables where $E(X_j) = \mu_j$ and $\mathrm{Var}(X_j) = \sigma_j^2$. Then,

$$\begin{aligned}
\mathrm{Var}(X_1 \cdots X_n) &= E(X_1^2 \cdots X_n^2) - E(X_1 \cdots X_n)^2 \\
&= E(X_1^2) \cdots E(X_n^2) - E(X_1)^2 \cdots E(X_n)^2 \\
&= (\mu_1^2 + \sigma_1^2) \cdots (\mu_n^2 + \sigma_n^2) - \mu_1^2 \cdots \mu_n^2.
\end{aligned}$$
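The product-variance identity can be checked numerically on terms of the Kaplan-Meier form $(r_j - S_j)/r_j$. The sketch below is illustrative only: the function names are invented here, and the risk-set sizes `rs` and conditional death probabilities `qs` are hypothetical values, not taken from any data set in the text. Here `q` stands for $[S(y_{j-1}) - S(y_j)]/S(y_{j-1})$, so each term has mean $1 - q$ and variance $q(1 - q)/r$, as derived for a single term above.

```python
from itertools import product
from math import comb, prod


def km_term_moments(r, q):
    """Mean and variance of (r - S)/r when S ~ Binomial(r, q)."""
    return 1 - q, q * (1 - q) / r


def product_variance(means, variances):
    """Var(X_1 ... X_n) for independent X_j, using the identity
    prod(mu_j^2 + sigma_j^2) - prod(mu_j^2)."""
    second_moment = prod(m * m + v for m, v in zip(means, variances))
    return second_moment - prod(means) ** 2


def exact_product_variance(rs, qs):
    """Brute-force check: enumerate every joint outcome of the
    independent binomial death counts and compute the variance directly."""
    e1 = e2 = 0.0
    for counts in product(*(range(r + 1) for r in rs)):
        prob = prod(
            comb(r, s) * q ** s * (1 - q) ** (r - s)
            for r, s, q in zip(rs, counts, qs)
        )
        value = prod((r - s) / r for r, s in zip(rs, counts))
        e1 += prob * value
        e2 += prob * value * value
    return e2 - e1 * e1


rs = [5, 4, 3]           # hypothetical numbers available to die
qs = [0.20, 0.25, 0.10]  # hypothetical conditional death probabilities

means, variances = zip(*(km_term_moments(r, q) for r, q in zip(rs, qs)))
print(product_variance(means, variances))
print(exact_product_variance(rs, qs))
```

The two printed values should agree up to floating-point rounding. The enumeration is only practical for small risk sets, which is why the example sizes are kept tiny.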

