Evaluation of the Australian Wage Subsidy Special Youth ...


02.06.2014

weighting, both the weighted (columns 2 and 4) and unweighted (columns 1 and 3) results are shown for each missing data approach. Of main interest is the comparison of the estimates across missing data approaches, that is, column 1 versus column 3, or column 2 versus column 4. The discussion of Table A2.2 concentrates on the weighted results, comparing the first panel (mean substitution) with the second panel (deletion of cases with missing data, which gives the smaller base of 2150 cases for which the 1984 survey has no information missing on any explanatory variable). Moving from mean substitution to the deletion approach for the weighted data changes which variables are statistically significant, because the size of the t-statistics changes: for example, ‘other city before aged 14’ becomes statistically significant, and ‘age in 1984’ becomes insignificant. It also changes the size of the coefficients on statistically significant variables: for example, both the coefficient and the t-statistic for ‘proportion of pre-June unemployment’ fall. In a probit, the coefficient size is not directly interpretable, so the interpretation of a positive influence of this variable on participation in SYETP is unchanged across the missing data approaches; the calculated marginal effect, however, would be affected. When a variable is not statistically significant, changing the missing data approach can also change the sign of its coefficient, as for ‘children 1984’. Although not discussed in detail here, it can be seen that substantively important changes in the estimates arise depending on the approach used. The chief variation is in which variables are statistically significant, so the choice of missing data treatment affects which coefficients are interpreted as statistically significant. The trade-off between bias, efficiency and the use of more information therefore affects the interpretation of results.
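The two missing data treatments compared above can be illustrated with a minimal sketch. It is not the procedure used to produce Table A2.2; the records and the variable names `age` and `siblings` are hypothetical and chosen only to show how each treatment changes the estimation sample.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing survey response.
# Variable names are illustrative and are not taken from the survey data.
records = [
    {"age": 17, "siblings": 2},
    {"age": 19, "siblings": None},
    {"age": None, "siblings": 1},
    {"age": 18, "siblings": 3},
    {"age": 20, "siblings": 0},
]
variables = ["age", "siblings"]


def mean_substitution(rows, names):
    """Replace each missing value with the mean of the observed values."""
    means = {
        name: mean(r[name] for r in rows if r[name] is not None)
        for name in names
    }
    return [
        {name: (r[name] if r[name] is not None else means[name]) for name in names}
        for r in rows
    ]


def casewise_deletion(rows, names):
    """Keep only rows with no missing value on any explanatory variable."""
    return [r for r in rows if all(r[name] is not None for name in names)]


filled = mean_substitution(records, variables)
complete = casewise_deletion(records, variables)

# Mean substitution keeps every case; casewise deletion shrinks the base.
print(len(filled), len(complete))
```

Mean substitution retains all cases but replaces missing values with a constant, whereas casewise deletion estimates on a smaller base, which is the same contrast as that between the full sample and the 2150 complete cases discussed in the text.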
Given this sensitivity to the missing data treatment, it may be worth applying the imputation algorithm of King et al. (2001) in future research, which is argued to outperform both the mean substitution and deletion methods. However, only the more commonly accepted approaches are dealt with here.

The effect of including a set of dummy variables for mean imputation is shown in the first two columns of Table 5.8. Missing information on parental occupation predicts failure perfectly, which presents the problem of collinearity, and these variables must be dropped from the regression to enable estimation. The missing information for parental qualifications, proportion of time spent unemployed and number of siblings is controlled for using the dummies. The estimates for the data where cases with missing information on these variables are dropped are given in Appendix Table A2.3. The results in columns one and two differ slightly from those of Table 5.8. Of course, the number of observations in columns one and two of Appendix Table A2.3 is lower, at 2150, because the observations with missing information are dropped, whereas in Table 5.8 it is 2368. The Akaike Information Criterion does not vary much in size between the models, and so does not assist much in model selection here (because the sample and variables change between the models, this fit measure is more relevant). The arguments of King et al. (2001) suggest that dropping those cases, that is, casewise deletion, gives the correct standard errors, although the estimates do suffer from bias. It is then a subjective choice whether the analyst is willing to trade off bias; however, correct standard error estimation is essential if the statistical significance of the coefficients is important to the analysis. In light of this, it is deemed more useful to apply casewise deletion than mean imputation dummies.

5.7.2 Sample reduction effects on model of SYETP participation

Columns 3 and 4 of Table 5.8 give the probit results for SYETP participation for the final data set after sample reduction. Column 3 [136] shows the unweighted results, and column 4 shows the results weighted with the survey weight. [137] As for the whole sample discussed earlier, the variables that are statistically significant alter with the use of the weight. The variables that gain significance when using the weight are married in 1984, attended a private school, interviewed in Western Australia/Tasmania, longest job held before 1984 was 3 years or more, and mostly lived in a city until aged 14.
The variables that lose statistical significance are CEP referrals in 1984, father held a post-school qualification when respondent aged 14, and mother worked as plant operative when respondent aged 14. A worrying change is the loss of statistical significance for CEP referrals in 1984. This is discussed further in the later modelling of the treatment effect of SYETP, because this is a key element of the identifying restriction in the bivariate probit of employment

[136] The results in column 3 are equivalent to the univariate probit estimated in Richardson (1998).
[137] Note that no account has been made of sample reduction from the 1984 survey in this weight. This is treated next.

