RESEARCH METHOD COHEN ok


12.01.2015 Views

DEGREES OF FREEDOM 527

Box 24.22
A 2 × 5 contingency table for chi-square

            Music     Physics   Maths     German    Spanish   Total
Males       7         11        25        4         3         50
            14.0 %    22.0 %    50 %      8.0 %     6 %       100 %
Females     17        38        73        12        1         141
            12.1 %    27.0 %    52 %      8.5 %     0.7 %     100 %
Total       24        49        98        16        4         191
            12.6 %    25.7 %    51 %      8.4 %     2.1 %     100 %

three cells out of the ten (two rows – males and females – with five cells in each for each of the rating categories). This means that 30 per cent of the cells contain fewer than five cases; even though a computer will calculate a chi-square statistic, it means that the result is unreliable. This highlights the point made in Chapter 4 about sampling, that the subsample size has to be large. For example, if each category here were to contain five cases then it would mean that the minimum sample size would be fifty (10 × 5), assuming that the data are evenly spread. In the example here, even though the sample size is much larger (191) it still does not guarantee that the 20 per cent rule will be observed, as the data are unevenly spread.

Because of the need to ensure that at least 80 per cent of the cells of a chi-square contingency table contain five or more cases if confidence is to be placed in the results, it may not be feasible to calculate the chi-square statistic if only a small sample is being used. Hence the researcher would tend to use this statistic for larger-scale survey data. Other tests could be used if the problem of low cell frequencies obtains, e.g. the binomial test and, more widely used, the Fisher exact test (Cohen and Holliday 1996: 218–20).

The required minimum number of cases in each cell renders the chi-square statistic problematic, and, apart from with nominal data, there are alternative statistics that can be calculated and which overcome this problem (e.g. the Mann-Whitney, Wilcoxon, Kruskal-Wallis and Friedman tests for non-parametric – ordinal – data, and the t-test and analysis of variance test for parametric – interval and ratio – data) (see http://www.routledge.com/textbooks/9780415368780 – Chapter 24, file SPSS Manual 24.5).

Methods of analysing data cast into 2 × 2 contingency tables by means of the chi-square test are generally well covered in research methods books. Increasingly, however, educational data are classified in multiple rather than two-dimensional formats. Everitt (1977) provides a useful account of methods for analysing multidimensional tables. Two significance tests for very small samples are given in the accompanying web site: http://www.routledge.com/textbooks/9780415368780 – Chapter 24, file 24.3.doc.

Degrees of freedom

The chi-square statistic introduces the term degrees of freedom. Gorard (2001: 233) suggests that ‘the degrees of freedom is the number of scores we need to know before we can calculate the rest’. Cohen and Holliday (1996) explain the term clearly:

Suppose we have to select any five numbers. We have complete freedom of choice as to what the numbers are. So, we have five degrees of freedom. Suppose however we are then told that the five numbers must have a total value of 25. We will have complete freedom of choice to select four numbers but the fifth will be dependent on the other four. Let’s say that the first four numbers we select are 7, 8, 9, and 10, which total 34, then if the total value of the five numbers is to be 25, the fifth number must be −9.

7 + 8 + 9 + 10 − 9 = 25

A restriction has been placed on one of the observations; only four are free to vary; the fifth has lost its freedom. In our example then df = 4, that is N − 1 = 5 − 1 = 4.

Suppose now that we are told to select any five numbers, the first two of which have to total 9, and the total value of all five has to be 25. One restriction is apparent when we wish the total of the first two numbers to be 9. Another restriction is apparent in the requirement that all five numbers must total 25. In other words we have lost two degrees of freedom in our example. It leaves us with df = 3, that is, N − 2 = 5 − 2 = 3.
(Cohen and Holliday 1996: 113)

For a cross-tabulation (a contingency table), degrees of freedom refer to the freedom with which the researcher is able to assign values to the cells, given fixed marginal totals, usually given as (number of rows − 1) × (number of columns − 1). For the 2 × 5 table in Box 24.22 this gives (2 − 1) × (5 − 1) = 4. There are many variants of this, and readers will need to consult more detailed texts to explore this issue. We do not dwell on degrees of freedom here, as it is automatically calculated and addressed in subsequent calculations by most statistical software packages such as SPSS.

Measuring association

Much educational research is concerned with establishing interrelationships among variables. We may wish to know, for example, how delinquency is related to social class background; whether an association exists between the number of years spent in full-time education and subsequent annual income; whether there is a link between personality and achievement. What, for example, is the relationship, if any, between membership of a public library and social class status? Is there a relationship between social class background and placement in different strata of the secondary school curriculum? Is there a relationship between gender and success or failure in ‘first time’ driving test results? There are several simple measures of association readily available to the researcher to help her test these sorts of relationships.
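Returning briefly to the chi-square discussion of Box 24.22 and to degrees of freedom, the following is a minimal sketch in plain Python of how the low-cell-count check and the degrees of freedom might be worked through by hand. The function names are illustrative and not taken from the text or from SPSS. The sketch also checks the expected counts (row total × column total / N), which is the form in which the rule is usually stated; that step is an addition to the text, which counts observed cases.

```python
# Observed counts from Box 24.22 (rows: males, females; columns: the five subjects).
observed = [
    [7, 11, 25, 4, 3],     # males
    [17, 38, 73, 12, 1],   # females
]


def expected_counts(table):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def share_below(table, threshold=5):
    """Proportion of cells whose value is below the threshold."""
    cells = [value for row in table for value in row]
    return sum(value < threshold for value in cells) / len(cells)


def chi_square(table):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a contingency table with fixed marginal totals."""
    return (len(table) - 1) * (len(table[0]) - 1)


expected = expected_counts(observed)
print("share of observed cells below 5:", share_below(observed))
print("share of expected cells below 5:", share_below(expected))
print("chi-square statistic:", round(chi_square(observed), 3))
print("degrees of freedom:", degrees_of_freedom(observed))

# The Cohen and Holliday illustration: four numbers chosen freely,
# the fifth is then forced by the requirement that all five total 25.
free_choices = [7, 8, 9, 10]
fifth = 25 - sum(free_choices)
print("forced fifth number:", fifth)
```

With the counts in Box 24.22, three of the ten cells fall below five on either definition, so the 20 per cent rule is breached and the statistic should be treated with caution whatever value the calculation returns.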
We have selected the most widely used measures of association here and set them out in Box 24.23. Of these, the two most commonly used correlations are the Spearman rank order correlation for ordinal data and the Pearson product-moment correlation for interval and ratio data.

At this point it is pertinent to say a few words about some of the terms used in Box 24.23 to describe the nature of variables. Cohen and Holliday (1982; 1996) provide worked examples of the appropriate use and limitations of the correlational techniques outlined in Box 24.23, together with other measures of association such as Kruskal’s gamma, Somers’ d, and Guttman’s lambda (see http://www.routledge.com/textbooks/9780415368780 – Chapter 24, file 24.13.ppt and SPSS Manual 24.6).

Look at the words used at the top of Box 24.23 to explain the nature of variables in connection with the measure called the Pearson product moment, r. The variables, we learn, are ‘continuous’ and at the ‘interval’ or the ‘ratio’ scale of measurement. A continuous variable is one that, theoretically at least, can take any value between two points on a scale. Weight, for example, is a continuous variable; so too is time, so also is height. Weight, time and height can take on any number of possible values between nought and infinity, the feasibility of measuring them across such a range being limited only by the variability of suitable measuring instruments.

Turning again to Box 24.23, we read in connection with the second measure shown there (rank order or Kendall’s tau) that the two continuous variables are at the ordinal scale of measurement.

The variables involved in connection with the phi coefficient measure of association (halfway down Box 24.23) are described as ‘true dichotomies’ and at the nominal scale of measurement. Truly dichotomous variables (such as sex or driving test result) can take only two values (male or female; pass or fail).
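As an illustration of three of the measures named above, here is a minimal sketch in plain Python of the Pearson product-moment correlation, the Spearman rank order correlation (computed as Pearson's r on ranks) and the phi coefficient for two true dichotomies. All data values and function names are hypothetical, invented for the example; they are not drawn from Box 24.23 or from any dataset in the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation for two interval or ratio variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank order correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table [[a, b], [c, d]] of two true dichotomies."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical data: years in full-time education and annual income (thousands).
years = [10, 11, 12, 14, 16, 18]
income = [15, 18, 17, 24, 30, 45]
print("Pearson r:", round(pearson_r(years, income), 3))
print("Spearman rho:", round(spearman_rho(years, income), 3))

# Hypothetical 2 x 2 table: gender (rows) by first-time driving test result (pass, fail).
print("phi:", round(phi_coefficient(30, 20, 20, 30), 3))
```

Which function is appropriate depends on the scale of measurement, as the text notes: Pearson for interval and ratio data, Spearman for ordinal data, and phi for two nominal variables that are each true dichotomies.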
To conclude our explanation of terminology, readers should note the use of the term ‘discrete variable’ in the description of the third correlation ratio (eta) in Box 24.23. We said earlier that a continuous variable can take on any value between two points on a scale. A discrete variable, however,

