CONSTRUCTING A TEST

(A − B) / ½N

where
A = the number of correct scores from the high scoring group
B = the number of correct scores from the low scoring group
N = the total number of students in the two groups.

Suppose all 10 students from the high scoring group answered the item correctly and 2 students from the low scoring group answered the item correctly. The formula would work out thus:

(10 − 2) / ½(10 + 10) = 8/10 = 0.80 (index of discriminability)

The maximum index of discriminability is 1.00. Any item whose index of discriminability is less than 0.67, i.e. is too undiscriminating, should be reviewed first to find out whether this is due to ambiguity in the wording or possible clues in the wording. If this is not the case, then whether the researcher uses an item with an index lower than 0.67 is a matter of judgement. It would appear, then, that the item in the example would be appropriate to use in a test. For a further discussion of item discriminability see Linn (1993) and Aiken (2003).

One can use the discriminability index to examine the effectiveness of distractors. This is based on the premise that an effective distractor should attract more students from a low scoring group than from a high scoring group. Consider the following example, where low and high scoring groups are identified:

                     A    B    C
Top 10 students     10    0    2
Bottom 10 students   8    0   10

In example A the option attracts more responses (10) from the top 10 students than from the bottom 10 (8): it discriminates positively, with a discriminability index of 0.20, but because it draws more high scorers than low scorers it is a poor distractor. Example B is an ineffective distractor because it attracted nobody from either group. Example C is an effective distractor because it attracts far more students from the bottom 10 (10) than from the top 10 (2). However, in this case any ambiguities must be ruled out before the discriminating power can be improved.
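As an illustrative sketch only (the function and variable names are ours, not from the text), the discriminability index and the distractor comparison above can be computed as follows:

```python
def discriminability_index(high_correct, low_correct, total_students):
    """Index of discriminability: (A - B) / (N/2), where A and B are the
    numbers of correct answers in the high- and low-scoring groups and
    N is the total number of students in the two groups."""
    return (high_correct - low_correct) / (total_students / 2)

# Worked example from the text: all 10 high scorers and 2 of the
# 10 low scorers answered the item correctly
print(discriminability_index(10, 2, 20))  # 0.8

# Distractor check: an effective distractor should attract more
# low scorers than high scorers (examples A, B and C from the table)
distractors = {"A": (10, 8), "B": (0, 0), "C": (2, 10)}
for name, (top, bottom) in distractors.items():
    verdict = "effective" if bottom > top else "ineffective"
    print(name, verdict)  # A ineffective, B ineffective, C effective
```

As the output shows, only distractor C passes the rule of thumb, matching the discussion above.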
Distractors are the stuff of multiple choice items, where incorrect alternatives are offered alongside the correct one, and students have to select the correct alternative. Here a simple frequency count of the number of times a particular alternative is selected will provide information on the effectiveness of the distractor: if it is selected many times then it is working effectively; if it is seldom or never selected then it is not working effectively and it should be replaced.

If we wish to calculate the item difficulty of a test, we can use the following formula:

(A / N) × 100

where
A = the number of students who answered the item correctly
N = the total number of students who attempted the item.

Hence if 12 students out of a class of 20 answered the item correctly, the formula would work out thus:

(12 / 20) × 100 = 60 per cent

The maximum index of difficulty is 100 per cent. Items falling below 33 per cent and above 67 per cent are likely to be too difficult and too easy respectively. It would appear, then, that this item would be appropriate to use in a test. Here, again, whether the researcher uses an item with an index of difficulty below or above the cut-off points is a matter of judgement. In a norm-referenced test the item difficulty should be around 50 per cent (Frisbie 1981). For further discussion of item difficulty see Linn (1993) and Hanna (1993).

Given that the researcher can know the degree of item discriminability and difficulty only once the test has been undertaken, there is an unavoidable
need to pilot home-grown tests. Items with limited discriminability and limited difficulty must be weeded out and replaced; those items with the greatest discriminability and the most appropriate degrees of difficulty can be retained. This can be undertaken only once data from a pilot have been analysed.

Item discriminability and item difficulty take on differential significance in norm-referenced and criterion-referenced tests. In a norm-referenced test we wish to compare students with each other, hence item discriminability is very important. In a criterion-referenced test, on the other hand, it is not important per se to be able to compare or discriminate between students' performance. For example, it may be the case that we wish to discover whether a group of students has learnt a particular body of knowledge (that is the objective), rather than, say, finding out how many have learned it better than others. Hence it may be that a criterion-referenced test has very low discriminability, if all the students achieve very well or achieve very poorly, but the discriminability is less important than the fact that the students have or have not learnt the material. A norm-referenced test would regard such a poorly discriminating item as unsuitable for inclusion, whereas a criterion-referenced test would regard such an item as providing useful information (on success or failure).

With regard to item difficulty, in a criterion-referenced test the level of difficulty is that which is appropriate to the task or objective. Hence if an objective is easily achieved then the test item should be easily achieved; if the objective is difficult then the test item should be correspondingly difficult. This means that, unlike a norm-referenced test, where an item might be reworked in order to increase its discriminability index, this is less of an issue in criterion-referencing.
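The item difficulty formula introduced earlier, together with the 33 per cent and 67 per cent rule-of-thumb bands, can be sketched as follows (function names and the screening labels are ours, not from the text):

```python
def difficulty_index(num_correct, num_attempted):
    """Item difficulty as a percentage: (A / N) * 100, where A students
    answered the item correctly out of N who attempted it."""
    return 100.0 * num_correct / num_attempted

def screen_difficulty(pct):
    """Apply the rule-of-thumb bands from the text: items below 33 per
    cent are likely too difficult, above 67 per cent likely too easy."""
    if pct < 33:
        return "likely too difficult"
    if pct > 67:
        return "likely too easy"
    return "acceptable"

# Worked example from the text: 12 of 20 students answer correctly
pct = difficulty_index(12, 20)
print(pct, screen_difficulty(pct))  # 60.0 acceptable
```

Such screening is only a starting point: as noted above, whether to retain an item outside the bands, and what counts as appropriate difficulty in a criterion-referenced test, remain matters of judgement.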
Of course, this is not to deny the value of undertaking an item difficulty analysis; rather it is to question the centrality of such a concern. Gronlund and Linn (1990: 265) suggest that where instruction has been effective the item difficulty index of a criterion-referenced test will be high.

In addressing the item discriminability, item difficulty and distractor effect of particular test items, it is advisable, of course, to pilot these tests and to be cautious about placing too great a store on indices of difficulty and discriminability that are computed from small samples.

In constructing a test with item analysis, item discriminability, item difficulty and distractor effects in mind, it is important also to consider the actual requirements of the test (Nuttall 1987; Cresswell and Houston 1991):

- Are all the items in the test equally difficult?
- If not, what makes some items more difficult than the rest?
- Which items are easy, moderately hard, hard or very hard?
- What kinds of task is each item addressing: is it a practice item (repeating known knowledge), an application item (applying known knowledge) or a synthesis item (bringing together and integrating diverse areas of knowledge)?
- Are the items sufficiently within the experience of the students?
- How motivated will students be by the contents of each item (i.e. how relevant will they perceive the item to be, and how interesting is it)?

The contents of the test will also need to take account of the notion of fitness for purpose, for example in the types of test items. Here the researcher will need to consider whether the kinds of data to demonstrate ability, understanding and achievement will be best demonstrated in, for example (Lewis 1974; Cohen et al. 2004: ch. 16):

- an open essay
- a factual and heavily directed essay
- short answer questions
- divergent thinking items
- completion items
- multiple-choice items (with one correct answer or more than one correct answer)
- matching pairs of items or statements
- inserting missing words
- incomplete sentences or incomplete, unlabelled diagrams