12.01.2015 Views

RESEARCH METHOD COHEN ok


CONSTRUCTING A TEST 431<br />

(e.g. to assign grades) is to lose the very purpose<br />

of the criterion-referencing (Gipps 1994: 85). For<br />

example, if a student is awarded a grade E for<br />

spelling in English, and a grade A for imaginative<br />

writing, this could be aggregated into a C grade as<br />

an overall grade of the student’s English language<br />

competence, but what does this C grade mean? It<br />

is meaningless: it has no frame of reference or clear<br />

criteria, it loses the useful specificity of the A and<br />

E grades, and it is a compromise that actually tells us<br />

nothing. Further, aggregating such grades assumes<br />

equal levels of difficulty of all items.<br />
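The loss of information in aggregation can be illustrated with a minimal sketch. The five-point scale and the averaging rule below are assumptions made purely for illustration; they are not taken from the text or from any examination board.

```python
import math

# Hypothetical grade-to-points scale (A=5 ... E=1); an assumption for illustration.
GRADE_POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}
POINT_GRADES = {points: grade for grade, points in GRADE_POINTS.items()}


def aggregate_grade(grades):
    """Average component grades on the assumed scale and round half up to a grade."""
    grades = list(grades)
    mean_points = sum(GRADE_POINTS[g] for g in grades) / len(grades)
    return POINT_GRADES[math.floor(mean_points + 0.5)]


# Two very different profiles collapse into the same overall grade.
uneven_profile = {"spelling": "E", "imaginative writing": "A"}
even_profile = {"spelling": "C", "imaginative writing": "C"}

print(aggregate_grade(uneven_profile.values()))
print(aggregate_grade(even_profile.values()))
```

Both profiles yield the same overall grade, so the aggregate cannot distinguish a student with one marked strength and one marked weakness from a student who is uniformly average.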

Of course, raw scores are still open to<br />

interpretation – which is a matter of judgement<br />

rather than exactitude or precision (Wiliam<br />

1996). For example, if a test is designed to<br />

assess ‘mastery’ of a subject, then the researcher is<br />

faced with the issue of deciding what constitutes<br />

‘mastery’ – is it an absolute (i.e. very high score) or<br />

are there gradations, and if the latter, then where<br />

do these gradations fall? For published tests the<br />

scoring is standardized and already made clear, as<br />

are the conversions of scores into, for example,<br />

percentiles and grades.<br />

Underpinning the discussion of scoring is the<br />

need to make it unequivocally clear exactly what<br />

the marking criteria are – what will and will<br />

not score points. This requires a clarification of<br />

whether there is a ‘checklist’ of features that must<br />

be present in a student’s answer.<br />

Clearly criterion-referenced tests will have<br />

to declare their lowest boundary – a cut-off<br />

point – below which the student has been deemed<br />

to fail to meet the criteria. A compromise can be<br />

seen in those criterion-referenced tests that award<br />

different grades for different levels of performance<br />

of the same task, necessitating the clarification<br />

of different cut-off points in the examination. A<br />

common example of this can be seen in the GCSE<br />

examinations for secondary school pupils in the<br />

United Kingdom, where students can achieve a<br />

grade between A and F for a criterion-related<br />

examination.<br />
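A test with several cut-off points can be sketched as a lookup from score to grade. The cut-off values and the grade labels below are hypothetical, chosen only to show the mechanism; real examinations publish their own boundaries.

```python
from bisect import bisect_right

# Hypothetical lower boundaries for grades F, E, D, C, B and A (illustrative only).
CUT_OFFS = [20, 30, 40, 50, 60, 70]
GRADES = ["fail", "F", "E", "D", "C", "B", "A"]


def grade_for(score):
    """Return the grade whose lower cut-off is the highest one the score reaches."""
    # bisect_right counts how many cut-offs are less than or equal to the score.
    return GRADES[bisect_right(CUT_OFFS, score)]


for score in (19, 20, 55, 70):
    print(score, grade_for(score))
```

A score exactly on a boundary is awarded the higher grade here, which is itself a decision that the marking criteria would need to state.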

The determination of cut-off points has been addressed<br />

by Nedelsky (1954), Angoff (1971), Ebel<br />

(1979) and Linn (1993). Angoff (1971) suggests<br />

a method for dichotomously scored items. Here<br />

judges are asked to identify the proportion of<br />

minimally acceptable persons who would answer<br />

each item correctly. The sum of these proportions<br />

would then be taken to represent the minimally<br />

acceptable score. An elaborated version of this<br />

principle comes from Ebel (1979). Here a difficulty<br />

by relevance matrix is constructed for all<br />

the items. Difficulty might be assigned three levels<br />

(e.g. easy, medium and hard) and relevance might<br />

be assigned three levels (e.g. highly relevant, moderately<br />

relevant, barely relevant). When each and<br />

every test item has been assigned to the cells of<br />

the matrix, the judges estimate the proportion of<br />

items in each cell that minimally acceptable persons<br />

would answer correctly, with the standard<br />

for each judge being the weighted average of the<br />

proportions in each cell (with weights determined by<br />

the number of items in each cell). In this method<br />

judges have to consider two factors – relevance<br />

and difficulty (unlike Angoff (1971), where only<br />

difficulty featured). What characterizes these approaches<br />

is the trust that they place in experts in<br />

making judgements about levels (e.g. of difficulty,<br />

or relevance, or proportions of successful achievement),<br />

that is, they are based on fallible human<br />

subjectivity.<br />
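The arithmetic of the two approaches described above can be sketched as follows. This is a minimal illustration for a single judge, not a definitive implementation: the function names, the data layout and all the example figures are assumptions.

```python
def angoff_cut_score(item_proportions):
    """Angoff-style cut score for one judge: the sum of the judged proportions
    of minimally acceptable persons expected to answer each item correctly."""
    return sum(item_proportions)


def ebel_standard(cells):
    """Ebel-style standard for one judge.

    cells maps (difficulty, relevance) to (number_of_items, judged_proportion).
    The result is the average of the proportions weighted by items per cell.
    """
    total_items = sum(n_items for n_items, _ in cells.values())
    weighted_sum = sum(n_items * proportion for n_items, proportion in cells.values())
    return weighted_sum / total_items


# Hypothetical judgements for a four-item test.
angoff_judgements = [0.9, 0.7, 0.5, 0.4]
print(angoff_cut_score(angoff_judgements))

# Hypothetical matrix for a twenty-item test; only three cells are occupied.
ebel_cells = {
    ("easy", "highly relevant"): (10, 0.9),
    ("medium", "moderately relevant"): (6, 0.6),
    ("hard", "barely relevant"): (4, 0.3),
}
print(ebel_standard(ebel_cells))
```

The Angoff result is expressed in raw marks, whereas the Ebel result is a proportion of items; multiplying it by the number of items gives a raw cut score. How the standards of several judges are then combined is a further decision that this sketch leaves open.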

Ebel (1979) argues that one principle in<br />

assignation of grades is that they should represent<br />

equal intervals on the score scales. Reference is<br />

made to median scores and standard deviations,<br />

median scores because it is meaningless to<br />

assume an absolute zero on scoring, and standard<br />

deviations as the unit of convenient size for<br />

inclusion of scores for each grade (see also Cohen<br />

and Holliday 1996). One procedure is thus:<br />

• Calculate the median and standard deviation

of the scores.

• Determine the lower score limits of the mark

intervals using the median and the standard

deviation as the unit of size for each grade.
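The two steps of this procedure can be sketched in code. The text does not say where the intervals sit relative to the median, so the sketch assumes five grades, each one standard deviation wide, with the middle grade centred on the median. It also assumes the population standard deviation. Both choices are illustrative and may need adjusting for a given group.

```python
from statistics import median, pstdev

# Assumed offsets, in standard deviation units, of each grade's lower limit
# from the median. Scores below the D limit fall into the lowest grade.
ASSUMED_OFFSETS = {"A": 1.5, "B": 0.5, "C": -0.5, "D": -1.5}


def lower_limits(scores, offsets=ASSUMED_OFFSETS):
    """Return the lower score limit of each grade from the median and the SD."""
    mid = median(scores)
    sd = pstdev(scores)  # population SD; the sample SD is an equally possible choice
    return {grade: mid + k * sd for grade, k in offsets.items()}


scores = [40, 50, 60, 70, 80]  # hypothetical raw scores
for grade, limit in lower_limits(scores).items():
    print(grade, round(limit, 1))
```

Because every interval has the same width, the grades represent equal intervals on the score scale, which is the principle that the procedure is meant to respect.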

However, the issue of cut-off scores is complicated<br />

by the fact that they may vary according to<br />

the different purposes and uses of scores (e.g.<br />

for diagnosis, for certification, for selection, for<br />

programme evaluation), as these purposes will affect<br />

the number of cut-off points and grades, and the<br />

Chapter 19
