
How ethical it would be to undertake the final four of these is perhaps questionable, or indeed any apart from the first on the list. Are they cheating or legitimate test preparation? Should one teach to a test? Is not doing so a dereliction of duty (e.g. in criterion- and domain-referenced tests), or does teaching to the test give students an unfair advantage, thus reducing the reliability of the test as a true and fair measure of ability or achievement? In high-stakes assessment (e.g. for public accountability and to compare schools and teachers) there is even the issue of not entering for tests those students whose performance will be low (see, for example, Haladyna et al. 1991). There is a risk of a correlation between the ‘stakes’ and the degree of unethical practice: the greater the stakes, the greater the incidence of unethical practice. Unethical practice, observes Gipps (1994), occurs where scores are inflated but reliable inferences about performance or achievement cannot be drawn from them, and where different groups of students are prepared differentially for tests, i.e. giving some students an unfair advantage over others. To overcome such problems, she suggests, it is ethical and legitimate for teachers to teach to a broader domain than the test, teachers should not teach directly to the test, and only better instruction, rather than test preparation, is acceptable (Cunningham 1998).

One can add to this list of considerations (Cronbach 1970; Hanna 1993; Cunningham 1998) the following views:

- Tests must be valid and reliable (see Chapter 6).
- The administration, marking and use of the test should be undertaken only by suitably competent/qualified people (i.e. people and projects should be vetted).
- Access to test materials should be controlled: test items should not be reproduced apart from selections in professional publications, and tests should be released only to suitably qualified professionals in connection with specific, professionally acceptable projects.
- Tests should benefit the testee (beneficence).
- Clear marking and grading protocols should exist (the issue of transparency is discussed in Chapter 6).
- Test results should be reported only in a way that cannot be misinterpreted.
- The privacy and dignity of individuals should be respected (e.g. confidentiality, anonymity, non-traceability).
- Individuals should not be harmed by the test or its results (non-maleficence).
- Informed consent to participate in the test should be sought.

Computerized adaptive testing

Computerized adaptive testing (Wainer 1990; Aiken 2003: 50–2) selects which particular test items to administer on the basis of the testee's responses to previous items. It is particularly useful for large-scale testing, where a wide range of ability can be expected. Here a test must be devised that enables the tester to cover this wide range of ability; hence it must include items ranging from easy to difficult. If the test is too easy it does not enable a range of high ability to be charted (testees simply get all the answers right); if it is too difficult it does not enable a range of low ability to be charted (testees simply get all the answers wrong). We find out very little about a testee if we ask a battery of questions which are too easy or too difficult.
Further, a test is more efficient and reliable if it can avoid the problem, for high-ability testees, of having to work through a mass of easy items in order to reach the more difficult items and, for low-ability testees, of having to try to guess the answers to more difficult items. Hence it is useful to have a test that is flexible and that can be adapted to the testees. For example, if a testee found an item too hard, the next item could adapt to this and be easier; conversely, if a testee was successful on an item, the next item could be harder. Wainer (1990) indicates that in an adaptive test the first item is pitched in the middle of the assumed ability range; if the testee answers it correctly it is followed by a more difficult item, and if the testee answers it incorrectly it is followed by an easier item. Computers here provide an ideal opportunity to address the flexibility, discriminability and efficiency of testing.
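As an illustration of this up/down adaptive logic, here is a minimal sketch in Python. It assumes a pre-calibrated bank of items graded into difficulty levels; the nine-level scale, fixed test length and simulated testee below are illustrative assumptions, not part of Wainer's (1990) account.

```python
import math
import random

def run_adaptive_test(answer_item, n_items=10, levels=9):
    """Administer n_items adaptively across difficulty levels 0..levels-1.

    answer_item(level) -> bool collects (or simulates) the testee's
    response to an item drawn at the given difficulty level.
    """
    level = levels // 2              # first item pitched mid-range
    history = []
    for _ in range(n_items):
        correct = answer_item(level)
        history.append((level, correct))
        if correct:
            level = min(levels - 1, level + 1)   # follow with a harder item
        else:
            level = max(0, level - 1)            # follow with an easier item
    return history

# Illustrative use: a simulated testee whose true ability sits at level 6;
# the chance of a correct answer falls as item difficulty exceeds ability.
def simulated_testee(level, true_ability=6):
    return random.random() < 1 / (1 + math.exp(level - true_ability))

print(run_adaptive_test(simulated_testee))
```

Operational adaptive tests usually select the next item by maximizing the information it gives about the current ability estimate rather than by stepping one difficulty level at a time; the staircase above is simply the most direct rendering of the rule described here.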

Aiken (2003: 51) suggests that computer adaptive testing can reduce the number of test items presented to around 50 per cent of those used in conventional tests. Testees can work at their own pace, they need not be discouraged but can be challenged, the test is scored instantly to provide feedback to the testee, a greater range of items can be included and a greater degree of precision and reliability of measurement can be achieved; indeed, test security can be increased and the problem of understanding answer sheets is avoided. Clearly the use of computer adaptive testing has several putative attractions. On the other hand, it requires different skills from traditional tests, and these might compromise the reliability of the test, for example:

- The mental processes required to work with a computer screen and computer program differ from those required for a pen-and-paper test.
- Motivation and anxiety levels may increase or decrease when testees work with computers.
- The physical environment might exert a significant influence, e.g. lighting, glare from the screen, noise from machines, loading and running the software.
- Reliability shifts from an index of the variability of the test to an index of the standard error of the testee's performance. The usual formula for calculating standard error assumes that error variance is the same for all scores, whereas in item response theory it is assumed that error variance depends on each testee's ability: the conventional statistic calculates a single average variance of summed scores, which in item response theory is at best very crude and at worst misleading, as variation is a function of ability rather than of the test and cannot fairly be summed (see Thissen (1990) for an analysis of how to address this issue; a sketch of this calculation is given below).
- Having so many test items increases the chance of including poor items.

Computer adaptive testing requires a large item pool to be developed for each content domain (Flaugher 1990), with sufficient numbers, variety and spread of difficulty. All items must measure a single aptitude or dimension, and the items must be independent of each other, i.e. a person's response to one item should not depend on that person's response to another item. The items have to be pretested and validated, their difficulty and discriminability calculated, the effect of distractors reduced, the capability of the test to address unidimensionality and/or multidimensionality clarified, and the rules for selecting items enacted (a minimal item-analysis sketch is given after the next example).
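The reliability point in the list above can be made concrete. Under a two-parameter logistic (2PL) item response model, an item with discrimination a and difficulty b contributes information a²P(θ)(1 - P(θ)) at ability θ, and the standard error of the ability estimate is 1/√I(θ), where I(θ) is the summed information; the error therefore varies across the ability range instead of being one test-wide constant. The 2PL model and the item parameters below are an assumed illustration, not taken from the chapter.

```python
import math

# Illustrative 2PL item bank: (a, b) = (discrimination, difficulty) pairs.
ITEMS = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.5)]

def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1 / (1 + math.exp(-a * (theta - b)))

def standard_error(theta, items=ITEMS):
    """SE(theta) = 1 / sqrt(test information at theta).

    Each item contributes a^2 * P * (1 - P), which peaks where the item
    difficulty matches the testee's ability, so the error variance is a
    function of ability rather than a single test-wide constant.
    """
    info = sum(a * a * p_correct(theta, a, b) * (1 - p_correct(theta, a, b))
               for a, b in items)
    return 1 / math.sqrt(info)

# The standard error differs across the ability range, unlike the single
# classical standard error of measurement:
for theta in (-2.0, 0.0, 2.0):
    print(f"theta={theta:+.1f}  SE={standard_error(theta):.2f}")
```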

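On pretesting the item pool and calculating difficulty and discriminability, a minimal classical item analysis might look like the following sketch. The 0/1 response matrix is invented data, and the discrimination index here is the uncorrected point-biserial correlation (the item is included in the total score), which slightly flatters each item.

```python
import math

# Rows = testees, columns = items; 1 = correct, 0 = incorrect (invented data).
RESPONSES = [
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
]

def item_difficulty(responses, item):
    """Proportion of testees answering the item correctly (the p-value)."""
    return sum(row[item] for row in responses) / len(responses)

def point_biserial(responses, item):
    """Correlation between the 0/1 item score and the total test score."""
    totals = [sum(row) for row in responses]
    n = len(responses)
    mean_t = sum(totals) / n
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in totals) / n)
    p = item_difficulty(responses, item)
    if sd_t == 0 or p in (0.0, 1.0):
        return 0.0  # no variance in totals or in the item: undefined, report 0
    correct_totals = [t for row, t in zip(responses, totals) if row[item] == 1]
    mean_correct = sum(correct_totals) / len(correct_totals)
    return (mean_correct - mean_t) / sd_t * math.sqrt(p / (1 - p))

for i in range(len(RESPONSES[0])):
    print(f"item {i}: difficulty={item_difficulty(RESPONSES, i):.2f}, "
          f"discrimination={point_biserial(RESPONSES, i):+.2f}")
```

Items with very high or very low difficulty, or with near-zero (or negative) discrimination, would be candidates for revision or removal before entering the operational pool.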
