12.01.2015 Views

RESEARCH METHOD COHEN ok


CONSTRUCTING A TEST 425<br />


true/false statements<br />

open-ended questions where students are given<br />

guidance on how much to write (e.g. 300 words,<br />

a sentence, a paragraph)<br />

closed questions.<br />

These items can test recall, knowledge, comprehension,<br />

application, analysis, synthesis and evaluation,<br />

i.e. different orders of thinking. These take<br />

their rationale from Bloom (1956) on hierarchies<br />

of thinking – from low order (comprehension, application),<br />

through middle order thinking (analysis,<br />

synthesis) to higher order thinking (evaluation,<br />

judgement, criticism). Clearly the selection of the<br />

form of the test item will be based on the principle<br />

of gaining the maximum amount of information in<br />

the most economical way. This is evidenced in the<br />

use of machine-scorable multiple choice completion<br />

tests, where optical mark readers and scanners<br />

can enter and process large-scale data rapidly.<br />

In considering the contents of a test the test<br />

writer must also consider the scale for some kinds<br />

of test. A scale (a graded system of<br />

classification) can be created in two main ways<br />

(Howitt and Cramer 2005: 203):<br />


A list of items whose measurements go from<br />

the lowest to highest (e.g. an IQ test, a measure<br />

of sexism, a measure of aggressiveness), such<br />

that it is possible to judge where a student has<br />

reached on the scale by seeing the maximum<br />

level reached on the items;<br />

The method of ‘summated scores’ (Howitt and<br />

Cramer 2005: 203) in which a pool of items<br />

is created, and the student’s score is the total<br />

score gained by summing the marks for all the<br />

items.<br />
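The two approaches to scoring can be contrasted in a minimal sketch. The response data, the function names and the pass/fail coding (1 = passed, 0 = failed) are illustrative assumptions rather than anything given by Howitt and Cramer, and the items are assumed to be ordered from the lowest level to the highest.

```python
# Hypothetical responses for one student: 1 = item passed, 0 = item failed.
# Items are assumed to be ordered from the lowest level (first) to the highest (last).
responses = [1, 1, 1, 0, 1, 0]


def max_level_reached(item_results):
    """Graded scale: the highest item level passed (1-based), or 0 if none passed."""
    level = 0
    for position, passed in enumerate(item_results, start=1):
        if passed:
            level = position
    return level


def summated_score(item_marks):
    """Summated scores: the total of the marks gained across all items in the pool."""
    return sum(item_marks)


print(max_level_reached(responses))  # 5
print(summated_score(responses))     # 4
```

The same set of responses places the student at level 5 on the graded scale but gives a summated score of 4, which suggests why the test writer needs to decide in advance which kind of scale is intended.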

Further, many psychological tests used in<br />

educational research will be unidimensional, that<br />

is, the items all measure a single element or<br />

dimension. Howitt and Cramer (2005: 204) liken<br />

this to weighing 30 people using 10 bathroom<br />

scales, in which one would expect a high<br />

intercorrelation to be found between the bathroom<br />

scales. Other tests may be multidimensional, i.e.<br />

where two or more factors or dimensions are being<br />

measured in the same test. Howitt and Cramer<br />

(2005: 204) liken this to weighing 30 people<br />

using 10 bathroom scales and then measuring<br />

their heights using 5 different tape measures. Here<br />

one would expect a high intercorrelation to be<br />

found between the bathroom scale measures, a<br />

high intercorrelation to be found between the<br />

measurements from the tape measures, and a low<br />

intercorrelation to be found between the bathroom<br />

scale measures and the measurements from the tape<br />

measures, because they are measuring different<br />

things or dimensions.<br />
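The bathroom scale analogy can be explored with a small simulation. This is a sketch under stated assumptions, not a reproduction of any published analysis: the weights, heights and measurement errors are randomly generated, weight and height are assumed to be independent of each other, and the helper names `pearson` and `mean_r` are invented for the illustration.

```python
import random
from itertools import combinations, product
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_r(pairs):
    """Average correlation over an iterable of (x, y) pairs of measurement lists."""
    values = [pearson(x, y) for x, y in pairs]
    return sum(values) / len(values)


rng = random.Random(0)  # fixed seed so that the run is repeatable

# Assumed 'true' values for 30 people (kg and cm); drawn independently of each other.
true_weight = [rng.gauss(70, 12) for _ in range(30)]
true_height = [rng.gauss(170, 10) for _ in range(30)]

# 10 bathroom scales and 5 tape measures, each adding a small random error.
scales = [[w + rng.gauss(0, 1) for w in true_weight] for _ in range(10)]
tapes = [[h + rng.gauss(0, 1) for h in true_height] for _ in range(5)]

within_scales = mean_r(combinations(scales, 2))  # scale versus scale
within_tapes = mean_r(combinations(tapes, 2))    # tape versus tape
between = mean_r(product(scales, tapes))         # scale versus tape

print(f"scales with scales: {within_scales:.2f}")
print(f"tapes with tapes:   {within_tapes:.2f}")
print(f"scales with tapes:  {between:.2f}")
```

Under these assumptions the instruments that measure the same dimension should intercorrelate highly, while the correlation across the two dimensions should be much weaker.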

Test constructors, then, need to be clear<br />

whether they are using a unidimensional or a<br />

multidimensional scale. Many texts, while advocating<br />

the purity of using a unidimensional test<br />

that measures a single construct or concept, also<br />

recognize the efficacy, practicality and efficiency in<br />

using multidimensional tests. For example, though<br />

one might regard intelligence casually as a unidimensional<br />

factor, in fact a stronger measure of<br />

intelligence would be obtained by regarding it as<br />

a multidimensional construct, thereby requiring<br />

multidimensional scaling. Of course, some items<br />

on a test are automatically unidimensional, for<br />

example age, hours spent on homework.<br />

Further, the selection of the items needs to be<br />

considered in order to have the highest reliability.<br />

Let us say that we have ten items that measure<br />

students’ negative examination stress. Each item<br />

is intended to measure stress, for example:<br />

Item 1: Loss of sleep at examination time.<br />

Item 2: Anxiety at examination time.<br />

Item 3: Irritability at examination time.<br />

Item 4: Depression at examination time.<br />

Item 5: Tearfulness at examination time.<br />

Item 6: Unwillingness to do household chores at<br />

examination time.<br />

Item 7: Mood swings at examination time.<br />

Item 8: Increased consumption of coffee at<br />

examination time.<br />

Item 9: Positive attitude and cheerfulness at<br />

examination time.<br />

Item 10: Eager anticipation of the examination.<br />
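One widely used index of internal consistency for a pool of items such as these is Cronbach's alpha. The sketch that follows assumes that statistic; the responses are invented for illustration (six hypothetical students rating Items 1 to 5 on a 1 to 5 scale) and are not data from any study.

```python
from statistics import variance

# Hypothetical ratings (1 = never, 5 = always) for Items 1-5.
# Each row is one student; each column is one item.
ratings = [
    [1, 2, 1, 2, 1],
    [2, 2, 3, 2, 2],
    [3, 3, 3, 4, 3],
    [4, 4, 5, 4, 4],
    [5, 5, 4, 5, 5],
    [2, 1, 2, 1, 2],
]


def cronbach_alpha(rows):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])                                   # number of items
    item_columns = list(zip(*rows))                    # one tuple of ratings per item
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


alpha = cronbach_alpha(ratings)
print(round(alpha, 2))  # 0.97
```

A value close to 1 indicates that the items vary together across students; items that do not belong with the rest of the pool would tend to pull the coefficient down.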

You run a reliability test (see Chapter 24 on SPSS<br />

reliability) of internal consistency and find strong<br />

intercorrelations between items 1–5 (e.g. around<br />

