CHALLENGING ASSESSMENT
─
BOOK OF ABSTRACTS OF THE
FOURTH BIENNIAL
EARLI/NORTHUMBRIA
ASSESSMENT CONFERENCE 2008

Edited by
Marja van den Heuvel-Panhuizen
Olaf Köller
CHALLENGING ASSESSMENT – BOOK OF ABSTRACTS OF THE FOURTH BIENNIAL
EARLI/NORTHUMBRIA ASSESSMENT CONFERENCE 2008

Edited by
Marja van den Heuvel-Panhuizen
Olaf Köller

Editorial assistance
Monika Lacher

Humboldt-Universität zu Berlin
Institut zur Qualitätsentwicklung im Bildungswesen (IQB)
Unter den Linden 6
10099 Berlin
Germany

This Book of Abstracts is also available for download as a PDF file at
http://www.iqb.hu-berlin.de/veranst/enac2008?reg=r_11

2008
Printed by
Breitfeld Vervielfältigungsservice, Berlin, Germany

Copyright © 2008 left to the Authors
All rights reserved
ISBN 978-3-00-025471-0

Fourth Biennial EARLI/Northumbria Assessment Conference 2008
August 27 – 29, 2008
Hosted by IQB, Humboldt University Berlin
Conference Venue: Seminaris Seehotel Potsdam/Berlin, Germany
ii ENAC 2008
PREFACE

This Book of Abstracts presents recent research in the field of assessment and evaluation.
In total, the volume contains 124 contributions: the abstracts of 3 plenary lectures,
31 symposium papers, 42 papers, 26 roundtable papers, and 22 posters. All these
contributions were brought together for the Fourth Biennial EARLI/Northumbria
Assessment Conference 2008.

The contributions cover a rich variety of topics that reflect the conference themes:
• Standards-based assessment
• E-assessment
• Measuring and modelling performances
• Consequences and contexts of assessment
• Learning-oriented assessment

The book opens with the abstracts of the three invited plenary lectures. Two of them
highlight the connection between assessment on the one hand and instruction and learning
on the other; the third addresses the rights of children in assessment.
The three invited symposia offer views on the socio-cultural perspective on assessment, on
new psychometric developments in test design, and on advances in e-assessment.
With these invited contributions, the International Conference Committee has left its
mark on the 2008 conference.

The ICC chose Challenging Assessment as the title for this conference to signify
that assessment is a complex and demanding domain of research, one that requires
the full energy and expertise of those involved in it. At the same time, the title
indicates that assessment is a fascinating area to work in: it is where all human
development and learning become visible. It is our job to reveal the traces of
growth and the results of education and other learning environments.
Yet this is only half of our work. Besides generating this knowledge, making it
accessible to all stakeholders in a meaningful and productive way is equally important.

May this Book of Abstracts inspire its readers’ thoughts and actions towards further
progress in the field of assessment.

Marja van den Heuvel-Panhuizen
(Conference President)
Olaf Köller
(Director IQB)
Berlin, August 2008
TABLE OF CONTENTS<br />
Preface iii<br />
Table of contents iv<br />
Introduction xvii<br />
EARLI/Northumbria Assessment Conference 2008 xvii<br />
International Conference Committee ENAC 2008 xvii<br />
The review process of ENAC 2008 xviii<br />
Plenary Lectures 3<br />
Eckhard Klieme<br />
Assessment, grading, and instruction: Understanding the context of<br />
educational measurement<br />
Ruth Leitch<br />
Improving Children’s Rights in Assessment: issues, challenges and<br />
possibilities<br />
Dylan Wiliam<br />
When is assessment learning-oriented?<br />
Invited Symposia 9<br />
Using socio-cultural perspectives to understand and change assessment in<br />
post-compulsory education<br />
Organiser: Liz McDowell<br />
David James<br />
Getting beyond the individual and the technical: How a<br />
cultural approach offers knowledge to transform assessment<br />
Kathryn Ecclestone<br />
Straitjacket or springboard?: the strengths and weaknesses<br />
of using a socio-cultural understanding of the effects of<br />
formative assessment on learning<br />
John Pryor, Barbara Crossouard<br />
Formative assessment: the discursive construction of<br />
identities<br />
Measuring language skills by means of C-tests: Methodological challenges and<br />
psychometric properties<br />
Organisers: Alexander Robitzsch and Olaf Köller<br />
Chair: Olaf Köller<br />
Thomas Eckes<br />
Constructing a calibrated item bank for C-test<br />
Johannes Hartig, Claudia Harsch<br />
Gaining substantive information from local dependencies<br />
between C-test items<br />
Alexander Robitzsch, Ina Karius, Daniela Neumann<br />
C-tests for German Students: Dimensionality, Validity and<br />
Psychometric Perspectives<br />
Moving forward with e-assessment<br />
Organiser / Chair: Denise Whitelock<br />
Discussant: Kari Smith<br />
Cornelia Ruedel<br />
The Future of E-Assessment: E-Assessment as a Dialog<br />
Jim Ridgway, Sean McCusker, James Nicholson<br />
Alcohol and a Mash-up: Understanding Student<br />
Understanding<br />
Sally Jordan<br />
E-assessment for learning? The potential of short free-text<br />
questions with tailored feedback<br />
Symposia 23<br />
Portfolios in Higher Education in three European countries – Variations in<br />
Conceptions, Purposes and Practices<br />
Organiser: Olga Dysthe<br />
Chair: Nicola Reimann<br />
Discussant: Anton Havnes<br />
Elizabeth Hartnell-Young<br />
Learning opportunities through the processes of eportfolio<br />
development<br />
Olga Dysthe, Knut Steinar Engelsen<br />
The Disciplinary Content Portfolio in Norwegian Higher<br />
Education – How and Why?<br />
Wil Meeus, Peter van Petegem<br />
Portfolio diversity in Belgian (Flemish) Higher Education – A<br />
comparative study of eight cases<br />
Aims, values and ethical considerations in group work assessment<br />
Organiser: Lorraine Foreman-Peck<br />
Julia Vernon<br />
Involuntary Free Riding – how status affects performance in<br />
a group project<br />
Julie Jones, Andrew Smith<br />
Facilitating Group work: leading or empowering?<br />
Tony Mellor, Jane Entwistle<br />
Marginalised students in group work assessment: ethical<br />
issues of group formation and the effective support of such<br />
individuals<br />
Multidimensional measurement models of students' competencies<br />
Organiser: Johannes Hartig<br />
Markus Wirtz, Timo Leuders, Marianne Bayrhuber, Regina Bruder<br />
Evaluation of non-unidimensional item contents using<br />
diagnostic results from Rasch-analysis<br />
Olga Kunina, Oliver Wilhelm, André A. Rupp<br />
Modelling multidimensional structure via cognitive diagnosis<br />
models: Theoretical potentials and methodological limitations<br />
for practical applications<br />
Jana Höhler, Johannes Hartig<br />
Modelling Specific Abilities for Listening Comprehension in a<br />
Foreign Language with a Multidimensional IRT Model<br />
Recent Developments in Computer-Based Assessment: Chances for the<br />
measurement of Competence<br />
Organiser: Thomas Martens<br />
Thibaud Latour, Raynald Jadoul, Patrick Plichart, Judith Swietlik-<br />
Simon, Lionel Lecaque, Samuel Renault<br />
Enlarging the range of assessment modalities using CBA:<br />
New challenges for generic (web-based) platforms<br />
Frank Goldhammer, Thomas Martens, Johannes Naumann, Heiko<br />
Rölke, Alexander Scharaf<br />
Developing stimuli for electronic reading assessment: The<br />
hypertext-builder<br />
Johannes Naumann, Nina Jude, Frank Goldhammer, Thomas<br />
Martens, Heiko Roelke, Eckhard Klieme<br />
Component skills of electronic reading competence<br />
Issues in High-Stakes Performance-based Assessment of Clinical Competence<br />
Organiser: Godfrey Pell<br />
Chair/Discussant: Trudie Roberts<br />
David Blackmore<br />
Lessons Learned from Administering a National OSCE for<br />
Medical Licensure<br />
Sydney Smee<br />
Quality Assurance through the OSCE Life Cycle<br />
Godfrey Pell, Richard Fuller<br />
Investigating OSCE Error Variance when measuring higher<br />
level competencies<br />
Katharine Boursicot, Trudie Roberts, Jenny Higham, Jane Dacre<br />
Beyond checklist scoring – clinicians’ perceptions of<br />
inadequate clinical performance<br />
Towards (quasi-) experimental research on the design of peer assessment<br />
Organiser: Dominique Sluijsmans<br />
Marjo van Zundert, Dominique Sluijsmans, Jeroen van<br />
Merriënboer<br />
The effects of peer assessment format and task complexity<br />
on learning and measurements<br />
Dominique Sluijsmans, Jan-Willem Strijbos, Gerard Van de<br />
Watering<br />
Modelling the impact of individual contributions on peer<br />
assessment during group work in teacher training: In search<br />
of flexibility<br />
Jan-Willem Strijbos, Susanne Narciss, Mien Segers<br />
Peer feedback in academic writing: How do feedback<br />
content, writing ability-level and gender of the sender affect<br />
feedback perception and performance?<br />
Assessment in kindergarten classes: experiences from assessing competences<br />
in three domains<br />
Organiser: Marja van den Heuvel-Panhuizen<br />
Chair/Discussant: Kees de Glopper<br />
Coosje van der Pol, Helma van Lierop-Debrauwer<br />
A picture book-based tool for assessing literary competence<br />
in 4 to 6-year olds<br />
Aletta Kwant, Jan Berenst, Kees de Glopper<br />
Assessing the social-emotional development of young<br />
children by means of storytelling and questions<br />
Sylvia van den Boogaard, Marja van den Heuvel-Panhuizen<br />
Assessing mathematical abilities of kindergartners:<br />
possibilities of a group-administered multiple-choice test<br />
Papers 55<br />
Linda Allin, Lesley Fishwick<br />
Ethical Dilemmas: ‘Insider’ action research into Higher Education<br />
assessment practice<br />
Mandy Asghar<br />
Reciprocal Peer Coaching as a Formative assessment strategy: Does it<br />
assist students to self-regulate their learning?<br />
Beth Black<br />
Using an adapted rank-ordering method to investigate January versus<br />
June awarding standards<br />
Sue Bloxham, Liz Campbell<br />
Generating dialogue in coursework feedback: exploring the use of<br />
interactive coversheets<br />
Saul Alejandro Contreras Palma<br />
Reforming practice or modifying Reforms? The science teacher’s<br />
responses to MBE and to assessment teaching in Chile<br />
Bronwen Cowie, Alister Jones, Judy Moreland, Kathrin Otrel-Cass<br />
Expanding student involvement in Assessment for Learning: A multimodal<br />
approach<br />
Julian Ebert<br />
Assessment Center Method to Evaluate Practice-Related University<br />
Courses<br />
Astrid Birgitte Eggen<br />
Democracy, Assessment and Validity. Discourses and practices<br />
concerning evaluation and assessment in an era of accountability<br />
Kerry Harman, Erik Bohemia<br />
Using Assessment for Learning: exploring student learning experiences in<br />
a design studio module<br />
Christine Harrison, Paul Black, Jeremy Hodgen, Bethan Marshall, Natasha<br />
Serret<br />
Chasing Validity – The Reality of Teacher Summative Assessments<br />
Anton Havnes<br />
There is a bigger story behind. An analysis of mark average variation<br />
across Programmes<br />
Anton Havnes<br />
Course design and the Law of Unintended Consequences: Reflections on<br />
an assessment regime in a UK “new” University<br />
Mark Hoeksma, Judith Janssen, Wilfried Admiraal<br />
Reliability and validity of the assessment of web-based video portfolios:<br />
Consequences for teacher education<br />
Jenny Hounsell, Dai Hounsell<br />
Diversity in patterns of assessment across a university<br />
Gordon Joughin<br />
Learning-oriented assessment: A critical review of foundational research<br />
Patrick Lai<br />
Implementing standards-based assessment in Universities: Issues,<br />
Concerns and Recommendations<br />
Uwe Maier<br />
Test-based School Reform and the Quality of Performance Feedback: A<br />
comparative study of the relationship between mandatory testing policies<br />
and teacher perspectives in two German states<br />
Thomas Martens, Frank Goldhammer<br />
Motivational aspects of complex item formats<br />
Michael McCabe<br />
Remarkable Pedagogical Benefits of Reusable Assessment Objects for<br />
STEM Subjects<br />
Fiona Meddings, Christine Dearnley, Peter Hartley<br />
Demystifying the assessment process: using protocol analysis as a<br />
research tool in higher education<br />
Catherine Montgomery, Kay Sambell<br />
Challenging the formality of assessment: a student view of ‘Assessment<br />
for Learning’ in Higher Education<br />
Patrice O'Brien, Mei Kuin Lai<br />
Secondary students’ motivation to complete written dance examinations<br />
Michelle O'Doherty<br />
Mind the gap: assessment practices in the context of UK widening<br />
participation<br />
Raphaela Oehler, Alexander Robitzsch<br />
Measuring writing skills in large-scale assessment: Treatment of student<br />
non-responses for Multifaceted-Rasch-Modeling<br />
Susan Orr<br />
Collaborating or fighting for the marks? Students’ experiences of group<br />
assessment in the creative arts<br />
Ron Pat-El, M. Segers, P. Vedder, H. Tillema<br />
Constructing a new assessment for learning questionnaire<br />
Ruth Pilkington<br />
Assessing Professional Learning: the challenge of the UK Professional<br />
Standards Framework<br />
Margaret Price, Karen Handley, Berry O'Donovan<br />
Feedback – all that effort but what is the effect?<br />
Ana Remesal<br />
Student teachers on assessment: First year conceptions<br />
Mary Richardson<br />
Testing our citizens. How effective are assessments of citizenship in<br />
England?<br />
Andreas Saniter, Rainer Bremer<br />
Standards in vocational education<br />
Lydia Schaap, H.G. Schmidt<br />
Why do some students stop showing progress on progress tests?<br />
Lee Shannon, Lin Norton, Bill Norton<br />
Contextualising Assessment: The Lecturer's Perspective<br />
Pou-seong Sit, Kwok-cheung Cheung<br />
Learning to read: Modeling and assessment of early reading<br />
comprehension of the 4-year-olds in Macao kindergartens<br />
Anne Kristin Sjo, Knut Steinar Engelsen, Kari Smith<br />
Assessment in action – Norwegian secondary-school teachers and their<br />
assessment activities<br />
Kari Smith<br />
How do student teachers and mentors assess the Practicum?<br />
Margit Stein<br />
Assessment of competencies of apprentices<br />
Janet Strivens, Cathal O'Siochru<br />
Academics’ epistemic beliefs about their discipline and implications for<br />
their judgements about student performance in assessments<br />
Dineke Tigelaar, Jan van Tartwijk, Fred Janssen, Ietje Veldman, Nico Verloop<br />
Techniques for trustworthiness as a way to describe teacher educators'<br />
Assessment processes<br />
Marjo van Zundert<br />
Peer Assessment for Learning: a State-of-the-art in Research and Future<br />
Directions<br />
Denise Whitelock<br />
Investigating the Pedagogical Push and Technological Pull of Computer<br />
Assisted Formative Assessment<br />
Oliver Wilhelm, Ulrich Schroeders, Maren Formazin, Nina Bucholtz<br />
Strict Tests of Equivalence for, and Experimental Manipulations of, Tests<br />
for Student Achievement<br />
Roundtable papers 99<br />
Morten Asmyhr<br />
Why the moderate levels of inter-assessor reliability of student essays?<br />
Simon Barrie, C. Hughes, C. Smith<br />
Approaches to the assessment of graduate attributes in higher education<br />
David Boud<br />
Assessment for learning in and beyond courses: a national project to<br />
challenge university assessment practice<br />
Kwok-cheung Cheung, Pou-seong Sit<br />
Electronic reading assessment: The PISA approach for the international<br />
comparison of reading comprehension<br />
Wendy Clark, Jackie Adamson<br />
Developing the autonomous lifelong learner: tools, tasks and taxonomies<br />
Gilian Davison, Craig McLean<br />
Assessing the Art of Diplomacy? Learners and Tutors perceptions of the<br />
use of Assessment for Learning (AfL) in non-vocational education<br />
Luc De Grez, Martin Valcke, Irene Roozen<br />
Assessment of oral presentation skills in higher education<br />
Christine Dearnley, Jill Taylor, Catherine Coates<br />
Mobile Assessment of Practice Learning: An Evaluation from a Student<br />
Perspective<br />
Margaret Fisher, Tracey Proctor-Childs<br />
How reliable is the assessment of practice, and what is its purpose?<br />
Student perceptions in Health and Social Work<br />
Richard Fuller, Matthew Homer, Godfrey Pell<br />
Measuring variance and improving the reliability of criterion based<br />
assessment (CBA): towards the perfect OSCE<br />
Concha Furnborough<br />
Learning through assessment and feedback: implications for adult<br />
beginner distance language learners<br />
Stuart Hepplestone<br />
Secret scores: Encouraging student engagement with useful feedback<br />
Therese Nerheim Hopfenbeck<br />
Large-Scale Assessment and Learning-Oriented Assessment: Like Water<br />
and Oil or new Possibilities for Future Research Directions?<br />
Sally Jordan, Philip Butcher, Arlëne Hunter<br />
Online interactive assessment for open learning<br />
Per Lauvas, Gunnars Bjølseth, Anton Havnes<br />
Can inter-assessor reliability be improved by deliberation?<br />
Paulette Luff, Gilian Robinson<br />
Sketchbooks and Journals: a tool for challenging assessment?<br />
Michal Nachshon, Amira Rom<br />
Evaluating the use of popular science articles for assessing high schools<br />
students<br />
Berry O'Donovan, Margaret Price<br />
Supporting student intellectual development through assessment design:<br />
debating ‘how’?<br />
Berry O'Donovan, Margaret Price<br />
Assessment contexts that underpin student achievement: demonstrating<br />
effect<br />
Ann Ooms, Timothy Linsey, Marion Webb<br />
In-classroom use of mobile technologies to support formative assessment<br />
Jon Robinson, David Walker<br />
The Devil's Triad: the symbiotic link between Assessment, Study Skills<br />
and Key Employability Skills<br />
Ann Karin Sandal, Margrethe H. Syversen, Ragne Wangensteen, Kari Smith<br />
Learning-oriented assessment and students experiences<br />
Mark Schofield<br />
Connecting Research Behaviours with Quality Enhancement of<br />
Assessment: Eliciting Developmental Case Studies by Appreciative<br />
Enquiry<br />
Elias Schwieler, Stefan Ekecrantz<br />
Conceptions of assessment in higher education: A qualitative study of<br />
scholars as teachers and researchers<br />
Thomas Stern<br />
Innovative Assessment Practice and Teachers’ Professional Development:<br />
Some Results of Austria’s IMST-Project<br />
Dineke Tigelaar, Mirjam Bakker, Nico Verloop<br />
Characteristics of an effective approach for formative assessment of<br />
teachers’ competence development<br />
Posters 127<br />
Andy Bell, Kevin Rowley<br />
Predictive indicators of academic performance at degree level<br />
Christian Bokhove<br />
Online Formative Assessment for Algebra<br />
Barbara Brockbank, Sally Jordan, Tom Mitchell<br />
Investigating the use of short answer free-text e-assessment questions<br />
with instantaneous tailored feedback<br />
Nina Bucholtz, Maren Formazin, Oliver Wilhelm<br />
Contextualized reasoning with written and audiovisual material: Same or<br />
different?<br />
Tobias Diemer, Harm Kuper<br />
Effects of Large Scale Assessments in Schools: How Standard-Based<br />
School Reform Works<br />
Greet Fastré, Marcel van der Klink, Dominique Sluijsmans, Jeroen van<br />
Merriënboer<br />
Support in Self-assessment in Secondary Vocational Education<br />
Merrilyn Goos, Clair Hughes, Ann Webster-Wright<br />
The confidence levels of course/subject coordinators in undertaking<br />
aspects of their assessment responsibilities<br />
Stuart Hepplestone<br />
Useful feedback and flexible submission: Designing and implementing<br />
innovative online assignment management<br />
Rosario Hernandez<br />
The challenge of engaging students with feedback<br />
Dai Hounsell, Chun Ming Tai, Rui Xu<br />
Towards More Integrative Assessment<br />
Clair Hughes<br />
Using a framework adapted from Systemic Functional Linguistics to<br />
enhance the understanding and design of assessment tasks<br />
Anders Jonsson<br />
The use of transparency in the "Interactive examination" for student<br />
teachers<br />
Ulrich Keller, Monique Reichert, Gilbert Busana, Romain Martin<br />
School monitoring in Luxembourg: computerized tests and automated<br />
results reporting<br />
Marjolijn Peltenburg, Marja van den Heuvel-Panhuizen<br />
Mathematical power of special needs students<br />
Glynis Pickworth, M. van Rooyen, T.J. Avenant<br />
Quality Assurance review of clinical assessment: How does one close the<br />
loop?<br />
Margaret Price, Karen Handley, Berry O'Donovan<br />
Feedback: What’s in it for me?<br />
Ana Remesal, Manuel Juárez, José Luis Ramírez<br />
From students’ to teachers’ collaboration: a case study of the challenges<br />
of e-teaching and assessing as co-responsibility<br />
Jon Robinson, David Walker<br />
Symbiotic relationships: Assessment for Learning (AfL), study skills and<br />
key employability skills<br />
Petra Scherer<br />
Assessing low achievers’ understanding of place value – consequences<br />
for learning and instruction<br />
Revital Tal<br />
Using a course forum to promote learning and assessment for learning in<br />
environmental education<br />
Mirabelle Walker<br />
Learning-oriented feedback: a challenge to assessment practice<br />
David Webb<br />
Progressive Formalization as an Interpretive Lens for Increasing the<br />
Learning Potentials of Classroom Assessment<br />
Author Index 151<br />
Address list of presenters 157<br />
INTRODUCTION

EARLI/Northumbria Assessment Conference 2008

The EARLI/Northumbria Assessment Conference (ENAC) is now being held for the fourth time.
It is a conference series established jointly by the EARLI Special Interest Group on
Assessment and Evaluation and Northumbria University in Newcastle, United Kingdom.
ENAC conferences are held biennially, in the years between the – equally biennial – full-scale
EARLI conferences.

The EARLI/Northumbria Assessment Conference started at Northumbria University in
2002. In 2004, the second conference took place in Norway, organised by the University of
Bergen. The third conference returned to Northumbria University.

The present conference continues the tradition of the EARLI/Northumbria Assessment
Conferences by providing a forum for participants to exchange ideas in an inspiring
professional environment and a pleasant and comfortable venue.

The EARLI/Northumbria Assessment Conference 2008 is hosted by IQB (Institut zur
Qualitätsentwicklung im Bildungswesen) of Humboldt University and organised in
collaboration with the International Conference Committee.
International Conference Committee ENAC 2008

Marja van den Heuvel-Panhuizen (Conference President)
IQB, Humboldt University Berlin, Germany
Freudenthal Institute, Utrecht University, the Netherlands

Olaf Köller (Director of IQB)
IQB, Humboldt University Berlin, Germany

Dietlinde Granzer
IQB, Humboldt University Berlin, Germany

Liz McDowell
Northumbria University, UK

Kay Sambell
Northumbria University, UK

Nicola Reimann
Northumbria University, UK

Jim Ridgway (EARLI SIG)
University of Durham, UK

Denise Whitelock (EARLI SIG)
Open University, UK

Anton Havnes
Oslo University College, Norway

Kari Smith (EARLI SIG)
University of Bergen, Norway
The review process of ENAC 2008

The review process was organised by Dietlinde Granzer. In total, we received
196 submissions for the Fourth Biennial EARLI/Northumbria Assessment Conference 2008:
the 500-word abstracts of 170 papers (40 of them as part of a symposium),
11 roundtable papers, and 15 posters. Every submission was anonymously
peer-reviewed by two reviewers drawn from a group of experts selected by the International
Conference Committee. When the two reviewers held entirely different opinions about a
submission, a third reviewer was consulted.

The main criterion in the review was whether the quality of a submission was high enough,
both in general and with respect to the proposed presentation format in particular.

All in all, the quality of the submissions was very high. Because of the large number of
high-quality proposals for paper presentations and symposia, the ICC re-allocated some of
those proposals to roundtables and posters.

The final decisions about acceptance, rejection, or re-allocation to another presentation
format were in the hands of the ICC. The total acceptance rate was almost 65%.
The ICC thanks the following people for their help in the review process:

Bremerich-Vos, Albert, Universität Duisburg-Essen (GERMANY)
Brna, Paul, Educational Consultancy in Technology Enhanced Learning (UK)
Clegg, Karen, University of York (UK)
Granzer, Dietlinde, Humboldt University Berlin (GERMANY)
Havnes, Anton, University of Bergen (NORWAY)
Higgins, Steve, Durham University (UK)
Köller, Olaf, Humboldt University Berlin (GERMANY)
Lauvas, Per, Fellesadministrasjonen (NORWAY)
McCusker, Sean, Durham University (UK)
McDowell, Liz, Northumbria University (UK)
Montgomery, Catherine, Northumbria University (UK)
Reimann, Nicola, Northumbria University (UK)
Reiss, Kristina, Ludwig-Maximilians-Universität München (GERMANY)
Ridgway, Jim, Durham University (UK)
Ruedel, Cornelia, University of Zürich (SWITZERLAND)
Rust, Christopher, Oxford Brookes University (UK)
Sambell, Kay, Northumbria University (UK)
Smith, Kari, University of Bergen (NORWAY)
Van den Heuvel-Panhuizen, Marja, Utrecht University/Humboldt University Berlin (NL/GER)
Webb, David, University of Colorado at Boulder (USA)
Whitelock, Denise, The Open University (UK)
Wilhelm, Oliver, Humboldt University Berlin (GERMANY)
Winkley, John, Becta (UK)
ABSTRACTS

Plenary Lectures
Assessment, grading, and instruction:
Understanding the context of educational measurement

Eckhard Klieme, German Institute for International Educational Research (DIPF), Germany

Both institutional effectiveness and individualized, adaptive education depend on the
availability of sophisticated instruments to measure and model student learning. However,
“one size of assessment does not fit all” (Pellegrino et al., 2001, p. 222). These authors
called for multidisciplinary research activities focusing on three facets: “(1) development of
cognitive models of learning that can serve as the basis for assessment design, (2)
research on new statistical measurement models and their applicability, (3) research on
assessment design” (p. 284). Similarly, a recently started research program in Germany
covers four key areas: “the development of theoretical models of competence (1), the
construction of psychometric models (2), the construction of measurement instruments for
the empirical assessment of competencies (3), and research on the use of diagnostic
information (4)” (Koeppen, Hartig, Klieme, & Leutner, 2008).

Given that outstanding improvements have been made in recent years with regard to cognitive
modelling, psychometrics, and test design, the fourth area seems to be the one that is least
understood in educational research. While much – mostly critical – research has been done on
the effects of high-stakes, standards-based assessment systems, few researchers have studied
the many varieties of educational assessment that take place in the context of everyday
classroom teaching and the huge impact these practices have on student learning.

Teachers observe students’ understanding and performance in a variety of ways: in
classroom dialogue, homework assignments, and formal tests. These procedures
should permit diagnosis on an individual level, in terms of understanding students’ individual
solution paths, misconceptions, etc. Appropriate individual feedback is crucial to support the
subsequent learning process. A number of research questions arise in this context: What kind
of diagnostic information is best understood by students, and what kind by teachers? How
well can teachers evaluate individual learning processes? What factors influence teachers’
grading decisions? What models of competence do teachers rely on – implicitly or explicitly?
How well founded and how helpful is the individual student feedback provided by the teacher?
And how do all these processes interact with newly implemented assessment systems?

After an overview of attempts at “instructionally sensitive assessment”, the paper will present
two studies investigating everyday classroom practices in detail, based on a sample of math
lessons from Germany and Switzerland. The first study examines teacher judgments about
student achievement in terms of the grades awarded. It examines the degree to which the
grades awarded reflect different dimensions of students’ achievement and learning behavior.
It also explores whether assessment and instruction are indeed aligned in the classroom, that
is, whether teachers’ grading is aligned with their instruction. In the second study, we analyze
how teacher evaluation affects students’ subsequent learning processes. This study uses
feedback given to students by the teacher within classroom interaction as an indicator of the
communication of student evaluation, and investigates the impact of two types of feedback,
evaluative and informational, on student learning and motivation.
Improving Children’s Rights in Assessment:<br />
issues, challenges and possibilities<br />
Ruth Leitch, Queen's University Belfast, United Kingdom<br />
Much valuable work has been achieved in recent years concerning the educational benefits<br />
of consulting with children and young people on teaching and learning (Flutter & Rudduck,<br />
2004) and as a means of assuring children their rights in education. There has been<br />
significantly less research on improving children’s rights in assessment despite a growing<br />
interest in the relationship between assessment and social justice. Rights in relation to the<br />
assessment of students’ learning or performance do not expressly exist in legislation in<br />
most jurisdictions although they are enshrined in international treaties such as the United<br />
Nations Convention on the Rights of the Child (UNCRC, 1989).<br />
This presentation will contribute a children’s rights perspective to issues of assessment<br />
focusing specifically on the legal implications and imperatives of Article 12 of the UNCRC.<br />
Data illuminating various issues will be derived from a recent ESRC/TLRP qualitative<br />
research project that consulted pupils on aspects of assessment policy and practice,<br />
including the introduction of annual pupil profiles and assessment for learning (AfL) in the<br />
Northern Ireland context (Leitch et al., 2008). A conceptual model based on a critical legal<br />
interpretation of Article 12 (Lundy, 2007) will be unpacked to illustrate some of the<br />
opportunities and obstacles afforded by students being involved more fully in the<br />
assessment of their learning. The presentation will conclude by arguing that if we are truly<br />
committed to improving children’s rights in relation to assessment, there must be a<br />
concerted approach to awareness raising on the obligations of children’s rights at all levels<br />
within the education system as part of a democratic culture shift – and Article 12 (UNCRC)<br />
is a valuable place to start.<br />
References<br />
Flutter, J. & Rudduck, J. (2004) Consulting Pupils: What's in it for Schools? London: RoutledgeFalmer.<br />
Leitch, R., Gardner, J., Mitchell, S., Lundy, L., Galanouli, D. & Odena, O. (2008) Consulting Pupils on the<br />
Assessment of their Learning. ESRC/TLRP Research Briefing Number 36, March 2008,<br />
http://www.tlrp.org/pub/research/html<br />
Lundy, L. (2007) ‘Voice is not enough’: The implications of Article 12 of the UNCRC for Education. British<br />
Educational Research Journal, Vol 33, No 6, 927-942.<br />
UNCRC (1989) United Nations Convention on the Rights of the Child. UN General Assembly Resolution<br />
44/25. New York: United Nations.<br />
When is assessment learning-oriented?<br />
Dylan Wiliam, University of London, United Kingdom<br />
Educational assessments are conducted in a variety of ways and their outcomes can be<br />
used for a range of purposes. There are differences in who decides what is to be assessed,<br />
who carries out the assessment, where the assessment takes place, how the resulting<br />
responses made by students are scored and interpreted, and what happens as a result<br />
(Black and Wiliam, 2004). In particular, each of these can be the responsibility of the<br />
learners themselves, those who teach the students, or, at the other extreme, all the<br />
processes can be carried out by an external agency. Cutting across these differences, there<br />
are also differences in the functions that assessments serve.<br />
Assessments can be used to support judgments about the quality of educational programs<br />
or institutions (what might be termed the evaluative function). They can be used to describe<br />
the achievements of individuals, either for the purpose of certifying that they have reached a<br />
particular level of performance or competence, or for making predictions about their future<br />
capabilities (what might be termed the summative function). And assessment can be used<br />
to support learning (what might be termed the formative function).<br />
In this talk, I will suggest that an assessment functions formatively only when evidence<br />
about student achievement elicited by the assessment is interpreted and used to make<br />
decisions about the next steps in learning that are likely to be better, or better founded, than<br />
the decisions that would have been made in the absence of that evidence.<br />
I will further suggest that learning-oriented assessment involves five key strategies, which<br />
serve to connect assessment to other important educational processes:<br />
• Clarifying, understanding, and sharing learning intentions<br />
• Engineering effective classroom discussions, tasks and activities that elicit evidence of<br />
learning<br />
• Providing feedback that moves learners forward<br />
• Activating students as learning resources for one another<br />
• Activating students as owners of their own learning<br />
Examples of each of these strategies will be given, and the presentation will conclude by<br />
offering a set of priorities for the design of learning-oriented assessments.<br />
Invited Symposia<br />
Invited Symposium: Socio-cultural perspectives<br />
Using socio-cultural perspectives to understand and change<br />
assessment in post-compulsory education<br />
Organiser: Liz McDowell, University of Northumbria, United Kingdom<br />
This symposium focuses on the application of socio-cultural perspectives to the day-to-day<br />
practices of assessment in post-compulsory education. The importance of the research is<br />
that despite significant changes in theories of learning and teaching and in perspectives on<br />
the societal goals for educational systems, shifts in assessment thinking and practices have<br />
lagged behind these changes (Shepard, 2000). This remains true despite a number of<br />
powerful arguments making the case for a change of vision from a ‘testing culture’ to an<br />
‘assessment culture’ (Wolf et al., 1991).<br />
Assessment as testing, based in the scientific measurement approach (Hager & Butler,<br />
1996), has retained its behaviourist roots much more strongly than contemporary teaching<br />
practices. Constructivist approaches should give students a more active role as participants<br />
in assessment rather than victims of the assessor (Dochy & McDowell, 1997) and there has<br />
been considerable growth in interest in formative assessment (Black & Wiliam, 1998).<br />
Nevertheless, in practice, assessment tends to be seen in a technicist way as a decontextualised<br />
and narrow type of activity, a system designed to direct students,<br />
formatively, towards performances that are summatively validated and represented by<br />
grades awarded. Hence there is considerable emphasis on effective techniques and the<br />
design of constructively aligned systems (Biggs, 2003) that channel students into the<br />
desired assessment performances.<br />
Socio-cultural approaches take a much broader view of assessment, recognising that it is a<br />
socially and contextually located set of practices. It can be seen as a structure with a<br />
complex of activities, influences and outcomes experienced by the actors within it, chiefly<br />
teachers and students, situated within a broader social, historical and cultural context.<br />
The symposium presenters draw upon research studies which have produced new insights<br />
into assessment practices. Their collective work represents a cumulative and integrated<br />
body of evidence pointing to the value of socio-cultural understandings of assessment and<br />
their utility in improving practice in classrooms, lecture rooms and examination halls. Each<br />
paper draws on a wide range of evidence but presents data from recent research studies in<br />
post-compulsory education. Assessment is viewed in terms of: its meaning for individuals<br />
with their personal histories and developing identities; its meaning at the collective level,<br />
that is, the ways that assessment is constructed in classrooms, courses, and institutions;<br />
and its longer term consequences (Boud & Falchikov).<br />
The research teams have worked with teachers to make links between new understandings<br />
of assessment and local assessment practices. Some have used action research<br />
approaches to engage teachers and involve them in the development of theory and<br />
practice. Each paper will challenge symposium participants to problematise aspects of<br />
assessment thinking and practice that may have been taken for granted and will offer ways<br />
of accommodating new understandings in assessment practice.<br />
Invited Symposium: Socio-cultural perspectives / Paper 1:<br />
Getting beyond the individual and the technical:<br />
How a cultural approach offers knowledge to transform assessment<br />
David James, University of the West of England, United Kingdom<br />
This paper argues that a socio-cultural perspective on assessment is an urgent and<br />
practical necessity in higher education. It is important to understand how and to what extent<br />
assessment practices (a) attempt to serve contradictory purposes, and (b) determine<br />
conceptions and practices of learning beyond those desired by tutors and students. It is also<br />
crucial to appreciate how much scope tutors have for beneficial interventions, and why.<br />
The paper begins by setting out some tools by which this might be achieved. It draws upon<br />
the methods and outcomes of the Transforming Learning Cultures in Further Education<br />
project which formed part of the national UK Teaching and Learning Research Programme.<br />
The project was the largest ever independent study of practices in further education. Its<br />
aims were to deepen understanding of the complexities of learning, to weigh up strategies<br />
for improvement, and to set in place a lasting capacity for productive enquiry amongst FE<br />
professionals. To this end, the study involved over 1000 questionnaires, 600 interviews,<br />
extensive shadowing and tutor diaries. It was informed by a range of theoretical sources, of<br />
which Dewey (Biesta and Burbules, 2003) and Bourdieu (e.g. Bourdieu, 1998; Grenfell and<br />
James, 1998) were prominent. The outcomes of the study included a tool for understanding<br />
tutor interventions, some ‘principles of procedure’ for improvement, and a new ‘cultural’<br />
theory (Hodkinson, Biesta and James, 2007). One recurrent theme in the analysis is how<br />
assessment regimes and events embody – sometimes whilst concealing – strong notions of<br />
learning and teaching. Related to this, the study demonstrated how and why individual<br />
tutors often had limited capacity to make significant improvements on their own, sometimes<br />
despite sterling efforts, and how the same intervention could have positive or negative<br />
effects depending on the specific setting.<br />
A method for interrogating learning cultures whilst keeping assessment as a core focus is<br />
presented and applied to HE practice. This research approach raises questions such as,<br />
what conception of learning is inherent in particular ways of writing learning outcomes, or in<br />
the use of academic credit, or in certain assessment events, and marking regimes? Are<br />
there conceptions of learning that are rhetorically important but then marginalized in<br />
assessment practices? The approach avoids the pretence that assessment is fundamentally<br />
a technical matter (James, 2000) and argues that the idea of constructive alignment (e.g.<br />
Biggs, 2003) is ‘too good to be true’. Instead, the paper offers a cultural view of assessment<br />
practices that takes account of power, interests, relationships, and interactions. The view<br />
advocated is compatible with the humanistic concerns in the earlier seminal work of Heron<br />
(1988) and Boud (e.g. 1990), but combines their insights with a fresh ‘take’ on the capacity<br />
of (and scope for) tutors to act. The paper argues that understanding a learning culture<br />
provides a route to realism about worthwhile and possible change to assessment events<br />
and regimes.<br />
Invited Symposium: Socio-cultural perspectives / Paper 2:<br />
Straitjacket or springboard?: the strengths and weaknesses of using a<br />
socio-cultural understanding of the effects of formative assessment on learning<br />
Kathryn Ecclestone, Oxford Brookes University, United Kingdom<br />
Research into formative assessment in schools and higher education has pointed to a<br />
variety of techniques that supporters claim will raise achievement, engage students with<br />
learning and promote more democratic, transparent assessment practices. Yet, a major<br />
research project exploring formative assessment in further and adult education shows that<br />
techniques, in themselves, are neither progressive nor unprogressive (see Ecclestone 2002,<br />
2008; Davies and Ecclestone, 2007; Marshall and Drummond, 2006). Instead, work by<br />
James and Biesta and colleagues shows the usefulness of a socio-cultural understanding of<br />
learning (see, for example, James and Biesta 2007). The paper explores how a socio-cultural<br />
approach illuminates the subtle ways in which different learning cultures within the<br />
same institution or course can produce formative assessment that leads either to<br />
instrumental compliance or deep, sustainable engagement. Sometimes learning cultures<br />
encourage instrumental assessment as a springboard to deeper forms of learning;<br />
sometimes instrumental assessment acts as a straitjacket on learning.<br />
This paper draws on recent empirical studies that have explored the links between policy for<br />
formative assessment, espoused theoretical principles and the reality of day to day<br />
practices in different contexts. It examines how teachers’ and students’ ideas about<br />
formative assessment practices cannot be divorced from the learning cultures which both<br />
shape those ideas and practices and which, in turn, are shaped by them. This illuminates<br />
tensions between instrumental and sustainable formative practice and shows possibilities<br />
for affecting practice.<br />
However, there is also a danger that a socio-cultural understanding over-emphasises<br />
the discursive effects of formative assessment on identities and the navigation<br />
of power and relationships within assessment practices. Whilst these effects are important,<br />
attending to them can divert attention from the quality of educational outcomes for<br />
students.<br />
The paper makes proposals about how a socio-cultural understanding of formative<br />
assessment can help teachers improve their practice in positive ways, with specific examples<br />
from recent activities and discussions with teachers in further and adult education.<br />
Invited Symposium: Socio-cultural perspectives / Paper 3:<br />
Formative assessment: the discursive construction of identities<br />
John Pryor, University of Sussex, United Kingdom<br />
Barbara Crossouard, University of Sussex, United Kingdom<br />
This paper relates to recent work in higher education with professional doctorate students. It<br />
builds on empirical research conducted in a number of educational contexts over the past<br />
14 years (e.g. Torrance and Pryor 1998, 2001; Pryor and Crossouard 2008; Crossouard<br />
2008). In these studies the crucial importance of issues of student and teacher identity to<br />
learning cultures and therefore to the nature and consequences of formative assessment<br />
has emerged as an increasingly important theme.<br />
The analysis draws on social theory whereby identity is not seen in terms of the<br />
individualized psychological self but more in terms of identity embedded in social processes<br />
and practices (see Hey, 2006). This is related to a sociocultural perspective on learning as<br />
happening through dialogic processes of identity construction and performance, so that it<br />
involves ‘becoming a different person [where] identity, knowing and social membership<br />
entail one another’ (Lave & Wenger, 1991, p.53). The data on doctoral students were<br />
derived from observation of formative assessment, discourse analysis of online texts,<br />
including peer discussion forum interactions and tutor email feedback, and exploration of<br />
student perceptions through in-depth interviews. This yielded a close focus on the<br />
processes of formative assessment, supplemented by insider perspectives arising from the<br />
fact that the researchers were, respectively, the main tutor on the part of the<br />
doctoral programme under study and a doctoral candidate. Thus the project was able to<br />
include an element of action research to develop and evaluate different aspects in more<br />
detail and explicitly ground the findings in practice.<br />
Our conclusions are that issues of identity, power and culture are part of the complexity of<br />
learning. These issues may act as barriers, but formative assessment offers opportunities<br />
for what might be described as an explicit meta-discourse which may also enhance<br />
learning. Thus power differentials emerge as potentially productive when different identity<br />
positions – assessor, teacher, practitioner, learner, disciplinary expert, critic – are<br />
deliberately invoked by the tutor. Similarly student identities, both as students and in relation<br />
to their past and future lives, can be deliberately invoked. Within this play of identities the<br />
disciplinary norms against which students’ performances are judged (the rules of the game)<br />
may be highlighted. Engagement with subject matter alongside identity thus has special<br />
potential for formative assessment as a means of promoting equity in education.<br />
This work establishes formative assessment at the heart of higher education practice; a<br />
key implication is that its potency should not be underestimated. Despite its complexity we do<br />
suggest ways in which the play of identities can be incorporated into the practice of teaching<br />
and learning.<br />
Invited Symposium: C-tests<br />
Measuring language skills by means of C-tests:<br />
Methodological challenges and psychometric properties<br />
Organisers: Alexander Robitzsch, Olaf Köller, IQB, Humboldt University Berlin, Germany<br />
Chair: Olaf Köller, IQB, Humboldt University Berlin, Germany<br />
C-tests are widely used to assess overall language skills, both in foreign languages and in<br />
the mother tongue. These tests usually consist of texts interrupted by gaps that have to be<br />
filled in by examinees. The individual gaps within each text are typically the smallest unit of<br />
analysis, which can be treated as individual test items. The analytical focus, however, of the<br />
C-test assessment is usually not the individual item, but the overall level of achievement<br />
(i.e., the number of closed gaps). While there is broad consensus that these measures are<br />
quite reliable and valid, there are several methodological challenges associated with these<br />
measures. In particular, when IRT models are applied to these tests, different strategies can<br />
be used. Some authors (e.g., Eckes, 2007) recommend the application of polytomous IRT<br />
models, in which all gaps of one text form one rating scale ranging from zero up to<br />
the number of gaps. Other authors, however, prefer analyzing each gap as a single item<br />
and then applying Rasch testlet models, in which dependencies among items are explicitly<br />
modelled. All three papers in the proposed symposium focus on this issue.<br />
The first paper, provided by Eckes, clearly argues for polytomous IRT models when analyzing C-tests.<br />
The appropriateness of this approach is demonstrated with C-test results from approximately<br />
5,000 examinees from 116 countries, all of whom worked on C-tests measuring<br />
German as a foreign language.<br />
The authors of the second paper (Hartig & Harsch) argue that a more adequate strategy<br />
might be to apply testlet models to C-tests, an approach that reveals more about specific<br />
dependencies among gaps. This approach is illustrated by means of data from the German<br />
DESI large-scale study, in which the foreign language skills of about 10,000 9th graders were<br />
assessed with C-tests.<br />
The third paper, by Robitzsch, Karius, and Neumann, offers an extension of the<br />
Hartig and Harsch approach. In their study, which was part of the German national<br />
assessment program, the authors propose a more detailed testlet model which models<br />
dependency of the gaps hierarchically, i.e., items are nested within sentences and<br />
sentences are nested within C-tests. Furthermore, the authors analyze relationships of C-tests<br />
with tests of other language skills. In summary, the papers widen our understanding<br />
both of how to model responses in C-tests and of what C-tests typically measure.<br />
Invited Symposium: C-tests / Paper 1:<br />
Constructing a calibrated item bank for C-tests<br />
Thomas Eckes, TestDaF Institute, Germany<br />
C-tests are gap-filling tests that measure general language proficiency. In terms of efficient<br />
construction of C-tests, high-quality test development, and flexible test administration,<br />
including web-based testing, it is imperative to make use of a calibrated item bank, that is,<br />
an item bank in which parameter estimates for all items in the bank have been placed on<br />
the same difficulty scale. When constructing a calibrated item bank for C-tests, two major<br />
issues arise: (a) choosing an IRT model for item calibration and linking, and (b) choosing a<br />
design for collection of item-banking data.<br />
Regarding the first issue, it is important to realize that gaps within a given text are locally<br />
dependent to a significant degree. As a consequence, texts should not be analyzed on the<br />
level of individual gaps, but should rather be construed as super-items (item bundles,<br />
testlets), with item values corresponding to the number of gaps within a given text; that is,<br />
each text should be viewed as a polytomous item. Accordingly, Rasch models such as<br />
Andrich’s rating scale model or Masters’ partial credit model would seem appropriate.<br />
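The polytomous treatment described above can be made concrete with a short sketch. The following illustration (not the onDaF implementation) computes the score probabilities for one text under Masters’ partial credit model; Andrich’s rating scale model is the special case in which all texts share a common step structure.

```python
import math

def pcm_probs(theta, deltas):
    """Category probabilities for one polytomous 'super-item' (text)
    under Masters' partial credit model.

    theta  -- examinee ability
    deltas -- step difficulties delta_1..delta_m for a text with m gaps;
              scores 0..m correspond to the number of gaps solved.
    """
    # Cumulative sums of (theta - delta_j) give the log-numerators.
    exponents = [0.0]  # score 0
    s = 0.0
    for d in deltas:
        s += theta - d
        exponents.append(s)
    m = max(exponents)                      # guard against overflow
    unnorm = [math.exp(e - m) for e in exponents]
    total = sum(unnorm)
    return [u / total for u in unnorm]

probs = pcm_probs(theta=0.5, deltas=[-1.0, 0.0, 1.2])
print([round(p, 3) for p in probs])  # probabilities for scores 0..3
```

Under this parameterisation, raising `theta` shifts probability mass toward higher scores, i.e., toward more gaps solved within the text.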
With respect to the data collection issue, one widely used design is the common-item<br />
nonequivalent groups (CING) design. In this design, various test forms are linked through a<br />
set of common items. The groups are not considered to be equivalent. Alternatively, the<br />
randomly-equivalent groups design could be employed. Examinees are randomly assigned<br />
the form to be administered; linkage between the forms is achieved by assuming that the<br />
different groups of examinees taking different forms are equivalent in ability.<br />
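As a toy illustration of how common items tie two calibrations together under a CING design, a mean/mean Rasch linking can be sketched as follows (the numbers and function names are illustrative; the study itself uses a concurrent estimation procedure rather than separate linking):

```python
def mean_mean_link(b_common_new, b_common_ref):
    """Mean/mean linking constant for Rasch difficulties under a
    common-item nonequivalent groups (CING) design: the new form's
    scale is shifted so that the mean difficulty of the common items
    matches the reference calibration."""
    assert len(b_common_new) == len(b_common_ref)
    return (sum(b_common_ref) - sum(b_common_new)) / len(b_common_new)

def rescale(b_new_form, shift):
    """Place all difficulties of the new form on the reference scale."""
    return [b + shift for b in b_new_form]

shift = mean_mean_link([0.2, -0.5, 1.1], [0.5, -0.2, 1.4])
print(shift)  # shift is approximately 0.3
```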
In the present paper, I report on an ongoing study aiming at the construction of a large<br />
calibrated item bank for use with an Internet-delivered C-test, the “Online Placement Test of<br />
German as a Foreign Language” (onDaF; www.ondaf.de). Building on research into the<br />
suitability of various polytomous Rasch models to the analysis of C-tests (Eckes, 2007), the<br />
rating scale model was employed for item calibration. Adopting a CING design, item-banking<br />
data were collected in a series of 23 different test sessions, covering a total of<br />
4,842 participants from 116 countries. In each session a set of 10 texts was administered,<br />
two of which were common to all sets. Reliability indices per set ranged from .94 to .98.<br />
Texts showing unsatisfactory model fit or DIF were eliminated. The remaining 174 texts<br />
were put on the same difficulty scale through a concurrent estimation procedure.<br />
Combined with a carefully designed client-server architecture, the Rasch-measurement<br />
approach to item banking currently provides the basis for a highly flexible administration of<br />
the onDaF at licensed test centers throughout the world. When taking the onDaF, each<br />
examinee is presented with a unique set of eight texts; that is, texts are drawn from the item<br />
bank according to a linear-on-the-fly test delivery model. In each instance, test assembly is<br />
subject to the constraints of increasing text difficulty and variation in text topic. Responses<br />
are automatically scored and test results are reported to examinees immediately after<br />
completing the test.<br />
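The assembly constraints mentioned above (no repeated topic, texts presented in order of increasing difficulty) might be sketched, purely illustratively and under assumed field names, as:

```python
import random

def assemble_on_the_fly(bank, n_texts=8, rng=None):
    """Linear-on-the-fly assembly sketch: draw texts from a calibrated
    bank subject to (a) no repeated topic and (b) presentation in
    order of increasing difficulty. `bank` is a list of dicts with
    'id', 'topic', and 'difficulty' keys (illustrative schema)."""
    rng = rng or random.Random()
    pool = bank[:]
    rng.shuffle(pool)
    chosen, topics = [], set()
    for text in pool:
        if text["topic"] not in topics:
            chosen.append(text)
            topics.add(text["topic"])
        if len(chosen) == n_texts:
            break
    return sorted(chosen, key=lambda t: t["difficulty"])
```

Each examinee thus receives a unique, randomly drawn set of texts while the ordering constraint is enforced deterministically at the end.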
Invited Symposium: C-tests / Paper 2:<br />
Gaining substantive information from local dependencies between C-test items<br />
Johannes Hartig, German Institute for International Educational Research (DIPF), Germany<br />
Claudia Harsch, IQB, Humboldt University Berlin, Germany<br />
The C-test is used as a screening tool to assess global performance levels in written<br />
language competence. The individual gaps within each text of the C-test are the smallest<br />
unit of analysis, which can be treated as individual test items. Typically, however, the focus<br />
of the C-test assessment is not the individual item, but the overall level of achievement (i.e.,<br />
the number of closed gaps). A technical argument against an analysis on the item level is<br />
that the solutions of individual gaps partially depend on the solutions of the remaining gaps<br />
within the same text. This means if individual gaps are treated as items, local independence<br />
of these items, as presumed in most measurement models, is a rather unlikely assumption.<br />
For this reason, many authors prefer to analyze performance in C-tests on the text level and<br />
not on the level of individual gaps, e.g. by treating each text as a separate “super-item”. In<br />
contrast to this approach, this paper will focus on the substantive information that can be<br />
gained about the C-test and the underlying language competencies if performance is<br />
analyzed on the item level. We use the dependencies on text level and between individual<br />
gaps to derive information about characteristics of texts and gaps that determine students’<br />
solution processes. The aim of the study is to predict these dependencies by using a priori<br />
defined text and item characteristics.<br />
Statistical and graphical methods to examine item dependencies on text and item level are<br />
presented. Dependencies on text level can be estimated within a Rasch testlet model,<br />
assuming additional latent dimensions for each text, over and above the common<br />
underlying ability dimension. Dependencies between individual gaps can be graphically<br />
analyzed based on the correlations between residuals from Rasch analyses within each<br />
text. These methods are applied to data from a large scale assessment of English language<br />
competencies of German 9th graders. Statistics for local dependencies are estimated on<br />
text and on item level. Results of the Rasch testlet model show substantial amounts of text-specific<br />
variance, indicating general dependencies between gaps within the same text. The<br />
analysis of residuals yields strong dependencies between a few specific item pairs, while in<br />
some texts almost no marked dependencies are found on item level. Dependencies on text<br />
level as well as on item level can partly be explained by text and item characteristics. For<br />
instance, the deletion frequency of gaps seems to affect dependencies on text level, and<br />
dependencies between items can be found for gaps within the same phrases. The results<br />
widen our understanding of the C-test construct; we discuss whether this knowledge can be<br />
used to systematically construct C-tests with specific properties.<br />
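The residual-based part of the analysis described above can be sketched minimally as follows (an illustration of the idea, not the study’s actual estimation code): given previously estimated abilities and gap difficulties, standardized Rasch residuals are computed and their pairwise correlations flag locally dependent gaps.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def residual_correlations(X, thetas, bs):
    """Pairwise correlations of standardized Rasch residuals for gaps
    within one text. X: 0/1 responses (persons x gaps); thetas, bs:
    previously estimated abilities and gap difficulties. Large
    positive correlations flag locally dependent gap pairs."""
    n, k = len(X), len(bs)
    # standardized residuals z = (x - p) / sqrt(p * (1 - p))
    Z = []
    for i in range(n):
        row = []
        for j in range(k):
            p = rasch_p(thetas[i], bs[j])
            row.append((X[i][j] - p) / math.sqrt(p * (1 - p)))
        Z.append(row)
    def corr(a, b):
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
        va = sum((x - ma) ** 2 for x in a) / n
        vb = sum((y - mb) ** 2 for y in b) / n
        return cov / math.sqrt(va * vb)
    cols = list(zip(*Z))
    return {(j, l): corr(cols[j], cols[l])
            for j in range(k) for l in range(j + 1, k)}
```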
Invited Symposium: C-tests / Paper 3:<br />
C-tests for German Students:<br />
Dimensionality, Validity and Psychometric Perspectives<br />
Alexander Robitzsch, IQB, Humboldt University Berlin, Germany<br />
Ina Karius, IQB, Humboldt University Berlin, Germany<br />
Daniela Neumann, IQB, Humboldt University Berlin, Germany<br />
C-tests are integrative tests which are designed to test a person’s command of language by<br />
making use of the principle of reduced redundancy. It is assumed that language is<br />
redundant in a way that allows successful communication even though flaws in the<br />
transmission of language (lack of clarity, ambiguity, noise) may impede understanding. The<br />
addressee of a message is able to reconstruct the form and meaning of morphologically<br />
incomplete words supported by the local and global context of the message, provided that<br />
he or she is familiar with the vocabulary, the grammatical rules and the cultural background<br />
of the language used.<br />
In Germany, the educational standards for the subject “German” (mother tongue) are<br />
supposed to ensure that every student is able to fully participate in written and spoken<br />
interaction. In the course of measuring the educational standards, items were<br />
developed to assess the students’ competences in this area. A total of 1700 students of all<br />
secondary school types ranging from grade level eight to ten (14-17 years old) were tested<br />
in reading and listening comprehension, writing, orthography and language use. The<br />
assessment part on language use contained, among other items, C-tests. Altogether ten<br />
different C-tests were used in a sample of 560 students. Every student filled in four C-tests<br />
which resulted in a complete balanced multi-matrix sampling design.<br />
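One simple way to build such a balanced multi-matrix design is a cyclic booklet scheme, sketched below purely for illustration (the study’s actual booklet design is not specified in the abstract): each booklet contains a consecutive block of tests, so every test appears in the same number of booklets.

```python
def cyclic_booklets(n_tests=10, per_booklet=4):
    """Cyclic multi-matrix booklet design sketch: booklet b contains
    tests b, b+1, ..., b+per_booklet-1 (mod n_tests), so every test
    appears in exactly per_booklet booklets. Note this balances how
    often each test occurs, not all pairwise co-occurrences."""
    return [[(b + k) % n_tests for k in range(per_booklet)]
            for b in range(n_tests)]

booklets = cyclic_booklets()  # ten booklets of four C-tests each
```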
In many cases the dimensionality of C-tests is assessed by regarding each C-test as one<br />
super-item comprising all of its blanks. We focus on a dimensional analysis on<br />
the item level and use NOHARM and DIMTEST to assess the essential dimensionality of the C-test<br />
construct. Because the Rasch model’s assumption of local stochastic independence is violated, we propose<br />
a more detailed testlet model which models dependency hierarchically, i.e., items are nested<br />
within sentences and sentences are nested within C-tests.<br />
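The hierarchical dependency structure just described can be made concrete with a small generative sketch (an illustration of the model’s logic, not the authors’ estimation code): the logit of a correct gap response combines the person’s ability with person-specific text and sentence effects.

```python
import math
import random

def simulate_hierarchical_testlet(n_persons, texts,
                                  sd_text=0.5, sd_sentence=0.3, seed=1):
    """Simulate 0/1 gap responses under a hierarchical Rasch testlet
    model: logit P = theta_i + u_text + v_sentence - b_gap, with gaps
    nested in sentences and sentences nested in texts.
    `texts` maps text id -> {sentence id -> [gap difficulties]}."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0, 1)
        row = {}
        for t, sentences in texts.items():
            u = rng.gauss(0, sd_text)          # person-by-text effect
            for s, gaps in sentences.items():
                v = rng.gauss(0, sd_sentence)  # person-by-sentence effect
                for g, b in enumerate(gaps):
                    p = 1 / (1 + math.exp(-(theta + u + v - b)))
                    row[(t, s, g)] = int(rng.random() < p)
        data.append(row)
    return data
```

Setting `sd_text` and `sd_sentence` to zero recovers the ordinary Rasch model, which is why the extra variance components capture exactly the within-text and within-sentence dependencies.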
This paper estimates this multilevel item response model on the item level. In addition,<br />
variance due to local stochastic dependency is explained by linguistic characteristics of<br />
the material and by student properties such as grade level and school track. To provide a<br />
deeper understanding of validity, we study relationships of C-test results with subdomains of<br />
language use, orthography and listening comprehension by using a confirmatory factor<br />
analysis.<br />
This contribution gives insight into the C-test construct for native speakers with regard to<br />
dimensionality; it provides detailed validity evidence and finally proposes an alternative<br />
psychometric scaling model.<br />
Invited Symposium: E-Assessment<br />
Moving forward with e-assessment<br />
Organiser: Denise Whitelock, The Open University, United Kingdom<br />
Discussant: Kari Smith, University of Bergen, Norway<br />
Technology is increasingly being used to support assessment, but its effectiveness in this<br />
area of learning is still open to question. In part, this is because there is an awkward tension<br />
between assessment and constructivist approaches to learning. Even when an entire<br />
learning experience has been designed to be constructivist and learner-centred, formal,<br />
summative assessment sits uneasily with this constructivist pedagogy; it is ‘out there’ and is<br />
not part of the process of constructing knowledge. Constructivism is used as an approach<br />
for getting better performance on conventional measures, rather than as a radical<br />
philosophy about the nature of knowledge and its acquisition. When assessment is<br />
embedded within constructivist pedagogy, learners quickly adopt strategies that optimise<br />
their cognitive load, typically guessing what is expected of them rather than constructing<br />
their own conceptual frameworks.<br />
This symposium scrutinises the laudable aims of harnessing technology enhanced<br />
assessment to help shape learners as independent thinkers, making their own judgments<br />
and decisions about their learning process in partnership with their tutors. Assessment and<br />
learning need to be properly linked. As Elton and Johnston said, “if one changes the method<br />
of teaching, but keeps the assessment unchanged, one is very likely to fail.” And Rowntree:<br />
“if we wish to discover the truth about an educational system, we must look into its<br />
assessment procedures.” Despite decades of innovation in learning theory and technology<br />
and many different approaches to the problem of building conversational relationships in<br />
education, assessment is still the core of the problem.<br />
Assessment systems define the nature of subjects, and what is worth knowing, and act as<br />
gatekeepers to progress in education and careers, synchronising understanding between<br />
an individual and the world they live in. This symposium will discuss how e-assessment can<br />
overcome the barriers imposed upon both students and tutors in developing ICT-based<br />
systems that are tuned to the current net generation of learners.<br />
Invited Symposium: E-Assessment / Paper 1:<br />
The Future of E-Assessment: E-Assessment as a Dialog<br />
Cornelia Ruedel, University of Zurich, Switzerland<br />
E-Assessment has become more and more popular over the last decade. Although its impact<br />
in the German-speaking countries has been limited compared to the advances in the USA<br />
and the UK, universities in Switzerland, Germany and Austria have now realised<br />
e-Assessment’s potential. The exam system in German-speaking countries is<br />
traditionally dominated by a combination of exams, assignments and oral<br />
presentations. The Bologna Reform, with its modularisation of university courses, has<br />
put a burden upon the exam system and has prompted a rethinking of assessment<br />
methods. These new assessment forms should take all the criticism of the traditional<br />
system into account in terms of validity, reliability and fairness. The usual practice of<br />
marking in the German-speaking countries is not really transparent. There is no tradition of<br />
having an external examiner for written exams, only the lecturer is responsible for the<br />
marking process. Therefore students sometimes have an uneasy feeling about the<br />
whole system.<br />
Generally, the students’ view of assessment is quite different from the lecturers’ view.<br />
Assessment is at the heart of student learning, but it is not at the heart of teaching.<br />
Assessment should become a classroom element which motivates, encourages and<br />
stimulates student learning. E-Assessment offers all of this in a variety of maintainable<br />
solutions, such as self-tests, e-portfolios and peer assessment.<br />
This paper will discuss future possibilities for E-Assessment, which should take a more<br />
holistic approach, mixing learning, teaching and assessing. Assessment should shift away<br />
from the usual ‘snapshot assessing’ towards assessment over a period of time, to avoid<br />
students’ exam anxiety and the occasional blackout or blip. This approach of continuous<br />
assessment is only feasible with electronic delivery and the use of a Virtual Learning<br />
Space. This space should enable the students to be in control of their own learning and<br />
even their private notes and reports. The Virtual Learning Environments are too static at the<br />
moment and do not offer the freedom students require to network with their peers.<br />
Furthermore, the learning space should allow the flexibility that the students can decide<br />
when and where they are ready to take the test, which would lead to a self-organised<br />
assessment.<br />
Students could take the formative assessments as many times as they want, their progress<br />
would be recorded and would contribute to the final mark. In-depth and targeted feedback<br />
could guide the students so that they learn from their own mistakes and misconceptions,<br />
which would help them to develop their reflective thinking skills. Here, E-Assessment would<br />
play the major role because it is possible to assess softer skills too, since researching,<br />
validating data from different sources and working in a team are becoming more important<br />
in an increasingly open job market. New forms of collaborative assessment techniques will be<br />
established where wikis and blogs are only the beginning. E-Assessment will be the new<br />
approach where the students’ expectations meet the teachers’ requirements.<br />
Invited Symposium: E-Assessment / Paper 2:<br />
Alcohol and a Mash-up: Understanding Student Understanding<br />
Jim Ridgway, University of Durham, United Kingdom<br />
Sean McCusker, University of Durham, United Kingdom<br />
James Nicholson, University of Durham, United Kingdom<br />
Informed citizenship depends on the ability of citizens to understand and reason from evidence. In<br />
the UK at least, school statistics focuses on the mastery of technique, rather than on interpretation<br />
of results (Ridgway, McCusker & Nicholson, 2007). The techniques themselves focus on the<br />
analysis of univariate and bivariate data. As a consequence, school statistics is largely useless in<br />
dealing with any data sets students might encounter in their lives outside school.<br />
There is a large literature on the problems that students and adults have with simple concepts,<br />
such as interpreting static 2D graphs, and tabular information (e.g. Batanero, et al., 1994). One<br />
might predict that working with multivariate data would be impossible for people with no statistical<br />
training. However, empirical explorations (e.g. Ridgway, McCusker & Nicholson, 2006) show that<br />
computer-based three-variable tasks are no more difficult for 12-14 year olds than are 2D<br />
paper-based tasks. Du Feu (2005) has shown that much younger children can work meaningfully with<br />
multivariate data displays that they have created in the form of tactile graphs built from LEGO®.<br />
The SMART Centre has designed a number of software ‘shells’ in Macromedia Flash® that run<br />
on web browsers, and that facilitate the display of MV data (http://www.dur.ac.uk/smart.centre/). A<br />
variety of displays is available, allowing up to six variables to be displayed under user control. An<br />
earlier study (Ridgway, Nicholson, and McCusker, 2008) reported a study based in 13 classes of<br />
pupils aged 12-14 years, covering the range of abilities typical in their school. Resources were<br />
created on topics that included alcohol use, drug use, and sexually transmitted infections, using<br />
data from large scale surveys, together with curriculum materials designed to provoke<br />
understanding of MV data. Classroom observations showed that young pupils across the<br />
attainment range can engage with and understand complex messages in MV data.<br />
The study to be reported here presents students with a mashup comprising recent survey data on<br />
alcohol use presented in an interactive display, and links to recent newspaper articles on alcohol<br />
consumption by young people (e.g. “Young girls drink nearly twice as much alcohol as they did<br />
7 years ago” Daily Mail). Students are asked to critique the articles in the light of the data. We<br />
believe that the ability to read critically in the light of evidence is a core literacy, and a fundamental<br />
requirement for informed citizenship. We will report the findings from this study in detail, along<br />
with a list of core heuristics that are essential when exploring MV data. We will also present<br />
examples from student work that illustrate key aspects of statistical literacy, and questions that<br />
are useful to diagnose student conceptions and misconceptions.<br />
References<br />
Batanero, C., Godino, J. D., Vallecillos, A., Green, D., & Holmes, P. (1994). Errors and difficulties in<br />
understanding elementary statistical concepts. International Journal of Mathematical Education in<br />
Science and Technology, 25(4), 527-547.<br />
du Feu, C. (2005). Bluebells and bias, stitchwort and statistics. Teaching Statistics, 27(2), 34-36.<br />
Ridgway, J., Nicholson, J. R., & McCusker, S. (2007). Teaching statistics despite its applications. Teaching<br />
Statistics, 29(2), 44-48.<br />
Ridgway, J., Nicholson, J., & McCusker, S. (2008, in press). Reconceptualising ‘Statistics’ and<br />
‘Education’. In C. Batanero (Ed.), Statistics Education in School Mathematics: Challenges for<br />
Teaching and Teacher Education. Springer.<br />
Invited Symposium: E-Assessment / Paper 3:<br />
E-assessment for learning?<br />
The potential of short free-text questions with tailored feedback<br />
Sally Jordan, The Open University, United Kingdom<br />
A number of literature reviews have identified conditions under which assessment supports student<br />
learning (e.g. Gibbs and Simpson, 2004). Two common themes are assessment’s ability to<br />
motivate and engage students, and the role of feedback. However, if feedback is to be effective, it<br />
must be more than a transmission of information from teacher to learner. The student must<br />
understand the feedback sufficiently well to be able to learn from it i.e. to ‘close the gap’ between<br />
their current level of understanding and the level expected by the teacher (Ramaprasad, 1983).<br />
The work described is one of a number of projects in an ‘E-assessment for learning’ initiative at<br />
the Centre for the Open Learning of Mathematics, Science, Computing and Technology<br />
(COLMSCT) at the UK Open University. Most of the projects make use of the OpenMark<br />
e-assessment system, which offers students multiple attempts at each question, with the amount of<br />
feedback provided increasing at each attempt. The provision of multiple attempts with increasing<br />
feedback is designed to give the student an opportunity to act on the feedback to correct his or<br />
her work immediately, and the tailored feedback is designed to simulate a ‘tutor at the student’s<br />
elbow’ (Ross et al., 2006).<br />
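The multiple-attempt scheme with increasing feedback can be sketched as follows. This is our illustration only, not the actual OpenMark implementation, and the toy keyword check merely stands in for the real natural-language answer matching:

```python
# Hypothetical sketch of multiple attempts with increasing feedback
# (our illustration; not the actual OpenMark system).

def attempt_question(check_answer, responses, feedback_levels):
    """Mark successive responses; after each wrong attempt, reveal the next,
    more detailed feedback message. Stops when correct or attempts run out."""
    feedback_given = []
    for attempt, response in enumerate(responses[:len(feedback_levels)], start=1):
        if check_answer(response):
            return attempt, feedback_given   # solved on this attempt
        feedback_given.append(feedback_levels[attempt - 1])
    return None, feedback_given              # attempts exhausted

# usage: a toy free-text check standing in for real answer matching
is_correct = lambda r: "gravity" in r.lower()
result = attempt_question(
    is_correct,
    ["because of magnetism", "the force of gravity"],
    ["Think about forces.", "Which force acts downwards?", "It involves gravity."])
```

Here the student answers correctly on the second attempt after seeing only the first, least detailed hint, mirroring the design goal of letting learners act on feedback immediately.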
The current project has extended the range of e-assessment questions offered to students via<br />
OpenMark to include those requiring free-text answers of up to around a sentence in length. The<br />
answer matching is written with an authoring tool provided by Intelligent Assessment<br />
Technologies Ltd. (Mitchell et al., 2002) which uses the natural language processing technique of<br />
information extraction and incorporates a number of processing modules aimed at providing<br />
accurate marking without undue penalty for poor spelling and grammar. A significant feature of<br />
the project has been the use of student responses to developmental versions of the questions,<br />
themselves delivered online, to improve the answer matching.<br />
Evaluation has included an investigation into student reaction to questions of this type and their use<br />
of the feedback provided. A human-computer marking comparison has shown the computer’s<br />
marking to be indistinguishable from, or more accurate than, that of six course tutors. Reasons for this will<br />
be discussed. The two facets of the evaluation are linked; if students are to engage with the<br />
questions and to learn from the feedback provided, the marking must be accurate. Also, although<br />
most students like the questions and are impressed by the sophisticated answer matching, others<br />
appear to find multiple-choice questions less demanding and to be more trusting of human markers.<br />
The purpose of these interactive computer marked assignments (iCMAs) is to provide students<br />
with instantaneous feedback, pacing, and an opportunity to monitor their own progress and to<br />
discuss this with their tutor if appropriate. The iCMAs are complemented by tutor marked<br />
assignments.<br />
References<br />
Gibbs, G. and Simpson, C. (2004) Conditions under which assessment supports students’ learning.<br />
Learning and Teaching in Higher Education, 1:3-31.<br />
Mitchell, T., Russell, T., Broomhead, P. and Aldridge, N. (2002) Towards robust computerised marking of<br />
free-text responses. 6th International CAA Conference, Loughborough, UK.<br />
http://www.caaconference.com/pastconferences/2002/proceedings/Mitchell_t1.pdf<br />
Ramaprasad, A. (1983) On the definition of feedback, Behavioral Science, 28:4-13.<br />
Ross, S.M., Jordan, S.E. and Butcher, P.G. (2006) Online instantaneous and targeted feedback for remote<br />
learners. In C. Bryan and K.V. Clegg (eds) Innovative Assessment in Higher Education. London:<br />
Routledge: 123-131.<br />
Symposia<br />
Symposium: Portfolios in Higher Education<br />
Portfolios in Higher Education in three European countries –<br />
Variations in Conceptions, Purposes and Practices<br />
Organiser: Olga Dysthe, University of Bergen, Norway<br />
Chair: Nicola Reimann, Northumbria University, United Kingdom<br />
Discussant: Anton Havnes, University of Bergen, Norway<br />
Portfolio assessment has been introduced in most countries both as an alternative<br />
assessment tool and as a tool for learning. The term ‘portfolio’, however, is used in very<br />
many different ways. This variation in portfolio conception is often presented as an<br />
advantage, in the sense that the portfolio is a very versatile tool that can be adapted to fit a wide<br />
array of purposes and contexts. But it does create confusion for students who may<br />
encounter very different practices under the same name. In a Europe with extended<br />
educational mobility, we think it is timely to investigate and discuss whether there are any<br />
patterns of use that can be distinguished between countries (or: as characteristics in each<br />
country), and/or whether differences follow disciplinary and professional lines and thus cut<br />
across borders. Specific questions that will be raised in the discussion: Is there a need for a<br />
more unified understanding or definition of portfolio, and if so, is it possible? Is there a need<br />
for a clarificatory framework? The ‘collection-selection-reflection’ framework has been<br />
widely used, but is it useful in all contexts? What is useful for students, and what is needed<br />
in European or wider international fora where portfolios are discussed and researched?<br />
In this symposium we will present research from three countries that give some indication of<br />
how portfolios are used in higher education in Belgium, Norway and England, even though<br />
we fully realise that the picture in each country is even more varied than we are able to<br />
show.<br />
In England portfolios have developed from several directions, broadly summarised as<br />
learner-specific and subject-specific. This presentation, based on a study in 2007, will focus<br />
on e-portfolios growing out of the introduction of Personal Development Planning, which<br />
can be seen as a formative assessment approach.<br />
In Norway portfolios were introduced as an alternative form of assessment in connection<br />
with a major reform of higher education implemented from 2002. This led to a proliferation<br />
of what can be called “disciplinary content portfolios”. A national survey of portfolio practices<br />
is presented, and differences in understanding, content, use and assessment of portfolios<br />
between disciplines are discussed.<br />
A research study from the Flemish part of Belgium highlights a lot of differences between<br />
portfolio applications in different higher education courses. A framework was developed to<br />
compare portfolios from eight different courses in colleges and at universities.<br />
Interpretations of the portfolio concept are divergent and a portfolio standard is absent.<br />
All the contributions are based on research studies but the implications and the questions<br />
raised are practice-related. The presenters represent three European countries, UK,<br />
Norway and Belgium.<br />
Symposium: Portfolios in Higher Education / Paper 1:<br />
Learning opportunities through the processes of e-portfolio development<br />
Elizabeth Hartnell-Young, The University of Nottingham, United Kingdom<br />
In Higher Education in the UK, the use of e-portfolios has developed from several directions,<br />
which can be broadly summarised as learner-specific and subject-specific. The first is often<br />
known as Personal Development Planning, which can be seen as a formative assessment<br />
approach, while the other is in high stakes summative assessment, in discrete subject areas,<br />
and increasingly in competency-based contexts in fields such as medicine. Consequently the<br />
form and content of the e-portfolios, and the issues that arise in their use, differ.<br />
England’s e-Strategy intends that learners will have ‘a digital space that is personalised,<br />
that remembers what the learner is interested in and suggests relevant web sites, or alerts<br />
them to courses and learning opportunities that fit their needs’. As well as using such<br />
spaces in schools, colleges and universities, the intention is to enable the development of<br />
‘electronic portfolios that learners can carry on using throughout life.’<br />
This paper is based on a study conducted by the author in the UK in 2007 which considered<br />
the uses of e-portfolios in school, further education colleges, universities and the National<br />
Health Service. It concluded that e-portfolio systems include repositories and a range of<br />
tools for storing and organising material for planning, reflecting, and giving and receiving<br />
feedback. The processes undertaken are opportunities for learning, while the collections in<br />
repositories build up over time, allowing selections to be offered to various audiences and<br />
assessors. Thus they can support both formative assessment (ongoing assessment for<br />
learning), and allow relevant selections to be presented for summative assessment<br />
(assessment of learning). At present, however, users have little sense of the concept of a<br />
lifelong e-portfolio.<br />
E-portfolio development can commence from one of many starting points such as online<br />
reflective practice, or planning or capturing evidence. These processes form part of an<br />
‘e-portfolio culture’: a way of thinking about personal and collaborative learning over a longer<br />
period of time than a specific course. Fragments of life experience from individual subjects<br />
and artefacts must be made more coherent as expressions of identity. However, because<br />
almost all e-portfolio products are licensed by institutions rather than individuals, they are<br />
neither portable nor interoperable, thus creating potential roadblocks on the lifelong journey.<br />
Assessment is the formal means by which e-portfolios or their disaggregated contents are<br />
judged, whether by self assessment, peer assessment, tutor assessment, or university<br />
admissions officers. There are also other contexts, such as employment applications, in<br />
which audiences must recognise, acknowledge and value material in new forms. Rowntree<br />
(1977) suggested five dimensions of assessment: Why assess? What to assess? How to<br />
assess? How to interpret? and How to respond? The last two questions are particularly<br />
apposite in light of increasing use of digital images, animations and so on, and with<br />
increasing attention being paid to e-assessment as a means of judging the outcomes of an<br />
individual’s learning experiences.<br />
References<br />
Rowntree, D. (1977). Assessing Students: How shall we know them? London: Harper.<br />
Symposium: Portfolios in Higher Education / Paper 2:<br />
The Disciplinary Content Portfolio in Norwegian Higher Education – How and Why?<br />
Olga Dysthe, University of Bergen, Norway<br />
Knut Steinar Engelsen, Stord/Haugesund University College, Norway<br />
Considerable changes in assessment have taken place in Norway after 2002, in the wake of<br />
a major reform of higher education inspired by the Bologna Declaration. While ‘portfolio’ was<br />
an unknown concept for most teachers and students in higher education in Norway five years<br />
ago, an evaluation report about the reform documented that considerable changes had taken<br />
place and that portfolio assessment was now used in all types of educational institutions and<br />
across disciplines (Dysthe et al., 2006). The empirical basis for this paper is a nationwide<br />
survey study of portfolio practices conducted in 2006, supplemented by case studies.<br />
The aim of the research study was to get an overview of portfolio practices in Norway. We will<br />
mention some of the findings and describe characteristic aspects of ‘the disciplinary content<br />
portfolio’ and systematic differences between different types of educational institutions as well<br />
as between disciplines within each institution. The main question we raise is what<br />
conceptions of portfolios and what practices are considered useful and under what conditions.<br />
Our survey, based on a randomized selection from all public universities and university<br />
colleges in Norway, was conducted in the spring of 2006. The purpose was to map<br />
assessment practices across different institutions and disciplines, focusing on issues like<br />
types and number of portfolio assignments, the use of feedback, final assessment formats<br />
and the use of evaluation criteria. Teacher attitudes towards the usefulness of portfolios in<br />
relation to student and teacher workload were also investigated. The informants were<br />
professors and lecturers responsible for portfolio-assessed courses. The<br />
survey data were analysed using standard statistical methods.<br />
We found that portfolio systems varied from advanced reflection-based models which<br />
included multiple text types and flexible feedback-practices to portfolios that consisted of<br />
factual texts with rudimentary feedback procedures and no reflective texts. We found<br />
systematic variations between professional educational institutions and universities, but also<br />
between ‘soft’ and ‘hard’ disciplines within the same institutions. Portfolio practices were<br />
diverse and a common understanding seemed lacking, a finding that may be due to the<br />
early stage of implementation and the complex motivation for initiating change. Feedback<br />
was considered very important, but even when peer feedback was being used, training in<br />
how to give feedback or discussion of quality criteria were not common.<br />
The Norwegian disciplinary content portfolio falls under the category that Hartnell-Young<br />
calls “subject-specific” but tries to combine formative and summative assessment. It is often<br />
digital but differs from the learner-specific e-portfolio described by Hartnell-Young by not<br />
aiming at building up repositories over time or presentation for an out-of-class audience.<br />
We base our discussion on socio-cultural perspectives on learning and focus particularly on<br />
how macro level policy decisions in Norway have affected the use of portfolios for<br />
assessment, and how disciplinary cultures at department level have shaped both the<br />
conceptual understanding and practical use of portfolios at meso level. A question arising<br />
from this is how portfolios in different European countries are influenced by different<br />
sociocultural contexts.<br />
Symposium: Portfolios in Higher Education / Paper 3:<br />
Portfolio diversity in Belgian (Flemish) Higher Education –<br />
A comparative study of eight cases<br />
Wil Meeus, University of Antwerp, Belgium<br />
Peter Van Petegem, University of Antwerp, Belgium<br />
An international literature study on portfolio in higher education led to a timeline<br />
distinguishing between four modes of implementation (Meeus et al., 2006). These range<br />
from the use of portfolio in admissions to higher education, during the higher education<br />
course, on entry into the profession and for ongoing professional development. In this study<br />
we focus on portfolios used during higher education courses.<br />
There are a large number of portfolio applications in use in higher education courses in the<br />
Flemish part of Belgium. Although practitioners and scholars talk about portfolios as a<br />
standard concept, most of the portfolios they refer to seem to be very different. This study<br />
investigates the diversity of portfolio applications used within higher education in Flanders.<br />
The research questions are: In what way do portfolio applications differ within the large area<br />
of higher education courses? Can enough commonality be detected to claim the existence<br />
of a standard portfolio concept?<br />
Eight portfolio applications were randomly selected, all in different higher education courses<br />
in Flanders: elementary teacher education, secondary teacher education, graphic and<br />
digital media, speech therapy, podiatry, nursing, academic teacher education, and physical<br />
education. The first six courses are organized at colleges, the last two at universities. A<br />
comparative framework was developed while gathering information on the portfolio<br />
applications. Source triangulation was used combining document analyses, interviews with<br />
portfolio supervisors, and focus groups with students.<br />
The comparative framework defines sixteen different characteristics within five categories:<br />
phase of implementation, function, ingredients, ICT-format and mode of supervision. All<br />
eight portfolios differ remarkably. No two portfolios have identical characteristics. This leads<br />
to the conclusion that general pronouncements on portfolio in higher education are<br />
problematic given the divergent interpretations of the concept. A clear description of the<br />
portfolio characteristics should be part of all scholarly papers on portfolio if conclusions are<br />
meant to be meaningful.<br />
References<br />
Meeus, W., Van Petegem, P., Van Looy, L. (2006). Portfolio in Higher Education: Time for a Clarificatory<br />
Framework. International Journal of Teaching and Learning in Higher Education, 17(2), 127-135.<br />
Symposium: Group work assessment<br />
Aims, values and ethical considerations in group work assessment<br />
Organiser: Lorraine Foreman-Peck, The University of Northampton, United Kingdom<br />
In this symposium ‘group work’ is understood as assignments carried out by students,<br />
largely independently of the tutor and usually outside normal class contact time. Group work<br />
is used extensively in higher education in the UK, in a variety of ways, from assignments<br />
that are relatively short to those forming the major part of the course. Such assignments are pervasive,<br />
not only because they are seen as providing an educationally valuable learning experience,<br />
but also because they are generally believed to develop skills useful to employers.<br />
The requirement for group work assessment to be fair and transparent is problematic (e.g.<br />
Race, 2001). This is most evident with dysfunctional group dynamics: students may fail<br />
to work optimally together and then undergo a negative and damaging experience. The<br />
literature suggests that these cases occur regularly, but may affect only a minority of<br />
students in any one cohort (e.g. Parsons, 2004), and are dealt with in an ad hoc and opaque<br />
manner.<br />
Proposed solutions to group dysfunction are usually technical, focussing for example on the<br />
validity and reliability of different methods of allocating marks (e.g. Magin, 2001). However,<br />
this approach does not appear to address a host of value questions that commonly arise,<br />
such as: ‘Is it right to allow students to evict underperforming students from their groups?’;<br />
‘Ought equal marks for unequal contributions be given?’; ‘Should a whole group fail as a<br />
consequence of plagiarism committed by one student?’<br />
These and other questions point to the need to conceptualise and contextualise in more<br />
depth the practice of fair group work assessment: the principles and rules by which groups<br />
should abide, the extent and nature of tutor facilitation, and the prevention of dysfunctional<br />
group dynamics. These issues are explored by the symposium participants in the context of<br />
their own practices.<br />
Participants in the symposium belong to a team of tutors from the Universities of<br />
Northampton and Northumberland who have been working together on group work practice<br />
since October 2007. Their approach is action research as a form of ‘practical philosophy’<br />
(Elliott 2007). This involves tutors in identifying and clarifying ethical challenges in their own<br />
teaching and evaluating possible solutions based on defensible educational values. From<br />
these coordinated case studies, grounded insights into group work practice across a range<br />
of disciplines are derived, along with suggested institutional policy recommendations.<br />
Symposium: Group work assessment / Paper 1:<br />
Involuntary Free Riding – how status affects performance in a group project<br />
Julia Vernon, The University of Northampton, United Kingdom<br />
Many studies (Maguire and Edmondson, 2001; Mills, 2003; Gupta, 2004; Greenan et al.,<br />
1997) note the positive effects on learning which groupwork may engender, and others<br />
(Knight, 2004; Hand, 2001) discuss the existence of phenomena such as social loafing<br />
(Latané et al., 1979), free-riding (Albanese & Van Fleet, 1985) and the inequity of workload.<br />
The findings of Whyte (1943), Cottrell (1972), Webb (1992) and Ingleton (1995) show how<br />
the effects of group dynamics can have a positive or negative effect on the performance of<br />
group members.<br />
In this action research case study, we attempt to counter the negative effect that working in<br />
a group may have on the self-esteem and confidence of individuals, when they are teamed<br />
with students considerably more skilled than themselves. It focuses on a prolonged<br />
groupwork project, part of a Level 5 undergraduate course in Business Computing, which<br />
simulates a web development consultancy company, and involves team members taking on<br />
a variety of roles, in which they demonstrate different skills.<br />
As the project proceeds, it is noted that the status of individuals within the groups becomes<br />
polarised. The high status individuals have a strong sense of ownership, and a decreasing<br />
degree of trust in the work of others. Low-status members are inclined to defer to their<br />
team-mates and to draw back from expressing opinion in decision-making situations, or<br />
giving explanations of their own work. It is suggested in this paper that students who have<br />
every intention of contributing fully to the project, nevertheless, through these effects, find<br />
themselves in a position of being involuntary ‘free-riders’.<br />
In this research measures were introduced to support the groups and individuals, and to<br />
counter the negative effects noted. These actions took place around the middle of the<br />
period of the project, when in previous years there has been a lull in group activity, and<br />
problems have arisen. A facilitated session was arranged for the students, to bring issues of<br />
groupwork into the open, and develop strategies to improve group cohesion. Discussions<br />
followed from this and students were counselled individually to trace areas of difficulty. A<br />
formative assessment was introduced in the form of an individual presentation to the group,<br />
where each student explained their role and how they were carrying it out.<br />
Revisiting issues, after students have been able to experience them first-hand, resulted in a<br />
much more thoughtful response than when discussed early in the project. In addition the<br />
requirement to prove individual contribution brought about some task re-negotiation.<br />
Dominant members were seen to rein back in some aspects of the control they had<br />
exercised, while submissive members pushed themselves to take a lead on an important<br />
part of the project. Crucially, awareness of group dynamics had increased and was seen in<br />
less simplistic terms. The measures taken had the effect of alleviating the effects noted,<br />
supporting positive help-giving and knowledge transfer within the group, and allowing<br />
members to contribute more fully to the group task.<br />
Symposium: Group work assessment / Paper 2:<br />
Facilitating Group work: Leading or empowering?<br />
Julie Jones, The University of Northampton, United Kingdom<br />
Andrew Smith, The University of Northampton, United Kingdom<br />
Year two students of the Foundation Degree in Learning and Teaching study a module<br />
relating to special educational needs/inclusion. Assessment is through a collaborative group<br />
project and a personal project diary with a reflective statement. We felt that the assessment<br />
strategy did not sufficiently discriminate between students: virtually all achieved very high<br />
grades. Concern was further prompted by awareness of recent research into issues of the<br />
fairness, justice and reliability of group work (Maguire and Edmondson 2001, Barnfield<br />
2003, Knight 2004, Skinner et al. 2004) and of motivational factors, including the effect of<br />
rewarding the group product or the individual contribution (Chapman 2002) and issues of<br />
inter-relationships in groups (Arango 2007).<br />
Reflective statements and evaluation feedback from the 2006/7 cohort identified concerns relating<br />
to some students acting as ‘passengers’, but being awarded the same high grade for the module<br />
as those members who engaged fully with the work. This is a well-documented problem<br />
identified by others (Ransom 1997, Parsons 2002, Hand 2001, Cheng and Warren 2000).<br />
In addition, the Course Team found that tutor guidance was a complicating factor: it was felt that<br />
it was a major contributor to the high grades awarded. There was a concern that this<br />
facilitation encouraged some students’ lack of engagement by allowing them to be led<br />
rather than, as was intended, empowering them to develop their own projects.<br />
These observations prompted a reformulation of the assessment strategy for the 07-08<br />
cohort. The weightings were altered from 80% to 60% for the group assessed project and<br />
from 20% to 40% for the individual elements. In order to assess the effects of this and to<br />
gain insight into issues such as empowerment, especially those involving tutor facilitation,<br />
data was collected on the following:<br />
How the group:<br />
• formed and decided upon the project focus<br />
• sustained motivation and whether this was linked to a perception that individual<br />
contributions supported the group assessment or individual assessment, or both<br />
• managed inter-personal professional working relationships<br />
• managed equitable sharing of the work-load<br />
• perceived and used the guidance available from the module tutor<br />
This involved:<br />
• analysis of 2007-08 students’ diaries and reflective statements<br />
• interviews with 2007-08 students<br />
• analysis of diaries and reflective statements from the 2006-07 cohort<br />
• interviews with students from the 2006-07 cohort asking them to reflect retrospectively<br />
on their experiences.<br />
• a reflective diary written by the facilitating tutor for 2007-08<br />
From this comparative evaluation we will explore firstly whether the amendments to the assessment<br />
weightings made a difference in students’ perceptions of the fairness of the assessment strategy<br />
and secondly the effect the level and nature of tutor facilitation had on group dynamics, especially in<br />
the areas of communication, task sharing, empowerment and ownership.<br />
It is expected that the research will have implications for tutors’ thinking about assessment<br />
weightings and will throw light on the ethical dilemmas surrounding the issues of the guidance and<br />
facilitation of group work.<br />
Symposium: Group work assessment / Paper 3:<br />
Marginalised students in group work assessment:<br />
ethical issues of group formation and the effective support of such individuals<br />
Antony Mellor, Northumbria University, United Kingdom<br />
Jane Entwistle, Northumbria University, United Kingdom<br />
In this project we focus on our experience of group work assessment over a number of<br />
years on a 20 credit, year-long, option module Soil Degradation and Rehabilitation, which<br />
forms part of the final year of our BSc (Hons) Geography degree programme and has a<br />
cohort of around 30 students each year. The group assessment comprises 40% of the<br />
module marks and includes a group oral presentation and written report. We became<br />
concerned about a number of issues adversely affecting the student learning experience,<br />
such as marginalised individuals, adverse group dynamics and unequal contributions by<br />
individuals within groups (Mills 2003, Hand 2001). Of specific concern, however, were the<br />
ethics of group formation (Chang 1999, Knight 2004). Allowing self-selected groups<br />
inevitably leaves some students marginalised and in a position where they may not only be<br />
disadvantaged materially in terms of marks but also be personally affected in a<br />
negative way. In this paper we explore to what extent it is our duty to address the needs of<br />
these students, as well as ways of maintaining equity and transparency in tutor-led support<br />
across the entire cohort.<br />
Using an action research approach (Carr 2006, Elliott 2007), we implemented four key<br />
interventions:<br />
• To make timetabled sessions available for the groups to meet and also to discuss<br />
progress with the tutor, thus addressing the practical problem of lack of opportunity to<br />
meet and facilitating group interaction early on in the process.<br />
• To allow groups to play to their strengths. We encouraged the students to think about<br />
their strengths in terms of the tasks required as part of this assignment to identify what<br />
their contribution might be and their role within that group.<br />
• To provide formative feedback on drafts of the written report. This enabled us to<br />
encourage and promote the need for a dialogue between group members where a<br />
synthesis of materials was lacking.<br />
• To include an individual critical reflection component as part of the assignment. We<br />
aimed to promote reflection on the learning inherent in the activity regardless of the form<br />
of the experience or the summative mark of the end product.<br />
Data were collected using a written teacher log, the students’ critical reflections, and a<br />
student questionnaire following completion of the project. Of the four interventions noted<br />
above, all had a positive role to play in supporting isolated and marginalised students with<br />
their experience of group-work. The fourth, that of individual critical reflection, was perhaps<br />
the least successful across the cohort as a whole because the students were relatively<br />
inexperienced in this way of thinking and writing, coming largely from a scientific<br />
background. It did however provide a platform for student grievances and issues to be<br />
raised, and facilitated their ability to develop different approaches to solving more abstract<br />
problems. Outcomes from this intervention will also be considered in the planning of this<br />
group assessment in future years.<br />
Symposium: Multidimensional measurement models:<br />
Multidimensional measurement models of students’ competencies<br />
Organiser: Johannes Hartig, German Institute for International Educational Research<br />
(DIPF), Germany<br />
Most measurement models applied in traditional educational assessments implicitly or explicitly<br />
assume that test results can be described in terms of single ability dimensions. That is,<br />
individual performance differences in all assessment tasks are attributed to differences in one<br />
common ability dimension. These unidimensional models are useful in many contexts,<br />
especially if the performance domain of interest is relatively narrow, or if the goal of the<br />
assessment is a mere summative description of student achievement. Large scale<br />
assessments, for instance, deliberately keep the dimensionality of their instruments low, since<br />
their goal is the description of achievement levels of large groups in broad content domains.<br />
However, if performance in a more complex domain of competence is to be assessed, or if the<br />
goal of the assessment is a deeper understanding of the underlying individual differences, the<br />
use of unidimensional measurement models may be unsatisfactory. For instance, performance<br />
in a complex domain of competence may be attributed to multiple, distinguishable abilities and<br />
the goal of an assessment may be to obtain differentiated individual profiles of these abilities.<br />
Or, the assumption that all tasks used in an assessment measure the same single ability<br />
dimension for all students may not be realistic because different students may draw on different<br />
knowledge and strategies to arrive at the same solutions.<br />
If the goal of the assessment is a deeper understanding of observed performance<br />
differences, or if unidimensional models fail to adequately explain test outcomes in complex<br />
tasks, more complex, multidimensional measurement models can be employed as an<br />
alternative to unidimensional models. These models can be used to identify systematic<br />
causes for violations of the unidimensional model, and to test more differentiated theoretical<br />
models of students’ competencies. In the latter case, the analysis of multidimensional<br />
models requires stronger theoretical assumptions than unidimensional models, the<br />
application of more advanced statistical techniques, and typically larger sample sizes. In<br />
exchange, multidimensional measurement models hold considerable promise for the<br />
empirical examination of differentiated models of performance in complex domains and<br />
heterogeneous populations.<br />
The symposium will present different approaches and applications of multidimensional<br />
measurement models. The first paper focuses on methods to systematically identify and<br />
explain violations of the assumption of unidimensional constructs in the domain of<br />
mathematical problem solving. Variables interacting with psychometric properties of single<br />
items or subgroups of items are identified in order to achieve a better understanding of the<br />
assessed competence. The second paper presents an application of cognitive diagnosis<br />
models (CDMs) to a mathematics test for elementary school. These multidimensional latent<br />
class models allow the construction of differentiated models of response processes, taking<br />
into account multiple basic abilities. The third paper focuses on a differentiated diagnosis of<br />
basic abilities in a foreign language assessment. Performance in listening comprehension<br />
items is decomposed into general text comprehension and auditory processing abilities<br />
using a two-dimensional IRT model.<br />
The papers will be discussed with respect to the potential benefit of multidimensional<br />
measurement models in different contexts of application, and the theoretical requirements<br />
of different models.<br />
Symposium: Multidimensional measurement models / Paper 1:<br />
Evaluation of non-unidimensional item contents using diagnostic results from Rasch analysis<br />
Markus Wirtz, Freiburg University of Education, Germany<br />
Timo Leuders, Freiburg University of Education, Germany<br />
Marianne Bayrhuber, Freiburg University of Education, Germany<br />
Regina Bruder, Darmstadt Technical University, Germany<br />
Competence scales, which have been developed and evaluated by means of Rasch analysis,<br />
possess optimal properties if diagnostic results are sought to reflect systematic and reliable<br />
differences between students in competence domains. Such scales allow a strictly<br />
unidimensional assessment of competencies: The response probability for all items is determined<br />
by only one latent dimension and thus person characteristics can be interpreted unambiguously.<br />
Hence, a fair and meaningful comparison of subjects or subgroups is admissible.<br />
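The strict unidimensionality of a Rasch-homogeneous scale can be made concrete: the response<br />
probability depends only on the difference between one person parameter and one item parameter.<br />
A minimal sketch (illustrative only, not taken from the paper):<br />

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model:
    P(X=1 | theta, b) = exp(theta - b) / (1 + exp(theta - b)),
    where theta is the person's ability and b the item difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A person whose ability equals the item difficulty succeeds with probability 0.5.
print(rasch_prob(0.5, 0.5))  # 0.5
```

Because only this difference matters, two persons with equal ability have identical success<br />
probabilities on every item, which is what licenses the fair comparisons mentioned above.<br />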
Unidimensionality can be statistically tested because of the local independence of items within<br />
Rasch homogeneous scales: items and persons are calibrated on a common latent trait, and<br />
the position of items and persons on the latent trait (i.e., item difficulties and individual abilities)<br />
must suffice to predict the observed data structures. For summative large scale assessments<br />
such as PISA and TIMSS it is important that items fulfil these criteria. It has been argued, however,<br />
that competence constructs become restricted if items covering more than one ability are<br />
systematically eliminated. This may especially pose a problem if diagnostic results are to be<br />
interpreted and used in a formative manner in classroom contexts. If scales are supposed to<br />
identify didactically relevant information about students’ competencies and individual potentials<br />
for development, multidimensional item contents may be desirable. Such “unscalable” items<br />
may be particularly important to enable teachers to identify processing problems and failures.<br />
In order to enhance the practical benefits of applying the Rasch model in competence<br />
diagnostics, strategies will be presented and discussed that allow violations of the<br />
assumptions of the Rasch model to be analysed systematically. Differential Item Functioning (DIF) and<br />
Mixed-Rasch analysis both provide techniques to identify systematic violations of model<br />
assumptions. Furthermore, person-fit measures can be used to identify covariates that predict<br />
the fit of individual students’ answer profiles to the model, and to identify conspicuous profiles.<br />
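As an illustration of how a person-fit measure flags conspicuous answer profiles, the outfit-style<br />
statistic below averages squared standardized residuals under the Rasch model; the function and<br />
the example values are our own illustrative choices, not the authors’ instrument:<br />

```python
import math

def rasch_prob(theta, b):
    # Rasch model response probability for ability theta and difficulty b.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def outfit(theta, difficulties, responses):
    """Outfit-style person-fit statistic: the mean squared standardized
    residual of one person's responses under the Rasch model. Values far
    above 1 flag answer profiles that fit the model poorly."""
    total = 0.0
    for b, x in zip(difficulties, responses):
        p = rasch_prob(theta, b)
        total += (x - p) ** 2 / (p * (1 - p))
    return total / len(responses)

# A profile that solves the hard items but misses the easy ones misfits:
print(outfit(0.0, [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]))
```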
Variables that affect statistical properties of single items or item groups may be identified on<br />
the item (e.g. including or excluding specific tasks in the classroom by different teachers) or<br />
the student level (e.g. processing strategies, preference for different mental<br />
representations). Effects of these variables can provide important information concerning<br />
the structure of the competence to be assessed (e.g. different student types or existence of<br />
typical erroneous conceptions).<br />
Data will be presented from a research project on models for heuristic competencies in<br />
mathematical problem solving. The use of problem solving strategies which demand the<br />
application of different representations (numerical, graphical, symbolic and verbal) and the<br />
systematic change between these representations are assessed by psychometric scales.<br />
The item pool is based on a sound didactical framework. Possible causes of item misfits will<br />
be evaluated in order to enhance knowledge about the corresponding competence domain.<br />
The purpose of the study is twofold: A Rasch-homogeneous assessment of sub-dimensions<br />
of the use of problem solving strategies will be developed, and a sophisticated diagnostic<br />
instrument for the identification of areas for special support needs will be provided. Within<br />
this talk, systematic psychometric strategies are discussed that may allow both of these<br />
goals to be achieved.<br />
Symposium: Multidimensional measurement models / Paper 2:<br />
Modelling multidimensional structure via cognitive diagnosis models: Theoretical<br />
potentials and methodological limitations for practical applications<br />
Olga Kunina, IQB, Humboldt University Berlin, Germany<br />
Oliver Wilhelm, IQB, Humboldt University Berlin, Germany<br />
André A. Rupp, IQB, Humboldt University Berlin, Germany<br />
In large educational studies like PISA, unidimensional probabilistic models of latent<br />
traits are usually used, assuming that the observed test results can be sufficiently explained by a<br />
single latent ability. However, if a deeper understanding of the underlying basic cognitive<br />
skills is intended, this approach has limitations in terms of adequately<br />
mapping the complexity of the abilities addressed. Most constructs assessed in<br />
educational studies (e.g. language comprehension, mathematical performance) supposedly<br />
require different cognitive skills to succeed on an item or in the test. Cognitive diagnosis<br />
models (CDMs) can yield individual profiles of relevant basic skills. Based on the profile<br />
information detailed feedback can be provided and used in teaching classes or formative<br />
interventions.<br />
In methodological terms CDMs are confirmatory multidimensional latent-variable models<br />
suitable for efficiently modelling within-item multidimensionality. They usually contain discrete<br />
latent variables that allow for a multivariate classification of respondents. Importantly, in<br />
prototypical applications of CDMs the definitions of the latent “attributes” or “skills” are based<br />
on a cognitively grounded theory of response processes at a fine grain size.<br />
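To make this concrete, the DINA model is one widely used non-compensatory CDM: an item is<br />
mastered only if every skill it requires is present in the examinee’s profile. The sketch below is<br />
purely illustrative (the skill names and the slip and guessing values are invented, and the paper<br />
compares several compensatory and non-compensatory variants):<br />

```python
def dina_prob(skills, required, slip, guess):
    """DINA model: a non-compensatory cognitive diagnosis model.
    The examinee masters the item only if all required skills are present
    (eta = 1); mastery yields a correct answer with probability 1 - slip,
    non-mastery only with the guessing probability."""
    eta = all(skills[k] for k in required)
    return (1 - slip) if eta else guess

# Hypothetical item requiring addition and multiplication (slip .1, guess .2):
profile = {"addition": 1, "subtraction": 1, "multiplication": 0}
print(dina_prob(profile, ["addition", "multiplication"], 0.1, 0.2))  # 0.2
```

The discrete skill profile is what yields the multivariate classification of respondents described above.<br />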
In this contribution we will first discuss these key features of CDMs. We will then illustrate<br />
how CDMs can be used in large-scale educational assessment by applying a variety of<br />
models to data from a newly developed diagnostic mathematics assessment for elementary<br />
school children (3rd and 4th grade). The mathematics assessment comprises counting and<br />
modelling tasks requiring addition, subtraction, multiplication, and division skills and aims to<br />
provide a differentiated profile of counting and modelling skills in basic arithmetic<br />
operations.<br />
Specifically, we will compare multidimensional profiles of children from selected<br />
compensatory and non-compensatory CDMs. Competing measurement models are<br />
compared in terms of absolute and relative model fit, attribute difficulty distributions, and<br />
latent class membership probabilities. To provide some evidence for methodological<br />
generalizability of the results, we will then compare the discrete profiles from the different<br />
CDMs with continuous multidimensional profiles from item response theory and<br />
confirmatory factor analysis models. In combination, these analyses will provide empirical<br />
insight into the cost-utility trade-offs of CDMs as well as into the conditions under which<br />
their theoretical potential can be realized in large-scale educational assessment practice.<br />
Symposium: Multidimensional measurement models / Paper 3:<br />
Modelling Specific Abilities for Listening Comprehension in a Foreign Language<br />
with a Multidimensional IRT Model<br />
Jana Höhler, German Institute for International Educational Research (DIPF), Germany<br />
Johannes Hartig, German Institute for International Educational Research (DIPF), Germany<br />
Multidimensional Item Response Theory (MIRT) provides an ideal foundation to model<br />
performance in complex domains, simultaneously taking into account multiple basic<br />
abilities. In MIRT models with a complex loading structure, mixtures of different abilities can<br />
be modeled to be necessary for specific items. These models allow investigation of the<br />
relative significance of different ability dimensions for specific items, i.e. what kind of ability<br />
is required to what extent for solving a specific item. Hence, sound theoretical assumptions<br />
about the interaction between the person and the test items and the nature of the relevant<br />
ability dimensions are required. Often these assumptions are hard to test empirically, since<br />
different complex models may be equivalent in terms of model fit. However, assumptions<br />
about the demands of specific test items allow the prediction of which items should be<br />
particularly strongly related to specific ability dimensions. The aim of this paper is to illustrate<br />
how theoretical assumptions about the nature of different ability dimensions represented in<br />
MIRT models can be validated by testing these predictions, i.e. by relating MIRT model<br />
parameters to characteristics of the item content.<br />
The data for our empirical application come from a German large-scale assessment of 9th grade<br />
students’ language competencies. The analyses are based on the data from reading and<br />
listening comprehension tests of English as a foreign language. The listening<br />
comprehension items are very similar to the reading comprehension items, and it is<br />
reasonable to assume that they require similar abilities. Both tests require the decoding and<br />
understanding of English, as well as the processing and integration of the information<br />
retrieved. Both tests require the reading of written text, the multiple-choice items being<br />
presented in written English. Consequently, one latent ability dimension can be assumed to<br />
represent the abilities required for both tests. However, the listening comprehension test<br />
additionally requires the processing and understanding of spoken language. It therefore<br />
appears reasonable to assume a second latent dimension representing the abilities required<br />
exclusively for the listening comprehension items.<br />
A two-dimensional two-parameter (2PL) IRT-Model is applied to the data. The first<br />
dimension represents the abilities common to the reading and listening comprehension<br />
tests (“general text comprehension”), while the second dimension represents the abilities<br />
specific to listening comprehension (“auditory processing”). The focus of our analysis is the<br />
strength of the loadings of the listening comprehension items on the auditory processing<br />
dimension. In order to identify items that draw particularly on this dimension, a priori defined<br />
task characteristics are used to predict the respective items’ loadings. It can be shown that<br />
the loading on the auditory processing dimension is related to specific item characteristics,<br />
e.g. the complexity of the relevant text passage and the speed of speech. The results<br />
provide support for the presumed nature of the “auditory processing” dimension.<br />
Additionally, groups of students differ in their relative strength on both dimensions,<br />
illustrating the benefit of a differentiated analysis of basic ability dimensions in applied<br />
contexts.<br />
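A common way to parameterize such a two-dimensional 2PL model puts a loading-weighted sum of<br />
the two abilities into the logit; the loading and ability values below are invented for illustration and<br />
are not the study’s estimates:<br />

```python
import math

def mirt_2pl_prob(theta, loadings, b):
    """Two-dimensional 2PL IRT model: the log-odds of a correct response
    are a loading-weighted sum of the two abilities minus a difficulty term."""
    logit = sum(a * t for a, t in zip(loadings, theta)) - b
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical listening item loading on general text comprehension (1.0)
# and, more weakly, on auditory processing (0.6):
p = mirt_2pl_prob(theta=[0.5, -0.5], loadings=[1.0, 0.6], b=0.0)
print(round(p, 3))
```

In this parameterization, the size of an item’s second loading quantifies how strongly it draws on<br />
the auditory processing dimension, which is exactly what the reported analysis predicts from item<br />
characteristics.<br />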
Symposium: Computer-based assessment:<br />
Recent Developments in Computer-Based Assessment:<br />
Chances for the measurement of Competence<br />
Organiser: Thomas Martens, German Institute for International Educational Research<br />
(DIPF), Germany<br />
During the last decade Computer-Based Assessment (CBA) has become more and more<br />
popular in the international testing and educational community. The major reason is that<br />
computer-based tests can include elements that cannot be rendered on paper.<br />
For conducting CBAs powerful software systems have become available providing support<br />
to the entire assessment process, i.e., item and test development, test delivery and result<br />
reporting (cf. presentation by Latour, Martin, Plichart, Jadoul, Busana, and Swietlik-Simon).<br />
Moreover, to assess innovative constructs user-friendly tools for authoring complex<br />
interactive stimuli have been developed (cf. presentation by Goldhammer, Martens,<br />
Naumann, Rölke, and Scharaf).<br />
Basically, computer-based tests allow for a greater diversity of test stimuli and test<br />
interaction than Paper-Based Assessments (PBAs). This especially holds true with regard<br />
to the assessment of competencies (cf. presentation by Naumann, Jude, Goldhammer,<br />
Martens, Roelke, and Klieme). With PBAs it is almost impossible to measure competencies<br />
that involve a dynamic situation or settings that are drawn from real life. In contrast, CBA<br />
can integrate multimedia test content and complex interaction modes that simulate real-life<br />
situations, and, thereby, the validity with regard to the measured competencies can be<br />
increased.<br />
Regarding testee-item interaction, CBA enables automatic recording of reactions and<br />
response times, which could not be accomplished with printed material. In combination with<br />
interactive stimuli this set-up allows for the performance-based assessment of, for example,<br />
ICT literacy.<br />
Another test format which can only be administered using computers is computer-adaptive<br />
testing (CAT). Here the difficulty of the items is tailored to the individual competence level of<br />
the test taker, so as to ensure that test takers do not receive items that are clearly too easy<br />
or too difficult for them. This method saves time and allows for a more accurate estimation of<br />
the test taker’s ability.<br />
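The selection step at the heart of CAT can be sketched under the Rasch model, where an item is<br />
most informative when its difficulty matches the current ability estimate (a simplified illustration;<br />
operational CAT systems also re-estimate ability after each response):<br />

```python
import math

def rasch_prob(theta, b):
    # Rasch model response probability for ability theta and difficulty b.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def next_item(theta_hat, item_bank, administered):
    """Adaptive item selection: under the Rasch model an item is most
    informative when its difficulty matches the current ability estimate,
    so pick the unused item whose difficulty is closest to theta_hat."""
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return min(candidates, key=lambda i: abs(item_bank[i] - theta_hat))

bank = [-2.0, -1.0, 0.0, 1.0, 2.0]   # item difficulties
print(next_item(0.3, bank, {2}))      # selects item 3 (difficulty 1.0)
```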
In an educational context students increasingly use computers to study and to complete<br />
their tasks. The computer may even have become the standard tool for studying and<br />
problem solving, so using PBAs to assess these students might be inappropriate. For<br />
educational monitoring, CBAs and even web-based CBAs have become feasible and the<br />
benefits of applying CBA seem to far outweigh the challenges. Also, for some test persons<br />
CBAs might have positive influences on motivation during the test.<br />
In sum, CBA offers a great potential and is most probably the testing mode for<br />
competencies in the future.<br />
Symposium: Computer-based assessment / Paper 1:<br />
Enlarging the range of assessment modalities using CBA:<br />
New challenges for generic (web-based) platforms<br />
Thibaud Latour, Raynald Jadoul, Patrick Plichart, Judith Swietlik-Simon,<br />
Lionel Lecaque, Samuel Renault<br />
CRP Henri Tudor, Luxembourg<br />
It has long been advocated that Computer-Based Assessment (CBA) bears significant<br />
advantages over paper-and-pencil instruments at both the testee level and the<br />
logistic and management levels. However, CBA does not cover all existing assessment<br />
modalities and will hardly replace other delivery modes and contexts, or human-restricted tasks.<br />
Taking full advantage of advanced computer and information technologies, rather than simply shifting from<br />
paper-and-pencil tests to computerized instruments, remains challenging in various respects.<br />
Security issues related to both organisational and technological aspects are prevalent in high-stakes<br />
testing, but also in most situations where strict measurement validity is crucial, such as large-scale<br />
assessments and monitoring. Security challenges include test, item and content protection,<br />
process integrity and secrecy, diffusion and validity control of tests and items, cheating detection,<br />
identity management, etc. Depending on the testing context, these issues are more or less easily<br />
tackled. However, when considering networked and loosely constrained testing situations, addressing<br />
these issues becomes more challenging and technologically demanding.<br />
Advanced Result Exploitation techniques and models are necessary to take full advantage of<br />
the various kinds of user tracking capabilities enabled by the use of IT platforms. The<br />
challenge includes discovery, extraction, analysis and exploitation of potential patterns in<br />
the huge number of behavioural and chronometric data recorded during test execution<br />
together with the identification of their psychometric significance.<br />
New Forms of Testing and new forms of instruments to perform social, collective and situational<br />
skill assessments are now becoming more conceivable following the maturation of so-called<br />
ambient intelligence technologies (pervasive computing, ubiquitous computing and advanced<br />
user experience). These kinds of assessments can be achieved through the use of simulations<br />
and games, 3D and immersive technologies, ubiquitous and mobile testing, collaborative testing,<br />
etc. Assessing business-related skills and jobs in an economically viable manner, i.e., reducing<br />
the testing time and test development costs with respect to the number of dimensions that<br />
compose a job description in terms of competencies, requires the design of new multidimensional<br />
instruments, including techniques to rapidly screen subjects’ capabilities<br />
against a series of job reference descriptions.<br />
The Intelligent Management of e-Testing Resources becomes a key element for the<br />
generalisation of CBA in collaborative management settings where contents, models, possibly<br />
data, items, tests, etc. are shared by remotely located stakeholders. Improving the capacity to<br />
qualify, annotate, exchange, and search e-testing resources in a distributed community will<br />
soon become a key element in enhancing item and test production capacity. This<br />
challenge includes collaborative aspects in stakeholder networks, including P2P frameworks,<br />
semantic annotations of multimedia resources and definition of related ontologies, query<br />
propagation and advanced semantic searches, rule-based item creation support, etc.<br />
In this contribution, we shall explore the challenges we consider important for the future and<br />
provide hints and a potential roadmap for addressing them from both a technological and<br />
psychometric perspective. The TAO (the French acronym for technology-based assessment)<br />
framework provides a general and open architecture for computer-assisted test development<br />
and delivery, with the potential to respond to most of the raised issues.<br />
38 ENAC 2008
Symposium: Computer-based assessment / Paper 2:<br />
Developing stimuli for electronic reading assessment: The hypertext-builder<br />
Frank Goldhammer, Thomas Martens, Johannes Naumann,<br />
Heiko Rölke, Alexander Scharaf<br />
German Institute for International Educational Research (DIPF), Germany<br />
Reflecting the increasing prevalence of technology in people’s everyday lives, the<br />
conception of reading literacy has evolved into a more comprehensive concept. More<br />
specifically, reading literacy referring to printed and mostly linear texts has been extended<br />
to also include the ability to successfully navigate and process non-linear electronic<br />
documents (hypertexts). During the last decade reading of electronic texts has become an<br />
activity of increasing importance amongst youths as well as adults.<br />
Against this background, sound research on the cognitive processing of electronic text is<br />
necessary both on fundamental and applied levels. To promote and facilitate research on<br />
reading electronic text, we have developed an authoring system to create electronic reading<br />
stimuli.<br />
The purpose of the present paper is twofold. First, we present a new graphical front-end tool<br />
for the computer-based assessment platform TAO (the French acronym for technology-based<br />
assessment), the “Hypertext Builder”, which was developed to author items for<br />
electronic reading assessment. The Hypertext Builder was designed to facilitate the rapid<br />
development and implementation of complex electronic reading stimuli, covering all major<br />
text-types encountered in electronic reading such as websites, e-mail client environments,<br />
forums, or blogs.<br />
Second, after presenting the Hypertext Builder itself and demonstrating its features, we<br />
report first evidence for the proposition that text stimuli created with the Hypertext Builder<br />
capture specific features of electronic reading. We used Hypertext-Builder-created materials in an<br />
experiment designed to test the assumption that a greater degree of executive control is<br />
needed for the processing of hypertext compared to linear text because of the navigation<br />
demands imposed by hypertext. Sixty students read hypertexts and linear texts (within-subject<br />
factor) under three secondary task conditions that imposed no additional load, general<br />
dual-task load, or executive control load (between-subject factor).<br />
Symposium: Computer-based assessment / Paper 3:<br />
Component skills of electronic reading competence<br />
Johannes Naumann, Nina Jude, Frank Goldhammer, Thomas Martens,<br />
Heiko Rölke, Eckhard Klieme<br />
German Institute for International Educational Research (DIPF), Germany<br />
With the Internet having become a ubiquitous means for dissemination and distribution of<br />
opinions, news, and all other kinds of information around the world, skill in reading<br />
electronic documents may well be regarded as a key competence required for successful<br />
participation in society. Electronic documents are typically represented as non-linearly<br />
structured hypertexts. On the one hand, this means that, compared to the reading of traditional<br />
printed text, successful processing of electronic documents poses a number of<br />
additional demands on readers, such as deciding whether or not to follow a certain link<br />
or, if a link is followed, keeping in mind the original reading goal. On the other hand,<br />
electronic text allows for the implementation of signalling devices that may in fact facilitate<br />
processing, provided they are used adequately. Thus, reading of electronic text is a<br />
competence that cannot be easily mapped upon traditional text-processing skills. Rather, to<br />
successfully use electronic text, readers must have ample working memory resources to<br />
simultaneously accommodate text processing and navigation in the first place. For this,<br />
basic reading processes need to be well-routinized as well, so that available working<br />
memory does not have to be devoted to basic operations of text processing such as word<br />
recognition or semantic parsing. In addition, to deal with an electronic text’s non-linearity,<br />
e.g. to efficiently use navigational aids such as overviews or typed links, readers must have<br />
at their disposal adequate metacognitive strategies. Finally, computer skills may affect<br />
electronic reading competence, in that for reading electronic text readers must have at least<br />
some very basic computer knowledge, such as how to access an internet address or how to<br />
use a mouse.<br />
The present paper investigates which of these component skills actually affect individual<br />
competence in reading electronic documents. To assess electronic reading competence<br />
and related component skills, newly developed tests were implemented in a new testing tool.<br />
To assess electronic reading competence, this tool presents interactive stimuli that<br />
mimic real-life websites, e.g. a medical or a job search site, with corresponding test questions.<br />
Subsequently, students’ basic reading skill (lexical access and sentence<br />
comprehension), working memory capacity, metacognitive strategies and computer skill<br />
were assessed. A total of three-hundred students were sampled from 30 German schools.<br />
Test sessions lasted for three hours. Electronic reading competence was regressed on<br />
working memory capacity, basic reading skill, knowledge of metacognitive strategies, and<br />
computer skill. Using hierarchical linear models with students as level-1-units and schools<br />
as level-2-units, a substantial proportion of variance in electronic reading competence was<br />
explained by the proposed set of predictor variables. In addition, both level-1-intercepts and<br />
regression weights were found to vary between level-2-units (schools). As a consequence,<br />
future research should address not only which school-level conditions cause high average<br />
levels of electronic reading competence, but also the conditions under which electronic<br />
reading skill is more or less dependent on stable trait variables that cannot be changed<br />
easily, such as working memory capacity.<br />
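The two-level analysis described above can be illustrated with a minimal numerical sketch. The simulation below is hypothetical (it is not the study’s data or model, and all numbers and variable names are invented): per-school regressions on simulated scores show how intercepts and working-memory slopes vary between schools, which is the level-2 variation a full hierarchical linear model would estimate jointly as variance components.<br />

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 30 schools x 10 students, mirroring the 300-student sample.
# Each school gets its own intercept and working-memory slope (level-2 variation).
n_schools, n_per_school = 30, 10
school_intercepts = rng.normal(50.0, 5.0, n_schools)   # level-2 intercept variation
school_slopes = rng.normal(2.0, 0.5, n_schools)        # level-2 slope variation

slopes, intercepts = [], []
for s in range(n_schools):
    wm = rng.normal(0.0, 1.0, n_per_school)            # working memory (z-scores)
    erc = (school_intercepts[s] + school_slopes[s] * wm
           + rng.normal(0.0, 3.0, n_per_school))       # electronic reading competence
    b, a = np.polyfit(wm, erc, 1)                      # per-school OLS: slope, intercept
    slopes.append(b)
    intercepts.append(a)

# The between-school spread of intercepts and slopes is the level-2 variation the
# abstract reports; an HLM would separate true variation from estimation noise.
print(f"slope SD across schools:     {np.std(slopes):.2f}")
print(f"intercept SD across schools: {np.std(intercepts):.2f}")
```

In a proper analysis, both levels would be fitted simultaneously with maximum likelihood rather than via separate per-school regressions.<br />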
Symposium: High-Stakes Performance-Based Assessment:<br />
Issues in High-Stakes Performance-Based Assessment of Clinical Competence<br />
Organiser: Godfrey Pell, University of Leeds, United Kingdom<br />
Chair/Discussant: Trudie Roberts, University of Leeds, United Kingdom<br />
In the current era of audit and accountability in medicine, stakeholders want assurance that<br />
graduates have attained the required level of clinical competence to be awarded their<br />
degree and a licence to practise. Final exit examinations are designed to provide that<br />
assurance.<br />
Within the field of medical education, the Objective Structured Clinical Examination (OSCE)<br />
is currently favoured for assessing clinical skills as it has been shown to be the most<br />
reliable, valid, fair and defensible format for this purpose (Harden and Gleeson; Newble).<br />
Assessing students’ clinical skills just prior to graduation provides the assurance that they<br />
have achieved the minimum level of clinical competence required for a licence to practise<br />
by both the General Medical Council (General Medical Council) and the Medical Council of<br />
Canada (Reznick, Blackmore, Cohen et al., 1993).<br />
In an OSCE, candidates rotate through a series of time-limited ‘stations’ and perform a<br />
particular clinical task at each one. These tasks include clinical examination skills,<br />
communication skills or practical procedures (Boursicot and Roberts). In this way,<br />
candidates are tested across the range of skills and patient problems required for<br />
graduation and this is equated to ‘clinical competence’.<br />
Each station is observed by an examiner (usually a clinician or a trained observer) who<br />
scores the candidate using a checklist in which the steps of the particular clinical skill being<br />
assessed are listed as individual items. The examiners mark whether a candidate has<br />
performed each step correctly and the overall mark is the summation of the checklist item<br />
scores for any one station.<br />
The basic structure of an OSCE may vary; for example, the length of time allowed per<br />
station, whether checklists or rating scales are used for scoring, who scores (a clinician, a<br />
standardised patient, a trained lay observer) and whether real patients or manikins are<br />
used. However, the fundamental principle is that every candidate has to complete the same<br />
assignments in the same amount of time and is marked according to structured scoring<br />
instruments.<br />
Although the symposium is based on the OSCE, the issues being discussed should be of<br />
interest to delegates who use performance-based assessments for assessing professional<br />
competence.<br />
Symposium: High-Stakes Performance-Based Assessment / Paper 1:<br />
Lessons Learned from Administering a National OSCE for Medical Licensure<br />
David Blackmore, Medical Council of Canada, Canada<br />
Rationale<br />
Canada was the first country in the world to use a performance-based objective structured<br />
clinical examination (OSCE) as part of a national medical licensing examination. The<br />
Medical Council of Canada (MCC) awards the Licentiate (LMCC) which, in turn, is used as a<br />
prerequisite for licensure by the Canadian medical regulatory authorities. The pilot work and<br />
early testing of this examination took place in the late 1980s and examination<br />
implementation occurred in 1992. Since then, the Medical Council OSCE has undergone<br />
many changes in both its format and procedures. The experience gained from over<br />
35 administrations is the basis for the lessons learned.<br />
Methodology<br />
To be awarded the LMCC, an examinee must complete a two-part MCC Qualifying<br />
Examination (MCCQE). The MCCQE Part I is a one-day, computer-administered<br />
examination which the examinees usually take upon graduation from medical school. The<br />
examinee must then complete at least 12 months of postgraduate training before attempting<br />
the MCCQE Part II which is a multi-station OSCE administered to over 3000 examinees<br />
annually across 16 examination sites within Canada. Each OSCE consists of multiple<br />
stations where a physician examinee interacts with a standardized patient. A physician<br />
examiner observes the encounter in real-time and scores a checklist and global ratings.<br />
Some stations are followed by a written exercise and some stations contain a structured<br />
oral question administered by the physician examiner at the end of the encounter.<br />
Discussion<br />
In 1992, the MCCQE Part II consisted of 20 stations: 10 ten-minute patient-encounter<br />
stations and 10 five-minute patient-encounter stations followed by five-minute written<br />
exercises known as post encounter probes (PEPs). Two forms of this examination were<br />
constructed where one form was administered on a Saturday and the second form was<br />
administered on the following day. Two forms were required to accommodate the many<br />
examinees who needed to be tested concurrently across Canada. By 2008, the MCCQE<br />
Part II has evolved into an examination consisting of 14 stations: 7 ten-minute stations,<br />
5 five-minute stations followed by PEPs, and 2 pilot or pretest stations.<br />
Many lessons have been learned from 16 years of testing tens of thousands of physicians<br />
at multiple examination sites in a high-stakes licensing examination. Issues related to<br />
examination administration such as standardized patient recruiting and training; examiner<br />
recruitment, training, and retention; examination implementation, scoring, standard setting,<br />
and reporting have arisen over time. In addition, there have been several challenges arising<br />
from administering a multi-site examination across six different time zones. Some<br />
examination techniques have worked out better than others and the MCC examination has<br />
improved as it has matured. The MCC is also addressing new challenges related to<br />
changing examination content such as measuring professionalism and team participation.<br />
This presentation outlines the challenges and solutions to performance testing that have<br />
presented themselves as a result of the MCC employing the OSCE format in a high-stakes<br />
licensing examination.<br />
Symposium: High-Stakes Performance-Based Assessment / Paper 2:<br />
Quality Assurance through the OSCE Life Cycle<br />
Sydney Smee, Medical Council of Canada, Canada<br />
Rationale<br />
While the Objective Structured Clinical Examination (OSCE) is a format that allows for valid,<br />
reliable and fair testing of clinical skills, creating an OSCE that meets these criteria requires<br />
effort. When that effort is made, decisions based on the OSCE scores become defensible.<br />
A range of quality assurance measures can be taken throughout the life cycle of an OSCE<br />
to ensure that it meets testing standards such as those set by the American Educational<br />
Research Association (AERA), American Psychological Association (APA) and the National<br />
Council on Measurement in Education (NCME). Which quality assurance measures should be<br />
implemented, and to what degree, is a judgment call based on the consequences of the<br />
decisions for test takers and test users.<br />
The Medical Council of Canada’s Qualifying Examination Part II is an OSCE scored by<br />
physicians and administered across multiple sites to candidates who have successfully<br />
completed 12 months of post-graduate training. This OSCE is a prerequisite for medical<br />
licensure in Canada and so considerable effort is made to ensure the testing process is fair<br />
and the scores are sufficiently valid and reliable for making high-stakes pass-fail decisions.<br />
The Medical Council’s quality assurance practices and the rationale behind them are<br />
discussed so the value of these practices for other settings can be considered.<br />
Methodology<br />
The processes used by the Medical Council at each of five stages in an OSCE cycle will be<br />
described, with links to the 1999 Standards for Educational and Psychological Testing<br />
(AERA, APA, & NCME):<br />
1. Validity through Case Development<br />
2. Improving Reliability with Standardized Patient Training, Staff Orientation and Examiner<br />
Briefing<br />
3. Steps to Ensure Fair OSCE Administration<br />
4. Validity and Reliability - Psychometrics and Standard Setting<br />
5. More about Fairness - Incidents and Appeals<br />
Discussion<br />
The discussion will look at the importance of these quality assurance processes, the<br />
Medical Council’s rationale for certain approaches and the implementation challenges that<br />
have been encountered over sixteen years’ experience with the Part II OSCE. Quality<br />
assurance minimizes the risk of false positive and false negative pass-fail decisions.<br />
Without quality assurance, pass-fail decisions are not defensible. Therefore the discussion<br />
will consider which of the approaches used for this high stakes OSCE could be adapted to<br />
other settings; for example, medical schools.<br />
Symposium: High-Stakes Performance-Based Assessment / Paper 3:<br />
Investigating OSCE Error Variance when measuring higher level competencies<br />
Godfrey Pell, University of Leeds, United Kingdom<br />
Richard Fuller, University of Leeds, United Kingdom<br />
Rationale<br />
Standardization and reliability are major concerns with Objective Structured Clinical<br />
Examinations (OSCEs), but quality metrics permit deeper analysis of examination<br />
performance. This paper investigates the relationship between OSCE structure and error<br />
variance (i.e., variance due to factors other than student performance), building on previous<br />
research into sources of error variance.<br />
Methodology<br />
Analysis of recent 3rd, 4th & 5th (final) year OSCE results from the University of Leeds is<br />
considered to highlight the important problems that exist with error variance in OSCE<br />
scores. The impact of revisions to examiner instructions and item checklists / mark sheets,<br />
most notably the inclusion of intermediate grade descriptors and a reduction in the number<br />
of checklist items, is then assessed using 2007 and 2008 5th year OSCE data.<br />
Discussion<br />
Although error variance may be simply defined as that variance which is due to factors other<br />
than differences in performance caused by varying student ability, it remains possible to<br />
construct a variety of models of differing complexity to quantify this error. Discussion will<br />
include consideration of which of these models may be the most appropriate.<br />
Other questions to be addressed include:<br />
• What can we learn about problem OSCE stations from the metrics available to us, and<br />
how might this inform us with respect to development of improved assessments?<br />
• Should we have long or short checklists?<br />
• How can we measure higher level competencies?<br />
• What effects do these and other issues have on reliability?<br />
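One of the simpler models referred to above is a fully crossed students × stations variance decomposition, in which all score variance not attributable to student ability counts as error. The sketch below is a hedged illustration on simulated scores (not Leeds data; all figures are hypothetical), using the standard ANOVA mean-square estimators for a two-way crossed design with one observation per cell.<br />

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fully crossed OSCE: 200 students x 14 stations, one score per cell.
# True ability drives part of the variance; station difficulty and residual error
# (examiner stringency, occasion, etc.) make up the error variance.
n_students, n_stations = 200, 14
ability = rng.normal(60.0, 6.0, n_students)           # true student ability
difficulty = rng.normal(0.0, 3.0, n_stations)         # station difficulty effects
scores = (ability[:, None] + difficulty[None, :]
          + rng.normal(0.0, 5.0, (n_students, n_stations)))

grand = scores.mean()
row_means = scores.mean(axis=1)                        # per-student means
col_means = scores.mean(axis=0)                        # per-station means
resid = scores - row_means[:, None] - col_means[None, :] + grand

# ANOVA mean squares for a two-way crossed design, one observation per cell
ms_resid = (resid ** 2).sum() / ((n_students - 1) * (n_stations - 1))
ms_student = n_stations * ((row_means - grand) ** 2).sum() / (n_students - 1)
ms_station = n_students * ((col_means - grand) ** 2).sum() / (n_stations - 1)

var_student = (ms_student - ms_resid) / n_stations     # ability variance
var_station = (ms_station - ms_resid) / n_students     # station-difficulty variance
var_error = var_station + ms_resid                     # everything not due to ability

# Generalisability-style coefficient for relative decisions over n_stations stations
g_coef = var_student / (var_student + ms_resid / n_stations)
print(f"student variance: {var_student:.1f}")
print(f"error variance:   {var_error:.1f}")
print(f"G coefficient:    {g_coef:.2f}")
```

More complex models would add examiner, site and occasion facets, which is where the modelling choices discussed in the paper come in.<br />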
Symposium: High-Stakes Performance-Based Assessment / Paper 4:<br />
Beyond checklist scoring –<br />
clinicians’ perceptions of inadequate clinical performance<br />
Katharine Boursicot, University of London, United Kingdom<br />
Trudie Roberts, University of Leeds, United Kingdom<br />
Jenny Higham, Imperial College London, United Kingdom<br />
Jane Dacre, University College London, United Kingdom<br />
Rationale: There has been concern among medical educators and practitioners that OSCEs<br />
test only practical technical skills and do not scrutinise the deeper layer of understanding<br />
supporting those skills. The notion of having a checklist of items to measure the performance of<br />
clinical skills has been criticised for being reductionist and failing to capture the higher-order<br />
nature of clinical judgement and diagnosis. To try to understand what elements assessors felt<br />
were not captured by existing checklists we instituted the use of ‘Cause for Concern’ forms<br />
whereby examiners could report a candidate’s performance they found to be unacceptable.<br />
Methodology: Four medical schools introduced the use of the Cause for Concern forms as an<br />
addition to the checklists in each of their respective graduation OSCEs. The OSCEs consisted<br />
of 17 to 26 different stations across the four schools. The examiners were clinicians, some of<br />
whom were academics from medical school faculties. All were medical or other healthcare<br />
practitioners involved in the teaching and assessment of medical students; all were familiar with<br />
the standards expected at graduation and current professional medical practice. All examiners<br />
were offered training sessions on examining at OSCEs. Examiners were briefed to use the<br />
‘Cause for Concern’ forms about students whose performance in any area would mean that<br />
patient care could be compromised but where the issues causing concern were not captured by<br />
the checklists. After the OSCEs, the forms were collected and analysed. Three raters<br />
independently reviewed the comments from all the medical schools and derived recurring<br />
themes. These were refined into seven themes. The raters independently ascribed all the<br />
comments to each of the themes and dominant themes were identified.<br />
Results: The total number of forms completed from all four medical schools was 152 out of<br />
a total of 25,800 student-examiner encounters; this represents a reporting rate of 0.6%. The<br />
seven themes identified by the three raters were: Clinical skills–poor technique; Clinical<br />
skills–failure to elicit or recognise correct signs; Poor diagnostic ability (interpretation of<br />
signs); Poor/inadequate knowledge; Professional behaviour–personal (e.g. anxiety, lack<br />
of/over-confidence, appearance); Professional behaviour–towards patient (e.g. rough,<br />
inappropriate attitude); Poor communication skills (written and oral).<br />
Discussion: There was much commonality in the themes reported across the four medical<br />
schools. The themes reflected the anecdotal evidence which prompted this study, in that<br />
clinician examiners were concerned that professional behaviours and higher level<br />
diagnostic skills were lacking in students who nonetheless managed to pass the OSCE<br />
based on their checklist score. The main areas of concern which were reported related to<br />
fundamental medical skills or professional behaviour. Overall the reporting rate was very<br />
low, indicating that the overwhelming majority of students satisfied the clinician examiners<br />
with their clinical competence and professional behaviour. When making decisions about<br />
graduation, the ‘Cause for Concern’ forms could be used in addition to checklists to gain a<br />
fuller perspective on those students about whom there are concerns regarding minimum<br />
clinical competence or unacceptable professional behaviour.<br />
Symposium: Peer Assessment:<br />
Towards (quasi-) experimental research on the design of peer assessment<br />
Organiser: Dominique Sluijsmans, Open University, The Netherlands<br />
Peer assessment is an arrangement in which equal-status students judge a peer’s<br />
performance with a rating scheme or qualitative report (Topping, 1998); it stimulates<br />
students to share responsibility, reflect, discuss and collaborate with their peers (Boud,<br />
1990; Orsmond, Merry, & Callaghan, 2004). To date, most studies on peer assessment<br />
treat the quality of peer assessment – i.e., performance improvement and learning benefits<br />
– as a derivative of the accuracy of peer marks compared to teacher marks (Falchikov &<br />
Goldfinch, 2000). In addition, many researchers advocate that peer assessment has a<br />
positive effect on learning, but the empirical evidence is either based on student self-report<br />
ratings or anecdotal evidence from case studies and not on standardised performance<br />
improvement measures. Furthermore, peer assessment is rarely studied in (quasi-)<br />
experimental settings (comparing an experimental group to a control or baseline group),<br />
which considerably limits the claims and evidence regarding specific conditions that are<br />
believed to affect learning. Hence, the empirical support for learning effects, as well as for<br />
specific peer assessment conditions, is scarce.<br />
In this symposium, three contributions are presented that are interlinked in three ways: 1)<br />
they investigate peer assessment from a (quasi) experimental perspective; 2) they address<br />
the value of peer assessment for the individual learner, and 3) they aim at providing clear<br />
guidelines on the design of peer assessment.<br />
In the first contribution by van Zundert et al., it is studied how task complexity and the<br />
structure of the peer assessment formats used by students to conduct the peer<br />
assessment, affects students’ domain-specific learning and their peer assessment skill. In<br />
addition, the impact of cognitive load, students’ attitudes towards peer assessment, and<br />
transfer are considered. Finally, generalisability analyses will be performed to determine the<br />
reliability of peer assessments in relation to different formats. In the second contribution by<br />
Sluijsmans et al. peer assessment is investigated from the perspective of group work. The<br />
effects of four peer assessment design variations on individual marks and the reliability of<br />
peer assessment are investigated. The results show that the design of a peer assessment<br />
method strongly influences the transformation of a group mark into individual marks. The<br />
third contribution by Strijbos et al. studies the impact of peer feedback content and<br />
characteristics of the sender. Previous studies show that students express concerns about<br />
the fairness and usefulness of peer assessment, and Strijbos et al. hypothesise that this<br />
finding may be related to sender characteristics, and that feedback perception may influence<br />
the effect of peer feedback and subsequent performance.<br />
For peer assessment research to advance, identifying the gap between what we know<br />
about peer assessment and what we claim about peer assessment is crucial. In this respect<br />
we advocate more (quasi) experimental research – enabling the investigation of specific<br />
components and conditions as compared to holistic evaluations via case studies. Although<br />
the thick descriptions of specific peer assessment practices in such case studies provide a<br />
wealth of evidence for hypothesis generation, the resulting theories or guidelines should<br />
subsequently be tested in controlled experimental settings to warrant generalisation.<br />
Symposium: Peer Assessment / Paper 1:<br />
The effects of peer assessment format and task complexity on learning and<br />
measurements<br />
Marjo van Zundert, Open University, The Netherlands<br />
Dominique Sluijsmans, Open University, The Netherlands<br />
Jeroen van Merriënboer, Open University, The Netherlands<br />
Former research by Van Zundert, Sluijsmans, and Van Merriënboer (in progress)<br />
emphasised that the variety in peer assessment practices and the holistic reports (i.e., reports<br />
that do not specify all variables) in peer assessment research reveal the necessity to specify<br />
what exactly contributes to learning and measurements (e.g., reliability). Moreover, it was shown<br />
that the share of (quasi-) experimental peer assessment studies is insufficient. The current<br />
study examined the effects of peer assessment formats and task complexity on learning<br />
(domain skill, peer assessment skill, and student attitudes) and measurements (agreement<br />
between peer and expert assessment, between multiple peer assessments, and between<br />
qualitative and quantitative assessments). It was assumed that a highly structured peer<br />
assessment format improves both learning and measurement. Highly structured formats differed<br />
from low structured formats by the integration of first- and higher-order skills, a whole-task<br />
approach, and low cognitive load. It was additionally assumed that a highly structured format<br />
is especially beneficial for complex tasks. Complex tasks induce a higher cognitive load;<br />
this complexity was achieved by editing simple tasks according to three element-interactivity<br />
principles of Cognitive Load Theory. Participants were 110 secondary education students.<br />
They worked in an electronic learning environment through a series of questionnaires and<br />
tasks. The students were randomly assigned to one of the four conditions: low/high<br />
structured formats – simple/complex tasks. After an introduction students logged on to a<br />
computer and completed an attitude questionnaire. Then they studied four study tasks<br />
accompanied by a peer assessment format. In the tasks, which consisted of short<br />
descriptions of biology research, students were supposed to learn to recognise the six steps<br />
of scientific research (i.e., observation, problem statement, hypothesis, experimental stage,<br />
results, and conclusions). Students read the study tasks carefully. After each study task<br />
students reported the cognitive load measure of Paas, Van Merriënboer and Adams (1994).<br />
Next, they solved two transfer tasks (attaching the steps of research to the matching<br />
research description), again followed by the cognitive load measure. Subsequently students<br />
received two peer assessment tasks (evaluating the solution of a fictitious peer), and<br />
reported the cognitive load measure. Finally the attitude questionnaire was completed again<br />
and students logged out. Data will be analysed by ANOVA and generalisability analyses. As<br />
opposed to much previous research, this study attempted to clarify peer assessment effects<br />
by applying quasi-experimental research and by using specific instead of holistic reports.<br />
More (quasi-) experimental research is required in the future, to provide transparency in<br />
peer assessment variety and to account for peer assessment effects.<br />
Symposium: Peer Assessment / Paper 2:<br />
Modelling the impact of individual contributions on peer assessment during<br />
group work in teacher training: In search of flexibility<br />
Dominique Sluijsmans, Open University, The Netherlands<br />
Jan-Willem Strijbos, Leiden University, The Netherlands<br />
Gerard Van de Watering, Eindhoven University of Technology, The Netherlands<br />
During collaborative learning students work together to accomplish a specific group task,<br />
e.g. performing an experiment, writing a collaborative report, carrying out a group project or<br />
a group presentation. These group tasks aim to facilitate peer learning and the development<br />
of collaboration skills. However, since the assessment strongly influences learning in any<br />
course, courses utilising collaborative learning must employ assessment that promotes collaboration<br />
(Frederiksen, 1984). Social loafing (the tendency to reduce individual effort when working in<br />
groups compared to individual effort expended when working alone; see Williams & Karau,<br />
1991) and ‘free riding’ (an individual does not bear a proportional amount of the group work<br />
and yet s/he shares the benefits of the group; see Kerr & Bruun, 1983) are two often voiced<br />
complaints by students regarding unsatisfactory group-work experiences (Johnston & Miles,<br />
2004). Positive interdependence and individual accountability (the latter explicitly introduced<br />
to counter free-riding) play a crucial role during group work. In order for a group to be<br />
successful, all group members need to understand that they are each individually<br />
accountable for at least one aspect of the group task. Teachers regard peer assessment<br />
as a valuable and practical tool to reduce social loafing and free-riding effects. Moreover, it<br />
can serve as a tool to increase students’ awareness of individual accountability and to<br />
promote positive interdependence.<br />
Although a fair number of studies acknowledge the significance of individual contributions<br />
in groups via peer assessment (Lejk & Wyvill, 1996), there are two serious weaknesses in<br />
the design of the methods that are used to transform group marks into individual marks<br />
using peer ratings. First, they take a psychometric perspective (calculate) rather than an<br />
edumetric (design) perspective – which fits better with contemporary developments such as<br />
competency-based education. Second, they are weak when it comes to flexibility of peer<br />
assessment in group work: the students are not involved in choosing criteria, weighting of<br />
criteria and their participation in peer assessment is obligatory. In this study, the effects of<br />
four peer assessment design variations on individual marks and the reliability of peer<br />
assessment are investigated. These variations are modelled using the baseline dataset with<br />
self- and peer assessment ratings of 72 teacher training students in their fourth year for the<br />
Bachelor of Education. The results show that 1) the design of a peer assessment method<br />
strongly influences the transformation of a group mark into individual marks, and 2) that the<br />
reliability of a peer assessment depends on the weight of the criteria, the rating scale, the<br />
inclusion of self-assessment, and the maximum deviation of an individual mark from the group<br />
mark. A more in-depth discussion is required of the goal of peer assessment and its<br />
implications for the design of flexible and adaptive peer assessment in group work.<br />
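The transformation the abstract criticises can be made concrete with a small sketch. The abstract does not say which method was modelled, so the following illustrates one widely cited approach, the individual weighting factor associated with Lejk and Wyvill; all names and figures are invented for illustration.<br />

```python
# Hedged sketch of one common transformation of a group mark into individual
# marks via peer ratings: the individual weighting factor (IWF) approach
# associated with Lejk and Wyvill. Names and figures are illustrative only;
# the study's actual design variations may differ.

def individual_marks(group_mark, rating_totals):
    """rating_totals maps each student to the sum of peer ratings received."""
    mean_total = sum(rating_totals.values()) / len(rating_totals)
    # IWF = own rating total / group mean; the individual mark scales the
    # group mark by that factor, so strong contributors move above it.
    return {s: group_mark * (t / mean_total) for s, t in rating_totals.items()}

marks = individual_marks(70, {"Ann": 24, "Ben": 20, "Cas": 16})
```

Design variations mentioned above, such as capping the deviation of an individual mark from the group mark or including self-ratings in the totals, would enter this sketch as additional parameters.<br />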
48 ENAC 2008
Symposium: Peer Assessment / Paper 3:<br />
Peer feedback in academic writing: How do feedback content, writing ability-level and<br />
gender of the sender affect feedback perception and performance?<br />
Jan-Willem Strijbos, Susanne Narciss, Mien Segers<br />
Leiden University, The Netherlands<br />
The shift towards student-centered learning places a high emphasis on students assuming<br />
responsibility for their learning. Peer assessment is well-suited in this respect: equal-status<br />
students judge a peer’s performance with a rating scheme or a qualitative report (Topping,<br />
1998). Many peer assessment researchers stress that feedback is essential for<br />
performance improvement and learning benefits, but the evidence for performance and<br />
learning effects is scarce. Moreover, the impact of feedback content types has hardly been studied.<br />
Students also express concerns about the fairness and usefulness of peer assessment<br />
(Cheng & Warren, 1997), which appears related to sender characteristics that may<br />
influence the effect of peer feedback (Leung, Su, & Morris, 2001).<br />
We conducted two studies to investigate the impact of feedback content and sender<br />
characteristics (writing ability-level and gender) using a factorial pre-test treatment post-test<br />
control group design in the context of academic writing in higher education. Study 1<br />
consisted of a two-way factorial design (Nexp = 71, Ncontrol = 18) and Study 2 had a three-way<br />
factorial design (Nexp = 160, Ncontrol = 19). In each study subjects in the experimental<br />
condition received a scenario in which a fictional student received fictional peer feedback.<br />
Subjects’ feedback perception (i.e., fairness, usefulness, acceptance, willingness to improve<br />
and affect) and performance (text revision quality) were investigated.<br />
Study 1: Subjects in experimental conditions received concise evaluative feedback (CEF) or<br />
elaborated informative feedback (EIF), and ability-level of the sender was high or low. A<br />
principal component analysis revealed the latent factor ‘Perceived Adequacy of Feedback’<br />
(PAF, comprising fairness, usefulness and acceptance, 9 items, α = .89). MANOVA<br />
revealed that EIF is perceived as more adequate. A two-way interaction for affect (AF, 6<br />
items, α = .81) revealed that students with EIF by a high-ability peer expressed more negative<br />
affect compared to CEF by a high-ability peer, and the opposite was observed for feedback<br />
by a low-ability peer. A repeated measures MANOVA showed that performance increases<br />
in all conditions over time, but performance for EIF by a high-ability peer was significantly<br />
lower compared to the control condition.<br />
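The internal-consistency coefficients reported above (Cronbach’s α) summarise how strongly the items of a scale hang together. As a minimal illustration of how such a coefficient is computed, and not the study’s own analysis, the item scores below are fabricated.<br />

```python
import statistics

# Minimal sketch of Cronbach's alpha, the internal-consistency coefficient
# reported for the PAF and AF scales above. The item scores here are
# fabricated for illustration; they are not data from the study.

def cronbach_alpha(items):
    """items: one list of scores per item, aligned across respondents."""
    k = len(items)
    sum_item_var = sum(statistics.pvariance(scores) for scores in items)
    totals = [sum(resp) for resp in zip(*items)]   # per-respondent totals
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    return k / (k - 1) * (1 - sum_item_var / statistics.pvariance(totals))

alpha = cronbach_alpha([[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]])
```

With real questionnaire data, each inner list would hold one item’s scores across all subjects in the sample.<br />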
Study 2: In addition to the variations in Study 1, the experimental conditions also varied with<br />
respect to gender of the sender (typical male/ female name: Joost versus Astrid). As in<br />
Study 1, a principal component analysis revealed the latent factor PAF (α = .90). MANOVA<br />
revealed that EIF is perceived as more adequate. A three-way interaction effect revealed<br />
that subjects’ willingness to improve (WI, 3 items, α = .70) for the feedback by ability-level<br />
combinations appears to be different for gender of the sender. The performance data are<br />
currently being analysed and will be presented by the time of the conference.<br />
The results of both studies reveal that feedback perception is an important aspect to be<br />
considered with respect to peer assessment. It should be noted that both samples were<br />
female-dominated (a 7-to-1 ratio), which limits strong generalisations. However, a comparative<br />
study with a more even gender distribution is currently being conducted in secondary education.<br />
Symposium: Assessment in kindergarten classes:<br />
experiences from assessing competences in three domains<br />
Organiser: Marja van den Heuvel-Panhuizen, FIsme, Utrecht University, The Netherlands/<br />
IQB, Humboldt University Berlin, Germany<br />
Chair/Discussant: Kees de Glopper, University of Groningen, The Netherlands<br />
Assessment of young children is a challenging endeavour. Defining and measuring<br />
development and learning in young children is quite complex, for a variety of reasons (see<br />
e.g. Shepard, 1994). Test performance of 4- and 5-year-olds can be highly variable. With<br />
young children, mismatches between the content of tests and children’s existing knowledge<br />
and experiences lie in wait. Young children may also encounter difficulties in participation,<br />
due to unfamiliarity with rules and conventions for obtaining responses. In response to<br />
these problems, several guiding principles for the assessment of young children have been<br />
proposed: assessments should bring about benefits for children, they should reflect and<br />
model progress toward important learning goals, their methods must be appropriate to the<br />
development and experiences of young children, and they should be tailored to a specific<br />
purpose.<br />
This symposium discusses experiences with assessing mathematical, literary and social-emotional<br />
competences in kindergartners. Children’s learning in these domains is<br />
investigated in three interlinked research projects that are part of the PICO research<br />
programme (PIcture books and COnceptual development). Each project aims to determine<br />
the instructive value of picture books and follows the same approach. First, potential<br />
contributions of picture books to children’s development are identified through analyses of<br />
picture books. Second, we develop ‘keys’ that help teachers to unlock the richness of<br />
picture books and to establish engaging and instructive interaction. We use design-based<br />
research to develop keys for 24 picture books. Third, we do a quasi-experimental study to<br />
assess the developmental yield of the picture books and their corresponding keys.<br />
To evaluate the effect of the intervention program each PICO-project developed procedures<br />
and tasks for assessment, trying to adhere to the abovementioned principles and, at the<br />
same time, doing justice to the nature of the competence domain that is assessed. This<br />
resulted in a set of tools for assessing young children in three quite diverse domains of<br />
children’s development in which different assessment formats are used ranging from<br />
individual to group assessment, from oral to written assessment, and from open questioning<br />
to multiple-choice questioning. What all the assessment tools have in common is that they<br />
are grounded in a picture-book context.<br />
In the symposium we would like to share with the audience our experiences with developing the<br />
assessment tools and analyzing the collected data, and the knowledge we gained regarding<br />
the children’s development and the way to assess this. We hope to draw the audience into<br />
a discussion about the obstacles and opportunities in assessing young children’s<br />
development in different competence domains and to touch on issues of further research.<br />
Symposium: Assessment in kindergarten classes / Paper 1:<br />
A picture-book-based tool for assessing literary competence in 4- to 6-year-olds<br />
Coosje van der Pol, Tilburg University, The Netherlands<br />
Helma van Lierop-Debrauwer, Tilburg University, The Netherlands<br />
Introduction: The paper addresses the challenging topic of assessing literary competence of<br />
young children. The study reported here is part of the PICO-li project, in which we<br />
investigate whether and how picture books contribute to the literary development of<br />
kindergartners. Literary competence starts with looking, together with children, at picture-book stories<br />
as aesthetic compositions of text and pictures. The composition of a story is based on<br />
literary and social codes and conventions. The PICO-li project investigates three sub-domains<br />
of literary competence: understanding story characters, suspenseful story<br />
elements and ironic humour.<br />
Development of the assessment tool: The developed tool is partly based on the Narrative<br />
Comprehension (NC) task for assessing children’s comprehension of narrative picture<br />
books (Paris & Paris, 2001). The NC-task has been adapted in order to concentrate on<br />
literary codes and narrative conventions and their aesthetic evaluation by the reader.<br />
Description of the assessment tool: The PICO-li assessment tool uses the picture book<br />
Cottonwool Colin (2007) by Jeanne Willis and Tony Ross to elicit the children’s responses.<br />
This book covers all three sub-domains of literary competence. The assessment has three<br />
parts: first, the child is invited to go through the book and respond spontaneously. In the<br />
second round, the book is taken away and replaced by an electronic version. During the<br />
assessment a child views the scanned pages of the real book on a computer screen whilst<br />
listening to the text being read aloud through the speakers. This ensures that the story is<br />
read to all the children in exactly the same way. Afterwards the child is asked to retell the<br />
story. In the third round, the child answers ten questions related to the three sub-domains.<br />
The final question is a productive question about the story’s main character. His<br />
development from small and weak to tall and strong is the story’s main theme. At the end of<br />
the story Cottonwool Colin no longer seems an appropriate name for the protagonist. The<br />
child is asked to think up a more appropriate name for him.<br />
Data collection and analysis: Almost 100 children from 18 different classrooms have been<br />
tested individually using the PICO-li assessment tool. Their responses have been video-recorded<br />
and transcribed onto answer sheets. Data from the first round are analysed for<br />
spontaneous comments on pictures, story line and attempts at interpretation. The retellings<br />
from the second round are analysed for six story structure elements: setting; characters;<br />
goal/initiating event; problem/episodes; solution; resolution/ending. Scoring rubrics have<br />
been created with higher scores reflecting a more literary stance towards the story than<br />
lower scores.<br />
Discussion: Although we are still in the process of analyzing data, our first impression is that<br />
the tool reveals valuable information on the children’s development of literary competence.<br />
At the conference we will concentrate on what we found out about the children’s concepts of<br />
story characters. In our presentation we will discuss the implications of our analyses for the<br />
practice of assessment and the problems and benefits that may arise when assessing<br />
literary competence in young children.<br />
Symposium: Assessment in kindergarten classes / Paper 2:<br />
Assessing the social-emotional development of young children by means of<br />
storytelling and questions<br />
Aletta Kwant, University of Groningen, The Netherlands<br />
Jan Berenst, University of Groningen, The Netherlands<br />
Kees de Glopper, University of Groningen, The Netherlands<br />
Introduction: Social and emotional competences are important for adequate functioning.<br />
This is true for human beings at almost any age, including kindergartners. It is unclear where<br />
and how young children can learn these competences. The PICO-sem (PIcture books and<br />
COncept development in the social and emotional domain) project is aimed at investigating<br />
whether the use of picture books in kindergarten classes can contribute to the development<br />
of these competences. The children are read a series of picture books that address a<br />
number of events that are identified as highly instructive for understanding and using social<br />
and emotional behavior. To evaluate the effect of the picture book program, a tool was<br />
developed to assess the children’s social and emotional, and to some extent moral,<br />
development.<br />
The PICO-sem assessment tool: The tool consists of a series of about 40 tasks which are<br />
all connected to short sketches in which social and emotional components play a role.<br />
The complete administration of the test takes about half an hour, split into two parts. The<br />
tasks do not require writing and reading skills. To avoid too strong a dependence on<br />
verbal skills, we used not only production tasks but also recognition tasks. In the latter, it is<br />
assessed with the help of pictures whether the children can recognize aspects of social and<br />
emotional behavior. Because the children might be influenced by these pictures and their<br />
description, these recognition tasks come after the production tasks.<br />
Data collection and analysis: In January and May 2008 about 110 children from 20<br />
kindergarten classes were tested individually by two trained research assistants. The<br />
collected data are scored for the use of social-emotional expressions by the children. In the<br />
analysis we tried to explore whether there is a difference between the production and<br />
recognition tasks.<br />
Results and discussion: In our presentation we will focus on questions about the feelings of<br />
the main character of a short story. The results from the production tasks will be compared<br />
with those from the recognition tasks in which the children have to react to a set of depicted<br />
emotions. The data are all from the assessment in January. In this test, which was<br />
administered before the program was carried out, we found that merely asking the children to<br />
tell about emotions did not reveal their deep understanding. When the children had to react<br />
to pictures, their answers were more differentiated than in the production tasks.<br />
Assessing young children is rather challenging. An important issue for us is to know to what<br />
extent our assessment really catches the children’s understanding and knowledge about<br />
emotions. In the discussion we will address the question whether our approach to assess<br />
kindergartners’ social and emotional development is a fruitful avenue to follow.<br />
Symposium: Assessment in kindergarten classes / Paper 3:<br />
Assessing mathematical abilities of kindergartners:<br />
possibilities of a group-administered multiple-choice test<br />
Sylvia van den Boogaard, Utrecht University, The Netherlands<br />
Marja van den Heuvel-Panhuizen, FIsme, Utrecht University, The Netherlands/<br />
IQB, Humboldt University Berlin, Germany<br />
Introduction: Knowledge of children’s mathematics development is of crucial importance for<br />
offering them support for further learning. In the case of kindergartners (i.e. 4- and 5-year-olds<br />
who have not yet entered formal education), observations and interviews are mostly<br />
used to collect this information. Group-administered multiple-choice tests are often not<br />
considered to be an adequate assessment tool for young children (e.g. Fuson, 2004).<br />
Nevertheless, this assessment format does have potential to provide relevant information<br />
about children’s development, as has been shown earlier (see Van den Heuvel-Panhuizen, 1996).<br />
The present study builds on these previous experiences and aims at increasing our<br />
knowledge about kindergartners’ mathematical understanding and designing a multiple-choice<br />
test to reveal this understanding. The test is developed in the context of the PICO-ma<br />
(PIcture books and COncept development in mathematics) project that investigates<br />
whether and how picture books contribute to the mathematical understanding of<br />
kindergartners.<br />
The PICO-ma Test: The mathematical content included in the test covers three sub-domains<br />
of early mathematics: number (with special attention to “structuring numbers”),<br />
measurement (in particular the theme “growth”), and geometry (with the focus on “taking a<br />
point of view”). The guiding principle for developing the test is offering children a meaningful<br />
and familiar context in which they can show their understanding regarding these content<br />
domains. Therefore, all items are presented in a picture-book-like style; most of the<br />
questions are inspired by picture-book stories.<br />
For most items, four alternative solutions are given. The children have to put a line under<br />
the correct solution. We made sure that there is no need for the children to read text or<br />
numbers. The accompanying questions are read aloud to the children.<br />
A draft version of the test was tried out on a number of individual children. After this, the<br />
final selection of items was made and several items were revised. The final test contains<br />
42 items; 14 items for each sub-domain.<br />
Data collection and analysis: In January 2008, about 400 children from 18 kindergarten<br />
classes took the test. The test was administered in two sessions with an interval of one<br />
week. Half the children of one class were assessed at a time. A trained research assistant<br />
led the children, who marked the right answer in their own test booklets, through the test.<br />
The collected data are scored as correct or incorrect and analyzed in connection with<br />
scores on a standardized mathematics test including classification, seriation, and<br />
comparison, and with information about age, sex, and socio-economic status.<br />
Presentation of results and discussion: In our presentation, we share our experiences with<br />
designing, administering, and analyzing the test, and give details about the psychometric<br />
quality of the test. We present our findings regarding the kindergartners’ mathematical<br />
understanding in the three sub-domains and how the sub-domains are related. In the<br />
discussion we would like to reconsider the potential of group-administered paper-and-pencil tests<br />
for assessing young children’s mathematical development.<br />
Papers<br />
Ethical Dilemmas:<br />
‘Insider’ action research into Higher Education assessment practice<br />
Linda Allin, Northumbria University, United Kingdom<br />
Lesley Fishwick, Northumbria University, United Kingdom<br />
Newman (2000) documents that investigative enterprises with a focus on practice inquiry<br />
have emerged over the last decade, and have been identified as teacher research<br />
(Cochran-Smith and Lytle, 1993), action research (Winter, 1987) and reflective practice<br />
(Schon, 1987). The key aim of such studies is to try to solve the immediate and pressing<br />
day-to-day problems of practitioners. Within university departments, how to provide<br />
authentic, innovative and student centred assessment for learning within the context of<br />
increasing student numbers is one of the most pressing problems for programme managers<br />
and lecturers. Recent national student surveys highlight assessment and student feedback<br />
to be two key areas of dissatisfaction for students. We suggest that insider action research<br />
to understand staff experiences of setting assessments is a valuable starting point in<br />
evaluating and then improving assessment practices. The main aim of this paper is to<br />
examine tensions of teaching in Higher Education in relation to identifying the constraints<br />
and pressures which impact on lecturers’ daily work. The focus is on understanding the<br />
various influences on, and the decision-making of, education professionals, in terms of uncovering<br />
the assumptions which drive our assessment practices. The methodology is in-depth interviews<br />
asking staff their views on the purpose of assessment, the barriers to setting innovative<br />
assessments and their concerns over assessment processes. The study has led to a series<br />
of unanticipated ethical issues relating to the conduct of qualitative research with colleagues.<br />
Oliver and Fishwick (2003) argue that there is room for discussion and debate about ethical<br />
considerations in qualitative work. Such debates focus on key principles including not doing<br />
harm (nonmaleficence), justice, autonomy and research related benefits for participants.<br />
Several papers discuss practical ethical problems, but they are often more oriented<br />
towards issues with involving students, rather than staff, as participants (Ferguson, Yonge<br />
and Myrick, 2004; Hammack, 1997). In this paper, we aim to stimulate discussion into some<br />
of the ethical dilemmas faced by ‘insider’ research into assessment. We highlight the<br />
experiences of a CETL assessment for learning team in developing and implementing<br />
research into staff views and experiences of assessment within one particular university<br />
department. Such ethical dilemmas include insider/outsider perspectives, role conflicts,<br />
accessing staff and the process of interviewing as well as the more usually identified ethical<br />
concerns relating to informed consent, anonymity, confidentiality and the right to withdraw.<br />
A key concern for participants became the issue of trust and safeguarding privacy as well<br />
as assuring anonymity. In the paper we reflect on discussions within the team following the<br />
pilot study, and identify actions taken to address some of the dilemmas encountered. We<br />
identify the need to take a critically reflective stance on research into assessment practices<br />
and highlight the way in which minimising power relations and creating an atmosphere of<br />
trust are central if such research is to reach its purpose of enhancing assessment practice.<br />
Reciprocal Peer Coaching as a Formative assessment strategy:<br />
Does it assist students to self-regulate their learning?<br />
Mandy Asghar, Leeds Metropolitan University, United Kingdom<br />
Research has shown that cognitive gains are significantly higher in pairs that work together<br />
when compared to students studying independently (Ladyshewsky 2000, Topping 2005).<br />
Higher achievement, more caring and supportive relationships, greater psychological<br />
health, social competence and self esteem are all valuable consequences of introducing<br />
peer assisted learning strategies into the curriculum. In reciprocal peer coaching, students’<br />
goals are inter-related and the most successful outcome is dependent on mutual coaching.<br />
Reciprocal peer coaching is used as an innovative formative assessment strategy to assess<br />
physiotherapy students’ abilities to carry out the practical skills required<br />
to become a successful therapist. Traditionally these skills were assessed exclusively in a<br />
summative format at repeated points throughout level 1 which resulted in many students<br />
trailing failure throughout the year. Module evaluation provided anecdotal evidence of the<br />
benefits of this change in the assessment strategy. A subsequent qualitative research<br />
project has explored “students’ perceptions of reciprocal peer coaching as a strategy to<br />
formatively assess practical skills”. Individual interviews and focus groups were used to<br />
collect data, which have been analysed from a phenomenological perspective that considers<br />
the lifeworld as a lens through which to view the students’ lived experience (Ashworth 2003).<br />
Initially, four themes emerged from the data: Motivation and Learning, the<br />
Emotional Experience of Learning, Learning Together and Contextualising the Learning<br />
Experience. Although students valued the feedback about their knowledge and abilities from<br />
the formative assessment process, they frequently expressed a willingness to engage with<br />
reciprocal peer coaching, as it provided that “pressure” which made them study. When<br />
considering the theoretical models of self regulation of learning many of the participants<br />
described a view that fitted with this model. Key aspects of this include self-efficacy,<br />
motivation and emotion, the nature of an individual’s goals and the ability to engage in<br />
metacognitive processes. Issues identified included time management and students’<br />
tendency to procrastinate, as they find it hard to set themselves short-term goals, but<br />
they felt that this formative assessment strategy helped them to manage. The variance<br />
between student goals (some seeking mastery goals, others performance goals) and the<br />
emotions of anger, anxiety, frustration, and relief associated with this<br />
assessment were all reported by these level 1 students.<br />
Self regulation in new environments and subject areas can be difficult for students who as<br />
novices often fail to employ metacognitive strategies to set goals for themselves and self<br />
assess their progress, many tending to compare themselves with others in order to judge<br />
the need to learn (Zimmerman 2002). It is suggested that self-regulation is not easy and<br />
requires a scaffolding of strategies to encourage its development (Pintrich 1999).<br />
Although recognised for its valuable role in the provision of feedback (and indeed the<br />
influence this may have on students’ self-efficacy), formative assessment has a role in<br />
assisting students to self-regulate their learning: learning how to learn. It is this<br />
theory related dimension to formative assessment that I would like to discuss.<br />
Using an adapted rank-ordering method to investigate<br />
January versus June awarding standards<br />
Beth Black, Cambridge Assessment, United Kingdom<br />
Aims: The dual aims of this research were (i) to pilot an adapted rank-ordering method and<br />
(ii) to investigate whether UK examination awarding standards diverge between January<br />
and June sessions.<br />
Background: Standard maintaining is of critical importance in UK qualifications, given the<br />
current ‘high stakes’ environment. At qualification level, standard maintaining procedures<br />
are designed to ensure that a grade A in any particular subject in one year is comparable to<br />
a grade A in another year, through establishing the equivalent cut-scores on later versions<br />
of an examination which carry over the performance standards from the earlier version.<br />
However, the current UK method for standard setting and maintaining - the awarding<br />
meeting as mandated by the QCA Code of Practice (2007) - introduces the potential for the<br />
awarding standards of the January and June sessions to become disconnected from each<br />
other. Additionally, from a regulatory and comparability research point of view, the January<br />
sessions have been largely ignored, despite the increasing popularity of entering candidates<br />
for January units since Curriculum 2000.<br />
Given the difficulties in quantifying relevant features of the respective cohorts (e.g. the<br />
January candidature is more unstable), and the problems in meeting the assumptions<br />
necessary for statistical methods (e.g. Schagen and Hutchison, 2008), arguably the best<br />
way to approach this research question is to use a judgemental method focusing on<br />
performance standards. In this study, the chosen method involves expert judges making<br />
comparisons of actual exemplars of student work (‘scripts’).<br />
Method: A rank-order method adapted from Bramley (2005) was employed. Archive scripts<br />
at the key grade boundaries (A and E) from the previous six sessions (comprising three<br />
January and three June sessions) from two AS level units in different subjects were<br />
obtained. Whilst previous rank order exercises (e.g. Black and Bramley, in press) required<br />
judges to rank order ten scripts per pack spanning a range of performance standards, in this<br />
study each exercise involved scripts which were, at least notionally, of more similar quality<br />
(e.g. all exactly E grade borderline scripts) and therefore an adaptation was required. Rank-ordering<br />
sets of three scripts retains many of the advantages of rank-order over traditional<br />
Thurstone paired-comparisons, as well as an additional advantage: it asks of judges a more<br />
natural psychological task, namely to identify, on the basis of a holistic judgement, the best,<br />
middle and worst script.<br />
Analysis: Rasch analysis of the rank order outcomes produced a measure of script quality<br />
for each script and, using an ANOVA, it was possible to examine effects of session and<br />
session type (i.e. January versus June). The research indicated that the two AS units<br />
displayed different patterns of performance standards for January versus June.<br />
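The abstract does not detail the Rasch model fitted to the rank-order outcomes. As a hedged sketch, each judged triple can be expanded into paired comparisons and a Bradley-Terry model, a close relative of the Rasch formulation for paired-comparison data, fitted to yield a quality measure per script; the script identifiers and judgements below are invented.<br />

```python
import math
from collections import defaultdict

# Hedged sketch: each ranked triple (best, middle, worst) is expanded into
# three pairwise "wins", then a Bradley-Terry model is fitted by
# minorise-maximise updates. Script ids and judgements are invented; this is
# not the study's actual analysis code.

def fit_quality(triples, iters=500):
    wins = defaultdict(float)
    n = defaultdict(float)                 # comparisons per unordered pair
    for best, mid, worst in triples:
        for a, b in ((best, mid), (best, worst), (mid, worst)):
            wins[a] += 1.0
            n[frozenset((a, b))] += 1.0
    scripts = sorted({s for t in triples for s in t})
    p = {s: 1.0 for s in scripts}
    for _ in range(iters):
        for s in scripts:
            denom = sum(n[frozenset((s, o))] / (p[s] + p[o])
                        for o in scripts if o != s)
            if denom:
                p[s] = wins[s] / denom
        total = sum(p.values())            # renormalise each sweep
        p = {s: v / total for s, v in p.items()}
    return {s: math.log(v) for s, v in p.items()}   # log-strength as measure

quality = fit_quality([("s1", "s2", "s3"),
                       ("s1", "s3", "s2"),
                       ("s2", "s1", "s3")])
```

The resulting measures per script could then feed an ANOVA on session and session type, as described above.<br />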
Discussion: The discussion will be research-related and practice-related. The paper will<br />
address the potential for using this method to investigate comparability in a variety of<br />
contexts, and implications for standard-maintaining processes.<br />
Generating dialogue in coursework feedback:<br />
exploring the use of interactive coversheets<br />
Sue Bloxham, University of Cumbria, United Kingdom<br />
Liz Campbell, University of Cumbria, United Kingdom<br />
This two-year study examines feedback designed to create a dialogue between tutor and<br />
student without additional work for staff. Research is developing conceptual understanding<br />
regarding how feedback can effectively contribute to student learning (Higgins 2000, Gibbs<br />
& Simpson 2004, Nicol & Macfarlane-Dick 2004, Brown & Glover 2006). Emphasis is<br />
being placed on the notion of feedforward (Hounsell, 2006), designed to reduce the gap<br />
between the standards students are expected to achieve and their current level of<br />
performance (Sadler 1998). Studies have examined the extent to which different types of<br />
tutor feedback better enable students to ‘close the gap’ (Brown & Glover 2006). Evidence<br />
suggests that feedback tends to focus on assignment ‘content’ whereas students find<br />
comments on their ‘skills’ to be more useful for future writing (Walker, 2007). Furthermore,<br />
feedback which elaborates on corrections is rare (Millar, 2007), but is considered more<br />
likely to help students make the link between the feedback and their own work (Brown &<br />
Glover, 2006).<br />
Failure to understand feedback is also associated with the tacit discourses of academic<br />
disciplines (Higgins 2000). However, learning tacit knowledge is an active, shared process,<br />
and thus writers such as Ivanic et al (2000) and Northedge (2003a) stress the importance of<br />
feedback which seeks to engage the student in some form of dialogue. This theoretical<br />
approach suggests that tutor-student dialogue could significantly aid feedback for learning,<br />
enabling students to understand feedback so that they can act on it to ‘reduce the gap’.<br />
This study emerged from concerns that staff on an Outdoor Studies Programme were<br />
devoting inordinate amounts of time to written feedback whilst students were reporting that<br />
they did not receive enough, nor was there evidence that feedback was being used to<br />
improve future assignments. Consequently, staff attempted to set up a dialogue with<br />
students by providing written feedback in response to students’ questions about their work,<br />
requested on their assignment coversheets. In the second year of the experiment, training<br />
was given in asking effective questions.<br />
Data were collected in the form of students’ feedback questions, interviews with staff,<br />
administration of the Assessment Experience Questionnaire (Dunbar-Goddet & Gibbs, 2006)<br />
and a supplementary questionnaire asking students for their preferences for guidance and<br />
feedback. Coding of students’ questions indicated that they differed markedly in the quality<br />
of their questions and rarely posed queries about the ‘content’ of their assignment, being<br />
much more concerned with their ‘skills’. The quantitative data indicated high mean scores<br />
for ‘quantity and quality of feedback’ and ‘use of feedback’ although students gave mixed<br />
preferences for different types of feedback. Staff reported achieving a sense of dialogue,<br />
finding it easier to write feedback in response to specific questions. Evidence of the impact<br />
of ‘question’ training will also be presented.<br />
Discussion will consider whether the research findings support conceptual models regarding<br />
the place of dialogue in creating learning-oriented assessment. It will also consider the<br />
practical implications of attempting to create a dialogue with students, given resource<br />
constraints, tutor and student expectations and quality assurance.<br />
Reforming practice or modifying reforms?<br />
Science teachers’ responses to the MBE and to assessment teaching in Chile<br />
Saul Alejandro Contreras Palma, Chile<br />
In 1996, the biggest education reform in the history of Chile was about to be implemented.<br />
Its aim was to strengthen the teaching profession. Therefore a framework for good teaching<br />
(MBE) was developed, which focused on the need to know what and how to teach, and on<br />
an assessment system establishing the standards for teachers’ work. However, little<br />
is known about the impact that the reform and its tools have had on how teachers think<br />
about teaching and learning, particularly in science subjects. As Smith and Southerland (2007)<br />
put it, a "missing link" has been the investigation and understanding of the interaction<br />
between teachers’ internal structures and the externally imposed ones.<br />
In this context, it makes sense to investigate what teachers think and do, and the relation<br />
between this reality and the one proposed by education reforms. This study examines the<br />
interactions between teachers’ beliefs and their actions concerning what and how to assess,<br />
and the degree of coherence between the teachers’ models and the ones proposed<br />
by the reform. This exploratory study analyzed and compared six Chilean high school<br />
science teachers who all participated in the reform and its training programs (PPF). The<br />
data were obtained through the MBE document, a questionnaire, an interview and non-participant<br />
observation. The information presented here was subjected to a content analysis<br />
centered on the assessment category.<br />
Our results indicate three important issues. First, behind the framework for good teaching<br />
(MBE) lies a constructivist model that indicates what and how teachers should assess or<br />
evaluate their students. Second, unlike the teachers’ thinking, their practice – independently<br />
of their subject – is traditional and inconsistent with the proposals of the reform. Third, the<br />
teachers’ thinking is organized in several levels, which differ from each other. There is a<br />
difference between what teachers "think they do", what they "think should be done" and what<br />
they "say they do". This difference is consistent with the fact that the teachers did not<br />
implement the reform in their classes despite having agreed to and participated in the<br />
reform. Therefore, teachers’ thinking influences the interpretation of the – sometimes<br />
contradictory – messages of the proposals of the reform.<br />
In consequence, what we are discussing is not the reform and its tools, because we<br />
recognize that they have led to an advance by introducing an assessment culture and the<br />
concept of measurement. However, we believe it would have been much more efficient to put<br />
the phases of the change process in another order: first, to explore what teachers<br />
know and what they are able to do, and then to set standards that determine the<br />
professional knowledge to be obtained. In other words, it is necessary to determine what<br />
and how certain aspects of the teachers’ model support or impede the implementation of<br />
reforms, their instruments and the professional development of teachers, and to work on<br />
these aspects to achieve a real impact.<br />
Expanding student involvement in Assessment for Learning:<br />
A multimodal approach<br />
Bronwen Cowie, Alister Jones, Judy Moreland, Kathrin Otrel-Cass<br />
University of Waikato, New Zealand<br />
Assessment for learning (AfL) encompasses actions to assess student learning and steps to<br />
move that learning forward (Black and Wiliam, 1998). These actions can be undertaken by<br />
teachers or students, but the ultimate goal is that students are actively engaged in<br />
monitoring their learning. AfL was initially theorized within a cognitive frame. In the last<br />
decade research has begun to consider the implications of sociocultural views of learning<br />
(Gipps, 2002). These expand the possibilities for student participation in AfL. In this paper<br />
we use data generated in primary science and technology classrooms to illuminate the<br />
affordances of multimodal assessment practices.<br />
This paper reports on one outcome of the Classroom Interaction in Science and Technology<br />
Education (InSiTE) study. The study involved 12 primary teachers and over 800 students<br />
over three years. A key goal was to understand AfL interactions around science and<br />
technology ideas and practices and the factors that afforded these interactions. Student and<br />
teacher reflective interviews, and teacher and researcher joint planning, reflection and data<br />
analysis meetings complemented the classroom work. The classroom data generation<br />
methods were videos of teacher interactions; audio taping of teacher and student talk; field<br />
notes; and the collection of teacher documents and student work. Post lesson discussions<br />
and meeting days provided a forum for data analysis.<br />
The InSiTE teachers and students employed multiple and multimodal means (Kress et al.,<br />
2001) to make and communicate meaning. Their interactions encompassed talk, text, action<br />
and the visual mode. The teachers explicitly developed students’ oral language proficiency<br />
but almost invariably talk was augmented by action, writing and the visual mode. Written<br />
text anchored and augmented talk. Drawing was useful for illustrating and complementing<br />
talk when students could not express tentative ideas by talk alone. Actions, including<br />
gesture, were useful for demonstrating and illustrating skills and practices. A combination of<br />
modes multiplied meaning (Lemke, 1990). In the full paper we will present examples from<br />
Year 1-3 students learning about fossils, Year 1 students designing and making kites, and<br />
Year 7-8 students designing and making musical instruments.<br />
Teacher and student talk plays a pivotal role in AfL interaction but talk is invariably<br />
anchored and augmented by other modes. When multiple modes are used in combination<br />
teacher-student AfL interactions are enriched. The likelihood that diverse groups of students<br />
are able to express what they know and can do is increased when classroom interaction is<br />
deliberately multimodal. Students do benefit from multimodal opportunities to gain feedback<br />
and to consider the ideas of others as part of their active engagement in AfL.<br />
References<br />
Black, P. & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7-74.<br />
Gipps, C. (2002). Sociocultural Perspectives on Assessment. In G. Wells & G. Claxton (Eds.) Learning for<br />
Life in the 21st Century (pp. 73-83). London: Blackwell Publishing Ltd.<br />
Kress, G., Jewitt, C., Ogborn, J. & Tsatsarelius, C. (2001). Multimodal teaching and learning: The rhetorics<br />
of the science classroom. London: Continuum.<br />
Lemke, J. (1990). Talking science: language, learning, and values. Norwood, N.J.: Ablex Pub.<br />
Assessment Center Method to Evaluate Practice-Related<br />
University Courses<br />
Julian Ebert, University of Zurich, Switzerland<br />
Introduction: To provide high-quality education at universities, courses are evaluated. Mostly,<br />
written tests and subjective ratings are used to check the learning effectiveness of the course<br />
and students’ satisfaction. This disadvantages courses with didactic concepts that are based<br />
upon activity and interaction (e.g. problem-based learning; Schmidt & Moust, 2000) because<br />
teaching and testing modes differ (Sternberg, 1994). Declarative and factual<br />
knowledge can be tested by written tests, but procedural knowledge and skills require different<br />
assessment methods. Due to changes in curricula as a result of the “Bologna reform<br />
process”, universities are increasingly challenged to foster their students’ meta-disciplinary<br />
competencies. Therefore, courses that aim not only to teach factual knowledge but also to<br />
train skills have to be offered and evaluated for their effectiveness. To evaluate one<br />
such practice-related university course on social and methodological skills in<br />
project management given at the University of Zurich, an assessment center (AC) was<br />
developed, applied and analysed. The aim of the current paper is to present and discuss this<br />
innovative evaluation method for practice-related courses at universities.<br />
Methodology: 80 students and professionals from different subjects participated twice in one-day<br />
ACs, before and after course attendance. The assessment methods range from<br />
dyadic role-plays and planning and presentation tasks (for the evaluation of skill<br />
improvement) to written tests on procedural knowledge (to check the concordance of both<br />
measures). Additionally, questionnaires on motivation and work-related self-assessments<br />
are used. The assessors are specifically trained psychologists who<br />
achieved satisfactory inter-rater reliability (ICC = .75 on average). The assessees are – among<br />
others – tested on their abilities to delegate, lead discussions and argue, organize and<br />
present project plans, solve conflicts, mediate, and give feedback.<br />
Results: Results show significant increases in both procedural knowledge and skills. The<br />
baseline-corrected effect sizes for the subtasks of the procedural knowledge tests vary from<br />
d=.45 to d=2.07 (d=1.10 on average), those for the different assessment dimensions of the<br />
skill tests vary from d=.14 to d=1.32 (d=.57 on average). Nevertheless, increased<br />
knowledge did not automatically result in increased skills, which supports the transfer<br />
problem hypothesis. It also raises anew the question of whether written tests (even on procedural<br />
knowledge) sufficiently inform about the actual ability to perform the respective skills. Also,<br />
almost no gender effects were found, i.e. male and female participants benefit equally<br />
from the course.<br />
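The reported baseline-corrected effect sizes can be illustrated with a short sketch. This is not the authors' analysis code; the scores and the choice of standardizer (the pre-test standard deviation, one common convention for pre/post designs) are illustrative assumptions only.

```python
# Illustrative sketch: a baseline-corrected effect size d for a pre/post
# design, computed as the mean pre-to-post gain divided by the pre-test
# (baseline) sample standard deviation. The scores below are hypothetical.
from statistics import mean, stdev

def baseline_corrected_d(pre, post):
    """Mean gain from pre to post, standardized by the pre-test SD."""
    return (mean(post) - mean(pre)) / stdev(pre)

# Hypothetical scores for one procedural-knowledge subtask
pre = [12, 15, 11, 14, 13, 10, 16, 12]
post = [18, 19, 15, 20, 17, 14, 21, 16]
print(round(baseline_corrected_d(pre, post), 2))
```

Other standardizers (e.g. the pooled pre/post SD) are also in use and would shift the magnitudes somewhat.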
Discussion: The findings indicate that different didactical concepts and teaching methods<br />
require different effectiveness testing, and assessment centers seem appropriate to<br />
demonstrate the effectiveness of skill-focused versus factual knowledge-focused courses.<br />
Students highly appreciated the detailed individual feedback that they received afterwards.<br />
We consider the opportunity to provide students with feedback beyond scores and grades<br />
the most important advantage compared to usual assessments at universities.<br />
Nevertheless, assessment centers are complex, time-consuming and expensive<br />
assessment methods and therefore not (yet) established in the field of course evaluation.<br />
We are interested in sharing our experiences with others who also work on innovative<br />
evaluation methods for innovative courses.<br />
Democracy, Assessment and Validity. Discourses and practices concerning<br />
evaluation and assessment in an era of accountability<br />
Astrid Birgitte Eggen, University of Oslo, Norway<br />
The paper is an empirically based discussion of the relationship between multiple<br />
understandings of democracy and the multiple purposes and practices of assessment.<br />
Assessment is seen as an aspect of the overall evaluation processes at school and municipal<br />
levels. The conceptualization is inspired by three broad democratic evaluation orientations –<br />
elitist democratic evaluation, participatory democratic evaluation and discursive democratic<br />
evaluation – as well as four dimensions of democracy (agency, voice, audience and<br />
influence). Underpinning the discussions are the various validity concerns of the democratic<br />
orientations, emphasizing in particular consequential, communicative, reflective and<br />
catalytic validity in addition to the traditional validities.<br />
This paper presents the main results of three ethnographic research projects among<br />
teachers and school leaders in secondary education concerning assessment and evaluation<br />
practices and discourses. These projects have been developed in cooperation with three<br />
municipal educational authorities and 20 school communities. The schools in the<br />
surroundings of Oslo have been participating in R&D projects during a phase of<br />
implementing “Kunnskapsløftet” and the “National evaluation program” (National curriculum<br />
(2006) combined with National strategy for assessment and evaluation) with possibilities for<br />
both summative and formative strategies. A consequence of national steering has been a<br />
call for building assessment and evaluation literacy at both municipal and school level.<br />
A critical ethnographic research methodology, based on an emancipative, developmental and<br />
progressive ideology, signals research as a democratic enterprise. Data gathering has<br />
been twinned with in-service training of teachers and school leaders. Hence methods of<br />
instruction are closely connected to methods of inquiry. Issues grounded in both practice<br />
and theory have been accountability, democracy and ideological aspects like equity, equality,<br />
justice, values and ethics. The paper focuses on the democratic challenges of evaluation<br />
and assessment in an era of market-driven accountability; however, multiple accountabilities<br />
as well as multiple contents of democracy are identified in the participating communities.<br />
A situated learning perspective has been applied in order to view evaluation and<br />
assessment as joint enterprises depending on shared vocabulary and repertoire of<br />
assessment and evaluative tools in each community of practice. Consequently, the<br />
relevance of the traditional dichotomies of evaluation and assessment (summative and<br />
formative, internal and external etc) is questioned based on the findings, and boundary<br />
objects are introduced as an alternative analytical tool. The school communities find<br />
themselves within an overall ideological and epistemological controversy between a drive<br />
for goal oriented new public management steering combined with “evidence based”<br />
practices on one hand, and on the other hand the emancipative bottom up developmental<br />
strategies. Hence the projects point towards several tensions between the central and<br />
local governmental vocabulary and strategies for outcome measures and the discourses<br />
and practices in these schools. These projects have provided documentation for the<br />
development of the methodology and content of a program for teacher educators<br />
emphasizing assessment and evaluation literacy.<br />
Using Assessment for Learning:<br />
exploring student learning experiences in a design studio module<br />
Kerry Harman, Northumbria University, United Kingdom<br />
Erik Bohemia, Northumbria University, United Kingdom<br />
This paper explores the relationships between assessment for learning elements and<br />
student learning experiences in a design studio module. Our focus is on the Global Studio<br />
(Bohemia & Harman, 2008 forthcoming), a design module recently conducted at<br />
Northumbria University. Using a case study methodology with the aim of compiling rich,<br />
practice-based knowledges (Denzin & Lincoln, 2005; Gherardi, 2006), we draw on data<br />
gathered throughout the development and delivery of a particular design studio module in<br />
order to undertake our analysis.<br />
In the first part of the paper we briefly describe the Global Studio with a focus on the overall<br />
aims and the structure of the module. One aim of the Global Studio was the development of<br />
distance communication skills, thereby preparing students for work in geographically<br />
distributed workgroups. Thus, an important aspect of the course was the incorporation of<br />
the element of distance between geographically distributed student design teams. We also<br />
outline the assessment for learning elements that we use in our analysis. These include an<br />
emphasis on authentic assessment tasks, the extensive use of ‘low stakes’ confidence<br />
building opportunities, the provision of a learning environment that is rich in both formal and<br />
informal feedback and the development of students’ abilities to evaluate their own progress<br />
(McDowell et al., 2006; Sambell, Gibson, & Montgomery, 2007).<br />
Using the above assessment for learning framework we map various assessment for<br />
learning elements used in the Global Studio. We suggest in the first section of the paper<br />
that a number of assessment for learning elements were implicitly embedded in the<br />
structure and delivery of this particular module.<br />
In the second part of the paper we explore the relationships between assessment for<br />
learning elements (implicitly) used in the module and student learning experiences. Drawing<br />
on student evaluation data, both qualitative and quantitative, collected throughout the<br />
module, we examine the learning experiences of students undertaking the module. Our<br />
focus here is on what students considered useful in the module in terms of learning and the<br />
links with assessment for learning elements. This analysis contributes to the collection of<br />
‘rich’ case study material on assessment for learning in Higher Education with a focus on<br />
the subject area of design.<br />
We conclude that the assessment for learning elements used in the analysis provided a<br />
useful frame for examining student learning experiences in this particular design studio<br />
module. Therefore, we suggest that assessment for learning may provide a useful language<br />
for developing ongoing discussion and research in relation to teaching and learning in the<br />
subject area of design. For example, the following research questions might be explored:<br />
does the design studio, in general, incorporate assessment for learning elements? And if<br />
so, how are these contributing to enhanced student learning experiences?<br />
Chasing Validity – The Reality of Teacher Summative Assessments<br />
Christine Harrison, Paul Black, Jeremy Hodgen, Bethan Marshall, Natasha Serret<br />
King's College London, United Kingdom<br />
The King’s-Oxfordshire-Summative-Assessment-Project (KOSAP) was an 18-month project<br />
that aimed to investigate how to help teachers enhance the validity and reliability of their<br />
assessments so that these can play a significant and trustworthy part in all summative<br />
assessments of their students. This was a collaborative development between teachers in<br />
three Oxfordshire schools, their schools’ managements, assessment and subject advisers<br />
in the Local Education Authorities, and experts in school assessment from King’s College<br />
London. It involved investigation of the possibilities and practicability of assessment of year<br />
8 (Y8) pupils within the domains of English and of mathematics. Key research foci were the<br />
constraints and affordances that arise as teachers take a more active part in designing,<br />
using and evaluating summative assessment tools and activities and the subsequent effects<br />
in the classroom as the formative-summative interface is brought closer together.<br />
The intention of this mixed method research (through interviews, field notes, teacher writing<br />
and transcripts from teacher meetings) was to work with teachers to discover their working<br />
assessment practices, how they judged and valued the assessment tools that they used<br />
and whether they could be supported in improving their assessment tools and practices.<br />
Our interest lay in teachers’ perceptions, skills and practices and we wanted to do more<br />
than simply evaluate whether teachers could implement assessments that were provided for<br />
them. Rather, we wanted to understand the ways in which they interlaced assessment with<br />
curriculum and pedagogy by allowing them to explore for themselves how they might<br />
develop and evolve better teacher assessment.<br />
The research revealed how the pervasiveness of tests constrained teachers’ own<br />
summative assessments. Teachers felt the pressures of the external tests system, in part<br />
through the priority given to the published test results by school managements and by<br />
parents and pupils, both to achieve in these tests and to report on learning in<br />
terms only of a single level or grade. We also found that the teachers’ grasp and application<br />
of the principles that guide quality in assessment, notably the concept of validity, seemed<br />
weak. Linked to this was the generally conservative attitude that the teachers have towards<br />
the task of making summative judgments, and this was recognised by the project teachers.<br />
There is a general acceptance of the tests and tasks that they already do, despite their<br />
concerns that these assessment tools may not be fair, valid or reliable in measuring the<br />
capabilities of their students. We therefore conclude that the more ambitious aim, of<br />
establishing the quality of teachers’ own summative assessments so that they may claim to<br />
supplement or even replace formal tests externally set and marked (ARG 2006), will be<br />
difficult to achieve without considerable professional development. We suggest it would take<br />
several years of modest steps towards such an aim, before that aim could be approached<br />
or a new system could be designed and implemented to meet the multiple purposes of<br />
public assessment.<br />
There is a bigger story behind.<br />
An analysis of mark average variation across Programmes<br />
Anton Havnes, University of Bergen, Norway<br />
In a UK university some undergraduate programmes have been consistently above the<br />
University average, others consistently below. Preliminary analyses have controlled for level<br />
entry grades, gender, group size, the assessment weighting on modules between<br />
coursework and exams, and assessment forms (coursework vs. exam). None of these<br />
factors explain the variation in average marks. Students who take a combined degree with<br />
one Field in the high mean group (HM) and another in the low mean group (LM) on average<br />
get higher marks in their HM modules than in their LM modules. One possible explanation is<br />
that the variation is due to diverse assessment and marking cultures. This project took<br />
another potential explanation as the starting point: Are there variations between<br />
Programmes in the way coursework, formative assessment and feedback is organised that<br />
make it reasonable to expect that students in the HM Fields probably will reach the<br />
standards of their Field, while it is less likely that students in the LM Fields do? If so, it is<br />
also reasonable to expect that the HM Fields should have higher average marks than the<br />
LM Fields. Also, there should be something to learn from the HM Fields that shed light on<br />
potentials for improvement of the educational programmes across the whole University.<br />
Because of the sensitivity of this issue I was engaged to do this study as an external,<br />
non-UK researcher. Four categories of data were obtained:<br />
• documents: Study Guides, Module Descriptions, Coursework tasks<br />
• semi-structured interviews with Field Chairs, representing two HM and two LM Fields,<br />
the fifth Field chair represented a Field that has risen from LM to mean (taped,<br />
transcribed and analysed)<br />
• examples of written feedback on students’ coursework<br />
• mark transcripts for each Field and each module in each Field<br />
The recruitment of students was unfortunately not successful. The main restriction was the<br />
Data Protection Act, which prevented obtaining contact details for those students<br />
who had not already agreed to be contacted by a researcher.<br />
The analysis shows that the teachers in all Programmes comply with the University<br />
assessment regime and the guidelines for marking and feedback. It is hard to identify<br />
essential differences in the assessment cultures; instead, it seems that one<br />
assessment culture dominates across the University. The analysis points to a series of<br />
contextual and conceptual factors that vary systematically between HM and LM Fields:<br />
• the consistency of the conceptual construct that students’ learning is about (the core<br />
around which students’ learning rotates throughout the whole degree programme),<br />
represented by the consistency of what teaching, assessment and feedback relate to<br />
across modules.<br />
• the relationship between learning activities and the students’ potential future professional<br />
and/or academic practice<br />
• the way the complexity of the Field (at the Programme level) and its thematic<br />
components (at the module level) are laid out as a Field and as a trajectory of learning<br />
across modules<br />
• inter-modular planning, coordination and communication<br />
• the integration of feedback in lectures, seminars and coursework.<br />
Course design and the Law of Unintended Consequences:<br />
Reflections on an assessment regime in a UK “new” University<br />
Anton Havnes, University of Bergen, Norway<br />
Assessment is known to drive learning. The attempt to improve students’ learning has led to<br />
revisions of student assessment to institute more diverse and more learning-oriented<br />
assessment strategies. In many universities in the UK and elsewhere coursework has become<br />
the most common assessment form. Coursework assessment offers the opportunity to<br />
ensure diversity and frequent feedback. In a study of three UK universities, Gibbs (2007)<br />
found that the assessment environments differed “very widely” across institutions. The<br />
variation was particularly large in the volume of formative-only assessment. Butler (1987)<br />
has documented that formative-only feedback has a significant influence on learning, in<br />
contrast to marks-only feedback. The importance of increasing and improving formative<br />
feedback to support student learning is stressed in policy documents and research (e.g.<br />
Hattie & Timperley, 2007; Nicol & Macfarlane-Dick, 2006).<br />
This paper is based on a study of coursework, marking, formative assessment and<br />
feedback practice in a UK University. 15 teachers (five Field Chairs and 10 lecturers) in five<br />
undergraduate programmes were interviewed, Institutional Guidelines, Study Guides,<br />
Module Descriptions and Coursework Tasks were collected and analysed. Interviews were<br />
transcribed and analysed to identify how assessment and feedback supported students’<br />
learning. Findings show that in spite of the fact that markers invested a vast amount of<br />
resources in writing feedback to the students, only a small number of students actually<br />
collected their feedback. The analysis explains why this neglect of feedback by the students<br />
turns out to be a regrettable but rational response to the assessment system. Firstly,<br />
assessment was predominantly associated with marking and mark justification: “Done that.”<br />
Secondly, what was assessed in one module often did not link to what was assessed in<br />
another module. Likewise, coursework would often cover different thematic fields, be<br />
assessed in relation to different criteria, and vary in mode of assessment. This triple<br />
variation made feedback of minor interest (except for students who had to re-sit and the<br />
most engaged students) and created inconsistency in students’ learning trajectory: “The<br />
next assessment task is on something different, some other criteria and you have to<br />
perform in a different way.” The fundamental problem is not any individual teacher’s<br />
assessment, the assessment of a specific achievement or any feedback given to an<br />
achievement. Instead the problem is how the various teachers’ assessments interrelate<br />
thematically and rotate around a set of core criteria. Another problem is the marking:<br />
lecturers argued that they could not give formative assessment on a piece of work that was<br />
subject to summative assessment.<br />
These and other findings will be discussed in the perspective of research on formative<br />
assessment, the use of criteria and the influence of feedback on learning. The findings –<br />
though the assessment system was expected to increase formative feedback – will also be<br />
discussed in the perspective of “Murphy’s law of unintended consequences”: goal-oriented<br />
activities will generate unexpected and often counterproductive results that can nullify the<br />
desired outcomes, or, things will go wrong in any given situation, if you give them a chance.<br />
Reliability and validity of the assessment of web-based video portfolios:<br />
Consequences for teacher education<br />
Mark Hoeksma, Judith Janssen, Wilfried Admiraal<br />
ILO Graduate School of Teaching and Learning, University of Amsterdam, The Netherlands<br />
In this paper, we will evaluate the quality of using web-based video portfolio for the<br />
assessment of competences in teacher training. The web-based video portfolio has been<br />
designed and tested in the DiViDossier-project, which was financed by the National eLearning<br />
Programme of SURF, the Dutch foundation for ICT in higher education. Since 2003, our<br />
institute has been using an electronic portfolio system in order to provide a realistic portrait of<br />
a student's abilities, offer an opportunity for a student’s self-reflection, and communicate a<br />
student's performance to others. These portfolios contained written documents, for example<br />
reflections about one’s own behaviour and classroom situations. A major advantage of a<br />
video portfolio is that students are able to demonstrate their competences in authentic<br />
professional situations. Therefore, the system of web-based video portfolio is expected to<br />
improve the quality of assessment in teacher education: compared to written texts, videos can<br />
give a realistic, or a more valid, view of teaching competences.<br />
Two characteristics of our video portfolio system:<br />
• The video portfolio includes video-narratives in which teacher trainees can demonstrate<br />
both integrated competences and their growth in these competences.<br />
• The video portfolio includes reflections and narratives demonstrating knowledge of<br />
methodological and pedagogical approaches.<br />
The use of a video portfolio has serious consequences for our procedures of assessment. A<br />
list of fairly ‘open’ criteria is used by two teacher educators, the students’ supervisor and an<br />
uninvolved teacher educator. They assess the professional competences of student<br />
teachers and the congruency between their performance on video and their reflections.<br />
In this study, our main question is how to enhance the quality of our assessment procedure<br />
for web-based video portfolios, in terms of reliability and validity. We will gather the following<br />
data:<br />
1. Individual interviews with 10 teacher educators on their assessment procedures of video<br />
portfolios<br />
2. Individual think-aloud interviews with 10 teacher educators on their assessment of a<br />
particular video portfolio<br />
3. Assessment forms with the assessments of 40 portfolios (80 forms, 2 per portfolio)<br />
The first set of data will result in general information on assessment procedures, evaluation<br />
of the criteria used, and justifications of the assessments. The second and the third set of<br />
data will be used for the analysis of reliability and validity. The reliability will be reported as<br />
Cohen’s kappa; the validity will be analysed by using qualitative research methods. The<br />
validity will be examined by investigating whether the assessors are biased in their<br />
assessment of the performance or reflections (cf. Heller, Sheingold, & Myford, 1998). In<br />
addition, we will examine the consistency between their beliefs and practice of assessment.<br />
References<br />
Heller, J. I., Sheingold, K., & Myford, C. M. (1998). Reasoning about evidence in portfolios: cognitive<br />
foundations for valid and reliable assessment. Educational Assessment, 5, 5-40.<br />
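The inter-rater agreement analysis proposed above, with two assessors per portfolio, can be sketched as follows. This is a minimal illustration of how Cohen's kappa is computed; the two rating vectors are hypothetical examples, not data from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    # Observed agreement: proportion of portfolios rated identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under chance, from each rater's marginal distribution.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical judgements of 10 portfolios by a supervisor and a second,
# uninvolved teacher educator.
supervisor = ["pass", "pass", "fail", "pass", "fail",
              "pass", "pass", "fail", "pass", "pass"]
second     = ["pass", "pass", "fail", "fail", "fail",
              "pass", "pass", "pass", "pass", "pass"]
print(round(cohens_kappa(supervisor, second), 3))  # → 0.524
```

A kappa near 0.5 would indicate moderate agreement beyond chance; the study's 80 assessment forms (2 per portfolio) would supply the real rating pairs.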
Diversity in patterns of assessment across a university<br />
Jenny Hounsell, University of Edinburgh, United Kingdom<br />
Dai Hounsell, University of Edinburgh, United Kingdom<br />
Over the last quarter-century, there has been a far-reaching transformation in the practices<br />
and processes of assessment in higher education. What was once a rather limited diet of<br />
essays, reports and exams has undergone a remarkable diversification and today's<br />
university teachers have before them an abundance of possible ways of assessing their<br />
students' progress and performance. Keeping track of these changes has proved far from<br />
easy: assessments need to be tailored not only to subject requirements but also to level of<br />
study, degree programme aims and the learning outcomes for a given course unit or<br />
module. This in turn means that within universities, responsibilities for designing and<br />
conducting assessments are in various respects devolved to departments. While there have<br />
been some surveys of changes and developments in assessment, the mapping that has<br />
been done has been mostly global rather than localised, and has tended to focus on<br />
changes that have been considered worthy of documenting in the literature (Bryan and<br />
Clegg, 2006; Hounsell et al., 2007; James et al., 2002).<br />
This paper reports the findings of a study which was distinctive in its attempt to survey<br />
undergraduate assessment methods and weightings across a large and long-established<br />
university in which subject areas enjoyed a considerable degree of autonomy in devising<br />
patterns of assessment. It draws on data that has recently become much more readily<br />
available following the introduction of a new computerised database on degree programmes<br />
and course units and through departmental websites. It focuses on two aspects of current<br />
practices: methods of assessment, and weighting of examinations and coursework. A total<br />
of 91 methods of assessment (68 types of coursework and 23 kinds of exam) were found to<br />
be in use across 20 subject areas, while the total number of methods deployed within a<br />
subject area ranged from 10 to 48.<br />
The choice of methods was to a significant extent a function of subject area: a small number<br />
of assessment methods was found across the subject range, while a much larger number<br />
were confined to a limited number of departments. There were also striking differences in<br />
how assessments were weighted across subject areas and over successive years of<br />
undergraduate study. Four contrasting models were identified, differing in terms of whether<br />
weightings were uniform or variable from unit to unit and from one year to the next, and in<br />
the extent to which coursework or exams predominated.<br />
The paper concludes by exploring the implications of these findings, both for assessment<br />
practice within the university concerned and in higher education more generally.<br />
References<br />
Bryan, C. and Clegg, K. (eds.) (2006) Innovative Assessment in Higher Education. London: Routledge.<br />
Hounsell, D., Blair, S., Falchikov, N., Hounsell, J., Huxham, M., Klampfleitner, M. and Thomson, K. (2007)<br />
Innovative Assessment Across the Disciplines: An Analytical Review of the Literature. York: Higher<br />
Education Academy<br />
James, R., McInnis, C. and Devlin, M. (2002) Assessing Learning in Australian Universities. Melbourne:<br />
University of Melbourne.<br />
Learning-oriented assessment: A critical review of foundational research<br />
Gordon Joughin, University of Wollongong, Australia<br />
The concept of ‘learning-oriented assessment’ draws attention to a range of conceptual,<br />
research and practice issues concerning the relationship between assessment and the<br />
process of learning in higher education. The conceptual framework for learning-oriented<br />
assessment proposed by Carless, Fun and Joughin (2006) provides a convenient<br />
framework for highlighting these issues. In this paper, one of the authors of that framework<br />
draws attention to, and challenges, two propositions that have become maxims in the<br />
literature of assessment and learning, namely that assessment drives learning and that<br />
feedback through formative assessment is critical to the learning process. A careful review<br />
of repeatedly cited research casts doubt on the first of these propositions. For example, the<br />
treatment of the research reported in frequently cited works such as Making the Grade<br />
(Becker, Geer, & Hughes, 1968) and The Hidden Curriculum (Snyder, 1971) has often<br />
oversimplified, and thus misrepresented, the research findings, leading to singular<br />
interpretations of complex, multi-faceted phenomena. Other research suggesting serious<br />
limitations to the capacity of assessment per se to improve students’ approaches to learning<br />
is often under-emphasized, leading to the risk of exaggerated claims for the capacity of<br />
‘alternative’ forms of assessment to foster effective learning processes in students. Finally,<br />
research on students’ experience of assessment contrasts with the prominence accorded to<br />
feedback in learning and assessment theory, highlighting a worrying gap between theory<br />
and practice.<br />
This paper provides a critical review of the empirical research basis of the above<br />
propositions regarding the roles of assessment and feedback in directing and forming<br />
students’ learning. On the basis of this review, the paper proposes an empirical research<br />
agenda that addresses what seem to be serious gaps in our understanding of fundamental<br />
aspects of the interactions between assessment and learning.<br />
Implementing standards-based assessment in Universities:<br />
Issues, Concerns and Recommendations<br />
Patrick Lai, The Hong Kong Polytechnic University, China<br />
Several studies have sought to identify how standards-based assessment is implemented in<br />
practice. Tan & Prosser (2004) conducted a phenomenographic study of academics’ conceptions<br />
of grade descriptors. This study illustrates that academic staff understand grade descriptors in<br />
markedly different ways, ranging from conceptualizing the descriptors as having nothing to<br />
do with standards to understanding them as directly related to standards. Sadler<br />
(2005) conducted another study to examine the grading practices of universities. None of the<br />
approaches identified delivered the aspirations of standards-based assessment; there is,<br />
however, a need to shift the focus from criteria to standards. Whilst these two representative<br />
studies emphasize only the final grading step, there is a need for a more thorough<br />
investigation into each of the implementation steps of standards-based assessment.<br />
A series of focus group interviews with 51 academic staff from 21 departments and two<br />
open forums were conducted. Participants were invited to comment on the issues and<br />
problems encountered at the preparation, marking and post-marking stages of<br />
standards-based assessment. This paper summarizes the issues and concerns identified in the<br />
discussions with various stakeholder groups.<br />
In developing criteria and performance standards for assessment tasks, often the<br />
assessment task is selected first and then matched to the learning outcomes. It is also difficult to<br />
set clear criteria that can be understood easily by assessors and students and that will<br />
discriminate effectively between stronger and weaker students. Setting up and<br />
grading students’ work based on a matrix, with descriptors for each criterion at different<br />
performance levels, is tedious and can become unmanageable.<br />
In making assessment criteria and standards explicit to assessors and students, the key<br />
concern is that colleagues are unclear about the depth of detail students require.<br />
If colleagues give too many detailed examples of the expected responses for different<br />
award levels it may become too much like a model answer and will not help students<br />
develop independent study habits.<br />
In ensuring consistency in marking and grading, the concern expressed by staff is the<br />
need to maintain consistency when there are multiple markers. When there is a large<br />
number of assignments to be marked, the marker’s perceptions of<br />
the criteria and standards may “drift” from start to finish.<br />
Finally, there are two issues that can affect consistency in the development of standards.<br />
One concerns the different perceptions of minimum passing standards held by<br />
academic colleagues. The other is the commonly-held perception that a skewed distribution of<br />
students’ grades in a particular subject is abnormal and should be “normalized”.<br />
Based on a series of feedback-collecting exercises, this paper will also present a number of<br />
strategies for setting assessment tasks, marking and post-marking mechanisms that can be<br />
utilized to address the concerns expressed by university academic staff about their<br />
endeavours to implement standards-based assessment. Recommendations on strategies<br />
and support to facilitate academics to implement standards-based assessment made in this<br />
paper certainly add to the literature in higher education.<br />
Test-based School Reform and the Quality of Performance Feedback:<br />
A comparative study of the relationship between mandatory testing policies and<br />
teacher perspectives in two German states<br />
Uwe Maier, University of Education Schwäbisch Gmünd, Germany<br />
Comparative research on test-based school reform has revealed that the positive impact of<br />
performance feedback information on school improvement depends on the accountability<br />
policy and testing system in the respective jurisdiction (Firestone, Winter & Fitz 2000;<br />
Cheng & Curtis 2004; Herman 2004). Test-based school reform has meanwhile become a<br />
prominent instrument of educational policy in Germany. But state-mandated testing systems<br />
vary from jurisdiction to jurisdiction, since the German federal constitution guarantees state<br />
autonomy in educational policy. The testing systems in the two German states of<br />
Baden-Württemberg and Thüringen, in particular, contrast sharply. State-mandated tests in<br />
Baden-Württemberg (Vergleichsarbeiten) are not based on competency models, provide little<br />
feedback information (raw data only), and leave teachers responsible for data analysis.<br />
State-mandated tests in Thüringen (Kompetenztests) are based on competency models,<br />
their performance feedback includes value-added data, and external support is high. Both<br />
obligatory tests were given at the end of Grade 6 in core subjects, including German<br />
language and mathematics.<br />
The hypothesis was that the elaborated, value-added feedback information in Thüringen<br />
would be better accepted by schools and more likely to prompt teachers to reflect upon<br />
professional improvement. Random samples of schools in both states were approached for data<br />
collection. A total of 1136 teachers completed the questionnaire (nBW=825; nThü=311).<br />
Measurement of the dependent variables is based on a quantitative survey instrument. An<br />
exploratory factor analysis revealed seven scales:<br />
• general acceptance of mandatory testing (6 items, alpha = .89)<br />
• mandatory testing as a burden for schools (4 items, alpha = .79)<br />
• curricular alignment of the test (4 items, alpha = .84)<br />
• performance feedback supports diagnostic activities (5 items, alpha = .89)<br />
• performance feedback supports grading (5 items, alpha = .80)<br />
• performance feedback indicates further revision (3 items, alpha = .90)<br />
• performance feedback indicates curricular changes (4 items, alpha = .77)<br />
The hypothesis<br />
proved to be correct. General test acceptance and the use of performance indicators for<br />
diagnostic activities and reflection upon teaching were substantially higher among teachers<br />
in Thüringen. By contrast, teachers in Baden-Württemberg show higher average scores on<br />
the scale "performance feedback supports grading". The results confirm that sound<br />
testing policies are a crucial precondition for standards-based school reforms.<br />
References<br />
Cheng, L./Curtis, A. (2004): Washback or Backwash: A Review of the Impact of Testing on Teaching and<br />
Learning. In: Cheng, L./Watanabe, Y./Curtis, A. (Eds.): Washback in Language Testing. Research<br />
Contexts and Methods. Mahwah/London: Lawrence Erlbaum, pp. 3-17.<br />
Firestone, W. A./Winter, J./Fitz, J. (2000): Different assessments, common practice? Mathematics testing and<br />
teaching in the USA and England and Wales. In: Assessment in Education, 7, 2000, 1, pp. 13-37.<br />
Herman, J. L. (2004): The Effects of Testing on Instruction. In: Fuhrman, S.H./Elmore, R.F. (Eds.):<br />
Redesigning Accountability Systems for Education. New York/London: Teachers College Press, pp.<br />
141-166.<br />
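The scale reliabilities reported above (Cronbach's alpha between .77 and .90) are computed from per-item survey responses; a minimal sketch of that computation follows. The Likert data here are a hypothetical illustration, not the survey data.

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for a scale, given one list of scores per item
    (all lists cover the same respondents, in the same order)."""
    k = len(item_scores)          # number of items in the scale
    n = len(item_scores[0])       # number of respondents

    def sample_variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

    # Variance of each item, and of each respondent's total score.
    item_variances = [sample_variance(scores) for scores in item_scores]
    totals = [sum(item[j] for item in item_scores) for j in range(n)]
    return (k / (k - 1)) * (1 - sum(item_variances) / sample_variance(totals))

# Hypothetical 4-item scale (1-5 Likert) answered by 5 teachers.
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 1],
    [4, 5, 3, 4, 2],
]
print(round(cronbach_alpha(items), 2))  # → 0.95
```

High alpha values like those reported above indicate that the items within each scale co-vary strongly across respondents.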
Motivational aspects of complex item formats<br />
Thomas Martens, Frank Goldhammer<br />
German Institute for International Educational Research (DIPF), Germany<br />
Some benefits of Computer-Based Assessments (CBAs) are generally accepted, e.g.,<br />
shorter testing time or instant scoring. However, the question of whether CBA is more<br />
enjoyable for students is still a focus of research. For example, Computer-Adaptive<br />
Tests (CATs) certainly allow for shorter testing times, but CATs may also irritate testees by<br />
constantly administering items with a fixed solution probability of around 50%, regardless of<br />
the testees’ perceived test effort (see Frey, 2006). Another line of research (Björnsson, 2007)<br />
shows that 15-year-old students from Denmark, Iceland and Korea enjoyed CBA more than<br />
Paper-Based Assessment (PBA) and, if they could freely choose their personal test mode,<br />
would select the CBA mode only (53.2%) or a combination of CBA mode and PBA<br />
mode (37.6%). We assume that complex item formats like browser simulations are even<br />
more enjoyable for students than the items used by Björnsson (2007).<br />
In a first study with N=70 students the computer based assessment platform TAO (the<br />
French acronym for technology-based assessment) and the “Hypertext Builder” were used<br />
to develop and deliver complex electronic reading stimuli, covering all major text-types<br />
encountered in electronic reading such as websites, e-mail client environments, forums, or<br />
blogs. Furthermore motivational state and trait variables (see Rheinberg, 2003) as well as<br />
ICT literacy were assessed. First comparisons with older PBA studies that used the same<br />
motivational items revealed that the students liked the complex stimuli much more and<br />
therefore reported that they tried harder to solve the corresponding test items. This<br />
self-reported effort is partly an effect of general computer motivation and ICT literacy, but<br />
a direct positive effect of the stimuli used for electronic reading nevertheless remains.<br />
This first result, that test items which try to mimic real-world settings (like browsing a<br />
website) are more attractive for students, is not very surprising. However, it remains to be<br />
investigated whether influences that might spoil the test reliability of PBAs, like boredom or<br />
refusal, will be replaced by other disruptive random influences stemming from complex CBAs,<br />
like losing track of time or going into unimportant details. These side effects, which might be<br />
related to the testee’s interaction with complex stimulus material, are being investigated in<br />
ongoing research using thinking-aloud and eye-tracking techniques. Systematic results from<br />
this research (N=20) will also be presented at the conference.<br />
Remarkable Pedagogical Benefits of<br />
Reusable Assessment Objects for STEM Subjects<br />
Michael McCabe, University of Portsmouth, United Kingdom<br />
Reusable assessment objects (McCabe, 2007) have been used to transform the learning of<br />
STEM (Science, Technology, Engineering and Mathematics) subjects by making<br />
e-assessment more dynamic. The resulting “peer moderation” has improved student<br />
engagement through closer cooperation with the lecturer in the development of learning<br />
resources.<br />
The concept of peer moderation for summative examinations seems absurd. How can<br />
mathematics or science questions be released to students before an exam? If solutions are<br />
known in advance, the incentive to learn is lost. Traditional written exams are prepared in<br />
advance and moderated by academic staff under tight security. Traditional e-assessment<br />
exams require even more care over moderation and security, since they need to be<br />
checked both for their academic content and technical correctness.<br />
One possible approach is to use large e-assessment question banks. If hundreds or<br />
thousands of questions are available, then it may be possible to release them to students in<br />
advance of a formal exam involving a small subset of, say ten, questions. Students are<br />
motivated to try a large number of the formative questions and receive feedback on their<br />
progress. Lecturers can perform item analysis on the questions to derive, e.g., facility and<br />
discrimination indices, based upon the trials. They can also identify pedagogical or technical<br />
mistakes at an early stage and then modify questions accordingly. The end result is an<br />
improvement in the quality of the question bank, but at the expense of revealing the precise<br />
questions to students. Of course, a student willing to attempt the complete question bank<br />
might be regarded as worthy of success, regardless of their ability! I have used large<br />
question banks available from publishers and national projects in this way, e.g. for<br />
mathematics and astronomy. Unfortunately, individuals rarely have the time to develop<br />
sufficient questions for large banks.<br />
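The item analysis mentioned above can be sketched with two classical indices: facility (the proportion of students answering correctly) and an upper-lower discrimination index. This is an illustrative computation on hypothetical response data, not output from the question banks described.

```python
def facility(responses):
    """Facility index: proportion of students who answered the item
    correctly (responses are 1 for correct, 0 for incorrect)."""
    return sum(responses) / len(responses)

def discrimination(responses, total_scores):
    """Upper-lower discrimination index: facility among the top half of
    students (ranked by total test score) minus facility among the bottom
    half. Positive values mean stronger students do better on the item."""
    ranked = [r for _, r in sorted(zip(total_scores, responses), reverse=True)]
    half = len(ranked) // 2
    return sum(ranked[:half]) / half - sum(ranked[-half:]) / half

# Hypothetical responses of 6 students to one question, with their totals.
responses = [1, 1, 0, 1, 0, 0]
totals    = [9, 8, 7, 5, 4, 2]
print(facility(responses))                          # → 0.5
print(round(discrimination(responses, totals), 2))  # → 0.33
```

A low facility flags an overly hard question; a near-zero or negative discrimination flags a question that fails to separate stronger from weaker students, both candidates for early revision.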
Reusable assessment objects are different. They automatically generate computer-based<br />
questions, guidance, hints and feedback for students, through the use of random<br />
parameters and algorithms. The random parameters can include numbers, characters,<br />
words, text, diagrams, graphs, pictures, algebraic expressions, mathematical operators,<br />
equations, variables, functions and symbols. The algorithms specify how these random<br />
parameters interact and can include conditions applied to questions and answers. An<br />
interesting example of their use is in statistical hypothesis testing where several<br />
intermediate questions or steps lead up to a final decision. The final decision changes<br />
according to the data provided in the question. Algorithms are also useful for defining<br />
questions with open-ended answers, such as asking for an example which satisfies a set of<br />
criteria.<br />
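A minimal sketch of how such a reusable assessment object might work follows, assuming a simple linear-equation item. The function name and item format are illustrative inventions, not the MapleTA implementation.

```python
import random

def linear_equation_item(rng):
    """Generate one variant of a 'solve ax + b = c' question.

    Random parameters (a, b and the intended solution x) drive both the
    question text and the marking key, so every student can receive a
    different but equivalent item."""
    a = rng.randint(2, 9)
    x = rng.randint(-10, 10)      # the intended solution
    b = rng.randint(-20, 20)
    c = a * x + b                 # the algorithm ties parameters to the answer
    return {
        "question": f"Solve for x: {a}x + ({b}) = {c}",
        "hint": f"Subtract {b} from both sides, then divide by {a}.",
        "answer": x,
        "params": (a, b, c),      # kept so an answer checker can verify
    }

item = linear_equation_item(random.Random(2008))
print(item["question"])
```

Because the answer is derived from the same parameters as the question text, releasing the generator to students reveals the question family but not any particular summative instance.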
Lecturer benefits include a reduced need for large question banks and the remarkable<br />
opportunity for students to peer moderate their summative assessment questions. Student<br />
benefits include greater motivation to attempt formative test questions, better feedback<br />
(Nicol and Milligan, 2005), greater involvement in the assessment process itself, higher<br />
quality questions and opportunities to assess their own progress more accurately.<br />
Examples of reusable assessment objects generated using MapleTA<br />
(http://perch.mech.port.ac.uk/classes) will be used to illustrate these ideas.<br />
Demystifying the assessment process:<br />
using protocol analysis as a research tool in higher education<br />
Fiona Meddings, Christine Dearnley, Peter Hartley<br />
University of Bradford, United Kingdom<br />
Marking and assessing student submissions is a fundamental part of contemporary<br />
education. To the casual observer, undertaking assessment of student work may appear to<br />
be a simple process; after all, the student has done the hard part: engaging with the<br />
process by completing the required assessment task. On closer examination, however, it<br />
seems that very little is known about lecturers’ actual marking process. Limited<br />
literature exists to inform us about how lecturers come to the decisions they do, and what<br />
influences them in reaching those decisions. What we do know is that marks are provided<br />
for the student to give an indication of their success or otherwise at the assessment task. In<br />
some cases marks are accompanied by written feedback, in the guise of qualitative<br />
statements sitting alongside the quantitative (possibly alphanumeric) mark. Outcomes<br />
following the marking process depend upon the purpose it is seen to serve: lecturing<br />
staff may feel it reflects the quality of the educational process and student engagement,<br />
whereas the student may see the mark and feedback as giving them an idea of individual<br />
achievement and whether it accords with their own self-assessment of their abilities.<br />
Although the problem of assessment does feature in the literature, it is often concerned with<br />
what we do to assess students (Nicol 2007), i.e. the actual assessment approach or the tool<br />
to be used (e.g. portfolio or examination; coursework or seminar), with blurring between the<br />
two. What is known is that good assessment choices will consider the<br />
teaching methods as well as the subject matter. Less attention has focused on the impact of<br />
assessment feedback on students (Higgins et al 2001). Others identify how improved<br />
feedback can assist student learning (Nicol and Macfarlane-Dick 2006) or highlight<br />
feedback as an important feature (Gibbs and Simpson 2004). What remains unclear from<br />
the literature is how this feedback, which is of high importance to students (National Student<br />
Survey U.K. 2007), is constructed.<br />
This paper presentation will explore the potential of protocol analysis as a method of data<br />
collection used to uncover the thought processes involved in marking and assessing<br />
undertaken by lecturing staff at a higher education institution. This is a method of gathering<br />
concurrent verbal reports of lecturer judgements during the marking and assessing process,<br />
by recording their verbalised thoughts. Marking is almost always a solitary process, with<br />
limited interaction between markers and little known about the cognitive processes<br />
involved. In a study undertaken at this university, protocol analysis was used to uncover<br />
the thinking processes related to a marking and assessing task by asking participants to<br />
speak aloud and to verbalise their cognitive processes (Ericsson and Simon 1993). A study<br />
by Orrell (2006) uses a similar approach in higher education, supporting the use<br />
of this data collection method. The paper will examine the concepts of validity and reliability<br />
as well as provide some opportunity for discussion and examination of generalisability of<br />
findings from the utilisation of this data collection tool.<br />
Challenging the formality of assessment: a student view of<br />
‘Assessment for Learning’ in Higher Education<br />
Catherine Montgomery, Northumbria University, United Kingdom<br />
Kay Sambell, Northumbria University, United Kingdom<br />
This paper explores students’ understandings of ‘Assessment for Learning’ (Black et al,<br />
2003) or ‘Learning-oriented assessment’ (Carless, 2006) by outlining some of the findings of<br />
a systematic university-wide, cross-disciplinary study into student perceptions of this<br />
approach. Whilst the concept of ‘Assessment for Learning’ (AfL) has developed related<br />
theoretical bases over the last decade, there is little research that reveals students’<br />
conceptions of the meanings associated with the term. Previous research has focused on<br />
specific elements of AfL such as self and peer assessment (Dochy, Segers and Sluijsmans,<br />
1999), feedback (Higgins, Hartley and Skelton, 2002) or on specific approaches to<br />
assessment (Birenbaum, 1996). However, there has been relatively little research aiming to<br />
illuminate students’ understandings of the concepts of AfL. Early research has indicated that<br />
students construct different concepts about the meanings of assessment tasks and this<br />
‘hidden curriculum of assessment’ can sometimes be at odds with the ‘formal’ curriculum<br />
(Sambell and McDowell, 1998). This paper contributes to the rapidly growing literature that<br />
explores the socio-cultural context of learning through investigating students’ experiences of<br />
assessment for learning as ‘lived’ (Orr, 2007; Montgomery, 2007).<br />
The paper forms part of a wider study that employs a multi-site case study design with each<br />
case site across the disciplines of Engineering, Education and English, representing an<br />
implementation of AfL in a learning context. Multiple methods of data collection are used<br />
with interview, observation and focus groups generating data within an interpretive<br />
approach. The approach is predicated on the claim that the activities of learning and<br />
teaching are best understood if they are investigated as activities in their ‘natural’,<br />
socio-cultural context, rather than on the basis of ‘experimental’ interventions, or on the basis<br />
of actor-related variables, such as student characteristics or motivations (Haggis, 2007).<br />
The findings suggest that although tutors were likely to draw upon assessment-related<br />
discourse, such as ‘self-assessment’, ‘feedback’ and ‘peer-evaluation’ to refer to AfL, it was<br />
notable how far students did not. Students did, however, construct AfL as markedly different<br />
from more traditional teaching, learning and assessment experiences. Their heightened<br />
engagement with learning and assessment tasks invested their experiences with personal<br />
meanings, enabling them to see the ‘real world’ application of their learning. For the students,<br />
AfL meant they were no longer simply ‘jumping through hoops’. AfL was viewed by the students<br />
as being part of an informal, personal and social context for learning where conversation and<br />
the informal exchange of views were highly prized and student emphasis lay with ‘talk’,<br />
‘listening’ and ‘seeing’ in relation to informal dialogue. Students noted that their learning was<br />
often characterised by ‘informal chat’, ‘light-hearted banter’ and ‘story-telling’.<br />
This paper may stimulate discussion of Hawe’s (2007) point that in an aligned teaching,<br />
learning and assessment setting, staff and students should engage in meaningful dialogue<br />
about elements of teaching, learning and assessment. This may contribute to a shared<br />
language with which to engage in dialogue, because without it AfL risks constructing<br />
discrete and, to students, ‘foreign’ assessment systems.<br />
Secondary students’ motivation to complete written dance examinations<br />
Patrice O'Brien, The University of Auckland, New Zealand<br />
Mei Kuin Lai, The University of Auckland, New Zealand<br />
There is little research that focuses on students’ experience of national examinations in<br />
non-traditional curriculum areas such as dance. In this study, we examine why New Zealand<br />
students sitting national examinations (NCEA) in dance were not attempting one of two<br />
written standards of their assessment. A non-attempt by a student present at the<br />
examination is referred to as a void in New Zealand.<br />
The research used a case study approach in order to gain rich information directly from<br />
students. Initially, data were obtained from questionnaires completed by students (n=26)<br />
from a national cohort (n=516) in three secondary schools as they left the examination<br />
room. This enabled students to report immediately on what they had done in the<br />
examination and the reasons for their decisions. We conducted in-depth interviews with all<br />
students (n=4) who voided a standard and also interviewed a randomly selected<br />
comparison group representing a range of achievement levels (n=5) that did not void<br />
standards.<br />
Results showed that the greatest difference between these two groups involved the<br />
students’ belief in their ability to succeed. In line with attribution theory (Weiner, 1985),<br />
students who attempted both standards attributed their success to internal factors such as<br />
the effort they put into study. Students who voided a standard attributed their results to<br />
external factors such as the appeal of the topic they had studied, the difficulty of the<br />
questions, or the layout of the standard. Interestingly, some students who voided a standard<br />
could provide correct answers to the interviewer.<br />
The research also found that lack of success with school practice examinations, which are<br />
intended to assist students’ preparation for NCEA examinations, had the unintended effect<br />
of reinforcing the belief among students who voided standards that they were not capable of<br />
success. Initially, researchers and teachers assumed that factors such as a lack of literacy<br />
skills, or having already achieved sufficient credits for a certificate, affected students’<br />
motivation to attempt dance standards, but this research found that these factors did not<br />
significantly influence students to void standards.<br />
This research reinforces the importance of testing assumptions and of obtaining feedback<br />
from students as a way of supporting their learning. It also underlines the<br />
importance of contextualizing research results to each school’s unique situation. In a<br />
national study on student motivation to complete NCEA, Meyer and colleagues (2006)<br />
predicted that students would be influenced by the number of credits they had accumulated<br />
but credits did not affect students’ motivation in this case study. The study also revealed the<br />
negative effects of grade-focused assessment feedback on students with low expectations<br />
of success.<br />
References<br />
Meyer, L., McClure, J., Walkey, F., McKenzie, L., & Weir, K. (2006). The impact of NCEA on student<br />
motivation. Wellington: Victoria University of Wellington.<br />
Weiner, B. (1985). An attributional theory of achievement motivation and emotion. Psychological Review,<br />
92, 548-573.<br />
Mind the gap:<br />
assessment practices in the context of UK widening participation<br />
Michelle O'Doherty, Liverpool Hope University, United Kingdom<br />
This paper reports on findings from research funded by the UK Higher Education Academy;<br />
the study aimed to explore staff and student perceptions of quality feedback within the<br />
context of transition between educational sectors. Whilst seminal research has been<br />
conducted on the assessment experience of students in schools (Black and Wiliam, 1998;<br />
Black et al, 2003) and universities (Hounsell, 2003), there are relatively few studies that<br />
investigate the impact of the former on the latter. This qualitative study makes this cross<br />
sector connection, presenting data on perceptions of assessment collected across nine<br />
education institutions (three schools, three sixth forms and three universities). As a<br />
result, our findings address a gap in the current literature, positioning first year<br />
undergraduate expectations of quality feedback within the context of their prior experience<br />
of formative assessment.<br />
Current theory conceptualises assessment as a dialogic process (Higgins et al, 2001) in<br />
which quality feedback is the most powerful single influence on student achievement<br />
(Hattie, 1987); therefore, the provision of quality feedback is perceived as a key requirement<br />
of effective teaching in higher education (Ramsden, 2003). In practice, lecturers often<br />
believe their feedback to be more useful than students do (Carless, 2006; Maclellan, 2001)<br />
and feedback has consistently been identified as the least satisfactory aspect of the student<br />
experience in UK universities (National Student Survey, 2005, 2006, 2007). As a<br />
consequence of this mismatch in staff and student perceptions, assessment in UK higher<br />
education is being challenged.<br />
Frameworks for good practice in assessment have been developed, but attempts to<br />
conceptualise quality feedback within the context of higher education have been positioned<br />
within a formative rather than a summative process (Gibbs & Simpson, 2004-5; Nicol &<br />
Macfarlane-Dick, 2004, 2006). However, resource constraints coupled with a widening<br />
participation agenda of mass expansion in the UK have limited the opportunities for<br />
formative assessment to be practised (Yorke, 2003; Gibbs, 2007). At the same time, within<br />
the school sector a formative Assessment for Learning Culture (Assessment Reform Group,<br />
1999) has been developed which means students experience a significant cultural gap in<br />
feedback practices between educational sectors. In particular, our findings reveal students<br />
perceive quality feedback as part of a dialogic, guidance process rather than a summative<br />
event. Conversely, in higher education, concerns about the ‘dumbing down’ of<br />
independent learning through spoon-feeding (Haggis, 2006) are leading to increasing<br />
tensions between the theory of good practice and the practice of assessment.<br />
This longitudinal study reports on the consequences of these conflicting expectations of<br />
guidance and independent learning for first year undergraduates and their tutors in three<br />
subject disciplines. These findings have informed recent initiatives to scaffold students’<br />
autonomous learning through formative assessment and the presentation will provide an<br />
opportunity to discuss these interventions. Thus, the presentation of our cross sector<br />
findings aims not only to reframe the context of the debate challenging current assessment<br />
practices in UK higher education, but also to contribute to the re-conceptualisation of<br />
feedback practice for future learning (Hounsell, 2007; Boud & Falchikov, 2007).<br />
Measuring writing skills in large-scale assessment:<br />
Treatment of student non-responses for Multifaceted-Rasch-Modeling<br />
Raphaela Oehler, IQB, Humboldt University Berlin, Germany<br />
Alexander Robitzsch, IQB, Humboldt University Berlin, Germany<br />
Researchers in large-scale assessment need to make decisions about how to handle<br />
student non-responses in their analyses. Particularly when the sample consists of<br />
low-achieving students, a considerable number of responses may be missing by intention<br />
or may not be interpretable. Reducing costs by not giving raters such texts is a common<br />
procedure.<br />
However, if rater effects are to be analysed and item difficulties are to be obtained using<br />
multifaceted Rasch modeling, having the test administrators code such responses, and thus<br />
introducing a new hypothetical “rater”, is problematic for standard IRT programs.<br />
One central project of the IQB is the development of large item pools on assessing<br />
students’ foreign language skills, particularly the reading, writing, and listening<br />
comprehension skills. Item development is based on the Common European Framework for<br />
Languages (CEF) which proposes six proficiency levels (A1 to C2). Item development for<br />
testing writing skills at the IQB follows a uni-level approach, meaning that writing tasks<br />
are developed for each CEF level. The rating scales also require the rater to judge texts<br />
within one level; i.e., performing quite well on an A2 task means that the student’s writing<br />
skills are at least at level A2. The rating criteria are either dichotomous (e.g. text organisation) or<br />
polytomous (e.g. global impression).<br />
A sample of N = 2,700 students from five different school types in Germany, in grades eight<br />
to ten, was tested within a multi-matrix design in 2007. Along with reading and listening<br />
comprehension items, 17 writing tasks were distributed. The study to be presented was<br />
carried out to analyse whether the newly developed rating approach works, particularly<br />
whether the order of the item difficulties obtained for the different rating criteria corresponds<br />
to the CEF levels. In order to cope with the large number of student non-responses, especially<br />
for the A1 and A2 tasks, and to scale the data using multifaceted modeling, a first approach<br />
was to distribute the codings '8' (not interpretable texts) and '9' (empty pages) to the<br />
raters of one task (four ratings by six raters) percentwise, according to the number of texts<br />
they had to rate in total. First results of an ad-hoc IRT analysis in ConQuest, using data in<br />
which the codings of the blank and not interpretable responses were distributed among the<br />
raters and polytomous variables were recoded into binary codes, showed that the order of<br />
the item difficulties fit the intended CEF levels. We contrast<br />
this approach with an analysis in WinBUGS in which a blank (i.e. a '9') is not modelled as a<br />
true response contaminated by a rater effect, because rater discrepancies cannot occur by<br />
definition (apart from lapses in rater concentration). In addition, we extend the<br />
classical multifaceted Rasch analyses to include all rating criteria. A confirmatory factor<br />
analysis of these rating scales, including rater effects, will be presented.<br />
Along with a presentation of the rating system developed in the French project based on a<br />
uni-level approach, general implications for the treatment of student non-responses when testing<br />
writing skills in large-scale assessment contexts are discussed.<br />
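The percentwise distribution of non-response codings among raters described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors’ actual procedure; the function name and the rater workload figures are invented.<br />

```python
# Hypothetical sketch: assign missing-response codings ('8' / '9') to the
# raters of one task in proportion to the number of texts each rater scored.
# Rater names and counts below are invented for illustration.

def distribute_missing(missing_count, rater_workloads):
    """Split `missing_count` non-responses across raters proportionally to
    their workloads, using largest-remainder rounding so the shares sum
    exactly to `missing_count`."""
    total = sum(rater_workloads.values())
    exact = {r: missing_count * w / total for r, w in rater_workloads.items()}
    shares = {r: int(e) for r, e in exact.items()}  # truncate to integers
    remainder = missing_count - sum(shares.values())
    # hand out leftover units to the largest fractional remainders
    for r in sorted(exact, key=lambda r: exact[r] - shares[r], reverse=True)[:remainder]:
        shares[r] += 1
    return shares

workloads = {"rater_A": 300, "rater_B": 200, "rater_C": 100}  # texts rated in total
print(distribute_missing(60, workloads))  # proportional shares: 30, 20, 10
```

Largest-remainder rounding is one plausible way to realise “percentwise according to the number of texts they had to rate in total”; the abstract does not specify how fractional shares were resolved.<br />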
Collaborating or fighting for the marks?<br />
Students’ experiences of group assessment in the creative arts<br />
Susan Orr, York St John University, United Kingdom<br />
This paper reports on a research project on group assessment in creative disciplines in<br />
higher education that is funded by the university’s Centre of Excellence in Teaching and<br />
Learning (CETL). The central premise of this CETL is that creativity is enhanced through<br />
participation in collaborative activity.<br />
In the UK the National Student Survey identifies that students who study in arts-based<br />
subjects register lower levels of satisfaction in the areas of assessment and feedback. In<br />
addition, students and lecturers in the arts express concerns about the fairness of group<br />
assessment practices (Bryan 2004).<br />
Group assessment usually aims to measure the product created and the skills and efforts<br />
put in by members of the group (Bloxham and Boyd 2007). The effort and skills element can<br />
also be referred to as the process or the contribution. Cowdray and de Graaf (2005) point<br />
out that in arts education both process and product are valued; however, process is an elusive<br />
concept. For example, as Heathfield (1999) points out, the term ‘contribution’ might refer to<br />
a student’s contribution to the task, or their contribution to group dynamics.<br />
Taking the view that assessment is a socially situated practice informed by, and mediated<br />
through, the socio-political context within which it occurs (Layder 1997), this research takes<br />
the form of an ethnographic study employing semi-structured interviewing and<br />
semi-participant observation (Silverman 2004). I explore the ways that group assessment is<br />
experienced by students and lecturers in the subjects of dance, performance, music, and<br />
film.<br />
Across the disciplines studied I identified variation in the ways that marks were allocated to<br />
students for the process and product elements. These marking approaches represent local<br />
disciplinary and historical norms.<br />
Jaques and Salmon (2007) remind us that the process element of group work can take<br />
place out of the view of the lecturer. As a consequence, lecturers in this study have devised<br />
assessment strategies to help them assess process elements. For example, some students<br />
are asked to write about the process of the group work project in learning journals or<br />
production logs. However, students reported that they sometimes felt disadvantaged when<br />
they were asked to represent process in a text. As one student asked, ‘if it is a film, why<br />
write it?’. The written element is introduced by lecturers to help them disentangle individual<br />
contribution; however, by asking students to represent process in this way, lecturers may be<br />
creating unintended barriers to high achievement for some of our most creative visual<br />
students. As Smart and Dixon (2002:192) observe ‘those who are best able to articulate the<br />
collaborative […] process in a written form might gain an advantage even though their<br />
creative contribution may have been poor’.<br />
My analysis suggests that students recognise the importance of group work in terms of its<br />
vocational authenticity but that they are keen for lecturers to recognise and reward<br />
individual contribution fairly. This paper will initiate discussion about the role of process and<br />
how it might be assessed fairly and rigorously.<br />
Constructing a new assessment for learning questionnaire<br />
Ron Pat-El, M. Segers, P. Vedder, H. Tillema<br />
Leiden University, The Netherlands<br />
Aims/goals: In many countries, during the past decade, researchers and educationalists<br />
have put assessment on the agenda. More specifically, since the pivotal review study by<br />
Black and Wiliam (1998), the value of implementing assessment as a tool to support<br />
student learning has been stressed. Based on qualitative studies in secondary education in<br />
the UK, the Assessment Reform Group (2002) formulated 10 principles of assessment<br />
for learning (AfL), a constructivist assessment strategy in which assessment is made a part<br />
of learning and emphasis is taken away from grading.<br />
Although reports have been published on the increasing extent to which AfL is implemented<br />
in schools, it is argued by researchers such as Black and Wiliam (1998) and Maclellan<br />
(2001) that teachers tend to overestimate how well they use assessment as a tool to<br />
achieve learning gains in students. Moreover, questionnaires used in AfL research often<br />
suffer from methodological shortcomings such as low internal consistency of scales (e.g.<br />
Gibbs & Simpson, 2003), low factor loadings (e.g. James & Pedder, 2006) or inability to<br />
match student and teacher results (e.g. Maclellan, 2001). Researching congruency of<br />
perceptions of AfL practice requires a valid instrument that enables direct comparisons<br />
between teachers and their students, and is based on a widely recognized<br />
operationalization of AfL. Therefore, this study aims to evaluate a new questionnaire, based<br />
on the principles of AfL as put forward by the ARG (2002), in which perceptions of<br />
implemented AfL-practices between teachers and their students can be compared. Based<br />
on pilot-results on a prototype questionnaire, a 48-item questionnaire is proposed that<br />
operationalizes four of the ten principles of AfL, namely: assessment for learning should (1)<br />
be central to classroom practice; (2) promote understanding of clear goals and criteria; (3)<br />
help learners know how to improve; and (4) develop the capacity for self-assessment. It is<br />
the aim of this study to evaluate whether a four-factor model based on the four mentioned<br />
principles of AfL can be confirmed.<br />
Method<br />
Procedure: A prototype self-report questionnaire was constructed and piloted in conjunction<br />
with educational experts at the Department of Education and Child Studies at Leiden<br />
University. The initial 111-item questionnaire was administered as a semi-structured interview.<br />
Based on the pilot results, a prototype 48-item questionnaire was constructed and administered<br />
with the assistance of students participating in a bachelor-thesis project.<br />
Sample: The prototype self-report questionnaire was administered in 88 junior vocational<br />
high schools in the Netherlands to 1422 students (49% girls, 51% boys), who were on<br />
average 14.6 years old (SD = 1.52), and 237 teachers (43% females, 57% males), who<br />
were on average 42.3 years old (SD = 11.89).<br />
Results: Confirmatory factor analysis showed that the hypothesized four-factor model<br />
provided a good fit to the data for both the student questionnaire (RMSEA = .05) and the<br />
teacher questionnaire (RMSEA = .07). In the four-factor model all items corresponded to their<br />
intended factor. Cronbach’s alphas for the subscales in both teacher and student<br />
questionnaires were high.<br />
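As an illustration of the internal-consistency check reported above, the following is a minimal sketch of Cronbach’s alpha for one subscale, computed from the classical formula alpha = k/(k-1) · (1 − Σ item variances / variance of the sum score). The response matrix is fabricated for illustration and is not data from the study.<br />

```python
# Illustrative sketch (not the authors' code): Cronbach's alpha for a
# small Likert subscale, using population variances throughout (the
# variance ratio, and hence alpha, is unaffected by this choice).

def cronbach_alpha(items):
    """`items` is a list of item-score lists, one list per item,
    all of the same length (one score per respondent)."""
    k = len(items)                      # number of items
    n = len(items[0])                   # number of respondents

    def var(xs):                        # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = sum(var(it) for it in items)
    total_scores = [sum(it[p] for it in items) for p in range(n)]
    return k / (k - 1) * (1 - item_vars / var(total_scores))

# four 5-point Likert items answered by six students (fabricated data)
responses = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [5, 5, 2, 4, 3, 5],
    [4, 5, 3, 4, 2, 4],
]
print(round(cronbach_alpha(responses), 2))  # 0.93 for this fabricated data
```

A value this high simply reflects how strongly the fabricated items co-vary; the abstract reports only that the subscale alphas were “high”, not their values.<br />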
Assessing Professional Learning: the challenge of the<br />
UK Professional Standards Framework<br />
Ruth Pilkington, University of Central Lancashire, United Kingdom<br />
This paper addresses issues of assessment from the perspective of assessing academic<br />
professional learning.<br />
Within the UK since 2006 there has been a Professional Standards Framework with three<br />
standards descriptors which can be used to recognise the performance and professional<br />
standing of academic members of staff in HE. It is proposed that Institutions of Higher<br />
Education in the UK should adopt these standards descriptors when establishing continuing<br />
professional development (CPD) frameworks for academic staff. This initiative extends the<br />
existing range of postgraduate certificates widely used across the HE sector to structure<br />
initial professional development.<br />
Three issues emerge from this relating to the assessment themes in this conference:<br />
1. Most existing PG Certificates for recognising initial professional development for<br />
academics are accredited at M- level – as masters study. What does this mean for a<br />
professional context framed by professional values which embraces both formal and<br />
non-formal learning?<br />
2. How do you assess performance against standards in a way that is meaningful,<br />
developmental, acceptable to the academy and which is NOT competence-based?<br />
3. Professional performance and development is reliant on a notion of professional<br />
reflection and learning that is challenging for certain discipline cultures. Does the<br />
research provide sufficiently rigorous and flexible models that can adapt to more<br />
professionally appropriate tools of assessment than the current reliance on written<br />
reflective documents?<br />
The paper explores assessment at University of Central Lancashire (UCLan), UK, where an<br />
initial professional development award, the PG Certificate in Learning and Teaching in HE,<br />
has been in operation since the 1990s. The PG Certificate was originally graded using<br />
percentages but shifted to a simpler pass/refer system of grading against achievement of<br />
learning outcomes. This system has been refined over the years to provide a rigorous model that<br />
has also been adopted across a Masters in Education (Professional Practice in HE). This<br />
Masters award forms a formal component of the academic CPD framework currently being<br />
developed at UCLan. Assessment of professional development within the wider framework<br />
is based on a professional dialogue using outcomes designed around the UK Professional<br />
Standards Framework descriptor statements.<br />
Recent involvement in a literature review of reflective practice as part of a national project<br />
has prompted a number of questions about the assessment of professional academic<br />
practice (Kahn et al). Within the literature I identified valuable tools for assessing academic<br />
development which explored professional learning in relation to stages of teacher<br />
development (Bell, 2001; Manouchehri, 2002; Kreber, 2004). This complements models<br />
structuring levels of reflective engagement (Moon,2004; Van Manen,1991; Hatton & Smith,<br />
1995).<br />
Practice emphasises a particular level of reflective engagement, and engagement with the<br />
literature, to set assessment parameters appropriate to masters-level study. This applies even<br />
when marking against learning outcomes. The spoken word shifts the parameters for<br />
measurement and judgement, especially where it is part of a developmental process. What<br />
criteria will suit assessment of professional learning against standards and how will the<br />
criteria inform judgement within professional dialogues?<br />
Feedback – all that effort but what is the effect?<br />
Margaret Price, Karen Handley, Berry O’Donovan<br />
Oxford Brookes University, United Kingdom<br />
As resource constraints in higher education impact on the student experience, the<br />
importance of effectiveness of our practices is brought into sharp focus. This is particularly<br />
true for formative feedback which is arguably the most important part of the assessment<br />
process in its potential to affect student learning and achievement and develop deeper<br />
understanding of assessment standards. The process of giving and receiving feedback is<br />
considered limited in its effectiveness (Gibbs & Simpson, 2002; Lea & Street, 1998). This<br />
paper argues that measuring the effectiveness of feedback is fraught with difficulties, and<br />
draws on findings from a three-year project addressing student engagement with assessment<br />
feedback to illustrate staff and student views of effectiveness and engagement.<br />
Effectiveness can only be judged if the feedback’s purpose is clear and the outcomes (e.g.<br />
learning, or student engagement) are measurable. Our study reveals the difficulty of<br />
evaluating feedback, given the variability in staff views about the purpose of feedback and in<br />
student expectations about what feedback really ‘is’. Such diversity of views will rarely result<br />
in a perfect match between assessor and assessee which may explain the high levels of<br />
dissatisfaction and ineffectiveness (National Student Survey).<br />
If, as is widely accepted, feedback should primarily support future learning, its effectiveness<br />
should ideally demonstrate impact on learning. However the problem of isolating the effect<br />
of feedback within the multifaceted learning environment means that causal relationships<br />
are difficult if not impossible to prove (Salomon, 1992). Our study revealed that generally<br />
staff had no real expectation of measuring feedback’s effectiveness. There was extensive<br />
use of passive feedback methods which had no mechanisms to monitor engagement with or<br />
the effect of the feedback provided. In addition fragmented course structures limited the<br />
opportunity for the monitoring of future application of feedback.<br />
Our study confirmed the well documented and largely negative student view of feedback<br />
(Holmes & Smith 2003; Maclellan 2001; Hounsell 1987) but also revealed student<br />
disillusionment with passive methods which they saw as only justifying the grade and<br />
precluded the opportunity for dialogue. For many students this led to disengagement with all<br />
feedback, engendering an impossible situation for staff seeking to engage them in the future.<br />
Simple performance measures for the effectiveness of feedback are not obvious. We may<br />
have to settle for measures of engagement rather than effects on learning but even<br />
engagement is difficult to evaluate. However meeting the students’ strong desire for more<br />
opportunity for dialogue may offer a way forward. Dialogue offers staff the opportunity to<br />
check effectiveness of feedback provided as well as an indication of student engagement.<br />
Resource constraints will not allow the return to traditional approaches to engendering<br />
dialogue, but innovative ways can and must be found if feedback is to be effective and<br />
demonstrably useful. Discussion will address the pitfalls of some traditional feedback<br />
processes and suggest approaches which provide performance measures of feedback<br />
within the process of increasing engagement.<br />
Student teachers on assessment:<br />
First year conceptions<br />
Ana Remesal, Universidad de Barcelona, Spain<br />
In the last decade there has been a strong call for formative assessment, and important<br />
attempts to change assessment practices have been made at both national and<br />
international levels (Black & Wiliam, 2005; Coll et al., 2000). Nevertheless, in<br />
the author’s opinion, any attempt to change school practices confronts at least two big<br />
challenges. On the one hand, the evaluation practices at an institutional level often do not<br />
really support formative practices in the classroom; on the other hand, the teachers’ own<br />
conceptions of assessment often hinder the implementation of innovative practices<br />
(Remesal, 2007). Some studies have been carried out to investigate teachers’ and<br />
secondary students’ conceptions about assessment (Brown, 2005; Remesal, 2006).<br />
These previous studies point to the key importance of the transition from being a student to<br />
becoming a teacher. In this paper the author wants to<br />
present the results of applying a scale of teachers’ conceptions of assessment<br />
(Brown, 2006). 450 first-year student teachers from a European country were asked to<br />
respond to a Likert questionnaire. The instrument consists of 27 items with a six-option,<br />
positively packed response scale. The questionnaire is based on a four-conceptions model:<br />
assessment as a tool for improving teaching and learning, assessment as a certifying tool,<br />
assessment aimed at accountability functions, and assessment as having no use at all in education.<br />
Results show a wide diversity of conceptions among student teachers and some significant<br />
differences related to the students’ previous educational experience (whether they were in<br />
their first or second career). These results pose an important challenge to us as teacher<br />
educators if we aim at changing school practice from the root. As a future line of research,<br />
the author proposes a second administration of the questionnaire in three years’ time, when<br />
these 450 students finish their university studies, in order to identify changes that occur<br />
over the course of the teacher education programme.<br />
References<br />
Black, P. & Wiliam, D. (2005). Lessons from around the world: how policies, politics and cultures constrain<br />
and afford assessment practices. The Curriculum Journal, 16(2), 249-261.<br />
Coll, C., Barberà, E., & Onrubia, J. (2000). La atención a la diversidad en las prácticas de evaluación.<br />
Infancia y Aprendizaje, 90, 111-132.<br />
Brown, G. T. L. (2006). Teachers’ conceptions of assessment: Validation of an abridged instrument.<br />
Psychological Reports, 99, 166-170.<br />
Brown, G. T. L. (2005). Teachers' conceptions of assessment: Overview, lessons, & implications. Invited<br />
NQSF Literature Review for the Australian National Quality<br />
Remesal, A. (2006). Los problemas en la evaluación del aprendizaje matemático en la educación<br />
obligatoria: perspectiva de profesores y alumnos. Tesis Doctoral, Universidad de Barcelona.<br />
Remesal, A. (2007). Educational reform and primary and secondary teachers’ conceptions of assessment.<br />
The Spanish instance, building upon Black & Wiliam (2005). The Curriculum Journal, 18(1), 27-38.<br />
Testing our citizens.<br />
How effective are assessments of citizenship in England?<br />
Mary Richardson, Roehampton University, United Kingdom<br />
The idea that citizenship education might provide some kind of solution to social problems is<br />
nothing new (Greenwood and Robins, 2003; Faulks, 2000; 2006). Over a decade ago,<br />
following the publication of the White Paper Excellence in Schools (DfEE, 1997) and the ‘Crick’<br />
Report (QCA, 1998), citizenship became a statutory part of the National Curriculum for<br />
England. There appears to be no opposition to the idea of educating young people about<br />
citizenship, but there are issues that have arisen from the decision to make it a mandatory<br />
subject in maintained secondary schools (Kerr et al, 2003). The most significant of these is<br />
assessment.<br />
There is at present a paucity of literature that focuses on the assessment of citizenship<br />
education and an assessment ‘deficit’ within the subject is becoming apparent. In its 2006<br />
report, Ofsted found sparse evidence of coherent and effective assessment and Kerr et al<br />
(2007) claim that assessment of citizenship continues to be problematic. The challenge for<br />
citizenship educators identified by Tudor (2001) and Jerome (2002) amongst others,<br />
includes the need to construct meaningful assessments that relate to the beliefs and values<br />
under discussion. Teachers are presented with a framework for assessing citizenship, but<br />
citizenship is a new and different subject, and applying modes of assessment to content such<br />
as active participation and voluntary activities is not straightforward.<br />
This research study seeks to develop:<br />
• knowledge and understanding of the assessments of citizenship education in<br />
maintained English secondary schools;<br />
• an understanding of the general perceptions of assessments by their primary user<br />
groups – teachers and students; and<br />
• an evidence base for policy in regard to the citizenship curriculum and its assessment.<br />
This paper describes the current structure of assessment for citizenship in secondary<br />
education in England and discusses the rationale for the assessment of citizenship.<br />
Philosophical and sociological literatures inform the conceptual analysis of definitions of<br />
citizenship; curriculum theory underpins an evaluation of teaching materials, policy and<br />
curriculum development documentation; and the literature of assessment informs the<br />
interrogation and discussions around specifications, examination papers and assessment<br />
documentation from a range of sources.<br />
An empirical evaluation of citizenship assessment from the perspective of the key user<br />
groups, teachers and pupils, was central to this research. Pilot investigations found no<br />
uniform approach to assessment and this has a significant effect upon the status of the<br />
subject (Richardson, 2006). A mixed-method approach combined a questionnaire survey<br />
sent to teachers and pupils in secondary schools across England with interviews with pupils<br />
(Years 9-11) and teachers in 18 schools around England. The findings include a discussion<br />
of pupils’ attitudes towards end of key stage assessments and the current GCSE<br />
specifications offered for citizenship. Results suggest generally positive attitudes towards<br />
citizenship as a subject, but responses from teachers and pupils underline an educational<br />
ethos which values only the things that can be measured and graded. This attitude towards<br />
assessment appears to be affecting the perceived value of citizenship and teachers often<br />
struggle to develop methods of assessment which are appropriate for the subject.<br />
86 ENAC 2008
Standards in vocational education<br />
Andreas Saniter, University of Bremen, Germany<br />
Rainer Bremer, University of Bremen, Germany<br />
The main reason for the reliability and success of cross-OECD comparative studies in<br />
general education is not only the transnational agreement about educational standards but<br />
also the comparability of educational systems. This is not met in vocational education:<br />
Systemic differences between dual, modularized and school-based vocational education<br />
and training are obvious and generate serious obstacles to finding standards compatible with<br />
all national curricula (cf. Bremer 2005).<br />
The Leonardo pilot-project AERONET has pursued an approach that is independent from<br />
national curricula or systemic preferences. The first step was a survey of the Typical<br />
Professional Tasks (TPTs) of skilled work in the aeronautics industry (mechanics and<br />
electricians) in selected Airbus plants in France, Spain, Germany and the UK. Each TPT<br />
describes a cluster of related work processes, e.g. “Production of metallic components for<br />
aircraft or ground support equipment”. In each plant, skilled workers perform between 9 and<br />
12 TPTs (for each profession), with surprisingly small differences between the countries<br />
(details can be found at http://www.pilot-aero.net). To be proficient in these tasks is not<br />
only part of skilled work but also the aim of the apprenticeship – with the exception of Spain,<br />
where no apprenticeship in aeronautics exists and new workers are trained for one work<br />
process only. In our approach (Bremer/Saniter 2006) the professional work on each of<br />
these tasks is the vocational education standard and basis for evaluation – not set by<br />
trainers but by the community of practice. Obviously beginners and advanced apprentices<br />
are not yet able to fulfill all requirements of a complex task – we assessed their<br />
performance by analyzing their approaches to a holistic evaluation task in terms of<br />
understandability, practicability and usability. For each profession an evaluation task related<br />
to the assembly of equipment was chosen and was presented in a paper and pencil test to<br />
around 150 first, second, third year apprentices in France, Germany and the UK. The<br />
apprentices had 4 hours to work on the task.<br />
Surprisingly, the better solutions were quite similar regardless of the country and the<br />
years already spent in apprenticeship – it seems that different tracks lead to comparable<br />
results. More revealing is the analysis and comparison of the performance of the<br />
apprentices who (partly) failed: whereas in our sample the German apprentices with<br />
acceptable solutions tended to ignore some aspects of the task, many participants from the<br />
UK followed the processes they had learnt regardless of their applicability to the task, and the<br />
French apprentices developed inventive but unrealistic solutions.<br />
We will present detailed results and first hypotheses concerning the relation of competence<br />
development and systemic aspects of the respective national vocational education and training.<br />
References<br />
Bremer, R. (2005). Kernberufe – eine Perspektive für die europäische Berufsentwicklung? In: Grollmann,<br />
Philipp; Kruse, Wilfried; Rauner, Felix (eds.): Europäisierung der Berufsbildung, Reihe Bildung und<br />
Arbeitswelt, Bd. 14, Münster, pp. 45–62.<br />
Bremer, R.; Saniter, A. (2006). La recherche en matière de développement de compétences chez les jeunes en<br />
milieu professionnel. In: L’École Comparée – Regards croisés franco-allemands, Groux, Dominique;<br />
Helmchen, Jürgen; Flitner, Elisabeth (eds.), Paris.<br />
Why do some students stop showing progress on progress tests?<br />
Lydia Schaap, Erasmus University Rotterdam,The Netherlands<br />
H.G. Schmidt, Erasmus University Rotterdam,The Netherlands<br />
The Institute of Psychology at Erasmus University in Rotterdam has a problem-based (PBL)<br />
curriculum. Some of the goals of PBL are to promote a deeper understanding of the<br />
to-be-learned material and to train students as effective problem solvers and lifelong learners.<br />
Therefore, long-term retention of knowledge is a crucial aspect in this learning environment<br />
(Norman & Schmidt, 1992). The Institute of Psychology wishes to reflect these goals in its<br />
assessment policy. It was decided to implement progress testing as the main assessment<br />
tool in the bachelor programme, because this Progress Test (PT) focuses on long-term<br />
retention of knowledge and measures knowledge growth (Van der Vleuten, Verwijnen, &<br />
Wijnen, 1996). Moreover, the direct association between a specific course and its test is<br />
disconnected and endless resits of exams are prevented. By using the PT as the main<br />
assessment tool, it is hoped that students are challenged to study in a way that promotes<br />
long-term retention of knowledge and that students are motivated to follow (to some extent)<br />
their own interests when studying.<br />
The PT, as used in the psychology program, reflects the objectives of the first two years of<br />
the bachelor programme. In these two years the basic knowledge of several domains of<br />
psychology are studied in sequentially programmed, five-week courses. The PT is<br />
administered four times a year. To promote studying on a regular basis, every course ends<br />
with a ‘course test’. Course tests are not rewarded with credits; however, when students<br />
obtain an average score of 6.5 (on a ten-point scale) on these course tests, it is possible for<br />
them to compensate for insufficient achievement on the PT.<br />
Several analyses have been carried out on the assessment data. For instance, the<br />
relationship between course tests and progress tests has been studied, as well as students’<br />
knowledge growth. Analyses have shown that scores on course tests and progress tests<br />
correlate sufficiently (r = .70). From the analyses on the growth of student knowledge it<br />
appears that not all students show the same growth curves. In fact three groups can be<br />
distinguished (Bouwmeester & Van Onna, under revision), including a group that does not<br />
grow any more after the second year of study. Currently we are trying to explain why some<br />
students show more knowledge growth than other students. We do this by taking into<br />
account student variables (e.g. IQ, professional skills in tutorial groups and study behaviour,<br />
such as invested study time and processing strategies) as well as test variables (e.g. item<br />
characteristics and level of knowledge measured). Results will be presented at the<br />
conference to discuss the different ways of influencing student variables and test variables<br />
to promote long-term retention and knowledge growth.<br />
References<br />
Norman, G. R. & Schmidt, H. G. (1992). The psychological basis of problem-based learning: A review of<br />
the evidence. Academic Medicine, 67, 557-565.<br />
Van der Vleuten, C. P. M., Verwijnen, G. M., & Wijnen, W. H. F. W. (1996). Fifteen years of experience with<br />
progress testing in a problem-based learning curriculum. Medical Teacher, 18(2), 103-109.<br />
Contextualising Assessment: The Lecturer's Perspective<br />
Lee Shannon, Liverpool Hope University, United Kingdom<br />
Lin Norton, Liverpool Hope University, United Kingdom<br />
Bill Norton, Liverpool Hope University, United Kingdom<br />
Aim: In the research literature assessment is recognised as a fundamental driver of the<br />
learning process (Boud, 2007; Gibbs and Simpson, 2004-5; Ramsden, 2003; Rust et al.,<br />
2005), yet the relationship between lecturers’ pedagogical beliefs and choice of assessment<br />
is an under-explored area. The aim of this study, which builds on work by Harrington et<br />
al. (2006) and Norton et al. (2005), was to elicit lecturers’ perceptions of assessment within<br />
the broader context of their philosophy of learning and teaching. Further focus is on specific<br />
aspects of the marking process, feedback relationships and the relationship between past<br />
experiences and current practices.<br />
Methodology: Thirty in-depth semi-structured interviews were carried out with lecturers in 18<br />
disciplines at three higher education institutions in the UK. Participants ranged in<br />
assessment experience from 1-22 years and were drawn from a spectrum of academic<br />
backgrounds: some had followed a career purely in academia, while others had been experienced<br />
practitioners in their field before entering higher education. Thematic analysis, using the<br />
process developed by Braun and Clarke (2006), was selected as an appropriate interpretive<br />
tool in dealing with individual and shared meanings at the analysis stage. The analysis was<br />
carried out by three experienced researchers who worked on transcripts independently at<br />
first, then jointly in an iterative process, to arrive at an agreed thematic structure. Data found<br />
to be relevant to the overall research question were developed to form thematic strands for<br />
further analysis. At all times themes were checked against the original transcripts to ensure<br />
an accurate representation of the data.<br />
Findings: Themes highlighted include:<br />
• Practical and emotional entailments of assessment regimes.<br />
• Relationships between assessment and philosophies of learning and teaching.<br />
• Assessment for learning.<br />
• Feedback: the students’ response.<br />
• Perceptions of students’ understanding of assessment.<br />
• Features of a ‘good’ University education.<br />
• Lecturing experience and changes in self perception.<br />
• Words of wisdom for the uninitiated.<br />
Implications: This study uncovered a range of features of assessment practice in various<br />
contexts and disciplines and emphasises the participants’ collective view that there is a need<br />
for explicit training in assessment design, marking and the use of feedback, which<br />
resonates with the findings of Rust (2002). It was also found that participants’ approaches<br />
varied from those who held explicit philosophies of learning and teaching inextricably linked<br />
to their assessment practices, to those who made implicit assumptions about pedagogy that<br />
bore little relation to the choice of assessment.<br />
Discussion: The above issues are discussed in relation to current trends in assessment in<br />
higher education and the relationship between individual pedagogies and assessment<br />
practices. Practical guidelines for supporting staff development are suggested as are key<br />
areas for further research.<br />
Learning to read: Modeling and assessment of<br />
early reading comprehension of the 4-year-olds in Macao kindergartens<br />
Pou Seong Sit, University of Macau, China<br />
Kwok-cheung Cheung, University of Macau, China<br />
Learning to read is no easy task for 4-year-olds, and this is especially so in traditional<br />
kindergartens in Macao. Challenging assessment in the form of an integrated assessment system<br />
(IAS) is urgently needed (Birenbaum et al., 2006). The aim of the present study is to scaffold<br />
children to higher levels of reading comprehension through reciprocal teaching methods so that<br />
children learn to make use of the acquired reading strategies (i.e. questioning, clarifying,<br />
predicting, summarizing) to read predictable storybooks (Palincsar & Brown, 1984). Notions of<br />
assessment, an integration of “assessment of learning” and “assessment for learning”, are<br />
extended and elaborated to form an IAS to comprise: (i) whole-class storybook reading<br />
instruction using reciprocal teaching methods; (ii) individualized storybook assessment followed<br />
by storyboard assessment after whole-class reading instruction, and (iii) assessment-driven<br />
action research at the reading corners that seek to bring up children’s reading comprehension<br />
levels at their zone of proximal development (Sit, 2007).<br />
Central to the research design is a conceptual model of early reading comprehension the<br />
knowledge structure of which transits from the language world progressively to the human<br />
world, and at the same time depicts children’s minds developing through “recognizing words<br />
and grammar” to “situated understanding of the texts read” (Tse, et al. 2005). Capitalizing<br />
on the textbase and situation model of the story contexts children progress from “learn to<br />
read” to “read to learn” via four distinct developmental milestones: (i) extracting the surface<br />
meanings of the texts and recognizing the apparent features of pictures read, (ii) inferring<br />
the underlying meaning of texts and inner structure of pictures for situated understanding;<br />
(iii) making connections of the meanings constructed from texts and pictures for holistic<br />
understanding of the story read; and (iv) acquiring reading strategies through reciprocal<br />
teaching methods (Sit, 2008). Worthy of particular mention is that interpretation of the four<br />
progressive milestones is done in the light of Feldman’s (1994) ideas of “sequentiality” and<br />
“hierarchical integration”, alongside the simultaneous restricted use of “universality” and<br />
“spontaneousness” of cognitive development.<br />
Central to the implementation of IAS is the development of storybook and storyboard<br />
assessment compatible with the proposed conceptual model. In the individualized<br />
storybook assessment, target children are guided in reading the storybook and are<br />
questioned and rated using a 4-point Likert scale in accordance with the objectives of the<br />
whole-class reading instruction (e.g. whether they know the main characters after reading the<br />
front cover of the storybook). In the individualized storyboard assessment immediately<br />
following the storybook assessment, children make use of a storyboard to tell the story just<br />
read. They are rated using a 4-point Likert scale according to the situated understanding<br />
that has already emerged in their minds. Areas assessed include degree of participation, utilization<br />
of materials provided, teacher-student interactions, power of expression, completeness of<br />
the story structure exhibited, consistency of story themes, and signs of creativity.<br />
Using the assessment results as feedback for the design of action research, the present<br />
study was successful in scaffolding children of varying learning ability to progress along the<br />
milestones as envisaged in the assessment model.<br />
Assessment in action –<br />
Norwegian secondary-school teachers and their assessment activities<br />
Anne Kristin Sjo, Stord/Haugesund University College, Norway<br />
Knut Steinar Engelsen, Stord/Haugesund University College, Norway<br />
Kari Smith, University of Bergen, Norway<br />
This paper deals with the conference theme “Learning-oriented assessment”, and takes a<br />
closer look at formative assessment amongst teachers in lower-secondary school. The<br />
paper is research-related.<br />
One of the strongest criticisms against Norwegian teachers is their lack of formative<br />
assessment skills. Results from the 2003 PISA study indicate that Norwegian teachers<br />
spend less time on feedback and reinforcement strategies than teachers from other OECD<br />
countries (Grønmo et al., 2004). This is in spite of the fact that international research<br />
considers formative assessment strategies as extremely important determinants for<br />
students’ learning (Black & Wiliam, 1998; Coffield et al., 2004).<br />
The aim of this study is to find out what kind of formative assessment practices are<br />
identified amongst teachers in lower-secondary school, and more specifically, what kind of<br />
feedback processes can be detected and developed between teachers and their students.<br />
The research context is an action research project funded by the Norwegian Research<br />
Council which focuses on developing teachers’ assessment competence. The project is<br />
carried out at two schools involving nine teachers, each developing their own digital<br />
portfolio. During the course of the project the teachers will take note of experiences from<br />
their assessment practice and reflect upon these in relation to literature about assessment.<br />
The paper focuses on teachers’ feedback practices and the analysis is built on a<br />
multileveled ethno-methodological study. The intention behind the study is to gain insight<br />
into both the teachers’ real practices as seen in the classroom and how they themselves<br />
experience and explain their own feedback strategies.<br />
Initially, two teachers were observed over a period of five days. The teachers were<br />
videotaped to show their interactions with the students in an attempt to reveal how the<br />
feedback processes were carried out step by step. Three situations from the video data are<br />
used in an interaction analysis (Jordan & Henderson, 1995) to show the different steps<br />
taken in the feedback processes. The observation period is followed by seven open-ended<br />
teacher interviews (Kvale, 2001). The teachers are asked to comment on various findings<br />
from the interaction analysis and elaborate on their own experiences with different ways of<br />
giving feedback to the students. In addition to an analysis of the teachers’ portfolios, this will<br />
draw a multi-levelled picture of existing practice, and also indicate how it is developing.<br />
The preliminary findings indicate that the formative assessment processes performed in<br />
classrooms and the feedback situations are more complex and have even more layers than<br />
first assumed. The student’s perception of a feedback comment is to a large extent<br />
dependent on the context in which it is given, and the same comment can be understood by<br />
the student as either informative or non-informative, as constructive or destructive, as<br />
feedback or feed-forward, all depending on the context. The interaction analysis shows that<br />
the communication between teachers and students has a certain tacit dimension and one<br />
important factor to be considered in the analysis is whether the interaction involves an<br />
implicit shared inter-subjective (Rommetveit, 1974) understanding between the teacher and<br />
the student or not.<br />
How do student teachers and mentors assess the Practicum?<br />
Kari Smith, University of Bergen, Norway<br />
There is recognition of the importance of Practicum in teacher education (Korthagen et al.,<br />
2001; Smith & Lev Ari, 2005). Understanding of the need for student teachers to gain<br />
access to practitioners' tacit knowledge is expanding (Campbell & Kane, 1998). There is<br />
more to teaching than the direct product of theoretical knowledge and practical skills.<br />
Teaching is highly contextualized, which makes assessment of teaching a complex issue.<br />
A central focus for the Practicum is to help students develop independent reflective<br />
competence for future career-long professional development (Dewey, 1933; Schön, 1987;<br />
Korthagen, 2001; Day, 1999, 2004). Student teachers are expected to recognize when<br />
learning takes place and to recognize what is needed for future learning (Brodie & Irving,<br />
2007). These are internal self-assessment activities related to a specific learning context<br />
and are not easily articulated.<br />
Recently the role of mentors is rightfully receiving increased attention. Smith & Lev Ari (2005)<br />
show that mentors are the most significant contributors to students’ learning during Practicum.<br />
A key function of mentors is assessing students’ teaching competence. This is a difficult<br />
task as assessment serves multiple functions, to present student teachers with feedback<br />
and guidance and to serve summative and judgmental functions to protect the profession<br />
from incompetence (Smith, 2006).<br />
There is tension between the supporting role and the assessment role, especially in relation<br />
to summative assessment (grading). However, mentor opinion is essential to strengthening<br />
the validity of assessment. Mentors know the context of teaching and are able to assess the<br />
appropriateness of actions in that specific setting. Mentors accumulate practical and<br />
non-documented evidence of assessment dialogue.<br />
The focus of the current study is to examine the extent of agreement between students’ and<br />
mentors’ assessment of the Practicum.<br />
A random sample of 20 students and their 20 mentors will be selected after the spring<br />
Practicum, and asked if they agree to respond electronically to an open-ended, structured<br />
questionnaire with the following focus points:<br />
• What is a good Practicum?<br />
• Strong points exhibited by student<br />
• Issues that need to be strengthened<br />
• How to go about implementing alternatives for improvement<br />
• Overall assessment of the practice period (grade).<br />
The responses of the 20 pairs (student/ mentor) will be compared to each other internally:<br />
• Comparison of open responses to each question<br />
• Comparison of grades (final question)<br />
Finally, the responses of the students and the teachers will be analysed separately to look<br />
for group commonalities.<br />
To ensure the validity of the findings the author and an additional researcher will analyse<br />
the data separately. The presented results represent the outcome of a moderation process.<br />
Data collection takes place in March 2008.<br />
Significance: The quality of communication and shared understanding of goals between<br />
students and mentors seem to be a major criterion for a successful Practicum and for<br />
quality assessment of students’ achievements. Until today, research on assessment of the<br />
Practicum is meagre (Graham, 2006), and hopefully the current study will deepen our<br />
understanding of the extent of agreement between students and mentors.<br />
Assessment of competencies of apprentices<br />
Margit Stein, Catholic University of Eichstätt-Ingolstadt, Germany<br />
Within the project ‘LAnf – Leistungsstarke Auszubildende nachhaltig fördern’ (‘Assisting<br />
highly competent apprentices’) of the BIBB (Bundesinstitut für Berufsbildung / Federal<br />
Institute for Vocational Education and Training), instruments and assessments were tested<br />
for diagnosing highly competent apprentices and young professionals.<br />
With reference to the DeSeCo programme of the OECD (‘Definition and Selection of Competencies’),<br />
competencies within the project LAnf were not merely defined as cognitive<br />
competencies but also as achievement motivation with a high willingness for learning as<br />
well as social competencies and autonomy. This definition of competencies within LAnf<br />
adheres to the three aspects of competencies in the DeSeCo programme: the competence to act<br />
autonomously, the effective and interactive use of symbols like language or mathematical<br />
symbols and the effective interaction in various heterogeneous groups. Especially within the<br />
domain of professional education and training a concept that would define competencies<br />
merely as skills would be rather one-sided.<br />
Up to now, mainly theoretical models regarding the concept of “professional competencies<br />
and skills” have been developed, and these have rarely been tested and validated in practice.<br />
Within the project LAnf the assessment for professional competencies was developed on<br />
the assumption that within professional contexts even more than in the context of instruction<br />
within schools effective and competent action relies on factors besides mere cognitive<br />
competence like autonomy and the effective interaction in heterogeneous groups.<br />
Based on these theoretical assumptions within LAnf a psychometric assessment was<br />
developed and tested for assessing highly competent apprentices and young professionals.<br />
In a first step trainers of different enterprises and companies of varying size were asked to<br />
name apprentices and trainees who were outstanding concerning their competencies within<br />
daily work. This group of apprentices, highly competent according to their trainers, was then<br />
in a second step compared with a group of vocational school students that was matched<br />
concerning age, sex and apprenticeship training position. Both groups were confronted with<br />
an assessment based on the three dimensions of competencies of the DeSeCo approach.<br />
The data showed that the group of apprentices rated highly competent by their trainers<br />
(n=52) outmatched the group of vocational school students (n=61) in the domains of<br />
cognitive competencies and intelligence. The first group was even more significantly<br />
superior regarding achievement motivation and effective interaction in various<br />
heterogeneous groups. The matching between professional interest and professional<br />
demands was not significantly different between the two groups. The data show that not only<br />
cognitive aspects but also motivational and social aspects of competencies differ between<br />
groups that display different professional performance.<br />
Academics’ epistemic beliefs about their discipline and implications for their<br />
judgements about student performance in assessments<br />
Janet Strivens, The University of Liverpool, United Kingdom<br />
Cathal O'Siochru, Liverpool Hope University, United Kingdom<br />
In recent years there has been a growing focus within the debates on learning and teaching<br />
in higher education on the importance of the discipline. Academics’ primary professional<br />
allegiance is known to be to their subject and it is increasingly seen as ‘good practice’ to<br />
approach the development of their teaching skills through a disciplinary perspective. This<br />
brings into question what we really know about the nature of disciplines. Do academics<br />
within subjects share a consistent set of beliefs about their subject which do or should<br />
influence the way they teach and assess their students? If so, are these beliefs implicit or<br />
can they be clearly articulated, for the presumed greater benefit of students? Or are there in<br />
fact significant inconsistencies which may lead to different criteria applied to judgements<br />
about the quality of student performance, with the likely result of leaving students confused<br />
and uncertain?<br />
This paper reports on two studies with very different methodologies but a similar focus on<br />
making explicit the beliefs of academics about their subject and implications for their<br />
students. The first study explores ‘epistemic match’ between students and staff (faculty)<br />
using a pair of measures, both based on Hofer’s questionnaire on epistemological beliefs<br />
(Hofer, 2000); the second uses in-depth interviews to explore lecturers’ perceptions of how<br />
and why they make certain judgements about the quality of their students’ work when<br />
carrying out assessments, and what this means in terms of explicating their beliefs about<br />
‘knowledge’ and ‘learning’ in their subject area.<br />
Findings from both studies will be compared to attempt to establish what has already been<br />
learned about the significance of academics’ beliefs about their subject in relation to the<br />
learning of their students, and to draw out lessons for future research in this area.<br />
Techniques for Trustworthiness as a Way to Describe Teacher Educators’<br />
Assessment Processes<br />
Dineke Tigelaar, Jan van Tartwijk, Fred Janssen, Ietje Veldman, Nico Verloop<br />
ICLON-Leiden University Graduate School of Teaching, Netherlands<br />
Portfolios are increasingly being used in teacher education, both as a learning tool and as a<br />
tool for assessment. Since their introduction, portfolios have been expected to contribute to<br />
the learning and development of prospective teachers (Bird, 1990; Zeichner & Wray, 2001).<br />
Teaching portfolios should make prospective teachers think more carefully about their<br />
teaching and subject matter (Anderson & DeMeulle, 1998; Bartell, Kaye & Morin, 1998;<br />
Darling-Hammond & Snyder, 2000). However, portfolio use is often problematic. First, the<br />
potential benefits for student teacher learning often fail to materialize (Darling, 2001).<br />
Second, unambiguous portfolio rating is difficult to achieve, since information in portfolios is<br />
often non-standardized and derived from various contexts (Schutz & Moss, 2004). This implies<br />
that assessors have to interpret portfolio information and take account of context before they<br />
can derive judgments, which causes reliability problems. Therefore, a portfolio procedure is<br />
needed that promotes both student teachers’ learning processes and responsible interpretation<br />
in context. Applying Guba & Lincoln’s (1989) criteria for ‘trustworthiness’ seems promising in<br />
this respect (Tigelaar et al., 2005). Complying with these criteria means that trust must be built<br />
between assessors and student teachers, with assessors being aware of student teachers'<br />
concerns through extensive involvement in their learning processes (‘prolonged engagement’,<br />
‘persistent observation’). Assessors should discuss hypotheses with a peer and search for<br />
counterexamples (‘peer debriefing’, ‘progressive subjectivity’). Interpretations should be tested,<br />
accounting for all available evidence (‘negative case analysis’), and be ‘member checked’ with<br />
student teachers. Interpretations should be documented and conclusions should be supported<br />
by the original data (‘dependability’, ‘confirmability’). Finally, information about assessment<br />
conditions should be available (‘thick description’).<br />
In this study, eight teacher educators participated. Teacher educators acted both as<br />
supervisor and assessor for prospective teachers. We explored how teacher educators’<br />
formative and summative assessment of student teachers can be described using the<br />
framework that Guba and Lincoln provide. Research question: to what extent do teacher<br />
educators’ assessment activities relate to techniques for trustworthiness?<br />
Teacher educators were interviewed about the application of trustworthiness criteria when<br />
working with the portfolio. Questions focused on: (1) using the portfolio and/or other sources<br />
of information for formative and summative assessment; (2) criteria and procedures for<br />
formative and summative assessment; (3) measures for guaranteeing the quality of the<br />
assessment processes. Data were analysed, testing tentative categories derived from Guba<br />
and Lincoln, summarized in matrices, discussed among the first and second author, and<br />
checked with the original interview transcripts and the participants.<br />
‘Prolonged engagement’ and ‘persistent observation’ were applied most by teacher<br />
educators. ‘Negative case analysis’ and ‘member check’ were evident in most interviews.<br />
Documenting (‘dependability’), ‘peer debriefing’ and ‘progressive subjectivity’ were applied, but<br />
received more attention in cases of doubt. Tracing of interpretation processes was applied least<br />
(‘confirmability’, and ‘thick description’). The results suggest that teacher educators need to<br />
make better use of scoring rubrics and artefacts in the portfolio to underpin their<br />
interpretations and conclusions, including their feedback to student teachers. Furthermore,<br />
methods for responsible portfolio interpretation might need to be made less time-consuming<br />
and more practical.<br />
ENAC 2008 95
Peer Assessment for Learning:<br />
a State-of-the-art in Research and Future Directions<br />
Marjo van Zundert, Open University, The Netherlands<br />
Despite the popularity and advantages of peer assessment in education, a major problem has<br />
not yet been solved. An enormous variety of peer assessment practices exists, which<br />
makes it difficult to draw inferences in terms of cause and effect, all the more so since<br />
the literature generally describes peer assessment in a holistic fashion (i.e., without specifying<br />
all variables present). To date, it is unclear exactly under which circumstances peer<br />
assessment is beneficial for student learning, and it remains inconclusive precisely what<br />
produces satisfactory measurement qualities such as reliability and validity. Hence, this study<br />
investigated which variables foster peer assessment that is beneficial for student<br />
learning and yields satisfactory measurements.<br />
We tackled this problem through an inquiry into 26 experimental studies to map variety in peer<br />
assessment and to identify which strategies contribute to learning and measurements.<br />
Literature was selected on the basis of five criteria: (1) published between 1990 and 2007;<br />
(2) published journal article; (3) journal listed in Social Sciences Citation Index, domain<br />
Education & Educational Research; (4) empirical study; (5) main topic is peer assessment<br />
or related term.<br />
This literature inquiry resulted in a descriptive review, in which four outcome categories<br />
were distinguished. The first category concerned measurements of peer assessment.<br />
Measurement issues included, among others, agreement between multiple peer<br />
assessments, or agreement between student and staff assessment. For learning from peer<br />
assessment, three categories were distinguished: domain skill, peer assessment skill, and<br />
student attitudes. Learning of domain skill referred to improved quality of students’ work.<br />
Peer assessment skill concerned students’ competence in assessing peers. Student<br />
attitudes comprised their views on peer assessment. Measurements were enhanced by<br />
training and experience. Domain skill was fostered by providing students with the<br />
opportunity to revise their work on the basis of peer assessment. Peer assessment skill was<br />
ameliorated by training and dependent on student characteristics. Student attitudes were<br />
also positively influenced by training and experience.<br />
The multiplicity of peer assessment practices and the holistic way of reporting were<br />
underlined. Future research should strive for more transparency in peer assessment effects<br />
through true or quasi-experimental studies in which relations between variables are specified,<br />
so that strong inferences in terms of cause and effect can be drawn. Besides higher education,<br />
research can be broadened to vocational and secondary education, considering current<br />
developments there. Topics that need more scrutiny comprise long-term learning effects,<br />
feedback, the role of interpersonal variables, and the distinction between assessing and<br />
being assessed. Also, more clarity in measurement issues and greater uniformity of<br />
measurement instruments are desired.<br />
Investigating the Pedagogical Push and Technological Pull of<br />
Computer Assisted Formative Assessment<br />
Denise Whitelock, The Open University, United Kingdom<br />
Over the last ten years, learning and teaching in higher education have benefited from<br />
advances in social constructivist and situated learning research (Laurillard, 1993). In<br />
contrast, assessment has remained largely transmission orientated in both conception and<br />
in practice (see Knight & Yorke, 2003). This is especially true in higher education where the<br />
teachers’ role is usually to judge student work and to deliver feedback (as comments or<br />
marks) rather than to involve students as active participants in assessment processes.<br />
This paper reports on a project which set out to provide further insights into the role of<br />
electronic formative assessment in Higher Education and to point the way forward to new<br />
assessment practices, capitalising on a range of open source tools. The project built upon<br />
the premise that assessment and learning need to be properly linked. It explored the factors<br />
that influence assessment inputs, processes and outcomes by:<br />
a) Developing a suite of technological tools at different levels of support for collaborative<br />
and free text entry e-assessment<br />
b) Evaluating a series of formative assessments across a number of disciplines.<br />
An agile rather than a plan-driven methodological approach was adopted for<br />
the development of the software and the user evaluation, since the former supports<br />
adaptation rather than prediction. Student surveys and a case study methodology were<br />
employed to understand the pedagogical drivers and barriers associated with these types of<br />
assessment.<br />
Findings<br />
One of the more challenging aspects in the current e-assessment milieu is to provide a set<br />
of electronic interactive tasks that will allow students more free text entry and provide<br />
immediate feedback to them. Open Comment was a system that was built to accommodate<br />
free text entry for formative assessment for History and Philosophy students. It forms part of<br />
the pedagogical push from the Arts Faculty to construct systems that help students decode<br />
feedback, internalise it and become more self-regulated learners.<br />
Other tools developed in this project include a BuddySpace, BuddyFinder and SIMLINK<br />
combination, which helped students work remotely and collaboratively to make<br />
predictions with a science simulation; these predictions were embedded in a series of formative<br />
assessment tasks.<br />
One of the major findings from this project is the creativity of staff, both academic and<br />
technical, in creating formative e-assessments with feedback and collaborative online tasks<br />
that empower students to become more reflective learners. It might appear in the short term<br />
that the technological pull is currently overtaking the pedagogical push in the e-assessment<br />
arena, but this project has shown, with this collection of open source applications, that there<br />
is a way forward to redress the balance. The approach adopted here sits well within a<br />
constructivist paradigm which has often been less well served in the past through formal<br />
summative assessment which is not an integral part of the knowledge construction process.<br />
Strict Tests of Equivalence for, and Experimental Manipulations of,<br />
Tests for Student Achievement<br />
Oliver Wilhelm, Ulrich Schroeders, Maren Formazin, Nina Bucholtz<br />
IQB, Humboldt-University Berlin, Germany<br />
Much of the research on equivalence of measurement instruments across test media is<br />
easy to summarize: Unless a test of maximal behavior is strongly speeded, test media will<br />
be of negligible relevance for what the test measures. For many applied and scientific<br />
purposes, this statement is obviously too simplistic. For example, high disattenuated<br />
correlations across test media do not ascertain the irrelevance of test media. Similarly, two<br />
measures with exactly the same score distributions do not necessarily measure the same<br />
ability underlying observed maximal behavior. The issue of equivalence across<br />
manifestations of a measure is a nuisance because, due to the lack of generalizable results<br />
about the absence of determinants of divergence, the equivalence of two forms of a test has<br />
to be determined for each test, in each application population, and across soft- and hardware<br />
realizations. This scientifically not very intriguing problem is thus a psychometric<br />
Pandora’s box.<br />
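The point that two measures with identical score distributions need not measure the same ability can be illustrated with a minimal simulation (hypothetical data, not from the studies reported here): two forms whose scores share the same marginal distribution but correlate near zero clearly cannot reflect the same underlying construct.

```python
import random

# Hypothetical illustration: scores on form A reflect ability_1, scores on
# form B reflect a distinct ability_2. Both are standard normal, so the
# marginal score distributions match, while the correlation stays near zero.
random.seed(1)
n = 10_000
form_a = [random.gauss(0, 1) for _ in range(n)]  # driven by ability 1
form_b = [random.gauss(0, 1) for _ in range(n)]  # driven by ability 2

def mean(xs):
    return sum(xs) / len(xs)

def corr(xs, ys):
    # Pearson correlation, computed from first principles.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    sx = (sum((x - mx) ** 2 for x in xs) / len(xs)) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / len(ys)) ** 0.5
    return cov / (sx * sy)

print(round(mean(form_a), 2), round(mean(form_b), 2))  # both near 0
print(round(corr(form_a, form_b), 2))                  # near 0: distinct abilities
```

Equal means and variances here say nothing about equivalence of what is measured; that is exactly why equivalence must be checked at the level of covariances with latent variables, as the abstract's analyses do.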
However, a lack of equivalence does not necessarily indicate failure of converting a<br />
measure. Lack of equivalence can also indicate meaningful improvements of a measure.<br />
For example, conventional listening comprehension tasks have a variety of shortcomings<br />
that can be overcome in order to improve measurement quality in computerized testing: By<br />
using computers, it is easier to ensure the same audio stream in the same quality for all<br />
participants, there are more degrees of freedom in administering a task in a group setting<br />
(rewinding, forwarding, pausing), and the response alternatives can be included into the<br />
audio stream. Currently, the importance of such improvements is underinvestigated.<br />
In study one, we administered reading and listening comprehension tests of English<br />
as a foreign language in traditional and computerized versions to a larger sample of<br />
secondary students. Test version and sequence were varied between subjects. An attempt<br />
was made to keep the computerized versions of all measures as close as possible to the<br />
conventional test form even if that implied suboptimal operationalisations of a measure. In<br />
study two, we used modified versions of listening comprehension tests, implementing<br />
not only stimuli but also responses in audio format with a similar sample. Additionally,<br />
completely newly developed video comprehension tests were administered. In both<br />
studies, standard demographic questionnaires and a questionnaire assessing computer<br />
experiences allow for group comparisons of means and covariances in structural equation<br />
modeling. Fluid and crystallized intelligence measures serve as covariates.<br />
The focus of all analyses is on covariances between latent variables in a multi-group context.<br />
The discussion will consider a) advantages and disadvantages of computerized testing from<br />
the perspective of construct validity and b) opportunities for the assessment of hitherto<br />
unmeasurable aspects of student achievement.<br />
Roundtable Papers<br />
Why the moderate levels of inter-assessor reliability of student essays?<br />
Morten Asmyhr, Østfold University College, Norway<br />
Although some studies indicate that inter-assessor reliability is adequate when student<br />
papers in the essay format are considered (e.g. Jonsson & Svingby, 2007), other studies<br />
reveal serious shortcomings as to assessor reliability, both when examination papers and<br />
portfolios are concerned. A number of studies, some of them ancient, are revisited to search<br />
for regularities that might help identify factors that contribute to low marker reliability.<br />
On a recent occasion, all assessors agreed to submit their tentative mark prior to the final<br />
marking session at two separate examinations. The analysis of the results revealed<br />
differences between the two markers of more than 2 steps on a 7-point scale at one of the<br />
examinations. Results on the other examination were somewhat better as to marker<br />
reliability. A small number of the student papers were selected for the second part of the<br />
study. A number of assessors were recruited from the local pool of assessors for the<br />
examination in question to mark the papers individually and to record their practical<br />
procedure when marking. Their use of defined and specified assessment standards and<br />
criteria was made a significant area of concern in their reports. A sample of students sitting<br />
for the same examination was also recruited to mark the same papers. The results from the<br />
whole data set were compiled, analysed and fed back to the group of students for them to<br />
assess the assessment of the whole group of assessors.<br />
In the paper, a more comprehensive survey of studies on assessment reliability will be<br />
presented and the results from the present study will be given and discussed in relation to<br />
practical as well as theoretical concerns pertinent to examination and assessment<br />
procedures. Is it possible to maintain a satisfactory consistency across markers, or do<br />
students have to accept examination results that depend as much upon who the marker is<br />
as upon the quality of the students’ papers?<br />
References<br />
Jonsson, A. & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational<br />
consequences. Educational Research Review, 2(2): 130-144.<br />
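As a hypothetical illustration of the marker-consistency question raised above (the marks below are invented, not the study's data), agreement between two markers on a 7-point scale can be quantified by the share of papers more than 2 steps apart and by a chance-corrected index such as quadratically weighted kappa:

```python
# Sketch: quantifying inter-marker agreement on a 7-point scale.
# All marks below are hypothetical.

def big_discrepancies(marks_a, marks_b, threshold=2):
    """Proportion of papers where the two markers differ by more than
    `threshold` steps (the abstract flags gaps above 2 steps on a 7-point scale)."""
    diffs = [abs(a - b) for a, b in zip(marks_a, marks_b)]
    return sum(d > threshold for d in diffs) / len(diffs)

def quadratic_weighted_kappa(marks_a, marks_b, k=7):
    """Cohen's kappa with quadratic weights, a standard chance-corrected
    agreement index for ordinal marks on a 1..k scale."""
    n = len(marks_a)
    # Observed joint distribution of mark pairs.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(marks_a, marks_b):
        obs[a - 1][b - 1] += 1 / n
    pa = [sum(row) for row in obs]                              # marginal, marker A
    pb = [sum(obs[i][j] for i in range(k)) for j in range(k)]   # marginal, marker B
    w = lambda i, j: (i - j) ** 2 / (k - 1) ** 2                # disagreement weight
    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * pa[i] * pb[j] for i in range(k) for j in range(k))
    return 1 - observed / expected

marker_1 = [4, 5, 2, 6, 3, 4, 7, 1, 5, 4]
marker_2 = [6, 5, 3, 3, 3, 5, 7, 4, 5, 2]
print(big_discrepancies(marker_1, marker_2))                 # prints 0.2
print(round(quadratic_weighted_kappa(marker_1, marker_2), 3))  # prints 0.455
```

A kappa well below common benchmarks for acceptable agreement, alongside a non-trivial share of papers more than 2 steps apart, would be one concrete way to report the kind of marker inconsistency the abstract describes.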
Approaches to the assessment of graduate attributes in higher education<br />
Simon Barrie, C. Hughes, C. Smith<br />
The University of Sydney, Australia<br />
This paper draws on a literature review of the various approaches to the assessment of generic<br />
graduate attributes. The literature review was conducted as the first stage of a national research<br />
study exploring the integration of generic attributes in Australian universities’ assessment<br />
practices (Barrie, Hughes & Smith, 2007). The issue of graduate attributes (also referred to by<br />
some authors as generic, core or employability skills) has received considerable attention in<br />
recent years as universities seek to renew and articulate their purposes and demonstrate the<br />
efficient achievement of these, particularly in response to calls for accountability (Barrie, 2005).<br />
Graduate attributes have been widely taken up by universities in many parts of the world<br />
including Australia. While university policy claims commonly refer to the “integration” and<br />
“embedding” of graduate attributes, questions have been raised in relation to the alignment<br />
between what is espoused, what is enacted and what students experience and learn (Bath,<br />
Smith, Stein and Swan, 2004) in assuring their development. Australian Universities Quality<br />
Agency (AUQA) audits have revealed the need to address generic attributes in curricula more<br />
systematically and to provide stronger evidence in support of institutional claims<br />
than policy statements and relatively superficial mapping activities offer.<br />
It has been argued that the strongest evidence of graduate attribute policy implementation is their<br />
embedding in course and program assessment activities (Barrie, 2004). However, there are<br />
significant barriers to the achievement of this, including: lack of a conceptual basis and consistent,<br />
coherent operational definitions of the intended outcomes, the difficulty of meaningfully<br />
communicating assessment standards to students, the challenge of articulating the<br />
developmental progression, and the temptation to resolve problems by defining skills at an<br />
ever-increasing level of detail, which soon becomes unworkable for academics and students alike<br />
(Barrie, 2005; Washer, 2007). Despite these barriers, the literature contains numerous examples<br />
of approaches to the assessment of graduate attributes which demonstrate a wide diversity of<br />
methods, levels of student involvement and disciplinary contextualisation. These include:<br />
• non-traditional assessments such as moral assessments and exit interviews (Dunbar,<br />
Brooks, and Kubicka-Miller, 2006)<br />
• attempts to develop institutional grade descriptors based on generic attributes (Leask, 2002)<br />
• the development of resources such as templates to guide the design of assessment<br />
(Watson, 2002)<br />
• authentic outcomes based approaches using portfolio assessment (Hernon, 2004;<br />
Seybert, 1994)<br />
• standardised tests such as the Graduate Skills Assessment Test and the Collegiate<br />
Skills Assessment<br />
• self-rating scales (e.g. the CEQ)<br />
• integrated performance based assessment tasks (Hart, Bowden & Watters, 1999)<br />
• the use of postgraduate assessment strategies, e.g. oral presentation and defence, in<br />
undergraduate contexts.<br />
This paper presents a typology of assessment approaches based on an analysis of the types of<br />
assessment strategy. This typology is considered in relation to the barriers to integration<br />
identified in the literature on generic attributes and in relation to emerging theoretical and<br />
conceptual models of graduate attributes. In doing so the paper identifies the potential for<br />
different assessment approaches to overcome the key barriers to assessment of generic<br />
attributes and will stimulate discussion in relation to both theoretical and practical issues.<br />
Assessment for learning in and beyond courses:<br />
a national project to challenge university assessment practice<br />
David Boud, University of Technology, Sydney, Australia<br />
There has been considerable debate in recent years about how assessment can contribute<br />
to student learning. However, most of this has focused on learning within the framework of<br />
the course of study being undertaken. In a changing world, however, assessment also needs<br />
to foster the learning that will occur after course completion, as higher education needs<br />
to provide students with a foundation for a lifetime of professional practice in which they will<br />
be continually required to learn and to engage with new ideas that go beyond the content of<br />
their university course.<br />
As part of this, a critique has been building on the inadequacy of formative assessment<br />
practices that help students’ learning during their courses (e.g. Sadler, 1998; Yorke, 2003).<br />
There has also been substantial criticism of the role of summative assessment and its<br />
negative effects on student learning (e.g. Ecclestone, 1999; Knight, 2002; Knight & Yorke,<br />
2003). There is also concern that simply increasing feedback to students is not, in itself, a<br />
worthwhile practice unless it also builds students’ capacity to critique and improve their own<br />
work (Hounsell, 2003). There is a flourishing literature exploring assessment practices that<br />
have positive effects on learning and there have been important initiatives that look at the<br />
long-term consequences of university courses, including assessment, on subsequent<br />
learning in professional practice (Mentkowski, 2000).<br />
Boud (2000) discussed the needs of assessment in a learning society and introduced<br />
requirements for a new way of thinking about assessment. He suggested that current<br />
assessment practices in higher education did not equip students well for a lifetime of<br />
learning and the assessment challenges they would face in the future. He argued that<br />
assessment practices should be judged from the point of view of whether they effectively<br />
equip students for a lifetime of assessing their own learning. More recently Boud suggested<br />
(2007) that assessment needed to be reconceptualized as an activity of informing judgment,<br />
in particular informing judgements of learners about their own work.<br />
The Carrick Institute for Learning and Teaching in Higher Education, the main funding body<br />
for teaching and learning development for Australian universities, has established a one-year<br />
project to draw on international research to examine how assessment practices that focus<br />
on learning in and after courses can be developed, particularly in areas where there are<br />
large cohorts of students.<br />
The Roundtable discussion will take place at the very start of this project and it will seek to<br />
elicit international collaboration. It will focus on the questions: what assessment practices<br />
have shown potential for both meeting summative purposes but also informing student<br />
judgement in ways that carry beyond the end of courses? What evidence is there for the<br />
utility of such practices? How can they be extended beyond their initial sites of<br />
development? How can the uptake of new assessment practices within universities be<br />
influenced? The approach the project has adopted on these matters will be discussed and<br />
the views of participants on these canvassed.<br />
Electronic reading assessment: The PISA approach for the<br />
international comparison of reading comprehension<br />
Kwok-cheung Cheung, University of Macau, China<br />
Pou-seong Sit, University of Macau, China<br />
This paper seeks to document how the Macau-PISA Center prepares for electronic assessment<br />
of reading literacy for 15-year-old students in secondary schools in Macao. First,<br />
emerging concepts of reading literacy with regard to life-long learning for our next<br />
generation in the digital age will be explicated. Congruence of the proposed concepts of<br />
electronic reading literacy with existing curricular and instructional provisions in Macao is<br />
evaluated. Second, the Reading Literacy Assessment Framework, a response to the OECD<br />
“DeSeCo Project” (i.e. Definition and Selection of key Competences) to include the ICT<br />
(information and communication technology) components as key competences, is<br />
presented to highlight the constructs assessed and nourished in the classrooms. Third, the<br />
paper demonstrates how test items and tasks for electronic assessment of reading literacy<br />
can be designed, and subsequently developed into an individualized computerized testing<br />
platform.<br />
Central to the PISA approach is the definition of reading literacy. According to PISA, reading<br />
literacy is an individual’s capacity to understand, use and reflect on written texts, in order to<br />
achieve one’s goals, to develop one’s knowledge and potential and to participate in society<br />
(OECD, 2006). Reading literacy is assessed in relation to: (1) text format (i.e. continuous<br />
versus non-continuous texts of one of the following five types, i.e. description, narration,<br />
exposition, argumentation and instruction); (2) aspects of the reading processes (i.e.<br />
retrieving information, forming a broad general understanding, developing an interpretation,<br />
reflecting on the contents and formal qualities of a text); and (3) situations (i.e. reading for<br />
work, education, private and public use). This definition goes beyond the basic skills of word<br />
recognition, phonemic awareness, decoding and comprehension, and it requires the reader<br />
to be an active and reflective user of texts so as to expand one’s knowledge and potential,<br />
i.e. one has to understand, apply, integrate and synthesize texts to fulfill one’s life-long<br />
learning goals.<br />
In the electronic medium, the reading tasks generally require students to identify<br />
important questions, locate information in line with the access structure of the reading tasks,<br />
analyze the usefulness of the information retrieved, integrate information retrieved from<br />
multiple texts, and then communicate replies through electronic means. The<br />
electronic texts students come across are therefore dynamic, with blurred boundaries. In the<br />
print medium, reading tasks involve fixed texts with clearly defined boundaries, and what<br />
students do during reading is: (1) retrieve information; (2) interpret texts; and (3) reflect<br />
and evaluate. Delineation of an assessment framework for electronic reading literacy<br />
demands an incorporation and extension of concepts from the print to the electronic<br />
medium. The three distinctive aspects of electronic reading literacy that have implications in<br />
the design of assessment rubrics become: (1) accessing and retrieving appropriate<br />
information online via search engines and embedded hyperlinks; (2) constructing and<br />
integrating texts read recursively in accordance with access structures by clicking links, and<br />
searching for usable information until the reader judges synthesis has been done<br />
meaningfully; (3) reflecting and evaluating critically authorship, accuracy, as well as quality<br />
and credibility of information retrieved and conveyed in the electronic texts.<br />
Developing the autonomous lifelong learner:<br />
tools, tasks and taxonomies<br />
Wendy Clark, Northumbria University, United Kingdom<br />
Jackie Adamson, Northumbria University, United Kingdom<br />
This paper describes an action research project undertaken with undergraduate students at<br />
levels 4 and 5. Responding to the recent focus on lifelong learning and portfolio based<br />
personal development planning (PDP), this ongoing project encourages students to adopt a<br />
deep, active approach to learning, and thus take responsibility for their own learning.<br />
Assessment is widely recognised as an important influence on student learning. Recent<br />
conceptual shifts in thinking about assessment have highlighted the importance of<br />
developing students as autonomous learners by viewing assessment as a learning tool<br />
rather than a measurement of knowledge, and portfolios are mentioned as one of the<br />
modes appropriate for the new thinking about assessment (Havnes and McDowell, 2008).<br />
Therefore, the modules forming the basis of the project, in which the PDP concept was<br />
integrated into the curricular content and supported by the use of an ePortfolio, were<br />
designed following the precepts of Biggs’ theory of ‘constructive alignment’ (Entwistle 2003).<br />
This fits well with the PDP/ePortfolio philosophy for encouraging learner autonomy, as well<br />
as fulfilling the assessment for learning (AfL) requirements for formative feedback and<br />
low-stakes opportunities for practice before submission for rigorous summative assessment.<br />
Although there is still ongoing debate about the criteria to be used for the assessment of<br />
portfolios (Smith and Tillema, 2008), social scientists such as Baume (2002) and Biggs<br />
(1997) have shown that a qualitative view of validity and reliability can ensure adequate<br />
rigour for summative assessment. However, it is necessary to ensure inter-rater reliability as<br />
well as to make the learning goals and assessment criteria transparent for learners (Havnes<br />
and McDowell, 2008). A taxonomy for portfolio evaluation has therefore been developed<br />
which is easily understood and applied by tutors and students.<br />
In order to study the impact of this learning environment, a variety of data has been<br />
collected and analysed. This includes:<br />
• student achievement of the stated learning outcomes of the modules, assessed in<br />
accordance with our taxonomy for portfolio evaluation;<br />
• “added value” as indicated by a correlation of UCAS entry points with summative<br />
assessment results and a measurement of student engagement;<br />
• the quality of student reflection and self-evaluation demonstrated in the reflective<br />
commentaries.<br />
Results from these analyses show a positive impact.<br />
In order to provide more empirical evidence, students this year have completed the<br />
Effective Lifelong Learning Inventory (ELLI) questionnaire (details available at:<br />
https://secure.vlepower.com/nlst/core/main.htm). This profiling tool serves a double<br />
purpose: it provides students with a vocabulary to describe their own thought processes and<br />
to articulate their ideas, and it provides statistical data to tutors which indicate development<br />
of both cohort and individual student’s learning characteristics over time. Preliminary<br />
analysis of these data, together with student opinion obtained in written commentaries and<br />
in debriefing interviews, shows that the learning environment created has brought about<br />
positive change.<br />
We welcome discussion of ways of evaluating student progress towards learning autonomy,<br />
in particular of the effectiveness of the ELLI profiling tool as a measurement of learning<br />
power development.<br />
Assessing the Art of Diplomacy? Learners’ and Tutors’ perceptions<br />
of the use of Assessment for Learning (AfL) in non-vocational education<br />
Gillian Davison, Northumbria University, United Kingdom<br />
Craig McLean, Northumbria University, United Kingdom<br />
This paper will present findings from an authentic assessment project (Assessment for<br />
Learning) undertaken with a group of final-year undergraduate students taking a<br />
(non-vocational) Politics degree who elected to take a module called ‘Diplomacy’ at<br />
Northumbria University.<br />
Teaching on the module comprised not only a mixture of traditional lectures and seminars,<br />
but also a board-game exercise. It is this exercise – that is, students playing the Diplomacy<br />
board-game – that will be the focus of our paper.<br />
The research methodology took the form of non-participant observation and semi-structured<br />
interviews with learners. Data were gathered in relation to the tutor’s and students’<br />
experiences throughout the course of the module. Data were also taken from the learners’<br />
formative assessment activities and the final summative assessment which the learners<br />
were required to undertake.<br />
Students organised themselves into one of seven “teams” based upon the imperial map of<br />
Europe. Their objective was simple: to win the game by being the last “power” standing. The<br />
module is taught to a group of 24 students. This is an optimal number for the Diplomacy<br />
board-game, as it results in teams that are neither too small (i.e., one or two players) nor<br />
too large (i.e., more than five individuals per team).<br />
The Diplomacy board game is a vital learning resource because it allows students to<br />
develop skills in negotiation, bargaining and the agreement of Treaties. The board game<br />
also lets students consider questions such as whether it is ever permissible to lie, cheat or<br />
break promises. This approach to learning requires students to be active learners (anybody<br />
not paying attention is likely to be eliminated from the game!) and involves students<br />
focusing on values, building alliances, cultivating relationships and, most importantly, trust.<br />
Not only is this a vital aspect of standard diplomatic relations, but it also enables students to<br />
meet the module’s learning outcomes (the ability to: critically examine the role of diplomacy<br />
in today’s world order; apply diplomatic thought to real-world situations; and to examine<br />
critically whether current understandings of diplomacy can help to explain the business of<br />
interstate relations).<br />
Over the twelve week period students are required to compile formative assessment<br />
material in the form of seminar logs, detailing their experiences of the Diplomacy board<br />
game. This constitutes some 20% of their overall mark, and serves as a platform for the<br />
extended summative essay that students write at the end of the module.<br />
The paper aims to demonstrate that authentic assessment activities can be used effectively<br />
within non-vocational subject areas and do not necessarily need to be located in areas of<br />
professional practice.<br />
Assessment of oral presentation skills in higher education<br />
Luc De Grez, Martin Valcke, Irene Roozen<br />
University College Brussels, Belgium<br />
Research Problem: Underlying this research is the concept of self-regulated learning from a<br />
social cognitive perspective (Bandura, 1997). A learner acquires standards and must eventually<br />
be capable of comparing his or her actual performance with these standards and of trying to<br />
close the gap. This process generates internal feedback and is often supplemented by<br />
external feedback from teachers and peers. Both forms of feedback have to be accurate,<br />
because accurate calibration seems a necessary condition for productive self-regulated<br />
learning. This demand for accurate calibration can be recast as a reliability problem if we<br />
require the same assessment result whether performance is assessed by teachers, peers,<br />
or by the learner. An overview of the literature about self- and<br />
peer assessment in the domain of oral presentation skills generated some questions and<br />
remarks: Can the optimistic view be maintained that only a simple instruction is needed to<br />
generate peer and self-assessments that are in agreement with assessments by<br />
professionals? And what if there’s no such agreement? In that case, the generalizability<br />
analysis (Brennan, 2000) seems to be a good first step to analyse the error variance. An<br />
under-investigated element is the perceptions students hold of peer and self-assessment.<br />
Research Questions: (1) What is the agreement between peer and self-assessments and<br />
professional assessments? (2) What are the perceptions about peer assessments?<br />
Research Design: Research instruments<br />
Assessment instrument for ‘oral presentation performance’: A rubric was constructed<br />
containing three content-related items (introduction, structure, and conclusion), five<br />
delivery-related items (eye contact, vocal delivery, enthusiasm, contact with the public, and<br />
body language), and one overall item.<br />
Perception of ‘peer assessment’: An existing questionnaire was used and presented twice.<br />
Procedure: First year students (n=57) delivered three short oral presentations about<br />
prescribed topics and the presentations were videotaped. Participants assessed their own<br />
first (n=24) or second (n=54) presentation. Five professional assessors assessed in total<br />
209 recordings. A total of 29 presentations were assessed by six peers.<br />
Research results: Overall, we found a positive correlation between professional and<br />
peer assessment scores (significant for four criteria) and between professional and<br />
self-assessment scores (significant for five criteria). The total score of professional<br />
assessments is significantly lower than the self- and peer assessment scores. However,<br />
scores on eight of the nine items of the rubric differ significantly between professional and<br />
peer assessments.<br />
A two-facet generalizability study was conducted to obtain variance estimates and to<br />
determine the number of peers needed for reliable scores. The analysis of the variance<br />
components showed that the variance in scores related to the oral presentations is low and<br />
the variance component for peers is large. The generalizability coefficient indicates good<br />
reliability (.81), and the results suggest that four peers are sufficient when nine criteria are<br />
used. The perception of peer assessment is predominantly positive and becomes<br />
significantly more positive in the second questionnaire.<br />
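The decision-study logic used here (projecting reliability as the number of peer raters varies) can be sketched with the standard formula for a relative generalizability coefficient; the variance components below are hypothetical, chosen only so that the coefficient with four raters lands near the reported .81.<br />

```python
def g_coefficient(var_person, var_residual, n_raters):
    """Relative generalizability coefficient for a persons-by-raters design:
    true-score variance over (true-score variance + error averaged over raters)."""
    return var_person / (var_person + var_residual / n_raters)

# Hypothetical variance components (not the study's actual estimates):
var_person = 0.50    # variance due to presenters (the object of measurement)
var_residual = 0.47  # rater-by-person interaction plus residual error

for n in (1, 2, 4, 6):
    print(n, round(g_coefficient(var_person, var_residual, n), 2))
```

With these illustrative components, reliability climbs as raters are added and crosses .80 at four raters, which is the pattern the decision study reports.<br />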
References<br />
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: Freeman.<br />
Brennan, R. (2000). (Mis)Conceptions about Generalizability theory. Educational Measurement: Issues and<br />
Practice, 5-10.<br />
Mobile Assessment of Practice Learning:<br />
An Evaluation from a Student Perspective<br />
Christine Dearnley, University of Bradford, United Kingdom<br />
Jill Taylor, Leeds Metropolitan University, United Kingdom<br />
Catherine Coates, Leeds Metropolitan University, United Kingdom<br />
The ALPS CETL* aims to develop and improve assessment, and thereby learning, in practice<br />
settings for health and social care students. The centre is working towards an<br />
interprofessional programme of assessment of common competencies such as<br />
communication, team working and ethical practice among health and social care students.<br />
The assessment tools will be delivered in electronic, mobile format. Between July and<br />
December 2007, ALPS issued nearly 900 mobile devices with unlimited data connectivity to<br />
students undertaking practice based learning and assessment across the ALPS partnership.<br />
ALPS is implementing the infrastructure to develop, deliver and manage learning content and<br />
assessments on mobile devices to students on a large scale across the five partner HEIs.**<br />
The study that forms the basis of this paper is being undertaken across all five partner sites. It<br />
incorporates students from sixteen professions and will investigate the impact of the ALPS<br />
mobile assessment processes on learning and assessment within practice settings over an<br />
eighteen month period. Early outcomes of the study will be reported with an emphasis on the<br />
extent to which assessment of core competencies for practice can be facilitated using the ALPS<br />
mobile assessment processes and the relationships between these processes and learning in<br />
practice settings. The ALPS mobile assessment processes have two further innovative<br />
components, which will be explored as part of this study: inter-professional<br />
assessment of common competencies and service-user involvement in practice assessment.<br />
Whilst there is considerable evidence of mobile devices being used in health and social<br />
care provision, their use for assessment of professional practice is a new and innovative<br />
development that has not been fully evaluated. This study builds on the ALPS IT Pilots,<br />
which explored the feasibility and key issues of using mobile technologies in the<br />
assessment of health and social care students in practice settings and were reported at the<br />
EARLI conference in 2006 (Dearnley &amp; Haigh 2006, Taylor et al 2006). Key benefits were<br />
identified; these included a reduction in paperwork and in the risks of handling paper copies<br />
of assessment data, enhanced communication between peers and tutors leading to increased<br />
professional interactions, and indications that in some cases mobile devices helped students<br />
to overcome barriers to writing and instilled pride in their work. The project team is committed<br />
to further exploring the full pedagogic potential of this initiative.<br />
*Assessment &amp; Learning in Practice Settings is a Centre for Excellence in Learning &amp;<br />
Teaching (CETL) funded by the Higher Education Funding Council for England<br />
http://www.alps-cetl.ac.uk/<br />
**Universities of Bradford, Leeds, Huddersfield, Leeds Metropolitan and York St John<br />
University College<br />
References<br />
Dearnley, C.A., &amp; Haigh, J. (2006). Using Mobile Technologies for Assessment and Learning in Practice<br />
Settings: A Case Study. Third Biennial Joint Northumbria/EARLI SIG Assessment Conference, 30th Aug-1st<br />
Sept, Co. Durham, UK.<br />
Taylor, J.D., Coates, C., Eastburn, S., &amp; Ellis, I. (2006). Using mobile phones for critical incident assessment<br />
in health placement practice settings. Third Biennial Northumbria/EARLI SIG Assessment Conference.<br />
How reliable is the assessment of practice, and what is its purpose?<br />
Student perceptions in Health and Social Work<br />
Margaret Fisher, University of Plymouth, United Kingdom<br />
Tracey Proctor-Childs, University of Plymouth, United Kingdom<br />
Introduction: The Centre for Excellence in Professional Placement Learning (Ceppl) is<br />
based at the University of Plymouth in Devon, England. This Centre seeks to share and<br />
develop excellent practice in collaboration with other disciplines which have a placement or<br />
practice component (QAA 2003).<br />
One research strand is evaluating practice assessment methods in Midwifery, Social Work<br />
and Post-registration Health Studies. A multi-disciplinary team representing all three of<br />
these professional groups and comprising students, service-users, practitioners and<br />
academics is currently working on this project.<br />
This paper reports on the findings of Years One and Two of a three-year longitudinal study,<br />
which commenced in June 2006. Staff focus groups are concurrently being undertaken, but<br />
results from these will be reported at a later date. The literature clearly suggests that validity<br />
and reliability are fundamental to the success of an assessment, but are difficult to achieve<br />
(Chambers 1998, Calman et al 2002, Crossley et al 2002, McMullan et al 2003).<br />
Assessment of competence in practice is crucial in determining whether or not a student<br />
meets the criteria required of their profession (Cowan et al 2005, Watkins 2000). Early<br />
findings of the study raise important issues in relation to this existing evidence, as well as<br />
identifying further avenues for investigation. Once the study is complete, generic guidelines<br />
and resources will be developed to inform cross-professional assessment of practice in<br />
placement settings which should be transferable internationally.<br />
Methodology: The aim of the project is to explore the student experience of the practice<br />
assessment process during a professional programme of study. Perceptions of validity and<br />
reliability of assessment methods as well as the impact of the process on the student<br />
learning experience are being explored. Multi-centre Research Ethics Committee approval<br />
was obtained for the study. An average of five students per professional group are<br />
participating in longitudinal case studies throughout their two to three-year programme.<br />
Semi-structured interviews are tape-recorded after submission of the practice assessment<br />
documents at the end of each year, and students are invited to add any further contributions<br />
during the year as they see fit. Single-case and cross-case analysis and synthesis is being<br />
conducted using the “Framework technique” (Ritchie and Spencer 1994).<br />
Findings: Analysis of transcripts from the first two years has resulted in the identification of<br />
key themes: Purpose, Process, and Guidance. Practicalities of the methods used and the students’<br />
perception of the purpose of assessment have been discussed. The role of the practice<br />
assessor and the placements themselves have been identified as key areas. An interesting<br />
sub-theme around honesty and integrity – “cheating the system” – has emerged as an issue<br />
of importance. This is being explored further in view of the future professional roles of the<br />
students. Information gained has already informed delivery and structure of some of the<br />
professional programmes and their practice assessment methods. Reports on the findings<br />
may be accessed on the Ceppl website at: www.placementlearningcetl.plymouth.ac.uk.<br />
Journal publication is in progress.<br />
Measuring variance and improving the reliability of<br />
criterion based assessment (CBA): towards the perfect OSCE<br />
Richard Fuller, Matthew Homer, Godfrey Pell<br />
University of Leeds, United Kingdom<br />
Background<br />
Assessment methodologies have increasingly come under the spotlight with respect to both<br />
reliability and validity. In healthcare settings, the traditional unstructured ‘long and short<br />
cases’ have given way to the OSCE (Objective Structured Clinical Examination) where<br />
students undertake a series of short clinical assessments which are objectively assessed<br />
against predetermined criteria. The OSCE is a prime example of CBA in health care<br />
programmes, allowing careful blueprinting, spread of domains, clarity of assessment mark<br />
sheets, standard setting and metrics to look thoughtfully at the performance of the<br />
assessment.<br />
CBA has a number of obvious weaknesses, typically in that:<br />
- item-based checklists can highly reward a scattergun approach by candidates;<br />
- it can be difficult to reward better performers;<br />
- there is strong reliance on assessor behaviour despite item-based checklists;<br />
- they are labour-intensive and costly;<br />
- tensions exist between (face) validity on the one hand and standardisation and<br />
reliability on the other.<br />
A variety of metrics can be used in the process of defining, exploring and correcting error<br />
variance (variance in marks due to factors other than student performance). This paper<br />
explores Leeds’ experience and research in this area, defining measures for error variance<br />
and methods of reducing variance whilst maintaining strong clinical validity.<br />
Summary of work<br />
This paper will provide a brief overview of the OSCE process, and analysis of final-year<br />
results from recent years will be presented. We have found that between-assessor variance<br />
(the proportion of checklist mark/grade variance attributable to assessors out of the total<br />
mark/grade variance) in many cases exceeded 25% and in some cases exceeded 40%.<br />
Interpretation of the raft of station metrics allowed us to identify causes of both random and<br />
systematic error.<br />
This paper looks at issues such as assessor training, gender interactions and checklist<br />
structure, and shows how these issues were addressed to reduce the mean station<br />
variance to below 20%.<br />
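The between-assessor metric described above can be illustrated with a naive variance split on a fully crossed marks table (students in rows, assessors in columns); the data are invented, and the calculation ignores the interaction terms and sampling corrections a full generalizability analysis would apply.<br />

```python
# Checklist marks for 6 students assessed by 3 assessors (hypothetical data,
# built so the assessor share comes out near the mid-range figures reported).
marks = [
    [12, 15, 13],
    [13, 16, 14],
    [14, 17, 15],
    [15, 18, 16],
    [16, 19, 17],
    [17, 20, 18],
]

n_students = len(marks)
n_assessors = len(marks[0])
all_marks = [m for row in marks for m in row]
grand_mean = sum(all_marks) / len(all_marks)

# Between-assessor variance: spread of the assessor (column) means.
assessor_means = [sum(row[j] for row in marks) / n_students
                  for j in range(n_assessors)]
between_assessor = sum((m - grand_mean) ** 2 for m in assessor_means) / n_assessors

# Total variance of all marks around the grand mean.
total = sum((m - grand_mean) ** 2 for m in all_marks) / len(all_marks)

print(f"assessor share of variance: {between_assessor / total:.0%}")
# → assessor share of variance: 35%
```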
Conclusions<br />
Tensions between reliability and validity continue to be important in complex CBA<br />
arrangements. This philosophical tension does have demonstrable effects – and we can<br />
use variance to examine this, and the impact of changes.<br />
Despite our best efforts, between assessor variance persists, perhaps as a result of varying<br />
perceptions of appropriate ‘standards’ for students at different stages of their courses,<br />
because of varying levels of assessor maturity and confidence in dealing with checklist<br />
items. Whilst we have made significant improvements by addressing specific issues<br />
detailed in this paper, it is important to recognise that error variance in complex, high stakes<br />
criterion based assessment remains an ongoing challenge.<br />
Learning through assessment and feedback:<br />
implications for adult beginner distance language learners<br />
Concha Furnborough, The Open University, United Kingdom<br />
Feedback on marked assignments is an important element in the learning process,<br />
especially in distance learning, where it can provide students not only with a measure of<br />
their progress but also with individualised tuition (Cole et al., 1986), and may be the sole<br />
channel for student-tutor communication (Ros i Solé & Truman, 2005: 88). Feedback also<br />
makes an important contribution to motivation (Walker & Symons, 1997: 16-17). This paper<br />
reports specifically on distance learner perceptions of positive tutor feedback, together with<br />
cognitive and affective responses they may generate.<br />
One of the challenges of studying a language at a distance is managing interpersonal and<br />
communicative aspects of language acquisition (Sussex, 1981: 180); in the Open University<br />
(UK) students are offered a supported distance course, which includes tutor feedback on<br />
assignments. In this model feedback has a dual function, being used for both formative and<br />
summative assessment purposes. Anecdotal evidence suggests that students attach far<br />
greater importance to the latter than to the former, although it can be argued that learning<br />
occurs when students perceive feedback not simply as a judgement on their level of<br />
achievement but as enabling learning (Maclellan, 2001, cited in Weaver, 2006: 380-381).<br />
Learning depends not only on the quality of the feedback but also on students’ responses to<br />
it, according to how they interpret it.<br />
The research presented here is part of a larger study on motivation that gathered data<br />
through questionnaires and interviews. These findings draw mainly on data obtained from<br />
56 telephone interviews with students of Spanish, French and German at the midpoint of<br />
their courses. The interviews covered themes associated with motivation, including<br />
approaches to distance language learning, support in language learning, confidence and<br />
progress; so they enabled us to situate learner perspectives on tutor feedback in the context<br />
of their views on other aspects of their learning.<br />
Our results suggest that this concept of feedback as a learning tool is especially important<br />
for beginner language learners in distance learning settings, and one that also acts as a<br />
vehicle for increasing their self-confidence – an important consideration in terms of<br />
motivation maintenance. We would also argue that some learners in this category need little<br />
help in discovering how to use feedback to these ends, whereas others require<br />
considerable support, guidance and encouragement.<br />
Although our target group was beginners in a distance learning context, the findings may<br />
also be applicable to other levels and learning contexts.<br />
Practice-related discussion<br />
Suggested areas for discussion are:<br />
• implications for raising learner awareness of the teaching and learning function of<br />
feedback;<br />
• training of tutors to be aware of students’ needs in terms of feedback;<br />
• the potential of feedback to engage students in active learning, and to enhance their<br />
self-confidence and motivation.<br />
Secret scores: Encouraging student engagement with useful feedback<br />
Stuart Hepplestone, Sheffield Hallam University, United Kingdom<br />
This short paper session will discuss the use of technology to provide useful feedback to<br />
students. It explores the development of, and presents initial findings from ongoing<br />
research into the practical experience and impact of, two separate yet complementary<br />
tools at Sheffield Hallam University (SHU) that enhance the way feedback can be provided<br />
to students and encourage students to engage with their feedback through the institution’s<br />
virtual learning environment, Blackboard.<br />
Students at SHU are increasingly expecting access to their feedback and marks online,<br />
often remarking on the usefulness of online feedback as a way to track their progress on<br />
different assessment tasks for their modules. To meet these rising expectations, the<br />
University undertook a project to enhance the way feedback can be provided through<br />
Blackboard (Hepplestone & Mather 2007). A key aspect of this project was the development<br />
of a customised assignment handler extension which supports effective online feedback<br />
through the Blackboard Gradebook by enabling tutors to batch upload feedback file<br />
attachments along with student marks, providing feedback on group assignments to each<br />
individual in the group, presenting student feedback all in one place and close to their<br />
learning, and encouraging students to engage with their feedback to trigger the release of<br />
their marks (after Black & Wiliam, 1998, who argued that the “effects of feedback was<br />
reduced if students had access to the answers before the feedback was conveyed”). (A<br />
poster presentation, Useful feedback and flexible submission: Designing and implementing<br />
innovative online assignment management, accompanies this short paper to explore the<br />
development process).<br />
Accompanying this development is an electronic feedback wizard. This tool allows tutors to<br />
quickly generate consistent individual feedback documents for an entire student cohort specific<br />
to each assignment created in Blackboard from a generic feedback template containing a<br />
matrix of assessment criteria and feedback comments (Hepplestone & Mather, 2007). This<br />
initiative stems from various systems developed and used by individual colleagues at SHU,<br />
paralleling the work of Denton (2001) who developed a technique using a combination of<br />
Microsoft Excel and Microsoft Word to generate personalised feedback sheets.<br />
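The matrix-driven generation that Denton's technique and the SHU wizard perform can be sketched in a few lines; the criteria, comment bank, and band scores below are entirely invented for illustration.<br />

```python
# Hypothetical comment bank: for each criterion, one comment per band (0=low, 2=high).
comment_bank = {
    "structure": ["Needs a clearer structure.",
                  "Generally well organised.",
                  "Excellent, logical structure."],
    "referencing": ["Referencing is incomplete.",
                    "Referencing is mostly accurate.",
                    "Referencing is exemplary."],
}

def feedback_sheet(name, bands):
    """Assemble a personalised feedback sheet by looking up each
    criterion's band score in the shared comment matrix."""
    lines = [f"Feedback for {name}:"]
    for criterion, band in bands.items():
        lines.append(f"- {criterion}: {comment_bank[criterion][band]}")
    return "\n".join(lines)

print(feedback_sheet("Student A", {"structure": 2, "referencing": 1}))
```

The appeal of the approach is consistency: every student's sheet draws on the same criteria matrix, while the per-student band scores individualise the result.<br />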
SHU is a large UK University with over 28,000 students, based across three campuses,<br />
offering a diverse range of undergraduate and postgraduate courses.<br />
References<br />
Black, P. and Wiliam, D. (1998) Assessment and classroom learning. Assessment in Education, 5 (1), pp.7-74.<br />
Denton, P. (2001) Generating Coursework Feedback for Large Groups of Students Using MS Excel and<br />
MS Word, [online]. University Chemistry Education, 5 (1), pp.1-8. Last accessed 12 February 2008<br />
at: http://www.rsc.org/pdf/uchemed/papers/2001/p1_denton.pdf<br />
Denton, P. (2001) Generating and e-Mailing Feedback to Students Using MS Office, [online] In: Proc. 5th<br />
International Computer Assisted Assessment Conference, Loughborough, 2-3 July 2001. Learning<br />
and Teaching Development, Loughborough University. Last accessed 12 February 2008 at:<br />
http://www.caaconference.com/pastConferences/2001/proceedings/j3.pdf<br />
Hepplestone, S. & Mather, R. (2007) Meeting Rising Student Expectations of Online Assignment<br />
Submission and Online Feedback, [online] In: Proc. 11th Computer-Assisted Assessment<br />
International Conference 2007, Loughborough, 10-11 July 2007. Learning and Teaching<br />
Development, Loughborough University. Last accessed 12 February 2007 at:<br />
http://www.caaconference.com/pastConferences/2007/proceedings/Hepplestone%20S%20Mather%<br />
20R%20n1_formatted.pdf<br />
Large-Scale Assessment and Learning-Oriented Assessment:<br />
Like Water and Oil or new Possibilities for Future Research Directions?<br />
Therese Nerheim Hopfenbeck, University of Oslo, Norway<br />
For better or worse, large-scale assessments seem to be here to stay. Surveys such as the<br />
Programme for International Student Assessment (PISA) have had a huge impact on<br />
national educational policy in several countries, and will probably continue to do so.<br />
The aim of the current work is to bridge the gap between the fields of educational<br />
psychology concerned with learning-orientated assessment and the field of large-scale<br />
assessment and the need for policy relevant data.<br />
The present paper makes two arguments. First, I will argue that, despite critiques, large-scale<br />
assessments offer valuable information to the field of educational research. They can<br />
play a valuable role in the development of comprehensive assessment systems that also<br />
include learning-oriented assessment.<br />
Secondly, questionnaires in large-scale assessments such as PISA can be used in<br />
combination with small-scale research, such as interviews, to investigate in depth<br />
some of the main findings from large-scale assessment. Bringing qualitative small-scale<br />
research together with large-scale assessment can lead to improvements in the research<br />
methods used for improving classroom assessment.<br />
Using a mixed method approach, combining quantitative findings from PISA 2006 with<br />
qualitative data from an interview study in Norway, descriptions of students’ self-beliefs of<br />
learning, achievement and assessment will be presented. I will show how such studies<br />
might be carried out and contribute to a deeper understanding of assessment. The<br />
relevance of the current research is based upon a review of large-scale assessment and<br />
its policy influence after 1970, together with the research-based principles from the<br />
Assessment Reform Group (2002).<br />
In addition to the quantitative material from the PISA 2006 test, the empirical base for the<br />
discussion includes comparisons between the low achieving students and high achieving<br />
students on the following factors:<br />
• How students experienced the PISA test<br />
• Task format<br />
• Schools preparation for the PISA test<br />
• Students’ test motivation<br />
• Different assessment cultures<br />
Together, these mixed-method approaches offer a thick description of students’ experience<br />
of large-scale assessment, its consequences and challenges.<br />
Finally, suggestions for combining large-scale assessment with classroom assessment are<br />
made, in an attempt to further empower students in their learning process and to help them<br />
develop as self-regulated learners, or what PISA calls “learners for tomorrow's world”, who<br />
are able to monitor their own learning.<br />
Online interactive assessment for open learning<br />
Sally Jordan, Philip Butcher, Arlëne Hunter<br />
The Open University, United Kingdom<br />
This paper describes recent developments in the formative, summative and diagnostic use<br />
of e-assessment at the UK Open University, in particular the development of interactive<br />
computer marked assignments (iCMAs). These are being introduced within a coordinated<br />
initiative that is extending the richness of e-assessment tasks within an integrated and<br />
supported pedagogical model.<br />
The iCMAs include many different question types, some of considerable complexity and<br />
involving elements of constructed learning. It is widely recognised that rapidly received<br />
feedback on assessment tasks has an important part to play in underpinning student<br />
learning, encouraging engagement and promoting retention (see for example Rust et al,<br />
2005, op cit; Yorke, 2001). Online assessment provides an opportunity to give virtually<br />
instantaneous feedback. However, providing automatically generated feedback which is<br />
targeted to an individual student’s specific misunderstandings is more of a challenge,<br />
especially in response to answers entered as free-text. Students are allowed three attempts<br />
at each iCMA question, with tailored and increasingly detailed prompts allowing them to act<br />
on the feedback whilst it is still fresh in their minds and so to learn from it (Gibbs and<br />
Simpson, 2004, op cit). Feedback can also be provided on the student’s demonstration of<br />
learning outcomes developed in the preceding period of study.<br />
Evaluation methodologies have included student observation, comparisons against human<br />
marking and a ‘success case method’ approach. Preliminary results indicate that the<br />
systems are robust and accurate in marking, that students enjoy the iCMAs (even when<br />
used summatively) and that they usually engage with the feedback provided. The system<br />
automatically collects information about student interactions, enabling the tracking of<br />
individual students’ progress (and if necessary the provision of additional support) as well<br />
as wider-ranging insights into students’ understanding of the course material.<br />
The formative capabilities of computer based assessment tasks such as those described<br />
are of particular importance in distance learning contexts, because of the ability to mimic a<br />
‘tutor at the students’ elbow’, irrespective of the geographical location of the learner and<br />
tutor (Ross, Jordan and Butcher, 2006). They enable dialogue about standards of<br />
achievement and act as a proxy for the immediately accessible learning community enjoyed<br />
by face-to-face students.<br />
Although the paper emphasises the open and distance learning context, we will encourage<br />
discussion of wider applicability. We share the view that e-assessment has the potential to<br />
‘significantly enhance the learning environment’ (Whitelock and Brasher, 2007), and will<br />
seek to challenge perceptions of e-assessment as being of limited validity and relevance. In<br />
so doing we will explore reasons for its relatively low uptake.<br />
References<br />
Ross, S.M., Jordan, S.E. and Butcher, P.G. (2006) Online instantaneous and targeted feedback for remote<br />
learners. In Innovative assessment in Higher Education ed. Bryan, C & Clegg, K.V., pp123-131<br />
London U.K., Routledge.<br />
Whitelock, D and Brasher, A (2006) Roadmap for e-assessment. JISC. At<br />
http://www.jisc.ac.uk/elp_assessment.html [accessed 1st February 2008].<br />
Yorke, M (2001) Formative assessment and its relevance to retention, Higher Education Research &<br />
Development, 20(2), 115-126.<br />
Can inter-assessor reliability be improved by deliberation?<br />
Per Lauvås, Østfold University College, Norway<br />
Gunnar Bjølseth, Østfold University College, Norway<br />
Anton Havnes, University of Bergen, Norway<br />
From previous studies it is evident that inter-assessor reliability varies from nearly zero to<br />
almost complete match. However, the reliability often seems to be lower than what is<br />
considered acceptable, at least when expressive assignments are considered. In one health-<br />
related study programme, several indications (e.g. from handling appeals) raised serious<br />
concerns as to marker reliability after the number of assessors had been cut back. Standard<br />
procedure is for the assessors to assign a mark and produce a written justification for easy<br />
processing when students use their legal right to receive feedback.<br />
The final summative assessment is an integrative, across-the-modules home examination<br />
where students are assigned a thematic field and required to choose perspectives and<br />
cases based on their own priorities and experience. The teaching is organised in theme-<br />
specific modules while the final assignment is integrative. All teachers are involved in the<br />
final assessment, and an assessor will mark assignments that are close to or further away<br />
from his or her field of expertise.<br />
The Department of Nursing Education decided to run six workshops (full or half day) for all<br />
academic staff involved in the bachelor programme over a one year period. Prior to each<br />
workshop a set of authentic, recent student papers were distributed to all teachers (i.e.<br />
internal assessors) for individual, independent marking and with the requirement to produce<br />
the written justification (feedback) to support the mark. Student papers covered all three<br />
years of the Bachelor programme, as well as the whole 6 step range of marks (A to Fail).<br />
The workshops had three parts: (a) thematic introduction, (b) deliberations in groups to<br />
arrive at a conclusion as to mark assigned to a specific student paper, and (c) recording of<br />
results from all groups and a subsequent plenary summary and discussion. Individual and<br />
group assessments (grades and justifications) were collected, analysed and fed back to the<br />
participants, also serving as background for selecting an assessment approach to be tested<br />
out in the next workshop.<br />
Emphasis was placed on the assessors’ interpretation and application of assessment<br />
criteria. The question to be scrutinised was whether a systematic process of collegial<br />
deliberations over the assessment of authentic student papers in relation to assessment<br />
criteria and feedback/justification would result in improved inter-assessor reliability. The<br />
‘assessment of the assessors’, conducted in the final workshop (Oct. 2007), showed that<br />
assessment reliability had improved, but only marginally; the variation between individual<br />
assessors’ grading and valuing of the quality of students’ assignments still fell short of the<br />
standards considered appropriate by the faculty. It does seem to be the case, however,<br />
that the written justifications (‘feedback’) of the given marks did change.<br />
In the paper, the background for the project will be elaborated, and the process and results<br />
will be analysed and discussed: Is it realistic to improve inter-assessor reliability to an<br />
acceptable level by deliberation among colleagues? What challenges does the assessment<br />
of integrative, expressive assessment tasks represent, and how could they be met?<br />
ENAC 2008 115
Sketchbooks and Journals: a tool for challenging assessment?<br />
Paulette Luff, Anglia Ruskin University, United Kingdom<br />
Gillian Robinson, Anglia Ruskin University, United Kingdom<br />
This paper highlights aspects of our experience as exploratory practitioners researching the<br />
use and the value of sketchbooks and learning journals as a form of assessment. We report<br />
our developing understandings of ways in which these can support and extend students’<br />
learning within the context of an Art, Design, Technology module and an Early Childhood<br />
Curriculum module, both for undergraduate students of education. Within our Early<br />
Childhood Studies (ECS) and Primary Education BA courses we emphasise approaches to<br />
young children’s education informed by socio-cultural theories. This promotes a view of<br />
learning which stresses the importance of shared meaning making and the co-construction<br />
of knowledge. Accordingly, we draw upon the Vygotskian concept of pedagogical tools,<br />
mediating and extending knowledge construction, and emphasise a close relationship<br />
between means of assessment and student learning. Sketchbook research journals have<br />
been used as part of the assessment for the Art, Design Technology and Control<br />
Technology module for several years. The module is delivered through lectures, practical<br />
workshops, ICT workshops, and self- and tutor-directed learning over a period of 12 weeks.<br />
Students are challenged to make and programme a 3D working model based on a work of<br />
art, to create a teaching aid that makes effective use of cross-curricular approaches. They<br />
use the sketchbook learning journal to maintain a record of thinking and decision making<br />
during the development of this project. The Early Childhood Curriculum module is studied<br />
over 24 weeks with sketchbook learning journals used to capture and explore<br />
understandings of this topic (from lectures, workshops, fieldwork and wider reading). Our<br />
project is, therefore, based upon a multiple case study design, apt for monitoring and<br />
explaining educational practices (Sanders, 1981; Merriam, 1998). Most data gathering is<br />
integrated into the module programmes with the sketchbook journals themselves forming<br />
important sources of qualitative data, together with staff and students’ reflection on the<br />
processes. Evidence, from our initial analysis of findings, indicates that sketchbook learning<br />
journals can provide a means for students to capture, synthesise, reflect upon and critique<br />
their learning. By making learning visible they also offer a rich source for assessing the<br />
processes of student learning and assisting our understandings and development as<br />
teachers. In considering sketchbooks as challenging assessment tools, we address the<br />
ways that using sketchbooks challenges traditional forms of summative assessment by<br />
requiring that students show their developing thinking and learning throughout a module,<br />
with built-in opportunities for formative tutor, peer and self-assessment. There is also the<br />
challenge of some clash of philosophy as, although we are advocating a constructivist<br />
approach to learning, in our current system all students must achieve pre-set module<br />
learning outcomes. It is also challenging for students, as material has to be synthesised and<br />
documented in ways that communicate their ideas and demonstrate higher-order thinking,<br />
and this must be sustained throughout a module. We anticipate that these points may prove<br />
fruitful for discussion.<br />
Evaluating the use of popular science articles for assessing<br />
high school students<br />
Michal Nachshon, Ministry of Education, Israel<br />
Amira Rom, The Open University, Israel<br />
Alternative assessment is a way to assess students' achievements whereby teachers<br />
assess students through authentic tasks in which the student is required to formulate the<br />
problem; tasks which allow different solutions and provide the opportunity to reflect on the<br />
learning process. The majority of students in Israel graduate from high school after completing at<br />
most one year of science studies. For these students, a new program, Science for All, is<br />
now being offered at the high-school level as an alternative to the traditional natural science<br />
courses. This program encourages the teaching of science in a more thematic way,<br />
integrating the different scientific disciplines and aspects of technology. The intention is to<br />
expose all students to scientific principles and, consequently, to extend their understanding<br />
of them.<br />
Popular science articles, published in a variety of newspapers and magazines, can be a<br />
powerful tool to help students connect what they learn in school to current scientific and<br />
technological advancements. Further, popular science articles can provide opportunities for<br />
students to read critically, discuss issues and reach decisions based on their knowledge of<br />
science.<br />
The purpose of the study was to evaluate the use of authentic tasks, based on popular<br />
science articles for assessing Science for All students. The results presented in this<br />
summary are part of an ongoing longitudinal study of the use of popular science articles in<br />
instruction and assessment.<br />
The sample consists of 57 teachers in 40 schools nationwide. At the end of the school year<br />
Science for All teachers were asked to choose a popular science article and use it for the<br />
development of an assignment including scoring rubrics. They were then asked to send us<br />
the assignment, the scoring rubrics, and a sample of three students’ work corresponding to<br />
excellent, medium and poor grades. In addition, teachers filled in a written questionnaire in<br />
which they were asked to characterize the assignment they developed and reflect on their<br />
experience.<br />
Specifically, teachers were asked to identify the learning goal assessed by the assignment,<br />
including both concepts and skills; to specify the cognitive levels required by each item in<br />
the assignment; and to indicate which abilities of multiple intelligences are represented in<br />
their assignment.<br />
Two independent expert teachers evaluated each assignment and sample student work.<br />
Next, assessment experts reviewed these evaluations and summarized the strengths and<br />
weaknesses of each assignment. At the end of the process, each teacher received written<br />
feedback and discussed this feedback with his or her assigned expert teacher.<br />
Our findings show that teachers include both low- and high-level cognitive questions in the<br />
assignments. With respect to multiple intelligences, teachers tend to include tasks that<br />
require linguistic and logical abilities but not other abilities. In addition, three main difficulties<br />
were identified: Teachers had trouble identifying valid learning goals; in some cases,<br />
teachers failed to recognize the skills that were assessed; and typically, the scoring rubrics<br />
did not match the learning goals the teachers intended to assess. We believe that this<br />
process can help teachers become more knowledgeable about the desired characteristics<br />
of assessment.<br />
Supporting student intellectual development through assessment design:<br />
debating ‘how’?<br />
Berry O'Donovan, Oxford Brookes University, United Kingdom<br />
Margaret Price, Oxford Brookes University, United Kingdom<br />
Prior research suggests that students move through stages of intellectual development<br />
(whilst in higher education) in which their beliefs about the nature of knowledge and learning<br />
change and grow in complexity. The best known of these models is probably Perry’s (1970),<br />
alongside the influential work of Belenky et al. (1986), King and<br />
Kitchener (1994) and Baxter Magolda (1992). However, the literature is less clear about<br />
how such intellectual development can be triggered and encouraged through assessment<br />
and learning activities.<br />
Vygotsky’s (1978) seminal work on social constructivism and ‘zones of proximal<br />
development’ conceptualises learning development in incremental terms: students<br />
advance to nearby learning positions that some of their peers already hold and share with<br />
them. Arguably, this suggests tutors and assessment designs should provide the cognitive<br />
scaffolding that would support collaborative, incremental and seemingly comfortable<br />
development.<br />
However, other perspectives on intellectual development such as Meyer and Land’s (2003)<br />
work on threshold concepts and the narratives within Baxter Magolda’s (1992) work on<br />
intellectual development paint a less comfortable picture. Meyer and Land posit that there<br />
are disciplinary concepts that once understood by students lead to new and previously<br />
inaccessible ways of thinking. Such intellectual movement involves students entering into a<br />
‘liminal space’ where they have moved out of familiar cognitive territory into a zone of<br />
disorientation where existing certainties are rendered problematic before they can cross the<br />
threshold into a new landscape of understandings. Baxter Magolda’s (1992) narratives also<br />
contain student reflections on critical incidents that seemingly thrust them uneasily up the<br />
intellectual development ladder, revealing the development as sometimes both erratic and<br />
disquieting.<br />
So what does this mean for assessment? Taking a Vygotskian approach may involve<br />
adopting an assessment design involving low stakes, scaffolded, collaborative assessment<br />
activity that allows for ‘slow learning’ (Claxton, 1998 cited in Knight and Yorke, 2002). In the<br />
initial stages of an undergraduate degree this may also involve designing assessment tasks<br />
that align with lower level epistemological beliefs, i.e. content focused assessment that<br />
reflects factual material verified by an authority.<br />
Alternatively, if we consider intellectual development as uneven and inconsistent, and take<br />
the stance that students need to confront ‘troublesome knowledge’ (Meyer and Land, 2003)<br />
and make disquieting intellectual leaps that cross learning thresholds, then what<br />
assessment designs would we choose? Arguably, such a stance may involve assessment<br />
designs that involve: an ‘unfreezing’ process (Lewin, 1951) to provoke students out of<br />
current comfortable orientations; assessment tasks that ‘problematise’ the subject (Grey et<br />
al., 1996); student discomfort; and tasks that provoke higher-order epistemological stances.<br />
The discussion will explore the nature of intellectual development, including diverse<br />
disciplinary epistemologies, and the implications for assessment design. To support<br />
participants unfamiliar with the literature, discussion will be seeded by illustrative quotes<br />
from the literature and practical examples taken from a large scale qualitative study of<br />
students’ epistemological beliefs undertaken at Oxford Brookes.<br />
Assessment contexts that underpin student achievement: demonstrating effect<br />
Berry O'Donovan, Oxford Brookes University, United Kingdom<br />
Margaret Price, Oxford Brookes University, United Kingdom<br />
A large scale study in the US that examined over 25,000 students and over 190<br />
environmental variables found that the key influence on student success is student<br />
involvement fostered by student/student and student/faculty interaction (Astin, 1997). Such<br />
findings have been corroborated by smaller scale unpublished studies in the UK (Holden,<br />
2008). Taking a social constructivist approach to the classroom and the use of interactive<br />
teaching strategies has been well documented in the literature (Vygotsky, 1978). Less well<br />
documented is the effect of intentionally increasing opportunities for student/student and<br />
staff/student interaction outside the classroom (O’Donovan et al., 2008).<br />
The ASKe (Assessment Standards Knowledge exchange) Centre for Excellence, based at<br />
Oxford Brookes University in the UK, has for the last two years been attempting to cultivate<br />
students’ and staff’s sense of community within one School situated on a satellite campus<br />
which attracts significant numbers of undergraduates, often taught in large classes. Within<br />
this learning context, described by many students as ‘impersonal’ (Price et al., 2007), the<br />
Centre has developed initiatives that intentionally involve students with the academic<br />
community outside the formal classroom. Initiatives include: peer-assisted learning in which<br />
more advanced students help others with their learning; modular leader assistantship in<br />
which students help academics with their teaching preparation and organisation; students<br />
as co-researchers; and students allowing staff insight into their experience through the<br />
medium of audio diaries.<br />
Whilst these initiatives have been evaluated as very successful from both student and staff<br />
perspectives, evidencing an effect on student learning through their assessed performance<br />
is proving very tricky. As Graham Gibbs (2002) states, there is a real absence in most<br />
pedagogic research of hard evidence of improvement to student learning. The roundtable<br />
discussion will commence with Gibbs’ fundamental question of whether qualitative<br />
evidence demonstrating student and staff appreciation of, and belief in, the effects of such<br />
initiatives is sufficient. After that, possible methodologies for evidencing cause and effect<br />
between such individual initiatives and students’ assessed performance, within a context of<br />
an ever-changing learning landscape, will be discussed and debated.<br />
References<br />
Astin, A. (1997). What Matters in College? Four Critical Years Revisited. San Francisco: Jossey-Bass.<br />
Gibbs, G. (2002). ‘Ten years of Improving Student Learning’. Improving Student Learning Theory and<br />
Practice 10 Years On. Improving Student Learning 10, Berlin, September.<br />
Holden, G. (2008). ‘The Importance of Feedback’. Assessment for Learning: How Does That Work?<br />
HEA/Northumbria Workshop, Newcastle, February.<br />
Price, M., O’Donovan, B. & Rust, C. (2007). Building community: engaging students within a disciplinary<br />
community of practice. ISSOTL ‘Locating Learning’, Sydney, July.<br />
O’Donovan, B., Price, M. & Rust, C. (2008). Developing student understanding of assessment standards.<br />
Teaching in Higher Education, vol. 13, no. 2, pp. 205-217.<br />
Vygotsky, L. S. (1978). Mind in Society: The Development of Higher Psychological Processes. Cambridge,<br />
MA: Harvard University Press.<br />
In-classroom use of mobile technologies to support formative assessment<br />
Ann Ooms, Timothy Linsey, Marion Webb<br />
Kingston University, United Kingdom<br />
The paper presents the findings of a research project on in-classroom use of mobile<br />
technologies to support diagnostic and formative assessment. The research project<br />
addressed the following questions:<br />
1. Under which conditions can each of the technologies be efficiently and effectively used<br />
for diagnostic / formative assessment in classroom settings?<br />
2. What is the impact of the in-classroom use of mobile technologies for<br />
diagnostic/formative assessment on students’ attitudes toward the module?<br />
3. What is the impact of the in-classroom use of mobile technologies for<br />
diagnostic/formative assessment on students’ conceptual understanding?<br />
4. What is the impact of the in-classroom use of mobile technologies for<br />
diagnostic/formative assessment on students’ test results?<br />
5. What is the impact of the project on teaching practices? How likely is it that that impact, if<br />
there is any, will be sustained?<br />
6. What is the impact of the project on assessment practices? How likely is it that that<br />
impact, if there is any, will be sustained?<br />
7. What is the impact of the project on attitudes towards in-classroom use of mobile<br />
technologies? How likely is it that that impact, if there is any, will be sustained?<br />
8. What indicators are there of institutional commitment to, and subsequent uptake of,<br />
in-classroom use of mobile technologies?<br />
Thirteen academic staff members from seven different faculties within one university used a<br />
range of mobile technologies, such as electronic voting systems, mobile phones, Tablet<br />
PCs, interactive tablets and iPods, to support rapid feedback. Two mentors supported and<br />
assisted the academic staff.<br />
A mixed-methods approach was used to collect data from academic staff<br />
(questionnaires, interviews, reflective journals), students (questionnaires, focus groups),<br />
and mentors (interviews, reflective journals). In addition, attendance records, assessment<br />
strategies, assessment tools and assessment records were compared with those from the<br />
previous year.<br />
The Devil's Triad:<br />
The symbiotic link between Assessment, Study Skills and Key Employability Skills<br />
Jon Robinson, Northumbria University, United Kingdom<br />
David Walker, Northumbria University, United Kingdom<br />
Student reaction to assessment, study skills and the idea of being taught graduate<br />
employability is typically negative within Higher Education Institutions. Yet, all three now<br />
have to be considered and included by those involved in the curriculum design of<br />
programmes and modules within the University of Northumbria. This is particularly<br />
problematic for non-vocational subjects, such as those typical of the Humanities. In the<br />
English Division at Northumbria we have redesigned the core first-year module for English<br />
students in a way that symbiotically links assessment, study skills and employability within a<br />
framework underpinned by the theory and practice of Assessment for Learning (AfL).<br />
This roundtable presentation will outline, and open up for in-depth discussion, the approach<br />
taken by the curriculum team, from both a theoretical and practical perspective, when<br />
designing the module assessment to link with study and employability skills. It will also<br />
present the initial findings of research into the effectiveness of the innovation in curriculum<br />
design. The overarching intention of the presentation will be to create dialogue and explore<br />
avenues for collaboration with participants at the conference, from different countries and<br />
who hold different perspectives, in order to facilitate further development of our work and<br />
create an opportunity for the exchange of ideas.<br />
Learning-oriented assessment and students’ experiences<br />
Ann Karin Sandal, Margrethe H. Syversen, Ragne Wangensteen<br />
Sogn and Fjordane University College, Norway<br />
Kari Smith, University of Bergen, Norway<br />
This presentation reports part of an ongoing project at the Sogn and Fjordane University<br />
College funded by the Norwegian Research Council. The aim of the project is to examine<br />
students’ experiences with the transition from primary to secondary school. An important<br />
issue is to investigate how portfolio assessment can be supportive for making choices and<br />
motivate for lifelong learning. The comprehensive research project focuses on how students<br />
are prepared to choose programmes in secondary schools and prepare for choosing a<br />
future profession through the subject “Elective programme subjects”. This new programme<br />
was introduced in primary schools together with the curricula reform “Knowledge Promotion”<br />
in 2006. Its main aim is to prevent mistaken choices and dropout, and to help the students<br />
make a good choice.<br />
In the current study we examine how formative assessment influences students’ beliefs and<br />
plans for further education, and to what extent assessment, through digital portfolios,<br />
enhances consciousness about further education (Klenowski, 2002; Black &amp; Wiliam, 2006;<br />
Harlen, 2006). We investigate how assessment in digital portfolios can support the learning<br />
process and the processes of decision making. We try to identify some consequences of<br />
teachers’ supportive assessment and the students’ experiences with assessment for<br />
learning. The study will follow students choosing vocational education programmes. We are<br />
using both qualitative and quantitative methods in the study.<br />
A questionnaire was sent to 90 students in 3 different schools in their last term in primary<br />
school (age 15). The preliminary findings show variations in the students’ interest,<br />
motivation and consciousness about the choices they are about to make. When asked what<br />
kind of assessment encourages further work and how this can stimulate the learning<br />
process, the students value written comments on their work highly. Together with<br />
oral response, this direct and personal feedback on their work gives the<br />
students improved self-esteem and belief in their capability of learning (Harlen, 2006; Gibbs<br />
&amp; Simpson, 2005). It seems that this type of assessment is important for motivating the<br />
students and creating interest in schoolwork (Hidi &amp; Renninger, 2006).<br />
However, even if the students seem to be motivated for vocational education and practical<br />
activities, some students put effort into the more theoretical subjects.<br />
In order to be admitted to vocational education, good marks are required in the theoretical<br />
subjects, in which the students are not particularly interested. The students spend most of<br />
their time on these subjects in their last year in primary school, whereas, for many of them,<br />
their interest is inspired by the subject preparing them for vocational studies.<br />
This indicates some interesting challenges concerning the students’ motivation and teachers’<br />
assessment practices. How can formative assessment help the students to develop<br />
self-esteem, knowledge, visions and intrinsic motivation for further education? And how are all<br />
these challenges dealt with while using portfolios in formative assessment?<br />
These questions will be followed up by action research in schools and a longitudinal study.<br />
Connecting Research Behaviours with Quality Enhancement of Assessment:<br />
Eliciting Developmental Case Studies by Appreciative Enquiry<br />
Mark Schofield, Edge Hill University, United Kingdom<br />
This paper describes the University’s commitment to systematic enhancement of the student<br />
experience of assessment. This extends beyond quality assurance, juxtaposing research<br />
and development behaviours allied to ‘thicker’ description (Geertz) of complex events in<br />
qualitative, interpretive research approaches with the traditionally ‘thinner’ evaluation<br />
tools characteristic of many university quality assurance systems.<br />
The paper describes the process of a developmental audit across the Faculties of<br />
Education, Health, and Arts and Sciences. Dialogues were conducted through focus<br />
groups to explore staff and student experiences of feedback on assessment, including<br />
those of students with disabilities and specific learning difficulties, and practices were<br />
scrutinised against the SENLEF Principles of Feedback (Student Enhanced Learning through Effective Formative<br />
Feedback). The process also included the elicitation of case studies from staff and students<br />
about their experience of effective feedback on assessment in the form of short writing<br />
activities. These focused on the context of effective feedback, an individual reflection on<br />
why it worked for them, and importantly ideas and guidance for others embarking on trying<br />
similar approaches. As such, this key element of the audit was conducted in the spirit of<br />
Appreciative Enquiry.<br />
Included are reflections on the similarities and differences in these two sets of staff and<br />
student voices and Tag Cloud representations (word frequency analyses) which reveal<br />
dominant and recessive themes in the sample groups. This offers some stark insights into<br />
affective issues related to assessment and feedback, congruence in attitudes and<br />
approaches, and some perhaps unexpectedly astute epistemological insights from students.<br />
The case studies will also be offered (in an abridged form), with commentary related to<br />
effective practices and alignment with the SENLEF principles, presented using the MS Word<br />
comments function and including key questions and challenges arising from the narrative texts.<br />
The full versions will be available via a URL/hyperlink in the paper.<br />
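The Tag Cloud representations described above are, in essence, word-frequency counts over the case-study texts. A minimal Python sketch of such an analysis; the response snippets and the stop-word list are invented for illustration only:

```python
import re
from collections import Counter

def tag_cloud_counts(texts, stop_words=frozenset({"the", "a", "of", "and", "to", "in"})):
    """Word frequencies across a set of free-text responses, minus stop words."""
    words = []
    for text in texts:
        # Lowercase, then keep alphabetic tokens (apostrophes allowed).
        words.extend(w for w in re.findall(r"[a-z']+", text.lower())
                     if w not in stop_words)
    return Counter(words)

# Invented snippets standing in for staff and student case-study texts.
responses = [
    "Timely feedback helped me understand the assessment criteria.",
    "Feedback on drafts made the criteria feel less impersonal.",
]
counts = tag_cloud_counts(responses)
print(counts.most_common(3))
```

The most frequent words become the dominant (largest) entries in the cloud, while low-frequency words surface the recessive themes the paper mentions.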
We argue that such developmental enquiry (research-based activity) has given sightlines<br />
into effective practices, highlighted the importance of perceptions of effective feedback, and<br />
emphasised that the processes embodied in this approach add enhancement layers to<br />
extant, historical, quality systems. These approaches are replicable for use in supporting<br />
other lines of enquiry related to assessment and other learning-related aspects of the<br />
student experience. This enrichment of quality processes is achieved by bringing research<br />
behaviours into close juxtaposition with quality assurance systems of intelligence gathering<br />
and by producing data artefacts that are both of developmental significance (for use with students<br />
in academic induction and in staff development) and influential in policy decision-making<br />
related to dissemination of good practice and systematic enhancement of assessment<br />
practices.<br />
Conceptions of assessment in higher education:<br />
A qualitative study of scholars as teachers and researchers<br />
Elias Schwieler, Stockholm University, Sweden<br />
Stefan Ekecrantz, Stockholm University, Sweden<br />
The researcher’s professional life world is based on explicit and well-reflected, subject-specific<br />
conceptions. These sophisticated conceptions are the result of an extensive formal<br />
education, followed by lifelong, advanced learning through conducting research. As a teacher,<br />
the same individual’s pedagogic life world is often exclusively a result of socialization and<br />
the reproduction of existing traditions. Consequently, the teacher in higher education is<br />
expected to develop knowledge about pedagogic work more or less intuitively, based on far<br />
less articulated and reflected conceptions. Thus, the academic profession of<br />
research/teaching can be said to be founded on two professional extremes. There is a need<br />
for an increased understanding of how such double roles and life worlds are constituted,<br />
and how they relate to each other. In the area of assessment, we argue, an individual<br />
researcher’s/teacher’s double belief systems are particularly visible, making it an important<br />
field of study.<br />
We will present preliminary results from an ongoing interview-based study about this<br />
phenomenon, from three different assessment-related themes:<br />
1) Assessment and personal theories of learning – Subject-specific and generic beliefs on<br />
how, when and why different aspects of a subject need to be learned and assessed are the<br />
foundation for a teacher’s professional world view. These beliefs are studied, in part, as<br />
implicit theories of threshold concepts (Meyer & Land 2006) and views on backwash effects<br />
of assessment.<br />
2) Assessment and normative values – Summative assessment and grading highlight<br />
underlying perceptions of assessment as a means to discipline, punish and reward (Filer,<br />
2000). Also, both students’ and teachers’ workloads peak during the assessment process,<br />
often leading to stress and tension. In such a climate (Biggs, 2007), reflected as well as tacit<br />
professional values are especially important.<br />
3) Methodological and epistemological foundations of assessment – Advanced<br />
epistemological beliefs on knowledge, evidence and scientific method are a vital part of all<br />
academics’ research. The same individuals’ tacit views on assessment epistemology are<br />
often in conflict with those upheld in research. A methodology that would not be considered in<br />
research is frequently used uncritically in the enquiry of student learning.<br />
Our aim is, specifically, to include individual inconsistencies, contextual issues, conceptual<br />
discrepancies and unreflected assumptions by focusing on each teacher’s conceptions of<br />
assessment. In previous research, with its explanatory focus on idealized models, such<br />
complexities are usually seen as residuals that must be excluded in order to maintain a<br />
manageable number of parameters (cf. Prosser et al., 2005). Furthermore, in order to grasp<br />
the intricacies of the interviewees' scientific as well as subject specific conceptions, we have<br />
chosen to study only two epistemic communities, History and English literature.<br />
Innovative Assessment Practice and Teachers’ Professional Development:<br />
Some Results of Austria’s IMST-Project<br />
Thomas Stern, University of Klagenfurt, Austria<br />
IMST (Innovations in Mathematics and Science Teaching) is a long term research and<br />
development project aimed at establishing an effective support system for Austrian schools.<br />
One of seven measures is the IMST-fund for the promotion of innovations in the teaching of<br />
maths, sciences and IT. About 160 teacher teams per year are encouraged to submit<br />
proposals for their classroom innovations, to evaluate both processes and results and to<br />
write reports that are published on the internet. In return they receive intensive individual<br />
counselling and some financial remuneration, and they are invited to several workshops. A<br />
remarkable number of these teachers decide to choose alternative assessment methods as<br />
their classroom innovation and as a field of investigation into their own practice.<br />
A cross-case examination of several school projects focuses on new ways of assessment<br />
that allow the students to some extent to choose their own topics and to keep track of their<br />
learning progress. Two high school teachers, for example, asked their 12-year-old students to record<br />
examples of encounters with mathematics in daily life, and then went about assessing the<br />
sophistication and originality of their reports. A physics teacher let her 16-year-old students<br />
choose their own fields of interest in astronomy and then draw and present posters, which<br />
she assessed in accordance with criteria she had worked out with her class. The study<br />
shows that self-regulated learning has a strong effect not only on the students’ motivation<br />
and interest but also on their proficiency and learning outcomes. Even more<br />
impressive is the repercussion of these teaching innovations on the attitudes of the<br />
teachers themselves. In the course of their projects on changes in their assessment<br />
practices, most of them embarked on a thorough reflection on their teaching priorities, their<br />
beliefs about learning, and their personal perspectives and ambitions as teachers. Both<br />
their autonomous school innovations and their action research studies can be shown to<br />
have boosted their professional development. Changes in their assessment routines turned<br />
out to have an especially strong impact on many aspects of their professional performance<br />
and were often accompanied by an additional commitment to school development and an<br />
overall increase in reflection about professional standards.<br />
Characteristics of an effective approach for formative assessment of teachers’<br />
competence development<br />
Dineke Tigelaar, Mirjam Bakker, Nico Verloop<br />
ICLON-Leiden University Graduate School of Teaching, The Netherlands<br />
Stimulating teachers’ professional development is an important function in assessment of<br />
teaching (Porter, Youngs & Odden, 2001). However, more research is needed into the effects<br />
of teacher assessments on teacher professional learning development (Lustick & Sykes, 2006).<br />
The research is part of a larger research project ‘Effects of different assessment<br />
approaches on teachers’ professional development’. The goal of this postdoctoral research<br />
project is to evaluate and compare the effects of three formative assessment approaches:<br />
(1) an expertise- and feedback-based approach, (2) an approach for self-assessment, and<br />
(3) a negotiated assessment approach. In this research project, the focus is on teachers’<br />
competences for promoting reflective skills of senior secondary vocational students in<br />
health care, i.e. in nursing. Central question: “What are the effects of different formative<br />
teacher assessment approaches on the development of secondary vocational education<br />
teachers’ competences for promoting and formatively assessing students’ reflection skills,<br />
and which combination of assessment design characteristics optimally promotes the<br />
teachers’ competence development?”<br />
Research questions:<br />
1. Which assessment criteria and standards are developed, formulated and used in the three<br />
PhD-projects and which set of (common) criteria and standards can be used for a<br />
representative overall measurement of the participating teachers’ competence development?<br />
2. How do the teachers perceive and value the characteristics of the assessment approaches<br />
(see the characteristics 1 – 3 above) in the projects they participated in, and what are the<br />
results of the overall measurement of the teachers’ competence development (see question 1)?<br />
3. What is the relation between the measured teachers’ overall competence development<br />
and a) the assessment approach characteristics as documented by the PhD-researchers as<br />
well as b) the participating teachers’ perceptions and evaluations of these characteristics?<br />
Tasks of the postdoc:<br />
1. Distillation of the common elements in the criteria and standards for teaching<br />
competences formulated in the PhD-projects using matrices (Miles & Huberman, 1994).<br />
2. Development of instruments for the repeated overall measurement of teachers’<br />
competences, and of the teachers’ (N=88) perceptions and evaluations of the assessment<br />
design characteristics, and more general conditions in the schools for teacher professional<br />
development. Video vignettes will be developed, and teachers will be asked to select<br />
samples of student work. Furthermore, questionnaires will be developed.<br />
3. Organization of the data gathering (in collaboration with the three PhD-researchers).<br />
4. Analyses of the relations between the measured teachers’ overall competence development<br />
and a) the assessment approach characteristics as documented by the PhD-researchers as well<br />
as b) the participating teachers’ perceptions and evaluations of these characteristics and of the<br />
more general relevant conditions. This will be done using qualitative analyses (matrices) and<br />
quantitative analyses (analysis of variance, multiple regression analysis, and multilevel analysis).<br />
5. Development of an optimal combination of design characteristics.<br />
We aim to stimulate both a research- and a practice-related discussion.<br />
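Task 1 above — distilling the common elements in the criteria across the three PhD projects — could be sketched as a simple cross-project intersection. This is an illustrative stand-in for the Miles & Huberman matrix analysis, and the criteria names are invented:

```python
def common_criteria(*project_criteria):
    """Return the assessment criteria shared by all projects,
    sorted for a stable, comparable overview."""
    shared = set.intersection(*(set(c) for c in project_criteria))
    return sorted(shared)

# Invented example criteria for three hypothetical PhD projects:
projects = [
    ["modelling reflection", "questioning", "giving feedback"],
    ["giving feedback", "questioning", "scaffolding"],
    ["questioning", "giving feedback", "stimulating dialogue"],
]
print(common_criteria(*projects))  # ['giving feedback', 'questioning']
```

The shared set would then seed the common instrument in Task 2; criteria unique to one project stay visible in the per-project matrices.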
Posters<br />
Predictive indicators of academic performance at degree level<br />
Andy Bell, Manchester Metropolitan University, United Kingdom<br />
Kevin Rowley, Manchester Metropolitan University, United Kingdom<br />
In the absence of A* grades at A Level, Cambridge University has designed an additional<br />
Admissions selection ‘tool’ – hence UCLES (University of Cambridge Local Examinations<br />
Syndicate) has produced the ‘Thinking Skills Assessment’ Test (TSA Test).<br />
The TSA Test is designed as a ‘knowledge-independent’ measure of the candidate’s ability<br />
to think effectively and critically. This test is composed of two types of questions:<br />
‘problem-solving’ questions, and questions which ‘tap into’ ‘critical thinking’ abilities.<br />
The extent to which the TSA Test is predictive of performance at degree level has yet to be<br />
established. Initial ‘in-house’ research by Cambridge suggests that there is indeed a<br />
significant predictive link between scores on the TSA Test and performance at degree level<br />
for students at Cambridge (Emery et al., 2006; Emery, 2006).<br />
Cambridge University, then, is currently involved in appraising the TSA Test as part of its<br />
Admissions process. If used extensively to select students for a place at Cambridge, such<br />
use would have to be seen as justified – otherwise, it would be unfairly discriminatory. The<br />
present research at the Manchester Metropolitan University (MMU) was designed to add to<br />
the knowledge base concerning the validity of the TSA Test as a predictor of performance<br />
at degree level and, ipso facto, its validity as an Admissions ‘tool’.<br />
Hence, this paper addresses research currently being conducted to examine the extent to<br />
which the TSA Test is predictive of success at degree level at a non-Oxbridge institution<br />
(Manchester Metropolitan University). Four cohorts of first-year Psychology undergraduates<br />
(total N = approx. 350) completed Test L (a research version of the TSA Test). With the<br />
students’ consent, their academic performance was tracked throughout the three years of<br />
their degree-level studies. It was thus possible to examine the extent to which students’<br />
scores on the TSA Test were predictive of degree level performance in examinations and<br />
assessed coursework (ACW).<br />
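The core of such a predictive-validity check is the correlation between an admissions score and later degree marks. A minimal, library-free sketch with invented scores (not the study's data):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented TSA-style scores and final-year marks for five students:
tsa = [52, 61, 47, 70, 58]
degree_marks = [55, 64, 50, 72, 60]
print(round(pearson_r(tsa, degree_marks), 3))
```

In practice the study's question is whether such a coefficient remains sizeable once A Level grades and the other predictors are controlled for, which calls for multiple regression rather than a single bivariate correlation.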
Factors other than the TSA Test – such as personality traits as measured by Quintax<br />
(Stuart Robertson & Associates, 1999); performance at A Level and participants’ scores on<br />
an IQ-type test (Raven’s Progressive Matrices Plus) – were also examined as possible<br />
predictors of students’ success at degree level. Students’ scores on the three sub-scales of<br />
the Approaches & Study Skills Inventory for Students (ASSIST / Entwistle, 2000) were also<br />
established and possible links with academic performance were examined.<br />
In addition to the above, an adaptation of the ‘Big Five’ scale provided on the website of the<br />
International Personality Item Pool (IPIP) is currently being developed (Bell & Rowley,<br />
2008). This is the ‘Big Five for Students’ scale. This will undergo factor analysis and item<br />
analysis. Then students’ scores on this scale will be correlated with their academic<br />
performance at degree level. As this scale is a recent development, this aspect will only<br />
include the 2007-2008 First Year cohort of students (N= 101).<br />
This research will be completed and all data analysis will be conducted in time for<br />
presentation to the EARLI (2008) conference in Berlin.<br />
Online Formative Assessment for Algebra<br />
Christian Bokhove, Utrecht University, Netherlands<br />
Rationale: In the Netherlands – as in many other EU-countries – universities complain about<br />
the algebraic skill level of students coming from secondary school. It is unclear whether<br />
these complaints have to do with basic skills or “actual conceptual understanding”, which<br />
we refer to as symbol sense (Arcavi, 1994). In this research we want to find out how ICT<br />
tools can help with formatively assessing algebraic skills.<br />
Key concepts: Three key topics come together in this poster session, forming the<br />
conceptual framework for our observations: tool use, assessment and algebraic skills.<br />
The first topic concerns acquiring algebraic skills. Here we discern basic skills, for example<br />
solving an equation, but in particular conceptual understanding. Arcavi (1994) calls this<br />
“symbol sense”.<br />
In this case, formative assessment would be appropriate; aimed at assessment for learning.<br />
Assessment contributes to learning and understanding of concepts (Black & Wiliam, 1998).<br />
Feedback plays an important role in formative assessment. On the other hand, it also<br />
remains important to record a student’s progress through scores and results:<br />
summative assessment, assessment of learning. Using both gives ‘the best of both worlds’.<br />
In assessment for learning using ICT tools can be beneficial. ICT tools can help with giving<br />
users feedback, may focus on process rather than result, track results or scores and<br />
provide several ‘modes’, ranging from practice to exam. Thus, assessment is for learning.<br />
Method: In this poster session experiments with an ICT tool called Digital Mathematical<br />
Environment (Bokhove, Koolstra, Heck, & Boon, 2006) are described. Through expert<br />
reviews, one-to-ones and small group experiments we provide a framework on how<br />
formative assessment can support learning for mathematics. We mention:<br />
Using several ‘modes’ of assessment during a sequence of lessons: first practice with more<br />
feedback, gradually more exam-like assessment without feedback.<br />
Emphasis on the process: how does a student reach his/her correct or incorrect answer? This<br />
information can be used in a subsequent lesson.<br />
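The two points above can be made concrete with a toy item checker (an illustrative sketch, not the actual Digital Mathematical Environment): a linear equation a·x + b = 0 is checked exactly, with explanatory feedback in a ‘practice’ mode and only a score in an ‘exam’ mode.

```python
from fractions import Fraction

def check_linear(a, b, student_x, mode="practice"):
    """Is student_x a solution of a*x + b = 0?
    'practice' mode explains the error; 'exam' mode only scores."""
    x = Fraction(student_x)          # exact arithmetic, no rounding issues
    correct = a * x + b == 0
    if mode == "exam":
        return {"score": int(correct)}
    if correct:
        feedback = "Correct!"
    else:
        feedback = (f"Substituting x = {student_x} gives "
                    f"{a * x + b}, not 0. Try isolating x again.")
    return {"score": int(correct), "feedback": feedback}

print(check_linear(2, -6, 3))               # practice mode: score plus feedback
print(check_linear(2, -6, 4, mode="exam"))  # exam mode: score only
```

Shifting a lesson sequence from `mode="practice"` to `mode="exam"` mirrors the gradual withdrawal of feedback described above.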
Results: Embedding use of an algebra tool in a didactical scenario, where self-assessment<br />
and classroom feedback make up a balanced curriculum for attaining sufficient algebraic<br />
skills, is an important part of formative assessment. We will describe the preliminary results<br />
of possible didactical scenarios with the Digital Mathematical Environment. These<br />
scenarios will be used for further research on the subject.<br />
Discussion: We would like to discuss the implications for classroom practice when using ICT<br />
tools for (formative and summative) assessment, and what didactical scenarios are best<br />
suited for acquiring algebraic skills, by using ICT tools.<br />
References<br />
Arcavi, A. (1994). Symbol Sense: Informal Sense-Making in Formal Mathematics. For the Learning of<br />
Mathematics, 14(3), 24-35.<br />
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles,<br />
Policy & Practice, 5(1), 7-74.<br />
Bokhove, C., Koolstra, G., Heck, A., & Boon, P. (2006). Using SCORM to Monitor Student Performance:<br />
Experiences from Secondary School Practice. Math CAA series.<br />
Investigating the use of short answer free-text e-assessment questions<br />
with instantaneous tailored feedback<br />
Barbara Brockbank, The Open University, United Kingdom<br />
Sally Jordan, The Open University, United Kingdom<br />
Tom Mitchell, Intelligent Assessment Technologies Ltd., United Kingdom<br />
Warburton and Conole (2005) argue that ‘It seems likely that the drive towards emergent<br />
technologies such as simulations and free-text marking will result in increasingly strong<br />
competitive pressures against the more traditional ‘standardised testing’, purely objective<br />
types of CAA system.’ This paper describes the application of such an emergent<br />
technology, grounded in a desire to improve the student learning experience.<br />
The UK Open University’s OpenMark assessment system enables students to be provided with<br />
immediate and tailored feedback on their responses to questions of a range of types, including<br />
those requiring free-text entry of numbers, symbols and single words (Ross, Jordan and<br />
Butcher, 2006). This study is an investigation into the viability and effectiveness of adding<br />
questions which require free-text responses of up to about 20 words in length. Answer matching<br />
is provided by an authoring tool supplied by Intelligent Assessment Technologies Ltd. (IAT)<br />
which is able to perform an intelligent match between free-text answers and predefined<br />
computerised model answers. Thus an answer such as ‘the Earth orbits the Sun’ can be<br />
differentiated from ‘the Sun orbits the Earth’ and an answer of ‘The forces are balanced’ is<br />
marked as correct whereas an answer of ‘The forces are not balanced’ is not. The tool looks for<br />
understanding without unduly penalising errors of spelling, grammar or semantics.<br />
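A toy illustration of this kind of matching (a deliberately simplified sketch, not IAT's actual engine): comparing content-word order distinguishes ‘the Earth orbits the Sun’ from its reversal, a polarity check catches ‘not balanced’, and stdlib `difflib` forgives small spelling slips.

```python
import difflib

STOP_WORDS = {"the", "a", "an", "is", "are"}

def _normalise(words, vocab):
    """Lower-case and snap near-misspellings onto the model-answer vocabulary."""
    out = []
    for w in words:
        match = difflib.get_close_matches(w.lower(), vocab, n=1, cutoff=0.8)
        out.append(match[0] if match else w.lower())
    return out

def matches(response, model_answer):
    """Toy match: same content words in the same order, same polarity."""
    vocab = model_answer.lower().split()
    resp = _normalise(response.split(), vocab)
    content = lambda ws: [w for w in ws if w not in STOP_WORDS and w != "not"]
    return (content(resp) == content(vocab)
            and ("not" in resp) == ("not" in vocab))

print(matches("the Earth orbits the Sun", "the Earth orbits the Sun"))    # True
print(matches("the Sun orbits the Earth", "the Earth orbits the Sun"))    # False
print(matches("the forces are not balanced", "the forces are balanced"))  # False
print(matches("the erth orbits the sun", "the Earth orbits the Sun"))     # True
```

A real engine must of course cope with paraphrase and synonymy, which is exactly what the iterative refinement against early student responses (described below) addresses.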
The questions are delivered to students online and instantaneous targeted feedback is<br />
provided on both specifically incorrect and incomplete answers. Another novel feature of the<br />
project has been the use of student responses to early developmental versions of the<br />
questions – themselves delivered online – to improve the answer matching.<br />
Students have been observed performing the assessment tasks. Most claim that they wrote<br />
their responses as if for a human marker. However a few were conscious that they were<br />
being marked by a computer and anticipating (incorrectly) that only keywords were required,<br />
entered answers either in note form or in very long sentences. Most students enjoyed the<br />
assessment tasks and seemed comfortable with the concept of a computer marking free-text<br />
responses. Where the initial response was incorrect, most students were observed to<br />
use the advice provided by the feedback and many reached the correct answer.<br />
A human-computer marking comparison has indicated that the computer’s marking is typically<br />
indistinguishable from that of six subject-specialist human markers. The computer’s marking<br />
was generally accurate, showing greater than 95% concordance with the question author. A<br />
small number of these questions have been incorporated into regular summative interactive<br />
computer marked assignments on a new distance-learning interdisciplinary science course.<br />
We will encourage discussion of our evaluation findings and of the technological, financial,<br />
cultural and pedagogical issues which appear to limit take-up of assessment of this type.<br />
References<br />
Ross, S.M., Jordan, S.E. & Butcher, P.G. (2006). Online instantaneous and targeted feedback for remote<br />
learners. In C. Bryan & K.V. Clegg (Eds.), Innovative Assessment in Higher Education (pp. 123-131).<br />
London, U.K.: Routledge.<br />
Warburton, W. & Conole, G. (2005). Whither e-assessment? Proceedings of the 2005 CAA Conference.<br />
http://www.caaconference.com/pastConferences/2005/proceedings/index.asp [accessed 1st<br />
February 2008]<br />
Contextualized reasoning with written and audiovisual material:<br />
Same or different?<br />
Nina Bucholtz, Maren Formazin, Oliver Wilhelm<br />
IQB, Humboldt University Berlin, Germany<br />
The ability to arrive at valid conclusions from given information and to comprehend given material<br />
of non-trivial complexity is important for many aspects of life, e.g., for learning and acquiring<br />
knowledge. Fluid intelligence (gf) or more specifically reasoning can be regarded as the main<br />
prerequisite for contextualized reasoning. In addition, relevant domain specific knowledge –<br />
supposedly a content-specific component of crystallized intelligence (gc) - can aid in solving<br />
contextualized reasoning tasks. We have developed an innovative measure of contextualized<br />
reasoning in order to further investigate the distinction between decontextualized reasoning tasks<br />
as included in intelligence tests and contextualized reasoning tasks as a relevant aspect of<br />
student achievement. The new measure is expected to tap both abstract reasoning ability (gf) and<br />
the recall of acquired information (gc). Contextualized reasoning measures differ from traditional<br />
comprehension tests – as for example included in the PISA studies – because they focus on<br />
specific content as opposed to rather general and somewhat arbitrary topics.<br />
One aim of our efforts is to bridge the gap between intelligence and student achievement.<br />
Additionally, we want to overcome the shortcoming of focusing on written material in<br />
contextualized measures by also considering audiovisual material. With this design we hope<br />
to encompass contextualized reasoning in a wider sense, including typical learning<br />
situations encountered by students.<br />
In order to reach the above research aims we have designed two studies. The most critical<br />
research questions were:<br />
1. Can short video sequences presented via PDA or notebook be embedded into<br />
contextualized reasoning tasks that meet all requirements of standardized measures of<br />
maximal behaviour?<br />
2. Are such audiovisual contextualized reasoning tasks equivalent to paper-and-pencil-based<br />
contextualized reasoning tasks?<br />
3. Is performance in audiovisual and traditional contextualized reasoning tasks a linear<br />
function of decontextualized reasoning and relevant (i.e. natural sciences) domain<br />
specific knowledge?<br />
In study one, a newly developed audiovisual test that comprises video sequences of 3 to 5<br />
min length was piloted with N = 86 high school students. The videos include real life scenes<br />
or animated simulations from biology, chemistry, physics, and geography. Participants<br />
watch every video once and then have to answer several comprehension questions on the<br />
basis of the video. A paper-and-pencil-based contextualized reasoning test was matched in<br />
content and comprises texts, tables and figures. All questions cover the circumscribed<br />
domain of natural sciences. A g-factor-model on the basis of testlets was established for<br />
both contextualized reasoning tests separately. Both models fit the data well. The relation<br />
between these two latent factors in a SEM was r = .94; the fit of this model was not noticeably<br />
better than the fit of a model with a single latent factor across both tests.<br />
In a second study that is currently running with about 200 participants, an effort is made to<br />
replicate the results from study one and to address research question number three from<br />
the above list. Results of this study will be presented and discussed with regard to<br />
implications for further test development and educational implications.<br />
Effects of Large Scale Assessments in Schools:<br />
How Standard-Based School Reform Works<br />
Tobias Diemer, Freie Universität Berlin, Germany<br />
Harm Kuper, Freie Universität Berlin, Germany<br />
Comparative large scale assessments form a centerpiece of recent standard-based reforms<br />
of the educational systems of the Federal Republic of Germany. The formulation of standards<br />
in education and the implementation of corresponding large-scale standard tests in many States<br />
(Länder) of Germany mark a considerable shift away from an input-oriented towards an<br />
output-oriented account of governance. By providing and feeding back standardized and<br />
comparative data about pupils’ achievement, standard tests intend to make teachers and<br />
schools accountable and to help them to control and improve the outcomes of the pupils’<br />
work by means of evidence-based decision-making processes concerning the profession of<br />
teaching as well as the task of organizing schooling.<br />
The proposed poster deals with the question of whether and how the results from<br />
comparative large scale assessments are utilized by teachers as professionals and schools<br />
as organizations. It therefore will examine the profession- and organization-related processes<br />
and effects that are produced in line with large scale standardized testing and the feeding<br />
back of comparative results to teachers and schools. The paper will present typological<br />
descriptions of effects and process-related patterns of individual as well as collective data-based<br />
decision-making that is based on large scale assessment test results in schools.<br />
Special attention will be drawn to noticeable consequences and changes regarding the<br />
conceptualisation and the design of teaching and learning processes by teachers.<br />
Exploration and analysis of the outlined effects and processes are carried out on two levels<br />
of abstraction. On a first, comparatively concrete level, the observable phenomena are<br />
described within the framework of a heuristic model suggested by Helmke (2004).<br />
According to this model the process of utilization of test results conceptually subdivides into<br />
four cyclically iterative stages: (1) reception, (2) reflection, (3) action, and (4) evaluation.<br />
Subsequently, the findings described within these categories are further aggregated as well<br />
as re-aggregated on a more abstract level. On this level theories of professions and<br />
organizational theories, particularly new institutionalism, sensemaking theory and system<br />
theory are analysed in reference to the results found on the more concrete level.<br />
Within these conceptual frameworks, empirical evidence will be presented that provides<br />
systematic as well as exemplary insight into the ways standard-based school reform works<br />
in schools. Furthermore, by reason of the integration of profession- and organization-related<br />
models, the study contributes to the development of a general theory of the functioning of<br />
school development in the context of the present standard-based and outcome-orientated<br />
paradigm of governance within the educational system.<br />
To capture the processes of decision-making, a longitudinal case study approach is used,<br />
based on semi-structured qualitative problem-centered interviews with headmasters and<br />
teachers. The material is analyzed according to procedures of qualitative content analyses<br />
and grounded theory. The data basis consists of about 120 interviews and 8 observations<br />
conducted in 4 schools across 4 data-collection phases spanning a period of 2 years.<br />
Due to its longitudinal design, the study gives information on the process-related<br />
conditions and effects of standard-based reforms in schools.<br />
Support in Self-assessment in Secondary Vocational Education<br />
Greet Fastré, Marcel van der Klink, Dominique Sluijsmans, Jeroen van Merriënboer<br />
Open University, The Netherlands<br />
Despite the importance placed on students’ self-assessment in current education, it appears<br />
that students are not always able to assess themselves accurately, because they are<br />
insufficiently able to decide on which criteria they should assess themselves.<br />
In current assessment practices, students are often asked to come up with self-generated<br />
criteria and standards on which they want to assess themselves. However, it appears that<br />
students at the beginning of their study are not able to identify the standards and criteria<br />
themselves because they do not have a clear view on what is expected of them when it<br />
comes to their learning outcomes. It is thus questionable whether novice students should be asked<br />
to self-generate the assessment criteria.<br />
Even if students are given assessment criteria, most of the time only a few assessment<br />
criteria are relevant for a certain task. When students need to become competent self-<br />
assessors, they should not only be able to make an accurate assessment, but they should<br />
also be capable of making a good decision on which criteria are relevant and which criteria<br />
are not relevant for assessing a task (Sadler, 1989). This is certainly true in the case of<br />
assessing real-life whole tasks. In real-life tasks, resembling professional life, a large<br />
database of potential performance criteria could reasonably be considered. The whole set<br />
of criteria can be split up in two parts: relevant and irrelevant criteria. In today’s educational<br />
practices, often, no information on the relevance of the criteria is available for the students<br />
in advance. The question arises whether students are capable of selecting the relevant criteria<br />
from the whole set of criteria. Regehr and Eva (2006) state that when students get the<br />
freedom of choosing on which criteria they want to assess themselves, there is a risk that<br />
they will only highlight the criteria on which they perform well or which they like because<br />
people naturally strive to create a positive feeling. The risk is that students will thus not<br />
recognize exactly those learning needs that are really necessary.<br />
In this study, it is hypothesized that students who receive information on the relevance of<br />
the criteria can produce a more accurate self-assessment than students who do not receive<br />
information on the relevance of the criteria. Furthermore, we expect that students with a<br />
high accuracy of self-assessment are more competent in selecting points of improvement<br />
than students with a low accuracy of self-assessment. In the end, we expect there to be a<br />
positive relation between the accuracy of students’ self-assessment skills and students’ task<br />
performance.<br />
One hundred and six first-year students in Secondary Vocational Education in Nursing<br />
participated in this study. The experimental design was a 2x2 factorial pre-test - post-test<br />
design in which the effects of ‘information on the relevance of the criteria’ (Relevant criteria<br />
vs. All criteria) and ‘variability in learning trajectory’ (School-based vs. Practice-based) were<br />
studied. Data are currently being collected, and results will be available by the time of the<br />
conference.<br />
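One way to operationalise ‘accuracy of self-assessment’ — an assumption for illustration, not the study's actual instrument — is the agreement between self-ratings and teacher ratings, computed over the relevant criteria only. The criterion names and ratings below are invented:

```python
def self_assessment_accuracy(self_scores, teacher_scores, relevant_criteria,
                             scale_max=5, scale_min=1):
    """Accuracy in [0, 1]: 1 means perfect agreement with the teacher
    on every relevant criterion; ratings are assumed on a 1-5 scale."""
    span = scale_max - scale_min
    gaps = [abs(self_scores[c] - teacher_scores[c]) for c in relevant_criteria]
    return 1 - sum(gaps) / (span * len(gaps))

# Invented nursing-task ratings; 'charting' is treated as irrelevant here,
# so the large disagreement on it does not lower the accuracy score:
self_rated = {"hygiene": 4, "empathy": 3, "charting": 5}
teacher_rated = {"hygiene": 4, "empathy": 5, "charting": 2}
print(self_assessment_accuracy(self_rated, teacher_rated,
                               ["hygiene", "empathy"]))  # 0.75
```

Restricting the comparison to relevant criteria is exactly what separates the ‘Relevant criteria’ condition from the ‘All criteria’ condition in the design above.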
The confidence levels of course/subject coordinators in undertaking<br />
aspects of their assessment responsibilities<br />
Merrilyn Goos, Clair Hughes, Ann Webster-Wright<br />
The University of Queensland, Australia<br />
This paper reports the findings of an investigation of the confidence levels of course/subject<br />
coordinators in undertaking aspects of their assessment responsibilities at a large<br />
metropolitan university. Like universities in many other parts of the world, the Australian<br />
institution in which this investigation was undertaken is experiencing “a period of rapid<br />
change and innovation in relation to assessment policies and practice” (Havnes &<br />
McDowell, 2008, p. 3). The pressures for change and innovation range from developing<br />
pedagogical advances that call into question many traditional assessment practices to the<br />
challenges presented by the increasing student diversity, class sizes and casualisation of<br />
teaching, which, along with diminishing resources, characterise contemporary educational<br />
contexts (Anderson et al, 2002).<br />
The investigation was one element of a situational analysis which formed the first phase of<br />
a broader project aimed at supporting the leadership capacities of course/subject<br />
coordinators as assessment innovators. This group was targeted because, though<br />
significant in the implementation of institutional assessment policy, the role is scarcely<br />
researched despite it being highly likely that improved performance would benefit student<br />
learning (Blackmore et al, 2007). Confidence is considered central to the ability to learn<br />
about and master new practices (Graven, 2004) and was identified as an issue for this group<br />
through an earlier pilot conducted by one of the project team.<br />
The investigation took the form of an online survey of all course coordinators (response rate<br />
33%). Survey items were developed from the responsibilities and expectations either<br />
explicated or implied in institutional policies and rules. The survey identified areas of<br />
particularly high (e.g. making and defending summative judgements) and low (e.g. dealing<br />
with plagiarism and locating support when needed) levels of confidence. The paper will<br />
report survey findings in relation to individual items as well as the influential factors that<br />
emerged from analysis and the correlation of particular factors with demographic data such<br />
as years of experience and gender. In addition, coordinators provided open-ended<br />
comment, the analysis of which is used to elaborate on or clarify particular findings in<br />
relation to their positive or negative impact on confidence.<br />
The project was funded through the Fellowship scheme of the (Australian) Carrick Institute<br />
for Learning and Teaching in Higher Education.<br />
References<br />
Anderson, D., Johnson, R., & Saha, L. (2002). Changes in Academic Work: Implications for Universities of<br />
the Changing Age Distribution and Work Roles of Academic Staff. Canberra: DEST.<br />
Blackmore, P., Law, S., & Dales, R. (2007). Investigating the capabilities of course and module leaders in<br />
departments. Paper presented at the Higher Education Academy Annual Conference, Harrogate.<br />
Graven, M. (2004). Investigating mathematics teacher learning within an in-service community of practice:<br />
The centrality of confidence. Educational Studies in Mathematics, 57, 177-211.<br />
Havnes, A., & McDowell, L. (2008). Assessment dilemmas in contemporary learning cultures. In A. Havnes<br />
& L. McDowell (Eds.), Balancing Dilemmas in Assessment and Learning in Contemporary Education.<br />
New York: Routledge.<br />
Useful feedback and flexible submission:<br />
Designing and implementing innovative online assignment management<br />
Stuart Hepplestone, Sheffield Hallam University, United Kingdom<br />
Specific functionality has been added to the Blackboard virtual learning environment at<br />
Sheffield Hallam University (SHU) to enhance the way in which feedback can be provided to<br />
students and to improve the way student assignments are processed. This poster will<br />
explore the practical experience of designing and implementing a customised assignment<br />
handler tool in response to rising student expectations of online feedback and online<br />
assignment submission. (This poster presentation accompanies the short paper session,<br />
Secret scores: Encouraging student engagement with useful feedback, which discusses the<br />
use of technology in providing useful feedback to students).<br />
The design of this innovative assignment handler tool was achieved by mapping out the<br />
lifecycle of a student assignment and highlighting key functional areas for development.<br />
These areas have been developed into a tool which:<br />
1. Supports the online delivery of useful feedback through the Blackboard Gradebook by:<br />
• allowing batch upload of individual file attachments providing detailed feedback along with<br />
student marks (whether the original work is submitted through Blackboard, or in a non-electronic<br />
format such as hard copy, portfolio or presentation)<br />
• allowing partial cohort feedback to be uploaded by each member of the marking team<br />
• providing feedback on group assignments to each individual in the group, rather than<br />
one per group<br />
• giving students access to their feedback all in one place and presented as close to their<br />
learning as possible<br />
• encouraging students to engage with and reflect on their feedback in order to activate the<br />
release of their marks (after Black & Wiliam, 1998, who argued that the “effects of feedback<br />
was reduced if students had access to the answers before the feedback was conveyed”).<br />
2. Supports the online submission of student work through Blackboard by providing students with<br />
a detailed electronic receipt of their assignment submission.<br />
The poster will present a visual representation of the lifecycle of a student assignment, clearly<br />
indicating where students have responsibilities in the course of completing and submitting<br />
assignments, and reflecting and acting upon feedback (Hepplestone & Mather, 2007). Information<br />
about an accompanying electronic feedback wizard development will also be displayed.<br />
SHU is a large regional university with over 28,000 students. It is based on three campuses<br />
and offers courses in a diverse range of academic subjects at both undergraduate and<br />
postgraduate levels.<br />
References<br />
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7-74.<br />
Hepplestone, S. & Mather, R. (2007) Meeting Rising Student Expectations of Online Assignment<br />
Submission and Online Feedback, [online] In: Proc. 11th Computer-Assisted Assessment<br />
International Conference 2007, Loughborough, 10-11 July 2007. Learning and Teaching<br />
Development, Loughborough University. Last accessed 12 February 2007 at:<br />
http://www.caaconference.com/pastConferences/2007/proceedings/Hepplestone%20S%20Mather%<br />
20R%20n1_formatted.pdf<br />
The challenge of engaging students with feedback<br />
Rosario Hernandez, University College Dublin, Ireland<br />
Effective and high quality feedback is often regarded as a key element of excellence in<br />
teaching that supports student learning (Ramsden, 2003; Black and Wiliam, 1998; Sadler,<br />
1989). Despite this, feedback is often regarded by teachers as a labour-intensive activity<br />
that frequently makes little impact on student learning. Similarly, students have stressed<br />
that sometimes they do not understand the feedback they receive, that the feedback is too<br />
vague or that it does not provide them with suggestions on how to improve their work.<br />
These comments are particularly relevant in the teaching of modern languages in higher<br />
education where the feedback provided by teachers often focuses on the correction of<br />
grammatical mistakes and the provision of correct answers. Adding to that pressure are<br />
large class sizes, which have made the practice of offering the “traditional” timely written<br />
feedback to students a struggle for many teachers.<br />
After an initial study of the issues concerning academics and students in the provision of<br />
effective feedback, an action-research project was undertaken with a group of<br />
undergraduate students of Hispanic Studies at University College Dublin. Throughout a<br />
semester, the duration of the module chosen for the study, students were provided with a<br />
variety of learning tasks whose main aim was to engage students with feedback. This<br />
approach to teaching and assessment required the involvement of students in a variety of<br />
learning activities, among others their participation in dialogue about the assessment criteria<br />
adopted, the use of assessment sheets with comments to act on them, the reading and<br />
critiquing of their work and that of their classmates (self- and peer-assessment) or the<br />
provision of feedback, by the teacher, with no grades. Written and oral data were collected<br />
by the teacher of this module at different moments during the semester in order to explore<br />
the experiences of the students with regard to this approach to feedback. Furthermore, a<br />
focus-group session was conducted in class at the end of the semester. This paper reports<br />
on the outcomes of the data collected throughout the semester, on the focus-group session<br />
and on the challenges that this approach to the provision of feedback to students entailed.<br />
Towards More Integrative Assessment<br />
Dai Hounsell, University of Edinburgh, United Kingdom<br />
Chun Ming Tai, University of Edinburgh, United Kingdom<br />
Rui Xu, Ningbo University, China<br />
Assessment in higher education is typified by competing tensions between multiple<br />
purposes, functions and stakeholders; wide diversity in practices within and across subject<br />
areas, courses and institutions; and diffuse responsibilities for the oversight and<br />
management of different aspects of assessment. Achieving coherence and integration in<br />
assessment practices, processes and policies is therefore a formidable challenge.<br />
This poster summarises the outcomes of a project which drew extensively upon the<br />
international literature on assessment in higher education to examine how a more<br />
integrative approach might be pursued. The project was undertaken as part of a sector-wide<br />
initiative in Scottish higher education on quality enhancement.<br />
The main outcomes of the project were a workshop programme and four guides, each of<br />
which focused on a key aspect of Integrative Assessment:<br />
• Monitoring Students’ Experiences of Assessment. This guide examines strategies to ascertain<br />
how well assessment in its various manifestations is working, so as to build on strengths and<br />
take prompt remedial action where helpful. It explores why it is important to monitor assessment<br />
practices systematically, what aspects of assessment are currently well-monitored in Scottish<br />
universities, and how the monitoring of assessment could be improved.<br />
• Balancing Assessment of and Assessment for Learning. This guide discusses ways of striking<br />
an optimal balance between the twin central functions of assessment, i.e. to evaluate and certify<br />
students’ performance or achievement, and to assist students in fulfilling their fullest potential as<br />
learners. It highlights some undesirable side-effects of imbalances and explores four strategies<br />
to rebalance assessment: feed-forward assessments, cumulative coursework,<br />
better-understood expectations and standards, and speedier feedback. Each strategy is illustrated<br />
with case-examples from a range of subjects and settings.<br />
• Blending Assignments and Assessments for High-Quality Learning. The starting-point for<br />
this guide is why it might be important not only to assess students' progress and<br />
performance by a variety of means, but also to consider what combination or blend of<br />
assignments and assessments in a course or programme of study might be optimal. The<br />
guide goes on to explore four important considerations that can shape how assignments<br />
and assessments are blended: blending for alignment of assessment and learning; blending<br />
for student inclusivity; blending to support progression in students’ understanding and skills,<br />
and blending for economy and quality. Examples and case reports are outlined from a<br />
cross-section of subject areas and course settings.<br />
• Managing Assessment Practices and Procedures. This guide argues that while most<br />
dimensions of assessment are generally well-managed, there are also aspects which have<br />
often not received the weight of attention they seem to warrant in the contemporary<br />
university. These aspects are: managing assessment for as well as assessment of learning;<br />
enabling evolutionary change in assessment; and wider sharing of responsibilities for<br />
managing assessment practices and processes.<br />
The four guides are freely downloadable from the Scottish Universities’ Enhancement<br />
Themes website (http://www.enhancementthemes.ac.uk/publications/) and a web-based<br />
version of the guides is being launched in spring 2008.<br />
Using a framework adapted from Systemic Functional Linguistics<br />
to enhance the understanding and design of assessment tasks<br />
Clair Hughes, The University of Queensland, Australia<br />
The plentiful and steadily increasing literature on teaching and learning in higher education<br />
has produced a number of helpful frameworks and guidelines that can be applied to the<br />
development and communication of assessment practice. As an educational developer I<br />
regularly deploy a core group of appropriate resources in the selection and staging<br />
(adaptation of McAlpine, 2004) of assessment tasks that target specific cognitive levels<br />
(Krathwohl, 2002), the planning of feedback (Gibbs and Simpson, 2004; Price and<br />
O’Donovan, 2006) and the making of assessment judgements (Biggs and Collis, 1982).<br />
The literature, however, is surprisingly light on material to support the analysis and<br />
purposeful design of individual assessment tasks. This gap initially became an issue for me<br />
when working with academics in adjusting assessment tasks to minimize opportunities for<br />
plagiarism. Our work was limited by several factors including a failure to acknowledge the<br />
wide variations in both task type and level of demand that can distinguish assessments<br />
within and between such categories as ‘orals’, ‘examinations’ or ‘assignments’; the<br />
identification of tasks by reference to subject matter and activity only; and a belief that<br />
assessment tasks are restricted to the traditional or ‘signature’ forms of assessment<br />
associated with particular disciplines (Bond, 2007).<br />
This paper reports the outcome of my efforts to locate a framework that would provide the<br />
shared concepts and terminology required as a basis for productive and meaningful<br />
discussions of assessment tasks with academics. In broadening my search beyond the<br />
assessment literature, I investigated systemic functional linguistics (SFL) (Eggins, 2004;<br />
Knapp and Watkins, 2005). The resulting framework, described here, has proved a useful<br />
resource for explicating the components of assessment tasks including many that were<br />
previously overlooked or inferred – audience, student perspective, mode of presentation<br />
and so on. The paper outlines the application of the framework to the original purpose of<br />
‘designing out’ opportunities for plagiarism and concludes that the framework has significant<br />
further potential to introduce academic teachers to a vast but generally unfamiliar literature<br />
on the systematic development of academic communication skills (see for example Swales<br />
and Feak, 2004) and as a basis for the critique of assessment as cultural practice.<br />
References<br />
Biggs, J., & Collis, K. (1982). Evaluating the Quality of Learning - the SOLO Taxonomy. New York: Academic Press.<br />
Bond, L. (2007). Toward a Signature Assessment for Liberal Education. Retrieved January 23, 2008, from<br />
http://bondessays.carnegiefoundation.org/?p=8<br />
Eggins, S. (2004). An Introduction to Systemic Functional Linguistics. London & New York: Continuum.<br />
Gibbs, G., & Simpson, C. (2004). Conditions under which assessment supports students' learning.<br />
Learning and Teaching in Higher Education Retrieved 19 April, 2005, from<br />
http://www.glos.ac.uk/shareddata/dms/2B70988BBCD42A03949CB4F3CB78A516.pdf<br />
Knapp, P., & Watkins, M. (2005). Genre, text, grammar: Technologies for teaching and assessing writing.<br />
Sydney: UNSW Press.<br />
Krathwohl, D. (2002). A revision of Bloom's taxonomy: An overview. Theory into Practice, 41(4), 212-218.<br />
McAlpine, L. (2004). Designing learning as well as teaching. Active Learning in Higher Education, 5(2), 119-134.<br />
Swales, J., & Feak, C. (2004). Academic Writing for Graduate Students: Essential Tasks and Skills<br />
(Second ed.). Ann Arbor: The University of Michigan Press.<br />
The use of transparency in the "Interactive examination" for student teachers<br />
Anders Jonsson, Malmö University, Sweden<br />
If the aim of education is for all students to learn and improve, then the expectations must<br />
be transparent to the students. In this study, three aspects of transparency are investigated<br />
in relation to an examination methodology for assessing student teachers' skills in analyzing<br />
classroom situations and in self-assessing their answers: self-assessment criteria, a scoring<br />
rubric, and exemplars. The examinations studied were carried out in 2004, 2005, and 2006<br />
respectively, all with a cohort of first-year student teachers (n = 170, 154, and 138). There<br />
was a large difference in scores between the 2004 and 2005 cohorts (effect size, d = 3.21),<br />
when changes in the examination were implemented in order to increase the transparency.<br />
The comparison between 2005 and 2006, when no further changes were made, does not<br />
show a corresponding difference (d = .27). These results suggest that, by making the<br />
assessment more transparent, students’ performances could be greatly improved.<br />
School monitoring in Luxembourg:<br />
computerized tests and automated results reporting<br />
Ulrich Keller, Monique Reichert, Gilbert Busana, Romain Martin<br />
University of Luxembourg, Luxembourg<br />
This presentation will introduce the Luxembourgish school monitoring project, focusing on<br />
the various tools that were developed and used, especially the more innovative tools<br />
developed for internet-based computer assisted testing and automatic report generation.<br />
Luxembourg’s school system faces a transition encountered in many countries throughout<br />
the world: a transition towards more autonomy for individual schools. This necessitates the<br />
establishment of a school monitoring program, regularly assessing the progress of students<br />
in a variety of areas including, but not limited to, academic achievement.<br />
Apart from the development of valid, reliable and objective measures, two other<br />
requirements for the success and usefulness of such a project are the economical<br />
administration of tests and comprehensive reporting of relevant results. In this presentation,<br />
we will introduce the internet-based testing platform TAO and tools used for automatic<br />
report generation. Though developed in a country with a small population, these tools scale<br />
very well to other contexts and to countries with much larger populations than Luxembourg’s.<br />
We will also outline further possible developments of these tools in order to respond more<br />
fully to the demands of evidence based decision making in an educational context.<br />
Mathematical power of special needs students<br />
Marjolijn Peltenburg, FIsme, Utrecht University, The Netherlands<br />
Marja van den Heuvel-Panhuizen, FIsme, Utrecht University, The Netherlands/<br />
IQB, Humboldt University Berlin, Germany<br />
The poster will inform the conference participants about a small-scale study that forms the<br />
start of the IMPulSE project. This is a large project aimed at revealing the undisclosed<br />
mathematical power of special needs students (Van den Heuvel-Panhuizen & Peltenburg,<br />
2007). The purpose of the small-scale study is to pilot a set of test items which differ from<br />
regular grade-level achievement tests used to determine students’ mathematics<br />
understanding. The items in the pilot have been designed with the intention of offering<br />
children optimal possibilities to show what they are capable of. An important characteristic<br />
of these items is their ‘elasticity’. Elasticity in items allows different levels of strategy use,<br />
which makes it possible for students to pass the limits of their assumed capacities. This<br />
reduces the ‘all-or-nothing’ character of assessment (Van den Heuvel-Panhuizen, 1996).<br />
To reveal the undisclosed mathematical power of weak students we chose a topic that is<br />
recognized as difficult for weak students: subtraction with “borrowing”, that is,<br />
subtraction problems in which the 1’s digit of the subtrahend is larger than the 1’s digit of<br />
the minuend (e.g., 52–17 = ...). A frequently made mistake in these problems is reversing<br />
the digits (in this case, subtracting 2 from 7 instead of 7 from 2).<br />
The set of items that is presented to the children includes fourteen subtraction problems in<br />
the number domain up to 100. The items are taken from the Cito LOVS Test for Mid Grade<br />
6, but re-designed and placed in an ICT environment in which the children are offered a<br />
dynamic visual tool to find the answers. We expect that this tool will help students to<br />
overcome the obstacles, as mentioned above, in solving these subtraction problems which<br />
require “borrowing”.<br />
The data-collection takes place in two schools for primary special education. In total, the set<br />
of items is piloted with 20 children. While working on the computer, the children’s steps<br />
through the program are recorded by the Camtasia Studio software. The analysis of the<br />
data focuses on the correct scores in the two conditions – regular Cito LOVS Test and ICT<br />
version with the dynamic tool – and on tool use in the ICT version.<br />
The poster shows a sample of the problems used in the study and a summary of the<br />
findings. In addition to the results presented on the poster, Camtasia clips will be shown on<br />
a laptop. During the poster presentation, we would like to share with the audience our<br />
experiences with using an ICT-based dynamic assessment format to reveal weak students’<br />
learning potential. In connection with this, we would also like to discuss ways to continue<br />
this research.<br />
References<br />
Van den Heuvel-Panhuizen, M. (1996). Assessment and realistic mathematics education. Utrecht: CD-β<br />
Press/Freudenthal Institute, Utrecht University.<br />
Van den Heuvel-Panhuizen, M., & Peltenburg, M. (2007). Unused learning potential of special-ed students in<br />
mathematics. Research proposal. Utrecht, the Netherlands: Freudenthal Institute for Science and<br />
Mathematics Education.<br />
Quality Assurance review of clinical assessment:<br />
How does one close the loop?<br />
Glynis Pickworth, M. van Rooyen, T.J. Avenant<br />
University of Pretoria, South Africa<br />
The MBChB Undergraduate Programme Committee (UPC) of the School of Medicine,<br />
University of Pretoria mandated the Assessment sub-committee (AC) to review assessment<br />
practices in the student internship rotations. These rotations take place during the last 18<br />
months of the six-year programme. Students no longer have any class activities and work<br />
the whole day in a clinic or hospital. There are five seven-week rotations and eight three- or<br />
three-and-a-half-week rotations through various departments such as Family Medicine,<br />
Obstetrics, Gynaecology, etc. On the whole, the staff supervising and assessing students<br />
are clinicians with little or no training in education and assessment practices. The university<br />
provides such courses for staff but the clinicians’ workload mostly precludes them from<br />
attending such courses. They hold joint appointments with the state and the university and find it<br />
difficult to get time off due to service delivery commitments.<br />
The relevant departments were informed of the review process and criteria, after these had<br />
been approved by the UPC. They were also supplied with a resource guide outlining best<br />
practice in clinical assessment. The AC made an appointment for a group meeting with the<br />
staff responsible for assessment for a particular rotation. The group consisted of rotation<br />
heads and representatives from other departments, members of the AC and members of<br />
the department responsible for the rotation. During the group meeting the assessment<br />
practices would be described and discussed according to the review criteria. The AC would<br />
then compile a report describing the assessment practices. Good practice would be<br />
acknowledged and recommendations for improvement made. The report would be sent to<br />
the person responsible for the rotation to make sure the information on assessment<br />
practices was correct, after which it would be tabled at a meeting of the UPC.<br />
A comparison across rotations revealed that a wide diversity of assessment methods is<br />
used. Compared with the four levels of Miller’s pyramid, too much assessment is still related<br />
to the lower levels of the model rather than to the apex. The review sensitised a<br />
number of staff to good assessment practice through the resource guide and discussion<br />
about their assessment practice.<br />
The question is ‘How does one close the loop in quality assurance?’ Are the<br />
recommendations for improvement actually implemented? A follow-up study still needs to<br />
be done.<br />
Feedback: What’s in it for me?<br />
Margaret Price, Karen Handley, Berry O’Donovan<br />
Oxford Brookes University, United Kingdom<br />
Hattie and Timperley (2007) conceptualize feedback broadly as ‘information provided by an<br />
agent…regarding aspects of one’s performance or understanding' and are very clear that it<br />
must be ‘a “consequence” of performance’. This view is not contentious. Students want a<br />
response to their effort and staff need to provide information on the gap between<br />
performance and aim (Sadler, 1989). However we know that the process of providing and<br />
receiving feedback is fraught with difficulty arising from the multiple purposes of feedback,<br />
communication problems and emotional responses to name but a few.<br />
This paper seeks to examine the relational dimension of feedback and argues that it is a central<br />
but often a missing dimension of feedback. The role of feedback in creating the relational<br />
footings between student and tutor provides the foundations for a successful learning process<br />
and, in particular, for on-going student engagement (Black and Wiliam, 1998).<br />
If we want students to engage with their assessment feedback, we must pay attention to the<br />
relational dimension of feedback. Students are free to accept, partially accept or reject feedback<br />
(Chinn and Brewer, 1993) and we would encourage them to exercise their own judgement in<br />
evaluating feedback as they progress as independent learners. Students make judgements on<br />
the basis not only of the 'content' but also of their perceptions of the credibility and intentions of<br />
the author. In addition, there is a temporal dimension because if students are initially confused<br />
(and negatively evaluate the feedback) but can then engage in dialogue with receptive tutors,<br />
students may come to understand and therefore value the feedback. Therefore our feedback<br />
must be convincing, but not necessarily positive ‘feel-good’ feedback which does not link with<br />
performance (Dweck, 2000). However, to be persuaded of the feedback’s worth, students must<br />
recognise the feedback as valuable through the reciprocity of the assessor. That reciprocity will<br />
be demonstrated through the feedback communication process. Is this process seen as<br />
unidirectional or dialogic, active or passive? Are the participants in this together or separately?<br />
A three-year study on student engagement with assessment feedback involving 35 interviews<br />
with students and staff, 12 case studies, and questionnaire data from 3 institutions will be<br />
presented and the findings used to provide a framework for analysing the factors that impact<br />
on the relational dimension of feedback including:<br />
• effectiveness of communication process<br />
• timeliness of response<br />
• match between staff and student expectations of the process<br />
• trust in the assessor<br />
• media of communication – what sort of knowledge can it carry?<br />
• dialogue opportunity<br />
• context in which it can be acted upon.<br />
The findings confirm that what students are looking for in feedback is not unrealistic, but often<br />
not provided, and this leads to disillusionment and a cycle of disengagement. Therefore the<br />
implications for practice will be considered and discussed, including the need to prepare staff<br />
and students to give and receive feedback and establish a relational footing; the opportunity for<br />
dialogue in resource-constrained environments; and opportunities to use the feedback once<br />
received and understood.<br />
From students’ to teachers’ collaboration:<br />
a case study of the challenges of e-teaching and assessing as co-responsibility<br />
Ana Remesal, Universidad de Barcelona, Spain<br />
Manuel Juárez, José Luis Ramírez<br />
Centro Nacional de Investigación y Desarrollo Tecnológico, Mexico<br />
The introduction of new technologies into education demands that teachers handle tools that<br />
allow them to use techno-pedagogical environments for e-learning (Mauri et al. 2007). These<br />
environments pose a challenge for teachers when it comes to transforming their practice and<br />
to using the new tools in an efficient way that eventually will transform and optimize the<br />
teaching and learning processes. We report about a case study in Higher Education as the<br />
first part of a two-round project. The “Foundations of Computing Science” preliminary distance<br />
course tackled basic subjects in discrete mathematics and their applications to computing; its<br />
aim was to develop common knowledge among the students accepted for the Master’s in<br />
Computer Science in the Centro Nacional de Investigación y Desarrollo Tecnológico (National<br />
Center for Research and Technological Development) (CENIDET), an institution that<br />
belongs to Mexico’s Sistema Nacional de Educación Superior Tecnológica (National System<br />
of Higher Technological Education) (SNEST). The Claroline (V. 1.1) distance learning<br />
platform was used. The purpose of this poster is to describe some of the experiences we had<br />
as designers and teachers of this first Web-based distance course on discrete mathematics.<br />
Particularly, this poster describes the difficulties encountered and the solutions proposed by<br />
the group of professors that designed and developed the course using Claroline as a tool in<br />
order to prepare the second edition of the course.<br />
The course was developed over five weeks in 2007, with a group of 18 students. The<br />
course was structured around five units, one per week. The students’ work consisted of<br />
learning activities done first individually, then contrasted in pairs and then discussed in the<br />
whole group. These activities could be carried out either in an asynchronous or a<br />
synchronous manner. The platform used, despite important deficiencies, allowed for the<br />
organization, administration and follow-up at the individual level, at the student-pair level and<br />
at the whole-group level.<br />
This experience showed us how the distance learning tools introduce conditions that are<br />
different from face-to-face courses. In order for the design of contents, materials and<br />
dynamics to be adequate for this new environment, greater reflection by the teacher is<br />
necessary (Coll, 2004). Big challenges are set for the second implementation of the course:<br />
especially challenges concerning the assessment of students’ learning from a<br />
co-responsibility perspective. The second implementation of the course will be carried out by<br />
two teachers simultaneously. This poses particular challenges as to students’ assessment,<br />
since both teachers will need to clarify and share teaching goals and assessment purposes<br />
and instruments. Thus, the teachers’ conceptions about assessment are expected to play<br />
an important role in this new course.<br />
References<br />
Coll, C. (2004). Psicología de la educación y prácticas educativas mediadas por las tecnologías de la<br />
información y la comunicación. Sinéctica , 25 , 1-24.<br />
Mauri, T., Colomina, R., & De Gispert, I. (in press). Diseño de propuestas docentes con TIC en la<br />
enseñanza superior: nuevos retos y principios de calidad desde una perspectiva<br />
socioconstructivista. Revista de Educación. MEC. (7-3-2007).<br />
Symbiotic relationships:<br />
Assessment for Learning (AfL), study skills and key employability skills<br />
Jon Robinson, Northumbria University, United Kingdom<br />
David Walker, Northumbria University, United Kingdom<br />
This poster links directly to a detailed aspect of the area to be covered in a roundtable<br />
presentation proposal that has been submitted, but it can also stand alone as a<br />
representation of the development of the particular teaching practice.<br />
In the English Division at Northumbria University we have redesigned the core first-year<br />
module for English students in a way that symbiotically links assessment, study skills and<br />
employability within a framework underpinned by the theory and practice of Assessment for<br />
Learning (AfL). This poster provides a textual and graphical presentation of the introduction<br />
and evaluation of the first element of summative assessment on the core first-year<br />
undergraduate module in English Studies at Northumbria University.<br />
This assessment covers the topic of plagiarism, a contentious study skills topic not only<br />
within Northumbria but also in the sector as a whole. The assessment practice is based on<br />
the principles of Assessment for Learning and designed in a way that also provides an<br />
opportunity to begin introducing students to practices that relate directly to key employability<br />
skills highlighted by the English Subject Centre as deficient in the typical English Studies<br />
graduate.<br />
Assessing low achievers’ understanding of place value –<br />
consequences for learning and instruction<br />
Petra Scherer, University of Bielefeld, Germany<br />
Introduction<br />
Understanding place value is necessary for understanding our decimal number system. As<br />
a consequence, this understanding is relevant to different fields of school<br />
mathematics. Having a place-value concept is crucial for developing effective calculation<br />
strategies (e.g. to replace one-by-one finger counting), for understanding the written<br />
algorithms or for moving from integers to fractions. Research shows that low achievers in<br />
particular have great difficulties, even in higher grades, with understanding place value.<br />
The paper describes a small case study in which an assessment tool was developed and<br />
piloted. The tool is intended to give teachers a better understanding of (1) low achievers’<br />
difficulties and (2) the consequences for teaching and learning processes.<br />
Assessment tool development<br />
The existing instruments for assessing the understanding of place value mainly focus on<br />
applying the concept in standard calculations. The tool developed here, in contrast, includes<br />
test items that cover different levels of representation and the main building blocks of<br />
calculation strategies. Moreover, not only standard items were chosen but also unfamiliar<br />
formats and challenging items that have not yet been treated in the classroom, that cannot<br />
be solved in a mechanistic way, and that address the specific role of zero. The assessment<br />
tool comprises tasks with numbers up to 1000 and covers the following topics: counting in<br />
steps, splitting numbers into place values, composing numbers, interpreting iconic<br />
representations of numbers, identifying the place values of digits in three-digit numbers, and<br />
solving simple additions and subtractions. The items can be used for paper-and-pencil tests<br />
as well as for interviews, yielding information on both the oral and written competences of<br />
the children.<br />
Results<br />
The assessment tool was piloted with 12 low-achieving students (4 girls, 8 boys) from 5th<br />
and 6th grade who attend a special school for students with learning disabilities. A first<br />
analysis of the results showed a certain understanding of place value in all students but also<br />
revealed a variety of difficulties, especially with the non-standard items (e.g., composing a<br />
number out of 70+200+3 led to the incorrect number 723, whereas a more or less standard<br />
item like 300+50+4 resulted in a correct solution). Moreover, simple addition and subtraction<br />
tasks were in many cases worked out in a rather mechanistic way, by manipulating the digits<br />
without thinking about the numbers (e.g., students did not consider all place values when<br />
adding 314+314 and came to the result 328). Beyond this, problems with zero became<br />
obvious (e.g., 624–203 led to the result 401).<br />
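The contrast between the standard and non-standard composition items can be sketched as a toy model of the misconception (hypothetical Python, not part of the study): a student who ignores place value and simply strings the leading digits together in the order given still gets the standard item right, so only the non-standard item exposes the error.<br />

```python
def misread_as_digit_string(parts):
    """Hypothetical model of the observed misconception: instead of adding,
    the student concatenates the leading (non-zero) digit of each round
    number in the order given, ignoring place value entirely."""
    # str(p).rstrip("0") keeps only the leading digit of round numbers
    # such as 70 or 200; a bare digit like 3 is kept as-is.
    return int("".join(str(p).rstrip("0") or "0" for p in parts))

# Non-standard item 70+200+3: the misconception is exposed
# (the student writes 723, but the correct sum is 273).
print(misread_as_digit_string([70, 200, 3]))  # 723
# Standard item 300+50+4: the very same misconception still
# produces the correct answer, 354, so the error stays hidden.
print(misread_as_digit_string([300, 50, 4]))  # 354
```

This illustrates why non-standard items are diagnostic: only they separate genuine place-value understanding from mechanistic digit manipulation.<br />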
Discussion<br />
The analysis also shows that test results cannot be viewed in isolation: one has to take into<br />
account that children may interpret the tasks individually, and the analysis of the whole<br />
solution process is just as important. After presenting a selection of results, consequences<br />
for teaching and learning will be discussed. Both the assessment of low achievers’<br />
competences and classroom practice require more than a focus on correct results; they<br />
should also take the students’ solution strategies (including their explanations and<br />
reasoning) into account.<br />
Using a course forum to promote learning and assessment<br />
for learning in environmental education<br />
Revital (Tali) Tal, Technion – Israel Institute of Technology, Israel<br />
In continuation of a previous study, in which a complex assessment framework was<br />
implemented in an environmental education course in a science education department (Tal,<br />
2005), this study focused on one component of the assessment – the discussions in the<br />
course forum. The online asynchronous forum served as a sociocultural arena for raising<br />
questions, leading and participating in socio-environmental debates, uploading the<br />
students’ projects, and carrying out peer assessment. The participants were 15 minority<br />
pre-service teachers from various science education disciplines who had very little prior<br />
knowledge about or awareness of the environment. Within the assessment for learning<br />
framework that directed the learning, the students were required to read and critique<br />
newspaper articles, investigate an environmental problem in their home community,<br />
participate in a field trip and discuss a variety of environmental topics in the course forum<br />
that was managed by the author (the course instructor) and a teaching assistant. The main<br />
goal was to use the course forum to improve and assess learning and engagement in<br />
environmental discourse. As the students were pre-service teachers, an additional goal was<br />
to expose the students to multimodal learning and assessment in environmental education,<br />
which is in line with the basic principles of environmental education. The research questions<br />
were: (a) To what extent did the course forum enable participation in an environmental<br />
education learning community? (b) In what ways did students express engagement and<br />
concern for environmental issues? (c) To what extent did the course forum allow the students<br />
to express diverse learning outcomes? Three levels of participation in the forum were<br />
identified: obligatory – very little participation, limited to the requested course tasks;<br />
occasional – characterized by random activity and limited to responding to others; and<br />
active – expressed by continuous activity either as initiators who brought up new topics for<br />
discussions or respondents who continued to develop the discussion. There was good<br />
alignment between the activity in the forum and the students’ final score. The students who<br />
actively participated in the forum expanded their learning far beyond class and the course<br />
assignments. In interviews carried out a year after the course had ended, these students<br />
referred to the forum as both a learning and an assessment instrument, one that also<br />
contributed to their environmental awareness and commitment. Finally, the course forum<br />
enabled deep discussions that elevated the class-based learning. The students discussed<br />
local problems typical of their communities and provided rich evidence of meaningful<br />
learning. In the follow-up interviews, they pointed to the forum’s contribution to their freedom<br />
of expression and to their success in overcoming the in-class language barrier.<br />
Drawing on sociocultural theory and the idea of communities of practice, the course forum<br />
enhanced learning through intensive interaction among the students, where three levels of<br />
practice were identified: peripheral, occasional and experienced. This study contributes to<br />
the field of teaching and assessment in higher education, and to the field of environmental<br />
education in multicultural societies.<br />
Learning-oriented feedback: a challenge to assessment practice<br />
Mirabelle Walker, The Open University, United Kingdom<br />
The paper starts by presenting research into feedback carried out in the Technology Faculty<br />
of the UK’s Open University. A coding tool introduced by Brown and Glover (2006) was<br />
used to analyse over 3000 comments made on 106 assignments in three undergraduate<br />
course modules. One dimension of this code was used to determine the categories of<br />
comments being made: relating to the content of the answer; relating to skills development;<br />
offering motivation; etc. The other dimension was used to determine the ‘depth’ of the<br />
comments: whether they were indicative, corrective or explanatory. Students’ responses to<br />
these comments were obtained through individual interviews with 43 of the students whose<br />
commented assignments had been examined. The students were asked to indicate, if<br />
possible, an example of a comment on their assignment that they had been able to use in<br />
later assignments in the module. They were also asked how they had responded to some of<br />
the specific comments written on their assignment. In the latter case, a thematic analysis of<br />
these responses was carried out, followed by a matching of response themes to categories<br />
and depths of comment.<br />
Two key results emerged from this work: the most effective comments for the students’<br />
future work are those that relate to skills development; the most effective comments for<br />
helping students to understand inadequacies in their work are those that are explanatory.<br />
The paper shows that these findings are consistent with a conceptualisation of effective<br />
feedback on assignments that, drawing on Sadler (1989) and Black & Wiliam (1998), sees it<br />
as offering students a means whereby they can reduce or close the gap between their own<br />
knowledge, skills and understanding and the desired knowledge, skills and understanding.<br />
Feedback of this type is learning-oriented feedback, and assessment which is designed to<br />
offer adequate opportunities for feedback of this type is learning-oriented assessment.<br />
The paper concludes by highlighting the ways in which these findings challenge both<br />
feedback practice, which is often insufficiently learning-oriented, and assessment practice,<br />
where skills development tends to be undervalued by those who set and mark the<br />
questions, and attention is seldom paid to skills development through the sequence of the<br />
assignments in a module or programme of study.<br />
It is intended that discussion will centre on the challenges to feedback and assessment<br />
practice that arise from this research, as outlined in the conclusion to the paper. It is hoped<br />
that participants will be able to share experiences of, or suggestions for, responding to<br />
these challenges.<br />
References<br />
Black, P. & Wiliam, D. (1998) Assessment and classroom learning, Assessment in Education: Principles,<br />
Policy and Practice, 5(1), 7–74.<br />
Brown, E. & Glover, C. (2006) ‘Evaluating written feedback’ in Bryan, C. & Clegg, K. (eds.) Innovative<br />
Assessment in Higher Education, Abingdon: Routledge, 81–91.<br />
Sadler, D. R. (1989) Formative assessment and the design of instructional systems, Instructional Science,<br />
18, 119–144.<br />
Progressive Formalization as an Interpretive Lens for<br />
Increasing the Learning Potentials of Classroom Assessment<br />
David Webb, University of Colorado at Boulder, The United States of America<br />
Education researchers have repeatedly asserted that to improve student learning, teachers<br />
need to give greater attention to their use of formative assessment. To effectively guide<br />
student learning, teachers must develop greater confidence in their own decision making<br />
and expertise in classroom assessment. To appropriately interpret student responses to<br />
instructional activities, teachers need to understand how the mathematical content<br />
demonstrated in students’ representations relates to the development of student learning and<br />
expectations for mathematical literacy.<br />
The didactical design construct of progressive formalization, and many examples thereof,<br />
draws from decades of developmental research using the principles of Realistic Mathematics<br />
Education. Instructional sequences in RME are conceived as “learning lines” in which problem<br />
contexts serve as starting points to elicit students’ informal representations. When<br />
appropriate, the teacher builds upon students’ representations and either draws upon student<br />
strategies that are progressively more formal or introduces students to new strategies and<br />
models. Students are encouraged to refer back to less formal representations to deepen their<br />
understanding of the abstract-symbolic. Essentially, progressive formalization is a<br />
design-oriented mathematical instantiation of cognitive/constructivist learning theories. Through<br />
careful attention to students' prior knowledge and guided support from the teacher, students’<br />
conceptions are related to other pre-formal mathematical representations. The teacher<br />
facilitates student learning and a sense of ownership by selecting appropriate problems,<br />
interpreting student responses, posing clarifying questions, and using counterexamples to<br />
support the development of students’ mathematical understanding.<br />
This paper reports the underlying design theory and results from a research-based,<br />
professional development program designed to improve teacher confidence, expertise and<br />
use of classroom assessment. Over the past three years, the program has involved 32<br />
middle grades mathematics teachers across six middle schools (serving 12- to 14-year-old<br />
students) in a moderately sized U.S. public school district.<br />
From prior assessment design studies involving mathematics teachers, we recognized that<br />
limitations in the content knowledge of some teachers had a profound influence on their<br />
ability to select or design tasks accessible to students’ informal and pre-formal<br />
representations. As a way to deepen their understanding of mathematics, teachers<br />
completed mathematical tasks that illustrated progressive formalization in rational number<br />
and algebra. In design and analysis activities, teachers continuously used progressive<br />
formalization as a lens to adapt or create assessment tasks, review their instructional<br />
materials, design scoring guides and rubrics, interpret student responses, and discuss<br />
instructional responses based on examples of student work.<br />
The analysis of teachers’ assessment portfolios (i.e., collections of all paper-and-pencil<br />
assessments) suggests that this PD model provided teachers with a more principled basis for<br />
assessing student understanding and resulted in a conceptualization of assessment that<br />
was generative. That is, teachers applied principles of progressive formalization by<br />
increasing the accessibility of the assessment tasks they used and in the ways they<br />
interpreted and responded to student work. The full paper and presentation will include<br />
examples of the classroom assessments teachers designed, how they used student work to<br />
inform the revision and redesign of assessments, and a preliminary analysis of the program’s<br />
impact on the achievement of participating teachers’ students.<br />
Author Index<br />
Adamson 105<br />
Admiraal 69<br />
Allin 57<br />
Asghar 58<br />
Asmyhr 101<br />
Avenant 143<br />
Bakker 126<br />
Barrie 102<br />
Bayrhuber 34<br />
Bell 129<br />
Berenst 52<br />
Bjølseth 115<br />
Black<br />
Beth 59<br />
Paul 66<br />
Blackmore 42<br />
Bloxham 60<br />
Bohemia 65<br />
Bokhove 130<br />
Boud 103<br />
Boursicot 45<br />
Bremer 87<br />
Brockbank 131<br />
Bruder 34<br />
Bucholtz 98, 132<br />
Busana 141<br />
Butcher 114<br />
Campbell 60<br />
Cheung 90, 104<br />
Clark 105<br />
Coates 108<br />
Contreras Palma 61<br />
Cowie 62<br />
Crossouard 14<br />
Dacre 45<br />
Davison 106<br />
de Glopper 50, 52<br />
De Grez 107<br />
Dearnley 76, 108<br />
Diemer 133<br />
Dysthe 25, 27<br />
Ebert 63<br />
Ecclestone 13<br />
Eckes 16<br />
Eggen 64<br />
Ekecrantz 124<br />
Engelsen 27, 91<br />
Entwistle 32<br />
Fastré 134<br />
Fisher 109<br />
Fishwick 57<br />
Foreman-Peck 29<br />
Formazin 98, 132<br />
Fuller 44, 110<br />
Furnborough 111<br />
Goldhammer 39, 40, 74<br />
Goos 135<br />
Handley 84, 144<br />
Harman 65<br />
Harrison 66<br />
Harsch 17<br />
Hartig 17, 33, 36<br />
Hartley 76<br />
Hartnell-Young 26<br />
Havnes 25, 67, 68, 115<br />
Hepplestone 112, 136<br />
Hernandez 137<br />
Higham 45<br />
Hodgen 66<br />
Hoeksma 69<br />
Höhler 36<br />
Homer 110<br />
Hopfenbeck 113<br />
Hounsell<br />
Dai 70, 138<br />
Jenny 70<br />
Hughes<br />
C. 102<br />
Clair 135, 139<br />
Hunter 114<br />
Jadoul 38<br />
James 12<br />
Janssen<br />
Fred 95<br />
Judith 69<br />
Jones<br />
Alister 62<br />
Julie 31<br />
Jonsson 140<br />
Jordan 22, 114, 131<br />
Joughin 71<br />
Juárez 145<br />
Jude 40<br />
Karius 18<br />
Keller 141<br />
Klieme 5, 40<br />
Köller 15<br />
Kunina 35<br />
Kuper 133<br />
Kwant 52<br />
Lai<br />
Mei Kuin 78<br />
Patrick 72<br />
Latour 38<br />
Lauvas 115<br />
Lecaque 38<br />
Leitch 6<br />
Leuders 34<br />
Linsey 120<br />
Luff 116<br />
Maier 73<br />
Marshall 66<br />
Martens 37, 39, 40, 74<br />
Martin 141<br />
McCabe 75<br />
McCusker 21<br />
McDowell 11<br />
McLean 106<br />
Meddings 76<br />
Meeus 28<br />
Mellor 32<br />
Mitchell 131<br />
Montgomery 77<br />
Moreland 62<br />
Nachshon 117<br />
Narciss 49<br />
Naumann 39, 40<br />
Neumann 18<br />
Nicholson 21<br />
Norton<br />
Bill 89<br />
Lin 89<br />
O'Brien 78<br />
O'Doherty 79<br />
O'Donovan 84, 118, 119, 144<br />
Oehler 80<br />
Ooms 120<br />
Orr 81<br />
O'Siochru 94<br />
Otrel-Cass 62<br />
Pat-El 82<br />
Pell 41, 44, 110<br />
Peltenburg 142<br />
Pickworth 143<br />
Pilkington 83<br />
Plichart 38<br />
Price 84, 118, 119, 144<br />
Proctor-Childs 109<br />
Pryor 14<br />
Ramírez 145<br />
Reichert 141<br />
Reimann 25<br />
Remesal 85, 145<br />
Renault 38<br />
Richardson 86<br />
Ridgway 21<br />
Roberts 41, 45<br />
Robinson<br />
Gilian 116<br />
Jon 121, 146<br />
Robitzsch 15, 18, 80<br />
Rölke 39, 40<br />
Rom 117<br />
Roozen 107<br />
Rowley 129<br />
Ruedel 20<br />
Rupp 35<br />
Sambell 77<br />
Sandal 122<br />
Saniter 87<br />
Schaap 88<br />
Scharaf 39<br />
Scherer 147<br />
Schmidt 88<br />
Schofield 123<br />
Schroeders 98<br />
Schwieler 124<br />
Segers 49, 82<br />
Serret 66<br />
Shannon 89<br />
Sit 90, 104<br />
Sjo 91<br />
Sluijsmans 46, 47, 48, 134<br />
Smee 43<br />
Smith 19<br />
Andrew 31<br />
C. 102<br />
Kari 91, 92, 122<br />
Stein 93<br />
Stern 125<br />
Strijbos 48, 49<br />
Strivens 94<br />
Swietlik-Simon 38<br />
Syversen 122<br />
Tai 138<br />
Tal 148<br />
Taylor 108<br />
Tigelaar 95, 126<br />
Tillema 82<br />
Valcke 107<br />
Van de Watering 48<br />
van den Boogaard 53<br />
van den Heuvel-Panhuizen 50, 53, 142<br />
van der Klink 134<br />
van der Pol 51<br />
van Lierop-Debrauwer 51<br />
van Merriënboer 47, 134<br />
Van Petegem 28<br />
van Rooyen 143<br />
van Tartwijk 95<br />
van Zundert 47, 96<br />
Vedder 82<br />
Veldman 95<br />
Verloop 95, 126<br />
Vernon 30<br />
Walker<br />
David 121, 146<br />
Mirabelle 149<br />
Wangensteen 122<br />
Webb<br />
David 150<br />
Marion 120<br />
Webster-Wright 135<br />
Whitelock 19, 97<br />
Wilhelm 35, 98, 132<br />
Wiliam 7<br />
Wirtz 34<br />
Xu 138<br />
Address List of<br />
presenters<br />
Allin, Linda<br />
Northumbria University<br />
CETL<br />
NE1 8ST Newcastle Upon Tyne<br />
UNITED KINGDOM<br />
linda.allin@unn.ac.uk<br />
Barrie, Simon<br />
The University of Sydney<br />
AUSTRALIA<br />
S.Barrie@itl.usyd.edu.au<br />
Blackmore, David<br />
Medical Council of Canada<br />
2283 St. Laurent Boulevard<br />
K1G 5A2 Ottawa<br />
CANADA<br />
dblackmore@mcc.ca<br />
Boud, David<br />
University of Technology, Sydney<br />
PO Box 123<br />
NSW 2007 Broadway<br />
AUSTRALIA<br />
David.Boud@uts.edu.au<br />
Bucholtz, Nina<br />
Humboldt-Universität zu Berlin<br />
IQB<br />
Unter den Linden 6<br />
10099 Berlin<br />
GERMANY<br />
bucholtn@iqb.hu-berlin.de<br />
Asghar, Mandy<br />
Leeds Metropolitan University<br />
7 Chelwood Ave<br />
LS8 2BA Leeds<br />
UNITED KINGDOM<br />
a.asghar@leedsmet.ac.uk<br />
Bell, Andy<br />
Manchester Metropolitan University<br />
24 Park Row<br />
SK4 3DY Heaton Mersey /<br />
Stockport<br />
UNITED KINGDOM<br />
A.Bell@mmu.ac.uk<br />
Bloxham, Sue<br />
University of Cumbria<br />
Bowerham Rd<br />
LA1 3JD Lancaster<br />
UNITED KINGDOM<br />
susan.bloxham@cumbria.ac.uk<br />
Boursicot, Katharine<br />
St George's, University of London<br />
46-47 Compton Road<br />
N1 2PB London<br />
UNITED KINGDOM<br />
kboursic@sgul.ac.uk<br />
Cheung, Kwok Cheung<br />
Faculty of Education<br />
University of Macau<br />
11A, Block 2<br />
Taipa Macao<br />
CHINA<br />
kccheung@umac.mo<br />
Asmyhr, Morten<br />
Østfold University College<br />
Sagmesterveien 49<br />
1414 Trollåsen<br />
NORWAY<br />
morten.asmyhr@hiof.no<br />
Black, Beth<br />
Cambridge Assessment<br />
8 Marshall Road<br />
CB1 7TY Cambridge<br />
UNITED KINGDOM<br />
Black.B@cambridgeassessment.org.uk<br />
Bokhove, Christian<br />
FIsme<br />
Utrecht University<br />
Aidadreef 12<br />
3561 GE Utrecht<br />
NETHERLANDS<br />
cbokhove@gmail.com<br />
Brockbank, Barbara<br />
The Open University<br />
19 Woodfield Road<br />
TN9 2LG Tonbridge<br />
UNITED KINGDOM<br />
bsb3@tutor.open.ac.uk<br />
Clark, Wendy<br />
Northumbria University<br />
CETL<br />
NE1 8ST Newcastle Upon Tyne<br />
UNITED KINGDOM<br />
wendy.clark@unn.ac.uk<br />
Contreras Palma, Saul Alejandro<br />
CHILE<br />
saul2674@hotmail.com<br />
de Glopper, Kees<br />
Center for Language and<br />
Cognition, Faculty of Arts,<br />
University of Groningen<br />
PO Box 716<br />
9700 AS Groningen<br />
NETHERLANDS<br />
c.m.de.glopper@rug.nl<br />
Diemer, Tobias<br />
Freie Universität Berlin<br />
Arnimallee 12<br />
14195 Berlin<br />
GERMANY<br />
diemer@zedat.fu-berlin.de<br />
Ecclestone, Kathryn<br />
Oxford Brookes University<br />
Westminster Institute of Education<br />
Harcourt Hill OX Oxford<br />
UNITED KINGDOM<br />
kecclestone@brookes.ac.uk<br />
Fastré, Greet<br />
Open Universiteit Nederland<br />
Valkenburgerweg 177<br />
6419 AT Heerlen<br />
NETHERLANDS<br />
greet.fastre@ou.nl<br />
Cowie, Bronwen<br />
University of Waikato<br />
Hillcrest Rd<br />
2001 Hamilton<br />
NEW ZEALAND<br />
bcowie@waikato.ac.nz<br />
De Grez, Luc<br />
University College Brussels<br />
Koningsstraat 336<br />
1030 Brussels<br />
BELGIUM<br />
luc.degrez@hubrussel.be<br />
Dysthe, Olga<br />
University of Bergen<br />
Beiteveien 9<br />
5019 Bergen<br />
NORWAY<br />
Olga.Dysthe@iuh.uib.no<br />
Eckes, Thomas<br />
TestDaF Institute<br />
Feithstr. 188<br />
58084 Hagen<br />
GERMANY<br />
thomas.eckes@testdaf.de<br />
Fisher, Margaret<br />
University of Plymouth<br />
Drake Circus<br />
PL4 8AA Plymouth<br />
UNITED KINGDOM<br />
m.fisher@plymouth.ac.uk<br />
Davison, Gillian<br />
Northumbria University<br />
CETL AfL<br />
NE1 8ST Newcastle upon Tyne<br />
UNITED KINGDOM<br />
gillian.davison@unn.ac.uk<br />
Dearnley, Christine<br />
University of Bradford<br />
Ashgrove Barn, Broad Lane<br />
HD9 1LS Huddersfield<br />
UNITED KINGDOM<br />
c.a.dearnley1@bradford.ac.uk<br />
Ebert, Julian<br />
University of Zurich<br />
Binzmühlestr. 14<br />
8050 Zürich<br />
SWITZERLAND<br />
ebert@ifi.uzh.ch<br />
Eggen, Astrid Birgitte<br />
University of Oslo<br />
Kapellveien 17c<br />
487 Oslo<br />
NORWAY<br />
astrid.eggen@ils.uio.no<br />
Foreman-Peck, Lorraine<br />
The University of Northampton<br />
53 Portland Road<br />
0X2 7EZ Oxford<br />
UNITED KINGDOM<br />
lorraine.foremanpeck@northampton.ac.uk<br />
Fuller, Richard<br />
School of Medicine<br />
University of Leeds<br />
LS2 9JT Leeds<br />
UNITED KINGDOM<br />
R.Fuller@leeds.ac.uk<br />
Harman, Kerry<br />
Northumbria University<br />
CETL, Ellison Building<br />
NE1 8ST Newcastle Upon Tyne<br />
UNITED KINGDOM<br />
m.newson@unn.ac.uk<br />
Hartnell-Young, Elizabeth<br />
Learning Science Research Institute<br />
The University of Nottingham<br />
NG8 1BB Nottingham<br />
UNITED KINGDOM<br />
elizabeth.hartnellyoung@nottingham.ac.uk<br />
Hernandez, Rosario<br />
University College Dublin<br />
School of Languages and Literatures<br />
Newman Building, Belfield 4<br />
Dublin<br />
IRELAND<br />
charo.hernandez@ucd.ie<br />
Hopfenbeck, Therese Nerheim<br />
University of Oslo<br />
Faculty of Education<br />
Sem Seland vei 24, P.O. Box 1099<br />
Blindern<br />
NO-0317 Oslo<br />
NORWAY<br />
t.n.hopfenbeck@ils.uio.no<br />
Furnborough, Concha<br />
The Open University<br />
Walton Hall<br />
MK7 6AA Milton Keynes<br />
UNITED KINGDOM<br />
c.furnborough@open.ac.uk<br />
Harrison, Christine<br />
King's College London<br />
Franklin-Wilkins-Building WBW,<br />
150 Stamford Street<br />
SE1 9NN London<br />
UNITED KINGDOM<br />
christine.harrison@kcl.ac.uk<br />
Havnes, Anton<br />
University of Bergen<br />
Øvreveien 36<br />
N-1450 Nesoddtangen<br />
NORWAY<br />
anton.havnes@hio.no<br />
Hoeksma, Mark<br />
University of Amsterdam Graduate<br />
School for Teaching and Learning<br />
P.E. Tegelbergplein 4<br />
1019 TA Amsterdam<br />
NETHERLANDS<br />
m.hoeksma@uva.nl<br />
Hounsell, Dai<br />
University of Edinburgh<br />
Paterson's Land, Holyrood Road<br />
EH8 8AQ Edinburgh<br />
UNITED KINGDOM<br />
Dai.Hounsell@ed.ac.uk<br />
Goldhammer, Frank<br />
German Institute for International<br />
Educational Research (DIPF)<br />
Schlossstr. 29<br />
60486 Frankfurt/Main<br />
GERMANY<br />
goldhammer@dipf.de<br />
Hartig, Johannes<br />
German Institute for International<br />
Educational Research (DIPF)<br />
Schloßstraße 29<br />
60486 Frankfurt am Main<br />
GERMANY<br />
hartig@dipf.de<br />
Hepplestone, Stuart<br />
Sheffield Hallam University<br />
Howard Street<br />
S1 1WB Sheffield<br />
UNITED KINGDOM<br />
s.j.hepplestone@shu.ac.uk<br />
Höhler, Jana<br />
German Institute for International<br />
Educational Research (DIPF)<br />
Schloßstraße 29<br />
60486 Frankfurt am Main<br />
GERMANY<br />
hoehler@dipf.de<br />
Hounsell, Jenny<br />
University of Edinburgh<br />
Paterson's Land, Holyrood Road<br />
EH8 8AQ Edinburgh<br />
UNITED KINGDOM<br />
Jenny.Hounsell@ed.ac.uk<br />
Hughes, Clair<br />
The University of Queensland<br />
147 Swann Rd<br />
4068 Brisbane<br />
AUSTRALIA<br />
clair.hughes@uq.edu.au<br />
Jonsson, Anders<br />
Malmö University<br />
School of Teacher Education<br />
SE-205 06 Malmö<br />
SWEDEN<br />
anders.jonsson@mah.se<br />
Keller, Ulrich<br />
University of Luxembourg<br />
Route de Diekirch<br />
L-7220 Walferdange<br />
LUXEMBOURG<br />
ulrich.keller@uni.lu<br />
Kunina, Olga<br />
Humboldt-Universität zu Berlin<br />
IQB<br />
Unter den Linden 6<br />
10099 Berlin<br />
GERMANY<br />
Olga.Kunina@iqb.hu-berlin.de<br />
Lauvas, Per<br />
Østfold University College<br />
NORWAY<br />
per.lauvas@hiof.no<br />
James, David<br />
University of the West of England<br />
Coldharbour Lane<br />
BS16 1QY Bristol<br />
UNITED KINGDOM<br />
david.james@uwe.ac.uk<br />
Jordan, Sally<br />
The Open University<br />
COLMSCT<br />
95 Sluice Road<br />
PE38 0DZ Downham Market<br />
UNITED KINGDOM<br />
s.e.jordan@open.ac.uk<br />
Klieme, Eckhard<br />
German Institute for International<br />
Educational Research (DIPF)<br />
Schloßstraße 29<br />
60486 Frankfurt<br />
GERMANY<br />
klieme@dipf.de<br />
Kwant, Aletta<br />
Center for Language and<br />
Cognition, Faculty of Arts<br />
University of Groningen<br />
PO Box 716<br />
9700 AS Groningen<br />
NETHERLANDS<br />
l.p.kwant@rug.nl<br />
Leitch, Ruth<br />
Queen`s University Belfast<br />
69-71 University Street<br />
BT7 1HL Belfast<br />
UNITED KINGDOM<br />
r.leitch@qub.ac.uk<br />
Jones, Julie<br />
The University of Northampton<br />
11 Cardinal Close<br />
NN4 0RP Northampton<br />
UNITED KINGDOM<br />
julie.jones@northampton.ac.uk<br />
Joughin, Gordon<br />
University of Wollongong<br />
CEDIR, University of Wollongong<br />
2522 Wollongong<br />
AUSTRALIA<br />
gordonj@uow.edu.au<br />
Köller, Olaf<br />
Humboldt-Universität zu Berlin<br />
IQB<br />
Unter den Linden 6<br />
10099 Berlin<br />
GERMANY<br />
iqboffice@iqb.hu-berlin.de<br />
Lai, Patrick<br />
The Hong Kong Polytechnic University<br />
Educational Development Centre<br />
Room TU607<br />
Hung Hom, Kowloon<br />
Hong Kong<br />
CHINA<br />
etktlai@netvigator.com<br />
Luff, Paulette<br />
Anglia Ruskin University<br />
Bishop Hall Lane<br />
CM1 1SQ Chelmsford<br />
UNITED KINGDOM<br />
paulette.luff@anglia.ac.uk<br />
Maier, Uwe<br />
University of Education Schw. Gmünd<br />
Ostalbstrasse 8<br />
73529 Schwäbisch Gmünd<br />
GERMANY<br />
uwe.maier@ph-gmuend.de<br />
McDowell, Liz<br />
University of Northumbria<br />
CETL Hub D121, Ellison Building,<br />
Ellison Place<br />
NE1 8ST Newcastle Upon Tyne<br />
UNITED KINGDOM<br />
liz.mcdowell@unn.ac.uk<br />
Mellor, Antony<br />
Northumbria University<br />
School of Applied Sciences<br />
NE1 8ST Newcastle Upon Tyne<br />
UNITED KINGDOM<br />
antony.mellor@unn.ac.uk<br />
Naumann, Johannes<br />
German Institute for International<br />
Educational Research (DIPF)<br />
Schloßstraße 29<br />
60486 Frankfurt am Main<br />
GERMANY<br />
naumann@dipf.de<br />
O'Donovan, Berry<br />
Oxford Brookes University<br />
150 Marlborough Road<br />
OX1 4LS Oxford<br />
UNITED KINGDOM<br />
bodonovan@brookes.ac.uk<br />
Martens, Thomas<br />
German Institute for International<br />
Educational Research (DIPF)<br />
Postfach 900270<br />
60442 Frankfurt am Main<br />
GERMANY<br />
m@rtens.net<br />
Meddings, Fiona<br />
Division of Midwifery &<br />
Reproductive Health<br />
University of Bradford<br />
55 Crowther Avenue<br />
LS28 5SA Leeds<br />
UNITED KINGDOM<br />
f.s.meddings@bradford.ac.uk<br />
Montgomery, Catherine<br />
Northumbria University<br />
CETL AfL, Ellison Building<br />
Ellison Place, Newcastle<br />
NE1 8ST Newcastle<br />
UNITED KINGDOM<br />
c.montgomery@unn.ac.uk<br />
O'Brien, Patrice<br />
Faculty of Education<br />
University of Auckland<br />
111 Blockhouse Bay Rd, Avondale<br />
1026 Auckland<br />
NEW ZEALAND<br />
pa.obrien@auckland.ac.nz<br />
Oehler, Raphaela<br />
Humboldt-Universität zu Berlin<br />
IQB<br />
Unter den Linden 6<br />
10099 Berlin<br />
GERMANY<br />
raphaela.oehler@iqb.hu-berlin.de<br />
McCabe, Michael<br />
University of Portsmouth<br />
Lion Terrace<br />
PO1 3HF Portsmouth<br />
UNITED KINGDOM<br />
michael.mccabe@port.ac.uk<br />
Meeus, Wil<br />
Universiteit Antwerpen<br />
Venusstraat 35<br />
2000 Antwerpen<br />
BELGIUM<br />
wil.meeus@ua.ac.be<br />
Nachshon, Michal<br />
Ministry of Education<br />
Vardiya st. 24<br />
34657 Haifa<br />
ISRAEL<br />
michaln@tx.technion.ac.il<br />
O'Doherty, Michelle<br />
Liverpool Hope University<br />
Hope Park<br />
L16 9JD Liverpool<br />
UNITED KINGDOM<br />
odoherm@hope.ac.uk<br />
Ooms, Ann<br />
Kingston University<br />
19 Woodlands - 4 South Bank<br />
KT6 6DB Surbiton<br />
UNITED KINGDOM<br />
a.ooms@kingston.ac.uk<br />
Orr, Susan<br />
York St John University<br />
Lord Mayor's Walk<br />
YO317EX York<br />
UNITED KINGDOM<br />
s.orr@yorksj.ac.uk<br />
Peltenburg, Marjolijn<br />
FIsme<br />
Utrecht University<br />
Aidadreef 12<br />
3561 GE Utrecht<br />
NETHERLANDS<br />
M.Peltenburg@fi.uu.nl<br />
Plichart, Patrick<br />
CRP Henri Tudor<br />
Avenue John F. Kennedy L, 29<br />
1855 Luxembourg - Kirchberg<br />
LUXEMBOURG<br />
patrick.plichart@tudor.lu<br />
Remesal, Ana<br />
Universidad de Barcelona<br />
Paseo del Valle Hebrón, 171<br />
E-08035 Barcelona<br />
SPAIN<br />
aremesal@ub.edu<br />
Roberts, Trudie<br />
University of Leeds<br />
Level 7, Worsley Building,<br />
Clarendon Way<br />
LS2 9NL Leeds<br />
UNITED KINGDOM<br />
t.e.roberts@leeds.ac.uk<br />
Pat-El, Ron<br />
Leiden University<br />
Catharinaland 69<br />
2591 CG Den Haag<br />
NETHERLANDS<br />
rpatel@fsw.leidenuniv.nl<br />
Pickworth, Glynis<br />
University of Pretoria<br />
90 Wenning Street<br />
181 Pretoria<br />
SOUTH AFRICA<br />
glynis.pickworth@up.ac.za<br />
Price, Margaret<br />
Oxford Brookes University<br />
2 Hearne Road<br />
W4 3NJ London<br />
UNITED KINGDOM<br />
meprice@brookes.ac.uk<br />
Richardson, Mary<br />
Roehampton University<br />
Froebel College<br />
Roehampton Lane<br />
SW15 5PJ London<br />
UNITED KINGDOM<br />
mary.richardson@roehampton.ac.uk<br />
Robinson, Jon<br />
Northumbria University<br />
CETL AfL<br />
NE1 8ST Newcastle Upon Tyne<br />
UNITED KINGDOM<br />
john.robinson@unn.ac.uk<br />
Pell, Godfrey<br />
University of Leeds<br />
CSSME, EC Stoner Building<br />
LS2 9JT Leeds<br />
UNITED KINGDOM<br />
G.Pell@leeds.ac.uk<br />
Pilkington, Ruth<br />
University of Central Lancashire<br />
67 Lower Bank Road<br />
PR2 8NU Preston<br />
UNITED KINGDOM<br />
RMHPilkington@uclan.ac.uk<br />
Pryor, John<br />
University of Sussex<br />
9 Wellington Road<br />
BN2 3AB Brighton<br />
UNITED KINGDOM<br />
j.b.pryor@sussex.ac.uk<br />
Ridgway, Jim<br />
University of Durham<br />
School of Education<br />
Leazes Road<br />
DH1 1TA Durham<br />
UNITED KINGDOM<br />
Jim.Ridgway@durham.ac.uk<br />
Robitzsch, Alexander<br />
Humboldt-Universität zu Berlin<br />
IQB<br />
Unter den Linden 6<br />
10099 Berlin<br />
GERMANY<br />
alexander.robitzsch@iqb.hu-berlin.de<br />
Ruedel, Cornelia<br />
University of Zurich<br />
E-Learning Center<br />
Hirschengraben 84<br />
8001 Zurich<br />
SWITZERLAND<br />
Cornelia.Ruedel@access.uzh.ch<br />
Schaap, Lydia<br />
Erasmus University Rotterdam<br />
Institute of Psychology<br />
Haagdijk 51A<br />
4811 TP Breda<br />
NETHERLANDS<br />
l.schaap@fsw.eur.nl<br />
Schwieler, Elias<br />
Stockholm University<br />
UPC Frescativ. 28<br />
106 91 Stockholm<br />
SWEDEN<br />
elias.schwieler@upc.su.se<br />
Sjo, Anne Kristin<br />
Stord/Haugesund University College<br />
PB 5000<br />
5409 Stord<br />
NORWAY<br />
aks@hsh.no<br />
Smith, Kari<br />
University of Bergen<br />
Post box 7800<br />
5120 Bergen<br />
NORWAY<br />
kari.smith@iuh.uib.no<br />
Sandal, Ann Karin<br />
Sogn and Fjordane University College<br />
Stedjeåsen 24<br />
6856 Sogndal<br />
NORWAY<br />
ann.karin.sandal@hisf.no<br />
Scherer, Petra<br />
University of Bielefeld<br />
Faculty of Mathematics<br />
Athener Weg 9<br />
44269 Dortmund<br />
GERMANY<br />
petra.scherer@uni-bielefeld.de<br />
Shannon, Lee<br />
Liverpool Hope University<br />
6 Leda Grove<br />
L17 8XL Liverpool<br />
UNITED KINGDOM<br />
leeroyshannon@hotmail.co.uk<br />
Sluijsmans, Dominique<br />
Open Universiteit Nederland<br />
PO Box 2960<br />
6401 DL Heerlen<br />
NETHERLANDS<br />
dominique.sluijsmans@ou.nl<br />
Stein, Margit<br />
Lehrstuhl für Sozialpädagogik und<br />
Gesundheitspädagogik<br />
Kath. Universität Eichstätt-Ingolstadt<br />
Schießstättberg 5<br />
85072 Eichstätt<br />
GERMANY<br />
margit.stein@gmx.net<br />
Saniter, Andreas<br />
ITB Uni Bremen<br />
Am Fallturm 1 Pf. 330440<br />
28334 Bremen<br />
GERMANY<br />
asaniter@uni-bremen.de<br />
Schofield, Mark<br />
Edge Hill University<br />
St Helens Road<br />
L39 4QP Lancashire<br />
UNITED KINGDOM<br />
schom@edgehill.ac.uk<br />
Sit, Pou Seong<br />
Faculty of Education<br />
University of Macau<br />
J520<br />
Taipa Macao<br />
CHINA<br />
pssit@umac.mo<br />
Smee, Sydney<br />
Medical Council of Canada<br />
2283 St. Laurent Blvd<br />
K1G 5A2 Ottawa<br />
CANADA<br />
sydney@mcc.ca<br />
Stern, Thomas<br />
University of Klagenfurt<br />
Schottenfeldg. 29<br />
1070 Wien<br />
AUSTRIA<br />
thomas.stern@uni-klu.ac.at<br />
Strijbos, Jan-Willem<br />
Universiteit Leiden<br />
Fac. Sociale Wetenschappen<br />
Postbus 9555<br />
2300 RB Leiden<br />
NETHERLANDS<br />
jwstrijbos@fsw.leidenuniv.nl<br />
Tigelaar, Dineke<br />
ICLON-Leiden University<br />
Graduate School of Teaching<br />
PO Box 9555<br />
2300 RB Leiden<br />
NETHERLANDS<br />
DTigelaar@iclon.leidenuniv.nl<br />
van der Pol, Coosje<br />
Tilburg University<br />
Retiesheike 16<br />
2460 Kasterlee<br />
NETHERLANDS<br />
j.a.vdrpol@uvt.nl<br />
Walker, Mirabelle<br />
The Open University<br />
Communication & Systems Dept.<br />
MCT Faculty<br />
Walton Hall<br />
MK7 6AA Milton Keynes<br />
UNITED KINGDOM<br />
c.m.walker@open.ac.uk<br />
Wilhelm, Oliver<br />
Humboldt-Universität zu Berlin<br />
IQB<br />
Unter den Linden 6<br />
10099 Berlin<br />
GERMANY<br />
oliver.wilhelm@rz.hu-berlin.de<br />
Strivens, Janet<br />
The University of Liverpool<br />
Y Graig, Llandegla<br />
LL11 3BG Wrexham<br />
UNITED KINGDOM<br />
strivens@liv.ac.uk<br />
van den Boogaard, Sylvia<br />
FIsme<br />
Utrecht University<br />
Aidadreef 12<br />
3561 GE Utrecht<br />
NETHERLANDS<br />
s.vandenboogaard@fi.uu.nl<br />
van Zundert, Marjo<br />
Open Universiteit Nederland<br />
Postbus 2960<br />
6401 DL Heerlen<br />
NETHERLANDS<br />
marjo.vanzundert@ou.nl<br />
Webb, David<br />
University of Colorado at Boulder<br />
249 UCB<br />
80309 Boulder<br />
UNITED STATES<br />
dcwebb@colorado.edu<br />
Wiliam, Dylan<br />
University of London<br />
20 Bedford Way<br />
WC1H 0AL London<br />
UNITED KINGDOM<br />
d.wiliam@ioe.ac.uk<br />
Tal, Tali<br />
Technion –<br />
Israel Institute of Technology<br />
30 Ella st<br />
25147 Kefar Veradim<br />
ISRAEL<br />
rtal@technion.ac.il<br />
van den Heuvel-Panhuizen, Marja<br />
FIsme, Utrecht University<br />
Aidadreef 12, 3561 GE Utrecht<br />
NETHERLANDS<br />
m.vandenheuvel@fi.uu.nl<br />
IQB, Humboldt-Universität zu Berlin<br />
Unter den Linden 6, 10099 Berlin<br />
GERMANY<br />
heuvelpm@IQB.hu-berlin.de<br />
Vernon, Julia<br />
The University of Northampton<br />
Park Campus, Boughton Green Rd<br />
Northampton NN2 7AL<br />
UNITED KINGDOM<br />
julia.vernon@northampton.ac.uk<br />
Whitelock, Denise<br />
The Open University<br />
Institute of Educational Technology<br />
Walton Hall<br />
MK7 6AA Milton Keynes<br />
UNITED KINGDOM<br />
d.m.whitelock@open.ac.uk<br />
Wirtz, Markus Antonius<br />
University of Education<br />
Department of Psychology<br />
Kunzenweg 21<br />
79117 Freiburg<br />
GERMANY<br />
markus.wirtz@ph-freiburg.de<br />