HYDROLOGICAL MODELLING AND RIVER BASIN MANAGEMENT

Danmarks og Grønlands Geologiske Undersøgelse — Særudgivelse 2007 

Hydrological Modelling and River Basin Management 

Doctoral Thesis 

Jens Christian Refsgaard 

Geological Survey of Denmark and Greenland 

Danish Ministry of the Environment

This thesis has been accepted by the Faculty of Natural Science at the University of Copenhagen for public defence 

in fulfilment of the degree of Doctor of Science. 

Copenhagen, 5th January 2007

Nils O Andersen 

Dean 

The defence will take place on Friday 1st June 2007 at 14:00 in Anneksauditorium A, Studiestræde 6, University of Copenhagen

Special Issue 

Author: Jens Christian Refsgaard 

Illustrations: Kristian A. Rasmussen and reproductions from existing publications 

Cover: Kristian A. Rasmussen 

Date: January 2007 

The Report is available on the internet at http://www.geus.dk/ 

ISBN 978-87-7871-185-4 

Geological Survey of Denmark and Greenland (GEUS) 

Øster Voldgade 10 

DK-1350 København K 

Tel: +45 38142000 

Fax: +45 38142050 

Email: geus@geus.dk 

http://www.geus.dk/

Refsgaard JC – Doctoral Thesis January 2007 


Table of Contents 

Dansk Resume
Abstract
1. Introduction
1.1 Water Resources Management and Hydrological Modelling
1.2 Objective and Content
2 Water Resources Management and the Modelling Process
2.1 Modelling as Part of the Planning and Management Process
2.2 Terminology and Scientific Philosophical Basis for the Modelling Process
2.2.1 Background
2.2.2 Terminology and guiding principles
2.2.3 Scientific philosophical aspects
2.3 Modelling Protocol
2.4 Classification of Models
3 Simulation of Hydrological Processes at Catchment Scale
3.1 Flow modelling
3.1.1 Groundwater/surface water model for the Suså catchment ([1], [2])
3.1.2 Application of SHE to catchments in India ([4], [5])
3.1.3 Intercomparison of different types of hydrological models ([6])
3.2 Reactive Transport
3.2.1 Oxygen transport and consumption in the unsaturated zone ([3])
3.2.2 An integrated model for the Danubian Lowland ([9])
3.2.3 Large scale modelling of groundwater contamination ([10])
3.3 Real-time Flood Forecasting
3.3.1 Intercomparison of updating procedures for real-time forecasting ([8])
4. Key Issues in Catchment Scale Hydrological Modelling
4.1 Scaling
4.1.1 Catchment heterogeneity
4.1.2 A scaling framework
4.1.3 Scaling - an example
4.1.4 Discussion – post evaluation
4.2 Confirmation, Verification, Calibration and Validation
4.2.1 Confirmation of conceptual model
4.2.2 Code verification
4.2.3 Model calibration
4.2.4 Model validation
4.3 Uncertainty Assessment
4.3.1 Modelling uncertainty in a water resources management context
4.3.2 Data uncertainty
4.3.3 Parameter uncertainty
4.3.4 Model structure uncertainty
4.4 Quality Assurance in Model based Water Management
4.4.1 Background
4.4.2 The HarmoniQuA approach
4.4.3 Organisational requirements for QA guidelines to be effective
4.4.4 Performance criteria and uncertainty – when is a model good enough
5 Conclusions and Perspectives for Future Work
5.1 Summary of Main Scientific Contributions
5.2 Modelling Issues for Future Research
6 References




Appendices: Publications [1] – [15] 

[1] Refsgaard JC, Hansen E (1982) A Distributed Groundwater/Surface Water Model for the Suså Catchment. Part 1: Model Description. Nordic Hydrology, 13, 299-310.

[2] Refsgaard JC, Hansen E (1982) A Distributed Groundwater/Surface Water Model for the Suså Catchment. Part 2: Simulations of Streamflow Depletions Due to Groundwater Abstraction. Nordic Hydrology, 13, 311-322.

[3] Refsgaard JC, Christensen TH, Ammentorp HC (1991) A model for oxygen transport and consumption in the unsaturated zone. Journal of Hydrology, 129, 349-369.

[4] Refsgaard JC, Seth SM, Bathurst JC, Erlich M, Storm B, Jørgensen GH, Chandra S (1992) Application of the SHE to catchments in India - Part 1: General results. Journal of Hydrology, 140, 1-23.

[5] Jain SK, Storm B, Bathurst JC, Refsgaard JC, Singh RD (1992) Application of the SHE to catchments in India - Part 2: Field experiments and simulation studies with the SHE on the Kolar subcatchment of the Narmada River. Journal of Hydrology, 140, 25-47.

[6] Refsgaard JC, Knudsen J (1996) Operational validation and intercomparison of different types of hydrological models. Water Resources Research, 32(7), 2189-2202.

[7] Refsgaard JC (1997) Parametrisation, calibration and validation of distributed hydrological models. Journal of Hydrology, 198, 69-97.

[8] Refsgaard JC (1997) Validation and Intercomparison of Different Updating Procedures for Real-Time Forecasting. Nordic Hydrology, 28, 65-84.

[9] Refsgaard JC, Sørensen HR, Mucha I, Rodak D, Hlavaty Z, Bansky L, Klucovska J, Topolska J, Takac J, Kosc V, Enggrob HG, Engesgaard P, Jensen JK, Fiselier J, Griffioen J, Hansen S (1998) An Integrated Model for the Danubian Lowland – Methodology and Applications. Water Resources Management, 12, 433-465.

[10] Refsgaard JC, Thorsen M, Jensen JB, Kleeschulte S, Hansen S (1999) Large scale modelling of groundwater contamination from nitrogen leaching. Journal of Hydrology, 221(3-4), 117-140.

[11] Thorsen M, Refsgaard JC, Hansen S, Pebesma E, Jensen JB, Kleeschulte S (2001) Assessment of uncertainty in simulation of nitrate leaching to aquifers at catchment scale. Journal of Hydrology, 242, 210-227.

[12] Refsgaard JC, Henriksen HJ (2004) Modelling guidelines – terminology and guiding principles. Advances in Water Resources, 27(1), 71-82.

[13] Refsgaard JC, Henriksen HJ, Harrar WG, Scholten H, Kassahun A (2005) Quality assurance in model based water management – review of existing practice and outline of new approaches. Environmental Modelling & Software, 20, 1201-1215.

[14] Refsgaard JC, Nilsson B, Brown J, Klauer B, Moore R, Bech T, Vurro M, Blind M, Castilla G, Tsanis I, Biza P (2005) Harmonised techniques and representative river basin data for assessment and use of uncertainty information in integrated water management (HarmoniRiB). Environmental Science and Policy, 8, 267-277.

[15] Refsgaard JC, van der Sluijs JP, Brown J, van der Keur P (2006) A framework for dealing with uncertainty due to model structure error. Advances in Water Resources, 29, 1586-1597.




Preface 

The work presented in this thesis, together with the 15 publications published between 1982 and 2006, forms the material for evaluation for the degree of doctor scientiarum (dr. scient.) at the University of Copenhagen. The papers have all been published in peer-reviewed international scientific journals. They are referred to by the numbers [1] to [15].

In the present report I have assembled and summarised my most important scientific contributions to catchment modelling, which has been my research interest during the past three decades. In this connection I wish to thank all my co-authors for a very inspiring co-operation over the years. Research does not take place in a vacuum, and without the interactions with them my work would not have been possible.

I wish to acknowledge former and present colleagues and managements at the three organisations 

where I have been employed. At the Institute of Hydrodynamics and Hydraulic Engineering, Technical 

University of Denmark (now Environment and Resources, DTU) I was given the opportunity to explore 

and develop new integrated groundwater/surface water catchment models at a time when hydrological 

modelling was still in its infancy. This showed me the enormous potential of this new field. At the Danish

Hydraulic Institute (now DHI Water & Environment) I was then entrusted with further development of 

modelling tools and with testing them in real life applications. This taught me the limitations and difficulties 

we encounter and the need to be humble when applying models in water resources management. 

Finally, the Geological Survey of Denmark and Greenland (GEUS) has provided a very inspiring scientific 

environment and given me the opportunity to get involved in broader international research projects 

that have matured much of my previous views and allowed me to assemble this work. 

Special thanks go to Kristian A. Rasmussen, GEUS, for using his magic touch to polish some of the old dusty figures from the last century to make them easier to read in this thesis.

Last, but not least, I wish to thank my family for their patience and support and for accepting that I have always been too busy with this topic.

Copenhagen, January 2007 

Jens Christian Refsgaard 

"Life can only be understood backwards; but it must be lived forwards" 

Søren Kierkegaard (1813-1855) 




Danish Summary (Dansk Resume)

The publications and material in this doctoral thesis describe a series of scientific investigations of hydrological modelling at catchment scale in relation to water resources management. Each of the 15 publications focuses on parts of the overall topic, ranging from development of new concepts and model codes to model applications; from point scale to catchment scale; from modelling of water flows to transport of dissolved and reactive substances; from a focus on planning to real-time flood forecasting and further on to cross-cutting issues and protocols for the modelling process itself.

Chapter 2 of the thesis presents protocols for hydrological modelling and a discussion of the interaction between hydrological modelling and water resources management. It also explains the terminology and the underlying scientific philosophy, as well as the classification of model types, used in the rest of the thesis. Chapter 3 contains summaries of modelling studies based on nine of the publications. The assessments of these publications' contributions to new knowledge at the time they were published, and of issues that were not addressed in the publications, show a considerable development over the last 25 years. For example, the first publications on development of new model codes contain nothing on verification of model code, validation of models against independent data or uncertainty assessments, issues that are today considered very important. The examples also illustrate how general issues such as scaling problems and model validation gradually evolved on the basis of experience and recognised problems from modelling studies that actually had other purposes. Chapter 4 then presents and discusses four general issues: (a) heterogeneity and scaling; (b) confirmation, verification, calibration and validation of models; (c) uncertainty assessments; and (d) quality assurance of the modelling process.

My most important contributions to new scientific knowledge have been within the following five areas:

• New conceptual understanding and associated code development. The Suså model was based on a new understanding of the interaction between surface water and groundwater in moraine areas and brought new knowledge on how groundwater abstraction affects streams in such catchments.

• Validation of models. The work on rigorous principles for validation of models, and examples of applications for both ‘lumped conceptual’ and ‘distributed physically-based’ models, has been a cornerstone of my research over the last 15 years. In particular, the introduction of the concept of ‘conditional validation’ is new.

• Scaling. My work has not ‘solved’ the scaling problems, but it helps to clarify the fundamentally different methods, with focus on their respective assumptions and limitations.

• Uncertainty assessments. A considerable part of my research activity over the last 10 years has focused on uncertainty aspects. My main contribution in this context has been the introduction of broader uncertainty aspects throughout the modelling process, as well as the work on model structure uncertainty.

• Protocols for hydrological modelling and quality assurance of the modelling process. The comprehensive and detailed modelling protocol developed in the HarmoniQuA project is a formalisation and implementation of experience from the preceding 25 years of work with hydrological modelling. The new elements are the emphasis placed on (a) the interactive dialogue between modeller, water resources manager, reviewer, stakeholders and the public; (b) uncertainty assessments as a continuous element throughout the modelling process; (c) model validation; and (d) introduction of experience and subjective knowledge via external reviews.




Abstract 

The publications and material presented in this thesis describe a series of scientific investigations on 

catchment modelling in relation to water resources management. Each of the 15 publications represents 

parts of the overall topic ranging from development of new concepts and model codes to model 

applications; from point scale to catchment scale; from flow modelling to transport and reactive modelling; 

from planning type applications to real-time forecasting and further on to cross-cutting issues and

protocols for the modelling process. 

The thesis starts with a presentation of protocols for the hydrological modelling process together with a 

discussion of the interaction between the water resources planning and management process and the 

hydrological modelling process. This includes a definition of terminology, a discussion of the underlying 

scientific philosophy and a classification of hydrological models. The following chapter comprises summaries 

of cases of simulation models based on nine of the publications. The post evaluations of the 

contributions to scientific knowledge in the publications and the issues not taken into account in the 

earlier publications reveal significant developments over the years. For example, the first publications

focussing on development of new model codes did not put any emphasis on rigorous verification or 

validation tests nor on uncertainty assessments, which are key issues today. The cases furthermore 

illustrate how general issues such as scaling and model validation gradually emerged from experiences 

and problems encountered in catchment studies that had other primary objectives. The next chapter 

then provides a presentation and discussion of four general issues: (a) catchment heterogeneity and 

scaling; (b) confirmation, verification, calibration and model validation; (c) uncertainty assessment; and 

(d) quality assurance in model based water management. 

My main contributions to scientific knowledge have been in the following five areas: 

• New conceptual understanding and code development. The Suså model was based on a new conceptual 

understanding of the surface water/groundwater interaction in moraine catchments and

brought new insight into the effect of groundwater abstraction on streamflow in catchments with 

such hydrogeological characteristics. 

• Model validation. The work on rather rigorous principles for model validation and the examples of 

their application both for lumped conceptual and distributed physically based models is a cornerstone 

in my research. In particular the introduction of the term ‘conditional validation’ is novel. 

• Scaling. The framework on scaling does not ‘solve’ the scaling problem but contributes to clarifications 

on applicable methodologies with focus on their respective assumptions and limitations. 

• Uncertainty assessment. During the past decade a considerable part of my research work has focussed 

on uncertainty aspects. I consider my main contributions in this respect to be the introduction 

of the broader uncertainty aspects integrated into the modelling framework and the work with 

model structure uncertainty. 

• Modelling protocols and guidelines for quality assurance in the modelling process. The comprehensive 

modelling protocol developed within the HarmoniQuA project is a formalisation of experience 

and practices that have gradually emerged over the years. The novel elements are the emphasis on

(a) the interactive dialogue between modeller, water manager, reviewer, stakeholders and the public; 

(b) uncertainty assessments throughout the modelling process; (c) model validation; and (d) experience 

and subjective knowledge introduced through external model reviews. 




1. Introduction 

1.1 Water Resources Management and Hydrological Modelling 

"Scarcity and misuse of fresh water pose a serious and growing threat to sustainable development 

and protection of the environment. Human health and welfare, food security, industrial 

development and the ecosystems on which they depend, are all at risk, unless water 

and land resources are managed more effectively in the present decade and beyond than 

they have been in the past". (ICWE, 1992) 

“The fact that the world faces a water crisis has become increasingly clear in recent years.

Challenges remain widespread and reflect severe problems in the management of water resources 

in many parts of the world. These problems will intensify unless effective and concerted 

actions are taken”. (WWAP, 2003) 

The first of the above quotes presents the status and the future challenges facing hydrologists and water 

resources managers as summarised in the introductory paragraph of the Dublin Statement on Water and 

Sustainable Development (ICWE, 1992). The second quote is from the first chapter of the UN World Water 

Development Report “Water for People, Water for Life” which is a collaborative effort of 23 UN agencies 

and convention secretariats co-ordinated by the World Water Assessment Programme. 

Thus the challenges in water resources management are enormous, both at the global scale as illustrated 

above and at smaller scales as for instance outlined in the vision for the European water sector recently 

formulated by the European Water Supply and Sanitation Technology Platform (WSSTP, 2005). 

The present thesis deals with hydrological modelling. It must be emphasised that modelling in itself is not 

sufficient to address these challenges. Modelling constitutes only one of several sets of tools that can

be used to support water resources management. Computer based hydrological models have been 

developed and applied at an ever increasing rate during the past four decades. The key reasons for that 

are twofold: (a) improved models and methodologies are continuously emerging from the research 

community, and (b) the demand for improved tools increases with the increasing pressure on water 

resources. Overviews of the status and development trends in catchment scale hydrological modelling 

during this period can be found in Fleming (1975) and Singh (1995). 

1.2 Objective and Content 

The objective of this thesis is to present the contributions to scientific knowledge that have emerged from

the research described in the 15 appended publications. I have structured the thesis with an aim of presenting 

my research contributions within a framework of catchment modelling and its application to 

support water resources management. 




The next chapter (Chapter 2) therefore presents an overall framework of the water resources management 

and planning process and the modelling process and the interaction between these two processes. 

Here the terminology and modelling protocol are introduced and discussed. This chapter is 

based on publications [7], [12] and [13], i.e. mainly some of my most recent work. 

Chapter 3 comprises a number of examples of simulation models ranging from point scale to catchment 

scale, from flow modelling to transport and reactive modelling and from planning type applications to 

real-time forecasting. This chapter is based on publications [1], [2], [3], [4], [5], [6], [8], [9] and [10], i.e. 

mainly some of my earlier work. 

Chapter 4 then provides a presentation and discussion of key and cross-cutting issues in hydrological modelling such as scaling, model validation, uncertainty assessment and quality assurance. These issues, which were introduced as part of the overall framework in Chapter 2, are here discussed with reference to the experience and findings made in the publications. This chapter includes ideas, views and material from all 15 publications, but with more emphasis on some of the more general-purpose publications [6], [7], [10], [11], [12], [13], [14] and [15].

Finally, Chapter 5 contains some conclusions and perspectives for future work. 

Thus I have not structured the content of this report according to the chronology of my publications [1] – 

[15]. The reason for this is that my most recent work provides a broader and better overview of the topic 

and is thus better suited for providing a framework for my earlier work. 




2 Water Resources Management and the Modelling Process 

2.1 Modelling as Part of the Planning and Management Process 

Integrated Water Resources Management (IWRM) is “a process, which promotes the co-ordinated development and management of water, land and related resources, in order to maximise the resultant economic and social welfare in an equitable manner without compromising the sustainability of vital ecosystems” (GWP, 2000). In the EU Water Framework Directive (WFD) Guidance Document on Planning Processes, planning is defined as “a systematic, integrative and iterative process that is comprised of a number of steps executed over a specified time schedule” (EC, 2003b). All new guidelines on water resources management emphasise the importance of integrated approaches, cross-sectoral planning and public participation in the planning process (GWP, 2000; EC, 2003b; Jønch-Clausen, 2004).

Models describing water flows, water quality, ecology and economy are being developed and used in increasing number and variety to support water management decisions. The interactions between the modelling process and the water management process are illustrated in Figs. 1 and 2. Fig. 1 shows the key actors in the water management process and the five steps into which the modelling process may typically be decomposed. The organisation that commissions a modelling study is denoted the water manager. This is often the competent authority, but can also be a stakeholder such as a water supply company. The role of the government is most often limited to providing the enabling environment such as legislation, research and information infrastructure. The typical cyclic and iterative character of the water management process, such as the WFD process, is illustrated in Fig. 2, where the large circle represents water management and the four smaller supporting circles represent modelling. The WFD planning process, like most other planning processes, contains four main elements:

• Identification, including assessment of present status, analysis of impacts and pressures and establishment of environmental objectives. Here modelling may be useful, for example, for supporting assessments of the reference conditions and of the impacts of the various pressures (EC, 2004).

• Designing, including the set-up and analysis of a programme of measures designed to reach the environmental objectives in a cost-effective way. Here modelling will typically be used for supporting assessments of the effects and costs of the various measures under consideration.

• Implementing the measures. Here on-line modelling may in some cases support the operational decisions to be made.

• Evaluation of the effects of the measures on the environment. Here modelling may support the monitoring in order to extract maximum information from the monitoring data, e.g. by indicating errors and inadequacies in the data and by filtering out the effects of climate variability.




[Figure 1: diagram linking the water management process to the five steps of the modelling process.]

Water Management Process (actors and stages shown): The Environment; Problem Identification; Public Opinion; Stakeholders; Competent Authority; Government; Water Management Decision; Implementation.

Modelling Process (steps shown):

1. Model Study Plan: identify problem; define requirements; assess uncertainties; prepare model study plan.

2. Data and Conceptualisation: collect and process data; develop conceptual model; select model code; review and dialogue.

3. Model Set-up: construct model; reassess performance criteria.

4. Calibration and Validation: model calibration; model validation; uncertainty assessment.

5. Simulation and Evaluation: model predictions; uncertainty assessment.

Fig. 1 The role of the modelling process and the water management decision process (inspired by Pascual et al., 2003).

It is important to note that the modelling studies typically do not address the entire planning and management 

process, but rather support certain elements of the process. Modelling is applied as a response 

(but usually not the only response) to an identified problem and can provide support for water 

management decisions. The types of interactions between the modelling process and the planning and 

management process are: 




• The modelling process starts with a thorough framing of the problem to be addressed and definition of modelling objectives and requirements for the modelling study (Step 1 in Fig. 1). Water managers and stakeholders dominate this step, which is basically identical to part of the broader planning process. A participatory assessment of the most important sources of uncertainty for the decision process should be used as a basis for prioritising the elements of the modelling study. The uncertainty assessments made at this stage will typically be qualitative.

• The main modelling itself is composed of Steps 2, 3 and 4 of Fig. 1. Here the link with the main planning process consists of dialogue, reviews and discussions of preliminary results. The amount and type of interaction depend on the level of public participation, which may vary from case to case, from providing information through consultation to active involvement (Henriksen et al., submitted).

• The modelling study is finalised in the last step (Step 5 in Fig. 1), which typically includes scenario simulations. Here the water managers and the stakeholders again have a dominant role. The decisions made at the outcome of this step on the basis of modelling results are made in the context of the main planning process. Uncertainty assessment of model predictions is a crucial aspect of the modelling results and should be communicated in a way that is accessible to the stakeholders in the further water management process.

[Figure 2: a large circle showing the cyclic WFD process (Identification, Designing, Implementation, Evaluation) with four smaller supporting circles, each labelled Modelling.]

Fig. 2 The role of modelling in the water management process within the context of the EU Water Framework Directive (WFD)




2.2 Terminology and Scientific Philosophical Basis for the Modelling Process

2.2.1 Background 

As pointed out in [12], a key problem in establishing a theoretical modelling framework is confusion over terminology. For example, the terms validation and verification are used with different, and sometimes interchangeable, meanings by different authors. The confusion arises from both semantic and philosophical considerations (Rykiel, 1996). Another important problem is the lack of consensus in the so far inconclusive debate on the fundamental question of whether a water resources model can be validated or verified, and whether it as such can be claimed to be suitable or valid for particular applications (Konikow and Bredehoeft, 1992; De Marsily et al., 1992; Oreskes et al., 1994).

An important issue in relation to validation/verification is the distinction between open and closed systems. 

A system is a closed system if its true conditions can be predicted or computed exactly. This applies 

to mathematics and mostly to physics and chemistry. Systems where the true behaviour cannot be 

computed due to uncertainties and lack of knowledge on e.g. input data and parameter values are 

called open systems. The systems we are dealing with in water resources management, based on geosciences, 

biology and socio-economy, are open systems. According to Konikow and Bredehoeft (1992) 

and Oreskes et al. (1994) it is not possible to verify or validate models of open systems. 

Finally, the principles have to reflect and be in line with the underlying philosophy of environmental modelling, which has changed significantly during the past decades. In the early days many of us were focussing on the huge potential of sophisticated models in a way that in retrospect may be characterised as rather naive enthusiasm (e.g. Freeze and Harlan, 1969; Abbott, 1992). The dominant view today appears to be much more balanced and mature (e.g. Beven, 2002a; Beven, 2002b).

2.2.2 Terminology and guiding principles 

According to the terminology presented in [12] the simulation environment is divided into four basic 

elements as shown in Fig. 3. The inner arrows describe the processes that relate the elements to each 

other, and the outer circle refers to the procedures that evaluate the credibility of these processes. 

In general terms a model is understood as a simplified representation of the natural system it attempts to 

describe. However, a distinction is made between three different meanings of the general term model, 

namely the conceptual model, the model code and the model that here is defined as a site-specific model. 

The most important elements in the terminology and their interrelationships are defined as follows: 

Reality: The natural system, understood here as the study area. 

Conceptual model: A description of reality in terms of verbal descriptions, equations, governing relationships or ‘natural laws’ that purport to describe reality. This is the user's perception of the key hydrological and ecological processes in the study area (perceptual model) and the corresponding simplifications and numerical accuracy limits that are assumed acceptable in order to achieve the purpose of the modelling. A conceptual model thus includes both a mathematical description (equations) and descriptions of flow processes, river system elements, ecological structures, geological features, etc. that are required for the particular purpose of modelling. By analogy with the scientific philosophical discussion, the conceptual model in other words constitutes the scientific hypothesis or theory that we assume for our particular modelling study.

Fig. 3 Elements of a modelling terminology [12]. 

Model code: A mathematical formulation in the form of a computer program that is so generic that it, 

without program changes, can be used to establish a model with the same basic type of equations (but 

allowing different input variables and parameter values) for different study areas. 

Model: A site-specific model established for a particular study area, including input data and parameter 

values. 
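The distinction between a model code and a site-specific model can be made concrete with a small sketch. The Python fragment below is an illustration only and is not taken from [12]: the linear-reservoir routine, the class name SiteModel, the catchment names and all numbers are hypothetical and chosen for brevity. It shows one generic code being reused, without program changes, for two study areas that differ only in input data and parameter values.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


def linear_reservoir(rain: Sequence[float], k: float) -> List[float]:
    """Hypothetical model code: each time step, outflow equals k times storage."""
    storage = 0.0
    flows = []
    for p in rain:
        storage += p          # add rainfall input to storage
        q = k * storage       # outflow proportional to storage
        storage -= q          # drain the reservoir
        flows.append(q)
    return flows


@dataclass(frozen=True)
class SiteModel:
    """A site-specific model: a model code plus input data and parameter values."""
    name: str
    code: Callable[[Sequence[float], float], List[float]]
    rain: Sequence[float]     # input data for the study area (assumed values)
    k: float                  # parameter value for the study area (assumed value)

    def simulate(self) -> List[float]:
        return self.code(self.rain, self.k)


# Two models established with the same code for two hypothetical study areas.
model_a = SiteModel("catchment A", linear_reservoir, [10.0, 0.0, 0.0], k=0.5)
model_b = SiteModel("catchment B", linear_reservoir, [0.0, 4.0, 4.0], k=0.25)

print(model_a.name, model_a.simulate())
print(model_b.name, model_b.simulate())
```

Here linear_reservoir plays the role of the model code, while model_a and model_b are two models in the sense defined above. The assumption that outflow is proportional to storage would belong to the conceptual model.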




Model confirmation: Determination of adequacy of the conceptual model to provide an acceptable level of 

agreement for the domain of intended application. This is in other words the scientific confirmation of the 

theories/hypotheses included in the conceptual model. 

Code verification: Substantiation that a model code is in some sense a true representation of a conceptual 

model within certain specified limits or ranges of application and corresponding ranges of accuracy. 

Model calibration: The procedure of adjustment of parameter values of a model to reproduce the response 

of reality within the range of accuracy specified in the performance criteria. 

Model validation: Substantiation that a model within its domain of applicability possesses a satisfactory 

range of accuracy consistent with the intended application of the model. 

Model set-up: Establishment of a site-specific model using a model code. This requires, among other 

things, the definition of boundary and initial conditions and parameter assessment from field and laboratory 

data. 

Simulation: Use of a validated model to gain insight into reality and obtain predictions that can be used by 

water managers. This includes insight into how reality can be expected to respond to human interventions. 

In this connection uncertainty assessments of the model predictions are very important. 

Performance criteria: Level of acceptable agreement between model and reality. The performance criteria 

apply both for model calibration and model validation. 

Domain of applicability (of conceptual model): Prescribed conditions for which the conceptual model 

has been tested, i.e. compared with reality to the extent possible and judged suitable for use (by model 

confirmation). 

Domain of applicability (of model code): Prescribed conditions for which the model code has been 

tested, i.e. compared with analytical solutions, other model codes or similar to the extent possible and 

judged suitable for use (by code verification). 

Domain of applicability (of model): Prescribed conditions for which the site-specific model has been 

tested, i.e. compared with reality to the extent possible and judged suitable for use (by model validation). 
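
The distinction between a model code and a site-specific model can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the single linear reservoir standing in for a model code, the names `linear_reservoir_code`, `SiteSpecificModel` and `meets_criterion`, and the maximum-absolute-error performance criterion. None of it is taken from [12].

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def linear_reservoir_code(rain: List[float], params: Dict[str, float]) -> List[float]:
    """Hypothetical 'model code': generic, with no knowledge of any study area.

    Each time step adds rain to a storage and releases the fraction k of it.
    """
    storage = params["initial_storage"]
    flows = []
    for r in rain:
        storage += r
        q = params["k"] * storage
        storage -= q
        flows.append(q)
    return flows


@dataclass
class SiteSpecificModel:
    """A 'model' in the sense used here: code + input data + parameter values."""
    code: Callable[[List[float], Dict[str, float]], List[float]]
    input_data: List[float]
    parameters: Dict[str, float]

    def simulate(self) -> List[float]:
        return self.code(self.input_data, self.parameters)


def meets_criterion(simulated: List[float], observed: List[float],
                    max_abs_error: float) -> bool:
    """Illustrative performance criterion: every value within a fixed tolerance."""
    return all(abs(s - o) <= max_abs_error for s, o in zip(simulated, observed))


# One code, one site-specific model (hypothetical data and parameter values).
model = SiteSpecificModel(linear_reservoir_code, [10.0, 0.0, 0.0],
                          {"k": 0.5, "initial_storage": 0.0})
```

The same code could be combined with other input data and parameter values to establish a model for a different study area without program changes, which is the sense in which a model code is generic.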

2.2.3 Scientific philosophical aspects 

The credibility of the descriptions or the agreements between reality, conceptual model, model code and 

model are evaluated through the terms confirmation, verification, calibration and validation. Thus, the relation 

between reality and the scientific description of reality which is constituted by the conceptual model 

with its theories and equations on flow and transport processes, its interpretation of the geological system 

and ecosystem at hand, etc., is evaluated through the confirmation of the conceptual model. By using the 

term confirmation in connection with conceptual model, it is implied that it is never considered possible 

to prove the truth of a theory/hypothesis and as such of a conceptual model. And even if a site-specific 

model is eventually accepted as valid for specific conditions, this is not a proof that the conceptual 

model is true, because, due to non-uniqueness, the site-specific model may turn out to perform right for 

the wrong reasons. 

The fundamental view expressed by scientific philosophers is that verification and validation of numerical 

models of natural systems is impossible, because natural systems are never closed and because 

the mapping of model results is always non-unique (Popper, 1959; Oreskes et al., 1994). I agree that

it is not possible to carry out model verification or model validation, if these terms are used universally, 

without restriction to domains of applicability and levels of accuracy. 

[12] note, however, that Popper (1959) distinguished between two kinds of universal statements: the 

'strictly universal' and the 'numerical universal'. The strictly universal statements are those usually dealt 

with when speaking about theories or natural laws. They are a kind of 'all-statement' claiming to be true 

for any place and any time. In contrast, numerical universal statements refer only to a finite class of

specific elements within a finite individual spatio-temporal region. A numerical universal statement is 

thus in fact equivalent to conjunctions of singular statements. 

The restrictions in use of the terms confirmation, verification and validation imposed by the respective 

domains of applicability imply, according to Popper's views, that the conceptual model, model code and 

site-specific models can only be classified as numerical universal statements as opposed to strictly universal 

statements. This distinction is fundamental for the terminology described in [12] and its link to 

scientific philosophical theories. Consequently the terms verification and validation should never be 

used without qualifiers. 

An important aspect of the framework outlined in [12] lies in the separation between the three different 

‘versions’ of the word model, namely the conceptual model, the model code and the site-specific model.

Due to this distinction it is possible, at a general level, to talk about confirmation of a theory or a hypothesis 

about how nature can be described using the relevant scientific method for that purpose, and, 

at a site-specific level, to talk about validity of a given model within certain domains of applicability and 

associated with specified accuracy limits. 

2.3 Modelling Protocol 

The procedure for applying a hydrological model is often denoted a modelling protocol. It comprises a 

series of actions to be followed in a sequential or iterative form. The modelling protocol presented in [7] 

for distributed catchment modelling was inspired by the groundwater community (Anderson and 

Woessner, 1992). It was subsequently used in the Danish Handbook for Groundwater Modelling (Henriksen 

et al., 2001) that has been used extensively in practice since its emergence. A more recent modelling

protocol, developed within the context of the EU research project HarmoniQuA, is reported in [13] 

and Scholten et al. (2007). The two protocols are illustrated in Figs. 4 and 5. 

Fig. 4 The modelling protocol from [7]. 

A modelling study will involve several phases and several actors. A typical modelling study will involve 

the following four different types of actors: 

• The water manager, i.e. the person or organisation responsible for the management or protection of 

the water resources, and thus responsible for the modelling study and the outcome (the problem 

owner). 

• The modeller, i.e. a person or an organisation that works with the model conducting the modelling 

study. If the modeller and the water manager belong to different organisations, their roles will typically 

be denoted consultant and client, respectively. 

• The reviewer, i.e. a person that is conducting some kind of external review of a modelling study. 

The review may be more or less comprehensive depending on the requirements of the particular 

case. The reviewer is typically appointed by the water manager to support the water manager to 

match the modelling capability of the modeller. 

• The stakeholders/public. A stakeholder is an interested party with a stake in the water management 

issue, either in exploiting or protecting the resource. Stakeholders include the following different 

groups: (i) competent water resource authority (typically the water manager, cf. above); (ii) interest 

groups; and (iii) general public. 

The modelling process may, according to [13], be decomposed into five major steps which again are 

decomposed into 48 tasks (Fig. 5). The contents of the five steps are: 

• STEP 1 (Model Study Plan). This step aims to agree on a Model Study Plan comprising answers to

the questions: Why is modelling required for this particular model study? What is the overall modelling

approach and which work should be carried out? Who will do the modelling work? Who should

do the technical reviews? Which stakeholders/public should be involved and to what degree? What

are the resources available for the project? The water manager needs to describe the problem and

its context as well as the available data. A very important task is then to analyse and determine the 

various requirements of the modelling study in terms of the expected accuracy of modelling results. 

The acceptable level of accuracy will vary from case to case and must be seen in a socio-economic 

context. It should, therefore, be defined through a dialogue between the modeller, water manager 

and stakeholders/public. In this respect an analysis of the key sources of uncertainty is crucial in 

order to focus the study on the elements that produce most information of relevance to the problem 

at hand. 

• STEP 2 (Data and Conceptualisation). In this step the modeller should gather all the relevant 

knowledge about the study basin and develop an overview of the processes and their interactions in 

order to conceptualise how the system should be modelled in sufficient detail to meet the requirements 

specified in the Model Study Plan. Consideration must be given to the spatial and temporal 

detail required of a model, to the system dynamics, to the boundary conditions and to how the 

model parameters can be determined from the available data. The need to model certain processes 

in alternative ways or to differing levels of detail in order to enable assessments of model structure 

uncertainty should be evaluated. The availability of existing computer codes that can address the 

model requirements should also be addressed. 

• STEP 3 (Model Set-up). Model Set-up implies transforming the conceptual model into a site-specific 

model that can be run in the selected model code. A major task in Model Set-up is the processing of 

data in order to prepare the input files necessary for executing the model. Usually, the model is run 

within a Graphical User Interface (GUI) where many tasks have been automated. The GUI speeds 

up the generation of input files, but it does not guarantee that the input files are error free. The 

modeller performs this work. 

• STEP 4 (Calibration and Validation). This step is concerned with the process of analysing the model 

that was constructed during the previous step, first by calibrating the model, and then by validating 

its performance against independent field data. Finally, the reliability of model simulations for the intended 

domain of applicability is assessed through uncertainty analyses. The results are described 

so that the scope of model use and its associated limitations are documented and made explicit. 

The modeller performs this work. 

• STEP 5 (Simulation and Evaluation). In this step the modeller uses the calibrated and validated 

model to make simulations to meet the objectives and requirements of the model study. Depending 

on the objectives of the study, these simulations may result in specific results that can be used in 

subsequent decision making (e.g. for planning or design purposes) or to improve understanding 

(e.g. of the hydrological/ecological regime of the study area). It is important to carry out suitable uncertainty 

assessments of the model predictions in order to arrive at a robust decision. As with the 

other steps, the quality of the results needs to be assessed through internal and external reviews. 

Each of the last four steps is concluded with a reporting task followed by a review task. The review 

tasks include dialogues between water manager, modeller, reviewer and, often, stakeholders/public. 

The protocol includes many feedback possibilities (Fig. 5). 
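
The step-and-review structure described above can be sketched as a simple loop. This is a strongly simplified illustration, not the HarmoniQuA protocol itself: the function `run_protocol`, the `review` callback and the rule that a rejected review simply repeats the same step are assumptions made for the sketch, whereas the actual protocol in Fig. 5 allows feedback to earlier steps as well.

```python
from typing import Callable, List

STEPS = [
    "Model Study Plan",
    "Data and Conceptualisation",
    "Model Set-up",
    "Calibration and Validation",
    "Simulation and Evaluation",
]


def run_protocol(review: Callable[[str, int], bool], max_passes: int = 20) -> List[str]:
    """Return the sequence of steps carried out, including repeated ones.

    review(step, attempt) returns True when the review accepts the step.
    """
    history: List[str] = []
    attempts = {step: 0 for step in STEPS}
    index = 0
    while index < len(STEPS):
        if len(history) >= max_passes:
            raise RuntimeError("protocol did not converge within max_passes")
        step = STEPS[index]
        attempts[step] += 1
        history.append(step)
        # Step 1 is treated as accepted once carried out; Steps 2-5 end with
        # a report and a review, and a rejected review repeats the step
        # (the simplest possible form of feedback).
        if index == 0 or review(step, attempts[step]):
            index += 1
    return history


# Hypothetical reviewer who rejects the first calibration/validation report.
picky = lambda step, attempt: not (step == "Calibration and Validation" and attempt == 1)
history = run_protocol(picky)
```

With the hypothetical reviewer above, Step 4 appears twice in `history` before Step 5 is started.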

A comparison of the old protocol (Fig. 4) and the one decade younger HarmoniQuA protocol (Fig. 5) 

shows some interesting developments: 

• The basic sequence of the prescribed activities in the protocols is the same. The HarmoniQuA protocol 

is much more detailed than the old one, but there are no fundamental disagreements between 

the two. 

• The HarmoniQuA protocol puts much more emphasis on the framing of the modelling study. This 

is only considered in one box in Fig. 4 and not given much weight in [7], while it is one full Step 

comprising seven tasks in Fig. 5. This implies for instance that requirements on performance criteria

and uncertainty assessments are introduced rather late in the old protocol, while they are an important

part of Step 1 in the HarmoniQuA protocol. 

• There is much emphasis on uncertainty assessments throughout the modelling process in the 

HarmoniQuA protocol, while uncertainty assessments are only considered as part of model calibration 

and simulation in the old protocol. 

• The HarmoniQuA protocol is part of a quality assurance framework with much emphasis on the 

role play between the various actors in the modelling process. This results in stakeholder involvement, 

peer reviews, focus on reporting and dialogue between water manager and modeller. In contrast

to this, the old protocol only focuses on the modeller. 

These developments reflect a process from guidance to the modeller only (old protocol) towards guidance 

to all actors involved in the modelling process (HarmoniQuA). This process has been inspired by 

feedback from introducing the old protocol to real world applications, where it was realised that a

broader concept was required. 

[Figure: flowchart of the HarmoniQuA modelling protocol, with the tasks grouped under the five steps Model Study Plan, Data and Conceptualisation, Model Set-up, Calibration and Validation, and Simulation and Evaluation.]

Fig. 5 The five modelling steps and the 48 tasks in the HarmoniQuA modelling protocol. The diagram is an updated version of Fig. 5 in [13] (Refsgaard et al., 2006).



2.4 Classification of Models 

Many attempts have been made to classify hydrological models (or model codes). Refsgaard (1996) 

presented the classification shown in Fig. 6 that I have used in all papers of the present thesis. Deterministic 

models can be classified according to whether the model gives a lumped or a distributed description 

of the considered area, and whether the description of the hydrological processes is empirical, 

conceptual, or more physically-based. A lumped model implies that the catchment is considered as one 

computational unit. A distributed model, on the other hand, provides a description of catchment processes 

at geo-referenced computational grid points within the catchment. An intermediate approach is a 

semi-distributed model, which uses some kind of distribution, either in sub-catchments or in hydrological 

response units, where areas with the same key characteristics are aggregated to sub-units without 

considering their actual locations within the catchment. Examples of hydrological response units considered 

in semi-distributed models are elevation zones, which are relevant for snow modelling, and 

combinations of soil and vegetation type, which may be relevant for simulation of root zone processes 

such as evapotranspiration and nitrate leaching. 

As most conceptual models are also lumped, and as most physically-based models are also distributed, 

the three main classes emerge: 

• Empirical (black box) 

• Lumped conceptual models (grey box) 

• Distributed physically-based (white box) 

The classification is discussed in some detail in Refsgaard (1996). Here, the focus is on the two traditional

approaches in deterministic hydrological catchment modelling, namely the lumped conceptual 

and the distributed physically-based ones. The fundamental difference between these two types of 

models lies in their process descriptions and the way spatial variability is treated. The distributed physically-based 

models contain equations which have originally been developed for point scales and which 

provide detailed descriptions of flows of water and solutes. The variability of catchment characteristics 

is accounted for explicitly through the variations of hydrological parameter values among the different 

computational grid points. This approach leaves the variability within a grid cell unaccounted for, which

in some cases is of minor importance but in other cases may pose a serious constraint. The lumped

conceptual models use empirical process descriptions, which have built-in accounting for the spatial

variability of catchment characteristics. 
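
The difference in how the two model types treat spatial variability can be illustrated with a toy calculation. The sketch below is an assumption-laden illustration only: runoff is reduced to a runoff coefficient times rainfall, all cells have equal area, and the numbers are invented. It is not a description of any of the model codes discussed here.

```python
from typing import List, Tuple


def lumped_runoff(mean_rain_mm: float, effective_coeff: float) -> float:
    """Lumped: the whole catchment is one computational unit."""
    return effective_coeff * mean_rain_mm


def distributed_runoff(rain_mm: List[float],
                       coeff: List[float]) -> Tuple[float, List[float]]:
    """Distributed: one calculation per grid cell (equal cell areas assumed)."""
    per_cell = [c * r for r, c in zip(rain_mm, coeff)]
    return sum(per_cell) / len(per_cell), per_cell


rain = [10.0, 20.0, 30.0]   # hypothetical rainfall per cell (mm)
coeff = [0.1, 0.2, 0.5]     # hypothetical runoff coefficient per cell

catchment_mean, per_cell = distributed_runoff(rain, coeff)
mean_rain = sum(rain) / len(rain)

# Lumped run using the arithmetic mean of the cell coefficients.
naive = lumped_runoff(mean_rain, sum(coeff) / len(coeff))

# Effective coefficient that makes the lumped model match the distributed mean.
effective = catchment_mean / mean_rain
```

In this toy case the arithmetic mean of the cell coefficients does not reproduce the distributed catchment-mean runoff, whereas a differently valued effective coefficient does. This is one way to read the statement that lumped conceptual models rely on empirical descriptions with built-in accounting for spatial variability.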

Fig. 6 Classification of hydrological models according to process description (Refsgaard, 1996). 

Typical examples of lumped conceptual model codes are the Stanford Watershed Model (Crawford and 

Linsley, 1966), the Sacramento (Burnash, 1995), the HBV (Bergström, 1995) and the NAM (Nielsen 

and Hansen, 1973). Typical examples of distributed physically-based model codes are the MIKE SHE 

(Abbott et al., 1986a, b; Refsgaard and Storm, 1995) and the Thales (Grayson et al., 1992a, b). 

Groundwater model codes like MODFLOW belong to the distributed physically-based class. 

The classification has some shortcomings that should be noted. First of all, the use of the term ‘conceptual 

model’ is unfortunate, because this is a different meaning of the term as compared to the definition 

given in Section 2.2 and used in the modelling protocols (Section 2.3). This can cause some confusion, 

but to introduce a new term completely different from what is used by almost all other scientists in the 

community of catchment modelling may cause even more confusion. Secondly, and more fundamentally,

the names of the classes should be considered as relative rather than absolute. For example Beven 

(1989) argued that in most applications physically-based models are used as lumped conceptual models 

at the grid scale. As discussed in [4] I agree that some degree of lumping and conceptualisation will 

always need to take place, but that in spite of this there is a fundamental difference in the functioning

and, as will be discussed later, in the applicability of the two model types.

3 Simulation of Hydrological Processes at Catchment 

Scale 

In this chapter some modelling examples from the publications are briefly summarised and discussed 

within the framework outlined in Chapter 2. 

3.1 Flow modelling 

3.1.1 Groundwater/surface water model for the Suså catchment ([1], [2]) 

Summary 

The publications [1] and [2] describe a new model code and the set-up, calibration and validation of a 

model for a 1,000 km² area. Further details can be found in Stang (1981), Refsgaard (1981) and

Refsgaard and Stang (1981). The objectives of the study were to develop a spatially distributed 

groundwater/surface water model code and apply it to the Suså catchment with a particular focus on 

the stream-aquifer interaction in a hydrogeological system consisting of confined aquifer-aquitard-phreatic

aquifer and to test the model for prediction of the hydrological consequences on streamflows 

and hydraulic heads of groundwater abstraction. 

The new model code was rather complex and computationally demanding at the time of development. 

Thus, standard 30-year model simulations could only be carried out as night runs on the mainframe

computer at DTU’s computer centre. 

The model area comprising the Suså and the neighbouring Køge Å catchments is located in the central 

and southern part of Zealand. The model area, the topographic divides and the groundwater model 

polygonal mesh are shown in Fig. 7. The overall structure of the model is outlined in Fig. 8. It consists 

of four separate components for the confined regional aquifer, the aquitard, the phreatic aquifer and the 

root zone. The spatial distribution and the degree of physical basis differ between the four components. 

The time steps in the calculations are one day in all parts of the model. 

The confined aquifer is described by a two-dimensional integrated finite difference model with 112 polygons. 

For the phreatic aquifer consisting of till with very small transmissivities and for the aquitard each 

of the polygons is subdivided further into four sub-polygons based on hypsographic curves (Fig. 9).

Due to small scale topographic variations the flows in the aquitard in most polygons are upwards in 

some parts and downwards in other parts of the polygon. A correct representation of these flows between 

the regional aquifer and the phreatic aquifer that discharges to the rivers is crucial for achieving a

good description of the stream-aquifer interaction. Without such an approach allowing a description of both

upwards and downwards flows in the aquitard within the same polygon a much finer spatial resolution 

with 10-100 times as many polygons would have been required. This would have been impossible 25 

years ago due to computational constraints. 
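
Why the sub-polygon approach permits both upward and downward aquitard flow within one polygon can be shown with a small sketch. It assumes a simple head-difference (Darcy-type) leakage relation, which is an assumption of this example and not necessarily the formulation used in [1] and [2]. All heads, area fractions and the leakance value are hypothetical.

```python
from typing import List


def aquitard_leakage(water_table_m: List[float], regional_head_m: float,
                     leakance_per_day: float) -> List[float]:
    """Leakage per sub-polygon in m/day.

    Positive values are downward (towards the regional aquifer),
    negative values are upward (towards the phreatic aquifer).
    """
    return [leakance_per_day * (wt - regional_head_m) for wt in water_table_m]


# Four sub-polygons ordered from low to high terrain (hypothetical values).
water_table = [22.0, 26.0, 31.0, 36.0]      # phreatic water table (m above MSL)
area_fraction = [0.25, 0.25, 0.25, 0.25]    # shares read from a hypsographic curve

flux = aquitard_leakage(water_table, regional_head_m=27.0, leakance_per_day=0.001)
net = sum(f * a for f, a in zip(flux, area_fraction))
```

With these invented numbers the two low-lying sub-polygons receive upward flow and the two high-lying ones lose water downward, while a single polygon-average water table would produce flow in one direction only.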

The root zone component calculated the net precipitation that recharged the phreatic aquifer. The modelling 

area was divided into seven sub-areas with separate precipitation input and soil parameters. Further 

the spatial variation in vegetation was accounted for by dividing each of these seven areas into five 

vegetation areas based on agricultural statistics and one meadow (wetland) area. This gives a total of

42 sub-areas (seven sub-areas times six land-cover classes), each of which is a kind of ‘hydrological response unit’, i.e. a semi-distributed

approach. The root zone calculations were based on a box approach with four layers in the

root zone. 
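
The construction of the 42 sub-areas can be sketched as the combination of seven sub-areas with six land-cover classes. The identifiers below (`area_1`, `vegetation_1`, `meadow`) and the helper `area_weighted_mean` are placeholders invented for the sketch; the actual vegetation classes are not named here.

```python
from itertools import product
from typing import Dict, List, Tuple

sub_areas: List[str] = [f"area_{i}" for i in range(1, 8)]            # 7 sub-areas
land_cover: List[str] = [f"vegetation_{i}" for i in range(1, 6)] + ["meadow"]

# Each (sub-area, land cover) pair is one hydrological response unit.
response_units: List[Tuple[str, str]] = list(product(sub_areas, land_cover))


def area_weighted_mean(values: Dict[Tuple[str, str], float],
                       fractions: Dict[Tuple[str, str], float]) -> float:
    """Aggregate a per-unit quantity (e.g. net precipitation) by area fraction."""
    total = sum(fractions[u] for u in values)
    return sum(values[u] * fractions[u] for u in values) / total
```

Because the units are identified by class rather than by location, their results can only be aggregated by area share, which is what makes the approach semi-distributed.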

Fig. 7 Topographic divides, groundwater polygonal mesh, precipitation gauging stations and precipitation 

zones of the Suså model. 

Fig. 8 The structure of the Suså model 

[Figure: polygon 21 with ground surface, water table, head in the regional aquifer, aquitard and pre-Quaternary surface, and the four sub-polygons defined by the elevation classes < 24, 24–28, 28–34 and > 34 m above MSL. Streams shown: Lilleå, Vendebæk, Suså and Gasmose Bæk.]

Fig. 9 Hypsographic curve for polygon 21 and areas represented by the four sub-polygons.

Fig. 10 Examples of simulation results from soil moisture in root zone, hydraulic head of regional confined 

aquifer and river discharge. 

The model was calibrated against soil moisture data from four experimental plots, time series of hydraulic 

heads from 40 observation wells in the regional aquifer and streamflow from six gauging stations. 

Examples of simulation results from the calibration period are shown in Fig. 10 which shows excellent 

curve fits. The groundwater and aquitard models were calibrated, along with the code development 

itself, using all available hydraulic head data from the period 1950-80. Between 1964 and 1970 the 

groundwater abstraction to Copenhagen Water Supply from the Regnemark Waterworks in the Køge Å 

catchment was increased from zero to about 15 million m³/year. The remaining model components

were calibrated against only some of the available streamflow data, namely some of the data from the 

Suså catchment, while amongst others Køge Å data were not used for calibration. 

While the simulation of streamflows in the Køge Å catchment in [1] was characterised as a “half-way 

test of the model’s ability to simulate streamflow from ungauged catchments” no systematic validation 

tests against independent data were carried out as part of the study. Some years later the model simulations 

were extended with new data from the period 1981-87, where the groundwater abstractions had 

changed slightly. In this post audit validation study the model simulations were found to match the observations 

to the same degree of accuracy as during the calibration period (Jensen and Jørgensen, 

1988). 

The model’s ability to simulate the streamflow depletion caused by a groundwater abstraction from the 

regional confined aquifer was tested on historical data from the Køge Å catchment. Fig. 11 shows simulated 

streamflow assuming actual groundwater abstraction from the Regnemark Waterworks starting in 

1964, $Q_{sim}$, and assuming no abstraction from Regnemark, $Q^{1}_{sim}$. The recorded streamflow fits reasonably

well with $Q_{sim}$. The difference $Q^{1}_{sim} - Q_{sim}$, which is the simulated streamflow depletion caused

by the increased groundwater abstraction, is seen to have a clear seasonal variation with smaller depletion 

during the dry summer periods and larger depletion during the wet winter season. 

Fig. 11 Comparison of 15-day moving average streamflows for Køge Å (lower) and the relative streamflow

depletion caused by the groundwater abstraction (upper) 
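
The quantities compared in Fig. 11 can be computed along the following lines. The sketch assumes daily series of equal length for the two simulations. The function names are illustrative, and the definition of relative depletion as a fraction of the no-abstraction flow is an assumption, since the exact definition used for the figure is not stated here.

```python
from typing import List


def moving_average(series: List[float], window: int = 15) -> List[float]:
    """Simple moving average; returns len(series) - window + 1 values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def depletion(q_no_abstraction: List[float],
              q_with_abstraction: List[float]) -> List[float]:
    """Simulated streamflow depletion: flow without minus flow with abstraction."""
    return [q1 - q for q1, q in zip(q_no_abstraction, q_with_abstraction)]


def relative_depletion(q_no_abstraction: List[float],
                       q_with_abstraction: List[float]) -> List[float]:
    """Depletion as a fraction of the no-abstraction flow (assumed definition)."""
    return [(q1 - q) / q1 if q1 > 0 else 0.0
            for q1, q in zip(q_no_abstraction, q_with_abstraction)]
```

Applying `moving_average` to both simulated series before taking the difference smooths out day-to-day noise, so that the seasonal pattern of the depletion becomes visible.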

Discussion - post evaluation 

Most other catchment models existing when the Suså model code was developed were either purely 

rainfall runoff models of the lumped conceptual type, such as the classical Stanford Watershed Model 

(Crawford and Linsley, 1966), the HBV (Bergström and Forsman, 1973; Bergström, 1976) and the NAM 

(Nielsen and Hansen, 1973) or purely groundwater models (Prickett and Lonnquist, 1971; Thomas, 

1973). A few authors had concluded that coupled groundwater/surface water modelling was essential 

(e.g. Luckner, 1978; Lloyd, 1980) and some had outlined specific, but not yet operational, concepts 

(e.g. Freeze and Harlan, 1969; Wardlaw, 1978; Jønch-Clausen, 1979). In some studies groundwater 

models and rainfall-runoff models were used at the same catchment, but without coupling (e.g. Weeks 

et al., 1974). Thus, apparently no other model had previously been used to dynamically simulate coupled 

groundwater/surface water conditions at catchment scale (rainfall, evapotranspiration, near-surface

runoff, groundwater recharge, groundwater heads, baseflow discharge from aquifers to streams). 

During the decade following [1] and [2] a few model codes with integrated groundwater/surface water 

descriptions emerged. The most prominent of these codes was the SHE (Abbott et al., 1986a, b) and its 

operational daughter codes, MIKE SHE from DHI (Refsgaard and Storm, 1995) and SHETRAN from 

University of Newcastle (Bathurst and O’Connell, 1992), which both are used today, although in later 

versions. Other operational models from that period were described by Miles and Rushton (1983), 

Christensen (1994) and Wardlaw (1994). Miles and Rushton (1983) used a simpler root zone and surface 

water component than [1] together with a two-dimensional finite difference groundwater model and 

monthly time steps. Christensen (1994) developed a model for the Tude Å catchment (a neighbour to 

Suså) that conceptually was similar and a little bit simpler than [1]. Wardlaw et al. (1994) used the concepts 

outlined in Wardlaw (1978) coupling the Stanford Watershed Model with a finite-difference 

groundwater model and a channel routing model for simulation of discharge and groundwater levels in 

the Allen catchment in England. 

During the past decade the number of integrated modelling codes has exploded. The existing codes 

today can be considered to fall in three classes: (a) fully integrated codes such as MIKE SHE (Graham 

and Butts, 2005); (b) couplings of existing groundwater codes and surface water codes such as MODFLOW

and SWAT (Perkins and Sophocleous, 1999); and (c) codes based on the fully 3-dimensional

Richards’ equation (Panday and Huyakorn, 2004). Independent reviews of the scientific basis and practical

applicability of a number of recent integrated model codes are provided by e.g. Kaiser-Hill (2001) 

and Tampa Bay Water (2001). 

A major novelty of [1] and [2] was that the Suså model code was one of the first codes that integrated

surface water and groundwater descriptions, and the first of its kind applied operationally to moraine 

landscapes. The model results were unique with respect to simulation of the dynamics of the groundwater/surface 

water interaction, as for instance reflected by the annual hydraulic head fluctuations and the 

streamflow depletion due to the groundwater abstraction. Furthermore the study provided new insights 

and understanding on the mechanisms that governed streamflow depletion due to groundwater abstraction 

from confined aquifers in moraine catchments. In contrast to the traditional type curve analyses

which were used extensively in hydrogeology to analyse test pumpings and to predict the effects of 

abstractions, [1] and [2] were based on non-stationary analysis which, as evident from the annual variations 

of streamflow depletion shown in Fig. 11, turns out to be crucial. The only modelling study from 

the following decade that considered the dynamics of the stream-aquifer interaction in moraine catchments

in connection with groundwater abstraction was Christensen (1994) who basically confirmed the

results of [2]. 

The spatial distribution and the degree of physical basis differ between the four components of the 

Suså model. The groundwater model can be characterised as distributed physically-based, the aquitard 

model as semi-distributed physically-based and the phreatic aquifer and root zone models as semi-distributed

conceptual. In contrast to for instance the later SHE code (Abbott et al., 1986a, b), the Suså

model code was not generic, because it could not be applied to other catchments without changes in 

the code. Furthermore, it was tailored to the specific hydrological conditions prevailing in the Suså 

catchment and could for instance not be applied to an alluvial unconfined aquifer. 

In retrospect, it is interesting to observe that issues related to the credibility of model simulations were 

not critically analysed or discussed in [1] and [2]. First of all, aspects of code verification were not dealt 

with in the publications, although a major novelty of the work was the development of a completely new 

code. Secondly, and maybe more surprisingly, model validation and uncertainty assessments of model 

simulations were almost not addressed. By using all the available groundwater head data for calibration 

the opportunity to make split-sample validation tests against parts of the data or even the unique opportunity

to calibrate on data before the groundwater abstraction and validate on data after the abstraction 

(differential split-sample test according to Klemes (1986)) were not utilised. By not addressing the uncertainty 

and by not conducting rigorous validation tests the reader may be left with the, undocumented, 

impression that the curve fitting in Fig. 10 is supposed to reflect the predictive capability of the model. 

That the model proved to perform well in a subsequent post-audit validation study could not be known 

at the time of [1] and [2]. 

The other integrated groundwater/surface water modelling studies from the following decade (Miles and 

Rushton, 1983; Christensen, 1994; Wardlaw et al., 1994) had the same characteristics, i.e. a focus only on

calibration and model prediction but no mention of verification of the new model codes, no model

validation tests against independent data and no uncertainty assessments. The SHE study reported by 

Bathurst (1986a, b) focussing on surface water hydrology did include split-sample validation testing and 

sensitivity analysis. For surface water (rainfall-runoff) modelling studies focusing more on model applications 

than code developments, split-sample testing was more common (e.g. Bergström, 1976; WMO,

1975; WMO, 1988) but uncertainty assessment was not systematically carried out and usually not even

considered until Beven called for it (Beven, 1989; Beven and Binley, 1992). Altogether, this illustrates a 

very significant development in modelling practice during these three decades.

3.1.2 Application of SHE to catchments in India ([4], [5]) 

Summary 

The publications [4] and [5] describe the set-up, calibration and validation of the ‘Système Hydrologique 

Européen’ (SHE) code for six sub-catchments totalling about 15,000 km² of the Narmada basin in India,

Fig. 12. The objective of the papers was to describe experiences from applying a distributed physically-based

code like SHE to large basins with rather limited data coverage compared to previous SHE applications

to research catchments. In contrast to the Suså study in [1] and [2], the India study did not

include any code development, except for data processing utility software. Instead it comprised application 

of an existing code (Abbott et al., 1986a,b) to conditions that were far beyond the conditions for 

which the SHE had previously been tested in terms of catchment size, data coverage and hydrological 

regime (Bathurst, 1986a). 

Fig. 12 Location map for the Narmada and the six sub-catchments. 

In terms of application, the study focused on simulation of catchment runoff, i.e. surface water aspects only.

The model structure was as illustrated in Fig. 17. The groundwater zone was, however, considered only 

with one layer, i.e. a 2-dimensional groundwater model, and there were no data from observation wells 

to allow a calibration of the groundwater part of the model. The six models were set up with a 2 km x 2

km computational grid. A split-sample approach was used with typically three years for model calibration

and another three years for the subsequent model validation.

The data requirements for a SHE based model are substantial and much larger than for a rainfall-runoff

model of lumped conceptual type that previously had been applied to such types of catchments. A major 

challenge of the study was therefore to identify, collect and process data and to check their quality. 

Data were collected from more than 15 different agencies belonging to many different ministries and the 

data quality varied substantially. 

Another challenge was how to assess parameter values in a distributed model when data, in contrast to

the previous tests on small experimental catchments like in Bathurst (1986a), are scarce. Each of the 

grid points in a distributed model is characterised by one or more parameters. Although the parameter 

values in principle (as in nature) vary from grid point to grid point, it is neither feasible nor desirable to allow 

the parameter values to vary so freely. Instead, a given parameter should only reflect the significant and 

systematic variation described in the available field data. Therefore a parameterisation procedure was 

developed, where representative parameter values were associated with individual soil types, vegetation

types, geological layers, etc. This process of defining the spatial pattern of parameter values effectively

reduced the number of free parameter coefficients, which need to be adjusted in the subsequent

calibration procedure. For example, the 820 km² Kolar catchment is parameterised into three soil classes

and 10 land use/soil depth classes. For the soil type classes calibration was allowed for the hydraulic

conductivity in the unsaturated zone (for each soil type class the conductivity could vary among three

different land uses, giving nine parameter values). For the land use/soil depth classes the calibration

parameters comprised soil depths (10 parameters in total) and the Strickler overland flow coefficients for

four land use types (four parameters in total). A further three parameters were subject to calibration

(hydraulic conductivity in the saturated zone, an (empirical) by-pass coefficient and a surface retention 

parameter; all kept constant throughout the catchment). Although the 26 calibration parameters could not 

be assessed from field data alone, but had to be modified through calibration, the physical realism of the 

parameter values resulting from the subsequent calibration procedure could be evaluated from available 

field data. 
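
The parameter count quoted above can be checked with a short tally. The sketch below is illustrative only: the variable names are labels chosen here, not identifiers from the SHE code, and the grid cell count is a rough figure derived from the 820 km² catchment area and the 2 km x 2 km grid rather than a number reported in [4] or [5].

```python
# Illustrative tally of the Kolar calibration parameters described above.
# All names are labels chosen for this sketch, not identifiers from the SHE code.

SOIL_CLASSES = 3          # soil type classes
LAND_USES_PER_SOIL = 3    # land uses over which the conductivity may vary
SOIL_DEPTH_CLASSES = 10   # land use/soil depth classes
STRICKLER_LAND_USES = 4   # land use types with an overland flow coefficient
CATCHMENT_WIDE = 3        # saturated conductivity, by-pass coefficient, surface retention

free_parameters = {
    "unsaturated_zone_conductivity": SOIL_CLASSES * LAND_USES_PER_SOIL,
    "soil_depth": SOIL_DEPTH_CLASSES,
    "strickler_coefficient": STRICKLER_LAND_USES,
    "catchment_wide": CATCHMENT_WIDE,
}
total = sum(free_parameters.values())

# Approximate number of grid cells (assumption: area divided by cell size, rounded down).
grid_cells = 820 // (2 * 2)

print(total, grid_cells)
```

The tally gives 26 calibration parameters against roughly 205 grid cells, which shows how the class-based parameterisation keeps the number of free parameters well below one per grid point.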

The simulation results are illustrated in Fig. 13 as hydrographs for the largest sub-catchment and in Fig. 

14 as annual runoff and annual peaks for all six sub-catchments. In both figures the results are for the 

validation periods, where results are slightly poorer as compared to the calibration periods. In [4] the 

rainfall-runoff simulation results were characterised as having the same degree of accuracy as would 

have been expected with simpler hydrological models of the lumped conceptual type. The results therefore 

suggested that application of complex, data-demanding models like the present SHE approach is

not justified in cases where the modelling objective is limited to simulation of catchment runoff and 

where observed runoff records exist for calibration purposes. No attempts were made in the study to 

test the capability of a model without calibration. 

After the first calibration and validation tests had been made, field investigations were carried out in the 

Kolar catchment during a 2½ week period to improve the parameter estimates, mainly for soil and vegetation 

parameters, and to evaluate the importance of additional field data. Subsequently, the Kolar 

model was recalibrated in such a way that rather narrow constraints were put on the range of values 

allowed for the key parameters. The final model, based on the additional data, produced simulation 

results of the same quality as the preliminary model with respect to the simulated hydrograph. Although it is

argued in [5] that the final model is believed to give an improved physical representation of the hydrological 

regime, it is concluded that a good match between observed and simulated outlet hydrographs 

does not provide a sufficient guarantee of a hydrologically realistic process description. 

Fig. 13 Observed and simulated hydrographs for the Narmada at Manot during the validation period 

1985 and 1987. 

Fig. 14 Simulated monthly runoff during monsoon season (left) and simulated annual peak discharge 

compared with measured values during validation periods for all six sub-catchments. 

At the time of [4] and [5] lumped conceptual catchment model codes such as HBV (Bergström, 1992) 

and NAM (Jønch-Clausen and Refsgaard, 1984) had been used operationally for two decades, typically 

for catchments ranging from a few km² to more than 10,000 km².

At the same time distributed physically-based models had mainly been tested on flood events on small 

catchments that typically had very good data due to experimental instrumentation (Loague and Freeze, 

1985; Bathurst, 1986a; Grayson et al., 1992a,b; Troch et al., 1993). Loague and Freeze (1985) compared

a quasi-physically based model with a regression model and a unit hydrograph model on three 

experimental catchments, the 0.1 km² R-5, Chickasha, Oklahoma, the 7.2 km² WE-38, Klingertown,

Pennsylvania and the 0.1 km² HB-6, West Thornton, New Hampshire. Bathurst (1986a) applied the SHE

to the simulation of flood events for the 10.6 km² experimental Wye catchment in Wales. Grayson et al.

(1992a,b) applied the THALES to the simulation of flood events for the 7.0 ha Wagga catchment in Australia

and the 4.4 ha Lucky Hill catchment at the Walnut Gulch Experimental Area in Arizona. Troch et

al. (1993) applied a model based on a 3-dimensional numerical solution to Richards’ equation to the 7.2

km² WE-38 catchment and a 0.64 km² subcatchment.

To my knowledge the only examples until then of distributed physically-based model studies including 

applications to catchments of several hundred km² and continuous simulation for periods of several years

were the coupled groundwater/surface water models discussed in the previous section ([1]; [2]; Miles 

and Rushton, 1983; Christensen, 1994; Wardlaw et al., 1994) that all had distributed physically-based 

groundwater components and lumped (or semi-distributed) conceptual surface water components and 

some models such as WATBAL (Knudsen et al., 1986) that had semi-distributed surface water components 

and lumped conceptual groundwater components. 

During the following few years a few additional catchment scale studies with continuous simulations of 

distributed physically-based models emerged. One example is Querner (1997) who applied the 

MOGROW to the 6.5 km² Hupselse Beek catchment simulating both discharge and groundwater head

dynamics. Another example is Kutchment et al. (1996) who simulated surface water processes for the

3315 km² Ouse catchment. The study of Kutchment et al. (1996) had many similarities with [4] and [5]

with respect to model conceptualisation and conclusions. 

The main scientific contribution of [4] and [5] was therefore that it was the first study to demonstrate that distributed

physically-based models could be established for catchments of this size and with ordinary data 

availability. Previous studies reported in literature had either been tests on small research catchments 

or been models with major components of the lumped conceptual type. As outlined above, it is worth 

noting the different traditions in the communities that had dealt with (large scale) lumped conceptual 

models, (small scale) physically-based models and groundwater models, respectively. I believe that an 

important characteristic of the team who performed the present study ([4] and [5]) was that it comprised 

scientists who together had comprehensive experience from all these communities.

Another key contribution was the parameterisation approach introduced. The point of departure for this 

approach, e.g. [1] and Bathurst (1986a), was an approach allowing parameter values to vary as required 

to fit the observed data during the calibration phase. This approach had been criticised by Beven

(1989) for resulting in overparameterisation. The procedure introduced in [4] and [5] resulted in 26 parameters to be calibrated for

the Kolar catchment. Although this number is significantly less than e.g. the number of free parameters 

in [1], it is still very high and it is very likely that a sensitivity analysis would have shown that this number 

could easily be reduced without loss of model performance. It is interesting to note that similar parameterisation 

approaches reported for other catchments in 1997 ([7]) and 2001 (Andersen et al., 2001) 

resulted in 11 and 4 free parameters, respectively, implying that the parameterisation approach adopted

in [4] and [5] was not yet fully developed.

Beven (1989) had provided a fundamental critique of the way physically-based models such as the 

SHE had been promoted by e.g. Abbott et al. (1986a) and Bathurst (1986a). His main critique was that 

the attitudes in these early SHE papers were not realistic with respect to the abilities and achievements 

of physically-based models. Beven pointed, among other things, to the following key problems:

• The process equations are simplifications leading to model structure uncertainty. 

• Spatial heterogeneity at subgrid scale is not included in the physically-based models. The current 

generation of distributed physically-based models are in reality lumped conceptual models. 

• There is a great danger of overparameterisation if it is attempted to simulate all hydrological processes

thought to be relevant and to fit the related parameters against observed discharge data only.

As a conclusion Beven argued that for future applications attempts must be made to obtain realistic 

estimates of the uncertainty associated with their predictions, particularly in the case of evaluating future 

scenarios of the effects of management strategies. 

[4] noted some of Beven’s critique, acknowledging that the process representation at the 2 km x 2 km 

grid squares is causing significant violations of some of the process descriptions, that “some degree of 

lumping and conceptualisation has taken place at the grid scale” and that “scale problems are important”. 

[4] stressed, however, that in spite of these acknowledged limitations “the present basin model is 

much more physically based and distributed than the traditional lumped conceptual model, where the 

entire catchment is represented in effect by one grid square, and where the process representations 

due to averaging over characteristics of topography, soil type and vegetation type are fundamentally 

different from the basic physical laws”. 

[4] and [5] concluded that the SHE is a suitable tool to support water management for conditions in India. 

In contrast to this, Beven (1989) had stated that the physically-based models “are not well suited to

applications to real catchments”. In retrospect, it is remarkable that [4] and [5] did not go more substantially 

into a dialogue with the very fundamental critique raised by Beven (1989). For instance [4] and [5] 

did not comment at all on Beven’s main conclusion on the need for uncertainty assessment, although 

[5] actually used the model to study the impact of soil and land use by performing sensitivity analyses. 

A more comprehensive response and dialogue took place a few years later (Beven, 1996a; Refsgaard 

et al., 1996; Beven, 1996b). 

Seen in the perspective of present protocols for good modelling practice ([12] and [13]) the approach

and conclusions in [4] and [5] are especially deficient in their lack of focus on uncertainty assessment. A

main reason for the lack of dialogue with Beven’s critique and the lack of focus on uncertainty in [4] and

[5] may be that we were too preoccupied with the real achievement of being the first to set up and run

this type of model for such large catchments. Another reason may be that some of us had a background

in groundwater modelling, where large scale distributed physically-based models had been successfully 

used to support practical water resources management for more than a decade, so we considered 

Beven’s statement that the physically-based models “are not well suited to applications to real 

catchments” as a large exaggeration. 

3.1.3 Intercomparison of different types of hydrological models ([6]) 

Summary 

The research study reported in publication [6] had two objectives. The first objective was to identify a 

rigorous framework for the testing of model capabilities for different types of tasks. The second objective 

was to use this theoretical framework and conduct an intercomparison study involving application of 

three model codes of different complexity to a number of tasks ranging from traditional simulation of 

stationary, gauged catchments to simulation of ungauged catchments and of catchments with non-stationary

climate conditions. Data from three catchments in Zimbabwe were used for the tests. 

The three codes used in the study were (a) NAM (Nielsen and Hansen, 1973; Havnø et al., 1995) – Fig. 

15; (b) WATBAL (Knudsen et al., 1986) – Fig. 16; and (c) MIKE SHE (Abbott et al., 1986a,b; Refsgaard 

and Storm, 1995) – Fig. 17. The NAM and MIKE SHE can be characterised as very typical of their 

lumped conceptual and distributed physically-based types, respectively, while the WATBAL with its 

semi-distributed approach falls in between these two standard classes. 

Fig. 15 Structure of the NAM rainfall-runoff model code 

Fig. 16 Structure of the WATBAL code. 

Fig. 17 Schematic representation of the model structure of the ‘Système Hydrologique Européen’ (SHE) 

code. 

The three catchments in Zimbabwe that were selected for the tests were Ngezi-South (1090 km²), Lundi

(254 km²) and Ngezi-North (1040 km²). For two of the catchments the model simulations started with a

blind simulation, i.e. a simulation where no calibration was conducted, but where model parameters 

were assessed directly from field data and indirectly by considering parameter values in the first catchment 

(proxy basin test). Then one year was made available for calibration and finally the full calibration 

period of 4-5 years was used. In all cases an independent period was used for validation tests (split-sample

test). The hydrological regime in Zimbabwe is semi-arid and characterised by very large interannual 

variations. It was therefore possible to construct a test scheme in such a way that a model’s 

ability to predict differences in climate input could be tested by calibrating on a dry period and validating 

on a wet period or vice versa (differential split-sample test). 
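
The difference between the split-sample and the differential split-sample tests can be sketched in a few lines of Python. The function names, the use of the median as the dry/wet threshold and the rainfall figures are assumptions made for illustration; they are not taken from [6] or from Klemes (1986).

```python
from statistics import median


def split_sample(years):
    """Calibrate on the first half of the record, validate on the second half."""
    ordered = sorted(years)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


def differential_split_sample(annual_rainfall, calibrate_on="dry"):
    """Calibrate on dry years and validate on wet years, or vice versa.

    Years at or below the median rainfall are treated as dry (an assumption
    of this sketch; other thresholds are possible).
    """
    threshold = median(annual_rainfall.values())
    dry = sorted(y for y, p in annual_rainfall.items() if p <= threshold)
    wet = sorted(y for y, p in annual_rainfall.items() if p > threshold)
    return (dry, wet) if calibrate_on == "dry" else (wet, dry)


# Hypothetical annual rainfall in mm, invented for illustration only.
rainfall = {1971: 420, 1972: 910, 1973: 380, 1974: 1050, 1975: 760, 1976: 510}

calibration, validation = differential_split_sample(rainfall, calibrate_on="dry")
print(calibration, validation)
```

In the ordinary split-sample test the two periods are simply consecutive, whereas the differential test deliberately makes the validation period differ in climate from the calibration period, which is what allows a model's response to changed climate input to be examined.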

The model performance was evaluated for annual runoff and criteria focussing on the shape of the discharge 

hydrograph, i.e. rainfall-runoff modelling. The modelling work was carried out by three different 

persons/teams that were very experienced in applying their respective model codes. A general conclusion

from the study was that the performances of the three codes were surprisingly similar. Thus, the 

ability of WATBAL and SHE to explicitly utilise data such as topography, soil and vegetation data that 

the NAM could not use turned out to make no significant difference in most cases. In summary the conclusions 

were: 

• Given a few (1–3) years of runoff measurements, a lumped model of the NAM type would be a 

suitable tool from the point of view of technical and economic feasibility. This applies to catchments

with homogeneous climatic input as well as cases where significant variations in the exogenous 

input are encountered. 

• For ungauged catchments, however, where accurate simulations are critical for water resources 

decisions, a distributed model is expected to give better results than a lumped model if appropriate 

information on catchment characteristics can be obtained. 


A scientific contribution of [6] was the adoption and demonstration of Klemes’s model validation testing 

scheme, which had not been much used since the basic idea was published by Klemes (1986). This is 

discussed further in Section 4.2.4. 

Furthermore, the results from the intercomparison contributed to the ongoing scientific discussion on 

which types of model codes should be recommended for which application purpose. Only a few intercomparison 

studies involving different model types had been reported in literature and only two studies 

included physically-based models (Loague and Freeze, 1985; Michaud and Sorooshian, 1994). Most of 

these previous studies had been conducted on small research catchments and none of them had included 

tests for non-stationary climate conditions as in [6]. 

From the emergence of the distributed physically-based models it was widely stated and believed that 

these new model types generally would be able to provide more accurate simulation of the hydrological 

cycle (Abbott et al., 1986a). In the absence of hard facts from suitable tests the scientific debate had to

a very large extent been based on expectations and qualitative arguments, such as that the models with

more physical basis in their model structure were assumed to be able to provide more accurate simulation

results, or the opposite view, as e.g. advocated by Beven (1989), that such expectations of the superior

performance of the physically-based models were unrealistic. In [4] we basically agreed with 

Beven (1989) with respect to the SHE’s capability to simulate discharge for large scale catchments with 

ordinary data, i.e. that the rainfall-runoff simulation results were of the same degree of accuracy “as 

would have been expected” with simpler hydrological models of the lumped conceptual type. 

With the results from [6] it was now possible to more firmly conclude that if the purpose of modelling is 

limited to simulation of runoff under stationary catchment conditions and if data exist for calibration purposes,

there is no scientifically documented reason to go beyond lumped conceptual models. This issue 

has been subject to several studies since then, where the conclusions from [6] basically have been 

confirmed (e.g. Perrin et al., 2001; Reed et al., 2004). I believe that the only thing that may change that 

conclusion is the introduction of new spatial data from new airborne or satellite sensors. Whereas these 

new data types have proven to have great value for many hydrological purposes and for special conditions

(e.g. snow cover), it has in general not yet been documented that they can provide distributed

models with comparative advantages in simulation of catchment runoff. 

3.2 Reactive Transport 

3.2.1 Oxygen transport and consumption in the unsaturated zone ([3]) 

Summary 

Publication [3] describes the development of a new code for simulation of oxygen transport and consumption 

in the unsaturated zone. The code was linked as a sub-component to the SHE modelling system 

(Abbott et al., 1986a,b). The objective of the paper was to describe the new process formulation, 

document its applicability through two case studies and outline the perspectives in relation to its use as 

part of the comprehensive SHE code. 

The unsaturated zone water flow calculations in SHE were based on a finite difference solution to the 

full Richards’ equation for unsteady soil water flow. The solute transport calculations were based on the 

traditional convection-dispersion equation. The new code for oxygen transport and consumption was an 

add-on to these first two steps and used information on soil moisture content, water flows and solute 

concentrations and fluxes as input. Thus the spatial representation is given by the underlying flow and 

solute transport discretisation, implying a one-dimensional description with spatial resolution ranging 

from a few cm close to the terrain to 20-40 cm further down in the soil column. 
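
For orientation, the two governing equations named above can be written in their common one-dimensional textbook forms. These are given as a sketch under stated assumptions: the vertical coordinate z is taken positive upwards, S is a root extraction sink, D is a dispersion coefficient, q is the Darcy flux and R is a generic source/sink term; the exact formulation and notation used in SHE and in [3] may differ.

```latex
% Richards' equation for unsteady, one-dimensional soil water flow (z positive upwards)
\frac{\partial \theta}{\partial t}
  = \frac{\partial}{\partial z}\left[ K(\psi)\left( \frac{\partial \psi}{\partial z} + 1 \right) \right] - S

% Darcy flux consistent with the equation above
q = -K(\psi)\left( \frac{\partial \psi}{\partial z} + 1 \right)

% Convection-dispersion equation for a solute of concentration c
\frac{\partial (\theta c)}{\partial t}
  = \frac{\partial}{\partial z}\left( \theta D \frac{\partial c}{\partial z} \right)
  - \frac{\partial (q c)}{\partial z} + R
```

The oxygen code described here takes θ, q and c from these two steps as input, which is why its spatial resolution follows the underlying flow and transport discretisation.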

The process description in [3] is based on a three-phase system (soil, water, air) and accounts for

spatial heterogeneity at this small scale. Fig. 18 shows a microscale illustration of the soil. Air tends to 

fill the larger pores in the soil matrix whereas water is drawn into the narrow necks and finer pore 

spaces in aggregates, forming capillary films and wedges. The air and water coexist in the soil by occupying 

different geometric configurations. Oxygen movement within these different portions of the pore 

space can occur by: convective transport in the water, diffusion in water, convective transport in soil air, 

diffusion in soil air, diffusion into water-saturated soil crumbs, and consumption in free and fixed water. 

Microorganisms and plant roots are generally found in the finer pores of the soil because they require 

close contact with the soil particles for uptake of substrate and nutrients. Transport of oxygen to these 

respiring sites usually occurs in the water phase of soil crumbs. It is the rate of oxygen diffusion through 

this fixed water in micropores that will determine the availability of oxygen for respiration and the anaerobic 

fraction of the soil. A soil crumb is considered to be any fully water-saturated subvolume of soil, 

the physical size of which is determined by the nearness of air-filled soil pores. The crumb is thus defined 

by the fact that oxygen transport within the crumb is primarily due to diffusion in water-filled pores. 

The size of the soil crumbs is dependent on the water content of the soil and the corresponding number 

of air-filled pores. 

The relation between soil water content and size of the water crumbs is derived from the soil water retention 

curve that is already used in Richards’ equation. The idea behind this is illustrated in Fig. 19 and 

described in more detail in [3]. The number of air-filled pores at a given soil moisture content can be

calculated from the retention curve (Fig. 19b). It is furthermore assumed that the distance between two

air-filled pores, d_i, corresponds to the average diameter of a water-saturated crumb (Fig. 19a).
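
The link between tension and pore radius that underlies Fig. 19b is commonly expressed through the capillary rise relation r = 2σ cos α / (ρ g ψ). Whether [3] uses exactly this form is not stated here, so the following Python sketch should be read as an assumption; the function name and the physical constants (water at about 20 °C) are chosen for illustration.

```python
import math

SURFACE_TENSION = 0.0728   # N/m, water-air interface at about 20 degrees C (assumed)
WATER_DENSITY = 998.0      # kg/m3 (assumed)
GRAVITY = 9.81             # m/s2


def largest_water_filled_radius(tension_head_m, contact_angle_rad=0.0):
    """Return the capillary radius (m) that is just drained at a given tension.

    tension_head_m is the soil water tension expressed as a positive head
    in metres of water. Pores with a larger radius are taken to be air-filled.
    """
    if tension_head_m <= 0:
        raise ValueError("tension head must be positive")
    return (2.0 * SURFACE_TENSION * math.cos(contact_angle_rad)
            / (WATER_DENSITY * GRAVITY * tension_head_m))


radius_at_1m = largest_water_filled_radius(1.0)
print(f"{radius_at_1m * 1e6:.1f} micrometres")
```

With these constants a tension of 1 m of water corresponds to a radius of roughly 15 µm, so pores wider than this would be air-filled at that tension. Counting such pores from the retention curve is the step that leads to the crumb diameter d_i in [3].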

Fig. 18 Microscale representation of the three-phase soil system with respect to oxygen transport. [Figure labels: air; “free” water; solids/aggregates; “fixed” water; anaerobic zone; aerobic zone.]

Fig. 19 (a) The assumed pore distribution within the unit L x L. (b) Retention curve showing the relation

between tension, water content and pore radius of a soil. [Figure labels: (a) air-filled pore, water-saturated crumb, spacing d_i, unit length L; (b) axes tension (ψ), pore radius (p) and water content (θ), with θ_i and θ_(i+1) marked.]

The two case studies where the model code was tested and demonstrated dealt with operation of a 

waste water infiltration plant and assessment of anaerobic zones of importance for denitrification in 

agricultural soils. 

Previous research in oxygen transport processes in heterogeneous soils (e.g. Currie, 1961; Smith, 

1980; Troeh et al., 1982) was based on the assumption of steady-state conditions with regard to

crumb/aggregate size and aerobic-anaerobic fractions. The novel scientific contribution of this paper 

was the new concept of calculating the size of the water crumbs as a function of the water retention 

curve and the time varying soil moisture content originating from SHE calculations and the linking of this 

concept to the previous research in this field. In this way it became possible to calculate aerobic-anaerobic

fractions dynamically. 

Although the scale of consideration in this study is the smallest possible in a catchment modelling perspective, 

namely point or column scale, it illustrates that smaller scale phenomena (here diffusion into 

soil crumbs that are a mm or less in size and temporally varying) often dominate the oxygen conditions

at grid (cm - dm) scale. The approach in [3] is an upscaling from grain size to computational model grid 

point, where the within grid heterogeneity is accounted for by developing a set of process equations 

that includes the effect of the smaller scale heterogeneity at the larger grid scale. 

In retrospect, it is interesting to consider the issues that were not discussed in [3]. In this respect it 

should be noted that code verification aspects were not mentioned in [3], although a completely new 

code was developed. Furthermore, [3] did not discuss the issue of upscaling the present grid scale 

processes to application at catchment scale. Interesting issues in this regard would be evaluations of 

how data and parameter values could be assessed for catchment scale applications and discussions of 

whether it would still be the mm-scale (crumbs) processes that would be dominating when simulating at 

large scale, or whether larger scale heterogeneities, such as differences in crops, soil types or topography, 

would become more important and thus reduce the importance of the present process description. 

The model code presented in [3] was developed in a ‘research version’ of the SHE code. After the 

completion of the study it was not upgraded to become part of the ‘commercial version’ of MIKE SHE 

that emerged a few years later. The oxygen model has not been used for practical purposes. 

To my knowledge, process description of the same detail as in [3] has not been included in any catchment 

model, and not even in the most comprehensive physically-based root zone models such as 

DAISY (Hansen et al., 1991; Abrahamsen and Hansen, 2000). In DAISY that provides state-of-the-art 

descriptions of root zone processes with focus on water, plant growth and nitrogen a much simpler and 

more empirical process formulation is used for calculating denitrification as a function of anaerobic subsoil 

conditions. 

3.2.2 An integrated model for the Danubian Lowland ([9]) 

Summary 

Publication [9] is concerned with environmental assessment studies in connection with the Gabcikovo 

hydropower scheme along the Danube. The objective of the underlying study was to develop and apply 

a comprehensive integrated modelling system to support management decisions in this respect. 

The Danubian Lowland (Fig. 20) in Slovakia and Hungary downstream of Bratislava is an inland delta

formed in the past by river sediments from the Danube. The entire area forms an alluvial aquifer, which

throughout the year receives around 30 m³/s of infiltration water from the Danube in the upper parts of the

area and returns it to the Danube and the drainage canals in the downstream part. The aquifer is an 

important water resource for municipal and agricultural water supply, and the floodplain area with its 

alluvial forests and associated ecosystems represents a unique landscape of outstanding ecological 

importance. 

Fig. 20 The Danubian Lowland with the new reservoir and the Gabcikovo hydropower scheme. 

The Gabcikovo hydropower scheme was put into operation in 1992. A large number of hydraulic structures 

was established as part of the hydropower scheme. The key structures are a system of weirs 

across the Danube at Cunovo 15 km downstream of Bratislava, a reservoir created by the damming at 

Cunovo, a 30 km long lined navigation canal, outside the floodplain area, parallel to the Danube River 

with intake to the hydropower plant, a hydropower plant and two ship-locks at Gabcikovo, and an intake 

structure at Dobrohost, 10 km downstream of Cunovo, diverting water from the new canal to the river 

branch system. The entire scheme has significantly affected the hydrological regime and the ecosystem 

of the region. The scheme was originally planned as a joint effort between the former Czechoslovakia and

Hungary, and the major parts of the construction were carried out as such on the basis of an international 

treaty from 1977. However, since 1989 Gabcikovo has been a major matter of controversy between 

Slovakia and Hungary, who have referred some disputed questions to the International Court of 

Justice in The Hague (ICJ, 1997). 

The hydrological regime in the area is very dynamic with so many crucial links and feedback mechanisms 

between the various parts of the surface- and subsurface water regimes that no single existing model code 

was able to describe the entire regime. Therefore, the modelling system illustrated in Fig. 21 was established.

It integrates four model codes: (a) MIKE 21 (DHI, 1995) for describing the reservoir (2D flow, eutrophication, 

sediment transport); (b) MIKE 11 (Havnø et al., 1995) describing the river and river 

branches (1D flow including effects of hydraulic control structures, water quality, sediment transport); 

(c) MIKE SHE (Refsgaard and Storm, 1995) describing the ground water (3D flow, solute transport, 

geochemistry) and flood plain conditions (dynamics of inundation pattern, ground water and soil moisture 

conditions); and (d) DAISY (Hansen et al., 1991) describing agricultural aspects (crop yield, irrigation, 

nitrogen leaching). The interfaces between the various models were: 

Fig. 21 Structure of the integrated modelling system with indication of the interactions between the individual 

models 

A) MIKE SHE forms the core of the integrated modelling system having interfaces to all the individual 

modelling systems. The coupling of MIKE SHE and MIKE 11 is a fully dynamic coupling 

where data is exchanged within each computational time step. 

B) Results of eutrophication simulations with MIKE 21 in the reservoir are used to estimate the concentration 

of various water quality parameters in the water that enters the Danube downstream of 

the reservoir. This information serves as boundary conditions for water quality simulations for the 

Danube using MIKE 11. 

C) Sediment transport simulations in the reservoir with MIKE 21 provide information on the amount 

of fine sediment on the bottom of the reservoir. The simulated grain size distribution and sediment 

layer thickness are used to calculate leakage coefficients, which are used in ground water modelling

with MIKE SHE to calculate the exchange of water between the reservoir and the aquifer. 

D) DAISY simulates vegetation parameters that are used in MIKE SHE to simulate the actual 

evapotranspiration. Ground water levels simulated with MIKE SHE act as lower boundary conditions 

for DAISY unsaturated zone simulations. Consequently, this process is iterative and requires 

several model simulations. 

E) Results from water quality simulations with MIKE 11 and MIKE 21 provide estimates of the concentration 

of various components/parameters in the water that infiltrates to the aquifer from the 

Danube and the reservoir. This can be used in the ground water quality simulations (geochemistry) 

with MIKE SHE. 
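
Interface C can be illustrated with a minimal sketch. The thesis does not state how the leakage coefficients were computed; the sketch assumes the common Darcy-type formulation in which the leakage coefficient is the hydraulic conductivity of the sediment layer divided by its thickness. The function names and all numerical values are hypothetical, and the step from grain size distribution to conductivity is left out.

```python
def leakage_coefficient(hydraulic_conductivity_m_s, layer_thickness_m):
    """Leakage coefficient (1/s) of a sediment layer: conductivity / thickness.

    Assumed formulation, not taken from the thesis.
    """
    if layer_thickness_m <= 0:
        raise ValueError("layer thickness must be positive")
    return hydraulic_conductivity_m_s / layer_thickness_m


def exchange_flux(leakage_1_s, reservoir_level_m, aquifer_head_m):
    """Flux (m/s) from reservoir to aquifer; a negative value means upward flow."""
    return leakage_1_s * (reservoir_level_m - aquifer_head_m)


# Hypothetical values: a 0.2 m layer of fine sediment with K = 1e-6 m/s.
c = leakage_coefficient(1e-6, 0.2)
q = exchange_flux(c, reservoir_level_m=131.0, aquifer_head_m=129.5)
```

Under this assumption a thicker or finer sediment layer gives a smaller leakage coefficient and hence less exchange between the reservoir and the aquifer.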

The integrated model was established for the 3,000 km² area on the basis of a large amount of good

quality data. Most of the model parameters were assessed directly from field data, and some were estimated 

through calibration. For most of the individual model components, traditional split-sample validation 

tests were carried out. 

The modelling system was used in a scenario approach to assess the environmental impacts of alternative 

water management options. The uncertainties of the model predictions were assessed through 

sensitivity analyses. As an example, Figs. 22 and 23 show a characterisation of the floodplain area

between the (old) main Danube river channel (western model boundary) and the power canal for the pre-dam

condition (Fig. 22) and a hypothetical post-dam condition (Fig. 23) where the major part of the water is diverted

from the main Danube channel to the power canal. The classes with different ground water depths 

and flooding have been determined from ecological considerations according to requirements of 

(semi)terrestrial (floodplain) ecotopes. For the pre-dam condition (Fig. 22) the contacts between the main 

Danube river and the river branch system are clearly seen. Similar results for a hypothetical post-dam water

management regime (Fig. 23) show significant differences in hydrological regime, e.g. many areas are

characterised by high groundwater tables and little or infrequent flooding, while the pre-dam situation (Fig. 22)

generally has deeper ground water tables and more frequent flooding. From such changes in hydrological 

conditions inferences can be made on possible changes in the floodplain ecosystem. 

Fig. 22 Hydrological regime in the river branch area for 1988 pre-dam conditions characterised in ecological 

classes 

Fig. 23 Hydrological regime in the river branch area for a post-dam water management regime characterised 

in ecological classes. The scenario has been simulated using 1988 observed upstream discharge 

data and a given hypothetical operation of the hydraulic structures. 

The uniqueness of the established modelling system is the integration between the individual model 

codes, each of which provides complex distributed physically-based descriptions of the various processes.

The validation tests have generally been carried out for the individual models, whereas only few 

tests on the integrated model were possible. Altogether, the integrated modelling system and the applications 

were more comprehensive and complex in terms of interactive dynamics between different 

components of an ecosystem than had previously been reported in the scientific literature. 

In the years following [9] a few comprehensive large scale studies with coupled models emerged. The 

most comprehensive of those was probably Wolf et al. (2003) who developed the STONE for calculating 

nutrient emissions from agriculture in The Netherlands. Although based on different codes the 

STONE resembles the integrated modelling system in [9] in terms of number of codes and complexity 

of process descriptions. One main difference, however, was that STONE consists of a chain of models 

without the feedback couplings that characterise [9]. Simpler, although still comprehensive, modelling 

systems were presented by Birkinshaw and Ewen (2000) as the SHETRAN code with a built-in nitrate 

transformation component and Conan et al. (2003) with a coupling of SWAT, MODFLOW and MT3DMS 

also focusing on nitrate fate at catchment scale. 

The complexity of the modelling studies in [9] may be compared to coupled modelling studies in 

neighbouring fields. The hydrology related field with the strongest modelling traditions is no doubt

atmospheric science. Here very comprehensive coupled models have been used in connection with

hydrology oriented climate change studies. An example of a sequentially coupled atmospheric-hydrological

model from that period is Graham (1999) who used the ECHAM4 regional atmospheric

model coupled with the HBV hydrological model to simulate discharge for the entire 1.6 × 10⁶ km² Baltic

Sea basin. The atmospheric modelling component is in itself more demanding in terms of computer 

power than comprehensive hydrological modelling such as [9], and the complexity of the atmospheric 

modelling is maybe larger than the complexity of the individual process model codes in [9]. Otherwise 

the complexity of the coupled atmospheric-hydrological studies with respect to feedback couplings between 

process descriptions, data requirements, different scales for different processes, etc., may be 

considered comparable to the complexity of [9]. 

In retrospect it is interesting to evaluate how much this comprehensive modelling system actually was used 

as part of the political decision process. Was the full potential of the models utilised by the decision

makers? In the following my personal perception of these aspects is presented. The application of the

integrated modelling and information system in practice may be categorised in three principally different

functions: (a) to assist in design of structures and details of water management regimes, (b) to assist in 

policy analysis by assessing the environmental impacts of alternative water management regimes, and (c) 

to assist in resolving different views between interest groups on environmental assessments. 

The use of models to assist in designs is the classical "engineering" way of using such models. There were 

a number of such applications. The best example of this is the final design in 1993 of the guiding structures 

of the Cunovo reservoir that was based on model simulations. Such model use was possible, because the 

objectives of the decision-makers were clear and there was an urgent need for the results before the 

construction works actually started. 

Use of models to assess the environmental impacts of alternative water management regimes was one of 

the primary reasons for establishing the modelling systems. There were several examples of such model 

applications. A key example was a combined field and modelling study of the geochemical conditions in the 

aquifer to assess whether the changed boundary conditions with the new reservoir would affect the redox 

conditions and hence the groundwater quality in the aquifer that forms the basis for the water supply of 

Bratislava. Another example is a combined field and modelling study of the eutrophication conditions in the 

reservoir. Such studies were conducted in close dialogue with the decision-makers in order to assist in their 

policy formulation. 

Finally, the modelling system was an invaluable tool in connection with the international attempts made to 

assist in resolving some of the issues that were disputed between Slovakia and Hungary. Many of the 

arguments brought forward on these highly controversial issues were mixtures of scientifically based facts 

and politically based views, but they were often claimed as purely scientifically based. It is very natural and 

fully legitimate that all parties have political interests and do their best to pursue them. However, the mixing 

of scientific facts and political interest makes the whole scene less transparent and may be an obstacle for 

arriving at rational decisions. The role the modelling system had in this context was that it made it possible

on some occasions to help distinguish between facts and fiction with respect to the scientific arguments. In

this way the modelling tools assisted in separating scientific and political problems. Thus, the modelling 

system was often used as an important tool in resolving technical disagreements between the Slovakian 

and Hungarian delegations in the international expert groups (EC, 1992, 1993a, 1993b). Similarly, it is my 

impression that the modelling results played a significant role for the International Court of Justice when 

dealing with the question of whether the ecological situation could be characterised as a catastrophe 

justifying the use of the legal principle of “the ecological state of necessity” as done when Hungary stopped 

the construction works on the Gabcikovo scheme in 1989 (ICJ, 1997). 

However, there were also clear limitations to the application of the modelling tools. These limitations 

occurred when the political objectives were not clearly defined. It was for instance imagined that the 

modelling tools should be used to identify the optimal solution for the water management regime in the river 

branch system. This unique area is, however, subject to considerable interest from different sectors such 

as commercial forestry, fishery, tourism and natural conservation. The requirements of these different 

sectoral interests are not common and in some cases even contradictory with respect to how the water 

regime should be. Thus, until the balance of interests between these different stakeholders has been 

decided in terms of clear political goals from the government, an optimal solution does not exist. Another 

example of lack of clear political goals was related to the overall sharing of water between hydropower and 

the environment. 

3.2.3 Large scale modelling of groundwater contamination ([10]) 

Summary 

Publication [10] describes results from an EU research project on groundwater pollution from non-point 

sources. The rationale outlined in [10] is that physically based models for describing nitrate may, owing to their better

process descriptions, be expected to have better predictive capabilities than simpler empirical

models for certain applications related to assessing the impacts of changes in agricultural management

practice. Such models were well proven for simulation of nitrate contamination at small scale with good

data availability. Two of the main constraints for using such models operationally were that (a) the databases 

existing at national or European scale had not previously been tested as input for such models; 

and (b) almost no tests had been conducted for such models at large scale. The objectives of the paper 

were therefore to study the data availability at the large scale and develop methodologies for model 

upscaling/aggregation to represent conditions at larger scale. The theoretical aspects on scaling included 

in [10] are dealt with in Section 4.1. Here some key results from one of the two catchments (Karup) 

are discussed. 

The modelling system used was MIKE SHE (Refsgaard and Storm, 1995) coupled with the DAISY root 

zone model (Hansen et al., 1991). Two Danish catchments of about 500 km² each, Karup and Odense,

were used for the tests. 

The principles used for collecting input data and assessing values of model parameters were: 

• The data must be easily accessible. This implied that most of the data were aggregated data from 

national or European databases. 

• No model calibration is carried out. Instead parameter values are estimated from generic transfer 

functions. 

Data were collected from the following sources: 

• Topography: 1 km grid data downloadable from USGS and GISCO (Geographical Information System 

of the European Commission) 

• Catchment boundaries and river network: generated from the topographical data using standard 

GIS functionality. 

• River cross-sections: derived from a special GIS application where the cross-section was estimated 

based on upstream catchment area, slope and a characteristic discharge. 

• Soil type: GISCO soil map. 

• Soil organic matter: experience values. 

• Vegetation: EEA CORINE land cover map. 

• Agricultural management practice: agricultural statistics and government-prescribed norms

• Geology and groundwater abstraction: EC report 

• Climatic variables and discharge data: national data 

The MIKE SHE models were run with 1, 2 and 4 km grids. For describing the nitrate leaching from the 

root zone, 17 crop rotation schemes were established by use of DAISY. The crop rotations were based 

on the statistical information on crop type and livestock densities. The 17 schemes were distributed 

randomly over the catchment in such a way that the statistical distribution was in accordance with the 

agricultural statistics. As an alternative, all the agricultural area was described by one representative 

crop instead of 17 cropping patterns. These two approaches are denoted ‘Distributed’ and ‘Uniform’ in 

Figs. 24 and 25 below. 
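
The random distribution of cropping schemes can be sketched as follows. This is a minimal illustration and not the procedure actually used in [10]: it assumes that each grid cell receives exactly one scheme and that the statistical shares are honoured through largest-remainder rounding. The function name, the seed and the shares in the usage line are hypothetical.

```python
import random


def allocate_schemes(n_cells, shares, seed=0):
    """Assign one scheme index to each cell so that counts follow the given shares."""
    total = sum(shares)
    exact = [n_cells * s / total for s in shares]
    counts = [int(x) for x in exact]
    # Largest-remainder rounding so that the counts sum exactly to n_cells.
    missing = max(0, n_cells - sum(counts))
    by_remainder = sorted(
        range(len(shares)), key=lambda i: exact[i] - counts[i], reverse=True
    )
    for i in by_remainder[:missing]:
        counts[i] += 1
    cells = [i for i, c in enumerate(counts) for _ in range(c)]
    random.Random(seed).shuffle(cells)  # random placement over the catchment
    return cells


# Hypothetical example with three schemes instead of the 17 used in the study.
cells = allocate_schemes(100, [5, 3, 2])
```

The 'Uniform' alternative corresponds to a single share, so that every agricultural cell receives the same representative crop.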

The Karup model was validated by comparison of model simulations and field data on annual water 

balances, discharge hydrographs (Fig. 24) and nitrate concentrations in the upper groundwater layer 

from 35 observation wells (Fig. 25). The results of the validation tests were characterised as follows: 

• The annual water balance was simulated remarkably well with only 2% difference as average value 

over the five-year validation period. The variation over the year (Fig. 24) is less well described.

• The simulated nitrate concentrations (Fig. 25) match the observed data remarkably well both with 

respect to average concentrations and statistical distribution of concentrations within the catchment. 

• The simulations are clearly affected by various scale effects (1, 2, 4 km grid and Distributed/Uniform). 

This is addressed further in Section 4.1 below. 
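
The two validation measures referred to above can be sketched as below. The thesis does not give the formulas; the sketch assumes that the water balance difference is the relative difference between total simulated and total observed runoff, and that the statistical distribution is an empirical cumulative frequency curve of the kind shown in Fig. 25. Function names are hypothetical.

```python
def water_balance_error_percent(simulated, observed):
    """Relative difference (%) between total simulated and total observed runoff."""
    return 100.0 * (sum(simulated) - sum(observed)) / sum(observed)


def cumulative_frequency(values):
    """Return (value, cumulative frequency) pairs of an empirical distribution."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]
```

Applying `cumulative_frequency` to the concentrations observed in the wells and to the simulated concentrations in the corresponding grid cells gives curves that can be compared directly.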

Fig. 24 Comparison of the recorded discharge hydrograph for the Karup catchment with simulations 

based on 1, 2 and 4 km grids. The two simulated curves correspond to the combined upscaling/aggregation 

procedure (Distributed) and the simpler upscaling procedure (Uniform). 

[Fig. 25 plot residue: two panels titled "Distribution of groundwater concentrations (ultimo 1993)", the upper with uniform and the lower with distributed agricultural representation; x-axis: concentration (mg/l), 0 to 180; y-axis: cumulative frequency, 0 to 1.2; curves: Measured, det1000, det2000, det4000]

Fig. 25 Comparison of statistical distribution of nitrate concentrations in groundwater for the Karup

catchment as simulated by the model with 1, 2 and 4 km grids and as observed in 35 wells. The lower figure corresponds

to the upscaling procedure resulting in a distributed representation of agricultural crops, while 

the upper figure is from the run with the upscaling procedure, where all agricultural area is represented 

by one uniform crop. 


The model codes used in [10] were well known and previously used in one of the catchments (Styczen 

and Storm, 1993a, b). The scientific contributions of [10] relate partly to scaling issues, which are dealt 

with in Section 4.1 below, and partly to testing the performance of nitrate catchment models when 

scarce data are used and when no model calibration is carried out. The most important finding with 

respect to data availability is probably that aggregated data in many cases can provide sufficient input 

to perform useful model simulations. This message is similar to the output from the first large scale application 

of SHE to catchments in India with scarce data ([4] and [5]), namely that an apparent lack of 

primary data should not always prevent you from using a model. 

With regard to data availability at large scale it was concluded that the most critical data that may cause 

problems for large scale applications are the geological data for which no suitable global or European 

digital database exists. In this respect the development of a national hydrological model in Denmark

(Henriksen et al., 2003) that is based on comprehensive geological data from the very large national

geological database is an important step.

The study showed that one of the strengths of physically-based models is the possibility to assess 

many parameter values from standard values, achieved from experience through a number of other 

applications. It also showed some of the limitations in this respect. While the key results in terms of 

annual runoff and nitrogen concentration distributions are encouraging, the discharge hydrographs 

clearly illustrate that it would be very easy to obtain a better hydrograph fit through calibration of a couple 

of parameter values. When parameters are assessed in this way they are subject to considerable 

uncertainty, which will generate significant uncertainty in model predictions. This aspect is addressed in 

[11], which is discussed in Section 4.3 below.

The attempt to assess parameter values directly from data without any model calibration can be seen 

as the extreme end of the development starting with hundreds of free parameters in the Suså model 

([1]), over 26 parameters in the Kolar basin in India ([5]), to 11 free parameters in a previous Karup 

study ([7]). The results from the present study showed some obvious shortcomings of this approach, 

and in a later study of the Senegal basin (Andersen et al., 2001) we used 4 free parameters for calibration. 

3.3 Real-time Flood Forecasting 

3.3.1 Intercomparison of updating procedures for real-time forecasting ([8]) 

Summary 

Publication [8] presents a classification of updating procedures used in real-time flood forecasting modelling 

and a review of the results from the WMO project ‘Simulated Real-Time Intercomparison of Hydrological 

Models’ (WMO, 1992) comprising more than 10 commonly used hydrological model codes 

and a variety of different updating procedures. The objective of the paper was to analyse the performance 

of different types of updating procedures and to assess what is more important, the simulation 

model or the updating procedure. 

In the context of real-time forecasting a hydrological catchment model, such as those in the remaining part of

this thesis, may be denoted a process model (Fig. 26). A process model consists of a model structure 

including process equations, model parameters that are constant throughout a model run and state 

variables. The transformation from input to output by the process model is called simulation, in accordance 

with the terminology defined in Section 2.2 above. Process models that operate in real-time may 

take into consideration the measured discharge/water level at the time of preparing the forecast. This 

feedback process of assimilating the measured data into the forecasting procedure is referred to as 

updating, or data assimilation. Updating procedures can be classified according to four different methodologies 

(Fig. 26): 

1. Updating of input variables, typically by adjusting precipitation. 

2. Updating of state variables, e.g. the soil moisture content. 

3. Updating of model parameters. 

4. Updating of output variables (error prediction). 
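
The fourth class, error prediction, can be illustrated with a minimal sketch. It assumes that the errors (observed minus simulated discharge) follow a first-order autoregressive model without intercept; this is one simple choice and not necessarily the error model used in NAMS11 or by other participants. Function names are hypothetical.

```python
def fit_ar1(errors):
    """Least-squares estimate of phi in e[t] = phi * e[t-1]."""
    num = sum(a * b for a, b in zip(errors[1:], errors[:-1]))
    den = sum(b * b for b in errors[:-1])
    return num / den if den else 0.0


def updated_forecast(simulated_future, last_error, phi):
    """Add the predicted error phi**k * last_error to the simulation at lead time k."""
    return [q + last_error * phi ** k for k, q in enumerate(simulated_future, start=1)]
```

With an absolute value of phi below one, the correction decays with lead time, so the updated forecast approaches the plain simulation at long lead times.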

The core of the WMO project was a workshop held in Vancouver during the period July 30 – August 8, 

1987, where 15 models from 14 different organisations were run in a simulated real-time environment. 

Data from three catchments with significantly different hydrological characteristics were used for the 

tests. Before the workshop the modellers had received historical data for several years for calibration 

and validation and two ‘warm up’ flood events. During the workshop four additional flood events were 

forecasted as blind tests, each with seven forecasts at consecutive times. Each event was forecasted 

within one workshop day, often under considerable time pressure. 

I participated in the workshop with two models that differed both with respect to process model and 

updating procedure: 

• NAMS11 comprising the NAM as catchment model, St. Venant river routing and an error prediction 

model as updating procedure. This is basically identical to what later became known as the flood 

forecasting module of MIKE 11 (Havnø et al., 1995). 

• NAMKAL comprising the NAM formulated in a state-space form and built into an extended Kalman

filter for updating. This version had no separate river routing but relied on the linear reservoirs in 

NAM. 
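
The idea of state updating by Kalman filtering can be sketched for a single linear reservoir. This is a strongly simplified illustration and not a reproduction of NAMKAL: it uses an ordinary (linear, scalar) Kalman filter rather than an extended filter, one storage state instead of the NAM state vector, and hypothetical values for the time constant and the noise variances.

```python
def kalman_step(storage, variance, rainfall, observed_q,
                k=5.0, q_noise=1.0, r_noise=0.04):
    """One forecast/update cycle for a linear reservoir with outflow Q = S / k."""
    a = 1.0 - 1.0 / k  # fraction of storage retained per time step
    h = 1.0 / k        # observation operator: Q = h * S
    # Forecast step: propagate the state and its error variance.
    s_pred = a * storage + rainfall
    p_pred = a * a * variance + q_noise
    # Update step: weight the discharge innovation by the Kalman gain.
    gain = p_pred * h / (h * h * p_pred + r_noise)
    s_new = s_pred + gain * (observed_q - h * s_pred)
    p_new = (1.0 - gain * h) * p_pred
    return s_new, p_new
```

If the measured discharge equals the simulated discharge, the state is left unchanged; otherwise the storage is adjusted in proportion to the discrepancy, and the adjusted state influences all subsequent forecasts.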

The two models were tested on the 104 km² Orgeval catchment (France) and the 2,344 km² Bird Creek

catchment (United States). The models were not tested on the third, snow-dominated catchment. 

Fig. 26 Schematic diagram of simulation and forecasting with illustration of four different updating 

methodologies [8].

Summary results from the two catchments are shown in Fig. 27 as root mean square errors (RMSE) as 

a function of forecast lead time (lag). As can be seen from the figure the intercomparison test turned out 

to be a very close ‘race’ with at least one third of the models performing almost equally well. Depending 

on the selected criteria for comparison (which catchment, priority to short, medium or long lead times, 

etc.) several of these could claim to be the ‘best model’. What is maybe more interesting are some of the

general findings: 

• The process models belonged to two of the classes shown in Fig. 6, namely empirical (black box) 

models and lumped conceptual models. From the results it was not possible to clearly distinguish 

which model type performed better. 

• All four types of updating procedures were represented, both among the models with the best performance 

and among the models with the poorest performance. This indicates that the selection of 

a specific updating methodology is only one out of several important factors. 

• The forecast error (RMSE) generally increases with forecast lead time. This shows that updating 

procedures most often significantly improve the performance of hydrological models for short-range 

forecasting. 

• In most cases the models with the best performance for short lead times were also those with the 

best results for the long lead times. This indicates that the goodness of the basic simulation (by the 

process model) is crucial to forecast accuracy, or in other words that a good updating procedure 

cannot compensate for a poor process model.
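
The performance measure behind these findings, RMSE as a function of forecast lead time, can be computed as in the sketch below. The data layout is an assumption made for the illustration: `forecasts[i][k]` is the forecast issued at time index `i` for lead time `k + 1` steps, and `observations` is indexed by time step. The sketch assumes that at least one observation is available for every lead time.

```python
import math


def rmse_by_lead_time(forecasts, observations):
    """Return the list of RMSE values, one per lead time."""
    n_leads = len(forecasts[0])
    result = []
    for k in range(n_leads):
        squared = [
            (fc[k] - observations[i + k + 1]) ** 2
            for i, fc in enumerate(forecasts)
            if i + k + 1 < len(observations)
        ]
        result.append(math.sqrt(sum(squared) / len(squared)))
    return result
```

Plotting the returned values against lead time for each model gives curves of the kind summarised in Fig. 27.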


Real-time forecasting is the toughest field I have experienced in hydrological modelling with respect to 

model validation, because the results of the model forecasts are continuously confronted with observations. 

In many studies involving model simulations for planning purposes it is often not possible to conduct 

a validation test that exactly fits the conditions for which model simulations of future conditions are 

needed. Therefore, the validation test results will often have many qualifiers and be considered together 

with other arguments. In real-time flood forecasting there is no need for such qualifiers and arguments 

(‘no nonsense’) and therefore only the hard facts are considered. 

Fig. 27 Root Mean Square Errors (RMSE) as a function of forecast lead time for all models participating 

in the Orgeval and Bird Creek catchments. The RMSE values are averaged over the four forecasted 

flood events with blind tests (events 3-6), [8]. 

The main scientific contribution of [8] was the analysis of the performance of different types of process 

models and updating procedures and combinations hereof. Our motivations to participate in this unique 

WMO intercomparison project were (a) to test DHI’s code NAMS11 (now MIKE 11), which was used 

operationally in India at that time, in an intercomparison with some of the internationally leading codes 

and modellers; and (b) to test whether an extended Kalman filter could provide a better updating routine 

than the more commonly used and simpler error prediction routine. In addition to noting that the 

NAMS11 performed very well and that the extended Kalman filter under ideal conditions could perform 

marginally better than the standard updating procedure, the analysis led to the following interesting

findings: 

• It was not possible to conclude which model type, black box or lumped conceptual, is better suited 

for simulation of runoff. This is in good agreement with [6] and later studies such as Reed et al. 

(2004), which concluded that lumped conceptual and distributed physically-based models performed 

equally well for split-sample tests. Thus it may be argued that all three model types described 

in Section 2.4 in many cases can be expected to be able to perform equally well in rainfall-runoff

modelling.

• It turned out that the personal factor is maybe the most important aspect of hydrological modelling. 

It was clear after the workshop that the difference in model performances between the participating 

codes could often not be explained by differences in model codes. Personal factors such as the 

modeller’s ability to make a good model calibration, experience from working in hydrological regimes 

different from the regime you see in your home office, ability to work under extreme stress, 

level of preparation beforehand and random luck also played important roles. The personal factor is 

most often overlooked in natural science, maybe because it is subjective in nature and therefore

does not fit well into the methods usually adopted in natural science. The ultimate consequence of 

this finding is that good quality of modelling results requires both use of good scientifically based 

methodologies and adoption of sound practices by competent professionals. This consequence was

not derived in [6] but is central for recent work on quality assurance guidelines in the modelling 

process ([13]). 

Most of the model codes that participated in the intercomparison study were state-of-the-art hydrological 

model codes such as Sacramento (Burnash, 1995), HBV (Bergström, 1995) and MIKE 11 

(NAMS11) with comprehensive experience in operational flood forecasting. These codes are still 

among the most commonly used today. The updating techniques tested in [8] are also still the basic 

techniques used operationally today, although more sophisticated developments and improvements 

have taken place, e.g. a combination of the Kalman filtering and the error prediction procedure (Madsen 

and Skotner, 2005). 

4. Key Issues in Catchment Scale Hydrological Modelling 

4.1 Scaling 

This section provides a discussion of catchment heterogeneity and upscaling in relation to catchment 

modelling based partly on the publications in the present thesis (most importantly [7] and [10]) and 

partly on other previous work such as Refsgaard (1981), the foundation of [1] and [2], and Refsgaard 

and Butts (1999) that was heavily inspired by the EU research project behind [10] and [11]. 

Hydrological modelling is being carried out at spatial scales ranging from pore scale to global scale and 

a variety of scaling theories has been developed, see e.g. Blöschl and Sivapalan (1995) and Beven 

(1995). Many of the scaling theories consider different spatial scales for single processes. For catchment 

modelling it is necessary to include several processes and their linkages. 

4.1.1 Catchment heterogeneity 

Catchment properties exhibit spatial variability. For almost all properties this heterogeneity is very large 

and dominates the behaviour of the catchment. Scaling is basically a question of how to handle heterogeneity 

at different spatial scales. Different model types do this in fundamentally different ways. Let us illustrate

this by two examples. 

As the first example, let us consider an idealised description of flow through the root zone (Fig. 28). If a 

soil column, initially dry, is supplied with a certain amount of water it will retain water, until it is filled to a 

certain level, the field capacity θ′_F, whereupon all the supplied water will pass through. This is illustrated

in Fig. 28 A,B,C, where also the frequency and the distribution of θ_F are shown. If we then consider a

catchment with a spatial variability in soil physical properties, the frequency and the distribution of the

field capacity are illustrated in Fig. 28 D and E respectively. If the root zone of this catchment, initially

dry, is being supplied with water, not all of the area will contribute to throughflow at the same time, as θ_F

varies in the catchment. When, for instance, the rainfall has supplied the water amount θ′_F,m, it is seen

from Fig. 28 E that field capacity has been reached in one half of the catchment, thus contributing to 

throughflow, while the other half of the catchment still retains the rain in its root zone. 
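
The idealised root zone example can be expressed in a few lines. The sketch assumes that the catchment consists of equally sized, initially dry columns, each retaining water up to its own field capacity and passing on the rest; the function names and the field capacity values in the usage lines are hypothetical.

```python
def throughflow(supplied, field_capacities):
    """Mean throughflow over equally sized columns with individual field capacities."""
    return sum(max(0.0, supplied - fc) for fc in field_capacities) / len(field_capacities)


def contributing_fraction(supplied, field_capacities):
    """Fraction of the columns in which field capacity has been reached."""
    return sum(fc <= supplied for fc in field_capacities) / len(field_capacities)


# Hypothetical field capacities (mm) for four columns.
capacities = [10.0, 20.0, 30.0, 40.0]
half = contributing_fraction(25.0, capacities)  # supply at the median capacity
```

With a single column the throughflow curve has a sharp threshold; with many columns of different field capacity the catchment-scale curve becomes smooth, which is the behaviour a lumped relation has to mimic.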

In a lumped model, such as NAM, such spatial variability is taken into account by using semi-empirical 

relations such as the dashed line in Fig. 28 F, where θ′_1 and θ′_2 typically have to be estimated from calibration.

The difference between θ′_1 and θ′_2 can be seen as a measure of the heterogeneity of the catchment,

or of the catchment input that is also assumed homogeneously distributed in a lumped approach. 

This way of accounting for the spatial variability in the process equations can be considered the heart of 

lumped models and also explains why the process equations in lumped models are fundamentally different 

from point scale physical process equations. 
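
A lumped relation of this kind can be sketched as follows. The exact shape of the dashed line in Fig. 28 F is not given in the text; the sketch assumes that the fraction of additional water passing through increases linearly from zero at θ′_1 to one at θ′_2, which corresponds to a uniform distribution of field capacity between the two values. The function and parameter names are hypothetical.

```python
def lumped_throughflow(supplied, theta_1, theta_2):
    """Catchment throughflow for a supply to an initially dry root zone.

    Assumes a uniform distribution of field capacity between theta_1 and theta_2.
    """
    if supplied <= theta_1:
        return 0.0  # no column has reached field capacity
    if supplied >= theta_2:
        # All columns contribute; the mean retained water is (theta_1 + theta_2) / 2.
        return supplied - 0.5 * (theta_1 + theta_2)
    return (supplied - theta_1) ** 2 / (2.0 * (theta_2 - theta_1))
```

In this form the two parameters have no direct counterpart in any single soil column, which illustrates why they typically must be calibrated rather than measured.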

In a distributed model the spatial variability is taken into account by dividing the catchment into several 

smaller elements, which are then usually treated as homogeneous units, i.e. as a column in Fig. 28. 

However, the spatial variability of soil physical properties comprises both variability between different soil

types and variability within the same soil type as illustrated in Fig. 29. It has been demonstrated in several 

studies (Nielsen et al., 1973; Jensen and Refsgaard, 1991a,b,c; Djurhus et al., 1999) that the spatial

variability of e.g. soil properties within one standard soil type at field scale is very high and can significantly 

influence the water balance and solute transport at this scale. 

[Fig. 28, panels A to F. Soil column: (A) frequency and (B) distribution of field capacity, (C) throughflow versus supplied water. Catchment: (D) frequency and (E) distribution of field capacity, with θ’ F,m at the 0.5 level, (F) throughflow versus supplied water, with θ’ 1 and θ’ 2 marked.] 

Fig. 28 Idealised description of the variation of field capacity, θ F , and its effect on flow through the root 

zone in a soil column and in a catchment (Refsgaard, 1981). 

[Fig. 29. Frequency distributions of field capacity, θ F , showing the spatial variability within one of the soil types and in the entire catchment.] 

Fig. 29 The principle of spatial variability of a soil physical property within a single soil type and within a 

catchment containing more than one soil type (Refsgaard, 1981). 

Let us then turn to another example focusing on the limitation of a distributed model to resolve key features 

of a catchment. Fig. 30 shows the topography and river network for two models that are identical 




except for differences in spatial discretisation. It is clearly seen that the 500 m grid provides a much 

better resolution of the topography and the river network, and also of other catchment characteristics as 

explained in [7]. In the 2000 m grid the river valley cannot be described well, and many of the smaller 

streams have to be omitted where the distance between neighbouring streams is smaller than the 

model grid size. This significantly affects the stream-aquifer interaction and in this way the simulation of 

both river discharge and groundwater heads. As discussed in [7] a change in scale (grid size) in this 

way changes the model simulations. This can in some cases be compensated by adjusting parameter 

values. But it implies that parameter values are scale dependent and that the physical basis is reduced 

if the grid size is increased. 

Fig. 30 Topography, river network and model grid for two models with discretisations of 500 m and 

2000 m [7]. 

This example focussed on river discharges and hydraulic heads at some given observational locations 

for which [7] argues that a 500 m resolution provides an adequate description. If we instead had focussed 

on other processes such as reactive transport in aquifers or in river valleys, we would have needed 

to account for geological and geomorphological heterogeneity of much smaller scale than 500 m. This 

line of argument can continue down to pore scale processes such as those described in [3]. The point is 

that, no matter which resolution a model has, it is always possible to find processes that require a 

smaller scale in order to provide a physically based description. Consequently, the ultimate distributed 

physically based model where everything is described can never be achieved. This implies that any 

distributed model needs to provide a kind of lumped conceptual representation at its scale of operation. 

An excellent example of this is the traditional advection dispersion equation with its associated dispersivities, 

where the dispersivities show the well known scale dependence (Gelhar, 1986). The process 

description of oxygen transport and consumption given in [3] is another example. Although meant for 




inclusion as a submodel in a distributed physically based model, [3] incorporates spatial heterogeneity 

of processes at pore scale (mm) into a process equation assumed valid at its scale of operation (grid 

points with 10-40 cm distance). This process equation can therefore be considered a lumped conceptual 

description at this scale. 

4.1.2 A scaling framework 

In this section we only consider the case of moving from the smaller to the larger scale, which is often 

denoted upscaling. When moving to larger scales the spatial variability of physical parameters and variables 

has to be taken into account. This can in principle be done in two ways, either by aggregation or by 

upscaling (Heuvelink and Pebesma, 1999): 

• Upscaling means that the process equations and the associated parameters that basically constitute 

the model in principle are modified or substituted when moving from the smaller scale to the 

larger scale. 

• Aggregation means that the process equations are applied at the smaller scale (where they were 

derived) and the large-scale results are obtained by aggregating the small-scale results at the larger 

scale. 

Hence, to avoid confusion between the two different meanings of the term upscaling, the 

term scaling will in the following be used for the general case of moving from modelling at the smaller scale to 

modelling at the larger scale. The term upscaling is reserved for the specific approach to scaling 

defined above. 

The differences between upscaling and aggregation are illustrated in Fig. 31 and some key characteristics 

are summarised in Table 1. At the smaller scale, the hydrological processes can be described by 

smaller scale equations and associated smaller scale parameters. If the aggregation approach is 

adopted for large-scale modelling, then the model is operated at the smaller scale units with smaller 

scale equations and parameters and the model output valid for the larger scale emerges after aggregation 

of the results. The aggregation consists of estimating the spatial mean and in some cases also the 

statistical distribution of the model outputs. If the model is linear or the parameters and variables are 

spatially constant, computational time may be saved by averaging the model parameters and input before 

running the model; otherwise the model runs must be made before the aggregation step. 
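
The difference between the linear and the non-linear case can be illustrated with a small Python sketch. The two toy models, the function names and the parameter values below are hypothetical and only serve to show why averaging parameters before a run is safe for a linear model but not for a non-linear one.

```python
from statistics import mean


def linear_model(coefficient, rain):
    """Runoff proportional to rainfall: linear in the parameter."""
    return coefficient * rain


def threshold_model(capacity, rain):
    """Runoff only after a storage capacity is exceeded: non-linear."""
    return max(0.0, rain - capacity)


def aggregate_after(model, params, rain):
    """Aggregation approach: run at the smaller scale, then average outputs."""
    return mean(model(p, rain) for p in params)


def average_before(model, params, rain):
    """Shortcut: average the parameters first, then run the model once."""
    return model(mean(params), rain)


coefficients = [0.2, 0.4, 0.6]    # hypothetical runoff coefficients
capacities = [20.0, 50.0, 80.0]   # hypothetical storage capacities (mm)
rain = 40.0                       # hypothetical rainfall (mm)

linear_gap = abs(aggregate_after(linear_model, coefficients, rain)
                 - average_before(linear_model, coefficients, rain))
threshold_gap = abs(aggregate_after(threshold_model, capacities, rain)
                    - average_before(threshold_model, capacities, rain))
```

For the linear model the two orders of operation agree to within rounding error, whereas for the threshold model the shortcut gives no runoff at all although one of the three units does produce runoff.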

Table 1. Characteristics of different scaling procedures when moving from a smaller scale (SS) to a larger scale (LS). 

                               Aggregation    Upscaling
                                              SS equations      Large-scale    LS equations
                                              used at LS        PDE            developed
Basis of process descriptions  Smaller scale  Smaller scale     Smaller scale  Larger scale
Computational unit             Smaller scale  Larger scale      Larger scale   Larger scale
Parameter estimation possible  Yes            No, some values   Yes            No, some values
from field data                               need calibration                 need calibration




Fig. 31 Upscaling and aggregation methods for extending hydrological processes from small-scale (SS) 

to large-scale (LS) models (Refsgaard and Butts, 1999). 

If the upscaling approach is adopted for the large-scale modelling, the smaller scale equations and parameters 

are in principle substituted by larger scale ones. The upscaling approach can be carried out in 

three different ways: 

• The smaller scale equations are assumed valid also at the larger scale. In this case the parameter 

values have to be estimated as effective parameters corresponding to the larger scale computational 

unit. Effective parameters are single values, similar to point scale parameters, that nevertheless 

reproduce the bulk behaviour of a heterogeneous medium. In such cases the parameter values are 

often estimated by calibration, at least for a handful of the key parameters. An example of this 

approach is given in [5] describing an application of the SHE to a large catchment in India using 

spatial grid sizes of 2 km x 2 km. 

• The equations at the larger scale are derived in a theoretical framework from a set of deterministic 

partial differential equations (PDE) assumed valid at the smaller scale and assumptions on the spatial 

variability of key parameters and/or input data. This is often carried out in a stochastic framework 

where quantities such as the average value and higher order statistical moments of the desired 

model output variables can be assessed. An example of this approach is Jensen and Mantoglou 

(1992) who consider the spatial variability of soil hydraulic parameters in field scale modelling. 

In this case the parameter values may be assessed directly on the basis of smaller scale information. 

• The equations at the larger scale are developed at the larger scale using a concept, which does not 

explicitly consider the smaller scale equations, i.e. the formulation of laws that apply at the large 

scale. Examples of this approach are the conceptual rainfall-runoff models such as the NAM (Nielsen 

and Hansen, 1973; [6]; [8]), cf. Fig. 28 and the discussion above. The oxygen model described 

in [3] is also an example of this approach, although smaller scale and larger scale here refer to mm 

and dm scales and not to catchment scale. Because they are built on larger scale concepts, such codes are 

often not adequate for smaller scale applications, and their parameters can most often not be assessed directly 

from small scale information. 

4.1.3 Scaling - an example 

The above four scaling approaches each have their advantages and limitations and the specific approach 

to use in particular applications will depend on many factors such as the purpose of a given 

study, the dominating processes in the particular hydrological regime and the data availability. Thus, no 

unique approach can be claimed superior in all cases. As illustrated below, scaling procedures are in 

practice often based on combinations of the above approaches. 

The example outlines the scaling methodologies adopted under an EU research project dealing with 

uncertainties of assessing non-point pollution to aquifers at the European scale (Refsgaard et al., 1998; 

[10]). During this project two model codes were used: 

• SMART2 for studying leaching to groundwater of nitrate and aluminium from natural areas due to 

atmospheric deposition. SMART2 is a relatively simple dynamic model operating in vertical columns 

with annual time steps (Kros et al., 1995). 

• MIKE SHE/DAISY for studying groundwater contamination from agricultural areas. Both MIKE SHE 

(Refsgaard and Storm, 1995) and DAISY (Hansen et al., 1991) are physically-based model codes 

with detailed process descriptions and typically hourly time steps. 

The objective of the project was to assess the uncertainty in model predictions when applied at the 

European scale. As both codes had been developed for, and previously mainly been applied at, much 

smaller scales, a scaling procedure had to be adopted. The two scaling procedures, illustrated in Fig. 

32, show significant differences: 

SMART2 operates at a 1 km grid scale. It was developed on the basis of experience with the NUCSAM 

code (Groenenberg et al., 1995), which is a detailed physically-based code operating at point 

scale. Thus, SMART2 can be considered as an upscaling of NUCSAM with new equations and parameters 

applicable at the 1 km scale, equivalent to the upscaling procedure of the conceptual hydrological 

models described above. For application to the Netherlands the SMART2 model results were aggregated to a 5 

km x 5 km grid by selecting the median value among the 25 grids of 1 km x 1 km size. The parameters 

were assessed by pedotransfer functions from field data without prior model calibration. The scaling 

procedure from point scale to national or European scales thus consists of a combination of an upscaling 

and an aggregation step. 
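
The aggregation step described for SMART2 can be sketched in a few lines of Python. The function name and the synthetic grid are assumptions made for illustration; only the rule itself (the median of the 25 cells of 1 km x 1 km within each 5 km x 5 km cell) is taken from the text.

```python
from statistics import median


def aggregate_median(grid, block=5):
    """Aggregate a grid to coarser cells by taking the median of each block."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % block or n_cols % block:
        raise ValueError("grid dimensions must be multiples of the block size")
    coarse = []
    for r0 in range(0, n_rows, block):
        coarse_row = []
        for c0 in range(0, n_cols, block):
            cells = [grid[r][c]
                     for r in range(r0, r0 + block)
                     for c in range(c0, c0 + block)]
            coarse_row.append(median(cells))
        coarse.append(coarse_row)
    return coarse


# Hypothetical 5 km x 10 km area of 1 km cells with synthetic model output.
fine = [[float(r * 10 + c) for c in range(10)] for r in range(5)]
coarse = aggregate_median(fine)
```

Using the median rather than the mean makes the aggregated value insensitive to a few extreme cells, at the cost of no longer conserving the areal total.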

MIKE SHE/DAISY, on the other hand, is in this case run with equations and parameter values in each 

model grid point representing field scale conditions. The field scale is characterised by ‘effective’ soil 

and vegetation parameters, but assuming only one soil type and one cropping pattern. The smallest 

horizontal discretisation in the model is the grid scale (1-5 km) that is larger than the field scale. This 

implies that the variations between categories of soil type and crop type within the area of each grid 

cannot be resolved and described at the grid level. Input data, whose variations are not included in the 


grid scale representation, are distributed randomly at the catchment scale so that their statistical distributions 

are preserved at that scale. The results from the grid scale modelling are then aggregated to 

catchment scale (10-50 km) and the statistical properties of model output and field data are then compared 

at catchment scale (Hansen et al., 1999; [10]). Thus the scaling procedure from point scale to 

catchment scale is again a combination of an upscaling step and an aggregation step. In contrast to the 

NUCSAM-SMART2 case the upscaling step here is simply the (important) assumption that the point 

scale equations are valid at field scale. The aggregation step highlights a key issue from the concept of 

Representative Elementary Area, REA (Wood et al., 1988), namely that variability can be explicitly represented 

only at scales larger than the model grid size. 
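
One possible way of distributing categorical input data randomly over the grid cells of a catchment, while preserving their catchment-scale proportions, is sketched below. The function name, the category shares and the rounding rule are assumptions for illustration and do not reproduce the procedure used in the project.

```python
import random


def distribute_categories(fractions, n_cells, seed=0):
    """Assign one category per grid cell so that catchment shares are kept."""
    exact = {name: share * n_cells for name, share in fractions.items()}
    counts = {name: int(value) for name, value in exact.items()}
    # Hand out cells lost by truncation to the largest fractional remainders.
    leftover = n_cells - sum(counts.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    cells = [name for name, count in counts.items() for _ in range(count)]
    # The location of each category is random; only the distribution is kept.
    random.Random(seed).shuffle(cells)
    return cells


# Hypothetical soil type shares for a catchment covered by 20 grid cells.
soil_shares = {"sand": 0.5, "loam": 0.3, "clay": 0.2}
soil_per_cell = distribute_categories(soil_shares, n_cells=20)
```

Because the cell locations are random, results from such a set-up are only meaningful as statistical distributions over the catchment, which is the limitation discussed at the end of this section.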

Validation tests against field data suggested that the two different scaling procedures basically could be 

assumed valid for their respective cases, although important limitations were also identified. An important 

question regarding the differences between the two upscaling methods is why it was apparently 

possible to make the large upscaling step from the smaller scale NUCSAM to the larger scale SMART2 

code, while a similar step was not judged possible for the MIKE SHE/DAISY code. The answer may be 

that nitrogen leaching from agricultural fields is a highly non-linear and dynamic process that depends 

on cropping pattern and agricultural management practice, which cannot be lumped into a larger scale 

description, while the geochemical processes below natural lands, where no management practice 

interferes, can more easily be represented by long term average simulations focussing on the gradual 

reduction of the chemical buffer capacities due to the acids in the atmospheric deposition. 

An inherent limitation of the scaling methodologies illustrated in this example is that they do not preserve 

the georeferenced location of simulated concentrations, but only their statistical distribution over 

the catchment area (e.g. Fig. 25). Therefore, comparisons with field data make no sense on a well by 

well or subcatchment by subcatchment basis, and no information on the actual location of the simulated 

‘hot spots’ within the catchment is provided. If a more detailed spatial resolution of the model 

predictions is required from a management point of view, then the same scaling method has to be carried 

out at a finer scale with all the statistical input data being supplied on a subcatchment basis. This is 

in principle straightforward, but in reality it may often be limited by data availability. 

4.1.4 Discussion – post evaluation 

The issue of scaling represents both a major scientific challenge and a practical problem in water resources 

management. Scaling is dealt with as a key issue in two of the publications in this thesis ([7], 

[10]). As the studies behind the other publications operate on scales ranging from point scale ([3]) to 

thousands of km² ([4], [5], [9]), catchment heterogeneity and scaling are dealt with and discussed in 

many of the publications. 


Fig. 32 Scaling methodology adopted by the SMART2 and MIKE SHE/DAISY models in the UNCERSDSS project (Refsgaard and Butts, 1999).



At the beginning of my career I had the rather naive view that it might be possible to develop a universal 

model code and a methodology that could be used to address most problems in hydrological management. 

This is reflected in the dualism of the statements in the MIKE SHE description in Refsgaard and 

Storm (1995), where it is stated on the one hand that “MIKE SHE is applicable on spatial scales ranging 

from a single soil profile to a large regions”, while it is acknowledged on the other hand that “there are a 

number of fundamental scale problems which need to be carefully considered in the model applications”. 

I no longer believe that a universally applicable code and modelling methodology is theoretically 

realistic, and it is certainly not feasible in practice. The main reason for this is the scaling problem. 

Because scaling is interlinked with modelling concepts, I do not believe it will ever be 

possible to derive a universal scaling theory of practical applicability. 

Scaling implies taking spatial heterogeneity into account. In catchment modelling it is furthermore 

complicated by the need to include and link several processes, such as subsurface processes (Dagan, 

1986; Gelhar, 1986; Wen and Gómez-Hernández, 1996), root zone processes including land surface-atmosphere 

interaction (Michaud and Shuttleworth, 1997), and surface water processes including 

stream-aquifer interaction (Saulnier et al., 1997; [7]). 

Many researchers have expressed doubts whether it is feasible to use the same model process descriptions 

at different scales. For instance Beven (1995) states that “… the aggregation approach towards 

macroscale hydrological modelling, in which it is assumed that a model applicable at small 

scales can be applied at larger scales using ‘effective’ parameter values, is an inadequate approach to 

the scale problem. It is also unlikely in the future that any general scaling theory can be developed due 

to the dependence of hydrological systems on historical and geological perturbations.” 

Beven’s view can be considered a universal and fundamental statement with which it is difficult to disagree. 

A more pragmatic, but not necessarily conflicting, view is expressed by Grayson and Blöschl 

(2000): “As modellers, we are often left with little choice but to use the effective parameter approach, 

but we must recognise that effective parameters may have a narrow range of application and an effective 

parameter value that “works” for one process may not be valid for another process.” The scaling 

framework presented above should be seen in this context. It is not a fundamental theory but rather a 

collection of different methods and an emphasis on their respective assumptions and associated costs 

in terms of lost information. These methods or building blocks can then be used in composing specific 

scaling methodologies depending on the purposes of the particular modelling studies. In this respect it 

is crucial that the modeller is aware of the limitations of the scaling methodology chosen in a particular 

study. 




4.2 Confirmation, Verification, Calibration and Validation 

As illustrated in Fig. 3 the credibility of the descriptions or the agreements between reality, conceptual 

model, model code and model are evaluated through confirmation of the conceptual model, verification 

of the code, model calibration and model validation. These four terms are addressed in this section. 

4.2.1 Confirmation of conceptual model 

The conceptual model, with its selection of process descriptions, equations, etc., is the foundation for the 

model structure. Therefore a good conceptual model is most often a prerequisite for obtaining trustworthy 

model results. In groundwater modelling, establishment of the conceptual model is often considered the 

most important part of the entire modelling process (Middlemis, 2000). Evaluation of conceptual models is 

an important part in assessing uncertainty due to model structure error (Section 4.3 below and [15]). 

Methods for conceptual model confirmation should follow the standard procedures for confirmation of 

scientific theories. This implies that conceptual models should be confronted with actual field data and be 

subject to critical peer reviews. Furthermore, the feedback from the calibration and validation process may 

also serve as a means by which one or a number of alternative conceptual models may be either 

confirmed or falsified. 

As Beven (2002b) argues we need to distinguish between our qualitative understanding (perceptual model) 

and the practical implementation of that understanding in our conceptual model. As a conceptual model is 

defined in [12] as a combination of a perceptual model and the simplifications acceptable for a particular 

model study, a conceptual model becomes site-specific and even case-specific. For example a conceptual 

model of a groundwater aquifer may be described as two-dimensional for a study focussing on regional 

groundwater heads, while it may need to include more complex three-dimensional geological structures for 

a study requiring detailed solute transport simulations. 

4.2.2 Code verification 

The ability of a given model code to adequately describe the theory and equations defined in the 

conceptual model by use of numerical algorithms is evaluated through the verification of the model code. 

Use of the term verification in this respect is in accordance with Oreskes et al. (1994), because 

mathematical equations are closed systems. The methodologies used for code verification include 

comparing a numerical solution with an analytical solution or with a numerical solution from other verified 

codes. However, some programme errors only appear under circumstances that do not routinely occur, 

and may not have been anticipated. Furthermore, for complex codes it is virtually impossible to verify that 

the code is universally accurate and error-free. Therefore, the term code verification must be qualified in 

terms of specified ranges of application and corresponding ranges of accuracy. 
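
A minimal Python sketch of such a verification test is given below for a deliberately simple case: an explicit Euler solution of a linear reservoir, dS/dt = -kS, compared with its analytical solution. The equation, the function names and the parameter values are chosen for illustration only and do not refer to any of the codes discussed in this thesis.

```python
import math


def euler_linear_reservoir(s0, k, t_end, n_steps):
    """Explicit Euler solution of dS/dt = -k * S at time t_end."""
    dt = t_end / n_steps
    storage = s0
    for _ in range(n_steps):
        storage += dt * (-k * storage)
    return storage


def analytical_linear_reservoir(s0, k, t_end):
    """Exact solution S(t) = S0 * exp(-k * t) of the same equation."""
    return s0 * math.exp(-k * t_end)


def relative_error(s0, k, t_end, n_steps):
    """Relative deviation of the numerical from the analytical solution."""
    exact = analytical_linear_reservoir(s0, k, t_end)
    return abs(euler_linear_reservoir(s0, k, t_end, n_steps) - exact) / exact


# Hypothetical test case: recession constant 0.5 per day over 10 days,
# solved with 10, 100 and 1000 time steps.
errors = {n: relative_error(100.0, 0.5, 10.0, n) for n in (10, 100, 1000)}
```

The error depends strongly on the time step, which illustrates why a statement of verification is only meaningful together with the range of application and the accuracy obtained within that range.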

Code verification is not an activity that is carried out from scratch in every modelling study. In a particular 

study it has to be ascertained that the domain of applicability for which the selected model code has been 

verified covers the conditions specified in the actual conceptual model. If that is not the case, additional 




verification tests have to be conducted. Otherwise, the code must explicitly be classified as not verified for 

this particular study, and the subsequent simulation results therefore have to be considered with extra caution. 

4.2.3 Model calibration 

The application of a model code to be used for setting up a site-specific model is usually associated with 

model calibration. The model performance during calibration depends on the quantity and quality of the 

available input and observation data as well as on the conceptual model. If sufficient accuracy cannot be 

achieved, the conceptual model and/or the data have to be re-evaluated. 

Many of the publications ([1], [4], [5], [6], [7], [8], [9]) have involved model calibration. This was in all 

cases done manually. Today automatic calibration (inverse modelling) is state-of-the-art (Duan et al., 

1994; Hill, 1998; Doherty, 2003), also as part of the calibration process for rather complex distributed 

physically-based models (Sonnenborg et al., 2003; Henriksen et al., 2003). 

A key issue related to calibration of distributed models with potentially hundreds or thousands of parameter 

values is a rigorous parameterisation procedure, where the spatial pattern of the parameter 

values is defined and the number of free parameters adjustable through calibration is reduced as 

much as possible. A methodology for this is presented in [7], and this issue is further discussed in [4], 

[5], [10] and Andersen et al. (2001). 
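
The principle of reducing the number of free parameters can be illustrated by a simple zonation, in which each grid cell inherits its parameter value from a mapped class such as soil type. The sketch below is an assumption-based illustration with hypothetical names and values, not the methodology presented in [7].

```python
def build_parameter_field(zone_map, zone_values):
    """Expand a few zone-level parameter values into a full grid of values."""
    return [[zone_values[zone] for zone in row] for row in zone_map]


def count_free_parameters(zone_values):
    """Only one value per zone is adjustable, however many cells there are."""
    return len(zone_values)


# Hypothetical soil map (spatial pattern fixed in advance from field data).
zone_map = [
    ["sand", "sand", "clay"],
    ["sand", "clay", "clay"],
    ["peat", "clay", "clay"],
]
# Hypothetical hydraulic conductivities (m/s), one adjustable value per zone.
zone_values = {"sand": 1e-4, "clay": 1e-7, "peat": 1e-5}

conductivity = build_parameter_field(zone_map, zone_values)
n_cells = sum(len(row) for row in zone_map)
n_free = count_free_parameters(zone_values)
```

In this example nine grid values are controlled by three adjustable numbers; in a real distributed model the ratio between grid values and free parameters is typically far larger.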

4.2.4 Model validation 

Often the model performance during calibration is used as a measure of the predictive capability of a 

model. This is a fundamental error. Many studies (e.g. [4]; [6]; Andersen et al., 2001) have 

demonstrated that the model performance against independent data not used for calibration is generally 

poorer than the performance achieved in the calibration situation. Therefore, the credibility of a site-specific 

model’s capability to make predictions about reality must be evaluated against independent 

data. This process is denoted model validation. 

In designing suitable model validation tests a guiding principle should be that a model should be tested 

to show how well it can perform the kind of task for which it is specifically intended (Klemes, 1986). 

Klemes proposed the following scheme comprising four types of test corresponding to different situations 

with regard to whether data are available for calibration and whether the catchment conditions are 

stationary or the impact of some kind of intervention has to be simulated: 

• The split-sample test is the classical test, being applicable to cases where there is sufficient data for 

calibration and where the catchment conditions are stationary. The available data record is divided into 

two parts. A calibration is carried out on one part and then a validation on the other part. Both the 

calibration and validation exercises should give acceptable results. 

• The proxy-basin test should be applied when there is not sufficient data for a calibration of the 

catchment in question. If, for example, streamflow has to be predicted in an ungauged catchment Z, 

two gauged catchments X and Y within the region should be selected. The model should be calibrated 

on catchment X and validated on catchment Y and vice versa. Only if the two validation results are 




acceptable and similar can the model command a basic level of credibility with regard to its ability to 

simulate the streamflow in catchment Z adequately. 

• The differential split-sample test should be applied whenever a model is to be used to simulate flows, 

soil moisture patterns and other variables in a given gauged catchment under conditions different from 

those corresponding to the available data. The test may have several variants depending on the 

specific nature of the modelling study. If for example a simulation of the effects of a change in climate is 

intended, the test should have the following form. Two periods with different values of the climate 

variables of interest should be identified in the historical record, such as one with a high average 

precipitation and the other with a low average precipitation. If the model is intended to simulate 

streamflow for a wet climate scenario, then it should be calibrated on a dry segment of the historical 

record and validated on a wet segment. Similar test variants can be defined for the prediction of 

changes in land use, effects of groundwater abstraction and other such changes. In general, the model 

should demonstrate an ability to perform through the required transition regime. 

• The proxy-basin differential split-sample test is the most difficult test for a hydrological model, because 

it deals with cases where there is no data available for calibration and where the model is directed to 

predicting non-stationary conditions. An example of a case that requires such a test is simulation of 

hydrological conditions for a future period with a change in climate and for a catchment, where no 

calibration data presently exist. The test is a combination of the two previous tests. 
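
The split-sample test can be sketched as follows in Python. The toy model (flow as a constant fraction of rainfall), the Nash-Sutcliffe efficiency as performance measure and the data values are all assumptions made for illustration; they are not prescribed by Klemes (1986).

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def calibrate_runoff_coefficient(rain, flow):
    """Least-squares coefficient c for the toy model flow = c * rain."""
    return sum(r * q for r, q in zip(rain, flow)) / sum(r * r for r in rain)


def split_sample_test(rain, flow, split):
    """Calibrate on the first part of the record, validate on the rest."""
    c = calibrate_runoff_coefficient(rain[:split], flow[:split])

    def score(r, q):
        return nash_sutcliffe(q, [c * value for value in r])

    return score(rain[:split], flow[:split]), score(rain[split:], flow[split:])


# Hypothetical record, ordered from dry to wet periods.
rain = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
flow = [3.0, 5.0, 10.0, 11.0, 16.0, 17.0, 22.0, 23.0]
calibration_score, validation_score = split_sample_test(rain, flow, split=4)
```

Because the hypothetical record is ordered from dry to wet, calibrating on the first half and validating on the second half also resembles a differential split-sample test for a wetter scenario. The validation score is lower than the calibration score, in line with the general experience described above.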

The above test types are very general and need to be translated into specific tests in each case depending 

on data availability, hydrological regime and purpose of the modelling study. Except for situations 

where the split-sample test is sufficient, rather limited work has been carried out so far on validation 

test schemes. 

From a theoretical point of view the procedures outlined by Klemes (1986) for the proxy-basin and the 

differential split-sample tests, where tests have to be carried out using data from similar catchments, 

are weaker than the usual split-sample test, where data from the specific catchment are available. 

However, no obviously better testing schemes exist. 

It must be realised that the validation test schemes proposed above are so demanding that many applications 

today would fail to meet them. Thus, for many cases where either proxy-basin or differential 

split-sample tests are required, suitable test data simply do not exist. This is for example the case for 

prediction of regional scale transport of potential contamination from underground radionuclide deposits 

over the next thousands of years. In such cases model validation is not possible. This does not imply 

that these modelling studies are not useful, only that their output should be recognised to be somewhat 

more uncertain than is often stated and that the term ‘validated model’ should not be used. Thus, a 

model’s validity will always be confined in terms of space, time, boundary conditions, types of application, 

etc. 


Relative to confirmation, verification and calibration, the main scientific contributions in my publications 

[1] – [15] are on the model validation issue. The motivation for this research was twofold: First of all, 

there were too many undocumented claims (over-selling) in the modelling community on model capabilities 

during the years following the development of many comprehensive model codes such as MIKE 




SHE. This over-selling was most obvious in practical studies conducted by consultants, but it was also 

common in large parts of the scientific community, e.g. Abbott et al. (1986a,b) and many others. Secondly, 

dominant parts of the hydrological scientific community advocated that model validation was not 

possible (Konikow and Bredehoeft, 1992; Beven, 1996a). This left the practising world in a vacuum 

without scientifically based methodologies to test and document the degree of credibility of particular 

model predictions. The methodologies described in [6] and [7] should be seen as pragmatic approaches 

to help fill this vacuum, and the discussions in [12] should be seen as an attempt to provide a scientific 

basis for adopting rigorous model validation schemes as part of good modelling practice. 

The principles and schemes proposed by Klemes have been extensively used in the last 12 of the publications 

([4] – [15]). Thus, the intercomparison study in [6] was based on a rigorous use of all four types 

of tests. Furthermore, [7] ‘translated’ Klemes’ principles that were developed with lumped conceptual 

models in mind to use in distributed modelling. After demonstrating that a distributed model that was 

validated for simulating catchment response often performs much more poorly for internal sites, [7] emphasised 

that a model should only be assumed valid with respect to the outputs that have been directly 

validated. This implies e.g. that multi-site validation is needed if predictions of spatial patterns are required. 

Similarly, a model which is validated against catchment runoff cannot automatically be assumed 

valid also for simulation of erosion on a hillslope within the catchment, because smaller scale processes 

may dominate here; it will need specific validation against hillslope soil erosion data. In addition, 

systematic split-sample tests were made in [4], [5] and [9], and proxy-basin tests were conducted in [10]. 

Finally, the validation requirements are emphasised in the publications related to quality assurance [12] 

and [13]. 

[6] and [7] were not the first studies to use Klemes’ principles for validation. For example Quinn and 

Beven (1993) used split-sample tests, proxy-basin tests and differential split-sample tests (wet/dry periods) 

to analyse TOPMODEL’s predictive capabilities for the Plynlimon catchment in Wales. The key 

contribution of [7] and [12] in this respect was the integration of Klemes’ principles as core elements of 

a protocol for good modelling practice. 

The principles outlined in [7] and consolidated in [12], namely that a model should never be considered universally 

validated but can only be conditionally validated, restricted by the availability of data and the specific 

validation tests performed, are well in line with Lane and Richards (2001), who argue that “evidence 

of a successful prediction in observed spaces and times (conventional validation) cannot provide a sufficient 

basis for use of a model beyond the set of situations for which the model has been empirically 

tested”. The principles are also in accordance with the new coherent philosophy for modelling of the 

environment proposed by Beven (2002b) where he argues that it is required to be able to “define those 

areas of the model space where behavioural models occur”. 




4.3 Uncertainty Assessment 

This section presents a broad framework originating from Refsgaard et al. (2005) and [14] followed by a 

discussion on data uncertainty (including [14]), parameter uncertainty (including [11]) and model structure 

uncertainty (including [15]) and how they affect model output uncertainty. 

4.3.1 Modelling uncertainty in a water resources management context 

Definitions and Taxonomy 

Uncertainty and associated terms such as error, risk and ignorance are defined and interpreted differently 

by different authors (see Walker et al. (2003) for a review). The different definitions reflect, among 

other factors, the different scientific disciplines and philosophies of the authors involved, as well as the 

intended audience. In addition they vary depending on their purpose. Here I will use the terminology 

used in Refsgaard et al. (2005) and [14] that has emerged after discussions between social scientists 

and natural scientists specifically aiming at applications in model based water management (Klauer and 

Brown, 2003). It is based on a subjective interpretation of uncertainty in which the degree of confidence 

that a decision maker has about possible outcomes and/or probabilities of these outcomes is the central 

focus. Thus, according to this definition a person is uncertain if s/he lacks confidence about the specific 

outcomes of an event. Reasons for this lack of confidence might include a judgement that the information 

is incomplete, blurred, inaccurate, imprecise or potentially false. Similarly, a person is certain if s/he 

is confident about the outcome of an event. It is possible that a person feels certain but has misjudged 

the situation (i.e. s/he is wrong). 

There are many different (decision) situations, with different possibilities for characterising what we

know or do not know and what we are certain or uncertain about. A first distinction is between ignorance as

a lack of awareness about imperfect knowledge and uncertainty as a state of confidence about knowledge 

(which includes the act of ignoring). Our state of confidence may range from being certain to admitting 

that we know nothing (of use), and uncertainty may be expressed at a number of levels in between. 

Regardless of our confidence in what we know, ignorance implies that we can still be wrong (‘in 

error’). In this respect Brown (2004) has defined a taxonomy of imperfect knowledge illustrated in Fig. 

33. 




[Fig. 33: diagram. Ignorance (unaware of imperfect knowledge) is set against a spectrum of confidence (a state of awareness). The spectrum runs from certainty, through ‘bounded’ uncertainty (all possible outcomes known, with all, some or no probabilities known) and ‘unbounded’ uncertainty (only some possible outcomes known, or no possible outcomes known: ‘do not know’), to indeterminacy (‘cannot know’).]

Fig. 33 Taxonomy of imperfect knowledge resulting in different uncertainty situations (Brown, 2004) 

In evaluating uncertainty, it is useful to distinguish between uncertainty that can be quantified e.g. by 

probabilities and uncertainty that can only be qualitatively described e.g. by scenarios. If one throws a 

balanced die, the precise outcome is uncertain, but the ‘attractor’ of a perfect die is certain: we know 

precisely the probability for each of the 6 outcomes, each being 1/6. This is what we mean by ‘uncertainty

in terms of probability’. However, the estimates for the probability of each outcome can also be 

uncertain. If a model study says: “there is a 30% probability that this area will flood two times in the next 

year”, there is not only ‘uncertainty in terms of probability’ but also uncertainty regarding whether the 

estimate of 30% is a reliable estimate. 
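The second kind of uncertainty, in the probability estimate itself, can be illustrated with a minimal Python sketch. It assumes a hypothetical situation in which the probability of throwing a six is not known in advance but has to be estimated from a limited number of throws; the function name, the seed and the use of the binomial standard error are illustrative choices and are not part of the cited framework.

```python
import math
import random


def estimate_probability(n_throws, face=6, seed=1):
    """Estimate P(face) from n simulated throws of a fair die.

    Returns the estimate and its approximate (binomial) standard error.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_throws) if rng.randint(1, 6) == face)
    p_hat = hits / n_throws
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n_throws)
    return p_hat, std_err


# The outcome of a single throw stays uncertain however many throws we make,
# but the estimate of the probability becomes more reliable with more data.
for n in (60, 6000):
    p_hat, std_err = estimate_probability(n)
    print(f"n={n}: estimate={p_hat:.3f} +/- {std_err:.3f} (true value 1/6)")
```

With few throws the estimated probability is itself unreliable, which corresponds to the doubt about whether a stated 30% flood probability is a reliable estimate.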

Secondly, it is useful to distinguish between bounded uncertainty, where all possible outcomes have 

been identified and unbounded uncertainty, where the known outcomes are considered incomplete. 

Since quantitative probabilities require ‘all possible outcomes’ of an uncertain event and each of their 

individual probabilities to be known, they can only be defined for ‘bounded uncertainties’. If probabilities 

cannot be quantified in any undisputed way, we can often still qualify the available body of evidence for

the possibility of various outcomes. 

The bounded uncertainty where all probabilities are deemed known (Fig. 33) is often denoted ‘statistical 

uncertainty’ (e.g. Walker et al., 2003). This is the case traditionally addressed in model based uncertainty 

assessment. It is important to note that this case constitutes one of many decision situations outlined 

in Fig. 33, and in other situations the main uncertainty in a decision situation cannot be characterised 

statistically. 
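The distinctions made above can be summarised in a small Python sketch. It is only an illustration of the wording in this subsection: the function name, the argument values and the returned labels are assumptions introduced here, not terminology prescribed by Brown (2004) or Walker et al. (2003).

```python
def classify_knowledge(outcomes_known, probabilities_known):
    """Label a decision situation following the distinctions in the text.

    Both arguments must be one of "all", "some" or "none".
    """
    levels = {"all", "some", "none"}
    if outcomes_known not in levels or probabilities_known not in levels:
        raise ValueError("arguments must be 'all', 'some' or 'none'")
    if outcomes_known == "all":
        if probabilities_known == "all":
            # The case traditionally addressed in model based assessments.
            return "bounded uncertainty (statistical uncertainty)"
        return "bounded uncertainty (not fully quantifiable)"
    if outcomes_known == "some":
        return "unbounded uncertainty"
    return "unbounded uncertainty ('do not know')"


print(classify_knowledge("all", "all"))
print(classify_knowledge("some", "none"))
```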




Sources of uncertainty 

Walker et al. (2003) describe uncertainty as manifesting itself at different locations in the model

based water management process. These locations, or sources, may be characterised as follows: 

• Context, i.e. at the boundaries of the system to be modelled. The model context is typically determined 

at the initial stage of the study where the problem is identified and the focus of the model 

study selected as a confined part of the overall problem. This includes, for example, the external 

economic, environmental, political, social and technological circumstances that form the context of 

the problem. 

• Input uncertainty in terms of external driving forces (within or outside the control of the water manager) 

and system data that drive the model such as land use maps, pollution sources and climate 

data. 

• Model structure uncertainty is the conceptual uncertainty due to incomplete understanding and simplified 

descriptions of processes as compared to nature. 

• Parameter uncertainty, i.e. the uncertainties related to parameter values. 

• Model technical uncertainty is the uncertainty arising from computer implementation of the model, 

e.g. due to numerical approximations and bugs in the software. 

• Model output uncertainty, i.e. the total uncertainty on the model simulations taking all the above

sources into account, e.g. by uncertainty propagation. 

Nature of uncertainty 

Many authors (e.g. Walker et al., 2003) categorise the nature of uncertainty into: 

• Epistemic uncertainty, i.e. the uncertainty due to imperfect knowledge. 

• Stochastic uncertainty, i.e. uncertainty due to inherent variability, e.g. climate variability. 

Epistemic uncertainty is reducible by more studies, e.g. research or data collection. Stochastic uncertainty

is non-reducible. 

Often the uncertainty on a certain event includes both epistemic and stochastic uncertainty. An example 

is the uncertainty of the 100 year flood at a given site. This flood event can be estimated, e.g. by use of

standard flood frequency analysis on the basis of existing flow data. The (epistemic) uncertainty may be

reduced by improving the data analysis, by additional monitoring (longer time series) or by

deepening our understanding of how the modelled system works. However, no matter how much we

improve our knowledge, there will always be some (stochastic) uncertainty inherent to the natural system, 

related to the stochastic and chaotic nature of several natural phenomena, such as weather. Perfect 

knowledge on these phenomena cannot give us a deterministic prediction, but would have the form 

of a perfect characterisation of the natural variability; for example, a probability density function for rainfall 

in a month of the year. 
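The 100 year flood example can be made concrete with a short Python sketch. It assumes, purely for illustration, that annual maximum flows follow a Gumbel distribution with a hypothetical location of 100 and scale of 30 (arbitrary flow units) and that the distribution is fitted by the method of moments; none of these choices is taken from the source. The sketch shows that the spread of the estimated 100 year flood (epistemic uncertainty) shrinks as the record gets longer, whereas the year-to-year variability of the floods themselves (stochastic uncertainty) is a property of the generating distribution and does not change.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649


def gumbel_sample(rng, loc, scale):
    """Draw one annual maximum from a Gumbel distribution."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return loc - scale * math.log(-math.log(u))


def q100_estimate(record):
    """Method-of-moments Gumbel fit and the resulting 100 year flood."""
    scale = statistics.stdev(record) * math.sqrt(6.0) / math.pi
    loc = statistics.mean(record) - EULER_GAMMA * scale
    return loc - scale * math.log(-math.log(1.0 - 1.0 / 100.0))


def spread_of_q100(record_length, n_replicates=200, seed=42):
    """Standard deviation of the Q100 estimate over synthetic records."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_replicates):
        record = [gumbel_sample(rng, 100.0, 30.0) for _ in range(record_length)]
        estimates.append(q100_estimate(record))
    return statistics.stdev(estimates)


for years in (20, 200):
    print(years, "years of data: spread of Q100 estimate =",
          round(spread_of_q100(years), 1))
```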




The uncertainty matrix 

The uncertainty matrix in Table 2 can be used as a tool to get an overview of the various sources of 

uncertainty in a modelling study. The matrix is modified after Walker et al. (2003) in such a way that it 

matches Fig. 33 and so that the taxonomy now gives ‘uncertainty type’ in descriptions that indicate in

what terms uncertainty can best be described. The vertical axis identifies the source of uncertainty

while the horizontal axis covers the level and nature of uncertainty. Note that the matrix is in reality

three-dimensional (source, type, nature), because the categories Type and Nature are not mutually

exclusive.

Table 2 The uncertainty matrix (modified after Walker et al., 2003). 

Rows: source of uncertainty
  Context: natural, technological, economic, social, political
  Inputs: system data; driving forces
  Model: model structure; technical; parameters
  Model outputs

Columns: taxonomy (types of uncertainty)
  Statistical uncertainty; scenario uncertainty; qualitative uncertainty; recognised ignorance

Columns: nature
  Epistemic uncertainty; stochastic uncertainty




Methodologies for assessing uncertainty 

A list of the most common methodologies applicable for addressing different types of uncertainty has 

been compiled and briefly described in Refsgaard et al. (2005). Table 3 provides an overview. 

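Table 3 Applicability of different methodologies to address different types and sources of uncertainty (modified after Refsgaard et al., 2005). For each source of uncertainty the entries are listed by type of uncertainty (taxonomy).

Context (natural, technological, economic, social, political)
  Statistical uncertainty: EE
  Scenario uncertainty: EE, SC, SI
  Qualitative uncertainty: EE, EPR, NUSAP, SI, UM
  Recognised ignorance: EE, EPR, NUSAP, SI, UM

Inputs: system data
  Statistical uncertainty: DA, EPE, EE, MCA, SA
  Scenario uncertainty: DA, EE, SC
  Qualitative uncertainty: DA, EE
  Recognised ignorance: DA, EE

Inputs: driving forces
  Statistical uncertainty: DA, EPE, EE, MCA, SA
  Scenario uncertainty: DA, EE, SC
  Qualitative uncertainty: DA, EE, EPR
  Recognised ignorance: DA, EE, EPR

Model: model structure
  Statistical uncertainty: EE, MMS, QA
  Scenario uncertainty: EE, MMS, SC, QA
  Qualitative uncertainty: EE, NUSAP, QA
  Recognised ignorance: EA, NUSAP, QA

Model: technical
  Statistical uncertainty: QA
  Scenario uncertainty: QA
  Qualitative uncertainty: QA
  Recognised ignorance: QA

Model: parameters
  Statistical uncertainty: EE, IN-PA, SA
  Scenario uncertainty: EE, IN-PA, SA
  Qualitative uncertainty: EE
  Recognised ignorance: EE

Model outputs
  Statistical uncertainty: EPE, EE, IN-UN, MCA, MMS, SA
  Scenario uncertainty: EE, IN-UN, MMS, SA
  Qualitative uncertainty: EE, NUSAP
  Recognised ignorance: EE, NUSAP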

Abbreviations of methodologies: 

DA Data Uncertainty 

EPE Error Propagation Equations 

EE Expert Elicitation 

EPR Extended Peer Review (review by stakeholders) 

IN-PA Inverse modelling (parameter estimation) 

IN-UN Inverse modelling (predictive uncertainty) 

MCA Monte Carlo Analysis 

MMS Multiple Model Simulation 

NUSAP NUSAP 

QA Quality Assurance 

SC Scenario Analysis 

SA Sensitivity Analysis 

SI Stakeholder Involvement 

UM Uncertainty Matrix 
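For practical use, Table 3 can be treated as a simple look-up structure. The Python sketch below is one hypothetical way of doing so; the dictionary keys, the function name and the layout are assumptions made here, and the cell contents are transcribed from Table 3 as far as they can be read (including the entry ‘EA’, which is reproduced as printed and is not explained in the list of abbreviations).

```python
# Column order of Table 3 (types of uncertainty).
TYPES = ("statistical", "scenario", "qualitative", "recognised ignorance")

# Rows of Table 3: source of uncertainty -> one cell per uncertainty type.
TABLE_3 = {
    "context": ("EE", "EE, SC, SI",
                "EE, EPR, NUSAP, SI, UM", "EE, EPR, NUSAP, SI, UM"),
    "inputs: system data": ("DA, EPE, EE, MCA, SA", "DA, EE, SC",
                            "DA, EE", "DA, EE"),
    "inputs: driving forces": ("DA, EPE, EE, MCA, SA", "DA, EE, SC",
                               "DA, EE, EPR", "DA, EE, EPR"),
    # "EA" is kept exactly as printed in the table.
    "model structure": ("EE, MMS, QA", "EE, MMS, SC, QA",
                        "EE, NUSAP, QA", "EA, NUSAP, QA"),
    "model technical": ("QA", "QA", "QA", "QA"),
    "parameters": ("EE, IN-PA, SA", "EE, IN-PA, SA", "EE", "EE"),
    "model outputs": ("EPE, EE, IN-UN, MCA, MMS, SA", "EE, IN-UN, MMS, SA",
                      "EE, NUSAP", "EE, NUSAP"),
}


def methods_for(source, uncertainty_type):
    """Return the methodology abbreviations listed for one table cell."""
    cell = TABLE_3[source][TYPES.index(uncertainty_type)]
    return [abbreviation.strip() for abbreviation in cell.split(",")]


print(methods_for("parameters", "statistical"))
print(methods_for("context", "scenario"))
```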




4.3.2 Data uncertainty 

Data uncertainty is a major contributor to the uncertainty of model outputs. It is

also an uncertainty source that is very visible for people outside the modelling community. One of the 

scientific contributions of the HarmoniRiB project ([14]) is to address data uncertainty. This has been 

done in three steps: 

• A methodology has been developed for characterising uncertainty in different types of data (Brown 

et al., 2005). 

• A software tool (Data Uncertainty Engine – DUE) for supporting the assessment of data uncertainty 

has been developed (Brown and Heuvelink, 2005). 

• Reviews with results on data uncertainty reported in the literature have been compiled into a guideline 

report for assessing uncertainty in various types of data originating from meteorology, soil physics 

and geochemistry, hydrogeology, land cover, topography, discharge, surface water quality, 

ecology and socio-economics (Van Loon and Refsgaard, 2005). 

The categorisation of data types distinguishes 13 categories (Table 4) for each of which a conceptual 

data uncertainty model is developed. By considering measurement scale, it becomes possible to 

quickly limit the relevant uncertainty models for a certain variable. On a discrete measurement scale, for 

example, it is only relevant to consider discrete probability distribution functions, whereas continuous 

density functions are required for continuous numerical data. In addition, the use of space and time 

variability determines the need for autocorrelation functions alongside a probability density function 

(pdf). Each data category is associated with a range of uncertainty models, for which more specific pdfs 

may be developed with different simplifying assumptions (e.g. Gaussian; second-order stationarity; degree 

of temporal and spatial autocorrelation). 
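The way in which measurement scale and space-time variability narrow down the relevant uncertainty model can be sketched in a few lines of Python. The category codes follow Table 4; the function names and the returned descriptions are assumptions made for illustration only. In particular, the treatment of categorical data as requiring a discrete distribution and of narrative data as having no probability model is an assumption of this sketch, not a statement from Brown et al. (2005).

```python
SCALE_DIGIT = {
    "continuous numerical": "1",
    "discrete numerical": "2",
    "categorical": "3",
}


def data_category(measurement_scale, varies_in_time, varies_in_space):
    """Return the category code of Table 4 (A1 ... D3, or 4 for narrative)."""
    if measurement_scale == "narrative":
        return "4"
    if varies_in_time and varies_in_space:
        letter = "D"
    elif varies_in_space:
        letter = "C"
    elif varies_in_time:
        letter = "B"
    else:
        letter = "A"
    return letter + SCALE_DIGIT[measurement_scale]


def uncertainty_model_outline(measurement_scale, varies_in_time, varies_in_space):
    """Outline which ingredients an uncertainty model would need."""
    if measurement_scale == "continuous numerical":
        distribution = "continuous probability density function"
    elif measurement_scale == "narrative":
        distribution = "no probability model (assumed here)"
    else:
        # Stated in the text for discrete data; assumed here for categorical.
        distribution = "discrete probability distribution"
    autocorrelation = []
    if measurement_scale != "narrative":
        if varies_in_time:
            autocorrelation.append("temporal")
        if varies_in_space:
            autocorrelation.append("spatial")
    return {
        "category": data_category(measurement_scale, varies_in_time, varies_in_space),
        "distribution": distribution,
        "autocorrelation": autocorrelation,
    }


# Example: a continuous variable that varies in both time and space.
print(uncertainty_model_outline("continuous numerical", True, True))
```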

Table 4 The subdivision of uncertainty categories, along the ‘axes’ of space-time variability and measurement scale (Brown et al., 2005). The narrative category (4) is a single category common to all rows.

                                Measurement scale
Space-time variability          Continuous   Discrete    Categorical   Narrative
                                numerical    numerical
Constant in space and time      A1           A2          A3            4
Varies in time, not in space    B1           B2          B3            4
Varies in space, not in time    C1           C2          C3            4
Varies in time and space        D1           D2          D3            4

4.3.3 Parameter uncertainty 

In addition to data uncertainty, uncertainty of parameter values is the most commonly considered 

source of uncertainty in hydrological modelling. The scientifically soundest way of assessing parameter 

uncertainty is through inverse modelling (Duan et al., 1994; Hill, 1998; Doherty, 2003). These techniques

have the benefit that they, in addition to optimal parameter values, also produce calibration statistics

in terms of parameter- and observation sensitivities, parameter correlation and parameter uncertainties. 

Once parameter uncertainties have been assessed, they can be propagated through the model to infer the

model output uncertainty. A serious constraint in this respect is the interdependence between model 

parameters and model structure as discussed under model structure uncertainty below. 
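The kind of calibration statistics produced by inverse modelling can be illustrated with a deliberately simple Python sketch. It assumes a linear two-parameter model (intercept and slope) as a stand-in for a hydrological model and uses ordinary least squares; the data values and the function name are hypothetical, and the inverse modelling tools cited above handle nonlinear models with far more general methods.

```python
import math


def fit_line_with_uncertainty(x, y):
    """Least-squares fit of y = intercept + slope * x with parameter statistics."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    det = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / det
    intercept = (sy - slope * sx) / n
    # Residual variance with n - 2 degrees of freedom.
    residuals = [v - (intercept + slope * u) for u, v in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)
    return {
        "intercept": intercept,
        "slope": slope,
        # Parameter standard deviations from the diagonal of s2 * (X'X)^-1.
        "sd_intercept": math.sqrt(s2 * sxx / det),
        "sd_slope": math.sqrt(s2 * n / det),
        # Parameter correlation from the off-diagonal term of (X'X)^-1.
        "correlation": -sx / math.sqrt(n * sxx),
    }


# Hypothetical observations.
result = fit_line_with_uncertainty([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

A strong correlation between the two parameter estimates means that an error in one can be compensated by the other, which is the type of information an optimal parameter set alone does not reveal.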

[11] describe an example of how (input) data uncertainty and parameter uncertainty are propagated 

through a model to assess uncertainty in model simulation of nitrate concentrations in groundwater. The

uncertainties in data and parameter values were assessed by expert judgement, and a Monte Carlo technique

with Latin hypercube sampling was used for the uncertainty propagation. The simulated uncertainty 

band around the deterministic model simulation in Fig. 25 is shown in Fig. 34 based on 25 Monte 

Carlo realisations. The uncertainty is seen to be considerable, e.g. with the estimate of the areal fraction 

of the aquifer having concentrations less than 50 mg NO3/l ranging between 30% and 80%. 

[Fig. 34: plot of cumulative frequency (0 to 1) against concentration in mg/l (0 to 180), ultimo 1993.]

Fig. 34 Measured (•) and simulated (×) areal distribution of NO3 concentrations in groundwater at a point in time. Measured values are based on 35 groundwater observations. [11]. 

As noted in [11], a fundamental limitation of the approach adopted there is that the errors due to incorrect

model structure are neglected. As also discussed below, one approach to assess such model structure

error is through comparison of predicted and observed values. In the present case (Figs 25 and 34) 

the deviation between observed and simulated values is so small that this term may be neglected. This 

is, however, by no means a proof of a correct model structure. It only shows that the particular model 

performs without apparent model errors for this particular application. 
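The propagation step used in this type of study, Monte Carlo simulation with Latin hypercube sampling, can be sketched as follows in Python. The sketch is not the model or the set-up used in [11]: the two uncertain inputs, their uniform ranges and the one-line ‘model’ are hypothetical stand-ins, and only the number of realisations (25) is taken from the text.

```python
import random


def latin_hypercube(n_samples, n_params, rng):
    """Return n_samples points in the unit hypercube.

    Along each axis, every one of the n_samples equal-width strata
    contains exactly one point.
    """
    columns = []
    for _ in range(n_params):
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(column)  # random pairing of strata across parameters
        columns.append(column)
    return list(zip(*columns))


def toy_nitrate_model(leaching_kg_per_ha, recharge_mm):
    """Hypothetical stand-in model: concentration in mg/l.

    1 kg/ha dissolved in 1 mm of water corresponds to 100 mg/l.
    """
    return 100.0 * leaching_kg_per_ha / recharge_mm


def propagate(n_realisations=25, seed=7):
    """Propagate assumed input uncertainty through the toy model."""
    rng = random.Random(seed)
    leaching_range = (50.0, 250.0)   # kg NO3/ha/year, assumed
    recharge_range = (200.0, 400.0)  # mm/year, assumed
    results = []
    for u1, u2 in latin_hypercube(n_realisations, 2, rng):
        leaching = leaching_range[0] + u1 * (leaching_range[1] - leaching_range[0])
        recharge = recharge_range[0] + u2 * (recharge_range[1] - recharge_range[0])
        results.append(toy_nitrate_model(leaching, recharge))
    return sorted(results)


concentrations = propagate()
print("min / median / max (mg/l):",
      round(concentrations[0], 1),
      round(concentrations[len(concentrations) // 2], 1),
      round(concentrations[-1], 1))
```

As in the study itself, such a propagation only reflects the input and parameter uncertainty that was specified; errors caused by an incorrect model structure are not captured.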




4.3.4 Model structure uncertainty 

Existing approaches and new framework 

Any model is an abstraction, simplification and interpretation of reality. The incompleteness of a model 

structure and the mismatch between the real causal structure of a system and the assumed causal 

structure as represented in a model will therefore always result in uncertainty about model predictions. 

The importance of the model structure for predictions is well recognised, even for situations where predictions 

are made on output variables, such as discharge, for which field data are available (Franchini 

and Pacciani, 1992; Butts et al., 2004). The considerable challenge faced in many applications of environmental 

models is that predictions are required beyond the range of available observations, either in 

time or in space, e.g. to make extrapolations towards unobservable futures (Babendreier, 2003) or to 

make predictions for natural systems, such as ecosystems, that are likely to undergo structural changes 

(Beck, 2005). In such cases, uncertainty in model structure is recognised by many authors to be the 

main source of uncertainty in model predictions (Dubus et al., 2003; Neuman and Wierenga, 2003; 

Linkov and Burmistrov, 2003). 

The existing strategies for assessing uncertainty due to incomplete or inadequate model structure may 

be grouped into the categories shown in Fig. 35. The most important distinction is whether data exist

that make it possible to draw direct inferences about the model structure uncertainty. This requires that data are

available for the output variable of predictive interest and for conditions similar to those in the predictive 

situation. In other words it is a distinction between whether the model predictions can be considered as 

interpolations or extrapolations relative to the calibration situation. 

[Fig. 35: classification tree based on the availability of data for a model validation test. (a) Target data exist (interpolation): strategies ‘increase parameter uncertainty’ and ‘estimate structural term’. (b) No direct data (extrapolation), subdivided into intermediate data (differential split-sample case) and no data at all (proxy basin case): strategies ‘multiple conceptual models’, ‘expert elicitation’ and ‘pedigree analysis’.]

Fig. 35 Classification of existing strategies for assessing conceptual model uncertainty [15]. 




The two main categories are thus equivalent to different situations with respect to model validation 

tests. According to Klemes’ classical hierarchical test scheme (Klemes, 1986; see Section 4.2 above), 

the interpolation case corresponds to situations where the traditional split-sample test is suitable, while 

the extrapolation case corresponds to situations where no data exist for the concerned output variable 

(proxy-basin test) or where the basin characteristics are considered non-stationary, e.g. for predictions 

of effects of climate change or effects of land use change (differential split-sample test). 
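The difference between the two test designs can be illustrated by how an observation record is divided. The Python sketch below is a simplified reading of the split-sample and differential split-sample tests: the rainfall figures are hypothetical, and calibrating on the drier half of the years with the median as dividing line is an assumption made here for illustration.

```python
def split_sample(years):
    """Split-sample test: calibrate on the first half, validate on the second."""
    half = len(years) // 2
    return years[:half], years[half:]


def differential_split_sample(annual_rainfall):
    """Differential split-sample test on wet/dry periods.

    Calibrate on the drier years and validate on the wetter ones, so that
    validation takes place under conditions that differ from calibration.
    """
    ranked = sorted(annual_rainfall, key=annual_rainfall.get)
    half = len(ranked) // 2
    return sorted(ranked[:half]), sorted(ranked[half:])


# Hypothetical annual rainfall in mm.
rainfall = {1990: 620, 1991: 810, 1992: 540, 1993: 900, 1994: 700, 1995: 480}

print("split-sample:", split_sample(sorted(rainfall)))
print("differential split-sample:", differential_split_sample(rainfall))
```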

The strategies used in ‘interpolation’, i.e. for situations that are similar to the calibration situation with 

respect to variables of interest and conditions of the natural system, have the advantage that they can 

be based directly on field data (e.g. Radwan et al., 2004; van Griensven and Meixner, 2004; and Vrugt 

et al., 2005). A fundamental weakness is that field data are themselves uncertain. Nevertheless, in 

many cases, they can be expected to provide relatively accurate estimates of, at least, the total predictive 

uncertainty for the specific measured variable and for the same conditions as those in the calibration 

and validation situation. A more serious limitation of the strategies depending on observed data is 

that they are only applicable for situations where the output variables of interest are measured. While 

relevant field data are often available for variables such as water levels and water flows, this is usually 

not the case for concentrations, or when predictions are desired for scenarios involving catchment 

change, such as land use change or climate change. Another serious limitation stems from an assumption 

that the underlying system does not undergo structural changes, such as changes in ecosystem 

processes due to climate change. 

The strategy that uses multiple conceptual models benefits from an explicit analysis of the effects of 

alternative model structures, e.g. IPCC (2001), Harrar et al. (2003), Troldborg (2004), Poeter and 

Anderson (2005) and Højberg and Refsgaard (2005). The multiple conceptual model strategy makes it 

possible to include expert knowledge on plausible model structures. This strategy is strongly advocated 

by Neuman and Wierenga (2003) and Poeter and Anderson (2005). They characterise the traditional 

approach of relying on a single conceptual model as one in which plausible conceptual models are rejected 

(in this case by omission). They conclude that the bias and uncertainty that results from reliance 

on an inadequate conceptual model are typically much larger than those introduced through an inadequate 

choice of model parameter values. This view is consistent with Beven (2002b) who outlines a 

new philosophy for modelling of environmental systems. The basic aim of his approach is to extend 

traditional schemes with a more realistic account of uncertainty, rejecting the idea that a single optimal 

model exists for any given case. Instead, environmental models may be non-unique in their accuracy of 

both reproduction of observations and prediction (i.e. unidentifiable or equifinal), and subject to only a 

conditional confirmation, due to e.g. errors in model structure, calibration of parameters and period of 

data used for evaluation. 

A weakness of the multiple modelling strategy is the absence of quantitative information about the extent

to which each model is plausible. Furthermore, it may be difficult to sample from the full range of

plausible conceptual models. In this respect, the expert knowledge on which the formulations of multiple

conceptual models are based is an important and unavoidable subjective element. 

The framework presented in [15] for assessing the predictive uncertainties of environmental models

used for extrapolation combines the use of multiple conceptual models, an assessment of their credibility

by the pedigree approach, and a reflection on the extent to which the sampled

models adequately represent the space of plausible models. 




The role of model calibration 

Some of the existing strategies used in ‘interpolation’ cannot differentiate how the total predictive uncertainty 

originates from model input, model parameter and model structure uncertainty. Other methods 

attempt to do so, but as discussed in [15] this is problematic. In the case of uncalibrated models, the 

parameter uncertainty is very difficult to assess quantitatively, and wrong estimates of model parameter 

uncertainty will influence the estimates of model structure uncertainty. In the case of calibrated models, 

estimates of model parameter uncertainty can often be derived from autocalibration routines. An inadequate 

model structure will, however, be compensated by biased parameter values to optimise the 

model fit with field data during calibration. Hence, the uncertainty due to model structure will be underestimated 

in this case. 

The importance of model calibration can be illustrated by the example described in Højberg and 

Refsgaard (2005). They use three different conceptual models, based on three alternative geological 

interpretations, for a multi-aquifer system in Denmark. Each of the models was calibrated against piezometric 

head data using an inverse technique. The three models provided equally good and very similar

predictions of groundwater heads, including well field capture zones. However, when using the models 

to extrapolate beyond the calibration data to predictions of flow pathways and travel times the three 

models differed dramatically. When assessing the uncertainty contributed by the model parameter values, 

the overlap of uncertainty ranges between the three models significantly decreased when moving 

from groundwater heads to capture zones and travel times. They conclude that the larger the degree of 

extrapolation, the more the underlying conceptual model dominates over the parameter uncertainty and 

the effect of calibration. 
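The same effect can be reproduced with a toy example in Python. Three hypothetical one-parameter model structures are calibrated by least squares against the same data, which cover only a narrow range of conditions; the structures, the data and the function names are all assumptions made for illustration and have no relation to the groundwater models of Højberg and Refsgaard (2005).

```python
import math

# Three alternative (hypothetical) model structures, each of the form
# y = a * shape(x) with a single calibration parameter a.
STRUCTURES = {
    "linear": lambda x: x,
    "square root": lambda x: math.sqrt(x),
    "saturating": lambda x: x / (1.0 + x),
}


def calibrate(shape, xs, ys):
    """Least-squares estimate of the scale parameter a in y = a * shape(x)."""
    numerator = sum(shape(x) * y for x, y in zip(xs, ys))
    denominator = sum(shape(x) ** 2 for x in xs)
    return numerator / denominator


def prediction_spread(x_new, xs, ys):
    """Range of predictions at x_new across the calibrated structures."""
    predictions = []
    for shape in STRUCTURES.values():
        a = calibrate(shape, xs, ys)
        predictions.append(a * shape(x_new))
    return max(predictions) - min(predictions)


# Hypothetical calibration data from a narrow range of conditions.
xs = [0.8, 0.9, 1.0, 1.1, 1.2]
ys = [1.9, 2.0, 2.1, 2.2, 2.3]

print("spread within calibration range (x = 1):",
      round(prediction_spread(1.0, xs, ys), 2))
print("spread under extrapolation (x = 5):",
      round(prediction_spread(5.0, xs, ys), 2))
```

After calibration the three structures are almost indistinguishable within the range of the data, but they diverge strongly once they are used outside it, so the choice of structure dominates the prediction.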

This diminishing effect of calibration as the prediction situation is extrapolated further and further away 

from the calibration base resembles the conclusion on the effects of updating relative to the underlying 

process model, when forecast lead times are increased in real-time forecasting (Fig. 27, Section 3.3). 

Here the effect of updating is reduced and the forecast error therefore increases as the forecast lead 

time (= degree of extrapolation) increases. 


Uncertainty is a key, and crosscutting, issue that I consider a useful platform or catalyst for establishing 

a common understanding in hydrological modelling and water resources management. By this I mean 

both a common understanding within the natural science based modelling issues such as scaling and 

validation and between people from the modelling and the monitoring communities as well as a broader 

dialogue between modellers and stakeholders on issues such as when a model is accurate and credible

enough for its purpose of application; see Subsection 4.4.4 below. 

In the publications on developing the Suså model ([1], [2]) and the oxygen module ([3]) no explicit consideration 

is given to the goodness of the model structure and uncertainty assessment was not an issue 

at all. In the later work on catchment modelling in India ([4], [5]), where the physical realism of the

model was compromised to some extent because of scaling problems, it was noted that the model results might be

‘right for the wrong reasons’, and the limitations of model applicability were emphasised in this respect, 

but no uncertainty assessments were made. In the paper describing a methodology for parameterisation,

calibration and validation of distributed hydrological models ([7]) uncertainty is also neglected. In

the publications [6], [8], [9] and [10] uncertainty is discussed, but as a secondary issue only. 

Although examples of model prediction uncertainty assessments had been reported previously from 

different modelling disciplines (e.g. Refsgaard et al., 1983; Beck, 1987), the first to emphasise the need

to systematically perform uncertainty assessments related to catchment model predictions was probably 

Beven (1989). This was followed by Binley et al. (1991) who used Monte Carlo analysis to assess 

the predictive uncertainty for the Institute of Hydrology Distributed Model and by the introduction of the 

Generalised Likelihood Uncertainty Estimation (GLUE) methodology (Beven and Binley, 1992) after 

which uncertainty in catchment modelling was high on the agenda in the scientific community. 

My main scientific contributions on uncertainty are the publications [11], [14] and [15] and the link of 

uncertainty to principles and protocols for good modelling practice in [12] and [13]. Although reported 10

years later than Binley et al. (1991), [11] was one of the first studies with uncertainty propagation 

through a complex, coupled distributed physically based catchment model with a focus on water quality. 

A key contribution of [14] and Refsgaard et al. (2005) is the broad framework for characterising uncertainty. 

This framework provides the link to uncertainty in the quality assurance work ([12], [13]). This 

broad framework is inspired by research in social science (Pahl-Wostl, 2002; van Asselt and Rotmans, 

2002; Dewulf et al., 2005). The main difference between the traditions in social science and natural 

science is that social scientists emphasise participatory processes including consultation and involvement 

of users, also on uncertainty aspects, right from the beginning of a study, while natural scientists 

often talk about users as someone to whom uncertainty results should be communicated, e.g. Pappenberger

and Beven (2006). 

The most difficult uncertainty problem (in natural science) to handle today is model structure uncertainty,

and the most important and novel contribution is probably the effort made in this respect, primarily

the new framework outlined in [15] but also the inclusion of options for evaluating multiple conceptual 

models in the HarmoniQuA modelling protocol ([13] and Fig. 5). The approach suggested in [15] 

of using multiple conceptual models (model structures) is not new (IPCC, 2001; Beven, 2002a; Neuman 

and Wierenga, 2003) and the use of pedigree analysis to qualitatively assess the credibility of something 

is not new either (van der Sluijs et al., 2005). The novelty lies in the combination of the two approaches 

that originate from different disciplines. 




4.4 Quality Assurance in Model based Water Management 

4.4.1 Background 

During the last decade many problems have emerged in river basin modelling projects, including poor 

quality of modelling, unrealistic expectations, and lack of credibility of modelling results. Some of the 

reasons for this lack of quality can be evaluated ([13]; Scholten et al., 2007) as the effect of: 

• Ambiguous terminology and a lack of understanding between key-players (modellers, clients, reviewers, 

stakeholders and concerned members of the public) 

• Bad practice (careless handling of input data, inadequate model set-up, insufficient calibration/validation 

and model use outside of its scope) 

• Lack of data or poor quality of available data 

• Insufficient knowledge on the processes 

• Poor communication between modellers and end-users on the possibilities and limitations of the 

modelling project and overselling of model capabilities 

• Confusion on how to use model results in decision making 

• Lack of documentation and clarity on the modelling process, leading to results that are difficult to 

audit or reproduce 

• Insufficient consideration of economic, institutional and political issues and a lack of integrated 

modelling. 

In the water resources management community many different guidelines on good modelling practice 

have been developed; see [13] for a review. One of the most comprehensive modelling guidelines, if not the

most comprehensive, was developed in The Netherlands (Van Waveren et al., 2000) as a result of a process

involving all the main players in the Dutch water management field. The background for this was a 

perceived need to improve the quality of modelling (Scholten et al., 2000). Similarly, modelling guidelines 

for the Murray-Darling Basin in Australia were developed due to the perception among end-users 

that model capabilities may have been ‘over-sold’, and that there was a lack of consistency in approaches, 

communication and understanding among and between the modellers and the water managers, 

which often resulted in considerable uncertainty for decision making (Middlemis, 2000). 

4.4.2 The HarmoniQuA approach 

A software tool, MoST, with its associated knowledge base (KB), has been developed by the HarmoniQuA 

project ([13]; Scholten et al., 2007) to provide QA in modelling through guidance, monitoring 

and reporting. As defined in HarmoniQuA: “Quality Assurance (QA) is the procedural and operational 

framework used by an organisation managing the modelling study to build consensus among the organisations 

concerned in its implementation, to assure technically and scientifically adequate execution 

of all tasks included in the study, and to assure that all modelling-based analysis is reproducible and 

justifiable”. This modification of the older NRC (1990) definition includes not only the organisational, technical

and scientific aspects, but also the need to build consensus among the organisations concerned, in accordance

with the discussion in Section 2.1 above. 

Guidelines for good modelling practice are included in the Knowledge Base (KB) of MoST. The modelling

process has been decomposed into five steps, see the flowchart in Fig. 5. Each step includes several 

tasks. Each task has an internal structure, i.e. name, definition, explanation, interrelations with other

tasks, activities, activity related methods, references, sensitivity/pitfalls, task inputs and outputs. 

The KB contains knowledge specific to seven domains (groundwater, precipitation-runoff, river hydrodynamics, 

flood forecasting, water quality, ecology and socio-economics), and forms the heart of the 

tool. A computer based journal is produced within MoST where the water manager and modelling team 

record the progress and decisions made during a model study according to the tasks in the flowchart. 

This record can be used when reviewing the model study to judge its quality. 
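
The task structure and journal described above can be pictured as simple record types. The following is a minimal sketch in Python and not the actual MoST data model: the class names, field names and the example task are assumptions chosen only to mirror the attributes listed in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """One task within a modelling step (hypothetical layout, not MoST's schema)."""
    name: str
    definition: str
    explanation: str = ""
    related_tasks: List[str] = field(default_factory=list)
    activities: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    pitfalls: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


@dataclass
class JournalEntry:
    """A decision recorded while carrying out a task."""
    task_name: str
    author_role: str  # e.g. "modeller" or "water manager"
    decision: str


@dataclass
class Journal:
    """Running record of a model study, organised by task name."""
    entries: List[JournalEntry] = field(default_factory=list)

    def record(self, task: Task, author_role: str, decision: str) -> None:
        self.entries.append(JournalEntry(task.name, author_role, decision))

    def for_task(self, task_name: str) -> List[JournalEntry]:
        # A reviewer can pull out everything that was decided for one task.
        return [e for e in self.entries if e.task_name == task_name]


# Usage with an invented task
calibration = Task(name="Calibration", definition="Adjust parameter values")
journal = Journal()
journal.record(calibration, "modeller", "Reserve the last three years for validation")
```

Keeping the journal keyed by task name is one simple way of letting a reviewer follow the flowchart task by task; the real tool may organise its records differently.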

The most important QA principles incorporated in the KB are: 

• The five modelling steps conclude with a formal dialogue between the modeller and manager, 

where activities and results from the present step are reported, and details of plans for the next step 

(a revised work plan) are discussed. 

• External reviews are prescribed as the key mechanism of ensuring that the knowledge and experience 

of other independent modellers are used. 

• The KB provides public interactive guidelines to facilitate dialogue between modellers and the water 

manager, with options to include auditors (reviewers), stakeholders and the public. 

• There are many feedback loops, some technical involving only the modeller, and others that may 

require a decision before doing costly additional work. 

• The KB allows performance and accuracy criteria to be updated during the modelling process. In 

the first step the water manager’s objectives and requirements are translated into performance criteria 

that may include qualitative and quantitative measures. These criteria may be modified during 

the formal reviews of subsequent steps. 

• Emphasis is put on validation schemes, i.e. tests of model performance against data that have not 

been used for model calibration. 

• Uncertainties must be explicitly recognised and assessed (qualitatively and/or quantitatively) 

throughout the modelling process. 
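
The validation principle in the list above (testing against data that have not been used for calibration) is often implemented as a split-sample test. The sketch below is illustrative only: the choice of the Nash-Sutcliffe efficiency as performance measure, the function names, the criterion value and the toy data are all assumptions and are not prescribed by the HarmoniQuA guidelines.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - sse / variance


def split_sample_test(observed, simulated, split_index, criterion):
    """Score the validation period only, i.e. the data from split_index onwards.

    The calibration period (before split_index) is deliberately excluded.
    """
    score = nash_sutcliffe(observed[split_index:], simulated[split_index:])
    return score, score >= criterion


# Toy series: the first three values are treated as the calibration period.
obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sim = [1.1, 1.9, 3.2, 4.0, 5.5, 5.5]
score, accepted = split_sample_test(obs, sim, split_index=3, criterion=0.7)
```

In line with the principles above, the criterion would be agreed between modeller and water manager in the first modelling step rather than chosen by the modeller afterwards.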

MoST supports multi-domain studies and working in teams of different user types (water managers, 

modellers, auditors, stakeholders and members of the public). It contains an interactive glossary that is 

accessible via hyperlinked text. The key functionality of MoST is to: 

• Guide, to ensure a model has been properly applied. This is based on the Knowledge Base. 

• Monitor, to record decisions, methods and data used in the modelling work and in this way enable 

transparency and reproducibility of the modelling process. 

• Report, to provide suitable reports of what has been done for managers/clients, modellers, auditors, 

stakeholders and the general public. 

4.4.3 Organisational requirements for QA guidelines to be effective 

Modelling studies involve several parties with different responsibilities. The key players are modellers 

and water managers, but often reviewers, stakeholders and the general public are also involved. To a 

large extent the quality of the modelling study is determined by the expertise, attitudes and motivation 

of the teams involved in the modelling and QA process. 

QA will only be successful if all parties actively support its use. The attitude of the modellers is important. 

NRC (1990) characterises this as follows: “most modellers enjoy the modelling process but find 

less satisfaction in the process of documentation and quality assurance”. Scholten and Groot (2002) 

describe the main problem with the Dutch Handbook on Good Modelling Practice as “they all like it, but 

only a few use it”. The water manager, however, has a particular responsibility, because he/she has the 

power to request and pay for adequate QA in modelling studies. Therefore, QA guidelines can only be 

expected to be used in practice if the water manager prescribes their use. It is therefore very important 

that the water manager has the technical capacity to organise the QA process. Often, water managers 

do not have individuals available with the appropriate training to understand and use models. An external 

modelling expert should then be sought to help with the QA process. However, this requires that the 

manager is aware of the problem and the need. 

4.4.4 Performance criteria and uncertainty – when is a model good enough 

A critical issue is how to define the performance criteria. We agree with Beven (2002b) that any conceptual 

model is known to be wrong and hence any model will be falsified if we investigate it in sufficient 

detail and specify very high performance criteria. Clearly, if one attempts to establish a model that 

should simulate the truth it would always be falsified. However, this is not very useful information. 

Therefore, we use conditional validation, i.e. validation restricted to the domain of applicability 

(or numerical universal as opposed to strictly universal in Popperian terms). The key question is then: 

what is good enough? Or, in other words, what are the criteria? How do we select them? 

A good reference for model performance is to compare it with uncertainties of the available field observations. 

If the model performance is within this uncertainty range we often characterise the model as 

good enough. However, usually it is not so simple. How wide should the confidence bands on observational 

uncertainties be – ranges corresponding to 65%, 95% or 99%? Do we always then reject a model 

if it cannot perform within the observational uncertainty range? In many cases even results from less 

accurate models may be useful. 
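
The dependence of the verdict on the chosen confidence band can be illustrated with a small calculation. The sketch below is a simplification under stated assumptions: observation errors are taken to be Gaussian with a single known standard deviation, and the function name and the toy groundwater head values are invented for illustration.

```python
from statistics import NormalDist


def fraction_within_band(observed, simulated, obs_std, level=0.95):
    """Fraction of simulated values inside the two-sided observational band.

    Assumes Gaussian observation errors with standard deviation obs_std;
    level is the coverage of the band, e.g. 0.65, 0.95 or 0.99.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided quantile
    half_width = z * obs_std
    inside = sum(1 for o, s in zip(observed, simulated) if abs(s - o) <= half_width)
    return inside / len(observed)


# Toy hydraulic heads (m) with an assumed observation error of 0.2 m
heads_obs = [10.0, 10.5, 11.0, 11.5]
heads_sim = [10.1, 10.9, 11.0, 12.2]
share_95 = fraction_within_band(heads_obs, heads_sim, obs_std=0.2, level=0.95)
share_99 = fraction_within_band(heads_obs, heads_sim, obs_std=0.2, level=0.99)
```

With these invented numbers, half of the simulated values fall inside the 95% band and three quarters inside the 99% band, so the same model can look acceptable or not depending on a choice that is not purely technical.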

Therefore, the decision on what is good enough generally must be taken in a socio-economic context. 

For instance, the accuracy requirements for a model to be used for an initial screening of alternative 

options for location of a new small well field for a small water supply will be much lower than the requirements 

for a model that is intended to be used for the final design of a large well field for a major 

water supply in an area with potentially damaging effects on precious nature and other significant conflicts 

of interest. Thus, the accuracy criteria cannot be decided universally by modellers or researchers, 

but must differ from case to case depending on how much is at stake in the decision that the model 

predictions are to support. This implies that the performance criteria must be discussed 

and agreed between the manager and the modeller beforehand. 

Accuracy requirements and uncertainty assessments of model simulations are two sides of the same 

coin, just seen from two different perspectives, namely those of the water manager and the modeller. As not all uncertainty 

can be characterised as statistical uncertainty (see Fig. 33 and Tables 2 and 3 in Subsection 

4.3.1) it is also required to characterise accuracy requirements in qualitative terms. Furthermore the 

risk perception of the water manager and the stakeholders/public has to be considered. Therefore, involvement 

of stakeholders and the public is most often required as an integral part of this process (see 

also Section 2.1 and Figs. 1-2). According to the HarmoniQuA methodology stakeholder/public involvement 

is crucial at the beginning of a modelling project to frame the problem, define the requirements 

and assess the uncertainties (Henriksen et al., submitted). 

This way of thinking is well in line with the principles behind some of the Water Framework Directive 

Guidance Documents. For example the Guidance Document on Monitoring (EC, 2003a) does not specify 

the levels of precision and confidence required from the monitoring programmes, but rather states 

that the precision and confidence level should be sufficient to enable a meaningful assessment of for 

instance the status of the environment and should be sufficient to achieve an acceptable risk of making 

the wrong decision. This obviously calls for uncertainty assessments and public participation to have a 

central role in the entire process, which paves the way towards making adaptive management an important 

part of the river basin management process (Pahl-Wostl, 2002). 


The ideas and concepts behind the HarmoniQuA guidelines ([12], [13]) summarised above have been 

inspired by previous QA guidelines. The novel contributions have been inspired both by previous 

research activities (including [4], [5], [6], [7], [9], [11]) and by participation in a large range of national 

and international consultancy projects. Without having been at this crossroads between the research 

world and the practical world for more than two decades this would not have been possible. I consider 

my most important contributions in this respect to be: 

• The terminology and guiding principles behind the guidelines [12] are novel in their attempt to formulate 

a coherent approach that on the one hand has a solid scientific philosophical foundation 

and on the other hand can be useful for practitioners. In the very controversial issue of model validation, 

where there has been almost a deadlock between different schools with respect to whether 

validation is possible at all, the philosophy of conditional validation is novel. 

• The major novelty of the HarmoniQuA approach does not lie in its guidance on model technical 

issues, but in its emphasis and more elaborate focus on the dialogue between modeller, water 

manager, reviewer, stakeholders and the public. In addition, there are novel elements in the strong 

emphasis on uncertainty assessments throughout the modelling process and on model validation. Finally, 

the emphasis on model reviews allows bringing in subjective knowledge and experience in the 

QA process. 

Both the HarmoniQuA guidelines and other recent good modelling practice guidelines have been 

deeply rooted both in the scientific community and among practitioners ([13]). As a comparison, ideas 

originating from the natural science community alone, such as the suggested Code of Practice on performing 

uncertainty analysis by Pappenberger and Beven (2006), are typically limited to valuable contributions 

on model technical issues, while they often do not consider the broader aspects of the modelling 

process such as the involvement of water managers and stakeholders. 

5 Conclusions and Perspectives for Future Work 

5.1 Summary of Main Scientific Contributions 

The contributions to scientific knowledge in the papers of the present thesis are discussed in the previous 

chapters. The main contributions have been in the following five areas: 

• New conceptual understanding and code development. The Suså model ([1], [2]) was based on a 

new conceptual understanding of the surface water/groundwater interaction in moraine catchments. 

The code and its application brought new insight regarding the effect of groundwater abstraction on 

streamflow in catchments with such hydrogeological characteristics. 

• Model validation. The adoption and adaptation of rather rigorous principles for model validation and 

the examples of their application both for lumped conceptual and distributed physically based models 

is a cornerstone in my research. This work was first published in [6] and [7] and later brought 

into a broader modelling framework in [12] and [13]. In particular the introduction of the term ‘conditional 

validation’ in [7] and the outline of its scientific philosophical basis in [12] is novel. 

• Scaling. The publications focussing on scaling ([7], [10]) present ideas crystallised from work with 

scaling problems in many modelling studies ranging from point scale to thousands of km². The later 

framework, outlined in Section 4.1 above, does not in any way ‘solve’ the scaling problem but contributes 

to clarifications on applicable methodologies with focus on their respective assumptions and 

limitations. 

• Uncertainty assessment. During the past decade a considerable part of my research work has focussed 

on uncertainty aspects. I consider my main contributions in this respect to be the introduction 

of the broader uncertainty framework integrated into the modelling framework ([13], [14]) and 

the work with model structure uncertainty ([15]). 

• Modelling protocols and guidelines for quality assurance in the modelling process. The modelling 

protocol in [7] and the later and more comprehensive one presented as part of the guidelines for 

quality assurance in the modelling framework in [13] are a formalisation of experience and practices 

that have gradually emerged over the years. The novel elements in [13] are the emphasis on (a) the 

interactive dialogue between modeller, water manager, reviewer, stakeholders and the public; (b) 

uncertainty assessments throughout the modelling process; (c) model validation; and (d) experience 

and subjective knowledge introduced through external model reviews. 

These main contributions to scientific knowledge would, however, not have been possible without the 

experience and insight gained in modelling studies ranging from point scale ([3]) to large catchments 

([4], [5], [8], [9], [11]). 

5.2 Modelling Issues for Future Research 

Hydrological modelling has developed significantly during the three decades I have worked in this field. 

I started with editing punch cards and could only run one simulation per day (overnight) using model 

codes that today are considered small and simple. Since then, comprehensive new knowledge has 

been built into model codes and into the methodologies used in the modelling process. 

During the process of writing this thesis, where I had to review my older publications, it was interesting 

to note the gradual change in research focus. In the first decade my research focused on development of 

new codes. During the second decade more general methodological problem areas such as scaling 

and model validation were addressed. Towards the end of the third decade the emphasis is now on the 

broader issues such as uncertainty assessment and quality assurance frameworks for the entire modelling 

process, and the interaction between the modelling and the water management processes. While 

this no doubt is affected by personal and career developments, it also reflects a general trend. We are 

no longer satisfied with being able to produce beautiful simulations with sophisticated new model 

codes; we also want to evaluate the credibility of such simulations and to apply them in real-world water 

management decisions. 

Certainly I did not foresee this development three decades ago. Against this background it is therefore not 

wise to make long range forecasts on what we can expect as the key issues for future modelling research. 

Hence, the following list does not pretend to cover all the most important research issues for 

modelling during the many years to come. It rather presents a list of issues which I, seen from the perspective 

dealt with in the present thesis, consider the presently most important and fundamental problems 

requiring more research during the coming years. 

• Improved representation of heterogeneity in reactive transport modelling. There will always be a 

need to improve our conceptual understanding of hydrological processes. It appears that, whereas 

we have had some success with prediction of flows and hydraulic heads, the existing paradigms in 

hydrological modelling are not good enough to simulate concentrations of conservative and reactive 

contaminants. Flows and hydraulic heads are much less dependent on heterogeneity than concentrations, 

and it will be necessary to include heterogeneity much more explicitly in the modelling than 

has been done until now. Examples of areas where this is important include simulation of transport and fate 

of contaminants in aquifers and simulation of the stream-aquifer interaction governed by processes 

in river valleys. 

• Utilisation of new data types. Whenever possible we should try to make use of new data types. New 

techniques for collecting satellite data on surface conditions and geophysical data on subsurface 

features are promising and have not been fully exploited yet. We can hope and expect that better 

techniques will be developed during the coming years. Thus, it is not unrealistic in some years to 

have improved data providing both a much better spatial resolution of catchment/aquifer properties 

and on-line information on state variables. The improved spatial resolution can help us give a better 

representation of heterogeneities in models (see above), while on-line information provides interesting 

potentials for improved management. In order to utilise on-line data optimally new and improved 

data assimilation (updating) techniques will be required. 

• Model structure error. Probably the most important single issue related to uncertainty of model predictions 

is how to assess uncertainty caused by model structure error. It is important, because the 

most interesting fields of model applications deal with assessments of the effects on the ecosystem 

of human activities. And it is at the same time fundamentally difficult, because in such situations we 

are using models beyond the conditions under which we can test the model performance against field 

data. I consider the framework based on multiple conceptual models ([15]) to be only a very first 

step in this respect. 

• Uncertainty and credibility of modelling in relation to water resources management. Uncertainty 

assessments of model predictions are crucial for a sound use of models in water resources management 

in practice. Model predictions without uncertainty assessments correspond to only presenting 

a (minor) part of the available information. Uncertainty in relation to water resources management 

in practice is not confined to statistical uncertainty. It is also required to include aspects of 

qualitative uncertainty and ignorance. Furthermore, uncertainty must be seen in a broad socioeconomic 

context where stakeholder and policy views are taken into account. There are many future 

challenges on this multi-disciplinary road. How do we ensure that models incorporate the best 

available information and adequately address the issues and the priorities set by water managers 

and stakeholders? How should we translate objectives and requirements formulated in qualitative 

language by water managers and stakeholders to accuracy criteria for a modelling study? And how 

should we compile and present uncertainties from a modelling study in a way that is understandable 

by non-modellers? Some of these questions are likely to be answered within the context of new water 

management paradigms such as adaptive management. 
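
As a small illustration of the model structure error item in the list above, predictions of the same quantity from several alternative conceptual models can be summarised by their spread. This is only a crude sketch and not the framework of [15]: the function name, the model labels and the numbers are hypothetical, and the spread among models says nothing about errors that all the models have in common.

```python
from statistics import mean


def structural_spread(predictions):
    """Summarise predictions of one quantity from alternative conceptual models.

    predictions maps a (hypothetical) model label to its predicted value.
    """
    if len(predictions) < 2:
        raise ValueError("need at least two conceptual models")
    values = list(predictions.values())
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


# Invented predictions of, say, streamflow depletion (l/s) under a new abstraction
summary = structural_spread({"model_A": 10.0, "model_B": 14.0, "model_C": 12.0})
```

A wide range signals that the conceptualisation matters for the prediction at hand; a narrow range is not in itself evidence that the prediction is right.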

6 References 

Abbott MB (1992) The theory of the hydrological model, or: the struggle for the soul of hydrology. In: O’Kane 

JP (Ed.) Advances in theoretical hydrology, Elsevier, 237-254. 

Abbott MB, Bathurst JC, Cunge JA, O'Connell PE, Rasmussen J (1986a) An introduction to the European 

Hydrological System - Systeme Hydrologique Européen "SHE", 1: History and philosophy of a physically-based 

distributed modelling system. Journal of Hydrology, 87, 45-59. 

Abbott MB, Bathurst JC, Cunge JA, O'Connell PE, Rasmussen J (1986b) An introduction to the European 

Hydrological System - Systeme Hydrologique Européen "SHE", 2: Structure of a physically-based distributed 

modelling system. Journal of Hydrology, 87, 61-77. 

Abrahamsen P, Hansen S (2000) Daisy: an open soil-crop-atmosphere system model. Environmental Modelling 

& Software, 15, 313-330. 

Andersen J, Refsgaard JC, Jensen KH (2001) Distributed hydrological modelling of the Senegal River Basin 

– model construction and validation. Journal of Hydrology, 247, 200-214. 

Anderson MP, Woessner WW (1992) The role of postaudit in model validation. Advances in Water Resources, 

15, 167-173. 

Babendreier JE (2003) National-scale multimedia risk assessment for hazardous waste disposal. International 

Workshop on Uncertainty, Sensitivity and Parameter Estimation for Multimedia Environmental 

Modelling held at U.S Nuclear Regulatory Commission, Rockville, Maryland, August 19-21, 2003. Proceedings, 

103-109. 

Bathurst JC (1986a) Physically-based distributed modelling of an upland catchment using the Systeme Hydrologique 

Européen. Journal of Hydrology, 87, 79-102. 

Bathurst JC (1986b) Sensitivity analysis of the Systeme Hydrologique Européen for an upland catchment. 


Beck MB (1987) Water quality modelling: a review of the analysis of uncertainty. Water Resources Research, 

23(8), 1393-1442. 

Beck MB (2005) Environmental foresight and structural change. Environmental Modelling & Software, 20, 

651-670. 

Bergström S (1976) Development and application of a conceptual runoff model for Scandinavian catchments. 

PhD Thesis, University of Lund, Bulletin Series A No 52. 

Bergström S (1992) The HBV model – its structure and applications. SMHI RH No 4. Norrköping. 

Bergström S (1995) The HBV model. In: Singh VP (Ed) Computer Models of Watershed Hydrology. Water 

Resources Publications, Highlands Ranch, Colorado, 443-476. 

Bergström S, Forsman A (1973) Development of a conceptual deterministic rainfall-runoff model. Nordic 

Hydrology, 4, 147-170. 

Beven K (1989) Changing ideas in hydrology – the case of physically based models. Journal of Hydrology, 

105, 157-172. 

Beven K (1995) Linking parameters across scales: Subgrid parameterization and scale dependent hydrological 

models. Hydrological Processes, 9, 507-525. 

Beven K (1996a) A discussion of distributed hydrological modelling. In: Abbott MB, Refsgaard JC (Eds): 

Distributed Hydrological Modelling, Kluwer Academic Publishers, 255-278. 

Beven K (1996b) Response to comments on ‘A discussion of distributed hydrological modelling’. In: Abbott 

MB, Refsgaard JC (Eds): Distributed Hydrological Modelling, Kluwer Academic Publishers, 289-295. 

Beven K (2001) How far can we go in distributed hydrological modelling? Hydrology and Earth System Sciences, 

5(1), 1-12. 

Beven K (2002a) Towards an alternative blueprint for a physically based digitally simulated hydrologic response 

modelling system. Hydrological Processes, 16(2), 189-206. 

Beven K (2002b) Towards a coherent philosophy for modelling the environment. Proceedings of the Royal 

Society of London, A, 458 (2026), 2465-2484. 

Beven K, Binley AM (1992) The future of distributed models: model calibration and uncertainty prediction. 

Hydrological Processes, 6, 279-298. 

Binley AM, Beven KJ, Calver A, Watts LG (1981) Changing Responses in Hydrology: Assessing the Uncertainty 

in Physically Based Model Predictions. Water Resources Research, 27(6), 1253-1261. 

Birkinshaw SJ, Ewen J (2000) Nitrogen transformation component for SHETRAN catchment nitrate transport 

modelling. Journal of Hydrology, 230, 1-17. 

Blöschl G, Sivapalan M (1995) Scale issues in hydrological modelling: A review. Hydrological Processes, 9, 

251-290. 

Brown JD (2004) Knowledge, uncertainty and physical geography: towards the development of methodologies 

for questioning belief. Transactions of the Institute of British Geographers 29(3), 367-381. 

Brown JD, Heuvelink GBM, Refsgaard JC (2005) An integrated framework for assessing and recording uncertainties 

about environmental data. Water Science and Technology, 52(6), 153-160. 

Brown JD, Heuvelink GBM (2005) Data Uncertainty Engine (DUE) User’s Manual. University of Amsterdam. 

http://www.harmonirib.com. 

Butts MB, Payne JT, Kristensen M, Madsen H (2004) An evaluation of the impact of model structure on hydrological 

modelling uncertainty for streamflow prediction. Journal of Hydrology, 298, 242-266. 

Burnash RJC (1995) The NWS river forecast system - catchment modelling. In: Singh VP (Ed): Computer 

Models of Watershed Hydrology, Water Resources Publications, 311-366. 

Christensen S (1994) Hydrological Model for the Tude Å Catchment. Nordic Hydrology, 25, 145-166. 

Conan C, Bouraoui F, Turpin N, de Marsily G, Bidoglio G (2003) Modelling Flow and Nitrate Fate at Catchment 

Scale in Brittany (France). Journal of Environmental Quality, 32, 2026-2032. 

Crawford NH, Linsley RK (1966) Digital simulation in hydrology, Stanford Watershed Model IV, Department 

of Civil Engineering, Stanford University, Technical Report 39. 

Currie JA (1961) Gaseous diffusion in the aeration of aggregated soils. Soil Science, 92, 40-45. 

Dagan G (1986) Statistical theory of groundwater flow and transport: pore to laboratory, laboratory to formation 

and formation to regional scale. Water Resources Research, 22(9), 120-134. 

De Marsily G, Combes P, Goblet P (1992) Comments on 'Ground-water models cannot be validated', by 

Konikow LF, Bredehoeft JD. Advances in Water Resources, 15, 367-369. 

Dewulf A, Craps M, Bouwen R, Pahl-Wostl C (2005) Integrated management of natural resources dealing 

with ambiguous issues, multiple actors and diverging frames. Water Science and Technology, 52(6), 

115-124. 

DHI (1995) MIKE 21 Short Description. Danish Hydraulic Institute, Hørsholm, Denmark. 

Djurhuus J, Hansen S, Schelde K, Jacobsen OH (1999) Modelling mean nitrate leaching from spatially variable 

fields using effective parameters. Geoderma, 87, 261-279. 

Doherty J (2003) Ground water model calibration using pilot points and regularization. Ground Water, 41(2), 

170-177. 

Duan Q, Sorooshian S, Gupta VK (1994) Optimal use of the SCE-UA global optimization method for calibrating 

watershed models. Journal of Hydrology 158, 265–284. 

Dubus IG, Brown CD, Beulke S (2003) Sources of uncertainty in pesticide fate modelling. The Science of 

the Total Environment, 317, 53-72. 

EC (1992) Working Group of Independent Experts on Variant C of the Gabcikovo-Nagymaros Project, Working 

Group Report, Commission of the European Communities, Czech and Slovak Federative Republic, 

Republic of Hungary, Budapest November 23, 1992. 

EC (1993a) Working Group of Monitoring and Water Management Experts for the Gabcikovo System of 

Locks - Data Report, Commission of the European Communities, Republic of Hungary, Slovak Republic, 

Budapest November 2, 1993. 

EC (1993b) Working Group of Monitoring and Water Management Experts for the Gabcikovo System of 

Locks - Report on Temporary Water Management Regime, Commission of the European Communities, 

Republic of Hungary, Slovak Republic, Bratislava, December 1, 1993. 

EC (2003a) Common Implementation Strategy for the Water Framework Directive (2000/60/EC). Guidance 

Document No. 7. Monitoring under the Water Framework Directive. Working Group 2.7. Office for the 

Official Publications of the European Communities, Luxembourg. 

EC (2003b) Common Implementation Strategy for the Water Framework Directive (2000/60/EC). Guidance 

Document No. 11. Planning Processes. Working Group 2.9. Office for the Official Publications of the 

European Communities, Luxembourg. 

EC (2004) Common Implementation Strategy for the Water Framework Directive (2000/60/EC) Guidance 

Document No 3, pressures and impacts, IMPRESS. Working Group 2.3. Office for the Official Publications 

of the European Communities, Luxembourg. 

Fleming G (1975) Computer simulation techniques in hydrology. Elsevier, New York. 

Franchini M, Pacciani M (1992) Comparative analysis of several conceptual rainfall-runoff models. Journal of 

Hydrology, 122, 161-219. 

Freeze RA, Harlan RL (1969) Blueprint for a physically-based digitally-simulated hydrologic response model. 


Gelhar LW (1986) Stochastic subsurface hydrology. From theory to application. Water Resources Research, 

22(9), 135-145. 

Graham DN, Butts MB (2005) Flexible integrated watershed modelling with MIKE SHE. In: Singh VP, Frevert 

DK (Eds) Watershed Models. CRC Press, Chapter 10. 

Graham LP (1999) Modelling runoff to the Baltic Sea, Ambio, 28, 328-334. 

Grayson RB, Moore ID, McMahon TA (1992a) Physically based hydrologic modelling, 1. A terrain-based 

model for investigative purposes. Water Resources Research, 28(10), 2639-2658. 

Grayson RB, Moore ID, McMahon TA (1992b) Physically based hydrologic modelling, 2. Is the concept realistic? 

Water Resources Research, 28(10), 2659-2672. 

Grayson R, Blöschl G (2000) Spatial Modelling of Catchment Dynamics. In: Grayson R, Blöschl G (Eds.) 

Spatial Patterns in Catchment Hydrology: Observations and Modelling. Cambridge University Press, 

UK. 

Groenenberg JE, Kros J, van der Salm C, de Vries W (1995) Application of the model NUCSAM to the 

Solling spruce site. Ecological Modelling, 83, 97-107. 

GWP (2000) Integrated Water Resources Management. TAC Background Papers No. 4. Global Water Partnership, 

Stockholm. 

Hansen S, Jensen HE, Nielsen NE, Svendsen H (1991) Simulation of nitrogen dynamics and biomass production 

in winter wheat using the Danish simulation model DAISY. Fertilizer Research, 27, 245-259. 

Hansen S, Thorsen M, Pebesma E, Kleeschulte S, Svendsen H (1999) Uncertainty in simulated leaching 

due to uncertainty in input data. A case study. Soil Use and Management, 15, 167-175. 

Harrar WG, Sonnenborg TO, Henriksen HJ (2003) Capture zone, travel time and solute transport predictions 

using inverse modelling and different geological models. Hydrogeology Journal, 11(5), 536-548. 

Havnø K, Madsen MN, Dørge J (1995) MIKE 11 - A Generalized River Modelling Package. In: Singh VP (Ed) 

Computer Models of Watershed Hydrology, Water Resources Publications, Highlands Ranch, Colorado, 

733-782. 

Henriksen HJ, Refsgaard JC, Sonnenborg TO, Gravesen P, Brun A, Refsgaard A, Jensen KH (2001) STÅBI i 

grundvandsmodellering (Handbook in groundwater modelling). Danmarks og Grønlands Geologiske 

Undersøgelse, Rapport 2001/56. (In Danish) 

Henriksen HJ, Troldborg L, Nyegaard P, Sonnenborg TO, Refsgaard JC, Madsen B (2003) Methodology for 

construction, calibration and validation of a national hydrological model for Denmark. Journal of Hydrology 

280, 52-71. 

Henriksen HJ, Refsgaard JC, Højberg AL, Ferrand N, Gijsbers P, Scholten H (submitted) Public participation 

in relation to quality assurance of water resources modelling (HarmoniQuA). 

Heuvelink GBM, Pebesma EJ (1999) Spatial aggregation and soil process modelling. Geoderma, 89, 47-65. 

Hill MC (1998) Methods and guidelines for effective model calibration. U.S. Geological Survey, Water- 

Resources Investigations Report 98-4005. Denver CO. 

Højberg AL, Refsgaard JC (2005) Model Uncertainty - Parameter uncertainty versus conceptual models. 

Water Science and Technology, 52(6), 177-186. 

ICJ (1997) Case Concerning Gabcikovo-Nagymaros project (Hungary/Slovakia). Summary of the Judgement of 

25 September 1997. International Court of Justice, The Hague. 

ICWE (1992) The Dublin Statement and report of the conference. International Conference on Water and the 

Environment: Development issues for the 21st century. 26-31 January 1992, Dublin, Ireland. 

IPCC (2001) Climate Change 2001: The Scientific Basis. Contribution of Working Group I to the Third Assessment 

Report of the Intergovernmental Panel on Climate Change [Houghton JT, Ding Y, Griggs DJ, 

Noguer M, van der Linden PJ, Dai X, Maskell K and Johnson CA (eds)]. Cambridge University Press, 

Cambridge, UK and New York, NY, USA, 881 pp. 

Jensen KH, Mantoglou A (1992) Application of stochastic unsaturated flow theory, numerical simulations, 

and comparisons to field observations. Water Resources Research, 28, 269-284. 

Jensen RA, Jørgensen GH (1988) Hydrologisk overfladevands/grundvands model (Hydrological surface 

water/groundwater model). Technical report prepared by Danish Hydraulic Institute for the County of 

Storstrøm and the County of Vestsjælland. (in Danish) 

Jensen KH, Refsgaard JC (1991a) Spatial variability of physical parameters and processes in two field soils. 

Part I: Water Flow and Solute Transport at Local Scale. Nordic Hydrology, 22, 275-302. 

Jensen KH, Refsgaard JC (1991b) Spatial variability of physical parameters and processes in two field soils. 

Part II: Water flow at field scale. Nordic Hydrology, 22, 303-326. 

Jensen KH, Refsgaard JC (1991c) Spatial variability of physical parameters and processes in two field soils. 

Part III: Solute Transport at Field Scale. Nordic Hydrology, 22, 327-340. 

Jønch-Clausen T (1979) SHE. Système Hydrologique Européen. A short description. Danish Hydraulic Institute,

Hørsholm, Denmark. 

Jønch-Clausen T (2004) Integrated Water Resources Management (IWRM) and Water Efficiency Plans by 

2005. Why, What and How? Global Water Partnership, TEC Background Papers No. 10, Stockholm.

Jønch-Clausen T, Refsgaard JC (1984) A Mathematical Modelling System for Flood Forecasting. Nordic 

Hydrology, 15, 307-318. 

Kaiser-Hill (2001) Model Code and Scenario Selection Report Site-Wide Water Balance Rocky Flats Environmental 

Technology Site. Report 01-RF-00337. Kaiser-Hill Company LLC. 

Klauer B, Brown JD (2003) Conceptualising imperfect knowledge in public decision making: ignorance, uncertainty, 

error and ‘risk situations’. Environmental Research, Engineering and Management. 

Klemes V (1986) Operational testing of hydrological simulation models. Hydrological Sciences Journal, 31, 

13-24. 

Knudsen J, Thomsen A, Refsgaard JC (1986) WATBAL: A semi-distributed, physically based hydrological 

modelling system. Nordic Hydrology, 17, 347-362. 

Konikow LF, Bredehoeft JD (1992) Ground-water models cannot be validated. Advances in Water Resources, 

15, 75-83. 

Kros J, Reinds GJ, de Vries W, Latour JB, Bollen M (1995) Modelling of soil acidity and nitrogen availability 

in natural ecosystems in response to changes in acid deposition and hydrology. Report 95, DLO Winand 

Staring Centre, Wageningen. 

Kuchment LS, Demidov VN, Naden PS, Cooper DM, Broadhurst P (1996) Rainfall-runoff modelling of the

Ouse basin, North Yorkshire: an application of a physically based distributed model. Journal of Hydrology, 

181, 323-342. 

Lane SA, Richards KS (2001) The ‘Validation’ of Hydrodynamic Models: Some Critical Perspectives. In: 

Anderson MG, Bates PD (Eds) Model Validation: Perspectives in Hydrological Science, 413-438. John

Wiley & Sons, Ltd. 

Linkov I, Burmistrov D (2003) Model Uncertainty and Choices Made by Modelers: Lessons Learned from the 

International Atomic Energy Model Intercomparisons. Risk Analysis, 23(6), 1297-1308. 

Lloyd JW (1980) The importance of drift deposit influences on the hydrogeology of major British aquifers. 

Institution of Water Engineers and Scientists, Journal, 34, 346-356. 

Loague KM, Freeze RA (1985) A Comparison of Rainfall-Runoff Modelling Techniques on Small Upland 

Catchments. Water Resources Research, 21(2), 1985. 

Luckner L (1978) Gekoppelte Grundwasser-Oberflächenwassermodelle (A coupled groundwater-surface 

water model). Wasserwirtschaft-Wassertechnik, 1978, 276-278 (In German). 

Madsen H, Skotner C (2005) Adaptive state updating in real-time river flow forecasting – a combined filtering 

and error forecasting procedure. Journal of Hydrology, 308, 302-312. 




Michaud J, Sorooshian S (1994) Comparison of simple versus complex distributed runoff models on a midsized 

semiarid watershed. Water Resources Research, 30(3), 593-605. 

Michaud JD, Shuttleworth WJ (1997) Executive summary of the Tucson aggregation workshop. Journal of

Hydrology, 190, 176-181. 

Middlemis H (2000) Murray-Darling Basin Commission. Groundwater flow modelling guideline. Aquaterra 

Consulting Pty Ltd., South Perth. Western Australia. Project no. 125. 

Miles JC, Rushton KR (1983) A coupled surface water and groundwater catchment model. Journal of Hydrology, 

62, 159-177. 

Neuman SP, Wierenga PJ (2003) A comprehensive strategy of hydrogeologic modeling and uncertainty 

analysis for nuclear facilities and sites. University of Arizona, Report NUREG/CR-6805. 

Nielsen DR, Biggar JW, Erh KT (1973) Spatial variability of field measured soil water properties. Hilgardia,

42, 215-259. 

Nielsen SA, Hansen E (1973) Numerical simulation of the rainfall-runoff process on a daily basis. Nordic 

Hydrology, 4, 171-190. 

NRC (1990) Ground Water Models: Scientific and Regulatory Applications. National Research Council, National 

Academy Press, Washington, D.C. 

Oreskes N, Shrader-Frechette K, Belitz K (1994) Verification, validation and confirmation of numerical models 

in the earth sciences. Science, 264, 641-646. 

Pahl-Wostl C (2002) Towards sustainability in the water sector – The importance of human actors and processes 

of social learning. Aquatic Sciences, 64, 394-411. 

Panday S, Huyakorn PS (2004) A fully coupled physically-based spatially-distributed model for evaluating

surface/subsurface flow. Advances in Water Resources, 27, 361-382. 

Pappenberger F, Beven KJ (2006) Ignorance is bliss: Or seven reasons not to use uncertainty analysis. Water

Resources Research 42, W05302, doi:10.1029/2005WR004820. 

Pascual P, Steiber N, Sunderland E (2003) Draft guidance on development, evaluation and application of 

regulatory environmental models. The Council for Regulatory Environmental Modeling. Office of Science

Policy, Office of Research and Development. US Environmental Protection Agency, Washington 

D.C. 60 pp. 

Perkins SP, Sophocleous M (1999) Development of a Comprehensive Watershed Model Applied to Study 

Stream Yield under Drought Conditions. Ground Water, 37(3), 418-426. 

Perrin C, Michel C, Andréassian V (2001) Does a large number of parameters enhance model performance?

Comparative assessment of common catchment model structures on 429 catchments. Journal of Hydrology, 

242, 275-301. 

Poeter E, Anderson D (2005) Multimodel Ranking and Inference in Ground Water Modeling. Ground Water,

43(4), 597-605. 

Popper KR (1959) The logic of scientific discovery. Hutchinson & Co, London.

Prickett TA, Lonnquist CG (1971) Selected digital computer techniques for groundwater resource evaluation. 

Illinois State Water Survey, Bulletin 55. 

Querner EP (1997) Description and application of the combined surface and groundwater flow model 

MOGROW. Journal of Hydrology, 192, 158-188. 

Quinn PF, Beven KJ (1993) Spatial and temporal predictions of soil moisture dynamics, runoff, variable 

source areas and evapotranspiration for Plynlimon, Mid-Wales. Hydrological Processes, 7, 425-448. 

Radwan M, Willems P, Berlamont J (2004) Sensitivity and uncertainty analysis for river quality modelling.

Journal of Hydroinformatics, 6, 83-99. 

Reed S, Koren V, Smith M, Zhang Z, Moreda F, Seo D-J (2004) Overall distributed model intercomparison 

project results. Journal of Hydrology, 298, 27-60. 

Refsgaard JC (1981) The surface water component of an integrated hydrological model. Danish Committee for 

Hydrology. Suså Report No. H12. 

Refsgaard JC (1996) Terminology, modelling protocol and classification of hydrological model codes. In: 

Abbott MB, Refsgaard JC (Eds): Distributed Hydrological Modelling, Kluwer Academic Publishers, 17- 

39. 




Refsgaard JC, Stang O (1981) An integrated groundwater/surface water hydrological model. Danish Committee 

for Hydrology. Suså Report No. H13. 

Refsgaard JC, Rosbjerg D, Markussen LM (1983) Application of Kalman filter to real-time operation and to 

uncertainty analyses in hydrological modelling. IAHS Publication No 147, 273-282. 

Refsgaard JC, Storm B (1995) MIKE SHE. In: Singh VP (Ed) Computer Models of Watershed Hydrology. 

Water Resources Publications, Highlands Ranch, Colorado, 809-846. 

Refsgaard JC, Storm B, Abbott MB (1996) Comments on ‘A discussion of distributed hydrological modelling’. 

In: Abbott MB, Refsgaard JC (Eds): Distributed Hydrological Modelling, Kluwer Academic Publishers, 

279-287. 

Refsgaard JC, Ramaekers D, Heuvelink GBM, Schreurs V, Kros H, Rosén L, Hansen S (1998) Assessment 

of ‘cumulative’ uncertainty in spatial decision support systems: Application to examine the contamination 

of groundwater from diffuse sources (UNCERSDSS). Presented at the European Climate Science 

Conference, Vienna, 19-23 October 1998. 

Refsgaard JC, Butts MB (1999) Determination of grid scale parameters in catchment modelling by upscaling 

local scale parameters. Key note presentation. Proceedings of the EurAgEng International Workshop 

on Modelling of transport processes in soils at various scales in time and space, 24-26 November 

1999, Leuven, Belgium. 

Refsgaard JC, van der Sluijs JP, Højberg AL, Vanrolleghem P (2005) Harmoni-CA Guidance Uncertainty 

Analysis. Guidance 1. 46 pp. www.harmoni-ca.info. 

Rykiel ER (1996) Testing ecological models: the meaning of validation. Ecological Modelling, 90, 229-244. 

Saulnier GM, Beven K, Obled C (1997) Digital elevation analysis for distributed hydrological modelling: Reducing 

scale dependence in effective hydraulic conductivity values. Water Resources Research, 

33(9), 2097-2101. 

Scholten H, Van Waveren RH, Groot S, Van Geer FC, Wösten JHM, Koeze RD, Noort JJ (2000) Good Modelling 

Practice in water management. Paper presented at Hydroinformatics 2000, Cedar Rapids, IA,

USA. 

Scholten H, Groot S (2002) Dutch guidelines. In: Refsgaard, JC (Ed) State-of-the-Art Report on Quality Assurance 

in modelling related to river basin management. Chapter 12, Geological Survey of Denmark 

and Greenland, Copenhagen. www.harmoniqua.org. 

Scholten H, Kassahun A, Refsgaard JC, Kargas T, Gavardinas C, Beulens AJM (2007) A methodology to 

support multidisciplinary model-based water management. Environmental Modelling & Software, 22, 

743-759. 

Singh VP (Ed) (1995) Computer Models of Watershed Hydrology. Water Resources Publications, Highlands 

Ranch, Colorado. 

Smith KA (1980) A model of the extent of anaerobic zones in aggregated soils and its potential application to 

estimates of denitrification. Journal of Soil Science, 31, 263-277. 

Sonnenborg TO, Christensen BSB, Nyegaard P, Henriksen HJ, Refsgaard JC (2003) Transient modelling of 

regional groundwater flow using parameter estimates from steady-state automatic calibration. Journal 

of Hydrology, 273, 188-204. 

Stang O (1981) A regional groundwater model for the Suså area. Danish Committee for Hydrology. Suså Report 

No. H9. 

Styczen M, Storm B (1993a) Modelling of N-movements on catchment scale – a tool for analysis and decision-making.

1. Model description. Fertilizer Research, 36, 1-6. 

Styczen M, Storm B (1993b) Modelling of N-movements on catchment scale – a tool for analysis and decision-making.

2. A case study. Fertilizer Research, 36, 7-17. 

Tampa Bay Water (2001) Scientific review of integrated hydrologic model ISGW/CNTB121. Prepared by West 

Consultants, Gartner Lee Ltd and AQUA TERRA Consultants for Tampa Bay Water, Florida. 

Thomas RG (1973) Groundwater models. FAO, Irrigation and Drainage Paper 21, Rome. 

Troch PA, Mancini M, Paniconi C, Wood EF (1993) Evaluation of a Distributed Catchment Scale Water

Balance Model. Water Resources Research, 29(6), 1805-1817. 

Troeh FR, Jabro JD, Kirkham D (1982) Gaseous diffusion equations for porous materials. Geoderma, 27, 

239-253. 




Troldborg L (2004) The influence of conceptual geological models on the simulation of flow and transport in 

Quaternary aquifer systems. PhD Thesis. Geological Survey of Denmark and Greenland, Report 

2004/107. 

Van Asselt MBA, Rotmans J (2002) Uncertainty in Integrated Assessment Modelling. From Positivism to 

Pluralism. Climatic Change, 54: 75-105. 

Van der Sluijs JP, Craye M, Funtowicz SO, Kloprogge P, Ravetz J, Risbey JS (2005) Combining Quantitative 

and Qualitative Measures of Uncertainty in Model based Foresight Studies: the NUSAP System. Risk 

Analysis, 25(2), 481-492. 

Van Griensven A, Meixner T (2004) Dealing with unidentifiable sources of uncertainty within environmental 

models. In: Pahl C, Schmidt S, Jakeman T. (Eds.), iEMSs 2004 International Congress: "Complexity 

and Integrated Resources Management". International Environmental Modelling and Software Society, 

Osnabrück, Germany, June 2004. 

Van Loon E, Refsgaard JC (eds.) (2005) Guidelines for assessing data uncertainty in hydrological studies. 

HarmoniRiB Report. Geological Survey of Denmark and Greenland. http://www.harmonirib.com. 

Van Waveren RH, Groot S, Scholten H, Van Geer FC, Wösten JHM, Koeze RD, Noort JJ (2000) Good Modelling 

Practice Handbook, STOWA Report 99-05, Utrecht, RWS-RIZA, Lelystad, The Netherlands, 

http://waterland.net/riza/aquest/ 

Vrugt J, Diks CGH, Gupta HV (2005) Improved treatment of uncertainty in hydrologic modelling: Combining 

the strengths of global optimization and data assimilation. Water Resources Research, 41, W01017, 

doi:10.1029/2004WR003059. 

Walker WE, Harremoës P, Rotmans J, Van der Sluijs JP, Van Asselt MBA, Janssen P, Krayer von Krauss 

MP (2003) Defining Uncertainty: A Conceptual Basis for Uncertainty Management in Model-Based Decision

Support. Integrated Assessment, 4(1), 5-17.

Wardlaw RB (1978) The development of a deterministic integrated surface/subsurface hydrological response 

model. PhD Thesis, University of Strathclyde, Glasgow.

Wardlaw RB, Wyness A, Rippon P (1994) Integrated catchment modelling. Surveys in Geophysics, 15, 311- 

330. 

Weeks JB (1974) Simulated effects of oil-shale development on the hydrology of the Piceance basin, Colorado. 

US Geological Survey, Professional Paper 908. 

Wen X-H, Gómez-Hernández JJ (1996) Upscaling hydraulic conductivities in heterogeneous media: An overview. 

Journal of Hydrology, 183, ix-xxxii. 

WMO (1975) Intercomparison of conceptual models used in operational hydrological forecasting. WMO Operational 

Hydrology Report No 7, WMO No 429, World Meteorological Organisation, Geneva. 

WMO (1988) Intercomparison of models for snowmelt runoff. WMO Operational Hydrology Report No 23, 

WMO No 646, World Meteorological Organisation, Geneva. 

WMO (1992) Simulated real-time intercomparison of hydrological models. WMO Operational Hydrology Report 

No 38, WMO No 779, World Meteorological Organisation, Geneva. 

Wolf J, Beusen AHW, Groenendijk P, Kroon T, Rötter R, van Zeijts H (2003) The integrated modelling system 

STONE for calculating nutrient emissions from agriculture in the Netherlands. Environmental Modelling & 

Software, 18, 597-617. 

Wood EF, Sivapalan M, Beven KJ, Band L (1988) Effects of spatial variability and scale with implications to 

hydrologic modelling. Journal of Hydrology, 102, 29-47. 

WSSTP (2005) Water safe strong and sustainable. A European vision for water supply and sanitation in 

2030. Water Supply and Sanitation Technology Platform. October 2005. http://www.wsstp.org 

WWAP (2003) Water for People, Water for Life. UN World Water Development Report. Prepared as a collaborative 

effort of 23 UN agencies and convention secretariats co-ordinated by the World Water Assessment 

Programme. UNESCO, Paris. http://www.unesco.org/water/wwap/index.shtml 


[1] 

Refsgaard JC, Hansen E (1982) A Distributed Groundwater/Surface Water 

Model for the Suså Catchment. Part 1: Model Description. 

Nordic Hydrology, 13, 299-310. 

Reprinted with permission from Nordic Hydrology

[2] 

Refsgaard JC, Hansen E (1982) A Distributed Groundwater/Surface Water 

Model for the Suså Catchment. Part 2: Simulations of Streamflow Depletions 

Due to Groundwater Abstraction. 



[3] 

Refsgaard JC, Christensen TH, Ammentorp HC (1991) A model for oxygen 

transport and consumption in the unsaturated zone. 


Reprinted from Journal of Hydrology with permission from Elsevier

[4] 

Refsgaard JC, Seth SM, Bathurst JC, Erlich M, Storm B, Jørgensen GH,

Chandra S (1992) Application of the SHE to catchments in India - Part 1: 

General results. 

Journal of Hydrology, 140, pp 1-23. 


[5] 

Jain SK, Storm B, Bathurst JC, Refsgaard JC, Singh RD (1992) Application of 

the SHE to catchments in India - Part 2: Field experiments and simulation 

studies with the SHE on the Kolar subcatchment of the Narmada River. 



[6] 

Refsgaard JC, Knudsen J (1996) Operational validation and intercomparison 

of different types of hydrological models. 

Water Resources Research, 32 (7), 2189-2202. 

Reproduced by permission of American Geophysical Union

WATER RESOURCES RESEARCH, VOL. 32, NO. 7, PAGES 2189–2202, JULY 1996 

Operational validation and intercomparison of different types 

of hydrological models 

Jens Christian Refsgaard and Jesper Knudsen 

Danish Hydraulic Institute, Hørsholm, Denmark 

Abstract. A theoretical framework for model validation, based on the methodology 

originally proposed by Klemes [1985, 1986], is presented. It includes a hierarchical

validation testing scheme for model application to runoff prediction in gauged and 

ungauged catchments subject to stationary and nonstationary climate conditions. A case 

study on validation and intercomparison of three different models on three catchments in 

Zimbabwe is described. The three models represent a lumped conceptual modeling system 

(NAM), a distributed physically based system (MIKE SHE), and an intermediate 

approach (WATBAL). It is concluded that all models performed equally well when at 

least 1 year’s data were available for calibration, while the distributed models performed 

marginally better for cases where no calibration was allowed. 

Introduction 

Copyright 1996 by the American Geophysical Union. 

Paper number 96WR00896. 

0043-1397/96/96WR-00896$09.00 

In recent years water resources studies have become increasingly 

concerned with aspects of water resources for which data 

are not directly available. Examples include studies of the 

development potential of ungauged areas, environmental impacts 

of land use changes related to agricultural and forestry 

practices, conjunctive use of groundwater and surface water, 

and climate impact studies concerned with the effects on water 

resources of an anticipated climate change. 

In these and other types of studies, hydrological simulation 

models are often used to provide the missing information as a 

basis for decisions regarding the development and management 

of water and land resources. 

Traditionally, hydrological simulation modeling systems are 

classified in three main groups, namely, (1) empirical black 

box, (2) lumped conceptual, and (3) distributed physically 

based systems. The great majority of the modeling systems 

used in practice today belong to the simple types (1) or (2)

and require a modest number of parameters (approximately

5–10) to be calibrated for their operation. Despite their simplicity, 

many models have proven quite successful in representing 

an already measured hydrograph. 

A severe drawback of these traditional modeling systems, 

however, is that their parameters are not directly related to the 

physical conditions of the catchment. Accordingly, it may be 

expected that their applicability is limited to areas where runoff 

has been measured for some years and where no significant 

change in catchment conditions has occurred.

To provide a more appropriate tool for the type of studies 

mentioned above, considerable efforts within hydrological research 

have been directed toward development of distributed 

physically based catchment models. Such models use parameters 

which are related directly to the physical characteristics of 

the catchment (topography, soil, vegetation, and geology) and 

operate within a distributed framework to account for the 

spatial variability of both physical characteristics and meteorological 

conditions. These models aim at describing the hydrological 

processes and their interaction as and where they 

occur in the catchment and therefore offer the prospect of 

remedying the shortcomings of the traditional rainfall runoff 

models. 

Although there appears to be a certain degree of consensus 

at the theoretical level regarding the potential of the distributed 

physically based types of models, there are widely divergent 

points of view as to whether they offer a significant improvement 

in actual performance when compared to the well-proven

lumped conceptual model type. Beven [1989, p. 161] 

argues from theoretical considerations of scale problems that 

“the current generation of distributed physically based models 

are lumped conceptual models,” and, further, that all current 

physically based models “are not well suited to applications to 

real catchments.” Grayson et al. [1992] support this view and 

claim that physically based models have been oversold by their 

developers. Other authors, for example, Smith et al. [1994], 

argue that this criticism is “overly pessimistic.” 

An evaluation of the capabilities of hydrological models 

when applied in the absence of site calibration data and limited 

validation data to predict the effects of major land use changes 

was made by the Task Committee on Quantifying Land-Use 

Change Effects [U.S. Committee, 1985], which reported a great 

belief among committee members in the capabilities of 28 

surface water hydrological modeling systems, most of which 

can be classified as lumped conceptual models. In view of the 

limited number of model comparison studies conducted and 

the less-than-encouraging results often obtained, this confidence 

is remarkable. According to the U.S. Committee [1985, p. 

1], “the reasons for this confidence were explored and appear 

to be based upon personal experience, possibly tempered by 

belief in the model originators.” 

Owing to the complexity of the problems involved, further 

theoretical evaluation is not likely to provide a definite conclusion 

regarding the capability and limitation of distributed, 

physically based modeling systems. For establishing a basis to 

better advance the discussion, relevant model validations appear 

to be a more fruitful approach, where the models concerned 

simply are subjected to a range of practical modeling 

tests to validate their capability for undertaking particular 

tasks. 


REFSGAARD AND KNUDSEN: INTERCOMPARISON OF HYDROLOGICAL MODELS 

In this respect, Klemes [1986, p. 17] has developed a hierarchical

scheme for model testing, which is based on the philosophy 

that “a hydrological simulation model must demonstrate, 

before it is used operationally, how well it can perform 

the kind of task for which it is intended.” It may appear needless 

to advocate such a basic and evident requirement. Unfortunately, 

it is well justified in view of the current practice in 

hydrological model testing. 

The present paper is based on results from a research 

project conducted at the Danish Hydraulic Institute (DHI) 

[1993a]. The project had two major objectives. The first objective 

was to identify a rigorous framework for the testing of 

model capabilities for different types of tasks. The second 

objective was to use this theoretical framework and conduct an 

intercomparison study involving application of three modeling 

systems of different complexity to a number of tasks ranging 

from traditional simulation of stationary, gauged catchments to 

simulation of ungauged catchments and of catchments with 

nonstationary climate conditions. Data from three catchments 

in Zimbabwe were used for the tests. The research project was 

a contribution to project D.5, “Testing the transferability of 

hydrological simulation models,” forming part of the World 

Climate Programme—Water [World Meteorological Organization 

(WMO), 1985]. 

Some of the results of DHI [1993a] were presented by 

Refsgaard [1996] with a focus on modeling the land surface 

processes and the coupling between hydrological and atmospheric 

models within the global change context. Thus Refsgaard 

[1996] presents some of the results from two of the 

Zimbabwean catchments to illustrate data requirements and 

form the basis for conclusions regarding which type of hydrological 

model is required for climate change modeling. The 

present paper, on the other hand, emphasizes the modeling 

methodology and contains a summary of all the test results 

from all three Zimbabwean catchments. It furthermore

provides a general discussion of these results with references to 

similar studies reported in the literature.

Theoretical Framework for Model Validation 

Terminology 

No unique and generally accepted terminology is presently 

used in the hydrological community with regard to issues related 

to model validation. The framework used in the present 

paper is basically in line with the terminology defined by 

Schlesinger et al. [1979], Tsang [1991], and Flavelle [1992] and 

comprises the following key definitions. 

A modeling system (i.e., code) is a generalized software 

package, which can be used for different catchments without 

modifying the source code. Examples of modeling systems are 

MIKE SHE, SACRAMENTO, and MODFLOW. 

A model is a site-specific application of a modeling system, 

including given input data and specific parameter values. An 

example of a model is a MIKE SHE–based model for the 

Ngezi catchment (cf. the case study below). 

A modeling system or a code can be “verified.” A code 

verification involves comparison of the numerical solution generated 

by the code with one or more analytical solutions or 

with other numerical solutions. Verification ensures that the 

computer program accurately solves the equations that constitute 

the mathematical model. 
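
The kind of code verification described above can be illustrated with a minimal sketch. It is not taken from any of the modeling systems named in this paper: the linear reservoir equation dS/dt = -kS, the explicit Euler scheme, and all function names and parameter values are assumptions chosen only because the analytical solution S(t) = S0 exp(-kt) is known exactly.

```python
import math

def simulate_linear_reservoir(s0, k, dt, n_steps):
    """Explicit Euler solution of dS/dt = -k*S (hypothetical test problem)."""
    storage = [s0]
    for _ in range(n_steps):
        storage.append(storage[-1] - k * storage[-1] * dt)
    return storage

def analytical_linear_reservoir(s0, k, dt, n_steps):
    """Exact solution S(t) = s0 * exp(-k*t) at the same time levels."""
    return [s0 * math.exp(-k * dt * i) for i in range(n_steps + 1)]

def max_abs_error(s0=100.0, k=0.1, t_end=10.0, n_steps=100):
    """Largest deviation between the numerical and analytical solutions."""
    dt = t_end / n_steps
    numerical = simulate_linear_reservoir(s0, k, dt, n_steps)
    exact = analytical_linear_reservoir(s0, k, dt, n_steps)
    return max(abs(a - b) for a, b in zip(numerical, exact))

# Illustrative values only: refine the time step tenfold and compare errors.
coarse = max_abs_error(n_steps=100)
fine = max_abs_error(n_steps=1000)
print(f"max error, coarse step: {coarse:.4f}; fine step: {fine:.4f}")
```

In such a check the error should shrink as the time step is refined, roughly in proportion to the step for a first-order scheme. Agreement with the analytical solution supports the correctness of the code only; it says nothing about whether a site-specific model built with that code is valid.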

Model validation is here defined as the process of demonstrating 

that a given site-specific model is capable of making 

accurate predictions for periods outside a calibration period. A 

model is said to be validated if its accuracy and predictive 

capability in the validation period have been proven to lie 

within acceptable limits or errors. It is important to notice that 

the term model validation refers to a site-specific validation of

a model. This must not be confused with a more general 

validation of a generalized modeling system which, in principle, 

will never be possible. 
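
The acceptance check implied by this definition can be sketched as follows. The sketch assumes that observed and simulated runoff series are available for a validation period that was not used for calibration. The Nash-Sutcliffe efficiency and the relative volume error are used here only as illustrative accuracy measures, and the function names and threshold values are hypothetical rather than criteria taken from this paper.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance

def relative_volume_error(observed, simulated):
    """Relative difference between simulated and observed total runoff volume."""
    return (sum(simulated) - sum(observed)) / sum(observed)

def is_validated(observed, simulated, min_nse=0.7, max_abs_volume_error=0.1):
    """True if both measures lie within the (assumed) acceptable limits."""
    nse_ok = nash_sutcliffe(observed, simulated) >= min_nse
    volume_ok = abs(relative_volume_error(observed, simulated)) <= max_abs_volume_error
    return nse_ok and volume_ok

# Made-up runoff values for a validation period outside the calibration period.
observed = [1.0, 3.0, 5.0, 3.0, 1.0]
simulated = [1.2, 2.8, 4.6, 3.2, 1.1]
print(nash_sutcliffe(observed, simulated), is_validated(observed, simulated))
```

The acceptable limits have to be fixed for the task at hand before the validation run is made; the defaults above are placeholders.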

Testing Scheme for Validation of Hydrological Models 

The hierarchical testing scheme proposed by Klemes [1985,

1986] appears suitable for testing the capability of a model to 

predict the hydrological effect of climate change, land use 

change, and other nonstationary conditions. Klemes distinguished 

between simulations conducted for the same station 

(catchment) used for calibration and simulations conducted 

for ungauged catchments. He also distinguished between cases 

where climate, land use, and other catchment characteristics 

remain unchanged (are stationary) and cases where they are 

not. This leads to the definitions of four basic categories of 

typical modeling tests. 

1. The split-sample test (SS) involves calibration of a 

model based on 3–5 years of data and validation on another 

period of a similar length. 

2. The differential split-sample test (DSS) involves calibration 

of a model based on data before catchment change occurs, 

adjustment of model parameters to characterize the change, 

and validation on the subsequent period. 

3. In the proxy-basin test (PB) no direct calibration is allowed, 

but advantage may be taken of information from other 

gauged catchments. Hence validation will comprise identification 

of a gauged catchment deemed to be of a nature similar to 

that of the validation catchment; initial calibration; transfer of 

model, including adjustment of parameters to reflect actual 

conditions within validation catchment; and validation. 

4. With the proxy-basin differential split-sample test (PB-DSS),

again no direct calibration is allowed, but information

from other catchments may be used. Hence validation will 

comprise initial calibration on the other relevant catchment, 

transfer of model to validation catchment, selection of two 

parameter sets to represent the periods before and after the 

change, and subsequent validations on both periods. 
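
The four categories follow from two yes/no questions: whether runoff data are available for calibration at the catchment of interest, and whether catchment and climate conditions are stationary. A minimal sketch of that mapping is given below; the function and argument names are hypothetical and serve only to make the classification explicit.

```python
def select_validation_test(gauged: bool, stationary: bool) -> str:
    """Return the test category for a given modeling task.

    gauged:     calibration data exist for the catchment of interest
    stationary: climate, land use and other characteristics are unchanged
    """
    if gauged and stationary:
        return "SS"      # split-sample test
    if gauged and not stationary:
        return "DSS"     # differential split-sample test
    if not gauged and stationary:
        return "PB"      # proxy-basin test
    return "PB-DSS"      # proxy-basin differential split-sample test

# Example: an ungauged catchment under changing conditions.
print(select_validation_test(gauged=False, stationary=False))
```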

Relevant Literature on Model Intercomparison 

Studies 

The testing of hydrological models through validation on 

independent data has for a long time been emphasized by the 

World Meteorological Organization (WMO). In their pioneering 

studies [WMO, 1975, 1986, 1992] several hydrological modeling 

systems of the empirical black box and the lumped conceptual 

types were tested on the same data from different 

catchments. The actual testing, however, only included the 

standard SS test comprising an initial calibration of a model 

and subsequent validation based on data from an independent 

period. No firm conclusions were derived regarding significant 

differences in performance among different model types. 

Franchini and Pacciani [1991] made a comparative analysis 

of seven different lumped conceptual models. They used an SS 

testing approach calibrating on a 1-month period and validating 

on a subsequent 3-month period. They concluded that in 

spite of a wide range of structural complexity all the models 

produced similar and equally valid results. With regard to the



question of whether the simpler or the more complex variants 

within this group of models are better, they concluded that 

significantly different models produced basically equivalent results, 

with calibration times being generally proportional to the 

complexity of their structure. On the other hand, they concluded 

that the model structure should not be made too simple, 

because it will then cause a loss of the link with the physics 

of the problem and of the possibility of taking advantage of 

prior knowledge of the geomorphological nature of the catchment. 

Other researchers have conducted similar intercomparison 

studies involving empirical black box models and lumped conceptual 

models [Naef, 1981; Wilcox et al., 1990] with similar 

conclusions. 

Only a few studies have included comparisons of distributed 

physically based models with simpler models. Loague and 

Freeze [1985] in a classical study compared two empirical black 

box modeling systems (a regression model and a unit hydrograph 

model) and a quasi physically based system on three 

small experimental catchments ranging from 10 ha to 7.2 km².

The models were used on an event basis to simulate runoff 

peaks. The two empirical models were calibrated against runoff 

data and subsequently validated on independent data in an SS 

approach. The parameter values for the quasi physically based 

model were assessed directly from field data and not subject to 

any calibration before being validated against the same data as 

the two other models. Loague and Freeze [1985] found that all 

models performed poorly. For one catchment the quasi physically 

based model was subsequently applied with and without 

calibration of one key model parameter. Such calibration had 

little impact on the model performance during the validation 

period. 

In a study in the semiarid 150 km² Walnut Gulch experimental

watershed Michaud and Sorooshian [1994] compared a 

lumped conceptual model (SCS), a distributed conceptual 

model (SCS with eight subcatchments, one per raingauge) and 

a distributed physically based model (KINEROS) for simulation 

of storm events. They found that with calibration, the 

accuracies of the two distributed models were similar. Without 

calibration the distributed physically based model performed 

better than the distributed conceptual model, and in both cases 

the lumped conceptual model performed poorly. 

Thus, as far as the test experience for distributed physically 

based models is concerned, both Loague and Freeze [1985] and 

Michaud and Sorooshian [1994] have performed tests on relatively 

small experimental catchments with very good data coverage. 

Both studies have used the models on ungauged conditions 

(without calibration) but in all cases under stationary 

climate conditions. The present paper presents results from 

larger catchments in Zimbabwe with ordinary data coverage 

and performs a sequence of rigorous tests of increasing complexity 

according to the hierarchical scheme outlined by Klemes

[1986], involving intercomparisons between lumped conceptual 

and distributed physically based models. 

Hydrological Modeling Systems 

The following three modeling systems (codes) are used in 

the present study: a lumped conceptual rainfall-runoff modeling 

system (NAM), a semidistributed hydrological modeling 

system (WATBAL), and a distributed physically based hydrological 

modeling system (MIKE SHE). The NAM and MIKE 

SHE can be characterized as very typical of their respective 

classes, while the WATBAL falls in between these two standard 

classes. All three modeling systems are being used on a 

routine basis at the Danish Hydraulic Institute (DHI) in connection 

with consultancy and research projects. 

NAM 

NAM is a traditional hydrological modeling system of the 

lumped conceptual type operating by continuously accounting 

for the moisture contents in four mutually interrelated storages. 

The NAM was originally developed at the Technical 

University of Denmark [Nielsen and Hansen, 1973] and has 

been modified and extensively applied by DHI in a large number 

of engineering projects covering all climatic regimes of the 

world. Furthermore, the NAM has been transferred to more 

than 100 other organizations worldwide as part of DHI’s 

MIKE 11 generalized river modeling package. The structure of 

NAM is illustrated in Figure 1. The NAM has in its present 

version a total of 17 parameters; however, in most cases only 

about 10 of these are adjusted during calibration. 

WATBAL 

WATBAL was developed in the early 1980s by DHI in an 

attempt to enable full utilization of readily available, distributed 

data on land surface properties (topography, vegetation, 

and soil) in a physically based model that is nevertheless simple

enough to allow large-scale applications within reasonable 

computational requirements. Here the WATBAL is briefly 

introduced; more detailed information has been given by 

Knudsen et al. [1986]. 

WATBAL has been designed to account for the spatial and 

temporal variations of soil moisture. On the basis of distributed 

information on meteorological conditions, topography, 

vegetation, and soil types, the catchment area is divided into a 

number of hydrological response units, as illustrated in Figure 

2, with each unit being characterized by a different composition 

of the above features. These units are used to provide the 

spatial representation of soil moisture, while temporal variations 

within each unit are accounted for by means of empirical 

relations for the processes affecting soil moisture, using physical 

parameters particular to each unit. 

For the representation of subsurface flows a simple lumped, 

conceptual approach is applied, using a cascade of linear reservoirs 

to account for the interflow and baseflow components 

(Figure 3). In summary, WATBAL provides a distributed physically 

based description of the surface processes affecting soil 

moisture (interception, infiltration, evapotranspiration, and 

percolation), while a lumped conceptual approach is used to 

represent subsurface flows. WATBAL has previously been 

used successfully for prediction of runoff from ungauged catchments 

[Nielsen and Bari, 1988]. 
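The cascade of linear reservoirs mentioned above can be illustrated with a minimal sketch. The function name, the time constants, and the choice of an exact within-step solution for constant inflow are all illustrative assumptions; this is not the WATBAL code, whose formulation is given by Knudsen et al. [1986].

```python
import math


def route_cascade(inflow, time_constants, dt=1.0):
    """Route an inflow series through linear reservoirs in series.

    Each reservoir obeys Q = S / k. The inflow is assumed constant within
    a time step, which allows an exact update of the storage.
    Hypothetical helper for illustration only.
    """
    storages = [0.0] * len(time_constants)
    outflow = []
    for q in inflow:
        for i, k in enumerate(time_constants):
            decay = math.exp(-dt / k)
            new_storage = storages[i] * decay + q * k * (1.0 - decay)
            # Mean outflow over the step follows from the water balance;
            # it becomes the inflow to the next reservoir in the cascade.
            q = (storages[i] + q * dt - new_storage) / dt
            storages[i] = new_storage
        outflow.append(q)
    return outflow, storages


# Two reservoirs with assumed time constants of 2 and 5 time steps.
outflow, storages = route_cascade([10.0, 0.0, 0.0, 0.0, 0.0], [2.0, 5.0])
```

Because the storage update is exact for constant inflow, the routed volume plus the water remaining in storage equals the inflow volume regardless of the time step.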

MIKE SHE 

MIKE SHE is a further development of the European Hydrological 

System—SHE [Abbott et al., 1986a, b]. It is a deterministic, 

fully distributed and physically based modeling system 

for describing the major flow processes of the entire land phase 

of the hydrological cycle. MIKE SHE solves the partial differential 

equations for the processes of overland and channel flow 

and unsaturated and saturated subsurface flow. The system is 

completed by a description of the processes of snow melt, 

interception, and evapotranspiration. The flow equations are 

solved numerically using finite difference methods. 

In the horizontal plane the catchment is discretized in a
network of grid squares. The river system is assumed to run
along the boundaries of these. Within each square the soil
profile is represented by a number of computational nodes in
the vertical direction, which above the groundwater table may
become partly saturated. Lateral subsurface flow is only considered
in the saturated part of the profile. Figure 4 illustrates
the structure of the MIKE SHE. A description of the methodology
and some experiences of model application to ordinary
catchments have been given by Refsgaard et al. [1992] and
Jain et al. [1992]. A more detailed description has been given
by Refsgaard and Storm [1995].

Figure 1. Structure of the NAM rainfall runoff modeling system [DHI, 1994].

Figure 2. WATBAL representation of catchment characteristics and definition of hydrological response
units [Knudsen et al., 1986].

Figure 3. Principal structure of WATBAL [Knudsen et al., 1986].

MIKE SHE is usually categorized as a physically based system. 

The characterization is, strictly speaking, correct only if it 

is applied on an appropriate scale. A number of scale problems 

arise when the MIKE SHE is used on a regional scale [Refsgaard 

and Storm, 1995]. In addition, if there is a considerable
uncertainty attached to the basic information, and if the spatial
and temporal variables (such as groundwater table elevations)
cannot be validated against observations, a MIKE SHE model
of that particular site cannot be considered fully physically
based but will degenerate towards a detailed conceptual
model. In this case the calibration procedure is usually to
adjust the parameters with the largest uncertainties attached,
within a reasonable range.

Figure 4. Schematic presentation of the MIKE SHE [DHI, 1993b].

Figure 5. Location of the three catchments in Zimbabwe.

Case Study: Methodology 

Selected Catchments in Zimbabwe 

The three catchments in Zimbabwe that were selected for 

the model tests are Ngezi-South (1090 km²), Lundi (254 km²),

and Ngezi-North (1040 km²). The locations of the catchments

are shown in Figure 5. 

A brief data collection and field reconnaissance visit to Zimbabwe

was arranged to obtain relevant information. Daily series of 

rainfall and monthly series of pan evaporation were obtained 

from the Department of Meteorological Services. Records of 

mean daily discharges as well as information on water rights 

were obtained from the Hydrological Branch, Ministry of Energy 

Water Resources and Development. Detailed information 

on land use was obtained through subcontracting R. Whitlow, 

University of Zimbabwe, to prepare land-use maps based 

upon 1:25,000 aerial photographs. Furthermore, 1:50,000 topographical 

maps were collected and digitized. Information on 

vegetation characteristics was obtained from Timberlake [1989] 

as well as from J. Timberlake and N. Nobanda, National Herbarium 

(personal communication, 1989); B. Campell, Department 

of Biological Sciences (personal communication, 1989); 

and G. MacLaureen, Department of Crop Science, University 

of Zimbabwe (personal communication, 1989). Information on 

soil characteristics and hydrogeology was obtained from Anderson 

[1989]. Finally, valuable information of various kinds was 

provided by R. Whitlow, Department of Geography, University 

of Zimbabwe (personal communication, 1989); H. Elwell, 

Agritex (personal communication, 1989); J. Anderson, Chemistry 

and Soil Research Institute, Ministry of Agriculture (personal 

communication, 1989); and others. A more detailed description 

is given in DHI [1993a]. 

The annual catchment rainfall and runoff for the periods 

selected for modeling are shown in Table 1, while some of the 

key features for the three catchments are presented in Table 2. 

The rainfall and runoff figures in Table 1 show very large
interannual variations, and Table 2 shows significant differences
in the vegetation and soil characteristics from catchment
to catchment.

Model Testing Scheme 

The model testing scheme is illustrated in Figure 6. The 

testing of the involved models has been undertaken in parallel 

and in the following sequence. 

1. The SS test was based on data from Ngezi-South comprising 

an initial calibration of the models and a subsequent 

validation using data for an independent period. 

2. The PB test involved transfer of models to the Lundi 

catchment and adjustment of parameters to reflect the prevailing 

catchment characteristics and validation without any calibration. 

3. The modified proxy-basin (M-PB) test was as above, but
was adjusted by allowing model calibration based on 1 year of
runoff data.

4. For the DSS test, model calibration was based on data
from an initial calibration period, and validation was based on
data from a subsequent period. The differential nature of this
test is justified by the fact that the later independent period
includes three successive years (1981/1982–1983/1984) with
markedly lower rainfall than in the earlier years and hence
represents a nonstationary climate scenario.

5. The PB-DSS test involved transferring the models to the
Ngezi-North catchment, adjusting the parameters to represent
the catchment characteristics, and validating them by runoff
simulation over a nonstationary climate period.

6. The modified proxy-basin differential split-sample (M-PB-DSS)
test was as above, though it allowed models to be
calibrated using a short-term (1 year) record.

Table 1. Annual Rainfall and Runoff Values for the Three
Zimbabwean Test Catchments

Catchment     Hydrological Year   Rainfall, mm/yr   Runoff, mm/yr
Ngezi-South   1971/1972                       890             131
              1972/1973                       317               2
              1973/1974                      1290             349
              1974/1975                      1087             236
              1975/1976                       879              90
              1976/1977                       872             116
              1977/1978                      1131             245
              1978/1979                       609              59
Lundi         1971/1972                       920              89
              1972/1973                       371               2
              1973/1974                      1384             460
              1974/1975                      1046             217
              1975/1976                       857              89
              1981/1982                       416              10
              1982/1983                       528               7
              1983/1984                       547               8
Ngezi-North   1977/1978                      1047             156
              1978/1979                       730              64
              1981/1982                       430              12
              1982/1983                       395               1
              1983/1984                       436               4
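The logic behind the four basic tests in the hierarchical scheme can be summarized in a short sketch: the choice depends on whether the model is validated on the catchment it was calibrated on and whether climate conditions are stationary. The function and argument names are hypothetical, and the "M-" prefix for the modified tests follows the naming used in this study.

```python
def select_validation_test(same_catchment, stationary, short_record=False):
    """Return the name of the validation test for a given situation.

    same_catchment: True if validation data come from the calibration catchment.
    stationary:     True if climate conditions match the calibration period.
    short_record:   True if a short (about 1 year) runoff record is available
                    for calibration in an otherwise ungauged catchment.
    Hypothetical helper for illustration only.
    """
    if same_catchment:
        return "SS" if stationary else "DSS"
    name = "PB" if stationary else "PB-DSS"
    return "M-" + name if short_record else name


# The six tests of the present study, in the order they were carried out.
tests = [
    select_validation_test(True, True),          # Ngezi-South
    select_validation_test(False, True),         # Lundi, no calibration
    select_validation_test(False, True, True),   # Lundi, 1 year of runoff data
    select_validation_test(True, False),         # Lundi, wet to dry years
    select_validation_test(False, False),        # Ngezi-North, no calibration
    select_validation_test(False, False, True),  # Ngezi-North, 1 year of data
]
```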

Evaluation Criteria 

For measuring the performance of the models for each test, 

a standard set of criteria has been defined. The criteria have 

been designed with the sole purpose of measuring how closely 

the simulated series of daily flows agree with the measured 

series. Owing to the generalized nature of the defined model 

validations, it has been necessary to introduce several criteria 

for measuring the performance with regard to water balance, 

low flows, and peak flows. 

The standard set of performance criteria comprises a combination 

of the following four graphical plots and three numerical 

measures: (1) joint plots of the simulated and observed 

hydrographs; (2) scatter diagram of monthly runoffs; (3) flow 

duration curves; (4) scatter diagram of annual maximum discharges; 

(5) overall water balance; (6) the Nash-Sutcliffe coefficient 

(R2); and (7) an index (EI) measuring the agreement 

between the simulated and observed flow duration curves. 

The coefficient R2, introduced by Nash and Sutcliffe [1970], 

is computed on the basis of the sequence of observed and 

simulated monthly flows over the whole testing period (perfect 

agreement for R2 is 1): 

$$R2 = 1 - \frac{\sum_{m=1}^{M} \left( Q_o^m - Q_s^m \right)^2}{\sum_{m=1}^{M} \left( Q_o^m - \bar{Q}_o \right)^2}$$

where

$M$ total number of months;

$Q_s^m$ simulated monthly flows;

$Q_o^m$ observed monthly flows;

$\bar{Q}_o$ average observed monthly flows over whole period.
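As a minimal sketch, the coefficient R2 defined above can be computed from two equally long series of monthly flows. The function name and the input checks are illustrative assumptions and are not taken from the study.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe coefficient R2 for two series of monthly flows.

    Returns 1 for perfect agreement and 0 when the simulation is no better
    than the mean of the observations. Hypothetical helper for illustration.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed flows are constant; R2 is undefined")
    return 1.0 - residual / variance
```

A simulation that always returns the observed mean scores 0, which is why values of about 0.8 or greater indicate a good representation of the monthly flows.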

The flow duration curve error index, EI, provides a numerical 

measure of the difference between the flow duration curves 

of simulated and observed daily flows (perfect agreement for 

EI is 1): 

$$EI = 1 - \frac{\int \left| f_o(q) - f_s(q) \right| \, dq}{\int f_o(q) \, dq}$$

where $f_o(q)$ is the flow duration curve based on observed daily
flows, and $f_s(q)$ is the flow duration curve based on simulated
daily flows.
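A discrete version of the EI index can be sketched as follows. The sketch assumes that both flow duration curves are obtained by ranking equally long daily series in descending order, so that the integrals reduce to sums over ranks, and that the difference between the curves is taken in absolute value. The function name is hypothetical and the discretization is an assumption, not the procedure documented in the study.

```python
def flow_duration_error_index(observed, simulated):
    """Flow duration curve error index EI for two daily flow series.

    The flow duration curves are approximated by the flows sorted in
    descending order. Hypothetical helper for illustration only.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    fdc_obs = sorted(observed, reverse=True)
    fdc_sim = sorted(simulated, reverse=True)
    total_obs = sum(fdc_obs)
    if total_obs == 0:
        raise ValueError("observed flows sum to zero; EI is undefined")
    abs_diff = sum(abs(o - s) for o, s in zip(fdc_obs, fdc_sim))
    return 1.0 - abs_diff / total_obs
```

Because the series are ranked before comparison, EI is insensitive to timing errors: a simulation with the right flows on the wrong days still scores 1, which is why EI complements rather than replaces R2.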

Table 2. Land-Use Vegetation and Soil Characteristics Estimated From Available
Information and a Brief Field Visit

Characteristic                                   Ngezi-South   Lundi    Ngezi-North
Land use/vegetation (area %)
  Dense/closed woody vegetation                  7             13       10
  Open woody vegetation                          36            25       35
  Sparse woody vegetation                        14            19       14
  Grassland                                      11            39       16
  Cropland                                       29            3        19
  Abandoned cropland                             2             0        6
  Rock outcrops                                  1             0        0
Soil depth range, m                              0–2.5         0–1      0.5–6
Saturated hydraulic conductivity in root
zone soil, mm/hr
  range                                          1–250         1–70     2–100
  average                                        80            60       50
Available water content in root zone soil,
vol %
  range                                          10–14         10–12    9–29
  average                                        12            11       17

Figure 6. Model validation test schemes.

Model Construction, Calibration, and Application 

All models have had access to the same hydrometeorological 

data and catchment information at any time. Due to the nature 

of the different models, however, the WATBAL and SHE have 

been able to make more direct use of the available information 

than the NAM. 

In this respect, the NAM has disregarded the spatial variation 

of rainfall and used the catchment average series as input, 

and for the simulation of ungauged catchments, a subjective 

evaluation of catchment characteristics has been undertaken 

for estimation of the appropriate model parameters. On the 

other hand, the WATBAL and SHE have attempted to account 

for the spatial variability of rainfalls as well as information 

on typical storm durations to convert daily rainfall series 

to realistic hourly rainfalls. Furthermore, these models have 

directly used the available information on the spatial variation 

of topography and soil and vegetation types and their characteristics 

for model setup and estimation of appropriate model 

parameters. 
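The conversion of daily rainfall to hourly values is not specified in detail here. Purely as an illustration of the idea, the sketch below spreads a daily total uniformly over an assumed storm duration. The function name, the uniform intensity, the 3-hour duration, and the afternoon start hour are all assumptions and do not describe the procedure used with WATBAL or MIKE SHE.

```python
def disaggregate_daily_rainfall(daily_mm, storm_duration_h=3, start_hour=15):
    """Spread a daily rainfall total over a single storm within the day.

    Returns 24 hourly values (mm) that sum to the daily total.
    Hypothetical helper; duration and start hour are assumed values.
    """
    if not 1 <= storm_duration_h <= 24:
        raise ValueError("storm duration must be between 1 and 24 hours")
    if not 0 <= start_hour <= 24 - storm_duration_h:
        raise ValueError("storm must fit within the day")
    hourly = [0.0] * 24
    for hour in range(start_hour, start_hour + storm_duration_h):
        hourly[hour] = daily_mm / storm_duration_h
    return hourly


hourly = disaggregate_daily_rainfall(30.0)
```

Concentrating the daily total in a few hours gives much higher intensities than a uniform 24-hour spread, which matters for models that simulate infiltration excess.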

As an illustration of the differences in model complexity and 

the different abilities of the three modeling systems to utilize 

the available distributed catchment data, some key facts for the 

three model applications to the 1090 km² Ngezi-South catchment

are given in the following three paragraphs. 

The NAM model considered the entire catchment as one 

unit, utilized only catchment areal rainfall, and initially disregarded 

information on soil, vegetation, and geology. Such information 

was subsequently used on a subjective basis for assessing 

likely parameter values in the PB tests on the other two 

catchments. During the model calibrations (when allowed) the 

values of the 10 parameters were assessed. 

The WATBAL model was established on the basis of six 

meteorological zones, eight soil types, and 11 vegetation types. 

The spatial occurrences of these three features resulted in 129 

hydrological response units. During the model calibrations 

(when allowed) parameter values reflecting root depths, soil 

water retention capacity, soil hydraulic conductivities, and time 

constants in subsurface flow routing were adjusted. 
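Six meteorological zones, eight soil types, and 11 vegetation types could in principle combine into 6 × 8 × 11 = 528 classes, of which only those that actually occur in the catchment become response units. The sketch below shows one way such units could be derived by overlaying the three classifications cell by cell. The function name and the small example overlay are hypothetical and are not the WATBAL setup procedure.

```python
from collections import Counter


def hydrological_response_units(cells):
    """Count map cells per (meteorological zone, soil type, vegetation type).

    Each distinct combination that occurs defines one response unit; the
    count gives its relative area. Hypothetical helper for illustration.
    """
    return Counter(cells)


# Hypothetical overlay of five map cells.
cells = [
    (1, "sand", "grassland"),
    (1, "sand", "grassland"),
    (1, "clay", "cropland"),
    (2, "sand", "grassland"),
    (2, "clay", "open woodland"),
]
units = hydrological_response_units(cells)
number_of_units = len(units)
```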

The MIKE SHE also distributed the rainfall information to 

different inputs in six meteorological zones. Information on 

topography, soil, vegetation, and geology was distributed to a

1-km grid. Thus MIKE SHE carried out calculations at 1090 

horizontal grid points. During the model calibrations (when 

allowed) parameter values reflecting soil depth and maximum 

root depths, as well as an empirical drainage time constant, 

were adjusted. In order to minimize the calibration work the 

parameter values were not varied within all 1090 grid points, 

but kept identical within each of the 13 land-use classes. In 

general, the parameters for which field data were available, 

such as soil water retention curves and leaf area index, were 

not modified during the calibration process. 

The present study has aimed at testing various types of 

general modeling systems. However, it should be emphasized 

that validation results are not solely dependent on the modeling 

system but, indeed, also depend on the hydrologist operating 

the model, including his or her personal interpretation of 

available information and subjective assessments. In the 

present study this element of uncertainty has been minimized
to the extent possible by assigning each modeling system to a
hydrologist with comprehensive experience in its application
and by providing all three hydrologists
with the same catchment data.

The calibration procedure adopted was that of “trial and 

error,” implying that the hydrologists made subjective adjustments 

of parameter values in between the calibration runs. The 

numerical and graphical performance criteria described above 

were used as important guidance for the hydrologists when 

deciding upon the set of parameter values which they assessed 

to be the optimal ones. As these decisions inevitably depend on 

the personal experiences and judgments of the hydrologists, it 

may be argued that this procedure adds an undesirable degree 

of subjectivity to the results. However, given the large number 

of performance criteria and the large number of adjustable 

parameters, especially in the WATBAL and MIKE SHE models, 

suitable and well-proven automatic parameter optimization 

techniques did not exist. Instead, by applying the standard 

calibration procedure by which the three hydrologists had comprehensive 

experience, the results may be seen as typical results 

from three different modeling systems, when using standard 

engineering procedures for data collection, model 

construction, and calibration. 

Results of Model Validation Test Scheme 

The results of the six tests outlined in Figure 6 are summarized
in Figure 7, which shows the overall water balances and
the R2 and EI numerical criteria. Simulated and observed
hydrographs are shown in Figure 8 for two of the tests from the
Lundi and Ngezi-North catchments. Annual water balances
are shown for all the tests in Figures 9–15. Assessments of
uncertainties in the PB predictions are shown in Figures 16 and
17. Note that the different performance criteria presented in
the figures focus on different aspects, such as overall annual
water balances (Figures 9–17), monthly flows (R2 in Figure 7),
flow pattern on a daily basis (EI in Figure 7) and hydrograph
shapes (Figure 8). The results are discussed test by test in the
following sections.

Figure 7. Summary of key validation results for all tests.

SS Test 

This test is based on data from Ngezi-South and comprises 

an initial calibration of the models and a subsequent validation 

using data for an independent period. As indicated in Figures 

7, 9, and 10 the performances of the three models are very 

similar. All models are able to provide a close fit to the recorded 

flows for the calibration period, while for the independent 

validation period the performance is somewhat reduced, 

as expected. The reduction is, however, limited, and all models 

are able to maintain a very good representation of the overall 

water balance and the interannual and seasonal variations, as 

well as the general flow pattern. 

PB Test 

This test comprises a transfer of models to the Lundi catchment, 

adjustment of parameters to reflect the prevailing catchment 

characteristics, and validation without any calibration. 

The PB test was arranged to test the capability of the different 

models to represent runoff from an ungauged catchment area, 

and hence no calibration was allowed prior to the simulation. 

All models have used the experience from the Ngezi-South 

calibrations in combination with the available information on 

the particular catchment characteristics for Lundi. While the 

NAM model has used this information in a purely subjective 

manner to revise model parameters, both the WATBAL and 

MIKE SHE models have directly used this information for the 

model setup. The estimates prepared by the latter two models 

have, however, also been influenced by the individual modelers’ 

subjective interpretation of the available information on 

soil and vegetation characteristics. 

In order to assess the effects of the uncertainty in parameter 

estimation as perceived by the individual modelers, three alternative 

runoff simulations were prepared, reflecting expected 

low, central, and high (runoff) estimates, respectively. The results 

of the central estimates are included in Figures 7, 8a, and 

11, while annual runoff figures for the assessed uncertainty 

intervals are shown in Figure 16. 

In general, all models provide an excellent representation of 

the general flow pattern and the overall water balance, while 

maintaining the significant interannual variability to a satisfactory 

degree. The predicted hydrographs for the rainy season of 

1973/1974, shown in Figure 8a, confirm that the overall hydrograph 

pattern is predicted quite well by all three models. 

The overall performance of the central estimates by the 

NAM and MIKE SHE models is somewhat reduced compared 

to validation runs for the Ngezi-South catchment as expected 

when no calibration is possible. The estimates would, however, 

still be very valuable for all practical purposes. For the 

WATBAL model, the central estimate is even better than 

that obtained for the validation period for Ngezi-South, providing

a very accurate representation of the observed runoff record.

From Figure 16 it appears that the assessed uncertainty 

interval for the NAM predictions of annual runoff is about 

twice as wide as for the WATBAL and MIKE SHE predictions. 

M-PB Test 

This test is based on the same data from Lundi as the above 

PB test. The M-PB test was undertaken to evaluate whether 

better model performance could be obtained should short-term
measurements be available for calibration. Hence, before
the results of the previous test were revealed, 1 year (1975/1976)
of runoff record was released for calibration, and the PB
test was repeated. The main results of this test are summarized in

Figure 7, and annual water balances are shown in Figure 12. 

For the NAM model the short-term calibration leads to an 

improved performance, decreasing the deviation of the overall 

water balance to some 15%. At the same time, the statistics of 

R2 and EI confirm the good representation of monthly flows 

and the overall flow pattern in general. 

For the WATBAL model the short-term calibration introduces 

only a slight improvement in the overall performance. 

This is thought to be due to the originally very
good performance, which in any case would be difficult to
improve. The main benefit of the short runoff record is in this
case primarily to confirm the validity of the central estimate
and hence to reduce the uncertainty related to the final runoff
estimate. In this sense the calibration has proven quite valuable
and would indeed be so in any practical case.

For the MIKE SHE model the calibration has not introduced
any improvement in the overall performance. As compared
to the best of the original estimates (i.e., the low case)
the calibration has in fact caused a deterioration of the performance.
This rather unfortunate incident may occur for all
types of models when calibration data are not fully consistent,
but it appears that the SHE type of model requires a greater
reliability of input data than other, simpler types of models
to avoid the pitfall of miscalibration.

Figure 8. (a) Lundi (central estimates) proxy-basin (PB) test hydrographs for 1973/1974. (b) Ngezi-North
(central estimates) proxy-basin differential split-sample (PB-DSS) test hydrographs for 1977/1978.

Figure 9. Annual water balances for the calibration part of
the SS test on Ngezi-South catchment.

Figure 10. Annual water balances for the validation part of
the SS test on Ngezi-South catchment.

Figure 11. Annual water balances for PB test on Lundi
catchment.

Figure 13. Annual water balances for differential split-sample
(DSS) test on Lundi catchment.

DSS Test 

This test consists of model calibrations based on data from 

Lundi for 4 wet years (1971/1972–1975/1976 with mean annual 

runoff of 171 mm) and validation on data from 3 very dry years 

(1981/1982–1983/1984 with mean annual runoff of 8 mm). The 

purpose of this test is to assess the capability of the models to 

do simulations under nonstationary climate conditions. A summary 

of the main results of the differential SS tests is given in 

Figure 7, and the annual water balances are shown in Figure 

13. 
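The two mean annual runoff values quoted above can be checked against the Lundi rows of Table 1. The sketch assumes that the calibration mean is taken over all five hydrological years listed from 1971/1972 to 1975/1976 and that the fourth Lundi row refers to 1974/1975; the variable and function names are hypothetical.

```python
# Annual runoff (mm/yr) for the Lundi catchment, from Table 1.
lundi_runoff_mm = {
    "1971/1972": 89,
    "1972/1973": 2,
    "1973/1974": 460,
    "1974/1975": 217,  # assumed label for the fourth Lundi row
    "1975/1976": 89,
    "1981/1982": 10,
    "1982/1983": 7,
    "1983/1984": 8,
}

calibration_years = ["1971/1972", "1972/1973", "1973/1974",
                     "1974/1975", "1975/1976"]
validation_years = ["1981/1982", "1982/1983", "1983/1984"]


def mean_runoff(years):
    """Mean annual runoff (mm/yr) over the given hydrological years."""
    return sum(lundi_runoff_mm[y] for y in years) / len(years)


calibration_mean = mean_runoff(calibration_years)  # 857 / 5
validation_mean = mean_runoff(validation_years)    # 25 / 3
```

Rounded to whole millimetres these give 171 mm and 8 mm, a ratio of roughly 20 between the calibration and validation periods, which illustrates how severe the differential test is.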

As is evident from the results, both NAM and MIKE SHE 

predict the water balance well. The WATBAL model, however, 

grossly overestimates the peaks in the relative sense, 

causing the simulated average runoff to be about twice that 

measured (15 mm compared to 8 mm). The related statistics 

are poorer than those in the other testing schemes, but it 

should be noted that even small deviations cause poor statistics 

when mean flows are as low as those in this case. 

PB-DSS Test 

This test is based on data from the third catchment, Ngezi-North.
Without allowing for any prior calibration, all modelers

were requested to prepare low, central, and high estimates of 

the expected series of flows for the 1977/1978–1983/1984 period. 

This period contained a sequence of mainly wet years 

(1977/1978–1980/1981) followed by 3 consecutive dry years, 

with rainfalls being less than half of that experienced in the 

former period. 

At the stage when the measured flow record was revealed, it
was unfortunately discovered that the record for the
1979/1980–1980/1981 years was erroneous and hence had to be
disregarded when computing the test statistics. The results of
this test are summarized in Figure 7, while the annual water
balances are shown in Figure 14. The assessed uncertainty
intervals of the model-predicted annual runoff are shown in
Figure 17.

Figure 12. Annual water balances for modified proxy-basin
(M-PB) test on Lundi catchment.

Figure 14. Annual water balances for proxy-basin differential
split-sample (PB-DSS) test on Ngezi-North catchment.

Figure 15. Annual water balances for modified proxy-basin
differential split-sample (M-PB-DSS) test on Ngezi-North
catchment.

Figure 17. Assessments of uncertainty interval for prediction
of annual water balances in the PB-DSS test on Ngezi-North
catchment.

From Figure 17 it appears that all models have managed to 

provide for a nonbiased range of estimates of the overall water 

balance, which for some models is quite narrow: NAM, 50%; 

WATBAL, 30%; and MIKE SHE, 10%. In terms of the 

overall water balance, the central estimates of the models 

agree within 25% (NAM), 5% (WATBAL), and 2% (MIKE 

SHE). The agreement between the recorded and simulated 

monthly flows and the flow duration curves, however, is less 

accurate for NAM and MIKE SHE than for the WATBAL 

model, which provides for an excellent fit in terms of these 

measures. The reason for the somewhat lower R2 and EI 

figures for the NAM model is related to its generally less 

accurate prediction of flows, while for the MIKE SHE model 

this is directly linked to the erroneous assessment of a key 

drainage parameter, causing the model to produce much more 

base flow than actually exists.

Hydrographs showing measured discharge and predictions
by the three models for the rainy season of 1977/1978 are
presented in Figure 8b. These graphs confirm the conclusions
derived from the numerical criteria R2 and EI, namely, that
the WATBAL reproduces the observed hydrograph very well,
while the daily hydrograph for MIKE SHE reveals major errors
in overall flow pattern. Note that the model which produces
the best overall water balance (MIKE SHE) has at the
same time the poorest fit when compared on daily values.

Figure 16. Assessments of uncertainty interval for prediction
of annual water balances in the PB test on Lundi catchment.

M-PB-DSS Test 

This test is based on the same data from Ngezi-North as the 

previous PB-DSS test. All models were calibrated on only
1 year of data (1977/1978) before the results for the other
years were revealed, and the above test was then repeated. The
main results of the modified test are shown in Figures 7 and 15.

These results clearly demonstrate that access to only 1 year of 

runoff data has enabled all models to provide an excellent 

representation of the runoff within the entire testing period. 

The overall water balance agrees within 7% for all models,

and despite the fact that the calibration was based on a wet

year, annual flows for the dry period come within the right 

order of magnitude, although the relative deviation in some 

cases is quite significant. The high R2 and EI scores achieved 

by all models confirm that the representation of the monthly 

flow sequence and the overall flow pattern has become very 

good after the calibration. 

Discussion and Conclusions 

The three generalized modeling systems, NAM, WATBAL, 

and MIKE SHE, have been subject to a rigorous testing 

scheme on data from three Zimbabwean catchments. NAM is 

a typical representative for the lumped conceptual class of 

models, while MIKE SHE similarly belongs to the distributed 

physically based class. WATBAL falls between the two classes. 

However, for the specific applications in Zimbabwe, where 

surface water hydrological aspects have been dominant, it can

be argued that WATBAL can be considered as another representative 

of the distributed physically based class. 

Although establishing an objective framework for the model 

tests and intercomparisons has been attempted, it should be 

recognized that the results of a certain validation will be influenced 

by the specific test conditions, including the particular 

climate, catchment characteristics, data availability, and quality 

as well as subjective assessments made by the user (e.g., interpretation 

of available information for determining model parameters). 

Hence the obtained results are not only a function



of the modeling system itself, but also of the user and numerous 

other factors. To arrive at a firm conclusion many validations
would usually be required, and given the limited number of
tests undertaken, conclusions from individual results should
be drawn only with caution.

With this caution regarding generality in mind, a number of 

specific conclusions may be derived from the case study. First, 

in view of the difficult tasks given to the models involving 

simulation for ungauged catchments and nonstationary time 

periods, the overall performance of the models is considered 

quite impressive. The overall water balance agrees within 

25% in all cases but one, and good results are achieved 

without balancing out excessive positive and negative deviations 

within individual years. In most cases the models score an 

R2 value at about 0.8 or greater and an EI index generally 

above 0.7. 

Secondly, the following is noted with regard to the specific 

types of validation tests:

1. For the SS test the NAM, WATBAL, and MIKE SHE 

systems generally exhibit similar performance. All models are 

able to provide a close fit to the recorded flows for the calibration 

period, without severely reducing the performance 

during the independent validation period. Hence this test suggests 

that if an adequate runoff record covering a few (3–5) years

exists, any of the modeling systems could be used as a reliable 

tool for filling in gaps in such records or used to extend runoff 

series based on long-term rainfall series. Considering the data 

requirements and efforts involved in the setup of the different 

models, however, a simple model of the NAM type should 

generally be selected for such tasks. 

2. For the PB tests, designed for validating the capability of 

the models to represent flow series of ungauged catchments, it 

had been expected that the physically based models would 

produce better results than the simple type of models. The 

results, however, do not provide unambiguous support for this 

hypothesis. All three modeling systems generated good results, 

with the WATBAL providing slightly more accurate results 

than the others. Hence for the Zimbabwean conditions the 

additional capabilities of the MIKE SHE, as compared to the 

WATBAL, namely, the distributed physically based features 

relating to subsurface flow, proved to be of little value in 

simulating the water balance. For the PB tests it is noticed that 

the uncertainty range represented by the low and high estimates 

is significantly larger for the NAM than for the WATBAL 

and MIKE SHE cases. This probably reflects the fact that 

parameter estimation for ungauged catchments is generally 

more uncertain for the NAM, whose parameters are semiempirical 

coefficients without direct links to catchment characteristics. 

3. A general experience of the M-PB tests is that allowing 

for model calibration based on only 1 year of runoff data 

improves the overall performance of all models. The improvement 

appears to be particularly significant for the NAM model, 

which also showed the largest uncertainties in the cases where 

no calibration was possible. 

4. For the DSS tests all models have been able to simulate 

flows of the right order of magnitude and correct pattern. 

Hence all models have proven their ability to simulate the 

runoff pattern in periods with much reduced rainfall and runoff 

as compared to the calibration period. On the basis of these 

results there appears no immediate justification for using an 

advanced type of model to represent flows following a significant 

change of rainfall, provided a number of years are available 

for calibration purposes. It is tempting to extend this 

finding to suggest that the simple type of model could be used 

to assess the impact of climate change on water resources. It 

should be recognized, however, that the above results cannot fully 

justify such a hypothesis, since a long-term climate change 

would probably bring about changes in vegetation and the associated 

evaporation. This type of nonstationarity has not been adequately 

tested. 

As far as the SS tests are concerned the above conclusion is 

in full agreement with results of other studies [e.g., Michaud 

and Sorooshian, 1994]. With regard to the PB tests the present 

conclusion in favor of the distributed physically based modeling 

systems is in agreement with, albeit more vague than, that 

of Michaud and Sorooshian [1994]. 

In summary, the present study, as well as similar studies 

reported in literature, suggests the following conclusions with 

regard to rainfall runoff modeling. 

1. Given a few (1–3) years of runoff measurements, a 

lumped model of the NAM type would be a suitable tool from 

the point of view of technical and economical feasibility. This 

applies for catchments with homogeneous climatic input as 

well as cases where significant variations in the exogenous 

input are encountered. 

2. For ungauged catchments, however, where accurate 

simulations are critical for water resources decisions, a distributed 

model is expected to give better results than a lumped 

model if appropriate information on catchment characteristics 

can be obtained. 

Acknowledgments. The modeling work on the Zimbabwe catchments 

was carried out by our colleagues Børge Storm and Merete 

Styczen (MIKE SHE) and Roar Jensen (NAM), while the second 

author was responsible for the WATBAL work. During the data collection 

and field reconnaissance in Zimbabwe, kind help and assistance 

was provided by University of Zimbabwe; National Herbarium; and 

Department of Meteorological Services and Hydrological Branch, 

Ministry of Energy, Water Resources and Development. The study was 

carried out with financial support from the Danish Council of Technology, 

and the paper preparation was supported by the Danish Technical 

Research Council. 

References 

Abbott, M. B., J. C. Bathurst, J. A. Cunge, P. E. O’Connell, and J. 

Rasmussen, An introduction to the European Hydrological System—Système 

Hydrologique Européen, “SHE,” 1, History and philosophy 

of a physically based distributed modelling system, J. Hydrol., 

87, 45–59, 1986a. 

Abbott, M. B., J. C. Bathurst, J. A. Cunge, P. E. O’Connell, and J. 

Rasmussen, An introduction to the European Hydrological System—Système 

Hydrologique Européen “SHE,” 2, Structure of a 

physically based distributed modelling system, J. Hydrol., 87, 61–77, 

1986b. 

Anderson, J., Communal land physical resource inventory, Mhondoro 

and Ngezi, Draft Rep. A 551, Chem. and Soil Res. Inst., Minist. of 

Agric., Harare, Zimbabwe, 1989. 

Beven, K. J., Changing ideas in hydrology—The case of physically 

based models, J. Hydrol., 105, 157–172, 1989. 

Danish Hydraulic Institute (DHI), Validation of hydrological models, 

Phase II, Hørsholm, 1993a. 

Danish Hydraulic Institute (DHI), MIKE SHE WM, short description, 

1993b. 

Danish Hydraulic Institute (DHI), MIKE11 short description, 1994. 

Flavelle, P., A quantitative measure of model validation and its potential 

use for regulatory purposes, Adv. Water Resour., 15, 5–13, 1992. 

Franchini, M., and M. Pacciani, Comparative analysis of several conceptual 

rainfall-runoff models, J. Hydrol., 122, 161–219, 1991. 

Grayson, R. B., I. D. Moore, and T. A. McMahon, Physically based 

hydrologic modeling, 2, Is the concept realistic, Water Resour. Res., 

28(10), 2659–2666, 1992. 

Jain, S. K., B. Storm, J. C. Bathurst, J. C. Refsgaard, and R. D. Singh, 

Application of the SHE to catchments in India, 2, Field experiments 

and simulation studies with the SHE on the Kolar subbasin to the 

Narmada River, J. Hydrol., 140, 25–47, 1992. 

Klemes, V., Sensitivity of water resources systems to climate variations, 

WCP Rep. 98, World Meteorological Organisation, Geneva, 1985. 

Klemes, V., Operational testing of hydrological simulation models, 

Hydrol. Sci. J., 31(1), 13–24, 1986. 

Knudsen, J., A. Thomsen, and J. C. Refsgaard, WATBAL: A semidistributed, 

physically based hydrological modelling system, Nordic 

Hydrol., 17, 347–362, 1986. 

Loague, K. M., and R. A. Freeze, A comparison of rainfall-runoff 

modeling techniques on small upland catchments, Water Resour. 

Res., 21(2), 229–248, 1985. 

Michaud, J., and S. Sorooshian, Comparison of simple versus complex 

distributed runoff models on a midsized semiarid watershed, Water 

Resour. Res., 30(3), 593–605, 1994. 

Naef, F., Can we model the rainfall-runoff process today, Hydrol. Sci. 

Bull., 26(3), 281–289, 1981. 

Nash, J. E., and J. V. Sutcliffe, River flow forecasting through conceptual 

models, I, J. Hydrol., 10, 282–290, 1970. 

Nielsen, S. A., and Bari, Simulation of runoff from ungauged catchments 

by a semi-distributed hydrological modelling system, Proceedings, 

6th IAHR Congress, Int. Assoc. for Hydraul. Res., Delft, Netherlands, 

1988. 

Nielsen, S. A., and E. Hansen, Numerical simulation of the rainfall-runoff 

process on a daily basis, Nordic Hydrol., 4, 171–190, 1973. 

Refsgaard, J. C., Model and data requirements for simulation of runoff 

and land surface processes, in Proceedings from NATO Advanced 

Research Workshop “Global Environmental Change and Land Surface 

Processes in Hydrology: The Trials and Tribulations of Modelling and 

Measuring,” Tucson, May 17–21, 1993, edited by S. Sorooshian and 

V. K. Gupta, Springer-Verlag, New York, 1996. 

Refsgaard, J. C., and B. Storm, MIKE SHE, in Computer Models of 

Watershed Hydrology, edited by V. P. Singh, pp. 809–846, Water 

Resour. Publ., Littleton, Colo., 1995. 

Refsgaard, J. C., S. M. Seth, J. C. Bathurst, M. Erlich, B. Storm, G. H. 

Jørgensen, and S. Chandra, Application of the SHE to catchments in 

India, 1, General results, J. Hydrol., 140, 1–23, 1992. 

Schlesinger, S., R. E. Crosbie, R. E. Gagné, G. S. Innis, C. S. Lalwani, 

J. Loch, J. Sylvester, R. D. Wright, N. Kheir, and D. Bartos, Terminology 

for model credibility, Simulation, 32(3), 103–104, 1979. 

Smith, R. E., D. R. Goodrich, D. A. Woolhiser, and J. R. Simanton, 

Comment on “Physically based modeling, 2, Is the concept realistic” 

by R. B. Grayson, I. D. Moore, and T. A. McMahon, Water 

Resour. Res., 30(3), 851–854, 1994. 

Timberlake, J., Brief description of the vegetation of Mhondoro and 

Ngezi communal lands, Mashonaland West, Natl. Herbarium, 

Harare, Zimbabwe, 1989. 

Tsang, C.-F., The modelling process and model validation, Ground 

Water, 29(6), 825–831, 1991. 

U.S. Committee, Task Committee on Quantifying Land-Use Change 

Effects, Evaluation of hydrological models used to quantify major 

land-use change effects, J. Irrig. Drain. Eng., 111(1), 1–17, 1985. 

Wilcox, B. P., W. J. Rawls, D. L. Brakensiek, and J. R. Wright, Predicting 

runoff from rangeland catchments: A comparison of two 

models, Water Resour. Res., 26(10), 2401–2410, 1990. 

World Meteorological Organization (WMO), Intercomparison of 

conceptual models used in operational hydrological forecasting, 

WMO Oper. Hydrol. Rep. 7, WMO 429, Geneva, 1975. 

World Meteorological Organization (WMO), Third planning meeting 

on World Climate Programme Water, WCP 114, WMO/TD 106, 

Geneva, 1985. 

World Meteorological Organization (WMO), Intercomparison of 

models for snowmelt runoff, WMO Oper. Hydrol. Rep. 23, WMO 646, 

Geneva, 1986. 

World Meteorological Organization (WMO), Simulated real-time intercomparison 

of hydrological models, WMO Oper. Hydrol. Rep. 38, 

WMO 779, Geneva, 1992. 

J. Knudsen and J. C. Refsgaard, Danish Hydraulic Institute, Agern 

Alle 5, DK-2970 Hørsholm, Denmark. 

(Received September 25, 1995; revised March 15, 1996; 

accepted March 20, 1996.)

[7] 

Refsgaard JC (1997) Parametrisation, calibration and validation of distributed 

hydrological models. 



[8] 

Refsgaard JC (1997) Validation and Intercomparison of Different Updating 

Procedures for Real-Time Forecasting. 



[9] 

Refsgaard JC, Sørensen HR, Mucha I, Rodak D, Hlavaty Z, Bansky L, 

Klucovska J, Topolska J, Takac J, Kosc V, Enggrob HG, Engesgaard P, 

Jensen JK, Fiselier J, Griffioen J, Hansen S (1998) An Integrated Model for 

the Danubian Lowland – Methodology and Applications. 

Water Resources Management, 12, 433-465. 

Reprinted from Water Resources Management with permission from Springer 

(www.springerlink.com)

Water Resources Management 12: 433–465, 1998. 

© 1998 Kluwer Academic Publishers. Printed in the Netherlands. 


An Integrated Model for the Danubian Lowland – 

Methodology and Applications 

J. C. REFSGAARD 1 , H. R. SØRENSEN 1 , I. MUCHA 2 , D. RODAK 2 , 

Z. HLAVATY 2 , L. BANSKY 2 , J. KLUCOVSKA 2 , J. TOPOLSKA 4 , J. TAKAC 3 , 

V. KOSC 3 , H. G. ENGGROB 1 , P. ENGESGAARD 5 , J. K. JENSEN 5 , 

J. FISELIER 6 , J. GRIFFIOEN 7 and S. HANSEN 8 

1 Danish Hydraulic Institute, Denmark 

2 Ground Water Consulting Ltd., Bratislava, Slovakia 

3 Irrigation Research Institute (VUZH), Bratislava, Slovakia 

4 Water Research Institute (VUVH), Bratislava, Slovakia 

5 Water Quality Institute (VKI), Denmark 

6 DHV Consultants BV, The Netherlands 

7 Netherlands Institute of Applied Geosciences TNO, The Netherlands 

8 Royal Veterinary and Agricultural University, Denmark 

(Received: 30 December 1997; in final form: 10 November 1998) 

Abstract. A unique integrated modelling system has been developed and applied for environmental 

assessment studies in connection with the Gabcikovo hydropower scheme along the Danube. 

The modelling system integrates model codes for describing the reservoir (2D flow, eutrophication, 

sediment transport), the river and river branches (1D flow including effects of hydraulic control structures, 

water quality, sediment transport), the ground water (3D flow, solute transport, geochemistry), 

agricultural aspects (crop yield, irrigation, nitrogen leaching) and flood plain conditions (dynamics 

of inundation pattern, ground water and soil moisture conditions, and water quality). The uniqueness 

of the established modelling system is the integration between the individual model codes, each of 

which provides complex descriptions of the various processes. The validation tests have generally 

been carried out for the individual models, whereas only a few tests on the integrated model were 

possible. Based on discussion and examples, it is concluded that the results from the integrated model 

can be assumed less uncertain than outputs from the individual model components. In an example, 

the impacts of the Gabcikovo scheme on the ecologically unique wetlands created by the river branch 

system downstream of the new reservoir have been simulated. In this case, the impacts of alternative 

water management scenarios on ecologically important factors such as flood frequency and duration, 

depth of flooding, depth to ground water table, capillary rise, flow velocities, sedimentation and water 

quality in the river system have been explicitly calculated. 

Key words: Danube, environmental impacts, floodplain, Gabcikovo, groundwater, hydropower, integrated 

modelling, river branch. 


Figure 1. The Danubian Lowland with the new reservoir and the Gabcikovo scheme. 


1.1. THE DANUBIAN LOWLAND AND THE GABCIKOVO HYDROPOWER SCHEME 

The Danubian Lowland (Figure 1) in Slovakia and Hungary between Bratislava and 

Komárno is an inland delta (an alluvial fan) formed in the past by river sediments 

from the Danube. The entire area forms an alluvial aquifer, which receives around 

30 m³ s⁻¹ infiltration water from the Danube throughout the year, in the upper parts 

of the area and returns it to the Danube and the drainage canals in the downstream 

part. The aquifer is an important water resource for municipal and agricultural 

water supply. 

Human influence has gradually changed the hydrological regime in the area. 

Construction of dams upstream of Bratislava together with straightening and embanking 

of the river for navigational and flood protection purposes as well as 

exploitation of river sediments have significantly deepened the river bed and lowered 

the water level in the river and surrounding ground water level. These changes 

have had a significant influence on the ground water regime as well as the sensitive 

riverine forests downstream of Bratislava. Despite this basically negative trend the 

floodplain area with its alluvial forests and associated ecosystems still represents a 

unique landscape of outstanding ecological importance. 

The Gabcikovo hydropower scheme was put into operation in 1992. A large 

number of hydraulic structures has been established as part of the hydropower 

scheme. The key structures are a system of weirs across the Danube at Cunovo 

15 km downstream of Bratislava, a reservoir created by the damming at Cunovo, a 

30 km long lined power and navigation canal, outside the floodplain area, parallel to 

the Danube River with intake to the hydropower plant, a hydropower plant and two


ship-locks at Gabcikovo, and an intake structure at Dobrohost, 10 km downstream 

of Cunovo, diverting water from the new canal to the river branch system. The 

entire scheme has significantly affected the hydrological regime and the ecosystem 

of the region, see, e.g., Mucha et al. (1997). The scheme was originally planned as 

a joint effort between former Czecho-Slovakia and Hungary, and the major parts of 

the construction were carried out as such on the basis of a 1977 international treaty. 

However, since 1989 Gabcikovo has been a major matter of controversy between 

Slovakia and Hungary, who have referred some disputed questions to international 

expert groups (EC, 1992, 1993a, b) and others to the International Court of Justice 

in The Hague (ICJ, 1997). 

Comprehensive monitoring and assessments of environmental impacts have been 

made, see Mucha (1995) for an overview. Since 1995 a joint Slovak-Hungarian 

monitoring program has been carried out (JAR, 1995, 1996, 1997). 

1.2. NEED FOR INTEGRATED MODELLING 

The hydrological regime in the area is very dynamic with so many crucial links 

and feedback mechanisms between the various parts of the surface- and subsurface 

water regimes that integrated modelling is required to thoroughly assess environmental 

impacts of the hydropower scheme. This is illustrated by the following three 

examples: 

• Ground water quality. Based on qualitative arguments it was hypothesised 

that the damming and creation of the reservoir might lead to changes in the 

oxidation-reduction state of the ground water. The reason for this is that the 

reservoir might increase infiltration from the Danube to the aquifer because of 

increased head gradients. On the other hand, fine sediment matter might accumulate 

on the reservoir bottom, thereby creating a reactive sediment layer. The 

river water infiltrating to the aquifer has to pass this layer, which might induce 

a change in the oxidation status of the infiltrating water. This could affect the 

quality of the ground water from being oxic or suboxic towards being anoxic, 

which is undesirable for Bratislava’s water works, most of which are located 

near the reservoir. Thus, the oxidation-reduction state of the groundwater is 

intimately linked to a balance between the rates of infiltrating reducing water 

and the aquifer oxidizing capacity. The infiltrating water is linked to the hydraulic 

behaviour of the reservoir: how large is the infiltration area and at which 

rates does the infiltration take place at different locations. However, without 

an integrated model it is not possible to quantify whether and under which 

conditions these mechanisms play a significant role in practice, whether they 

are correct in principle but without practical importance, and what measures 

should be realised. 

• Agricultural production. Changes in discharges in the Danube caused by diversion 

of some of the water through the power canal and creation of a reservoir 

would lead to changes in the ground water levels. As the agricultural crops 

depend on capillary rise from the shallow ground water table and irrigation, the 

new hydrological situation created by the damming of the Danube might influence 

the crop yield, the irrigation requirements and the nitrogen leaching. 

Traditional crop models describing the root zone are not sufficient in this case, 

because the lower boundary conditions (ground water levels) are changed in a 

way that can only be quantified if the reservoir, the river and canal system 

and the aquifer are also explicitly included in the modelling. 

• Floodplain ecosystem. The flora and fauna, which in the floodplain area are 

dominated by the river side branches, depend on many factors such as flooding 

dynamics, flow velocities, depth of ground water table, soil moisture, water 

quality and sediments. Also in this case the important factors depend on the 

interaction between the groundwater and the surface water systems (illustrated 

in Figure 2), and even on water quality and sediments in the surface water 

system, so that quantitative impact assessments require an integrated modelling 

approach. 

Figure 2. Important processes and their interactions with regard to floodplain hydrology. 

2. Integrated Modelling System 

2.1. INDIVIDUAL MODEL COMPONENTS 

Figure 3. Structure of the integrated modelling system with indication of the interactions 

between the individual models. 

An integrated modelling system (Figure 3) has been established by combining the 

following existing and well-proven model codes: 

• MIKE SHE (Refsgaard and Storm, 1995) which, on a catchment scale, can 

simulate the major flow and transport processes in the hydrological cycle: 

– 1-D flow and transport in the unsaturated zone 

– 3-D flow and transport in the ground water zone 

– 2-D flow and transport on the ground surface 

– 1-D flow and transport in the river. 

All of the above processes are fully coupled allowing for feedbacks and interactions 

between components. In addition, MIKE SHE includes modules for 

multi-component geochemical and biodegradation reactions in the saturated 

zone (Engesgaard, 1996). 

• MIKE 11 (Havnø et al., 1995) is a one-dimensional river modelling system. 

MIKE 11 is used for simulating hydraulics, sediment transport and morphology, 

and water quality. MIKE 11 is based on the complete dynamic wave 

formulation of the Saint Venant equations. The modules for sediment transport 

and morphology are able to deal with cohesive and noncohesive sediment 

transport, as well as the accompanying morphological changes of the river bed. 

The noncohesive model operates on a number of different grain sizes. 

• MIKE 21 (DHI, 1995), which has the same basic characteristics as MIKE 11, 

extended to two horizontal dimensions, and is used for reservoir modelling. 

• MIKE 11 and MIKE 21 include River/Reservoir Water Quality (WQ) and 

Eutrophication (EU) (Havnø et al., 1995; VKI, 1995) modules to describe oxygen, 

ammonium, nitrate and phosphorus concentrations and oxygen demands 

as well as eutrophication issues such as bio-mass production and degradation. 

• DAISY (Hansen et al., 1991) is a one-dimensional root zone model for simulation 

of soil water dynamics, crop growth and nitrogen dynamics for various 

agricultural management practices and strategies. 
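
For reference, the Saint Venant equations mentioned under MIKE 11 can be written in a commonly used one-dimensional form. The notation below is an assumption chosen for illustration and is not taken from this paper: $Q$ is discharge, $A$ flow area, $h$ water level, $q$ lateral inflow per unit length, $C$ the Chézy resistance coefficient, $R$ the hydraulic radius, $\alpha$ a momentum distribution coefficient and $g$ the acceleration of gravity.

```latex
% Continuity equation (lateral inflow q acts as a source/sink term)
\frac{\partial Q}{\partial x} + \frac{\partial A}{\partial t} = q

% Momentum equation (complete dynamic wave formulation)
\frac{\partial Q}{\partial t}
  + \frac{\partial}{\partial x}\left( \alpha \frac{Q^{2}}{A} \right)
  + g A \frac{\partial h}{\partial x}
  + \frac{g \, Q \, |Q|}{C^{2} A R} = 0
```

The term $q$ in the continuity equation is where exchange fluxes from a coupled catchment model would typically enter a river model of this type.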


2.2. INTEGRATION OF MODEL COMPONENTS 

The integrated modelling system is formed by the exchange of data and feedbacks 

between the individual modelling systems. The structure of the integrated 

modelling system and the exchange of data between the various modelling systems 

are illustrated in general in Figure 3 and the steps in the integrated modelling are 

described further in Section 6.2 and illustrated in Figure 10 for the case of flood 

plain modelling. The interfaces between the various models indicated in Figure 3 

are: 

A) MIKE SHE forms the core of the integrated modelling system having interfaces 

to all the individual modelling systems. The coupling of MIKE SHE and 

MIKE 11 is a fully dynamic coupling where data is exchanged within each 

computational time step, see Section 2.3 below. 

B) Results of eutrophication simulations with MIKE 21 in the reservoir are used 

to estimate the concentration of various water quality parameters in the water 

that enters the Danube downstream of the reservoir. This information serves as 

boundary conditions for water quality simulations for the Danube using MIKE 

11. 

C) Sediment transport simulations in the reservoir with MIKE 21 provide information 

on the amount of fine sediment on the bottom of the reservoir. The 

simulated grain size distribution and sediment layer thickness are used to calculate 

leakage coefficients, which are used in ground water modelling with MIKE 

SHE to calculate the exchange of water between the reservoir and the aquifer. 

D) The DAISY model simulates vegetation parameters which are used in MIKE 

SHE to simulate the actual evapotranspiration. Ground water levels simulated 

with MIKE SHE act as lower boundary conditions for DAISY unsaturated zone 

simulations. Consequently, this process is iterative and requires several model 

simulations. 

E) Results from water quality simulations with MIKE 11 and MIKE 21 provide 

estimates of the concentration of various components/parameters in the water 

that infiltrates to the aquifer from the Danube and the reservoir. This can be 

used in the ground water quality simulations (geochemistry) with MIKE SHE. 

A general discussion on the limitations in the above couplings is given in Section 7 

below. 
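
The iterative exchange described under item D can be sketched as a simple fixed-point loop. The Python example below is a minimal sketch under stated assumptions: the two toy functions, their coefficients and the convergence tolerance are hypothetical stand-ins and do not represent DAISY or MIKE SHE.

```python
def toy_crop_model(groundwater_depth_m):
    """Hypothetical root zone model: returns a leaf area index that
    decreases as the ground water table gets deeper."""
    return 4.0 - 0.5 * groundwater_depth_m


def toy_groundwater_model(leaf_area_index):
    """Hypothetical catchment model: more vegetation means more
    evapotranspiration and hence a deeper ground water table (m)."""
    return 1.0 + 0.2 * leaf_area_index


def iterate_coupling(crop_model, groundwater_model, initial_depth_m,
                     tolerance_m=0.01, max_iterations=50):
    """Alternate the two models until the ground water depth stabilises."""
    depth = initial_depth_m
    for iteration in range(1, max_iterations + 1):
        vegetation = crop_model(depth)              # lower boundary -> vegetation
        new_depth = groundwater_model(vegetation)   # vegetation -> ground water
        if abs(new_depth - depth) < tolerance_m:
            return new_depth, iteration
        depth = new_depth
    raise RuntimeError("coupling did not converge")


if __name__ == "__main__":
    depth, n = iterate_coupling(toy_crop_model, toy_groundwater_model, 2.0)
    print("converged depth (m):", depth, "after", n, "iterations")
```

In practice each call would be a full model simulation, which is why the text notes that several model runs are required.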

2.3. A COUPLING OF MIKE SHE AND MIKE 11 

The focus in MIKE SHE lies on catchment processes with a comparatively less 

advanced description of river processes. In contrast, MIKE 11 has a more advanced 

description of river processes and a simpler catchment description than MIKE 

SHE. Hence, for cases where full emphasis is needed for both river and catchment 

processes a coupling of the two modelling systems is required.


Figure 4. Principles of the coupling between the MIKE SHE catchment code and the MIKE 

11 river code. 

A full coupling between MIKE SHE and MIKE 11 has been developed (Figure 

4). In the combined modelling system, the simulation takes place simultaneously 

in MIKE 11 and MIKE SHE, and data transfer between the two models 

takes place through shared memory. MIKE 11 calculates water levels in rivers 

and floodplains. The calculated water levels are transferred to MIKE SHE, where 

flood depth and areal extent are mapped by comparing the calculated water levels 

with surface topographic information stored in MIKE SHE. Subsequently, MIKE 

SHE calculates water fluxes in the remaining part of the hydrological cycle. Exchange 

of water between MIKE 11 and MIKE SHE may occur due to evaporation 

from surface water, infiltration, overland flow or river-aquifer exchange. Finally, 

water fluxes calculated with MIKE SHE are exchanged with MIKE 11 through 

source/sink terms in the continuity part of the Saint Venant equations in MIKE 11. 
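
The flood mapping step, in which calculated water levels are compared with surface topography, can be illustrated with a minimal Python sketch. It assumes a single water level for one flood cell and a small regular grid; the function names, grid values and cell size are hypothetical and do not reflect the actual MIKE SHE implementation.

```python
def map_flood_depth(water_level_m, topography_m):
    """Flood depth per grid cell: water level minus ground elevation,
    set to zero where the ground lies above the water level."""
    return [[max(water_level_m - z, 0.0) for z in row] for row in topography_m]


def flooded_area(depths_m, cell_area_m2):
    """Areal extent of flooding: number of wet cells times the cell area."""
    wet_cells = sum(1 for row in depths_m for d in row if d > 0.0)
    return wet_cells * cell_area_m2


if __name__ == "__main__":
    # Hypothetical ground elevations (m) on a 2 x 2 grid of 100 m x 100 m cells.
    topography = [[100.0, 101.0],
                  [102.0, 99.5]]
    depths = map_flood_depth(101.0, topography)
    print("flood depths (m):", depths)
    print("flooded area (m2):", flooded_area(depths, 100.0 * 100.0))
```

The resulting depths and extent would then form the basis for computing infiltration, evaporation and other exchange fluxes that are returned to the river model as source/sink terms.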

The MIKE SHE–MIKE 11 coupling is crucial for a correct description of the 

dynamics of the river-aquifer interaction. Firstly, the river width is larger than 

one MIKE SHE grid cell, in which case the MIKE SHE river-aquifer description is 

no longer valid. Secondly, the river/reservoir system comprises a large number of 

hydraulic structures, the operation of which is accurately modelled in MIKE 11, 

but cannot be accounted for in MIKE SHE. Thirdly, the very complex river branch 

system with loops and flood cells needs a very efficient hydrodynamic formulation 

such as in MIKE 11. 


2.4. COMPARISON TO OTHER MODELLING SYSTEMS REPORTED IN 

LITERATURE 

Yan and Smith (1994) described the demand and outlined a concept for a fully 

integrated ground water–surface water modelling system including descriptions of 

hydraulic structures and agricultural irrigation as a decision support tool for water 

resources management in South Florida. Typical examples of integrated codes 

described in the literature are Menetti (1995) and Koncsos et al. (1995). 

In a review of recent advances in understanding the interaction of groundwater 

and surface water Winter (1995) mainly describes groundwater codes, such as 

MODFLOW, which have been expanded with some, but very limited, surface water 

simulation capabilities. The research activities are characterized as ‘... although 

studies of these systems have increased in recent years, this effort is minimal compared 

to what is needed’. Winter (1995) sees the prospects for the future as follows: 

‘Future studies of the interaction of groundwater and surface water would benefit 

from, and indeed should emphasise, interdisciplinary approaches. Physical hydrologists, 

geochemists, and biologists have a great deal to learn from each other, and 

contribute to each other, from joint studies of the interface between groundwater 

and surface water.’ 

Integrated three-dimensional descriptions of flow, transport and geochemical 

processes are still rarely seen for groundwater modelling of large basins. Thus, 

according to a recent review of basin-scale hydrogeological modelling (Person 

et al., 1996) most of the existing reactive transport model codes are based on 

one-dimensional descriptions. 

While many model codes contain a distributed physically-based representation 

of one of the three main components: ground water, unsaturated zone, and surface 

water systems, only few codes provide a fully integrated description of all 

these three main components. For example in an up-to-date book (Singh, 1995) 

presenting descriptions of 25 hydrological codes only three codes, SHE/SHESED 

(Bathurst et al., 1995), IHDM (Calver and Wood, 1995) and MIKE SHE (Refsgaard 

and Storm, 1995) provide such integrated descriptions. Among these three 

codes only MIKE SHE has capabilities for modelling advection-dispersion and 

water quality. None of the three codes contained options for computations of hydraulic 

structures in river systems, nor agricultural modelling such as crop yield 

and nitrogen leaching. 

The individual components of the integrated modelling system presented in this 

paper, we believe, represent state-of-the-art within their respective disciplines. The 

uniqueness is the full integration. 

3. Methodology for Model Construction, Calibration, Validation and 

Application 

The terminology and methodology used in the following is based on the concepts 

outlined in Refsgaard (1997).


3.1. MODEL CONSTRUCTION 

All of the applied models are based on distributed physically-based model codes. 

This implies that most of the required input data and model parameters can ideally 

be measured directly in nature. 

3.2. MODEL CALIBRATION 

The calibration of a physically-based model implies that simulation runs are carried 

out and model results are compared with measured data. The adopted calibration 

procedure was based on ‘trial and error’ implying that the model user in between 

calibration runs made subjective adjustments of parameter values within physically 

realistic limits. The most important guidance for the model user in this process was 

graphical display of model results against measured values. It may be argued that 

such a manual procedure adds a degree of subjectivity to the results. However, given 

the very complex and integrated modelling focusing on a variety of output results 

and containing a large number of adjustable parameters, automatic parameter optimisation 

is not yet possible and ‘trial and error’ remains the only feasible 

method in practice. 

3.3. MODEL VALIDATION 

Good model results during a calibration process cannot automatically ensure that 

the model can perform equally well for other time periods, because the 

calibration process involves some manipulation of parameter values. Therefore, 

model validations based on independent data sets are required. To the extent possible, 

limited by data availability, the models have been validated by demonstrating 

the ability to reproduce measured data for a period outside the calibration period, 

using a so-called split-sample test (Klemes, 1986). For some of the models, the 

model was even calibrated on pre-dam conditions and validated on post-dam conditions, 

where the flow regime at some locations was significantly altered due to 

the construction of the reservoir and related hydraulic structures and canals. 
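
The principle of a split-sample test can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions: it uses a hypothetical one-parameter runoff model (flow proportional to rainfall) fitted by least squares instead of the manual calibration described above, and the data values are invented.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe coefficient used here as the goodness-of-fit measure."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def calibrate_runoff_coefficient(rain, flow):
    """Least-squares fit of the toy model flow = c * rain."""
    return sum(r * q for r, q in zip(rain, flow)) / sum(r * r for r in rain)


def split_sample_test(rain, flow, n_calibration):
    """Calibrate on the first part of the record, validate on the rest."""
    c = calibrate_runoff_coefficient(rain[:n_calibration], flow[:n_calibration])
    simulated = [c * r for r in rain]
    fit_calibration = nash_sutcliffe(flow[:n_calibration], simulated[:n_calibration])
    fit_validation = nash_sutcliffe(flow[n_calibration:], simulated[n_calibration:])
    return c, fit_calibration, fit_validation


if __name__ == "__main__":
    # Hypothetical rainfall and flow series, for illustration only.
    rain = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    flow = [0.6, 0.9, 1.6, 1.9, 2.6, 2.9]
    c, fit_cal, fit_val = split_sample_test(rain, flow, 3)
    print("coefficient:", c)
    print("fit, calibration period:", fit_cal)
    print("fit, validation period:", fit_val)
```

Only the fit for the validation period, which was not used when adjusting the parameter, indicates how the model may perform outside the calibration period.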

3.4. MODEL APPLICATION 

The validated models have finally been used, as an integrated system, in a scenario 

approach to assess the environmental impacts of alternative water management 

options. The uncertainties of the model predictions have been assessed through 

sensitivity analyses. 


4. Selected Results from Model Construction, Calibration and Validation of 

Individual Components 

Comprehensive data collection and processing as well as model calibration and 

validation were carried out (DHI et al., 1995). In the following sections a few 

selected results are presented for the individual components. Further aspects of 

model validation focusing on integrated aspects are discussed in Section 5. 

4.1. RIVER AND RESERVOIR FLOW MODELLING 

The following models have been constructed, calibrated and validated: 

• one-dimensional MIKE 11 model for the Danube from Bratislava to Komarno, 

• one-dimensional MIKE 11 model for the river branch system at the Slovak 

floodplain, and 

• two-dimensional MIKE 21 model for the reservoir. 

The MIKE 11 models have been established in two versions reflecting post- and 

pre-dam conditions, respectively. 

4.1.1. MIKE 11 River Model for the Danube 

The MIKE 11 model for the Danube is based on river cross-sections measured in 

1989 and 1991. The applied boundary conditions were measured daily discharges 

at Bratislava (upstream) and a discharge rating curve at Komarno (downstream). 

The model was initially calibrated for two steady state situations reflecting a low 

flow situation (905 m³ s⁻¹) and a flow situation close to the long-term average

(2390 m³ s⁻¹), respectively. Subsequently, the model was calibrated in a non-steady

state against daily water level and discharge measurements from 1991. The model 

was finally validated by demonstrating the ability to reproduce measured daily 

water level data from 1990. Calibration and validation results are presented in 

Topolska and Klucovska (1995). For the post-dam model some river reaches were 

updated with cross-sections measured in 1993. In addition, the reservoir and related 

hydraulic structures and canals were included. As the conditions after damming 

of the Danube have changed significantly, re-calibration of the post-dam model 

was carried out for the period April 1993–July 1993. Subsequently, the model was 

validated against measured data from the period November 1992–March 1993. 

4.1.2. MIKE 11 Model for the River Branch System 

The Danubian floodplain is a forest area of major ecological interest characterised 

by a complex system of river branches. A layout of the river branch system is shown 

in Figure 5. The cross-sections in the river branch system were measured during the 

1960s and 1970s. The pre-dam model was calibrated against water level and flow

data from the 1965 flood. In the post-dam situation, the branch system is fed by an


Figure 5. Layout of the river branch system on the Slovakian side of the Danube. 

inlet structure with water from the power canal. The system consists of a number 

of compartments (cascades) separated by small dikes. On each of these dikes combined 

structures of culverts and spillways are located enabling some control of the 

water levels and flows in the system. Results of the model calibration against data 

measured during the summer of 1994 are shown in Klucovska and Topolska (1995).

Finally, the model was validated by demonstrating the ability to reproduce water 

levels measured during the summer of 1993. Some of these results are presented in 

Sørensen et al. (1996). 

4.1.3. MIKE 21 Reservoir Model 

A MIKE 21 hydrodynamic model for the reservoir was established based on a 

reservoir bathymetry measured in 1994. The spatial resolution of the finite difference 

model is 100 × 50 m. The model was calibrated against flow velocities 

measured in the reservoir in the autumn of 1994. 

4.2. GROUND WATER FLOW MODELLING 

Ground water modelling has been carried out at three different spatial scales: 

• A regional ground water model for pre-dam conditions (3000 km², 500 m

horizontal grid, 5 vertical layers).

• A regional ground water model for post-dam conditions (3000 km², 500 m

horizontal grid, 5 vertical layers).

• A local ground water model for an area surrounding the reservoir for both pre- and

post-dam conditions (200 km², 250 m horizontal grid, 7 vertical layers).

• A local ground water model for the river branch system for both pre- and post-dam

conditions (50 km², 100 m horizontal grid, 2 vertical layers).

• A cross-sectional (vertical profile) model near Kalinkovo at the left side of the 

reservoir (2 km long, 10 m horizontal grid, 24 vertical layers). 

The regional and local ground water models all use the coupled version of

MIKE SHE and MIKE 11 and hence include modelling of evapotranspiration and

snowmelt processes, river flow, unsaturated flow and ground water flow. The cross-sectional

model only includes ground water processes.

4.2.1. Model Construction 

Comprehensive input data were available and used in the construction of the models. 

In general, the regional and the local models are based on the same data with 

the main difference being that the local models provide finer resolutions and less 

averaging of measured input data. The two regional models, reflecting pre- and 

post-dam conditions, are basically the same. The only difference is that the post-dam

model includes the reservoir and related hydraulic structures and seepage

canals. 

The models are based on information on location of river systems and cross-sectional

river geometry, surface topography, land use and cropping pattern, soil

physical properties and hydrogeology. In addition, time series of daily precipitation, 

potential evapotranspiration and temperature as well as discharge inflow at 

Bratislava have been used. Comprehensive geological data exist from this area, see 

e.g., Mucha (1992) and Mucha (1993). The aquifer, ranging in thickness from about 

10 m at Bratislava to about 450 m at Gabcikovo, consists of Danube river sediments 

(sand and gravel) of late Tertiary and mainly Quaternary age. The present model is 

based on the work of Mucha et al. (1992a, b). 

4.2.2. Model Calibration 

The ground water model was calibrated against selected measured time series of 

ground water levels. The following parameters were subject to calibration: specific 

yield in the upper aquifer layer, leakage coefficients for the river bed and hydraulic 

conductivities for the aquifer layers. The soil physical characteristics for the unsaturated 

zone have been adopted directly from the unsaturated zone/agricultural 

modelling. 

The river model that has been used in the ground water modelling is identical 

to the MIKE 11 river model of the Danube, which was successfully validated independently 

as a ‘stand alone model’ (Subsection 4.1, above). When MIKE SHE and MIKE 11 are coupled,

water is exchanged between the two models. The amount of

water that recharges the aquifer in the upstream part and re-enters the river further

downstream is in the order of 10–60 m³ s⁻¹, depending on the Danube discharge

and on the actual ground water level. The recharge is typically two orders of magnitude 

less than the Danube discharge, and hence, a re-calibration of the MIKE 

11 river model is not required. As the major part of the ground water recharge 

originates from infiltration through the river bed, the leakage coefficient for the 

river bed becomes very important. Limited field information was available on this 

parameter, and hence, it was assumed spatially constant and through calibration 

assessed to be 5 × 10⁻⁵ s⁻¹ for the Danube and Vah rivers and 5 × 10⁻⁶ s⁻¹ for


the Little Danube. These values are in good agreement with previous modelling 

experiences (Mucha et al., 1992b). 
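
The role of the leakage coefficient and the size of the exchange relative to the river flow can be illustrated with a short Python sketch. It assumes that the exchange flux is simply the leakage coefficient multiplied by the head difference between river and aquifer; this is a common formulation and is consistent with the unit s⁻¹, but the text does not spell it out. The function names and the 1 m head difference are hypothetical. The leakage coefficients and discharges are the figures quoted above.

```python
def leakage_flux(leakage_coefficient, river_stage, groundwater_head):
    """Exchange flux per unit river-bed area in m/s (positive: river to aquifer).

    Assumes the flux is proportional to the head difference, with the
    leakage coefficient (s^-1) as the proportionality factor.
    """
    return leakage_coefficient * (river_stage - groundwater_head)


def exchange_fraction(exchange_flow, river_discharge):
    """Exchange flow expressed as a fraction of the river discharge."""
    if river_discharge <= 0:
        raise ValueError("river discharge must be positive")
    return exchange_flow / river_discharge


# Calibrated leakage coefficients quoted in the text (s^-1).
DANUBE_LEAKAGE = 5e-5
LITTLE_DANUBE_LEAKAGE = 5e-6

# Hypothetical 1 m head difference, purely for illustration.
flux_danube = leakage_flux(DANUBE_LEAKAGE, 131.0, 130.0)
flux_little_danube = leakage_flux(LITTLE_DANUBE_LEAKAGE, 131.0, 130.0)

# Exchange of 10-60 m3/s compared with the near-average discharge of 2390 m3/s.
low_fraction = exchange_fraction(10.0, 2390.0)
high_fraction = exchange_fraction(60.0, 2390.0)
```

At the near-average discharge the exchange is of the order of one percent of the river flow, which is why the river model does not need to be re-calibrated after coupling.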

With the specific yield and the leakage coefficients for the river bed

fixed, the main calibration parameters were the hydraulic conductivities of the saturated

zone. About 300 time series of ground water level observations were available 

for the model area, typically in terms of 30–40 yr of weekly observations. The 

calibration was carried out on the basis of about 80 of these series for the period 

1986–1990. In the parameter adjustments the overall spatial pattern described in the 

geological model was maintained. Some of the calibration results are illustrated

in Figure 6 showing observed Danube discharge data together with simulated and 

measured ground water levels for three wells located at different distances from the 

Danube. Wells 694 and 740 are seen to react relatively quickly to fluctuations in 

river discharge as compared to well 7221, which is located further away from the 

river. This illustrates how the dynamics of the Danube propagate and are dampened

in the aquifer.

4.2.3. Model Validation 

The calibrated ground water model was validated by demonstrating the ability to 

reproduce measured ground water tables after damming of the Danube. In this 

regard the only model modification is the inclusion of the reservoir and related 

structures and canals. Due to the nonstationarity of the hydrological regime such 

a validation test, which according to Klemes (1986) is denoted a differential split-sample

test, is a demanding test. Figure 7 shows the simulated and observed ground

water levels for the same three observation wells as shown for the calibration period 

in Figure 6. The effects of the damming of the Danube in October 1992, when the 

new reservoir was established, are clearly seen in terms of increased ground water

levels and reduced ground water dynamics when comparing the two figures. These 

features are well captured by the model. 

4.3. GROUND WATER QUALITY 

A geochemical field investigation was carried out in a cross-section north of the 

reservoir near Kalinkovo as a basis for identifying the key geochemical processes 

and estimating parameter values (see Mucha, 1995). Eleven multi-screen wells 

were installed close to the water supply wells at Kalinkovo forming a 7.5 km long 

cross-section parallel to the regional ground water flow direction. The multi-screen 

wells have been sampled frequently to investigate the ongoing bio-geochemical 

processes during infiltration of the Danube river water into the aquifer. 

A ground water quality model was established for the Kalinkovo cross-sectional 

profile based on all the measured field data. This model includes a comprehensive 

description of the bio-geochemical processes such as kinetically controlled 

denitrification and equilibrium controlled inorganic chemistry based on the well 

known PHREEQE code. More details are given in Griffioen et al. (1995) and 


Figure 6. Danube discharge at Bratislava together with simulated and observed ground water 

levels for three wells before the damming of the Danube (calibration period).


Figure 7. Simulated and observed ground water levels for three wells after damming of the 

Danube (validation period). 

Engesgaard (1996). The transport part of the Kalinkovo cross-section has been 

calibrated against ¹⁸O isotope data. The parameters describing reactive processes

have been assessed and adjusted on the basis of the detailed field measurements 

in the Kalinkovo cross-sectional profile. It was shown that the geochemical model 

behaves correctly in qualitative terms (Engesgaard, 1996).

4.4. UNSATURATED ZONE AND AGRICULTURAL MODELLING 

Modelling of the pre-dam and post-dam conditions of agricultural potential and 

nitrate leaching risk was carried out using a representative selection of soil units, 

cropping pattern and meteorological data covering the area between Danube and 

Maly Danube (Figure 1). The DAISY model uses time-varying ground water levels 

(simulated with the regional MIKE SHE ground water model) as lower boundary 

condition for the unsaturated flow simulations. Cropping pattern and fertiliser

application are included in the model based on measurements and statistical data.

The model was calibrated on the basis of data from field experiments carried 

out during the years 1981–1987 at the experimental station in Most near Bratislava. 

During this process the crop parameters used in the model were adjusted to Slovak 


conditions. After the initial model construction and calibration, the model performance 

was evaluated through preliminary simulations using data from a number of 

plots located on an experimental field site at Lehnice in the middle of the project 

area. On the basis of comparisons between measured and simulated values of 

nitrogen uptake, dry matter yield and nitrate concentrations in soil moisture, the 

model performance under Slovak conditions was considered satisfactory (DHI et 

al., 1995). 

4.5. RIVER AND RESERVOIR SEDIMENT TRANSPORT MODELLING 

4.5.1. Danube River Sediment Transport 

A one-dimensional morphological model was established for the Danube. As the

model operates with cross-sectionally averaged parameters representing the river

reach between adjacent computational points (i.e. approximately 500 m), a special

technique for comparing ‘real’ and simulated state variables was required. Therefore,

the changes in mean water level over a decade rather than changes in bed 

elevations were compared between observations and simulations. For this purpose 

the changes in the so-called ‘Low Regulation and Navigable Water Level’ (LR-NWL)

were used. LR-NWL is specified by the Danube Commission as the water

level corresponding to Q94%, which is approximately 980 m³ s⁻¹. By using such

an approach, perturbations in bed levels from one cross-section to another did not 

destroy the picture of the overall trends in aggradation and degradation of the river 

bed. The results of the calibration (1974–84) and validation runs (1984–90) are 

described in Topolska and Klucovska (1995). 
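
A discharge such as Q94% can be derived from a record of daily flows as sketched below. The sketch reads Q94% as the discharge equalled or exceeded on 94% of the days, which is an interpretation and is not stated explicitly above. The function name and the simple ranking rule are assumptions made for illustration.

```python
def exceedance_discharge(daily_flows, exceedance_percent):
    """Discharge equalled or exceeded on `exceedance_percent` % of the days.

    `exceedance_percent` is expected to be a whole number such as 94.
    """
    if not daily_flows:
        raise ValueError("need at least one flow value")
    if not 0 < exceedance_percent <= 100:
        raise ValueError("exceedance_percent must be in (0, 100]")
    ranked = sorted(daily_flows, reverse=True)  # largest flow first
    # Number of days that must equal or exceed the result (integer ceiling).
    count = -(-exceedance_percent * len(ranked) // 100)
    return ranked[count - 1]
```

The water level belonging to this discharge would then be read from the simulated or observed stage-discharge relation at each cross-section, so that changes in that level over a decade can be compared between model and observations.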

4.5.2. Sediment Transport in the River Branch System 

A one-dimensional fine sediment model was constructed for the river branch system 

in order to have a tool for quantitative evaluation of the possible sedimentation 

in the river branch system for alternative water management options. The upstream 

boundary condition for the model was provided in terms of concentration of suspended 

sediments simulated by the reservoir model. As virtually no field data on 

sedimentation in the river branch system were available, neither calibration nor validation

was possible. Instead, model parameter values based on experience from

similar studies reported in the literature were used.

4.5.3. Reservoir Sediment Model 

A two-dimensional fine-grained sediment model was constructed for the reservoir.

The suspended sediment input was imposed as a boundary condition in Bratislava 

with time series of sediment concentrations of six suspended sediment fractions 

with their own grain sizes and fall velocities. The fall velocity for each of the six 

fractions was assessed according to field measurements. No further model calibration 

was carried out. The only field data available for validation were a few bed


sediment samples from summer 1994 with data on sedimentation thickness and 

grain size analyses (Holobrada et al., 1994). A comparison of model results and 

field data indicated that a reservoir sedimentation of the right order of magnitude 

was simulated. The simulated reservoir sedimentation corresponded to 42% of the 

total suspended load at Bratislava. 

4.6. SURFACE WATER QUALITY MODELLING 

4.6.1. Danube River Model 

A BOD-DO model (MIKE 11 WQ) has been used to describe the water quality 

in the main stream of the Danube between Bratislava and Komarno. This model 

describes oxygen concentration (DO) as a function of the decay of organic matter 

(BOD), transformation of nitrogen components, re-aeration, oxygen consumption 

by the bottom and oxygen production and respiration by living organisms. 
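
The structure of such an oxygen balance can be sketched as follows. This is a generic textbook-style balance containing the terms listed above. It is not the MIKE 11 WQ formulation, whose equations are not given here. All rate constants, parameter names and the first-order forms are assumptions. The factor 4.57 is the stoichiometric oxygen demand of complete nitrification.

```python
OXYGEN_PER_NITROGEN = 4.57  # g O2 consumed per g NH4-N fully nitrified


def oxygen_rate(do, bod, ammonia, *, do_saturation, k_reaeration, k_bod,
                k_nitrification, sediment_demand, depth,
                production, respiration):
    """Rate of change of dissolved oxygen in g O2 m^-3 d^-1.

    do, bod, ammonia : concentrations (g m^-3)
    k_*              : first-order rate constants (d^-1), assumed forms
    sediment_demand  : oxygen uptake by the bottom (g O2 m^-2 d^-1)
    depth            : water depth (m)
    production, respiration : biological terms (g O2 m^-3 d^-1)
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    reaeration = k_reaeration * (do_saturation - do)
    bod_decay = k_bod * bod
    nitrification = OXYGEN_PER_NITROGEN * k_nitrification * ammonia
    bottom_uptake = sediment_demand / depth
    return (reaeration - bod_decay - nitrification - bottom_uptake
            + production - respiration)


def euler_step(do, rate, dt):
    """Advance the oxygen concentration by one explicit time step of dt days."""
    return max(0.0, do + rate * dt)
```

In a river model this balance would be evaluated at every computational point and combined with advection and dispersion of the concentrations. The sketch covers only the local source and sink terms.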

As the conditions from pre-dam to post-dam have changed significantly, separate 

calibrations and validations were carried out. The pre-dam model was calibrated 

against data from October 1991 and validated against data from April and August/September 

1991. The post-dam model was calibrated against data from May 

1993 and validated against data from June 1993. 

4.6.2. Model for the River Branch System 

The water quality in the river branches was simulated with a eutrophication model 

(MIKE 11 EU), in which the algae production is the driving force. The algae 

growth in this model is described as a function of incoming light, transparency 

of the water, temperature, sedimentation and growth rate of the algae and of the 

available inorganic nutrients. The calibration was carried out on the basis of the few

data available for the period June–August 1993. Due to lack of further data, no

independent model validation was possible and hence, the uncertainties related to 

applying the model for making quantitative predictions of the effects of alternative 

water management schemes may be considerable. 

4.6.3. Reservoir Model 

In the reservoir the driving force is also the algae growth and hence, a eutrophication 

model (MIKE 21 EU) was applied. The reservoir model was calibrated 

against measured data from August 1994. This field programme was substantial 

and resulted in much more data than available for the river branch system. Good 

correspondence between simulated and observed values was achieved during the

calibration period. However, no further data have been available for independent 

validation tests. 


5. Validation of Integrated Model 

The model calibration and validation have basically been carried out for the individual 

models using separate domain data for river system, aquifer system, etc. 

Rigorous validation tests of the integrated model were generally not possible due 

to lack of specific and simultaneous data on the processes describing the various 

couplings. Furthermore, although reasonably good assessments of uncertainties

of the individual model predictions could be made, it was not obvious how such 

uncertainty would propagate in the integrated model. 

It can be argued that uncertainties in output from one model would in principle 

influence the uncertainties in other components of the integrated modelling system, 

thus adding to the total uncertainty of the integrated model. Following this line of 

argument would lead to the conclusion that the uncertainty of predictions by the 

integrated model would be larger than the corresponding uncertainty of predictions 

made by traditional individual models. On the other hand it can also be argued 

that in the integrated modelling approach the uncertainties in the crucial boundary 

conditions are reduced, because assumptions needed for executing individual 

models are substituted by model simulations based on data from neighbouring 

domains, which, if properly calibrated and validated, better represent the boundary 

effects. This would lead to the conclusion that the uncertainties in predictions by 

the integrated model would be smaller than those of the individual models. 

In the present study, no theoretical analyses have been made of this problem. 

Instead, a few validation tests have been made for cases where the couplings could 

indirectly be checked by testing the performance of the integrated model against 

independent data. In the following, results from one of these validation tests for the 

integrated model are shown. 

The river-aquifer interaction changed significantly, when the reservoir was established. 

An important model parameter describing this interaction is the leakage 

coefficient, which was calibrated on the basis of ground water level data for the predam 

situation (Subsection 4.2). For the post-dam situation the MIKE 21 reservoir 

model calculates the thickness and grain sizes of the sedimentation at all points 

in the reservoir. Using the Carman-Kozeny formula, the leakage factors were

recalculated for the area now covered by the reservoir. The model results

were then checked against ground water level observations from wells near the 

reservoir, and it was found, that a calibration factor of 10 had to be applied to the 

Carman-Kozeny formula. This can theoretically be justified by the fact that the 

sediments are stratified or layered due to variations in flow velocities during the 

sedimentation process. The same formula and the same calibration factor were also

used for converting all texture data from aquifer sediment samples to hydraulic 

conductivity values in the model. 
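
The conversion from sediment texture to a leakage coefficient can be sketched in Python as below. The sketch uses one common textbook form of the Carman-Kozeny relation (with the constant 180) and takes the leakage coefficient as conductivity divided by the thickness of the sediment layer. Neither the exact form used in the study nor the direction in which the calibration factor of 10 was applied is stated above, so the factor is left as a plain multiplier. The fluid properties and function names are assumptions.

```python
WATER_DENSITY = 1000.0     # kg m^-3 (assumed)
GRAVITY = 9.81             # m s^-2
WATER_VISCOSITY = 1.3e-3   # Pa s, roughly water at 10 degrees C (assumed)


def carman_kozeny_conductivity(grain_diameter, porosity, calibration_factor=1.0):
    """Hydraulic conductivity (m/s) from grain diameter (m) and porosity (-).

    Textbook Carman-Kozeny form; `calibration_factor` multiplies the result.
    """
    if grain_diameter <= 0 or not 0 < porosity < 1:
        raise ValueError("need grain_diameter > 0 and 0 < porosity < 1")
    permeability = (grain_diameter ** 2 * porosity ** 3
                    / (180.0 * (1.0 - porosity) ** 2))  # m^2
    return calibration_factor * WATER_DENSITY * GRAVITY / WATER_VISCOSITY * permeability


def leakage_coefficient(conductivity, layer_thickness):
    """Leakage coefficient (s^-1) of a sediment layer of given thickness (m)."""
    if layer_thickness <= 0:
        raise ValueError("layer_thickness must be positive")
    return conductivity / layer_thickness
```

Because the reservoir model supplies both the thickness and the grain sizes of the deposited sediment at every grid point, a calculation of this kind can be repeated point by point to give a spatially varying leakage coefficient for the ground water model.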

Now, how can the validity of the integrated model be tested? The ground water

level observations from a few wells have been used to assess the leakage calibration 

factor, so although the model output was subsequently checked against data from


Figure 8. Measured and simulated discharges in seepage canals. The data are from a particular 

day in May 1995 and are given in m³ s⁻¹.

many more wells, it may be argued that this in itself is not sufficient for a true model 

validation. Consider instead a comparison of simulated and measured discharges 

in the so-called seepage canals, which are small canals constructed a few hundred 

meters away from the reservoir with the aim of intercepting part of the infiltration 

through the bottom of the reservoir. In Figure 8 it can be seen that the model 

simulations match the measured data remarkably well at different locations along 

the seepage canals. Thus, at the two stations most downstream on both seepage 

canals (stations 2809 and 3214) the agreements between model predictions and 

field data are within 5%. This is a powerful test, because the discharge data have 

not been used at all in the calibration process, and because it integrates the effects of 

reservoir sedimentation, calculation of leakage factors and geological parameters. 

6. Model Application – Case Study of River Branch System 

6.1. HYDROLOGY OF RIVER BRANCH SYSTEM 

The hydrology of the river branch system is highly complex with many processes 

influencing the water characteristics of importance for flora and fauna (Figure 2). 

These processes are highly interrelated and dynamic with large variations in time 

and space. The complexity of the floodplain, with its river branch system, is indicated 

in Figures 5 and 9 for the 20 km reach downstream of the reservoir on the

Slovakian side, where alluvial forest occurs. Before the damming of the Danube 


Figure 9. Plan and perspective view of the surface topography, of the river branches and the 

related flood plains as represented in a model network of 100 m grid squares. 

in 1992 the river branches were connected with the Danube during periods with 

discharge above average. However, some of the branches were only active during 

flood situations a few days per year. It was anticipated that after the damming, 

the water level in the Danube would decrease significantly. Therefore, to

prevent water from draining from the river branches to the Danube and leaving the

river branches totally dry, the outflows from the branches into the Danube have been

blocked except for the downstream one at chainage 1820 rkm (Figure 5). Now, the

river branch system receives water from an inlet structure in the hydropower canal 

at Dobrohost (Figure 5). This weir has a design capacity of 234 m³ s⁻¹. Together

with the various hydraulic structures in the river branches, it controls the hydraulic, 

hydrological and ecological regime in the river branches and on the flood plains.


Figure 10. Steps in integrated model for floodplain hydrology. 

6.2. MODELLING APPROACH 

Comprehensive field studies and modelling analyses are often carried out in connection 

with assessing environmental impacts of hydropower schemes. Recent examples 

from the Danube include the studies of the Austrian schemes Altenwörth 

(Nachtnebel, 1989) and Freudenau (Perspektiven, 1989). However, as in the Austrian

cases, the modelling studies have most often been limited to independent 

modelling of river systems, groundwater systems or other subsystems, without 

providing an integrated approach as the one presented in this paper. 

The models in this study were applied in a scenario approach simulating the 

hydrological conditions resulting from alternative possible operations of the entire 

system of hydraulic structures (alternative water management regimes). Thus, one 

historical (pre-dam) regime and three hypothetical (post-dam) water regimes corresponding

to alternative operation schemes for the structures of the Gabcikovo

system were simulated (DHI et al., 1995). Due to the integration of the overall

modelling system, each scenario simulation involves a sequence, sometimes in an

iterative mode, of model calculations. For the case of river branch modelling a 

hierarchical scheme of simulation runs (Figure 10) included the following major 

steps: 

Step 1. Hydraulic river modelling (MIKE 11) 

Model simulation: The MIKE 11 model simulates the river flows and water 

levels in the entire river system and river branches. 

Coupling: The model outputs, in terms of flows into the reservoir at the upstream 

end and downstream outflows through the reservoir structures are used 

as boundary conditions for the reservoir modelling (Step 2). Furthermore, the 

flow velocities and water levels are used in the river water quality simulations 

(Step 4a). 

Step 2. Reservoir modelling (MIKE 21) 

Model simulation: The MIKE 21 reservoir model simulates velocities, sedimentation 

and eutrophication/water quality in the reservoir. 

Coupling: The flow boundary conditions are generated by the river model 

(Step 1). Results on sedimentation are used to calculate leakage coefficients. 

Results on oxygen, nitrogen and carbon can be used as boundary conditions for

the river water quality and for the water quality of infiltrating water (Step 3a).

Step 3a. Regional ground water flow (MIKE SHE/MIKE 11) 

Model simulation: The coupled MIKE SHE/MIKE 11 model simulates the 

ground water flow and levels including the interaction with the river system 

and the reservoir. 

Coupling: In the reservoir, the infiltration is simulated on the basis of leakage 

coefficients, which have been calculated from the amount and composition 

(grain sizes) of the sedimentation on the reservoir bottom (Step 2). This link 

between reservoir sedimentation and ground water was shown to be crucial 

for the model results. Furthermore, an iterative link to the DAISY agricultural 

model exists (Step 3b). Hence, spatially and temporally varying ground 

water levels from MIKE SHE/MIKE 11 are used as lower boundary conditions 

in DAISY, which in turn simulates the leaf area index and the root zone 

depth which are used as input time series data in MIKE SHE/MIKE 11. The 

model outputs, in terms of ground water flow velocities, are used as input 

to the ground water quality simulation. The model results, in terms of river 

flow velocities and water levels, ground water flow velocities and water levels, 

are used as time varying boundary conditions for the local flood plain model 

(Step 4b).


Step 3b. Root zone (DAISY) 

Model simulation: The DAISY model simulates the unsaturated zone flows and

the vegetation development, including crop yield.

Coupling: The DAISY model has an iterative link to the MIKE SHE/MIKE 11 model

(as described above under Step 3a). 

Step 4a. River branches water quality (MIKE 11) 

Model simulation: The MIKE 11 model simulates the river water quality (BOD, 

DO, COD, NO₃, etc.).

Coupling: The model uses data from Step 2 and Step 4b and produces output 

on concentrations of COD and DO, which are used as input to the ecological 

assessments (Step 5). 

Step 4b. Flood plain model (MIKE SHE/MIKE 11) 

Model simulation: The coupled MIKE SHE/MIKE 11 model simulates all the 

flow processes in the flood plain area including water flows and storages on 

the ground surface, river flows and water levels, ground water flows and water 

levels, evapotranspiration, soil moisture content in the unsaturated zone and 

capillary rise. 

Coupling: The model uses data from Step 3a as boundary conditions and provides 

river flow velocities as the basis for the water quality and sediment 

simulations (Steps 4a and c). The model provides data on flood frequency and 

duration, depth of flooding, depth to ground water table, moisture content in the 

unsaturated zone and flow velocities in river branches, which are key figures in 

the subsequent ecological assessments (Step 5). 

Step 4c. River branches sedimentation (MIKE 11) 

Model simulation: The MIKE 11 model simulates the transport of fine sediments 

through the river branch system. As a result the sedimentation/erosion 

and the suspended sediment concentrations are simulated. 

Coupling: The model uses sediment concentrations simulated by the reservoir 

model (Step 2) as input. Furthermore, the flow velocities simulated by the local 

flood plain model (Step 4b) are used as the basis for the sediment calculations. 

The results, in terms of grain size of the river bed and concentrations of 

suspended material, are used as input to the ecological assessments (Step 5). 

Step 5. Ecology 

A correlation matrix between the physical/chemical parameters provided by 

the model simulations (Steps 4a, b and c) and the aquatic and terrestrial ecotopes

has been established for the project area. Alternative water management 

regimes can be described in terms of specific operation of certain hydraulic 

structures and corresponding distribution of water discharges primarily between 

the Danube, the Gabcikovo hydropower scheme and the river branch 


system. The hydrological effects of such alternative operations can be simulated 

by the integrated model and subsequently, the ecological impacts can be 

assessed in terms of likely changes of ecotopes. 
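
The hierarchical scheme of Steps 1 to 5 amounts to a dependency graph, and a valid run order can be derived from it as sketched below. The dependencies are read from the ‘Coupling’ descriptions above. The iterative link between Steps 3a and 3b is represented by a single one-way dependency, which is a simplification made here, since a true iteration would repeat both steps until the exchanged variables stop changing. The dictionary layout and variable names are assumptions.

```python
from graphlib import TopologicalSorter  # Python 3.9 or later

# Each step maps to the steps whose output it needs, as read from the text.
STEP_INPUTS = {
    "1": set(),                # hydraulic river modelling (MIKE 11)
    "2": {"1"},                # reservoir modelling (MIKE 21)
    "3a": {"2"},               # regional ground water flow (MIKE SHE/MIKE 11)
    "3b": {"3a"},              # root zone (DAISY), first pass of the iteration
    "4b": {"3a"},              # flood plain model (MIKE SHE/MIKE 11)
    "4a": {"1", "2", "4b"},    # river branches water quality (MIKE 11)
    "4c": {"2", "4b"},         # river branches sedimentation (MIKE 11)
    "5": {"4a", "4b", "4c"},   # ecological assessment
}

# One admissible order in which the component models can be executed.
run_order = list(TopologicalSorter(STEP_INPUTS).static_order())
```

Several orders satisfy the same dependencies, for instance Steps 4a and 4c may be swapped, so the result should be read as one admissible sequence and not as the sequence used in the study.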

6.3. THE FLOODPLAIN MODEL 

The extent of the floodplain model area is indicated in Figure 5 and a perspective 

view of the area with the river branch system and floodplains is shown in Figure 9. 

The horizontal discretization of the finite difference model is 100 m, and the ground 

water zone is represented by two layers. Several hundreds of cross-sections and 

more than 50 hydraulic structures in the river branch system were included in the 

MIKE 11 model for the river system. 

For the pre-dam model, the surface water boundary conditions comprise a discharge 

time series at Bratislava and a discharge rating curve at the downstream end 

(Komarno). For the post-dam model, the Bratislava discharge time series has been 

divided into three discharge boundary conditions, namely at Dobrohost (intake 

from hydropower canal to river branch system), at the inlet to the hydropower 

canal and at the inlet to Danube from the reservoir. For the groundwater system, 

time varying ground water levels simulated with the regional ground water models 

act as boundary conditions. The Danube river forms an important natural boundary 

for the area. The Danube is included in the model, located on the model boundary, 

and symmetric ground water flow is assumed below the river. Hence, a zero-flux 

boundary condition is used for ground water flow below the river. 

To illustrate the complex hydrology and in particular the interaction between 

the surface and subsurface processes, model results from a simulation for a

period in June–July 1993 are shown in Figures 11 and 12. 

Figure 11 presents the inlet discharges at the upstream point of the river branch 

system (Dobrohost), while the discharges and water levels at the confluence between 

the Danube and the hydropower outlet canal downstream of Gabcikovo 

during the same period are shown in Figure 12. Figure 11 further shows the soil 

moisture conditions for the upper two metres below terrain and the water depth on the

surface at location 2. Similar information is shown for location 1 in Figure 12. A 

soil water content above 0.40 (40 vol.%) corresponds to saturation. Location 2 is 

situated in the upstream part of the river branch system, while location 1 is located 

in the downstream part (see Figure 9). 

At location 2 (Figure 11) flooding is seen to occur as a result of river spilling 

(surface inundation occurs before the ground water table rises to the surface) whenever 

the inlet discharge exceeds approximately 60 m 3 s −1 . The soil moisture content 

is seen to react relatively fast to the flooding and the soil column becomes 

saturated. In contrast, full saturation and inundation do not occur in connection 

with the flood in the Danube in July, but the event is recognised through increasing 

ground water levels following the temporal pattern of the Danube flood.


Figure 11. Observed inlet discharge to the river branch system at Dobrohost; simulated moisture 

contents at the upper two m of the soil profile at location 2 and simulated depths of 

inundation at location 2 during June–July 1993. 

At location 1 (Figure 12) the conditions are somewhat different. During the 

simulation period location 1 never becomes inundated due to high inlet flows at 

Dobrohost. However, during the July flood in the Danube, inundation at location 1 

occurs as a result of a rising ground water table caused by higher water levels in 

river branches due to backwater effects from the Danube. The surface elevation at 

location 1 is 116.4 m which is 0.4 m below the flood water level shown in Figure 12 

at the confluence (5 km downstream of location 1). It is noticed that the inundation 

at this location occurs as a result of ground water table rise and not due to spilling 

of the river (surface inundation occurs after the ground water table has reached 

ground surface). 
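
The distinction drawn above between the two inundation mechanisms rests on timing: which comes first, water on the surface or the water table reaching the surface. A minimal sketch of that rule in Python follows. The function names, the input series and the sign convention for depth to the water table are assumptions made for illustration and do not reflect the actual MIKE SHE/MIKE 11 output format.

```python
from typing import Optional, Sequence


def first_index(flags: Sequence[bool]) -> Optional[int]:
    """Return the index of the first True value, or None if there is none."""
    for i, flag in enumerate(flags):
        if flag:
            return i
    return None


def inundation_mechanism(
    ponding_depth: Sequence[float],
    depth_to_water_table: Sequence[float],
) -> str:
    """Classify an inundation event from two simulated time series.

    ponding_depth: water depth on the ground surface (m), zero when dry.
    depth_to_water_table: depth from the surface to the water table (m);
        assumed to be <= 0 once the water table has reached the surface.
    """
    t_ponding = first_index([d > 0.0 for d in ponding_depth])
    t_table = first_index([d <= 0.0 for d in depth_to_water_table])
    if t_ponding is None:
        return "no inundation"
    # Surface water appears before the water table reaches the surface.
    if t_table is None or t_ponding < t_table:
        return "river spilling"
    return "ground water rise"


# Hypothetical series resembling the behaviour described for the two locations.
location_2 = inundation_mechanism([0.0, 0.2, 0.3, 0.3], [1.5, 1.0, 0.4, 0.0])
location_1 = inundation_mechanism([0.0, 0.0, 0.0, 0.1], [0.8, 0.3, 0.0, 0.0])
```

With these made-up series, location 2 is classified as river spilling and location 1 as ground water rise, which mirrors the qualitative interpretation given in the text.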

6.4. EXAMPLE OF MODEL RESULTS 

As an example of the results which can be obtained by the floodplain model, Figure 

13 shows a characterisation of the area according to flooding and depths to 

groundwater. The map has been processed on the basis of simulations for 1988 for 

pre-dam conditions. The classes with different ground water depths and flooding 


Figure 12. Simulated discharge and water levels in the Danube at the confluence between 

Danube and the outlet canal from the hydropower plant; simulated moisture contents at the 

upper two m of the soil profile at location 1 and simulated depths of inundation at location 

1 in the river branch system during June–July 1993. 

have been determined from ecological considerations according to requirements 

of (semi)terrestrial (floodplain) ecotopes. From the figure the contact between the 

main Danube river and the river branch system is clearly seen. Similar computations 

have been made for alternative water management schemes after damming of 

the Danube. The results of one of the hypothetical post-dam water management 

regimes, characterized by average water flows in the power canal, Danube and 

river branch system intake of 1470 m³ s⁻¹, 400 m³ s⁻¹ and 45 m³ s⁻¹, respectively, 

are shown in Figure 14. By comparing Figure 13 and Figure 14 the differences 

in hydrological conditions can clearly be seen. For instance the pre-dam conditions 

(Figure 13) are in many places characterised by high groundwater tables


Figure 13. Hydrological regime in the river branch area for 1988 pre-dam conditions 

characterized in ecological classes. 

and small/seldom flooding, while the post-dam situation (Figure 14) generally has 

deeper ground water tables and more frequent flooding. From such changes in hydrological 

conditions inferences can be made on possible changes in the floodplain 

ecosystem. 
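
The characterisation described above combines two simulated quantities per grid cell, depth to ground water and flooding, into a class. The sketch below shows one way such a classification could be computed. The class names and all threshold values are illustrative assumptions only; the classes actually used in Figures 13 and 14 were derived from ecological considerations and are not reproduced here.

```python
from collections import Counter
from typing import Iterable, Tuple


def hydrological_class(depth_to_gw_m: float, flooded_days_per_year: float) -> str:
    """Combine ground water depth and flooding duration into one class label.

    All thresholds below are hypothetical placeholders.
    """
    if depth_to_gw_m < 0.5:
        gw = "shallow"
    elif depth_to_gw_m < 2.0:
        gw = "intermediate"
    else:
        gw = "deep"

    if flooded_days_per_year < 1:
        flood = "rarely flooded"
    elif flooded_days_per_year < 30:
        flood = "occasionally flooded"
    else:
        flood = "frequently flooded"

    return f"{gw} ground water, {flood}"


def class_areas(cells: Iterable[Tuple[float, float]], cell_area_ha: float) -> Counter:
    """Sum the area per class over (depth, flooded days) pairs, one per cell."""
    areas: Counter = Counter()
    for depth, days in cells:
        areas[hydrological_class(depth, days)] += cell_area_ha
    return areas


# Hypothetical cells: two shallow and occasionally flooded, one deep and frequently flooded.
example = class_areas([(0.3, 2), (0.3, 5), (3.0, 60)], cell_area_ha=1.0)
```

Comparing the per-class areas of two scenarios computed in this way gives a compact numerical counterpart to the visual comparison of the two maps.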

Further scenarios (not shown here) have, amongst others, investigated the 

effects of establishing underwater weirs in the Danube and thereby improving 

the connectivity between the Danube and the river branch system. 

7. Limitations in the Couplings made in the Integrated Model 

The integrated modelling system and the way it was applied includes different 

degrees of integration ranging from sequential runs, where results from one model 

are used as input to the next model, to a full integration, such as the coupling 

between MIKE SHE and MIKE 11. Hence, the system is not truly integrated in 

all respects. The justification for these different levels lies in assessments of where 

it was required in the present project area to account for feed back mechanisms 

and where such feed backs could be considered to be of minor importance for all 

practical purposes. For other areas with different hydrological characteristics, the 

required levels of integration are not necessarily the same. Therefore, a discussion 


Figure 14. Hydrological regime in the river branch area for a post-dam water management 

regime characterized in ecological classes. The scenario has been simulated using 

1988 observed upstream discharge data and a given hypothetical operation of the hydraulic 

structures. 

is given below on the universality and limitations of the various couplings made in 

the present case. 

A. Hydrological catchment/river hydraulics (MIKE SHE/MIKE 11) 

This coupling between the hydrological code and the river hydraulic code is fully 

dynamic and fully integrated with feed back mechanisms between the two codes 

within the same computational time step. This coupling cannot be treated sequentially 

in this area, since the feedback between river and aquifer works in both 

directions, with the river functioning as a source in part of the area and as a drain 

in other parts, and since the direction of the stream-aquifer interaction changes 

dynamically in time and space as a consequence of discharge fluctuations in the 

Danube. This coupling was shown to be crucial during the course of the project, 

and, due to the full integration, it is fully generic. 
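
The reason a sequential treatment fails here can be illustrated with a simple head-difference exchange term. The sketch below assumes a linear leakage relation between river stage and ground water head; this is a common textbook form and is not claimed to be the formulation implemented in MIKE SHE/MIKE 11. All names and numbers are hypothetical.

```python
def exchange_flux(river_stage_m: float, gw_head_m: float, leakage_coeff_per_s: float) -> float:
    """Exchange per unit bed area (m/s); positive means the river loses water.

    Assumed linear relation: flux = leakage coefficient * (stage - head).
    """
    return leakage_coeff_per_s * (river_stage_m - gw_head_m)


def river_role(river_stage_m: float, gw_head_m: float) -> str:
    """Return whether the river acts as a source or a drain for the aquifer."""
    if river_stage_m > gw_head_m:
        return "source"
    if river_stage_m < gw_head_m:
        return "drain"
    return "no exchange"


# The same reach can switch role as the stage fluctuates during a flood wave.
roles = [river_role(stage, 116.5) for stage in (116.0, 116.4, 117.0, 116.8, 116.2)]
```

Because the sign of the exchange depends on both states at the same moment, and the exchange in turn changes both states, each code needs the other's current values within the time step, which is what the full coupling provides.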

B. Reservoir/river (MIKE 21/MIKE 11) 

This coupling is a simple one-way coupling with the reservoir model providing 

input data to the downstream river model, both in terms of sediment and water


quality parameters. This coupling is sufficient in the present case, because there is 

no feedback from the downstream river to the reservoir. Even though this coupling 

is not fully generic, it may be sufficient in most cases, even in cases with a network 

of reservoirs and connecting river reaches. 

C. Reservoir/groundwater water exchange (MIKE 21/MIKE SHE) 

This coupling is a simple one-way coupling with the reservoir model providing 

data on sedimentation to the groundwater module of MIKE SHE, where they are 

used to calculate leakage coefficients in the surface water/ground water flow calculations. 

This coupling is sufficient in the present case, where the reservoir water 

table always is higher than the ground water table, and where the flow always is 

from the reservoir to the aquifer. However, for cases where water flows in both 

directions, or where there are significant temporal variations in the sedimentation, 

the present coupling is not necessarily sufficient. 

D. Hydrology catchment/crop growth (MIKE SHE/DAISY) 

This coupling is an iterative coupling with data flowing in both directions. However, 

it is not a full integration with the two model codes running simultaneously. 

Therefore, a number of iterations are required until the input data used in MIKE 

SHE (vegetation data simulated by DAISY) generates the input data used in DAISY 

(ground water levels) and vice versa. For example, changes in river water levels 

affect the ground water levels, implying that the crop growth conditions change and 

hence, the DAISY simulated vegetation data used by MIKE SHE to simulate the 

ground water levels are not correct. In such a case, the MIKE SHE simulation has to 

be repeated with the new crop growth data and subsequently, the DAISY simulation 

has to be repeated with the new ground water levels, etc., until the differences 

become negligible. This coupling has been used successfully in previous studies 

(Styczen and Storm, 1993), but may, due to the iterative mode, be troublesome in 

practice. 
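
The iterative procedure described for coupling D amounts to a fixed-point iteration between two models. A minimal sketch is given below. The two stand-in functions are toy linear relations invented for illustration; they are not MIKE SHE or DAISY, and the tolerance and iteration limit are likewise assumptions.

```python
from typing import Callable, List, Tuple


def iterate_coupling(
    run_hydrology: Callable[[List[float]], List[float]],
    run_crop: Callable[[List[float]], List[float]],
    initial_vegetation: List[float],
    tolerance: float = 1e-3,
    max_iterations: int = 20,
) -> Tuple[List[float], List[float], int]:
    """Alternate the two models until ground water levels stop changing."""
    vegetation = list(initial_vegetation)
    previous_levels = None
    for iteration in range(1, max_iterations + 1):
        levels = run_hydrology(vegetation)   # hydrology run with current vegetation data
        vegetation = run_crop(levels)        # crop run with the new ground water levels
        if previous_levels is not None:
            change = max(abs(a - b) for a, b in zip(levels, previous_levels))
            if change < tolerance:
                return levels, vegetation, iteration
        previous_levels = levels
    raise RuntimeError("coupling did not converge within max_iterations")


# Toy stand-ins (hypothetical): more vegetation lowers the water table,
# and a higher water table supports more vegetation.
def toy_hydrology(vegetation: List[float]) -> List[float]:
    return [10.0 - 0.5 * v for v in vegetation]


def toy_crop(levels: List[float]) -> List[float]:
    return [0.2 * h for h in levels]


levels, vegetation, n_iterations = iterate_coupling(toy_hydrology, toy_crop, [1.0])
```

In this toy case each pass reduces the mismatch by a constant factor, so a handful of iterations suffices. With real models every pass is a full simulation, which is why the iterative mode can be troublesome.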

E. Surface water/ground water quality (MIKE 11 – MIKE 21/MIKE SHE) 

In contrast to the full coupling of flows (coupling A), the corresponding water 

quality coupling is a simple one-way coupling, with the river and reservoir models 

providing the water quality parameters of the infiltrating water, which are used as 

boundary conditions for the ground water quality simulations. This coupling is 

sufficient in the present case with respect to the reservoir, where the flow always 

is from the reservoir to the aquifer. The river-aquifer interaction involves flows in 

both directions, but the return flow from the aquifer to the Danube is very small 

(about 1%) as compared to the Danube flow, and hence, the feedback from the 

ground water quality to Danube water quality is assumed negligible. However, for 

other cases where the mass flux from the aquifer to the river system is important 

for the river water quality, the present one-way coupling will not be sufficient. 
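
The argument that a return flow of about 1% can be neglected may be made explicit with a simple mass balance. The worked equation below assumes complete mixing of a conservative solute at the point of return; the symbols are introduced here for illustration and do not appear in the original text.

```latex
% Complete-mixing mass balance (illustrative; all symbols are assumptions)
% Q_D, C_D : Danube discharge and concentration upstream of the return flow
% Q_a, C_a : return flow from the aquifer and its concentration
\begin{align}
C_{\mathrm{mix}} &= \frac{Q_D C_D + Q_a C_a}{Q_D + Q_a}, \\
C_{\mathrm{mix}} - C_D &= \frac{Q_a}{Q_D + Q_a}\,\bigl(C_a - C_D\bigr)
  \approx 0.01\,\bigl(C_a - C_D\bigr)
  \quad \text{for } Q_a \approx 0.01\,Q_D .
\end{align}
```

Under these assumptions the river concentration shifts by only about one hundredth of the concentration difference between aquifer and river, so the feedback matters only where that difference is very large or the return flow is a larger share of the river flow.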


8. Discussion and Conclusions 

The hydrological and ecological system of the Danubian Lowland is so complex 

with so many interactions between the surface and the subsurface water regimes 

and between physical, chemical and biological changes, that an integrated numerical 

modelling system of the distributed physically-based type is required in order 

to provide quantitative assessments of environmental impacts on the ground water, 

the surface water and the floodplain ecosystem of alternative management options 

for the Gabcikovo hydropower scheme. 

Such an integrated modelling system has been developed, and an integrated 

model has been constructed, calibrated and, to the extent possible, validated for 

the 3000 km² area. The individual components of the modelling system represent 

state-of-the-art techniques within their respective disciplines. The uniqueness is the 

full integration. The integrated system enables a quite detailed level of modelling, 

including quantitative predictions of the surface and ground water regimes in the 

floodplain area, ground water levels and dynamics, ground water quality, crop 

yield and nitrogen leaching from agricultural land, sedimentation and erosion in 

rivers and reservoirs, surface water quality as well as frequency, magnitude and 

duration of inundations in floodplain areas. The computations were carried out on 

Hewlett Packard Apollo 9000/735 UNIX workstations with 132 MB RAM. With 

a 300 MHz Pentium II NT computer a typical computational time for one of 

the steps described in Section 6.2 (Figure 10) would be 2–10 hr. Thus, although 

the integrated system is rather computationally demanding, the computational requirements 

are not a serious constraint in practice as compared to the demand for 

comprehensive field data. 

For most of the individual model components, traditional split-sample validation 

tests have been carried out, thus documenting the predictive capabilities of 

these models. However, this was not possible for some aspects of the integrated 

model. Hence, according to rigorous scientific modelling protocols, the integrated 

model can be argued to have a rather limited predictive capability associated with 

large uncertainties. A theoretical analysis of error propagation in such an integrated 

model would be quite interesting, but was outside the scope of the present study 

which was limited to the comprehensive task of developing the integrated modelling 

system and establishing the integrated model on the basis of all available 

data. However, on the basis of the few possible tests (e.g. Figure 7) of the integrated 

model against independent data not used in the calibration-validation process for 

the individual models, it is our opinion that the uncertainties of the integrated model 

are significantly smaller than those of the individual models. The two key reasons 

for this are: (1) in the integrated model the internal boundaries are simulated by 

neighbouring model components and not just assessed through qualified but subjective 

estimates by the modeller; and (2) the integrated model makes it possible 

to explicitly include more sources of data in validation tests that cannot all be 

utilised in the individual models. Thus, by adding independent validation tests for


the integrated model, such as the one shown in Figure 7 on discharges in seepage 

canals, to the validation tests for the individual models, the outputs of the integrated 

model have been subject to a more comprehensive test based on more data and 

hence, must be considered less uncertain than outputs from the individual models. 
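
A split-sample test of the kind referred to above divides a record into a calibration period and an independent validation period and evaluates model performance on the latter. The sketch below illustrates the idea. The Nash-Sutcliffe efficiency is used here as the performance measure purely as an assumption; the text does not state which criteria were applied, and the data are made up.

```python
from typing import List, Sequence, Tuple


def split_sample(series: Sequence[float], split_index: int) -> Tuple[List[float], List[float]]:
    """Split a record into a calibration part and a validation part."""
    return list(series[:split_index]), list(series[split_index:])


def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 equals the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


# Hypothetical observed and simulated discharges (m3/s) for six time steps.
observed = [40.0, 55.0, 70.0, 65.0, 50.0, 45.0]
simulated = [42.0, 53.0, 68.0, 66.0, 52.0, 44.0]

obs_cal, obs_val = split_sample(observed, 3)
sim_cal, sim_val = split_sample(simulated, 3)

# Only the validation period, which was not used for calibration, is scored.
validation_score = nash_sutcliffe(obs_val, sim_val)
```

Adding a further independent test for the integrated model, as described in the text, corresponds to scoring an additional variable that none of the individual models was calibrated against.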

The environmental impacts of the new reservoir and the diversion of water from 

the Danube through the Gabcikovo power plant can be simulated in rather fine 

detail by the integrated model established for the area. The integrated nature of 

the model has been illustrated by a case study focusing on hydrology and ecology 

in the wetland comprising the river branch system. The integrated model is not 

claimed to be capable of predicting detailed ecological changes at the species level. 

However, it is believed to be capable of simulating changes in the hydrological 

regime resulting from alternative water management decisions to such a degree of 

detail that it becomes a valuable tool for broader assessments of possible ecological 

changes in the area. 

Acknowledgements 

The present paper is based on results from the project ‘Danubian Lowland – Ground 

Water Model’ supported by the European Commission under the PHARE program. 

The project was executed by the Slovak Ministry of the Environment. The work 

was carried out by an international group of research and consulting organisations 

as reflected by the team of authors. The constructive criticisms of two anonymous 

reviewers are acknowledged. 

References 

Bathurst, J. C., Wicks, J. M. and O’Connell, P. E.: 1995, The SHE/SHESED basin scale water flow 

and sediment transport modelling system, In V. P. Singh (ed.), Computer Models of Watershed 

Hydrology, Water Resources Publications, pp. 563–594. 

Calver, A. and Wood, W. L.: 1995, The institute of hydrology distributed model, In V. P. Singh (ed.), 

Computer Models of Watershed Hydrology, Water Resources Publications, pp. 595–626. 

CEC: 1991, Commission of European Communities, Czech and Slovak Federative Republic, 

Danubian Lowland-Ground Water Model, No. PHARE/90/062/030/001/EC/WAT/1 

DHI: 1995, MIKE 21 Short Description. Danish Hydraulic Institute, Hørsholm, Denmark. 

DHI, DHV, TNO, VKI, Krüger and KVL: 1995, PHARE project Danubian Lowland – Ground Water 

Model (EC/WAT/1), Final Report. Prepared by a consultant group for the Ministry of the Environment, 

Slovak Republic and for the Commission of the European Communities, Vol. 1, 65 pp.; 

Vol. 2, 439 pp.; Vol. 3, 297 pp., Bratislava. 

EC: 1992, Working group of independent experts on variant C of the Gabcikovo-Nagymaros project, 

working Group Report, Commission of the European Communities, Czech and Slovak Federative 

Republic, Republic of Hungary, Budapest, 23 November, 1992. 

EC: 1993a, Working group of monitoring and water management experts for the Gabcikovo system 

of locks – Data Report, Commission of the European Communities, Republic of Hungary, Slovak 

Republic, Budapest, 2 November, 1993. 


EC: 1993b, Working group of monitoring and water management experts for the Gabcikovo system 

of locks – Report on temporary water management regime, Commission of the European 

Communities, Republic of Hungary, Slovak Republic, Bratislava, 1 December, 1993. 

Engesgaard, P.: 1996, Multi-Species Reactive Transport, In M. B. Abbott and J. C. Refsgaard (eds), 

Distributed Hydrological Modelling, Kluwer Academic Publishers, pp. 71–91. 

Griffioen, J., Engesgaard, P., Brun, A., Rodak, R., Mucha, I. and Refsgaard, J. C.: 1995, Nitrate 

and Mn-chemistry in the alluvial Danubian Lowland aquifer, Slovakia. Ground Water Quality: 

Remediation and Protection (GQ95), Proceedings of the Prague Conference, May 1995, IAHS 

Publ. No. 225, pp. 87–96. 

Hansen, S., Jensen, H. E., Nielsen, N. E. and Svendsen, H.: 1991, Simulation of nitrogen dynamics 

and biomass production in winter wheat using the Danish simulation model DAISY. Fertilizer 

Research 27, 245–259. 

Havnø, K., Madsen, M. N. and Dørge, J.: 1995, ‘MIKE 11 – A Generalized River Modelling 

Package’, In V. P. Singh (ed), Computer Models of Watershed Hydrology, Water Resources 

Publications, pp. 733–782. 

Holobrada, M., Capekova, Z., Lukac, M. and Misik, M.: 1994, Prognoses of the Hrusov reservoir 

eutrophication and siltation under various discharge distribution to the Old Danube (in Slovak), 

Water Research Institute (VUVH), Bratislava. 

ICJ: 1997, Case Concerning Gabcikovo-Nagymaros project (Hungary/Slovakia). Summary of the 

Judgement of 25 September 1997. International Court of Justice, The Hague, (available on 

www.icj-cij.org). 

JAR: 1995, 1996, 1997, Joint Annual Report of the environment monitoring in 1995, 1996, 1996 

according to the ‘Agreement between the Government of the Slovak Republic and the Government 

of Hungary about Certain Temporary Measures and Discharges to the Danube and Mosoni 

Danube’, signed 19 April, 1995. 

Klemes, V.: 1986, Operational testing of hydrological simulation models, Hydrological Sciences 

Journal, 13–24. 

Klucovska, J. and Topolska, J.: 1995, Water regime in the Danube river and its river branches, In I. 

Mucha (ed.), Gabcikovo Part of the Hydroelectric Power Project. Environmental Impact Review, 

Faculty of Natural Sciences, Comenius University, Bratislava, pp. 33–42. 

Kocinger, D.: 1995, Gabcikovo Part of the Hydroelectric Power Project, Basic Characteristics, In I. 

Mucha (ed.), Gabcikovo Part of the Hydroelectric Power Project – Environmental Impact Review, 

Faculty of Natural Sciences, Comenius University, Bratislava, pp. 5–14. 

Koncsos, L., Schütz, E. and Windau, U.: 1995, Application of a comprehensive decision support 

system for the water quality management of the river Ruhr, Germany, In S. P. Simonovic, Z. 

Kundzewicz, D. Rosbjerg and K. Takeuchi (eds), Modelling and Management of Sustainable 

Basin-Scale Water Resources Systems, IAHS Publ. No. 231, pp. 49–59. 

Menetti, M.: 1995, Analysis of regional water resources and their management by means of numerical 

simulation models and satellites in Mendoza, Argentina, In S. P. Simonovic, Z. Kundzewicz, D. 

Rosbjerg and K. Takeuchi (eds), Modelling and Management of Sustainable Basin-Scale Water 

Resources Systems, IAHS Publ. No. 231, pp. 49–59. 

Mucha, I.: 1992, Database processing of the hydropedological parameters for the ground water flow 

model of the Danubian Lowland (in Slovak), Ground Water Division, Faculty of Natural Science, 

Comenius University, Bratislava. 

Mucha, I., Paulikova, E., Hlavaty, Z., Rodak, D. and Pokorna, L.: 1992a, Danubian Lowland Ground 

Water Model, Working Manual to consortium of invited specialists for workshop in Bratislava, 

Ground Water Division, Faculty of Natural Sciences, Comenius University, Bratislava. 

Mucha, I., Paulikova, E., Hlavaty, Z. and Rodak, D.: 1992b, Elaboration of basis data for preparation 

of hydrogeological parameters for the model of the ground water flow of the Danubian Lowland 

area (in Slovak), Ground Water Division, Faculty of Natural Science, Comenius University, 

Bratislava.


Mucha, I., Paulikova, E., Hlavaty, Z., Rodak, D. and Pokorna, L.: 1993, Surface and ground water 

regime in the Slovak part of the Danube alluvium, Ground Water Division, Faculty of Natural 

Science, Comenius University. 

Mucha, I. (ed): 1995, Gabcikovo part of the hydroelectric power project environmental impact 

review. Evaluation based on two years monitoring, Faculty of Natural Sciences, Comenius 

University, Bratislava. 

Mucha, I., Rodak, D., Hlavaty, Z. and Bansky, L.: 1997, Environmental aspects of the design 

and construction of the Gabcikovo Hydroelectric Power Project on the river Danube, Proceedings 

International Symposium on Engineering Geology and the Environment, organized by the 

Greek National Group of IAEG, Athens, June 1997, Engineering Geology and the Environment, 

pp. 2809–2817. 

Nachtnebel, H.-P. (ed): 1989, Ökosystemstudie Donaustau Altenwörth, Veränderungen durch das 

Donaukraftwerk Altenwörth, Österreichische Akademie der Wissenschaften, Veröffentlichungen 

des Österreichischen MaB-Programms, Band 14, Universitätsverlag Wagner, Innsbruck. 

Person, M., Raffensperger, J. P., Ge, S. and Garven, G.: 1996, Basin-scale hydrogeologic modelling, 

Rev. Geophys. 34(1), 61–87. 

Perspektiven: 1989, Staustufe Freudenau, Perspektiven, Magazin für Stadtgestaltung und Lebensqualität, 

Dezember 1989. 

Refsgaard, J. C.: 1997, Parameterisation, calibration and validation of distributed hydrological 

models, J. Hydrology 198, 69–97. 

Refsgaard, J. C. and Storm, B.: 1995, MIKE SHE, In V. P. Singh (ed), Computer Models of Watershed 

Hydrology, Water Resources Publications, pp. 809–846. 

Singh, V. P. (ed): 1995, Computer Models of Watershed Hydrology, Water Resources Publications. 

Sørensen, H. R., Klucovska, J., Topolska, T., Clausen, T. and Refsgaard, J. C.: 1996, An engineering 

case study – Modelling the influences of the Gabcikovo hydropower plant in the hydrology and 

ecology in the Slovak part of the river branch system, In M. B. Abbott and J. C. Refsgaard (eds), 

Distributed Hydrological Modelling, Kluwer Academic Publishers, pp. 233–253. 

Styczen, M. and Storm, B.: 1993, Modelling of N-movements on catchment scale – a tool for analysis 

and decision making. 1. Model description. 2. A case study, Fertilizer Research 36, 1–17. 

Topolska, J. and Klucovska, J.: 1995, River morphology, In I. Mucha (ed.), Gabcikovo Part of the Hydroelectric 

Power Project. Environmental Impact Review, Faculty of Natural Sciences, Comenius 

University, Bratislava, pp. 23–32. 

VKI: 1995, Short Description of water quality and eutrophication modules. Water Quality Institute, 


Winter, T. C.: 1995, Recent advances in understanding the interaction of groundwater and surface 

water, Rev. Geophys., Supplement, U.S. National Report 1991–94 to IUGG, pp. 985–994. 

Yan, J. and Smith, K. R.: 1994, Simulation of integrated surface water and ground water systems – 

model formulation, Water Resources Bulletin 30(5), 879–890.

[10] 

Refsgaard JC, Thorsen M, Jensen JB, Kleeschulte S, Hansen S (1999) Large 

scale modelling of groundwater contamination from nitrate leaching. 

Journal of Hydrology, 221(3-4), 117-140. 


Journal of Hydrology 221 (1999) 117–140 

Large scale modelling of groundwater contamination from 

nitrate leaching 

J.C. Refsgaard (a, *), M. Thorsen (a), J.B. Jensen (a), S. Kleeschulte (b), S. Hansen (c) 

(a) Danish Hydraulic Institute, Hørsholm, Denmark 

(b) GIM, Luxembourg 

(c) Royal Veterinary and Agricultural University, Copenhagen, Denmark 

Received 20 July 1998; received in revised form 3 May 1999; accepted 31 May 1999 

Abstract 

Groundwater pollution from non-point sources, such as nitrate from agricultural activities, is a problem of increasing 

concern. Comprehensive modelling tools of the physically based type are well proven for small-scale applications with 

good data availability, such as plots or small experimental catchments. The two key problems related to large-scale simulation 

are data availability at the large scale and model upscaling/aggregation to represent conditions at larger scale. This paper 

presents a methodology and two case studies for large-scale simulation of aquifer contamination due to nitrate leaching. Readily 

available data from standard European level databases such as GISCO, EUROSTAT and the European Environment Agency 

(EEA) have been used as the basis of modelling. These data were supplemented by selected readily available data from national 

sources. The model parameters were all assessed from these data by use of various transfer functions, and no model calibration 

was carried out. The adopted upscaling procedure combines upscaling from point to field scale using effective parameters with a 

statistically based aggregation procedure from field to catchment scale, preserving the areal distribution of soil types, vegetation 

types and agricultural practices on a catchment basis. The methodology was tested on two Danish catchments with good 

simulation results on water balance and nitrate concentration distributions in groundwater. The upscaling/aggregation procedure 

appears to be applicable in many areas with regard to root zone processes such as runoff generation and nitrate leaching, 

while it has important limitations with regard to hydrograph shape due to its lack of accounting for scale effects in relation to 

stream aquifer interaction. © 1999 Elsevier Science B.V. All rights reserved. 

Keywords: Upscaling; Databases; Non-point pollution; Nitrate leaching; Distributed model; Water balance 

1. Introduction 


Groundwater is a significant source of freshwater 

used by industry, agriculture and domestic users. 

However, increasing demand for water, increasing 

use of pesticides and fertilisers as well as atmospheric 

deposition constitute a threat to the quality of groundwater. 

The use of fertilisers and manure leads to the 

leaching of nitrates into the groundwater and atmospheric 

deposition contributes to the acidification of 

soils that may have an indirect effect on the contamination 

of water. 

* Corresponding author. 

E-mail address: jcr@dhi.dk (J.C. Refsgaard) 

In Europe, for instance, the present situation is 

summarised in EEA (1995), where it is assessed that 

the major part of aquifers in Northern and Central 

Europe are subject to risk of nitrate contamination 

amongst others due to agricultural activities. Therefore, 

policy makers and legislators in the EU are 

concerned about the issue and a number of preventive 

0022-1694/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. 

PII: S0022-1694(99)00081-5


legislative steps are currently being taken (EU 

Council of Ministers, 1991; EC, 1996). 

In the scientific community, concerns on groundwater 

contamination have motivated the development 

of numerous simulation models for groundwater quality 

management. Groundwater models describing the 

flow and transport mechanisms of aquifers have been 

developed since the 1970s and applied in numerous 

pollution studies. They have mainly described the 

advection and dispersion of conservative solutes. 

More recently, geochemical and biochemical reactions 

have been included to simulate the transport 

and fate of pollutants from point sources such as industrial 

and municipal waste disposal sites, see e.g. Mangold 

and Tsang (1991); Engesgaard et al. (1996) for overviews. 

Fewer attempts have been made to simulate 

non-point pollution at catchment scale resulting 

from agricultural activities, see e.g. Thorsen et al. 

(1996); Person et al. (1996) for overviews. The 

approaches range from relatively simple models 

with semi-empirical process descriptions of the 

lumped conceptual type such as ANSWERS (Beasley 

et al., 1980), CREAMS (Knisel, 1980; Knisel and 

Williams, 1995), GLEAMS (Leonard et al., 1987), 

SWRRB (Arnold and Williams, 1990; Arnold et al., 

1995) and AGNPS (Young et al., 1995) to more 

complex models with a physically based process 

description. The physically based models are most 

commonly one-dimensional leaching models, such 

as RZWQM (DeCoursey et al., 1989, 1992), Daisy 

(Hansen et al., 1991) and WAVE (Vereecken et al., 

1991; Vanclooster et al., 1994, 1995), which basically 

describe root zone processes only, while true, 

spatially distributed, catchment models based on 

comprehensive process descriptions, such as the 

coupled MIKE SHE/Daisy (Styczen and Storm, 

1993), are seldom reported. The simple conceptual 

models are attractive because they require relatively 

less data, which are usually easily accessible, while 

the predictive capability of these models with regard 

to assessing the impacts of alternative agricultural 

practices is questionable due to the semi-empirical 

nature of the process descriptions. Conversely, a 

key problem in using the more complex catchment 

models operationally lies in the generally large data 

requirements prescribed by the developers of such 

model codes. However, due to the better process 

descriptions these models may for some types of 

application be expected to have better predictive 

capabilities than the simpler models (Heng and Nikolaidis, 

1998). 

Input data for the complex catchment models have 

traditionally been available in practice only for small 

areas such as experimental research catchments. 

However, as more and more data have been gathered 

in computerised databases and, in particular, in 

Geographical Information Systems (GIS), the data 

availability has improved significantly. Further, 

experience from case studies indicates that a considerable 

part of the input data may be derived from 

statistical data and more general databases (Styczen 

and Storm, 1995). 

The database of EUROSTAT, the statistical office 

of the European Commission, holds statistical information 

about different topics from all Member States 

of the European Union. Agricultural statistics provide 

information on main crops, on the structure of agricultural 

holdings and on crop and animal production. 

Environment statistics provide figures on the impacts of 

other sectors’ activities on the environment, such as fertiliser 

and pesticide input, groundwater withdrawal, 

water quality or manure production on animal 

farms. These figures are mostly aggregated and 

published on national level. 

In order to use these statistics in a spatially distributed 

simulation model, the information needs to be 

spatially referenced to represent a unit on the ground. 

Therefore the statistical information needs to be 

linked to a GIS data set. Such GIS data is stored in 

the GISCO (Geographic Information System of the 

European Commission) database. The GISCO database 

holds spatial data about administrative boundaries 

down to commune level, thematic data sets 

such as the soil database, CORINE land cover (managed 

by the EEA) or climatic time series for about 2000 

measuring stations in the European Union. 
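
Linking statistical information to spatial units, as described above, is in essence a join on a shared region identifier. The sketch below shows that step with plain Python dictionaries. The region codes, field names and numbers are all hypothetical and do not correspond to actual EUROSTAT or GISCO records or schemas.

```python
from typing import Dict, List

# Hypothetical regional statistics keyed by an administrative region code.
regional_statistics: Dict[str, Dict[str, float]] = {
    "REGION-A": {"fertiliser_kg_n_per_ha": 120.0},
    "REGION-B": {"fertiliser_kg_n_per_ha": 95.0},
}

# Hypothetical spatial units, each carrying the code of the region it lies in.
spatial_units: List[Dict[str, object]] = [
    {"unit_id": 1, "region_code": "REGION-A", "area_ha": 2500.0},
    {"unit_id": 2, "region_code": "REGION-B", "area_ha": 1800.0},
]


def attach_statistics(
    units: List[Dict[str, object]],
    statistics: Dict[str, Dict[str, float]],
) -> List[Dict[str, object]]:
    """Return copies of the spatial units with their region's statistics attached."""
    linked = []
    for unit in units:
        code = unit["region_code"]
        if code not in statistics:
            raise KeyError(f"no statistics for region {code!r}")
        linked.append({**unit, **statistics[code]})
    return linked


linked_units = attach_statistics(spatial_units, regional_statistics)

# Once linked, spatially referenced quantities can be derived per unit.
nitrogen_input_kg = [u["area_ha"] * u["fertiliser_kg_n_per_ha"] for u in linked_units]
```

The join assigns the same regional value to every unit within a region, which reflects the aggregated nature of the statistics rather than any within-region variation.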

Thus on one hand, there is a clearly expressed need 

from decision makers at national and international 

level to have tools, which on the basis of readily available 

data can predict the risks of groundwater pollution 

from non-point sources and the impacts of 

alternative agricultural management practices; and 

on the other hand, the scientific community has 

achieved new knowledge and developed new tools 

aiming at this. However, there are some important 

gaps to be filled before the scientifically based tools


Fig. 1. Schematic structure of the MIKE SHE. 

can be applied operationally for supporting the decision 

makers: 

• The physically based models are very promising 

tools for assessing the impacts of alternative agricultural 

practices, but have so far been tested on 

plot scale and very small experimental catchments, 

whereas the need from a policy making point of 

view mainly relates to application on a much larger 

scale. Hence, there is a need to derive and test 

methodologies for upscaling of such models to 

run with model grid sizes one to two orders of 

magnitude larger than usually done. 

• Readily available data on large (national and international) 

scales do exist, although in a somewhat 

aggregated form. However, such data have not yet 

been used as the basis for comprehensive modelling, 

which so far has always been based on more 

detailed data, often from experimental catchments. 

Hence, there is a need to test to which extent these 

readily available data are suitable for modelling. 

• There is a need to assess the predictive 

uncertainties, before it can be evaluated whether 

the approach of combining complex predictive 

models with existing data bases is of any practical 

use in the decision making process or whether the 

uncertainties are too large. 

This paper presents results from a joint EU research 

project on prediction of non-point nitrate contamination 

at catchment scale due to agricultural activities. 

Other results from the same study focussing on uncertainty 

aspects are presented in UNCERSDSS (1998), 

Refsgaard et al. (1998a, 1999) and Hansen et al. 

(1999). 

2. Methodology 

2.1. Materials and methods 

2.1.1. MIKE SHE 

MIKE SHE is a modelling system describing the 

flow of water and solutes in a catchment in a distributed 

physically based way. This implies numerical



solutions of the coupled partial differential equations 

for overland (2D) and channel flow (1D), unsaturated 

flow (1D) and saturated flow (3D) together with a 

description of evapotranspiration and snowmelt 

processes. The model structure is illustrated in Fig. 

1. For further details reference is made to the literature 

(Abbott et al., 1986; Refsgaard and Storm, 1995). 

2.1.2. Daisy 

Daisy (Hansen et al., 1991) is a one-dimensional 

physically based modelling tool for the simulation 

of crop production and water and nitrogen balance 

in the root zone. Daisy includes modules for description 

of evapotranspiration, soil water dynamics based 

on Richards’ equation, water uptake by plants, soil 

temperature, soil mineral nitrogen dynamics based 

on the advection–dispersion equation, nitrate uptake 

by plants and nitrogen transformations in the soil. The 

nitrogen transformations simulated by Daisy are 

mineralization–immobilization turnover, nitrification 

and denitrification. In addition, Daisy includes a 

module for description of agricultural management 

practices. Details on the Daisy application in the 

present study are given by Hansen et al. (1999). 

2.1.3. MIKE SHE/Daisy coupling 

By combining MIKE SHE and Daisy, a complete 

modelling system is available for the simulation of 

water and nitrate transport in an entire catchment. In 

the present case the coupling is a sequential one. Thus 

for all agricultural areas, Daisy first produces calculations 

of water and nitrogen behaviour from the soil 

surface and through the root zone. The percolation of 

water and nitrate at the bottom of the root zone simulated 

by Daisy, is then used as input to MIKE SHE 

calculations for the remaining part of the catchment. 

For natural areas, MIKE SHE calculates also the root 

zone processes assuming no nitrate contribution from 

these areas. Owing to the sequential execution of the 

two codes, it has to be assumed that there is no feed 

back from the groundwater zone (MIKE SHE) to the 

root zone (Daisy). Further, overland flow generated by 

high intensity rainfall (Hortonian) cannot be simulated 

by this coupling, while overland flow due to 

saturation from below (Dunne) can be accounted for 

by MIKE SHE. 

Thus, MIKE SHE does not in the present case 

handle evapotranspiration and other root zone 

processes in the agricultural areas. As Daisy is one-dimensional,

one Daisy run in principle should be 

carried out for each of MIKE SHE’s horizontal 

grids. However, several MIKE SHE grids are assumed 

to have identical root zone properties (soil, crop, agricultural 

management practices, etc.), so that in practice

the outputs from each Daisy run can be used as 

input to several MIKE SHE grids. 
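The reuse of root zone runs described above can be sketched as follows. This is a minimal illustration, not the actual MIKE SHE/Daisy interface: the function `run_root_zone_model`, the key structure and all numbers are hypothetical stand-ins chosen only to show that one run per unique combination of root zone properties can serve many grid cells.

```python
from typing import Dict, List, Tuple

# (soil type, crop rotation scheme): hypothetical key describing root zone properties
RootZoneKey = Tuple[str, str]


def run_root_zone_model(key: RootZoneKey) -> Dict[str, float]:
    """Stand-in for one Daisy run; the numbers are invented for illustration."""
    soil, rotation = key
    percolation_mm = 400.0 if soil == "coarse sand" else 250.0
    nitrate_kg_n_ha = 80.0 if rotation == "cattle" else 50.0
    return {"percolation_mm": percolation_mm, "nitrate_kg_n_ha": nitrate_kg_n_ha}


def root_zone_inputs(grid_keys: List[RootZoneKey]) -> Tuple[List[Dict[str, float]], int]:
    """Return the percolation input for every grid cell and the number of root zone runs."""
    cache: Dict[RootZoneKey, Dict[str, float]] = {}
    inputs = []
    for key in grid_keys:
        if key not in cache:          # one run per unique combination of properties
            cache[key] = run_root_zone_model(key)
        inputs.append(cache[key])     # reused by every cell sharing the same key
    return inputs, len(cache)


grid = [("coarse sand", "cattle"), ("coarse sand", "pig"),
        ("coarse sand", "cattle"), ("coarse sand", "pig")]
inputs, n_runs = root_zone_inputs(grid)
print(f"{len(grid)} grid cells served by {n_runs} root zone runs")
```

Because the coupling is sequential, the cached output never depends on the groundwater state of the receiving cell, which is exactly the no-feedback assumption stated in the text.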

2.2. Data availability at European databases 

Input data for modelling at the European scale need 

to satisfy certain requirements to make them useful for 

large-scale applications: 

• The data must be available for the whole of 

Europe. 

• The data must be harmonised according to a 

common nomenclature in order to avoid regional 

or national inconsistencies. 

• The data should be available in a seamless database. 

• The data should be available from one single 

source to avoid regional or national inconsistencies. 

• The data should be available in a format which can 

be directly integrated into a Geographical Information 

System (GIS). 

Attached to the use of “European” data sets are also 

certain problems. The data are generalised in

geometric as well as in thematic detail, so local particularities,

which are especially important for hydrological

simulations, are not always accounted for. Often

information that is required for specific modelling

objectives is not directly available at European

level, demanding the establishment and use of transfer

functions instead. Conversely, information is

sometimes too specific when it has been collected in 

the framework of a particular research project, e.g. 

information on a particular soil property is being 

collected in natural soils but not in agricultural soils. 

Given these formal requirements, a first task of the 

project was to study the availability of data sets suited 

for large-scale hydrological modelling of groundwater 

contamination from diffuse sources. After 

intensive searches of on-line data catalogues, paper 

publications and direct contacts with organisations 

holding relevant information, it was possible to

Table 1

Data sources for European scale hydrological modelling

| Data | Potential data source identified in European data base | Source actually used for modelling | Scale of available data used |
| --- | --- | --- | --- |
| Topography | USGS^a/GISCO | USGS/GISCO | 1 km grid |
| Soil type | GISCO soil map | GISCO soil map | 1 km grid |
| Soil organic matter | RIVM^b report | Experience value for Danish arable soils^c | Denmark |
| Vegetation | EEA: CORINE land cover | EEA: CORINE land cover | 1 km grid |
| River network and river cross sections | DCW^d | Provided by an application developed within the project | 1 km grid |
| Geology | Report on groundwater resources in Denmark (EC, 1982); RIVM: digital map data of report | EC (1982) | County, i.e. approximately 3000 km² |
| Groundwater abstraction | EC (1982); RIVM: digital map data of report | EC (1982) | Commune, i.e. approximately 200 km² |
| Management practices | SC-DLO^e report | Plantedirektoratet (1996) | Denmark |
| Crop type | Eurostat: Regional Statistics | Agricultural Statistics (1995) | County, i.e. approximately 3000 km² |
| Livestock density | Eurostat: Regional Statistics; Eurostat: Eurofarm | Agricultural Statistics (1995) | County, i.e. approximately 3000 km² |
| Fertilizer consumption | Eurostat: Environmental Statistics | | 3000 km² |
| Manure production | Eurostat: Environmental Statistics | | 3000 km² |
| Atmospheric deposition | MARS project | National data | Denmark |
| Climatic variables | MARS project^f | National data | Denmark |
| River runoff | GRDC^g | National data | Catchment |

^a USGS: United States Geological Survey.

^b RIVM: National Institute of Public Health and the Environment of The Netherlands.

^c RIVM data only include natural areas, not arable land. Instead the figure was assessed on the basis of previous experience with Danish agricultural soils.

^d DCW: Digital Chart of the World.

^e SC-DLO: Winand Staring Centre, The Netherlands.

^f MARS: Monitoring Agriculture by Remote Sensing database.

^g GRDC: Global Runoff Data Centre, database mainly for large river basins.

identify sources for all the information requirements. 

However, after evaluation of all the potential sources 

the following deficiencies became apparent: 

• Not all information was available in spatially referenced 

GIS format, therefore other sources such as 

tables and statistics had to be considered. 

• Not all information was available from 

“European” databases; ultimately, national sources

had to be considered. For these national sources 

strict requirements in terms of ease of availability, 

data quality and data comparability were 

imposed. 

• The scale of the available data was often too coarse 

for the application. Global data sets with 1° × 1°

longitude/latitude resolution are often not detailed 

enough. 

The potential “European scale” data sources and the 

data sources which ultimately were used for the model

are shown in Table 1. 

Data about climatic variables were obtained from



the national meteorological institutes and river runoff 

from the national hydrological institutes. These data 

were only available from national sources, but on the

other hand they are probably the most easily available

(if the issue of price charges is disregarded) and

the most easily comparable due to internationally

harmonised measuring techniques at these organisations.

Regional statistics on Denmark obtained from 

EUROSTAT proved to be not detailed enough 

(country level only). The required statistical information 

could easily be recovered from Danish national 

statistics. 

Cost estimates for the compilation of the database 

have only been undertaken to a limited extent. The 

project data themselves have mostly been obtained in

exchange for the anticipated project results, i.e. at 

no cost. The main data that in a fully commercial 

environment cost a substantial amount of money are 

meteorological data which are available from the 

national meteorological institutes (Kleeschulte, 

1998). 

2.3. Change of scale 

Large scale hydrological models are required for a 

variety of applications in hydrological, environmental 

and land surface-atmosphere studies, both for research 

and for day to day water resources management 

purposes. The physically based models have so far 

mainly been tested and applied at small scale and 

therefore require upscaling. The complex interaction

between spatial scale and spatial variability is widely 

perceived as a substantial obstacle to progress in this 

respect (Blöschl and Sivapalan, 1995; and many 

others). 

The research results on the scaling issue reported 

during the past decade have, depending on the particular 

applications, focussed on different aspects, 

which may be categorised as follows: 

• Subsurface processes focussing on the effect of 

geological heterogeneity. 

• Root zone processes including interactions 

between land surface and atmospheric processes. 

• Surface water processes focussing on topographic 

effects and stream–aquifer interactions. 

The effect of spatial heterogeneity on the description 

of subsurface processes has been the subject of 

comprehensive research for two decades, see e.g. 

Dagan (1986) and Gelhar (1986) for some of the 

first consolidated results and Wen and Gómez- 

Hernández (1996) for a more recent review, mainly 

related to aquifer systems. The focus in this area is 

largely concerned with upscaling of hydraulic 

conductivity and its implications on solute transport 

and dispersion processes in the unsaturated zone and 

aquifer system, typically at length scales less than 

1 km. 

The research in the land surface processes has 

mainly been driven by climate change research 

where the meteorologists typically focus on length 

scales up to 100 km. Michaud and Shuttleworth

(1997), in a recent overview, conclude that substantial 

progress has been made for the description of surface 

energy fluxes by using simple aggregation rules. Sellers 

et al. (1997) conclude that “it appears that simple 

averages of topographic slope and vegetation parameters 

can be used to calculate surface energy and 

heat fluxes over a wide range of spatial scales, from 

a few meters up to many kilometers at least for grassland 

and sites with moderate topography”. An interesting 

finding is the apparent existence of a threshold 

scale, or representative elementary area (REA) for 

evapotranspiration and runoff generation processes 

(Wood et al., 1988, 1990, 1995). Famiglietti and 

Wood (1995) conclude on the implications of such

an REA in a study of catchment evapotranspiration 

that “the existence of an REA for evapotranspiration 

modelling suggests that in catchment areas smaller 

than this threshold scale, actual patterns of model 

parameters and inputs may be important factors 

governing catchment-scale evapotranspiration rates 

in hydrological models. In models applied at scales 

greater than the REA scale, spatial patterns of dominant 

process controls can be represented by their 

statistical distribution functions”. The REA scales 

reported in the literature are in the order of 1–5 km².

The research on scale effects related to topography 

and stream–aquifer interactions has been rather 

limited as compared to the above two areas. Saulnier 

et al. (1997) have examined the effect of the grid sizes 

in digital terrain maps (DTM) on the model simulations 

using the topography-based TOPMODEL. They 

concluded that in particular for channel pixels the 

spatial resolution of the underlying DTM is important. 

Refsgaard (1997), applying the distributed MIKE SHE


Fig. 2. Schematic representation of upscaling/aggregation procedure. 

model to the Danish Karup catchment with grid sizes 

of 0.5, 1, 2 and 4 km, found that the discharge hydrograph 

shape was significantly affected for the 2 and 

4 km grids as compared to the almost identical model 

results with 0.5 and 1 km grids. He concluded that the 

main reason for this change was that the density of 

smaller tributaries within the catchment was smaller 

for the models with the larger grids. 

Many researchers doubt whether it is feasible to use 

the same model process descriptions at different 

scales. For instance Beven (1995) states that “… the 

aggregation approach towards macroscale hydrological 

modelling, in which it is assumed that a model 

applicable at small scales can be applied at larger 

scales using ‘effective’ parameter values, is an inadequate 

approach to the scale problem. It is also unlikely 

in the future that any general scaling theory can be 

developed due to the dependence of hydrological 

systems on historical and geological perturbations”. 

We have experienced some of the same problems 

and agree that it is generally not possible to apply 

the same model without recalibration at small and 

large scales. Therefore, we have used another 

approach based on a combination of aggregation and 

upscaling in accordance with the principles recommended 

by Heuvelink and Pebesma (1998). The 

scale terminology and the upscaling procedure 

adopted here are as follows (Fig. 2): 

• The basic modelling system is of the distributed 

physically based type. For application at point 

scale (where it is not used spatially distributed) 

the process descriptions of this model type can be 

tested directly against field data. 

• The model is in this case run with (equations and) 

parameter values in each horizontal grid point 

representing field scale (50–200 m) conditions. 

The field scale is characterised by ‘effective’ soil 

and vegetation parameters, but assuming only one 

soil type and one cropping pattern. Thus the spatial 

variability within a typical field is aggregated and 

accounted for in the ‘effective’ parameter values. 

• The smallest horizontal discretization in the model 

is the grid scale or grid size (1–5 km) that is larger 

than the field scale. This implies that all the variations 

between categories of soil type and crop type



Fig. 3. Locations of the Karup and Odense catchments in Denmark. 

within the area of each grid cannot be resolved and 

described at the grid level. Such input data whose 

variations are not included in the grid scale model 

representation, are distributed randomly at the 

catchment scale so that their statistical distributions 

are preserved at that scale. 

• The results from the grid scale modelling are then 

aggregated to catchment scale (10–50 km) and the 

statistical properties of model output and field data 

are then compared at catchment scale. 

• For applications to larger scales than catchment 

scale, such as continental scale, the catchment 

scale concept is used, just with more grid points. 

This implies that the continental scale can be 

considered to consist of several catchments, within 

each of which the field scale statistical variations 

are preserved and at which scale the predictive 

capability of the model thus lies. 

In the upscaling procedure a distinction is made 

between the terms upscaling and aggregation. Thus, 

spatial attributes are aggregated and model parameters 

are scaled up. A principal difference between 

aggregation and upscaling is that whereas aggregation 

can be defined irrespective of a model operating on 

the aggregated values, upscaling must always be 

defined in the context of a model that uses the parameters 

that have been scaled up (Heuvelink and 

Pebesma, 1998). In this respect the main principle 

of the upscaling procedure can be summarised as 

follows: 

• Upscale model from point scale to field scale. 

• Run model at grid scale using field scale parameters 

in such a way that their statistical properties 

are preserved at catchment scale. 

• Aggregate grid scale model output to catchment 

scale. 
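The three steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure's actual implementation: the class names, area fractions and per-class outputs are hypothetical, and the largest-remainder rounding is simply one convenient way to make integer cell counts match the target fractions.

```python
import random
from collections import Counter


def assign_classes(n_cells, fractions, seed=0):
    """Assign one field-scale class to each grid cell so that the class shares
    over the whole catchment match `fractions` as closely as integer counts allow."""
    raw = {c: f * n_cells for c, f in fractions.items()}
    counts = {c: int(v) for c, v in raw.items()}
    remainder = n_cells - sum(counts.values())
    # Largest-remainder rounding: hand leftover cells to the largest fractional parts.
    for c in sorted(raw, key=lambda c: raw[c] - counts[c], reverse=True)[:remainder]:
        counts[c] += 1
    cells = [c for c, k in counts.items() for _ in range(k)]
    random.Random(seed).shuffle(cells)  # positions are random, not georeferenced
    return cells


def aggregate(cells, output_per_class):
    """Aggregate grid-scale output to one catchment-scale value (simple mean)."""
    return sum(output_per_class[c] for c in cells) / len(cells)


fractions = {"wheat": 0.5, "grass": 0.3, "nature": 0.2}   # hypothetical statistics
leaching = {"wheat": 60.0, "grass": 20.0, "nature": 5.0}   # hypothetical kg N/ha/year
cells = assign_classes(10, fractions)
print(Counter(cells), aggregate(cells, leaching))
```

The shuffle changes which cell carries which class but not the catchment-scale aggregate, which mirrors the point that results are meaningful only after aggregation.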

This methodology mainly attempts to address scaling 

within the second of the above fields, namely root


zone processes, while scaling in relation to subsurface 

processes and stream–aquifer interaction has not been 

considered when designing the present upscaling 

procedure. The methodology has some complications 

and critical assumptions: 

• The assumption of upscaling from point scale to 

field scale is crucial. This assumption is documented 

to be fulfilled in many cases (Jensen and 

Refsgaard, 1991a–c; Djurhuus et al., 1999), but

may fail in other cases (Bresler and Dagan, 

1983), for instance in areas where overland flow 

is a dominant flow mechanism. 

• Running the model at grid scale but using model 

parameters valid at a field scale, which is typically 

2 to 3 orders of magnitude smaller, is necessary to 

make the computational demand acceptable for 

catchment and continental scale applications. The 

solution to this is to assign inputs on soil and vegetation 

types not correctly georeferenced but such 

that their statistical distribution at catchment scale 

is preserved. This implies that results at grid scale 

are dubious and should not be used. The aggregation 

step up to catchment scale is therefore essential. 

• While the statistical properties of the critical 

root zone parameters due to the aggregation 

step have been preserved at catchment scale 

this is not the case for the geological, topographical 

and stream data which are used directly 

at the grid scale. A critical question is therefore

how the catchment scale model output is influenced,

through these other data, by the selection

of grid scale. Here, investigations with 1, 2

and 4 km grids are made. 

3. Application 

3.1. Modelling approach for the Karup and Odense 

catchments 

The modelling studies have focussed on two 

aspects, namely the feasibility of using coarse aggregated 

data available at European level databases, and 

the effect of the upscaling procedure. The modelling 

aims at describing the integrated runoff at the catchment 

outlet and the distribution function of the nitrate 

concentrations sampled from available wells over the 

catchment (aquifer). On this basis the following 

approach has been adopted: 

1. Simulation models have been established for 

two catchments in Denmark, Karup Å and 

Odense Å (Fig. 3), in the following denoted 

the Karup and Odense models, respectively. 

The topographical area of the Karup catchment

at gauging station 20.05 Hagebro is

518 km². Correspondingly, the catchment area

at the gauging station used for the model validation 

tests in the Odense catchment, 45.26 

Ejby Mølle, is 536 km². The most detailed

studies were carried out for the Karup catchment, 

while the results for the Odense catchment 

were included mainly to check the 

generality of the conclusions derived from the 

Karup catchment. 

2. The models are established directly from the 

European level databases and all input parameter 

values are assessed from these data or in a predefined 

objective way from experience values 

obtained from previous model studies. Thus, the 

models are not calibrated at all. 

3. The results of the models are compared with field 

data, on which basis the model performance is 

assessed. 

4. The effects of upscaling have been examined in 

two ways: 

• The models are run with different grid sizes (1, 2 

and 4 km) and the results compared. 

• For the Karup catchment two different procedures 

have been compared, namely: 

the upscaling/aggregation procedure described 

above (Fig. 2), which according to its representation 

of agricultural crops is denoted ‘distributed’; 

a simpler procedure where the agricultural crops 

are upscaled all the way from field scale to 

catchment scale. This implies that one crop 

type represents all the agricultural areas. The 

dominant crop in the area, namely winter 

wheat, has been selected as the crop for the 

70% agricultural area, while the 30% natural/



Fig. 4. Surface topography, catchment delineation and river network for the Karup-EU model. 

urban areas remain as the only other vegetation 

type. This procedure is denoted ‘uniform’. 

3.2. Karup model 

3.2.1. Catchment and river system 

The catchment area and locations of the river 

branches (Fig. 4) were generated from the DEM by 

use of standard ARC/Info functionalities. The generated 

catchment areas for 1, 2 and 4 km grids were 

within 4% of the correct one at station 20.05 Hagebro. 

The river cross-sections were subsequently automatically 

derived on the basis of the following assumptions: 

• The bankfull discharge (i.e. water flow up to top of

cross-section) corresponds to a typical annual 

maximum discharge. This characteristic discharge 

is further assumed uniform in terms of specific 

runoff (l s⁻¹ km⁻²), so that the actual discharge

at any cross section is estimated as the specific 

runoff multiplied by the upstream catchment area 

that can be estimated from the DEM. 

• The river slope corresponds to the slope of the 

surrounding surface, which can be derived from 

the DEM. 

• The cross-section has a trapezium shape with a 

fixed given angle and relation between depth and 

width. 

• The relation between discharge, slope and river 

cross-section can be determined by the Manning 

formula with a given Manning number. 
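The four assumptions above are enough to size a cross-section. The sketch below is one way to do it, assuming the "relation between depth and width" is a fixed bottom-width-to-depth ratio and the "fixed given angle" is expressed as a side slope (horizontal per vertical); these parameter names and all numerical values are hypothetical, since the paper does not list them. With a fixed shape, discharge scales with depth to the power 8/3, so the depth follows in closed form.

```python
import math


def manning_discharge(depth, slope, manning_m, width_to_depth, side_slope):
    """Discharge (m3/s) of a trapezoidal section from the Manning formula,
    Q = M * A * R^(2/3) * S^(1/2), with M the Manning number (m^(1/3)/s)."""
    bottom_width = width_to_depth * depth
    area = (bottom_width + side_slope * depth) * depth
    wetted_perimeter = bottom_width + 2.0 * depth * math.sqrt(1.0 + side_slope ** 2)
    hydraulic_radius = area / wetted_perimeter
    return manning_m * area * hydraulic_radius ** (2.0 / 3.0) * math.sqrt(slope)


def bankfull_depth(specific_runoff_l_s_km2, upstream_area_km2, slope,
                   manning_m, width_to_depth, side_slope):
    """Depth (m) at which the section just carries the characteristic discharge."""
    # Characteristic discharge: specific runoff times upstream area, l/s -> m3/s.
    discharge = specific_runoff_l_s_km2 * upstream_area_km2 / 1000.0
    # For a geometrically similar section Q(h) = Q(1 m) * h^(8/3).
    unit_discharge = manning_discharge(1.0, slope, manning_m, width_to_depth, side_slope)
    return (discharge / unit_discharge) ** (3.0 / 8.0)


# Hypothetical inputs: 50 l/s/km2, 100 km2 upstream, slope 0.001, M = 20.
depth = bankfull_depth(50.0, 100.0, 0.001, 20.0, 4.0, 1.0)
print(f"bankfull depth {depth:.2f} m, bottom width {4.0 * depth:.2f} m")
```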

Most areas in Denmark are drained in order to make 

the land suitable for agriculture. Agricultural areas are 

typically artificially drained with tile drains in combination 

with small ditches. Other areas may be naturally 

drained by creeks and rivers. It is not possible to 

include a detailed and fully correct drainage description 

in a coarse model like the Karup model. Moreover, 

detailed information on drainage network is not 

available. Therefore, when establishing a coarse scale


model, a lumped description must be used. In the 

present case it is simply assumed that the entire catchment 

area is drained and that the drains are located 

1 m below ground surface. Drainage water is 

produced whenever the groundwater table is located 

above this drainage level. Drainage water is routed to 

the nearest river node where it contributes as a source 

to the river flow. Routing of groundwater to the drains 

and further to the ultimate recipient is in MIKE SHE 

described using a linear routing technique, where a 

time constant is specified by the user. In this case a 

time constant of 2.3 × 10⁻⁷ s⁻¹ was used corresponding

to an average retention time (in the linear reservoir) 

of 50 days. This time constant represents a 

typical value for Danish catchments. 
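The link between the time constant and the 50-day retention time, and the threshold behaviour of the drains, can be checked with a short sketch. The function names are hypothetical, and the drain flow expression (time constant times head above drain level) is an assumed reading of the linear routing described above rather than code taken from MIKE SHE.

```python
SECONDS_PER_DAY = 86400.0


def retention_time_days(time_constant_per_s):
    """Average retention time of a linear reservoir, the inverse of its time constant."""
    return 1.0 / time_constant_per_s / SECONDS_PER_DAY


def drain_flow(water_table_m, drain_level_m, time_constant_per_s):
    """Drain outflow per unit area (m/s): proportional to the head above the
    drain level, and zero when the water table is below the drains."""
    head = max(0.0, water_table_m - drain_level_m)
    return time_constant_per_s * head


print(f"retention time: {retention_time_days(2.3e-7):.1f} days")
print(f"drain flow at 0.5 m head: {drain_flow(11.0, 10.5, 2.3e-7):.2e} m/s")
```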

3.2.2. Soil properties 

The soil texture classes in a 1 × 1 km resolution 

were provided by the GISCO soil data base. The 

texture classes were translated into soil parameters 

in terms of hydraulic conductivity functions and soil 

water retention curves using pedo-transfer functions 

(Cosby et al., 1984). According to the GISCO the 

Karup catchment is covered by coarse sandy soil for 

which the following key parameter values were estimated: 

(a) saturated hydraulic conductivity 

K_s = 1.7 × 10⁻⁵ m/s; (b) moisture content at saturation

θ_s = 40 vol%; (c) moisture content at field capacity

θ_FC = 20 vol%; and (d) moisture content at

wilting point θ_wp = 6 vol%.

A specific problem was related to assessment of soil 

organic matter, which is an important parameter for 

nitrogen turnover processes. As indicated in Table 1 

such information was not identified in any of the 

European data bases. Instead a value based on 

previous experience (Lamm, 1971) with Danish agricultural 

soils was estimated. In the plough layer (0– 

20 cm) a value of 1.5%C was used, and this value 

decreased rapidly with depth to a minimum of 

0.01%C below 1 m depth. 

3.2.3. Hydrogeology 

The geological perception of the area and the basis 

for estimation of the hydrogeological parameters used 

in the model are all based on EC (1982), where the 

aquifer is described as composed of two main geological 

layers. 

The upper layer is Quaternary sediments consisting 

of sands and gravel. The transmissivity of these sediments

is assessed to be in the order of 2 × 10⁻³ m²/s

and the thickness about 15 m (EC, 1982). This leads to

a horizontal hydraulic conductivity of 1.3 × 10⁻⁴ m/s

that was used in the model calculations. An anisotropy

factor of 10 between horizontal and vertical hydraulic

conductivities was assumed, leading to a vertical

hydraulic conductivity of 1.3 × 10⁻⁵ m/s. Moreover,

a specific yield of 0.2 and a storage coefficient of

10⁻⁴ m⁻¹ were assumed.

Below the Quaternary sediments there are Miocene 

quartz-sand sediments with a relatively high transmissivity

of 3 × 10⁻³ m²/s and a thickness of typically

10–20 m (EC, 1982). Hence, in the model a thickness

of 15 m has been used. This leads to a horizontal

hydraulic conductivity of 2.0 × 10⁻⁴ m/s. The same

assumptions on anisotropy, specific yield and storage 

coefficients as for the Quaternary sediments were 

applied for the Miocene sediments. 
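As a check on the figures quoted for the two layers, dividing each transmissivity by the layer thickness used in the model reproduces the stated conductivities. The symbols T (transmissivity), b (thickness), K_h and K_v are introduced here only for illustration; the vertical conductivity of the Miocene layer is not quoted in the text and simply follows from applying the same anisotropy factor of 10.

```latex
\begin{align*}
K_h &= \frac{T}{b}, \qquad K_v = \frac{K_h}{10} \\
\text{Quaternary:}\quad K_h &= \frac{2 \times 10^{-3}\ \mathrm{m^2/s}}{15\ \mathrm{m}}
  \approx 1.3 \times 10^{-4}\ \mathrm{m/s}, \qquad K_v \approx 1.3 \times 10^{-5}\ \mathrm{m/s} \\
\text{Miocene:}\quad K_h &= \frac{3 \times 10^{-3}\ \mathrm{m^2/s}}{15\ \mathrm{m}}
  = 2.0 \times 10^{-4}\ \mathrm{m/s}, \qquad K_v = 2.0 \times 10^{-5}\ \mathrm{m/s}
\end{align*}
```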

EC (1982) provides information on groundwater 

abstraction on a commune (local administrative unit) 

basis. The Miocene sediments are described as suitable 

for drinking water supply, so it is assumed that

all groundwater abstractions are made from these

sediments, which form the lower layer in the model. The

total abstraction is given as 13 × 10⁶ m³/year. The

exact location of the individual water supply wells

is not given in EC (1982), so the abstraction has been evenly distributed

among 10–20 model grids located along the river 

system. 

The location of the reduction front in the aquifer is 

an important parameter for nitrate conditions. As 

percolation water containing nitrate moves into 

areas with reduced geochemical conditions the nitrate 

will disappear. No information on this important parameter 

was provided in EC (1982). It was assumed that 

the front separating oxic and reduced aquifer conditions 

all over the aquifer is located in the Miocene 

sediments, 3 m below the interface to the Quaternary 

sediments. This corresponds to a location 18 m below 

the terrain surface. 

3.2.4. Hydrometeorology 

Time series of daily precipitation and temperature 

based on standard meteorological stations within the 

catchment were used. In addition, monthly values of

potential evapotranspiration were calculated by the 

Makkink equation on the basis of climate data from



the synoptic station at Karup airport. The data from 

synoptic stations are generally easily available internationally. 
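For readers unfamiliar with the Makkink equation, the sketch below shows its general radiation-temperature form. It is an illustration only: the paper does not state which coefficient or constants were used, so the default coefficient of 0.7 is an assumption (values of roughly 0.65 to 0.7 are commonly quoted), and the vapour pressure slope, psychrometric constant and latent heat follow the standard FAO-56 style expressions rather than anything given in the source.

```python
import math

LATENT_HEAT_MJ_KG = 2.45       # latent heat of vaporisation (assumed constant)
PSYCHROMETRIC_KPA_C = 0.067    # psychrometric constant near sea level (assumed)


def slope_saturation_vapour_pressure(temp_c):
    """Slope of the saturation vapour pressure curve (kPa/degC) at air temperature."""
    es = 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))
    return 4098.0 * es / (temp_c + 237.3) ** 2


def makkink_pet(temp_c, global_radiation_mj_m2_day, coefficient=0.7):
    """Potential evapotranspiration (mm/day) from temperature and global radiation.
    `coefficient` is an assumed empirical factor, not a value from the paper."""
    delta = slope_saturation_vapour_pressure(temp_c)
    weighting = delta / (delta + PSYCHROMETRIC_KPA_C)
    return coefficient * weighting * global_radiation_mj_m2_day / LATENT_HEAT_MJ_KG


# Hypothetical summer day: 15 degC and 20 MJ/m2/day of global radiation.
print(f"{makkink_pet(15.0, 20.0):.2f} mm/day")
```

Only temperature and global radiation are needed, which is why data from a single synoptic station suffice for this calculation.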

3.2.5. Crop growth, evapotranspiration and nitrate 

leaching model 

Distributions of crop types and livestock densities 

were obtained from Agricultural Statistics (1995) and 

converted to slurry production using standard values 

for nitrogen content. Based on typical crop rotations 

proposed by The Danish Agricultural Advisory 

Centre and the constraints offered by crop distribution 

and livestock density two cattle farm rotations, one 

pig farm rotation and one arable farm rotation were 

constructed. In order to capture the effect of the interaction 

between weather conditions and crops, simulations 

were performed in such a way that each crop at 

its particular position in the considered rotation 

occurred exactly once in each of the years, which 

resulted in a total of 17 crop rotation schemes. 

These 17 schemes were distributed randomly over 

the area in such a way that the statistical distribution 

was in accordance with the agricultural statistics. 

To simulate the trend in the nitrate concentrations 

in the groundwater and in the streams, it is 

necessary to have information on the history of 

the fertiliser application in space and time. In 

Denmark, norms and regulations for fertilisation 

practice are defined (Plantedirektoratet, 1996) 

which regulate the maximum amount of nutrients 

allowed for a particular crop depending on forefruit 

and soil type, and in addition, provide norms 

for the lower limit of nitrogen utilisation for 

organic fertilisers. It was assumed that the farmers 

follow the statutory norms, and that the proportion

of organic fertiliser to the individual crop in a 

rotation is proportional to the production of 

organic fertiliser in the rotation and to the relative 

nitrogen demand of the crop (the fertiliser norm of 

the particular crop in relation to the fertiliser norm 

of the rotation). Based on estimated application 

rates of organic and mineral fertilisers to the individual 

crops each year, the Daisy model simulated 

time series of nitrate leaching from the root zone 

for each agricultural grid. The MIKE SHE model 

then routed these fluxes further through the 

unsaturated zone and in the groundwater layers 

accounting for dispersion and dilution processes 

and finally into the Karup stream where the integrated 

load from the entire catchment was estimated. 
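The allocation rule for organic fertiliser described above can be illustrated as follows. This is a hedged reading of the text, not the authors' implementation: the function name, the crop norms, the amount of organic nitrogen and the utilisation fraction are all hypothetical, and topping up with mineral fertiliser to the norm is an assumption about how the two fertiliser types were combined.

```python
def allocate_fertiliser(crop_norms, organic_n_total, utilisation=0.5):
    """Split the organic N produced in a rotation among its crops in proportion
    to each crop's fertiliser norm, then top up with mineral N to the norm.

    crop_norms      -- fertiliser norm per crop (kg N/ha), hypothetical values
    organic_n_total -- organic N available in the rotation (kg N/ha summed over crops)
    utilisation     -- assumed fraction of organic N counted as plant-available
    """
    total_norm = sum(crop_norms.values())
    plan = {}
    for crop, norm in crop_norms.items():
        organic = organic_n_total * norm / total_norm      # relative nitrogen demand
        mineral = max(0.0, norm - utilisation * organic)   # remainder as mineral N
        plan[crop] = {"organic": organic, "mineral": mineral}
    return plan


norms = {"winter wheat": 180.0, "spring barley": 120.0, "grass": 300.0}
plan = allocate_fertiliser(norms, organic_n_total=150.0)
for crop, amounts in plan.items():
    print(crop, amounts)
```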

The parameterisation of the Daisy model is 

adopted from previous studies. The basic 

processes and standard parameter values were 

originally assessed from results of Danish agricultural 

field experiments (Hansen et al., 1990). Since

then the process description and standard parameters 

have only been subject to minor modifications 

in connection with model tests against data 

from The Netherlands, Germany, Denmark and 

Slovakia (Hansen et al., 1991; Jensen et al., 1994,

1996, 1997; Svendsen et al., 1995). Hence, the

parameters related both to evapotranspiration/

water balance processes and to the nitrogen transformation 

processes have, except for the soil parameters 

described in Section 3.2.3, been taken as 

the standard values. More details on the parameter 

values, their assessed uncertainties and results 

from the Daisy simulations are provided in 

Hansen et al. (1999). 

3.2.6. Boundary and initial conditions 

In addition to precipitation and groundwater 

abstraction rates the following boundary conditions 

are used: 

• The area included in the catchment is per definition 

a hydrological catchment as based on topography. 

Thus a zero-flux boundary is used along the catchment 

boundaries, also for the aquifer layers. The 

bottom of the model is considered impermeable. 

• For all upstream river ends a zero-flux boundary 

condition is applied. For the downstream end, a 

constant water level was applied. 

The most important initial conditions are the moisture 

content in the unsaturated zone and the elevation 

of the groundwater table. The initial soil moisture 

content was assumed equal to field capacity, while 

the initial groundwater tables were assumed equal to

the groundwater tables after a seven-year simulation

period with guessed initial conditions. The model was 

run for seven years (1987–1993). In order to reduce 

the importance of uncertain initial conditions, the two 

first years were considered as a ‘warming-up period’ 

and the last five years were considered the simulation 

period.
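The split between warm-up and evaluation years described above can be sketched as follows. This is only a minimal illustration and not part of the original study; the function name `split_warmup` is hypothetical.

```python
def split_warmup(first_year, last_year, warmup_years):
    """Split an inclusive range of simulation years into warm-up and evaluation years."""
    years = list(range(first_year, last_year + 1))
    if not 0 <= warmup_years < len(years):
        raise ValueError("warm-up must be shorter than the simulation")
    return years[:warmup_years], years[warmup_years:]


# Seven-year run (1987-1993) with a two-year warm-up, as stated in the text.
warmup, evaluation = split_warmup(1987, 1993, 2)
print("warm-up years:   ", warmup)
print("evaluation years:", evaluation)
```

The evaluation years obtained this way are the five years reported in the water balance tables.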


Table 2
Water balance in mm/year for the Karup catchment at station 20.05 Hagebro (518 km²)

                          River flow
Year      Precipitation   Observed   Model 1 km grid   Model 2 km grid   Model 4 km grid
1989      812             428        392               353               460
1990      1020            496        518               512               476
1991      863             446        441               424               449
1992      892             499        531               527               437
1993      835             434        425               405               432
Average   884             460        461               444               451

3.3. Odense model 

The same procedure as outlined above for the 

Karup model was followed. The two main differences 

as compared to the Karup catchment are 

that the topsoil belongs to finer-textured classes with lower hydraulic conductivities and that the aquifer from which groundwater is abstracted is confined in the Odense catchment. This results

in an assumption that the covering sediments are 

less permeable than the aquifer material. As no 

direct information on these confining sediments 

is given in EC (1982) the hydraulic properties of 

the soil in the root zone are assumed valid. This 

implies in practice that recharge rates to the aquifer are lower than in the Karup catchment and that the horizontal flow towards the drains and the river system is correspondingly larger. A geological geometry similar to that of the Karup catchment is assumed, i.e. the upper, less

permeable, confining layer is assumed to have a 

thickness of 15 m and the reduction front is 

assumed to be located in the lower aquifer, 3 m 

below this confining layer. 

4. Results 

To test the model performance a number of validation 

tests were carried out for both catchments. Validation 

is here defined as substantiation that a site 

specific model performs simulations at a satisfactory 

level of accuracy. Hence, universal validity of the general model code is neither tested nor claimed. In Tables 2

and 3 and Figs. 5–8 results are shown for model grid 

sizes 1, 2 and 4 km and for the Karup catchment additionally 

for both the distributed and uniform upscaling 

procedures. The validation tests described below consider only the 1 km grid model runs, while the remaining

results are discussed further below in the section 

dealing with scaling effects. 

4.1. Karup catchment 

The Karup model (1 km grid) was validated by 

comparison of model simulations and field data on 

the following aspects: 

• Annual water balances. Table 2 shows the annual water balances for the five-year simulation period together with the observed annual discharge. The simulated and observed hydrographs are shown in Fig. 5.

• Nitrate concentrations in the upper groundwater layer. Simulated values are compared to observed values from 35 wells in terms of statistical distributions over the aquifer (Fig. 6).

Table 3
Water balance in mm/year for the Odense catchment at station 45.21 Ejby Mølle (536 km²)

                          River flow
Year      Precipitation   Observed   Model 1 km grid   Model 2 km grid   Model 4 km grid
1989      649             220        177               187               181
1990      943             349        351               394               299
1991      760             312        291               308               265
1992      770             308        306               332               243
1993      906             334        329               353               306
Average   805             305        291               315               259

Fig. 5. Comparison of the recorded discharge hydrograph for the Karup catchment with simulations based on 1, 2 and 4 km grids. The two simulated curves correspond to the combined upscaling/aggregation procedure (Distributed) and the simpler upscaling procedure (Uniform).

The main findings from these validation tests can be 

summarised as follows: 

• The annual water balance is simulated remarkably 

well. Thus the simulated and recorded flows, which 

also reflect the annual groundwater recharges in 

this area, differ by only 2% as average values over the five-year simulation period (Table 2).

• The variation of the river runoff over the year is 

relatively well described, although not nearly as well as the long-term average water balance (Fig. 5). The model generally underestimates the

runoff in the summer periods (low flows) and overestimates 

the winter flow. There may be many 

reasons for this. The most important is probably 

that the observed groundwater levels and dynamics 

are poorly reproduced by the model. The runoff 

from the Karup catchment is dominated by drainage 

flow and baseflow components. Thus a good simulation of groundwater levels and dynamics is required in order to produce a good runoff simulation.

An improved simulation of groundwater 

levels and dynamics requires that the model 

includes, in particular, spatial variations of the 

transmissivity of the aquifer, which is not possible 

based on the available input data. 

• The nitrate concentrations simulated by the model 

are seen to match the observed data remarkably 

well, both with respect to average concentrations 

and statistical distribution of concentrations within 

the catchment. It may be noticed that the critical 

NO3 concentration level of 50 mg/l (maximum

admissible concentration according to drinking 

water standards) is exceeded in about 60% of the 

area. 
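The comparison of statistical distributions and the share of the area above the 50 mg/l limit can be sketched as follows. This is a minimal illustration only: the function names are hypothetical and the concentrations are invented placeholder values, not data from the study.

```python
def exceedance_fraction(concentrations, threshold=50.0):
    """Fraction of values strictly above the threshold (mg/l)."""
    values = list(concentrations)
    if not values:
        raise ValueError("no concentrations given")
    return sum(1 for c in values if c > threshold) / len(values)


def empirical_distribution(concentrations):
    """Return (value, cumulative fraction) pairs sorted by value."""
    values = sorted(concentrations)
    n = len(values)
    return [(v, (i + 1) / n) for i, v in enumerate(values)]


# Hypothetical nitrate concentrations in mg/l, one per grid cell (NOT study data).
simulated = [0.0, 12.0, 35.0, 55.0, 62.0, 71.0, 48.0, 90.0, 66.0, 58.0]

print("fraction above 50 mg/l:", exceedance_fraction(simulated))
for value, cumulative in empirical_distribution(simulated):
    print(f"{value:5.1f} mg/l  {cumulative:.1f}")
```

Applying the same two functions to the well observations would give the observed curve to compare against, assuming each well is given equal weight.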

Fig. 6. Comparison of the statistical distribution of nitrate concentrations in groundwater for the Karup catchment predicted by the model with 1, 2 and 4 km grids and observed in 35 wells. The upper figure corresponds to the upscaling/aggregation procedure resulting in a distributed representation of agricultural crops, while the lower figure is from the run with the upscaling procedure, where all the agricultural area is represented by one uniform crop.

4.2. Odense catchment

The Odense model (1 km grid) was validated by comparison of model simulations and field data on the following aspects:

• Annual water balances. Table 3 shows the annual 

water balances for the five-year simulation period

together with the observed annual discharge. The 

simulated and observed hydrographs are shown in 

Fig. 7. 

• Nitrate concentrations in the upper groundwater 

layer. Simulated values are compared 

to observed values from 42 wells in terms 

of statistical distributions over the aquifer 

(Fig. 8).



Fig. 7. Discharge hydrographs for the Odense catchment simulated with 1, 2 and 4 km grids.

Fig. 8. Comparison of the statistical distribution of nitrate concentrations in groundwater for the Odense catchment predicted by the model with 1, 2 and 4 km grids and observed in 35 wells.

The main findings from these validation tests are:

• The annual water balance is simulated reasonably well, although not with the same accuracy as for the Karup catchment. Thus the simulated and recorded flows differ by 18% for the 1 km grid model as average values over the five-year simulation period (Table 3). A comparison with another model study for this area reveals that one of the reasons for this deviation is uncertainties (errors) in the catchment delineation in the flat downstream part of the catchment. Another reason may be that the soil hydraulic conductivity functions and the soil water retention curves, which significantly affect the evapotranspiration, are not very accurately determined. These inaccuracies may originate either from non-representative soil texture data in the 1 km × 1 km GISCO database or from errors introduced by use of the pedo-transfer functions.

• The variation of the river runoff over the year is 

relatively well described, although the simulated winter peaks are too small and the simulated summer low flows too high, reflecting that some of the

internal hydrological processes may not be simulated 

correctly. 

• The distribution of groundwater concentrations by 

the end of the simulation period is seen not to 

compare very well to the observations from 42 

wells. Thus, in 80% of the observation wells no 

nitrate was found, whereas the model simulates 

zero concentration in only 25% of the area. With 

respect to the critical concentration value of 50 mg/l, the observations indicate that such high concentrations

are not found in the area, while the model 

simulates such concentrations to exist in about 5% 

of the catchment area. The main reason for this 

disagreement is most likely that in reality the nitrate is reduced (disappears) over most of the area in the confining sediments overlying the aquifer. This is not simulated by the model, because the reduction front was assumed to be located within the aquifer, while analysis of local geological data reveals that in reality it is located in the upper confining layer over most of the aquifer.

• It is noticed that the nitrate concentrations are 

significantly lower in the Odense catchment than 

in the Karup catchment, both the observed and the 

simulated values. The main reason for this is that the different soil properties and the smaller number of animals result in lower nitrate leaching from the

root zone in the Odense catchment. 

4.3. Scaling effects 

The results of running the Karup and Odense models with different computational grid sizes (1, 2 and 4 km) are shown in Tables 2 and 3 for annual water balances and in Figs. 5 and 7 for discharge hydrographs. Further, the results in terms of groundwater nitrate concentrations are shown in Figs. 6 and 8. From these results the following findings emerge:

• The simulated annual runoff is almost identical and 

thus independent of grid sizes. A reason for some 

of the small differences is that the catchment areas 

in the 1, 2 and 4 km models are not quite identical. 

Thus, the root zone processes responsible for generating the evapotranspiration and consequently the runoff do not appear to be scale

dependent as long as the statistical properties of 

the soil and vegetation types are preserved, which 

is the case with the upscaling/aggregation procedure 

used in this case. 

• The hydrograph shape differs significantly for the 

three grid sizes. For the Karup model, the simulation 

with 1 km grid reproduces the low flow conditions 

reasonably well, whereas the 2 and 4 km grids 

have a rather poor description of the baseflow 

recession in general and the low flow conditions 

in particular. For the Odense model, the simulation 

with the 1 km grid shows too large baseflows 

during the low flow season, while the 2 km grid 

model has the right level and the 4 km grid 

model simulates less low flow than observed. 

This indicates that there are significant scale effects 

on the stream–aquifer interaction that are not properly 

described in the present upscaling/aggregation 

procedure. 

• The nitrate concentrations in the groundwater are not clearly influenced by the grid size for the

Karup catchment, while there appears to be some 

effect for the Odense catchment. The reason for 

this difference is related to the different hydrogeological 

situations in the two catchments. In the 

Karup catchment the groundwater table is generally 

located a couple of meters below terrain 

surface and the horizontal flows take place in 

both the Quaternary and the Miocene sediments. 

Hence, for all of the 1, 2 and 4 km grid models, the main part of the horizontal groundwater flow takes place in the roughly 15 m of the aquifer located above the reduction front, and only a relatively small part of the flow lines cross the reduction front, below which the nitrate disappears. In

the Odense catchment, the horizontal groundwater 

flows take place almost exclusively in the lower 

aquifer, of which only the upper 3 m is located



above the reduction front. This implies that a large 

part of the groundwater flow is crossing the reduction 

front on its route from the infiltration zones in 

the hilly areas towards the discharge zones near the 

river. As the size of the grid influences the smoothness 

of the aquifer geometry, the grid size will 

significantly influence the number of flow lines 

crossing the reduction front and hence the nitrate 

concentrations. Such scaling effect on geological 

conditions is not accounted for in the present 

upscaling/aggregation procedure. 
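The first finding above can be checked against the tabulated values. The sketch below recomputes the five-year mean simulated runoff per grid size from the Karup figures in Table 2; the `relative_spread` measure is a hypothetical summary chosen here for illustration and is not a statistic used in the study.

```python
from statistics import mean

# Simulated annual river flow (mm/year) for the Karup catchment, 1989-1993, from Table 2.
simulated_runoff = {
    "1 km": [392, 518, 441, 531, 425],
    "2 km": [353, 512, 424, 527, 405],
    "4 km": [460, 476, 449, 437, 432],
}


def five_year_means(runoff_by_grid):
    """Mean annual runoff for each grid size."""
    return {grid: mean(values) for grid, values in runoff_by_grid.items()}


def relative_spread(means):
    """(max - min) of the grid means divided by their average (illustrative measure)."""
    values = list(means.values())
    return (max(values) - min(values)) / mean(values)


means = five_year_means(simulated_runoff)
for grid, value in means.items():
    print(f"{grid} grid: {value:.1f} mm/year")
print(f"relative spread between grids: {relative_spread(means):.1%}")
```

Individual years differ more between grid sizes than the five-year means do, so the conclusion about scale independence applies to the long-term balance rather than to single years.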

Further, for evaluating the importance of the 

combined upscaling/aggregation method (‘distributed’) 

a model run has been carried out for the Karup 

catchment with another upscaling method. This alternative 

method is based on upscaling of soil/crop types 

all the way from point scale to catchment scale. This 

implies that all the agricultural area is described by 

one representative (‘uniform’) crop instead of the 17 

cropping patterns used in the ‘distributed’ method. 

This representative crop has been assumed to have 

the same characteristics as the dominant crop, namely 

winter wheat, and further to be fertilised by the same 

total amount of the organic manure as in the other 

simulations, supplemented by some mineral fertiliser 

up to the nitrate amount prescribed in the norms 

defined by Plantedirektoratet (1996). 

The results are shown in Figs. 5 and 6 by the curves labelled ‘uniform’. The effects on the discharge hydrographs (Fig. 5) are seen to be negligible, indicating that the dominant crop (by chance) has evapotranspiration characteristics similar to those of the different crops combined, weighted according to their actual occurrence. The nitrate concentrations in

groundwater (Fig. 6) show some differences in terms 

of a lower average concentration and a less smooth 

areal distribution as compared to the distributed agricultural 

representation. Thus, in case of the ‘uniform’ 

representation the nitrate concentrations fall in two 

main groups. Around 30% of the area, corresponding 

to the natural areas with no nitrate leaching, has 

concentrations between 0 and 20 mg/l, while the 

remaining 70%, corresponding to the agricultural 

area with the ‘uniform’ crop, has concentrations 

between 70 and 90 mg/l. In the ‘distributed’ agricultural 

representation the areal distribution curve is 

much smoother in accordance with the measured data. 

5. Discussion and conclusions 

Two prerequisites are required for performing large 

scale simulations of nitrate leaching on an operational 

basis: firstly access to readily available global (or in 

the present case European) databases, and secondly an 

adequate scaling enabling suitable models to be 

applied at a larger scale than the field scales for 

which they usually have been proven valid. A key challenge as compared to the experiences reported in the literature is then how to make use of a physically based model at large scale without the possibility of detailed calibration at that scale, when we know that its physically based equations are developed for small scales. Such a model can only be regarded as well proven for small scales, and the few attempts made so far to use it on scales above 1000 km² have applied calibration at that scale (Refsgaard et al., 1998b, 1992; Jain et al., 1992).

5.1. Data availability 

From the experiences gathered and the lessons 

learnt with regard to the availability of European databases, the following conclusions can be drawn:

• Not all of the existing “European” databases are 

generally applicable due to various restrictions 

(e.g. copyright, not open to other projects, pointers 

only). 

• Not all databases maintained by international institutions 

contain harmonised and integrated data 

sets. Many databases in fact only contain a collection 

of national data sets that are neither integrated 

in one seamless data set, nor harmonised in their 

contents or nomenclatures. 

• Not all input data requirements could be satisfied 

from GIS (spatial) data sets, so tables and paper maps were needed to supplement the information. However, the available data are often too coarse

in scale (e.g. EU statistics at a higher administrative 

unit than needed) or too specific (e.g. transfer 

functions for natural soils only but not for agricultural 

soils). 

• Use of national data sets is to some extent necessary, 

with restrictions to data quality and origin. 

• The search for data sets could have been greatly improved by the existence of a European spatial data clearinghouse and the association of the available data sets with meta information.

It is noted that in spite of comprehensive efforts 

made during recent years for assessing spatial data 

by use of advanced remote sensing technology the 

only data in the “European” databases which 

originate from remote sensing data are the 

CORINE land cover data, which were useful for 

distinguishing between natural, urban and agricultural 

areas, but which did not contain any further 

information about agricultural crops of importance in 

the present context. 

In spite of the above limitations, the attempts in 

the present study to identify suitable data sources 

at the European scale have shown that useful data 

are available at that scale for most of the required 

model input data. Although these data require some kind of transformation, e.g. by pedo-transfer functions, the data appear adequate for overall

model simulations at this scale. However, some 

gaps exist in the European level databases. Thus, 

for the following data it was necessary to use 

national data sources: 

• Meteorological data on a daily basis. 

• Soil organic matter from arable land. 

• Agricultural statistics. 

• Agricultural practices. 

These data were all easily available at a national 

scale, and hence their availability is not expected to 

pose significant constraints for large scale modelling 

in other parts of Europe. 

The most critical data that may cause problems in 

terms of availability at larger scale are the geological 

data, for which no global (or European) digital database 

apparently exists. The present case study relied 

heavily on an EC report produced by the Danish 

Geological Survey. The information in this report 

proved adequate for the present purpose, although 

the lack of geochemical information turned out to 

have some importance for one of the two catchments. 

Similar readily available EC reports exist for other 

countries, but they appear to be non-standardised and comprise information at a variable level of detail.

Hence, the positive conclusions from using the geological 

data in EC (1982) for Denmark cannot 

necessarily be generalised. 

5.2. Parameter assessment—no calibration 

An important element of the present methodology 

is the principle not to carry out any calibration. The 

parameter values were assessed in three different 

ways: 

• Directly from the available data, e.g. topography 

and geology. 

• Indirectly from the available data through application 

of predefined transfer functions, e.g. the soil 

hydraulic parameters. 

• Use of standard parameter values that have been 

assessed in previous studies on other locations. 

While the first two methods can be characterised as 

fully objective and transparent, it may be argued that 

there always will be some elements of subjective 

assessment hidden in the use of standard parameter 

values and that the possible calibration exercises in 

previous studies may question the “no calibration” 

statement. 

In the present case the standard parameters originate 

from two model codes and associated accumulated 

experiences: 

• Parameters in the MIKE SHE part. The standard 

parameter used here is the time constant for routing 

of groundwater to drains (50 days). From comprehensive 

hydrological modelling experience on 

dozens of Danish catchments starting with 

Refsgaard and Hansen (1982) this value can be 

characterised as a typical value. It is not the optimal value that would be estimated in a calibration for either of the two catchments. For instance, the calibrated value for Karup was estimated at 33 days in Refsgaard (1997).

• Parameters in the Daisy part. The standard parameters 

used here are the ones controlling the vegetation 

part of the evapotranspiration and the 

nitrogen turnover processes in the root zone. 

These parameters are essential both for the water 

balance and the nitrogen concentrations. Daisy has standard parameters that can be used if no calibration

is possible (or desirable). These standard 

parameter values have originally been assessed 

from agricultural field experiments on plot scales 

(Hansen et al., 1990). Since then the process descriptions and associated standard parameter values have only been subject to minor adjustments

through a number of additional tests on new data 

sets from different countries. It should be emphasised 

that Daisy has not previously been calibrated 

on the Karup and Odense catchments. These two 

catchments, and in particular the Karup catchment, 

have been subject to modelling studies which have 

included calibration of the water balance (evapotranspiration) 

parameters. However, in the 

previous studies of the Karup catchment (Styczen and Storm, 1993; Refsgaard, 1997) the water

balance in the root zone was simulated by MIKE 

SHE, which is not the case in the present study. As 

the process descriptions for evapotranspiration in 

MIKE SHE and Daisy are fundamentally different, 

the Daisy standard parameters used in the present study have not been affected at all by the previous

MIKE SHE studies in the same catchment. 
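To give a feel for the difference between the standard drain time constant (50 days) and the calibrated Karup value (33 days) mentioned in the first bullet above, the sketch below uses a generic linear reservoir in which outflow decays as exp(-t/k). This is an assumption made purely for illustration; it is not the MIKE SHE drainage formulation, and the function name is hypothetical.

```python
import math


def remaining_fraction(days, time_constant_days):
    """Fraction of the initial outflow left after `days` in a linear reservoir."""
    return math.exp(-days / time_constant_days)


for k in (50.0, 33.0):
    half_life = k * math.log(2.0)  # time for the outflow to halve
    print(f"k = {k:.0f} d: half-life {half_life:.1f} d, "
          f"flow left after 30 d: {remaining_fraction(30.0, k):.2f}")
```

Under this simplified assumption a shorter time constant gives a faster recession, which is the kind of hydrograph detail that calibration would adjust.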

Thus although it may correctly be argued that the 

standard model parameters are results of previous 

studies where calibration was carried out, the specific 

parameters used in the present study have not been subject to, and are not results of, calibration in either the Karup or the Odense catchment.

In our opinion, one of the strengths of physically 

based models is the possibility to assess many parameter 

values from standard values, achieved from 

experience through a number of other applications. 

We think that the results of the present study show both this strength and some of the limitations in this respect. Thus, on the one hand, the key results in terms of annual runoff and nitrogen concentration distributions are encouraging, while on the other hand Figs. 5

and 7 clearly illustrate that it would be very easy to 

obtain a better hydrograph fit through calibration of a 

couple of parameter values. 

When parameter values are assessed in this way 

they inevitably are subject to considerable uncertainty, 

which again will generate significant uncertainty 

in model results. It is therefore highly relevant 

to conduct uncertainty analyses in order to assess 

whether the resulting uncertainty becomes so large 

that the model results are not of any use for water management in practice. A methodology and some

results of such uncertainty analyses are provided in 

Hansen et al. (1999) for the root zone processes and in 

Refsgaard et al. (1998a) for the catchment processes. 

5.3. Upscaling 

The adopted upscaling methodology is a combination 

of upscaling and aggregation. Hence, upscaling in 

its traditional definition (Beven, 1995) is used only 

from point scale to field scale, where the same equations 

are assumed valid and where ‘effective’ parameter 

values are used. The parameter values 

estimated through pedo-transfer functions (soil data) 

and the vegetation parameters representing the different 

crops are assumed valid at field scale. Subsequently, 

an aggregation procedure is used to 

represent catchment scale conditions with regard to 

soil and vegetation types. This aggregation procedure 

is in full agreement with the findings made regarding 

the apparent existence of a threshold area (REA) 

above which “… spatial patterns of dominant process 

controls can be represented by their statistical distribution 

functions” (Famiglietti and Wood, 1995). 

This theoretical consideration is supported empirically 

by the model results, which show that the annual 

catchment runoff can be simulated well, even when 

using different model grid sizes. For the Karup catchment, 

where the nitrate reduction in the aquifer does 

not appear to have influenced the results adversely, 

even the statistical distribution of nitrate concentrations 

is simulated well. 

For simulation of annual runoff and nitrate concentration 

distributions, both of which are affected 

primarily by root zone processes, the impact of 

changes of scale is thus relatively small. In contrast, the impact on hydrograph shape is consistently rather large. This finding, which was also documented earlier in Refsgaard (1997), indicates that the applied

upscaling/aggregation procedure has important 

limitations with regard to describing the stream–aquifer 

interactions. Thus in summary, upscaling of 

processes described by vertical, non-correlated, but 

patchy, columns is successful, while the upscaling 

fails in case of processes where horizontal flows 

between grids dominate. The differences in hydrograph 

shapes caused by the differences in grid sizes 

illustrate how careful a model user has to be when 

changing grid size. In our opinion it is not relevant 

to talk about an ‘optimal’ scale for hydrograph simulation. 

The important point is rather that the present 

methodology is scale dependent with regard to hydrograph 

simulation; hence a change of scale (grid size)


generates a need for recalibration of parameters 

responsible for baseflow recession and low flow simulation. 

An alternative, and commonly used, upscaling 

procedure, where upscaling is used all the way from 

point scale to catchment scale by selecting the dominant 

crop type in each grid, resulted in one uniform 

crop representing all the agricultural area. Results 

indicate that whereas this uniform upscaling procedure 

may be sufficient for simulating annual water 

balance and discharge hydrographs, it is not satisfactory 

for simulation of nitrate leaching and groundwater 

concentrations. This is in agreement with 

Beven (1995) who states that upscaling from small 

scales to larger scales using effective parameter values 

cannot be assumed to be generally adequate. 

An inherent limitation of the applied upscaling/aggregation method is that it does not preserve the

georeferenced location of simulated concentrations, 

but only their statistical distribution over the catchment 

area. Therefore, comparisons with field data 

make no sense on a well by well or subcatchment 

by subcatchment basis, and no information on the actual location of the simulated “hot spots” within the catchment can be obtained. If a more detailed spatial resolution of the model predictions is required from a management point of view, then the same

upscaling method has to be carried out at a finer 

scale with all the statistical input data being supplied 

on a subcatchment basis. This is in principle straightforward, 

but in reality it may often be limited by data 

availability. 

A critical assumption in the upscaling procedure is 

the application of the point scale equations at the field 

scale with effective parameters. This corresponds to 

interpreting the field as a single equivalent soil 

column using effective hydraulic parameters. This 

approach was evaluated on two Danish experimental 

0.25 ha plots, one on a coarse sandy soil and one on a sandy loam,

using the Daisy model (Djurhuus et al., 1999). The 

two plots were monitored with respect to soil water 

content and nitrate in soil water at several depths at 57 

points, where also texture, soil water retention and 

hydraulic conductivity functions had been measured. 

The conclusion from comparing the field-measured data with the model simulations over the experimental plot, represented by the 57 points, was that the

observed mean nitrate concentrations were matched 

well by a simulation using the geometric means as 

effective parameters. This conclusion is in agreement with previous studies for the Danish hydrological regime

(Jensen and Refsgaard 1991a–c; Jensen and Mantoglou, 

1992). Other studies from other regimes (Bresler 

and Dagan, 1983) conclude that effective soil hydraulic 

parameters are not adequate for modelling water 

flow in spatially variable fields. Whether such an approach is feasible may depend on whether Hortonian overland flow is generated in the hydrological regime in question.

Thus, although the upscaling methodology from point 

to field scale is far from universally valid, there are 

good reasons to believe that this assumption was satisfactorily 

fulfilled in the present case studies. 
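The use of geometric means as effective parameters can be illustrated with a short sketch. The conductivity values are hypothetical and chosen only to show how the geometric mean differs from the arithmetic mean for strongly skewed soil parameters; the source does not list the measured values.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed via logarithms."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical saturated hydraulic conductivities (m/s) at three points in a plot.
k_sat = [1e-6, 1e-5, 1e-4]

print(f"geometric mean:  {geometric_mean(k_sat):.2e} m/s")
print(f"arithmetic mean: {sum(k_sat) / len(k_sat):.2e} m/s")
```

For values spanning orders of magnitude the arithmetic mean is dominated by the largest value, whereas the geometric mean stays near the middle of the range on a logarithmic scale.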

The spatial patterns, which in subsurface hydrology are considered to be of significant importance (Wen and

Gómez Hernández, 1996), have been treated in different 

ways with regard to continuous data (parameter 

values) and categorical data (soil and vegetation 

classes). The effects of spatial autocorrelation of soil 

and vegetation parameters within a field have been 

assumed incorporated into the ‘effective parameters’, 

which in the present case are assessed in a rather crude 

way through pedo-transfer functions and use of standard 

values. The categorical data have been treated 

differently in the aggregation procedure for soil and 

vegetation classes. The soil data (one soil type for 

Karup and two soil types for Odense) were assessed 

from the soil map and assigned at a grid basis so that 

the percentage of each soil type within a catchment 

was preserved and the individual grids to the largest 

possible extent were characterised by the dominant 

soil type within the respective grid. For the vegetation 

types, the same procedure was applied to initially 

distinguish between agricultural and non-agricultural 

areas by use of the land cover map. Subsequently, it was assumed that the spatial distribution of cropping patterns is random and without spatial autocorrelation. This is justified by the agricultural management practice of rotating the crops within the individual

farms. 
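The random assignment of cropping patterns to agricultural grid cells, preserving the catchment-wide shares, could be sketched as follows. The function name, the three crop names and their shares are hypothetical (the study used 17 cropping patterns), and the largest-remainder rounding is an assumption of this sketch rather than the documented procedure.

```python
import random


def assign_crops(n_cells, shares, seed=0):
    """Give each agricultural cell one cropping pattern.

    Cell counts follow the given shares (largest-remainder rounding);
    the positions are shuffled, i.e. no spatial autocorrelation.
    """
    total = sum(shares.values())
    quotas = {crop: n_cells * share / total for crop, share in shares.items()}
    counts = {crop: int(quota) for crop, quota in quotas.items()}
    # Hand any leftover cells to the crops with the largest fractional remainders.
    leftover = n_cells - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for crop in by_remainder[:leftover]:
        counts[crop] += 1
    cells = [crop for crop, count in counts.items() for _ in range(count)]
    random.Random(seed).shuffle(cells)
    return cells


# Hypothetical shares of three cropping patterns on 20 agricultural cells.
cells = assign_crops(20, {"winter wheat": 5, "spring barley": 3, "grass": 2})
print(cells)
```

Because only the counts are fixed, different seeds give different maps with the same statistical distribution, which is consistent with the method preserving distributions but not locations.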

5.4. General applicability of methodology 

From the results of the present study it appears that 

it is possible to use distributed physically based 

models of the same type as MIKE SHE/Daisy for catchment scale assessment of nitrate contamination from agricultural land. It appears obvious that such model application is straightforward and that the above conclusion is valid for other areas in Denmark. The interesting question is therefore how far this conclusion can be generalised to other areas in Europe (and on other

continents) and what the scientific and practical 

limitations are. In this respect the following considerations 

may be noted: 

• Except for the geological data, the general availability of which is somewhat uncertain, there is no reason to expect that the application of similar data for catchments in other European countries should not be as easy as the application for the two Danish catchments. Likewise, the

encouraging simulation results of using European 

level databases, in spite of their often coarse resolution 

and high level of aggregation, may also be 

expected for other areas. With regard to geological 

data it may be noted that considerable efforts are 

being made at most (if not all) national geological 

institutes to provide geological data to users in 

digital form; hence the limited availability of such data so far is likely to be overcome,

at least nationally, during the coming years. 

• The combined aggregation/upscaling procedure 

appears valid in many areas. The catchments for 

which it was used in the present study were limited 

to a maximum of about 500 km². However, the

further upscaling to larger areas poses no fundamental

problems, as it consists of just a larger

number of computational grids. Computationally, 

running a model like MIKE SHE/Daisy for an area 

of, for instance, 100 000 km² with e.g. 250

subcatchments of 100 grids each is perhaps close

to the limit of what is practically feasible today

(a five-year run would require 100 h of CPU time on

a Pentium 300 MHz), but this problem will soon 

disappear as computers become faster. 

• The MIKE SHE/Daisy modelling methodology is 

general and applicable to many other areas. Some 

limitations, however, are related to special geological

conditions such as karstic flow and fissured 

aquifers, which cannot be described explicitly. 

Another important limitation is related to the 

upscaling procedure from point to field scale, 

which may fail in areas where Hortonian overland 

flow is a dominant mechanism. In this respect it 

should be noted that many areas with dominant 

overland flow regimes are mountainous regions 

characterised by thin soil layers and steep slopes, 

which generally are not regions with important

aquifers. 

Hence, it may be concluded that the methodology 

can relatively easily be applied to larger areas and 

used as a decision support tool for evaluation of legislative

and management measures aiming at reducing 

nitrate contamination risks. 
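The computational figures quoted in the second consideration above can be checked with simple arithmetic. The sketch below uses only the numbers given in the text (100 000 km², 250 subcatchments of 100 grids, 100 h of CPU time for a five-year run); the `speedup` argument is a hypothetical factor introduced for illustration, and the implied grid size is a derived value, not one stated for this example.

```python
# Back-of-envelope check of the computational figures quoted in the text.
area_km2 = 100_000             # example model area from the text
n_subcatchments = 250          # from the text
grids_per_subcatchment = 100   # from the text
cpu_hours = 100.0              # quoted CPU time for a five-year run
sim_years = 5

n_grids = n_subcatchments * grids_per_subcatchment
area_per_grid_km2 = area_km2 / n_grids            # implied grid area
grid_side_km = area_per_grid_km2 ** 0.5           # implied grid side length
cpu_seconds_per_grid_year = cpu_hours * 3600 / (n_grids * sim_years)


def scaled_cpu_hours(speedup):
    """CPU hours for the same run on a machine `speedup` times faster (hypothetical factor)."""
    return cpu_hours / speedup
```

Under these figures the example corresponds to 25 000 grids of 4 km² each, i.e. roughly 3 s of CPU time per grid and simulated year on the hardware quoted.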


Acknowledgements

The present work was partly funded by the EC

Environment and Climate Research Programme 

(contract number ENV4-CT95-0070). Good ideas 

and constructive comments to the manuscript by 

Gerard Heuvelink, University of Amsterdam, are 

gratefully acknowledged. Further, the constructive criticism

of Marnik Vanclooster, Université Catholique de

Louvain, and an anonymous reviewer is

acknowledged. 

References 

Abbott, M.B., Bathurst, J.C., Cunge, J.A., O’Connell, P.E., Rasmussen,

J., 1986. An introduction to the european hydrological 

system—systéme hydrologique européen SHE 2: structure of 

a physically based distributed modelling system. Journal of 

Hydrology 87, 61–77. 

Agricultural Statistics, 1995. Danmarks Statistik, 294 pp. (In 

Danish). 

Arnold, J.G., Williams, J.R., 1995. SWRRB—a watershed scale 

model for soil and water resources management. In: Singh, 

V.P. (Ed.). Computer Models of Watershed Hydrology, Water

Resources Publication, pp. 847–908. 

Arnold, J.G., Williams, J.R., Nicks, A.D., Sammons, N.B., 1990. 

SWRRB—A basin scale simulation model for soil and water 

resources management, Texas A & M University Press, College 

Station 241 pp. 

Beasley, D.B., Huggins, L.F., Monke, E.J., 1980. ANSWERS: a

model for watershed planning. Transactions of ASAE 23 (4), 

938–944. 

Beven, K., 1995. Linking parameters across scales: subgrid parameterizations 

and scale dependent hydrological models. Hydrological 

Processes 9, 507–525. 

Blöschl, G., Sivapalan, M., 1995. Scale issues in hydrological 

modelling: a review. Hydrological Processes 9, 251–290. 

Bresler, E., Dagan, G., 1983. Unsaturated flow in spatially variable


fields: application of water flow models to various fields II. 

Water Resources Research 19, 421–428. 

Cosby, B.J., Hornberger, G.M., Clapp, R.B., Ginn, T.R., 1984. A statistical

exploration of relationships of soil moisture characteristics to 

the physical properties of soils. Water Resources Research 20, 

682–690. 

Dagan, G., 1986. Statistical theory of groundwater flow and transport: 

pore to laboratory, laboratory to formation, and formation 

to regional scale. Water Resources Research 22 (9), 120–134. 

DeCoursey, D.G., Rojas, K.W., Ahuja, L.R., 1989. Potentials for 

non-point source groundwater contamination analyzed using 

RZWQM. Paper No. SW892562, presented at the International 

American Society of Agricultural Engineers’ Winter Meeting, 

New Orleans, Louisiana. 

DeCoursey, D.G., Ahuja, L.R., Hanson, J., Shaffer, M., Nash, R., 

Rojas, K.W., Hebson, C., Hodges, T., Ma, Q., Johnsen, K.E., 

Ghidey, F., 1992. Root zone water quality model, Version 1.0, 

Technical Documentation. United States Department of Agriculture, 

Agricultural Research Service, Great Plains Systems 

Research Unit, Fort Collins, Colorado, USA. 

Djurhuus, J., Hansen, S., Schelde, K., Jacobsen, O.H., 1999. Modelling

mean nitrate leaching from spatially variable fields using 

effective parameters. Geoderma 87, 261–279. 

EC, 1982. Groundwater resources in Denmark. Commission of the 

European Communities. EUR 7941 (In Danish). 

EC, 1996. Commission proposal for an Action Programme for Integrated 

Groundwater Protection and Management, Brussels. 

EEA, 1995. Europe’s Environment. The Dobris Assessment. The 

European Environment Agency, Copenhagen.

Engesgaard, P., 1996. Multi-species reactive transport modelling. 

In: Abbott, M.B., Refsgaard, J.C. (Eds.). Distributed Hydrological 

Modelling, Kluwer Academic Publishers, Dordrecht, pp. 

71–91. 

EU, 1991. Resolution from Ministerial seminar held in The Hague 

in November 1991. 

Famiglietti, J.S., Wood, E.F., 1995. Effects of spatial variability and 

scale on areally averaged evapotranspiration. Water Resources

Research 31 (3), 699–712. 

Gelhar, L.W., 1986. Stochastic subsurface hydrology. From theory 

to applications. Water Resources Research 22 (9), 135–145. 

Hansen, S., Jensen, H.E., Nielsen, N.E., Svendsen, H., 1990. Daisy, 

a soil plant system model. NPO-forskning fra Miljøstyrelsen, 

Report no. A10. Danish Environmental Protection Agency, 

Copenhagen. 

Hansen, S., Jensen, H.E., Nielsen, N.E., Svendsen, H., 1991. Simulation 

of nitrogen dynamics and biomass production in winter 

wheat using the Danish simulation model Daisy. Fertilizer 

Research 27, 245–259. 

Hansen, S., Thorsen, M., Pebesma, E., Kleeschulte, S., Svendsen, 

H., 1999. Uncertainty in simulated leaching due to uncertainty 

in input data. A case study. Soil Use and Management. 

Heng, H.H., Nikolaidis, N.P., 1998. Modelling of non-point source 

pollution of nitrogen at the watershed scale. Journal of the 

American Water Resources Association 34 (2), 359–374. 

Heuvelink, G.B.M., Pebesma, E.J., 1998. Spatial aggregation and 

soil process modelling. Geoderma. 

Jain, S.K., Storm, B., Bathurst, J.C., Refsgaard, J.C., Singh, R.D., 

1992. Application of the SHE to catchment in India. Part 2. 

Field experiments and simulation studies with the SHE on the 

Kolar subcatchment of the Narmada River. Journal of 

Hydrology 140, 25–47. 

Jensen, C., Stougaard, B., Østergaard, H.S., 1996. The performance 

of the Danish simulation model Daisy in prediction of Nmin at 

spring. Fertilizer Research 44, 79–85. 

Jensen, C., Stougaard, B., Østergaard, H.S., 1994. Simulation of the 

nitrogen dynamics in farm land areas in Denmark 1989–1993. 

Soil Use and Management 10, 111–118. 

Jensen, K.H., Refsgaard, J.C., 1991. Spatial variability of physical 

parameters in two fields. Part II: Water flow at field scale. 

Nordic Hydrology 22, 303–326. 


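Jensen, K.H., Refsgaard, J.C., 1991. Spatial variability of physical

parameters in two fields. Part III. Solute transport at field scale.

Nordic Hydrology 22, 327–340.

Jensen, K.H., Refsgaard, J.C., 1991. Spatial variability of physical

parameters in two fields. Part I. Water flow and solute transport

at local scale. Nordic Hydrology 22, 275–302.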

Jensen, K.H., Mantoglou, A., 1992. Application of stochastic unsaturated 

flow theory, numerical simulations, and comparisons to 

field observations. Water Resources Research 28, 269–284. 

Jensen, L.S., Mueller, T., Nielsen, N.E., Hansen, S., Crocker, G.J., 

Grace, P.R., Klir, J., Körschens, M., Poulton, P.R., 1997. Simulating 

trends in soil organic carbon in long-term experiments 

using the soil–plant–atmosphere model DAISY. Geoderma 81 

(1–2), 5–28. 

Kleeschulte, S., 1998. Assessment of data availability for direct 

modelling use at the European scale. In: Refsgaard, J.C., 

Ramaekers, D.A. (Eds.), Assessment of ‘cumulative’ uncertainty 

in spatial decision support systems: Application to examine 

the contamination of groundwater from diffuse sources. 

Final Report, vol. 1, EU contract ENV-CT95-070. http:// 

projects.gim.lu/uncersdss. 

Knisel, W.G. (Ed.), 1980. CREAMS: a field-scale model for 

chemicals, runoff, and erosion from agricultural management

systems. US Department of Agriculture, Science, 

and Education Administration. Conservation Research Report 

no. 26, 643 pp. 

Knisel, W.G., Williams, J.R., 1995. Hydrology component of 

CREAMS and GLEAMS models. In: Singh, V.P. (Ed.). Computer 

Models of Watershed Hydrology, Water Resources Publication, 

pp. 1069–1114. 

Lamm, C.G., 1971. Det danske jordarkiv (The Danish soil 

archive), Tidsskrift for Planteavl, pp. 703–720 (in Danish).

Leonard, R.A., Knisel, W.G., Still, D.A., 1987. GLEAMS: groundwater 

loading effects of agricultural management systems. 

Transactions of ASAE 30, 1403–1418. 

Mangold, D.C., Tsang, C.F., 1991. A summary of subsurface hydrological 

and hydrochemical models. Reviews of Geophysics 29 

(1), 51–79. 

Michaud, J.D., Shuttleworth, W.J., 1997. Executive summary of the

Tucson aggregation workshop. Journal of Hydrology 190, 176–

181. 

Person, M., Raffensperger, J.P., Ge, S., Garven, G., 1996. Basin-scale

hydrogeologic modelling. Reviews of Geophysics 34 (1), 

61–97.

Plantedirektoratet, 1996. Guidelines and forms 1996/1997. Ministry 

for Food, Agriculture and Fishery, 38 pp. (In Danish). 

Refsgaard, J.C., 1997. Parameterisation, calibration and validation 

of distributed hydrological models. Journal of Hydrology 198, 

69–97. 

Refsgaard, J.C., Hansen, E., 1982. A distributed groundwater/ 

surface water model for the Suså catchment. Part 1. Model 

description. Nordic Hydrology 13, 299–310. 

Refsgaard, J.C., Storm, B., 1995. MIKE SHE. In: Singh, V.P. (Ed.). 

Computer Models of Watershed Hydrology, Water Resources 

Publication, pp. 809–846. 

Refsgaard, J.C., Seth, S.M., Bathurst, J.C., Erlich, M., Storm, B., 

Jørgensen, G.H., Chandra, S., 1992. Application of the SHE to 

catchment in India. Part 1. General results. Journal of Hydrology

140, 1–23. 

Refsgaard, J.C., Thorsen, M., Jensen, J.B., Hansen, S., Heuvelink, 

G., Pebesma, E., Kleeschulte, S., Ramaekers, D., 1998.

Uncertainty in spatial decision support systems—Methodology 

related to prediction of groundwater pollution. In: Babovic, V., 

Larsen, L.C. (Eds.), Hydroinformatics ‘98. Proceedings of the 

Third International Conference on Hydroinformatics, Copenhagen, 

Balkema, 24–26 August 1998, pp. 1153–1159. 

Refsgaard, J.C., Sørensen, H.R., Mucha, I., Rodak, D., Hlavaty, Z., 

Bansky, L., Klucovska, J., Topolska, J., Takac, J., Kosc, V., 

Enggrob, H.G., Engesgaard, P., Jensen, J.K., Fiselier, J., Griffioen, 

J., Hansen, S., 1998. An integrated model for the Danubian 

Lowland—methodology and applications. Water 

Resources Management 12, 433–465. 

Refsgaard, J.C., Ramaekers, D., Heuvelink, G.B.M., Schreurs, V., 

Kros, H., Rosén, L., Hansen, S., 1998. Assessment of ‘cumulative’ 

uncertainty in spatial decision support systems: Application 

to examine the contamination of groundwater from diffuse 

sources (UNCERSDSS). Presented at the European Climate 

Science Conference, Vienna, 19–23 October. 

Saulnier, G.M., Beven, K., Obled, C., 1997. Digital elevation analysis 

for distributed hydrological modelling: Reducing scale 

dependence in effective hydraulic conductivity values. Water 

Resources Research 33 (9), 2097–2101. 

Sellers, P.J., Heiser, M.D., Hall, F.G., Verma, S.B., Desjardins, 

R.L., Schuepp, P.M., MacPherson, J.I., 1997. The impact of 

using area-averaged land surface properties—topography, 

vegetation conditions, soil wetness—in calculations of intermediate 

scale (approximately 10 km 2 ) surface-atmosphere 

heat and moisture fluxes. Journal of Hydrology 190, 269–301. 

Styczen, M., Storm, B., 1993. Modelling of N-movements on catchment 

scale—a tool for analysis and decision making. 1. Model 

description. 2. A case study. Fertilizer Research 36, 1–17. 

Styczen, M., Storm, B., 1995. Modelling of the effects of management 

practices on nitrogen in soils and groundwater. In: Bacon, 

P.E. (Ed.). Nitrogen Fertilization in the Environment, Marcel 

Dekker, New York, pp. 537–564. 

Svendsen, H., Hansen, S., Jensen, H.E., 1995. Simulation of crop 

production, water and nitrogen balances in two German agroecosystems 

using the Daisy model. Ecological Modelling 81,

197–212. 

Thorsen, M., Feyen, J., Styczen, M., 1996. Agrochemical modelling. 

In: Abbott, M.B., Refsgaard, J.C. (Eds.). Distributed 

Hydrological Modelling, Kluwer Academic Publishers, 

Dordrecht, pp. 121–141. 

UNCERSDSS, 1998. Assessment of cumulative uncertainty in 

Spatial Decision Support Systems: Application to examine the 

contamination of groundwater from diffuse sources 

(UNCERSDSS). EU contract ENV4-CT95-070. Final Report, 

available on http://projects.gim.lu/uncersdss. 

Vanclooster, M., Viaene, P., Christians, K., 1994. WAVE—a mathematical 

model for simulating agrochemicals in the soil and 

vadose environment. Reference and user’s manual (release 

2.0). Institute for Land and Water Management, Katholieke 

Universiteit Leuven, Belgium. 

Vanclooster, M., Viaene, P., Diels, J., Feyen, J., 1995. A deterministic 

validation procedure applied to the integrated soil crop 

model. Ecological Modelling 81, 183–195. 

Vereecken, H., Vanclooster, M., Swerts, M., Diels, J., 1991. Simulating 

nitrogen behaviour in soil cropped with winter wheat. 

Fertilizer Research 27, 233–243. 

Wen, X.-H., Gómez-Hernández, J.J., 1996. Upscaling hydraulic 

conductivities in heterogeneous media: An overview. Journal 

of Hydrology 183, ix–xxxii. 

Wood, E.F., Sivapalan, M., Beven, K.J., Band, L., 1988. Effects of 

spatial variability and scale with implications to hydrologic 

modelling. Journal of Hydrology 102, 29–47. 

Wood, E.F., Sivapalan, M., Beven, K., 1990. Similarity and scale in 

catchment storm response. Reviews of Geophysics 28, 1–18. 

Woods, R., Sivapalan, M., Duncan, M., 1995. Investigating the 

representative elementary area concept: an approach based on 

field data. Hydrological Processes 9, 291–312. 

Young, R.A., Onstad, C.A., Bosch, D.D., 1995. AGNPS: an agricultural 

nonpoint source model. In: Singh, V.P. (Ed.). Computer 

Models of Watershed Hydrology, Water Resources Publication, 

pp. 1001–1020.

[11] 

Thorsen M, Refsgaard JC, Hansen S, Pebesma E, Jensen JB, Kleeschulte S 

(2001) Assessment of uncertainty in simulation of nitrate leaching to aquifers 

at catchment scale. 



Journal of Hydrology 242 (2001) 210–227

www.elsevier.com/locate/jhydrol 

Assessment of uncertainty in simulation of nitrate leaching to 

aquifers at catchment scale 

M. Thorsen a , J.C. Refsgaard a, *, S. Hansen b , E. Pebesma c , J.B. Jensen a , S. Kleeschulte d 

a DHI Water and Environment, Hørsholm, Denmark

b Royal Veterinary and Agricultural University, Copenhagen, Denmark 

c University of Amsterdam, Amsterdam, The Netherlands 

d GIM, Luxembourg, Luxembourg 

Received 21 February 2000; revised 21 July 2000; accepted 23 October 2000 

Abstract 

Deterministic models are used to predict the risk of groundwater contamination from non-point sources and to evaluate the 

effect of alleviation measures. Such model predictions are associated with considerable uncertainty due to uncertainty in the 

input data used, especially when applied at large scales. The present paper presents a case study related to prediction of nitrate 

concentrations in groundwater aquifers using a spatially distributed catchment model. Input data were primarily obtained from 

databases at a European level. The model parameters were all assessed from these data by use of transfer functions, and no

model calibration was carried out. The Monte Carlo simulation technique was used to analyse how uncertainty in input data 

propagates to model output. It appeared that the magnitude of the uncertainty depends significantly on the considered temporal

and spatial scale. Thus simulations of flux concentrations leaving the root zone at grid level were associated with large

uncertainties, whereas uncertainties in simulated concentrations at aquifer level on catchment scale were much smaller.

© 2001 Elsevier Science B.V. All rights reserved.

Keywords: Nitrate; Non-point pollution; Distributed model; Catchment scale; Uncertainty; Monte Carlo method 


1. Introduction

1.1. Background

Deterministic models are important tools for assessing 

nitrate leaching, transport and transformation in 

connection with groundwater resources management. 

Such models may be classified according to the

description of the physical processes as black box,

conceptual and physically-based and according to

the spatial description as lumped and distributed

(Wood and O'Connell, 1985; Nemec, 1994;

Refsgaard, 1996; and others). In this respect three

typical model types are the lumped black box

model, the lumped conceptual and the distributed

physically-based. Most nitrogen leaching models

such as RZWQM (DeCoursey et al., 1989) and

DAISY (Hansen et al., 1991) are of the physically-based

type, but cover only the root zone at plot or field

scale. Within the field of nitrogen modelling at a

catchment scale, typical examples of a black box, a

conceptual and a distributed physically-based model

are statistical regression models (Simmelsgaard,

1991), the SWRRB (Arnold et al., 1990; Arnold and

Williams, 1995) and the MIKE SHE/DAISY (Styczen

and Storm, 1993), respectively.

* Corresponding author. Present address: Department of Hydrology,

Geological Survey of Denmark and Greenland, Thoravej 8,

DK-2400 Copenhagen, Denmark.

E-mail address: jcr@geus.dk (J.C. Refsgaard).

0022-1694/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved.

PII: S0022-1694(00)00396-6

The black box and conceptual models are 

attractive because they require relatively few

data, which are usually easily accessible, while 

the predictive capability of these models with 

regard to assessing the impacts of alternative agricultural 

practices is questionable due to the semi-empirical

nature of the process descriptions. A key 

problem in using the more complex physically-based

catchment models operationally lies in the 

generally large data requirements prescribed by 

the developers of such model codes. However, 

due to the better process descriptions these models 

may for some types of application be expected to 

have better predictive capabilities than the simpler 

models (Heng and Nikolaidis, 1998). Traditionally, 

complex leaching models are only used on plot or 

field scales in areas with extraordinarily good data

availability, and even for such cases the relevance 

of such an approach is often questioned because 

of the perceived uncertainty related to the model 

simulations (Skop, 1993). Hence, there is an 

evident need to assess the uncertainty related to 

large scale simulation of aquifer pollution from 

diffuse sources. 

When analysing for uncertainties in model 

simulations the two fundamentally different 

sources of uncertainty are: (1) uncertainty on 

input data in terms of input variables (time varying 

input such as climate data) and model parameters 

(e.g. soil physical characteristics); and (2) 

inadequate model structure (process descriptions, 

equations). When comparing the model outputs 

to measured field data a third source of uncertainty

has to be added, namely the error in the 

measurement of output from nature. 

Stochastic approaches are useful tools in uncertainty 

analyses. Assessment of uncertainties of 

model simulations requires a joint stochastic–deterministic

approach, where the input data and/or the 

structure of the deterministic model somehow are 

considered stochastic. By considering input data as 

realisations of stochastic variables with given statistical 

properties, the governing equations become so-called

stochastic partial differential equations 

(PDEs). The three traditional approaches to solving 

the stochastic PDEs are (1) state space formulations –

Kalman filtering (Gelb, 1974; Ahsan and O'Connor,

1994), (2) Monte Carlo techniques (Smith and

Freeze, 1979a,b; Freeze, 1980; Zhang et al., 1993),

and (3) analytical solutions to the stochastic PDEs 

(Gelhar, 1986; Dagan, 1986; Jensen and Mantoglou, 

1992). A severe limitation of the above three methods 

is that they only consider uncertainties on input data, 

while all of them assume the model structure to be 

correct. A more comprehensive approach also allowing 

consideration of the uncertainty in the model 

structure and process equations is the generalised likelihood 

uncertainty estimation (GLUE) methodology 

outlined in Beven and Binley (1992). Although no 

such studies have been reported yet, the GLUE in 

principle allows the uncertainty on model structure 

to be considered by introducing several alternative 

models, so that the Monte Carlo procedure includes 

both uncertainties on input data and on model structure. 

The objective of the present paper is, by use of 

Monte Carlo simulations, to assess whether a distributed 

physically-based model can provide fairly accurate 

predictions of nitrate concentrations in aquifers 

when applied at a catchment scale with input data only 

from readily available, aggregated data sources such 

as European databases. A limitation of the present 

paper is that only uncertainties in input data are 

considered, while errors in model structures are not 

taken into account. 

The studies reported in literature dealing with 

assessment of uncertainty of physically-based 

models consider only individual components of 

the hydrological cycle, typically groundwater, 

while the studies dealing with conceptual models, 

including both surface water, root zone and groundwater 

processes, have not considered uncertainties 

on nitrogen or other water quality aspects. Thus, to 

our knowledge, no similar attempts have been 

reported so far. The present paper focussing on 

uncertainty assessment at catchment scale is an 

extension of Refsgaard et al. (1999) and Hansen 

et al. (1999), where details on the deterministic 

modelling at catchment scale and the uncertainty 

aspects of the nitrogen leaching from the root

zone, respectively, have been described. All three 

papers present results from the UNCERSDSS 

project (Refsgaard et al., 1998).

2. Methodology 

2.1. Modelling approach 

The deterministic simulation is carried out by the 

coupled MIKE SHE/DAISY system. This is a 

coupling of a 1D root zone model (DAISY) and a 

3D distributed catchment model (MIKE SHE). 

MIKE SHE is a modelling system describing the 

flow of water and solutes in a catchment in a distributed

physically-based way. This implies numerical 

solutions of the coupled PDEs for overland (2D) and 

channel flow (1D), unsaturated flow (1D) and saturated

flow (3D) together with a description of evapotranspiration

and snowmelt processes. For further 

details reference is made to the literature (Abbott et 

al., 1986; Refsgaard and Storm, 1995). 

DAISY (Hansen et al., 1991) is a 1D physically-based

modelling tool for the simulation of crop 

production and water and nitrogen balance in the 

root zone. DAISY includes modules for description 

of evapotranspiration, soil water dynamics based on 

Richards' equation, water uptake by plants, soil 

temperature, soil mineral nitrogen dynamics based 

on the advection–dispersion equation, nitrate uptake

by plants and nitrogen transformations in the soil. The 

nitrogen transformations simulated by DAISY are 

mineralisation-immobilisation turnover (MIT), nitrification

and denitrification. In addition, DAISY

includes a module for description of agricultural 

management practices. 
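For orientation, the two governing equations named above can be written in their generic one-dimensional textbook forms. This is a sketch only: the symbols and sign convention (z positive upward) are assumptions made here for illustration and are not necessarily the notation or exact formulation used in the DAISY code.

```latex
% Generic 1D forms; all symbols are illustrative assumptions, not DAISY's notation.
% \theta: volumetric water content, h: pressure head, K(h): hydraulic conductivity,
% z: vertical coordinate (positive upward), S: sink term for root water uptake,
% c: solute concentration, q: Darcy flux, D: dispersion coefficient,
% \Gamma: net source/sink from plant uptake and nitrogen transformations.
\begin{align}
\frac{\partial \theta}{\partial t}
  &= \frac{\partial}{\partial z}\left[ K(h)\left( \frac{\partial h}{\partial z} + 1 \right) \right] - S, \\
\frac{\partial (\theta c)}{\partial t}
  &= \frac{\partial}{\partial z}\left( \theta D \frac{\partial c}{\partial z} \right)
     - \frac{\partial (q c)}{\partial z} + \Gamma,
\qquad q = -K(h)\left( \frac{\partial h}{\partial z} + 1 \right).
\end{align}
```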

By combining MIKE SHE and DAISY, a complete 

modelling system is available for the simulation of 

water and nitrate transport in an entire catchment. In 

the present case the coupling is a sequential one. Thus 

for all agricultural areas, DAISY first performs calculations

of water and nitrogen behaviour from the soil 

surface and through the root zone. The percolation of 

water and nitrate at the bottom of the root zone, simulated 

by DAISY, is then used as input to MIKE SHE 

calculations for the remaining part of the catchment. 

For natural areas, MIKE SHE calculates also the root 

zone processes assuming no nitrate contribution from 

these areas. Due to the sequential execution of the two 

codes, it has to be assumed that there is no feedback 

from the groundwater zone (MIKE SHE) to the root 

zone (DAISY). As the riparian buffer zone, where

such a feedback mechanism is effective, often lies mainly

(as in our case study) within the natural

areas, this limitation is of minor practical importance.

Furthermore, overland flow generated by high intensity

rainfall (Hortonian) cannot be simulated by this

coupling, while saturation-excess overland flow

(Dunne) can be accounted for by MIKE SHE. 

Thus, MIKE SHE does not in the present case 

handle evapotranspiration and other root zone 

processes in the agricultural areas. As DAISY is 1D, 

one DAISY run in principle should be carried out for 

each of MIKE SHE's horizontal grids. However, 

several MIKE SHE grids are assumed to have identical 

root zone properties (soil, crop, agricultural 

management practices, etc.), so that in practice the

outputs from each DAISY run can be used as input 

to several MIKE SHE grids. 
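The sequential coupling and the reuse of root zone runs described above can be sketched as follows. This is a minimal illustration under stated assumptions: `run_root_zone_model` is a hypothetical stand-in for one DAISY run and returns placeholder values, and the class labels (soil, rotation) are invented for the example; none of these names come from the MIKE SHE or DAISY codes.

```python
from collections import defaultdict


def run_root_zone_model(root_zone_class):
    """Hypothetical stand-in for one 1D root zone run (placeholder output only)."""
    return {
        "class": root_zone_class,
        "percolation_mm_per_year": 0.0,   # placeholder, not a simulated value
        "nitrate_kg_n_per_ha": 0.0,       # placeholder, not a simulated value
    }


def couple_sequentially(grid_classes):
    """Run the root zone model once per unique class and reuse the result per grid.

    grid_classes maps a grid id to a hashable root zone class, here a
    (soil, rotation) tuple. Returns the per-grid inputs for the catchment
    model and the number of root zone runs actually performed.
    """
    grids_by_class = defaultdict(list)
    for grid_id, root_zone_class in grid_classes.items():
        grids_by_class[root_zone_class].append(grid_id)

    recharge_by_grid = {}
    for root_zone_class, grid_ids in grids_by_class.items():
        result = run_root_zone_model(root_zone_class)  # one run per unique class
        for grid_id in grid_ids:
            recharge_by_grid[grid_id] = result         # same output feeds several grids
    return recharge_by_grid, len(grids_by_class)


# Example with hypothetical classes: three grids, two unique root zone classes.
example_grids = {0: ("sand", "cattle_1"), 1: ("sand", "cattle_1"), 2: ("loam", "pig")}
example_recharge, example_runs = couple_sequentially(example_grids)
```

Because the flow of information is one-way, the root zone results can be computed and stored before the catchment model is started, which is what makes the reuse across grids possible.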

To fulfil one of the overall objectives of the project,

which was to assess the quality of European data sets 

for direct use for modelling at the European scale, two 

key constraints were applied to the modelling 

approach. One constraint was that, if possible, input 

data such as model parameters and driving variables 

should be based on publicly available information, 

which preferably could be accessed from the standard 

European databases such as GISCO or EUROSTAT, 

or from very easily available national sources. 

Another constraint was that all model parameters 

obtained from standard databases were to be used 

directly or by way of transfer functions without any 

model calibration. 

2.2. Scaling 

As the equations in both the MIKE SHE and the 

DAISY codes are basically point scale equations, a

scaling procedure had to be adopted in order to 

apply the codes at a catchment scale. MIKE SHE/ 

DAISY is in this case run with equations and parameter 

values in each model grid point representing 

field scale conditions. The field scale is characterised

by ‘effective’ soil and vegetation parameters, but

assuming only one soil type and one cropping pattern.

The smallest horizontal discretisation in the model is

the grid scale (2 × 2 km²), which is larger than the field

scale. This implies that all the variations between

categories of soil type and crop type within the area

of each grid cannot be resolved and described at the

Fig. 1. Location of the Karup catchment in Jutland, Denmark.

grid level. Input data, whose variations are not

included in the grid scale representation, are distributed

randomly at the catchment scale so that their 

statistical distributions are preserved at that scale. 

The results from the grid scale modelling are then 

aggregated to catchment scale (130 grids) and the 

statistical properties of model output and field data

are then compared at catchment scale. Thus the scaling 

procedure from point scale to catchment scale 

may be characterised as a combination of an upscaling 

step and an aggregation step. The upscaling step is 

simply the important assumption that the point scale 

equations are valid at field scale. The aggregation step

highlights a key issue from the concept of representative 

elementary area (REA) (Wood et al., 1988), 

namely that variability can be explicitly represented 

only at scales larger than the model grid size. 

More details on the adopted scaling approach are

provided in Refsgaard et al. (1999), where it is also 

documented that the approach can be assumed valid 

for the case study in question. 
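The effect of the aggregation step on the spread of model output can be illustrated with a small numerical experiment. The sketch below is not the study's procedure: it assumes, purely for illustration, that grid-level deviations are independent and normally distributed with unit standard deviation. Real grid outputs are correlated (for instance through shared rainfall input), so the reduction shown here is an optimistic bound rather than an expected result.

```python
import random
import statistics


def catchment_mean_spread(n_grids=130, n_runs=200, grid_sd=1.0, seed=1):
    """Compare the spread of one grid value with the spread of the catchment mean.

    Assumes independent, normally distributed grid-level deviations
    (an illustrative assumption, not a property shown in the study).
    Returns (sd of a single grid, sd of the mean over all grids).
    """
    rng = random.Random(seed)
    single_grid = []
    catchment_means = []
    for _ in range(n_runs):
        run = [rng.gauss(0.0, grid_sd) for _ in range(n_grids)]
        single_grid.append(run[0])                      # one grid, followed across runs
        catchment_means.append(statistics.fmean(run))   # aggregate over the catchment
    return statistics.stdev(single_grid), statistics.stdev(catchment_means)


sd_grid, sd_catchment = catchment_mean_spread()
```

Under the independence assumption the standard deviation of the mean over 130 grids is about 1/√130, or roughly 9% of the grid-level value.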

2.3. Input error assessment 

The MIKE SHE/DAISY model contains a very 

large number of input parameters. Ideally, all these 

parameters should be treated stochastically and 

included in the uncertainty analyses. However, this 

would result in an unrealistically high number of 

Monte Carlo simulations and CPU-time. Therefore, 

the input uncertainty was limited to five key parameters

(see Section 3.2 below), which were selected 

so that they, by experience, are known to be the dominant 

parameters in the processes governing the water 

balance and nitrate leaching and transformation.

The actual input error assessment, i.e. the choice 

and parameterisation of the joint probability distribution 

of the stochastic variables was partly based on the 

analysis of available data and partly on expert judgement. 

Available data comprised data from national 

surveys or previous studies. The expert judgement 

refers for instance to the choice of the distribution 

type if no data were present, and the assessment of 

‘realistic’ ranges between which the true parameter

values were expected to vary. Although this assessment

seems rather subjective, it was hard to find a

better way of doing this when data are lacking.

Since the basic unit of calculation is a field, the variation

of field-effective values was used for determining

the range of the parameter probability distributions. A 

single realisation of such a parameter was then used in 

the model for each grid cell. All stochastic parameters 

were treated as being mutually independent. The 

reasons for this are that no significant correlation was

suspected a priori, and that no data were available to 

actually estimate possible correlation. 

2.4. Error propagation 

The propagation of errors in the input data to the 

model output was assessed using Monte Carlo analysis. 

This means that a number of realisations were drawn at 

random from the stochastic input parameter distributions 

and that the model was run for each realisation. 

The ensemble of model outputs then is an estimate of the 

model output probability distribution, as only influenced

by uncertainty in model input parameters. In order to 

reduce the number of Monte Carlo runs, Latin hypercube 

sampling was used to draw realisations from the 

input variables (McKay et al., 1979). This essentially 

means that the distribution of each stochastic input variable

was stratified into N strata with equal probability mass,

where N equals the number of Monte Carlo runs. The 

theoretical background for the adopted Latin hypercube 

sampling method is described in Pebesma and Heuvelink 

(1999). 
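The stratification idea described above can be sketched for uniformly distributed, mutually independent parameters. This is a minimal illustration, not the method of Pebesma and Heuvelink (1999), which may differ in detail. The clay content range of 0.0–17.0% is taken from Table 1; the second parameter, `param_b`, and its range are hypothetical.

```python
import random


def latin_hypercube_uniform(n_runs, low, high, rng):
    """Draw one value from each of n_runs equal-probability strata of U(low, high)."""
    width = (high - low) / n_runs
    samples = [low + (k + rng.random()) * width for k in range(n_runs)]
    rng.shuffle(samples)  # random order, so pairing with other parameters is random
    return samples


def latin_hypercube(n_runs, ranges, seed=0):
    """Build n_runs parameter sets; each parameter is stratified independently."""
    rng = random.Random(seed)
    columns = {
        name: latin_hypercube_uniform(n_runs, low, high, rng)
        for name, (low, high) in ranges.items()
    }
    return [{name: columns[name][i] for name in ranges} for i in range(n_runs)]


# Ten Monte Carlo runs; "param_b" is a hypothetical second parameter.
design = latin_hypercube(10, {"clay_pct": (0.0, 17.0), "param_b": (1.0, 2.0)})
```

Each parameter set in `design` would then drive one model run, and the ensemble of outputs approximates the output distribution. Because every stratum is sampled exactly once, the full range of each parameter is covered even with few runs.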

3. Application 

3.1. Study area 

The area used in the study is the Karup river basin, 

located in the middle part of Jutland, Denmark 

(Fig. 1). The topographic catchment covers approximately 

500 km², of which 70% is used for agricultural

purposes and 30% are natural areas. The 

catchment characteristics are described in Styczen 

and Storm (1993). The data used for the present 

study and the model construction are described in 

detail in Refsgaard et al. (1999) and Hansen et al. 

(1999). In the following a brief summary is provided. 

In the model, the catchment was represented as a

3D network. The discretisation used for the uncertainty

analysis was 2 km in the horizontal direction 

and varied in the vertical from 5 to 40 cm in the unsaturated 

zone, and from 10 to 15 m in the saturated 

zone. The catchment area and the location of the 

river branches as well as the stream geometry were 

generated on the basis of a digital elevation map from 

USGS/GISCO using Arc/Info facilities. Spatial distributions 

of land use and soil types were derived from 

the GISCO database and hydrogeological data were 

obtained from EC (1982). Distributions of crop types 

and livestock densities were obtained from Agricultural 

Statistics (1995) and converted to slurry production 

using standard values for nitrogen content. Based 

on typical crop rotations proposed by The Danish 

Agricultural Advisory Centre and the constraints 

offered by crop distribution and livestock density 

two cattle farm rotations, one pig farm rotation and 

one arable farm rotation were constructed. In order to 

capture the effect of the interaction between weather 

conditions and crop, simulations were performed in 

such a way that each crop at its particular position in 

the considered rotation occurred once in each of the 

years in the rotation. This resulted in a total of 17 

agricultural crop rotation schemes and one scheme 

representing natural areas with no assumed nitrate 

leaching. These 18 schemes were distributed 

randomly over the area in such a way that the statistical 

distribution was in accordance with the agricultural 

statistics. 

To simulate the trend in the nitrate concentrations 

in the groundwater and in the streams, it is necessary 

to have information on the history of the fertiliser 

application in space and time. In Denmark, norms 

and regulations for fertilisation practice are defined

(Plantedirektoratet, 1996). These regulate the maximum

amount of nutrients allowed for a particular

crop depending on preceding crop and soil type, and in addition,

provide norms for the lower limit of nitrogen

utilisation for organic fertilisers. It was assumed that

the farmers follow these statutory norms. Based on

estimated application rates of organic and mineral

fertiliser to the individual crops each year, the

DAISY model simulated time series of nitrate leaching

from the root zone for each agricultural grid. The

MIKE SHE model then routed these fluxes further

through the unsaturated zone and in the groundwater

layers accounting for dispersion and dilution

processes and finally into the Karup stream where

the integrated load from the entire catchment was

estimated.

Table 1

Statistical properties of the input error considered in the Monte Carlo analysis

Parameter                 Unit  Distribution      Mean  Std.   Range
Daily rainfall
  Standard error          %                             50 a
Clay content              %     Uniform           8.5          0.0–17.0
SOM2                      %     Truncated normal  0.5   0.22   0.06–0.94
Cattle slurry
  Dry matter content      %     Truncated normal  7.5   2.5    1.89–14.35
  Total N content         %     Truncated normal  0.5   0.12   0.24–1.02
Pig slurry
  Dry matter content      %     Truncated normal  4.9   2.5    0.82–13.79
  Total N content         %     Truncated normal  0.61  0.18   0.24–1.02
Depth of reduction front  m     Uniform           22.5         18–27

a The series was normalised so that the mean value was preserved.

The model was run for seven years, from 1987 to 

1993. The large storage possibilities in the unsaturated 

zone and the aquifer imply that the initial conditions 

influence the simulation results for several years. The

initial conditions were established by running the

deterministic model twice for the period 1987–

1993. In the first run the initial conditions were

guessed and in the second run they were taken as

the simulated conditions at the end of the first run.

The simulated 1993 conditions in the second run 

were then used as initial conditions for the Monte 

Carlo runs. This procedure ensures that the initial 

conditions are consistent with the assumptions made 

in the deterministic simulation, but not necessarily 

with the parameter values drawn in the Monte Carlo 

runs, where e.g. a run with a parameter value resulting 

in higher nitrate leaching, in principle, should have 

been associated with higher initial nitrate concentrations 

in the aquifer. In order to reduce the effect of 

this, the first two years were considered as a

`warming-up period' and the last five years were

considered the simulation period. 

3.2. Assessment of input errors 

Uncertainty on the following five parameters was

introduced in the analysis: precipitation, soil hydraulic 

properties, soil organic matter (SOM) content, 

slurry composition, and depth of the nitrate reduction 

front in the aquifer. The rationale for selecting these 

five parameters and details on their assessment are

provided in Sections 3.2.2–3.2.6 below. The statistical

characteristics of the data included in the Monte 

Carlo analysis are shown in Table 1. 

3.2.1. Length scale and spatial correlation 

A fundamental question in the assessment of uncertainty 

of input data for a spatially distributed model 

like MIKE SHE/DAISY is whether the input data are 

spatially correlated or not. It is possible to take spatial

correlation into account; however, this complicates

the Monte Carlo sampling considerably (Kros et al., 

1999). The critical question in this relation is whether 

the spatial autocorrelation length scale of the input 

data is larger than the computational scale, or whether 

the dominating spatial variability takes place within a 

computational length scale, in which case it should be 

incorporated into the effective model parameters and 

their inherent uncertainties. 

As discussed above, the basic unit of calculation is

the model grid (2 × 2 km²), with some of the parameters,

however, representing field-effective values

(typically 1–10 ha in size). Hence the soil hydraulic

parameters, the SOM content and slurry composition

represent field length scales in the order of

100–300 m, while the precipitation and reduction

front are represented at a 2 km length scale.

For the field-related parameters the correlation

length scales can be assumed smaller than 100 m. 

For soil hydraulic properties this is documented in 

previous studies (Hansen and Jensen, 1988), while 

no data exist on length scales for SOM. With respect 

to slurry composition this parameter is the result of 

farm management and storage conditions, and it is 

known that the temporal variability of the produced 

slurry on the individual farm is considerable. Hence, it 

is assumed that the variability within the individual 

fields is much larger than the variability among the

fields.

Daily rainfall data are known to have correlation 

length scales that are usually larger than the 

2 km grid scale used in the present case. Geostatistical 

analysis (Storm et al., 1988) suggests that 

the length scale for Danish conditions is in the 

order of 10 km. Similarly, the correlation length scale of the

reduction/oxidation front, which is mainly dependent

on geological conditions, may be assumed to

be significantly larger than the 2 km grid.

This implies that the three field-related parameters

in principle should be treated as spatially independent 

in the Monte Carlo analysis, while the two other input 

data could be treated as almost spatially constant. 

As a consequence of the adopted scaling approach 

the relevant scale for which the uncertainty on the 

input data should be generated in the Monte Carlo 

analysis is the catchment scale and not the grid 

scale. The uncertainty at catchment scale can be 

generated either by allowing spatial variation among 

grids and using a variance applicable at grid scale in

the Monte Carlo sampling or by assuming a spatially 

constant value and using the (smaller) catchment scale 

variance. In the present study we have adopted the 

latter approach. This has two important limitations. 

Firstly, the nitrate reduction processes in the aquifer,

where the horizontal dimension with flows between

neighbouring grids is important, are not described fully

correctly because the autocorrelation length scale is

not preserved. Secondly, the output uncertainties are 

only simulated correctly at the catchment scale, while 

they are underestimated at grid scales. 
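The "(smaller) catchment scale variance" can be motivated by a standard result for averages. As a sketch, assume (this notation is introduced here and is not from the paper) that the catchment value is the mean of n grid values X_1, ..., X_n that are mutually independent and share a grid-scale variance \sigma_g^2:

```latex
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma_g^{2}}{n}
```

Under this assumption the catchment-scale standard deviation is smaller than the grid-scale one by a factor \sqrt{n}. Positive spatial correlation between grids would weaken the reduction, so the expression is an idealisation and not necessarily how the catchment-scale variances in Table 1 were derived.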

3.2.2. Precipitation 

In general the required daily climate data are available 

throughout Europe from the national meteorological 

institutes. Among the required meteorological

variables, precipitation is the one subject to the most

local variation. Therefore uncertainty on the daily

amount of precipitation was included in the present 

analysis. The uncertainty was described by adding a 

random error to the measured series. This error was 

assumed to follow a normal distribution with zero 

mean and a standard deviation equivalent to 50% of 

the measured daily value. Thus, dry days were kept 

dry. The error was assumed to contain no temporal 

autocorrelation. Finally, the series was normalised so 

that the mean value, taken over the 25 Monte Carlo 

runs, was preserved. The adopted variance is in agreement 

with Allerup et al. (1982) as standard error of 

daily rainfall for a catchment of this size. 
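The rainfall perturbation described above might look as follows in Python. This is a hedged sketch only: the function name and the sample rainfall values are hypothetical, clipping negative values to zero is an assumption (the text does not say how they were handled), and applying one common scaling factor is just one possible reading of how the ensemble mean was preserved.

```python
import random


def perturb_rainfall(measured, n_runs, rel_std=0.5, seed=0):
    """Return n_runs perturbed copies of a measured daily rainfall series."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        # The error standard deviation is proportional to the measured value,
        # so dry days (0 mm) remain dry. Errors are independent from day to day.
        # Assumption: negative perturbed values are clipped to zero.
        runs.append([max(0.0, p + rng.gauss(0.0, rel_std * p)) for p in measured])

    # Assumption: a single common factor rescales the whole ensemble so that
    # its mean daily rainfall equals the measured mean.
    ensemble_mean = sum(sum(run) for run in runs) / (n_runs * len(measured))
    measured_mean = sum(measured) / len(measured)
    factor = measured_mean / ensemble_mean
    return [[factor * value for value in run] for run in runs]


measured = [0.0, 4.2, 0.0, 12.5, 1.1, 0.0, 7.8]  # hypothetical daily rainfall, mm
ensemble = perturb_rainfall(measured, n_runs=25)
```

Because the error scales with the measured amount, the sketch never introduces rain on a dry day, and the final rescaling removes the small upward bias that clipping at zero would otherwise introduce.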

3.2.3. Soil hydraulic properties 

The modelling system requires soil hydraulic parameters 

in terms of retention curves and hydraulic 

conductivity functions. Such data were not directly 

available through European databases. Instead, these 

properties were estimated using pedo-transfer functions 

based on soil information in terms of texture 

composition obtained from the GISCO soil database 

in which soils are divided into five texture classes

according to FAO classification. All soil types of the

Karup catchment fall within one texture class (coarse 

texture) which covers soils with less than 18% clay 

and more than 65% sand. As the texture class covers a 

wide range of different texture compositions, soil 

hydraulic properties derived from this information 

will be associated with considerable uncertainty. 

Based on a review by Tietje and Tapakenhinrichs 

(1993) evaluating available pedo-transfer functions 

and based on the constraints imposed by the available 

information on texture (clay, silt and sand content), 

the pedo-transfer functions proposed by Cosby et al. 

(1984) were selected. These functions estimate the 

saturated hydraulic conductivity and the parameters 

in the soil water retention function proposed by 

Campbell (1974). The hydraulic conductivity function 

was calculated according to Burdine (1952) using the 

same parameters. In order to facilitate a smooth retention 

function the Campbell functions were modified

according to the modifications of the Brooks–Corey


function (Brooks and Corey, 1966) proposed by Smith 

(1992). In Danish soils the clay and the silt content are 

correlated. Based on information in the Danish Soil 

Library (Lamm, 1971) a relation between clay and silt 

has been established: 

$$\text{Silt content} = 0.035 + 0.82 \times \text{Clay content} \qquad (r^2 = 0.68)$$

Adopting this relation and assuming that clay, silt 

and sand constitute all soil solids, the soil hydraulic 

properties can be calculated once the clay content is 

known. In the uncertainty analysis, the clay content 

was drawn by stratified random sampling from a uniform distribution

ranging from 0 to 17% (Table 1). In reality, the 

uncertainty on the soil hydraulic parameters originates

from two sources, namely the uncertainty on soil 

texture and the uncertainty related to use of the 

adopted pedotransfer function. In the present 

approach, uncertainty is associated only with soil texture.

Data from the Danish Soil Textural Database show 

that a uniform distribution, as adopted in the present 

study, clearly overestimates the uncertainty on soil 

texture (Børresen, 2000). The assumed large uncertainty

range on soil texture may therefore compensate 

for the lack of uncertainty on the pedotransfer function, 

so that the integrated uncertainty on the soil 

hydraulic parameters is of the right order of magnitude. 

Considering that the autocorrelation length scale 

for soil texture is in the order of 100 m, this adopted 

uncertainty range may at first glance appear as a

rather high uncertainty for soil texture at the catchment 

scale. However, as the FAO texture class is so 

broad that it actually covers different soil types with 

large differences in hydraulic properties the adopted 

catchment scale variance should be seen to cover 

uncertainty on which soil type actually is present in 

the catchment rather than uncertainty on hydraulic 

properties due to small scale variations. 
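The step from a sampled clay content to a full texture composition can be sketched as below. The function name is hypothetical, and the sketch assumes that contents are expressed as mass fractions between 0 and 1 (so that the intercept 0.035 means 3.5%); the text does not state the unit of the regression. The pedo-transfer functions themselves are not reproduced here.

```python
def texture_from_clay(clay):
    """Return (clay, silt, sand) mass fractions from the clay fraction alone.

    Silt follows the clay-silt regression given in the text; sand is the
    remainder, since clay, silt and sand are assumed to make up all soil solids.
    Assumption: all contents are fractions between 0 and 1.
    """
    silt = 0.035 + 0.82 * clay
    sand = 1.0 - clay - silt
    return clay, silt, sand


# Example: a clay fraction of 0.10 drawn in one Monte Carlo run.
print(texture_from_clay(0.10))
```

Under the fraction interpretation, the upper end of the sampled range (17% clay) gives a sand fraction of roughly 0.66, which is consistent with the coarse texture class of more than 65% sand.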

3.2.4. Soil organic matter 

In DAISY, the MIT model considers three types of

organic matter: newly added, relatively fresh organic

matter (AOM) with a relatively fast turnover,

the living soil microbial biomass (SMB) and old

native SOM with slow turnover. The

former two can be initialised with default values 

when the model is run with a `warm-up' period of a 

couple of years prior to the actual simulation period. 

The latter comprises by far most of the organic matter

found in the soil. However, SOM is divided into two

sub-pools, SOM1 and SOM2. The turnover of SOM1 is

so slow that its contribution to the annual nitrogen 

mineralisation in agricultural soils is negligible. 

Hence, when initialising the MIT model the important 

factor is the quantity of SOM2. As the European databases

did not provide this information we had to rely 

on estimates of both the amount of the organic matter 

present in the soil and the amount of this organic 

matter that is allocated to SOM2. The assumed

statistical properties of this uncertainty are shown in 

Table 1. 

3.2.5. Slurry composition 

Due to the high livestock density, slurry is a 

substantial source of nitrogen in the Karup region. 

Hence the management of slurry is of prime importance 

for the leaching losses. A main problem in 

management of slurry is the large variability found 

in the composition of the slurry. This variability 

makes the actual fertiliser application in slurry differ

from the planned application and therefore introduces

a considerable source of uncertainty. In the uncertainty 

analysis this has been accounted for by introducing 

uncertainty on the dry matter content and the 

nitrogen content of the slurry. The assumed error 

statistics are shown in Table 1. Further details on 

the agricultural management and the rationale behind 

the error statistics are provided in Hansen et al. 

(1999). 

3.2.6. Depth of reduction front 

In the uncertainty analysis the depth of the reduction 

front in the saturated zone was drawn from a 

uniform distribution in the interval 18–27 m below

soil surface. 

3.3. Uncertainty analyses 

The initial part of the uncertainty analysis

comprised an evaluation of the selected number of

Monte Carlo runs. As the CPU-time required to run

the model for the seven year period is substantial it

was necessary to keep the number of Monte Carlo

runs to a minimum. Therefore an initial choice of 25

runs was made. In order to investigate whether 25

Monte Carlo runs are sufficient to capture the variability,

75 Monte Carlo runs were performed, the

results were split into three groups of 25 runs each, and the

statistical distributions of the three groups were

compared. The output variables analysed were river

flow, average NO₃ concentration in groundwater, and

average NO₃ concentration in the stream. The three

sets of Monte Carlo runs were evaluated by comparing

the statistical distribution of simulation results, i.e.

testing whether the simulation results can be

described by a normal distribution and whether homogeneity

of mean and variance can be assumed.

Table 2

Evaluation of the representativeness of 25 Monte Carlo runs

Variable                                  1–25           26–50          51–75          1–75            CV (%)
                                          Mean    Std.   Mean    Std.   Mean    Std.   Mean a  Std.
Leaching from root zone (kg N/ha/year)     64.7   19.2    68.2   18.9    67.2   16.7    66.7   18.1    27.1
Groundwater concentration (mg NO₃/l)       47.7    8.0    48.3    7.2    47.6    6.0    47.8    7.0    14.6
River flow (mm/year)                      464.0   22.0   464.0   23.0   464.0   17.0   464.0   21.0     4.5
River concentration (mg NO₃/l)             45.1    7.8    46.2    7.3    45.7    6.6    45.7    7.1    15.5

a Homogeneity of means accepted by F-test.

In the second part of the uncertainty analysis the

contribution to the output uncertainty from

each of the selected Monte Carlo parameters

was evaluated by performing five sets of

Monte Carlo simulations in each of which one of

the initially stochastic parameters was kept deterministic.

The uncertainty contributions of the different 

parameters were then evaluated. As annual leaching 

depends on weather, crop and crop position in the 

rotation, groundwater concentrations in single years 

were not considered, instead data averaged over the 

five-year simulation period, 1989–1993, were used for

the uncertainty analysis. 

4. Results – uncertainties of model results

4.1. Evaluation of the number of Monte Carlo runs 

The main results of the comparison between three

individual sets of 25 Monte Carlo runs are given in

Table 2. Statistical tests showed that the hypothesis of

homogeneity of means and variances cannot be

rejected. As the three sub-sets appear statistically

similar it was concluded that 25 Monte Carlo runs

were sufficient to assess the uncertainty on the simulation

results. It should be emphasised that the small

number of Monte Carlo runs is only possible because

we focus on mean values and standard deviations. If

the aim were to assess uncertainties on extreme

values, such as the 1% fractile, 25 runs would

obviously not have been sufficient.

Fig. 2. Statistical distribution from 25 Monte Carlo runs of simulated average annual river flow at the catchment outlet. The corresponding

measured value based on daily river flow data was 451 mm/year.

Fig. 3. Statistical distribution over 25 Monte Carlo runs of simulated areal average NO₃ concentrations in upper aquifer layer by the end of

1993. The corresponding measured value based on data from 35 wells was 58 mg/l.
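As a small check on Table 2, the coefficient of variation (CV = std./mean) can be recomputed for each group of runs. The sketch below only restates values printed in Table 2; the dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# (mean, std) for each group of Monte Carlo runs, copied from Table 2.
table2 = {
    "Leaching from root zone (kg N/ha/year)": {
        "1-25": (64.7, 19.2), "26-50": (68.2, 18.9),
        "51-75": (67.2, 16.7), "1-75": (66.7, 18.1),
    },
    "Groundwater concentration (mg NO3/l)": {
        "1-25": (47.7, 8.0), "26-50": (48.3, 7.2),
        "51-75": (47.6, 6.0), "1-75": (47.8, 7.0),
    },
    "River flow (mm/year)": {
        "1-25": (464.0, 22.0), "26-50": (464.0, 23.0),
        "51-75": (464.0, 17.0), "1-75": (464.0, 21.0),
    },
    "River concentration (mg NO3/l)": {
        "1-25": (45.1, 7.8), "26-50": (46.2, 7.3),
        "51-75": (45.7, 6.6), "1-75": (45.7, 7.1),
    },
}


def cv_percent(mean, std):
    """Coefficient of variation in percent."""
    return 100.0 * std / mean


# CV per variable and run group, rounded to one decimal as in Table 2.
cv = {
    variable: {group: round(cv_percent(m, s), 1) for group, (m, s) in groups.items()}
    for variable, groups in table2.items()
}

for variable, by_group in cv.items():
    print(variable, by_group)
```

The values for runs 1–75 reproduce the CV column of Table 2, and comparing the three sub-groups gives a quick impression of how much the spread estimate changes from one set of 25 runs to the next.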

4.2. Comparisons with field data

The simulated uncertainty intervals on selected 

model results were, if possible, compared to corresponding 

measured data available from monitoring 

programmes conducted in the area. In this context it 

is noted that due to the adopted scaling approach, the 

simulation results are only supposed to reflect the field

observations at a catchment scale and not at a point 

scale. 

The simulated water balance represented by average

annual river discharge at the catchment outlet

varies from 428 to 502 mm/year (Fig. 2). The corresponding

measured value is 451 mm/year which

falls within the simulated interval and within 5% of

both the median (462 mm) and the average (463 mm)

of the simulated values. Fig. 3 presents the simulated

distribution of average nitrate concentrations in the

upper groundwater layer averaged over the entire

catchment and over the five-year simulation period.

The corresponding value obtained from observations

in 35 wells is 58 mg/l, which falls within the simulated

interval (35.4–61.4 mg/l) and within 25% of

both the median (46.7 mg/l) and the average

(47.4 mg/l) of the Monte Carlo runs. In Fig. 4 the

fraction of the catchment area with groundwater

concentrations above the drinking water limit of

50 mg/l is shown in terms of statistical distribution

for the 25 Monte Carlo runs. Also in this case the

observed value from the 35 observation wells (57%)

falls within the simulated interval (27–65%) and

within 10% of the median (53%) of the Monte Carlo

runs.

Fig. 4. Statistical distribution over 25 Monte Carlo runs of percentage of catchment area with NO₃ concentrations above the drinking water limit

of 50 mg/l. The corresponding measured value based on data from 35 wells was 57%.

Fig. 5. Measured (■) and simulated (×) areal distribution of NO₃ concentrations in groundwater at eight points in time. Measured values are based on 35 groundwater observations.

Fig. 6. (a) Simulated time series of six-monthly flux concentrations from the root zone obtained in three different crop rotations (■ = mean,

□ = ± 1 × std). The range of seasonal variation in standard errors is shown inside the figures. (b) Simulated time series of average areal aquifer

concentrations (■ = mean, □ = ± 1 × std). The range of seasonal variation in standard errors is shown inside the figures.

A visual comparison is shown in Fig. 5, where 

observed areal distributions of nitrate concentrations 

from existing wells are compared to similar results 

from the Monte Carlo runs on a six-monthly basis. 

From this figure it is seen that the measured concentration

distribution in general is within the uncertainty 

band generated from the Monte Carlo simulations, 

though not always centred. It appears that, in general, 

the simulated fraction of the area with nitrate concentrations 

exceeding 50 mg/l is slightly overestimated in 

the summer period and slightly underestimated in the 

winter period, indicating that the overall trend in the 

concentration level is simulated adequately whereas 

the seasonal variation in observed concentrations is 

not fully represented in the simulations. 

4.3. Nitrate concentrations in aquifer – at different

temporal and spatial scales

The results regarding the uncertainty on simulated 

nitrogen leaching from different cropping patterns and 

the importance of the contribution from different error 

sources are described in detail in Hansen et al. (1999). 

The present paper focuses on the catchment scale and 

on how uncertainties at a point scale propagate and are 

transformed (reduced) at larger spatial and temporal 

scales. 

The transformation process is illustrated in Fig. 6 

which shows the uncertainty, characterised by time 

series of the means and standard deviations among 

the 25 Monte Carlo runs for (a) six-monthly flux

concentrations from the root zone (DAISY output)

for three different crop rotations, and (b) mean six-monthly

concentrations in the upper aquifer layer 

averaged over the entire aquifer. It is very clearly 

seen from the figures how the uncertainties are

reduced when moving from root zone leakage to aquifer 

concentrations at catchment scale. Thus it is 

remarkable that for instance the average standard 

errors (standard deviation divided by mean) of six-monthly

root zone flux concentrations in the order

of 33–44% are reduced to a standard error of 18%

on the assessed mean six-monthly values for groundwater

concentrations at the catchment scale.

The large seasonal variation in concentration levels

observed in the percolation water (Fig. 6a) is levelled

out in the simulated groundwater concentrations at

both grid level and catchment level. This is mainly a

result of dilution and averaging in the entire groundwater

volume of the upper layer which accounts for

8–13 m of the saturated zone. The differences in

concentration levels between crop rotations are, on

the other hand, still reflected in the groundwater

concentrations of corresponding grids (Fig. 6b) with

the lowest concentrations arising from the plant production

rotations and the highest concentrations from the pig rotations.

Table 3

Simulations used for evaluation of uncertainty contributions. All six

sets are based on the input uncertainties drawn for the first set of

Monte Carlo simulations (1–25)

Monte Carlo run series  Status of parameters
O                       All five parameters are treated stochastic
A                       Precipitation is treated deterministic
B                       Texture is treated deterministic
C                       Soil organic matter is treated deterministic
D                       Slurry composition is treated deterministic
E                       Depth of reduction front is treated deterministic

4.4. Analyses of different sources of input error 

In addition to the basic set of Monte Carlo simulations 

(1–25), where all five selected parameters were

treated stochastically, five series were simulated in

each of which one of the Monte Carlo parameters 

was kept deterministic (Table 3). The results of 

these extra ®ve series were compared to the result of 

the basic set in order to evaluate the uncertainty associated 

with each of the selected parameters. In Table 

4, the uncertainty contribution of each series given as 

variances is shown. The variance contribution of 

single parameters was obtained by subtracting the 

total simulated variance obtained with only four 

stochastic parameters (e.g. series A) from the total 

variance obtained with five stochastic parameters

(series O). Ideally, the sum of the variances corresponding

to the simulation series A–E should equal

the variance associated with Monte Carlo run series 

O, if no covariance components were generated. It is, 

however, noted that discrepancies occur, indicating

that not all variance and covariance components are

accounted for. In spite of this, the results can give a

rough estimate of the relative importance of the

selected sources of uncertainty. 
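The bookkeeping behind this comparison can be sketched with the variance contributions reported in Table 4. The helper function and the dictionary layout are hypothetical; the per-series variances (series A to E) are not given in the paper, so the sketch starts from the already subtracted contributions and only reproduces the sums and relative shares.

```python
def single_parameter_contribution(var_all_stochastic, var_one_fixed):
    """Variance attributed to one parameter: series O minus the series with it fixed."""
    return var_all_stochastic - var_one_fixed


# Variance contributions A-E and the variance of series O, copied from Table 4.
table4 = {
    "Leaching from root zone (kg/ha year)": ({"A": 0, "B": 192, "C": 100, "D": 114, "E": 0}, 370),
    "Groundwater concentration (mg/l)": ({"A": 2, "B": 30, "C": 29, "D": 28, "E": 0}, 64),
    "River flow (mm/year)": ({"A": 284, "B": 345, "C": 6, "D": 6, "E": 0}, 499),
    "River concentration (mg/l)": ({"A": 0, "B": 27, "C": 21, "D": 19, "E": 0}, 61),
}

summary = {}
for variable, (contributions, var_all) in table4.items():
    total = sum(contributions.values())
    # Relative share of each parameter in the summed contributions, in percent.
    shares = {key: 100.0 * value / total for key, value in contributions.items()}
    summary[variable] = (total, var_all, shares)
    print(variable, "sum A-E:", total, "series O:", var_all)
```

For every output variable the summed contributions exceed the variance of series O (for example 89 against 64 for groundwater concentration), which is the discrepancy the text attributes to unaccounted covariance components.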

As can be seen from Table 2 (runs 1–25) the uncertainty

on the simulated annual river flows (CV = std./

mean = 5%) was significantly less than the uncertainty

related to the components of the nitrogen

balance i.e. nitrogen leaching (CV = 30%) and nitrate

concentrations in groundwater and stream water

(CV = 17%). According to Table 4 the uncertainty

on simulated river flow was dominated by contributions

from uncertainty on soil texture and on precipitation,

whereas the uncertainties associated with

components of the nitrogen balance were dominated

by the uncertainty contributions from soil

texture, SOM and slurry composition. Uncertainty

on precipitation contributed only little to the simulated

uncertainties on the nitrogen components despite

the influence it had on the water balance. The depth of

the reduction front appeared to have only minor influence

on the uncertainty of stream water concentrations

in the present simulations.

Table 4

Estimation of uncertainty on selected simulation results distributed on calculated variance contribution (s²) from precipitation (A), soil texture

(B), soil organic matter (C), slurry composition (D), and depth of the reduction front (E), respectively

Variable                               Variance contribution from single parameters   SUM (A–E)   All parameters O a
                                       A      B      C      D      E
Leaching from root zone (kg/ha year)     0    192    100    114      0                 406         370
Groundwater concentration (mg/l)         2     30     29     28      0                  89          64
River flow (mm/year)                   284    345      6      6      0                 641         499
River concentration (mg/l)               0     27     21     19      0                  67          61

a Variance from simulations with all five Monte Carlo parameters included.


From the analysis of input error contributions it was 

observed that only three of the five input parameters

included in the uncertainty analysis contributed

significantly to the simulated variation in the model

output related to the nitrogen balance, i.e. areal leaching 

from the root zone and average nitrate concentrations 

in groundwater and stream water. Of these three 

only one, soil texture, is related to the transport 

processes. The two others, SOM and slurry composition, 

are related to the nitrogen turnover processes. 

The uncertainty introduced to the driving variable 

precipitation influenced the simulated water balance

but not the simulated nitrogen balance. This indicates 

that the timing of the percolating water governed by 

the hydraulic parameters is more important for the 

simulated nitrogen loads than the total annual 

amounts of percolation. This result is supported by 

other studies showing that one of the major factors 

influencing nitrogen losses from the root zone under

northern temperate climate is the amount of readily 

available organic nitrogen present in the soil at the end 

of the growing season where groundwater recharge is 

initiated (Landbrugets Rådgivningscenter, 1996). The

predicted uncertainty on the simulated river flow is in

good agreement with results from Storm et al. (1988). 

The uncertainty introduced to the depth of the 

reduction front in the saturated zone had no influence

on the simulation results. The main reason for this is 

that the simulated groundwater levels were shallower 

than normally observed in the area. This prevented the 

percolating water from passing through the reduced 

zone before entering the stream. If the hydrogeological 

parameters had been included in the Monte Carlo 

analysis, the depth of the reduction front might have 

contributed to the simulated variation in the nitrogen 

balance component, in particular stream flow concentrations,

as well. 

A fundamental limitation of the adopted approach 

is that the errors due to incorrect model structure are 

neglected. One approach to assess such model error is 

through comparison of predicted and observed values. 

In the present case it was, however, not possible 

during the validation tests to identify a significant

model error. This must not be taken as a general 

proof for a correct model structure. It only shows 

that the model performs without apparent model 

error for the particular case study. 

Another limitation of the adopted approach lies in 

the choice of associating input uncertainty to only five

parameters. Although these five parameters according

to our experience are the most important ones in the 

different processes governing the nitrate leaching and 

transformation, this has not been documented by 

systematic sensitivity analyses, either by us or by 

other authors. It can be argued that the uncertainties 

have been underestimated by neglecting the uncertainty 

on the other input parameters. Hence, the absolute 

uncertainty figures should be considered with

some reservation. 

A third limitation is the mostly subjective method 

of assessing errors in input data. If suitable data had 

been available for assessing such errors in a statistically 

more rigorous way this should have been done. 

Cases where such data are available are typically 

studies on small experimental areas, while our case 

is more comparable to practical studies, where such 

data most often are not available. In spite of the weak 

data basis for the input error assessment, the adopted 

Monte Carlo analysis is still valuable as a rigorous 

method of analysing uncertainty propagation, 

although the predicted uncertainties should be treated 

with some caution. 

When considering uncertainties at different scales it 

must be noticed that due to the adopted approaches 

with respect to upscaling and Monte Carlo sampling 

the uncertainties can only be assumed to be correctly 

assessed at the catchment scale, while the uncertainties 

at smaller scale are underestimated. This amplifies

the finding reflected in Fig. 6, namely that the

uncertainties in flux concentrations leaving the root

zone are much larger than the uncertainty at the catchment/aquifer

scale. Taking this into account one could 

argue that the uncertainty in simulated flux concentrations

leaving the root zone at point/grid scale is so 

large that this in itself may lead to the conclusion 

that modelling with this type of model, this grid 

size, and this data basis is of minor practical use. 

However, the uncertainty at the catchment (or aquifer) 

scale, which is an interesting scale seen from a water


supply and policy point of view is reduced so much 

that the results may be useful in practice. This duality 

illustrates that discussions of model uncertainty are 

useless unless the type of simulation result is defined

precisely in terms of spatial and temporal scale, which 

is probably one of the reasons why `field/process

study oriented scientists' and `modellers/large scale 

oriented scientists' often misunderstand each other. 
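
The scale dependence discussed above can be illustrated with a small numerical sketch. It is not the procedure used in this study: the lognormal error distribution, the coefficient of variation, the number of grid cells and, in particular, the assumption that errors in different grid cells are independent are all hypothetical choices made for illustration only. Under these assumptions the spread of a single grid-cell value is compared with the spread of the catchment average over the same Monte Carlo realisations.

```python
import numpy as np

def scale_dependent_spread(n_realisations=500, n_cells=400,
                           mean=50.0, cv_cell=0.8, seed=0):
    """Compare relative spread at grid scale and at catchment scale.

    All parameter values are hypothetical; cell errors are assumed
    independent, which exaggerates the reduction obtained by averaging.
    """
    rng = np.random.default_rng(seed)
    # Lognormal parameters that reproduce the requested mean and CV.
    sigma2 = np.log(1.0 + cv_cell ** 2)
    mu = np.log(mean) - 0.5 * sigma2
    # One row per Monte Carlo realisation, one column per grid cell.
    fields = rng.lognormal(mu, np.sqrt(sigma2), size=(n_realisations, n_cells))
    # Spread of a single grid cell across realisations.
    cell = fields[:, 0]
    cv_grid = cell.std(ddof=1) / cell.mean()
    # Spread of the catchment average across realisations.
    catchment_mean = fields.mean(axis=1)
    cv_catchment = catchment_mean.std(ddof=1) / catchment_mean.mean()
    return cv_grid, cv_catchment

cv_grid, cv_catchment = scale_dependent_spread()
print(f"CV at grid scale:      {cv_grid:.2f}")
print(f"CV at catchment scale: {cv_catchment:.2f}")
```

With independent cell errors the relative spread of the average falls roughly with the square root of the number of cells. Spatially correlated errors would give a smaller reduction, so the sketch indicates only the direction of the effect, not its size.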

One way of reducing the simulated uncertainty 

would be to increase the quality of the input data 

support either by using national databases instead of 

the European data sets or by actually gathering site-specific

data through field monitoring. The uncertainty

related to the texture composition could be

reduced by using national soil databases, which

often include more detailed classification systems

than the FAO approach provided in the GISCO database. 

If the procedure of using pedo-transfer

functions for obtaining hydraulic parameters is kept, this

would decrease the uncertainty within each defined

soil class. Based on the effect of keeping soil texture

deterministic (Table 4) it could for example be

expected that a 50% reduction in the input error

related to soil texture obtained by collecting better

data in this way would reduce the uncertainty on

simulated groundwater concentration by approximately

25%. Gathering of better precipitation data

would, on the other hand, only improve simulation

of the water balance and not influence the simulated

uncertainty in groundwater concentrations significantly.

Another way of decreasing the uncertainty would 

be to carry out model calibration, as this in principle 

would decrease the uncertainty related to the input 

parameters. In practice it is, however, difficult to

quantify how much the input error of a single parameter 

should be reduced if calibration involving this 

parameter is conducted. In the present study, calibration 

of the hydrogeological parameters by use of 

measured groundwater levels and observed stream 

flow might have influenced both the simulated

groundwater concentrations by introducing a more 

diverse hydrology and in particular the simulated 

stream concentrations as the reduction front may 

have come into function. Calibration of the root 

zone processes would have required field data in

terms of e.g. soil moisture contents, nitrogen concentrations 

in the root zone, crop yields, etc., data which 

are not often available. In order to get some idea of the 

quality of the simulated mass balances, one possibility 

could be to calibrate the simulated crop yields using 

regional agricultural statistics, though these can only 

provide rather rough estimates. 

From the results of the present study it can be 

concluded that the present modelling approach appears

feasible for estimating uncertainties in predicted

nitrate concentrations at larger scales, and thereby

also for evaluating the reliability of the simulation 

results. The results also indicate that the use of distributed 

physically-based models is feasible at the catchment 

scale, even if data have to be obtained from 

readily available aggregated data sources such as 

European databases. Given the constraints for obtaining 

data and given that no model calibration was 

performed in the present case study, the validation 

tests came out surprisingly well as measured groundwater 

concentrations were within the uncertainty 

intervals of the simulated groundwater concentration. 

The uncertainty of the model simulations at catchment

scale is at a relatively low level, and thus the predictive

capability of the model appears very interesting

from a practical water resources management point 

of view. 


The present work was partly funded by the EC 

Environment and Climate Research Programme 

(contract number ENV4-CT95-0070). We thank the 

two reviewers, Tim Burt and Bernd Huwe, for valuable

comments on an earlier version of this manuscript.


[12] 

Refsgaard JC, Henriksen HJ (2004) Modelling guidelines – terminology and 

guiding principles. 

Advances in Water Resources, 27(1), 71-82. 

Reprinted from Advances in Water Resources with permission from Elsevier

Advances in Water Resources 27 (2004) 71–82 

www.elsevier.com/locate/advwatres 

Modelling guidelines – terminology and guiding principles

Jens Christian Refsgaard *, Hans Jørgen Henriksen

Department of Hydrology, Geological Survey of Denmark and Greenland (GEUS), Øster Voldgade 10, Copenhagen DK-1350, Denmark 

Received 29 October 2002; received in revised form 7 August 2003; accepted 18 August 2003 

Abstract 

Some scientists argue, with reference to Popper’s scientific philosophical school, that models cannot be verified or validated. 

Other scientists and many practitioners nevertheless use these terms, but with very different meanings. As a result of an increasing

number of examples of model malpractice and mistrust in the credibility of models, several modelling guidelines have been elaborated

in recent years with the aim of improving the quality of modelling studies. This gap between the views and the lack of

consensus experienced in the scientific community and the strongly perceived need for commonly agreed modelling guidelines is 

constraining the optimal use and benefits of models. This paper proposes a framework for quality assurance guidelines, including a 

consistent terminology and a foundation for a methodology bridging the gap between scientific philosophy and pragmatic modelling. 

A distinction is made between the conceptual model, the model code and the site-specific model. A conceptual model is 

subject to confirmation or falsification like scientific theories. A model code may be verified within given ranges of applicability and 

ranges of accuracy, but it can never be universally verified. Similarly, a model may be validated, but only with reference to site-specific

applications and to pre-specified performance (accuracy) criteria. Thus, a model’s validity will always be limited in terms

of space, time, boundary conditions and types of application. This implies a continuous interaction between manager and modeller 

in order to establish suitable accuracy criteria and predictions associated with uncertainty analysis. 

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Model guidelines; Scientific philosophy; Validation; Verification; Confirmation; Domain of applicability; Uncertainty 


1. Introduction

Models describing water flows, water quality and

ecology are being developed and applied in increasing 

number and variety. With the requirements imposed by 

the EU Water Framework Directive the trend in recent 

years to base water management decisions to a larger 

extent on model studies and to use more sophisticated 

models is likely to be reinforced. At the same time 

insufficient attention is generally given to documenting 

the predictive capability of the models. Therefore, contradictions 

emerge regarding the various claims of 

model applicability on the one hand and the lack of 

documentation of these claims on the other hand. 

Hence, the credibility of the models is often questioned, 

and sometimes with good reason. 

As emphasised by e.g. Forkel [12] modelling studies 

involve several partners with different responsibilities. 

* Corresponding author. Tel.: +45-38-14-27-76; fax: +45-38-14-20-50.


The ‘key players’ are code developers, model users and

water resources managers. However, due to the complexity 

of the modelling process and the different backgrounds 

of these groups, gaps in terms of lack of mutual 

understanding easily develop. For example, the strengths 

and limitations of modelling applications are most often 

difficult, if not impossible, to assess by the water resources 

managers. Similarly, the transformation of water 

managers’ objectives to specific performance criteria can 

be very difficult to assess for the model users. Due to lack 

of documentation and transparency, modelling projects 

can be difficult to audit, and without a considerable effort 

it is hardly possible to reconstruct, repeat and reproduce 

the modelling process and its results. 

In the water resources management community a 

number of different guidelines on good modelling practice

have been prepared. One of the most, if not the most, 

comprehensive examples of modelling guidelines has 

been developed in The Netherlands [37] as a result of a 

process involving all the main players in the Dutch water 

management field. The background for this process was 

a perceived need for improving the quality in modelling 


doi:10.1016/j.advwatres.2003.08.006


by addressing malpractice such as careless handling of 

input data, insufficient calibration and validation and 

model use outside its scope [34]. Similarly, the background 

for modelling guidelines for the Murray–Darling 

Basin in Australia was a perception among the end-users 

that model capabilities may have been ‘over-sold’, and

that there is a lack of consistency in approaches, communication 

and understanding among and between 

modellers and water resources managers, often resulting 

in considerable uncertainty for decision making [25]. 

A key problem in relation to establishment of generally 

acceptable modelling guidelines is confusion on 

terminology. For example the terms validation and

verification are used with different, and sometimes

interchangeable, meanings by different authors. The

confusion arises from both semantic and philosophical 

considerations [32]. Another important problem is the 

lack of consensus related to the so far non-conclusive 

debate on the fundamental question concerning whether 

a water resources model can be validated or verified, and 

whether it as such can be claimed to be suitable or valid 

for particular applications [3,11,16,20,26]. 

Finally, modelling guidelines have to reflect and be in 

line with the underlying philosophy of environmental 

modelling, which has changed significantly during the

past decades from what in retrospect may be called

rather naive enthusiasm (see for example Freeze and

Harlan [13]; Abbott [1]; many of us focussed on the

huge potentials of sophisticated models outlined in these 

early days without reflecting too much on the associated 

limitations) to what now appears to be a much more 

balanced and mature view (e.g. Beven [7,9]). 

Thus, there is a gap between the theory and practice, 

i.e. between the various, contradictory views and the 

lack of a common terminology and methodology in the 

scientific community on the one side, and the need of 

having quality assurance guidelines for practical model 

applications on the other side. The objective of the 

present paper is to establish guiding principles for 

quality assurance guidelines, including establishing a 

consistent terminology and a foundation for a methodology 

bridging the gap between scientific philosophy and 

pragmatic modelling. 

2. Key opinions in the scientific community 

The present paper does not attempt to provide a full 

review of all relevant papers on this subject. Rather, it 

provides a review of a few selected characteristic 

examples. 

2.1. Terminology 

No unique and generally accepted terminology and 

methodology exist at present in the scientific community 

with respect to modelling protocol and guidelines for 

good modelling practice. Examples of general methodologies

exist [4,32,33], but they use different terminology 

and have significant differences with respect to the 

underlying scientific philosophy. 

A rigorous and comprehensive terminology for model 

credibility was presented by Schlesinger et al. [33]. This 

terminology was developed by a committee composed of 

members from diverse disciplines and backgrounds with

the intent that it could be employed in all types of simulation 

applications. In regard to terminology, distinctions 

are made between model qualification (adequacy 

of conceptual model), model verification (adequacy of 

computer programme) and model validation (adequacy 

of site-specific model). With the exception of a few 

important terms, such as generic model code and model 

calibration, which are not considered by Schlesinger 

et al. [33], their proposed terminology includes all the 

important elements of the modelling process. 

Konikow and Bredehoeft [20], in their thought-provoking

paper, express the view that “the terms validation

and verification have little or no place in

groundwater science; these terms lead to a false impression

of model capability”. Their main argument relates

to the anti-positivistic view that a theory (in this case a

model) can never be proved to be generally valid, but

may, on the contrary, be falsified by just one example. They

argue and recommend that the term history matching, 

which does not indicate a claim of predictive capability, 

should be used instead. 

Oreskes et al. [26], in their classic and philosophically 

based paper, distinguish between verification, validation 

and confirmation: 

• Verify is “an assertion or establishment of truth”. To

verify a model therefore means to demonstrate its

truth. According to the authors “verification is only

possible in closed systems in which all the components

of the system are established independently and

are known to be correct. In its application to models

of natural systems, the term verification is highly misleading.

It suggests a demonstration of proof that is

simply not accessible”. They argue that mathematical

components are subject to verification, because they 

are part of closed systems, but numerical models in 

application cannot be verified because of uncertainty 

of input parameters, scaling problems and uncertainty 

in observations. 

• The term validation is weaker than the term verification. 

Thus validation does not necessarily denote an 

establishment of truth, but rather “the establishment

of legitimacy, typically given in terms of contracts,

arguments and methods”. They argue that “the term

valid may be useful for assertions about a generic

model code but is clearly misleading if used to refer

to actual model results in any particular realisation”.


• The term confirmation is weaker than the terms verification 

and validation. It is used with regard to a theory, 

when it is found that the theory is in agreement 

with empirical observations. As discussed below such 

agreement does not prove that the theory is true, it 

only confirms it. 

Oreskes et al. [26] do not define how the terms verification 

and validation should be used, but rather define 

their meaning and set limitations to the contexts in 

which they meaningfully can be used. 

An important distinction is made between open and 

closed systems. A system is a closed system if its true 

conditions can be predicted or computed exactly. This 

applies to mathematics and mostly to physics and 

chemistry. Systems where the true behaviour cannot be 

computed due to uncertainties and lack of knowledge on 

e.g. input data and parameter values are called open 

systems. The systems we are dealing with in water resources 

management, based on geosciences, biology and 

socio-economy, are open systems. 

It may be argued that e.g. the behaviour of a 

groundwater flow system can be predicted correctly if all 

the details of the subsurface (soil system and geological 

system) media were known, because the fundamental 

physical laws governing the flow are known. However, 

in practice it will never be possible to know all the details 

of the media down to molecular scale, and hence 

uncertainties will always exist. For instance, several 

alternative representations of the subsurface system at 

microscopic scale will be able to provide the same 

flow field at a macroscopic scale. Therefore, the results 

from a groundwater flow model are said to be non-unique.

In addition, as the system is a so-called open 

system, the boundary conditions generate further 

uncertainty. 

Matalas et al. [24] draw a distinction between the 

terms ‘model’ and ‘theory’. They state that “a theory

represents a synthesis of understanding, which provides

not only a description of what constitutes the states of

the system and their connectedness (i.e. postulated

concepts), but also deducted consequences from these

postulates. A model is an analogy or an abstraction,

which ... may be derived intuitively and without formal

deductive capability”.

Rykiel [32] argues that models can be validated as 

acceptable for pragmatic purposes, whereas theoretical 

validity is always provisional. In this respect he, like 

Matalas et al. [24], distinguishes between scientific 

models and predictive (engineering) models. Scientific 

models can be corroborated (confirmed) or refuted 

(falsified) in the sense of hypothesis testing, while predictive 

models can be validated or invalidated in the 

sense of engineering performance testing. Thus according 

to Rykiel [32], validation is not a procedure for 

testing scientific theory or for certifying the ‘truth’ of

current scientific understanding, but rather a testing 

of whether a model is acceptable for its intended use. 

Within the hydraulic engineering community attempts 

have been made to establish a common quality

assurance methodology (IAHR [18]). The IAHR methodology

comprises guidelines for standard validation 

documents, where validation of a software package is 

considered in four steps [10,23]: conceptual validation, 

algorithmic validation, software validation and functional 

validation. It is noted that the term validation in 

the IAHR methodology corresponds to what other authors 

call code verification, while schemes for validation 

of site-specific models are not included. 

2.2. Scientific philosophical aspects of verification and 

validation 

Different principal schools of philosophical thought 

exist on the issue of verification and validation. During 

the second half of the 19th century and the first half of 

the 20th century positivism was the dominant philosophical 

school. Matalas et al. [24] characterise the

positivistic school in the following way: “... theories are

proposed through inductive logic, and the proposed

theories are confirmed or refuted on the basis of critical

experiments designed to verify the consequences of the

theories. And through theory reduction or adoption of

new or modified theories, science is able to approach

truth”. The logical rationale behind positivism is the

inductive method, i.e. the inference from singular

statements, such as accounts of results of observations

or experiments, to universal statements, such as hypotheses

or theories.

Popper [29] opposed the positivistic school arguing 

that science is deductive rather than inductive, and that 

theories cannot be verified, only falsified. The deductive 

method implies inferences from a universal statement to 

a singular statement, where conclusions are logically 

derived from given premises. Science is considered as a 

hypothetico-deductive activity, implying that empirical 

observations must be framed as deductive consequences 

of a general theory or scientific law. If the observations 

can be shown to be true then the theory or law is said to 

be corroborated. Popper used the term corroboration instead

of confirmation, because he “wanted a neutral

term to describe the degree to which a theory has stood

up to severe tests and proved its mettle”.

The greater the number and diversity of confirming 

observations the more credible the theory or law becomes. 

But no matter how much data and how many 

confirmations we have, there will always be the possibility 

that more than one theory can explain the observations. 

Over time the false theories are likely to be 

confronted with observations that falsify them. Thus, 

scientific theories are never certain or proved but only 

hypotheses subject to corroboration or falsification.


Popper [29] distinguished between two kinds of universal

statements: the ‘strictly universal’ and the ‘numerical

universal’. The strictly universal statements are

those usually dealt with when speaking about theories

or natural laws. They are a kind of ‘all-statement’

claiming to be true for any place and any time. In contrast,

numerical universal statements refer only to a

finite class of specific elements within a finite individual 

spatio-temporal region. A numerical universal statement 

is thus in fact equivalent to conjunctions of singular 

statements. 

Kuhn [21] also strongly criticised positivism, and in 

a discussion of selection of correct scientific theories 

(paradigms) states “... few philosophers of science still

seek absolute criteria for the verification of scientific

theories. Noting that no theory can ever be exposed to

all possible relevant tests, they ask not whether a theory

has been verified but rather about its probability in the

light of the evidence that actually exists. And to answer

that question one important school is driven to compare

the ability of different theories to explain the evidence at

hand.”

According to the deductive approach a given system 

is reduced into elements or sub-systems that are closed, 

i.e. without uncertainties from the boundary or initial 

conditions, and a given hypothesis is then confirmed by 

use of causal relationships and rigorous logic. The

deductive method is the traditional scientific philosophy

and methodology for ‘exact sciences’ such as physics and

chemistry. Hansen [15] and Baker [5] argue that this

deductive or ‘theory-directed’ scientific method is not

suitable for earth sciences, such as geology and biology,

which are characterised by open systems, and where

many of the signs in the historical development process

are not preserved. Instead, they argue for another scientific

method, which they, respectively, denote ‘holistic’

or ‘earth-directed’. The earth-directed scientific method

does not focus on idealised theories verified in experimental

laboratories. Instead, it is oriented towards

observations in nature, uncontrolled by artificial constraints.

The earth-directed method, being more ‘soft’

and accepting conclusions on the complex state of nature

from an integration of many observations, but

without the logically rigorous proof required by the

deductive method, can be argued to be well in line with 

Popper’s philosophy where the scientific knowledge 

comprises a variety of falsifiable theories that are subject 

to tests against observations [15]. 

2.3. Philosophy of environmental modelling 

Following several papers (ranging from Beven [6] to 

[7]) with comprehensive critique of the predominant

philosophy underlying most environmental modelling, 

Beven [9] outlines a new philosophy for modelling 

of environmental systems. The basic aim of this new 

approach is to extend the most common, past approach 

with a more realistic account of uncertainty, rejecting the

idea of being able to identify only one optimal model as 

being the most reliable for a given case. His basic idea is 

in line with Oreskes et al. [26] that verification and 

validation of environmental models is impossible, because 

natural systems are open. Instead, environmental

models may be non-unique and subject only to conditional

confirmation, due to e.g. errors in model structure, calibration 

of parameters and period of data used for 

evaluation. Due to this there will always be the possibility 

of equifinality in that many different model 

structures and parameter sets may give simulations that 

cannot be falsified from the available observational 

data. Beven therefore argues that the range of behavioural 

models (structures and parameter sets) is best 

represented in terms of mapping of the ‘landscape space’

into the ‘model space’, and that uncertainty predictions

should consider all the behavioural models. 
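
The idea of retaining every behavioural model, rather than one optimal model, can be illustrated with a minimal sketch. It is not Beven's procedure: the toy recession model, the sampled parameter ranges, the rejection threshold and all function names are assumptions made purely for illustration.

```python
import math
import random

def simulate(k, s0, times):
    """Toy model (assumed): storage recession s0 * exp(-k * t)."""
    return [s0 * math.exp(-k * t) for t in times]

def rmse(sim, obs):
    """Root mean square error between simulated and observed series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))

def behavioural_sets(obs, times, n_samples=5000, threshold=0.5, seed=1):
    """Sample parameter sets at random and keep every set that cannot be
    rejected, i.e. whose error does not exceed the (assumed) threshold."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_samples):
        k = rng.uniform(0.01, 1.0)
        s0 = rng.uniform(5.0, 15.0)
        if rmse(simulate(k, s0, times), obs) <= threshold:
            kept.append((k, s0))
    return kept

def prediction_range(param_sets, t):
    """Lowest and highest prediction at time t over all behavioural sets."""
    values = [simulate(k, s0, [t])[0] for k, s0 in param_sets]
    return min(values), max(values)

# Synthetic 'observations': a known model plus a fixed perturbation pattern.
times = [0, 1, 2, 3, 4]
noise = [0.2, -0.1, 0.15, -0.2, 0.1]
obs = [v + e for v, e in zip(simulate(0.3, 10.0, times), noise)]

kept = behavioural_sets(obs, times)
low, high = prediction_range(kept, t=8)
print(len(kept), "behavioural parameter sets")
print("prediction at t=8 lies between", round(low, 3), "and", round(high, 3))
```

Many different parameter sets survive the test against the data, so the prediction is reported as a range over all of them instead of as a single value.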

3. Proposed terminology and methodological framework 

The following terminology is inspired by the generalised 

terminology for model credibility proposed by 

Schlesinger et al. [33], but modified and extended to 

accommodate some of the scientific philosophical issues 

raised above. The simulation environment is divided 

into four basic elements as shown in Fig. 1. The inner 

arrows describe the processes that relate the elements to 

each other, and the outer circle refers to the procedures 

that evaluate the credibility of these processes. 

In general terms a model is understood as a simplified 

representation of the natural system it attempts to describe. 

However, in the terminology proposed below a 

distinction is made between three different meanings of 

the general term model, namely the conceptual model, 

the model code and the model that here is defined as a 

site-specific model. The most important elements in the 

terminology and their interrelationships are defined as 

follows: 

Reality: The natural system, understood here as the 

study area. 

Conceptual model: A description of reality in terms of 

verbal descriptions, equations, governing relationships 

or ‘natural laws’ that purport to describe reality. This is

the user’s perception of the key hydrological and ecological 

processes in the study area (perceptual model) 

and the corresponding simplifications and numerical 

accuracy limits that are assumed acceptable in order to 

achieve the purpose of the modelling. A conceptual 

model thus includes both a mathematical description 

(equations) and a description of flow processes, river

system elements, ecological structures, geological features, 

etc. that are required for the particular purpose of 

modelling. By drawing an analogy to the scientific

philosophical discussion above, the conceptual model in

other words constitutes the scientific hypothesis or theory

that we assume for our particular modelling study.

Fig. 1. Elements of a modelling terminology. Modified after Schlesinger et al. [33].

Model code: A mathematical formulation in the form 

of a computer program that is so generic that it, without 

program changes, can be used to establish a model with 

the same basic type of equations (but allowing different 

input variables and parameter values) for different study 

areas. 

Model: A site-specific model established for a particular 

study area, including input data and parameter 

values. 

Model confirmation: Determination of adequacy of 

the conceptual model to provide an acceptable level of 

agreement for the domain of intended application. This 

is in other words the scientific confirmation of the theories/hypotheses 

included in the conceptual model. 

Code verification: Substantiation that a model code is 

in some sense a true representation of a conceptual 

model within certain specified limits or ranges of application 

and corresponding ranges of accuracy. 

Model calibration: The procedure of adjustment of 

parameter values of a model to reproduce the response 

of reality within the range of accuracy specified in the 

performance criteria. 

Model validation: Substantiation that a model within 

its domain of applicability possesses a satisfactory range 

of accuracy consistent with the intended application of 

the model. 

Model set-up: Establishment of a site-specific model 

using a model code. This requires, among other things, 

the definition of boundary and initial conditions and 

parameter assessment from field and laboratory data. 

Simulation: Use of a validated model to gain insight 

into reality and obtain predictions that can be used by 

water managers. This includes insight into how reality 

can be expected to respond to human interventions. In 

this connection uncertainty assessments of the model 

predictions are very important. 

Performance criteria: Level of acceptable agreement 

between model and reality. The performance criteria 

apply both for model calibration and model validation. 

Domain of applicability (of conceptual model): Prescribed 

conditions for which the conceptual model has 

been tested, i.e. compared with reality to the extent 

possible and judged suitable for use (by model confirmation). 

Domain of applicability (of model code): Prescribed 

conditions for which the model code has been tested, i.e. 

compared with analytical solutions, other model codes 

or similar to the extent possible and judged suitable for 

use (by code verification). 

Domain of applicability (of model): Prescribed conditions 

for which the site-specific model has been tested, 

i.e. compared with reality to the extent possible and 

judged suitable for use (by model validation). 

The credibility of the descriptions or the agreements 

between reality, conceptual model, model code and 

model are evaluated through the terms confirmation, 

verification, calibration and validation. Thus, the relation 

between reality and the scientific description of reality 

which is constituted by the conceptual model with its 

theories and equations on flow and transport processes, 

its interpretation of the geological system and ecosystem 

at hand, etc., is evaluated through the confirmation of 

the conceptual model. As a logical consequence of our


position on scientific methodology, we use the term 

confirmation in connection with conceptual model. This 

implies that we agree that it is never possible to prove 

the truth of a theory/hypothesis and as such of a conceptual 

model. And even if a site-specific model is 

eventually accepted as valid for specific conditions, this 

is not a proof that the conceptual model is true, because, 

due to non-uniqueness, the site-specific model may turn 

out to perform right for the wrong reasons. 

Methods for conceptual model confirmation should 

follow the standard procedures for confirmation of scientific 

theories. This implies that conceptual models 

should be confronted with actual field data and be 

subject to critical peer reviews. Furthermore, the feedback 

from the calibration and validation process may 

also serve as a means by which one or a number of 

alternative conceptual model(s) may be either confirmed 

or falsified. 

The ability of a given model code to adequately describe 

the theory and equations defined in the conceptual 

model by use of numerical algorithms is evaluated 

through the verification of the model code. Use of the 

term verification in this respect is in accordance with 

Oreskes et al. [26], because mathematical equations are 

closed systems. The methodologies used for code verification 

include comparing a numerical solution with an 

analytical solution or with a numerical solution from 

other verified codes. However, some programme errors 

only appear under circumstances that do not routinely 

occur, and may not have been anticipated. Furthermore, 

for complex codes it is virtually impossible to verify that 

the code is universally accurate and error-free. Therefore, 

the term code verification must be qualified in 

terms of specified ranges of application and corresponding 

ranges of accuracy. A code may be applied 

outside its documented ranges of application, but in 

such cases it must not carry the label ‘verified’ and

caution should be expressed with respect to its results. 
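
A minimal sketch of such a verification test is given below, assuming a toy 'model code' that solves dS/dt = -kS with an explicit Euler scheme and comparing it with the analytical solution. The equation, the parameter values, the 2% accuracy limit and the function names are all illustrative assumptions; the point is only that the outcome is a range of time steps for which the code may be called verified.

```python
import math

def euler_storage(s0, k, dt, t_end):
    """Explicit Euler solution of dS/dt = -k * S (the code under test)."""
    steps = round(t_end / dt)
    s = s0
    for _ in range(steps):
        s += dt * (-k * s)
    return s

def analytical_storage(s0, k, t):
    """Closed-form solution S(t) = s0 * exp(-k * t), used as the reference."""
    return s0 * math.exp(-k * t)

def verify(dts, s0=100.0, k=0.5, t_end=10.0, tolerance=0.02):
    """Return {dt: passed}, where 'passed' means that the relative error at
    t_end is within the stated range of accuracy (tolerance)."""
    exact = analytical_storage(s0, k, t_end)
    report = {}
    for dt in dts:
        rel_err = abs(euler_storage(s0, k, dt, t_end) - exact) / exact
        report[dt] = rel_err <= tolerance
    return report

report = verify([0.001, 0.01, 0.1, 1.0])
for dt, passed in report.items():
    status = "within" if passed else "outside"
    print(f"dt={dt}: {status} the verified range of application")
```

In this sketch the small time steps pass and the large ones fail, so the label 'verified' applies only to the former.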

The application of a model code to be used for setting 

up a site-specific model is usually associated with model 

calibration. The model performance during calibration 

depends on the quantity and quality of the available 

input and observation data as well as on the conceptual 

model. If sufficient accuracy cannot be achieved, the

conceptual model and/or the data have to be re-evaluated.

A discussion of the problems and methodologies 

in model calibration is provided by Gupta et al. 

[14]. 

Often the model performance during calibration is 

used as a measure of the predictive capability of a 

model. This is a fundamental error. Many studies (e.g. 

Refsgaard and Knudsen [31]; Liden [22]) have demonstrated 

that the model performance against independent 

data not used for calibration is generally poorer than the 

performance achieved in the calibration situation. 

Therefore, the credibility of a site-specific model’s 

capability to make predictions about reality must be 

evaluated against independent data. This process is denoted 

model validation. In designing suitable model 

validation tests a guiding principle should be that a 

model should be tested to show how well it can perform 

the kind of task for which it is specifically intended [19]. 

This implies for instance that for the case where a model 

is intended to be used for conditions similar to conditions 

where test data exist, such as extension of 

streamflow records, a standard split-sample test may be 

applied. However, models are often intended to be used 

as management tools to help answer questions such as: 

What happens to the water resources if land use is 

changed? In such cases no site-specific test data exist and

the question of defining a validation test scheme becomes 

non-trivial. 
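
The standard split-sample test mentioned above can be sketched as follows. The one-parameter runoff model, the synthetic data, the use of the Nash-Sutcliffe efficiency as the performance measure and the criterion of 0.9 are assumptions chosen for illustration, not part of the proposed framework.

```python
def simulate(c, rain):
    """Toy model (assumed): runoff is a fixed fraction c of rainfall."""
    return [c * p for p in rain]

def calibrate(rain, runoff):
    """Least-squares estimate of c for the model q = c * p."""
    return sum(p * q for p, q in zip(rain, runoff)) / sum(p * p for p in rain)

def nash_sutcliffe(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than
    using the mean of the observations."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def split_sample_test(rain, runoff, criterion):
    """Calibrate on the first half of the record and evaluate on the
    second half, which has not been used for calibration."""
    half = len(rain) // 2
    c = calibrate(rain[:half], runoff[:half])
    cal = nash_sutcliffe(simulate(c, rain[:half]), runoff[:half])
    val = nash_sutcliffe(simulate(c, rain[half:]), runoff[half:])
    return {"c": c, "calibration": cal, "validation": val,
            "passed": val >= criterion}

# Synthetic record (illustrative values only).
rain = [10, 0, 25, 5, 40, 15, 0, 30, 20, 5, 35, 10]
runoff = [3.2, 0.1, 7.9, 1.3, 12.5, 4.4, 0.2, 8.1, 6.9, 1.0, 11.6, 2.5]

result = split_sample_test(rain, runoff, criterion=0.9)
print(result)
```

With these data the efficiency in the validation period is lower than in the calibration period, which is the pattern described above.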

4. Discussion 

4.1. Scientific philosophical aspects 

The fundamental view expressed by scientific philosophers 

is that verification and validation of numerical 

models of natural systems is impossible, because natural 

systems are never closed and because the mapping of 

model results is always non-unique [26]. Thus, seen

from a theoretical point of view it is tempting to conclude that

the establishment of modelling guidelines comprising 

these terms simply is not possible. 

On the other hand, there is a large and increasing 

need to establish guidelines to improve the quality of 

modelling, and such guidelines need to address the issues 

of verification and validation in order to be operational 

in practice. Irrespective of what the scientific community

decides regarding terminology and validation methodology, 

including the associated philosophical aspects, 

models are being used more and more to support water 

resources management in practice. As long as the present

situation continues, characterised by a large degree 

of confusion on terminology and methodology, the potential 

benefits of using models are severely constrained. 

They are often subject to either ‘overselling’ or ‘mistrust’,

and misunderstandings between model users and 

water resources managers may easily occur in the absence 

of a commonly accepted and understood ‘language’.

Thus, establishment of a terminology and 

methodology that bridge the gap between scientific 

philosophy and pragmatic modelling is a key challenge 

and an important one. 

This gap between a scientific philosophical and a 

pragmatic modelling position is also clearly reflected in 

the dialogue between Konikow and Bredehoeft [20] and 

De Marsily et al. [11]. Following the Popperian school, 

Konikow and Bredehoeft [20] express the view that “the

terms validation and verification have little or no place

in ground-water science; these terms lead to a false

impression of model capability”. De Marsily et al. [11],

in a response, argue for a more pragmatic view: “...

using the model in a predictive mode and comparing it

with new data is not a futile exercise; it makes a lot of

sense to us. It does not prove that the model will be

correct for all circumstances, it only increases our confidence

in its value. We do not want certainty; we will be

satisfied with engineering confidence.”

With regard to scientific methodology we fundamentally 

agree with the views of Popper [29] and the 

earth-directed theoretical method described by Baker 

[5]. Consequently, we agree with the view of Oreskes 

et al. [26], Konikow and Bredehoeft [20] and many 

others that it is not possible to carry out model verification 

or model validation, if these terms are used 

without restriction to domains of applicability and levels 

of accuracy. 

The restrictions in use of the terms confirmation, 

verification and validation imposed by the respective 

domains of applicability imply, according to Popper’s 

views, that the conceptual model, model code and 

site-specific models can only be classified as numerical 

universal statements as opposed to strictly universal 

statements. This distinction is fundamental for our 

proposed methodology and its link to scientific philosophical 

theories. 

4.2. Model confirmation, verification and validation 

An important aspect of our proposed methodology 

lies in the separation between the three different ‘versions’

of the word model, namely the conceptual model, 

the model code and the site-specific model. This separation 

is in line with Matalas et al. [24] and Rykiel [32], 

who distinguish between the theory (conceptual model) 

and the engineering model (the site-specific model). 

Similarly, Schlesinger et al. [33] distinguish between 

conceptual model and computerised model. Schlesinger 

et al. [33], Matalas et al. [24] and Rykiel [32] do not 

separate the model code from the site-specific model. 

Due to this distinction it is possible, at a general level, 

to talk about confirmation of a theory or a hypothesis 

about how nature can be described using the relevant 

scientific method for that purpose, and, at a site-specific 

level, to talk about validity of a given model within 

certain domains of applicability and associated with 

specified accuracy limits. 

As Beven [9] argues we need to distinguish between 

our qualitative understanding (perceptual model) and 

the practical implementation of that understanding in 

our conceptual model. As we have defined a conceptual 

model as combination of a perceptual model and the 

simplifications acceptable for a particular model study a 

conceptual model becomes site-specific and even case 

specific. For example a conceptual model of a groundwater 

aquifer may be described as two-dimensional for a 

study focussing on regional groundwater heads, while it 

may need to include more complex three-dimensional 

geological structures for detailed simulation of solute 

transport studies. 

Confirmation of a conceptual model is a non-trivial 

issue. It is hardly possible to prescribe general test 

procedures, in particular not exact tests. Conceptual 

models are more difficult in some domains than in 

others. For example, the process descriptions/equations 

and the actual system are relatively easily identifiable in

a hydrodynamic river flow system as compared to a 

groundwater system or an ecosystem, because the geology 

will never be completely known in a groundwater 

system and the biological processes may not be well 

known in an ecosystem. The more complex and difficult 

the conceptual model becomes, the more ‘soft’ the confirmation

tests may turn out to be. Thus, expert 

knowledge in terms of peer reviews may be an important 

element of such tests. 

In cases where considerable uncertainty exists in the 

conceptual model, the possibility of testing alternative 

conceptual models should be promoted. An example of 

this is given by Troldborg [35], who reports a study 

where three scientists developed alternative geological 

interpretations for the same area, and three numerical 

groundwater models were set up and calibrated on this

basis. During this process, or in the subsequent validation 

phase, one or more of these models may turn out to 

perform so poorly that the underlying conceptual model 

has to be rejected. This approach of building the 

uncertainty of our knowledge of reality into alternative 

conceptual models, which are subsequently subject to a 

confirmation test, is fully in line with Popper’s scientific 

philosophical school. Unfortunately, this is very seldom 

pursued in practice.

Code verification is not an activity that is carried out 

from scratch in every modelling study. In a particular 

study it has to be ascertained that the domain of 

applicability for which the selected model code has been 

verified covers the conditions specified in the actual 

conceptual model. If that is not the case, additional 

verification tests have to be conducted. Otherwise, the 

code explicitly must be classified as not verified for this 

particular study, and the subsequent simulation results 

therefore have to be considered with extra caution. 

Establishment of validation test schemes for the situations, 

where the split-sample test is not sufficient, is an 

area, where limited work has been carried out so far. 

The only rigorous and comprehensive methodology reported 

in literature is that of Klemes [19]. He proposed a 

systematic scheme of validation tests, where a distinction 

is made between simulations conducted for the 

same catchment as was used for calibration (split-sample 

test) and simulations conducted for ungauged catchments 

(proxy-basin tests). He also distinguished between


cases where catchment conditions such as climate, land 

use and ground water abstraction are stationary (split-sample

test) and cases where they are not (differential 

split-sample test). A further discussion, including examples, 

of Klemes’s test scheme is given in Refsgaard 

[30]. The two key principles are: (a) the validation tests 

must be carried out against independent data, i.e. data 

that have not been used during calibration, and (b) the 

model should be tested to show how well it can perform

the kind of task for which it is specifically intended to be 

applied subsequently. This implies e.g. that multi-site 

validation is needed if predictions of spatial patterns are 

required, and multi-variable checks are required if predictions 

of the behaviour of individual subsystems 

within a catchment are needed. Thus, a model should only

be assumed valid with respect to outputs that have been 

explicitly validated. This means for instance that a 

model which is validated against catchment runoff cannot 

automatically be assumed valid also for simulation 

of erosion on a hillslope within the catchment, because 

smaller scale processes may dominate here; it will need 

validation against hillslope soil erosion data. 
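
The two distinctions in Klemes's scheme can be sketched as a simple decision function. The function and argument names are hypothetical, and the name used for the fourth, combined case (an ungauged catchment under non-stationary conditions) is taken from Klemes's hierarchical scheme as we understand it; it is not stated in the text above.

```python
def klemes_test(same_catchment: bool, stationary: bool) -> str:
    """Select a validation test type from two yes/no questions:
    - same_catchment: are test data available for the catchment itself?
    - stationary: are climate, land use and abstraction unchanged?"""
    if same_catchment and stationary:
        return "split-sample test"
    if same_catchment:
        return "differential split-sample test"
    if stationary:
        return "proxy-basin test"
    # Combined case: name assumed from Klemes's scheme, not from the text.
    return "proxy-basin differential split-sample test"

# Example: extension of a streamflow record under unchanged conditions.
print(klemes_test(same_catchment=True, stationary=True))
# Example: effect of a land use change in a gauged catchment.
print(klemes_test(same_catchment=True, stationary=False))
```

Whichever test is selected, the data used in it must be independent of the data used for calibration.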

From a theoretical point of view the procedures 

outlined by Klemes [19] for the proxy-basin and the 

differential split-sample tests, where tests have to be 

carried out using data from similar catchments, are 

weaker than the usual split-sample test, where data from 

the specific catchment are available. However, no 

obviously better testing schemes exist. Therefore, this 

will have to be reflected in the performance criteria in 

terms of larger expected uncertainties in the predictions. 

It must be realised that the validation test schemes 

proposed above are so demanding that many applications 

today would fail to meet them. Thus, for many 

cases where either proxy-basin or differential split-sample

tests are required, suitable test data simply do 

not exist. This is for example the case for prediction of 

regional scale transport of potential contamination from 

underground radionuclide deposits over the next thousands 

of years. In such case model validation is not 

possible. This does not imply that these modelling 

studies are not useful, only that their output should be 

recognised to be somewhat more uncertain than is often 

stated and that the term ‘validated model’ should not

be used. Thus, a model’s validity will always be confined 

in terms of space, time, boundary conditions, types of 

application, etc. 

According to the methodology, model validation 

implies substantiating that a site-specific model can 

produce simulation results within the range of accuracy 

specified in the performance criteria for the particular 

study. Hence, before carrying out the model calibration 

and the subsequent validation tests quantitative performance 

criteria must be established. In determining 

the acceptable level of accuracy a trade-off will, either 

explicitly or implicitly, have to be made between costs, 

in terms of data collection and modelling work, and 

associated benefits that can be obtained due to more 

accurate model results. Consequently, the acceptable 

level of accuracy will vary from case to case and must be 

seen in a socio-economic context. It should therefore 

usually not be defined by the modeller, but in a dialogue 

between the modeller and the manager. 

4.3. Need for interaction between manager, code developer 

and modeller 

As discussed above, the validation methodologies 

presently used, even in research projects, are generally 

not rigorous and far from satisfactory. At the same time 

models are being used in practice, and claims are made daily

about the validity of models on the basis of test schemes that are,

at best, not very strict or rigorous. An

important question then, is how can the situation be 

improved in the future? As emphasised by Forkel [12],

improvements cannot be achieved by the research

community alone, but require an interaction between

the three main ‘players’, namely water resources managers,

code developers and model users (modellers). 

The key responsibilities of the water resources manager 

are to specify the objectives and define the acceptance

limits of accuracy (performance criteria) for the

model application. Furthermore, it is the manager’s 

responsibility to define requirements for code verification 

and model validation. In many consultancy jobs 

accuracy criteria and validation requirements are not 

specified at all, with the result being that the model user 

implicitly defines them in accordance with the achieved 

model results. In this respect it is important in the terms

of reference for a given model application to ensure

consistency between the objectives, the specified accuracy 

criteria, the data availability and the financial 

resources. In order for the manager to make such evaluations, 

some knowledge on the modelling process is 

required. 

The model user has the responsibility for selection of 

a suitable code as well as for construction, calibration 

and validation of the site-specific model. In particular, 

the model user is responsible for preparing validation 

documents in such a way that the domain of applicability 

and the range of accuracy of the model are 

explicitly specified. Furthermore, the documentation of 

the modelling process should ideally be done in enough 

detail that it can be repeated several years later, if required. 

The model user has to interact with the water 

resources manager on assessments of realistic model 

accuracies. Furthermore, the model user must be aware 

of the capabilities and limitations of the selected code 

and interact with the code developer with regard to 

reporting of user experience such as shortcomings in 

documentation, errors in code, market demands for 

extensions, etc.


The key responsibilities of the developer of the model 

code are to develop and verify a model code. In this 

connection it is important that the capabilities and 

limitations of the code appear in the documentation. As 

code development is a continuous process, code maintenance 

and regular updating with new versions improved 

as a response to user reactions become important. Although 

a model code should be comprehensively documented, 

doubts about its functioning will in practice always arise from

time to time, even for experienced users.

Hence, active support to and dialogue with model users 

are crucial for ensuring operational model applications 

at a high professional level. 

4.4. Performance criteria: when is a model good enough?

A critical issue in relation to the methodological 

framework is how to define the performance criteria. We 

agree with Beven [9] that any conceptual model is 

known to be wrong and hence any model will be falsified 

if we investigate it in sufficient detail and specify very 

high performance criteria. 

Clearly, if one attempts to establish a model that 

should simulate the truth it would always be falsified. 

However, this is not very useful information. Therefore,

we are using the conditional validation, or the 

validation restricted to domain of applicability (or 

numerical universal as opposed to strictly universal in 

Popperian terms). The good question is then: what is

good enough? Or in other words, what are the criteria?

How do we select them?

A good reference for model performance is to compare 

it with uncertainties of the available field observations. 

If the model performance is within this uncertainty 

range we often characterise the model as good enough. 

However, usually it is not so simple. How wide confidence

bands do we accept on observational uncertainties: ranges

corresponding to 65%, 95% or 99%? Do

we always then reject a model if it cannot perform within

the observational uncertainty range? In many cases even

results from less accurate models may be very useful. 

Therefore, our answer is that the decision on what is 

good enough generally must be taken in a socio-economic 

context. For instance, the accuracy requirements 

to a model to be used for an initial screening of alternative 

options for location of a new small well field for a 

small water supply will be much smaller than the 

requirements to a model that is intended to be used for 

the final design of a large well field for a major water 

supply in an area with potentially damaging effects on

precious nature and other significant conflicts of interests. 

Thus, we believe that the accuracy criteria cannot 

be decided universally by modellers or researchers, but 

must be different from case to case depending on how 

much is at stake in the decision to depend on the support 

from model predictions. This implies that the performance 

criteria must be discussed and agreed between the 

manager and the modeller beforehand. However, as the 

modelling process and the underlying study progress

with improved knowledge on the data and model 

uncertainties as well as on the risk perception of the 

concerned stakeholders it may well be required to adjust 

the performance criteria in a sort of adaptive project 

management context [27]. 
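
The comparison of model performance with observational uncertainty discussed above can be sketched as below. The observation values, the standard deviation of the observation error, the assumption of normally distributed errors (so that a band of obs ± z·σ corresponds to a confidence level) and the required share of 0.8 are all illustrative; in the framework the required share would be agreed between manager and modeller.

```python
def fraction_within_band(sim, obs, obs_std, z):
    """Share of simulated values lying inside obs +/- z * obs_std."""
    inside = sum(1 for s, o in zip(sim, obs) if abs(s - o) <= z * obs_std)
    return inside / len(obs)

def good_enough(sim, obs, obs_std, z, required_share):
    """Accept the model if at least required_share of the simulated values
    fall within the observational uncertainty band."""
    return fraction_within_band(sim, obs, obs_std, z) >= required_share

# Illustrative groundwater heads in metres; obs_std is an assumed value.
obs = [2.0, 3.5, 5.0, 4.2, 3.1]
sim = [2.1, 3.9, 4.4, 4.3, 3.0]
obs_std = 0.2

# z = 1.96 and z = 2.576 bound about 95% and 99% of a normal distribution.
for z in (1.96, 2.576):
    share = fraction_within_band(sim, obs, obs_std, z)
    verdict = good_enough(sim, obs, obs_std, z, required_share=0.8)
    print(f"z={z}: share inside band={share}, accepted={verdict}")
```

The same simulation is rejected with the narrower band and accepted with the wider one, which shows why the width of the band and the required share cannot be fixed by the modeller alone.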

4.5. The role of uncertainty assessments 

Should we then trust a model if it happens to pass a

validation test? Are we sure that this model is the best

one and that the underlying conceptual basis and input

data are basically correct?

Yes, on the one hand, in such a case we may trust a

model as a suitable tool to make predictions through 

model simulations. But on the other hand, we can never 

be sure that a model that passes a validation test will 

have a sound conceptual basis. It could be right for the 

wrong reasons, e.g. by compensating errors in the conceptual

model (model structure) with errors in parameter values. 

And we know that it would be possible to find many 

other models that can pass the validation test, and that it 

would not be possible beforehand to identify one of these 

models as the best one in all respects. Having realised this 

equifinality problem the relevant question is what we 

should do to address it in practical cases. In this respect 

our framework prescribes that model predictions (see 

definition of ‘simulation’ in Section 3) made subsequent

to passing a validation test should include uncertainty 

assessments. Hence, we basically agree with Beven [9] 

that uncertainty assessments are necessary, and that such 

uncertainty analyses should include uncertainty on 

model structure, parameter values etc. Different methodologies 

exist for conducting uncertainty assessments, 

e.g. Beven [8] and Van Asselt and Rotmans [36]. 

5. Guiding principles and future perspectives for modelling 

guidelines 

5.1. Guiding principles 

In our opinion the two key factors causing the poor 

quality of the modelling work in practice are: (a) too

poor quality of the modelling work done by practitioners 

(inadequate use of guidelines and quality assurance 

procedures and inadequate role play between manager 

(client) and modeller (consultant)) and (b) lack of data 

and methodology in the hydrological science. Modelling 

guidelines like [25,37] almost exclusively address the 

former issue while scientific literature like [7,9] focuses on

the latter issue. In our opinion it is crucial that the two 

lines of action are combined. This implies that we need 

to define modelling guidelines that are both operational


in practice and scientifically founded. The framework we

have described here attempts to establish one such

bridge between the two fields, i.e. pragmatic modelling

and natural science. An important aspect of this

framework is to enable, in a scientifically consistent way,

the manager and the modeller to make the compromises

that are required in practice.

On this background the following five key principles 

for pragmatic modelling have emerged: 

• A terminology that is internally consistent. We 

acknowledge that many authors in the scientific literature 

use different terminology and that, in particular, 

some authors do not use the terms verification 

and validation. However, these terms are also widely 

used, and we need in practice to have understandable

terms for these operations. Thus, with the clear distinction 

between conceptual model, model code and 

site-specific model and the restrictions to domains 

of applicability (numerical universal in Popperian 

sense) we believe that our terminology is in accordance 

with the main stream of scientific philosophy. 

• We never talk about universal code verification or universal 

model validation, but always restrict these 

terms to clearly defined domains of applicability. This 

is a necessary assumption for the consistency of the 

terminology and methodology and must be emphasised 

explicitly in any guidelines. 

• Validation tests against independent data that have 

not also been used for calibration are necessary in order 

to be able to document the predictive capability 

of a model. 

• Model predictions achieved through simulation 

should be associated with uncertainty assessments 

where amongst others the uncertainty in model structure 

and parameter values should be accounted for. 

• A continuous interaction between manager and modeller 

is crucial for the success of the modelling process. 

One of the key aspects in this regard is to establish suitable 

performance criteria for the model calibration 

and validation tests. This dialogue is also very important 

in connection with uncertainty assessments. 

5.2. Future challenges 

Some of the issues dealt with in the present manuscript 

are still not fully explored. The four most 

important future challenges are: 

• Establishment of accuracy criteria for a modelling 

study is a very important issue and one where we 

perhaps differ from most of the scientific literature. Modellers

often establish numerical accuracy criteria in order to 

classify the goodness of a given model [2,17,28]. 

These attempts are very useful in making the performance 

more transparent and quantitative, but do not 

provide an objective means to decide what the optimal 

accuracy criteria really should be in a given case. 

According to our framework no universal accuracy 

criteria can be established, i.e. it is generally not possible 

from a natural scientific point of view to tell 

when a model performance is good enough. Such 

acceptance criteria will vary from case to case 

depending on the socio-economic context, i.e. what 

is at stake in the decisions to be supported by the 

model predictions. The key question now is: how 

do we translate the ‘soft’ socio-economic objectives 

to ‘hard-core’ model performance criteria? This is 

obviously a challenge that cannot be solved by natural 

science alone, but needs to be addressed in a much 

broader context including aspects of economy, stakeholder 

interests and risk perception. Until we become 

better at meeting this challenge we will, however, 

not be able to arrive at the optimal balance between 

the costs of modelling and the derived societal benefits. 

Although this work has hardly begun yet, and 

we know that it is a very difficult road, we see no real 

alternative. 

• Although all experience shows that models generally 

perform more poorly in validation tests against independent 

data than they do in calibration tests, model validation 

is in our opinion a much neglected issue, both 

in many modelling guidelines and in the scientific 

literature. Perhaps many scientists have avoided 

the term validation because of the related controversies 

in the philosophy of science, but in any case 

many scientists are not advocating the need for model 

validation. One of the unfortunate consequences of 

this ‘lack of interest’ is that not much work has 

been devoted to developing suitable validation test 

schemes since Klemes [19]. In our opinion further 

development of suitable testing schemes and imposing 

them on all modelling projects is a major future 

challenge. 

• A third issue that requires considerable attention is 

how do we decide among alternative model structures 

and parameter sets (the equifinality problem). If we 

use multiple criteria one model may be better on 

one criterion and another model on another criterion. In our 

opinion we need not necessarily choose. We know that 

all conceptual models are wrong and we know that 

wrong conceptual models are compensated by biased 

model parameter values through calibration. But, unless 

we can falsify a conceptual model directly, which 

is very difficult, or unless the resulting model is falsified 

through the validation test, this model is a possible 

candidate for predictions. And if several models 

pass the validation tests we may not be able to tell 

which one is the best. In such a case they should all 

be considered suitable, and the fact that they provide 

different predictive results should be used as part of 

the uncertainty assessments. Work on this relatively


new paradigm has just begun [9] and a lot of work is 

still required to further develop and operationalise it. 

• Finally, there are many more challenges related to 

uncertainty in water resources management. Quality 

assurance and uncertainty assessments are two 

aspects that are very closely linked. Initially, the manager 

has to define accuracy criteria from a perception 

of which uncertainty level he believes is suitable in a 

particular case (see above). Subsequently, as the modelling 

study proceeds, the dialogue between modeller 

and manager has to continue with the necessary 

trade-off between modelling accuracy and cost of 

modelling study. In the uncertainty assessments it is 

very important to go beyond the traditional statistical 

uncertainty analysis. Thus, e.g. aspects of scenario 

uncertainty and ignorance should generally be included 

and in addition the uncertainties originating 

from data and models often need to be integrated 

with socio-economic aspects in order to form a suitable 

basis for the further decision process [36]. Thus, 

as with the accuracy criteria (above), the use of 

uncertainty assessments in water resources management 

goes beyond natural science. 
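The idea that all models passing the validation test remain candidates, and that their disagreement feeds the uncertainty assessment, can be sketched as follows. This is only an illustrative sketch: the model names, validation scores, predictions and the threshold of 0.7 are hypothetical, and the simple low/high range stands in for whatever uncertainty measure a real study would adopt.

```python
def prediction_range(predictions, validation_scores, threshold):
    """Keep every model whose validation score meets the threshold and
    report the spread of the predictions of the accepted models."""
    accepted = {
        name: predictions[name]
        for name, score in validation_scores.items()
        if score >= threshold
    }
    if not accepted:
        raise ValueError("no model passed the validation test")
    values = list(accepted.values())
    return {"models": sorted(accepted), "low": min(values), "high": max(values)}


# Hypothetical validation scores and predictions for three alternative model structures.
validation_scores = {"model_a": 0.82, "model_b": 0.78, "model_c": 0.41}
predictions = {"model_a": 12.0, "model_b": 15.5, "model_c": 30.0}

summary = prediction_range(predictions, validation_scores, threshold=0.7)
print(summary)
```

Here the third model is falsified by the validation test and dropped, while the two remaining models are both retained and the difference between their predictions is reported rather than hidden by picking a single "best" model.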


Acknowledgements 

The present work was carried out within the Project 

‘Harmonising Quality Assurance in model based catchments 

and river basin management (HarmoniQuA)’, 

which is partly funded by the EC Energy, Environment 

and Sustainable Development programme (Contract 

EVK2-CT2001-00097). The constructive comments and 

suggestions to the manuscript by the HarmoniQuA 

project team and by our colleague William (Bill) G. 

Harrar are acknowledged. Finally, the constructive 

criticisms by Keith Beven, University of Lancaster; 

Rodger Grayson, University of Melbourne and a third, 

anonymous referee helped to improve the manuscript 

significantly. 

References 

[1] Abbott MB. The theory of the hydrological model, or: the 

struggle for the soul of hydrology. In: O’Kane JP, editor. 

Advances in theoretical hydrology. Elsevier; 1992. p. 237–54. 

[2] Andersen J, Refsgaard JC, Jensen KH. Distributed hydrological 

modelling of the Senegal River Basin––model construction and 

validation. J Hydrol 2001;247:200–14. 

[3] Anderson MG, Bates PD, editors. Model validation: perspectives 

in hydrological science. John Wiley and Sons; 2001. 

[4] Anderson MP, Woessner WW. The role of postaudit in model 

validation. Adv Water Resour 1992;15:167–73. 

[5] Baker VR. Conversing with the Earth: the geological approach to 

understanding. In: Frodeman R, editor. Earth matters: the earth sciences, 

philosophy and the claims of community. Prentice Hall; 2000. 

[6] Beven K. Changing ideas in hydrology––the case of physically 

based models. J Hydrol 1989;105:157–72. 

[7] Beven K. Towards an alternative blueprint for a physically based 

digitally simulated hydrologic response modelling system. Hydrol 

Process 2002;16(2):189–206. 

[8] Beven K, Binley AM. The future of distributed models: model 

calibration and uncertainty prediction. Hydrol Process 1992;6: 

279–98. 

[9] Beven K. Towards a coherent philosophy for modelling the 

environment. Proc Roy Soc Lond A 2002;458(2026):2465–84. 

[10] Dee DP. A pragmatic approach to model validation. In: Lynch 

DR, Davies AM, editors. Quantitative skill assessment of coastal 

ocean models. Washington: AGU; 1995. p. 1–13. 

[11] De Marsily G, Combes P, Goblet P. Comments on ‘Ground-water 

models cannot be validated’, by Konikow LF, Bredehoeft, JD. 

Adv Water Resour 1992;15:367–9. 

[12] Forkel C. Das numerische Modell––ein schmaler Grat zwischen 

vertrauenswürdigem Werkzeug und gefährlichem Spielzeug. Presented 

at the 26. IWASA, RWTH Aachen, 4–5 January 1996. 

[13] Freeze RA, Harlan RL. Blueprint for a physically-based digitally-simulated 

hydrologic response model. J Hydrol 1969;9:237–58. 

[14] Gupta HV, Sorooshian S, Yapo PO. Toward improved calibration 

of hydrologic models: multiple and noncommensurable 

measures of information. Water Resour Res 1998;34(4):751– 

63. 

[15] Hansen JM. The line in the sand the wave on the water––Steno’s 

theory on the language of nature and the limits of the knowledge. 

Copenhagen: Fremad; 2000. 440 pp (in Danish). 

[16] Hassanizadeh SM, Carrera J. Editorial, special issue on validation 

of geo-hydrological models. Adv Water Resour 1992;15:1–3. 

[17] Henriksen HJ, Troldborg L, Nyegaard P, Sonnenborg TO, 

Refsgaard JC, Madsen B. Methodology for construction, calibration 

and validation of a national hydrological model for 

Denmark. J Hydrol 2003;280(1–4):52–71. 

[18] IAHR. Publication of guidelines for validation documents and 

call for discussion. Int Assoc Hydraul Res Bull 1994;11:41. 

[19] Klemes V. Operational testing of hydrological simulation models. 

Hydrol Sci J 1986;31:13–24. 

[20] Konikow LF, Bredehoeft JD. Ground-water models cannot be 

validated. Adv Water Resour 1992;15:75–83. 

[21] Kuhn TS. The structure of scientific revolutions. Chicago: 

University of Chicago Press; 1962. 

[22] Liden R. Conceptual runoff models for material transport 

estimations. PhD dissertation, Report No. 1028, Lund Institute 

of Technology, Lund University, Sweden, 2000. 

[23] Los H, Gerritsen H. Validation of water quality and ecological 

models. Presented at the 26th IAHR Conference, London, Delft 

Hydraulics, 11–15 September 1995, 8 pp. 

[24] Matalas NC, Landwehr JM, Wolman MG. Prediction in water 

management. In: Scientific basis of water resource management. 

Washington, DC: National Research Council, National Academy 

Press; 1982. p. 118–27. 

[25] Middlemis H. Murray–Darling Basin Commission. Groundwater 

flow modelling guideline. Aquaterra Consulting Pty Ltd, South 

Perth, Western Australia. Project no. 125, 2000. 

[26] Oreskes N, Shrader-Frechette K, Belitz K. Verification, validation 

and confirmation of numerical models in the earth sciences. 

Science 1994;264:641–6. 

[27] Pahl-Wostl C. Towards sustainability in the water sector––the 

importance of human actors and processes of social learning. 

Aquat Sci 2002;64:394–411. 

[28] Parkin G, O’Donnell GO, Ewen J, Bathurst JC, O’Connel PE, 

Lavabre J. Validation of catchment models for predicting land-use 

and climate change impacts. 2. Case study for a Mediterranean 

catchment. J Hydrol 1996;175:595–613. 

[29] Popper KR. The logic of scientific discovery. London: Hutchinson 

& Co; 1959. 

[30] Refsgaard JC. Towards a formal approach to calibration and 

validation of models using spatial data. In: Grayson R, Blöschl G,


editors. Spatial patterns in catchment hydrology: Observations 

and modelling. Cambridge University Press; 2001. p. 329–54. 

[31] Refsgaard JC, Knudsen J. Operational validation and intercomparison 

of different types of hydrological models. Water Resour 

Res 1996;32(7):2189–202. 

[32] Rykiel ER. Testing ecological models: The meaning of validation. 

Ecol Modell 1996;90:229–44. 

[33] Schlesinger S, Crosbie RE, Gagne RE, Innis GS, Lalwani CS, 

Loch J, et al. Terminology for model credibility. SCS Tech Comm 

Model Credibil Simul 1979;32(3):103–4. 

[34] Scholten H, Van Waveren RH, Groot S, Van Geer FC, Wösten 

JHM, Koeze RD, et al. Good modelling practice in water 

management. Paper presented on Hydroinformatics 2000, Cedar 

Rapids, IA, USA, 2000. 

[35] Troldborg L. Effects of geological complexity on groundwater age 

prediction. Poster Session 62C, AGU December 2000. EOS 

Transactions, 81(48), F435. 

[36] Van Asselt MBA, Rotmans J. Uncertainty in integrated assessment 

modelling––from positivism to pluralism. Climat Change 

2002;54(1–2):75–105. 

[37] Van Waveren RH, Groot S, Scholten H, Van Geer FC, Wösten 

JHM, Koeze RD, et al. Good modelling practice handbook. 

STOWA Report 99-05, Utrecht, RWS-RIZA, Lelystad, The 

Netherlands. Available from: http://waterland.net/riza/aquest/.

[13] 

Refsgaard JC, Henriksen HJ, Harrar WG, Scholten H, Kassahun A (2005) 

Quality assurance in model based water management – review of existing 

practice and outline of new approaches. 

Environmental Modelling & Software, 20, 1201-1215. 

Reprinted from Environmental Modelling & Software with permission from Elsevier

Environmental Modelling & Software 20 (2005) 1201–1215 

www.elsevier.com/locate/envsoft 

Quality assurance in model based water management – review of 

existing practice and outline of new approaches 

Jens Christian Refsgaard a,*, Hans Jørgen Henriksen a, William G. Harrar a, 

Huub Scholten b, Ayalew Kassahun b 

a Geological Survey of Denmark and Greenland (GEUS), Øster Voldgade 10, DK-1350 Copenhagen K, Denmark 

b Wageningen University (WU), Dreijenplein 2, 6703 HB, Wageningen, The Netherlands 

Received 11 December 2003; received in revised form 30 March 2004; accepted 30 July 2004 

Abstract 

Quality assurance (QA) is defined as protocols and guidelines to support the proper application of models. In the water 

management context we classify QA guidelines according to how much focus is put on the dialogue between the modeller and the 

water manager as: (Type 1) Internal technical guidelines developed and used internally by the modeller’s organisation; (Type 2) 

Public technical guidelines developed in a public consensus building process; and (Type 3) Public interactive guidelines developed as 

public guidelines to promote and regulate the interaction between the modeller and the water manager throughout the modelling 

process. State-of-the-art QA practices vary considerably between different modelling domains and countries. It is suggested that 

these differences can be explained by the scientific maturity of the underlying discipline and differences in modelling markets in terms 

of volume of jobs outsourced and level of competition. The structure and key aspects of new generic guidelines and a set of 

electronically based supporting tools that are under development within the HarmoniQuA project are presented. Model credibility 

can be enhanced by a proper modeller-manager dialogue, rigorous validation tests against independent data, uncertainty 

assessments, and peer reviews of a model at various stages throughout its development. 


Keywords: Modelling guidelines; Quality assurance; Water resources management; Uncertainty; Support tools 


1. Introduction 

Models describing water flows, water quality and 

ecology are being developed and applied in increasing 

number and variety. The trend in recent years has been 

to base water management decisions to a larger extent 

on modelling studies, and to use more sophisticated 

models. In Europe this trend is likely to be reinforced by 

the EU Water Framework Directive due to its demand 

for integrating groundwater, surface water, ecological 

* Corresponding author. Tel.: +45 38 142 776; fax: +45 38 142 050. 


and economic aspects of water management at the river 

basin scale and due to the explicit requirement to study 

impacts of alternative measures (human interventions) 

intended to improve the ecological status in the river 

basin. Insufficient attention is often given to documenting 

the predictive capability of models. Therefore, 

contradictions may emerge regarding the various claims 

of model applicability on the one hand and the lack of 

documentation of these claims on the other hand. 

Hence, the credibility of the model is often questioned, 

and sometimes with good reason. 

Another important trend is the demand to involve 

different stakeholders in the water resources management 

process, and therefore also indirectly in the 

modelling process (Pahl-Wostl, 2002). This stakeholder 


doi:10.1016/j.envsoft.2004.07.006


involvement does not imply active participation in 

the technical modelling itself, but rather appears as 

a demand to be able to understand and review the 

various assumptions and their implications for the 

modelling results. This trend is seen at the global scale 

in connection with the generally accepted principles 

behind integrated water resources management, where 

public participation is a key element (GWP-TAC, 2000). 

In Europe, this is reflected in the EU Water Framework 

Directive, where it is explicitly prescribed that stakeholders 

and the general public should be involved in the 

water resources management process. 

The need for improving the quality of the modelling 

process has been emphasised by the research community, 

e.g. Klemes (1986), NRC (1990), Anderson and 

Woessner (1992), Forkel (1996), and Rykiel (1996). The 

recommendations made in this respect primarily focus 

on scientific/technical guidance on how the modeller 

should carry out various steps during the modelling 

process in order to achieve the best and most reliable 

results. 

Anderson and Bates (2001) in a discussion of model 

credibility and scientific integrity state that “over the last 

decade we have begun to have an appreciation of the 

need to be much more rigorous in establishing 

procedures for defining model credibility”. They argue 

further that this demand has not evolved from the 

hydrological science itself due to immaturity and data 

limitations, but instead comes from policy makers and 

regulators who wish to have some kind of certification 

of model results. 

As emphasised by e.g. Forkel (1996) modelling 

studies involve several partners with different responsibilities. 

The ‘key players’ are code developers, model 

users and water managers. However, a lack of mutual 

understanding may develop due to the complexity of the 

modelling process and the different backgrounds of the 

‘key players’. For example, the strengths and limitations 

of modelling applications are often difficult, if not 

impossible, for the water managers to assess. Similarly, 

the transformation of objectives defined by the water 

manager to specific performance criteria can be very 

difficult for the model users to assess. It can be difficult 

to audit modelling projects due to the lack of proper 

documentation and transparency. Furthermore, it is 

often difficult to reconstruct and reproduce the modelling 

process and its results. 

In the water resources management community many 

different guidelines on good modelling practice have 

been developed. One of the most comprehensive 

examples of a modelling guideline, if not the most comprehensive, has been developed in 

The Netherlands (Van Waveren et al., 2000; Scholten 

and Groot, 2002) as a result of a process involving all 

the main players in the Dutch water management field. 

The background for this process was a perceived need 

for improving the quality in modelling by addressing 

malpractice issues such as careless handling of input 

data, insufficient calibration and validation, and model 

use outside its intended scope (Scholten et al., 2000). 

Similarly, modelling guidelines for the Murray-Darling 

Basin in Australia were developed due to the perception 

among end-users that model capabilities may have been 

‘over-sold’, and that there was a lack of consistency in 

approaches, communication and understanding among 

and between the modellers and the water managers, 

which often resulted in considerable uncertainty for 

decision making (Middlemis, 2000). 

As pointed out by Merrick et al. (2002) good 

modelling practice cannot be decomposed into a set of 

rigid rules that can be followed without communication 

between modellers and water managers. Furthermore, 

there is a risk that modellers will not embrace guidelines 

aiming to inject too much consistency in the review 

procedure. Experiences from Australia have shown that 

review reports are commonly interpreted by water 

managers (non-modellers) as quite negative. Non-modellers 

may tend to focus mainly on the negative 

review comments rather than balance those against the 

positive comments. This may mostly be the case for 

projects where there has not been a proper specification 

of the purpose and conditions at the initiation of the 

model study or where previous reviews during earlier 

project stages have been inadequate. External reviews 

performed at the end of a project when things may have 

already gone wrong may often result in defensive 

responses both from the modellers and the water 

managers (Henriksen, 2002a). 

All the existing modelling guidelines that we are 

aware of exist as reports. Electronically based support is 

only available as text forms to record modelling 

activities. No electronically based tool that is coupled 

to a knowledge base defining how to carry out the 

modelling (electronic version of guidelines with comprehensive 

guidance to different types of users) exists at 

present. This is a paradox, considering the significant 

resources that are invested in improving modelling 

software packages with respect to new sophisticated 

information technology. 

Poor modelling results may be caused by the lack of 

adequate model codes, or data of insufficient quantity or 

quality. However, according to our experience the most 

prevalent reason for poor modelling results is the 

inadequate use of guidelines and quality assurance 

procedures, and improper interaction between the 

manager (client) and the modeller (consultant). Our 

work has been carried out within the context of an EU 

supported research project (http://www.harmoniqua.org) 

aimed at developing a common set of quality 

assurance guidelines and supporting software tools. The 

scientific philosophical basis for the adopted terminology 

and guiding principles is described by Refsgaard 

and Henriksen (2004). The objective of the present


paper is to establish new approaches and outline the 

requirements of supporting tools for quality assurance 

procedures in the modelling process. 

2. Theoretical framework 

2.1. Terminology and scientific basis 

The terminology and methodology used in the 

following are based on Refsgaard and Henriksen (2004). 

The key elements in the terminology are illustrated in 

Fig. 1 and the most important definitions are: 

• A model code is a generic software program, which can be used for different study areas without modifying the source code. 

• A model is a site application of a code to a particular study area, including input data and parameter values. 

• A model code can be verified. A code verification involves comparison of the numerical solution generated by the code with one or more analytical solutions or with other numerical solutions. Verification ensures that the computer programme accurately solves the equations that constitute the mathematical model. 

• Model validation is here defined as the process of demonstrating that a given site-specific model is capable of making accurate predictions for periods outside a calibration period. A model is said to be validated if its accuracy and predictive capability in the validation period have been proven to lie within acceptable limits or errors. 
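The notion of code verification defined above, comparing a numerical solution with an analytical one, can be illustrated with a deliberately small example. The sketch below is not taken from any model code discussed here: it assumes a linear reservoir, dS/dt = -kS, solves it with an explicit Euler scheme, and compares the result with the analytical solution S(t) = S0·exp(-kt). The parameter values are hypothetical.

```python
import math


def euler_linear_reservoir(s0, k, dt, n_steps):
    """Explicit Euler solution of dS/dt = -k*S; returns the storage at every step."""
    storage = [s0]
    for _ in range(n_steps):
        storage.append(storage[-1] * (1.0 - k * dt))
    return storage


def max_error(s0, k, dt, t_end):
    """Largest absolute deviation between the numerical and the analytical solution."""
    n_steps = round(t_end / dt)
    numerical = euler_linear_reservoir(s0, k, dt, n_steps)
    return max(
        abs(s - s0 * math.exp(-k * i * dt)) for i, s in enumerate(numerical)
    )


# Hypothetical test case: the error should shrink when the time step is refined.
coarse = max_error(s0=100.0, k=0.1, dt=1.0, t_end=20.0)
fine = max_error(s0=100.0, k=0.1, dt=0.1, t_end=20.0)
print(f"max error, dt=1.0: {coarse:.3f}; dt=0.1: {fine:.3f}")
```

A verification of this kind says something about the code and its numerical scheme only. It says nothing about whether a site-specific model built with the code predicts well, which is the separate question addressed by model validation.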

Fig. 1. Elements of a modelling terminology (Refsgaard and Henriksen, 2004). 

These terms are commonly used, although with differences in meaning between authors. Our views on these terms, on the ongoing discussion on validation-falsification-confirmation, and on the distinction between the terms perceptual model, conceptual model and site-specific model are given in Refsgaard and Henriksen (2004). 

Here we just note that, from a quality assurance 

guideline point of view, it is fundamental for us to 

make a clear distinction between the terms conceptual 

model, model code and (site-specific) model. Furthermore, 

we never use the terms verification and validation 

in a universal sense, but always restricted to clearly 

defined domains of applicability (numerical universal in 

Popperian sense). 

In addition to ensuring proper quality of work, the 

three most important underlying principles that have 

been identified from an analysis of the modelling process 

are (Refsgaard and Henriksen, 2004): 

• Validation tests against independent data that have not also been used for calibration are necessary in order to be able to document the predictive capability of a model. 

• Model predictions achieved through simulation should be associated with uncertainty assessments where amongst others the uncertainty in model structure and parameter values should be accounted for. 

• A continuous interaction between water manager and modeller is crucial for the success of the modelling process. One of the key aspects in this regard is to establish suitable performance criteria for the model calibration and validation tests. This dialogue is also very important in connection with uncertainty assessments. 
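One simple way to attach an uncertainty assessment to a prediction, as the second principle asks, is to propagate uncertain parameter values through the model. The sketch below is an illustration under stated assumptions and not a method prescribed by the text: it uses plain Monte Carlo sampling, a toy linear reservoir as the model, a uniform range for the recession parameter k, and a fixed random seed. It covers parameter uncertainty only, not uncertainty in model structure.

```python
import random


def simulate_outflow(k, s0=100.0, n_steps=10):
    """Toy linear reservoir: outflow k*S after n_steps of S <- S*(1 - k)."""
    storage = s0
    for _ in range(n_steps):
        storage *= 1.0 - k
    return k * storage


def prediction_interval(k_low, k_high, n_samples=1000, seed=1):
    """Sample k uniformly (an assumed prior) and return the 5% and 95%
    empirical quantiles of the simulated outflow."""
    rng = random.Random(seed)
    outflows = sorted(
        simulate_outflow(rng.uniform(k_low, k_high)) for _ in range(n_samples)
    )
    lower = outflows[n_samples * 5 // 100]
    upper = outflows[n_samples * 95 // 100 - 1]
    return lower, upper


# Hypothetical parameter range for the recession coefficient.
lower, upper = prediction_interval(0.05, 0.15)
print(f"90% prediction interval for outflow: {lower:.2f} to {upper:.2f}")
```

Reporting the interval rather than a single value gives the water manager a basis for judging whether the remaining uncertainty is acceptable for the decision at hand.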

2.2. Types of QA guidelines 

2.2.1. Definition and classification of quality assurance (QA) 

Quality assurance (QA) is defined by NRC (1990) as 

the procedural and operational framework used by an 

organisation managing the modelling study to assure 

technically and scientifically adequate execution of all 

tasks included in the study, and to assure that all 

modelling-based analysis is reproducible and defensible. 

In line with this we define QA guidelines as protocols 

and guidelines to support good application of models in 

water management. 

QA in the modelling process has two main components: 

(a) QA in development of model codes; and (b) 

QA in relation to application studies. Our paper focuses 

on the second component only. 

QA in model application studies includes data 

analyses, methodologies of good modelling practice, 

reviews and administrative procedures. Such QA guidelines 

can be classified according to how much focus is


put on the consensus building process between the 

modeller and the water manager in the following three 

classes: 

• Internal technical guidelines (Type 1) established and used internally by the modeller’s organisation. 

• Public technical guidelines (Type 2) established as public guidelines and used internally by the modeller’s organisation. 

• Public interactive guidelines (Type 3) established as public guidelines and based on regulation of the interaction between the modeller and the water manager throughout the modelling process. 
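The three classes can be read as a small decision rule based on two questions: was the guideline established publicly, and does it regulate the modeller-manager interaction? The sketch below encodes that reading. The names `GuidelineType` and `classify_guideline` are hypothetical, and treating an internal guideline that happens to address interaction as Type 1 is an assumption, since the classification above does not cover that case.

```python
from enum import Enum


class GuidelineType(Enum):
    INTERNAL_TECHNICAL = 1
    PUBLIC_TECHNICAL = 2
    PUBLIC_INTERACTIVE = 3


def classify_guideline(is_public, regulates_interaction):
    """Map two yes/no properties of a QA guideline to Type 1, 2 or 3."""
    if not is_public:
        # Assumption: any internally established guideline counts as Type 1.
        return GuidelineType.INTERNAL_TECHNICAL
    if regulates_interaction:
        return GuidelineType.PUBLIC_INTERACTIVE
    return GuidelineType.PUBLIC_TECHNICAL


# Example: a publicly developed guideline that also regulates the dialogue.
print(classify_guideline(is_public=True, regulates_interaction=True))
```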

2.2.2. Type 1: Internal technical guidelines 

Most organisations involved in modelling studies 

have some kind of internal QA procedures. They usually 

focus on the technical aspects, i.e. to ensure that the 

modelling work itself is done without making unqualified 

judgements or errors. The better of these are 

based on the modelling protocols and similar scientifically 

based procedures originating from the research 

community. These procedures are internal in nature 

because they have been established or adopted unilaterally 

by the modeller’s organisation, and because they 

seldom deal with the interaction between modeller and 

end-user. Examples of Type 1 guidelines include: 

• Internal QA procedures, common in many companies. 

• Textbooks. Many textbooks contain chapters with recommended modelling protocols (e.g. Anderson et al., 1993). 

• Manuals to software packages with hints on the best way to use a model (e.g. Rumbaugh and Rumbaugh, 2001; DHI, 2002). 

2.2.3. Type 2: Public technical guidelines 

These guidelines often contain the same substance as 

the internal technical guides mentioned above. However, 

they differ in the sense that they have been 

prepared through a consultative and consensus building 

process involving many persons and organisations. They 

focus on the technical aspects and give no or little 

emphasis to the interaction between the modeller and 

the end-user. Examples of Type 2 guidelines include: 

• The CAMASE guidelines for modelling that were developed after substantial consultation within the scientific modelling community (CAMASE, 1996). 

• Standards from American Society for Testing and Materials (e.g. ASTM, 1994). 

• Many of the UK standards, especially the older ones (Packman, 2002). 

2.2.4. Type 3: Public interactive guidelines 

These guidelines have, like the public technical 

guidelines (Type 2), been established through a public 

consultative and consensus building process. However, 

they differ from the Type 2 guidelines by an additional 

focus on regulating the interaction between the modeller 

and the water manager, who often have the roles of 

consultant and client, respectively. 

Important elements in public interactive guidelines 

are reviews that, in addition to QA in the sense of technical 

guidance, can facilitate the consensus-building process 

between the parties. Experience shows that such a 

process is crucial for the overall credibility of the modelling 

process. Examples of such QA guidelines include 

(more details on these guidelines are provided in the next 

section): 

• The Dutch guidelines (Van Waveren et al., 2000; Scholten and Groot, 2002). 

• The Australian groundwater flow modelling guidelines established by the Murray-Darling Basin Commission (Middlemis, 2000; Merrick et al., 2002; Henriksen, 2002a). 

• The Danish groundwater modelling guidelines (Henriksen, 2002b). 

• Some of the recent UK standards (Packman, 2002). 

• Californian guidelines prepared by Bay-Delta Modelling Forum (BDMF, 2000). 

2.3. Development stage and prevalence of QA guidelines 

Reviews of a number of existing QA guidelines (see 

details in the next section) revealed significant differences in 

current practice, both between domains and between 

different countries. In some domains and some countries 

there has been a clear trend over the past couple of 

decades to move from Type 1 to Type 2 or Type 3 

guidelines. In order to understand the development of 

QA guidelines and be able to provide recommendations 

based on anticipated future needs, it is important to try 

to understand why the present differences in the 

developmental stage of QA guidelines exist. The 

hypothesis that we will test is that the development 

stage depends on two main factors: 

The scientific maturity of the underlying discipline, 

i.e. how well understood are the underlying processes 

and how easily available are the data 

necessary for practical applications. In this respect, 

a mature scientific discipline is one where there is 

a general acceptance in the scientific community on 

how the processes are described, there are no 

significant controversies on key issues, and it is 

feasible to acquire the necessary data for practical



studies. Similarly, an immature scientific discipline is 

one where some processes are not well understood, 

where there are several alternative ‘schools’ on how 

to describe things, and where it is often not possible 

to obtain sufficient field data necessary to perform 

scientifically sound modelling. Immature scientific 

disciplines are often considered as being complex, 

and are characterised by unresolved problems such 

as scale problems. For example, whereas biology is 

a relatively old science in comparison with hydrogeology, 

biota (ecological) modelling is considered 

to be immature in contrast to groundwater flow 

modelling which is considered to be mature. Biota 

modelling is rather uncertain due to the inherent 

complexity of ecological systems and the generally 

limited availability of relevant field data, whereas the 

mathematical principles describing groundwater 

flow are well established and flow systems are 

readily characterised in the field. 

The modelling market maturity, i.e. how well developed 

is the market for modelling studies. In this 

respect, a mature market is characterised by (a) the 

modelling market is relatively old with numerous 

examples of good and poor quality modelling 

studies, and the motivation for establishing QA 

guidelines is largely due to water managers having 

experience with studies of poor quality; (b) most jobs 

are outsourced to private consultants; (c) the volume 

of modelling work is large, so that a number of 

consultants can be sustained and standard routines 

can evolve; and (d) there is considerable competition 

among modellers in getting the jobs. Similarly, 

an immature market is characterised by (a) it is relatively 

new (typically <10 years); (b) most modelling 

studies are carried out by government agencies themselves; 

(c) the volume of work for the consultants is 

small; and (d) there is virtually no competition 

among modellers, instead the work is carried out by 

a few specialised groups which are often located in or 

have close ties to the research community. 

If these hypotheses were true one would a priori 

expect that a considerable degree of scientific maturity is 

required for QA guidelines of Type 2 to develop, and 

that further a mature modelling market is a necessary 

prerequisite for the development of Type 3 guidelines. 
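The two hypotheses can be read as a simple decision rule. The following is only an illustrative sketch: the function name and the yes/no inputs are assumptions introduced here, whereas maturity is in practice a matter of degree and is assessed subjectively.

```python
def highest_expected_guideline_type(scientifically_mature: bool,
                                    market_mature: bool) -> int:
    """Return the highest QA guideline type (1, 2 or 3) expected a priori.

    Encodes the hypotheses only: Type 2 presupposes scientific maturity,
    and Type 3 additionally presupposes a mature modelling market.
    """
    if not scientifically_mature:
        return 1  # internal technical guidelines only
    if not market_mature:
        return 2  # public technical guidelines
    return 3      # public interactive guidelines
```

Under this reading a mature market without a mature science would still be expected to stay at Type 1, which is one of the expectations the review of existing guidelines is meant to test.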

3. Existing guidelines 

Reviews of existing QA guidelines were conducted 

(Refsgaard, 2002). The reviews attempted to cover two 

aspects: (a) variation of practices between seven different 

modelling domains (groundwater, precipitation-runoff, 

hydrodynamics, flood forecasting, surface water quality, 

biota (ecology) and socio-economy); and (b) differences 

between geographical regions. The reviews of state-of-the-art

in the seven domains were carried out by 

seven different organisations with special expertise in the 

respective domains. During these reviews a broad search 

of relevant QA guidelines was made with primary focus

on existing guidelines in Europe and secondarily 

on guidelines from North America and Australia. 

Subsequently, a few cases with guidelines from different 

geographical areas were selected for a more detailed 

review. The reviews did not intend to be exhaustive by 

including all important QA guidelines, but aimed at 

selecting guidelines representative for conditions in 

Europe, North America and Australia. 

In order to test the above hypotheses the conclusions 

of the state-of-the-art of QA guidelines for the different 

domains summarised in Section 3.1 are plotted in Fig. 2 

as a function of scientific maturity. Furthermore, 

examples of guidelines from different countries are presented in Section 3.2 and Fig. 3 with focus on market maturity.

[Figure 2: modelling domains plotted by type of QA guidelines (Type 1 Internal, Type 2 Public, Type 3 Interactive) against scientific maturity (immature to mature). Modelling domains: GW-HD: Groundwater flow; GW-AD: Groundwater solute transport; GW-WQ: Groundwater geochemistry; PR: Precipitation runoff; HD: Hydrodynamic (surface water flow); HD-Sed: Sediment transport/morphology; FF: Flood forecasting; SWQ: Surface water quality; Biota: Biota (ecology); SE: Socio-economy.]

Fig. 2. State-of-the-art for QA guidelines in different modelling domains plotted against maturity of the underlying scientific disciplines.

[Figure 3: case guidelines plotted by type of QA guidelines (Type 1 Internal, Type 2 Public, Type 3 Interactive) against maturity of the modelling market, from immature (new, small, specialised) to mature (old, big, competitive). Cases: BDMF: Bay Delta Modelling Forum (California); AUS-GW: Australia, groundwater; NL-GMP: Dutch Good Modelling Practice; DK-GW: Denmark, groundwater; UK: United Kingdom, several domains; ASTM: American Society for Testing and Materials; CEE: Central and Eastern Europe; FR-FF: France, flood forecasting.]

Fig. 3. Different types of guidelines as a function of maturity in the modelling market.

3.1. State-of-the-art in different modelling domains 

Groundwater modelling (Refsgaard and Henriksen, 

2002): In this field, QA guidelines are well developed 

and used in many countries, but mostly in groundwater 

flow modelling, where the state-of-the-art corresponds 

to Type 3 guidelines. For solute transport, and in 

particular for geochemical modelling, relatively few 

guidelines exist and they are not commonly used. The 

need for QA guidelines differs from country to country, 

among other reasons because of different stages of development of

the groundwater modelling market. For instance, the 

guides from the American Society for Testing and 

Materials (ASTM) were among the first of their kind to 

be developed, in the early 1990s, because the practical 

application of groundwater models at that time had 

progressed further in the USA than in most other 

countries. 

Precipitation-runoff modelling (Perrin et al., 2002a): 

Relatively few guidelines exist for this domain as standalone 

guidelines. The guidelines that do exist are generally 

confined to relatively simple (lumped) approaches, 

while no generic guidelines exist for the more complex 

models of the distributed physically-based type. Thus, 

the state-of-the-art for precipitation-runoff as a standalone 

domain may be characterised as Type 1/Type 2.

However, it is also noted that precipitation-runoff 

modelling is often used as an integral part of other 

domains, e.g. groundwater models, hydrodynamic 

models, flood forecasting models and surface water 

quality models. For some of these integrated applications 

some guidelines have been developed which 

include the precipitation-runoff domain. This is, for 

instance, the case for the Danish groundwater guidelines 

(Henriksen, 2002b) which include aspects of precipitation-runoff 

modelling. 

Hydrodynamic modelling (Metelka and Krejcik, 

2002a): This domain includes environmental applications 

such as modelling of urban drainage and sewer 

systems, rivers, floodplains, estuaries and coastal waters 

both with respect to flows, sediment and morphological 

issues. QA guidelines are well developed in some fields 

(e.g. in urban drainage and river modelling), but not in 

other fields (e.g. sediment and morphological modelling). 

For hydrodynamic modelling in coastal areas and 

estuaries few QA guidelines have been identified. The 

state-of-the-art may be characterised as Type 2 for most 

parts of the domain and Type 1 for other parts. It is 

noted that hydrodynamic modelling is often an integral 

part of flood forecasting and surface water quality 

modelling. Although very similar in theoretical scientific 

background, this domain is different from the field of 

Computational Fluid Dynamics that typically is used for 

industrial purposes. 

Flood forecasting modelling (Balint, 2002): This 

domain differs fundamentally from the other domains 

by being based on real-time operation. This implies that 

the models, once established, are applied on a routine 

(daily) basis although often under extreme boundary 

conditions. The focus on QA in this domain is often 

concentrated on data quality for the on-line data 

acquisition. Due to this fundamental difference in nature, 

the status of QA guidelines for this domain does not fit 

well into the above classification, and it is not easily 

comparable to the status of the other domains. 

Surface water quality modelling (Da Silva et al., 

2002): Surface water quality modelling is based on 

a description of physical, chemical and biological 

processes. Often the data availability to assess model 

processes and parameters is sparse and often the 

key processes are not well understood. QA guidelines



are generally not well developed. The state-of-the-art 

may be characterised as Type 1. 

Biota (ecological) modelling (Old et al., 2002): 

Ecology is a diverse branch of biology that focuses on 

the relations of flora and fauna to one another and to 

their physical environment. Ecological models are 

widely used today, but perceived as being rather 

uncertain due to the inherent complexity of ecological 

systems and the general limited availability of relevant 

field data. QA guidelines are generally not well 

developed. The state-of-the-art may be characterised as 

Type 1. 

Socio-economic modelling (Heinz and Eberle, 2002): 

No general QA guidelines exist for socio-economic 

modelling. The few existing guidelines, such as the 

CAMS, CFMPs and RBMPs in the UK, are specific for

particular types of application, and they are so far only 

used in practice in a few countries. The state-of-the-art 

may be characterised as Type 1/Type 2.

In Fig. 2 the state-of-the-art for QA guidelines in the 

respective modelling domains has been plotted against

the scientific maturity of the underlying disciplines. The 

scientific maturity of the respective domains has been 

assessed subjectively on the basis of the criteria outlined 

in Section 2.3 above. There is a tendency that the least 

developed guidelines (Type 1) appear in domains where 

the underlying scientific basis is characterised as 

immature, i.e. in surface water quality, biota (ecology) 

and groundwater quality, reflecting that many fundamental 

scientific issues remain to be solved. Similarly, 

the Type 2 and Type 3 guidelines are dominant in 

domains characterised by scientific maturity. However, 

there are clear exceptions such as precipitation-runoff 

and flood forecasting, where factors other than scientific

maturity must play a role for the development stage of 

QA guidelines. 

3.2. Current practice in different countries 

The current practice of using QA guidelines in 

different countries has been illustrated through some 

selected cases that have been reviewed in Refsgaard 

(2002). In Fig. 3 the type of QA guidelines used in the

case studies is plotted against the maturity of the 

modelling market that has been assessed subjectively on 

the basis of the criteria given in Section 2.3 above. The 

practice as reflected by the case studies and shown on 

the figure is summarised as follows: 

Dutch guidelines (Scholten and Groot, 2002): The 

Dutch guidelines are the most generic of the existing 

guidelines in the sense that they cover all the domains 

relevant for river basin management. Technical

guidance for different modelling domains exists, but is

not as detailed as some of the guidelines that only cover 

one domain (e.g. ASTM guides or Australian guidelines 

on groundwater flow modelling). The Dutch guidelines 

emphasise the dialogue process between modeller and 

water manager, including the review procedures. The 

Dutch guidelines belong to Type 3. The Dutch 

modelling market may be characterised as mature. 

Australian groundwater flow modelling guidelines 

(Henriksen, 2002a): The Australian guidelines are 

technically comprehensive. They focus on the dialogue 

between the modeller and the water manager in general 

and on review procedures in particular. The guidelines 

were developed over several years with involvement of 

all of the key stakeholders. The Australian guidelines 

belong to Type 3. The Australian groundwater modelling 

market may be characterised as mature. 

Danish groundwater modelling guidelines (Henriksen, 

2002b): The Danish Handbook of Good Modelling Practice and draft guidelines are similar to the Australian ones, although some important details differ. The Danish guidelines were initiated by the water managers, who also ensure that they are presently being used in most studies.

The Danish guidelines belong to Type 3. The 

Danish groundwater modelling market may be characterised 

as mature. 

Central and Eastern Europe (Metelka and Krejcik, 

2002b; Van Gils and Groot, 2002): Public QA guidelines

are neither well developed nor used. Many modellers 

therefore rely only on internal QA procedures (Type 1) 

adopted by their respective organisations. This situation 

reflects a new and unregulated market for modelling 

services, and a market where the managers and their 

organisations often are technically too weak to adopt 

and enforce QA guidelines. 

French guidelines in flood forecasting (Perrin et al., 

2002b): Public or interactive guidelines do not exist in 

this area, and the case study describes a set of internal 

technical guidelines (Type 1). Although flood forecasting 

is an old modelling discipline, the modelling 

market is virtually non-existent, because flood forecasting 

modelling in France (as well as in most other 

countries) is carried out either by a government agency 

or by a specialised research institute. 

UK guidelines (Packman, 2002): QA guidelines are 

generally very well developed in the UK. Application of 

guidelines is prescribed as a routine in most areas of 

model application. Thus, in general the UK market for 

modelling services is well regulated and characterised as 

being mature. Most of the guidelines are of Type 2 and 

some recent ones of Type 3. The exceptions to this are 

the surface water quality and biota (ecological) domains 

where no general guidelines exist. The guidelines in these 

domains are therefore confined to internal procedures 

inspired by textbooks and manuals (Type 1). 

Bay Delta Modelling Forum, California (BDMF, 

2000): The Californian guidelines provide a framework, 

but very few technical details. The main emphasis of 

these guidelines is on the interaction between modellers, 

managers and the public (Type 3). In this respect various


kinds of reviews are prescribed at various stages of the 

modelling process. The American market in general and 

the Californian in particular are well established 

(mature). 

American Society for Testing and Materials (ASTM, 

1992, 1994): The American guidelines are especially 

comprehensive in the groundwater domain, where they 

have served as inspiration for all the other groundwater 

guidelines, including the Australian and the Danish 

guidelines. There are a number of guidelines on various 

elements of the modelling process. These guides are 5–10 

years old and are mainly technical in nature, while

limited focus is put on the interaction and review 

process. 

In addition to the above QA guidelines ISO (the 

International Organisation for Standardisation) regularly 

publishes quality management and quality assurance 

standards. ISO standards provide guidance on 

fundamental principles and procedures, but on a rather 

general level. We have found ISO standards addressing 

development, supply and maintenance of computer 

software (ISO 9000-3:1997) and other standards providing 

guidance for a general process based quality 

management system in an organisation (ISO 

9004:2000(E)). However, none of the ISO standards 

include any particular guidance on matters related to 

water resources modelling or management, and they are 

therefore of limited practical use compared to the

QA guidelines above, which are dedicated to water resources

modelling. 

3.3. Content of existing guidelines 

3.3.1. Key elements 

The existing guidelines all comprise modelling protocols 

with recommended steps and technical guidance 

on how to perform these steps in the modelling process. 

The key elements may be divided into two groups, 

namely: (1) technical guides on how to use models; and 

(2) guides for regulating the interaction between 

modeller and end-user/water manager. The key elements 

in the technical guides include: 

- Definition of the purpose of the modelling study.
- Collection and processing of data.
- Establishment of a conceptual model.
- Selection of code or alternatively programming and verification of code.
- Model set-up.
- Establishment of performance criteria.
- Model calibration.
- Model validation.
- Uncertainty assessments.
- Simulation with model application for a specific purpose.
- Reporting.
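The technical key elements above can be held as an ordered checklist. The sketch below is illustrative only: the names `PROTOCOL_STEPS` and `next_step` are assumptions introduced here, and the strictly sequential order is a simplification, since real modelling protocols contain feedback loops between steps.

```python
# Key elements of the technical guides, in the order listed in the text.
PROTOCOL_STEPS = [
    "Definition of the purpose of the modelling study",
    "Collection and processing of data",
    "Establishment of a conceptual model",
    "Selection of code or alternatively programming and verification of code",
    "Model set-up",
    "Establishment of performance criteria",
    "Model calibration",
    "Model validation",
    "Uncertainty assessments",
    "Simulation with model application for a specific purpose",
    "Reporting",
]


def next_step(completed):
    """Return the first protocol step not yet completed, or None if all are done."""
    for step in PROTOCOL_STEPS:
        if step not in completed:
            return step
    return None
```

A study that has only defined its purpose and processed its data would, for instance, be pointed to the establishment of a conceptual model as the next element.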

The key elements in the interaction between the modeller and the end-user include, in addition to some of the above elements, other aspects:

- Definition of the purpose of the modelling study, including translation of the end-user's needs to preliminary performance criteria.
- Establishment of performance criteria. The accuracy of the model predictions has to be established via a trade-off between the benefits of improving the accuracy in terms of less uncertainty on the management decisions and the costs of improving the accuracy through additional model studies and/or collection of additional field data.
- Reviews with subsequent consultation between the modeller and the end-user at different phases of the modelling project.

The content of the technical guides is to a large

extent domain specific, while the elements of the 

interaction between the modeller and the end-user are 

more general in nature and differ only slightly from one 

domain to another. 

3.3.2. Integration across modelling domains 

Almost all the existing guidelines were developed for 

a specific domain e.g. groundwater modelling. As 

integrated modelling may be expected to play an 

important role in connection with implementation of 

the EU Water Framework Directive and adoption of 

Integrated Water Resources Management principles, 

guidelines not including integrated modelling aspects are 

inadequate. Even the Dutch guidelines (Scholten and 

Groot, 2002) which cover a large number of domains are 

essentially single domain guidelines, because they do not 

provide guidance on how to integrate across domains 

(interdependencies etc.). However, the Dutch guidelines 

do have the clear advantage over other existing guidelines 

in that they are based on a common methodology 

and a common glossary. 

It should be noted though that some guidelines cover 

more than one modelling domain, as they are defined 

here. For instance hydrodynamic modelling or groundwater 

modelling are often combined with precipitation-runoff,

and guidelines combining these domains exist. 

3.3.3. Differences in terminology 

As illustrated in Refsgaard (2002) the terminology 

used in the modelling community varies significantly 

between domains and even to some extent from one 

country to another. This clearly demonstrates the need 

for establishing one common terminology and glossary 

for modelling applications as addressed by Refsgaard 

and Henriksen (2004).



4. Outline of new guidelines – HarmoniQuA 

4.1. Overall aim and structure 

On the basis of the knowledge achieved through the 

review of existing guidelines, the HarmoniQuA project 

aims to develop a new comprehensive set of guidelines 

and supporting software tools to facilitate an improved 

quality of the modelling process and hence enhance the 

confidence of all stakeholders. 

HarmoniQuA forms part of the CATCHMOD 

cluster of EU research projects (Blind, 2004). It aims 

to be a methodological component of a future infrastructure 

for model based decision support for water 

management at catchment and river basin scale. This 

main goal will be reached by providing the elements of 

a methodological layer in this infrastructure, embodied 

in a knowledge base (KB) and software tools. HarmoniQuA 

will collect methodological expertise, structure 

this knowledge and identify and fill in gaps. It will 

consist of generic and domain specific knowledge, 

modelling software specific aspects, and a transparent 

and consistent glossary of terms and concepts. This 

body of knowledge will be structured in a knowledge 

base. The following set of software tools will provide 

functionality for the HarmoniQuA system: 

- guideline tool: will generate guidelines from the KB;
- monitoring tool: will monitor all activities within a modelling job and store these activities as a single model journal in a model archive;
- report tool: generates reports from a model journal;
- advisor tool: advises modellers in new modelling jobs based on decisions and choices of previous jobs and associated model journals in the model archive.
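The relation between monitoring, model journal and reporting can be illustrated with a minimal sketch. It is not the HarmoniQuA implementation: the class names `JournalEntry` and `ModelJournal`, their attributes and the report layout are all hypothetical and serve only to show activities being stored in one journal from which a report is later generated.

```python
from dataclasses import dataclass, field


@dataclass
class JournalEntry:
    """One recorded modelling activity (hypothetical structure)."""
    task: str
    activity: str
    author: str


@dataclass
class ModelJournal:
    """A single journal collecting all activities of one modelling job."""
    project: str
    entries: list = field(default_factory=list)

    def record(self, task, activity, author):
        """Monitoring: append an activity to the journal."""
        self.entries.append(JournalEntry(task, activity, author))

    def report(self):
        """Reporting: render the journal as plain text, one line per activity."""
        lines = [f"Model journal: {self.project}"]
        for number, entry in enumerate(self.entries, start=1):
            lines.append(f"{number}. [{entry.task}] {entry.activity} ({entry.author})")
        return "\n".join(lines)
```

An archive of such journals would then be the material that an advisory function could search for decisions taken in earlier jobs.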

An overview of the HarmoniQuA products (KB and 

tools) and how these interact with the activities of the 

users is presented in Fig. 4. The lower part of Fig. 4 

depicts the five major steps of the modelling process. 

These five major steps are decomposed into 45 tasks, 

with interrelations (order and feedback) as shown in 

Fig. 5. Each task has an internal structure, i.e. name, 

definition, explanation, interrelations with other tasks, 

activities, activity related methods, references, task 

inputs and outputs. This knowledge structure (steps, 

tasks, within-task-knowledge) is stored in the KB. The 

five steps and the tasks have been selected on the basis of 

existing modelling protocols and QA guidelines and 

include the key elements outlined in Section 3.3 above. 
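The knowledge structure described above (steps, tasks and within-task knowledge) can be pictured as a simple record type. This is a sketch under stated assumptions: the attribute names are hypothetical and merely mirror the items listed in the text, the step names are those shown in Figs. 4 and 5, and the three task kinds are taken from the legend of Fig. 5. It does not reproduce the actual ontology of the HarmoniQuA knowledge base.

```python
from dataclasses import dataclass, field

# Step names as they appear in Figs. 4 and 5.
STEPS = (
    "Model Study Plan",
    "Data and Conceptualisation",
    "Model Set-up",
    "Calibration and Validation",
    "Simulation and Evaluation",
)

# Task kinds taken from the legend of Fig. 5.
TASK_KINDS = ("ordinary", "decision", "review")


@dataclass
class Task:
    """One task record; attribute names are hypothetical."""
    name: str
    step: str
    kind: str = "ordinary"
    definition: str = ""
    explanation: str = ""
    related_tasks: list = field(default_factory=list)  # interrelations with other tasks
    activities: list = field(default_factory=list)
    methods: dict = field(default_factory=dict)        # activity -> related methods
    references: list = field(default_factory=list)
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def __post_init__(self):
        # Reject records that do not fit the step/kind vocabulary.
        if self.step not in STEPS:
            raise ValueError(f"unknown step: {self.step}")
        if self.kind not in TASK_KINDS:
            raise ValueError(f"unknown task kind: {self.kind}")


def tasks_by_step(tasks):
    """Group task names under the step they belong to, keeping step order."""
    grouped = {step: [] for step in STEPS}
    for task in tasks:
        grouped[task.step].append(task.name)
    return grouped
```

Keeping the step and kind vocabularies fixed is one way to obtain the common generic core, while domain-specific knowledge would sit in the free-text and list attributes of each task.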

Model based decision support has several dimensions, 

which hinder a ‘one-size-fits-all’-approach. HarmoniQuA 

attempts to serve several types of users in a series of water management domains, in jobs of diverse complexity and diverse application purpose. In this way, users working on a specific job will only be confronted with guidelines, instructions, decisions and activities that are relevant to their role in a particular modelling job.

[Figure 4: schematic of how the HarmoniQuA tools (MoST) link the Knowledge Base (guidelines, software capabilities and glossary for the domains groundwater, precipitation-runoff, hydrodynamics, flood forecasting, water quality, biota (ecology) and socio-economics) and the Model Archive (model journals of individual projects) with the user (model team, single or multiple domain) through four functions: Guidance and Monitoring (generic and specific for model domain, user and job complexity), Reporting (specific for types of users) and Advice (from previous model projects). The lower part shows the five steps of the modelling process: Model Study Plan; Data and Conceptualisation; Model Set-up; Calibration and Validation; Simulation and Evaluation. Reporting and client review take place in each step.]

Fig. 4. HarmoniQuA tools (MoST) to support the QA process.

[Figure 5: flowchart of the tasks within the five steps (Model Study Plan; Data and Conceptualisation; Model Set-up; Calibration and Validation; Simulation and Evaluation), distinguishing ordinary tasks, decision tasks and review tasks, connected by feedforward and feedback links.]

Fig. 5. The five steps and 45 tasks of modelling process in the HarmoniQuA knowledge base.

The HarmoniQuA tools have been developed in 

Protégé 2000 following an ontological approach. More

details can be found in Kassahun et al. (2004). The tools 

are available on http://www.harmoniqua.org/. 

4.2. Key elements 

Some of the key features to be implemented in the 

new HarmoniQuA guidelines are: 

4.2.1. Interactive guidelines 

The dialogue between the different players is crucial 

to ensure that the output from the modelling process is 

understandable for stakeholders and beneficial for the 

client. The importance of involving stakeholders

and public opinion is emphasised by Pahl-Wostl

(2002) and addressed in some Type 3 guidelines (e.g. 

BDMF, 2000; Pascual et al., 2003). In HarmoniQuA, 

each of the five major steps (Fig. 5) is therefore 

concluded with a dialogue task, in terms of either 

contract negotiation (first step) or reviews (last four 

steps). A dialogue task encourages the assessment of the 

present step and provides the opportunity to redefine the 

content of the model study plan for the next step based 

upon the results and findings of the present step. These 

dialogue steps provide flexibility to the modelling study 

and ensure that the tasks that have yet to be performed 

can be modified according to the achieved results and 

perceptions of modeller and client. 

4.2.2. Transparency and reproducibility 

Transparency and reproducibility are important, 

especially for large studies involving use of complex 

models. This will be ensured through the Monitoring 

Tool which enables modelling teams, consisting of 

modellers, managers and auditors, to be guided through 

the modelling process, to monitor all modelling activities 

and to oversee the status of each task to be performed. With

an increasing tendency to reuse existing models or 

rebuild them with additional data, modified conceptual 

models (revised model structure and/or inclusion of 

additional processes) and improved calibration and 

validation tests, this functionality of the Monitoring 

Tool becomes very important. 

4.2.3. Accuracy criteria 

Establishment of accuracy criteria for a modelling 

study is a very important, but difficult, issue. Modellers 

often establish numerical accuracy criteria in order to 

classify the goodness of a given model (e.g. Henriksen 

et al., 2003; Scholten and Van der Tol, 1998). These 

attempts are very useful in making the performance 

more transparent and quantitative, but do not provide 

an objective means to decide what the optimal accuracy 

criteria really should be in a given case. According to 

Refsgaard and Henriksen (2004) no universal accuracy 

criteria can be established, i.e. it is generally not possible 

from a natural scientific point of view to tell when 

a model performance is good enough. Such acceptance 

criteria will vary from case to case depending on the 

socio-economic context, i.e. what is at stake in the 

decisions to be supported by the model predictions. An 

appropriate question may be: how do we translate the 

‘soft’ socio-economic objectives to ‘hard-core’ model 

performance criteria? This is obviously a challenge that

cannot be solved by natural science alone, but needs to 

be addressed in a much broader context including 

aspects of economy, stakeholder interests and risk 

perception. 

Performance statistics must comprise quantifiable 

and objective measures. However, numerical measures

cannot stand alone. Often expert opinions are necessary 

supplements. 
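As an illustration of quantifiable performance measures, the sketch below computes two statistics that are commonly used in hydrological modelling, the root mean square error and the Nash-Sutcliffe efficiency, and compares them with agreed thresholds. The choice of these two statistics is an assumption made here for the example, and the function names and threshold arguments are hypothetical. In line with the argument above, the thresholds themselves cannot be derived from natural science and would have to be agreed between modeller and manager for each case.

```python
import math


def rmse(observed, simulated):
    """Root mean square error between observed and simulated values."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 equals the observed mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    if variance_sum == 0:
        raise ValueError("observed series has no variance")
    error_sum = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return 1.0 - error_sum / variance_sum


def meets_criteria(observed, simulated, max_rmse, min_efficiency):
    """Check a simulation against case-specific, pre-agreed accuracy criteria."""
    return (rmse(observed, simulated) <= max_rmse
            and nash_sutcliffe(observed, simulated) >= min_efficiency)
```

Such a check makes the agreed criteria explicit and reproducible, but it does not replace the expert judgement mentioned above.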

4.2.4. Uncertainty assessments 

Quality assurance and uncertainty assessments are 

two aspects that are very closely linked. Initially, the 

manager has to define accuracy criteria from a perception 

of which uncertainty level he/she believes is suitable 

for a particular case (see above). Subsequently, as the 

modelling study proceeds, the dialogue between modeller 

and manager has to continue with the necessary 

trade-off between modelling accuracy and the cost of the

modelling study. In the uncertainty assessments it is very 

important to go beyond the traditional statistical 

uncertainty analysis. Thus, e.g. aspects of scenario 

uncertainty and ignorance should generally be included 

and in addition the uncertainties originating from data 

and models often need to be integrated with socio-economic

aspects in order to form a suitable basis for 

the further decision process (e.g. Van Asselt and 

Rotmans, 2002). Thus, as with the accuracy criteria

(above) the use of uncertainty assessments in water 

resources management goes beyond natural science. 

Assessment of uncertainty due to errors in the model 

structure is a particularly difficult task and is most often 

neglected. One way of evaluating this source of uncertainty 

is through the establishment of alternative 

conceptual models. This aspect is emphasised in the 

HarmoniQuA guidelines. 
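One simple way to make use of alternative conceptual models is to compare their predictions for the same scenario. The sketch below is a minimal example under stated assumptions: the function name and the input layout (a mapping from a model label to its predicted series) are hypothetical, and the resulting envelope is only a rough indicator of structural uncertainty, since it reflects nothing beyond the particular alternatives that were set up.

```python
def prediction_envelope(predictions):
    """Return (lower, upper, spread) across alternative conceptual models.

    `predictions` maps a model label to its list of predicted values;
    all lists must refer to the same points in time or space.
    """
    series = list(predictions.values())
    if len(series) < 2:
        raise ValueError("at least two alternative models are needed")
    length = len(series[0])
    if any(len(values) != length for values in series):
        raise ValueError("all prediction series must have equal length")
    # zip(*series) yields, for each point, the values of all models.
    lower = [min(values) for values in zip(*series)]
    upper = [max(values) for values in zip(*series)]
    spread = [high - low for low, high in zip(lower, upper)]
    return lower, upper, spread
```

Points where the spread is large relative to the required accuracy indicate where the choice of conceptual model matters most for the decision at hand.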

4.2.5. Model validation 

Although experience shows that models generally 

perform worse in validation tests against independent

data than they do in calibration tests, model validation is 

in our opinion a neglected issue, both in many modelling


guidelines and in the scientific literature. Maybe many 

scientists have not wanted to use the term validation due 

to the related scientific philosophical controversies, but

in any case many scientists are not advocating the need for 

model validation. One of the unfortunate consequences 

of this ‘lack of interest’ is that not much work has been 

devoted to developing suitable validation test schemes 

since Klemes (1986). In our opinion further development 

of suitable testing schemes, particularly for non-linear 

models and for applications comprising extrapolations 

beyond the calibration data basis, and imposing them on

all modelling projects is a major future challenge. 
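The simplest of the validation schemes usually associated with Klemes (1986), the split-sample test, can be sketched as follows: the record is divided into a calibration period and an independent validation period, the model is calibrated on the first and its performance is then reported for both. The one-parameter runoff-coefficient model used here is a deliberately trivial assumption for the sake of a self-contained example, and all function names are hypothetical.

```python
import math


def calibrate_runoff_coefficient(precipitation, flow):
    """Least-squares estimate of c in the toy model flow = c * precipitation."""
    denominator = sum(p * p for p in precipitation)
    if denominator == 0:
        raise ValueError("precipitation series is all zero")
    return sum(p * q for p, q in zip(precipitation, flow)) / denominator


def rmse(observed, simulated):
    """Root mean square error between observed and simulated values."""
    squared = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


def split_sample_test(precipitation, flow, split_index):
    """Calibrate on data before split_index, validate on the remainder.

    Returns (coefficient, calibration_rmse, validation_rmse).
    """
    if len(precipitation) != len(flow):
        raise ValueError("series must be of equal length")
    if not 0 < split_index < len(flow):
        raise ValueError("split_index must leave data in both periods")
    p_cal, q_cal = precipitation[:split_index], flow[:split_index]
    p_val, q_val = precipitation[split_index:], flow[split_index:]
    c = calibrate_runoff_coefficient(p_cal, q_cal)
    calibration_error = rmse(q_cal, [c * p for p in p_cal])
    validation_error = rmse(q_val, [c * p for p in p_val])
    return c, calibration_error, validation_error
```

A validation error that is clearly larger than the calibration error is the pattern referred to above, and it is only visible when the validation data are kept out of the calibration.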

4.2.6. Dedication aspects 

The QA guidelines describe the different tasks and 

responsibilities of the different types of users such as (1) 

modellers; (2) water managers; (3) auditors; (4) stakeholders 

(other than water manager); and (5) general 

public. 

The QA guidelines are developed so that they 

adequately reflect the different requirements in several 

modelling domains (and still maintain a common generic 

core to ensure coherency). Furthermore, the guidelines 

will be applicable for studies where several domains, 

including socio-economy, are integrated. 

The QA guidelines differentiate according to job 

complexity in modelling, e.g. (1) basic (rough calculations); 

(2) intermediate (moderately complex calculations); 

and (3) comprehensive (sophisticated, detailed 

calculations). 


5.1. Types and reasons of existing QA guidelines 

We have classified quality assurance (QA) guidelines in 

three types: Internal technical guidelines (Type 1), Public 

technical guidelines (Type 2), and Public interactive 

guidelines (Type 3). We have then characterised the 

conditions for which the guidelines are used by (a) the 

scientific maturity of the underlying discipline(s) and (b) 

the maturity of the modelling market in the region/ 

country for which the guidelines were developed. Our 

review of existing QA guidelines is not exhaustive, but 

limited to examples aimed at being representative for 

conditions in Europe, North America and Australia. 

Thus, we have for instance not reviewed QA guidelines 

from countries in Asia, where modelling has taken place 

for many years. The results of our review revealed 

significant variations in the type of guidelines available 

and their usage between different modelling domains and 

countries. We hypothesised that the stage of QA guideline 

development largely depends on the maturity of both the 

specific scientific discipline and the modelling market in 

the respective country or region (Figs. 2 and 3). 

Considering Figs. 2 and 3 it appears that the maturity 

of the scientific discipline and market both play an 

important role in QA development. However, neither 

the scientific level nor the market maturity alone is able 

to explain the differences in the stage of QA guideline 

development. If the underlying process understanding or 

necessary data are too weak, then the modelling process 

lacks credibility no matter how well QA procedures are 

adhered to. Hence, the motivation to establish sophisticated 

QA guidelines in such cases is small. Similarly, 

even though a specific discipline may be scientifically 

mature, modellers may be reluctant to use sophisticated 

QA guidelines if they are not required to do so by 

regulators and/or water managers. The general development 

of QA guidelines has progressed over time 

from Type 1 towards Type 3. A developmental process 

that is consistent with the results of the reviews as 

reflected in Figs. 2 and 3 is the following. 

Initially, when models are introduced for practical 

application, internal technical guidelines (Type 1) 

originating from the research community are applied. 

The development from Type 1 to Type 2 QA guidelines 

requires a certain degree of maturity within both the 

specific scientific discipline and the market. This implies 

that there should be no significant gaps in knowledge 

of process descriptions, and that there is common 

agreement about the scientifically sound procedures for 

solving the problems in this domain. The development 

of Type 2 guidelines is most often driven by the demands 

of regulators and water managers. The development 

from Type 2 to Type 3 requires a clear and conscious 

demand from regulators and water managers. 

It would also have been possible to classify the QA 

guidelines after other criteria, for example according to 

how uncertainty analysis is treated, whether they apply 

to single or multiple domains and whether they apply to 

natural or social science. We have chosen our classification 

for two main reasons. Firstly, an improved mutual 

understanding between modeller and water manager is 

crucial for a model application to be successful in 

practice, and this should be facilitated by the QA 

guidelines. Secondly, the trend of increasing stakeholder 

involvement in the water resources management process 

demands that QA guidelines also enable stakeholders to 

observe and take part in parts of the modelling process. 

Our characterisation of QA guidelines according to 

scientific and market maturity has some weaknesses. 

First of all, the assessments have been done subjectively, 

because there was no other feasible method. Secondly, 

the two characteristics are not completely independent. 

Thus a large and mature market will often demand 

new scientific knowledge and hence stimulate 

scientific development, and it will also create a need for 

improved technical standards. 

Altogether, it may be concluded that our hypotheses 

on the importance of scientific and market maturity for


the development of QA guidelines have not been 

falsified. However, due to the above weaknesses and 

the limited empirical basis (review not exhaustive but 

selected examples) this conclusion should be taken with 

some reservation. 

5.2. Organisational requirements 

for QA guidelines to be effective 

As emphasised by e.g. Forkel (1996) modelling 

studies involve several partners with different responsibilities. 

The ‘key players’ are code developers, model 

users (modellers) and water managers (including planning 

and regulatory authorities). To a large extent the 

quality of the modelling study is determined by the 

expertise, attitudes and motivation of the teams involved 

in the modelling and quality assessment process. 

The attitude of the modellers is important. NRC 

(1990) characterises this as follows: ‘‘most modellers 

enjoy the modelling process but find less satisfaction in 

the process of documentation and quality assurance’’. 

Scholten and Groot (2002) describe the main problem 

with the Dutch Handbook on Good Modelling Practice 

as being that everyone likes it, but only a few use it. 

QA will only become successful if both of the parties, 

modeller and water manager, are motivated and active in 

supporting its use. The water manager has a particular 

responsibility, because he/she has the power to request 

and pay for adequate QA in modelling studies. Therefore, 

QA guidelines can only be expected to be used in practice, 

if the water manager prescribes their use. In this respect it 

is very important that the water manager has the technical 

capacity to organise the QA process. A significant 

problem for the water manager’s organisation is that it often 

lacks individuals who are trained at an appropriate level 

to understand and use models. If the water manager does 

not possess such skill within his/her own staff, an external 

modelling expert can be hired to help the manager in the 

QA process. However, this requires that the manager is 

aware of the problem and the need. 

5.3. The HarmoniQuA guidelines 

The approach adopted in the present HarmoniQuA 

guidelines corresponds to Type 3. However, in addition 

to its focus on the dialogue and role play between the 

various actors in the modelling process, i.e. modellers, 

water managers, auditors and the public/stakeholders, 

the HarmoniQuA approach is innovative compared to 

existing Type 3 QA guidelines on the following aspects: 

Supporting software tools, beyond simple scoreboards 

and templates, are novel and important 

elements. These tools, which contain the knowledge 

base (KB), can guide the users through the 

modelling process, monitor decisions and outcomes, 

and provide experience-based advice on the 

appropriate route to be followed. This will significantly 

improve the transparency and reproducibility 

of the modelling process. To our knowledge no such 

tools exist or are under development at present. 

The focus on performance and accuracy criteria 

in the modelling process is not novel as such. However, 

the current adaptation of these criteria through 

the process in connection with the formalised review 

steps is, if not novel, then at least emphasised much 

more in the HarmoniQuA guidelines than in any 

other existing guidelines. This approach allows the 

HarmoniQuA guidelines to fit nicely with the new 

ideas of adaptive management (Pahl-Wostl, 2002). 

The uncertainty aspects are given a more central role 

than in existing guidelines, where uncertainty often 

is confined to assessment of predictive uncertainties 

towards the end of the study. In the HarmoniQuA 

guidelines uncertainty aspects play an important 

role in 13 of the 45 tasks. Thus, uncertainty 

assessment is a central element in the dialogue 

between modeller and water manager already in the 

beginning of the model study when the initial 

performance criteria are outlined. Furthermore, 

HarmoniQuA recommends including less quantifiable 

elements such as scenario uncertainty and 

model structural uncertainty in the assessment. 

Model validation tests against independent data have 

more emphasis than in most other guidelines. 

For example, although the most comprehensive of the 

existing guidelines, the Dutch guidelines (Van Waveren 

et al., 2000), recommend that validation 

be carried out, they do not describe validation 

tests beyond the traditional split-sample test. 

The HarmoniQuA guidelines are unique in their 

dedication aspects, namely that different tasks and 

responsibilities are described for different users, 

different modelling domains and different levels of 

modelling job complexity. The Australian groundwater 

modelling guidelines have the same feature, 

but only with respect to the review procedures 

(Merrick et al., 2002). 

The HarmoniQuA guidelines consist of a comprehensive 

set of QA guidelines for multiple modelling domains 

combined with the supporting software tools. These 

functionalities appear to be well suited to the challenges 

and demands of modern water resources management. 

The usefulness, user friendliness and appreciation by the 

users will be assessed through a testing of the guidelines 

and tools in a range of river basin modelling projects. 


The present work was carried out within the 

Project ‘Harmonising Quality Assurance in model based


catchments and river basin management (HarmoniQuA)’, 

which is partly funded by the EC Energy, 

Environment and Sustainable Development programme 

(Contract EVK1-CT2001-00097). The constructive comments 

of five anonymous reviewers are acknowledged. 

References 

Anderson, M.G., Bates, P.D., 2001. Hydrological science: model 

credibility and scientific integrity. In: Anderson, M.G., Bates, P.D. 

(Eds.), Model Validation. Perspectives in Hydrological Science. 

John Wiley & Sons, Chichester, pp. 1–10. 

Anderson, M.P., Woessner, W.W., 1992. The role of postaudit in 

model validation. Advances in Water Resources 15, 167–173. 

Anderson, M.P., Ward, D.S., Lappala, E.G., Prickett, T.A., 1993. 

Computer models for subsurface water. In: Maidment, D.R. (Ed.), 

Handbook of Hydrology. McGraw-Hill, Inc (Chapter 22). 

ASTM, 1992. Standard Practice for Evaluating Mathematical Models 

for the Environmental Fate of Chemicals. Standard E978-92, 

American Society for Testing and Materials, http://www.astm.org. 

ASTM, 1994. Standard Guide for Application of a Ground-Water 

Flow Model to a Site-Specific Problem. Standard D5447-93, 

American Society for Testing and Materials, http://www.astm.org. 

Balint, G., 2002. State-of-the-art for flood forecasting modelling. In: 

Refsgaard, J.C. (Ed.), State-of-the-Art Report on Quality Assurance 

in Modelling Related to River Basin Management. Chapter 7, 

Geological Survey of Denmark and Greenland, Copenhagen, 

http://www.harmoniqua.org. 

BDMF, 2000. Protocols for Water and Environmental Modeling. 

Bay-Delta Modeling Forum. Ad hoc Modeling Protocols Committee, 

http://www.sfei.org/modelingforum/. 

Blind, M., 2004. ICT requirements for an ‘evolutionary’ development 

of WFD compliant River Basin Management Plans. In: Pahl, C., 

Schmidt, S., Jakeman, T. (Eds.), iEMSs 2004 International 

Congress: ‘‘Complexity and Integrated Resources Management’’. 

International Environmental Modelling and Software Society, 

Osnabrück, Germany, June 2004. 

CAMASE, 1996. CAMASE was a Concerted Action for the Development 

and Testing of Quantitative Methods for research on 

Agricultural Systems and the Environment, http://www.bib.wau. 

nl/camase/. 

Da Silva, M.C., Barbosa, A.E., Rocha, J.S., Fortunato, A.B., 2002. 

State-of-the-art for surface water quality modelling. In: Refsgaard, 

J.C. (Ed.), State-of-the-Art Report on Quality Assurance in 

Modelling Related to River Basin Management. Chapter 8, 



DHI, 2002. MIKE 11 User Guide. DHI Water & Environment, 


Forkel, C., 1996. Das numerische Modell – ein schmaler Grat zwischen 

vertrauenswürdigem Werkzeug und gefährlichem Spielzeug. Presented 

at the 26. IWASA, RWTH Aachen, 4–5 January 1996. 

GWP-TAC, 2000. Integrated Water Management, TEC Background 

Papers No. 4, Global Water Partnership, SE-105 25 Stockholm, 

Sweden, ISBN: 91-630-9229-8. 

Heinz, I., Eberle, S., 2002. State-of-the-art for socio-economic 

modelling. In: Refsgaard, J.C. (Ed.), State-of-the-Art Report on 

Quality Assurance in Modelling Related to River Basin Management. 

Chapter 10, Geological Survey of Denmark and Greenland, 

Copenhagen, http://www.harmoniqua.org. 

Henriksen, H.J., 2002a. Australian groundwater modelling guidelines. 

In: Refsgaard, J.C. (Ed.), State-of-the-Art Report on Quality 

Assurance in Modelling Related to River Basin Management. 

Chapter 13, Geological Survey of Denmark and Greenland, 

Copenhagen, http://www.harmoniqua.org. 

Henriksen, H.J., 2002b. Danish groundwater modelling guidelines. In: 


in Modelling Related to River Basin Management. Chapter 

14, Geological Survey of Denmark and Greenland, Copenhagen, 


Henriksen, H.J., Troldborg, L., Nyegaard, P., Sonnenborg, T.O., 

Refsgaard, J.C., Madsen, B., 2003. Methodology for construction, 

calibration and validation of a national hydrological model for 

Denmark. Journal of Hydrology 280 (1–4), 52–71. 

Kassahun, A., Scholten, H., Zompanakis, G., Gavardinas, C., 2004. 

Support for model based water management with the HarmoniQuA 

toolbox. In: Pahl, C., Schmidt, S., Jakeman, T. (Eds.), 

iEMSs 2004 International Congress: ‘‘Complexity and Integrated 

Resources Management’’. International Environmental Modelling 

and Software Society, Osnabrück, Germany, June 2004. 

Klemes, V., 1986. Operational testing of hydrological simulation 

models. Hydrological Sciences Journal 31, 13–24. 

Merrick, N.P., Middlemis, H., Ross, J.B., 2002. Groundwater 

Modelling Guidelines for Australia – Recommended Procedures 

for Modelling Reviews. International Groundwater Conference. 

Balancing the Groundwater Budget. Northern Territory. Australia. 

12–17 May 2002. 

Metelka, T., Krejcik, J., 2002a. State-of-the-art for hydrodynamic. In: 


in Modelling Related to River Basin Management. Chapter 6, 



Metelka, T., Krejcik, J., 2002b. Quality assurance in Central and 

Eastern Europe. In: Refsgaard, J.C. (Ed.), State-of-the-Art Report 

on Quality Assurance in Modelling Related to River Basin 

Management. Chapter 15, Geological Survey of Denmark and 

Greenland, Copenhagen, http://www.harmoniqua.org. 

Middlemis, H., 2000. Murray-Darling Basin Commission. Groundwater 

Flow Modelling Guideline. Aquaterra Consulting Pty Ltd. 

South Perth. Western Australia. Project no. 125. 

NRC, 1990. Ground Water Models: Scientific and Regulatory 

Applications. National Research Council, National Academy 

Press, Washington, D.C. 

Old, G.H., Packman, J.C., Calver, A.N., 2002. State-of-the-art 

for biota (ecological) modelling. In: Refsgaard, J.C. (Ed.), 

State-of-the-Art Report on Quality Assurance in Modelling 

Related to River Basin Management. Chapter 9, Geological 

Survey of Denmark and Greenland, Copenhagen, http://www. 

harmoniqua.org. 

Packman, J.C., 2002. Quality Assurance in the UK. In: Refsgaard, J.C. 

(Ed.), State-of-the-Art Report on Quality Assurance in Modelling 




Pahl-Wostl, C., 2002. Towards sustainability in the water sector – the 


Aquatic Sciences 64, 394–411. 

Pascual, P., Stiber, N., Sunderland, E., 2003. Draft Guidance on the 

Development, Evaluation, and Application of Regulatory Environmental 

Models. Council for Regulatory Environmental Modeling. 

US EPA, Washington D.C. 

Perrin, C., Andreassian, V., Michel, C., 2002a. State-of-the-art for 

precipitation-runoff modelling. In: Refsgaard, J.C. (Ed.), State-of-the-Art 

Report on Quality Assurance in Modelling Related to 

River Basin Management. Chapter 5, Geological Survey of Denmark 

and Greenland, Copenhagen, http://www.harmoniqua.org. 

Perrin, C., Andreassian, V., Michel, C., 2002b. Quality assurance for 

precipitation-runoff modelling in France. In: Refsgaard, J.C. (Ed.), 

State-of-the-Art Report on Quality Assurance in Modelling 



harmoniqua.org.


Refsgaard, J.C. (Ed.), 2002. State-of-the-Art Report on Quality 

Assurance in Modelling Related to River Basin Management. 

Report from the EU research project HarmoniQuA, http://www. 

harmoniqua.org. 18 chapters, 182 pp. Geological Survey of 

Denmark and Greenland, Copenhagen. 

Refsgaard, J.C., Henriksen, H.J., 2002. State-of-the-art for Groundwater 

Modelling. In: Refsgaard, J.C. (Ed.), State-of-the-Art Report 

on Quality Assurance in Modelling Related to River Basin 

Management. Chapter 4, Geological Survey of Denmark and 

Greenland, Copenhagen, http://www.harmoniqua.org. 

Refsgaard, J.C., Henriksen, H.J., 2004. Modelling guidelines – 

terminology and guiding principles. Advances in Water Resources 

27, 71–82. 

Rumbaugh, J.O., Rumbaugh, D.B., 2001. Guide to Using Groundwater 

Vistas. Environmental Simulations, Inc, Virginia, USA. 

Rykiel, E.R., 1996. Testing ecological models: the meaning of 

validation. Ecological Modelling 90, 229–244. 

Scholten, H., Van der Tol, M.W.M., 1998. Quantitative validation of 

deterministic models: when is a model acceptable? In: Obaidat, M.S., 

Davoli, F., DeMarinis, D. (Eds.), The Proceedings of the Summer 

Computer Simulation Conference. SCS, The Society for Computer 

Simulation International, San Diego, CA, USA, pp. 404–409. 

Scholten, H., Groot, S., 2002. Dutch guidelines. In: Refsgaard, J.C. 

(Ed.), State-of-the-Art Report on Quality Assurance in Modelling 

Related to River Basin Management. Chapter 12, Geological 



Scholten, H., Van Waveren, R.H., Groot, S., Van Geer, F.C., Wösten, 

J.H.M., Koeze, R.D., Noort, J.J., 2000. Good Modelling Practice 

in Water Management. Paper Presented on Hydroinformatics 

2000, Cedar Rapids, IA, USA. 

Van Asselt, M.B.A., Rotmans, J., 2002. Uncertainty in integrated 

assessment modelling – From positivism to pluralism. Climatic 

Change 54 (1–2), 75–105. 

Van Gils, J.A.G., Groot, S., 2002. Examples of good modelling 

practice in the Danube Basin. In: Refsgaard, J.C. (Ed.), State-of-the-Art 

Report on Quality Assurance in Modelling Related to 

River Basin Management. Chapter 18, Geological Survey of 

Denmark and Greenland, Copenhagen, http://www.harmoniqua. 

org. 

Van Waveren, R.H., Groot, S., Scholten, H., Van Geer, F.C., Wösten, 

J.H.M., Koeze, R.D., Noort, J.J., 2000. Good Modelling Practice 

Handbook, STOWA Report 99-05, Utrecht, RWS-RIZA, Lelystad, 

The Netherlands, http://waterland.net/riza/aquest/ (In Dutch).

[14] 

Refsgaard JC, Nilsson B, Brown J, Klauer B, Moore R, Bech T, Vurro M, 

Blind M, Castilla G, Tsanis I, Biza P (2005) Harmonised techniques and 

representative river basin data for assessment and use of uncertainty 

information in integrated water management (HarmoniRiB). 

Environmental Science and Policy, 8, 267-277. 

Reprinted from Environmental Science and Policy with permission from Elsevier

Environmental Science & Policy 8 (2005) 267–277 

www.elsevier.com/locate/envsci 

Harmonised techniques and representative river basin data for 

assessment and use of uncertainty information in 

integrated water management (HarmoniRiB) 

Jens Christian Refsgaard a, *, Bertel Nilsson a , James Brown b , 

Bernd Klauer c , Roger Moore d , Thomas Bech e , Michele Vurro f , Michiel Blind g , 

Guillermo Castilla h , Ioannis Tsanis i , Pavel Biza j 

a Geological Survey of Denmark and Greenland (GEUS), Department of Hydrology, Øster Voldgade, DK-1350 Copenhagen, Denmark 

b Universiteit van Amsterdam (UVA), Amsterdam, The Netherlands 

c Centre for Environmental Research (UFZ), Leipzig, Germany 

d Centre for Ecology and Hydrology (CEH), Wallingford, UK 

e DHI Water and Environment (DHI), Hørsholm, Denmark 

f Istituto di Ricerca Sulle Acque del CNR (IRSA), Bari, Italy 

g Institute of Inland Water Management and Waste Water Treatment (RIZA), Lelystad, The Netherlands 

h Universidad de Castilla – La Mancha (UCLM), Albacete, Spain 

i Technical University Crete (TUC), Chania, Greece 

j Povodi Moravi (PM), Brno, Czech Republic 

Abstract 

This paper describes progress on HarmoniRiB, a European Commission Framework 5 project. The HarmoniRiB project aims to support 

the implementation of the EU Water Framework Directive (WFD) by developing concepts and tools for handling uncertainty in data and 

modelling, and by designing, building and populating a database containing data and associated uncertainties for a number of representative 

basins. This river basin network aims at becoming a ‘virtual laboratory for modelling studies’, and it will be made available for the scientific 

community. The data may, e.g. be used for comparison and demonstration of methodologies and models relevant to the WFD. 

© 2005 Elsevier Ltd. All rights reserved. 

Keywords: Uncertainty; River basin management; Data; Models; River basin network; HarmoniRiB; Water Framework Directive 


1.1. Problems to be addressed 

The Water Framework Directive (WFD) provides a 

European policy basis at the river basin scale. The river basin 

management and planning process prescribed in the WFD is 

an adaptation of the Integrated Water Resources Management 

principles (GWP, 2000), involving all physical 

domains in water management, sectors of water use, 

socio-economics and stakeholder participation. As such, 

the WFD poses new challenges to water resources managers. 

The traditional physical domain specific and sectoral 

approaches need to be combined and extended to fulfil 

the WFD requirements. The preparation of the river basin 

management plans, prescribed in the WFD, is furthermore 

influenced by uncertainties on the underlying data and 

modelling results. In several sections of the WFD document, 

uncertainty is addressed (Blind and de Blois, 2003). In 

addition, most of the WFD guidance documents, being more 

specific than the WFD document itself, explicitly emphasise 

that uncertainty analyses should be performed. However, in 

spite of strong recommendations to consider uncertainty 

aspects the guidance documents do not include recommendations 

on how to do so. 

* Corresponding author. Tel.: +45 38 14 27 76; fax: +45 38 14 20 50. 

1462-9011/$ – see front matter © 2005 Elsevier Ltd. All rights reserved. 

doi:10.1016/j.envsci.2005.02.001

Therefore, there is a clear and urgent need for developing 

new concepts, methodologies and tools that can be used to 

assist in implementing the WFD. In order to support such 

research and development, it is necessary to have a network 

of representative river basins with datasets suitable for this 

purpose. This implies that the datasets, in addition to 

covering the diversity in terms of ecological regimes and 

socio-economic conditions found across Europe, must have 

built-in information on the uncertainties in the data. 

1.2. Objectives 

The paper presents status and preliminary results from an 

ongoing research project, HarmoniRiB, that is supported 

under EU’s 5th Framework Programme. The overall goal of 

HarmoniRiB is to develop methodologies for quantifying 

uncertainty and its propagation from the raw data to concise 

management information. The four specific project objectives 

are: 

To establish a practical methodology and a set of tools for 

assessing and describing uncertainty originating from 

data and models used in decision making processes for the 

production of integrated water management plans. It will 

include a methodology for integrating uncertainties on 

basic data and models and socio-economic uncertainties 

into a decision support concept applicable for implementation 

of the WFD. 

To provide a conceptual model for data management that 

can handle uncertain data and implement it for a network 

of representative river basins. 

To provide well documented datasets, suitable for 

studying the influence of uncertainty on management 

decisions for a network of representative river basins and 

to provide examples of their use in the development of 

integrated water management plans. 

To disseminate intermediate and final results among 

researchers and end-users across Europe and obtain and 

incorporate feedback on the methodologies, tools and the 

datasets. 

2. Uncertainty assessments 

2.1. Definitions and taxonomy 

Uncertainty and associated terms such as error, risk and 

ignorance are defined and interpreted differently by different 

authors (see Walker et al., 2003 for a review). The different 

definitions reflect, among other factors, the different 

scientific disciplines and philosophies of the authors 

involved, as well as the intended audience. In addition they 

vary depending on their purpose. Some are rather generic, 

such as Funtowicz and Ravetz (1990), while others apply 

more specifically to model based water management, such as 

Beck (1987). The terminology used in HarmoniRiB has 

emerged after discussions between social scientists and 

natural scientists specifically aiming at applications in 

model based water management (Klauer and Brown, 2003). 

By doing so we adopt a subjective interpretation of 

uncertainty in which the degree of confidence that a decision 

maker has about possible outcomes and/or probabilities of 

these outcomes is the central focus. Thus, according to our 

definition a person is uncertain if s/he lacks confidence 

about the specific outcomes of an event. Reasons for this lack 

of confidence might include a judgement that the information 

is incomplete, blurred, inaccurate, imprecise or 

potentially false. Similarly, a person is certain if s/he is 

confident about the outcome of an event. It is possible that a 

person feels certain but has misjudged the situation (i.e. s/he 

is wrong). 

There are many different (decision) situations, with 

different possibilities for characterising what we know or 

do not know and of what we are certain or uncertain. A first 

distinction is between ignorance as a lack of awareness 

about imperfect knowledge and uncertainty as a state of 

confidence about knowledge (which includes the act of 

ignoring). Our state of confidence may range from being 

certain to admitting that we know nothing (of use), and 

uncertainty may be expressed at a number of levels in 

between. Regardless of our confidence in what we know, 

ignorance implies that we can still be wrong (‘in error’). In 

this respect Brown (2004) has defined a taxonomy of 

imperfect knowledge illustrated in Fig. 1. 

In evaluating uncertainty, it is useful to distinguish 

between uncertainty that can be quantified, e.g. by 

probabilities and uncertainty that can only be qualitatively 

described, e.g. by scenarios. If one throws a balanced die, the 

precise outcome is uncertain, but the ‘attractor’ of a perfect 

die is certain: we know precisely the probability for each of 

the 6 outcomes, each being 1/6. This is what we mean by 

‘uncertainty in terms of probability’. However, the estimates 

for the probability of each outcome can also be uncertain. If 

a model study says: ‘‘there is a 30% probability that this area 

will flood two times in the next year’’, there is not only 

‘uncertainty in terms of probability’ but also uncertainty 

regarding whether the estimate of 30% is a reliable estimate. 

Secondly, it is useful to distinguish between bounded 

uncertainty, where all possible outcomes have been 

identified (they can be distinct or indistinct) and unbounded 

uncertainty, where the known outcomes are considered 

incomplete. Since quantitative probabilities require ‘all 

possible outcomes’ of an uncertain event and each of their 

individual probabilities to be known, they can only be 

defined for ‘bounded uncertainties’. If probabilities cannot 

be quantified in any undisputed way, we often can still 

qualify the available body of evidence for the possibility of 

various outcomes. 
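
The difference between 'uncertainty in terms of probability' and uncertainty about the probability estimate itself can be illustrated numerically. The following minimal Python sketch is an illustration only and is not part of the HarmoniRiB tools. It lists the exact probabilities of a balanced die (a bounded uncertainty, so the probabilities sum to one) and then uses the standard binomial standard error, sqrt(p(1-p)/n), to show how the reliability of an estimated probability of 0.30 depends on the number of independent observations n behind it. The function name and the sample sizes are assumptions chosen for the example.

```python
from fractions import Fraction
from math import sqrt

# Bounded uncertainty with known probabilities: a balanced die.
die_probabilities = {face: Fraction(1, 6) for face in range(1, 7)}
# All possible outcomes are identified, so the probabilities sum to one.
assert sum(die_probabilities.values()) == 1


def standard_error(p_hat: float, n: int) -> float:
    """Binomial standard error of a probability estimated from n independent trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical sample sizes: the same estimate of 0.30 is far less
# reliable when it rests on 10 observations than on 1000.
for n in (10, 100, 1000):
    print(f"n={n:5d}  estimate=0.30  standard error={standard_error(0.30, n):.3f}")
```

The standard error shrinks with the square root of n, which is one simple way to express how reliable an estimate such as '30%' is. It only applies when the outcomes are bounded and the observations can reasonably be treated as independent.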

The bounded uncertainty where all probabilities are 

deemed known (Fig. 1) is often denoted ‘statistical 

uncertainty’ (e.g. Walker et al., 2003). This is the case 

traditionally addressed in model based uncertainty assessment. 

It is important to note that this case constitutes one of 

many decision situations outlined in Fig. 1, and in other 

situations the main uncertainty in a decision situation cannot 

be characterised statistically. 

Fig. 1. Taxonomy of imperfect knowledge resulting in different uncertainty situations (Brown, 2004). 

2.2. Framework for describing data uncertainty 

By considering space–time variability and data type, 

Brown et al. (2005) have distinguished 13 uncertainty 

categories of uncertain data (Table 1). 

By considering measurement scale, it becomes possible 

to quickly limit the relevant uncertainty models for a certain 

variable. On a discrete measurement scale, for example, it is 

only relevant to consider discrete probability distribution 

functions, whereas continuous density functions are required 

for continuous numerical data. In addition, the use of space 

and time variability determines the need for autocorrelation 

functions alongside a probability density function (pdf). 

Brown et al. (2005) explain that this classification of data by 

measurement scale and space–time variability is useful for 

uncertainty assessment because: (1) it reduces the amount of 

information requested from the user in populating a 

database; (2) it reduces the amount of information stored in a 

database (model parameter values); (3) it ensures a close 

relationship between the structure of the probability model 

and the techniques used to estimate its parameters and; (4) it 

encourages planning of measurement campaigns for 

collecting information on uncertainty. 
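
The way the two classification axes narrow down the relevant uncertainty model can be sketched as a simple lookup. The Python example below is a hypothetical illustration and not the HarmoniRiB implementation: the codes follow Table 1 (A to D for space–time variability, 1 to 3 for measurement scale, 4 for narrative data), while the class, field and function names are assumptions. The correlation flags only indicate that some description of dependence in time or space is needed; for categorical data this may require techniques other than conventional autocorrelation functions.

```python
from dataclasses import dataclass

# Row letters and column digits follow the coding of Table 1.
VARIABILITY = {
    "A": (False, False),  # constant in space and time
    "B": (True, False),   # varies in time, not in space
    "C": (False, True),   # varies in space, not in time
    "D": (True, True),    # varies in time and space
}
SCALE = {
    "1": "continuous probability density function",
    "2": "discrete probability distribution",
    "3": "class probabilities (categorical)",
}


@dataclass(frozen=True)
class UncertaintyModelSpec:
    distribution: str
    needs_temporal_correlation: bool
    needs_spatial_correlation: bool


def model_spec(code: str) -> UncertaintyModelSpec:
    """Return the model ingredients suggested by an uncertainty-category code."""
    if code == "4":  # narrative data: no probability model is implied here
        return UncertaintyModelSpec("qualitative description", False, False)
    if len(code) != 2 or code[0] not in VARIABILITY or code[1] not in SCALE:
        raise ValueError(f"unknown uncertainty category: {code!r}")
    in_time, in_space = VARIABILITY[code[0]]
    return UncertaintyModelSpec(SCALE[code[1]], in_time, in_space)


all_codes = [row + col for row in VARIABILITY for col in SCALE] + ["4"]
print(len(all_codes))        # 13 categories
print(model_spec("B1"))      # continuous variable that varies in time only
```

A lookup of this kind reflects advantage (1) above: once the user has stated the category, only the parameters of the matching model need to be requested.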

Each data category is associated with a range of 

uncertainty models, for which more specific pdfs may be 

developed with different simplifying assumptions (e.g. 

Gaussian; second-order stationarity; degree of temporal and 

spatial autocorrelation). The advantages of allowing a range 

of possible models for each data category are threefold. 

First, there is a need to explicitly define an appropriate set of 

statistical assumptions for a particular dataset. Secondly, a 

range of possible assumptions can be defined a priori, and 

hence the significance of particular assumptions can be 

demonstrated with examples. Finally, the trade-off between 

model complexity, identifiability and reliability can be 

reviewed over time and balanced against the (changing) 

practical constraints on assessing uncertainty. For example, 

levels of risk and expertise can be associated with the 

simplifying assumptions allowed in a pdf, with default 

models for low-risk applications involving users with 

limited expertise. Minimum requirements can also be 

identified for specific datasets, such as data on toxic 

chemicals. 

Table 1 

The subdivision and coding of uncertainty-categories, along the ‘axes’ of space–time variability and measurement scale (Brown et al., 2005) 

Space–time variability         Measurement scale 
                               Continuous numerical   Discrete numerical   Categorical   Narrative 
Constant in space and time     A1                     A2                   A3            4 
Varies in time, not in space   B1                     B2                   B3            4 
Varies in space, not in time   C1                     C2                   C3            4 
Varies in time and space       D1                     D2                   D3            4 

Narrative data form a single category (4) spanning all four rows. 

Categorical data (3) differ from numerical data (1, 2) and 

narrative (4) in three important ways. First, categorical data 

cannot be manipulated statistically (i.e. computation of 

mean and variance), because the categories are not measured 

on a numerical scale. Secondly, individual values may be 

assigned to unique classes (one value to one class), where 

pdfs are based on the measured frequency, or perceived 

probability (Bayes rule), that a value occurs in a particular 

‘hard’ class or they can be partially assigned to multiple 

classes (fuzzy), where probabilities reflect doubt about the 

proportional membership of a value to a particular class 

(Heuvelink and Burrough, 1993). For the purposes of an 

uncertainty analysis, this distinction is important, because 

accuracy assessments are more complicated for fuzzy 

descriptions of reality. An important issue often overlooked 

with categorical data (e.g. the confusion matrix in landcover 

classification) is the problem of correlation in space 

and time or between datasets, since traditional statistical 

techniques do not apply to categorical data. 
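
The contrast between 'hard' classes and fuzzy membership can be made concrete with a small example. The Python sketch below is an illustration under stated assumptions: the land-cover classes, the counts in the confusion matrix and all names are hypothetical, and the matrix is assumed to be laid out with the mapped class as the outer key and the class observed at reference sites as the inner key. For hard classes it derives probabilities from measured frequencies. The fuzzy case is shown only as a set of proportional memberships.

```python
from typing import Dict

# Hypothetical confusion matrix for a land-cover map: outer key is the
# mapped class, inner key is the class found at reference sites (counts).
confusion: Dict[str, Dict[str, int]] = {
    "forest": {"forest": 80, "grass": 15, "water": 5},
    "grass":  {"forest": 10, "grass": 85, "water": 5},
    "water":  {"forest": 2,  "grass": 3,  "water": 95},
}


def class_probabilities(mapped_class: str) -> Dict[str, float]:
    """Relative frequency of each reference class, given the mapped class."""
    row = confusion[mapped_class]
    total = sum(row.values())
    return {reference: count / total for reference, count in row.items()}


# 'Hard' classes: each value belongs to exactly one class, and the
# probabilities express doubt about which class that is.
print(class_probabilities("forest"))

# Fuzzy description: a value is partially assigned to several classes;
# the numbers describe proportional membership of each class.
fuzzy_membership = {"forest": 0.6, "grass": 0.4, "water": 0.0}
```

Note that neither representation supports a mean or variance of the classes themselves, and the confusion matrix carries no information about correlation between neighbouring locations or time steps.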

Reviews with results on data uncertainty reported in the 

literature have been compiled into a guideline report for 

assessing uncertainty in various types of data originating 

from meteorology, soil physics and geochemistry, hydrogeology, 

land cover, topography, discharge, surface water 

quality, ecology and socio-economics (Van Loon and 

Refsgaard, 2005). 

2.3. Software tool to support uncertainty assessment in 

data and models 

The components of the HarmoniRiB uncertainty software 

are shown in Fig. 2. 

There are four software components in the HarmoniRiB 

design, namely: (1) a module for assessing uncertainties in 

data and storing this information within a database design 

(assess data uncertainty; the database design is described 

briefly below); (2) a module for assessing uncertainties in 

models (assess model uncertainty); (3) a module for 

sampling from a distribution of uncertain inputs and 

(possibly) model parameters and implementing the model 

for each realisation of the uncertain inputs and parameters 

(uncertainty propagation); (4) a module for synthesising and 

presenting the uncertainty results ( present uncertainty). 

The Data Uncertainty Engine (DUE) is illustrated in 

Fig. 3. It separates the analysis of data uncertainties into four 

stages, whereby objects are first imported into the software 

(1), the sources of uncertainty are then identified (2) 

(important for a structured analysis) and are translated into a 

simple model (3) (e.g. probability model) from which 

‘alternative realities’ can be generated. These ‘alternative 

realities’ are used in an uncertainty propagation analysis to 

establish the impacts of data uncertainty on other operations, 

such as modelling. Finally, it is necessary to reflect on the quality of an uncertainty analysis (4), as such analyses are fraught with assumptions and difficulties and can be misleading without quality control. The information required to

generate ‘alternative realities’ of one or more environmental 

attributes is stored in the project database (see below). 
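The uncertainty propagation step described above can be illustrated with a minimal Monte Carlo sketch. It is not the HarmoniRiB software: the rainfall distribution, the runoff coefficient range and the function `toy_runoff_model` are hypothetical stand-ins chosen only to show how ‘alternative realities’ of uncertain inputs and parameters are sampled, passed through a model, and summarised.

```python
import random
import statistics


def sample_rainfall(rng, mean_mm, sd_mm, n_days):
    """Draw one 'alternative reality' of daily rainfall, truncated at zero."""
    return [max(0.0, rng.gauss(mean_mm, sd_mm)) for _ in range(n_days)]


def toy_runoff_model(rainfall, runoff_coeff):
    """Hypothetical stand-in for a hydrological model: total runoff in mm."""
    return runoff_coeff * sum(rainfall)


def propagate(n_realisations=1000, seed=1):
    """Run the model once per realisation of uncertain inputs and parameters."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_realisations):
        # Uncertain input (assumed normal) and uncertain parameter (assumed uniform).
        rainfall = sample_rainfall(rng, mean_mm=3.0, sd_mm=1.0, n_days=30)
        runoff_coeff = rng.uniform(0.3, 0.5)
        results.append(toy_runoff_model(rainfall, runoff_coeff))
    return results


results = propagate()
cuts = statistics.quantiles(results, n=20)  # 19 cut points: 5 %, 10 %, ..., 95 %
summary = {"mean": statistics.mean(results), "p05": cuts[0], "p95": cuts[-1]}
print(summary)
```

The spread between `p05` and `p95` is the kind of result that the presentation module would report alongside the central estimate.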

The methodology proposed for assessing model uncertainty 

is outlined in Refsgaard et al. (submitted for 

publication). 

2.4. Uncertainty in socio-economics 

Often uncertainty assessments are confined to uncertainties 

in data and models originating from natural science. We 

also consider uncertainty in socio-economic aspects by 

developing concepts based on the management of water 

resources and river basins (e.g. Cech, 2003). It takes into 

account literature on evaluation, e.g. cost-benefit analysis 

(Hanley and Spash, 1993; Bergstrom et al., 2001), multicriteria 

analysis (Roy, 1996; Munier, 2004) and decision 

making under uncertainty (Jungermann et al., 1998). The 

innovative aspects of our work lie in the further development of these ideas to support the implementation of the WFD and particularly elaborating the role of uncertainty in the process of creating and selecting management measures.

Fig. 2. HarmoniRiB software components.

Fig. 3. Screen shots from the HarmoniRiB data uncertainty assessment tool.

The uncertainty in socio-economic data of official 

statistics (Eurostat, Statistical bureaus of German Länder 

and the FRG) has been surveyed. We found that the efforts to 

produce accurate economic data are enormous but the 

knowledge and awareness of the remaining uncertainties is 

generally low. Despite the lack of knowledge and awareness 

about uncertainty in socio-economic data and their sources 

we judge the consideration of these uncertainties in river 

basin management as highly relevant. On the basis of our 

investigations and our experience, we expect that it will be 

difficult to reach a meaningful quantification of many of 

these uncertainties. Methods for the systematic collection of 

qualitative information on uncertainties as well as strategies 

to deal with uncertainties that are not necessarily based on 

quantification are therefore needed. 

3. Databases for accommodating uncertain data 

3.1. Functionality with respect to data uncertainty 

We have designed and developed software for a database

that can handle data and data uncertainty. The novelty of

this database is that it meets the following requirements: 

- It can store time-series data.

- It can store spatial data, both raster and vector, as well as time-series of spatial data.

- It can store information about uncertainty in these data.

The uncertainty characteristics are described according to 

the uncertainty categories listed in Table 1. This implies that 

for the continuous data types the uncertainty is described by 

use of a probability density function (pdf) and a correlation 

matrix (or correlation function) for normally distributed 

data. For categorical data (such as land cover or soil type), a 

non-parametric distribution is typically required, and may be 

stored alongside transition probabilities for describing statistical 

dependence. The HarmoniRiB database design therefore 

allows the user to associate a probability model with 

each uncertain data item. In future, the database will be 

extended to allow numerical bounds (e.g. confidence intervals) 

and scenarios when probabilities cannot be defined. 

Information on the sources of uncertainty and the quality of 

an uncertainty model is also stored in the database. 

An initial list of pdfs and autocorrelation functions is

included in a Probability Distribution Function Dictionary 

and an Autocorrelation Function Dictionary of the database. 

In addition the software will allow a user to add new 

functions when required. In practice, it may not be possible 

to calculate the pdf parameters for every attribute value in 

the database individually. It may only be feasible to calculate 

them at the level of the attribute with which the value is 

associated (i.e. an assumption of stationarity in space or 

time). In all cases, an uncertainty model is referenced by an 

Uncertainty Model ID (UMID), which acts as a pointer to an 

uncertainty model that applies to a specific location in space 

or time and to the information on statistical dependence 

between locations and attributes.
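The relationship between stored values, the pdf dictionary and the Uncertainty Model ID can be sketched as follows. This is an illustrative data structure only, not the actual HarmoniRiB table design: the names `PDF_DICTIONARY`, `UncertaintyModel` and `StoredValue`, the example flows, and the choice to treat the pdf as an additive error around the stored value are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical pdf dictionary: name -> function drawing one sample.
PDF_DICTIONARY = {
    "normal": lambda rng, p: rng.gauss(p["mean"], p["sd"]),
    "uniform": lambda rng, p: rng.uniform(p["low"], p["high"]),
}


@dataclass
class UncertaintyModel:
    umid: int          # Uncertainty Model ID used as the pointer
    pdf_name: str      # key into PDF_DICTIONARY
    params: dict       # pdf parameters
    source: str = ""   # note on the source of uncertainty

    def sample_error(self, rng):
        return PDF_DICTIONARY[self.pdf_name](rng, self.params)


@dataclass
class StoredValue:
    value: float
    umid: Optional[int] = None  # None: no uncertainty model assigned


UNCERTAINTY_MODELS = {
    1: UncertaintyModel(1, "normal", {"mean": 0.0, "sd": 0.2}, "gauge error (hypothetical)"),
}

# Attribute-level assignment (stationarity assumption): every value of the
# attribute points to the same model instead of carrying its own parameters.
river_flow = [StoredValue(v, umid=1) for v in (4.1, 3.8, 5.6)]


def realisation(values, rng):
    """Generate one realisation: stored value plus a sampled error (assumed additive)."""
    out = []
    for v in values:
        if v.umid is None:
            out.append(v.value)
        else:
            out.append(v.value + UNCERTAINTY_MODELS[v.umid].sample_error(rng))
    return out


# A user-added function, registered in the dictionary when required.
PDF_DICTIONARY["triangular"] = lambda rng, p: rng.triangular(p["low"], p["high"], p["mode"])

rng = random.Random(7)
print(realisation(river_flow, rng))
```

Storing only a pointer per value keeps the parameters in one place, which is what makes the attribute-level (stationary) assignment cheap to express.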



3.2. General database functionality 

The overall aim of the HarmoniRiB database system is to 

enable the HarmoniRiB Data Centre to receive, quality

control, store and make available the representative basin

data being assembled by the project. Ideally, it should be

able to handle any data required for developing WFD-compliant

River Basin Management Plans. This includes

data for underlying modelling studies, and thus exceeds the 

WFD needs for reporting or river basin characterisations. 

The data will cover a wide range of water related topics but 

will mainly take the form of site descriptions and time series 

records. They will also include spatial data describing site 

locations, networks and variables such as land use or 

elevation. The proposed HarmoniRiB database design for 

holding these data is generic and is based on the WIS Cube 

(Moore, 1997). The major enhancements are not only the 

inclusion of uncertainty but also the seamless linking of 

metadata to data and a new underlying table design. 

At the user level, a HarmoniRiB database perceives the 

world as being composed of objects. These are any objects 

whose description and history the user wishes to record. The 

types or classes of object are decided by the user. Examples 

of object classes relevant to the WFD are sampling points, 

wells, reservoirs and rivers. 

The descriptions of objects and the events observed at 

them are recorded in terms of attribute values. Attributes, 

like object classes, are decided and defined by the user, the 

definitions being held in a dictionary. A wide range of spatial

and non-spatial data types are supported, allowing the 

system to record most known or foreseeable types of 

attribute information required for the implementation of the 

WFD. Examples of attributes are object identifiers (names, 

reference codes, serial numbers, etc.), position, mean daily 

river flow, concentration (of e.g. nitrate), soil type and 

hydraulic conductivity. 

At the conceptual level, there is no differentiation 

between spatial and non-spatial attributes. They are all 

stored within the same logical framework. 

One way of visualising the manner in which data are 

stored in a HarmoniRiB database is to imagine a large cube, 

made up of individual cells as shown in Fig. 4. The three 

axes of this cube represent objects (WHERE observations 

were made), attributes (which record WHAT the observation 

was a measure of) and occasions (WHEN the observations 

were made). Thus, each cell in the cube records the value of 

an attribute at a particular object for a particular point in 

time. For example, one cell might record the concentration 

of calcium on 29 June 2002 at 10:20 (GMT) in the river 

Thames at Wallingford. 

The design regards all attribute values as potentially 

changeable over time, thus enabling it to handle time-series 

data such as river flow. This facility applies to spatial 

attributes as well as conventional time series making it 

possible to track an object’s movement. There is no constraint on the number of objects, attributes or occasions which can be recorded, other than that imposed by the physical limits of the hardware. The Cube is otherwise unlimited in all directions.

Fig. 4. The Cube as a way of visualising how time series data are stored (Tindal et al., 2004).

The cells in the cube hold the users’ data. Each cell 

contains a single attribute value. A cell can also contain 

some or all of the following information associated with the 

value: 

- A qualifier for the value. A qualifier is an item of information which users may enter in order to amplify the meaning of an attribute value. For example, qualifiers may be useful in:

  - Bird or bacteriological count attributes, where the value may take the form of, say, ‘more than 10,000’. In this case, the value would be entered as 10,000, and the qualifier as >.

  - Chemical concentration attributes, where the actual concentration is unknown, but it is possible to say that it is less than a certain value, where the value represents the limit of detection of the analysis method. The value would be entered as the limit of detection, for example 0.001, and the qualifier as <.

- A method of derivation identifier. The method code is a user defined code identifying the source from which the value was obtained or the method by which it was derived. This information can be used, for example, by future users of the value, to determine its reliability.

- A measure of the value’s uncertainty in the form of a reference to an uncertainty model stored elsewhere in the database. This part of the requirement represents the major area of innovation and is likely to evolve as the project progresses.

- Dataset ID. Every value in the database has a pointer connecting the value to the dataset of which it is a member. The definition of what constitutes a dataset is up to the user. The only mandatory part of its definition is that the data values that make up a dataset must be owned by the same person or organisation. This condition is necessary to facilitate access control which will relate to ‘owned’ blocks of data.

- Uncertainty Model ID. Each value contains a reference to an uncertainty model, which describes the range of possible values that an attribute might take at a given location.
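The Cube and the contents of a cell can be pictured with a small in-memory sketch. This is not the physical table design; the class `Cell`, the helper functions, the method and dataset codes and the numerical values are hypothetical and serve only to show the three axes (object, attribute, occasion) and the optional information attached to each value.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Cell:
    value: float
    qualifier: Optional[str] = None   # e.g. "<" or ">"
    method: Optional[str] = None      # user-defined method-of-derivation code
    dataset_id: Optional[str] = None  # dataset (and hence owner) of the value
    umid: Optional[int] = None        # pointer to an uncertainty model


# The cube: (object, attribute, occasion) -> Cell
cube = {}


def put(obj, attribute, occasion, cell):
    cube[(obj, attribute, occasion)] = cell


def time_series(obj, attribute):
    """Return all (occasion, cell) pairs for one object and attribute, in time order."""
    hits = [(when, cell) for (o, a, when), cell in cube.items()
            if o == obj and a == attribute]
    return sorted(hits, key=lambda pair: pair[0])


site = "Thames at Wallingford"
t1 = datetime(2002, 6, 29, 10, 20, tzinfo=timezone.utc)
t2 = datetime(2002, 6, 30, 10, 20, tzinfo=timezone.utc)

# Values below are invented for illustration only.
put(site, "calcium", t2, Cell(108.0, method="LAB1", dataset_id="DS1", umid=1))
put(site, "calcium", t1, Cell(110.0, method="LAB1", dataset_id="DS1", umid=1))
# A concentration below the limit of detection: store the limit and the qualifier "<".
put(site, "pesticide", t1, Cell(0.001, qualifier="<", method="LAB2", dataset_id="DS1"))

for when, cell in time_series(site, "calcium"):
    print(when.isoformat(), cell.value)
```

Because every key includes an occasion, any attribute, spatial or not, can be read back as a time series in the same way.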

At the physical level, the data will be stored in a set of 

tables in a relational database such as Oracle. These will be 

held in a single account managed by the database administrator. 

Approved applications such as the data load facility 

will have direct access to this account and will be able to 

select and update data. Users and user written applications 

will be given read only access to the database via their own 

accounts. 

The database software is developed for application on an 

ArcSDE/ArcGIS platform using ESRI technology. 

4. River basin network and data 

Many networks of river basin data have been established 

for research purposes during the last couple of decades. A 

review of the characteristics of existing networks with 

respect to type of data, geographical coverage, data 

accessibility and data use by third parties is provided by 

Passarella and Vurro (2003). Examples of existing international 

networks are Flow Regimes from International 

Experimental and Network Data (FRIEND); Global Runoff 

Data Centre (GRDC); Hydrology for the Environment, Life 

and Policy (HELP); World Hydrological Cycle Observing 

System (WHYCOS); European River and Catchment 

Database Pilot Project (ERICA); Inventory of the Catchments 

for Research in Europe (ICARE) metadatabase and 

the Experimental Representative Basins (ERB) network and 

GLOWA. 

In addition to these international networks, many national 

databases containing data from national networks of river 

basins exist, e.g. Lowland Catchment Research (LOCAR); 

Data Storage for the Rijkswaterstaat (DONAR) and British 

Oceanographic Data Centre (BODC). 

Some of the existing networks provide data for 

operational purposes, while most of them have been 

established for research purposes. Many of these networks 

have existed for long periods and have served (and still do) 

important purposes. However, seen from a Water Framework 

Directive perspective, most of them have the key 

deficiency that they focus on only some aspects (domains) of data required for water management under the WFD, and most typically they do not contain data on ecological and socio-economic aspects. Even comprehensive national databases such as LOCAR and DONAR do not contain much data on groundwater, land use and socio-economics. Among the international networks HELP has the broadest scope with a focus on socio-economic aspects. HELP, however, does not include groundwater or coastal water data. Furthermore, HELP so far only consists of rather few river basins worldwide and does not have a good coverage in Europe.

Fig. 5. Location of the HarmoniRiB network of representative river basins.

Thus, none of the existing river basin networks can 

provide suitable datasets for supporting research on 

integrated water management of direct relevance for 

implementation of the WFD. In addition, none of the 

existing networks comprise any quantifiable information on 

data uncertainty. Consequently, it is concluded that there is a 

clear need to supplement the existing networks with a 

network of representative river basins that as its principal 

aim has to provide data supporting research in integrated 

water resources management as required by the WFD. The 

HarmoniRiB river basin network is meant for this purpose. 

The HarmoniRiB network of representative river basins

comprises eight basins; see Fig. 5 for locations and Table 2 for

characteristic features. These basins have been selected to 

ensure a good coverage across Europe in terms of ecoregions, 

types of water problems, socio-economic conflicts 

and amount and quality of existing data. In addition, two of 

the river basins (Odense and Jucar) are also included in the 

Pilot River Basin Network, where the EC guidance 

documents have been tested. The aim of HarmoniRiB is, 

through interaction with the respective river basin organisations 

and data owners, to provide well documented data for 

research purposes, suitable for studying the influence of 

uncertainty on management decisions. The data will be 

publicly accessible for all research purposes. Thus, scientists 

may use the data to, e.g. assess the appropriateness of 

models and other tools in relation to the WFD. 

For each of the eight river basins a comprehensive 

amount of data is presently being collected and uploaded to 

the HarmoniRiB database. The data basically include all 

data that are required to carry out analysis for the WFD 

implementation (Blind and de Blois, 2003). Most of the data 

are organised in seven datasets, one for each of the six 

domains: climate, rivers, lakes, groundwater, transitional 

waters, and coastal waters, and one for spatial data, river 

basin characteristics and socio-economic data. Specific lists 

of data have been prepared by matching the data 

requirements given in the guidance documents on ‘Monitoring’ 

(EC, 2003b) and ‘Analysis of pressures and impacts’ 

(EC, 2003a), with the data available in the respective river 

basins (Rasmussen, 2003). 

After collecting and reformatting the data they are being 

uploaded to the HarmoniRiB Data Centre. Subsequently, 

uncertainty will be assessed and added to the data following 

the framework outlined above. 

Table 2

Key characteristics of the HarmoniRiB network of representative river basins

| Country, river basin | Area (km²) | Population density (person/km²) | GNP (Euro/pers/year) | Dominant land use | Main water uses | Main conflicting interest |
|---|---|---|---|---|---|---|
| CZ, Svratka | 3998 | 142 | 5600 | Agriculture, forest | Drinking water, electrical power, recreation, nature | Flood protection, minimum discharges, water quality |
| DE, Weisse Elster | 5325 | 278 | 15000 | Agriculture | Drinking water, industry | Point and non-point sources; wastewater and contaminated sites; strong economic and social changes |
| DK, Odense | 1090 | 135 | 25000 | Agriculture | Public water supply, recreation, nature | Agricultural contamination; groundwater abstraction depletes stream flow and wetlands |
| ES, Jucar | 21328 | 28 | 9900 | Agriculture | Irrigation, hydroelectric, touristic supply, industry | Farming use; hydroelectrical use; touristic water demand |
| GR, Geropotamou | 600 | 66 | 10000 | Agriculture | Irrigation, touristic | Water shortage, water quality, oversized dam, salt intrusion, difficulties in sharing water among municipalities |
| IT, Candelaro | 1980 | 230 | 10277 | Agriculture | Irrigation, industry | Water shortage; rainfall rates decrease; intensive horticultural farming |
| NL + DE, Vecht | 3780 (1980 in NL) | 311 | 19000 | Industry, agriculture, habitation | Agriculture, drinking water, receiving water, recreation | Agriculture, water quality, ecology, flooding - room for water retention |
| UK, Thames | 12917 | 929 | 30000 | Urban, agriculture | Public water supply, ecosystem, recreation | Water supply vs. ecology |


5. Case studies 

The methodologies will be tested through one case study in each of the eight river basins. The focus in the case studies will be assessment of

uncertainties related to various aspects of the decision 

process related to evaluating potential measures for 

achieving the WFD objective of good ecological status. 

The following aspects of uncertainty will be considered: 

- Uncertainty related to framing of the decision making process. This uncertainty will typically be described in qualitative terms.

- Uncertainty related to prediction of effects of a given measure, i.e. what is the impact of a given management decision such as changes in agricultural practice or abstraction of groundwater. Such predictions will often be made by use of hydrological models and involve the following sources of uncertainty:

  - Uncertainty of input data.

  - Uncertainty of model parameter values.

  - Uncertainty of model techniques (numerical solution, software bugs, etc.).

  - Uncertainty of model structure.

- Uncertainty on economic assessments, which, as for uncertainty in hydrological model predictions, may originate from economic data and from the choice of evaluation method.

A key problem in assessing the uncertainty of the effects 

of a measure is that the effects usually are estimated as a 

difference between two model simulations, e.g. a reference 

run describing the present conditions and a run where the 

measure is taken into account. Procedures for assessing

uncertainty of a model simulation are well known, while

procedures for assessing uncertainties in differences between 

two simulation runs are theoretically difficult and rarely 

used. However, here we are mainly interested in the uncertainty 

on the difference figures. These uncertainties related 

to differences in simulated output may be much smaller than 

the uncertainties in the model predictions of each simulation 

(Reichert and Borsuk, 2005) as many sources of uncertainty 

affect the predictions for different alternatives in similar 

ways. 
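The reason why the difference can be less uncertain than either run can be made explicit with the standard variance identity. The symbols are introduced here for illustration and do not appear in the original text: $Y_0$ and $Y_1$ denote the simulated output of the reference run and of the run with the measure, $\sigma_0$ and $\sigma_1$ their standard deviations, and $\rho$ the correlation between their errors.

```latex
\operatorname{Var}(Y_1 - Y_0)
  = \sigma_1^2 + \sigma_0^2 - 2\,\rho\,\sigma_1\sigma_0 .
% Special case of equal spread, \sigma_0 = \sigma_1 = \sigma:
\operatorname{Var}(Y_1 - Y_0) = 2\,\sigma^2\,(1 - \rho).
```

Under the equal-spread assumption, the difference has a smaller variance than a single run whenever $\rho > 0.5$. For example, with $\rho = 0.9$ the variance of the difference is $0.2\,\sigma^2$, i.e. a standard deviation of about $0.45\,\sigma$. The practical difficulty noted in the text is that $\rho$ is rarely known and has to be obtained, for instance, by running both simulations with the same sampled inputs and parameters.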

The results of the case studies will be uncertainties expressed partly quantitatively and partly qualitatively. The quantitative parts may be illustrated as in Fig. 6, where the uncertainty on the impacts (hydrological models) is shown along the vertical axis and the uncertainty on the costs of implementing a measure is shown along the horizontal axis.

In the hypothetical example shown in Fig. 6 measure no. 1 

(PoM 1) is clearly suboptimal as compared to the two other 

measures, because its effect is much lower and the 

implementation cost higher. A decision on whether to

choose PoM 2 or PoM 3 is, however, more difficult, because

the uncertainty ranges are overlapping both with regards to 

effects and costs. The choice will also be influenced by the 

risk strategy of the decision maker. If the decision maker 

wants a high degree of certainty for an effect corresponding 

to the dashed line denoted ‘Minimum effect’ s/he will have 

to select PoM 3, even if the expected cost efficiency of PoM 

2 is more favourable. 
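The influence of the risk strategy can be sketched numerically. The figures below are invented and are not read from Fig. 6; effects and costs are assumed to be independent and normally distributed, and the selection rule (require a minimum effect with a stated probability, then prefer the best expected cost efficiency) is one possible formalisation of the argument rather than a method prescribed by the project.

```python
import random


def sample_measure(rng, effect_mean, effect_sd, cost_mean, cost_sd, n=10000):
    """Draw (effect, cost) pairs for one programme of measures (PoM)."""
    return [(rng.gauss(effect_mean, effect_sd), rng.gauss(cost_mean, cost_sd))
            for _ in range(n)]


def prob_effect_at_least(samples, minimum_effect):
    """Estimated probability that the effect reaches the required minimum."""
    return sum(effect >= minimum_effect for effect, _ in samples) / len(samples)


def expected_cost_efficiency(samples):
    """Expected effect per unit of expected cost."""
    mean_effect = sum(e for e, _ in samples) / len(samples)
    mean_cost = sum(c for _, c in samples) / len(samples)
    return mean_effect / mean_cost


def choose(measures, minimum_effect, required_certainty):
    """Among measures meeting the certainty requirement, pick the most cost efficient."""
    admissible = {name: s for name, s in measures.items()
                  if prob_effect_at_least(s, minimum_effect) >= required_certainty}
    if not admissible:
        return None
    return max(admissible, key=lambda name: expected_cost_efficiency(admissible[name]))


rng = random.Random(0)
measures = {
    # Hypothetical numbers: PoM 2 is cheaper but its effect is less certain.
    "PoM 2": sample_measure(rng, effect_mean=60, effect_sd=15, cost_mean=40, cost_sd=8),
    "PoM 3": sample_measure(rng, effect_mean=80, effect_sd=10, cost_mean=70, cost_sd=10),
}

risk_averse = choose(measures, minimum_effect=60, required_certainty=0.9)
risk_neutral = choose(measures, minimum_effect=60, required_certainty=0.0)
print(risk_averse, risk_neutral)
```

With these made-up inputs the risk-averse rule selects PoM 3, whereas dropping the certainty requirement leads to PoM 2, mirroring the trade-off described in the text.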

Fig. 6. Graphical representation of uncertainty in simulated effect of measure vs. estimated uncertainty in cost of implementing a measure.




Assessment of uncertainty in model simulations is important when such models are used to support decisions in water resources management (Beven and Binley, 1992; Pahl-Wostl, 2002; Jakeman and Letcher, 2003; Refsgaard and Henriksen, 2004). This is reflected in the EU’s new water management approaches as described in the Water Framework Directive (EC, 2000) and the associated guidance documents. A basic principle in EU environmental policy on which the WFD is based is “...to contribute to pursuit of the objectives of preserving, protecting and improving the quality of the environment in prudent and rational use of natural resources, and to be based on the precautionary principle ...” (paragraph 11 in the directive). The holistic concept that is prescribed in the WFD with its integrated approach to natural resources and socio-economic issues therefore requires that uncertainty be considered in the decision making process in order for it to become truly rational. This need for taking uncertainties into account is also explicitly stated in the WFD guidance documents (Blind and de Blois, 2003).

The key sources of uncertainty of importance for evaluating the effect and cost of a measure in relation to preparing a WFD-compliant river basin management plan are (1) uncertainty related to framing of the decision making process; (2) uncertainty related to hydrological models (input data, parameter values, model technique, model structure); and (3) uncertainty in economic assessments. The framework adopted in HarmoniRiB addresses this wide spectrum of uncertainties. The particularly novel contributions of HarmoniRiB in this respect are related to the assessment of uncertainty in data and to the integration of uncertainty in effects of a measure (outputs from hydrological models) and socio-economic uncertainty, including uncertainty in costs of implementing a measure.

New principles often lead to a demand for new research for supporting their implementation. This is also the case for the WFD. Hence there is a need for easy access to river basin datasets suitable for WFD related research. None of the existing international river basin networks can provide suitable datasets for supporting research on integrated water management of direct relevance for implementation of the WFD. In addition, none of the existing networks comprise any quantifiable information on data uncertainty. The HarmoniRiB project aims at filling this gap by designing, building and populating a database containing data and associated uncertainties for eight river basins representatively characterising the diversity of climatic regimes and water management challenges across Europe. This river basin network aims at becoming a ‘virtual laboratory for modelling studies’, and it will be made available for the scientific community. The data may, e.g. be used for comparison and demonstration of methodologies and models relevant to the WFD.

Acknowledgement

This work is partly funded by the EC Energy, Environment and Sustainable Development programme (Contract EVK1-2002-00109).

References 

Beck, M.B., 1987. Water quality modelling: a review of the analysis of 

uncertainty. Water Resour. Res. 23 (8), 1393–1442. 

Bergstrom, J.C., Boyle, K.J., Poe, G.L. (Eds.), 2001. The Economic Value 

of Water Quality. Edward Elgar, Cheltenham.

Beven, K., Binley, A.M., 1992. The future of distributed models, model 

calibration and uncertainty predictions. Hydrol. Processes 6, 279–298. 

Blind, M., de Blois, C., 2003. The Water Framework Directive and its 

Guidance Documents — Review of data aspects. In: Refsgaard, J.C., 

Nilsson, B. (Eds.), Requirements Report. Geological Survey of Denmark

and Greenland, Copenhagen (Chapter 5). Available on

http://www.harmonirib.com/.

Brown, J.D., 2004. Knowledge, uncertainty and physical geography: 

towards the development of methodologies for questioning belief. 

Trans. Inst. Br. Geographers 29 (3), 367–381. 

Brown, J.D., Heuvelink, G.B.M., Refsgaard, J.C., 2005. An integrated 

framework for assessing and recording uncertainties about environmental 

data. To appear in a special issue of Water Sci. Technol. 

Cech, T.V., 2003. Principles of Water Resources — History, Development, 

Management, and Policy. John Wiley & Sons, New York. 

EC, 2000. Water Framework Directive. Directive 2000/60/EC. European 

Commission. 

EC, 2003a. Guidance for the analysis of Pressures and Impacts in accordance 

with the Water Framework Directive. Working Group 2.1. 

Available on http://forum.europa.eu.int/Public/irc/env/wfd/library. 

EC, 2003b. Water Framework Directive, Common Implementation Strategy. 

Working group 2.7. Monitoring. Available on http://forum.europa.eu.int/Public/irc/env/wfd/library. 

Funtowicz, S.O., Ravetz, J., 1990. Uncertainty and Quality in Science for 

Policy. Kluwer Academic Publishers, Dordrecht. 

GWP, 2000. Integrated Water Resources Management. TAC Background 

Papers No. 4. Global Water Partnership, Stockholm. Available on

http://www.gwpforum.org/.

Hanley, N., Spash, C.L., 1993. Cost-Benefit Analysis and the Environment. 

Edward Elgar, Brookfield. 

Heuvelink, G.B.M., Burrough, P.A., 1993. Error propagation in cartographic 

modelling using Boolean logic and continuous classification.

Int. J. Geogr. Inform. Sci. 7 (3), 231–246. 

Jakeman, A.J., Letcher, R.A., 2003. Integrated assessment and modelling: 

features, principles and examples for catchment management. Environ. 

Modell. Software 18, 491–501. 

Jungermann, H., Pfister, H-R., Fischer, K., 1998. Die Psychologie der 

Entscheidung (The Psychology of Decisions). Spektrum Akademischer 

Verlag, Heidelberg. 

Klauer, B., Brown, J.D., 2003. Conceptualising imperfect knowledge in 

public decision making: ignorance, uncertainty, error and ‘risk situations’. 

Environ. Res., Eng. Manage. 

Moore, R.V., 1997. The logical and physical design of the Land Ocean

Interaction Study database. Sci. Total Environ. 194/195, 137–146. 

Munier, N., 2004. Multicriteria Environmental Assessment. Kluwer Academic 

Publishers, Dordrecht.

Pahl-Wostl, C., 2002. Towards sustainability in the water sector — the 

importance of human actors and processes of social learning. Aquatic 

Sci. 64, 394–411. 

Passarella, G., Vurro, M., 2003. Review of Existing River Basin Networks. 

In: Refsgaard, J.C., Nilsson, B. (Eds.), Requirements Report. Geological


Survey of Denmark and Greenland, Copenhagen (Chapter 3). Available 

on http://www.harmonirib.com/. 

Rasmussen, P., 2003. Requirements for Data for HarmoniRiB. In: 

Refsgaard, J.C., Nilsson, B. (Eds.), Requirements Report. Geological 

Survey of Denmark and Greenland, Copenhagen (Chapter 7). Available 

on http://www.harmonirib.com/. 

Refsgaard, J.C., Henriksen, H.J., 2004. Modelling guidelines — terminology 

and guiding principles. Adv. Water Resour. 27, 71–82. 

Refsgaard, J.C., van der Sluijs, J.P., Brown, J., van der Keur, P., submitted 

for publication. A framework for dealing with uncertainty due to model 

structure error. 

Reichert, P., Borsuk, M.E., 2005. Does high forecast uncertainty preclude 

effective decision support. Environ. Modell. Software 20 (8), 991–1001. 

Roy, B., 1996. Multicriteria Methodology for Decision Aiding. Kluwer 

Academic Publishers, Dordrecht.

Tindal, C.I., Moore, R.V., Dunbar, M., Goodwin, T., 2004. The HarmoniRiB 

project — the effect of uncertainty on catchment management. In: 

British Hydrological Society International Conference on Hydrology: 

Science and Practice for the 21st Century, 12–16 July 2004, London, 

UK. 

Walker, W.E., Harremoës, P., Rotmans, J., Van der Sluijs, J.P., Van Asselt, 

M.B.A., Janssen, P., Krayer von Krauss, M.P., 2003. Defining uncertainty. 

A conceptual basis for uncertainty management in model-based 

decision support. Integrated Assess. 4 (1), 5–17. 

Van Loon, E., Refsgaard, J.C. (Eds.), 2005. Guidelines for assessing data 

uncertainty in hydrological studies. First draft version prepared September 

2004. Final version to be published beginning of 2005 on

http://www.harmonirib.com/.

Jens Christian Refsgaard is co-ordinator of the HarmoniRiB project. 

Since his graduation in hydrology at the Technical University of Denmark in 

1976 he has worked with hydrological modelling and water resources 

management at DTU, DHI and now at GEUS, where he holds a position 

as research professor. He is currently also WP leader in HarmoniQuA 

(quality assurance in the modelling process) and NeWater (new approaches 

in water resources management). 

Bertel Nilsson has been a research scientist in hydrogeology at the Geological Survey of Denmark and Greenland since 1988.

James Brown is a postdoctoral research associate at the University of 

Amsterdam with interests in environmental modelling, methods for uncertainty 

analysis of models, and the impacts of scientific uncertainty on 

decision making. 

Bernd Klauer has a professional background in mathematics, physics and 

economics. After his PhD in economics from the University of Heidelberg 

he became engaged at the UFZ Centre for Environmental Research, Leipzig. 

There he currently works as a senior scientist and leader of a research group 

on integrated assessment and decision support. 

Roger Moore is a member of the Centre for Ecology and Hydrology, UK. His background lies in civil engineering, but he has spent most of his career working on integrated database design, mainly in the UK but also around the world. Currently, he is also co-ordinator of the FP5 project HarmonIT.

Thomas Bech holds an MSc in electronics engineering and computer 

science, and has worked as software developer and project manager at 

Seven Technologies and DHI Water & Environment. He is currently 

working as a Software Development Manager at DHI Water & Environment. 

Michele Vurro graduated in hydraulic engineering. He has been a researcher at CNR.IRSA since 1982 and is now principal researcher with responsibility for methodology and techniques for protecting and managing water resources, with particular emphasis on water budget under scarce water availability.

Michiel Blind, MSc in Environmental Science (Water Systems Analysis), worked for 5 years on monitoring network design at Wageningen University, after which he continued his career at RWS-RIZA on IT-water management issues. He is mainly involved in European research projects on catchment modelling.

Guillermo Castilla is a forest engineer specialized in Remote Sensing and 

GIS. He is currently involved in the dissemination activities of HarmoniRiB. 

Ioannis K. Tsanis is a professor in the Department of Environmental 

Engineering at Technical University of Crete. He obtained his PhD in civil 

engineering from University of Toronto. His research activities are in the 

areas of hydroinformatics, water resources management and coastal engineering. 

His main background is hydrological modelling, water resources 

management and hydroinformatics. 

Pavel Biza was educated in civil engineering and developed his career

at the water board Povodi Moravy in the Czech Republic. He is now 

involved in development of river basin management plans.

[15] 

Refsgaard JC, van der Sluijs JP, Brown J, van der Keur P (2006). A 

framework for dealing with uncertainty due to model structure error. 

Advances in Water Resources, 29, 1586-1597. 

Reprinted from Advances in Water Resources with permission from Elsevier

Advances in Water Resources 29 (2006) 1586–1597 

www.elsevier.com/locate/advwatres 

A framework for dealing with uncertainty due to model structure error

Jens Christian Refsgaard a, *, Jeroen P. van der Sluijs b , 

James Brown c , Peter van der Keur a 

a Department of Hydrology, Geological Survey of Denmark and Greenland (GEUS), Øster Voldgade 10, 1350 Copenhagen, Denmark

b Copernicus Institute for Sustainable Development and Innovation, Department of Science Technology and Society, 

Utrecht University, Utrecht, The Netherlands 

c University of Amsterdam (UVA), Amsterdam, The Netherlands 

Received 29 July 2004; received in revised form 6 September 2005; accepted 21 November 2005 

Available online 5 January 2006 

Abstract 

Although uncertainty about structures of environmental models (conceptual uncertainty) is often acknowledged to be the main 

source of uncertainty in model predictions, it is rarely considered in environmental modelling. Rather, formal uncertainty analyses 

have traditionally focused on model parameters and input data as the principal source of uncertainty in model predictions. The traditional 

approach to model uncertainty analysis, which considers only a single conceptual model, may fail to adequately sample the 

relevant space of plausible conceptual models. As such, it is prone to modelling bias and underestimation of predictive uncertainty. 

In this paper we review a range of strategies for assessing structural uncertainties in models. The existing strategies fall into two 

categories depending on whether field data are available for the predicted variable of interest. To date, most research has focussed 

on situations where inferences on the accuracy of a model structure can be made directly on the basis of field data. This corresponds 

to a situation of ‘interpolation’. However, in many cases environmental models are used for ‘extrapolation’; that is, beyond the situation 

and the field data available for calibration. In the present paper, a framework is presented for assessing the predictive uncertainties 

of environmental models used for extrapolation. It involves the use of multiple conceptual models, assessment of their 

pedigree and reflection on the extent to which the sampled models adequately represent the space of plausible models. 


Keywords: Environmental modelling; Model error; Model structure; Conceptual uncertainty; Scenario analysis; Pedigree 


* Corresponding author. Tel.: +45 38 14 27 76; fax: +45 38 14 20 50.

1. Introduction

1.1. Background


Assessing the uncertainty of model simulations is 

important when such models are used to support decisions 

about water resources [6,33,23,39]. The key 

sources of uncertainty in model predictions are (i) input 

data; (ii) model parameter values; and (iii) model structure 

(=conceptual model). Other authors further distinguish 

uncertainty in model context, model assumptions, 

expert judgement and indicator choice [46,54,48] but 

these are beyond the scope of this paper. Uncertainties 

due to input data and due to parameter values have been 

dealt with in many studies, and methodologies to deal 

with these are well developed. However, no generic 

methodology exists for assessing the effects of model 

structure uncertainty, and this source of uncertainty is 

frequently neglected. 

Any model is an abstraction, simplification and interpretation 

of reality. The incompleteness of a model 


doi:10.1016/j.advwatres.2005.11.013


structure and the mismatch between the real causal 

structure of a system and the assumed causal structure 

as represented in a model always result in uncertainty 

about model predictions. The importance of the model 

structure for predictions is well recognised, even for situations 

where predictions are made on output variables, 

such as discharge, for which field data are available 

[16,8]. The considerable challenge faced in many applications 

of environmental models is that predictions are 

required beyond the range of available observations, 

either in time or in space, e.g. to make extrapolations 

towards unobservable futures [2] or to make predictions 

for natural systems, such as ecosystems, that are likely 

to undergo structural changes [4]. In such cases, uncertainty 

in model structure is recognised by many authors 

to be the main source of uncertainty in model predictions 

[44,13,31,28]. 

1.2. An example – five alternative conceptual models 

The problem is illustrated for a study conducted by 

the County of Copenhagen in 2000 involving a real 

water management decision [11,37]. The County of 

Copenhagen is the authority responsible for water 

resources management in the county where the city of 

Copenhagen abstracts groundwater for most of its water 

supply. According to a new Water Supply Act the 

county had to prepare an action plan for protection of 

groundwater against pollution. As a first step, the 

county asked five groups of Danish consulting firms to 

conduct studies of the aquifer’s vulnerability towards 

pollution in a 175 km² area west of Copenhagen, where

the groundwater abstraction amounts to about 12 million

m³/year. The key question to be answered was:

which parts of this particular area are most vulnerable

to pollution and need to be protected? The five consultants

were among the most well reputed consulting firms 

in Denmark, and they were known to have different 

views and preferences on which methodologies are most 

suitable for assessing vulnerability. As the task was one 

of the first consultancy studies on a new major market 

for preparation of groundwater protection plans it was 

considered a prestigious job to which the consultants 

generally allocated some of their most qualified 

professionals. 

The five consultants used significantly different 

approaches. One consultant based his approach on 

annual fluctuations of piezometric heads assuming that 

larger fluctuations represent greater interaction between 

aquifer and surface water systems and hence a larger 

vulnerability. Several consultants used the DRASTIC 

multi-criteria method [1], but modified it in different 

ways by changing weights and adding new, mainly geochemically 

oriented, criteria. One consultant based his 

approach on advanced hydrological modelling of both 

groundwater and surface water systems using the MIKE 

SHE code [40], while two other consultants used simpler 

groundwater modelling approaches. Thus, the five consultants 

had different perceptions of what causes 

groundwater pollution and used models with different 

processes and causal relationships to describe the possibility 

of groundwater pollution in the area. In addition, 

their different interpretations and interpolations made 

from common field data resulted in significantly different 

figures for e.g. areal means of precipitation and 

evapotranspiration and the thickness of various geological 

layers [37]. 

The conclusions of the five consultants regarding vulnerability 

to nitrate pollution are shown in Fig. 1. It is

apparent that the five estimates differ substantially from 

each other. In the present case, no data exist to validate 

the model predictions, because the five models were used 

to make extrapolations. Thus, it is not possible, from 

existing field data, to tell which of the five model estimates 

are more reliable. The differences in prediction 

originate from two main sources: (i) data and parameter 

uncertainty and (ii) conceptual uncertainty. Although 

the data and parameter uncertainties were not explicitly 

assessed by any of the consultants (as is common in such 

studies), the substantial differences in model structures 

and the fact that the consultants all used the same raw 

data point to structural uncertainty as the main cause 

of difference between the five model results and as a 

major source of uncertainty in model predictions. 

Fig. 1. Model predictions on aquifer vulnerability towards nitrate 

pollution for a 175 km² area west of Copenhagen [11].


Usually a water manager bases their decisions on the 

conclusions from only one study. The uniqueness of the 

present study was that five consultants were asked to 

answer the same question on the basis of the same data. 

In this respect the differences between the five estimates 

are striking and clearly do not provide a sound basis for 

deciding anything about which areas should be protected. 

A worrying question, which is left unanswered, 

is whether the basis for decisions is similarly poor in 

the many other cases where only a single conceptual 

model has been adopted and where millions of DKK 

have subsequently been used to prepare and implement 

action plans. 

1.3. Objective and outline of paper 

The objective of this paper is to review possible strategies 

for dealing with model structure errors and to outline 

a framework for handling the effects of model 

structure errors on predictive uncertainty, with particular 

emphasis on situations where model predictions represent 

extrapolations to situations not covered by 

calibration data and are often outside the domain on 

which our knowledge on the dynamics of the system 

and our understanding of its causal relationships is 

based. 

The paper is organised so that reviews of existing 

strategies and the discussion of their potentials and limitations 

are given in Section 2. A new framework is presented 

in Section 3 for analysing the uncertainties due to 

model structure errors when models are used for making 

extrapolations beyond their calibration base. Finally, 

the problems and perspectives of the new framework 

are discussed in Section 4. The terminology used is 

defined in the Appendix.

2. Review of possible strategies 

2.1. Classification 

The existing strategies for assessing uncertainty due 

to incomplete or inadequate model structure may be 

grouped into the categories shown in Fig. 2. The most 

important distinction is whether data exist that make

it possible to make inferences on the model structure 

uncertainty directly. This requires that data are available 

for the output variable of predictive interest and for conditions 

similar to those in the predictive situation. In 

other words it is a distinction between whether the 

model predictions can be considered as interpolations 

or extrapolations relative to the calibration situation. 

The two main categories are thus equivalent to different 

situations with respect to model validation tests. 

According to Klemeš’ classical hierarchical test scheme

[26,38], the interpolation case corresponds to situations 

where the traditional split-sample test is suitable, while 

the extrapolation case corresponds to situations where 

no data exist for the concerned output variable 

(proxy-basin test) or where the basin characteristics 

are considered non-stationary, e.g. for predictions of 

effects of climate change or effects of land use change 

(differential split-sample test). 
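The mapping between the prediction situation and the validation test can be written as a small decision rule. The following Python lines are a minimal sketch of that mapping only. The function name and its two boolean arguments are illustrative assumptions, and the combined case (no target data and non-stationary conditions) is not discussed in the text above; the label returned for it is an assumption.

```python
def classify_prediction(target_data_exist: bool, conditions_stationary: bool):
    """Return (category, suggested validation test) for a prediction situation.

    Hypothetical helper: the two flags are a simplification of the
    distinction made in the text, not part of any published code.
    """
    if target_data_exist and conditions_stationary:
        return ("interpolation", "split-sample test")
    if conditions_stationary:
        # No data for the output variable of interest.
        return ("extrapolation", "proxy-basin test")
    if target_data_exist:
        # Data exist, but for conditions unlike the predictive situation.
        return ("extrapolation", "differential split-sample test")
    # Assumed label for the combined case, which the text does not cover.
    return ("extrapolation", "proxy-basin differential split-sample test")


# Example: predicting the effect of land use change on observed discharge.
category, test = classify_prediction(target_data_exist=True,
                                     conditions_stationary=False)
```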

In the review of existing strategies given below examples 

of studies have been selected to illustrate the classification 

and the common approaches. It is not an

exhaustive review, but illustrates the range of

approaches available to diagnose structural uncertainty

in models.

Fig. 2. Classification of existing strategies for assessing conceptual model uncertainty. The classification is by availability of data for the model validation test. Target data exist (interpolation): increase parameter uncertainty; estimate structural term. No direct data (extrapolation), comprising intermediate data (differential split-sample case) and no data at all (proxy basin case): multiple conceptual models, supported by expert elicitation and pedigree analysis.

2.2. Data exist – interpolation 

In this situation, calibration is usually carried out 

against a sample of the existing field data to ensure some 

kind of optimal parameter values, and then the model 

predictions are compared with the remaining (‘independent’) 

field data. The deviations between model predictions 

and independent field observations can be used 

to infer the model’s conceptual error. Different methodologies 

can be used in this respect. 
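The split-sample idea can be illustrated with a deliberately simple case. In the sketch below, which is an assumption-laden toy example and not taken from any of the cited studies, a linear model is calibrated against the first half of a synthetic record generated by a quadratic system, and the residuals are then computed for the remaining, independent half. All function names and the synthetic data are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares slope and intercept for the model y = a*x + b."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def split_sample_residuals(xs, ys, n_calibration):
    """Calibrate on the first n_calibration points; return residuals
    (observation minus prediction) for the remaining points."""
    a, b = fit_line(xs[:n_calibration], ys[:n_calibration])
    return [y - (a * x + b)
            for x, y in zip(xs[n_calibration:], ys[n_calibration:])]


# Synthetic record: the "real" system is quadratic, the model is linear.
xs = [float(i) for i in range(10)]
ys = [0.5 * x * x for x in xs]

residuals = split_sample_residuals(xs, ys, n_calibration=5)
bias = statistics.fmean(residuals)  # mean residual over the validation period
```

In this toy case the validation residuals are all positive and grow with x (3.5, 7.0, 11.5, 17.0, 23.5), a systematic pattern of the kind that points to an error in model structure rather than to random noise.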

2.2.1. Increasing parameter uncertainty to account for structural uncertainty

One strategy is to increase the parameter uncertainty 

to a level where it is assumed to compensate for omitting 

model structure error from the analysis. Van Griensven 

and Meixner [45] provide an example of this. They 

assess the total predictive uncertainty without identifying 

or quantifying the underlying sources of uncertainty. 

They use the split-sample approach assessing ranges of 

predictive uncertainty from analyses of predictions and 

data for a period different from the calibration period. 

Their total predictive uncertainty is assessed by increasing 

the model parameter uncertainty beyond the magnitudes 

estimated during calibration to a level where the 

resulting predictive uncertainty intervals bracket the 

observations. This technique does not introduce a separate 

stochastic term for the structural uncertainty, but 

represents the structural term in the parameter term. 

The model structure error is likely to influence the model 

simulations in non-random and temporally varying 

ways. Because this approach compensates for model structure error by

increasing the variance of a temporally constant random

variable, its results can be questioned,

particularly if used for predictions in situations 

where split-sample tests are not made. 
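The general idea of inflating parameter uncertainty until the prediction intervals bracket the observations can be sketched as follows. This is not the procedure of Van Griensven and Meixner [45]; it is a minimal illustration under strong assumptions: a one-parameter model y = k·x, a normally distributed parameter, and a fixed multiplicative inflation step. All names and numbers are hypothetical.

```python
def coverage(k_hat, k_sd, xs, ys, z=1.96):
    """Fraction of observations inside the interval implied by
    k ~ Normal(k_hat, k_sd) for the toy model y = k * x."""
    inside = 0
    for x, y in zip(xs, ys):
        lo, hi = sorted(((k_hat - z * k_sd) * x, (k_hat + z * k_sd) * x))
        inside += lo <= y <= hi
    return inside / len(xs)


def inflate_parameter_sd(k_hat, k_sd, xs, ys, target=0.95, step=1.1,
                         max_iter=200):
    """Multiply k_sd by `step` until the intervals bracket at least
    `target` of the validation observations."""
    for _ in range(max_iter):
        if coverage(k_hat, k_sd, xs, ys) >= target:
            return k_sd
        k_sd *= step
    raise RuntimeError("target coverage not reached")


# Hypothetical validation data that the calibrated model (k = 2.0) misses.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.3, 3.6, 6.9, 7.2]
inflated_sd = inflate_parameter_sd(k_hat=2.0, k_sd=0.05, xs=xs, ys=ys)
```

Note that the inflation is a single constant applied over the whole record, which is exactly the property criticised above: it cannot represent a structural error that varies in time.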

2.2.2. Estimation of the structural uncertainty term 

Other strategies attempt to estimate the structural 

contribution to uncertainty in the model predictions. 

An example of such an approach is given by Radwan 

et al. [35], who estimate the total predictive uncertainty 

from a statistical analysis of the residuals between model 

predictions and observations. Further, they analyse the 

propagated uncertainties from model input and parameter 

values. By subtracting these two uncertainties from 

the total predictive uncertainty they assign the remaining 

predictive uncertainty to be an effect of model structure 

uncertainty. It is then possible to add the model 

structure uncertainty when making other predictions. 

This approach assumes that the uncertainties from different 

sources are additive. This assumption is questionable, 

because the combination of uncertainties is often 

non-linear due to interactions, correlations and dependencies 

between variables in a model. It also assumes 

that the differences between predictions and observations are

caused by structural error and not by the poor specification 

of input and parameter uncertainty, nor by errors in 

the observations. 
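The subtraction step can be made concrete with a short sketch. It assumes, as a labelled simplification, that the uncertainties are expressed as standard deviations and that the contributions are independent, so that their variances add; whether Radwan et al. [35] subtract variances or another measure is not stated here. The function name is hypothetical.

```python
import math


def structural_sd(total_sd, input_sd, parameter_sd):
    """Standard deviation attributed to model structure, assuming
    independent contributions whose variances add up to the total."""
    remaining = total_sd ** 2 - input_sd ** 2 - parameter_sd ** 2
    if remaining < 0:
        # The additivity assumption has broken down for these numbers.
        raise ValueError("input and parameter variance exceed total variance")
    return math.sqrt(remaining)


# Hypothetical values in the units of the predicted variable.
sd_structure = structural_sd(total_sd=13.0, input_sd=3.0, parameter_sd=4.0)
```

The explicit check for a negative remainder shows where the additivity assumption can fail in practice: if input and parameter uncertainty are overestimated, nothing sensible is left to assign to model structure.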

Vrugt et al. [53] present another stochastic approach 

based on a simultaneous parameter optimisation and 

data assimilation with an ensemble Kalman filter. By 

specifying values for measurement error and a so-called 

‘stochastic forcing term’, representing structural uncertainty, 

they are able to estimate the dynamic behaviour 

of the model structure uncertainty. Both techniques 

assume a smooth contribution from structural uncertainty, 

but an important advantage of the latter is that 

parameter innovations (an output from the Kalman filter) 

may be used to diagnose non-stationarity in system 

structure. 

2.3. No direct data – extrapolation 

In cases where model structure errors cannot be 

assessed directly due to a lack of relevant data, the main 

strategy is to do the extrapolation with multiple conceptual 

models. Two supporting methods can be used here 

for the generation and qualification of each of the alternative 

models: expert elicitation and pedigree analysis 

(Fig. 2). 

2.3.1. Multiple conceptual models 

In the scenario approach a number of alternative 

conceptual models are considered. For each of these, 

the model input and parameter uncertainties may be 

analysed and the differences between model predictions 

are then seen as a measure of the model structure uncertainty. 

The idea of using alternative or competing candidate 

model structures was introduced in water quality 

modelling some time ago [5]. The issue typically dealt 

with here is whether models developed for current conditions 

can yield correct predictions when used under 

changed control. Van Straten and Keesman [50] note 

in this respect that good performance at the calibration 

stage does not guarantee correctly predicted behaviour, 

due to non-stationarity of the underlying processes in 

space or time. 

The multiple modelling approach has also been used 

in flood forecasting. For example, Butts et al. [8] use 10 

different model structures to evaluate structural uncertainty 

in flood predictions. They conclude that exploring 

an ensemble of model structures provides a useful 

approach in assessing simulation uncertainty. 

In groundwater modelling different conceptual models 

are typically based on different geological interpretations 

[18,43,42,30,34]. Højberg and Refsgaard [21] 

present an example using three different conceptual


models, based on three alternative geological interpretations 

for a multi-aquifer system in Denmark. Each of 

the models was calibrated against piezometric head data 

using an inverse technique. The three models provided

equally good and very similar predictions of groundwater 

heads, including well field capture zones. However, 

when using the models to extrapolate beyond the calibration 

data to predictions of flow pathways and travel 

times the three models differed dramatically. When 

assessing the uncertainty contributed by the model 

parameter values, the overlap of uncertainty ranges 

between the three models significantly decreased when 

moving from groundwater heads to capture zones and 

travel times. They conclude that the larger the degree 

of extrapolation, the more the underlying conceptual 

model dominates over the parameter uncertainty and 

the effect of calibration. 
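The comparison described above, in which the spread between models and the overlap of their parameter uncertainty ranges are examined, can be sketched as follows. The numbers are invented for illustration and are not results from Højberg and Refsgaard [21]; the function names are hypothetical.

```python
def prediction_spread(predictions):
    """Range between the largest and smallest model prediction."""
    return max(predictions) - min(predictions)


def common_overlap(intervals):
    """Width of the range shared by all (low, high) uncertainty
    intervals; 0.0 if the intervals have no common part."""
    low = max(lo for lo, _ in intervals)
    high = min(hi for _, hi in intervals)
    return max(0.0, high - low)


# Hypothetical parameter-uncertainty ranges from three conceptual models.
head_ranges_m = [(10.0, 10.6), (10.1, 10.7), (10.2, 10.8)]    # groundwater head
travel_time_ranges_yr = [(5.0, 15.0), (40.0, 80.0), (150.0, 300.0)]

head_overlap = common_overlap(head_ranges_m)            # ranges largely agree
travel_overlap = common_overlap(travel_time_ranges_yr)  # no common range
travel_spread = prediction_spread([12.0, 45.0, 210.0])  # best estimates
```

A shrinking overlap when moving from the calibrated variable to the extrapolated one indicates that the conceptual model, rather than the parameter values, dominates the predictive uncertainty.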

The strategy of applying several alternative models 

based on codes with different model structures is also 

common in climate change modelling. In its description 

of uncertainty related to model predictions of both present 

and future climates the Intergovernmental Panel on 

Climate Change (IPCC) [22] bases its evaluation on scenarios 

of many (up to 35) different models. The same 

strategy is followed in the Dialogue model [52]. Dialogue

is a so-called integrated assessment model (IAM) of climate 

change. It has been developed as an interactive 

decision-support tool for energy supply policy making. 

Dialogue simulates the cause-effect chain of climate

change, using mono-disciplinary sub-models for each 

step in the chain. The chain starts with scenarios for economic 

growth, energy demand, fuel mix etc., leading to 

emissions of greenhouse gases, leading to changes in

atmospheric composition, leading to radiative forcing 

of the climate, leading to climate change, leading to 

impacts of climate change on societies and ecosystems. 

Rather than selecting one mono-disciplinary sub-model 

for each step, as most other climate IAMs do, Dialogue

uses multiple models for each step (for instance, three 

different carbon cycle models, simplified versions of five 

different global climate model outcomes, etc.), representing

the major part of the spectrum of expert opinion 

in each discipline. 

2.3.2. Expert elicitation 

Expert elicitation can be used as a supporting method 

in uncertainty analysis. It is a structured process to elicit 

subjective judgements and ideas from experts. It is 

widely used in uncertainty assessment to quantify uncertainties 

in cases where no or too few direct

empirical data are available to infer uncertainty. Usually

the subjective judgement is represented as a probability 

density function reflecting the experts’ degree of belief. 

Expert elicitation aims to specify uncertainties in a structured 

and documented way, ensuring the account is both 

credible and traceable to its assumptions. Typically it is 

applied in situations where there is scarce or insufficient 

empirical material for a direct quantification of uncertainty 

[20]. An example with use of expert elicitation 

to estimate probabilities of alternative conceptual models 

is given by Meyer et al. [29]. They assessed probabilities 

as subjective values, from expert elicitation, 

reflecting a belief about the relative plausibility of each 

model based on its apparent consistency with available 

knowledge and data. 

Expert elicitation can also be used to generate ideas 

about alternative causal structures (conceptual models) 

that govern the behaviour of a system. Techniques used 

in decision analysis include group model building [51] 

and the hexagon method [19] but these techniques usually 

aim to achieve consensus. From the point of view 

of model structure uncertainty, these elicitation techniques 

could perhaps be used to generate alternative 

conceptual models. 

2.3.3. Pedigree analysis 

Another supporting method is pedigree analysis. The 

idea comes from Funtowicz and Ravetz [17], who note 

that statistical uncertainty in terms of inexactness does 

not cover all relevant dimensions of uncertainty, including 

the methodological and epistemological dimensions. 

To promote a more differentiated insight into uncertainty 

they propose to extend good scientific practice with five 

qualifiers for quantitative scientific information: numeral,

unit, spread, assessment, and pedigree (NUSAP). By 

adding expert judgement of reliability (assessment) and 

systematic multi-criteria evaluation of the processes by 

which numbers have been produced (pedigree), NUSAP 

has extended the statistical approach to uncertainty (inexactness) 

with the methodological (unreliability) and epistemological 

(ignorance) dimensions. By providing a

separate qualification for each dimension of uncertainty, 

it enables flexibility in their expression. 

Each special sort of information has its own aspects 

that are key to its pedigree, so different pedigree matrices 

using different pedigree criteria can be used to qualify 

different sorts of information. Early applications of pedigree 

analysis of environmental models have focussed on 

parameter pedigree, using proxy representation, empirical 

basis, methodological rigor, theoretical understanding 

and validation as pedigree criteria. Later on, 

pedigree analysis has been extended to assessment of 

model assumptions and problem framing [49,12]. 
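A pedigree matrix for parameters can be represented as a set of scores, one per criterion. The sketch below uses the five criteria named above; the 0 to 4 scoring scale, the normalised average and the function names are assumptions made for illustration and are not prescribed by the text.

```python
CRITERIA = (
    "proxy representation",
    "empirical basis",
    "methodological rigor",
    "theoretical understanding",
    "validation",
)
MAX_SCORE = 4  # assumed scale: 0 (weak) to 4 (strong)


def pedigree_strength(scores):
    """Average criterion score, normalised to the range 0..1."""
    for criterion in CRITERIA:
        if not 0 <= scores[criterion] <= MAX_SCORE:
            raise ValueError(f"score out of range for {criterion!r}")
    return sum(scores[c] for c in CRITERIA) / (MAX_SCORE * len(CRITERIA))


def weakest_criteria(scores):
    """Criteria with the lowest score, i.e. the weakest parts of the
    knowledge base behind a number."""
    lowest = min(scores[c] for c in CRITERIA)
    return [c for c in CRITERIA if scores[c] == lowest]


# Hypothetical expert scores for one model parameter.
scores = {
    "proxy representation": 3,
    "empirical basis": 2,
    "methodological rigor": 3,
    "theoretical understanding": 4,
    "validation": 1,
}
strength = pedigree_strength(scores)
weak_points = weakest_criteria(scores)
```

Keeping the individual scores alongside the aggregate matters, since a single averaged number would hide which criterion is responsible for a low pedigree.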

2.4. Discussion of strengths/weaknesses and potentials/limitations

The strategies used in ‘interpolation’, i.e. for situations 

that are similar to the calibration situation with 

respect to variables of interest and conditions of the natural 

system, have the advantage that they can be based 

directly on field data. A fundamental weakness is that


field data are themselves uncertain. Nevertheless, in 

many cases, they can be expected to provide relatively 

accurate estimates of, at least, the total predictive uncertainty 

for the specific measured variable and for the 

same conditions as those in the calibration and validation 

situation. Some of the methods cannot differentiate 

how the total predictive uncertainty originates from 

model input, model parameter and model structure 

uncertainty. Other methods attempt to do so. However, 

this distinction is, as recognised by many authors, e.g. 

Vrugt et al. [53], problematic. In the case of uncalibrated 

models, the parameter uncertainty is very difficult to 

assess quantitatively, and wrong estimates of model 

parameter uncertainty will influence the estimates of 

model structure uncertainty. In the case of calibrated 

models, estimates of model parameter uncertainty can 

often be derived from autocalibration routines. An inadequate 

model structure will, however, be compensated 

by biased parameter values to optimise the model fit 

with field data during calibration. Hence, the uncertainty 

due to model structure will be underestimated in 

this case. 

A more serious limitation of the strategies depending 

on observed data is that they are only applicable for situations 

where the output variables of interest are measured 

(e.g. [35,45,53]). While relevant field data are 

often available for variables such as water levels and 

water flows, this is usually not the case for concentrations, 

or when predictions are desired for scenarios 

involving catchment change, such as land use change 

or climate change. Another serious limitation stems 

from an assumption that the underlying system does 

not undergo structural changes, such as changes in ecosystem 

processes due to climate change. 

The strategy that uses multiple conceptual models 

benefits from an explicit analysis of the effects of alternative 

model structures. Furthermore, it makes it possible 

to include expert knowledge on plausible model structures. 

This strategy is strongly advocated by Neuman 

and Wierenga [31] and Poeter and Anderson [34]. They 

characterise the traditional approach of relying on a single 

conceptual model as one in which plausible conceptual 

models are rejected (in this case by omission). They 

conclude that the bias and uncertainty that result from

reliance on an inadequate conceptual model are typically 

much larger than those introduced through an 

inadequate choice of model parameter values. 

This view is consistent with Beven [7] who outlines a 

new philosophy for modelling of environmental systems. 

The basic aim of his approach is to extend traditional 

schemes with a more realistic account of uncertainty, 

rejecting the idea that a single optimal model exists for 

any given case. Instead, environmental models may be 

non-unique in their accuracy of both reproduction of 

observations and prediction (i.e. unidentifiable or equifinal), 

and subject to only a conditional confirmation, due 

to e.g. errors in model structure, calibration of parameters 

and period of data used for evaluation. A weakness 

of the multiple modelling strategy is the absence of

quantitative information about the extent to which each 

model is plausible. Furthermore, it may be difficult to 

sample from the full range of plausible conceptual models. 

In this respect, expert knowledge, on which the formulations

of multiple conceptual models are based, is 

an important and unavoidable subjective element. The 

level of subjectivity can be reduced if the scenarios are 

generated in a formalised and reproducible manner. 

For example, this is possible with the TPROGS procedure 

[9,10], by which alternative geological models can 

be generated stochastically. The subjectivity does not 

disappear with this approach. Rather, it is transferred 

from formulation of the geological model itself to 

assumptions on probability functions and correlation 

structures of the various geological units that are more 

easily constrained in practice. 

The strategy of expert elicitation has the advantage 

that subjective expert knowledge can be included in 

the evaluation. It has the potential to make use of all 

available knowledge including knowledge that cannot 

be easily formalised otherwise. It can include views of 

sceptics, and reveals the level of expert disagreement 

on certain estimates. Expert elicitation also has several 

limitations. The fraction of experts holding a given view 

is not proportional to the probability of that view being 

correct. One may safely average estimates of model 

parameters, but if the experts’ models are incommensurate,

one cannot average models [25]. If differences 

in expert opinion are irresolvable, weighing and combining 

the individual estimates of distributions is impossible. 

In practice, the opinions are often weighted 

equally, although sometimes self-rating is used to obtain 

a weight factor for each expert’s competence. Finally, the

results of expert elicitation tend to be sensitive to the 

selection of the experts whose estimates are gathered. 
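The arithmetic of weighting expert opinions can be sketched as a weighted average of the probabilities that each expert assigns to the alternative conceptual models. This is a minimal illustration with hypothetical names and numbers. It shows equal weighting and weighting by a self-rating, and it does not remove the objections raised above: pooling says nothing about whether the experts’ views are commensurate.

```python
def pool_opinions(expert_probs, weights=None):
    """Weighted average of the experts' probabilities for each model.

    expert_probs: list of dicts mapping model name -> probability.
    weights: one non-negative weight per expert; equal weights if None.
    """
    if weights is None:
        weights = [1.0] * len(expert_probs)
    total = sum(weights)
    models = expert_probs[0].keys()
    return {
        m: sum(w * p[m] for w, p in zip(weights, expert_probs)) / total
        for m in models
    }


# Two hypothetical experts judging two alternative conceptual models.
opinions = [
    {"model A": 0.6, "model B": 0.4},
    {"model A": 0.2, "model B": 0.8},
]
equal = pool_opinions(opinions)                       # equal weighting
self_rated = pool_opinions(opinions, weights=[3, 1])  # assumed self-ratings
```

With these numbers the pooled probability of model A moves from 0.4 under equal weights to 0.5 under the assumed self-ratings, which illustrates how sensitive the outcome is to the choice of weights and of experts.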

In a review of four different case studies in which pedigree 

analysis was applied, Van der Sluijs et al. [49] show 

that pedigree analysis broadens the scope of uncertainty 

assessment and stimulates scrutiny of underlying methods 

and assumptions. Craye et al. [12] reported similar 

experiences. It facilitates structured, creative thinking 

on conceivable sources of error and fosters an enhanced 

appreciation of the issue of quality in information. It 

thereby enables a more effective criticism of quantitative 

information by providers, clients, and also users of all 

sorts, expert and lay. It provides differentiated insight 

in what the weakest parts of a given knowledge base 

are. It is flexible in its use and can be used on different 

levels of comprehensiveness: from a ‘back of the envelope’ 

sketch based on self-elicitation to a comprehensive 

and sophisticated procedure involving structured informed 

in-depth group discussions, covering each pedigree 

criterion. The scoring of pedigree criteria is to a certain


degree subjective. Subjectivity can partly be remedied by 

the design of unambiguous pedigree matrices and by 

involving multiple experts in the scoring. The choice of 

experts to do the scoring is also a potential source of 

bias. The method is relatively new, with a limited (but 

growing) number of practitioners. There is as yet no settled 

guideline for good practice. We must keep in mind 

that it is not a panacea for the problem of unquantifiable 

uncertainty. 

3. New framework 

We propose that conceptual uncertainty can be 

assessed by adopting a protocol based on the six elements 

shown in Fig. 3. The central aim is to establish 

a number of plausible conceptual models, with a range 

that adequately samples the space of possible conceptual 

models, to evaluate the tenability of each conceptual 

model and the overall range of models selected in relation 

to the perceived uncertainty on model structure 

and to propagate the uncertainties in each case. 

Fig. 3. Protocol for assessing conceptual model uncertainty. The flowchart links six elements in sequence: formulate a conceptual model; set up and calibrate model; sufficient conceptual models? (if not, return to the first element); perform validation tests and accept/reject models; evaluate tenability and completeness of conceptual models; make model predictions and assess uncertainty.

STEP 1: Formulate a conceptual model. A conceptual

model is established. Since we have defined a conceptual

model as a combination of our qualitative process

understanding and the simplifications acceptable for a

particular modelling study, a conceptual model becomes

highly site-specific and even case-specific. For example, a

conceptual model of an aquifer may be described as

two-dimensional for a study focussing on regional

groundwater heads, while it may need to include three-dimensional

geological structures for detailed simulation

of contaminant transport. Formulating a new conceptual 

model may involve changing or refining the model 

structure, e.g. by modifying the hydrogeological interpretations 

(in the case of groundwater models), dimensionality, 

temporal and spatial resolution, initial and 

boundary conditions and process descriptions (governing 

equations). 

STEP 2: Set up and calibrate model. On the basis of 

the formulated conceptual model a site- and case-specific 

model is set up. Subsequently the model is calibrated 

and the model parameter uncertainty assessed. 

For the purposes of ‘interpolation’ (i.e. relevant observations 

are available), the parameter uncertainty can 

reasonably be constrained through calibration. However, 

for the case of ‘extrapolation’, the risk of calibrating 

model parameters for prediction of unobserved 

variables is that the model becomes biased for the unobserved 

variable. 

STEP 3: Sufficient conceptual models. The first two 

steps are repeated until sufficient conceptual models 

are included. This judgement will be influenced by the 

practical constraints on including additional models 

and the desire to include additional conceptual models 

that are substantially different from those already 

included. 

STEP 4: Perform validation tests (to the extent data 

availability allows). In order to evaluate how well the 

models describe the system in question, the performances 

of each of the models are tested by comparing 

model predictions with independent field data, i.e. data 

not used for calibration. This may be achieved by splitting 

the sample data into a calibration and validation 

set, or, alternatively, by cross-validation (e.g. bootstrapping: 

[15]) against ‘independent data’. The models whose 

predictive capability is deemed low are discarded and 

the reasons for these predictive failures are explored, 

where possible, for insight into the origins of structural 

uncertainty. In ‘extrapolation’ cases, data will usually 

not be available for validation tests and STEP 4 must 

be skipped. However, in some cases, it is possible to test 

‘intermediate’ model results. For example a groundwater 

model aimed at prediction of concentration values 

can often be tested against groundwater head and discharge 

data, or sparse concentration data may be available 

for parts of the study area. 
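The accept/reject logic of STEP 4 can be illustrated with a minimal sketch. It assumes a simple split of the record into a calibration and a validation part, uses the root-mean-square error as the performance measure, and applies a fixed threshold. The head values, the model names and the threshold are hypothetical and chosen only for illustration; the framework itself does not prescribe a particular measure or threshold.

```python
import math


def rmse(observed, simulated):
    """Root-mean-square error between two equally long sequences."""
    if len(observed) != len(simulated):
        raise ValueError("series must have the same length")
    squared = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(squared / len(observed))


def split_sample(series, calibration_fraction=0.5):
    """Split a record into a calibration part and an independent validation part."""
    cut = int(len(series) * calibration_fraction)
    return series[:cut], series[cut:]


def screen_models(observed_validation, simulations, rmse_threshold):
    """Return (retained, rejected) dicts holding the validation RMSE per model."""
    retained, rejected = {}, {}
    for name, simulated in simulations.items():
        score = rmse(observed_validation, simulated)
        target = retained if score <= rmse_threshold else rejected
        target[name] = score
    return retained, rejected


# Hypothetical groundwater heads (m); only the second half is used for validation.
heads = [10.0, 10.2, 10.1, 10.4, 10.6, 10.5, 10.3, 10.2]
_, heads_validation = split_sample(heads)

# Hypothetical validation-period simulations from three conceptual models.
simulations = {
    "model_2d": [10.5, 10.6, 10.2, 10.3],
    "model_3d": [10.6, 10.4, 10.4, 10.1],
    "model_no_leakage": [11.4, 11.5, 11.0, 11.1],
}

retained, rejected = screen_models(heads_validation, simulations, rmse_threshold=0.2)
print("retained:", retained)
print("rejected:", rejected)
```

Keeping the rejected models together with their scores, rather than discarding them silently, supports the exploration of the reasons for predictive failure mentioned above.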

Table 1
Pedigree matrix for evaluating the tenability of a conceptual model

Score | Supporting empirical evidence: Proxy | Supporting empirical evidence: Quality and quantity | Theoretical understanding | Representation of understood underlying mechanisms | Plausibility | Colleague consensus
4 | Exact measures of the modelled quantities | Controlled experiments and large sample direct measurements | Well-established theory | Model equations reflect high mechanistic process detail | Highly plausible | All but cranks
3 | Good fits or measures of the modelled quantities | Historical/field data, uncontrolled experiments, small sample direct measurements | Accepted theory with partial nature (in view of the phenomenon it describes) | Model equations reflect acceptable mechanistic process detail | Reasonably plausible | All but rebels
2 | Well correlated but not measuring the same thing | Modelled/derived data, indirect measurements | Accepted theory with partial nature and limited consensus on reliability | Aggregated parameterised meta model | Somewhat plausible | Competing schools
1 | Weak correlation but commonalities in measure | Educated guesses, indirect approx., rule of thumb estimate | Preliminary theory | Grey box model | Not very plausible | Embryonic field
0 | Not correlated and not clearly related | Crude speculation | Crude speculation | Black box model | Not at all plausible | No opinion

STEP 5: Evaluate tenability and completeness of conceptual models. The aim of this step is to analyse the retained models with respect to their predictive bias and uncertainty. This has two elements: (i) to evaluate the tenability of each conceptual model; and (ii) as far as possible, to evaluate the extent to which the retained models represent the space of plausible conceptual models. The tenability of the conceptual models is evaluated through expert reviews. First, the strength of the tenability 

of each conceptual model is evaluated by using the 

pedigree matrix in Table 1. A structured procedure for 

the elicitation of pedigree scores is given by Van der Sluijs 

et al. [47]. Note that there is no need to arrive at a 

consensus pedigree score for each criterion: if experts 

disagree on the pedigree scores for a given model, this 

reflects further epistemological uncertainty surrounding 

that model. Next, the adequacy of the retained conceptual 

models to represent the range of plausible models is 

evaluated. This is an assessment of whether the space of 

the retained conceptual models is sufficient to encapsulate 

the relevant range of plausible conceptual models 

without becoming impractical. This has strong similarities 

to Dunn’s concept of context validation [14]. Context 

validity refers to the validity of inferences that we 

have estimated the proximal range of rival hypotheses. 

Context validation can be performed by a bottom-up 

process to elicit from experts rival hypotheses on causal 

relations governing the dynamics of a system. One could 

argue that an infinite number of conceivable models 

might exist. However, it has been shown in projects 

where such elicitation processes were used, that the 

cumulative distribution of unique rival models flattens 

out after consultation of a limited number of experts, 

usually somewhere between 20 and 25 when chosen with 

diverse enough backgrounds [27]. 
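Because no consensus score is required, the spread between experts can be reported alongside a central value. The sketch below is one possible way to do this and is not taken from the elicitation procedure in [47]: it summarises the 0 to 4 scores per criterion of Table 1 as lowest, median and highest score. The criterion keys, expert labels and scores are hypothetical.

```python
from statistics import median

# Criterion keys follow the columns of Table 1 (naming is an assumption).
CRITERIA = (
    "proxy",
    "quality_and_quantity",
    "theoretical_understanding",
    "representation_of_mechanisms",
    "plausibility",
    "colleague_consensus",
)


def summarise_pedigree(expert_scores):
    """Summarise pedigree scores per criterion without forcing consensus.

    expert_scores maps an expert label to a dict {criterion: score 0..4}.
    Returns {criterion: (lowest, median, highest)}.
    """
    summary = {}
    for criterion in CRITERIA:
        scores = [by_criterion[criterion] for by_criterion in expert_scores.values()]
        if any(score not in range(5) for score in scores):
            raise ValueError(f"scores for {criterion} must be integers from 0 to 4")
        summary[criterion] = (min(scores), median(scores), max(scores))
    return summary


# Hypothetical scores given by three experts to one conceptual model.
expert_scores = {
    "expert_1": dict(zip(CRITERIA, (3, 2, 3, 3, 3, 2))),
    "expert_2": dict(zip(CRITERIA, (3, 1, 3, 2, 2, 2))),
    "expert_3": dict(zip(CRITERIA, (4, 2, 3, 3, 3, 1))),
}

summary = summarise_pedigree(expert_scores)
for criterion, (low, mid, high) in summary.items():
    print(f"{criterion:<30} low={low} median={mid} high={high}")
```

A wide gap between the lowest and highest score for a criterion is itself information: it signals the additional epistemological uncertainty described above.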

STEP 6: Make model predictions and assess uncertainty. 

Together with model predictions of the desired 

variables, uncertainty assessments are carried out. This 

will typically include uncertainty in input data and 

parameter values in addition to the conceptual uncertainty. 

Furthermore, on the basis of the goodness of 

the conceptual models, evaluated in STEP 5, the goodness 

of the assessed predictive uncertainty associated 

with the model structure should be evaluated. 
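As a minimal sketch of STEP 6, assume that each retained conceptual model delivers a central prediction together with a lower and an upper bound that reflect input and parameter uncertainty. The spread of the central predictions then gives an indication of the conceptual uncertainty, and the outer bounds give an overall envelope. The unweighted envelope is an assumption made here for simplicity, and the model names and numbers are hypothetical.

```python
def prediction_envelope(predictions):
    """Combine per-model predictions given as {model: (lower, central, upper)}.

    Returns the range of central predictions across models (an indication of
    conceptual uncertainty) and the overall envelope across all bounds.
    """
    if not predictions:
        raise ValueError("at least one model prediction is required")
    lowers = [bounds[0] for bounds in predictions.values()]
    centrals = [bounds[1] for bounds in predictions.values()]
    uppers = [bounds[2] for bounds in predictions.values()]
    return {
        "conceptual_range": (min(centrals), max(centrals)),
        "overall_envelope": (min(lowers), max(uppers)),
    }


# Hypothetical predicted nitrate concentrations (mg/l) from three retained models.
predictions = {
    "model_a": (18.0, 25.0, 33.0),
    "model_b": (30.0, 41.0, 52.0),
    "model_c": (22.0, 28.0, 36.0),
}

result = prediction_envelope(predictions)
print(result)
```

Such an envelope is conditional on the models that were retained; the pedigree scores from STEP 5 indicate how much trust to place in it.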


4.1. Methodologies to assess conceptual uncertainty 

As discussed above, the existing strategies fall into 

two main categories, each with limitations. The strategies 

where model structure errors are assessed from 

observed data are confined to interpolation cases, 

understood as cases where the model can be calibrated 

and validated against field data for the variables of predictive 

interest and where the natural system does not 

undergo structural change. The strategies used for situations 

involving extrapolation depend either on multiple 

conceptual models (preferred) or on expert elicitation or 

pedigree analysis for a single conceptual model (usually 

less preferred). 

The novelty of our proposed framework is the combination 

of multiple conceptual models and the pedigree


approach for assessing the overall tenability of these 

models in one formalised protocol. Some of our proposed 

steps are similar to other approaches for dealing 

with equifinality, multiple possible models and the 

rejection of non-behavioural models [6,31]. Other steps 

are based on qualitative approaches, including expert 

knowledge in a structured manner [20,49]. The aim of 

our new framework is not to identify the ‘true’ model 

structure or the cause of the errors in the existing model 

structure. Instead, we propose an approach that integrates 

different types of knowledge, not previously combined, 

such as quantitative and qualitative uncertainty, 

to estimate the impact of model structure uncertainty 

on model predictions. 

The GLUE approach (generalised likelihood uncertainty 

estimation, [6,7]) also operates with a range of 

alternative models. Although almost all applications 

of GLUE reported so far operate with only one model 

structure and many alternative model parameter sets, it 

is possible to use GLUE with alternative model structures 

[24]. In addition to prescribing multiple conceptual 

models, an important difference between our 

proposed approach and GLUE is that we recommend that 

parameter optimisation be conducted as part of the calibration 

in order to take full advantage of the information 

in field data. There are different opinions about 

whether calibration by parameter optimisation is advisable 

or not. The main advantage of calibration is that it 

improves the ability of the model to reproduce hydrological 

behaviour of a system within the limits of 

observed behaviour [31]. An important by-product is 

that it provides useful information about the uncertainty 

of model parameters. The disadvantage is that 

parameter optimisation may result in biased parameter 

values to compensate for errors in model structure and 

that many parameter sets (i.e. many models) perform 

more or less equally well but provide different results. 

In implementing our framework, model calibration 

might be skipped and many models with different 

parameter sets retained, as in the GLUE approach. 

The reason we are not advocating such an approach 

is partly for pragmatic reasons (very large computational 

requirements) and partly that we aim to focus 

on model structure uncertainty rather than parameter 

uncertainty. 
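For readers unfamiliar with GLUE, the following toy sketch shows the general idea of retaining many parameter sets instead of a single optimised one. It is a simplified illustration under stated assumptions and not the implementation used in [6,7]: the one-parameter runoff model, the Nash-Sutcliffe efficiency as informal likelihood, the behavioural threshold of 0.7 and all data are hypothetical.

```python
import random


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency, used here as an informal likelihood measure."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def toy_model(k, rainfall):
    """Hypothetical one-parameter model: runoff is a fixed fraction k of rainfall."""
    return [k * r for r in rainfall]


def glue_sample(rainfall, observed, n_samples=2000, threshold=0.7, seed=1):
    """Sample k uniformly, keep behavioural sets and return (k, weight) pairs."""
    rng = random.Random(seed)
    behavioural = []
    for _ in range(n_samples):
        k = rng.uniform(0.0, 1.0)
        efficiency = nash_sutcliffe(observed, toy_model(k, rainfall))
        if efficiency >= threshold:
            behavioural.append((k, efficiency))
    if not behavioural:
        raise ValueError("no behavioural parameter sets; relax the threshold")
    total = sum(efficiency for _, efficiency in behavioural)
    return [(k, efficiency / total) for k, efficiency in behavioural]


def weighted_quantile(values_and_weights, q):
    """Smallest value at which the cumulative normalised weight reaches q."""
    ordered = sorted(values_and_weights)
    cumulative = 0.0
    for value, weight in ordered:
        cumulative += weight
        if cumulative >= q:
            return value
    return ordered[-1][0]


rainfall = [10.0, 20.0, 5.0, 30.0, 15.0]
observed = [4.1, 7.8, 2.2, 12.3, 5.9]
weighted_k = glue_sample(rainfall, observed)

# The prediction is monotonic in k, so quantiles of k map directly to
# quantiles of the predicted runoff for a new rainfall amount.
new_rainfall = 25.0
lower = weighted_quantile(weighted_k, 0.05) * new_rainfall
upper = weighted_quantile(weighted_k, 0.95) * new_rainfall
print(f"{len(weighted_k)} behavioural sets, 5-95% runoff range: {lower:.2f} to {upper:.2f}")
```

Even this toy case needs thousands of model runs for one parameter, which illustrates the computational argument made above for calibrating each conceptual model instead.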

Although intended for use in a very different context, 

the central aim behind our proposed protocol is similar 

to the approach of IPCC [22], who assign a level of confidence 

to their assessment of climate change by evaluating 

predictions from multiple models. The level of 

confidence placed in a particular finding reflects both 

the degree of consensus amongst modellers and the 

quantity of evidence that is available to support the finding. 

IPCC [22] classifies the confidence qualitatively in 

three levels: (i) ‘well established’, (ii) ‘evolving’ and (iii) 

‘speculative’. 

4.2. Critical issues for implementing the new protocol 

4.2.1. Performance criteria – threshold for accepting/ 

rejecting models 

A critical issue in relation to acceptance/rejection of 

models (STEP 4 above) is how to define performance 

criteria. We agree with Beven [7] that any conceptual 

model is (known to be) wrong in an absolute sense, 

and hence that any model will be rejected if we investigate 

it in sufficient detail and specify very high performance 

criteria. On the other hand, the whole point in 

modelling is to simplify. 

A good reference for model performance is to compare 

it with uncertainties of the available field observations. 

If the model performance is within this 

uncertainty range we may characterise the model as 

good enough. However, usually it is less straightforward. 

For example, how wide should the confidence 

bands be before we reject models or accept them within 

observational uncertainties – ranges corresponding to 

65%, 95% or 99%? Indeed, the differences between 

95% and 99% may be significant in practical terms. Do 

we always then reject a model if it cannot perform 

within the observational uncertainty range? How reasonable 

are our estimates of uncertainty in observations? 

In many cases, even the results from less 

accurate models may be very useful. 
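The sensitivity to the chosen confidence level can be made concrete with a small sketch. It assumes normally distributed observation errors with a known standard deviation, which is itself one of the questionable estimates raised above, and counts how many simulated values fall inside the two-sided observational band. The head values and the standard deviation of 0.1 m are hypothetical.

```python
from statistics import NormalDist


def coverage_within_band(observed, simulated, obs_std, confidence):
    """Fraction of simulated values inside the two-sided observational band.

    The band half-width is z * obs_std, where z is the standard normal
    quantile corresponding to the requested two-sided confidence level.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * obs_std
    inside = sum(abs(o - s) <= half_width for o, s in zip(observed, simulated))
    return inside / len(observed)


# Hypothetical observed and simulated groundwater heads (m).
observed = [10.0, 10.4, 9.8, 10.1, 10.6, 9.9, 10.2, 10.3]
simulated = [10.05, 10.28, 9.98, 9.88, 10.68, 10.15, 10.05, 10.60]

coverage = {
    level: coverage_within_band(observed, simulated, obs_std=0.1, confidence=level)
    for level in (0.65, 0.95, 0.99)
}
for level, fraction in coverage.items():
    print(f"{level:.0%} band: {fraction:.1%} of simulated values inside")
```

In this made-up case the same model places a quarter of its values inside the 65% band and most of them inside the 99% band, so the accept/reject decision depends strongly on a choice that the data alone cannot settle.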

Another reference for what is acceptable accuracy is 

the use of a benchmark model as discussed by e.g. Seibert 

[41]. The difficulty is then transferred to selecting 

an appropriate benchmark. 
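One common way of expressing performance relative to a benchmark is an efficiency that compares the squared errors of the model with those of the benchmark. The sketch below uses this form as an assumption; it is not necessarily the formulation discussed in [41], and the discharge values are hypothetical.

```python
def benchmark_efficiency(observed, simulated, benchmark):
    """Efficiency relative to a benchmark series.

    Returns 1 for a perfect model, 0 for a model no better than the
    benchmark, and negative values for a model worse than the benchmark.
    """
    sse_model = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sse_benchmark = sum((o - b) ** 2 for o, b in zip(observed, benchmark))
    if sse_benchmark == 0:
        raise ValueError("benchmark reproduces the observations exactly")
    return 1.0 - sse_model / sse_benchmark


# Hypothetical discharge series (m3/s).
observed = [2.0, 3.0, 5.0, 4.0]
simulated = [2.5, 3.0, 4.5, 4.0]

# Benchmark 1: the observed mean, which reproduces the Nash-Sutcliffe efficiency.
mean_benchmark = [sum(observed) / len(observed)] * len(observed)
# Benchmark 2: a hypothetical simpler model.
simple_model = [2.0, 2.5, 4.0, 4.5]

against_mean = benchmark_efficiency(observed, simulated, mean_benchmark)
against_simple = benchmark_efficiency(observed, simulated, simple_model)
print(f"against mean: {against_mean:.2f}, against simple model: {against_simple:.2f}")
```

The same simulation scores 0.90 against the mean but only about 0.67 against the simpler model, which shows how the judgement shifts with the choice of benchmark.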

Our answer is that the decision on performance criteria 

must, in general, be taken in a socio-economic context, 

for which predictive uncertainties must be clearly 

explained and open to interpretation beyond small 

groups of scientists. Thus, we believe that the accuracy 

criteria cannot be decided universally by modellers or 

researchers, but must be different from case to case 

depending on the nature of a decision and the risks 

involved. 

4.2.2. Qualitative assessment of tenability of conceptual 

models 

Pedigree analysis structures the critical appraisal of 

alternative model structures and provides insight into the 

state of knowledge on which each of the conceivable 

model structures is based. However, it does not give 

an indication of the relative quality of the various model 

structures. With reference to Table 1, the pedigree analysis 

for a simple statistical model (A) and a complex 

mechanistic model (B) could, for example, result in 

statements like: 

• Model A is weakly correlated to the predicted variable 

(Proxy, score 1), based on a large sample of 

direct measurements (Quality and quantity, score


4), built on a preliminary theory and a black box 

model (Theoretical understanding, score 1; Representation 

of mechanisms, score 1), somewhat plausible 

(Plausibility, score 2) and controversial among colleagues 

(Colleague consensus, score 2); 

• Model B exactly addresses the desired predictive variable 

(Proxy, score 4), is based on data with rule of 

thumb estimates (Quality and quantity, score 1), built 

on a well-established theory with model equations 

reflecting high process details (Theoretical understanding, 

score 4; Representation of mechanisms, 

score 4), reasonably plausible and accepted by all colleagues 

except rebels (Plausibility and Colleague consensus, 

score 3). 

Such statements cannot be integrated in a quantitative 

uncertainty analysis in terms of probabilities, but 

they should be available as the best possible scientifically 

based characterisation of uncertainties and as such be 

made available to those involved in the decision making 

process. 
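The two statements above can be kept as score profiles rather than being collapsed into a single number. The sketch below stores the scores exactly as given in the text for Models A and B and reports, per criterion, which model has the stronger pedigree. The criterion keys and the text rendering are assumptions made for illustration; no aggregate score is computed, in line with the point that the profiles are not probabilities.

```python
# Pedigree scores as stated in the text for the two example models.
MODEL_A = {
    "proxy": 1,
    "quality_and_quantity": 4,
    "theoretical_understanding": 1,
    "representation_of_mechanisms": 1,
    "plausibility": 2,
    "colleague_consensus": 2,
}
MODEL_B = {
    "proxy": 4,
    "quality_and_quantity": 1,
    "theoretical_understanding": 4,
    "representation_of_mechanisms": 4,
    "plausibility": 3,
    "colleague_consensus": 3,
}


def stronger_criteria(profile_a, profile_b):
    """Return the criteria on which each profile scores strictly higher."""
    a_stronger = [c for c in profile_a if profile_a[c] > profile_b[c]]
    b_stronger = [c for c in profile_a if profile_b[c] > profile_a[c]]
    return a_stronger, b_stronger


def render(name, profile):
    """Render a profile as a simple text bar chart on the 0 to 4 scale."""
    lines = [name]
    for criterion, score in profile.items():
        bar = "#" * score + "." * (4 - score)
        lines.append(f"  {criterion:<30}{bar} {score}")
    return "\n".join(lines)


a_stronger, b_stronger = stronger_criteria(MODEL_A, MODEL_B)
print(render("Model A", MODEL_A))
print(render("Model B", MODEL_B))
print("A stronger on:", a_stronger)
print("B stronger on:", b_stronger)
```

Presenting the full profiles lets a decision maker see that Model A rests on better data while Model B rests on better theory, a trade-off that a summed score would hide.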

Furthermore, as the selected conceptual models can 

never cover all possibilities, but instead cover a limited 

range, it is important to emphasise that the overall 

uncertainty of model predictions cannot be assessed in 

an absolute sense, only in a conditional or relative sense 

[7,31]. Our suggested method does not alter this fundamentally. 

However, we believe that the outcome of the 

proposed formalised review is a qualitative assessment 

that is more useful in a decision making context than 

unstructured information, or verbose information from 

scientific outlets that is not always available to the decision 

maker. The challenge is to design environmental 

management strategies that are robust against the uncertainties 

identified. Inclusion of a wider range of conceivable 

model structures may help to anticipate surprises 

that would have been overlooked otherwise. 

4.2.3. Different degrees of extrapolation 

Our proposed framework deals with situations where 

predictions involve extrapolations beyond available field 

data. However, there are different degrees of extrapolation 

(Fig. 2). If we look at the situation where a three-dimensional 

groundwater model is calibrated against 

groundwater head and discharge data, a model prediction 

of groundwater recharge to a given layer is a smaller 

extrapolation than model predictions of groundwater 

age or contaminant concentration. In both situations, 

model predictions are carried out for variables that have 

not been used as calibration targets and for which no 

traditional split-sample validation tests are possible. 

The type of validation test recommended for such situations 

is a proxy-basin test, which according to the principles 

in Klemes [26] and Refsgaard [38], for instance, 

could imply that validation tests have to be conducted 

in two similar catchments where relevant data (e.g. concentrations) 

exist, and where such data are not used for 

calibration. The residuals in the other catchments can 

then be seen as a measure of the uncertainty to be 

expected in the catchment of interest. 
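A minimal sketch of how residuals from proxy catchments might be summarised is given below. It assumes that the model has been transferred to two similar catchments without using their concentration data for calibration, and it reports bias and root-mean-square error per catchment together with the largest error as a cautious indication for the catchment of interest. The catchment names and concentrations are hypothetical, and taking the largest value is a design choice made here, not part of the test schemes in [26,38].

```python
import math


def residual_statistics(observed, simulated):
    """Return (bias, rmse) of simulated minus observed values."""
    residuals = [s - o for o, s in zip(observed, simulated)]
    n = len(residuals)
    bias = sum(residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return bias, rmse


def proxy_basin_summary(proxy_results):
    """Summarise residuals per proxy catchment and the largest RMSE found.

    proxy_results maps a catchment name to (observed, simulated) sequences
    of data that were not used for calibration.
    """
    per_catchment = {
        name: residual_statistics(observed, simulated)
        for name, (observed, simulated) in proxy_results.items()
    }
    largest_rmse = max(rmse for _, rmse in per_catchment.values())
    return per_catchment, largest_rmse


# Hypothetical observed and simulated concentrations (mg/l) in two proxy catchments.
proxy_results = {
    "catchment_1": ([20.0, 30.0, 25.0, 35.0], [24.0, 28.0, 31.0, 33.0]),
    "catchment_2": ([10.0, 15.0, 12.0], [13.0, 19.0, 17.0]),
}

per_catchment, largest_rmse = proxy_basin_summary(proxy_results)
for name, (bias, rmse) in per_catchment.items():
    print(f"{name}: bias={bias:.2f} mg/l, rmse={rmse:.2f} mg/l")
print(f"expected error in the catchment of interest, cautious estimate: {largest_rmse:.2f} mg/l")
```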

If model predictions are made for groundwater heads 

in cases involving groundwater abstraction, and the 

existing data available for calibration and validation 

tests do not include such abstraction, we also have an 

extrapolation case, although of a different nature. In this 

case we have data for the variable of predictive interest, 

but the catchment characteristics are non-stationary. 

This corresponds to the situation of model validation 

denoted by a differential split-sample test [26,38]. The 

differential split-sample test scheme recommended by 

Klemes also operates by tests on similar catchments 

where data for the type of non-stationary situation exist. 

Differential split-sample tests are often less demanding 

than proxy-basin tests [36]. A similar type of differential 

split-sample situation arises when predictions are 

required for a system in which structural change is 

expected (e.g. [50,4]). 

In cases where the conceptual models can be transferred 

to other catchments in a reliable and reproducible 

way, such proxy-basin and differential split-sample tests 

could be conducted and the results used to evaluate the 

goodness of the underlying conceptual models. It is 

worth noting that Klemes’ test schemes, which also 

apply for cases of extrapolation, operate with tests for 

two alternative catchments. This has clear similarities 

with our strategy of recommending the use of multiple 

conceptual models. 

4.3. Perspectives 

In many cases where environmental models are used 

to make predictions that are extrapolations beyond the 

calibration base, no suitable framework exists for assessing 

the effects of model structure error. The proposed 

framework is composed of elements originating from 

different scientific disciplines. The elements are well 

tested individually, but not previously applied in such 

an integrated manner for water resources or environmental 

modelling applications. The full framework still 

needs to be tested in real-life cases. 

Acknowledgement 

For the three authors from GEUS and UVA the present 

work was supported by the Project ‘Harmonised 

Techniques and Representative River Basin Data for 

Assessment and Use of Uncertainty Information in 

Integrated Water Management’ (www.harmonirib.com), 

which is partly funded by the EC Energy, Environment 

and Sustainable Development programme (Contract 

EVK1-2002-00109). The constructive comments of


Hoshin V. Gupta and two anonymous reviewers are 

acknowledged. 

Appendix. Terminology 

The terminology used is mainly based on Refsgaard 

and Henriksen [39]: 

Reality: The system that we aim to represent with the 

model, understood here as the study area. 

Conceptual model: A representation of ‘reality’ in 

terms of verbal descriptions, equations, governing relationships 

or ‘natural laws’ that purport to describe reality. 

This is the user’s perception of the key hydrological 

and ecological processes in the study area (perceptual 

model) and the corresponding simplifications and 

numerical accuracy limits that are assumed acceptable 

in order to achieve the purpose of the modelling. A conceptual 

model therefore includes a mathematical 

description (equations) of assumed processes and a 

description of the objects they interact with, including 

river system elements, ecological structures, geological 

features, etc. that are required for the particular purpose 

of modelling. 

Model code: A generic mathematical description of a 

conceptual model, implemented in a computer program. 

It is generic in the sense that, without program changes, 

it can be used to establish a model with the same basic 

type of equations (but allowing different input variables 

and parameter values) for a different study area. 

Model: A case-specific tailored version of a model 

code established for a particular study area and set of 

modelling objectives (output variables) including specific 

input data and parameter values. 

Model confirmation: Determination of the adequacy 

of the conceptual model to provide an acceptable performance 

for the domain of intended application. 

Code verification: Substantiation that a model code 

adequately represents a conceptual model within certain 

specified limits or ranges of application and corresponding 

ranges of accuracy. 

Model calibration: The procedure of adjusting the 

parameter values of a model in such a way that the 

model reproduces an observed response of the system 

represented in the model within the range of accuracy 

specified in the performance criteria. 

Model validation: Substantiation that a model, within 

its domain of applicability, possesses a satisfactory 

range of accuracy, consistent with the intended application 

of the model. Note that various authors have criticised 

the use of the word validation for predictive 

models because universal validation of a model is in 

principle impossible and therefore prefer to use the term 

model evaluation [32,3]. In our definition [39] the term 

validation is not used in a universal sense, but is always 

restricted to clearly defined domains of applicability and 

performance accuracy (‘numerical universal’ in Popperian 

sense). 

Pedigree: Pedigree conveys an evaluative account of 

the production process of information, and indicates different 

aspects of the underpinning and scientific status 

of the knowledge used. Pedigree is expressed by means 

of a set of pedigree criteria to assess these different 

aspects. Criteria for model parameter pedigree are for 

instance proxy representation, empirical basis, methodological 

rigor, theoretical understanding and validation. 

Assessment of pedigree involves qualitative expert 

judgement. To minimise arbitrariness and subjectivity 

in measuring strength, a pedigree matrix is used to code 

qualitative expert judgements for each criterion into a 

discrete numeral scale from 0 (weak) to 4 (strong) with 

linguistic descriptions (modes) of each level on the scale 

[49]. 

References 

[1] Aller LT, Bennet T, Lehr JH, Petty RJ. DRASTIC: a standardized 

system for evaluating ground water pollution potential using 

hydrogeologic setting, US EPA Robert S. Kerr Environmental 

Research Laboratory, EPA/600/287/035, Ada, OK, 1987. 

[2] Babendreier JE. National-scale multimedia risk assessment for 

hazardous waste disposal. In: International workshop on uncertainty, 

sensitivity and parameter estimation for multimedia 

environmental modelling held at US Nuclear Regulatory Commission, 

Rockville (MD), August 19–21, 2003. Proceedings, pp. 

103–9. 

[3] Beck MB. Model evaluation and performance. In: El-Shaarawi 

AH, Piegorsch WW, editors. Encyclopedia of environmetrics, vol. 

3. Chichester: John Wiley & Sons, Ltd; 2002. p. 1275–9. 

[4] Beck MB. Environmental foresight and structural change. Environ 

Modell Software 2005;20:651–70. 

[5] Beck MB, van Straten G, editors. Uncertainty and forecasting of 

water quality. Springer-Verlag; 1983. 

[6] Beven K, Binley AM. The future of distributed models, model 

calibration and uncertainty predictions. Hydrol Process 

1992;6:279–98. 

[7] Beven K. Towards a coherent philosophy for modelling the 

environment. Proc Roy Soc London, A 2002;458(2026): 

2465–84. 

[8] Butts MB, Payne JT, Kristensen M, Madsen H. An evaluation of 

the impact of model structure on hydrological modelling uncertainty 

for streamflow prediction. J Hydrol 2004;298:242–66. 

[9] Carle SF, Fogg GE. Transition probability based on indicator 

geostatistics. Math Geol 1996;28(4):453–77. 

[10] Carle SF, Fogg GE. Modeling spatial variability with one and 

multidimensional continuous-lag Markov chains. Math Geol 

1997;29(7):891–917. 

[11] Copenhagen County. Pilot project on establishment of methodology 

for zonation of groundwater vulnerability. In: Proceedings 

from seminar on groundwater zonation, November 7, 2000, 

County of Copenhagen [in Danish]. 

[12] Craye M, van der Sluijs JP, Funtowicz S. A reflexive approach to 

dealing with uncertainties in environmental health risk science and 

policy. Int J Risk Assess Manage 2005;5(2):216–36. 

[13] Dubus IG, Brown CD, Beulke S. Sources of uncertainty in 

pesticide fate modelling. Sci Total Environ 2003;317:53–72. 

[14] Dunn W. Using the method of context validation to mitigate type 

III errors in environmental policy analysis. In: Hisschemöller M,


Hoppe HV, Dunn W, Ravetz J, editors. Knowledge, power and 

participation in environmental policy. Policy studies review 

annual, vol. 12. New Jersey (USA): Transaction Publishers. p. 

417–36. 

[15] Efron B, Tibshirani RJ. An introduction to the bootstrap. 

Monographs on statistics and applied probability. New 

York: Chapman and Hall; 1993. 

[16] Franchini M, Pacciani M. Comparative analysis of several 

conceptual rainfall-runoff models. J Hydrol 1992;122:161–219. 

[17] Funtowicz SO, Ravetz JR. Uncertainty and quality in science for 

policy. Dordrecht: Kluwer; 1990. p. 229. 

[18] Harrar WG, Sonnenborg TO, Henriksen HJ. Capture zone, travel 

time and solute transport predictions using inverse modelling and 

different geological models. Hydrogeol J 2003;11(5):536–48. 

[19] Hodgson AM. Hexagons for systems thinking. Eur J Oper Res 

1992;59:220–30. 

[20] Hora SC. Acquisition of expert judgement: examples from risk 

assessment. J Energy Eng 1992;118:136–48. 

[21] Højberg AL, Refsgaard JC. Model uncertainty – parameter 

uncertainty versus conceptual models. Water Sci Technol 

2005;52(6):177–86. 

[22] IPCC. Climate change 2001: the scientific basis. Contribution of 

working group I to the third assessment report of the intergovernmental 

panel of climate change [Houghton JT, Ding Y, Griggs 

DJ, Noguer M, van der Linden PJ, Dai X, Maskell K, Johnson 

CA, editors]. Cambridge University Press, Cambridge (UK) and 

New York (NY, USA). p. 881. 

[23] Jakeman AJ, Letcher RA. Integrated assessment and modelling: 

features, principles and examples for catchment management. 

Environ Modell Software 2003;18:491–501. 

[24] Jensen JB. Parameter and uncertainty estimation in groundwater 

modelling. PhD thesis, Department of Civil Engineering, Aalborg 

University, Series Paper no. 23, 2003. 

[25] Keith DW. When is it appropriate to combine expert judgements? 

Clim Change 1996;33:139–43. 

[26] Klemes V. Operational testing of hydrological simulation models. 

Hydrol Sci J 1986;31:13–24. 

[27] Kloprogge P, van der Sluijs JP. The inclusion of stakeholder 

knowledge and perspectives in integrated assessment of climate 

change. Climatic Change, in press. 

[28] Linkov I, Burmistrov D. Model uncertainty and choices made by 

modelers: lessons learned from the international atomic energy 

model intercomparisons. Risk Anal 2003;23(6):1297–308. 

[29] Meyer PD, Ye M, Neuman SP, Cantrell KJ. Combined estimation 

of hydrogeologic conceptual model and parameter uncertainty. 

NUREG/CR-6843 Report, NRC, Washington, DC, 2004. 

[30] National Research Council. Conceptual models of flow and 

transport in the vadose zone. Washington, DC: National Academy 

Press; 2001. 

[31] Neuman SP, Wierenga PJ. A comprehensive strategy of hydrogeologic 

modeling and uncertainty analysis for nuclear facilities 

and sites. University of Arizona, Report NUREG/CR-6805, 

2003. 

[32] Oreskes N, Shrader-Frechette K, Belitz K. Verification, validation, 

and confirmation of numerical models in the Earth Sciences. 

Science 1994;263:641–6. 

[33] Pahl-Wostl C. Towards sustainability in the water sector – the 


Aquat Sci 2002;64:394–411. 

[34] Poeter E, Anderson D. Multiple ranking and inference in ground 

water modeling. Ground Water 2005;43(4):597–605. 

[35] Radwan M, Willems P, Berlamont J. Sensitivity and uncertainty 

analysis for river quality modelling. J Hydroinform 2004:83–99. 

[36] Refsgaard JC, Knudsen J. Operational validation and intercomparison 

of different types of hydrological models. Water 

Resources Res 1996;32(7):2189–202. 

[37] Refsgaard JC, Hansen LK, Vahman M. Groundwater zonation in 

Copenhagen County – Intercomparison of thematic results from 

different consultants. In: Seminar on groundwater zonation, 

County of Copenhagen, November 7, 2000 [in Danish]. 

[38] Refsgaard JC. Towards a formal approach to calibration and 

validation of models using spatial data. In: Grayson R, Blöschl G, 

editors. Spatial patterns in catchment hydrology: observations 

and modelling. Cambridge University Press; 2001. p. 329–54. 

[39] Refsgaard JC, Henriksen HJ. Modelling guidelines – terminology 

and guiding principles. Adv Water Resources 2004;27:71–82. 

[40] Refsgaard JC, Storm B. MIKE SHE. In: Singh VP, editor. 

Computer models of watershed hydrology. Water Resources 

Publication; 1995. p. 809–46. 

[41] Seibert J. On the need for benchmarks in hydrological modelling. 

Hydrol Process 2001;15(6):1063–4. 

[42] Selroos JO, Walker DD, Strom A, Gylling B, Follin S. Comparison 

of alternative modelling approaches for groundwater flow in 

fractured rock. J Hydrol 2001;257:174–88. 

[43] Troldborg L. The influence of conceptual geological models on 

the simulation of flow and transport in quaternary aquifer 

systems. PhD Thesis. Geological Survey of Denmark and Greenland, 

Report 2004/107. 

[44] Usunoff E, Carrera J, Mousavi SF. An approach to the design of 

experiments for discriminating among alternative conceptual 

models. Adv Water Resources 1992;15:199–214. 

[45] Van Griensven A, Meixner T. Dealing with unidentifiable sources 

of uncertainty within environmental models. In: Pahl C, Schmidt 

S, Jakeman T, editors. iEMSs 2004 international congress: 

‘Complexity and integrated resources management’. International 

Environmental Modelling and Software Society, Osnabrück, 

Germany, June 2004. 

[46] Van der Sluijs JP. Anchoring amid uncertainty; On the management 

of uncertainties in risk assessment of anthropogenic climate 

change, Ph.D. thesis, Utrecht University, 1997. p. 260. 

[47] Van der Sluijs JP, Potting J, Risbey JS, Van Vuuren D, de Vries B, 

Beusen A, et al. Uncertainty assessment of the IMAGE/TIMER 

B1 CO2 emissions scenario, using the NUSAP method. Report 

commissioned by the Netherlands National Research Program on 

global Air Pollution and Climate Change, RIVM, Bilthoven, The 

Netherlands, 2002. p. 225. 

[48] Van der Sluijs JP, Risbey JS, Kloprogge P, Ravetz JR, Funtowicz 

SO, Corral Quintana S, et al. RIVM/MNP Guidance for 

uncertainty assessment and communication: detailed guidance, 

report commissioned by RIVM/MNP – Copernicus Institute, 

Department of Science, Technology and Society, Utrecht University, 

Utrecht, The Netherlands, 2003. p. 71. 

[49] Van der Sluijs JP, Craye M, Funtowicz SO, Kloprogge P, Ravetz 

J, Risbey JS. Combining quantitative and qualitative measures of 

uncertainty in model based foresight studies: the NUSAP system. 

Risk Anal 2005;25(2):481–92. 

[50] Van Straten G, Keesman KJ. Uncertainty propagation and 

speculation in projective forecasts of environmental change: a 

lake-eutrophication example. J Forecast 1991;10:163–90. 

[51] Vennix JAM. Group model-building: tackling messy problems. 

Syst Dyn Rev 1999;15(4). 

[52] Visser H, Folkert RJM, Hoekstra J, De Wolff JJ. Identifying key 

sources of uncertainty in climate change projections. Clim Change 

2000;45:421–57. 

[53] Vrugt JA, Diks CGH, Gupta HV. Improved treatment of 

uncertainty in hydrologic modelling: combining the strengths of 

global optimization and data assimilation. Water Resources Res 

2005;41(1). Art No W01017. 

[54] Walker WE, Harremoës P, Rotmans J, Van der Sluijs JP, Van 

Asselt MBA, Janssen P, et al. Defining uncertainty. A conceptual 

basis for uncertainty management in model-based decision support. 

Integr Assessment 2003;4(1):5–17.
