ASPECTS OF MODERN GEOSTATISTICS: 

NONSTATIONARITY, COVARIANCE ESTIMATION AND 

STATE-SPACE DECOMPOSITIONS 

THÈSE No 2562 (2002) 

PRÉSENTÉE A LA FACULTÉ SB SECTION DE MATHÉMATIQUES 

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE 

POUR L'OBTENTION DU GRADE DE DOCTEUR ÈS SCIENCES 

PAR 

Reinhard FURRER 

ingénieur mathématicien diplômé EPF 

de nationalité suisse et originaire d'Eisten (VS) 

acceptée sur proposition du jury: 

Prof. S. Morgenthaler, directeur de thèse 

Prof. A.C. Davison, rapporteur 

Prof. J. Hüsler, rapporteur 

Prof. R. Webster, rapporteur 

Lausanne, EPFL 

2002

Abstract 

Geostatistical data are measurements taken at fixed locations in a spatial domain. Generally the latter are 

spatially continuous, as is typically the case in mining engineering, geology, soil science, and hydrology. 

Geostatistical models are based on the concept of spatial or spatio-temporal processes and aim to describe 

the underlying dependence structure. Spatial variability is modeled as a function of the distance between 

sampling sites. Called the 'variogram' or 'covariogram', this function is used to apply statistical methods 

such as estimation and/or prediction, referred to as 'kriging' in the geostatistical context. To quantify the 

spatio-temporal dependence, estimation techniques relying on certain hypotheses of stationarity (seldom 

met in reality) are applied. 
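
As an illustration of how a variogram is obtained from data, the following minimal sketch computes the classical method-of-moments (Matheron) estimate, that is, half the average squared difference between values whose separation falls in a given distance bin. The function name, the binning scheme and the toy data are assumptions made for this example and are not taken from the thesis.

```python
import math

def empirical_variogram(coords, values, bin_edges):
    """Method-of-moments estimate of the semivariogram, one value per distance bin."""
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])  # Euclidean distance between sites
            for k in range(n_bins):
                if bin_edges[k] <= h < bin_edges[k + 1]:
                    sums[k] += (values[i] - values[j]) ** 2
                    counts[k] += 1
                    break
    # gamma(h) = (sum of squared differences) / (2 * number of pairs); None if bin is empty
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Hypothetical toy data: four sites on a line carrying a purely linear trend.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [0.0, 1.0, 2.0, 3.0]
gamma = empirical_variogram(coords, values, [0.5, 1.5, 2.5, 3.5])
```

On this toy input the estimate grows quadratically with the lag, which is the typical signature of an unremoved trend rather than of genuine spatial dependence.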

Nonstationarity and covariance estimation are the underlying topics of the present thesis, which consists 

of four chapters. 

The first chapter gives a concise overview of geostatistical definitions and notation used throughout the 

thesis. Prior to generalizing the concepts to multivariate and spatio-temporal processes, they are explained 

on spatial processes. 

There exist many different forms of nonstationarity. Two of them are discussed in the second chapter. 

First, the case where the mean of the process depends on the location is studied. The identification of 

a trend is a nontrivial problem and we emphasize that there exists no trend estimation procedure for 

spatial processes with unknown dependence structure. Exploratory tools for the empirical variogram or 

for the observed process, as well as a commonly used parametric and nonparametric method for trend 

estimation are illustrated. A simple method that evolved out of the consequences of visual data analysis is 

developed, namely variogram estimation based on 'local trend estimation'. The latter separates the domain 

in several subdomains, or patches, on which an individual trend is estimated, and the residuals are combined 

throughout the entire domain to allow global estimation and/or inference. Simulations show that a simple 

and almost arbitrary subdivision is already sufficient to improve the results of variogram estimation. 

Moreover, the method does not break down when the (heuristic) decomposition does not coincide with 

the (true) separation of the populations. Even if the true trend is not linear the method performs better 

than other well-known parametric or nonparametric trend estimation techniques. To underline these 

statements the method is applied to real data. A second form of nonstationarity is the dependence of the 

covariance structure on the location. Under this circumstance classical covariogram estimation techniques 

are not applicable. For example, in atmospheric science one can easily imagine situations where the spatial 

dependence changes with time or where the maximum magnitude of variability may alter in time. For 

such phenomena new models are needed. Hence the remaining part of the second chapter discusses a 

new method of valid covariogram construction for nonstationary spatio-temporal processes. These new 

covariogram models are illustrated with simulations and an application to a dataset is given. 
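
The idea of local trend estimation described above can be sketched in one dimension as follows: the domain is cut into patches, a separate least-squares line is fitted on each patch, and the residuals are pooled for later global variogram estimation. This is a simplified illustration under stated assumptions (one-dimensional sites, linear trend per patch, hypothetical function names and break points), not the procedure of the thesis itself.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def local_trend_residuals(xs, ys, breakpoints):
    """Detrend each patch separately; residuals are returned in the original order."""
    edges = [float("-inf")] + list(breakpoints) + [float("inf")]
    residuals = [0.0] * len(xs)
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [i for i, x in enumerate(xs) if lo <= x < hi]
        if len(idx) < 2:
            continue  # too few points to fit a line; residual is left at 0
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        for i in idx:
            residuals[i] = ys[i] - (a + b * xs[i])
    return residuals

# Hypothetical data with a tent-shaped (piecewise linear) mean.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.0, 1.0, 2.0, 3.0, 6.0, 5.0, 4.0, 3.0]
local_res = local_trend_residuals(xs, ys, [4.0])   # two patches
global_res = local_trend_residuals(xs, ys, [])     # one global linear trend
```

With two patches the residuals vanish for this noise-free example, whereas a single global line leaves large systematic residuals that would contaminate any variogram computed from them.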

Several statistical tools are based on the covariance matrix of the underlying process. An example of 

such a method is (functional) principal components analysis, which aims to represent a set of possibly 

correlated variables by uncorrelated orthogonal components. These uncorrelated components can be 

constructed successively, each one extracting a maximal amount of the remaining variance. This often 

leads to an appreciable reduction in dimensionality, replacing the original variables by a few components. 

To calculate the orthogonal components the covariance matrix of a multivariate or spatio-temporal process 

is required. The latter is rarely known and therefore has to be estimated. As mentioned, an important 

aspect of geostatistical data is dependence over space as well as over time. This has to be taken into account 

when estimating the covariance matrix and the natural estimator of the covariance matrix is introduced 

in the third chapter. It is shown that it is biased under spatio-temporal dependence. This bias is studied 

under two different asymptotic models, namely increasing the number of observations in the domain and 

increasing the domain by increasing the number of locations. Using the first asymptotic model we derive 

a fast and accurate bias correction, whereas the second asymptotic model serves to quantify the speed 

of convergence of the bias and the covariance of the components of the estimated covariance matrix. As 

shown, under mild hypotheses the asymptotic normality of the estimated covariance matrix holds and can 

be used to test whether the eigenvectors of the estimated and the true covariance matrices are significantly 

different. This is demonstrated by examples, emphasizing the need for a bias correction. Furthermore, the 

theoretical results are illustrated with Monte Carlo simulation studies and again with an application to 

real data. 
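
To make the source of the bias concrete, consider the simplest scalar analogue: for observations with a common mean and covariance matrix C, the usual sample variance S^2 has expectation (tr C - 1'C1/n)/(n - 1), which equals the true variance only when the off-diagonal covariances sum to zero. The sketch below evaluates this expression exactly; the exponential covariance, the number of sites and the function names are assumptions chosen for illustration and do not reproduce the estimator studied in the third chapter.

```python
import math

def expected_sample_variance(cov):
    """E[S^2] for S^2 = sum((Z_i - mean)^2) / (n - 1), where cov is the n x n
    covariance matrix of (Z_1, ..., Z_n) and all Z_i share the same mean."""
    n = len(cov)
    trace = sum(cov[i][i] for i in range(n))
    total = sum(sum(row) for row in cov)  # 1'C1, the sum of all entries
    return (trace - total / n) / (n - 1)

def exponential_cov(n, sigma2, corr_range):
    """Covariance of n equally spaced sites: sigma2 * exp(-|i - j| / corr_range)."""
    return [[sigma2 * math.exp(-abs(i - j) / corr_range) for j in range(n)]
            for i in range(n)]

# Practically independent observations (tiny range) versus dependent ones.
independent = expected_sample_variance(exponential_cov(20, 1.0, 1e-9))
dependent = expected_sample_variance(exponential_cov(20, 1.0, 5.0))
```

With positive dependence the expected sample variance falls below the true variance of 1, which is the kind of systematic underestimation that a bias correction has to undo.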

The most commonly used decomposition to extract stationary parts of a process is based on the 

separation into different scales: (deterministic) large-scale variation, smooth small-scale variation, 

micro-scale variation and a measurement error. Although such additive partitioning is of considerable utility it 

also has several drawbacks, so an alternative analysis based on state-space decompositions is presented 

in the fourth chapter. The space equation is a process governed by the state equation and an additional 

observational error, where the state at the point is a weighted mean of its neighborhood states described 

by a kernel function plus a spatial process. The new model takes account of diverse shapes of trends and 

one does not have to decide whether the process is stationary or not. As other existing decompositions 

can be reconstructed by the new representation, it can be seen as a generalization of existing ones. The 

decomposition results in a Fredholm integral equation of the second kind. By imposing separable kernels 

this integral equation has an explicit solution, and the model is defined by the parametrized covariogram of 

the spatial process and the parameters defining the kernel. In our distribution-free model we will explore 

different methods based on minimal distances and moment equations for the parameter estimation, and, by 

generalizing the concept of M-estimators to the dependent setting, consistency for these new estimators is 

proven. The efficiency of the proposed method is discussed and the results are compared to other commonly 

used models by means of extensive Monte Carlo simulations and applications to real datasets. Despite 

its complexity the new model furnishes an efficient and competitive approach throughout the simulations, 

which show that for most parameters this new estimator is more precise than the ordinary least squares 

estimator.
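
The remark that separable kernels give an explicit solution can be illustrated on a deterministic toy problem. For a rank-one kernel K(x, y) = u(x) v(y), the Fredholm equation of the second kind phi(x) = f(x) + lam * integral K(x, y) phi(y) dy reduces to a single scalar equation for c = integral v(y) phi(y) dy. The sketch below is a hedged illustration only: the functions f, u, v, the interval [0, 1] and the helper names are assumptions, and the stochastic state equation of the thesis is not reproduced.

```python
def integrate(func, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def solve_rank_one_fredholm(f, u, v, lam, a=0.0, b=1.0):
    """Solve phi(x) = f(x) + lam * u(x) * integral_a^b v(y) phi(y) dy.

    With c = integral v*phi one gets c = beta + lam * alpha * c, hence
    c = beta / (1 - lam * alpha), provided lam * alpha != 1.
    """
    alpha = integrate(lambda y: v(y) * u(y), a, b)
    beta = integrate(lambda y: v(y) * f(y), a, b)
    c = beta / (1.0 - lam * alpha)
    return lambda x: f(x) + lam * c * u(x)

# Hypothetical example: f(x) = x and K(x, y) = x * y on [0, 1], lam = 1.
phi = solve_rank_one_fredholm(f=lambda x: x, u=lambda x: x, v=lambda y: y, lam=1.0)
# Substitute the solution back into the right-hand side of the equation at x = 0.3.
rhs = 0.3 + 1.0 * 0.3 * integrate(lambda y: y * phi(y), 0.0, 1.0)
```

Kernels that are finite sums of such products lead in the same way to a small linear system instead of a scalar equation.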

Version abrégée 

Les données géostatistiques sont constituées de mesures recueillies à des endroits déterminés dans 

le domaine spatial. Généralement elles sont continues spatialement ; des exemples typiques comprennent 

l'ingénierie minière, la géologie, la pédologie et l'hydrologie. Les modèles géostatistiques se basent sur le 

concept de processus spatial ou spatio-temporel et ont pour but de décrire la structure de dépendance sous- 

jacente. La variabilité spatiale est modélisée comme une fonction de la distance entre les sites échantillonnés. 

Cette fonction appelée 'variogramme' ou 'covariogramme' est utilisée afin d'appliquer des méthodes sta- 

tistiques comme l'estimation et/ou la prédiction, dénommée 'krigeage' dans le contexte géostatistique. 

Pour quantifier les dépendances spatio-temporelles, des techniques d'estimation se basant sur certaines 

hypothèses de stationnarité (rarement vérifiées dans la réalité) sont appliquées. 

La non-stationnarité et l'estimation de la covariance sont les thèmes sous-jacents de cette thèse qui est 

constituée de quatre chapitres. 

Le premier chapitre présente un survol court et concis des définitions et notations géostatistiques 

utilisées dans cette thèse. Préalables à la généralisation des concepts aux processus multivariés et spatio- 

temporels, elles sont établies relativement aux processus spatiaux univariés. 

Il existe beaucoup de différentes sortes de non-stationnarité, deux d'entre elles sont discutées dans 

le deuxième chapitre. Dans un premier temps le cas où la moyenne du processus dépend du site est 

étudié. L'identification d'une tendance n'est pas un problème simple et nous soulignons qu'il n'existe pas 

de procédure d'estimation de la tendance pour les processus ponctuels dont la structure de dépendance 

est inconnue. Des outils exploratoires pour le variogramme empirique ou pour le processus observé, tout 

comme les méthodes paramétriques et non-paramétriques communément utilisées pour l'estimation de la 

tendance sont illustrées. Une méthode simple déduite de l'analyse visuelle des données est développée, à 

savoir l'estimation du variogramme basée sur 'l'estimation locale de la tendance'. Cette dernière sépare 

le domaine en plusieurs sous-domaines ou morceaux, sur lesquels une tendance propre est estimée; les 

résidus sont combinés sur le domaine entier pour permettre une estimation et une inférence globales. Des 

simulations montrent qu'une subdivision simple et presque arbitraire suffit déjà à améliorer les résultats de 

l'estimation du variogramme. De plus la méthode fonctionne même lorsque la décomposition (heuristique) 

ne coïncide pas avec la (vraie) séparation des populations. Même dans le cas où la tendance n'est pas 

linéaire, la méthode donne de meilleurs résultats que les méthodes connues d'estimation paramétriques et 

non-paramétriques de la tendance. Pour souligner ces affirmations la méthode est appliquée à des données 

réelles. Une seconde forme de non-stationnarité est constituée par la dépendance de la structure de cova- 

riance par rapport au site. Dans ces circonstances les techniques d'estimation classiques ne peuvent pas 

s'appliquer. Par exemple, dans les sciences de l'atmosphère, il est facilement imaginable de rencontrer des 

situations où la dépendance spatiale change au cours du temps ou encore où la magnitude de variabilité 

maximale se modifie dans le temps. Pour de tels phénomènes le développement de nouveaux modèles est 

nécessaire. Par conséquent la partie restante du deuxième chapitre présente une nouvelle méthode valable

pour la construction du covariogramme pour des processus spatio-temporels non-stationnaires. Ces nou- 

veaux modèles de covariogramme sont illustrés à l'aide de simulations et d'une application à un jeu de 

données. 

Plusieurs outils statistiques utilisent la matrice de covariance du processus sous-jacent. Un exemple 

d'une telle méthode est l'analyse en composantes principales (fonctionnelle) servant à représenter un en- 

semble de variables potentiellement corrélées par le biais de composantes orthogonales non corrélées. Ces 

composantes non corrélées peuvent être construites successivement, chacune extrayant une quantité maxi- 

male de la variance restante. Cela conduit souvent à une réduction appréciable de la dimension en rem- 

plaçant les variables par un nombre restreint de composantes. Pour calculer les composantes orthogonales 

la matrice de covariance d'un processus multivarié ou spatio-temporel est nécessaire. Cette dernière est 

rarement connue et par conséquent doit être estimée. Comme précisé précédemment, une importante caractéristique 

des données géostatistiques est leur dépendance à la fois spatiale et temporelle. Par conséquent 

cette caractéristique doit être prise en compte en estimant la matrice de covariance; un estimateur naturel 

de la covariance est présenté dans le troisième chapitre. Nous montrons qu'il est biaisé dans le cas d'une 

dépendance spatio-temporelle. Ce biais est étudié à l'aide de deux méthodes asymptotiques, à savoir en 

augmentant le nombre d'observations dans le domaine et en augmentant le domaine par le biais d'un ac- 

croissement du nombre de sites. En utilisant le premier modèle asymptotique nous obtenons une rapide et 

précise correction du biais, tandis que le second modèle asymptotique sert à quantifier la vitesse de conver- 

gence du biais et de la covariance des éléments de la matrice de covariance estimée. Nous démontrons 

que, sous de légères hypothèses, la matrice de covariance estimée suit asymptotiquement une distribution 

normale. Cette propriété peut être utilisée pour tester si les vecteurs propres de la matrice de covariance 

estimée et ceux de la vraie matrice de covariance sont significativement différents. Ce résultat est montré à 

l'aide d'exemples soulignant la nécessité de corriger le biais. De plus les propriétés théoriques sont illustrées 

à l'aide de simulations Monte-Carlo et à nouveau avec une application à des données réelles. 

La décomposition la plus fréquente pour extraire les parties stationnaires d'un processus utilise la 

séparation selon différentes échelles : une variation (déterministe) à longue échelle, une variation lissée à 

petite échelle, une variation à micro-échelle et une erreur de mesure. Bien qu'une telle partition additive 

soit d'une utilité considérable, elle comporte également plusieurs inconvénients. C'est pourquoi une ana- 

lyse alternative utilisant une décomposition en espace d'états est présentée dans le quatrième chapitre. 

L'équation d'espace est un processus régi par une équation d'état et une erreur d'observation additionnelle, 

où l'état en un point est une moyenne pondérée de ses états voisins décrite par une fonction de 

noyau plus un processus spatial. Le nouveau modèle prend en compte diverses formes de tendance et il 

n'est pas nécessaire de décider si le processus est stationnaire ou non. Comme d'autres décompositions 

existantes peuvent être reconstruites par la nouvelle représentation, elle peut être considérée comme une 

généralisation des méthodes existantes. La décomposition aboutit à une équation intégrale de Fredholm du 

second type. En imposant la séparabilité des noyaux cette équation intégrale possède une solution explicite 

et le modèle est défini par le covariogramme paramétrisé du processus spatial et les paramètres définissant 

le noyau. Dans notre modèle libre de distribution nous explorons diverses méthodes basées sur les distances 

minimales et les équations des moments pour l'estimation des paramètres, et en généralisant le concept des 

M-estimateurs au concept de dépendance, la consistance de ces nouveaux estimateurs est prouvée. L'effi- 

cacité de la méthode proposée est discutée et les résultats sont comparés à d'autres modèles fréquemment 

utilisés par le biais de simulations Monte Carlo étendues et d'applications à des jeux de données réelles. 

Malgré sa complexité le nouveau modèle fournit une approche efficace et compétitive dans toutes les si- 

mulations. Ce dernier montre également que pour la plupart des paramètres ce nouvel estimateur est plus 

précis que les estimateurs basés sur les moindres carrés ordinaires.

Kurzfassung 

Als geostatistische Daten bezeichnet man alle Arten von Messungen, die an bestimmten Orten 

in einem festgelegten räumlichen Gebiet vorgenommen wurden. Diese Gebiete sind gewöhnlich stetig, 

wie zum Beispiel in der Geologie, der Hydrologie, den Erdwissenschaften und im Bergwesen. Modelle 

für geostatistische Daten basieren auf räumlichen oder Raum-Zeitprozessen, welche die innewohnende 

Abhängigkeitsstruktur zu beschreiben versuchen, zum Beispiel wird die räumliche Variabilität (Kovarianz) 

durch eine Funktion der Stichprobenorte beschrieben. Diese Funktion wird üblicherweise Variogramm oder 

Kovariogramm genannt, ihre Verwendung zur Schätzung und/oder Vorhersage ist ein grundlegendes Element 

der Geostatistik. Die meisten Techniken zur Schätzung der Raum-Zeitabhängigkeitsstruktur basieren 

auf der Annahme, dass der zugrundeliegende Prozess stationär ist, diese Annahme entspricht jedoch nur 

selten der Realität. Nichtstationarität und Kovarianzschätzung bilden den roten Faden dieser Dissertation, 

die in die im Folgenden kurz zusammengefassten vier Kapitel aufgeteilt ist. 

Das erste Kapitel gibt einen kurzen Überblick über geostatistische Definitionen und Schreibweisen, 

welche später gebraucht werden. Es wird mit räumlichen Prozessen begonnen und sukzessive zu mehrdimensionalen 

und Raum-Zeitprozessen verallgemeinert. 

Es existieren viele verschiedene Formen von Nichtstationarität, von denen zwei im zweiten Kapitel 

genauer betrachtet werden. Im ersten Fall, Trend genannt, hängt der Mittelwert des Prozesses vom Ort 

im Raum ab. Die Identifizierung eines Trends ist ein nichttriviales Problem und es wird versucht aufzuzeigen, 

dass es keine optimale Trendschätzung gibt, wenn die zugrundeliegende Abhängigkeitsstruktur 

nicht bekannt ist. Wir zeigen einige explorative Datenanalysemethoden für empirische Variogramme und 

beobachtete Prozesse. Im Weiteren werden standardmässige parametrische und nichtparametrische Trendanpassungsmethoden 

erläutert. Von diesen Methoden ausgehend wird eine neue, einfache Denkweise zur 

Variogrammschätzung beschrieben, 'Lokale Trendschätzung' genannt. Diese teilt das Gebiet in mehrere 

Untergebiete ein, auf welchen der Trend geschätzt wird. Die Residuen werden zusammengefasst und erlauben 

eine globale Schätzung und/oder statistische Schlussfolgerungen. Simulationen zeigen, dass eine 

einfache und heuristische Aufteilung zu einer Verbesserung der Variogrammschätzung führt. Wenn der wahre 

(unbekannte) Trend nicht linear ist oder wenn die heuristische Aufteilung nicht der wahren (unbekannten) 

Aufteilung entspricht, ist die Methode besser als parametrische und nichtparametrische Trendschätzung, 

wie in Simulationen und in einer Anwendung gezeigt wird. Eine zweite Art der Nichtstationarität ist die 

Abhängigkeit der Form der Kovarianzstruktur vom Messort oder von der Messzeit, unter diesen Umständen 

ist die klassische Variogrammschätzung nicht möglich. Im Zusammenhang mit Untersuchungen der Atmosphäre 

kann zum Beispiel die Grösse der Variabilität der Daten von der Zeit abhängen, für solche 

Phänomene werden neue Modelle gebraucht. Im letzten Teil des zweiten Kapitels wird eine neue Methode 

zur Konstruktion von gültigen, nichtseparierbaren Kovariogrammen für nichtstationäre Raum-Zeitprozesse 

hergeleitet. Diese neuen Kovariogramme werden mit Simulationen und einer Anwendung illustriert. 

Viele statistische Anwendungen basieren auf der Kovarianzmatrix des modellierten Prozesses. Ein 

klassisches Beispiel einer solchen Methode ist (funktionale) Hauptkomponentenanalyse, welche eine Menge

von korrelierten Variablen in unkorrelierte, orthogonale Komponenten transformiert. Diese unkorrelierten 

Komponenten können sukzessive konstruiert werden, jede extrahiert den maximalen Anteil der Restvariabilität. 

Dieser Ansatz dient häufig zur Dimensionsreduktion, indem die ursprünglichen Variablen durch 

einige wenige orthogonale Komponenten ersetzt werden. Um diese Komponenten zu berechnen wird die 

Kovarianzmatrix des Raum-Zeitprozesses gebraucht, von welcher oft nur eine Schätzung vorhanden ist. 

Da geostatistische Daten eine innewohnende Abhängigkeitsstruktur über Raum und Zeit besitzen, muss 

diese in der Schätzung der Kovarianzmatrix berücksichtigt werden. Im dritten Kapitel wird der natürliche 

Schätzer unter Raum-Zeitkorrelation untersucht und gezeigt, dass dieser Schätzer einem systematischen 

Fehler unterliegt. Die Verzerrung wird unter zwei verschiedenen asymptotischen Modellen betrachtet: Die 

Anzahl Beobachtungen nimmt entweder in einem festgelegten Gebiet oder in einem entsprechend sich vergrössernden 

Gebiet zu. Unter dem ersten Blickwinkel wird eine schnelle und präzise Verzerrungskorrektur 

hergeleitet, in der zweiten Situation wird die Konvergenzrate der Terme der geschätzten Matrix bestimmt. 

Unter schwachen Voraussetzungen wird asymptotische Normalität des Schätzers gezeigt. Dieses Resultat 

ist notwendig für Tests von Eigenvektoren der wahren und geschätzten Kovarianzmatrizen. Hierzu werden 

Beispiele behandelt, die signifikante Unterschiede zwischen diesen Eigenvektoren aufweisen und somit die 

Notwendigkeit der Verzerrungskorrektur bestätigen. Die theoretischen Resultate werden mit Simulationen 

und Anwendungen auf realen Daten illustriert. 

Die am häufigsten genutzte Zerlegung zur Extraktion von stationären Teilen eines Prozesses basiert auf 

einer additiven Trennung der Streuung: (deterministische) Variation in grossem Ausmass, glatte Variation 

in kleinem Ausmass, Variation im Mikroausmass und schliesslich ein Messfehler. Obwohl diese Zerlegung 

von grosser praktischer Bedeutung ist, hat sie mehrere Schwachpunkte. Eine neue und alternative Darstellung 

basierend auf einer Zerlegung des Zustandsraumes ist im vierten Kapitel beschrieben. Hierzu wird der 

Prozess beschrieben durch zwei Gleichungen, der Raumgleichung und der Zustandsgleichung. Die Raumgleichung 

zerlegt den Gesamtprozess in einen von der Zustandsgleichung beschriebenen Teil und einen 

Messfehler, während die Zustandsgleichung ein durch einen Kern gewichtetes Mittel und einen stationären 

räumlichen Prozess enthält. Dieses neue Modell kann verschiedene Formen von Trends beschreiben, deshalb 

wird eine subjektive Entscheidung bezüglich des Trends überflüssig. Zusätzlich können mit dem neuen 

Modell existierende Zerlegungen beschrieben werden, so dass die Zustandsraumzerlegung als eine Verallgemeinerung 

betrachtet werden kann. Die Zustandsgleichung ist eine Fredholmsche Integralgleichung zweiter 

Art, wird ein separierbarer Kern vorausgesetzt, hat diese Gleichung eine explizite Lösung und das Modell 

ist durch das parametrisierte Kovariogramm des stationären räumlichen Prozesses und die Parameter des 

Kerns vollständig beschrieben. Trotz seiner Komplexität ist dieser neue Ansatz effizient und kompetitiv, 

da die Schätzung der meisten Parameter präziser ist als die Methode der kleinsten Quadrate.

Riassunto 

I dati geostatistici sono costituiti da misure eseguite in punti definiti nel dominio spaziale. Solitamente 

sono continui spazialmente. L'ingegneria mineraria, la geologia, la geotecnica e l'idrologia sono degli 

esempi tipici. I modelli geostatistici si basano sul concetto di processo spaziale o spazio-temporale e 

servono a descriverne la struttura di dipendenze. La variabilità spaziale è rappresentata da una funzione 

della distanza tra i luoghi di misura. Questa funzione è chiamata 'variogramma' o 'covariogramma' ed è 

utilizzata per applicare metodi statistici come la stima e/o la previsione, chiamati 'kriging' nel contesto 

geostatistico. Per quantificare le dipendenze spazio-temporali, si applicano delle tecniche di stima che si 

basano su ipotesi stazionarie che in pratica si verificano solo raramente. 

La non stazionarietà e la stima della covarianza sono i temi di fondo di questa tesi che è costituita da 

quattro capitoli. 

Il primo capitolo presenta una panoramica breve e concisa delle definizioni geostatistiche usate in questa 

tesi. È una premessa necessaria alla generalizzazione dei concetti ai processi multivariati e spazio-temporali; 

è stabilita in base ai processi spaziali univariati. 

Esistono molti tipi differenti di non stazionarietà, due dei quali sono trattati nel secondo capitolo. In un 

primo tempo ci si occupa del caso in cui la media del processo dipende dal luogo. L'identificazione di una 

tendenza non è un problema semplice e si sottolinea che non esistono procedure di stima della tendenza 

per processi puntiformi la cui struttura di dipendenza non è nota. Si illustrano inoltre degli strumenti d'esplorazione 

del variogramma empirico o del processo in esame, e dei metodi parametrici e non parametrici 

usati correntemente per la stima della tendenza. Si sviluppa un metodo semplice che deriva dall'analisi 

visuale dei dati, ossia la stima del variogramma basata sulla 'stima locale della tendenza'. Quest'ultima 

separa il dominio in diversi sottodomini o parti, nei quali si stima una tendenza propria; i residui vengono 

in seguito combinati sull'intero dominio per permettere una stima globale. Delle simulazioni mostrano 

che una suddivisione semplice e quasi arbitraria è già sufficiente per migliorare i risultati della stima del 

variogramma. In più, il metodo funziona anche quando la scomposizione (euristica) non coincide con la 

(vera) separazione delle popolazioni. Anche nel caso in cui la tendenza non è lineare questo metodo fornisce 

risultati migliori dei metodi già noti di stima parametrica e non parametrica della tendenza. Per 

verificare queste affermazioni il metodo viene applicato a dei dati reali. Una seconda forma di non stazionarietà 

è costituita dalla dipendenza della struttura di covarianza rispetto al luogo. In queste circostanze 

le tecniche classiche di stima non si possono applicare. Per esempio, nelle scienze dell'atmosfera, si possono 

trovare facilmente situazioni nelle quali la dipendenza spaziale varia nel tempo o dove la magnitudine della 

massima variabilità si modifica nel tempo. Per tali fenomeni è necessario sviluppare nuovi modelli. Conseguentemente 

la parte restante del secondo capitolo presenta un nuovo metodo, valido per la creazione del 

covariogramma per dei processi spazio-temporali non stazionari. Questi nuovi modelli vengono illustrati 

tramite simulazioni e un'applicazione ad un insieme di dati.

Diversi strumenti statistici usano la matrice di covarianza del processo di fondo. Un esempio di un 

tale metodo è l'analisi delle componenti principali (funzionali) che servono a rappresentare un insieme di 

variabili potenzialmente correlate tramite delle componenti ortogonali non correlate. Queste componenti 

non correlate possono venir costituite successivamente, in modo che ognuna estragga la massima quantità 

di varianza rimanente. Questo porta spesso ad una riduzione notevole della dimensione sostituendo le 

variabili con un numero ristretto di componenti. Per calcolare le componenti ortogonali è necessaria 

la matrice di covarianza di un processo multivariato o spazio-temporale, ma quest'ultima è raramente 

conosciuta e bisogna quindi stimarla. Come precedentemente precisato, una caratteristica importante 

dei dati geostatistici è la loro dipendenza spaziale e temporale. Bisogna quindi tener conto di questa 

caratteristica per stimare la matrice di covarianza; uno stimatore naturale della covarianza viene presentato 

nel terzo capitolo. Mostriamo che non è affidabile nel caso di una dipendenza spazio-temporale. Questo 

grazie a dei metodi asintotici, ossia aumentando il numero di osservazioni nel dominio o ingrandendo il 

dominio aumentando il numero di luoghi. Usando il primo metodo asintotico otteniamo una correzione 

rapida e precisa dell'errore, mentre il secondo serve a quantificare la velocità di convergenza degli elementi 

della matrice di covarianza stimata. Dimostriamo inoltre, con delle ipotesi leggere, che la matrice di 

covarianza stimata segue asintoticamente una distribuzione normale. Questa proprietà può essere usata 

per controllare se i vettori propri della matrice di covarianza stimata e quelli della vera matrice di covarianza 

si differenziano in maniera significativa. Questo risultato è illustrato tramite degli esempi, e le proprietà 

teoriche sono illustrate con delle simulazioni di Monte-Carlo e con un'applicazione a dei dati reali. 

La scomposizione più frequente per estrarre le parti stazionarie di un processo usa la separazione a 

scale differenti: una variazione (determinista) a larga scala, una variazione lisciata a scala più piccola, una 

variazione a micro-scala e un errore di misura. Anche se una tale ripartizione è di notevole aiuto, comporta 

anche diversi inconvenienti. Per questa ragione si presenta nel quarto capitolo un'analisi alternativa che 

usa una scomposizione nello spazio degli stati. L'equazione di spazio è un processo retto da un'equazione 

di stato e da un errore d'osservazione addizionale, dove lo stato in un punto è una media ponderata 

degli stati vicini che è descritta da una funzione 'kernel' e da un processo spaziale. Il nuovo modello 

tiene conto di diverse forme di tendenza e non è necessario decidere se il processo è stazionario o no. 

Siccome altre scomposizioni esistenti possono essere ricostruite con la nuova rappresentazione, si può 

considerarla una generalizzazione dei metodi esistenti. La scomposizione porta a un'equazione integrale 

di Fredholm di secondo tipo. Imponendo la separazione dei 'kernel' questa equazione integrale possiede 

una soluzione esplicita e il modello è definito dal covariogramma parametrico del processo spaziale e dai 

parametri che definiscono il 'kernel'. Nel nostro modello esploriamo diversi metodi basati sulle distanze minime 

e le equazioni dei momenti per la stima dei parametri e, generalizzando il concetto degli M-stimatori al 

concetto di dipendenza, si prova la consistenza di questi nuovi stimatori. Si discute l'efficacia del metodo 

proposto e si confrontano i risultati con quelli di altri modelli usati correntemente tramite simulazioni 

estese di Monte Carlo e applicazioni con dati concreti. Malgrado la sua complessità il nuovo modello risulta 

efficace e competitivo in tutte le simulazioni. Si rivela inoltre più preciso degli stimatori basati sui minimi 

quadrati ordinari per la maggior parte dei parametri.

Contents 

Acknowledgements 

Abstract 

Version abrégée 

Kurzfassung 

Riassunto 

Contents 

List of Figures 

List of Tables 

Prologue 

1 Overview of Geostatistical Data Analysis and Modeling 

1.1 Historical Overview 

1.1.1 Genesis 

1.1.2 Quo Vadis 

1.2 Univariate Spatial Processes 

1.2.1 Stationarity and Ergodicity 

1.2.2 Anisotropy 

1.2.3 Additive Decompositions 

1.2.4 Characterization Using Variograms 

1.2.5 Characterization Using Covariograms 

1.3 Estimation of the Second Moment Structure 

1.3.1 Estimation of Variograms 

1.3.2 Variogram Model Fitting 

1.4 Spatial Prediction 

1.4.1 Kriging 

1.4.2 Other Interpolation Approaches 

1.4.3 Stability of Kriging 

1.5 Multivariate Spatial Processes 

1.6 Spatio-Temporal Processes 

2 Nonstationarity Issues in Geostatistical Modeling 19 

2.1 T'rend Detection and Global Fitting . ................................ 19 

2.1.1 Exploratory Examination of the Process .......................... 20 

2.1.2 Fitting Parametric Models ................................. 24 

2.1.3 Fitting Nonparametric Models ............................... 26 

2.2 Local Trend Estimation ....................................... 27 

2.2.1 Simulations ......................................... 29 

2.2.2 Application ......................................... 29 

2.2.3 Local Variogram estimation . ................................ 34 

2.3 Covariograms of Nonstationary Spatio-Temporal Processes .................... 35 

2.3.1 Spectral Representation ................................... 35 

2.3.2 Simulations ......................................... 38 

2.3.3 Application ......................................... 40 

3 Covariance Estimation of Geostatistical Data 45 

3.1 Motivation .............................................. 45 

3.2 The Estimator Û .......................................... 46 

3.3 Asymptotic Considerations ..................................... 49 

3.3.1 Limiting Bias ........................................ 50 

3.3.2 Asymptotic Bias ....................................... 53 

3.3.3 Random and lrregular Locations .............................. 57 

3.4 Eigenvalues and Eigenvectors of U ................................. 59 

3.4.1 AsymptoticDistribution ................................... 59 

3.4.2 Confidence Cones ...................................... 63 

3.5 Simulations ............................................. 66 

3.6 Application . ............................................. 71 

4 State-Space Decomposition of Geostatistical Processes 

4.1 Motivation . ............................................. 

4.2 State-Space Decompositions .................................... 

4.2.1 State Equation with a Degenerate Kernel ......................... 

4.2.2 Other Types of State Equations .............................. 

4.3 Parameter Estimation ........................................ 

4.3.1 Moment Equations ..................................... 

4.3.2 Consistency ......................................... 

4.3.3 lnference ........................................... 

4.3.4 OLS, WLS, GLS and Robust Estimation .......................... 

......................................... 

4.4 lllustrative Examples 

4.4.1 Three Parameter Model . .................................. 

4.4.2 Gaussian Process with N = 1 . ...............................

4.5 Simulations 

4.5.1 Numerical Integration 

4.5.2 Estimation of Cij 

4.5.3 Studies on the Implementation of SSD 

4.5.4 Trend Contamination and True β ≠ 0 

4.6 Application 

4.6.1 Prediction 

4.6.2 SIC97 Data 

4.6.3 Lake Geneva Data 

4.7 Summary and Outlook 

Epilogue 

Appendix 

A Datasets 

A.1 SIC97 Data 

A.2 Lake Geneva Data 

A.3 Ozone Data 

B Supplementary Simulation Results 

B.1 Nonstationary Issues 

B.1.1 Trend Detection 

B.1.2 Local Trend Estimation 

B.2 Functional Principal Components Analysis 

B.3 State-Space Representation 

Glossary 

References 

Author Index 

Subject Index 

Curriculum Vitae
