Monte Carlo full waveform inversion

Methodology

Consider that the subsurface can be represented by a discrete set of model parameters, $\mathbf{m}$, and that a data set, $\mathbf{d}$, of indirect observations of the model parameters is provided. The model parameters describe physical properties of the subsurface that influence the data observations. Hence, the forward relation between the model parameters (i.e. the model) and the data observations can be expressed as (e.g. Tarantola, 2005):

$$\mathbf{d} = g(\mathbf{m}), \tag{1}$$

where $g$ is a linear or non-linear mapping operator which often relies on a physical law. Here, the forward relation in equation (1) is given as a finite-difference time-domain (FDTD) solution of Maxwell's equations. However, any numerical wave propagation modelling strategy for GPR or seismic signals can be applied. The inverse problem is to infer information about the model parameters based on a set of observations, a priori information about the model, and the forward relation between the model and the data observations.

In a Bayesian formulation the solution to the inverse problem is given as an a posteriori probability density, which can be formulated as (e.g. Tarantola, 2005):

$$\sigma_M(\mathbf{m}) = k\,\rho_M(\mathbf{m})\,L(\mathbf{m}), \tag{2}$$

where $k$ is a normalization constant, $\rho_M(\mathbf{m})$ is the a priori probability density, and $L(\mathbf{m})$ is the likelihood function. $\rho_M(\mathbf{m})$ describes the probability that the model satisfies the a priori information. $L(\mathbf{m})$ describes how well the modelled data explain the observed data given a data uncertainty.
Hence, the a posteriori probability density describes the probability that a certain model is a solution to the inverse problem.

A highly nonlinear inverse problem refers to the case where the a priori probability density is far from being Gaussian or the forward relation between the model and the data is far from being linear. In the case of full waveform inversion the forward relation is expected to be highly nonlinear. Moreover, the a priori information described by a training image is highly non-Gaussian.

The extended Metropolis algorithm is a versatile tool which, in particular, is useful to obtain samples from solutions to non-linear inverse problems using arbitrarily complex a priori information. The minimum requirements of the algorithm are: 1) a "black box" algorithm that is able to sample the a priori probability density and 2) a "black box" algorithm that is able to compute the likelihood for a given set of model parameters. The flowchart of the extended Metropolis algorithm is as follows: 1) The a priori sampler proposes a sample, $\mathbf{m}_{\text{propose}}$, from the a priori probability density, which is a perturbation of a previously accepted model, $\mathbf{m}_{\text{accept}}$. 2) The proposed sample is accepted with the probability (known as the Metropolis rule):

$$P_{\text{accept}} = \min\left(1, \frac{L(\mathbf{m}_{\text{propose}})}{L(\mathbf{m}_{\text{accept}})}\right) \tag{3}$$

3) If the proposed model is accepted, $\mathbf{m}_{\text{propose}}$ is a sample of the a posteriori probability and $\mathbf{m}_{\text{propose}}$ becomes $\mathbf{m}_{\text{accept}}$. Otherwise $\mathbf{m}_{\text{propose}}$ is rejected.
4) The procedure is continued until a desired number of models have been accepted.

In this study the algorithm that provides the a priori information is the Single Normal Equation SIMulation (snesim) algorithm, which is a fast geostatistical algorithm that produces samples (conditional or unconditional) from an a priori probability density defined by a training image for a relatively low number of categorical values (Strebelle, 2002). Hansen et al. (2008) suggest a strategy termed perturbed simulation, which is capable of producing perturbations of spatial distributions using geostatistical algorithms. Thus, perturbed simulation serves as a "black box" that produces samples of a priori probability densities described by both two-point and multiple-point statistics. The flow of this algorithm is as follows: 1) An initial unconditional sample of the a priori probability density (here defined by a training image) is provided. 2) A subarea of the sample is randomly chosen. 3) The model parameters within this area are set to unknown. 4) The unknown model parameters are resimulated conditional to the rest of the model parameters using a geostatistical algorithm (here snesim) and a perturbation is obtained. 5) This procedure is repeated in order to obtain multiple samples of the a priori probability density.

The size of the perturbation area governs the exploratory nature of the Metropolis algorithm. The size of the perturbation area is chosen subjectively. In the extreme case where the area covers the entire model, the outcome of the perturbed simulation algorithm is uncorrelated with the previous model. Conversely, if the area constitutes only a single model parameter, the perturbed model is highly correlated with the initial model.
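As a rough illustration of how the extended Metropolis algorithm and perturbed simulation fit together, the following Python sketch runs the sampler on a toy problem. Everything specific in it is an assumption made for illustration: the forward operator `toy_forward` is the identity rather than an FDTD solver, the prior is independent fair coin flips on a one-dimensional binary model rather than a training image sampled with snesim, and `perturb` resimulates a contiguous block of `block_size` parameters. The likelihood is assumed Gaussian and the acceptance probability is evaluated from log-likelihoods. The sketch stores the current model after every trial, which is the usual convention for Metropolis chains.

```python
import math
import random


def toy_forward(model):
    """Hypothetical forward operator g(m): here simply the identity mapping."""
    return [float(v) for v in model]


def log_likelihood(model, d_obs, sigma):
    """Log of an assumed Gaussian likelihood, up to an additive constant."""
    residuals = [g - d for g, d in zip(toy_forward(model), d_obs)]
    return -0.5 * sum(r * r for r in residuals) / sigma ** 2


def perturb(model, block_size, rng):
    """Resimulate a randomly placed block of parameters.

    Stand-in for perturbed simulation: the prior is assumed to be independent
    fair coin flips, so resimulating a block conditional on the rest reduces
    to drawing fresh 0/1 values inside the block.
    """
    start = rng.randrange(len(model) - block_size + 1)
    proposal = list(model)
    for j in range(start, start + block_size):
        proposal[j] = rng.randint(0, 1)
    return proposal


def extended_metropolis(d_obs, sigma, n_iter, block_size, seed=0):
    """Return the chain of models and the fraction of accepted proposals."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in d_obs]  # unconditional prior sample
    current_ll = log_likelihood(current, d_obs, sigma)
    chain, n_accepted = [], 0
    for _ in range(n_iter):
        proposal = perturb(current, block_size, rng)
        proposal_ll = log_likelihood(proposal, d_obs, sigma)
        # Metropolis rule: accept with probability min(1, L(propose)/L(accept)),
        # computed from log-likelihoods to avoid numerical underflow.
        p_accept = math.exp(min(0.0, proposal_ll - current_ll))
        if rng.random() < p_accept:
            current, current_ll = proposal, proposal_ll
            n_accepted += 1
        chain.append(list(current))
    return chain, n_accepted / n_iter


true_model = [0, 1, 1, 0, 0, 1, 0, 1]
d_obs = toy_forward(true_model)
chain, acceptance_rate = extended_metropolis(
    d_obs, sigma=0.3, n_iter=2000, block_size=2
)
```

Increasing `block_size` makes each proposal less correlated with the current model, which in this toy setting tends to lower `acceptance_rate`.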
According to the Metropolis rule, a small perturbation area results in a proposed model that is more likely to be accepted than a proposed model obtained using a larger perturbation area. Therefore, the perturbation area should be chosen carefully in order to ensure an efficient algorithm. Gelman et al. (1996) found that the acceptance rate should be around 23% for high-dimensional distributions. For larger acceptance rates the algorithm explores the a posteriori probability density too slowly. On the other hand, for smaller acceptance rates too many computationally expensive trials are performed. Therefore, we suggest automatically changing the size of the perturbation area while running the algorithm such that a certain acceptance rate is maintained. A constant acceptance rate results in a larger perturbation area in the burn-in period than in the subsequent sampling period. This effect is beneficial because the algorithm needs to perform large perturbations in the initial part in order to find models of large probability and, hence, produce representative samples of the a posteriori probability density.

SEG Expanded abstracts

Finally, the likelihood function is defined as a Gaussian distribution:

$$L(\mathbf{m}) = k \exp\left(-\frac{1}{2}\sum_{i=1}^{N}\frac{\left(g(\mathbf{m})^{i} - d^{i}_{\text{obs}}\right)^{2}}{\sigma^{2}}\right), \tag{4}$$

where $g(\mathbf{m})^{i}$ represents the amplitude of the individual sample points of all the simulated waveforms obtained through equation (1) (i.e. the FDTD algorithm) and $d^{i}_{\text{obs}}$ are the sample points of the observed waveform data. $\sigma$ is the standard deviation of the expected amplitude uncertainty of the waveform data.

Results and discussion

Figure 1 shows a training image that mimics a matrix of clay with embedded channels of unconsolidated sand. Electromagnetic signals in near-surface sediments are sensitive to the dielectric permittivity and the electrical conductivity of the materials.
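The permittivity values in this section are quoted together with phase velocities. A minimal sketch of the conversion between the two is given here, assuming a low-loss, non-magnetic medium so that $v = c/\sqrt{\varepsilon_r}$. That assumption and the helper names `phase_velocity` and `relative_permittivity` are illustrative and do not come from the text.

```python
import math

C_M_PER_NS = 0.299792458  # speed of light in vacuum, in m/ns


def phase_velocity(eps_r):
    """Phase velocity in m/ns, assuming a low-loss, non-magnetic medium."""
    return C_M_PER_NS / math.sqrt(eps_r)


def relative_permittivity(velocity):
    """Inverse relation: relative permittivity from a velocity in m/ns."""
    return (C_M_PER_NS / velocity) ** 2


v_clay = phase_velocity(4.57)  # clay matrix
v_sand = phase_velocity(2.75)  # sand channels
print(f"clay: {v_clay:.2f} m/ns, sand: {v_sand:.2f} m/ns")
```

Under this relation the permittivities 4.57 and 2.75 round to the velocities 0.14 m/ns and 0.18 m/ns given for clay and sand.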
In this study we consider only the influence of the dielectric permittivity, which primarily governs the phase velocity of the signal. Water saturation of clay is often high compared to sandy deposits. Therefore, the relative dielectric permittivity of the clay is set to εr ≈ 4.57 (0.14 m/ns) and that of the sand channels is set to εr ≈ 2.75 (0.18 m/ns) (e.g. Topp et al., 1980). Figure 2 (left) is the synthetic reference to be considered and is, at the same time, an unconditional sample of the training image obtained using snesim. The electrical conductivity is set to a constant value of 3 mS/m and is, in the following, assumed known.

A full waveform synthetic data set is calculated using the FDTD algorithm. A Ricker wavelet with a central frequency of 100 MHz is used as source pulse. The source pulse is assumed known during the inversion. The transmitter and receiver positions are separated by 2 m and 0.25 m, respectively (see Figure 2, left). Data acquired with a transmitter-receiver angle larger than 45 degrees from horizontal are omitted since, in practice, these data are affected by wave guiding in the boreholes (cf. Peterson, 2001). This leads to a total of 248 data observations (i.e. recorded waveforms).

[Figure 1 image. Axes: Distance [m] and Depth [m], each from 0 to 25.]
Figure 1. Training image which mimics sandy channel structures embedded in a matrix of clay deposits.

[Figure 2 image, left panel "Reference model". Axes: Distance [m] from 0 to 4 and Depth [m] from 0 to 12.]
Figure 2. Left) Synthetic reference model. Black asterisks show transmitter positions and the yellow dots show receiver positions.
Right) The initial model used as input for the inversion.
[Figure 2 image, right panel "Initial guess". Colour scale: relative dielectric permittivity (ε/ε0), from 2.6 to 4.6.]

Noise is subsequently added to the data by performing a random phase shift of the synthetic waveforms. The phase shift is normally distributed with zero mean and a standard deviation of 0.4 ns, since this is a typical magnitude found in GPR travel time data (e.g. Looms et al., in press). The phase shift results in an amplitude