specific sampling strategy, with the only exception of the simplest method (correlation ratios, CR). Graphical methods provide complementary visual information that helps in understanding the meaning of the numerical sensitivity indices and the global structure of the system model.

2.2.1 Monte Carlo based methods

Monte Carlo based methods may be divided into three types: regression/correlation based methods, Monte Carlo Filtering and gridding methods.

Regression/correlation methods assume that inputs and outputs may be linearly related. The simplest index is Pearson's correlation coefficient, a measure of linear relation that takes values between -1 and +1. Positive values indicate that input and output increase or decrease jointly; negative values indicate that when the input increases the output decreases, and vice versa. The closer the value is to +1 or -1, the stronger the relation and hence the more important the input parameter. An alternative, related measure of sensitivity is the slope, or regression coefficient, of the output versus the input, computed after standardising the sampled values (subtracting the corresponding sample mean and dividing by the corresponding standard deviation) in order to avoid scale effects in the sensitivity indices.

When the importance of several inputs is analysed at the same time, the tool used is multiple regression together with input and output standardisation. In this case the indices used are the Partial Correlation Coefficients (PCC) and the Standardised Regression Coefficients (SRC). These indices become really important when the inputs are correlated; otherwise they are exactly the same as, respectively, Pearson's correlation coefficient and the regression coefficient of simple regression. The value of PCCs and SRCs comes from the fact that they measure, respectively, the correlation and the regression coefficient between one input and one output after removing the influence of all the other inputs. Unfortunately, by default these indices are used to analyse only main effects; interactions are hardly ever considered in the analysis. Moreover, practitioners hardly ever study possible transformations of inputs and outputs that would yield a more appropriate regression model.

In many cases the relation between inputs and outputs is monotonic rather than linear. In those cases it is convenient to transform inputs and outputs into their ranks: the largest sampled value is transformed into n (the sample size), the second largest into n-1, ..., and the smallest into 1. This way the same tools, renamed Partial Rank Correlation Coefficients (PRCC) and Standardised Rank Regression Coefficients (SRRC), may be used to assess the importance of each input parameter. The reliability of the results obtained via linear regression depends on the coefficient of determination (R²) of the regression model obtained. If R² is close to 1 the results are very reliable; if it is close to 0, this sensitivity method is not appropriate for studying the system model at hand.
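As an illustration, the following Python sketch computes SRCs together with the model's R², and their rank-transformed counterparts (SRRCs), from a generic Monte Carlo sample. The arrays X (an n-by-k input sample) and y (the corresponding outputs), and the function names, are illustrative assumptions, not part of the original text.

```python
import numpy as np
from scipy.stats import rankdata

def src_indices(X, y):
    """Standardised Regression Coefficients (SRC) and the fit's R^2."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardise inputs
    ys = (y - y.mean()) / y.std(ddof=1)                # standardise output
    coef, *_ = np.linalg.lstsq(Xs, ys, rcond=None)     # multiple regression
    r2 = 1.0 - np.sum((ys - Xs @ coef) ** 2) / np.sum(ys ** 2)
    return coef, r2

def srrc_indices(X, y):
    """Rank-transformed variant (SRRC), suited to monotonic relations."""
    Xr = np.apply_along_axis(rankdata, 0, X)           # rank each column
    return src_indices(Xr, rankdata(y))
```

Following the criterion in the text, the SRCs (or SRRCs) would only be trusted when the returned R² is close to 1.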
Monte Carlo Filtering (MCF) is based on dividing the output sample into two or more subsets according to some criterion (achievement of a given condition, exceeding a threshold, etc.) and testing whether the inputs associated with those subsets differ. As an example, we could divide the output sample into two parts, the one that exceeds a safety limit and the rest, and ask whether the points in the two subsamples are related to different regions of a given input or may come from any region of that input. In the first case, knowing the value of that input parameter would be important in order to predict whether the safety limit will be exceeded; in the second case it would not. The tools used to answer this type of question are a set of parametric and non-parametric statistics and their associated tests, for example the two-sample t test, the two-sample F test, the two-sample Smirnov test, the k-sample Smirnov test, the Cramér-von Mises test, the Wilcoxon test (also known as the Mann-Whitney test) and the Kruskal-Wallis test; see Conover (1980) for details about each specific statistic and test. In general, non-parametric tests should be preferred, since they impose fewer restrictions on the samples used.

The idea behind gridding, and the tests used, are similar to those of Monte Carlo Filtering. The only real difference is that the criteria to divide the sample into two or more parts are set on the input space and the test is performed on the corresponding points in the output space.
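A minimal sketch of the Monte Carlo Filtering idea just described, using the two-sample Smirnov (Kolmogorov-Smirnov) test from scipy; the variable names and the safety limit are illustrative placeholders.

```python
import numpy as np
from scipy.stats import ks_2samp

def mc_filtering(X, y, safety_limit):
    """Test, input by input, whether the runs exceeding the safety
    limit come from a different input region than the remaining runs."""
    exceeds = y > safety_limit          # split the output sample in two
    results = []
    for j in range(X.shape[1]):
        res = ks_2samp(X[exceeds, j], X[~exceeds, j])
        results.append((j, res.statistic, res.pvalue))
    return results
```

A small p-value for input j indicates that the two subsamples occupy different regions of that input, i.e. that knowing x_j helps predict whether the limit is exceeded.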

2.2.2 Variance based methods

The variance (or equivalently the standard deviation) and the entropy are the main measures of uncertainty in probability theory. The larger the variance of a random variable, the less accurate our knowledge about it. Decreasing the variance of a given output variable is quite an attractive target, one that may sometimes be achieved by decreasing the variance of the input parameters (this is not always true; remember the possibility of risk dilution). This is what makes so attractive the methods that try to find out what fraction of the output uncertainty (variance) may be attributed to the uncertainty (variance) in each input.

Variance based methods find their theoretical support in Sobol's decomposition of any integrable function on the unit reference hypercube into 2^k orthogonal summands of different dimension: the mean value of the function, k functions that each depend on a single input parameter, k(k-1)/2 functions that depend on exactly two input parameters, k(k-1)(k-2)/6 that depend on exactly three input parameters, and so on. Replacing any output variable of the system model (our function) by its Sobol decomposition in the integral used to compute its variance yields, in a straightforward manner, the decomposition of the variance into its components. The quotient between each component of the variance and the total variance gives the fraction of the variance attributed to each single input parameter (main effects), to each combination of exactly two input parameters (second order interactions), and so on. These are called Sobol's sensitivity indices; see Sobol (1993). It is important to remark that Sobol's decomposition is equivalent to the classical Analysis of Variance (ANOVA) used in statistics.

Several algorithms have been proposed to compute Sobol's indices, the first one by Sobol himself. The main problem is the efficiency of the method, since one specific sample is needed to compute each sensitivity index. Since its development, huge efforts have been made to improve the strategies (algorithms) for computing Sobol's indices; see for example Saltelli (2002) and Tarantola et al. (2006). It remains a powerful but computationally expensive method.

Independently, and well before the development of Sobol's decomposition and indices, a method had been developed to compute first order sensitivity indices (equivalent to first order Sobol indices): the Fourier Amplitude Sensitivity Test (FAST); see Cukier et al. (1973), Schaibly and Shuler (1973) and Cukier et al. (1975). In order to compute sensitivity indices, these authors create a search curve that covers the input space reasonably well. Each input parameter is assigned an integer frequency, and varying all input parameters simultaneously according to that set of frequencies generates the search curve. Equally spaced points are sampled along the search curve and used to perform a Fourier analysis. The coefficients corresponding to the frequency assigned to each input parameter (and its harmonics) are used to compute the corresponding sensitivity index. Saltelli et al. (1999) introduced further improvements to the method, among them the possibility of computing total sensitivity indices for a given input parameter (the fraction of the variance due to that parameter and all its interactions of any order). Even so, FAST remains unable to compute sensitivity indices for individual interactions.
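To make the estimation of first order indices concrete, the sketch below implements one standard pick-and-freeze estimator of the Sobol indices S_i = V_i / V(Y) from two independent Monte Carlo samples. This particular estimator is a common form from the later literature rather than the exact algorithm of Sobol (1993) or Saltelli (2002), and the names f, A and B are illustrative placeholders.

```python
import numpy as np

def first_order_sobol(f, A, B):
    """First order Sobol indices S_i = V_i / V(Y) from two independent
    (n, k) input samples A and B, via a pick-and-freeze estimator."""
    yA, yB = f(A), f(B)
    var = np.var(np.concatenate([yA, yB]), ddof=1)  # total variance V(Y)
    S = np.empty(A.shape[1])
    for i in range(A.shape[1]):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                         # resample only input i
        S[i] = np.mean(yB * (f(ABi) - yA)) / var    # covariance-type estimate
    return S
```

The computational cost mentioned in the text is visible here: on top of the 2n model runs for A and B, each of the k indices requires n further runs (the f(ABi) evaluations).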
Correlation Ratios (CR) are an alternative to Sobol's method and FAST for computing first order sensitivity indices from an ordinary sample (SRS, LHS, etc.). So, although they are a method for computing variance based sensitivity indices, they could also be considered Monte Carlo based.
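As a rough illustration of how a correlation ratio can be obtained from a single ordinary sample, the sketch below estimates Var(E[Y|X_i]) / Var(Y) for one input by binning it into equal-probability bins; the binning scheme and all names are illustrative assumptions, not the specific CR algorithm used in the paper.

```python
import numpy as np

def correlation_ratio(x, y, bins=20):
    """First order index of one input: Var(E[y|x]) / Var(y),
    estimated from a plain random sample by binning x."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, bins - 1)
    weights = np.array([(idx == b).mean() for b in range(bins)])
    means = np.array([y[idx == b].mean() for b in range(bins)])
    between = np.sum(weights * (means - y.mean()) ** 2)  # Var(E[y|x])
    return between / np.var(y)
```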
