
Swiss Federal Institute of Technology Zurich
Seminar for Statistics, Department of Mathematics

Master Thesis, Summer 2011

Evgenia Ageeva

Bayesian Inference for Multivariate t Copulas Modeling Financial Market Risk

Submission Date: September 30th, 2011

Adviser: Prof. Dr. Peter Bühlmann
Co-Adviser: Dr. Martin Mächler


"Uncertainty is the only certainty there is, and knowing how to live with insecurity is the only security."

John Allen Paulos




Abstract

The main objective of this thesis is to develop a Markov chain Monte Carlo (MCMC) method under the Bayesian inference framework for estimating meta-t copula functions for modeling financial market risks. The complete posterior distribution of the copula parameters resulting from Bayesian MCMC allows further analysis, such as calculating risk measures that incorporate the parameter uncertainty. The simulation study of fictitious and real equity portfolio returns shows that the parameter uncertainty tends to increase risk measures such as the Value-at-Risk and the Expected Shortfall of the profit-and-loss distribution.


Contents

Declaration
1 Introduction
2 The Concept of Copula
  2.1 Basic Notation and Theorems
  2.2 Elliptical Copulas
3 The Student t and Meta-t Copulas
  3.1 The Multivariate t Distribution
  3.2 Standard t-Copula
    3.2.1 Simulation of t-Copulas
  3.3 Meta t-Copula
4 Maximum Likelihood Based Estimation of the t-Copula
  4.1 Copula Parameter Estimation: Overview over Classic Approaches
    4.1.1 Maximum Likelihood Method (ML)
    4.1.2 The Inference Functions for Margins Method (IFM)
    4.1.3 The Canonical Maximum Likelihood Method (CML)
  4.2 Student's t-Copula: CML Estimation
  4.3 Meta-t Copula with t Distributed Margins: IFM Estimation
  4.4 Simulation Study: Comparison between the IFM and CML Methods
5 Interludium on Bayesian Inference
  5.1 Basic Notation
  5.2 Metropolis-Hastings Algorithm
  5.3 Implementation Issues: Choice of q(θ* | θ)
    5.3.1 Random Walk Metropolis-Hastings
    5.3.2 Independence Chain Metropolis-Hastings
  5.4 Multiple-Block Metropolis-Hastings
  5.5 Obtaining an Approximate Random Sample for Inference
  5.6 Bayesian Point Estimation
  5.7 Bayesian Interval Estimation
6 Bayesian Estimate of the t-Copula's Degree of Freedom ν
  6.1 Bayesian Estimator of the Degrees of Freedom Parameter of a t-Copula
    6.1.1 Choice of the Prior Distribution
    6.1.2 Posterior Distribution of ν
    6.1.3 Choice of the Proposal Distribution
  6.2 Example
  6.3 Assessing Markov Chain Convergence
    6.3.1 Visual Analysis via Trace Plots
    6.3.2 Visual Analysis via Autocorrelation Plots
  6.4 Simulation Study: Bayesian vs. Classical Estimation Methods of ν
7 Correlation Estimation of the t-Copula
  7.1 Classical Approach: Kendall's tau Approximation
  7.2 Bayesian Correlation Estimation
    7.2.1 Bayesian Inference for a Covariance Matrix
    7.2.2 MCMC for the Covariance Matrix
  7.3 Example
  7.4 Simulation Study with Higher Matrix Dimensions
8 Bayesian t-Copula Distribution
  8.1 Multiple-Block Metropolis-Hastings for the t-Copula Parameters
9 Computing the Nearest Correlation Matrix
  9.1 Eigenvalue Method
  9.2 Higham Method
  9.3 Simulation Study
  9.4 Discussion
10 Application: Simulated and Real Equity Portfolio
  10.1 Value-at-Risk and Expected Shortfall
  10.2 VaR and ES Calculation Based on Simulated Portfolio Returns
  10.3 Interpretations
  10.4 Equity Portfolio
    10.4.1 Historical Simulation of VaR / ES
    10.4.2 ML Copula Calibration and Resulting VaR / ES
    10.4.3 Bayesian Copula Estimation and the Resulting VaR / ES
  10.5 Discussion
11 Conclusion and Outlook
  11.1 Future Work
Bibliography
A Complementary Information
  A.1 Inverse Wishart Distribution
  A.2 Equity Data: Yahoo Tickers and Principal Statistics




Chapter 1

Introduction

Copula functions have become a major tool in statistics for modeling and analysing dependence structures among financial risk factors. They allow the dependence structure to be modeled independently of the marginal distributions; in this way, we may construct a multivariate distribution with different margins and a dependence structure given by the copula function.

In practice, two of the most popular copulas for modeling multivariate financial data are the t-copula and the meta-t copula; see Embrechts et al. (2001) and Demarta and McNeil (2005). This popularity is due to their simplicity in terms of simulation and calibration, combined with their ability to model tail dependence, which is often observed in financial returns data. In addition, it is well known that financial data are usually heavy tailed, and we therefore base our financial risk model on t- and meta-t copulas.

Estimation of copula parameters is, in general, often based on classical maximum likelihood (ML) and its variations. The most common approaches are fully parametric (ML), stepwise parametric (the so-called Inference Functions for Margins method) or semi-parametric (the so-called Canonical Maximum Likelihood approach). The fact that point estimators are themselves uncertain (especially if the data are scarce) is usually neglected, and the degree of uncertainty of each parameter is not taken into account.

As a consequence, a model for financial market risk that is based on point estimators might lead to a wrong subjective risk perception and may create a dangerous overconfidence in the decision maker. Financial risk managers are therefore rightfully concerned with the precision of typical risk measurement techniques.
Hence, there is a need for a model of financial market risk that takes parameter uncertainty into account. The employment of Bayesian methods and simulation tools is a natural solution to this problem. From a Bayesian point of view, model parameters are random variables whose distribution can be inferred by combining the prior density with the likelihood of the observed data. The complete posterior distribution of the parameters resulting from Bayesian MCMC allows further analysis, such as parameter uncertainty quantification. Moreover, Bayesian methodology offers a natural framework to introduce parameter uncertainty into predicted risk measures, such as Value-at-Risk (VaR) or Expected Shortfall (ES), making it possible to obtain a posterior distribution of these risk measures given the data.


The Bayesian inference approach for parameter estimation has become very popular during the last two decades. In the operational risk management field, Wüthrich and Shevchenko (2009) and Dalla Valle and Giudici (2008) provide methods from Bayesian statistics to model loss and severity distributions, combining loss data and expert opinion. Increasingly, Bayesian MCMC finds new applications in the field of quantitative financial risk modeling. Böcker et al. (2010) and Borowicz and Norman (2009) incorporate parameter uncertainty using Bayesian methods for estimating Gaussian and Gumbel copulas, where the parameter uncertainty stems from the correlation matrix. Min and Czado (2010) provide a Bayesian analysis of pair-copula constructions. Dalla Valle (2007) proposes Bayesian inference based on MCMC for multivariate Gaussian and t-copulas. However, there has been very little research on the impact of parameter uncertainty stemming from specific copula models on the usual risk measures.

In this thesis we propose a Bayesian approach for modeling financial market risk through t-copulas. Our work makes two novel contributions in this field. First, we construct Bayesian t-copula parameter estimates, obtained via MCMC methods. In doing this, we explicitly address the parameter uncertainty associated with the copula models. Second, we propagate the parameter uncertainty into uncertainty about the risk measures VaR and ES; we can therefore do statistical inference on the so-called posterior distribution of VaR/ES, e.g. calculate Bayesian credible intervals.

Using historical data of equity asset prices as a case study, we found that Bayesian risk measures tend to have larger absolute values than the maximum likelihood based risk measures, i.e. parameter uncertainty has an increasing effect on the risk measures. Moreover, the posterior credible intervals for VaR and ES are larger than the usual bootstrapped confidence intervals for the maximum likelihood estimated model.

The thesis is structured as follows. In Chapters 2 and 3, basic notation on copulas is given and the t-copula is described in detail. An overview of well-known maximum likelihood based estimators for t-copulas is presented in Chapter 4. Next, the Bayesian methodology used is illustrated in detail, with particular focus on the description of simulation methods. In Chapters 6-9 we develop a Markov chain Monte Carlo method for estimating t-copula functions. In Chapter 10 we demonstrate the impact of parameter uncertainty on the risk measures for simulated and real equity portfolio returns. Finally, we draw some concluding remarks.


Chapter 2

The Concept of Copula

Copulas are functions that join multivariate distribution functions to their one-dimensional marginal distribution functions. Alternatively, a copula is a multivariate distribution function defined on [0, 1]^d whose one-dimensional margins are uniform on the unit interval. More details about copulas can be found in Joe and Xu (1996) and McNeil et al. (2005). In this chapter we introduce the basic notation and the most important theorems about copulas. We let R denote the ordinary real line (−∞, ∞), R̄ the extended real line [−∞, ∞], and R̄² the extended real plane [−∞, ∞]².

2.1 Basic Notation and Theorems

Here, we adopt the notation from McNeil et al. (2005).

Definition. A d-dimensional copula C(u_1, ..., u_d) is a multivariate distribution function defined on the unit hypercube [0, 1]^d with standard uniform marginal distributions. Specifically, C : [0, 1]^d → [0, 1], where C(u) = C(u_1, ..., u_d), is a d-dimensional copula if:

1. C(u_1, ..., u_d) is increasing in each component u_i.
2. C(1, ..., 1, u_i, 1, ..., 1) = u_i for all i ∈ {1, ..., d}, u_i ∈ [0, 1].
3. For all (a_1, ..., a_d), (b_1, ..., b_d) ∈ [0, 1]^d with a_i ≤ b_i we have

   \sum_{i_1=1}^{2} \cdots \sum_{i_d=1}^{2} (-1)^{i_1+\cdots+i_d}\, C(u_{1 i_1}, \ldots, u_{d i_d}) \ge 0,   (2.1)

   where u_{j1} = a_j and u_{j2} = b_j for all j ∈ {1, ..., d}.

The first property is clearly required of any multivariate distribution function, and the second property is the requirement of uniform marginal distributions. The third property is less obvious, but the so-called rectangle inequality in (2.1) ensures that if the random vector (U_1, ..., U_d)' has df C, then P[a_1 ≤ U_1 ≤ b_1, ..., a_d ≤ U_d ≤ b_d] is non-negative. These three properties characterize a copula; if a function C fulfills them, then it is a


copula. Note also that, for 2 ≤ k < d, the k-dimensional margins of a d-dimensional copula are themselves copulas.

The importance of copulas in the study of multivariate distribution functions is summarized by the following theorem, which shows, firstly, that all multivariate distributions contain copulas and, secondly, that copulas may be used in conjunction with univariate distribution functions to construct multivariate distribution functions.

Theorem 2.1.1 (Sklar 1959). Let F be a d-dimensional distribution function with margins F_1, F_2, ..., F_d. Then there exists a d-dimensional copula C : [0, 1]^d → [0, 1] such that, for x ∈ R̄^d, we have

   F(x_1, x_2, \ldots, x_d) = C(F_1(x_1), \ldots, F_d(x_d)).   (2.2)

If the margins are continuous, then C is unique; otherwise C is uniquely determined on Ran F_1 × Ran F_2 × ··· × Ran F_d, where Ran F_i denotes the range of F_i. Conversely, if C is a copula and F_1, F_2, ..., F_d are univariate distribution functions, then the function F defined in (2.2) is a joint distribution function with margins F_1, F_2, ..., F_d.

Remark. For the purpose of this thesis we concentrate exclusively on random vectors X = (X_1, ..., X_d)' whose marginal distribution functions are continuous and strictly increasing. In this case the so-called copula C of their joint distribution function may be extracted from (2.2) by evaluating

   C(u) := C(u_1, \ldots, u_d) = F(F_1^{-1}(u_1), \ldots, F_d^{-1}(u_d)),   (2.3)

where the F_i^{-1} are the quantile functions of the margins. The copula C can be thought of as the distribution function of the componentwise probability transformed random vector (F_1(X_1), ..., F_d(X_d))'.

Remark. By applying Sklar's theorem and by exploiting the relation between the distribution and the density function¹, we can easily derive the multivariate copula density c(F_1(x_1), ..., F_d(x_d)) associated with a copula function C(F_1(x_1), ..., F_d(x_d)):

   f(x_1, x_2, \ldots, x_d) = \frac{\partial^d C(F_1(x_1), F_2(x_2), \ldots, F_d(x_d))}{\partial F_1(x_1) \cdots \partial F_d(x_d)} \cdot \prod_{i=1}^{d} f_i(x_i)
                            = c(F_1(x_1), F_2(x_2), \ldots, F_d(x_d)) \cdot \prod_{i=1}^{d} f_i(x_i),

¹ In the univariate case, the density function f(x) of a random variable X can be obtained from the cumulative distribution function via f(x) = ∂F(x)/∂x.


where we define

   c(F_1(x_1), F_2(x_2), \ldots, F_d(x_d)) = \frac{f(x_1, x_2, \ldots, x_d)}{\prod_{i=1}^{d} f_i(x_i)}.   (2.4)

We will see later that knowledge of the associated copula density will be particularly useful in order to calibrate its parameters to real market data.

2.2 Elliptical Copulas

Elliptical copulas have become very popular in finance and risk management. An elliptical copula is the copula corresponding to an elliptical distribution via Sklar's theorem. Recall that whenever a multivariate density f(x) depends on x only through the quadratic form (x − µ)'Σ^{-1}(x − µ), it is the density of a so-called elliptical distribution. A detailed introduction can be found in McNeil et al. (2005). The two most popular elliptical copulas are the Gaussian and the t-copula.

Gaussian copula

The Gaussian copula is defined by

   C^{Ga}_P(u_1, \ldots, u_d) = Φ_P(Φ^{-1}(u_1), \ldots, Φ^{-1}(u_d)),

where Φ_P is the multivariate normal distribution function with correlation matrix P, Φ is the standard normal distribution function and Φ^{-1} is its functional inverse. The Gaussian copula does not have a simple closed form, but can be expressed as an integral over the density of X; in two dimensions the Gaussian copula is given by

   C^{Ga}_ρ(u_1, u_2) = \int_{-\infty}^{Φ^{-1}(u_1)} \int_{-\infty}^{Φ^{-1}(u_2)} \frac{1}{2π(1-ρ^2)^{1/2}} \exp\left( -\frac{s_1^2 - 2ρ s_1 s_2 + s_2^2}{2(1-ρ^2)} \right) ds_1\, ds_2,   (2.5)

where ρ is simply the linear correlation coefficient between the two random variables.

Student t-copula

The t-copula is defined as

   C^t_{ν,P}(u_1, \ldots, u_d) = t_{ν,P}(t_ν^{-1}(u_1), \ldots, t_ν^{-1}(u_d)),

where ν denotes the degrees of freedom (df) parameter and P is the correlation matrix. The bivariate t-copula with parameters ν and ρ is given by

   C^t_{ν,ρ}(u_1, u_2) = \int_{-\infty}^{F_{t_ν}^{-1}(u_1)} \int_{-\infty}^{F_{t_ν}^{-1}(u_2)} \frac{\Gamma\left(\frac{ν+2}{2}\right)}{\Gamma\left(\frac{ν}{2}\right)\, πν \sqrt{1-ρ^2}} \left( 1 + \frac{s_1^2 - 2ρ s_1 s_2 + s_2^2}{ν(1-ρ^2)} \right)^{-\frac{ν+2}{2}} ds_1\, ds_2,   (2.6)

where F_{t_ν}^{-1}(·) is the quantile function of the standard univariate Student t-distribution with df ν. Note that setting the df to infinity results in the Gaussian copula.
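Since the Gaussian copula is just the multivariate normal distribution function evaluated at normal quantiles, it is straightforward to evaluate numerically. The following is a minimal sketch in Python using SciPy (the function name is ours, not from the thesis); it also checks the Fréchet-Hoeffding bounds max(u_1 + u_2 − 1, 0) ≤ C(u_1, u_2) ≤ min(u_1, u_2) and the independence case ρ = 0.

```python
import numpy as np
from scipy import stats

def gaussian_copula_cdf(u1, u2, rho):
    """Evaluate the bivariate Gaussian copula C^Ga_rho(u1, u2)
    as Phi_P(Phi^{-1}(u1), Phi^{-1}(u2))."""
    x = np.array([stats.norm.ppf(u1), stats.norm.ppf(u2)])
    P = np.array([[1.0, rho], [rho, 1.0]])
    return stats.multivariate_normal.cdf(x, mean=np.zeros(2), cov=P)

c = gaussian_copula_cdf(0.3, 0.7, rho=0.5)
# rho = 0 reduces to the independence copula C(u1, u2) = u1 * u2
c_indep = gaussian_copula_cdf(0.3, 0.7, rho=0.0)
```

For ρ = 0 the copula factorizes, so `c_indep` is close to 0.3 · 0.7 = 0.21, while positive correlation increases the joint probability toward the upper bound min(u_1, u_2).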


The use of the t-copula in risk management has become widely established; see for example the recent work by Daul et al. (2003) or Fang and Fang (2002). In the following chapter we discuss the t-copula in greater detail.


Chapter 3

The Student t and Meta-t Copulas

In this chapter we describe the multivariate t-distribution and its copula, the so-called t-copula. Additionally, we present a related class of copulas, the so-called meta-t copulas.

3.1 The Multivariate t Distribution

The d-dimensional random vector X = (X_1, ..., X_d)' is said to have a (non-singular) multivariate t distribution with ν degrees of freedom (df), mean vector µ and positive-definite dispersion or scatter matrix Σ, denoted X ∼ t_d(ν, µ, Σ), if its density is given by

   f_{t_d(ν,µ,Σ)}(x) = \frac{\Gamma\left(\frac{ν+d}{2}\right)}{\Gamma\left(\frac{ν}{2}\right) \sqrt{(πν)^d |Σ|}} \left( 1 + \frac{(x-µ)' Σ^{-1} (x-µ)}{ν} \right)^{-\frac{ν+d}{2}},   (3.1)

where Γ is the Gamma function. The df parameter ν is the shape parameter of the distribution, which is more heavy-tailed for lower values of ν. Note that in this standard parameterization

   E[X] = µ   if ν > 1, and is undefined otherwise.

In addition, if ν > 2, then the covariance matrix of the Student's t-distribution is given by

   cov(X) = \frac{ν}{ν-2} Σ,

otherwise, i.e. for 1 < ν ≤ 2, it does not exist. In the univariate case the Student's t-distribution has the probability density function

   f_{t_ν}(x) = \frac{\Gamma\left(\frac{ν+1}{2}\right)}{\Gamma\left(\frac{ν}{2}\right) \sqrt{πν}} \left( 1 + \frac{x^2}{ν} \right)^{-\frac{ν+1}{2}},

where ν is the number of df.
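The density (3.1) can be verified numerically against SciPy's multivariate t implementation. The following sketch (assuming SciPy ≥ 1.6, where `scipy.stats.multivariate_t` is available; the function name and test point are ours) codes (3.1) directly in log form and compares it with the library value.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def mvt_logpdf(x, nu, mu, Sigma):
    """Log-density of t_d(nu, mu, Sigma), written directly from (3.1)."""
    d = len(mu)
    dev = x - mu
    quad = dev @ np.linalg.solve(Sigma, dev)      # (x-mu)' Sigma^{-1} (x-mu)
    _, logdet = np.linalg.slogdet(Sigma)
    return (gammaln((nu + d) / 2) - gammaln(nu / 2)
            - 0.5 * (d * np.log(np.pi * nu) + logdet)
            - (nu + d) / 2 * np.log1p(quad / nu))

nu, mu = 5.0, np.zeros(2)
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
x = np.array([0.5, -1.0])
ours = mvt_logpdf(x, nu, mu, Sigma)
ref = stats.multivariate_t(loc=mu, shape=Sigma, df=nu).logpdf(x)
```

Working on the log scale with `gammaln` avoids overflow of the Gamma function for larger ν or d.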


3.2 Standard t-Copula

A d-dimensional t-copula can be written as

   C(u) := C(u_1, \ldots, u_d) = F_{t_{ν,P}}(F_{t_ν}^{-1}(u_1), \ldots, F_{t_ν}^{-1}(u_d)).   (3.2)

The copula remains invariant under a standardization of the marginal distributions (in fact it remains invariant under any series of strictly increasing transformations of the components of the random vector X). This means that the copula of a t_d(ν, µ, Σ) distribution is identical to that of a t_d(ν, 0, P) distribution, where P is the correlation matrix implied by the dispersion matrix Σ. The unique copula is thus given by

   C^t_{ν,P}(u) = \int_{-\infty}^{F_{t_ν}^{-1}(u_1)} \cdots \int_{-\infty}^{F_{t_ν}^{-1}(u_d)} \frac{\Gamma\left(\frac{ν+d}{2}\right)}{\Gamma\left(\frac{ν}{2}\right) \sqrt{(πν)^d |P|}} \left( 1 + \frac{z' P^{-1} z}{ν} \right)^{-\frac{ν+d}{2}} dz,

where F_{t_ν}^{-1} denotes the quantile function of a standard univariate t_ν distribution. In the bivariate case we simplify the notation to C^t_{ν,ρ}, where ρ is the off-diagonal element of P.

Alternative representation

The t-copulas are most easily described and understood by a stochastic representation, as defined below.

• Z = (Z_1, ..., Z_d)' is a random vector from the multivariate normal distribution with zero mean vector, unit variances and correlation matrix P.
• U = (U_1, ..., U_d)' is a random vector taking values in [0, 1]^d.
• V is a random variable from the uniform (0, 1) distribution, independent of Z.
• W = G_ν^{-1}(V), where G_ν(·) is the distribution function of √(ν/S) with S following the chi-square distribution with ν df; in particular, W and Z are independent.
• F_{t_ν}(·) is the standard univariate t-distribution function and F_{t_ν}^{-1}(·) is its inverse.

Then we have the following representations. The random vector

   X = W Z   (3.3)

follows a multivariate t-distribution, and the random vector

   U = (F_{t_ν}(X_1), \ldots, F_{t_ν}(X_d))'   (3.4)

follows the standard t-copula, denoted by C^t_{ν,P}.

t-Copula probability density function

For estimation purposes it is useful to note that the density of the t-copula may be easily calculated from (3.2) and has the form

   c^t_{ν,P}(u) = \frac{f_{t_{ν,P}}(F_{t_ν}^{-1}(u_1), \ldots, F_{t_ν}^{-1}(u_d))}{\prod_{i=1}^{d} f_{t_ν}(F_{t_ν}^{-1}(u_i))},   u ∈ (0, 1)^d,   (3.5)


where f_{t_{ν,P}} is the joint density of a t_d(ν, 0, P) distributed random vector and f_{t_ν} is the density of the univariate standard t-distribution with ν df. Rewriting (3.5) gives a closed formula for the t-copula density function:

   c^t_{ν,P}(u) = \frac{\Gamma\left(\frac{ν+d}{2}\right) \Gamma\left(\frac{ν}{2}\right)^{d-1}}{|P|^{1/2}\, \Gamma\left(\frac{ν+1}{2}\right)^{d}} \cdot \frac{\left(1 + \frac{z' P^{-1} z}{ν}\right)^{-\frac{ν+d}{2}}}{\prod_{i=1}^{d} \left(1 + \frac{z_i^2}{ν}\right)^{-\frac{ν+1}{2}}},   (3.6)

where z = (F_{t_ν}^{-1}(u_1), ..., F_{t_ν}^{-1}(u_d))'.

Figure 3.1 shows the surface of the Student's t-copula density (3.6) for the bivariate case with correlation ρ.

Figure 3.1: Bivariate t-copula density function with ρ = 0.7 and ν = 7.

Dependence Measures

There are three common kinds of dependence measures in risk management: the usual Pearson linear correlation; Kendall's tau and Spearman's rho; and the coefficients of tail dependence. All of these dependence measures yield a scalar measurement for a pair of random variables (X, Y). We assume the reader is familiar with the notion of Pearson


correlation, and recall here the basic notation of the other two kinds of dependence measure that will play an important role throughout the thesis: Kendall's tau and the tail-dependence coefficients.

Kendall's tau

In the field of copula research, Kendall's tau is the most popular measure of concordance for bivariate random vectors. In general the measure is calculated as

   τ(X, Y) = E\left[\operatorname{sign}\left((X - \tilde{X})(Y - \tilde{Y})\right)\right],   (3.7)

where (X̃, Ỹ) is a second independent pair with the same distribution as (X, Y). It can be shown (see Embrechts et al. (2001)) that the Kendall's tau rank correlation τ depends only on the copula C (and not on the marginal distributions of X and Y) and is given by

   τ(X, Y) = 4 \int_0^1 \int_0^1 C(u_1, u_2)\, dC(u_1, u_2) - 1.

Remarkably, Kendall's tau takes the same elegant form for the Gauss copula C^{Ga}_ρ, the t-copula C^t_{ν,ρ}, and the copula of essentially all useful distributions in the bivariate elliptical class, this form being

   τ(X, Y) = \frac{2}{π} \arcsin ρ.   (3.8)

A proof of a slightly more general result applying to all elliptical distributions has been derived in Lindskog et al. (2001).

Coefficients of tail dependence

The upper and lower tail dependence coefficients of a copula provide a quantification of tail strength. The information whether the tails of data sets are asymptotically dependent or independent is especially important when fitting copulas to empirical data, as some models (e.g. the Gaussian copula) are asymptotically independent in the tail, while others have tail dependence. Let X and Y be random variables with continuous distribution functions F_X and F_Y. The coefficients of upper and lower tail dependence of X and Y are defined as

   λ_u(X, Y) = \lim_{q \to 1^-} P[X > F_X^{-1}(q) \mid Y > F_Y^{-1}(q)]   and
   λ_l(X, Y) = \lim_{q \to 0^+} P[X \le F_X^{-1}(q) \mid Y \le F_Y^{-1}(q)],

provided limits λ_u, λ_l ∈ [0, 1] exist.
For the normal copula and many others, these coefficients are zero. This means that for extreme values the distributions are asymptotically independent, so large-large or small-small combinations are unlikely. For the t-copula, as for the copula of any elliptically symmetric distribution, the two measures λ_l and λ_u coincide and are denoted simply by λ. According to Embrechts et al. (2001), λ is given by


   λ(ν, ρ) = 2\, t_{ν+1}\!\left( -\sqrt{ν+1}\, \sqrt{\frac{1-ρ}{1+ρ}} \right).

Provided that ρ > −1, the copula of the bivariate t distribution is asymptotically dependent in both the upper and lower tail. In Figure 3.2 we plot the coefficient of tail dependence for various values of ν and ρ. For fixed ρ, the strength of the tail dependence increases as ν decreases, and for fixed ν, tail dependence increases as ρ increases. Even for zero or negative correlation values there is some tail dependence. For a detailed explanation see McNeil et al. (2005), p. 211.

Figure 3.2: The coefficient of upper and lower tail dependence λ for the bivariate t-copula for different df ν and ρ values.

3.2.1 Simulation of t-Copulas

It should be evident from the way the t-copula is extracted from the well-known t distribution that it is particularly easy to sample from this copula, provided we can sample from the t distribution. The following algorithm illustrates how to sample from the standard t-copula.

Algorithm 3.2.1 (Simulation of the t-copula).

• Generate X ∼ t_d(ν, 0, P) by the usual approach (e.g. Cholesky decomposition) for generating correlated random variable samples.
• Return U = (F_{t_ν}(X_1), ..., F_{t_ν}(X_d))'. The random vector U has distribution function C^t_{ν,P}.
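Algorithm 3.2.1 can be sketched in Python as follows (an illustrative sketch, not code from the thesis; the function name, seed and parameter values are our own). It combines the stochastic representation (3.3)-(3.4) with a sanity check of the Kendall's tau relation (3.8).

```python
import numpy as np
from scipy import stats

def simulate_t_copula(n, nu, P, rng):
    """Sample n observations from the standard t-copula C^t_{nu,P}."""
    d = P.shape[0]
    L = np.linalg.cholesky(P)                 # P = L L'
    Z = rng.standard_normal((n, d)) @ L.T     # Z ~ N_d(0, P)
    S = rng.chisquare(nu, size=n)             # S ~ chi^2_nu, independent of Z
    X = Z * np.sqrt(nu / S)[:, None]          # X = W Z ~ t_d(nu, 0, P), cf. (3.3)
    return stats.t.cdf(X, df=nu)              # componentwise transform, cf. (3.4)

rng = np.random.default_rng(42)
rho, nu = 0.5, 4.0
P = np.array([[1.0, rho], [rho, 1.0]])
U = simulate_t_copula(20000, nu, P, rng)

# empirical Kendall's tau should be close to (2/pi) * arcsin(rho) by (3.8)
tau_emp, _ = stats.kendalltau(U[:, 0], U[:, 1])
tau_theory = 2 / np.pi * np.arcsin(rho)
```

Because Kendall's tau depends only on the copula, this check works regardless of which margins the uniforms are later pushed through.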


3.3 Meta t-Copula

The t-copula is often used in risk management because it allows the modelling of tail dependence between risks and is simple to simulate and calibrate. However, the use of a standard t-copula is often criticized for its restriction to a single degrees of freedom parameter ν, which may limit its capability to model the tail dependence structure in a multivariate setting.

The converse statement of Sklar's theorem provides a very powerful technique for constructing multivariate distributions with arbitrary margins and copulas: we know that if we start with a copula C and margins F_1, ..., F_d, then F(x) := C(F_1(x_1), ..., F_d(x_d)) defines a multivariate distribution function with margins F_1, ..., F_d. The following example will play an important role throughout the thesis: a meta-t copula, which has the copula C^t_{ν,P} and univariate t-distributed margins with different df parameters ν_1, ..., ν_d.

Meta-t Copula with t-distributed Margins

Let Θ := {ν_1, ..., ν_d, ν, P | ν_1, ..., ν_d, ν ∈ (0, ∞], −1 < ρ_{ij} < 1, i = 1, ..., d, j < i} be the parameter space of the d-variate meta-t copula with univariate t-margins, abbreviated as meta-t_θ, where θ ∈ Θ. The parameters of the marginal distributions are the df's ν_1, ..., ν_d, while ν and P are the parameters of the t-copula.

The density of the meta-t copula may be calculated from (3.2) analogously to the t-copula and has the form

   c^{meta-t}_{ν,P}(u) = \frac{\Gamma\left(\frac{ν+d}{2}\right) \Gamma\left(\frac{ν}{2}\right)^{d-1}}{|P|^{1/2}\, \Gamma\left(\frac{ν+1}{2}\right)^{d}} \cdot \frac{\left(1 + \frac{z' P^{-1} z}{ν}\right)^{-\frac{ν+d}{2}}}{\prod_{i=1}^{d} \left(1 + \frac{z_i^2}{ν_i}\right)^{-\frac{ν_i+1}{2}}},

where z = (F_{t_{ν_1}}^{-1}(u_1), ..., F_{t_{ν_d}}^{-1}(u_d))'.

Figure 3.3 illustrates contour plots of bivariate meta-t copulas with t margins on the original scale with ρ = 0.8 and different df values ν_1, ν_2 and ν.


Figure 3.3: Contour plots of bivariate meta-t probability density functions with ρ = 0.8 and different degrees of freedom parameters ν_1, ν_2 and ν.
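The meta-t construction described in this section — sample U from the t-copula C^t_{ν,P} and push each coordinate through a t quantile function with its own df ν_i — can be sketched as follows (an illustrative sketch under our own naming and parameter choices, not code from the thesis):

```python
import numpy as np
from scipy import stats

def simulate_meta_t(n, nu, P, nu_margins, rng):
    """Sample from a meta-t distribution: t-copula C^t_{nu,P} combined
    with univariate t margins whose df may differ per coordinate."""
    d = P.shape[0]
    L = np.linalg.cholesky(P)
    Z = rng.standard_normal((n, d)) @ L.T
    S = rng.chisquare(nu, size=n)
    X = Z * np.sqrt(nu / S)[:, None]          # X ~ t_d(nu, 0, P)
    U = stats.t.cdf(X, df=nu)                 # U ~ C^t_{nu,P}
    # quantile-transform each margin to a t distribution with its own df nu_i
    return np.column_stack([stats.t.ppf(U[:, i], df=nu_margins[i])
                            for i in range(d)])

rng = np.random.default_rng(0)
P = np.array([[1.0, 0.8], [0.8, 1.0]])
Y = simulate_meta_t(5000, nu=5.0, P=P, nu_margins=[3.0, 10.0], rng=rng)
tau = stats.kendalltau(Y[:, 0], Y[:, 1])[0]
```

Since the quantile transforms are strictly increasing, the copula (and hence Kendall's tau, here ≈ (2/π) arcsin 0.8) is unchanged, while the two margins now carry different tail weights.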




Chapter 4

Maximum Likelihood Based Estimation of the t-Copula

Calibrating copula parameters to real market data represents an active research area in the current statistical literature (see, for example, Mashal and Zeevi (2002), Kim et al. (2007) or Chen and Fan (2007)).

There exist several methods of estimating copula parameters. The one-step method, or maximum likelihood (ML) method, estimates all parameters of the model simultaneously. The second method is the two-step estimator, or the method of inference functions for margins (IFM), which first estimates the parameters of the margins and then estimates the copula function. The canonical maximum likelihood method (CML), or semiparametric estimation, leaves the marginal densities unspecified and uses the empirical probability integral transform in order to obtain the uniform marginals needed to estimate the copula parameters.

In this chapter we introduce these well-known methods of calibrating a copula in general and a t-copula in particular. In the following analysis we consider a random sample represented by the time series X = (X_{1t}, ..., X_{dt})_{t=1}^N, where d stands for the number of underlying risk factors included and N represents the number of observations (on a daily, monthly, quarterly or yearly basis) available.

4.1 Copula Parameter Estimation: Overview over Classic Approaches

4.1.1 Maximum Likelihood Method (ML)

Let Θ be the parameter space and θ the k-dimensional vector of parameters to be estimated. Let L(θ) and l(θ) be, respectively, the likelihood and the log-likelihood for the observations. If X follows a copula C, with the canonical expression for the copula density function given in equation (2.4), then the log-likelihood function of the copula


parameters is

   l(θ) = \sum_{t=1}^{N} \ln c(F_1(x_1^t), \ldots, F_d(x_d^t)) + \sum_{t=1}^{N} \sum_{n=1}^{d} \ln f_n(x_n^t).   (4.1)

The assumptions to be made are that the log-likelihood function attains its supremum and that the data are independent and identically distributed. The method can, however, be applied in a broader setting; see Ferguson (1996). Since X is a time series vector and no iid assumption can be imposed, we drop the independence assumption for simplicity.

We finally define the maximum likelihood estimator as the vector θ̂ such that

   \hat{θ} := (\hat{θ}_1, \hat{θ}_2, \ldots, \hat{θ}_k) = \arg\max\{l(θ) : θ ∈ Θ\}.

Student's t Copula. Let Θ = {(ν, P) : ν ∈ (0, ∞], P ∈ R^{d×d}}, with P being a symmetric and positive definite matrix, denote the parameter space. We can then apply (4.1) to the case of the Student's t-copula density described in equation (3.6). In this case we obtain

   l_{Student}(θ) = N \ln \frac{\Gamma\left(\frac{ν+d}{2}\right)}{\Gamma\left(\frac{ν}{2}\right)} - dN \ln \frac{\Gamma\left(\frac{ν+1}{2}\right)}{\Gamma\left(\frac{ν}{2}\right)} - \frac{N}{2} \ln |P| - \frac{ν+d}{2} \sum_{t=1}^{N} \ln\left(1 + \frac{z_t' P^{-1} z_t}{ν}\right) + \frac{ν+1}{2} \sum_{t=1}^{N} \sum_{n=1}^{d} \ln\left(1 + \frac{z_{nt}^2}{ν}\right),   (4.2)

where z_t = (F_{t_ν}^{-1}(u_1^t), ..., F_{t_ν}^{-1}(u_d^t))'.

The calibration of the t-copula requires a simultaneous estimation of the parameters of the margins and the parameters related to the dependence structure, i.e. the correlation matrix. This procedure, as pointed out by Mashal and Naldi (2002), needs a huge amount of data and is computationally very intensive. For that reason, alternative methodologies have been introduced, as reported in the next paragraphs.

4.1.2 The Inference Functions for Margins Method (IFM)

This method, based on the pioneering work of Joe and Xu (1996) and exploiting the fundamental idea of copula theory (that is, the separation between the univariate margins and the dependence structure), expresses equation (4.1) in the following representation:

   l(θ) = \sum_{t=1}^{N} \ln c\left(F_1(x_1^t; θ_1), \ldots, F_d(x_d^t; θ_d); α\right) + \sum_{t=1}^{N} \sum_{n=1}^{d} \ln f_n(x_n^t; θ_n).   (4.3)

The peculiarity of (4.3) lies in the separation between the vector of the parameters for the univariate marginals θ = (θ_1, ..., θ_d) and the vector of the copula parameters α. In


other words, the calibration of the copula parameters to data is performed via a two-stage procedure:

1. Estimation of the vector of parameters of the univariate marginals θ = (θ_1, ..., θ_d) via the ML method. For example, considering the time series of the i-th risk factor, we have
\[
\hat\theta_i = \arg\max_{\theta_i} \sum_{t=1}^{N} \ln f_i(x_i^t; \theta_i).
\]

2. Estimation of the vector of copula parameters α, using the previous estimators θ̂ = (θ̂_1, ..., θ̂_d):
\[
\hat\alpha_{IFM} = \arg\max_{\alpha} \sum_{t=1}^{N} \ln c\big(F_1(x_1^t; \hat\theta_1), \ldots, F_d(x_d^t; \hat\theta_d); \alpha\big).
\]

The IFM estimator is then defined as the vector θ_{IFM} = (θ̂, α̂_{IFM}).

Remark. As illustrated by Joe and Xu (1996), the gain in computational convenience often comes at the expense of efficiency. Kim et al. (2007) further show that an inappropriate choice of models for the margins may lead to poor estimation of the dependence parameter itself. Moreover, a small sample size may lead to poor estimation of the parameters.

4.1.3 The Canonical Maximum Likelihood Method (CML)

Both the ML and IFM methods impose an exogenous parametric form on the univariate margins.¹ There is an alternative method which makes no a priori assumption on the distributional form of the margins, called the Canonical Maximum Likelihood (CML) method; it relies on the concept of the empirical marginal transformation. This transformation approximates² the unknown parametric marginals F_i(·), for i = 1, ..., d, by the slightly modified empirical distribution functions F̂_i(·) defined as
\[
\hat F_i(\cdot) = \frac{1}{N+1} \sum_{n=1}^{N} \mathbf{1}_{\{X_{ni} \le \cdot\}}, \quad i = 1, \ldots, d, \tag{4.4}
\]
where 1_{\{X_{ni} ≤ ·\}} is the indicator function. We take 1/(N+1) instead of 1/N to avoid potential problems with F̂_i reaching the boundary of [0, 1].

The CML calibration then proceeds in the following steps:

1.
Transformation of the initial data set X = (X_{1t}, ..., X_{dt})_{t=1}^N into the set of so-called pseudo-observations, using (4.4), that is, for t = 1, ..., N, let
\[
\hat u^t = (\hat u_1^t, \ldots, \hat u_d^t) = \big(\hat F_1(X_{1t}), \ldots, \hat F_d(X_{dt})\big). \tag{4.5}
\]

¹ Note that for equation (4.2) we have fixed z = (F_{t_ν}^{-1}(u_1), F_{t_ν}^{-1}(u_2), ..., F_{t_ν}^{-1}(u_d))'.
² The Glivenko-Cantelli lemma ensures that, as the sample size tends to infinity, F̂ converges uniformly to F on the real line, almost surely.


2. Estimation of the vector of copula parameters α via
\[
\hat\alpha_{CML} = \arg\max_{\alpha} \sum_{t=1}^{N} \ln c(\hat u_1^t, \ldots, \hat u_d^t; \alpha).
\]

The CML estimator is then defined as the vector θ_{CML} = α̂_{CML}.

Consistency and asymptotic normality of the CML estimator were shown by Genest et al. (1995) for the iid case and by Chen and Fan (2006) for copula-based time series models. Chen and Fan (2006) also showed that, when the copula is misspecified (which in most practical situations is very likely), the estimator still converges to the parameter value under which the estimated model fits the data best.

Remark. The IFM method is more advantageous than the CML method for modeling purposes, because it gives a full parametric description of the data through the estimates ν, ν_1, ..., ν_d, P. The CML method gives only the estimates ν and P, leaving the whole specification of the margins nonparametric. As a result, the CML method is less suitable for modeling purposes.

4.2 Student's t-Copula: CML Estimation

Mashal and Zeevi (2002) suggest the following algorithm to estimate the parameters ν and P of the t-copula:

1. Transform the data set X = (X_{1t}, ..., X_{dt})_{t=1}^N into uniform variates û^t = (û_1^t, ..., û_d^t), using (4.4).

2. Estimate the correlation matrix P using Kendall's tau nonparametric estimator (McNeil et al. (2005)):
\[
\hat P_{ij} = \sin\Big(\frac{\pi}{2}\,\hat\tau_{ij}\Big), \quad i, j = 1, \ldots, d. \tag{4.6}
\]
This correlation estimation method is justified by equation (3.8) and will be discussed in more detail in Chapter 7. Note that there is no guarantee that the componentwise transformation (3.8) of the matrix of Kendall's tau rank correlations remains positive definite. In that case P̂ can be transformed with the eigenvalue or Higham method to obtain a positive definite matrix that is close to P̂.
We discuss these two methods in Chapter 9.

3. The remaining parameter ν of the copula is then estimated by maximum likelihood:
\[
\hat\nu = \arg\max_{\nu \in (0, \infty]} \sum_{t=1}^{N} \ln c\big((\hat u_1^t, \ldots, \hat u_d^t); \nu, \hat P\big).
\]
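The three CML steps can be condensed into a short sketch. The following Python illustration is not the thesis code (the thesis implements its algorithms in MATLAB, and all function names here are ours); it implements the rank transform (4.4), the Kendall's-tau estimator (4.6), and the one-dimensional likelihood search for ν, assuming numpy/scipy are available:

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln
from scipy.optimize import minimize_scalar

def tcopula_loglik(nu, U, P):
    """t-copula log-likelihood (4.2) at pseudo-observations U, shape (N, d)."""
    N, d = U.shape
    z = stats.t.ppf(U, df=nu)                       # quantile transform
    quad = np.einsum('ti,ij,tj->t', z, np.linalg.inv(P), z)
    _, logdetP = np.linalg.slogdet(P)
    return (N * (gammaln((nu + d) / 2) - gammaln(nu / 2))
            - d * N * (gammaln((nu + 1) / 2) - gammaln(nu / 2))
            - N / 2 * logdetP
            - (nu + d) / 2 * np.log1p(quad / nu).sum()
            + (nu + 1) / 2 * np.log1p(z ** 2 / nu).sum())

def cml_tcopula(X):
    """CML estimation of (nu, P), following the three Mashal-Zeevi steps."""
    N, d = X.shape
    U = stats.rankdata(X, axis=0) / (N + 1)         # pseudo-observations (4.4)
    P = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            tau, _ = stats.kendalltau(X[:, i], X[:, j])
            P[i, j] = P[j, i] = np.sin(np.pi / 2 * tau)   # estimator (4.6)
    res = minimize_scalar(lambda nu: -tcopula_loglik(nu, U, P),
                          bounds=(2.0, 100.0), method='bounded')
    return res.x, P
```

In practice P̂ may still need the positive-definiteness adjustments mentioned above before the likelihood search.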


4.3 Meta-t Copula with t Distributed Margins: IFM Estimation

A similar approach can be developed for calibrating a meta-t copula with t distributed margins. The set of parameters to be estimated is {ν, ν_1, ..., ν_d, P}.

1. Estimation of the parameters of the univariate marginals ν_1, ..., ν_d with the ML method. For example, considering the time series of the i-th risk factor, we have
\[
\hat\nu_i = \arg\max_{\nu_i} \sum_{t=1}^{N} \ln f_i(x_i^t; \nu_i).
\]

2. Transformation of the data set X = (X_{1t}, ..., X_{dt})_{t=1}^N into uniform variates û^t = (û_1^t, ..., û_d^t), using the fitted marginal t distribution functions with degrees of freedom ν̂_1, ..., ν̂_d:
\[
\hat u^t = \big(F_{t_{\hat\nu_1}}(X_{1t}), \ldots, F_{t_{\hat\nu_d}}(X_{dt})\big). \tag{4.7}
\]

3. Estimation of the correlation matrix coefficients P_{ij} from the Kendall's tau rank correlation coefficients:
\[
\hat P_{ij} = \sin\Big(\frac{\pi}{2}\,\hat\tau_{ij}\Big), \quad i, j = 1, \ldots, d.
\]

4. Numerical search for ν̂, i.e.,
\[
\hat\nu = \arg\max_{\nu \in (0, \infty]} \sum_{t=1}^{N} \ln c(\hat u_1^t, \ldots, \hat u_d^t; \nu, \hat P),
\]
where
\[
c(u; \nu, P) := c(u_1, \ldots, u_d; \nu, P) = \frac{\Gamma\!\big(\tfrac{\nu+d}{2}\big)\,\Gamma\!\big(\tfrac{\nu}{2}\big)^{d-1}}{|P|^{1/2}\,\Gamma\!\big(\tfrac{\nu+1}{2}\big)^{d}} \cdot \frac{\big(1 + \tfrac{z' P^{-1} z}{\nu}\big)^{-\frac{\nu+d}{2}}}{\prod_{i=1}^{d}\big(1 + \tfrac{z_i^2}{\nu}\big)^{-\frac{\nu+1}{2}}},
\]
and, again, z denotes (F_{t_ν}^{-1}(û_1), ..., F_{t_ν}^{-1}(û_d))'.

4.4 Simulation Study: Comparison between the IFM and CML Methods

The aim of this simulation study is to compare the efficiency of the CML and the IFM estimation methods for a t-copula. This is done by comparing the true parameter with the parameters estimated via the two strategies.

Sample size N = 2000

We simulate a random sample following a C^t_{ν,P} copula, represented by the time series X = (X_{1t}, ..., X_{dt})_{t=1}^N, where d stands for the number of underlying risk factors and


N = 2000 represents the number of observations for each risk factor. The true value of the df parameter is ν = 7.

The copula parameter ν is estimated with the CML and the IFM methods and the deviation from the true parameter is recorded. We repeat this procedure 150 times for each of the dimensions d = 2, 5, 10, 20 and 50. For simplicity, the correlation matrices are generated with every off-diagonal element set to 0.7.

The results of the simulation study are presented in Figure 4.1. Both estimation techniques display approximately equal accuracy for the underlying data sample size of 2000.

Figure 4.1: Descriptive statistics of the absolute errors of the CML and the IFM t-copula estimators for copula dimensions d = 2, 5, 10, 20 and 50 on the logarithmic scale. The sample size of the underlying dataset is 2000. The number of simulations for each copula dimension is 150.

The two-sample paired t-test fails to reject the null hypothesis that the difference between the means of the accumulated absolute errors of the CML and the IFM methods is zero; the p-value is 0.9815. As expected, for a large sample size (here 2000) there is no significant difference in efficiency between the CML and the IFM methods.

Sample size N = 300

We now repeat the same procedure with the data sample size set to 300. The results of the simulation study are presented in Figure 4.2. The two estimation techniques no longer display equal accuracy, which is best observed in Figure 4.3.

The two-sample paired t-test rejects the null hypothesis that the difference between the means of the accumulated absolute errors of the CML and the IFM methods is zero. The corresponding p-value is 0.0024.
In other words, the IFM method is less efficient than the CML method when the sample size is small.
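For completeness, drawing the C^t_{ν,P} samples used in this study (cf. Section 3.2.1) can be sketched as follows; this is an illustrative Python version with the equicorrelation matrix (off-diagonal entries 0.7) of the study, and the helper name `rtcopula` is ours:

```python
import numpy as np
from scipy import stats

def rtcopula(n, nu, P, rng=None):
    """Draw n samples from the t-copula C^t_{nu,P}: multivariate t draws
    mapped componentwise through the univariate t_nu cdf (Section 3.2.1)."""
    rng = np.random.default_rng(rng)
    d = P.shape[0]
    L = np.linalg.cholesky(P)
    Z = rng.standard_normal((n, d)) @ L.T           # N(0, P) draws
    W = rng.chisquare(nu, size=(n, 1)) / nu         # chi^2_nu / nu mixing variable
    return stats.t.cdf(Z / np.sqrt(W), df=nu)       # U(0,1) margins, t dependence

# equicorrelation matrix with off-diagonal entries 0.7, as in the study
d = 5
P = np.full((d, d), 0.7)
np.fill_diagonal(P, 1.0)
U = rtcopula(2000, nu=7, P=P, rng=42)
```

The empirical Kendall's tau of such a sample is close to (2/π) arcsin(0.7), consistent with relation (3.8).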


Figure 4.2: Descriptive statistics of the absolute errors of the CML and the IFM t-copula estimators for copula dimensions d = 2, 5, 10, 20 and 50 on the logarithmic scale. The sample size of the underlying dataset is 300. The number of simulations for each copula dimension is 150.

Figure 4.3: Absolute errors of the CML vs. the IFM t-copula estimators for copula dimensions d = 2, 5, 10, 20 and 50. The sample size of the underlying dataset is 300. The number of simulations for each copula dimension is 150.




Chapter 5

Interludium on Bayesian Inference

The idea behind the present work is to estimate the parameters of the copula model not only with the classical but also with the Bayesian approach and, in particular, with Markov chain Monte Carlo (MCMC) methods. In this chapter we introduce the necessary notation and tools from Bayesian inference.

Carlin (2008) offers a full overview of Bayesian statistics; an interested reader may find a detailed treatment of MCMC methods in Liang et al. (2010).

5.1 Basic Notation

Consider a random vector of observations X = (X_1, X_2, ..., X_N) whose density, for a given vector of parameters θ = (θ_1, ..., θ_d), is p(X | θ). From a Bayesian point of view, both observations and parameters are considered random. Bayes' theorem can then be formulated as
\[
p(X, \theta) = p(X \mid \theta)\,\pi(\theta) = p(\theta \mid X)\,p(X), \tag{5.1}
\]
where

• p(X, θ) is the joint density of the observed data and the parameters;

• p(X | θ) is the density of the observations given the parameters, which is the likelihood function L(θ | X); from now on we use the notation L(θ | X);

• π(θ) is the density of the parameters, the so-called prior distribution. Typically, π(θ) depends on a set of further parameters called hyperparameters. The prior density encodes our relative belief weights for every possible parameter value before we observe the data. It can stem from available expert opinion or from inspection of the parameter space. For simplicity of notation, we consider continuous π(θ) only;

• p(θ | X) is the density of the parameters given the observed data X, the so-called posterior distribution. It measures how plausible we consider each possible parameter value after observing the data;

• p(X) is the marginal density of X.


The latter can also be written as
\[
p(X) = \int L(\theta \mid X)\,\pi(\theta)\, d\theta. \tag{5.2}
\]
Using (5.1), the posterior distribution can be written as
\[
p(\theta \mid X) = \frac{L(\theta \mid X)\,\pi(\theta)}{p(X)} = \frac{L(\theta \mid X)\,\pi(\theta)}{\int L(\theta \mid X)\,\pi(\theta)\, d\theta}. \tag{5.3}
\]
Note that the integral in the denominator of (5.3) can be very difficult to evaluate, even numerically. We therefore work with
\[
p(\theta \mid X) \propto \pi(\theta) \cdot L(\theta \mid X). \tag{5.4}
\]
The posterior distribution of the parameter θ is proportional to the prior distribution of θ times the likelihood. Formula (5.4) gives us all the information about the shape of the posterior. However, we cannot use it directly to find probabilities or compute moments, since it is not a density.

A possible solution is to draw samples from the posterior distribution and base inference on those samples. Computational Bayesian statistics offers various methods to draw a sample from the true posterior even when we only know it in unscaled form. The most popular method is Markov chain Monte Carlo sampling, which we present in the following.

5.2 Metropolis-Hastings Algorithm

Markov chain Monte Carlo methods can be used for generating samples from the posterior distribution. Here we do not draw our sample from the posterior distribution directly. Rather, we set up a Markov chain that has the posterior distribution as its limiting distribution. The Metropolis-Hastings (MH) algorithm, the Gibbs sampler, and the substitution sampler are methods for doing this. We let the Markov chain run a long time until it has approached the limiting distribution. Any value taken after that initial burn-in period approximates a draw from the limiting distribution.
For more detail about Bayesian inference or MCMC methods, consult Bolstad (2010) or Carlin (2008). A full justification of why the following algorithms work requires Markov chain theory, which we prefer to avoid here; see Chib and Greenberg (1995) for detailed explanations based on that theory.

Suppose our goal is to draw samples from some distribution p(θ | X) = π(θ)·L(θ | X)/K. The normalizing constant K = ∫ L(θ | X) π(θ) dθ may not be known, or may be very difficult to compute. The Metropolis-Hastings algorithm (Metropolis et al. (1953)) generates a sequence of draws from this distribution as follows:


Algorithm 5.2.1. Metropolis-Hastings algorithm for a single parameter (1953)

1. Start with any initial value θ^0 satisfying π(θ^0) > 0.

2. For t = 1, ..., n:

(a) Using the current value θ^{t−1}, sample a candidate point θ* from a jumping distribution q(θ* | θ^{t−1}), the probability of proposing the value θ* given the previous value θ^{t−1}. This distribution is also referred to as the proposal or candidate-generating distribution.

(b) Given the candidate point θ*, calculate the ratio of the density at the candidate point θ* and the current point θ^{t−1},
\[
\alpha(\theta^{t-1}, \theta^*) = \frac{p(\theta^* \mid X)\; q(\theta^{t-1} \mid \theta^*)}{p(\theta^{t-1} \mid X)\; q(\theta^* \mid \theta^{t-1})}.
\]
Notice that because we consider the ratio of p(θ | X) at two different values, the normalizing constant K cancels out; the algorithm only requires the unscaled posterior. Using (5.4), we can rewrite the above expression as
\[
\alpha(\theta^{t-1}, \theta^*) = \frac{\pi(\theta^*)\, L(\theta^* \mid X)\; q(\theta^{t-1} \mid \theta^*)}{\pi(\theta^{t-1})\, L(\theta^{t-1} \mid X)\; q(\theta^* \mid \theta^{t-1})}.
\]

(c) Draw u from the uniform distribution U[0, 1].

(d) If u < α(θ^{t−1}, θ*), accept the candidate point (set θ^t = θ*); otherwise let θ^t = θ^{t−1}.

In summary, Metropolis-Hastings sampling first computes
\[
\alpha(\theta, \theta^*) = \min\!\left(\frac{\pi(\theta^*)\, L(\theta^* \mid X)\; q(\theta \mid \theta^*)}{\pi(\theta)\, L(\theta \mid X)\; q(\theta^* \mid \theta)},\; 1\right)
\]
and then accepts the candidate point with probability α.

Remark. Note that having the candidate density close to the target posterior density p(· | X) leads to more candidates being accepted. In fact, when the candidate density has exactly the same shape as the target,
\[
q(\theta^* \mid \theta) = k \times p(\theta^* \mid X),
\]
the acceptance probability is
\[
\alpha(\theta, \theta^*) = \min\!\left[1,\; \frac{p(\theta^* \mid X)\; q(\theta \mid \theta^*)}{p(\theta \mid X)\; q(\theta^* \mid \theta)}\right] = \min\!\left[1,\; \frac{p(\theta^* \mid X)\; p(\theta \mid X)}{p(\theta \mid X)\; p(\theta^* \mid X)}\right] = 1.
\]
Thus, in this case, all candidates are accepted.
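Algorithm 5.2.1 can be condensed into a short generic sampler. The following Python sketch (names are ours, not from the thesis) works on the log scale for numerical stability and is exercised on a toy standard-normal target:

```python
import numpy as np

def metropolis_hastings(log_post, propose, log_q, theta0, n_iter, rng=None):
    """Metropolis-Hastings for a single parameter (Algorithm 5.2.1), on the
    log scale: log_post is the log of the unscaled posterior pi * L,
    propose(theta, rng) draws a candidate, and log_q(a, b) = log q(a | b)."""
    rng = np.random.default_rng(rng)
    chain = np.empty(n_iter)
    theta, lp = theta0, log_post(theta0)
    for t in range(n_iter):
        cand = propose(theta, rng)
        lp_cand = log_post(cand)
        # log acceptance ratio: the normalizing constant K cancels out
        log_alpha = (lp_cand + log_q(theta, cand)) - (lp + log_q(cand, theta))
        if np.log(rng.uniform()) < log_alpha:
            theta, lp = cand, lp_cand
        chain[t] = theta
    return chain

# toy run: unscaled standard-normal target with a symmetric random walk,
# for which the q terms cancel (the Metropolis special case of Section 5.3.1)
chain = metropolis_hastings(
    log_post=lambda th: -0.5 * th ** 2,
    propose=lambda th, rng: th + rng.normal(scale=1.0),
    log_q=lambda a, b: 0.0,
    theta0=0.0, n_iter=20000, rng=1)
```

After a burn-in period the chain values behave like draws from the target, here a standard normal distribution.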


5.3 Implementation Issues: Choice of q(θ* | θ)

Since the MH algorithm may fail to move, the average percentage of iterations for which moves are accepted should always be recorded. One is interested in having high acceptance rates. A naive way to achieve this is to make the chain move very slowly, for example by generating the proposal q from a normal distribution with mean θ^{t−1} and small variance. This leads to acceptance probabilities close to one. The chain, however, must be capable of traversing the whole parameter space in order to converge to the limiting distribution, so small moves require many iterations until convergence. On the other hand, large moves are likely to fall into the tails of the posterior distribution p, causing a low value of the test ratio
\[
\frac{p(\theta^* \mid X)\; q(\theta \mid \theta^*)}{p(\theta \mid X)\; q(\theta^* \mid \theta)}.
\]
The proposal therefore has to be chosen such that considerable moves from the current state are possible and can be accepted with substantial probability. Many authors suggest choosing proposals that result in an acceptance rate between 20% and 50% (see for example Bennett et al. (1996) and Besag et al. (1995)).

Since we have much freedom in choosing the proposal kernel q, we now consider some special cases.

5.3.1 Random Walk Metropolis-Hastings

Metropolis et al. (1953) considered Markov chains with a random walk candidate distribution. For a random-walk candidate-generating distribution, the candidate is drawn from a symmetric distribution centered at the current value. Thus, the candidate density is given by
\[
q(\theta^* \mid \theta) = q_1(\theta^* - \theta),
\]
where q_1(·) is a function symmetric about zero.
Because of the symmetry, q_1(θ* − θ) = q_1(θ − θ*), so for a random-walk candidate density the acceptance probability simplifies to
\[
\alpha(\theta, \theta^*) = \min\!\left[1,\; \frac{p(\theta^* \mid X)}{p(\theta \mid X)}\right].
\]
The resulting algorithm is referred to as the Metropolis algorithm. It is used very often; typical choices for q are the normal or Student's t distribution centered around zero, with parameters specified according to the principles described below. A chain with a random-walk candidate density will generally have many accepted candidates, but most moves cover only a short distance, so it may take a long time for the Markov chain to explore the whole parameter space.

5.3.2 Independence Chain Metropolis-Hastings

In this case the proposed transition kernel is independent of the previous value, i.e.
\[
q(\theta^* \mid \theta) = q(\theta^*).
\]


Note that the corresponding MH algorithm still produces a Markov chain: the current value depends on the previous value through the accept/reject step. As in the symmetric case, we can let q be a normal or Student's t density, but now it is necessary to specify the location of the generating density as well as its spread, which requires a priori knowledge about the parameter to be estimated.

An independence MH chain has fewer accepted candidates than a random walk chain, but the moves may be large, which can lead to better mixing properties. It is good to have a proposal density whose shape is close to that of the target, so that more candidates are accepted.

Another popular choice for q is the prior density, i.e. q(θ* | θ^{t−1}) = π(θ*). In this case the acceptance probability reduces to
\[
\alpha(\theta^{t-1}, \theta^*) = \min\!\left[1,\; \frac{L(\theta^* \mid X)}{L(\theta^{t-1} \mid X)}\right]
\]
if L(θ^{t−1} | X) > 0, and 1 otherwise, which is computationally very simple.

Remark. The Metropolis-Hastings algorithm can also be used for multiple parameters θ = (θ_1, ..., θ_n). For this we need to specify multidimensional prior and proposal distributions, which can be a very challenging task, especially if the parameter space dimension is large and the joint prior distribution is not easy to define. However, the Multiple-Block Metropolis-Hastings algorithm, which we present next, may be helpful.

5.4 Multiple-Block Metropolis-Hastings

This method is suitable for a multivariate parameter space and is motivated by the following observation: we can often group different components of the vector θ = (θ_1, ..., θ_n) into blocks. This increases efficiency, especially if the parameter space is large.

1. Define p(θ_j | Θ_{(−j)}) to be the conditional density of the jth block θ_j, where Θ_{(−j)} denotes the parameter vector containing all values except those of the jth block. We begin with some initial value θ^{(0)} for each variable.
2. For each block j = 1, ..., k, sample θ_j^{(i)} from the conditional distribution p(θ_j | Θ_{(−j)}). The algorithm is a Metropolis-Hastings in which we update block by block; each block is updated once in every step of the cycle.

5.5 Obtaining an Approximate Random Sample for Inference

Markov chain Monte Carlo samples are not independent random samples. For inference, however, we need an iid random sample from the posterior. There are two basic points of view on this problem.


The burn-in time is the number of steps needed before a draw from the chain can be considered a draw from the long-run distribution. In other words, after the burn-in time the distribution of the draw is the same no matter where the chain started. The burn-in time can be visually deduced from the trace plot. The first idea is to use all draws from the Markov chain after the burn-in time: these can be considered draws from the posterior, although they are not independent of each other.

The second way is to thin the Markov chain Monte Carlo sample to such a degree that the retained draws form an approximately random sample, i.e. to thin enough that the next retained draw does not depend on the previous one. The thinned Markov chain Monte Carlo sample is then approximately a random sample from the posterior, and we base our inferences on it.

5.6 Bayesian Point Estimation

The first type of inference is where a single statistic is calculated from the sample data and used to estimate the unknown parameter. From the Bayesian perspective, point estimation is choosing a value that summarizes the posterior distribution. The most important statistic of a distribution is its location; the posterior mean, median and mode are good measures of location and hence good Bayesian estimators of the parameter.

Once we have obtained approximately random samples from the posterior distribution p(θ | X), we wish to estimate θ with a small mean square error. Since
\[
\mathrm{MSE}(\hat\theta) = E\big[(\hat\theta - \theta)^2 \mid X\big] = \big(\hat\theta(X) - E[\theta \mid X]\big)^2 + \mathrm{var}(\theta \mid X)
\]
and var(θ | X) is not a function of θ̂, E[θ | X] is the estimator that minimizes the MSE. It is called the Bayesian minimum mean square error (MMSE) estimator, or the posterior mean:
\[
\hat\theta_{\mathrm{MMSE}} = E[\theta \mid X]. \tag{5.5}
\]
The posterior median could also be used as a Bayesian estimator, since it minimizes the posterior mean absolute deviation E[|θ − θ̂| | X]:
\[
\hat\theta_{\mathrm{median}} = \arg\min_{\hat\theta}\, E\big[|\theta - \hat\theta| \mid X\big]. \tag{5.6}
\]
An alternative Bayesian estimate is the maximum a posteriori (MAP) estimate
\[
\hat\theta_{\mathrm{MAP}} = \arg\max_{\theta}\, p(\theta \mid X). \tag{5.7}
\]
If the prior π(θ) is constant and the parameter range includes the MLE, then the MAP estimate coincides with the MLE.
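The post-processing described above — burn-in removal, thinning, and the point estimates (5.5)-(5.6) — can be sketched in a few lines, assuming the chain is stored as a NumPy array (the function name is ours):

```python
import numpy as np

def posterior_summaries(chain, burn_in, thin):
    """Discard the burn-in, thin the remaining draws, and compute
    Bayesian point estimates from the approximately random sample."""
    sample = np.asarray(chain)[burn_in::thin]
    theta_mmse = sample.mean()        # posterior mean, minimizes MSE (5.5)
    theta_median = np.median(sample)  # minimizes posterior mean absolute deviation (5.6)
    return sample, theta_mmse, theta_median

# stand-in for an MCMC chain: here iid draws from a N(5, 2^2) "posterior";
# drop the first 2000 draws as burn-in and keep every 10th draw thereafter
chain = np.random.default_rng(0).normal(5.0, 2.0, size=20000)
sample, theta_mmse, theta_median = posterior_summaries(chain, burn_in=2000, thin=10)
```

For a real MCMC chain, the burn-in length would be read off the trace plot and the thinning lag off the autocorrelation plot.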


5.7 Bayesian Interval Estimation

The second type of inference is finding an interval of possible values that has a specific probability of containing the true value. In the Bayesian approach, we calculate a credible interval that has the specified posterior probability of containing the random parameter θ.

When we want to find a (1 − α)·100% credible interval for θ from the posterior, we are looking for an interval (θ_l, θ_u) such that the posterior probability
\[
(1 - \alpha) = P[\theta_l < \theta < \theta_u] = \int_{\theta_l}^{\theta_u} p(\theta \mid X)\, d\theta.
\]
There are many possible intervals with the required coverage property. The shortest interval (θ_l, θ_u) with the required probability has equal density values at its endpoints, that is, p(θ_l | X) = p(θ_u | X). However, it is often more convenient to find the interval (θ_l, θ_u) with equal tail areas, i.e. the interval whose endpoints θ_l and θ_u satisfy
\[
\int_{-\infty}^{\theta_l} p(\theta \mid X)\, d\theta = \frac{\alpha}{2} \quad \text{and} \quad \int_{\theta_u}^{\infty} p(\theta \mid X)\, d\theta = \frac{\alpha}{2},
\]
respectively.
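From an (approximately random) posterior sample, the equal-tail interval is obtained directly from the empirical α/2 and 1 − α/2 quantiles, as in this sketch (the helper name is ours):

```python
import numpy as np

def equal_tail_credible_interval(sample, alpha=0.05):
    """Equal-tail (1 - alpha) credible interval from posterior draws:
    the alpha/2 and 1 - alpha/2 empirical quantiles."""
    lo, hi = np.quantile(sample, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# for a standard normal posterior, the 95% equal-tail interval
# should be close to (-1.96, 1.96)
draws = np.random.default_rng(0).standard_normal(100000)
lo, hi = equal_tail_credible_interval(draws, alpha=0.05)
```

The shortest (highest-posterior-density) interval would instead require searching over intervals with equal density at the endpoints.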




Chapter 6

Bayesian Estimate of the t-Copula's Degrees of Freedom ν

The copula parameters, especially those of the t-copula, are difficult to estimate, as we saw in Chapter 4. If many joint observations are available, estimating the dependence structure is relatively easy. But estimation procedures for copulas, such as ML-based techniques, often lead to large parameter uncertainty when observations are scarce. A possible approach to this problem is to apply methods of Bayesian inference, which combine prior information, such as expert opinion, with the likelihood of the data.

There have been several proposals in the literature on this topic. Böcker et al. (2010) and Borowicz and Norman (2009) incorporate parameter uncertainty using Bayesian methods for estimating Gaussian and Gumbel copulas, where the parameter uncertainty stems from the correlation coefficients. Min and Czado (2010) provide a Bayesian analysis of pair-copula constructions. Within the framework of operational risk management, Wüthrich and Shevchenko (2009) and Dalla Valle and Giudici (2008) provide Bayesian methods to model loss distributions combining loss data and expert opinion.

The main objective of this chapter is to estimate the df parameter ν of a t-copula using Bayesian inference methods. Note that for doing this we estimate the correlation matrix P and treat it as a fixed parameter.

6.1 Bayesian Estimator of the Degrees of Freedom Parameter of a t-Copula

6.1.1 Choice of the Prior Distribution

Bayesian parameter estimation requires a priori knowledge about the parameter to be estimated. Our parameter of interest is ν ∈ (0, +∞]. Hence we are looking for a prior distribution function with support (0, +∞), finite mean and a light right tail.
Possible candidates for the prior distribution function are:

• Uniform on [0, b], with b an upper bound for the possible value of ν. This is slightly disadvantageous, since we have to prespecify the hyperparameter b and


therefore make a restriction. Additionally, the Bayesian MAP estimate of the posterior is the same as the ML estimate if the prior is non-informative, i.e. uniform (see Section 5.6). For this reason we do not take the non-informative uniform prior into consideration.

• Truncated normal distribution on (0, +∞) with location µ and scale σ. Its density, for 0 < x < +∞, is given by
\[
f_{\mathrm{tr.normal}}(x; \mu, \sigma, 0, +\infty) = \frac{\frac{1}{\sigma}\,\varphi\!\big(\frac{x-\mu}{\sigma}\big)}{\Phi\!\big(\frac{\mu}{\sigma}\big)},
\]
and f = 0 otherwise.

The mean and the variance of a truncated normal random variable X are given by
\[
E_{\mathrm{tr.normal}}[X \mid 0 < X < +\infty] = \mu + \frac{\varphi(\mu/\sigma)}{\Phi(\mu/\sigma)}\,\sigma,
\]
\[
\mathrm{Var}_{\mathrm{tr.normal}}(X) = \sigma^2 \left(1 - \frac{\varphi(\mu/\sigma)}{\Phi(\mu/\sigma)} \left(\frac{\varphi(\mu/\sigma)}{\Phi(\mu/\sigma)} + \frac{\mu}{\sigma}\right)\right).
\]
Figure 6.1 illustrates different shapes of the truncated normal density.

Figure 6.1: Truncated normal probability density function on (0, ∞) for different values of the µ and σ parameters.

• Gamma distribution with scale parameter θ > 0 and shape parameter k > 0. Its density function is
\[
f_{\mathrm{gamma}}(x; k, \theta) = x^{k-1}\, \frac{e^{-x/\theta}}{\theta^{k}\,\Gamma(k)}, \quad x \ge 0.
\]


Its first and second moments are given by
\[
E_{\mathrm{gamma}}[X \mid k, \theta] = k\theta, \qquad \mathrm{var}_{\mathrm{gamma}}(X) = k\theta^2.
\]
Figure 6.2 illustrates different shapes of the gamma density.

Figure 6.2: Gamma probability density function for different values of the θ and k parameters.

Thus, possible candidates for a prior distribution of ν are the truncated normal and the gamma distributions. Note that in both cases the values of the hyperparameters (scale and shape) have to be determined. Next, we outline the most common approaches to determining the hyperparameter values.

Involving expert opinion

Assume we possess expert opinion about the most likely value of the ν parameter; denote it by ν_expert. We then set ν_expert = E_prior[ν] and search for the best suitable combination of hyperparameter values that corresponds to our beliefs. This can be done via a moment matching procedure. Note that there is no standard, theoretically justified method for determining the values of the hyperparameters; this arbitrariness is the main disadvantage of Bayesian inference methods.

Empirical Bayes approach

Empirical Bayes methods are procedures for statistical inference in which the prior distribution is estimated from the data. The empirical Bayes method is a very useful approach for setting hyperparameters if no prior information about their distribution is available.


In fact, we can compute the hyperparameters by equating the expected value of the prior to the maximum likelihood estimate and setting the prior variance to be reasonably large.

Hierarchical models

Hierarchical models can also be used to determine the values of the hyperparameters. The hyperparameters are assumed to be random, and a prior distribution has to be set for them. This leads to more complicated sampling methods; consult e.g. Carlin (2008) or Bolstad (2010) for more details.

6.1.2 Posterior Distribution of ν

Recall that the posterior distribution of ν has the form
\[
p(\nu \mid X, P) \propto \pi(\nu) \cdot L(\nu, P \mid X).
\]
In the case of the gamma prior π_gamma(ν), we obtain the following shape of the posterior distribution:
\[
p(\nu \mid X, P) \propto \nu^{k-1} e^{-\nu/\theta} \cdot \left(\frac{\Gamma\!\big(\tfrac{\nu+d}{2}\big)}{\Gamma\!\big(\tfrac{\nu}{2}\big)}\right)^{\!N} \left(\frac{\Gamma\!\big(\tfrac{\nu+1}{2}\big)}{\Gamma\!\big(\tfrac{\nu}{2}\big)}\right)^{\!-dN} |P|^{-\frac{N}{2}} \cdot \prod_{t=1}^{N}\Big(1 + \frac{x_t' P^{-1} x_t}{\nu}\Big)^{-\frac{\nu+d}{2}} \cdot \prod_{t=1}^{N}\prod_{n=1}^{d}\Big(1 + \frac{x_{nt}^2}{\nu}\Big)^{\frac{\nu+1}{2}}. \tag{6.1}
\]
The posterior distribution of the ν parameter of a t-copula as given by (6.1) is not a standard distribution; a similar result is obtained when taking the truncated normal distribution as prior. Therefore, we apply Markov chain Monte Carlo (MCMC) methods to generate a sample of ν distributed according to p(ν | X, P). A suitable method for this purpose is the Metropolis-Hastings (MH) algorithm for a single parameter.

It is computationally more convenient to work with the posterior distribution (6.1) on the logarithmic scale:
\[
\ln p(\nu \mid X, P) = \mathrm{const} + (k-1)\ln\nu - \frac{\nu}{\theta} + N \ln\frac{\Gamma\!\big(\tfrac{\nu+d}{2}\big)}{\Gamma\!\big(\tfrac{\nu}{2}\big)} - dN \ln\frac{\Gamma\!\big(\tfrac{\nu+1}{2}\big)}{\Gamma\!\big(\tfrac{\nu}{2}\big)} - \frac{N}{2}\ln|P| - \frac{\nu+d}{2}\sum_{t=1}^{N} \ln\Big(1 + \frac{x_t' P^{-1} x_t}{\nu}\Big) + \frac{\nu+1}{2}\sum_{t=1}^{N}\sum_{n=1}^{d} \ln\Big(1 + \frac{x_{nt}^2}{\nu}\Big).
\]

6.1.3 Choice of the Proposal Distribution

The Metropolis-Hastings algorithm requires an appropriate proposal density.
In general, the Metropolis-Hastings algorithm works more successfully when the proposal density is at least approximately similar to the target density, here p(ν | X, P).
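For illustration, the unnormalized log posterior used as the MH target — gamma prior plus t-copula log-likelihood, cf. (6.1) — can be evaluated as in the following sketch (names are ours). It recomputes the t_ν quantiles of the pseudo-observations at each candidate ν, and omits the ν-free constant −(N/2) ln|P|:

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def log_posterior_nu(nu, U, P, k, theta):
    """Unnormalized log posterior of the t-copula df parameter nu under a
    Gamma(k, theta) prior, with the correlation matrix P held fixed.
    U is the (N, d) array of pseudo-observations; the t_nu quantiles are
    recomputed for each candidate nu.  The nu-free term -(N/2) ln|P| is
    dropped, since it only shifts the log posterior by a constant."""
    if nu <= 0:
        return -np.inf
    N, d = U.shape
    Z = stats.t.ppf(U, df=nu)
    quad = np.einsum('ti,ij,tj->t', Z, np.linalg.inv(P), Z)
    log_prior = (k - 1) * np.log(nu) - nu / theta        # gamma prior, log scale
    log_lik = (N * (gammaln((nu + d) / 2) - gammaln(nu / 2))
               - d * N * (gammaln((nu + 1) / 2) - gammaln(nu / 2))
               - (nu + d) / 2 * np.log1p(quad / nu).sum()
               + (nu + 1) / 2 * np.log1p(Z ** 2 / nu).sum())
    return log_prior + log_lik
```

This function can be plugged directly into a single-parameter MH sampler as the log target; returning −∞ for ν ≤ 0 makes such proposals rejected automatically.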


Random Walk Metropolis-Hastings is disadvantageous in our case, because it potentially allows the proposed value of ν to drop below 0. Independence Chain Metropolis-Hastings with q being a normal or Student's t density is a possible alternative, but then it is necessary to specify the location as well as the spread of the generating density, which adds another two nuisance parameters to our model.

The Bayesian estimation procedure for a t-copula is summarized in Algorithm 6.1.1.

Algorithm 6.1.1. Estimating the t-copula df parameter ν via the Metropolis-Hastings algorithm

1. Transformation of the initial data set X = (X_{1t}, ..., X_{dt})_{t=1}^N into the set of pseudo-observations û, using the empirical marginal transformation (4.4) or the IFM method (4.7).

2. Estimation of the correlation matrix P using the Kendall's tau nonparametric estimator
\[
\hat P_{ij} = \sin\Big(\frac{\pi}{2}\,\hat\tau_{ij}\Big), \quad i, j = 1, \ldots, d.
\]
Perform the matrix adjustments of P̂ described in Chapter 9, if needed, to make it positive definite.

3. Computation of the ML estimate ν̂_ML of the copula parameter ν. Determine the type of the prior distribution (based on expert opinion or on ν̂_ML) and the values of the corresponding hyperparameters. Moreover, determine the type of the proposal distribution.

4. Run of the Metropolis-Hastings algorithm (5.2.1) with the prespecified proposal density and the target density proportional to π(ν)·L(ν, P | X). This returns an MCMC chain of values of ν that are "approximately" sampled from the posterior distribution.

5. Use of the simulated draws to summarize the posterior distribution or calculate any relevant quantities of interest, e.g.
the Bayesian minimum mean square error estimator ν̂_MMSE = E[ν | X] or the maximum a posteriori estimate ν̂_MAP = argmax_ν p(ν | X).

This algorithm has been implemented in MATLAB: BayesianDFEstimateTCopula.m.

6.2 Example

In this section we present an application of the method for calibrating the parameter of the t-copula via Bayesian inference on a simulated data set. We construct the time series X = (X_{1t}, ..., X_{dt})_{t=1}^N, where d = 4 stands for the number of simulated risk factors and N = 2000 represents the number of observations available.

Our fictitious data set is as follows:


Figure 6.3: Trace plots for the ν parameter for Example 6.2, with panels (a) gamma prior and (b) truncated normal prior. We used 6000 Monte Carlo simulations for both priors. The true value is ν = 7 (dotted line).

X has a copula function C^t_{ν,P}, with degrees of freedom ν = 7 and correlation matrix

    P = [ 1    0.4  0.9  0.5
          0.4  1    0.1  0.7
          0.9  0.1  1    0.2
          0.5  0.7  0.2  1   ].

We employ the Independence Metropolis-Hastings algorithm with a Gaussian proposal density (with σ = 1) for the following prior densities:

• Gamma(ν_ML, 1),
• Truncated normal(ν_ML, 1).

In both cases we draw 6000 samples of ν according to the posterior distribution. The corresponding trace plots (value of the sampled ν versus iterations) are shown in Figure 6.3. In addition, we plot the autocorrelations of the unthinned Markov chains (Figure 6.4).

We see that in our case the two different prior models seem to have only little influence on the posterior distributions. To compare the two resulting posterior distributions we employ the paired two-sample t-test. It failed to reject the null hypothesis in our example (the difference between the means of the two distributions is insignificant), with a p-value of 0.897.

6.3 Assessing Markov Chain Convergence

Simulation-based Bayesian inference requires using simulated draws to summarize the posterior distribution or calculate any relevant quantities of interest. We need to treat the simulation draws with care. There are usually two issues. First, one has to decide whether the Markov chain has reached its stationary, or the desired posterior, distribution. Second,


one has to determine the number of iterations to keep after the Markov chain has reached stationarity. Convergence diagnostics help to resolve these issues.

Note that many diagnostic tools are designed to verify a necessary but not sufficient condition for convergence. There are no conclusive tests that can tell you when the Markov chain has converged to its stationary distribution.

We will use two common graphical tools to check the convergence, which are described below.

6.3.1 Visual Analysis via Trace Plots

Trace plots of samples versus the simulation index can be very useful in assessing convergence. The trace shows us if the chain has not yet converged to its stationary distribution, that is, if it needs a longer burn-in period. A trace can also tell you whether the chain is mixing well. A chain might have reached stationarity if the distribution of points is not changing as the chain progresses. The aspects of stationarity that are most recognizable from a trace plot are a relatively constant mean and variance. A chain that mixes well traverses its posterior space rapidly, and it can jump from one remote region of the posterior to another in relatively few steps.

Figure 6.3 displays "perfect" trace plots for the parameter ν for both priors. Note that the center of the chain appears to be around the value 7, with acceptable fluctuations. This indicates that the chain could have reached the right distribution. The chain is mixing well; it is exploring the distribution by traversing to areas where its density is very low. We can conclude that the mixing is good here.

6.3.2 Visual Analysis via Autocorrelation Plots

Another graph that is very useful in assessing an MCMC sampler looks at the serial autocorrelations as a function of the time lag. A plot of ρ_k vs. k (the kth order autocorrelation vs.
the lag) should show geometric decay is the chain has reached stationarity (see Lianget al. (2010)).The autocorrelation plot may indicate underlying correlation structure in the series notobvious from the time series trace. High correlations between long lags indicate poormixing of the chain.Figure 6.4 displays a good behaviour of the autocorrelation plot <strong>for</strong> a parameter ν if theprior is chosen to be truncated normal. Here we observe the geometric decay is the samplerseries. For the gamma prior the autocorrelation is slightly higher. We there<strong>for</strong>e favorizethe truncated normal prior <strong>for</strong> the further research.6.4 Simulation Study: <strong>Bayesian</strong> vs. Classical EstimationMethods of νHere, we compare the per<strong>for</strong>mance of the <strong>Bayesian</strong> and the classical estimators such asthe IFM (section 4.3) or the CML (section 4.2) estimators of the ν parameter.


Figure 6.4: Autocorrelation plots for the ν parameter for Example 6.2, with panels (a) gamma prior and (b) truncated normal prior. The samples are not thinned.

The simulation is designed as follows. We simulate a random sample following a C^t_{ν,P}-copula, represented by the time series X = (X_{1t}, ..., X_{dt})_{t=1}^N, where d stands for the number of underlying risk factors and N = 2000 represents the number of observations for each risk factor. The true value of the df parameter is ν = 7.

The copula parameter ν is estimated with the Bayesian method (prior taken equal to truncated normal with σ = 1), and the deviation from the true value is recorded. We repeat this procedure 100 times for each of the dimensions d = 2, 5, 10, 20 and 50. The correlation matrices are generated in such a way that each off-diagonal element is set to 0.7, for simplicity reasons.

The results of the simulation study are presented in Figure 6.5. Compare it with the performance of the IFM and CML estimators in Figure 4.1.

To compare the estimation efficiency of the Bayesian and the IFM method we employ the two-sample paired t-test. It fails to reject the null hypothesis (the difference between the mean of the absolute errors of the Bayesian method and the IFM method is zero). The p-value is 0.6379. We conclude that both estimation techniques display approximately equal accuracy for the underlying data sample size of 2000.
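The paired comparison can be sketched with scipy.stats.ttest_rel; the error vectors below are placeholders for the 100 recorded absolute errors per method (illustrative data, not the study's output):

```python
import numpy as np
from scipy.stats import ttest_rel

# Placeholder errors standing in for the 100 recorded absolute errors per
# method; in the study these come from the simulation output.
rng = np.random.default_rng(7)
abs_err_bayes = np.abs(rng.normal(0.0, 0.3, size=100))
abs_err_ifm = abs_err_bayes + rng.normal(0.0, 0.05, size=100)

stat, pval = ttest_rel(abs_err_bayes, abs_err_ifm)
# A large p-value means no evidence of a difference in mean absolute error.
```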


Figure 6.5: Descriptive statistics of the absolute errors of the Bayesian t-copula df estimators for copula dimensions d = 2, 5, 10, 20 and 50 on the logarithmic scale. The sample size of the underlying dataset is 2000. The number of simulations for each copula dimension is 100.




Chapter 7

Correlation Estimation of the t-Copula

For elliptical distributions linear correlation is a natural measure of dependence. However, a linear correlation estimator such as the Pearson product-moment correlation estimator (the standard estimator), while suitable for data from multivariate normal distributions, performs very badly for heavier-tailed or contaminated data. The linear correlation estimator is not robust, so its value can be misleading if outliers are present; see Devlin et al. (1975). Therefore robust estimators are needed, robust in the sense of maintaining a high efficiency for heavier-tailed elliptical distributions as well as for multivariate normal distributions.

In this chapter we discuss two methods to estimate the correlation matrix of a t or related copula: the Kendall's tau approach and the Bayesian approach. The former employs the method of moments, while the latter is based upon Markov chain Monte Carlo methods that allow us to simulate sample correlation matrices from their posterior distribution and thus enable us to tackle the important issue of parameter uncertainty. The results presented in this chapter show that the Kendall's τ-based estimator of the correlation matrix yields smaller estimation biases at less computational effort than the Bayesian estimator.

7.1 Classical Approach: Kendall's tau Approximation

We illustrate the Kendall's tau approximation, which is based on an alternative dependence measure to the linear correlation coefficient. It is preferred over the linear correlation coefficient since it is invariant with respect to strictly increasing nonlinear transformations and does not require the existence of second moments. For shortcomings and pitfalls of the correlation coefficient we refer to Embrechts et al. (2001).
A key role is played here by the following relationship between Kendall's tau and the linear correlation coefficient ρ,

    τ = (2/π) arcsin ρ,   (7.1)

proven by Lindskog et al. (2001) for elliptical bivariate distributions with continuous margins.
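Relation (7.1) inverts to ρ = sin(πτ/2), which yields the pairwise estimator used later in this chapter; a Python/SciPy sketch (the function name is an illustrative assumption):

```python
import numpy as np
from scipy.stats import kendalltau

def corr_from_kendall(X):
    """Pairwise correlation estimate via relation (7.1), i.e.
    rho_ij = sin(pi * tau_ij / 2). X is an (N, d) data matrix. The result is
    symmetric with unit diagonal but not necessarily positive definite
    (see Chapter 9 for the required adjustment)."""
    d = X.shape[1]
    P = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            tau, _ = kendalltau(X[:, i], X[:, j])
            P[i, j] = P[j, i] = np.sin(np.pi * tau / 2.0)
    return P

# Demo on bivariate normal data with true correlation 0.6:
rng = np.random.default_rng(3)
C = np.linalg.cholesky(np.array([[1.0, 0.6], [0.6, 1.0]]))
X = rng.standard_normal((4000, 2)) @ C.T
P_hat = corr_from_kendall(X)
```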


distribution is combined with the likelihood function of the data (after transformation of the marginals) according to Bayes' theorem. This approach has a very high computational cost and is mainly focused on incorporating expert opinion about each correlation coefficient into the estimation procedure.

• The approach discussed in Dalla Valle (2009) is based upon Markov chain Monte Carlo. The novelty of this approach is that a large number of correlation matrices, drawn via MCMC, instead of a fixed parameter estimate of P, are produced in order to generate the so-called Bayesian Student's t copula. The inverse Wishart distribution is chosen as a prior distribution of the correlation matrix P. This, multiplied by the likelihood function of a t copula density, results in an a posteriori density of the correlation matrix. Unfortunately, this approach is disproved in Arbenz (2010). The matrix drawn from the inverse Wishart distribution does not necessarily have ones on the diagonal. Therefore, it is incorrect to produce Markov chain Monte Carlo samples of P according to the inverse Wishart prior and make inference based on the incorrectly built chain.

Remark. Our main objective is not to involve expert opinion, because it is rarely available, but to tackle the issue of parameter uncertainty while estimating the correlation matrix of the t-copula. Therefore we focus on the concept of Dalla Valle (2009) and implement the needed corrections.

7.2.1 Bayesian Inference for a Covariance Matrix

Suppose that the risk factor X has copula C^t_{ν,P} and we wish to obtain the Bayesian estimator of the correlation matrix P. The idea is to first estimate the degrees of freedom parameter ν as described in Section 4.2.
In short, we find the Kendall's tau based approximation P̂ of the correlation matrix, and then estimate ν with the usual maximum likelihood method, i.e. we maximize the log-likelihood function of the Student's t copula density with P̂ fixed.

As already mentioned above, it is mathematically incorrect to assume an inverse Wishart distribution as a prior for P. Hence we go one step back and look at the covariance matrix V := ν/(ν−2) · Σ. Indeed, we can assume that the covariance matrix V of dimension d has an inverse Wishart prior distribution with parameters α > d + 1 and positive definite scale matrix B > 0:

    V ~ Inverse Wishart(α, B).

In statistics, the inverse Wishart distribution, also called the inverted Wishart distribution, is a probability distribution defined on real-valued positive-definite matrices. In Bayesian statistics it is used as the conjugate prior for the covariance matrix of a multivariate normal distribution (see e.g. Leonard and Hsu (1992)). More information about the inverse Wishart distribution can be found in Appendix A.1. It is important to note that the mean of the random variable V ~ Inverse Wishart(α, B) is

    E[V] = B / (α − d − 1).
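The prior mean formula can be checked numerically with scipy.stats.invwishart; a sketch with illustrative values of α and B:

```python
import numpy as np
from scipy.stats import invwishart

d, alpha = 3, 10.0                       # requires alpha > d + 1 for the mean
B = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 1.5]])          # an illustrative scale matrix

draws = invwishart(df=alpha, scale=B).rvs(size=20000, random_state=0)
mean_empirical = draws.mean(axis=0)      # (3, 3) Monte Carlo average
mean_theoretical = B / (alpha - d - 1)   # E[V] = B / (alpha - d - 1)
```

Each draw is positive definite by construction, which is exactly the property that makes the inverse Wishart usable as a prior for V where it fails as a prior for P.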


Concerning the choice of the hyperparameters α and B, the most reasonable choice is the one inspired by the empirical Bayes methodology, where the prior distribution is estimated from the data. This approach stands in contrast to standard Bayesian methods, for which the prior distribution is fixed before any data are observed.

Accordingly, we set B based on a point estimator matrix, namely the sample covariance matrix:

    B̂ := V̂ · (α − d − 1) = [ 1/(N−1) ∑_{i=1}^{N} (X_i − X̄)(X_i − X̄)^T ] · (α − d − 1),   (7.3)

where the hyperparameter α can be chosen arbitrarily, keeping in mind that the smaller the value of α is, the more volatile the result. Moreover, the choice of the α parameter has to be calibrated so as to achieve a good acceptance rate of the Markov chain.

So, we use the data as our prior knowledge about the a priori distribution of the covariance matrix. We have

    V ~ Inverse Wishart(α, B̂).   (7.4)

Note that a random variable drawn from the inverse Wishart distribution is positive definite by construction; therefore no additional matrix adjustments are needed before plugging it into the algebraic form of the copula density.

Next, we build an algorithm that produces a Markov chain Monte Carlo sample of covariance matrices. Finally, we use the large number of covariance matrices drawn via MCMC, deduce the mean a posteriori estimator of V and map it into a corresponding correlation matrix, as given by

    P_ij = P(V)_ij = V_ij / √(V_ii V_jj).

7.2.2 MCMC for the Covariance Matrix

The posterior distribution of V is calculated following Bayes' theorem:

    p(V | ν, X) ∝ π(V) · L(ν, V | X).   (7.5)

According to (7.4), the prior density of V is

    π(V) ∝ |V|^{−(α+d+1)/2} · exp(−½ tr(B V^{−1})).   (7.6)

Starting from the random sample X of risk factors, we transform the initial data set into the set of uniform variates U using the empirical marginal transformation or the full parametric transformation as discussed in Chapter 3.

The likelihood function with the ν parameter fixed is then


    L(ν, V | X) = [Γ((ν+d)/2)]^N · [Γ((ν+1)/2)]^{−dN} · [Γ(ν/2)]^{(d−1)N} · |P(V)|^{−N/2}
                  · ∏_{t=1}^{N} [1 + z_t' P(V)^{−1} z_t / ν]^{−(ν+d)/2} · ∏_{t=1}^{N} ∏_{i=1}^{d} [1 + z_{it}² / ν]^{(ν+1)/2},

where z_i = F^{−1}_{t_ν}(u_i) for all i = 1, ..., d.

Summarized, we obtain

    p(V | ν, X) ∝ π(V) · L(ν, V | X)
                ∝ |V|^{−(α+d+1)/2} · exp(−½ tr(B V^{−1})) · |P(V)|^{−N/2} · ∏_{t=1}^{N} [1 + z_t' P(V)^{−1} z_t / ν]^{−(ν+d)/2}.

We cannot recognize the previous equation as any standard form, and for this reason we need to employ a simple one-dimensional independence chain Metropolis-Hastings for V, using an inverse Wishart candidate distribution whose hyperparameter α can be calibrated to obtain good acceptance rates in a range between 25% and 50%.

The acceptance probability is given by

    a(Ṽ, V_t) = min{1, A(Ṽ, V_t)},   (7.7)

where

    A(Ṽ, V_t) = p(Ṽ | ν, X) / p(V_t | ν, X)
              = [ |Ṽ|^{−(α+d+1)/2} · exp(−½ tr(B Ṽ^{−1})) · |P(Ṽ)|^{−N/2} · ∏_{t=1}^{N} (1 + z_t' P(Ṽ)^{−1} z_t / ν)^{−(ν+d)/2} ]
                / [ |V_t|^{−(α+d+1)/2} · exp(−½ tr(B V_t^{−1})) · |P(V_t)|^{−N/2} · ∏_{t=1}^{N} (1 + z_t' P(V_t)^{−1} z_t / ν)^{−(ν+d)/2} ].

Finally, we take the MCMC chain, discard a predefined number of first burn-in elements and thin the chain. Hence, we come up with a predefined number N_MCMC of a posteriori samples {V_n^{a posteriori}}_{n=1}^{N_MCMC}. The Bayesian estimate of V is then the posterior mean matrix of V, or the simple average

    V̂_Bayesian := Ê[V^{a posteriori}] = (1/N_MCMC) ∑_{n=1}^{N_MCMC} V_n^{a posteriori}.

Mapping the covariance matrix back into the correlation matrix gives us the pseudo-Bayesian estimate of the correlation matrix:


    P_Bayesian = P(V̂_Bayesian).

To avoid possible misunderstandings, we emphasize that P_Bayesian is not a pure Bayesian estimator, since we did not impose any prior distribution on it and did not derive its posterior distribution. V̂_Bayesian is the Bayesian estimator of the covariance matrix, which leads us naturally to the estimator P_Bayesian of the correlation matrix.

7.3 Example

We now illustrate our new approach by means of a fictitious numerical example. We assume that the correlation matrix P of the C^t_{ν,P}-copula is given by

    P = [ 1    0.4  0.9  0.5
          0.4  1    0.1  0.7
          0.9  0.1  1    0.2
          0.5  0.7  0.2  1   ].

We simulate a time series X with 4 risk factors of length 2000 that has a C^t_{ν,P} copula distribution with degrees of freedom ν = 7.

The starting value V^(0) is chosen as the sample covariance matrix. Furthermore, we set the degrees of freedom parameter of the inverse Wishart distribution to α = 2000, which according to our studies results in an acceptance rate of the Markov chain of ≈ 40%.

Next, we draw 15000 samples from the posterior distribution and look at the last 10000 samples, from which we then pick only every fifth value in order to reduce the autocorrelation of the chain. Hence, we finally come up with 2000 covariance matrix samples from the posterior distribution, compute the mean a posteriori covariance matrix and map it to the corresponding correlation matrix.
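For numerical stability, the Metropolis-Hastings step for V is best evaluated on the logarithmic scale. A Python/NumPy sketch of the unnormalised log-posterior (7.5), with terms constant in V dropped; the function name is an illustrative assumption:

```python
import numpy as np

def log_post_V(V, z, nu, alpha, B):
    """Log of the unnormalised posterior (7.5) of the covariance matrix V:
    the inverse Wishart prior (7.6) plus the V-dependent part of the t-copula
    likelihood (terms constant in V are dropped). z: (N, d) array of
    t_nu-quantile pseudo-observations; V must be symmetric positive definite."""
    d = V.shape[0]
    s = np.sqrt(np.diag(V))
    P = V / np.outer(s, s)                       # P(V)_ij = V_ij / sqrt(V_ii V_jj)
    quad = np.einsum('ti,ij,tj->t', z, np.linalg.inv(P), z)
    log_prior = (-(alpha + d + 1) / 2.0 * np.linalg.slogdet(V)[1]
                 - 0.5 * np.trace(B @ np.linalg.inv(V)))
    log_lik = (-0.5 * z.shape[0] * np.linalg.slogdet(P)[1]
               - (nu + d) / 2.0 * np.log1p(quad / nu).sum())
    return log_prior + log_lik

# The acceptance ratio (7.7) then becomes, on the log scale,
# log A = log_post_V(V_prop, z, nu, alpha, B) - log_post_V(V_curr, z, nu, alpha, B).

# Demo: two matrices with the same correlation part but different scale
# differ only through the prior term.
rng = np.random.default_rng(5)
z = rng.standard_normal((200, 3))
la = log_post_V(np.eye(3), z, nu=7.0, alpha=10.0, B=np.eye(3))
lb = log_post_V(2.0 * np.eye(3), z, nu=7.0, alpha=10.0, B=np.eye(3))
```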
Figure 7.1 shows histograms of the simulated correlation coefficients.

The resulting estimators and the absolute error norms are

    P_Kendall = [ 1       0.3710  0.9002  0.4770
                  0.3710  1       0.0774  0.6981
                  0.9002  0.0774  1       0.1772
                  0.4770  0.6981  0.1772  1      ],   with ‖P_Kendall − P‖₂ = 0.0498,

    P_Bayesian = [ 1       0.4514  0.9117  0.4938
                   0.4514  1       0.2177  0.7234
                   0.9117  0.2177  1       0.1761
                   0.4938  0.7234  0.1761  1      ],   with ‖P_Bayesian − P‖₂ = 0.1331.

Here, we measure the "distance" between the true matrix P and its estimate with the matrix 2-norm induced by the Euclidean vector norm. It is defined as

    ‖A‖₂ := max_{‖x‖₂=1} ‖Ax‖₂ = √λ_max,


that is, the square root of the largest eigenvalue of the positive semidefinite matrix A^T A. The choice of the matrix 2-norm is justified by the fact that it best captures the size of the eigenvalues, and hence is more suitable for measuring the absolute error norms for covariance and correlation matrices.

Figure 7.1: Histograms of the simulated correlation coefficients according to Example 7.3. The dotted lines indicate the true parameter values.

7.4 Simulation Study with Higher Matrix Dimensions

Next, we compare the performance of the Kendall's tau based correlation estimator and the Bayesian estimator. For a given correlation matrix dimension d we simulate 100 samples following a C^t_{ν,P} copula (the value of ν is not essential; we set it equal to 7 for simplicity reasons). For each sample, we compute the correlation matrix estimates P_Kendall and P_Bayesian and the corresponding absolute error 2-norms. This procedure is repeated for correlation matrix dimensions d = 2, 3, 5, 10, 20 and 50. Figure 7.2 shows descriptive statistics of both estimation techniques for the different matrix dimensions.

Discussion

As shown in Figure 7.2, for correlation matrix dimensions higher than 2 the correlation estimation based on Bayesian inference cannot achieve the same effectiveness/accuracy


Chapter 8

Bayesian t-Copula Distribution

In the last two chapters we showed how to derive Bayesian estimators of the t-copula parameters, the degrees of freedom ν and the correlation matrix P. The estimators were obtained separately.

In Dalla Valle (2009) the approach of a so-called Bayesian copula distribution has been discussed, where the inverse Wishart distribution has been chosen as a prior for P and the truncated Poisson as a prior for ν, resulting in a joint posterior distribution of the t-copula parameters. As already mentioned, imposing an inverse Wishart as a conjugate prior distribution on the correlation matrix P proved to be wrong. We implement the needed corrections and, based on that, build an algorithm for sampling ν and P simultaneously, and are therefore able to capture the whole parameter uncertainty of the t-copula parameters.

8.1 Multiple-Block Metropolis-Hastings for the t-Copula Parameters

As before, X denotes a d-dimensional random vector with a corresponding copula function C^t_{ν,P}. Let Θ = {(ν, P) : ν ∈ (0, ∞], P ∈ R^{d×d}}, with P being a symmetric and positive definite matrix, denote the parameter space.

• We can assume that the covariance matrix V of dimension d has an inverse Wishart prior distribution with parameters α > d + 1 and positive definite scale matrix B > 0, as in Section 7.2:

      V ~ Inverse Wishart(α, B).

  Then the posterior distribution of V conditional on the random parameter ν and the observations X is of the form

      p(V | X, ν) ∝ π^{inverse Wishart}_{α,B}(V) · L(ν, P | X).

  Note that there is a one-to-one connection between the covariance matrix V and the correlation matrix P,

      P_ij = P(V)_ij = V_ij / √(V_ii V_jj).


We therefore write L(ν, V | X) instead of L(ν, P | X).

• Second, we assume that the degrees of freedom parameter ν has a truncated (on (0, ∞)) normal prior distribution with parameters µ > 0 (location) and σ > 0 (scale):

      ν ~ Truncated normal(µ, σ).

  Its posterior distribution conditional on the random parameter V and the observations X is of the form

      p(ν | X, V) ∝ π^{tr. normal}_{µ,σ}(ν) · L(ν, V | X).

In both cases the shape of the posterior has no standard form. Therefore we need to employ an MCMC algorithm to draw samples from both posterior conditional distributions and make inference on the samples. The Multiple-Block Metropolis-Hastings is the most suitable algorithm for this purpose.

Algorithm 8.1.1. Multiple-Block Metropolis-Hastings sampler of the t-copula parameters ν and P

X is a d-dimensional random vector with a corresponding copula function C^t_{ν,P}. The set of parameters to be estimated is {ν, P}.

1. Transform the initial data set X = (X_{1t}, ..., X_{dt})_{t=1}^N into uniform variates, using the empirical marginal distribution (4.4) or the inference functions for margins approach (4.3).

2. Estimate the correlation matrix P using Kendall's tau non-parametric estimator:

       P̂_ij = sin(π τ̂_ij / 2),   i, j = 1, ..., d.

   Perform the needed matrix adjustments (described in Chapter 9) of P̂ to make it positive definite.

3. Compute the ML estimate ν_ML of the copula parameter ν. Its prior distribution is truncated normal(µ, σ). Set the values of the corresponding hyperparameters µ and σ.

4. Compute the sample covariance estimate V̂. Its prior distribution is inverse Wishart(α, V̂ · (α − d − 1)). Set the value of the hyperparameter α.

5. Start with any initial value ν^(0) satisfying π^{tr. normal}_{µ,σ}(ν^(0)) > 0 and V^(0) satisfying π^{inverse Wishart}_{α,B}(V^(0)) > 0.
6. For t = 1, ..., n do:

   (a) Perform one step of the Independence Metropolis-Hastings algorithm for ν_t (according to 5.2.1), i.e. draw a sample ν_t from its posterior distribution

           p(ν_t | X, V_{t−1}) ∝ π^{tr. normal}_{µ,σ}(ν_t) · L(ν_t, V_{t−1} | X).


   (b) Draw a sample V_t from its posterior distribution (as in 7.2.2)

           p(V_t | X, ν_t) ∝ π^{inverse Wishart}_{α,B}(V_t) · L(ν_t, V_t | X).

7. Return a Markov chain of samples {ν_t, V_t}_{t=1}^n.

Using the simulated draws we can summarize the posterior distribution or calculate any relevant quantities of interest, such as Bayesian point or interval estimates.

This algorithm has been implemented in MATLAB: BayesianTCopulaEstimation.m.
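The block structure of Algorithm 8.1.1 can be illustrated on a toy target. The sketch below uses symmetric random-walk proposals and a bivariate normal target in place of the ν- and V-blocks; all names, the target and the proposal scale are illustrative assumptions:

```python
import numpy as np

def two_block_mh(log_target, x0, y0, n_iter=8000, step=0.8, seed=0):
    """Generic two-block Metropolis-Hastings: each sweep updates block x
    conditional on y, then block y conditional on x, mirroring the nu- and
    V-steps of Algorithm 8.1.1 (here with symmetric random-walk proposals)."""
    rng = np.random.default_rng(seed)
    x, y = x0, y0
    lp = log_target(x, y)
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        # block 1: propose a move in x only, y held fixed
        xp = x + step * rng.standard_normal()
        lpp = log_target(xp, y)
        if np.log(rng.uniform()) < lpp - lp:
            x, lp = xp, lpp
        # block 2: propose a move in y only, x held fixed
        yp = y + step * rng.standard_normal()
        lpp = log_target(x, yp)
        if np.log(rng.uniform()) < lpp - lp:
            y, lp = yp, lpp
        out[t] = x, y
    return out

# Toy target: a bivariate normal with correlation 0.5.
rho = 0.5
log_target = lambda x, y: -(x**2 - 2*rho*x*y + y**2) / (2.0 * (1.0 - rho**2))
chain = two_block_mh(log_target, 0.0, 0.0)
```

The chain recovers the target's means and correlation, which is the behaviour one checks with the trace and autocorrelation diagnostics of Section 6.3.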




Chapter 9

Computing the Nearest Correlation Matrix

Since some robust correlation estimators do not extend to the multivariate case, the correlation matrix has to be estimated by estimating pairwise correlations. In this case, the resulting matrix of estimated pairwise correlations is not necessarily positive semi-definite (PSD). We mentioned this problem when considering the Kendall's tau based correlation estimator for the t-copula in Chapter 4. Hence we are interested in techniques for transforming non-PSD correlation matrices into PSD correlation matrices such that the transformed matrix is close to the original matrix.

A correlation matrix is a symmetric positive semidefinite (PSD) matrix with unit diagonal. Recall that a matrix P is positive definite (PD) if x'Px > 0 for all x ≠ 0, and positive semidefinite if x'Px ≥ 0. For symmetric matrices, being positive definite is equivalent to having all eigenvalues positive, and being positive semidefinite is equivalent to having all eigenvalues nonnegative. In most applications in finance the positive definiteness of the correlation matrix is desired over positive semidefiniteness, since it is more tractable when inversion or further decomposition of the matrix is required.

Suppose that the risk factor X has copula C^t_{ν,P} and we wish to estimate the correlation matrix P. Assume that we have an estimator of P (obtained via the Kendall's tau approximation or via Bayesian inference, as discussed in the previous chapter).
An important advantage of such coordinate-dependent methods is their relatively low computational cost, but the main problem is that they result in a matrix of pairwise correlation estimates that is not necessarily positive definite (unless we use the simple product-moment covariance matrix estimator). This problem does not always arise, and if it does, a matrix adjustment method can be used, such as the eigenvalue method of Rousseeuw and Molenberghs (1993), or the method proposed by Higham (2001), which we discuss in the following.

9.1 Eigenvalue Method

This method was first proposed in Rousseeuw and Molenberghs (1993) and has found broad use in financial applications.


Motivation. A true correlation matrix P is PSD, and hence can be written as

    P = G L G^T,   (9.1)

where L is the diagonal matrix of eigenvalues of P and G is an orthogonal matrix of corresponding eigenvectors. In case P is PSD, the eigenvalues are non-negative.

When P* is symmetric but not PSD, then (9.1) still holds, but now some eigenvalues are negative. Typically, the negative eigenvalues will not have a large absolute value. A natural approach consists of replacing the negative eigenvalues by zeroes, if the result has to be PSD, or by a small positive number δ, say, if PD is required. L is replaced by L̃, and we compute P̃ = G L̃ G^T. The diagonal elements of P̃ will not necessarily be equal to 1. Therefore, we transform P̃ to P' = Q₁ P̃ Q₁^T, where Q₁ is the diagonal matrix with diagonal elements 1/√p̃_jj (j = 1, ..., d).

Algorithm 9.1.1. Eigenvalue Method. Let P* be a so-called pseudo-correlation matrix (d × d), i.e. a symmetric matrix of pairwise correlation estimates with unit diagonal entries and off-diagonal entries in [−1, 1] that is not positive semidefinite.

1. Calculate the spectral decomposition P* = G L G^T, where L is the matrix of eigenvalues and G is an orthogonal matrix whose columns are eigenvectors of P*.

2. Replace all negative eigenvalues in L by small values δ > 0 (to ensure PD) or by zeroes (PSD) to obtain L̃.

3. Calculate P̃ = G L̃ G^T, which will be symmetric and PSD/PD but not a correlation matrix, since its diagonal elements will not necessarily equal one.

4. Return the correlation matrix P' = P(P̃), where P denotes the correlation matrix operator defined as

       P(P̃)_ij := P̃_ij / √(P̃_ii P̃_jj),   i, j = 1, . . .
, d.

The obvious advantage of this method is that it is conceptually very simple, which explains why it has been a preferred method in financial applications.

9.2 Higham Method

The procedure proposed by Higham (2001) solves the problem of finding the nearest correlation matrix to a given estimated matrix by minimizing the distance between them in an appropriate matrix norm. We describe the procedure of Higham and adopt his notation.

Motivation. The problem considered is, for a given arbitrary symmetric A ∈ R^{n×n}, to compute the distance

    γ(A) = min{‖A − X‖ : X is a correlation matrix}


and a matrix X achieving this minimum distance. The norm is a weighted version of the Frobenius norm, ‖A‖²_F = ∑_{i,j} a²_ij, the Frobenius norm being the easiest norm to work with for this problem. The weighted Frobenius norm is denoted by

    ‖A‖_W = ‖W^{1/2} A W^{1/2}‖_F,

where W is a symmetric positive definite matrix.

We define the sets

    S = {Y = Y' ∈ R^{n×n} : Y is positive semidefinite},
    U = {Y = Y' ∈ R^{n×n} : y_ii = 1, i = 1 : n}.

We are looking for a matrix X in the intersection of S and U that is closest to A in the weighted Frobenius norm. We take W equal to the identity matrix for simplicity reasons.

For A = Q diag(λ_i) Q', let the projection onto the set S be

    P_S(A) := Q diag(max(λ_i, 0)) Q'.

The projection onto the set U is simply given by

    P_U(A) := (p_ij),   p_ij = a_ij for i ≠ j,   p_ij = 1 for i = j.

To find the nearest matrix in the intersection of the sets S and U, an iteration of projections (P_S and P_U) due to Dykstra (1983) is employed, which incorporates a judiciously chosen correction to each projection that can be viewed as a normal vector to the corresponding set. For more details consult the original paper (Higham (2001)).

Algorithm 9.2.1. Higham Method (2002). Given a symmetric A ∈ R^{n×n}, the following algorithm computes the nearest correlation matrix in the weighted Frobenius norm.

1. Set ΔS_0 = 0, Y_0 = A.

2. For k = 1, 2, ...

       R_k = Y_{k−1} − ΔS_{k−1},   where ΔS_{k−1} is Dykstra's correction,
       X_k = P_S(R_k),
       ΔS_k = X_k − R_k,
       Y_k = P_U(X_k),

   end.

General results of Boyle and Dykstra (1985, Theorem 2) and Han (1988, Theorem 4.7) show that both X_k and Y_k converge to the desired correlation matrix as k → ∞. Moreover, these authors expect linear convergence at best.

This algorithm has been implemented as the R function nearPD() in the R package "Matrix". The MATLAB version of this algorithm (makePositiveDefinite.m) is attached to the thesis.
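For the unweighted case W = I, the alternating-projections loop of Algorithm 9.2.1 fits in a few lines; a Python/NumPy sketch (the function name, the iteration cap and the stopping rule are illustrative assumptions):

```python
import numpy as np

def nearest_correlation(A, n_iter=500):
    """Sketch of Higham's alternating projections (W = I): Dykstra-corrected
    projection onto the PSD cone S, then projection onto the unit-diagonal
    set U, iterated until the iterates stabilise."""
    Y = np.asarray(A, dtype=float).copy()
    dS = np.zeros_like(Y)
    for _ in range(n_iter):
        R = Y - dS                                   # apply Dykstra's correction
        lam, Q = np.linalg.eigh(R)
        X = (Q * np.maximum(lam, 0.0)) @ Q.T         # P_S: clip negative eigenvalues
        dS = X - R
        Y_next = X.copy()
        np.fill_diagonal(Y_next, 1.0)                # P_U: restore the unit diagonal
        if np.linalg.norm(Y_next - Y, 'fro') < 1e-10:
            return Y_next
        Y = Y_next
    return Y

# Demo on an indefinite pseudo-correlation matrix:
A = np.array([[1.0, 0.9, 0.7],
              [0.9, 1.0, -0.9],
              [0.7, -0.9, 1.0]])
X = nearest_correlation(A)
```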


Figure 9.1: The performance of the eigenvalue and the Higham method applied on 6 random d × d pseudo-correlation matrices A, measured in terms of the Frobenius norm ‖A − X‖_F. Panel (a) displays the log-error norms, i.e. the distance ‖A − X‖_F for various dimensions of A; panel (b) shows the corresponding computational time required for both methods.

9.3 Simulation Study

We tested both algorithms for their accuracy and computational costs.

For different dimensions (d = 2, ..., 500) we produced a multivariate random sample from the t₇-distribution, computed the correlation matrix with the Kendall's tau approximation (eqn. 7.2) and added a little Gaussian noise to the estimated correlation coefficients. We applied the above-mentioned adjustment methods to the resulting matrix, which is in most cases not positive definite. For simplicity reasons the eigenvalue method was applied with δ = 0, which corresponds to computing a PSD matrix. All simulations were replicated 100 times. Figure 9.1 illustrates the absolute error norms (in the Frobenius matrix norm) for different matrix dimensions on the logarithmic scale and the computational time needed by both methods.

From Figure 9.1 it is evident that, of the two methods, the Higham method exposes higher accuracy than the eigenvalue method. Nevertheless, the Higham method has a higher computational time cost.

9.4 Discussion

The eigenvalue algorithm makes use of the spectral decomposition, where the negative eigenvalues are replaced by small positive values, but it is fairly ad hoc and does not satisfy the nearness property referred to above. Presumably, pathology could arise. The Higham algorithm solves the problem precisely, i.e. it finds the nearest correlation matrix to the given symmetric matrix in an appropriate matrix norm.
As Higham underlines, "the main weakness of the algorithm is its linear convergence rate". Based on the simulation study we can confirm that the algorithm proposed by Higham for computing the nearest positive definite matrix is the more precise choice than the eigenvalue method. We therefore employ the Higham algorithm in further research.
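The eigenvalue method admits an equally short sketch. How the diagonal is restored after clipping is an assumption about the variant meant here (a cov2cor-style rescaling, in the spirit of Rousseeuw and Molenberghs (1993)); δ = 0 reproduces the PSD setting used in the simulation study:

```python
import numpy as np

def eigenvalue_method(A, delta=0.0):
    """Ad hoc spectral fix: clip eigenvalues below delta, rescale to unit diagonal."""
    lam, Q = np.linalg.eigh((A + A.T) / 2)
    X = (Q * np.maximum(lam, delta)) @ Q.T    # replace negative eigenvalues
    d = np.sqrt(np.diag(X))
    X = X / np.outer(d, d)                    # cov2cor-style rescaling (assumed variant)
    np.fill_diagonal(X, 1.0)
    return X
```

This costs a single eigendecomposition, which explains its speed advantage in Figure 9.1, but — as argued above — it does not minimize the distance to A in any norm.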


Chapter 10

Application: Simulated and Real Equity Portfolio

In this chapter we present a novel way of predicting risk measures based on Bayesian copula estimation. Contrary to the classical approach of using point estimators of copula parameters, we use a probability distribution (the Bayesian posterior distribution) for each copula parameter, which enables us to tackle the important issue of parameter uncertainty. We use two risk measures: Value-at-Risk and expected shortfall.

10.1 Value-at-Risk and Expected Shortfall

Value-at-Risk (VaR) is a widely used measure of the risk of loss on a specific portfolio of financial assets. For a given portfolio, probability and time horizon, VaR is defined as a threshold value such that the probability that the mark-to-market loss on the portfolio over the given time horizon exceeds this value (assuming normal markets and no trading in the portfolio) is the given probability level.

In financial applications, VaR can be expressed as follows. Let L be a random variable representing the returns with loss distribution F_L, and let α ∈ (0, 1) be a confidence level. Then VaR_α(L), the Value-at-Risk at level α, is given by

    VaR_α(L) = F_L^←(α) = inf{x ∈ R : F_L(x) ≥ α}.

In probabilistic terms, VaR is thus simply a quantile of the loss distribution (see McNeil et al. (2005)).

Another risk measure frequently used in finance is the expected shortfall (ES) at level α. It is defined as

    ES_α = (1/α) ∫_0^α VaR_γ(L) dγ.

Informally, this equation amounts to asking: "in case of losses so severe that they occur only α percent of the time, what is our average loss?"
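On a finite sample, both definitions reduce to order statistics: VaR_α is the empirical α-quantile and ES_α the average of the observations below it. A minimal sketch, assuming the sign convention above (both measures live on the lower tail of the return distribution):

```python
import numpy as np

def var_es(returns, alpha=0.01):
    """Empirical VaR_alpha = F^{-1}(alpha) and ES_alpha = mean of the alpha-tail."""
    x = np.sort(np.asarray(returns, dtype=float))
    k = max(int(np.floor(alpha * len(x))), 1)   # size of the alpha-tail
    return x[k - 1], x[:k].mean()               # (VaR, ES)
```

Both quantities are negative for a loss-making tail, and by construction ES_α ≤ VaR_α.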


10.2 VaR and ES Calculation Based on Simulated Portfolio Returns

In this section, we consider simulated financial returns to explore the impact of different calibration techniques on the common risk measures, where the true values of the risk measures are known a priori.

We simulate the stock returns X by a meta-t copula as follows. The number of observations is chosen to be N = 1500; think e.g. of 6 years of daily data. We assume that the empirical correlation matrix P is given by

        ⎛ 1    0.4  0.9  0.5 ⎞
    P = ⎜ 0.4  1    0.1  0.7 ⎟
        ⎜ 0.9  0.1  1    0.2 ⎟ .
        ⎝ 0.5  0.7  0.2  1   ⎠

The time series X = (X_1^{(t)}, X_2^{(t)}, . . . , X_4^{(t)})_{t=1}^N have a meta-t copula with degrees of freedom ν = 7. The margins are t-distributed, with df ν_1 = 9, ν_2 = 6, ν_3 = 7, ν_4 = 5:

    X ∼ C^t_{ν, P, ν_1, ..., ν_4}.        (10.1)

Assume for simplicity that each stock in our portfolio has an initial price of 100 units. We adjust the returns with a scaling factor σ = 0.025, which corresponds to the typically observed daily equity volatility.

Hence, the daily absolute return on the portfolio is

    Z_t = σ · Σ_{j=1}^4 X_{tj},   t = 1, 2, . . . , N.        (10.2)

The distribution of the portfolio returns Z can then be used to assess the daily Value-at-Risk (VaR) or the expected shortfall (ES), which is done as follows.

• True value: MC integration
The true values of the risk measures VaR and ES for the simulated absolute portfolio returns do not have a simple closed form and are therefore computed via Monte Carlo integration. The risk measures are computed at the confidence level α = 0.01 (99%).

• Parametric model: meta-t copula, fixed point parameter estimates
Next, the point estimators of the meta-t copula (ν̂^ML, P̂, ν̂_1^ML, ..., ν̂_4^ML) are computed with the IFM method (as in section 4.3). Based on them, we simulate hypothetical time series (of size 10000) of returns on that portfolio that have a meta-t copula distribution.
Finally, we aggregate the absolute return on this portfolio and compute the corresponding risk measures VaR and ES (hereafter referred to as the ML-based VaR and ES). In addition, the 95% bootstrapped confidence intervals (the number of bootstrap samples is 5000) for the VaR and ES are computed.


• Parametric model: meta-t copula, Bayesian parameter estimates and their posterior distribution
Finally, we demonstrate the impact of parameter uncertainty on the risk measures. We run the multiple-block Metropolis-Hastings sampler for the t-copula parameters (section 8.1), discard the first 1000 burn-in observations and pick every second sample. We are left with n = 2500 samples (ν_i, P_i)_{i=1}^{2500}. For each parameter sample (ν_i, P_i), standardized returns on the portfolio (number of simulations is 5000) are simulated iid according to

    X_i ∼ C^t_{ν_i, P_i, ν̂_1^ML, ..., ν̂_4^ML}

and scaled with the above-mentioned daily equity volatility constant. Next, the respective risk measures are inferred from the simulated returns. This step can be visualized as follows:

    (ν_1, P_1)       ((X_{1t}, ..., X_{dt})_{t=1}^N)^{(1)}       (VaR^{(1)}, ES^{(1)})
       ...       →                  ...                   →              ...
    (ν_n, P_n)       ((X_{1t}, ..., X_{dt})_{t=1}^N)^{(n)}       (VaR^{(n)}, ES^{(n)})
                                                                            (10.3)

In this way, we obtain 2500 samples from the posterior (that is, empirical) distribution of VaR and ES, respectively, from which the average posterior risk measures (hereafter referred to as the Bayesian VaR and ES) and the confidence intervals can be deduced.

We summarize our results for both risk measures and the above-mentioned estimation techniques in Table 10.1. Figure 10.1 illustrates the histograms of the posterior VaR and ES.

Repeating the simulation procedure (as illustrated in 10.2) 1000 times (i.e.
computing the ML-based VaR/ES, running the multiple-block Metropolis-Hastings algorithm for the copula parameters and extracting a Bayesian VaR/ES) showed us that the ML-based risk measures do differ slightly in value from the Bayesian risk measures for the simulated portfolio returns.

In 73% of the cases, the absolute value of the Bayesian risk measure is larger than that of the ML-based one, i.e. the risk is overestimated. The difference between the Bayesian Value-at-Risk and the ML-based one (VaR_α^Bayesian − VaR_α^ML) is approximately normally distributed around 0.1162; the absolute difference between the Bayesian ES and the ML-based one is scattered around 0.1393.

The one-sample t-test rejects the null hypothesis that the differences VaR_α^Bayesian − VaR_α^ML are a random sample from a normal distribution with mean 0 and unknown variance (p-value 1.48 · 10^{−4}). The same is valid for the ES (p-value 1.57 · 10^{−4}).

On average, the posterior credible intervals for the VaR are 130% larger than the bootstrapped intervals for the ML-based VaR. The posterior credible intervals for the ES are 180% larger than the bootstrapped intervals for the ML-based ES.
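The propagation scheme (10.3) can be sketched as follows. This is an illustrative Python version with hypothetical sample sizes, not the thesis's implementation; the meta-t sampler draws a multivariate t via the normal/chi-square mixture and then maps through the t distribution functions:

```python
import numpy as np
from scipy import stats

def simulate_meta_t(nu, P, nu_margins, n, rng):
    """n draws from a meta-t copula: copula df nu, correlation P, t margins."""
    d = len(nu_margins)
    z = rng.multivariate_normal(np.zeros(d), P, size=n)
    w = rng.chisquare(nu, size=(n, 1))
    y = z / np.sqrt(w / nu)                      # multivariate t_nu sample
    u = stats.t.cdf(y, df=nu)                    # copula observations
    return np.column_stack([stats.t.ppf(u[:, j], df=nu_margins[j])
                            for j in range(d)])

def posterior_risk(samples, nu_margins, rng, sigma=0.025, n_sim=2000, alpha=0.01):
    """Map each posterior draw (nu_i, P_i) to a (VaR, ES) pair."""
    out = []
    for nu_i, P_i in samples:
        X = simulate_meta_t(nu_i, P_i, nu_margins, n_sim, rng)
        z = np.sort(sigma * X.sum(axis=1))       # aggregated portfolio return
        k = max(int(alpha * n_sim), 1)
        out.append((z[k - 1], z[:k].mean()))     # (VaR, ES)
    return np.array(out)
```

The rows of the result are samples from the posterior distribution of the risk measures; their averages and empirical quantiles give the Bayesian point estimates and credible intervals.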


Figure 10.1: Simulated portfolio returns: the histograms of the posterior VaR (panel a) and ES (panel b) at confidence level 99% of the returns. The horizontal lines are the bootstrapped (red) and the posterior (black) 95% credible intervals for the VaR and ES risk measure, respectively.


Simulated Portfolio: Aggregated risk measures VaR and ES at 99% confidence level

                              True Value   ML risk measure        Bayesian risk measure
VaR                           -22.8254     -23.2687               -23.1443
  corresponding relative
  return on portfolio         -5.7%        -5.8%                  -5.8%
  95% credible interval                    (-24.7867, -21.7834)   (-25.0479, -21.0122)
ES                            -28.6101     -28.6501               -28.7313
  corresponding relative
  return on portfolio         -7.15%       -7.18%                 -7.19%
  95% credible interval                    (-30.3164, -27.1166)   (-31.7034, -25.7378)

Table 10.1: Aggregated daily VaR and ES of the simulated portfolio returns at a confidence level of 99%: true values, parametric model estimation with fixed parameter estimates, and via Bayesian inference. The 95% credible intervals for the ML-based VaR and ES are calculated with the bootstrap (B = 5000) method; for the Bayesian risk measures the intervals are calculated from the posterior distribution of VaR and ES, respectively (see section 5.7).

10.3 Interpretations

The risk measures are overestimated by the Bayesian parameter model. The most likely reason is that the posterior distribution of the copula parameters (ν and P) implies a slightly more tail-dependent distribution of the simulated returns and hence larger values of the risk measures than in the case of the point-estimated copula model.

Recall the formula for the tail dependence coefficient λ(ν, ρ) of a bivariate t-copula:

    λ(ν, ρ) = 2 t_{ν+1}( −√(ν+1) · √((1−ρ)/(1+ρ)) ).

In Figure 10.2 we plot the dependence of λ(ν, ρ) on ν for fixed ρ = 0.6, together with a fictitious symmetric posterior distribution of ν.
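This coefficient is easy to evaluate with any t distribution function; an illustrative Python sketch:

```python
import numpy as np
from scipy import stats

def t_tail_dependence(nu, rho):
    """lambda(nu, rho) = 2 * t_{nu+1}( -sqrt((nu+1)(1-rho)/(1+rho)) )."""
    arg = -np.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho))
    return 2.0 * stats.t.cdf(arg, df=nu + 1.0)
```

Evaluating it on a grid around ν = 7 reproduces the asymmetry in the figure: λ is decreasing and convex in ν, so spreading ν symmetrically around its mean raises the average tail dependence.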
We see that the dependence structure is not linear in ν: the tail dependence coefficient λ increases much more if ν decreases than it decreases if ν increases. Equivalently, the distance denoted by A is greater than the one denoted by B if the distribution of ν is approximately symmetric. An even greater discrepancy between A and B is expected for lower values of ν, and a smaller discrepancy for relatively larger ν.

The same idea can be extended to the multivariate case, even for negative correlations.¹ Moreover, the same effect is expected when applying the posterior distribution of the correlation coefficients.

Therefore, our hypothesis is as follows:

¹ For more details on multivariate tail dependence see Einmahl et al. (2011).


Figure 10.2: The coefficient of upper (or lower) tail dependence λ of the bivariate t-copula for ρ = 0.6. The dotted line is a fictitious symmetric probability density of the ν parameter with mean equal to 7.

Having a posterior distribution of ν that is evenly distributed on both sides of the mean, instead of a single point ML estimate of ν, will automatically imply an increase of the posterior VaR/ES compared to the ML-based VaR/ES.

Simulated Portfolio: Aggregated risk measures VaR and ES at 99% confidence level

        ν = 3.5              ν = 7                ν = 15
VaR     -31.1332 (0.8271)    -23.1443 (0.1162)    -19.4019 (0.045)
ES      -43.6259 (0.9173)    -28.7313 (0.1393)    -22.4452 (0.385)

Table 10.2: Aggregated daily VaR and ES of the simulated portfolio returns at a confidence level of 99% for different meta-t copula models for the simulated returns. Values in brackets are the average deviations (risk measure_α^ML − risk measure_α^Bayesian).

To confirm this hypothesis we run the VaR/ES simulation algorithm, but this time with a slightly changed setting. We simulate the standardized returns according to (10.1), but with a small ν = 3.5 and a large ν = 15. A smaller df parameter should hypothetically increase the Bayesian VaR and ES relative to the ML-based VaR/ES; the contrary is expected for large ν. Indeed, the simulation results stated in Table 10.2 confirm our hypothesis. Assuming a small ν = 3.5 for the underlying true t-copula model, we obtain a posterior distribution of ν that implies more tail-dependent simulated returns and hence larger deviations between the Bayesian and ML risk measures. The contrary result is valid for the large ν = 15.


10.4 Equity Portfolio

In this section, we consider daily equity data to explore the impact of different copula calibration techniques and parameter uncertainty on the common risk measures.

We use a dataset of N = 1008 daily observations of the equity prices of a group of worldwide equities. The number of assets in the basket is d = 7. The time series of historical prices (closing price) S = (S_1^{(t)}, S_2^{(t)}, . . . , S_7^{(t)})_{t=1}^N (from 12th June 2007 till 8th June 2011) were downloaded from the Yahoo Finance Statistical Release (http://www.finance.yahoo.com). Details on the specific tickers are provided in Appendix A.2.

10.4.1 Historical Simulation of VaR / ES

Historical simulation represents the simplest way of estimating VaR and ES for many portfolios. In this approach, the risk measures for the profit & loss of the portfolio are estimated by running the portfolio through actual historical data and computing the changes that have occurred in each period.

Assume we hold λ_i = 1 units of each equity. The portfolio value at time t is then

    PortValue_t = Σ_{i=1}^7 S_i^{(t)}.

The historical VaR at the confidence level α = 0.01 (or 99% of the time) of the given portfolio is equal to VaR_α^hist = −12.81, and the expected shortfall is ES_α^hist = −15.97. These absolute values correspond to returns on the portfolio of −5% and −6.2%, respectively.

The relative return for the i-th equity at time t + 1 is defined in the usual way:

    R_i^{(t+1)} = (S_i^{(t+1)} − S_i^{(t)}) / S_i^{(t)}.        (10.4)

Hence, the potential loss or gain in value of the given portfolio from adverse market movements over a specified period of 1 day is

    δPortValue_{t+1} = PortValue_{t+1} − PortValue_t = Σ_{i=1}^7 R_i^{(t+1)} S_i^{(t)}.        (10.5)
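For a one-unit-per-asset portfolio the historical simulation above amounts to sorting the first differences of the summed prices; an illustrative Python sketch (the layout of the price array, one column per asset, is an assumption):

```python
import numpy as np

def historical_var_es(prices, alpha=0.01):
    """Historical-simulation VaR/ES for lambda_i = 1 holdings, eq. (10.5)."""
    port = np.asarray(prices, dtype=float).sum(axis=1)   # PortValue_t
    pnl = np.sort(np.diff(port))                         # delta PortValue_{t+1}
    k = max(int(alpha * len(pnl)), 1)                    # size of the alpha-tail
    return pnl[k - 1], pnl[:k].mean()                    # (VaR, ES)
```

Applied to the N = 1008 price observations, this yields the historical VaR and ES quoted above.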
10.4.2 ML Copula Calibration and Resulting VaR / ES

In the present study we model the dependence of the standardized (zero mean and unit variance) relative returns (10.4) by the meta-t copula model with t margins. The meta-t copula for the standardized returns is fitted with the IFM method, as described in section 4.3.²

² Now we can see why log returns are unusable in our setting. Assuming that the log returns have a meta-t copula with t margins as dependency structure, the expression for the potential loss or gain in portfolio


The resulting estimates are as follows:

    ν̂^ML = 4.3876,   ν̂_1^ML = 7.3112,   ν̂_2^ML = 8.3438,   ν̂_3^ML = 8.3604,
    ν̂_4^ML = 9.6825,   ν̂_5^ML = 5.8752,   ν̂_6^ML = 8.4245,   ν̂_7^ML = 8.7213.

The estimated correlation matrix P̂ is

              BCS      BNP.PA   BNS      RY       PZB      UCG.MI   NDB.F
    BCS       1       -0.0021   0.7216   0.6915   0.0933  -0.0353   0.2064
    BNP.PA   -0.0021   1       -0.1410  -0.0242   0.2290  -0.2558  -0.1011
    BNS       0.7216  -0.1410   1        0.8741  -0.1125  -0.1502   0.1796
    RY        0.6915  -0.0242   0.8741   1       -0.1972  -0.2890   0.1330
    PZB       0.0933   0.2290  -0.1125  -0.1972   1        0.0721  -0.0624
    UCG.MI   -0.0353  -0.2558  -0.1502  -0.2890   0.0721   1        0.1730
    NDB.F     0.2064  -0.1011   0.1796   0.1330  -0.0624   0.1730   1

Based on the ML copula parameter estimates, we create a hypothetical time series (size 10000) of returns on that portfolio that follow a meta-t copula. Finally, we aggregate the realizations of this portfolio and compute the corresponding risk measures VaR and ES. In this case, the values of VaR and ES at the confidence level α = 0.01 are

    VaR_α^ML = −13.2992   (−13.9531, −12.5829)
    ES_α^ML  = −16.4299   (−17.0457, −15.8239),

where the values in brackets are the 95% bootstrapped (B = 5000) confidence intervals.

10.4.3 Bayesian Copula Estimation and the Resulting VaR / ES

First, we use the ML estimates of the margins and transform the return data set into uniform variates û_t = (û_1^t, ..., û_d^t), using the fitted marginal t distributions with degrees of freedom ν̂_1^ML, . . . , ν̂_d^ML. Having done this, we run the multiple-block Metropolis-Hastings sampler for the t-copula parameters (section 8.1) ν and P, discard the first 1000 burn-in observations and pick every second sample. We are left with 2500 samples (ν_i, P_i)_{i=1}^{2500}.
The corresponding trace plot (value of the sampled ν versus iterations) is shown in Figure 10.3.

Second, based on each parameter sample (ν_i, P_i)_{i=1}^{2500} we simulate the returns of the portfolio (number of simulations is 5000) and calculate the respective risk measures. The posterior distribution of the t-copula parameters thus implies a posterior distribution for the aggregated risk measures. In this way, we obtain 2500 samples from the posterior distribution of VaR and ES, respectively (as illustrated in 10.2).

² (continued) value takes the form

    δPortValue_{t+1} = PortValue_{t+1} − PortValue_t = Σ_{i=1}^7 S_i^{(t)} (exp(R_i^{(t+1)}) − 1),

where exp(R_i^{(t+1)}) has a log-t distribution. A log-t distributed random variable has infinite mean, which is undesirable for simulation purposes.


Figure 10.3: Equity portfolio returns: thinned trace plot for the ν parameter. The solid horizontal line indicates the Bayesian mean a posteriori estimate ν̂^MMSE.

The histograms of the posterior VaR and ES values are plotted in Figure 10.4. We summarize our results for both the ML-based and the Bayesian risk measures in Table 10.3.

Equity Portfolio: Aggregated risk measures VaR and ES at 99% confidence level

                              True Value   ML risk measure        Bayesian risk measure
VaR                           -12.81       -13.2992               -13.4341
  corresponding relative
  return on portfolio         -5%          -5.3%                  -5.4%
  95% credible interval                    (-13.9531, -12.5829)   (-15.2345, -10.6125)
ES                            -15.97       -16.4299               -16.9257
  corresponding relative
  return on portfolio         -6.2%        -6.6%                  -6.8%
  95% credible interval                    (-17.0457, -15.8239)   (-19.3038, -13.3133)

Table 10.3: Aggregated daily VaR and ES of the equity portfolio returns at a confidence level of 99%: true values, parametric model estimation with fixed parameter estimates, and via Bayesian inference. The 95% credible intervals are calculated with the bootstrap (B = 5000) method, except for the Bayesian risk measures, where the intervals are calculated from the posterior distribution of VaR and ES, respectively (see section 5.7).

Repeating the simulation procedure 1000 times (i.e. computing the ML-based VaR/ES, running the multiple-block Metropolis-Hastings algorithm for the copula parameters and extracting a Bayesian VaR/ES) showed us that the Bayesian risk measures differ in value from the ML-based risk measures for the equity portfolio returns.


Figure 10.4: Equity portfolio returns: the histograms of the posterior VaR (panel a) and ES (panel b) at confidence level 99%. The horizontal lines are the bootstrapped (red) and the posterior (black) 95% credible intervals for the VaR and ES risk measure, respectively.


In 82% of the cases, the absolute value of the Bayesian risk measure is larger than that of the ML-based one, i.e. it is overestimated, as in Figure 10.4. The difference between the Bayesian Value-at-Risk and the ML-based one (VaR_α^Bayesian − VaR_α^ML) is approximately normally distributed around 0.1481; the absolute difference between the Bayesian ES and the ML-based one is scattered around 0.1518.

Again, the one-sample t-test rejects the null hypothesis that the differences VaR_α^Bayesian − VaR_α^ML are a random sample from a normal distribution with zero mean and unknown variance (p-value 1.55 · 10^{−3}). The same is valid for the ES (p-value 1.91 · 10^{−3}).

On average, the posterior credible intervals for the VaR are 220% larger than the bootstrapped intervals for the ML-based VaR. The posterior credible intervals for the ES are 480% larger than the bootstrapped intervals for the ML-based ES.

Moreover, the "uncertainty" plot (Figure 10.5) shows the dependence of the posterior distribution of VaR on the confidence level α. As expected, the higher the confidence level, the larger the confidence interval for the VaR.

Figure 10.5: Uncertainty plot for the equity portfolio: posterior VaR_α descriptive statistics vs. confidence level α.


10.5 Discussion

First of all, we see a very small difference in value between the historical and the ML-based risk measures. This means that the historical data are well fitted by the meta-t copula.

Next, we observe that the Bayesian risk measures that incorporate parameter uncertainty are overestimated compared to the ML-based risk measures. We conclude that our hypothesis (having a posterior distribution of ν that is evenly distributed on both sides of the mean, instead of a single point ML estimate of ν, automatically implies an increase of the posterior VaR/ES compared to the ML-based VaR/ES) holds on the historical data set, too.


Chapter 11

Conclusion and Outlook

In this thesis we demonstrated how to estimate the parameters of the t-copula model not only with the classical but also with the Bayesian approach and, in particular, with the MCMC method.

We found that the Bayesian estimate of the copula degrees-of-freedom parameter has approximately the same accuracy as the common maximum-likelihood based estimation methods. Moreover, we observed that the Bayesian estimate of the correlation matrix resulting from the inverse Wishart prior distribution does not attain the same high accuracy as the commonly used Kendall's tau correlation approximation.

We examined the impact of parameter uncertainty, using the posterior distribution of the t-copula parameters, on the usual risk measures such as VaR and ES when assuming a meta-t copula model for the equity portfolio returns. Our analysis showed that parameter uncertainty, when modeling financial risks with a meta-t copula, tends to increase the risk measures by 2−3% on average. Moreover, the posterior credible intervals are on average 220% (VaR) and 480% (ES) larger than the usual bootstrap confidence intervals for the meta-t copula model for the returns based on the point estimates.

In conclusion, we argue that introducing parameter uncertainty via MCMC simulations into risk measure calculation, in a broader model setting and applied to different portfolios, can be a useful tool for modeling financial risks.

11.1 Future Work

For a comprehensive modeling of multivariate dependence in finance or insurance, there are other issues in data analysis that should be addressed carefully, such as the problem of missing values or time-dependent correlation parameters.
These are not considered in the present study.

Moreover, an interesting development of our research would be to address parameter uncertainty stemming from the hyperparameters or the volatility estimates. This requires a different, hierarchical Bayesian modeling approach.




Bibliography

Arbenz, P. (2010). Bayesian Copulae Distributions, with Application to Operational Risk Management - Some Comments. Unpublished.

Bennett, J. E., Racine-Poon, A., and Wakefield, J. C. (1996). MCMC for Nonlinear Hierarchical Models. In Markov Chain Monte Carlo in Practice. London: Chapman and Hall.

Besag, J., Green, D., Higdon, D., and Mengersen, K. (1995). Bayesian Computation and Stochastic Systems. Statistical Science, 10(1), 58–66.

Böcker, K., Crimmi, A., and Fink, H. (2010). Bayesian Risk Aggregation: Correlation Uncertainty and Expert Judgement. Rethinking Risk Measurement and Reporting Uncertainty.

Bolstad, W. M. (2010). Understanding Computational Bayesian Statistics. Wiley.

Borowicz, J. M. and Norman, J. P. (2009). The Effects of Parameter Uncertainty in Dependent Structures. Presented at the 28th International Congress of Actuaries, Paris, 28 May-2 June 2008. Available at http://www.actuaries.org/EVENTS/Congresses/Paris/Papers/3093.pdf.

Boyle, J. P. and Dykstra, L. D. (1985). Advances in Order Restricted Inference, Chapter A Method for Finding Projections onto the Intersection of Convex Sets in Hilbert Spaces, pp. 28–47. Springer-Verlag, Berlin.

Carlin, B. (2008). Bayesian Methods for Data Analysis. CRC Press.

Chen, X. and Fan, Y. (2006). Comparison of Semiparametric and Parametric Methods for Estimating Copulas. Journal of Econometrics, 130, 307–335.

Dalla Valle, L. (2009). Bayesian Copulae Distributions, with Application to Operational Risk Management. Methodology and Computing in Applied Probability, 11, 95–115.

Dalla Valle, L. and Giudici, P. (2008). A Bayesian Approach to Estimate the Marginal Loss Distributions in Operational Risk Management. Computational Statistics and Data Analysis, 52, 3107–3127.

Daul, S., De Giorgi, E., Lindskog, F., and McNeil, A. (2003).
The Grouped t-Copula with an Application to Credit Risk. RISK, 16, 73–76.

Devlin, S. J., Gnanadesikan, R., and Kettenring, J. R. (1975). Robust Estimation and Outlier Detection with Correlation Coefficients. Biometrika, 62(3), 531–545.

Dykstra, L. D. (1983). An Algorithm for Restricted Least Squares Regression. Journal of the American Statistical Association, 78, 837–842.


Einmahl, J. H. J., Krajina, A., and Segers, J. (2011). An M-Estimator for Tail Dependence in Arbitrary Dimensions. CentER, 13.

Embrechts, P., McNeil, A., and Straumann, D. (2001). Risk Management: Value at Risk and Beyond, Chapter Correlation and Dependency in Risk Management: Properties and Pitfalls, pp. 176–223. Cambridge University Press.

Fang, H. and Fang, K. (2002). The Meta-Elliptical Distributions with Given Marginals. Journal of Multivariate Analysis, 82, 1–16.

Ferguson, T. S. (1996). A Course in Large Sample Theory. Chapman and Hall.

Genest, C., Ghoudi, K., and Rivest, L.-P. (1995). A Semiparametric Estimation Procedure of Dependence Parameters in Multivariate Families of Distributions. Biometrika, 82, 543–552.

Han, S.-P. (1988). A Successive Projection Method. Mathematical Programming, 1–14.

Higham, N. J. (2001). Computing the Nearest Correlation Matrix - a Problem from Finance. IMA Journal of Numerical Analysis, 22, 329–343.

Joe, H. and Xu, J. J. (1996). The Estimation Method of Inference Functions for Margins for Multivariate Models. Technical report, Dept. of Statistics, University of British Columbia.

Kim, G., Silvapulle, M. J., and Silvapulle, P. (2007). Comparison of Semiparametric and Parametric Methods for Estimating Copulas. Communications in Statistics, 2836–2850.

Leonard, T. and Hsu, J. S. J. (1992). Bayesian Inference for a Covariance Matrix. The Annals of Statistics, 20(4), 1669–1696.

Liang, F., Liu, C., and Carroll, R. (2010). Advanced Markov Chain Monte Carlo Methods: Learning from Past Samples. Wiley.

Lindskog, F., McNeil, A., and Schmock, U. (2001). Kendall's Tau for Elliptical Distributions. ETH Zurich, Department of Mathematics, working paper.

Luo, X. and Shevchenko, P. V. (2010).
The t-Copula with Multiple Parameters of Degrees of Freedom: Bivariate Characteristics and Application to Risk Management. Quantitative Finance, 10(9), 1039–1054.

Mashal, R. and Naldi, M. (2002). Pricing Multiname Credit Derivatives: Heavy Tailed Approach. Quantitative Credit Research Quarterly, 3107–3127. Lehman Brothers Inc and Columbia Graduate School of Business, working paper.

Mashal, R. and Zeevi, A. (2002). Beyond Correlation: Extreme Comovements Between Financial Assets. Columbia University, working paper.

McNeil, A., Frey, R., and Embrechts, P. (2005). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). Equation of State Calculations by Fast Computing Machines. Journal of Chemical Physics, 21(6), 1087–1092.

Min, A. and Czado, C. (2010). Bayesian Inference for Multivariate Copulas Using Pair-Copula Constructions. Journal of Financial Econometrics, 8(4).


Rousseeuw, P. and Molenberghs, G. (1993). Transformation of Nonpositive Semidefinite Correlation Matrices. Communications in Statistics: Theory and Methods, 965–984.

Wüthrich, M. V. and Shevchenko, P. V. (2009). The Structural Modelling of Operational Risk via Bayesian Inference: Combining Loss Data with Expert Opinions. The Journal of Operational Risk, 44(2), 3–26.




Appendix A

Complementary information

A.1 Inverse Wishart Distribution

In statistics, the inverse Wishart distribution, also called the inverted Wishart distribution, is a probability distribution defined on real-valued positive-definite matrices. In Bayesian statistics it is used as the conjugate prior for the covariance matrix of a multivariate normal distribution.

We say A follows an inverse Wishart distribution, denoted A ∼ Inverse Wishart(B, α), with d × d symmetric positive definite scale matrix B > 0 and degrees of freedom parameter α > d + 1 (real). The probability density function of the d-dimensional inverse Wishart is

    f_{Inverse Wishart(B,α)}(A) = ( |B|^{α/2} |A|^{−(α+d+1)/2} / (2^{αd/2} Γ_d(α/2)) ) exp( −(1/2) tr(B A^{−1}) ),

where Γ_d(·) is the multivariate gamma function defined as

    Γ_d(α/2) = π^{d(d−1)/4} ∏_{j=1}^d Γ[α/2 + (1 − j)/2].

The mean is given by

    E[A] = B / (α − d − 1),

and the variance of each element of A is

    var(A_{ij}) = ( (α − d + 1) B_{ij}^2 + (α − d − 1) B_{ii} B_{jj} ) / ( (α − d)(α − d − 1)^2 (α − d − 3) ).
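The mean formula can be checked numerically against an off-the-shelf sampler such as scipy.stats.invwishart, whose (df, scale) parameterization corresponds to (α, B) above; the 2 × 2 scale matrix below is a hypothetical example:

```python
import numpy as np
from scipy import stats

d, alpha = 2, 10.0
B = np.array([[2.0, 0.3], [0.3, 1.0]])          # hypothetical scale matrix

iw = stats.invwishart(df=alpha, scale=B)
draws = iw.rvs(size=20000, random_state=42)     # shape (20000, d, d)

emp_mean = draws.mean(axis=0)                   # Monte Carlo estimate of E[A]
theo_mean = B / (alpha - d - 1)                 # E[A] = B / (alpha - d - 1)
```

With α − d − 1 = 7, the empirical and theoretical means agree to within Monte Carlo error.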


A.2 Equity Data: Yahoo Tickers and Principal Statistics

In Chapter 10 we use equity data downloaded from the Yahoo Finance Statistical Release (http://www.finance.yahoo.com). The tickers of the equities used and the principal statistics of the corresponding relative returns are reported in Table A.1.

Name                              Ticker    Mean      Standard deviation
Barclays PLC                      BCS        0.0026   0.0562
BNP Paribas                       BNP.PA     0.0009   0.0319
The Bank Of Nova Scotia           BNS       -0.0001   0.0242
Royal Bank of Canada              RY         0.0001   0.0242
Merrill Lynch Depositor, Inc. M   PZB       -0.0001   0.0250
UniCredit SpA                     UCG.MI     0.0020   0.0338
NORDEA BANK                       NDB.F      0.0008   0.0333

Table A.1: Equity data: the tickers and the principal statistics of the corresponding relative returns.

The historical price time series for the above-mentioned equities are illustrated in Figure A.1.

Figure A.1: Historical price time series for the equity portfolio described in section 10.4.
