
a consistent nonparametric test for causality in quantile - Humboldt ...


Econometric Theory, 28, 2012, 861–887.
doi:10.1017/S0266466611000685

A CONSISTENT NONPARAMETRIC TEST FOR CAUSALITY IN QUANTILE

KIHO JEONG
Kyungpook National University

WOLFGANG K. HÄRDLE
Humboldt-Universität zu Berlin

SONG SONG
Humboldt-Universität zu Berlin and University of California, Berkeley

This paper proposes a nonparametric test of Granger causality in quantile. Zheng (1998, Econometric Theory 14, 123–138) studied the idea to reduce the problem of testing a quantile restriction to a problem of testing a particular type of mean restriction in independent data. We extend Zheng’s approach to the case of dependent data, particularly to the test of Granger causality in quantile. Combining the results of Zheng (1998) and Fan and Li (1999, Journal of Nonparametric Statistics 10, 245–271), we establish the asymptotic normal distribution of the test statistic under a β-mixing process. The test is consistent against all fixed alternatives and detects local alternatives approaching the null at proper rates. Simulations are carried out to illustrate the behavior of the test under the null and also the power of the test under plausible alternatives. An economic application considers the causal relations between the crude oil price, the USD/GBP exchange rate, and the gold price in the gold market.

The research was conducted while Jeong was visiting C.A.S.E. (Center for Applied Statistics and Economics), Humboldt-Universität zu Berlin, in the summers of 2005 and 2007. Jeong is grateful for their hospitality during the visits. Jeong’s work was supported by a Korean Research Foundation grant funded by the Korean government (MOEHRD) (KRF-2006-B00002), and Härdle and Song’s work was supported by the Deutsche Forschungsgemeinschaft through the SFB 649 “Economic Risk.” We thank the editor, two anonymous referees, and Holger Dette for concrete suggestions on improving the manuscript and restructuring the paper. Their valuable comments and suggestions are gratefully acknowledged. Address correspondence to Kiho Jeong, Kyungpook National University, Korea; e-mail: khjeong@knu.ac.kr.
© Cambridge University Press 2012

1. INTRODUCTION

Whether movements in one economic variable cause reactions in another variable is an important issue in economic policy and also for financial investment decisions. A framework for investigating causality between economic indicators has been developed by Granger (1969). Testing for Granger causality between


economic time series has since been studied intensively in empirical macroeconomics and empirical finance. The majority of research results were obtained in the context of Granger causality in the conditional mean. The conditional mean, though, is a questionable element of analysis if the distributions of the variables involved are nonelliptic or fat tailed, as is to be expected with, for example, financial returns. Focusing a causality analysis on the mean may therefore give an unclear picture. The conditional mean is only one element of an overall summary for the conditional distribution. A tail area causal relation may be quite different from a causality based on the center of the distribution. Lee and Yang (2007) explore money-income Granger causality in the conditional quantile and find that Granger causality is significant in tail quantiles, whereas it is not significant in the center of the distribution.

An illustrating motivation for the research presented here is from labor market analysis, where one tries to find out how income depends on the age of the employee for different education levels, genders, nationalities, and so on (discrimination effects); see, for example, Buchinsky (1995).
In particular, the effect of education on income is summarized by the basic claim of Day and Newburger (2007): At most ages, more education equates with higher earnings, and the payoff is most notable at the highest educational level. This claim is made from the point of view of mean regression. However, whether this difference is significant or not is still questionable, especially for different ends of the (conditional) income distribution. Härdle, Ritov, and Song (2009) show that, for the 0.20 quantile, the confidence bands for income given “university,” “apprenticeship,” and “low education” status do not differ significantly from one another although they become progressively lower, which indicates that high education does not equate to higher earnings significantly for the lower tails of income, whereas increasing age seems to be the main driving force. For the conditional median, the bands for “university” and “low education” differ significantly. For the 0.80 quantiles, all conditional quantiles differ, which indicates that higher education is associated with higher earnings. However, these findings do not necessarily indicate causalities. To answer the question “Does education Granger cause income in various conditional quantiles?” the concept of Granger causality in means cannot be used to estimate or test for these effects.
Hence the need for the concept of Granger causality in quantiles and the need to develop tests for these effects emerge.

Another motivation comes from controlling and monitoring downside market risk and investigating large comovements between financial markets. These are important for risk management and portfolio/investment diversification (Hong, Liu, and Wang, 2009). Various other risk management tasks are described in Bollerslev (2001) and Campbell and Cochrane (1999), indicating the importance of Granger causality in quantile. Yet another motivation comes from the well-known robustness properties of the conditional quantile: like the parallel boxplot (calculated across an explanatory variable), the set of conditional quantiles characterizes the entire distribution in more detail.


Based on the kernel method, we propose a nonparametric test for Granger causality in quantile. Testing conditional quantile restrictions by nonparametric estimation techniques in dependent data situations has not been considered in the literature before. This paper intends to fill this gap in the literature. In an unpublished working paper that has been carried out independently of ours, Lee and Yang (2007) also propose a test for Granger causality in the conditional quantile. Their test, however, relies on linear quantile regression and thus is subject to possible functional misspecification of quantile regression. Recently, Hong et al. (2009) investigated Granger causality in value at risk (VaR) with a corresponding (kernel-based) test. Their method, however, leaves room for two improvements. The first is that it needs a parametric specification of VaR, again subject to misspecification errors. The second is that their test does not directly check causality but rather a necessary condition for causality.

The problem of testing conditional mean restrictions using nonparametric estimation techniques has been actively studied for dependent data.
Among the related work, the testing procedures of Fan and Li (1999) and Li (1999) use the general hypothesis of the form $E(\varepsilon|z) = 0$, where $\varepsilon$ and $z$ are the regression error term and the vector of regressors, respectively. They consider the distance measure of $J = E[\varepsilon E(\varepsilon|z) f(z)]$ to construct kernel-based procedures. For the advantages of using this distance measure in kernel-based testing procedures, see Li and Wang (1998) and Hsiao and Li (2001). A feasible test statistic based on $J$ has a second-order degenerate U-statistic as the leading term under the null hypothesis. Generalizing the result of Hall (1984) for independent data, Fan and Li (1999) establish the asymptotic normal distribution for a general second-order degenerate U-statistic with dependent data.

All the results stated previously on testing mean restrictions are however irrelevant when testing quantile restrictions. Zheng (1998) proposed an idea to transform quantile restrictions to mean restrictions in independent data. Following his idea, one can use the existing technical results on testing mean restrictions in testing quantile restrictions.
In this paper, by combining Zheng’s idea and the results of Fan and Li (1999) and Li (1999), we derive a test statistic for Granger causality in quantile and establish the asymptotic normal distribution of the proposed test statistic under a β-mixing process. Our testing procedure can be extended to several hypothesis testing problems with conditional quantile in dependent data; for example, testing a parametric regression functional form, testing the insignificance of a subset of regressors, and testing semiparametric versus nonparametric regression models.

The paper is organized as follows. Section 2 presents the test statistic. Section 3 establishes the asymptotic normal distribution under the null hypothesis of no causality in quantile. Section 4 displays a fairly extensive simulation study to illustrate the behavior of the test under the null, in addition to the power of the test under plausible alternatives. Section 5 considers the causal relations between the crude oil and gold prices as an economic application. Section 6 concludes the paper. Technical proofs are given in the Appendix.


2. NONPARAMETRIC TEST FOR GRANGER CAUSALITY IN QUANTILE

To simplify the exposition, we assume a bivariate case, or that only $\{y_t, w_t\}$ are observable. Granger causality in mean (Granger, 1988) is defined as follows.

1. $w_t$ does not cause $y_t$ in mean with respect to $\{y_{t-1},\ldots,y_{t-p},w_{t-1},\ldots,w_{t-q}\}$ if
$$E(y_t|y_{t-1},\ldots,y_{t-p},w_{t-1},\ldots,w_{t-q}) = E(y_t|y_{t-1},\ldots,y_{t-p})$$
and
2. $w_t$ is a prima facie cause in mean of $y_t$ with respect to $\{y_{t-1},\ldots,y_{t-p},w_{t-1},\ldots,w_{t-q}\}$ if
$$E(y_t|y_{t-1},\ldots,y_{t-p},w_{t-1},\ldots,w_{t-q}) \neq E(y_t|y_{t-1},\ldots,y_{t-p}).$$

Motivated by the definition of Granger causality in mean, we define Granger causality in quantile as follows.

1. $w_t$ does not cause $y_t$ in the $\theta$-quantile with respect to $\{y_{t-1},\ldots,y_{t-p},w_{t-1},\ldots,w_{t-q}\}$ if
$$Q_\theta(y_t|y_{t-1},\ldots,y_{t-p},w_{t-1},\ldots,w_{t-q}) = Q_\theta(y_t|y_{t-1},\ldots,y_{t-p}). \tag{1}$$
2. $w_t$ is a prima facie cause in the $\theta$-quantile of $y_t$ with respect to $\{y_{t-1},\ldots,y_{t-p},w_{t-1},\ldots,w_{t-q}\}$ if
$$Q_\theta(y_t|y_{t-1},\ldots,y_{t-p},w_{t-1},\ldots,w_{t-q}) \neq Q_\theta(y_t|y_{t-1},\ldots,y_{t-p}), \tag{2}$$
where $Q_\theta(y_t|\cdot)$ is the $\theta$th (0


Zheng (1998) proposed an idea to reduce the problem of testing a quantile restriction to a problem of testing a particular type of mean restriction. The null hypothesis (3) is true if and only if $E[1\{y_t \le Q_\theta(x_t)\}|z_t] = \theta$ or $1\{y_t \le Q_\theta(x_t)\} = \theta + \varepsilon_t$, where $E(\varepsilon_t|z_t) = 0$ and $1(\cdot)$ is the indicator function. For a list of related literature we refer to Li and Wang (1998) and Zheng (1998). Although various distance measures can be used to consistently test the hypothesis (3), we consider the following distance measure:
$$J \equiv E\left[\left\{F_{y|z}(Q_\theta(x_t)|z_t) - \theta\right\}^2 f_z(z_t)\right], \tag{5}$$
with $f_{z_t}(z_t)$ being the marginal density function of $z_t$, which is sometimes abbreviated as $f_z(z_t)$. Note that $J \ge 0$ and the equality holds if, and only if, $H_0$ is true, with strict inequality holding under $H_1$. Thus $J$ can be used as a proper candidate for consistent testing of $H_0$ (Li, 1999, p. 104). Because $E(\varepsilon_t|z_t) = F_{y|z}\{Q_\theta(x_t)|z_t\} - \theta$, we have
$$J = E\{\varepsilon_t E(\varepsilon_t|z_t) f_z(z_t)\}. \tag{6}$$
The test is based on a sample analogue of $E\{\varepsilon E(\varepsilon|z) f_z(z)\}$. Including the density function $f_z(z)$ avoids the problem of trimming on the boundary of the density function; see Powell, Stock, and Stoker (1989) for an analogous approach. The density-weighted conditional expectation $E(\varepsilon|z) f_z(z)$ can be estimated by kernel methods
$$\hat{E}(\varepsilon_t|z_t)\hat{f}_z(z_t) = \frac{1}{(T-1)h^m}\sum_{s\neq t}^{T} K_{ts}\varepsilon_s, \tag{7}$$
where $m = p + q$ is the dimension of $z$, $K_{ts} = K\{(z_t - z_s)/h\}$ is the kernel function, and $h$ is a bandwidth.
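The transformation to a mean restriction and the kernel estimate in (7) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: a Gaussian product kernel is assumed for $K$, the quantile values $Q_\theta(x_t)$ are taken as given, and the function names are hypothetical.

```python
import numpy as np

def gaussian_product_kernel(u):
    """Product of univariate standard normal densities; u has shape (..., m).
    The Gaussian choice is an assumption made for this sketch."""
    return np.prod(np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi), axis=-1)

def quantile_errors(y, q, theta):
    """eps_t = 1{y_t <= Q_theta(x_t)} - theta, with q holding the values Q_theta(x_t)."""
    return (y <= q).astype(float) - theta

def density_weighted_mean(eps, z, h):
    """Leave-one-out kernel estimate of E(eps_t | z_t) f_z(z_t), following (7)."""
    T, m = z.shape
    K = gaussian_product_kernel((z[:, None, :] - z[None, :, :]) / h)  # K_ts
    np.fill_diagonal(K, 0.0)  # exclude the s == t terms
    return K @ eps / ((T - 1) * h**m)

def j_stat(eps, z, h):
    """Sample analogue of E{eps E(eps|z) f_z(z)}: average of eps_t times (7)."""
    return float(np.mean(eps * density_weighted_mean(eps, z, h)))
```

Here `z` is a `(T, m)` array of conditioning vectors and `h` a user-supplied bandwidth; averaging $\varepsilon_t$ times the estimate in (7) over $t$ gives one sample counterpart of the distance measure $J$ in (6).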
Then we have a sample analogue of $J$ as
$$J_T \equiv \frac{1}{T(T-1)h^m}\sum_{t=1}^{T}\sum_{s\neq t}^{T} K_{ts}\varepsilon_t\varepsilon_s = \frac{1}{T(T-1)h^m}\sum_{t=1}^{T}\sum_{s\neq t}^{T} K_{ts}\left[1\{y_t \le Q_\theta(x_t)\} - \theta\right]\left[1\{y_s \le Q_\theta(x_s)\} - \theta\right]. \tag{8}$$
The $\theta$th conditional quantile of $y_t$ given $x_t$, $Q_\theta(x_t)$, can also be estimated by the nonparametric kernel method
$$\hat{Q}_\theta(x_t) = \hat{F}^{-1}_{y|x}(\theta|x_t), \tag{9}$$
where
$$\hat{F}_{y|x}(y_t|x_t) = \frac{\sum_{s\neq t} L_{ts} 1(y_s \le y_t)}{\sum_{s\neq t} L_{ts}} \tag{10}$$
is the Nadaraya–Watson kernel estimator of $F_{y|x}(y_t|x_t)$ with the kernel function $L_{ts} = L\{(x_t - x_s)/a\}$ and the bandwidth parameter $a$. The unknown error $\varepsilon$ can be estimated as


(e) For any $y$ and $x$ satisfying 0 < $F_{y|x}(y|x)$ 0, $F_{y|x}$ and $f_x(x)$ are continuous and bounded in $x$ and $y$; for fixed $y$, the conditional distribution function $F_{y|x}$ and the conditional density function $f_{y|x}$ belong to $A^\infty_3$; $f_{y|x}(Q_\theta(x)|x) > 0$ for all $x$; $f_{y|x}$ is uniformly bounded in $x$ and $y$ by, say, $c_f$.
(f) For some compact set $G$, there are $\varepsilon > 0$ and $\gamma > 0$ such that $f_x \ge \gamma$ for all $x$ in the $\varepsilon$-neighborhood {x|‖x − u‖ 0, $f_{y|x}(y|x) \ge c_0$ for all $x \in G$, v ∈ .
(g) There is an increasing sequence $s_T$ of positive integers such that for some finite $A$,
$$\frac{T}{s_T}\,\beta^{2s_T/(3T)}(s_T) \le A, \qquad 1 \le s_T \le \frac{T}{2} \quad \text{for all } T \ge 1.$$

(A2)
(a) We use product kernels for both $L(\cdot)$ and $K(\cdot)$. Let $l$ and $k$ be their corresponding univariate kernels, which are bounded and symmetric. Then $l(\cdot)$ is nonnegative, $l(\cdot) \in \Upsilon_v$, $k(\cdot)$ is nonnegative, and $k(\cdot) \in \Upsilon_2$.
(b) $h = O(T^{-\alpha'})$ for some 0 0, $\sup_{y \in \phi_{z\rho}} |g(y) - g(z) - G_g(y,z)| / |y - z|^\mu \le D_g(z)$ for all $z$, where $\phi_{z\rho}$ = {y||y − z|


homogeneous polynomial in $y - z$ with coefficients being the partial derivatives of $g$ at $z$ of orders 1 through $d - 1$ when $d > 1$; and $g(z)$, its partial derivatives of order $d - 1$ and less, and $D_g(z)$ have finite $\alpha$th moments.

The functions in $A^\alpha_\mu$ are thus expanded in a Taylor series with a local Lipschitz condition on the remainder, $(\alpha,\mu)$ depending simultaneously on smoothness and moment properties. Bounded functions in Lip($\mu$) (the Lipschitz class of degree $\mu$) for 0 1, $A^\alpha_\mu$ contains the bounded and $(d-1)$-times boundedly differentiable functions whose $(d-1)$th partial derivatives are in Lip($\mu - d + 1$)). In applying $A^\alpha_\mu$ to $f$ and $F$, we take $\alpha = \infty$.

Conditions (A1)(a)–(d) and (A2)(a) and (b) are adopted from conditions (D1) and (D2) of Li (1999), which are used to derive the asymptotic normal distribution of a second-order degenerate U-statistic. Assumption (A1)(a) requires $\{y_t, w_t\}_{t=1}^{T}$ to be a stationary absolutely regular process with geometric decay rate. Assumptions (A1)(b)–(d) are mainly some smoothness and moment conditions; these conditions are quite weak in the sense that they are similar to those used in Fan and Li (1996) for the independent data case. However, for autoregressive conditionally heteroskedastic (ARCH) or generalized autoregressive conditionally heteroskedastic (GARCH) type error processes as considered in Engle (1982) and Bollerslev (1986), the error term $\varepsilon_t$ may not have finite fourth moments in some situations. For example, let $\varepsilon_t|\varepsilon_{t-1} \sim N(0, \alpha_0 + \alpha_1\varepsilon_{t-1}^2)$. Engle (1982) showed that $\varepsilon_t$ does not have a finite fourth moment if $\alpha_1 > 1/\sqrt{3}$.
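The threshold can be checked directly for the Gaussian ARCH(1) example. The following is a short sketch assuming stationarity, with the notation $\sigma^2 = E\varepsilon_t^2 = \alpha_0/(1-\alpha_1)$ and $\mu_4 = E\varepsilon_t^4$ introduced here for illustration. Conditional normality gives $E(\varepsilon_t^4|\varepsilon_{t-1}) = 3(\alpha_0 + \alpha_1\varepsilon_{t-1}^2)^2$, so

$$
\mu_4 = 3\left(\alpha_0^2 + 2\alpha_0\alpha_1\sigma^2 + \alpha_1^2\mu_4\right)
\quad\Longrightarrow\quad
\mu_4 = \frac{3\alpha_0^2(1+\alpha_1)}{(1-\alpha_1)(1-3\alpha_1^2)}.
$$

The right-hand side is finite and positive only when $3\alpha_1^2 < 1$, so no finite fourth moment exists once $\alpha_1 \ge 1/\sqrt{3}$.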
Thus, Assumption (A1)(c) will be violated in such a case.

Assumption (A2)(a) requires $L(\cdot)$ to be a $v$th- ($v \ge 2$) order kernel. This condition together with (A1)(b) ensures that the bias in the kernel estimation (of the null model) is bounded. The requirement that $k$ is a nonnegative second-order kernel function in (A2)(b) is a quite weak and standard assumption.

Conditions (A1)(e)–(g) and (A2)(c) are technical conditions (A1), (A2), (B1), (B2), (C1), and (C2) of Theorem 2.3 of Franke and Mwita (2003), which are required to get the uniform convergence rate of the nonparametric kernel estimator of the conditional distribution function and corresponding conditional quantile with mixing data. Because the simple ARCH models (Engle, 1982; Masry and Tjøstheim, 1995, 1997), their extensions (Diebolt and Guegan, 1993), and the bilinear Markovian models are geometrically strongly mixing under some general ergodicity conditions, Assumption (A1)(g) is usually satisfied. There also exist simple methods to determine the mixing rates for various classes of random processes, for example, Gaussian, Markov, autoregressive moving average, ARCH, and GARCH. Hence the assumption of a known mixing rate is reasonable and has been adopted in many studies, for example, Györfi, Härdle, Sarda, and Vieu (1989), Irle (1997), Meir (2000), Modha and Masry (1998), Roussas (1988), and Yu (1993). Auestad and Tjøstheim (1990) provided excellent discussions on the role of mixing for model identification in nonlinear time series analysis.
But because of the restriction in Assumption (A1)(c) discussed before, ARCH or GARCH type processes may not satisfy all assumptions here. Finally conditions (A2)(d)


$$T h^{m/2} \hat{J}_T / \hat{\sigma}_0 = \sqrt{\frac{T}{T-1}}\; \frac{\sum_{t=1}^{T}\sum_{s\neq t}^{T} K_{ts}\left[1\{y_t \le \hat{Q}_\theta(x_t)\} - \theta\right]\left[1\{y_s \le \hat{Q}_\theta(x_s)\} - \theta\right]}{\sqrt{2}\,\theta(1-\theta)\sqrt{\sum_{t=1}^{T}\sum_{s\neq t}^{T} K_{ts}^2}}.$$

(iii) Under the alternative hypothesis (4),
$$\hat{J}_T \to E\{[F_{y|z}(Q_\theta(x_t)|z_t) - \theta]^2 f_z(z_t)\} > 0$$
in probability.

(iv) Under the local alternatives (A.2) in the Appendix, $T h^{m/2} \hat{J}_T \to N(\mu, \sigma_1^2)$ in distribution, where
$$\mu = E\left[f^2_{y|z}\{Q_\theta(z_t)|z_t\}\, l^2(z_t)\, f_z(z_t)\right], \qquad \sigma_1^2 = 2E\left\{\sigma_v^4(z_t) f_z(z_t)\right\}\int K^2(u)\,du,$$
and $\sigma_v^2(z_t) = E(v_t^2|z_t)$ with $v_t \equiv I\{y_t \le Q_\theta(x_t)\} - F(Q_\theta(x_t)|z_t)$.

Theorem 3.1 generalizes the results of Zheng (1998) for independent data to the weakly dependent data case. A detailed proof of Theorem 3.1 is given in the Appendix. The main difficulty in deriving the asymptotic distribution of the statistic defined in equation (12) is that a nonparametric quantile estimator is included in the indicator function that is not differentiable with respect to the quantile argument and thus prevents taking a Taylor expansion around the true conditional quantile $Q_\theta(x_t)$. To circumvent the problem, Zheng (1998) made use of the work of Sherman (1994) on uniform convergence of U-statistics indexed by parameters.
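The standardized statistic displayed above can be computed along the following lines. This is a minimal sketch rather than the authors' code: it assumes Gaussian product kernels for both $K$ and $L$, inverts the leave-one-out Nadaraya–Watson distribution estimate on the grid of observed $y$ values, and takes the bandwidths `h` and `a` as user-supplied; all function names are hypothetical.

```python
import numpy as np

def product_gauss(u):
    """Gaussian product kernel (an assumption of this sketch); u has shape (..., d)."""
    return np.prod(np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi), axis=-1)

def nw_conditional_quantile(y, x, theta, a):
    """Estimate Q_theta(x_t) by inverting the leave-one-out
    Nadaraya-Watson estimate of F(y | x_t) over the sorted sample of y."""
    T = len(y)
    L = product_gauss((x[:, None, :] - x[None, :, :]) / a)  # L_ts
    np.fill_diagonal(L, 0.0)                                 # drop s == t
    order = np.argsort(y)
    y_sorted = y[order]
    q = np.empty(T)
    for t in range(T):
        w = L[t, order]
        cdf = np.cumsum(w) / w.sum()           # F_hat(y_(k) | x_t)
        k = int(np.searchsorted(cdf, theta))   # first k with cdf[k] >= theta
        q[t] = y_sorted[min(k, T - 1)]
    return q

def standardized_stat(y, x, z, theta, h, a):
    """T h^{m/2} J_hat / sigma_hat_0; the factor h^m cancels in the ratio."""
    T = len(y)
    q_hat = nw_conditional_quantile(y, x, theta, a)
    eps = (y <= q_hat).astype(float) - theta   # estimated errors
    K = product_gauss((z[:, None, :] - z[None, :, :]) / h)  # K_ts
    np.fill_diagonal(K, 0.0)
    num = eps @ K @ eps                        # sum over t != s of K_ts eps_t eps_s
    den = np.sqrt(2.0) * theta * (1.0 - theta) * np.sqrt(np.sum(K**2))
    return float(np.sqrt(T / (T - 1.0)) * num / den)
```

Here `y` has shape `(T,)`, while `x` and `z` are two-dimensional arrays with one row per observation. The returned value would then be compared with a standard normal critical value.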
In this paper, we bound the test statistic by the statistics in which the nonparametric quantile estimator in the indicator function is replaced with sums of the true conditional quantile and upper and lower bounds consistent with the uniform convergence rate of the nonparametric quantile estimator, $1(y_t \le Q_\theta(x_t) - C_T)$ and $1(y_t \le Q_\theta(x_t) + C_T)$.

An important further step is to show that the differences of the ideal test statistic $J_T$ given in equation (8) and the statistics having the indicator functions obtained from the first step stated previously are asymptotically negligible. We may directly show that the second moments of the differences are asymptotically negligible by using the result of Yoshihara (1976) on the bound of moments of U-statistics for absolutely regular processes. However, it is tedious to get bounds on the second moments with dependent data. In the proof we use instead the fact that the differences are second-order degenerate U-statistics. Thus by using the result on the asymptotic normal distribution of the second-order degenerate U-statistic of Fan and Li (1999), we can derive the asymptotic variance that is based on the independent and identically distributed (i.i.d.) sequence having the same marginal distributions as weakly dependent variables in the test statistic. With this little


trick we only need to show that the asymptotic variance is $O(1)$ in an i.i.d. situation. For details refer to the Appendix.

4. SIMULATION

We generate bivariate data $\{y_t, w_t\}_{t=1}^{T}$ according to the following model:
$$y_t = \tfrac{1}{2} y_{t-1} + c\,w_{t-1}^2 + \varepsilon_{1t},$$
$$w_t = 1 + \tfrac{1}{2} w_{t-1} + \varepsilon_{2t},$$
where $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are independent standard normal random variables. Here $c = 0$ corresponds to the hypothetical model; that is, $w_t$ does not cause $y_t$ in the $\theta$-quantile with respect to $\{y_{t-1}, w_{t-1}\}$. All the coefficients are set such that the corresponding time series is stationary and β-mixing with corresponding densities bounded to satisfy the assumptions discussed before. We use different values of $c \in [0,1]$ to investigate the power of the test, such that the higher $c$ is, the stronger the causality of $w_t$ on $y_t$ is. Without loss of generality, we choose $\theta = 0.1, 0.5, 0.9$ and $T = 500$, $1{,}000$, $5{,}000$ here with the bandwidths $h$ and $a$ as in (7) and (10) chosen as for a typical Nadaraya–Watson type estimator. We consider the nominal 0.05 significance level and repeat the test 500 times to generate the power.

Table 1 displays the power performance of the test for different combinations of $T$, $c$, and $\theta$. First, obviously the power is very sensitive to the choice of $T$; that is, the larger $T$ is, for the same $c$ and $\theta$, the larger the power is. From a technical point of view, this makes sense, because the more data we have, the more evidence we can draw from to detect the “causality” effect.
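A data-generating routine for this design might look as follows. It is a sketch only: the burn-in period, the seed handling, and the function name are assumptions not stated in the text, and the intercept of the $w_t$ equation is exposed as a parameter (`w_intercept`, a hypothetical name) so that it can be changed if a different reading of the model is preferred.

```python
import numpy as np

def simulate(T, c, burn=200, w_intercept=1.0, seed=None):
    """Draw {y_t, w_t} from
         y_t = 0.5 * y_{t-1} + c * w_{t-1}**2 + e1_t
         w_t = w_intercept + 0.5 * w_{t-1} + e2_t
    with independent standard normal e1, e2.
    The burn-in (an assumption of this sketch) discards start-up effects."""
    rng = np.random.default_rng(seed)
    n = T + burn
    e1 = rng.standard_normal(n)
    e2 = rng.standard_normal(n)
    y = np.zeros(n)
    w = np.zeros(n)
    for t in range(1, n):
        w[t] = w_intercept + 0.5 * w[t - 1] + e2[t]
        y[t] = 0.5 * y[t - 1] + c * w[t - 1] ** 2 + e1[t]
    return y[burn:], w[burn:]
```

Repeating the draw for a grid of `c` values, computing the standardized test statistic on each replication, and recording the rejection frequency at the nominal level would give empirical power figures of the kind reported in Table 1; `c = 0` gives the empirical size.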
Our asymptotic result, Theorem 3.1, needs the plug-in estimation of the asymptotic covariance matrix that is used to normalize the test statistic. Note that such an estimator is model-dependent and under the alternative is consistent for a different value than the one under the null. As a result, the power deteriorates for small $T$. On the other hand, whether the causality effect exists or not is the nature of the series, which is independent of the sample size used in this technical test. Enhancing the power performance for small-sample data using the simulation-based method deserves further research. Second, as discussed before, the higher $c$ is, the stronger the causality of $w_t$ on $y_t$ is, which is confirmed by the increasing power values. Third, for different quantiles $\theta$, we find that the powers with respect to $\theta = 0.5$ are usually larger than the powers with respect to $\theta = 0.1$ and $0.9$.

5. APPLICATION TO COMMODITY PRICES

In financial and commodity markets, it has been argued that the covariation of the tails may be different from that of the rest of the distribution. The gold market is one of the most important markets in the world, where trading takes place 24 hours a day around the globe and transactions involving billions of dollars of


TABLE 1. Power performance for different combinations of T, c, and θ

T = 500
c       Power (θ = 0.1)   Power (θ = 0.5)   Power (θ = 0.9)
0.00    0.024             0.108             0.010
0.03    0.030             0.288             0.020
0.06    0.058             0.796             0.108
0.09    0.190             0.991             0.585
0.12    0.414             1.000             0.950
0.15    0.696             1.000             0.994
0.18    0.888             1.000             1.000
0.21    0.962             1.000             1.000
0.24    0.988             1.000             1.000
0.27    1.000             1.000             1.000
0.30    1.000             1.000             1.000

T = 1,000
c       Power (θ = 0.1)   Power (θ = 0.5)   Power (θ = 0.9)
0.00    0.014             0.130             0.018
0.01    0.022             0.144             0.024
0.02    0.038             0.296             0.024
0.03    0.026             0.564             0.040
0.04    0.060             0.788             0.108
0.05    0.110             0.946             0.284
0.06    0.196             0.990             0.506
0.07    0.356             1.000             0.838
0.08    0.530             1.000             0.950
0.09    0.676             1.000             0.994
0.10    0.816             1.000             0.996
0.11    0.906             1.000             1.000
0.12    0.958             1.000             1.000
0.13    0.972             1.000             1.000
0.14    0.994             1.000             1.000
0.15    0.998             1.000             1.000
0.16    1.000             1.000             1.000

T = 5,000
c       Power (θ = 0.1)   Power (θ = 0.5)   Power (θ = 0.9)
0.00    0.020             0.116             0.026
0.01    0.028             0.328             0.046
0.02    0.124             0.904             0.142
0.03    0.490             1.000             0.728
0.04    0.924             1.000             0.988
0.05    1.000             1.000             1.000
0.06    1.000             1.000             1.000
0.07    1.000             1.000             1.000
0.08    1.000             1.000             1.000
0.09    1.000             1.000             1.000
0.10    1.000             1.000             1.000


TABLE 2. Unit root tests

Variable   Test type   Time trend term   Test statistics   CR value 5%   Unit root   Unit root after differencing
LN Oil     DF          no                 0.86955          −1.94160      yes         no
           ADF         no                 0.72255          −1.94160      yes         no
           PP          no                 0.73107          −1.94160      yes         no
           KPSS        no                 2.16221           0.14600      yes         no
           DF          include           −0.81819          −2.86386      yes         no
           ADF         include           −1.03287          −2.86386      yes         no
           PP          include           −0.94355          −2.86386      yes         no
           KPSS        include            2.16221           0.14600      yes         no
GBP        DF          no                −0.12461          −1.94160      yes         no
           ADF         no                −0.16186          −1.94160      yes         no
           PP          no                −0.12506          −1.94160      yes         no
           KPSS        no                 5.26720           0.14600      yes         no
           DF          include           −1.53295          −2.86386      yes         no
           ADF         include           −1.51000          −2.86386      yes         no
           PP          include           −1.53853          −2.86386      yes         no
           KPSS        include            5.26720           0.14600      yes         no
LN Gold    DF          no                 0.45931          −1.94160      yes         no
           ADF         no                 1.03139          −1.94160      yes         no
           PP          no                 0.69975          −1.94160      yes         no
           KPSS        no                 3.50910           0.14600      yes         no
           DF          include           −1.98422          −2.86386      yes         no
           ADF         include           −1.36627          −2.86386      yes         no
           PP          include           −1.66336          −2.86386      yes         no
           KPSS        include            3.50910           0.14600      yes         no

Note: “LN Oil”, “GBP”, and “LN Gold” refer to the logarithmic Brent crude oil price, USD/GBP exchange rate, and logarithmic NYMEX spot gold price, respectively. The “Test types” DF, ADF, PP, and KPSS refer to the unit root tests of, respectively, Dickey–Fuller (Fuller, 1976), augmented Dickey–Fuller (Fuller, 1976), Phillips–Perron (Phillips & Perron, 1988), and Kwiatkowski et al. (1992).

gold are carried out each day.
Understanding the mechanism of gold price changes is important for many outstanding issues in international economics and finance. Market participants are increasingly concerned with their exposure to large gold price fluctuations, with special interest in which factors drive the changes. In this section, we apply the quantile causality test to investigate relations between the Brent crude oil, USD/GBP exchange rate, and NYMEX spot gold prices (in USD per barrel and per ounce, respectively). The data, as seen in Figure 1, obtained from Datastream, are daily observations from 20 February 1997 to 17 July 2009 (T = 3,237). We use the USD/GBP instead of USD/EUR because the euro was only introduced as a new currency from 1 January 1999. As indicated by Table 2, we assume differenced logarithmic data are stationary and β-mixing with corresponding densities bounded. Because a long memory effect is not expected, we choose p = q = 1 and m = 2.
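The differencing step mentioned above can be sketched as follows. This is only an illustration of how differenced logarithmic data (log returns) are formed from a price series; the function name `log_returns` and the example prices are hypothetical and are not taken from the Datastream series used in the paper.

```python
import math


def log_returns(prices):
    """Differenced logarithmic series: r_t = ln(P_t) - ln(P_{t-1})."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    logs = [math.log(p) for p in prices]
    # Pair each log price with its successor and take the difference.
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


# Illustrative numbers only (hypothetical, not the paper's data).
example_prices = [100.0, 110.0, 99.0]
example_returns = log_returns(example_prices)
```

A series of T prices yields T − 1 returns, and the returns sum to the log of the ratio of the last price to the first.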


FIGURE 1. Plot of the gold prices, oil price, and exchange rate time series from 20 February 1997 to 17 July 2009.

FIGURE 2. Test statistics with respect to different quantiles for the oil-gold prices causality test.

FIGURE 3. Test statistics with respect to different quantiles for the exchange rate-gold prices causality test.

Figures 2 and 3 present the results of testing whether oil prices Granger cause gold prices and whether the USD/GBP exchange rate Granger causes gold prices at the various quantiles, respectively, where logarithmic returns instead of the raw observations are used. The solid line and dotted line represent the standardized test statistics with respect to different quantiles (x-axis) and the critical value 1.96, respectively. In Figures 2 and 3, because the test statistic exceeds the critical value when 0.22 ≤ θ ≤ 0.80, we conclude that the oil price and exchange rate changes do not cause the gold price change for θ < 0.22 or θ > 0.80, whereas each is a prima facie cause in the 0.22 ≤ θ ≤ 0.80 quantile range. For example, the oil price and USD/GBP exchange rate increases suggest that investors are wary of the U.S. dollar's weakness and inflation. Because gold is typically bought as an alternative to the dollar among safe-haven assets, investors seeking safety from inflation risk and currency devaluation will cause the gold price to rise. However, the extreme low and high changes of the gold market may be caused by speculation. This is consistent with most of the empirical findings in the literature that the codependency may be stronger in the center than in the tails. By combining results from Figures 2 and 3, we find that the oil price and exchange rate changes have significant predictive power for nonextreme gold price changes, but not for extreme changes. This finding could make it possible to use the gold price and GBP to hedge oil price changes more precisely, given a more careful investigation of their relations, which deserves further research.

6. CONCLUSION

By extending the Zheng (1998) idea to dependent data, we propose a consistent test for Granger causality in conditional quantile. The appealing feature of our proposed test is that it can investigate Granger causality in various conditional quantiles. The benefit of this is illustrated in the commodity market application, where the causal relationships among the oil price, USD/GBP exchange rate, and gold price were shown to differ between the tails and the center of the distribution. We also illustrate that oil price and USD/GBP changes have significant predictive power on nonextreme gold price changes.

The test can be extended in a number of ways to test conditional quantile restrictions with dependent data: First, it can be extended to test functional misspecification, or the insignificance of a subset of regressors in the quantile regression function, and second, it can also be used to test some semiparametric versus nonparametric models in quantile regression models.


REFERENCES

Auestad, B. & D. Tjøstheim (1990) Identification of nonlinear time series: First order characterisation and order determination. Biometrika 77, 669–687.
Bollerslev, T. (1986) Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–327.
Bollerslev, T. (2001) Financial econometrics: Past developments and future challenges. Journal of Econometrics 100, 41–51.
Buchinsky, M. (1995) Quantile regression, Box-Cox transformation model, and the U.S. wage structure, 1963–1987. Journal of Econometrics 65, 65–154.
Campbell, J. & J. Cochrane (1999) By force of habit: A consumption-based explanation of aggregate stock market behaviour. Journal of Political Economy 107, 205–251.
Day, J.C. & E.C. Newburger (2002) The big payoff: Educational attainment and synthetic estimates of work-life earnings. Special studies. Current population reports. Statistical report p23–210, U.S. Department of Commerce, U.S. Census Bureau.
Diebolt, J. & D. Guégan (1993) Tail behavior of the stationary density of general nonlinear autoregressive processes of order 1. Journal of Applied Probability 30, 315–329.
Engle, R. (1982) Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987–1007.
Fan, Y. & Q. Li (1996) Consistent model specification tests: Omitted variables, parametric and semiparametric functional forms. Econometrica 64, 865–890.
Fan, Y. & Q. Li (1999) Central limit theorem for degenerate U-statistics of absolutely regular processes with applications to model specification tests. Journal of Nonparametric Statistics 10, 245–271.
Franke, J. & P. Mwita (2003) Nonparametric Estimates for Conditional Quantiles of Time Series. Wirtschaftsmathematik 87, University of Kaiserslautern.
Fuller, W. (1976) Introduction to Statistical Time Series. Wiley.
Granger, C. (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37, 424–438.
Granger, C. (1988) Some recent developments in a concept of causality. Journal of Econometrics 39, 199–211.
Györfi, L., W. Härdle, P. Sarda, & P. Vieu (1989) Nonparametric Curve Estimation from Time Series. Springer-Verlag.
Hall, P. (1984) Central limit theorem for integrated square error of multivariate nonparametric density estimators. Journal of Multivariate Analysis 14, 1–16.
Härdle, W., Y. Ritov, & S. Song (2009) Bootstrap Partial Linear Quantile Regression and Confidence Bands. SFB649 Discussion paper 2010-002, Humboldt Universität zu Berlin.
Härdle, W. & T. Stoker (1989) Investigating smooth multiple regression by the method of average derivatives. Journal of the American Statistical Association 84, 986–995.
Hong, Y., Y. Liu, & S. Wang (2009) Granger causality in risk and detection of extreme risk spillover between financial markets. Journal of Econometrics 150, 271–287.
Hsiao, C. & Q. Li (2001) A consistent test for conditional heteroskedasticity in time-series regression models. Econometric Theory 17, 188–221.
Irle, A. (1997) On the consistency in nonparametric estimation under mixing assumptions. Multivariate Analysis 60, 123–147.
Kwiatkowski, D., P.C.B. Phillips, P. Schmidt, & Y. Shin (1992) Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics 54, 159–178.
Lee, T. & W. Yang (2007) Money-Income Granger-Causality in Quantiles. Manuscript, University of California, Riverside.
Li, Q. (1999) Consistent model specification tests for time series econometric models. Journal of Econometrics 92, 101–147.
Li, Q. & S. Wang (1998) A simple consistent bootstrap test for a parametric regression functional form. Journal of Econometrics 87, 145–165.


Masry, E. & D. Tjøstheim (1995) Nonparametric estimation and identification of nonlinear ARCH time series: Strong convergence and asymptotic normality. Econometric Theory 11, 258–289.
Masry, E. & D. Tjøstheim (1997) Additive nonlinear ARX time series and projection estimates. Econometric Theory 13, 214–252.
Meir, R. (2000) Nonparametric time series prediction through adaptive model selection. Machine Learning 39, 5–34.
Modha, D. & E. Masry (1998) Memory-universal prediction of stationary random processes. IEEE Transactions on Information Theory 44, 117–133.
Phillips, P.C.B. & P. Perron (1988) Testing for a unit root in time series regression. Biometrika 75, 335–346.
Powell, J., J. Stock, & T. Stoker (1989) Semiparametric estimation of index coefficients. Econometrica 57, 1403–1430.
Robinson, P.M. (1988) Root-n-consistent semiparametric regression. Econometrica 56, 931–954.
Roussas, G.G. (1988) Nonparametric estimation in mixing sequences of random variables. Journal of Statistical Planning and Inference 18, 135–149.
Sherman, R. (1994) Maximal inequalities for degenerate U-processes with applications to optimization estimators. Annals of Statistics 22, 439–459.
Yoshihara, K. (1976) Limiting behavior of U-statistics for stationary absolutely regular processes. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 35, 237–252.
Yu, B. (1993) Density estimation in the L1 norm for dependent data with applications. Annals of Statistics 21, 711–735.
Zheng, J. (1998) A consistent nonparametric test of parametric regression models under conditional quantile restrictions. Econometric Theory 14, 123–138.

APPENDIX

Proof of Theorem 3.1(i). In the proof, we use several approximations to $\hat{J}_T$. We define them now and recall a few already defined statistics for convenience of reference.

\[ \hat{J}_T \equiv \frac{1}{T(T-1)h^m} \sum_{t=1}^{T} \sum_{s \neq t} K_{ts}\, \hat{\varepsilon}_t \hat{\varepsilon}_s, \tag{A.1} \]
\[ J_T \equiv \frac{1}{T(T-1)h^m} \sum_{t=1}^{T} \sum_{s \neq t} K_{ts}\, \varepsilon_t \varepsilon_s, \tag{A.2} \]
\[ J_{TU} \equiv \frac{1}{T(T-1)h^m} \sum_{t=1}^{T} \sum_{s \neq t} K_{ts}\, \varepsilon_{tU} \varepsilon_{sU}, \tag{A.3} \]
\[ J_{TL} \equiv \frac{1}{T(T-1)h^m} \sum_{t=1}^{T} \sum_{s \neq t} K_{ts}\, \varepsilon_{tL} \varepsilon_{sL}, \tag{A.4} \]

where
\[ \hat{\varepsilon}_t = I\{y_t \le \hat{Q}_\theta(x_t)\} - \theta, \quad \varepsilon_t = I\{y_t \le Q_\theta(x_t)\} - \theta, \quad \varepsilon_{tU} = I\{y_t + C_T \le Q_\theta(x_t)\} - \theta, \quad \varepsilon_{tL} = I\{y_t - C_T \le Q_\theta(x_t)\} - \theta, \]
and $C_T$ is an upper bound consistent with the uniform convergence rate of the nonparametric estimator of conditional quantile given in equation (13). The proof of Theorem 3.1(i) consists of three steps.
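Before the proof proceeds, the statistic in (A.1) can be made concrete with a small sketch. The code below computes $\hat{J}_T$, the variance estimate $\hat{\sigma}_0^2 = 2\theta^2(1-\theta)^2 \{T(T-1)h^m\}^{-1} \sum_{s \neq t} K_{ts}^2$ used in the proof of Theorem 3.1(ii), and the standardized value $T h^{m/2} \hat{J}_T / \hat{\sigma}_0$. Several things here are assumptions made for illustration and are not specified in this appendix: the product Gaussian kernel, the function names, and the toy inputs. The fitted quantiles `q_hat` stand in for $\hat{Q}_\theta(x_t)$ and are taken as given.

```python
import math


def gaussian_product_kernel(u):
    """Product of standard normal densities (an assumed kernel choice)."""
    value = 1.0
    for component in u:
        value *= math.exp(-0.5 * component * component) / math.sqrt(2.0 * math.pi)
    return value


def quantile_causality_statistic(y, q_hat, z, theta, h):
    """Return (J_hat, sigma0_sq_hat, standardized statistic).

    y      : observed responses y_t
    q_hat  : fitted conditional quantiles under the null (assumed given)
    z      : list of m-dimensional conditioning vectors z_t
    theta  : quantile level in (0, 1)
    h      : bandwidth
    """
    T = len(y)
    m = len(z[0])
    # eps_hat_t = I{y_t <= Q_hat_theta(x_t)} - theta, as defined after (A.4).
    eps = [(1.0 if y[t] <= q_hat[t] else 0.0) - theta for t in range(T)]
    norm = T * (T - 1) * h ** m
    j_sum = 0.0
    k_sq_sum = 0.0
    for t in range(T):
        for s in range(T):
            if s == t:
                continue  # the double sum excludes s = t
            u = [(a - b) / h for a, b in zip(z[t], z[s])]
            k_ts = gaussian_product_kernel(u)
            j_sum += k_ts * eps[t] * eps[s]
            k_sq_sum += k_ts * k_ts
    j_hat = j_sum / norm
    sigma0_sq = 2.0 * theta ** 2 * (1.0 - theta) ** 2 * k_sq_sum / norm
    stat = T * h ** (m / 2.0) * j_hat / math.sqrt(sigma0_sq)
    return j_hat, sigma0_sq, stat


# Toy inputs (hypothetical): three observations with identical z_t.
j_hat, sigma0_sq, stat = quantile_causality_statistic(
    y=[0.0, 0.0, 1.0], q_hat=[0.5, 0.5, 0.5],
    z=[(0.0,), (0.0,), (0.0,)], theta=0.5, h=1.0,
)
```

The double loop is quadratic in T, which is fine for a sketch; for samples of the size used in the application a vectorized implementation would be preferable.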


Step 1. Asymptotic normality.
\[ T h^{m/2} J_T \to N(0, \sigma_0^2), \tag{A.5} \]
where $\sigma_0^2 = 2E\{\theta^2 (1-\theta)^2 f_z(z_t)\} \{\int K^2(u)\,du\}$ under the null.

Step 2. Conditional asymptotic equivalence. Suppose that both $T h^{m/2}(J_T - J_{TU}) = o_p(1)$ and $T h^{m/2}(J_T - J_{TL}) = o_p(1)$. Then
\[ T h^{m/2}(\hat{J}_T - J_T) = o_p(1). \tag{A.6} \]

Step 3. Asymptotic equivalence.
\[ T h^{m/2}(J_T - J_{TU}) = o_p(1) \quad \text{and} \quad T h^{m/2}(J_T - J_{TL}) = o_p(1). \tag{A.7} \]

The combination of steps 1–3 yields Theorem 3.1(i).

Proof of Step 1. Because $J_T$ is a degenerate U-statistic of order 2, the result follows from Lemma 3.2.

Proof of Step 2. The proof of step 2 is motivated by the technique of Härdle and Stoker (1989) for treating a trimming indicator function asymptotically. Suppose that the following two statements hold:
\[ T h^{m/2}(J_T - J_{TU}) = o_p(1) \quad \text{and} \tag{A.8} \]
\[ T h^{m/2}(J_T - J_{TL}) = o_p(1). \tag{A.9} \]
Use $C_T$ to denote an upper bound consistent with the uniform convergence rate of the nonparametric estimator of conditional quantile given in equation (13). Suppose that
\[ \sup |\hat{Q}_\theta(x) - Q_\theta(x)| \le C_T. \tag{A.10} \]
If inequality (A.10) holds, then the following statements also hold:
\[ \{Q_\theta(x) > y_t + C_T\} \subset \{\hat{Q}_\theta(x) > y_t\} \subset \{Q_\theta(x) > y_t - C_T\}, \tag{A.11} \]
\[ 1(Q_\theta(x) > y_t + C_T) \le 1(\hat{Q}_\theta(x) > y_t) \le 1(Q_\theta(x) > y_t - C_T), \tag{A.12} \]
\[ J_{TU} \le \hat{J}_T \le J_{TL}, \tag{A.13} \]
\[ |T h^{m/2}(J_T - \hat{J}_T)| \le \max\{|T h^{m/2}(J_T - J_{TU})|,\ |T h^{m/2}(J_T - J_{TL})|\}. \tag{A.14} \]
Using (A.10) and (A.14), we have the following inequality:
\[ P\Big\{ |T h^{m/2}(J_T - \hat{J}_T)| > \delta \,\Big|\, \sup |\hat{Q}_\theta(x) - Q_\theta(x)| \le C_T \Big\} \le P\Big\{ \max\{|T h^{m/2}(J_T - J_{TU})|,\ |T h^{m/2}(J_T - J_{TL})|\} > \delta \,\Big|\, \sup |\hat{Q}_\theta(x) - Q_\theta(x)| \le C_T \Big\} \quad \text{for all } \delta > 0. \tag{A.15} \]


Invoking Lemma 3.1 and condition (A2)(c), we have
\[ P\Big\{ \sup |\hat{Q}_\theta(x) - Q_\theta(x)| \le C_T \Big\} \to 1 \quad \text{as } T \to \infty. \tag{A.16} \]
By (A.8) and (A.9), as $T \to \infty$, we have
\[ P\Big\{ \max\{|T h^{m/2}(J_T - J_{TU})|,\ |T h^{m/2}(J_T - J_{TL})|\} > \delta \Big\} \to 0 \quad \text{for all } \delta > 0. \tag{A.17} \]
Therefore, as $T \to \infty$,
\[ \text{the right-hand side of the inequality (A.15)} \times P\Big\{ \sup |\hat{Q}_\theta(x) - Q_\theta(x)| \le C_T \Big\} \to 0; \]
\[ \text{the left-hand side of the inequality (A.15)} \times P\Big\{ \sup |\hat{Q}_\theta(x) - Q_\theta(x)| \le C_T \Big\} = P\Big\{ |T h^{m/2}(J_T - \hat{J}_T)| > \delta \Big\} \to 0. \]
In summary, we have that if both $T h^{m/2}(J_T - J_{TU}) = o_p(1)$ and $T h^{m/2}(J_T - J_{TL}) = o_p(1)$, then $T h^{m/2}(\hat{J}_T - J_T) = o_p(1)$.

Proof of Step 3. In the remaining proof, we focus on showing that $T h^{m/2}(J_T - J_{TU}) = o_p(1)$, with the proof of $T h^{m/2}(J_T - J_{TL}) = o_p(1)$ being treated similarly. The proof of step 3 closely follows the proof in Zheng (1998). Denote
\[ H_T(s,t,g) \equiv K_{ts} \{1(y_t \le g(x_t)) - \theta\} \{1(y_s \le g(x_s)) - \theta\} \quad \text{and} \tag{A.18} \]
\[ J[g] \equiv \frac{1}{T(T-1)h^m} \sum_{t=1}^{T} \sum_{s \neq t} H_T(s,t,g). \tag{A.19} \]
Then we have $J_T \equiv J[Q_\theta]$ and $J_{TU} \equiv J[Q_\theta - C_T]$. We decompose $H_T(s,t,g)$ into three parts:
\[ \begin{aligned} H_T(s,t,g) &= K_{ts} \{1(y_t \le g(x_t)) - F(g(x_t)|z_t)\} \{1(y_s \le g(x_s)) - F(g(x_s)|z_s)\} \\ &\quad + 2 \times K_{ts} \{1(y_t \le g(x_t)) - F(g(x_t)|z_t)\} \{F(g(x_s)|z_s) - \theta\} \\ &\quad + K_{ts} \{F(g(x_t)|z_t) - \theta\} \{F(g(x_s)|z_s) - \theta\} \\ &= H_{1T}(s,t,g) + 2 H_{2T}(s,t,g) + H_{3T}(s,t,g). \end{aligned} \tag{A.20} \]
Then let $J_j[g] = \frac{1}{T(T-1)h^m} \sum_{t=1}^{T} \sum_{s \neq t} H_{jT}(s,t,g)$, $j = 1,2,3$. We will treat $J_j[Q_\theta] - J_j[Q_\theta - C_T]$ for $j = 1,2,3$ separately.

(1) $T h^{m/2} [J_1(Q_\theta) - J_1(Q_\theta - C_T)] = o_p(1)$. 
By simple manipulation, we have
\[ \begin{aligned} J_1(Q_\theta) - J_1(Q_\theta - C_T) &= \frac{1}{T(T-1)h^m} \sum_{t=1}^{T} \sum_{s \neq t} \big[ H_{1T}(s,t,Q_\theta) - H_{1T}(s,t,Q_\theta - C_T) \big] \\ &= \frac{1}{T(T-1)h^m} \sum_{t=1}^{T} \sum_{s \neq t} K_{ts} \Big\{ [1(y_t \le Q_\theta(x_t)) - F(Q_\theta(x_t)|z_t)] \times [1(y_s \le Q_\theta(x_s)) - F(Q_\theta(x_s)|z_s)] \\ &\qquad - [1(y_t \le Q_\theta(x_t) - C_T) - F(Q_\theta(x_t) - C_T|z_t)] \times [1(y_s \le Q_\theta(x_s) - C_T) - F(Q_\theta(x_s) - C_T|z_s)] \Big\}. \end{aligned} \tag{A.21} \]
To avoid tedious work to get bounds on the second moment of $J_1(Q_\theta) - J_1(Q_\theta - C_T)$ with dependent data, we note that the right-hand side of (A.21) is a degenerate U-statistic of order 2. Thus we can apply Lemma 3.2 and have
\[ T h^{m/2} [J_1(Q_\theta) - J_1(Q_\theta - C_T)] \to N(0, \sigma_2^2) \quad \text{in distribution}, \tag{A.22} \]
where the definition of the asymptotic variance $\sigma_2^2$ is based on the i.i.d. sequence having the same marginal distributions as the weakly dependent variables in (A.21). That is,
\[ \sigma_2^2 = 2 h^{-m} \tilde{E} \big[ H_{1T}(s,t,Q_\theta) - H_{1T}(s,t,Q_\theta - C_T) \big]^2, \]
where the notation $\tilde{E}$ is an expectation evaluated at an i.i.d. sequence having the same marginal distribution as the mixing sequences in (A.21) (Fan and Li, 1999, p. 248). Now, to show that $T h^{m/2} [J_1(Q_\theta) - J_1(Q_\theta - C_T)] = o_p(1)$, we only need to show that the asymptotic variance $\sigma_2^2$ is $o(1)$ with i.i.d. data. 
Use $\Lambda_T$ to denote an upper bound consistent with the integral over $K_{ts}$ being of the order $O(h^m)$. We have
\[ \begin{aligned} \tilde{E} \big[ H_{1T}(s,t,Q_\theta) - H_{1T}(s,t,Q_\theta - C_T) \big]^2 &\le \Lambda_T \tilde{E} \big\{ [1_t(Q_\theta) - F_t(Q_\theta)][1_s(Q_\theta) - F_s(Q_\theta)] \\ &\qquad - [1_t(Q_\theta - C_T) - F_t(Q_\theta - C_T)][1_s(Q_\theta - C_T) - F_s(Q_\theta - C_T)] \big\}^2 \\ &\le \Lambda_T \Big[ \tilde{E} \{ F_t(Q_\theta)[1 - F_t(Q_\theta)]\, F_s(Q_\theta)[1 - F_s(Q_\theta)] \} \\ &\qquad + \tilde{E} \{ F_t(Q_\theta - C_T)[1 - F_t(Q_\theta - C_T)]\, F_s(Q_\theta - C_T)[1 - F_s(Q_\theta - C_T)] \} \\ &\qquad - 2 \tilde{E} \{ [F_t(\min(Q_\theta, Q_\theta - C_T)) - F_t(Q_\theta) F_t(Q_\theta - C_T)] \\ &\qquad\qquad \times [F_s(\min(Q_\theta, Q_\theta - C_T)) - F_s(Q_\theta) F_s(Q_\theta - C_T)] \} \Big] \\ &= \Lambda_T \tilde{E} \{ [F_t(Q_\theta) - F_t(Q_\theta) F_t(Q_\theta)][F_s(Q_\theta) - F_s(Q_\theta) F_s(Q_\theta)] \} \\ &\quad - \Lambda_T \tilde{E} \{ [F_t(\min(Q_\theta, Q_\theta - C_T)) - F_t(Q_\theta) F_t(Q_\theta - C_T)] \\ &\qquad\qquad \times [F_s(\min(Q_\theta, Q_\theta - C_T)) - F_s(Q_\theta) F_s(Q_\theta - C_T)] \} \\ &\quad + \Lambda_T \tilde{E} \{ [F_t(Q_\theta - C_T) - F_t(Q_\theta - C_T) F_t(Q_\theta - C_T)] \\ &\qquad\qquad \times [F_s(Q_\theta - C_T) - F_s(Q_\theta - C_T) F_s(Q_\theta - C_T)] \} \\ &\quad - \Lambda_T \tilde{E} \{ [F_t(\min(Q_\theta, Q_\theta - C_T)) - F_t(Q_\theta) F_t(Q_\theta - C_T)] \\ &\qquad\qquad \times [F_s(\min(Q_\theta, Q_\theta - C_T)) - F_s(Q_\theta) F_s(Q_\theta - C_T)] \} \\ &\le \Lambda_T C_T. \end{aligned} \tag{A.23} \]
Thus we have that $\sigma_2^2 = O(C_T) = o(1)$, and so
\[ T h^{m/2} [J_1(Q_\theta) - J_1(Q_\theta - C_T)] = o_p(1). \tag{A.24} \]


(2) $T h^{m/2} [J_2(Q_\theta) - J_2(Q_\theta - C_T)] = o_p(1)$. Note that $H_{2T}(s,t,Q_\theta) = 0$ because of $F_{y|z}(Q_\theta(x_s)|z_s) - \theta = 0$. Then we have
\[ \begin{aligned} J_2(Q_\theta) - J_2(Q_\theta - C_T) &= -J_2(Q_\theta - C_T) \\ &= -\frac{1}{T(T-1)} \sum_{t=1}^{T} \sum_{s \neq t} \frac{1}{h^m} K\!\left( \frac{z_t - z_s}{h} \right) \\ &\qquad \times \{1(y_t \le Q_\theta(x_t) - C_T) - F_{y|z}(Q_\theta(x_t) - C_T|z_t)\} \\ &\qquad \times \{F_{y|z}(Q_\theta(x_s) - C_T|z_s) - \theta\}. \end{aligned} \tag{A.25} \]
By taking a Taylor expansion of $F_{y|z}(Q_\theta(x_s) - C_T|z_s)$ around $Q_\theta(x_s)$, it equals
\[ -\frac{1}{T(T-1)} \sum_{t=1}^{T} \sum_{s \neq t} \frac{1}{h^m} K\!\left( \frac{z_t - z_s}{h} \right) \times \{1(y_t \le Q_\theta(x_t) - C_T) - F_{y|z}(Q_\theta(x_t) - C_T|z_t)\} \times (-C_T) f_{y|z}(\bar{Q}_\theta(x_s)|z_s), \tag{A.26} \]
where $\bar{Q}_\theta$ is between $Q_\theta$ and $Q_\theta - C_T$. Thus we have
\[ \begin{aligned} (J_2(Q_\theta) - J_2(Q_\theta - C_T))^2 &\le \Lambda^2 C_T^2 \left[ \frac{1}{T(T-1)} \sum_{t=1}^{T} \sum_{s \neq t} \frac{1}{h^m} K\!\left( \frac{z_t - z_s}{h} \right) \times \{1(y_t \le Q_\theta(x_t) - C_T) - F_{y|z}(Q_\theta(x_t) - C_T|z_t)\} \right]^2 \\ &= \Lambda^2 C_T^2 \left[ \frac{1}{T} \sum_{t=1}^{T} \{1(y_t \le Q_\theta(x_t) - C_T) - F_{y|z}(Q_\theta(x_t) - C_T|z_t)\} \hat{f}_z(z_t) \right]^2 \\ &\equiv \Lambda^2 C_T^2 \left\{ \frac{1}{T} \sum_{t=1}^{T} u_t \hat{f}_z(z_t) \right\}^2 \\ &= \Lambda^2 C_T^2 T^{-2} \sum_{t=1}^{T} u_t^2 \hat{f}_z^2(z_t) + \Lambda^2 C_T^2 T^{-2} \sum_{t=1}^{T} \sum_{s \neq t} u_t u_s \hat{f}_z(z_t) \hat{f}_z(z_s) \\ &\equiv J_{21} + J_{22}, \end{aligned} \tag{A.27} \]
where the inequality holds because of Assumption (A.1)(e).
\[ E|J_{21}| = \Lambda^2 C_T^2 T^{-1} \left[ T^{-1} \sum_{t=1}^{T} E\{ u_t^2 \hat{f}_z^2(z_t) \} \right] = O\big( C_T^2 T^{-2} h^{-m} \big), \tag{A.28} \]
where the second equality is derived by using Lemma C.3(iii) of Li (1999).


\[ \begin{aligned} J_{22} &= \Lambda^2 C_T^2 \Bigg[ T^{-2} \sum_{t=1}^{T} \sum_{s \neq t} u_t u_s f_z(z_t) f_z(z_s) + 2 T^{-2} \sum_{t=1}^{T} \sum_{s \neq t} u_t u_s f_z(z_t) \{ \hat{f}_z(z_s) - f_z(z_s) \} \\ &\qquad + T^{-2} \sum_{t=1}^{T} \sum_{s \neq t} u_t u_s \{ \hat{f}_z(z_t) - f_z(z_t) \} \{ \hat{f}_z(z_s) - f_z(z_s) \} \Bigg] \\ &\equiv \Lambda^2 C_T^2 (J_{221} + J_{222} + J_{223}). \end{aligned} \tag{A.29} \]
Following the line of the proof of Lemma A.2(i) of Li (1999), we have that $J_{221} = O_p(T^{-2})$, $J_{222} = O_p(T^{-1})$, and $J_{223} = O_p(T^{-1})$; thus
\[ J_{22} = O_p\big( C_T^2 T^{-1} \big). \tag{A.30} \]
Thus, combining (A.28) and (A.30), we have
\[ T h^{m/2} [J_2(Q_\theta) - J_2(Q_\theta - C_T)] = O_p(C_T) + O_p\big( C_T T^{1/2} h^{m/2} \big) = o_p(1). \tag{A.31} \]

(3) $T h^{m/2} [J_3(Q_\theta) - J_3(Q_\theta - C_T)] = o_p(1)$. Noting that $H_{3T}(s,t,Q_\theta) = 0$ because of $F(Q_\theta(x_j)|z_j) - \theta = 0$ for $j = t,s$, we have
\[ \begin{aligned} J_3(Q_\theta) - J_3(Q_\theta - C_T) &= -\frac{1}{T(T-1)} \sum_{t=1}^{T} \sum_{s \neq t} \frac{1}{h^m} K\!\left( \frac{z_t - z_s}{h} \right) \times \{F(Q_\theta(x_t) - C_T|z_t) - \theta\} \{F(Q_\theta(x_s) - C_T|z_s) - \theta\} \\ &= \frac{1}{T(T-1)} \sum_{t=1}^{T} \sum_{s \neq t} \frac{1}{h^m} K\!\left( \frac{z_t - z_s}{h} \right) C_T^2 f_{y|z}(\bar{Q}_\theta(x_t)|z_t) f_{y|z}(\bar{Q}_\theta(x_s)|z_s) \\ &= C_T^2 \frac{1}{T} \sum_{t=1}^{T} f_{y|z}(\bar{Q}_\theta(x_t)|z_t) f_{y|z}(\bar{Q}_\theta(x_s)|z_s) \hat{f}_z(z_t). \end{aligned} \tag{A.32} \]
Thus, we have
\[ \begin{aligned} E|J_3(Q_\theta) - J_3(Q_\theta - C_T)| &\le \Lambda^2 C_T^2 \frac{1}{T} \sum_{t=1}^{T} E\big| \hat{f}_z(z_t) \big| \\ &\le \Lambda^2 C_T^2 \frac{1}{T} \sum_{t=1}^{T} E| f_z(z_t) | + \Lambda^2 C_T^2 \frac{1}{T} \sum_{t=1}^{T} E\big| \hat{f}_z(z_t) - f_z(z_t) \big| \\ &= O\big( C_T^2 \big). \end{aligned} \tag{A.33} \]


Finally, we have
\[ T h^{m/2} [J_3(Q_\theta) - J_3(Q_\theta - C_T)] = O_p\big( T h^{m/2} C_T^2 \big) = o_p(1). \tag{A.34} \]
By combining (A.24), (A.31), and (A.34), we have the result of step 3.

Proof of Theorem 3.1(ii). Because
\[ \sigma_0^2 = 2\theta^2 (1-\theta)^2 E\{f_z(z_t)\} \int K^2(u)\,du \quad \text{and} \quad \hat{\sigma}_0^2 \equiv 2\theta^2 (1-\theta)^2 \frac{1}{T(T-1)h^m} \sum_{s \neq t} K_{ts}^2, \]
it is enough to show that
\[ \sigma_T^2 \equiv \frac{1}{T(T-1)h^m} \sum_{s \neq t} K_{ts}^2 = E\{f_z(z_t)\} \int K^2(u)\,du + o_p(1). \tag{A.35} \]
Note that $\sigma_T^2$ is a nondegenerate U-statistic of order 2 with kernel
\[ H_T(z_t, z_s) = \frac{1}{h^m} K^2\!\left( \frac{z_t - z_s}{h} \right). \tag{A.36} \]
Because Assumptions (A2)(d) and (e) satisfy the conditions of Lemma 3.2 of Yoshihara (1976) on the asymptotic equivalence of the U-statistic and its projection under β-mixing, we have for $\gamma = 2(\delta - \delta')/\{\delta'(2 + \delta)\} > 0$
\[ \begin{aligned} \sigma_T^2 &\equiv \frac{1}{T(T-1)} \sum_{s \neq t} H_T(z_t, z_s) \\ &= \int\!\!\int H_T(z_1, z_2)\,dF_z(z_1)\,dF_z(z_2) + 2T^{-1} \sum_{t=1}^{T} \left[ \int H_T(z_t, z_2)\,dF_z(z_2) - \int\!\!\int H_T(z_1, z_2)\,dF_z(z_1)\,dF_z(z_2) \right] + O_p(T^{-1-\gamma}) \\ &= \int\!\!\int H_T(z_1, z_2)\,dF_z(z_1)\,dF_z(z_2) + o_p(1) \\ &= \int\!\!\int \frac{1}{h^m} K^2\!\left( \frac{z_1 - z_2}{h} \right) dF_z(z_1)\,dF_z(z_2) + o_p(1) \\ &= \int K^2(u)\,du \int f_z^2(z)\,dz + o_p(1). \end{aligned} \tag{A.37} \]
The result of Theorem 3.1(ii) follows from (A.37).

Proof of Theorem 3.1(iii). The proof of Theorem 3.1(iii) consists of two steps.


Step 1. Show that $\hat{J}_T = J_T + o_p(1)$ under the alternative hypothesis (4).

Step 2. Show that $J_T = J + o_p(1)$ under the alternative hypothesis (4), where $J = E\{[F_{y|z}(Q_\theta(x_t)|z_t) - \theta]^2 f_z(z_t)\}$.

The combination of steps 1 and 2 yields Theorem 3.1(iii).

Proof of Step 1. We note that the results of step 2 and $T h^{m/2} [J_1(Q_\theta) - J_1(Q_\theta - C_T)] = o_p(1)$ of step 3 in the proof of Theorem 3.1(i) still hold under the alternative hypothesis (4). Thus we focus on showing that $J_2(Q_\theta) - J_2(Q_\theta - C_T) = o_p(1)$ and $J_3(Q_\theta) - J_3(Q_\theta - C_T) = o_p(1)$.

We begin with showing that $J_2(Q_\theta) - J_2(Q_\theta - C_T) = o_p(1)$. By the same procedures as in (A.27), we can show that $J_2(Q_\theta - C_T) = O_p(T^{-1} h^{-m/2})$. Thus it remains to show that $J_2(Q_\theta) = o_p(1)$. By taking a Taylor expansion of $F_{y|z}(Q_\theta(x_s)|z_s)$ around $Q_\theta(x_s)$, we have
\[ \begin{aligned} J_2(Q_\theta) &= -\frac{1}{T(T-1)} \sum_{t=1}^{T} \sum_{s \neq t} \frac{1}{h^m} K\!\left( \frac{z_t - z_s}{h} \right) \times \{1(y_t \le Q_\theta(x_t)) - F_{y|z}(Q_\theta(x_t)|z_t)\} \times f_{y|z}(\bar{Q}_\theta(x_s)|z_s) \\ &= \frac{1}{T} \sum_{t=1}^{T} \{1(y_t \le Q_\theta(x_t)) - F_{y|z}(Q_\theta(x_t))\} f_{y|z}(\bar{Q}_\theta(x_s)|z_s) \hat{f}_z(z_t) \\ &\equiv \frac{1}{T} \sum_{t=1}^{T} u_t f_{y|z}(\bar{Q}_\theta(x_s)|z_s) \hat{f}_z(z_t). \end{aligned} \tag{A.38} \]
By similar arguments as in (A.26) and (A.31), we have
\[ J_2(Q_\theta) = O\big( T^{-1} h^{-m} \big). \tag{A.39} \]
Next, we show that $T h^{m/2} [J_3(Q_\theta) - J_3(Q_\theta - C_T)] = o_p(1)$ under the alternative hypothesis (4). Because $F(Q_\theta(x_j)|z_j) - \theta \neq 0$ for $j = t,s$ under the alternative hypothesis, we have
\[ \begin{aligned} J_3(Q_\theta) - J_3(Q_\theta - C_T) &= \frac{1}{T(T-1)} \sum_{t=1}^{T} \sum_{s \neq t} \frac{1}{h^m} K\!\left( \frac{z_t - z_s}{h} \right) \{F(Q_\theta(x_t)|z_t) - \theta\} \{F(Q_\theta(x_s)|z_s) - \theta\} \\ &\quad - \frac{1}{T(T-1)} \sum_{t=1}^{T} \sum_{s \neq t} \frac{1}{h^m} K\!\left( \frac{z_t - z_s}{h} \right) \times \{F(Q_\theta(x_t) - C_T|z_t) - \theta\} \{F(Q_\theta(x_s) - C_T|z_s) - \theta\} \\ &= \frac{1}{T} \sum_{t=1}^{T} \{F(Q_\theta(x_t)|z_t) - \theta\} \{F(Q_\theta(x_s)|z_s) - \theta\} \hat{f}_z(z_t) \\ &\quad - \frac{1}{T} \sum_{t=1}^{T} \{F(Q_\theta(x_t) - C_T|z_t) - \theta\} \{F(Q_\theta(x_s) - C_T|z_s) - \theta\} \hat{f}_z(z_t). \end{aligned} \tag{A.40} \]


By taking a Taylor expansion of $F_{y|z}(Q_\theta(x_j) - C_T|z_j)$ around $Q_\theta(z_j)$ for $j = t,s$, we have
\[ \begin{aligned} J_3(Q_\theta) - J_3(Q_\theta - C_T) &= \frac{1}{T} \sum_{t=1}^{T} \{F(Q_\theta(x_t)|z_t) - \theta\}\, C_T f_{y|z}(\bar{Q}_\theta(x_t)|z_t) \hat{f}_z(z_t) \\ &\quad + \frac{1}{T} \sum_{t=1}^{T} C_T f_{y|z}(\bar{Q}_\theta(x_t)|z_t) \{F(Q_\theta(x_s)|z_s) - \theta\} \hat{f}_z(z_t) \\ &\quad - \frac{1}{T} \sum_{t=1}^{T} C_T^2 f_{y|z}(\bar{Q}_\theta(x_t)|z_t) f_{y|z}(\bar{Q}_\theta(x_s)|z_s) \hat{f}_z(z_t). \end{aligned} \tag{A.41} \]
We further take a Taylor expansion of $F_{y|z}(Q_\theta(x_j)|z_j)$ around $Q_\theta(z_j)$ for $j = t,s$ and have
\[ \begin{aligned} J_3(Q_\theta) - J_3(Q_\theta - C_T) &= \frac{1}{T} \sum_{t=1}^{T} f_{y|z}(\bar{Q}_\theta(x_t, z_t)|z_t)\, C_T f_{y|z}(\bar{Q}_\theta(x_s)|z_s) \hat{f}_z(z_t) \\ &\quad + \frac{1}{T} \sum_{t=1}^{T} C_T f_{y|z}(\bar{Q}_\theta(x_t)|z_t) f_{y|z}(\bar{Q}_\theta(x_s, z_s)|z_s) \hat{f}_z(z_t) \\ &\quad - \frac{1}{T} \sum_{t=1}^{T} C_T^2 f_{y|z}(\bar{Q}_\theta(x_t)|z_t) f_{y|z}(\bar{Q}_\theta(x_s)|z_s) \hat{f}_z(z_t), \end{aligned} \tag{A.42} \]
where $\bar{Q}_\theta(x_s, z_s)$ is between $Q_\theta(x_s)$ and $Q_\theta(z_s)$. Then by using the same procedures as in (A.30), we have
\[ J_3(Q_\theta) - J_3(Q_\theta - C_T) = O(C_T). \tag{A.43} \]
Now we have the result of step 1 for the proof of Theorem 3.1(iii).

Proof of Step 2. Using (7) and the uniform convergence rate of the kernel regression estimator under a β-mixing process, we have
\[ \begin{aligned} J_T &= \frac{1}{T(T-1)h^m} \sum_{t=1}^{T} \sum_{s \neq t} K_{ts} \varepsilon_t \varepsilon_s \\ &= \frac{1}{T} \sum_{t=1}^{T} \hat{E}(\varepsilon_t|z_t) \hat{f}_z(z_t) \varepsilon_t \\ &= \frac{1}{T} \sum_{t=1}^{T} E(\varepsilon_t|z_t) f_z(z_t) \varepsilon_t + \frac{1}{T} \sum_{t=1}^{T} \big\{ \hat{E}(\varepsilon_t|z_t) \hat{f}_z(z_t) - E(\varepsilon_t|z_t) f_z(z_t) \big\} \varepsilon_t \\ &= \frac{1}{T} \sum_{t=1}^{T} E(\varepsilon_t|z_t) f_z(z_t) \varepsilon_t + o_p(1) \\ &= E\big[ E(\varepsilon_t|z_t) f_z(z_t) \varepsilon_t \big] + o_p(1) \\ &= J + o_p(1). \end{aligned} \tag{A.44} \]


Proof of Theorem 3.1(iv). The proof of Theorem 3.1(iv) closely follows the proof in Zheng (1998) and consists of two steps.

Step 1. Show that $\hat{J}_T = J_T + O_p(T^{-1} h^{-m/2})$ under the alternative hypothesis (A.2).

Step 2. Show that $T h^{m/2} J_T \to N(\mu, \sigma_1^2)$ under the alternative hypothesis (A.2), where $\mu = E\big[ f_{y|z}^2\{Q_\theta(z_t)|z_t\}\, l^2(z_t) f_z(z_t) \big]$, $\sigma_1^2 = 2E\{\sigma_v^4(z_t) f_z(z_t)\} \int K^2(u)\,du$, and $\sigma_v^2(z_t) = E(v_t^2|z_t)$ with $v_t \equiv I\{y_t \le Q_\theta(x_t)\} - F(Q_\theta(x_t)|z_t)$.

Proof of Step 1. The results of step 1 in the proof of Theorem 3.1(iii) show that, under the general alternative hypothesis (4), the elements of $\hat{J}_T - J_T$ are all $O_p(T^{-1} h^{-m/2})$ except for $J_2(Q_\theta(x))$, the order of which is $O(T^{-1} h^{-m})$ as in (A.39). Thus we need to show that $J_2(Q_\theta(x)) = O_p(T^{-1} h^{-m/2})$ under the local alternative hypothesis (A.2). Taking a Taylor expansion of $F_{y|z}\{Q_\theta(z_t) + d_T l(z_t)|z_t\}$ around $d_T = 0$, we have
\[ F_{y|z}\{Q_\theta(z_t) + d_T l(z_t)|z_t\} = \theta + d_T f_{y|z}\{Q_\theta(z_t)|z_t\}\, l(z_t) + O_p(d_T^2). \tag{A.45} \]
By similar procedures as in (A.38) and (A.39), we have
\[ \begin{aligned} J_2(Q_\theta(x)) &= -\frac{1}{T(T-1)} \sum_{t=1}^{T} \sum_{s \neq t} \frac{1}{h^m} K\!\left( \frac{z_t - z_s}{h} \right) \{1(y_t \le Q_\theta(x_t)) - F_{y|z}(Q_\theta(x_t)|z_t)\} \\ &\qquad \times d_T f_{y|z}\{Q_\theta(z_t)|z_t\}\, l(z_t) + O_p\big( d_T^2 \big) \\ &= -d_T \frac{1}{T} \sum_{t=1}^{T} \{1(y_t \le Q_\theta(x_t)) - F_{y|z}(Q_\theta(x_t)|z_t)\} \times f_{y|z}\{Q_\theta(z_t)|z_t\}\, l(z_t) \hat{f}_z(z_t) + O_p\big( d_T^2 \big) \\ &\equiv -d_T \frac{1}{T} \sum_{t=1}^{T} u_t f_{y|z}\{Q_\theta(z_t)|z_t\}\, l(z_t) \hat{f}_z(z_t) + O_p\big( d_T^2 \big) \\ &= O_p\big( d_T^2 \big). \end{aligned} \tag{A.46} \]

Proof of Step 2. 
Taking a Taylor expansion of $F_{y|z}\{Q_\theta(z_t) + d_T l(z_t)|z_t\}$ around $d_T = 0$, we have
\[ \begin{aligned} J_T(Q_\theta(x)) &= \frac{1}{T(T-1)h^m} \sum_{t=1}^{T} \sum_{s \neq t} K_{ts} \{1(y_t \le Q_\theta(x_t)) - F(Q_\theta(x_t)|z_t)\} \times \{1(y_s \le Q_\theta(x_s)) - F(Q_\theta(x_s)|z_s)\} \\ &\quad - \frac{2 d_T}{T(T-1)h^m} \sum_{t=1}^{T} \sum_{s \neq t} K_{ts} \{1(y_t \le Q_\theta(x_t)) - F(Q_\theta(x_t)|z_t)\} \times f_{y|z}\{Q_\theta(z_s)|z_s\}\, l(z_s) \\ &\quad + \frac{d_T^2}{T(T-1)h^m} \sum_{t=1}^{T} \sum_{s \neq t} K_{ts} f_{y|z}\{Q_\theta(z_t)|z_t\}\, l(z_t) f_{y|z}\{Q_\theta(z_s)|z_s\}\, l(z_s) + O_p\big( d_T^2 \big) \\ &= T_{1T} - 2 d_T T_{2T} + d_T^2 T_{3T} + O_p\big( d_T^2 \big). \end{aligned} \tag{A.47} \]
Noting that $T_{1T}$ is a degenerate U-statistic of order 2, by Lemma 3.2, we have
\[ T h^{m/2} T_{1T} \to N\big( 0, \sigma_1^2 \big) \quad \text{in distribution}. \tag{A.48} \]
Similarly to the proof for (A.31), we can show that $T_{2T} = O\{(T h^m)^{-1}\}$, and so $d_T T_{2T} = O\{(T h^{m/2})^{-1}\}$. And by the same procedures as in (A.44), we have
\[ T_{3T} \to E\big[ f_{y|z}^2\{Q_\theta(z_t)|z_t\}\, l^2(z_t) f_z(z_t) \big] \quad \text{in probability}. \tag{A.49} \]
Thus,
\[ T h^{m/2} J_T \to N\big( \mu, \sigma_1^2 \big), \tag{A.50} \]
where $\mu = E\big[ f_{y|z}^2\{Q_\theta(z_t)|z_t\}\, l^2(z_t) f_z(z_t) \big]$.


Journal of Multivariate Analysis 107 (2012) 244–262
doi:10.1016/j.jmva.2012.01.020

Bootstrap confidence bands and partial linear quantile regression ✩

Song Song a,b,∗, Ya’acov Ritov c, Wolfgang K. Härdle a
a Humboldt-Universität zu Berlin, Germany
b The University of Texas at Austin, United States
c The Hebrew University of Jerusalem, Israel

Article history: Received 13 June 2011. Available online 31 January 2012.
JEL classification: C14, C21, C31, J01, J31, J71
AMS subject classifications: 62F40, 62G08, 62G86
Keywords: Bootstrap; Quantile regression; Confidence bands; Nonparametric fitting; Kernel smoothing; Partial linear model

Abstract

In this paper bootstrap confidence bands are constructed for nonparametric quantile estimates of regression functions, where resampling is done from a suitably estimated empirical distribution function (edf) for residuals. It is known that the approximation error for the confidence band by the asymptotic Gumbel distribution is logarithmically slow. It is proved that the bootstrap approximation provides an improvement. The case of multidimensional and discrete regressor variables is dealt with using a partial linear model. An economic application considers the labor market differential effect with respect to different education levels.

© 2012 Elsevier Inc. All rights reserved.

✩ The financial support from the Deutsche Forschungsgemeinschaft via SFB 649 ‘‘Ökonomisches Risiko’’, Humboldt-Universität zu Berlin is gratefully acknowledged. Ya’acov Ritov’s research is supported by an ISF grant and a Humboldt Award. We thank Thorsten Vogel and Alexandra Spitz-Oener for sharing their data, comments and suggestions.
∗ Correspondence to: The University of Texas at Austin, 78751 Austin, United States. E-mail address: ssoonngg123@gmail.com (S. Song).

1. Introduction

Quantile regression, as first introduced by Koenker and Bassett [25], is ‘‘gradually developing into a comprehensive strategy for completing the regression prediction’’ as claimed by Koenker and Hallock [26]. Quantile smoothing is an effective method to estimate quantile curves in a flexible nonparametric way. Since this technique makes no structural assumptions on the underlying curve, it is very important to have a device for understanding when observed features are significant and deciding between functional forms. For example, a question often asked in this context is whether or not an observed peak or valley is actually a feature of the underlying regression function or is only an artifact of the observational noise. For such issues, confidence bands (i.e., uniform over location) give an idea about the global variability of the estimate.

The nonparametric quantile estimate could be obtained either using a check function such as a robustified local linear smoother [10,35,36], or through estimating the conditional distribution function using the double-kernel local linear technique [11,35,36]. Besides these, [17] proposed a weighted version of the Nadaraya–Watson estimator, which was further studied by Cai [5]. In the previous work the theoretical focus has mainly been on obtaining consistency and asymptotic normality of the quantile smoother, and thereby providing the necessary ingredients to construct its pointwise confidence intervals. This, however, is not sufficient to get an idea about the global variability of the estimate; neither can it be used to correctly answer questions about the curve’s shape, which contains the lack of fit test as an immediate application. This motivates us to construct the confidence bands.

To this end, [22] used strong approximations of the empirical process and extreme value theory. However, the very poor convergence rate of extremes of a sequence of n independent normal random variables is well documented and was first noticed and investigated by Fisher and Tippett [12], and discussed in greater detail by Hall [16]. In the latter paper it was shown that the rate of the convergence to its limit (the suprema of a stationary Gaussian process) can be no faster than $(\log n)^{-1}$. 
For example, the supremum of a nonparametric quantile estimate can converge to its limit no faster than $(\log n)^{-1}$. These results may make extreme value approximation of the distributions of suprema somewhat doubtful, for example in the context of the uniform confidence band construction for a nonparametric quantile estimate.

This paper proposes and analyzes a bootstrap-based method of obtaining the confidence bands for nonparametric quantile estimates. The method is simple to implement, does not rely on the evaluation of quantities which appear in asymptotic distributions, and takes the bias properly into account (at least asymptotically). Additionally, we show that the bootstrap distribution can approximate the true one (w.r.t. the $\|\cdot\|_\infty$ norm, details in Theorem 2.1) up to $n^{-2/5}$, which represents a significant improvement relative to $(\log n)^{-1}$, the rate based on the asymptotic Gumbel distribution studied by Härdle and Song [22]. Previous research by Hahn [15] showed consistency of a bootstrap approximation to the cumulative distribution function (cdf) without assuming independence of the error and regressor terms. Ref. [23] studied bootstrap methods for median regression models based on a smoothed least-absolute-deviations (SLAD) estimate.

Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sequence of independent identically distributed bivariate random variables with joint pdf $f(x, y)$, joint cdf $F(x, y)$, conditional pdfs $f(y|x)$, $f(x|y)$, conditional cdfs $F(y|x)$, $F(x|y)$ for $Y$ given $X$ and $X$ given $Y$ respectively, and marginal pdfs $f_X(x)$ for $X$, $f_Y(y)$ for $Y$. With some abuse of notation we use the letters $f$ and $F$ to denote different pdfs and cdfs respectively. The exact distribution will be clear from the context. At the first stage we assume that $x \in J^* = (a, b)$ for some $0 < a < b < 1$. Let $l(x)$ denote the $p$-quantile curve, i.e. $l(x) = F^{-1}_{Y|x}(p)$.

In economics, discrete or categorical regressors are very common. An example is from labor market analysis, where one tries to find out how revenues depend on the age of the employee w.r.t. different education levels, labor union statuses, genders and nationalities, i.e. in econometric analysis one targets the differential effects. For example, Ref. [4] examined the US wage structure by quantile regression techniques. This motivates the extension to multivariate covariables by partial linear modelling (PLM). This is convenient especially when we have categorical elements of the $X$ vector. Partial linear models, which were first considered by Green and Yandell [14,8,34,32], are gradually developing into a class of commonly used and studied semiparametric regression models, which can retain the flexibility of nonparametric models and ease the interpretation of linear regression models while avoiding the “curse of dimensionality”.
Recently Ref. [29] used penalized quantile regression for variable selection of partially linear models with measurement errors.

In this paper, we propose an extension of the quantile regression model to $x = (u, v)^\top \in \mathbb{R}^d$ with $u \in \mathbb{R}^{d-1}$ and $v \in J^* \subset \mathbb{R}$. The quantile regression curve we consider is $\tilde l(x) = F^{-1}_{Y|x}(p) = u^\top \beta + l(v)$. The multivariate confidence band can then be constructed based on the univariate uniform confidence band plus the estimated linear part, which we will prove is estimated more accurately ($\sqrt{n}$ consistency). This makes various tasks in economics possible, e.g. labor market differential effect investigation, multivariate model specification tests, and the investigation of the distribution of income and wealth across regions, countries or households. Additionally, since the natural link between quantile and expectile regression was developed by Newey and Powell [30], we can further extend our result to expectile regression for various tasks, e.g. demography risk research or expectile-based Value at Risk (EVAR) as in [28]. For high-dimensional modelling, Ref. [2] recently investigated high-dimensional sparse models with $L_1$ penalty. Additionally, our result might also be further extended to intersection bounds (one-sided confidence bands), which is similar to the work of Chernozhukov et al. [6].

The rest of this article is organized as follows.
To keep the main idea transparent, in Section 2, as an introduction to the more complicated situation, the bootstrap approximation rate for the (univariate) confidence band is presented through a coupling argument. An extension to multivariate covariates $X$ with partial linear modelling is shown in Section 3, together with the actual type of confidence bands and their properties. In Section 4, we compare via a Monte Carlo study the bootstrap uniform confidence band with the one based on the asymptotic theory and investigate the behavior of partial linear estimates with the corresponding confidence band. In Section 5, an application considers the labor market differential effect. The discussion is restricted to the semiparametric extension. We do not discuss the general nonparametric regression. We conjecture that this extension is possible under appropriate conditions. Section 6 contains concluding remarks. All proofs are sketched in the Appendix.

2. Bootstrap confidence bands in the univariate case

Suppose $Y_i = l(X_i) + \varepsilon_i$, $i = 1, \ldots, n$, where $\varepsilon_i$ has the (conditional) distribution function $F(\cdot|X_i)$. For simplicity, but without any loss of generality, we assume that $F(0|X_i) = p$. $F(\xi|x)$ is smooth as a function of $x$ and $\xi$ for any $x$, and for any $\xi$ in the neighborhood of 0. We assume:


(A1) $X_1, \ldots, X_n$ are an i.i.d. sample, and $\inf_x f_X(x) = \lambda_0 > 0$. The quantile function satisfies $\sup_x |l^{(j)}(x)| \le \lambda_j < \infty$, $j = 1, 2$.

(A2) The distribution of $Y$ given $X$ has a density and $\inf_{x,t} f(t|x) \ge \lambda_3 > 0$, continuous at all $x \in J^*$, and at $t$ only in a neighborhood of 0. More exactly, we have the following Taylor expansion at $x' = x$, $t = 0$, for some $A(\cdot)$ and $f_0(\cdot)$:

\[
F(t|x') = F(0|x) + \frac{\partial F(t|x')}{\partial t}\bigg|_{x'=x,\,t=0} t + \frac{\partial F(t|x')}{\partial x'}\bigg|_{x'=x,\,t=0} (x' - x) + R(t, x'; x)
\stackrel{\mathrm{def}}{=} p + f_0(x)\,t + A(x)(x' - x) + R(t, x'; x), \tag{1}
\]

where

\[
\sup_{t,x,x'} \frac{|R(t, x'; x)|}{t^2 + |x' - x|^2} < \infty.
\]

Let $K$ be a symmetric density function with compact support and $d_K = \int u^2 K(u)\,du < \infty$. Let $l_h(\cdot) = l_{n,h}(\cdot)$ be the nonparametric $p$-quantile estimate of $Y_1, \ldots, Y_n$ with weight function $K\{(X_i - \cdot)/h\}$ for some global bandwidth $h = h_n$ (with $K_h(u) = h^{-1} K(u/h)$), that is, a solution of

\[
\frac{\sum_{i=1}^n K_h(x - X_i)\,\mathbf{1}\{Y_i < l_h(x)\}}{\sum_{i=1}^n K_h(x - X_i)} < p \le \frac{\sum_{i=1}^n K_h(x - X_i)\,\mathbf{1}\{Y_i \le l_h(x)\}}{\sum_{i=1}^n K_h(x - X_i)}. \tag{2}
\]

Generally, the bandwidth may also depend on $x$. A local (adaptive) bandwidth selection, though, deserves future research.

Note that by assumption (A1), $l_h(x)$ is the quantile of a discrete distribution, which is equivalent to a sample of size $O_p(nh)$ from a distribution with $p$-quantile whose bias is $O(h^2)$ relative to the true value. Let $\delta_n$ be the local rate of convergence of the function $l_h$, essentially $\delta_n = h^2 + (nh)^{-1/2} = O(n^{-2/5})$ with optimal bandwidth choice $h = h_n = O(n^{-1/5})$ as in [36].
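The defining inequality (2) can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the authors' implementation: the function names `epanechnikov` and `local_quantile` are hypothetical, and the Epanechnikov kernel is only one admissible symmetric density with compact support.

```python
def epanechnikov(u):
    """A symmetric density with compact support [-1, 1] (one admissible K)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def local_quantile(x, xs, ys, p, h, kernel=epanechnikov):
    """Kernel-weighted p-quantile l_h(x) in the sense of (2).

    Returns the smallest observed Y whose cumulative kernel weight
    reaches p, so that weight{Y_i < l} < p <= weight{Y_i <= l}.
    Hypothetical helper; assumes 0 < p < 1.
    """
    # Pair each response with its weight K_h(x - X_i) and sort by response.
    pairs = sorted((y, kernel((x - xi) / h) / h) for xi, y in zip(xs, ys))
    total = sum(w for _, w in pairs)
    if total <= 0.0:
        raise ValueError("no observations within the kernel window at x")
    cum = 0.0
    for y, w in pairs:
        cum += w
        if cum >= p * total:
            return y
    return pairs[-1][0]
```

With a very large bandwidth the weights are nearly equal and the estimate approaches the ordinary sample quantile; with a very small bandwidth only the observations closest to `x` contribute.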
We also employ an auxiliary estimate $l_g = l_{n,g}$, essentially one similar to $l_{n,h}$ but with a slightly larger bandwidth $g = g_n = h_n n^{\zeta}$ (a heuristic explanation of why it is essential to oversmooth $g$ is given later), where $\zeta$ is some small number. The asymptotically optimal choice of $\zeta$, as shown later, is 4/45.

(A3) The estimate $l_g$ satisfies

\[
\sup_{x \in J^*} |l_g''(x) - l''(x)| = O_p(1), \qquad \sup_{x \in J^*} |l_g'(x) - l'(x)| = O_p(\delta_n / h). \tag{3}
\]

Assumption (A3) is only stated to overwrite the issue here. It actually follows from the assumptions on $(g, h)$. A sequence $\{a_n\}$ is slowly varying if $n^{-\alpha} a_n \to 0$ for any $\alpha > 0$. With some abuse of notation we will use $S_n$ to denote any slowly varying function which may change from place to place, e.g. $S_n^2 = S_n$ is a valid expression (since if $S_n$ is a slowly varying function, then $S_n^2$ is slowly varying as well). $\lambda_i$ and $C_i$ are generic constants throughout this paper and the subscripts have no specific meaning. Note that there is no $S_n$ term in (3) exactly because the bandwidth $g_n$ used to calculate $l_g$ is slightly larger than that used for $l_h$. We want to smooth it such that $l_g$, as an estimate of the quantile function, has a slightly worse rate of convergence, but its derivatives converge faster.

We also consider a family of estimates $\hat F(\cdot|X_i)$, $i = 1, \ldots, n$, estimating respectively $F(\cdot|X_i)$ and satisfying $\hat F(0|X_i) = p$. For example, we can take the distribution with a point mass $\big[\sum_{k=1}^n K\{(X_k - X_i)/h\}\big]^{-1} K\{(X_j - X_i)/h\}$ on $Y_j - l_h(X_i)$, $j = 1, \ldots, n$, i.e.

\[
\hat F(\cdot|X_i) = \frac{\sum_{j=1}^n K_h(X_j - X_i)\,\mathbf{1}\{Y_j - l_h(X_i) \le \cdot\}}{\sum_{j=1}^n K_h(X_j - X_i)}. \tag{4}
\]

We additionally assume:

(A4) $f_X(x)$ is twice continuously differentiable and $f(t|x)$ is continuous in $x$, Hölder-continuous in $t$ and uniformly bounded in $x$ and $t$ by, say, $\lambda_4$.

For the precision of $\hat F(\cdot|X_i)$'s approximation around 0, we employ the following lemma from Franke and Mwita [13]:

Lemma 2.1 ([13, Lemma A.3-5]). If assumptions (A1, A2, A4) hold, then for $|t| < S_n \delta_n$, $\delta_n \to 0$, $i = 1, \ldots, n$, $X_i \in J^*$,

\[
\sup_{|t| < S_n \delta_n} |\hat F(t|X_i) - F(t|X_i)| = O_p\{S_n \delta_n\}. \tag{5}
\]
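A kernel-weighted edf of the kind written in (4), together with its generalized inverse, can be sketched as follows. This is a minimal illustration: the names `edf_weights`, `conditional_edf`, `inverse_edf` and `draw_residual` are hypothetical, the residuals are assumed to be supplied by the caller (for instance $Y_j - l_h(X_j)$), and the Epanechnikov kernel is an assumed choice.

```python
import random


def epanechnikov(u):
    """A symmetric density with compact support [-1, 1] (assumed kernel)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def edf_weights(x, xs, h, kernel=epanechnikov):
    """Normalized weights K_h(X_j - x) / sum_k K_h(X_k - x)."""
    raw = [kernel((xj - x) / h) / h for xj in xs]
    total = sum(raw)
    if total <= 0.0:
        raise ValueError("no observations within the kernel window at x")
    return [w / total for w in raw]


def conditional_edf(t, x, xs, residuals, h):
    """F_hat(t | x): kernel-weighted share of residuals that are <= t."""
    return sum(w for w, e in zip(edf_weights(x, xs, h), residuals) if e <= t)


def inverse_edf(u, x, xs, residuals, h):
    """F_hat^{-1}(u | x): smallest supported residual t with F_hat(t | x) >= u."""
    pairs = sorted(zip(residuals, edf_weights(x, xs, h)))
    cum = 0.0
    for e, w in pairs:
        cum += w
        if w > 0.0 and cum >= u:
            return e
    # Guard against rounding when u is numerically equal to 1.
    return max(e for e, w in pairs if w > 0.0)


def draw_residual(x, xs, residuals, h, rng):
    """One draw from F_hat(. | x) via the inverse cdf at U ~ Uniform[0, 1)."""
    return inverse_edf(rng.random(), x, xs, residuals, h)
```

Drawing through the inverse cdf at a uniform variate mirrors the way a bootstrap residual is generated from an estimated conditional distribution; sampling an index with probability proportional to the kernel weight would give the same distribution.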


Let $F^{-1}(\cdot|\cdot)$ and $\hat F^{-1}(\cdot|\cdot)$ be the inverse functions of the conditional cdf and its estimate. We consider the following bootstrap procedure. Let $U_1, \ldots, U_n$ be i.i.d. uniform $[0, 1]$ variables. Let

\[
Y_i^* = l_g(X_i) + \hat F^{-1}(U_i|X_i), \quad i = 1, \ldots, n \tag{6}
\]

be the bootstrap sample. We couple this sample to an unobserved hypothetical sample from the true conditional distribution

\[
Y_i^{\#} = l(X_i) + F^{-1}(U_i|X_i), \quad i = 1, \ldots, n. \tag{7}
\]

Note that the vectors $(Y_1, \ldots, Y_n)$ and $(Y_1^{\#}, \ldots, Y_n^{\#})$ are equally distributed given $X_1, \ldots, X_n$. We are really interested in the exact values of $Y_i^{\#}$ and $Y_i^*$ only when they are near the appropriate quantile, that is, only if $|U_i - p| < S_n \delta_n$. But then, by Eq. (1), Lemma 2.1 and the inverse function theorem, we have

\[
\max_{i:\,|F^{-1}(U_i|X_i) - F^{-1}(p)|\,\cdots} |\hat F^{-1}(U_i|X_i) - F^{-1}(U_i|X_i)| = \cdots
\]


(2) Compute the conditional edf:

\[
\hat F(t|x) = \frac{\sum_{i=1}^n K_h(x - X_i)\,\mathbf{1}\{\hat\varepsilon_i \le t\}}{\sum_{i=1}^n K_h(x - X_i)}.
\]

(3) For each $i = 1, \ldots, n$, generate random variables $\varepsilon^*_{i,b} \sim \hat F(t|X_i)$, $b = 1, \ldots, B$, and construct the bootstrap sample $Y^*_{i,b}$, $i = 1, \ldots, n$, $b = 1, \ldots, B$, as follows:

\[
Y^*_{i,b} = l_g(X_i) + \varepsilon^*_{i,b}.
\]

(4) For each bootstrap sample $\{(X_i, Y^*_{i,b})\}_{i=1}^n$, compute $l^*_h$ and the random variable

\[
d_b \stackrel{\mathrm{def}}{=} \sup_{x \in J^*} \Big[ \hat f\{l^*_h(x)|x\}\, \hat f_X(x)\, |l^*_h(x) - l_g(x)| \Big], \tag{14}
\]

where $\hat f\{l(x)|x\}$, $\hat f_X(x)$ are consistent estimators of $f\{l(x)|x\}$, $f_X(x)$.

(5) Calculate the $(1 - \alpha)$ quantile $d^*_\alpha$ of $d_1, \ldots, d_B$.

(6) Construct the bootstrap uniform confidence band centered around $l_h(x)$, i.e. $l_h(x) \pm \big[\hat f\{l_h(x)|x\}\, \hat f_X(x)\big]^{-1} d^*_\alpha$.

While bootstrap methods are well-known tools for assessing variability, more care must be taken to properly account for the type of bias encountered in nonparametric curve estimation. The choice of bandwidth is crucial here. In our experience the bootstrap works well with a rather crude choice of $g$; one may, however, specify $g$ more precisely. Since the main role of the pilot bandwidth is to provide a correct adjustment for the bias, we use the goal of bias estimation as a criterion. Recall that the bias in the estimation of $l(x)$ by $l^{\#}_h(x)$ is given by

\[
b_h(x) = \mathrm{E}\, l^{\#}_h(x) - l(x).
\]

The bootstrap bias of the estimate constructed from the resampled data is

\[
\hat b_{h,g}(x) = \mathrm{E}\, l^*_h(x) - l_g(x). \tag{15}
\]

Note that in (15) the expected value is computed under the bootstrap estimation. The following theorem gives an asymptotic representation of the mean squared error for the problem of estimating $b_h(x)$ by $\hat b_{h,g}(x)$. It is then straightforward to find $g$ to minimize this representation. Such a choice of $g$ will make the quantiles of the original and coupled bootstrap distributions close to each other. In addition to the technical assumptions before, we also need:

(A5) $l$ and $f$ are four times continuously differentiable.

(A6) $K$ is twice continuously differentiable.

Theorem 2.2. Under assumptions (A1–A6), for any $x \in J^*$

\[
\mathrm{E}\Big[ \big\{\hat b_{h,g}(x) - b_h(x)\big\}^2 \,\Big|\, X_1, \ldots, X_n \Big] \sim h^4 \big(C_1 g^4 + C_2 n^{-1} g^{-5}\big) \tag{16}
\]

in the sense that the ratio between the RHS and the LHS tends in probability to 1 for some constants $C_1$, $C_2$.

An immediate consequence of Theorem 2.2 is that the rate of convergence of $g$ should be $n^{-1/9}$, see also [20]: balancing the two terms $g^4$ and $n^{-1} g^{-5}$ in (16) gives $g^9 \propto n^{-1}$. This makes precise the previous intuition which indicated that $g$ should slightly oversmooth. Under our assumptions, reasonable choices of $h$ will be of the order $n^{-1/5}$ as in [36]. Hence, (16) shows once again that $g$ should tend to zero more slowly than $h$. Note that Theorem 2.2 is not stated uniformly over $h$. The reason is that we are only trying to give some indication of how the pilot bandwidth $g$ should be selected.

We summarize below how to select the bandwidth $h$ for the local quantile smoother and $g$ for the oversmoothed estimate.

1 Select $h$ as in [36], which is also quoted below.
– Use ready-made and sophisticated methods to select $h_{\mathrm{mean}}$, the optimal bandwidth choice for regression mean estimation; we use the technique of Ruppert et al. [33].
– Use $h = h_{\mathrm{mean}} \{p(1 - p)/\varphi(\Phi^{-1}(p))^2\}^{1/5}$ to obtain all other $h$'s (w.r.t. different $p$'s) from $h_{\mathrm{mean}}$. $\varphi$ and $\Phi$ are the pdf and cdf of the standard normal distribution respectively.
2 According to Theorem 2.2, select $g$ as $g = n^{4/45} h$.
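The resampling steps and the two bandwidth rules of this section can be put together in a short sketch. It is an illustration under several assumptions rather than the authors' code: all function names are hypothetical, the Gaussian kernel is used throughout, the supremum over $J^*$ is replaced by a maximum over a user-supplied grid, residuals are taken as $\hat\varepsilon_i = Y_i - l_h(X_i)$, and the scaling factor $\hat f\{l(x)|x\}\hat f_X(x)$ is estimated by a product-kernel joint density with the same bandwidth $h$ in both directions.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gauss(u):
    """Standard normal density, used as the kernel K (assumed choice)."""
    return STD_NORMAL.pdf(u)


def quantile_bandwidth(h_mean, p):
    """h = h_mean * {p (1 - p) / phi(Phi^{-1}(p))^2}^{1/5}."""
    phi = STD_NORMAL.pdf(STD_NORMAL.inv_cdf(p))
    return h_mean * (p * (1.0 - p) / phi ** 2) ** 0.2


def pilot_bandwidth(h, n):
    """Oversmoothing bandwidth g = n^{4/45} h."""
    return n ** (4.0 / 45.0) * h


def weights(x, xs, h):
    """Kernel weights K_h(x - X_i) for all observations."""
    return [gauss((x - xi) / h) / h for xi in xs]


def local_quantile(x, xs, ys, p, h):
    """Smallest Y whose cumulative kernel weight reaches p, as in (2)."""
    pairs = sorted(zip(ys, weights(x, xs, h)))
    target = p * sum(w for _, w in pairs)
    cum = 0.0
    for y, w in pairs:
        cum += w
        if cum >= target:
            return y
    return pairs[-1][0]


def density_scale(x, y, xs, ys, h):
    """f_hat(y|x) * f_hat_X(x), i.e. a product-kernel estimate of f(x, y)."""
    wx = weights(x, xs, h)
    return sum(w * gauss((y - yi) / h) / h for w, yi in zip(wx, ys)) / len(xs)


def bootstrap_band(xs, ys, grid, p, h, B=100, alpha=0.05, seed=0):
    """Return (lower, l_h, upper) at each grid point."""
    rng = random.Random(seed)
    g = pilot_bandwidth(h, len(xs))
    # Residuals from the local quantile smoother l_h (assumed first step).
    resid = [y - local_quantile(x, xs, ys, p, h) for x, y in zip(xs, ys)]
    l_g_obs = [local_quantile(x, xs, ys, p, g) for x in xs]
    l_g_grid = [local_quantile(x, xs, ys, p, g) for x in grid]
    # Steps (2)-(3): eps* is drawn from the kernel-weighted edf at X_i.
    w_rows = [weights(x, xs, h) for x in xs]
    d = []
    for _ in range(B):
        y_star = [lg + rng.choices(resid, weights=w)[0]
                  for lg, w in zip(l_g_obs, w_rows)]
        # Step (4): maximal scaled deviation of l*_h from l_g over the grid.
        devs = []
        for x, lg in zip(grid, l_g_grid):
            l_star = local_quantile(x, xs, y_star, p, h)
            devs.append(density_scale(x, l_star, xs, y_star, h) * abs(l_star - lg))
        d.append(max(devs))
    # Step (5): empirical (1 - alpha) quantile of d_1, ..., d_B.
    d.sort()
    d_alpha = d[min(B - 1, int((1.0 - alpha) * B))]
    # Step (6): band centered at l_h(x).
    band = []
    for x in grid:
        l_h = local_quantile(x, xs, ys, p, h)
        half = d_alpha / density_scale(x, l_h, xs, ys, h)
        band.append((l_h - half, l_h, l_h + half))
    return band
```

The grid resolution, the number of replications `B` and the density estimator are tuning choices of this sketch; a finer grid approximates the supremum in the definition of $d_b$ more closely at higher cost.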


3. Bootstrap confidence bands in PLMs

The case of multivariate regressors may be handled via a semiparametric specification of the quantile regression curve. More specifically we assume that with $x = (u, v)^\top \in \mathbb{R}^d$, $v \in \mathbb{R}$:

\[
\tilde l(x) = u^\top \beta + l(v).
\]

In this section we show how to proceed in this multivariate setting and how, based on Theorem 2.1, a multivariate confidence band may be constructed. We first describe the numerical procedure for obtaining estimates of $\beta$ and $l$, where $l$ denotes, as in the earlier sections, the one-dimensional conditional quantile curve. We then move on to the theoretical properties. First note that the PLM quantile estimation problem can be seen as estimating $(\beta, l)$ in

\[
y = u^\top \beta + l(v) + \varepsilon = \tilde l(x) + \varepsilon \tag{17}
\]

where the $p$-quantile of $\varepsilon$ conditional on both $u$ and $v$ is 0.

In order to estimate $\beta$, let $a_n$ denote an increasing sequence of positive integers and set $b_n = a_n^{-1}$. For each $n = 1, 2, \ldots$, partition the unit interval $[0, 1]$ for $v$ in $a_n$ intervals $I_{ni}$, $i = 1, \ldots, a_n$, of equal length $b_n$ and let $m_{ni}$ denote the midpoint of $I_{ni}$. In each of these small intervals $I_{ni}$, $i = 1, \ldots, a_n$, $l(v)$ can be considered as being approximately constant, and hence (17) can be considered as a linear model. This observation motivates the following two stage estimation procedure.

(1) A linear quantile regression inside each partition is used to estimate $\hat\beta_i$, $i = 1, \ldots, a_n$. Their weighted mean yields $\hat\beta$.
More exactly, consider the parametric quantile regression of $y$ on $u$, $\mathbf{1}\{v \in [0, b_n)\}$, $\mathbf{1}\{v \in [b_n, 2b_n)\}$, $\ldots$, $\mathbf{1}\{v \in [1 - b_n, 1]\}$. That is, let

\[
\psi(t) \stackrel{\mathrm{def}}{=} (p - 1)\,t\,\mathbf{1}(t < 0) + p\,t\,\mathbf{1}(t \ge 0).
\]

Then let

\[
\hat\beta = \arg\min_{\beta} \min_{l_1, \ldots, l_{a_n}} \sum_{i=1}^n \psi\Big\{ Y_i - \beta^\top U_i - \sum_{j=1}^{a_n} l_j\, \mathbf{1}(V_i \in I_{nj}) \Big\}.
\]

(2) Calculate the smooth quantile estimate as in (2) from $(V_i, Y_i - U_i^\top \hat\beta)_{i=1}^n$, and name it $\tilde l_h(v)$.

The following theorem states the asymptotic distribution of $\hat\beta$.

Theorem 3.1. If assumption (A1) holds, for the above two stage estimation procedure, there exist positive definite matrices $D$, $C$, such that

\[
\sqrt{n}(\hat\beta - \beta) \stackrel{L}{\to} N\{0,\, p(1 - p) D^{-1} C D^{-1}\} \quad \text{as } n \to \infty,
\]

where $C = \operatorname{plim}_{n\to\infty} C_n$ and $D = \operatorname{plim}_{n\to\infty} D_n$ with $C_n = \frac{1}{n} \sum_{i=1}^n U_i^\top U_i$ and $D_n = \frac{1}{n} \sum_{j=1}^n f\{l(V_j)|V_j\}\, U_j^\top U_j$ respectively.

Note that $l(v)$, the quantile smoother based on $(v, y - u^\top \beta)$, and $\tilde l_h(v)$ can be treated as zeros (w.r.t. $\theta$, $\theta \in I$ where $I$ is a possibly infinite, or possibly degenerate, interval in $\mathbb{R}$) of the functions

\[
\tilde H(\theta, v) \stackrel{\mathrm{def}}{=} \int_{\mathbb{R}} f(v, \tilde y)\, \psi(\tilde y - \theta)\, d\tilde y, \tag{18}
\]

\[
\tilde H_n(\theta, v) \stackrel{\mathrm{def}}{=} n^{-1} \sum_{i=1}^n K_h(v - V_i)\, \psi(\tilde Y_i - \theta), \tag{19}
\]

\[
\hat H_n(\theta, v) \stackrel{\mathrm{def}}{=} n^{-1} \sum_{i=1}^n K_h(v - V_i)\, \psi(\hat Y_i - \theta), \tag{20}
\]

respectively, where

\[
\tilde Y_i \stackrel{\mathrm{def}}{=} Y_i - U_i^\top \beta, \qquad \hat Y_i \stackrel{\mathrm{def}}{=} Y_i - U_i^\top \hat\beta = Y_i - U_i^\top \beta + U_i^\top (\beta - \hat\beta) \stackrel{\mathrm{def}}{=} \tilde Y_i + Z_i.
\]

From Theorem 3.1 we know that $\hat\beta - \beta = O_p(1/\sqrt{n})$ and $\|Z_i\|_\infty = O_p(1/\sqrt{n})$. Under the following assumption, which is satisfied by exponential and generalized hyperbolic distributions and is also used in [18]:

(A7) The conditional densities $f(\cdot|\tilde y)$, $\tilde y \in \mathbb{R}$, are uniformly local Lipschitz continuous of order $\tilde\alpha$ (ulL-$\tilde\alpha$) on $J$, uniformly in $\tilde y \in \mathbb{R}$, with $0 < \tilde\alpha \le 1$, and $(nh)/\log n \to \infty$,

for some constant $C_3$ not depending on $n$, Lemma 2.1 in [22] shows a.s. as $n \to \infty$:

\[
\sup_{\theta \in I} \sup_{v \in J^*} |\tilde H_n(\theta, v) - \tilde H(\theta, v)| \le C_3 \max\{(nh/\log n)^{-1/2},\, h^{\tilde\alpha}\}.
\]

Observing that $\sqrt{h/\log n} = O(1)$, we then have

\[
\sup_{\theta \in I} \sup_{v \in J^*} |\hat H_n(\theta, v) - \tilde H(\theta, v)|
\le \sup_{\theta \in I} \sup_{v \in J^*} |\tilde H_n(\theta, v) - \tilde H(\theta, v)| + \sup_{\theta \in I} \sup_{v \in J^*} |\hat H_n(\theta, v) - \tilde H_n(\theta, v)|
\]
\[
\le \sup_{\theta \in I} \sup_{v \in J^*} |\tilde H_n(\theta, v) - \tilde H(\theta, v)| + O_p(1/\sqrt{n}) \sup_{v \in J} \Big| n^{-1} \sum_{i=1}^n K_h(v - V_i) \Big|
\le C_4 \max\{(nh/\log n)^{-1/2},\, h^{\tilde\alpha}\} \tag{21}
\]

for a constant $C_4$ which can be different from $C_3$. To show the uniform consistency of the quantile smoother, we shall reduce the problem of strong convergence of $\tilde l_h(v) - l(v)$, uniformly in $v$, to an application of the strong convergence of $\hat H_n(\theta, v)$ to $\tilde H(\theta, v)$, uniformly in $v$ and $\theta$. For our result on $\tilde l_h(\cdot)$, we shall also require

(A8) $\inf_{v \in J^*} \big| \int \psi\{y - l(v) + \varepsilon\}\, dF(y|v) \big| \ge \tilde q\, |\varepsilon|$, for $|\varepsilon| \le \delta_1$,

where $\delta_1$ and $\tilde q$ are some positive constants, see also [19]. This assumption is satisfied if a constant $\tilde q$ exists giving $f\{l(v)|v\} > \tilde q / p$, $v \in J$. Ref. [22] showed:

Lemma 3.1. Under assumptions (A7) and (A8), we have a.s. as $n \to \infty$

\[
\sup_{v \in J^*} |\tilde l_h(v) - l(v)| \le C_5 \max\{(nh/\log n)^{-1/2},\, h^{\tilde\alpha}\} \tag{22}
\]

with another constant $C_5$ not depending on $n$.
If we consider the bandwidth $h = O(n^{-1/5})$ and skip the slowly varying function $\log n$, then $(nh/\log n)^{-1/2} = O(n^{-2/5}) < O(n^{-1/5}) \le h^{\tilde\alpha}$, and (22) can be further simplified to

\[
\sup_{v \in J^*} |\tilde l_h(v) - l(v)| \le C_5\, h^{\tilde\alpha}.
\]

Since the proof is essentially the same as Theorem 2.1 of the above mentioned reference, it is omitted here.

The convergence rate for the parametric part, $O_p(n^{-1/2})$ (Theorem 3.1), is smaller than the bootstrap approximation error for the nonparametric part, $O_p(n^{-2/5})$, as shown in Theorem 2.1. This makes the construction of uniform confidence bands for multivariate $x \in \mathbb{R}^d$ with a partial linear model possible.

Proposition 3.1. Under the assumptions (A1)–(A8), an approximate $(1 - \alpha) \times 100\%$ confidence band over $\mathbb{R}^{d-1} \times [0, 1]$ is

\[
u^\top \hat\beta + \tilde l_h(v) \pm \big[\hat f\{\tilde l_h(x)|x\}\, \hat f_X(x)\big]^{-1} d^*_\alpha,
\]

where $\hat f\{\tilde l_h(x)|x\}$, $\hat f_X(x)$ are consistent estimators of $f\{l(x)|x\}$, $f_X(x)$.

Note that here we actually only require that the convergence rate of the parametric part, which is typically $O_p(n^{-1/2})$, is smaller than the bootstrap approximation error for the nonparametric part, $O_p(n^{-2/5})$. This makes the construction of uniform confidence bands possible for more general semiparametric models than the partial linear model shown here, and similar results could be obtained easily.

4. A Monte Carlo study

This section is divided into two parts. First we concentrate on a univariate regressor variable $x$, check the validity of the bootstrap procedure together with settings in the specific example, and compare it with asymptotic uniform bands.
Secondly we incorporate the partial linear model to handle the multivariate case of $x \in \mathbb{R}^d$.

Below is the summary of the simulation procedure.

(1) Simulate $(X_i, Y_i)$, $i = 1, \ldots, n$ according to their joint pdf $f(x, y)$.
In order to compare with earlier results in the literature, we choose the joint pdf of bivariate data $\{(X_i, Y_i)\}_{i=1}^n$, $n = 1000$ as

\[
f(x, y) = f_{y|x}(y - \sin x)\, \mathbf{1}(x \in [0, 1]), \tag{23}
\]

where $f_{y|x}(\cdot)$ is the pdf of $N(0, x)$ with an increasing heteroscedastic structure. Thus the theoretical quantile is $l(x) = \sin(x) + \sqrt{x}\, \Phi^{-1}(p)$. Based on this normality property, all the assumptions can be seen to be satisfied.
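The design in (23) is easy to reproduce. The sketch below is only an illustration with hypothetical function names (`true_quantile`, `simulate`): it draws $X$ uniformly on $[0, 1]$, which is what the indicator in (23) implies, adds $N(0, x)$ noise around $\sin x$, and evaluates the theoretical quantile curve $l(x) = \sin(x) + \sqrt{x}\,\Phi^{-1}(p)$.

```python
import math
import random
from statistics import NormalDist


def true_quantile(x, p):
    """Theoretical p-quantile l(x) = sin(x) + sqrt(x) * Phi^{-1}(p)."""
    return math.sin(x) + math.sqrt(x) * NormalDist().inv_cdf(p)


def simulate(n, seed=0):
    """Draw (X_i, Y_i) with X ~ U[0, 1] and Y | X = x ~ N(sin x, x)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    # The conditional variance is x, so the standard deviation is sqrt(x).
    ys = [math.sin(x) + rng.gauss(0.0, math.sqrt(x)) for x in xs]
    return xs, ys
```

A quick sanity check is that roughly a share $p$ of the simulated responses falls below `true_quantile(x, p)`.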


(2) Compute the local quantile smoother $l_h(x)$ of $Y_1, \ldots, Y_n$ with bandwidth $h$ and obtain residuals $\hat\varepsilon_i = Y_i - l_h(X_i)$, $i = 1, \ldots, n$.
If we choose $p = 0.9$, then $\Phi^{-1}(p) = 1.2816$ and $l(x) = \sin(x) + 1.2816 \sqrt{x}$. Set $h = 0.05$.

(3) Compute the conditional edf:

\[
\hat F(t|x) = \frac{\sum_{i=1}^n K_h(x - X_i)\,\mathbf{1}\{\hat\varepsilon_i \le t\}}{\sum_{i=1}^n K_h(x - X_i)}.
\]

The choice of kernel functions plays a minor role here. Section 3.4.3 and Table 3.3 of Härdle et al. [21] discuss the efficiencies of different kernels. The Epanechnikov kernel would be the optimal one; however, the differences among various kernels are small. Thus, we just use the Gaussian kernel to assure numerical stability. This is also convenient because the optimal bandwidth suggested by Yu and Jones [36] is also calculated based on the Gaussian kernel.

(4) For each $i = 1, \ldots, n$, generate random variables $\varepsilon^*_{i,b} \sim \hat F(t|X_i)$, $b = 1, \ldots, B$, and construct the bootstrap sample $Y^*_{i,b}$, $i = 1, \ldots, n$, $b = 1, \ldots, B$, as follows:

\[
Y^*_{i,b} = l_g(X_i) + \varepsilon^*_{i,b},
\]

with $g = 0.2$.

(5) For each bootstrap sample $\{(X_i, Y^*_{i,b})\}_{i=1}^n$, compute $l^*_h$ and the random variable

\[
d_b \stackrel{\mathrm{def}}{=} \sup_{x \in J^*} \Big[ \hat f\{l^*_h(x)|x\}\, \hat f_X(x)\, |l^*_h(x) - l_g(x)| \Big], \tag{24}
\]

where $\hat f\{l(x)|x\}$, $\hat f_X(x)$ are consistent estimators of $f\{l(x)|x\}$, $f_X(x)$ with use of $f(y|x) = f(x, y)/f_X(x)$.

(6) Calculate the $(1 - \alpha)$ quantile $d^*_\alpha$ of $d_1, \ldots, d_B$.

(7) Construct the bootstrap uniform confidence band centered around $l_h(x)$, i.e. $l_h(x) \pm \big[\hat f\{l_h(x)|x\}\, \hat f_X(x)\big]^{-1} d^*_\alpha$.

Fig. 1. The real 0.9 quantile curve (black dotted line), 0.9 quantile estimate (cyan solid line) with corresponding 95% uniform confidence band from asymptotic theory (magenta dashed lines) and confidence band from bootstrapping (red dashed–dot lines).

Fig. 1 shows the theoretical 0.9 quantile curve, the 0.9 quantile estimate with the corresponding 95% uniform confidence band from the asymptotic theory, and the confidence band from the bootstrap. The real 0.9 quantile curve is marked as the black dotted line. We then compute the classic local quantile estimate $l_h(x)$ (cyan solid) with its corresponding 95% uniform confidence band (magenta dashed) based on asymptotic theory according to Härdle and Song [22]. The 95% confidence band from the bootstrap is displayed as red dashed–dot lines. At first sight, the quantile smoother and the two corresponding bands all capture the heteroscedastic structure quite well, and the width of the bootstrap confidence band is similar to the one based on asymptotic theory in [22]. Fig. 2 presents the bootstrap confidence bands constructed using different oversmoothing bandwidths w.r.t. the same randomly generated data set (different from the one used for Fig. 1), namely, 1/2, 1 and 2 times (from left to right) the oversmoothing bandwidth $g = n^{4/45} h$ used before.
As we can see, when we deviate from $g = n^{4/45} h$, the bootstrap confidence bands get wider.

Fig. 2. The real 0.9 quantile curve (black dotted line), 0.9 quantile estimate (cyan solid line) with corresponding 95% uniform confidence band from asymptotic theory (magenta dashed lines) and confidence band from bootstrapping (red dashed–dot lines). The left, middle and right plots correspond to the oversmoothing bandwidth set as $n^{4/45} h/2$, $n^{4/45} h$ and $2 n^{4/45} h$ respectively.

Table 1
SSE of $\hat\beta$ with respect to $a_n$ for different numbers of observations.

a_n            n = 1000      n = 8000      n = 261148
n^{1/3}/8      –             –             3.6 × 10^{-3}
n^{1/3}/4      5.4 × 10^{-1}   4.0 × 10^{-2}   3.3 × 10^{-3}
n^{1/3}/2      6.1 × 10^{-1}   3.5 × 10^{-2}   3.2 × 10^{-3}
n^{1/3}        6.2 × 10^{-1}   3.6 × 10^{-2}   3.1 × 10^{-3}
n^{1/3} · 2    8.0 × 10^{-1}   3.9 × 10^{-2}   2.9 × 10^{-3}
n^{1/3} · 4    4.9 × 10^{-1}   3.6 × 10^{-2}   2.8 × 10^{-3}
n^{1/3} · 8    –             –             3.4 × 10^{-3}

We now extend $x$ to the multivariate case and use a different quantile function to verify our method. Choose $x = (u, v)^\top \in \mathbb{R}^d$, $v \in \mathbb{R}$, and generate the data $\{(U_i, V_i, Y_i)\}_{i=1}^n$, $n = 1000$ with

\[
y = 2u + v^2 + \varepsilon - 1.2816, \tag{25}
\]

where $u$ and $v$ are uniformly distributed random variables in $[0, 2]$ and $[0, 1]$ respectively, and $\varepsilon$ has a standard normal distribution. The theoretical 0.9-quantile curve is $\tilde l(x) = 2u + v^2$. Since the choice of $a_n$ is uncertain here, we test different choices of $a_n$ for different $n$ by simulation. To this end, we modify the theoretical model as follows:

\[
y = 2u + v^2 + \varepsilon - \Phi^{-1}(p)
\]

such that the real $\beta$ is always equal to 2 no matter if $p$ is 0.01 or 0.99. The result is displayed in Fig. 3 for $n = 1000$, $n = 8000$, $n = 261148$ (number of observations for the data set used in the following application part, including both uncensored and censored observations). Different lines correspond to different $a_n$, i.e. $n^{1/3}/8$, $n^{1/3}/4$, $n^{1/3}/2$, $n^{1/3}$, $n^{1/3} \cdot 2$, $n^{1/3} \cdot 4$ and $n^{1/3} \cdot 8$. At first, it seems that the choice of $a_n$ does not matter too much.
To investigate the effect of $a_n$ further, we calculate the SSE $\sum_{i=1}^{99} \{\hat\beta(i/100) - \beta\}^2$, where $\hat\beta(i/100)$ denotes the estimate corresponding to the $i/100$ quantile. The results are displayed in Table 1. Obviously $a_n$ has much less effect than $n$ on the SSE. Considering the computational cost, which increases with $a_n$, and the estimation performance, we empirically suggest $a_n = n^{1/3}$. Certainly this issue is far from settled and needs further investigation.

Thus for the specific model (25), we have $a_n = 10$, $\hat\beta = 1.997$, $h = 0.2$ and $g = 0.7$. In Fig. 4 the theoretical 0.9 quantile curve with respect to $v$ and the 0.9 quantile estimate with corresponding uniform confidence band are displayed. The real 0.9 quantile curve is marked as the black dotted line. We then compute the quantile smoother $l_h(x)$ (magenta solid). The 95% bootstrap uniform confidence band is displayed as red dashed lines and covers the true quantile curve quite well.

5. A labor market application

Our intuition of the effect of education on income is summarized by Day and Newburger's basic claim [7]: “At most ages, more education equates with higher earnings, and the payoff is most notable at the highest educational levels”, which is actually from the point of view of mean regression. However, whether this difference is significant or not is still questionable, especially for different ends of the (conditional) income distribution. To this end, a careful investigation of quantile regression is necessary. Different education levels may reflect different productivity, which is unobservable and may also result from different ages, abilities etc. To study the labor market differential effect with respect to different education levels, a semiparametric partial linear quantile model is therefore preferred: it retains the flexibility of the nonparametric models for the age and other unobservable factors and eases the interpretation of the education factor.

We use the administrative data from the German National Pension Office (Deutsche Rentenversicherung Bund) for the following group: West German part, males, born between 1939 and 1942, who began receiving a pension in 2004 or 2005 (when they were 62–66 years old) with at least 30 yearly uncensored observations. Since different people entered into the pension system and stopped receiving job earnings at different ages, we only consider those earnings recorded by the pension system when they were between 25 and 59 years old. For example, we consider person A's yearly earnings when he was 25–59 (entering into the pension system at 25), person B's when he was 27–59 (entering into the pension system at 27), and person C's when he was 30–59 (entering into the pension system at 30). In total, $n = 128429$ observations are available. We have the following three education categories: “low education”, “apprenticeship” and “university” for the variable $u$ (we assign them the numerical values 1, 2 and 3 respectively); the variable $v$ is the age of the employee.
“Low education” means without post-secondary education in Germany. “Apprenticeship” means part of Germany's dual education system. Depending on the profession, a person may work for three to four days a week in the company and then spend one or two days at a vocational school (Berufsschule). “University” in Germany also includes technical colleges (applied universities). Since the level and structure of wages differ substantially between East and West Germany, we concentrate on West Germany only here (which we usually refer to simply as Germany). Our data have several advantages over the most often used German Socio-Economic Panel (GSOEP) data for analyzing wages in Germany. Firstly, they are available for a much longer period, as opposed to from 1984 only for the GSOEP data. Secondly, and more importantly, they have a much larger sample size. Thirdly, wages are likely to be measured much more precisely. Fourthly, we observe a complete earnings history from the individual's first job until his retirement; therefore this is a true panel, not a pseudo-panel. There are also several drawbacks. For example, some very wealthy individuals are not registered in the German pension system: if their monthly income is more than some threshold (which may vary for different years due to the inflation effect), the individual has the right not to be included in the public pension system, and thus is not recorded. Besides this, the data are also right-censored at the highest level of earnings that is subject to social security contributions, so the censored observations in the data are only for those who actually decided to stay within the public system. Because of the combination of truncation and censoring, this paper focuses on the uncensored data only, and we should not draw inferences from the very high quantiles, i.e. we consider quantiles no higher than 0.80 here. Recently, similar data were also used to investigate the German wage structure as in [9].

Fig. 3. $\hat\beta$ with respect to different quantiles for different numbers of observations, i.e. $n = 1000$ (top), $n = 8000$ (middle), $n = 261148$ (bottom). Different lines in the same plot correspond to different $a_n$, i.e. $n^{1/3}/8$, $n^{1/3}/4$, $n^{1/3}/2$, $n^{1/3}$, $n^{1/3} \cdot 2$, $n^{1/3} \cdot 4$ and $n^{1/3} \cdot 8$.

Fig. 4. Nonparametric part smoothing, real 0.9 quantile curve (black dotted line) with respect to $v$, 0.9 quantile smoother (magenta solid line) with corresponding 95% bootstrap uniform confidence band (red dashed lines).

Fig. 5. Boxplots for “low education” (red), “apprenticeship” (blue) and “university” (brown) groups corresponding to different ages.

Fig. 6. $\hat\beta$ corresponding to different quantiles with 6, 13, 25 partitions.


Fig. 7. 95% bootstrap (thick) and asymptotic (thin) uniform confidence bands for 0.20-quantile smoothers w.r.t. 3 different education levels. The “low education”, “apprenticeship” and “university” levels are marked as red dashed, blue dotted and brown dashed–dot lines respectively.

Fig. 8. 95% bootstrap (thick) and asymptotic (thin) uniform confidence bands for 0.20-quantile smoothers w.r.t. 3 different education levels with the oversmoothing bandwidth set as $g/2$, $g/4$, $2g$ and $4g$ (from left to right, top to bottom) respectively. Line styles and colors are the same as in Fig. 7.

Fig. 9. 95% bootstrap (thick) and asymptotic (thin) uniform confidence bands for 0.50-quantile smoothers w.r.t. 3 different education levels. Line styles and colors are the same as in Fig. 7.

Fig. 10. 95% bootstrap (thick) and asymptotic (thin) uniform confidence bands for 0.50-quantile smoothers w.r.t. 3 different education levels with the oversmoothing bandwidth set as $g/2$, $g/4$, $2g$ and $4g$ (from left to right, top to bottom) respectively. Line styles and colors are the same as in Fig. 7.

Following Becker's [1] human capital model, a log transformation is performed first on the hourly real wages (unit: EUR, at year 2000 prices). Fig. 5 displays the boxplots for the “low education”, “apprenticeship” and “university” groups corresponding to different ages. In the data all ages (25–59) are reported as integers and are categorized in one-year groups. We rescaled them to the interval [0, 1] by dividing by 40, with corresponding bandwidths $h$ of 0.041, 0.039, 0.041 for the 0.20, 0.50, 0.80 nonparametric quantile smoothers respectively. Correspondingly, as discussed before, we choose $g = n^{4/45} h$, thus 0.12, 0.11, 0.12 for the corresponding oversmoothers respectively. To detect whether a differential effect for different education levels exists, we compare the corresponding uniform confidence bands: differences between the bands indicate that a differential effect may exist for different education levels in the German labor market for that specific labor group.

Following an application of the partial linear model in Section 3, Fig. 6 displays $\hat\beta$ with respect to different quantiles for 6, 13, and 25 partitions, respectively. At first, the $\hat\beta$ curve is quite surprising, since it is not, as in mean regression, a positive constant, but rather varies a lot, e.g. $\hat\beta(0.20) = 0.026$, $\hat\beta(0.50) = 0.057$ and $\hat\beta(0.80) = 0.061$. Furthermore, it is robust to different numbers of partitions. It seems that the differences between the “low education” and “university” groups are different for different tails of the wage distribution. To judge whether these differences are significant, we use the uniform confidence band techniques discussed in Section 2; the resulting bands are displayed in Figs. 7, 9 and 11, corresponding to the 0.20, 0.50 and 0.80 quantiles respectively.

The 95% uniform confidence bands from bootstrapping for the “low education” group are marked as red dashed lines, while the ones for “apprenticeship” and “university” are displayed as blue dotted and brown dashed–dot lines, respectively. The corresponding asymptotic bands studied in [22] are also added for reference (thin lines with the same style and color); they overlap with the bootstrap bands for large samples such as this one. For the 0.20 quantile in Fig. 7, the bands for “university”, “apprenticeship” and “low education” do not differ significantly from one another although they become progressively lower, which indicates that high education does not equate to significantly higher earnings for the lower tails of wages, while increasing age seems to be the main driving force. For the 0.50 quantile in Fig. 9, the bands for “university” and “low education” differ significantly from one another although not from that for “apprenticeship”. However, for the 0.80 quantile in Fig. 11, all the bands differ significantly (except on the right boundary because of the nonparametric method's boundary effect), resulting from the relatively large $\hat\beta(0.80) = 0.061$, which indicates that high education is significantly associated with higher earnings for the upper tails of wages.
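The comparison rule used above, where two education groups are judged to differ when their uniform confidence bands do not overlap, can be sketched as follows. This is an illustrative sketch only: the helper names `oversmoothing_bandwidth` and `separated_points` and the toy band values are hypothetical. Only the rule $g = n^{4/45} h$, the sample size $n = 128429$ and the bandwidths $h$ come from the text.

```python
def oversmoothing_bandwidth(n, h):
    """Rule used in the text: g = n^(4/45) * h."""
    return n ** (4.0 / 45.0) * h


def separated_points(band_a, band_b):
    """Indices of grid points where two bands do not overlap.

    Each band is a list of (lower, upper) pairs on a common grid.
    """
    result = []
    for k, ((lo_a, up_a), (lo_b, up_b)) in enumerate(zip(band_a, band_b)):
        if up_a < lo_b or up_b < lo_a:
            result.append(k)
    return result


if __name__ == "__main__":
    n = 128429
    for h in (0.041, 0.039, 0.041):
        print(h, round(oversmoothing_bandwidth(n, h), 2))

    # Toy bands (made-up numbers, not from the data) on a three-point age grid.
    low_education = [(1.0, 1.4), (1.2, 1.6), (1.3, 1.7)]
    university = [(1.3, 1.8), (1.7, 2.1), (1.9, 2.4)]
    print(separated_points(low_education, university))
```

Because the bands are uniform over the age range, a gap between two bands at any grid point already suggests a difference at the stated level. Reading the figures as in the text amounts to asking over which ages such gaps occur.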


Fig. 11. 95% bootstrap (thick) and asymptotic (thin) uniform confidence bands for 0.80-quantile smoothers w.r.t. 3 different education levels. The “low education”, “apprenticeship” and “university” levels are marked as red dashed, blue dotted and brown dashed–dot lines respectively.

Fig. 12. 95% bootstrap (thick) and asymptotic (thin) uniform confidence bands for 0.80-quantile smoothers w.r.t. 3 different education levels with the oversmoothing bandwidth set as $g/2$, $g/4$, $2g$ and $4g$ (from left to right, top to bottom) respectively. The corresponding line styles and colors are the same as in Fig. 7.

Coupled with Figs. 7, 9 and 11, Figs. 8, 10 and 12 present the corresponding bootstrap confidence bands constructed using different oversmoothing bandwidths, namely half, quarter, twice and quadruple (from left to right, top to bottom) of the oversmoothing bandwidth $g = n^{4/45} h$ used before. The corresponding asymptotic bands are also added for reference (thin lines with the same style and color). As we can see, in practice, for the typically large labor economic data set, the bootstrap confidence bands are quite robust to the choice of the oversmoothing bandwidth.

If we investigate the explanations for the differences in different tails of the income distribution, maybe the most prominent reason is the rapid development of technology, which has been extensively studied. The point is that technology does not simply increase the demand for upper-end labor relative to that of lower-end labor, but instead asymmetrically affects the bottom and the top of the wage distribution, resulting in its strong asymmetry.

6. Conclusions

In this paper we construct confidence bands for nonparametric quantile estimates of regression functions. The method is based on bootstrapping, where resampling is done from a suitably estimated empirical distribution function (edf) for residuals. It is proven that the bootstrap approximation provides an improvement over the confidence bands constructed via the asymptotic Gumbel distribution. We also propose a partial linear model to handle the case of multidimensional and discrete regressor variables. An economic application considering the labor market differential effect with respect to various education levels is studied. The conclusions from the point of view of quantile regression are consistent with those of the (grouped) mean regression, but in a more careful way in the sense that we provide formal statistical tools to judge these uniformly. The partial linear quantile regression techniques, together with confidence bands, developed in this paper display very interesting findings compared with classic (mean) methods and will bring in more contributions to the differential analysis of the labor market.

Appendix

Proof of Theorem 2.1. We start by proving Eq. (8). Write first $\hat F^{-1}(U_i|X_i) = F^{-1}(U_i|X_i) + \Delta_i$. Fix any $i$ such that $|F^{-1}(U_i|X_i) - F^{-1}(p)| \le S_n\delta_n$, which, by Eq. (1), implies that $|U_i - p| < S_n\delta_n$. Lemma 2.1 gives

$$\max_i |\hat F(S_n^2\delta_n|X_i) - F(S_n^2\delta_n|X_i)| = O_p(S_n\delta_n). \qquad (26)$$

Together with $F(\pm S_n^2\delta_n|X_i) = p \pm O(S_n^2\delta_n)$, again by Eq. (1), we have $\hat F(\pm S_n^2\delta_n|X_i) = p \pm O_p(S_n^2\delta_n)$ and thus

$$\hat F(-S_n^2\delta_n|X_i) = p - O_p(S_n^2\delta_n) < p - S_n\delta_n < U_i < p + S_n\delta_n < p + O_p(S_n^2\delta_n) = \hat F(S_n^2\delta_n|X_i).$$

Since $\hat F(\cdot|X_i)$ is monotone non-decreasing, $|\hat F^{-1}(U_i|X_i)| \le S_n^2\delta_n$, which means, by $S_n^2 = S_n$,

$$|\hat F^{-1}(U_i|X_i)| \le S_n\delta_n. \qquad (27)$$

Apply now Lemma 2.1 again to Eq. (27), and obtain

$$S_n\delta_n \ge |\hat F\{\hat F^{-1}(U_i|X_i)|X_i\} - F\{\hat F^{-1}(U_i|X_i)|X_i\}| = |U_i - F\{F^{-1}(U_i|X_i) + \Delta_i|X_i\}| = |F\{F^{-1}(U_i|X_i)|X_i\} - F\{F^{-1}(U_i|X_i) + \Delta_i|X_i\}| \ge f_0(X_i)|\Delta_i|. \qquad (28)$$

Hence $|\Delta_i| < S_n\delta_n$, and we summarize it as

$$\max_{i:\,|F^{-1}(U_i|X_i) - F^{-1}(p)|\,\dots} |\hat F^{-1}(U_i|X_i) - F^{-1}(U_i|X_i)| = O_p\{S_n\delta_n\}.$$


To show the difference of the two quantile smoothers, we shall reduce the strong convergence of $q_{hi}[\{Y_j^* - l_g(X_j) + l_g(X_j) - l_g(X_i)\}_{j=1}^n] - q_{hi}[\{Y_j^\# - l(X_j) + l(X_j) - l(X_i)\}_{j=1}^n]$, for any $i$, to an application of the strong convergence of $G(\theta, X_i)$ to $G_n(\theta, X_i)$, uniformly in $\theta$, for any $i$. Under assumptions (A7) and (A8), in a similar spirit to Härdle and Song [22], we get

$$\max_i |l_h^*(X_i) - l_g(X_i) - l_h^\#(X_i) + l(X_i)| = O_p(\delta_n).$$

To show the supremum of the bootstrap approximation error, without loss of generality, based on assumption (A1), we reorder the original observations $\{X_i, Y_i\}_{i=1}^n$ such that $X_1 \le X_2 \le \dots \le X_n$. First decompose:

$$\sup_{x \in J^*} |l_h^*(x) - l_g(x) - l_h^\#(x) + l(x)| = \max_i |l_h^*(X_i) - l_g(X_i) - l_h^\#(X_i) + l(X_i)| + \max_i \sup_{x \in [X_i, X_{i+1}]} |l_h^*(x) - l_g(x) - l_h^\#(x) + l(x)|. \qquad (32)$$

From assumption (A1) we know $l'(\cdot) \le \lambda_1$ and $\max_i (X_{i+1} - X_i) = O_p(S_n/n)$. By the mean value theorem, we conclude that the second term of (32) is of a lower order than the first term. Together with Eq. (12) we have

$$\sup_{x \in J^*} |l_h^*(x) - l_g(x) - l_h^\#(x) + l(x)| = O\{\max_i |l_h^*(X_i) - l_g(X_i) - l_h^\#(X_i) + l(X_i)|\} = O_p(\delta_n),$$

which means that the supremum of the approximation error over all $x$ is of the same order as the maximum over the discrete observed $X_i$. □

Proof of Theorem 2.2. The proof of (16) uses methods related to those in the proof of Theorem 3 of Härdle and Marron [20], so only the main steps are explicitly given. The first step is a bias-variance decomposition,

$$E[\{\hat b_{h,g}(x) - b_h(x)\}^2 \,|\, X_1, \dots, X_n] = V_n + B_n^2, \qquad (33)$$

where

$$V_n = \mathrm{Var}\{\hat b_{h,g}(x) \,|\, X_1, \dots, X_n\}, \qquad B_n = E\{\hat b_{h,g}(x) - b_h(x) \,|\, X_1, \dots, X_n\}.$$

Following the uniform Bahadur representation techniques for quantile regression as in Theorem 3.2 of Kong et al. [27], we have the following linear approximation for the quantile smoother as a local polynomial smoother corresponding to a specific loss function:

$$l_h^\#(x) - l(x) = L_n + O_p(L_n),$$

where

$$L_n = \frac{n^{-1} \sum_{i=1}^n K_h(x - X_i)\,\psi\{Y_i - l(x)\}}{f\{l(x)|x\}\, f_X(x)}$$

for

$$\psi(u) = p\,\mathbf{1}\{u \in (0, \infty)\} - (1 - p)\,\mathbf{1}\{u \in (-\infty, 0)\} = p - \mathbf{1}\{u \in (-\infty, 0)\},$$

and we use the expansions and kernel properties

$$l(x - t) - l(x) = l'(x)(-t) + l''(x)t^2 + O(t^2),$$
$$\{l(x - t) - l(x)\}' = l''(x)(-t) + l'''(x)t^2 + O(t^2),$$
$$f(x - t) = f(x) + f'(x)(-t) + f''(x)t^2 + O(t^2),$$
$$f'(x - t) = f'(x) + f''(x)(-t) + f'''(x)t^2 + O(t^2),$$
$$\int K_h(t)\,t\,dt = 0, \qquad \int K_h(t)\,t^2\,dt = h^2 d_K, \qquad \int K_h(t)\,O(t^2)\,dt = O(h^2).$$

Then we have

$$B_n = B_{n1} + O(B_{n1}),$$

where

$$B_{n1} = \frac{\int K_g(x - t)U_h(t)\,dt - U_h(x)}{f_X(x)\, f\{l(x)|x\}}$$

for

$$U_h(x) = \int K_h(x - s)\,\psi\{l(s) - l(x)\}\, f(s)\,ds = \int K_h(t)\,\psi\{l(x - t) - l(x)\}\, f(x - t)\,dt.$$

By differentiation, a Taylor expansion and properties of the kernel $K$ (see assumption (A2)),

$$U_h'(x) = \int K_h(t)\,[\psi'\{l(x - t) - l(x)\}' f(x - t) + \psi\{l(x - t) - l(x)\} f'(x - t)]\,dt.$$

Here $\psi'$ is the derivative of $\psi$ except at the point 0, which actually does not matter since there is integration afterwards. Collecting terms, we get

$$U_h'(x) = \int K_h(t)\{\psi' l''(x) f_X'(x) t^2 + \psi' l''' f_X(x) t^2 + a f'''(x) t^2 + O(t^2)\}\,dt = \int K_h(t)\{C_0 t^2 + o(t^2)\}\,dt = h^2 d_K \cdot C_0 + O(h^2),$$

where $a$ is a constant with $|a| < 1$ and $C_0 = \psi' l''(x) f_X'(x) + \psi' l''' f_X(x) + a f'''(x)$.

Hence, by another substitution and Taylor expansion, for the first term in the numerator of $B_{n1}$, we have

$$B_{n2} = g^2 h^2 (d_K)^2 \cdot C_0 + O(g^2 h^2).$$

Thus, along almost all sample sequences,

$$B_n^2 = C_1 g^4 h^4 + O(g^4 h^4) \qquad (34)$$

for $C_1 = (d_K)^4 C_0^2 / [f_X^2(x)\, f^2\{l(x)|x\}]$.

For the variance term, calculation in a similar spirit shows that

$$V_n = V_{n1} + O(V_{n1}),$$

where

$$V_{n1} = \frac{\int K_g^2(x - t) W_h(t)\,dt}{f_X(x)\, f\{l(x)|x\}} - \left[\frac{\int K_g(x - t) U_h(t)\,dt}{f_X(x)\, f\{l(x)|x\}}\right]^2$$

for

$$W_h(x) = \int K_h^2(x - s)\,\psi\{l(s) - l(x)\}^2 f(s)\,ds = \int K_h^2(t)\,\psi\{l(x - t) - l(x)\}^2 f(x - t)\,dt.$$

Hence, by Taylor expansion, collecting terms and similar calculation, we have

$$V_n = n^{-1} h^4 g^{-5} C_2 + O(n^{-1} h^4 g^{-5}) \qquad (35)$$

for a constant $C_2$. This, together with (33) and (34), completes the proof of Theorem 2.2. □

Proof of Theorem 3.1.
In the case where the function $l$ is known, the estimate $\hat\beta_I$ is

$$\hat\beta_I = \arg\min_\beta \sum_{i=1}^n \psi\{Y_i - l(V_i) - U_i^\top \beta\}.$$

Since $l$ is unknown, in each of these small intervals $I_{ni}$, $l(V_i)$ could be regarded as a constant $\alpha = l(m_{ni})$ for some $i$ whose corresponding interval $I_{ni}$ covers $V_i$. From assumption (A1), we know that $|l(V_i) - \alpha_i| \le \lambda_1 b_n < \infty$. If we define our first step estimate $\hat\beta_i$ inside each small interval as

$$(\hat\alpha_i, \hat\beta_i) = \arg\min_{\alpha, \beta} \sum \psi(Y_i - \alpha - U_i^\top \beta),$$

then $|\{Y_i - l(V_i) - U_i^\top \beta\} - (Y_i - \alpha - U_i^\top \beta)| \le \lambda_1 b_n < \infty$ indicates that we could treat $\hat\beta_i$ as $\hat\beta_I$ inside each partition. We use $d_i$ to denote the number of observations inside partition $I_{ni}$ (based on the i.i.d. assumption as in assumption (A1), on average $d_i = n/a_n$). For each of the $\hat\beta_i$'s inside interval $I_{ni}$, various parametric quantile regression works, e.g. the convex function rule in [31,24], yield

$$\sqrt{d_i}\,(\hat\beta_i - \beta) \xrightarrow{L} N\{0,\; p(1 - p)\, D_i'^{-1}(p)\, C_i'\, D_i'^{-1}(p)\} \qquad (36)$$

with the matrices $C_i' = d_i^{-1} \sum_{i=1}^{d_i} U_i^\top U_i$ and $D_i'(p) = d_i^{-1} \sum_{i=1}^{d_i} f\{l(V_i)|V_i\}\, U_i^\top U_i$.

To get $\hat\beta$, our second step is to take the weighted mean of $\hat\beta_1, \dots, \hat\beta_{a_n}$ as

$$\hat\beta = \arg\min_\beta \sum_{i=1}^{a_n} d_i (\hat\beta_i - \beta)^2 = \sum_{i=1}^{a_n} d_i \hat\beta_i / n.$$

Note that under this construction, $\hat\beta_1, \dots, \hat\beta_{a_n}$ are independent but not identical. Thus we intend to use the Lindeberg condition for the central limit theorem. To this end, we use $s_n^2$ to denote $\mathrm{Var}(\sum_{i=1}^{a_n} d_i \hat\beta_i / n)$, and we need to further check whether the following “Lindeberg condition” holds:

$$\lim_{a_n \to \infty} \frac{1}{s_n^2} \sum_{i=1}^{a_n} \int_{(|d_i \hat\beta_i / n - \beta| > \varepsilon s_n)} (\hat\beta_i - \beta)^2\, dF = 0, \quad \text{for all } \varepsilon > 0. \qquad (37)$$

Since, by (36),

$$\mathrm{Var}\Big\{\sum_{i=1}^{a_n} d_i (\hat\beta_i - \beta)/n\Big\} = \sum_{i=1}^{a_n} \frac{d_i}{n^2}\, p(1 - p)\, D_i'^{-1}(p)\, C_i'\, D_i'^{-1}(p) \approx \frac{1}{n}\, p(1 - p) \Big[\frac{1}{n}\sum_{j=1}^n f\{l(V_j)|V_j\} U_j^\top U_j\Big]^{-1} \Big[\frac{1}{n}\sum_{i=1}^n U_i^\top U_i\Big] \Big[\frac{1}{n}\sum_{j=1}^n f\{l(V_j)|V_j\} U_j^\top U_j\Big]^{-1} \overset{\mathrm{def}}{=} \frac{1}{n}\, p(1 - p)\, D_n^{-1} C_n D_n^{-1},$$

where $D_n = \frac{1}{n} \sum_{j=1}^n f\{l(V_j)|V_j\}\, U_j^\top U_j$ and $C_n = \frac{1}{n} \sum_{i=1}^n U_i^\top U_i$, together with the normality of $\hat\beta_i$ as in (36) and properties of the tail of the normal distribution, e.g. Exe. 14.3–14.4 of Borak et al.
[3], (37) follows.

Thus as $n, a_n \to \infty$ (although $a_n$ grows at a lower rate than $n$), together with $C = \mathrm{plim}_{n \to \infty} C_n$ and $D = \mathrm{plim}_{n \to \infty} D_n$, we have

$$\sqrt{n}\,(\hat\beta - \beta) \xrightarrow{L} N\{0,\; p(1 - p)\, D^{-1} C D^{-1}\}. \qquad (38)$$

□

References

[1] G. Becker, Human Capital: A Theoretical and Empirical Analysis with Special Reference to Education, third ed., The University of Chicago Press, 1994.
[2] A. Belloni, V. Chernozhukov, L1-penalized quantile regression in high-dimensional sparse models, Annals of Statistics 39 (1) (2011) 82–130.
[3] S. Borak, W. Härdle, B. Lopez, Statistics of Financial Markets Exercises and Solutions, Springer-Verlag, Heidelberg, 2010.
[4] M. Buchinsky, Quantile regression, Box–Cox transformation model, and the US wage structure, 1963–1987, Journal of Econometrics 65 (1995) 109–154.
[5] Z.W. Cai, Regression quantiles for time series, Econometric Theory 18 (2002) 169–192.
[6] V. Chernozhukov, S. Lee, A.M. Rosen, Intersection bounds: estimation and inference, CEMMAP Working Paper, 19/09, 2009.
[7] J.C. Day, E.C. Newburger, The big payoff: educational attainment and synthetic estimates of work-life earnings. Special studies, current population reports, Statistical Report, US Department of Commerce Economics and Statistics Administration, US Census Bureau, 2002, pp. 23–210.
[8] L. Denby, Smooth regression functions. Statistical Report 26, AT&T Bell Laboratories, 1986.
[9] C. Dustmann, J. Ludsteck, U. Schönberg, Revisiting the German wage structure, The Quarterly Journal of Economics 124 (2) (2009) 843–881.
[10] J. Fan, T.C. Hu, Y.K. Truong, Robust nonparametric function estimation, Scandinavian Journal of Statistics 21 (1994) 433–446.
[11] J. Fan, Q. Yao, H. Tong, Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems, Biometrika 83 (1996) 189–206.
[12] R.A. Fisher, L.H.C. Tippett, Limiting forms of the frequency distribution of the largest or smallest member of a sample, Proceedings of the Cambridge Philosophical Society 24 (1928) 180–190.
[13] J. Franke, P. Mwita, Nonparametric estimates for conditional quantiles of time series, Report in Wirtschaftsmathematik 87, University of Kaiserslautern, 2003.
[14] P.J. Green, B.S. Yandell, Semi-parametric generalized linear models, in: Proceedings 2nd International GLIM Conference, in: Lecture Notes in Statistics, vol. 32, Springer, New York, 1985, pp. 44–55.
[15] J. Hahn, Bootstrapping quantile regression estimators, Econometric Theory 11 (1) (1995) 105–121.
[16] P. Hall, On convergence rates of suprema, Probability Theory and Related Fields 89 (1991) 447–455.
[17] P. Hall, R. Wolff, Q. Yao, Methods for estimating a conditional distribution function, Journal of the American Statistical Association 94 (1999) 154–163.
[18] W. Härdle, P. Janssen, R. Serfling, Strong uniform consistency rates for estimators of conditional functionals, Annals of Statistics 16 (1988) 1428–1429.
[19] W. Härdle, S. Luckhaus, Uniform consistency of a class of regression function estimators, Annals of Statistics 12 (1984) 612–623.
[20] W. Härdle, J. Marron, Bootstrap simultaneous error bars for nonparametric regression, Annals of Statistics 19 (1991) 778–796.
[21] W. Härdle, M. Müller, S. Sperlich, A. Werwatz, Nonparametric and Semiparametric Models, Springer-Verlag, Heidelberg, 2004.
[22] W. Härdle, S. Song, Confidence bands in quantile regression, Econometric Theory 26 (2010) 1–22, with corrections forthcoming or obtained from the authors.
[23] J.L. Horowitz, Bootstrap methods for median regression models, Econometrica 66 (6) (1998) 1327–1351.
[24] K. Knight, Comparing conditional quantile estimators: first and second order considerations, Unpublished Manuscript, 2001.
[25] R. Koenker, G.W. Bassett, Regression quantiles, Econometrica 46 (1978) 33–50.
[26] R. Koenker, K.F. Hallock, Quantile regression, The Journal of Economic Perspectives 15 (4) (2001) 143–156.
[27] E. Kong, O. Linton, Y. Xia, Uniform Bahadur representation for local polynomial estimates of M-regression and its application to the additive model, Econometric Theory 26 (2010) 1529–1564.
[28] C. Kuan, J. Yeh, Y. Hsu, Assessing value at risk with care, the conditional autoregressive expectile models, Journal of Econometrics 150 (2009) 261–270.
[29] H. Liang, R. Li, Variable selection for partially linear models with measurement errors, Journal of the American Statistical Association 104 (485) (2009) 234–248.
[30] W. Newey, J. Powell, Asymmetric least squares estimation and testing, Econometrica 55 (1987) 816–847.
[31] D. Pollard, Asymptotics for least absolute deviation estimators, Econometric Theory 7 (1991) 186–199.
[32] P.M. Robinson, Semiparametric econometrics: a survey, Journal of Applied Econometrics 3 (1988) 35–51.
[33] D. Ruppert, S.J. Sheather, M.P. Wand, An effective bandwidth selector for local least squares regression, Journal of the American Statistical Association 90 (432) (1995) 1257–1270.
[34] P.E. Speckman, Regression analysis for partially linear models, Journal of the Royal Statistical Society, Series B 50 (1988) 413–436.
[35] K. Yu, M.C. Jones, A comparison of local constant and local linear regression quantile estimation, Computational Statistics and Data Analysis 25 (1997) 159–166.
[36] K. Yu, M.C. Jones, Local linear quantile regression, Journal of the American Statistical Association 93 (1998) 228–237.


Journal of Multivariate Analysis 105 (2012) 164–175

Difference based ridge and Liu type estimators in semiparametric regression models ✩

Esra Akdeniz Duran a,1, Wolfgang Karl Härdle b, Maria Osipenko c,∗

a Department of Statistics, Gazi University, Turkey
b Center for Applied Statistics & Economics, Humboldt-Universität zu Berlin, Germany
c CASE, School of Business and Economics, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Germany

Article history: Received 4 March 2011. Available online 7 September 2011.
AMS subject classifications: 62G08, 62J07
Keywords: Difference based estimator; Differencing estimator; Differencing matrix; Liu estimator; Liu type estimator; Multicollinearity; Ridge regression estimator; Semiparametric model

Abstract

We consider a difference based ridge regression estimator and a Liu type estimator of the regression parameters in the partial linear semiparametric regression model, $y = X\beta + f + \varepsilon$. Both estimators are analyzed and compared in the sense of mean-squared error. We consider the case of independent errors with equal variance and give conditions under which the proposed estimators are superior to the unbiased difference based estimation technique. We extend the results to account for heteroscedasticity and autocovariance in the error terms. Finally, we illustrate the performance of these estimators with an application to the determinants of electricity consumption in Germany.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

Semiparametric partial linear models have received considerable attention in statistics and econometrics. They have a wide range of applications, from biomedical studies to economics. In these models, some explanatory variables have a linear effect on the response while others enter nonparametrically. Consider the semiparametric regression model:

$$y_i = x_i^\top \beta + f(t_i) + \varepsilon_i, \quad i = 1, \ldots, n \qquad (1)$$

where the $y_i$ are observations at $t_i$, $0 \le t_1 \le t_2 \le \cdots \le t_n \le 1$, and $x_i^\top = (x_{i1}, x_{i2}, \ldots, x_{ip})$ are known $p$-dimensional vectors with $p \le n$. In many applications, the $t_i$ are values of an extra univariate "time" variable at which the responses $y_i$ are observed. In the case $t_i \in \mathbb{R}^k$, $t_i = (t_{1i}, \ldots, t_{ki})^\top$, the triples $(y_1, x_1, t_1), \ldots, (y_n, x_n, t_n)$ should be ordered using one of the algorithms mentioned in [30], Appendix A, or in [8, Section 2.2].

✩ This research was supported by Deutsche Forschungsgemeinschaft through the SFB 649 'Economic Risk'.
∗ Corresponding author. E-mail addresses: esraakdeniz@gmail.com (E. Akdeniz Duran), maria.osipenko@wiwi.hu-berlin.de (M. Osipenko).
1 Dr. Esra Akdeniz Duran was a research associate at Humboldt-Universität zu Berlin, Germany during this research.
doi:10.1016/j.jmva.2011.08.018


In Eq. (1), $\beta = (\beta_1, \ldots, \beta_p)^\top$ is an unknown $p$-dimensional parameter vector, $f(\cdot)$ is an unknown smooth function and the $\varepsilon_i$ are independent and identically distributed random errors with $E(\varepsilon \mid x, t) = 0$ and $\mathrm{Var}(\varepsilon \mid x, t) = \sigma^2$. We shall call $f(t)$ the smooth part of the model and assume that it represents a smooth unparameterized functional relationship.

The goal is to estimate the unknown parameter vector $\beta$ and the nonparametric function $f(t)$ from the data $\{y_i, x_i, t_i\}_{i=1}^{n}$. In vector/matrix notation, (1) is written as

$$y = X\beta + f + \varepsilon \qquad (2)$$

where $y = (y_1, \ldots, y_n)^\top$, $X = (x_1, \ldots, x_n)^\top$, $f = \{f(t_1), \ldots, f(t_n)\}^\top$, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^\top$.

Semiparametric models are by design more flexible than standard linear regression models since they combine both parametric and nonparametric components. There exist various goodness-of-fit tests to identify the nonparametric part in this kind of model; see [8] and the references therein. Estimation techniques for semiparametric partially linear models are based on different nonparametric regression procedures. The most important approaches to estimate $\beta$ and $f$ are given in [12,4,7,6,5,14,24,15,33].

In practice, researchers often encounter the problem of multicollinearity. In the case of multicollinearity, the $(p \times p)$ matrix $X^\top X$ has one or more small eigenvalues; the estimates of the regression coefficients can therefore have large variances, and the least squares estimator performs poorly. Hoerl and Kennard [17] proposed the ridge regression estimator, and it has become the most common method to overcome this particular weakness of the least squares estimator. For the purpose of this paper, we will employ the biased estimator that was proposed by Liu [20] to combat multicollinearity. The Liu estimator combines the Stein [26] estimator with the ridge regression estimator; see also [1,13].

The condition number is a measure of multicollinearity. If $X^\top X$ is ill-conditioned with a large condition number, the ridge regression estimator or Liu estimator can be used to estimate $\beta$ [21]. We consider difference based ridge and Liu type estimators in comparison to the unbiased difference based approach. We give theoretical conditions that determine superiority among the estimation techniques in the mean squared error matrix sense.

We use data on monthly electricity consumption and its determinants (income, electricity and gas prices, temperature) for Germany. The purpose is to understand electricity consumption as a linear function of income and price and a nonlinear function of temperature; a semiparametric approach is therefore necessary here. The data reveal a high condition number of 20.5; we therefore expect a more precise estimation with ridge or Liu type estimators. We show how our theoretically derived conditions can be implemented for a given data set and be used to determine the appropriate biased estimation technique.

The paper is organized as follows. In Section 2, the model and the differencing estimator are defined. We introduce difference based ridge and Liu type estimators in Section 3.
In Section 4, the differencing estimator proposed by Yatchew [30] and the difference based Liu type estimator are compared in terms of the mean squared error. In Section 5, both biased regression methodologies in semiparametric regression models are compared in terms of the mean squared error. Section 6 relaxes the assumption of i.i.d. errors and replicates the results of the previous sections in the presence of heteroscedasticity and autocorrelation. Section 7 gives a real data example to show the performance of the proposed estimators.

2. The model and differencing estimator

In this section, we introduce a difference based technique for the estimation of the linear coefficient vector in a semiparametric regression. This technique has been used to remove the nonparametric component in the partially linear model by various authors (e.g. [30,32,19,3]).

Consider the semiparametric regression model (2). Let $d = (d_0, d_1, \ldots, d_m)^\top$ be an $(m + 1)$-vector, where $m$ is the order of differencing and $d_0, d_1, \ldots, d_m$ are differencing weights that minimize

$$\sum_{k=1}^{m} \Bigl( \sum_{j=1}^{m-k} d_j d_{k+j} \Bigr)^{2},$$

such that

$$\sum_{j=0}^{m} d_j = 0 \quad \text{and} \quad \sum_{j=0}^{m} d_j^2 = 1 \qquad (3)$$

are satisfied.

Let us define the $(n - m) \times n$ differencing matrix $D$ to have first and last rows $(d^\top, 0^\top_{n-m-1})$ and $(0^\top_{n-m-1}, d^\top)$ respectively, with $i$-th row $(0_i, d^\top, 0^\top_{n-m-i-1})$, $i = 1, \ldots, (n - m - 1)$, where $0_r$ indicates an $r$-vector of all zero elements:

$$D = \begin{pmatrix}
d_0 & d_1 & d_2 & \cdots & d_m & 0 & \cdots & \cdots & 0 \\
0 & d_0 & d_1 & d_2 & \cdots & d_m & 0 & \cdots & 0 \\
\vdots & & & & & & & & \vdots \\
0 & \cdots & \cdots & d_0 & d_1 & d_2 & \cdots & d_m & 0 \\
0 & 0 & \cdots & \cdots & d_0 & d_1 & d_2 & \cdots & d_m
\end{pmatrix}.$$
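The construction of $D$ can be sketched in a few lines of plain Python. This is only an illustration: the helper names `differencing_matrix`, `satisfies_constraints` and `apply` are hypothetical, and the first-order weights $(1/\sqrt{2}, -1/\sqrt{2})$ are an assumed example that satisfies (3), not the fifth-order weights used later in the paper.

```python
import math


def differencing_matrix(d, n):
    """Return the (n - m) x n differencing matrix D as a list of rows.

    d holds the m + 1 weights (d_0, ..., d_m); row i carries the weights
    shifted i places to the right, padded with zeros on both sides.
    """
    m = len(d) - 1
    if n <= m:
        raise ValueError("need n > m")
    return [[0.0] * i + list(d) + [0.0] * (n - m - 1 - i) for i in range(n - m)]


def satisfies_constraints(d, tol=1e-12):
    """Check the two constraints in (3): weights sum to 0, squares sum to 1."""
    return abs(sum(d)) < tol and abs(sum(w * w for w in d) - 1.0) < tol


def apply(D, v):
    """Matrix-vector product D v for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in D]


# Illustrative first-order weights (m = 1); an assumption, not the paper's choice.
d = (1 / math.sqrt(2), -1 / math.sqrt(2))
D = differencing_matrix(d, n=5)

# A constant "smooth part" is wiped out exactly, since the weights sum to zero.
Df_constant = apply(D, [3.0] * 5)
```

Because every row of $D$ sums to zero, a constant $f$ is removed exactly; for a smooth $f$ on a fine grid, neighbouring values are close, so $Df$ is small rather than exactly zero.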


Applying the differencing matrix to (2) permits direct estimation of the parametric effect. Eubank et al. [6] showed that the parameter vector in (2) can be estimated with parametric efficiency. If $f$ is an unknown function with bounded first derivative, then $Df$ is essentially 0, so that applying the differencing matrix we have

$$Dy = DX\beta + Df + D\varepsilon \approx DX\beta + D\varepsilon, \qquad \tilde y \approx \tilde X \beta + \tilde\varepsilon \qquad (4)$$

where $\tilde y = Dy$, $\tilde X = DX$ and $\tilde\varepsilon = D\varepsilon$. Constraints (3) ensure that the nonparametric effect is removed and $\mathrm{Var}(\tilde\varepsilon) = \mathrm{Var}(\varepsilon) = \sigma^2$. With (4), a simple differencing estimator of the parameter $\beta$ in the semiparametric regression model results:

$$\hat\beta_{(0)} = \{(DX)^\top (DX)\}^{-1} (DX)^\top Dy = (\tilde X^\top \tilde X)^{-1} \tilde X^\top \tilde y. \qquad (5)$$

Thus, differencing allows one to perform inferences on $\beta$ as if there were no nonparametric component $f$ in model (2) [9]. We will also use the modified estimator of $\sigma^2$ proposed by Eubank et al. [7],

$$\hat\sigma^2 = \frac{\tilde y^\top (I - P^\perp)\, \tilde y}{\operatorname{tr}\{D^\top (I - P^\perp) D\}} \qquad (6)$$

with $P^\perp = \tilde X (\tilde X^\top \tilde X)^{-1} \tilde X^\top$, $I$ the $(p \times p)$ identity matrix and $\operatorname{tr}(\cdot)$ denoting the trace function for a square matrix.

3. Difference based ridge and Liu type estimator

As an alternative to $\hat\beta_{(0)}$ in (5), [27] propose:

$$\hat\beta_{(1)}(k) = (\tilde X^\top \tilde X + kI)^{-1} \tilde X^\top \tilde y, \quad k \ge 0;$$

here $k$ is the ridge-biasing parameter selected by the researcher. We call $\hat\beta_{(1)}(k)$ a difference based ridge regression estimator of the semiparametric regression model.

From the least squares perspective, the coefficients $\beta$ are chosen to minimize

$$(\tilde y - \tilde X\beta)^\top (\tilde y - \tilde X\beta). \qquad (7)$$

Adding to the least squares objective (7) a penalizing function of the squared norm $\|\eta\hat\beta_{(0)} - \beta\|^2$ for the vector of regression coefficients yields a conditional objective:

$$L = (\tilde y - \tilde X\beta)^\top (\tilde y - \tilde X\beta) + (\eta\hat\beta_{(0)} - \beta)^\top (\eta\hat\beta_{(0)} - \beta). \qquad (8)$$

Minimizing (8) with respect to $\beta$, we obtain the estimator $\hat\beta_{(2)}(\eta)$, an alternative to $\hat\beta_{(0)}$ in (5):

$$\hat\beta_{(2)}(\eta) = (\tilde X^\top \tilde X + I)^{-1} (\tilde X^\top \tilde y + \eta\hat\beta_{(0)}), \qquad (9)$$

where $\eta$, $0 \le \eta \le 1$, is a biasing parameter, and when $\eta = 1$, $\hat\beta_{(2)}(\eta) = \hat\beta_{(0)}$. The formal resemblance between (9) and the Liu estimator motivated [1,18,29] to call it the difference based Liu type estimator of the semiparametric regression model.

4. Mean squared error matrix (MSEM) comparison of $\hat\beta_{(0)}$ with $\hat\beta_{(2)}(\eta)$

In this section, the objective is to examine the difference of the mean squared error matrices of $\hat\beta_{(0)}$ and $\hat\beta_{(2)}(\eta)$. We note that for any estimator $\hat\beta$ of $\beta$, its mean squared error matrix (MSEM) is defined as $\mathrm{MSEM}(\hat\beta) = \mathrm{Cov}(\hat\beta) + \mathrm{Bias}(\hat\beta)\,\mathrm{Bias}(\hat\beta)^\top$, where $\mathrm{Cov}(\hat\beta)$ denotes the variance–covariance matrix and $\mathrm{Bias}(\hat\beta) = E(\hat\beta) - \beta$ is the bias vector. The expected value of $\hat\beta_{(2)}(\eta)$ can be written as

$$E\{\hat\beta_{(2)}(\eta)\} = \beta - (1 - \eta)(\tilde X^\top \tilde X + I)^{-1}\beta.$$

The bias of $\hat\beta_{(2)}(\eta)$ is given as

$$\mathrm{Bias}\{\hat\beta_{(2)}(\eta)\} = -(1 - \eta)(\tilde X^\top \tilde X + I)^{-1}\beta. \qquad (10)$$

Denoting $F_\eta = (\tilde X^\top \tilde X + I)^{-1}(\tilde X^\top \tilde X + \eta I)$ and observing that $F_\eta$ and $(\tilde X^\top \tilde X)^{-1}$ are commutative, we may write $\hat\beta_{(2)}(\eta)$ as

$$\hat\beta_{(2)}(\eta) = F_\eta \hat\beta_{(0)} = F_\eta (\tilde X^\top \tilde X)^{-1} \tilde X^\top \tilde y = (\tilde X^\top \tilde X)^{-1} F_\eta \tilde X^\top \tilde y.$$
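To make the three estimators concrete, the following NumPy sketch computes $\hat\beta_{(0)}$ from (5), the ridge estimator $\hat\beta_{(1)}(k)$ and the Liu type estimator $\hat\beta_{(2)}(\eta)$ from (9) on simulated data. Everything about the data (sample size, coefficients, the sine-shaped $f$, first-order differencing) is an assumption made for illustration, and the function names are hypothetical.

```python
import numpy as np


def diff_matrix(d, n):
    """(n - m) x n differencing matrix with the weights d on each row."""
    m = len(d) - 1
    D = np.zeros((n - m, n))
    for i in range(n - m):
        D[i, i:i + m + 1] = d
    return D


def difference_estimators(X, y, D, k=0.0, eta=1.0):
    """Difference based OLS, ridge and Liu type estimates of beta."""
    Xt, yt = D @ X, D @ y                       # X-tilde and y-tilde
    G = Xt.T @ Xt                               # X-tilde' X-tilde
    I = np.eye(G.shape[0])
    b0 = np.linalg.solve(G, Xt.T @ yt)                   # eq. (5)
    b1 = np.linalg.solve(G + k * I, Xt.T @ yt)           # ridge estimator
    b2 = np.linalg.solve(G + I, Xt.T @ yt + eta * b0)    # eq. (9)
    return b0, b1, b2


# Simulated data, purely illustrative (not the electricity data of Section 7).
rng = np.random.default_rng(0)
n = 60
t = np.sort(rng.uniform(size=n))                # ordered nonparametric variable
X = rng.normal(size=(n, 2))
beta = np.array([1.0, -2.0])
y = X @ beta + np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=n)

D = diff_matrix([2 ** -0.5, -(2 ** -0.5)], n)   # assumed first-order weights
b0, b1, b2 = difference_estimators(X, y, D, k=0.01, eta=0.95)
```

Setting `k=0.0` and `eta=1.0` reproduces $\hat\beta_{(0)}$ from both biased estimators, which is a convenient sanity check; the identity $\hat\beta_{(2)}(\eta) = F_\eta\hat\beta_{(0)}$ can be verified the same way.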


5. MSEM comparison of $\hat\beta_{(1)}(k)$ and $\hat\beta_{(2)}(\eta)$

Let us now compare the MSEM performance of

$$\hat\beta_{(1)}(k) = (\tilde X^\top \tilde X + kI)^{-1} \tilde X^\top \tilde y = S_k \tilde X^\top D y = A_1 y \qquad (17)$$

with

$$\hat\beta_{(2)}(\eta) = (\tilde X^\top \tilde X + I)^{-1} (\tilde X^\top \tilde y + \eta\hat\beta_{(0)}) = (\tilde X^\top \tilde X)^{-1} (\tilde X^\top \tilde X + I)^{-1} (\tilde X^\top \tilde X + \eta I) \tilde X^\top \tilde y = U F_\eta \tilde X^\top D y = A_2 y. \qquad (18)$$

The MSEM of the difference based ridge regression estimator $\hat\beta_{(1)}(k)$ is given by

$$\mathrm{MSEM}\{\hat\beta_{(1)}(k)\} = \mathrm{Cov}\{\hat\beta_{(1)}(k)\} + \mathrm{Bias}\{\hat\beta_{(1)}(k)\}\,\mathrm{Bias}\{\hat\beta_{(1)}(k)\}^\top = S_k (\sigma^2 S + k^2 \beta\beta^\top) S_k^\top = \sigma^2 (A_1 A_1^\top) + d_1 d_1^\top,$$

where $S_k = (\tilde X^\top \tilde X + kI)^{-1}$ and $d_1 = \mathrm{Bias}\{\hat\beta_{(1)}(k)\} = -k S_k \beta$; see [27]. The MSEM in (16) may be written as

$$\mathrm{MSEM}\{\hat\beta_{(2)}(\eta)\} = \sigma^2 (A_2 A_2^\top) + d_2 d_2^\top,$$

with $d_2 = \mathrm{Bias}\{\hat\beta_{(2)}(\eta)\} = -(1 - \eta)(U^{-1} + I)^{-1}\beta$.

Define

$$\Delta_3 = \mathrm{MSEM}\{\hat\beta_{(1)}(k)\} - \mathrm{MSEM}\{\hat\beta_{(2)}(\eta)\} = \sigma^2 (A_1 A_1^\top - A_2 A_2^\top) + (d_1 d_1^\top - d_2 d_2^\top). \qquad (19)$$

For the proofs below we employ the following lemma.

Lemma 5.1 (Trenkler and Toutenburg [28]). Let $\hat\beta_{(j)} = A_j y$, $j = 1, 2$, be two linear estimators of $\beta$. Suppose the difference $\mathrm{Cov}(\hat\beta_{(1)}) - \mathrm{Cov}(\hat\beta_{(2)})$ of the covariance matrices of the estimators $\hat\beta_{(1)}$ and $\hat\beta_{(2)}$ is positive definite. Then $\mathrm{MSEM}(\hat\beta_{(1)}) - \mathrm{MSEM}(\hat\beta_{(2)})$ is positive definite if and only if $d_2^\top \{\mathrm{Cov}(\hat\beta_{(1)}) - \mathrm{Cov}(\hat\beta_{(2)}) + d_1 d_1^\top\}^{-1} d_2 < 1$.

Theorem 5.1. The sampling variance of $\hat\beta_{(2)}(\eta)$ is smaller than that of $\hat\beta_{(1)}(k)$ if and only if $\lambda_{\min}(G_2^{-1} G_1) > 1$, where $\lambda_{\min}$ is the minimum eigenvalue of $G_2^{-1} G_1$ and $G_j = \sigma^2 A_j A_j^\top$, $j = 1, 2$.

Proof. Consider the difference

$$\Delta^* = \mathrm{Cov}\{\hat\beta_{(1)}(k)\} - \mathrm{Cov}\{\hat\beta_{(2)}(\eta)\} = \sigma^2 (A_1 A_1^\top - A_2 A_2^\top) = G_1 - G_2$$

with $G_1 = (D^\top \tilde X W_k U)^\top (D^\top \tilde X W_k U) = V^\top V$, $W_k = I + kU$ and $G_2 = (\tilde X F_\eta^\top U)^\top (\tilde X F_\eta^\top U)$. Since $\operatorname{rank}(V) = p < n - m$, $G_1$ is a $(p \times p)$ positive definite matrix and $G_2$ is a symmetric matrix. Hence, a nonsingular matrix $O$ exists such that $O^\top G_1 O = I$ and $O^\top G_2 O = \Lambda$, with $\Lambda$ a diagonal matrix whose diagonal elements are the roots $\lambda$ of the polynomial equation $|G_1 - \lambda G_2| = 0$ (see [16, p. 563] or [25, p. 160]). Thus, we may write $\Delta^* = (O^\top)^{-1} (O^\top G_1 O - O^\top G_2 O) O^{-1} = (O^\top)^{-1} (\Lambda - I) O^{-1}$ or $O^\top \Delta^* O = \Lambda - I$. If $G_1 - G_2$ is positive definite, then $O^\top G_1 O - O^\top G_2 O = \Lambda - I$ is positive definite. Hence $\lambda_i - 1 > 0$, $i = 1, 2, \ldots, p$, so we get $\lambda_{\min}(G_2^{-1} G_1) > 1$.

Now let $\lambda_{\min}(G_2^{-1} G_1) > 1$ hold. Furthermore, with $G_2$ positive definite and $G_1$ symmetric, we have

$$\lambda_{\min} < \frac{\nu^\top G_1 \nu}{\nu^\top G_2 \nu} < \lambda_{\max}$$

for all nonzero $(p \times 1)$ vectors $\nu$, so $G_1 - G_2$ is positive definite; see [23, p. 74]. It follows that $\mathrm{Cov}\{\hat\beta_{(1)}(k)\} - \mathrm{Cov}\{\hat\beta_{(2)}(\eta)\}$ is positive definite for $0 \le \eta \le 1$, $k \ge 0$ if and only if $\lambda_{\min}(G_2^{-1} G_1) > 1$. □

Theorem 5.2. Consider the estimators $\hat\beta_{(1)}(k) = A_1 y$ and $\hat\beta_{(2)}(\eta) = A_2 y$ of $\beta$. Suppose that the difference $\mathrm{Cov}\{\hat\beta_{(1)}(k)\} - \mathrm{Cov}\{\hat\beta_{(2)}(\eta)\}$ is positive definite. Then

$$\Delta_3 = \mathrm{MSEM}\{\hat\beta_{(1)}(k)\} - \mathrm{MSEM}\{\hat\beta_{(2)}(\eta)\}$$

is positive definite if and only if

$$d_2^\top \{\sigma^2 (A_1 A_1^\top - A_2 A_2^\top) + d_1 d_1^\top\}^{-1} d_2 < 1$$

with $A_1 = S_k \tilde X^\top D$, $A_2 = U F_\eta \tilde X^\top D$.
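Before turning to the proof, the two conditions in Theorems 5.1 and 5.2 can be checked numerically once $A_1$, $A_2$, $d_1$, $d_2$ and $\sigma^2$ are available (in practice as estimates). The sketch below is a hedged illustration: the function name `compare_ridge_liu` is hypothetical, and the small matrices are toy inputs chosen by hand, not quantities derived from any data set.

```python
import numpy as np


def compare_ridge_liu(A1, A2, d1, d2, sigma2):
    """Evaluate the conditions of Theorems 5.1 and 5.2.

    Returns (lam_min, q, superior), where lam_min is the smallest eigenvalue
    of G2^{-1} G1, q is the quadratic form of Lemma 5.1, and superior is True
    when both lam_min > 1 and q < 1 hold.
    """
    G1 = sigma2 * A1 @ A1.T
    G2 = sigma2 * A2 @ A2.T
    lam_min = np.linalg.eigvals(np.linalg.solve(G2, G1)).real.min()
    q = d2 @ np.linalg.solve(G1 - G2 + np.outer(d1, d1), d2)
    return lam_min, q, bool(lam_min > 1 and q < 1)


# Toy inputs (assumptions for illustration only).
A1 = 2.0 * np.eye(2)
A2 = np.eye(2)
d1 = np.array([0.5, 0.0])
d2 = np.array([0.3, 0.1])
lam_min, q, superior = compare_ridge_liu(A1, A2, d1, d2, sigma2=1.0)

# Direct check: Delta_3 from (19) should be positive definite when superior is True.
delta3 = (A1 @ A1.T - A2 @ A2.T) + np.outer(d1, d1) - np.outer(d2, d2)
```

For these toy inputs $G_1 = 4I$ and $G_2 = I$, so the covariance condition holds and the quadratic form is far below one; the direct eigenvalue check on $\Delta_3$ agrees.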


Proof. The difference between the MSEMs of $\hat\beta_{(2)}(\eta)$ and $\hat\beta_{(1)}(k)$ is given by

$$\Delta_3 = \mathrm{MSEM}\{\hat\beta_{(1)}(k)\} - \mathrm{MSEM}\{\hat\beta_{(2)}(\eta)\} = \sigma^2 (A_1 A_1^\top - A_2 A_2^\top) + (d_1 d_1^\top - d_2 d_2^\top) = \mathrm{Cov}\{\hat\beta_{(1)}(k)\} - \mathrm{Cov}\{\hat\beta_{(2)}(\eta)\} + (d_1 d_1^\top - d_2 d_2^\top).$$

Applying Lemma 5.1 yields the desired result. □

It should be noted that all results reported above are based on the assumption that $k$ and $\eta$ are non-stochastic. The theoretical results indicate that $\hat\beta_{(2)}(\eta)$ is not always better than $\hat\beta_{(1)}(k)$, and vice versa. For practical purposes, we have to replace these unknown parameters by some suitable estimators.

6. The heteroscedasticity and correlated error case

Up to this point, independent errors with equal variance were assumed. The error term might also exhibit autocorrelation. To account for these effects, we extend the results in this section and consider the more general case of heteroscedasticity and autocovariance in the error terms.

Consider now observations $\{y_t, x_t, t_t\}_{t=1}^{T}$ and the semiparametric partial linear model $y_t = x_t^\top \beta + f(t_t) + \varepsilon_t$, $t = 1, \ldots, T$. Let $E(\varepsilon\varepsilon^\top \mid x, t) = \Omega$, not necessarily diagonal. To keep the structure of the errors for later inference, we define an $(n \times n)$ permutation matrix $P$ as in [32]. Consider a permutation:

$$\begin{pmatrix} 1 & t(1) \\ \vdots & \vdots \\ i & t(i) \\ \vdots & \vdots \\ n & t(n) \end{pmatrix}$$

where $i = 1, \ldots, n$ is the index of the ordered nonparametric variable and $t(i) = 1, \ldots, T$ is the corresponding time index of the observations. Then $P$ is defined for $i, j = 1, \ldots, n$:

$$P_{ij} = \begin{cases} 1, & j = t(i) \\ 0, & \text{otherwise.} \end{cases}$$

We can now rewrite the model after reordering and differencing:

$$DPy = DPX\beta + DPf(x) + DP\varepsilon, \qquad E(\varepsilon\varepsilon^\top \mid x, t) = \Omega. \qquad (20)$$

Then, with $\tilde X = DPX$ and $\tilde y = DPy$ from (20), $\hat\beta_{(0)}$ is given by

$$\hat\beta_{(0)} = (\tilde X^\top \tilde X)^{-1} \tilde X^\top \tilde y \qquad (21)$$

with

$$\mathrm{Cov}(\hat\beta_{(0)}) = (\tilde X^\top \tilde X)^{-1} \tilde X^\top DP\Omega D^\top P^\top \tilde X (\tilde X^\top \tilde X)^{-1} = U \tilde X^\top DP\Omega D^\top P^\top \tilde X U. \qquad (22)$$

We will use a heteroscedasticity and autocovariance consistent estimator described in [22] for the interior matrix of (22), which is in our case:

$$\widehat{DP\Omega D^\top P^\top} = \{\widehat{DP\varepsilon}\,(\widehat{DP\varepsilon})^\top\} \odot \sum_{l=0}^{L} \Bigl(1 - \frac{l}{L + 1}\Bigr) H_l \qquad (23)$$

with $\widehat{DP\varepsilon} = \tilde y - \tilde X \hat\beta_{(0)}$, $\odot$ denoting the elementwise matrix product, $L$ the maximum lag of nonzero autocorrelation in the errors and $H_0$ the identity matrix. Let $L_l$ be a matrix with ones on the $l$th diagonal; then $H_l$, $l = 1, \ldots, L$, are such that

$$(H_l)_{ij} = \begin{cases} 0, & \text{if } \{DP(L_l + L_l^\top) D^\top P^\top\}_{ij} = 0, \\ 1, & \text{otherwise,} \end{cases} \qquad i, j = 1, \ldots, p.$$

Plugging (23) into (22), we obtain a consistent estimator for $\mathrm{Cov}(\hat\beta_{(0)})$; see [31] for details.

Denoting $\tilde S = \tilde X^\top DP\Omega D^\top P^\top \tilde X$, we can write down $\mathrm{Cov}\{\hat\beta_{(1)}(k)\}$ and $\mathrm{Cov}\{\hat\beta_{(2)}(\eta)\}$ in model (20):

$$\mathrm{Cov}\{\hat\beta_{(1)}(k)\} = S_k \tilde S S_k \qquad (24)$$

$$\mathrm{Cov}\{\hat\beta_{(2)}(\eta)\} = F_\eta U \tilde S U F_\eta. \qquad (25)$$
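Two ingredients of this section are easy to sketch in code: the permutation matrix $P$ that reorders observations by the nonparametric variable, and the lag weights $1 - l/(L+1)$ appearing in (23). The snippet below is a minimal illustration with hypothetical function names and zero-based indices; it does not attempt the full estimator (23), whose indicator matrices $H_l$ depend on details not reproduced here.

```python
import numpy as np


def permutation_matrix(t):
    """P with P[i, j] = 1 when observation j has the i-th smallest value of t."""
    order = np.argsort(t, kind="stable")    # order[i] plays the role of t(i)
    n = len(t)
    P = np.zeros((n, n))
    P[np.arange(n), order] = 1.0
    return P


def lag_weights(L):
    """Weights 1 - l / (L + 1) for l = 0, ..., L, as they appear in (23)."""
    return np.array([1.0 - l / (L + 1) for l in range(L + 1)])


# Toy values of the nonparametric variable in time order (an assumption).
t = np.array([0.3, 0.1, 0.2])
P = permutation_matrix(t)
t_sorted = P @ t           # observations reordered by increasing t
w = lag_weights(2)         # weights for lags 0, 1, 2
```

Since $P$ is a permutation matrix, $P^\top P = I$, so the reordering can always be undone to recover the original time order of the errors.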


Using (22) and (25), the difference $\Delta_1 = \mathrm{Cov}(\hat\beta_{(0)}) - \mathrm{Cov}\{\hat\beta_{(2)}(\eta)\}$ can be expressed as

$$\Delta_1 = U \tilde S U - F_\eta U \tilde S U F_\eta^\top = F_\eta \{F_\eta^{-1} U \tilde S U (F_\eta^\top)^{-1} - U \tilde S U\} F_\eta^\top = (1 - \eta^2)(U^{-1} + I)^{-1} \Bigl\{ \frac{1}{1 + \eta} (U \tilde S + \tilde S U) + U \tilde S U \Bigr\} (U^{-1} + I)^{-1}, \qquad (26)$$

with $\tau = \frac{1}{1 + \eta} > 0$, $\tilde M = U \tilde S U$, $\tilde N = U \tilde S + \tilde S U$. Since $\tilde M$ is a $(p \times p)$ positive definite matrix and $\tilde N$ is a symmetric matrix, a nonsingular matrix $T$ exists such that $T^\top \tilde M T = I$ and $T^\top \tilde N T = \tilde E$; here $\tilde E$ is a diagonal matrix and its diagonal elements are the roots of the polynomial equation $|\tilde M^{-1} \tilde N - \tilde e I| = 0$ (see [11, pp. 408] and [16, pp. 563]), and we may write (26) as

$$\Delta_1 = (1 - \eta^2) H (\tilde M + \tau \tilde N) H = (1 - \eta^2) H (T^\top)^{-1} (T^\top \tilde M T + \tau T^\top \tilde N T) T^{-1} H = (1 - \eta^2) H (T^\top)^{-1} (I + \tau \tilde E) T^{-1} H,$$

where $I + \tau \tilde E = \operatorname{diag}(1 + \tau \tilde e_{11}, \ldots, 1 + \tau \tilde e_{pp})$ and $H = (U^{-1} + I)^{-1}$. Since $\tilde N = U \tilde S + \tilde S U \ne 0$, there is at least one diagonal element of $\tilde E$ that is nonzero.

Let $\tilde e_{ii} < 0$ for at least one $i$; then positive definiteness of $I + \tau \tilde E$ is guaranteed by

$$0 < \tau < \min_{\tilde e_{ii} < 0} \left| \frac{1}{\tilde e_{ii}} \right|. \qquad (27)$$

Then $1 + \tau \tilde e_{ii} > 0$ for all $i = 1, \ldots, p$ and therefore $I + \tau \tilde E$ is a positive definite matrix. Consequently, $\Delta_1$ becomes a positive definite matrix as well. It is now evident that the estimator $\hat\beta_{(2)}(\eta)$ has a smaller variance compared with the estimator $\hat\beta_{(0)}$ if and only if (27) is satisfied.

With

$$\Delta_1' = \mathrm{Cov}(\hat\beta_{(0)}) - \mathrm{Cov}\{\hat\beta_{(1)}(k)\} = k^2 S_k \Bigl\{ \frac{1}{k} (U \tilde S + \tilde S U) + U \tilde S U \Bigr\} S_k = k^2 S_k \Bigl( \frac{1}{k} \tilde N + \tilde M \Bigr) S_k$$

and analogous argumentation as above, we obtain for $\hat\beta_{(1)}(k)$:

$$0 < \frac{1}{k} < \min_{\tilde e_{ii} < 0} \left| \frac{1}{\tilde e_{ii}} \right|. \qquad (28)$$
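Reading (27) as $\tau < \min_{\tilde e_{ii}<0} 1/|\tilde e_{ii}|$ with $\tau = 1/(1+\eta)$, the admissible region for $\eta$ follows from the eigenvalues of $\tilde M^{-1}\tilde N$. The sketch below computes that lower bound on $\eta$ and compares it with a direct check of whether $\Delta_1$ is positive definite. The matrices $U$ and $\tilde S$ are toy values picked so that $\tilde N$ is indefinite, and the function names are hypothetical; in an application both matrices would be replaced by estimates.

```python
import numpy as np


def eta_lower_bound(U, S):
    """Smallest eta compatible with (27), given U and S-tilde (both p x p)."""
    M = U @ S @ U
    N = U @ S + S @ U
    e = np.linalg.eigvals(np.linalg.solve(M, N)).real   # roots of |M^{-1}N - eI| = 0
    neg = e[e < 0]
    if neg.size == 0:
        return 0.0                        # 1 + tau * e > 0 for every tau > 0
    tau_max = np.min(1.0 / np.abs(neg))   # right-hand side of (27)
    return max(0.0, 1.0 / tau_max - 1.0)  # from tau = 1 / (1 + eta) < tau_max


def delta1(U, S, eta):
    """Cov(beta_0) - Cov(beta_2(eta)) = U S U - F U S U F', computed directly."""
    I = np.eye(U.shape[0])
    U_inv = np.linalg.inv(U)
    F = np.linalg.solve(U_inv + I, U_inv + eta * I)
    C = U @ S @ U
    return C - F @ C @ F.T


# Toy matrices (assumptions): U ill-conditioned, S with strong correlation.
U = np.diag([1.0, 100.0])
S = np.array([[1.0, 0.9], [0.9, 1.0]])
bound = eta_lower_bound(U, S)

above = np.linalg.eigvalsh(delta1(U, S, 0.5)).min()   # eta above the bound
below = np.linalg.eigvalsh(delta1(U, S, 0.1)).min()   # eta below the bound
```

With these toy inputs the bound is roughly 0.26: $\Delta_1$ is positive definite at $\eta = 0.5$ and fails to be at $\eta = 0.1$, in line with (27).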


$$\Delta_2' = \mathrm{MSEM}(\hat\beta_{(0)}) - \mathrm{MSEM}\{\hat\beta_{(1)}(k)\} = \mathrm{Cov}(\hat\beta_{(0)}) - \mathrm{Cov}\{\hat\beta_{(1)}(k)\} - \mathrm{Bias}\{\hat\beta_{(1)}(k)\}\,\mathrm{Bias}\{\hat\beta_{(1)}(k)\}^\top$$

$$= S_k \{k(\tilde S U + U \tilde S) + k^2 U \tilde S U - k^2 \beta\beta^\top\} S_k = k^2 S_k \Bigl( \frac{1}{k} \tilde N + \tilde M - \beta\beta^\top \Bigr) S_k = k^2 S_k (W_1 - \beta\beta^\top) S_k.$$

With Lemma 4.1, the assertion follows. □

Theorem 6.1 gives conditions under which the biased estimator $\hat\beta_{(i)}(x)$, $i = \{1, 2\}$; $x = \{k, \eta\}$, is superior to $\hat\beta_{(0)}$ in the presence of heteroscedasticity and autocorrelation in the data.

Note that for comparison of the biased estimators, Theorem 5.1 can be extended straightforwardly to the general case by exchanging $G_1$ and $G_2$ by $\tilde G_1 = \tilde A_1 \Omega \tilde A_1^\top$ and $\tilde G_2 = \tilde A_2 \Omega \tilde A_2^\top$ correspondingly, with $\tilde A_1 = S_k \tilde X^\top DP$, $\tilde A_2 = U F_\eta \tilde X^\top DP$. Hence, the sampling variance of $\hat\beta_{(2)}(\eta)$ is always smaller than that of $\hat\beta_{(1)}(k)$ if and only if $\lambda_{\min}(\tilde G_2^{-1} \tilde G_1) > 1$, where $\lambda_{\min}$ is the minimum eigenvalue of $\tilde G_2^{-1} \tilde G_1$.

Now, we give a generalized version of Theorem 5.2.

Theorem 6.2. Consider the estimators $\hat\beta_{(1)} = \tilde A_1 y$ and $\hat\beta_{(2)} = \tilde A_2 y$ of $\beta$. Suppose that the difference $\mathrm{Cov}\{\hat\beta_{(1)}\} - \mathrm{Cov}\{\hat\beta_{(2)}\}$ is positive definite. Then

$$\Delta_3 = \mathrm{MSEM}(\hat\beta_{(1)}) - \mathrm{MSEM}(\hat\beta_{(2)})$$

is positive definite if and only if

$$d_2^\top (\tilde A_1 \Omega \tilde A_1^\top - \tilde A_2 \Omega \tilde A_2^\top + d_1 d_1^\top)^{-1} d_2 < 1.$$

Proof. The difference between the MSEMs of $\hat\beta_{(2)}(\eta)$ and $\hat\beta_{(1)}(k)$ is given by

$$\Delta_3 = \mathrm{MSEM}(\hat\beta_{(1)}) - \mathrm{MSEM}(\hat\beta_{(2)}) = \tilde A_1 \Omega \tilde A_1^\top - \tilde A_2 \Omega \tilde A_2^\top + d_1 d_1^\top - d_2 d_2^\top = \mathrm{Cov}(\hat\beta_{(1)}) - \mathrm{Cov}(\hat\beta_{(2)}) + d_1 d_1^\top - d_2 d_2^\top.$$

Applying Lemma 5.1 yields the desired result. □

We note that in order to use the criteria above, one has to estimate the parameters. The estimation of $\Omega$ is thereby the most challenging.
However, as long as estimator (23) is available, all considered criteria can be evaluated on the real data and can be used for practical purposes.

7. Determinants of electricity demand

The empirical study example is motivated by the importance of explaining variation in electricity consumption. Since electricity is a non-storable good, electricity providers are interested in understanding and hedging demand fluctuations. Electricity consumption is known to be influenced negatively by the price of electricity and positively by the income of the consumers. As electricity is frequently used for heating and cooling, the effect of the air temperature must also be present. Both heating at low temperatures and cooling at high temperatures result in higher electricity consumption and motivate the use of a nonparametric specification for the temperature effect. Thus we consider the semiparametric regression model defined in (1)

$$y = f(t) + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_{13} x_{13} + \varepsilon, \qquad (29)$$

where $y$ is the log monthly electricity consumption per person (aggregated electricity consumption was divided by population interpolated linearly from quarterly data), $t$ is the cumulated average temperature index for the corresponding month, taken as the average of 20 German cities computed from the data of the German weather service (Deutscher Wetterdienst), $x_1$ is the log GDP per person interpolated linearly from quarterly data, detrended and deseasonalized, and $x_2$ is the log ratio of the electricity price to the gas price, detrended. The data for January 1996 to September 2010 come from EUROSTAT. Reference prices for electricity were computed as an average of electricity tariffs for consumer groups IND-Ie and HH-Dc, and for gas as an average for IND-I3-2 and HH-D3, with reference period 2005S1. Time series of prices were obtained by scaling with electricity price or correspondingly gas price indices. $x_3, x_4, \ldots, x_{13}$ are dummy variables for the monthly effects.

The model in (29) includes both parametric effects and a nonparametric effect. The only nonparametric effect is implied by the temperature variable. From Fig. 1, we can see that the effect of $t$ on $y$ is likely to be nonlinear, while the effects of the other variables are roughly linear. The dummy variables enter into the linear part in the specification of the semiparametric regression as well.


Fig. 1. Plots of individual explanatory variables vs. dependent variable, linear fit (green), local polynomial fit (red), 95% confidence bands (black). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

We note that the condition number of $X^\top X$ of these explanatory variables is 20.5, which justifies the use of $\hat\beta_{(1)}(k)$ and $\hat\beta_{(2)}(\eta)$; see [2].

Throughout the paper, we use fifth-order differencing ($m = 5$). Results for other orders of differencing were similar.

The admissible regions for the biasing parameters $\eta$ and $k$ for MSEM superiority were $\eta \ge 0.923$ and $k \le 0.0085$. These bounds were determined using the estimated parameters and the inequalities from Theorem 4.1 and Theorem 3.1 in [27], respectively. Under more general assumptions on $\Omega$ and the resulting heteroscedasticity and autocovariance consistent Newey–West covariance estimator, defined in (23), the admissible region for $\eta$ (Theorem 6.1 and restriction (27)) shrank to $\eta \ge 0.927$. For $\hat\beta_{(1)}(k)$, no admissible values of $k$ were found, since the admissible values $k \ge 1.57$ from (28) do not satisfy the condition of Theorem 6.1 (see Table 2).

Alternatively, we used a scalar mean squared error (SMSE), defined as the trace of the corresponding MSEM, to compare the estimators. The bounds for $k$ and $\eta$ can then be calculated only numerically, using a grid on $[0, 1]$ for the biasing parameters and determining the regions where the SMSEs of the proposed estimators are lower. SMSE superiority of $\hat\beta_{(1)}(k)$ and $\hat\beta_{(2)}(\eta)$ over $\hat\beta_{(0)}$ under general $\Omega$ is given for $k \le 0.0267$ and $\eta \ge 0.384$, compared to $k \le 0.0123$ and $\eta \ge 0.708$ under standard assumptions; see Fig. 2, which depicts the SMSE of the estimators and the corresponding $\eta$ and $k$ under standard and general assumptions. Thus the SMSE superiority intervals for $\eta$ and $k$ become even larger in the case of the general form of $\Omega$.

Our computations here are performed with R 2.10.1 and the codes are available on www.quantlet.org.

Results of different estimation procedures can be found in Table 1. We note that regardless of the estimator type, the effect of income is positive and the effect of relative price is negative, as expected from an economic perspective and as in [4]. However, the $R^2$ obtained by difference based methods is higher, and the SMSE is lower for the Liu type and ridge difference based estimators. The values of the biasing parameters for which the conditions of Theorems 5.1 and 5.2 are satisfied are given in Table 3. The superiority of $\hat\beta_{(2)}(\eta)$ over $\hat\beta_{(1)}(k)$ is assured for the zone of values marked by plus.

Returning to our semiparametric specification, we may now remove the estimated parametric effect from the dependent variable and analyze the nonparametric effect. We use a local linear estimator of $f$ to model the nonparametric effect of temperature. The resulting plots are presented in Fig. 3, where we also include the linear effect. We notice that all differencing procedures result in similar estimators of $f$, regardless of notable differences in the coefficients of the linear part. The estimator of $f$ is consistent with findings, e.g., of [4] for US electricity data.

In both specifications, $f$ is different from the linear effect, and therefore including temperature as a linear effect is misleading.
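The grid search over the biasing parameters described above can be sketched as follows. This is a hedged illustration in Python rather than the authors' R code: the function names are hypothetical, the covariance expressions follow (22), (24) and (25), the bias terms follow $-kS_k\beta$ and $-(1-\eta)(U^{-1}+I)^{-1}\beta$, and the toy inputs assume $\tilde S = \tilde X^\top \tilde X$ (a simplification corresponding to $\sigma^2 = 1$). In practice $\beta$ and $\tilde S$ are unknown and would be replaced by estimates.

```python
import numpy as np


def smse_ridge(k, U_inv, S, beta):
    """Trace of the MSEM of the ridge estimator: Cov = S_k S S_k, bias = -k S_k beta."""
    Sk = np.linalg.inv(U_inv + k * np.eye(len(beta)))
    bias = -k * Sk @ beta
    return np.trace(Sk @ S @ Sk) + bias @ bias


def smse_liu(eta, U_inv, S, beta):
    """Trace of the MSEM of the Liu type estimator: Cov = F U S U F'."""
    I = np.eye(len(beta))
    H = np.linalg.inv(U_inv + I)
    F = H @ (U_inv + eta * I)
    U = np.linalg.inv(U_inv)
    bias = -(1.0 - eta) * H @ beta
    return np.trace(F @ U @ S @ U @ F.T) + bias @ bias


# Toy inputs (assumptions): a nearly collinear X-tilde' X-tilde and a fixed beta.
U_inv = np.array([[2.0, 1.9], [1.9, 2.0]])
S = U_inv.copy()
beta = np.array([1.0, 1.0])

grid = np.linspace(0.0, 1.0, 101)
smse0 = smse_ridge(0.0, U_inv, S, beta)          # SMSE of the unbiased estimator
good_k = [k for k in grid if smse_ridge(k, U_inv, S, beta) < smse0]
good_eta = [e for e in grid if smse_liu(e, U_inv, S, beta) < smse0]
```

Here `good_k` and `good_eta` collect the grid points at which the biased estimator has a lower SMSE than $\hat\beta_{(0)}$; the endpoints $k = 0$ and $\eta = 1$ reproduce $\hat\beta_{(0)}$ and therefore never appear in these lists.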


E. Akdeniz Duran et al. / Journal of Multivariate Analysis 105 (2012) 164–175 173Fig. 2. SMSE of β (2) (η) <strong>in</strong> dependence of η (left) and β (1) (k) <strong>in</strong> dependence of k (right) aga<strong>in</strong>st that of β (0) (dashed) under standard assumptions (black)and under generalized assumptions (red). (For <strong>in</strong>terpretation of the references to colour <strong>in</strong> this figure legend, the reader is referred to the web version ofthis article.)Table 1Results of OLS, difference based and Liu type difference based estimations.β OLSβ (0)β (1) (10 −3 ) β (2) (0.95)x 1 0.634 0.578 * 0.550 * 0.562 *x 2 −0.152 *** −0.160 *** −0.158 *** −0.161 ***x 3 0.030 *** 0.030 * 0.030 * 0.030 *x 4 −0.043 *** −0.040 ** −0.040 ** −0.040 **x 5 0.011 0.031 0.031 0.031x 6 −0.051 ** −0.014 −0.013 −0.014x 7 −0.054 * −0.014 −0.013 −0.014x 8 −0.079 ** −0.065 −0.064 −0.065x 9 −0.036 −0.037 −0.036 −0.037x 10 −0.052 −0.044 −0.043 −0.044x 11 −0.049 −0.013 −0.012 −0.013x 12 −0.000 0.040 0.040 0.040x 13 −0.001 0.016 0.016 0.016t −13 · 10 −5*** – – –R 2 0.729 0.749 0.749 0.749* Indicates significance on 10%.** Indicates significance on 5%.*** Indicates significance on 1%.Table 2Standard errors of the estimators <strong>in</strong> comparison to Newey–West standard errors <strong>for</strong> the effects of x 1 (<strong>in</strong>come) and x 2 (relative price).Ω β (0)β (1) (10 −3 ) β (2) (0.95)σ 2 I Ω NW σ 2 I Ω NW σ 2 I Ω NWx 1 0.215 0.347 0.209 0.337 0.205 0.215x 2 0.034 0.047 0.034 0.047 0.034 0.034SMSE 0.058 0.148 0.056 0.141 0.054 0.0588. ConclusionWe proposed a difference based Liu type estimator and a difference based ridge regression estimator <strong>for</strong> the partial l<strong>in</strong>earsemiparametric regression model.The results show that <strong>in</strong> case of multicoll<strong>in</strong>earity, the proposed estimator, β (2) (η) is superior to the difference basedestimator β (0) . 
We gave bounds on the value of η which ensure the superiority of the proposed estimator. The two biased estimators $\beta^{(2)}(\eta)$ and $\beta^{(1)}(k)$ for different values of η and k can be compared in terms of MSEM with the theoretical results above.

Finally, an application to electricity consumption has been provided to show properties of the proposed estimator based on the mean square error criterion. We could estimate the linear effects of the linear determinants as well as the nonparametric effect f of a cumulated average temperature index.

Thus, the theoretical results obtained allow us to tackle the problem of multicollinearity in real applications of semiparametric models. Moreover, we are able to get estimators of the linear effects with lower standard errors by tuning parameters k and η accordingly.
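As a rough illustration of how the three estimators compared in this conclusion relate to one another, the following Python sketch computes them on simulated data with two nearly collinear regressors. The formulas are assumptions taken from the standard definitions in the cited literature ([20], [27], [30]) and are not restated in this part of the paper: first-order differencing with weights $(1/\sqrt{2}, -1/\sqrt{2})$, a ridge form $(\tilde X'\tilde X + kI)^{-1}\tilde X'\tilde y$ and a Liu-type form $(\tilde X'\tilde X + I)^{-1}(\tilde X'\tilde y + \eta\,\beta^{(0)})$. The function names and the simulated data are hypothetical.

```python
import numpy as np

def difference_matrix(n):
    """First-order differencing matrix of shape (n-1, n) with weights
    (1/sqrt(2), -1/sqrt(2)). Assumed here; higher orders are possible."""
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = 1.0 / np.sqrt(2.0)
    D[idx, idx + 1] = -1.0 / np.sqrt(2.0)
    return D

def diff_estimators(X, y, t, k, eta):
    """Return (b0, b1, b2): difference-based, difference-based ridge and
    difference-based Liu-type estimates (assumed textbook forms)."""
    order = np.argsort(t)              # sort by the nonparametric covariate
    D = difference_matrix(len(y))
    Xd, yd = D @ X[order], D @ y[order]
    G = Xd.T @ Xd
    I = np.eye(X.shape[1])
    b0 = np.linalg.solve(G, Xd.T @ yd)
    b1 = np.linalg.solve(G + k * I, Xd.T @ yd)
    b2 = np.linalg.solve(G + I, Xd.T @ yd + eta * b0)
    return b0, b1, b2

# Simulated partial linear model y = X beta + f(t) + noise.
rng = np.random.default_rng(0)
n = 200
t = rng.uniform(0.0, 1.0, n)
z = rng.normal(size=n)
X = np.column_stack([z, z + 0.01 * rng.normal(size=n)])  # nearly collinear
y = X @ np.array([1.0, 2.0]) + np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=n)

b0, b1, b2 = diff_estimators(X, y, t, k=1e-3, eta=0.95)
print("difference-based:", b0)
print("ridge (k=1e-3):  ", b1)
print("Liu (eta=0.95):  ", b2)
```

Under these assumed definitions, setting k = 0 or η = 1 reproduces the plain difference-based estimator, which is one way to see both biased estimators as shrinkage around $\beta^{(0)}$.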


Table 3
Admissible biasing parameters η and k, marked by a plus if they satisfy the conditions of Theorems 5.1 and 5.2, i.e. $\beta^{(2)}(\eta)$ is superior to $\beta^{(1)}(k)$.

η · 10²       k · 10⁴
              1   2   3   4   5   6   7   8   9   10  11  12  13
9.23–9.23     −   −   −   −   −   −   −   −   −   −   −   −   −
9.24–9.24     +   −   −   −   −   −   −   −   −   −   −   −   −
9.25–9.25     +   +   −   −   −   −   −   −   −   −   −   −   −
9.26–9.26     +   +   +   −   −   −   −   −   −   −   −   −   −
9.27–9.27     +   +   +   +   −   −   −   −   −   −   −   −   −
9.28–9.28     +   +   +   +   +   −   −   −   −   −   −   −   −
9.29–9.30     +   +   +   +   +   +   −   −   −   −   −   −   −
9.31–9.31     +   +   +   +   +   +   +   −   −   −   −   −   −
9.32–9.32     +   +   +   +   +   +   +   +   −   −   −   −   −
9.34–9.35     +   +   +   +   +   +   +   +   +   −   −   −   −
9.36–9.37     +   +   +   +   +   +   +   +   +   +   −   −   −
9.38–9.39     +   +   +   +   +   +   +   +   +   +   +   −   −
9.40–9.43     +   +   +   +   +   +   +   +   +   +   +   +   −
9.44–9.56     +   +   +   +   +   +   +   +   +   +   +   +   +
9.57–9.61     +   +   +   +   +   +   +   +   +   +   +   +   −
9.62–9.65     +   +   +   +   +   +   +   +   +   +   +   −   −
9.66–9.69     +   +   +   +   +   +   +   +   +   +   −   −   −
9.70–9.72     +   +   +   +   +   +   +   +   +   −   −   −   −
9.73–9.76     +   +   +   +   +   +   +   +   −   −   −   −   −
9.77–9.79     +   +   +   +   +   +   +   −   −   −   −   −   −
9.80–9.82     +   +   +   +   +   +   −   −   −   −   −   −   −
9.83–9.85     +   +   +   +   +   −   −   −   −   −   −   −   −
9.86–9.88     +   +   +   +   −   −   −   −   −   −   −   −   −
9.89–9.91     +   +   +   −   −   −   −   −   −   −   −   −   −
9.92–9.94     +   +   −   −   −   −   −   −   −   −   −   −   −
9.95–9.97     +   −   −   −   −   −   −   −   −   −   −   −   −
9.98–9.99     −   −   −   −   −   −   −   −   −   −   −   −

Fig. 3. Estimated nonlinear effect f of t on y via difference-based (left), Liu-type difference-based (right) and difference-based ridge (center) approaches.

References

[1] F. Akdeniz, S. Kaçiranlar, On the almost unbiased generalized Liu estimator and unbiased estimation of the bias and MSE, Communications in Statistics Theory and Methods 24 (7) (1995) 1789–1797.
[2] D. Belsley, E. Kuh, R. Welsch, Regression Diagnostics, Wiley, New York, 1980.


[3] L. Brown, M. Levine, Variance estimation in nonparametric regression via the difference sequence method, Annals of Statistics 35 (2007) 2219–2232.
[4] R.F. Engle, C. Granger, J. Rice, A. Weiss, Semiparametric estimates of the relation between weather and electricity sales, Journal of American Statistical Association 81 (1986) 310–320.
[5] R. Eubank, Nonparametric Regression and Spline Smoothing, Marcel Dekker, New York, 1999.
[6] R. Eubank, E. Kambour, J. Kim, K. Klipple, C. Reese, M. Schimek, Kernel smoothing in partial linear models, Journal of the Royal Statistical Society Series B 50 (3) (1988) 413–436.
[7] R. Eubank, E. Kambour, J. Kim, K. Klipple, C. Reese, M. Schimek, Estimation in partially linear models, Computational Statistics and Data Analysis 29 (1998) 27–34.
[8] J. Fan, L. Huang, Goodness-of-fit tests for parametric regression models, Journal of American Statistical Association 96 (2001) 640–652.
[9] J. Fan, Y. Wu, Semiparametric estimation of covariance matrices for longitudinal data, Journal of American Statistical Association 103 (2008) 1520–1533.
[10] R. Farebrother, Further results on the mean square error of ridge regression, Journal of the Royal Statistical Society Series B 38 (1976) 248–250.
[11] F. Graybill, Matrices with Applications in Statistics, Duxbury Classic, 1983.
[12] P. Green, C. Jennison, A. Seheult, Analysis of field experiments by least squares smoothing, Journal of the Royal Statistical Society Series B 47 (1985) 299–315.
[13] M. Gruber, Improving Efficiency by Shrinkage: The James–Stein and Ridge Regression Estimators, Marcel Dekker, Inc., New York, 1985.
[14] W. Härdle, H. Liang, J. Gao, Partially Linear Models, Physica-Verlag, Heidelberg, 2000.
[15] W. Härdle, M. Müller, S. Sperlich, A. Werwatz, Nonparametric and Semiparametric Models, Springer Verlag, Heidelberg, 2004.
[16] D. Harville, Matrix Algebra from a Statistician's Perspective, Springer Verlag, New York, 1997.
[17] A. Hoerl, R. Kennard, Ridge regression: biased estimation for nonorthogonal problems, Technometrics 12 (1970) 55–67.
[18] M. Hubert, P. Wijekoon, Improvement of the Liu estimation in linear regression model, Statistical Papers 47 (3) (2006) 471–479.
[19] K. Klipple, R. Eubank, Difference-based variance estimators for partially linear models, Festschrift in honor of Distinguished Professor Mir Masoom Ali on the occasion of his retirement, 2007, pp. 313–323.
[20] K. Liu, A new class of biased estimate in linear regression, Communications in Statistics Theory and Methods 22 (1993) 393–402.
[21] K. Liu, Using Liu type estimator to combat multicollinearity, Communications in Statistics Theory and Methods 32 (5) (2003) 1009–1020.
[22] W. Newey, K. West, A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix, Econometrica 55 (3) (1987) 703–708.
[23] C. Rao, Linear Statistical Inference and its Applications, Wiley, New York, 1973.
[24] D. Ruppert, M. Wand, R. Carroll, R. Gill, Semiparametric Regression, Cambridge University Press, 2003.
[25] J. Schott, Matrix Analysis for Statistics, second ed., Wiley Inc., New Jersey, 2005.
[26] C. Stein, Inadmissibility of the usual estimator for the mean of a multivariate normal distribution, in: Proc. Third Berkeley Symp. Math. Statist. Prob., vol. 1, 1956, pp. 197–206.
[27] G. Tabakan, F. Akdeniz, Difference-based ridge estimator of parameters in partial linear model, Statistical Papers 51 (2010) 357–368.
[28] G. Trenkler, H. Toutenburg, Mean square matrix comparisons between two biased estimators - an overview of recent results, Statistical Papers 31 (1990) 165–179.
[29] H. Yang, J. Xu, An alternative stochastic restricted Liu estimator in linear regression, Statistical Papers 50 (2009) 639–647.
[30] A. Yatchew, An elementary estimator of the partial linear model, Economics Letters 57 (1997) 135–143.
[31] A. Yatchew, Differencing methods in nonparametric regression: simple techniques for the applied econometrician, 1999. http://www.economics.utoronto.ca/yatchew/.
[32] A. Yatchew, Semiparametric Regression for the Applied Econometrician, Cambridge University Press, 2003.
[33] J. You, G. Chen, Y. Zhou, Statistical inference of partially linear regression models with heteroscedastic errors, Journal of Multivariate Analysis 98 (8) (2007) 1539–1557.


Journal of Empirical Finance 19 (2012) 610–625
journal homepage: www.elsevier.com/locate/jempfin

Modelling and forecasting liquidity supply using semiparametric factor dynamics☆

Wolfgang Karl Härdle, Nikolaus Hautsch, Andrija Mihoci⁎
Humboldt Universität zu Berlin and C.A.S.E. – Center for Applied Statistics and Economics, Spandauer Str. 1, D-10178 Berlin, Germany
Center for Financial Studies (CFS), Frankfurt, Germany

Article history: Received 21 April 2010; received in revised form 21 March 2012; accepted 2 April 2012; available online 10 April 2012
JEL classification: C14, C32, C53, G11
Keywords: Limit order book; Liquidity risk; Semiparametric model; Factor structure; Prediction

Abstract

We model the dynamics of ask and bid curves in a limit order book market using a dynamic semiparametric factor model. The shape of the curves is captured by a factor structure which is estimated nonparametrically. Corresponding factor loadings are modelled jointly with best bid and best ask quotes using a vector error correction specification. Applying the framework to four stocks traded at the Australian Stock Exchange (ASX) in 2002, we show that the suggested model captures the spatial and temporal dependencies of the limit order book. We find spillover effects between both sides of the market and provide evidence for short-term quote predictability.
Relating the shape of the curves to variables reflecting the current state of the market, we show that the recent liquidity demand has the strongest impact. In an extensive forecasting analysis we show that the model is successful in forecasting the liquidity supply over various time horizons during a trading day. Moreover, it is shown that the model's forecasting power can be used to improve optimal order execution strategies.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Due to technological progress in the organization of trading systems and exchanges, electronic limit order book trading has become the dominant trading form for equities. Open limit order books provide important information on the current liquidity supply as reflected by the offered price-quantity relationships on both sides of the market. These supply and demand schedules provide valuable information on traders' price expectations in the spirit of the seminal paper by Glosten (1994), and reflect the current implied costs of trading as well as demand and supply elasticities. However, while the dynamic behavior of liquidity demand, as reflected by trading intensities and trade sizes, has already been studied in various papers (see, e.g., Hautsch and Huang (2012) and Brownlees et al. (2009)), the stochastic properties of liquidity supply are still largely unknown.
An obvious reason is that liquidity supply is reflected by high-dimensional bid and ask schedules which are not straightforwardly modelled in a dynamic setting. Consequently, it is a widely open question whether and to which extent liquidity supply might be predictable.

☆ This work was supported by the Deutsche Forschungsgemeinschaft via Collaborative Research Center 649 “Ökonomisches Risiko”, Humboldt-Universität zu Berlin, Germany.
⁎ Corresponding author. Tel.: +49 30 2093 5623. E-mail address: mihociax@cms.hu-berlin.de (A. Mihoci).
doi:10.1016/j.jempfin.2012.04.002


The paper's major idea is to capture the shape of high-dimensional ask and bid curves by a lower-dimensional factor structure which is estimated non-parametrically. We propose a dynamic semiparametric factor model where the shape of order schedules is captured by a non-parametric factor structure while the curves' dynamic behavior is driven by time-varying factor loadings. The latter are modelled parametrically employing a vector error correction model (VECM). We show that the model captures the dynamics of high-dimensional order curves very well and is sufficiently parsimonious to produce valuable out-of-sample predictions. Moreover, the schedule of market depth posted around best quotes reveals strong serial dependence and thus is predictable. This structure is induced by the inventory character of order volume which is strongly persistent over time.

By providing empirical evidence on the dynamics and predictability of order book schedules, this paper fills a gap in the empirical literature and complements recent (mostly theoretical) work on order splitting and dynamic order submission strategies. For instance, the question of how to reduce the costs of trading by optimally splitting a large order over time (e.g., over the course of a trading day) is of high relevance in financial practice.
Obizhaeva and Wang (2005) and Engle and Ferstenberg (2007) analyze optimal splitting strategies whose implementations ultimately require predictions of future liquidity demand and supply. Bertsimas and Lo (1998) and Almgren and Chriss (2000) derive optimal execution strategies by minimizing the expected costs of executing an order in the context of static price impact functions. Optimal execution in a limit order book market is analyzed by Alfonsi et al. (2010). They allow for general shapes of order book curves and derive explicit optimal execution strategies in discrete time. By providing insights into the actual form of order book curves and their dynamic behavior, our results can be used as valuable inputs in theoretical frameworks.

While to the best of our knowledge our study is the first which models the shapes and dynamics of a complete (high-dimensional) order book, there is a substantial body of empirical literature on the dynamics of limit order books and the analysis of traders' order submission strategies, such as, e.g., Biais et al. (1995), Griffiths et al. (2000), Ahn et al. (2001), Ranaldo (2004), Hollifield et al. (2004), Bloomfield et al. (2005), Degryse et al. (2005), Hall and Hautsch (2006, 2007), Large (2007), Hasbrouck and Saar (2009) or Cao et al. (2009).

An important aspect in this literature is to analyze the question of how to optimally balance risks and gains of a trader's decision whether to post a market order or a limit order. As recently illustrated by Chacko et al.
(2008), a limit order can ultimately be seen as an American option, and transaction costs are rents that a monopolistic market maker extracts from impatient investors who trade via aggressive limit orders or market orders. Consequently, the analysis of liquidity risks (see, e.g., Johnson (2008), Liu (2009), Garvey and Wu (2009), Goyenko et al. (2009)) and transaction costs (see, e.g., Chacko et al. (2008), Hasbrouck (2009)) is a central focus of recent literature.

Given the objective to capture not only the volume around the best quotes but also pending quantities more deeply in the book, the underlying problem becomes inherently high-dimensional. A typical graphical snapshot of ask and bid curves for four stocks traded at the Australian Stock Exchange (ASX) in 2002 is given by Fig. 1. The curse of dimensionality applies immediately as soon as time variations of the order curve shapes have to be taken into account. As shown by Fig. 1 and as illustrated in more detail in the sequel of the paper, order volume is not necessarily only concentrated around the best quotes but can be substantially dispersed over a wider range of price levels. This is a typical scenario for moderately liquid markets such as that of the ASX. In such a context, the dynamic modelling of all volume levels individually becomes complicated and intractable.

We suggest reducing the high dimensionality of the order book by means of a factor decomposition using the so-called Dynamic Semiparametric Factor Model (DSFM) proposed by Fengler et al. (2007), Brüggemann et al. (2008), Park et al. (2009) and Cao et al. (2009).
Accordingly, we model the shape of the book in terms of underlying latent factors which are defined on a grid space around the best ask or bid quotes and can depend on additional explanatory variables capturing, e.g., the state of the market. In order to avoid specific functional forms for the shape of the curves, the factors as well as the corresponding loadings are estimated nonparametrically using B-splines. Then, in a second step, we model the multivariate dynamics of the factor loadings together with the best bid and the best ask price using a VEC specification.

Using this framework we aim to answer the following research questions: (i) How many factors are required to model order book curves reasonably well? (ii) What does the shape of the factors look like? (iii) What do the dynamics of the estimated factor loadings look like? (iv) Does there exist evidence for a strong cross-dependence between both sides of the order book? (v) Are quotes predictable in the short run? (vi) Does the shape of the order book curves depend on past price movements, past trading volume as well as past volatility? (vii) How successful is the model in predicting future liquidity supply and can it be used to improve order execution strategies?

Fig. 1. Limit order books for selected stocks traded at the ASX on July 8, 2002 at 10:15 (panels: BHP, NAB, MIM, WOW; horizontal axis: price in AUD; vertical axis: quantity). Red: bid curve, blue: ask curve.


Using limit order book data from four stocks traded at the ASX covering two months in 2002, we show that approximately 95% of the order book variations observed on 5-min intervals can be explained by two underlying time-varying factors. While the first factor captures the overall slope of the curves, the second one is associated with its curvature. Knowing the shape of the order book can help us to predict quotes in the short run. Further empirical results show relatively weak spill-over effects between the bid and the ask side of the market. It turns out that recent liquidity demand represented by the cumulative buy/sell trading observed over the past 5 min has an effect on the shape of the curve but does not induce a higher explanatory power. Similar evidence is shown for the impact of past returns and corresponding (realized) volatility. Moreover, we find that factor loadings follow highly persistent though stationary dynamics.

To evaluate the model's forecasting power, we perform an extensive out-of-sample forecasting analysis which is in line with a typical scenario in financial practice.
In particular, at every 5-min interval during a trading day, the model is re-estimated and used to produce forecasts for the pending volume on each price level for all future 5-min intervals during the remainder of the trading day. We show that our approach is able to outperform a naive prediction, where the current order book is used as a predictor for the remaining day. These results can be used to improve intra-day order execution strategies by reducing implied transaction costs.

The remainder of the paper is structured as follows: After the data description in Section 2, the Dynamic Semiparametric Factor Model (DSFM) is introduced in Section 3. Empirical results regarding the modelling and forecasting of liquidity supply are provided in Sections 4 and 5, respectively. Section 6 concludes.

2. Data

2.1. Trading at the ASX and descriptive statistics

The Australian Stock Exchange (ASX) is a continuous double auction electronic market, where the continuous auction trading period is preceded and followed by a call auction. Normal trading takes place continuously on all stocks between 10:09 a.m. and 4:00 p.m. from Monday to Friday. During continuous trading, any buy (sell) order entered with a price that is greater than (less than) or equal to that of existing queued sell (buy) orders will be executed immediately.
If an order cannot be executed completely, the remaining volume enters the queues as a limit order. Limit orders are queued in the buy and sell queues according to a strict price-time priority order. Orders can be entered, deleted and modified without restriction.

For order prices below 10 cents, the minimum tick size is 0.1 cents, for order prices above 10 cents and below 50 cents it is 0.5 cents, whereas for orders priced 50 cents and above it is 1 cent. Note that there might be orders which are entered with an undisclosed or hidden volume if the total value of the order exceeds AUD 200,000. Since this applies only to a small fraction of the posted volumes, we can safely neglect the occurrence of hidden volume in our empirical study. For more details on the data, see Hall and Hautsch (2007), who use the same database, as well as the official description of the trading rules of the Stock Exchange Automated Trading System (SEATS) on the ASX on www.asxonline.com.

We select four companies traded at the ASX covering the period from July 8 to August 16, 2002 (30 trading days), namely Broken Hill Proprietary Limited (BHP), National Australia Bank Limited (NAB), MIM and Woolworths (WOW). The number of market and limit orders for the selected stocks is given in Table 1.

We observe more buy orders than sell orders, implying that the bid side of the limit order book was changing more frequently than the ask side. BHP and NAB are significantly more actively traded than MIM and WOW shares.
Aggregated over all stocks, 20.08% (23.98%) of all bid (ask) limit orders have been changed (after posting), whereas 13.70% (14.89%) have been cancelled. Furthermore, for both traded as well as posted quantities we find that on average sell volumes are higher than buy volumes (not reported here). Hence, confirming the result above, liquidity variations on the bid side are higher than those on the ask side. This finding might be explained by the fact that during the analyzed period the market generally went down, creating more sell activities than buy activities.

The original dataset contains all limit order book records as well as the corresponding order curves represented by the underlying price-volume combinations. The latter is the particular object of interest for the remainder of the analysis.

Table 1
Total number of market and limit orders for selected stocks traded at the ASX from July 8 to August 16, 2002.

Orders                    BHP       NAB       MIM     WOW
Market orders
(i) Buy                   28,030    16,304    4115    7260
(ii) Sell                 16,755    15,142    2789    6464
Limit orders
(i) Buy (bid side)        50,012    28,850    9551    13,234
  – Changed               8009      7561      1637    3203
  – Cancelled             5202      4725      2044    1951
(ii) Sell (ask side)      32,053    25,953    6474    11,318
  – Changed               6891      6261      1862    3164
  – Cancelled             4692      3863      1178    1554
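The aggregate shares of changed and cancelled limit orders quoted in the text can be reproduced directly from the limit-order rows of Table 1. The short Python sketch below does this arithmetic; the dictionary layout and the function name `share` are illustrative choices and not part of the original study.

```python
# Limit-order counts transcribed from Table 1 (order: BHP, NAB, MIM, WOW).
limit_orders = {
    "bid": {
        "posted": [50012, 28850, 9551, 13234],
        "changed": [8009, 7561, 1637, 3203],
        "cancelled": [5202, 4725, 2044, 1951],
    },
    "ask": {
        "posted": [32053, 25953, 6474, 11318],
        "changed": [6891, 6261, 1862, 3164],
        "cancelled": [4692, 3863, 1178, 1554],
    },
}

def share(side, action):
    """Percentage of posted limit orders on `side` that were `action`,
    aggregated over all four stocks."""
    counts = limit_orders[side]
    return 100.0 * sum(counts[action]) / sum(counts["posted"])

for side in ("bid", "ask"):
    print(f"{side}: changed {share(side, 'changed'):.2f}%, "
          f"cancelled {share(side, 'cancelled'):.2f}%")
# bid: changed 20.08%, cancelled 13.70%
# ask: changed 23.98%, cancelled 14.89%
```

The printed values match the 20.08% (23.98%) and 13.70% (14.89%) reported above, which confirms that the percentages are taken relative to all posted bid (ask) limit orders.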


2.2. Notation and data preprocessing

The underlying limit order book data contains identification attributes regarding $r = 1, \ldots, R$ different orders as well as quantities demanded and offered for different price levels $j = 1, \ldots, J$, at any time point $t = 1, \ldots, T$. Particularly, at any t, we observe $J = 101$ price levels on a fixed minimum tick size grid originating from the best bid and ask quote.

Since the order book dynamics are found to be very persistent, we choose a sampling frequency of 5 min without losing too much information on the liquidity supply. To remove effects due to market opening and closure, the first 15 min and last 5 min are discarded. Hence, at each trading day, starting at 10:15 and ending at 15:55, we select per stock 69 price-quantity vectors, in total $T = 2070$ vectors over the whole sample period. Denote $\tilde{Y}^b_{t,j}$ and $\tilde{Y}^a_{t,j}$ as the pending bid and ask volumes at bid and ask limit prices $\tilde{S}^b_{t,j}$ and $\tilde{S}^a_{t,j}$, respectively, at time point t.

We define the best bid price at time t as the highest buy price $\tilde{S}^b_{t,101}$, and similarly, the best ask price at t as the lowest sell price $\tilde{S}^a_{t,1}$. The corresponding quantities at best bid and ask prices are then $\tilde{Y}^b_{t,101}$ and $\tilde{Y}^a_{t,1}$, respectively, yielding the mid-quote price to be defined as $\tilde{S}_t = (\tilde{S}^b_{t,101} + \tilde{S}^a_{t,1})/2$.
The absolute price deviations from the best bid and ask price at level j and time t are given by $\breve{S}^b_{t,j} = \tilde{S}^b_{t,j} - \tilde{S}^b_{t,101}$ and $\breve{S}^a_{t,j} = \tilde{S}^a_{t,j} - \tilde{S}^a_{t,1}$, respectively, and constitute a fixed price grid. To measure spreads between individual price levels in relative terms, i.e., in relation to the prevailing best bid and ask price, we define so-called 'relative price levels' as $S^b_{t,j} = \breve{S}^b_{t,j} / \tilde{S}^b_{t,101}$ and $S^a_{t,j} = \breve{S}^a_{t,j} / \tilde{S}^a_{t,1}$, respectively.

In order to investigate to which extent order book information might reveal information to predict high-frequency returns, we regress 1 min and 5 min mid-quote returns, respectively, on lagged order imbalances

$$\frac{\tilde{Y}^b_{t-1,j}}{\tilde{Y}^b_{t-1,j} + \tilde{Y}^a_{t-1,j}} \quad \text{and} \quad \frac{\tilde{Y}^a_{t-1,j}}{\tilde{Y}^b_{t-1,j} + \tilde{Y}^a_{t-1,j}},$$

respectively, for $j = 1, \ldots, 101$. Fig. 2 shows the implied $R^2$ values in dependence of the number of included imbalance levels. It turns out that order book imbalances indeed reveal short-term predictability. Interestingly, even levels far apart from the market still have distinct prediction power, pushing the $R^2$ to values of approximately 10%. These findings show that the order book itself reveals predictive content for future price movements which could be exploited in trading strategies.

In order to account for intra-day seasonality effects, we adjust the order volumes correspondingly.
To avoid seasonally adjusting all individual volume series separately, we assume that the seasonality impact on quoted volumes at all levels is identical and is well captured by the seasonalities in market depth on the best bid and ask levels $\tilde{Y}^b_{t,101}$ and $\tilde{Y}^a_{t,1}$, respectively. Assuming a multiplicative impact of the seasonality factor, the seasonally adjusted quantities are computed for both sides of the market at price level j and time t as

$$Y^b_{t,j} = \frac{\tilde{Y}^b_{t,j}}{s^b_t}, \qquad (1)$$

$$Y^a_{t,j} = \frac{\tilde{Y}^a_{t,j}}{s^a_t}, \qquad (2)$$

with $s^b_t$ and $s^a_t$ representing the seasonality components at time t for the bid and the ask side, respectively.

The non-stochastic seasonal trend factors $s^b_t$ and $s^a_t$ are specified parametrically using a flexible Fourier series approximation as proposed by Gallant (1981) and are given by

$$s^b_t = \delta^b \cdot t + \sum_{m=1}^{M_b} \left\{ \delta^b_{c,m} \cos(t \cdot 2\pi m) + \delta^b_{s,m} \sin(t \cdot 2\pi m) \right\} \qquad (3)$$

Fig. 2. Coefficients of determination ($R^2$) implied by linear regression of 1 min (red) and 5 min (blue) mid-quote returns on lagged order imbalances for selected stocks traded at the ASX from July 8 to August 16, 2002 (30 trading days); panels: BHP, NAB, MIM, WOW. The horizontal axis depicts the number of included imbalance levels.
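The preprocessing steps of this subsection can be sketched in a few lines of Python. The block below is only an illustration on toy numbers: it computes the mid-quote, the relative price levels and the order imbalances as defined above, and then fits the flexible Fourier form of Eq. (3) to best-level depth before dividing as in Eqs. (1) and (2). Estimating the seasonal factor by ordinary least squares is an assumption (the text does not state the fitting method), and all function names and the simulated depth series are hypothetical.

```python
import numpy as np

def mid_quote(best_bid, best_ask):
    """Mid-quote price: average of best bid and best ask."""
    return 0.5 * (best_bid + best_ask)

def relative_price_levels(prices, best_price):
    """Price deviations from the best quote, divided by the best quote."""
    return (prices - best_price) / best_price

def order_imbalances(bid_vol, ask_vol):
    """Bid and ask shares of the total pending volume, level by level."""
    total = bid_vol + ask_vol
    return bid_vol / total, ask_vol / total

def fourier_design(t, M):
    """Regressors t, cos(2*pi*m*t), sin(2*pi*m*t) for m = 1..M, as in Eq. (3)."""
    cols = [t]
    for m in range(1, M + 1):
        cols += [np.cos(2 * np.pi * m * t), np.sin(2 * np.pi * m * t)]
    return np.column_stack(cols)

def seasonal_factor(best_depth, t, M=1):
    """Fitted seasonal factor s_t from a least-squares fit to best-level depth
    (OLS is an assumption made for this sketch)."""
    Z = fourier_design(t, M)
    coef, *_ = np.linalg.lstsq(Z, best_depth, rcond=None)
    return Z @ coef

# Toy book with three levels per side; last bid / first ask are the best quotes.
bid_prices = np.array([9.98, 9.99, 10.00])
ask_prices = np.array([10.02, 10.03, 10.04])
mid = mid_quote(bid_prices[-1], ask_prices[0])
rel_bid = relative_price_levels(bid_prices, bid_prices[-1])
rel_ask = relative_price_levels(ask_prices, ask_prices[0])
imb_bid, imb_ask = order_imbalances(np.array([300.0, 200.0, 100.0]),
                                    np.array([100.0, 200.0, 300.0]))

# 69 five-minute points per day mapped onto (0, 1]; synthetic depth that
# follows the Fourier form exactly, so the adjusted series is flat at 1.
t = np.arange(1, 70) / 69.0
depth = 10 * t + 2 * np.cos(2 * np.pi * t) + np.sin(2 * np.pi * t)
adjusted = depth / seasonal_factor(depth, t, M=1)
```

With real data the adjusted series would fluctuate around one rather than equal it, and the same fitted factor would be applied to all 101 levels on the respective side of the book.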


W.K. Härdle et al. / Journal of Empirical Finance 19 (2012) 610–625

$$s^a_t = \delta^a \cdot t + \sum_{m=1}^{M^a} \left\{ \delta^a_{c,m} \cos(t \cdot 2\pi m) + \delta^a_{s,m} \sin(t \cdot 2\pi m) \right\}. \tag{4}$$

Here $\delta^b$, $\delta^a$, $\delta^b_{c,m}$, $\delta^a_{c,m}$, $\delta^b_{s,m}$ and $\delta^a_{s,m}$ are coefficients to be estimated, and $t$ denotes a normalized time trend mapping the time of the day onto the interval $(0,1]$. The polynomial orders $M^b$ and $M^a$ are selected according to the Bayes information criterion (BIC). For all stocks we select $M^b = M^a = 1$, except for the bid side for BHP ($M^b = 2$). The resulting intra-day seasonality patterns for both sides of all order book markets are plotted in Fig. 3.

For all stocks, we observe that the liquidity supply increases before market closure. We attribute this finding to traders' pressure and willingness to close positions overnight. Posting aggressive limit orders on the best levels (or even within the spread) maximizes the execution probability and avoids crossing the spread. Moreover, weak evidence for a 'lunch time dip' is presented which, however, is only observed for the more liquid stocks (NAB and BHP). In contrast, for the less liquid stocks, the amount of posted volume increases nearly monotonically over the course of the day.

3. The dynamic semiparametric factor model

Recall that the object of interest is the high-dimensional object of seasonally adjusted level-dependent order volume inventories $(Y^b_{t,j}, Y^a_{t,j}) \in \mathbb{R}^{202}$, observed on a 5-min frequency.
Proposing a suitable statistical model requires finding an appropriate way of reducing the high dimension without losing too much information on the spatial and dynamic structure of the process. Moreover, applicability of the model requires computational tractability as well as numerical stability.

A common way to reduce the dimensionality of multivariate processes is to apply a factor decomposition. The underlying idea is that the high-dimensional process is ideally driven by only a few common factors which contain most of the underlying information. Factor models are often applied in the asset pricing literature to extract underlying common risk factors. In this spirit, a successful parametric factor model has been proposed, for instance, by Nelson and Siegel (1987) to model yield curves. In this framework, the shape of the curve is parametrically captured by Laguerre polynomials.

Limit order book curves inherently reflect traders' price expectations and the supply and demand in the market (see, e.g., Glosten (1994) for a theoretical framework). As there is no obvious parametric form for ask and bid curves and we want to avoid imposing assumptions on functional form, we prefer to capture the curve's spatial structure in a nonparametric way. A natural and powerful class of models for this kind of problem is the class of Dynamic Semiparametric Factor Models (DSFMs) proposed by Fengler et al. (2007), Brüggemann et al. (2008), Park et al. (2009) and Cao et al. (2009).
The DSFM successfully combines the advantages of a nonparametric approach for cross-sectionally ('spatially') fitting a curve and those of a parametric time series model for modelling persistent multivariate dynamics.

Assume that the observable $J$-dimensional random vector, $Y_{t,j}$, can be modelled based on the following orthogonal $L$-factor model,

$$Y_{t,j} = m_{0,j} + Z_{t,1} m_{1,j} + \cdots + Z_{t,L} m_{L,j} + \varepsilon_{t,j}, \tag{5}$$

where $m(\cdot) = (m_0, m_1, \dots, m_L)^\top$ denotes the time-invariant factors, a tuple of functions with the property $m_l : \mathbb{R}^d \to \mathbb{R}$, $l = 0, \dots, L$; $Z_t = (1_T, Z_{t,1}, \dots, Z_{t,L})^\top$ denotes the time series of factor loadings, and $\varepsilon_{t,j}$ represents a white noise error term. The time index is denoted by $t = 1, \dots, T$, whereas the cross-sectional index is $j = 1, \dots, J$. Note that this type of factor model is rather restrictive, because it does not take explanatory variables into account.

Fig. 3 (panels: BHP, NAB, MIM, WOW; vertical axis: Factor; horizontal axis: Time, 11:00 to 15:00). Estimated intra-day seasonality factors for quantities offered at best bid prices (red) and for quantities supplied at best ask prices (blue) across selected stocks traded at the ASX from July 8 to August 16, 2002 (30 trading days).
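The seasonality factors plotted in Fig. 3 follow the flexible Fourier form of Eqs. (3) and (4), with the order chosen by BIC and the adjustment applied multiplicatively as in Eqs. (1) and (2). A minimal sketch of that procedure is given below. It assumes ordinary least squares as the estimator and uses synthetic data; the function names `fourier_design`, `fit_seasonality` and `deseasonalize` are hypothetical and not taken from the paper.

```python
import numpy as np

def fourier_design(t_norm, M):
    """Columns: t, cos(2*pi*m*t), sin(2*pi*m*t) for m = 1..M (no intercept, as in Eqs. (3)-(4))."""
    cols = [t_norm]
    for m in range(1, M + 1):
        cols.append(np.cos(2 * np.pi * m * t_norm))
        cols.append(np.sin(2 * np.pi * m * t_norm))
    return np.column_stack(cols)

def fit_seasonality(depth, t_norm, max_order=3):
    """OLS fit for each order M = 1..max_order; return (M, coefficients) with the smallest BIC."""
    n = len(depth)
    best = None
    for M in range(1, max_order + 1):
        X = fourier_design(t_norm, M)
        coef = np.linalg.lstsq(X, depth, rcond=None)[0]
        rss = np.sum((depth - X @ coef) ** 2)
        bic = n * np.log(rss / n) + X.shape[1] * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, M, coef)
    return best[1], best[2]

def deseasonalize(volume, t_norm, M, coef):
    """Divide raw volume by the fitted seasonal factor (multiplicative adjustment, Eqs. (1)-(2))."""
    return volume / (fourier_design(t_norm, M) @ coef)

# Synthetic example (made-up numbers): depth at the best quote over one trading day.
rng = np.random.default_rng(0)
t_norm = np.arange(1, 501) / 500.0  # normalized time of day in (0, 1]
true_s = 10.0 * t_norm + 2.0 * np.cos(2 * np.pi * t_norm) + np.sin(2 * np.pi * t_norm)
depth = true_s + 0.1 * rng.standard_normal(500)

M_hat, coef_hat = fit_seasonality(depth, t_norm)
adjusted = deseasonalize(depth, t_norm, M_hat, coef_hat)
```

In this sketch the factor fitted on the best-level depth would be reused to divide the volumes at every price level of the same market side, which mirrors the assumption that the seasonality impact is identical across levels.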


The DSFM is a generalization of the factor model given in Eq. (5) and allows the factors $m_l$ to depend upon explanatory variables, $X_{t,j}$. Its analytical form is given by

$$Y_{t,j} = \sum_{l=0}^{L} Z_{t,l}\, m_l(X_{t,j}) + \varepsilon_{t,j} = Z_t^\top m(X_{t,j}) + \varepsilon_{t,j}, \tag{6}$$

where the processes $X_{t,j}$, $\varepsilon_{t,j}$ and $Z_t$ are assumed to be independent. Moreover, the number of underlying factors $L$ should not exceed the dimension of the object, $J$. The main idea of the DSFM is that $L$ is significantly smaller than $J$, resulting in a severe dimension reduction of the process.

As suggested by Park et al. (2009), the estimation of the factors $m_l$ is performed using a series estimator. For $K \geq 1$, appropriate functions $\psi_k : [0,1]^d \to \mathbb{R}$, $k = 1, \dots, K$, which are normalized such that $\int \psi_k^2(x)\,dx = 1$ holds, are selected. Park et al. (2009) select tensor B-spline basis functions for $\psi_k$, whereas Fengler et al. (2007) use a kernel smoothing approach. In the present study, we follow the former strategy and employ tensor B-spline basis functions.

After selecting the functions $\psi_k$, the factors $m(\cdot) = (m_0, m_1, \dots, m_L)^\top$ are approximated by $A\psi$, where $A = (a_{l,k}) \in \mathbb{R}^{(L+1) \times K}$ is a coefficient matrix, and $\psi(\cdot) = (\psi_1, \dots, \psi_K)^\top$ denotes a vector of selected functions. Here, $K$ denotes the number of knots used for the tensor B-spline functions and is interpretable as a bandwidth parameter.
Thus, the first part on the right-hand side of (6), which incorporates all factors and factor loadings, can be rewritten as

$$Z_t^\top m(X_{t,j}) = \sum_{l=0}^{L} Z_{t,l}\, m_l(X_{t,j}) = \sum_{l=0}^{L} Z_{t,l} \sum_{k=1}^{K} a_{l,k}\, \psi_k(X_{t,j}) = Z_t^\top A \psi(X_{t,j}). \tag{7}$$

In modelling liquidity supply we use either the 'relative price levels' on the bid side $S^b_{t,j}$ or those on the ask side $S^a_{t,j}$ as the most important explanatory variable $X_{t,j}$. When focusing on the LOB shape predictability, we add key (weakly exogenous) trading variables, namely the past 5-min aggregated trading volume on both sides of the market, the past 5-min log mid-quote return as well as the past 5-min volatility, see Section 3.

The coefficient matrix $A$ and the time series of factor loadings $Z_t$ can be estimated using least squares. Hence, the estimated matrix $\hat{A}$ and factor loadings $\hat{Z}_t = (1_T, \hat{Z}_{t,1}, \dots, \hat{Z}_{t,L})^\top$ are defined as minimizers of the sum of squared residuals, $S(A, Z_t)$:

$$(\hat{Z}_t, \hat{A}) = \arg\min_{Z_t, A} S(A, Z_t) \tag{8}$$

$$\phantom{(\hat{Z}_t, \hat{A})} = \arg\min_{Z_t, A} \sum_{t=1}^{T} \sum_{j=1}^{J} \left\{ Y_{t,j} - Z_t^\top A \psi(X_{t,j}) \right\}^2. \tag{9}$$

To find a solution of the minimization problem stated in Eq. (9), a Newton–Raphson algorithm is used. Park et al. (2009) show that this algorithm converges to a solution at a geometric rate under some weak conditions on the initial choice $\{\operatorname{vec}(A)^{(0)}, Z_t^{(0)}\}$. Moreover, Park et al. (2009) prove that the difference between the estimated loadings $\hat{Z}_t$ and the true loadings $Z_t$ is asymptotically negligible.
Consequently, it is justified to use, in a second step, multivariate time series specifications in order to model the dynamics of the factor loadings. Note that due to the estimation complexity, the coefficients of the seasonal trend factors in Eqs. (1) and (2) are not estimated jointly with the unknown parameters (matrix $A$) and the factor loadings.

The selection of the number of time-invariant factors ($L$) and the number of knots $K$ is performed by evaluating the proportion of explained variance (EV) given by

$$EV(L) = 1 - RV(L) = 1 - \frac{\sum_{t=1}^{T} \sum_{j=1}^{J} \left\{ Y_{t,j} - \sum_{l=0}^{L} \hat{Z}_{t,l}\, \hat{m}_l(X_{t,j}) \right\}^2}{\sum_{t=1}^{T} \sum_{j=1}^{J} \left\{ Y_{t,j} - \bar{Y} \right\}^2}. \tag{10}$$

Moreover, the knots used in the tensor B-spline functions should be specified in advance. We choose linearly spaced knots, with a starting point determined by the minimal value of the explanatory variable (corrected by −5%), and the end point corresponding to the maximal value (corrected by 5%). Sensitivity analysis shows that the results are quite stable regarding the choice of grid points.

Because of the use of tensor B-spline functions for the demand and supply curves, which are monotone in the price levels, our estimated first factor $\hat{m}_1$ and the estimated quantities $\hat{Y}_{t,j}$ are adjusted for extreme price levels. Correspondingly, for the bid side we keep constant the first (lowest) ten level values, and analogously, for the ask side we fix the last (highest) ten level values.
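The least-squares problem in Eq. (9) and the explained variance in Eq. (10) can be illustrated with a small numerical sketch. Note the substitutions: the paper uses a Newton–Raphson algorithm and tensor B-splines, whereas the code below uses plain alternating least squares and degree-1 (hat-function) B-splines on a common grid $X_{t,j} = x_j$, purely to keep the example short. The data are simulated, and the names `hat_basis`, `fit_dsfm_als` and `explained_variance` are hypothetical.

```python
import numpy as np

def hat_basis(x, K):
    """Degree-1 B-spline (hat function) basis on K equally spaced knots; returns a (J, K) matrix."""
    knots = np.linspace(x.min(), x.max(), K)
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)

def fit_dsfm_als(Y, Psi, L, n_iter=200, seed=0):
    """Alternating least squares for Y ~ Z @ A @ Psi.T with Z[:, 0] fixed to 1.

    This stands in for the Newton-Raphson scheme mentioned in the text.
    """
    T = Y.shape[0]
    rng = np.random.default_rng(seed)
    Z = np.column_stack([np.ones(T), rng.standard_normal((T, L))])
    Psi_pinv_T = np.linalg.pinv(Psi).T               # (J, K)
    for _ in range(n_iter):
        # Given Z, the least-squares coefficient matrix is pinv(Z) @ Y @ pinv(Psi).T
        A = np.linalg.pinv(Z) @ Y @ Psi_pinv_T       # (L+1, K)
        M = A @ Psi.T                                # factor values on the grid, (L+1, J)
        # Given A, regress each Y_t - m_0 on m_1, ..., m_L to update the loadings
        Z[:, 1:] = np.linalg.lstsq(M[1:].T, (Y - M[0]).T, rcond=None)[0].T
    return A, Z

def explained_variance(Y, fitted):
    """EV = 1 - RSS / total sum of squares around the overall mean, cf. Eq. (10)."""
    return 1.0 - np.sum((Y - fitted) ** 2) / np.sum((Y - Y.mean()) ** 2)

# Simulated two-factor example (all numbers are made up).
rng = np.random.default_rng(1)
T, J, K = 200, 50, 10
x = np.linspace(0.0, 1.0, J)
m0, m1, m2 = 5.0 + 3.0 * x, np.sin(np.pi * x), np.cos(2 * np.pi * x)
Z_true = rng.standard_normal((T, 2))
Y = m0 + np.outer(Z_true[:, 0], m1) + np.outer(Z_true[:, 1], m2)
Y = Y + 0.1 * rng.standard_normal((T, J))

Psi = hat_basis(x, K)
A_hat, Z_hat = fit_dsfm_als(Y, Psi, L=2)
ev = explained_variance(Y, Z_hat @ A_hat @ Psi.T)
```

Repeating the fit for several values of `L` and `K` and comparing `ev` would correspond to the selection step described above.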


The model's goodness-of-fit is evaluated using the root mean squared error (RMSE) criterion,

$$RMSE = \sqrt{\frac{1}{TJ} \sum_{t=1}^{T} \sum_{j=1}^{J} \left\{ Y_{t,j} - \sum_{l=0}^{L} \hat{Z}_{t,l}\, \hat{m}_l(X_{t,j}) \right\}^2}. \tag{11}$$

4. Modelling limit order book dynamics

To model order book dynamics we follow a two-step procedure for each stock individually. Employing the DSFM approach in the first step, we model the shape of order book curves as a function of relative price levels. In the following step, the dynamics of the estimated factor loadings are analysed jointly with the best bid quotes, best ask quotes and the bid-ask spread in a parametric multivariate time series context. This procedure allows us to study the cross-dependency between both sides of the market, the interactions between the limit order book and the quotes, as well as the impact of the bid-ask spread on liquidity supply. Moreover, we investigate whether the order book shape itself is predictable by additional covariates, particularly, the past trading volume, past (realized) volatility as well as past log returns.

4.1. Limit order book modelling using the DSFM

We distinguish between two implementation methods of the DSFM:

(i) Separated approach: Separate analysis of both sides of the limit order book, i.e., the bid side, $Y^b_{t,j} \in \mathbb{R}^{101}$, and the ask side, $Y^a_{t,j} \in \mathbb{R}^{101}$.
(ii) Combined approach: Simultaneous modelling of both sides of the limit order book with the bid side reversed, i.e., $(-Y^b_{t,j}, Y^a_{t,j}) \in \mathbb{R}^{202}$.

To model the limit order book as a function of the relative price levels using the DSFM, i.e., the relative price deviations from the best bid price and best ask price, $S^b_{t,j}$ and $S^a_{t,j}$, respectively, we impose $K = 20$ knots for the B-spline functions in case of the separated approach and $K = 40$ knots in case of the combined approach. Using more knots does not result in significant improvements of the explained variance or of the corresponding RMSE, as defined in Eqs. (10) and (11).

Empirical results, available from the authors upon request, show that up to approximately 95% of the variation in order curves can be explained using $L = 2$ factors, whereas the marginal contribution of a potential third factor is very small. Consequently, a two-factor DSFM specification is sufficient to capture the curve dynamics and is used in the remainder of the analysis.
Furthermore, comparing the performance of the two alternative DSFM specifications, it turns out that in almost all cases the DSFM-separated approach outperforms the DSFM-combined approach in terms of a higher proportion of explained variance and lower values of the root mean squared error. We observe that at almost every price level the DSFM-separated approach outperforms the DSFM-combined approach. Therefore, the remainder of the analysis will rely on the DSFM-separated approach with two factors.

Fig. 4 (panels: BHP, NAB, MIM, WOW; rows: 1st Factor, 2nd Factor; horizontal axis: Price Levels). Estimated first and second factor of the limit order book depending on relative price levels using the DSFM-separated approach with two factors for selected stocks traded at the ASX from July 8 to August 16, 2002 (30 trading days). Red: bid curve, blue: ask curve.


Fig. 4 depicts the nonparametric estimates of the first and second factor $\hat{m}_1$ and $\hat{m}_2$ as functions of the relative price grids. The first factor obviously captures the overall slope of the curve, which is associated with the average trading costs for all volume levels on the corresponding sides of the market. In contrast, the second factor captures order curve fluctuations around the overall slope and thus can be interpreted as a 'curvature' factor in the spirit of Nelson and Siegel (1987). The shape of this factor reveals that the curve's curvature is particularly distinct for levels close to the best quotes and for levels very deep in the book where the curve seems to spread out. The shapes of the estimated factors are remarkably similar for all stocks except for MIM. For the latter stock, the shapes of both factors are quite similar and deviate significantly from those reported for the other stocks. This finding is explained by the peculiarities of MIM, for which the relative tick size is larger than for the other stocks. This implies that liquidity is concentrated on relatively few price levels around the best ask and bid quotes, whereas the book flattens out for higher levels. This pattern is clearly revealed by the corresponding factors shown in Fig. 4.

However, a priori it is unclear whether modelling order book curves based on all 101 price levels is most appropriate in a prediction context.
Besides the well-known trade-off between in-sample fit and out-of-sample prediction performance, we also face the difficulty that the predictive information revealed by order book volume might depend on the distance to the best quotes. For instance, if price levels far away from the market contain information that helps to predict future books, this information should be taken into account. However, if they contain virtually only noise (e.g., because of stale orders), it would be preferable to ignore this information in order to extract a more precise factor structure on lower price levels only. Since optimizing this choice in an (out-of-sample) prediction context is tedious and computationally cumbersome, we restrict ourselves to the quite common procedure of performing model selection based on in-sample information. Accordingly, we evaluate the model-implied explained variance when not the full grid of 101 levels but just 25, 50 and 75 levels are employed. It turns out that the explained variance remains largely unchanged, with the model fit increasing with the number of incorporated levels. This is particularly important in the context of order books of less liquid stocks.
Therefore, we proceed by extracting the factor structure employing the entire book.

Time series plots of the corresponding factor loadings $\hat{Z}^b_t$ and $\hat{Z}^a_t$ are shown in Fig. 5. We observe that the loadings vary strongly over time, reflecting time variations in the shape of the book. The series reveal clustering structures indicating a relatively high persistence in the processes. This result is not very surprising given the fact that order book inventories do not change too severely during short time horizons. Observing order book volumes on even higher frequencies than 5 min further increases this persistence, ultimately driving the processes toward unit root processes. Naturally, this behavior is particularly distinct for less frequently traded stocks and less severe for highly active stocks (cf. Hautsch and Huang (2012) for corresponding results for more liquid assets).

The high persistence is confirmed by autocorrelation functions of $\hat{Z}^b_t$ and $\hat{Z}^a_t$ (not shown in the paper) and corresponding unit root and stationarity tests. According to the Schmidt-Phillips test (see Schmidt and Phillips (1992), $H_0$: unit root), for all processes the null hypothesis of a unit root can be rejected at the 5% significance level (test statistics for all estimated factor loadings are in the range [−201.53, −53.88], whereas the critical value equals −25.20).
Conversely, testing the null hypothesis of stationarity using the KPSS test (see Kwiatkowski et al. (1992), $H_0$: weak stationarity) implies no rejections for the majority of the processes. Nevertheless, in five cases we have to reject stationarity. Finally, to test for possible cointegration between the factor loadings, we perform the Johansen (1991) trace test (not shown in the paper) but do not find significant evidence for common stochastic trends underlying the order book.

A graphical illustration of the goodness-of-fit of the model, depicting the estimated vs. the actually observed limit order book curve, would suggest that the model fits the observed curves very well (no illustrations provided here). This is particularly true for price levels close to the best ask and bid quotes, at any chosen trading day and stock. Slight deviations are observed for price levels deep in the book. However, the latter case is less relevant for most applications in practice.

Fig. 5 (panels: BHP, NAB, MIM, WOW; rows: 1st Loadings, 2nd Loadings; horizontal axis: Trading Day). Estimated first and second factor loadings of the limit order book depending on relative price levels using the DSFM-separated approach with two factors for selected stocks traded at the ASX from July 8 to August 16, 2002 (30 trading days). Red: bid curve, blue: ask curve.

4.2. Modelling limit order book dynamics

Our approach, following the philosophy 'smooth in space and parametric in time', allows us to investigate the limit order book dynamics in a multivariate time series modelling context, as well as to relate the order book dynamics to the time evolution of additional covariates. Formally, for each stock we focus on the dynamics of the four estimated stationary factor loadings. Including the best bid and the best ask price returns, we consider a (six-dimensional) vector of endogenous variables

$$z_t = \left( \hat{Z}^b_{1,t},\ \hat{Z}^b_{2,t},\ \hat{Z}^a_{1,t},\ \hat{Z}^a_{2,t},\ \Delta \log \tilde{S}^b_{t,101},\ \Delta \log \tilde{S}^a_{t,1} \right)^\top,$$

where $\hat{Z}^b_{1,t}$, $\hat{Z}^b_{2,t}$, $\hat{Z}^a_{1,t}$ and $\hat{Z}^a_{2,t}$ denote the estimated first (1) and second (2) factor loadings for the bid (b) and ask side (a), respectively. We denote by $\Delta \log \tilde{S}^b_{t,101}$ the best bid price return, and similarly, by $\Delta \log \tilde{S}^a_{t,1}$ the best ask price return.

Following Engle and Patton (2004) and Hautsch and Huang (2012), the bid-ask spread $\log \tilde{S}^b_{t-1,101} - \log \tilde{S}^a_{t-1,1}$ serves as a natural cointegration relationship between the two integrated ask and bid series.
As all other endogenous variables are shown to be stationary, we obtain a vector error correction (VEC) specification of order $q$ with the spread as the only cointegration relationship, i.e.,

$$z_t = c + \Gamma_1 z_{t-1} + \dots + \Gamma_q z_{t-q} + \gamma \left( \log \tilde{S}^b_{t-1,101} - \log \tilde{S}^a_{t-1,1} \right) + \varepsilon_t. \tag{12}$$

Here $c$ denotes a vector of constants, the vector $\gamma = (\gamma_1, \dots, \gamma_6)^\top$ collects parameters associated with the lagged bid-ask spread and $\varepsilon_t$ represents a white noise error term. The matrices $\Gamma_1, \Gamma_2, \dots, \Gamma_q$ are parameter matrices associated with lagged endogenous variables. Technically, we determine the order $q$ according to the BIC.

Estimation results show that in all cases, a maximum lag order of $q = 4$ is sufficient. In particular, the following model orders are selected: BHP and WOW ($q = 3$), NAB ($q = 2$), MIM ($q = 4$). For the sake of brevity we refrain from showing all parameter estimates here, but just report the estimates of matrix $\Gamma_1$ and vector $\gamma$ for BHP, NAB, MIM and WOW, respectively, which contain the most relevant information for an economic interpretation (5% significance is denoted by an asterisk (*)):

BHP:

$$\begin{pmatrix} 0.95 & 0.63 & -0.05 & -0.26 & 3.03 & 18.08 \\ 0.02 & 0.79 & 0.00 & 0.04 & 10.68 & -16.12 \\ 0.04 & 0.00 & 0.75 & 0.02 & -59.60 & 67.60 \\ -0.00 & 0.04 & 0.02 & 0.77 & -13.99 & 13.55 \\ 0.00 & 0.00 & -0.00 & 0.00 & -0.59 & 0.29 \\ 0.00 & 0.00 & -0.00 & 0.00 & -0.26 & -0.04 \end{pmatrix}, \quad \begin{pmatrix} -95.70 \\ -34.13 \\ 86.83 \\ -13.21 \\ -0.42 \\ 0.02 \end{pmatrix};$$

NAB:

$$\begin{pmatrix} 0.71 & 0.16 & -0.04 & -0.21 & 123.78 & -124.07 \\ 0.04 & 0.78 & -0.00 & 0.07 & -22.39 & 21.91 \\ 0.04 & 0.13 & 0.73 & 0.18 & -88.56 & 86.91 \\ -0.03 & -0.03 & 0.03 & 0.71 & 26.03 & -25.46 \\ 0.00 & 0.00 & -0.00 & -0.00 & 0.21 & -0.34 \\ 0.00 & 0.00 & -0.00 & 0.00 & 0.29 & -0.41 \end{pmatrix}, \quad \begin{pmatrix} -174.41 \\ 9.26 \\ 47.60 \\ -20.59 \\ -0.85 \\ -0.04 \end{pmatrix};$$

MIM:

$$\begin{pmatrix} 0.90 & 1.29 & -0.00 & 0.55 & -46.92 & 50.79 \\ 0.00 & 0.93 & -0.01 & -0.01 & 1.12 & -1.49 \\ -0.02 & 1.23 & 0.99 & 0.48 & 31.56 & -44.50 \\ 0.00 & 0.04 & 0.03 & 0.84 & -44.50 & 66.73 \\ 0.00 & 0.00 & -0.00 & -0.00 & 0.40 & -0.58 \\ 0.00 & 0.00 & -0.00 & -0.00 & 0.90 & -1.09 \end{pmatrix}, \quad \begin{pmatrix} 62.01 \\ 0.25 \\ -5.89 \\ -21.66 \\ -0.28 \\ -0.18 \end{pmatrix}$$

and WOW:

$$\begin{pmatrix} 0.74 & -0.02 & 0.12 & 0.38 & 28.87 & -37.11 \\ 0.04 & 0.82 & -0.02 & -0.04 & 2.53 & -3.58 \\ 0.04 & 0.03 & 0.87 & 0.19 & -70.61 & 72.84 \\ -0.03 & 0.02 & 0.02 & 0.83 & 12.81 & -13.70 \\ -0.00 & -0.00 & 0.00 & 0.00 & 0.02 & -0.15 \\ -0.00 & -0.00 & 0.00 & 0.00 & 0.21 & -0.34 \end{pmatrix}, \quad \begin{pmatrix} -27.14 \\ -6.33 \\ 59.98 \\ -4.04 \\ -0.51 \\ 0.05 \end{pmatrix}.$$

The estimation results can be summarised as follows:

Firstly, we observe strong own-process dynamics, but only relatively weak (mostly insignificant) cross-dependencies between the endogenous variables. The latter are most pronounced for less frequently traded stocks (MIM and WOW). Overall, the quite weak inter-dependencies between the processes on the ask and bid side indicate that time variations in the liquidity schedule on one side are almost unaffected by those on the other side.


Fig. 6 (panels: BHP, NAB, MIM, WOW; vertical axis: Response; horizontal axis: Time in min.). Orthogonalized impulse-response analysis: responses of the best bid quote return to a one standard deviation shock in the estimated first bid factor loadings (upper panel) and response of the best ask quote return to a one standard deviation shock in the estimated first ask factor loadings (lower panel). We employ the DSFM-separated approach with two factors and a VEC specification for selected stocks traded at the ASX from July 8 to August 16, 2002 (30 trading days). The response variable always enters the VEC specification in the first position. 95% confidence intervals are shown with dashed lines.

Secondly, the major finding is that quote changes are short-run predictable given the shape of the book. More precisely, changes in the factor loading have a short-term impact on the quote changes, up to, say, 5–10 min. The impact is significant for the frequently traded stocks, and less severe for less liquid stocks.
In particular, a shock on the bid side resulting in an upward rotation of the bid curve (inducing a higher sell pressure) leads to an instantaneous decrease in the best bid quote, followed by a significant increase of the price within the next few minutes, see, e.g., Fig. 6. This is driven by a growing buy pressure reflected by an increase of bid depth at and behind the market. Fig. 6 depicts the impulse responses of ask and bid quotes driven by a shock in the order book slope. While these effects are quite distinct on the bid side, they are less pronounced on the ask side: a shock on the ask side has a more neutral effect on the price, see, e.g., Fig. 6. Note, however, that the predictability of quotes only holds over comparably short horizons. Therefore, for daily order execution strategies, as discussed in Section 5, these effects are only of limited use.

Thirdly, we find slight evidence for asymmetric reactions of slope factor loadings to changes of the bid-ask spread. In particular, we observe that rising spreads tend to reduce the order aggressiveness on the bid side, while the converse is true on the ask side. Hence, we conclude that as the bid and ask curves move apart, the price is (on average) decreasing. Similarly, as the bid-ask spread shrinks, the price is expected to increase. This re-confirms our finding in Section 2 that liquidity variations on the bid side are higher than those on the ask side, with more sell activities than buy activities.

4.3. Drivers of the order book shape

In this section, we analyse whether the shape of order book curves is predictable based on key (weakly exogenous) trading variables. We select three variables for which we expect to observe the strongest impact on the book's shape, namely the past 5-min aggregated trading volume on both sides of the market representing the recent liquidity demand, the past 5-min log mid-quote return as well as the past 5-min volatility.

The buy and sell trading volumes at time $t$ are given by the sum of traded quantities from all market orders $r$, $\tilde{Q}^b_r$ and $\tilde{Q}^s_r$, over a 5 min interval, namely, $\tilde{Q}^b_t = \sum_{r=1}^{R^b_t} \tilde{Q}^b_r$ and $\tilde{Q}^s_t = \sum_{r=1}^{R^s_t} \tilde{Q}^s_r$, where $R^b_t$ and $R^s_t$ denote the number of buy and sell orders over the interval $(t-1, t]$, respectively. Correspondingly, log returns $r_t$ and volatility $V_t$ are computed as

$$r_t = \log \frac{\tilde{S}_t}{\tilde{S}_{t-1}}, \tag{13}$$

$$V_t = r_t^2, \tag{14}$$

where $\tilde{S}_t$ and $\tilde{S}_{t-1}$ denote the mid-quotes observed at $t$ and $t-1$, respectively. Note that the trading volumes as well as the volatility are seasonally adjusted following the procedure explained above. Moreover, the nonparametric procedure used requires the variables to be standardized between −1 and 1. This standardization is performed based on the minimum and maximum observations of the corresponding variables. Finally, as is commonly known, nonparametric regression becomes computationally cumbersome for a high number of regressors.
To keep our approach computationally tractable and to avoid problems due to thecurse of dimensionality, we <strong>in</strong>clude the regressors only <strong>in</strong>dividually (together with the relative price distances). This ultimatelyyields a three-dimensional problem.
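The construction of the three regressors can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the input layout (order timestamps and sizes as arrays, a grid of 5-min boundaries) and the use of NumPy are all hypothetical, and the seasonal adjustment mentioned in the text is omitted.

```python
import numpy as np

def aggregate_volume(order_times, order_sizes, grid):
    """Sum market-order sizes over each interval (grid[k-1], grid[k]]."""
    times = np.asarray(order_times, dtype=float)
    sizes = np.asarray(order_sizes, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # side="left" maps a time equal to grid[k] to index k, i.e. into (grid[k-1], grid[k]]
    idx = np.searchsorted(grid, times, side="left")
    volume = np.zeros(len(grid) - 1)
    for k, q in zip(idx, sizes):
        if 1 <= k < len(grid):      # ignore orders outside the grid
            volume[k - 1] += q
    return volume

def log_returns(mid_quotes):
    """r_t = log(S_t / S_{t-1}) for consecutive mid-quotes, as in Eq. (13)."""
    s = np.asarray(mid_quotes, dtype=float)
    return np.log(s[1:] / s[:-1])

def squared_return_volatility(returns):
    """V_t = r_t^2, as in Eq. (14)."""
    return np.asarray(returns, dtype=float) ** 2

def minmax_standardize(x):
    """Map x linearly onto [-1, 1] using its observed minimum and maximum."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:                    # constant series: no spread to rescale
        return np.zeros_like(x)
    return 2.0 * (x - lo) / (hi - lo) - 1.0
```

Because the rescaling relies on the sample extremes, a single unusually large observation compresses all other values towards −1, which is one way to read the remark in the text that the standardization based on extreme values may affect the estimated shapes.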


W.K. Härdle et al. / Journal of Empirical Finance 19 (2012) 610–625

[Figure 7: panels BHP, NAB, MIM, WOW; axes: Price Levels, Sell Quantity, 1st Factor]
Fig. 7. Estimated first factors of the bid side with respect to relative price levels and the past log traded sell volume using the DSFM-separated approach with two factors for selected stocks traded at the ASX from July 8 to August 16, 2002 (30 trading days).

Figs. 7 and 8 show the estimated first factors for the bid and the ask side as functions of the past 5-min sell and buy trading volumes, respectively. As expected, we observe that the past liquidity demand influences the order book curve. A high trading volume implies that a non-trivial part of the pending volume in the book is removed. Thus, most of the observed variation of the factor's shape is induced by the fact that either quoted price levels close to the best quotes have been completely absorbed and the remaining volume is correspondingly 'shifted down' in relation to the new best quote or, alternatively, only a part of the pending volume on the best quotes is removed, changing the distribution of the pending volumes across the (relative) price levels.

As expected, the curve flattens in the area of high volumes. Strikingly, we also observe a decaying pattern if the volume sizes decline.
Actually, in all pictures, the maximum slope (and thus the highest level of liquidity supply) is observed for magnitudes of the standardized volume between −1 and 0, i.e., comparably small (though not zero) trading volumes. This pattern might be technically explained by the standardization procedure based on extreme values or by the usual boundary problems of nonparametric regression. On the other hand, note that due to the curse-of-dimensionality problem we cannot simultaneously control for other variables. For instance, very small market-side-specific trading volumes can indicate the occurrence of market imbalances or, alternatively, might be associated with wide spreads. Both scenarios could force investors to post limit orders rather than market orders, which might explain the decaying shape of the figures after small trading volumes have been observed.

To evaluate whether the inclusion of past trading volume further increases the model's goodness-of-fit, we calculated the corresponding RMSEs. Comparing the results (ranging from 4.37 to 10.42) with those for the basis model (ranging from 0.18 to 3.49) shows that the included regressors yield higher estimation errors. Hence, the inclusion of additional regressors ultimately generates more noise, which overcompensates a possibly higher explanatory power. Similar results are also found for the past log returns and past volatility serving as regressors. The inclusion of log returns yields smaller estimation errors than the inclusion of volatility.
However, the overall performance is lower than in the cases above. For this reason, we refrain from showing corresponding graphs of the estimated factors.

A possible reason for the declining model performance when regressors are included might be the lower dimensionality of the regressors in comparison with that of the limit order book. Note that the included regressors do not reveal any variation across the levels of the book. Consequently, the explanatory variables cannot improve the model's spatial fit but just its dynamic fit. Apparently, the latter is not sufficient to obtain an overall reduction of estimation errors.

[Figure 8: panels BHP, NAB, MIM, WOW; axes: Price Levels, Buy Quantity, 1st Factor]
Fig. 8. Estimated first factors of the ask side with respect to relative price levels and the past log traded buy volume using the DSFM-separated approach with two factors for selected stocks traded at the ASX from July 8 to August 16, 2002 (30 trading days).

5. Forecasting liquidity supply

5.1. Setup

The aim of this section is to analyse the model's forecasting performance in a realistic setting mimicking the situation in financial applications. We consider an investor observing the limit order book at 5-minute snapshots together with the history over the past 10 trading days. It is assumed that during a trading day the investor updates the limit order book every 5 min and needs to produce forecasts for all (5 min) intervals of the remainder of the day. Such information might be useful in order to optimally balance order execution during the course of a day. Since we do not forecast beyond the end of the trading day (in order to avoid overnight effects), the forecasting horizon $h$ successively declines as we approach market closure. Hence, starting at 10:30, we produce multi-step forecasts for all remaining $h = 66$ intervals during the day. Correspondingly, at 15:50, we are left with a horizon of $h = 1$.
Since quotes themselves are, according to our results above, only predictable over short horizons which are virtually irrelevant for the present analysis, we do not explicitly incorporate this information here.

Consequently, the model is re-estimated every 5 min exploiting past information over a fixed window of 10 trading days (including the most recent observation). Due to the length of the estimation period, we do not produce forecasts for the first two weeks of our sample but focus on the period between July 22 and August 16, 2002, thereby covering a period of 20 trading days. In accordance with our in-sample results reported in the previous section, we choose the DSFM-separated approach based on two factors without additional regressors as the underlying specification.

A natural benchmark to evaluate our model is the naive forecast. In this context, we assume that the investor has no appropriate prediction model but just uses the current liquidity supply as a forecast for the remainder of the day.
More formally, we suppose that our investor can use the following two approaches in order to forecast liquidity supply $\hat{Y}_{t'+h,j}$ at a given time point $t'$ from July 22 at 10:25 until August 16, 2002, at 15:50, $t' = 693, \ldots, 2069 = T-1$, over a forecasting horizon $1 \le h \le 66$, and over the absolute price level $j$:

(i) DSFM approach: Firstly, the factors and factor loadings are estimated using the DSFM-separated approach with two factors, $K = 20$ knots used for the B-spline basis functions, and with the past 690 observed (de-seasonalized) limit order book curves. More precisely, at time point $t'$, relative price levels $S^b_{t'-691:t',j}$ and $S^a_{t'-691:t',j}$ and de-seasonalized observed bid and ask sides $Y^b_{t'-691:t',j}$ and $Y^a_{t'-691:t',j}$ enter the estimation procedures. This yields estimates for the bid (ask) side, 66 times per day for each stock, in total 1320 times over 20 days.
Secondly, since we do not account for (short-term) quote return predictability but only forecast the liquidity supply, we employ a simple 4-dimensional VAR($p$) model for the four time-varying factor loadings. When fitted to the entire time series (30 trading days) and according to the BIC, a maximum lag order $p = 4$ is sufficient. In particular, the following VAR($p$) models are selected: BHP and MIM: VAR(4); NAB: VAR(2); WOW: VAR(3). Using these specifications, we forecast the factor loadings over the forecasting period, $\hat{Z}_{t'+h}$.
Then, the predicted factor loadings together with the estimated time-invariant factors $\hat{m}_l$ are used to predict the order book.
(ii) Naive approach: Among all 690 historical limit order book curves, only the last one at time $t'$, $(Y^b_{t',j}, Y^a_{t',j})$, is selected as the $h$-step ahead forecast.

The predictions are evaluated using the root mean squared prediction error (RMSPE), i.e., a version of the in-sample RMSE (11) where the sum over the sampling periods $t$ and the sample size $T$ are replaced by the forecasting horizons $h$ and $H$, respectively. Since future quotes and relative price grids are not predicted by the model, we assume that quotes themselves follow random walk processes and the spread remains constant. Future quotes are therefore predicted using the current one. Consequently, the predicted future relative price grid remains constant.

5.2. Forecasting results

Fig. 9 shows the RMSPEs for each required forecasting horizon $h$ during a trading day implied by the DSFM as well as the naive model. The results can be summarized as follows. First, overall the DSFM forecasts outperform the naive ones. Nevertheless, the naive forecast is a serious competitor which is hard to beat. This result is not surprising given the high persistence in liquidity supply. Secondly, the model's forecasting performance is higher on the bid side than on the ask side.
This result might be explained by the fact that during the sample period we observe a downward market inducing higher activities on the bid side than on the ask side. This is confirmed by the descriptive statistics shown above. Thirdly, the DSFM outperforms the naive model particularly over horizons of up to 1 to 2 hours. For longer horizons, the picture is less clear.

Analyzing average RMSPEs (averaged over all forecasting horizons and both sides of the market) as reported in Table 2 indicates that the overall prediction performance of the DSFM approach is significantly higher than that of the benchmark.
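The two forecasting approaches and their evaluation can be sketched roughly as follows. This is only an illustration under several assumptions: it uses a VAR(1) fitted by least squares instead of the BIC-selected VAR($p$) of the paper, it assumes the usual DSFM form in which a curve is rebuilt as $m_0 + \sum_l Z_l m_l$ on a fixed price grid, and it averages squared errors over price levels for each horizon, which may differ in detail from the RMSE definition (11). All function and argument names are hypothetical.

```python
import numpy as np

def fit_var1(Z):
    """Least-squares fit of Z_t = c + A Z_{t-1} + e_t; Z has shape (T, L)."""
    Z = np.asarray(Z, dtype=float)
    X = np.hstack([np.ones((len(Z) - 1, 1)), Z[:-1]])  # intercept and lagged loadings
    coef, *_ = np.linalg.lstsq(X, Z[1:], rcond=None)   # shape (1 + L, L)
    return coef[0], coef[1:].T                          # intercept c, matrix A

def forecast_loadings(c, A, z_last, horizon):
    """Iterate the VAR(1) recursion to obtain forecasts for steps 1..horizon."""
    z = np.asarray(z_last, dtype=float)
    out = np.empty((horizon, len(z)))
    for h in range(horizon):
        z = c + A @ z
        out[h] = z
    return out

def dsfm_curves(m0, M, Z_hat):
    """Rebuild curves as m0 + sum_l Z_l * m_l; M has shape (L, J), Z_hat (H, L)."""
    return np.asarray(m0, dtype=float) + np.asarray(Z_hat) @ np.asarray(M, dtype=float)

def naive_curves(last_curve, horizon):
    """Naive benchmark: repeat the last observed curve for every horizon."""
    return np.tile(np.asarray(last_curve, dtype=float), (horizon, 1))

def rmspe(forecast, actual):
    """Root mean squared prediction error per horizon, averaged over price levels."""
    err = np.asarray(forecast, dtype=float) - np.asarray(actual, dtype=float)
    return np.sqrt(np.mean(err ** 2, axis=1))
```

In a rolling exercise of the kind described in Section 5.1, these steps would be repeated at every 5-min time point with the estimation window shifted forward, and the per-horizon errors would then be averaged over the forecasting days.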


[Figure 9: panels BHP, NAB, MIM, WOW; x-axis: Horizon; y-axis: RMSPE]
Fig. 9. Root mean squared prediction errors (RMSPEs) implied by the DSFM-separated approach with two factors for the bid side (red) as well as the ask side (blue) and by the naive approach (black) for all intra-day forecasting horizons (in hours) for selected stocks traded at the ASX. Prediction period: July 22 to August 16, 2002 (20 trading days).

5.3. Financial and economic applications

The results in the previous section show that the DSFM is able to successfully predict liquidity supply over various forecasting horizons during a day. In this subsection, we apply these results to two practical examples. The first one is devoted to an order execution strategy, whereas the second one deals with forecasts of demand and supply elasticities.

Example 1. (Trading Strategy)

Suppose an institutional investor decides to buy (sell) a certain number of shares $v$ over the course of a trading day, starting from 10:30 until 15:40. The size of the traded quantity for BHP, NAB and WOW is chosen to be 5 or 10 times the average pending volume at the best bid (ask) level.
In the case of MIM, where liquidity supply is much more concentrated at the first level and the book is very thin for higher levels (see the empirical results in the previous sections), we choose the traded volume to be 2 and 5 times the average first level depth. This yields the following quantities in the respective two cases of high (a) and very high (b) liquidity demand:

(a) BHP: 175,000 shares; NAB: 25,000 shares; WOW: 50,000 shares; MIM: 1,860,000 shares.
(b) BHP: 350,000 shares; NAB: 50,000 shares; WOW: 100,000 shares; MIM: 4,650,000 shares.

To reduce the computational burden, we assume that trading is only performed on a 5 min grid throughout the day, corresponding to 63 possible trading time points. Moreover, suppose that the investor makes her trading decision at 10:30 but does not monitor the market anymore during the day. Consequently, her forecasting horizon covers $h = 63$ periods on each trading day. Then, she has to decide between two execution strategies:

(i) Splitting the buy (sell) order of size $v$ at a 5 minute frequency proportionally over the trading day, resulting in 63 trades of size $v/63$ each.
(ii) Placing orders not proportionally but at those $m$ (5 minute interval) time points throughout the day where the DSFM-based predicted implied trading costs $c$ of the volume $v$ are smallest (among all 63 possible periods).
Then, the volume $v$ is split over the $m$ time points according to the relative proportions of expected trading costs. Hence, at interval $i$, $w_i \cdot v$ shares are traded, with $w_i = c_i / \sum_{j=1}^{m} c_j$ for $i = 1, \ldots, m$.

Table 2. Average root mean squared prediction errors (RMSPEs) of both limit order book sides implied by the DSFM-separated approach with two factors and the naive model for selected stocks traded at the ASX in the period from July 22 to August 16, 2002 (20 forecasting days).

| Approach | Bid: BHP | Bid: NAB | Bid: MIM | Bid: WOW | Ask: BHP | Ask: NAB | Ask: MIM | Ask: WOW |
|----------|----------|----------|----------|----------|----------|----------|----------|----------|
| Naive    | 7.11     | 7.59     | 6.03     | 6.08     | 6.50     | 5.96     | 5.96     | 6.19     |
| DSFM     | 7.18     | 5.10     | 4.84     | 5.33     | 5.56     | 5.46     | 5.63     | 5.45     |
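The selection and weighting step of strategy (ii) in Example 1 can be sketched as below. It is a minimal illustration that takes a vector of predicted trading costs as given and applies the weighting formula $w_i = c_i / \sum_j c_j$ exactly as stated in the text; the function names and the assumption of strictly positive costs are ours.

```python
import numpy as np

def strategy_weights(predicted_costs, m):
    """Pick the m time points with the smallest predicted cost and weight them
    by w_i = c_i / sum_j c_j, following the formula stated in the text.
    Assumes strictly positive predicted costs."""
    c = np.asarray(predicted_costs, dtype=float)
    if not 1 <= m <= len(c):
        raise ValueError("m must lie between 1 and the number of time points")
    # indices of the m cheapest time points, returned in time order
    chosen = np.sort(np.argsort(c, kind="stable")[:m])
    weights = np.zeros_like(c)
    weights[chosen] = c[chosen] / c[chosen].sum()
    return weights

def equal_split_weights(n):
    """Benchmark strategy (i): the same share of v at every time point."""
    return np.full(n, 1.0 / n)

# Usage: shares traded at each of four time points for a daily volume v
v = 1000.0
schedule = strategy_weights([4.0, 1.0, 3.0, 2.0], m=2) * v
```

Note that with $m$ equal to the number of time points the weights are still cost-based rather than equal, so any difference to strategy (i) in that limit comes from the weighting scheme alone.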


Strategy (i) can be seen as a special case of strategy (ii) if $m$ is chosen as $m = 63$ and the volume $v$ is just equally split. Conversely, for $m = 1$, we obtain the extreme case where the entire quantity is traded at once, which requires severely 'walking up' the book. The DSFM predictions of trading costs are computed based on the predicted order book shape at each point and the effective costs to buy or to sell the quantity $v$ while using the ask and bid quotes prevailing at 10:25 (in accordance with the assumption of a random walk). Note that we do not optimize over the quantity underlying the predicted trading costs but just fix it at $v$, corresponding to the maximally possible trade size per time point. Consequently, our strategy selects those trading points where the execution of the entire quantity $v$ is expected to be cheapest and thus also covers the hypothetical (limiting) case of putting all weight $w_i$ on a single point, implying a 'one-shot' execution. Of course, an even more sophisticated (and optimized) strategy would require the prediction of trading costs for relative proportions of $v$ which are themselves simultaneously optimized. However, this would substantially increase the numerical and computational burden and is beyond the scope of the current study.

To implement these strategies, we consider 20 forecasting days covering the period from July 22 to August 16, 2002. Fig. 10 shows the average percentage reduction in trading costs of strategy (ii) in excess of the equal-splitting ('naive') strategy (i) for various choices of $m \in [1, 63]$. In most cases we observe that a strategic placing of orders according to DSFM predictions yields excess gains of approximately 10 basis points on average. Overall, the selling strategies are more beneficial than the buying strategies, confirming the findings on prediction errors above. This is most striking for BHP, where we observe a significant difference between sell-based and buy-based profits if the number of trading points is low. Apart from this observation, we find a generally non-monotonic behavior of the curves, implying losses if $m$ is small, increasing (and positive) gains for a higher number of trading points and a convergence to zero for $m$ reaching the upper limit of 63. This pattern indicates that trading the daily position using only a few large market orders is inferior to an equal-splitting strategy as the underlying transactions have to walk up the book too severely and cause huge price impacts. For higher values of $m$, the strategic placement according to DSFM predictions becomes profitable, where in the limit of $m = 63$, relative benefits are only due to a strategic (non-equal) weighting scheme.
However, for MIM we observe a significantly different pattern, implying the highest gains for small $m$ and nearly monotonically declining profits as $m$ increases. This pattern is induced by the peculiarities of the MIM order book, which is extremely deep on the first level and makes 'one-shot' executions of large volumes quite beneficial. Overall, the patterns are very similar for the two classes of daily quantities, where, as expected, the relative gains become smaller with higher traded daily volume.

Overall, our findings indicate that the model is successful in predicting times when the market is sufficiently deep to execute large orders. The fact that the model performs reasonably well is promising for more elaborate practical applications of the DSFM. Moreover, note that the reported results are valid under the assumption that there are no transaction fees. Actually, in practice, a proportional splitting strategy induces higher transaction costs than a complete execution via a market order. This component is not taken into account here and would even increase the performance of the DSFM-based execution strategy. Finally, predictions of trading costs could be further improved by exploiting possible predictive information of the limit order book for future returns. Our descriptive statistics reported above show that order book imbalances indeed have (slight) predictive power. We will leave these issues, however, for future research.

[Figure 10: panels BHP, NAB, MIM, WOW; x-axis: Trades per Day; y-axis: Benefits]
Fig. 10. Average percentage gains by reduced transaction costs compared to an equal-splitting strategy when buying (blue) and selling (red) shares based on $m$ DSFM-predicted time points per day. Upper panel: Daily volumes corresponding to 5 (2) times the average first level market depth for BHP, NAB, WOW (MIM). Lower panel: Daily volumes corresponding to 10 (5) times the average first level market depth for BHP, NAB, WOW (MIM). Prediction period: July 22 to August 16, 2002 (20 trading days).
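The notion of 'walking up the book' that underlies the trading costs in Example 1 can be illustrated with a short sketch. It assumes, hypothetically, that the ask side of the book is available as a list of price levels with the pending volume at each level; the paper itself works with predicted order book curves, so this is an illustration of the cost concept rather than of the authors' computation.

```python
import numpy as np

def market_buy_cost(ask_prices, ask_volumes, quantity):
    """Total cost of buying `quantity` shares by consuming ask levels in order,
    starting at the best ask (first entry)."""
    prices = np.asarray(ask_prices, dtype=float)
    volumes = np.asarray(ask_volumes, dtype=float)
    if quantity > volumes.sum():
        raise ValueError("quantity exceeds the displayed depth")
    filled_before = np.cumsum(volumes) - volumes             # shares filled before each level
    take = np.clip(quantity - filled_before, 0.0, volumes)   # shares taken at each level
    return float(np.sum(take * prices))

# Usage: buying 250 shares consumes the first level fully and part of the second
cost = market_buy_cost([10.0, 10.1, 10.2], [100, 200, 300], 250)
average_price = cost / 250          # volume-weighted execution price
impact = average_price - 10.0       # excess over the best ask
```

A deep first level, as reported for MIM, keeps this excess small even for large orders, which is consistent with the observation that 'one-shot' executions are comparatively cheap for that stock.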


Example 2. (Demand and Supply Elasticity)

A straightforward dimensionless measure for the order book slope is the curve's elasticity, which we compute at best bid ($\tilde{S}^b_{t',101}$) and best ask prices ($\tilde{S}^a_{t',1}$) as

$$\hat{E}^d_{t'+h} = \frac{\hat{Y}^b_{t'+h,1} - \hat{Y}^b_{t'+h,101}}{\hat{Y}^b_{t'+h,101}} \Bigg/ \frac{\tilde{S}^b_{t',1} - \tilde{S}^b_{t',101}}{\tilde{S}^b_{t',101}}, \qquad (15)$$

$$\hat{E}^s_{t'+h} = \frac{\hat{Y}^a_{t'+h,101} - \hat{Y}^a_{t'+h,1}}{\hat{Y}^a_{t'+h,1}} \Bigg/ \frac{\tilde{S}^a_{t',101} - \tilde{S}^a_{t',1}}{\tilde{S}^a_{t',1}}, \qquad (16)$$

for the demand (bid) and supply (ask) side, respectively. The elasticity is a measure for the marginal trading costs reflecting the curve's curvature.

Suppose at 10:30 an investor aims to predict the demand and supply elasticity at best bid and best ask prices for all 5-min intervals during the trading day, covering the forecast horizons $h = 1, \ldots, 66$. As above, the forecasts are computed using the last 10 trading days. Since we are not forecasting the price process, the last observed ask and bid quotes are used for prediction. Fig. 11 shows the 10:30 predictions of demand and supply elasticities at best bid and best ask prices during all trading days. We observe that marginal trading costs exhibit significant variations over time. The fact that predicted elasticities reveal temporarily trending patterns might be used for improving trading strategies.

Consider the case of NAB on July 24 and July 30, 2002.
We observe that the demand elasticities (in absolute terms) are increasing on the first day and decreasing on the second day. Practically, it would be better to sell shares late on July 24 and early on July 30, under the assumption that the price does not change significantly over both trading days. The supply elasticities show converse patterns across the days. Consequently, it would be advisable to buy shares early on July 24 and late on July 30, provided that the prices remain unchanged. While this section aims to illustrate possible applications of the DSFM approach, more detailed elaborations of dynamic strategies are beyond the scope of the paper.

6. Conclusions

In this paper, we propose a dynamic semiparametric factor model (DSFM) for modelling and forecasting liquidity supply. The main idea of the DSFM as proposed by Brüggemann et al. (2008), Cao et al. (2009), Fengler et al. (2007) and Park et al. (2009) is to capture the order curve's spatial structure across various relative distances to the best quotes using a factor decomposition which is estimated nonparametrically. To capture the order book's time variations, the corresponding factor loadings are modelled using a multivariate time series model. The framework is flexible though parsimonious and turns out to provide a powerful way to reduce the high dimension of the book and to extract the relevant underlying information regarding order book dynamics.

The model is applied to four stocks traded at the Australian Stock Exchange (ASX).
It is shown that two underlying factors can explain up to 95% of in-sample variations of ask and bid liquidity supply. While the first factor captures the overall order curve's slope, the second factor is associated with the curve's curvature. The extracted factor loadings reveal highly persistent though weakly stationary dynamics which are successfully captured by a vector error correction specification. We find relatively weak spill-over effects between both sides of the limit order book that are more pronounced for less liquid stocks compared to more frequently traded ones. It is shown that order book shapes have short-term prediction power for quote changes. Furthermore, we show that the order curves' shapes are driven by explanatory variables reflecting the recent liquidity demand, volatility as well as mid-quote returns.

[Figure 11: panels BHP, NAB, MIM, WOW; x-axis: Trading Day; y-axis: Elasticity]
Fig. 11. Predicted demand and supply elasticities at best bid (red) and best ask prices (blue) using the DSFM-separated approach with two factors for selected stocks traded at the ASX from July 22 to August 2, 2002 (upper panels, 10 trading days) and from August 5 to August 16, 2002 (lower panels, 10 trading days).

Employing the DSFM approach in an extensive and realistic out-of-sample forecasting exercise, we show that the model successfully predicts the liquidity supply over various forecasting horizons during a trading day and outperforms a naive approach. Using the forecasting results in a trading strategy, it is shown that order execution costs can be reduced if orders are optimally placed according to predictions of liquidity supply. In particular, it turns out that optimal order placement in periods of high liquidity results in smaller transaction costs than in the case of a proportional splitting over time. Finally, our flexible approach allows us to estimate and to predict future (excess) demand and supply elasticities.

These results show that the DSFM approach is suitable for modelling and forecasting liquidity supply. Since it is computationally tractable, it can serve as a valuable building block for automated trading models.

Acknowledgements

We are very grateful to Anthony Hall for providing us with the data.
For helpful comments and discussions we thank Jean-Philippe Bouchaud, Joachim Grammig, Jeffrey Russell and the participants of the 2009 Humboldt-Copenhagen Conference on Financial Econometrics in Berlin, the 2009 annual conference of the Society for Financial Econometrics (SoFiE) in Geneva, the 2009 European Meeting of the Econometric Society in Barcelona, the International Conference on Price, Liquidity and Credit Risk in Konstanz, 2008, as well as the 4th World Congress of the International Association for Statistical Computing in Yokohama, 2008. Furthermore, we are grateful to Szymon Borak for helping us with the implementation of the Dynamic Semiparametric Factor Model in MATLAB. Finally, we thank the reviewers for their constructive comments and helpful suggestions.

References

Ahn, H.J., Bae, K.H., Chan, K., 2001. Limit orders, depth, and volatility: evidence from the stock exchange of Hong Kong. Journal of Finance 56, 767–788.
Alfonsi, A., Fruth, A., Schied, A., 2010. Optimal execution strategies in limit order books with general shape functions. Quantitative Finance 10, 143–157.
Almgren, R., Chriss, N., 2000. Optimal execution of portfolio transactions. Journal of Risk 3, 5–39.
Bertsimas, D., Lo, A.W., 1998. Optimal control of execution costs. Journal of Financial Markets 1, 1–50.
Biais, B., Hillion, P., Spatt, C., 1995. An empirical analysis of the limit order book and the order flow in the Paris bourse. Journal of Finance 50, 1655–1689.
Bloomfield, R., O'Hara, M., Saar, G., 2005. The "make or take" decision in an electronic market: evidence on the evolution of liquidity. Journal of Financial Economics 75, 165–200.
Brownlees, C.T., Cipollini, F., Giampiero, M.G., 2009. Intra-daily Volume Modeling and Prediction for Algorithmic Trading. Discussion paper, Stern School of Business.
Brüggemann, R., Härdle, W., Mungo, J., Trenkler, C., 2008. VAR modelling for dynamic semiparametric factors of volatility strings. Journal of Financial Econometrics 5 (2), 189–218.
Cao, J., Härdle, W., Mungo, J., 2009. A Joint Analysis of the KOSPI 200 Option and ODAX Option Markets Dynamics. Discussion Paper 019, Collaborative Research Center 649 "Economic Risk". Humboldt-Universität zu Berlin.
Chacko, G.C., Jurek, J.W., Stafford, E., 2008. The price of immediacy. Journal of Finance 63, 1253–1290.
Degryse, H., Jong, F., Ravenswaaij, M., Wuyts, G., 2005. Aggressive orders and the resiliency of a limit order market. Review of Finance 9, 201–242.
Engle, R.F., Ferstenberg, R., 2007. Execution risk. Journal of Portfolio Management 33, 34–45.
Engle, R.F., Patton, A.J., 2004. Impacts of trades in an error-correction model of quote prices. Journal of Financial Markets 7, 1–25.
Fengler, M.R., Härdle, W., Mammen, E., 2007. A dynamic semiparametric factor model for implied volatility string dynamics. Journal of Financial Econometrics 5 (2), 189–218.
Gallant, A.R., 1981. On the bias of flexible functional forms and an essentially unbiased form. Journal of Econometrics 15, 211–245.
Garvey, R., Wu, F., 2009. Intraday time and order execution quality dimensions. Journal of Financial Markets 12, 203–228.
Glosten, L., 1994. Is the electronic Limit Order Book inevitable. Journal of Finance 49 (4), 1127–1161.
Goyenko, R.Y., Holden, C.W., Trzcinka, C.A., 2009. Do liquidity measures measure liquidity? Journal of Financial Economics 92, 153–181.
Griffiths, M.D., Smith, B.F., Turnbull, D.A.S., White, R.W., 2000. The costs and determinants of order aggressiveness. Journal of Financial Economics 56, 65–88.
Hall, A.D., Hautsch, N., 2006. Order aggressiveness and order book dynamics. Empirical Economics 30, 973–1005.
Hall, A.D., Hautsch, N., 2007. Modelling the buy and sell intensity in a limit order book market. Journal of Financial Markets 10 (3), 249–286.
Hasbrouck, J., 2009. Trading costs and returns for U.S. equities: estimating effective costs from daily data. Journal of Finance 64, 1445–1477.
Hasbrouck, J., Saar, G., 2009. Technology and liquidity provision: the blurring of traditional definitions. Journal of Financial Markets 12, 143–172.
Hautsch, N., Huang, R., 2012. The market impact of a limit order. Journal of Economic Dynamics & Control 36, 501–522.
Hollifield, B., Miller, R.A., Sandås, P., 2004. Empirical analysis of limit order markets. Review of Economic Studies 71, 1027–1063.
Johansen, S., 1991. Estimation and hypothesis testing of cointegration vectors in gaussian vector autoregressive models. Econometrica 59 (6), 1551–1580.
Johnson, T.C., 2008. Volume, liquidity and liquidity risk. Journal of Financial Economics 87, 388–417.
Kwiatkowski, D., Phillips, P.C.B., Schmidt, P., Shin, Y., 1992. 
Test<strong>in</strong>g the null hypothesis of stationarity aga<strong>in</strong>st the alternative of a unit root: how sure are we thateconomic time series have a unit root? Journal of Econometrics 54, 159–178.Large, J., 2007. Measur<strong>in</strong>g the resiliency of an electronic limit order book. Journal of F<strong>in</strong>ancial Markets 10, 1–25.Liu, W.-M., 2009. Monitor<strong>in</strong>g and limit order submission risks. Journal of F<strong>in</strong>ancial Markets 12, 107–141.Nelson, C.R., Siegel, A.F., 1987. Parsimonious modell<strong>in</strong>g of yield curves. Journal of Bus<strong>in</strong>ess 60, 473–489.Obizhaeva, A., Wang, J., 2005. Optimal Trad<strong>in</strong>g Strategy and Supply/Demand Dynamics. Work<strong>in</strong>g Paper 11444, NBER.Park, B., Mammen, E., Härdle, W., Borak, S., 2009. Time series modell<strong>in</strong>g with semiparametric factor dynamics. Journal of the American Statistical Association 104(485), 284–298.Ranaldo, A., 2004. Order aggressiveness <strong>in</strong> limit order book markets. Journal of F<strong>in</strong>ancial Markets 7, 53–74.Schmidt, P., Phillips, P.C.B., 1992. LM <strong>test</strong>s <strong>for</strong> a unit root <strong>in</strong> the presence of determ<strong>in</strong>istic trends. Ox<strong>for</strong>d Bullet<strong>in</strong> of Economics and Statistics 54, 257–287.


Applied Mathematical Finance
Publisher: Routledge
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/ramf20

The Implied Market Price of Weather Risk
Wolfgang Karl Härdle and Brenda López Cabrera
Ladislaus von Bortkiewicz Chair of Statistics, Humboldt University of Berlin, Berlin, Germany

Available online: 17 Oct 2011

To cite this article: Wolfgang Karl Härdle & Brenda López Cabrera (2012): The Implied Market Price of Weather Risk, Applied Mathematical Finance, 19:1, 59-95
To link to this article: http://dx.doi.org/10.1080/1350486X.2011.591170


Applied Mathematical Finance, Vol. 19, No. 1, 59–95, February 2012

The Implied Market Price of Weather Risk

WOLFGANG KARL HÄRDLE & BRENDA LÓPEZ CABRERA

Ladislaus von Bortkiewicz Chair of Statistics, Humboldt University of Berlin, Berlin, Germany

(Received 7 May 2010; in revised form 28 February 2011)

Correspondence Address: Brenda López Cabrera, Ladislaus von Bortkiewicz Chair of Statistics, Humboldt University of Berlin, Berlin, Germany. Tel: +49(0)30 2093 1457 Email: lopezcab@wiwi.hu-berlin.de

1350-486X Print/1466-4313 Online/12/010059–37 © 2012 Taylor & Francis
http://dx.doi.org/10.1080/1350486X.2011.591170

ABSTRACT Weather derivatives (WD) are end-products of a process known as securitization that transforms non-tradable risk factors (weather) into tradable financial assets. For pricing and hedging non-tradable assets, one essentially needs to incorporate the market price of risk (MPR), which is an important parameter of the associated equivalent martingale measure (EMM). The majority of papers so far have priced non-tradable assets assuming zero or constant MPR, but this assumption yields biased prices and has never been quantified earlier under the EMM framework. Given that liquid-derivative contracts based on daily temperature are traded on the Chicago Mercantile Exchange (CME), we infer the MPR from traded futures-type contracts (CAT, CDD, HDD and AAT). The results show how the MPR significantly differs from 0, how it varies in time and changes in sign. It can be parameterized, given its dependencies on time and temperature seasonal variation. We establish connections between the market risk premium (RP) and the MPR.

KEY WORDS: CAR process, CME, HDD, seasonal volatility, risk premium

1. Introduction

In the 1990s weather derivatives (WD) were developed to hedge against the random nature of temperature variations that constitute weather risk. WD are financial contracts with payments based on weather-related measurements. WD cover against volatility caused by temperature, rainfall, wind, snow and frost. The key factor in efficient usage of WD is a reliable valuation procedure. However, due to their specific nature one encounters several difficulties. First, the underlying weather (and the indices based on it) is not tradable; second, the WD market is incomplete, meaning that WD cannot be cost-efficiently replicated by other WD.

Since the largest portion of WD traded at the Chicago Mercantile Exchange (CME) is written on temperature indices, we concentrate our research on temperature derivatives. There have been basically four branches of temperature derivative pricing: the actuarial approach, indifference pricing, general equilibrium theory and pricing via no-arbitrage arguments. While the actuarial approach considers the conditional expectation of the pay-off under the real physical measure discounted at the riskless rate (Brix et al., 2005), indifference pricing relies on the equivalent utility principle (Barrieu and El Karoui, 2002; Brockett et al., 2010) and general equilibrium theory assumes investors' preferences and rules of Pareto optimal risk allocation (Cao and Wei, 2004; Horst and Mueller, 2007; Richards et al., 2004). The martingale approach, although less demanding in terms of assumptions, concentrates on the econometric modelling of the underlying dynamics and requires the selection of an adequate equivalent martingale measure (EMM) to value the pay-offs by taking expectations (Alaton et al., 2002; Benth, 2003; Benth and Saltyte-Benth, 2007; Benth et al., 2007; Brody et al., 2002; Huang-Hsi et al., 2008; Mraoua and Bari, 2007).

Here we prefer the latter approach. First, since the underlying (temperature) we consider is of local nature, our analysis aims at understanding the pricing at different locations around the world. Second, the EMM approach helps identify the market price of risk (MPR), which is an important parameter of the associated EMM and is indispensable for pricing and hedging non-tradable assets. The MPR can be extracted from traded securities and one can use this value to price other derivatives, though any inference about the MPR requires an assumption about its specification.

The MPR is of high scientific interest, not only for financial risk analysis, but also for better economic modelling of fair valuation of risk. Constantinides (1987) and Landskroner (1977) studied the MPR of tradable assets in the Capital Asset Pricing Model (CAPM) framework.
For pricing interest rate derivatives, Vasicek (1977) assumed a constant market price of interest rate risk, while Hull and White (1990) used the specification of Cox et al. (1985). In the oil market, Gibson and Schwartz (1990) supposed an intertemporal constant market price of crude oil convenience yield risk. Benth et al. (2008) introduced a parameterization of the MPR to price electricity derivatives. In the WD framework, Cao and Wei (2004) and Richards et al. (2004) studied the MPR as an implicit parameter in a generalization of the Lucas (1978) equilibrium framework. They showed that the MPR is not only statistically significant for temperature derivatives, but also economically large. However, calibration problems arise with the methodology suggested by Cao and Wei (2004), since it deals with a global model like the Lucas (1978) approach while weather is locally specified. Benth and Saltyte-Benth (2007) introduced theoretical ideas of equivalent changes of measure to get no-arbitrage futures/option prices written on different temperature indices. Huang-Hsi et al. (2008) examined the MPR of the Taiwan Stock Exchange Capitalization-Weighted Stock Index ((mean of stock returns − risk-free interest rate)/SD of stock returns) and used it as a proxy for the MPR on temperature option prices. The majority of temperature pricing papers so far have priced temperature derivatives assuming 0 or constant MPR (Alaton et al., 2002; Cao and Wei, 2004; Dorfleitner and Wimmer, 2010; Huang-Hsi et al., 2008; Richards et al., 2004), but this assumption yields biased prices and has never been quantified earlier using the EMM framework.
This article deals exactly with the differences between 'historical' and 'risk neutral' behaviours of temperature.

The contribution of this article is threefold. First, in contrast to Campbell and Diebold (2005), Benth and Saltyte-Benth (2007) and Benth et al. (2007), we correct for seasonality and seasonal variation of temperature with a local smoothing approach to get, independently of the chosen location, the driving stochastics close to a Gaussian process, which allows us to apply the pricing tools of financial mathematics (Karatzas and Shreve, 2001). Second, and this is the main contribution, using statistical modelling and given that liquid derivative contracts based on daily temperature are traded on the CME, we imply the MPR from traded futures-type contracts (CAT/HDD/CDD/AAT) based on a well-known pricing model developed by Benth et al. (2007). We have chosen this methodology because it is a stationary model that fits the stylized characteristics of temperature well; it nests a number of previous models (Alaton et al., 2002; Benth, 2003; Benth and Saltyte-Benth, 2005, 2007; Brody et al., 2002; Dornier and Querel, 2007); it provides closed-form pricing formulas; and it computes, after deriving the MPR, non-arbitrage prices based on a continuous-time hedging strategy. Moreover, the price dynamics of futures are easy to compute and require only a one-time estimation. Our implied MPR approach is a calibration procedure for financial engineering purposes. In the calibration exercise, only a single date is required (although different time horizons and calibrated instruments are used), since the model is recalibrated daily to detect intertemporal effects. Moreover, we use an economic and statistical testing approach, where we start from a specification of the MPR and check consistency with the data. The aim of this analysis is to study the effect of different MPR specifications (a constant, a (two) piecewise linear function, a time-deterministic function and a 'financial-bootstrapping') on the temperature futures prices.
The statistical point of view is to treat this as an inverse problem with different degrees of smoothness expressed through the penalty parameter of a smoothing spline. The degrees of smoothness will allow for a term structure of risk. Since smoothing estimates are fundamentally different from estimating a deterministic function, we also verify our results by fitting a parametric function to all available contract prices (calendar year estimation). The economic point of view is to detect possible time dependencies that can be explained by investors' preferences in order to hedge weather risk. Our finding that the MPR differs significantly from 0 confirms the results found in Cao and Wei (2004), Huang-Hsi et al. (2008), Richards et al. (2004) and Alaton et al. (2002), but we differ from them by showing that it varies in time and changes in sign. It is not a reflection of bad model specification, but a data-extracted MPR. This contradicts the assumption made earlier in the literature that the MPR is 0 or constant and rules out the 'burn-in' analysis, which is popular among practitioners since it uses the historical average index value as the price for the futures (Brix et al., 2005). This brings significant challenges to the statistical branch of the pricing literature. We also establish connections between the market risk premium (RP) (a Girsanov-type change of probability) and the MPR.
As a third contribution, we discuss how to price over-the-counter (OTC) temperature derivatives with the information extracted.

Our article is structured as follows. Section 2 presents the fundamentals of temperature derivatives (futures and options) and describes the temperature data and the temperature futures traded at CME, the biggest market offering this kind of product. Section 3, the econometric part, is devoted to explaining the dynamics of temperature data. The temperature model captures linear trend, seasonality, mean reversion, intertemporal correlations and seasonal volatility effects. Section 4, the financial mathematics part, connects the weather dynamics with the pricing methodology. Section 5 solves the inverse problem of determining the MPR of CME temperature futures using different specifications. It also introduces the estimation results and test procedures of our specifications applied to temperature-derivative data. Here we give (statistical and economic) interpretations of the estimated MPR. The pricing of OTC temperature products is discussed at the end of this section. Section 6 concludes the article. All computations in this article were carried out in Matlab version 7.6 (The MathWorks, Inc., Natick, MA, USA). To simplify notation, dates are denoted in yyyymmdd format.

2. Temperature Derivatives

The largest portion of futures and options written on temperature indices is traded on the CME. Most of the temperature derivatives are written on daily average temperature indices, rather than on the underlying temperature by itself. A call option written on futures $F_{(t,\tau_1,\tau_2)}$ with exercise time $t \le \tau_1$ and delivery over a period $[\tau_1, \tau_2]$ will pay $\max\{F_{(t,\tau_1,\tau_2)} - K, 0\}$ at the end of the measurement period $[\tau_1, \tau_2]$. The most common weather indices on temperature are Heating Degree Day (HDD), Cooling Degree Day (CDD) and Cumulative Averages (CAT). The HDD index measures the temperature over a period $[\tau_1, \tau_2]$, usually between October and April:

$$\mathrm{HDD}(\tau_1, \tau_2) = \int_{\tau_1}^{\tau_2} \max(c - T_u, 0)\,du, \tag{1}$$

where $c$ is the baseline temperature (typically 18 °C or 65 °F) and $T_u = (T_{u,\max} + T_{u,\min})/2$ is the average temperature on day $u$. Similarly, the CDD index measures the temperature over a period $[\tau_1, \tau_2]$, usually between April and October:

$$\mathrm{CDD}(\tau_1, \tau_2) = \int_{\tau_1}^{\tau_2} \max(T_u - c, 0)\,du. \tag{2}$$

The HDD and the CDD index are used to trade futures and options in 24 US cities, 6 Canadian cities and 3 Australian cities. The CAT index accounts for the accumulated average temperature over $[\tau_1, \tau_2]$:

$$\mathrm{CAT}(\tau_1, \tau_2) = \int_{\tau_1}^{\tau_2} T_u\,du. \tag{3}$$

The CAT index replaces the CDD index for 11 European cities. Since $\max(T_u - c, 0) - \max(c - T_u, 0) = T_u - c$, we get the HDD–CDD parity:

$$\mathrm{CDD}(\tau_1, \tau_2) - \mathrm{HDD}(\tau_1, \tau_2) = \mathrm{CAT}(\tau_1, \tau_2) - c(\tau_2 - \tau_1). \tag{4}$$

Therefore, it is sufficient to analyse only HDD and CAT indices. An index similar to the CAT index is the Pacific Rim Index, which measures the accumulated total of 24-hr average temperature (C24AT) over a period $[\tau_1, \tau_2]$ of days for Japanese cities:

$$\mathrm{C24AT}(\tau_1, \tau_2) = \int_{\tau_1}^{\tau_2} \tilde{T}_u\,du, \tag{5}$$

where $\tilde{T}_u = \frac{1}{24}\int_{1}^{24} T_{u_i}\,du_i$ and $T_{u_i}$ denotes the temperature of hour $u_i$. A difference between the CAT and the C24AT index is that the latter is traded over the whole year. Note that temperature is a continuous-time process even though the indices used as underlying for temperature futures contracts are discretely monitored.

As temperature is not a marketable asset, the replication arguments for any temperature futures contract do not hold and incompleteness of the market follows. In this context, any probability measure $Q$ equivalent to the objective $P$ is also an EMM and a risk neutral probability turns all the tradable assets into martingales after discounting. However, since temperature futures and options are indeed tradable assets, their prices must be free of arbitrage. Thanks to the Girsanov theorem, equivalent changes of measures are simply associated with changes of drift. Hence, under a probability space $(\Omega, \mathcal{F}, Q)$ with a filtration $\{\mathcal{F}_t\}_{0 \le t \le \tau_{\max}}$, where $\tau_{\max}$ denotes a maximal time covering all times of interest in the market, we choose a parameterized equivalent pricing measure $Q = Q_\theta$ that completes the market and pin it down to compute the arbitrage-free temperature futures price:

$$F_{(t,\tau_1,\tau_2)} = \mathrm{E}^{Q_\theta}[Y \mid \mathcal{F}_t], \tag{6}$$

where $Y$ refers to the pay-off from the temperature index in Equations (2)–(5). The MPR $\theta$ is assumed to be a real-valued, bounded and piecewise continuous function. We later relax that assumption by considering the time-dependent MPR $\theta_t$. In fact, the MPR can depend on anything that can affect investors' attitudes.
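The degree-day indices of Equations (1)–(3) and the HDD–CDD parity of Equation (4) can be checked numerically. The following is a minimal sketch, assuming the integrals are replaced by sums over daily average temperatures, so that $\tau_2 - \tau_1$ becomes the number of days. The function names and the temperature values are hypothetical and serve only as an illustration; they are not taken from the article's data.

```python
# Discrete (daily) versions of the HDD, CDD and CAT indices, Equations (1)-(3).
# Assumption: each integral is approximated by a sum over daily averages.

def hdd(temps, c):
    """Heating degree days: sum of max(c - T_u, 0) over the period."""
    return sum(max(c - t, 0.0) for t in temps)


def cdd(temps, c):
    """Cooling degree days: sum of max(T_u - c, 0) over the period."""
    return sum(max(t - c, 0.0) for t in temps)


def cat(temps):
    """Cumulative average temperature: sum of T_u over the period."""
    return sum(temps)


# Hypothetical daily average temperatures in degrees Fahrenheit (not real data).
temps = [58.0, 61.5, 66.0, 70.5, 64.0, 59.5, 67.0]
c = 65.0  # baseline temperature, 65 °F as in the text

hdd_value = hdd(temps, c)
cdd_value = cdd(temps, c)
cat_value = cat(temps)

# HDD-CDD parity, Equation (4): CDD - HDD = CAT - c * (number of days).
parity_gap = (cdd_value - hdd_value) - (cat_value - c * len(temps))

print("HDD:", hdd_value, "CDD:", cdd_value, "CAT:", cat_value)
print("parity gap:", parity_gap)
```

Because the parity holds day by day, the gap is zero for any temperature path, which is why the text restricts attention to the HDD and CAT indices.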
The MPR can be inferred from market data. The choice of $Q$ determines the RP demanded by investors for holding the temperature derivative and, conversely, knowledge of the RP determines the choice of the risk-neutral probability. The RP is defined as a drift of the spot dynamics or a Girsanov-type change of probability. In Equation (6), the futures price is set under a risk-neutral probability $Q = Q_\theta$, so the RP measures exactly the difference between the risk-neutral $F_{(t,\tau_1^i,\tau_2^i,Q)}$ (market prices) and the temperature market probability predictions $\hat{F}_{(t,\tau_1^i,\tau_2^i,P)}$ (under $P$):

$$\mathrm{RP} = F_{(t,\tau_1^i,\tau_2^i,Q)} - \hat{F}_{(t,\tau_1^i,\tau_2^i,P)}. \tag{7}$$

Using the 'burn-in' approach of Brix et al. (2005), the futures price is only the historical average index value, therefore there is no RP since $Q = P$.

2.1 Data

We have temperature data available from 35 US, 30 German, 159 Chinese and 9 European weather stations. The temperature data were obtained from the National Climatic Data Center (NCDC), the Deutscher Wetterdienst (DWD), Bloomberg Professional Service, the Japanese Meteorological Agency (JMA) and the China Meteorological Administration (CMA). The temperature data contain the minimum, maximum and average daily temperatures measured in degrees Fahrenheit for US cities and degrees Celsius for other cities. The data set period is, in most of the cities, from 19470101 to 20091231.

The WD data traded at CME were provided by Bloomberg Professional Service. We use daily closing prices from 20000101 to 20091231. The measurement periods for the different temperature indices are standardized as either a single calendar month or a seasonal strip (a minimum of 2 and a maximum of 7 consecutive calendar months). The futures and options at the CME are cash settled, that is, the owner of a futures contract receives 20 times the index at the end of the measurement period, in return for a fixed price. The currency is British pounds for the European futures contracts, US dollars for the US contracts and Japanese Yen for the Asian cities. The minimum price increment is 1 index point. The degree day metric is Celsius or Fahrenheit and trading terminates two calendar days after the expiration of the contract month. The accumulation period of each CAT/CDD/HDD/C24AT index futures contract begins with the first calendar day of the contract month and ends with the last calendar day of the contract month. Earth Satellite Corporation (ESC) reports the daily average temperature to CME. Traders bet that the temperature will not exceed the estimates from ESC.

3. Temperature Dynamics

In order to derive explicit no-arbitrage prices for temperature derivatives, we first need to describe the dynamics of the underlying under the physical measure. This article studies the average daily temperature data (because most of the temperature derivative trading is based on this quantity) for US, European and Asian cities.
In particular, we analyse the weather dynamics for Atlanta, Portland, Houston, New York, Berlin, Essen, Tokyo, Osaka, Beijing and Taipei (Table 1). We are interested in these cities because all of them, with the exception of the latter two, are traded at CME and because a casual examination of the trading statistics on the CME website reveals that the Atlanta HDD, Houston CDD and Portland CDD temperature contracts have relatively more liquidity.

Most of the literature discusses models for daily average temperature that capture a linear trend (due to global warming and urbanization), seasonality (peaks in cooler winters and warmer summers), mean reversion, seasonal volatility (a variance that varies seasonally) and strong correlations (long memory); see, for example, Alaton et al. (2002), Cao and Wei (2004), Campbell and Diebold (2005) and Benth et al. (2007). They differ in their definition of temperature variations, which is exactly the component that characterizes weather risk. Here we show that an autoregressive model AR($p$) of high order $p$ for the detrended daily average temperatures (rather than the underlying temperature itself) is enough to capture the stylized facts of temperature.

We first need to remove the seasonality in mean $\Lambda_t$ from the daily temperature series $T_t$, check intertemporal correlations and remove the seasonality in variance to deal with a stationary process.
The determ<strong>in</strong>istic seasonal mean component can beapproximated with Fourier-truncated series (FTS): t = a + bt +L∑{ } 2π(t − dl )c l cos, (8)l · 365l=1where the coefficients a and b <strong>in</strong>dicate the average daily temperature and global warm<strong>in</strong>g,respectively. We observe low temperatures <strong>in</strong> w<strong>in</strong>ter times and high temperatures<strong>in</strong> summer <strong>for</strong> different locations. The temperature data sets do not deviate from its


Table 1. Coefficients of the Fourier-truncated seasonal series of average daily temperatures in different cities.

| City | Period | â (CI) | b̂ (CI) | ĉ_1 (CI) | d̂_1 (CI) |
|---|---|---|---|---|---|
| Atlanta | 19480101–20081204 | 61.95 (61.95, 61.96) | −0.0025 (−0.0081, 0.0031) | 18.32 (18.31, 18.33) | −165.02 (−165.03, −165.02) |
| Beijing | 19730101–20090831 | 12.72 (12.71, 12.73) | 0.0001 (−0.0070, 0.0073) | 14.93 (14.92, 14.94) | −169.59 (−169.59, −169.58) |
| Berlin | 19480101–20080527 | 9.72 (9.71, 9.74) | −0.0004 (−0.0147, 0.0139) | 9.75 (9.74, 9.77) | −164.79 (−164.81, −164.78) |
| Essen | 19700101–20090731 | 10.80 (10.79, 10.81) | −0.0020 (−0.0134, 0.0093) | 8.02 (8.01, 8.03) | −161.72 (−161.73, −161.71) |
| Houston | 19700101–20081204 | 68.52 (68.51, 68.52) | −0.0006 (−0.0052, 0.0039) | 15.62 (15.62, 15.63) | −165.78 (−165.79, −165.78) |
| New York | 19490101–20081204 | 53.86 (53.86, 53.87) | −0.0004 (−0.0079, 0.0071) | 21.43 (21.42, 21.44) | −156.27 (−156.27, −156.26) |
| Osaka | 19730101–20090604 | 16.78 (16.77, 16.79) | −0.0021 (−0.0109, 0.0067) | 11.61 (11.60, 11.62) | −153.57 (−153.58, −153.56) |
| Portland | 19480101–20081204 | 55.35 (55.35, 55.36) | −0.0116 (−0.0166, −0.0065) | 14.36 (14.36, 14.37) | −155.58 (−155.58, −155.57) |
| Taipei | 19920101–20090806 | 23.32 (23.31, 23.33) | 0.0023 (−0.0086, 0.0133) | 6.67 (6.66, 6.68) | −158.67 (−158.68, −158.66) |
| Tokyo | 19730101–20090831 | 16.32 (16.31, 16.33) | −0.0003 (−0.0085, 0.0079) | 10.38 (10.37, 10.38) | −153.52 (−153.53, −153.52) |

Notes: CI, confidence interval. All coefficients are non-zero at 1% significance level. CIs are given in parentheses. Dates given in yyyymmdd format. The daily temperature is measured in degrees Celsius, except for American cities, where it is measured in degrees Fahrenheit.


mean level and in most of the cases a linear trend at the 1% significance level is detectable, as displayed in Table 1.

Our findings are similar to Alaton et al. (2002) and Benth et al. (2007) for Sweden; Benth et al. (2007) for Lithuania; Campbell and Diebold (2005) for the United States; and Papazian and Skiadopoulos (2010) for Barcelona, London, Paris and Rome. In our empirical results, the number of periodic terms of the FTS varies from city to city, sometimes from 4 to 21 or more terms. We notice that the series expansion in Equation (8) with more and more periodic terms provides a fine tuning, but this will increase the number of parameters. Here we propose a different way to correct for seasonality. We show that a local smoothing approach does that job instead, but with less technical expression. Asymptotically they can be approximated by FTS estimators. For a fixed time point $s \in [1, 365]$, we smooth $\Lambda_s$ with a Local Linear Regression (LLR) estimator:

$$\Lambda_s = \arg\min_{e, f} \sum_{t=1}^{365} \left\{ \bar{T}_t - e_s - f_s (t - s) \right\}^2 K\left(\frac{t - s}{h}\right), \qquad (9)$$

where $\bar{T}_t$ is the mean of average daily temperature over J years, $h$ is the smoothing parameter and $K(\cdot)$ denotes a kernel.
This estimator, using the Epanechnikov kernel, incorporates an asymmetry term since high temperatures in winter are more pronounced than in summer, as Figure 1 displays in an 8-year stretch of the average daily temperatures plotted over the FTS estimates.

After removing the LLR seasonal mean (Equation (9)) from the daily average temperatures ($X_t = T_t - \Lambda_t$), we apply the Augmented Dickey–Fuller (ADF) and the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) tests to check whether $X_t$ is a stationary process. We then plot the Partial Autocorrelation Function (PACF) of $X_t$ to detect possible intertemporal correlations. This suggests that the persistence of the daily average is captured by AR processes of higher order p:

$$X_{t+p} = \sum_{i=1}^{p} \beta_i X_{t+p-i} + \varepsilon_t, \quad \varepsilon_t = \sigma_t e_t, \quad e_t \sim N(0, 1), \qquad (10)$$

with residuals $\varepsilon_t$. Under the stationarity hypothesis of the coefficients $\beta_i$ and the mean zero of residuals $\varepsilon_t$, the mean temperature is $E[T_t] = \Lambda_t$. This is different from the approach of Campbell and Diebold (2005), who suggested regressing deseasonalized temperatures on original temperatures. The analysis of the PACFs and Akaike's information criterion (AIC) suggests that the AR(3) model in Benth et al. (2007) explains the temperature evolution well and holds for many cities. The results of the stationarity tests and the coefficients of the fitted AR(3) are given in Table 2.
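To make the local linear smoother of Equation (9) concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names, the bandwidth value and the decision to ignore the wrap-around between day 365 and day 1 are assumptions made for illustration. For each day s it solves the 2x2 weighted least-squares problem in closed form and keeps the intercept as the seasonal mean.

```python
def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def llr_seasonal_mean(t_bar, h):
    """Local linear estimate of the seasonal mean for days s = 1..len(t_bar).

    t_bar : list of day-of-year averages (index 0 holds day 1)
    h     : bandwidth in days (hypothetical choice; must exceed 1 so that
            every local fit uses at least two observations)
    """
    if h <= 1.0:
        raise ValueError("bandwidth h must be larger than 1 day")
    n = len(t_bar)
    fitted = []
    for s in range(1, n + 1):
        # Weighted sums for the normal equations of
        # min_{e,f} sum_t (T_t - e - f * (t - s))^2 * K((t - s) / h)
        s0 = s1 = s2 = r0 = r1 = 0.0
        for t in range(1, n + 1):
            d = t - s
            w = epanechnikov(d / h)
            if w == 0.0:
                continue
            y = t_bar[t - 1]
            s0 += w
            s1 += w * d
            s2 += w * d * d
            r0 += w * y
            r1 += w * d * y
        det = s0 * s2 - s1 * s1
        # The intercept e_s is the fitted seasonal mean at day s.
        fitted.append((s2 * r0 - s1 * r1) / det)
    return fitted
```

A local linear fit reproduces a straight line exactly, which gives a quick sanity check: feeding in `[2.0 * t + 1.0 for t in range(1, 31)]` returns the same values up to rounding error.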
Figure 2 illustrates that the ACFs of the residuals $\varepsilon_t$ are close to 0 and, according to the Box–Ljung statistic, the first few lags are insignificant, whereas the ACFs of the squared residuals $\varepsilon_t^2$ show a high seasonal pattern.

We calibrate the deterministic seasonal variance function $\sigma_t^2$ with FTS and an additional generalized autoregressive conditional heteroskedasticity (GARCH) (p, q) term:


$$\hat{\sigma}_t^2 = c + \sum_{l=1}^{L} \left\{ c_{2l} \cos\left(\frac{2 l \pi t}{365}\right) + c_{2l+1} \sin\left(\frac{2 l \pi t}{365}\right) \right\} + \alpha_1 (\sigma_{t-1} \eta_{t-1})^2 + \beta_1 \sigma_{t-1}^2, \quad \eta_t \sim \text{iid } N(0, 1). \qquad (11)$$

Figure 1. An 8-year stretch of the average daily temperatures (black line), the seasonal component modelled with a Fourier-truncated series (dashed line) and the local linear regression (grey line) using Epanechnikov kernel. (a) Atlanta, (b) Beijing, (c) Berlin, (d) Essen, (e) Houston, (f) New York, (g) Osaka, (h) Portland, (i) Taipei and (j) Tokyo. Axes: time (yyyymmdd, 20000101 to 20080501) against temperature in °F or °C.


Table 2. Result of the stationarity tests and the coefficients of the fitted AR(3).

| City | ADF τ̂ | KPSS k̂ | β_1 | β_2 | β_3 | α_1 | α_2 | α_3 | λ_1 | λ_2,3 |
|---|---|---|---|---|---|---|---|---|---|---|
| Atlanta | −55.55+ | 0.21*** | 0.96 | −0.38 | 0.13 | 2.03 | 1.46 | 0.28 | −0.30 | −0.86 |
| Beijing | −30.75+ | 0.16*** | 0.72 | −0.07 | 0.05 | 2.27 | 1.63 | 0.29 | −0.27 | −1.00 |
| Berlin | −40.94+ | 0.13** | 0.91 | −0.20 | 0.07 | 2.08 | 1.37 | 0.20 | −0.21 | −0.93 |
| Essen | −23.87+ | 0.11* | 0.93 | −0.21 | 0.11 | 2.06 | 1.34 | 0.16 | −0.16 | −0.95 |
| Houston | −38.17+ | 0.05* | 0.90 | −0.39 | 0.15 | 2.09 | 1.57 | 0.33 | −0.33 | −0.87 |
| New York | −56.88+ | 0.08* | 0.76 | −0.23 | 0.11 | 2.23 | 1.69 | 0.34 | −0.32 | −0.95 |
| Osaka | −18.65+ | 0.09* | 0.73 | −0.14 | 0.06 | 2.26 | 1.68 | 0.34 | −0.33 | −0.96 |
| Portland | −45.13+ | 0.05* | 0.86 | −0.22 | 0.08 | 2.13 | 1.48 | 0.26 | −0.27 | −0.93 |
| Taipei | −32.82+ | 0.09* | 0.79 | −0.22 | 0.06 | 2.20 | 1.63 | 0.36 | −0.40 | −0.90 |
| Tokyo | −25.93+ | 0.06* | 0.64 | −0.07 | 0.06 | 2.35 | 1.79 | 0.37 | −0.33 | −1.01 |

Notes: ADF, augmented Dickey–Fuller; KPSS, Kwiatkowski–Phillips–Schmidt–Shin; AR, autoregressive process; CAR, continuous autoregressive model. ADF and KPSS statistics, coefficients of the AR(3), CAR(3) and eigenvalues λ_1,2,3 of the matrix A of the CAR(3) model for the detrended daily average temperatures for different cities. + 0.01 critical values, * 0.1 critical value, ** 0.05 critical value, *** 0.01 critical value.

The Fourier part in Equation (11) captures the seasonality in volatility, whereas the GARCH part captures the remaining non-seasonal volatility. Note again that more and more periodic terms in Equation (11) provide a good fit, but this will increase the number of parameters.
To avoid this and in order to achieve positivity of the variance, Gaussian risk factors and volatility model flexibility in continuous time, we propose the calibration of the seasonal variance in terms of an LLR:

$$\arg\min_{g, h} \sum_{t=1}^{365} \left\{ \hat{\varepsilon}_t^2 - g_s - h_s (t - s) \right\}^2 K\left(\frac{t - s}{h}\right), \qquad (12)$$

where $\hat{\varepsilon}_t^2$ is the mean of squared residuals over J years and $K(\cdot)$ is a kernel. Figure 3 shows the daily empirical variance (the average of squared residuals for each day of the year) and the fits using the FTS-GARCH(1,1) and the LLR (with Epanechnikov kernel) estimators. Here we obtain the Campbell and Diebold (2005) effect for different temperature data: high variance in winter to early summer and low variance in spring to late summer. The effects of the non-seasonal GARCH volatility component are small.

Figure 4 displays the ACFs of temperature residuals $\varepsilon_t$ and squared residuals $\varepsilon_t^2$ after dividing out the deterministic LLR seasonal variance. The ACF plots of the standardized residuals remain unchanged, but now the squared residuals present a non-seasonal pattern. The LLR seasonal variance creates almost normal residuals and captures the peak seasons, as Figure 5 shows in a log kernel smoothing density plot against a normal kernel evaluated at 100 equally spaced points. Table 3 presents the calibrated coefficients of the FTS-GARCH seasonal variance estimates and the


descriptive statistics for the residuals after correcting by the FTS-GARCH and LLR seasonal variance.

Figure 2. The ACF of residuals $\varepsilon_t$ (left panels) and squared residuals $\varepsilon_t^2$ (right panels) of detrended daily temperatures for different cities (Atlanta, Beijing, Berlin, Essen, Houston, New York, Osaka, Portland, Taipei and Tokyo). Horizontal axis: lag (0 to 1000).
We observe that, independently of the chosen location, the driving stochastics are close to a Wiener process. This will allow us to apply the pricing tools of financial mathematics.
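The normality diagnostics reported in Table 3 (JB, kurtosis and skewness of the standardized residuals) can be sketched as follows. This is only an illustration under assumptions: it uses the textbook Jarque–Bera formula JB = n/6 · {S² + (K − 3)²/4} with moment-based skewness S and kurtosis K, and the source does not state which variant or software was used. The function names are hypothetical.

```python
def standardize(residuals, sigma):
    """Divide each residual by the seasonal standard deviation of its day."""
    return [e / s for e, s in zip(residuals, sigma)]


def jarque_bera(x):
    """Return (JB, kurtosis, skewness) using population (1/n) moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    # Under normality skew = 0 and kurt = 3, so JB is close to 0.
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, kurt, skew
```

In this reading, kurtosis values near 3 and skewness values near 0, as in Table 3, indicate residuals that are close to Gaussian once the seasonal variance has been divided out.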


Figure 3. The daily empirical variance (black line), the Fourier-truncated (dashed line) and the local linear smoother seasonal variation using Epanechnikov kernel (grey line) for different cities. (a) Atlanta, (b) Beijing, (c) Berlin, (d) Essen, (e) Houston, (f) New York, (g) Osaka, (h) Portland, (i) Taipei and (j) Tokyo. Axes: time (January to December) against seasonal variance.

4. Stochastic Pricing Model

Temperatures evolve continuously over time, so it is very convenient to model the dynamics of temperature with continuous-time stochastic processes, although the data may be on a daily scale. We therefore need to reformulate the underlying process in continuous time, which is more convenient for the market definitions.


Figure 4. The ACF of residuals $\varepsilon_t$ (left panels) and squared residuals $\varepsilon_t^2$ (right panels) of detrended daily temperatures after dividing out the local linear seasonal variance for different cities (Atlanta, Beijing, Berlin, Essen, Houston, New York, Osaka, Portland, Taipei and Tokyo). Horizontal axis: lag (0 to 1000).

We show that the AR(p) (Equation (10)) estimated in Section 3 for the detrended temperature can therefore be seen as a discretely sampled continuous-time autoregressive (CAR) process (CAR(p)) driven by a one-dimensional Brownian motion $B_t$ (though the continuous-time process is Markov in higher dimension) (Benth et al., 2007):

$$dX_t = A X_t \, dt + e_p \sigma_t \, dB_t, \qquad (13)$$


where the state vector $X_t \in \mathbb{R}^p$ for $p \ge 1$ is a vectorial Ornstein–Uhlenbeck process, namely, the temperatures after removing seasonality at times $t, t - 1, t - 2, t - 3, \ldots$; $e_k$ denotes the kth unit vector in $\mathbb{R}^p$ for $k = 1, \ldots, p$; $\sigma_t > 0$ is a deterministic volatility (real-valued and square integrable function); and $A$ is a $p \times p$ matrix:

$$A = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -\alpha_p & -\alpha_{p-1} & -\alpha_{p-2} & \cdots & -\alpha_1 \end{pmatrix}, \qquad (14)$$

with positive constants $\alpha_k$. Following this nomenclature, $X_{q(t)}$ with $q = 1, \ldots, p$ is the qth coordinate of $X_t$, and setting $q = 1$ gives the detrended temperature time series $X_{1(t)} = T_t - \Lambda_t$. The proof is derived by an analytical link between $X_{1(t)}$, $X_{2(t)}$ and $X_{3(t)}$ and the lagged temperatures up to time $t - 3$. $X_{1(t+3)}$ is approximated by Euler discretization.

Figure 5. The log of normal kernel (*) and log of kernel smoothing density estimate of residuals after correcting FTS (+) and local linear (o) seasonal variance for different cities. (a) Atlanta, (b) Beijing, (c) Berlin, (d) Essen, (e) Houston, (f) New York, (g) Osaka, (h) Portland, (i) Taipei and (j) Tokyo.
Thus for $p = 1$, $X_t = X_{1(t)}$ and Equation (13) becomes

$$dX_{1(t)} = -\alpha_1 X_{1(t)} \, dt + \sigma_t \, dB_t, \qquad (15)$$

which is the continuous version of an AR(1) process. Similarly for $p = 2$, assume a time step of length one ($dt = 1$) and substitute $X_{2(t)}$ iteratively to get

$$X_{1(t+2)} \approx (2 - \alpha_1) X_{1(t+1)} + (\alpha_1 - \alpha_2 - 1) X_{1(t)} + \sigma_t e_t, \qquad (16)$$


Table 3. Coefficients of the FTS-GARCH seasonal variance, and statistics of the corrected residuals $\hat{\varepsilon}_t$ with $\hat{\sigma}_t$ fitted by FTS and by LLR.

| City | ĉ_1 | ĉ_2 | ĉ_3 | ĉ_4 | ĉ_5 | ĉ_6 | ĉ_7 | ĉ_8 | ĉ_9 | JB (FTS) | Kurt (FTS) | Skew (FTS) | JB (LLR) | Kurt (LLR) | Skew (LLR) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Atlanta | 21.51 | 18.10 | 7.09 | 2.35 | 1.69 | −0.39 | −0.68 | 0.24 | −0.45 | 272.01 | 3.98 | −0.70 | 253.24 | 3.91 | −0.68 |
| Beijing | 3.89 | 0.70 | 0.84 | −0.22 | −0.49 | −0.20 | −0.14 | −0.11 | 0.08 | 219.67 | 3.27 | −0.28 | 212.46 | 3.24 | −0.28 |
| Berlin | 5.07 | 0.10 | 0.72 | 0.98 | −0.43 | 0.45 | 0.06 | 0.16 | 0.22 | 224.55 | 3.48 | −0.05 | 274.83 | 3.51 | −0.08 |
| Essen | 4.78 | 0.00 | 0.42 | 0.63 | −0.20 | 0.17 | −0.06 | 0.05 | 0.17 | 273.90 | 3.65 | −0.05 | 251.89 | 3.61 | −0.08 |
| Houston | 23.61 | 25.47 | 4.49 | 6.65 | −0.38 | 1.00 | −2.67 | 0.68 | −1.56 | 140.97 | 3.96 | −0.60 | 122.83 | 3.87 | −0.57 |
| New York | 22.29 | 13.80 | 3.16 | 3.30 | −0.47 | 0.80 | 2.04 | 0.11 | 0.01 | 367.38 | 3.43 | −0.23 | 355.03 | 3.43 | −0.22 |
| Osaka | 3.34 | 0.80 | 0.80 | −0.57 | −0.27 | −0.18 | −0.07 | 0.01 | −0.03 | 105.32 | 3.37 | −0.11 | 101.50 | 3.36 | −0.11 |
| Portland | 12.48 | 1.55 | 1.05 | 1.42 | −1.19 | 0.46 | 0.34 | −0.40 | 0.45 | 67.10 | 3.24 | 0.06 | 75.01 | 3.27 | 0.02 |
| Taipei | 3.50 | 1.49 | 1.59 | −0.38 | −0.16 | 0.03 | −0.17 | −0.09 | −0.18 | 181.90 | 3.26 | −0.39 | 169.41 | 3.24 | −0.37 |
| Tokyo | 3.80 | 0.01 | 0.73 | −0.69 | −0.33 | −0.14 | −0.14 | 0.26 | −0.13 | 137.93 | 3.45 | −0.10 | 156.58 | 3.46 | −0.13 |

Notes: FTS, Fourier-truncated series; JB, Jarque–Bera; LLR, local linear regression; GARCH, generalized autoregressive conditional heteroskedasticity. Seasonal variance estimate of $\{c_i\}_{i=1}^{9}$ fitted with an FTS and statistics (skewness (Skew), kurtosis (Kurt) and JB test statistics) of the standardized residuals with seasonal variances fitted with FTS-GARCH and with LLR. Coefficients are significant at 1% level.


where $e_t = B_{t+1} - B_t$. For $p = 3$, we have:

$$\begin{aligned}
X_{1(t+1)} - X_{1(t)} &= X_{2(t)} \, dt, \\
X_{2(t+1)} - X_{2(t)} &= X_{3(t)} \, dt, \\
X_{3(t+1)} - X_{3(t)} &= -\alpha_3 X_{1(t)} \, dt - \alpha_2 X_{2(t)} \, dt - \alpha_1 X_{3(t)} \, dt + \sigma_t e_t, \\
&\;\;\vdots \\
X_{1(t+3)} - X_{1(t+2)} &= X_{2(t+2)} \, dt, \\
X_{2(t+3)} - X_{2(t+2)} &= X_{3(t+2)} \, dt, \\
X_{3(t+3)} - X_{3(t+2)} &= -\alpha_3 X_{1(t+2)} \, dt - \alpha_2 X_{2(t+2)} \, dt - \alpha_1 X_{3(t+2)} \, dt + \sigma_t e_t,
\end{aligned} \qquad (17)$$

substituting into the $X_{1(t+3)}$ dynamics and setting $dt = 1$:

$$X_{1(t+3)} \approx \underbrace{(3 - \alpha_1)}_{\beta_1} X_{1(t+2)} + \underbrace{(2\alpha_1 - \alpha_2 - 3)}_{\beta_2} X_{1(t+1)} + \underbrace{(-\alpha_1 + \alpha_2 - \alpha_3 + 1)}_{\beta_3} X_{1(t)}. \qquad (18)$$

Please note that this corrects the derivation in Benth et al. (2007) and Equation (18) leads to Equation (10) (with $p = 3$). The approximation of Equation (18) is required to compute the eigenvalues of matrix $A$. The last columns of Table 2 display the CAR(3) parameters and the eigenvalues of the matrix $A$ for the studied temperature data.
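The mapping in Equation (18) can be inverted to recover the CAR(3) parameters from fitted AR(3) coefficients. The snippet below is a small sketch of that arithmetic, with hypothetical function names. It also evaluates the characteristic polynomial of the companion matrix A, λ³ + α_1 λ² + α_2 λ + α_3, which is a standard property of this matrix form rather than something stated in the text.

```python
def ar3_to_car3(beta1, beta2, beta3):
    """Invert Equation (18): recover (alpha1, alpha2, alpha3) from AR(3) coefficients."""
    alpha1 = 3.0 - beta1                      # from beta1 = 3 - alpha1
    alpha2 = 2.0 * alpha1 - 3.0 - beta2       # from beta2 = 2*alpha1 - alpha2 - 3
    alpha3 = 1.0 - alpha1 + alpha2 - beta3    # from beta3 = -alpha1 + alpha2 - alpha3 + 1
    return alpha1, alpha2, alpha3


def char_poly(lam, alpha1, alpha2, alpha3):
    """det(lam * I - A) for the 3x3 companion matrix A of the CAR(3) model."""
    return lam ** 3 + alpha1 * lam ** 2 + alpha2 * lam + alpha3


# Atlanta row of Table 2: beta = (0.96, -0.38, 0.13)
alphas = ar3_to_car3(0.96, -0.38, 0.13)
# The polynomial is close to zero at the reported real eigenvalue -0.30.
residual = char_poly(-0.30, *alphas)
```

For the Atlanta row this gives approximately (2.04, 1.46, 0.29), which agrees with the tabulated (2.03, 1.46, 0.28) up to rounding of the published AR coefficients, and the polynomial evaluated at the reported eigenvalue −0.30 is close to zero.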
The stationarity condition is fulfilled since the eigenvalues of $A$ have negative real parts and the variance matrix $\int_0^t \sigma_{t-s}^2 \exp\{A(s)\} e_p e_p^\top \exp\{A^\top(s)\} \, ds$ converges as $t \to \infty$.

By applying the multidimensional Itô formula, the process in Equation (13) with $X_t = x \in \mathbb{R}^p$ has the explicit form $X_s = \exp\{A(s - t)\} x + \int_t^s \exp\{A(s - u)\} e_p \sigma_u \, dB_u$ for $s \ge t \ge 0$.

Since the dynamics of temperature futures prices must be free of arbitrage under the pricing equivalent measure $Q_\theta$, the temperature dynamics of Equation (13) become, for $s \ge t \ge 0$:

$$\begin{aligned}
dX_t &= (A X_t + e_p \sigma_t \theta_t) \, dt + e_p \sigma_t \, dB_t^\theta, \\
X_s &= \exp\{A(s - t)\} x + \int_t^s \exp\{A(s - u)\} e_p \sigma_u \theta_u \, du + \int_t^s \exp\{A(s - u)\} e_p \sigma_u \, dB_u^\theta.
\end{aligned} \qquad (19)$$

By inserting Equations (1)–(3) into Equation (6), Benth et al. (2007) explicitly calculated the risk-neutral prices for HDD/CDD/CAT futures (and options) for contracts traded before the temperature measurement period, that is $0 \le t \le \tau_1$


$$\begin{aligned}
F_{HDD(t,\tau_1,\tau_2)} &= \int_{\tau_1}^{\tau_2} \upsilon_{t,s} \, \psi\left[\frac{c - m\{t, s, e_1^\top \exp\{A(s - t)\} X_t\}}{\upsilon_{t,s}}\right] ds, \\
F_{CDD(t,\tau_1,\tau_2)} &= \int_{\tau_1}^{\tau_2} \upsilon_{t,s} \, \psi\left[\frac{m\{t, s, e_1^\top \exp\{A(s - t)\} X_t\} - c}{\upsilon_{t,s}}\right] ds, \\
F_{CAT(t,\tau_1,\tau_2)} &= \int_{\tau_1}^{\tau_2} \Lambda_u \, du + a_{t,\tau_1,\tau_2} X_t + \int_t^{\tau_1} \theta_u \sigma_u a_{t,\tau_1,\tau_2} e_p \, du \\
&\quad + \int_{\tau_1}^{\tau_2} \theta_u \sigma_u e_1^\top A^{-1} \left[\exp\{A(\tau_2 - u)\} - I_p\right] e_p \, du,
\end{aligned} \qquad (20)$$

with $a_{t,\tau_1,\tau_2} = e_1^\top A^{-1} [\exp\{A(\tau_2 - t)\} - \exp\{A(\tau_1 - t)\}]$; $I_p$ is a $p \times p$ identity matrix; $\psi(x) = x \Phi(x) + \varphi(x)$ ($\Phi$ denotes the standard normal cumulative distribution function (cdf)) with $x = e_1^\top \exp\{A(s - t)\} X_t$; $\upsilon_{t,s}^2 = \int_t^s \sigma_u^2 \left[e_1^\top \exp\{A(s - t)\} e_p\right]^2 du$; and $m\{t, s, x\} = \Lambda_s + \int_t^s \sigma_u \theta_u e_1^\top \exp\{A(s - t)\} e_p \, du + x$. The solution to Equation (20) depends on the assumed specification for the MPR $\theta$. In the next section, it is shown that different assumed risk specifications can lead to different derivative prices.

The model in Benth et al. (2007) nests a number of previous models (Alaton et al., 2002; Benth, 2003; Benth and Saltyte-Benth, 2005; Brody et al., 2002); it generalizes the Benth and Saltyte-Benth (2007) and Dornier and Querel (2007) approaches and is a very well-studied methodology in the literature (Benth et al., 2011; Papazian and Skiadopoulos, 2010; Zapranis and Alexandridis, 2008). Besides, it gives a clear connection between the discrete- and continuous-time versions, it provides closed-form non-arbitrage pricing formulas and it requires only a one-time estimation for the price dynamics. With the time series approach (Campbell and Diebold, 2005), the continuous-time approaches (Alaton et al., 2002; Huang-Hsi et al., 2008), neural networks (Zapranis and Alexandridis, 2008, 2009) or the principal component analysis approach (Papazian and Skiadopoulos, 2010), it is not easy to compute the price dynamics of CAT/CDD/HDD futures, and one needs to use numerical approaches or simulations in order to calculate the conditional expectations in Equation (6). In that case, partial differential equations or Monte Carlo simulations are used. For option pricing, this would mean simulating scenarios from futures prices, which translates into intensive computer simulation procedures.

5. The Implied Market Price of Weather Risk

For pricing and hedging non-tradable assets, one essentially needs to incorporate the MPR $\theta$, which is an important parameter of the associated EMM and measures the additional return for bearing more risk. This section deals exactly with the differences between 'historical' (P) and 'risk-neutral' (Q) behaviours of temperature. Using statistical modelling and given that liquid derivative contracts based on daily temperatures are traded on the CME, one might infer the MPR (the change of drift) from traded (CAT/CDD/HDD/C24AT) futures–options-type contracts.
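When θ is taken to be constant, the CAT price in Equation (20) is linear in θ: it equals the price under θ = 0 plus θ times a sensitivity term built from the two integrals involving σ_u. Under that reading, fitting a single θ to several contracts by least squares has a closed form. The sketch below illustrates only this arithmetic. The function name and all numbers are hypothetical, and the sensitivities are assumed to have been computed beforehand from the calibrated σ_u and A.

```python
def implied_constant_mpr(market_prices, zero_mpr_prices, sensitivities):
    """Least-squares constant theta for prices of the form F_i = F0_i + theta * g_i.

    market_prices   : observed futures prices F_i
    zero_mpr_prices : model prices F0_i obtained with theta = 0
    sensitivities   : g_i, the coefficient of theta in the CAT formula
                      (hypothetical inputs; not taken from the paper)
    """
    num = sum(g * (f - f0)
              for f, f0, g in zip(market_prices, zero_mpr_prices, sensitivities))
    den = sum(g * g for g in sensitivities)
    return num / den


# Synthetic check: prices generated with theta = -0.5 are recovered exactly.
f0 = [100.0, 200.0]
g = [10.0, 20.0]
market = [b + (-0.5) * s for b, s in zip(f0, g)]
theta_hat = implied_constant_mpr(market, f0, g)
```

With a single contract the estimate reduces to (F − F0) / g. The HDD and CDD prices are not linear in θ, so they would need a numerical minimizer instead.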


Our study is a calibration procedure for financial engineering purposes. In the calibration exercise, only a single date is required (although different time horizons and calibrated instruments are used), since the model is recalibrated daily to detect intertemporal effects. Moreover, we use an economic and statistical testing approach, where we start from a specification of the MPR and check consistency with the data. By making assumptions about the MPR, we implicitly make an assumption about the aggregate risk aversion of the market. The risk parameter $\theta$ can then be inferred by finding the value that satisfies Equation (20) for each specification. Once we know the MPR for temperature futures, we know the MPR for options and thus one can price new 'non-standard maturities' or OTC derivatives. The concept of implied MPR is similar to that used in extracting implied volatilities (Fengler et al., 2007) or the market price of oil risk (Gibson and Schwartz, 1990).

To value temperature derivatives, the following specifications of the MPR are investigated: a constant, a piecewise linear function, a two-piecewise linear function, a time-deterministic function and a 'financial-bootstrapping' MPR. The statistical point of view is to treat this as an inverse problem with different degrees of smoothness expressed through the penalty parameter of a smoothing spline. The economic point of view is to detect possible time dependencies that can be explained by investors' preferences in order to hedge weather risk.

In this article we concentrate on contracts with monthly measurement length periods, but similar implications apply for seasonal strip contracts. We observe different temperature futures contracts $i = 1, \ldots, I$ with measurement periods $t \le \tau_1^i$


Table 4. Weather futures and futures prices listed on date (yyyymmdd) at CME.

| Contract type | Trading date t | Measurement period τ_1 | Measurement period τ_2 | Futures price F(t, τ_1, τ_2, θ̂): CME | MPR = 0 | Constant MPR | Realized T_t, I(τ_1, τ_2) |
|---|---|---|---|---|---|---|---|
| Berlin-CAT | 20070316 | 20070401 | 20070430 | 288.00 | 363.00 | 291.06 | 362.90 |
| Berlin-CAT | 20070316 | 20070501 | 20070531 | 457.00 | 502.11 | 454.91 | 494.20 |
| Berlin-CAT | 20070316 | 20070601 | 20070630 | 529.00 | 571.78 | 630.76 | 574.30 |
| Berlin-CAT | 20070316 | 20070701 | 20070731 | 616.00 | 591.56 | 626.76 | 583.00 |
| Berlin-CAT | 20070316 | 20070801 | 20070831 | 610.00 | 566.14 | 636.22 | 580.70 |
| Berlin-CAT | 20070316 | 20070901 | 20070930 | 472.00 | 414.33 | 472.00 | 414.80 |
| Berlin-CAT | 20070427 | 20070501 | 20070531 | 457.00 | 506.18 | 457.52 | 494.20 |
| Berlin-CAT | 20070427 | 20070601 | 20070630 | 529.00 | 571.78 | 534.76 | 574.30 |
| Berlin-CAT | 20070427 | 20070701 | 20070731 | 616.00 | 591.56 | 656.76 | 583.00 |
| Berlin-CAT | 20070427 | 20070801 | 20070831 | 610.00 | 566.14 | 636.22 | 580.70 |
| Berlin-CAT | 20070427 | 20070901 | 20070930 | 472.00 | 414.33 | 472.00 | 414.80 |
| Tokyo-C24AT | 20081027 | 20090301 | 20090331 | 450.00 | 118.32 | 488.90 | 305.00 |
| Tokyo-C24AT | 20081027 | 20090401 | 20090430 | 592.00 | 283.18 | 563.27 | 479.00 |
| Tokyo-C24AT | 20081027 | 20090501 | 20090531 | 682.00 | 511.07 | 696.31 | 623.00 |
| Tokyo-C24AT | 20081027 | 20090601 | 20090630 | 818.00 | 628.24 | 835.50 | 679.00 |
| Tokyo-C24AT | 20081027 | 20090701 | 20090731 | 855.00 | 731.30 | 706.14 | 812.00 |

Notes: CME, Chicago Mercantile Exchange; MPR, market price of risk. Weather futures at CME; futures prices F(t, τ_1, τ_2, θ̂) from CME; estimated prices with MPR = 0; constant MPR for different controls per trading date (constant MPR). Source: Bloomberg Professional Service (Weather futures).


$$\hat{\theta}_{t,HDD}^i = \arg\min_{\theta_t^i} \left( F_{HDD(t,\tau_1^i,\tau_2^i)} - \int_{\tau_1^i}^{\tau_2^i} \upsilon_{t,s} \, \psi\left[\frac{c - \hat{m}^1\{t, s, e_1^\top \exp\{A(s - t)\} X_t\}}{\upsilon_{t,s}}\right] ds \right)^2, \qquad (21)$$

with $\hat{m}^1\{t, s, x\} = \Lambda_s + \theta_t^i \int_t^s \sigma_u e_1^\top \exp\{A(s - t)\} e_p \, du + x$, and $\upsilon_{t,s}^2$, $\psi(x)$ and $x$ defined as in Equation (20). The MPR for CDD futures $\hat{\theta}_{t,CDD}^i$ is equivalent to the HDD case in Equation (21) and we will therefore omit CDD parameterizations. Note that this specification can be seen as a deterministic time-varying MPR $\theta_t^i$ that varies with date for any given contract $i$, but it is constant over $[t, \tau_2^i]$.

5.2 One Piecewise Constant MPR

A simpler MPR parameterization is to assume that it is constant across all time horizon contracts priced on a particular date ($\theta_t$). We therefore estimate this constant MPR for all contract types traded at $t \le \tau_1^i$ ξ) $\theta_t^2$. The two piecewise constant function $\hat{\theta}_t$ with $t \le \tau_1^i$


$$\begin{aligned}
f_{\mathrm{CAT}}(\xi) = \arg\min_{\theta^1_{t,\mathrm{CAT}},\,\theta^2_{t,\mathrm{CAT}}} \sum_{i=1}^{I} \Bigg( & F_{\mathrm{CAT}(t,\tau_1^i,\tau_2^i)} - \int_{\tau_1^i}^{\tau_2^i} \hat\Lambda_u\,du - \hat a_{t,\tau_1^i,\tau_2^i} X_t \\
& - \theta^1_{t,\mathrm{CAT}} \Bigg\{ \int_t^{\tau_1^i} I(u \le \xi)\, \hat\sigma_u \hat a_{t,\tau_1^i,\tau_2^i} e_p\,du + \int_{\tau_1^i}^{\tau_2^i} I(u \le \xi)\, \hat\sigma_u e_1^\top A^{-1}\big[\exp\{A(\tau_2^i-u)\} - I_p\big] e_p\,du \Bigg\} \\
& - \theta^2_{t,\mathrm{CAT}} \Bigg\{ \int_t^{\tau_1^i} I(u > \xi)\, \hat\sigma_u \hat a_{t,\tau_1^i,\tau_2^i} e_p\,du + \int_{\tau_1^i}^{\tau_2^i} I(u > \xi)\, \hat\sigma_u e_1^\top A^{-1}\big[\exp\{A(\tau_2^i-u)\} - I_p\big] e_p\,du \Bigg\} \Bigg)^{2}, \\
f_{\mathrm{HDD}}(\xi) = \arg\min_{\theta^1_{t,\mathrm{HDD}},\,\theta^2_{t,\mathrm{HDD}}} \sum_{i=1}^{I} \Bigg( & F_{\mathrm{HDD}(t,\tau_1^i,\tau_2^i)} - \int_{\tau_1^i}^{\tau_2^i} \upsilon_{t,s}\, \psi\!\left[ \frac{c - \hat m^{3}_{\{t,s,e_1^\top \exp\{A(s-t)\}X_t\}}}{\upsilon_{t,s}} \right] ds \Bigg)^{2}, \qquad (23) \\
\hat m^{3}_{\{t,s,x\}} = \Lambda_s & + \theta^1_{t,\mathrm{HDD}} \left\{ \int_t^s I(u \le \xi)\, \sigma_u e_1^\top \exp\{A(s-t)\} e_p\,du + x \right\} \\
& + \theta^2_{t,\mathrm{HDD}} \left\{ \int_t^s I(u > \xi)\, \sigma_u e_1^\top \exp\{A(s-t)\} e_p\,du + x \right\},
\end{aligned}$$

and $\upsilon^2_{t,s}$, $\psi(x)$ and $x$ as defined in Equation (20). In the next step, we optimize the value of $\xi$ such that $f_{\mathrm{CAT}}(\xi)$ or $f_{\mathrm{HDD}}(\xi)$ is minimized. This MPR specification will vary according to the unknown $\xi$. This would mean that the market makes a different risk adjustment for contracts traded close to or far from the measurement period.

5.4 General Form of the MPR per Trading Day

Generalizing the piecewise continuous function given in the previous subsection, the (inverse) problem of determining $\theta_t$ with $t \le \tau_1^i$


$$\arg\min_{a_k} \sum_{i=1}^{I} \left( F_{\mathrm{HDD}(t,\tau_1^i,\tau_2^i)} - \int_{\tau_1^i}^{\tau_2^i} \upsilon_{t,s}\, \psi\!\left[ \frac{c - \hat m^{4}_{\{t,s,e_1^\top \exp\{A(s-t)\}X_t\}}}{\upsilon_{t,s}} \right] ds \right)^{2}, \qquad (24)$$

with $\hat m^{4}_{\{t,s,x\}} = \Lambda_s + \int_t^s \sum_{k=1}^{K} a_k l_k(u_i)\, \hat\sigma_{u_i} e_1^\top \exp\{A(s-t)\} e_p\,du_i + x$, and $\upsilon^2_{t,s}$, $\psi(x)$ and $x$ as defined in Equation (20). $h_k(u_i)$ and $l_k(u_i)$ are vectors of known basis functions and may denote, for example, a B-spline basis. $\gamma_k$ and $a_k$ define the coefficients and $K$ is the number of knots. This means that the inferred MPR is the solution of an inverse problem with different degrees of smoothness, expressed through the penalty parameter of a smoothing spline. The degrees of smoothness allow for a term structure of risk. In other words, a time-dependent risk factor offers the possibility of different risk adjustments for different times of the year.

5.5 Bootstrapping the MPR

In this section we propose a bootstrapping technique to detect possible time-dependent MPR paths of temperature futures contracts. More importantly, since these futures contract types have different measurement periods $[\tau_1^i, \tau_2^i]$ with $\tau_1^i$


Then substitute $\hat\theta^1_{t,\mathrm{CAT}}$ in the period $[\tau_1^1, \tau_2^1]$ and $\hat\theta^2_{t,\mathrm{CAT}}$ in the period $[\tau_1^2, \tau_2^2]$ to estimate $\hat\theta^3_{t,\mathrm{CAT}}$:

$$\begin{aligned}
\hat\theta^3_{t,\mathrm{CAT}} = \arg\min_{\hat\theta^3_{t,\mathrm{CAT}}} \Bigg( & F_{\mathrm{CAT}(t,\tau_1^3,\tau_2^3)} - \int_{\tau_1^3}^{\tau_2^3} \hat\Lambda_u\,du - \hat a_{t,\tau_1^3,\tau_2^3} X_t - \int_t^{\tau_1^1} \hat\theta^1_{t,\mathrm{CAT}}\, \hat\sigma_u \hat a_{t,\tau_1^3,\tau_2^3} e_p\,du \\
& - \int_{\tau_1^2}^{\tau_2^2} \hat\theta^2_{t,\mathrm{CAT}}\, \hat\sigma_u \hat a_{t,\tau_1^3,\tau_2^3} e_p\,du - \int_{\tau_1^3}^{\tau_2^3} \theta^3_{t,\mathrm{CAT}}\, \hat\sigma_u e_1^\top A^{-1}\big[\exp\{A(\tau_2^3-u)\} - I_p\big] e_p\,du \Bigg)^{2}.
\end{aligned}$$

In a similar way, one obtains the estimates $\hat\theta^4_{t,\mathrm{CAT}}, \ldots, \hat\theta^I_{t,\mathrm{CAT}}$.

5.6 Smoothing the MPR over Time

Since smoothing individual estimates is different from estimating a deterministic function, we also check our results by fitting a parametric function to all available contract prices (calendar year estimation). After computing the MPR $\hat\theta_{t,\mathrm{CAT}}$, $\hat\theta_{t,\mathrm{HDD}}$ and $\hat\theta_{t,\mathrm{CDD}}$ for each of the previous specifications and for each of the $n$ trading days $t$ for the different contracts $i$, the MPR time series can be smoothed with the inverse problem points to find an MPR $\hat\theta_u$ for every calendar day $u$, and with that to price temperature derivatives for any date:

$$\arg\min_{f \in \mathcal{F}_j} \sum_{t=1}^{n} \big\{ \hat\theta_t - f(u_t) \big\}^2 = \arg\min_{\alpha_j} \sum_{t=1}^{n} \Bigg\{ \hat\theta_t - \sum_{j=1}^{J} \alpha_j \Psi_j(u_t) \Bigg\}^2, \qquad (27)$$

where $\Psi_j(u_t)$ is a vector of known basis functions, $\alpha_j$ defines the coefficients, $J$ is the number of knots, $u_t = t - \Delta + 1$ with increment $\Delta$, and $n$ is the number of days to be smoothed.
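The least-squares smoothing in Equation (27) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the truncated-power cubic basis, the single knot, the rescaling of calendar days to the unit interval and the synthetic MPR values are all hypothetical choices made here, and the helper names `cubic_basis`, `solve` and `smooth_mpr` are invented for the example.

```python
from typing import List, Sequence, Tuple


def cubic_basis(u: float, knots: Sequence[float]) -> List[float]:
    """Truncated-power cubic spline basis at u (an assumed choice of basis functions)."""
    return [1.0, u, u ** 2, u ** 3] + [max(u - k, 0.0) ** 3 for k in knots]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def smooth_mpr(u: Sequence[float], theta_hat: Sequence[float],
               knots: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (alpha, fitted) minimising sum_t {theta_hat_t - sum_j alpha_j basis_j(u_t)}^2."""
    design = [cubic_basis(x, knots) for x in u]
    p = len(design[0])
    # Normal equations: (D'D) alpha = D' theta_hat.
    gram = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * y for row, y in zip(design, theta_hat)) for i in range(p)]
    alpha = solve(gram, rhs)
    fitted = [sum(a * b for a, b in zip(alpha, row)) for row in design]
    return alpha, fitted


# Synthetic daily MPR estimates (hypothetical values, not taken from the paper).
n_days = 20
u_grid = [t / n_days for t in range(1, n_days + 1)]  # calendar days rescaled to (0, 1]
theta_daily = [0.05 + 0.10 * x - 0.20 * x ** 2 for x in u_grid]
alpha, theta_smooth = smooth_mpr(u_grid, theta_daily, knots=[0.5])
```

Because the synthetic series is itself a polynomial, the fitted values reproduce it up to rounding; with noisy daily estimates the same call returns the smoothed path. Rescaling the days keeps the normal equations reasonably well conditioned, which is why the sketch does not work with raw day counts.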
In our case, $u_t$ = 1 day and $\Psi_j(u_t)$ is estimated using cubic splines.

Alternatively, one can first do the smoothing with basis functions of all available futures contracts:

$$\arg\min_{\beta_j} \sum_{t=1}^{n} \sum_{i=1}^{I} \Bigg\{ F_{(t,\tau_1^i,\tau_2^i)} - \sum_{j=1}^{J} \beta_j \Psi_j(u_t) \Bigg\}^2, \qquad (28)$$

and then estimate the time series of $\hat\theta^s_t$ with the obtained smoothed futures prices $F^s_{(t,\tau_1^1,\tau_2^I)}$. For example, for a constant MPR for all CAT futures contract types traded over all $t$s with $t \le \tau_1^i$


$$\begin{aligned}
\hat\theta^{s}_{t,\mathrm{CAT}} = \arg\min_{\theta^{s}_{t,\mathrm{CAT}}} \Bigg( & F^{s}_{\mathrm{CAT}(t,\tau_1^1,\tau_2^I)} - \int_{\tau_1^1}^{\tau_2^I} \hat\Lambda_u\,du - \hat a_{t,\tau_1^1,\tau_2^I} X_t \\
& - \theta^{s}_{t,\mathrm{CAT}} \Bigg\{ \int_t^{\tau_1^1} \hat\sigma_u \hat a_{t,\tau_1^1,\tau_2^I} e_p\,du + \int_{\tau_1^1}^{\tau_2^I} \hat\sigma_u e_1^\top A^{-1}\big[\exp\{A(\tau_2^I-u)\} - I_p\big] e_p\,du \Bigg\} \Bigg)^{2}. \qquad (29)
\end{aligned}$$

5.7 Statistical and Economical Insights of the Implied MPR

In this section, using the previous specifications, we infer the MPR (the change of drift) implied by CME (CAT/CDD/HDD/C24AT) futures contracts traded for different cities. Note that one might also infer the MPR from options data and compare the findings with prices in the futures market.

Table 5 presents the descriptive statistics of different MPR specifications for Berlin-CAT, Essen-CAT and Tokyo-C24AT daily futures contracts with $t \le \tau_1^i$


Table 5. Statistics of MPR specifications for Berlin-CAT, Essen-CAT and Tokyo-C24AT.

| Type | Horizon | No. of contracts | Statistic | Constant | 1 piecewise | 2 piecewise | Bootstrap | Spline |
|---|---|---|---|---|---|---|---|---|
| Berlin-CAT | | | WS (Prob) | 0.93 (0.66) | 0.01 (0.08) | 0.00 (0.04) | 0.93 (0.04) | 0.91 (0.66) |
| | 30 days (i = 1) | 487 | Min (Max) | −0.28 (0.12) | −0.65 (0.11) | −1.00 (0.20) | −0.28 (0.12) | 0.02 (0.21) |
| | | | Med (SD) | 0.05 (0.07) | −0.10 (0.12) | −0.17 (0.31) | 0.05 (0.07) | 0.14 (0.06) |
| | 60 days (i = 2) | 874 | Min (Max) | −0.54 (0.84) | −4.95 (8.39) | −10.71 (10.25) | −0.54 (0.84) | 0.00 (0.21) |
| | | | Med (SD) | 0.03 (0.13) | −0.09 (1.06) | −0.17 (1.44) | 0.03 (0.13) | 0.05 (0.05) |
| | 90 days (i = 3) | 858 | Min (Max) | −0.54 (0.82) | −4.95 (8.39) | −10.71 (10.25) | −0.54 (0.82) | 0.00 (0.21) |
| | | | Med (SD) | 0.02 (0.09) | −0.10 (1.07) | −0.17 (1.46) | 0.02 (0.09) | 0.11 (0.07) |
| | 120 days (i = 4) | 815 | Min (Max) | −0.53 (0.84) | −4.95 (4.56) | −7.71 (6.88) | −0.53 (0.84) | 0.00 (0.21) |
| | | | Med (SD) | 0.03 (0.14) | −0.10 (0.94) | −0.19 (1.38) | 0.03 (0.14) | 0.11 (0.09) |
| | 150 days (i = 5) | 752 | Min (Max) | −0.54 (0.84) | −4.53 (4.56) | −8.24 (6.88) | −0.54 (0.84) | −0.00 (0.20) |
| | | | Med (SD) | 0.13 (0.09) | −0.10 (0.95) | −0.20 (1.47) | 0.13 (0.09) | 0.01 (0.07) |
| | 180 days (i = 6) | 711 | Min (Max) | −0.54 (0.82) | −4.53 (4.56) | −8.24 (6.88) | −0.54 (0.82) | −0.03 (0.12) |
| | | | Med (SD) | 0.02 (0.12) | −0.10 (0.95) | −0.14 (1.48) | 0.02 (0.12) | 0.00 (0.03) |
| Essen-CAT | | | WS (Prob) | 0.02 (0.11) | 0.20 (0.34) | 0.01 (0.09) | 0.02 (0.11) | 1.63 (0.79) |
| | 30 days (i = 1) | 384 | Min (Max) | −0.98 (0.52) | −31.05 (5.73) | −1.83 (1.66) | −0.98 (0.52) | −0.00 (0.00) |
| | | | Med (SD) | 0.01 (0.12) | −0.39 (1.51) | −0.43 (9.62) | 0.01 (0.12) | 0.00 (0.00) |
| | 60 days (i = 2) | 796 | Min (Max) | −1.35 (0.62) | −31.05 (5.73) | −1.83 (1.66) | −1.35 (0.62) | 0.00 (0.00) |
| | | | Med (SD) | 0.02 (0.14) | −0.40 (1.56) | −0.46 (9.99) | 0.02 (0.14) | 0.00 (0.00) |
| | 90 days (i = 3) | 738 | Min (Max) | −1.56 (0.59) | −6.68 (5.14) | −5.40 (1.71) | −1.56 (0.59) | 0.00 (0.00) |
| | | | Med (SD) | −0.02 (0.19) | −0.40 (0.88) | −0.40 (0.85) | −0.02 (0.19) | 0.00 (0.00) |
| | 120 days (i = 4) | 551 | Min (Max) | −0.29 (0.51) | −4.61 (1.44) | −6.60 (1.43) | −0.29 (0.51) | 0.00 (0.00) |
| | | | Med (SD) | 0.03 (0.05) | −0.37 (0.51) | −0.41 (0.85) | 0.03 (0.05) | 0.00 (0.00) |
| | 150 days (i = 5) | 468 | Min (Max) | −0.44 (0.13) | −4.61 (1.44) | −6.60 (1.43) | −0.44 (0.13) | 0.00 (0.00) |
| | | | Med (SD) | 0.00 (0.07) | −0.35 (0.49) | −0.30 (0.81) | 0.00 (0.07) | 0.00 (0.00) |
| | 180 days (i = 6) | 405 | Min (Max) | −0.10 (0.57) | −4.61 (1.44) | −4.25 (0.52) | −0.10 (0.57) | 0.00 (0.00) |
| | | | Med (SD) | −0.02 (0.06) | −0.12 (0.45) | −0.21 (0.52) | −0.02 (0.06) | 0.00 (0.00) |
| Tokyo-C24AT | | | WS (Prob) | 0.76 (0.61) | 0.02 (0.13) | 0.01 (0.11) | 0.76 (0.61) | 4.34 (0.96) |
| | 30 days (i = 2) | 419 | Min (Max) | −7.55 (0.17) | −69.74 (52.17) | −69.74 (52.17) | −7.55 (0.17) | −0.23 (−0.18) |
| | | | Med (SD) | −3.87 (2.37) | −0.33 (19.68) | −0.48 (20.46) | −3.87 (2.37) | −0.20 (0.01) |
| | 60 days (i = 3) | 416 | Min (Max) | −7.56 (0.14) | −69.74 (52.17) | −69.74 (52.17) | −7.56 (0.14) | −0.18 (−0.13) |
| | | | Med (SD) | −3.49 (2.47) | −0.23 (21.34) | −0.41 (21.63) | −3.49 (2.47) | −0.15 (0.01) |
| | 90 days (i = 4) | 393 | Min (Max) | −7.55 (1.02) | −69.74 (26.82) | −69.74 (38.53) | −7.55 (1.02) | −0.13 (−0.09) |
| | | | Med (SD) | −2.96 (2.65) | 0.04 (20.84) | −0.33 (20.22) | −2.96 (2.65) | −0.11 (0.01) |
| | 120 days (i = 5) | 350 | Min (Max) | −7.55 (1.02) | −69.74 (26.82) | −69.74 (48.32) | −7.55 (1.02) | −0.10 (−0.06) |
| | | | Med (SD) | −2.08 (2.74) | 1.26 (19.54) | −0.11 (19.69) | −2.08 (2.74) | −0.08 (0.01) |
| | 150 days (i = 6) | 305 | Min (Max) | −7.55 (1.02) | −51.18 (26.82) | −51.18 (48.32) | −7.55 (1.02) | −0.06 (−0.04) |
| | | | Med (SD) | −2.08 (2.71) | 1.26 (16.03) | 7.17 (17.02) | −2.08 (2.71) | −0.05 (0.00) |
| | 180 days (i = 7) | 243 | Min (Max) | −7.39 (1.26) | −51.18 (19.10) | −51.18 (48.32) | −7.39 (1.26) | −0.05 (−0.04) |
| | | | Med (SD) | −2.08 (2.74) | 3.66 (15.50) | 7.63 (17.69) | −2.08 (2.74) | −0.04 (0.00) |
| | 210 days (i = 8) | 184 | Min (Max) | −7.39 (1.26) | −24.88 (26.16) | −54.14 (39.65) | −7.39 (1.26) | −0.07 (−0.05) |
| | | | Med (SD) | −3.00 (2.86) | 13.69 (10.05) | 10.61 (14.63) | −3.00 (2.86) | −0.06 (0.00) |
| | 240 days (i = 9) | 167 | Min (Max) | −7.39 (1.26) | −24.88 (21.23) | −82.62 (42.14) | −7.39 (1.26) | −0.07 (−0.07) |
| | | | Med (SD) | −3.00 (2.74) | 3.46 (11.73) | −4.24 (37.25) | −3.00 (2.74) | −0.07 (0.00) |
| | 270 days (i = 10) | 134 | Min (Max) | −7.39 (0.44) | −24.88 (26.16) | −82.62 (42.14) | −7.39 (0.44) | −0.07 (−0.03) |
| | | | Med (SD) | −4.24 (2.39) | 8.48 (13.88) | −7.39 (40.76) | −4.24 (2.39) | −0.05 (0.00) |

Notes: WS, Wald statistics; SD, standard deviation. Futures contracts traded during (20031006–20080527), (20050617–20090731), and (20040723–20090630) respectively, with trading date before measurement period $t \le \tau_1^i$


(in this case ξ = 150 days). The spline MPR smooths over time and, for days without trading (see the case of Berlin-CAT or Essen-CAT futures), it displays a maximum, for example, in winter. A penalizing term in Equation (24) might correct for this.

In all the specifications, we verified that the MPR is different from 0 (as do Cao and Wei (2004), Huang-Hsi et al. (2008), Richards et al. (2004) and Alaton et al. (2002)), varies in time and moves from a negative to a positive domain according to the changes in the seasonal variation. The MPR specifications change sign when a contract expires and rolls over to another contract (e.g. from 210 to 180, 150, 120, 90, 60, 30 days before the measurement period); they react negatively to the fast changes in seasonal variance $\sigma_t$ within the measurement period (Figure 3) and to the changes in CAT futures volatility $\sigma_t a_{t,\tau_1,\tau_2} e_p$. Figure 8 shows the Berlin-CAT volatility paths for contracts issued before and within the measurement periods 2004–2008. We observed the Samuelson effect for mean-reverting futures: for contracts traded within the measurement period, CAT volatility is close to 0 when the time to measurement is large and it decreases up to the end of the measurement period. For contracts traded before the measurement period, CAT volatility is also close to 0 when the time to measurement is large, but increases up to the start of the measurement period. In Figure 9, two Berlin-CAT contracts issued on 20060517 but with different measurement periods are plotted: the longer the measurement period, the larger the volatility. Besides this, one observes the effect of the CAR(3) in both contracts when the volatility decays just before maturity of the contracts. These two effects are comparable with the study for Stockholm CAT futures in Benth et al. (2007); however, the deviations are less smoothed for Berlin.

[Figure 6: panels (a)–(d), MPR of CAT prices against no. of days before measurement period, day 20060530.]
Figure 6. Two piecewise constant MPR with jumps ξ = (a) 62, (b) 93, (c) 123 and (d) 154 days for Berlin-CAT contracts traded on 20060530. The corresponding sums of squared errors are 2759, 14794, 15191 and 15526. When the jump ξ moves away from the measurement period, the value of the MPR $\hat\theta^1_t$ decreases and $\hat\theta^2_t$ increases, yielding a $\hat\theta_t$ around 0.
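The search over the jump ξ in the two piecewise constant specification (Equation (23), Figure 6) can be sketched as a grid search with an inner least-squares fit. For the CAT case the objective is linear in the two MPR levels, so each inner fit has a closed form. The sketch below is a hedged illustration only: `y` stands for the de-seasonalised price residual (futures price minus the seasonal and autoregressive terms), `loadings` for a daily discretisation of the risk integrands, and every number is synthetic; none of these names or values come from the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Fit = Tuple[float, float, float]  # (theta1, theta2, sum of squared errors)


def fit_two_piece(y: Sequence[float], loadings: Sequence[Sequence[float]],
                  xi: int) -> Optional[Fit]:
    """OLS fit of (theta1, theta2) for a given jump xi (day index u starts at 1)."""
    # Risk loadings accumulated up to the jump and after the jump.
    b1 = [sum(g for u, g in enumerate(row, start=1) if u <= xi) for row in loadings]
    b2 = [sum(g for u, g in enumerate(row, start=1) if u > xi) for row in loadings]
    s11 = sum(a * a for a in b1)
    s22 = sum(b * b for b in b2)
    s12 = sum(a * b for a, b in zip(b1, b2))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        return None  # the two regimes are not separately identified for this xi
    r1 = sum(a * v for a, v in zip(b1, y))
    r2 = sum(b * v for b, v in zip(b2, y))
    theta1 = (r1 * s22 - r2 * s12) / det
    theta2 = (r2 * s11 - r1 * s12) / det
    sse = sum((v - theta1 * a - theta2 * b) ** 2 for v, a, b in zip(y, b1, b2))
    return theta1, theta2, sse


def best_jump(y: Sequence[float], loadings: Sequence[Sequence[float]],
              grid: Sequence[int]) -> Tuple[int, Fit]:
    """Grid search over xi, keeping the jump with the smallest sum of squared errors."""
    fits: Dict[int, Fit] = {}
    for xi in grid:
        fit = fit_two_piece(y, loadings, xi)
        if fit is not None:
            fits[xi] = fit
    xi_star = min(fits, key=lambda xi: fits[xi][2])
    return xi_star, fits[xi_star]


# Synthetic example (all numbers hypothetical): six contracts whose measurement
# periods end 30, 60, ..., 180 days after the trading date.
horizons = [30, 60, 90, 120, 150, 180]
loadings = [[1.0 + 0.01 * u for u in range(1, h + 1)] for h in horizons]
true_theta1, true_theta2, true_xi = -0.2, 0.3, 60
y = [sum((true_theta1 if u <= true_xi else true_theta2) * g
         for u, g in enumerate(row, start=1)) for row in loadings]
xi_star, (theta1, theta2, sse) = best_jump(y, loadings, grid=[30, 60, 90, 120, 150])
```

The HDD and CDD objectives are nonlinear in the MPR, so the closed-form inner step would have to be replaced by a numerical optimiser there; the outer grid search over ξ stays the same.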


[Figure 7: panels of futures prices, constant MPR, 2OLS-MPR and spline-MPR against no. of days before measurement period, for CAT-Berlin, CAT-Essen and AAT-Tokyo.]
Figure 7. Futures CAT prices (first row) and MPR specifications: constant MPR for different contracts per trading day, two piecewise constant and spline (second, third and fourth rows) for Berlin-CAT (left), Essen-CAT (middle), Tokyo-AAT (right) of futures traded on 20060530, 20060530 and 20060531, respectively.

We investigate the proposition that the MPR derived from CAT/HDD/CDD futures is different from 0. 
We conduct the Wald statistical test to check whether this effect exists by testing the true value of the parameter based on the sample estimate. In the multivariate case, the Wald statistic for $\{\theta_t \in \mathbb{R}^i\}_{t=1}^{n}$ is

$$(\hat\theta_t - \theta_0)^\top \Sigma^{-1} (\hat\theta_t - \theta_0) \sim \chi^2_p, \qquad \Sigma^{-1/2} (\hat\theta_t - \theta_0) \sim N(0, I_i),$$

where $\Sigma$ is the variance matrix and the estimate $\hat\theta_t$ is compared with the proposed value $\theta_0 = 0$. Using a sample size of $n$ trading dates of contracts with $t \le \tau_1^i$


[Figure 8: panels (a)–(d), CAT volatility before and within the measurement period, 2004–2008 and by month for 2006.]
Figure 8. The Samuelson effect for Berlin-CAT futures explained by the CAT volatility $\sigma_t a_{t,\tau_1,\tau_2} e_p$ (black line) and the volatility $\sigma_t$ of Berlin-CAT futures (dashed line) from 2004 to 2008 and 2006 for contracts traded before (a) and (b) and within (c) and (d) the measurement period.

when MPR estimates are obtained from smoothed prices using the calendar year estimation (Equation (29)). Both smoothing procedures lead to similar outcomes: notable changes in sign, MPR deviations are smoothed over time and the higher the number of calendar days, the closer the fit of Equations (27) and (29). 
This indicates that sample size does not influence the stochastic behaviour of the MPR.

[Figure 9: panels (a) CAT volatility and (b) AR(3) effect against trading days prior to measurement period.]
Figure 9. (a) The CAT term structure of volatility and (b) the autoregressive effect of two contracts issued on 20060517: one with the whole of June as measurement period (straight line) and the other one with only the first week of June (dotted line).

To interpret the economic meaning of the previous MPR results, recall, for example, the relationship between the RP (the market price minus the implied futures price with MPR equal to 0) and the MPR for CAT temperature futures:

$$\mathrm{RP}_{\mathrm{CAT}} = \int_t^{\tau_1^i} \theta_u \sigma_u a_{t,\tau_1^i,\tau_2^i} e_p\,du + \int_{\tau_1^i}^{\tau_2^i} \theta_u \sigma_u e_1^\top A^{-1}\big[\exp\{A(\tau_2^i-u)\} - I_p\big] e_p\,du, \qquad (30)$$

which can be interpreted as the aggregated MPR times the amount of temperature risk $\sigma_t$ over $[t, \tau_1^i]$ (first integral) and $[\tau_1^i, \tau_2^i]$ (second integral). By adjusting the MPR value, these two terms contribute to the CAT futures price. For temperature futures with values that are positively related to weather changes in the short term, this implies a negative RP, meaning that buyers of temperature derivatives expect to pay lower prices to hedge weather risk (insurance RP). In this case, $\theta_t$ must be negative for CAT futures, since $\sigma_t$ and $X_t$ are both positive. Negative MPRs translate into premiums for bearing risk, implying that investors will accept a reduction in the return of the derivative equal to the right-hand side of Equation (30) in exchange for eliminating the effects of the seasonal variance on pay-offs. On the other side, a positive RP indicates the existence of consumers who consider temperature derivatives for speculation purposes. In this case, $\theta_t$ must be positive and implies discounts for taking additional (weather) risk. This rules out the 'burn-in' analysis of Brix et al. (2005), which seems to be popular among practitioners since it uses the historical average index value as the price for the futures.
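As a small worked illustration of the RP definition used above (market price minus the futures price implied by MPR = 0), the following sketch takes the Berlin-CAT rows of Table 4 for the trading date 20070316 and reports the sign of each RP. The helper name `risk_premium` and the sign labels are choices made for this example only; they paraphrase the interpretation in the text and are not part of the authors' procedure.

```python
from typing import Dict, List, Tuple

# Rows from Table 4 (Berlin-CAT, trading date 20070316):
# (start of measurement period, CME price, price implied by MPR = 0)
rows: List[Tuple[str, float, float]] = [
    ("20070401", 288.00, 363.00),
    ("20070501", 457.00, 502.11),
    ("20070601", 529.00, 571.78),
    ("20070701", 616.00, 591.56),
    ("20070801", 610.00, 566.14),
    ("20070901", 472.00, 414.33),
]


def risk_premium(market_price: float, price_mpr_zero: float) -> float:
    """RP as defined in the text: market price minus the MPR = 0 futures price."""
    return market_price - price_mpr_zero


rps: Dict[str, float] = {
    start: round(risk_premium(cme, p0), 2) for start, cme, p0 in rows
}

for start, rp in rps.items():
    # Sign labels follow the reading in the text: negative = insurance RP.
    label = "negative (insurance RP)" if rp < 0 else "positive (speculative demand)"
    print(f"measurement period from {start}: RP = {rp:7.2f}  {label}")
```

On these six contracts the RP is negative for the April to June measurement periods and positive for July to September, which is one concrete instance of the sign changes discussed in the text.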
The sign of MPR–RP reflects the risk attitude and time horizon perspectives of market participants in the diversification process to hedge weather risk in peak seasons. By understanding the MPR, market participants might earn money (by shorting or longing, according to the sign). Investors impute value to the weather products, although they are non-marketable. This might suggest some possible relationships between risk aversion and the MPR.

The non-stationary behaviour of the MPR (sign changes) is also possible because it captures all the non-fundamental information affecting the futures pricing: investors' preferences, transaction costs, market illiquidity or other frictions such as effects on the demand function. When trading is illiquid, the observed prices may contain some liquidity premium, which can contaminate the estimation of the MPR.

Figure 11 illustrates the RP of Berlin-CAT futures for monthly contracts traded on 20031006–20080527. We observe RPs that are different from 0 and time dependent, where a positive (negative) MPR contributes positively (negatively) to futures prices. The mean of the constant MPR for the i = 1, ..., 7th Berlin-CAT futures contracts per trading date is of size 0.02, 0.05, 0.02, 0.01, 0.10, 0.02 and 0.04, thus the terms in Equation (30)


contribute little to the prices compared to the seasonal mean $\Lambda_t$. The RPs are very small for all contract types, and they are constant within the measurement month but fluctuate with $\sigma_t$ and $\theta_t$, leading to higher RPs during volatile months (winters or early summers). This suggests that the temperature market makes the risk adjustment according to the seasonal effect, where low levels of mean reversion mean that volatility plays a greater role in determining the prices.

[Figure 10: panels of CAT MPR against no. of days before measurement period; rows: smoothing constant-MPR, OLS-MPR, OLS2-MPR, bootstrap-MPR, spline-MPR, and MPR of smoothed prices.]
Figure 10. Smoothing the MPR parameterization for Berlin-CAT futures traded on 20060530: the calendar year smoothing (black line) for 1 day (left), 5 days (middle) and 30 days (right). The last row gives MPR estimates obtained from smoothed prices.

Our data-extracted MPR results are comparable with Cao and Wei (2004), Richards et al. (2004) and Huang-Hsi et al. (2008), who showed that the MPR is not only different from 0 for temperature derivatives, but also significant and economically large. However, the results in Cao and Wei (2004) and Richards et al. (2004) rely on the specification of the dividend process and the risk aversion level, while the approach of Huang-Hsi et al. (2008) depends on the studied stock index to compute the proxy estimate of the MPR. Alaton et al. (2002) concluded that the MPR impact is likely to be small. Our findings can also be compared with the MPR of other non-tradable assets, for example, in commodities markets; the MPR may be either positive or negative depending on the time horizon considered. In Schwartz (1997), the calibration of futures prices of oil and copper delivered a negative MPR in both cases. For electricity, Cartea and Figueroa (2005) estimated a negative MPR. Cartea and Williams (2008) found a positive MPR for gas long-term contracts and for short-term


[Figure 11: panels (a)–(g), CAT-RP against time, 20040101–20080101.]
Figure 11. Risk premiums (RPs) of Berlin-CAT monthly futures prices traded during (20031006–20080527) with $t \le \tau_1^i$


the inferred prices with constant MPR replicate market prices, the estimated prices with P = Q are close to the realized temperatures, meaning that the history is likely a good prediction of the future. Table 6 describes the root mean squared errors (RMSEs) of the differences between the market prices and the estimated futures prices, with MPR values implied directly from specific futures contract types and with MPR values extracted from the HDD/CDD/CAT parity method, over different periods and cities. The RMSE is defined as

$$\mathrm{RMSE} = \sqrt{ n^{-1} \sum_{t=1}^{n} \big( F_{t,\tau_1^i,\tau_2^i} - \hat F_{t,\tau_1^i,\tau_2^i} \big)^2 },$$

where $\hat F_{t,\tau_1^i,\tau_2^i}$ are the estimated futures prices and small RMSE values indicate a precise fit. The RMSE estimates in the case of the constant MPR for different CAT futures contracts are statistically significant enough to recover CAT futures prices, but the specification fails for HDD futures. 
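The RMSE defined above can be computed in a few lines. The sketch below is only an illustration of the formula: it uses the two Table 4 observations of the Berlin-CAT May 2007 contract (trading dates 20070316 and 20070427), so $n = 2$ here, whereas the entries of Table 6 are based on many more trading dates and will not be reproduced by this toy input.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], estimated: Sequence[float]) -> float:
    """Root mean squared error between observed and estimated futures prices."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("need two non-empty series of equal length")
    n = len(observed)
    return math.sqrt(sum((f - f_hat) ** 2 for f, f_hat in zip(observed, estimated)) / n)


# Berlin-CAT contract with measurement period 20070501-20070531 on the two
# trading dates listed in Table 4: CME price versus the constant-MPR estimate.
cme_prices = [457.00, 457.00]
constant_mpr_prices = [454.91, 457.52]
error = rmse(cme_prices, constant_mpr_prices)
print(f"RMSE over n = {len(cme_prices)} trading dates: {error:.2f}")
```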
Since temperature futures are written on different indices, the implied MPR is contract-specific and hence requires a separate estimation procedure. We argue that this inequality in prices results from additional premiums that the market incorporates into the HDD estimation, due to possible temperature market probability predictions operating under a more general equilibrium rather than non-arbitrage conditions (Horst and Mueller, 2007) or due to the incorporation of weather forecast models in the pricing model that influence the risk attitude of market participants in the diversification process of hedging weather risk (Benth and Meyer-Brandis, 2009; Dorfleitner and Wimmer, 2010; Papazian and Skiadopoulos, 2010).

We investigate the pricing algorithm for cities without a formal WD market. In this context, the stylized facts of temperature data ($\Lambda_t$, $\sigma_t$) are the only risk factors. Hence, a natural way to infer the MPR for emerging regions is to learn how the MPR depends on seasonal variation at the closest geographical location with a formal WD market. 
For example, for pricing Taipei weather futures derivatives, one could take the WD market in Tokyo and learn the dependence structure by simply regressing the average MPR of Tokyo-C24AT futures contracts $i$ over the trading period against the seasonal variation in period $[\tau_1, \tau_2]$:

$$\hat\theta^{i}_{\tau_1,\tau_2} = \frac{1}{\tau_1 - t} \sum_{t}^{\tau_1} \hat\theta^{i}_{t}, \qquad \hat\sigma^{2}_{\tau_1,\tau_2} = \frac{1}{\tau_2 - \tau_1} \sum_{t=\tau_1}^{\tau_2} \hat\sigma^{2}_{t}.$$

In this case, the quadratic function that parameterizes the dependence is $\theta_t = 4.08 - 2.19\,\hat\sigma^{2}_{\tau_1,\tau_2} + 0.28\,\hat\sigma^{4}_{\tau_1,\tau_2}$, with $R^2_{\mathrm{adj}} = 0.71$, and the MPR increases by increasing the drift and volatility values (Figure 12). The dependencies of the MPR on time and temperature seasonal variation indicate that for regions with homogeneous weather risk there is some common market price of weather risk (as we expect in equilibrium).
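The fitted quadratic can be evaluated directly once an average seasonal variance is available for the city of interest. The snippet below is a minimal sketch: the coefficients are those reported in the text, while the function name `mpr_from_seasonal_variance` and the input values are hypothetical and serve only to show the mechanics.

```python
def mpr_from_seasonal_variance(avg_variance: float) -> float:
    """Evaluate theta = 4.08 - 2.19 * s + 0.28 * s^2, where s is the average
    seasonal variance over the measurement period (so s^2 is the fourth power
    of the seasonal volatility)."""
    return 4.08 - 2.19 * avg_variance + 0.28 * avg_variance ** 2


# Hypothetical average seasonal variances for a city without a formal WD market.
for s in (1.0, 4.0, 8.0):
    print(f"average variance {s:4.1f} -> implied MPR {mpr_from_seasonal_variance(s):6.2f}")
```

Since the fit comes from Tokyo-C24AT contracts, applying it to another city rests on the assumption stated in the text that the two regions share homogeneous weather risk.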


Table 6. RMSE of the differences between observed CAT/HDD/CDD.

Columns τ1 and τ2 give the measurement period; the last six columns give the RMSE between prices estimated with MPR (θt) and CME prices.

| Contract type | τ1 | τ2 | No. of contracts | MPR = 0 | Constant | 1 piecewise | 2 piecewise | Bootstrap | Spline |
|---|---|---|---|---|---|---|---|---|---|
| Atlanta-CDD + | 20070401 | 20070430 | 230 | 15.12 | 20.12 | 150.54 | 150.54 | 20.15 | 27.34 |
| Atlanta-CDD + | 20070501 | 20070531 | 228 | 20.56 | 53.51 | 107.86 | 107.86 | 53.52 | 28.56 |
| Atlanta-CDD + | 20070601 | 20070630 | 230 | 18.52 | 43.58 | 97.86 | 97.86 | 44.54 | 35.56 |
| Atlanta-CDD + | 20070701 | 20070731 | 229 | 11.56 | 39.58 | 77.78 | 77.78 | 39.59 | 38.56 |
| Atlanta-CDD + | 20070801 | 20070831 | 229 | 21.56 | 33.58 | 47.86 | 47.86 | 33.59 | 38.56 |
| Atlanta-CDD + | 20070901 | 20070930 | 230 | 17.56 | 53.58 | 77.86 | 77.86 | 53.54 | 18.56 |
| Berlin-HDD ∗ | 20061101 | 20061130 | 22 | 129.94 | 164.52 | 199.59 | 199.59 | 180.00 | 169.76 |
| Berlin-HDD ∗ | 20061201 | 20061231 | 43 | 147.89 | 138.45 | 169.11 | 169.11 | 140.00 | 167.49 |
| Berlin-HDD + | 20061101 | 20061130 | 22 | 39.98 | 74.73 | 89.59 | 89.59 | 74.74 | 79.86 |
| Berlin-HDD + | 20061201 | 20061231 | 43 | 57.89 | 58.45 | 99.11 | 99.11 | 58.45 | 88.49 |
| Berlin-CAT + | 20070401 | 20070430 | 230 | 18.47 | 40.26 | 134.83 | 134.83 | 40.26 | 18.44 |
| Berlin-CAT + | 20070501 | 20070531 | 38 | 40.38 | 47.03 | 107.342 | 107.34 | 47.03 | 40.38 |
| Berlin-CAT + | 20070601 | 20070630 | 58 | 10.02 | 26.19 | 78.18 | 78.18 | 26.20 | 10.02 |
| Berlin-CAT + | 20070701 | 20070731 | 79 | 26.55 | 16.41 | 100.22 | 100.22 | 16.41 | 26.55 |
| Berlin-CAT + | 20070801 | 20070831 | 101 | 34.31 | 12.22 | 99.59 | 99.59 | 12.22 | 34.31 |
| Berlin-CAT + | 20070901 | 20070930 | 122 | 32.48 | 17.96 | 70.45 | 70.45 | 17.96 | 32.48 |
| Essen-CAT + | 20070401 | 20070430 | 230 | 13.88 | 33.94 | 195.98 | 195.98 | 33.94 | 13.87 |
| Essen-CAT + | 20070501 | 20070531 | 39 | 52.66 | 52.95 | 198.18 | 198.188 | 52.95 | 52.66 |
| Essen-CAT + | 20070601 | 20070630 | 59 | 15.86 | 21.35 | 189.45 | 189.45 | 21.38 | 15.86 |
| Essen-CAT + | 20070701 | 20070731 | 80 | 16.71 | 44.14 | 155.82 | 155.82 | 44.14 | 16.71 |
| Essen-CAT + | 20070801 | 20070831 | 102 | 31.84 | 22.66 | 56.93 | 56.92 | 22.66 | 31.84 |
| Essen-CAT + | 20070901 | 20070930 | 123 | 36.93 | 14.28 | 111.58 | 111.58 | 14.28 | 33.93 |
| Tokyo-C24AT + | 20090301 | 20090331 | 57 | 161.81 | 148.21 | 218.99 | 218.99 | 148.21 | 158.16 |
| Tokyo-C24AT + | 20090401 | 20090430 | 116 | 112.65 | 99.55 | 156.15 | 156.15 | 99.55 | 109.78 |
| Tokyo-C24AT + | 20090501 | 20090531 | 141 | 81.64 | 70.81 | 111.21 | 111.21 | 70.81 | 79.68 |
| Tokyo-C24AT + | 20090601 | 20090630 | 141 | 113.12 | 92.66 | 104.75 | 110.68 | 92.66 | 111.20 |
| Tokyo-C24AT + | 20090701 | 20090731 | 141 | 78.65 | 74.95 | 116.34 | 3658.39 | 74.95 | 77.07 |

Notes: RMSE, root mean squared error; MPR, market price of risk; CME, Chicago Mercantile Exchange. Futures prices with t ≤ τ 1 i


[Figure 12: MPR (vertical axis) against average temperature variation in measurement month (horizontal axis); points labelled by month, Apr to Nov.]

Figure 12. The calibrated MPR as a deterministic function of the monthly temperature variation of Tokyo-C24AT futures from November 2008 to November 2009 (prices for 8 contracts were available).

6. Conclusions and Further Research

This article deals with the differences between ‘historical’ and ‘risk-neutral’ behaviours of temperature and gives insights into the MPR, a drift adjustment in the dynamics of the temperature process that reflects how investors are compensated for bearing risk when holding the derivative. Our empirical work shows that, independently of the chosen location, the temperature-driving stochastics are close to Gaussian risk factors, which allows us to work within the financial mathematics framework.

Using statistical modelling, we imply the MPR from daily temperature futures-type contracts (CAT, CDD, HDD, C24AT) traded at the CME under the EMM framework. Different specifications of the MPR are investigated. It can be parameterized, given its dependencies on time and seasonal variation. We also establish connections between the RP and the MPR. The results show that the MPRs–RPs are significantly different from 0 and change over time. This contradicts the assumption made earlier in the literature that the MPR is 0 or constant and rules out the ‘burn-in’ analysis, which is popular among practitioners. This brings significant challenges to the statistical branch of the pricing literature, suggesting that for regions with homogeneous weather risk there is a common market price of weather risk. In particular, using a relationship of the MPR with a utility function, one may link the sign changes of the MPR with the risk attitude and time horizon perspectives of market participants in the diversification process to hedge weather risk.

Further research on the explicit relationship between the RP and the MPR should be carried out to explain possible connections between modelled futures prices and their deviations from the futures market. An important issue for our results is that the econometric part in Section 2 is carried out with estimates rather than true values. One thus deals with noisy observations, which are likely to alter the subsequent estimations and test procedures. An alternative is to use an adaptive local parametric estimation procedure, for example, as in Mercurio and Spokoiny (2004) or Härdle et al. (2011).


Finally, a different methodology, but related to this article, would be to imply the pricing kernel of option prices.

Acknowledgement

We thank Fred Espen Benth and two anonymous referees for several constructive and insightful suggestions on how to improve the article.

References

Alaton, P., Djehiche, B. and Stillberger, D. (2002) On modelling and pricing weather derivatives, Applied Mathematical Finance, 9(1), pp. 1–20.
Barrieu, P. and El Karoui, N. (2002) Optimal design of weather derivatives, ALGO Research, 5(1), pp. 79–92.
Benth, F. (2003) On arbitrage-free pricing of weather derivatives based on fractional Brownian motion, Applied Mathematical Finance, 10(4), pp. 303–324.
Benth, F., Cartea, A. and Kiesel, R. (2008) Pricing forward contracts in power markets by the certainty equivalence principle: explaining the sign of the market risk premium, Journal of Finance and Banking, 32(10), pp. 2006–2021.
Benth, F., Härdle, W. K. and López Cabrera, B. (2011) Pricing Asian temperature risk. In: P. Cizek, W. Härdle and R. Weron (Eds.), Statistical Tools for Finance and Insurance, pp. 163–199 (Heidelberg: Springer-Verlag).
Benth, F. and Meyer-Brandis, T. (2009) The information premium for non-storable commodities, Journal of Energy Markets, 2(3), pp. 111–140.
Benth, F. and Saltyte-Benth, J. (2005) Stochastic modelling of temperature variations with a view towards weather derivatives, Applied Mathematical Finance, 12(1), pp. 53–85.
Benth, F. and Saltyte-Benth, J. (2007) The volatility of temperature and pricing of weather derivatives, Quantitative Finance, 7(5), pp. 553–561.
Benth, F., Saltyte-Benth, J. and Jalinska, P. (2007) A spatial-temporal model for temperature with seasonal variance, Applied Statistics, 34(7), pp. 823–841.
Benth, F., Saltyte-Benth, J. and Koekebakker, S. (2007) Putting a price on temperature, Scandinavian Journal of Statistics, 34, pp. 746–767.
Benth, F., Saltyte-Benth, J. and Koekebakker, S. (2008) Stochastic Modelling of Electricity and Related Markets, Advanced Series on Statistical Science and Applied Probability, 2nd ed. (Singapore: World Scientific).
Brix, A., Jewson, S. and Ziehmann, C. (2005) Weather Derivative Valuation: The Meteorological, Statistical, Financial and Mathematical Foundations (Cambridge: Cambridge University Press).
Brockett, P., Golden, L. L., Wen, M. and Yang, C. (2010) Pricing weather derivatives using the indifference pricing approach, North American Actuarial Journal, 13(3), pp. 303–315.
Brody, D., Syroka, J. and Zervos, M. (2002) Dynamical pricing of weather derivatives, Quantitative Finance, 2(3), pp. 189–198.
Campbell, S. and Diebold, F. (2005) Weather forecasting for weather derivatives, Journal of American Statistical Association, 100(469), pp. 6–16.
Cao, M. and Wei, J. (2004) Weather derivatives valuation and market price of weather risk, The Journal of Future Markets, 24(11), pp. 1065–1089.
Cartea, A. and Figueroa, M. (2005) Pricing in electricity markets: a mean reverting jump diffusion model with seasonality, Applied Mathematical Finance, 12(4), pp. 313–335.
Cartea, A. and Williams, T. (2008) UK gas markets: the market price of risk and applications to multiple interruptible supply contracts, Energy Economics, 30(3), pp. 829–846.
Constantinides, G. (1987) Market risk adjustment in project valuation, The Journal of Finance, 33(2), pp. 603–616.
Cox, J. C., Ingersoll, J. and Ross, S. (1985) A theory of the term structure of interest rates, Econometrica, 59, pp. 385–407.
Doran, J. S. and Ronn, E. (2008) Computing the market price of volatility risk in the energy commodity markets, Journal of Banking and Finance, 32, pp. 2541–2552.


Dorfleitner, G. and Wimmer, M. (2010) The pricing of temperature futures at the Chicago Mercantile Exchange, Journal of Banking and Finance, 34(6), pp. 1360–1370.
Dornier, F. and Querel, M. (2007) Caution to the wind: energy power risk management, Weather Risk Special Report, August, pp. 30–32.
Fengler, M., Härdle, W. and Mammen, E. (2007) A dynamic semiparametric factor model for implied volatility string dynamics, Financial Econometrics, 5(2), pp. 189–218.
Gibson, R. and Schwartz, E. (1990) Stochastic convenience yield and the pricing of oil contingent claims, Journal of Finance, 45, pp. 959–976.
Härdle, W. K., López Cabrera, B., Okhrin, O. and Wang, W. (2011) Localizing temperature risk, Working Paper, Humboldt-Universität zu Berlin.
Horst, U. and Mueller, M. (2007) On the spanning property of risk bonds priced by equilibrium, Mathematics of Operation Research, 32(4), pp. 784–807.
Huang-Hsi, H., Yung-Ming, S. and Pei-Syun, L. (2008) HDD and CDD option pricing with market price of weather risk for Taiwan, The Journal of Future Markets, 28(8), pp. 790–814.
Hull, J. and White, A. (1990) Valuing derivative securities using the explicit finite difference method, Journal of Financial and Quantitative Analysis, 28, pp. 87–100.
Karatzas, I. and Shreve, S. (2001) Methods of Mathematical Finance (New York: Springer-Verlag).
Landskroner, Y. (1977) Intertemporal determination of the market price of risk, The Journal of Finance, 32(5), pp. 1671–1681.
Lucas, R. E. (1978) Asset prices in an exchange economy, Econometrica, 46(6), pp. 1429–1445.
Mercurio, D. and Spokoiny, V. (2004) Statistical inference for time-inhomogeneous volatility models, The Annals of Statistics, 32(2), pp. 577–602.
Mraoua, M. and Bari, D. (2007) Temperature stochastic modelling and weather derivatives pricing: empirical study with Moroccan data, Afrika Statistika, 2(1), pp. 22–43.
Papazian, G. and Skiadopoulos, G. (2010) Modeling the dynamics of temperature with a view to weather derivatives, Working Paper, University of Piraeus.
Richards, T., Manfredo, M. and Sanders, D. (2004) Pricing weather derivatives, American Journal of Agricultural Economics, 86(4), pp. 1005–1017.
Schwartz, E. (1997) The stochastic behaviour of commodity prices: implications for valuation and hedging, Journal of Finance, LII(3), pp. 923–973.
Vasicek, O. (1977) An equilibrium characterization of the term structure, Journal of Financial Economics, 5, pp. 177–188.
Zapranis, A. and Alexandridis, A. (2008) Modeling the temperature time-dependent speed of mean reversion in the context of weather derivative pricing, Applied Mathematical Finance, 15, pp. 355–386.
Zapranis, A. and Alexandridis, A. (2009) Weather derivatives pricing: modeling the seasonal residual variance of an Ornstein-Uhlenbeck temperature process with neural networks, Neurocomputing, 73(1–3), pp. 37–48.


Comput Stat
DOI 10.1007/s00180-012-0312-6

ORIGINAL PAPER

Using wiki to build an e-learning system in statistics in the Arabic language

Taleb Ahmad · Wolfgang Härdle · Sigbert Klinke · Shafiqah Alawadhi

Received: 1 June 2007 / Accepted: 4 February 2012
© Springer-Verlag 2012

Abstract E-learning plays an important role in education as it supports online teaching via computer networks and provides educational services by utilising information technologies. This paper presents a case study describing the development of an Arabic language e-learning course in statistics. Under discussion are problems concerning e-learning in Arab countries with special focus on the difficulties of the application of e-learning in the Arabic world as well as designing an Arabic platform with its language and technical challenges. For the platform we have chosen a wiki that supports LaTeX for formulas and R to generate tables and figures as well as some interactivity. Our system, Arabic MM*Stat, can be found at http://mars.wiwi.hu-berlin.de/mediawiki/mmstat_ar.

Keywords E-learning · MM*Stat · Wiki

T. Ahmad (B)
Tishreen University, Latakia, Syria
e-mail: taleb_ahmad1976@yahoo.de

W. Härdle
Center for Applied Statistics and Economics, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
e-mail: haerdle@wiwi.hu-berlin.de

S. Klinke
Ladislaus von Bortkiewicz Chair of Statistics, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
e-mail: sigbert@wiwi.hu-berlin.de

S. Alawadhi
Department of Statistics and Operations Research, Faculty of Science, Kuwait University, P. O. Box 5969, 13060 Safat, Kuwait
e-mail: alawadi@kuc01.kuniv.edu.kw


1 Introduction

Due to the proliferation of the Internet, e-learning has become a significant aspect of education, and many universities and educational institutions have created their own web sites and e-learning systems. Future trends predict that e-learning will significantly complement classic learning. Statistics show that the size of the worldwide e-learning market is estimated to be 52.6 billion US dollars yearly, of which the United States and Europe account for 65–75%. Statistics also indicate that 30% of the education was delivered electronically. By comparison, the e-learning market in Arab countries, with a size of around 15 million US dollars yearly, is very small. The gap between Europe and the United States on the one hand and the Arab countries on the other is very large.

The reasons for this gap are briefly summarised below:

– According to the latest figures available from Internet World Statistics 2010 (de Argaez 2011), Internet usage still varies widely across the world and across languages, as shown in Table 1. The diffusion of Internet services in most Arab countries is weak compared to other regions of the world. This is mainly due to government monopolies over the telecommunications sector, resulting in higher prices. As a consequence, only 3.3% of Internet users come from the Arabic region, even though the Arabic population is 5% of the world population. Another example of this gap is that the percentage of web users in the Arabic world is 18.8%, compared with 58.4% in Europe, 77.4% in the USA and 28.7% on average in the whole world. Arabic users have much less experience with e-learning platforms, telecourses and educational courses.
– English is the most common language on e-learning platforms, but most Arabic users have difficulties in understanding and speaking English.
– General educational problems: a high level of illiteracy, varying between 25 and 45%, can be found in the Arabic world (Clayton 2007; Al-Fadhli 2008).
– There is only a limited number of specialised cadres and scientific expertise in the area of e-learning in Arab countries (Maegaard et al. 2005).

Due to the above-mentioned problems, Arab countries need more time to reap the advantages of e-learning. The dissemination of the culture of e-learning in schools and universities needs a new generation of qualified professionals who can deal successfully with modern technology and the experiences of e-learning.

In fact, our Internet research showed that only a few Arabic e-learning platforms exist; for statistics in particular we could not find a single one. For this reason we consider the creation of a platform that would aid Arabic students in learning statistics highly necessary. The platform should cover the basic statistical topics, be supported by multiple examples, and be easy to use for Arabic students.

From the perspective described above we developed an Arabic e-learning platform in statistics (Arabic MM*Stat), which might become an important reference point in the study of statistics in Arabic through the Internet.

Around 2000, a system known as MM*Stat was developed at the School for Business and Economics of Humboldt-Universität zu Berlin (Müller et al. 2000).
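The shares quoted in the introduction (3.3% of Internet users, about 5% of the world population, 18.8% penetration) can be recomputed from the rounded figures in Table 1. The sketch below does this for the Arabic row. The variable and function names are illustrative assumptions, and the population column is assumed to be in millions like the user column, so the results agree with the published percentages only up to rounding.

```python
# Rounded figures for the Arabic row and the world total, as printed in Table 1.
# Users are in millions; population is assumed to be in millions as well.
ARABIC_USERS = 65
ARABIC_POPULATION = 347
WORLD_USERS = 1966
WORLD_POPULATION = 6846


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Internet users per 100 Arabic speakers.
penetration = percent(ARABIC_USERS, ARABIC_POPULATION)
# Arabic-speaking users as a share of all Internet users.
share_of_users = percent(ARABIC_USERS, WORLD_USERS)
# Arabic speakers as a share of the world population.
share_of_population = percent(ARABIC_POPULATION, WORLD_POPULATION)

print(f"penetration: {penetration:.1f}%")
print(f"share of users: {share_of_users:.1f}%")
print(f"share of population: {share_of_population:.1f}%")
```

With these rounded inputs the penetration comes out slightly below the 18.8% printed in the table, which is presumably computed from unrounded counts.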


Table 1 World internet users for 10 languages by June 2010 (de Argaez 2011)

| Top 10 languages in the internet | Internet users (Mio.) | Internet penetration (%) | Growth in internet (2000–2010) (%) | Internet users % of total (%) | World population (2010 estimate) |
|---|---|---|---|---|---|
| English | 537 | 42.0 | 281 | 27.3 | 1,278 |
| Chinese | 445 | 32.6 | 1,277 | 22.6 | 1,278 |
| Spanish | 153 | 36.5 | 742 | 7.8 | 420 |
| Japanese | 99 | 78.2 | 111 | 5.0 | 127 |
| Portuguese | 83 | 33.0 | 990 | 4.2 | 250 |
| German | 75 | 78.6 | 173 | 3.8 | 96 |
| Arabic | 65 | 18.8 | 2,501 | 3.3 | 347 |
| French | 60 | 17.2 | 389 | 3.0 | 348 |
| Russian | 60 | 42.8 | 1,826 | 3.0 | 139 |
| Korean | 39 | 55.2 | 107 | 2.0 | 71 |
| Top 10 languages | 1,615 | 36.4 | 421 | 82.2 | 4,442 |
| Other languages | 351 | 14.6 | 588 | 17.8 | 2,403 |
| World total | 1,966 | 28.7 | 444 | 100.0 | 6,846 |

MM*Stat is a platform for e-learning statistics: an HTML-based multimedia environment to support teaching and learning statistics via CD or Internet.

An MM*Stat course consists of lectures on specific topics in basic statistics; see Fig. 1 for the hypergeometric distribution. Each lecture gives the basic concepts of general statistical theory, definitions, formulae and mathematical proofs. At the bottom is a set of buttons: on the left-hand side, three buttons for navigation (go to the previous lecture, jump to the table of contents, go to the next lecture), and on the right-hand side, a number of buttons which link to pages with additional information.
Four types of additional information are provided:

– Explained examples, which require only knowledge of the current lecture to understand them.
– Enhanced examples, which require knowledge from lectures other than the current one to understand them.
– Interactive examples, which allow the user to run them via embedded statistical software, for example to plot the probability density function or the cumulative distribution function for different parameters n and p, or to apply tests.
– More information, which contains, for example, historical information or mathematical derivations that are not necessary for a first understanding.

Each chapter ends with a lecture containing multiple-choice questions so that users can evaluate their learning progress.

Students or anyone interested in statistics can interactively learn about the basic concepts of statistics at any time and anywhere. Consequently, we based Arabic MM*Stat on the existing MM*Stat, which already existed in various languages: Czech, German, English, Spanish, French, Indonesian, Italian, Polish and Portuguese.

Fig. 1 The graphical user interface of MM*Stat, with the lecture entitled hypergeometric distribution as an example. Note the navigation buttons (bottom left) and the buttons to examples and more information (bottom right). The tabs at the top reflect the user history and allow for a fast change between lectures

2 Difficulties in designing Arabic platforms

There are some problems, however, associated with the making of an Arabic platform; these relate to language as well as technology. We summarise these problems below:

Language problems

Some issues relate to translation: some words and scientific terms are similar in Arabic and could create a problem when translated (see, for example, Table 2). The Arabic language makes no distinction between “administration” and “management” or between “calculate” and “compute”. The reader must recognise from the context which meaning is correct. This makes a text more difficult to understand.

Technical problems

1. User interface
The different language versions of MM*Stat were based on two different systems:


– The German version was written in HTML and the user interface was developed with JavaScript for Internet Explorer 5. The problem was that neither later Internet Explorer versions nor other browsers were able to run the JavaScript code.
– The English version was written in LaTeX for a variety of reasons; for example, translating MM*Stat into a new language just required a change to the LaTeX text, which is much easier to handle than translating an HTML page with a lot of embedded JavaScript code. We used software based on LaTeX2HTML to create the HTML/JavaScript version of MM*Stat with the same user interface as before (Witzel and Klinke 2002).

2. Writing from left to right
Arabic script runs from right to left, as opposed to most other languages; therefore all lists, paragraphs, statistical forms, tables and graphics also run from right to left. In some cases, however, Arabic text may contain information that needs to run in the opposite direction (from left to right), such as numbers and Latin text. Any program that supports the Arabic language should provide the possibility of changing the direction when needed. A solution would be to use ArabTeX (Lagally 2004), but with ArabTeX the Arabic texts are written in Latin characters with special character combinations and not in Arabic script, see Fig. 2. Obviously this is unfamiliar to most Arabic-speaking people. Additionally, LaTeX2HTML supports neither right-to-left text, Arabic, nor ArabTeX.

3. Interactive examples
MM*Stat contained a set of interactive examples, which are important since they allow the user to practice repeatedly with various variables or data sets, and with alternate sample sizes or parameters of the statistical methods applied. In this manner, the student obtains a better understanding of how the statistical method works. However, the client-server technology implemented by Lehmann (2004) for MM*Stat worked only with the statistical software XploRe. The development and support of the XploRe software has unfortunately ceased, so the question arises of how one should include the interactive examples in Arabic MM*Stat such that they will remain runnable in the future.

Table 2 Some similar words in Arabic

The language problem can only be solved by adapting the texts. To solve the technical problems we decided to use another technology, the so-called “wiki technology”.
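The mixed-direction issue described under “Writing from left to right” can be made concrete with the Unicode bidirectional character classes: Arabic letters, Latin letters and digits carry different classes, and a renderer has to resolve them within one line. The following is a minimal sketch using Python's standard `unicodedata` module; the sample string and the helper name `bidi_runs` are assumptions for illustration and are not part of MM*Stat or of any wiki software.

```python
import unicodedata
from itertools import groupby


def bidi_runs(text):
    """Group consecutive characters by their Unicode bidirectional class.

    Returns a list of (bidi_class, substring) pairs, e.g. 'AL' for Arabic
    letters, 'L' for left-to-right letters, 'EN' for European digits.
    """
    runs = []
    for bidi_class, chars in groupby(text, key=unicodedata.bidirectional):
        runs.append((bidi_class, "".join(chars)))
    return runs


# Hypothetical sample: an Arabic word followed by a Latin parameter and a number.
sample = "\u0627\u062d\u062a\u0645\u0627\u0644 p=0.6"

for bidi_class, chunk in bidi_runs(sample):
    print(bidi_class, repr(chunk))
```

A line like this mixes a right-to-left run (the Arabic word) with left-to-right runs (the symbol and the number), which is the situation in which software has to be able to change direction within one piece of text.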


Fig. 2 Sample ArabTeX input in LaTeX, see examples/guha.tex in ArabTeX

3 Wiki technology

3.1 What is a wiki?

A wiki is a system that allows users to collaborate in forming the content of a web site. The first wiki web site, “WikiWikiWeb”, was designed by Cunningham and Leuf in 1995 (Leuf and Cunningham 2001). They describe the wiki system as a simple database that can operate on the World Wide Web. The goal is to simplify the process of participation and cooperation in the development of web content with maximum flexibility.

The main advantages of a wiki are:

– A wiki simplifies the process of content editing. Each web page contains a link to change its content within the web browser. After saving, a modified page can be viewed immediately.
– It uses simple markup to structure content and is suitable for users with little experience with computers or web site development, as no knowledge of HTML is required.
– Wiki sites keep a record of the page history and therefore make the comparison of older and newer versions of a page an easy task. If a mistake is made, one can revert to the older version of the page.
– Wiki sites can be publicly open and therefore allow any user to improve the content.
– A wiki simplifies the organisation of a site: wiki sites create hypertext databases and can arrange the content in any manner desired, whereas many content management systems require the organisation of the content to be planned before anything is written. A wiki therefore offers a flexibility that is not available in content management systems.

3.2 Application of wiki

The flexibility of the wiki concept makes it an ideal knowledge transfer tool at universities, educational institutes, in companies and for specialised web sites. For example, a teacher could write his course using a wiki and offer it to his students as useful material for study, see e.g., Klinke (2011).

Nowadays, we have many more examples of web sites using wikis as a tool for the development of content, like Wikipedia (2011). The Wikipedia project started on 15 January 2001, and today there are more than 10 million articles in the encyclopedia in all languages, with more than 3.7 million articles in the English encyclopedia alone. Millions of volunteers around the world modify and add to the contents daily, and new articles are created. The Arabic version of the free encyclopedia was launched in July 2003 and currently contains approximately 160 thousand articles, as the Arabic encyclopedia is in the content-building phase.

The Arabic wiki platform “Arabeyes” (Afifi et al. 2011) provides a good environment for discussion and exchange of experience and knowledge about the Arabic language. Arabeyes offers the translation into Arabic of free open-source programs. In addition, Arabeyes provides a technical dictionary that aims to translate and standardise the technical terms used in translating software for the Arabic user. Arabeyes is a solution for the language problem, see Fig. 3.

Fig. 3 Entry page of Arabeyes wiki (http://wiki.arabeyes.org)

3.3 Implementation of Arabic MM*Stat

Arabic MM*Stat is aimed at students and other Arabic-speaking users and is meant to address the e-learning issues in the Arabic region. The content of Arabic MM*Stat is a translation of the content of the former CDs into Arabic.


Fig. 4 Graphical user interface (GUI) of Arabic MM*Stat. Note that the interface language (English) is different from the content language (Arabic) due to user settings

Wikimatrix (2011) offers an overview of the available wiki software and their capabilities. A useful wiki should support:
– LaTeX, to provide the possibility to write a statistical formula in “mathematical” language rather than integrate it as a graphic, generated for example by LaTeX2HTML.
– Arabic as a language for the content and the interface.
– Integration of statistical software, preferably R, to recreate interactive examples.
– Multiple choice questions to test students’ knowledge.

As wiki software we finally decided to use Mediawiki, the software behind the (Arabic) Wikipedia. It solves the technical problems listed above (see Figs. 4 and 5):
– User interface
  It is able to have the content and the user interface in the Arabic language as the Arabic Wikipedia shows.
– Writing from left to right
  To some extent, it can change the writing direction for formulas, lists etc.
– Interactive examples
  Through Mediawiki extensions we are able to transfer the functionality of the MM*Stat CD to the new system:
  – The R extension allows (interactive) tables and graphics generated by R, as well as interactive examples, to be embedded into a wiki page.


Fig. 5 An interactive example of a graphic of a probability density function and a table of a cdf of a binomial distribution (p = 0.6)

Fig. 6 The wiki source code for the page shown in Fig. 5. On top the Arabic text and within the RForm tags the input parameters and within the R tags the R program


  – The Quiz extension provides multiple choice questions (Babé 2007).
  – The Math extension allows formulas written in LaTeX to be embedded into the wiki page (Wegrzanowski and Vibber 2011).

3.4 Integration of R program into Arabic MM*Stat

R is a language and environment for statistical computing and graphics (R Development Core Team 2011). Arabic MM*Stat uses R programs to create tables and graphics which can be incorporated in course notes. For the Mediawiki software an extension to embed R into the wiki page exists.

These programs enable students and learners, for example, to visualise statistical distributions and probability tables via the Internet. See Fig. 5 as an example of a graphic of a probability density function and a table of a cdf of a binomial distribution. Choosing other input values will lead to different tables or graphics.

Figure 6 shows the wiki source code for the example shown in Fig. 5. The interactive example consists of two tags, Rform and R, which share a common attribute name.

<Rform name="...">... Input parameters...</Rform>
<R name="...">... R program...</R>

Between the opening and closing Rform tags are the input parameters as defined in an HTML form. The following opening and closing R tags contain the R program which produces a graphic.
For more detail see Klinke and Zlatkin-Troitschanskaia (2007).

Arabic MM*Stat contains other examples, e.g., for other distributions like the normal, Poisson and exponential distributions.

4 Conclusion

Using e-learning/e-teaching tools to offer effective learning of statistics is a necessity for students. An e-learning system can be created with Arabic MM*Stat through the application of wiki technology. Some of the specific characteristics we have discussed earlier for developing an Arabic platform already exist in the wiki. We see that embedding R is a solution for the interactive examples in Arabic MM*Stat. We hope that the Arabic MM*Stat platform for e-learning of statistics will be a significant help for the Arabic user as it clearly overcomes weaknesses in developing such electronic platforms in Arabic.

This research was supported by the Deutsche Forschungsgemeinschaft through the CRC 649 ‘Economic Risk’.


References

Afifi D, Chahibi Y, Hosny K, Trojette MA (2011) Arabeyes.org: The Arabic Unix Project. http://wiki.arabeyes.org, Accessed 18 Aug 2011
Al-Fadhli S (2008) Students’ perceptions of e-learning in Arab society: Kuwait University as a case study. E-Learning 5(4):418–428
Babé LR (2007) Extension: Quiz. http://www.mediawiki.org/wiki/Extension:Quiz, Accessed 26 Aug 2011
CosmoCode (2011) Wikimatrix. http://www.wikimarix.org, Accessed 24 Aug 2011
Clayton PM (2007) Women, violence and the internet. E-Learning 4(1):79–92
de Argaez E (2011) Internet world stats, Columbia 2011. http://www.internetworldstats.com, Accessed 18 Aug 2011
Klinke S (2011) Developing web-based tools for the teaching of statistics: Our Wikis and the German Wikipedia, ISI Conference Proceedings, Dublin/Ireland (forthcoming)
Klinke S, Zlatkin-Troitschanskaia O (2007) Embedding R in the Mediawiki, Sonderforschungsbereich 649 Discussion Papers 61, Humboldt University, Berlin/Germany 2007. http://edoc.hu-berlin.de/docviews/abstract.php?id=28421, Accessed 18 Aug 2011
Lagally K (2004) ArabTeX: a system for typesetting Arabic and Hebrew. Report Nr. 2004/03, User Manual Version 4.00, Fakultät Informatik, University Stuttgart, Germany. http://www2.informatik.uni-stuttgart.de/ivi/bs/research/arab_e.htm, Accessed 18 Aug 2011
Lehmann H (2004) Client/server based statistical computing. Dissertation, Humboldt-Universität zu Berlin, Germany. http://edoc.hu-berlin.de/docviews/abstract.php?id=20803, Accessed 18 Aug 2011
Leuf B, Cunningham W (2001) The wiki way: quick collaboration on the web. Addison-Wesley, Reading, MA
Maegaard B, Choukri K, Mokbel C, Yaseen M (2005) Language technology for Arabic. Report of The NEMLAR project, Center for Sprogteknologi. University of Copenhagen, Denmark. http://www.medar.info/The_Nemlar_Project/Publications/Arabic_LT.pdf, Accessed 18 Aug 2011
Müller M, Rönz B, Ziegenhagen U (2006) The Multimedia Project MM*Stat for Teaching Statistics. In: Proceedings of the COMPSTAT 2000. pp 409–414
R Development Core Team (2011) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna/Austria. http://www.r-project.org, Accessed 18 Aug 2011
Wegrzanowski T, Vibber B et al. (2011) Extension: Math. http://www.mediawiki.org/wiki/Extension:Math, Accessed 26 Aug 2011
Wikimedia Foundation Inc. (2011) Wikipedia: The free encyclopedia. http://www.wikipedia.org, Accessed 18 Aug 2011
Witzel R, Klinke S (2002) MD*Book online & e-stat: Generating e-stat Modules from Latex. In: Proceedings of the COMPSTAT 2002. pp 449–454


Journal of Financial Econometrics, 2013, Vol. 11, No. 2, 370–399

Shape Invariant Modeling of Pricing Kernels and Risk Aversion

MARIA GRITH
CASE–Center for Applied Statistics and Economics, Humboldt-Universität zu Berlin

WOLFGANG HÄRDLE
CASE–Center for Applied Statistics and Economics, Humboldt-Universität zu Berlin

JUHYUN PARK
Department of Mathematics and Statistics, Lancaster University

ABSTRACT

Several empirical studies reported that pricing kernels exhibit a common pattern across different markets. The main interest in pricing kernels lies in validating the presence of the peaks and their variability in location among curves. Motivated by this observation we investigate the problem of estimating pricing kernels based on the shape invariant model, a semi-parametric approach used for multiple curves with shape-related nonlinear variation. This approach allows us to capture the common features contained in the shape of the functions and at the same time characterize the nonlinear variability with a few interpretable parameters. These parameters provide an informative summary of the curves and can be used to make a further analysis with macroeconomic variables. Implied risk aversion function and utility function can also be derived. The method is demonstrated with the European options and returns values of the German stock index DAX.
(JEL: C14, C32, G12)

KEYWORDS: pricing kernels, risk aversion, risk neutral density

Downloaded from http://jfec.oxfordjournals.org/ at Humboldt-Universitaet zu Berlin on May 17, 2013

The financial support from the Deutsche Forschungsgemeinschaft via SFB 649 “Ökonomisches Risiko”, Humboldt-Universität zu Berlin is gratefully acknowledged. Address correspondence to Juhyun Park, Department of Mathematics and Statistics, Lancaster University, Lancaster LA1 4YF, UK, or email: juhyun.park@lancaster.ac.uk

doi:10.1093/jjfinec/nbs019 Advanced Access publication November 18, 2012
© The Author, 2012. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org


1 METHODOLOGY

1.1 Pricing Kernel and Risk Aversion

Risk analysis and management have drawn much attention in quantitative finance recently. Understanding the basic principles of financial economics is a challenging task, in particular in a dynamic context. With the formulation of utility maximization theory, individuals’ preferences are explained through the shape of the underlying utility functions. Namely, a concave, convex, or linear utility function is associated with risk averse, risk seeking, or risk neutral behavior, respectively. The comparison is often made through the Arrow-Pratt measure of absolute risk aversion (ARA), as a summary of aggregate investor’s risk-averseness. The quantity originates from expected utility theory and is defined by

$$\mathrm{ARA}(u) = -\frac{U''(u)}{U'(u)},$$

where U is the individual utility as a function of wealth.

With an economic consideration that one unit gain and loss does not carry the same value for every individual, understanding state-dependent risk behavior becomes an increasingly important issue. The fundamental problem is that individual agents are not directly observable but it is assumed that the prices of goods traded in the market reflect the dynamics of their risk behavior.
Several efforts have been made to relate the price processes of assets and options traded in a market to risk behavior of investors, since options are securities guarding against losses in risky assets.

A standard option pricing model in a complete market assumes a risk neutral distribution of returns, which gives the fair price under no arbitrage assumptions. If markets are not complete, there is more than one risk neutral distribution and the fair price depends on the hedging problem. The subjective or historical distribution of observed returns reflects a risk-adaptive behavior of investors based on subjective assessment of the future market. Then the equilibrium price is the arbitrage free price and the transition from risk neutral pricing to subjective rule is achieved through the pricing kernel. Assuming those densities exist, write q for the risk neutral density and p for the historical density. The pricing kernel K is defined by the ratio of those densities:

$$K(u) = \frac{q(u)}{p(u)}.$$

Through the intermediation of these densities, there exists a link between the pricing kernel and ARA, see for example Leland (1980):

$$\mathrm{ARA}(u) = \frac{p'(u)}{p(u)} - \frac{q'(u)}{q(u)} = -\frac{d \log K(u)}{du}.$$
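The identity ARA(u) = -d log K(u)/du can be checked numerically. The following is a minimal sketch, not the estimation procedure of the paper: it assumes a pricing kernel already evaluated on a grid, and the function name `ara_from_kernel`, the grid and the power kernel K(u) = u^(-gamma) with gamma = 0.7 are illustrative assumptions. For that kernel the exact answer is ARA(u) = gamma/u.

```python
import numpy as np

def ara_from_kernel(u, k):
    """Approximate ARA(u) = -d log K(u) / du by finite differences on a grid."""
    u = np.asarray(u, dtype=float)
    k = np.asarray(k, dtype=float)
    return -np.gradient(np.log(k), u)

# Illustrative assumption: a power kernel K(u) = u**(-gamma), so ARA(u) = gamma / u.
gamma = 0.7
u_grid = np.linspace(0.8, 1.2, 401)
ara = ara_from_kernel(u_grid, u_grid ** (-gamma))

# Compare with the exact value away from the grid boundary,
# where np.gradient falls back to less accurate one-sided differences.
max_err = np.max(np.abs(ara[1:-1] - gamma / u_grid[1:-1]))
```

A positive value of `ara` on the whole grid corresponds to risk averse behavior; a region where an estimated kernel is locally increasing would show up as negative values.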


In this way, rather than specifying a priori preferences of agents (risk neutral, averse, or risk seeking) and implicitly the monotonicity of the pricing kernel, we can infer the risk patterns from the shape of the pricing kernel.

[Figure 1: two panels, each plotting EPK (vertical axis) against Returns (horizontal axis).]

Figure 1 Examples of inter-temporal pricing kernels for various maturities in January–February 2006 (left) and monthly pricing kernels from the first six months in 2006 for maturity one month (right).

1.2 Dynamics of Empirical Pricing Kernels (EPKs)

With increasing availability of large market data, several approaches to recovering pricing kernels from empirical data have been proposed. As many of them estimate p and q separately to recover K, potentially relevant are studies focusing on recovering risk neutral density, see e.g. Jackwerth (1999), and Bondarenko (2003) for comparison of different approaches. For the estimation of p, nonparametric kernel methods or parametric models such as GARCH or Heston models are popular choices.

Examples of empirical pricing kernels obtained from European options data on the German stock index DAX (Deutscher Aktienindex) in 2006 are shown in Figure 1, based on separate estimation of p and q. A detailed account of estimation is given in Section 3.4. To make these comparable, they are shown on a continuously compounded returns scale.
Throughout the article, the pricing kernel is considered as a function of this common scale of returns. Figure 1 depicts inter-temporal pricing kernels with various maturities in January–February 2006 (left), and monthly pricing kernels with fixed maturity one month in 2006 (right). The sample of curves appears to have a bump around 1 and has convexity followed by concavity in all cases. The location as well as the magnitude of the bump vary among curves, which reflects individual variability on different dates or under different investment horizons. Some features that are of particular economic interest include the maximum of the bump, the spread or duration of the bump and the location of the bump.

From a statistical perspective, properties of the pricing kernel are intrinsically related to assumptions about the data generation process. A very restrictive model, with normal marginal distributions, is the Black–Scholes model. This results in an overall decreasing pricing kernel in wealth, which is consistent with overall risk averse behavior. These preferences are often assumed in the classical economic theory of utility-maximizing agent and correspond to a concave indirect von Neumann and Morgenstern utility function. However, under richer parametric specifications or nonparametric models monotonicity of the pricing kernel has been rejected in practice (Rosenberg and Engle, 2002; Giacomini and Härdle, 2008). The phenomenon of locally nondecreasing pricing kernel is referred to as the pricing kernel puzzle in the literature. There have been many attempts to reconcile the underlying economic theory with the empirical findings. A recent solution suggested by Hens and Reichlin (2012) relates the puzzle to the violation of the fundamental assumptions in the equilibrium model framework.

Most of earlier works adopt a static viewpoint, showing a snapshot of markets on selected dates but report that there is a common pattern across different markets.
The first dynamic viewpoint appears in Jackwerth (2000), who recovers a series of pricing kernels in consecutive times and claims that these do not correspond to the basic assumptions of asset pricing theory. In a similar framework Giacomini and Härdle (2008) perform a factor analysis based on the so-called dynamic semiparametric factor models, while Giacomini, Härdle and Handel (2008) introduce time series analysis of daily summary measures of pricing kernels to examine variability between curves.

Chabi-Yo, Garcia, and Renault (2008) explain the observed dynamics or the puzzles by means of latent variables in the asset pricing models. Effectively, they propose to build conditional models of the pricing kernels given the state variables reflecting preferences, economic fundamentals, or beliefs. Within this framework they are able to reproduce the puzzles, in conjunction with some joint parametric specifications for the pricing kernel and the asset return processes.

Due to evolution of markets over time under different circumstances, the pricing kernels are intrinsically time varying. Thus, approaches that do not take into account the changing market make limited use of information available in the current data. On the other hand, changes over time may not be completely arbitrary, as there are common rules and underlying laws that assure some consistency across different market systems.
Moreover, variability observed in pricing kernels, as shown in Figure 1, is not necessarily linear, and thus factors constructed from a linear combination of observations are only meaningful for explaining aggregated effects.

Considering the pricing kernels as an object of curves, we approach the problem of estimating the pricing kernels and implied risk aversion functions from a functional data analysis viewpoint (Ramsay and Silverman, 2002). The main interest in pricing kernels lies in validating the presence of the peaks and their variability in location among curves. Motivated by this observation we investigate the estimation method based on the shape invariant model, which will be formally introduced in Section 2. This is chosen over the commonly adopted functional principal component analysis to accommodate the nonlinear features such as variation of peak locations, which encapsulate quantities amenable to economic interpretation. The shape invariant model allows us to capture the common characteristics reported across different studies on different markets. We then explain individual variability as a deviation from the common curve or a reference.

Our contribution is three-fold. Firstly, we analyze the phenomenon of the pricing kernel puzzle from a dynamic viewpoint using a shape invariant modeling approach. The starting question was how to compare the empirical evidence. By taking into account variability among curves, we quantify a trend of the puzzle in the series of the pricing kernels by a few interpretable parameters. Secondly, we provide a unified framework for estimation and interpretation of ARA and utility functions consistent with the underlying pricing kernels with the same set of parameters. The ARA corresponding to the reference pricing kernel could be viewed as a typical pattern of risk behavior for the period under consideration.
Due to the nonlinear transformation involved in deriving ARA from the pricing kernel function, this common ARA function does not necessarily coincide with the simple average of the ARA functions. Thirdly, the output of the analysis provides a summary measure to study the relationship with macroeconomic variables. Through a real data example we have related the changes in risk behavior to some macroeconomic variables of interest and found that local risk loving behavior is procyclical. We acknowledge that we do not provide an economic explanation to the puzzle but rather try to understand the nature of the phenomenon by means of statistical analysis.

The paper is organized as follows. Section 2 motivates the common shape modeling approach and Section 3 reviews the shape invariant model and describes it in detail in the context of pricing kernel estimation. This section serves as the basis of our analysis. Numerical studies based on simulation are found in Section 4. An application to a real data example is summarized in Section 5.

2 COMMON SHAPE MODELING

2.1 Shape Invariant Model for Pricing Kernel

We consider a common shape modeling approach for the series of pricing kernels with explicit components of location and scale.
To represent varying pricing kernels, we introduce the time index t in the pricing kernel as $K_t$ and consider a general regression model:

$$Y_t = K_t + \varepsilon_t,$$

where $\varepsilon_t$ represents an error with mean 0 and variance $\sigma_t^2$. We begin with a working assumption of independent error as in Kneip and Engel (1995). The effect of dependent error is investigated in simulation studies in Section 4.2. The relationship among the $K_t$s is specified as

$$K_t(u) = \theta_{t1}\, g\!\left(\frac{u-\theta_{t3}}{\theta_{t2}}\right) + \theta_{t4}, \tag{1}$$

with some unknown constants $\theta_t = (\theta_{t1},\theta_{t2},\theta_{t3},\theta_{t4})$ and an unknown function g. The common shape function g can be interpreted as a reference curve. Deviation from the reference curve is described by four parameters $\theta_t = (\theta_{t1},\theta_{t2},\theta_{t3},\theta_{t4})$ that capture scale changes and a shift in horizontal and vertical direction. This parametrization in (1) is commonly known as a shape invariant model (SIM), originally introduced by Lawton, Sylvestre and Maggio (1972) and studied by Kneip and Engel (1995). Note that the model includes as a special case complete parametric models with known g.

In contrast to standard applications of SIM as a regression model, the SIM application to pricing kernel estimation does not, strictly speaking, satisfy the model assumption. There is no realization of the pricing kernels available and thus our formulation of regression model should be viewed as an approximation. The original data used would be intraday options data and daily returns data, which are collected from separate sources with sample sizes of different orders of magnitude but estimation of p and q can be effectively done independently of each other. It may be possible to elaborate our approach to incorporate simultaneous estimation with a two-step state-dependent dynamic model formulation whereby the dynamics of the observed return processes are specified and the unobserved pricing kernel processes enter as a state variable.
However, with current advancement in the methodology, this is only possible with limited parametric model choices, see for example Chabi-Yo et al. (2008), and extension to a flexible shape invariant model is left for future work.

Instead we exploit the fact that preliminary estimates of pricing kernels based on separate estimation of p and q are readily available from market data and this can easily substitute Y. From now on, we treat the estimates as something observable and denote them by $Y_t$, similar to the regression formulation with direct measurements $Y_t$, and state the asymptotic result without further complication of pre-processing steps. After all, these estimates of curves are available from the beginning and the SIM aims to characterize a structural relationship among these curves. This however may impact the parametric rate of convergence attainable (Kneip and Engel, 1995) because our observations are already contaminated by a nonparametric error of estimation. As is shown in Section 3.6, the dominating error comes from the estimation of q, which involves second derivative estimation. The optimal rate of convergence for estimating second derivative is known to be $O(N^{-2/9})$, where N is the sample size used (Stone, 1982).
This implies that $\sigma_t^2 = \alpha_{N,t} v^2$, where $\alpha_{N,t}$ is a constant of order $O(N^{-2/9})$, which should be understood as the multiplication factor for the parametric rate of convergence.

A particular choice of estimates of individual pricing kernels is not part of the model formulation but affects the starting values for the estimation of shape invariant model. Our choice of initial estimates will be explained in Section 3.4. Our main interest lies in quantifying the variation among the pricing kernels given those estimates.

[Figure 2: two panels, $K_t$ against wealth (left) and $U_t$ against wealth (right).]

Figure 2 Example of location and scale shift family of pricing kernels (left) and corresponding utility functions (right). Solid line in each plot represents reference curves of $g(u)=u^{-\gamma}$ and $U_0(u)=u^{1-\gamma}/(1-\gamma)$ with $\gamma=0.7$, respectively. Parameters are $\theta_{t1}=1.1$, $\theta_{t2}=1$, $\theta_{t3}=1-\theta_{t1}^{1/\gamma}$, and $\theta_{t4}=0$ for dot-dashed (red) and $\theta_{t4}=-0.5$ for dashed (blue) lines.

The new message here is an analysis of a sequence of pricing kernels through shape invariant models. Although we start with different motivation, our approach is in line with that of Chabi-Yo et al. (2008). In contrast to their approach, we impose a structural constraint that is related to the shape of the function. This way we strike a balance between flexibility much desired in parametric model specification and interpretability of the results lacking in full nonparametric models.

2.2 SIM and Black–Scholes Model

To appreciate the model formulation, given in the Equation (1), it is instructive to consider utility functions implied by this family of pricing kernels together.
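The location and scale family of Equation (1) can be evaluated directly once a reference curve is fixed. The sketch below is illustrative only: it takes the reference curve and parameter values quoted in the Figure 2 caption, while the function name `sim_kernel` and the evaluation grid are assumptions made for this example.

```python
import numpy as np

def sim_kernel(u, theta, g):
    """Evaluate K_t(u) = theta1 * g((u - theta3) / theta2) + theta4, as in Equation (1)."""
    t1, t2, t3, t4 = theta
    return t1 * g((np.asarray(u, dtype=float) - t3) / t2) + t4

gamma = 0.7

def g(v):
    """Reference curve g(u) = u**(-gamma) from the Figure 2 caption."""
    return v ** (-gamma)

t1 = 1.1
t3 = 1.0 - t1 ** (1.0 / gamma)   # horizontal shift quoted in the caption
theta_a = (t1, 1.0, t3, 0.0)     # caption: theta_t4 = 0
theta_b = (t1, 1.0, t3, -0.5)    # caption: theta_t4 = -0.5

u = np.linspace(0.5, 2.0, 151)   # hypothetical wealth grid
k_ref = g(u)
k_a = sim_kernel(u, theta_a, g)
k_b = sim_kernel(u, theta_b, g)
```

With these values the first shifted curve meets the reference curve at u = 1, since $\theta_{t1}(1-\theta_{t3})^{-\gamma} = \theta_{t1}\theta_{t1}^{-1} = 1 = g(1)$, and the second curve is the first one moved down by 0.5.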
The utility function can be derived from

$$U_t(u)=\alpha\int_0^u K_t(x)\,dx,$$

for a constant $\alpha$. Figure 2 shows an example based on a power utility function, which corresponds to risk averse behavior. Pricing kernels $K_t$ are shown on the left and the corresponding utility functions $U_t$ are on the right. The solid lines represent reference curves and the dashed and dot-dashed lines represent $K_t$ and $U_t$ with appropriate parameters $\theta_t$ in Equation (1). Depending on the choice of


parameters, the utility function can increase quickly or slowly.

GRITH ET AL. | Shape Invariant Modeling

As an illustration, we consider the Black–Scholes model with power utility function. The Black–Scholes model assumes that the stock price follows a geometric Brownian motion

$$dS_t/S_t=\mu\,dt+\sigma\,dW_t,$$

which gives rise to a log normal distribution for the historical density $p$. Under the risk neutral measure, the drift $\mu$ is replaced by the riskless rate $r$ but the density $q$ is still log normal. The pricing kernel can be written as a power function $K(u)=\lambda u^{-\gamma}$, 0


For example, we have seen that the two scale parameters in the pricing kernel functions corresponding to the Black–Scholes model are not separable. Basically, unless there exist some qualitatively distinct common characteristics for each curve, the model is not identifiable (Kneip and Gasser, 1988). In the case of no prior structural information available as in the case of pricing kernels, it is sufficient to consider a few landmarks such as peaks and inflection points.

Even with a unique $g$, some translation and scaling of parameters lead to multiple representations of the models. For uniqueness of parameters, we will impose normalizing conditions suggested in Kneip and Engel (1995):

$$T^{-1}\sum_{t=1}^{T}\theta_{t1}=1,\quad T^{-1}\sum_{t=1}^{T}\theta_{t2}=1,\quad T^{-1}\sum_{t=1}^{T}\theta_{t3}=0,\quad T^{-1}\sum_{t=1}^{T}\theta_{t4}=0$$

in the sense that there exists an average curve. These conditions are not a restriction at all and can be replaced by any appropriate combination of parameters. Alternatively, we could consider the first curve as a reference, as done in Härdle and Marron (1990), which implies the restriction $\theta_1=(1,1,0,0)$. Generally, an application-driven normalization scheme can be devised and examples are found in Lawton, Sylvestre and Maggio (1972).

2.4 SIM Implied Risk Aversion and Utility Function

The utility function corresponding to $K_t$ is given by

$$U_t(u)=\theta_{t1}\theta_{t2}\left\{G\!\left(\frac{u-\theta_{t3}}{\theta_{t2}}\right)-G\!\left(-\frac{\theta_{t3}}{\theta_{t2}}\right)\right\}+\theta_{t4}u\equiv\theta_{t1}^{*}\,G\!\left(\frac{u-\theta_{t3}}{\theta_{t2}}\right)+\theta_{t4}^{*}+\theta_{t4}u,$$

where $G(t)=\int_0^t g(u)\,du$.
The utility function $U_t$ is a combination of a SIM class of the common utility function and a linear utility function.

The ARA measure is given by

$$\mathrm{ARA}_t(u)=\frac{-\dfrac{\theta_{t1}}{\theta_{t2}}\,g'\!\left(\dfrac{u-\theta_{t3}}{\theta_{t2}}\right)}{\theta_{t1}\,g\!\left(\dfrac{u-\theta_{t3}}{\theta_{t2}}\right)+\theta_{t4}}.\qquad(2)$$

For example, assuming $g(u)=u^{-\gamma}$ with $\theta_{t2}=1$ gives

$$\mathrm{ARA}_t(u)=\gamma\left\{(u-\theta_{t3})+(\theta_{t4}/\theta_{t1})(u-\theta_{t3})^{\gamma+1}\right\}^{-1}.$$

When $\theta_{t4}=0$, this function is monotonically decreasing but in general this is not the case. Note the common ARA function corresponding to $g$ is $\gamma u^{-1}$ compared to the mean ARA function computed by taking the sample average $T^{-1}\sum_{t=1}^{T}\mathrm{ARA}_t(u)$.
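As a numerical illustration of equation (2), the following sketch evaluates the ARA for the power-type common curve $g(u)=u^{-\gamma}$ and compares it with the closed form stated above for $\theta_{t2}=1$. The function names and the parameter values are hypothetical choices made for this example only; they are not taken from the paper.

```python
import numpy as np

def ara_power(u, theta, gamma):
    """ARA_t(u) from equation (2) with the common curve g(u) = u**(-gamma)."""
    t1, t2, t3, t4 = theta
    v = (u - t3) / t2                      # standardized argument of g
    g = v ** (-gamma)
    dg = -gamma * v ** (-gamma - 1.0)      # g'(v)
    return -(t1 / t2) * dg / (t1 * g + t4)

def ara_closed_form(u, theta, gamma):
    """Closed form quoted in the text; only valid when theta_t2 = 1."""
    t1, _, t3, t4 = theta
    v = u - t3
    return gamma / (v + (t4 / t1) * v ** (gamma + 1.0))

# Illustrative (assumed) values, chosen so that the denominator stays positive.
u = np.linspace(0.5, 2.0, 7)
gamma = 0.7
theta = (1.1, 1.0, 0.1, -0.2)

general = ara_power(u, theta, gamma)
closed = ara_closed_form(u, theta, gamma)
baseline = ara_power(u, (1.0, 1.0, 0.0, 0.0), gamma)   # reduces to gamma / u
```

With the baseline parameters $(1,1,0,0)$ the function reduces to the common ARA $\gamma u^{-1}$, which gives a quick sanity check on any implementation of (2).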


[Figure 3: grid of panels with one column per parameter ($\theta_1$, $\theta_2$, $\theta_3$, $\theta_4$) and one row each for PK, ARA and U.]
Figure 3 Effect of parameters on pricing kernel (top), ARA (middle), and utility function (bottom) compared to the baseline model $\theta_0=(1,1,0,0)$ (black). Dot-dashed lines are used for increasing direction and dashed lines for decreasing direction.

In order to gain some insights, we take a closer look at the changes in relation to individual scale and shift parameters. These individual effects are demonstrated in Figure 3. We vary each $\theta_i$ with respect to a baseline model and then we show how these modifications translate into changes of the risk attitudes and the corresponding utility functions. The parameters used in Figure 3 are $\theta=(0.5,0.7,-0.025,-0.25)$ in dashed line and $\theta=(1.5,1.3,0.025,0.25)$ in dot-dashed line.

For this exercise we first standardize the common curve that we have estimated via the shape invariant model so that the peak occurs at the value 0 on the abscissa and the effect of the scale and shift parameters is separately captured. But we added the peak coordinates back for visualization so that they are comparable to other figures shown on returns scale.
We observe that an increase in $\theta_1$ makes the bump of the pricing kernel more distinctive while the shape of ARA remains unchanged compared to the baseline model because, as we can see from (2), ARA does not depend on $\theta_1$ when $\theta_4=0$. Yet, the effect of $\theta_1$ on ARA can be analyzed by considering two distinct cases: $\theta_4>0$ and $\theta_4$


because the direction of change in the slope of ARA is dictated by the sign of $\theta_4$. In the present case, after normalization, $\theta_1$ varies around 0 and its effect on ARA is almost nil.

A larger value in the parameter $\theta_2$ as compared to a benchmark value stretches the x-axis, which implies larger spread of the bump. When we vary $\theta_2$ alone the slope of $\mathrm{ARA}(\theta_2 u)$ is

$$\frac{1}{\theta_2^2}\left[\left\{g'^2(u)-g''(u)g(u)\right\}/g^2(u)\right].$$

The term in brackets does not depend on $\theta_2$; it is equal to the slope of $\mathrm{ARA}(u)$. Therefore, there is an inverse relationship between the direction of change in the parameter and that of the absolute value of the slope. These changes in slope occur around an inflection point that corresponds to the peak of the pricing kernel.

A positive increment in $\theta_3$ shifts both curves to the left without any modification in the shape. $\theta_4$ simply translates pricing kernel curves above or below the reference curve following a sign rule. Similarly to $\theta_2$, the shape of ARA modifies around the fixed inflection point that marks the change from risk proclivity (negative ARA) to risk aversion (positive ARA). The effect of $\theta_4$ on the values of ARA is straightforward: since $\theta_4$ adds to the $g$ in the denominator, its increase will diminish the absolute ARA level and the other way around.
Isolating the effects of a change in $\theta_4$ on the slope of $\mathrm{ARA}(u)$ analytically proves to be a more complicated task than in the case of $\theta_2$ because the change in the slope depends jointly on the change in $\theta_4$ and on the pricing kernel values and its first two derivatives. In our case, the slope around the inflection point increases when $\theta_4$ decreases.

As for the utility function, positive changes in $\theta_1$ and $\theta_4$ increase its absolute slope. In the horizontal direction, $\theta_3$ translates the curve to the left or right similarly to the pricing kernel and ARA while $\theta_2$ shrinks or expands its domain.

With this information at hand we can characterize the changes in risk patterns in relation to economic variables of interest, see Section 5.4.

3 FITTING SHAPE INVARIANT MODELS

3.1 Estimation of SIM

The model in (1) is equivalently written as

$$K_t(\theta_{t2}u+\theta_{t3})=\theta_{t1}g(u)+\theta_{t4},\qquad \theta_{t1}>0,\ \theta_{t2}>0.\qquad(3)$$

The estimation procedure is developed using the least squares criterion based on nonparametric estimates of individual curves. If there are only two curves, parameter estimates are obtained by minimizing

$$\int\left\{\hat K_2(\theta_2u+\theta_3)-\theta_1\hat K_1(u)-\theta_4\right\}^2 w(u)\,du,\qquad(4)$$


where $\hat K_i$ are nonparametric estimates of the curves. Härdle and Marron (1990) studied comparison of two curves and Kneip and Engel (1995) extended to multiple curves with an iterative algorithm. We consider an adaptation of such an algorithm here.

The weight function $w$ is introduced to ensure that the functions are compared in a domain where the common features are defined. We assume that there is an interval $[a,b]\subset J$ where boundary effects are eliminated and then define

$$w(u)=\prod_t 1_{[a,b]}\{(u-\theta_{t3})/\theta_{t2}\}.$$

The parameter estimates are compared only in the common region defined by $w$ but the individual curve estimates are defined on the whole interval. Weights can be extended to account for additional variability.

The normalization leads to:

$$T^{-1}\sum_{t=1}^{T}K_t(\theta_{t2}u+\theta_{t3})=g(u).\qquad(5)$$

Formula (5) was exploited in the algorithm proposed by Kneip and Engel (1995). We adopt a similar strategy here.

• Initialize
– Let $\hat K_t=Y_t$ and set starting values $\left(\theta_{t2}^{(0)},\theta_{t3}^{(0)}\right)$ for $t=1,2,\cdots,T$.
– Construct an initial estimate $g^{(0)}$ by
$$g^{(0)}(u)=T^{-1}\sum_{t=1}^{T}\hat K_t\left(\theta_{t2}^{(0)}u+\theta_{t3}^{(0)}\right).$$
• For the $r$-th step, $r=1,2,\cdots,R$,
– Determine parameters $\theta^{(r)}$ separately for $t=1,2,\cdots,T$ by minimizing
$$\int\left\{\hat K_t(\theta_{t2}u+\theta_{t3})-\theta_{t1}g^{(r-1)}(u)-\theta_{t4}\right\}^2 w(u)\,du.$$
– Normalize parameters: for $j=(1,2)$ and $k=(3,4)$
$$\theta_{tj}^{(r)}\leftarrow\frac{\theta_{tj}^{(r)}}{T^{-1}\sum_t\theta_{tj}^{(r)}},\qquad \theta_{tk}^{(r)}\leftarrow\theta_{tk}^{(r)}-T^{-1}\sum_t\theta_{tk}^{(r)}.$$
– Update $g^{(r-1)}$ to
$$g^{(r)}(u)=T^{-1}\sum_{t=1}^{T}\hat K_t\left(\theta_{t2}^{(r)}u+\theta_{t3}^{(r)}\right).$$


• Determine final estimates:
$$\tilde\theta_t=\theta_t^{(R)},\qquad \tilde g(u)=T^{-1}\sum_{t=1}^{T}\hat K_t\left(\tilde\theta_{t2}u+\tilde\theta_{t3}\right).$$

Kneip and Engel (1995) proved consistency of the estimator. In particular, despite nonparametric initial curve estimates, the parameters are shown to be $\sqrt{T}$-consistent. In their analysis it is noted that the initial estimates of the curves are of minor importance compared to the final estimate of $g$. So the original algorithm includes the final updating of each curve. This improves precision of the estimates because the pooled sample estimate reduces the variance of $\tilde g$, which allows undersmoothing at the final stage to reduce bias. However, this final updating step is not practical for our situation with indirect measurements and is not implemented here for pricing kernel estimation. On the other hand, we can take advantage of having smooth curves evaluated at finite grid points as data. It is easier to improve the initialization step, explained in Section 3.2. This leads to simplification of the estimating procedure with little compromise of the quality of the fit. In fact, the number of iterations required is very small and often 3 or 4 is sufficient in practical terms. We found that when the initial estimates are determined sufficiently accurately, the iteration is not necessary.

As a working model we have assumed an independent error.
If there is a reasonable dependence structure available, this could be incorporated easily in the estimation algorithm with weighted least squares estimation in (4). The effect of the independence assumption mainly appears in the standard error estimation and a correction can be made with a sandwich variance–covariance estimator. To assess the effect of model misspecification, we also carried out some simulation studies with dependent errors and reported the results in Section 4.

3.2 Starting Values

If there is no scale change in horizontal direction, due to prominent peaks in each curve, the parameter $\theta_3$ can be identified easily by the location of the individual peak. If the models hold true, and there are two unique landmarks identifiable for each curve, simple linear regression between the individual mark and the average mark provides an estimate of the slope parameter $\theta_2$. Suppose that the peak is identified by $u$ satisfying $K_t'(u)=0$. Then we have

$$0=K_t'(u)=\frac{\theta_{t1}}{\theta_{t2}}\,g'\!\left(\frac{u-\theta_{t3}}{\theta_{t2}}\right).$$

Writing $u_t^*$ for the zero of $K_t'$ and $u_0^*$ for the zero of $g'$ leads to a simple linear relation:

$$u_t^*=\theta_{t2}u_0^*+\theta_{t3}.\qquad(6)$$
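The linear relation (6) suggests a very small computation for the starting values: regress the landmarks of an individual curve on the corresponding landmarks of the reference (average) curve. The sketch below assumes that the landmark locations have already been found; the function name `starting_values` and the numerical landmark values are hypothetical and serve only to illustrate the regression step.

```python
import numpy as np

def starting_values(landmarks_t, landmarks_ref):
    """Least squares fit of u*_t = theta_t2 * u*_0 + theta_t3, as in (6).

    landmarks_t   : landmark locations of the individual curve K_t
    landmarks_ref : the same landmarks located on the reference curve g
    Returns (theta_t2, theta_t3) as (slope, intercept).
    """
    x = np.asarray(landmarks_ref, dtype=float)
    y = np.asarray(landmarks_t, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # degree-1 polynomial fit
    return slope, intercept

# Assumed landmark locations on the reference curve (two landmarks per curve).
ref = np.array([0.98, 1.05])

# Landmarks of one individual curve, generated here from known parameters.
theta2_true, theta3_true = 1.2, -0.03
obs = theta2_true * ref + theta3_true

theta2_0, theta3_0 = starting_values(obs, ref)
```

With exactly two landmarks the fit is an interpolation; with more than two, the residuals of this regression give an informal indication of whether a linear transformation of the horizontal axis is adequate.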


[Figure 4: two panels showing EPK against Returns.]
Figure 4 Initial estimates $K_t(u)$ (left) and final estimates $K_t(\theta_{t2}u+\theta_{t3})$ from SIM (right) with $g$ overlaid. Marked in the left plot are two landmarks identified for estimation of the starting values of $(\theta_{t2},\theta_{t3})$.

If an inflection point is used, we would have

$$0=K_t''(u)=\frac{\theta_{t1}}{\theta_{t2}^2}\,g''\!\left(\frac{u-\theta_{t3}}{\theta_{t2}}\right),$$

which gives rise to the same relation as (6), with the corresponding $u_t^{**}$ and $u_0^{**}$ substituted. The coefficients of intercept and slope estimates are used for starting values of $\theta_{t3}$ and $\theta_{t2}$, respectively.

We used the peak and the inflection points around 1 as landmarks, marked in Figure 4. The location of the landmarks is defined by the zero crossings of the first and second derivatives. Because the initial observations $K_t$ are a smoothed curve, we find that an additional smoothing procedure is not required at this stage: a finite difference operation is sufficient to apply the mean value theorem with linear interpolation.

The slope between any two points did not vary much, which is consistent with the model specification. This step is also used as an informal check and should there be any nonlinearity detected, the model needs to be extended to include a nonlinear transformation.
With our example, this was not the case.

3.3 Nonlinear Optimization

Given the estimates of $(\theta_{t2},\theta_{t3})$, the nonlinear least squares optimization uses (4), which is approximated by

$$\sum_j\left\{\hat K_t\left(\theta_{t2}u_j+\theta_{t3}\right)-\theta_{t1}\hat g\left(u_j\right)-\theta_{t4}\right\}^2 w\left(u_j\right).\qquad(7)$$
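For fixed $(\theta_{t2},\theta_{t3})$ the sum in (7) is quadratic in $(\theta_{t1},\theta_{t4})$, so it can be minimized by weighted least squares. The following is a minimal sketch under that assumption. The function name `fit_scale_shift`, the Gaussian-shaped common curve, and the parameter values are all hypothetical; the curves are assumed to be already evaluated at the aligned points $\theta_{t2}u_j+\theta_{t3}$, and the common curve is formed by averaging as in (5).

```python
import numpy as np

def fit_scale_shift(K_aligned, g_hat, w):
    """Minimize (7) over (theta_t1, theta_t4) for fixed (theta_t2, theta_t3).

    K_aligned : values K_hat_t(theta_t2 * u_j + theta_t3) on the grid u_j
    g_hat     : values of the common curve g_hat(u_j)
    w         : non-negative weights w(u_j)
    """
    K_aligned, g_hat, w = (np.asarray(a, dtype=float) for a in (K_aligned, g_hat, w))
    X = np.column_stack([g_hat, np.ones_like(g_hat)])   # columns: g_hat, intercept
    sw = np.sqrt(w)                                     # weighted LS via row scaling
    coef, *_ = np.linalg.lstsq(X * sw[:, None], K_aligned * sw, rcond=None)
    theta1, theta4 = coef
    rss = np.sum(w * (K_aligned - theta1 * g_hat - theta4) ** 2)
    return theta1, theta4, rss

# Hypothetical example with T = 3 curves on a common grid.
u = np.linspace(-2.0, 2.0, 81)
g_true = np.exp(-0.5 * u**2)                  # illustrative common curve
theta1 = np.array([0.8, 1.0, 1.2])            # averages to 1 (normalization)
theta4 = np.array([-0.1, 0.0, 0.1])           # averages to 0 (normalization)
K_aligned = theta1[:, None] * g_true + theta4[:, None]

g_hat = K_aligned.mean(axis=0)                # averaging step, equation (5)
w = ((u >= -1.5) & (u <= 1.5)).astype(float)  # indicator weight on an interval [a, b]
estimates = [fit_scale_shift(K_t, g_hat, w) for K_t in K_aligned]
```

In this noise-free setting the regression recovers the generating values of $(\theta_{t1},\theta_{t4})$; with real data the same step would supply initial values for a general nonlinear optimizer over all four parameters.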


When the initial values of $(\theta_{t2},\theta_{t3})$ are sufficiently accurate, this step is simplified to a linear regression. Conditional on $\theta_{t2}$, $\theta_{t3}$ and $\hat g$, the solutions to the least squares regression with response variable $\hat K_t(\theta_{t2}u_j+\theta_{t3})$ and explanatory variable $\hat g(u_j)$ provide $(\theta_{t1},\theta_{t4})$. When a further optimization routine is employed to improve the estimates, these numbers serve as initial values for $(\theta_{t1},\theta_{t4})$.

3.4 Initial Estimates of K

To start the algorithm the initial estimates of $K$ should be supplied. An example of initial estimates of $K$ is shown in Figure 4 on the scale of continuously compounded returns. These are obtained from separate estimation of $p$ and $q$, which are described below. Individual smoothing parameter choice is discussed in Section 5 with a real data example.

3.4.1 Estimation of the historical density p. We use the nonparametric kernel density estimates similar to Ait-Sahalia and Lo (2000) based on the past observations of returns for a fixed maturity $\tau$. With this approach the returns of the stock prices are assumed to vary slowly and thus the process can be assumed stationary for a short period of time.
Alternatively, if an additional modeling assumption is made for the evolution of the stock price such as GARCH, a simulation-based approach could be employed.

At given time $t$ and $T=t+\tau$ we obtain realizations of future return values from a window of historical return values of length $J$:

$$r_T^k=\log\left(S_{t-(k-1)}/S_{t-\tau-k+1}\right)\quad\text{and}\quad S_T^k=S_te^{r_T^k},\qquad k=1,\ldots,J.$$

The probability density of $r_T$ is obtained by the kernel density estimator

$$\hat p_{h_p}(r)=\frac{1}{Jh_p}\sum_{k=1}^{J}K\!\left(\frac{r_T^k-r}{h_p}\right),$$

where $K$ is a kernel weight function and $h_p$ is the bandwidth. Some variations are also explored such as overlapping and nonoverlapping windows with a real data example in Section 5.

3.5 Estimation of the Risk Neutral Density q

We begin with the call price option formula that links the call prices to the risk neutral density estimation. The European call price option formula is given


by (Ait-Sahalia and Duarte, 2003)

$$C\left(X,\tau,r_{t,\tau},\delta_{t,\tau},S_t\right)=e^{-r_{t,\tau}\tau}\int_0^\infty\max(S_T-X,0)\,q\left(S_T\mid\tau,r_{t,\tau},\delta_{t,\tau},S_t\right)dS_T$$

where
• $S_t$: the underlying asset price at time $t$,
• $X$: the strike price,
• $\tau$: the time to maturity,
• $T=t+\tau$: the expiration date,
• $r_{t,\tau}$: the deterministic risk free interest rate for that maturity,
• $\delta_{t,\tau}$: the corresponding dividend yield of the asset.

Write $q(S_T)$ for $q(S_T\mid\tau,r_{t,\tau},\delta_{t,\tau},S_t)$. For fixed $t$ and $\tau$, assuming $r_{t,\tau}=r$ and $\delta_{t,\tau}=\delta$, the risk neutral density is expressed as

$$q(u)=e^{r\tau}\left.\frac{\partial^2C}{\partial X^2}\right|_{X=u}.$$

The relation is due to Breeden and Litzenberger (1978) and serves as the basis of many current semiparametric and nonparametric approaches. We employ the semiparametric estimates of Rookley (1997), where the parametric Black–Scholes formula is assumed except that the volatility parameter $\sigma$ is a function of the option's moneyness and the time to maturity $\tau$. In this work, we fix the maturity and consider it as a one dimensional regression problem.

Define $F=S_te^{(r-\delta)\tau}$ and let $m=X/F$ denote moneyness. Write $\Phi$ and $\varphi$ for the cumulative distribution function and its density of a standard normal random variable, respectively. The Black–Scholes model assumes

$$C_{BS}(X,\tau)=S_te^{-\delta\tau}\Phi(d_1)-e^{-r\tau}X\Phi(d_2)=e^{-r\tau}F\{\Phi(d_1)-m\Phi(d_2)\}.$$

In a semiparametric call price function, the volatility parameter $\sigma$ is expressed as a function of the option's moneyness and the time to maturity $\tau$:

$$C(X,\tau,r,\delta,S_t)=C_{BS}(X,\tau,F,\sigma(m,\tau)).$$

To derive the second derivative of $C$, it is simpler to work with a standardized call price function $c(m,\tau)=e^{r\tau}C(X,\tau,r,\delta,\sigma)/F=\Phi(d_1)-m\Phi(d_2)$.
The derivatives of $C$ and $c$ are related as

$$\frac{\partial C}{\partial X}=e^{-r\tau}F\,\frac{\partial c}{\partial m}\frac{\partial m}{\partial X}=e^{-r\tau}\frac{\partial c}{\partial m},\qquad \frac{\partial^2C}{\partial X^2}=e^{-r\tau}\frac{\partial^2c}{\partial m^2}\frac{\partial m}{\partial X}=\frac{e^{-r\tau}}{F}\frac{\partial^2c}{\partial m^2}.$$


With some manipulation we obtain the following expressions, which are only functions of $(\sigma,\sigma',\sigma'')$:

$$\frac{\partial c}{\partial m}=\varphi(d_1)\frac{\partial d_1}{\partial m}-\Phi(d_2)-m\varphi(d_2)\frac{\partial d_2}{\partial m}$$

$$\frac{\partial^2c}{\partial m^2}=-d_1\varphi(d_1)\left(\frac{\partial d_1}{\partial m}\right)^2+\varphi(d_1)\frac{\partial^2d_1}{\partial m^2}-\varphi(d_2)\frac{\partial d_2}{\partial m}-\varphi(d_2)\frac{\partial d_2}{\partial m}+md_2\varphi(d_2)\left(\frac{\partial d_2}{\partial m}\right)^2-m\varphi(d_2)\frac{\partial^2d_2}{\partial m^2},$$

where

$$\frac{\partial d_1}{\partial m}=-\frac{1}{\sqrt{\tau}}\frac{1}{m\sigma(m,\tau)}+\frac{1}{\sqrt{\tau}}\ln(m)\frac{\sigma'(m,\tau)}{\sigma^2(m,\tau)}+\frac{\sqrt{\tau}}{2}\sigma'(m,\tau)$$

$$\frac{\partial d_2}{\partial m}=\frac{\partial d_1}{\partial m}-\sqrt{\tau}\,\sigma'(m,\tau)$$

$$\frac{\partial^2d_1}{\partial m^2}=\frac{1}{m^2\sqrt{\tau}\,\sigma(m,\tau)}+\frac{2}{\sqrt{\tau}}\frac{\sigma'(m,\tau)}{\sigma^2(m,\tau)}\left\{\frac{1}{m}-\ln(m)\frac{\sigma'(m,\tau)}{\sigma(m,\tau)}\right\}+\sigma''(m,\tau)\left\{\frac{\ln(m)}{\sigma^2(m,\tau)\sqrt{\tau}}+\frac{\sqrt{\tau}}{2}\right\}$$

$$\frac{\partial^2d_2}{\partial m^2}=\frac{\partial^2d_1}{\partial m^2}-\sqrt{\tau}\,\sigma''(m,\tau).$$

Note that this leads to a slightly different derivation from Rookley (1997), albeit using the same principle.

In order to compute the derivatives of $\sigma$, we used local polynomial smoothing on implied volatility. Let $\sigma_i$ be the implied volatility corresponding to the call price $C_i$ with moneyness $m_i$. The local polynomial smoothing estimates are obtained by minimizing

$$\sum_i\left\{\sigma_i-\sum_{j=0}^{3}\beta_j(m)(m_i-m)^j\right\}^2W\left((m_i-m)/h_q\right),$$

where $W(\cdot)$ is a weight function. The estimates are computed as $\hat\sigma(m)=\hat\beta_0(m)$, $\hat\sigma'(m)=\hat\beta_1(m)$ and $\hat\sigma''(m)=2\hat\beta_2(m)$. Substituting the estimates into the above expressions gives an estimate of $q$. The density estimates are defined on the scale of $S_T$. To define the density on the same returns scale $r_T=\log(S_T/S_t)$ as $p$, a simple transformation can be applied:

$$q(r_T)=q(S_T)S_T.$$

Notice that all results are shown on a continuously compounded 1-month period returns $R_T=1+r_T=1+\log(S_T/S_t)$.
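The Breeden and Litzenberger relation $q(u)=e^{r\tau}\,\partial^2C/\partial X^2|_{X=u}$ can be checked numerically in the simplest setting. The sketch below is not the semiparametric Rookley-type estimator used in the paper: it assumes a constant volatility, the standard Black–Scholes definitions $d_1=\{\ln(F/X)+\sigma^2\tau/2\}/(\sigma\sqrt{\tau})$ and $d_2=d_1-\sigma\sqrt{\tau}$, and a central second difference in the strike. All function names and parameter values are hypothetical.

```python
from math import erf, exp, log, pi, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(X, S, r, delta, sigma, tau):
    """Black-Scholes call price with constant volatility (assumed setting)."""
    F = S * exp((r - delta) * tau)
    d1 = (log(F / X) + 0.5 * sigma**2 * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return exp(-r * tau) * (F * norm_cdf(d1) - X * norm_cdf(d2))

def rnd_from_calls(X, S, r, delta, sigma, tau, h=1e-2):
    """q(X) = exp(r*tau) * d^2C/dX^2, approximated by a central second difference."""
    c_up = bs_call(X + h, S, r, delta, sigma, tau)
    c_mid = bs_call(X, S, r, delta, sigma, tau)
    c_dn = bs_call(X - h, S, r, delta, sigma, tau)
    return exp(r * tau) * (c_up - 2.0 * c_mid + c_dn) / h**2

def lognormal_rnd(X, S, r, delta, sigma, tau):
    """Log normal risk neutral density of S_T implied by Black-Scholes."""
    mu = log(S) + (r - delta - 0.5 * sigma**2) * tau
    s = sigma * sqrt(tau)
    return exp(-((log(X) - mu) ** 2) / (2.0 * s**2)) / (X * s * sqrt(2.0 * pi))

# Illustrative (assumed) inputs: one-month maturity, 20% volatility.
params = dict(S=100.0, r=0.02, delta=0.0, sigma=0.2, tau=1.0 / 12.0)
strikes = [90.0, 95.0, 100.0, 105.0, 110.0]
q_numeric = [rnd_from_calls(X, **params) for X in strikes]
q_exact = [lognormal_rnd(X, **params) for X in strikes]
```

In this constant-volatility case the finite-difference density agrees closely with the log normal density; when $\sigma$ depends on moneyness, the analytic derivatives given above replace the finite difference.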


3.6 Word on Asymptotics

There are two layers of estimation involved. The first step deals with individual curve estimation and the second step introduces shape invariant modeling. The shape invariant modeling is largely robust to how the data are prepared before entering the iterative algorithm and the resulting estimates are interpreted as conditional on the individual curves. Therefore, the main estimation error arises in the first stage where $p$ and $q$ are separately estimated with possibly different sample sizes and separately chosen bandwidths.

In practical terms, the sample size used in estimating $p$ is normally of smaller order, say $n$ compared to $N=nM$ for $q$ for a constant $M$. This is due to the difference between the daily observations available for estimating $p$ and the intraday observations available for estimating $q$. Thus it might be expected that the estimation error will be dominated by the estimation error of $p$. On the other hand, the underlying function $p$ for which simple kernel estimation is used is much simpler and more stable compared to $q$ for which nonparametric second derivative estimation is required.

Because the estimates of ratios are constructed from the ratio of the estimates, we can decompose the error as

$$\hat K(u)-K(u)=\frac{\hat q(u)}{\hat p(u)}-\frac{q(u)}{p(u)}\simeq\frac{\hat q(u)-q(u)}{p(u)}-\frac{q(u)}{p(u)}\,\frac{\hat p(u)-p(u)}{p(u)}.$$

Numerical instability might occur in the region where $\hat p\approx0$; however, this is not of theoretical concern.
In fact, the pricing kernel is the Radon-Nikodym derivative of an absolutely continuous measure, and thus $p$ and $q$ are equivalent measures, that is, the null set of $p$ is the same as the null set of $q$. So we can limit our attention to the case where $p(u)>\epsilon$ for some constant $\epsilon$. Provided that $p(u)>\epsilon$ and $q(u)>\epsilon$, the asymptotic approximation is straightforward and asymptotic bias and variance can be computed from

$$E\left[\hat K(u)-K(u)\right]\simeq\frac{E\left[\hat q(u)-q(u)\right]}{p(u)}-\frac{q(u)}{p(u)}\,\frac{E\left[\hat p(u)-p(u)\right]}{p(u)}=O\left(h_q^4\right)+O\left(h_p^2\right)+O\left(h_p^2+h_q^4\right),$$

$$\mathrm{Var}\left[\hat K(u)-K(u)\right]\simeq K^2(u)\left\{\frac{\mathrm{Var}\left[\hat q(u)\right]}{q^2(u)}+\frac{\mathrm{Var}\left[\hat p(u)\right]}{p^2(u)}\right\}=O\left\{(Nh_q)^{-1}\right\}+O\left\{(nh_p)^{-1}\right\}+O\left\{(Nh_q)^{-1}+(nh_p)^{-1}\right\}.$$

Since $\hat q$ involves estimation of the second derivative of a regression function, the error is dominated by the estimation of $q$. The optimal rate of convergence for $q$ is $O(N^{-2/9})$


while that for $p$ is $O(n^{-2/5})$. These will be equivalent when $M=O(n^{39/15})>O(n^2)$. In practice $M$ is of much smaller order and therefore the leading error terms come from the estimation of $q$. Ait-Sahalia and Lo (2000) showed in a similar framework that the error is dominated by the estimation of $q$ and for the purpose of asymptotics $p$ can be regarded as a fixed quantity. For this reason we actually implement a semiparametric estimator for $q$ to stabilize the estimator.

Consistency and asymptotic normality of the parameter estimates are shown in Härdle and Marron (1990) for two curves and in Kneip and Engel (1995) for multiple curves. We write the approximate distribution for $\hat\theta_t$ as

$$\hat\theta_t\approx N(\theta_t,\Sigma_t).$$

Due to the iterative algorithm, the asymptotic covariance matrix is more complicated for multiple curves but Kneip and Engel (1995) show that, as the number of curves increases, the additional terms arising in the covariance matrix are of lower order than the standard error term due to nonlinear least squares methods. There is no suggested estimate for the asymptotic covariance matrix but a consistent estimate can be constructed as in standard nonlinear least squares methods.
Define the residual $\hat e_{tj}=\hat K_t(u_j)-\tilde K_t(u_j)$, where $\hat K$ denotes the initial estimates and $\tilde K$ the SIM estimates, and let

$$\hat\sigma_t^2=\frac{1}{n}\sum_{j=1}^{n}\hat e_{tj}^2.$$

The covariance matrix can be estimated as

$$\hat\Sigma_t=\hat\sigma_t^2\left[n^{-1}\sum_{j=1}^{n}\left\{\nabla_\theta\tilde K_t\left(u_j;\tilde\theta\right)\right\}\left\{\nabla_\theta\tilde K_t\left(u_j;\tilde\theta\right)\right\}^{\top}\right]^{-1},$$

where $\nabla_\theta K(u;\theta)$ is the first derivative of the function, given by

$$\frac{\partial K(u)}{\partial\theta_1}=g\!\left(\frac{u-\theta_3}{\theta_2}\right),$$

$$\frac{\partial K(u)}{\partial\theta_2}=-\frac{\theta_1}{\theta_2}\left(\frac{u-\theta_3}{\theta_2}\right)g'\!\left(\frac{u-\theta_3}{\theta_2}\right),$$

$$\frac{\partial K(u)}{\partial\theta_3}=-\frac{\theta_1}{\theta_2}\,g'\!\left(\frac{u-\theta_3}{\theta_2}\right),$$

$$\frac{\partial K(u)}{\partial\theta_4}=1.$$

To see whether the location or scale parameters are different between any pair of curves, we can compute the standard errors of the estimates to make a comparison. Formal hypothesis testing also appears in Härdle and Marron (1990) for kernel-based estimates and in Ke and Wang (2001) for spline-based estimates. For example


we might be interested in testing whether a location or a scale parameter can be removed.

Table 1 Parameter values of $\theta$

              Distribution   Mean   Standard deviation
$\theta_1$    Log-normal     1      0.33
$\theta_2$    Log-normal     1      0.28
$\theta_3$    Normal         0      0.27
$\theta_4$    Normal         0      0.35

Although these results are practically relevant, we note that the methods mentioned all assume direct observations of the underlying function of interest, with one smoothing parameter selection involved. Obtaining comparable rigorous results for our estimator is complicated in the present situation due to the nonstandard nature of the estimator being a ratio of two separate nonparametric estimates with independent bandwidths. We consider this out of scope of this paper and leave it for separate work.

4 NUMERICAL STUDIES OF SIM ESTIMATION

Applying the SIM to pricing kernels involves two rather separate estimation steps, the initial estimation of the pricing kernels and the SIM estimation given the pre-estimates. The former has been studied extensively and in particular the properties of the nonparametric methods that we have used are well established in the literature. This section mainly concerns the latter.

We identify the two main factors that could affect the performance of SIM estimation to be error misspecification and smoothing parameter selection for the individual curves.
Their effects are evaluated in the following simulation studies. The effects on pricing kernel estimation are separately studied in Section 5.4, in comparison to the standard nonparametric approach used in Jackwerth (2000).

4.1 Generating Curves

In each simulation 50 curves are generated at 50 (random uniform) grid points. In order to mimic the common shape of the observed pricing kernel, we generated the common curve by a ratio of two densities

$$g(u)=q_0(u)/p_0(u),$$

where $p_0$ is the density of a Gamma(0.8,1) distribution and $q_0$ is the density of a mixture $w\ast\text{Gamma}(0.2,1)+(1-w)\ast N(0.91,0.3^2)$ distribution with $w=0.3$. In accordance with the normalization scheme, the $\theta$ values are set as in Table 1. The values of the


390 Journal of F<strong>in</strong>ancial EconometricsTable 2 Parameter values <strong>for</strong> error specificationError 1 Error 2 Error3Case 1 σ 0.02 0.05 0.10Case 2 φ 0.75 0.75 0.75σ u 0.02 0.03 0.09Case 3 α −3.69 −2.99 −2.30β 0.75 0.52 0.53σ v 0.01 0.02 0.02Case 4 α −2.41 −1.89 −1.39β 0.45 0.40 0.42φ 0.75 0.45 0.45σ v 0.10 0.25 0.25standard deviation were chosen to be similar to the observed ones <strong>in</strong> the real dataexample.4.2 Error SpecificationFor the error specification, we have <strong>in</strong>cluded dependent errors <strong>in</strong> time as well as <strong>in</strong>moneyness as follow<strong>in</strong>g.• Case 1: Independent error: ε t,j ∼N ( 0,σ 2)• Case 2: Dependent error <strong>in</strong> moneyness:( )ε t,j =φε t,j−1 +u t,j , the set of the u t,j ∼N 0,σu2• Case 3: Dependent error <strong>in</strong> time: ε t,j ∼N ( 0,σt2 )log(σ t )=α+β log(σ t−1 )+v t ,( )v t ∼N 0,σv2• Case 4: Dependent error <strong>in</strong> moneyness and time:( )ε t,j = φε t,j−1 +u t,j , u t,j ∼N 0,σut2 ,log(σ ut ) = α+β log ( ) ( )σ u,t−1 +vt , v t ∼N 0,σv2Downloaded from http://jfec.ox<strong>for</strong>djournals.org/ at <strong>Humboldt</strong>-Universitaet zu Berl<strong>in</strong> on May 17, 2013Cases 1 and 2 are commonly assumed but Cases 3 and 4 were rarely used <strong>in</strong>the literature with SIM estimation. Table 2 lists the parameter values used <strong>for</strong>simulation. These values are chosen to be comparable <strong>in</strong> terms of overall variabilityamong cases.
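The data-generating steps of Sections 4.1 and 4.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: Gamma(a, 1) is read as shape a with scale 1, the grid interval [0.5, 1.5] is hypothetical, each AR(1) error starts from zero, and the log-volatility recursion starts from its stationary mean α/(1−β). The parameter values are the "Error 2" column of Table 2; the function names are illustrative.

```python
import math
import random

def gamma_pdf(u, shape, scale=1.0):
    """Gamma density, assuming Gamma(a, b) means shape a and scale b."""
    if u <= 0:
        return 0.0
    return u ** (shape - 1) * math.exp(-u / scale) / (math.gamma(shape) * scale ** shape)

def normal_pdf(u, mean, sd):
    z = (u - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def common_curve(u, w=0.3):
    """g(u) = q0(u) / p0(u) from Section 4.1."""
    q0 = w * gamma_pdf(u, 0.2) + (1.0 - w) * normal_pdf(u, 0.91, 0.3)
    p0 = gamma_pdf(u, 0.8)
    return q0 / p0

def uniform_grid(n_points=50, low=0.5, high=1.5, seed=0):
    """Random uniform grid; the interval [low, high] is an assumption."""
    rng = random.Random(seed)
    return sorted(rng.uniform(low, high) for _ in range(n_points))

# "Error 2" column of Table 2, keyed by case number.
ERROR2 = {
    1: {"sigma": 0.05},
    2: {"phi": 0.75, "sigma_u": 0.03},
    3: {"alpha": -2.99, "beta": 0.52, "sigma_v": 0.02},
    4: {"alpha": -1.89, "beta": 0.40, "phi": 0.45, "sigma_v": 0.25},
}

def simulate_errors(case, n_curves=50, n_points=50, seed=0):
    """Return an n_curves x n_points list of errors for Cases 1 to 4."""
    rng = random.Random(seed)
    p = ERROR2[case]
    phi = p.get("phi", 0.0)            # no dependence in moneyness if absent
    time_varying = "alpha" in p        # Cases 3 and 4
    if time_varying:
        log_sd = p["alpha"] / (1.0 - p["beta"])   # assumed starting value
    errors = []
    for _ in range(n_curves):          # index t (time)
        if time_varying:
            log_sd = p["alpha"] + p["beta"] * log_sd + rng.gauss(0.0, p["sigma_v"])
            sd = math.exp(log_sd)
        else:
            sd = p["sigma"] if case == 1 else p["sigma_u"]
        row, prev = [], 0.0            # assumed start of the AR(1) recursion
        for _ in range(n_points):      # index j (moneyness)
            prev = phi * prev + rng.gauss(0.0, sd)
            row.append(prev)
        errors.append(row)
    return errors
```

For example, `[common_curve(u) for u in uniform_grid()]` evaluates the common curve on one simulated grid, and `simulate_errors(4)` draws errors that are dependent in both moneyness and time. How the θ parameters of Table 1 enter each curve depends on the SIM specification given earlier in the paper and is deliberately left out here.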


4.3 Smoothing Parameter Selection

We consider three versions of the least squares cross-validation (CV) based criteria for bandwidth selection:

$$CV_t(h) = \sum_{i=1}^{n} \left\{ Y_{t,i} - \hat{K}^{-(i)}_{t,h}(u_i) \right\}^2,$$

where $\hat{K}^{-(i)}_{t,h}$ is the local linear fit without using the $i$-th observation. For each observed curve we find the optimal bandwidth $h_t^* = \arg\min_h CV_t(h)$. Due to considerable variability in the x-dimension, we standardize the optimal bandwidths ($\tilde{h}_t^* = h_t^*/s_t$), where $s_t$ is the empirical standard deviation of $u_i$, and we choose the common bandwidth as follows:

$$h_{opt,1} = \max_t(\tilde{h}_t^*), \qquad h_{opt,2} = \mathrm{average}_t(\tilde{h}_t^*), \qquad \text{or} \qquad h_{opt,3} = \arg\min_h \sum_t CV_t(h).$$

Finally, we multiply $h_{opt}$ by $s_t$ and use these values to perform smoothing of each curve.

4.4 Results of Simulation

We considered various simulation scenarios based on the combinations of the error cases and the bandwidth selection methods. Table 3 summarizes the goodness of fit measured by MSE for the case σ = 0.05. For comparison, we added in the last row the MSE for the standard nonparametric estimates based on individual optimal bandwidths, which works to their advantage. For larger error (σ = 0.1, not shown) we also observed some dramatic deterioration with Case 4. Nevertheless, the simulation studies suggest that the overall error is of the same order of magnitude, and we suspect that the impact of these factors is limited. The fit was, however, best with smoothing parameters selected by $h_1$ (that is, $h_{opt,1}$).

5 REAL DATA EXAMPLE

We use intraday European options data on the Deutscher Aktien index (DAX), provided by the European Exchange EUREX and maintained by the CASE, RDC SFB 649 (http://sfb649.wiwi.hu-berlin.de) in Berlin. We have identified options data with maturity one month (31 working days/23 trading days) from June 2003 to June 2006 from DAX 30 Index European options, which adds up to 37 days. We obtain the initial estimates for p and q as described in Section 3.4. For the choice of kernel functions, we have used the quartic function for both p and q.
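The bandwidth rules of Section 4.3 can be made concrete with a short Python sketch. It is a minimal illustration under stated assumptions rather than the authors' implementation: the local linear smoother uses the quartic kernel, the search runs over a user-supplied grid of standardized bandwidths (so each curve is smoothed with `g * s_t`), and $h_{opt,3}$ is read as the minimizer of the summed CV on that standardized scale. All function names are illustrative.

```python
import statistics

def quartic(z):
    """Quartic (biweight) kernel supported on (-1, 1)."""
    return 15.0 / 16.0 * (1.0 - z * z) ** 2 if abs(z) < 1.0 else 0.0

def local_linear(u0, u, y, h, skip=None):
    """Local linear fit at u0; observation `skip` is left out if given."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for i, (ui, yi) in enumerate(zip(u, y)):
        if i == skip:
            continue
        d = ui - u0
        w = quartic(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    det = s0 * s2 - s1 * s1
    if det <= 1e-12:
        return None                      # too few points inside the window
    return (s2 * t0 - s1 * t1) / det     # intercept of the weighted line

def cv_score(u, y, h):
    """Least squares leave-one-out CV_t(h) for a single curve."""
    total = 0.0
    for i in range(len(u)):
        fit = local_linear(u[i], u, y, h, skip=i)
        if fit is None:
            return float("inf")
        total += (y[i] - fit) ** 2
    return total

def common_bandwidths(curves, grid):
    """curves: list of (u, y) pairs; grid: candidate standardized bandwidths."""
    scales = [statistics.stdev(u) for u, _ in curves]          # s_t
    table = [[cv_score(u, y, g * s) for g in grid]
             for (u, y), s in zip(curves, scales)]
    # Standardized optimum per curve (h_t^* / s_t), restricted to the grid.
    h_tilde = [grid[min(range(len(grid)), key=row.__getitem__)] for row in table]
    totals = [sum(row[k] for row in table) for k in range(len(grid))]
    return {
        "h_opt1": max(h_tilde),
        "h_opt2": statistics.mean(h_tilde),
        "h_opt3": grid[min(range(len(grid)), key=totals.__getitem__)],
        "scales": scales,
    }
```

The bandwidth actually applied to curve t is the chosen common value multiplied by `scales[t]`, mirroring the final rescaling step in Section 4.3.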


Table 3 Comparison of SIM estimation with respect to error misspecification and smoothing parameter selection (σ = 0.05)

Method   Parameter   Case 1   Case 2   Case 3   Case 4
h_1      θ_1           31       32       67       65
         θ_2           60       70       84       77
         θ_3           54       62       81       76
         θ_4           32       32       77       75
         K_i s        1.2      1.6      1.5      1.5
h_2      θ_1           67       68       80       69
         θ_2          115      115      110       99
         θ_3          111      110      105      103
         θ_4           70       72       99       85
         K_i s        1.1      1.6      1.9      1.9
h_3      θ_1           67       71       67       73
         θ_2          115      108       91       82
         θ_3          111      100       88       84
         θ_4           70       74       83       88
         K_i s        1.1      1.6      1.8      1.8
npK                   3.5      2.0      4.2      3.6

Numbers are MSE multiplied by 10000. The K_i s rows give the average MSE over all curves from SIM; npK gives the same measure without SIM but using individual optimal bandwidths for each curve.

5.1 Estimation of the Risk Neutral Density q

The stock index price varies within one day and we would need to identify the price at which a certain transaction has taken place. However, several authors (e.g. Jackwerth, 2000) report that the intraday change of the index price is stale, and we use instead the prices of futures contracts closest to the time of the registered pair of the option and strike prices to derive the corresponding stock price, corrected for dividends and difference in taxation following a methodology described in Fengler (2005).

The data contain the actually traded call prices, the implied stock index price corrected for the dividends from the futures derivatives on the DAX, the strike prices, the interest rates (linearly interpolated based on EURIBOR to approximate a riskless interest rate for the specific option's time to maturity), the maturity, the type of the options, calculated moneyness, calculated Black and Scholes implied volatility, the volume, and the date. For each day, we use only at-the-money and out-of-the-money call options and in-the-money puts to compute the Black–Scholes implied volatilities. This guarantees that unreliable observations (high volatility) will be removed from our estimation samples. Since the intraday stock price varies, we use its median ($S_t$) to compute the risk neutral density and correct the strike price to preserve the ratio relative to the underlying stock price. For this price, we verify whether our observations satisfy the no-arbitrage condition:

$$S_t \ge C_i \ge \max(S_t - X_i e^{-r\tau}, 0),$$

where $X_i$ is the adjusted strike price and $C_i$ is the corresponding call price. For the remaining observations $(X_i, C_i)$ we compute the $(m_i, \sigma_i)$ counterparts for the fixed $S_t$ by implicitly assuming that the volatility does not depend on the changes in the intraday stock price. The estimates are computed based on these pairs $(m_i, \sigma_i)$.

5.2 Estimation of the Historical Density p

We compute the nonparametric kernel density estimates as described earlier. Jackwerth (2000) argues that some discrepancies between the nonparametric estimates are attributed to overlapping and nonoverlapping windows of the past observations selected. For comparison to the earlier works, we also experimented with a choice of time varying equity premium and constant equity premium (we demean the densities and replace the mean with the risk free rate on the estimation day plus 8% equity premium per annum as in Jackwerth (2000), adjusted for the corresponding maturity), overlapping and nonoverlapping returns, and window lengths of 2, 4, and 6 years, respectively. The estimates for different choices of parameters are then compared in terms of pricing kernel, implied risk aversion and implied utility function estimation.
We find that, under varying degrees of assumptions on the model, common characteristics such as peaks and skewness are repeatedly observed in a wide range of estimates.

5.3 Smoothing Parameter Selection

In contrast to the simulation studies, the effect of the smoothing parameter is less transparent with real data when we estimate p and q separately. At first glance, the bandwidth selection for q seems more influential than that for p in gauging performance of the estimates, as it involves derivative estimation. Figure 5 examines the effect of the bandwidth choices on $\hat{q}$. The top left panel shows the implied volatility estimates overlaid, the top right shows the first derivative estimates, and the bottom left shows the second derivative estimates, which are used as inputs to create the estimates of q in the bottom right panel. The bandwidths used are (0.05, 0.10, 0.15, 0.20). With the apparent undersmoothing at the smallest bandwidth, there is notable variability in terms of smoothness in the estimation of implied volatility and its derivatives; however, the resulting density estimates demonstrate robustness. Similar observations are made for other dates. By smoothing in the implied volatility domain, we find that the estimates are stable over a relatively wide range of bandwidth choices.
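Returning to the data screening of Section 5.1, the no-arbitrage condition $S_t \ge C_i \ge \max(S_t - X_i e^{-r\tau}, 0)$ can be checked observation by observation. The sketch below is a minimal illustration; the function names and the example quotes are hypothetical and do not come from the DAX data set.

```python
import math

def satisfies_no_arbitrage(call, strike, spot, rate, tau):
    """Check S_t >= C_i >= max(S_t - X_i * exp(-r * tau), 0)."""
    lower = max(spot - strike * math.exp(-rate * tau), 0.0)
    return spot >= call >= lower

def filter_quotes(quotes, spot, rate, tau):
    """Keep the (strike, call) pairs that satisfy the no-arbitrage bounds."""
    return [(x, c) for x, c in quotes
            if satisfies_no_arbitrage(c, x, spot, rate, tau)]

# Hypothetical quotes (adjusted strike, call price) for a median spot of 100.
example = [(95.0, 6.0), (95.0, 4.0), (105.0, 0.5), (105.0, 101.0)]
kept = filter_quotes(example, spot=100.0, rate=0.02, tau=1.0 / 12.0)
```

In this example the second quote is dropped because the call is priced below its intrinsic lower bound, and the fourth because the call is priced above the underlying.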


[Figure 5: four panels showing implied volatility σ, the first derivative of σ, and the second derivative of σ against moneyness, and the RND against returns.]

Figure 5 Example of q estimates with varying bandwidths (0.05, 0.1, 0.15, 0.20). The first three panels show estimates of implied volatility and its first and second derivatives. The corresponding densities are shown in the lower right panel. Estimates are stable for a wide range of bandwidth choices.

For a systematic choice, we employed a version of the CV criteria ($h_{opt,1}$ defined in Section 4.3) for p and q estimation. For estimation of q, we have used the least squares CV for local cubic estimation to include the second derivative of σ:

$$CV(h_q) = \sum_{i=1}^{n} \sum_{j \ne i} \left\{ \sigma_i - \hat{\sigma}^{(0)}_{h_q,-i}(m_i) - \hat{\sigma}^{(1)}_{h_q,-i}(m_i)(m_j - m_i) - \frac{1}{2}\,\hat{\sigma}^{(2)}_{h_q,-i}(m_i)(m_j - m_i)^2 \right\}^2 w(m_i),$$

where $\hat{\sigma}^{(k)}_{h_q,-i}$ is the $k$-th derivative estimate without the $i$-th observation $(m_i, \sigma_i)$ and $0 \le w(m_i) \le 1$ is a weight function. The $h_1$-optimal bandwidth in implied volatility space turns out to be $h_q = 0.2$.

For estimation of p, we have used the likelihood CV for each curve on the returns scale:

$$\log L(h_p) = \sum_{i=1}^{n} \log \hat{p}^{-(i)}_{h_p}(r_i),$$


where $\hat{p}^{-(i)}_{h_p}(r_i)$ is the leave-one-out kernel estimator for $p_{h_p}(r_i)$. However, we found that the optimal bandwidth selected tends to systematically oversmooth, and thus we chose a smaller value close to the maximum of individually optimal bandwidths, which is in our case $h_p = 0.05$.

[Figure 6: four panels showing EPK, ARA, utility, and mean ARA, each plotted against returns.]

Figure 6 Illustration of SIM with common EPK, ARA, utility function, and mean ARA.

5.4 Estimation of Pricing Kernels, ARA and Utility Function

We have considered in Section 5.2 various options for the parameter choice in estimating p and have ended up with 12 series of pre-estimates of the pricing kernel. We are interested in seeing how these choices influence the estimated common curves and $\theta_t$ parameters by SIM. Since, as it turns out, the results are very similar among specifications, we depict graphically only four of them in Figures 6 and 7: those based on nonoverlapping (solid) and overlapping (dashed) returns over the last two years, and nonoverlapping returns over the last four (dot-dashed) and six (dotted) years, respectively, with varying equity premium. The added lines in Figure 7 are the 95% pointwise confidence band for the first series of pre-estimates.

The common curves are represented in Figure 6.
All estimates display a paradoxical feature: the pricing kernel has a bump, ARA has a region of negative values that corresponds to the increasing region in the pricing kernel, and the utility function has a convex region in the domain around the peak of the pricing kernel. The variability among curves is expressed by the $\theta_t$-s. In Figure 7, we observe that the main difference in the dynamics of different series has to do with the magnitude but less with the direction of change. In addition, we computed the mean of implied ARA corresponding to our estimation period by computing the sample average and found that it was similar to the mean ARA for S&P500 appearing in Figure 3C (19 March 1991 to 19 August 1993) in Jackwerth (2000), and to a certain extent to the yearly average from 2003 and 2005 shown in Figure 4 in Chabi-Yo et al. (2008). It is worth noting that the mean ARA and the common ARA curves differ a great deal due to the nonlinear transformation involved in deriving ARA from the pricing kernel, e.g. see Equation (2) in Section 2.4. This is not surprising, since the interpretation of the common curve is different from that of the average curve; in particular, the common curve and the mean curves have different scales of the x-domains by means of registration.

[Figure 7: panels showing the estimated θ parameters over time, 2003/06 to 2006/05.]

Figure 7 Estimated SIM parameters under variations in the choice of the window lengths of returns values.

5.5 Relation to Macroeconomic Variables

With the aid of the SIM model for EPK, we wish to characterize changes in risk patterns in relation to economic variables of interest.
Before doing this, we should mention that in the case of nonstandard common curves (in our empirical example the peak does not occur at 0) both $\theta_1$ and $\theta_2$ introduce a shift effect in EPK together with its shape effect. In order to disentangle these effects and improve interpretation, we first standardize the EPK curves by the location of the peak before applying SIM.


This introduces two more parameters, the horizontal and vertical coordinates of the peaks, in the analysis. Since their shift effect is comprised by parameters $\theta_3$ and $\theta_4$, we will not treat them here separately.

Table 4 Correlation table for the first difference of SIM parameters and the selected macroeconomic variables

        θ_1     θ_2      θ_3      θ_4      CS        DAX       YT
θ_1     1.00    0.55*    0.02     0.78*    −0.25     0.38**    −0.26
θ_2             1.00     0.38*    −0.04    0.06      −0.12     −0.39**
θ_3                      1.00     −0.18    0.07      −0.21     −0.28***
θ_4                               1.00     −0.37**   0.62*     −0.04

*, **, and *** significant at 1%, 5%, and 10% levels, respectively.

Previous studies trying to link the parameters describing risk attitudes to the business conditions include Rosenberg and Engle (2002). Based on power pricing kernel specifications, they show that risk aversion is counter-cyclical. Other related work investigates the relation between business conditions and equity premiums (e.g., Fama and French, 1989), smile asymmetry of volatility (Bekaert and Wu, 2000; Drechsler and Yaron, 2010), or market efficiency (Marshall, Cahan, and Cahan, 2008). The advantage of our approach over Rosenberg and Engle (2002) is that it allows us to identify how the change in economic variables relates to the shape of a nonparametrically estimated pricing kernel. Due to the limited sample size (37 observations), it is impossible to estimate a structural model that correctly deals with the simultaneity of our set of dependent variables.
Further research will involve the estimation of a (S)VAR specification, in order to account for the aforementioned endogeneity. We instead evaluate the potential univariate correlations between the estimated $\theta_t$ parameters and macroeconomic variables associated with the business cycle and interpret our results from the perspective of local EPK and risk aversion functions. We use the following variables that have a revealed relation with the state of the economy: the credit spread (CS) is the difference between the yield on the corporate bond, based on the German CORPTOP Bond maturing in 3–5 years, and the government bond maturing in 5 years; the yield curve slope (YT) refers to the difference between the 30-year government bond yield and the three-month interbank rate; the short-term interest rate (IR) is the three-month interbank rate; and the DAX 30 Performance index serves as a proxy for consumption. Depending on data availability, we collect daily or monthly data. Tests on unit roots failed to reject stationarity in all parameter series and economic variables; we therefore work with their first difference.
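The correlation analysis on first differences described above can be sketched as follows. This is an illustration only: the two input series are made-up numbers, not the estimated $\theta_t$ or the DAX, and the t-statistic for a Pearson correlation is shown as one common way to judge significance, since the text does not state which test underlies the stars in Table 4.

```python
import math

def first_difference(series):
    """Return x_t - x_{t-1} for t = 2, ..., n."""
    return [b - a for a, b in zip(series, series[1:])]

def pearson(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t-statistic of a correlation r from n pairs (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical series standing in for one theta parameter and one macro variable.
theta = [0.0, 1.0, 3.0, 6.0, 10.0]
macro = [0.0, 1.0, 4.0, 6.0, 10.0]
d_theta, d_macro = first_difference(theta), first_difference(macro)
r = pearson(d_theta, d_macro)
t = t_statistic(r, len(d_theta))
```

With 37 observations, differencing leaves 36 pairs, so any such test has limited power, which is in line with the caution about sample size expressed in the text.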
For conciseness, we present only the correlation table for nonoverlapping returns over the past two years with varying equity premium and interpret the results below in relation to Figure 3.

In Table 4, we read a significant positive correlation between changes in $\theta_1$ and DAX and a negative one with the credit spread, indicating that the EPK becomes more pronounced when the economic indicators suggest an expanding economy; changes in $\theta_2$ and YT are negatively correlated, suggesting that the risk aversion slope becomes locally steeper during an economic boom. The same interpretation holds for the negative correlation between changes in $\theta_3$ and YT. The height of the peak varies with the returns on the index, pointing to an increasing local risk proclivity in periods of economic expansion. We have not found any significant correlation between changes in $\theta_t$ and in the short-term interest rate. Finally, we observe a positive correlation between the increments in $\theta_1$ and $\theta_2$, which suggests that over periods of concerted negative evolution of the economic indicators the EPK bump will shrink in both the horizontal and vertical directions, possibly leading to an overall decreasing EPK.

In summary, the sense of the relations between the indicators of the business cycle and the parameters that summarize risk preferences indicates that locally risk loving behavior is procyclical. These findings are also in line with the results found in Rosenberg and Engle (2002).

REFERENCES

Ait-Sahalia, Y., and J. Duarte. 2003. Nonparametric Option Pricing under Shape Restrictions. Journal of Econometrics 16: 9–47.

Ait-Sahalia, Y., and A. Lo. 2000. Nonparametric Risk Management and Implied Risk Aversion. Journal of Econometrics 94: 9–51.

Bekaert, G., and G. Wu. 2000. Asymmetric Volatility and Risk in Equity Markets. Review of Financial Studies 13: 1–42.

Bondarenko, O. 2003. Estimation of Risk-Neutral Densities using Positive Convolution Approximation. Journal of Econometrics 116: 85–112.

Breeden, D., and R. Litzenberger. 1978. Prices of State-Contingent Claims Implicit in Option Prices. The Journal of Business 51: 621–651.

Chabi-Yo, Y., R. Garcia, and E. Renault. 2008. State Dependence can Explain the Risk Aversion Puzzle. Review of Financial Studies 21: 973–1011.

Drechsler, I., and A. Yaron. 2010. What's Vol Got to Do with it. Review of Financial Studies 20: 1–45.

Fama, E. F., and K. R. French. 1989. Business Conditions and Expected Returns on Stocks and Bonds. Journal of Financial Economics 25: 23–49.

Fengler, M. R. 2005. Semiparametric Modeling of Implied Volatility. Springer-Verlag Berlin and Heidelberg.

Giacomini, E., and W. Härdle. 2008. "Dynamic semiparametric factor models in pricing kernel estimation." In S. Dabo-Niang and F. Ferraty (eds.), Functional and Operational Statistics, Contributions to Statistics, pp. 181–187. Physica-Verlag HD.

Giacomini, E., W. Härdle, and M. Handel. 2008. "Time dependent relative risk aversion." In G. Bol, S. Rachev, and R. Würth (eds.), Risk Assessment: Decisions in Banking and Finance, Contributions to Economics, pp. 15–46. Physica-Verlag HD.


Härdle, W., and J. S. Marron. 1990. Semiparametric Comparison of Regression Curves. The Annals of Statistics 18: 63–89.

Hens, T., and C. Reichlin. 2012. "Three Solutions to the Pricing Kernel Puzzle." Technical report, Swiss Finance Institute Research Paper No. 10-14. Available at SSRN: http://ssrn.com/abstract=1582888 or http://dx.doi.org/10.2139/ssrn.1582888

Jackwerth, J. C. 1999. Option-Implied Risk-Neutral Distributions and Implied Binomial Trees: a Literature Review. Journal of Derivatives 2: 66–82.

Jackwerth, J. C. 2000. Recovering Risk Aversion from Option Prices and Realized Returns. Review of Financial Studies 13: 433–451.

Ke, C., and Y. Wang. 2001. Semiparametric Nonlinear Mixed-Effects Models and their Applications. Journal of the American Statistical Association 96: 1272–1298.

Kneip, A., and J. Engel. 1995. Model Estimation in Nonlinear Regression under Shape Invariance. The Annals of Statistics 23: 551–570.

Kneip, A., and T. Gasser. 1988. Convergence and Consistency Results for Self-Modeling Nonlinear Regression. The Annals of Statistics 16: 82–112.

Lawton, W. H., E. A. Sylvestre, and M. S. Maggio. 1972. Self Modeling Nonlinear Regression. Technometrics 14: 513–532.

Leland, H. E. 1980. Who Should Buy Portfolio Insurance? Journal of Finance 35: 581–594.

Marshall, B. R., R. H. Cahan, and J. Cahan. 2008. "Technical Analysis Around the World: Does it Ever Add Value?" Technical report, SSRN eLibrary.

Ramsay, J. O., and B. W. Silverman. 2002. Applied Functional Data Analysis: Methods and Case Studies. Springer Series in Statistics. New York: Springer.

Rookley, C. 1997. "Fully Exploiting the Information Content of Intra day Option Quotes: Applications in Option Pricing and Risk Management." Technical report, University of Arizona.

Rosenberg, J. V., and R. F. Engle. 2002. Empirical Pricing Kernels. Journal of Financial Economics 64: 341–372.

Stone, C. J. 1982. Optimal Global Rates of Convergence for Nonparametric Regression. The Annals of Statistics 10: 1040–1053.
