
COMPUTING RARE-EVENT PROBABILITIES FOR AFFINE
MODELS AND GENERAL STATE SPACE MARKOV PROCESSES

A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF MANAGEMENT
SCIENCE AND ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

Xiaowei Zhang

August 2011


© 2011 by Xiaowei Zhang. All Rights Reserved.
Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/ny328vh8662


I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Peter Glynn, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Nicholas Bambos

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Tze Lai

Approved for the Stanford University Committee on Graduate Studies.

Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.


Abstract

Rare-event simulation concerns computing small probabilities, i.e., rare-event probabilities. This dissertation investigates efficient simulation algorithms based on importance sampling for computing rare-event probabilities for different models, and establishes their efficiency via asymptotic analysis.

The first part discusses the asymptotic behavior of affine models. The stochastic stability of affine jump diffusions is carefully studied. In particular, positive recurrence, ergodicity, and exponential ergodicity are established for such processes under various conditions via a Foster-Lyapunov type approach. The stationary distribution is characterized in terms of its characteristic function. Furthermore, the large deviations behavior of affine point processes is explicitly computed, based on which a logarithmically efficient importance sampling algorithm is proposed for computing rare-event probabilities for affine point processes.

The second part is devoted to a much more general setting, namely general state space Markov processes. The current state-of-the-art algorithm for computing rare-event probabilities in this context relies heavily on the solution of a certain eigenvalue problem, which is often unavailable in closed form unless special structure is present (e.g., the affine structure of affine models). To circumvent this difficulty, assuming the existence of a regenerative structure, we propose a bootstrap-based algorithm that conducts the importance sampling on the regenerative cycle-path space instead of on the original one-step transition kernel. The efficiency of this algorithm is also discussed.


Acknowledgements

Life is random, and yet there do exist moments that deflect its path. Being admitted to Stanford is definitely one such moment of mine. The past five years have turned out to be the most amazing time of my life. I am fortunate and grateful to have been showered with so much support and encouragement, which makes it a virtually impossible task to express my appreciation to every single one of those who have helped me.

First and foremost, I am deeply indebted to my advisor, Professor Peter Glynn. I met Peter for the first time while he was on an academic visit to Nankai University in China. I was a senior student, wondering what major to study in graduate school. I had not even considered applying to Stanford because I thought I had merely a remote chance. But Peter encouraged me to follow my heart, and things unfolded surprisingly well. During my PhD study, Peter has been a constant source of inspiration and I have benefited from him in numerous ways. As a doctoral supervisor, Peter's broad scope of knowledge, penetrating insight, and intuitive explanation of complicated theory have never ceased to amaze me. Because of him, my way of thinking about research has become more rigorous and my horizon has expanded beyond what I ever imagined. His support is by no means limited to the academic level. As a mentor and friend, Peter makes me comfortable sharing my personal experiences, both excitements and frustrations. He is a role model that I look up to and wish to become, professionally and personally.

I would like to thank Professor Kay Giesecke for introducing me to the research area of credit risk. His deep insight into financial modeling makes him a great collaborator. It was an enjoyable process to write a paper with him. I would also like to thank Professor David Siegmund for the classes on probability and stochastic processes he taught, as well as the suggestions he gave for my research questions. I am obliged to Professor Jose Blanchet at Columbia University and Professor Darrell Duffie, along with the three preceding professors, for writing recommendation letters during my academic job search. Moreover, I would like to thank Professor Nick Bambos and Professor Tze Leung Lai for serving on my reading committee, and Professor George Papanicolaou for chairing my oral defense. My gratitude extends to all the faculty and staff in the Department of Management Science and Engineering for their help and advice over the years.

My friends have played an indispensable part in my life at Stanford. They have made my life at Stanford full of joy and excitement. Among them are Simla Ceyhan, Anwei Chai, Shi Chen, Yichuan Ding, Dongdong Ge, Yihan Guan, Juegang Hu, Chuan Huang, Krishnamurthy Iyer, Xiaoye Jiang, Tim Kraft, Lei Liu, Shan Liu, Jing Ma, Zongming Ma, Ehsan Mousavi, Chen Peng, Qi Qi, Waraporn Tongprasit, Xi Wang, Zizhuo Wang, Wei Wu, Yu Wu, Li Xu, Yuan Yao, Hongsong Yuan, Yan Zhai, Kaiyuan Zhang, Feng Zhang, Lingren Zhang, Qinqin Zhang, Wugang Zhao, Yanchong Zheng, and Zhen Zhu, among many others. In addition, I want to give my special thanks to Su Chen, my best friend at Stanford and my Best Man.

At last, it comes to my family. My parents, Jinshan Zhang and Xiuhuan Bi, have always been supportive in every way. They have shown and are still showing me how to look at the world, optimistically and passionately. My wife, Biyun Pan, is my greatest achievement. I could never say thanks too much for the love and support she gives me. This dissertation is dedicated to my dear family.


Contents

Abstract

Acknowledgements

1 Introduction

2 Affine Jump Diffusions: Stochastic Stability
   2.1 Introduction
   2.2 Affine Jump Diffusions
   2.3 Stochastic Stability
       2.3.1 Foster-Lyapunov Inequalities
       2.3.2 Ergodicity
       2.3.3 Central Limit Theorem
   2.4 Characterization of Stationary Distribution

3 Affine Point Processes: Large Deviations
   3.1 Introduction
   3.2 Affine Point Processes
   3.3 Central Limit Theorem
   3.4 Large Deviations
       3.4.1 A Class of Exponential Martingales
       3.4.2 Limiting Cumulant Generating Function
       3.4.3 Steepness
   3.5 Importance Sampling
   3.6 Application to Portfolio Credit Risk

4 Computing Large Deviations for GSSMPs
   4.1 Introduction
   4.2 Markov-Dependent Sums
   4.3 Empirical Moment Generating Function
   4.4 A Bootstrap Based Simulation Algorithm
   4.5 Numerical Experiments
       4.5.1 Autoregressive Model
       4.5.2 Random Walk

A Stochastic Stability of Markov Processes
   A.1 Extended Generator
   A.2 Foster-Lyapunov Criteria
       A.2.1 Recurrence and Ergodicity
       A.2.2 Exponential Ergodicity

Bibliography


List of Tables

3.1 Theoretical vs. estimated mean/variance
3.2 Parameter specification for computing rare-event probabilities for APPs
3.3 Results of the numerical experiment for testing the logarithmic efficiency of the IS estimator
4.1 Parameter specification for computing rare-event probabilities for the AR(1) model
4.2 True vs. estimated tilting parameters for the AR(1) model; number of cycles m = 40000
4.3 Results for computing rare-event probabilities for the AR(1) model; number of cycles m = 40000, number of bootstrap samples r = 20000
4.4 True vs. estimated tilting parameters for the random walk; c = 0.5, number of cycles m = 40000
4.5 Results for computing rare-event probabilities for the random walk; c = 0.5, number of cycles m = 40000, number of bootstrap samples r = 10000


List of Figures

3.1 Simulated sample path of X(t) and L(t)
3.2 Histogram vs. fitted probability density; left: L(T), right: N(T)
3.3 Convergence of the log ratio as the probability tends to 0, showing the logarithmic efficiency of the IS estimator
x


Chapter 1

Introduction

This dissertation, as suggested by the title, consists of two parts, namely computing rare-event probabilities via simulation for affine models and for general state space Markov processes. Rare-event simulation is an important research area in stochastic simulation as well as in computational and applied probability. It involves estimating the probabilities of events that occur very infrequently and yet are significant enough to justify their study, i.e., rare events. Rare-event simulation has wide applications in numerous areas. A notable example is packet loss in packet-switched telecommunication networks. In order to reduce the variation of delays in carrying real-time video traffic, the buffers within the switches are designed to have limited size, which yields the possibility of buffer overflow and thereby packet loss. Hence, it is of great importance to estimate the packet loss probability (which could be of order $10^{-9}$) as a performance measure of the network system; see, for example, [53]. Another example is in the area of air transportation, where a specification for civil aircraft is that the probability of failure must be less than, say, $10^{-9}$ during a flight of about 8 hours; see, for example, [13]. Rare-event simulation is also widely applicable in finance. For instance, portfolio managers need to estimate the probability of large portfolio losses for risk management purposes (see, for example, [48] and [7]). In the insurance context, an insurance company is interested in estimating the probability of ruin over a given time horizon, or of eventual ruin, to adjust the premiums against the potential claims (see, for example, [3] and [4]). Some general references on rare-event simulation are [15], [60], [1], and [84].

Consider the problem of estimating the probability $\alpha = P(A)$ of some rare event $A$, i.e., $\alpha$ is small. The crude Monte Carlo (CMC) method is to use the estimator $Z = I_A$, so that the variance is $\sigma^2 = \alpha(1 - \alpha)$. For small $\alpha$, the absolute error $z\sigma n^{-1/2}$ is not particularly informative. What really matters is the relative error, which truly captures the accuracy of the estimation. This leads to the problem of CMC with rare events:
\[
\text{relative error} = \frac{z\sigma n^{-1/2}}{\alpha} = z\sqrt{\frac{1-\alpha}{n\alpha}} \sim \frac{z}{\sqrt{n\alpha}} \to \infty,
\]
as $\alpha \downarrow 0$. Consequently, if one wants to achieve a prescribed relative precision, one needs to increase $n$ in proportion to $\alpha^{-1}$. To illustrate this, let us assume that we target a 10% relative error with a 95% confidence interval. Also, assume that the probability of interest $\alpha$ is of order $10^{-9}$, which occurs in many telecommunication settings. This leads to
\[
\frac{1.96}{\sqrt{10^{-9}\,n}} \le 0.1,
\]
implying $n \ge 3.84 \times 10^{11}$. If the system being simulated is complex, this would be virtually infeasible!
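The sample-size arithmetic above is easy to reproduce; the following minimal Python sketch (added for illustration, not part of the original text) computes the CMC sample size implied by the relative-error bound:

```python
from math import ceil

def cmc_sample_size(alpha: float, rel_err: float, z: float = 1.96) -> int:
    """Smallest n with z * sqrt((1 - alpha) / (n * alpha)) <= rel_err,
    i.e., the CMC sample size needed for a prescribed relative error."""
    return ceil(z**2 * (1 - alpha) / (rel_err**2 * alpha))

# 10% relative error, 95% confidence, alpha of order 1e-9:
print(cmc_sample_size(1e-9, 0.1))  # ~3.84e11, matching the text
```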

To formally discuss such efficiency concepts, let $\{A(x)\}$ be a family of rare events, where $x$ is some index, assumed without loss of generality to satisfy $x \in (0, \infty)$. Assume that $\alpha(x) \triangleq P(A(x)) \to 0$ as $x \to \infty$. For each $x$, let $Z(x)$ be an unbiased estimator of $\alpha(x)$, i.e., $EZ(x) = \alpha(x)$.

The best performance that has been observed in realistic rare-event settings is bounded relative error, or strong efficiency, meaning
\[
\lim_{x\to\infty} \frac{\operatorname{Var}(Z(x))}{\alpha(x)^2} < \infty. \tag{1.1}
\]

If $Z(x)$ has bounded relative error, the number of samples required to achieve a prescribed relative precision is independent of the rarity of $A(x)$. To see this, let us assume that we generate $n$ iid copies $Z_1(x), \ldots, Z_n(x)$ of $Z(x)$ and use
\[
\bar{Z}_n(x) = \frac{1}{n}\sum_{i=1}^n Z_i(x)
\]
as our estimate of $\alpha(x)$. Then, it follows from Chebyshev's inequality that for any $\epsilon > 0$,
\[
P\left(\frac{|\bar{Z}_n(x) - \alpha(x)|}{\alpha(x)} > \epsilon\right) \le \frac{\operatorname{Var}(Z(x))}{n\epsilon^2\,\alpha(x)^2}.
\]
Then, for a given upper bound $\epsilon$ on the relative error, we can guarantee that the relative error is no larger than $\epsilon$ with probability $1 - \delta$ if the number of replications satisfies
\[
n \ge \frac{\operatorname{Var}(Z(x))}{\delta\epsilon^2\,\alpha(x)^2}. \tag{1.2}
\]
Hence, if we choose and fix
\[
n \ge \frac{1}{\delta\epsilon^2}\,\lim_{x\to\infty} \frac{\operatorname{Var}(Z(x))}{\alpha(x)^2},
\]
we are all set, regardless of how small $\alpha(x)$ is.

An efficiency concept slightly weaker than (1.1) is logarithmic efficiency, meaning
\[
\lim_{x\to\infty} \frac{\log \operatorname{Var}(Z(x))}{\log \alpha(x)^2} \ge 1. \tag{1.3}
\]

Remark 1.1. The conditions (1.1) and (1.3) are typically verified by replacing $\operatorname{Var}(Z(x))$ with $EZ(x)^2$.

In practice, it is more common to work with logarithmic efficiency rather than strong efficiency. The reasons include: (i) logarithmic efficiency is often easier to verify (typically by applying large deviations theory); and (ii) the difference between their performance in practice is minor.
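As a toy illustration of these notions (this example is not from the dissertation; it uses the classical Gaussian tail event $A(x) = \{Z > x\}$ and an exponentially tilted proposal as stand-in assumptions), the following Python sketch shows an importance sampling estimator whose relative error stays moderate as the event becomes rarer:

```python
import numpy as np

rng = np.random.default_rng(0)

def is_tail_estimate(x: float, n: int = 100_000):
    """Estimate alpha(x) = P(Z > x) for Z ~ N(0,1) by sampling from the
    exponentially tilted proposal N(x,1) and reweighting by the likelihood
    ratio dP/dQ(y) = exp(-x*y + x**2/2)."""
    y = rng.normal(loc=x, size=n)
    w = np.where(y > x, np.exp(-x * y + 0.5 * x * x), 0.0)
    est = w.mean()
    rel_err = w.std() / (est * np.sqrt(n))   # estimated relative error of the mean
    return est, rel_err

for x in (3.0, 5.0, 6.0):
    est, re = is_tail_estimate(x)
    print(f"x = {x}: alpha ~ {est:.3e}, relative error ~ {re:.2%}")
```

As $x$ grows and $\alpha(x)$ decays rapidly, the estimated relative error remains stable, in sharp contrast to CMC, whose relative error grows like $1/\sqrt{n\alpha(x)}$.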

The first portion of this dissertation discusses asymptotic analysis and rare-event simulation for affine models in the context of portfolio credit risk. Risk management is particularly concerned with the event of large portfolio losses, which is typically rare but significant. The most recent financial crisis, like those preceding it, indicates that the default of one major financial entity in the market can trigger a chain effect that spreads the risk exposure through the entire financial network, so that more defaults tend to occur. This "feedback" feature of defaults is known as "self-excitation" or "clustering" in portfolio credit risk. Recently, there has been extensive research based on modeling portfolio credit risk with affine point processes. Belonging to the family of affine models, the affine point process is an intensity-based (or so-called "reduced-form") model, which captures the self-excitation feature (see [37]). Affine models are widely used in various areas of finance and econometrics; their broad applicability is due to their computational tractability as well as their flexibility in calibrating parameters. Given the wide popularity of copula-based models before the 2008 financial crisis, it is not surprising that most of the research on efficient simulation for portfolio credit risk has been conducted in the context of copulas. By contrast, work on efficient simulation for intensity-based models is far scarcer; see [6] for doubly stochastic processes, and the more recent work in [41], [42] for affine point processes. The three preceding papers work in the "large portfolio" asymptotic regime; we will instead treat the "large time horizon" asymptotic regime in this dissertation.

The vehicle that drives our efficient simulation algorithm is large deviations analysis, which describes the atypical behavior of the system. An affine point process is, to a large extent, similar to a Markov-dependent sum, whose large deviations behavior has been extensively studied; see [75], [76], [64], [65]. As indicated in [32] and [11], a Markov-dependent sum with an unbounded functional can exhibit unconventional large deviations asymptotics, i.e., subexponential asymptotics. Nevertheless, even though the underlying Markov process for an affine point process is unbounded, our analysis shows that its large deviations behavior still exhibits exponential asymptotics. The above discussion of large deviations and efficient simulation for affine point processes is carried out in Chapter 3.

It turns out that in order to quantify the rare-event region and to prove the large deviations result for an affine point process, we need to establish its typical behavior first. Due to the close connection between affine point processes and affine jump diffusions (as is evident from their definitions), the typical behavior (characterized in terms of a central limit convergence) of the former heavily depends on that of the latter, especially the existence of an equilibrium. This induces the study of the stochastic stability of affine jump diffusions. Since the stochastic stability, in particular the ergodicity, of a jump diffusion process is of interest in its own right (for instance, in its applications to parameter estimation based on long-term asymptotics), we provide a careful treatment of this subject in Chapter 2. Given the popularity of affine models in practice, it is surprising that there have been no results in the literature establishing the ergodicity of affine jump diffusions. Existing research typically focuses on jump diffusions whose jump intensity is independent of the state ([86], [68], [69], [99]), and this assumption obviously fails for the self-exciting affine models. Our approach for establishing the ergodicity is via the Foster-Lyapunov criteria. Thanks to the affine structure, we are able to find appropriate test functions that verify the criteria, which is not a trivial task for a generic multidimensional Markov process. A central limit theorem is further proved for affine jump diffusions by virtue of the central limit theorem for local martingales. Finally, the stationary distribution is characterized in terms of its Fourier transform.

For the second portion of this dissertation, we move to a much more general setting, where we consider rare-event simulation for general state space Markov processes. The implementability of the importance sampling for affine point processes in Chapter 3 depends on two things. First, we are able to compute the limiting cumulant generating function of the process, or equivalently, we are able to solve an associated eigenvalue problem; second, we are able to sample/generate paths under the change of measure, which involves the eigenvalue/eigenfunction of the first problem. These two tasks are explicitly solvable for affine point processes due to the affine structure. In particular, the eigenfunction has an exponential affine form, and the change of measure falls within the same family as the original probability measure. The current state-of-the-art algorithms for rare-event simulation for Markov processes also depend on the feasibility of the above two tasks; see [16], [34], and [12]. In the absence of such special structure in the underlying model, how should one proceed without the explicit solution of the associated eigenvalue problem? [17] proposes a sequential importance sampling and resampling algorithm, attempting to address the second problem above. We, by contrast, will try to solve the two problems simultaneously in Chapter 4. Assuming the existence of a regenerative structure, which can be constructed via Nummelin's splitting method in the presence of positive Harris recurrence, we consider importance sampling at the level of the regenerative cycles instead of the step-by-step transition dynamics, so that the eigenfunction can be eliminated from the tilted probability measure, which solves the second task. Moreover, we can approximate the tilting parameter by its empirical counterpart, which solves the first task. This bootstrap-type algorithm is proved to be logarithmically efficient.
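The actual algorithm is developed in Chapter 4. Purely as a rough illustration of the empirical-tilting idea (the Gaussian cycle distribution and the Lundberg-type root equation below are stand-in assumptions, not the dissertation's construction), one can approximate a tilting parameter from simulated cycle sums as follows:

```python
import numpy as np
from scipy.optimize import brentq

def empirical_tilting_root(cycle_sums, lo=1e-6, hi=5.0):
    """Given iid cycle sums S_1,...,S_m with negative mean, return the positive
    root of the empirical cumulant generating function
        psi_m(theta) = log( mean(exp(theta * S_j)) ),
    the empirical analogue of the classical Lundberg equation psi(theta) = 0."""
    s = np.asarray(cycle_sums, dtype=float)
    psi_m = lambda theta: np.log(np.mean(np.exp(theta * s)))
    return brentq(psi_m, lo, hi)

rng = np.random.default_rng(2)
cycles = rng.normal(-1.0, 1.0, size=40_000)  # toy "cycles": psi(theta) = -theta + theta^2/2
print(empirical_tilting_root(cycles))         # close to the true root theta* = 2
```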


Chapter 2

Affine Jump Diffusions: Stochastic Stability

2.1 Introduction

Affine jump-diffusion (AJD) processes represent a large family of continuous-time stochastic models that are widely applied in financial engineering and econometrics. The broad applicability of this family of models is due to its modeling flexibility as well as its computational tractability. The term "affine" derives from the fact that the drift, the variance, and the jump intensity are all affine in the state vector. As shown in [30] and [28], the affine structure implies that the Laplace/Fourier transform of the probability distribution of an AJD is explicitly available up to solving a system of (generalized) Riccati ordinary differential equations (ODEs), which thereby provides great tractability. (Note that for a generic diffusion process, obtaining its transform requires solving a set of partial differential equations, which is much more computationally involved.)

The AJD family includes many broadly used examples, such as the Brownian motion with drift in the Ho-Lee model of [54], the Ornstein-Uhlenbeck (OU) process in the Vasicek model of [96], and the Feller diffusion in the Cox-Ingersoll-Ross (CIR) model of [22].

The stochastic stability of jump diffusions is not only of interest in itself but also has important implications for parameter estimation. Ergodicity typically plays an essential role in establishing laws of large numbers for estimators; see, for example, [88], [89] and references therein. The stability results will also be used in Chapter 3 for computing the large deviations asymptotics for affine point processes.

Nevertheless, despite their wide application in practice, especially in finance and econometrics, it is surprising that there has not been a systematic treatment of the stochastic stability of AJDs in the literature. Although there has been significant research on the stochastic stability of generic jump diffusions ([86], [68], [69], [99] and the references therein), these discussions are limited to the setting where the jump intensity is "state-independent", which clearly fails in our AJD setting. It is shown in [47] and [63] that the equilibrium of affine diffusion processes (without jumps) exists, through an analysis of the stability of the associated system of ODEs. We find it difficult to extend their approach to AJDs because one would then need to analyze a system of ordinary integro-differential equations, which is significantly more challenging.

A key assumption for establishing the stability is the "mean reversion" of the drift (Assumption 3), under which the paths have a tendency to return to a compact set. Moreover, the farther the paths deviate from this compact set, the stronger this tendency becomes. Another key observation is that the only factor that can ruin the effort of the mean reversion is the jump component. We thereby need to control the jumps so that they are not too big in size and do not occur too frequently. This leads to Assumption 5. These two assumptions are critical and will be carried over to the asymptotic analysis of affine point processes in Chapter 3 as well.

The approach we adopt for establishing the stability is the Foster-Lyapunov method (see Appendix A for details). Finding an appropriate Foster-Lyapunov test function in the multidimensional setting is not a trivial task and can be challenging. However, the task can be accomplished for AJDs thanks to the affine structure. In particular, the test functions are chosen to be monotone functions (a power function or a logarithmic function) of a certain norm on the Euclidean space. We not only obtain results on the existence of the equilibrium (in other words, positive Harris recurrence) of AJDs, but also on the convergence rate to the equilibrium. In addition, we derive a central limit theorem (CLT) for AJDs. The tool we use is the CLT for local martingales. Again, due to the affine structure, we can explicitly calculate the equilibrium mean and asymptotic covariance matrix. The above discussions will be made precise in Section 2.3. Finally, we will characterize the equilibrium distribution in terms of a first-order linear partial differential equation (PDE) in Section 2.4, which has a close connection to an ODE system.

2.2 Affine Jump Diffusions

For the rest of this chapter as well as Chapter 3, we will use the following notational conventions. For a vector $v \in \mathbb{R}^n$, $v$ is viewed as a column vector, $v^\intercal$ denotes its transpose, $\|v\|$ denotes its Euclidean norm, and $\operatorname{diag}(v)$ denotes the diagonal matrix whose diagonal elements are $v$. Moreover, we use $0$ to denote a zero matrix and $\operatorname{Id}(i)$ to denote a matrix with all zero entries except that the $i$-th diagonal entry is 1. We write $A \succeq 0$ if $A \in \mathbb{R}^{n\times n}$ is a symmetric positive semidefinite matrix and $A \succ 0$ if $A$ is symmetric positive definite. Finally, for any probability distribution $\eta$ on $S$, $P_\eta(\cdot)$ denotes the distribution conditional on $X(0)$ having distribution $\eta$, and $E_\eta$ is the associated expectation operator. In particular, $P_x(\cdot) = P(\cdot\,|\,X(0) = x)$ and $E_x(\cdot) = E(\cdot\,|\,X(0) = x)$ for a given $x \in S$.

Fix a complete probability space $(\Omega, \mathcal{F}, P)$ and a filtration $\{\mathcal{F}_t : t \ge 0\}$ satisfying the usual conditions (see, for example, [62] for details). Let $X$ be an $n$-dimensional time-homogeneous Markov process with state space $S \subseteq \mathbb{R}^n$, satisfying the following stochastic differential equation (SDE)
\[
dX(t) = \mu(X(t))\,dt + \sigma(X(t))\,dW(t) + \sum_{i=1}^K \int_S z\,N_i(dt, dz), \tag{2.1}
\]
where $W = (W(t) : t \ge 0)$ is a standard $n$-dimensional Brownian motion adapted to $\{\mathcal{F}_t : t \ge 0\}$, $\mu : S \to \mathbb{R}^n$, and $\sigma : S \to \mathbb{R}^{n\times n}$. Moreover, $N_i(dt, dz)$ is a counting random measure on $[0,\infty) \times \mathbb{R}^n$ with compensator $\Lambda_i(X(t-))\,dt\,\phi_i(dz)$, where $\Lambda_i : S \to \mathbb{R}_+$ and $\phi_i$ is a probability measure on $\mathbb{R}^n$, for each $i = 1, \ldots, K$.
Λi : S → R+ <strong>and</strong> ϕi is a probability measure on R n , <strong>for</strong> each i = 1, . . . , K.


Moreover, define
\[
N_i(t) \triangleq \int_0^t \int_S N_i(ds, dz)
\]
for $i = 1, \ldots, K$. Then, $N_i(t)$ is a counting process with intensity $\Lambda_i(X(t-))$.

In the sequel, we will use $Z^i$ to denote an $S$-valued random variable with probability distribution $\phi_i$.

The affine structure is introduced in the following fashion. We assume that $\mu$, $\sigma\sigma^\intercal$, and $\Lambda$ are all affine in the state variable, i.e.,
\[
\begin{aligned}
\mu(x) &= b - \beta x, && b \in \mathbb{R}^n,\ \beta \in \mathbb{R}^{n\times n}, \\
\sigma(x)\sigma(x)^\intercal &= a + \sum_{j=1}^n \alpha^j x_j, && a \in \mathbb{R}^{n\times n},\ \alpha^j \in \mathbb{R}^{n\times n},\ j = 1, \ldots, n, \\
\Lambda_i(x) &= \lambda_i + \sum_{j=1}^n \kappa_{ij} x_j, && \lambda \in \mathbb{R}^K_+,\ \kappa \in \mathbb{R}^{K\times n}_+,\ i = 1, \ldots, K.
\end{aligned}
\]
$X$ is then an AJD.
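For concreteness, a minimal Euler-type simulation of a one-dimensional AJD of this form is sketched below ($n = m = K = 1$, so $\mu(x) = b - \beta x$, $\sigma(x)^2 = a + \alpha x$, and $\Lambda(x) = \lambda + \kappa x$); the parameter values and the exponential jump distribution are arbitrary illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ajd_1d(T=10.0, dt=1e-3, x0=1.0,
                    b=1.0, beta=2.0, a=0.0, alpha=0.04,
                    lam=0.5, kappa=1.0, jump_mean=0.3):
    """Euler scheme for dX = (b - beta*X) dt + sqrt(a + alpha*X) dW + jumps,
    where jumps arrive with state-dependent intensity lam + kappa*X and have
    Exp(jump_mean) sizes. First-order discretization for illustration only."""
    n_steps = int(T / dt)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        xk = max(x[k], 0.0)   # keep the CIR-type coordinate in R_+
        drift = (b - beta * xk) * dt
        diffusion = np.sqrt(a + alpha * xk) * np.sqrt(dt) * rng.normal()
        jump = rng.exponential(jump_mean) if rng.random() < (lam + kappa * xk) * dt else 0.0
        x[k + 1] = xk + drift + diffusion + jump
    return x

path = simulate_ajd_1d()
print(path[-5:])
```

Note that with these illustrative values $\beta - \kappa\,EZ = 2 - 1 \times 0.3 = 1.7 > 0$, so the jump-adjusted drift remains mean-reverting in the sense of the condition discussed in Section 2.1 (Assumption 5).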

Let $I, J \subseteq \{1, \ldots, n\}$ be two index sets. We write $v_I = (v_i : i \in I)^\intercal$ and $M_{I,J} = (M_{ij} : i \in I, j \in J)$ for any vector $v$ and matrix $M$. From now on, we fix the index sets $I$ and $J$ by setting $I = \{1, \ldots, m\}$ and $J = \{m+1, \ldots, n\}$.

Definition 2.1. The parameters $(a, \alpha, b, \beta, \lambda, \kappa)$ are called admissible if

1) $a \succeq 0$ with $a_{I,I} = 0$ (hence $a_{I,J} = 0$ and $a_{J,I} = 0$);

2) $\alpha \triangleq (\alpha^1, \ldots, \alpha^n)$ with $\alpha^i \succeq 0$ and $\alpha^i_{I,I} = \alpha^i_{i,i}\operatorname{Id}(i)$ for $i \in I$; $\alpha^i = 0$ for $i \in J$;

3) $b \in \mathbb{R}^m_+ \times \mathbb{R}^{n-m}$;

4) $\beta_{I,J} = 0$ and $\beta_{I,I}$ has non-positive off-diagonal elements;

5) $\lambda \in \mathbb{R}^K_+$, $\kappa \in \mathbb{R}^{K\times n}_+$ with $\kappa_{i,j} = 0$ for $i = 1, \ldots, K$ and $j \in J$.
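These conditions are mechanical to verify for concrete parameters. The following Python sketch (an illustration with hypothetical inputs and numerical tolerances, added here) checks them using 0-based indices for $I = \{1, \ldots, m\}$ and $J = \{m+1, \ldots, n\}$:

```python
import numpy as np

def is_admissible(a, alpha, b, beta, lam, kappa, m):
    """Check the admissibility conditions of Definition 2.1.
    alpha is a list of n matrices alpha[i]; indices 0..m-1 play the role of I."""
    n = beta.shape[0]

    def psd(M):
        return np.all(np.linalg.eigvalsh((M + M.T) / 2) > -1e-12)

    ok = psd(a) and np.allclose(a[:m, :], 0) and np.allclose(a[:, :m], 0)  # 1)
    for i in range(n):                                                     # 2)
        if i < m:
            ok &= psd(alpha[i])
            block = alpha[i][:m, :m].copy()
            block[i, i] = 0.0          # only the (i,i) entry may be nonzero in I x I
            ok &= np.allclose(block, 0)
        else:
            ok &= np.allclose(alpha[i], 0)
    ok &= np.all(b[:m] >= 0)                                               # 3)
    ok &= np.allclose(beta[:m, m:], 0)                                     # 4)
    off = beta[:m, :m] - np.diag(np.diag(beta[:m, :m]))
    ok &= np.all(off <= 0)
    ok &= np.all(lam >= 0) and np.all(kappa >= 0) and np.allclose(kappa[:, m:], 0)  # 5)
    return bool(ok)

# Hypothetical 2-d example with m = 1 (one CIR-type and one OU-type coordinate):
a = np.diag([0.0, 1.0]); alpha = [np.diag([0.04, 0.0]), np.zeros((2, 2))]
b = np.array([1.0, 0.0]); beta = np.array([[2.0, 0.0], [-1.0, 1.0]])
lam = np.array([0.5]); kappa = np.array([[1.0, 0.0]])
print(is_admissible(a, alpha, b, beta, lam, kappa, m=1))  # True
```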

Here are several basic assumptions that we will use in this chapter.

Assumption 1. The parameters $(a, \alpha, b, \beta, \lambda, \kappa)$ in the SDE (2.1) are admissible.

Assumption 2. Either $E\|Z^i\| < \infty$ for all $i = 1, \ldots, K$, or $\kappa = 0$, where $Z^i$ has distribution $\phi_i$.

Lemma 2.1. Under Assumption 1 and Assumption 2, the SDE (2.1) has a unique weak solution on $S = \mathbb{R}^m_+ \times \mathbb{R}^{n-m}$. The solution process is càdlàg (right continuous with left limits), nonexplosive, and has the Feller property.

Proof. See Theorem 2.7 and Lemma 9.2 of [28].

Remark 2.1. As indicated in [28], the state space $S = \mathbb{R}^m_+ \times \mathbb{R}^{n-m}$ is called canonical. Moreover, the first $m$ coordinates are of CIR type while the others are of OU type. The volatility function $\sigma(\cdot)$ and the intensity functions $\Lambda_i(\cdot)$, $i = 1, \ldots, K$, depend only on the CIR type coordinates.

Lemma 2.1 asserts that there exists a process $X'(t)$ adapted to a filtration $\mathcal{F}'_t$, satisfying
\[
dX'(t) = (b - \beta X'(t))\,dt + \sigma(X'(t))\,dW'(t) + \sum_{i=1}^K \int_S z\,N'_i(dt, dz)
\]
for a Brownian motion $W'(t)$ adapted to $\mathcal{F}'_t$, and counting random measures $N'_i(dt, dz)$ with compensator $\Lambda_i(X'(t-))\,dt\,\phi_i(dz)$, $i = 1, \ldots, K$. With a slight abuse of notation, we will not differentiate $(X(t), W(t), N_i(dt, dz), \mathcal{F}_t)$ from $(X'(t), W'(t), N'_i(dt, dz), \mathcal{F}'_t)$.

Assumption 3. $\beta$ is positive stable, i.e., all the eigenvalues of $\beta$ have positive real parts.

Remark 2.2. For a one-dimensional Itô diffusion process, the assumption that $\beta > 0$ is critical for recurrence. A natural extension to the multidimensional setting that retains the "positivity" of $\beta$ is to make $\beta$ positive stable. This assumption originates from the study of the stability of a system of first-order ODEs $\dot{f} = -\beta f$, where $\beta$ is the coefficient matrix. The same assumption also appears in [86] and [68], which study the stochastic stability of OU type processes driven by a Lévy process. In light of the connection to the mean-reverting OU process, we call Assumption 3 the mean reversion assumption.
reversion assumption.


Assumption 4. There exists an index set $L \subseteq I$ for which
\[
a + \sum_{k\in L} \alpha^k \succ 0, \quad \text{and} \quad \min_{k\in L} \frac{b_k}{\alpha^k_{k,k}} > 2.
\]

Remark 2.3. Note that a key technical assumption in all the Foster-Lyapunov criteria in the Appendix is the condition that all compact sets are petite. As discussed in Section A.2.1, satisfying this condition requires some continuity of the transition kernel. A natural way to introduce such continuity is to assume the existence of a transition density. The existence (or even smoothness) of the transition density of a jump diffusion process has been extensively studied in the literature. For example, a sufficient condition for Itô diffusions is the Hörmander condition, developed using Malliavin calculus. Moreover, the same condition also applies to a certain class of jump diffusions (whose jump intensity is state-independent). See the detailed discussion in [90]. Obviously, this condition does not apply in the AJD setting, since the jump intensity of an AJD may depend linearly on the state. Nevertheless, we have the following result on the existence (and smoothness) of the transition density of AJDs.

Lemma 2.2. Suppose that Assumptions 1, 2, and 4 hold. Let $L$ be the index set in Assumption 4 and let $p$ be a nonnegative integer with
\[
p < \min_{k\in L} \frac{b_k}{2\alpha^k_{k,k}} - 1.
\]
Then, $P_x(X(t) \in \cdot)$ admits a density $g(y)$ of class $C^p$ with support in $S$, and the partial derivatives of $g(y)$ of orders $0, \ldots, p$ tend to 0 as $\|y\| \to \infty$.

Proof. See Theorem 4.1 of [40].

In order to apply the Foster-Lyapunov approach discussed in the Appendix, we need to specify the extended generator of an AJD (see Definition A.2). To that end, we introduce the following function space of potential test functions. Let $C^2(\mathbb{R}^n)$ be the set of twice differentiable functions $f : \mathbb{R}^n \to \mathbb{R}$ and define
\[
\mathcal{Q} = \left\{ f \in C^2(\mathbb{R}^n) \;:\; \int |f(\cdot + z)|\,\phi_i(dz) \text{ is bounded on compact sets, } i = 1, \ldots, K \right\}. \tag{2.2}
\]
Also define the linear operator $\mathcal{A}$ on $\mathcal{Q}$ by
\[
\mathcal{A} f = \mathcal{G} f + \mathcal{L} f, \tag{2.3}
\]
where
\[
\mathcal{G} f(x) = \nabla f(x)^\intercal (b - \beta x) + \frac{1}{2}\sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(x)\left( a_{i,j} + \sum_{k=1}^m \alpha^k_{i,j}x_k \right)
\]
and
\[
\mathcal{L} f(x) = \sum_{i=1}^K (\lambda_i + \kappa_i^\intercal x) \int \big( f(x + z) - f(x) \big)\,\phi_i(dz),
\]
where we use $\kappa_i$ to denote the $i$-th row of $\kappa$.
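As a quick sanity check of these definitions (a scalar example added for illustration; it does not appear in the original text), take $n = m = K = 1$ and $f(x) = x$, which lies in $\mathcal{Q}$ whenever $E\|Z^1\| < \infty$. Then $\nabla f \equiv 1$, $\nabla^2 f \equiv 0$, and
\[
\mathcal{A} f(x) = (b - \beta x) + (\lambda + \kappa x)\,EZ^1 = (b + \lambda\,EZ^1) - (\beta - \kappa\,EZ^1)\,x,
\]
so the jumps raise the effective mean level and reduce the effective mean-reversion rate from $\beta$ to $\beta - \kappa\,EZ^1$; positivity of the latter quantity is exactly the one-dimensional content of Assumption 5 below.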

We now show that $\mathcal{A}$ is the extended generator of $X$. We will need the following lemmata on the properties of local martingales.

Lemma 2.3. Let $M = (M(t) : t \ge 0)$ be a càdlàg process and let $T_n$ be a sequence of stopping times increasing to $\infty$ a.s. such that $M(t \wedge T_n)$ is a local martingale for each $n$. Then, $M$ is a local martingale.

Proof. See Theorem 48 in Chapter 1 of [83].

Lemma 2.4. Let $\gamma(dt, dz)$ be a counting random measure on $[0,\infty) \times \mathbb{R}^n$ with compensator $\nu(dt, dz)$.

(i) If
\[
E \int_0^t \int_{\mathbb{R}^n} |g(s, z)|\,\nu(ds, dz) < \infty,
\]
then
\[
\int_0^t \int_{\mathbb{R}^n} g(s, z)\,\tilde{\gamma}(ds, dz)
\]
is a local martingale, where $\tilde{\gamma} \triangleq \gamma - \nu$.

(ii) If
\[
E \int_0^t \int_{\mathbb{R}^n} |g(s, z)|^2\,\nu(ds, dz) < \infty,
\]
then
\[
\int_0^t \int_{\mathbb{R}^n} g(s, z)\,\tilde{\gamma}(ds, dz)
\]
is a martingale.

Proof. See Theorem II.1.33 of [56].

Proposition 2.1. Under Assumption 1 and Assumption 2, $\mathcal{A}$ is the extended generator of $X$. Further, $\mathcal{Q} \subseteq \operatorname{Dom}(\mathcal{A})$, where $\mathcal{Q}$ is defined by (2.2).

Proof. Fix $t > 0$, $x \in S$, and $f \in \mathcal{Q}$. Let $\tau_k = \inf\{t : \|X(t)\| > k\}$ for each $k \ge 1$. Lemma 2.1 asserts that $X$ is nonexplosive, so $\tau_k \to \infty$ as $k \to \infty$. By virtue of Lemma 2.3, it suffices to show that
\[
f(X(t \wedge \tau_k)) - \int_0^{t \wedge \tau_k} \mathcal{A} f(X(s))\,ds
\]
is a $P_x$-local martingale adapted to $X$ for each $k$.

Note that by Itô's formula (see, for example, Theorem 33 in Chapter 2 of [83]),
\[
\begin{aligned}
f(X(t \wedge \tau_k)) &= f(X(0)) + \int_0^{t \wedge \tau_k} \mathcal{G} f(X(s))\,ds + \int_0^{t \wedge \tau_k} \nabla f(X(s))^\intercal \sigma(X(s))\,dW(s) + \sum_{0 < s \le t \wedge \tau_k} \big( f(X(s)) - f(X(s-)) \big) \\
&= f(X(0)) + \int_0^{t \wedge \tau_k} \mathcal{A} f(X(s))\,ds + G_1(t) + G_2(t),
\end{aligned}
\]
where
\[
G_1(t) = \int_0^{t \wedge \tau_k} \nabla f(X(s))^\intercal \sigma(X(s))\,dW(s)
\]
and
\[
G_2(t) = \sum_{0 < s \le t \wedge \tau_k} \big( f(X(s)) - f(X(s-)) \big) - \int_0^{t \wedge \tau_k} \mathcal{L} f(X(s))\,ds.
\]
$G_1$ is a local martingale, since its integrand is bounded on $[0, t \wedge \tau_k]$. For $G_2$, note that for each $i = 1, \ldots, K$,
\[
E_x \int_0^{t \wedge \tau_k} \int_S |f(X(s-) + z) - f(X(s-))|\,\Lambda_i(X(s-))\,\phi_i(dz)\,ds
\le t \sup_{\|y\| \le k} \Lambda_i(y)\left( \sup_{\|y\| \le k} \int |f(y + z)|\,\phi_i(dz) + \sup_{\|y\| \le k} |f(y)| \right) < \infty,
\]
since $f \in \mathcal{Q}$ and $\|X(s)\| \le k$ for $s < t \wedge \tau_k$. Hence, Lemma 2.4 implies that $G_2$ is a local martingale, which completes the proof.

2.3 Stochastic Stability

The central idea of the Foster-Lyapunov approach is to find a test function $V \in \mathcal{Q}$ satisfying an inequality of the form
\[
\mathcal{A} V(x) \le -c + d\,I_C(x), \quad x \in S,
\]
for some constants $c > 0$, $d < \infty$, and a compact set $C$. Under mild conditions, this inequality implies positive Harris recurrence.

Moreover, if $V(x) \ge 1$ on $C^c$, then the following inequality is obviously stronger, and will imply a stronger stochastic stability result, namely exponential ergodicity (roughly meaning that the process converges to the equilibrium exponentially fast):
\[
\mathcal{A} V(x) \le -cV(x) + d\,I_C(x), \quad x \in S.
\]
We will give different sufficient conditions under which such a test function does exist.


2.3.1 Foster-Lyapunov Inequalities

Note that in the absence of the jump part (i.e., when $X$ is an Itô diffusion), Assumption 3 guarantees that $X$ is positive Harris recurrent; see [86] and [68]. Since the mean reversion forces the process to drift back toward the equilibrium, it seems reasonable to speculate that the equilibrium will still exist in the presence of jumps if the effect of the jumps is "dominated" by the force of the mean reversion (see Assumption 5). It turns out that this is indeed the case, as shown in Theorem 2.1 and Theorem 2.2.

Assumption 5. $\beta - \sum_{i=1}^K EZ^i\,\kappa_i^\intercal$ is positive stable, where $Z^i$ has distribution $\phi_i$ and $\kappa_i$ is the $i$-th row of $\kappa$.

Remark 2.4. When $E\|Z^i\| = \infty$ and $\kappa_i = 0$, we set $EZ^i\,\kappa_i^\intercal = 0$.
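Assumption 5 is straightforward to check numerically for given model parameters. The following small sketch (an illustration added here, with made-up parameter values) verifies positive stability of the jump-adjusted drift matrix by inspecting eigenvalues:

```python
import numpy as np

def jump_adjusted_drift_is_stable(beta, EZ_list, kappa_rows):
    """Check Assumption 5: beta - sum_i E[Z^i] kappa_i^T is positive stable,
    i.e., all eigenvalues have positive real part. EZ_list holds the mean jump
    vectors E[Z^i]; kappa_rows holds the rows kappa_i of kappa."""
    B = np.asarray(beta, dtype=float).copy()
    for EZ, kap in zip(EZ_list, kappa_rows):
        B -= np.outer(EZ, kap)
    return bool(np.all(np.linalg.eigvals(B).real > 0))

beta = np.array([[2.0, 0.0], [0.0, 1.0]])
print(jump_adjusted_drift_is_stable(beta, [np.array([0.3, 0.0])],
                                    [np.array([1.0, 0.0])]))  # True: B = diag(1.7, 1)
```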

Before proceeding to the proofs, we will need the following lemma, which states an important property of positive stable matrices.

Lemma 2.5. Let $A \in \mathbb{R}^{n\times n}$ be positive stable. Then there exists $G \succ 0$ such that $GA + A^\intercal G \succ 0$.

Proof. See Theorem 2.2.3 of [55].
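In practice, a matrix $G$ as in Lemma 2.5 can be produced by solving a continuous Lyapunov equation; the sketch below (an illustration assuming SciPy's `solve_continuous_lyapunov`) constructs $G \succ 0$ with $GA + A^\intercal G = I \succ 0$:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def lyapunov_certificate(A):
    """For positive stable A, return G > 0 with G A + A^T G = I,
    by solving the Lyapunov equation (-A^T) G + G (-A) = -I."""
    G = solve_continuous_lyapunov(-A.T, -np.eye(A.shape[0]))
    G = (G + G.T) / 2                        # symmetrize against round-off
    assert np.all(np.linalg.eigvalsh(G) > 0)  # G is positive definite
    return G

A = np.array([[2.0, -1.0],
              [0.5,  1.0]])                   # eigenvalues 1.5 +/- 0.5i: positive stable
G = lyapunov_certificate(A)
print(np.round(G @ A + A.T @ G, 10))          # identity matrix
```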

For any $n \times n$ matrix $G \succ 0$, define
\[
\|x\|_G \triangleq (x^\intercal Gx)^{1/2}
\]
for $x \in \mathbb{R}^n$. It is straightforward to see that $\|\cdot\|_G$ is a norm equivalent to the Euclidean norm $\|\cdot\|$ on $\mathbb{R}^n$, since $G \succ 0$. Further, we may define the associated matrix norm. In particular, for any $A \in \mathbb{R}^{n\times n}$, define
\[
\|A\|_G = \sup_{y \in \mathbb{R}^n,\, y \ne 0} \frac{\|Ay\|_G}{\|y\|_G}.
\]
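These norms are easy to evaluate numerically; the following sketch (added for illustration, with made-up matrices) computes $\|x\|_G$ directly and $\|A\|_G$ via the identity $\|A\|_G = \|L^\intercal A L^{-\intercal}\|_2$ for the Cholesky factor $G = LL^\intercal$:

```python
import numpy as np

def g_norm(x, G):
    """||x||_G = sqrt(x^T G x) for G > 0."""
    return float(np.sqrt(x @ G @ x))

def g_matrix_norm(A, G):
    """||A||_G = sup_{y != 0} ||A y||_G / ||y||_G. With G = L L^T (Cholesky),
    this equals the spectral norm of L^T A L^{-T}."""
    L = np.linalg.cholesky(G)
    M = L.T @ A @ np.linalg.inv(L.T)
    return float(np.linalg.norm(M, 2))

G = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 2.0], [0.0, 3.0]])
y = np.array([1.0, -1.0])
print(g_norm(A @ y, G) / g_norm(y, G) <= g_matrix_norm(A, G))  # True
```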

The following technical result is also necessary for verifying the Foster-Lyapunov inequalities.


Lemma 2.6. Let $Z$ be an $\mathbb{R}^n$-valued random vector and let $f \ge 0$ be a monotone increasing function with $Ef(\|Z\|) < \infty$. Fix $\epsilon > 0$. If
\[
\lim_{y\to\infty} \frac{f(y)}{f(y - \epsilon)} = 1,
\]
then
\[
\lim_{\|x\|\to\infty} f(\|x\|)\,P(\|x + Z\| \le \epsilon) = 0.
\]

Proof. Note that
\[
P(\|x + Z\| \le \epsilon) \le P(\|x\| - \|Z\| \le \epsilon) = P(\|Z\| \ge \|x\| - \epsilon).
\]
Hence,
\[
f(\|x\|)\,P(\|x + Z\| \le \epsilon) \le f(\|x\|)\,P(\|Z\| \ge \|x\| - \epsilon) \le \frac{f(\|x\|)}{f(\|x\| - \epsilon)}\,E\big[f(\|Z\|)\,I(\|Z\| \ge \|x\| - \epsilon)\big] \to 0
\]
as $\|x\| \to \infty$, by the dominated convergence theorem.

Proposition 2.2. Suppose that Assumptions 1, 2, 3, and 5 hold. Let $Z^i$ be a random variable with distribution $\phi_i$, $i = 1, \ldots, K$. Assume that $E\|Z^i\|^p < \infty$ for some $p > 0$ and all $i = 1, \ldots, K$. Then, there exist a function $V \in \mathcal{Q}$, a compact set $C$, and some constants $c > 0$, $d < \infty$ such that
\[
\mathcal{A} V(x) \le -cV(x) + d\,I_C(x), \quad x \in S. \tag{2.5}
\]

Proof. Fix $\epsilon > 0$. Let $V$ be a $C^2$ function with $V(x) = \|x\|_H^p = (x^\intercal Hx)^{p/2}$ for $\|x\|_H > \epsilon$, where $H \succ 0$ will be specified later. Note that
\[
\|x + y\|_H^p \le (\|x\|_H + \|y\|_H)^p \le
\begin{cases}
(1 + \|x\|_H)^p\,\|y\|_H^p, & \text{if } \|y\|_H > 1, \\
(1 + \|x\|_H)^p, & \text{if } \|y\|_H \le 1.
\end{cases}
\]
Hence,
\[
\|x + y\|_H^p \le (1 + \|x\|_H)^p\,(1 + \|y\|_H^p).
\]
It follows that, for each $i = 1, \ldots, K$,
\[
\begin{aligned}
\int V(x + z)\,\phi_i(dz) &\le \int_{\|x+z\|_H \le \epsilon} V(x + z)\,\phi_i(dz) + \int_{\|x+z\|_H > \epsilon} \|x + z\|_H^p\,\phi_i(dz) \\
&\le \sup_{\|y\|_H \le \epsilon} V(y) + \int (1 + \|x\|_H)^p\,(1 + \|z\|_H^p)\,\phi_i(dz) \\
&= \sup_{\|y\|_H \le \epsilon} V(y) + (1 + \|x\|_H)^p\,(1 + E\|Z^i\|_H^p).
\end{aligned}
\]
Hence, $\int V(\cdot + z)\,\phi_i(dz)$ is bounded on compact sets for each $i = 1, \ldots, K$, guaranteeing that $V \in \mathcal{Q}$.

Note that if we can show
\[
\mathcal{A} V(x) \le -cV(x), \quad \forall x \in C^c,
\]
where $C \triangleq \{x \in S : \|x\|_H \le k\}$ for some $k > 0$, then taking $d = \sup_{x\in C} \big( \mathcal{A} V(x) + cV(x) \big) < \infty$ will suffice. Therefore, it suffices to show
\[
\mathcal{A} V(x) \le -cV(x) \tag{2.6}
\]
for all $\|x\|_H$ sufficiently large.

Direct calculations yield that for $\|x\|_H$ sufficiently large,
\[
\nabla V(x) = p\|x\|_H^{p-2}\,Hx, \qquad
\nabla^2 V(x) = p\|x\|_H^{p-2}\left( (p-2)\|x\|_H^{-2}\,Hxx^\intercal H + H \right).
\]


It follows that
\[
\begin{aligned}
\mathcal{G} V(x) &= p\|x\|_H^{p-2}\left[ x^\intercal H(b - \beta x) + \frac{1}{2}\sum_{i,j=1}^n \left( a_{i,j} + \sum_{k=1}^m \alpha^k_{i,j}x_k \right)\left( (p-2)\|x\|_H^{-2}(Hxx^\intercal H)_{i,j} + H_{i,j} \right) \right] \\
&= p\|x\|_H^{p-2}\left[ -x^\intercal H\beta x + O(\|x\|_H) \right] \\
&= pV(x)\left( -\frac{x^\intercal H\beta x}{\|x\|_H^2} + o(1) \right) \tag{2.7}
\end{aligned}
\]
as $\|x\|_H \to \infty$.

Note that Assumption 2 and Assumption 5 indicate that we need to discuss the following two cases separately:

(a) $p \ge 1$ and $\beta - \sum_{i=1}^K EZ^i\,\kappa_i^\intercal$ is positive stable;

(b) $p \in (0, 1)$ and $\kappa = 0$.

Case (a). Suppose that $p \ge 1$ and $\beta - \sum_{i=1}^K EZ^i\,\kappa_i^\intercal$ is positive stable.

Applying Taylor's expansion, for each $i = 1, \ldots, K$,
\[
V(x + z) - V(x) = z^\intercal \nabla V(\xi) = z^\intercal \nabla V(\xi)\,I(\|\xi\|_H \le \epsilon) + p\|\xi\|_H^{p-2}\,\xi^\intercal Hz\,I(\|\xi\|_H > \epsilon),
\]
where $\xi = x + uz$ for some $u \in (0, 1)$. Therefore, for $\|x\|_H > \epsilon$,
\[
\frac{\kappa_i^\intercal x\,(V(x + z) - V(x))}{V(x)} = \frac{\kappa_i^\intercal x\,z^\intercal \nabla V(\xi)}{\|x\|_H^p}\,I(\|\xi\|_H \le \epsilon) + p \cdot \frac{\|\xi\|_H^p}{\|x\|_H^p} \cdot \frac{\xi^\intercal Hz\,\kappa_i^\intercal x}{\|\xi\|_H^2}\,I(\|\xi\|_H > \epsilon) \triangleq S_1 + S_2. \tag{2.8}
\]
Note that $\|\xi\|_H$ lies between $\|x\|_H$ and $\|x + z\|_H$, and that $\xi^\intercal Hz\,\kappa_i^\intercal x$ lies between $x^\intercal Hz\,\kappa_i^\intercal x$ and $(x + z)^\intercal Hz\,\kappa_i^\intercal x$. Therefore, it follows from the squeeze theorem and (2.8) that
\[
\frac{\kappa_i^\intercal x\,(V(x + z) - V(x))}{V(x)} \sim p \cdot \frac{x^\intercal Hz\,\kappa_i^\intercal x}{\|x\|_H^2} \tag{2.9}
\]
as $\|x\|_H \to \infty$ for each $z \in \mathbb{R}^n$, since $\|\xi\|_H \to \infty$ as $\|x\|_H \to \infty$. Moreover, it follows from (2.8) that

\[
S_1 \le \frac{\|\kappa_i\|_H \cdot \|x\|_H \cdot \|z\|_H \cdot \|\nabla V(\xi)\|_H}{\|x\|_H^p}\,I(\|\xi\|_H \le \epsilon) \le \|\kappa_i\|_H \cdot \sup_{\|y\|_H \le \epsilon}\|\nabla V(y)\|_H \cdot \frac{\|z\|_H}{\|x\|_H^{p-1}} \le \|\kappa_i\|_H \cdot \sup_{\|y\|_H \le \epsilon}\|\nabla V(y)\|_H \cdot \|z\|_H
\]
for $\|x\|_H$ large enough, and that
\[
\begin{aligned}
S_2 &\le \frac{\|\kappa_i\|_H \cdot \|x\|_H \cdot p\|\xi\|_H^{p-2} \cdot \|\xi\|_H \cdot \|H\|_H \cdot \|z\|_H}{\|x\|_H^p} \le p\|\kappa\|_H \cdot \|H\|_H \cdot \|z\|_H \cdot \frac{(\|x\|_H + u\|z\|_H)^{p-1}}{\|x\|_H^{p-1}} \\
&= p\|\kappa\|_H \cdot \|H\|_H \cdot \|z\|_H \cdot (1 + \|x\|_H^{-1}\|z\|_H)^{p-1} \le p\|\kappa\|_H \cdot \|H\|_H \cdot \|z\|_H \cdot (1 + \|z\|_H)^{p-1} \tag{2.10}
\end{aligned}
\]
for $\|x\|_H$ large enough. The right-hand sides of the two preceding bounds are clearly $\phi_i$-integrable, since $E\|Z^i\|^p < \infty$ and $p \ge 1$. It then follows from the dominated convergence theorem and (2.9) that
\[
\kappa_i^\intercal x \int (V(x + z) - V(x))\,\phi_i(dz) \sim pV(x) \int \frac{x^\intercal Hz\,\kappa_i^\intercal x}{\|x\|_H^2}\,\phi_i(dz) = pV(x) \cdot \frac{x^\intercal H\,EZ^i\,\kappa_i^\intercal x}{\|x\|_H^2}
\]
as $\|x\|_H \to \infty$. Therefore,
\[
\mathcal{L} V(x) \sim pV(x) \cdot \sum_{i=1}^K \frac{x^\intercal H\,EZ^i\,\kappa_i^\intercal x}{\|x\|_H^2} \tag{2.11}
\]


as $\|x\|_H \to \infty$. It then follows from (2.7) and (2.11) that
\[
\mathcal{A} V(x) = pV(x)\left( -\frac{x^\intercal H\big(\beta - \sum_{i=1}^K EZ^i\,\kappa_i^\intercal\big)x}{\|x\|_H^2} + o(1) \right) = pV(x)\left( -\frac{x^\intercal HBx}{\|x\|_H^2} + o(1) \right),
\]
where $B \triangleq \beta - \sum_{i=1}^K EZ^i\,\kappa_i^\intercal$. By Lemma 2.5, there exists $H \succ 0$ such that $HB + B^\intercal H \succ 0$. Hence,
\[
x^\intercal HBx = \frac{1}{2}\,x^\intercal (HB + B^\intercal H)x \ge c\|x\|^2,
\]
for some $c > 0$ and all $x \in \mathbb{R}^n$. Moreover, we have $\|x\|^2 \ge \delta\|x\|_H^2$ for some $\delta > 0$. Hence,
\[
\mathcal{A} V(x) \le -pc\delta\,V(x)
\]
for $\|x\|_H$ large enough, proving (2.6).

Case (b). Suppose that $p \in (0, 1)$ and $\kappa = 0$.

Note that, since $p \in (0, 1)$,
\[
\|x + y\|_H^p \le (\|x\|_H + \|y\|_H)^p \le \|x\|_H^p + \|y\|_H^p.
\]
Therefore, for $\|x\|_H > \epsilon$,
\[
\begin{aligned}
\left| \int (V(x + z) - V(x))\,\phi_i(dz) \right| &\le \int_{\|x+z\|_H \le \epsilon} |V(x + z) - V(x)|\,\phi_i(dz) + \int_{\|x+z\|_H > \epsilon} |V(x + z) - V(x)|\,\phi_i(dz) \\
&\le \int_{\|x+z\|_H \le \epsilon} (V(x + z) + V(x))\,\phi_i(dz) + \int_{\|x+z\|_H > \epsilon} \|z\|_H^p\,\phi_i(dz) \\
&\le \sup_{\|y\|_H \le \epsilon} V(y) + \|x\|_H^p \cdot P(\|x + Z^i\|_H \le \epsilon) + E\|Z^i\|_H^p = O(1)
\end{aligned}
\]
by Lemma 2.6. This implies that, since $\kappa = 0$,
\[
\mathcal{L} V(x) = \sum_{i=1}^K \lambda_i \int (V(x + z) - V(x))\,\phi_i(dz) = O(1) \tag{2.12}
\]
as $\|x\|_H \to \infty$. It follows from (2.7) and (2.12) that
\[
\mathcal{A} V(x) = pV(x)\left( -\frac{x^\intercal H\beta x}{\|x\|_H^2} + o(1) \right).
\]
By Lemma 2.5, there exists $H \succ 0$ such that
\[
x^\intercal H\beta x = \frac{1}{2}\,x^\intercal (H\beta + \beta^\intercal H)x \ge c\|x\|^2 \ge c\delta\|x\|_H^2,
\]
for some $c, \delta > 0$. Hence,
\[
\mathcal{A} V(x) \le -pc\delta\,V(x)
\]
for $\|x\|_H$ large enough, proving (2.6).

Likewise, we can even treat the case of a "super heavy-tailed" jump distribution, where $E\|Z^i\|^p = \infty$ for every $p > 0$, $i = 1, \ldots, K$. The assumption we need here regarding the tail behavior is that $E\log(1 + \|Z^i\|) < \infty$, $i = 1, \ldots, K$. Note that this condition on the tail behavior of the jump distribution also appears in [86] and [68], where the stationarity of the OU type process driven by a Lévy process is established.

Proposition 2.3. Suppose that Assumptions 1, 2, and 3 hold. If $E\log(1 + \|Z^i\|) < \infty$ for each $i = 1, \ldots, K$, and $\kappa = 0$, then there exist a function $V \in \mathcal{Q}$, a compact set $C$, and some constants $c > 0$, $d < \infty$ such that
\[
\mathcal{A} V(x) \le -c + d\,I_C(x), \quad x \in S. \tag{2.13}
\]

Proof. Fix $\epsilon > 0$. Note that there exists $H \succ 0$ such that $H\beta + \beta^\intercal H \succ 0$ by Lemma 2.5. Let $V$ be a $C^2$ function with $V(x) = \log(1 + \|x\|_H)$ for $\|x\|_H > \epsilon$. Note that $\log(1 + \|x\|_H)$ is subadditive:
\[
\log(1 + \|x + z\|_H) \le \log(1 + \|x\|_H + \|z\|_H) \le \log(1 + \|x\|_H) + \log(1 + \|z\|_H).
\]


Then, for each $i = 1, \ldots, K$,
\[
\begin{aligned}
\int V(x + z)\,\phi_i(dz) &\le \int_{\|x+z\|_H \le \epsilon} V(x + z)\,\phi_i(dz) + \int_{\|x+z\|_H > \epsilon} \log(1 + \|x + z\|_H)\,\phi_i(dz) \\
&\le \sup_{\|y\|_H \le \epsilon} V(y) + \int \big( \log(1 + \|x\|_H) + \log(1 + \|z\|_H) \big)\,\phi_i(dz) \\
&= \sup_{\|y\|_H \le \epsilon} V(y) + \log(1 + \|x\|_H) + E\log(1 + \|Z^i\|_H).
\end{aligned}
\]
Hence, $\int V(\cdot + z)\,\phi_i(dz)$ is bounded on compact sets for each $i = 1, \ldots, K$, guaranteeing that $V \in \mathcal{Q}$. Then it suffices to show that
\[
\mathcal{A} V(x) \le -c
\]
for some $c > 0$ and all $\|x\|_H$ sufficiently large.

Direct calculations yield that for $\|x\|_H > \epsilon$,
\begin{align*}
\nabla V(x) &= \|x\|_H^{-1}(1 + \|x\|_H)^{-1} Hx, \\
\nabla^2 V(x) &= \|x\|_H^{-1}(1 + \|x\|_H)^{-1}\big( H - \|x\|_H^{-2}(1 + \|x\|_H)^{-1}(1 + 2\|x\|_H)\,Hxx^\top H \big).
\end{align*}
It follows that
\begin{align*}
\mathcal{G}V(x) &= \|x\|_H^{-1}(1 + \|x\|_H)^{-1}\Big( x^\top H(b - \beta x) + \frac{1}{2}\sum_{i,j=1}^n \Big( a_{i,j} + \sum_{k=1}^m \alpha_{i,j}^k x_k \Big)\big( H_{i,j} - \|x\|_H^{-2}(1 + \|x\|_H)^{-1}(1 + 2\|x\|_H)(Hxx^\top H)_{i,j} \big) \Big) \\
&= \|x\|_H^{-1}(1 + \|x\|_H)^{-1}\big( -x^\top H\beta x + O(\|x\|_H) \big) \\
&= -\frac{x^\top H\beta x}{\|x\|_H(1 + \|x\|_H)} + o(1)
\end{align*}


as $\|x\| \to \infty$. Note that $x^\top H\beta x \geq c\|x\|^2 \geq c\delta\|x\|_H^2$ for some $c, \delta > 0$, so
\[
\lim_{\|x\|_H \to \infty} \frac{x^\top H\beta x}{\|x\|_H(1 + \|x\|_H)} \geq \lim_{\|x\|_H \to \infty} \frac{c\delta\|x\|_H^2}{\|x\|_H(1 + \|x\|_H)} = c\delta.
\]
Hence,
\[
\lim_{\|x\|_H \to \infty} \mathcal{G}V(x) \leq -c\delta. \tag{2.14}
\]
Moreover, we have
\[
\int \big(V(x+z) - V(x)\big)\,\varphi_i(dz) = \int_{\|x+z\|_H \leq \epsilon} \big(V(x+z) - V(x)\big)\,\varphi_i(dz) + \int_{\|x+z\|_H > \epsilon} \big(V(x+z) - V(x)\big)\,\varphi_i(dz) \triangleq S_1 + S_2.
\]
Note that for $\|x\|_H > \epsilon$,
\[
S_1 \leq \int_{\|x+z\|_H \leq \epsilon} \Big( \sup_{\|y\|_H \leq \epsilon} V(y) - \log(1 + \|x\|_H) \Big)\,\varphi_i(dz) = \Big( \sup_{\|y\|_H \leq \epsilon} V(y) - \log(1 + \|x\|_H) \Big)\,\mathbb{P}(\|x + Z^i\|_H \leq \epsilon) \to 0 \tag{2.15}
\]
as $\|x\|_H \to \infty$ by Lemma 2.6. Also note that for $\|x\|_H > \epsilon$,
\[
S_2 \leq \int_{\|x+z\|_H > \epsilon} \big( \log(1 + \|x\|_H + \|z\|_H) - \log(1 + \|x\|_H) \big)\,\varphi_i(dz) \leq \int \log\Big( 1 + \frac{\|z\|_H}{1 + \|x\|_H} \Big)\,\varphi_i(dz) \to 0 \tag{2.16}
\]
by the dominated convergence theorem. It follows from (2.15) and (2.16) that
\[
\lim_{\|x\|_H \to \infty} \mathcal{L}V(x) = \lim_{\|x\|_H \to \infty} \sum_{i=1}^K \lambda_i \int \big(V(x+z) - V(x)\big)\,\varphi_i(dz) = 0, \tag{2.17}
\]
since $\kappa = 0$. Combining (2.14) and (2.17) gives us
\[
\mathcal{A}V(x) \leq -\frac{c\delta}{2}
\]
for $\|x\|_H$ large enough.

2.3.2 Ergodicity

We have verified the Foster-Lyapunov inequality (2.5) needed to apply Proposition A.1 for establishing (exponential) ergodicity. The only difference is that the set $C$ in the inequality is compact, whereas Proposition A.1 requires $C$ to be petite. Our next task is then to show that every compact set is petite for $X$.

Lemma 2.7. Suppose that $\Phi = (\Phi_n : n \geq 0)$ is a $\varphi$-irreducible discrete-time Markov chain with state space $S$. If $\Phi$ has the Feller property and the support of $\varphi$ has non-empty interior, then all compact subsets of $S$ are petite.

Proof. See Proposition 6.2.8 of [74].

Proposition 2.4. Under Assumptions 1, 2, and 4, every compact subset of $S$ is petite for $X$.

Proof. Fix $\delta > 0$ and consider the skeleton chain $X^\delta \triangleq (X(n\delta) : n \geq 0)$. Lemma 2.1 implies that $X$ has the Feller property; hence, $X^\delta$ is a Feller chain. Moreover, Lemma 2.2 guarantees the existence of the transition density of $\mathbb{P}_x(X(t) \in dy)$. Hence, $X$, and thus $X^\delta$, is $\nu$-irreducible, where $\nu$ is the Lebesgue measure. It then follows from Lemma 2.7 that every compact subset of $S$ is petite for the skeleton chain $X^\delta$ and therefore, by definition, is petite for $X$.

We also need the following property of the $f$-norm of signed measures defined in (A.7).

Lemma 2.8. Let $\eta$ be a signed measure on $S$ and let $f, g \geq 1$ be two positive measurable functions with $f \leq cg$ for some constant $c > 0$. Then
\[
\|\eta\|_f \leq c\|\eta\|_g.
\]


Proof. For any function $h$ with $|h| \leq f$, clearly $|h/c| \leq g$. Denote $\int_S h(x)\,\eta(dx)$ by $\eta(h)$. Then,
\[
|\eta(h)| = c|\eta(h/c)| \leq c\sup_{|r| \leq g}|\eta(r)| = c\|\eta\|_g,
\]
and thus
\[
\sup_{|h| \leq f}|\eta(h)| \leq c\|\eta\|_g,
\]
i.e. $\|\eta\|_f \leq c\|\eta\|_g$.

Theorem 2.1. Suppose that Assumptions 1, 2, 3, 4, and 5 hold. Let $Z^i$ be a rv with distribution $\varphi_i$, $i = 1,\ldots,K$. Assume that $\mathbb{E}\|Z^i\|^p < \infty$ for some $p > 0$ and all $i = 1,\ldots,K$. Then,

(i) $X$ admits a unique stationary distribution $\pi$;

(ii) $X$ is $f$-exponentially ergodic, where $f(x) = \|x\|^p$. In particular,
\[
\|\mathbb{P}_x(X(t) \in \cdot) - \pi(\cdot)\|_f \leq c f(x) e^{-\gamma t}
\]
for each $x \in S$ and some $c > 0$;

(iii) $\mathbb{E}_\pi\|X(0)\|^p < \infty$.

Proof. Let $V(x) = \|x\|_H^p$. It follows from Proposition 2.2, Proposition 2.4, and Proposition A.2 that $X$ has a unique stationary distribution $\pi$ for which $\mathbb{E}_\pi V(X(0)) < \infty$ and
\[
\|\mathbb{P}_x(X(t) \in \cdot) - \pi(\cdot)\|_V \leq c_1 V(x) e^{-\gamma t}, \quad x \in S,
\]
for some $c_1, \gamma > 0$. Note that $\|\cdot\|_H$ is equivalent to $\|\cdot\|$, hence $\mathbb{E}_\pi\|X(0)\|^p < \infty$ and
\[
c_2 f(x) \leq V(x) \leq c_3 f(x), \quad x \in \mathbb{R}^n,
\]
for some $c_2, c_3 > 0$. It then follows from Lemma 2.8 that
\[
c_2\|\cdot\|_f \leq \|\cdot\|_V,
\]
implying that
\[
\|\mathbb{P}_x(X(t) \in \cdot) - \pi(\cdot)\|_f \leq c_2^{-1}\|\mathbb{P}_x(X(t) \in \cdot) - \pi(\cdot)\|_V \leq \frac{c_1 c_3}{c_2}\,f(x) e^{-\gamma t},
\]
for all $x \in S$.

Theorem 2.2. Suppose that Assumptions 1, 2, 3, and 4 hold. If $\mathbb{E}\log(1 + \|Z^i\|) < \infty$ and $\kappa_i = 0$, $i = 1,\ldots,K$, then $X$ admits a unique stationary distribution and is ergodic.

Proof. This is an immediate consequence of Proposition 2.3, Proposition 2.4, and Proposition A.1.

The preceding two theorems indicate that the convergence rate of $X(t)$ to the equilibrium is essentially determined by the jump component, particularly the tail heaviness of the "heaviest" jump distribution $\varphi_i$, which also determines the tail heaviness of the stationary distribution.

2.3.3 Central Limit Theorem

It is well known that under mild conditions, a positive Harris recurrent Markov process $\Phi = (\Phi(t) : t \geq 0)$ satisfies the central limit theorem (CLT)
\[
t^{1/2}\Big( \frac{1}{t}\int_0^t f(\Phi(s))\,ds - \pi(f) \Big) \Rightarrow \mathcal{N}(0, \sigma^2)
\]
as $t \to \infty$, where $\pi$ is the stationary distribution, $f : S \to \mathbb{R}$ is such that $\pi(|f|) < \infty$, and the variance is
\[
\sigma^2 \triangleq \mathbb{E}_\pi \bar{f}(\Phi(0))^2 + 2\int_0^\infty \mathbb{E}_\pi\big[\bar{f}(\Phi(0))\bar{f}(\Phi(t))\big]\,dt,
\]
where $\bar{f} = f - \pi(f)$ is the centered version of $f$. The derivation of such a CLT typically exploits the regenerative structure of $\Phi$ (i.e. divide $\Phi(t)$ into iid regenerative cycles and apply the CLT for iid rv's); see, for example, [74].
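The $\sqrt{t}$ scaling in this CLT is easy to see numerically. The following is a minimal Monte Carlo sketch (an illustrative O-U example with $f(x) = x$; all parameter values are arbitrary, not taken from this dissertation) showing that the scaled deviation of the time average stabilizes as the horizon grows:

```python
import numpy as np

# Minimal sketch: for an ergodic O-U process with f(x) = x, the sqrt(T)-scaled
# deviation of the time average from pi(f) = b/beta settles to a fixed Gaussian
# spread as T grows.  All parameter values here are illustrative.
rng = np.random.default_rng(0)
b, beta, sigma, dt = 1.0, 2.0, 0.5, 0.01
for T in [50.0, 200.0]:
    nsteps, npaths = int(T/dt), 1000
    x = np.full(npaths, b/beta)              # start at the stationary mean
    s = np.zeros(npaths)
    for _ in range(nsteps):                  # Euler scheme for the O-U SDE
        x += (b - beta*x)*dt + sigma*np.sqrt(dt)*rng.standard_normal(npaths)
        s += x*dt
    print(T, np.sqrt(T)*np.std(s/T - b/beta))   # roughly constant in T
```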


We will apply a different approach here. Particularly, we will express $\int_0^t X(s)\,ds$ in the form of a local martingale plus a remainder term and apply the CLT for local martingales. Not only can we obtain a multidimensional CLT, but we can also calculate the covariance matrix of the limit explicitly, thanks to the affine structure.

In particular, we consider an ($\mathbb{R}^n$-valued) local martingale of the form
\[
V(t) \triangleq \int_0^t X(s)\,ds - ct + B(X(t) - X(0)) \tag{2.18}
\]
for some $c \in \mathbb{R}^n$ and $B \in \mathbb{R}^{n \times n}$. Then, by (2.1),
\begin{align*}
V(t) &= \int_0^t X(s)\,ds - ct + \int_0^t B(b - \beta X(s))\,ds + \int_0^t B\sigma(X(s))\,dW(s) + \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} Bz\,N_i(ds, dz) \\
&= \Big( Bb + \sum_{i=1}^K \lambda_i B\,\mathbb{E}Z^i - c \Big)t + \int_0^t \Big( I - B\beta + \sum_{i=1}^K B\,\mathbb{E}Z^i\kappa_i^\top \Big)X(s)\,ds \\
&\quad + \int_0^t B\sigma(X(s))\,dW(s) + \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} Bz\,\tilde{N}_i(ds, dz), \tag{2.19}
\end{align*}
where $I$ is the identity matrix, $\tilde{N}_i(ds, dz) \triangleq N_i(ds, dz) - \Lambda_i(X(s-))\,ds\,\varphi_i(dz)$ is the compensated random measure of $N_i(ds, dz)$, and $Z^i$ has distribution $\varphi_i$, $i = 1,\ldots,K$. Hence, if we choose $c$ and $B$ such that
\[
Bb + \sum_{i=1}^K \lambda_i B\,\mathbb{E}Z^i - c = 0, \qquad I - B\beta + \sum_{i=1}^K B\,\mathbb{E}Z^i\kappa_i^\top = 0, \tag{2.20}
\]
then $V(t)$ is a local martingale in $\mathbb{R}^n$. Assumption 5 implies that $\beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top$ is nonsingular, so
\[
B = \Big( \beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top \Big)^{-1}, \qquad c = B\Big( b + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^i \Big). \tag{2.21}
\]
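Once the model primitives are fixed, (2.21) is a single matrix inversion. A minimal Python sketch (toy two-dimensional parameters with $K = 1$ jump type; all values hypothetical):

```python
import numpy as np

# Minimal sketch of (2.20)-(2.21): solve for B and c given toy parameters
# (n = 2, K = 1).  All numbers below are hypothetical test values.
beta  = np.array([[2.0, 0.0], [0.5, 3.0]])
b     = np.array([1.0, 0.4])
lam   = 0.8                          # lambda_1
kappa = np.array([1.2, 0.0])         # kappa_1
EZ    = np.array([0.3, 0.1])         # E Z^1

B = np.linalg.inv(beta - np.outer(EZ, kappa))   # B = (beta - sum_i EZ^i kappa_i^T)^{-1}
c = B @ (b + lam*EZ)                            # c = B(b + sum_i lambda_i EZ^i)
print(B, c)
```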

Now that we have constructed a local martingale $V(t)$, we will apply the CLT for local martingales as follows.

Proposition 2.5. (Local Martingale CLT) Let $M = (M(t) : t \geq 0)$ be a local martingale in $\mathbb{R}^n$ and let $\langle M \rangle = (\langle M \rangle(t) : t \geq 0) \in \mathbb{R}^{n \times n}$ be the predictable quadratic covariation (see, for example, [83]) of $M$. Suppose that for each $T > 0$ and $i, j = 1,\ldots,n$,
\[
\lim_{t \to \infty} t^{-1}\langle M \rangle_{ij}(t) = \Sigma_{ij} \ \text{in probability}, \tag{2.22}
\]
for some $\Sigma \succeq 0$ in $\mathbb{R}^{n \times n}$, and
\[
\lim_{n \to \infty} \mathbb{E}\sup_{0 \leq t \leq nT} n^{-1}|M_i(t) - M_i(t-)|^2 = 0, \tag{2.23}
\]
and
\[
\lim_{n \to \infty} \mathbb{E}\sup_{0 \leq t \leq nT} n^{-1/2}|\langle M \rangle_{ij}(t) - \langle M \rangle_{ij}(t-)| = 0. \tag{2.24}
\]
Then,
\[
t^{-1/2}M(t) \Rightarrow \mathcal{N}(0, \Sigma)
\]
as $t \to \infty$, where $\mathcal{N}(0, \Sigma)$ is an $n$-dimensional Gaussian random variable with mean $0$ and covariance matrix $\Sigma$.

Proof. See Theorem 1.4 in Chapter 7 of [38].

Roughly speaking, Proposition 2.5 asserts that if i) $\langle M \rangle$ has an equilibrium (condition (2.22)), and ii) the jumps of either $M$ or $\langle M \rangle$ are neither too large nor too frequent (conditions (2.23) and (2.24)), then $M$ satisfies the CLT. We will now verify these conditions for $V$ one by one.
these conditions <strong>for</strong> V one by one.


Proposition 2.6. Let $\langle V \rangle$ be the predictable quadratic covariation of $V$. If $X$ is ergodic, then
\[
t^{-1}\langle V \rangle(t) \to \Gamma \ \text{a.s.}
\]
as $t \to \infty$ for some $\Gamma \succ 0$.

Proof. Note that, by (2.19) and (2.20),
\[
V(t) = \int_0^t B\sigma(X(s))\,dW(s) + \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} Bz\,\tilde{N}_i(ds, dz), \tag{2.25}
\]
where $B$ is given by (2.21). It follows that the predictable quadratic covariation is
\begin{align*}
\langle V \rangle(t) &= \int_0^t B\sigma(X(s))\sigma(X(s))^\top B^\top\,ds + \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} Bzz^\top B^\top\,\varphi_i(dz)\,\Lambda_i(X(s-))\,ds \\
&= B\Big( a + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^iZ^{i\top} \Big)B^\top t + \sum_{j=1}^n B\Big( \alpha^j + \sum_{i=1}^K \kappa_{i,j}\,\mathbb{E}Z^iZ^{i\top} \Big)B^\top \int_0^t X_j(s-)\,ds \\
&= B\Big( a + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^iZ^{i\top} \Big)B^\top t + \sum_{j=1}^m B\Big( \alpha^j + \sum_{i=1}^K \kappa_{i,j}\,\mathbb{E}Z^iZ^{i\top} \Big)B^\top \int_0^t X_j(s-)\,ds, \tag{2.26}
\end{align*}
since $\alpha^j = 0$ and $\kappa_{i,j} = 0$ for $j = m+1,\ldots,n$. For the calculation of the predictable quadratic covariation, we refer to [2] or [83]. The ergodicity of $X$ implies the following strong law of large numbers (SLLN):
\[
\frac{1}{t}\int_0^t f(X(s))\,ds \to \pi(f) \ \text{a.s.}
\]
as $t \to \infty$ for any nonnegative function $f$, where $\pi$ is the stationary distribution of $X$; see, for example, [74]. In particular, we have
\[
\frac{1}{t}\int_0^t X_j(s)\,ds \to \int_S x_j\,\pi(dx) \ \text{a.s.} \tag{2.27}
\]
as $t \to \infty$. It follows from (2.26) and (2.27) that
\[
t^{-1}\langle V \rangle(t) \to \Gamma \ \text{a.s.} \tag{2.28}
\]
as $t \to \infty$, where
\[
\Gamma = B\Big( a + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^iZ^{i\top} + \sum_{j=1}^m \Big( \alpha^j + \sum_{i=1}^K \kappa_{i,j}\,\mathbb{E}Z^iZ^{i\top} \Big)\int_S x_j\,\pi(dx) \Big)B^\top. \tag{2.29}
\]
Finally, $\Gamma \succ 0$ follows readily from the fact that $a \succeq 0$ and $\alpha^j \succeq 0$, $j = 1,\ldots,m$.

We now proceed to verify condition (2.23) for $V(t)$. First, we need to estimate the number of jumps in a fixed time interval.

Lemma 2.9. Let $\Psi = (\Psi(t) : t \geq 0)$ be an adapted counting process with intensity $\Lambda(t)$, for which $\mathbb{E}\big( \int_0^T \Lambda(s)\,ds \big) < \infty$. If $\Psi$ is nonexplosive, then $(\Psi(t) - \int_0^t \Lambda(s)\,ds : 0 \leq t \leq T)$ is a martingale.

Proof. See Theorems T8 and T9 of [14].

Lemma 2.10. If $X$ is ergodic, then
\[
\lim_{k \to \infty} \frac{\mathbb{E}N_i(kT)}{k} = T\,\mathbb{E}_\pi \Lambda_i(X(0))
\]
for all $T > 0$ and $i = 1,\ldots,K$.

Proof. Fix $T > 0$ and $i \in \{1,\ldots,K\}$. $X$ is ergodic, so
\[
\mathbb{E}X(t) \to \mathbb{E}_\pi X(0)
\]
as $t \to \infty$. Hence, for any $\epsilon > 0$, there exists $k_0 > 0$ such that
\[
\mathbb{E}\Lambda_i(X(s)) < \mathbb{E}_\pi \Lambda_i(X(0)) + \epsilon
\]
for all $s > k_0 T$, since $\Lambda_i(x)$ is affine in $x$. Moreover, the ergodicity of $X$ implies that $N$ is nonexplosive, from which and Lemma 2.9 it follows that
\begin{align*}
\frac{\mathbb{E}N_i(kT)}{k} &= \frac{1}{k}\mathbb{E}\int_0^{kT} \Lambda_i(X(s-))\,ds = \frac{1}{k}\mathbb{E}\int_0^{kT} \Lambda_i(X(s))\,ds \\
&= \frac{1}{k}\int_0^{kT} \mathbb{E}\Lambda_i(X(s))\,ds \quad \text{(by Fubini's theorem)} \\
&\leq \frac{1}{k}\int_0^{k_0T} \mathbb{E}\Lambda_i(X(s))\,ds + \frac{1}{k}\big( \mathbb{E}_\pi \Lambda_i(X(0)) + \epsilon \big)(kT - k_0T).
\end{align*}
Therefore,
\[
\limsup_{k \to \infty} \frac{\mathbb{E}N_i(kT)}{k} \leq T\big( \mathbb{E}_\pi \Lambda_i(X(0)) + \epsilon \big).
\]
Likewise, we can show
\[
\liminf_{k \to \infty} \frac{\mathbb{E}N_i(kT)}{k} \geq T\big( \mathbb{E}_\pi \Lambda_i(X(0)) - \epsilon \big).
\]
Sending $\epsilon \downarrow 0$ completes the proof.
≥ T (EπΛi(X(0)) − ɛ).<br />

Roughly speaking, Lemma 2.10 states that the expected number of jumps is proportional to the length of the time interval (so jumps do not occur "too often"). We are now ready to show the following result, verifying condition (2.23) for $V(t)$.

Proposition 2.7. Suppose that $X$ is ergodic and $\mathbb{E}\|Z^i\|^{2+\epsilon} < \infty$ for some $\epsilon > 0$. Then,
\[
\lim_{k \to \infty} \mathbb{E}\sup_{0 \leq t \leq kT} k^{-1}|V_l(t) - V_l(t-)|^2 = 0
\]
for $l = 1,\ldots,n$.

Proof. Let $(Z^i_j : j \geq 1)$ be a sequence of iid rv's with common distribution $\varphi_i$, and let $g(z) = Bz$, where $B$ is the matrix given by (2.21). It follows from the representation of $V(t)$ in (2.25) that the pure jump part of $V$ is
\[
\sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} g(z)\,N_i(ds, dz).
\]
Hence,
\begin{align*}
\mathbb{P}\Big( \sup_{0 \leq t \leq kT} k^{-1}|V_l(t) - V_l(t-)|^2 > x \Big)
&= \mathbb{P}\Big( \sup_{1 \leq i \leq K}\ \sup_{1 \leq j \leq N_i(kT)} k^{-1}g_l(Z^i_j)^2 > x \Big) \\
&= \mathbb{E}\Big[ \mathbb{P}\Big( \sup_{1 \leq i \leq K}\ \sup_{1 \leq j \leq N_i(kT)} k^{-1}g_l(Z^i_j)^2 > x \,\Big|\, N_i(kT), i = 1,\ldots,K \Big) \Big] \\
&\leq \mathbb{E}\Big[ \sum_{i=1}^K \sum_{j=1}^{N_i(kT)} \mathbb{P}\big( k^{-1}g_l(Z^i_j)^2 > x \,\big|\, N_i(kT), i = 1,\ldots,K \big) \Big] \\
&= \mathbb{E}\Big[ \sum_{i=1}^K \sum_{j=1}^{N_i(kT)} \mathbb{P}\big( k^{-1}g_l(Z^i_j)^2 > x \big) \Big] \\
&= \sum_{i=1}^K \mathbb{E}N_i(kT)\,\mathbb{P}\big( k^{-1}g_l(Z^i_1)^2 > x \big).
\end{align*}
It follows that for any $\delta > 0$,
\begin{align*}
\mathbb{E}\sup_{0 \leq t \leq kT} k^{-1}|V_l(t) - V_l(t-)|^2
&\leq \delta + \int_\delta^\infty \mathbb{P}\Big( \sup_{0 \leq t \leq kT} k^{-1}|V_l(t) - V_l(t-)|^2 > x \Big)\,dx \\
&\leq \delta + \int_\delta^\infty \sum_{i=1}^K \mathbb{E}N_i(kT)\,\mathbb{P}\big( k^{-1}g_l(Z^i_1)^2 > x \big)\,dx \\
&= \delta + \sum_{i=1}^K \mathbb{E}N_i(kT) \int_\delta^\infty \mathbb{P}\big( k^{-1}g_l(Z^i_1)^2 > x \big)\,dx \\
&\leq \delta + \sum_{i=1}^K \mathbb{E}N_i(kT) \int_\delta^\infty (kx)^{-(1+\frac{\epsilon}{2})}\,\mathbb{E}|g_l(Z^i_1)|^{2+\epsilon}\,dx.
\end{align*}
Therefore,
\[
\lim_{k \to \infty} \mathbb{E}\sup_{0 \leq t \leq kT} k^{-1}|V_l(t) - V_l(t-)|^2 \leq \delta
\]
by Lemma 2.10. Sending $\delta \downarrow 0$ finishes the proof.

Now, we are ready to prove the CLT for $X$.

Theorem 2.3. Suppose that $Z^i \in \mathbb{R}^n$ has distribution $\varphi_i$ and $\mathbb{E}\|Z^i\|^{2+\epsilon} < \infty$, for some $\epsilon > 0$ and all $i = 1,\ldots,K$. Then, under Assumptions 1 - 5,
\[
t^{1/2}\Big( \frac{1}{t}\int_0^t X(s)\,ds - c \Big) \Rightarrow \mathcal{N}(0, \Gamma)
\]
as $t \to \infty$, where
\begin{align*}
c &= B\Big( b + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^i \Big), \\
\Gamma &= B\Big( a + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^iZ^{i\top} + \sum_{j=1}^m \Big( \alpha^j + \sum_{i=1}^K \kappa_{ij}\,\mathbb{E}Z^iZ^{i\top} \Big)\int_S x_j\,\pi(dx) \Big)B^\top, \\
B &= \Big( \beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top \Big)^{-1}.
\end{align*}

Proof. Note that $\langle V \rangle(t)$ has continuous sample paths a.s., so condition (2.24) is trivially satisfied for $V(t)$. Under the current assumptions, it follows from Proposition 2.1 that $X$ is ergodic, so Propositions 2.6 and 2.7 respectively verify conditions (2.22) and (2.23) for $V(t)$. Therefore, by the local martingale CLT (Proposition 2.5), we have
\[
t^{-1/2}V(t) \Rightarrow \mathcal{N}(0, \Gamma)
\]
as $t \to \infty$, where $\Gamma$ is given by (2.29). It follows from the ergodicity of $X$ that
\[
X(t) \Rightarrow X(\infty)
\]
as $t \to \infty$, where $X(\infty)$ has the stationary distribution $\pi$. Hence, $t^{-1/2}X(t) \Rightarrow 0$ and thus
\[
t^{-1/2}X(t) \to 0 \ \text{in probability}
\]
as $t \to \infty$. Recall the representation (2.18),
\[
\int_0^t X(s)\,ds - ct = V(t) - B(X(t) - X(0)),
\]
from which it follows that
\[
t^{1/2}\Big( \frac{1}{t}\int_0^t X(s)\,ds - c \Big) \Rightarrow \mathcal{N}(0, \Gamma)
\]
as $t \to \infty$.

We clearly have the following corollary regarding the equilibrium of $X$.

Corollary 2.1. Under Assumptions 1 - 5,
\[
\frac{1}{t}\int_0^t X(s)\,ds \to \int_S x\,\pi(dx) = c \ \text{a.s.}
\]
as $t \to \infty$, where
\[
c = \Big( \beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top \Big)^{-1}\Big( b + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^i \Big).
\]

2.4 Characterization of Stationary Distribution

Theorem 2.4. Suppose that Assumptions 1, 2, 3, 4, and 5 hold. Let $Z^i$ be a rv with distribution $\varphi_i$, $i = 1,\ldots,K$. Assume that $\mathbb{E}\|Z^i\| < \infty$ for all $i = 1,\ldots,K$. Let $\psi(\theta) = \mathbb{E}_\pi e^{i\theta^\top X(0)}$ be the Fourier transform of the distribution $\pi$. Then $\psi$ satisfies the following first-order PDE
\[
f(\theta)\psi(\theta) + g(\theta)^\top \nabla\psi(\theta) = 0, \tag{2.30}
\]
with $\psi(0) = 1$, where
\[
f(\theta) = i\theta^\top b - \frac{1}{2}\theta^\top a\theta + \sum_{k=1}^K \lambda_k(\psi_k(\theta) - 1),
\]
and
\[
g(\theta)^\top = -\theta^\top \beta + \frac{1}{2}i\,\theta^\top \alpha\theta - i\sum_{k=1}^K (\psi_k(\theta) - 1)\kappa_k^\top,
\]
where $\theta^\top \alpha\theta \triangleq (\theta^\top \alpha^1\theta, \ldots, \theta^\top \alpha^n\theta)$ and $\psi_k(\theta) = \mathbb{E}e^{i\theta^\top Z^k}$.

Proof. Let $h(x) = e^{i\theta^\top x}$. Then, similarly to the derivation of (2.4), applying Itô's formula for complex-valued functions (see, for example, [83]), we obtain that
\[
h(X(t)) - h(X(0)) - \int_0^t (\mathcal{A}h)(X(s))\,ds = \int_0^t \nabla h(X(s-))^\top \sigma(X(s))\,dW(s) + \sum_{i=1}^K \int_0^t \int_S \big( h(X(s-) + z) - h(X(s-)) \big)\,\tilde{N}_i(ds, dz) \triangleq I_1 + I_2.
\]
Note that $\nabla h(x) = ih(x)\theta$, so
\begin{align*}
\mathbb{E}_\pi \int_0^t |\nabla h(X(s-))^\top \sigma(X(s))|^2\,ds
&= \mathbb{E}_\pi \int_0^t |h(X(s-))|^2\Big| \theta^\top a\theta + \sum_{j=1}^m \theta^\top \alpha^j\theta\,X_j(s-) \Big|\,ds \\
&\leq \mathbb{E}_\pi \int_0^t \Big( \theta^\top a\theta + \sum_{j=1}^m \theta^\top \alpha^j\theta\,X_j(s) \Big)\,ds \\
&= \theta^\top a\theta\,t + \sum_{j=1}^m \theta^\top \alpha^j\theta \int_0^t \mathbb{E}_\pi X_j(s)\,ds \\
&= \Big( \theta^\top a\theta + \sum_{j=1}^m \theta^\top \alpha^j\theta\,\mathbb{E}_\pi X_j(0) \Big)t < \infty,
\end{align*}
where the penultimate equality follows from Fubini's theorem since $X_j \geq 0$ for $j = 1,\ldots,m$. Therefore, $I_1$ is a $\mathbb{P}_\pi$-martingale.


On the other hand, note that $|h(x)| \leq 1$ for all $x$, so
\[
\mathbb{E}_\pi \int_0^t \int_S |h(X(s-) + z) - h(X(s-))|^2\,\varphi_i(dz)\,ds \leq \mathbb{E}_\pi \int_0^t \int_S 2\big( |h(X(s-) + z)|^2 + |h(X(s-))|^2 \big)\,\varphi_i(dz)\,ds \leq 4t < \infty.
\]
Hence, by Lemma 2.4, $I_2$ is a $\mathbb{P}_\pi$-martingale. It follows that
\[
\mathbb{E}_\pi h(X(t)) - \mathbb{E}_\pi \int_0^t (\mathcal{A}h)(X(s))\,ds = \mathbb{E}_\pi h(X(0)),
\]
and thus
\[
\mathbb{E}_\pi \int_0^t (\mathcal{A}h)(X(s))\,ds = 0 \tag{2.31}
\]
for any $t > 0$, since $\mathbb{E}_\pi h(X(t)) = \mathbb{E}_\pi h(X(0))$.

Moreover,
\begin{align*}
(\mathcal{A}h)(x) &= h(x)\Big( i\theta^\top(b - \beta x) - \frac{1}{2}\sum_{k,l=1}^n \Big( a_{kl} + \sum_{r=1}^n \alpha_{kl}^r x_r \Big)\theta_k\theta_l + \sum_{k=1}^K (\lambda_k + \kappa_k^\top x)\int (e^{i\theta^\top z} - 1)\,\varphi_k(dz) \Big) \\
&= h(x)\Big( \Big( i\theta^\top b - \frac{1}{2}\theta^\top a\theta + \sum_{k=1}^K \lambda_k(\psi_k(\theta) - 1) \Big) + \Big( -i\theta^\top \beta - \frac{1}{2}\theta^\top \alpha\theta + \sum_{k=1}^K (\psi_k(\theta) - 1)\kappa_k^\top \Big)x \Big). \tag{2.32}
\end{align*}
Hence, by Fubini's theorem,
\[
\mathbb{E}_\pi \int_0^t (\mathcal{A}h)(X(s))\,ds = \int_0^t \mathbb{E}_\pi (\mathcal{A}h)(X(s))\,ds = \int_0^t \mathbb{E}_\pi (\mathcal{A}h)(X(0))\,ds
\]
since $\mathbb{E}_\pi \|X(0)\| < \infty$. It then follows from (2.31) that
\[
\int_0^t \mathbb{E}_\pi (\mathcal{A}h)(X(0))\,ds = 0
\]
for any $t > 0$, implying that
\[
\mathbb{E}_\pi (\mathcal{A}h)(X(0)) = 0. \tag{2.33}
\]
Note that $\psi(\theta) = \mathbb{E}_\pi h(X(0))$ and that
\[
\nabla\psi(\theta) = i\,\mathbb{E}_\pi X(0)h(X(0)),
\]
which can be shown easily by the dominated convergence theorem. So the proof is completed by combining (2.32) and (2.33).

Remark 2.5. The identity (2.33) also appears in [35] and [50]. It is proved in [35] that if a probability measure $\eta$ satisfies
\[
\int \mathcal{A}f(x)\,\eta(dx) = 0
\]
for all bounded $C^2$ functions $f$, then $\eta$ is the stationary distribution. The same identity is used in [50] for establishing upper bounds on stationary expectations.

Remark 2.6. Note that the first-order PDE (2.30) is linear and thus can be solved by the method of characteristics (see, for example, [39]). In particular, it suffices to solve the following ODE system for $\theta = \theta(s)$ and $\delta = \delta(s)$:
\begin{align*}
\frac{d\theta^\top}{ds} &= -\theta^\top \beta + \frac{1}{2}i\,\theta^\top \alpha\theta - i\sum_{k=1}^K (\psi_k(\theta) - 1)\kappa_k^\top, \\
\frac{d\delta}{ds} &= i\theta^\top b - \frac{1}{2}\theta^\top a\theta + \sum_{k=1}^K \lambda_k(\psi_k(\theta) - 1).
\end{align*}
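The characteristic system is straightforward to integrate numerically. As a minimal sketch, consider the CIR specialization treated in Example 2.2 below ($n = m = 1$, $K = 0$, $a = 0$, $\alpha = \sigma^2$), where the system reduces to $d\theta/ds = -\beta\theta + \frac{1}{2}i\sigma^2\theta^2$ and $d\delta/ds = ib\theta$; integrating from $\theta(0) = \theta_0$ until $\theta(s) \to 0$ gives $\psi(\theta_0) = e^{\delta(\infty)}$. A hand-rolled RK4 on complex arithmetic suffices (parameter values are illustrative):

```python
import numpy as np

# Minimal sketch of Remark 2.6 for the CIR special case (K = 0, a = 0):
# integrate d(theta)/ds = -beta*theta + 0.5i*sigma^2*theta^2, d(delta)/ds =
# i*b*theta until theta -> 0; then psi(theta0) = exp(delta), which we compare
# with the closed form derived in Example 2.2 below.  Parameters illustrative.
b, beta, sigma, theta0 = 1.0, 2.0, 0.5, 1.3

def rhs(y):
    theta, delta = y
    return np.array([-beta*theta + 0.5j*sigma**2*theta**2, 1j*b*theta])

y, h = np.array([theta0 + 0j, 0j]), 1e-3
for _ in range(int(8/h)):                       # classical RK4 on s in [0, 8]
    k1 = rhs(y); k2 = rhs(y + h/2*k1); k3 = rhs(y + h/2*k2); k4 = rhs(y + h*k3)
    y += h/6*(k1 + 2*k2 + 2*k3 + k4)

print(np.exp(y[1]))                                         # psi from characteristics
print((1 - 1j*sigma**2*theta0/(2*beta))**(-2*b/sigma**2))   # closed form; they agree
```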

We now illustrate Theorem 2.4 with some examples.

Example 2.1. (O-U Process) The O-U process satisfies the SDE
\[
dX(t) = (b - \beta X(t))\,dt + \sigma\,dW(t).
\]
It is well known that the O-U process is a Gaussian process with covariance function
\[
\mathrm{Cov}(X(s), X(t)) = \frac{\sigma^2}{2\beta}\big( e^{-\beta(t-s)} - e^{-\beta(t+s)} \big)
\]
for $s < t$. Hence,
\[
\mathrm{Var}(X(t)) = \frac{\sigma^2}{2\beta}\big( 1 - e^{-2\beta t} \big).
\]
Moreover, it is easy to see that
\[
\mathbb{E}_x X(t) = x e^{-\beta t} + \frac{b}{\beta}(1 - e^{-\beta t}).
\]
Hence, the characteristic function of $X(t)$ conditional on $X(0) = x$ is
\[
\psi_{X(t)}(\theta) = \exp\Big( i\theta\Big( x e^{-\beta t} + \frac{b}{\beta}(1 - e^{-\beta t}) \Big) - \frac{\sigma^2}{4\beta}\big( 1 - e^{-2\beta t} \big)\theta^2 \Big).
\]
On the other hand, by Theorem 2.4, the characteristic function $\psi(\theta)$ of the equilibrium distribution satisfies
\[
\Big( ib\theta - \frac{1}{2}\sigma^2\theta^2 \Big)\psi(\theta) - \beta\theta\psi'(\theta) = 0.
\]
We can easily solve this first-order ODE to obtain
\[
\psi(\theta) = \exp\Big( \frac{ib}{\beta}\theta - \frac{\sigma^2}{4\beta}\theta^2 \Big).
\]
Obviously, for each $\theta$,
\[
\lim_{t \to \infty} \psi_{X(t)}(\theta) = \psi(\theta).
\]
Hence, the equilibrium distribution is Gaussian with mean $b/\beta$ and variance $\sigma^2/(2\beta)$, i.e. it has the density
\[
\sqrt{\frac{\beta}{\pi\sigma^2}}\,\exp\Big( -\frac{(\beta x - b)^2}{\beta\sigma^2} \Big), \quad x \in \mathbb{R}.
\]
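That this $\psi$ indeed solves (2.30) is easy to verify numerically; the following minimal sketch (illustrative parameter values) evaluates the ODE residual with a central difference:

```python
import numpy as np

# Sanity check of the O-U stationary characteristic function: the residual of
# (i*b*theta - 0.5*sigma^2*theta^2)*psi(theta) - beta*theta*psi'(theta) should
# vanish.  Parameter values are illustrative.
b, beta, sigma = 1.0, 2.0, 0.5

def psi(th):
    return np.exp(1j*b/beta*th - sigma**2/(4*beta)*th**2)

th = np.linspace(-3.0, 3.0, 7)
h = 1e-5
dpsi = (psi(th + h) - psi(th - h))/(2*h)          # central difference for psi'
residual = (1j*b*th - 0.5*sigma**2*th**2)*psi(th) - beta*th*dpsi
print(np.max(np.abs(residual)))                   # tiny: zero up to rounding
```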

Example 2.2. (CIR Process) The SDE for the CIR process is
\[
dX(t) = (b - \beta X(t))\,dt + \sigma\sqrt{X(t)}\,dW(t).
\]
It is well known (see [22]) that conditional on $X(0) = x$,
\[
X(t) \stackrel{D}{=} \frac{Y}{2c(t)},
\]
where
\[
c(t) = \frac{2\beta}{\sigma^2(1 - e^{-\beta t})},
\]
and $Y$ is a noncentral $\chi^2$ rv with $\frac{4b}{\sigma^2}$ degrees of freedom and noncentrality parameter $2c(t)e^{-\beta t}x$. Hence (see Chapter 29 of [57]), the characteristic function of $Y$ is
\[
\psi_Y(\theta) = (1 - 2\theta i)^{-\frac{2b}{\sigma^2}}\exp\Big( \frac{2c(t)e^{-\beta t}x\theta i}{1 - 2\theta i} \Big),
\]
and thus the characteristic function of $X(t)$ is
\[
\psi_{X(t)}(\theta) = \psi_Y\Big( \frac{\theta}{2c(t)} \Big) = \Big( 1 - \frac{\theta i}{c(t)} \Big)^{-\frac{2b}{\sigma^2}}\exp\Bigg( \frac{e^{-\beta t}x\theta i}{1 - \frac{\theta i}{c(t)}} \Bigg).
\]
On the other hand, Theorem 2.4 implies that the characteristic function $\psi(\theta)$ of the equilibrium distribution satisfies
\[
ib\theta\psi(\theta) + \Big( -\beta\theta + \frac{1}{2}i\sigma^2\theta^2 \Big)\psi'(\theta) = 0.
\]
We can easily solve this first-order ODE to obtain
\[
\psi(\theta) = \Big( 1 - \frac{i\sigma^2}{2\beta}\theta \Big)^{-\frac{2b}{\sigma^2}}.
\]
It is easy to see that for each $\theta$,
\[
\lim_{t \to \infty} \psi_{X(t)}(\theta) = \psi(\theta).
\]
Let $X(\infty)$ have the equilibrium distribution; then $\frac{4\beta}{\sigma^2}X(\infty)$ has a $\chi^2$ distribution with $\frac{4b}{\sigma^2}$ degrees of freedom. In particular, the equilibrium density is
\[
\frac{4\beta}{\sigma^2}\cdot\frac{1}{2^{\frac{2b}{\sigma^2}}\Gamma(\frac{2b}{\sigma^2})}\Big( \frac{4\beta x}{\sigma^2} \Big)^{\frac{2b}{\sigma^2}-1}\exp\Big( -\frac{2\beta x}{\sigma^2} \Big), \quad x \geq 0.
\]
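The convergence $\psi_{X(t)}(\theta) \to \psi(\theta)$ can also be observed numerically. A minimal sketch (illustrative parameters), showing the gap shrink roughly like $e^{-\beta t}$:

```python
import numpy as np

# Numerical check that the transient CIR characteristic function converges to
# the stationary one as t grows.  Parameter values are illustrative.
b, beta, sigma, x, theta = 1.0, 2.0, 0.5, 0.7, 1.1

def psi_t(t):
    c = 2*beta/(sigma**2*(1 - np.exp(-beta*t)))
    return (1 - 1j*theta/c)**(-2*b/sigma**2) \
        * np.exp(np.exp(-beta*t)*x*1j*theta/(1 - 1j*theta/c))

psi_inf = (1 - 1j*sigma**2*theta/(2*beta))**(-2*b/sigma**2)
for t in [0.5, 1.0, 2.0, 5.0, 10.0]:
    print(t, abs(psi_t(t) - psi_inf))   # gap shrinks roughly like e^{-beta t}
```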


Chapter 3

Affine Point Processes: Large Deviations

3.1 Introduction

The affine point process (APP) model belongs to the family of affine models and is broadly used in practice. An APP is a point process whose intensity is an affine function of an AJD. Notable examples include the Hawkes process ([51] and [52]) and the doubly stochastic process with a CIR intensity process ([31]). One reason for the wide applicability of APPs in practice is that they successfully capture the so-called "self-excitation" or "clustering" feature exhibited in many time series, for instance in seismology and earthquake modeling ([79] and [80]), credit derivatives pricing ([37]), risk management ([19]), high-frequency trading ([8]), social networks ([91]), and so forth.

The focus of this chapter is to explicitly calculate the large deviations asymptotics for APPs in the long-term horizon asymptotic regime. However, before studying its atypical behavior, we will characterize the typical behavior of an APP in terms of a CLT result, which also helps us identify the "rare-event" region of an APP.

Let $(L(t) : t \geq 0)$ be an APP. As will be seen later, $L(t)$ has roughly the same magnitude (at least on average) as a Markov additive functional of the form
\[
\int_0^t f(Y(s))\,ds \tag{3.1}
\]
where $(Y(s) : s \geq 0)$ is a Markov process and $f$ is a function. Hence, the large deviations behavior of $L(t)$ is essentially the same as that of the Markov additive functional (3.1). The large deviations behavior of Markov additive functionals has been extensively studied in the literature; see, for example, [75], [76], [64], [65], and references therein. However, one subtlety lies in whether $f$ is bounded or not.

When $f$ is bounded, the large deviations behavior of (3.1) is qualitatively the same as that of a random walk with light-tailed iid increments. In particular, one typically has that
\[
\mathbb{P}\Big( \int_0^t f(Y(s))\,ds > Rt \Big) \tag{3.2}
\]
decays to 0 exponentially fast as $t \to \infty$; in other words, the above tail probability is upper bounded by $e^{-ct}$ for some $c > 0$. However, when $f$ is unbounded, a large deviation of order $t$ from the equilibrium could be achieved in $o(t)$ time, as opposed to $O(t)$ time for bounded $f$. This suggests that the probability of such a large deviation could be of order $e^{-o(t)}$. This phenomenon is carefully discussed in [32] and [11], where the authors show that even for the simple M/M/1 queue length process, (3.2) does not exhibit exponential decay; in fact, it has Weibullian asymptotics (it decays subexponentially fast).

Nevertheless, our calculation shows that for an APP (in which case $f(Y) = Y$ and the Markov process $Y$ is an AJD), although $Y$ is unbounded, the "traditional" large deviations behavior holds. In other words, $\mathbb{P}(L(t) > Rt)$ does decay to 0 exponentially fast as $t \to \infty$. A significant difference between APPs and the M/M/1 queue length process is that APPs have a much faster relaxation speed (which measures how fast a path that has deviated from the equilibrium returns to it).

We close this chapter by applying the large deviations result to portfolio credit risk. Risk management is particularly concerned with rare but significant large-loss events. It is therefore of significant interest to study the estimation of the tail probability of the loss distribution. To be more specific, let $(L(t) : t \geq 0)$ be a multidimensional APP with each component denoting the accumulated default loss of an individual credit portfolio. Our goal is to accurately compute the tail of the probability distribution of $L(t)$.

A conventional approach is via the Fourier transform, which can be computed by solving a system of (generalized) Riccati ODEs; see [37]. The probability distribution of $L(t)$ can then be calculated by Fourier inversion, which typically involves evaluating the Fourier transform at a large number of points, and each such evaluation requires solving an ODE system. This induces large computational complexity. To address this numerical problem, [46] develops a saddlepoint approximation so that one only needs to evaluate the Fourier transform at one single point (the saddlepoint). However, their approximation requires the computation of up to the fourth partial derivatives of the cumulant generating function of $L(t)$, which involves solving an ODE system of size $O(n^4)$, where $n$ is the dimension of the underlying AJD. When $n$ is large, this approximation becomes computationally expensive. The authors also propose to use an approximate saddlepoint instead of the true saddlepoint, whose computation is the bottleneck of the whole algorithm. However, the approximate saddlepoint performs poorly when the probability of interest goes deep into the tails.

An alternative for computing the probability distribution of $L(t)$ is Monte Carlo simulation. There has been extensive research on asymptotic analysis as well as efficient simulation for portfolio credit risk. Most of the existing research, however, is focused on structural models (or threshold models), which essentially originate from Merton's seminal firm-value work ([70]). Both [58] and [61] suggest an importance sampling (IS) approach based on empirical studies for the single-factor Gaussian copula model. Large deviations asymptotics are developed in [48] for the loss distribution associated with the single-factor Gaussian copula model, providing theoretical support for the IS procedure by proving its logarithmic efficiency. Generalizing the preceding result, [44] establishes the large deviations asymptotics for the multi-factor Gaussian copula model, which is later applied to prove the logarithmic efficiency of the IS estimator proposed in [45]. One caveat of the Gaussian copula model is that it fails to capture extremal dependence, which roughly means that variables may simultaneously take on large values with nonnegligible probability. To address this problem, [7] proposes a t-copula model and derives sharp asymptotics for the associated loss distribution. In addition, they develop two IS estimators based on exponential twisting and hazard-rate twisting (see, for example, [59]), respectively. They show that the first IS algorithm has bounded relative error and the second is logarithmically efficient. Recently, [18] develops two efficient simulation schemes for the t-copula model based on conditional Monte Carlo and shows that the estimators have bounded relative error.

Despite the wide application of reduced-form models (or intensity-based models), research on asymptotic analysis and efficient simulation for rare-event probabilities in these models is surprisingly scarce. For example, [6] proposes a logarithmically efficient IS scheme to estimate the loss distribution in a doubly stochastic intensity model with affine structure; their work exploits the conditional independence of firm defaults in the doubly stochastic setting. The discussions in the most recent work [41] and [42] also concern self-exciting affine point processes, in the same large dimension asymptotic regime. However, the models they study are not "fully multidimensional", in the sense that the coefficient matrices are essentially diagonalizable and the correlation between different risk factors is introduced only through a single one-dimensional process. Our asymptotic analysis is developed in a more general setting. In particular, no diagonalizability of the coefficients of the affine point process is assumed. In addition, we work in a different asymptotic regime, namely the large time horizon regime.

3.2 Affine Point Processes

Let $X \in S = \mathbb{R}_+^m \times \mathbb{R}^{n-m}$, for some $0 \leq m \leq n$, be an $n$-dimensional AJD defined by (2.1). Then we call
\[
L(t) \triangleq \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} \zeta_i(z)\,N_i(ds, dz) \tag{3.3}
\]
an affine point process, where $\zeta_i : \mathbb{R}^n \to \mathbb{R}^q$, $i = 1,\ldots,K$.


In addition to Assumptions 1 - 5 used in Chapter 2, we will use the following assumptions in this chapter as well.

Assumption 6. The origin is in the interiors of both $\{\theta \in \mathbb{R}^q : \mathbb{E}e^{\theta^\top \zeta_i(Z^i)} < \infty\}$ and $\{u \in \mathbb{R}^n : \mathbb{E}e^{u^\top Z^i} < \infty\}$, where $Z^i$ has distribution $\varphi_i$, for $i = 1,\ldots,K$.

Remark 3.1. Assumption 6 asserts that the jump distributions are light-tailed, so that the exponential martingale introduced in Section 3.4.1 is well defined. This certainly yields that $\mathbb{E}\|Z^i\|^p < \infty$ for any $p > 0$ and each $i = 1,\ldots,K$.

Assumption 7. $\zeta_i : \mathbb{R}^n \to \mathbb{R}_+^q$, for $i = 1,\ldots,K$.

The structures of the parameters play a critical role in our subsequent analysis. To this end, we need the following concepts on matrices.

Definition 3.1. A matrix $A \in \mathbb{R}^{n \times n}$ is called a Z-matrix if $A$ has non-positive off-diagonal elements, i.e. $A_{ij} \leq 0$ for $1 \leq i \neq j \leq n$.

Definition 3.2. A positive stable Z-matrix is called an M-matrix.

We will use the following important property of M-matrices.

Lemma 3.1. Let $A$ be an $n \times n$ matrix. Suppose that $A = sI - B$ for some $s > 0$ and $B \in \mathbb{R}_+^{n \times n}$. Then $A$ is an M-matrix if and only if $s > \mathrm{sp}(B)$, the spectral radius of $B$.

Proof. It follows from Definition (1.2) on page 133 of [9] that $sI - B$ is an M-matrix if $s > \mathrm{sp}(B)$.

Now we show the other direction of the statement. Suppose that $A = sI - B$ is an M-matrix. Note that
\[
\det(tI - A) = \det((t - s)I + B),
\]
so $s - r$ is an eigenvalue of $A$ if $r$ is an eigenvalue of $B$. Since $B$ is a nonnegative matrix, we know that $\mathrm{sp}(B) \geq 0$ is an eigenvalue of $B$. So $s - \mathrm{sp}(B)$ is an eigenvalue of $A = sI - B$. It follows immediately that $s > \mathrm{sp}(B)$, since the eigenvalues of $A$ have positive real parts.
positive real parts.


Remark 3.2. The block lower triangular structure of $\beta$ implies that Assumption 3 is equivalent to the condition that both $\beta_{I,I}$ and $\beta_{J,J}$ are positive stable. Hence, $\beta_{I,I}$ is an M-matrix.

3.3 Central Limit Theorem

We have studied the stochastic stability of AJDs in Chapter 2. It is natural to speculate that an APP would exhibit certain equilibrium behavior when the associated AJD is "stable". We characterize the typical behavior of an APP in terms of the following central limit convergence:
\[
t^{-1/2}(L(t) - rt) \Rightarrow \mathcal{N}(0, \Sigma)
\]
as $t \to \infty$, for some $r \in \mathbb{R}^q$ and $\Sigma \succ 0$, where $\mathcal{N}(0, \Sigma)$ is a Gaussian rv with mean $0$ and covariance matrix $\Sigma$. One could use the same argument as in Section 2.3.3 for deriving the CLT for $X$, so we will omit the technical details and only calculate $r$ and $\Sigma$ explicitly.

Consider the local martingale of the form $U(t) \triangleq L(t) - rt + A(X(t) - X(0))$ for some $r \in \mathbb{R}^q$ and $A \in \mathbb{R}^{q \times n}$ that will be specified later. Then, by (2.1),
\begin{align*}
U(t) &= L(t) - rt + \int_0^t A(b - \beta X(s))\,ds + \int_0^t A\sigma(X(s))\,dW(s) + \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} Az\,N_i(ds, dz) \\
&= \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} (\zeta_i(z) + Az)\,\tilde{N}_i(ds, dz) + \int_0^t A\sigma(X(s))\,dW(s) \\
&\quad + \Big[ \sum_{i=1}^K \lambda_i\big( \mathbb{E}\zeta_i(Z^i) + A\,\mathbb{E}Z^i \big) + Ab - r \Big]t + \int_0^t \Big[ \sum_{i=1}^K \big( \mathbb{E}\zeta_i(Z^i) + A\,\mathbb{E}Z^i \big)\kappa_i^\top - A\beta \Big]X(s)\,ds.
\end{align*}
Hence, if we choose $r$ and $A$ such that
\[
\sum_{i=1}^K \lambda_i\big( \mathbb{E}\zeta_i(Z^i) + A\,\mathbb{E}Z^i \big) + Ab - r = 0, \qquad \sum_{i=1}^K \big( \mathbb{E}\zeta_i(Z^i) + A\,\mathbb{E}Z^i \big)\kappa_i^\top - A\beta = 0,
\]
then $U(t)$ is a local martingale in $\mathbb{R}^q$. Assumption 5 implies that $\beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top$ is invertible, so we can solve the above equations explicitly as follows:
\[
A = \Big( \sum_{i=1}^K \mathbb{E}\zeta_i(Z^i)\kappa_i^\top \Big)\Big( \beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top \Big)^{-1}, \qquad r = \sum_{i=1}^K \lambda_i\big( \mathbb{E}\zeta_i(Z^i) + A\,\mathbb{E}Z^i \big) + Ab. \tag{3.4}
\]
Next we calculate $\langle U \rangle(t)$. Note that
\[
U(t) = \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} (\zeta_i(z) + Az)\,\tilde{N}_i(ds, dz) + \int_0^t A\sigma(X(s))\,dW(s),
\]
where $A$ is given by (3.4). Therefore,
\begin{align*}
\langle U \rangle(t) &= \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} (\zeta_i(z) + Az)(\zeta_i(z) + Az)^\top\,\varphi_i(dz)\,\Lambda_i(X(s-))\,ds + \int_0^t A\sigma(X(s))\sigma(X(s))^\top A^\top\,ds \\
&= \sum_{i=1}^K C^i \int_0^t \big( \lambda_i + \kappa_i^\top X(s) \big)\,ds + \int_0^t A\Big( a + \sum_{j=1}^n \alpha^j X_j(s) \Big)A^\top\,ds \\
&= \Big( AaA^\top + \sum_{i=1}^K \lambda_i C^i \Big)t + \sum_{j=1}^n \Big[ A\alpha^j A^\top + \sum_{i=1}^K \kappa_{ij}C^i \Big]\int_0^t X_j(s)\,ds, \tag{3.5}
\end{align*}
where
\[
C^i = \mathbb{E}(\zeta_i(Z^i) + AZ^i)(\zeta_i(Z^i) + AZ^i)^\top, \quad i = 1,\ldots,K.
\]

If $X$ is ergodic, then Corollary 2.1 implies that
\[
\frac{1}{t}\int_0^t X(s)\,ds \to c \ \text{a.s.} \tag{3.6}
\]
as $t \to \infty$, where
\[
c = \Big( \beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top \Big)^{-1}\Big( b + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^i \Big).
\]
It follows from (3.5) and (3.6) that
\[
t^{-1}\langle U \rangle(t) \to \Sigma \ \text{a.s.}
\]
as $t \to \infty$, where
\[
\Sigma = AaA^\top + \sum_{i=1}^K \lambda_i C^i + \sum_{j=1}^n c_j\Big[ A\alpha^j A^\top + \sum_{i=1}^K \kappa_{ij}C^i \Big].
\]

Theorem 3.1. Suppose that $Z^i \in \mathbb{R}^n$ has distribution $\varphi_i$ and $\mathbb{E}\|Z^i\|^{2+\epsilon} < \infty$, for some $\epsilon > 0$ and all $i = 1,\ldots,K$. Then, under Assumptions 1 - 5,
\[
t^{-1/2}(L(t) - rt) \Rightarrow \mathcal{N}(0, \Sigma)
\]
as $t \to \infty$, where
\begin{align*}
r &= \sum_{i=1}^K \lambda_i\big( \mathbb{E}\zeta_i(Z^i) + A\,\mathbb{E}Z^i \big) + Ab, \\
\Sigma &= AaA^\top + \sum_{i=1}^K \lambda_i C^i + \sum_{j=1}^n c_j\Big[ A\alpha^j A^\top + \sum_{i=1}^K \kappa_{ij}C^i \Big], \\
A &= \Big( \sum_{i=1}^K \mathbb{E}\zeta_i(Z^i)\kappa_i^\top \Big)\Big( \beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top \Big)^{-1}, \\
C^i &= \mathbb{E}(\zeta_i(Z^i) + AZ^i)(\zeta_i(Z^i) + AZ^i)^\top, \quad i = 1,\ldots,K.
\end{align*}

Example 3.1. Consider a one-dimensional generalized Hawkes process $L$ satisfying
\[
dX(t) = (b - \beta X(t))\,dt + \sigma\sqrt{X(t)}\,dW(t) + dL(t), \tag{3.7}
\]
and $L(t) = \sum_{i=1}^{N(t)} Z_i$, where $N(t)$ is a counting process with intensity $X(t)$ and where the $Z_i$'s are iid rv's with common distribution $\varphi$. Let $Y(t) = (L(t), N(t))^\top$. Then,
\[
t^{-1/2}(Y(t) - rt) \Rightarrow \mathcal{N}(0, \Sigma)
\]
as $t \to \infty$, where
\[
r = \frac{b}{\beta - \mathbb{E}Z_1}\begin{pmatrix} \mathbb{E}Z_1 \\ 1 \end{pmatrix} \quad \text{and} \quad \Sigma = \frac{b(\beta^2 + \sigma^2)}{(\beta - \mathbb{E}Z_1)^3}\begin{pmatrix} \mathbb{E}Z_1^2 & \mathbb{E}Z_1 \\ \mathbb{E}Z_1 & 1 \end{pmatrix}.
\]
Hence, we have the following approximations
\[
L(T) \stackrel{D}{\approx} r_1 T + \mathcal{N}(0, \Sigma_{1,1}T), \qquad N(T) \stackrel{D}{\approx} r_2 T + \mathcal{N}(0, \Sigma_{2,2}T)
\]
for $T$ large, where $\stackrel{D}{\approx}$ means "approximately equal to in distribution". Let us see a numerical example, where we choose the parameters as follows: $T = 100$, $b = 3.5$, $\beta = 5$, $\sigma = 0.2$, and $\varphi$ is uniform on $\{0.4, 0.6, 0.8, 1.0\}$. We generate 1000 sample paths to estimate $(L(T), N(T))$ and compare against the above approximations. Figure 3.1 shows a simulated path of $(X(t), L(t))$. The results are shown in Figure 3.2 and Table 3.1.

Table 3.1: Theoretical vs. Estimated Mean/Variance

                                   L(T)                         N(T)
                           Mean         Variance        Mean         Variance
                           (r_1 T)      (Sigma_11 T)    (r_2 T)      (Sigma_22 T)
  Theoretical approx.      56.9767      59.5238         81.3953      110.2293
  Estimation               57.4710      59.1638         82.1990      109.1025
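The theoretical row of Table 3.1 can be reproduced directly from the closed forms above; the only inputs are the parameter values of the numerical example:

```python
import numpy as np

# Reproduce the "theoretical approximation" row of Table 3.1 from the closed
# forms of Example 3.1 (uniform marks on {0.4, 0.6, 0.8, 1.0}).
T, b, beta, sigma = 100.0, 3.5, 5.0, 0.2
zvals = np.array([0.4, 0.6, 0.8, 1.0])
m1, m2 = zvals.mean(), (zvals**2).mean()      # E Z_1 and E Z_1^2

r = b/(beta - m1)*np.array([m1, 1.0])
Sigma = b*(beta**2 + sigma**2)/(beta - m1)**3*np.array([[m2, m1], [m1, 1.0]])
print(r*T)                    # [56.9767, 81.3953]
print(np.diag(Sigma)*T)       # [59.5238, 110.2293]
```

A crude way to regenerate the estimated row is to discretize (3.7) and thin on each Euler step. This is only a sketch (the time step and path count below are chosen for speed, not accuracy, and multiple events per step are ignored):

```python
import numpy as np

# Crude discretized simulation of (3.7): an event occurs on [t, t+dt) with
# probability X(t)*dt; on an event, a mark Z is drawn and added to both L and X.
rng = np.random.default_rng(1)
T, b, beta, sigma, dt = 100.0, 3.5, 5.0, 0.2, 1e-2
zvals = np.array([0.4, 0.6, 0.8, 1.0])
nsteps, npaths = int(T/dt), 1000
L, N = np.zeros(npaths), np.zeros(npaths)
X = np.full(npaths, b/(beta - zvals.mean()))  # start near the stationary mean
for _ in range(nsteps):
    jump = rng.random(npaths) < X*dt          # thinning on one Euler step
    Z = rng.choice(zvals, npaths)*jump
    X += (b - beta*X)*dt \
        + sigma*np.sqrt(np.maximum(X, 0)*dt)*rng.standard_normal(npaths) + Z
    L += Z; N += jump
print(L.mean(), L.var(), N.mean(), N.var())   # compare with Table 3.1
```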

Figure 3.1: Simulated Sample Path of X(t) and L(t). [figure]

Figure 3.2: Histogram vs. Fitted Probability Density. Left: L(T); Right: N(T). [figure]

3.4 Large Deviations

We are interested in the probability $\mathbb{P}(L(t) > x)$ for $x$ large. One mathematical treatment is to change the form of the problem and consider $\mathbb{P}(L(t) > Rt)$. When $t$ is "moderate" (so that $Rt$ is not too large), one could use the central limit approximation as discussed in the last section. On the other hand, when $t$ is large, the asymptotics of $\mathbb{P}(L(t) > Rt)$ fall into the regime that can be treated by the Gärtner-Ellis theorem (see, for example, [24]).

Note that Theorem 3.1 indicates that $t^{-1}L(t) \to r$ as $t \to \infty$. Hence,
\[
\mathbb{P}(L(t) \geq Rt) \to 0
\]
as $t \to \infty$ for $R > r$, i.e. $(\{L(t) \geq Rt\} : t \geq 0)$ is a family of rare events. In this section, we will compute the asymptotics
\[
\lim_{t \to \infty} \frac{1}{t}\log\mathbb{P}(L(t) \geq Rt). \tag{3.8}
\]
The Gärtner-Ellis theorem provides a mechanism for computing the above logarithmic asymptotics. In particular, the computation of
\[
\lim_{t \to \infty} t^{-1}\log\mathbb{E}\exp(\theta^\top L(t)) \tag{3.9}
\]
plays a key role in the calculation.

3.4.1 A Class of Exponential Martingales

In order to compute (3.9), we first attempt to construct a martingale of the form
\[
M(t) = M(t, \theta) = \exp\big[ \theta^\top L(t) - \phi t + h(X(t)) - h(X(0)) \big]. \tag{3.10}
\]
Both $\phi$ and $h : S \to \mathbb{R}$ clearly depend on the choice of $\theta$, but we choose (temporarily) to suppress the dependence on $\theta$ for notational simplicity. The affine structure of $L$ and $X$ suggests that we take $h$ of the form $h(x) = u^\top x$ for some suitable $u \in \mathbb{R}^n$ that depends on $\theta$.

Let $Y(t) = \theta^\top L(t) - \phi t + u^\top(X(t) - X(0))$. Itô's formula establishes that
\[
M(t) = 1 + \int_0^t M(s-)\,dY^c(s) + \frac{1}{2}\int_0^t M(s-)\,d[Y]^c(s) + \sum_{0 < s \leq t} \big( M(s) - M(s-) \big), \tag{3.11}
\]


Plugging (3.12), (3.13), and (3.14) into (3.11) yields that
\begin{align*}
M(t) &= 1 + \int_0^t M(s-)\Big[ u^\top b - \phi + \frac{1}{2}u^\top au + \sum_{i=1}^K \lambda_i\big( \mathbb{E}e^{\theta^\top\zeta_i(Z^i) + u^\top Z^i} - 1 \big) \Big]\,ds \\
&\quad + \int_0^t M(s-)\Big[ \sum_{i=1}^K \big( \mathbb{E}e^{\theta^\top\zeta_i(Z^i) + u^\top Z^i} - 1 \big)\kappa_i^\top - u^\top \beta \Big]X(s)\,ds + \frac{1}{2}\sum_{j=1}^n u^\top \alpha^j u\int_0^t M(s-)X_j(s)\,ds \\
&\quad + \int_0^t M(s-)u^\top \sigma(X(s))\,dW(s) + \int_0^t \int_{\mathbb{R}^n} M(s-)\sum_{i=1}^K \big( e^{\theta^\top\zeta_i(z) + u^\top z} - 1 \big)\,\tilde{N}_i(ds, dz).
\end{align*}
Evidently, $M(t)$ is a local martingale if we choose $\phi \in \mathbb{R}$ and $u \in \mathbb{R}^n$ such that
\[
u^\top b - \phi + \frac{1}{2}u^\top au + \sum_{i=1}^K \lambda_i\big( \mathbb{E}e^{\theta^\top\zeta_i(Z^i) + u^\top Z^i} - 1 \big) = 0 \tag{3.15}
\]
and
\[
\sum_{l=1}^n u_l\beta_{l,j} - \frac{1}{2}u^\top \alpha^j u - \sum_{i=1}^K \big( \mathbb{E}e^{\theta^\top\zeta_i(Z^i) + u^\top Z^i} - 1 \big)\kappa_{i,j} = 0, \quad j = 1,\ldots,n. \tag{3.16}
\]
Note that $\alpha^j = 0$ and $\kappa_{i,j} = 0$ for $i = 1,\ldots,K$, $j = m+1,\ldots,n$, and that $\beta_{l,j} = 0$ for $l = 1,\ldots,m$ and $j = m+1,\ldots,n$. So it follows from (3.16) that
\[
\sum_{l=m+1}^n u_l\beta_{l,j} = 0, \quad j = m+1,\ldots,n,
\]
which, written in matrix form, is equivalent to
\[
u_J^\top \beta_{J,J} = 0.
\]
It follows from Remark 3.2 that $\beta_{J,J}$ is nonsingular. Therefore, $u_J = 0$, i.e. $u_l = 0$ for $l = m+1,\ldots,n$.

Remark 3.3. Here is a heuristic interpretation of the fact that $u_J = u_J(\theta) \equiv 0$ for all $\theta$. The magnitude of $L(t)$ is (at least on average) roughly the same as that of its compensator
\[
\sum_{i=1}^K \Big( \int_0^t \Lambda_i(X(s))\,ds \Big)\int_{\mathbb{R}^n} \zeta_i(z)\,\varphi_i(dz),
\]
which does not depend on $X_J(t)$, i.e. $X_l(t)$, $l = m+1,\ldots,n$. Hence, only $X_l(t)$, $l = 1,\ldots,m$, are needed to "offset" the randomness of $L(t)$, and thus $u_l \equiv 0$ for $l = m+1,\ldots,n$.

We can simplify (3.15) and (3.16) even further. By the assumptions on the structures of $\alpha$ and $a$, (3.15) and (3.16) can be simplified to
\[
u^\top b - \phi + \sum_{i=1}^K \lambda_i\big( \mathbb{E}e^{\theta^\top\zeta_i(Z^i) + u^\top Z^i} - 1 \big) = 0 \tag{3.17}
\]
and
\[
\sum_{l=1}^m u_l\beta_{l,j} - \frac{1}{2}\alpha_{j,j}^j u_j^2 - \sum_{i=1}^K \big( \mathbb{E}e^{\theta^\top\zeta_i(Z^i) + u^\top Z^i} - 1 \big)\kappa_{i,j} = 0, \quad j = 1,\ldots,m. \tag{3.18}
\]
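Equations (3.17)-(3.18) reduce the construction of the exponential martingale to root-finding. As a concrete illustration, the following minimal sketch solves (3.18) in a hypothetical one-dimensional specialization ($m = n = K = 1$, $\lambda_1 = 0$, $\kappa_1 = 1$, $\zeta_1(z) \equiv 1$, so $L = N$ simply counts events, and the jump sizes are uniform on four values); the bracketed root is the smaller one:

```python
import numpy as np
from scipy.optimize import brentq

# Sketch: solve (3.18) for u(theta) in a hypothetical 1-d Hawkes setting
# (lambda_1 = 0, kappa_1 = 1, zeta(z) = 1), then read phi(theta) = b*u(theta)
# off (3.17).  All parameter values are illustrative.
b, beta, sigma = 3.5, 5.0, 0.2
zvals = np.array([0.4, 0.6, 0.8, 1.0])        # jump sizes, uniform distribution

def mgf(u):                                    # E e^{u Z}
    return np.mean(np.exp(u*zvals))

def eq318(u, theta):                           # left-hand side of (3.18)
    return beta*u - 0.5*sigma**2*u**2 - (np.exp(theta)*mgf(u) - 1.0)

theta = 0.1
u = brentq(eq318, 0.0, 2.0, args=(theta,))     # smaller root of (3.18)
phi = b*u                                      # (3.17) with lambda_1 = 0
print(u, phi)                  # phi is the candidate limiting CGF value at theta
```

As Remark 3.4 below explains, (3.18) may admit several roots, and the smaller branch is the one associated with stochastic stability under the change of measure.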

(3.17) asserts that $\phi = \phi(\theta)$ is computable from $u = u(\theta)$. Suppose for now that (3.18) has a solution pair $(\theta, u)$. Consider another parameter set $(a, \alpha, b, \bar{\beta}, \bar{\lambda}, \bar{\kappa})$ as follows. Define $\bar{\lambda} \in \mathbb{R}_+^K$ and $\bar{\kappa} \in \mathbb{R}_+^{K \times n}$ such that
\[
\bar{\lambda}_i = \lambda_i \int_{\mathbb{R}^n} e^{\theta^\top\zeta_i(z) + u^\top z}\,\varphi_i(dz) \tag{3.19}
\]
and
\[
\bar{\kappa}_i = \kappa_i \int_{\mathbb{R}^n} e^{\theta^\top\zeta_i(z) + u^\top z}\,\varphi_i(dz), \tag{3.20}
\]
where $\kappa_i^\top$ and $\bar{\kappa}_i^\top$ are the $i$-th rows of $\kappa$ and $\bar{\kappa}$, respectively, for $i = 1,\ldots,K$. Moreover, define $\bar{\beta} \in \mathbb{R}^{n \times n}$ such that
\[
\bar{\beta}_j = \beta_j - \alpha^j u, \quad j = 1,\ldots,n,
\]
where $\beta_j$ and $\bar{\beta}_j$ are the $j$-th columns of $\beta$ and $\bar{\beta}$, respectively. Note that $\alpha^j = 0$ and $u_j = 0$ for $j = m+1,\ldots,n$, from which it follows immediately that
\[
\bar{\beta} = \begin{pmatrix} \beta_{I,I} - \mathrm{diag}(\alpha_{1,1}^1 u_1, \ldots, \alpha_{m,m}^m u_m) & 0 \\ \beta_{J,I} & \beta_{J,J} \end{pmatrix}. \tag{3.21}
\]
So
\[
\bar{\beta}_{I,I} = \beta_{I,I} - \mathrm{diag}(\alpha_{1,1}^1 u_1, \ldots, \alpha_{m,m}^m u_m)
\]
is a Z-matrix, since $\beta_{I,I}$ is a Z-matrix. Therefore, we have the following proposition.

Proposition 3.1. Suppose $(\theta, u_I)$ solves (3.18). Then, under Assumptions 1 and 6, $(a, \alpha, b, \bar{\beta}, \bar{\lambda}, \bar{\kappa})$ is admissible, where $\bar{\beta}$, $\bar{\lambda}$, and $\bar{\kappa}$ are respectively defined by (3.21), (3.19), and (3.20).
(3.19), <strong>and</strong> (3.20).<br />

Note that, by (3.15) and (3.16),
\[
dM(t) = M(t-)\Big( u^\top \sigma(X(t))\,dW(t) + \sum_{i=1}^K \int_{\mathbb{R}^n} \big( e^{\theta^\top\zeta_i(z) + u^\top z} - 1 \big)\,\tilde{N}_i(dt, dz) \Big). \tag{3.22}
\]
Hence,
\begin{align*}
M(t) = \exp\Big( &\int_0^t u^\top \sigma(X(s))\,dW(s) - \frac{1}{2}\int_0^t u^\top \sigma(X(s))\sigma^\top(X(s))u\,ds \\
&+ \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} \big( \theta^\top\zeta_i(z) + u^\top z \big)\,N_i(ds, dz) \\
&- \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} \big( e^{\theta^\top\zeta_i(z) + u^\top z} - 1 \big)\,\varphi_i(dz)\,\Lambda_i(X(s))\,ds \Big). \tag{3.23}
\end{align*}

Proposition 3.2. Suppose that (3.18) has a solution pair $(\theta, u_I)$. Let $u = (u_I; 0) \in \mathbb{R}^n$. Then, under Assumptions 1 and 6, $(M(t) : t \in [0, T])$ defined by (3.23) is a martingale for each $T > 0$.

Proof. The proof is essentially the same as that given in [21] in the context of a generic jump diffusion with a possible explosion. A similar idea was also developed in [98] for diffusion processes. We provide here a simplified version adapted to APPs.


It follows from (3.22) and (3.23) that $M(t)$ is a positive local martingale, and thus a supermartingale. Consequently, it suffices to show that $\mathbb{E}M(T) = 1$. Let $(a, \alpha, b, \bar{\beta}, \bar{\lambda}, \bar{\kappa})$ be defined by (3.19), (3.20), and (3.21). Moreover, let
\[
\bar{\varphi}_i(dz) = \frac{e^{\theta^\top\zeta_i(z) + u^\top z}\,\varphi_i(dz)}{\int_{\mathbb{R}^n} e^{\theta^\top\zeta_i(y) + u^\top y}\,\varphi_i(dy)} \tag{3.24}
\]
for $i = 1,\ldots,K$. Set
\[
\mu(x) = b - \beta x + \sigma(x)\sigma(x)^\top u.
\]
Note that
\[
\sigma(x)\sigma^\top(x)u = \Big( a + \sum_{j=1}^n x_j\alpha^j \Big)u = \sum_{j=1}^n x_j\alpha^j u
\]
by the fact that $u_j = 0$ for $j = m+1,\ldots,n$ and the assumption on the structure of $a$. Hence, $\mu(x) = b - \bar{\beta}x$.

Now consider the AJD $\bar{X}(t)$ satisfying
\[
d\bar{X}(t) = \mu(\bar{X}(t))\,dt + \sigma(\bar{X}(t))\,dW(t) + \sum_{i=1}^K \int_{\mathbb{R}^n} z\,\bar{N}_i(dt, dz), \tag{3.25}
\]
where $\bar{N}_i(dt, dz)$ is a counting random measure on $[0, \infty) \times \mathbb{R}^n$ with compensator $\bar{\Lambda}_i(\bar{X}(t))\,dt\,\bar{\varphi}_i(dz)$, where $\bar{\Lambda}_i(x) = \bar{\lambda}_i + \bar{\kappa}_i^\top x$, for $i = 1,\ldots,K$.

For each $k \geq 1$, we define the stopping times
\begin{align*}
\tau_k &= \inf\{t > 0 : \|X(t-)\| \geq k \ \text{or} \ \|X(t)\| \geq k\}, \\
\bar{\tau}_k &= \inf\{t > 0 : \|\bar{X}(t-)\| \geq k \ \text{or} \ \|\bar{X}(t)\| \geq k\}.
\end{align*}
Both $X(t)$ and $\bar{X}(t)$ are nonexplosive by Lemma 2.1. Hence, these stopping times satisfy
\[
\lim_{k \to \infty} P(\tau_k \geq T) = \lim_{k \to \infty} P(\bar{\tau}_k \geq T) = 1. \tag{3.26}
\]
For each $k$, let $X^{\tau_k}(t) = X(t)\mathbb{I}(t < \tau_k)$ be the stopped process associated with $(\tau_k : k \geq 1)$, and let $M^k(t)$ be the exponential local martingale obtained by replacing $X(t)$ with $X^{\tau_k}(t)$ in (3.23).


Note that
\begin{align*}
&\mathbb{E}\exp\Big( \frac{1}{2}\int_0^t u^\top \sigma(X^{\tau_k}(s))\sigma^\top(X^{\tau_k}(s))u\,ds + \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} y_i(z)\,\varphi_i(dz)\,\Lambda_i(X^{\tau_k}(s))\,ds \Big) \\
&= \mathbb{E}\exp\Big( \frac{1}{2}\sum_{j=1}^m \alpha_{j,j}^j u_j^2 \int_0^t X_j^{\tau_k}(s)\,ds + \sum_{i=1}^K \mathbb{E}y_i(Z^i)\int_0^t \big( \lambda_i + \kappa_i^\top X^{\tau_k}(s) \big)\,ds \Big) < \infty,
\end{align*}
where
\[
y_i(z) = e^{\theta^\top\zeta_i(z) + u^\top z}\big( \theta^\top\zeta_i(z) + u^\top z - 1 \big) + 1
\]
for $i = 1,\ldots,K$, since $X^{\tau_k}(s)$ is bounded. It then follows from Théorème IV.3 of [67] that $(M^k(t) : t \in [0, T])$ is a martingale. Hence, for each $k \geq 1$, $M^k(t)$ induces a probability measure $Q^k$ equivalent to $P$, defined by $\frac{dQ^k}{dP}\big|_{\mathcal{F}_t} = M^k(t)$ for $t \in [0, T]$. It follows from Girsanov's theorem that for each $k \geq 1$,
\[
W^k(t) = W(t) - \int_0^t \sigma^\top(X^{\tau_k}(s))u\,ds
\]
is a standard $n$-dimensional Brownian motion under $Q^k$. In addition, $N_i(dt, dz)$ has compensator $\bar{\Lambda}_i(X^{\tau_k}(t))\,dt\,\bar{\varphi}_i(dz)$ under $Q^k$ for each $i = 1,\ldots,K$.

It is easy to see that
\[
dX(t) = \mu(X(t))\,dt + \sigma(X(t))\,dW^k(t) + \sum_{i=1}^K \int_{\mathbb{R}^n} z\,N_i(dt, dz) \tag{3.27}
\]
for $t \in [0, \tau_k)$. By comparing (3.25) with (3.27), we conclude that $(X(t) : t \in [0, \tau_k))$ under $Q^k$ has the same distribution as $(\bar{X}(t) : t \in [0, \bar{\tau}_k))$ under $P$. Therefore, by (3.26),
\[
\mathbb{E}M^k(T)\mathbb{I}_{\{\tau_k \geq T\}} = Q^k(\tau_k \geq T) = P(\bar{\tau}_k \geq T) \to 1
\]
as $k \to \infty$. Moreover, note that
\[
\mathbb{E}M^k(T)\mathbb{I}_{\{\tau_k \geq T\}} = \mathbb{E}M(T)\mathbb{I}_{\{\tau_k \geq T\}} \to \mathbb{E}M(T)\mathbb{I}_{\{\tau_\infty \geq T\}}
\]
as $k \to \infty$ by the monotone convergence theorem, where
\[
\tau_\infty \triangleq \inf\{t > 0 : \|X(t)\| = \infty \ \text{or} \ \|X(t-)\| = \infty\}.
\]
The non-explosiveness of $X$ implies that $\tau_\infty = \infty$ $P$-a.s. Therefore, we conclude that $\mathbb{E}M(T) = 1$.

Remark 3.4. Note that (3.18) may have multiple solutions $u$ for a given $\theta$. Proposition 3.2 indicates that $M$ is a martingale for any solution pair $(\theta, u)$. This is a surprising result, because one might expect that there exists a unique solution $u$ for a given $\theta$ for which $M$ is a martingale, while the other solutions make $M$ a strictly local martingale. In that case, one would be able to identify the "appropriate" solution branch $u(\theta)$ by verifying the martingality of $M$. In the current setting, we will show that the "appropriate" solution branch $u(\theta)$ turns out to be the one that makes $X$ possess a certain stochastic stability property under the change of measure.

The above observation generalizes to other additive functionals of affine processes. We will use the following simple example for illustration.

Example 3.2. Let X be a CIR process satisfying

    dX(t) = (b − βX(t)) dt + σ√X(t) dW(t),

with b, β, σ > 0. X is clearly nonnegative. For each θ ∈ R, consider the following exponential martingale

    M(t) = exp( θ ∫_0^t X(s) ds − φt + u(X(t) − X(0)) )

with φ and u to be determined. Applying Itô's formula,

    dM(t) = M(t) [ (θ − βu + (1/2)σ²u²)X(t) dt + (bu − φ) dt + σu√X(t) dW(t) ],

which implies that φ = bu and

    (1/2)σ²u² − βu + θ = 0.

Hence, there are two solutions u1 and u2 if θ < β²/(2σ²), i.e.

    u1 = (β + √(β² − 2θσ²))/σ²,    u2 = (β − √(β² − 2θσ²))/σ².

We use the subscript i to indicate the quantities associated with the solution ui, i = 1, 2. A direct application of Girsanov's theorem yields that X satisfies

    dX(t) = (b − β̃X(t)) dt + σ√X(t) dW̃(t),

where W̃ is a standard Brownian motion under the probability measure Q induced by M, and

    β̃ = β − uσ² = ∓√(β² − 2θσ²).

Since β̃1 < 0, X is not mean-reverting under Q1 and hence is transient. On the other hand, β̃2 > 0, so X is recurrent under Q2. So

    E exp( θ ∫_0^t X(s) ds ) = e^{φ2 t} E^{Q2} exp(−u2(X(t) − X(0))),

and it can be shown that

    E^{Q2} exp(−u2(X(t) − X(0))) = O(1)    (3.28)

as t → ∞, implying

    lim_{t→∞} (1/t) log E exp( θ ∫_0^t X(s) ds ) = φ2.
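This conclusion can be checked numerically. The sketch below is a minimal Monte Carlo check, assuming a full-truncation Euler scheme and illustrative parameter values; the finite horizon t only approximates the t → ∞ limit, so the two printed numbers should merely be close, not equal.

    import numpy as np

    def cir_cgf_check(b=1.0, beta=2.0, sigma=0.5, theta=0.3, x0=0.5,
                      t=10.0, dt=1e-2, n_paths=20000, seed=0):
        # Compare the Monte Carlo estimate of (1/t) log E exp(theta * int_0^t X ds)
        # with phi_2 = b * u_2, where u_2 is the smaller root of
        # (1/2) sigma^2 u^2 - beta u + theta = 0.
        rng = np.random.default_rng(seed)
        disc = beta ** 2 - 2.0 * theta * sigma ** 2
        assert disc > 0.0, "need theta < beta^2 / (2 sigma^2)"
        u2 = (beta - np.sqrt(disc)) / sigma ** 2
        phi2 = b * u2
        x = np.full(n_paths, x0)
        integral = np.zeros(n_paths)
        for _ in range(int(t / dt)):
            integral += x * dt
            dw = rng.normal(0.0, np.sqrt(dt), n_paths)
            # full-truncation Euler step for the CIR dynamics
            x = np.maximum(x + (b - beta * x) * dt
                           + sigma * np.sqrt(np.maximum(x, 0.0)) * dw, 0.0)
        mc = np.log(np.mean(np.exp(theta * integral))) / t
        return mc, phi2

    print(cir_cgf_check())   # the two numbers should be close for moderate t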

Remark 3.5. We need to point out a subtlety in (3.28). If θ > 0, then u2 > 0 and thereby exp(−u2X(t)) is bounded for all t > 0. The ergodicity of X then naturally yields (3.28). However, if θ < 0, then u2 < 0. In this case, exp(−u2X(t)) is unbounded, and we need a stronger form of stochastic stability than ergodicity, namely exponential ergodicity, to ensure (3.28). An example illustrating the problem of multiple solutions for the one-dimensional Hawkes process appears in [100].

3.4.2 Limiting Cumulant Generating Function

As discussed at the beginning of Section 3.4.1, a key step in establishing the logarithmic asymptotics (3.8) is to compute the limiting cumulant generating function (CGF)

    lim_{t→∞} (1/t) log E exp(θ⊺L(t)).

By Proposition 3.2,

    M(t) = exp(θ⊺L(t) − φt + u⊺(X(t) − X(0)))

is a martingale, where φ = φ(θ) and u = u(θ) are solved from (3.17) and (3.18). It follows that

    E exp(θ⊺L(t) − φt) = E^Q exp(−u⊺(X(t) − X(0))),

where Q is the equivalent probability measure induced by M(t), i.e. dQ/dP|_{Ft} = M(t). Recall the discussion in Example 3.2. If we can show that

    E^Q exp(−u⊺(X(t) − X(0))) = O(1),    (3.29)

then we obviously have

    φ(θ) = lim_{t→∞} (1/t) log E exp(θ⊺L(t)).    (3.30)

To facilitate our discussion of the properties of u(θ) and φ(θ), let Fj(θ, uI) denote the LHS of the j-th equation of (3.18) and F(θ, uI) = (F1(θ, uI), . . . , Fm(θ, uI)) : R^{q+m} → R^m. Then,

    ∂Fj/∂ul = βl,j − Σ_{i=1}^K κi,j E Z^i_l e^{θ⊺ζi(Z^i)+u⊺Z^i},    1 ≤ l ≠ j ≤ m,
    ∂Fj/∂uj = βj,j − α^j_{j,j} uj − Σ_{i=1}^K κi,j E Z^i_j e^{θ⊺ζi(Z^i)+u⊺Z^i}.

Then J(θ, uI) = (∂Fj/∂ul)_{1≤j,l≤m}, the Jacobian matrix of F with respect to uI, is given by

    J(θ, uI)⊺ = βI,I − diag(α^1_{1,1}u1, . . . , α^m_{m,m}um) − Σ_{i=1}^K (E Z^i_I e^{θ⊺ζi(Z^i)+u⊺Z^i}) κ⊺_{i,I}.    (3.31)

Note that

    J(0, 0)⊺ = βI,I − Σ_{i=1}^K (E Z^i_I) κ⊺_{i,I}

is nonsingular, and its eigenvalues have positive real parts by Assumption 5. Since J(θ, uI) is continuous in (θ, uI), we conclude that there exists an open neighborhood of the origin in R^{q+m} within which the eigenvalues of J(θ, uI) have positive real parts. Define

    O = {(θ, uI) ∈ R^{q+m} : J(θ, uI) is positive stable}.

Note that for a given θ, there may exist multiple solutions uI. We choose the solution branch uI = uI(θ) for which (θ, uI) ∈ O. A critical reason for this choice is that it makes X ergodic under Q (which depends on θ), so that (3.29) holds (at least for u ∈ R^n_+). Recall that a crucial assumption regarding the stability of X is that it is "mean-reverting" in the sense of Assumption 3. Analogously, we have the following result.

Proposition 3.3. Let (θ, uI) ∈ O be a solution to F(θ, uI) = 0. Then, under Assumptions 1, 3, and 6, β̃ defined by (3.21) is positive stable.

Proof. It follows from Remark 3.2 that βJ,J is positive stable. By virtue of the block lower triangular form of β̃, it suffices to show that

    β̃I,I = βI,I − diag(α^1_{1,1}u1, . . . , α^m_{m,m}um)

is positive stable. Proposition 3.1 indicates that β̃I,I is a Z-matrix. So, it suffices to show that β̃I,I is an M-matrix.

By Lemma 3.1 and Assumption 3,

    βI,I = sI − A

for some A ∈ R^{m×m}_+ and s > sp(A). Then,

    J(θ, uI)⊺ = βI,I − D − G = sI − A − D − G,

where

    D = diag(α^1_{1,1}u1, . . . , α^m_{m,m}um),
    G = Σ_{i=1}^K (E Z^i_I e^{θ⊺ζi(Z^i)+u⊺Z^i}) κ⊺_{i,I}.

Clearly, G is nonnegative. Let t = min_{1≤i≤m} α^i_{i,i}ui; then

    J(θ, uI)⊺ = (s − t)I − (A + G + H),

where H = D − tI is a nonnegative matrix. Since (θ, uI) ∈ O, J(θ, uI) is an M-matrix. It follows immediately from Lemma 3.1 that

    s − t > sp(A + G + H) ≥ sp(A + H).

Hence,

    β̃I,I = βI,I − D = (s − t)I − (A + H)

is an M-matrix.


Proposition 3.3 indicates that β̃ satisfies Assumption 3. Moreover, it follows from (3.31) as well as the definitions of κ̃ and β̃I,I in (3.20) and (3.21) that

    J(θ, uI)⊺ = β̃I,I − Σ_{i=1}^K (E Z̃^i_I) κ̃⊺_{i,I}

is positive stable (here Z̃^i is distributed according to ϕ̃i). Therefore,

    β̃ − Σ_{i=1}^K (E Z̃^i) κ̃⊺_i = ( β̃I,I − Σ_{i=1}^K (E Z̃^i_I) κ̃⊺_{i,I}    0
                                      β̃I,J − Σ_{i=1}^K (E Z̃^i_J) κ̃⊺_{i,I}    β̃J,J )

satisfies Assumption 5. Then, it follows from Theorem 2.1 that the AJD X̃ satisfying the SDE (3.25) is ergodic. Equivalently, we have

Proposition 3.4. Under Assumptions 1 - 6, X is ergodic under the probability measure Q induced by the exponential martingale M.

Now we are ready to show the following proposition, which proves (3.30) for θ ∈ R^q_+.

Proposition 3.5. Let θ ∈ R^q_+ and (θ, u, φ) be a solution triplet to (3.17) and (3.18) with (θ, uI) ∈ O. Then, under Assumptions 1 - 7,

    φ(θ) = lim_{t→∞} (1/t) log E exp(θ⊺L(t)).

Proof. Since F(θ, uI(θ)) = 0, the implicit function theorem implies that

    ∇uI(θ) = −∇_{uI}F(θ, uI(θ))^{−1} ∇_θF(θ, uI(θ)) = −J(θ, uI(θ))^{−1} ∇_θF(θ, uI(θ)).    (3.32)

Note that

    ∇_θF(θ, uI(θ))⊺ = −Σ_{i=1}^K (E ζi(Z^i) e^{θ⊺ζi(Z^i)+u(θ)⊺Z^i}) κ⊺_{i,I} ∈ R^{q×m}_−    (3.33)

by Assumption 7. Moreover, J(θ, uI(θ)) is an M-matrix, so (J(θ, uI(θ))⊺)^{−1} ∈ R^{m×m}_+; see Page 137 of [9]. Hence,

    ∇uI(θ) ≥ 0.


It is easy to see that uI(0) = 0. Therefore, uI(θ) ≥ 0 if θ ≥ 0.

Note that

    exp(−u⊺X(t)) = exp(−Σ_{i=1}^m ui Xi(t))

is bounded, since Xi(·) ≥ 0 and ui ≥ 0 for i = 1, . . . , m. Consequently, by Proposition 3.4 we have

    E^Q exp[−u⊺(X(t) − X(0))] = O(1)

as t → ∞, and thereby t^{−1} log E exp(θ⊺L(t)) → φ(θ) as t → ∞.

3.4.3 Steepness

Now that we have computed the limiting CGF φ = φ(θ), in order to apply the Gärtner-Ellis theorem the next step is to verify that φ is steep (see [24] for the definition) in the region of interest (namely, R^q_+ for our purpose). To this end, define

    Θ = {θ ∈ R^q : ∃ uI ∈ R^m s.t. F(θ, uI) = 0, (θ, uI) ∈ O}.

The steepness of φ is related to its behavior near the boundary of Θ. Completely characterizing Θ is challenging. The one-dimensional case is solved in [100]; [63] and [47] discuss a related problem for multidimensional affine diffusion processes without jumps by studying an associated dynamical system. In particular, they characterize {θ : lim_{t→∞} E exp(θ⊺X(t)) < ∞}, where X(t) is an affine diffusion (without jumps). Nevertheless, we do have the following property regarding the Jacobian matrix J(θ, uI(θ)) near the boundary of Θ.

Lemma 3.2. Suppose ∂Θ ≠ ∅. Then, det(J(θ, uI(θ))) → 0 as Θ ∋ θ → θ0 ∈ ∂Θ.

Proof. Let s = s(θ) = max_{1≤i≤m} J(θ, uI(θ))_{ii} and B = B(θ) = sI − J(θ, uI(θ)). Then, B ∈ R^{m×m}_+ and s > sp(B) by Lemma 3.1, since J(θ, uI(θ)) is an M-matrix for each θ ∈ Θ. It follows from the Perron-Frobenius theorem that sp(B) is an eigenvalue of B.

Moreover, the continuity of J(θ, uI(θ)) in θ implies that s is continuous in θ; B is therefore continuous in θ as well. So s − sp(B) → s(θ0) − sp(B(θ0)) as θ → θ0. Note that since J(θ, uI) is positive stable and thus nonsingular for (θ, uI) ∈ O, we can extend the solution path (θ, uI) of F(θ, uI) = 0 from the origin to the boundary of O by the implicit function theorem (see, for example, [27]). Hence, (θ0, uI(θ0)) ∈ ∂O if θ0 ∈ ∂Θ. It follows that s(θ0) − sp(B(θ0)) = 0, implying s − sp(B) → 0 as θ → θ0.

Therefore, letting (ηi : 1 ≤ i ≤ m) denote the eigenvalues of B with η1 = sp(B),

    det(J(θ, uI(θ))) = det(sI − B) = Π_{i=1}^m (s − ηi) = (s − sp(B)) Π_{i=2}^m (s − ηi) → 0

as θ → θ0.

Lemma 3.3. ∂Θ = ∅ if and only if κ = 0.

Proof. If κ = 0, then (3.18) is trivially solved by uI(θ) ≡ 0 for all θ. Hence, Θ = R^q and thereby ∂Θ = ∅.

Now assume κ ≠ 0, i.e. there exist i ∈ {1, . . . , K} and j ∈ {1, . . . , m} such that κi,j ≠ 0. We have shown in the proof of Proposition 3.5 that uI(θ) ≥ 0 for θ ∈ Θ ∩ R^q_+. Hence, it follows from (3.18) that

    Σ_{l=1}^m ul βl,j − (1/2) α^j_{j,j} u_j² ≥ (E e^{θ⊺ζi(Z^i)+u⊺Z^i} − 1) κi,j.

Moreover, βI,I has non-positive off-diagonal elements. Hence,

    uj βj,j − (1/2) α^j_{j,j} u_j² ≥ (E e^{θ⊺ζi(Z^i)+u⊺Z^i} − 1) κi,j.

The LHS is obviously bounded above. In addition, ζi is nonnegative. Hence, ∂Θ ∩ R^q_+ ≠ ∅ and thereby ∂Θ ≠ ∅.

Proposition 3.6. Suppose ∂Θ ≠ ∅. Then, ‖∇φ(θ)‖ → ∞ as θ → ∂Θ.

Proof. It follows from (3.17) that

    φ(θ) = u(θ)⊺b + Σ_{i=1}^K λi E e^{θ⊺ζi(Z^i)+u(θ)⊺Z^i}.


Hence,

    ∇φ(θ) = ∇u(θ)⊺b + Σ_{i=1}^K λi E(ζi(Z^i) + ∇u(θ)⊺Z^i) e^{θ⊺ζi(Z^i)+u(θ)⊺Z^i}
           = ∇uI(θ)⊺bI + Σ_{i=1}^K λi E(ζi(Z^i) + ∇uI(θ)⊺Z^i_I) e^{θ⊺ζi(Z^i)+uI(θ)⊺Z^i_I}    (3.34)

since ui = 0 for i = m + 1, . . . , n. It follows from (3.32) and (3.33) that

    ∇uI(θ) = −J(θ, uI(θ))^{−1} ∇θF(θ, uI(θ)),

where

    ∇θF(θ, uI(θ))⊺ = −Σ_{i=1}^K (E ζi(Z^i) e^{θ⊺ζi(Z^i)+u⊺Z^i}) κ⊺_{i,I}.

Lemma 3.3 implies that κ ≠ 0, and thereby ∇θF(θ, uI(θ)) converges to a nonzero finite matrix as θ → ∂Θ. Then, it is easy to see that ‖∇φ(θ)‖ → ∞ as θ → ∂Θ by Lemma 3.2.

Proposition 3.7. Suppose ∂Θ = ∅. Then, ‖∇φ(θ)‖ → ∞ as θ → ∞ with θ ∈ Θ ∩ R^q_+.

Proof. Lemma 3.3 implies that κ = 0, in which case u(θ) ≡ 0 for θ ∈ Θ = R^q. Moreover, we have

    ∇φ(θ) = Σ_{i=1}^K λi E ζi(Z^i) e^{θ⊺ζi(Z^i)},

which clearly finishes the proof.

Proposition 3.6 and Proposition 3.7 assert that φ(θ) is steep in R^q_+. We are now in a position to apply the Gärtner-Ellis theorem to establish the large deviations asymptotics for L(t).

Theorem 3.2. Let r be the equilibrium mean of L(t) given in Theorem 3.1. Then, under Assumptions 1 - 7, we have

    lim_{t→∞} (1/t) log P(L(t) ≥ Rt) = −I(R)

for any R ∈ R^q with R > r, where I(R) = θR⊺R − φ(θR) and θR ∈ R^q is such that ∇φ(θR) = R.
∇φ(θR) = R.<br />

Proof. For each θ ∈ R^q, define

    ψ(θ) = lim_{t→∞} (1/t) log E exp(θ⊺L(t)).

We have shown in Proposition 3.5 that ψ(θ) = φ(θ) for θ ∈ R^q_+. It then follows from Proposition 3.6 and Proposition 3.7 that ψ(θ) is steep in R^q_+. Let I(x) = sup_{θ∈R^q}[θ⊺x − ψ(θ)] be the Legendre-Fenchel transform of ψ(θ). It can be easily verified from (3.34) that ∇ψ(0) = ∇φ(0) = r. Then, for each x ≥ r, there exists a unique θx ∈ R^q_+ such that ∇ψ(θx) = x and I(x) = θx⊺x − ψ(θx) = θx⊺x − φ(θx); see Theorem 1 of [85]. Therefore, for x ≥ r,

    ∇xI(x) = (∇xθx)x + θx − ∇xθx ∇φ(θx) = θx ≥ 0.

It follows that I(x) ≥ I(y) if x ≥ y. Hence, with A = {x ∈ R^q : x ≥ R},

    inf_{x∈A} I(x) = I(R),

since x ≥ R for all x ∈ A ⊂ R^q_+. Finally, by the Gärtner-Ellis theorem,

    lim_{t→∞} (1/t) log P(L(t) ≥ Rt) = −inf_{x∈A} I(x),

completing the proof.

3.5 Importance Sampling

Suppose that we are interested in computing P(L(t) > Rt) for R > r via simulation. Given the key role θR plays in the large deviations asymptotics, it is not surprising to consider an IS estimator associated with the change of measure Q induced by the exponential martingale M(t, θR). More specifically, let

    Y(t) ≜ M(t, θR)^{−1} I(L^Q(t) ≥ Rt)
         = exp{ −[θR⊺L^Q(t) − φ(θR)t + u(θR)⊺(X^Q(t) − X^Q(0))] } I(L^Q(t) ≥ Rt),    (3.35)

where (X^Q(t), L^Q(t)) satisfies the SDE (2.1) with parameters (a, α, b, β̃, λ̃, κ̃) and ϕ̃i as follows:

    λ̃i = λi ∫_{R^n} e^{θ⊺ζi(z)+u⊺z} ϕi(dz),
    κ̃i = κi ∫_{R^n} e^{θ⊺ζi(z)+u⊺z} ϕi(dz),
    β̃ = ( βI,I − diag(α^1_{1,1}u1, . . . , α^m_{m,m}um)    0
           βI,J                                             βJ,J ),
    ϕ̃i(dz) = e^{θ⊺ζi(z)+u⊺z} ϕi(dz) / ∫_{R^n} e^{θ⊺ζi(z)+u⊺z} ϕi(dz),

where θ = θR and u = u(θR).
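For concreteness, the following sketch computes the tilted parameters for the special case of one jump type with ζ(z) = z and an exponential jump-size distribution ϕ(dz) = ηe^{−ηz}dz on R+, so that the moment integral ∫ e^{(θ+u)z} ϕ(dz) = η/(η − θ − u) is available in closed form. This one-dimensional specification and the parameter values are illustrative assumptions, not part of the general model above.

    def tilted_jump_parameters(lam, kappa, eta, theta, u):
        # Exponential tilting of a one-dimensional jump component.
        # Under phi(dz) = eta * exp(-eta z) dz and zeta(z) = z, the tilted
        # distribution phi_tilde is again exponential, with rate eta - theta - u.
        s = theta + u
        assert s < eta, "tilting parameter must satisfy theta + u < eta"
        moment = eta / (eta - s)       # int e^{(theta+u) z} phi(dz)
        lam_tilde = lam * moment       # tilted baseline intensity
        kappa_tilde = kappa * moment   # tilted self-excitation loading
        eta_tilde = eta - s            # rate of the tilted (exponential) phi_tilde
        return lam_tilde, kappa_tilde, eta_tilde

    # illustrative values
    print(tilted_jump_parameters(lam=1.0, kappa=0.5, eta=3.0, theta=0.2, u=0.3))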

It is shown in [93] that the IS estimator guided by the Gärtner-Ellis theorem is logarithmically efficient in great generality. We provide a proof here to keep the treatment self-contained.

Theorem 3.3. Under Assumptions 1 - 7, the IS estimator (3.35) is logarithmically efficient, i.e.

    lim_{t→∞} (log E^Q Y(t)²) / (2 log E^Q Y(t)) = 1.

Proof. Note that

    E^Q Y(t)² = E^Q exp{ −2[θR⊺L(t) − φ(θR)t + u(θR)⊺(X(t) − X(0))] } I(L(t) ≥ Rt)
              ≤ E^Q exp{ −2[θR⊺Rt − φ(θR)t + u(θR)⊺(X(t) − X(0))] }
              = exp(−2I(R)t) E^Q exp[−2u(θR)⊺(X(t) − X(0))],

where I(R) = θR⊺R − φ(θR). Note that X(t) is ergodic under the probability measure Q, as we discussed in the proof of Proposition 3.4, and that u(θR) ≥ 0 (since θR ≥ 0). Therefore,

    E^Q exp[−2u(θR)⊺(X(t) − X(0))] = O(1)

as t → ∞. Hence,

    log E^Q Y(t)² ≤ −2I(R)t + log E^Q exp[−2u(θR)⊺(X(t) − X(0))],

while 2 log E^Q Y(t) = 2 log P(L(t) ≥ Rt), since the estimator is unbiased. It then follows from Theorem 3.2 that

    lim_{t→∞} (log E^Q Y(t)²) / (2 log E^Q Y(t)) ≥ 1.

On the other hand, note that E^Q Y(t)² ≥ (E^Q Y(t))² by Jensen's inequality, from which it follows that log E^Q Y(t)² ≥ 2 log E^Q Y(t). Therefore,

    lim_{t→∞} (log E^Q Y(t)²) / (2 log E^Q Y(t)) ≤ 1,

completing the proof.

3.6 Application to Portfolio Credit Risk

Consider n portfolios exposed to credit risk. Let Lk(t) denote the accumulated default loss of portfolio k, k = 1, . . . , n. Suppose that Xi(t) is the idiosyncratic risk factor associated with portfolio i, i = 1, . . . , n, while X0(t) is the common risk factor associated with the systemic risk. Assume that X is an AJD with the following specification. For i = 0, 1, . . . , n,

    dXi(t) = (bi − βiXi(t)) dt + σi√Xi(t) dWi(t) + δi Σ_{k=1}^n dLk(t),    (3.36)

where βi > 0 controls the mean-reversion speed, bi > 0 governs the reversion level, σi > 0 describes the diffusive volatility, and δi ≥ 0 represents the sensitivity of Xi to a jump (i.e. a default loss) in the portfolios. Moreover, for k = 1, . . . , n,

    Lk(t) = ∫_0^t ∫_S z Nk(ds, dz),

where Nk(ds, dz) is a counting random measure on [0, ∞) × R+ with compensator (ωk X0(t−) + Xk(t−)) dt ϕk(dz). Here, ωk > 0 and ϕk is the probability distribution of the individual default loss of portfolio k.


CHAPTER 3. AFFINE POINT PROCESSES: LARGE DEVIATIONS 72<br />

Note that for each k = 1, . . . , n,

    Nk(t) ≜ ∫_0^t ∫_S Nk(ds, dz)

represents the accumulated number of defaults for portfolio k. Then,

    Lk(t) = Σ_{j=1}^{Nk(t)} Z^k_j,

where (Z^k_j : j ≥ 1) is a sequence of iid R+-valued rv's with common distribution ϕk, which represents the sequence of individual default losses for portfolio k.

By including a feedback term δi Σ_{k=1}^n dLk(t) in the dynamics of the risk factor, the model (3.36) extends the doubly-stochastic models of [29], [82], [36], and so forth. If δi = 0, then the risk factor Xi does not respond to defaults; the doubly-stochastic formulation corresponds to setting δi = 0 for all i. [43] provides an asymptotically exact simulation scheme for generating samples of (3.36) without resorting to discretization.
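The exact scheme of [43] is preferable in practice. Purely for illustration of the feedback mechanism, the following naive Euler-type sketch of (3.36) uses the parameter values of Table 3.2 below; the time step, the Bernoulli thinning of defaults over each step, and the initialization at the reversion levels are simplifying assumptions.

    import numpy as np

    def simulate_feedback_model(b, beta, sigma, delta, omega, T, dt, rng):
        # Naive Euler sketch of the self-exciting model (3.36) with n portfolios.
        # b, beta, sigma, delta have length n+1 (index 0 = common factor);
        # omega has length n. Default losses are Uniform(0,1). Over each step,
        # portfolio k defaults with probability ~ (omega_k X_0 + X_k) dt.
        n = len(omega)
        x = b / beta                   # start factors at their reversion levels
        L = np.zeros(n)                # accumulated default losses
        for _ in range(int(T / dt)):
            intensity = omega * x[0] + x[1:]
            defaults = rng.random(n) < np.clip(intensity * dt, 0.0, 1.0)
            dL = np.where(defaults, rng.random(n), 0.0)   # Uniform(0,1) losses
            dw = rng.normal(0.0, np.sqrt(dt), n + 1)
            x = x + (b - beta * x) * dt \
                  + sigma * np.sqrt(np.maximum(x, 0.0)) * dw \
                  + delta * dL.sum()
            x = np.maximum(x, 0.0)
            L += dL
        return L

    rng = np.random.default_rng(1)
    b = np.array([1.9818, 4.8008, 4.0125, 2.5173, 1.8957])
    beta = np.array([5.9995, 4.9699, 5.5199, 4.3371, 4.0607])
    sigma = np.array([0.1565, 0.2489, 0.1631, 0.1785, 0.1388])
    delta = np.array([0.0, 0.5400, 0.8044, 0.2217, 0.0944])
    omega = np.array([0.2317, 0.7400, 0.5197, 0.2132])
    print(simulate_feedback_model(b, beta, sigma, delta, omega, T=10.0, dt=1e-3, rng=rng))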

We conducted the following numerical experiment to verify the logarithmic efficiency of the IS estimator (3.35). We let n = 4 and ϕi be the uniform distribution on [0, 1] for all 1 ≤ i ≤ 4. The parameters of the model, b, β, σ, δ, ω, are randomly generated; see Table 3.2, where r is the equilibrium mean of L(t), i.e.

    r = lim_{t→∞} L(t)/t,

computed by the formula given in Theorem 3.1. In addition, R is the large deviation level and is set to 1.05r. The results of the simulation are shown in Table 3.3 as well as Figure 3.3, where E^Q Y(t) refers to the estimated probability, E^Q Y(t)² to the estimated second moment of the IS estimator, and "Log Ratio" to (log E^Q Y(t)²) / (2 log E^Q Y(t)).


CHAPTER 3. AFFINE POINT PROCESSES: LARGE DEVIATIONS 73<br />

    i     0        1        2        3        4
    bi    1.9818   4.8008   4.0125   2.5173   1.8957
    βi    5.9995   4.9699   5.5199   4.3371   4.0607
    σi    0.1565   0.2489   0.1631   0.1785   0.1388
    δi    0        0.5400   0.8044   0.2217   0.0944
    ωi    N/A      0.2317   0.7400   0.5197   0.2132
    ri    N/A      0.6286   0.6297   0.4265   0.2916
    Ri    N/A      0.6601   0.6612   0.4479   0.3062

Table 3.2: Parameter specification for computing rare-event probabilities for APPs

    t       E^Q Y(t)   E^Q Y(t)²   Log Ratio
    10      4.10E-02   2.69E-02    0.57
    50      2.18E-02   9.32E-03    0.61
    100     1.50E-02   4.40E-03    0.65
    500     8.71E-04   7.27E-05    0.68
    1000    1.34E-04   2.57E-06    0.72
    2000    8.34E-07   4.37E-10    0.77
    3000    4.78E-08   1.09E-12    0.82
    5000    1.60E-10   5.30E-18    0.88
    7500    2.46E-15   2.11E-27    0.91
    10000   5.68E-19   1.88E-34    0.93

Table 3.3: Results of the numerical experiment for testing the logarithmic efficiency of the IS estimator.

Obviously, the numerical results indicate the convergence of the above ratio to 1 as t → ∞.


[Figure 3.3 here: the estimated probability (left axis, logarithmic scale from 10^0 down to 10^−20) and the Log Ratio (right axis, from 0.5 to 1) plotted against t from 0 to 10000.]

Figure 3.3: Convergence of Log Ratio as the probability tends to 0, showing the logarithmic efficiency of the IS estimator.


Chapter 4

Computing Large Deviations for General State Space Markov Processes

4.1 Introduction

Before proceeding to a more general setting, let us give some remarks on the IS algorithm for computing rare-event probabilities for APPs in Chapter 3. Recall that the probability of interest is

    P(L(t) > Rt),

where L is an APP. A key step in finding the appropriate "tilting" parameter θR for the IS estimator is to calculate the limiting CGF

    φ(θ) ≜ lim_{t→∞} (1/t) log E exp(θ⊺L(t)).

The above task is essentially the same as solving the eigenvalue problem

    H f(θ) = φ(θ)f(θ),    (4.1)


where f(θ) = f(x, l; θ) is a function of (x, l) for each θ and H is the infinitesimal generator of (X(t), L(t)), with X(t) satisfying the SDE (2.1), i.e.

    H f(x, l; θ) = ∇x f(x, l; θ)⊺(b − βx) + (1/2) Σ_{i,j=1}^n (ai,j + Σ_{k=1}^m α^k_{i,j} xk) ∂²f(x, l; θ)/∂xi∂xj
                   + Σ_{i=1}^K (λi + κi⊺x) ∫_S (f(x + γi(z), l + ζi(z); θ) − f(x, l; θ)) ϕi(dz).

Not surprisingly, the eigenvalue problem (4.1) is explicitly solvable due to the affine structure. Indeed, the eigenfunction f has the form f(x, l; θ) = exp(u⊺x + θ⊺l).
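As a quick sketch of why (up to the normalization conventions of (2.1) and (3.17)-(3.18)), substituting f(x, l; θ) = exp(u⊺x + θ⊺l) into H f = φf and dividing by f gives

    u⊺(b − βx) + (1/2) u⊺(a + Σ_{k=1}^m xk α^k) u + Σ_{i=1}^K (λi + κi⊺x) (E e^{u⊺γi(Z^i)+θ⊺ζi(Z^i)} − 1) = φ

for all x; matching the constant term and the coefficient of each xk turns the eigenvalue problem into the finite-dimensional algebraic system for (u, φ) that (3.17) and (3.18) encode.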

In general, computing rare-event probabilities for Markov processes is typically related to an eigenvalue problem similar to (4.1), which usually does not have an explicit solution unless certain special structure exists. The existing IS algorithms, however, heavily rely on the solution of such an eigenvalue problem. This chapter is devoted to the discussion of how to compute rare-event probabilities for general state space Markov processes in the absence of special structure.

4.2 Markov-Dependent Sums

Consider a (time-homogeneous) discrete-time Markov chain Φ = (Φn : n ≥ 0) with state space S and transition kernel P(x, dy) = Px(Φ1 ∈ dy). Suppose that Φ is positive Harris recurrent, so that there exists a unique stationary distribution π. Define the Markov-dependent sum

    Sn = Σ_{i=0}^{n−1} f(Φi),

where f : S → R with π(f) ≜ ∫_S f(x)π(dx) = 0 and π(|f|) < ∞. We will attempt to compute Px(Sn > cn) for c > 0 and n large. Clearly, by the SLLN for Markov chains,

    Sn/n → 0 a.s.


as n → ∞. Hence, ({Sn > cn} : n ≥ 0) is a family of rare events. Moreover, we usually have a large deviations result of the form

    lim_{n→∞} (1/n) log Px(Sn > cn) = −I(c),

where I(c) is called the rate function and will be specified later. We now introduce the technical conditions that guarantee the above large deviations result. The conditions are taken from [65], which generalizes the results in [64] for bounded functions f.

Assumption 8. (a) Φ is irreducible, aperiodic, and satisfies the following Foster-Lyapunov criterion with V : S → [1, ∞):

    log(P e^V) ≤ V − δW + b I_C

for a small set C ⊂ S, constants δ > 0, b < ∞, and a function W : S → [1, ∞).

(b) ‖f‖_{W0} < ∞, where W0 : S → [1, ∞) is such that

    lim_{l→∞} sup_{x∈S} (W0(x)/W(x)) I(W(x) > l) = 0.

(c) There exists n0 such that, for each l < sup_x W(x), there is a measure βl for which

    sup_{x∈CW(l)} Px(Φ_{n0} ∈ A, τ_{CW(l)} > n0) ≤ βl(A)

for all A ∈ B(S), where CW(l) ≜ {y : W(y) ≤ l} and τ_{CW(l)} is the first hitting time of CW(l).

Remark 4.1. The drift criterion in (a) of Assumption 8 captures the essential ingredient of the large deviations conditions imposed by Donsker and Varadhan in their seminal work ([25] and [26]).

Now define a positive kernel K(θ) = (K(θ, x, dy) : x, y ∈ S), where

    K(θ, x, dy) ≜ e^{θf(x)} P(x, dy).

Then the eigenvalue problem of interest is

    K(θ)h(θ) = λ(θ)h(θ),    (4.2)

i.e.

    ∫_S K(θ, x, dy) h(θ, y) = λ(θ)h(θ, x),    x ∈ S.

Proposition 4.1. Under Assumption 8, the eigenvalue problem (4.2) has a positive solution pair (λ(θ), h(θ)) for θ in a neighborhood of the origin. Moreover,

    lim_{n→∞} (1/n) log Ex e^{θSn} = log λ(θ) ≜ ψ(θ).

Proof. See Theorem 4.1 of [64] and Theorem 3.1 of [65].

Define Dψ ≜ {θ : ψ(θ) < ∞}. Then the origin belongs to D^o_ψ, the interior of Dψ.

Remark 4.2. The eigenvalue λ(θ) associated with the positive eigenfunction h(θ) is in fact the largest eigenvalue (the eigenvalue with the largest absolute value) of the kernel K(θ), known as the Perron-Frobenius eigenvalue ([78], [75], and [76]).

Proposition 4.2. Under Assumption 8, for 0 < c < sup_{θ∈Dψ∩(0,∞)} ψ′(θ),

    lim_{n→∞} (1/n) log Px(Sn > cn) = −I(c),    (4.3)

where I(c) = θc · c − ψ(θc) and ψ′(θc) = c.

Proof. See Theorem 6.3 and Theorem 6.5 of [64], as well as Theorem 5.3 and Theorem 5.4 of [65].

Provided that the large deviations result (4.3) is valid, the most well-known IS algorithm for efficiently estimating Px(Sn > cn) for large n is the "exponential tilting" ([1]). In particular, we simulate Φ under the tilted transition kernel

    exp(θc f(x) − ψ(θc)) P(x, dy) h(θc, y) / h(θc, x),    (4.4)

and use the IS estimator

    Z ≜ I(Sn > cn) exp(−θc Sn + nψ(θc)) h(θc, Φ0) / h(θc, Φn).    (4.5)

By the large deviations result (4.3), it can be shown easily that the above IS estimator is logarithmically efficient (see, for example, [16]). There are also variations on the above IS estimator: [33] and [34] introduce the dynamic importance sampling method, and a closely related method is the state-dependent change of measure introduced by [10], see also [12].
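When S is finite, (4.2) is a matrix eigenproblem and (4.4)-(4.5) can be implemented directly. The following sketch uses a toy two-state chain with illustrative values; θ is fixed by hand here rather than solved from ψ′(θc) = c, which is a simplifying assumption.

    import numpy as np

    def tilted_estimate(P, f, theta, x0, n, c, n_rep, rng):
        # Exponential-tilting IS estimator (4.5) for a finite-state chain.
        K = np.exp(theta * f)[:, None] * P       # K(theta, x, y) = e^{theta f(x)} P(x, y)
        eigvals, eigvecs = np.linalg.eig(K)
        i = np.argmax(eigvals.real)               # Perron-Frobenius eigenvalue
        lam = eigvals[i].real
        h = np.abs(eigvecs[:, i].real)            # positive eigenfunction
        psi = np.log(lam)
        # tilted transition kernel (4.4); rows sum to one by the eigen-relation
        P_tilt = (np.exp(theta * f)[:, None] / lam) * P * h[None, :] / h[:, None]
        P_tilt /= P_tilt.sum(axis=1, keepdims=True)   # guard against round-off
        est = 0.0
        for _ in range(n_rep):
            x, s = x0, 0.0
            for _ in range(n):
                s += f[x]                         # S_n = f(Phi_0) + ... + f(Phi_{n-1})
                x = rng.choice(len(f), p=P_tilt[x])
            if s > c * n:
                est += np.exp(-theta * s + n * psi) * h[x0] / h[x]
        return est / n_rep

    rng = np.random.default_rng(0)
    P = np.array([[0.9, 0.1], [0.5, 0.5]])
    f = np.array([-0.2, 1.0])                     # chosen so that pi(f) = 0 here
    print(tilted_estimate(P, f, theta=1.0, x0=0, n=100, c=0.3, n_rep=2000, rng=rng))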

Nevertheless, two major difficulties need to be tackled in order to implement the IS estimator Z.

1. As mentioned earlier, it is hard to compute the solution to the eigenvalue problem (4.2) numerically when the state space S is large or continuous and an explicit solution is unavailable.

2. Even if the solution is available (explicitly or numerically), it is still hard to sample from the tilted transition kernel (4.4).

Remark 4.3. Recall that for the rare-event simulation for APPs, we can simulate (X(t), L(t)) under the IS distribution because the IS distribution still belongs to the affine family, due to the special affine structure.

In order to circumvent these two difficulties, we will impose and exploit the regenerative structure of Φ and introduce an importance sampler on the "cycle-path" space instead of the original one-step transition dynamics. A recent and closely related work is [17], which proposes the sequential importance sampling and resampling (SISR) procedure. The idea there is that at each step k < n, one first generates m independent copies of Φk under the original transition dynamics and then resamples among these m copies with IS, whose weight function is roughly proportional to e^{θ̂f(Φk)−ψ(θ̂)} u(θ̂, Φk)/u(θ̂, Φk−1), where θ̂ is an approximation to θc and u is a function satisfying K(θ)u(θ) ≤ (1 − δ)λ(θ)u(θ) for some δ ∈ (0, 1). Note that the function u appears in a discrete probability distribution, which is easy to sample from, thereby resolving the second difficulty above.


Nevertheless, the SISR algorithm in [17] requires knowledge of the solution to the eigenvalue problem (4.2). In the presence of a continuous state space, they simply discretize the state space and numerically compute an approximation to ψ(θ), and further an approximation to θc. Therefore, they do not address the first difficulty. Our work, by contrast, attempts to address these two difficulties simultaneously. A crucial assumption we will use is the following minorization condition.

Assumption 9. There exists a compact set C ⊂ S, a constant δ ∈ (0, 1), and a probability measure ϕ on S such that

    P(x, ·) ≥ δϕ(·),    x ∈ C.

Remark 4.4. Harris recurrence requires that

    P^m(x, ·) ≥ δϕ(·),    x ∈ C,    (4.6)

for some m ≥ 1 (see Section A.2.1). It is well known that Φ has regenerative structure when m = 1 in (4.6), while only one-dependent regenerative structure exists when m > 1 (see [49] and references therein).

When m = 1 the regenerative structure can be constructed via the "splitting method" due to [77] and [5]. Define

    Q(x, dy) ≜ (P(x, dy) − δϕ(dy)) / (1 − δ).

Then Q(x, ·) is a probability distribution on S for each x ∈ C. Note that

    P(x, dy) = δϕ(dy) + (1 − δ)Q(x, dy),

which indicates that one can identify the regenerations as follows.

Suppose that Φ visits C at time σ. Flip a δ-coin. If the coin comes up "heads" (which occurs with probability δ), distribute Φσ+1 according to ϕ; otherwise, distribute Φσ+1 according to Q. Then, conditional on the coin coming up "heads", Φσ+1 has distribution ϕ and is independent of Φσ. In other words, Φ regenerates whenever it distributes itself according to ϕ.
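In code, one split transition can be sketched as follows. The AR(1) kernel P(x, ·) = N(ρx, 1) with C = [−1, 1] is an illustrative assumption; the regeneration measure ϕ is taken to be the normalized pointwise minimum of the transition densities over C, and both ϕ and the residual kernel Q are sampled by rejection.

    import math
    import numpy as np

    RHO = 0.7                  # AR(1): P(x, .) = N(rho * x, 1)  (illustrative)
    C_BOUND = 1.0              # small set C = [-1, 1]
    DELTA = math.erfc(RHO / math.sqrt(2.0))   # = 2 Phi(-rho) = int min_{x in C} p(x, y) dy

    def npdf(y, mu=0.0):
        return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)

    def minorant(y):
        # m(y) = min_{|x| <= 1} p(x, y) = phi(|y| + rho), so DELTA * varphi(y) = m(y)
        return npdf(abs(y) + RHO)

    def sample_phi(rng):
        # sample varphi(dy) = m(y) dy / DELTA by rejection from N(0, 1);
        # acceptance probability exp(-rho * |y|)
        while True:
            y = rng.normal()
            if rng.random() < math.exp(-RHO * abs(y)):
                return y

    def split_step(x, rng):
        # one transition of the split chain; returns (next state, regenerated?)
        if abs(x) <= C_BOUND:
            if rng.random() < DELTA:
                return sample_phi(rng), True      # "heads": regenerate from varphi
            while True:  # rejection sampler for Q(x, .) = (P - DELTA * varphi)/(1 - DELTA)
                y = rng.normal(RHO * x, 1.0)
                if rng.random() < 1.0 - minorant(y) / npdf(y, RHO * x):
                    return y, False
        return rng.normal(RHO * x, 1.0), False    # ordinary transition off C

    rng = np.random.default_rng(2)
    x, regen_count = 0.0, 0
    for _ in range(10000):
        x, r = split_step(x, rng)
        regen_count += r
    print("regenerations in 10000 steps:", regen_count)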

Proposition 4.3. Suppose that Assumption 8 and Assumption 9 hold. Let τ be the (first) regeneration time, i.e. τ = inf{n ≥ 1 : Φn−1 ∈ C, P(Φn ∈ ·) = ϕ(·)}. Then, for θ ∈ D^o_ψ,

    Eϕ e^{θSτ − τψ(θ)} = 1.

Proof. Let Mn = e^{θSn − nψ(θ)} h(θ, Φn). Then,

    Rn ≜ Mn/Mn−1 = e^{θf(Φn−1)−ψ(θ)} h(θ, Φn)/h(θ, Φn−1),

and thus

    E[Rn | Φn−1] = E_{Φn−1} e^{θf(Φn−1)−ψ(θ)} h(θ, Φn)/h(θ, Φn−1)
                 = (1/(λ(θ)h(θ, Φn−1))) ∫_S h(θ, y) e^{θf(Φn−1)} P(Φn−1, dy)
                 = (1/(λ(θ)h(θ, Φn−1))) ∫_S K(θ, Φn−1, dy) h(θ, y) = 1.

Therefore, (Mn : n ≥ 1) is a Px-martingale adapted to Φ. By the optional sampling theorem,

    h(θ, x) = Ex e^{θS_{τ∧n} − (τ∧n)ψ(θ)} h(θ, Φ_{τ∧n})
            = Ex e^{θSτ − τψ(θ)} h(θ, Φτ) I(τ ≤ n) + Ex e^{θSn − nψ(θ)} h(θ, Φn) I(τ > n)
            = Ex e^{θSτ − τψ(θ)} h(θ, Φτ) I(τ ≤ n) + h(θ, x) P̃x(τ > n),    (4.7)

where P̃x is the probability measure associated with the exponentially tilted transition kernel e^{θf(x)−ψ(θ)} P(x, dy) h(θ, y)/h(θ, x). It follows from Assumption 8 and Proposition 2.12 of [65] that Φ is exponentially ergodic under P̃x. In particular, we know P̃x(τ < ∞) = 1. Therefore, sending n → ∞ in (4.7) yields

    h(θ, x) = Ex e^{θSτ − τψ(θ)} h(θ, Φτ)

for each x ∈ S. Hence,

    Eϕ h(θ, Φ0) = Eϕ e^{θSτ − τψ(θ)} h(θ, Φτ).

By the regenerative structure, Φτ is independent of (τ, Φ0, . . . , Φτ−1). Hence, Φτ is independent of e^{θSτ − τψ(θ)}. It follows that

    Eϕ h(θ, Φ0) = Eϕ e^{θSτ − τψ(θ)} Eϕ h(θ, Φτ) = Eϕ e^{θSτ − τψ(θ)} Eϕ h(θ, Φ0),

and thus Eϕ e^{θSτ − τψ(θ)} = 1.

Let 0 < T(1) < T(2) < · · · be the sequence of regeneration times and T(0) = 0. Define τk = T(k+1) − T(k), k ≥ 0. Set Nn = min{k ≥ 0 : τ0 + · · · + τk = T(k+1) ≥ n}. Then, Nn is a stopping time adapted to Φ.

Note that due to the regenerative structure,

    (( Σ_{i=T(k)}^{T(k+1)−1} f(Φi), τk ) : k ≥ 1)

is an iid sequence. The following corollary is then an immediate consequence of Proposition 4.3 and Wald's identity ([97]).

Corollary 4.1. Suppose that Assumption 8 and Assumption 9 hold. Then for θ ∈ D^o_ψ, M^θ_n ≜ e^{θS_{T(Nn+1)} − T(Nn+1)ψ(θ)} is a Pϕ-martingale adapted to Φ.

Corollary 4.1 implies that (M^θ_n : n ≥ 1) induces a probability measure P̃^θ on the "cycle-path" space as follows:

    dP̃^θ_ϕ/dPϕ |_{Fn} = M^θ_n,    (4.8)

where (Fn : n ≥ 0) is the filtration generated by Φ. In particular, we have

    Pϕ(Sn > cn) = Ẽ^θ_ϕ I(Sn > cn) e^{−θS_{T(Nn+1)} + T(Nn+1)ψ(θ)}.    (4.9)

Moreover, we can incorporate the case ϕ(·) = δx(·):

    Px(Sn > cn) = Ẽ^θ_x I(Sn > cn) e^{−θ[S_{T(Nn+1)} − S_{τ0}] + [T(Nn+1) − τ0]ψ(θ)},

i.e. the dynamics of Φ remains unchanged until τ0, after which we introduce the IS indicated by (4.8) for the regenerative cycles.

The natural question here is how one should find an appropriate tilting parameter θ. Naturally, θc in the large deviations result (4.3) is a good candidate. But computing θc requires solving the eigenvalue problem (4.2). So we will adopt an empirical approximation of θc, avoiding the computation of (4.2). In the sequel of this chapter, we will fix c ∈ (0, sup_{θ∈D^o_ψ∩(0,∞)} ψ′(θ)) for which the large deviations result (4.3) holds, and let ϑ ≜ θc > 0, suppressing the dependence on c for notational simplicity.

4.3 Empirical Moment Generating Function

For each i ≥ 0, define

    Yi ≜ Σ_{j=T(i)}^{T(i+1)−1} f(Φj).

The regenerative structure indicates that ((Yi, τi) : i ≥ 1) is a sequence of iid rv's with common distribution the same as that of (Y, τ). Proposition 4.3 suggests that we can consider the empirical approximation of Eϕ e^{θY − τψ(θ)} to compute an approximation of ψ(θ). More specifically, for m = 1, 2, . . ., let

    γm(θ, ζ) ≜ (1/m) Σ_{i=1}^m exp(θYi − ζτi)

be the empirical moment generating function of (Y, τ). We can then define the "empirical versions" of ϑ and ψ. In particular, let ψm(θ) be the empirical CGF for which

    γm(θ, ψm(θ)) = 1,    (4.10)

and let ϑm be the empirical root for which

    ψ′m(ϑm) = c.    (4.11)
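Given simulated cycle data ((Yi, τi) : 1 ≤ i ≤ m), equations (4.10) and (4.11) reduce to one-dimensional root-finding problems, since γm is decreasing in ζ and ψ′m is increasing in θ. A minimal sketch follows; the expanding bracket and the search interval for the root are ad hoc assumptions.

    import numpy as np
    from scipy.optimize import brentq

    def empirical_psi(theta, Y, tau):
        # Solve gamma_m(theta, zeta) = 1 in zeta, i.e. (4.10); the LHS is
        # strictly decreasing in zeta because tau > 0.
        g = lambda zeta: np.mean(np.exp(theta * Y - zeta * tau)) - 1.0
        lo, hi = -1.0, 1.0
        while g(lo) < 0.0:      # ad hoc expanding bracket
            lo *= 2.0
        while g(hi) > 0.0:
            hi *= 2.0
        return brentq(g, lo, hi)

    def empirical_psi_prime(theta, Y, tau):
        # psi_m'(theta) via the ratio formula in the proof of Proposition 4.5 below
        w = np.exp(theta * Y - empirical_psi(theta, Y, tau) * tau)
        return np.sum(Y * w) / np.sum(tau * w)

    def empirical_root(c, Y, tau, lo=1e-6, hi=2.0):
        # Solve psi_m'(vartheta_m) = c, i.e. (4.11); psi_m' is increasing.
        return brentq(lambda th: empirical_psi_prime(th, Y, tau) - c, lo, hi)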

We will show the consistency and asymptotic normality of ϑm. To this end, we need the following lemma.

Lemma 4.1. Suppose that fm(ξ) is a rv for each ξ ∈ R, m ≥ 1, for which

    fm(ξ) → f(ξ)  a.s.

for each ξ, as m → ∞, where f : R → R is a deterministic function. Also, suppose that f and fm are both increasing (or both decreasing) in ξ, and that ξ0 and ξm solve f(ξ0) = 0 and fm(ξm) = 0, respectively. Then,

    ξm → ξ0  a.s.

as m → ∞.

Proof. Without loss of generality, assume that f and fm are both increasing in ξ. For any ɛ > 0,

    fm(ξ0 − ɛ) → f(ξ0 − ɛ) < f(ξ0) = 0  a.s.,
    fm(ξ0 + ɛ) → f(ξ0 + ɛ) > f(ξ0) = 0  a.s.,

as m → ∞. Hence, there exists m0 such that

    fm(ξ0 − ɛ) < 0 < fm(ξ0 + ɛ)

for all m > m0. Note that fm(ξm) = 0 and fm is increasing. It follows that

    ξ0 − ɛ < ξm < ξ0 + ɛ

for all m > m0. Therefore, ξm → ξ0 a.s. as m → ∞.


Proposition 4.4. Suppose that Assumption 8 and Assumption 9 hold. Then, ψm(θ) → ψ(θ) a.s. as m → ∞ for any θ ∈ D^o_ψ.

Proof. Fix θ ∈ D^o_ψ. Let f(ζ) = E e^{θY − ζτ} and

    fm(ζ) = (1/m) Σ_{i=1}^m e^{θYi − ζτi},    m ≥ 1.

Then the SLLN yields fm(ζ) → f(ζ) a.s. as m → ∞ for any ζ. Note that, since τ > 0 a.s., f and fm are decreasing in ζ. It then follows immediately from Lemma 4.1 that ψm(θ) → ψ(θ) a.s. as m → ∞ for any θ ∈ D^o_ψ.

Proposition 4.5. Suppose that Assumption 8 and Assumption 9 hold. Then, ϑm → ϑ a.s. as m → ∞.

Proof. We know from Proposition 4.4 that ψm(θ) → ψ(θ) a.s. as m → ∞. Note that

    E e^{θY − ψ(θ)τ} = 1.

The dominated convergence theorem guarantees that we can interchange differentiation and expectation. Differentiating both sides with respect to θ yields

    E[e^{θY − ψ(θ)τ}(Y − ψ′(θ)τ)] = 0,

and

    E[e^{θY − ψ(θ)τ}((Y − ψ′(θ)τ)² − ψ″(θ)τ)] = 0.

Then,

    ψ′(θ) = (E Y e^{θY − ψ(θ)τ}) / (E τ e^{θY − ψ(θ)τ})

and

    ψ″(θ) = (E(Y − ψ′(θ)τ)² e^{θY − ψ(θ)τ}) / (E τ e^{θY − ψ(θ)τ}) > 0.

Likewise, we can show

    ψ′m(θ) = ( (1/m) Σ_{i=1}^m Yi e^{θYi − ψm(θ)τi} ) / ( (1/m) Σ_{i=1}^m τi e^{θYi − ψm(θ)τi} )

and ψ″m(θ) > 0. Hence, ψ′m(θ) − c and ψ′(θ) − c are both increasing in θ. Moreover,

    ψ′m(θ) → ψ′(θ)  a.s.

as m → ∞. It then follows from Lemma 4.1 that ϑm → ϑ a.s. as m → ∞.

We then immediately arrive at the following result by Proposition 4.4 and Proposition 4.5.

Theorem 4.1. Suppose that Assumption 8 and Assumption 9 hold. Then,

    ϑm c − ψm(ϑm) → I(c)  a.s.

as m → ∞.

We now conclude this section with a discussion of the asymptotic normality of the empirical approximations.

Theorem 4.2. Suppose that Assumption 8 and Assumption 9 hold. If 2ϑ ∈ D^o_ψ, then

    √m ( (ϑm, ψm(ϑm)) − (ϑ, ψ(ϑ)) ) ⇒ N(0, Σ)

as m → ∞, where Σ = BΓB⊺,

    B = ( E[YZ]            −E[τZ]
          E[Y(Y − cτ)Z]    −E[τ(Y − cτ)Z] )^{−1}

and

    Γ = ( Var(Z)                 Cov(Z, (Y − cτ)Z)
          Cov(Z, (Y − cτ)Z)      Var((Y − cτ)Z)    ),

where Z = e^{ϑY − ψ(ϑ)τ}.

Proof. Define γ(θ, ζ) ≜ E e^{θY − ζτ} and Dγ ≜ {(θ, ζ) : γ(θ, ζ) < ∞}. Then γ is differentiable in D^o_γ. Note that γ(θ, ψ(θ)) = 1. Hence,

    (∂/∂θ)γ(θ, ψ(θ)) + (∂/∂ζ)γ(θ, ψ(θ)) · ψ′(θ) = 0.


Since ψ′(ϑ) = c, we have

    (∂/∂θ)γ(ϑ, ψ(ϑ)) + (∂/∂ζ)γ(ϑ, ψ(ϑ)) · c = 0,

i.e.

    E(Y − cτ) e^{ϑY − ψ(ϑ)τ} = 0.    (4.12)

Likewise, we have

    (1/m) Σ_{i=1}^m (Yi − cτi) e^{ϑmYi − ψm(ϑm)τi} = 0.

It follows that (ϑ, ψ(ϑ)) solves F(ϑ, ψ(ϑ)) = 0, where

    F(θ, ζ) ≜ ( E e^{θY − ζτ} − 1
                E(Y − cτ) e^{θY − ζτ} ).

Likewise, (ϑm, ψm(ϑm)) solves Fm(ϑm, ψm(ϑm)) = 0, where

    Fm(θ, ζ) ≜ ( (1/m) Σ_{i=1}^m e^{θYi − ζτi} − 1
                 (1/m) Σ_{i=1}^m (Yi − cτi) e^{θYi − ζτi} ).

Note that

    0 = Fm(ϑm, ψm(ϑm)) − F(ϑ, ψ(ϑ)) = Fm(ϑm, ψm(ϑm)) − Fm(ϑ, ψ(ϑ)) + Fm(ϑ, ψ(ϑ)) − F(ϑ, ψ(ϑ)).

Therefore,

    −[Fm(ϑ, ψ(ϑ)) − F(ϑ, ψ(ϑ))] = ∇Fm(ϑ, ψ(ϑ)) ( ϑm − ϑ
                                                   ψm(ϑm) − ψ(ϑ) ) + o(|(ϑm − ϑ, ψm(ϑm) − ψ(ϑ))|).    (4.13)

Note that since 2ϑ ∈ D^o_ψ, the covariance matrix of (e^{ϑY − ψ(ϑ)τ}, (Y − cτ)e^{ϑY − ψ(ϑ)τ}) is finite. So we can apply the CLT to obtain

    √m [Fm(ϑ, ψ(ϑ)) − F(ϑ, ψ(ϑ))] ⇒ N(0, Γ)    (4.14)

as m → ∞, where

    Γ = ( Var(Z)                 Cov(Z, (Y − cτ)Z)
          Cov(Z, (Y − cτ)Z)      Var((Y − cτ)Z)    )

with Z = e^{ϑY − ψ(ϑ)τ}. Moreover, note that by the SLLN,

    ∇Fm(ϑ, ψ(ϑ)) → ( E[YZ]            −E[τZ]
                      E[Y(Y − cτ)Z]    −E[τ(Y − cτ)Z] )  a.s.    (4.15)

as m → ∞. Note that the determinant of the RHS is

    E[τZ]E[Y(Y − cτ)Z] − E[YZ]E[τ(Y − cτ)Z]
    = E[τZ](E[Y(Y − cτ)Z] − cE[τ(Y − cτ)Z])    (by (4.12))
    = E[τZ]E[(Y − cτ)²Z] > 0.

Hence, the RHS of (4.15) is nonsingular. Combining (4.13), (4.14), and (4.15) yields the desired result.

4.4 A Bootstrap Based Simulation Algorithm

Theorem 4.1 asserts that we could use e^{−n(ϑmc−ψm(ϑm))} as an approximation to P(Sn > cn) for large n and m. This approximation is clearly not accurate enough, since it approximates P(Sn > cn) only at the logarithmic scale. To achieve a higher degree of precision, we need to do additional bootstrap-type resampling.

Algorithm 1. (i) Generate m iid copies of (Y0, τ0), denoted by ((Y0,i, τ0,i) : 1 ≤ i ≤ m), and m iid regenerative cycles ((Yi, τi) : 1 ≤ i ≤ m). Then compute ϑm and ψm(·).

(ii) Sample (Y0,i, τ0,i) with probability 1/m for each i, and denote the draw by (Ỹ0, τ̃0), with Ỹ0 = Σ_{j=0}^{τ̃0−1} f(X̃j).

(iii) Sample a regenerative cycle from the collection ((Yi, τi) : 1 ≤ i ≤ m), cycle i being chosen with probability

    (1/m) exp(ϑmYi − ψm(ϑm)τi),

and denote the draw by (Ỹ1, τ̃1), with Ỹ1 = Σ_{j=τ̃0}^{τ̃0+τ̃1−1} f(X̃j).

(iv) Continue sampling such cycles independently until the total cycle length exceeds n, i.e.

    T̃(k + 1) ≜ τ̃0 + τ̃1 + · · · + τ̃k ≥ n.

Set Ñn = k.

(v) Let S̃n(m) be the sum of the first n f(X̃j)'s associated with the above bootstrapped sequence of cycles, i.e.

    S̃n(m) = Σ_{j=0}^{n−1} f(X̃j) = Σ_{i=0}^{Ñn−1} Ỹi + Σ_{j=T̃(Ñn)}^{n−1} f(X̃j).

Put

    Z(n, m) = I(S̃n(m) > cn) exp( Σ_{i=1}^{Ñn} (−ϑm Ỹi + ψm(ϑm)τ̃i) ).

(vi) Repeat the above four steps r independent times, yielding Z1(n, m), . . . , Zr(n, m), and return

    Z(n, m, r) = (1/r) Σ_{i=1}^r Zi(n, m).
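A compact sketch of Algorithm 1 follows, reusing the hypothetical empirical_psi and empirical_root helpers from Section 4.3; cycles are assumed stored as arrays of the f-values along each regenerative cycle, so that the within-cycle partial sums needed in Step (v) are available.

    import numpy as np

    def bootstrap_is_estimate(cycles0, cycles, c, n, r, rng):
        # Algorithm 1. cycles0 and cycles are lists of 1-d arrays; each array
        # holds (f(X_j)) along one cycle, so Y_i = cycles[i].sum() and
        # tau_i = len(cycles[i]).
        Y = np.array([cyc.sum() for cyc in cycles])
        tau = np.array([len(cyc) for cyc in cycles], dtype=float)
        th = empirical_root(c, Y, tau)                # vartheta_m, (4.11)
        psi = empirical_psi(th, Y, tau)               # psi_m(vartheta_m), (4.10)
        w = np.exp(th * Y - psi * tau)                # (4.10) makes these sum to ~m
        w /= w.sum()                                  # renormalize root-finding error
        m = len(cycles)
        out = np.empty(r)
        for k in range(r):
            cyc0 = cycles0[rng.integers(m)]           # Step (ii): uniform draw
            path, total, logw = [cyc0], len(cyc0), 0.0
            while total < n:                          # Steps (iii)-(iv)
                i = rng.choice(m, p=w)                # tilted cycle sampling
                path.append(cycles[i])
                total += len(cycles[i])
                logw += -th * Y[i] + psi * tau[i]     # accumulate the IS weight
            s = np.concatenate(path)[:n].sum()        # Step (v): first n terms
            out[k] = np.exp(logw) if s > c * n else 0.0
        return out.mean()                             # Step (vi)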

Proposition 4.6. Let Xm = {(Y0,i, τ0,i), (Yi, τi) : 1 ≤ i ≤ m}. Then,

    E[Z²(n, m) | Xm] ≤ e^{−2n(ϑm c − ψm(ϑm))} · (1/m) Σ_{i=1}^m e^{2ϑm Y+_{0,i} − 2ψm(ϑm)τ0,i} · (1/m) Σ_{i=1}^m e^{ϑm(Y+_i + Y−_i) + ψm(ϑm)τi},

where

    Y+_{0,i} = Σ_{j=0}^{τ0,i − 1} f+(Xj),
    Y+_i = Σ_{j=T(i+1)}^{T(i+2)−1} f+(Xj),
    Y−_i = Σ_{j=T(i+1)}^{T(i+2)−1} f−(Xj)

for each i = 1, . . . , m. Here, f+ and f− are respectively the positive and negative parts of f.

Proof. Note that

    E[Z²(n, m) | Xm] = E[ I(S̃n(m) > cn) exp( 2 Σ_{i=1}^{Ñn} (−ϑm Ỹi + ψm(ϑm)τ̃i) ) | Xm ],

and that

    Σ_{i=1}^{Ñn} Ỹi = S̃n(m) − Ỹ0 + Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j).

Hence,

    E[Z²(n, m) | Xm]
    ≤ E[ e^{−2ϑm(cn − Ỹ0 + Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j)) + 2ψm(ϑm)(T̃(Ñn+1) − τ̃0)} | Xm ]
    = E[ e^{2ϑm Ỹ0 − 2ψm(ϑm)τ̃0} · e^{−2ϑm Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j) − 2n(ϑm c − ψm(ϑm)) + 2ψm(ϑm)(T̃(Ñn+1) − n)} | Xm ]
    = e^{−2n(ϑm c − ψm(ϑm))} · E[ e^{2ϑm Ỹ0 − 2ψm(ϑm)τ̃0} | Xm ] · E[ e^{−2ϑm Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j) + 2ψm(ϑm)(T̃(Ñn+1) − n)} | Xm ],    (4.16)

where the last equality follows from the independence between (Ỹ0, τ̃0) and (Ỹi, τ̃i), i = 1, . . . , m, conditional on Xm.

Let E*[· | Xm] denote expectation under the probability measure under which we sample the regenerative cycles in Step (iii) of Algorithm 1 uniformly, i.e. select cycle i with probability 1/m for all 1 ≤ i ≤ m. Then,

    E[ e^{−2ϑm Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j) + 2ψm(ϑm)(T̃(Ñn+1) − n)} | Xm ]
    = E*[ e^{−2ϑm Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j) + 2ψm(ϑm)(T̃(Ñn+1) − n)} · e^{ϑm ỸÑn − ψm(ϑm)τ̃Ñn} | Xm ]
    ≤ E*[ e^{−2ϑm Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j) + ϑm ỸÑn + ψm(ϑm)τ̃Ñn} | Xm ]
    = E*[ e^{−ϑm Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j) + ϑm Σ_{j=T̃(Ñn)}^{n−1} f(X̃j) + ψm(ϑm)τ̃Ñn} | Xm ]
    ≤ E*[ e^{ϑm Σ_{j=n}^{T̃(Ñn+1)−1} f−(X̃j) + ϑm Σ_{j=T̃(Ñn)}^{n−1} f+(X̃j) + ψm(ϑm)τ̃Ñn} | Xm ]
    ≤ E*[ e^{ϑm Ỹ−Ñn + ϑm Ỹ+Ñn + ψm(ϑm)τ̃Ñn} | Xm ]
    = (1/m) Σ_{i=1}^m e^{ϑm(Y+_i + Y−_i) + ψm(ϑm)τi}.    (4.17)

The first inequality follows from the fact that ψm(ϑm) > 0 and T̃(Ñn + 1) − n ≤ τ̃Ñn. Finally, note that

    E[ e^{2ϑm Ỹ0 − 2ψm(ϑm)τ̃0} | Xm ] = (1/m) Σ_{i=1}^m e^{2ϑm Y0,i − 2ψm(ϑm)τ0,i} ≤ (1/m) Σ_{i=1}^m e^{2ϑm Y+_{0,i} − 2ψm(ϑm)τ0,i},    (4.18)

since ϑm > 0. The proof is finished by combining (4.16), (4.17), and (4.18).

We will need the following technical assumption.

Assumption 10. (i) 2ϑ ∈ D^o_ψ;

(ii) E e^{θY+_0 − ζτ0} < ∞ for (θ, ζ) in a neighborhood of (2ϑ, 2ψ(ϑ));

(iii) E e^{θ(Y+_1 + Y−_1) + ζτ1} < ∞ for (θ, ζ) in a neighborhood of (ϑ, ψ(ϑ)).

Proposition 4.7. Suppose that Assumptions 8, 9, and 10 hold. Let m ∼ n^{2+ξ} for some ξ > 0 and δ = n^{−(1+ξ/4)}. Then,

    P(|ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| ≥ δ) → 0

as n → ∞.

Proof. It follows from Theorem 4.2 that

    √m (ϑm − ϑ) ⇒ N(0, σ²)

as m → ∞ for some σ² > 0. Hence,

    δ^{−1}(ϑm − ϑ) → 0  in probability

as n → ∞, since √m ∼ n^{1+ξ/2} and δ^{−1} = n^{1+ξ/4}. Therefore,

    P(|ϑm − ϑ| ≥ δ/2) = P(δ^{−1}|ϑm − ϑ| ≥ 1/2) → 0

as n → ∞. Likewise, we have

    P(|ψm(ϑm) − ψ(ϑ)| ≥ δ/2) → 0

as n → ∞. It follows that

    P(|ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| ≥ δ) ≤ P(|ϑm − ϑ| ≥ δ/2) + P(|ψm(ϑm) − ψ(ϑ)| ≥ δ/2) → 0

as n → ∞.


Proposition 4.8. Suppose that Assumptions 8, 9, and 10 hold. Let δ = n^{−(1+ξ/4)} for some ξ > 0. Then,

    E[ E[Z²(n, m) | Xm] / p(n)² ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ] = e^{o(n)}

as n → ∞, where p(n) = Px(Sn > cn).

Proof. It follows from Proposition 4.6 that

    E[ E[Z²(n, m) | Xm] ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]
    ≤ E[ e^{−2n(ϑm c − ψm(ϑm))} · (1/m) Σ_{i=1}^m e^{2ϑm Y+_{0,i} − 2ψm(ϑm)τ0,i} · (1/m) Σ_{i=1}^m e^{ϑm(Y+_i + Y−_i) + ψm(ϑm)τi} ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]
    ≤ E[ e^{−2n((ϑ−δ)c − (ψ(ϑ)+δ))} · (1/m) Σ_{i=1}^m e^{2(ϑ+δ)Y+_{0,i} − 2(ψ(ϑ)−δ)τ0,i} · (1/m) Σ_{i=1}^m e^{(ϑ+δ)(Y+_i + Y−_i) + (ψ(ϑ)+δ)τi} ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]
    ≤ e^{−2n((ϑc − ψ(ϑ)) − (c+1)δ)} · E e^{2(ϑ+δ)Y+_0 − 2(ψ(ϑ)−δ)τ0} · E e^{(ϑ+δ)(Y+_1 + Y−_1) + (ψ(ϑ)+δ)τ1}.    (4.19)

On the other hand, it follows from Proposition 4.2 that

    p(n) = e^{−n(ϑc − ψ(ϑ)) + o(n)}

as n → ∞. Consequently,

    E[ E[Z²(n, m) | Xm] / p(n)² ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]
    ≤ A e^{−2n((ϑc−ψ(ϑ)) − (c+1)δ)} / e^{−2n(ϑc−ψ(ϑ)) + o(n)} = A e^{2(c+1)δn + o(n)} = e^{o(n)}


as n → ∞, since δ = n^{−(1+ξ/4)}, where

    A = E e^{2(ϑ+δ)Y+_0 − 2(ψ(ϑ)−δ)τ0} · E e^{(ϑ+δ)(Y+_1 + Y−_1) + (ψ(ϑ)+δ)τ1} < ∞

by Assumption 10.

Assumption 11. (a) lim_{θ→∞} sup_{x∈S} |Ex e^{iθf(Φ1)}| < 1;

(b) lim_{θ→∞} sup_{m≥1} |E[e^{iθY1} | Xm]| < 1.

Remark 4.5. Assumption 11 indicates that f(Φ1) and Y1 | Xm are both strongly non-lattice, which is required to derive the local central limit convergence and further to obtain the exact large deviations asymptotics for them.

Proposition 4.9. Suppose that Assumptions 8, 9, 10, and 11 hold. Let δ = n^{−(1+ξ/4)} for some ξ > 0. Then,

    E[ (E(Z(n, m) | Xm)/p(n) − 1)² ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ] → 0

as n → ∞, where p(n) = Px(Sn > cn).

Proof. It follows from Theorem 5.3 of [65] that

    p(n) ∼ ( h(ϑ, x) / (ϑ √(2πψ″(ϑ)n)) ) e^{−n(ϑc − ψ(ϑ))}

as n → ∞, where h(ϑ, x) is the eigenfunction of (4.2).

Note that

    E(Z(n, m) | Xm) = E*[I(S̃n(m) > cn) | Xm] = P*(S̃n(m) > cn | Xm),

where E*[· | Xm] denotes expectation under the probability measure under which we sample the regenerative cycles in Step (iii) of Algorithm 1 uniformly, i.e. select cycle i with probability 1/m for all 1 ≤ i ≤ m.
1/m <strong>for</strong> all 1 ≤ i ≤ m.


In light of the regenerative structure of S̃n(m), following an argument that is essentially the same as that of Theorem 1 of [66], we can show that

    P*(S̃n(m) > cn | Xm) ∼ ( h(ϑm, x) / (ϑm √(2πψ″m(ϑm)n)) ) e^{−n(ϑm c − ψm(ϑm))}

as n → ∞, for each m ≥ 1. Note that the key step in proving the preceding asymptotics is a local CLT for S̃n(m), for which its non-lattice property (Assumption 11) is needed.

It follows that, for any ɛ > 0,

    p(n) ≥ (1 − ɛ) ( h(ϑ, x) / (ϑ √(2πψ″(ϑ)n)) ) e^{−n(ϑc − ψ(ϑ))}

and

    P*(S̃n(m) > cn | Xm) ≤ (1 + ɛ) ( h(ϑm, x) / (ϑm √(2πψ″m(ϑm)n)) ) e^{−n(ϑm c − ψm(ϑm))}

for all large n. Hence,

    P*(S̃n(m) > cn | Xm) / p(n)
    ≤ ((1+ɛ)/(1−ɛ)) · (h(ϑm, x)/h(ϑ, x)) · (ϑ/ϑm) · √(ψ″(ϑ)/ψ″m(ϑm)) · e^{−n[(ϑm−ϑ)c − (ψm(ϑm)−ψ(ϑ))]}.

Moreover, note that on the event {|ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ},

    (h(ϑm, x)/h(ϑ, x)) · (ϑ/ϑm) · √(ψ″(ϑ)/ψ″m(ϑm)) ≤ 1 + ɛ


CHAPTER 4. COMPUTING LARGE DEVIATIONS FOR GSSMPS 96<br />

ξ<br />

−(1+ <strong>for</strong> all δ small enough, or <strong>for</strong> all n large enough since δ = n 4 ) . There<strong>for</strong>e,<br />

E<br />

<br />

P∗ ( ˜ <br />

Sn(m) > cn|Xm)<br />

; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ<br />

p(n)<br />

(1 + ɛ)2<br />

≤ · E<br />

1 − ɛ<br />

e −n(c+1)δ<br />

(1 + ɛ)2<br />

→<br />

1 − ɛ<br />

as n → ∞. Sending ɛ ↓ 0, we have

lim_{n→∞} E[ P∗(S̃n(m) > cn|Xm)/p(n) ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ] ≤ 1.

Likewise,

lim_{n→∞} E[ P∗(S̃n(m) > cn|Xm)/p(n) ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ] ≥ 1.

Hence,

lim_{n→∞} E[ P∗(S̃n(m) > cn|Xm)/p(n) ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ] = 1.

Similarly, we can show

lim_{n→∞} E[ P∗(S̃n(m) > cn|Xm)²/p(n)² ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ] = 1.

It follows that

E[ (E(Z(n, m)|Xm)/p(n) − 1)² ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]

= E[ P∗(S̃n(m) > cn|Xm)²/p(n)² − 2 P∗(S̃n(m) > cn|Xm)/p(n) + 1 ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]

→ 1 − 2 + 1 = 0

as n → ∞, using the two preceding limits together with P(|ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ) → 1, which holds by Proposition 4.7.

Theorem 4.3. Suppose that Assumptions 8, 9, 10, and 11 hold. Let m ∼ n^{2+ξ} for some ξ > 0. Then there exists a specification for r such that r ∼ e^{o(n)} and

P( |Z(n, m, r)/p(n) − 1| > ɛ ) → 0

as n → ∞, for any ɛ > 0.

Proof. Note that, setting δ = n^{−(1+ξ/4)},

P( |Z(n, m, r)/p(n) − 1| > ɛ )
≤ P( |Z(n, m, r)/p(n) − 1| > ɛ ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ) + P( |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| ≥ δ ).

Hence, by Proposition 4.7, it suffices to show that for some choice of r ∼ e^{o(n)},

P( |Z(n, m, r)/p(n) − 1| > ɛ ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ) → 0   (4.20)

as n → ∞. To this end, by Chebyshev's inequality,

P( |Z(n, m, r)/p(n) − 1| > ɛ | Xm )
≤ E[(Z(n, m, r) − p(n))²|Xm] / (ɛ²p(n)²)
= ( Var(Z(n, m, r)|Xm) + [E(Z(n, m, r)|Xm) − p(n)]² ) / (ɛ²p(n)²)
= ɛ⁻² ( Var(Z(n, m)|Xm)/(rp(n)²) + [E(Z(n, m)|Xm) − p(n)]²/p(n)² )
≤ ɛ⁻² ( E(Z(n, m)²|Xm)/(rp(n)²) + (E(Z(n, m)|Xm)/p(n) − 1)² ).


Therefore,

P( |Z(n, m, r)/p(n) − 1| > ɛ ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ )
= E[ P( |Z(n, m, r)/p(n) − 1| > ɛ | Xm ) ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]
≤ ɛ⁻² E[ E(Z(n, m)²|Xm)/(rp(n)²) ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]
  + ɛ⁻² E[ (E(Z(n, m)|Xm)/p(n) − 1)² ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ],

which, combined with Proposition 4.8 and Proposition 4.9, implies (4.20).

Remark 4.6. Theorem 4.3 asserts that in order to achieve a given relative precision ɛ, the required computational effort for Algorithm 1 is e^{o(n)}. More specifically, the computational effort for generating cycles and computing ϑm is O(m) = O(n^{2+ξ}). Since the complexity of simulating one Z(n, m) is O(m), which is equivalent to generating a discrete rv with support size m, the computational effort for computing the estimate Z(n, m, r) is O(mr) = e^{o(n)}. Therefore, the entire computational complexity of Algorithm 1 is e^{o(n)}, meaning the algorithm is logarithmically efficient.

4.5 Numerical Experiments

4.5.1 Autoregressive Model

We will first use an autoregressive model of order 1 to illustrate Algorithm 1, since the tail probability of such a model is explicitly available. In particular, let

Φn = ρΦn−1 + Zn, n ≥ 1,

where ρ ∈ (0, 1) and the Zn's are iid with standard normal distribution N(0, 1) and are independent of Φ0. Define

Sn = Σ_{i=0}^{n−1} Φi.


Given c > 0, we will compute the probability Px(Sn > cn) ≜ P(Sn > cn | Φ0 = x) for large n.

First, note that by direct calculation,

Φn = ρⁿΦ0 + Σ_{i=1}^{n} ρ^{n−i} Zi,

and

Sn = (1 − ρⁿ)/(1 − ρ) · Φ0 + Σ_{i=1}^{n−1} (1 − ρ^{n−i})/(1 − ρ) · Zi.

Hence, given Φ0, Sn is a normal rv with mean (1 − ρⁿ)/(1 − ρ) · Φ0 and variance

σ(n)² ≜ Σ_{i=1}^{n−1} ( (1 − ρ^{n−i})/(1 − ρ) )² = 1/(1 − ρ)² · [ n − 1 − 2(ρ − ρⁿ)/(1 − ρ) + (ρ² − ρ^{2n})/(1 − ρ²) ].

Therefore,

Px(Sn > cn) = 1 − Ψ( (cn − (1 − ρⁿ)/(1 − ρ) · x) / σ(n) ),

where Ψ(·) is the cumulative distribution function of the standard normal distribution.

Moreover, we can easily calculate the large deviations asymptotics of Sn. Particularly, note that

(1/n) log Ex e^{θSn} = (1/n) log Ex e^{θ[(1−ρⁿ)/(1−ρ)·x + N(0, σ(n)²)]} = (1/n) [ θ(1−ρⁿ)/(1−ρ)·x + θ²σ(n)²/2 ] → θ²/(2(1−ρ)²) ≜ ψ(θ)

as n → ∞. Setting ψ′(ϑ) = c yields

ϑ = c(1 − ρ)².

The exponential rate function is then given by

I(c) ≜ ϑc − ψ(ϑ) = c²(1 − ρ)²/2.
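The closed-form tail probability and the rate function above give an exact benchmark against which simulation output can be checked. The following short Python sketch is our own illustration (not part of the dissertation's experiments; it assumes scipy is available) and evaluates Px(Sn > cn) together with the empirical decay rate −log p(n)/n:

    # Exact tail probability and large deviations rate for the AR(1) model,
    # using the closed-form expressions derived above.
    from math import log, sqrt
    from scipy.stats import norm

    def ar1_tail(n, x=0.0, rho=0.5, c=1.0):
        # P_x(S_n > cn): given Phi_0 = x, S_n is normal with the mean and
        # variance sigma(n)^2 computed above.
        mean = (1 - rho**n) / (1 - rho) * x
        var = (n - 1 - 2 * (rho - rho**n) / (1 - rho)
               + (rho**2 - rho**(2 * n)) / (1 - rho**2)) / (1 - rho)**2
        return norm.sf((c * n - mean) / sqrt(var))  # = 1 - Psi(...)

    rho, c = 0.5, 1.0
    I_c = c**2 * (1 - rho)**2 / 2                   # I(c) = 0.125 here
    for n in (20, 40, 60, 80, 100):
        p = ar1_tail(n)
        print(n, p, -log(p) / n, I_c)               # -log p(n)/n -> I(c)

For n = 20 this gives p(20) ≈ 0.008, consistent with the exact values reported in Table 4.3 below.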


x    ρ    c    ɛ
0    0.5  1    0.25

Table 4.1: Parameter specification for computing rare-event probabilities for the AR(1) model.

In order to implement Algorithm 1, we first need to specify the regenerative structure. Namely, we ought to find a compact set C, a constant δ ∈ (0, 1), and a distribution ϕ such that the minorization condition (Assumption 9) is satisfied. To that end, let C = [−ɛ, ɛ] and φ(y) = inf_{x∈C} p(x, y), where p(x, y) is the transition density of the AR(1) model. Then,

φ(y) = inf_{|x|≤ɛ} (1/√(2π)) e^{−(y−ρx)²/2} = (1/√(2π)) e^{−(y+ρɛ)²/2} for y ≥ 0, and (1/√(2π)) e^{−(y−ρɛ)²/2} for y < 0.

Let

δ = ∫_{−∞}^{∞} φ(y) dy = ∫_{−∞}^{0} (1/√(2π)) e^{−(y−ρɛ)²/2} dy + ∫_{0}^{∞} (1/√(2π)) e^{−(y+ρɛ)²/2} dy = Ψ(−ρɛ) + 1 − Ψ(ρɛ) = 2Ψ(−ρɛ) ∈ (0, 1).

Finally, let ϕ(x) = ∫_{−∞}^{x} (φ(y)/δ) dy be a distribution function on R. Then, we have

Px(Φ1 ∈ ·) ≥ δϕ(·), x ∈ C.
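This construction is easy to realize in code: δ = 2Ψ(−ρɛ) comes directly from the normal cdf, and since φ(y) is dominated pointwise by the standard normal density, one can draw from ϕ by rejection with a N(0, 1) proposal. The sketch below is our own illustration (not code from the dissertation), using the parameters of Table 4.1:

    # Minorization constant delta = 2*Psi(-rho*eps) and a rejection
    # sampler for the regeneration distribution phi = (1/delta)*varphi.
    import numpy as np
    from scipy.stats import norm

    rho, eps = 0.5, 0.25
    delta = 2 * norm.cdf(-rho * eps)    # = 2*Psi(-0.125) ~ 0.9005

    def sample_phi(rng):
        # varphi(y) <= standard normal density n(y) pointwise, so propose
        # y ~ N(0,1) and accept with probability varphi(y)/n(y); the
        # overall acceptance rate equals delta.
        while True:
            y = rng.standard_normal()
            shift = rho * eps if y >= 0 else -rho * eps
            if rng.random() < np.exp(-((y + shift)**2 - y**2) / 2):
                return y

    rng = np.random.default_rng(0)
    draws = [sample_phi(rng) for _ in range(10000)]
    print(delta, np.mean(draws))        # mean ~ 0, since phi is symmetric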

For the numerical experiment, the parameters are set as in Table 4.1. The numerical results are shown in Table 4.2 and Table 4.3, where "Log Ratio" means log(Var(Z(n, m)|Xm)) / log(p(n)²); a Log Ratio close to 1 indicates that the conditional variance decays at nearly the same exponential rate as p(n)², i.e. near-logarithmic efficiency.


       True    Estimated
ϑ      0.25    0.2951
ψ(ϑ)   0.125   0.1567
I(c)   0.125   0.1384

Table 4.2: True vs estimated tilting parameters for the AR(1) model. The number of cycles m = 40000.

 n     p(n)       Z(n, m, r)   Var(Z(n, m)|Xm)   Log Ratio
 20    0.0082     0.0116       0.0142            0.443
 40    5.32E-04   4.62E-04     1.59E-04          0.580
 60    3.72E-05   2.94E-05     6.07E-07          0.702
 80    2.70E-06   2.00E-06     2.59E-09          0.771
 100   2.01E-07   1.19E-07     9.58E-12          0.823

Table 4.3: Results for computing rare-event probabilities for the AR(1) model. The number of cycles m = 40000, the number of bootstrap samples r = 20000.

4.5.2 Random Walk

We now apply Algorithm 1 to the random walk with light-tailed increments, which is a special case of a Markov chain. In particular, consider

Φn = Φn−1 + Zn, n ≥ 1,

where the Zn's are iid with standard normal distribution. In this setting, Φ regenerates at each individual step, and we can simplify Algorithm 1 as follows.

Algorithm 2. (i) Generate m iid copies of Z, call them (Z1, . . . , Zm) ≜ Xm.

(ii) Compute

ψm(θ) = log( (1/m) Σ_{i=1}^{m} e^{θZi} ).

(iii) Compute ϑm via ψ′m(ϑm) = c.

(iv) Sample Zi from Xm with probability

(1/m) e^{ϑmZi − ψm(ϑm)},


and call it Z̃1.

(v) Continue sampling independently to obtain Z̃1, . . . , Z̃n.

(vi) Put S̃n(m) = Z̃1 + · · · + Z̃n and

Z(n, m) = I(S̃n(m) > cn) exp(−ϑm S̃n(m) + ψm(ϑm) n).

(vii) Repeat the above three steps r independent times, yielding Z1(n, m), . . . , Zr(n, m), and return

Z(n, m, r) = (1/r) Σ_{i=1}^{r} Zi(n, m).

Direct calculation yields that

ψ(θ) ≜ log E e^{θZ1} = θ²/2.

Setting ψ′(ϑ) = c gives ϑ = c. Hence, the exponential rate function is

I(c) = ϑc − ψ(ϑ) = c²/2.
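Algorithm 2 is simple enough to prototype in a few lines. The following Python sketch is our own rendering (not the code behind Tables 4.4 and 4.5); in particular, the root-finding bracket passed to brentq when solving ψ′m(ϑm) = c is an assumption that suits standard normal increments:

    # Algorithm 2 for the standard normal random walk: empirical
    # exponential tilting based on m iid increments.
    import numpy as np
    from scipy.optimize import brentq

    def algorithm2(n, m=40000, r=10000, c=0.5, seed=0):
        rng = np.random.default_rng(seed)
        Z = rng.standard_normal(m)                          # step (i)
        psi_m = lambda t: np.log(np.mean(np.exp(t * Z)))    # step (ii)
        dpsi_m = lambda t: (np.mean(Z * np.exp(t * Z))
                            / np.mean(np.exp(t * Z)))       # psi_m'
        theta_m = brentq(lambda t: dpsi_m(t) - c, 1e-8, 10.0)  # step (iii)
        w = np.exp(theta_m * Z - psi_m(theta_m)) / m        # tilted weights
        Zt = rng.choice(Z, size=(r, n), p=w)                # steps (iv)-(v)
        S = Zt.sum(axis=1)                                  # step (vi)
        vals = (S > c * n) * np.exp(-theta_m * S + psi_m(theta_m) * n)
        return vals.mean()                                  # step (vii)

    print(algorithm2(20))   # should be close to p(20) = 0.0569 (Table 4.5)

Here rng.choice carries out the tilted resampling of steps (iv)-(v) for all r replications at once; the tilted weights sum to one by the definition of ψm.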


The numerical results are shown in Table 4.4 and Table 4.5.

       True    Estimated
ϑ      0.5     0.4992
ψ(ϑ)   0.125   0.1258
I(c)   0.125   0.1238

Table 4.4: True vs estimated tilting parameters for the random walk. c = 0.5. The number of cycles m = 40000.

 n     p(n)       Z(n, m, r)   Var(Z(n, m)|Xm)   Log Ratio
 20    0.0569     0.0576       0.0064            0.881
 40    7.83E-04   8.37E-04     2.44E-06          0.903
 60    5.38E-05   5.93E-05     1.47E-08          0.917
 80    3.87E-06   4.41E-06     9.57E-11          0.926
 100   2.87E-07   3.21E-07     5.88E-13          0.935

Table 4.5: Results for computing rare-event probabilities for the random walk. c = 0.5. The number of cycles m = 40000, the number of bootstrap samples r = 10000.


Appendix A

Stochastic Stability of Markov Processes

The purpose of this appendix is to give an overview of a unified approach to studying the stochastic stability (transience, recurrence, ergodicity, etc.) of continuous-time Markov processes via the so-called Foster-Lyapunov criteria. Foster-Lyapunov or "drift" (inequality) criteria are among the most widely used sufficient conditions for the stability classification of Markov models. An excellent reference on this topic is [74], which provides a systematic treatment in the context of discrete time Markov chains. On the other hand, [71], [72], and [92] discuss the corresponding materials for continuous time Markov processes. Other related references include [94], [95], and [20]; see also [73] for a brief survey on the same subject.

To fix ideas, let Φ = (Φn : n ≥ 0) be a (time-homogeneous) discrete time Markov chain with state space S and transition probability P(x, A) = P(Φn ∈ A | Φn−1 = x) for A ⊆ S. Suppose that V is a non-negative function on S. Then the drift of V(Φn) at x ∈ S is defined by

ΔV(x) ≜ ∫_S P(x, dy) V(y) − V(x).


An example of the Foster-Lyapunov criterion is

ΔV(x) ≤ −1 + b IC(x)   (A.1)

for some b > 0 and some set C (typically, compact). Under certain other mild technical conditions, (A.1) will imply that Φ is positive recurrent.
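For instance (an illustration of ours, not taken from the references above), consider the AR(1) chain Φn = ρΦn−1 + Zn of Section 4.5.1 with V(x) = x². Then

ΔV(x) = E(ρx + Z1)² − x² = 1 − (1 − ρ²)x²,

so (A.1) holds with b = 2 and the compact set C = {x : (1 − ρ²)x² ≤ 2}: off C the drift is strictly less than −1, while on C it is at most 1 = −1 + b.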

It is not hard to imagine that drift conditions similar to (A.1) in the discrete time context have analogues for continuous time Markov processes. Evidently, an analogue of the drift operator Δ is the "generator" (whose definition will be given later). From now on, we will focus on the continuous time setting.

A.1 Extended Generator

Let Φ = (Φ(t) : t ≥ 0) be a (time-homogeneous) continuous time Markov process living on the state space (S, B(S)), where B(S) is the Borel σ-field on S. We assume that S is a locally compact and separable metric space. The Foster-Lyapunov framework works for general state spaces, but in practice S is typically a subset of Euclidean space. Φ evolves on the probability space (Ω, F, P). We denote by Px the probability measure conditioned on Φ(0) = x and by Ex the associated expectation operator. We assume that Φ is a nonexplosive (Borel) right process (see [87] for the definition), so that Φ is strongly Markovian and has right continuous sample paths.

For the remainder of the appendix, we fix the following two pieces of notation. For a measurable set A ∈ B(S), let τA = inf{t ≥ 0 : Φ(t) ∈ A}. Let {Ok : k ≥ 1} denote a family of open sets in S for which the closure of Ok is compact and Ok ↑ S as k → ∞.

Definition A.1. Φ is called ϕ-irreducible if there exists a σ-finite measure ϕ such that for all x ∈ S,

ϕ(A) > 0 implies Px(τA < ∞) > 0.

There are several different versions of the definition of a generator of a Markov process, and we will adopt the one from [23].


Definition A.2. We denote by Dom(Ã) the set of all functions f : S → R for which there exists a measurable function g : S → R such that

M^f(t) ≜ f(Φ(t)) − ∫_0^t g(Φ(s)) ds   (A.2)

is a local martingale adapted to Φ with respect to Px for each x ∈ S. We write Ãf = g and call Ã the extended generator of Φ; Dom(Ã) is called the domain of Ã.

The local martingale property indicates that there exists a sequence of stopping times (σk : k ≥ 0) with σk → ∞ as k → ∞ (also called a localizing sequence of stopping times) for which (M^f(t ∧ σk) : t ≥ 0) is a uniformly integrable martingale adapted to Φ for any fixed k.
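As a standard illustration (ours, not one taken from [23]): suppose Φ solves the one-dimensional SDE dΦ(t) = b(Φ(t)) dt + σ(Φ(t)) dW(t). Then Itô's formula shows that every f ∈ C²(R) belongs to Dom(Ã) with

Ãf(x) = b(x)f′(x) + (1/2)σ(x)²f′′(x),

since in that case M^f(t) = ∫_0^t f′(Φ(s))σ(Φ(s)) dW(s), a local martingale.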

A.2 Foster-Lyapunov Criteria

The Foster-Lyapunov criteria presented in this section are taken from [73], which provides an overview of the results in [72]. Note that in [72], the authors use a different definition of the extended generator and adopt a "truncation" argument. However, the same authors switched to Definition A.2 in the survey [73]. In fact, the proofs in [72] remain valid if one adopts Definition A.2 and replaces the truncated process by the "localized" process Φ(t ∧ Tk), where (Tk : k ≥ 0) is a localizing sequence of stopping times for the local martingale (A.2).

A.2.1 Recurrence and Ergodicity

Definition A.3. Φ is called Harris recurrent if there exists a σ-finite measure ϕ such that for all x ∈ S,

ϕ(A) > 0 implies Px(τA < ∞) = 1.

Note that the above definition of Harris recurrence is equivalent to the following one in the discrete time setting. In particular, a Markov chain (Φn : n ≥ 0) is Harris recurrent if there exists a set C ∈ B(S) (called a small set in [74]) for which there exist λ > 0, a probability ϕ on S, and m ≥ 1 such that Px(τC < ∞) = 1 for each x ∈ S and Px(Φm ∈ B) ≥ λϕ(B) for x ∈ C and B ∈ B(S).

We will need the concept of a petite set, which generalizes that of a small set, in order to establish the Foster-Lyapunov criteria for stability.

Definition A.4. A non-empty set C ∈ B(S) is called ϕa-petite if ϕa is a non-trivial measure on B(S) and a is a probability on (0, ∞) which satisfy the bound

Ka(x, ·) ≜ ∫_0^∞ P(t, x, ·) a(dt) ≥ ϕa(·), ∀x ∈ C,

where P(t, x, ·) = Px(Φ(t) ∈ ·).

Definition A.5. A function f : S → R is lower semicontinuous if

lim inf_{y→x} f(y) ≥ f(x), x ∈ S.

Definition A.6. If Px(Φ(t) ∈ O) is a lower semicontinuous function of x for any open set O ∈ B(S), then Φ is called a (weak) Feller process.

Remark A.1. Φ is Feller if and only if Ex g(Φ(t)) is a bounded continuous function in x ∈ S for all t > 0 whenever g is bounded and continuous. See, for example, Proposition 6.1.1 of [74].

It is well known that the Harris recurrence of Φ implies the existence of an essentially unique invariant measure π. If the invariant measure is finite, then it can be normalized to a probability measure, in which case Φ is called positive Harris recurrent.

Definition A.7. For any positive measurable function f ≥ 1 and any signed measure η on S, define the f-norm ‖η‖f by

‖η‖f = sup_{|g|≤f} |η(g)|,

where η(g) = ∫_S g(x) η(dx). When f ≡ 1, ‖·‖f is called the total variation norm and is denoted by ‖·‖TV.

Definition A.8. Φ is called ergodic if the stationary distribution π exists and

lim_{t→∞} ‖Px(Φ(t) ∈ ·) − π(·)‖TV = 0, x ∈ S.

Recall that for an irreducible discrete time Markov chain, ergodicity is equivalent to positive Harris recurrence. There is also such a connection in the continuous time setting. It is shown in [71] that if any one skeleton chain is irreducible, then ergodicity and positive Harris recurrence are equivalent concepts.

Proposition A.1. Suppose that Φ is a ϕ-irreducible nonexplosive right process. If there exist constants c > 0, d < ∞, a petite set C ∈ B(S), and a function V such that

ÃV(x) ≤ −c + d IC(x), x ∈ S,

then Φ is ergodic.

Proof. See Theorem 7 of [73].

A.2.2 Exponential Ergodicity

Suppose that Φ is positive Harris recurrent with stationary distribution π.

Definition A.9. For a function f ≥ 1, Φ is called f-exponentially ergodic if there exist a constant β ∈ (0, 1) and a finite-valued function B(x) such that

‖Px(Φ(t) ∈ ·) − π(·)‖f ≤ B(x)β^t

for all t > 0 and x ∈ S.

Proposition A.2. Suppose that Φ is a ϕ-irreducible nonexplosive right process. If there exist a petite set C, constants c > 0, d < ∞, and a function V such that

ÃV(x) ≤ −cV(x) + d IC(x), x ∈ S,

then Φ is V-exponentially ergodic and π(V) < ∞.

Proof. See Theorem 7 and Theorem 8 of [73].
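As an illustration of Proposition A.2 (our example, not one from [73]): for the Ornstein-Uhlenbeck process dΦ(t) = −γΦ(t) dt + dW(t) with γ > 0, take V(x) = 1 + x². Then ÃV(x) = −2γx² + 1, and

ÃV(x) ≤ −γV(x) + (1 + γ) IC(x), x ∈ R,

where C = {x : γx² ≤ 1 + γ} is compact (and petite for this process), so Φ is V-exponentially ergodic.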


Bibliography

[1] S. Asmussen and P. W. Glynn. Stochastic Simulation: Algorithms and Analysis. Springer-Verlag, 2007.

[2] P. K. Andersen, Ø. Borgan, R. D. Gill, and N. Keiding. Statistical Models Based on Counting Processes. Springer-Verlag, 1993.

[3] S. Asmussen. Conjugate processes and the simulation of ruin problems. Stoch. Proc. Appl., 20:213–229, 1985.

[4] S. Asmussen. Ruin Probabilities. World Scientific Publishing Co. Ltd., London, 2000.

[5] K. B. Athreya and P. Ney. A new approach to the limit theory of recurrent Markov chains. Trans. Amer. Math. Soc., 245:493–501, 1978.

[6] A. Bassamboo and S. Jain. Efficient importance sampling for reduced form models in credit risk. In Proceedings of the 2006 Winter Simulation Conference, pages 741–749, 2006.

[7] A. Bassamboo, S. Juneja, and A. Zeevi. Portfolio credit risk with extremal dependence: Asymptotic analysis and efficient simulation. Operations Research, 56(3):593–606, 2008.

[8] L. Bauwens and N. Hautsch. Handbook of Financial Time Series, chapter Modelling Financial High Frequency Data Using Point Processes, pages 953–979. Springer Berlin Heidelberg, 2009.

[9] A. Berman and R. J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. SIAM, 1987.

[10] J. Blanchet and P. W. Glynn. Efficient rare event simulation for the maximum of heavy-tailed random variables. Ann. Appl. Probab., 18:1351–1378, 2008.

[11] J. Blanchet, P. W. Glynn, and S. P. Meyn. Large deviations for the empirical mean of an M/M/1 queue. Submitted for publication, 2011.

[12] J. Blanchet, K. Leder, and P. W. Glynn. Strongly efficient algorithms for light-tailed random walks: An old folk song sung to a faster new tune. In P. L'Ecuyer and A. Owen, editors, Monte Carlo and Quasi-Monte Carlo Methods 2008. Springer, 2009.

[13] H. A. P. Blom, G. J. Bakker, and J. Krystul. Rare event estimation for a large-scale stochastic hybrid system with air traffic application. In G. Rubino and B. Tuffin, editors, Rare Event Simulation using Monte Carlo Methods, chapter 9, pages 193–214. Wiley, 2009.

[14] P. Brémaud. Point Processes and Queues: Martingale Dynamics. Springer, 1981.

[15] J. Bucklew. Introduction to Rare Event Simulation. Springer-Verlag, 2004.

[16] H. P. Chan and T. L. Lai. Efficient importance sampling for Monte Carlo evaluation of exceedance probabilities. Ann. Appl. Probab., 17:440–473, 2007.

[17] H. P. Chan and T. L. Lai. A sequential Monte Carlo approach to computing tail probabilities in stochastic models. Ann. Appl. Probab., forthcoming.

[18] J. C. C. Chan and D. P. Kroese. Efficient estimation of large portfolio loss probabilities in t-copula models. European Journal of Operational Research, 205:361–367, 2010.

[19] V. Chavez-Demoulin, A. C. Davison, and A. J. McNeil. Estimating value-at-risk: A point process approach. Quantitative Finance, 5(2):227–234, 2005.

[20] M.-F. Chen. On three classical problems for Markov chains with continuous time parameter. J. Appl. Prob., 28:305–320, 1991.

[21] P. Cheridito, D. Filipović, and M. Yor. Equivalent and absolutely continuous measure changes for jump-diffusion processes. Ann. Appl. Probab., 15:1713–1732, 2005.

[22] J. C. Cox, J. E. Ingersoll, and S. A. Ross. A theory of the term structure of interest rates. Econometrica, 53(2):385–407, 1985.

[23] M. H. A. Davis. Markov Models and Optimization. Chapman and Hall, London, 1993.

[24] A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications. Springer, 2nd edition, 1998.

[25] M. D. Donsker and S. R. S. Varadhan. Asymptotic evaluation of certain Markov process expectations for large time. I, II. Comm. Pure Appl. Math., 28:1–47, 1975.

[26] M. D. Donsker and S. R. S. Varadhan. Asymptotic evaluation of certain Markov process expectations for large time. III. Comm. Pure Appl. Math., 29:389–461, 1976.

[27] A. L. Dontchev and R. T. Rockafellar. Implicit Functions and Solution Mappings: A View from Variational Analysis. Springer, 2009.

[28] D. Duffie, D. Filipović, and W. Schachermayer. Affine processes and applications in finance. Ann. Appl. Probab., 13(3):984–1053, 2003.

[29] D. Duffie and N. Garleanu. Risk and valuation of collateralized debt obligations. Financial Analysts Journal, 57(1):41–59, 2001.

[30] D. Duffie, J. Pan, and K. Singleton. Transform analysis and asset pricing for affine jump-diffusions. Econometrica, 68(6):1343–1376, 2000.

[31] D. Duffie and K. J. Singleton. Credit Risk: Pricing, Measurement, and Management. Princeton University Press, 2003.

[32] K. R. Duffy and S. P. Meyn. Most likely paths to error when estimating the mean of a reflected random walk. Performance Evaluation, 67(12):1290–1303, 2010.

[33] P. Dupuis and H. Wang. Dynamic importance sampling for uniformly recurrent Markov chains. Ann. Appl. Probab., 15:1–38, 2005.

[34] P. Dupuis and H. Wang. Subsolutions of an Isaacs equation and efficient schemes for importance sampling. Math. Oper. Research, 32:723–757, 2007.

[35] P. E. Echeverria. A criterion for invariant measures of Markov processes. Z. Wahrsch. verw. Gebiete, 61:1–16, 1982.

[36] A. Eckner. Computational techniques for basic affine models of portfolio credit risk. Journal of Computational Finance, 15:63–97, 2009.

[37] E. Errais, K. Giesecke, and L. B. Goldberg. Affine point processes and portfolio credit risk. SIAM Journal on Financial Mathematics, 1:642–665, 2010.

[38] S. N. Ethier and T. G. Kurtz. Markov Processes: Characterization and Convergence. Wiley, 1986.

[39] L. C. Evans. Partial Differential Equations. American Mathematical Society, 1998.

[40] D. Filipović, E. Mayerhofer, and P. Schneider. Density approximations for multivariate affine jump-diffusion processes. Working paper, 2011.

[41] K. Giesecke, K. Spiliopoulos, and R. B. Sowers. Default clustering in large portfolios: Typical and atypical events. Submitted, 2011.

[42] K. Giesecke, K. Spiliopoulos, R. B. Sowers, and J. A. Sirignano. Large portfolio asymptotics for losses from default. Submitted, 2011.

[43] K. Giesecke, B. Kim, and S. Zhu. Monte Carlo algorithms for default timing problems. Management Science, forthcoming.

[44] P. Glasserman, W. Kang, and P. Shahabuddin. Large deviations in multifactor portfolio credit risk. Mathematical Finance, 17(3):345–379, 2007.

[45] P. Glasserman, W. Kang, and P. Shahabuddin. Fast simulation of multifactor portfolio credit risk. Operations Research, 56(5):1200–1217, 2008.

[46] P. Glasserman and K.-K. Kim. Saddlepoint approximation for affine jump-diffusion models. Journal of Economic Dynamics and Control, 33:37–52, 2009.

[47] P. Glasserman and K.-K. Kim. Moment explosions and stationary distributions in affine diffusion models. Mathematical Finance, 20(1):1–33, 2010.

[48] P. Glasserman and J. Li. Importance sampling for portfolio credit risk. Management Science, 51(11):1643–1656, 2005.

[49] P. W. Glynn. Wide-sense regeneration for Harris recurrent Markov processes: An open problem. Queueing Systems: Theory and Applications, (3), forthcoming.

[50] P. W. Glynn and A. Zeevi. Bounding stationary expectations of Markov processes. In S. Ethier, J. Feng, and R. Stockbridge, editors, Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz. IMS, 2008.

[51] A. G. Hawkes. Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58:83–90, 1971.

[52] A. G. Hawkes and D. Oakes. A cluster process representation of a self-exciting process. Journal of Applied Probability, 11:493–503, 1974.

[53] P. Heidelberger. Fast simulation of rare events in queueing and reliability models. ACM Transactions on Modeling and Computer Simulation, 5(1):43–85, 1995.

[54] T. S. Y. Ho and S. B. Lee. Term structure movements and pricing interest rate contingent claims. Journal of Finance, 41(5):1011–1029, 1986.

[55] R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1994.

[56] J. Jacod and A. Shiryaev. Limit Theorems for Stochastic Processes. Springer, 2nd edition, 2002.

[57] N. L. Johnson, S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions, Vol. 2. Wiley-Interscience, 2nd edition, 1995.

[58] M. S. Joshi. Applying importance sampling to pricing single tranches of CDOs in a one-factor Li model. Technical report, QUARC, Group Risk Management, Royal Bank of Scotland, 2004.

[59] S. Juneja and P. Shahabuddin. Simulating heavy tailed processes using delayed hazard rate twisting. ACM Transactions on Modeling and Computer Simulation, 12(2):94–118, 2002.

[60] S. Juneja and P. Shahabuddin. Rare event simulation techniques: An introduction and recent advances. In S. G. Henderson and B. L. Nelson, editors, Handbooks in Operations Research and Management Science, volume 13, chapter 11. Elsevier, Amsterdam, The Netherlands, 2006.

[61] M. Kalkbrener, H. Lotter, and L. Overbeck. Sensible and efficient capital allocation for credit portfolios. Risk, 17(1):19–24, 2004.

[62] I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus. Springer, 2nd edition, 1991.

[63] K.-K. Kim. Stability analysis of Riccati differential equations related to affine diffusion processes. Journal of Mathematical Analysis and Applications, 364:18–31, 2010.

[64] I. Kontoyiannis and S. P. Meyn. Spectral theory and limit theorems for geometrically ergodic Markov processes. Ann. Appl. Probab., 13(1):304–362, 2003.

[65] I. Kontoyiannis and S. P. Meyn. Large deviations asymptotics and the spectral theory of multiplicatively regular Markov processes. Electron. J. Probab., 10(3):61–123, 2005.

[66] T. Kuczek and K. N. Crank. A large-deviation result for regenerative processes. J. Theo. Prob., 4(3):551–561, 1991.

[67] D. Lépingle and J. Mémin. Sur l'intégrabilité uniforme des martingales exponentielles. Z. Wahrsch. Verw. Gebiete, 42(3):175–203, 1978.

[68] H. Masuda. On multidimensional Ornstein-Uhlenbeck processes driven by a general Lévy process. Bernoulli, 10:97–120, 2004.

[69] H. Masuda. Ergodicity and exponential β-mixing bounds for multidimensional diffusions with jumps. Stochastic Process. Appl., 117:35–56, 2007.

[70] R. Merton. On the pricing of corporate debt: The risk structure of interest rates. Journal of Finance, 29:449–470, 1974.

[71] S. P. Meyn and R. L. Tweedie. Stability of Markovian processes II: Continuous-time processes and sampled chains. Adv. Appl. Prob., 25:487–517, 1993.

[72] S. P. Meyn and R. L. Tweedie. Stability of Markovian processes III: Foster-Lyapunov criteria for continuous-time processes. Adv. Appl. Prob., 25:518–548, 1993.

[73] S. P. Meyn and R. L. Tweedie. A survey of Foster-Lyapunov techniques for general state space Markov processes. In Proceedings of the Workshop on Stochastic Stability and Stochastic Stabilization, Metz, France, June 1993.

[74] S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press, 2nd edition, 2009.

[75] P. Ney and E. Nummelin. Markov additive processes I. Eigenvalue properties and limit theorems. Ann. Probab., 15(2):561–592, 1987.

[76] P. Ney and E. Nummelin. Markov additive processes II. Large deviations. Ann. Probab., 15(2):593–609, 1987.

[77] E. Nummelin. A splitting technique for Harris recurrent chains. Z. Wahrscheinlichkeitstheorie und Verw. Geb., 43:309–318, 1978.

[78] E. Nummelin. General Irreducible Markov Chains and Nonnegative Operators. Cambridge University Press, 1984.

[79] Y. Ogata. Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association, 83(401):9–27, 1988.

[80] Y. Ogata. Space-time point-process models for earthquake occurrences. Annals of the Institute of Statistical Mathematics, 50(2):378–402, 1998.

[81] B. Øksendal. Stochastic Differential Equations: An Introduction with Applications. Springer-Verlag, 2000.

[82] E. Papageorgiou and R. Sircar. Multiscale intensity models and name grouping for valuation of multi-name credit derivatives. Applied Mathematical Finance, 15(1):73–105, 2007.

[83] P. E. Protter. Stochastic Integration and Differential Equations. Springer, 2nd edition, 2003.

[84] G. Rubino and B. Tuffin, editors. Rare Event Simulation using Monte Carlo Methods. John Wiley & Sons, 2009.

[85] J. S. Sadowsky and J. A. Bucklew. On large deviations theory and asymptotically efficient Monte Carlo estimation. IEEE Trans. Inform. Theory, 36:579–588, 1990.

[86] K. Sato and M. Yamazato. Operator-self-decomposable distributions as limit distributions of processes of Ornstein-Uhlenbeck type. Stochastic Process. Appl., 17:73–100, 1984.

[87] M. Sharpe. General Theory of Markov Processes. Academic Press, 1988.

[88] Y. Shimizu. M-estimation for discretely observed ergodic diffusion processes with infinite jumps. Stat. Inference Stoch. Process., 9:179–225, 2004.

[89] Y. Shimizu and N. Yoshida. Estimation of parameters for diffusion processes with jumps from discrete observations. Stat. Inference Stoch. Process., 9:227–277, 2004.

[90] A. V. Skorohod. Asymptotic Methods in the Theory of Stochastic Differential Equations. American Mathematical Society, 1989.

[91] A. Stomakhin, M. B. Short, and A. L. Bertozzi. Reconstruction of missing data in social networks based on temporal patterns of interactions. Submitted, 2011.

[92] O. Stramer and R. L. Tweedie. Stability and instability of continuous time Markov processes. In F. P. Kelly, editor, Probability, Statistics and Optimization: A Tribute to Peter Whittle, pages 173–184. John Wiley & Sons, 1994.

[93] R. Szechtman and P. W. Glynn. Rare-event simulation for infinite server queues. In E. Yücesan, C.-H. Chen, J. L. Snowdon, and J. M. Charnes, editors, Proceedings of the 2002 Winter Simulation Conference, pages 416–423, 2002.

[94] R. L. Tweedie. Sufficient conditions for regularity, recurrence and ergodicity of Markov processes. Math. Proc. Camb. Phil. Soc., 78:125–136, 1975.

[95] R. L. Tweedie. Criteria for ergodicity, exponential ergodicity and strong ergodicity of Markov processes. J. Appl. Prob., 18:122–130, 1981.

[96] O. Vasicek. An equilibrium characterization of the term structure. Journal of Financial Economics, 5:177–188, 1977.

[97] A. Wald. On cumulative sums of random variables. The Annals of Mathematical Statistics, 15(3):283–296, 1944.

[98] B. Wong and C. C. Heyde. On the martingale property of stochastic exponentials. J. Appl. Prob., 41:654–664, 2004.

[99] F. Xi. Asymptotic properties of jump-diffusion processes with state-dependent switching. Stochastic Process. Appl., 119(7):2198–2221, 2009.

[100] X. Zhang, P. W. Glynn, K. Giesecke, and J. Blanchet. Rare event simulation for a generalized Hawkes process. In M. D. Rossetti, R. R. Hill, B. Johansson, A. Dunkin, and R. G. Ingalls, editors, Proceedings of the 2009 Winter Simulation Conference, pages 1291–1298, 2009.
