
COMPUTING RARE-EVENT PROBABILITIES FOR AFFINE
MODELS AND GENERAL STATE SPACE MARKOV PROCESSES

A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF MANAGEMENT
SCIENCE AND ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

Xiaowei Zhang

August 2011


© 2011 by Xiaowei Zhang. All Rights Reserved.
Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/ny328vh8662


I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Peter Glynn, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Nicholas Bambos

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Tze Lai

Approved for the Stanford University Committee on Graduate Studies.

Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.


Abstract

Rare-event simulation concerns computing small probabilities, i.e., rare-event probabilities. This dissertation investigates efficient simulation algorithms based on importance sampling for computing rare-event probabilities for different models, and establishes their efficiency via asymptotic analysis.

The first part discusses the asymptotic behavior of affine models. The stochastic stability of affine jump diffusions is carefully studied. In particular, positive recurrence, ergodicity, and exponential ergodicity are established for such processes under various conditions via a Foster-Lyapunov type approach. The stationary distribution is characterized in terms of its characteristic function. Furthermore, the large deviations behavior of affine point processes is explicitly computed, based on which a logarithmically efficient importance sampling algorithm is proposed for computing rare-event probabilities for affine point processes.

The second part is devoted to a much more general setting, namely general state space Markov processes. The current state-of-the-art algorithm for computing rare-event probabilities in this context relies heavily on the solution of a certain eigenvalue problem, which is often unavailable in closed form unless special structure is present (e.g., the affine structure of affine models). To circumvent this difficulty, assuming the existence of a regenerative structure, we propose a bootstrap-based algorithm that conducts the importance sampling on the regenerative cycle-path space instead of on the original one-step transition kernel. The efficiency of this algorithm is also discussed.


Acknowledgements

Life is random, and yet there do exist moments that deflect its path. Being admitted to Stanford is definitely one such moment of mine. The past five years have turned out to be the most amazing time of my life. I am fortunate and grateful to have been showered with so much support and encouragement, which makes it a virtually impossible task to express my appreciation to every single one of those who have helped me.

First and foremost, I am deeply indebted to my advisor, Professor Peter Glynn. I met Peter for the first time while he was on an academic visit to Nankai University in China. I was a senior student, wondering what major to study in graduate school. I had not even considered applying to Stanford because I thought I had merely a remote chance. But Peter encouraged me to follow my heart, and things unfolded surprisingly well. During my PhD study, Peter has been a constant source of inspiration and I have benefited from him in numerous ways. As a doctoral supervisor, Peter's broad scope of knowledge, penetrating insight, and intuitive explanation of complicated theory have never ceased to amaze me. Because of him, my way of thinking about research has become more rigorous and my horizon has expanded beyond what I ever imagined. His support is by no means limited to the academic level. As a mentor and friend, Peter makes me comfortable sharing my personal experiences, both excitements and frustrations. He is a role model that I look up to and wish to become, professionally and personally.

I would like to thank Professor Kay Giesecke for introducing me to the research area of credit risk. His deep insight into financial modeling makes him a great collaborator. It was an enjoyable process to write a paper with him. I would also like to thank Professor David Siegmund for the classes on probability and stochastic processes he taught, as well as the suggestions he gave for my research questions. I am obliged to Professor Jose Blanchet at Columbia University and Professor Darrell Duffie, along with the three preceding professors, for writing recommendation letters during my academic job search. Moreover, I would like to thank Professor Nick Bambos and Professor Tze Leung Lai for serving on my reading committee, and Professor George Papanicolaou for chairing my oral defense. My gratitude extends to all the faculty and staff in the Department of Management Science and Engineering for their help and advice over the years.

My friends have played an indispensable part in my life at Stanford. They have made my life at Stanford full of joy and excitement. Among them are Simla Ceyhan, Anwei Chai, Shi Chen, Yichuan Ding, Dongdong Ge, Yihan Guan, Juegang Hu, Chuan Huang, Krishnamurthy Iyer, Xiaoye Jiang, Tim Kraft, Lei Liu, Shan Liu, Jing Ma, Zongming Ma, Ehsan Mousavi, Chen Peng, Qi Qi, Waraporn Tongprasit, Xi Wang, Zizhuo Wang, Wei Wu, Yu Wu, Li Xu, Yuan Yao, Hongsong Yuan, Yan Zhai, Kaiyuan Zhang, Feng Zhang, Lingren Zhang, Qinqin Zhang, Wugang Zhao, Yanchong Zheng, and Zhen Zhu, among many others. In addition, I want to give my special thanks to Su Chen, my best friend at Stanford and my Best Man.

At last, it comes to my family. My parents, Jinshan Zhang and Xiuhuan Bi, have always been supportive in every way. They have shown and are still showing me how to look at the world, optimistically and passionately. My wife, Biyun Pan, is my greatest achievement. I could never say thanks too much for the love and support she gives me. This dissertation is dedicated to my dear family.


Contents

Abstract

Acknowledgements

1 Introduction

2 Affine Jump Diffusions: Stochastic Stability
   2.1 Introduction
   2.2 Affine Jump Diffusions
   2.3 Stochastic Stability
       2.3.1 Foster-Lyapunov Inequalities
       2.3.2 Ergodicity
       2.3.3 Central Limit Theorem
   2.4 Characterization of Stationary Distribution

3 Affine Point Processes: Large Deviations
   3.1 Introduction
   3.2 Affine Point Processes
   3.3 Central Limit Theorem
   3.4 Large Deviations
       3.4.1 A Class of Exponential Martingales
       3.4.2 Limiting Cumulant Generating Function
       3.4.3 Steepness
   3.5 Importance Sampling
   3.6 Application to Portfolio Credit Risk

4 Computing Large Deviations for GSSMPs
   4.1 Introduction
   4.2 Markov-Dependent Sums
   4.3 Empirical Moment Generating Function
   4.4 A Bootstrap Based Simulation Algorithm
   4.5 Numerical Experiments
       4.5.1 Autoregressive Model
       4.5.2 Random Walk

A Stochastic Stability of Markov Processes
   A.1 Extended Generator
   A.2 Foster-Lyapunov Criteria
       A.2.1 Recurrence and Ergodicity
       A.2.2 Exponential Ergodicity

Bibliography


List of Tables

3.1 Theoretical vs. estimated mean/variance
3.2 Parameter specification for computing rare-event probabilities for APPs
3.3 Results of the numerical experiment for testing the logarithmic efficiency of the IS estimator
4.1 Parameter specification for computing rare-event probabilities for the AR(1) model
4.2 True vs. estimated tilting parameters for the AR(1) model; number of cycles m = 40000
4.3 Results for computing rare-event probabilities for the AR(1) model; number of cycles m = 40000, number of bootstrap samples r = 20000
4.4 True vs. estimated tilting parameters for the random walk; c = 0.5, number of cycles m = 40000
4.5 Results for computing rare-event probabilities for the random walk; c = 0.5, number of cycles m = 40000, number of bootstrap samples r = 10000


List of Figures

3.1 Simulated sample path of X(t) and L(t)
3.2 Histogram vs. fitted probability density; left: L(T), right: N(T)
3.3 Convergence of the log ratio as the probability tends to 0, showing the logarithmic efficiency of the IS estimator
x


Chapter 1

Introduction

This dissertation, as suggested by the title, consists of two parts, namely computing rare-event probabilities via simulation for affine models and for general state space Markov processes. Rare-event simulation is an important research area in stochastic simulation as well as in computational and applied probability. It involves estimating the probabilities of events that occur very infrequently and yet are significant enough to justify their study, i.e., rare events. Rare-event simulation has wide applications in numerous areas. A notable example is packet loss in packet-switched telecommunication networks. In order to reduce the variation of delays in carrying real-time video traffic, the buffers within the switches are designed to have limited size, which yields the possibility of buffer overflow and thereby packet loss. Hence, it is of great importance to estimate the packet loss probability (which could be of order $10^{-9}$) as a performance measure of the network system; see, for example, [53]. Another example is in the area of air transportation, where a specification for civil aircraft is that the probability of failure must be less than, say, $10^{-9}$ during a flight of about 8 hours; see, for example, [13]. Rare-event simulation is also widely applicable in finance. For instance, portfolio managers need to estimate the probability of large portfolio losses for risk management purposes (see, for example, [48] and [7]). In the insurance context, an insurance company is interested in estimating the probability of ruin over a given time horizon, or of eventual ruin, to adjust the premiums against the potential claims (see, for example, [3] and [4]). Some general references on rare-event simulation are [15], [60], [1], and [84].

Consider the problem of estimating the probability $\alpha = P(A)$ of some rare event $A$, i.e., $\alpha$ is small. The crude Monte Carlo (CMC) method is to use the estimator $Z = I_A$, so that the variance is $\sigma^2 = \alpha(1 - \alpha)$. For small $\alpha$, the absolute error $z\sigma n^{-1/2}$ is not particularly informative. What really matters is the relative error, which truly captures the accuracy of the estimation. This leads to the problem of CMC with rare events:
\[
\text{relative error} = \frac{z\sigma n^{-1/2}}{\alpha} = z\sqrt{\frac{1-\alpha}{n\alpha}} \sim \frac{z}{\sqrt{n\alpha}} \to \infty,
\]
as $\alpha \downarrow 0$. Consequently, if one wants to achieve a prescribed relative precision, one needs to increase $n$ in proportion to $\alpha^{-1}$. To illustrate this, let us assume that we target a 10% relative error with a 95% confidence interval. Also, assume that the probability of interest $\alpha$ is of order $10^{-9}$, which occurs in many telecommunication settings. This leads to
\[
\frac{1.96}{\sqrt{10^{-9}\,n}} \le 0.1,
\]
implying $n \ge 3.84 \times 10^{11}$. If the system being simulated is complex, this would be virtually infeasible!
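The sample-size arithmetic above is easy to reproduce; the following minimal Python sketch (added for illustration, not part of the original text) computes the CMC sample size implied by the relative-error bound:

```python
from math import ceil

def cmc_sample_size(alpha: float, rel_err: float, z: float = 1.96) -> int:
    """Smallest n with z * sqrt((1 - alpha) / (n * alpha)) <= rel_err,
    i.e., the CMC sample size needed for a prescribed relative error."""
    return ceil(z**2 * (1 - alpha) / (rel_err**2 * alpha))

# 10% relative error, 95% confidence, alpha of order 1e-9:
print(cmc_sample_size(1e-9, 0.1))  # ~3.84e11, matching the text
```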

To formally discuss such efficiency concepts, let $\{A(x)\}$ be a family of rare events, where $x$ is some index, assumed without loss of generality to satisfy $x \in (0, \infty)$. Assume that $\alpha(x) \triangleq P(A(x)) \to 0$ as $x \to \infty$. For each $x$, let $Z(x)$ be an unbiased estimator of $\alpha(x)$, i.e., $EZ(x) = \alpha(x)$.

The best performance that has been observed in realistic rare-event settings is bounded relative error, or strong efficiency, meaning
\[
\lim_{x\to\infty} \frac{\operatorname{Var}(Z(x))}{\alpha(x)^2} < \infty. \tag{1.1}
\]

If $Z(x)$ has bounded relative error, the number of samples required to achieve a prescribed relative precision is independent of the rarity of $A(x)$. To see this, let us assume that we generate $n$ iid copies $Z_1(x), \ldots, Z_n(x)$ of $Z(x)$ and use
\[
\bar{Z}_n(x) = \frac{1}{n}\sum_{i=1}^n Z_i(x)
\]
as our estimate of $\alpha(x)$. Then, it follows from Chebyshev's inequality that for any $\epsilon > 0$,
\[
P\left(\frac{|\bar{Z}_n(x) - \alpha(x)|}{\alpha(x)} > \epsilon\right) \le \frac{\operatorname{Var}(Z(x))}{n\epsilon^2\,\alpha(x)^2}.
\]
Then, for a given upper bound $\epsilon$ on the relative error, we can guarantee that the relative error is no larger than $\epsilon$ with probability $1 - \delta$ if the number of replications satisfies
\[
n \ge \frac{\operatorname{Var}(Z(x))}{\delta\epsilon^2\,\alpha(x)^2}. \tag{1.2}
\]
Hence, if we choose and fix
\[
n \ge \frac{1}{\delta\epsilon^2}\,\lim_{x\to\infty} \frac{\operatorname{Var}(Z(x))}{\alpha(x)^2},
\]
we are all set, regardless of how small $\alpha(x)$ is.

An efficiency concept slightly weaker than (1.1) is logarithmic efficiency, meaning
\[
\lim_{x\to\infty} \frac{\log \operatorname{Var}(Z(x))}{\log \alpha(x)^2} \ge 1. \tag{1.3}
\]

Remark 1.1. The conditions (1.1) and (1.3) are typically verified by replacing $\operatorname{Var}(Z(x))$ with $EZ(x)^2$.

In practice, it is more common to work with logarithmic efficiency rather than strong efficiency. The reasons include: (i) logarithmic efficiency is often easier to verify (typically by applying large deviations theory); and (ii) the difference between their performance in practice is minor.
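As a toy illustration of these notions (this example is not from the dissertation; it uses the classical Gaussian tail event $A(x) = \{Z > x\}$ and an exponentially tilted proposal as stand-in assumptions), the following Python sketch shows an importance sampling estimator whose relative error stays moderate as the event becomes rarer:

```python
import numpy as np

rng = np.random.default_rng(0)

def is_tail_estimate(x: float, n: int = 100_000):
    """Estimate alpha(x) = P(Z > x) for Z ~ N(0,1) by sampling from the
    exponentially tilted proposal N(x,1) and reweighting by the likelihood
    ratio dP/dQ(y) = exp(-x*y + x**2/2)."""
    y = rng.normal(loc=x, size=n)
    w = np.where(y > x, np.exp(-x * y + 0.5 * x * x), 0.0)
    est = w.mean()
    rel_err = w.std() / (est * np.sqrt(n))   # estimated relative error of the mean
    return est, rel_err

for x in (3.0, 5.0, 6.0):
    est, re = is_tail_estimate(x)
    print(f"x = {x}: alpha ~ {est:.3e}, relative error ~ {re:.2%}")
```

As $x$ grows and $\alpha(x)$ decays rapidly, the estimated relative error remains stable, in sharp contrast to CMC, whose relative error grows like $1/\sqrt{n\alpha(x)}$.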

The first portion of this dissertation discusses asymptotic analysis and rare-event simulation for affine models in the context of portfolio credit risk. Risk management is particularly concerned with the event of large portfolio losses, which is typically rare but significant. The most recent financial crisis, like those preceding it, indicates that the default of one major financial entity in the market can trigger a chain effect that spreads the risk exposure through the entire financial network, so that more defaults tend to occur. This "feedback" feature of defaults is known as "self-excitation" or "clustering" in portfolio credit risk. Recently, there has been extensive research based on modeling portfolio credit risk with affine point processes. Belonging to the family of affine models, the affine point process is an intensity-based (or so-called "reduced-form") model, which captures the self-excitation feature (see [37]). Affine models are widely used in various areas of finance and econometrics; their broad applicability is due to their computational tractability as well as their flexibility in calibrating parameters. Given the wide popularity of copula-based models before the 2008 financial crisis, it is not surprising that most of the research on efficient simulation for portfolio credit risk has been conducted in the context of copulas. By contrast, work on efficient simulation for intensity-based models is far scarcer; see [6] for doubly stochastic processes, and the more recent work in [41], [42] for affine point processes. The three preceding papers work in the "large portfolio" asymptotic regime; we will instead treat the "large time horizon" asymptotic regime in this dissertation.

The vehicle that drives our efficient simulation algorithm is large deviations analysis, which describes the atypical behavior of the system. An affine point process is, to a large extent, similar to a Markov-dependent sum, whose large deviations behavior has been extensively studied; see [75], [76], [64], [65]. As indicated in [32] and [11], a Markov-dependent sum with an unbounded functional can exhibit unconventional large deviations asymptotics, i.e., subexponential asymptotics. Nevertheless, even though the underlying Markov process for an affine point process is unbounded, our analysis shows that its large deviations behavior still exhibits exponential asymptotics. The above discussion of large deviations and efficient simulation for affine point processes is carried out in Chapter 3.

It turns out that in order to quantify the rare-event region and to prove the large deviations result for an affine point process, we need to establish its typical behavior first. Due to the close connection between affine point processes and affine jump diffusions (as is evident from their definitions), the typical behavior (characterized in terms of a central limit convergence) of the former heavily depends on that of the latter, especially the existence of an equilibrium. This induces the study of the stochastic stability of affine jump diffusions. Since the stochastic stability, in particular the ergodicity, of a jump diffusion process is of interest in its own right (for instance, in its applications to parameter estimation based on long-term asymptotics), we provide a careful treatment of this subject in Chapter 2. Given the popularity of affine models in practice, it is surprising that there have been no results in the literature establishing the ergodicity of affine jump diffusions. Existing research typically focuses on jump diffusions whose jump intensity is independent of the state ([86], [68], [69], [99]), and this assumption obviously fails for the self-exciting affine models. Our approach for establishing the ergodicity is via the Foster-Lyapunov criteria. Thanks to the affine structure, we are able to find appropriate test functions that verify the criteria, which is not a trivial task for a generic multidimensional Markov process. A central limit theorem is further proved for affine jump diffusions by virtue of the central limit theorem for local martingales. Finally, the stationary distribution is characterized in terms of its Fourier transform.

For the second portion of this dissertation, we move to a much more general setting, where we consider rare-event simulation for general state space Markov processes. The implementability of the importance sampling for affine point processes in Chapter 3 depends on two things. First, we are able to compute the limiting cumulant generating function of the process, or equivalently, we are able to solve an associated eigenvalue problem; second, we are able to sample/generate paths under the change of measure, which involves the eigenvalue/eigenfunction of the first problem. These two tasks are explicitly solvable for affine point processes due to the affine structure. In particular, the eigenfunction has an exponential affine form, and the change of measure falls within the same family as the original probability measure. The current state-of-the-art algorithms for rare-event simulation for Markov processes also depend on the feasibility of the above two tasks; see [16], [34], and [12]. In the absence of such special structure in the underlying model, how should one proceed without the explicit solution of the associated eigenvalue problem? [17] proposes a sequential importance sampling and resampling algorithm, attempting to address the second problem above. We, by contrast, will try to solve the two problems simultaneously in Chapter 4. Assuming the existence of a regenerative structure, which can be constructed via Nummelin's splitting method in the presence of positive Harris recurrence, we consider importance sampling at the level of the regenerative cycles instead of the step-by-step transition dynamics, so that the eigenfunction can be eliminated from the tilted probability measure, which solves the second task. Moreover, we can approximate the tilting parameter by its empirical counterpart, which solves the first task. This bootstrap-type algorithm is proved to be logarithmically efficient.
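The actual algorithm is developed in Chapter 4. Purely as a rough illustration of the empirical-tilting idea (the Gaussian cycle distribution and the Lundberg-type root equation below are stand-in assumptions, not the dissertation's construction), one can approximate a tilting parameter from simulated cycle sums as follows:

```python
import numpy as np
from scipy.optimize import brentq

def empirical_tilting_root(cycle_sums, lo=1e-6, hi=5.0):
    """Given iid cycle sums S_1,...,S_m with negative mean, return the positive
    root of the empirical cumulant generating function
        psi_m(theta) = log( mean(exp(theta * S_j)) ),
    the empirical analogue of the classical Lundberg equation psi(theta) = 0."""
    s = np.asarray(cycle_sums, dtype=float)
    psi_m = lambda theta: np.log(np.mean(np.exp(theta * s)))
    return brentq(psi_m, lo, hi)

rng = np.random.default_rng(2)
cycles = rng.normal(-1.0, 1.0, size=40_000)  # toy "cycles": psi(theta) = -theta + theta^2/2
print(empirical_tilting_root(cycles))         # close to the true root theta* = 2
```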


Chapter 2

Affine Jump Diffusions: Stochastic Stability

2.1 Introduction

Affine jump-diffusion (AJD) processes represent a large family of continuous-time stochastic models that are widely applied in financial engineering and econometrics. The broad applicability of this family of models is due to its modeling flexibility as well as its computational tractability. The term "affine" derives from the fact that the drift, the variance, and the jump intensity are all affine in the state vector. As shown in [30] and [28], the affine structure implies that the Laplace/Fourier transform of the probability distribution of an AJD is explicitly available up to solving a system of (generalized) Riccati ordinary differential equations (ODEs), which thereby provides great tractability. (Note that for a generic diffusion process, obtaining its transform requires solving a set of partial differential equations, which is much more computationally involved.)

The AJD family includes many broadly used examples, such as the Brownian motion with drift in the Ho-Lee model of [54], the Ornstein-Uhlenbeck (OU) process in the Vasicek model of [96], and the Feller diffusion in the Cox-Ingersoll-Ross (CIR) model of [22].

The stochastic stability of jump diffusions is not only of interest in itself but also has important implications for parameter estimation. Ergodicity typically plays an essential role in establishing laws of large numbers for estimators; see, for example, [88], [89] and references therein. The stability results will also be used in Chapter 3 for computing the large deviations asymptotics for affine point processes.

Nevertheless, despite their wide application in practice, especially in finance and econometrics, it is surprising that there has not been a systematic treatment of the stochastic stability of AJDs in the literature. Although there has been significant research on the stochastic stability of generic jump diffusions ([86], [68], [69], [99] and the references therein), these discussions are limited to the setting where the jump intensity is "state-independent", which clearly fails in our AJD setting. It is shown in [47] and [63] that the equilibrium of affine diffusion processes (without jumps) exists, through an analysis of the stability of the associated system of ODEs. We find it difficult to extend their approach to AJDs because one would then need to analyze a system of ordinary integro-differential equations, which is significantly more challenging.

A key assumption for establishing the stability is the "mean reversion" of the drift (Assumption 3), under which the paths have a tendency to return to a compact set. Moreover, the farther the paths deviate from this compact set, the stronger this tendency becomes. Another key observation is that the only factor that can ruin the effort of the mean reversion is the jump component. We thereby need to control the jumps so that they are not too big in size and do not occur too frequently. This leads to Assumption 5. These two assumptions are critical and will be carried over to the asymptotic analysis of affine point processes in Chapter 3 as well.

The approach we adopt for establishing the stability is the Foster-Lyapunov method (see Appendix A for details). Finding an appropriate Foster-Lyapunov test function in the multidimensional setting is not a trivial task and can be challenging. However, the task can be accomplished for AJDs thanks to the affine structure. In particular, the test functions are chosen to be monotone functions (a power function or a logarithmic function) of a certain norm on the Euclidean space. We not only obtain results on the existence of the equilibrium (in other words, positive Harris recurrence) of AJDs, but also on the convergence rate to the equilibrium. In addition, we derive a central limit theorem (CLT) for AJDs. The tool we use is the CLT for local martingales. Again, due to the affine structure, we can explicitly calculate the equilibrium mean and asymptotic covariance matrix. The above discussions will be made precise in Section 2.3. Finally, we will characterize the equilibrium distribution in terms of a first-order linear partial differential equation (PDE) in Section 2.4, which has a close connection to an ODE system.

2.2 Affine Jump Diffusions

For the rest of this chapter as well as Chapter 3, we will use the following notational conventions. For a vector $v \in \mathbb{R}^n$, $v$ is viewed as a column vector, $v^\intercal$ denotes its transpose, $\|v\|$ denotes its Euclidean norm, and $\operatorname{diag}(v)$ denotes the diagonal matrix whose diagonal elements are $v$. Moreover, we use $0$ to denote a zero matrix and $\operatorname{Id}(i)$ to denote a matrix with all zero entries except that the $i$-th diagonal entry is 1. We write $A \succeq 0$ if $A \in \mathbb{R}^{n\times n}$ is a symmetric positive semidefinite matrix and $A \succ 0$ if $A$ is symmetric positive definite. Finally, for any probability distribution $\eta$ on $S$, $P_\eta(\cdot)$ denotes the distribution conditional on $X(0)$ having distribution $\eta$, and $E_\eta$ is the associated expectation operator. In particular, $P_x(\cdot) = P(\cdot\,|\,X(0) = x)$ and $E_x(\cdot) = E(\cdot\,|\,X(0) = x)$ for a given $x \in S$.

Fix a complete probability space $(\Omega, \mathcal{F}, P)$ and a filtration $\{\mathcal{F}_t : t \ge 0\}$ satisfying the usual conditions (see, for example, [62] for details). Let $X$ be an $n$-dimensional time-homogeneous Markov process with state space $S \subseteq \mathbb{R}^n$, satisfying the following stochastic differential equation (SDE)
\[
dX(t) = \mu(X(t))\,dt + \sigma(X(t))\,dW(t) + \sum_{i=1}^K \int_S z\,N_i(dt, dz), \tag{2.1}
\]
where $W = (W(t) : t \ge 0)$ is a standard $n$-dimensional Brownian motion adapted to $\{\mathcal{F}_t : t \ge 0\}$, $\mu : S \to \mathbb{R}^n$, and $\sigma : S \to \mathbb{R}^{n\times n}$. Moreover, $N_i(dt, dz)$ is a counting random measure on $[0,\infty) \times \mathbb{R}^n$ with compensator $\Lambda_i(X(t-))\,dt\,\phi_i(dz)$, where $\Lambda_i : S \to \mathbb{R}_+$ and $\phi_i$ is a probability measure on $\mathbb{R}^n$, for each $i = 1, \ldots, K$.
Λi : S → R+ <strong>and</strong> ϕi is a probability measure on R n , <strong>for</strong> each i = 1, . . . , K.


Moreover, define
\[
N_i(t) \triangleq \int_0^t \int_S N_i(ds, dz)
\]
for $i = 1, \ldots, K$. Then, $N_i(t)$ is a counting process with intensity $\Lambda_i(X(t-))$.

In the sequel, we will use $Z^i$ to denote an $S$-valued random variable with probability distribution $\phi_i$.

The affine structure is introduced in the following fashion. We assume that $\mu$, $\sigma\sigma^\intercal$, and $\Lambda$ are all affine in the state variable, i.e.,
\[
\begin{aligned}
\mu(x) &= b - \beta x, && b \in \mathbb{R}^n,\ \beta \in \mathbb{R}^{n\times n}, \\
\sigma(x)\sigma(x)^\intercal &= a + \sum_{j=1}^n \alpha^j x_j, && a \in \mathbb{R}^{n\times n},\ \alpha^j \in \mathbb{R}^{n\times n},\ j = 1, \ldots, n, \\
\Lambda_i(x) &= \lambda_i + \sum_{j=1}^n \kappa_{ij} x_j, && \lambda \in \mathbb{R}^K_+,\ \kappa \in \mathbb{R}^{K\times n}_+,\ i = 1, \ldots, K.
\end{aligned}
\]
$X$ is then an AJD.
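For concreteness, a minimal Euler-type simulation of a one-dimensional AJD of this form is sketched below ($n = m = K = 1$, so $\mu(x) = b - \beta x$, $\sigma(x)^2 = a + \alpha x$, and $\Lambda(x) = \lambda + \kappa x$); the parameter values and the exponential jump distribution are arbitrary illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ajd_1d(T=10.0, dt=1e-3, x0=1.0,
                    b=1.0, beta=2.0, a=0.0, alpha=0.04,
                    lam=0.5, kappa=1.0, jump_mean=0.3):
    """Euler scheme for dX = (b - beta*X) dt + sqrt(a + alpha*X) dW + jumps,
    where jumps arrive with state-dependent intensity lam + kappa*X and have
    Exp(jump_mean) sizes. First-order discretization for illustration only."""
    n_steps = int(T / dt)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        xk = max(x[k], 0.0)   # keep the CIR-type coordinate in R_+
        drift = (b - beta * xk) * dt
        diffusion = np.sqrt(a + alpha * xk) * np.sqrt(dt) * rng.normal()
        jump = rng.exponential(jump_mean) if rng.random() < (lam + kappa * xk) * dt else 0.0
        x[k + 1] = xk + drift + diffusion + jump
    return x

path = simulate_ajd_1d()
print(path[-5:])
```

Note that with these illustrative values $\beta - \kappa\,EZ = 2 - 1 \times 0.3 = 1.7 > 0$, so the jump-adjusted drift remains mean-reverting in the sense of the condition discussed in Section 2.1 (Assumption 5).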

Let $I, J \subseteq \{1, \ldots, n\}$ be two index sets. We write $v_I = (v_i : i \in I)^\intercal$ and $M_{I,J} = (M_{ij} : i \in I, j \in J)$ for any vector $v$ and matrix $M$. From now on, we fix the index sets $I$ and $J$ by setting $I = \{1, \ldots, m\}$ and $J = \{m+1, \ldots, n\}$.

Definition 2.1. The parameters $(a, \alpha, b, \beta, \lambda, \kappa)$ are called admissible if

1) $a \succeq 0$ with $a_{I,I} = 0$ (hence $a_{I,J} = 0$ and $a_{J,I} = 0$);

2) $\alpha \triangleq (\alpha^1, \ldots, \alpha^n)$ with $\alpha^i \succeq 0$ and $\alpha^i_{I,I} = \alpha^i_{i,i}\operatorname{Id}(i)$ for $i \in I$; $\alpha^i = 0$ for $i \in J$;

3) $b \in \mathbb{R}^m_+ \times \mathbb{R}^{n-m}$;

4) $\beta_{I,J} = 0$ and $\beta_{I,I}$ has non-positive off-diagonal elements;

5) $\lambda \in \mathbb{R}^K_+$, $\kappa \in \mathbb{R}^{K\times n}_+$ with $\kappa_{i,j} = 0$ for $i = 1, \ldots, K$ and $j \in J$.
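These conditions are mechanical to verify for concrete parameters. The following Python sketch (an illustration with hypothetical inputs and numerical tolerances, added here) checks them using 0-based indices for $I = \{1, \ldots, m\}$ and $J = \{m+1, \ldots, n\}$:

```python
import numpy as np

def is_admissible(a, alpha, b, beta, lam, kappa, m):
    """Check the admissibility conditions of Definition 2.1.
    alpha is a list of n matrices alpha[i]; indices 0..m-1 play the role of I."""
    n = beta.shape[0]

    def psd(M):
        return np.all(np.linalg.eigvalsh((M + M.T) / 2) > -1e-12)

    ok = psd(a) and np.allclose(a[:m, :], 0) and np.allclose(a[:, :m], 0)  # 1)
    for i in range(n):                                                     # 2)
        if i < m:
            ok &= psd(alpha[i])
            block = alpha[i][:m, :m].copy()
            block[i, i] = 0.0          # only the (i,i) entry may be nonzero in I x I
            ok &= np.allclose(block, 0)
        else:
            ok &= np.allclose(alpha[i], 0)
    ok &= np.all(b[:m] >= 0)                                               # 3)
    ok &= np.allclose(beta[:m, m:], 0)                                     # 4)
    off = beta[:m, :m] - np.diag(np.diag(beta[:m, :m]))
    ok &= np.all(off <= 0)
    ok &= np.all(lam >= 0) and np.all(kappa >= 0) and np.allclose(kappa[:, m:], 0)  # 5)
    return bool(ok)

# Hypothetical 2-d example with m = 1 (one CIR-type and one OU-type coordinate):
a = np.diag([0.0, 1.0]); alpha = [np.diag([0.04, 0.0]), np.zeros((2, 2))]
b = np.array([1.0, 0.0]); beta = np.array([[2.0, 0.0], [-1.0, 1.0]])
lam = np.array([0.5]); kappa = np.array([[1.0, 0.0]])
print(is_admissible(a, alpha, b, beta, lam, kappa, m=1))  # True
```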

Here are several basic assumptions that we will use in this chapter.

Assumption 1. The parameters $(a, \alpha, b, \beta, \lambda, \kappa)$ in the SDE (2.1) are admissible.

Assumption 2. Either $E\|Z^i\| < \infty$ for all $i = 1, \ldots, K$, or $\kappa = 0$, where $Z^i$ has distribution $\phi_i$.

Lemma 2.1. Under Assumption 1 and Assumption 2, the SDE (2.1) has a unique weak solution on $S = \mathbb{R}^m_+ \times \mathbb{R}^{n-m}$. The solution process is càdlàg (right continuous with left limits), nonexplosive, and has the Feller property.

Proof. See Theorem 2.7 and Lemma 9.2 of [28].

Remark 2.1. As indicated in [28], the state space $S = \mathbb{R}^m_+ \times \mathbb{R}^{n-m}$ is called canonical. Moreover, the first $m$ coordinates are of CIR type while the others are of OU type. The volatility function $\sigma(\cdot)$ and the intensity functions $\Lambda_i(\cdot)$, $i = 1, \ldots, K$, depend only on the CIR type coordinates.

Lemma 2.1 asserts that there exists a process $X'(t)$ adapted to a filtration $\mathcal{F}'_t$, satisfying
\[
dX'(t) = (b - \beta X'(t))\,dt + \sigma(X'(t))\,dW'(t) + \sum_{i=1}^K \int_S z\,N'_i(dt, dz)
\]
for a Brownian motion $W'(t)$ adapted to $\mathcal{F}'_t$, and counting random measures $N'_i(dt, dz)$ with compensator $\Lambda_i(X'(t-))\,dt\,\phi_i(dz)$, $i = 1, \ldots, K$. With a slight abuse of notation, we will not differentiate $(X(t), W(t), N_i(dt, dz), \mathcal{F}_t)$ from $(X'(t), W'(t), N'_i(dt, dz), \mathcal{F}'_t)$.

Assumption 3. $\beta$ is positive stable, i.e., all the eigenvalues of $\beta$ have positive real parts.

Remark 2.2. For a one-dimensional Itô diffusion process, the assumption that $\beta > 0$ is critical for recurrence. A natural extension to the multidimensional setting that retains the "positivity" of $\beta$ is to make $\beta$ positive stable. This assumption originates from the study of the stability of a system of first-order ODEs $\dot{f} = -\beta f$, where $\beta$ is the coefficient matrix. The same assumption also appears in [86] and [68], which study the stochastic stability of OU type processes driven by a Lévy process. In light of the connection to the mean-reverting OU process, we call Assumption 3 the mean reversion assumption.
reversion assumption.


Assumption 4. There exists an index set $L \subseteq I$ for which
\[
a + \sum_{k\in L} \alpha^k \succ 0, \quad \text{and} \quad \min_{k\in L} \frac{b_k}{\alpha^k_{k,k}} > 2.
\]

Remark 2.3. Note that a key technical assumption in all the Foster-Lyapunov criteria in the Appendix is the condition that all compact sets are petite. As discussed in Section A.2.1, satisfying this condition requires some continuity of the transition kernel. A natural way to introduce such continuity is to assume the existence of a transition density. The existence (or even smoothness) of the transition density of a jump diffusion process has been extensively studied in the literature. For example, a sufficient condition for Itô diffusions is the Hörmander condition, developed using Malliavin calculus. Moreover, the same condition also applies to a certain class of jump diffusions (whose jump intensity is state-independent). See the detailed discussion in [90]. Obviously, this condition does not apply in the AJD setting, since the jump intensity of an AJD may depend linearly on the state. Nevertheless, we have the following result on the existence (and smoothness) of the transition density of AJDs.

Lemma 2.2. Suppose that Assumptions 1, 2, and 4 hold. Let $L$ be the index set in Assumption 4 and let $p$ be a nonnegative integer with
\[
p < \min_{k\in L} \frac{b_k}{2\alpha^k_{k,k}} - 1.
\]
Then, $P_x(X(t) \in \cdot)$ admits a density $g(y)$ of class $C^p$ with support in $S$, and the partial derivatives of $g(y)$ of orders $0, \ldots, p$ tend to 0 as $\|y\| \to \infty$.

Proof. See Theorem 4.1 of [40].

In order to apply the Foster-Lyapunov approach discussed in the Appendix, we need to specify the extended generator of an AJD (see Definition A.2). To that end, we introduce the following function space of potential test functions. Let $C^2(\mathbb{R}^n)$ be the set of twice differentiable functions $f : \mathbb{R}^n \to \mathbb{R}$ and define
\[
\mathcal{Q} = \left\{ f \in C^2(\mathbb{R}^n) \;:\; \int |f(\cdot + z)|\,\phi_i(dz) \text{ is bounded on compact sets, } i = 1, \ldots, K \right\}. \tag{2.2}
\]
Also define the linear operator $\mathcal{A}$ on $\mathcal{Q}$ by
\[
\mathcal{A} f = \mathcal{G} f + \mathcal{L} f, \tag{2.3}
\]
where
\[
\mathcal{G} f(x) = \nabla f(x)^\intercal (b - \beta x) + \frac{1}{2}\sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(x)\left( a_{i,j} + \sum_{k=1}^m \alpha^k_{i,j}x_k \right)
\]
and
\[
\mathcal{L} f(x) = \sum_{i=1}^K (\lambda_i + \kappa_i^\intercal x) \int \big( f(x + z) - f(x) \big)\,\phi_i(dz),
\]
where we use $\kappa_i$ to denote the $i$-th row of $\kappa$.
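As a quick sanity check of these definitions (a scalar example added for illustration; it does not appear in the original text), take $n = m = K = 1$ and $f(x) = x$, which lies in $\mathcal{Q}$ whenever $E\|Z^1\| < \infty$. Then $\nabla f \equiv 1$, $\nabla^2 f \equiv 0$, and
\[
\mathcal{A} f(x) = (b - \beta x) + (\lambda + \kappa x)\,EZ^1 = (b + \lambda\,EZ^1) - (\beta - \kappa\,EZ^1)\,x,
\]
so the jumps raise the effective mean level and reduce the effective mean-reversion rate from $\beta$ to $\beta - \kappa\,EZ^1$; positivity of the latter quantity is exactly the one-dimensional content of Assumption 5 below.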

We now show that $\mathcal{A}$ is the extended generator of $X$. We will need the following lemmata on the properties of local martingales.

Lemma 2.3. Let $M = (M(t) : t \ge 0)$ be a càdlàg process and let $T_n$ be a sequence of stopping times increasing to $\infty$ a.s. such that $M(t \wedge T_n)$ is a local martingale for each $n$. Then, $M$ is a local martingale.

Proof. See Theorem 48 in Chapter 1 of [83].

Lemma 2.4. Let $\gamma(dt, dz)$ be a counting random measure on $[0,\infty) \times \mathbb{R}^n$ with compensator $\nu(dt, dz)$.

(i) If
\[
E \int_0^t \int_{\mathbb{R}^n} |g(s, z)|\,\nu(ds, dz) < \infty,
\]
then
\[
\int_0^t \int_{\mathbb{R}^n} g(s, z)\,\tilde{\gamma}(ds, dz)
\]
is a local martingale, where $\tilde{\gamma} \triangleq \gamma - \nu$.

(ii) If
\[
E \int_0^t \int_{\mathbb{R}^n} |g(s, z)|^2\,\nu(ds, dz) < \infty,
\]
then
\[
\int_0^t \int_{\mathbb{R}^n} g(s, z)\,\tilde{\gamma}(ds, dz)
\]
is a martingale.

Proof. See Theorem II.1.33 of [56].

Proposition 2.1. Under Assumption 1 and Assumption 2, $\mathcal{A}$ is the extended generator of $X$. Further, $\mathcal{Q} \subseteq \operatorname{Dom}(\mathcal{A})$, where $\mathcal{Q}$ is defined by (2.2).

Proof. Fix $t > 0$, $x \in S$, and $f \in \mathcal{Q}$. Let $\tau_k = \inf\{t : \|X(t)\| > k\}$ for each $k \ge 1$. Lemma 2.1 asserts that $X$ is nonexplosive, so $\tau_k \to \infty$ as $k \to \infty$. By virtue of Lemma 2.3, it suffices to show that
\[
f(X(t \wedge \tau_k)) - \int_0^{t \wedge \tau_k} \mathcal{A} f(X(s))\,ds
\]
is a $P_x$-local martingale adapted to $X$ for each $k$.

Note that by Itô's formula (see, for example, Theorem 33 in Chapter 2 of [83]),
\[
\begin{aligned}
f(X(t \wedge \tau_k)) &= f(X(0)) + \int_0^{t \wedge \tau_k} \mathcal{G} f(X(s))\,ds + \int_0^{t \wedge \tau_k} \nabla f(X(s))^\intercal \sigma(X(s))\,dW(s) + \sum_{0 < s \le t \wedge \tau_k} \big( f(X(s)) - f(X(s-)) \big) \\
&= f(X(0)) + \int_0^{t \wedge \tau_k} \mathcal{A} f(X(s))\,ds + G_1(t) + G_2(t),
\end{aligned}
\]
where
\[
G_1(t) = \int_0^{t \wedge \tau_k} \nabla f(X(s))^\intercal \sigma(X(s))\,dW(s)
\]
and
\[
G_2(t) = \sum_{0 < s \le t \wedge \tau_k} \big( f(X(s)) - f(X(s-)) \big) - \int_0^{t \wedge \tau_k} \mathcal{L} f(X(s))\,ds.
\]
$G_1$ is a local martingale, since its integrand is bounded on $[0, t \wedge \tau_k]$. For $G_2$, note that for each $i = 1, \ldots, K$,
\[
E_x \int_0^{t \wedge \tau_k} \int_S |f(X(s-) + z) - f(X(s-))|\,\Lambda_i(X(s-))\,\phi_i(dz)\,ds
\le t \sup_{\|y\| \le k} \Lambda_i(y)\left( \sup_{\|y\| \le k} \int |f(y + z)|\,\phi_i(dz) + \sup_{\|y\| \le k} |f(y)| \right) < \infty,
\]
since $f \in \mathcal{Q}$ and $\|X(s)\| \le k$ for $s < t \wedge \tau_k$. Hence, Lemma 2.4 implies that $G_2$ is a local martingale, which completes the proof.

2.3 Stochastic Stability

The central idea of the Foster-Lyapunov approach is to find a test function $V \in \mathcal{Q}$ satisfying an inequality of the form
\[
\mathcal{A} V(x) \le -c + d\,I_C(x), \quad x \in S,
\]
for some constants $c > 0$, $d < \infty$, and a compact set $C$. Under mild conditions, this inequality implies positive Harris recurrence.

Moreover, if $V(x) \ge 1$ on $C^c$, then the following inequality is obviously stronger, and will imply a stronger stochastic stability result, namely exponential ergodicity (roughly meaning that the process converges to the equilibrium exponentially fast):
\[
\mathcal{A} V(x) \le -cV(x) + d\,I_C(x), \quad x \in S.
\]
We will give different sufficient conditions under which such a test function does exist.


2.3.1 Foster-Lyapunov Inequalities

Note that in the absence of the jump part (i.e., when $X$ is an Itô diffusion), Assumption 3 guarantees that $X$ is positive Harris recurrent; see [86] and [68]. Since the mean reversion forces the process to drift back toward the equilibrium, it seems reasonable to speculate that the equilibrium will still exist in the presence of jumps if the effect of the jumps is "dominated" by the force of the mean reversion (see Assumption 5). It turns out that this is indeed the case, as shown in Theorem 2.1 and Theorem 2.2.

Assumption 5. $\beta - \sum_{i=1}^K EZ^i\,\kappa_i^\intercal$ is positive stable, where $Z^i$ has distribution $\phi_i$ and $\kappa_i$ is the $i$-th row of $\kappa$.

Remark 2.4. When $E\|Z^i\| = \infty$ and $\kappa_i = 0$, we set $EZ^i\,\kappa_i^\intercal = 0$.
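Assumption 5 is straightforward to check numerically for given model parameters. The following small sketch (an illustration added here, with made-up parameter values) verifies positive stability of the jump-adjusted drift matrix by inspecting eigenvalues:

```python
import numpy as np

def jump_adjusted_drift_is_stable(beta, EZ_list, kappa_rows):
    """Check Assumption 5: beta - sum_i E[Z^i] kappa_i^T is positive stable,
    i.e., all eigenvalues have positive real part. EZ_list holds the mean jump
    vectors E[Z^i]; kappa_rows holds the rows kappa_i of kappa."""
    B = np.asarray(beta, dtype=float).copy()
    for EZ, kap in zip(EZ_list, kappa_rows):
        B -= np.outer(EZ, kap)
    return bool(np.all(np.linalg.eigvals(B).real > 0))

beta = np.array([[2.0, 0.0], [0.0, 1.0]])
print(jump_adjusted_drift_is_stable(beta, [np.array([0.3, 0.0])],
                                    [np.array([1.0, 0.0])]))  # True: B = diag(1.7, 1)
```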

Before proceeding to the proofs, we will need the following lemma, which states an important property of positive stable matrices.

Lemma 2.5. Let $A \in \mathbb{R}^{n\times n}$ be positive stable. Then there exists $G \succ 0$ such that $GA + A^\intercal G \succ 0$.

Proof. See Theorem 2.2.3 of [55].
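In practice, a matrix $G$ as in Lemma 2.5 can be produced by solving a continuous Lyapunov equation; the sketch below (an illustration assuming SciPy's `solve_continuous_lyapunov`) constructs $G \succ 0$ with $GA + A^\intercal G = I \succ 0$:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def lyapunov_certificate(A):
    """For positive stable A, return G > 0 with G A + A^T G = I,
    by solving the Lyapunov equation (-A^T) G + G (-A) = -I."""
    G = solve_continuous_lyapunov(-A.T, -np.eye(A.shape[0]))
    G = (G + G.T) / 2                        # symmetrize against round-off
    assert np.all(np.linalg.eigvalsh(G) > 0)  # G is positive definite
    return G

A = np.array([[2.0, -1.0],
              [0.5,  1.0]])                   # eigenvalues 1.5 +/- 0.5i: positive stable
G = lyapunov_certificate(A)
print(np.round(G @ A + A.T @ G, 10))          # identity matrix
```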

For any $n \times n$ matrix $G \succ 0$, define
\[
\|x\|_G \triangleq (x^\intercal Gx)^{1/2}
\]
for $x \in \mathbb{R}^n$. It is straightforward to see that $\|\cdot\|_G$ is a norm equivalent to the Euclidean norm $\|\cdot\|$ on $\mathbb{R}^n$, since $G \succ 0$. Further, we may define the associated matrix norm. In particular, for any $A \in \mathbb{R}^{n\times n}$, define
\[
\|A\|_G = \sup_{y \in \mathbb{R}^n,\, y \ne 0} \frac{\|Ay\|_G}{\|y\|_G}.
\]
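These norms are easy to evaluate numerically; the following sketch (added for illustration, with made-up matrices) computes $\|x\|_G$ directly and $\|A\|_G$ via the identity $\|A\|_G = \|L^\intercal A L^{-\intercal}\|_2$ for the Cholesky factor $G = LL^\intercal$:

```python
import numpy as np

def g_norm(x, G):
    """||x||_G = sqrt(x^T G x) for G > 0."""
    return float(np.sqrt(x @ G @ x))

def g_matrix_norm(A, G):
    """||A||_G = sup_{y != 0} ||A y||_G / ||y||_G. With G = L L^T (Cholesky),
    this equals the spectral norm of L^T A L^{-T}."""
    L = np.linalg.cholesky(G)
    M = L.T @ A @ np.linalg.inv(L.T)
    return float(np.linalg.norm(M, 2))

G = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 2.0], [0.0, 3.0]])
y = np.array([1.0, -1.0])
print(g_norm(A @ y, G) / g_norm(y, G) <= g_matrix_norm(A, G))  # True
```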

The following technical result is also necessary for verifying the Foster-Lyapunov inequalities.


Lemma 2.6. Let $Z$ be an $\mathbb{R}^n$-valued random vector and let $f \ge 0$ be a monotone increasing function with $Ef(\|Z\|) < \infty$. Fix $\epsilon > 0$. If
\[
\lim_{y\to\infty} \frac{f(y)}{f(y - \epsilon)} = 1,
\]
then
\[
\lim_{\|x\|\to\infty} f(\|x\|)\,P(\|x + Z\| \le \epsilon) = 0.
\]

Proof. Note that
\[
P(\|x + Z\| \le \epsilon) \le P(\|x\| - \|Z\| \le \epsilon) = P(\|Z\| \ge \|x\| - \epsilon).
\]
Hence,
\[
f(\|x\|)\,P(\|x + Z\| \le \epsilon) \le f(\|x\|)\,P(\|Z\| \ge \|x\| - \epsilon) \le \frac{f(\|x\|)}{f(\|x\| - \epsilon)}\,E\big[f(\|Z\|)\,I(\|Z\| \ge \|x\| - \epsilon)\big] \to 0
\]
as $\|x\| \to \infty$, by the dominated convergence theorem.

Proposition 2.2. Suppose that Assumptions 1, 2, 3, and 5 hold. Let $Z^i$ be a random variable with distribution $\phi_i$, $i = 1, \ldots, K$. Assume that $E\|Z^i\|^p < \infty$ for some $p > 0$ and all $i = 1, \ldots, K$. Then, there exist a function $V \in \mathcal{Q}$, a compact set $C$, and some constants $c > 0$, $d < \infty$ such that
\[
\mathcal{A} V(x) \le -cV(x) + d\,I_C(x), \quad x \in S. \tag{2.5}
\]

Proof. Fix $\epsilon > 0$. Let $V$ be a $C^2$ function with $V(x) = \|x\|_H^p = (x^\intercal Hx)^{p/2}$ for $\|x\|_H > \epsilon$, where $H \succ 0$ will be specified later. Note that
\[
\|x + y\|_H^p \le (\|x\|_H + \|y\|_H)^p \le
\begin{cases}
(1 + \|x\|_H)^p\,\|y\|_H^p, & \text{if } \|y\|_H > 1, \\
(1 + \|x\|_H)^p, & \text{if } \|y\|_H \le 1.
\end{cases}
\]
Hence,
\[
\|x + y\|_H^p \le (1 + \|x\|_H)^p\,(1 + \|y\|_H^p).
\]
It follows that, for each $i = 1, \ldots, K$,
\[
\begin{aligned}
\int V(x + z)\,\phi_i(dz) &\le \int_{\|x+z\|_H \le \epsilon} V(x + z)\,\phi_i(dz) + \int_{\|x+z\|_H > \epsilon} \|x + z\|_H^p\,\phi_i(dz) \\
&\le \sup_{\|y\|_H \le \epsilon} V(y) + \int (1 + \|x\|_H)^p\,(1 + \|z\|_H^p)\,\phi_i(dz) \\
&= \sup_{\|y\|_H \le \epsilon} V(y) + (1 + \|x\|_H)^p\,(1 + E\|Z^i\|_H^p).
\end{aligned}
\]
Hence, $\int V(\cdot + z)\,\phi_i(dz)$ is bounded on compact sets for each $i = 1, \ldots, K$, guaranteeing that $V \in \mathcal{Q}$.

Note that if we can show
\[
\mathcal{A} V(x) \le -cV(x), \quad \forall x \in C^c,
\]
where $C \triangleq \{x \in S : \|x\|_H \le k\}$ for some $k > 0$, then taking $d = \sup_{x\in C} \big( \mathcal{A} V(x) + cV(x) \big) < \infty$ will suffice. Therefore, it suffices to show
\[
\mathcal{A} V(x) \le -cV(x) \tag{2.6}
\]
for all $\|x\|_H$ sufficiently large.

Direct calculations yield that for $\|x\|_H$ sufficiently large,
\[
\nabla V(x) = p\|x\|_H^{p-2}\,Hx, \qquad
\nabla^2 V(x) = p\|x\|_H^{p-2}\left( (p-2)\|x\|_H^{-2}\,Hxx^\intercal H + H \right).
\]


It follows that
\[
\begin{aligned}
\mathcal{G} V(x) &= p\|x\|_H^{p-2}\left[ x^\intercal H(b - \beta x) + \frac{1}{2}\sum_{i,j=1}^n \left( a_{i,j} + \sum_{k=1}^m \alpha^k_{i,j}x_k \right)\left( (p-2)\|x\|_H^{-2}(Hxx^\intercal H)_{i,j} + H_{i,j} \right) \right] \\
&= p\|x\|_H^{p-2}\left[ -x^\intercal H\beta x + O(\|x\|_H) \right] \\
&= pV(x)\left( -\frac{x^\intercal H\beta x}{\|x\|_H^2} + o(1) \right) \tag{2.7}
\end{aligned}
\]
as $\|x\|_H \to \infty$.

Note that Assumption 2 and Assumption 5 indicate that we need to discuss the following two cases separately:

(a) $p \ge 1$ and $\beta - \sum_{i=1}^K EZ^i\,\kappa_i^\intercal$ is positive stable;

(b) $p \in (0, 1)$ and $\kappa = 0$.

Case (a). Suppose that $p \ge 1$ and $\beta - \sum_{i=1}^K EZ^i\,\kappa_i^\intercal$ is positive stable.

Applying Taylor's expansion, for each $i = 1, \ldots, K$,
\[
V(x + z) - V(x) = z^\intercal \nabla V(\xi) = z^\intercal \nabla V(\xi)\,I(\|\xi\|_H \le \epsilon) + p\|\xi\|_H^{p-2}\,\xi^\intercal Hz\,I(\|\xi\|_H > \epsilon),
\]
where $\xi = x + uz$ for some $u \in (0, 1)$. Therefore, for $\|x\|_H > \epsilon$,
\[
\frac{\kappa_i^\intercal x\,(V(x + z) - V(x))}{V(x)} = \frac{\kappa_i^\intercal x\,z^\intercal \nabla V(\xi)}{\|x\|_H^p}\,I(\|\xi\|_H \le \epsilon) + p \cdot \frac{\|\xi\|_H^p}{\|x\|_H^p} \cdot \frac{\xi^\intercal Hz\,\kappa_i^\intercal x}{\|\xi\|_H^2}\,I(\|\xi\|_H > \epsilon) \triangleq S_1 + S_2. \tag{2.8}
\]
Note that $\|\xi\|_H$ lies between $\|x\|_H$ and $\|x + z\|_H$, and that $\xi^\intercal Hz\,\kappa_i^\intercal x$ lies between $x^\intercal Hz\,\kappa_i^\intercal x$ and $(x + z)^\intercal Hz\,\kappa_i^\intercal x$. Therefore, it follows from the squeeze theorem and (2.8) that
\[
\frac{\kappa_i^\intercal x\,(V(x + z) - V(x))}{V(x)} \sim p \cdot \frac{x^\intercal Hz\,\kappa_i^\intercal x}{\|x\|_H^2} \tag{2.9}
\]
as $\|x\|_H \to \infty$ for each $z \in \mathbb{R}^n$, since $\|\xi\|_H \to \infty$ as $\|x\|_H \to \infty$. Moreover, it follows from (2.8) that

\[
S_1 \le \frac{\|\kappa_i\|_H \cdot \|x\|_H \cdot \|z\|_H \cdot \|\nabla V(\xi)\|_H}{\|x\|_H^p}\,I(\|\xi\|_H \le \epsilon) \le \|\kappa_i\|_H \cdot \sup_{\|y\|_H \le \epsilon}\|\nabla V(y)\|_H \cdot \frac{\|z\|_H}{\|x\|_H^{p-1}} \le \|\kappa_i\|_H \cdot \sup_{\|y\|_H \le \epsilon}\|\nabla V(y)\|_H \cdot \|z\|_H
\]
for $\|x\|_H$ large enough, and that
\[
\begin{aligned}
S_2 &\le \frac{\|\kappa_i\|_H \cdot \|x\|_H \cdot p\|\xi\|_H^{p-2} \cdot \|\xi\|_H \cdot \|H\|_H \cdot \|z\|_H}{\|x\|_H^p} \le p\|\kappa\|_H \cdot \|H\|_H \cdot \|z\|_H \cdot \frac{(\|x\|_H + u\|z\|_H)^{p-1}}{\|x\|_H^{p-1}} \\
&= p\|\kappa\|_H \cdot \|H\|_H \cdot \|z\|_H \cdot (1 + \|x\|_H^{-1}\|z\|_H)^{p-1} \le p\|\kappa\|_H \cdot \|H\|_H \cdot \|z\|_H \cdot (1 + \|z\|_H)^{p-1} \tag{2.10}
\end{aligned}
\]
for $\|x\|_H$ large enough. The right-hand sides of the two preceding bounds are clearly $\phi_i$-integrable, since $E\|Z^i\|^p < \infty$ and $p \ge 1$. It then follows from the dominated convergence theorem and (2.9) that
\[
\kappa_i^\intercal x \int (V(x + z) - V(x))\,\phi_i(dz) \sim pV(x) \int \frac{x^\intercal Hz\,\kappa_i^\intercal x}{\|x\|_H^2}\,\phi_i(dz) = pV(x) \cdot \frac{x^\intercal H\,EZ^i\,\kappa_i^\intercal x}{\|x\|_H^2}
\]
as $\|x\|_H \to \infty$. Therefore,
\[
\mathcal{L} V(x) \sim pV(x) \cdot \sum_{i=1}^K \frac{x^\intercal H\,EZ^i\,\kappa_i^\intercal x}{\|x\|_H^2} \tag{2.11}
\]


as $\|x\|_H \to \infty$. It then follows from (2.7) and (2.11) that
\[
\mathcal{A} V(x) = pV(x)\left( -\frac{x^\intercal H\big(\beta - \sum_{i=1}^K EZ^i\,\kappa_i^\intercal\big)x}{\|x\|_H^2} + o(1) \right) = pV(x)\left( -\frac{x^\intercal HBx}{\|x\|_H^2} + o(1) \right),
\]
where $B \triangleq \beta - \sum_{i=1}^K EZ^i\,\kappa_i^\intercal$. By Lemma 2.5, there exists $H \succ 0$ such that $HB + B^\intercal H \succ 0$. Hence,
\[
x^\intercal HBx = \frac{1}{2}\,x^\intercal (HB + B^\intercal H)x \ge c\|x\|^2,
\]
for some $c > 0$ and all $x \in \mathbb{R}^n$. Moreover, we have $\|x\|^2 \ge \delta\|x\|_H^2$ for some $\delta > 0$. Hence,
\[
\mathcal{A} V(x) \le -pc\delta\,V(x)
\]
for $\|x\|_H$ large enough, proving (2.6).

Case (b). Suppose that $p \in (0, 1)$ and $\kappa = 0$.

Note that, since $p \in (0, 1)$,
\[
\|x + y\|_H^p \le (\|x\|_H + \|y\|_H)^p \le \|x\|_H^p + \|y\|_H^p.
\]
Therefore, for $\|x\|_H > \epsilon$,
\[
\begin{aligned}
\left| \int (V(x + z) - V(x))\,\phi_i(dz) \right| &\le \int_{\|x+z\|_H \le \epsilon} |V(x + z) - V(x)|\,\phi_i(dz) + \int_{\|x+z\|_H > \epsilon} |V(x + z) - V(x)|\,\phi_i(dz) \\
&\le \int_{\|x+z\|_H \le \epsilon} (V(x + z) + V(x))\,\phi_i(dz) + \int_{\|x+z\|_H > \epsilon} \|z\|_H^p\,\phi_i(dz) \\
&\le \sup_{\|y\|_H \le \epsilon} V(y) + \|x\|_H^p \cdot P(\|x + Z^i\|_H \le \epsilon) + E\|Z^i\|_H^p = O(1)
\end{aligned}
\]
by Lemma 2.6. This implies that, since $\kappa = 0$,
\[
\mathcal{L} V(x) = \sum_{i=1}^K \lambda_i \int (V(x + z) - V(x))\,\phi_i(dz) = O(1) \tag{2.12}
\]
as $\|x\|_H \to \infty$. It follows from (2.7) and (2.12) that
\[
\mathcal{A} V(x) = pV(x)\left( -\frac{x^\intercal H\beta x}{\|x\|_H^2} + o(1) \right).
\]
By Lemma 2.5, there exists $H \succ 0$ such that
\[
x^\intercal H\beta x = \frac{1}{2}\,x^\intercal (H\beta + \beta^\intercal H)x \ge c\|x\|^2 \ge c\delta\|x\|_H^2,
\]
for some $c, \delta > 0$. Hence,
\[
\mathcal{A} V(x) \le -pc\delta\,V(x)
\]
for $\|x\|_H$ large enough, proving (2.6).

Likewise, we can even treat the case of a "super heavy-tailed" jump distribution, where $E\|Z^i\|^p = \infty$ for every $p > 0$, $i = 1, \ldots, K$. The assumption we need here regarding the tail behavior is that $E\log(1 + \|Z^i\|) < \infty$, $i = 1, \ldots, K$. Note that this condition on the tail behavior of the jump distribution also appears in [86] and [68], where the stationarity of the OU type process driven by a Lévy process is established.

Proposition 2.3. Suppose that Assumptions 1, 2, and 3 hold. If $E\log(1 + \|Z^i\|) < \infty$ for each $i = 1, \ldots, K$, and $\kappa = 0$, then there exist a function $V \in \mathcal{Q}$, a compact set $C$, and some constants $c > 0$, $d < \infty$ such that
\[
\mathcal{A} V(x) \le -c + d\,I_C(x), \quad x \in S. \tag{2.13}
\]

Proof. Fix $\epsilon > 0$. Note that there exists $H \succ 0$ such that $H\beta + \beta^\intercal H \succ 0$ by Lemma 2.5. Let $V$ be a $C^2$ function with $V(x) = \log(1 + \|x\|_H)$ for $\|x\|_H > \epsilon$. Note that $\log(1 + \|x\|_H)$ is subadditive:
\[
\log(1 + \|x + z\|_H) \le \log(1 + \|x\|_H + \|z\|_H) \le \log(1 + \|x\|_H) + \log(1 + \|z\|_H).
\]


Then, for each $i = 1, \ldots, K$,
\[
\begin{aligned}
\int V(x + z)\,\phi_i(dz) &\le \int_{\|x+z\|_H \le \epsilon} V(x + z)\,\phi_i(dz) + \int_{\|x+z\|_H > \epsilon} \log(1 + \|x + z\|_H)\,\phi_i(dz) \\
&\le \sup_{\|y\|_H \le \epsilon} V(y) + \int \big( \log(1 + \|x\|_H) + \log(1 + \|z\|_H) \big)\,\phi_i(dz) \\
&= \sup_{\|y\|_H \le \epsilon} V(y) + \log(1 + \|x\|_H) + E\log(1 + \|Z^i\|_H).
\end{aligned}
\]
Hence, $\int V(\cdot + z)\,\phi_i(dz)$ is bounded on compact sets for each $i = 1, \ldots, K$, guaranteeing that $V \in \mathcal{Q}$. Then it suffices to show that
\[
\mathcal{A} V(x) \le -c
\]
for some $c > 0$ and all $\|x\|_H$ sufficiently large.

Direct calculations yield that for $\|x\|_H > \epsilon$,
\begin{align*}
\nabla V(x) &= \|x\|_H^{-1}(1 + \|x\|_H)^{-1} Hx, \\
\nabla^2 V(x) &= \|x\|_H^{-1}(1 + \|x\|_H)^{-1}\big( H - \|x\|_H^{-2}(1 + \|x\|_H)^{-1}(1 + 2\|x\|_H)\,Hxx^\top H \big).
\end{align*}
It follows that
\begin{align*}
\mathcal{G}V(x) &= \|x\|_H^{-1}(1 + \|x\|_H)^{-1}\Big( x^\top H(b - \beta x) + \frac{1}{2}\sum_{i,j=1}^n \Big( a_{i,j} + \sum_{k=1}^m \alpha_{i,j}^k x_k \Big)\big( H_{i,j} - \|x\|_H^{-2}(1 + \|x\|_H)^{-1}(1 + 2\|x\|_H)(Hxx^\top H)_{i,j} \big) \Big) \\
&= \|x\|_H^{-1}(1 + \|x\|_H)^{-1}\big( -x^\top H\beta x + O(\|x\|_H) \big) \\
&= -\frac{x^\top H\beta x}{\|x\|_H(1 + \|x\|_H)} + o(1)
\end{align*}


as $\|x\| \to \infty$. Note that $x^\top H\beta x \geq c\|x\|^2 \geq c\delta\|x\|_H^2$ for some $c, \delta > 0$, so
\[
\lim_{\|x\|_H \to \infty} \frac{x^\top H\beta x}{\|x\|_H(1 + \|x\|_H)} \geq \lim_{\|x\|_H \to \infty} \frac{c\delta\|x\|_H^2}{\|x\|_H(1 + \|x\|_H)} = c\delta.
\]
Hence,
\[
\lim_{\|x\|_H \to \infty} \mathcal{G}V(x) \leq -c\delta. \tag{2.14}
\]
Moreover, we have
\[
\int \big(V(x+z) - V(x)\big)\,\varphi_i(dz) = \int_{\|x+z\|_H \leq \epsilon} \big(V(x+z) - V(x)\big)\,\varphi_i(dz) + \int_{\|x+z\|_H > \epsilon} \big(V(x+z) - V(x)\big)\,\varphi_i(dz) \triangleq S_1 + S_2.
\]
Note that for $\|x\|_H > \epsilon$,
\[
S_1 \leq \int_{\|x+z\|_H \leq \epsilon} \Big( \sup_{\|y\|_H \leq \epsilon} V(y) - \log(1 + \|x\|_H) \Big)\,\varphi_i(dz) = \Big( \sup_{\|y\|_H \leq \epsilon} V(y) - \log(1 + \|x\|_H) \Big)\,\mathbb{P}(\|x + Z^i\|_H \leq \epsilon) \to 0 \tag{2.15}
\]
as $\|x\|_H \to \infty$ by Lemma 2.6. Also note that for $\|x\|_H > \epsilon$,
\[
S_2 \leq \int_{\|x+z\|_H > \epsilon} \big( \log(1 + \|x\|_H + \|z\|_H) - \log(1 + \|x\|_H) \big)\,\varphi_i(dz) \leq \int \log\Big( 1 + \frac{\|z\|_H}{1 + \|x\|_H} \Big)\,\varphi_i(dz) \to 0 \tag{2.16}
\]
by the dominated convergence theorem. It follows from (2.15) and (2.16) that
\[
\lim_{\|x\|_H \to \infty} \mathcal{L}V(x) = \lim_{\|x\|_H \to \infty} \sum_{i=1}^K \lambda_i \int \big(V(x+z) - V(x)\big)\,\varphi_i(dz) = 0, \tag{2.17}
\]
since $\kappa = 0$. Combining (2.14) and (2.17) gives us
\[
\mathcal{A}V(x) \leq -\frac{c\delta}{2}
\]
for $\|x\|_H$ large enough.

2.3.2 Ergodicity

We have verified the Foster-Lyapunov inequality (2.5) needed to apply Proposition A.1 for establishing (exponential) ergodicity. The only difference is that the set $C$ in the inequality is compact, whereas Proposition A.1 requires $C$ to be petite. Our next task is then to show that every compact set is petite for $X$.

Lemma 2.7. Suppose that $\Phi = (\Phi_n : n \geq 0)$ is a $\varphi$-irreducible discrete-time Markov chain with state space $S$. If $\Phi$ has the Feller property and the support of $\varphi$ has non-empty interior, then all compact subsets of $S$ are petite.

Proof. See Proposition 6.2.8 of [74].

Proposition 2.4. Under Assumptions 1, 2, and 4, every compact subset of $S$ is petite for $X$.

Proof. Fix $\delta > 0$ and consider the skeleton chain $X^\delta \triangleq (X(n\delta) : n \geq 0)$. Lemma 2.1 implies that $X$ has the Feller property; hence, $X^\delta$ is a Feller chain. Moreover, Lemma 2.2 guarantees the existence of the transition density of $\mathbb{P}_x(X(t) \in dy)$. Hence, $X$, and thus $X^\delta$, is $\nu$-irreducible, where $\nu$ is the Lebesgue measure. It then follows from Lemma 2.7 that every compact subset of $S$ is petite for the skeleton chain $X^\delta$ and therefore, by definition, is petite for $X$.

We also need the following property of the $f$-norm of signed measures defined in (A.7).

Lemma 2.8. Let $\eta$ be a signed measure on $S$ and let $f, g \geq 1$ be two positive measurable functions with $f \leq cg$ for some constant $c > 0$. Then
\[
\|\eta\|_f \leq c\|\eta\|_g.
\]


Proof. For any function $h$ with $|h| \leq f$, clearly $|h/c| \leq g$. Denote $\int_S h(x)\,\eta(dx)$ by $\eta(h)$. Then,
\[
|\eta(h)| = c|\eta(h/c)| \leq c\sup_{|r| \leq g}|\eta(r)| = c\|\eta\|_g,
\]
and thus
\[
\sup_{|h| \leq f}|\eta(h)| \leq c\|\eta\|_g,
\]
i.e. $\|\eta\|_f \leq c\|\eta\|_g$.

Theorem 2.1. Suppose that Assumptions 1, 2, 3, 4, and 5 hold. Let $Z^i$ be a rv with distribution $\varphi_i$, $i = 1,\ldots,K$. Assume that $\mathbb{E}\|Z^i\|^p < \infty$ for some $p > 0$ and all $i = 1,\ldots,K$. Then,

(i) $X$ admits a unique stationary distribution $\pi$;

(ii) $X$ is $f$-exponentially ergodic, where $f(x) = \|x\|^p$. In particular,
\[
\|\mathbb{P}_x(X(t) \in \cdot) - \pi(\cdot)\|_f \leq c f(x) e^{-\gamma t}
\]
for each $x \in S$ and some $c > 0$;

(iii) $\mathbb{E}_\pi\|X(0)\|^p < \infty$.

Proof. Let $V(x) = \|x\|_H^p$. It follows from Proposition 2.2, Proposition 2.4, and Proposition A.2 that $X$ has a unique stationary distribution $\pi$ for which $\mathbb{E}_\pi V(X(0)) < \infty$ and
\[
\|\mathbb{P}_x(X(t) \in \cdot) - \pi(\cdot)\|_V \leq c_1 V(x) e^{-\gamma t}, \quad x \in S,
\]
for some $c_1, \gamma > 0$. Note that $\|\cdot\|_H$ is equivalent to $\|\cdot\|$, hence $\mathbb{E}_\pi\|X(0)\|^p < \infty$ and
\[
c_2 f(x) \leq V(x) \leq c_3 f(x), \quad x \in \mathbb{R}^n,
\]
for some $c_2, c_3 > 0$. It then follows from Lemma 2.8 that
\[
c_2\|\cdot\|_f \leq \|\cdot\|_V,
\]
implying that
\[
\|\mathbb{P}_x(X(t) \in \cdot) - \pi(\cdot)\|_f \leq c_2^{-1}\|\mathbb{P}_x(X(t) \in \cdot) - \pi(\cdot)\|_V \leq \frac{c_1 c_3}{c_2}\,f(x) e^{-\gamma t},
\]
for all $x \in S$.

Theorem 2.2. Suppose that Assumptions 1, 2, 3, and 4 hold. If $\mathbb{E}\log(1 + \|Z^i\|) < \infty$ and $\kappa_i = 0$, $i = 1,\ldots,K$, then $X$ admits a unique stationary distribution and is ergodic.

Proof. This is an immediate consequence of Proposition 2.3, Proposition 2.4, and Proposition A.1.

The preceding two theorems indicate that the convergence rate of $X(t)$ to the equilibrium is essentially determined by the jump component, particularly the tail heaviness of the "heaviest" jump distribution $\varphi_i$, which also determines the tail heaviness of the stationary distribution.

2.3.3 Central Limit Theorem

It is well known that under mild conditions, a positive Harris recurrent Markov process $\Phi = (\Phi(t) : t \geq 0)$ satisfies the central limit theorem (CLT)
\[
t^{1/2}\Big( \frac{1}{t}\int_0^t f(\Phi(s))\,ds - \pi(f) \Big) \Rightarrow \mathcal{N}(0, \sigma^2)
\]
as $t \to \infty$, where $\pi$ is the stationary distribution, $f : S \to \mathbb{R}$ is such that $\pi(|f|) < \infty$, and the variance is
\[
\sigma^2 \triangleq \mathbb{E}_\pi \bar{f}(\Phi(0))^2 + 2\int_0^\infty \mathbb{E}_\pi\big[\bar{f}(\Phi(0))\bar{f}(\Phi(t))\big]\,dt,
\]
where $\bar{f} = f - \pi(f)$ is the centered version of $f$. The derivation of such a CLT typically exploits the regenerative structure of $\Phi$ (i.e. divide $\Phi(t)$ into iid regenerative cycles and apply the CLT for iid rv's); see, for example, [74].
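The $\sqrt{t}$ scaling in this CLT is easy to see numerically. The following is a minimal Monte Carlo sketch (an illustrative O-U example with $f(x) = x$; all parameter values are arbitrary, not taken from this dissertation) showing that the scaled deviation of the time average stabilizes as the horizon grows:

```python
import numpy as np

# Minimal sketch: for an ergodic O-U process with f(x) = x, the sqrt(T)-scaled
# deviation of the time average from pi(f) = b/beta settles to a fixed Gaussian
# spread as T grows.  All parameter values here are illustrative.
rng = np.random.default_rng(0)
b, beta, sigma, dt = 1.0, 2.0, 0.5, 0.01
for T in [50.0, 200.0]:
    nsteps, npaths = int(T/dt), 1000
    x = np.full(npaths, b/beta)              # start at the stationary mean
    s = np.zeros(npaths)
    for _ in range(nsteps):                  # Euler scheme for the O-U SDE
        x += (b - beta*x)*dt + sigma*np.sqrt(dt)*rng.standard_normal(npaths)
        s += x*dt
    print(T, np.sqrt(T)*np.std(s/T - b/beta))   # roughly constant in T
```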


We will apply a different approach here. Particularly, we will express $\int_0^t X(s)\,ds$ in the form of a local martingale plus a remainder term and apply the CLT for local martingales. Not only can we obtain a multidimensional CLT, but we can also calculate the covariance matrix of the limit explicitly, thanks to the affine structure.

In particular, we consider an ($\mathbb{R}^n$-valued) local martingale of the form
\[
V(t) \triangleq \int_0^t X(s)\,ds - ct + B(X(t) - X(0)) \tag{2.18}
\]
for some $c \in \mathbb{R}^n$ and $B \in \mathbb{R}^{n \times n}$. Then, by (2.1),
\begin{align*}
V(t) &= \int_0^t X(s)\,ds - ct + \int_0^t B(b - \beta X(s))\,ds + \int_0^t B\sigma(X(s))\,dW(s) + \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} Bz\,N_i(ds, dz) \\
&= \Big( Bb + \sum_{i=1}^K \lambda_i B\,\mathbb{E}Z^i - c \Big)t + \int_0^t \Big( I - B\beta + \sum_{i=1}^K B\,\mathbb{E}Z^i\kappa_i^\top \Big)X(s)\,ds \\
&\quad + \int_0^t B\sigma(X(s))\,dW(s) + \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} Bz\,\tilde{N}_i(ds, dz), \tag{2.19}
\end{align*}
where $I$ is the identity matrix, $\tilde{N}_i(ds, dz) \triangleq N_i(ds, dz) - \Lambda_i(X(s-))\,ds\,\varphi_i(dz)$ is the compensated random measure of $N_i(ds, dz)$, and $Z^i$ has distribution $\varphi_i$, $i = 1,\ldots,K$. Hence, if we choose $c$ and $B$ such that
\[
Bb + \sum_{i=1}^K \lambda_i B\,\mathbb{E}Z^i - c = 0, \qquad I - B\beta + \sum_{i=1}^K B\,\mathbb{E}Z^i\kappa_i^\top = 0, \tag{2.20}
\]
then $V(t)$ is a local martingale in $\mathbb{R}^n$. Assumption 5 implies that $\beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top$ is nonsingular, so
\[
B = \Big( \beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top \Big)^{-1}, \qquad c = B\Big( b + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^i \Big). \tag{2.21}
\]
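Once the model primitives are fixed, (2.21) is a single matrix inversion. A minimal Python sketch (toy two-dimensional parameters with $K = 1$ jump type; all values hypothetical):

```python
import numpy as np

# Minimal sketch of (2.20)-(2.21): solve for B and c given toy parameters
# (n = 2, K = 1).  All numbers below are hypothetical test values.
beta  = np.array([[2.0, 0.0], [0.5, 3.0]])
b     = np.array([1.0, 0.4])
lam   = 0.8                          # lambda_1
kappa = np.array([1.2, 0.0])         # kappa_1
EZ    = np.array([0.3, 0.1])         # E Z^1

B = np.linalg.inv(beta - np.outer(EZ, kappa))   # B = (beta - sum_i EZ^i kappa_i^T)^{-1}
c = B @ (b + lam*EZ)                            # c = B(b + sum_i lambda_i EZ^i)
print(B, c)
```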

Now that we have constructed a local martingale $V(t)$, we will apply the CLT for local martingales as follows.

Proposition 2.5. (Local Martingale CLT) Let $M = (M(t) : t \geq 0)$ be a local martingale in $\mathbb{R}^n$ and let $\langle M \rangle = (\langle M \rangle(t) : t \geq 0) \in \mathbb{R}^{n \times n}$ be the predictable quadratic covariation (see, for example, [83]) of $M$. Suppose that for each $T > 0$ and $i, j = 1,\ldots,n$,
\[
\lim_{t \to \infty} t^{-1}\langle M \rangle_{ij}(t) = \Sigma_{ij} \ \text{in probability}, \tag{2.22}
\]
for some $\Sigma \succeq 0$ in $\mathbb{R}^{n \times n}$, and
\[
\lim_{n \to \infty} \mathbb{E}\sup_{0 \leq t \leq nT} n^{-1}|M_i(t) - M_i(t-)|^2 = 0, \tag{2.23}
\]
and
\[
\lim_{n \to \infty} \mathbb{E}\sup_{0 \leq t \leq nT} n^{-1/2}|\langle M \rangle_{ij}(t) - \langle M \rangle_{ij}(t-)| = 0. \tag{2.24}
\]
Then,
\[
t^{-1/2}M(t) \Rightarrow \mathcal{N}(0, \Sigma)
\]
as $t \to \infty$, where $\mathcal{N}(0, \Sigma)$ is an $n$-dimensional Gaussian random variable with mean $0$ and covariance matrix $\Sigma$.

Proof. See Theorem 1.4 in Chapter 7 of [38].

Roughly speaking, Proposition 2.5 asserts that if i) $\langle M \rangle$ has an equilibrium (condition (2.22)), and ii) the jumps of either $M$ or $\langle M \rangle$ are neither too large nor too frequent (conditions (2.23) and (2.24)), then $M$ satisfies the CLT. We will now verify these conditions for $V$ one by one.
these conditions <strong>for</strong> V one by one.


Proposition 2.6. Let $\langle V \rangle$ be the predictable quadratic covariation of $V$. If $X$ is ergodic, then
\[
t^{-1}\langle V \rangle(t) \to \Gamma \ \text{a.s.}
\]
as $t \to \infty$ for some $\Gamma \succ 0$.

Proof. Note that, by (2.19) and (2.20),
\[
V(t) = \int_0^t B\sigma(X(s))\,dW(s) + \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} Bz\,\tilde{N}_i(ds, dz), \tag{2.25}
\]
where $B$ is given by (2.21). It follows that the predictable quadratic covariation is
\begin{align*}
\langle V \rangle(t) &= \int_0^t B\sigma(X(s))\sigma(X(s))^\top B^\top\,ds + \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} Bzz^\top B^\top\,\varphi_i(dz)\,\Lambda_i(X(s-))\,ds \\
&= B\Big( a + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^iZ^{i\top} \Big)B^\top t + \sum_{j=1}^n B\Big( \alpha^j + \sum_{i=1}^K \kappa_{i,j}\,\mathbb{E}Z^iZ^{i\top} \Big)B^\top \int_0^t X_j(s-)\,ds \\
&= B\Big( a + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^iZ^{i\top} \Big)B^\top t + \sum_{j=1}^m B\Big( \alpha^j + \sum_{i=1}^K \kappa_{i,j}\,\mathbb{E}Z^iZ^{i\top} \Big)B^\top \int_0^t X_j(s-)\,ds, \tag{2.26}
\end{align*}
since $\alpha^j = 0$ and $\kappa_{i,j} = 0$ for $j = m+1,\ldots,n$. For the calculation of the predictable quadratic covariation, we refer to [2] or [83]. The ergodicity of $X$ implies the following strong law of large numbers (SLLN):
\[
\frac{1}{t}\int_0^t f(X(s))\,ds \to \pi(f) \ \text{a.s.}
\]
as $t \to \infty$ for any nonnegative function $f$, where $\pi$ is the stationary distribution of $X$; see, for example, [74]. In particular, we have
\[
\frac{1}{t}\int_0^t X_j(s)\,ds \to \int_S x_j\,\pi(dx) \ \text{a.s.} \tag{2.27}
\]
as $t \to \infty$. It follows from (2.26) and (2.27) that
\[
t^{-1}\langle V \rangle(t) \to \Gamma \ \text{a.s.} \tag{2.28}
\]
as $t \to \infty$, where
\[
\Gamma = B\Big( a + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^iZ^{i\top} + \sum_{j=1}^m \Big( \alpha^j + \sum_{i=1}^K \kappa_{i,j}\,\mathbb{E}Z^iZ^{i\top} \Big)\int_S x_j\,\pi(dx) \Big)B^\top. \tag{2.29}
\]
Finally, $\Gamma \succ 0$ follows readily from the fact that $a \succeq 0$ and $\alpha^j \succeq 0$, $j = 1,\ldots,m$.

We now proceed to verify condition (2.23) for $V(t)$. First, we need to estimate the number of jumps in a fixed time interval.

Lemma 2.9. Let $\Psi = (\Psi(t) : t \geq 0)$ be an adapted counting process with intensity $\Lambda(t)$, for which $\mathbb{E}\big( \int_0^T \Lambda(s)\,ds \big) < \infty$. If $\Psi$ is nonexplosive, then $(\Psi(t) - \int_0^t \Lambda(s)\,ds : 0 \leq t \leq T)$ is a martingale.

Proof. See Theorems T8 and T9 of [14].

Lemma 2.10. If $X$ is ergodic, then
\[
\lim_{k \to \infty} \frac{\mathbb{E}N_i(kT)}{k} = T\,\mathbb{E}_\pi \Lambda_i(X(0))
\]
for all $T > 0$ and $i = 1,\ldots,K$.

Proof. Fix $T > 0$ and $i \in \{1,\ldots,K\}$. $X$ is ergodic, so
\[
\mathbb{E}X(t) \to \mathbb{E}_\pi X(0)
\]
as $t \to \infty$. Hence, for any $\epsilon > 0$, there exists $k_0 > 0$ such that
\[
\mathbb{E}\Lambda_i(X(s)) < \mathbb{E}_\pi \Lambda_i(X(0)) + \epsilon
\]
for all $s > k_0 T$, since $\Lambda_i(x)$ is affine in $x$. Moreover, the ergodicity of $X$ implies that $N$ is nonexplosive, from which and Lemma 2.9 it follows that
\begin{align*}
\frac{\mathbb{E}N_i(kT)}{k} &= \frac{1}{k}\mathbb{E}\int_0^{kT} \Lambda_i(X(s-))\,ds = \frac{1}{k}\mathbb{E}\int_0^{kT} \Lambda_i(X(s))\,ds \\
&= \frac{1}{k}\int_0^{kT} \mathbb{E}\Lambda_i(X(s))\,ds \quad \text{(by Fubini's theorem)} \\
&\leq \frac{1}{k}\int_0^{k_0T} \mathbb{E}\Lambda_i(X(s))\,ds + \frac{1}{k}\big( \mathbb{E}_\pi \Lambda_i(X(0)) + \epsilon \big)(kT - k_0T).
\end{align*}
Therefore,
\[
\limsup_{k \to \infty} \frac{\mathbb{E}N_i(kT)}{k} \leq T\big( \mathbb{E}_\pi \Lambda_i(X(0)) + \epsilon \big).
\]
Likewise, we can show
\[
\liminf_{k \to \infty} \frac{\mathbb{E}N_i(kT)}{k} \geq T\big( \mathbb{E}_\pi \Lambda_i(X(0)) - \epsilon \big).
\]
Sending $\epsilon \downarrow 0$ completes the proof.
≥ T (EπΛi(X(0)) − ɛ).<br />

Roughly speaking, Lemma 2.10 states that the expected number of jumps is proportional to the length of the time interval (so jumps do not occur "too often"). We are now ready to show the following result, verifying condition (2.23) for $V(t)$.

Proposition 2.7. Suppose that $X$ is ergodic and $\mathbb{E}\|Z^i\|^{2+\epsilon} < \infty$ for some $\epsilon > 0$. Then,
\[
\lim_{k \to \infty} \mathbb{E}\sup_{0 \leq t \leq kT} k^{-1}|V_l(t) - V_l(t-)|^2 = 0
\]
for $l = 1,\ldots,n$.

Proof. Let $(Z^i_j : j \geq 1)$ be a sequence of iid rv's with common distribution $\varphi_i$, and let $g(z) = Bz$, where $B$ is the matrix given by (2.21). It follows from the representation of $V(t)$ in (2.25) that the pure jump part of $V$ is
\[
\sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} g(z)\,N_i(ds, dz).
\]
Hence,
\begin{align*}
\mathbb{P}\Big( \sup_{0 \leq t \leq kT} k^{-1}|V_l(t) - V_l(t-)|^2 > x \Big)
&= \mathbb{P}\Big( \sup_{1 \leq i \leq K}\ \sup_{1 \leq j \leq N_i(kT)} k^{-1}g_l(Z^i_j)^2 > x \Big) \\
&= \mathbb{E}\Big[ \mathbb{P}\Big( \sup_{1 \leq i \leq K}\ \sup_{1 \leq j \leq N_i(kT)} k^{-1}g_l(Z^i_j)^2 > x \,\Big|\, N_i(kT), i = 1,\ldots,K \Big) \Big] \\
&\leq \mathbb{E}\Big[ \sum_{i=1}^K \sum_{j=1}^{N_i(kT)} \mathbb{P}\big( k^{-1}g_l(Z^i_j)^2 > x \,\big|\, N_i(kT), i = 1,\ldots,K \big) \Big] \\
&= \mathbb{E}\Big[ \sum_{i=1}^K \sum_{j=1}^{N_i(kT)} \mathbb{P}\big( k^{-1}g_l(Z^i_j)^2 > x \big) \Big] \\
&= \sum_{i=1}^K \mathbb{E}N_i(kT)\,\mathbb{P}\big( k^{-1}g_l(Z^i_1)^2 > x \big).
\end{align*}
It follows that for any $\delta > 0$,
\begin{align*}
\mathbb{E}\sup_{0 \leq t \leq kT} k^{-1}|V_l(t) - V_l(t-)|^2
&\leq \delta + \int_\delta^\infty \mathbb{P}\Big( \sup_{0 \leq t \leq kT} k^{-1}|V_l(t) - V_l(t-)|^2 > x \Big)\,dx \\
&\leq \delta + \int_\delta^\infty \sum_{i=1}^K \mathbb{E}N_i(kT)\,\mathbb{P}\big( k^{-1}g_l(Z^i_1)^2 > x \big)\,dx \\
&= \delta + \sum_{i=1}^K \mathbb{E}N_i(kT) \int_\delta^\infty \mathbb{P}\big( k^{-1}g_l(Z^i_1)^2 > x \big)\,dx \\
&\leq \delta + \sum_{i=1}^K \mathbb{E}N_i(kT) \int_\delta^\infty (kx)^{-(1+\frac{\epsilon}{2})}\,\mathbb{E}|g_l(Z^i_1)|^{2+\epsilon}\,dx.
\end{align*}
Therefore,
\[
\lim_{k \to \infty} \mathbb{E}\sup_{0 \leq t \leq kT} k^{-1}|V_l(t) - V_l(t-)|^2 \leq \delta
\]
by Lemma 2.10. Sending $\delta \downarrow 0$ finishes the proof.

Now, we are ready to prove the CLT for $X$.

Theorem 2.3. Suppose that $Z^i \in \mathbb{R}^n$ has distribution $\varphi_i$ and $\mathbb{E}\|Z^i\|^{2+\epsilon} < \infty$, for some $\epsilon > 0$ and all $i = 1,\ldots,K$. Then, under Assumptions 1 - 5,
\[
t^{1/2}\Big( \frac{1}{t}\int_0^t X(s)\,ds - c \Big) \Rightarrow \mathcal{N}(0, \Gamma)
\]
as $t \to \infty$, where
\begin{align*}
c &= B\Big( b + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^i \Big), \\
\Gamma &= B\Big( a + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^iZ^{i\top} + \sum_{j=1}^m \Big( \alpha^j + \sum_{i=1}^K \kappa_{ij}\,\mathbb{E}Z^iZ^{i\top} \Big)\int_S x_j\,\pi(dx) \Big)B^\top, \\
B &= \Big( \beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top \Big)^{-1}.
\end{align*}

Proof. Note that $\langle V \rangle(t)$ has continuous sample paths a.s., so condition (2.24) is trivially satisfied for $V(t)$. Under the current assumptions, it follows from Proposition 2.1 that $X$ is ergodic, so Propositions 2.6 and 2.7 respectively verify conditions (2.22) and (2.23) for $V(t)$. Therefore, by the local martingale CLT (Proposition 2.5), we have
\[
t^{-1/2}V(t) \Rightarrow \mathcal{N}(0, \Gamma)
\]
as $t \to \infty$, where $\Gamma$ is given by (2.29). It follows from the ergodicity of $X$ that
\[
X(t) \Rightarrow X(\infty)
\]
as $t \to \infty$, where $X(\infty)$ has the stationary distribution $\pi$. Hence, $t^{-1/2}X(t) \Rightarrow 0$ and thus
\[
t^{-1/2}X(t) \to 0 \ \text{in probability}
\]
as $t \to \infty$. Recall the representation (2.18),
\[
\int_0^t X(s)\,ds - ct = V(t) - B(X(t) - X(0)),
\]
from which it follows that
\[
t^{1/2}\Big( \frac{1}{t}\int_0^t X(s)\,ds - c \Big) \Rightarrow \mathcal{N}(0, \Gamma)
\]
as $t \to \infty$.

We clearly have the following corollary regarding the equilibrium of $X$.

Corollary 2.1. Under Assumptions 1 - 5,
\[
\frac{1}{t}\int_0^t X(s)\,ds \to \int_S x\,\pi(dx) = c \ \text{a.s.}
\]
as $t \to \infty$, where
\[
c = \Big( \beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top \Big)^{-1}\Big( b + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^i \Big).
\]

2.4 Characterization of Stationary Distribution

Theorem 2.4. Suppose that Assumptions 1, 2, 3, 4, and 5 hold. Let $Z^i$ be a rv with distribution $\varphi_i$, $i = 1,\ldots,K$. Assume that $\mathbb{E}\|Z^i\| < \infty$ for all $i = 1,\ldots,K$. Let $\psi(\theta) = \mathbb{E}_\pi e^{i\theta^\top X(0)}$ be the Fourier transform of the distribution $\pi$. Then $\psi$ satisfies the following first-order PDE
\[
f(\theta)\psi(\theta) + g(\theta)^\top \nabla\psi(\theta) = 0, \tag{2.30}
\]
with $\psi(0) = 1$, where
\[
f(\theta) = i\theta^\top b - \frac{1}{2}\theta^\top a\theta + \sum_{k=1}^K \lambda_k(\psi_k(\theta) - 1),
\]
and
\[
g(\theta)^\top = -\theta^\top \beta + \frac{1}{2}i\,\theta^\top \alpha\theta - i\sum_{k=1}^K (\psi_k(\theta) - 1)\kappa_k^\top,
\]
where $\theta^\top \alpha\theta \triangleq (\theta^\top \alpha^1\theta, \ldots, \theta^\top \alpha^n\theta)$ and $\psi_k(\theta) = \mathbb{E}e^{i\theta^\top Z^k}$.

Proof. Let $h(x) = e^{i\theta^\top x}$. Then, similarly to the derivation of (2.4), applying Itô's formula for complex-valued functions (see, for example, [83]), we obtain that
\[
h(X(t)) - h(X(0)) - \int_0^t (\mathcal{A}h)(X(s))\,ds = \int_0^t \nabla h(X(s-))^\top \sigma(X(s))\,dW(s) + \sum_{i=1}^K \int_0^t \int_S \big( h(X(s-) + z) - h(X(s-)) \big)\,\tilde{N}_i(ds, dz) \triangleq I_1 + I_2.
\]
Note that $\nabla h(x) = ih(x)\theta$, so
\begin{align*}
\mathbb{E}_\pi \int_0^t |\nabla h(X(s-))^\top \sigma(X(s))|^2\,ds
&= \mathbb{E}_\pi \int_0^t |h(X(s-))|^2\Big| \theta^\top a\theta + \sum_{j=1}^m \theta^\top \alpha^j\theta\,X_j(s-) \Big|\,ds \\
&\leq \mathbb{E}_\pi \int_0^t \Big( \theta^\top a\theta + \sum_{j=1}^m \theta^\top \alpha^j\theta\,X_j(s) \Big)\,ds \\
&= \theta^\top a\theta\,t + \sum_{j=1}^m \theta^\top \alpha^j\theta \int_0^t \mathbb{E}_\pi X_j(s)\,ds \\
&= \Big( \theta^\top a\theta + \sum_{j=1}^m \theta^\top \alpha^j\theta\,\mathbb{E}_\pi X_j(0) \Big)t < \infty,
\end{align*}
where the penultimate equality follows from Fubini's theorem since $X_j \geq 0$ for $j = 1,\ldots,m$. Therefore, $I_1$ is a $\mathbb{P}_\pi$-martingale.


On the other hand, note that $|h(x)| \leq 1$ for all $x$, so
\[
\mathbb{E}_\pi \int_0^t \int_S |h(X(s-) + z) - h(X(s-))|^2\,\varphi_i(dz)\,ds \leq \mathbb{E}_\pi \int_0^t \int_S 2\big( |h(X(s-) + z)|^2 + |h(X(s-))|^2 \big)\,\varphi_i(dz)\,ds \leq 4t < \infty.
\]
Hence, by Lemma 2.4, $I_2$ is a $\mathbb{P}_\pi$-martingale. It follows that
\[
\mathbb{E}_\pi h(X(t)) - \mathbb{E}_\pi \int_0^t (\mathcal{A}h)(X(s))\,ds = \mathbb{E}_\pi h(X(0)),
\]
and thus
\[
\mathbb{E}_\pi \int_0^t (\mathcal{A}h)(X(s))\,ds = 0 \tag{2.31}
\]
for any $t > 0$, since $\mathbb{E}_\pi h(X(t)) = \mathbb{E}_\pi h(X(0))$.

Moreover,
\begin{align*}
(\mathcal{A}h)(x) &= h(x)\Big( i\theta^\top(b - \beta x) - \frac{1}{2}\sum_{k,l=1}^n \Big( a_{kl} + \sum_{r=1}^n \alpha_{kl}^r x_r \Big)\theta_k\theta_l + \sum_{k=1}^K (\lambda_k + \kappa_k^\top x)\int (e^{i\theta^\top z} - 1)\,\varphi_k(dz) \Big) \\
&= h(x)\Big( \Big( i\theta^\top b - \frac{1}{2}\theta^\top a\theta + \sum_{k=1}^K \lambda_k(\psi_k(\theta) - 1) \Big) + \Big( -i\theta^\top \beta - \frac{1}{2}\theta^\top \alpha\theta + \sum_{k=1}^K (\psi_k(\theta) - 1)\kappa_k^\top \Big)x \Big). \tag{2.32}
\end{align*}
Hence, by Fubini's theorem,
\[
\mathbb{E}_\pi \int_0^t (\mathcal{A}h)(X(s))\,ds = \int_0^t \mathbb{E}_\pi (\mathcal{A}h)(X(s))\,ds = \int_0^t \mathbb{E}_\pi (\mathcal{A}h)(X(0))\,ds
\]
since $\mathbb{E}_\pi \|X(0)\| < \infty$. It then follows from (2.31) that
\[
\int_0^t \mathbb{E}_\pi (\mathcal{A}h)(X(0))\,ds = 0
\]
for any $t > 0$, implying that
\[
\mathbb{E}_\pi (\mathcal{A}h)(X(0)) = 0. \tag{2.33}
\]
Note that $\psi(\theta) = \mathbb{E}_\pi h(X(0))$ and that
\[
\nabla\psi(\theta) = i\,\mathbb{E}_\pi X(0)h(X(0)),
\]
which can be shown easily by the dominated convergence theorem. So the proof is completed by combining (2.32) and (2.33).

Remark 2.5. The identity (2.33) also appears in [35] and [50]. It is proved in [35] that if a probability measure $\eta$ satisfies
\[
\int \mathcal{A}f(x)\,\eta(dx) = 0
\]
for all bounded $C^2$ functions $f$, then $\eta$ is the stationary distribution. The same identity is used in [50] for establishing upper bounds on stationary expectations.

Remark 2.6. Note that the first-order PDE (2.30) is linear and thus can be solved by the method of characteristics (see, for example, [39]). In particular, it suffices to solve the following ODE system for $\theta = \theta(s)$ and $\delta = \delta(s)$:
\begin{align*}
\frac{d\theta^\top}{ds} &= -\theta^\top \beta + \frac{1}{2}i\,\theta^\top \alpha\theta - i\sum_{k=1}^K (\psi_k(\theta) - 1)\kappa_k^\top, \\
\frac{d\delta}{ds} &= i\theta^\top b - \frac{1}{2}\theta^\top a\theta + \sum_{k=1}^K \lambda_k(\psi_k(\theta) - 1).
\end{align*}
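The characteristic system is straightforward to integrate numerically. As a minimal sketch, consider the CIR specialization treated in Example 2.2 below ($n = m = 1$, $K = 0$, $a = 0$, $\alpha = \sigma^2$), where the system reduces to $d\theta/ds = -\beta\theta + \frac{1}{2}i\sigma^2\theta^2$ and $d\delta/ds = ib\theta$; integrating from $\theta(0) = \theta_0$ until $\theta(s) \to 0$ gives $\psi(\theta_0) = e^{\delta(\infty)}$. A hand-rolled RK4 on complex arithmetic suffices (parameter values are illustrative):

```python
import numpy as np

# Minimal sketch of Remark 2.6 for the CIR special case (K = 0, a = 0):
# integrate d(theta)/ds = -beta*theta + 0.5i*sigma^2*theta^2, d(delta)/ds =
# i*b*theta until theta -> 0; then psi(theta0) = exp(delta), which we compare
# with the closed form derived in Example 2.2 below.  Parameters illustrative.
b, beta, sigma, theta0 = 1.0, 2.0, 0.5, 1.3

def rhs(y):
    theta, delta = y
    return np.array([-beta*theta + 0.5j*sigma**2*theta**2, 1j*b*theta])

y, h = np.array([theta0 + 0j, 0j]), 1e-3
for _ in range(int(8/h)):                       # classical RK4 on s in [0, 8]
    k1 = rhs(y); k2 = rhs(y + h/2*k1); k3 = rhs(y + h/2*k2); k4 = rhs(y + h*k3)
    y += h/6*(k1 + 2*k2 + 2*k3 + k4)

print(np.exp(y[1]))                                         # psi from characteristics
print((1 - 1j*sigma**2*theta0/(2*beta))**(-2*b/sigma**2))   # closed form; they agree
```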

We now illustrate Theorem 2.4 with some examples.

Example 2.1. (O-U Process) The O-U process satisfies the SDE
\[
dX(t) = (b - \beta X(t))\,dt + \sigma\,dW(t).
\]
It is well known that the O-U process is a Gaussian process with covariance function
\[
\mathrm{Cov}(X(s), X(t)) = \frac{\sigma^2}{2\beta}\big( e^{-\beta(t-s)} - e^{-\beta(t+s)} \big)
\]
for $s < t$. Hence,
\[
\mathrm{Var}(X(t)) = \frac{\sigma^2}{2\beta}\big( 1 - e^{-2\beta t} \big).
\]
Moreover, it is easy to see that
\[
\mathbb{E}_x X(t) = x e^{-\beta t} + \frac{b}{\beta}(1 - e^{-\beta t}).
\]
Hence, the characteristic function of $X(t)$ conditional on $X(0) = x$ is
\[
\psi_{X(t)}(\theta) = \exp\Big( i\theta\Big( x e^{-\beta t} + \frac{b}{\beta}(1 - e^{-\beta t}) \Big) - \frac{\sigma^2}{4\beta}\big( 1 - e^{-2\beta t} \big)\theta^2 \Big).
\]
On the other hand, by Theorem 2.4, the characteristic function $\psi(\theta)$ of the equilibrium distribution satisfies
\[
\Big( ib\theta - \frac{1}{2}\sigma^2\theta^2 \Big)\psi(\theta) - \beta\theta\psi'(\theta) = 0.
\]
We can easily solve this first-order ODE to obtain
\[
\psi(\theta) = \exp\Big( \frac{ib}{\beta}\theta - \frac{\sigma^2}{4\beta}\theta^2 \Big).
\]
Obviously, for each $\theta$,
\[
\lim_{t \to \infty} \psi_{X(t)}(\theta) = \psi(\theta).
\]
Hence, the equilibrium distribution is Gaussian with mean $b/\beta$ and variance $\sigma^2/(2\beta)$, i.e. it has the density
\[
\sqrt{\frac{\beta}{\pi\sigma^2}}\,\exp\Big( -\frac{(\beta x - b)^2}{\beta\sigma^2} \Big), \quad x \in \mathbb{R}.
\]
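That this $\psi$ indeed solves (2.30) is easy to verify numerically; the following minimal sketch (illustrative parameter values) evaluates the ODE residual with a central difference:

```python
import numpy as np

# Sanity check of the O-U stationary characteristic function: the residual of
# (i*b*theta - 0.5*sigma^2*theta^2)*psi(theta) - beta*theta*psi'(theta) should
# vanish.  Parameter values are illustrative.
b, beta, sigma = 1.0, 2.0, 0.5

def psi(th):
    return np.exp(1j*b/beta*th - sigma**2/(4*beta)*th**2)

th = np.linspace(-3.0, 3.0, 7)
h = 1e-5
dpsi = (psi(th + h) - psi(th - h))/(2*h)          # central difference for psi'
residual = (1j*b*th - 0.5*sigma**2*th**2)*psi(th) - beta*th*dpsi
print(np.max(np.abs(residual)))                   # tiny: zero up to rounding
```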

Example 2.2. (CIR Process) The SDE for the CIR process is
\[
dX(t) = (b - \beta X(t))\,dt + \sigma\sqrt{X(t)}\,dW(t).
\]
It is well known (see [22]) that conditional on $X(0) = x$,
\[
X(t) \stackrel{D}{=} \frac{Y}{2c(t)},
\]
where
\[
c(t) = \frac{2\beta}{\sigma^2(1 - e^{-\beta t})},
\]
and $Y$ is a noncentral $\chi^2$ rv with $\frac{4b}{\sigma^2}$ degrees of freedom and noncentrality parameter $2c(t)e^{-\beta t}x$. Hence (see Chapter 29 of [57]), the characteristic function of $Y$ is
\[
\psi_Y(\theta) = (1 - 2\theta i)^{-\frac{2b}{\sigma^2}}\exp\Big( \frac{2c(t)e^{-\beta t}x\theta i}{1 - 2\theta i} \Big),
\]
and thus the characteristic function of $X(t)$ is
\[
\psi_{X(t)}(\theta) = \psi_Y\Big( \frac{\theta}{2c(t)} \Big) = \Big( 1 - \frac{\theta i}{c(t)} \Big)^{-\frac{2b}{\sigma^2}}\exp\Bigg( \frac{e^{-\beta t}x\theta i}{1 - \frac{\theta i}{c(t)}} \Bigg).
\]
On the other hand, Theorem 2.4 implies that the characteristic function $\psi(\theta)$ of the equilibrium distribution satisfies
\[
ib\theta\psi(\theta) + \Big( -\beta\theta + \frac{1}{2}i\sigma^2\theta^2 \Big)\psi'(\theta) = 0.
\]
We can easily solve this first-order ODE to obtain
\[
\psi(\theta) = \Big( 1 - \frac{i\sigma^2}{2\beta}\theta \Big)^{-\frac{2b}{\sigma^2}}.
\]
It is easy to see that for each $\theta$,
\[
\lim_{t \to \infty} \psi_{X(t)}(\theta) = \psi(\theta).
\]
Let $X(\infty)$ have the equilibrium distribution; then $\frac{4\beta}{\sigma^2}X(\infty)$ has a $\chi^2$ distribution with $\frac{4b}{\sigma^2}$ degrees of freedom. In particular, the equilibrium density is
\[
\frac{4\beta}{\sigma^2}\cdot\frac{1}{2^{\frac{2b}{\sigma^2}}\Gamma(\frac{2b}{\sigma^2})}\Big( \frac{4\beta x}{\sigma^2} \Big)^{\frac{2b}{\sigma^2}-1}\exp\Big( -\frac{2\beta x}{\sigma^2} \Big), \quad x \geq 0.
\]
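The convergence $\psi_{X(t)}(\theta) \to \psi(\theta)$ can also be observed numerically. A minimal sketch (illustrative parameters), showing the gap shrink roughly like $e^{-\beta t}$:

```python
import numpy as np

# Numerical check that the transient CIR characteristic function converges to
# the stationary one as t grows.  Parameter values are illustrative.
b, beta, sigma, x, theta = 1.0, 2.0, 0.5, 0.7, 1.1

def psi_t(t):
    c = 2*beta/(sigma**2*(1 - np.exp(-beta*t)))
    return (1 - 1j*theta/c)**(-2*b/sigma**2) \
        * np.exp(np.exp(-beta*t)*x*1j*theta/(1 - 1j*theta/c))

psi_inf = (1 - 1j*sigma**2*theta/(2*beta))**(-2*b/sigma**2)
for t in [0.5, 1.0, 2.0, 5.0, 10.0]:
    print(t, abs(psi_t(t) - psi_inf))   # gap shrinks roughly like e^{-beta t}
```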


Chapter 3

Affine Point Processes: Large Deviations

3.1 Introduction

The affine point process (APP) model belongs to the family of affine models and is broadly used in practice. An APP is a point process whose intensity is an affine function of an AJD. Notable examples include the Hawkes process ([51] and [52]) and the doubly stochastic process with a CIR intensity process ([31]). One reason for the wide applicability of APPs in practice is that they successfully capture the so-called "self-excitation" or "clustering" feature exhibited in many time series, for instance in seismology and earthquake modeling ([79] and [80]), credit derivatives pricing ([37]), risk management ([19]), high-frequency trading ([8]), social networks ([91]), and so forth.

The focus of this chapter is to explicitly calculate the large deviations asymptotics for APPs in the long-term horizon asymptotic regime. However, before studying its atypical behavior, we will characterize the typical behavior of an APP in terms of a CLT result, which also helps us identify the "rare-event" region of an APP.

Let $(L(t) : t \geq 0)$ be an APP. As will be seen later, $L(t)$ has roughly the same magnitude (at least on average) as a Markov additive functional of the form
\[
\int_0^t f(Y(s))\,ds \tag{3.1}
\]
where $(Y(s) : s \geq 0)$ is a Markov process and $f$ is a function. Hence, the large deviations behavior of $L(t)$ is essentially the same as that of the Markov additive functional (3.1). The large deviations behavior of Markov additive functionals has been extensively studied in the literature; see, for example, [75], [76], [64], [65], and references therein. However, one subtlety lies in whether $f$ is bounded or not.

When $f$ is bounded, the large deviations behavior of (3.1) is qualitatively the same as that of a random walk with light-tailed iid increments. In particular, one typically has that
\[
\mathbb{P}\Big( \int_0^t f(Y(s))\,ds > Rt \Big) \tag{3.2}
\]
decays to 0 exponentially fast as $t \to \infty$; in other words, the above tail probability is upper bounded by $e^{-ct}$ for some $c > 0$. However, when $f$ is unbounded, a large deviation of order $t$ from the equilibrium could be achieved in $o(t)$ time, as opposed to $O(t)$ time for bounded $f$. This suggests that the probability of such a large deviation could be of order $e^{-o(t)}$. This phenomenon is carefully discussed in [32] and [11], where the authors show that even for the simple M/M/1 queue length process, (3.2) does not exhibit exponential decay; in fact, it has Weibullian asymptotics (it decays subexponentially fast).

Nevertheless, our calculation shows that for an APP (in which case $f(Y) = Y$ and the Markov process $Y$ is an AJD), although $Y$ is unbounded, the "traditional" large deviations behavior holds. In other words, $\mathbb{P}(L(t) > Rt)$ does decay to 0 exponentially fast as $t \to \infty$. A significant difference between APPs and the M/M/1 queue length process is that APPs have a much faster relaxation speed (which measures how fast a path that has deviated from the equilibrium returns to it).

We close this chapter by applying the large deviations result to portfolio credit risk. Risk management is particularly concerned with rare but significant large-loss events. It is therefore of significant interest to study the estimation of the tail probability of the loss distribution. To be more specific, let $(L(t) : t \geq 0)$ be a multidimensional APP with each component denoting the accumulated default loss of an individual credit portfolio. Our goal is to accurately compute the tail of the probability distribution of $L(t)$.

A conventional approach is via the Fourier transform, which can be computed by solving a system of (generalized) Riccati ODEs; see [37]. The probability distribution of $L(t)$ can then be calculated by Fourier inversion, which typically involves evaluating the Fourier transform at a large number of points, and each such evaluation requires solving an ODE system. This induces large computational complexity. To address this numerical problem, [46] develops a saddlepoint approximation so that one only needs to evaluate the Fourier transform at one single point (the saddlepoint). However, their approximation requires the computation of up to the fourth partial derivatives of the cumulant generating function of $L(t)$, which involves solving an ODE system of size $O(n^4)$, where $n$ is the dimension of the underlying AJD. When $n$ is large, this approximation becomes computationally expensive. The authors also propose to use an approximate saddlepoint instead of the true saddlepoint, whose computation is the bottleneck of the whole algorithm. However, the approximate saddlepoint performs poorly when the probability of interest goes deep into the tails.

An alternative for computing the probability distribution of $L(t)$ is Monte Carlo simulation. There has been extensive research on asymptotic analysis as well as efficient simulation for portfolio credit risk. Most of the existing research, however, is focused on structural models (or threshold models), which essentially originate from Merton's seminal firm-value work ([70]). Both [58] and [61] suggest an importance sampling (IS) approach based on empirical studies for the single-factor Gaussian copula model. Large deviations asymptotics are developed in [48] for the loss distribution associated with the single-factor Gaussian copula model, providing theoretical support for the IS procedure by proving its logarithmic efficiency. Generalizing the preceding result, [44] establishes the large deviations asymptotics for the multi-factor Gaussian copula model, which is later applied to prove the logarithmic efficiency of the IS estimator proposed in [45]. One caveat of the Gaussian copula model is that it fails to capture extremal dependence, which roughly means that variables may simultaneously take on large values with nonnegligible probability. To address this problem, [7] proposes a t-copula model and derives sharp asymptotics for the associated loss distribution. In addition, they develop two IS estimators based on exponential twisting and hazard-rate twisting (see, for example, [59]), respectively. They show that the first IS algorithm has bounded relative error and the second is logarithmically efficient. Recently, [18] develops two efficient simulation schemes for the t-copula model based on conditional Monte Carlo and shows that the estimators have bounded relative error.

Despite the wide application of reduced-form models (or intensity-based models), research on asymptotic analysis and efficient simulation for rare-event probabilities in these models is surprisingly scarce. For example, [6] proposes a logarithmically efficient IS scheme to estimate the loss distribution in a doubly stochastic intensity model with affine structure; their work exploits the conditional independence of firm defaults in the doubly stochastic setting. The discussions in the most recent work [41] and [42] also concern self-exciting affine point processes, in the same large dimension asymptotic regime. However, the models they study are not "fully multidimensional", in the sense that the coefficient matrices are essentially diagonalizable and the correlation between different risk factors is introduced only through a single one-dimensional process. Our asymptotic analysis is developed in a more general setting. In particular, no diagonalizability of the coefficients of the affine point process is assumed. In addition, we work in a different asymptotic regime, namely the large time horizon regime.

3.2 Affine Point Processes

Let $X \in S = \mathbb{R}_+^m \times \mathbb{R}^{n-m}$, for some $0 \leq m \leq n$, be an $n$-dimensional AJD defined by (2.1). Then we call
\[
L(t) \triangleq \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} \zeta_i(z)\,N_i(ds, dz) \tag{3.3}
\]
an affine point process, where $\zeta_i : \mathbb{R}^n \to \mathbb{R}^q$, $i = 1,\ldots,K$.


In addition to Assumptions 1 - 5 used in Chapter 2, we will use the following assumptions in this chapter as well.

Assumption 6. The origin is in the interiors of both $\{\theta \in \mathbb{R}^q : \mathbb{E}e^{\theta^\top \zeta_i(Z^i)} < \infty\}$ and $\{u \in \mathbb{R}^n : \mathbb{E}e^{u^\top Z^i} < \infty\}$, where $Z^i$ has distribution $\varphi_i$, for $i = 1,\ldots,K$.

Remark 3.1. Assumption 6 asserts that the jump distributions are light-tailed, so that the exponential martingale introduced in Section 3.4.1 is well defined. This certainly yields that $\mathbb{E}\|Z^i\|^p < \infty$ for any $p > 0$ and each $i = 1,\ldots,K$.

Assumption 7. $\zeta_i : \mathbb{R}^n \to \mathbb{R}_+^q$, for $i = 1,\ldots,K$.

The structures of the parameters play a critical role in our subsequent analysis. To this end, we need the following concepts on matrices.

Definition 3.1. A matrix $A \in \mathbb{R}^{n \times n}$ is called a Z-matrix if $A$ has non-positive off-diagonal elements, i.e. $A_{ij} \leq 0$ for $1 \leq i \neq j \leq n$.

Definition 3.2. A positive stable Z-matrix is called an M-matrix.

We will use the following important property of M-matrices.

Lemma 3.1. Let $A$ be an $n \times n$ matrix. Suppose that $A = sI - B$ for some $s > 0$ and $B \in \mathbb{R}_+^{n \times n}$. Then $A$ is an M-matrix if and only if $s > \mathrm{sp}(B)$, the spectral radius of $B$.

Proof. It follows from Definition (1.2) on page 133 of [9] that $sI - B$ is an M-matrix if $s > \mathrm{sp}(B)$.

Now we show the other direction of the statement. Suppose that $A = sI - B$ is an M-matrix. Note that
\[
\det(tI - A) = \det((t - s)I + B),
\]
so $s - r$ is an eigenvalue of $A$ if $r$ is an eigenvalue of $B$. Since $B$ is a nonnegative matrix, we know that $\mathrm{sp}(B) \geq 0$ is an eigenvalue of $B$. So $s - \mathrm{sp}(B)$ is an eigenvalue of $A = sI - B$. It follows immediately that $s > \mathrm{sp}(B)$, since the eigenvalues of $A$ have positive real parts.
positive real parts.


Remark 3.2. The block lower triangular structure of $\beta$ implies that Assumption 3 is equivalent to the condition that both $\beta_{I,I}$ and $\beta_{J,J}$ are positive stable. Hence, $\beta_{I,I}$ is an M-matrix.

3.3 Central Limit Theorem

We have studied the stochastic stability of AJDs in Chapter 2. It is natural to speculate that an APP would exhibit certain equilibrium behavior when the associated AJD is "stable". We characterize the typical behavior of an APP in terms of the following central limit convergence:
\[
t^{-1/2}(L(t) - rt) \Rightarrow \mathcal{N}(0, \Sigma)
\]
as $t \to \infty$, for some $r \in \mathbb{R}^q$ and $\Sigma \succ 0$, where $\mathcal{N}(0, \Sigma)$ is a Gaussian rv with mean $0$ and covariance matrix $\Sigma$. One could use the same argument as in Section 2.3.3 for deriving the CLT for $X$, so we will omit the technical details and only calculate $r$ and $\Sigma$ explicitly.

Consider the local martingale of the form $U(t) \triangleq L(t) - rt + A(X(t) - X(0))$ for some $r \in \mathbb{R}^q$ and $A \in \mathbb{R}^{q \times n}$ that will be specified later. Then, by (2.1),
\begin{align*}
U(t) &= L(t) - rt + \int_0^t A(b - \beta X(s))\,ds + \int_0^t A\sigma(X(s))\,dW(s) + \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} Az\,N_i(ds, dz) \\
&= \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} (\zeta_i(z) + Az)\,\tilde{N}_i(ds, dz) + \int_0^t A\sigma(X(s))\,dW(s) \\
&\quad + \Big[ \sum_{i=1}^K \lambda_i\big( \mathbb{E}\zeta_i(Z^i) + A\,\mathbb{E}Z^i \big) + Ab - r \Big]t + \int_0^t \Big[ \sum_{i=1}^K \big( \mathbb{E}\zeta_i(Z^i) + A\,\mathbb{E}Z^i \big)\kappa_i^\top - A\beta \Big]X(s)\,ds.
\end{align*}
Hence, if we choose $r$ and $A$ such that
\[
\sum_{i=1}^K \lambda_i\big( \mathbb{E}\zeta_i(Z^i) + A\,\mathbb{E}Z^i \big) + Ab - r = 0, \qquad \sum_{i=1}^K \big( \mathbb{E}\zeta_i(Z^i) + A\,\mathbb{E}Z^i \big)\kappa_i^\top - A\beta = 0,
\]
then $U(t)$ is a local martingale in $\mathbb{R}^q$. Assumption 5 implies that $\beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top$ is invertible, so we can solve the above equations explicitly as follows:
\[
A = \Big( \sum_{i=1}^K \mathbb{E}\zeta_i(Z^i)\kappa_i^\top \Big)\Big( \beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top \Big)^{-1}, \qquad r = \sum_{i=1}^K \lambda_i\big( \mathbb{E}\zeta_i(Z^i) + A\,\mathbb{E}Z^i \big) + Ab. \tag{3.4}
\]
Next we calculate $\langle U \rangle(t)$. Note that
\[
U(t) = \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} (\zeta_i(z) + Az)\,\tilde{N}_i(ds, dz) + \int_0^t A\sigma(X(s))\,dW(s),
\]
where $A$ is given by (3.4). Therefore,
\begin{align*}
\langle U \rangle(t) &= \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} (\zeta_i(z) + Az)(\zeta_i(z) + Az)^\top\,\varphi_i(dz)\,\Lambda_i(X(s-))\,ds + \int_0^t A\sigma(X(s))\sigma(X(s))^\top A^\top\,ds \\
&= \sum_{i=1}^K C^i \int_0^t \big( \lambda_i + \kappa_i^\top X(s) \big)\,ds + \int_0^t A\Big( a + \sum_{j=1}^n \alpha^j X_j(s) \Big)A^\top\,ds \\
&= \Big( AaA^\top + \sum_{i=1}^K \lambda_i C^i \Big)t + \sum_{j=1}^n \Big[ A\alpha^j A^\top + \sum_{i=1}^K \kappa_{ij}C^i \Big]\int_0^t X_j(s)\,ds, \tag{3.5}
\end{align*}
where
\[
C^i = \mathbb{E}(\zeta_i(Z^i) + AZ^i)(\zeta_i(Z^i) + AZ^i)^\top, \quad i = 1,\ldots,K.
\]

If $X$ is ergodic, then Corollary 2.1 implies that
\[
\frac{1}{t}\int_0^t X(s)\,ds \to c \ \text{a.s.} \tag{3.6}
\]
as $t \to \infty$, where
\[
c = \Big( \beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top \Big)^{-1}\Big( b + \sum_{i=1}^K \lambda_i\,\mathbb{E}Z^i \Big).
\]
It follows from (3.5) and (3.6) that
\[
t^{-1}\langle U \rangle(t) \to \Sigma \ \text{a.s.}
\]
as $t \to \infty$, where
\[
\Sigma = AaA^\top + \sum_{i=1}^K \lambda_i C^i + \sum_{j=1}^n c_j\Big[ A\alpha^j A^\top + \sum_{i=1}^K \kappa_{ij}C^i \Big].
\]

Theorem 3.1. Suppose that $Z^i \in \mathbb{R}^n$ has distribution $\varphi_i$ and $\mathbb{E}\|Z^i\|^{2+\epsilon} < \infty$, for some $\epsilon > 0$ and all $i = 1,\ldots,K$. Then, under Assumptions 1 - 5,
\[
t^{-1/2}(L(t) - rt) \Rightarrow \mathcal{N}(0, \Sigma)
\]
as $t \to \infty$, where
\begin{align*}
r &= \sum_{i=1}^K \lambda_i\big( \mathbb{E}\zeta_i(Z^i) + A\,\mathbb{E}Z^i \big) + Ab, \\
\Sigma &= AaA^\top + \sum_{i=1}^K \lambda_i C^i + \sum_{j=1}^n c_j\Big[ A\alpha^j A^\top + \sum_{i=1}^K \kappa_{ij}C^i \Big], \\
A &= \Big( \sum_{i=1}^K \mathbb{E}\zeta_i(Z^i)\kappa_i^\top \Big)\Big( \beta - \sum_{i=1}^K \mathbb{E}Z^i\kappa_i^\top \Big)^{-1}, \\
C^i &= \mathbb{E}(\zeta_i(Z^i) + AZ^i)(\zeta_i(Z^i) + AZ^i)^\top, \quad i = 1,\ldots,K.
\end{align*}

Example 3.1. Consider a one-dimensional generalized Hawkes process $L$ satisfying
\[
dX(t) = (b - \beta X(t))\,dt + \sigma\sqrt{X(t)}\,dW(t) + dL(t), \tag{3.7}
\]
and $L(t) = \sum_{i=1}^{N(t)} Z_i$, where $N(t)$ is a counting process with intensity $X(t)$ and where the $Z_i$'s are iid rv's with common distribution $\varphi$. Let $Y(t) = (L(t), N(t))^\top$. Then,
\[
t^{-1/2}(Y(t) - rt) \Rightarrow \mathcal{N}(0, \Sigma)
\]
as $t \to \infty$, where
\[
r = \frac{b}{\beta - \mathbb{E}Z_1}\begin{pmatrix} \mathbb{E}Z_1 \\ 1 \end{pmatrix} \quad \text{and} \quad \Sigma = \frac{b(\beta^2 + \sigma^2)}{(\beta - \mathbb{E}Z_1)^3}\begin{pmatrix} \mathbb{E}Z_1^2 & \mathbb{E}Z_1 \\ \mathbb{E}Z_1 & 1 \end{pmatrix}.
\]
Hence, we have the following approximations
\[
L(T) \stackrel{D}{\approx} r_1 T + \mathcal{N}(0, \Sigma_{1,1}T), \qquad N(T) \stackrel{D}{\approx} r_2 T + \mathcal{N}(0, \Sigma_{2,2}T)
\]
for $T$ large, where $\stackrel{D}{\approx}$ means "approximately equal to in distribution". Let us see a numerical example, where we choose the parameters as follows: $T = 100$, $b = 3.5$, $\beta = 5$, $\sigma = 0.2$, and $\varphi$ is uniform on $\{0.4, 0.6, 0.8, 1.0\}$. We generate 1000 sample paths to estimate $(L(T), N(T))$ and compare against the above approximations. Figure 3.1 shows a simulated path of $(X(t), L(t))$. The results are shown in Figure 3.2 and Table 3.1.

Table 3.1: Theoretical vs. Estimated Mean/Variance

                                   L(T)                         N(T)
                           Mean         Variance        Mean         Variance
                           (r_1 T)      (Sigma_11 T)    (r_2 T)      (Sigma_22 T)
  Theoretical approx.      56.9767      59.5238         81.3953      110.2293
  Estimation               57.4710      59.1638         82.1990      109.1025
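The theoretical row of Table 3.1 can be reproduced directly from the closed forms above; the only inputs are the parameter values of the numerical example:

```python
import numpy as np

# Reproduce the "theoretical approximation" row of Table 3.1 from the closed
# forms of Example 3.1 (uniform marks on {0.4, 0.6, 0.8, 1.0}).
T, b, beta, sigma = 100.0, 3.5, 5.0, 0.2
zvals = np.array([0.4, 0.6, 0.8, 1.0])
m1, m2 = zvals.mean(), (zvals**2).mean()      # E Z_1 and E Z_1^2

r = b/(beta - m1)*np.array([m1, 1.0])
Sigma = b*(beta**2 + sigma**2)/(beta - m1)**3*np.array([[m2, m1], [m1, 1.0]])
print(r*T)                    # [56.9767, 81.3953]
print(np.diag(Sigma)*T)       # [59.5238, 110.2293]
```

A crude way to regenerate the estimated row is to discretize (3.7) and thin on each Euler step. This is only a sketch (the time step and path count below are chosen for speed, not accuracy, and multiple events per step are ignored):

```python
import numpy as np

# Crude discretized simulation of (3.7): an event occurs on [t, t+dt) with
# probability X(t)*dt; on an event, a mark Z is drawn and added to both L and X.
rng = np.random.default_rng(1)
T, b, beta, sigma, dt = 100.0, 3.5, 5.0, 0.2, 1e-2
zvals = np.array([0.4, 0.6, 0.8, 1.0])
nsteps, npaths = int(T/dt), 1000
L, N = np.zeros(npaths), np.zeros(npaths)
X = np.full(npaths, b/(beta - zvals.mean()))  # start near the stationary mean
for _ in range(nsteps):
    jump = rng.random(npaths) < X*dt          # thinning on one Euler step
    Z = rng.choice(zvals, npaths)*jump
    X += (b - beta*X)*dt \
        + sigma*np.sqrt(np.maximum(X, 0)*dt)*rng.standard_normal(npaths) + Z
    L += Z; N += jump
print(L.mean(), L.var(), N.mean(), N.var())   # compare with Table 3.1
```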

Figure 3.1: Simulated Sample Path of X(t) and L(t). [figure]

Figure 3.2: Histogram vs. Fitted Probability Density. Left: L(T); Right: N(T). [figure]

3.4 Large Deviations

We are interested in the probability $\mathbb{P}(L(t) > x)$ for $x$ large. One mathematical treatment is to change the form of the problem and consider $\mathbb{P}(L(t) > Rt)$. When $t$ is "moderate" (so that $Rt$ is not too large), one could use the central limit approximation as discussed in the last section. On the other hand, when $t$ is large, the asymptotics of $\mathbb{P}(L(t) > Rt)$ fall into the regime that can be treated by the Gärtner-Ellis theorem (see, for example, [24]).

Note that Theorem 3.1 indicates that $t^{-1}L(t) \to r$ as $t \to \infty$. Hence,
\[
\mathbb{P}(L(t) \geq Rt) \to 0
\]
as $t \to \infty$ for $R > r$, i.e. $(\{L(t) \geq Rt\} : t \geq 0)$ is a family of rare events. In this section, we will compute the asymptotics
\[
\lim_{t \to \infty} \frac{1}{t}\log\mathbb{P}(L(t) \geq Rt). \tag{3.8}
\]
The Gärtner-Ellis theorem provides a mechanism for computing the above logarithmic asymptotics. In particular, the computation of
\[
\lim_{t \to \infty} t^{-1}\log\mathbb{E}\exp(\theta^\top L(t)) \tag{3.9}
\]
plays a key role in the calculation.

3.4.1 A Class of Exponential Martingales

In order to compute (3.9), we first attempt to construct a martingale of the form
\[
M(t) = M(t, \theta) = \exp\big[ \theta^\top L(t) - \phi t + h(X(t)) - h(X(0)) \big]. \tag{3.10}
\]
Both $\phi$ and $h : S \to \mathbb{R}$ clearly depend on the choice of $\theta$, but we choose (temporarily) to suppress the dependence on $\theta$ for notational simplicity. The affine structure of $L$ and $X$ suggests that we take $h$ of the form $h(x) = u^\top x$ for some suitable $u \in \mathbb{R}^n$ that depends on $\theta$.

Let $Y(t) = \theta^\top L(t) - \phi t + u^\top(X(t) - X(0))$. Itô's formula establishes that
\[
M(t) = 1 + \int_0^t M(s-)\,dY^c(s) + \frac{1}{2}\int_0^t M(s-)\,d[Y]^c(s) + \sum_{0 < s \leq t} \big( M(s) - M(s-) \big), \tag{3.11}
\]


Plugging (3.12), (3.13), and (3.14) into (3.11) yields that
\begin{align*}
M(t) &= 1 + \int_0^t M(s-)\Big[ u^\top b - \phi + \frac{1}{2}u^\top au + \sum_{i=1}^K \lambda_i\big( \mathbb{E}e^{\theta^\top\zeta_i(Z^i) + u^\top Z^i} - 1 \big) \Big]\,ds \\
&\quad + \int_0^t M(s-)\Big[ \sum_{i=1}^K \big( \mathbb{E}e^{\theta^\top\zeta_i(Z^i) + u^\top Z^i} - 1 \big)\kappa_i^\top - u^\top \beta \Big]X(s)\,ds + \frac{1}{2}\sum_{j=1}^n u^\top \alpha^j u\int_0^t M(s-)X_j(s)\,ds \\
&\quad + \int_0^t M(s-)u^\top \sigma(X(s))\,dW(s) + \int_0^t \int_{\mathbb{R}^n} M(s-)\sum_{i=1}^K \big( e^{\theta^\top\zeta_i(z) + u^\top z} - 1 \big)\,\tilde{N}_i(ds, dz).
\end{align*}
Evidently, $M(t)$ is a local martingale if we choose $\phi \in \mathbb{R}$ and $u \in \mathbb{R}^n$ such that
\[
u^\top b - \phi + \frac{1}{2}u^\top au + \sum_{i=1}^K \lambda_i\big( \mathbb{E}e^{\theta^\top\zeta_i(Z^i) + u^\top Z^i} - 1 \big) = 0 \tag{3.15}
\]
and
\[
\sum_{l=1}^n u_l\beta_{l,j} - \frac{1}{2}u^\top \alpha^j u - \sum_{i=1}^K \big( \mathbb{E}e^{\theta^\top\zeta_i(Z^i) + u^\top Z^i} - 1 \big)\kappa_{i,j} = 0, \quad j = 1,\ldots,n. \tag{3.16}
\]
Note that $\alpha^j = 0$ and $\kappa_{i,j} = 0$ for $i = 1,\ldots,K$, $j = m+1,\ldots,n$, and that $\beta_{l,j} = 0$ for $l = 1,\ldots,m$ and $j = m+1,\ldots,n$. So it follows from (3.16) that
\[
\sum_{l=m+1}^n u_l\beta_{l,j} = 0, \quad j = m+1,\ldots,n,
\]
which, written in matrix form, is equivalent to
\[
u_J^\top \beta_{J,J} = 0.
\]
It follows from Remark 3.2 that $\beta_{J,J}$ is nonsingular. Therefore, $u_J = 0$, i.e. $u_l = 0$ for $l = m+1,\ldots,n$.

Remark 3.3. Here is a heuristic interpretation of the fact that $u_J = u_J(\theta) \equiv 0$ for all $\theta$. The magnitude of $L(t)$ is (at least on average) roughly the same as that of its compensator
\[
\sum_{i=1}^K \Big( \int_0^t \Lambda_i(X(s))\,ds \Big)\int_{\mathbb{R}^n} \zeta_i(z)\,\varphi_i(dz),
\]
which does not depend on $X_J(t)$, i.e. $X_l(t)$, $l = m+1,\ldots,n$. Hence, only $X_l(t)$, $l = 1,\ldots,m$, are needed to "offset" the randomness of $L(t)$, and thus $u_l \equiv 0$ for $l = m+1,\ldots,n$.

We can simplify (3.15) and (3.16) even further. By the assumptions on the structures of $\alpha$ and $a$, (3.15) and (3.16) can be simplified to
\[
u^\top b - \phi + \sum_{i=1}^K \lambda_i\big( \mathbb{E}e^{\theta^\top\zeta_i(Z^i) + u^\top Z^i} - 1 \big) = 0 \tag{3.17}
\]
and
\[
\sum_{l=1}^m u_l\beta_{l,j} - \frac{1}{2}\alpha_{j,j}^j u_j^2 - \sum_{i=1}^K \big( \mathbb{E}e^{\theta^\top\zeta_i(Z^i) + u^\top Z^i} - 1 \big)\kappa_{i,j} = 0, \quad j = 1,\ldots,m. \tag{3.18}
\]
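Equations (3.17)-(3.18) reduce the construction of the exponential martingale to root-finding. As a concrete illustration, the following minimal sketch solves (3.18) in a hypothetical one-dimensional specialization ($m = n = K = 1$, $\lambda_1 = 0$, $\kappa_1 = 1$, $\zeta_1(z) \equiv 1$, so $L = N$ simply counts events, and the jump sizes are uniform on four values); the bracketed root is the smaller one:

```python
import numpy as np
from scipy.optimize import brentq

# Sketch: solve (3.18) for u(theta) in a hypothetical 1-d Hawkes setting
# (lambda_1 = 0, kappa_1 = 1, zeta(z) = 1), then read phi(theta) = b*u(theta)
# off (3.17).  All parameter values are illustrative.
b, beta, sigma = 3.5, 5.0, 0.2
zvals = np.array([0.4, 0.6, 0.8, 1.0])        # jump sizes, uniform distribution

def mgf(u):                                    # E e^{u Z}
    return np.mean(np.exp(u*zvals))

def eq318(u, theta):                           # left-hand side of (3.18)
    return beta*u - 0.5*sigma**2*u**2 - (np.exp(theta)*mgf(u) - 1.0)

theta = 0.1
u = brentq(eq318, 0.0, 2.0, args=(theta,))     # smaller root of (3.18)
phi = b*u                                      # (3.17) with lambda_1 = 0
print(u, phi)                  # phi is the candidate limiting CGF value at theta
```

As Remark 3.4 below explains, (3.18) may admit several roots, and the smaller branch is the one associated with stochastic stability under the change of measure.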

(3.17) asserts that $\phi = \phi(\theta)$ is computable from $u = u(\theta)$. Suppose for now that (3.18) has a solution pair $(\theta, u)$. Consider another parameter set $(a, \alpha, b, \bar{\beta}, \bar{\lambda}, \bar{\kappa})$ as follows. Define $\bar{\lambda} \in \mathbb{R}_+^K$ and $\bar{\kappa} \in \mathbb{R}_+^{K \times n}$ such that
\[
\bar{\lambda}_i = \lambda_i \int_{\mathbb{R}^n} e^{\theta^\top\zeta_i(z) + u^\top z}\,\varphi_i(dz) \tag{3.19}
\]
and
\[
\bar{\kappa}_i = \kappa_i \int_{\mathbb{R}^n} e^{\theta^\top\zeta_i(z) + u^\top z}\,\varphi_i(dz), \tag{3.20}
\]
where $\kappa_i^\top$ and $\bar{\kappa}_i^\top$ are the $i$-th rows of $\kappa$ and $\bar{\kappa}$, respectively, for $i = 1,\ldots,K$. Moreover, define $\bar{\beta} \in \mathbb{R}^{n \times n}$ such that
\[
\bar{\beta}_j = \beta_j - \alpha^j u, \quad j = 1,\ldots,n,
\]
where $\beta_j$ and $\bar{\beta}_j$ are the $j$-th columns of $\beta$ and $\bar{\beta}$, respectively. Note that $\alpha^j = 0$ and $u_j = 0$ for $j = m+1,\ldots,n$, from which it follows immediately that
\[
\bar{\beta} = \begin{pmatrix} \beta_{I,I} - \mathrm{diag}(\alpha_{1,1}^1 u_1, \ldots, \alpha_{m,m}^m u_m) & 0 \\ \beta_{J,I} & \beta_{J,J} \end{pmatrix}. \tag{3.21}
\]
So
\[
\bar{\beta}_{I,I} = \beta_{I,I} - \mathrm{diag}(\alpha_{1,1}^1 u_1, \ldots, \alpha_{m,m}^m u_m)
\]
is a Z-matrix, since $\beta_{I,I}$ is a Z-matrix. Therefore, we have the following proposition.

Proposition 3.1. Suppose $(\theta, u_I)$ solves (3.18). Then, under Assumptions 1 and 6, $(a, \alpha, b, \bar{\beta}, \bar{\lambda}, \bar{\kappa})$ is admissible, where $\bar{\beta}$, $\bar{\lambda}$, and $\bar{\kappa}$ are respectively defined by (3.21), (3.19), and (3.20).
(3.19), <strong>and</strong> (3.20).<br />

Note that, by (3.15) and (3.16),
\[
dM(t) = M(t-)\Big( u^\top \sigma(X(t))\,dW(t) + \sum_{i=1}^K \int_{\mathbb{R}^n} \big( e^{\theta^\top\zeta_i(z) + u^\top z} - 1 \big)\,\tilde{N}_i(dt, dz) \Big). \tag{3.22}
\]
Hence,
\begin{align*}
M(t) = \exp\Big( &\int_0^t u^\top \sigma(X(s))\,dW(s) - \frac{1}{2}\int_0^t u^\top \sigma(X(s))\sigma^\top(X(s))u\,ds \\
&+ \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} \big( \theta^\top\zeta_i(z) + u^\top z \big)\,N_i(ds, dz) \\
&- \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} \big( e^{\theta^\top\zeta_i(z) + u^\top z} - 1 \big)\,\varphi_i(dz)\,\Lambda_i(X(s))\,ds \Big). \tag{3.23}
\end{align*}

Proposition 3.2. Suppose that (3.18) has a solution pair $(\theta, u_I)$. Let $u = (u_I; 0) \in \mathbb{R}^n$. Then, under Assumptions 1 and 6, $(M(t) : t \in [0, T])$ defined by (3.23) is a martingale for each $T > 0$.

Proof. The proof is essentially the same as that given in [21] in the context of a generic jump diffusion with a possible explosion. A similar idea was also developed in [98] for diffusion processes. We provide here a simplified version adapted to APPs.


It follows from (3.22) and (3.23) that $M(t)$ is a positive local martingale, and thus a supermartingale. Consequently, it suffices to show that $\mathbb{E}M(T) = 1$. Let $(a, \alpha, b, \bar{\beta}, \bar{\lambda}, \bar{\kappa})$ be defined by (3.19), (3.20), and (3.21). Moreover, let
\[
\bar{\varphi}_i(dz) = \frac{e^{\theta^\top\zeta_i(z) + u^\top z}\,\varphi_i(dz)}{\int_{\mathbb{R}^n} e^{\theta^\top\zeta_i(y) + u^\top y}\,\varphi_i(dy)} \tag{3.24}
\]
for $i = 1,\ldots,K$. Set
\[
\mu(x) = b - \beta x + \sigma(x)\sigma(x)^\top u.
\]
Note that
\[
\sigma(x)\sigma^\top(x)u = \Big( a + \sum_{j=1}^n x_j\alpha^j \Big)u = \sum_{j=1}^n x_j\alpha^j u
\]
by the fact that $u_j = 0$ for $j = m+1,\ldots,n$ and the assumption on the structure of $a$. Hence, $\mu(x) = b - \bar{\beta}x$.

Now consider the AJD $\bar{X}(t)$ satisfying
\[
d\bar{X}(t) = \mu(\bar{X}(t))\,dt + \sigma(\bar{X}(t))\,dW(t) + \sum_{i=1}^K \int_{\mathbb{R}^n} z\,\bar{N}_i(dt, dz), \tag{3.25}
\]
where $\bar{N}_i(dt, dz)$ is a counting random measure on $[0, \infty) \times \mathbb{R}^n$ with compensator $\bar{\Lambda}_i(\bar{X}(t))\,dt\,\bar{\varphi}_i(dz)$, where $\bar{\Lambda}_i(x) = \bar{\lambda}_i + \bar{\kappa}_i^\top x$, for $i = 1,\ldots,K$.

For each $k \geq 1$, we define the stopping times
\begin{align*}
\tau_k &= \inf\{t > 0 : \|X(t-)\| \geq k \ \text{or} \ \|X(t)\| \geq k\}, \\
\bar{\tau}_k &= \inf\{t > 0 : \|\bar{X}(t-)\| \geq k \ \text{or} \ \|\bar{X}(t)\| \geq k\}.
\end{align*}
Both $X(t)$ and $\bar{X}(t)$ are nonexplosive by Lemma 2.1. Hence, these stopping times satisfy
\[
\lim_{k \to \infty} P(\tau_k \geq T) = \lim_{k \to \infty} P(\bar{\tau}_k \geq T) = 1. \tag{3.26}
\]
For each $k$, let $X^{\tau_k}(t) = X(t)\mathbb{I}(t < \tau_k)$ be the stopped process associated with $(\tau_k : k \geq 1)$, and let $M^k(t)$ be the exponential local martingale obtained by replacing $X(t)$ with $X^{\tau_k}(t)$ in (3.23).


Note that
\begin{align*}
&\mathbb{E}\exp\Big( \frac{1}{2}\int_0^t u^\top \sigma(X^{\tau_k}(s))\sigma^\top(X^{\tau_k}(s))u\,ds + \sum_{i=1}^K \int_0^t \int_{\mathbb{R}^n} y_i(z)\,\varphi_i(dz)\,\Lambda_i(X^{\tau_k}(s))\,ds \Big) \\
&= \mathbb{E}\exp\Big( \frac{1}{2}\sum_{j=1}^m \alpha_{j,j}^j u_j^2 \int_0^t X_j^{\tau_k}(s)\,ds + \sum_{i=1}^K \mathbb{E}y_i(Z^i)\int_0^t \big( \lambda_i + \kappa_i^\top X^{\tau_k}(s) \big)\,ds \Big) < \infty,
\end{align*}
where
\[
y_i(z) = e^{\theta^\top\zeta_i(z) + u^\top z}\big( \theta^\top\zeta_i(z) + u^\top z - 1 \big) + 1
\]
for $i = 1,\ldots,K$, since $X^{\tau_k}(s)$ is bounded. It then follows from Théorème IV.3 of [67] that $(M^k(t) : t \in [0, T])$ is a martingale. Hence, for each $k \geq 1$, $M^k(t)$ induces a probability measure $Q^k$ equivalent to $P$, defined by $\frac{dQ^k}{dP}\big|_{\mathcal{F}_t} = M^k(t)$ for $t \in [0, T]$. It follows from Girsanov's theorem that for each $k \geq 1$,
\[
W^k(t) = W(t) - \int_0^t \sigma^\top(X^{\tau_k}(s))u\,ds
\]
is a standard $n$-dimensional Brownian motion under $Q^k$. In addition, $N_i(dt, dz)$ has compensator $\bar{\Lambda}_i(X^{\tau_k}(t))\,dt\,\bar{\varphi}_i(dz)$ under $Q^k$ for each $i = 1,\ldots,K$.

It is easy to see that
\[
dX(t) = \mu(X(t))\,dt + \sigma(X(t))\,dW^k(t) + \sum_{i=1}^K \int_{\mathbb{R}^n} z\,N_i(dt, dz) \tag{3.27}
\]
for $t \in [0, \tau_k)$. By comparing (3.25) with (3.27), we conclude that $(X(t) : t \in [0, \tau_k))$ under $Q^k$ has the same distribution as $(\bar{X}(t) : t \in [0, \bar{\tau}_k))$ under $P$. Therefore, by (3.26),
\[
\mathbb{E}M^k(T)\mathbb{I}_{\{\tau_k \geq T\}} = Q^k(\tau_k \geq T) = P(\bar{\tau}_k \geq T) \to 1
\]
as $k \to \infty$. Moreover, note that
\[
\mathbb{E}M^k(T)\mathbb{I}_{\{\tau_k \geq T\}} = \mathbb{E}M(T)\mathbb{I}_{\{\tau_k \geq T\}} \to \mathbb{E}M(T)\mathbb{I}_{\{\tau_\infty \geq T\}}
\]
as $k \to \infty$ by the monotone convergence theorem, where
\[
\tau_\infty \triangleq \inf\{t > 0 : \|X(t)\| = \infty \ \text{or} \ \|X(t-)\| = \infty\}.
\]
The non-explosiveness of $X$ implies that $\tau_\infty = \infty$ $P$-a.s. Therefore, we conclude that $\mathbb{E}M(T) = 1$.

Remark 3.4. Note that (3.18) may have multiple solutions $u$ for a given $\theta$. Proposition 3.2 indicates that $M$ is a martingale for any solution pair $(\theta, u)$. This is a surprising result, because one might expect that there exists a unique solution $u$ for a given $\theta$ for which $M$ is a martingale, while the other solutions make $M$ a strictly local martingale. In that case, one would be able to identify the "appropriate" solution branch $u(\theta)$ by verifying the martingality of $M$. In the current setting, we will show that the "appropriate" solution branch $u(\theta)$ turns out to be the one that makes $X$ possess a certain stochastic stability property under the change of measure.

The above observation generalizes to other additive functionals of affine processes. We will use the following simple example for illustration.

Example 3.2. Let X be a CIR process satisfying

    dX(t) = (b − βX(t)) dt + σ√X(t) dW(t),

with b, β, σ > 0. X is clearly nonnegative. For each θ ∈ R, consider the following exponential martingale

    M(t) = exp( θ ∫_0^t X(s) ds − φt + u(X(t) − X(0)) )

with φ and u to be determined. Applying Itô's formula,

    dM(t) = M(t) [ (θ − βu + (1/2)σ²u²)X(t) dt + (bu − φ) dt + σu√X(t) dW(t) ],

which implies that φ = bu and

    (1/2)σ²u² − βu + θ = 0.

Hence, there are two solutions u1 and u2 if θ < β²/(2σ²), i.e.

    u1 = (β + √(β² − 2θσ²))/σ²,    u2 = (β − √(β² − 2θσ²))/σ².

We use the subscript i to indicate the quantities associated with the solution ui, i = 1, 2. A direct application of Girsanov's theorem yields that X satisfies

    dX(t) = (b − β̃X(t)) dt + σ√X(t) dW̃(t),

where W̃ is a standard Brownian motion under the probability measure Q induced by M, and

    β̃ = β − uσ² = ∓√(β² − 2θσ²).

Since β̃1 < 0, X is not mean-reverting under Q1 and hence is transient. On the other hand, β̃2 > 0, so X is recurrent under Q2. So

    E exp( θ ∫_0^t X(s) ds ) = e^{φ2 t} E^{Q2} exp(−u2(X(t) − X(0))),

and it can be shown that

    E^{Q2} exp(−u2(X(t) − X(0))) = O(1)    (3.28)

as t → ∞, implying

    lim_{t→∞} (1/t) log E exp( θ ∫_0^t X(s) ds ) = φ2.
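This conclusion can be checked numerically. The sketch below is a minimal Monte Carlo check, assuming a full-truncation Euler scheme and illustrative parameter values; the finite horizon t only approximates the t → ∞ limit, so the two printed numbers should merely be close, not equal.

    import numpy as np

    def cir_cgf_check(b=1.0, beta=2.0, sigma=0.5, theta=0.3, x0=0.5,
                      t=10.0, dt=1e-2, n_paths=20000, seed=0):
        # Compare the Monte Carlo estimate of (1/t) log E exp(theta * int_0^t X ds)
        # with phi_2 = b * u_2, where u_2 is the smaller root of
        # (1/2) sigma^2 u^2 - beta u + theta = 0.
        rng = np.random.default_rng(seed)
        disc = beta ** 2 - 2.0 * theta * sigma ** 2
        assert disc > 0.0, "need theta < beta^2 / (2 sigma^2)"
        u2 = (beta - np.sqrt(disc)) / sigma ** 2
        phi2 = b * u2
        x = np.full(n_paths, x0)
        integral = np.zeros(n_paths)
        for _ in range(int(t / dt)):
            integral += x * dt
            dw = rng.normal(0.0, np.sqrt(dt), n_paths)
            # full-truncation Euler step for the CIR dynamics
            x = np.maximum(x + (b - beta * x) * dt
                           + sigma * np.sqrt(np.maximum(x, 0.0)) * dw, 0.0)
        mc = np.log(np.mean(np.exp(theta * integral))) / t
        return mc, phi2

    print(cir_cgf_check())   # the two numbers should be close for moderate t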

Remark 3.5. We need to point out a subtlety in (3.28). If θ > 0, then u2 > 0 and thereby exp(−u2X(t)) is bounded for all t > 0. The ergodicity of X then naturally yields (3.28). However, if θ < 0, then u2 < 0. In this case, exp(−u2X(t)) is unbounded, and we need a stronger form of stochastic stability than ergodicity, namely exponential ergodicity, to ensure (3.28). An example illustrating the problem of multiple solutions for the one-dimensional Hawkes process appears in [100].

3.4.2 Limiting Cumulant Generating Function

As discussed at the beginning of Section 3.4.1, a key step in establishing the logarithmic asymptotics (3.8) is to compute the limiting cumulant generating function (CGF)

    lim_{t→∞} (1/t) log E exp(θ⊺L(t)).

By Proposition 3.2,

    M(t) = exp(θ⊺L(t) − φt + u⊺(X(t) − X(0)))

is a martingale, where φ = φ(θ) and u = u(θ) are solved from (3.17) and (3.18). It follows that

    E exp(θ⊺L(t) − φt) = E^Q exp(−u⊺(X(t) − X(0))),

where Q is the equivalent probability measure induced by M(t), i.e. dQ/dP|_{Ft} = M(t). Recall the discussion in Example 3.2. If we can show that

    E^Q exp(−u⊺(X(t) − X(0))) = O(1),    (3.29)

then we obviously have

    φ(θ) = lim_{t→∞} (1/t) log E exp(θ⊺L(t)).    (3.30)

To facilitate our discussion of the properties of u(θ) and φ(θ), let Fj(θ, uI) denote the LHS of the j-th equation of (3.18) and F(θ, uI) = (F1(θ, uI), . . . , Fm(θ, uI)) : R^{q+m} → R^m. Then,

    ∂Fj/∂ul = βl,j − Σ_{i=1}^K κi,j E Z^i_l e^{θ⊺ζi(Z^i)+u⊺Z^i},    1 ≤ l ≠ j ≤ m,
    ∂Fj/∂uj = βj,j − α^j_{j,j} uj − Σ_{i=1}^K κi,j E Z^i_j e^{θ⊺ζi(Z^i)+u⊺Z^i}.

Then J(θ, uI) = (∂Fj/∂ul)_{1≤j,l≤m}, the Jacobian matrix of F with respect to uI, is given by

    J(θ, uI)⊺ = βI,I − diag(α^1_{1,1}u1, . . . , α^m_{m,m}um) − Σ_{i=1}^K (E Z^i_I e^{θ⊺ζi(Z^i)+u⊺Z^i}) κ⊺_{i,I}.    (3.31)

Note that

    J(0, 0)⊺ = βI,I − Σ_{i=1}^K (E Z^i_I) κ⊺_{i,I}

is nonsingular, and its eigenvalues have positive real parts by Assumption 5. Since J(θ, uI) is continuous in (θ, uI), we conclude that there exists an open neighborhood of the origin in R^{q+m} within which the eigenvalues of J(θ, uI) have positive real parts. Define

    O = {(θ, uI) ∈ R^{q+m} : J(θ, uI) is positive stable}.

Note that for a given θ, there may exist multiple solutions uI. We choose the solution branch uI = uI(θ) for which (θ, uI) ∈ O. A critical reason for this choice is that it makes X ergodic under Q (which depends on θ), so that (3.29) holds (at least for u ∈ R^n_+). Recall that a crucial assumption regarding the stability of X is that it is "mean-reverting" in the sense of Assumption 3. Analogously, we have the following result.

Proposition 3.3. Let (θ, uI) ∈ O be a solution to F(θ, uI) = 0. Then, under Assumptions 1, 3, and 6, β̃ defined by (3.21) is positive stable.

Proof. It follows from Remark 3.2 that βJ,J is positive stable. By virtue of the block lower triangular form of β̃, it suffices to show that

    β̃I,I = βI,I − diag(α^1_{1,1}u1, . . . , α^m_{m,m}um)

is positive stable. Proposition 3.1 indicates that β̃I,I is a Z-matrix. So, it suffices to show that β̃I,I is an M-matrix.

By Lemma 3.1 and Assumption 3,

    βI,I = sI − A

for some A ∈ R^{m×m}_+ and s > sp(A). Then,

    J(θ, uI)⊺ = βI,I − D − G = sI − A − D − G,

where

    D = diag(α^1_{1,1}u1, . . . , α^m_{m,m}um),
    G = Σ_{i=1}^K (E Z^i_I e^{θ⊺ζi(Z^i)+u⊺Z^i}) κ⊺_{i,I}.

Clearly, G is nonnegative. Let t = min_{1≤i≤m} α^i_{i,i}ui; then

    J(θ, uI)⊺ = (s − t)I − (A + G + H),

where H = D − tI is a nonnegative matrix. Since (θ, uI) ∈ O, J(θ, uI) is an M-matrix. It follows immediately from Lemma 3.1 that

    s − t > sp(A + G + H) ≥ sp(A + H).

Hence,

    β̃I,I = βI,I − D = (s − t)I − (A + H)

is an M-matrix.


Proposition 3.3 indicates that β̃ satisfies Assumption 3. Moreover, it follows from (3.31) as well as the definitions of κ̃ and β̃I,I in (3.20) and (3.21) that

    J(θ, uI)⊺ = β̃I,I − Σ_{i=1}^K (E Z̃^i_I) κ̃⊺_{i,I}

is positive stable (here Z̃^i is distributed according to ϕ̃i). Therefore,

    β̃ − Σ_{i=1}^K (E Z̃^i) κ̃⊺_i = ( β̃I,I − Σ_{i=1}^K (E Z̃^i_I) κ̃⊺_{i,I}    0
                                      β̃I,J − Σ_{i=1}^K (E Z̃^i_J) κ̃⊺_{i,I}    β̃J,J )

satisfies Assumption 5. Then, it follows from Theorem 2.1 that the AJD X̃ satisfying the SDE (3.25) is ergodic. Equivalently, we have

Proposition 3.4. Under Assumptions 1 - 6, X is ergodic under the probability measure Q induced by the exponential martingale M.

Now we are ready to show the following proposition, which proves (3.30) for θ ∈ R^q_+.

Proposition 3.5. Let θ ∈ R^q_+ and (θ, u, φ) be a solution triplet to (3.17) and (3.18) with (θ, uI) ∈ O. Then, under Assumptions 1 - 7,

    φ(θ) = lim_{t→∞} (1/t) log E exp(θ⊺L(t)).

Proof. Since F(θ, uI(θ)) = 0, the implicit function theorem implies that

    ∇uI(θ) = −∇_{uI}F(θ, uI(θ))^{−1} ∇_θF(θ, uI(θ)) = −J(θ, uI(θ))^{−1} ∇_θF(θ, uI(θ)).    (3.32)

Note that

    ∇_θF(θ, uI(θ))⊺ = −Σ_{i=1}^K (E ζi(Z^i) e^{θ⊺ζi(Z^i)+u(θ)⊺Z^i}) κ⊺_{i,I} ∈ R^{q×m}_−    (3.33)

by Assumption 7. Moreover, J(θ, uI(θ)) is an M-matrix, so (J(θ, uI(θ))⊺)^{−1} ∈ R^{m×m}_+; see Page 137 of [9]. Hence,

    ∇uI(θ) ≥ 0.


It is easy to see that uI(0) = 0. Therefore, uI(θ) ≥ 0 if θ ≥ 0.

Note that

    exp(−u⊺X(t)) = exp(−Σ_{i=1}^m ui Xi(t))

is bounded, since Xi(·) ≥ 0 and ui ≥ 0 for i = 1, . . . , m. Consequently, by Proposition 3.4 we have

    E^Q exp[−u⊺(X(t) − X(0))] = O(1)

as t → ∞, and thereby t^{−1} log E exp(θ⊺L(t)) → φ(θ) as t → ∞.

3.4.3 Steepness

Now that we have computed the limiting CGF φ = φ(θ), in order to apply the Gärtner-Ellis theorem the next step is to verify that φ is steep (see [24] for the definition) in the region of interest (namely, R^q_+ for our purpose). To this end, define

    Θ = {θ ∈ R^q : ∃ uI ∈ R^m s.t. F(θ, uI) = 0, (θ, uI) ∈ O}.

The steepness of φ is related to its behavior near the boundary of Θ. Completely characterizing Θ is challenging. The one-dimensional case is solved in [100]; [63] and [47] discuss a related problem for multidimensional affine diffusion processes without jumps by studying an associated dynamical system. In particular, they characterize {θ : lim_{t→∞} E exp(θ⊺X(t)) < ∞}, where X(t) is an affine diffusion (without jumps). Nevertheless, we do have the following property regarding the Jacobian matrix J(θ, uI(θ)) near the boundary of Θ.

Lemma 3.2. Suppose ∂Θ ≠ ∅. Then, det(J(θ, uI(θ))) → 0 as Θ ∋ θ → θ0 ∈ ∂Θ.

Proof. Let s = s(θ) = max_{1≤i≤m} J(θ, uI(θ))_{ii} and B = B(θ) = sI − J(θ, uI(θ)). Then, B ∈ R^{m×m}_+ and s > sp(B) by Lemma 3.1, since J(θ, uI(θ)) is an M-matrix for each θ ∈ Θ. It follows from the Perron-Frobenius theorem that sp(B) is an eigenvalue of B.

Moreover, the continuity of J(θ, uI(θ)) in θ implies that s is continuous in θ; B is therefore continuous in θ as well. So s − sp(B) → s(θ0) − sp(B(θ0)) as θ → θ0. Note that since J(θ, uI) is positive stable and thus nonsingular for (θ, uI) ∈ O, we can extend the solution path (θ, uI) of F(θ, uI) = 0 from the origin to the boundary of O by the implicit function theorem (see, for example, [27]). Hence, (θ0, uI(θ0)) ∈ ∂O if θ0 ∈ ∂Θ. It follows that s(θ0) − sp(B(θ0)) = 0, implying s − sp(B) → 0 as θ → θ0.

Therefore, letting (ηi : 1 ≤ i ≤ m) denote the eigenvalues of B with η1 = sp(B),

    det(J(θ, uI(θ))) = det(sI − B) = Π_{i=1}^m (s − ηi) = (s − sp(B)) Π_{i=2}^m (s − ηi) → 0

as θ → θ0.

Lemma 3.3. ∂Θ = ∅ if and only if κ = 0.

Proof. If κ = 0, then (3.18) is trivially solved by uI(θ) ≡ 0 for all θ. Hence, Θ = R^q and thereby ∂Θ = ∅.

Now assume κ ≠ 0, i.e. there exist i ∈ {1, . . . , K} and j ∈ {1, . . . , m} such that κi,j ≠ 0. We have shown in the proof of Proposition 3.5 that uI(θ) ≥ 0 for θ ∈ Θ ∩ R^q_+. Hence, it follows from (3.18) that

    Σ_{l=1}^m ul βl,j − (1/2) α^j_{j,j} u_j² ≥ (E e^{θ⊺ζi(Z^i)+u⊺Z^i} − 1) κi,j.

Moreover, βI,I has non-positive off-diagonal elements. Hence,

    uj βj,j − (1/2) α^j_{j,j} u_j² ≥ (E e^{θ⊺ζi(Z^i)+u⊺Z^i} − 1) κi,j.

The LHS is obviously bounded above. In addition, ζi is nonnegative. Hence, ∂Θ ∩ R^q_+ ≠ ∅ and thereby ∂Θ ≠ ∅.

Proposition 3.6. Suppose ∂Θ ≠ ∅. Then, ‖∇φ(θ)‖ → ∞ as θ → ∂Θ.

Proof. It follows from (3.17) that

    φ(θ) = u(θ)⊺b + Σ_{i=1}^K λi E e^{θ⊺ζi(Z^i)+u(θ)⊺Z^i}.


Hence,

    ∇φ(θ) = ∇u(θ)⊺b + Σ_{i=1}^K λi E(ζi(Z^i) + ∇u(θ)⊺Z^i) e^{θ⊺ζi(Z^i)+u(θ)⊺Z^i}
           = ∇uI(θ)⊺bI + Σ_{i=1}^K λi E(ζi(Z^i) + ∇uI(θ)⊺Z^i_I) e^{θ⊺ζi(Z^i)+uI(θ)⊺Z^i_I}    (3.34)

since ui = 0 for i = m + 1, . . . , n. It follows from (3.32) and (3.33) that

    ∇uI(θ) = −J(θ, uI(θ))^{−1} ∇θF(θ, uI(θ)),

where

    ∇θF(θ, uI(θ))⊺ = −Σ_{i=1}^K (E ζi(Z^i) e^{θ⊺ζi(Z^i)+u⊺Z^i}) κ⊺_{i,I}.

Lemma 3.3 implies that κ ≠ 0, and thereby ∇θF(θ, uI(θ)) converges to a nonzero finite matrix as θ → ∂Θ. Then, it is easy to see that ‖∇φ(θ)‖ → ∞ as θ → ∂Θ by Lemma 3.2.

Proposition 3.7. Suppose ∂Θ = ∅. Then, ‖∇φ(θ)‖ → ∞ as θ → ∞ with θ ∈ Θ ∩ R^q_+.

Proof. Lemma 3.3 implies that κ = 0, in which case u(θ) ≡ 0 for θ ∈ Θ = R^q. Moreover, we have

    ∇φ(θ) = Σ_{i=1}^K λi E ζi(Z^i) e^{θ⊺ζi(Z^i)},

which clearly finishes the proof.

Proposition 3.6 and Proposition 3.7 assert that φ(θ) is steep in R^q_+. We are now in a position to apply the Gärtner-Ellis theorem to establish the large deviations asymptotics for L(t).

Theorem 3.2. Let r be the equilibrium mean of L(t) given in Theorem 3.1. Then, under Assumptions 1 - 7, we have

    lim_{t→∞} (1/t) log P(L(t) ≥ Rt) = −I(R)

for any R ∈ R^q with R > r, where I(R) = θR⊺R − φ(θR) and θR ∈ R^q is such that ∇φ(θR) = R.
∇φ(θR) = R.<br />

Proof. For each θ ∈ R^q, define

    ψ(θ) = lim_{t→∞} (1/t) log E exp(θ⊺L(t)).

We have shown in Proposition 3.5 that ψ(θ) = φ(θ) for θ ∈ R^q_+. It then follows from Proposition 3.6 and Proposition 3.7 that ψ(θ) is steep in R^q_+. Let I(x) = sup_{θ∈R^q}[θ⊺x − ψ(θ)] be the Legendre-Fenchel transform of ψ(θ). It can be easily verified from (3.34) that ∇ψ(0) = ∇φ(0) = r. Then, for each x ≥ r, there exists a unique θx ∈ R^q_+ such that ∇ψ(θx) = x and I(x) = θx⊺x − ψ(θx) = θx⊺x − φ(θx); see Theorem 1 of [85]. Therefore, for x ≥ r,

    ∇xI(x) = (∇xθx)x + θx − ∇xθx ∇φ(θx) = θx ≥ 0.

It follows that I(x) ≥ I(y) if x ≥ y. Hence, with A = {x ∈ R^q : x ≥ R},

    inf_{x∈A} I(x) = I(R),

since x ≥ R for all x ∈ A ⊂ R^q_+. Finally, by the Gärtner-Ellis theorem,

    lim_{t→∞} (1/t) log P(L(t) ≥ Rt) = −inf_{x∈A} I(x),

completing the proof.

3.5 Importance Sampling

Suppose that we are interested in computing P(L(t) > Rt) for R > r via simulation. Given the key role θR plays in the large deviations asymptotics, it is not surprising to consider an IS estimator associated with the change of measure Q induced by the exponential martingale M(t, θR). More specifically, let

    Y(t) ≜ M(t, θR)^{−1} I(L^Q(t) ≥ Rt)
         = exp{ −[θR⊺L^Q(t) − φ(θR)t + u(θR)⊺(X^Q(t) − X^Q(0))] } I(L^Q(t) ≥ Rt),    (3.35)

where (X^Q(t), L^Q(t)) satisfies the SDE (2.1) with parameters (a, α, b, β̃, λ̃, κ̃) and ϕ̃i as follows:

    λ̃i = λi ∫_{R^n} e^{θ⊺ζi(z)+u⊺z} ϕi(dz),
    κ̃i = κi ∫_{R^n} e^{θ⊺ζi(z)+u⊺z} ϕi(dz),
    β̃ = ( βI,I − diag(α^1_{1,1}u1, . . . , α^m_{m,m}um)    0
           βI,J                                             βJ,J ),
    ϕ̃i(dz) = e^{θ⊺ζi(z)+u⊺z} ϕi(dz) / ∫_{R^n} e^{θ⊺ζi(z)+u⊺z} ϕi(dz),

where θ = θR and u = u(θR).
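For concreteness, the following sketch computes the tilted parameters for the special case of one jump type with ζ(z) = z and an exponential jump-size distribution ϕ(dz) = ηe^{−ηz}dz on R+, so that the moment integral ∫ e^{(θ+u)z} ϕ(dz) = η/(η − θ − u) is available in closed form. This one-dimensional specification and the parameter values are illustrative assumptions, not part of the general model above.

    def tilted_jump_parameters(lam, kappa, eta, theta, u):
        # Exponential tilting of a one-dimensional jump component.
        # Under phi(dz) = eta * exp(-eta z) dz and zeta(z) = z, the tilted
        # distribution phi_tilde is again exponential, with rate eta - theta - u.
        s = theta + u
        assert s < eta, "tilting parameter must satisfy theta + u < eta"
        moment = eta / (eta - s)       # int e^{(theta+u) z} phi(dz)
        lam_tilde = lam * moment       # tilted baseline intensity
        kappa_tilde = kappa * moment   # tilted self-excitation loading
        eta_tilde = eta - s            # rate of the tilted (exponential) phi_tilde
        return lam_tilde, kappa_tilde, eta_tilde

    # illustrative values
    print(tilted_jump_parameters(lam=1.0, kappa=0.5, eta=3.0, theta=0.2, u=0.3))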

It is shown in [93] that the IS estimator guided by the Gärtner-Ellis theorem is logarithmically efficient in great generality. We provide a proof here to keep the treatment self-contained.

Theorem 3.3. Under Assumptions 1 - 7, the IS estimator (3.35) is logarithmically efficient, i.e.

    lim_{t→∞} (log E^Q Y(t)²) / (2 log E^Q Y(t)) = 1.

Proof. Note that

    E^Q Y(t)² = E^Q exp{ −2[θR⊺L(t) − φ(θR)t + u(θR)⊺(X(t) − X(0))] } I(L(t) ≥ Rt)
              ≤ E^Q exp{ −2[θR⊺Rt − φ(θR)t + u(θR)⊺(X(t) − X(0))] }
              = exp(−2I(R)t) E^Q exp[−2u(θR)⊺(X(t) − X(0))],

where I(R) = θR⊺R − φ(θR). Note that X(t) is ergodic under the probability measure Q, as we discussed in the proof of Proposition 3.4, and that u(θR) ≥ 0 (since θR ≥ 0). Therefore,

    E^Q exp[−2u(θR)⊺(X(t) − X(0))] = O(1)

as t → ∞. Hence,

    log E^Q Y(t)² ≤ −2I(R)t + log E^Q exp[−2u(θR)⊺(X(t) − X(0))],

while 2 log E^Q Y(t) = 2 log P(L(t) ≥ Rt), since the estimator is unbiased. It then follows from Theorem 3.2 that

    lim_{t→∞} (log E^Q Y(t)²) / (2 log E^Q Y(t)) ≥ 1.

On the other hand, note that E^Q Y(t)² ≥ (E^Q Y(t))² by Jensen's inequality, from which it follows that log E^Q Y(t)² ≥ 2 log E^Q Y(t). Therefore,

    lim_{t→∞} (log E^Q Y(t)²) / (2 log E^Q Y(t)) ≤ 1,

completing the proof.

3.6 Application to Portfolio Credit Risk

Consider n portfolios exposed to credit risk. Let Lk(t) denote the accumulated default loss of portfolio k, k = 1, . . . , n. Suppose that Xi(t) is the idiosyncratic risk factor associated with portfolio i, i = 1, . . . , n, while X0(t) is the common risk factor associated with the systemic risk. Assume that X is an AJD with the following specification. For i = 0, 1, . . . , n,

    dXi(t) = (bi − βiXi(t)) dt + σi√Xi(t) dWi(t) + δi Σ_{k=1}^n dLk(t),    (3.36)

where βi > 0 controls the mean-reversion speed, bi > 0 governs the reversion level, σi > 0 describes the diffusive volatility, and δi ≥ 0 represents the sensitivity of Xi to a jump (i.e. a default loss) in the portfolios. Moreover, for k = 1, . . . , n,

    Lk(t) = ∫_0^t ∫_S z Nk(ds, dz),

where Nk(ds, dz) is a counting random measure on [0, ∞) × R+ with compensator (ωk X0(t−) + Xk(t−)) dt ϕk(dz). Here, ωk > 0 and ϕk is the probability distribution of the individual default loss of portfolio k.


CHAPTER 3. AFFINE POINT PROCESSES: LARGE DEVIATIONS 72<br />

Note that for each k = 1, . . . , n,

    Nk(t) ≜ ∫_0^t ∫_S Nk(ds, dz)

represents the accumulated number of defaults for portfolio k. Then,

    Lk(t) = Σ_{j=1}^{Nk(t)} Z^k_j,

where (Z^k_j : j ≥ 1) is a sequence of iid R+-valued rv's with common distribution ϕk, which represents the sequence of individual default losses for portfolio k.

By including a feedback term δi Σ_{k=1}^n dLk(t) in the dynamics of the risk factor, the model (3.36) extends the doubly-stochastic models of [29], [82], [36], and so forth. If δi = 0, then the risk factor Xi does not respond to defaults; the doubly-stochastic formulation corresponds to setting δi = 0 for all i. [43] provides an asymptotically exact simulation scheme for generating samples of (3.36) without resorting to discretization.
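The exact scheme of [43] is preferable in practice. Purely for illustration of the feedback mechanism, the following naive Euler-type sketch of (3.36) uses the parameter values of Table 3.2 below; the time step, the Bernoulli thinning of defaults over each step, and the initialization at the reversion levels are simplifying assumptions.

    import numpy as np

    def simulate_feedback_model(b, beta, sigma, delta, omega, T, dt, rng):
        # Naive Euler sketch of the self-exciting model (3.36) with n portfolios.
        # b, beta, sigma, delta have length n+1 (index 0 = common factor);
        # omega has length n. Default losses are Uniform(0,1). Over each step,
        # portfolio k defaults with probability ~ (omega_k X_0 + X_k) dt.
        n = len(omega)
        x = b / beta                   # start factors at their reversion levels
        L = np.zeros(n)                # accumulated default losses
        for _ in range(int(T / dt)):
            intensity = omega * x[0] + x[1:]
            defaults = rng.random(n) < np.clip(intensity * dt, 0.0, 1.0)
            dL = np.where(defaults, rng.random(n), 0.0)   # Uniform(0,1) losses
            dw = rng.normal(0.0, np.sqrt(dt), n + 1)
            x = x + (b - beta * x) * dt \
                  + sigma * np.sqrt(np.maximum(x, 0.0)) * dw \
                  + delta * dL.sum()
            x = np.maximum(x, 0.0)
            L += dL
        return L

    rng = np.random.default_rng(1)
    b = np.array([1.9818, 4.8008, 4.0125, 2.5173, 1.8957])
    beta = np.array([5.9995, 4.9699, 5.5199, 4.3371, 4.0607])
    sigma = np.array([0.1565, 0.2489, 0.1631, 0.1785, 0.1388])
    delta = np.array([0.0, 0.5400, 0.8044, 0.2217, 0.0944])
    omega = np.array([0.2317, 0.7400, 0.5197, 0.2132])
    print(simulate_feedback_model(b, beta, sigma, delta, omega, T=10.0, dt=1e-3, rng=rng))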

We conducted the following numerical experiment to verify the logarithmic efficiency of the IS estimator (3.35). We let n = 4 and ϕi be the uniform distribution on [0, 1] for all 1 ≤ i ≤ 4. The parameters of the model, b, β, σ, δ, ω, are randomly generated; see Table 3.2, where r is the equilibrium mean of L(t), i.e.

    r = lim_{t→∞} L(t)/t,

computed by the formula given in Theorem 3.1. In addition, R is the large deviation level and is set to 1.05r. The results of the simulation are shown in Table 3.3 as well as Figure 3.3, where E^Q Y(t) refers to the estimated probability, E^Q Y(t)² to the estimated second moment of the IS estimator, and "Log Ratio" to (log E^Q Y(t)²) / (2 log E^Q Y(t)).


CHAPTER 3. AFFINE POINT PROCESSES: LARGE DEVIATIONS 73<br />

    i     0        1        2        3        4
    bi    1.9818   4.8008   4.0125   2.5173   1.8957
    βi    5.9995   4.9699   5.5199   4.3371   4.0607
    σi    0.1565   0.2489   0.1631   0.1785   0.1388
    δi    0        0.5400   0.8044   0.2217   0.0944
    ωi    N/A      0.2317   0.7400   0.5197   0.2132
    ri    N/A      0.6286   0.6297   0.4265   0.2916
    Ri    N/A      0.6601   0.6612   0.4479   0.3062

Table 3.2: Parameter specification for computing rare-event probabilities for APPs

    t       E^Q Y(t)   E^Q Y(t)²   Log Ratio
    10      4.10E-02   2.69E-02    0.57
    50      2.18E-02   9.32E-03    0.61
    100     1.50E-02   4.40E-03    0.65
    500     8.71E-04   7.27E-05    0.68
    1000    1.34E-04   2.57E-06    0.72
    2000    8.34E-07   4.37E-10    0.77
    3000    4.78E-08   1.09E-12    0.82
    5000    1.60E-10   5.30E-18    0.88
    7500    2.46E-15   2.11E-27    0.91
    10000   5.68E-19   1.88E-34    0.93

Table 3.3: Results of the numerical experiment for testing the logarithmic efficiency of the IS estimator.

Obviously, the numerical results indicate the convergence of the above ratio to 1 as t → ∞.


[Figure 3.3 here: the estimated probability (left axis, logarithmic scale from 10^0 down to 10^−20) and the Log Ratio (right axis, from 0.5 to 1) plotted against t from 0 to 10000.]

Figure 3.3: Convergence of Log Ratio as the probability tends to 0, showing the logarithmic efficiency of the IS estimator.


Chapter 4

Computing Large Deviations for General State Space Markov Processes

4.1 Introduction

Before proceeding to a more general setting, let us give some remarks on the IS algorithm for computing rare-event probabilities for APPs in Chapter 3. Recall that the probability of interest is

    P(L(t) > Rt),

where L is an APP. A key step in finding the appropriate "tilting" parameter θR for the IS estimator is to calculate the limiting CGF

    φ(θ) ≜ lim_{t→∞} (1/t) log E exp(θ⊺L(t)).

The above task is essentially the same as solving the eigenvalue problem

    H f(θ) = φ(θ)f(θ),    (4.1)


where f(θ) = f(x, l; θ) is a function of (x, l) for each θ and H is the infinitesimal generator of (X(t), L(t)), with X(t) satisfying the SDE (2.1), i.e.

    H f(x, l; θ) = ∇x f(x, l; θ)⊺(b − βx) + (1/2) Σ_{i,j=1}^n (ai,j + Σ_{k=1}^m α^k_{i,j} xk) ∂²f(x, l; θ)/∂xi∂xj
                   + Σ_{i=1}^K (λi + κi⊺x) ∫_S (f(x + γi(z), l + ζi(z); θ) − f(x, l; θ)) ϕi(dz).

Not surprisingly, the eigenvalue problem (4.1) is explicitly solvable due to the affine structure. Indeed, the eigenfunction f has the form f(x, l; θ) = exp(u⊺x + θ⊺l).
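As a quick sketch of why (up to the normalization conventions of (2.1) and (3.17)-(3.18)), substituting f(x, l; θ) = exp(u⊺x + θ⊺l) into H f = φf and dividing by f gives

    u⊺(b − βx) + (1/2) u⊺(a + Σ_{k=1}^m xk α^k) u + Σ_{i=1}^K (λi + κi⊺x) (E e^{u⊺γi(Z^i)+θ⊺ζi(Z^i)} − 1) = φ

for all x; matching the constant term and the coefficient of each xk turns the eigenvalue problem into the finite-dimensional algebraic system for (u, φ) that (3.17) and (3.18) encode.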

In general, computing rare-event probabilities for Markov processes is typically related to an eigenvalue problem similar to (4.1), which usually does not have an explicit solution unless certain special structure exists. The existing IS algorithms, however, heavily rely on the solution of such an eigenvalue problem. This chapter is devoted to the discussion of how to compute rare-event probabilities for general state space Markov processes in the absence of special structure.

4.2 Markov-Dependent Sums

Consider a (time-homogeneous) discrete-time Markov chain Φ = (Φn : n ≥ 0) with state space S and transition kernel P(x, dy) = Px(Φ1 ∈ dy). Suppose that Φ is positive Harris recurrent, so that there exists a unique stationary distribution π. Define the Markov-dependent sum

    Sn = Σ_{i=0}^{n−1} f(Φi),

where f : S → R with π(f) ≜ ∫_S f(x)π(dx) = 0 and π(|f|) < ∞. We will attempt to compute Px(Sn > cn) for c > 0 and n large. Clearly, by the SLLN for Markov chains,

    Sn/n → 0 a.s.


as n → ∞. Hence, ({Sn > cn} : n ≥ 0) is a family of rare events. Moreover, we usually have a large deviations result of the form

    lim_{n→∞} (1/n) log Px(Sn > cn) = −I(c),

where I(c) is called the rate function and will be specified later. We now introduce the technical conditions that guarantee the above large deviations result. The conditions are taken from [65], which generalizes the results in [64] for bounded functions f.

Assumption 8. (a) Φ is irreducible, aperiodic, and satisfies the following Foster-Lyapunov criterion with V : S → [1, ∞):

    log(P e^V) ≤ V − δW + b I_C

for a small set C ⊂ S, constants δ > 0, b < ∞, and a function W : S → [1, ∞).

(b) ‖f‖_{W0} < ∞, where W0 : S → [1, ∞) is such that

    lim_{l→∞} sup_{x∈S} (W0(x)/W(x)) I(W(x) > l) = 0.

(c) There exists n0 such that, for each l < sup_x W(x), there is a measure βl for which

    sup_{x∈CW(l)} Px(Φ_{n0} ∈ A, τ_{CW(l)} > n0) ≤ βl(A)

for all A ∈ B(S), where CW(l) ≜ {y : W(y) ≤ l} and τ_{CW(l)} is the first hitting time of CW(l).

Remark 4.1. The drift criterion in (a) of Assumption 8 captures the essential ingredient of the large deviations conditions imposed by Donsker and Varadhan in their seminal work ([25] and [26]).

Now define a positive kernel K(θ) = (K(θ, x, dy) : x, y ∈ S), where

    K(θ, x, dy) ≜ e^{θf(x)} P(x, dy).

Then the eigenvalue problem of interest is

    K(θ)h(θ) = λ(θ)h(θ),    (4.2)

i.e.

    ∫_S K(θ, x, dy) h(θ, y) = λ(θ)h(θ, x),    x ∈ S.

Proposition 4.1. Under Assumption 8, the eigenvalue problem (4.2) has a positive solution pair (λ(θ), h(θ)) for θ in a neighborhood of the origin. Moreover,

    lim_{n→∞} (1/n) log Ex e^{θSn} = log λ(θ) ≜ ψ(θ).

Proof. See Theorem 4.1 of [64] and Theorem 3.1 of [65].

Define Dψ ≜ {θ : ψ(θ) < ∞}. Then the origin belongs to D^o_ψ, the interior of Dψ.

Remark 4.2. The eigenvalue λ(θ) associated with the positive eigenfunction h(θ) is in fact the largest eigenvalue (the eigenvalue with the largest absolute value) of the kernel K(θ), known as the Perron-Frobenius eigenvalue ([78], [75], and [76]).

Proposition 4.2. Under Assumption 8, for 0 < c < sup_{θ∈Dψ∩(0,∞)} ψ′(θ),

    lim_{n→∞} (1/n) log Px(Sn > cn) = −I(c),    (4.3)

where I(c) = θc · c − ψ(θc) and ψ′(θc) = c.

Proof. See Theorem 6.3 and Theorem 6.5 of [64], as well as Theorem 5.3 and Theorem 5.4 of [65].

Provided that the large deviations result (4.3) is valid, the most well-known IS algorithm for efficiently estimating Px(Sn > cn) for large n is the "exponential tilting" ([1]). In particular, we simulate Φ under the tilted transition kernel

    exp(θc f(x) − ψ(θc)) P(x, dy) h(θc, y) / h(θc, x),    (4.4)

and use the IS estimator

    Z ≜ I(Sn > cn) exp(−θc Sn + nψ(θc)) h(θc, Φ0) / h(θc, Φn).    (4.5)

By the large deviations result (4.3), it can be shown easily that the above IS estimator is logarithmically efficient (see, for example, [16]). There are also variations on the above IS estimator: [33] and [34] introduce the dynamic importance sampling method, and a closely related method is the state-dependent change of measure introduced by [10], see also [12].
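When S is finite, (4.2) is a matrix eigenproblem and (4.4)-(4.5) can be implemented directly. The following sketch uses a toy two-state chain with illustrative values; θ is fixed by hand here rather than solved from ψ′(θc) = c, which is a simplifying assumption.

    import numpy as np

    def tilted_estimate(P, f, theta, x0, n, c, n_rep, rng):
        # Exponential-tilting IS estimator (4.5) for a finite-state chain.
        K = np.exp(theta * f)[:, None] * P       # K(theta, x, y) = e^{theta f(x)} P(x, y)
        eigvals, eigvecs = np.linalg.eig(K)
        i = np.argmax(eigvals.real)               # Perron-Frobenius eigenvalue
        lam = eigvals[i].real
        h = np.abs(eigvecs[:, i].real)            # positive eigenfunction
        psi = np.log(lam)
        # tilted transition kernel (4.4); rows sum to one by the eigen-relation
        P_tilt = (np.exp(theta * f)[:, None] / lam) * P * h[None, :] / h[:, None]
        P_tilt /= P_tilt.sum(axis=1, keepdims=True)   # guard against round-off
        est = 0.0
        for _ in range(n_rep):
            x, s = x0, 0.0
            for _ in range(n):
                s += f[x]                         # S_n = f(Phi_0) + ... + f(Phi_{n-1})
                x = rng.choice(len(f), p=P_tilt[x])
            if s > c * n:
                est += np.exp(-theta * s + n * psi) * h[x0] / h[x]
        return est / n_rep

    rng = np.random.default_rng(0)
    P = np.array([[0.9, 0.1], [0.5, 0.5]])
    f = np.array([-0.2, 1.0])                     # chosen so that pi(f) = 0 here
    print(tilted_estimate(P, f, theta=1.0, x0=0, n=100, c=0.3, n_rep=2000, rng=rng))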

Nevertheless, two major difficulties need to be tackled in order to implement the IS estimator Z.

1. As mentioned earlier, it is hard to compute the solution to the eigenvalue problem (4.2) numerically when the state space S is large or continuous and an explicit solution is unavailable.

2. Even if the solution is available (explicitly or numerically), it is still hard to sample from the tilted transition kernel (4.4).

Remark 4.3. Recall that for the rare-event simulation for APPs, we can simulate (X(t), L(t)) under the IS distribution because the IS distribution still belongs to the affine family, due to the special affine structure.

In order to circumvent these two difficulties, we will impose and exploit the regenerative structure of Φ and introduce an importance sampler on the "cycle-path" space instead of the original one-step transition dynamics. A recent and closely related work is [17], which proposes the sequential importance sampling and resampling (SISR) procedure. The idea there is that at each step k < n, one first generates m independent copies of Φk under the original transition dynamics and then resamples among these m copies with IS, whose weight function is roughly proportional to e^{θ̂f(Φk)−ψ(θ̂)} u(θ̂, Φk)/u(θ̂, Φk−1), where θ̂ is an approximation to θc and u is a function satisfying K(θ)u(θ) ≤ (1 − δ)λ(θ)u(θ) for some δ ∈ (0, 1). Note that the function u appears in a discrete probability distribution, which is easy to sample from, thereby resolving the second difficulty above.


Nevertheless, the SISR algorithm in [17] requires knowledge of the solution to the eigenvalue problem (4.2). In the presence of a continuous state space, they simply discretize the state space and numerically compute an approximation to ψ(θ), and further an approximation to θc. Therefore, they do not address the first difficulty. Our work, by contrast, attempts to address these two difficulties simultaneously. A crucial assumption we will use is the following minorization condition.

Assumption 9. There exists a compact set C ⊂ S, a constant δ ∈ (0, 1), and a probability measure ϕ on S such that

    P(x, ·) ≥ δϕ(·),    x ∈ C.

Remark 4.4. Harris recurrence requires that

    P^m(x, ·) ≥ δϕ(·),    x ∈ C,    (4.6)

for some m ≥ 1 (see Section A.2.1). It is well known that Φ has regenerative structure when m = 1 in (4.6), while only one-dependent regenerative structure exists when m > 1 (see [49] and references therein).

When m = 1 the regenerative structure can be constructed via the "splitting method" due to [77] and [5]. Define

    Q(x, dy) ≜ (P(x, dy) − δϕ(dy)) / (1 − δ).

Then Q(x, ·) is a probability distribution on S for each x ∈ C. Note that

    P(x, dy) = δϕ(dy) + (1 − δ)Q(x, dy),

which indicates that one can identify the regenerations as follows.

Suppose that Φ visits C at time σ. Flip a δ-coin. If the coin comes up "heads" (which occurs with probability δ), distribute Φσ+1 according to ϕ; otherwise, distribute Φσ+1 according to Q. Then, conditional on the coin coming up "heads", Φσ+1 has distribution ϕ and is independent of Φσ. In other words, Φ regenerates whenever it distributes itself according to ϕ.
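In code, one split transition can be sketched as follows. The AR(1) kernel P(x, ·) = N(ρx, 1) with C = [−1, 1] is an illustrative assumption; the regeneration measure ϕ is taken to be the normalized pointwise minimum of the transition densities over C, and both ϕ and the residual kernel Q are sampled by rejection.

    import math
    import numpy as np

    RHO = 0.7                  # AR(1): P(x, .) = N(rho * x, 1)  (illustrative)
    C_BOUND = 1.0              # small set C = [-1, 1]
    DELTA = math.erfc(RHO / math.sqrt(2.0))   # = 2 Phi(-rho) = int min_{x in C} p(x, y) dy

    def npdf(y, mu=0.0):
        return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)

    def minorant(y):
        # m(y) = min_{|x| <= 1} p(x, y) = phi(|y| + rho), so DELTA * varphi(y) = m(y)
        return npdf(abs(y) + RHO)

    def sample_phi(rng):
        # sample varphi(dy) = m(y) dy / DELTA by rejection from N(0, 1);
        # acceptance probability exp(-rho * |y|)
        while True:
            y = rng.normal()
            if rng.random() < math.exp(-RHO * abs(y)):
                return y

    def split_step(x, rng):
        # one transition of the split chain; returns (next state, regenerated?)
        if abs(x) <= C_BOUND:
            if rng.random() < DELTA:
                return sample_phi(rng), True      # "heads": regenerate from varphi
            while True:  # rejection sampler for Q(x, .) = (P - DELTA * varphi)/(1 - DELTA)
                y = rng.normal(RHO * x, 1.0)
                if rng.random() < 1.0 - minorant(y) / npdf(y, RHO * x):
                    return y, False
        return rng.normal(RHO * x, 1.0), False    # ordinary transition off C

    rng = np.random.default_rng(2)
    x, regen_count = 0.0, 0
    for _ in range(10000):
        x, r = split_step(x, rng)
        regen_count += r
    print("regenerations in 10000 steps:", regen_count)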

Proposition 4.3. Suppose that Assumption 8 and Assumption 9 hold. Let τ be the (first) regeneration time, i.e. τ = inf{n ≥ 1 : Φn−1 ∈ C, P(Φn ∈ ·) = ϕ(·)}. Then, for θ ∈ D^o_ψ,

    Eϕ e^{θSτ − τψ(θ)} = 1.

Proof. Let Mn = e^{θSn − nψ(θ)} h(θ, Φn). Then,

    Rn ≜ Mn/Mn−1 = e^{θf(Φn−1)−ψ(θ)} h(θ, Φn)/h(θ, Φn−1),

and thus

    E[Rn | Φn−1] = E_{Φn−1} e^{θf(Φn−1)−ψ(θ)} h(θ, Φn)/h(θ, Φn−1)
                 = (1/(λ(θ)h(θ, Φn−1))) ∫_S h(θ, y) e^{θf(Φn−1)} P(Φn−1, dy)
                 = (1/(λ(θ)h(θ, Φn−1))) ∫_S K(θ, Φn−1, dy) h(θ, y) = 1.

Therefore, (Mn : n ≥ 1) is a Px-martingale adapted to Φ. By the optional sampling theorem,

    h(θ, x) = Ex e^{θS_{τ∧n} − (τ∧n)ψ(θ)} h(θ, Φ_{τ∧n})
            = Ex e^{θSτ − τψ(θ)} h(θ, Φτ) I(τ ≤ n) + Ex e^{θSn − nψ(θ)} h(θ, Φn) I(τ > n)
            = Ex e^{θSτ − τψ(θ)} h(θ, Φτ) I(τ ≤ n) + h(θ, x) P̃x(τ > n),    (4.7)

where P̃x is the probability measure associated with the exponentially tilted transition kernel e^{θf(x)−ψ(θ)} P(x, dy) h(θ, y)/h(θ, x). It follows from Assumption 8 and Proposition 2.12 of [65] that Φ is exponentially ergodic under P̃x. In particular, we know P̃x(τ < ∞) = 1. Therefore, sending n → ∞ in (4.7) yields

    h(θ, x) = Ex e^{θSτ − τψ(θ)} h(θ, Φτ)

for each x ∈ S. Hence,

    Eϕ h(θ, Φ0) = Eϕ e^{θSτ − τψ(θ)} h(θ, Φτ).

By the regenerative structure, Φτ is independent of (τ, Φ0, . . . , Φτ−1). Hence, Φτ is independent of e^{θSτ − τψ(θ)}. It follows that

    Eϕ h(θ, Φ0) = Eϕ e^{θSτ − τψ(θ)} Eϕ h(θ, Φτ) = Eϕ e^{θSτ − τψ(θ)} Eϕ h(θ, Φ0),

and thus Eϕ e^{θSτ − τψ(θ)} = 1.

Let 0 < T(1) < T(2) < · · · be the sequence of regeneration times and T(0) = 0. Define τk = T(k+1) − T(k), k ≥ 0. Set Nn = min{k ≥ 0 : τ0 + · · · + τk = T(k+1) ≥ n}. Then, Nn is a stopping time adapted to Φ.

Note that due to the regenerative structure,

    (( Σ_{i=T(k)}^{T(k+1)−1} f(Φi), τk ) : k ≥ 1)

is an iid sequence. The following corollary is then an immediate consequence of Proposition 4.3 and Wald's identity ([97]).

Corollary 4.1. Suppose that Assumption 8 and Assumption 9 hold. Then for θ ∈ D^o_ψ, M^θ_n ≜ e^{θS_{T(Nn+1)} − T(Nn+1)ψ(θ)} is a Pϕ-martingale adapted to Φ.

Corollary 4.1 implies that (M^θ_n : n ≥ 1) induces a probability measure P̃^θ on the "cycle-path" space as follows:

    dP̃^θ_ϕ/dPϕ |_{Fn} = M^θ_n,    (4.8)

where (Fn : n ≥ 0) is the filtration generated by Φ. In particular, we have

    Pϕ(Sn > cn) = Ẽ^θ_ϕ I(Sn > cn) e^{−θS_{T(Nn+1)} + T(Nn+1)ψ(θ)}.    (4.9)

Moreover, we can incorporate the case ϕ(·) = δx(·):

    Px(Sn > cn) = Ẽ^θ_x I(Sn > cn) e^{−θ[S_{T(Nn+1)} − S_{τ0}] + [T(Nn+1) − τ0]ψ(θ)},

i.e. the dynamics of Φ remains unchanged until τ0, after which we introduce the IS indicated by (4.8) for the regenerative cycles.

The natural question here is how one should find an appropriate tilting parameter θ. Naturally, θc in the large deviations result (4.3) is a good candidate. But computing θc requires solving the eigenvalue problem (4.2). So we will adopt an empirical approximation of θc, avoiding the computation of (4.2). In the sequel of this chapter, we will fix c ∈ (0, sup_{θ∈D^o_ψ∩(0,∞)} ψ′(θ)) for which the large deviations result (4.3) holds, and let ϑ ≜ θc > 0, suppressing the dependence on c for notational simplicity.

4.3 Empirical Moment Generating Function

For each i ≥ 0, define

    Yi ≜ Σ_{j=T(i)}^{T(i+1)−1} f(Φj).

The regenerative structure indicates that ((Yi, τi) : i ≥ 1) is a sequence of iid rv's with common distribution the same as that of (Y, τ). Proposition 4.3 suggests that we can consider the empirical approximation of Eϕ e^{θY − τψ(θ)} to compute an approximation of ψ(θ). More specifically, for m = 1, 2, . . ., let

    γm(θ, ζ) ≜ (1/m) Σ_{i=1}^m exp(θYi − ζτi)

be the empirical moment generating function of (Y, τ). We can then define the "empirical versions" of ϑ and ψ. In particular, let ψm(θ) be the empirical CGF for which

    γm(θ, ψm(θ)) = 1,    (4.10)

and let ϑm be the empirical root for which

    ψ′m(ϑm) = c.    (4.11)
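Given simulated cycle data ((Yi, τi) : 1 ≤ i ≤ m), equations (4.10) and (4.11) reduce to one-dimensional root-finding problems, since γm is decreasing in ζ and ψ′m is increasing in θ. A minimal sketch follows; the expanding bracket and the search interval for the root are ad hoc assumptions.

    import numpy as np
    from scipy.optimize import brentq

    def empirical_psi(theta, Y, tau):
        # Solve gamma_m(theta, zeta) = 1 in zeta, i.e. (4.10); the LHS is
        # strictly decreasing in zeta because tau > 0.
        g = lambda zeta: np.mean(np.exp(theta * Y - zeta * tau)) - 1.0
        lo, hi = -1.0, 1.0
        while g(lo) < 0.0:      # ad hoc expanding bracket
            lo *= 2.0
        while g(hi) > 0.0:
            hi *= 2.0
        return brentq(g, lo, hi)

    def empirical_psi_prime(theta, Y, tau):
        # psi_m'(theta) via the ratio formula in the proof of Proposition 4.5 below
        w = np.exp(theta * Y - empirical_psi(theta, Y, tau) * tau)
        return np.sum(Y * w) / np.sum(tau * w)

    def empirical_root(c, Y, tau, lo=1e-6, hi=2.0):
        # Solve psi_m'(vartheta_m) = c, i.e. (4.11); psi_m' is increasing.
        return brentq(lambda th: empirical_psi_prime(th, Y, tau) - c, lo, hi)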

We will show the consistency and asymptotic normality of ϑm. To this end, we need the following lemma.

Lemma 4.1. Suppose that fm(ξ) is a rv for each ξ ∈ R, m ≥ 1, for which

    fm(ξ) → f(ξ)  a.s.

for each ξ, as m → ∞, where f : R → R is a deterministic function. Also, suppose that f and fm are both increasing (or both decreasing) in ξ, and that ξ0 and ξm solve f(ξ0) = 0 and fm(ξm) = 0, respectively. Then,

    ξm → ξ0  a.s.

as m → ∞.

Proof. Without loss of generality, assume that f and fm are both increasing in ξ. For any ɛ > 0,

    fm(ξ0 − ɛ) → f(ξ0 − ɛ) < f(ξ0) = 0  a.s.,
    fm(ξ0 + ɛ) → f(ξ0 + ɛ) > f(ξ0) = 0  a.s.,

as m → ∞. Hence, there exists m0 such that

    fm(ξ0 − ɛ) < 0 < fm(ξ0 + ɛ)

for all m > m0. Note that fm(ξm) = 0 and fm is increasing. It follows that

    ξ0 − ɛ < ξm < ξ0 + ɛ

for all m > m0. Therefore, ξm → ξ0 a.s. as m → ∞.


Proposition 4.4. Suppose that Assumption 8 and Assumption 9 hold. Then, ψm(θ) → ψ(θ) a.s. as m → ∞ for any θ ∈ D^o_ψ.

Proof. Fix θ ∈ D^o_ψ. Let f(ζ) = E e^{θY − ζτ} and

    fm(ζ) = (1/m) Σ_{i=1}^m e^{θYi − ζτi},    m ≥ 1.

Then the SLLN yields fm(ζ) → f(ζ) a.s. as m → ∞ for any ζ. Note that, since τ > 0 a.s., f and fm are decreasing in ζ. It then follows immediately from Lemma 4.1 that ψm(θ) → ψ(θ) a.s. as m → ∞ for any θ ∈ D^o_ψ.

Proposition 4.5. Suppose that Assumption 8 and Assumption 9 hold. Then, ϑm → ϑ a.s. as m → ∞.

Proof. We know from Proposition 4.4 that ψm(θ) → ψ(θ) a.s. as m → ∞. Note that

    E e^{θY − ψ(θ)τ} = 1.

The dominated convergence theorem guarantees that we can interchange differentiation and expectation. Differentiating both sides with respect to θ yields

    E[e^{θY − ψ(θ)τ}(Y − ψ′(θ)τ)] = 0,

and

    E[e^{θY − ψ(θ)τ}((Y − ψ′(θ)τ)² − ψ″(θ)τ)] = 0.

Then,

    ψ′(θ) = (E Y e^{θY − ψ(θ)τ}) / (E τ e^{θY − ψ(θ)τ})

and

    ψ″(θ) = (E(Y − ψ′(θ)τ)² e^{θY − ψ(θ)τ}) / (E τ e^{θY − ψ(θ)τ}) > 0.

Likewise, we can show

    ψ′m(θ) = ( (1/m) Σ_{i=1}^m Yi e^{θYi − ψm(θ)τi} ) / ( (1/m) Σ_{i=1}^m τi e^{θYi − ψm(θ)τi} )

and ψ″m(θ) > 0. Hence, ψ′m(θ) − c and ψ′(θ) − c are both increasing in θ. Moreover,

    ψ′m(θ) → ψ′(θ)  a.s.

as m → ∞. It then follows from Lemma 4.1 that ϑm → ϑ a.s. as m → ∞.

We then immediately arrive at the following result by Proposition 4.4 and Proposition 4.5.

Theorem 4.1. Suppose that Assumption 8 and Assumption 9 hold. Then,

    ϑm c − ψm(ϑm) → I(c)  a.s.

as m → ∞.

We now conclude this section with a discussion of the asymptotic normality of the empirical approximations.

Theorem 4.2. Suppose that Assumption 8 and Assumption 9 hold. If 2ϑ ∈ D^o_ψ, then

    √m ( (ϑm, ψm(ϑm)) − (ϑ, ψ(ϑ)) ) ⇒ N(0, Σ)

as m → ∞, where Σ = BΓB⊺,

    B = ( E[YZ]            −E[τZ]
          E[Y(Y − cτ)Z]    −E[τ(Y − cτ)Z] )^{−1}

and

    Γ = ( Var(Z)                 Cov(Z, (Y − cτ)Z)
          Cov(Z, (Y − cτ)Z)      Var((Y − cτ)Z)    ),

where Z = e^{ϑY − ψ(ϑ)τ}.

Proof. Define γ(θ, ζ) ≜ E e^{θY − ζτ} and Dγ ≜ {(θ, ζ) : γ(θ, ζ) < ∞}. Then γ is differentiable in D^o_γ. Note that γ(θ, ψ(θ)) = 1. Hence,

    (∂/∂θ)γ(θ, ψ(θ)) + (∂/∂ζ)γ(θ, ψ(θ)) · ψ′(θ) = 0.


Since ψ′(ϑ) = c, we have

    (∂/∂θ)γ(ϑ, ψ(ϑ)) + (∂/∂ζ)γ(ϑ, ψ(ϑ)) · c = 0,

i.e.

    E(Y − cτ) e^{ϑY − ψ(ϑ)τ} = 0.    (4.12)

Likewise, we have

    (1/m) Σ_{i=1}^m (Yi − cτi) e^{ϑmYi − ψm(ϑm)τi} = 0.

It follows that (ϑ, ψ(ϑ)) solves F(ϑ, ψ(ϑ)) = 0, where

    F(θ, ζ) ≜ ( E e^{θY − ζτ} − 1
                E(Y − cτ) e^{θY − ζτ} ).

Likewise, (ϑm, ψm(ϑm)) solves Fm(ϑm, ψm(ϑm)) = 0, where

    Fm(θ, ζ) ≜ ( (1/m) Σ_{i=1}^m e^{θYi − ζτi} − 1
                 (1/m) Σ_{i=1}^m (Yi − cτi) e^{θYi − ζτi} ).

Note that

    0 = Fm(ϑm, ψm(ϑm)) − F(ϑ, ψ(ϑ)) = Fm(ϑm, ψm(ϑm)) − Fm(ϑ, ψ(ϑ)) + Fm(ϑ, ψ(ϑ)) − F(ϑ, ψ(ϑ)).

Therefore,

    −[Fm(ϑ, ψ(ϑ)) − F(ϑ, ψ(ϑ))] = ∇Fm(ϑ, ψ(ϑ)) ( ϑm − ϑ
                                                   ψm(ϑm) − ψ(ϑ) ) + o(|(ϑm − ϑ, ψm(ϑm) − ψ(ϑ))|).    (4.13)

Note that since 2ϑ ∈ D^o_ψ, the covariance matrix of (e^{ϑY − ψ(ϑ)τ}, (Y − cτ)e^{ϑY − ψ(ϑ)τ}) is finite. So we can apply the CLT to obtain

    √m [Fm(ϑ, ψ(ϑ)) − F(ϑ, ψ(ϑ))] ⇒ N(0, Γ)    (4.14)

as m → ∞, where

    Γ = ( Var(Z)                 Cov(Z, (Y − cτ)Z)
          Cov(Z, (Y − cτ)Z)      Var((Y − cτ)Z)    )

with Z = e^{ϑY − ψ(ϑ)τ}. Moreover, note that by the SLLN,

    ∇Fm(ϑ, ψ(ϑ)) → ( E[YZ]            −E[τZ]
                      E[Y(Y − cτ)Z]    −E[τ(Y − cτ)Z] )  a.s.    (4.15)

as m → ∞. Note that the determinant of the RHS is

    E[τZ]E[Y(Y − cτ)Z] − E[YZ]E[τ(Y − cτ)Z]
    = E[τZ](E[Y(Y − cτ)Z] − cE[τ(Y − cτ)Z])    (by (4.12))
    = E[τZ]E[(Y − cτ)²Z] > 0.

Hence, the RHS of (4.15) is nonsingular. Combining (4.13), (4.14), and (4.15) yields the desired result.

4.4 A Bootstrap Based Simulation Algorithm

Theorem 4.1 asserts that we could use e^{−n(ϑmc−ψm(ϑm))} as an approximation to P(Sn > cn) for large n and m. This approximation is clearly not accurate enough, since it approximates P(Sn > cn) only at the logarithmic scale. To achieve a higher degree of precision, we need to do additional bootstrap-type resampling.

Algorithm 1. (i) Generate m iid copies of (Y0, τ0), denoted by ((Y0,i, τ0,i) : 1 ≤ i ≤ m), and m iid regenerative cycles ((Yi, τi) : 1 ≤ i ≤ m). Then compute ϑm and ψm(·).

(ii) Sample (Y0,i, τ0,i) with probability 1/m for each i, and denote the draw by (Ỹ0, τ̃0), with Ỹ0 = Σ_{j=0}^{τ̃0−1} f(X̃j).

(iii) Sample a regenerative cycle from the collection ((Yi, τi) : 1 ≤ i ≤ m), cycle i being chosen with probability

    (1/m) exp(ϑmYi − ψm(ϑm)τi),

and denote the draw by (Ỹ1, τ̃1), with Ỹ1 = Σ_{j=τ̃0}^{τ̃0+τ̃1−1} f(X̃j).

(iv) Continue sampling such cycles independently until the total cycle length exceeds n, i.e.

    T̃(k + 1) ≜ τ̃0 + τ̃1 + · · · + τ̃k ≥ n.

Set Ñn = k.

(v) Let S̃n(m) be the sum of the first n f(X̃j)'s associated with the above bootstrapped sequence of cycles, i.e.

    S̃n(m) = Σ_{j=0}^{n−1} f(X̃j) = Σ_{i=0}^{Ñn−1} Ỹi + Σ_{j=T̃(Ñn)}^{n−1} f(X̃j).

Put

    Z(n, m) = I(S̃n(m) > cn) exp( Σ_{i=1}^{Ñn} (−ϑm Ỹi + ψm(ϑm)τ̃i) ).

(vi) Repeat the above four steps r independent times, yielding Z1(n, m), . . . , Zr(n, m), and return

    Z(n, m, r) = (1/r) Σ_{i=1}^r Zi(n, m).
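A compact sketch of Algorithm 1 follows, reusing the hypothetical empirical_psi and empirical_root helpers from Section 4.3; cycles are assumed stored as arrays of the f-values along each regenerative cycle, so that the within-cycle partial sums needed in Step (v) are available.

    import numpy as np

    def bootstrap_is_estimate(cycles0, cycles, c, n, r, rng):
        # Algorithm 1. cycles0 and cycles are lists of 1-d arrays; each array
        # holds (f(X_j)) along one cycle, so Y_i = cycles[i].sum() and
        # tau_i = len(cycles[i]).
        Y = np.array([cyc.sum() for cyc in cycles])
        tau = np.array([len(cyc) for cyc in cycles], dtype=float)
        th = empirical_root(c, Y, tau)                # vartheta_m, (4.11)
        psi = empirical_psi(th, Y, tau)               # psi_m(vartheta_m), (4.10)
        w = np.exp(th * Y - psi * tau)                # (4.10) makes these sum to ~m
        w /= w.sum()                                  # renormalize root-finding error
        m = len(cycles)
        out = np.empty(r)
        for k in range(r):
            cyc0 = cycles0[rng.integers(m)]           # Step (ii): uniform draw
            path, total, logw = [cyc0], len(cyc0), 0.0
            while total < n:                          # Steps (iii)-(iv)
                i = rng.choice(m, p=w)                # tilted cycle sampling
                path.append(cycles[i])
                total += len(cycles[i])
                logw += -th * Y[i] + psi * tau[i]     # accumulate the IS weight
            s = np.concatenate(path)[:n].sum()        # Step (v): first n terms
            out[k] = np.exp(logw) if s > c * n else 0.0
        return out.mean()                             # Step (vi)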

Proposition 4.6. Let Xm = {(Y0,i, τ0,i), (Yi, τi) : 1 ≤ i ≤ m}. Then,

    E[Z²(n, m) | Xm] ≤ e^{−2n(ϑm c − ψm(ϑm))} · (1/m) Σ_{i=1}^m e^{2ϑm Y+_{0,i} − 2ψm(ϑm)τ0,i} · (1/m) Σ_{i=1}^m e^{ϑm(Y+_i + Y−_i) + ψm(ϑm)τi},

where

    Y+_{0,i} = Σ_{j=0}^{τ0,i − 1} f+(Xj),
    Y+_i = Σ_{j=T(i+1)}^{T(i+2)−1} f+(Xj),
    Y−_i = Σ_{j=T(i+1)}^{T(i+2)−1} f−(Xj)

for each i = 1, . . . , m. Here, f+ and f− are respectively the positive and negative parts of f.

Proof. Note that

    E[Z²(n, m) | Xm] = E[ I(S̃n(m) > cn) exp( 2 Σ_{i=1}^{Ñn} (−ϑm Ỹi + ψm(ϑm)τ̃i) ) | Xm ],

and that

    Σ_{i=1}^{Ñn} Ỹi = S̃n(m) − Ỹ0 + Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j).

Hence,

    E[Z²(n, m) | Xm]
    ≤ E[ e^{−2ϑm(cn − Ỹ0 + Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j)) + 2ψm(ϑm)(T̃(Ñn+1) − τ̃0)} | Xm ]
    = E[ e^{2ϑm Ỹ0 − 2ψm(ϑm)τ̃0} · e^{−2ϑm Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j) − 2n(ϑm c − ψm(ϑm)) + 2ψm(ϑm)(T̃(Ñn+1) − n)} | Xm ]
    = e^{−2n(ϑm c − ψm(ϑm))} · E[ e^{2ϑm Ỹ0 − 2ψm(ϑm)τ̃0} | Xm ] · E[ e^{−2ϑm Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j) + 2ψm(ϑm)(T̃(Ñn+1) − n)} | Xm ],    (4.16)

where the last equality follows from the independence between (Ỹ0, τ̃0) and (Ỹi, τ̃i), i = 1, . . . , m, conditional on Xm.

Let E*[· | Xm] denote expectation under the probability measure under which we sample the regenerative cycles in Step (iii) of Algorithm 1 uniformly, i.e. select cycle i with probability 1/m for all 1 ≤ i ≤ m. Then,

    E[ e^{−2ϑm Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j) + 2ψm(ϑm)(T̃(Ñn+1) − n)} | Xm ]
    = E*[ e^{−2ϑm Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j) + 2ψm(ϑm)(T̃(Ñn+1) − n)} · e^{ϑm ỸÑn − ψm(ϑm)τ̃Ñn} | Xm ]
    ≤ E*[ e^{−2ϑm Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j) + ϑm ỸÑn + ψm(ϑm)τ̃Ñn} | Xm ]
    = E*[ e^{−ϑm Σ_{j=n}^{T̃(Ñn+1)−1} f(X̃j) + ϑm Σ_{j=T̃(Ñn)}^{n−1} f(X̃j) + ψm(ϑm)τ̃Ñn} | Xm ]
    ≤ E*[ e^{ϑm Σ_{j=n}^{T̃(Ñn+1)−1} f−(X̃j) + ϑm Σ_{j=T̃(Ñn)}^{n−1} f+(X̃j) + ψm(ϑm)τ̃Ñn} | Xm ]
    ≤ E*[ e^{ϑm Ỹ−Ñn + ϑm Ỹ+Ñn + ψm(ϑm)τ̃Ñn} | Xm ]
    = (1/m) Σ_{i=1}^m e^{ϑm(Y+_i + Y−_i) + ψm(ϑm)τi}.    (4.17)

The first inequality follows from the fact that ψm(ϑm) > 0 and T̃(Ñn + 1) − n ≤ τ̃Ñn. Finally, note that

    E[ e^{2ϑm Ỹ0 − 2ψm(ϑm)τ̃0} | Xm ] = (1/m) Σ_{i=1}^m e^{2ϑm Y0,i − 2ψm(ϑm)τ0,i} ≤ (1/m) Σ_{i=1}^m e^{2ϑm Y+_{0,i} − 2ψm(ϑm)τ0,i},    (4.18)

since ϑm > 0. The proof is finished by combining (4.16), (4.17), and (4.18).

We will need the following technical assumption.

Assumption 10. (i) 2ϑ ∈ D^o_ψ;

(ii) E e^{θY+_0 − ζτ0} < ∞ for (θ, ζ) in a neighborhood of (2ϑ, 2ψ(ϑ));

(iii) E e^{θ(Y+_1 + Y−_1) + ζτ1} < ∞ for (θ, ζ) in a neighborhood of (ϑ, ψ(ϑ)).

Proposition 4.7. Suppose that Assumptions 8, 9, and 10 hold. Let m ∼ n^{2+ξ} for some ξ > 0 and δ = n^{−(1+ξ/4)}. Then,

    P(|ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| ≥ δ) → 0

as n → ∞.

Proof. It follows from Theorem 4.2 that

    √m (ϑm − ϑ) ⇒ N(0, σ²)

as m → ∞ for some σ² > 0. Hence,

    δ^{−1}(ϑm − ϑ) → 0  in probability

as n → ∞, since √m ∼ n^{1+ξ/2} and δ^{−1} = n^{1+ξ/4}. Therefore,

    P(|ϑm − ϑ| ≥ δ/2) = P(δ^{−1}|ϑm − ϑ| ≥ 1/2) → 0

as n → ∞. Likewise, we have

    P(|ψm(ϑm) − ψ(ϑ)| ≥ δ/2) → 0

as n → ∞. It follows that

    P(|ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| ≥ δ) ≤ P(|ϑm − ϑ| ≥ δ/2) + P(|ψm(ϑm) − ψ(ϑ)| ≥ δ/2) → 0

as n → ∞.


Proposition 4.8. Suppose that Assumptions 8, 9, and 10 hold. Let δ = n^{−(1+ξ/4)} for some ξ > 0. Then,

    E[ E[Z²(n, m) | Xm] / p(n)² ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ] = e^{o(n)}

as n → ∞, where p(n) = Px(Sn > cn).

Proof. It follows from Proposition 4.6 that

    E[ E[Z²(n, m) | Xm] ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]
    ≤ E[ e^{−2n(ϑm c − ψm(ϑm))} · (1/m) Σ_{i=1}^m e^{2ϑm Y+_{0,i} − 2ψm(ϑm)τ0,i} · (1/m) Σ_{i=1}^m e^{ϑm(Y+_i + Y−_i) + ψm(ϑm)τi} ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]
    ≤ E[ e^{−2n((ϑ−δ)c − (ψ(ϑ)+δ))} · (1/m) Σ_{i=1}^m e^{2(ϑ+δ)Y+_{0,i} − 2(ψ(ϑ)−δ)τ0,i} · (1/m) Σ_{i=1}^m e^{(ϑ+δ)(Y+_i + Y−_i) + (ψ(ϑ)+δ)τi} ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]
    ≤ e^{−2n((ϑc − ψ(ϑ)) − (c+1)δ)} · E e^{2(ϑ+δ)Y+_0 − 2(ψ(ϑ)−δ)τ0} · E e^{(ϑ+δ)(Y+_1 + Y−_1) + (ψ(ϑ)+δ)τ1}.    (4.19)

On the other hand, it follows from Proposition 4.2 that

    p(n) = e^{−n(ϑc − ψ(ϑ)) + o(n)}

as n → ∞. Consequently,

    E[ E[Z²(n, m) | Xm] / p(n)² ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]
    ≤ A e^{−2n((ϑc−ψ(ϑ)) − (c+1)δ)} / e^{−2n(ϑc−ψ(ϑ)) + o(n)} = A e^{2(c+1)δn + o(n)} = e^{o(n)}


as n → ∞, since δ = n^{−(1+ξ/4)}, where

    A = E e^{2(ϑ+δ)Y+_0 − 2(ψ(ϑ)−δ)τ0} · E e^{(ϑ+δ)(Y+_1 + Y−_1) + (ψ(ϑ)+δ)τ1} < ∞

by Assumption 10.

Assumption 11. (a) lim_{θ→∞} sup_{x∈S} |Ex e^{iθf(Φ1)}| < 1;

(b) lim_{θ→∞} sup_{m≥1} |E[e^{iθY1} | Xm]| < 1.

Remark 4.5. Assumption 11 indicates that f(Φ1) and Y1 | Xm are both strongly non-lattice, which is required to derive the local central limit convergence and further to obtain the exact large deviations asymptotics for them.

Proposition 4.9. Suppose that Assumptions 8, 9, 10, and 11 hold. Let δ = n^{−(1+ξ/4)} for some ξ > 0. Then,

    E[ (E(Z(n, m) | Xm)/p(n) − 1)² ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ] → 0

as n → ∞, where p(n) = Px(Sn > cn).

Proof. It follows from Theorem 5.3 of [65] that

    p(n) ∼ ( h(ϑ, x) / (ϑ √(2πψ″(ϑ)n)) ) e^{−n(ϑc − ψ(ϑ))}

as n → ∞, where h(ϑ, x) is the eigenfunction of (4.2).

Note that

    E(Z(n, m) | Xm) = E*[I(S̃n(m) > cn) | Xm] = P*(S̃n(m) > cn | Xm),

where E*[· | Xm] denotes expectation under the probability measure under which we sample the regenerative cycles in Step (iii) of Algorithm 1 uniformly, i.e. select cycle i with probability 1/m for all 1 ≤ i ≤ m.
1/m <strong>for</strong> all 1 ≤ i ≤ m.


In light of the regenerative structure of S̃n(m), following an argument that is essentially the same as that of Theorem 1 of [66], we can show that

    P*(S̃n(m) > cn | Xm) ∼ ( h(ϑm, x) / (ϑm √(2πψ″m(ϑm)n)) ) e^{−n(ϑm c − ψm(ϑm))}

as n → ∞, for each m ≥ 1. Note that the key step in proving the preceding asymptotics is a local CLT for S̃n(m), for which its non-lattice property (Assumption 11) is needed.

It follows that, for any ɛ > 0,

    p(n) ≥ (1 − ɛ) ( h(ϑ, x) / (ϑ √(2πψ″(ϑ)n)) ) e^{−n(ϑc − ψ(ϑ))}

and

    P*(S̃n(m) > cn | Xm) ≤ (1 + ɛ) ( h(ϑm, x) / (ϑm √(2πψ″m(ϑm)n)) ) e^{−n(ϑm c − ψm(ϑm))}

for all large n. Hence,

    P*(S̃n(m) > cn | Xm) / p(n)
    ≤ ((1+ɛ)/(1−ɛ)) · (h(ϑm, x)/h(ϑ, x)) · (ϑ/ϑm) · √(ψ″(ϑ)/ψ″m(ϑm)) · e^{−n[(ϑm−ϑ)c − (ψm(ϑm)−ψ(ϑ))]}.

Moreover, note that on the event {|ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ},

    (h(ϑm, x)/h(ϑ, x)) · (ϑ/ϑm) · √(ψ″(ϑ)/ψ″m(ϑm)) ≤ 1 + ɛ


CHAPTER 4. COMPUTING LARGE DEVIATIONS FOR GSSMPS 96<br />

ξ<br />

−(1+ <strong>for</strong> all δ small enough, or <strong>for</strong> all n large enough since δ = n 4 ) . There<strong>for</strong>e,<br />

E<br />

<br />

P∗ ( ˜ <br />

Sn(m) > cn|Xm)<br />

; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ<br />

p(n)<br />

(1 + ɛ)2<br />

≤ · E<br />

1 − ɛ<br />

e −n(c+1)δ<br />

(1 + ɛ)2<br />

→<br />

1 − ɛ<br />

as n → ∞. Sending ɛ ↓ 0, we have

lim_{n→∞} E[ P∗(S̃n(m) > cn|Xm)/p(n) ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ] ≤ 1.

Likewise,

lim_{n→∞} E[ P∗(S̃n(m) > cn|Xm)/p(n) ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ] ≥ 1.

Hence,

lim_{n→∞} E[ P∗(S̃n(m) > cn|Xm)/p(n) ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ] = 1.

Similarly, we can show

lim_{n→∞} E[ P∗(S̃n(m) > cn|Xm)²/p(n)² ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ] = 1.

It follows that

E[ (E(Z(n, m)|Xm)/p(n) − 1)² ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]

= E[ P∗(S̃n(m) > cn|Xm)²/p(n)² − 2 P∗(S̃n(m) > cn|Xm)/p(n) + 1 ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]

→ 1 − 2 + 1 = 0

as n → ∞, using the two preceding limits together with P(|ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ) → 1, which holds by Proposition 4.7.

Theorem 4.3. Suppose that Assumptions 8, 9, 10, and 11 hold. Let m ∼ n^{2+ξ} for some ξ > 0. Then there exists a specification for r such that r ∼ e^{o(n)} and

P( |Z(n, m, r)/p(n) − 1| > ɛ ) → 0

as n → ∞, for any ɛ > 0.

Proof. Note that, setting δ = n^{−(1+ξ/4)},

P( |Z(n, m, r)/p(n) − 1| > ɛ )
≤ P( |Z(n, m, r)/p(n) − 1| > ɛ ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ) + P( |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| ≥ δ ).

Hence, by Proposition 4.7, it suffices to show that for some choice of r ∼ e^{o(n)},

P( |Z(n, m, r)/p(n) − 1| > ɛ ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ) → 0   (4.20)

as n → ∞. To this end, by Chebyshev's inequality,

P( |Z(n, m, r)/p(n) − 1| > ɛ | Xm )
≤ E[(Z(n, m, r) − p(n))²|Xm] / (ɛ²p(n)²)
= ( Var(Z(n, m, r)|Xm) + [E(Z(n, m, r)|Xm) − p(n)]² ) / (ɛ²p(n)²)
= ɛ⁻² ( Var(Z(n, m)|Xm)/(rp(n)²) + [E(Z(n, m)|Xm) − p(n)]²/p(n)² )
≤ ɛ⁻² ( E(Z(n, m)²|Xm)/(rp(n)²) + (E(Z(n, m)|Xm)/p(n) − 1)² ).


Therefore,

P( |Z(n, m, r)/p(n) − 1| > ɛ ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ )
= E[ P( |Z(n, m, r)/p(n) − 1| > ɛ | Xm ) ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]
≤ ɛ⁻² E[ E(Z(n, m)²|Xm)/(rp(n)²) ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ]
  + ɛ⁻² E[ (E(Z(n, m)|Xm)/p(n) − 1)² ; |ϑm − ϑ| + |ψm(ϑm) − ψ(ϑ)| < δ ],

which, combined with Proposition 4.8 and Proposition 4.9, implies (4.20).

Remark 4.6. Theorem 4.3 asserts that in order to achieve a given relative precision ɛ, the required computational effort for Algorithm 1 is e^{o(n)}. More specifically, the computational effort for generating cycles and computing ϑm is O(m) = O(n^{2+ξ}). Since the complexity of simulating one Z(n, m) is O(m), which is equivalent to generating a discrete rv with support size m, the computational effort for computing the estimate Z(n, m, r) is O(mr) = e^{o(n)}. Therefore, the entire computational complexity of Algorithm 1 is e^{o(n)}, meaning the algorithm is logarithmically efficient.

4.5 Numerical Experiments

4.5.1 Autoregressive Model

We will first use an autoregressive model of order 1 to illustrate Algorithm 1, since the tail probability of such a model is explicitly available. In particular, let

Φn = ρΦn−1 + Zn, n ≥ 1,

where ρ ∈ (0, 1) and the Zn's are iid with standard normal distribution N(0, 1) and are independent of Φ0. Define

Sn = Σ_{i=0}^{n−1} Φi.


Given c > 0, we will compute the probability Px(Sn > cn) ≜ P(Sn > cn | Φ0 = x) for large n.

First, note that by direct calculation,

Φn = ρⁿΦ0 + Σ_{i=1}^{n} ρ^{n−i} Zi,

and

Sn = (1 − ρⁿ)/(1 − ρ) · Φ0 + Σ_{i=1}^{n−1} (1 − ρ^{n−i})/(1 − ρ) · Zi.

Hence, given Φ0, Sn is a normal rv with mean (1 − ρⁿ)/(1 − ρ) · Φ0 and variance

σ(n)² ≜ Σ_{i=1}^{n−1} ( (1 − ρ^{n−i})/(1 − ρ) )² = 1/(1 − ρ)² · [ n − 1 − 2(ρ − ρⁿ)/(1 − ρ) + (ρ² − ρ^{2n})/(1 − ρ²) ].

Therefore,

Px(Sn > cn) = 1 − Ψ( (cn − (1 − ρⁿ)/(1 − ρ) · x) / σ(n) ),

where Ψ(·) is the cumulative distribution function of the standard normal distribution.

Moreover, we can easily calculate the large deviations asymptotics of Sn. Particularly, note that

(1/n) log Ex e^{θSn} = (1/n) log Ex e^{θ[(1−ρⁿ)/(1−ρ)·x + N(0, σ(n)²)]} = (1/n) [ θ(1−ρⁿ)/(1−ρ)·x + θ²σ(n)²/2 ] → θ²/(2(1−ρ)²) ≜ ψ(θ)

as n → ∞. Setting ψ′(ϑ) = c yields

ϑ = c(1 − ρ)².

The exponential rate function is then given by

I(c) ≜ ϑc − ψ(ϑ) = c²(1 − ρ)²/2.
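The closed-form tail probability and the rate function above give an exact benchmark against which simulation output can be checked. The following short Python sketch is our own illustration (not part of the dissertation's experiments; it assumes scipy is available) and evaluates Px(Sn > cn) together with the empirical decay rate −log p(n)/n:

    # Exact tail probability and large deviations rate for the AR(1) model,
    # using the closed-form expressions derived above.
    from math import log, sqrt
    from scipy.stats import norm

    def ar1_tail(n, x=0.0, rho=0.5, c=1.0):
        # P_x(S_n > cn): given Phi_0 = x, S_n is normal with the mean and
        # variance sigma(n)^2 computed above.
        mean = (1 - rho**n) / (1 - rho) * x
        var = (n - 1 - 2 * (rho - rho**n) / (1 - rho)
               + (rho**2 - rho**(2 * n)) / (1 - rho**2)) / (1 - rho)**2
        return norm.sf((c * n - mean) / sqrt(var))  # = 1 - Psi(...)

    rho, c = 0.5, 1.0
    I_c = c**2 * (1 - rho)**2 / 2                   # I(c) = 0.125 here
    for n in (20, 40, 60, 80, 100):
        p = ar1_tail(n)
        print(n, p, -log(p) / n, I_c)               # -log p(n)/n -> I(c)

For n = 20 this gives p(20) ≈ 0.008, consistent with the exact values reported in Table 4.3 below.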


x    ρ    c    ɛ
0    0.5  1    0.25

Table 4.1: Parameter specification for computing rare-event probabilities for the AR(1) model.

In order to implement Algorithm 1, we first need to specify the regenerative structure. Namely, we ought to find a compact set C, a constant δ ∈ (0, 1), and a distribution ϕ such that the minorization condition (Assumption 9) is satisfied. To that end, let C = [−ɛ, ɛ] and φ(y) = inf_{x∈C} p(x, y), where p(x, y) is the transition density of the AR(1) model. Then,

φ(y) = inf_{|x|≤ɛ} (1/√(2π)) e^{−(y−ρx)²/2} = (1/√(2π)) e^{−(y+ρɛ)²/2} for y ≥ 0, and (1/√(2π)) e^{−(y−ρɛ)²/2} for y < 0.

Let

δ = ∫_{−∞}^{∞} φ(y) dy = ∫_{−∞}^{0} (1/√(2π)) e^{−(y−ρɛ)²/2} dy + ∫_{0}^{∞} (1/√(2π)) e^{−(y+ρɛ)²/2} dy = Ψ(−ρɛ) + 1 − Ψ(ρɛ) = 2Ψ(−ρɛ) ∈ (0, 1).

Finally, let ϕ(x) = ∫_{−∞}^{x} (φ(y)/δ) dy be a distribution function on R. Then, we have

Px(Φ1 ∈ ·) ≥ δϕ(·), x ∈ C.
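This construction is easy to realize in code: δ = 2Ψ(−ρɛ) comes directly from the normal cdf, and since φ(y) is dominated pointwise by the standard normal density, one can draw from ϕ by rejection with a N(0, 1) proposal. The sketch below is our own illustration (not code from the dissertation), using the parameters of Table 4.1:

    # Minorization constant delta = 2*Psi(-rho*eps) and a rejection
    # sampler for the regeneration distribution phi = (1/delta)*varphi.
    import numpy as np
    from scipy.stats import norm

    rho, eps = 0.5, 0.25
    delta = 2 * norm.cdf(-rho * eps)    # = 2*Psi(-0.125) ~ 0.9005

    def sample_phi(rng):
        # varphi(y) <= standard normal density n(y) pointwise, so propose
        # y ~ N(0,1) and accept with probability varphi(y)/n(y); the
        # overall acceptance rate equals delta.
        while True:
            y = rng.standard_normal()
            shift = rho * eps if y >= 0 else -rho * eps
            if rng.random() < np.exp(-((y + shift)**2 - y**2) / 2):
                return y

    rng = np.random.default_rng(0)
    draws = [sample_phi(rng) for _ in range(10000)]
    print(delta, np.mean(draws))        # mean ~ 0, since phi is symmetric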

For the numerical experiment, the parameters are set as in Table 4.1. The numerical results are shown in Table 4.2 and Table 4.3, where "Log Ratio" means log(Var(Z(n, m)|Xm)) / log(p(n)²); a Log Ratio close to 1 indicates that the conditional variance decays at nearly the same exponential rate as p(n)², i.e. near-logarithmic efficiency.


       True    Estimated
ϑ      0.25    0.2951
ψ(ϑ)   0.125   0.1567
I(c)   0.125   0.1384

Table 4.2: True vs estimated tilting parameters for the AR(1) model. The number of cycles m = 40000.

 n     p(n)       Z(n, m, r)   Var(Z(n, m)|Xm)   Log Ratio
 20    0.0082     0.0116       0.0142            0.443
 40    5.32E-04   4.62E-04     1.59E-04          0.580
 60    3.72E-05   2.94E-05     6.07E-07          0.702
 80    2.70E-06   2.00E-06     2.59E-09          0.771
 100   2.01E-07   1.19E-07     9.58E-12          0.823

Table 4.3: Results for computing rare-event probabilities for the AR(1) model. The number of cycles m = 40000, the number of bootstrap samples r = 20000.

4.5.2 Random Walk

We now apply Algorithm 1 to the random walk with light-tailed increments, which is a special case of a Markov chain. In particular, consider

Φn = Φn−1 + Zn, n ≥ 1,

where the Zn's are iid with standard normal distribution. In this setting, Φ regenerates at each individual step, and we can simplify Algorithm 1 as follows.

Algorithm 2. (i) Generate m iid copies of Z, call them (Z1, . . . , Zm) ≜ Xm.

(ii) Compute

ψm(θ) = log( (1/m) Σ_{i=1}^{m} e^{θZi} ).

(iii) Compute ϑm via ψ′m(ϑm) = c.

(iv) Sample Zi from Xm with probability

(1/m) e^{ϑmZi − ψm(ϑm)},


and call it Z̃1.

(v) Continue sampling independently to obtain Z̃1, . . . , Z̃n.

(vi) Put S̃n(m) = Z̃1 + · · · + Z̃n and

Z(n, m) = I(S̃n(m) > cn) exp(−ϑm S̃n(m) + ψm(ϑm) n).

(vii) Repeat the above three steps r independent times, yielding Z1(n, m), . . . , Zr(n, m), and return

Z(n, m, r) = (1/r) Σ_{i=1}^{r} Zi(n, m).

Direct calculation yields that

ψ(θ) ≜ log E e^{θZ1} = θ²/2.

Setting ψ′(ϑ) = c gives ϑ = c. Hence, the exponential rate function is

I(c) = ϑc − ψ(ϑ) = c²/2.
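Algorithm 2 is simple enough to prototype in a few lines. The following Python sketch is our own rendering (not the code behind Tables 4.4 and 4.5); in particular, the root-finding bracket passed to brentq when solving ψ′m(ϑm) = c is an assumption that suits standard normal increments:

    # Algorithm 2 for the standard normal random walk: empirical
    # exponential tilting based on m iid increments.
    import numpy as np
    from scipy.optimize import brentq

    def algorithm2(n, m=40000, r=10000, c=0.5, seed=0):
        rng = np.random.default_rng(seed)
        Z = rng.standard_normal(m)                          # step (i)
        psi_m = lambda t: np.log(np.mean(np.exp(t * Z)))    # step (ii)
        dpsi_m = lambda t: (np.mean(Z * np.exp(t * Z))
                            / np.mean(np.exp(t * Z)))       # psi_m'
        theta_m = brentq(lambda t: dpsi_m(t) - c, 1e-8, 10.0)  # step (iii)
        w = np.exp(theta_m * Z - psi_m(theta_m)) / m        # tilted weights
        Zt = rng.choice(Z, size=(r, n), p=w)                # steps (iv)-(v)
        S = Zt.sum(axis=1)                                  # step (vi)
        vals = (S > c * n) * np.exp(-theta_m * S + psi_m(theta_m) * n)
        return vals.mean()                                  # step (vii)

    print(algorithm2(20))   # should be close to p(20) = 0.0569 (Table 4.5)

Here rng.choice carries out the tilted resampling of steps (iv)-(v) for all r replications at once; the tilted weights sum to one by the definition of ψm.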


The numerical results are shown in Table 4.4 and Table 4.5.

       True    Estimated
ϑ      0.5     0.4992
ψ(ϑ)   0.125   0.1258
I(c)   0.125   0.1238

Table 4.4: True vs estimated tilting parameters for the random walk. c = 0.5. The number of cycles m = 40000.

 n     p(n)       Z(n, m, r)   Var(Z(n, m)|Xm)   Log Ratio
 20    0.0569     0.0576       0.0064            0.881
 40    7.83E-04   8.37E-04     2.44E-06          0.903
 60    5.38E-05   5.93E-05     1.47E-08          0.917
 80    3.87E-06   4.41E-06     9.57E-11          0.926
 100   2.87E-07   3.21E-07     5.88E-13          0.935

Table 4.5: Results for computing rare-event probabilities for the random walk. c = 0.5. The number of cycles m = 40000, the number of bootstrap samples r = 10000.


Appendix A

Stochastic Stability of Markov Processes

The purpose of this appendix is to give an overview of a unified approach to studying the stochastic stability (transience, recurrence, ergodicity, etc.) of continuous-time Markov processes via the so-called Foster-Lyapunov criteria. Foster-Lyapunov or "drift" (inequality) criteria are among the most widely used sufficient conditions for the stability classification of Markov models. An excellent reference on this topic is [74], which provides a systematic treatment in the context of discrete time Markov chains. On the other hand, [71], [72], and [92] discuss the corresponding materials for continuous time Markov processes. Other related references include [94], [95], and [20]; see also [73] for a brief survey on the same subject.

To fix ideas, let Φ = (Φn : n ≥ 0) be a (time-homogeneous) discrete time Markov chain with state space S and transition probability P(x, A) = P(Φn ∈ A | Φn−1 = x) for A ⊆ S. Suppose that V is a non-negative function on S. Then the drift of V(Φn) at x ∈ S is defined by

ΔV(x) ≜ ∫_S P(x, dy) V(y) − V(x).


An example of the Foster-Lyapunov criterion is

ΔV(x) ≤ −1 + b IC(x)   (A.1)

for some b > 0 and some set C (typically, compact). Under certain other mild technical conditions, (A.1) will imply that Φ is positive recurrent.
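For instance (an illustration of ours, not taken from the references above), consider the AR(1) chain Φn = ρΦn−1 + Zn of Section 4.5.1 with V(x) = x². Then

ΔV(x) = E(ρx + Z1)² − x² = 1 − (1 − ρ²)x²,

so (A.1) holds with b = 2 and the compact set C = {x : (1 − ρ²)x² ≤ 2}: off C the drift is strictly less than −1, while on C it is at most 1 = −1 + b.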

It is not hard to imagine that drift conditions similar to (A.1) in the discrete time context have analogues for continuous time Markov processes. Evidently, an analogue of the drift operator Δ is the "generator" (whose definition will be given later). From now on, we will focus on the continuous time setting.

A.1 Extended Generator

Let Φ = (Φ(t) : t ≥ 0) be a (time-homogeneous) continuous time Markov process living on the state space (S, B(S)), where B(S) is the Borel σ-field on S. We assume that S is a locally compact and separable metric space. The Foster-Lyapunov framework works for general state spaces, but in practice S is typically a subset of Euclidean space. Φ evolves on the probability space (Ω, F, P). We denote by Px the probability measure conditioned on Φ(0) = x and by Ex the associated expectation operator. We assume that Φ is a nonexplosive (Borel) right process (see [87] for the definition), so that Φ is strongly Markovian and has right continuous sample paths.

For the remainder of the appendix, we fix the following two pieces of notation. For a measurable set A ∈ B(S), let τA = inf{t ≥ 0 : Φ(t) ∈ A}. Let {Ok : k ≥ 1} denote a family of open sets in S for which the closure of Ok is compact and Ok ↑ S as k → ∞.

Definition A.1. Φ is called ϕ-irreducible if there exists a σ-finite measure ϕ such that for all x ∈ S,

ϕ(A) > 0 implies Px(τA < ∞) > 0.

There are several different versions of the definition of a generator of a Markov process, and we will adopt the one from [23].


Definition A.2. We denote by Dom(Ã) the set of all functions f : S → R for which there exists a measurable function g : S → R such that

M^f(t) ≜ f(Φ(t)) − ∫_0^t g(Φ(s)) ds   (A.2)

is a local martingale adapted to Φ with respect to Px for each x ∈ S. We write Ãf = g and call Ã the extended generator of Φ; Dom(Ã) is called the domain of Ã.

The local martingale property indicates that there exists a sequence of stopping times (σk : k ≥ 0) with σk → ∞ as k → ∞ (also called a localizing sequence of stopping times) for which (M^f(t ∧ σk) : t ≥ 0) is a uniformly integrable martingale adapted to Φ for any fixed k.
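As a standard illustration (ours, not one taken from [23]): suppose Φ solves the one-dimensional SDE dΦ(t) = b(Φ(t)) dt + σ(Φ(t)) dW(t). Then Itô's formula shows that every f ∈ C²(R) belongs to Dom(Ã) with

Ãf(x) = b(x)f′(x) + (1/2)σ(x)²f′′(x),

since in that case M^f(t) = ∫_0^t f′(Φ(s))σ(Φ(s)) dW(s), a local martingale.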

A.2 Foster-Lyapunov Criteria

The Foster-Lyapunov criteria presented in this section are taken from [73], which provides an overview of the results in [72]. Note that in [72], the authors use a different definition of the extended generator and adopt a "truncation" argument. However, the same authors switched to Definition A.2 in the survey [73]. In fact, the proofs in [72] remain valid if one adopts Definition A.2 and replaces the truncated process by the "localized" process Φ(t ∧ Tk), where (Tk : k ≥ 0) is a localizing sequence of stopping times for the local martingale (A.2).

A.2.1 Recurrence and Ergodicity

Definition A.3. Φ is called Harris recurrent if there exists a σ-finite measure ϕ such that for all x ∈ S,

ϕ(A) > 0 implies Px(τA < ∞) = 1.

Note that the above definition of Harris recurrence is equivalent to the following one in the discrete time setting. In particular, a Markov chain (Φn : n ≥ 0) is Harris recurrent if there exists a set C ∈ B(S) (called a small set in [74]) for which there exist λ > 0, a probability ϕ on S, and m ≥ 1 such that Px(τC < ∞) = 1 for each x ∈ S and Px(Φm ∈ B) ≥ λϕ(B) for x ∈ C and B ∈ B(S).

We will need the concept of a petite set, which generalizes that of a small set, in order to establish the Foster-Lyapunov criteria for stability.

Definition A.4. A non-empty set C ∈ B(S) is called ϕa-petite if ϕa is a non-trivial measure on B(S) and a is a probability on (0, ∞) which satisfy the bound

Ka(x, ·) ≜ ∫_0^∞ P(t, x, ·) a(dt) ≥ ϕa(·), ∀x ∈ C,

where P(t, x, ·) = Px(Φ(t) ∈ ·).

Definition A.5. A function f : S → R is lower semicontinuous if

lim inf_{y→x} f(y) ≥ f(x), x ∈ S.

Definition A.6. If Px(Φ(t) ∈ O) is a lower semicontinuous function of x for any open set O ∈ B(S), then Φ is called a (weak) Feller process.

Remark A.1. Φ is Feller if and only if Ex g(Φ(t)) is a bounded continuous function in x ∈ S for all t > 0 whenever g is bounded and continuous. See, for example, Proposition 6.1.1 of [74].

It is well known that the Harris recurrence of Φ implies the existence of an essentially unique invariant measure π. If the invariant measure is finite, then it can be normalized to a probability measure, in which case Φ is called positive Harris recurrent.

Definition A.7. For any positive measurable function f ≥ 1 and any signed measure η on S, define the f-norm ‖η‖f by

‖η‖f = sup_{|g|≤f} |η(g)|,

where η(g) = ∫_S g(x) η(dx). When f ≡ 1, ‖·‖f is called the total variation norm and is denoted by ‖·‖TV.

Definition A.8. Φ is called ergodic if the stationary distribution π exists and

lim_{t→∞} ‖Px(Φ(t) ∈ ·) − π(·)‖TV = 0, x ∈ S.

Recall that for an irreducible discrete time Markov chain, ergodicity is equivalent to positive Harris recurrence. There is also such a connection in the continuous time setting. It is shown in [71] that if any one skeleton chain is irreducible, then ergodicity and positive Harris recurrence are equivalent concepts.

Proposition A.1. Suppose that Φ is a ϕ-irreducible nonexplosive right process. If there exist constants c > 0, d < ∞, a petite set C ∈ B(S), and a function V such that

ÃV(x) ≤ −c + d IC(x), x ∈ S,

then Φ is ergodic.

Proof. See Theorem 7 of [73].

A.2.2 Exponential Ergodicity

Suppose that Φ is positive Harris recurrent with stationary distribution π.

Definition A.9. For a function f ≥ 1, Φ is called f-exponentially ergodic if there exist a constant β ∈ (0, 1) and a finite-valued function B(x) such that

‖Px(Φ(t) ∈ ·) − π(·)‖f ≤ B(x)β^t

for all t > 0 and x ∈ S.

Proposition A.2. Suppose that Φ is a ϕ-irreducible nonexplosive right process. If there exist a petite set C, constants c > 0, d < ∞, and a function V such that

ÃV(x) ≤ −cV(x) + d IC(x), x ∈ S,

then Φ is V-exponentially ergodic and π(V) < ∞.

Proof. See Theorem 7 and Theorem 8 of [73].
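As an illustration of Proposition A.2 (our example, not one from [73]): for the Ornstein-Uhlenbeck process dΦ(t) = −γΦ(t) dt + dW(t) with γ > 0, take V(x) = 1 + x². Then ÃV(x) = −2γx² + 1, and

ÃV(x) ≤ −γV(x) + (1 + γ) IC(x), x ∈ R,

where C = {x : γx² ≤ 1 + γ} is compact (and petite for this process), so Φ is V-exponentially ergodic.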


Bibliography

[1] S. Asmussen and P. W. Glynn. Stochastic Simulation: Algorithms and Analysis. Springer-Verlag, 2007.

[2] P. K. Andersen, Ø. Borgan, R. D. Gill, and N. Keiding. Statistical Models Based on Counting Processes. Springer-Verlag, 1993.

[3] S. Asmussen. Conjugate processes and the simulation of ruin problems. Stoch. Proc. Appl., 20:213–229, 1985.

[4] S. Asmussen. Ruin Probabilities. World Scientific Publishing Co. Ltd., London, 2000.

[5] K. B. Athreya and P. Ney. A new approach to the limit theory of recurrent Markov chains. Trans. Amer. Math. Soc., 245:493–501, 1978.

[6] A. Bassamboo and S. Jain. Efficient importance sampling for reduced form models in credit risk. In Proceedings of the 2006 Winter Simulation Conference, pages 741–749, 2006.

[7] A. Bassamboo, S. Juneja, and A. Zeevi. Portfolio credit risk with extremal dependence: Asymptotic analysis and efficient simulation. Operations Research, 56(3):593–606, 2008.

[8] L. Bauwens and N. Hautsch. Handbook of Financial Time Series, chapter Modelling Financial High Frequency Data Using Point Processes, pages 953–979. Springer Berlin Heidelberg, 2009.

[9] A. Berman and R. J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. SIAM, 1987.

[10] J. Blanchet and P. W. Glynn. Efficient rare event simulation for the maximum of heavy-tailed random variables. Ann. Appl. Probab., 18:1351–1378, 2008.

[11] J. Blanchet, P. W. Glynn, and S. P. Meyn. Large deviations for the empirical mean of an M/M/1 queue. Submitted for publication, 2011.

[12] J. Blanchet, K. Leder, and P. W. Glynn. Strongly efficient algorithms for light-tailed random walks: An old folk song sung to a faster new tune. In P. L'Ecuyer and A. Owen, editors, Monte Carlo and Quasi-Monte Carlo Methods 2008. Springer, 2009.

[13] H. A. P. Blom, G. J. Bakker, and J. Krystul. Rare event estimation for a large-scale stochastic hybrid system with air traffic application. In G. Rubino and B. Tuffin, editors, Rare Event Simulation using Monte Carlo Methods, chapter 9, pages 193–214. Wiley, 2009.

[14] P. Brémaud. Point Processes and Queues: Martingale Dynamics. Springer, 1981.

[15] J. Bucklew. Introduction to Rare Event Simulation. Springer-Verlag, 2004.

[16] H. P. Chan and T. L. Lai. Efficient importance sampling for Monte Carlo evaluation of exceedance probabilities. Ann. Appl. Probab., 17:440–473, 2007.

[17] H. P. Chan and T. L. Lai. A sequential Monte Carlo approach to computing tail probabilities in stochastic models. Ann. Appl. Probab., forthcoming.

[18] J. C. C. Chan and D. P. Kroese. Efficient estimation of large portfolio loss probabilities in t-copula models. European Journal of Operational Research, 205:361–367, 2010.

[19] V. Chavez-Demoulin, A. C. Davison, and A. J. McNeil. Estimating value-at-risk: A point process approach. Quantitative Finance, 5(2):227–234, 2005.

[20] M.-F. Chen. On three classical problems for Markov chains with continuous time parameter. J. Appl. Prob., 28:305–320, 1991.

[21] P. Cheridito, D. Filipović, and M. Yor. Equivalent and absolutely continuous measure changes for jump-diffusion processes. Ann. Appl. Probab., 15:1713–1732, 2005.

[22] J. C. Cox, J. E. Ingersoll, and S. A. Ross. A theory of the term structure of interest rates. Econometrica, 53(2):385–407, 1985.

[23] M. H. A. Davis. Markov Models and Optimization. Chapman and Hall, London, 1993.

[24] A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications. Springer, 2nd edition, 1998.

[25] M. D. Donsker and S. R. S. Varadhan. Asymptotic evaluation of certain Markov process expectations for large time. I, II. Comm. Pure Appl. Math., 28:1–47, 1975.

[26] M. D. Donsker and S. R. S. Varadhan. Asymptotic evaluation of certain Markov process expectations for large time. III. Comm. Pure Appl. Math., 29:389–461, 1976.

[27] A. L. Dontchev and R. T. Rockafellar. Implicit Functions and Solution Mappings: A View from Variational Analysis. Springer, 2009.

[28] D. Duffie, D. Filipović, and W. Schachermayer. Affine processes and applications in finance. Ann. Appl. Probab., 13(3):984–1053, 2003.

[29] D. Duffie and N. Garleanu. Risk and valuation of collateralized debt obligations. Financial Analysts Journal, 57(1):41–59, 2001.

[30] D. Duffie, J. Pan, and K. Singleton. Transform analysis and asset pricing for affine jump-diffusions. Econometrica, 68(6):1343–1376, 2000.

[31] D. Duffie and K. J. Singleton. Credit Risk: Pricing, Measurement, and Management. Princeton University Press, 2003.

[32] K. R. Duffy and S. P. Meyn. Most likely paths to error when estimating the mean of a reflected random walk. Performance Evaluation, 67(12):1290–1303, 2010.

[33] P. Dupuis and H. Wang. Dynamic importance sampling for uniformly recurrent Markov chains. Ann. Appl. Probab., 15:1–38, 2005.

[34] P. Dupuis and H. Wang. Subsolutions of an Isaacs equation and efficient schemes for importance sampling. Math. Oper. Research, 32:723–757, 2007.

[35] P. E. Echeverria. A criterion for invariant measures of Markov processes. Z. Wahrsch. verw. Gebiete, 61:1–16, 1982.

[36] A. Eckner. Computational techniques for basic affine models of portfolio credit risk. Journal of Computational Finance, 15:63–97, 2009.

[37] E. Errais, K. Giesecke, and L. B. Goldberg. Affine point processes and portfolio credit risk. SIAM Journal on Financial Mathematics, 1:642–665, 2010.

[38] S. N. Ethier and T. G. Kurtz. Markov Processes: Characterization and Convergence. Wiley, 1986.

[39] L. C. Evans. Partial Differential Equations. American Mathematical Society, 1998.

[40] D. Filipović, E. Mayerhofer, and P. Schneider. Density approximations for multivariate affine jump-diffusion processes. Working paper, 2011.

[41] K. Giesecke, K. Spiliopoulos, and R. B. Sowers. Default clustering in large portfolios: Typical and atypical events. Submitted, 2011.

[42] K. Giesecke, K. Spiliopoulos, R. B. Sowers, and J. A. Sirignano. Large portfolio asymptotics for losses from default. Submitted, 2011.

[43] K. Giesecke, B. Kim, and S. Zhu. Monte Carlo algorithms for default timing problems. Management Science, forthcoming.

[44] P. Glasserman, W. Kang, and P. Shahabuddin. Large deviations in multifactor portfolio credit risk. Mathematical Finance, 17(3):345–379, 2007.

[45] P. Glasserman, W. Kang, and P. Shahabuddin. Fast simulation of multifactor portfolio credit risk. Operations Research, 56(5):1200–1217, 2008.

[46] P. Glasserman and K.-K. Kim. Saddlepoint approximation for affine jump-diffusion models. Journal of Economic Dynamics and Control, 33:37–52, 2009.

[47] P. Glasserman and K.-K. Kim. Moment explosions and stationary distributions in affine diffusion models. Mathematical Finance, 20(1):1–33, 2010.

[48] P. Glasserman and J. Li. Importance sampling for portfolio credit risk. Management Science, 51(11):1643–1656, 2005.

[49] P. W. Glynn. Wide-sense regeneration for Harris recurrent Markov processes: An open problem. Queueing Systems: Theory and Applications, (3), forthcoming.

[50] P. W. Glynn and A. Zeevi. Bounding stationary expectations of Markov processes. In S. Ethier, J. Feng, and R. Stockbridge, editors, Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz. IMS, 2008.

[51] A. G. Hawkes. Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58:83–90, 1971.

[52] A. G. Hawkes and D. Oakes. A cluster process representation of a self-exciting process. Journal of Applied Probability, 11:493–503, 1974.

[53] P. Heidelberger. Fast simulation of rare events in queueing and reliability models. ACM Transactions on Modeling and Computer Simulation, 5(1):43–85, 1995.

[54] T. S. Y. Ho and S. B. Lee. Term structure movements and pricing interest rate contingent claims. Journal of Finance, 41(5):1011–1029, 1986.

[55] R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1994.

[56] J. Jacod and A. Shiryaev. Limit Theorems for Stochastic Processes. Springer, 2nd edition, 2002.

[57] N. L. Johnson, S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions, Vol. 2. Wiley-Interscience, 2nd edition, 1995.

[58] M. S. Joshi. Applying importance sampling to pricing single tranches of CDOs in a one-factor Li model. Technical report, QUARC, Group Risk Management, Royal Bank of Scotland, 2004.

[59] S. Juneja and P. Shahabuddin. Simulating heavy tailed processes using delayed hazard rate twisting. ACM Transactions on Modeling and Computer Simulation, 12(2):94–118, 2002.

[60] S. Juneja and P. Shahabuddin. Rare event simulation techniques: An introduction and recent advances. In S. G. Henderson and B. L. Nelson, editors, Handbooks in Operations Research and Management Science, volume 13, chapter 11. Elsevier, Amsterdam, The Netherlands, 2006.

[61] M. Kalkbrener, H. Lotter, and L. Overbeck. Sensible and efficient capital allocation for credit portfolios. Risk, 17(1):19–24, 2004.

[62] I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus. Springer, 2nd edition, 1991.

[63] K.-K. Kim. Stability analysis of Riccati differential equations related to affine diffusion processes. Journal of Mathematical Analysis and Applications, 364:18–31, 2010.

[64] I. Kontoyiannis and S. P. Meyn. Spectral theory and limit theorems for geometrically ergodic Markov processes. Ann. Appl. Probab., 13(1):304–362, 2003.

[65] I. Kontoyiannis and S. P. Meyn. Large deviations asymptotics and the spectral theory of multiplicatively regular Markov processes. Electron. J. Probab., 10(3):61–123, 2005.

[66] T. Kuczek and K. N. Crank. A large-deviation result for regenerative processes. J. Theo. Prob., 4(3):551–561, 1991.

[67] D. Lépingle and J. Mémin. Sur l'intégrabilité uniforme des martingales exponentielles. Z. Wahrsch. Verw. Gebiete, 42(3):175–203, 1978.

[68] H. Masuda. On multidimensional Ornstein-Uhlenbeck processes driven by a general Lévy process. Bernoulli, 10:97–120, 2004.

[69] H. Masuda. Ergodicity and exponential β-mixing bounds for multidimensional diffusions with jumps. Stochastic Process. Appl., 117:35–56, 2007.

[70] R. Merton. On the pricing of corporate debt: The risk structure of interest rates. Journal of Finance, 29:449–470, 1974.

[71] S. P. Meyn and R. L. Tweedie. Stability of Markovian processes II: Continuous-time processes and sampled chains. Adv. Appl. Prob., 25:487–517, 1993.

[72] S. P. Meyn and R. L. Tweedie. Stability of Markovian processes III: Foster-Lyapunov criteria for continuous-time processes. Adv. Appl. Prob., 25:518–548, 1993.

[73] S. P. Meyn and R. L. Tweedie. A survey of Foster-Lyapunov techniques for general state space Markov processes. In Proceedings of the Workshop on Stochastic Stability and Stochastic Stabilization, Metz, France, June 1993.

[74] S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press, 2nd edition, 2009.

[75] P. Ney and E. Nummelin. Markov additive processes I. Eigenvalue properties and limit theorems. Ann. Probab., 15(2):561–592, 1987.

[76] P. Ney and E. Nummelin. Markov additive processes II. Large deviations. Ann. Probab., 15(2):593–609, 1987.

[77] E. Nummelin. A splitting technique for Harris recurrent chains. Z. Wahrscheinlichkeitstheorie und Verw. Geb., 43:309–318, 1978.

[78] E. Nummelin. General Irreducible Markov Chains and Nonnegative Operators. Cambridge University Press, 1984.

[79] Y. Ogata. Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association, 83(401):9–27, 1988.

[80] Y. Ogata. Space-time point-process models for earthquake occurrences. Annals of the Institute of Statistical Mathematics, 50(2):378–402, 1998.

[81] B. Øksendal. Stochastic Differential Equations: An Introduction with Applications. Springer-Verlag, 2000.

[82] E. Papageorgiou and R. Sircar. Multiscale intensity models and name grouping for valuation of multi-name credit derivatives. Applied Mathematical Finance, 15(1):73–105, 2007.

[83] P. E. Protter. Stochastic Integration and Differential Equations. Springer, 2nd edition, 2003.

[84] G. Rubino and B. Tuffin, editors. Rare Event Simulation using Monte Carlo Methods. John Wiley & Sons, 2009.

[85] J. S. Sadowsky and J. A. Bucklew. On large deviations theory and asymptotically efficient Monte Carlo estimation. IEEE Trans. Inform. Theory, 36:579–588, 1990.

[86] K. Sato and M. Yamazato. Operator-self-decomposable distributions as limit distributions of processes of Ornstein-Uhlenbeck type. Stochastic Process. Appl., 17:73–100, 1984.

[87] M. Sharpe. General Theory of Markov Processes. Academic Press, 1988.

[88] Y. Shimizu. M-estimation for discretely observed ergodic diffusion processes with infinite jumps. Stat. Inference Stoch. Process., 9:179–225, 2004.

[89] Y. Shimizu and N. Yoshida. Estimation of parameters for diffusion processes with jumps from discrete observations. Stat. Inference Stoch. Process., 9:227–277, 2004.

[90] A. V. Skorohod. Asymptotic Methods in the Theory of Stochastic Differential Equations. American Mathematical Society, 1989.

[91] A. Stomakhin, M. B. Short, and A. L. Bertozzi. Reconstruction of missing data in social networks based on temporal patterns of interactions. Submitted, 2011.

[92] O. Stramer and R. L. Tweedie. Stability and instability of continuous time Markov processes. In F. P. Kelly, editor, Probability, Statistics and Optimization: A Tribute to Peter Whittle, pages 173–184. John Wiley & Sons, 1994.

[93] R. Szechtman and P. W. Glynn. Rare-event simulation for infinite server queues. In E. Yücesan, C.-H. Chen, J. L. Snowdon, and J. M. Charnes, editors, Proceedings of the 2002 Winter Simulation Conference, pages 416–423, 2002.

[94] R. L. Tweedie. Sufficient conditions for regularity, recurrence and ergodicity of Markov processes. Math. Proc. Camb. Phil. Soc., 78:125–136, 1975.

[95] R. L. Tweedie. Criteria for ergodicity, exponential ergodicity and strong ergodicity of Markov processes. J. Appl. Prob., 18:122–130, 1981.

[96] O. Vasicek. An equilibrium characterization of the term structure. Journal of Financial Economics, 5:177–188, 1977.

[97] A. Wald. On cumulative sums of random variables. The Annals of Mathematical Statistics, 15(3):283–296, 1944.

[98] B. Wong and C. C. Heyde. On the martingale property of stochastic exponentials. J. Appl. Prob., 41:654–664, 2004.

[99] F. Xi. Asymptotic properties of jump-diffusion processes with state-dependent switching. Stochastic Process. Appl., 119(7):2198–2221, 2009.

[100] X. Zhang, P. W. Glynn, K. Giesecke, and J. Blanchet. Rare event simulation for a generalized Hawkes process. In M. D. Rossetti, R. R. Hill, B. Johansson, A. Dunkin, and R. G. Ingalls, editors, Proceedings of the 2009 Winter Simulation Conference, pages 1291–1298, 2009.
