The Annals of Statistics
2009, Vol. 37, No. 6A, 3555–3579
DOI: 10.1214/09-AOS693
© Institute of Mathematical Statistics, 2009

ASYMPTOTIC THEORY OF SEMIPARAMETRIC Z-ESTIMATORS FOR STOCHASTIC PROCESSES WITH APPLICATIONS TO ERGODIC DIFFUSIONS AND TIME SERIES

BY YOICHI NISHIYAMA

Institute of Statistical Mathematics
This paper generalizes a part of the theory of Z-estimation, which has been developed mainly in the context of modern empirical processes, to the case of stochastic processes, typically semimartingales. We present a general theorem to derive the asymptotic behavior of the solution to an estimating equation Ψ_n(θ, ĥ_n) = 0 with an abstract nuisance parameter h when the compensator of Ψ_n is random. As an application, we consider the estimation problem in an ergodic diffusion process model where the drift coefficient contains an unknown, finite-dimensional parameter θ and the diffusion coefficient is indexed by a nuisance parameter h from an infinite-dimensional space. An example of the nuisance parameter space is a class of smooth functions. We establish the asymptotic normality and efficiency of a Z-estimator for the drift coefficient. As another application, we present a similar result in an ergodic time series model.
1. Introduction. Let us begin by stating our motivating example; the details are presented in Section 4. Consider the one-dimensional ergodic diffusion process X on I = (l, r) ⊆ R which is a solution to the stochastic differential equation (SDE)

(1)  X_t = X_0 + ∫_0^t S(X_s; θ) ds + ∫_0^t σ(X_s; h) dW_s,

where s ↦ W_s is a standard Brownian motion. Here, we consider a d-dimensional parametric family {S(·; θ); θ ∈ Θ} for the drift coefficient, indexed by a compact subset Θ of R^d, and a possibly infinite-dimensional "parametric" family {σ²(·; h); h ∈ H} for the diffusion coefficient, indexed by a (general) totally bounded metric space (H, d_H). We denote by (θ_0, h_0) the true value of (θ, h). Our aim is to estimate θ_0 when the model is perturbed by the unknown nuisance parameter h. As for the parameter h_0, we construct a d_H-consistent estimator ĥ_n. We prove that the Z-estimator θ̂_n, which is a solution to an estimating equation Ψ_n(θ, ĥ_n) = 0, is asymptotically normal and efficient. [We follow van der Vaart and Wellner (1996) for the terminology "Z-estimator."]
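As a quick numerical illustration (not part of the paper), a path of the SDE (1) can be simulated with the Euler–Maruyama scheme. The concrete drift S(x; θ) = −θx with θ = 1 and the diffusion σ(x; h) = 1 + 0.1 sin²(x) below are hypothetical choices made only for this sketch.

```python
import numpy as np

def euler_maruyama(S, sigma, x0, T, n, rng):
    """Simulate dX_t = S(X_t) dt + sigma(X_t) dW_t on [0, T] in n steps."""
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = x[i] + S(x[i]) * dt + sigma(x[i]) * rng.normal(0.0, np.sqrt(dt))
    return x

rng = np.random.default_rng(0)
S = lambda x: -1.0 * x                        # drift S(x; theta) with theta = 1
sigma = lambda x: 1.0 + 0.1 * np.sin(x) ** 2  # diffusion sigma(x; h) = h(x)
path = euler_maruyama(S, sigma, x0=0.0, T=100.0, n=100_000, rng=rng)
print(path.shape)  # (100001,)
```

The simulated path is mean reverting and serves as synthetic data for the discretely observed setting of Section 4.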
Received January 2009.
AMS 2000 subject classifications. Primary 62F12; secondary 62M05, 62M10.
Key words and phrases. Estimating function, nuisance parameter, metric entropy, asymptotic efficiency, ergodic diffusion, discrete observation.
There exist many works which treat the estimation problem for the drift coefficient. It is well known that when the process X = (X_t)_{t∈[0,∞)} is observed continuously on the time interval [0, T], the diffusion coefficient may be assumed to be known without loss of generality. (So we may put h = h_0.) In such cases, the asymptotic normality and efficiency of the maximum likelihood estimator (MLE) θ̂_T for θ_0, as T → ∞, has already been established. See, for example, Kutoyants (2004). The MLE θ̂_T is a solution to the estimating equation l̇_T(θ) = 0 with

  l̇_T(θ) = (1/T) ∫_0^T [Ṡ(X_t; θ)/σ²(X_t; h_0)] [dX_t − S(X_t; θ) dt],

where Ṡ denotes the derivative of S with respect to θ. On the other hand, when the process X = (X_t)_{t∈[0,∞)} is observed only at discrete time points {0 = t_0^n < t_1^n < ⋯ < t_n^n},
and to show the differentiability of (θ, h) ↦ Ψ̃_n(θ, h) around (θ_0, h_0). Roughly speaking, our main result asserts that if we assume that h ↦ σ²(·; h) is Lipschitz continuous with respect to d_H and that the metric entropy condition

  ∫_0^1 √(log N(H, d_H, ε)) dε < ∞

is satisfied,
We denote by ℓ^∞(T) the space of bounded functions on a set T and by C_ρ(T) the space of functions on T which are continuous with respect to a metric ρ. We equip both spaces with the uniform metric. Given a probability measure P, we denote by P* the corresponding outer probability; see van der Vaart and Wellner (1996) for the stochastic convergence theory which does not assume measurability. We denote by →^p and →^d the convergence in (outer) probability and the weak convergence, respectively. The limit notation means in principle that we take the limit as n → ∞. The Euclidean metric on R^d is denoted by ‖·‖.
2. General theory for semiparametric Z-estimation. Let two sets Θ and H be given. Let

  Ψ_n : Θ × H → R^d  and  Ψ̃_n : Θ × H → R^d

be random maps. The latter should be a random "compensator" of the former; in the i.i.d. case it is not random and does not depend on n. Compare the above setting with that in Chapter 3.3 of van der Vaart and Wellner (1996), where Ψ̃_n ≡ Ψ. We present a way to derive the asymptotic behavior of the estimator θ̂_n for the parameter θ ∈ Θ of interest, with the help of the estimator ĥ_n for the nuisance parameter h ∈ H, which are solutions to the estimating equation

  Ψ_n(θ̂_n, ĥ_n) ≈ 0.

Here, the true values θ_0 ∈ Θ and h_0 ∈ H are supposed to satisfy

  Ψ̃_n(θ_0, h_0) ≈ 0.

The following theorem extends a special case of Theorem 3.3.1 of van der Vaart and Wellner (1996). See also Theorem 5.21 of van der Vaart (1998).
THEOREM 2.1. Let Θ be a subset of R^d with the Euclidean metric ‖·‖. Let (H, d_H) be a semimetric space. Let Ψ_n : Θ × H → R^d and Ψ̃_n : Θ × H → R^d be random maps defined on a probability space (Ω_n, F_n, P_n). (We do not assume any measurability.) Suppose that there exist a sequence of constants r_n ↑ ∞, some fixed point (θ_0, h_0) and an invertible matrix V_{θ_0,h_0} which satisfy the following (i) and (ii).

(i) There exists a neighborhood U ⊂ Θ × H of (θ_0, h_0) such that

  r_n(Ψ_n − Ψ̃_n) →^d Z  in ℓ^∞(U),

where almost all paths (θ, h) ↦ Z(θ, h) are continuous with respect to ρ = ‖·‖ ∨ d_H.

(ii) For the given random sequence (θ̂_n, ĥ_n), it holds that

  Ψ̃_n(θ̂_n, ĥ_n) − Ψ̃_n(θ_0, h_0) − V_{θ_0,h_0}(θ̂_n − θ_0) = o_{P_n^*}(r_n^{−1} + ‖θ̂_n − θ_0‖)

and that

  ‖θ̂_n − θ_0‖ ∨ d_H(ĥ_n, h_0) = o_{P_n^*}(1),
  Ψ_n(θ̂_n, ĥ_n) = o_{P_n^*}(r_n^{−1}),
  Ψ̃_n(θ_0, h_0) = o_{P_n^*}(r_n^{−1}).

Then it holds that

  r_n(θ̂_n − θ_0) →^d −V_{θ_0,h_0}^{−1} Z(θ_0, h_0).
To prove the above theorem, we need the following lemma, which is a slight generalization of Lemma 19.24 of van der Vaart (1998).

LEMMA 2.2. Let (T, ρ) be a semimetric space. Suppose that Z_n →^d Z in ℓ^∞(T) and that almost all paths of Z are continuous with respect to ρ. If a T-valued random sequence t̂_n satisfies ρ(t̂_n, t_0) = o_{P_n^*}(1) for some nonrandom t_0 ∈ T, then Z_n(t̂_n) − Z_n(t_0) = o_{P_n^*}(1).

PROOF. Let us equip the space ℓ^∞(T) × T with the metric ‖·‖_T + ρ, where ‖·‖_T denotes the uniform metric on ℓ^∞(T). Define the function g : ℓ^∞(T) × T → R by g(z, t) = z(t) − z(t_0). Then, for any z ∈ C_ρ(T) and t ∈ T, the function g is continuous at (z, t). Indeed, if (z_n, t_n) → (z, t), then ‖z_n − z‖_T → 0, and thus z_n(t_n) = z(t_n) + o(1) → z(t), while z_n(t_0) → z(t_0) is trivial.

By assumption, we have (Z_n, t̂_n) →^d (Z, t_0) in ℓ^∞(T) × T [see, e.g., Theorem 18.10(v) of van der Vaart (1998)]. Since almost all paths of Z belong to C_ρ(T), by the continuous mapping theorem,

  Z_n(t̂_n) − Z_n(t_0) = g(Z_n, t̂_n) →^d g(Z, t_0) = Z(t_0) − Z(t_0) = 0.

The proof is finished. □
PROOF OF THEOREM 2.1. Applying Lemma 2.2 to the ℓ^∞(U)-valued random element Z_n = r_n(Ψ_n − Ψ̃_n), we have

  r_n(Ψ_n(θ̂_n, ĥ_n) − Ψ̃_n(θ̂_n, ĥ_n)) − r_n(Ψ_n(θ_0, h_0) − Ψ̃_n(θ_0, h_0)) = o_{P_n^*}(1).

Since r_n Ψ_n(θ̂_n, ĥ_n) = o_{P_n^*}(1), we have

  r_n(Ψ̃_n(θ̂_n, ĥ_n) − Ψ̃_n(θ_0, h_0)) = −r_n Ψ_n(θ_0, h_0) + o_{P_n^*}(1).

By assumption (ii), it holds that

(2)  r_n V_{θ_0,h_0}(θ̂_n − θ_0) = −r_n Ψ_n(θ_0, h_0) + o_{P_n^*}(1 + r_n‖θ̂_n − θ_0‖).

Now, since

  r_n‖θ̂_n − θ_0‖ ≤ ‖V_{θ_0,h_0}^{−1}‖ r_n‖V_{θ_0,h_0}(θ̂_n − θ_0)‖ ≤ O_{P_n}(1) + o_{P_n^*}(1 + r_n‖θ̂_n − θ_0‖)  by (2),

it holds that r_n‖θ̂_n − θ_0‖ = O_{P_n^*}(1). Inserting this into (2), we have

  r_n V_{θ_0,h_0}(θ̂_n − θ_0) = −r_n Ψ_n(θ_0, h_0) + o_{P_n^*}(1)
   = −r_n(Ψ_n(θ_0, h_0) − Ψ̃_n(θ_0, h_0)) + o_{P_n^*}(1)
   →^d −Z(θ_0, h_0),

which implies the conclusion. □
In Theorem 2.1, both "‖θ̂_n − θ_0‖ = o_{P_n^*}(1)" and "d_H(ĥ_n, h_0) = o_{P_n^*}(1)" are assumed. Under some conditions, the former automatically follows from the latter, as is seen in the following theorem.
THEOREM 2.3. Let (Θ, d_Θ) and (H, d_H) be two semimetric spaces. Let Ψ_n : Θ × H → R^d be a random map defined on a probability space (Ω_n, F_n, P_n). (We do not assume any measurability.) Let Ψ : Θ × H → R^d be a nonrandom function. Suppose that

  sup_{(θ,h)∈Θ×H} ‖Ψ_n(θ, h) − Ψ(θ, h)‖ = o_{P_n^*}(1).

Suppose also that for some (θ_0, h_0) ∈ Θ × H,

  inf_{θ: d_Θ(θ,θ_0)>ε} ‖Ψ(θ, h_0)‖ > 0  for every ε > 0,

and that h ↦ Ψ(θ, h) is continuous at h_0 uniformly in θ. Then, for any random sequence (θ̂_n, ĥ_n) such that Ψ_n(θ̂_n, ĥ_n) = o_{P_n^*}(1) and d_H(ĥ_n, h_0) = o_{P_n^*}(1), it holds that d_Θ(θ̂_n, θ_0) = o_{P_n^*}(1).

PROOF. Observe that

  ‖Ψ(θ̂_n, ĥ_n)‖ ≤ ‖Ψ(θ̂_n, ĥ_n) − Ψ_n(θ̂_n, ĥ_n)‖ + ‖Ψ_n(θ̂_n, ĥ_n)‖
   ≤ sup_{θ,h} ‖Ψ(θ, h) − Ψ_n(θ, h)‖ + ‖Ψ_n(θ̂_n, ĥ_n)‖
   = o_{P_n^*}(1).

Now, for every ε > 0, there exist δ, η > 0 such that ‖Ψ(θ, h)‖ > η for every θ with d_Θ(θ, θ_0) > ε and every h with d_H(h, h_0) < δ. Hence the event {d_Θ(θ̂_n, θ_0) > ε} is contained in the event {‖Ψ(θ̂_n, ĥ_n)‖ > η} ∪ {d_H(ĥ_n, h_0) ≥ δ}. The outer probability of the latter event converges to 0. □
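The mechanism of Theorem 2.3 (uniform convergence of Ψ_n plus identifiability of Ψ forces consistency of a near-zero of Ψ_n) can be seen in a minimal toy sketch. The Gaussian mean model and the grid search below are illustrative assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
theta0 = 0.7
x = rng.normal(theta0, 1.0, size=20_000)

# Psi_n(theta) = sample mean - theta converges uniformly to
# Psi(theta) = theta0 - theta, which satisfies the identifiability
# condition inf_{|theta - theta0| > eps} |Psi(theta)| = eps > 0.
grid = np.linspace(-2.0, 2.0, 4001)         # compact parameter set Theta
psi_n = x.mean() - grid
theta_hat = grid[np.argmin(np.abs(psi_n))]  # approximate zero of Psi_n
print(abs(theta_hat - theta0))              # small, as the theorem predicts
```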
3. Uniform law of large numbers. In this section, we give a uniform law of large numbers for ergodic processes under a smoothness assumption. The proof is standard, so it is omitted. [See, e.g., Theorem 2.4.1 of van der Vaart and Wellner (1996) for the idea, or see Nishiyama (2009).]
THEOREM 3.1. Let (E, ℰ) be a measurable space. Let Θ be a set which is totally bounded with respect to a semimetric ρ. Let a family {f(·; θ); θ ∈ Θ} of measurable functions on E be given. Suppose that there exists a measurable function K such that

(3)  |f(x; θ) − f(x; θ′)| ≤ K(x) ρ(θ, θ′)  for all θ, θ′ ∈ Θ.

(i) Suppose that the E-valued random process {X_t}_{t∈[0,∞)} is ergodic with the invariant law μ, that is, for any μ-integrable function g,

  (1/T) ∫_0^T g(X_t) dt →^p ∫_E g(x) μ(dx).

If all f(·; θ) and K are μ-integrable, then

  sup_{θ∈Θ} | (1/T) ∫_0^T f(X_t; θ) dt − ∫_E f(x; θ) μ(dx) | = o_{P^*}(1).

(ii) Suppose that the E-valued random process {X_i}_{i=1,2,...} is ergodic with the invariant law μ, that is, for any μ-integrable function g,

  (1/n) Σ_{i=1}^n g(X_i) →^p ∫_E g(x) μ(dx).

If all f(·; θ) and K are μ-integrable, then

  sup_{θ∈Θ} | (1/n) Σ_{i=1}^n f(X_i; θ) − ∫_E f(x; θ) μ(dx) | = o_{P^*}(1).

REMARK. The smoothness assumption (3) can be replaced by "bracketing." See Theorem 2.4.1 of van der Vaart and Wellner (1996).
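A Monte Carlo sketch of part (ii), with illustrative model choices not taken from the paper: for an ergodic AR(1) chain with invariant law N(0, 4/3) and the family f(x; θ) = cos(θx), which is Lipschitz in θ with K(x) = |x|, the empirical means converge to the invariant-law integrals uniformly over a grid of θ values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
w = rng.normal(size=n)
x = np.empty(n)
x[0] = 0.0
for i in range(1, n):                 # ergodic AR(1): X_i = 0.5 X_{i-1} + w_i
    x[i] = 0.5 * x[i - 1] + w[i]      # invariant law N(0, 1/(1 - 0.25)) = N(0, 4/3)

thetas = np.linspace(0.0, 1.0, 101)   # totally bounded parameter set
# f(x; theta) = cos(theta * x) satisfies |f(x;s) - f(x;t)| <= |x| |s - t|.
emp = np.array([np.cos(t * x).mean() for t in thetas])
lim = np.exp(-thetas ** 2 * (4.0 / 3.0) / 2.0)  # E cos(tZ) for Z ~ N(0, 4/3)
print(np.max(np.abs(emp - lim)))      # uniformly small over theta
```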
4. Ergodic diffusion processes.

4.1. Regularity conditions. Let us consider the diffusion process model introduced in the first paragraph of Section 1. We shall list some conditions. We suppose that there exists a parametric family of d-dimensional vector-valued functions {Ṡ(·; θ); θ ∈ Θ} on I which satisfies the following conditions. Typically, they may be considered to be the derivatives of S(·; θ) with respect to θ, that is, Ṡ(·; θ) = (∂S(·; θ)/∂θ_1, ..., ∂S(·; θ)/∂θ_d)^T. The function Λ appearing in A1 and A3 may be chosen to be common without loss of generality.
A1. Θ is a compact subset of R^d. There exists a measurable function Λ on I such that, at the true θ_0 ∈ Θ,

  S(x; θ) − S(x; θ_0) = Ṡ(x; θ_0)^T (θ − θ_0) + Λ(x) ɛ(x; θ, θ_0),

where sup_{x∈I} |ɛ(x; θ, θ_0)| = o(‖θ − θ_0‖) as θ → θ_0.

A2. There exists a constant K > 0 such that

  sup_{θ∈Θ} |S(x; θ) − S(x′; θ)| ≤ K|x − x′|;
  sup_{θ∈Θ} ‖Ṡ(x; θ) − Ṡ(x′; θ)‖ ≤ K|x − x′|;
  sup_{h∈H} |σ²(x; h) − σ²(x′; h)| ≤ K|x − x′|.

A3. There exists a measurable function Λ on I such that

  sup_{θ∈Θ} |S(x; θ)| ≤ Λ(x);
  sup_{θ∈Θ} ‖Ṡ(x; θ)‖ ≤ Λ(x);
  c := inf_{h∈H} inf_{x∈I} σ²(x; h) > 0;
  ‖Ṡ(x; θ) − Ṡ(x; θ′)‖ ≤ Λ(x)‖θ − θ′‖ for all θ, θ′ ∈ Θ;
  |σ²(x; h) − σ²(x; h′)| ≤ Λ(x) d_H(h, h′) for all h, h′ ∈ H.

A4. sup_{t∈R} E(Λ(X_t)^8 + |X_t|^4) < ∞.

A7. ∫_0^1 √(log N(H, d_H, ε)) dε < ∞.

A8. For every ε > 0,

  inf_{θ: ‖θ−θ_0‖>ε} ‖ ∫_I (Ṡ(x; θ)/σ²(x; h_0)) [S(x; θ_0) − S(x; θ)] μ(dx) ‖ > 0.

REMARK. The last assumption in A2 implies that σ²(x; h_0) ≤ C(1 + |x|) for a constant C > 0.
To close this subsection, let us discuss possible choices of the nuisance parameter space (H, d_H).

EXAMPLE 1 (Parametric model). When (H, d_H) is a compact subset of a finite-dimensional Euclidean space, the metric entropy condition A7 is indeed satisfied. So the main restriction is the Lipschitz continuity of h ↦ σ²(·; h) in A3. This situation is more general than that in Yoshida (1992) and Kessler (1997), although, as announced in Section 1, our result does not include theirs.
EXAMPLE 2 (The class of smooth functions). Let us consider the parametrization σ(x; h) = h(x), where h is an element of the class H = C_M^α(I) defined below. We equip the function space H with the uniform metric ‖·‖_∞, for which the last requirement in A3 is always fulfilled. To check A7, we first consider the case where I is a bounded subset of R, and next we give some remarks for the general case.

We take the material below from Section 2.7.1 of van der Vaart and Wellner (1996). Let I be a bounded, convex subset of R^q. (In the current example of one-dimensional diffusions, we are considering the case q = 1, but for generality we let q be an arbitrary positive integer; see Section 5.) Let α > 0 and M > 0 be given, and let m be the greatest integer smaller than α. For any vector k = (k_1, ..., k_q) of q nonnegative integers, we define the differential operator

  D^k = ∂^{k·} / (∂x_1^{k_1} ⋯ ∂x_q^{k_q}),

where k· = Σ_{i=1}^q k_i. We denote by C_M^α(I) the class of functions h defined on I such that

  max_{k·≤m} sup_x |D^k h(x)| + max_{k·=m} sup_{x,y} [|D^k h(x) − D^k h(y)| / ‖x − y‖^{α−m}] ≤ M,

where the suprema are taken over all x, y in the interior of I with x ≠ y. Then there exists a constant K > 0, depending only on α and q, such that

  log N(C_M^α(I), ‖·‖_∞, ε) ≤ K λ(I^1) (M/ε)^{q/α},

where λ(I^1) is the Lebesgue measure of the set I^1 = {x : ‖x − I‖ < 1}. Hence, the metric entropy condition A7 is satisfied if q/(2α) < 1, and therefore our theory works.

When I = R^q, we shall restrict our attention, for example, to the following class H of functions on R^q. Let I_0 be a bounded, convex subset of R^q, and suppose that the restriction of h ∈ H to I_0 belongs to C_M^α(I_0) and that

(4)  sup_{x∈R^q} |h(x) − h′(x)| ≤ L sup_{x∈I_0} |h(x) − h′(x)|  for all h, h′ ∈ H,
for a constant L > 0. Then both the last condition of A3 and A7 are satisfied. The condition (4) is satisfied if we assume, for example, either of the following:

(i) h is known on I_0^c;
(ii) when q = 1 and I_0 = [l_0, r_0], each h is constant on (−∞, l_0] and on [r_0, ∞).

Although the examples (i) and (ii) might look restrictive, it should be noted that in practice we can choose an arbitrarily large I_0.

Another way to deal with the unbounded case I = R^q is to consider the parametrization

  σ(x; h) = h(u(x)),  h ∈ C_M^α(I_0),

where I_0 is a bounded, convex subset of R^{q′} and u : R^q → I_0 is a fixed function. If q′/(2α) < 1, then both the last requirement in A3 and the metric entropy condition A7 are satisfied for the uniform metric d_H = ‖·‖_∞.

Instead of C_M^α(I), another possible choice of H which satisfies the metric entropy condition for the uniform metric is a Sobolev class; see Example 19.10 of van der Vaart (1998).
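The role of the exponent q/(2α) < 1 is easy to check numerically: up to constants, the integrand in A7 under the bound above is ε^{−q/(2α)}, whose integral over (0, 1] is finite exactly when q/(2α) < 1. A small sketch, with the constants K, λ(I^1) and M set to 1 purely for illustration:

```python
import numpy as np

def entropy_integral(q, alpha, delta):
    """Trapezoid approximation of the A7-type integrand eps^(-q/(2*alpha))
    over [delta, 1]; constants K, lambda(I^1), M are set to 1."""
    beta = q / (2.0 * alpha)
    eps = np.linspace(delta, 1.0, 100_000)
    vals = eps ** (-beta)
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(eps)))

deltas = [1e-2, 1e-4, 1e-6]
conv = [entropy_integral(q=1, alpha=1.0, delta=d) for d in deltas]  # q/(2a) = 0.5 < 1
div = [entropy_integral(q=1, alpha=0.4, delta=d) for d in deltas]   # q/(2a) = 1.25 > 1
print(conv)  # stabilizes near 2 as delta -> 0
print(div)   # grows without bound as delta -> 0
```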
4.2. Results. As announced in Section 1, we propose to use the estimating function

  Ψ_n(θ, h) = (1/t_n^n) Σ_{i=1}^n [Ṡ(X_{t_{i−1}^n}; θ) / σ²(X_{t_{i−1}^n}; h)] [X_{t_i^n} − X_{t_{i−1}^n} − S(X_{t_{i−1}^n}; θ)|t_i^n − t_{i−1}^n|],

whose compensator is

  Ψ̃_n(θ, h) = (1/t_n^n) Σ_{i=1}^n [Ṡ(X_{t_{i−1}^n}; θ) / σ²(X_{t_{i−1}^n}; h)] [∫_{t_{i−1}^n}^{t_i^n} S(X_t; θ_0) dt − S(X_{t_{i−1}^n}; θ)|t_i^n − t_{i−1}^n|].

Then we have the following two lemmas.
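For a concrete sketch (not from the paper), take the scalar model S(x; θ) = −θx with known σ² ≡ 1, so that Ψ_n(θ, h_0) = 0 can be solved in closed form. The equidistant grid, the Euler simulation and all parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, dt, n = 1.0, 0.01, 200_000        # t_i^n = i * dt, so t_n^n = 2000

# Euler scheme for dX_t = -theta0 * X_t dt + dW_t (sigma^2(x; h0) = 1).
x = np.empty(n + 1)
x[0] = 0.0
dW = rng.normal(0.0, np.sqrt(dt), size=n)
for i in range(n):
    x[i + 1] = x[i] - theta0 * x[i] * dt + dW[i]

# With Sdot(x; theta) = -x and sigma^2 = 1, Psi_n(theta, h0) = 0 gives
# theta_hat = -sum x_{i-1} (x_i - x_{i-1}) / (dt * sum x_{i-1}^2).
theta_hat = -np.sum(x[:-1] * np.diff(x)) / (dt * np.sum(x[:-1] ** 2))
print(theta_hat)   # close to theta0 = 1
```

The closed form arises because the estimating function is linear in θ in this model; in general Ψ_n(θ, ĥ_n) = 0 must be solved numerically.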
LEMMA 4.1. Assume Δ_n → 0 and t_n^n → ∞, where Δ_n = max_{1≤i≤n} |t_i^n − t_{i−1}^n|. Equip the space Θ × H with the metric ρ = ‖·‖ ∨ d_H. Under A1–A5 and A7, √(t_n^n)(Ψ_n − Ψ̃_n) converges weakly in C_ρ(Θ × H) to a zero-mean Gaussian process Z with the covariance

  E Z(θ, h) Z(θ′, h′)^T = ∫_I [Ṡ(x; θ) Ṡ(x; θ′)^T / (σ²(x; h) σ²(x; h′))] σ²(x; h_0) μ(dx).

In particular, the random variable Z(θ_0, h_0) is distributed as N(0, I(θ_0, h_0)), where I(θ_0, h_0) = ∫_I [Ṡ(x; θ_0) Ṡ(x; θ_0)^T / σ²(x; h_0)] μ(dx).

LEMMA 4.2. Assume Δ_n = o((t_n^n)^{−1}) and t_n^n → ∞. Under A1–A6, for any random sequence (θ_n, h_n) such that ‖θ_n − θ_0‖ ∨ d_H(h_n, h_0) = o_{P^*}(1), it holds that

  Ψ̃_n(θ_n, h_n) − Ψ̃_n(θ_0, h_0) − (−I(θ_0, h_0))(θ_n − θ_0) = o_{P^*}((t_n^n)^{−1/2} + ‖θ_n − θ_0‖).
Combining these lemmas with Theorem 2.1, and noting also that Ψ̃_n(θ_0, h_0) = O_P(Δ_n^{1/2}), which will be proved by using Lemma 4.5 below, we can conclude the following theorem.

THEOREM 4.3. Assume Δ_n = o((t_n^n)^{−1}) and t_n^n → ∞. Under A1–A7, for any random sequence (θ̂_n, ĥ_n) such that

  ‖θ̂_n − θ_0‖ = o_{P^*}(1),  d_H(ĥ_n, h_0) = o_{P^*}(1)

and

  Ψ_n(θ̂_n, ĥ_n) = o_{P^*}((t_n^n)^{−1/2}),

the estimator θ̂_n is asymptotically normal and efficient:

  √(t_n^n)(θ̂_n − θ_0) →^d N(0, I(θ_0, h_0)^{−1}).

When A8 is also satisfied, the assumption "‖θ̂_n − θ_0‖ = o_{P^*}(1)" is automatically satisfied.
In the above theorem, the only assumption which we cannot check from the data is the consistency "d_H(ĥ_n, h_0) = o_{P^*}(1)," because it involves the true value h_0 of the unknown parameter h ∈ H. When {σ²(·; h); h ∈ H} is a class of functions σ²(·; h) = h(·), where H is a class of smooth functions, one may think that a kernel estimator is a candidate for ĥ_n. As stated above, in view of the Lipschitz condition on h ↦ σ²(·; h) (the last condition in A3), it is convenient to consider the consistency with respect to the uniform metric. However, showing the consistency of a kernel estimator with respect to the uniform metric is itself a nontrivial task. Generally speaking, showing consistency for an infinite-dimensional parameter is not a trivial problem, and it should be solved by independent articles. See, for example, Hoffmann (2001). Below, we give a general way to show the consistency of a least squares estimator.
THEOREM 4.4. Assume Δ_n → 0 and t_n^n → ∞. Assume A2–A5 and A7. Suppose that

  inf_{h: d_H(h,h_0)>ε} ∫_I |σ²(x; h) − σ²(x; h_0)|² μ(dx) > 0  for every ε > 0.

If the random element ĥ_n satisfies A_n(ĥ_n) ≤ inf_{h∈H} A_n(h) + o_{P^*}(1), where

  A_n(h) = (1/t_n^n) Σ_{i=1}^n | |X_{t_i^n} − X_{t_{i−1}^n}|² / |t_i^n − t_{i−1}^n| − σ²(X_{t_{i−1}^n}; h) |² |t_i^n − t_{i−1}^n|,

then it holds that d_H(ĥ_n, h_0) = o_{P^*}(1).
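An illustrative sketch of the least squares criterion A_n, under the simplifying assumption (made only here, not in the theorem) that H consists of constant functions σ²(·; h) ≡ h, so that the criterion can be minimized over a one-dimensional grid:

```python
import numpy as np

rng = np.random.default_rng(4)
theta0, sig0, dt, n = 1.0, 0.8, 0.01, 100_000

# Simulate dX = -theta0 * X dt + sig0 dW on the equidistant grid t_i = i * dt.
x = np.empty(n + 1)
x[0] = 0.0
dW = rng.normal(0.0, np.sqrt(dt), size=n)
for i in range(n):
    x[i + 1] = x[i] - theta0 * x[i] * dt + sig0 * dW[i]

q = np.diff(x) ** 2 / dt          # |X_{t_i} - X_{t_{i-1}}|^2 / |t_i - t_{i-1}|
hs = np.linspace(0.1, 2.0, 400)   # candidate constant values h for sigma^2
# On an equidistant grid, A_n(h) reduces to the mean of |q_i - h|^2 over i.
A_n = np.array([np.mean((q - h) ** 2) for h in hs])
h_hat = hs[np.argmin(A_n)]
print(h_hat)   # close to the true sigma^2 = sig0**2 = 0.64
```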
4.3. Proofs. Before the proofs, we state a lemma which is well known.

LEMMA 4.5. Let X be a solution to the SDE (1) for (θ, h) = (θ_0, h_0). Assume |t_i^n − t_{i−1}^n| ≤ 1.

(i) For any k ≥ 2, there exists a constant C_k > 0, depending only on k, such that

  E sup_{t∈[t_{i−1}^n, t_i^n]} |X_t − X_{t_{i−1}^n}|^k ≤ C_k sup_{s∈R} E{|S(X_s; θ_0)|^k + |σ(X_s; h_0)|^k} |t_i^n − t_{i−1}^n|^{k/2} =: D_k |t_i^n − t_{i−1}^n|^{k/2},

provided the right-hand side is finite.

(ii) For any k ≥ 2 and any measurable functions f, g, it holds that

  sup_{t∈[t_{i−1}^n, t_i^n]} E(|X_t − X_{t_{i−1}^n}|^{k/2} |f(X_{t_{i−1}^n})| |g(X_t)|) ≤ (D_k |t_i^n − t_{i−1}^n|^{k/2})^{1/2} sup_{s∈R}(E|f(X_s)|^4)^{1/4} sup_{s∈R}(E|g(X_s)|^4)^{1/4},

provided the right-hand side is finite.

PROOF. The assertion (i) is well known. (Use Hölder's inequality and the Burkholder–Davis–Gundy inequality for ∫_{t_{i−1}^n}^{t_i^n} |S(X_s; θ_0)| ds and sup_{t∈[t_{i−1}^n, t_i^n]} |∫_{t_{i−1}^n}^t σ(X_s; h_0) dW_s|.) The assertion (ii) follows from Hölder's inequality and (i). □
During the proofs, we write

  ψ(x; θ, h) = Ṡ(x; θ) / σ²(x; h),

which is a d-dimensional vector-valued function. For each component ψ^{(j)}(x; θ, h) (j = 1, ..., d), it holds that

(5)  |ψ^{(j)}(x; θ, h) − ψ^{(j)}(x′; θ, h)|
   ≤ |Ṡ^{(j)}(x; θ) − Ṡ^{(j)}(x′; θ)| / σ²(x; h) + |Ṡ^{(j)}(x′; θ)| |1/σ²(x; h) − 1/σ²(x′; h)|
   ≤ {1/c + Λ(x′)/c²} K|x − x′|

and that

(6)  |ψ^{(j)}(x; θ, h) − ψ^{(j)}(x; θ′, h′)|
   ≤ |Ṡ^{(j)}(x; θ) − Ṡ^{(j)}(x; θ′)| / σ²(x; h) + |Ṡ^{(j)}(x; θ′)| |1/σ²(x; h) − 1/σ²(x; h′)|
   ≤ |Ṡ^{(j)}(x; θ) − Ṡ^{(j)}(x; θ′)| / c + |Ṡ^{(j)}(x; θ′)| |σ²(x; h) − σ²(x; h′)| / c²
   ≤ {Λ(x)/c + |Λ(x)|²/c²} (‖θ − θ′‖ ∨ d_H(h, h′)).
PROOF OF LEMMA 4.1. We apply Theorem 3.4.2 of Nishiyama (2000b) [or see Theorem 3.3 of van der Vaart and van Zanten (2005)] to the terminal values M_{t_n^n}^{n,θ,h} of the continuous martingales t ↦ M_t^{n,θ,h} given by

  M_t^{n,θ,h} = (1/√(t_n^n)) Σ_{i=1}^n ∫_{t_{i−1}^n ∧ t}^{t_i^n ∧ t} ψ(X_{t_{i−1}^n}; θ, h) σ(X_s; h_0) dW_s.

For the finite-dimensional convergence, it is sufficient to show the convergence of the predictable covariations. This is done as follows:

(7)  ⟨M^{n,θ,h}, M^{n,θ′,h′}⟩_{t_n^n} = (1/t_n^n) Σ_{i=1}^n ψ(X_{t_{i−1}^n}; θ, h) ψ(X_{t_{i−1}^n}; θ′, h′)^T ∫_{t_{i−1}^n}^{t_i^n} σ²(X_s; h_0) ds
   = (1/t_n^n) ∫_0^{t_n^n} ψ(X_t; θ, h) ψ(X_t; θ′, h′)^T σ²(X_t; h_0) dt + o_P(1)
   →^p ∫_I ψ(x; θ, h) ψ(x; θ′, h′)^T σ²(x; h_0) μ(dx)
   = ∫_I [Ṡ(x; θ) Ṡ(x; θ′)^T / (σ²(x; h) σ²(x; h′))] σ²(x; h_0) μ(dx)
   = I(θ_0, h_0)  if (θ, h) = (θ′, h′) = (θ_0, h_0).

Here, to show (7), we have used the bound (5) and Lemma 4.5(ii) twice.

To establish Nishiyama's condition [ME], let us observe the following fact to check the metric entropy condition for the product space Θ × H.

In general, if (D, d) and (E, e) are two semimetric spaces, then the covering number of the product space D × E with respect to the maximum semimetric d ∨ e, namely N(D × E, d ∨ e, ε), is bounded by N(D, d, ε) · N(E, e, ε). To see this claim, let B_i, i = 1, ..., N(D, d, ε), be an ε-covering of D, and let C_j, j = 1, ..., N(E, e, ε), be an ε-covering of E. Then the diameters of the sets B_i × C_j ⊂ D × E with respect to d ∨ e are smaller than ε; thus these sets form an ε-covering of D × E. The claim has been proved. Consequently, the metric entropy condition

  ∫_0^1 √(log N(D × E, d ∨ e, ε)) dε < ∞

is satisfied if

  ∫_0^1 √(log N(D, d, ε)) dε < ∞  and  ∫_0^1 √(log N(E, e, ε)) dε < ∞.
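The product covering number bound can be checked numerically on finite point clouds. The greedy routine below returns the size of a maximal ε-separated set, which is also an ε-cover, so it gives a valid (if crude) covering count; the grids and the value of ε are illustrative assumptions.

```python
import numpy as np

def covering_number(points, metric, eps):
    """Greedy cover: the selected centers form a maximal eps-separated set,
    which is in particular an eps-cover of the point cloud."""
    pts = list(points)
    centers = []
    while pts:
        c = pts[0]
        centers.append(c)
        pts = [p for p in pts if metric(p, c) > eps]
    return len(centers)

d1 = lambda a, b: abs(a - b)
D = np.linspace(0.0, 1.0, 201)                     # fine grid on [0, 1]
E = np.linspace(0.0, 1.0, 201)
prod = [(u, v) for u in D[::10] for v in E[::10]]  # coarser grid on D x E
dmax = lambda p, q: max(d1(p[0], q[0]), d1(p[1], q[1]))  # maximum semimetric

eps = 0.107
nD = covering_number(D, d1, eps)
nE = covering_number(E, d1, eps)
nProd = covering_number(prod, dmax, eps)
print(nD, nE, nProd)   # here nProd stays below nD * nE
```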
to (θ_0, h_0) that

(8)  (1/t_n^n) ∫_0^{t_n^n} ψ(X_t; θ_n, h_n)[S(X_t; θ_0) − S(X_t; θ_n)] dt
   = (1/t_n^n) ∫_0^{t_n^n} ψ(X_t; θ_n, h_n) Ṡ(X_t; θ_0)^T dt (θ_0 − θ_n) + o_{P^*}(‖θ_n − θ_0‖)
   = (1/t_n^n) ∫_0^{t_n^n} ψ(X_t; θ_0, h_0) Ṡ(X_t; θ_0)^T dt (θ_0 − θ_n) + o_{P^*}(‖θ_n − θ_0‖)
   = ∫_I ψ(x; θ_0, h_0) Ṡ(x; θ_0)^T μ(dx) (θ_0 − θ_n) + o_{P^*}(‖θ_n − θ_0‖)
   = −I(θ_0, h_0)(θ_n − θ_0) + o_{P^*}(‖θ_n − θ_0‖).

To prove (8) in the above computation, use (6) to show that for every j, k = 1, ..., d,

  | (1/t_n^n) ∫_0^{t_n^n} ψ^{(j)}(X_t; θ_n, h_n) Ṡ^{(k)}(X_t; θ_0) dt − (1/t_n^n) ∫_0^{t_n^n} ψ^{(j)}(X_t; θ_0, h_0) Ṡ^{(k)}(X_t; θ_0) dt |
   ≤ (1/t_n^n) ∫_0^{t_n^n} {Λ(X_t)/c + |Λ(X_t)|²/c²} |Ṡ^{(k)}(X_t; θ_0)| dt · (‖θ_n − θ_0‖ ∨ d_H(h_n, h_0))
   = O_P(1) · o_{P^*}(1)
   = o_{P^*}(1).

The proof is complete. □
PROOF OF THEOREM 4.3. By Lemma 4.5, it is easy to see that Ψ̃_n(θ_0, h_0) = O_P(Δ_n^{1/2}) = o_P((t_n^n)^{−1/2}). So the main assertion follows from Theorem 2.1 with the help of Lemmas 4.1 and 4.2. On the other hand, since it follows from Lemma 4.5(ii) and Theorem 3.1(i) that sup_{θ,h} ‖Ψ_n(θ, h) − Ψ(θ, h)‖ = o_{P^*}(1), where

  Ψ(θ, h) = ∫_I [Ṡ(x; θ) / σ²(x; h)] [S(x; θ_0) − S(x; θ)] μ(dx),

the assertion that the consistency "‖θ̂_n − θ_0‖ = o_{P^*}(1)" automatically follows from A8 is immediate from Theorem 2.3. □
PROOF OF THEOREM 4.4. Put

  M_n(h) = (1/t_n^n) Σ_{i=1}^n |σ²(X_{t_{i−1}^n}; h) − σ²(X_{t_{i−1}^n}; h_0)|² |t_i^n − t_{i−1}^n|,
  M(h) = ∫_I |σ²(x; h) − σ²(x; h_0)|² μ(dx).

Let us apply Corollary 3.2.3 of van der Vaart and Wellner (1996) to the above M_n and M for the given ĥ_n, which is a solution of A_n(ĥ_n) ≤ inf_{h∈H} A_n(h) + o_{P^*}(1). By Lemma 4.5(ii) and Theorem 3.1(i), it is not difficult to see that sup_{h∈H} |M_n(h) − M(h)| = o_{P^*}(1), so it is sufficient to show that M_n(ĥ_n) = o_{P^*}(1).
Observe that

  (1/t_n^n) Σ_{i=1}^n | |X_{t_i^n} − X_{t_{i−1}^n}|² / |t_i^n − t_{i−1}^n| − σ²(X_{t_{i−1}^n}; h) |² |t_i^n − t_{i−1}^n|
   = (1/t_n^n) Σ_{i=1}^n | |X_{t_i^n} − X_{t_{i−1}^n}|² / |t_i^n − t_{i−1}^n| − σ²(X_{t_{i−1}^n}; h_0) |² |t_i^n − t_{i−1}^n|
   + (2/t_n^n) Σ_{i=1}^n ( |X_{t_i^n} − X_{t_{i−1}^n}|² / |t_i^n − t_{i−1}^n| − σ²(X_{t_{i−1}^n}; h_0) ) ( σ²(X_{t_{i−1}^n}; h_0) − σ²(X_{t_{i−1}^n}; h) ) |t_i^n − t_{i−1}^n|
   + (1/t_n^n) Σ_{i=1}^n |σ²(X_{t_{i−1}^n}; h_0) − σ²(X_{t_{i−1}^n}; h)|² |t_i^n − t_{i−1}^n|.

Let us prove that the supremum with respect to h of the absolute value of the second term on the right-hand side converges in outer probability to zero [say, claim (a)].
Since we have from Itô's formula that
$$
|X_{t_i^n} - X_{t_{i-1}^n}|^2 = 2 \int_{t_{i-1}^n}^{t_i^n} (X_s - X_{t_{i-1}^n}) S(X_s; \theta_0)\,ds + 2 \int_{t_{i-1}^n}^{t_i^n} (X_s - X_{t_{i-1}^n}) \sigma(X_s; h_0)\,dW_s + \int_{t_{i-1}^n}^{t_i^n} \sigma^2(X_s; h_0)\,ds,
$$
it is sufficient to show that $C_{1,n} = o_P(1)$, $\sup_{h \in H} |C_{2,n}(h)| = o_{P^*}(1)$ and $C_{3,n} = o_P(1)$, where
$$
\begin{aligned}
C_{1,n} &= \frac{1}{t_n} \sum_{i=1}^{n} \left| \int_{t_{i-1}^n}^{t_i^n} (X_s - X_{t_{i-1}^n}) S(X_s; \theta_0)\,ds \right| \psi(X_{t_{i-1}^n}), \\
C_{2,n}(h) &= \frac{1}{t_n} \sum_{i=1}^{n} \int_{t_{i-1}^n}^{t_i^n} (X_s - X_{t_{i-1}^n}) \sigma(X_s; h_0)\,dW_s \, \big( \sigma^2(X_{t_{i-1}^n}; h_0) - \sigma^2(X_{t_{i-1}^n}; h) \big), \\
C_{3,n} &= \frac{1}{t_n} \sum_{i=1}^{n} \int_{t_{i-1}^n}^{t_i^n} |\sigma^2(X_s; h_0) - \sigma^2(X_{t_{i-1}^n}; h_0)|\,ds\, \psi(X_{t_{i-1}^n}).
\end{aligned}
$$
By using Lemma 4.5(ii), we easily have $EC_{1,n} \to 0$ and $EC_{3,n} \to 0$. On the other hand, by using Theorem 3.4.2 of Nishiyama (2000b), it holds that $C_{2,n}$ converges weakly to zero in $C^{d_H}(H)$ (recall the argument in the proof of Lemma 4.1). Therefore, we have $\sup_{h \in H} |C_{2,n}(h)| = o_{P^*}(1)$.

Hence, the claim (a) is true, and we have that $A_n(\hat{h}_n) \le \inf_{h \in H} A_n(h) + o_{P^*}(1)$ implies that $M_n(\hat{h}_n) = o_{P^*}(1)$. The proof is finished. □
5. Ergodic time series.

5.1. Model and regularity conditions. Let us consider the time series model given by
$$X_i = \tilde{S}(X_{i-1},\dots,X_{i-q_1}; \theta) + \tilde{\sigma}(X_{i-1},\dots,X_{i-q_2}; h) w_i.$$
By putting $q = q_1 \vee q_2$ and changing the domains of the functions $\tilde{S}$ and $\tilde{\sigma}$, without loss of generality, we can write
$$X_i = S(\mathbf{X}_i; \theta) + \sigma(\mathbf{X}_i; h) w_i,$$
where $\mathbf{X}_i = (X_{i-1},\dots,X_{i-q})$ and $S(\cdot\,; \theta)$ and $\sigma(\cdot\,; h)$ are some measurable functions on $\mathbb{R}^q$. For simplicity, we assume that the initial values $(X_0,\dots,X_{1-q}) = (x_0,\dots,x_{1-q})$ are fixed.
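As a concrete illustration of this model class, the following sketch simulates a path under Case G below; the specific choices $q_1 = q_2 = 1$, $\tilde S(x;\theta) = \theta\tanh(x)$ and $\tilde\sigma(x;h) = h_0 + h_1/(1+x^2)$ are our illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy nonlinear AR(1) instance of X_i = S(X_{i-1}; theta) + sigma(X_{i-1}; h) w_i.
# The functional forms below are assumptions made for this sketch.

def S(x, theta):
    # regression function, Lipschitz in theta as required by condition B2
    return theta * np.tanh(x)

def sigma(x, h):
    # variance function, bounded away from zero as required by condition B2
    return h[0] + h[1] / (1.0 + x ** 2)

def simulate(n, theta, h, x0=0.0, seed=None):
    """Simulate X_0, X_1, ..., X_n under Case G (i.i.d. N(0,1) innovations)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1)
    x[0] = x0  # fixed initial value, as in the model description
    w = rng.standard_normal(n)
    for i in range(n):
        x[i + 1] = S(x[i], theta) + sigma(x[i], h) * w[i]
    return x

path = simulate(1000, theta=0.5, h=(1.0, 0.5), seed=1)
```

Since $|\theta| < 1$ and $\sigma$ is bounded here, the simulated chain is stable and ergodic in the sense of B3.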
As for the noise $\{w_i\}$, we consider the following two cases:

CASE G (Gaussian). $\{w_i\}$ are independently, identically distributed with $\mathcal{N}(0,1)$.

CASE M (Martingale). $E[w_i \mid \mathcal{F}_{i-1}] = 0$ and $E[w_i^2 \mid \mathcal{F}_{i-1}] = 1$ almost surely, where $\mathcal{F}_i = \sigma\{X_j : j \le i\}$.

Clearly, the Case G is a special case of the Case M. When we do not explicitly restrict attention to the Case G, we consider the Case M in principle.
Let us list some conditions of the same fashion as those in Section 4.1. We suppose that there exists a parametric family of $d$-dimensional vector-valued functions $\{\dot{S}(\cdot\,; \theta); \theta \in \Theta\}$ on $\mathbb{R}^q$ which satisfies the following conditions. Typically, they may be considered to be the derivatives of $S(\cdot\,; \theta)$ with respect to $\theta$, that is, $\dot{S}(\cdot\,; \theta) = \big(\frac{\partial}{\partial\theta_1} S(\cdot\,; \theta),\dots,\frac{\partial}{\partial\theta_d} S(\cdot\,; \theta)\big)^T$. The function $\psi$ appearing in B1 and B2 may be chosen to be common without loss of generality.

B1. $\Theta$ is a compact subset of $\mathbb{R}^d$. There exists a measurable function $\psi$ on $\mathbb{R}^q$ such that at the true $\theta_0 \in \Theta$,
$$S(x; \theta) - S(x; \theta_0) = \dot{S}(x; \theta_0)^T (\theta - \theta_0) + \psi(x)\,\epsilon(x; \theta, \theta_0),$$
where $\sup_{x \in \mathbb{R}^q} |\epsilon(x; \theta, \theta_0)| = o(\|\theta - \theta_0\|)$ as $\theta \to \theta_0$.
B2. There exists a measurable function $\psi$ on $\mathbb{R}^q$ such that
$$\sup_{\theta \in \Theta} |S(x; \theta)| \le \psi(x); \qquad \sup_{\theta \in \Theta} \|\dot{S}(x; \theta)\| \le \psi(x); \qquad \sigma^2(x; h_0) \le \psi(x);$$
$$c := \inf_{h \in H} \inf_{x \in \mathbb{R}^q} \sigma^2(x; h) > 0;$$
$$\|\dot{S}(x; \theta) - \dot{S}(x; \theta')\| \le \psi(x) \|\theta - \theta'\| \qquad \forall \theta, \theta' \in \Theta;$$
$$|\sigma^2(x; h) - \sigma^2(x; h')| \le \psi(x)\, d_H(h, h') \qquad \forall h, h' \in H.$$
B3. The process $\{X_i\}_{i=1,2,\dots}$ is ergodic under the true $(\theta_0, h_0)$ in the sense that for $q' = q$ and $q+1$ there exists the invariant measure $\mu_{q'}$ such that for every $\mu_{q'}$-integrable function $f$,
$$\frac{1}{n} \sum_{i=1}^{n} f(X_{i-1},\dots,X_{i-q'}) \to^p \int_{\mathbb{R}^{q'}} f(x_1,\dots,x_{q'})\, \mu_{q'}(dx_1 \cdots dx_{q'}).$$

We also assume the following.

B4. The matrix
$$I(\theta_0, h_0) = \int_{\mathbb{R}^q} \frac{\dot{S}(x; \theta_0) \dot{S}(x; \theta_0)^T}{\sigma^2(x; h_0)}\, \mu_q(dx)$$
is nonsingular.

B5. It holds that
$$\int_{\mathbb{R}^q} \psi(x)^5\, \mu_q(dx) < \infty \qquad\text{and}\qquad \int_0^1 \sqrt{\log N(H, d_H, \varepsilon)}\, d\varepsilon < \infty.$$

See the end of Section 4.1 for the discussion of the choice of $(H, d_H)$.
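To see that the entropy condition in B5 is satisfiable for a class of smooth functions (the setting mentioned in the Introduction), one can invoke the standard covering number bound for Hölder balls from Theorem 2.7.1 of van der Vaart and Wellner (1996); the following computation is a standard illustration, not part of the paper's statements:

```latex
% If H is a bounded class of \alpha-H\"older functions on a bounded convex
% subset of R^q and d_H is the sup-norm distance, then
%     \log N(H, d_H, \varepsilon) \le K \varepsilon^{-q/\alpha}
% for some constant K > 0, whence
\int_0^1 \sqrt{\log N(H, d_H, \varepsilon)}\, d\varepsilon
  \le \sqrt{K} \int_0^1 \varepsilon^{-q/(2\alpha)}\, d\varepsilon,
% and the right-hand side is finite if and only if q/(2\alpha) < 1,
% i.e., the entropy part of B5 holds whenever \alpha > q/2.
```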
5.2. Results. In order to explain the idea of our estimating function, let us first consider the Case G. We denote by $P_{n,u}$ the distribution of $\{X_1,\dots,X_n\}$ under $\theta = \theta_0 + n^{-1/2}u$ and $h = h_0$, where $u \in \mathbb{R}^d$. By an easy computation, the log-likelihood ratio is given by

(9)
$$\log \frac{dP_{n,u}}{dP_{n,0}}(X_1,\dots,X_n) = -\sum_{i=1}^{n} \frac{1}{2\sigma^2(\mathbf{X}_i; h_0)} \big\{|X_i - S(\mathbf{X}_i; \theta_0 + n^{-1/2}u)|^2 - |X_i - S(\mathbf{X}_i; \theta_0)|^2\big\} = \Delta_{n,u} - B_{n,u},$$
where
$$\Delta_{n,u} = \sum_{i=1}^{n} \frac{1}{\sigma^2(\mathbf{X}_i; h_0)} \{X_i - S(\mathbf{X}_i; \theta_0)\}\{S(\mathbf{X}_i; \theta_0 + n^{-1/2}u) - S(\mathbf{X}_i; \theta_0)\}$$
and
$$B_{n,u} = \sum_{i=1}^{n} \frac{1}{2\sigma^2(\mathbf{X}_i; h_0)} \{S(\mathbf{X}_i; \theta_0 + n^{-1/2}u) - S(\mathbf{X}_i; \theta_0)\}^2.$$
Under the above regularity conditions, we have
$$\Delta_{n,u} \to^d \mathcal{N}(0, u^T I(\theta_0, h_0) u) \qquad\text{and}\qquad B_{n,u} \to^p \tfrac{1}{2} u^T I(\theta_0, h_0) u.$$
So it follows from the theory of local asymptotic normality that the asymptotically efficient limit distribution in the Case G when $h_0$ is known is $\mathcal{N}(0, I(\theta_0, h_0)^{-1})$. That is, if we obtain an estimator $\tilde{\theta}_n$ such that $\sqrt{n}(\tilde{\theta}_n - \theta_0) \to^d \mathcal{N}(0, I(\theta_0, h_0)^{-1})$, it is asymptotically efficient in the sense of the local asymptotic minimax theorem. [See, e.g., Chapter 3.11 of van der Vaart and Wellner (1996).] If the parameter $h$ is unknown, then the estimation problem for $\theta$ becomes more difficult. So if we have an estimator which asymptotically behaves as stated above, then we may say that it is asymptotically efficient with the nuisance parameter $h$. This argument does not apply in the Case M, where the log-likelihood does not equal the formula (9), but we propose to use it for deriving an estimating equation which yields the same asymptotic distribution as in the Case G.
Not only in the Case G but also in the Case M, differentiating (9) formally and replacing the true $h_0$ by the unknown parameter $h$, we propose the estimating function
$$\Psi_n(\theta, h) = \frac{1}{n} \sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i; \theta)}{\sigma^2(\mathbf{X}_i; h)} \big(X_i - S(\mathbf{X}_i; \theta)\big).$$
Its compensator is
$$\tilde{\Psi}_n(\theta, h) = \frac{1}{n} \sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i; \theta)}{\sigma^2(\mathbf{X}_i; h)} \big(S(\mathbf{X}_i; \theta_0) - S(\mathbf{X}_i; \theta)\big).$$
Thus, it holds that
$$\sqrt{n}\big(\Psi_n(\theta, h) - \tilde{\Psi}_n(\theta, h)\big) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i; \theta)}{\sigma^2(\mathbf{X}_i; h)}\, \sigma(\mathbf{X}_i; h_0) w_i,$$
which is the summation of a $C^\rho(\Theta \times H)$-valued martingale difference array, where $\rho = \|\cdot\| \vee d_H$. By using Jain–Marcus' central limit theorem for martingales given by Nishiyama (1996, 2000a, 2000b), we have the following lemma, which plays a key role in our approach.

LEMMA 5.1. Under B1–B3 and B5, the sequence of random fields $\sqrt{n}(\Psi_n - \tilde{\Psi}_n)$, with parameter $(\theta, h)$, converges weakly in $C^\rho(\Theta \times H)$ to a zero-mean Gaussian random field $Z$ with the covariance
$$E Z(\theta, h) Z(\theta', h')^T = \int_{\mathbb{R}^q} \frac{\dot{S}(x; \theta) \dot{S}(x; \theta')^T}{\sigma^2(x; h)\, \sigma^2(x; h')}\, \sigma^2(x; h_0)\, \mu_q(dx).$$
In particular, the random variable $Z(\theta_0, h_0)$ is distributed with $\mathcal{N}(0, I(\theta_0, h_0))$.

Another lemma which is necessary to apply Theorem 2.1 is the following.

LEMMA 5.2. Under B1–B4, for any random sequence $(\theta_n, h_n)$ such that $\|\theta_n - \theta_0\| \vee d_H(h_n, h_0) = o_{P^*}(1)$, it holds that
$$\tilde{\Psi}_n(\theta_n, h_n) - \tilde{\Psi}_n(\theta_0, h_0) - \big(-I(\theta_0, h_0)\big)(\theta_n - \theta_0) = o_{P^*}(\|\theta_n - \theta_0\|).$$
Noting also that $\tilde{\Psi}_n(\theta_0, h_0) = 0$, we can apply Theorem 2.1 to conclude the following theorem.

THEOREM 5.3. Under B1–B5, for any random sequence $(\hat{\theta}_n, \hat{h}_n)$ such that $\|\hat{\theta}_n - \theta_0\| = o_{P^*}(1)$, $d_H(\hat{h}_n, h_0) = o_{P^*}(1)$ and $\Psi_n(\hat{\theta}_n, \hat{h}_n) = o_{P^*}(n^{-1/2})$, the estimator $\hat{\theta}_n$ is asymptotically normal:
$$\sqrt{n}(\hat{\theta}_n - \theta_0) \to^d \mathcal{N}(0, I(\theta_0, h_0)^{-1}).$$
In particular, in the Case G, the estimator $\hat{\theta}_n$ is asymptotically efficient. When B6 is also satisfied, the assumption "$\|\hat{\theta}_n - \theta_0\| = o_{P^*}(1)$" is automatically satisfied.
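To illustrate how an (approximate) zero of the estimating function can be computed in practice, consider again the toy scalar model $S(x;\theta) = \theta\tanh(x)$, $\sigma(x;h) = h_0 + h_1/(1+x^2)$ (our illustrative assumption, not the paper's model). Since $\dot S(x;\theta) = \tanh(x)$ does not depend on $\theta$, the estimating equation is linear in $\theta$ and solves in closed form:

```python
import numpy as np

# Sketch of the Z-estimation step for the illustrative scalar model
#   X_i = theta * tanh(X_{i-1}) + (h[0] + h[1]/(1 + X_{i-1}^2)) * w_i.
# All functional forms here are assumptions made for this example.

def psi_n(theta, h, x):
    """Psi_n(theta, h) = (1/n) sum Sdot(X_{i-1}) / sigma^2(X_{i-1}; h)
       * (X_i - theta * tanh(X_{i-1}))."""
    xprev, xcur = x[:-1], x[1:]
    sdot = np.tanh(xprev)
    s2 = (h[0] + h[1] / (1.0 + xprev ** 2)) ** 2
    return np.mean(sdot / s2 * (xcur - theta * sdot))

def z_estimate(h, x):
    """Solve Psi_n(theta, h) = 0 for theta (closed form by linearity)."""
    xprev, xcur = x[:-1], x[1:]
    sdot = np.tanh(xprev)
    s2 = (h[0] + h[1] / (1.0 + xprev ** 2)) ** 2
    return np.sum(sdot / s2 * xcur) / np.sum(sdot ** 2 / s2)

# Small check on simulated Case G data with theta_0 = 0.5, h_0 = (1.0, 0.5):
rng = np.random.default_rng(2)
x = np.empty(2001)
x[0] = 0.0
for i in range(2000):
    s = 1.0 + 0.5 / (1.0 + x[i] ** 2)
    x[i + 1] = 0.5 * np.tanh(x[i]) + s * rng.standard_normal()

theta_hat = z_estimate((1.0, 0.5), x)  # plug in the true h for this check
```

With a consistent plug-in $\hat h_n$ in place of the true $h_0$, Theorem 5.3 says the resulting $\hat\theta_n$ attains the same limit distribution.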
For the same reason as in Section 4, it is necessary to develop a procedure to construct a consistent estimator $\hat{h}_n$ for the nuisance parameter $h \in H$. The following theorem gives us an answer.
THEOREM 5.4. Assume B2, B3 and B5.

(First step: initial estimator for $\theta_0$.) Suppose the identifiability condition
$$\inf_{\theta : \|\theta - \theta_0\| > \varepsilon} \int_{\mathbb{R}^q} |S(x; \theta) - S(x; \theta_0)|^2\, \mu_q(dx) > 0 \qquad \forall \varepsilon > 0$$
is satisfied. If a random sequence $\theta_n^{LS}$ satisfies $A_n(\theta_n^{LS}) \le \inf_{\theta \in \Theta} A_n(\theta) + o_{P^*}(1)$, where
$$A_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} |X_i - S(\mathbf{X}_i; \theta)|^2,$$
then it holds that $\|\theta_n^{LS} - \theta_0\| = o_{P^*}(1)$.

(Second step: consistent estimator for $h_0$.) Suppose the identifiability condition
$$\inf_{h : d_H(h, h_0) > \varepsilon} \int_{\mathbb{R}^q} |\sigma^2(x; h) - \sigma^2(x; h_0)|^2\, \mu_q(dx) > 0 \qquad \forall \varepsilon > 0$$
is satisfied. Merely for a technical reason, assume that there exists a constant $L_4 > 0$ such that $E[w_i^4 \mid \mathcal{F}_{i-1}] \le L_4$ almost surely. If a random sequence $\hat{h}_n$ satisfies $B_n(\hat{h}_n) \le \inf_{h \in H} B_n(h) + o_{P^*}(1)$, where
$$B_n(h) = \frac{1}{n} \sum_{i=1}^{n} \big|\, |X_i - S(\mathbf{X}_i; \theta_n^{LS})|^2 - \sigma^2(\mathbf{X}_i; h) \,\big|^2,$$
then it holds that $d_H(\hat{h}_n, h_0) = o_{P^*}(1)$.
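A numerical sketch of this two-step recipe on the illustrative scalar model $S(x;\theta) = \theta\tanh(x)$, $\sigma(x;h) = h_0 + h_1/(1+x^2)$; the model and the finite candidate set for $h$ are assumptions made for this sketch (in the theorem $H$ may be infinite-dimensional, and the minimization of $B_n$ would be carried out over, e.g., a sieve):

```python
import numpy as np

# Two-step procedure of Theorem 5.4 on a toy model (illustrative assumptions):
#   X_i = theta * tanh(X_{i-1}) + (h[0] + h[1]/(1 + X_{i-1}^2)) * w_i.

def theta_ls(x):
    """First step: minimize A_n(theta) = (1/n) sum |X_i - theta*tanh(X_{i-1})|^2.
       A_n is quadratic in theta, so the minimizer is the least-squares ratio."""
    xprev, xcur = x[:-1], x[1:]
    z = np.tanh(xprev)
    return np.sum(z * xcur) / np.sum(z * z)

def h_hat(x, candidates):
    """Second step: minimize B_n(h) = (1/n) sum
       | |X_i - S(X_{i-1}; theta_LS)|^2 - sigma^2(X_{i-1}; h) |^2
       over a finite candidate set standing in for H."""
    th = theta_ls(x)
    xprev, xcur = x[:-1], x[1:]
    resid2 = (xcur - th * np.tanh(xprev)) ** 2

    def B_n(h):
        s2 = (h[0] + h[1] / (1.0 + xprev ** 2)) ** 2
        return np.mean((resid2 - s2) ** 2)

    return min(candidates, key=B_n)

# Simulate Case G data with theta_0 = 0.5 and h_0 = (1.0, 0.5):
rng = np.random.default_rng(3)
x = np.empty(5001)
x[0] = 0.0
for i in range(5000):
    s = 1.0 + 0.5 / (1.0 + x[i] ** 2)
    x[i + 1] = 0.5 * np.tanh(x[i]) + s * rng.standard_normal()

cands = [(1.0, 0.5), (2.0, 0.0), (0.7, 0.0)]  # well-separated candidates
best = h_hat(x, cands)
```

The identifiability conditions of the theorem correspond here to the candidates being separated in the $L^2(\mu_q)$ sense, which is what makes the minimizer of $B_n$ pick out (a neighborhood of) $h_0$.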
PROOF OF LEMMA 5.2. Notice that $\tilde{\Psi}_n(\theta_0, h_0) = 0$. For any (possibly random) sequence $(\theta_n, h_n)$ converging in outer probability to $(\theta_0, h_0)$, it holds that
$$
\begin{aligned}
\tilde{\Psi}_n(\theta_n, h_n) - \tilde{\Psi}_n(\theta_0, h_0)
&= \frac{1}{n} \sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i; \theta_n)}{\sigma^2(\mathbf{X}_i; h_n)} \big[S(\mathbf{X}_i; \theta_0) - S(\mathbf{X}_i; \theta_n)\big] \\
&= \frac{1}{n} \sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i; \theta_n)}{\sigma^2(\mathbf{X}_i; h_n)} \dot{S}(\mathbf{X}_i; \theta_0)^T (\theta_0 - \theta_n) + o_{P^*}(\|\theta_n - \theta_0\|) \qquad (11) \\
&= \frac{1}{n} \sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i; \theta_0)}{\sigma^2(\mathbf{X}_i; h_0)} \dot{S}(\mathbf{X}_i; \theta_0)^T (\theta_0 - \theta_n) + o_{P^*}(\|\theta_n - \theta_0\|) \\
&= -I(\theta_0, h_0)(\theta_n - \theta_0) + o_{P^*}(\|\theta_n - \theta_0\|).
\end{aligned}
$$
To show (11) above, do the same argument as in the proof of Lemma 4.2, using (10) instead of (6). □
PROOF OF THEOREM 5.3. The assertions follow from Theorems 2.1 and 2.3 by using also Lemmas 5.1 and 5.2, and Theorem 3.1(ii), respectively. □
PROOF OF THEOREM 5.4. To prove the first step, we will apply Corollary 3.2.3 of van der Vaart and Wellner (1996). We can write $A_n(\theta) = T_{1,n} + T_{2,n}(\theta) + T_{3,n}(\theta)$, where
$$
\begin{aligned}
T_{1,n} &= \frac{1}{n} \sum_{i=1}^{n} |X_i - S(\mathbf{X}_i; \theta_0)|^2, \\
T_{2,n}(\theta) &= \frac{2}{n} \sum_{i=1}^{n} \big(S(\mathbf{X}_i; \theta_0) - S(\mathbf{X}_i; \theta)\big)\, \sigma(\mathbf{X}_i; h_0) w_i, \\
T_{3,n}(\theta) &= \frac{1}{n} \sum_{i=1}^{n} |S(\mathbf{X}_i; \theta_0) - S(\mathbf{X}_i; \theta)|^2.
\end{aligned}
$$
The term $T_{1,n}$ converges in probability to a constant $C_1$. On the other hand, by using Proposition 4.5 of Nishiyama (2000a), we have that $\sqrt{n} T_{2,n}$ converges weakly in $C(\Theta)$ to a tight law; thus, $\sup_{\theta \in \Theta} |T_{2,n}(\theta)|$ converges in outer probability to zero. Finally, by Theorem 3.1, it holds that $\sup_{\theta \in \Theta} |T_{3,n}(\theta) - T_3(\theta)| = o_{P^*}(1)$, where
$$T_3(\theta) = \int_{\mathbb{R}^q} |S(x; \theta_0) - S(x; \theta)|^2\, \mu_q(dx).$$
Hence, we have $\sup_{\theta \in \Theta} |A_n(\theta) - (C_1 + T_3(\theta))| = o_{P^*}(1)$, and van der Vaart and Wellner's (1996) consistency theorem yields the assertion of the first step.
To prove the second step, we shall apply Corollary 3.2.3 of van der Vaart and Wellner (1996) again. Let us first see that $\sup_{h \in H} |B_n(h) - \tilde{B}_n(h)| = o_{P^*}(1)$, where
$$\tilde{B}_n(h) = \frac{1}{n} \sum_{i=1}^{n} \big|\, |X_i - S(\mathbf{X}_i; \theta_0)|^2 - \sigma^2(\mathbf{X}_i; h) \,\big|^2.$$
Now, notice that
$$
\begin{aligned}
&\Big| \big|\, |X_i - S(\mathbf{X}_i; \theta_n^{LS})|^2 - \sigma^2(\mathbf{X}_i; h) \,\big|^2 - \big|\, |X_i - S(\mathbf{X}_i; \theta_0)|^2 - \sigma^2(\mathbf{X}_i; h) \,\big|^2 \Big| \\
&\qquad \le \big|\, |X_i - S(\mathbf{X}_i; \theta_n^{LS})|^2 + |X_i - S(\mathbf{X}_i; \theta_0)|^2 - 2\sigma^2(\mathbf{X}_i; h) \,\big| \cdot \big|\, |X_i - S(\mathbf{X}_i; \theta_n^{LS})|^2 - |X_i - S(\mathbf{X}_i; \theta_0)|^2 \,\big| \\
&\qquad \le \big|\, |X_i - S(\mathbf{X}_i; \theta_n^{LS})|^2 + |X_i - S(\mathbf{X}_i; \theta_0)|^2 - 2\sigma^2(\mathbf{X}_i; h) \,\big| \cdot |2X_i - S(\mathbf{X}_i; \theta_n^{LS}) - S(\mathbf{X}_i; \theta_0)| \cdot |S(\mathbf{X}_i; \theta_n^{LS}) - S(\mathbf{X}_i; \theta_0)| \\
&\qquad \le L(X_i, \mathbf{X}_i)\, \|\theta_n^{LS} - \theta_0\|,
\end{aligned}
$$
where $L(x_0, x_1,\dots,x_q) = C(|x_0| + \psi(x_1,\dots,x_q))^4$ for a constant $C$. Given any $\varepsilon > 0$, choose $M > 0$ such that $\int_{\mathbb{R}^{q+1}} L(x_0, x_1,\dots,x_q) 1_{\{L(x_0, x_1,\dots,x_q) > M\}}\, \mu_{q+1}(dx_0, dx_1,\dots,dx_q) < \varepsilon$. Then we have
$$\sup_{h \in H} |B_n(h) - \tilde{B}_n(h)| \le M \|\theta_n^{LS} - \theta_0\| + \frac{1}{n} \sum_{i=1}^{n} L(X_i, \mathbf{X}_i) 1_{\{L(X_i, \mathbf{X}_i) > M\}}\, \mathrm{diam}(\Theta).$$
The second term of the right-hand side converges to a positive constant which is smaller than $\varepsilon \cdot \mathrm{diam}(\Theta)$. Since the choice of $\varepsilon > 0$ is arbitrary, we have $\sup_{h \in H} |B_n(h) - \tilde{B}_n(h)| = o_{P^*}(1)$.
Now, since $|X_i - S(\mathbf{X}_i; \theta_0)|^2 = \sigma^2(\mathbf{X}_i; h_0) w_i^2$, we can write $\tilde{B}_n(h) = T_{1,n} + T_{2,n}(h) + T_{3,n}(h)$, where
$$
\begin{aligned}
T_{1,n} &= \frac{1}{n} \sum_{i=1}^{n} \big|\, |X_i - S(\mathbf{X}_i; \theta_0)|^2 - \sigma^2(\mathbf{X}_i; h_0) \,\big|^2, \\
T_{2,n}(h) &= \frac{2}{n} \sum_{i=1}^{n} \sigma^2(\mathbf{X}_i; h_0)(w_i^2 - 1) \big( \sigma^2(\mathbf{X}_i; h_0) - \sigma^2(\mathbf{X}_i; h) \big), \\
T_{3,n}(h) &= \frac{1}{n} \sum_{i=1}^{n} |\sigma^2(\mathbf{X}_i; h_0) - \sigma^2(\mathbf{X}_i; h)|^2.
\end{aligned}
$$
The term $T_{1,n}$ converges in probability to a constant $C_1$ by assumption. By Jain–Marcus' CLT for martingale difference arrays, it is easy to show that $\sup_{h \in H} |T_{2,n}(h)| = o_{P^*}(1)$ (here, we use the technical assumption that $E[w_i^4 \mid \mathcal{F}_{i-1}]$ is bounded). Finally, by Theorem 3.1, it holds that $\sup_{h \in H} |T_{3,n}(h) - T_3(h)| = o_{P^*}(1)$, where
$$T_3(h) = \int_{\mathbb{R}^q} |\sigma^2(x; h_0) - \sigma^2(x; h)|^2\, \mu_q(dx).$$
Consequently, we have $\sup_{h \in H} |B_n(h) - (C_1 + T_3(h))| = o_{P^*}(1)$. Therefore, the claim of the second step follows from van der Vaart and Wellner's (1996) consistency theorem. □
Acknowledgments. I thank the Associate Editor and the referees for their careful reading and the advice to present a sharper version of the moment conditions.
REFERENCES

FLORENS-ZMIROU, D. (1989). Approximate discrete time schemes for statistics of diffusion processes. Statistics 20 547–557. MR1047222
HOFFMANN, M. (2001). On estimating the diffusion coefficient: Parametric versus nonparametric. Ann. Inst. H. Poincaré Probab. Statist. 37 339–372. MR1831987
JACOD, J. and SHIRYAEV, A. N. (1987). Limit Theorems for Stochastic Processes. Springer, Berlin. MR0959133
KESSLER, M. (1997). Estimation of an ergodic diffusion from discrete observations. Scand. J. Statist. 24 211–229. MR1455868
KOSOROK, M. R. (2008). Introduction to Empirical Processes and Semiparametric Inference. Springer, New York.
KUTOYANTS, Y. A. (2004). Statistical Inference for Ergodic Diffusion Processes. Springer, London. MR2144185
NISHIYAMA, Y. (1996). A central limit theorem for l∞-valued martingale difference arrays and its application. Preprint 971, Dept. Mathematics, Utrecht Univ. Available at http://www.ism.ac.jp/~nisiyama/.
NISHIYAMA, Y. (1997). Some central limit theorems for l∞-valued semimartingales and their applications. Probab. Theory Related Fields 108 459–494. MR1465638
NISHIYAMA, Y. (1999). A maximal inequality for continuous martingales and M-estimation in a Gaussian white noise model. Ann. Statist. 27 675–696. MR1714712
NISHIYAMA, Y. (2000a). Weak convergence of some classes of martingales with jumps. Ann. Probab. 28 685–712. MR1782271
NISHIYAMA, Y. (2000b). Entropy Methods for Martingales. CWI Tract 128. Centrum Wisk. Inform., Amsterdam. MR1742612
NISHIYAMA, Y. (2007). On the paper "Weak convergence of some classes of martingales with jumps." Ann. Probab. 35 1194–1200. MR2319720
NISHIYAMA, Y. (2009). A uniform law of large numbers for ergodic processes under a Lipschitz condition. Unpublished manuscript. Available at http://www.ism.ac.jp/~nisiyama/.
TANIGUCHI, M. and KAKIZAWA, Y. (2000). Asymptotic Theory of Statistical Inference for Time Series. Springer, New York. MR1785484
VAN DER VAART, A. W. (1998). Asymptotic Statistics. Cambridge Univ. Press, Cambridge. MR1652247
VAN DER VAART, A. W. and WELLNER, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York. MR1385671
VAN DER VAART, A. W. and VAN ZANTEN, H. (2005). Donsker theorems for diffusions: Necessary and sufficient conditions. Ann. Probab. 33 1422–1451. MR2150194
YOSHIDA, N. (1992). Estimation for diffusion processes from discrete observations. J. Multivariate Anal. 41 220–242. MR1172898
INSTITUTE OF STATISTICAL MATHEMATICS
4-6-7 MINAMI-AZABU, MINATO-KU
TOKYO 106-8569
JAPAN
E-MAIL: nisiyama@ism.ac.jp
URL: http://www.ism.ac.jp/~nisiyama/