
The Annals of Statistics
2009, Vol. 37, No. 6A, 3555–3579
DOI: 10.1214/09-AOS693
© Institute of Mathematical Statistics, 2009

ASYMPTOTIC THEORY OF SEMIPARAMETRIC Z-ESTIMATORS FOR STOCHASTIC PROCESSES WITH APPLICATIONS TO ERGODIC DIFFUSIONS AND TIME SERIES

BY YOICHI NISHIYAMA

Institute of Statistical Mathematics

This paper generalizes a part of the theory of Z-estimation, which has been developed mainly in the context of modern empirical processes, to the case of stochastic processes, typically semimartingales. We present a general theorem to derive the asymptotic behavior of the solution to an estimating equation Ψ_n(θ, ĥ_n) = 0 with an abstract nuisance parameter h when the compensator of Ψ_n is random. As its application, we consider the estimation problem in an ergodic diffusion process model where the drift coefficient contains an unknown, finite-dimensional parameter θ and the diffusion coefficient is indexed by a nuisance parameter h from an infinite-dimensional space. An example of the nuisance parameter space is a class of smooth functions. We establish the asymptotic normality and efficiency of a Z-estimator for the drift coefficient. As another application, we present a similar result in an ergodic time series model.

1. Introduction. Let us begin by stating our motivating example; the details are presented in Section 4. Consider the one-dimensional ergodic diffusion process X on I = (l, r) ⊆ ℝ which is a solution to the stochastic differential equation (SDE)

(1)  X_t = X_0 + ∫₀ᵗ S(X_s; θ) ds + ∫₀ᵗ σ(X_s; h) dW_s,

where s ↦ W_s is a standard Brownian motion. Here, we consider a d-dimensional parametric family {S(·; θ); θ ∈ Θ} for the drift coefficient indexed by a compact subset Θ of ℝ^d, and a possibly infinite-dimensional "parametric" family {σ²(·; h); h ∈ H} for the diffusion coefficient indexed by a (general) totally bounded metric space (H, d_H). We denote by (θ₀, h₀) the true value of (θ, h). Our aim is to estimate θ₀ when the model is perturbed by the unknown nuisance parameter h. As for the parameter h₀, we construct a d_H-consistent estimator ĥ_n. We prove that the Z-estimator θ̂_n, which is a solution to an estimating equation Ψ_n(θ, ĥ_n) = 0, is asymptotically normal and efficient. [We follow van der Vaart and Wellner (1996) for the terminology "Z-estimator."]
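To fix ideas, the following is a minimal simulation sketch of a toy instance of the SDE (1): an ergodic Ornstein–Uhlenbeck process discretized by the Euler–Maruyama scheme. The concrete choices S(x; θ) = −θx, a constant diffusion coefficient, and all numerical values are illustrative assumptions, not taken from the paper.

```python
import math
import random

def simulate_sde(theta=1.0, sigma=1.0, x0=0.0, dt=0.01, n_steps=200_000, seed=0):
    """Euler-Maruyama scheme for dX_t = -theta*X_t dt + sigma dW_t,
    a toy ergodic instance of the SDE (1) (Ornstein-Uhlenbeck process)."""
    rng = random.Random(seed)
    sqdt = math.sqrt(dt)
    x, path = x0, [x0]
    for _ in range(n_steps):
        x = x + (-theta * x) * dt + sigma * sqdt * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

path = simulate_sde()
# By ergodicity, time averages approximate the invariant law N(0, sigma^2/(2*theta)).
time_mean = sum(path) / len(path)
time_var = sum(x * x for x in path) / len(path) - time_mean ** 2
```

For θ = σ = 1 the invariant law is N(0, 1/2), so the time mean and time variance should be close to 0 and 0.5 for a long trajectory.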

Received January 2009.
AMS 2000 subject classifications. Primary 62F12; secondary 62M05, 62M10.
Key words and phrases. Estimating function, nuisance parameter, metric entropy, asymptotic efficiency, ergodic diffusion, discrete observation.


There exist many works which treat the estimation problem for the drift coefficient. It is well known that when the process X = (X_t)_{t∈[0,∞)} is observed continuously on the time interval [0, T], the diffusion coefficient may be assumed to be known without loss of generality. (So, we may put h = h₀.) In such cases, the asymptotic normality and efficiency of the maximum likelihood estimator (MLE) θ̂_T for θ₀, as T → ∞, has already been established. See, for example, Kutoyants (2004). The MLE θ̂_T is a solution to the estimating equation l̇_T(θ) = 0 with

l̇_T(θ) = (1/T) ∫₀ᵀ [Ṡ(X_t; θ) / σ²(X_t; h₀)] [dX_t − S(X_t; θ) dt],

where Ṡ denotes the derivative of S with respect to θ. On the other hand, the situation is different when the process X = (X_t)_{t∈[0,∞)} is observed only at discrete time points {0 = t₀ⁿ < t₁ⁿ < ⋯ < tₙⁿ}.



and to show the differentiability of (θ, h) ↦ Ψ̃_n(θ, h) around (θ₀, h₀). Roughly speaking, our main result asserts that the Z-estimator is asymptotically normal and efficient if we assume that h ↦ σ²(·; h) is Lipschitz continuous with respect to d_H and that the metric entropy condition

∫₀¹ √(log N(H, d_H, ε)) dε < ∞

is satisfied.



We denote by ℓ∞(T) the space of bounded functions on a set T, and by C_ρ(T) the space of functions on T which are continuous with respect to a semimetric ρ. We equip both spaces with the uniform metric. Given a probability measure P, we denote by P* the corresponding outer probability; see van der Vaart and Wellner (1996) for the theory of stochastic convergence which does not assume measurability. We denote by →ᵖ and →ᵈ convergence in (outer) probability and weak convergence, respectively. Limits are in principle taken as n → ∞. The Euclidean metric on ℝ^d is denoted by ‖·‖.

2. General theory for semiparametric Z-estimation. Let two sets Θ and H be given. Let

Ψ_n : Θ × H → ℝ^d  and  Ψ̃_n : Θ × H → ℝ^d

be random maps. The latter should be a random "compensator" of the former; in the i.i.d. case it is nonrandom and does not depend on n. Compare the above setting with that in Chapter 3.3 of van der Vaart and Wellner (1996), where Ψ̃_n ≡ Ψ.

We present a way to derive the asymptotic behavior of the estimator θ̂_n for the parameter θ ∈ Θ of interest, with the help of the estimator ĥ_n for the nuisance parameter h ∈ H, which are solutions to the estimating equation

Ψ_n(θ̂_n, ĥ_n) ≈ 0.

Here, the true values θ₀ ∈ Θ and h₀ ∈ H are supposed to satisfy

Ψ̃_n(θ₀, h₀) ≈ 0.

The following theorem extends a special case of Theorem 3.3.1 of van der Vaart and Wellner (1996). See also Theorem 5.21 of van der Vaart (1998).

THEOREM 2.1. Let Θ be a subset of ℝ^d with the Euclidean metric ‖·‖. Let (H, d_H) be a semimetric space. Let Ψ_n : Θ × H → ℝ^d and Ψ̃_n : Θ × H → ℝ^d be random maps defined on a probability space (Ω_n, F_n, P_n). (We do not assume any measurability.) Suppose that there exist a sequence of constants r_n ↑ ∞, some fixed point (θ₀, h₀) and an invertible matrix V_{θ₀,h₀} which satisfy the following (i) and (ii).

(i) There exists a neighborhood U ⊂ Θ × H of (θ₀, h₀) such that

r_n (Ψ_n − Ψ̃_n) →ᵈ Z  in ℓ∞(U),

where almost all paths (θ, h) ↦ Z(θ, h) are continuous with respect to ρ = ‖·‖ ∨ d_H.

(ii) For the given random sequence (θ̂_n, ĥ_n), it holds that

Ψ̃_n(θ̂_n, ĥ_n) − Ψ̃_n(θ₀, h₀) − V_{θ₀,h₀}(θ̂_n − θ₀) = o_{P_n*}(r_n⁻¹ + ‖θ̂_n − θ₀‖)

and that

‖θ̂_n − θ₀‖ ∨ d_H(ĥ_n, h₀) = o_{P_n*}(1),
Ψ_n(θ̂_n, ĥ_n) = o_{P_n*}(r_n⁻¹),
Ψ̃_n(θ₀, h₀) = o_{P_n*}(r_n⁻¹).

Then it holds that

r_n (θ̂_n − θ₀) →ᵈ −V_{θ₀,h₀}⁻¹ Z(θ₀, h₀).

To prove the above theorem, we need the following lemma, which is a slight generalization of Lemma 19.24 of van der Vaart (1998).

LEMMA 2.2. Let (T, ρ) be a semimetric space. Suppose that Z_n →ᵈ Z in ℓ∞(T) and that almost all paths of Z are continuous with respect to ρ. If a T-valued random sequence t̂_n satisfies ρ(t̂_n, t₀) = o_{P_n*}(1) for some nonrandom t₀ ∈ T, then

Z_n(t̂_n) − Z_n(t₀) = o_{P_n*}(1).

PROOF. Let us equip the space ℓ∞(T) × T with the metric ‖·‖_T + ρ, where ‖·‖_T denotes the uniform metric on ℓ∞(T). Define the function g : ℓ∞(T) × T → ℝ by g(z, t) = z(t) − z(t₀). Then for any z ∈ C_ρ(T) and t ∈ T, the function g is continuous at (z, t). Indeed, if (z_n, t_n) → (z, t), then ‖z_n − z‖_T → 0, and thus z_n(t_n) = z(t_n) + o(1) → z(t), while z_n(t₀) → z(t₀) is trivial.

By assumption, we have (Z_n, t̂_n) →ᵈ (Z, t₀) in ℓ∞(T) × T [see, e.g., Theorem 18.10(v) of van der Vaart (1998)]. Since almost all paths of Z belong to C_ρ(T), by the continuous mapping theorem,

Z_n(t̂_n) − Z_n(t₀) = g(Z_n, t̂_n) →ᵈ g(Z, t₀) = Z(t₀) − Z(t₀) = 0.

The proof is finished. □

PROOF OF THEOREM 2.1. Applying Lemma 2.2 to the ℓ∞(U)-valued random elements Z_n = r_n(Ψ_n − Ψ̃_n), we have

r_n (Ψ_n(θ̂_n, ĥ_n) − Ψ̃_n(θ̂_n, ĥ_n)) − r_n (Ψ_n(θ₀, h₀) − Ψ̃_n(θ₀, h₀)) = o_{P_n*}(1).

Since r_n Ψ_n(θ̂_n, ĥ_n) = o_{P_n*}(1), we have

r_n (Ψ̃_n(θ̂_n, ĥ_n) − Ψ̃_n(θ₀, h₀)) = −r_n Ψ_n(θ₀, h₀) + o_{P_n*}(1).

By the assumption (ii), it holds that

(2)  r_n V_{θ₀,h₀}(θ̂_n − θ₀) = −r_n Ψ_n(θ₀, h₀) + o_{P_n*}(1 + r_n ‖θ̂_n − θ₀‖).

Now, since

r_n ‖θ̂_n − θ₀‖ ≤ ‖V_{θ₀,h₀}⁻¹‖ r_n ‖V_{θ₀,h₀}(θ̂_n − θ₀)‖ ≤ O_{P_n}(1) + o_{P_n*}(1 + r_n ‖θ̂_n − θ₀‖)  by (2),

it holds that r_n ‖θ̂_n − θ₀‖ = O_{P_n*}(1). Inserting this into (2), we have

r_n V_{θ₀,h₀}(θ̂_n − θ₀) = −r_n Ψ_n(θ₀, h₀) + o_{P_n*}(1)
  = −r_n (Ψ_n(θ₀, h₀) − Ψ̃_n(θ₀, h₀)) + o_{P_n*}(1)
  →ᵈ −Z(θ₀, h₀),

which implies the conclusion. □

In Theorem 2.1, both "‖θ̂_n − θ₀‖ = o_{P_n*}(1)" and "d_H(ĥ_n, h₀) = o_{P_n*}(1)" are assumed. Under some conditions, the former automatically follows from the latter, as is seen in the following theorem.

THEOREM 2.3. Let (Θ, d_Θ) and (H, d_H) be two semimetric spaces. Let Ψ_n : Θ × H → ℝ^d be a random map defined on a probability space (Ω_n, F_n, P_n). (We do not assume any measurability.) Let Ψ : Θ × H → ℝ^d be a nonrandom function. Suppose that

sup_{(θ,h)∈Θ×H} ‖Ψ_n(θ, h) − Ψ(θ, h)‖ = o_{P_n*}(1).

Suppose also that for some (θ₀, h₀) ∈ Θ × H,

inf_{θ : d_Θ(θ,θ₀)>ε} ‖Ψ(θ, h₀)‖ > 0  for every ε > 0,

and that h ↦ Ψ(θ, h) is continuous at h₀ uniformly in θ. Then for any random sequence (θ̂_n, ĥ_n) such that Ψ_n(θ̂_n, ĥ_n) = o_{P_n*}(1) and d_H(ĥ_n, h₀) = o_{P_n*}(1), it holds that d_Θ(θ̂_n, θ₀) = o_{P_n*}(1).

PROOF. Observe that

‖Ψ(θ̂_n, ĥ_n)‖ ≤ ‖Ψ(θ̂_n, ĥ_n) − Ψ_n(θ̂_n, ĥ_n)‖ + ‖Ψ_n(θ̂_n, ĥ_n)‖
  ≤ sup_{θ,h} ‖Ψ(θ, h) − Ψ_n(θ, h)‖ + ‖Ψ_n(θ̂_n, ĥ_n)‖
  = o_{P_n*}(1).

Now, for every ε > 0, there exist δ, η > 0 such that ‖Ψ(θ, h)‖ > η for every θ with d_Θ(θ, θ₀) > ε and every h with d_H(h, h₀) < δ. Hence the event {d_Θ(θ̂_n, θ₀) > ε} is contained in the event {‖Ψ(θ̂_n, ĥ_n)‖ > η} ∪ {d_H(ĥ_n, h₀) ≥ δ}. The outer probability of the latter event converges to 0. □
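The mechanism of Theorem 2.3 can be seen in a deliberately trivial toy model, which is an illustrative assumption and not an example from the paper: X_i ~ N(θ₀, h₀²) i.i.d., Ψ_n(θ, h) = n⁻¹ Σᵢ (X_i − θ)/h² and Ψ(θ, h) = (θ₀ − θ)/h². Identifiability and uniform convergence hold, and plugging a consistent nuisance estimate ĥ_n into the estimating equation leaves the root θ̂_n consistent.

```python
import math
import random

def z_estimate(xs):
    """Toy semiparametric Z-estimation in the spirit of Theorem 2.3:
    Psi_n(theta, h) = mean(X_i - theta) / h^2, with nuisance scale h.
    Plugging in any consistent h_hat leaves the root theta_hat consistent."""
    n = len(xs)
    mean = sum(xs) / n
    h_hat = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)  # consistent for h0
    theta_hat = mean  # the root of Psi_n(theta, h_hat) = 0, here in closed form
    return theta_hat, h_hat

rng = random.Random(1)
theta0, h0 = 2.0, 1.5
xs = [rng.gauss(theta0, h0) for _ in range(20_000)]
theta_hat, h_hat = z_estimate(xs)
```

In this toy case the factor 1/ĥ_n² does not move the root at all, which is the extreme instance of the theorem's message that a consistent nuisance plug-in does not destroy the consistency of θ̂_n.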

3. Uniform law of large numbers. In this section, we give a uniform law of large numbers for ergodic processes, under a smoothness assumption. The proof is standard, so it is omitted. [See, e.g., Theorem 2.4.1 of van der Vaart and Wellner (1996) for the idea, or see Nishiyama (2009).]

THEOREM 3.1. Let (E, ℰ) be a measurable space. Let Θ be a set which is totally bounded with respect to a semimetric ρ. Let a family {f(·; θ); θ ∈ Θ} of measurable functions on E be given. Suppose that there exists a measurable function K such that

(3)  |f(x; θ) − f(x; θ′)| ≤ K(x) ρ(θ, θ′)  for all θ, θ′ ∈ Θ.

(i) Suppose that the E-valued random process {X_t}_{t∈[0,∞)} is ergodic with the invariant law μ, that is, for any μ-integrable function g,

(1/T) ∫₀ᵀ g(X_t) dt →ᵖ ∫_E g(x) μ(dx).

If all f(·; θ) and K are μ-integrable, then

sup_{θ∈Θ} | (1/T) ∫₀ᵀ f(X_t; θ) dt − ∫_E f(x; θ) μ(dx) | = o_{P*}(1).

(ii) Suppose that the E-valued random process {X_i}_{i=1,2,…} is ergodic with the invariant law μ, that is, for any μ-integrable function g,

(1/n) Σ_{i=1}^n g(X_i) →ᵖ ∫_E g(x) μ(dx).

If all f(·; θ) and K are μ-integrable, then

sup_{θ∈Θ} | (1/n) Σ_{i=1}^n f(X_i; θ) − ∫_E f(x; θ) μ(dx) | = o_{P*}(1).

REMARK. The smoothness assumption (3) can be replaced by "bracketing." See Theorem 2.4.1 of van der Vaart and Wellner (1996).
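As a numerical illustration of Theorem 3.1(ii), the following sketch (all modeling choices are assumptions made for the demonstration) runs an ergodic AR(1) chain and checks that the supremum over a θ-grid of the gap between time averages of f(x; θ) = cos(θx) and the corresponding integrals under the invariant law is small; condition (3) holds here with K(x) = |x|.

```python
import math
import random

def uniform_lln_gap(n=50_000, a=0.5, seed=2):
    """Uniform LLN check for the ergodic AR(1) chain X_i = a*X_{i-1} + eps_i,
    eps_i ~ N(0,1), whose invariant law is N(0, 1/(1-a^2)), with
    f(x; theta) = cos(theta*x).  Returns the supremum over a theta-grid of
    |empirical average - integral under the invariant law|."""
    thetas = [k / 20 for k in range(41)]          # grid over [0, 2]
    rng = random.Random(seed)
    s2 = 1.0 / (1.0 - a * a)                      # invariant variance
    x = 0.0
    sums = [0.0] * len(thetas)
    for _ in range(n):
        x = a * x + rng.gauss(0.0, 1.0)
        for j, th in enumerate(thetas):
            sums[j] += math.cos(th * x)
    # E[cos(theta*X)] under N(0, s2) equals exp(-theta^2 * s2 / 2)
    return max(abs(sums[j] / n - math.exp(-thetas[j] ** 2 * s2 / 2))
               for j in range(len(thetas)))

gap = uniform_lln_gap()
```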

4. Ergodic diffusion processes.

4.1. Regularity conditions. Let us consider the diffusion process model introduced in the first paragraph of Section 1. We now list some conditions. We suppose that there exists a parametric family of d-dimensional vector-valued functions {Ṡ(·; θ); θ ∈ Θ} on I which satisfies the following conditions. Typically, they may be taken to be the derivatives of S(·; θ) with respect to θ, that is, Ṡ(·; θ) = (∂S(·; θ)/∂θ₁, …, ∂S(·; θ)/∂θ_d)ᵀ. The function Λ appearing in A1 and A3 may be chosen to be common without loss of generality.



A1. Θ is a compact subset of ℝ^d. There exists a measurable function Λ on I such that at the true θ₀ ∈ Θ,

S(x; θ) − S(x; θ₀) = Ṡ(x; θ₀)ᵀ(θ − θ₀) + Λ(x) ɛ(x; θ, θ₀),

where sup_{x∈I} |ɛ(x; θ, θ₀)| = o(‖θ − θ₀‖) as θ → θ₀.

A2. There exists a constant K > 0 such that

sup_{θ∈Θ} |S(x; θ) − S(x′; θ)| ≤ K |x − x′|;
sup_{θ∈Θ} ‖Ṡ(x; θ) − Ṡ(x′; θ)‖ ≤ K |x − x′|;
sup_{h∈H} |σ²(x; h) − σ²(x′; h)| ≤ K |x − x′|.

A3. There exists a measurable function Λ on I such that

sup_{θ∈Θ} |S(x; θ)| ≤ Λ(x);
sup_{θ∈Θ} ‖Ṡ(x; θ)‖ ≤ Λ(x);
c := inf_{h∈H} inf_{x∈I} σ²(x; h) > 0;
‖Ṡ(x; θ) − Ṡ(x; θ′)‖ ≤ Λ(x) ‖θ − θ′‖  for all θ, θ′ ∈ Θ;
|σ²(x; h) − σ²(x; h′)| ≤ Λ(x) d_H(h, h′)  for all h, h′ ∈ H.

A4. sup_{t∈ℝ} E(Λ(X_t)⁸ + |X_t|⁴) < ∞.

A7. The metric entropy condition ∫₀¹ √(log N(H, d_H, ε)) dε < ∞ holds.

A8. For every ε > 0,

inf_{θ : ‖θ−θ₀‖>ε} ‖ ∫_I [Ṡ(x; θ) / σ²(x; h₀)] [S(x; θ₀) − S(x; θ)] μ(dx) ‖ > 0.

REMARK. The last assumption in A2 implies that σ²(x; h₀) ≤ C(1 + |x|) for a constant C > 0.



To close this subsection, let us discuss possible choices of the nuisance parameter space (H, d_H).

EXAMPLE 1 (Parametric model). When (H, d_H) is a compact subset of a finite-dimensional Euclidean space, the metric entropy condition A7 is indeed satisfied. So the main restriction is the Lipschitz continuity of h ↦ σ²(·; h) in A3. This situation is more general than that in Yoshida (1992) and Kessler (1997), although, as announced in Section 1, our result does not include theirs.

EXAMPLE 2 (The class of smooth functions). Let us consider the parametrization σ(x; h) = h(x), where h is an element of the class H = C_M^α(I) defined below. We equip the function space H with the uniform metric ‖·‖_∞, for which the last requirement in A3 is always fulfilled. To check A7, first we consider the case where I is a bounded subset of ℝ, and next we give some remarks for the general case.

We take the material below from Section 2.7.1 of van der Vaart and Wellner (1996). Let I be a bounded, convex subset of ℝ^q. (In the current example of one-dimensional diffusions we are considering the case q = 1, but for generality we let q be an arbitrary positive integer; see Section 5.) Let α > 0 and M > 0 be given, and let α̲ be the greatest integer smaller than α. For any vector k = (k₁, …, k_q) of q nonnegative integers, we define

D^k = ∂^{k·} / (∂x₁^{k₁} ⋯ ∂x_q^{k_q}),  where k· = Σ_{i=1}^q k_i.

We denote by C_M^α(I) the class of functions h defined on I such that

max_{k· ≤ α̲} sup_x |D^k h(x)| + max_{k· = α̲} sup_{x,y} [ |D^k h(x) − D^k h(y)| / ‖x − y‖^{α−α̲} ] ≤ M,

where the suprema are taken over all x, y in the interior of I with x ≠ y. Then there exists a constant K > 0, depending only on α and q, such that

log N(C_M^α(I), ‖·‖_∞, ε) ≤ K λ(I¹) (M/ε)^{q/α},

where λ(I¹) is the Lebesgue measure of the set {x : ‖x − I‖ < 1}. Hence, the metric entropy condition A7 is satisfied if q/(2α) < 1, and therefore our theory works.

When I = ℝ^q, we shall restrict our attention, for example, to the following class H of functions on ℝ^q. Let I₀ be a bounded, convex subset of ℝ^q, and suppose that the restriction of each h ∈ H to I₀ belongs to C_M^α(I₀) and that

(4)  sup_{x∈ℝ^q} |h(x) − h′(x)| ≤ L sup_{x∈I₀} |h(x) − h′(x)|  for all h, h′ ∈ H,

for a constant L > 0. Then both the last condition of A3 and A7 are satisfied. The condition (4) is satisfied if we assume, for example, either of the following:

(i) h is known on I₀ᶜ;
(ii) when q = 1 and I₀ = [l₀, r₀], each h is constant on (−∞, l₀] and on [r₀, ∞).

Although the examples (i) and (ii) might look restrictive, it should be noted that in practice we can choose an arbitrarily large I₀.

Another way to deal with the unbounded case I = ℝ^q is to consider the parametrization

σ(x; h) = h(u(x)),  h ∈ C_M^α(I₀),

where I₀ is a bounded, convex subset of ℝ^{q′} and u : ℝ^q → I₀ is a fixed function. If q′/(2α) < 1, then both the last requirement in A3 and the metric entropy condition A7 are satisfied for the uniform metric d_H = ‖·‖_∞.

Instead of C_M^α(I), another possible choice of H which satisfies the metric entropy condition for the uniform metric is a Sobolev class; see Example 19.10 of van der Vaart (1998).

4.2. Results. As announced in Section 1, we propose to use the estimating function

Ψ_n(θ, h) = (1/t_n^n) Σ_{i=1}^n [Ṡ(X_{t_{i−1}^n}; θ) / σ²(X_{t_{i−1}^n}; h)] [X_{t_i^n} − X_{t_{i−1}^n} − S(X_{t_{i−1}^n}; θ) |t_i^n − t_{i−1}^n|],

whose compensator is

Ψ̃_n(θ, h) = (1/t_n^n) Σ_{i=1}^n [Ṡ(X_{t_{i−1}^n}; θ) / σ²(X_{t_{i−1}^n}; h)] [∫_{t_{i−1}^n}^{t_i^n} S(X_t; θ₀) dt − S(X_{t_{i−1}^n}; θ) |t_i^n − t_{i−1}^n|].

Then we have the following two lemmas.

LEMMA 4.1. Assume Δ_n → 0 and t_n^n → ∞, where Δ_n := max_{1≤i≤n} |t_i^n − t_{i−1}^n|. Equip the space Θ × H with the metric ρ = ‖·‖ ∨ d_H. Under A1–A5 and A7, √(t_n^n) (Ψ_n − Ψ̃_n) converges weakly in C_ρ(Θ × H) to a zero-mean Gaussian process Z with the covariance

E Z(θ, h) Z(θ′, h′)ᵀ = ∫_I [Ṡ(x; θ) Ṡ(x; θ′)ᵀ / (σ²(x; h) σ²(x; h′))] σ²(x; h₀) μ(dx).

In particular, the random variable Z(θ₀, h₀) is distributed as N(0, I(θ₀, h₀)), where I(θ₀, h₀) := ∫_I [Ṡ(x; θ₀) Ṡ(x; θ₀)ᵀ / σ²(x; h₀)] μ(dx).

LEMMA 4.2. Assume Δ_n = o((t_n^n)⁻¹) and t_n^n → ∞. Under A1–A6, for any random sequence (θ_n, h_n) such that ‖θ_n − θ₀‖ ∨ d_H(h_n, h₀) = o_{P*}(1), it holds that

Ψ̃_n(θ_n, h_n) − Ψ̃_n(θ₀, h₀) − (−I(θ₀, h₀))(θ_n − θ₀) = o_{P*}((t_n^n)^{−1/2} + ‖θ_n − θ₀‖).


Combining these lemmas with Theorem 2.1, and noting also that Ψ̃_n(θ₀, h₀) = O_P(Δ_n^{1/2}), which will be proved by using Lemma 4.5 below, we can conclude the following theorem.

THEOREM 4.3. Assume Δ_n = o((t_n^n)⁻¹) and t_n^n → ∞. Under A1–A7, for any random sequence (θ̂_n, ĥ_n) such that

‖θ̂_n − θ₀‖ = o_{P*}(1),  d_H(ĥ_n, h₀) = o_{P*}(1)

and

Ψ_n(θ̂_n, ĥ_n) = o_{P*}((t_n^n)^{−1/2}),

the estimator θ̂_n is asymptotically normal and efficient:

√(t_n^n) (θ̂_n − θ₀) →ᵈ N(0, I(θ₀, h₀)⁻¹).

When A8 is also satisfied, the assumption "‖θ̂_n − θ₀‖ = o_{P*}(1)" is automatically satisfied.
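As a numerical sketch of the estimating-function approach in the simplest possible toy case, assume the drift S(x; θ) = −θx and a known constant diffusion coefficient, so the nuisance part is trivial; these choices are illustrative assumptions. In this linear case the root of Ψ_n(θ, h) = 0 is available in closed form and recovers θ₀ from discretized data.

```python
import math
import random

def simulate_discrete(theta0=1.0, sigma=1.0, dt=0.01, n=100_000, seed=3):
    """Discrete observations X_{t_i^n} on the equidistant grid t_i^n = i*dt of
    the toy model dX = -theta0*X dt + sigma dW (Euler-Maruyama)."""
    rng = random.Random(seed)
    x, obs, sqdt = 0.0, [0.0], math.sqrt(dt)
    for _ in range(n):
        x += -theta0 * x * dt + sigma * sqdt * rng.gauss(0.0, 1.0)
        obs.append(x)
    return obs

def drift_z_estimator(obs, dt):
    """Root of Psi_n(theta) = (1/t_n^n) * sum_i (-X_{t_{i-1}^n}/sigma^2) *
    [X_{t_i^n} - X_{t_{i-1}^n} + theta * X_{t_{i-1}^n} * dt] = 0;
    sigma^2 cancels, so the root is explicit."""
    num = sum(obs[i - 1] * (obs[i] - obs[i - 1]) for i in range(1, len(obs)))
    den = sum(obs[i - 1] ** 2 * dt for i in range(1, len(obs)))
    return -num / den

obs = simulate_discrete()
theta_hat = drift_z_estimator(obs, 0.01)
```

With t_n^n = 1000 the estimator should land well within a few standard errors of θ₀ = 1.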

In the above theorem, the only assumption which we cannot check from the data is the consistency "d_H(ĥ_n, h₀) = o_{P*}(1)," because it involves the true value h₀ of the unknown parameter h ∈ H. When {σ²(·; h); h ∈ H} is a class of functions σ²(·; h) = h(·) with H a class of smooth functions, one may think that a kernel estimator is a candidate for ĥ_n. As stated above, in view of the Lipschitz condition on h ↦ σ²(·; h) (the last condition in A3), it is convenient to consider consistency with respect to the uniform metric. However, showing the consistency of a kernel estimator with respect to the uniform metric is a nontrivial task. Generally speaking, showing consistency for an infinite-dimensional parameter is not a trivial problem, and it should be solved by independent articles; see, for example, Hoffmann (2001). Below, we give a general way to show the consistency of a least squares estimator.

THEOREM 4.4. Assume Δ_n → 0 and t_n^n → ∞. Assume A2–A5 and A7. Suppose that

inf_{h : d_H(h,h₀)>ε} ∫_I |σ²(x; h) − σ²(x; h₀)|² μ(dx) > 0  for every ε > 0.

If the random element ĥ_n satisfies A_n(ĥ_n) ≤ inf_{h∈H} A_n(h) + o_{P*}(1), where

A_n(h) = (1/t_n^n) Σ_{i=1}^n | |X_{t_i^n} − X_{t_{i−1}^n}|² / |t_i^n − t_{i−1}^n| − σ²(X_{t_{i−1}^n}; h) |² |t_i^n − t_{i−1}^n|,

then it holds that d_H(ĥ_n, h₀) = o_{P*}(1).
then it holds that d H (ĥn,h 0 ) = o P ∗(1).


4.3. Proofs. Before the proofs, we state a lemma which is well known.

LEMMA 4.5. Let X be a solution to the SDE (1) for (θ, h) = (θ₀, h₀). Assume |t_i^n − t_{i−1}^n| ≤ 1.

(i) For any k ≥ 2, there exists a constant C_k > 0, depending only on k, such that

E sup_{t∈[t_{i−1}^n, t_i^n]} |X_t − X_{t_{i−1}^n}|^k ≤ C_k sup_{s∈ℝ} E{|S(X_s; θ₀)|^k + |σ(X_s; h₀)|^k} |t_i^n − t_{i−1}^n|^{k/2} =: D_k |t_i^n − t_{i−1}^n|^{k/2},

provided the right-hand side is finite.

(ii) For any k ≥ 2 and any measurable functions f, g, it holds that

sup_{t∈[t_{i−1}^n, t_i^n]} E( |X_t − X_{t_{i−1}^n}|^{k/2} |f(X_{t_{i−1}^n})| |g(X_t)| ) ≤ (D_k |t_i^n − t_{i−1}^n|^{k/2})^{1/2} sup_{s∈ℝ} (E|f(X_s)|⁴)^{1/4} sup_{s∈ℝ} (E|g(X_s)|⁴)^{1/4},

provided the right-hand side is finite.

PROOF. The assertion (i) is well known. (Use Hölder's inequality and the Burkholder–Davis–Gundy inequality for ∫_{t_{i−1}^n}^{t_i^n} |S(X_s; θ₀)| ds and sup_{t∈[t_{i−1}^n, t_i^n]} |∫_{t_{i−1}^n}^t σ(X_s; h₀) dW_s|.) The assertion (ii) follows from Hölder's inequality and (i). □

Throughout the proofs, we write

ψ(x; θ, h) = Ṡ(x; θ) / σ²(x; h),

which is a d-dimensional vector-valued function. For each component ψ^{(j)}(x; θ, h) (j = 1, …, d), it holds that

(5)  |ψ^{(j)}(x; θ, h) − ψ^{(j)}(x′; θ, h)|
  ≤ |Ṡ^{(j)}(x; θ) − Ṡ^{(j)}(x′; θ)| / σ²(x; h) + |Ṡ^{(j)}(x′; θ)| |1/σ²(x; h) − 1/σ²(x′; h)|
  ≤ {1/c + Λ(x′)/c²} K |x − x′|

and that

(6)  |ψ^{(j)}(x; θ, h) − ψ^{(j)}(x; θ′, h′)|
  ≤ |Ṡ^{(j)}(x; θ) − Ṡ^{(j)}(x; θ′)| / σ²(x; h) + |Ṡ^{(j)}(x; θ′)| |1/σ²(x; h) − 1/σ²(x; h′)|
  ≤ |Ṡ^{(j)}(x; θ) − Ṡ^{(j)}(x; θ′)| / c + |Ṡ^{(j)}(x; θ′)| |σ²(x; h) − σ²(x; h′)| / c²
  ≤ {Λ(x)/c + |Λ(x)|²/c²} (‖θ − θ′‖ ∨ d_H(h, h′)).

PROOF OF LEMMA 4.1. We apply Theorem 3.4.2 of Nishiyama (2000b) [or see Theorem 3.3 of van der Vaart and van Zanten (2005)] to the terminal values M^{n,θ,h}_{t_n^n} of the continuous martingales t ↦ M_t^{n,θ,h} given by

M_t^{n,θ,h} = (1/√(t_n^n)) Σ_{i=1}^n ∫_{t_{i−1}^n ∧ t}^{t_i^n ∧ t} ψ(X_{t_{i−1}^n}; θ, h) σ(X_s; h₀) dW_s.

For the finite-dimensional convergence, it is sufficient to show the convergence of the predictable covariations. This is done as follows:

(7)  ⟨M^{n,θ,h}, M^{n,θ′,h′}⟩_{t_n^n}
  = (1/t_n^n) Σ_{i=1}^n ψ(X_{t_{i−1}^n}; θ, h) ψ(X_{t_{i−1}^n}; θ′, h′)ᵀ ∫_{t_{i−1}^n}^{t_i^n} σ²(X_s; h₀) ds
  = (1/t_n^n) ∫₀^{t_n^n} ψ(X_t; θ, h) ψ(X_t; θ′, h′)ᵀ σ²(X_t; h₀) dt + o_P(1)
  →ᵖ ∫_I ψ(x; θ, h) ψ(x; θ′, h′)ᵀ σ²(x; h₀) μ(dx)
  = ∫_I [Ṡ(x; θ) Ṡ(x; θ′)ᵀ / (σ²(x; h) σ²(x; h′))] σ²(x; h₀) μ(dx)
  = I(θ₀, h₀)  if (θ, h) = (θ′, h′) = (θ₀, h₀).

Here, to show (7), we have used the bound (5) and Lemma 4.5(ii) twice.

To establish Nishiyama's condition [ME], let us observe the following fact in order to check the metric entropy condition for the product space Θ × H. In general, if (D, d) and (E, e) are two semimetric spaces, then the covering number of the product space D × E with respect to the maximum semimetric d ∨ e, namely N(D × E, d ∨ e, ε), is bounded by N(D, d, ε) · N(E, e, ε). To see this claim, let B_i, i = 1, …, N(D, d, ε), be an ε-covering of D, and let C_j, j = 1, …, N(E, e, ε), be an ε-covering of E. Then the diameters of the sets B_i × C_j ⊂ D × E with respect to d ∨ e are smaller than ε; thus these sets form an ε-covering of D × E. The claim has been proved. Consequently, the metric entropy condition

∫₀¹ √(log N(D × E, d ∨ e, ε)) dε < ∞

is satisfied if

∫₀¹ √(log N(D, d, ε)) dε < ∞  and  ∫₀¹ √(log N(E, e, ε)) dε < ∞.
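The product covering bound N(D × E, d ∨ e, ε) ≤ N(D, d, ε) · N(E, e, ε) can be sanity-checked numerically with a greedy ε-net on finite point sets; the greedy net size is a crude upper bound on the covering number, and the construction mirrors the proof above, with products of covering sets covering the product. The finite sets and ε below are illustrative choices.

```python
import itertools

def greedy_cover(points, dist, eps):
    """Greedy eps-net: every point ends up within eps of some returned center,
    so len(centers) upper-bounds the covering number of the finite set."""
    centers = []
    for p in points:
        if all(dist(p, c) > eps for c in centers):
            centers.append(p)
    return centers

D = [i / 50 for i in range(51)]                  # finite model of a bounded set
E = [i / 50 for i in range(51)]
dist1 = lambda a, b: abs(a - b)
eps = 0.1

nD = len(greedy_cover(D, dist1, eps))
nE = len(greedy_cover(E, dist1, eps))
dist_max = lambda p, q: max(abs(p[0] - q[0]), abs(p[1] - q[1]))
nProd = len(greedy_cover(list(itertools.product(D, E)), dist_max, eps))
```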


It follows from the convergence of (θ_n, h_n) to (θ₀, h₀) that

(8)  (1/t_n^n) ∫₀^{t_n^n} ψ(X_t; θ_n, h_n) [S(X_t; θ₀) − S(X_t; θ_n)] dt
  = (1/t_n^n) ∫₀^{t_n^n} ψ(X_t; θ_n, h_n) Ṡ(X_t; θ₀)ᵀ dt (θ₀ − θ_n) + o_{P*}(‖θ_n − θ₀‖)
  = (1/t_n^n) ∫₀^{t_n^n} ψ(X_t; θ₀, h₀) Ṡ(X_t; θ₀)ᵀ dt (θ₀ − θ_n) + o_{P*}(‖θ_n − θ₀‖)
  = ∫_I ψ(x; θ₀, h₀) Ṡ(x; θ₀)ᵀ μ(dx) (θ₀ − θ_n) + o_{P*}(‖θ_n − θ₀‖)
  = −I(θ₀, h₀)(θ_n − θ₀) + o_{P*}(‖θ_n − θ₀‖).

To prove (8) in the above computation, use (6) to show that for every j, k = 1, …, d,

| (1/t_n^n) ∫₀^{t_n^n} ψ^{(j)}(X_t; θ_n, h_n) Ṡ^{(k)}(X_t; θ₀) dt − (1/t_n^n) ∫₀^{t_n^n} ψ^{(j)}(X_t; θ₀, h₀) Ṡ^{(k)}(X_t; θ₀) dt |
  ≤ (1/t_n^n) ∫₀^{t_n^n} {Λ(X_t)/c + |Λ(X_t)|²/c²} |Ṡ^{(k)}(X_t; θ₀)| dt · (‖θ_n − θ₀‖ ∨ d_H(h_n, h₀))
  = O_P(1) · o_{P*}(1)
  = o_{P*}(1).

The proof is complete. □

PROOF OF THEOREM 4.3. By Lemma 4.5, it is easy to see that Ψ̃_n(θ₀, h₀) = O_P(Δ_n^{1/2}) = o_P((t_n^n)^{−1/2}). So the main assertion follows from Theorem 2.1 with the help of Lemmas 4.1 and 4.2. On the other hand, since it follows from Lemma 4.5(ii) and Theorem 3.1(i) that sup_{θ,h} ‖Ψ_n(θ, h) − Ψ(θ, h)‖ = o_{P*}(1), where

Ψ(θ, h) = ∫_I [Ṡ(x; θ) / σ²(x; h)] [S(x; θ₀) − S(x; θ)] μ(dx),

the assertion that the consistency "‖θ̂_n − θ₀‖ = o_{P*}(1)" automatically follows from A8 is immediate from Theorem 2.3. □

PROOF OF THEOREM 4.4.<br />

n∑<br />

M n (h) = 1<br />

M(h) =<br />

tn<br />

i=1<br />

∫<br />

I<br />

Put<br />

|σ 2 (X t n<br />

i−1<br />

; h) − σ 2 (X t n<br />

i−1<br />

; h 0 )| 2 |t n i − t n i−1 |,<br />

|σ 2 (x; h) − σ 2 (x; h 0 )| 2 μ(dx).


3570 Y. NISHIYAMA<br />

Let us apply Corollary 3.2.3 <strong>of</strong> van der Vaart and Wellner (1996) to the above<br />

M n and M <strong>for</strong> the given ĥn which is the solution <strong>of</strong> A n (ĥn) ≤ inf h∈H A n (h) +<br />

o P ∗(1). By Lemma 4.5(ii) and Theorem 3.1(i), it is not difficult to see that<br />

sup h∈H |M n (h) − M(h)| =o P ∗(1), so it is sufficient to show that M n (ĥn) =<br />

o P ∗(1).<br />

Observe that
$$\begin{aligned}
\frac{1}{t^n_n}\sum_{i=1}^n \biggl|\frac{|X_{t^n_i} - X_{t^n_{i-1}}|^2}{|t^n_i - t^n_{i-1}|} - \sigma^2(X_{t^n_{i-1}};h)\biggr|^2 |t^n_i - t^n_{i-1}|
&= \frac{1}{t^n_n}\sum_{i=1}^n \biggl|\frac{|X_{t^n_i} - X_{t^n_{i-1}}|^2}{|t^n_i - t^n_{i-1}|} - \sigma^2(X_{t^n_{i-1}};h_0)\biggr|^2 |t^n_i - t^n_{i-1}| \\
&\quad + \frac{2}{t^n_n}\sum_{i=1}^n \biggl(\frac{|X_{t^n_i} - X_{t^n_{i-1}}|^2}{|t^n_i - t^n_{i-1}|} - \sigma^2(X_{t^n_{i-1}};h_0)\biggr)\bigl(\sigma^2(X_{t^n_{i-1}};h_0) - \sigma^2(X_{t^n_{i-1}};h)\bigr)\,|t^n_i - t^n_{i-1}| \\
&\quad + \frac{1}{t^n_n}\sum_{i=1}^n |\sigma^2(X_{t^n_{i-1}};h_0) - \sigma^2(X_{t^n_{i-1}};h)|^2\,|t^n_i - t^n_{i-1}|.
\end{aligned}$$

Let us prove that the supremum with respect to $h$ of the absolute value of the second term on the right-hand side converges in outer probability to zero [say, the claim (a)].

Since we have from Itô's formula that
$$|X_{t^n_i} - X_{t^n_{i-1}}|^2 = 2\int_{t^n_{i-1}}^{t^n_i} (X_s - X_{t^n_{i-1}})\,S(X_s;\theta_0)\,ds + 2\int_{t^n_{i-1}}^{t^n_i} (X_s - X_{t^n_{i-1}})\,\sigma(X_s;h_0)\,dW_s + \int_{t^n_{i-1}}^{t^n_i} \sigma^2(X_s;h_0)\,ds,$$
it is sufficient to show that $C_{1,n} = o_P(1)$, $\sup_{h\in H}|C_{2,n}(h)| = o_{P^*}(1)$ and $C_{3,n} = o_P(1)$, where

$$C_{1,n} = \frac{1}{t^n_n}\sum_{i=1}^n \biggl|\int_{t^n_{i-1}}^{t^n_i} (X_s - X_{t^n_{i-1}})\,S(X_s;\theta_0)\,ds\biggr|\,\phi(X_{t^n_{i-1}}),$$
$$C_{2,n}(h) = \frac{1}{t^n_n}\sum_{i=1}^n \int_{t^n_{i-1}}^{t^n_i} (X_s - X_{t^n_{i-1}})\,\sigma(X_s;h_0)\,dW_s\,\bigl(\sigma^2(X_{t^n_{i-1}};h_0) - \sigma^2(X_{t^n_{i-1}};h)\bigr),$$
$$C_{3,n} = \frac{1}{t^n_n}\sum_{i=1}^n \int_{t^n_{i-1}}^{t^n_i} |\sigma^2(X_s;h_0) - \sigma^2(X_{t^n_{i-1}};h_0)|\,ds\,\phi(X_{t^n_{i-1}}).$$

By using Lemma 4.5(ii), we easily have $EC_{1,n}\to 0$ and $EC_{3,n}\to 0$. On the other hand, by using Theorem 3.4.2 of Nishiyama (2000b), it holds that $C_{2,n}$ converges weakly to zero in $C_{d_H}(H)$ (recall the argument in the proof of Lemma 4.1). Therefore, we have $\sup_{h\in H}|C_{2,n}(h)| = o_{P^*}(1)$.

Hence, the claim (a) is true, and we have that $A_n(\hat h_n) \le \inf_{h\in H} A_n(h) + o_{P^*}(1)$ implies $M_n(\hat h_n) = o_{P^*}(1)$. The proof is finished. □

5. Ergodic time series.

5.1. Model and regularity conditions. Let us consider the time series model given by
$$X_i = \tilde S(X_{i-1},\dots,X_{i-q_1};\theta) + \tilde\sigma(X_{i-1},\dots,X_{i-q_2};h)\,w_i.$$
By putting $q = q_1 \vee q_2$ and changing the domains of the functions $\tilde S$ and $\tilde\sigma$, without loss of generality, we can write
$$X_i = S(\mathbf X_i;\theta) + \sigma(\mathbf X_i;h)\,w_i,$$
where $\mathbf X_i = (X_{i-1},\dots,X_{i-q})$ and $S(\cdot\,;\theta)$ and $\sigma(\cdot\,;h)$ are some measurable functions on $\mathbf R^q$. For simplicity, we assume that the initial values $(X_0,\dots,X_{1-q}) = (x_0,\dots,x_{1-q})$ are fixed.
As for the noise $\{w_i\}$, we consider the following two cases:

CASE G (Gaussian). $\{w_i\}$ are independent, identically distributed with $N(0,1)$.

CASE M (Martingale). $E[w_i \mid \mathcal F_{i-1}] = 0$ and $E[w_i^2 \mid \mathcal F_{i-1}] = 1$ almost surely, where $\mathcal F_i = \sigma\{X_j : j \le i\}$.

Clearly, Case G is a special case of Case M. Unless we explicitly restrict attention to Case G, we work in Case M.

Let us list some conditions of the same type as those in Section 4.1. We suppose that there exists a parametric family of $d$-dimensional vector-valued functions $\{\dot S(\cdot\,;\theta);\ \theta\in\Theta\}$ on $\mathbf R^q$ which satisfies the following conditions. Typically, they may be taken to be the derivatives of $S(\cdot\,;\theta)$ with respect to $\theta$, that is, $\dot S(\cdot\,;\theta) = (\frac{\partial}{\partial\theta_1} S(\cdot\,;\theta),\dots,\frac{\partial}{\partial\theta_d} S(\cdot\,;\theta))^T$. The function $\phi$ appearing in B1 and B2 may be chosen to be common without loss of generality.

B1. $\Theta$ is a compact subset of $\mathbf R^d$. There exists a measurable function $\phi$ on $\mathbf R^q$ such that at the true $\theta_0\in\Theta$,
$$S(x;\theta) - S(x;\theta_0) = \dot S(x;\theta_0)^T(\theta - \theta_0) + \phi(x)\,\epsilon(x;\theta,\theta_0),$$
where $\sup_{x\in\mathbf R^q} |\epsilon(x;\theta,\theta_0)| = o(\|\theta - \theta_0\|)$ as $\theta\to\theta_0$.



B2. There exists a measurable function $\phi$ on $\mathbf R^q$ such that
$$\sup_{\theta\in\Theta} |S(x;\theta)| \le \phi(x); \qquad \sup_{\theta\in\Theta} \|\dot S(x;\theta)\| \le \phi(x); \qquad \sigma^2(x;h_0) \le \phi(x);$$
$$c := \inf_{h\in H}\inf_{x\in\mathbf R^q} \sigma^2(x;h) > 0;$$
$$\|\dot S(x;\theta) - \dot S(x;\theta')\| \le \phi(x)\,\|\theta - \theta'\| \qquad \forall \theta,\theta'\in\Theta;$$
$$|\sigma^2(x;h) - \sigma^2(x;h')| \le \phi(x)\,d_H(h,h') \qquad \forall h,h'\in H.$$

B3. The process $\{X_i\}_{i=1,2,\dots}$ is ergodic under the true $(\theta_0,h_0)$ in the sense that for $q' = q$ and $q+1$ there exists the invariant measure $\mu_{q'}$ such that for every $\mu_{q'}$-integrable function $f$
$$\frac{1}{n}\sum_{i=1}^n f(X_{i-1},\dots,X_{i-q'}) \to^p \int_{\mathbf R^{q'}} f(x_1,\dots,x_{q'})\,\mu_{q'}(dx_1\cdots dx_{q'}).$$
We also assume that $\int_{\mathbf R^q} \phi(x)^5\,\mu_q(dx) < \infty$.

B4. The matrix $I(\theta_0,h_0) = \int_{\mathbf R^q} \dot S(x;\theta_0)\dot S(x;\theta_0)^T\,\sigma^{-2}(x;h_0)\,\mu_q(dx)$ is positive definite.

B5. The entropy condition $\int_0^1 \sqrt{\log N(H,d_H,\varepsilon)}\,d\varepsilon < \infty$ is satisfied.

B6. The identifiability condition $\inf_{\theta:\|\theta-\theta_0\|>\varepsilon} \int_{\mathbf R^q} |S(x;\theta) - S(x;\theta_0)|^2\,\mu_q(dx) > 0$ for every $\varepsilon > 0$ is satisfied.
See the end of Section 4.1 for the discussion of the choice of $(H, d_H)$.

5.2. Results. In order to explain the idea of our estimating function, let us first consider the Case G. We denote by $P_{n,u}$ the distribution of $\{X_1,\dots,X_n\}$ under $\theta = \theta_0 + n^{-1/2}u$ and $h = h_0$, where $u\in\mathbf R^d$. By an easy computation, the log-likelihood ratio is given by
$$\log\frac{dP_{n,u}}{dP_{n,0}}(X_1,\dots,X_n) = -\sum_{i=1}^n \frac{1}{2\sigma^2(\mathbf X_i;h_0)}\bigl\{|X_i - S(\mathbf X_i;\theta_0+n^{-1/2}u)|^2 - |X_i - S(\mathbf X_i;\theta_0)|^2\bigr\} = \Delta_{n,u} - B_{n,u}, \tag{9}$$
where
$$\Delta_{n,u} = \sum_{i=1}^n \frac{1}{\sigma^2(\mathbf X_i;h_0)}\{X_i - S(\mathbf X_i;\theta_0)\}\{S(\mathbf X_i;\theta_0+n^{-1/2}u) - S(\mathbf X_i;\theta_0)\}$$
and
$$B_{n,u} = \sum_{i=1}^n \frac{1}{2\sigma^2(\mathbf X_i;h_0)}\{S(\mathbf X_i;\theta_0+n^{-1/2}u) - S(\mathbf X_i;\theta_0)\}^2.$$
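The decomposition (9) is purely algebraic: writing $a = X_i - S(\mathbf X_i;\theta_0)$ and $d = S(\mathbf X_i;\theta_0+n^{-1/2}u) - S(\mathbf X_i;\theta_0)$, each summand is $-((a-d)^2 - a^2)/(2\sigma^2) = ad/\sigma^2 - d^2/(2\sigma^2)$. The sketch below (an illustration with a hypothetical scalar drift, arbitrary data and a known variance function, none of which come from the paper) checks this term-by-term identity numerically.

```python
import math
import random

# Numeric check of identity (9): the left-hand side equals
# Delta_{n,u} - B_{n,u} exactly, for any data and any positive sigma^2.
rng = random.Random(3)
S = lambda x, th: th * math.tanh(x)            # hypothetical drift, q = 1
sigma2 = lambda x: 0.5 + abs(math.sin(x))      # any positive function works here
n, theta0, u = 200, 1.0, 0.8
theta_u = theta0 + u / math.sqrt(n)

xs = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]   # arbitrary data: identity is algebraic

lhs = delta_nu = b_nu = 0.0
for xp, x in zip(xs, xs[1:]):
    s2 = sigma2(xp)
    d = S(xp, theta_u) - S(xp, theta0)
    lhs += -((x - S(xp, theta_u)) ** 2 - (x - S(xp, theta0)) ** 2) / (2.0 * s2)
    delta_nu += (x - S(xp, theta0)) * d / s2
    b_nu += d * d / (2.0 * s2)
```

Up to floating-point rounding, `lhs` and `delta_nu - b_nu` coincide for any sample, which is why (9) holds without distributional assumptions; the probabilistic content lies entirely in the limits of $\Delta_{n,u}$ and $B_{n,u}$ stated next.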

Under the above regularity conditions, we have
$$\Delta_{n,u} \to^d N(0, u^T I(\theta_0,h_0)u) \qquad\text{and}\qquad B_{n,u} \to^p \tfrac12 u^T I(\theta_0,h_0)u.$$
So it follows from the theory of local asymptotic normality that the asymptotic efficiency bound in Case G when $h_0$ is known is $N(0, I(\theta_0,h_0)^{-1})$. That is, if we obtain an estimator $\tilde\theta_n$ such that $\sqrt n(\tilde\theta_n - \theta_0) \to^d N(0, I(\theta_0,h_0)^{-1})$, it is asymptotically efficient in the sense of the local asymptotic minimax theorem. [See, e.g., Chapter 3.11 of van der Vaart and Wellner (1996).] If the parameter $h$ is unknown, the estimation problem for $\theta$ becomes more difficult; so if we have an estimator which asymptotically behaves as stated above, we may say that it is asymptotically efficient in the presence of the nuisance parameter $h$. This argument does not apply in Case M, where the log-likelihood does not equal formula (9), but we propose to use (9) to derive an estimating equation which yields the same asymptotic distribution as in Case G.
yields the same asymptotic distribution as the Case G.<br />

Not only in the Case G but also in the case M, differentiating (9) <strong>for</strong>mally,<br />

and replacing the true h 0 by the unknown parameter h, we propose the estimating<br />

function<br />

n (θ, h) = 1 n∑ Ṡ(X i ; θ) (<br />

Xi<br />

n σ 2 − S(X i ; θ) ) .<br />

(X i ; h)<br />

Its compensator is<br />

˜ n (θ, h) = 1 n<br />

n∑<br />

i=1<br />

i=1<br />

Ṡ(X i ; θ) ( S(Xi<br />

σ 2 ; θ 0 ) − S(X i ; θ) ) .<br />

(X i ; h)


3574 Y. NISHIYAMA<br />

Thus, it holds that<br />

√ ( n n (θ, h) − ˜ n (θ, h) ) = √ 1 n∑ Ṡ(X i ; θ)<br />

n σ 2 (X i ; h) σ(X i; h 0 )w i ,<br />

which is the summation <strong>of</strong> a C ρ ( × H)-valued martingale difference array where<br />

ρ =‖·‖∨d H . By using Jain–Marcus’ central limit theorem <strong>for</strong> martingales given<br />

by Nishiyama (1996, 2000a, 2000b), we have the following lemma which plays a<br />

key role in our approach.<br />

LEMMA 5.1. Under B1–B3 and B5, the sequence of random fields $\sqrt n(\Psi_n - \tilde\Psi_n)$, with parameter $(\theta,h)$, converges weakly in $C_\rho(\Theta\times H)$ to a zero-mean Gaussian random field $Z$ with the covariance
$$E[Z(\theta,h)Z(\theta',h')^T] = \int_{\mathbf R^q} \frac{\dot S(x;\theta)\dot S(x;\theta')^T}{\sigma^2(x;h)\,\sigma^2(x;h')}\,\sigma^2(x;h_0)\,\mu_q(dx).$$
In particular, the random variable $Z(\theta_0,h_0)$ is distributed as $N(0, I(\theta_0,h_0))$.

Another lemma which is necessary to apply Theorem 2.1 is the following.

LEMMA 5.2. Under B1–B4, for any random sequence $(\theta_n,h_n)$ such that $\|\theta_n - \theta_0\|\vee d_H(h_n,h_0) = o_{P^*}(1)$, it holds that
$$\tilde\Psi_n(\theta_n,h_n) - \tilde\Psi_n(\theta_0,h_0) - (-I(\theta_0,h_0))(\theta_n - \theta_0) = o_{P^*}(\|\theta_n - \theta_0\|).$$

Noting also that $\tilde\Psi_n(\theta_0,h_0) = 0$, we can apply Theorem 2.1 to conclude the following theorem.

THEOREM 5.3. Under B1–B5, for any random sequence $(\hat\theta_n, \hat h_n)$ such that $\|\hat\theta_n - \theta_0\| = o_{P^*}(1)$, $d_H(\hat h_n, h_0) = o_{P^*}(1)$ and $\Psi_n(\hat\theta_n, \hat h_n) = o_{P^*}(n^{-1/2})$, the estimator $\hat\theta_n$ is asymptotically normal:
$$\sqrt n(\hat\theta_n - \theta_0) \to^d N(0, I(\theta_0,h_0)^{-1}).$$
In particular, in Case G, the estimator $\hat\theta_n$ is asymptotically efficient. When B6 is also satisfied, the assumption "$\|\hat\theta_n - \theta_0\| = o_{P^*}(1)$" is automatically satisfied.
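To illustrate Theorem 5.3 in the simplest possible setting, consider a hypothetical scalar model with the nuisance $\sigma^2$ treated as known (i.e. $\hat h_n = h_0$), so that only the Z-equation for $\theta$ is at work. For $S(x;\theta) = \theta\tanh(x)$, with $\dot S(x;\theta) = \tanh(x)$, the estimating function $\Psi_n(\cdot, h_0)$ is linear in $\theta$, so its exact root is a weighted least-squares ratio. None of the specific functions below come from the paper; they are a sketch.

```python
import math
import random

def zestimate_theta(path, sigma2):
    """Exact root of Psi_n(theta, h0) =
    (1/n) sum Sdot(X_{i-1}) / sigma2(X_{i-1}) * (X_i - theta * tanh(X_{i-1})) = 0
    for the hypothetical drift S(x; theta) = theta * tanh(x)."""
    num = den = 0.0
    for x_prev, x in zip(path, path[1:]):
        w = math.tanh(x_prev) / sigma2(x_prev)   # Sdot / sigma^2 weight
        num += w * x
        den += w * math.tanh(x_prev)
    return num / den

# Generate data under theta0 = 1.0 (Case G) and estimate.
rng = random.Random(1)
sigma2 = lambda x: 0.5 + x * x / (1.0 + x * x)
theta0 = 1.0
path = [0.0]
for _ in range(10000):
    x = path[-1]
    path.append(theta0 * math.tanh(x) + math.sqrt(sigma2(x)) * rng.gauss(0.0, 1.0))

theta_hat = zestimate_theta(path, sigma2)
```

With an unknown $h$, one would first plug in a consistent $\hat h_n$ (Theorem 5.4) and, for a drift that is nonlinear in $\theta$, solve $\Psi_n(\theta,\hat h_n) \approx 0$ by Newton iteration; the asymptotic distribution is the same by Theorem 5.3.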

For the same reason as in Section 4, it is necessary to develop a procedure to construct a consistent estimator $\hat h_n$ for the nuisance parameter $h\in H$. The following theorem gives an answer.



THEOREM 5.4. Assume B2, B3 and B5.

(First step: initial estimator for $\theta_0$.) Suppose the identifiability condition
$$\inf_{\theta:\|\theta-\theta_0\|>\varepsilon} \int_{\mathbf R^q} |S(x;\theta) - S(x;\theta_0)|^2\,\mu_q(dx) > 0 \qquad \forall\varepsilon>0$$
is satisfied. If a random sequence $\theta^{LS}_n$ satisfies $A_n(\theta^{LS}_n) \le \inf_{\theta\in\Theta} A_n(\theta) + o_{P^*}(1)$, where
$$A_n(\theta) = \frac{1}{n}\sum_{i=1}^n |X_i - S(\mathbf X_i;\theta)|^2,$$
then it holds that $\|\theta^{LS}_n - \theta_0\| = o_{P^*}(1)$.

(Second step: consistent estimator for $h_0$.) Suppose the identifiability condition
$$\inf_{h: d_H(h,h_0)>\varepsilon} \int_{\mathbf R^q} |\sigma^2(x;h) - \sigma^2(x;h_0)|^2\,\mu_q(dx) > 0 \qquad \forall\varepsilon>0$$
is satisfied. Merely for a technical reason, assume that there exists a constant $L_4 > 0$ such that $E[w_i^4\mid\mathcal F_{i-1}] < L_4$ almost surely. If a random sequence $\hat h_n$ satisfies $B_n(\hat h_n) \le \inf_{h\in H} B_n(h) + o_{P^*}(1)$, where
$$B_n(h) = \frac{1}{n}\sum_{i=1}^n \bigl|\,|X_i - S(\mathbf X_i;\theta^{LS}_n)|^2 - \sigma^2(\mathbf X_i;h)\,\bigr|^2,$$
then it holds that $d_H(\hat h_n, h_0) = o_{P^*}(1)$.
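A minimal sketch of the two-step procedure (illustrative only, not from the paper): a hypothetical $q = 1$ model with constant volatility $\sigma^2(x;h) = h$, with the approximate minimizations of the least-squares criterion $A_n$ and of the contrast $B_n$ used in the proof of Theorem 5.4 replaced by grid searches over finite sets standing in for $\Theta$ and $H$.

```python
import math
import random

def two_step(path, S, sigma2, theta_grid, h_grid):
    """First step: theta_LS minimizes A_n(theta) = (1/n) sum |X_i - S(X_{i-1}; theta)|^2.
    Second step: h_hat minimizes
    B_n(h) = (1/n) sum | |X_i - S(X_{i-1}; theta_LS)|^2 - sigma2(X_{i-1}; h) |^2.
    Grids are a stand-in for the approximate minimization over Theta and H."""
    n = len(path) - 1
    def A(theta):
        return sum((x - S(xp, theta)) ** 2 for xp, x in zip(path, path[1:])) / n
    theta_ls = min(theta_grid, key=A)
    def B(h):
        return sum(((x - S(xp, theta_ls)) ** 2 - sigma2(xp, h)) ** 2
                   for xp, x in zip(path, path[1:])) / n
    return theta_ls, min(h_grid, key=B)

# Hypothetical model: S(x; theta) = theta * tanh(x), sigma2(x; h) = h (constant).
rng = random.Random(2)
theta0, h0 = 1.0, 0.7
S = lambda x, th: th * math.tanh(x)
sigma2 = lambda x, h: h
path = [0.0]
for _ in range(10000):
    x = path[-1]
    path.append(S(x, theta0) + math.sqrt(h0) * rng.gauss(0.0, 1.0))

theta_grid = [i / 20 for i in range(-40, 41)]   # grid over [-2, 2]
h_grid = [0.1 + i / 20 for i in range(41)]      # grid over [0.1, 2.1]
theta_ls, h_hat = two_step(path, S, sigma2, theta_grid, h_grid)
```

The pair $(\theta^{LS}_n, \hat h_n)$ obtained this way is only the consistent pilot; the efficient estimator of Theorem 5.3 is then the root of $\Psi_n(\cdot, \hat h_n)$.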



PROOF OF LEMMA 5.2. Notice that $\tilde\Psi_n(\theta_0,h_0) = 0$. For any (possibly random) sequence $(\theta_n,h_n)$ converging in outer probability to $(\theta_0,h_0)$, it holds that
$$\begin{aligned}
\tilde\Psi_n(\theta_n,h_n) - \tilde\Psi_n(\theta_0,h_0)
&= \frac{1}{n}\sum_{i=1}^n \frac{\dot S(\mathbf X_i;\theta_n)}{\sigma^2(\mathbf X_i;h_n)}\,[S(\mathbf X_i;\theta_0) - S(\mathbf X_i;\theta_n)] \\
&= \frac{1}{n}\sum_{i=1}^n \frac{\dot S(\mathbf X_i;\theta_n)}{\sigma^2(\mathbf X_i;h_n)}\,\dot S(\mathbf X_i;\theta_0)^T(\theta_0 - \theta_n) + o_{P^*}(\|\theta_n - \theta_0\|) \qquad (11)\\
&= \frac{1}{n}\sum_{i=1}^n \frac{\dot S(\mathbf X_i;\theta_0)}{\sigma^2(\mathbf X_i;h_0)}\,\dot S(\mathbf X_i;\theta_0)^T(\theta_0 - \theta_n) + o_{P^*}(\|\theta_n - \theta_0\|) \\
&= -I(\theta_0,h_0)(\theta_n - \theta_0) + o_{P^*}(\|\theta_n - \theta_0\|).
\end{aligned}$$
To show (11) above, repeat the argument of the proof of Lemma 4.2, using (10) instead of (6). □

PROOF OF THEOREM 5.3. The assertions follow from Theorems 2.1 and 2.3, by using also Lemmas 5.1 and 5.2, and Theorem 3.1(ii), respectively. □

PROOF OF THEOREM 5.4. To prove the first step, we will apply Corollary 3.2.3 of van der Vaart and Wellner (1996). We can write $A_n(\theta) = T_{1,n} + T_{2,n}(\theta) + T_{3,n}(\theta)$, where
$$T_{1,n} = \frac{1}{n}\sum_{i=1}^n |X_i - S(\mathbf X_i;\theta_0)|^2,$$
$$T_{2,n}(\theta) = \frac{2}{n}\sum_{i=1}^n \bigl(S(\mathbf X_i;\theta_0) - S(\mathbf X_i;\theta)\bigr)\,\sigma(\mathbf X_i;h_0)\,w_i,$$
$$T_{3,n}(\theta) = \frac{1}{n}\sum_{i=1}^n |S(\mathbf X_i;\theta_0) - S(\mathbf X_i;\theta)|^2.$$
The term $T_{1,n}$ converges in probability to a constant $C_1$. On the other hand, by using Proposition 4.5 of Nishiyama (2000a), we have that $\sqrt n\,T_{2,n}$ converges weakly in $C(\Theta)$ to a tight law; thus, $\sup_{\theta\in\Theta}|T_{2,n}(\theta)|$ converges in outer probability to zero. Finally, by Theorem 3.1, it holds that $\sup_{\theta\in\Theta}|T_{3,n}(\theta) - T_3(\theta)| = o_{P^*}(1)$, where
$$T_3(\theta) = \int_{\mathbf R^q} |S(x;\theta_0) - S(x;\theta)|^2\,\mu_q(dx).$$
Hence, we have $\sup_{\theta\in\Theta}|A_n(\theta) - (C_1 + T_3(\theta))| = o_{P^*}(1)$, and van der Vaart and Wellner's (1996) consistency theorem yields the assertion of the first step.



To prove the second step, we shall apply Corollary 3.2.3 of van der Vaart and Wellner (1996) again. Let us first see that $\sup_{h\in H}|B_n(h) - \tilde B_n(h)| = o_{P^*}(1)$, where
$$\tilde B_n(h) = \frac{1}{n}\sum_{i=1}^n \bigl|\,|X_i - S(\mathbf X_i;\theta_0)|^2 - \sigma^2(\mathbf X_i;h)\,\bigr|^2.$$
Now, notice that
$$\begin{aligned}
&\Bigl|\,\bigl|\,|X_i - S(\mathbf X_i;\theta^{LS}_n)|^2 - \sigma^2(\mathbf X_i;h)\,\bigr|^2 - \bigl|\,|X_i - S(\mathbf X_i;\theta_0)|^2 - \sigma^2(\mathbf X_i;h)\,\bigr|^2\,\Bigr| \\
&\qquad\le \bigl|\,|X_i - S(\mathbf X_i;\theta^{LS}_n)|^2 + |X_i - S(\mathbf X_i;\theta_0)|^2 - 2\sigma^2(\mathbf X_i;h)\,\bigr| \times \bigl|\,|X_i - S(\mathbf X_i;\theta^{LS}_n)|^2 - |X_i - S(\mathbf X_i;\theta_0)|^2\,\bigr| \\
&\qquad\le \bigl|\,|X_i - S(\mathbf X_i;\theta^{LS}_n)|^2 + |X_i - S(\mathbf X_i;\theta_0)|^2 - 2\sigma^2(\mathbf X_i;h)\,\bigr| \times |2X_i - S(\mathbf X_i;\theta^{LS}_n) - S(\mathbf X_i;\theta_0)| \times |S(\mathbf X_i;\theta^{LS}_n) - S(\mathbf X_i;\theta_0)| \\
&\qquad\le L(X_i, \mathbf X_i)\,\|\theta^{LS}_n - \theta_0\|,
\end{aligned}$$
where $L(x_0,x_1,\dots,x_q) = C(|x_0| + \phi(x_1,\dots,x_q))^4$ for a constant $C$. Given any $\varepsilon>0$, choose $M>0$ such that $\int_{\mathbf R^{q+1}} L(x_0,x_1,\dots,x_q)\,1_{\{L(x_0,x_1,\dots,x_q)>M\}}\,\mu_{q+1}(dx_0,dx_1,\dots,dx_q) < \varepsilon$. Then
$$\sup_{h\in H}|B_n(h) - \tilde B_n(h)| \le \|\theta^{LS}_n - \theta_0\|\,M + \Biggl(\frac{1}{n}\sum_{i=1}^n L(X_i,\mathbf X_i)\,1_{\{L(X_i,\mathbf X_i)>M\}}\Biggr)\,\mathrm{diam}(\Theta).$$
The first term on the right-hand side converges in outer probability to zero by the first step, and the second term converges to a positive constant which is smaller than $\varepsilon\cdot\mathrm{diam}(\Theta)$. Since the choice of $\varepsilon>0$ is arbitrary, we have $\sup_{h\in H}|B_n(h) - \tilde B_n(h)| = o_{P^*}(1)$.

Now, we can write $\tilde B_n(h) = T_{1,n} + T_{2,n}(h) + T_{3,n}(h)$, where
$$T_{1,n} = \frac{1}{n}\sum_{i=1}^n \bigl|\,|X_i - S(\mathbf X_i;\theta_0)|^2 - \sigma^2(\mathbf X_i;h_0)\,\bigr|^2,$$
$$T_{2,n}(h) = \frac{2}{n}\sum_{i=1}^n \sigma^2(\mathbf X_i;h_0)\,(w_i^2 - 1)\bigl(\sigma^2(\mathbf X_i;h_0) - \sigma^2(\mathbf X_i;h)\bigr),$$
$$T_{3,n}(h) = \frac{1}{n}\sum_{i=1}^n |\sigma^2(\mathbf X_i;h_0) - \sigma^2(\mathbf X_i;h)|^2.$$
The term $T_{1,n}$ converges in probability to a constant $C_1$ by assumption. By Jain–Marcus' CLT for martingale difference arrays, it is easy to show that $\sup_{h\in H}|T_{2,n}(h)| = o_{P^*}(1)$ (here, we use the technical assumption that $E[w_i^4\mid\mathcal F_{i-1}]$ is bounded). Finally, by Theorem 3.1, it holds that $\sup_{h\in H}|T_{3,n}(h) - T_3(h)| = o_{P^*}(1)$, where
$$T_3(h) = \int_{\mathbf R^q} |\sigma^2(x;h_0) - \sigma^2(x;h)|^2\,\mu_q(dx).$$

Consequently, we have $\sup_{h\in H}|B_n(h) - (C_1 + T_3(h))| = o_{P^*}(1)$. Therefore, the claim of the second step follows from van der Vaart and Wellner's (1996) consistency theorem. □

Acknowledgments. I thank the Associate Editor and the referees for their careful reading and the advice to present a sharper version of the moment conditions.

REFERENCES

FLORENS-ZMIROU, D. (1989). Approximate discrete time schemes for statistics of diffusion processes. Statistics 20 547–557. MR1047222

HOFFMANN, M. (2001). On estimating the diffusion coefficient: Parametric versus nonparametric. Ann. Inst. H. Poincaré Probab. Statist. 37 339–372. MR1831987

JACOD, J. and SHIRYAEV, A. N. (1987). Limit Theorems for Stochastic Processes. Springer, Berlin. MR0959133

KESSLER, M. (1997). Estimation of an ergodic diffusion from discrete observations. Scand. J. Statist. 24 211–229. MR1455868

KOSOROK, M. R. (2008). Introduction to Empirical Processes and Semiparametric Inference. Springer, New York.

KUTOYANTS, Y. A. (2004). Statistical Inference for Ergodic Diffusion Processes. Springer, London. MR2144185

NISHIYAMA, Y. (1996). A central limit theorem for l∞-valued martingale difference arrays and its application. Preprint 971, Dept. Mathematics, Utrecht Univ. Available at http://www.ism.ac.jp/~nisiyama/.

NISHIYAMA, Y. (1997). Some central limit theorems for l∞-valued semimartingales and their applications. Probab. Theory Related Fields 108 459–494. MR1465638

NISHIYAMA, Y. (1999). A maximal inequality for continuous martingales and M-estimation in a Gaussian white noise model. Ann. Statist. 27 675–696. MR1714712

NISHIYAMA, Y. (2000a). Weak convergence of some classes of martingales with jumps. Ann. Probab. 28 685–712. MR1782271

NISHIYAMA, Y. (2000b). Entropy Methods for Martingales. CWI Tract 128. Centrum Wisk. Inform., Amsterdam. MR1742612

NISHIYAMA, Y. (2007). On the paper "Weak convergence of some classes of martingales with jumps." Ann. Probab. 35 1194–1200. MR2319720

NISHIYAMA, Y. (2009). A uniform law of large numbers for ergodic processes under a Lipschitz condition. Unpublished manuscript. Available at http://www.ism.ac.jp/~nisiyama/.

TANIGUCHI, M. and KAKIZAWA, Y. (2000). Asymptotic Theory of Statistical Inference for Time Series. Springer, New York. MR1785484

VAN DER VAART, A. W. (1998). Asymptotic Statistics. Cambridge Univ. Press, Cambridge. MR1652247

VAN DER VAART, A. W. and WELLNER, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York. MR1385671

VAN DER VAART, A. W. and VAN ZANTEN, H. (2005). Donsker theorems for diffusions: Necessary and sufficient conditions. Ann. Probab. 33 1422–1451. MR2150194

YOSHIDA, N. (1992). Estimation for diffusion processes from discrete observations. J. Multivariate Anal. 41 220–242. MR1172898

INSTITUTE OF STATISTICAL MATHEMATICS
4-6-7 MINAMI-AZABU, MINATO-KU
TOKYO 106-8569
JAPAN
E-MAIL: nisiyama@ism.ac.jp
URL: http://www.ism.ac.jp/~nisiyama/
