The Annals of Statistics
2009, Vol. 37, No. 6A, 3555–3579
DOI: 10.1214/09-AOS693
© Institute of Mathematical Statistics, 2009

ASYMPTOTIC THEORY OF SEMIPARAMETRIC Z-ESTIMATORS FOR STOCHASTIC PROCESSES WITH APPLICATIONS TO ERGODIC DIFFUSIONS AND TIME SERIES

BY YOICHI NISHIYAMA

Institute of Statistical Mathematics
This paper generalizes a part of the theory of Z-estimation, which has been developed mainly in the context of modern empirical processes, to the case of stochastic processes, typically semimartingales. We present a general theorem to derive the asymptotic behavior of the solution to an estimating equation Ψ_n(θ, ĥ_n) = 0 with an abstract nuisance parameter h when the compensator of Ψ_n is random. As an application, we consider the estimation problem in an ergodic diffusion process model where the drift coefficient contains an unknown, finite-dimensional parameter θ and the diffusion coefficient is indexed by a nuisance parameter h from an infinite-dimensional space. An example of the nuisance parameter space is a class of smooth functions. We establish the asymptotic normality and efficiency of a Z-estimator for the drift coefficient. As another application, we present a similar result in an ergodic time series model.
1. Introduction. Let us begin by stating our motivating example; the details are presented in Section 4. Consider the one-dimensional ergodic diffusion process X on I = (l, r) ⊆ R which is a solution to the stochastic differential equation (SDE)

(1)  X_t = X_0 + ∫_0^t S(X_s; θ) ds + ∫_0^t σ(X_s; h) dW_s,

where s ↦ W_s is a standard Brownian motion. Here, we consider a d-dimensional parametric family {S(·; θ); θ ∈ Θ} for the drift coefficient, indexed by a compact subset Θ of R^d, and a possibly infinite-dimensional "parametric" family {σ²(·; h); h ∈ H} for the diffusion coefficient, indexed by a (general) totally bounded metric space (H, d_H). We denote by (θ_0, h_0) the true value of (θ, h). Our aim is to estimate θ_0 when the model is perturbed by the unknown nuisance parameter h. As for the parameter h_0, we construct a d_H-consistent estimator ĥ_n. We prove that the Z-estimator θ̂_n, which is a solution to an estimating equation Ψ_n(θ, ĥ_n) = 0, is asymptotically normal and efficient. [We follow van der Vaart and Wellner (1996) for the terminology "Z-estimator."]
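As a quick numerical illustration (not part of the paper), a path of the SDE (1) can be simulated with the Euler–Maruyama scheme. The concrete drift S(x; θ) = −θx with θ = 1 and the diffusion σ(x; h) = 1 + 0.1 sin²(x) below are hypothetical choices made only for this sketch.

```python
import numpy as np

def euler_maruyama(S, sigma, x0, T, n, rng):
    """Simulate dX_t = S(X_t) dt + sigma(X_t) dW_t on [0, T] in n steps."""
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = x[i] + S(x[i]) * dt + sigma(x[i]) * rng.normal(0.0, np.sqrt(dt))
    return x

rng = np.random.default_rng(0)
S = lambda x: -1.0 * x                        # drift S(x; theta) with theta = 1
sigma = lambda x: 1.0 + 0.1 * np.sin(x) ** 2  # diffusion sigma(x; h) = h(x)
path = euler_maruyama(S, sigma, x0=0.0, T=100.0, n=100_000, rng=rng)
print(path.shape)  # (100001,)
```

The simulated path is mean reverting and serves as synthetic data for the discretely observed setting of Section 4.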
Received January 2009.
AMS 2000 subject classifications. Primary 62F12; secondary 62M05, 62M10.
Key words and phrases. Estimating function, nuisance parameter, metric entropy, asymptotic efficiency, ergodic diffusion, discrete observation.
There exist many works which treat the estimation problem for the drift coefficient. It is well known that when the process X = (X_t)_{t∈[0,∞)} is observed continuously on the time interval [0, T], the diffusion coefficient may be assumed to be known without loss of generality. (So we may put h = h_0.) In such cases, the asymptotic normality and efficiency of the maximum likelihood estimator (MLE) θ̂_T for θ_0, as T → ∞, has already been established. See, for example, Kutoyants (2004). The MLE θ̂_T is a solution to the estimating equation l̇_T(θ) = 0 with

  l̇_T(θ) = (1/T) ∫_0^T [Ṡ(X_t; θ)/σ²(X_t; h_0)] [dX_t − S(X_t; θ) dt],

where Ṡ denotes the derivative of S with respect to θ. On the other hand, when the process X = (X_t)_{t∈[0,∞)} is observed only at discrete time points {0 = t_0^n < t_1^n < ⋯ < t_n^n},
and to show the differentiability of (θ, h) ↦ Ψ̃_n(θ, h) around (θ_0, h_0). Roughly speaking, our main result asserts that if we assume that h ↦ σ²(·; h) is Lipschitz continuous with respect to d_H and that the metric entropy condition

  ∫_0^1 √(log N(H, d_H, ε)) dε < ∞

is satisfied,
We denote by ℓ^∞(T) the space of bounded functions on a set T and by C_ρ(T) the space of functions on T which are continuous with respect to a metric ρ. We equip both spaces with the uniform metric. Given a probability measure P, we denote by P* the corresponding outer probability; see van der Vaart and Wellner (1996) for the stochastic convergence theory which does not assume measurability. We denote by →^p and →^d the convergence in (outer) probability and the weak convergence, respectively. The limit notation means in principle that we take the limit as n → ∞. The Euclidean metric on R^d is denoted by ‖·‖.
2. General theory for semiparametric Z-estimation. Let two sets Θ and H be given. Let

  Ψ_n : Θ × H → R^d  and  Ψ̃_n : Θ × H → R^d

be random maps. The latter should be a random "compensator" of the former; in the i.i.d. case it is not random and does not depend on n. Compare the above setting with that in Chapter 3.3 of van der Vaart and Wellner (1996), where Ψ̃_n ≡ Ψ. We present a way to derive the asymptotic behavior of the estimator θ̂_n for the parameter θ ∈ Θ of interest, with the help of the estimator ĥ_n for the nuisance parameter h ∈ H, which are solutions to the estimating equation

  Ψ_n(θ̂_n, ĥ_n) ≈ 0.

Here, the true values θ_0 ∈ Θ and h_0 ∈ H are supposed to satisfy

  Ψ̃_n(θ_0, h_0) ≈ 0.

The following theorem extends a special case of Theorem 3.3.1 of van der Vaart and Wellner (1996). See also Theorem 5.21 of van der Vaart (1998).
THEOREM 2.1. Let Θ be a subset of R^d with the Euclidean metric ‖·‖. Let (H, d_H) be a semimetric space. Let Ψ_n : Θ × H → R^d and Ψ̃_n : Θ × H → R^d be random maps defined on a probability space (Ω_n, F_n, P_n). (We do not assume any measurability.) Suppose that there exist a sequence of constants r_n ↑ ∞, some fixed point (θ_0, h_0) and an invertible matrix V_{θ_0,h_0} which satisfy the following (i) and (ii).

(i) There exists a neighborhood U ⊂ Θ × H of (θ_0, h_0) such that

  r_n(Ψ_n − Ψ̃_n) →^d Z  in ℓ^∞(U),

where almost all paths (θ, h) ↦ Z(θ, h) are continuous with respect to ρ = ‖·‖ ∨ d_H.

(ii) For the given random sequence (θ̂_n, ĥ_n), it holds that

  Ψ̃_n(θ̂_n, ĥ_n) − Ψ̃_n(θ_0, h_0) − V_{θ_0,h_0}(θ̂_n − θ_0) = o_{P_n^*}(r_n^{−1} + ‖θ̂_n − θ_0‖)

and that

  ‖θ̂_n − θ_0‖ ∨ d_H(ĥ_n, h_0) = o_{P_n^*}(1),
  Ψ_n(θ̂_n, ĥ_n) = o_{P_n^*}(r_n^{−1}),
  Ψ̃_n(θ_0, h_0) = o_{P_n^*}(r_n^{−1}).

Then it holds that

  r_n(θ̂_n − θ_0) →^d −V_{θ_0,h_0}^{−1} Z(θ_0, h_0).
To prove the above theorem, we need the following lemma, which is a slight generalization of Lemma 19.24 of van der Vaart (1998).

LEMMA 2.2. Let (T, ρ) be a semimetric space. Suppose that Z_n →^d Z in ℓ^∞(T) and that almost all paths of Z are continuous with respect to ρ. If a T-valued random sequence t̂_n satisfies ρ(t̂_n, t_0) = o_{P_n^*}(1) for some nonrandom t_0 ∈ T, then Z_n(t̂_n) − Z_n(t_0) = o_{P_n^*}(1).

PROOF. Let us equip the space ℓ^∞(T) × T with the metric ‖·‖_T + ρ, where ‖·‖_T denotes the uniform metric on ℓ^∞(T). Define the function g : ℓ^∞(T) × T → R by g(z, t) = z(t) − z(t_0). Then, for any z ∈ C_ρ(T) and t ∈ T, the function g is continuous at (z, t). Indeed, if (z_n, t_n) → (z, t), then ‖z_n − z‖_T → 0, and thus z_n(t_n) = z(t_n) + o(1) → z(t), while z_n(t_0) → z(t_0) is trivial.

By assumption, we have (Z_n, t̂_n) →^d (Z, t_0) in ℓ^∞(T) × T [see, e.g., Theorem 18.10(v) of van der Vaart (1998)]. Since almost all paths of Z belong to C_ρ(T), by the continuous mapping theorem,

  Z_n(t̂_n) − Z_n(t_0) = g(Z_n, t̂_n) →^d g(Z, t_0) = Z(t_0) − Z(t_0) = 0.

The proof is finished. □
PROOF OF THEOREM 2.1. Applying Lemma 2.2 to the ℓ^∞(U)-valued random element Z_n = r_n(Ψ_n − Ψ̃_n), we have

  r_n(Ψ_n(θ̂_n, ĥ_n) − Ψ̃_n(θ̂_n, ĥ_n)) − r_n(Ψ_n(θ_0, h_0) − Ψ̃_n(θ_0, h_0)) = o_{P_n^*}(1).

Since r_n Ψ_n(θ̂_n, ĥ_n) = o_{P_n^*}(1), we have

  r_n(Ψ̃_n(θ̂_n, ĥ_n) − Ψ̃_n(θ_0, h_0)) = −r_n Ψ_n(θ_0, h_0) + o_{P_n^*}(1).

By assumption (ii), it holds that

(2)  r_n V_{θ_0,h_0}(θ̂_n − θ_0) = −r_n Ψ_n(θ_0, h_0) + o_{P_n^*}(1 + r_n‖θ̂_n − θ_0‖).

Now, since

  r_n‖θ̂_n − θ_0‖ ≤ ‖V_{θ_0,h_0}^{−1}‖ r_n‖V_{θ_0,h_0}(θ̂_n − θ_0)‖ ≤ O_{P_n}(1) + o_{P_n^*}(1 + r_n‖θ̂_n − θ_0‖)  by (2),

it holds that r_n‖θ̂_n − θ_0‖ = O_{P_n^*}(1). Inserting this into (2), we have

  r_n V_{θ_0,h_0}(θ̂_n − θ_0) = −r_n Ψ_n(θ_0, h_0) + o_{P_n^*}(1)
   = −r_n(Ψ_n(θ_0, h_0) − Ψ̃_n(θ_0, h_0)) + o_{P_n^*}(1)
   →^d −Z(θ_0, h_0),

which implies the conclusion. □
In Theorem 2.1, both "‖θ̂_n − θ_0‖ = o_{P_n^*}(1)" and "d_H(ĥ_n, h_0) = o_{P_n^*}(1)" are assumed. Under some conditions, the former automatically follows from the latter, as is seen in the following theorem.
THEOREM 2.3. Let (Θ, d_Θ) and (H, d_H) be two semimetric spaces. Let Ψ_n : Θ × H → R^d be a random map defined on a probability space (Ω_n, F_n, P_n). (We do not assume any measurability.) Let Ψ : Θ × H → R^d be a nonrandom function. Suppose that

  sup_{(θ,h)∈Θ×H} ‖Ψ_n(θ, h) − Ψ(θ, h)‖ = o_{P_n^*}(1).

Suppose also that for some (θ_0, h_0) ∈ Θ × H,

  inf_{θ: d_Θ(θ,θ_0)>ε} ‖Ψ(θ, h_0)‖ > 0  for every ε > 0,

and that h ↦ Ψ(θ, h) is continuous at h_0 uniformly in θ. Then, for any random sequence (θ̂_n, ĥ_n) such that Ψ_n(θ̂_n, ĥ_n) = o_{P_n^*}(1) and d_H(ĥ_n, h_0) = o_{P_n^*}(1), it holds that d_Θ(θ̂_n, θ_0) = o_{P_n^*}(1).

PROOF. Observe that

  ‖Ψ(θ̂_n, ĥ_n)‖ ≤ ‖Ψ(θ̂_n, ĥ_n) − Ψ_n(θ̂_n, ĥ_n)‖ + ‖Ψ_n(θ̂_n, ĥ_n)‖
   ≤ sup_{θ,h} ‖Ψ(θ, h) − Ψ_n(θ, h)‖ + ‖Ψ_n(θ̂_n, ĥ_n)‖
   = o_{P_n^*}(1).

Now, for every ε > 0, there exist δ, η > 0 such that ‖Ψ(θ, h)‖ > η for every θ with d_Θ(θ, θ_0) > ε and every h with d_H(h, h_0) < δ. Hence the event {d_Θ(θ̂_n, θ_0) > ε} is contained in the event {‖Ψ(θ̂_n, ĥ_n)‖ > η} ∪ {d_H(ĥ_n, h_0) ≥ δ}. The outer probability of the latter event converges to 0. □
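The mechanism of Theorem 2.3 (uniform convergence of Ψ_n plus identifiability of Ψ forces consistency of a near-zero of Ψ_n) can be seen in a minimal toy sketch. The Gaussian mean model and the grid search below are illustrative assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
theta0 = 0.7
x = rng.normal(theta0, 1.0, size=20_000)

# Psi_n(theta) = sample mean - theta converges uniformly to
# Psi(theta) = theta0 - theta, which satisfies the identifiability
# condition inf_{|theta - theta0| > eps} |Psi(theta)| = eps > 0.
grid = np.linspace(-2.0, 2.0, 4001)         # compact parameter set Theta
psi_n = x.mean() - grid
theta_hat = grid[np.argmin(np.abs(psi_n))]  # approximate zero of Psi_n
print(abs(theta_hat - theta0))              # small, as the theorem predicts
```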
3. Uniform law of large numbers. In this section, we give a uniform law of large numbers for ergodic processes under a smoothness assumption. The proof is standard, so it is omitted. [See, e.g., Theorem 2.4.1 of van der Vaart and Wellner (1996) for the idea, or see Nishiyama (2009).]
THEOREM 3.1. Let (E, ℰ) be a measurable space. Let Θ be a set which is totally bounded with respect to a semimetric ρ. Let a family {f(·; θ); θ ∈ Θ} of measurable functions on E be given. Suppose that there exists a measurable function K such that

(3)  |f(x; θ) − f(x; θ′)| ≤ K(x) ρ(θ, θ′)  for all θ, θ′ ∈ Θ.

(i) Suppose that the E-valued random process {X_t}_{t∈[0,∞)} is ergodic with the invariant law μ, that is, for any μ-integrable function g,

  (1/T) ∫_0^T g(X_t) dt →^p ∫_E g(x) μ(dx).

If all f(·; θ) and K are μ-integrable, then

  sup_{θ∈Θ} | (1/T) ∫_0^T f(X_t; θ) dt − ∫_E f(x; θ) μ(dx) | = o_{P^*}(1).

(ii) Suppose that the E-valued random process {X_i}_{i=1,2,...} is ergodic with the invariant law μ, that is, for any μ-integrable function g,

  (1/n) Σ_{i=1}^n g(X_i) →^p ∫_E g(x) μ(dx).

If all f(·; θ) and K are μ-integrable, then

  sup_{θ∈Θ} | (1/n) Σ_{i=1}^n f(X_i; θ) − ∫_E f(x; θ) μ(dx) | = o_{P^*}(1).

REMARK. The smoothness assumption (3) can be replaced by "bracketing." See Theorem 2.4.1 of van der Vaart and Wellner (1996).
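A Monte Carlo sketch of part (ii), with illustrative model choices not taken from the paper: for an ergodic AR(1) chain with invariant law N(0, 4/3) and the family f(x; θ) = cos(θx), which is Lipschitz in θ with K(x) = |x|, the empirical means converge to the invariant-law integrals uniformly over a grid of θ values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
w = rng.normal(size=n)
x = np.empty(n)
x[0] = 0.0
for i in range(1, n):                 # ergodic AR(1): X_i = 0.5 X_{i-1} + w_i
    x[i] = 0.5 * x[i - 1] + w[i]      # invariant law N(0, 1/(1 - 0.25)) = N(0, 4/3)

thetas = np.linspace(0.0, 1.0, 101)   # totally bounded parameter set
# f(x; theta) = cos(theta * x) satisfies |f(x;s) - f(x;t)| <= |x| |s - t|.
emp = np.array([np.cos(t * x).mean() for t in thetas])
lim = np.exp(-thetas ** 2 * (4.0 / 3.0) / 2.0)  # E cos(tZ) for Z ~ N(0, 4/3)
print(np.max(np.abs(emp - lim)))      # uniformly small over theta
```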
4. Ergodic diffusion processes.

4.1. Regularity conditions. Let us consider the diffusion process model introduced in the first paragraph of Section 1. We shall list some conditions. We suppose that there exists a parametric family of d-dimensional vector-valued functions {Ṡ(·; θ); θ ∈ Θ} on I which satisfies the following conditions. Typically, they may be considered to be the derivatives of S(·; θ) with respect to θ, that is, Ṡ(·; θ) = (∂S(·; θ)/∂θ_1, ..., ∂S(·; θ)/∂θ_d)^T. The function Λ appearing in A1 and A3 may be chosen to be common without loss of generality.
A1. Θ is a compact subset of R^d. There exists a measurable function Λ on I such that, at the true θ_0 ∈ Θ,

  S(x; θ) − S(x; θ_0) = Ṡ(x; θ_0)^T (θ − θ_0) + Λ(x) ɛ(x; θ, θ_0),

where sup_{x∈I} |ɛ(x; θ, θ_0)| = o(‖θ − θ_0‖) as θ → θ_0.

A2. There exists a constant K > 0 such that

  sup_{θ∈Θ} |S(x; θ) − S(x′; θ)| ≤ K|x − x′|;
  sup_{θ∈Θ} ‖Ṡ(x; θ) − Ṡ(x′; θ)‖ ≤ K|x − x′|;
  sup_{h∈H} |σ²(x; h) − σ²(x′; h)| ≤ K|x − x′|.

A3. There exists a measurable function Λ on I such that

  sup_{θ∈Θ} |S(x; θ)| ≤ Λ(x);
  sup_{θ∈Θ} ‖Ṡ(x; θ)‖ ≤ Λ(x);
  c := inf_{h∈H} inf_{x∈I} σ²(x; h) > 0;
  ‖Ṡ(x; θ) − Ṡ(x; θ′)‖ ≤ Λ(x)‖θ − θ′‖ for all θ, θ′ ∈ Θ;
  |σ²(x; h) − σ²(x; h′)| ≤ Λ(x) d_H(h, h′) for all h, h′ ∈ H.

A4. sup_{t∈R} E(Λ(X_t)^8 + |X_t|^4) < ∞.

A7. ∫_0^1 √(log N(H, d_H, ε)) dε < ∞.

A8. For every ε > 0,

  inf_{θ: ‖θ−θ_0‖>ε} ‖ ∫_I (Ṡ(x; θ)/σ²(x; h_0)) [S(x; θ_0) − S(x; θ)] μ(dx) ‖ > 0.

REMARK. The last assumption in A2 implies that σ²(x; h_0) ≤ C(1 + |x|) for a constant C > 0.
To close this subsection, let us discuss possible choices of the nuisance parameter space (H, d_H).

EXAMPLE 1 (Parametric model). When (H, d_H) is a compact subset of a finite-dimensional Euclidean space, the metric entropy condition A7 is indeed satisfied. So the main restriction is the Lipschitz continuity of h ↦ σ²(·; h) in A3. This situation is more general than that in Yoshida (1992) and Kessler (1997), although, as announced in Section 1, our result does not include theirs.
EXAMPLE 2 (The class of smooth functions). Let us consider the parametrization σ(x; h) = h(x), where h is an element of the class H = C_M^α(I) defined below. We equip the function space H with the uniform metric ‖·‖_∞, for which the last requirement in A3 is always fulfilled. To check A7, we first consider the case where I is a bounded subset of R, and next we give some remarks for the general case.

We take the material below from Section 2.7.1 of van der Vaart and Wellner (1996). Let I be a bounded, convex subset of R^q. (In the current example of one-dimensional diffusions, we are considering the case q = 1, but for generality we let q be an arbitrary positive integer; see Section 5.) Let α > 0 and M > 0 be given, and let m be the greatest integer smaller than α. For any vector k = (k_1, ..., k_q) of q nonnegative integers, we define the differential operator

  D^k = ∂^{k·} / (∂x_1^{k_1} ⋯ ∂x_q^{k_q}),

where k· = Σ_{i=1}^q k_i. We denote by C_M^α(I) the class of functions h defined on I such that

  max_{k·≤m} sup_x |D^k h(x)| + max_{k·=m} sup_{x,y} [|D^k h(x) − D^k h(y)| / ‖x − y‖^{α−m}] ≤ M,

where the suprema are taken over all x, y in the interior of I with x ≠ y. Then there exists a constant K > 0, depending only on α and q, such that

  log N(C_M^α(I), ‖·‖_∞, ε) ≤ K λ(I^1) (M/ε)^{q/α},

where λ(I^1) is the Lebesgue measure of the set I^1 = {x : ‖x − I‖ < 1}. Hence, the metric entropy condition A7 is satisfied if q/(2α) < 1, and therefore our theory works.

When I = R^q, we shall restrict our attention, for example, to the following class H of functions on R^q. Let I_0 be a bounded, convex subset of R^q, and suppose that the restriction of h ∈ H to I_0 belongs to C_M^α(I_0) and that

(4)  sup_{x∈R^q} |h(x) − h′(x)| ≤ L sup_{x∈I_0} |h(x) − h′(x)|  for all h, h′ ∈ H,
for a constant L > 0. Then both the last condition of A3 and A7 are satisfied. The condition (4) is satisfied if we assume, for example, either of the following:

(i) h is known on I_0^c;
(ii) when q = 1 and I_0 = [l_0, r_0], each h is constant on (−∞, l_0] and on [r_0, ∞).

Although the examples (i) and (ii) might look restrictive, it should be noted that in practice we can choose an arbitrarily large I_0.

Another way to deal with the unbounded case I = R^q is to consider the parametrization

  σ(x; h) = h(u(x)),  h ∈ C_M^α(I_0),

where I_0 is a bounded, convex subset of R^{q′} and u : R^q → I_0 is a fixed function. If q′/(2α) < 1, then both the last requirement in A3 and the metric entropy condition A7 are satisfied for the uniform metric d_H = ‖·‖_∞.

Instead of C_M^α(I), another possible choice of H which satisfies the metric entropy condition for the uniform metric is a Sobolev class; see Example 19.10 of van der Vaart (1998).
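The role of the exponent q/(2α) < 1 is easy to check numerically: up to constants, the integrand in A7 under the bound above is ε^{−q/(2α)}, whose integral over (0, 1] is finite exactly when q/(2α) < 1. A small sketch, with the constants K, λ(I^1) and M set to 1 purely for illustration:

```python
import numpy as np

def entropy_integral(q, alpha, delta):
    """Trapezoid approximation of the A7-type integrand eps^(-q/(2*alpha))
    over [delta, 1]; constants K, lambda(I^1), M are set to 1."""
    beta = q / (2.0 * alpha)
    eps = np.linspace(delta, 1.0, 100_000)
    vals = eps ** (-beta)
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(eps)))

deltas = [1e-2, 1e-4, 1e-6]
conv = [entropy_integral(q=1, alpha=1.0, delta=d) for d in deltas]  # q/(2a) = 0.5 < 1
div = [entropy_integral(q=1, alpha=0.4, delta=d) for d in deltas]   # q/(2a) = 1.25 > 1
print(conv)  # stabilizes near 2 as delta -> 0
print(div)   # grows without bound as delta -> 0
```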
4.2. Results. As announced in Section 1, we propose to use the estimating function

  Ψ_n(θ, h) = (1/t_n^n) Σ_{i=1}^n [Ṡ(X_{t_{i−1}^n}; θ) / σ²(X_{t_{i−1}^n}; h)] [X_{t_i^n} − X_{t_{i−1}^n} − S(X_{t_{i−1}^n}; θ)|t_i^n − t_{i−1}^n|],

whose compensator is

  Ψ̃_n(θ, h) = (1/t_n^n) Σ_{i=1}^n [Ṡ(X_{t_{i−1}^n}; θ) / σ²(X_{t_{i−1}^n}; h)] [∫_{t_{i−1}^n}^{t_i^n} S(X_t; θ_0) dt − S(X_{t_{i−1}^n}; θ)|t_i^n − t_{i−1}^n|].

Then we have the following two lemmas.
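For a concrete sketch (not from the paper), take the scalar model S(x; θ) = −θx with known σ² ≡ 1, so that Ψ_n(θ, h_0) = 0 can be solved in closed form. The equidistant grid, the Euler simulation and all parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, dt, n = 1.0, 0.01, 200_000        # t_i^n = i * dt, so t_n^n = 2000

# Euler scheme for dX_t = -theta0 * X_t dt + dW_t (sigma^2(x; h0) = 1).
x = np.empty(n + 1)
x[0] = 0.0
dW = rng.normal(0.0, np.sqrt(dt), size=n)
for i in range(n):
    x[i + 1] = x[i] - theta0 * x[i] * dt + dW[i]

# With Sdot(x; theta) = -x and sigma^2 = 1, Psi_n(theta, h0) = 0 gives
# theta_hat = -sum x_{i-1} (x_i - x_{i-1}) / (dt * sum x_{i-1}^2).
theta_hat = -np.sum(x[:-1] * np.diff(x)) / (dt * np.sum(x[:-1] ** 2))
print(theta_hat)   # close to theta0 = 1
```

The closed form arises because the estimating function is linear in θ in this model; in general Ψ_n(θ, ĥ_n) = 0 must be solved numerically.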
LEMMA 4.1. Assume Δ_n → 0 and t_n^n → ∞, where Δ_n = max_{1≤i≤n} |t_i^n − t_{i−1}^n|. Equip the space Θ × H with the metric ρ = ‖·‖ ∨ d_H. Under A1–A5 and A7, √(t_n^n)(Ψ_n − Ψ̃_n) converges weakly in C_ρ(Θ × H) to a zero-mean Gaussian process Z with the covariance

  E Z(θ, h) Z(θ′, h′)^T = ∫_I [Ṡ(x; θ) Ṡ(x; θ′)^T / (σ²(x; h) σ²(x; h′))] σ²(x; h_0) μ(dx).

In particular, the random variable Z(θ_0, h_0) is distributed as N(0, I(θ_0, h_0)), where I(θ_0, h_0) = ∫_I [Ṡ(x; θ_0) Ṡ(x; θ_0)^T / σ²(x; h_0)] μ(dx).

LEMMA 4.2. Assume Δ_n = o((t_n^n)^{−1}) and t_n^n → ∞. Under A1–A6, for any random sequence (θ_n, h_n) such that ‖θ_n − θ_0‖ ∨ d_H(h_n, h_0) = o_{P^*}(1), it holds that

  Ψ̃_n(θ_n, h_n) − Ψ̃_n(θ_0, h_0) − (−I(θ_0, h_0))(θ_n − θ_0) = o_{P^*}((t_n^n)^{−1/2} + ‖θ_n − θ_0‖).
Combining these lemmas with Theorem 2.1, and noting also that Ψ̃_n(θ_0, h_0) = O_P(Δ_n^{1/2}), which will be proved by using Lemma 4.5 below, we can conclude the following theorem.

THEOREM 4.3. Assume Δ_n = o((t_n^n)^{−1}) and t_n^n → ∞. Under A1–A7, for any random sequence (θ̂_n, ĥ_n) such that

  ‖θ̂_n − θ_0‖ = o_{P^*}(1),  d_H(ĥ_n, h_0) = o_{P^*}(1)

and

  Ψ_n(θ̂_n, ĥ_n) = o_{P^*}((t_n^n)^{−1/2}),

the estimator θ̂_n is asymptotically normal and efficient:

  √(t_n^n)(θ̂_n − θ_0) →^d N(0, I(θ_0, h_0)^{−1}).

When A8 is also satisfied, the assumption "‖θ̂_n − θ_0‖ = o_{P^*}(1)" is automatically satisfied.
In the above theorem, the only assumption which we cannot check from the data is the consistency "d_H(ĥ_n, h_0) = o_{P^*}(1)," because it involves the true value h_0 of the unknown parameter h ∈ H. When {σ²(·; h); h ∈ H} is a class of functions σ²(·; h) = h(·), where H is a class of smooth functions, one may think that a kernel estimator is a candidate for ĥ_n. As stated above, in view of the Lipschitz condition on h ↦ σ²(·; h) (the last condition in A3), it is convenient to consider the consistency with respect to the uniform metric. However, showing the consistency of a kernel estimator with respect to the uniform metric is itself a nontrivial task. Generally speaking, showing consistency for an infinite-dimensional parameter is not a trivial problem, and it should be solved by independent articles. See, for example, Hoffmann (2001). Below, we give a general way to show the consistency of a least squares estimator.
THEOREM 4.4. Assume Δ_n → 0 and t_n^n → ∞. Assume A2–A5 and A7. Suppose that

  inf_{h: d_H(h,h_0)>ε} ∫_I |σ²(x; h) − σ²(x; h_0)|² μ(dx) > 0  for every ε > 0.

If the random element ĥ_n satisfies A_n(ĥ_n) ≤ inf_{h∈H} A_n(h) + o_{P^*}(1), where

  A_n(h) = (1/t_n^n) Σ_{i=1}^n | |X_{t_i^n} − X_{t_{i−1}^n}|² / |t_i^n − t_{i−1}^n| − σ²(X_{t_{i−1}^n}; h) |² |t_i^n − t_{i−1}^n|,

then it holds that d_H(ĥ_n, h_0) = o_{P^*}(1).
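An illustrative sketch of the least squares criterion A_n, under the simplifying assumption (made only here, not in the theorem) that H consists of constant functions σ²(·; h) ≡ h, so that the criterion can be minimized over a one-dimensional grid:

```python
import numpy as np

rng = np.random.default_rng(4)
theta0, sig0, dt, n = 1.0, 0.8, 0.01, 100_000

# Simulate dX = -theta0 * X dt + sig0 dW on the equidistant grid t_i = i * dt.
x = np.empty(n + 1)
x[0] = 0.0
dW = rng.normal(0.0, np.sqrt(dt), size=n)
for i in range(n):
    x[i + 1] = x[i] - theta0 * x[i] * dt + sig0 * dW[i]

q = np.diff(x) ** 2 / dt          # |X_{t_i} - X_{t_{i-1}}|^2 / |t_i - t_{i-1}|
hs = np.linspace(0.1, 2.0, 400)   # candidate constant values h for sigma^2
# On an equidistant grid, A_n(h) reduces to the mean of |q_i - h|^2 over i.
A_n = np.array([np.mean((q - h) ** 2) for h in hs])
h_hat = hs[np.argmin(A_n)]
print(h_hat)   # close to the true sigma^2 = sig0**2 = 0.64
```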
4.3. Proofs. Before the proofs, we state a lemma which is well known.

LEMMA 4.5. Let X be a solution to the SDE (1) for (θ, h) = (θ_0, h_0). Assume |t_i^n − t_{i−1}^n| ≤ 1.

(i) For any k ≥ 2, there exists a constant C_k > 0, depending only on k, such that

  E sup_{t∈[t_{i−1}^n, t_i^n]} |X_t − X_{t_{i−1}^n}|^k ≤ C_k sup_{s∈R} E{|S(X_s; θ_0)|^k + |σ(X_s; h_0)|^k} |t_i^n − t_{i−1}^n|^{k/2} =: D_k |t_i^n − t_{i−1}^n|^{k/2},

provided the right-hand side is finite.

(ii) For any k ≥ 2 and any measurable functions f, g, it holds that

  sup_{t∈[t_{i−1}^n, t_i^n]} E(|X_t − X_{t_{i−1}^n}|^{k/2} |f(X_{t_{i−1}^n})| |g(X_t)|) ≤ (D_k |t_i^n − t_{i−1}^n|^{k/2})^{1/2} sup_{s∈R}(E|f(X_s)|^4)^{1/4} sup_{s∈R}(E|g(X_s)|^4)^{1/4},

provided the right-hand side is finite.

PROOF. The assertion (i) is well known. (Use Hölder's inequality and the Burkholder–Davis–Gundy inequality for ∫_{t_{i−1}^n}^{t_i^n} |S(X_s; θ_0)| ds and sup_{t∈[t_{i−1}^n, t_i^n]} |∫_{t_{i−1}^n}^t σ(X_s; h_0) dW_s|.) The assertion (ii) follows from Hölder's inequality and (i). □
During the proofs, we write

  ψ(x; θ, h) = Ṡ(x; θ) / σ²(x; h),

which is a d-dimensional vector-valued function. For each component ψ^{(j)}(x; θ, h) (j = 1, ..., d), it holds that

(5)  |ψ^{(j)}(x; θ, h) − ψ^{(j)}(x′; θ, h)|
   ≤ |Ṡ^{(j)}(x; θ) − Ṡ^{(j)}(x′; θ)| / σ²(x; h) + |Ṡ^{(j)}(x′; θ)| |1/σ²(x; h) − 1/σ²(x′; h)|
   ≤ {1/c + Λ(x′)/c²} K|x − x′|

and that

(6)  |ψ^{(j)}(x; θ, h) − ψ^{(j)}(x; θ′, h′)|
   ≤ |Ṡ^{(j)}(x; θ) − Ṡ^{(j)}(x; θ′)| / σ²(x; h) + |Ṡ^{(j)}(x; θ′)| |1/σ²(x; h) − 1/σ²(x; h′)|
   ≤ |Ṡ^{(j)}(x; θ) − Ṡ^{(j)}(x; θ′)| / c + |Ṡ^{(j)}(x; θ′)| |σ²(x; h) − σ²(x; h′)| / c²
   ≤ {Λ(x)/c + |Λ(x)|²/c²} (‖θ − θ′‖ ∨ d_H(h, h′)).
PROOF OF LEMMA 4.1. We apply Theorem 3.4.2 of Nishiyama (2000b) [or see Theorem 3.3 of van der Vaart and van Zanten (2005)] to the terminal values M_{t_n^n}^{n,θ,h} of the continuous martingales t ↦ M_t^{n,θ,h} given by

  M_t^{n,θ,h} = (1/√(t_n^n)) Σ_{i=1}^n ∫_{t_{i−1}^n ∧ t}^{t_i^n ∧ t} ψ(X_{t_{i−1}^n}; θ, h) σ(X_s; h_0) dW_s.

For the finite-dimensional convergence, it is sufficient to show the convergence of the predictable covariations. This is done as follows:

(7)  ⟨M^{n,θ,h}, M^{n,θ′,h′}⟩_{t_n^n} = (1/t_n^n) Σ_{i=1}^n ψ(X_{t_{i−1}^n}; θ, h) ψ(X_{t_{i−1}^n}; θ′, h′)^T ∫_{t_{i−1}^n}^{t_i^n} σ²(X_s; h_0) ds
   = (1/t_n^n) ∫_0^{t_n^n} ψ(X_t; θ, h) ψ(X_t; θ′, h′)^T σ²(X_t; h_0) dt + o_P(1)
   →^p ∫_I ψ(x; θ, h) ψ(x; θ′, h′)^T σ²(x; h_0) μ(dx)
   = ∫_I [Ṡ(x; θ) Ṡ(x; θ′)^T / (σ²(x; h) σ²(x; h′))] σ²(x; h_0) μ(dx)
   = I(θ_0, h_0)  if (θ, h) = (θ′, h′) = (θ_0, h_0).

Here, to show (7), we have used the bound (5) and Lemma 4.5(ii) twice.

To establish Nishiyama's condition [ME], let us observe the following fact to check the metric entropy condition for the product space Θ × H.

In general, if (D, d) and (E, e) are two semimetric spaces, then the covering number of the product space D × E with respect to the maximum semimetric d ∨ e, namely N(D × E, d ∨ e, ε), is bounded by N(D, d, ε) · N(E, e, ε). To see this claim, let B_i, i = 1, ..., N(D, d, ε), be an ε-covering of D, and let C_j, j = 1, ..., N(E, e, ε), be an ε-covering of E. Then the diameters of the sets B_i × C_j ⊂ D × E with respect to d ∨ e are smaller than ε; thus these sets form an ε-covering of D × E. The claim has been proved. Consequently, the metric entropy condition

  ∫_0^1 √(log N(D × E, d ∨ e, ε)) dε < ∞

is satisfied if

  ∫_0^1 √(log N(D, d, ε)) dε < ∞  and  ∫_0^1 √(log N(E, e, ε)) dε < ∞.
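The product covering number bound can be checked numerically on finite point clouds. The greedy routine below returns the size of a maximal ε-separated set, which is also an ε-cover, so it gives a valid (if crude) covering count; the grids and the value of ε are illustrative assumptions.

```python
import numpy as np

def covering_number(points, metric, eps):
    """Greedy cover: the selected centers form a maximal eps-separated set,
    which is in particular an eps-cover of the point cloud."""
    pts = list(points)
    centers = []
    while pts:
        c = pts[0]
        centers.append(c)
        pts = [p for p in pts if metric(p, c) > eps]
    return len(centers)

d1 = lambda a, b: abs(a - b)
D = np.linspace(0.0, 1.0, 201)                     # fine grid on [0, 1]
E = np.linspace(0.0, 1.0, 201)
prod = [(u, v) for u in D[::10] for v in E[::10]]  # coarser grid on D x E
dmax = lambda p, q: max(d1(p[0], q[0]), d1(p[1], q[1]))  # maximum semimetric

eps = 0.107
nD = covering_number(D, d1, eps)
nE = covering_number(E, d1, eps)
nProd = covering_number(prod, dmax, eps)
print(nD, nE, nProd)   # here nProd stays below nD * nE
```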
to (θ_0, h_0) that

(8)  (1/t_n^n) ∫_0^{t_n^n} ψ(X_t; θ_n, h_n)[S(X_t; θ_0) − S(X_t; θ_n)] dt
   = (1/t_n^n) ∫_0^{t_n^n} ψ(X_t; θ_n, h_n) Ṡ(X_t; θ_0)^T dt (θ_0 − θ_n) + o_{P^*}(‖θ_n − θ_0‖)
   = (1/t_n^n) ∫_0^{t_n^n} ψ(X_t; θ_0, h_0) Ṡ(X_t; θ_0)^T dt (θ_0 − θ_n) + o_{P^*}(‖θ_n − θ_0‖)
   = ∫_I ψ(x; θ_0, h_0) Ṡ(x; θ_0)^T μ(dx) (θ_0 − θ_n) + o_{P^*}(‖θ_n − θ_0‖)
   = −I(θ_0, h_0)(θ_n − θ_0) + o_{P^*}(‖θ_n − θ_0‖).

To prove (8) in the above computation, use (6) to show that for every j, k = 1, ..., d,

  | (1/t_n^n) ∫_0^{t_n^n} ψ^{(j)}(X_t; θ_n, h_n) Ṡ^{(k)}(X_t; θ_0) dt − (1/t_n^n) ∫_0^{t_n^n} ψ^{(j)}(X_t; θ_0, h_0) Ṡ^{(k)}(X_t; θ_0) dt |
   ≤ (1/t_n^n) ∫_0^{t_n^n} {Λ(X_t)/c + |Λ(X_t)|²/c²} |Ṡ^{(k)}(X_t; θ_0)| dt · (‖θ_n − θ_0‖ ∨ d_H(h_n, h_0))
   = O_P(1) · o_{P^*}(1)
   = o_{P^*}(1).

The proof is complete. □
PROOF OF THEOREM 4.3. By Lemma 4.5, it is easy to see that Ψ̃_n(θ_0, h_0) = O_P(Δ_n^{1/2}) = o_P((t_n^n)^{−1/2}). So the main assertion follows from Theorem 2.1 with the help of Lemmas 4.1 and 4.2. On the other hand, since it follows from Lemma 4.5(ii) and Theorem 3.1(i) that sup_{θ,h} ‖Ψ_n(θ, h) − Ψ(θ, h)‖ = o_{P^*}(1), where

  Ψ(θ, h) = ∫_I [Ṡ(x; θ) / σ²(x; h)] [S(x; θ_0) − S(x; θ)] μ(dx),

the assertion that the consistency "‖θ̂_n − θ_0‖ = o_{P^*}(1)" automatically follows from A8 is immediate from Theorem 2.3. □
PROOF OF THEOREM 4.4. Put

  M_n(h) = (1/t_n^n) Σ_{i=1}^n |σ²(X_{t_{i−1}^n}; h) − σ²(X_{t_{i−1}^n}; h_0)|² |t_i^n − t_{i−1}^n|,
  M(h) = ∫_I |σ²(x; h) − σ²(x; h_0)|² μ(dx).

Let us apply Corollary 3.2.3 of van der Vaart and Wellner (1996) to the above M_n and M for the given ĥ_n, which is a solution of A_n(ĥ_n) ≤ inf_{h∈H} A_n(h) + o_{P^*}(1). By Lemma 4.5(ii) and Theorem 3.1(i), it is not difficult to see that sup_{h∈H} |M_n(h) − M(h)| = o_{P^*}(1), so it is sufficient to show that M_n(ĥ_n) = o_{P^*}(1).
Observe that

  (1/t_n^n) Σ_{i=1}^n | |X_{t_i^n} − X_{t_{i−1}^n}|² / |t_i^n − t_{i−1}^n| − σ²(X_{t_{i−1}^n}; h) |² |t_i^n − t_{i−1}^n|
   = (1/t_n^n) Σ_{i=1}^n | |X_{t_i^n} − X_{t_{i−1}^n}|² / |t_i^n − t_{i−1}^n| − σ²(X_{t_{i−1}^n}; h_0) |² |t_i^n − t_{i−1}^n|
   + (2/t_n^n) Σ_{i=1}^n ( |X_{t_i^n} − X_{t_{i−1}^n}|² / |t_i^n − t_{i−1}^n| − σ²(X_{t_{i−1}^n}; h_0) ) ( σ²(X_{t_{i−1}^n}; h_0) − σ²(X_{t_{i−1}^n}; h) ) |t_i^n − t_{i−1}^n|
   + (1/t_n^n) Σ_{i=1}^n |σ²(X_{t_{i−1}^n}; h_0) − σ²(X_{t_{i−1}^n}; h)|² |t_i^n − t_{i−1}^n|.

Let us prove that the supremum with respect to h of the absolute value of the second term on the right-hand side converges in outer probability to zero [say, claim (a)].
Since we have from Itô's formula that
$$
|X_{t_i^n} - X_{t_{i-1}^n}|^2 = 2 \int_{t_{i-1}^n}^{t_i^n} (X_s - X_{t_{i-1}^n}) S(X_s; \theta_0)\,ds + 2 \int_{t_{i-1}^n}^{t_i^n} (X_s - X_{t_{i-1}^n}) \sigma(X_s; h_0)\,dW_s + \int_{t_{i-1}^n}^{t_i^n} \sigma^2(X_s; h_0)\,ds,
$$
it is sufficient to show that $C_{1,n} = o_P(1)$, $\sup_{h \in H} |C_{2,n}(h)| = o_{P^*}(1)$ and $C_{3,n} = o_P(1)$, where
$$
\begin{aligned}
C_{1,n} &= \frac{1}{t_n} \sum_{i=1}^{n} \left| \int_{t_{i-1}^n}^{t_i^n} (X_s - X_{t_{i-1}^n}) S(X_s; \theta_0)\,ds \right| \psi(X_{t_{i-1}^n}), \\
C_{2,n}(h) &= \frac{1}{t_n} \sum_{i=1}^{n} \int_{t_{i-1}^n}^{t_i^n} (X_s - X_{t_{i-1}^n}) \sigma(X_s; h_0)\,dW_s \, \big( \sigma^2(X_{t_{i-1}^n}; h_0) - \sigma^2(X_{t_{i-1}^n}; h) \big), \\
C_{3,n} &= \frac{1}{t_n} \sum_{i=1}^{n} \int_{t_{i-1}^n}^{t_i^n} |\sigma^2(X_s; h_0) - \sigma^2(X_{t_{i-1}^n}; h_0)|\,ds\, \psi(X_{t_{i-1}^n}).
\end{aligned}
$$
By using Lemma 4.5(ii), we easily have $EC_{1,n} \to 0$ and $EC_{3,n} \to 0$. On the other hand, by using Theorem 3.4.2 of Nishiyama (2000b), it holds that $C_{2,n}$ converges weakly to zero in $C^{d_H}(H)$ (recall the argument in the proof of Lemma 4.1). Therefore, we have $\sup_{h \in H} |C_{2,n}(h)| = o_{P^*}(1)$.

Hence, the claim (a) is true, and we have that $A_n(\hat{h}_n) \le \inf_{h \in H} A_n(h) + o_{P^*}(1)$ implies that $M_n(\hat{h}_n) = o_{P^*}(1)$. The proof is finished. □
5. Ergodic time series.

5.1. Model and regularity conditions. Let us consider the time series model given by
$$X_i = \tilde{S}(X_{i-1},\dots,X_{i-q_1}; \theta) + \tilde{\sigma}(X_{i-1},\dots,X_{i-q_2}; h) w_i.$$
By putting $q = q_1 \vee q_2$ and changing the domains of the functions $\tilde{S}$ and $\tilde{\sigma}$, without loss of generality, we can write
$$X_i = S(\mathbf{X}_i; \theta) + \sigma(\mathbf{X}_i; h) w_i,$$
where $\mathbf{X}_i = (X_{i-1},\dots,X_{i-q})$ and $S(\cdot\,; \theta)$ and $\sigma(\cdot\,; h)$ are some measurable functions on $\mathbb{R}^q$. For simplicity, we assume that the initial values $(X_0,\dots,X_{1-q}) = (x_0,\dots,x_{1-q})$ are fixed.
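As a concrete illustration of this model class, the following sketch simulates a path under Case G below; the specific choices $q_1 = q_2 = 1$, $\tilde S(x;\theta) = \theta\tanh(x)$ and $\tilde\sigma(x;h) = h_0 + h_1/(1+x^2)$ are our illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy nonlinear AR(1) instance of X_i = S(X_{i-1}; theta) + sigma(X_{i-1}; h) w_i.
# The functional forms below are assumptions made for this sketch.

def S(x, theta):
    # regression function, Lipschitz in theta as required by condition B2
    return theta * np.tanh(x)

def sigma(x, h):
    # variance function, bounded away from zero as required by condition B2
    return h[0] + h[1] / (1.0 + x ** 2)

def simulate(n, theta, h, x0=0.0, seed=None):
    """Simulate X_0, X_1, ..., X_n under Case G (i.i.d. N(0,1) innovations)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1)
    x[0] = x0  # fixed initial value, as in the model description
    w = rng.standard_normal(n)
    for i in range(n):
        x[i + 1] = S(x[i], theta) + sigma(x[i], h) * w[i]
    return x

path = simulate(1000, theta=0.5, h=(1.0, 0.5), seed=1)
```

Since $|\theta| < 1$ and $\sigma$ is bounded here, the simulated chain is stable and ergodic in the sense of B3.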
As for the noise $\{w_i\}$, we consider the following two cases:

CASE G (Gaussian). $\{w_i\}$ are independently, identically distributed with $\mathcal{N}(0,1)$.

CASE M (Martingale). $E[w_i \mid \mathcal{F}_{i-1}] = 0$ and $E[w_i^2 \mid \mathcal{F}_{i-1}] = 1$ almost surely, where $\mathcal{F}_i = \sigma\{X_j : j \le i\}$.

Clearly, the Case G is a special case of the Case M. When we do not explicitly restrict attention to the Case G, we consider the Case M in principle.
Let us list some conditions of the same fashion as those in Section 4.1. We suppose that there exists a parametric family of $d$-dimensional vector-valued functions $\{\dot{S}(\cdot\,; \theta); \theta \in \Theta\}$ on $\mathbb{R}^q$ which satisfies the following conditions. Typically, they may be considered to be the derivatives of $S(\cdot\,; \theta)$ with respect to $\theta$, that is, $\dot{S}(\cdot\,; \theta) = \big(\frac{\partial}{\partial\theta_1} S(\cdot\,; \theta),\dots,\frac{\partial}{\partial\theta_d} S(\cdot\,; \theta)\big)^T$. The function $\psi$ appearing in B1 and B2 may be chosen to be common without loss of generality.

B1. $\Theta$ is a compact subset of $\mathbb{R}^d$. There exists a measurable function $\psi$ on $\mathbb{R}^q$ such that at the true $\theta_0 \in \Theta$,
$$S(x; \theta) - S(x; \theta_0) = \dot{S}(x; \theta_0)^T (\theta - \theta_0) + \psi(x)\,\epsilon(x; \theta, \theta_0),$$
where $\sup_{x \in \mathbb{R}^q} |\epsilon(x; \theta, \theta_0)| = o(\|\theta - \theta_0\|)$ as $\theta \to \theta_0$.
B2. There exists a measurable function $\psi$ on $\mathbb{R}^q$ such that
$$\sup_{\theta \in \Theta} |S(x; \theta)| \le \psi(x); \qquad \sup_{\theta \in \Theta} \|\dot{S}(x; \theta)\| \le \psi(x); \qquad \sigma^2(x; h_0) \le \psi(x);$$
$$c := \inf_{h \in H} \inf_{x \in \mathbb{R}^q} \sigma^2(x; h) > 0;$$
$$\|\dot{S}(x; \theta) - \dot{S}(x; \theta')\| \le \psi(x) \|\theta - \theta'\| \qquad \forall \theta, \theta' \in \Theta;$$
$$|\sigma^2(x; h) - \sigma^2(x; h')| \le \psi(x)\, d_H(h, h') \qquad \forall h, h' \in H.$$
B3. The process $\{X_i\}_{i=1,2,\dots}$ is ergodic under the true $(\theta_0, h_0)$ in the sense that for $q' = q$ and $q+1$ there exists the invariant measure $\mu_{q'}$ such that for every $\mu_{q'}$-integrable function $f$,
$$\frac{1}{n} \sum_{i=1}^{n} f(X_{i-1},\dots,X_{i-q'}) \to^p \int_{\mathbb{R}^{q'}} f(x_1,\dots,x_{q'})\, \mu_{q'}(dx_1 \cdots dx_{q'}).$$

We also assume the following.

B4. The matrix
$$I(\theta_0, h_0) = \int_{\mathbb{R}^q} \frac{\dot{S}(x; \theta_0) \dot{S}(x; \theta_0)^T}{\sigma^2(x; h_0)}\, \mu_q(dx)$$
is nonsingular.

B5. It holds that
$$\int_{\mathbb{R}^q} \psi(x)^5\, \mu_q(dx) < \infty \qquad\text{and}\qquad \int_0^1 \sqrt{\log N(H, d_H, \varepsilon)}\, d\varepsilon < \infty.$$

See the end of Section 4.1 for the discussion of the choice of $(H, d_H)$.
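To see that the entropy condition in B5 is satisfiable for a class of smooth functions (the setting mentioned in the Introduction), one can invoke the standard covering number bound for Hölder balls from Theorem 2.7.1 of van der Vaart and Wellner (1996); the following computation is a standard illustration, not part of the paper's statements:

```latex
% If H is a bounded class of \alpha-H\"older functions on a bounded convex
% subset of R^q and d_H is the sup-norm distance, then
%     \log N(H, d_H, \varepsilon) \le K \varepsilon^{-q/\alpha}
% for some constant K > 0, whence
\int_0^1 \sqrt{\log N(H, d_H, \varepsilon)}\, d\varepsilon
  \le \sqrt{K} \int_0^1 \varepsilon^{-q/(2\alpha)}\, d\varepsilon,
% and the right-hand side is finite if and only if q/(2\alpha) < 1,
% i.e., the entropy part of B5 holds whenever \alpha > q/2.
```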
5.2. Results. In order to explain the idea of our estimating function, let us first consider the Case G. We denote by $P_{n,u}$ the distribution of $\{X_1,\dots,X_n\}$ under $\theta = \theta_0 + n^{-1/2}u$ and $h = h_0$, where $u \in \mathbb{R}^d$. By an easy computation, the log-likelihood ratio is given by

(9)
$$\log \frac{dP_{n,u}}{dP_{n,0}}(X_1,\dots,X_n) = -\sum_{i=1}^{n} \frac{1}{2\sigma^2(\mathbf{X}_i; h_0)} \big\{|X_i - S(\mathbf{X}_i; \theta_0 + n^{-1/2}u)|^2 - |X_i - S(\mathbf{X}_i; \theta_0)|^2\big\} = \Delta_{n,u} - B_{n,u},$$
where
$$\Delta_{n,u} = \sum_{i=1}^{n} \frac{1}{\sigma^2(\mathbf{X}_i; h_0)} \{X_i - S(\mathbf{X}_i; \theta_0)\}\{S(\mathbf{X}_i; \theta_0 + n^{-1/2}u) - S(\mathbf{X}_i; \theta_0)\}$$
and
$$B_{n,u} = \sum_{i=1}^{n} \frac{1}{2\sigma^2(\mathbf{X}_i; h_0)} \{S(\mathbf{X}_i; \theta_0 + n^{-1/2}u) - S(\mathbf{X}_i; \theta_0)\}^2.$$
Under the above regularity conditions, we have
$$\Delta_{n,u} \to^d \mathcal{N}(0, u^T I(\theta_0, h_0) u) \qquad\text{and}\qquad B_{n,u} \to^p \tfrac{1}{2} u^T I(\theta_0, h_0) u.$$
So it follows from the theory of local asymptotic normality that the asymptotically efficient limit distribution in the Case G when $h_0$ is known is $\mathcal{N}(0, I(\theta_0, h_0)^{-1})$. That is, if we obtain an estimator $\tilde{\theta}_n$ such that $\sqrt{n}(\tilde{\theta}_n - \theta_0) \to^d \mathcal{N}(0, I(\theta_0, h_0)^{-1})$, it is asymptotically efficient in the sense of the local asymptotic minimax theorem. [See, e.g., Chapter 3.11 of van der Vaart and Wellner (1996).] If the parameter $h$ is unknown, then the estimation problem for $\theta$ becomes more difficult. So if we have an estimator which asymptotically behaves as stated above, then we may say that it is asymptotically efficient with the nuisance parameter $h$. This argument does not apply in the Case M, where the log-likelihood does not equal the formula (9), but we propose to use it for deriving an estimating equation which yields the same asymptotic distribution as in the Case G.
Not only in the Case G but also in the Case M, differentiating (9) formally and replacing the true $h_0$ by the unknown parameter $h$, we propose the estimating function
$$\Psi_n(\theta, h) = \frac{1}{n} \sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i; \theta)}{\sigma^2(\mathbf{X}_i; h)} \big(X_i - S(\mathbf{X}_i; \theta)\big).$$
Its compensator is
$$\tilde{\Psi}_n(\theta, h) = \frac{1}{n} \sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i; \theta)}{\sigma^2(\mathbf{X}_i; h)} \big(S(\mathbf{X}_i; \theta_0) - S(\mathbf{X}_i; \theta)\big).$$
Thus, it holds that
$$\sqrt{n}\big(\Psi_n(\theta, h) - \tilde{\Psi}_n(\theta, h)\big) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i; \theta)}{\sigma^2(\mathbf{X}_i; h)}\, \sigma(\mathbf{X}_i; h_0) w_i,$$
which is the summation of a $C^\rho(\Theta \times H)$-valued martingale difference array, where $\rho = \|\cdot\| \vee d_H$. By using Jain–Marcus' central limit theorem for martingales given by Nishiyama (1996, 2000a, 2000b), we have the following lemma, which plays a key role in our approach.

LEMMA 5.1. Under B1–B3 and B5, the sequence of random fields $\sqrt{n}(\Psi_n - \tilde{\Psi}_n)$, with parameter $(\theta, h)$, converges weakly in $C^\rho(\Theta \times H)$ to a zero-mean Gaussian random field $Z$ with the covariance
$$E Z(\theta, h) Z(\theta', h')^T = \int_{\mathbb{R}^q} \frac{\dot{S}(x; \theta) \dot{S}(x; \theta')^T}{\sigma^2(x; h)\, \sigma^2(x; h')}\, \sigma^2(x; h_0)\, \mu_q(dx).$$
In particular, the random variable $Z(\theta_0, h_0)$ is distributed with $\mathcal{N}(0, I(\theta_0, h_0))$.

Another lemma which is necessary to apply Theorem 2.1 is the following.

LEMMA 5.2. Under B1–B4, for any random sequence $(\theta_n, h_n)$ such that $\|\theta_n - \theta_0\| \vee d_H(h_n, h_0) = o_{P^*}(1)$, it holds that
$$\tilde{\Psi}_n(\theta_n, h_n) - \tilde{\Psi}_n(\theta_0, h_0) - \big(-I(\theta_0, h_0)\big)(\theta_n - \theta_0) = o_{P^*}(\|\theta_n - \theta_0\|).$$
Noting also that $\tilde{\Psi}_n(\theta_0, h_0) = 0$, we can apply Theorem 2.1 to conclude the following theorem.

THEOREM 5.3. Under B1–B5, for any random sequence $(\hat{\theta}_n, \hat{h}_n)$ such that $\|\hat{\theta}_n - \theta_0\| = o_{P^*}(1)$, $d_H(\hat{h}_n, h_0) = o_{P^*}(1)$ and $\Psi_n(\hat{\theta}_n, \hat{h}_n) = o_{P^*}(n^{-1/2})$, the estimator $\hat{\theta}_n$ is asymptotically normal:
$$\sqrt{n}(\hat{\theta}_n - \theta_0) \to^d \mathcal{N}(0, I(\theta_0, h_0)^{-1}).$$
In particular, in the Case G, the estimator $\hat{\theta}_n$ is asymptotically efficient. When B6 is also satisfied, the assumption "$\|\hat{\theta}_n - \theta_0\| = o_{P^*}(1)$" is automatically satisfied.
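To illustrate how an (approximate) zero of the estimating function can be computed in practice, consider again the toy scalar model $S(x;\theta) = \theta\tanh(x)$, $\sigma(x;h) = h_0 + h_1/(1+x^2)$ (our illustrative assumption, not the paper's model). Since $\dot S(x;\theta) = \tanh(x)$ does not depend on $\theta$, the estimating equation is linear in $\theta$ and solves in closed form:

```python
import numpy as np

# Sketch of the Z-estimation step for the illustrative scalar model
#   X_i = theta * tanh(X_{i-1}) + (h[0] + h[1]/(1 + X_{i-1}^2)) * w_i.
# All functional forms here are assumptions made for this example.

def psi_n(theta, h, x):
    """Psi_n(theta, h) = (1/n) sum Sdot(X_{i-1}) / sigma^2(X_{i-1}; h)
       * (X_i - theta * tanh(X_{i-1}))."""
    xprev, xcur = x[:-1], x[1:]
    sdot = np.tanh(xprev)
    s2 = (h[0] + h[1] / (1.0 + xprev ** 2)) ** 2
    return np.mean(sdot / s2 * (xcur - theta * sdot))

def z_estimate(h, x):
    """Solve Psi_n(theta, h) = 0 for theta (closed form by linearity)."""
    xprev, xcur = x[:-1], x[1:]
    sdot = np.tanh(xprev)
    s2 = (h[0] + h[1] / (1.0 + xprev ** 2)) ** 2
    return np.sum(sdot / s2 * xcur) / np.sum(sdot ** 2 / s2)

# Small check on simulated Case G data with theta_0 = 0.5, h_0 = (1.0, 0.5):
rng = np.random.default_rng(2)
x = np.empty(2001)
x[0] = 0.0
for i in range(2000):
    s = 1.0 + 0.5 / (1.0 + x[i] ** 2)
    x[i + 1] = 0.5 * np.tanh(x[i]) + s * rng.standard_normal()

theta_hat = z_estimate((1.0, 0.5), x)  # plug in the true h for this check
```

With a consistent plug-in $\hat h_n$ in place of the true $h_0$, Theorem 5.3 says the resulting $\hat\theta_n$ attains the same limit distribution.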
For the same reason as in Section 4, it is necessary to develop a procedure to construct a consistent estimator $\hat{h}_n$ for the nuisance parameter $h \in H$. The following theorem gives us an answer.
THEOREM 5.4. Assume B2, B3 and B5.

(First step: initial estimator for $\theta_0$.) Suppose the identifiability condition
$$\inf_{\theta : \|\theta - \theta_0\| > \varepsilon} \int_{\mathbb{R}^q} |S(x; \theta) - S(x; \theta_0)|^2\, \mu_q(dx) > 0 \qquad \forall \varepsilon > 0$$
is satisfied. If a random sequence $\theta_n^{LS}$ satisfies $A_n(\theta_n^{LS}) \le \inf_{\theta \in \Theta} A_n(\theta) + o_{P^*}(1)$, where
$$A_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} |X_i - S(\mathbf{X}_i; \theta)|^2,$$
then it holds that $\|\theta_n^{LS} - \theta_0\| = o_{P^*}(1)$.

(Second step: consistent estimator for $h_0$.) Suppose the identifiability condition
$$\inf_{h : d_H(h, h_0) > \varepsilon} \int_{\mathbb{R}^q} |\sigma^2(x; h) - \sigma^2(x; h_0)|^2\, \mu_q(dx) > 0 \qquad \forall \varepsilon > 0$$
is satisfied. Merely for a technical reason, assume that there exists a constant $L_4 > 0$ such that $E[w_i^4 \mid \mathcal{F}_{i-1}] \le L_4$ almost surely. If a random sequence $\hat{h}_n$ satisfies $B_n(\hat{h}_n) \le \inf_{h \in H} B_n(h) + o_{P^*}(1)$, where
$$B_n(h) = \frac{1}{n} \sum_{i=1}^{n} \big|\, |X_i - S(\mathbf{X}_i; \theta_n^{LS})|^2 - \sigma^2(\mathbf{X}_i; h) \,\big|^2,$$
then it holds that $d_H(\hat{h}_n, h_0) = o_{P^*}(1)$.
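A numerical sketch of this two-step recipe on the illustrative scalar model $S(x;\theta) = \theta\tanh(x)$, $\sigma(x;h) = h_0 + h_1/(1+x^2)$; the model and the finite candidate set for $h$ are assumptions made for this sketch (in the theorem $H$ may be infinite-dimensional, and the minimization of $B_n$ would be carried out over, e.g., a sieve):

```python
import numpy as np

# Two-step procedure of Theorem 5.4 on a toy model (illustrative assumptions):
#   X_i = theta * tanh(X_{i-1}) + (h[0] + h[1]/(1 + X_{i-1}^2)) * w_i.

def theta_ls(x):
    """First step: minimize A_n(theta) = (1/n) sum |X_i - theta*tanh(X_{i-1})|^2.
       A_n is quadratic in theta, so the minimizer is the least-squares ratio."""
    xprev, xcur = x[:-1], x[1:]
    z = np.tanh(xprev)
    return np.sum(z * xcur) / np.sum(z * z)

def h_hat(x, candidates):
    """Second step: minimize B_n(h) = (1/n) sum
       | |X_i - S(X_{i-1}; theta_LS)|^2 - sigma^2(X_{i-1}; h) |^2
       over a finite candidate set standing in for H."""
    th = theta_ls(x)
    xprev, xcur = x[:-1], x[1:]
    resid2 = (xcur - th * np.tanh(xprev)) ** 2

    def B_n(h):
        s2 = (h[0] + h[1] / (1.0 + xprev ** 2)) ** 2
        return np.mean((resid2 - s2) ** 2)

    return min(candidates, key=B_n)

# Simulate Case G data with theta_0 = 0.5 and h_0 = (1.0, 0.5):
rng = np.random.default_rng(3)
x = np.empty(5001)
x[0] = 0.0
for i in range(5000):
    s = 1.0 + 0.5 / (1.0 + x[i] ** 2)
    x[i + 1] = 0.5 * np.tanh(x[i]) + s * rng.standard_normal()

cands = [(1.0, 0.5), (2.0, 0.0), (0.7, 0.0)]  # well-separated candidates
best = h_hat(x, cands)
```

The identifiability conditions of the theorem correspond here to the candidates being separated in the $L^2(\mu_q)$ sense, which is what makes the minimizer of $B_n$ pick out (a neighborhood of) $h_0$.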
PROOF OF LEMMA 5.2. Notice that $\tilde{\Psi}_n(\theta_0, h_0) = 0$. For any (possibly random) sequence $(\theta_n, h_n)$ converging in outer probability to $(\theta_0, h_0)$, it holds that
$$
\begin{aligned}
\tilde{\Psi}_n(\theta_n, h_n) - \tilde{\Psi}_n(\theta_0, h_0)
&= \frac{1}{n} \sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i; \theta_n)}{\sigma^2(\mathbf{X}_i; h_n)} \big[S(\mathbf{X}_i; \theta_0) - S(\mathbf{X}_i; \theta_n)\big] \\
&= \frac{1}{n} \sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i; \theta_n)}{\sigma^2(\mathbf{X}_i; h_n)} \dot{S}(\mathbf{X}_i; \theta_0)^T (\theta_0 - \theta_n) + o_{P^*}(\|\theta_n - \theta_0\|) \qquad (11) \\
&= \frac{1}{n} \sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i; \theta_0)}{\sigma^2(\mathbf{X}_i; h_0)} \dot{S}(\mathbf{X}_i; \theta_0)^T (\theta_0 - \theta_n) + o_{P^*}(\|\theta_n - \theta_0\|) \\
&= -I(\theta_0, h_0)(\theta_n - \theta_0) + o_{P^*}(\|\theta_n - \theta_0\|).
\end{aligned}
$$
To show (11) above, do the same argument as in the proof of Lemma 4.2, using (10) instead of (6). □
PROOF OF THEOREM 5.3. The assertions follow from Theorems 2.1 and 2.3 by using also Lemmas 5.1 and 5.2, and Theorem 3.1(ii), respectively. □
PROOF OF THEOREM 5.4. To prove the first step, we will apply Corollary 3.2.3 of van der Vaart and Wellner (1996). We can write $A_n(\theta) = T_{1,n} + T_{2,n}(\theta) + T_{3,n}(\theta)$, where
$$
\begin{aligned}
T_{1,n} &= \frac{1}{n} \sum_{i=1}^{n} |X_i - S(\mathbf{X}_i; \theta_0)|^2, \\
T_{2,n}(\theta) &= \frac{2}{n} \sum_{i=1}^{n} \big(S(\mathbf{X}_i; \theta_0) - S(\mathbf{X}_i; \theta)\big)\, \sigma(\mathbf{X}_i; h_0) w_i, \\
T_{3,n}(\theta) &= \frac{1}{n} \sum_{i=1}^{n} |S(\mathbf{X}_i; \theta_0) - S(\mathbf{X}_i; \theta)|^2.
\end{aligned}
$$
The term $T_{1,n}$ converges in probability to a constant $C_1$. On the other hand, by using Proposition 4.5 of Nishiyama (2000a), we have that $\sqrt{n} T_{2,n}$ converges weakly in $C(\Theta)$ to a tight law; thus, $\sup_{\theta \in \Theta} |T_{2,n}(\theta)|$ converges in outer probability to zero. Finally, by Theorem 3.1, it holds that $\sup_{\theta \in \Theta} |T_{3,n}(\theta) - T_3(\theta)| = o_{P^*}(1)$, where
$$T_3(\theta) = \int_{\mathbb{R}^q} |S(x; \theta_0) - S(x; \theta)|^2\, \mu_q(dx).$$
Hence, we have $\sup_{\theta \in \Theta} |A_n(\theta) - (C_1 + T_3(\theta))| = o_{P^*}(1)$, and van der Vaart and Wellner's (1996) consistency theorem yields the assertion of the first step.
To prove the second step, we shall apply Corollary 3.2.3 of van der Vaart and Wellner (1996) again. Let us first see that $\sup_{h \in H} |B_n(h) - \tilde{B}_n(h)| = o_{P^*}(1)$, where
$$\tilde{B}_n(h) = \frac{1}{n} \sum_{i=1}^{n} \big|\, |X_i - S(\mathbf{X}_i; \theta_0)|^2 - \sigma^2(\mathbf{X}_i; h) \,\big|^2.$$
Now, notice that
$$
\begin{aligned}
&\Big| \big|\, |X_i - S(\mathbf{X}_i; \theta_n^{LS})|^2 - \sigma^2(\mathbf{X}_i; h) \,\big|^2 - \big|\, |X_i - S(\mathbf{X}_i; \theta_0)|^2 - \sigma^2(\mathbf{X}_i; h) \,\big|^2 \Big| \\
&\qquad \le \big|\, |X_i - S(\mathbf{X}_i; \theta_n^{LS})|^2 + |X_i - S(\mathbf{X}_i; \theta_0)|^2 - 2\sigma^2(\mathbf{X}_i; h) \,\big| \cdot \big|\, |X_i - S(\mathbf{X}_i; \theta_n^{LS})|^2 - |X_i - S(\mathbf{X}_i; \theta_0)|^2 \,\big| \\
&\qquad \le \big|\, |X_i - S(\mathbf{X}_i; \theta_n^{LS})|^2 + |X_i - S(\mathbf{X}_i; \theta_0)|^2 - 2\sigma^2(\mathbf{X}_i; h) \,\big| \cdot |2X_i - S(\mathbf{X}_i; \theta_n^{LS}) - S(\mathbf{X}_i; \theta_0)| \cdot |S(\mathbf{X}_i; \theta_n^{LS}) - S(\mathbf{X}_i; \theta_0)| \\
&\qquad \le L(X_i, \mathbf{X}_i)\, \|\theta_n^{LS} - \theta_0\|,
\end{aligned}
$$
where $L(x_0, x_1,\dots,x_q) = C(|x_0| + \psi(x_1,\dots,x_q))^4$ for a constant $C$. Given any $\varepsilon > 0$, choose $M > 0$ such that $\int_{\mathbb{R}^{q+1}} L(x_0, x_1,\dots,x_q) 1_{\{L(x_0, x_1,\dots,x_q) > M\}}\, \mu_{q+1}(dx_0, dx_1,\dots,dx_q) < \varepsilon$. Then we have
$$\sup_{h \in H} |B_n(h) - \tilde{B}_n(h)| \le M \|\theta_n^{LS} - \theta_0\| + \frac{1}{n} \sum_{i=1}^{n} L(X_i, \mathbf{X}_i) 1_{\{L(X_i, \mathbf{X}_i) > M\}}\, \mathrm{diam}(\Theta).$$
The second term of the right-hand side converges to a positive constant which is smaller than $\varepsilon \cdot \mathrm{diam}(\Theta)$. Since the choice of $\varepsilon > 0$ is arbitrary, we have $\sup_{h \in H} |B_n(h) - \tilde{B}_n(h)| = o_{P^*}(1)$.
Now, since $|X_i - S(\mathbf{X}_i; \theta_0)|^2 = \sigma^2(\mathbf{X}_i; h_0) w_i^2$, we can write $\tilde{B}_n(h) = T_{1,n} + T_{2,n}(h) + T_{3,n}(h)$, where
$$
\begin{aligned}
T_{1,n} &= \frac{1}{n} \sum_{i=1}^{n} \big|\, |X_i - S(\mathbf{X}_i; \theta_0)|^2 - \sigma^2(\mathbf{X}_i; h_0) \,\big|^2, \\
T_{2,n}(h) &= \frac{2}{n} \sum_{i=1}^{n} \sigma^2(\mathbf{X}_i; h_0)(w_i^2 - 1) \big( \sigma^2(\mathbf{X}_i; h_0) - \sigma^2(\mathbf{X}_i; h) \big), \\
T_{3,n}(h) &= \frac{1}{n} \sum_{i=1}^{n} |\sigma^2(\mathbf{X}_i; h_0) - \sigma^2(\mathbf{X}_i; h)|^2.
\end{aligned}
$$
The term $T_{1,n}$ converges in probability to a constant $C_1$ by assumption. By Jain–Marcus' CLT for martingale difference arrays, it is easy to show that $\sup_{h \in H} |T_{2,n}(h)| = o_{P^*}(1)$ (here, we use the technical assumption that $E[w_i^4 \mid \mathcal{F}_{i-1}]$ is bounded). Finally, by Theorem 3.1, it holds that $\sup_{h \in H} |T_{3,n}(h) - T_3(h)| = o_{P^*}(1)$, where
$$T_3(h) = \int_{\mathbb{R}^q} |\sigma^2(x; h_0) - \sigma^2(x; h)|^2\, \mu_q(dx).$$
Consequently, we have $\sup_{h \in H} |B_n(h) - (C_1 + T_3(h))| = o_{P^*}(1)$. Therefore, the claim of the second step follows from van der Vaart and Wellner's (1996) consistency theorem. □
Acknowledgments. I thank the Associate Editor and the referees for their careful reading and the advice to present a sharper version of the moment conditions.
REFERENCES

FLORENS-ZMIROU, D. (1989). Approximate discrete time schemes for statistics of diffusion processes. Statistics 20 547–557. MR1047222
HOFFMANN, M. (2001). On estimating the diffusion coefficient: Parametric versus nonparametric. Ann. Inst. H. Poincaré Probab. Statist. 37 339–372. MR1831987
JACOD, J. and SHIRYAEV, A. N. (1987). Limit Theorems for Stochastic Processes. Springer, Berlin. MR0959133
KESSLER, M. (1997). Estimation of an ergodic diffusion from discrete observations. Scand. J. Statist. 24 211–229. MR1455868
KOSOROK, M. R. (2008). Introduction to Empirical Processes and Semiparametric Inference. Springer, New York.
KUTOYANTS, Y. A. (2004). Statistical Inference for Ergodic Diffusion Processes. Springer, London. MR2144185
NISHIYAMA, Y. (1996). A central limit theorem for l∞-valued martingale difference arrays and its application. Preprint 971, Dept. Mathematics, Utrecht Univ. Available at http://www.ism.ac.jp/~nisiyama/.
NISHIYAMA, Y. (1997). Some central limit theorems for l∞-valued semimartingales and their applications. Probab. Theory Related Fields 108 459–494. MR1465638
NISHIYAMA, Y. (1999). A maximal inequality for continuous martingales and M-estimation in a Gaussian white noise model. Ann. Statist. 27 675–696. MR1714712
NISHIYAMA, Y. (2000a). Weak convergence of some classes of martingales with jumps. Ann. Probab. 28 685–712. MR1782271
NISHIYAMA, Y. (2000b). Entropy Methods for Martingales. CWI Tract 128. Centrum Wisk. Inform., Amsterdam. MR1742612
NISHIYAMA, Y. (2007). On the paper "Weak convergence of some classes of martingales with jumps." Ann. Probab. 35 1194–1200. MR2319720
NISHIYAMA, Y. (2009). A uniform law of large numbers for ergodic processes under a Lipschitz condition. Unpublished manuscript. Available at http://www.ism.ac.jp/~nisiyama/.
TANIGUCHI, M. and KAKIZAWA, Y. (2000). Asymptotic Theory of Statistical Inference for Time Series. Springer, New York. MR1785484
VAN DER VAART, A. W. (1998). Asymptotic Statistics. Cambridge Univ. Press, Cambridge. MR1652247
VAN DER VAART, A. W. and WELLNER, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York. MR1385671
VAN DER VAART, A. W. and VAN ZANTEN, H. (2005). Donsker theorems for diffusions: Necessary and sufficient conditions. Ann. Probab. 33 1422–1451. MR2150194
YOSHIDA, N. (1992). Estimation for diffusion processes from discrete observations. J. Multivariate Anal. 41 220–242. MR1172898
INSTITUTE OF STATISTICAL MATHEMATICS
4-6-7 MINAMI-AZABU, MINATO-KU
TOKYO 106-8569
JAPAN
E-MAIL: nisiyama@ism.ac.jp
URL: http://www.ism.ac.jp/~nisiyama/