Notes from Limit Theorems 2
Mihai Nica
Notes. These are my notes from the class Limit Theorems 2 taught by Professor McKean in Spring 2012. I have tried to carefully go over the bigger theorems from the course and fill in all the details explicitly. There is also a lot of information that is folded in from other sources.
• The section on Martingales is supplemented with some notes from "A First Look at Rigorous Probability Theory" by Jeffrey S. Rosenthal, which has a really nice introduction to Martingales.
• The section on the law of the iterated logarithm is supplemented with some inequalities which I looked up on the internet, mostly Wikipedia and PlanetMath.
• In the section on the Ergodic theorem, I use a notation for continued fractions that I found on Wikipedia and like. In my pen-and-paper notes, there is also a little section about Ergodic theory for geodesics on surfaces, which is really cute. However, I couldn't figure out a good way to draw the pictures, so it hasn't been typed up yet.
• The section on Brownian Motion is supplemented by the book Brownian Motion and Martingales in Analysis by Richard Durrett, which is really wonderful. Some of the slick results are taken straight from there.
• I also include an appendix with results that I found myself reviewing as I went through this material.
Contents

Chapter 1. Martingales
1. Definitions and Examples
2. Stopping times
3. Martingale Convergence Theorem
4. Applications

Chapter 2. The Law of the Iterated Logarithm
1. First Half of the Law of the Iterated Logarithm
2. Second Half of the Law of the Iterated Logarithm

Chapter 3. Ergodic Theorem
1. Motivation
2. Birkhoff's Theorem
3. Continued Fractions

Chapter 4. Brownian Motion
1. Motivation
2. Levy's Construction
3. Construction from Durrett's Book
4. Some Properties

Chapter 5. Appendix
1. Conditional Random Variables
2. Extension Theorems
CHAPTER 1

Martingales

1. Definitions and Examples

This section on Martingales contains heavy use of conditional random variables. I do a quick review of this topic from Limit Theorems 1 in the appendix.

Definition 1.1. A sequence of random variables X_0, X_1, ... is called a martingale if E(|X_n|) < ∞ for all n and, with probability 1:

E(X_{n+1} | X_0, X_1, ..., X_n) = X_n

Intuitively, this says that the average value of X_{n+1} is the same as that of X_n, even if we are given the values of X_0 to X_n. Note that conditioning on X_0, ..., X_n is just different notation for conditioning on σ(X_0, ..., X_n), which is the sigma algebra generated by preimages of Borel sets through X_0, ..., X_n. One can make more general martingales by replacing σ(X_0, ..., X_n) with an arbitrary increasing chain of sigma algebras F_n; the results here carry over to that setting too.

Example 1.2. Sometimes martingales are called "fair games". The analogy is that the random variable X_n represents the bankroll of a gambler at time n. The game is fair because, at any point in time, the gambler's expected future fortune equals his current fortune.
Definition 1.3. A submartingale is when E(X_{n+1} | X_0, X_1, ..., X_n) ≥ X_n (i.e. the capital is increasing) and a supermartingale is when E(X_{n+1} | X_0, X_1, ..., X_n) ≤ X_n (i.e. the capital is decreasing). Most of the theorems for martingales work for submartingales, just change the inequality in the right place. To avoid confusion between sub-, super-, and ordinary martingales, we will sometimes call a martingale a "fair martingale".

Example 1.4. The simple symmetric random walk, X_n = Z_0 + Z_1 + ... + Z_n with each Z_n = ±1 with probability 1/2, is a martingale. In terms of the fair game, this is gambling on the outcome of a fair coin.
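As a quick sanity check, the fair-game property E(X_n) = E(X_0) = 0 of this walk can be probed by simulation. This is a sketch I added for illustration; the step and trial counts are arbitrary choices.

```python
import random

random.seed(0)

def simple_random_walk(n_steps, rng=random):
    """One path of the fair ±1 walk: X_0 = 0, X_k = X_{k-1} + Z_k."""
    path = [0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.choice((-1, 1)))
    return path

# Monte Carlo estimate of E(X_n); for a martingale it should match E(X_0) = 0.
n_steps, n_trials = 50, 20000
mean_end = sum(simple_random_walk(n_steps)[-1] for _ in range(n_trials)) / n_trials
print(round(mean_end, 2))  # close to 0
```

The sample mean fluctuates around 0 on the order of √(n_steps/n_trials), consistent with the martingale property.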
Remark. Using the properties of conditional expectation, we see that:

E(X_{n+2} | X_0, X_1, ..., X_n) = E(E(X_{n+2} | X_0, X_1, ..., X_{n+1}) | X_0, ..., X_n)
= E(X_{n+1} | X_0, ..., X_n)
= X_n

With a simple argument by induction, we get that in general, for m ≥ n:

E(X_m | X_0, X_1, ..., X_n) = X_n

In particular, E(X_n) = E(X_0) for every n. If τ is a random "time" (a non-negative integer) that is independent of the X_n's, then E(X_τ) is a weighted average of E(X_n)'s, so we still have E(X_τ) = E(X_0). What if τ is dependent on the X_n's? In general we cannot have equality: for the simple symmetric random walk (coin-flip betting), the choice τ = first time that X_n = −1 has E(X_τ) = −1 ≠ 0 = E(X_0). The next section gives some conditions under which the equality does hold.
2. Stopping times

Definition 2.1. For a martingale {X_n}, a non-negative integer valued random variable τ is a stopping time if it has the property that:

{τ = n} ∈ σ(X_1, X_2, ..., X_n)

Intuitively, this is saying that one can determine whether τ = n just by looking at the first n steps of the martingale.

Example 2.2. In the example of the random coin flipping, if we let τ be the first time that X_n = 10, then τ is a stopping time.

Example 2.3. We are often interested in X_τ, the value of the martingale at the random time τ. This is precisely defined as X_τ(ω) = X_{τ(ω)}(ω). Another handy rewriting is: X_τ = Σ_k X_k 1_{τ=k}.
Lemma 2.4. If {X_n} is a submartingale and τ_1, τ_2 are bounded stopping times, so that ∃M s.t. 0 ≤ τ_1 ≤ τ_2 ≤ M with probability 1, then E(X_{τ_1}) ≤ E(X_{τ_2}), with equality for fair martingales.

Proof. For fixed k, the event {τ_1 < k ≤ τ_2} can be written as {τ_1 < k ≤ τ_2} = {τ_1 ≤ k − 1} ∩ {τ_2 ≤ k − 1}^C, from which we see that {τ_1 < k ≤ τ_2} ∈ σ(X_0, X_1, ..., X_{k−1}) because τ_1 and τ_2 are both stopping times. We then have the following manipulation, using a telescoping series, linearity of the expectation, the fact that E(Y 1_A) = E(E(Y | X_0, X_1, ..., X_{k−1}) 1_A) for events A ∈ σ(X_0, X_1, ..., X_{k−1}), and finally the fact that E(X_k | X_0, X_1, ..., X_{k−1}) − X_{k−1} ≥ 0 since X_n is a (sub)martingale (with equality for fair martingales):

E(X_{τ_2}) − E(X_{τ_1}) = E(X_{τ_2} − X_{τ_1})
= E( Σ_{k=τ_1+1}^{τ_2} (X_k − X_{k−1}) )
= Σ_{k=1}^{M} E( (X_k − X_{k−1}) 1_{τ_1 < k ≤ τ_2} )
= Σ_{k=1}^{M} E( (E(X_k | X_0, ..., X_{k−1}) − X_{k−1}) 1_{τ_1 < k ≤ τ_2} )
≥ 0

with equality in the last step for fair martingales. □
Theorem 2.5. Say {X_n} is a martingale and τ a bounded stopping time (that is, ∃M s.t. 0 ≤ τ ≤ M with probability 1). Then:

E(X_τ) = E(X_0)

Proof. Let υ be the random variable which is constantly 0. This is a stopping time! So by the above lemma, since 0 ≤ υ ≤ τ ≤ M, we have that E(X_τ) = E(X_υ) = E(X_0). □
Theorem 2.6. For {X_n} a martingale and τ a stopping time which is almost surely finite (that is, P(τ < ∞) = 1), we have:

E(X_τ) = E(X_0) ⟺ E(lim_{n→∞} X_{min(τ,n)}) = lim_{n→∞} E(X_{min(τ,n)})

Proof. It suffices to show that E(X_τ) = E(lim_{n→∞} X_{min(τ,n)}) and E(X_0) = lim_{n→∞} E(X_{min(τ,n)}). The first equality holds since P(τ < ∞) = 1 gives P(lim_{n→∞} X_{min(τ,n)} = X_τ) = 1, so they agree almost surely. The second holds by the above theorem concerning bounded stopping times: for any n, min(τ, n) is a bounded stopping time, so we have E(X_{min(τ,n)}) = E(X_0), and the equality holds in the limit too. □

Remark. The above theorem can be combined with things like the monotone convergence theorem or the Lebesgue dominated convergence theorem to switch the limits and conclude that E(X_τ) = E(X_0). Here are some examples:
Example 2.7. If {X_n} is a martingale and τ a stopping time so that P(τ < ∞) = 1, E(|X_τ|) < ∞, and lim_{n→∞} E(X_n 1_{τ>n}) = 0, then E(X_τ) = E(X_0).

Proof. For any n we have: X_{min(τ,n)} = X_n 1_{τ>n} + X_τ 1_{τ≤n}. Taking expectations and then the limit as n → ∞ gives:

lim_{n→∞} E(X_{min(τ,n)}) = lim_{n→∞} E(X_n 1_{τ>n}) + lim_{n→∞} E(X_τ 1_{τ≤n})
= 0 + E(X_τ)

where the first term is 0 by hypothesis, and the second limit is justified since X_τ 1_{τ≤n} → X_τ pointwise almost surely (because P(τ < ∞) = 1), and the dominating function |X_τ| with E(|X_τ|) < ∞ lets us use the Lebesgue dominated convergence theorem to conclude the convergence of the expectations. □
Example 2.8. Suppose {X_n} is a martingale and τ a stopping time so that E(τ) < ∞ and |X_{n+1} − X_n| ≤ M < ∞ for some fixed M and for every n. Then E(X_τ) = E(X_0).

Proof. Let Y = |X_0| + Mτ. Then Y can be used as a dominating function in an application of the L.D.C.T. very similar to the above example to get the conclusion. □
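The failure mode these hypotheses rule out can be seen numerically. Below is a small sketch I added (horizon and trial counts are arbitrary): for τ = first hit of −1 by a fair walk, the truncated time min(τ, n) is bounded, so E(X_{min(τ,n)}) = 0 for every n, even though X_τ = −1 almost surely, so the limit and the expectation cannot be interchanged.

```python
import random

random.seed(1)

def stopped_walk_value(horizon, rng=random):
    """Fair ±1 walk stopped at the first hit of -1, truncated at `horizon`."""
    x = 0
    for _ in range(horizon):
        x += rng.choice((-1, 1))
        if x == -1:
            break
    return x

n_trials, horizon = 20000, 100
mean_stopped = sum(stopped_walk_value(horizon) for _ in range(n_trials)) / n_trials
print(round(mean_stopped, 2))  # ≈ 0 = E(X_0), since min(τ, horizon) is bounded
```

Most samples equal −1, but the rare paths that never hit −1 before the horizon carry large positive values that balance the mean, exactly the subtlety noted in the upcrossing lemma proof below.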
3. Martingale Convergence Theorem

The proof relies on the famous upcrossing lemma:

Lemma 3.1 [The Upcrossing Lemma]. Let {X_n} be a submartingale. For fixed α, β ∈ R with β > α, and M ∈ N, let U^{α,β}_M be the number of "upcrossings" that the martingale {X_n} makes of the interval [α, β] in the time period 1 ≤ n ≤ M. (An upcrossing is when X_n goes from being less than α initially to being more than β later. Precisely this is: U^{α,β}_M = max{k : ∃ t_1 < u_1 < ... < t_k < u_k ≤ M s.t. X_{t_i} ≤ α and X_{u_i} ≥ β ∀i}.) Then:

E(U^{α,β}_M) ≤ E(|X_M − X_0|) / (β − α)
Proof. Firstly, we remark that it suffices to prove the result when the submartingale {X_n} is replaced by {max(X_n, α)}, since this is still a submartingale, it has the same number of upcrossings as X_n, and |max(X_M, α) − max(X_0, α)| ≤ |X_M − X_0|, so the inequality is only strengthened. In other words, we assume without loss of generality that X_n ≥ α for all n. This simplification is used in exactly one spot later on to get the inequality we need.

Let us now carefully nail down where the upcrossings happen. Define u_0 = v_0 = 0 and iteratively define:

u_j = min(M, inf{k > v_{j−1} : X_k ≤ α})
v_j = min(M, inf{k > u_j : X_k ≥ β})

These record the times where the martingale crosses the interval [α, β]; the u_j's record when it first crosses to below the interval, and the v_j's record crossings to above the interval. They are also truncated at time M so that they are bounded stopping times. Moreover, since these times are strictly increasing until they hit M, it must be the case that v_M = M. We have then, using some crafty telescoping sums:

E(X_M) = E(X_{v_M})
= E(X_{v_M} − X_{u_M} + X_{u_M} − X_{v_{M−1}} + X_{v_{M−1}} − ... − X_{u_1} + X_{u_1} − X_0 + X_0)
= E(X_0) + Σ_{k=1}^{M} E(X_{v_k} − X_{u_k}) + Σ_{k=1}^{M} E(X_{u_k} − X_{v_{k−1}})

The third term is non-negative! This is because u_k and v_{k−1} are both bounded stopping times with 0 ≤ v_{k−1} ≤ u_k ≤ M, so our theorem about stopping times gives that this expectation is non-negative. (This is subtle! Most of the time (when we haven't hit time M yet) we expect X_{u_k} ≤ α while X_{v_{k−1}} ≥ β, so their difference is negative. However, because of the small probability event where v_{k−1} < M and u_k = M, we get a big positive number with small probability which balances the whole expectation. Compare to the example of a simple symmetric random walk with a truncated stopping time for τ = first time that X_n = −1.)

Now the second term has Σ_{k=1}^{M} E(X_{v_k} − X_{u_k}) ≥ E((β − α) U^{α,β}_M). This is because each upcrossing counted in U^{α,β}_M contributes at least (β − α) to the sum, null cycles (where u_k = v_k = M) contribute nothing, and the possibly one incomplete cycle (where u_k < M but v_k = M) must give a non-negative contribution to the sum by the simplification that X_n ≥ α.

Hence we have:

E(X_M) ≥ E(X_0) + (β − α) E(U^{α,β}_M) + 0

which gives the desired result, since E(X_M) − E(X_0) ≤ E(|X_M − X_0|). □
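A quick numerical sketch of the lemma (my own illustration, parameters arbitrary, not from the notes): count completed upcrossings of [α, β] = [−2, 2] by a simple symmetric walk, and compare the empirical mean against a Monte Carlo estimate of E(|X_M − X_0|)/(β − α).

```python
import random

random.seed(2)

def count_upcrossings(path, alpha, beta):
    """Completed upcrossings: path dips to <= alpha, then later rises to >= beta."""
    count, below = 0, False
    for x in path:
        if not below and x <= alpha:
            below = True
        elif below and x >= beta:
            count += 1
            below = False
    return count

def walk(m, rng=random):
    path = [0]
    for _ in range(m):
        path.append(path[-1] + rng.choice((-1, 1)))
    return path

M, alpha, beta, trials = 200, -2, 2, 5000
total_ups = total_gap = 0.0
for _ in range(trials):
    p = walk(M)
    total_ups += count_upcrossings(p, alpha, beta)
    total_gap += abs(p[-1] - p[0])
mean_ups = total_ups / trials
bound = (total_gap / trials) / (beta - alpha)
print(mean_ups <= bound)  # empirical check of the upcrossing bound
```

For this walk and interval, the empirical mean number of upcrossings sits comfortably below the bound, which matches the slack discarded in the proof (the non-negative third term and the overshoot of each completed crossing).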
Theorem 3.2 [Martingale Convergence Theorem]. Let {X_n} be a submartingale with sup_n E(|X_n|) < ∞. Then there exists a random variable X so that X_n → X almost surely. (That is, X_n(ω) → X(ω) for almost all ω ∈ Ω.)

Proof. Firstly, since sup_n E(|X_n|) < ∞, by Fatou's lemma we have E(lim inf_n |X_n|) ≤ lim inf_n E(|X_n|) ≤ sup_n E(|X_n|) < ∞, from which it follows that P(|X_n| → ∞) = 0. This ensures that the X_n cannot "leak away" probability to ±∞, which would prevent the limiting random variable from being properly normalized.

Now suppose by contradiction that P(lim inf X_n < lim sup X_n) > 0, i.e. there is a non-zero probability of X_n not converging. Then use the density of the rationals and countable subadditivity to find α and β so that P(lim inf X_n < α < β < lim sup X_n) > 0. Counting the number of upcrossings X_n makes of [α, β], we see that we must have:

P( lim_{M→∞} U^{α,β}_M = ∞ ) ≥ P(lim inf X_n < α < β < lim sup X_n) > 0

Hence E(lim_{M→∞} U^{α,β}_M) = ∞. By the monotone convergence theorem, however, we have that lim_{M→∞} E(U^{α,β}_M) = E(lim_{M→∞} U^{α,β}_M) = ∞.

But now we have reached a contradiction! For by the upcrossing lemma:

lim_{M→∞} E(U^{α,β}_M) ≤ lim_{M→∞} E(|X_M − X_0|)/(β − α) ≤ 2 sup_n E(|X_n|)/(β − α) < ∞ □
4. Applications

Theorem 4.1 [Levy]. Suppose Z is a random variable with E(|Z|) < ∞, and that {F_n} is a decreasing chain of σ-algebras, F_1 ⊃ F_2 ⊃ ... (This is saying that they are getting coarser and coarser.) Let F_∞ = ∩_n F_n. Then we have almost surely:

lim_{n→∞} E(Z|F_n) = E(Z|F_∞)

Proof. We first prove that there is an almost sure limit using the martingale convergence theorem, and then we check the defining properties of E(Z|F_∞) to verify that this is indeed the limit.

Firstly, let X_n = E(Z|F_n). Then for any fixed M ∈ N we have that the sequence X_M, X_{M−1}, ..., X_2, X_1 is a martingale. (Here we are referring to a slightly more general martingale than in our original definition: the sigma algebra σ(X_1, X_2, ...) in the definition is replaced by arbitrary increasing sigma algebras F_n. The martingale property follows from the fact that E(E(Z|F)|G) = E(Z|G) when G ⊂ F.) Notice that we had to reverse the order of the sequence to get the sigma algebras to increase (i.e. get finer and finer), so that we really have a martingale. For this reason, the martingale convergence theorem does not apply directly, but the idea of the proof will still work. Suppose by contradiction, as in the proof of the martingale convergence theorem, that P(lim inf X_n < lim sup X_n) > 0. Then, as before, find α and β so that P(lim inf X_n < α < β < lim sup X_n) > 0. Since there are then infinitely many crossings of the interval [α, β], the number of downcrossings D^{α,β}_M has P(lim_{M→∞} D^{α,β}_M = ∞) > 0, and so E(lim_{M→∞} D^{α,β}_M) = ∞. Hence, since D^{α,β}_M is increasing in M (the number of downcrossings can only increase if we wait longer), we may find an M_0 ∈ N so that E(D^{α,β}_{M_0}) > 2E(|Z|)/(β − α).
Taking now the martingale sequence X_{M_0}, X_{M_0−1}, ..., X_2, X_1, we have a violation of the upcrossing lemma, just as we did in the martingale convergence theorem.

Next, to verify that the limit is indeed E(Z|F_∞), we just need to check the two defining properties, namely that it is F_∞-measurable and that it has the correct expectation value for events in F_∞. lim_{n→∞} E(Z|F_n) is F_∞-measurable, since F_∞ ⊂ F_n for every n, meaning that E(Z|F_n) is F_∞-measurable for every n, and so the limit is too.

To see that lim_{n→∞} E(Z|F_n) takes the correct expectations for events in F_∞, notice that for any A ∈ F_∞ ⊂ F_n we have, for every n, that E(E(Z|F_n) 1_A) = E(Z 1_A) since A ∈ F_n, so in the limit lim_{n→∞} E(E(Z|F_n) 1_A) = E(Z 1_A). Hence the problem of proving that E(lim_{n→∞} E(Z|F_n) 1_A) = E(Z 1_A) is reduced to an interchange of a limit with an expectation. If Z is bounded, this is justified by the bounded convergence theorem. For Z not bounded, truncating Z by Z 1_{|Z|≤N}, with a bit more work, will give the same interchange of limits. □
Theorem 4.2 [Levy]. Suppose Z is a random variable with E(|Z|) < ∞, and that {F_n} is an increasing chain of σ-algebras, F_1 ⊂ F_2 ⊂ ... (This is saying that they are getting finer and finer.) Let F_∞ = σ(∪_n F_n). Then we have almost surely:

lim_{n→∞} E(Z|F_n) = E(Z|F_∞)

Proof. This proof is like the last one. In this case E(Z|F_n) really is a martingale (not backwards), so an almost sure limit exists by the martingale convergence theorem. Some more work here is needed... I think you get the desired property by approximation with "tame events": for A ∈ F_∞ and every ɛ > 0, there exists A_n ∈ F_n such that P(A △ A_n) < ɛ. □

Remark. This result is often known as the "Levy Zero-One Law", since a common application is to consider an event A ∈ F_∞, for which the theorem tells us that:

lim_{n→∞} P(A|F_n) = lim_{n→∞} E(1_A|F_n)
= E(1_A|F_∞)
= 1_A

where the last equality holds since A is F_∞-measurable. This says in particular that the limit of these probabilities is either 0 or 1, since these are the only two values taken on by 1_A. In this setting, the theorem gives a short proof of the Kolmogorov zero-one law.
Theorem 4.3 [Kolmogorov Zero-One Law]. Let X_1, X_2, ... be an infinite sequence of i.i.d. random variables. Define:

F_n = σ( ∪_{k=1}^{n} σ(X_k) )
F_∞ = σ( ∪_{n=1}^{∞} F_n )
F_tail = ∩_{n=1}^{∞} σ( ∪_{k=n}^{∞} σ(X_k) )
Then any event A ∈ F_tail has either P(A) = 0 or P(A) = 1. These are the events which do not depend on finitely many of the X_n's.

Proof. Let A ∈ F_tail. For any n ∈ N we have that P(A) = P(A|F_n) = E(1_A|F_n), since A ∈ F_tail does not depend on the first n variables, so its conditional expectation is a constant. We have then (as in the above "Levy 0-1" remark):

P(A) = lim_{n→∞} P(A|F_n) = 1_A

since A ∈ F_∞. So indeed, the only values of P(A) that are possible are 0 and 1. □
Theorem 4.4 [Strong Law of Large Numbers]. Suppose X_1, X_2, ... are i.i.d. with E(|X_1|) < ∞. Then we have almost surely that:

lim_{n→∞} (X_1 + X_2 + ... + X_n)/n = E(X_1)

Proof. Define S_n = X_1 + X_2 + ... + X_n, and let F_n = σ(∪_{k=n}^{∞} σ(S_k)) be the sigma algebra of the tail S_n, S_{n+1}, .... We now claim that:

E(X_1|F_n) = S_n/n

This can be seen in the following slick way. First notice that, by symmetry, we must have E(X_1|F_n) = E(X_2|F_n) = ... = E(X_n|F_n). By linearity now: Σ_{k=1}^{n} E(X_k|F_n) = E(Σ_{k=1}^{n} X_k | F_n) = E(S_n|F_n) = S_n, since S_n is F_n-measurable. Hence, since they are all equal and sum to S_n, we get E(X_1|F_n) = S_n/n as desired. By Levy's theorem now:

lim_{n→∞} S_n/n = lim_{n→∞} E(X_1|F_n) = E(X_1 | ∩_k F_k)

From here, one can use the Hewitt-Savage zero-one law (which says that permutation invariant events have a zero-one law) to see that the whole sigma algebra ∩_k F_k must be the trivial one, so then E(X_1 | ∩_k F_k) = E(X_1). Alternatively, once we have concluded that such an almost sure limit exists, one could then remark by the Kolmogorov zero-one law that the limit must be a constant (for lim_{n→∞} S_n/n does not depend on finitely many of the X_n's, so any event of the type {lim S_n/n < α} must have probability 0 or 1; by taking a sup over α we can find that it must be a constant). Combining this with the above, using the fact that conditional expectations preserve the expectation, shows the constant is indeed E(X_1). □
Theorem 4.5 [Hewitt-Savage Zero-One Law]. Let X_1, X_2, ... be an infinite sequence of i.i.d. random variables. Let A be an event which is unchanged under finite permutations of the indices of the X_i's. (E.g. for every finite permutation Π, ω = (x_1, x_2, ...) ∈ A iff Π(ω) = (x_{Π(1)}, x_{Π(2)}, ...) ∈ A, i.e. Π(A) = A.) Then P(A) = 0 or 1.

Proof. We call an event "tame" if it only depends on finitely many of the X_i's. The proof is a consequence of the fact that for any ɛ > 0, any event A can be approximated by a tame event B so that P(B △ A) < ɛ. (This is completely analogous to the fact that, for the usual Lebesgue measure on R, one can approximate any measurable set S by a finite union of open intervals I_i so that λ(∪_{i=1}^{n} I_i △ S) < ɛ. This comes from the definition of the Lebesgue measure as the inf of the outer measure with open sets, and the fact that every open set is a union of countably many intervals, of which only finitely many are needed to be within ɛ/2. In the same vein, the probability measure on the infinite sequence of events is generated by the outer measure from tame events. This is usually all packaged up in the Caratheodory extension theorem.) Once we have this tame event B, depending only on X_1, ..., X_n, we let Π be the permutation that swaps 1, ..., n with n+1, ..., 2n, so that B and Π(B) are independent events. Have then:

P(A) ≈ P(A ∩ B)
= P(Π(A) ∩ Π(B))
= P(A ∩ Π(B))
≈ P(B ∩ Π(B))
= P(B) P(Π(B))
= P(B)^2
≈ P(A)^2

where each of the approximations holds within ɛ by the choice of B. Since we can do this for every ɛ > 0, we get P(A) = P(A)^2 and the result follows. □
CHAPTER 2

The Law of the Iterated Logarithm

We will prove that for a sequence of i.i.d. random variables X_1, X_2, ... with mean 0 and variance 1, and S_n = Σ_{i=1}^{n} X_i:

P( lim sup_{n→∞} S_n/√(n log(log n)) = √2 ) = 1

This result gives us finer information about these sums than the law of large numbers or the central limit theorem. We need the theory of martingales to get Doob's inequality, and then a bunch of other sneaky tricks, like the Borel-Cantelli lemmas, to get the result. We will also need a few analytic-type estimates along the way. (Actually, our proof here will only prove the case where the X_n's are ±1 with probability 1/2 each. The result can be generalized by using even finer estimates.)
1. First Half of the Law of the Iterated Logarithm

To start, we will first prove some helpful lemmas.

Lemma 1.1 [Doob's Inequality]. For a submartingale Z_n, we have for any α > 0 that:

P( max_{0≤i≤n} Z_i ≥ α ) ≤ E(|Z_n|)/α

Proof. (Taken from Rosenthal.) Let A_k be the event {Z_k ≥ α, but Z_i < α for i < k}, i.e. that the process reaches α for the first time at time k. These are disjoint events with A = ∪_k A_k = {(max_{0≤i≤n} Z_i) ≥ α}, which is the event we want. Now consider:

α P(A) = Σ_{k=0}^{n} α P(A_k)
= Σ_{k=0}^{n} E(α 1_{A_k})
≤ Σ_{k=0}^{n} E(Z_k 1_{A_k})   since Z_k ≥ α on A_k
≤ Σ_{k=0}^{n} E(E(Z_n | Z_0, Z_1, ..., Z_k) 1_{A_k})   since it's a submartingale
= Σ_{k=0}^{n} E(Z_n 1_{A_k})
= E(Z_n 1_A)
≤ E(|Z_n|)

And the result follows. □
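Since the fair ±1 walk is in particular a submartingale, Doob's inequality applies to it directly, and both sides can be estimated by simulation. A sketch I added for illustration (n, α, and trial counts are arbitrary):

```python
import random

random.seed(3)

def max_and_end(n, rng=random):
    """Running maximum and endpoint of a fair ±1 walk of length n."""
    x = best = 0
    for _ in range(n):
        x += rng.choice((-1, 1))
        best = max(best, x)
    return best, x

n, alpha, trials = 100, 10, 20000
hits = abs_end = 0.0
for _ in range(trials):
    best, end = max_and_end(n)
    hits += best >= alpha
    abs_end += abs(end)
p_hat = hits / trials                 # estimates P(max_{0<=i<=n} S_i >= alpha)
bound = (abs_end / trials) / alpha    # estimates E(|S_n|)/alpha
print(p_hat <= bound)
```

For these parameters the hitting probability is roughly a third of the Doob bound, illustrating that the bound controls the whole path by the endpoint, at the price of some slack.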
Remark. This is a "rich man's version" of Chebyshev-type inequalities, which are proved using the same trick as in lines 3 and 4 of the inequality train above. The fact that the behavior of the whole martingale can be controlled by the end point of the martingale gives us the little extra oomph we need.
Lemma 1.2 [Hoeffding's Inequality]. Let Y be a random variable so that E(Y) = 0, and a, b ∈ R so that a ≤ Y ≤ b almost surely. Then E(e^{tY}) ≤ e^{t^2(b−a)^2/8}.

Proof. Write Y as a convex combination of a and b: Y = αb + (1 − α)a where α = (Y − a)/(b − a). By convexity of e^x, we then have:

e^{tY} ≤ ((Y − a)/(b − a)) e^{tb} + ((b − Y)/(b − a)) e^{ta}

Taking expectations (and using E(Y) = 0), we have:

E(e^{tY}) ≤ (−a/(b − a)) e^{tb} + (b/(b − a)) e^{ta} = e^{g(t(b−a))}

for g(u) = −γu + log(1 − γ + γe^u) and γ = −a/(b − a). Notice g(0) = g′(0) = 0 and g″(u) ≤ 1/4 for all u. Hence by Taylor's theorem:

g(u) = g(0) + u g′(0) + (u^2/2) g″(ξ) ≤ 0 + 0 + (u^2/2)(1/4) = u^2/8

So then E(e^{tY}) ≤ e^{g(t(b−a))} ≤ e^{t^2(b−a)^2/8}. □
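For the coin-flip steps used below, the bound can be checked exactly: with Y = ±1 (so a = −1, b = 1), E(e^{tY}) = cosh(t) and the lemma claims cosh(t) ≤ e^{t²/2}. A deterministic sketch I added:

```python
import math

# Y = ±1 with probability 1/2 each: a = -1, b = 1, so t^2 (b-a)^2 / 8 = t^2/2.
# E(e^{tY}) = (e^t + e^{-t})/2 = cosh(t); Hoeffding gives cosh(t) <= e^{t^2/2}.
ok = all(math.cosh(t) <= math.exp(t * t / 2)
         for t in (x / 10 for x in range(-50, 51)))
print(ok)  # True
```

Comparing the Taylor series term by term, cosh(t) = Σ t^{2k}/(2k)! while e^{t²/2} = Σ t^{2k}/(2^k k!), and (2k)! ≥ 2^k k!, which is why the check succeeds for every t.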
Lemma 1.3. Let X_1, X_2, ... be i.i.d. with P(X_1 = ±1) = 1/2, and S_n = Σ_{k=1}^{n} X_k. Then P(max_{k≤n} S_k > λ) ≤ e^{−λ^2/2n}.

Proof. Using Doob's inequality (applied to the submartingale e^{tS_k}) and Hoeffding's inequality, for any t > 0 we have:

P(max_{k≤n} S_k > λ) = P(max_{k≤n} e^{tS_k} > e^{tλ})
≤ e^{−tλ} E(e^{tS_n})
= e^{−tλ} E(e^{tX_1})^n
≤ e^{−tλ} e^{nt^2(b−a)^2/8}

Set t = 4λ/(n(b − a)^2) to get:

P(max_{k≤n} S_k > λ) ≤ e^{−(4λ/(n(b−a)^2))λ} e^{n(4λ/(n(b−a)^2))^2 (b−a)^2/8} = e^{−2λ^2/(n(b−a)^2)}

For simple symmetric steps we have a = −1 and b = 1, so this gives the result. □
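The resulting tail bound is easy to probe numerically. Below is a sketch I added (seed, n, and λ are arbitrary choices): the empirical frequency of {max_{k≤n} S_k > λ} sits below e^{−λ²/2n}.

```python
import math
import random

random.seed(4)

def max_prefix_sum(n, rng=random):
    """Running maximum of a fair ±1 walk of length n."""
    x = best = 0
    for _ in range(n):
        x += rng.choice((-1, 1))
        best = max(best, x)
    return best

n, lam, trials = 100, 20, 20000
p_hat = sum(max_prefix_sum(n) > lam for _ in range(trials)) / trials
bound = math.exp(-lam * lam / (2 * n))   # e^{-λ²/2n} = e^{-2} ≈ 0.135 here
print(p_hat <= bound)
```

The true probability is a few times smaller than the bound, which is consistent with the Chernoff-style argument giving the right exponential rate but a loose constant.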
Theorem 1.4. Let X_1, X_2, ... be i.i.d. with P(X_1 = ±1) = 1/2, and S_n = Σ_{k=1}^{n} X_k. Then for any ɛ > 0:

P( lim sup_{n→∞} S_n/√(n log(log n)) > √(2 + ɛ) ) = 0

Or in other words, since this holds for any value of ɛ > 0:

P( lim sup_{n→∞} S_n/√(n log(log n)) ≤ √2 ) = 1

Proof. Fix some θ > 1 (the choice will be made more precise later). We will show that, with the correct choice of θ, the events A_n = {S_k > √((2 + ɛ) k log(log k)) for some k, θ^{n−1} ≤ k < θ^n} happen only finitely many times, which will show that the limsup can't be more than √(2 + ɛ). To do this it suffices to show that P(A_n) is summable, because then the Borel-Cantelli lemma will show that A_n happens finitely often with probability 1. We have (using our previous lemma):

P(A_n) = P( S_k > √((2 + ɛ) k log(log k)) for some k, θ^{n−1} ≤ k < θ^n )
≤ P( S_k > √((2 + ɛ) θ^{n−1} log(log θ^{n−1})) for some k, θ^{n−1} ≤ k < θ^n )
≤ P( max_{k≤θ^n} S_k > √((2 + ɛ) θ^{n−1} log(log θ^{n−1})) )
≤ exp( −(2 + ɛ) θ^{n−1} log(log θ^{n−1}) / (2θ^n) )
= exp( −((2 + ɛ)/(2θ)) (log(n − 1) + log(log θ)) )
≈ exp( −(1 + ɛ/2) θ^{−1} log(n − 1) )   for large n

So choosing θ < 1 + ɛ/2 gives us that (1 + ɛ/2) θ^{−1} > 1, so this is:

P(A_n) ≤ (n − 1)^{−(1+ɛ/2)θ^{−1}}

from which we see that P(A_n) is summable (it's a p-series!). By the Borel-Cantelli lemma, this means that A_n happens only finitely many times with probability 1, which is the desired result. □
2. Second Half of the Law of the Iterated Logarithm

To prove the other half, we need some more estimates.

Lemma 2.1 [Mill's Inequality]. This is an estimate concerning the tail of the Gaussian density:

(λ/(λ^2 + 1)) e^{−λ^2/2} ≤ ∫_λ^∞ e^{−y^2/2} dy ≤ (1/λ) e^{−λ^2/2}

Proof. To prove the lower bound, we find a remarkable anti-derivative:

∫_λ^∞ e^{−y^2/2} dy ≥ ∫_λ^∞ e^{−y^2/2} (y^4 + 2y^2 − 1)/(y^4 + 2y^2 + 1) dy
= [ −(y/(y^2 + 1)) e^{−y^2/2} ]_λ^∞
= (λ/(λ^2 + 1)) e^{−λ^2/2}

The upper bound is found by using the estimate y/λ ≥ 1 in the range of integration:

∫_λ^∞ e^{−y^2/2} dy ≤ ∫_λ^∞ (y/λ) e^{−y^2/2} dy = (1/λ) [ −e^{−y^2/2} ]_λ^∞ = (1/λ) e^{−λ^2/2} □
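Both sides of Mill's inequality can be checked against the exact Gaussian tail, which the Python standard library exposes through the complementary error function: ∫_λ^∞ e^{−y²/2} dy = √(π/2)·erfc(λ/√2). (A sketch I added; the grid of λ values is arbitrary.)

```python
import math

def gaussian_tail(lam):
    """Exact ∫_λ^∞ e^{-y²/2} dy via the complementary error function."""
    return math.sqrt(math.pi / 2) * math.erfc(lam / math.sqrt(2))

ok = all(
    (lam / (lam**2 + 1)) * math.exp(-lam**2 / 2)
    <= gaussian_tail(lam)
    <= (1 / lam) * math.exp(-lam**2 / 2)
    for lam in (x / 10 for x in range(1, 51))
)
print(ok)  # True
```

The lower bound gets sharp as λ grows (both sides behave like (1/λ)e^{−λ²/2} to leading order), which is exactly why it is strong enough for the second half of the proof below.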
Theorem 2.2. Let X_1, X_2, ... be i.i.d. with P(X_1 = ±1) = 1/2, and S_n = Σ_{k=1}^{n} X_k. Then for any ɛ > 0:

P( lim sup_{n→∞} S_n/√(n log(log n)) ≥ √(2 − 2ɛ) ) = 1

Or in other words, since this holds for any value of ɛ > 0:

P( lim sup_{n→∞} S_n/√(n log(log n)) ≥ √2 ) = 1

Proof. As in the proof of the other half of the law, the idea is to prove that the appropriate events happen infinitely often using the Borel-Cantelli lemmas. Fix θ > 1 (the choice will be made precise later). Let:

B_n = { S_{θ^n} − S_{θ^{n−1}} ≥ √((2 − ɛ) θ^n log(log θ^n)) }

We will show that these occur infinitely often and then show why this gives the result. Notice that the B_n's are independent, as each B_n depends only on the values of X_k for θ^{n−1} < k ≤ θ^n, so to prove that B_n happens i.o. it suffices to show, via the second Borel-Cantelli lemma, that P(B_n) is not summable. Consider:

P(B_n) = P( S_{θ^n} − S_{θ^{n−1}} ≥ √((2 − ɛ) θ^n log(log θ^n)) )
= P( S_{θ^n − θ^{n−1}} ≥ √((2 − ɛ) θ^n log(log θ^n)) )
≈ (1/√(2π)) ∫_λ^∞ e^{−y^2/2} dy,   λ = √((2 − ɛ) θ^n log(log θ^n)) / √(θ^n − θ^{n−1})

where the first equality holds using the Markov property of the sums (equivalently, look at the definition as sums of the X_i's and the fact that the X_i's are i.i.d.), and the second is coming asymptotically from the central limit theorem as θ^n − θ^{n−1} → ∞. Now use Mill's inequality with this λ to get:

P(B_n) ≥ (1/√(2π)) (λ/(λ^2 + 1)) e^{−λ^2/2} = (1/√(2π)) (1/(λ + λ^{−1})) e^{−λ^2/2}
But now notice that λ = √((2 − ɛ) θ^n log(log θ^n)) / √(θ^n − θ^{n−1}) ≈ √(2 − ɛ) √(log n)/√(1 − θ^{−1}), so λ^2 ≈ (2 − ɛ)(log n)/(1 − θ^{−1}). So our estimate is:

P(B_n) ≥ C (√(log n) + (√(log n))^{−1})^{−1} exp( −((2 − ɛ)/(2(1 − θ^{−1}))) log n )
≥ C n^{−(1−ɛ/2)/(1−θ^{−1})} (log n)^{−1/2}

where the C's are some constants. By choosing θ large enough, (1 − ɛ/2)/(1 − θ^{−1}) < 1, and this will not be summable! We have then that B_n occurs infinitely often.
Now, we will show that these events $B_n$ occurring infinitely often is enough to see that $S_{\theta^n} \ge \sqrt{(2-2\epsilon)\theta^n \log(\log \theta^n)}$ infinitely often too. To do this we will use the first half of the law of the iterated logarithm we already proved, namely that for any $\eta > 0$ the events $\{S_k > \sqrt{(2+\eta)k\log(\log k)}\}$ happen only finitely often with probability 1. By symmetry, we'll have that the events $\{S_k < -\sqrt{(2+\eta)k\log(\log k)}\}$ happen only finitely often too. Hence the events $A_n = \{S_{\theta^{n-1}} < -\sqrt{(2+\eta)\theta^{n-1}\log(\log \theta^{n-1})}\}$ happen only finitely often with probability 1. Now, since the $B_n$'s occur infinitely often with probability 1 and the $A_n$'s occur only finitely often with probability 1, the events $B_n \cap A_n^c$ will occur infinitely often with probability 1 too. This gives us the infinite sequence we need, for on the event $B_n \cap A_n^c$ we have the inequalities:
$$S_{\theta^n} - S_{\theta^{n-1}} \ge \sqrt{(2-\epsilon)\theta^n\log(\log\theta^n)}$$
$$S_{\theta^{n-1}} \ge -\sqrt{(2+\eta)\theta^{n-1}\log(\log\theta^{n-1})}$$
Hence, with probability 1, we have that for infinitely many values of $n$:
$$S_{\theta^n} \ge \sqrt{(2-\epsilon)\theta^n\log(\log\theta^n)} + S_{\theta^{n-1}}$$
$$\ge \sqrt{(2-\epsilon)\theta^n\log(\log\theta^n)} - \sqrt{(2+\eta)\theta^{n-1}\log(\log\theta^{n-1})}$$
$$\ge \sqrt{(2-\epsilon)\theta^n\log(\log\theta^n)} - \sqrt{\frac{2+\eta}{\theta}\,\theta^{n}\log(\log\theta^{n})}$$
$$= \left(\sqrt{2-\epsilon} - \sqrt{\frac{2+\eta}{\theta}}\right)\sqrt{\theta^n\log(\log\theta^n)}$$
So by fixing $\eta$ (any choice will do) and then choosing $\theta$ large enough, we can make the coefficient $\sqrt{2-\epsilon} - \sqrt{\frac{2+\eta}{\theta}} \ge \sqrt{2-2\epsilon}$. (Note that this doesn't disrupt our previous choice of $\theta$, because that too was a choice to make $\theta$ large, so we can always find $\theta$ big enough to suit both our needs.) We then have that for infinitely many $n$:
$$\frac{S_{\theta^n}}{\sqrt{\theta^n\log(\log\theta^n)}} \ge \sqrt{2-2\epsilon}$$
So then:
$$P\left(\limsup_{n\to\infty} \frac{S_n}{\sqrt{n\log(\log n)}} \ge \sqrt{2-2\epsilon}\right) = 1$$
The two halves of the law of the iterated logarithm give the full result:
$$P\left(\limsup_{n\to\infty} \frac{S_n}{\sqrt{n\log(\log n)}} = \sqrt{2}\right) = 1$$
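As a small numerical aside (my own illustration, not from the course), the statement can be probed with a simple $\pm 1$ random walk. Convergence in the law of the iterated logarithm is notoriously slow, so at any accessible $n$ the running maximum of $S_n/\sqrt{n\log\log n}$ only loosely approaches $\sqrt{2}$; the sketch below just shows the statistic stays in the right ballpark. The function name `lil_statistic` is mine.

```python
import math
import random

random.seed(42)

def lil_statistic(n_steps):
    """Running max over n of S_n / sqrt(n log log n) for a +-1 walk."""
    s, best = 0, 0.0
    for n in range(1, n_steps + 1):
        s += random.choice((-1, 1))
        if n >= 10:  # need log log n > 0
            best = max(best, s / math.sqrt(n * math.log(math.log(n))))
    return best

stat = lil_statistic(200_000)
print(stat)  # compare with the a.s. limsup sqrt(2) ~ 1.414
```

A serious comparison would need astronomically large $n$, since the normalization grows like $\sqrt{\log\log n}$.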
CHAPTER 3<br />
Ergodic Theorem<br />
1. Motivation<br />
The study of ergodic theory was first motivated by statistical mechanics, where one is interested in the long-term averages of systems. For example, say we have some particles with positions $Q(t)$ and momenta $P(t)$ at time $t$. Let $f$ be a function on this state space; for example $f$ might be the pressure, temperature, or some other macroscopic variable. Can we find a distribution $G$ so that:
$$\lim_{T\to\infty} \frac{1}{T}\int_0^T f(Q(s), P(s))\,ds = \int f\,dG\,?$$
Gibbs et al. worked on this problem, and it turns out that $G = \frac{1}{Z}e^{-H/kT}$, with $Z$ the partition function, $H$ the Hamiltonian, $T$ the temperature, and $k$ Boltzmann's constant, has this property! These types of long-term averages can be very useful. We will start with a simple example.
Example 1.1. Let $\Omega = [0,1) = \{\theta : 0 \le \theta < 1\}$, where we think of $\Omega$ as a circle with perimeter 1 (and $\theta$ the position on the circle). For some fixed angle $\omega$, let $T : \Omega \to \Omega$ be rotation by $\omega$, that is $T(\theta) = \theta + \omega \bmod 1$. This is clearly measure preserving in the sense that for any set $B$ we have $m(B) = m(T^{-1}(B))$, where $m$ is the usual Lebesgue measure. Could it be that:
$$\lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = \int_0^1 f(s)\,ds\,?$$
If $\omega$ is rational, this doesn't have a chance, because $T^n$ eventually cycles back to the identity, so $T^n x$ will only sample finitely many points. However, if $\omega$ is irrational, this is true! We can prove it in this case using Fourier analysis. When $f(x) = e^{2\pi i m x}$ for $m \in \mathbb{Z}$, $m \ne 0$, we have the geometric series:
$$\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = \frac{1}{N}\sum_{n=0}^{N-1} e^{2\pi i m(x+n\omega)} = \frac{1}{N}\,e^{2\pi i m x}\,\frac{e^{2\pi i m N\omega}-1}{e^{2\pi i m\omega}-1} \to 0 = \int_0^1 f(s)\,ds$$
where the fact that $\omega$ is irrational ensures that $e^{2\pi i m\omega} - 1 \ne 0$. In the case $m = 0$, $f$ is constant, so of course the result holds. Now for any $f \in C^2(\Omega)$, we can expand $f$ as a Fourier series to see that the result holds. This lets us calculate, for example: if $f = 1_{(a,b)}$, notice that
$$\lim_{N\to\infty} \frac{\#\{k \le N : x + k\omega \in (a,b)\}}{N} = b - a$$
since $\frac{\#\{k\le N : x+k\omega\in(a,b)\}}{N} = \frac{1}{N}\sum_{n=0}^{N-1} f(T^n x)$. By approximating $f$ by $C^2$ functions (in the $L^1$ sense) from above and below, and applying the limit calculated above, we get the result.
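This equidistribution is easy to check numerically (my own quick illustration, not from the lecture): for an irrational angle such as $\omega = \sqrt{2}-1$, the orbit $x + n\omega \bmod 1$ should land in $(a,b)$ with frequency $b-a$.

```python
import math

omega = math.sqrt(2) - 1  # an irrational rotation angle
x0, a, b, N = 0.2, 0.3, 0.7, 100_000

# Birkhoff average of f = 1_{(a,b)} along the rotation orbit
hits = sum(a < (x0 + n * omega) % 1 < b for n in range(N))
print(hits / N)  # should be close to b - a = 0.4
```

For angles with bounded continued-fraction entries, like $\sqrt{2}-1$, the discrepancy decays like $\log N / N$, so the agreement at $N = 10^5$ is already very tight.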
Is there a way we can do this kind of thing using probability methods (rather than Fourier)? The next result is a nice theorem in this direction.
2. Birkhoff’s Theorem<br />
Theorem 2.1. [Birkhoff–Khinchin Ergodic Theorem] Say $(\Omega, \mathcal{F}, P)$ is a probability space. Suppose $T : \Omega \to \Omega$ is a measure preserving map, in the sense that $P(T^{-1}(B)) = P(B)$ for all $B \in \mathcal{F}$. Let $\mathcal{F}_0 = \{A \in \mathcal{F} : T^{-1}A = A \text{ a.e.}\}$ be the field of $T$-invariant events. For $f : \Omega \to \mathbb{R}$ a random variable with $E(|f|) < \infty$, we have almost surely:
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = E(f \mid \mathcal{F}_0)$$
Corollary 2.2. In the case that $\mathcal{F}_0$ is the trivial field, $E(f\mid\mathcal{F}_0) = E(f)$ is a constant, so this is exactly the thing we had above. This happens precisely when $T^{-1}A = A \Rightarrow P(A) = 0$ or $1$. In this case we say that the map $T$ is "ergodic".
The proof of this theorem relies on the following lemma.<br />
Lemma 2.3. [Maximal Ergodic Lemma] Say $(\Omega, \mathcal{F}, P)$ is a probability space. Suppose $T : \Omega \to \Omega$ is a measure preserving map, in the sense that $P(T^{-1}(B)) = P(B)$ for all $B \in \mathcal{F}$. Say $f : \Omega \to \mathbb{R}$ is a random variable with $E(|f|) < \infty$. Let $S_n = \sum_{k=0}^{n-1} f(T^k x)$ and let $A = \{\sup_{n\ge 1} S_n > 0\}$ be the event that this is positive at some point. Then:
$$E(f 1_A) = \int_A f\,dP \ge 0$$
Proof. Define $f^+(x) = f(Tx)$ and let $m_n = \max\{0, S_1, S_2, \ldots, S_n\}$; define $m_n^+$ in the same way, replacing $f$ by $f^+$ in the definition of the $S_k$. Notice that by this definition the $m_n$'s are non-decreasing, and that the event $A = \{\sup_{n\ge1} S_n > 0\}$ is the same as saying $m_n > 0$ for $n$ large enough. For this reason, it will be enough to restrict our attention to the events $\{m_n > 0\}$. Notice that if we are on the event $\{m_n > 0\}$ then we have:
$$S_1 + m_n^+ = S_1 + \max\{0, S_1^+, S_2^+, \ldots, S_n^+\}$$
$$= S_1 + \max\{0, S_2 - S_1, S_3 - S_1, \ldots, S_{n+1} - S_1\}$$
$$= \max\{S_1, S_2, \ldots, S_{n+1}\}$$
$$= m_{n+1}$$
where we used $S_n^+ = \sum_{k=0}^{n-1} f(T^k Tx) = \sum_{k=1}^{n} f(T^k x) = S_{n+1} - S_1$ in the second equality, and we used that we're on the event $\{m_n > 0\}$ in the last step to see the last equality. We have then:
$$E\left(f 1_{\{m_n>0\}}\right) = E\left(S_1 1_{\{m_n>0\}}\right)$$
$$= E\left((m_{n+1} - m_n^+)\, 1_{\{m_n>0\}}\right)$$
$$= E\left(m_{n+1} 1_{\{m_n>0\}}\right) - E\left(m_n^+ 1_{\{m_n>0\}}\right)$$
$$\ge E\left(m_{n+1} 1_{\{m_n>0\}}\right) - E\left(m_n^+\right)$$
The last inequality holds since on the event $\{m_n = 0\}$ we have $S_1 \le 0$, so $m_n^+ = m_{n+1} - S_1 \ge m_{n+1} \ge 0$, so $E\left(m_n^+ 1_{\{m_n=0\}}\right) \ge 0$. Hence $E(m_n^+) = E\left(m_n^+ 1_{\{m_n>0\}}\right) + E\left(m_n^+ 1_{\{m_n=0\}}\right) \ge E\left(m_n^+ 1_{\{m_n>0\}}\right)$. From here, we note that $E(m_n^+) = E(m_n)$ since the map $T$ is measure preserving, and the only difference between $m_n^+$ and $m_n$ is the map $x \to Tx$. We then have:
$$E\left(f 1_{\{m_n>0\}}\right) \ge E\left(m_{n+1} 1_{\{m_n>0\}}\right) - E(m_n)$$
$$= E\left(m_{n+1} 1_{\{m_n>0\}}\right) - E\left(m_n 1_{\{m_n>0\}}\right)$$
$$= E\left((m_{n+1} - m_n) 1_{\{m_n>0\}}\right)$$
$$\ge 0$$
The second equality holds since $m_n \ge 0$ always holds (so $m_n 1_{\{m_n>0\}} = m_n$), and the last inequality holds since the $m_n$'s are non-decreasing. Finally, to get the result, notice that $\{m_n > 0\}$ increases to $\{\sup S_n > 0\}$, so by a monotone convergence theorem result, we have:
$$E\left(f 1_{\{\sup S_n > 0\}}\right) = \lim_{n\to\infty} E\left(f 1_{\{m_n>0\}}\right) \ge 0 \qquad \square$$
With this in hand, we can prove Birkhoff’s theorem:<br />
Theorem 2.4. [Birkhoff–Khinchin Ergodic Theorem] Say $(\Omega, \mathcal{F}, P)$ is a probability space. Suppose $T : \Omega \to \Omega$ is a measure preserving map, in the sense that $P(T^{-1}(B)) = P(B)$ for all $B \in \mathcal{F}$. Let $\mathcal{F}_0 = \{A \in \mathcal{F} : T^{-1}A = A \text{ a.e.}\}$ be the field of $T$-invariant events. For $f : \Omega \to \mathbb{R}$ a random variable with $E(|f|) < \infty$, we have almost surely:
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = E(f \mid \mathcal{F}_0)$$
Proof. Firstly, we will argue that $\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x)$ converges a.s. to some random variable, and then we (as usual) check that it has the two defining properties of conditional expectation.

Define $S_N = \sum_{n=0}^{N-1} f(T^n x)$ as before, so that we are interested in the averages $S_n/n$. Suppose by contradiction that $S_n/n$ does not converge a.s. By the usual trick with rational numbers, we can then find $a, b \in \mathbb{R}$ so that the event $A = \left\{\liminf \frac{S_n}{n} < a < b < \limsup \frac{S_n}{n}\right\}$ has $P(A) > 0$. Notice moreover that $A$ is a $T$-invariant event, i.e. $x \in A \Rightarrow Tx \in A$, since applying $T$ shifts the terms in $S_n$ by one, which does not affect the limsup or liminf of $S_n/n$. (Indeed, these don't depend on any finite number of the terms!) For this reason, we may define a new probability measure on the set $A$: we think of $(A, \tilde{\mathcal{F}}, \tilde{P})$ as a new probability space, with $\tilde{\mathcal{F}} = \{A \cap B : B \in \mathcal{F}\}$ and $\tilde{P}(E) = P(E)/P(A)$. The fact that $A$ is $T$-invariant means that $T^n x \in A$ whenever $x \in A$, so we can still talk about $S_n$ and so on on this space, and the fact that $P(A) > 0$ means that there is no problem re-normalizing like this. So we now have $\tilde{P}(A) = 1$: $A$ is the whole space.

With this new space as our framework, we let $f'(\omega) = f(\omega) - b$; then we get new sums $S'_n$ with $\frac{S'_n}{n} = \frac{S_n}{n} - b$, and then $A = \left\{\liminf \frac{S'_n}{n} < a - b < 0 < \limsup \frac{S'_n}{n}\right\}$. Notice then that $\tilde{P}\left(\limsup \frac{S'_n}{n} > 0\right) \ge \tilde{P}(A) = 1$, so $\tilde{P}(\{\sup S'_n > 0\}) = 1$ is the whole space $A$. We then have by the maximal ergodic lemma that:
$$0 \le \tilde{E}\left(f' 1_{\{\sup S'_n > 0\}}\right) = \tilde{E}(f') = \tilde{E}(f) - b$$
The same argument on $f''(\omega) = a - f(\omega)$ gives:
$$0 \le \tilde{E}\left(f'' 1_{\{\sup S''_n > 0\}}\right) = a - \tilde{E}(f)$$
But this is a contradiction now, for we have:
$$b \le \tilde{E}(f) \le a$$
which is impossible since $a < b$. This contradiction means that it's impossible to separate the liminf and the limsup like this; in other words, we have almost sure convergence.
Next it remains only to see that the random variable that this converges to is $E(f\mid\mathcal{F}_0)$. Let us denote $\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x)$ by $\bar{f}$. We must show that $\bar{f}$ is $\mathcal{F}_0$-measurable and that $E(\bar{f}1_A) = E(f 1_A)$ for all $A \in \mathcal{F}_0$. Firstly, notice that applying $x \to Tx$ does not change $\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x)$, as it only affects finitely many terms. This shows that $\bar{f}(x) = \bar{f}(Tx)$, which is the reason why $\bar{f}$ is $\mathcal{F}_0$-measurable. More formally, to see that $\bar{f}^{-1}(B)$ is $T$-invariant for any Borel set $B$, just write out the definitions (using $\bar{f}(Tx) = \bar{f}(x)$ in the middle step):
$$T^{-1}\left(\bar{f}^{-1}(B)\right) = \left\{x \in \Omega : \bar{f}(Tx) \in B\right\} = \left\{x \in \Omega : \bar{f}(x) \in B\right\} = \bar{f}^{-1}(B)$$
So indeed $\bar{f}^{-1}(B) \in \mathcal{F}_0$, meaning $\bar{f}$ is $\mathcal{F}_0$-measurable. To see that $\bar{f}$ has the right expectation values, we first prove the result for indicator functions and then use the "ladder" of integration to get the result we need. Consider that for sets $A \in \mathcal{F}_0$
and $B \in \mathcal{F}$ we have:
$$\int_A 1_B(x)\,dP = \int 1_A(x)1_B(x)\,dP$$
$$= \int 1_A(Tx)1_B(Tx)\,dP$$
$$= \int 1_A(x)1_B(Tx)\,dP$$
$$= \int_A 1_B(Tx)\,dP$$
where the second equality uses the fact that $P$ is $T$-invariant and the third equality uses the fact that $A \in \mathcal{F}_0 \Rightarrow 1_A(x) = 1_A(Tx)$. Since $\int_A 1_B(x)\,dP = \int_A 1_B(Tx)\,dP$, by following along with the construction of the Lebesgue integral starting from indicator functions, we conclude that $\int_A f(x)\,dP = \int_A f(Tx)\,dP$ for any integrable $f$.
Applying this inductively, we see that for any $N \in \mathbb{N}$:
$$\int_A \frac{1}{N}\sum_{k=0}^{N-1} f(T^k x)\,dP = \int_A f(x)\,dP$$
When $f$ is bounded, we can take the limit as $N \to \infty$ and use the bounded convergence theorem to conclude:
$$\int_A \bar{f}\,dP = \lim_{N\to\infty}\int_A \frac{1}{N}\sum_{k=0}^{N-1} f(T^k x)\,dP = \int_A f(x)\,dP$$
For general $f$, we can use a truncation argument and the monotone convergence theorem to finish the result. $\square$
Example 2.5. If we look at our first example of rotation by an angle $\omega$, we concluded (using Fourier analysis) that when $\omega$ is irrational and $f$ has a Fourier series:
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = \int_0^1 f(s)\,ds$$
By Birkhoff's theorem, we know that:
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = E(f\mid\mathcal{F}_0)$$
So we conclude that $\int_0^1 f(s)\,ds = E(f\mid\mathcal{F}_0)$. Since this holds for every $f$, it must be that $\mathcal{F}_0$ is the trivial field. Notice that this improves our result a little bit, since we may now apply it to any integrable $f$, not just $f$ which are $C^2$.
Example 2.6. In the first example, we were essentially looking at $\frac{1}{N}\sum_{n=0}^{N-1} e^{2\pi i m(x+n\omega)}$. Now let's ask about the series $\frac{1}{N}\sum_{n=0}^{N-1} e^{2\pi i m(2^n x)}$. This is harder to handle with Fourier techniques, but we can still use Birkhoff's theorem. Again take $\Omega = [0,1)$ to be our space, but instead of thinking of this as a circle, think of it as the space of binary sequences (which are the binary expansions of each number between 0 and 1), $\Omega = \{0.e_1e_2\ldots : e_i \in \{0,1\}\}$. Let $T : \Omega \to \Omega$ by $T(0.e_1e_2e_3\ldots) = 0.e_2e_3\ldots$. This translates to $T(x) = 2x \bmod 1$ (this is the reason that applying it $n$ times gives $2^n x$). It's not hard to verify that this is measure preserving. By the Kolmogorov zero-one law, the field $\mathcal{F}_0$ of $T$-invariant events must be the trivial field, for by applying $T$ $N$ times, we see that an event $A \in \mathcal{F}_0$ cannot depend on the first $N$ digits $e_1, e_2, \ldots, e_N$. Since this works for any $N$, $\mathcal{F}_0$ is a subset of the tail field, which by K-0-1 is trivial. Hence, by Birkhoff's Theorem, we have:
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = E(f\mid\mathcal{F}_0) = E(f) = \int_0^1 f\,dP$$
For the Fourier basis function $f(x) = e^{2\pi i m x}$ with $m \ne 0$, this is saying that almost surely:
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} e^{2\pi i m(2^n x)} = 0$$
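A numerical illustration of this ergodic average (my own, not from the lecture): iterating $T(x) = 2x \bmod 1$ in floating point collapses to $0$ after about 53 steps, since each application discards one bit of the mantissa. Instead we can generate the orbit directly from random binary digits, reading each $T^n x$ off a 53-bit sliding window; Birkhoff's theorem predicts the orbit spends a fraction $b-a$ of its time in $(a,b)$.

```python
import random

def doubling_orbit(bits, window=53):
    """Orbit of T(x) = 2x mod 1 started at x = 0.b1 b2 b3 ...,
    with each T^n x read off from a sliding window of digits."""
    n_pts = len(bits) - window
    x = sum(b / 2 ** (i + 1) for i, b in enumerate(bits[:window]))
    orbit = []
    for n in range(n_pts):
        orbit.append(x)
        # drop the leading binary digit, append a new trailing one
        x = (x - bits[n] / 2) * 2 + bits[n + window] / 2 ** window
    return orbit

random.seed(0)
bits = [random.randint(0, 1) for _ in range(100_000 + 53)]
orbit = doubling_orbit(bits)
freq = sum(0.25 < x < 0.5 for x in orbit) / len(orbit)
print(freq)  # Birkhoff predicts close to 1/4 for the cell (1/4, 1/2)
```

Drawing the digits i.i.d. uniform is exactly sampling $x$ from Lebesgue measure, which is what the K-0-1 argument above is about.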
Example 2.7. We can use Birkhoff's theorem to give yet another proof of the strong law of large numbers. Let $(X_1, X_2, \ldots)$ be a sequence of i.i.d. random variables with finite mean and let $\Omega$ be the probability space for these sequences. Define $T : \Omega \to \Omega$ by $T(x_1, x_2, x_3, \ldots) = (x_2, x_3, \ldots)$. Notice that since the $X$'s are i.i.d., this is measure preserving. As in the previous example, the Kolmogorov zero-one law tells us the field $\mathcal{F}_0$ of $T$-invariant events is trivial. Let $f(x_1, x_2, \ldots) = x_1$. By Birkhoff's theorem:
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} x_n = \lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = E(f\mid\mathcal{F}_0) = E(f) = E(X_1)$$
which is exactly the strong law of large numbers.
3. Continued Fractions<br />
One way to specify a number $x \in [0,1)$ is the binary expansion. Each binary digit tells you "which half" of the number line $x$ is in: e.g. the first digit says if it is in $[0, \frac{1}{2})$ or $[\frac{1}{2}, 1)$, and then we treat that interval like $[0,1)$ and start over again for the next digit. Another way to play this game would be to draw the harmonic series $\frac{1}{n}$ on the number line, and then specify which interval $[\frac{1}{n+1}, \frac{1}{n})$ the number is in. Call this first number $n_1$; we'll have then that $\frac{1}{n_1+1} \le x < \frac{1}{n_1}$. From this we may conclude that:
$$x = \frac{1}{n_1 + \epsilon_1}$$
for some $\epsilon_1 \in [0,1)$. Play the same game again for $\epsilon_1$, and we get:
$$x = \cfrac{1}{n_1 + \cfrac{1}{n_2 + \epsilon_2}}$$
Continuing this indefinitely gives us the "continued fraction expansion" for $x$. Since this is hard to write, we will adopt the convention that $x = [n_1; n_2; n_3; \ldots]$ means the continued fraction expansion with entries $n_1$, then $n_2$, and so on.
Proposition 3.1. If the sequence $[n_1; n_2; n_3; \ldots]$ is periodic (that is, it repeats after some finite number of steps), then $x = [n_1; n_2; n_3; \ldots]$ is algebraic.

Proof. The easiest way to see this is an example. Suppose we look at $x = [1; 1; 1; \ldots]$. Then:
$$x = \cfrac{1}{1 + \cfrac{1}{1 + \ddots}} = \frac{1}{1+x}$$
But then $x^2 + x - 1 = 0$, so $x$ is the root of a quadratic equation. In this case $x = \frac{\sqrt{5}-1}{2}$ is the golden section. In general, if the continued fraction expansion is periodic after $N$ steps, then $x$ satisfies $x = \frac{Ax+B}{Cx+D}$ for some constants $A, B, C, D$ (a fractional linear transformation, as in the next section), so $x$ is again the root of a quadratic polynomial. $\square$
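The expansion procedure above translates directly into code (a sketch of my own; the function name `cf_expansion` is mine): the entries are read off by iterating $n_k = \lfloor 1/x \rfloor$, $x \mapsto \frac{1}{x} \bmod 1$. Exact arithmetic with `Fraction` shows that rational numbers have finite expansions, while floating point recovers the all-ones expansion of the golden section for the first several entries.

```python
import math
from fractions import Fraction

def cf_expansion(x, n_terms=10):
    """First n_terms continued-fraction entries of x in (0,1),
    read off by iterating n_k = floor(1/x), x -> 1/x mod 1."""
    entries = []
    for _ in range(n_terms):
        if x == 0:
            break  # rational numbers terminate
        n = int(1 / x)
        entries.append(n)
        x = 1 / x - n
    return entries

print(cf_expansion(Fraction(16, 113)))       # [7, 16] -- finite, since 16/113 is rational
print(cf_expansion((math.sqrt(5) - 1) / 2))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

With floats the entries eventually go wrong, since the shift map roughly doubles the rounding error at each step; ten entries is well within the safe range here.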
Definition 3.2. We write $x = [n_1; n_2; n_3; \ldots]$ to mean:
$$x = \cfrac{1}{n_1 + \cfrac{1}{n_2 + \cfrac{1}{n_3 + \cdots}}}$$
Problem 3.3. Let $T : (0,1) \to (0,1)$ by $T([n_1; n_2; \ldots]) = [n_2; n_3; \ldots]$. This is the map $T(x) = \frac{1}{x} \bmod 1$. Is there a probability density $P$ we can put on $(0,1)$ so that $T$ will be measure preserving?

Proof. [Gauss] The probability density $dP = \frac{1}{\log 2}\,\frac{1}{1+x}\,dx$ will do the trick! Indeed, just notice that by the definition of $T$:
$$T^{-1}(a,b) = \bigcup_{n=1}^{\infty}\left(\frac{1}{b+n},\, \frac{1}{a+n}\right)$$
So then the requirement $P(T^{-1}(a,b)) = P(a,b)$ gives (using $\rho$ as a probability density function):
$$\int_a^b \rho(x)\,dx = \sum_{n=1}^{\infty}\int_{\frac{1}{b+n}}^{\frac{1}{a+n}} \rho(x)\,dx$$
Taking the derivative w.r.t. $b$ here gives:
$$\rho(x) = \sum_{n=1}^{\infty}\rho\left(\frac{1}{x+n}\right)\frac{1}{(x+n)^2}$$
This is hard to solve, but it's easy to verify that $\rho(x) = \frac{1}{1+x}$ works, since the LHS is $\frac{1}{1+x}$ while the RHS is:
$$\sum_{n=1}^{\infty}\rho\left(\frac{1}{x+n}\right)\frac{1}{(x+n)^2} = \sum_{n=1}^{\infty}\frac{1}{1+\frac{1}{x+n}}\cdot\frac{1}{(x+n)^2}$$
$$= \sum_{n=1}^{\infty}\frac{x+n}{1+x+n}\cdot\frac{1}{(x+n)^2}$$
$$= \sum_{n=1}^{\infty}\frac{1}{(x+n+1)(x+n)}$$
$$= \sum_{n=1}^{\infty}\left(\frac{1}{x+n} - \frac{1}{x+n+1}\right)$$
$$= \frac{1}{x+1}$$
which is a telescoping sum, so we can evaluate it exactly. The factor of $\frac{1}{\log 2}$ normalizes $\rho$ so that $\int_0^1 \rho(x)\,dx = 1$. Indeed:
$$\int_0^1 \frac{1}{\log 2}\cdot\frac{1}{x+1}\,dx = \frac{1}{\log 2}\left[\log(1+x)\right]_0^1 = \frac{\log 2 - \log 1}{\log 2} = 1 \qquad \square$$
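The invariance claim is also easy to test by Monte Carlo (my own sanity check, not from the notes): sample from the Gauss measure by inverting its CDF $F(x) = \log_2(1+x)$, apply $T$ once, and compare a distributional statistic before and after.

```python
import math
import random

random.seed(1)

# Sample from the Gauss measure dP = dx / ((1+x) log 2) by inverse CDF:
# F(x) = log2(1 + x) on [0,1], so F^{-1}(u) = 2**u - 1 for u uniform.
xs = [2 ** random.random() - 1 for _ in range(200_000)]
Txs = [(1 / x) % 1 for x in xs if x > 0]  # Gauss map T(x) = 1/x mod 1

p_before = sum(x < 0.5 for x in xs) / len(xs)
p_after = sum(x < 0.5 for x in Txs) / len(Txs)
print(p_before, p_after)  # both should be close to log2(3/2) ~ 0.585
```

If $T$ preserves the Gauss measure, $P(X < \tfrac12)$ must be the same for $X$ and $T(X)$, namely $F(\tfrac12) = \log_2\tfrac32$; that is what the two printed numbers check.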
Theorem 3.4. The shift map $T : [0,1] \to [0,1]$ given by $T([n_1; n_2; \ldots]) = [n_2; n_3; \ldots]$ is ergodic.

Proof. Fix $N \in \mathbb{N}$ and a list of integers $n_1, n_2, \ldots, n_N$. Now define:
$$n(x) := \cfrac{1}{n_1 + \cfrac{1}{n_2 + \cdots + \cfrac{1}{n_N + x}}}$$
For each choice of $n_1, n_2, \ldots, n_N$, the image of $[0,1]$ under $n(x)$ is an interval whose endpoints are $n(0)$ and $n(1)$. As $N$ increases, the interval $[n(0), n(1)]$ gets smaller and smaller. An easy proof by induction shows that $n(x)$ can be written as:
$$n(x) = \frac{Ax + B}{Cx + D}$$
for $A, B, C, D \in \mathbb{R}$ with $0 \le A \le B$ and $1 \le C \le D$ and with $AD - BC = \pm 1$, where the sign depends on the parity of $N$. Now, let $I = [n(0), n(1)]$ and let $J = (a, b)$ be an arbitrary interval.
Claim. $|I \cap T^{-N}(J)| \ge \frac{1}{2}|I||J|$ holds for all $N \in \mathbb{N}$.

Proof. Take $x \in I \cap T^{-N}(J)$. Notice that $x \in I$ if and only if $x = n(y)$ for some $y \in [0,1]$, by definition of $I$. So we can write $x$ as a continued fraction $x = [n_1; n_2; \ldots; n_{N-1}; n_N + y]$. On the other hand, $x \in T^{-N}(J)$ if and only if $T^N x \in J$. But $T^N x = T^N([n_1; n_2; \ldots; n_{N-1}; n_N + y]) = y$ by definition of $T$. This shows that $x \in T^{-N}(J)$ if and only if $y \in J$.

We have then, using the observation that $n$ is a fractional linear transformation:
$$I \cap T^{-N}(J) = \{n(y) : y \in J\} = [n(a), n(b)]$$
This shows:
$$|I \cap T^{-N}(J)| = |n(b) - n(a)|$$
$$= \left|\frac{Ab+B}{Cb+D} - \frac{Aa+B}{Ca+D}\right|$$
$$= \left|\frac{b-a}{(Ca+D)(Cb+D)}\right|$$
$$\ge \frac{|b-a|}{(C+D)^2} \quad \text{since } a, b < 1$$
$$\ge \frac{1}{2}|b-a||I|$$
$$= \frac{1}{2}|J||I|$$
The last inequality holds by writing out $|I|$ and using $AD - BC = \pm 1$ and the fact that $1 \le C \le D$, so that $C + D \le 2D$:
$$|I| = |n(0) - n(1)| = \left|\frac{A+B}{C+D} - \frac{B}{D}\right| = \frac{|AD-BC|}{D(C+D)} = \frac{1}{D(C+D)} \le \frac{2}{(C+D)^2}$$
Finally, to see that $T$ is ergodic, take any Borel set $B \in \mathcal{F}$. By approximating $B$ by intervals, the inequality from the claim still holds:
$$\left|I \cap T^{-N}B\right| \ge \frac{1}{2}|I||B|$$
Take any set $A$ now. Again, by approximating $A$ by intervals $I$, we can use the above inequality to get:
$$\left|A \cap T^{-N}B\right| \ge \frac{1}{2}|A||B|$$
This gives what we want, for if $B$ is $T$-invariant, we have $T^{-N}B = B$ for every $N$. The choice $A = B^c$ in the above gives:
$$\frac{1}{2}|B||B^c| \le \left|B^c \cap T^{-N}B\right| = |B^c \cap B| = 0$$
So $|B||B^c| = 0$, which is only possible if $|B| = 1$ or $|B| = 0$. This says that all $T$-invariant sets are either measure zero or full measure; in other words, $T$ is ergodic. $\square$
CHAPTER 4<br />
Brownian Motion<br />
1. Motivation<br />
Our aim is to discuss a stochastic process on [0, 1] (that is a probability space<br />
(Ω, F, P) and a collection of random variables Bt(ω), for t ∈ [0, 1]) which has the<br />
following properties:<br />
• B0(ω) = 0 for every ω ∈ Ω<br />
• Fix a $T \in [0,1]$, and define for $t$ with $T + t \le 1$: $B^+_t = B_{T+t} - B_T$. We want $B^+_t$ to look statistically identical to $B_t$. (This says the process has some sort of "time homogeneous" property.)
• We want $B^+_t$ as defined above to be independent of $(B_s)_{s \le T}$. (This says that the process has some sort of Markov property.)
• $E(B_t^2) < \infty$
• $E(B_t) = 0$
• Bt(ω) is continuous for every (or almost every) ω ∈ Ω.<br />
This process is supposed to describe something like a piece of dust that you can see sometimes wiggling about in a sunbeam. Notice that the time homogeneous and Markov properties together mean we can write:
$$B_T = \sum_{k=1}^{N}\left(B_{\frac{kT}{N}} - B_{\frac{(k-1)T}{N}}\right)$$
which is a sum of many independent increments. By the central limit theorem, this is suggesting that $B_t \sim N(0, \sigma^2)$ is normally distributed (to get this more rigorously would take a bit more work, since the above set-up is not exactly the set-up for the central limit theorem). This is often taken as an "axiom":
• $B_t \sim N(0, \sigma^2)$
A quick calculation shows that $\sigma^2 \propto t$. Let $f(t) = \sigma^2$ be the variance of $B_t$. Then:
$$f(t+s) = E\left((B_{t+s})^2\right)$$
$$= E\left((B_{t+s} - B_s + B_s)^2\right)$$
$$= E\left((B_{t+s} - B_s)^2\right) + E\left(B_s^2\right) + 2E\left((B_{t+s} - B_s)B_s\right)$$
$$= f(t) + f(s) + 2\cdot 0$$
where we used the time homogeneous property and the Markov property. This functional relation (for the monotone function $f$) means that $f(t)$ must be linear! $f(0) = 0$ holds since $B_0$ is known exactly. Hence $f(t) = c\cdot t$. It doesn't hurt to take $c = 1$, since anything we get can be rescaled for other values of $c$ if need be. Sometimes this is taken as the "axiom":
(1) $B_t \sim N(0, t)$
The following resulting property also turns out to be very useful:

Proposition 1.1. $E(B_a B_b) = \min(a, b)$

Proof. Suppose w.l.o.g. that $a < b$. Then: $E(B_aB_b) = E(B_a(B_b - B_a + B_a)) = E(B_a(B_b - B_a)) + E(B_a^2) = 0 + a = \min(a, b)$. $\square$
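Both the variance computation and this covariance relation can be spot-checked by Monte Carlo (my own sketch, not part of the notes), approximating the Brownian path by a rescaled $\pm 1$ random walk, $B_t \approx S_{\lfloor nt \rfloor}/\sqrt{n}$. The function name `sample_path` is mine.

```python
import math
import random

random.seed(7)

def sample_path(n, times):
    """One approximate Brownian path: B_t ~ S_{floor(n t)} / sqrt(n)
    for a +-1 random walk S_k. Returns B_t for each t in times
    (assumed increasing)."""
    ks = [math.floor(n * t) for t in times]
    s, out = 0, []
    for k in range(1, max(ks) + 1):
        s += random.choice((-1, 1))
        if k in ks:
            out.append(s / math.sqrt(n))
    return out

a, b = 0.3, 0.8
samples = [sample_path(500, (a, b)) for _ in range(5000)]
cov = sum(x * y for x, y in samples) / len(samples)
var_b = sum(y * y for _, y in samples) / len(samples)
print(cov, var_b)  # expect roughly min(a, b) = 0.3 and Var(B_b) = b = 0.8
```

The estimates carry Monte Carlo noise of a few percent at these sample sizes, but they land clearly on $\min(a,b)$ rather than, say, $ab$ or $\frac{a+b}{2}$.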
It remains to see that such a process really exists. The main difficulty is proving<br />
that the process is continuous. There is more than one way to skin the cat for this;<br />
each method is useful because it gives a different insight into what is going on.<br />
2. Levy’s Construction<br />
We will construct Brownian motion on $t \in [0,1]$ as a uniform limit of continuous functions $B^N_t$ as $N \to \infty$. Each $B^N_t$ will be an approximation of the Brownian motion that is piecewise linear between the dyadic rationals of the form $\frac{a}{2^N}$. The real trick in the construction is the remarkable observation that the corrections from $B^N_t$ to $B^{N+1}_t$ are independent of the construction so far up to level $N$, which is the crucial fact that makes the construction so nice and allows it to converge. The crucial fact about Brownian motion that makes this possible is captured in the proposition below:
Proposition 2.1. Let $B_t$ be a Brownian path and $0 < a < b < 1$. Consider the line segment joining $B_a$ and $B_b$: $l(t) = B_a + (t-a)\frac{B_b - B_a}{b-a}$. Consider the value of the Brownian path at the midpoint time, $B_{\frac{a+b}{2}}$. The difference from this point to the line $l(t)$ is independent of $B_a$ and $B_b$. That is to say, $X = B_{\frac{a+b}{2}} - l\left(\frac{a+b}{2}\right) = B_{\frac{a+b}{2}} - \frac{1}{2}B_a - \frac{1}{2}B_b$ is independent of $B_a$ and $B_b$. Moreover, $X$ is normally distributed, $X \sim N\left(0, \frac{1}{4}(b-a)\right)$.

Proof. Firstly, we notice that the random variables $X$, $B_a$ and $B_b$ have a joint normal distribution. This can be seen without much difficulty by expanding the definition of $X$ to write any linear combination of $X$, $B_a$ and $B_b$ as a linear combination of $B_{\frac{a+b}{2}}$, $B_a$ and $B_b$; from here, rewrite it as a linear combination of $B_a$, $B_{\frac{a+b}{2}} - B_a$, and $B_b - B_{\frac{a+b}{2}}$. By the hypotheses on our Brownian motion, each of these is an independent Gaussian variable, so any linear combination of them is again Gaussian. Hence any linear combination of $X$, $B_a$ and $B_b$ is Gaussian; this property is a characterization of the joint Gaussian distribution. The observation that $X$, $B_a$ and $B_b$ are jointly normal substantially simplifies the verification of their independence, as jointly normal variables are independent if and only if they are uncorrelated. From here we calculate (with the help of the useful covariance relation $E(B_sB_t) = \min(s,t)$):
$$E(B_aX) = E\left(B_a\left(B_{\frac{a+b}{2}} - \tfrac{1}{2}B_a - \tfrac{1}{2}B_b\right)\right)$$
$$= E\left(B_aB_{\frac{a+b}{2}}\right) - \tfrac{1}{2}E\left(B_a^2\right) - \tfrac{1}{2}E(B_aB_b)$$
$$= a - \tfrac{1}{2}a - \tfrac{1}{2}a$$
$$= 0$$
A similar calculation holds for $E(B_bX)$. Since these are uncorrelated and jointly normal, they are independent. A quick calculation using the covariance relation again gives $X \sim N\left(0, \frac{1}{4}(b-a)\right)$. $\square$
2. LEVY’S CONSTRUCTION 31<br />
This remarkable fact gives us a nice idea to construct Brownian motion starting<br />
with an infinite sequence of standard E(Z) = 0, E(Z2 ) = 1 i.i.d Gaussian variables<br />
(Z0, Z1, Z2, . . .). The idea is to first construct B0 = 0, B1 = Z0. Then, once B0, and<br />
B1 are constructed by the above proposition, we know that B1/2− 1 1<br />
2B0− 2B1 can be<br />
modeled by 1<br />
4Z1, so set B1/2 = 1<br />
2B1 <br />
1 + 4Z1. Once B0, B1/2, B1 are constructed,<br />
the above proposition gives us a way to get B 1 and B 3 using two more normal<br />
4 4<br />
1<br />
variables 8Z2 <br />
1 and 8Z3 and so on.<br />
The above proposition and paragraph is the basic idea. It becomes a bit of a<br />
mouthful to write it all down. A confused reader should focus on understanding<br />
the construction above before digesting the below details.<br />
To formalize the process, we let $B^N_t$ be the construction at the $N$-th level, which will have the correct values at points of the form $\frac{a}{2^N}$, $0 \le a \le 2^N$. We fill in between these points with a piecewise linear function. After some bookkeeping, the easiest way to write this down is as follows. First define some "tent" functions which make little peaks of unit height in the interval $\left[\frac{2k}{2^n}, \frac{2(k+1)}{2^n}\right]$:
$$T_{n,k}(t) = \begin{cases} 2^n\left(t - \frac{2k}{2^n}\right) & t \in \left[\frac{2k}{2^n}, \frac{2k+1}{2^n}\right] \\ 2^n\left(\frac{2k+2}{2^n} - t\right) & t \in \left[\frac{2k+1}{2^n}, \frac{2k+2}{2^n}\right] \\ 0 & t \notin \left[\frac{2k}{2^n}, \frac{2(k+1)}{2^n}\right] \end{cases}$$
Notice that for every level $n$, the range $0 \le k \le 2^{n-1} - 1$ means there are $2^{n-1}$ tents, and notice that these tents have disjoint supports and unit height.

Now, at every level of the construction we make sure that $B^N_t$ has the right value at points of the form $\frac{a}{2^N}$ by adding in the right tents, with heights distributed as scaled normal variables:
$$B^N_t = Z_0 t + \sum_{n=1}^{N}\,\sum_{k=0}^{2^{n-1}-1}\sqrt{\frac{1}{2^{n+1}}}\,Z_{n,k}\,T_{n,k}(t)$$
2n+1 Explanation of this formula: The “Z0t” is the initial level 0 construction. The<br />
sum 0 ≤ n ≤ N sums over the N levels of construction, and the sum 0 ≤ k ≤<br />
2 n−1 − 1 is over the 2 n−1 tents that get added on at the n − th level. Each tent<br />
<br />
1<br />
has a height distributed like 2n+1 1<br />
Z ∼ N(0, 2n+1 ) , where Z ∼ N(0, 1)(This is the<br />
content of the proposition above!) For convenience, we label the infinite sequence<br />
of normal variables so that Zn,k is controlling the height of the k − th tent on the<br />
n − th level.<br />
Finally we get the Brownian motion as Bt = limN→∞ BN t , which puts the<br />
Brownian motion on the same probability space as the infinite sequence of normal<br />
variables. To see that this is continuous, we show that the convergence is uniform<br />
almost surely. Since each BN t is continuous, and a uniform limit of continuous<br />
functions is continuous, this gives that Bt is continuous.<br />
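The construction is short to code up (my own sketch of the midpoint-displacement scheme described above; the function name `levy_bm` is mine). Rather than summing tent functions explicitly, it fills in the dyadic points level by level, which is the same thing evaluated at $t = a/2^N$.

```python
import math
import random

random.seed(0)

def levy_bm(levels):
    """Sample B_t at the dyadic points a / 2**levels by midpoint
    displacement: each new midpoint value is the average of its two
    neighbours plus an independent N(0, 1/2**(n+1)) correction."""
    n_pts = 2 ** levels
    B = [0.0] * (n_pts + 1)
    B[n_pts] = random.gauss(0, 1)  # B_1 = Z_0
    for n in range(1, levels + 1):
        step = n_pts // 2 ** n  # index spacing at level n
        sigma = math.sqrt(1 / 2 ** (n + 1))
        for mid in range(step, n_pts, 2 * step):
            B[mid] = (B[mid - step] + B[mid + step]) / 2 + sigma * random.gauss(0, 1)
    return B

path = levy_bm(10)  # one path sampled at t = 0, 1/1024, ..., 1
print(len(path))    # 1025 points
```

At level $n$ the neighbours are $2^{-(n-1)}$ apart in time, so the proposition gives a correction of variance $\frac{1}{4}\cdot 2^{-(n-1)} = 2^{-(n+1)}$, matching `sigma` above.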
Proposition 2.2. The family of functions $B^N_t$ converges uniformly almost surely.

Proof. As you might suspect, the trick is to use the right summable sequence with a clever application of the Borel–Cantelli lemma. Let
$$H_n = \max_{t\in[0,1]}\left|\sum_{k=0}^{2^{n-1}-1}\sqrt{\frac{1}{2^{n+1}}}\,Z_{n,k}\,T_{n,k}(t)\right|$$
be the maximum height contribution to $B_t$ at level $n$. Since the tent functions $T_{n,k}(t)$ have disjoint supports and unit height, this is $H_n = \sqrt{\frac{1}{2^{n+1}}}\max_{0\le k\le 2^{n-1}-1}|Z_{n,k}|$. We now make the following estimate:
$$P\left(H_n > 2^{-n/2}\sqrt{2n}\right) = P\left(\max_{0\le k\le 2^{n-1}-1}|Z_{n,k}| > 2\sqrt{n}\right)$$
$$\le 2^{n-1}\,P\left(|Z| > 2\sqrt{n}\right)$$
$$= 2^n\, P\left(Z > 2\sqrt{n}\right)$$
$$= \frac{2^n}{\sqrt{2\pi}}\int_{2\sqrt{n}}^{\infty}\exp\left(-\frac{x^2}{2}\right)dx$$
$$\le \frac{2^n}{\sqrt{2\pi}}\cdot\frac{1}{2\sqrt{n}}\exp\left(-\frac{(2\sqrt{n})^2}{2}\right) \quad \text{(this is Mill's ratio)}$$
$$= C\cdot\frac{1}{\sqrt{n}}\cdot\left(\frac{2}{e^2}\right)^n$$
which is a summable sequence! Hence, we know by the Borel–Cantelli lemma that this happens only finitely often almost surely. That is to say, for almost every $\omega \in \Omega$, we can find $N \in \mathbb{N}$ so that $H_n(\omega) \le 2^{-n/2}\sqrt{2n}$ for all $n > N$. But then we have that for all $q > p > N$ and any $t \in [0,1]$:
$$|B^p_t - B^q_t| = \left|\sum_{n=p+1}^{q}\,\sum_{k=0}^{2^{n-1}-1}\sqrt{\frac{1}{2^{n+1}}}\,Z_{n,k}\,T_{n,k}(t)\right|$$
$$\le \sum_{n=p+1}^{q}|H_n|$$
$$\le \sum_{n=p+1}^{q}2^{-n/2}\sqrt{2n}$$
$$\le \sum_{n=N}^{\infty}2^{-n/2}\sqrt{2n}$$
But since $2^{-n/2}\sqrt{2n}$ is summable, this tail can be made arbitrarily small, and we see then that $B^N_t$ is Cauchy in the uniform norm. Since this holds for almost every $\omega \in \Omega$, we indeed have uniform convergence almost surely. $\square$
Finally, to see that the limiting process is really what we want, we just verify that $E\left((B_t - B_s)^2\right) = |t-s|$, from which it's easy to check the properties we want. To see this, we just use the density of the dyadic rationals in $[0,1]$. The above construction fixes points of the form $\frac{a}{2^n}$ at step $n$, that is to say $B_{a/2^n} = B^n_{a/2^n}$. Hence for $t, s$ dyadic rationals, we have $E\left((B_t - B_s)^2\right) = E\left((B^n_t - B^n_s)^2\right) = |t-s|$, which is easily checked by the construction above/the earlier proposition.
3. CONSTRUCTION FROM DURRET’S BOOK 33<br />
For arbitrary t now, but s still taken to be a dyadic rational, we take a sequence of dyadic rationals t_n → t. We have then, using Fatou's lemma:

E[(B_t − B_s)^2] = E[lim_{n→∞} (B_{t_n} − B_s)^2]
≤ lim_{n→∞} E[(B_{t_n} − B_s)^2]
= lim_{n→∞} |t_n − s|
= |t − s|
Now consider, for any n ∈ N:

E[(B_t − B_s)^2] = E[((B_t − B_{t_n}) − (B_s − B_{t_n}))^2]
= E[(B_t − B_{t_n})^2] + E[(B_s − B_{t_n})^2] − 2E[(B_t − B_{t_n})(B_s − B_{t_n})]

Since this holds for any n ∈ N, we get:

E[(B_t − B_s)^2] = lim_{n→∞} ( E[(B_t − B_{t_n})^2] + E[(B_s − B_{t_n})^2] − 2E[(B_t − B_{t_n})(B_s − B_{t_n})] )
= 0 + lim_{n→∞} |t_n − s| + 0
= |t − s|

where we have observed that the two limits on either side are 0 by using E[(B_t − B_s)^2] ≤ |t − s| in a clever way. First, lim_{n→∞} E[(B_t − B_{t_n})^2] ≤ lim_{n→∞} |t − t_n| = 0, and secondly, with the help of the Hölder (Cauchy–Schwarz) inequality:

lim_{n→∞} |E[(B_t − B_{t_n})(B_s − B_{t_n})]| ≤ lim_{n→∞} √(E[(B_t − B_{t_n})^2]) · √(E[(B_s − B_{t_n})^2])
≤ lim_{n→∞} √(|t − t_n|) · √(|s − t_n|)
= 0

Once we have E[(B_t − B_s)^2] = |t − s| for arbitrary t and dyadic s, the same argument repeated again shows that E[(B_t − B_s)^2] = |t − s| holds when both t and s are arbitrary.
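The dyadic construction and the variance identity E[(B_t − B_s)^2] = |t − s| can be sanity-checked numerically. The sketch below is my own illustration (the helper name `levy_bm` is made up): it samples a path on the dyadic grid by midpoint refinement, in the spirit of the construction above, and estimates the second moment of an increment by Monte Carlo.

```python
import random

def levy_bm(n_levels, rng):
    """Sample Brownian motion on the dyadic grid {k / 2^n_levels} by midpoint
    refinement: B(0) = 0, B(1) ~ N(0, 1), and each new midpoint is the average
    of its two neighbours plus an independent N(0, d/4) kick, where d is the
    distance between the neighbours (the Brownian-bridge conditional variance)."""
    B = {0.0: 0.0, 1.0: rng.gauss(0.0, 1.0)}
    for level in range(1, n_levels + 1):
        step = 1.0 / 2 ** level
        for k in range(1, 2 ** level, 2):  # odd multiples of step: new midpoints
            t = k * step
            mean = 0.5 * (B[t - step] + B[t + step])  # neighbours are d = 2*step apart
            B[t] = mean + rng.gauss(0.0, (step / 2.0) ** 0.5)  # sd = sqrt(d/4)
    return B

rng = random.Random(0)
t, s, trials = 0.75, 0.25, 20000
total = 0.0
for _ in range(trials):
    B = levy_bm(4, rng)
    total += (B[t] - B[s]) ** 2
m2 = total / trials
print(m2)  # Monte Carlo estimate of E[(B_t - B_s)^2]; should be near |t - s| = 0.5
```

The N(0, d/4) midpoint kick is exactly the conditional variance of a Brownian bridge given its endpoints, which is what keeps the already-fixed dyadic points consistent across levels.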
3. Construction from Durrett's Book

(I call this "Durrett's construction" since I read it out of Durrett's book, "Brownian Motion and Martingales in Analysis".)

The above construction is pretty elementary and gives all the desired properties. The following construction is a bit more technical; in particular, it uses a few extension results like Caratheodory and Kolmogorov. However, it gives immediately that the Brownian motion is not only continuous, but Hölder continuous for exponents γ < 1/2. This construction uses a few "extension theorems", which are gone over briefly in the appendix.
Definition 3.1. (Constructing Brownian Motion with the Kolmogorov Extension Theorem)

The Kolmogorov Extension Theorem gives us a quick way to define a measure on the space of functions. However, since the space of functions {f : T → R} is so large, this theorem often gives us a very unwieldy space to work with, one in which we can't get our hands on the properties we want. The construction of Brownian motion below is a great example: constructing with the Kolmogorov theorem alone is bad, while if we take more care and construct on only countably many points, we get what we want.

Let

P_{t_1,t_2,...,t_n}(A_1 × A_2 × ... × A_n) = ∫_{A_1} dx_1 ∫_{A_2} dx_2 ... ∫_{A_n} dx_n Π_{k=1}^{n} p_{t_k − t_{k−1}}(x_{k−1}, x_k),

where p_t(x, y) = (2πt)^{−1/2} exp(−|y − x|²/2t) (with the conventions t_0 = 0, x_0 = 0). This is naively what you get as the distribution of (B_{t_1}, B_{t_2}, ..., B_{t_n}) if you use the Markov property and the normal distribution of the Brownian motion. By Kolmogorov, we get a measure P on the entire space of functions {f : [0, 1] → R}. This defines the Brownian motion!
Proposition 3.2. With the above description of P, it will be impossible to see that the Brownian motion is almost surely continuous, because the set of continuous functions C ⊂ {f : [0, 1] → R} is not even measurable.

Proof. Suppose by contradiction that C is measurable. Then we can find a sequence t_1, t_2, ... of times and a Borel set B ⊆ R^∞ so that C = {f : (f(t_1), f(t_2), ...) ∈ B}. (The proof of this fact comes by showing that sets of this form constitute a sigma-algebra which contains the cylinder sets used to define σ(A).) Take any continuous function f now, and alter its value at a single point t ∉ {t_1, t_2, ...} to get a function f̂ which agrees with f at {t_1, t_2, ...} but is not continuous. But then f̂ ∈ C = {f : (f(t_1), f(t_2), ...) ∈ B}, since f̂ agrees with f at every t_i, even though f̂ is not continuous, which is a contradiction. □
This result means that our construction is not good. It is better to construct the B_t as follows:

Definition 3.3. (Constructing Brownian Motion with Uniform Continuity)

Step 1. (Define on dyadic rationals.) Let P_{t_1,...,t_n} be as above. Use the countable Kolmogorov Extension Theorem to get a measure P on the set of functions Ω = {f : [0, 1] ∩ D_2 → R} from the dyadic rationals to R.

Step 2. Check that functions in Ω are almost surely Hölder continuous, i.e. for almost all f ∈ Ω, |f(t) − f(s)| ≤ C|t − s|^γ.

Step 3. Conclude that for almost every f ∈ Ω, there is a unique way to extend f to a continuous function f : [0, 1] → R, since the dyadic rationals are dense in [0, 1].

Step 1 is pretty simple, but Step 2 requires some verification and is the real heart of the problem:

Proposition 3.4. Fix γ < 1/2. For almost every f ∈ Ω, there is a constant C so that |f(t) − f(s)| ≤ C|t − s|^γ.
We first prove a lemma.

Lemma 3.5. Fix γ < 1/2. Then there exists δ > 0 so that for almost every f ∈ Ω, there is an N ∈ N (which depends on f) so that for n ≥ N we have:

|f(x) − f(y)| ≤ |x − y|^γ

whenever x = i2^{−n}, y = j2^{−n} and |x − y| ≤ (1/2)^{n(1−δ)}.

Proof. Take m ∈ N so large that m > 1/(1 − 2γ). We use the inequality E[|f(t) − f(s)|^{2m}] ≤ C_m |t − s|^m with C_m = E[|f(1)|^{2m}] (this follows since f(t) − f(s) ∼ N(0, t − s), so f(t) − f(s) has the same law as |t − s|^{1/2} f(1)). For any n ∈ N now, consider the following estimate:

P(|f(x) − f(y)| > |x − y|^γ for some x = i2^{−n}, y = j2^{−n} with |x − y| ≤ (1/2)^{n(1−δ)}) ≤ Σ |x − y|^{−2mγ} E[|f(x) − f(y)|^{2m}]

where the sum on the right hand side is taken over all the possible x, y that satisfy the inequality |x − y| ≤ (1/2)^{n(1−δ)} (there are finitely many, since we are restricting ourselves to dyadic rationals x = i2^{−n}, y = j2^{−n}). We have used a union bound and the Chebyshev inequality P(|X| > a) ≤ a^{−2m} E(|X|^{2m}) here. Now, by the above moment inequality, we have:

LHS ≤ C_m Σ |x − y|^{−2mγ} |x − y|^m
= C_m Σ |x − y|^{−2mγ + m}
≤ C_m 2^n 2^{nδ} (2^{−n(1−δ)})^{−2mγ + m}
= C_m 2^{−n(−(1+δ) + (1−δ)(−2mγ + m))}

The last bound comes in because |x − y| ≤ 2^{−n(1−δ)} for the x, y in our sum, and there are at most 2^n choices for x and 2^{nδ} choices for y once x has been fixed (remember, they are all n-th level dyadic rationals). Now, the term that appears in the exponent is:

ε = −(1 + δ) + (1 − δ)(−2mγ + m)

Since m is so large that −2mγ + m > 1, we can choose δ so small that ε > 0. We will have then that

LHS ≤ C_m 2^{−nε}

which is a summable sequence! By the Borel–Cantelli lemma, it must be the case that for almost every f ∈ Ω the event here happens only finitely many times. This is exactly the statement of the lemma which we wanted to prove. □
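The counting step in the proof ("at most 2^n choices for x and 2^{nδ} choices for y") can be checked by brute force at a small level. This snippet is my own illustration; the particular n, δ and the constant 8 are mine, chosen to absorb edge effects and the two directions of the gap.

```python
def count_close_dyadic_pairs(n, delta):
    """Ordered pairs x = i/2^n, y = j/2^n in [0,1], i != j,
    with |x - y| <= (1/2)^(n(1-delta)), i.e. |i - j| <= 2^(n*delta)."""
    max_gap = 2 ** (n * delta)
    count = 0
    for i in range(2 ** n + 1):
        for j in range(2 ** n + 1):
            if i != j and abs(i - j) <= max_gap:
                count += 1
    return count

n, delta = 8, 0.25
pairs = count_close_dyadic_pairs(n, delta)
# The count stays well within the ~2^n * 2^(n*delta) budget used in the lemma.
print(pairs, 8 * 2 ** n * 2 ** (n * delta))
```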
Proposition 3.6. Fix γ < 1/2. For almost every f ∈ Ω, there is a constant C so that |f(t) − f(s)| ≤ C|t − s|^γ.

Proof. For almost every f ∈ Ω, find δ > 0 and N ∈ N as in the lemma. Take any t, s ∈ D_2 ∩ [0, 1] with t − s < 2^{−N(1−δ)}. Choose m > N now so that 2^{−(m+1)(1−δ)} ≤ t − s ≤ 2^{−m(1−δ)}. Write now t = i2^{−m} − 2^{−q_1} − 2^{−q_2} − ... − 2^{−q_k} > (i − 1)2^{−m} and s = j2^{−m} + 2^{−r_1} + ... + 2^{−r_l} < (j + 1)2^{−m} for some choice of q's and r's with m < q_1 < ... < q_k and m < r_1 < ... < r_l. Since t − s < 2^{−m(1−δ)}, we have i2^{−m} − j2^{−m} < 2^{−m(1−δ)} (up to a harmless constant), so we can apply the result from the lemma to conclude:

|f(i2^{−m}) − f(j2^{−m})| ≤ ((2^{mδ}) 2^{−m})^γ = 2^{−m(1−δ)γ}
Now, we use the result of the lemma again many times to see that (using our clever rewriting of t):

|f(t) − f(i2^{−m})| ≤ |f(i2^{−m} − 2^{−q_1}) − f(i2^{−m})| + |f(i2^{−m} − 2^{−q_1} − 2^{−q_2}) − f(i2^{−m} − 2^{−q_1})| + ...
≤ |2^{−q_1}|^γ + ... + |2^{−q_k}|^γ
≤ Σ_{j=m+1}^{∞} (2^{−j})^γ
≤ C 2^{−γm}

since m < q_p for each p, and where we bounded the sum by the full geometric series. We similarly get a bound on |f(s) − f(j2^{−m})|. Finally then:

|f(t) − f(s)| ≤ C 2^{−γm(1−δ)} + C 2^{−γm} + C 2^{−γm}
≤ C 2^{−γm(1−δ)}
= C 2^{γ(1−δ)} (2^{−(m+1)(1−δ)})^γ
≤ C 2^{γ(1−δ)} |t − s|^γ

by the choice of m so that 2^{−(m+1)(1−δ)} ≤ t − s (here the constant C is allowed to change from line to line). □
So from here we see that the Brownian motion is almost surely Hölder continuous for exponents γ < 1/2. This result lets us find a unique extension of f(t) from the dyadic rationals to all of [0, 1] which is not only continuous, but moreover Hölder continuous for exponents γ < 1/2, which is a stronger result than our first construction gave. For ease of notation, we now change our notation a little bit: we will refer to ω ∈ Ω instead of f, and we now have a family of random variables B_t(ω) = ω(t). What we have just proven is that for fixed ω, the map t → B_t(ω) is indeed a Hölder continuous path for exponents γ < 1/2.
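The exponent-1/2 threshold can be illustrated empirically. This is my own simulation sketch (helper names `sample_path` and `holder_ratio` are made up), looking only at adjacent-increment ratios as a crude proxy for the Hölder seminorm:

```python
import math
import random

def sample_path(n_steps, rng):
    """Random-walk approximation of B on the grid {k/n}: i.i.d. N(0, 1/n) steps."""
    dt = 1.0 / n_steps
    b, path = 0.0, [0.0]
    for _ in range(n_steps):
        b += rng.gauss(0.0, math.sqrt(dt))
        path.append(b)
    return path

def holder_ratio(path, gamma):
    """max over adjacent grid points of |B(t) - B(s)| / |t - s|^gamma."""
    dt = 1.0 / (len(path) - 1)
    return max(abs(b2 - b1) for b1, b2 in zip(path, path[1:])) / dt ** gamma

rng = random.Random(1)
for n in (2 ** 6, 2 ** 10, 2 ** 14):
    path = sample_path(n, rng)
    print(n, holder_ratio(path, 0.4), holder_ratio(path, 0.6))
# Expect the gamma = 0.4 column to stay moderate as n grows, while the
# gamma = 0.6 column keeps growing, consistent with the 1/2 threshold.
```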
4. Some Properties

The following slick result shows that the Brownian motion is nowhere Hölder continuous for γ > 1/2, which in particular shows that it is nowhere differentiable.

Proposition 4.1. For γ > 1/2, the set of functions which are Hölder continuous with exponent γ at some point is a null set. In other words, the Brownian motion is almost surely nowhere Hölder continuous for exponents γ > 1/2.

Proof. Fix a γ > 1/2 and C ∈ R. Choose m ∈ N so large that γ > (m + 1)/(2m). Define the events, starting at n > m:

A_n = {ω : ∃s ∈ [0, 1] such that |B_t − B_s| ≤ C|t − s|^γ for all t ∈ [s − m/n, s + m/n]}
Define the random variables:

Y_{n,k}(ω) = max_{j=0,1,...,2m} |B((k + j)/n) − B((k + j − 1)/n)|

And finally, the events:

B_n = {ω : at least one of the Y_{n,k} ≤ 2C (m/n)^γ}

We now claim that A_n ⊂ B_n. Indeed, for ω ∈ A_n we find an s so that |B_t − B_s| ≤ C|t − s|^γ for all t ∈ [s − m/n, s + m/n]; in particular, |B_t − B_s| ≤ C (m/n)^γ for all such t. By the pigeonhole principle, inside this interval we can find k so that {k/n, (k+1)/n, (k+2)/n, ..., (k+2m)/n} ⊂ [s − m/n, s + m/n]. But then, for this k, we have:

Y_{n,k}(ω) = max_{j=0,1,...,2m} |B((k + j)/n) − B((k + j − 1)/n)|
≤ max_{j=0,1,...,2m} ( |B((k + j)/n) − B(s)| + |B(s) − B((k + j − 1)/n)| )
≤ 2C (m/n)^γ
P(An) ≤ P(Bn)<br />
≤<br />
<br />
≤<br />
k=0..n−m<br />
<br />
k=0..n−m<br />
<br />
P Yn,k ≤ 2C<br />
<br />
B<br />
P k+j<br />
n<br />
<br />
m<br />
γ n<br />
− B k+j−1<br />
n<br />
<br />
<br />
≤ 2C<br />
<br />
≤ nP |B 1<br />
n − B0|<br />
<br />
m<br />
γ2m < 2C<br />
n<br />
<br />
<br />
m<br />
γ √<br />
= nP |B1 − B0| < 2C n<br />
n<br />
<br />
2<br />
<br />
m<br />
<br />
γ<br />
2m<br />
√<br />
≤ n √2π 2C n<br />
n<br />
2m<br />
1 (<br />
= Dn 2 −γ)2m+1 = Dn m+1−2mγ → 0<br />
<br />
m<br />
γ n<br />
k+2m , . . . n } ⊂ [s −<br />
<br />
for each j = 0, 1, ..2m<br />
Where we used the independence property of disjoint intervals of the Brownian<br />
motion, the scaling relation P(Bt > a) = P(Bct > √ ca), and the easy inequality<br />
P(N(0, 1) > λ) ≤ 2λwhich comes <strong>from</strong> integrating the p.d.f.. Finally, by the choice<br />
of m so that γ > m+1<br />
2m<br />
, we know that m + 1 − 2mγ < 0 so this probability does<br />
indeed go to zero. But then, as the events An are increasing, this means that An<br />
are all zero probability events, which is the result we wanted.
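The "easy inequality" in the proof above, P(|N(0,1)| < λ) ≤ 2λ/√(2π), just bounds the density by its maximum 1/√(2π) over an interval of length 2λ. A quick numerical check (my own sketch, using the identity P(|N(0,1)| < λ) = erf(λ/√2)):

```python
import math

def prob_abs_normal_below(lam):
    """P(|N(0,1)| < lam), computed exactly via the error function."""
    return math.erf(lam / math.sqrt(2.0))

def density_bound(lam):
    """The bound 2*lam/sqrt(2*pi): interval length 2*lam times the
    maximum value 1/sqrt(2*pi) of the standard normal density."""
    return 2.0 * lam / math.sqrt(2.0 * math.pi)

for lam in (0.01, 0.1, 0.5, 1.0, 2.0):
    assert prob_abs_normal_below(lam) <= density_bound(lam) + 1e-15
    print(lam, prob_abs_normal_below(lam), density_bound(lam))
```

For small λ the two sides nearly agree, which is why the bound is sharp enough for the n → ∞ estimate in the proof.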
CHAPTER 5

Appendix

1. Conditional Random Variables

Let (Ω, F, P) be a probability space and X, Y : Ω → R random variables. B is the Borel sigma algebra of R.

Definition 1.1. We define σ(X) ⊂ F to be the sigma-algebra generated by the preimages of Borel sets through X. That is:

σ(X) = σ({X^{−1}(B) : B ∈ B})

Remark. The sub-sigma-algebra σ(X) is coarser than all of F. Intuitively, the random variable X can only "detect" events up to sets in σ(X).

Definition 1.2. Let Σ ⊂ F be a sub-sigma-algebra of F. We say a random variable X : Ω → R is Σ-measurable if X^{−1}(B) ∈ Σ for all B ∈ B. Equivalently, if σ(X) ⊂ Σ.

Example 1.3. Every random variable is always F-measurable, since σ(X) ⊂ F.
Definition 1.4. Given X and Y, we can define a new random variable Z = E(Y|X) to be the unique random variable with the following two properties:

1. Z is σ(X)-measurable.
2. For any B ∈ B we have E(Z 1_{X ∈ B}) = E(Y 1_{X ∈ B}).

Remark. The existence of this random variable is proven by applying the Radon–Nikodym theorem on the probability space restricted to the sigma-field σ(X): Z is the Radon–Nikodym derivative there of the measure S ↦ E(Y 1_S).

Remark. There is no problem with picking an arbitrary sub-sigma-algebra Σ ⊂ F instead of σ(X). The second condition is then simply that for any S ∈ Σ we have E(Z 1_S) = E(Y 1_S), which recovers the condition above when Σ = σ(X).

Remark. Z = E(Y|X) is a random variable Z : Ω → R, but it is often thought of as a function Z : R → R, whose input is the random variable X. This works because Z is σ(X)-measurable. The following two little results clear this up a bit:
Proposition 1.5. If f : R → R is measurable and Z : Ω → R is Σ-measurable, then the random variable f ◦ Z is Σ-measurable too.

Proof. For any B ∈ B we have (f ◦ Z)^{−1}(B) = Z^{−1}(f^{−1}(B)) ∈ Σ, since f^{−1}(B) ∈ B as f is measurable and Z is Σ-measurable. □

Proposition 1.6. If Z is a σ(X)-measurable random variable, then we may think of Z as a function Z̃ : R → R whose input is X.

Proof. Define Z̃ : R → R by Z̃(x) = Z(ω) for any representative ω ∈ X^{−1}({x}). We must justify why this value is independent of the choice of ω ∈ X^{−1}({x}). Indeed, for ω_1, ω_2 ∈ X^{−1}({x}), let z = Z(ω_1). Since Z is σ(X)-measurable, we have that:

Z^{−1}({z}) ∈ σ(X)
⇒ Z^{−1}({z}) = X^{−1}(B) for some B ∈ B

But then ω_1 ∈ Z^{−1}({z}) = X^{−1}(B), so that X(ω_1) ∈ B. Since X(ω_1) = X(ω_2) = x, we then have ω_2 ∈ X^{−1}(B) = Z^{−1}({z}), which means that Z(ω_1) = Z(ω_2) = z, as desired. Hence Z̃ is well defined! With this definition of Z̃, we see that Z = Z̃ ◦ X. We often conflate Z with Z̃ in practice. □
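On a finite probability space, Definition 1.4 becomes concrete: E(Y|X) just averages Y over each level set of X. The following sketch is my own illustration (names like `cond_exp` are made up), verifying both defining properties:

```python
from fractions import Fraction

# Finite probability space: Omega = {0, ..., 7} with the uniform measure.
omega = list(range(8))
p = {w: Fraction(1, 8) for w in omega}

def X(w):
    return w // 4        # X only "detects" the block {0..3} vs {4..7}

def Y(w):
    return w             # Y sees every point

def cond_exp(Y, X):
    """E(Y | X): on each level set of X, replace Y by its average there."""
    Z = {}
    for x in set(X(w) for w in omega):
        level = [w for w in omega if X(w) == x]
        avg = sum(p[w] * Y(w) for w in level) / sum(p[w] for w in level)
        for w in level:
            Z[w] = avg
    return Z

Z = cond_exp(Y, X)

# Property 1: Z is sigma(X)-measurable, i.e. constant on the level sets of X.
assert all(Z[w1] == Z[w2] for w1 in omega for w2 in omega if X(w1) == X(w2))

# Property 2: E[Z 1_{X in B}] = E[Y 1_{X in B}]; it suffices to check B = {x}.
for x in (0, 1):
    lhs = sum(p[w] * Z[w] for w in omega if X(w) == x)
    rhs = sum(p[w] * Y(w) for w in omega if X(w) == x)
    assert lhs == rhs

print(sorted(set(Z.values())))  # the two conditional averages, 3/2 and 11/2
```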
2. Extension Theorems

Theorem 2.1. [Caratheodory Extension Theorem]

Fix some (Ω, A, P_0), where Ω is a set, A is an algebra of sets (a.k.a. a field of sets), and P_0 is a finitely additive probability measure on A. Suppose we have the additional property that:

for sequences of sets A_1, A_2, ... ∈ A which are pairwise disjoint and have ∪A_n ∈ A too, we necessarily have P_0(∪A_n) = Σ P_0(A_n).

Then there is a unique extension to a probability space (Ω, σ(A), P) so that P and P_0 agree on A.

Proof. [sketch] The idea is exactly the same as the construction of the Lebesgue measure on [0, 1] from the premeasure generated by µ((a, b)) = b − a on the algebra of open sets. Define an outer measure:

P*(E) := inf_{E ⊂ ∪A_n} Σ P_0(A_n)

From here you check that P* has the right properties: countable subadditivity and monotonicity are easy, while getting P*(A) = P_0(A) for A ∈ A requires the special property we are given above. Once this is done, you can define measurable sets a la Caratheodory: E is measurable iff for all A ∈ A we have P*(A) = P*(A ∩ E) + P*(A ∩ E^c). Then you verify that σ(A) is contained in the collection of measurable sets, and declare P = P* restricted to σ(A) to be the desired measure. □

Remark. The condition needed in the theorem can be replaced with "continuity from above at ∅":

for A_1, A_2, ... ∈ A which decrease down to ∅, we necessarily have P_0(A_n) → 0.

The equivalence of these two conditions is not too difficult. The first condition is more intuitive, while the second is sometimes easier to verify in practice.
Theorem 2.2. [Countable Kolmogorov Extension Theorem]

Suppose for every n ≥ 1 we have a probability measure P_n on R^n. Suppose also that these probability measures satisfy the following consistency condition for every Borel set E ⊆ R^n:

P_{n+k}(E × R^k) = P_n(E)

Then there exists a unique measure P on the infinite product space R^∞ of sequences, so that for every Borel set E ⊆ R^n we have P(E × R × R × ...) = P_n(E).

Proof. [sketch] Take Ω = R^∞, the space of real-valued sequences. Define the field of cylinder sets to be:

A = {E × R × R × ... : E ⊆ R^n is Borel, n ≥ 1}

with finitely additive measure P_0(E × R × R × ...) := P_n(E). The given consistency condition on the P_n's shows this is well defined. To see continuity from above at ∅, suppose A_k ↓ ∅, where A_k = E_k × R × R × ... for Borel sets E_k ⊆ R^{n_k}. If P_0(A_k) did not tend to 0, one uses inner regularity of the P_n's to choose compact sets inside the bases E_k, and a diagonal/compactness argument then produces a point in ∩A_k, contradicting A_k ↓ ∅. (This step is where the real work lies; it is not automatic just from the P_n's being probability measures.) By application of the Caratheodory extension theorem, we get the desired measure! □
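For the Brownian finite-dimensional distributions, the consistency condition is easy to see at the level of covariance matrices: P_n is the mean-zero Gaussian with Σ_{ij} = min(t_i, t_j), and marginalizing out the last coordinates of a mean-zero Gaussian just deletes the corresponding rows and columns (a standard Gaussian fact). A small check of this, my own sketch with the made-up helper `bm_cov`:

```python
def bm_cov(times):
    """Covariance matrix of (B_{t_1}, ..., B_{t_n}): Sigma[i][j] = min(t_i, t_j)."""
    return [[min(s, t) for t in times] for s in times]

# Consistency condition P_{n+k}(E x R^k) = P_n(E): for these mean-zero
# Gaussian families, integrating out the last k coordinates corresponds
# to deleting the last k rows and columns of the covariance matrix.
times = [0.25, 0.5, 0.75]
extended = times + [0.9, 1.0]

full = bm_cov(extended)
marginal = [row[:len(times)] for row in full[:len(times)]]
assert marginal == bm_cov(times)
print(marginal)
```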
Theorem 2.3. [Kolmogorov Extension Theorem]

Let T ⊂ R be any interval. Suppose we have a family of probability measures P_{t_1,t_2,...,t_n} on R^n for every finite set of points t_1, t_2, ..., t_n in T. Suppose also that these probability measures satisfy the following consistency condition:

P_{t_1,...,t_n,t̂_1,...,t̂_m}(E × R^m) = P_{t_1,...,t_n}(E)

Then there exists a unique measure P on the set of functions {f : T → R} so that:

P({f : (f(t_1), f(t_2), ..., f(t_n)) ∈ E}) = P_{t_1,...,t_n}(E)

Remark. This is very similar to the countable version, but requires some more work to make it work out. However, since the space of functions {f : T → R} is so large, this theorem often gives us a very unwieldy space to work with, one in which we can't get our hands on the properties we want. The construction of Brownian motion in Chapter 4 is a great example: constructing with the uncountable Kolmogorov theorem is bad, while constructing with the countable one is good.