
Notes from Limit Theorems 2

Mihai Nica


Notes. These are my notes from the class Limit Theorems 2 taught by Professor McKean in Spring 2012. I have tried to carefully go over the bigger theorems from the course and fill in all the details explicitly. There is also a lot of information that is folded in from other sources.

• The section on Martingales is supplemented with some notes from "A First Look at Rigorous Probability Theory" by Jeffrey S. Rosenthal, which has a really nice introduction to Martingales.
• The section on the law of the iterated logarithm is supplemented with some inequalities which I looked up on the internet, mostly Wikipedia and PlanetMath.
• In the section on the Ergodic theorem, I use a notation for continued fractions that I found on Wikipedia and like. In my pen-and-paper notes there is also a little section about Ergodic theory for geodesics on surfaces, which is really cute. However, I couldn't figure out a good way to draw the pictures, so it hasn't been typed up yet.
• The section on Brownian Motion is supplemented by the book Brownian Motion and Martingales in Analysis by Richard Durrett, which is really wonderful. Some of the slick results are taken straight from there.
• I also include an appendix with results that I found myself reviewing as I went through this stuff.


Contents

Chapter 1. Martingales
1. Definitions and Examples
2. Stopping times
3. Martingale Convergence Theorem
4. Applications

Chapter 2. The Law of the Iterated Logarithm
1. First Half of the Law of the Iterated Logarithm
2. Second Half of the Law of the Iterated Logarithm

Chapter 3. Ergodic Theorem
1. Motivation
2. Birkhoff's Theorem
3. Continued Fractions

Chapter 4. Brownian Motion
1. Motivation
2. Levy's Construction
3. Construction from Durrett's Book
4. Some Properties

Chapter 5. Appendix
1. Conditional Random Variables
2. Extension Theorems


CHAPTER 1

Martingales

1. Definitions and Examples

This section on Martingales contains heavy use of conditional random variables. I do a quick review of this topic from Limit Theorems 1 in the appendix.

Definition 1.1. A sequence of random variables $X_0, X_1, \ldots$ is called a martingale if $E(|X_n|) < \infty$ for all $n$ and, with probability 1:
$$E(X_{n+1} \mid X_0, X_1, \ldots, X_n) = X_n$$
Intuitively, this says that the average value of $X_{n+1}$ is the same as that of $X_n$, even if we are given the values of $X_0$ to $X_n$. Note that conditioning on $X_0, \ldots, X_n$ is just different notation for conditioning on $\sigma(X_0, \ldots, X_n)$, which is the sigma algebra generated by preimages of Borel sets through $X_0, \ldots, X_n$. One can make more general martingales by replacing $\sigma(X_0, \ldots, X_n)$ with an arbitrary increasing chain of sigma algebras $\mathcal{F}_n$; the results here carry over to that setting too.

Example 1.2. Sometimes martingales are called "fair games". The analogy is that the random variable $X_n$ represents the bankroll of a gambler at time $n$. The game is fair because, at any point in time, the gambler's expected future bankroll equals his current one.

Definition 1.3. A submartingale is when $E(X_{n+1} \mid X_0, X_1, \ldots, X_n) \ge X_n$ (i.e. the capital is increasing), and a supermartingale is when $E(X_{n+1} \mid X_0, X_1, \ldots, X_n) \le X_n$ (i.e. the capital is decreasing). Most of the theorems for martingales work for submartingales after changing the inequality in the right place. To avoid confusion between sub-, super-, and ordinary martingales, we will sometimes call a martingale a "fair martingale".

Example 1.4. The symmetric random walk, $X_n = Z_0 + Z_1 + \ldots + Z_n$ with each $Z_n = \pm 1$ with probability $\frac{1}{2}$, is a martingale. In terms of the fair game, this is gambling on the outcome of a fair coin.
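The martingale property of Example 1.4 can be checked empirically. Below is a minimal sketch (sample sizes and helper names are my own illustrative choices, not from the notes) that simulates many paths of the symmetric walk and checks that the sample mean of $X_n$ stays at $E(X_0) = 0$:

```python
import random

# Minimal sketch: empirically check E(X_n) = E(X_0) = 0 for the simple
# symmetric random walk of Example 1.4. Sample sizes are illustrative choices.
random.seed(0)

def walk(n_steps):
    """One path of X_n = Z_1 + ... + Z_n with Z_k = +-1, each w.p. 1/2."""
    x = 0
    for _ in range(n_steps):
        x += random.choice((-1, 1))
    return x

n_paths, n_steps = 20000, 50
mean_Xn = sum(walk(n_steps) for _ in range(n_paths)) / n_paths
# The sample mean should be near 0, within a few standard errors
# (the sd of X_50 is sqrt(50), about 7.1, so the sd of the mean is ~0.05).
print(abs(mean_Xn))
```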

Remark. Using the properties of conditional probabilities, we see that:
$$E(X_{n+2} \mid X_0, X_1, \ldots, X_n) = E\big(E(X_{n+2} \mid X_0, \ldots, X_{n+1}) \mid X_0, \ldots, X_n\big) = E(X_{n+1} \mid X_0, \ldots, X_n) = X_n$$
With a simple argument by induction, we get in general:
$$E(X_m \mid X_0, X_1, \ldots, X_n) = X_n \qquad \text{for } m \ge n$$
In particular, $E(X_n) = E(X_0)$ for every $n$. If $\tau$ is a random "time" (a non-negative integer) that is independent of the $X_n$'s, then $E(X_\tau)$ is a weighted average of $E(X_n)$'s, so we still have $E(X_\tau) = E(X_0)$. What if $\tau$ is dependent on the $X_n$'s? In general we cannot have equality: for the simple symmetric random walk (coin-flip betting) with $\tau =$ first time that $X_n = -1$, we have $E(X_\tau) = -1 \ne 0 = E(X_0)$. The next section gives some conditions under which equality does hold.

2. Stopping times

Definition 2.1. For a martingale $\{X_n\}$, a non-negative integer valued random variable $\tau$ is a stopping time if it has the property that:
$$\{\tau = n\} \in \sigma(X_0, X_1, \ldots, X_n)$$
Intuitively, this says that one can determine whether $\tau = n$ just by looking at the first $n$ steps of the martingale.

Example 2.2. In the coin-flipping example, if we let $\tau$ be the first time that $X_n = 10$, then $\tau$ is a stopping time.

Example 2.3. We are often interested in $X_\tau$, the value of the martingale at the random time $\tau$. This is precisely defined as $X_\tau(\omega) = X_{\tau(\omega)}(\omega)$. Another handy rewriting is $X_\tau = \sum_k X_k 1_{\{\tau = k\}}$.

Lemma 2.4. If $\{X_n\}$ is a submartingale and $\tau_1, \tau_2$ are bounded stopping times, so that $\exists M$ s.t. $0 \le \tau_1 \le \tau_2 \le M$ with probability 1, then $E(X_{\tau_1}) \le E(X_{\tau_2})$, with equality for fair martingales.

Proof. For fixed $k$, the event $\{\tau_1 < k \le \tau_2\}$ can be written as $\{\tau_1 < k \le \tau_2\} = \{\tau_1 \le k-1\} \cap \{\tau_2 \le k-1\}^C$, from which we see that $\{\tau_1 < k \le \tau_2\} \in \sigma(X_0, X_1, \ldots, X_{k-1})$ because $\tau_1$ and $\tau_2$ are both stopping times. We then have the following manipulation, using a telescoping series, linearity of the expectation, the fact that $E(Y 1_A) = E(E(Y \mid X_0, X_1, \ldots, X_{k-1}) 1_A)$ for events $A \in \sigma(X_0, X_1, \ldots, X_{k-1})$, and finally the fact that $E(X_k \mid X_0, X_1, \ldots, X_{k-1}) - X_{k-1} \ge 0$ since $X_n$ is a (sub)martingale (with equality for fair martingales):
$$E(X_{\tau_2}) - E(X_{\tau_1}) = E(X_{\tau_2} - X_{\tau_1}) = E\Big(\sum_{k=\tau_1+1}^{\tau_2} (X_k - X_{k-1})\Big) = E\Big(\sum_{k=1}^{M} (X_k - X_{k-1})\,1_{\{\tau_1 < k \le \tau_2\}}\Big)$$
$$= \sum_{k=1}^{M} E\Big(\big(E(X_k \mid X_0, \ldots, X_{k-1}) - X_{k-1}\big)\,1_{\{\tau_1 < k \le \tau_2\}}\Big) \ge 0 \qquad \square$$



Theorem 2.5. Say $\{X_n\}$ is a martingale and $\tau$ a bounded stopping time (that is, $\exists M$ s.t. $0 \le \tau \le M$ with probability 1). Then:
$$E(X_\tau) = E(X_0)$$

Proof. Let $\upsilon$ be the random variable which is constantly 0. This is a stopping time! So by the above lemma, since $0 \le \upsilon \le \tau \le M$, we have $E(X_\tau) = E(X_\upsilon) = E(X_0)$. $\square$
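Theorem 2.5 can be illustrated numerically. In this minimal sketch (the hitting target and the bound $M$ are my own illustrative choices, not from the notes), we stop the symmetric walk at $\tau = \min(\text{first hit of } +3, M)$, a bounded stopping time, and check $E(X_\tau) \approx E(X_0) = 0$:

```python
import random

# Minimal sketch of Theorem 2.5: E(X_tau) = E(X_0) for a bounded stopping
# time. Here tau = min(first time the walk hits +3, M). Parameters are
# illustrative choices.
random.seed(1)
M = 20

def stopped_value():
    x = 0
    for n in range(1, M + 1):
        x += random.choice((-1, 1))
        if x == 3:          # {tau = n} depends only on the first n steps
            return x
    return x                # tau truncated at M

n_paths = 40000
mean_X_tau = sum(stopped_value() for _ in range(n_paths)) / n_paths
print(abs(mean_X_tau))  # should be near E(X_0) = 0
```

Note that an unbounded version of this stopping time (wait forever for $+3$) would not satisfy the theorem's hypotheses; the truncation at $M$ is what makes it apply.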

Theorem 2.6. For $\{X_n\}$ a martingale and $\tau$ a stopping time which is almost surely finite (that is, $P(\tau < \infty) = 1$) we have:
$$E(X_\tau) = E(X_0) \iff E\Big(\lim_{n\to\infty} X_{\min(\tau,n)}\Big) = \lim_{n\to\infty} E\big(X_{\min(\tau,n)}\big)$$

Proof. It suffices to show that $E(X_\tau) = E\big(\lim_{n\to\infty} X_{\min(\tau,n)}\big)$ and $E(X_0) = \lim_{n\to\infty} E\big(X_{\min(\tau,n)}\big)$. The first equality holds since $P(\tau < \infty) = 1$ gives $P\big(\lim_{n\to\infty} X_{\min(\tau,n)} = X_\tau\big) = 1$, so they agree almost surely. The second holds by the above theorem concerning bounded stopping times: for any $n$, $\min(\tau, n)$ is a bounded stopping time, so we have $E\big(X_{\min(\tau,n)}\big) = E(X_0)$, and equality holds in the limit too. $\square$

Remark. The above theorem can be combined with things like the monotone convergence theorem or the Lebesgue dominated convergence theorem to switch the limits and conclude that $E(X_\tau) = E(X_0)$. Here are some examples:

Example 2.7. If $\{X_n\}$ is a martingale and $\tau$ a stopping time so that $P(\tau < \infty) = 1$, $E(|X_\tau|) < \infty$, and $\lim_{n\to\infty} E(X_n 1_{\tau > n}) = 0$, then $E(X_\tau) = E(X_0)$.

Proof. For any $n$ we have $X_{\min(\tau,n)} = X_n 1_{\tau > n} + X_\tau 1_{\tau \le n}$. Taking expectations and then the limit as $n \to \infty$ gives:
$$\lim_{n\to\infty} E\big(X_{\min(\tau,n)}\big) = \lim_{n\to\infty} E(X_n 1_{\tau > n}) + \lim_{n\to\infty} E(X_\tau 1_{\tau \le n}) = 0 + E(X_\tau)$$
The first term is 0 by hypothesis, and the second limit is justified since $X_\tau 1_{\tau \le n} \to X_\tau$ pointwise almost surely (since $P(\tau < \infty) = 1$), and the majorant $|X_\tau|$ with $E(|X_\tau|) < \infty$ lets us use the Lebesgue dominated convergence theorem to conclude the convergence of the expectations. $\square$

Example 2.8. Suppose $\{X_n\}$ is a martingale and $\tau$ a stopping time so that $E(\tau) < \infty$ and $|X_{n+1} - X_n| \le M < \infty$ for some fixed $M$ and for every $n$. Then $E(X_\tau) = E(X_0)$.

Proof. Let $Y = |X_0| + M\tau$. Then $Y$ dominates $|X_{\min(\tau,n)}|$ and has $E(Y) < \infty$, so it can be used as the majorant in an application of the Lebesgue dominated convergence theorem very similar to the above example to get the conclusion. $\square$

3. Martingale Convergence Theorem

The proof relies on the famous upcrossing lemma:

Lemma 3.1. [The Upcrossing Lemma] Let $\{X_n\}$ be a submartingale. For fixed $\alpha, \beta \in \mathbb{R}$, $\beta > \alpha$, and $M \in \mathbb{N}$, let $U^{\alpha,\beta}_M$ be the number of "upcrossings" that $\{X_n\}$ makes of the interval $[\alpha, \beta]$ in the time period $1 \le n \le M$. (An upcrossing is when $X_n$ goes from being less than $\alpha$ initially to being more than $\beta$ later. Precisely this is $U^{\alpha,\beta}_M = \max\{k : \exists\, t_1 < u_1 < \ldots < t_k < u_k \le M \text{ s.t. } X_{t_i} \le \alpha \text{ and } X_{u_i} \ge \beta \ \forall i\}$.) Then:
$$E\big(U^{\alpha,\beta}_M\big) \le \frac{E(|X_M - X_0|)}{\beta - \alpha}$$

Proof. Firstly, we remark that it suffices to prove the result with the submartingale $\{X_n\}$ replaced by $\{\max(X_n, \alpha)\}$, since this is still a submartingale, it has the same number of upcrossings as $X_n$, and $|\max(X_M, \alpha) - \max(X_0, \alpha)| \le |X_M - X_0|$, so the inequality is only strengthened. In other words, we assume without loss of generality that $X_n \ge \alpha$ for all $n$. This simplification is used in exactly one spot later on to get the inequality we need.

Let us now carefully nail down where the upcrossings happen. Define $u_0 = v_0 = 0$ and iteratively define:
$$u_j = \min\Big(M,\ \inf_{k > v_{j-1}} \{k : X_k \le \alpha\}\Big), \qquad v_j = \min\Big(M,\ \inf_{k > u_j} \{k : X_k \ge \beta\}\Big)$$
These record the times where the martingale crosses the interval $[\alpha, \beta]$: the $u_j$'s record when it first crosses to the left of the interval, and the $v_j$'s record crossings to the right of the interval. They are also truncated at time $M$ so that they are bounded stopping times. Moreover, since these times are strictly increasing until they hit $M$, it must be the case that $v_M = M$. We have then, using some crafty telescoping sums:
$$E(X_M) = E(X_{v_M}) = E\big((X_{v_M} - X_{u_M}) + (X_{u_M} - X_{v_{M-1}}) + (X_{v_{M-1}} - \ldots - X_{u_1}) + (X_{u_1} - X_0) + X_0\big)$$
$$= E(X_0) + E\Big(\sum_{k=1}^{M} \big(X_{v_k} - X_{u_k}\big)\Big) + E\Big(\sum_{k=1}^{M} \big(X_{u_k} - X_{v_{k-1}}\big)\Big)$$
The third term is non-negative! This is because $u_k$ and $v_{k-1}$ are both bounded stopping times with $0 \le v_{k-1} \le u_k \le M$, so our theorem about stopping times gives that this expectation is non-negative. (This is subtle! Most of the time, when we haven't hit time $M$ yet, we expect $X_{u_k} \le \alpha$ while $X_{v_{k-1}} \ge \beta$, so their difference is negative. However, because of the small probability event where $v_{k-1} < M$ and $u_k = M$, we get a big positive number with small probability which balances the whole expectation. Compare to the example of a simple symmetric random walk with a truncated stopping time for $\tau =$ first time that $X_n = -1$.)

Now the second term has $E\big(\sum_{k=1}^{M} (X_{v_k} - X_{u_k})\big) \ge E\big((\beta - \alpha)\, U^{\alpha,\beta}_M\big)$. This is because each upcrossing counted in $U^{\alpha,\beta}_M$ contributes at least $(\beta - \alpha)$ to the sum, null cycles (where $u_k = v_k = M$) contribute nothing, and the possibly one incomplete cycle (where $u_k < M$ but $v_k = M$) must give a non-negative contribution to the sum by the simplification that $X_n \ge \alpha$.

Hence we have:
$$E(X_M) \ge E(X_0) + (\beta - \alpha)\, E\big(U^{\alpha,\beta}_M\big) + 0$$
which gives the desired result. $\square$
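The upcrossing lemma lends itself to a numerical check. The sketch below (parameters and helper names are my own illustrative choices, not from the notes) counts completed upcrossings of $[\alpha,\beta] = [-2,2]$ by the simple symmetric walk, a fair martingale and hence a submartingale, and compares the sample mean against the bound $E(|X_M - X_0|)/(\beta - \alpha)$:

```python
import random

# Minimal sketch: numerically check E(U) <= E(|X_M - X_0|)/(beta - alpha)
# for the simple symmetric walk. Parameters are illustrative choices.
random.seed(2)
alpha, beta, M = -2, 2, 200

def upcrossings(path):
    """Count completed crossings from <= alpha up to >= beta."""
    count, below = 0, False
    for x in path:
        if x <= alpha:
            below = True
        elif x >= beta and below:
            count += 1
            below = False
    return count

n_paths = 5000
total_U, total_gap = 0, 0.0
for _ in range(n_paths):
    path, x = [], 0
    for _ in range(M):
        x += random.choice((-1, 1))
        path.append(x)
    total_U += upcrossings(path)
    total_gap += abs(path[-1] - 0)   # |X_M - X_0| with X_0 = 0

mean_U = total_U / n_paths
bound = (total_gap / n_paths) / (beta - alpha)
print(mean_U, bound)  # mean_U should not exceed the bound
```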


Theorem 3.2. [Martingale Convergence Theorem] Let $\{X_n\}$ be a submartingale with $\sup_n E(|X_n|) < \infty$. Then there exists a random variable $X$ so that $X_n \to X$ almost surely. (That is, $X_n(\omega) \to X(\omega)$ for almost all $\omega \in \Omega$.)

Proof. Firstly, since $\sup_n E(|X_n|) < \infty$, by Fatou's lemma we have $E(\liminf_n |X_n|) \le \liminf_n E(|X_n|) \le \sup_n E(|X_n|) < \infty$, from which it follows that $P(|X_n| \to \infty) = 0$. This ensures that the $X_n$ cannot "leak away" probability to $\pm\infty$, which would prevent the limiting random variable from being properly normalized.

Now suppose by contradiction that $P(\liminf X_n < \limsup X_n) > 0$, i.e. there is a non-zero probability of $X_n$ not converging. Then, using the density of the rationals and countable subadditivity, find $\alpha$ and $\beta$ so that $P(\liminf X_n < \alpha < \beta < \limsup X_n) > 0$. Counting the number of upcrossings $X_n$ makes of $[\alpha, \beta]$, we see that we must have:
$$P\Big(\lim_{M\to\infty} U^{\alpha,\beta}_M = \infty\Big) \ge P(\liminf X_n < \alpha < \beta < \limsup X_n) > 0$$
Hence $E\big(\lim_{M\to\infty} U^{\alpha,\beta}_M\big) = \infty$. By the monotone convergence theorem, however, we have that:
$$\lim_{M\to\infty} E\big(U^{\alpha,\beta}_M\big) = E\Big(\lim_{M\to\infty} U^{\alpha,\beta}_M\Big) = \infty$$
But now we have reached a contradiction! For by the upcrossing lemma:
$$\lim_{M\to\infty} E\big(U^{\alpha,\beta}_M\big) \le \lim_{M\to\infty} \frac{E(|X_M - X_0|)}{\beta - \alpha} \le \frac{2\sup_n E(|X_n|)}{\beta - \alpha} < \infty \qquad \square$$

4. Applications

Theorem 4.1. [Levy] Suppose $Z$ is a random variable with $E(|Z|) < \infty$, and that $\{\mathcal{F}_n\}$ is a decreasing chain of $\sigma$-algebras, $\mathcal{F}_1 \supset \mathcal{F}_2 \supset \ldots$ (this says that they are getting coarser and coarser). Let $\mathcal{F}_\infty = \cap\, \mathcal{F}_n$. Then we have almost surely:
$$\lim_{n\to\infty} E(Z \mid \mathcal{F}_n) = E(Z \mid \mathcal{F}_\infty)$$

Proof. We first prove that there is an almost sure limit using the martingale convergence theorem, and then we check the defining properties of $E(Z \mid \mathcal{F}_\infty)$ to verify that this is indeed the limit.

Firstly, let $X_n = E(Z \mid \mathcal{F}_n)$. Then for any fixed $M \in \mathbb{N}$, the sequence $X_M, X_{M-1}, \ldots, X_2, X_1$ is a martingale. (Here we are referring to a slightly more general martingale than in our original definition: the sigma algebra $\sigma(X_1, X_2, \ldots)$ in the definition is replaced by an arbitrary increasing chain of sigma algebras. The expectation property of the martingale follows from the fact that $E(E(Z \mid \mathcal{F}) \mid \mathcal{G}) = E(Z \mid \mathcal{G})$ when $\mathcal{G} \subset \mathcal{F}$.) Notice that we had to reverse the order of the sequence to get the sigma algebras to increase (i.e. get finer and finer), so that we really have a martingale. For this reason the martingale convergence theorem does not apply directly, but the idea of the proof still works. Suppose by contradiction, as in the proof of the martingale convergence theorem, that $P(\liminf X_n < \limsup X_n) > 0$. Then, as before, find $\alpha$ and $\beta$ so that $P(\liminf X_n < \alpha < \beta < \limsup X_n) > 0$. Since there are then infinitely many crossings of the interval $[\alpha, \beta]$, the number of downcrossings $D^{\alpha,\beta}_M$ has $P\big(\lim_{M\to\infty} D^{\alpha,\beta}_M = \infty\big) > 0$, and so $E\big(\lim_{M\to\infty} D^{\alpha,\beta}_M\big) = \infty$. Hence, since $D^{\alpha,\beta}_M$ is increasing in $M$ (the number of downcrossings can only increase if we wait longer), we may find an $M_0 \in \mathbb{N}$ so that:
$$E\big(D^{\alpha,\beta}_{M_0}\big) > \frac{2E(|Z|)}{\beta - \alpha}$$
Taking now the martingale sequence $X_{M_0}, X_{M_0-1}, \ldots, X_2, X_1$, we have a violation of the upcrossing lemma, just as we did in the martingale convergence theorem.

Next, to verify that the limit is indeed $E(Z \mid \mathcal{F}_\infty)$, we just need to check the two defining properties, namely that it is $\mathcal{F}_\infty$-measurable and that it has the correct expectation value for events in $\mathcal{F}_\infty$. The limit $\lim_{n\to\infty} E(Z \mid \mathcal{F}_n)$ is $\mathcal{F}_\infty$-measurable: for any fixed $m$, the limit depends only on the terms with $n \ge m$, each of which is $\mathcal{F}_n$-measurable with $\mathcal{F}_n \subset \mathcal{F}_m$, so the limit is $\mathcal{F}_m$-measurable for every $m$, and hence measurable with respect to $\mathcal{F}_\infty = \cap\, \mathcal{F}_m$.

To see that $\lim_{n\to\infty} E(Z \mid \mathcal{F}_n)$ takes the correct expectations for events in $\mathcal{F}_\infty$, notice that for any $A \in \mathcal{F}_\infty \subset \mathcal{F}_n$ we have $E(E(Z \mid \mathcal{F}_n) 1_A) = E(Z 1_A)$ for every $n$, since $A \in \mathcal{F}_n$, so in the limit $\lim_{n\to\infty} E(E(Z \mid \mathcal{F}_n) 1_A) = E(Z 1_A)$. Hence the problem of proving that $E(\lim_{n\to\infty} E(Z \mid \mathcal{F}_n) 1_A) = E(Z 1_A)$ is reduced to an interchange of a limit with an expectation. If $Z$ is bounded, this is justified by the bounded convergence theorem. For $Z$ not bounded, truncating $Z$ to $Z 1_{\{|Z| \le N\}}$, with a bit more work, will give the same interchange of limits. $\square$

Theorem 4.2. [Levy] Suppose $Z$ is a random variable with $E(|Z|) < \infty$, and that $\{\mathcal{F}_n\}$ is an increasing chain of $\sigma$-algebras, $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \ldots$ (this says that they are getting finer and finer). Let $\mathcal{F}_\infty = \sigma(\cup\, \mathcal{F}_n)$. Then we have almost surely:
$$\lim_{n\to\infty} E(Z \mid \mathcal{F}_n) = E(Z \mid \mathcal{F}_\infty)$$

Proof. This proof is like the last one. In this case $E(Z \mid \mathcal{F}_n)$ really is a martingale (no reversing needed), so an almost sure limit exists by the martingale convergence theorem. Some more work is needed here... I think you get the desired property by approximation with "tame events": for $A \in \mathcal{F}_\infty$ and every $\epsilon > 0$ there exists $A_n \in \mathcal{F}_n$ such that $P(A \Delta A_n) < \epsilon$. $\square$

Remark. This result is often known as the "Levy Zero-One Law", since a common application is to consider an event $A \in \mathcal{F}_\infty$, for which the theorem tells us that:
$$\lim_{n\to\infty} P(A \mid \mathcal{F}_n) = \lim_{n\to\infty} E(1_A \mid \mathcal{F}_n) = E(1_A \mid \mathcal{F}_\infty) = 1_A$$
where the last equality holds since $A$ is $\mathcal{F}_\infty$-measurable. This says in particular that this limiting probability is either 0 or 1, since these are the only two values taken on by $1_A$. In this setting, the theorem gives a short proof of the Kolmogorov zero-one law.

Theorem 4.3. [Kolmogorov Zero-One Law] Let $X_1, X_2, \ldots$ be an infinite sequence of i.i.d. random variables. Define:
$$\mathcal{F}_n = \sigma\Big(\bigcup_{k=1}^{n} \sigma(X_k)\Big), \qquad \mathcal{F}_\infty = \sigma\Big(\bigcup_{n=1}^{\infty} \mathcal{F}_n\Big), \qquad \mathcal{F}_{\mathrm{tail}} = \bigcap_{n=1}^{\infty} \sigma\Big(\bigcup_{k=n}^{\infty} \sigma(X_k)\Big)$$
Then any event $A \in \mathcal{F}_{\mathrm{tail}}$ has either $P(A) = 0$ or $P(A) = 1$. These are the events which do not depend on finitely many of the $X_n$'s.

Proof. Let $A \in \mathcal{F}_{\mathrm{tail}}$. For any $n \in \mathbb{N}$ we have that $P(A) = P(A \mid \mathcal{F}_n) = E(1_A \mid \mathcal{F}_n)$, since $A \in \mathcal{F}_{\mathrm{tail}}$ does not depend on the first $n$ variables, so its conditional expectation is a constant. We have then, as in the above "Levy 0-1" remark:
$$P(A) = \lim_{n\to\infty} P(A \mid \mathcal{F}_n) = 1_A$$
since $A \in \mathcal{F}_\infty$. So indeed, the only possible values of $P(A)$ are 0 and 1. $\square$

Theorem 4.4. [Strong Law of Large Numbers] Suppose $X_1, X_2, \ldots$ are i.i.d. with $E(|X_1|) < \infty$. Then we have almost surely that:
$$\lim_{n\to\infty} \frac{X_1 + X_2 + \ldots + X_n}{n} = E(X_1)$$

Proof. Define $S_n = X_1 + X_2 + \ldots + X_n$, and let $\mathcal{F}_n = \sigma\big(\bigcup_{k=n}^{\infty} \sigma(S_k)\big)$ be the sigma algebra of the tail $S_n, S_{n+1}, \ldots$. We now claim that:
$$E(X_1 \mid \mathcal{F}_n) = \frac{S_n}{n}$$
This can be seen in the following slick way. First notice that, by symmetry, we must have $E(X_1 \mid \mathcal{F}_n) = E(X_2 \mid \mathcal{F}_n) = \ldots = E(X_n \mid \mathcal{F}_n)$. By linearity now, $\sum_{k=1}^{n} E(X_k \mid \mathcal{F}_n) = E\big(\sum_{k=1}^{n} X_k \mid \mathcal{F}_n\big) = E(S_n \mid \mathcal{F}_n) = S_n$, since $S_n \in \mathcal{F}_n$. Hence, since they are all equal and sum to $S_n$, we get $E(X_1 \mid \mathcal{F}_n) = \frac{S_n}{n}$ as desired. By Levy's theorem now (the chain $\mathcal{F}_n$ is decreasing):
$$\lim_{n\to\infty} \frac{S_n}{n} = \lim_{n\to\infty} E(X_1 \mid \mathcal{F}_n) = E\Big(X_1 \,\Big|\, \bigcap_k \mathcal{F}_k\Big)$$
From here, one can use the Hewitt-Savage zero-one law (which says that permutation invariant events satisfy a zero-one law) to see that the whole sigma algebra $\bigcap_k \mathcal{F}_k$ must be the trivial one, so then $E\big(X_1 \mid \bigcap_k \mathcal{F}_k\big) = E(X_1)$. Alternatively, once we have concluded that such an almost sure limit exists, one could remark by the Kolmogorov zero-one law that the limit must be a constant (for $\lim_{n\to\infty} \frac{S_n}{n}$ does not depend on finitely many of the $X_n$'s, so any event of the type $\{\lim \frac{S_n}{n} < \alpha\}$ must have probability 0 or 1; by taking a sup over $\alpha$, we find that the limit must be a constant). Combining this with the above, using the fact that conditional random variables preserve the expectation, shows the constant is indeed $E(X_1)$. $\square$
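The strong law is easy to watch in action. A minimal sketch (the distribution and sample size are my own illustrative choices, not from the notes): for i.i.d. Uniform(0,1) samples, $S_n/n$ along a single realization should approach $E(X_1) = 1/2$.

```python
import random

# Minimal sketch of the strong law of large numbers: S_n / n -> E(X_1)
# along a single realization. Parameters are illustrative choices.
random.seed(3)
n = 200000
s = 0.0
for _ in range(n):
    s += random.random()   # Uniform(0,1), so E(X_1) = 1/2
sample_mean = s / n
print(sample_mean)  # close to 0.5
```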

Theorem 4.5. [Hewitt-Savage Zero-One Law] Let $X_1, X_2, \ldots$ be an infinite sequence of i.i.d. random variables. Let $A$ be an event which is unchanged under finite permutations of the indices of the $X_i$'s (i.e. for every finite permutation $\Pi$, $\omega = (x_1, x_2, \ldots) \in A$ iff $\Pi(\omega) = (x_{\Pi(1)}, x_{\Pi(2)}, \ldots) \in A$; that is, $\Pi(A) = A$). Then $P(A) = 0$ or $1$.

Proof. We call an event "tame" if it depends on only finitely many of the $X_i$'s. The proof is a consequence of the fact that for any $\epsilon$, any event $A$ can be approximated by a tame event $B$ so that $P(B \triangle A) < \epsilon$. (This is completely analogous to the fact that for the usual Lebesgue measure on $\mathbb{R}$, one can approximate any measurable set $S$ of finite measure by a finite union of open intervals $I_1, \ldots, I_n$ so that $\lambda\big(\cup_{i=1}^{n} I_i \triangle S\big) < \epsilon$. This comes from the definition of the Lebesgue measure as the inf of the outer measure with open sets, and the fact that every open set is a union of countably many intervals, of which only finitely many are needed to be within $\epsilon/2$. In the same vein, the probability measure on the infinite sequence of events is generated by the outer measure from tame events. This is usually all packaged up in the Caratheodory extension theorem.) Once we have this tame event $B$, depending only on $X_1, \ldots, X_n$, we let $\Pi$ be the permutation that swaps $1, \ldots, n$ with $n+1, \ldots, 2n$, so that $B$ and $\Pi(B)$ are independent events. We have then:
$$P(A) \approx P(A \cap B) = P(\Pi(A) \cap \Pi(B)) = P(A \cap \Pi(B)) \approx P(B \cap \Pi(B)) = P(B)\,P(\Pi(B)) = P(B)^2 \approx P(A)^2$$
where each of the approximations holds within $\epsilon$ by the choice of $B$. Since we can do this for every $\epsilon > 0$, we get $P(A) = P(A)^2$ and the result follows. $\square$


CHAPTER 2

The Law of the Iterated Logarithm

We will prove that for a sequence of i.i.d. random variables $X_1, X_2, \ldots$ with mean 0 and variance 1, with $S_n = \sum_{i=1}^{n} X_i$:
$$P\Big(\limsup_{n\to\infty} \frac{S_n}{\sqrt{n\log(\log n)}} = \sqrt{2}\Big) = 1$$
This result gives us finer information about these sums than the law of large numbers or the central limit theorem. We need the theory of martingales to get Doob's inequality, and then a bunch of other sneaky tricks, like the Borel-Cantelli lemmas, to get the result. We will also need a few analytic type estimates along the way. (Actually, our proof here will only cover the case where the $X_n$'s are $\pm 1$ with probability $1/2$ each. The result can be generalized by using even finer estimates.)
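To get a feel for the statement, one can track the running maximum of the statistic $S_n/\sqrt{n\log(\log n)}$ along a single long walk. A minimal sketch (path length is my own illustrative choice; convergence of the limsup is very slow, so this only gives a value loosely comparable to $\sqrt{2}$):

```python
import math
import random

# Minimal sketch: one long simple symmetric walk, tracking the running
# maximum of S_n / sqrt(n log log n), which the chapter shows has limsup
# sqrt(2) ~ 1.414. Path length is an illustrative choice.
random.seed(6)
N = 500000
s, best = 0, 0.0
for n in range(1, N + 1):
    s += random.choice((-1, 1))
    if n > 10:  # need log(log n) > 0
        best = max(best, s / math.sqrt(n * math.log(math.log(n))))
print(best)  # loosely comparable to sqrt(2)
```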

1. First Half of the Law of the Iterated Logarithm

To start, we will first prove some helpful lemmas.

Lemma 1.1. [Doob's Inequality] For a submartingale $Z_n$, we have for any $\alpha > 0$ that:
$$P\Big(\max_{0 \le i \le n} Z_i \ge \alpha\Big) \le \frac{E(|Z_n|)}{\alpha}$$

Proof. (Taken from Rosenthal.) Let $A_k$ be the event $\{Z_k \ge \alpha$, but $Z_i < \alpha$ for $i < k\}$, i.e. that the process reaches $\alpha$ for the first time at time $k$. These are disjoint events with $A = \cup A_k = \{(\max_{0 \le i \le n} Z_i) \ge \alpha\}$, which is the event we want. Now consider:
$$\alpha P(A) = \sum_{k=0}^{n} \alpha P(A_k) = \sum_{k=0}^{n} E(\alpha 1_{A_k}) \le \sum_{k=0}^{n} E(Z_k 1_{A_k}) \quad \text{since } Z_k \ge \alpha \text{ on } A_k$$
$$\le \sum_{k=0}^{n} E\big(E(Z_n \mid Z_1, Z_2, \ldots, Z_k)\, 1_{A_k}\big) \quad \text{since it's a submartingale}$$
$$= \sum_{k=0}^{n} E(Z_n 1_{A_k}) = E(Z_n 1_A) \le E(|Z_n|)$$
and the result follows. $\square$
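Doob's inequality can be checked numerically. The sketch below (parameters are my own illustrative choices, not from the notes) uses $Z_n = |S_n|$ for a simple symmetric walk $S_n$, which is a submartingale since the absolute value of a martingale is one, and compares the empirical probability that the running maximum reaches $\alpha$ against $E(|Z_n|)/\alpha$:

```python
import random

# Minimal sketch: check P(max_{0<=i<=n} Z_i >= a) <= E(|Z_n|)/a with
# Z_i = |S_i| for the simple symmetric walk. Parameters are illustrative.
random.seed(4)
n, a, n_paths = 100, 15, 20000
hits, total_abs = 0, 0
for _ in range(n_paths):
    s, peak = 0, 0
    for _ in range(n):
        s += random.choice((-1, 1))
        peak = max(peak, abs(s))
    hits += peak >= a
    total_abs += abs(s)
lhs = hits / n_paths
rhs = (total_abs / n_paths) / a
print(lhs, rhs)  # lhs should not exceed rhs
```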


Remark. This is a "rich man's version of Chebyshev-type inequalities", which are proved using the same trick as in lines 3 and 4 of the chain of inequalities above. The fact that the behavior of the whole martingale can be controlled by the end point of the martingale gives us the little extra oomph we need.

Lemma 1.2. [Hoeffding's Inequality] Let $Y$ be a random variable so that $E(Y) = 0$, and $a, b \in \mathbb{R}$ so that $a \le Y \le b$ almost surely. Then $E(e^{tY}) \le e^{t^2(b-a)^2/8}$.

Proof. Write $Y$ as a convex combination of $a$ and $b$: $Y = \alpha b + (1-\alpha)a$ where $\alpha = (Y-a)/(b-a)$. By convexity of $e^{(\cdot)}$, we then have:
$$e^{tY} \le \frac{Y-a}{b-a} e^{tb} + \frac{b-Y}{b-a} e^{ta}$$
Taking expectations (and using $E(Y) = 0$), we have:
$$E\big(e^{tY}\big) \le \frac{-a}{b-a} e^{tb} + \frac{b}{b-a} e^{ta} = e^{g(t(b-a))}$$
for $g(u) = -\gamma u + \log(1 - \gamma + \gamma e^u)$ and $\gamma = -\frac{a}{b-a}$. Notice $g(0) = g'(0) = 0$ and $g''(u) \le \frac{1}{4}$ for all $u$. Hence by Taylor's theorem:
$$g(u) = g(0) + u g'(0) + \frac{u^2}{2} g''(\xi) \le 0 + 0 + \frac{u^2}{2} \cdot \frac{1}{4} = \frac{u^2}{8}$$
So then $E\big(e^{tY}\big) \le e^{g(t(b-a))} \le e^{t^2(b-a)^2/8}$. $\square$
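For the $\pm 1$ steps used in this chapter, Hoeffding's bound can be checked exactly, since the moment generating function is known in closed form: with $Y = \pm 1$ each with probability $1/2$ (so $a = -1$, $b = 1$, $E(Y) = 0$), $E(e^{tY}) = \cosh(t)$, and the lemma asserts $\cosh(t) \le e^{t^2/2}$.

```python
import math

# Minimal sketch: check Hoeffding's bound E(e^{tY}) <= e^{t^2 (b-a)^2 / 8}
# for Y = +-1 with probability 1/2, where E(e^{tY}) = cosh(t) exactly
# and (b - a)^2 = 4, so the bound is exp(t^2 / 2).
for t in (0.1, 0.5, 1.0, 2.0):
    mgf = math.cosh(t)
    bound = math.exp(t * t / 2)
    assert mgf <= bound
print("cosh(t) <= exp(t^2/2) checked")
```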

Lemma 1.3. Let $X_1, X_2, \ldots$ be i.i.d. with $P(X_1 = \pm 1) = \frac{1}{2}$ and $S_n = \sum_{k=1}^{n} X_k$. Then $P(\max_{k \le n} S_k > \lambda) \le e^{-\lambda^2/2n}$.

Proof. Using Doob's inequality and Hoeffding's inequality, for any $t > 0$ we have:
$$P\Big(\max_{k \le n} S_k > \lambda\Big) = P\Big(\max_{k \le n} e^{tS_k} > e^{t\lambda}\Big) \le e^{-t\lambda} E\big(e^{tS_n}\big) = e^{-t\lambda} E\big(e^{tX_1}\big)^n \le e^{-t\lambda} e^{nt^2(b-a)^2/8}$$
Set $t = 4\lambda/\big(n(b-a)^2\big)$ to get:
$$P\Big(\max_{k \le n} S_k > \lambda\Big) \le e^{-\left(4\lambda/n(b-a)^2\right)\lambda}\, e^{n\left(4\lambda/n(b-a)^2\right)^2 (b-a)^2/8} = e^{-2\lambda^2/n(b-a)^2}$$
For simple symmetric steps, we have $a = -1$ and $b = 1$, so this gives the result. $\square$
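Lemma 1.3 can also be checked by simulation. A minimal sketch (parameters are my own illustrative choices, not from the notes):

```python
import math
import random

# Minimal sketch: check P(max_{k<=n} S_k > lam) <= exp(-lam^2 / (2n))
# for the simple symmetric walk. Parameters are illustrative choices.
random.seed(5)
n, lam, n_paths = 100, 25, 20000
hits = 0
for _ in range(n_paths):
    s, peak = 0, 0
    for _ in range(n):
        s += random.choice((-1, 1))
        peak = max(peak, s)
    hits += peak > lam
empirical = hits / n_paths
bound = math.exp(-lam * lam / (2 * n))
print(empirical, bound)  # empirical should not exceed the bound
```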

Theorem 1.4. Let $X_1, X_2, \ldots$ be i.i.d. with $P(X_1 = \pm 1) = \frac{1}{2}$ and $S_n = \sum_{k=1}^{n} X_k$. Then for any $\epsilon > 0$:
$$P\Big(\limsup_{n\to\infty} \frac{S_n}{\sqrt{n\log(\log n)}} > \sqrt{2} + \epsilon\Big) = 0$$


Or in other words, since this holds for any value of $\epsilon > 0$:
$$P\Big(\limsup_{n\to\infty} \frac{S_n}{\sqrt{n\log(\log n)}} \le \sqrt{2}\Big) = 1$$

Proof. Fix some $\theta > 1$ (the choice will be made more precise later). We will show that with the correct choice of $\theta$, the events $A_n = \big\{S_k > \sqrt{(2+\epsilon)k\log(\log k)}$ for some $k$, $\theta^{n-1} \le k < \theta^n\big\}$ happen only finitely many times, which will show that the limsup can't be more than $\sqrt{2+\epsilon}$. To do this it suffices to show that $P(A_n)$ is summable, because then the Borel-Cantelli lemmas will show that $A_n$ happens finitely often with probability 1. We have (using our previous lemma):
$$P(A_n) = P\Big(S_k > \sqrt{(2+\epsilon)k\log(\log k)},\ \theta^{n-1} \le k < \theta^n\Big) \le P\Big(S_k > \sqrt{(2+\epsilon)\theta^{n-1}\log(\log\theta^{n-1})},\ \theta^{n-1} \le k < \theta^n\Big)$$
$$\le P\Big(\max_{k \le \theta^n} S_k > \sqrt{(2+\epsilon)\theta^{n-1}\log(\log\theta^{n-1})}\Big) \le \exp\Big(-\frac{(2+\epsilon)\theta^{n-1}\log(\log\theta^{n-1})}{2\theta^n}\Big)$$
$$= \exp\Big(-\frac{2+\epsilon}{2\theta}\big(\log(n-1) + \log(\log\theta)\big)\Big) \approx \exp\Big(-\Big(1 + \frac{\epsilon}{2}\Big)\theta^{-1}\log(n-1)\Big) \quad \text{for large } n$$
So choosing $\theta < 1 + \frac{\epsilon}{2}$ gives us that $\big(1 + \frac{\epsilon}{2}\big)\theta^{-1} > 1$, so this is:
$$P(A_n) \le (n-1)^{-\left(1 + \frac{\epsilon}{2}\right)\theta^{-1}}$$
from which we see that $P(A_n)$ is summable (it's a $p$-series!). By the Borel-Cantelli lemma, this means that $A_n$ happens only finitely many times with probability 1, which is the desired result. $\square$

2. Second Half of the Law of the Iterated Logarithm

To prove the other half, we need some more estimates.

Lemma 2.1. [Mill's Inequality] This is an estimate concerning the tail of the Gaussian density:
$$\frac{\lambda}{\lambda^2 + 1} e^{-\lambda^2/2} \le \int_\lambda^\infty e^{-y^2/2}\,dy \le \frac{1}{\lambda} e^{-\lambda^2/2}$$

Proof. To prove the lower bound, we find a remarkable anti-derivative:
$$\int_\lambda^\infty e^{-y^2/2}\,dy \ge \int_\lambda^\infty e^{-y^2/2}\,\frac{y^4 + 2y^2 - 1}{(y^2+1)^2}\,dy = \Big[-\frac{y}{y^2+1} e^{-y^2/2}\Big]_\lambda^\infty = \frac{\lambda}{\lambda^2+1} e^{-\lambda^2/2}$$
(The first inequality holds since the fraction is at most 1.) The upper bound is found by using the estimate $y/\lambda \ge 1$ in the range of integration:
$$\int_\lambda^\infty e^{-y^2/2}\,dy \le \int_\lambda^\infty \frac{y}{\lambda} e^{-y^2/2}\,dy = \frac{1}{\lambda}\Big[-e^{-y^2/2}\Big]_\lambda^\infty = \frac{1}{\lambda} e^{-\lambda^2/2} \qquad \square$$
Theorem 2.2. Let $X_1, X_2, \ldots$ be i.i.d. with $P(X_1 = \pm 1) = \frac{1}{2}$ and $S_n = \sum_{k=1}^{n} X_k$. Then for any $\epsilon > 0$:
$$P\Big(\limsup_{n\to\infty} \frac{S_n}{\sqrt{n\log(\log n)}} \ge \sqrt{2} - 2\epsilon\Big) = 1$$
Or in other words, since this holds for any value of $\epsilon > 0$:
$$P\Big(\limsup_{n\to\infty} \frac{S_n}{\sqrt{n\log(\log n)}} \ge \sqrt{2}\Big) = 1$$

Proof. As in the proof of the other half of the law, the idea is to prove that the appropriate events happen infinitely often using the Borel-Cantelli lemmas. Fix $\theta > 1$ (the choice will be made precise later). Let $B_n = \big\{S_{\theta^n} - S_{\theta^{n-1}} \ge \sqrt{(2-\epsilon)\theta^n\log(\log(\theta^n))}\big\}$. We will show that these occur infinitely often and then show why this gives the result. Notice that the $B_n$'s are independent, as each $B_n$ depends only on the values of $X_k$ for $\theta^{n-1} < k \le \theta^n$, so to prove that $B_n$ happens i.o. it suffices to show, via the Borel-Cantelli lemma, that $P(B_n)$ is not summable. Consider:
$$P(B_n) = P\Big(S_{\theta^n} - S_{\theta^{n-1}} \ge \sqrt{(2-\epsilon)\theta^n\log(\log(\theta^n))}\Big) = P\Big(S_{\theta^n - \theta^{n-1}} \ge \sqrt{(2-\epsilon)\theta^n\log(\log(\theta^n))}\Big) \approx \frac{1}{\sqrt{2\pi}}\int_\lambda^\infty e^{-y^2/2}\,dy$$
with $\lambda = \frac{\sqrt{(2-\epsilon)\theta^n\log(\log(\theta^n))}}{\sqrt{\theta^n - \theta^{n-1}}}$, where the first equality holds using the Markov property of the sums (equivalently, look at the definition as sums of $X_i$'s and the fact that the $X_i$'s are i.i.d.), and the second comes asymptotically, as $\theta^n - \theta^{n-1} \to \infty$, from the central limit theorem. Now use Mill's inequality with this $\lambda$ as the lower limit of the integral to get:
$$P(B_n) \ge \frac{1}{\sqrt{2\pi}}\,\frac{\lambda}{\lambda^2 + 1}\, e^{-\lambda^2/2} = \frac{1}{\sqrt{2\pi}}\,\frac{1}{\lambda + \lambda^{-1}}\, e^{-\lambda^2/2}$$


But now notice that $\lambda = \frac{\sqrt{(2-\epsilon)\theta^n\log(\log(\theta^n))}}{\sqrt{\theta^n - \theta^{n-1}}} \approx \sqrt{2-\epsilon}\,\frac{\sqrt{\log n}}{\sqrt{1-\theta^{-1}}}$, so $\lambda^2 \approx (2-\epsilon)\frac{\log n}{1-\theta^{-1}}$. So our estimate is:
$$P(B_n) \ge C\,\frac{1}{\sqrt{\log n} + \sqrt{\log n}^{-1}}\,\exp\Big(-\frac{2-\epsilon}{2(1-\theta^{-1})}\log n\Big) \ge C\, n^{-\left(\frac{1-\epsilon/2}{1-\theta^{-1}}\right)} (\log n)^{-1/2}$$
where the $C$'s are some constants. By choosing $\theta$ large enough, $\frac{1-\epsilon/2}{1-\theta^{-1}} < 1$ and this will not be summable! We have then that $B_n$ occurs infinitely often.

Now, we will show that these events $B_n$ occurring infinitely often is enough to see that $S_{\theta^n} \ge (\sqrt{2} - 2\epsilon)\sqrt{\theta^n\log(\log(\theta^n))}$ infinitely often too. To do this we will use the first half of the law of the iterated logarithm we already proved, namely that for any $\eta > 0$, the events $\big\{S_k > \sqrt{(2+\eta)k\log(\log k)}\big\}$ happen only finitely often with probability 1. By symmetry, we'll have the events $\big\{S_k < -\sqrt{(2+\eta)k\log(\log k)}\big\}$ happen only finitely often too. Hence, the events $A_n = \big\{S_{\theta^{n-1}} < -\sqrt{(2+\eta)\theta^{n-1}\log(\log\theta^{n-1})}\big\}$ happen only finitely often with probability 1. Now, since the $B_n$'s occur infinitely often with probability 1, and the $A_n$'s occur only finitely often with probability 1, the events $B_n \cap A_n^c$ will occur infinitely often with probability 1 too. This will give us the infinite sequence we need, for on the event $B_n \cap A_n^c$ we have the inequalities:
$$S_{\theta^n} - S_{\theta^{n-1}} \ge \sqrt{(2-\epsilon)\theta^n\log(\log(\theta^n))}, \qquad S_{\theta^{n-1}} \ge -\sqrt{(2+\eta)\theta^{n-1}\log(\log\theta^{n-1})}$$
Hence, with probability 1, we have that for infinitely many values of $n$:
$$S_{\theta^n} \ge \sqrt{(2-\epsilon)\theta^n\log(\log(\theta^n))} + S_{\theta^{n-1}} \ge \sqrt{(2-\epsilon)\theta^n\log(\log(\theta^n))} - \sqrt{(2+\eta)\theta^{n-1}\log(\log\theta^{n-1})}$$
$$\ge \sqrt{(2-\epsilon)\theta^n\log(\log(\theta^n))} - \sqrt{\frac{2+\eta}{\theta}\,\theta^n\log(\log\theta^n)} = \Big(\sqrt{2-\epsilon} - \sqrt{\frac{2+\eta}{\theta}}\Big)\sqrt{\theta^n\log(\log(\theta^n))}$$
So by fixing $\eta$ (any choice will do) and then choosing $\theta$ large enough, we can make the coefficient $\sqrt{2-\epsilon} - \sqrt{\frac{2+\eta}{\theta}} \ge \sqrt{2} - 2\epsilon$. (Note that this doesn't disrupt our choice of $\theta$ previously, because that too was a choice to make $\theta$ large, so we can always find $\theta$ so big as to suit both our needs.) We have then that for infinitely many $n$:
$$\frac{S_{\theta^n}}{\sqrt{\theta^n\log(\log(\theta^n))}} \ge \sqrt{2} - 2\epsilon$$
So then:
$$P\Big(\limsup_{n\to\infty} \frac{S_n}{\sqrt{n\log(\log n)}} \ge \sqrt{2} - 2\epsilon\Big) = 1$$
The two halves of the law of the iterated logarithm give the full result. $\square$


18 2. THE LAW OF THE ITERATED LOGARITHM

$$P\left( \limsup_{n\to\infty} \frac{S_n}{\sqrt{n\log(\log n)}} = \sqrt{2} \right) = 1$$
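The statement can be illustrated with a quick simulation. This is a rough numerical sketch, not a proof: convergence in the law of the iterated logarithm is notoriously slow, so at accessible values of $n$ the running ratio $|S_n|/\sqrt{2n\log(\log n)}$ only hovers loosely around 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n_max = 10**6
steps = rng.choice([-1.0, 1.0], size=n_max)   # i.i.d. mean-0, variance-1 increments
S = np.cumsum(steps)
n = np.arange(1000, n_max + 1)                # skip small n, where log(log n) is tiny
ratio = np.abs(S[999:]) / np.sqrt(2 * n * np.log(np.log(n)))
print(ratio.max())                            # typically of order 1, per the LIL
```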


CHAPTER 3

Ergodic Theorem

1. Motivation

The study of ergodic theory was first motivated by statistical mechanics, where one is interested in the long-term average of systems. For example, say we have some particles with position $Q(t)$ and momentum $P(t)$ at time $t$. Let $f$ be a function on this state space; for example, $f$ might be the pressure, the temperature, or some other macroscopic variable. Can we find a distribution $G$ so that:
$$\lim_{T\to\infty} \frac{1}{T}\int_0^T f(Q(s), P(s))\, ds = \int f\, dG\,?$$
Gibbs et al. worked on this problem, and it turns out that $G = \frac{1}{Z}e^{-H/kT}$, with $Z$ the partition function, $H$ the Hamiltonian, $T$ the temperature, and $k$ Boltzmann's constant, has this property! Long-term averages of this type can be very useful. We will start with a simple example.

Example 1.1. Let $\Omega = [0,1) = \{\theta : 0 \leq \theta < 1\}$, where we think of $\Omega$ as a circle with perimeter 1 (and $\theta$ the position on the circle). For some fixed angle $\omega$, let $T : \Omega \to \Omega$ be rotation by $\omega$, that is, $T(\theta) = \theta + \omega \bmod 1$. This is clearly measure preserving in the sense that for any set $B$ we have $m(B) = m(T^{-1}(B))$, where $m$ is the usual Lebesgue measure. Could it be that:
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = \int_0^1 f(s)\, ds\,?$$
If $\omega$ is rational, this doesn't have a chance, because $T^n$ eventually cycles back to the identity, so $T^n x$ will only sample finitely many points. However, if $\omega$ is irrational, this is true! We can prove it in this case using Fourier analysis. When $f(x) = e^{2\pi i m x}$ for $m\in\mathbb{N}$, we have the geometric series:
$$\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = \frac{1}{N}\sum_{n=0}^{N-1} e^{2\pi i m(x+n\omega)} = \frac{1}{N}\, e^{2\pi i m x}\,\frac{e^{2\pi i m N\omega} - 1}{e^{2\pi i m\omega} - 1} \to 0 = \int_0^1 f(s)\, ds$$
where the fact that $\omega$ is irrational ensures that $e^{2\pi i m\omega} - 1 \neq 0$. In the case $m = 0$, $f$ is constant, so of course the result holds. Now for any $f\in C^2(\Omega)$, we can expand


$f$ as a Fourier series to see the result holds. This lets us calculate, for example: if $f = 1_{(a,b)}$, notice that
$$\lim_{N\to\infty}\frac{\#\{k\leq N : x+k\omega\in(a,b)\}}{N} = b - a$$
since $\frac{\#\{k\leq N:\, x+k\omega\in(a,b)\}}{N} = \frac{1}{N}\sum_{n=0}^{N-1} f(T^n x)$. By approximating $f$ by $C^2$ functions (in the $L^1$ sense) from above and below, and applying the limit calculated above, we get the result.

Is there a way we can do this kind of thing using probability methods (rather than Fourier)? The next result is a nice theorem in this direction.
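The example above is easy to check numerically. The sketch below (illustrative only, with an arbitrary choice of irrational $\omega$) estimates the ergodic average of $f = 1_{(a,b)}$ along an orbit of the rotation:

```python
import math

omega = math.sqrt(2) - 1        # an irrational rotation angle
a, b = 0.2, 0.5
N = 100_000
# Ergodic average of the indicator of (a, b) along the orbit x, Tx, T^2 x, ...
hits = sum(1 for n in range(N) if a < (n * omega) % 1.0 < b)
print(hits / N)                 # should be close to b - a = 0.3
```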

2. Birkhoff's Theorem

Theorem 2.1. [Birkhoff-Khinchin Ergodic Theorem] Say $(\Omega,\mathcal{F},P)$ is a probability space. Suppose $T:\Omega\to\Omega$ is a measure preserving map, in the sense that $P(T^{-1}(B)) = P(B)$ for all $B\in\mathcal{F}$. Let $\mathcal{F}_0 = \{A\in\mathcal{F} : T^{-1}A = A \text{ a.e.}\}$ be the field of $T$-invariant events. For $f:\Omega\to\mathbb{R}$ a random variable with $E(|f|)<\infty$, we have almost surely:
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = E(f\mid\mathcal{F}_0)$$

Corollary 2.2. In the case that $\mathcal{F}_0$ is the trivial field, $E(f\mid\mathcal{F}_0) = E(f)$ is a constant, so this is exactly the thing we had above. This happens precisely when $T^{-1}A = A \Rightarrow P(A) = 0$ or $1$. In this case we say that the map $T$ is "ergodic".

The proof of this theorem relies on the following lemma.

Lemma 2.3. [Maximal Ergodic Lemma] Say $(\Omega,\mathcal{F},P)$ is a probability space. Suppose $T:\Omega\to\Omega$ is a measure preserving map, in the sense that $P(T^{-1}(B)) = P(B)$ for all $B\in\mathcal{F}$. Say $f:\Omega\to\mathbb{R}$ is a random variable with $E(|f|)<\infty$. Let $S_n = \sum_{k=0}^{n-1} f(T^k x)$ and let $A = \{\sup_{n\geq 1} S_n > 0\}$ be the event that this is positive at some point. Then:
$$E(f 1_A) = \int_A f\, dP \geq 0$$

Proof. Define $f^+(x) = f(Tx)$ and let $m_n = \max\{0, S_1, S_2,\ldots, S_n\}$, and define $m_n^+$ in the same way, replacing $f$ by $f^+$ in the definition of the $S_k$'s. Notice that by this definition the $m_n$'s are non-decreasing, and that the event $A = \{\sup_{n\geq 1} S_n > 0\}$ is the increasing union of the events $\{m_n > 0\}$. For this reason, it will be enough to restrict our attention to the events $\{m_n > 0\}$. Notice that if we are in the event $\{m_n > 0\}$ then we have:
$$\begin{aligned}
S_1 + m_n^+ &= S_1 + \max\{0, S_1^+, S_2^+,\ldots, S_n^+\} \\
&= S_1 + \max\{0, S_2 - S_1, S_3 - S_1,\ldots, S_{n+1} - S_1\} \\
&= \max\{S_1, S_2,\ldots, S_{n+1}\} \\
&= m_{n+1}
\end{aligned}$$
where we used that we're on the event $\{m_n > 0\}$ in the last step to see the last equality, and we used $S_n^+ = \sum_{k=0}^{n-1} f(T^k Tx) = \sum_{k=1}^{n} f(T^k x) = S_{n+1} - S_1$ in the second equality. We have then:
$$\begin{aligned}
E\left(f 1_{\{m_n>0\}}\right) &= E\left(S_1 1_{\{m_n>0\}}\right) \\
&= E\left((m_{n+1} - m_n^+) 1_{\{m_n>0\}}\right) \\
&= E\left(m_{n+1} 1_{\{m_n>0\}}\right) - E\left(m_n^+ 1_{\{m_n>0\}}\right) \\
&\geq E\left(m_{n+1} 1_{\{m_n>0\}}\right) - E\left(m_n^+\right)
\end{aligned}$$
The last inequality holds since $m_n^+ \geq 0$ by definition, so $E\left(m_n^+ 1_{\{m_n=0\}}\right) \geq 0$ and hence $E(m_n^+) = E\left(m_n^+ 1_{\{m_n>0\}}\right) + E\left(m_n^+ 1_{\{m_n=0\}}\right) \geq E\left(m_n^+ 1_{\{m_n>0\}}\right)$. From here, we note that $E(m_n^+) = E(m_n)$, since the map $T$ is measure preserving and the only difference between $m_n^+$ and $m_n$ is the map $x\to Tx$. We have then:
$$\begin{aligned}
E\left(f 1_{\{m_n>0\}}\right) &\geq E\left(m_{n+1} 1_{\{m_n>0\}}\right) - E(m_n) \\
&= E\left(m_{n+1} 1_{\{m_n>0\}}\right) - E\left(m_n 1_{\{m_n>0\}}\right) \\
&= E\left((m_{n+1} - m_n) 1_{\{m_n>0\}}\right) \\
&\geq 0
\end{aligned}$$
The second equality holds since $m_n$ vanishes on $\{m_n = 0\}$, and the last inequality holds since the $m_n$'s are non-decreasing. Finally, to get the result, notice that $\{m_n > 0\}$ increases to $\{\sup S_n > 0\}$, so by a dominated convergence argument (the integrands are all bounded by $|f|$) we have:
$$E\left(f 1_{\{\sup S_n > 0\}}\right) = \lim_{n\to\infty} E\left(f 1_{\{m_n>0\}}\right) \geq 0\quad\square$$

With this in hand, we can prove Birkhoff's theorem:

Theorem 2.4. [Birkhoff-Khinchin Ergodic Theorem] Say $(\Omega,\mathcal{F},P)$ is a probability space. Suppose $T:\Omega\to\Omega$ is a measure preserving map, in the sense that $P(T^{-1}(B)) = P(B)$ for all $B\in\mathcal{F}$. Let $\mathcal{F}_0 = \{A\in\mathcal{F} : T^{-1}A = A \text{ a.e.}\}$ be the field of $T$-invariant events. For $f:\Omega\to\mathbb{R}$ a random variable with $E(|f|)<\infty$, we have almost surely:
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = E(f\mid\mathcal{F}_0)$$

Proof. Firstly, we will argue that $\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x)$ converges a.s. to some random variable, and then we (as usual) check that it has the two defining properties of conditional expectation.

Define $S_N = \sum_{n=0}^{N-1} f(T^n x)$, so that we are interested in $S_N/N$. Suppose by contradiction that $\frac{1}{N}S_N$ does not converge a.s. By the usual trick with rational numbers, we can then find $a, b\in\mathbb{R}$ so that the event $A = \left\{\liminf\frac{S_n}{n} < a < b < \limsup\frac{S_n}{n}\right\}$ has $P(A) > 0$. Notice moreover that $A$ is a $T$-invariant event, i.e. $x\in A \Rightarrow Tx\in A$, since applying $T$ shifts the terms in $S_n$ by one, which does not affect the limsup or liminf of $S_n/n$. (Indeed, these don't depend on finitely many of the terms!) For this reason, we may define a new probability measure on the set $A$: namely, we think of $(A, \tilde{\mathcal{F}}, \tilde{P})$ as a new probability space, with $\tilde{\mathcal{F}} = \{A\cap B : B\in\mathcal{F}\}$ and $\tilde{P}(E) = P(E)/P(A)$. The fact that $A$ is $T$-invariant means that $T^n x\in A$ whenever $x\in A$, so we can still talk about $S_n$ and so on on this space. The fact that $P(A) > 0$ means that there is no problem re-normalizing like this. So we now have $\tilde{P}(A) = 1$: $A$ is the whole space.

With this new space as our framework, we let $f'(\omega) = f(\omega) - b$; then we get new sums $S'_n$ with $\frac{S'_n}{n} = \frac{S_n}{n} - b$, and then $A = \left\{\liminf\frac{S'_n}{n} < a - b < 0 < \limsup\frac{S'_n}{n}\right\}$. Notice then that $\limsup S'_n/n > 0$ forces $S'_n > 0$ for some $n$, so $\tilde{P}(\{\sup S'_n > 0\}) = \tilde{P}(A) = 1$: the event $\{\sup S'_n > 0\}$ is the whole space $A$. We have then, by the maximal ergodic lemma, that:
$$0 \leq \tilde{E}\left(f' 1_{\{\sup S'_n > 0\}}\right) = \tilde{E}(f') = \tilde{E}(f) - b$$
The same argument applied to $f''(\omega) = a - f(\omega)$ gives:
$$0 \leq \tilde{E}\left(f'' 1_{\{\sup S''_n > 0\}}\right) = a - \tilde{E}(f)$$
But this is a contradiction now, for we have:
$$b \leq \tilde{E}(f) \leq a$$
which is impossible since $a < b$. This contradiction means that it's impossible to separate the liminf and the limsup like this; in other words, we have almost sure convergence.

Next it remains only to see that the random variable that this converges to is $E(f\mid\mathcal{F}_0)$. Let us denote $\bar{f} = \lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x)$. We must show that $\bar{f}$ is $\mathcal{F}_0$-measurable and that $E(\bar{f}1_A) = E(f1_A)$ for all $A\in\mathcal{F}_0$. Notice that applying $x\to Tx$ does not change $\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x)$, as it only affects finitely many terms. This shows that $\bar{f}(x) = \bar{f}(Tx)$, which is the reason why $\bar{f}$ is $\mathcal{F}_0$-measurable. More formally, to see that $\bar{f}^{-1}(B)$ is $T$-invariant for any Borel set $B$, just write out the definitions:
$$\begin{aligned}
T\left(\bar{f}^{-1}(B)\right) &= \left\{Tx\in\Omega : \bar{f}(x)\in B\right\} \\
&= \left\{Tx\in\Omega : \bar{f}(Tx)\in B\right\} \\
&= \left\{y\in\Omega : \bar{f}(y)\in B\right\} \\
&= \bar{f}^{-1}(B)
\end{aligned}$$
So indeed $\bar{f}^{-1}(B)\in\mathcal{F}_0$, which means $\bar{f}$ is $\mathcal{F}_0$-measurable. To see that $\bar{f}$ has the right expectation values, we first prove the result for indicator functions and then use the "ladder" of integration to get the result we need. Consider that for sets $A\in\mathcal{F}_0$ and $B\in\mathcal{F}$ we have:
$$\begin{aligned}
\int_A 1_B(x)\, dP &= \int 1_A(x)1_B(x)\, dP \\
&= \int 1_A(Tx)1_B(Tx)\, dP \\
&= \int 1_A(x)1_B(Tx)\, dP \\
&= \int_A 1_B(Tx)\, dP
\end{aligned}$$
where the second equality uses the fact that $P$ is $T$-invariant and the third equality uses the fact that $A\in\mathcal{F}_0 \Rightarrow 1_A(x) = 1_A(Tx)$. Since $\int_A 1_B(x)\,dP = \int_A 1_B(Tx)\,dP$,


by following along with the construction of the Lebesgue integral starting from indicator functions, we conclude that $\int_A f(x)\,dP = \int_A f(Tx)\,dP$ for any integrable $f$. Applying this inductively, we see for any $N\in\mathbb{N}$ that:
$$\int_A \frac{1}{N}\sum_{k=0}^{N-1} f(T^k x)\, dP = \int_A f(x)\, dP$$
When $f$ is bounded, we can take the limit as $N\to\infty$ and use the bounded convergence theorem to conclude:
$$\int_A \bar{f}\, dP = \lim_{N\to\infty}\int_A \frac{1}{N}\sum_{k=0}^{N-1} f(T^k x)\, dP = \int_A f(x)\, dP$$
For general $f$, we can use a truncation argument and the monotone convergence theorem to finish the result. $\square$

Example 2.5. If we look at our first example of rotation by an angle $\omega$, we concluded (using Fourier analysis) that when $\omega$ is irrational and $f$ has a Fourier series:
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = \int_0^1 f(s)\, ds$$
By Birkhoff's theorem, we know that:
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = E(f\mid\mathcal{F}_0)$$
So we conclude that $\int_0^1 f(s)\,ds = E(f\mid\mathcal{F}_0)$. Since this holds for every $f$, it must be that $\mathcal{F}_0$ is the trivial field. Notice that this improves our result a little bit, since we may now apply it to any integrable $f$, not just $f$ which are $C^2$.

Example 2.6. In the first example, we were essentially looking at $\frac{1}{N}\sum_{n=0}^{N-1} e^{2\pi i m(x+n\omega)}$. Now let's ask about the series $\frac{1}{N}\sum_{n=0}^{N-1} e^{2\pi i m(2^n x)}$. This is harder to handle with Fourier techniques, but we can still use Birkhoff's theorem. Again take $\Omega = [0,1)$ to be our space, but instead of thinking of this as a circle, think of it as the space of binary sequences (which are the binary expansions of each number between 0 and 1), $\Omega = \{0.e_1e_2\ldots : e_i\in\{0,1\}\}$. Let $T:\Omega\to\Omega$ by $T(0.e_1e_2e_3\ldots) = 0.e_2e_3\ldots$. This translates to $T(x) = 2x\bmod 1$ (this is the reason that applying it $n$ times gives $2^n x\bmod 1$). It's not hard to verify that this is measure preserving. By the Kolmogorov zero-one law, the field $\mathcal{F}_0$ of $T$-invariant events must be the trivial field, for by applying $T$ $N$ times, we see that an event $A\in\mathcal{F}_0$ cannot depend on the first $N$ digits $e_1, e_2,\ldots, e_N$. Since this works for any $N$, $\mathcal{F}_0$ is a subset of the tail field of the (i.i.d.) digits, which by the zero-one law is trivial. Hence, by Birkhoff's theorem, we have:
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = E(f\mid\mathcal{F}_0) = E(f) = \int_0^1 f\, dP$$
For the Fourier basis function $f(x) = e^{2\pi i m x}$ with $m\neq 0$, this is saying that almost surely:
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} e^{2\pi i m(2^n x)} = 0$$
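A direct float iteration of $T(x) = 2x\bmod 1$ is useless numerically (double precision loses one binary digit per step and collapses after about 53 iterations), but since $T$ is just the shift on binary digits, we can simulate it exactly on a random digit sequence. A small illustrative sketch:

```python
import cmath
import random

random.seed(1)
N = 20_000
bits = [random.randint(0, 1) for _ in range(N + 53)]  # binary digits of a "random" x

def x_after(n):
    """The point T^n x, read off from the binary digits shifted by n."""
    return sum(bits[n + j] / 2.0**(j + 1) for j in range(53))

m = 1
avg = sum(cmath.exp(2j * cmath.pi * m * x_after(n)) for n in range(N)) / N
print(abs(avg))   # should be small, consistent with the ergodic average being 0
```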

Example 2.7. We can use Birkhoff's theorem to give yet another proof of the strong law of large numbers. Let $(X_1, X_2,\ldots)$ be a sequence of i.i.d. random variables with finite mean, and let $\Omega$ be the probability space for these sequences. Define $T:\Omega\to\Omega$ by $T(x_1,x_2,x_3,\ldots) = (x_2,x_3,\ldots)$. Notice that since the $X$'s are i.i.d., this is measure preserving. As in the previous example, the Kolmogorov zero-one law tells us the field $\mathcal{F}_0$ of $T$-invariant events is trivial. Let $f(x_1,x_2,\ldots) = x_1$. By Birkhoff's theorem:
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} x_n = \lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) = E(f\mid\mathcal{F}_0) = E(f) = E(X_1)$$
which is exactly the strong law of large numbers.

3. Continued Fractions

One way to specify a number $x\in[0,1)$ is the binary expansion. Each binary digit tells you "which half" of the number line $x$ is in: e.g. the first digit says if it's in $\left[0,\frac{1}{2}\right)$ or $\left[\frac{1}{2},1\right)$, and then we treat that interval like $[0,1)$ and start over again for the next digit. Another way to play this game would be to draw the harmonic series $\frac{1}{n}$ on the number line, and then specify which interval $\left[\frac{1}{n+1},\frac{1}{n}\right)$ the number is in. Call this first number $n_1$; we'll have then that $\frac{1}{n_1+1}\leq x < \frac{1}{n_1}$. From this we may conclude that:
$$x = \frac{1}{n_1 + \epsilon_1}$$
for some $\epsilon_1\in[0,1)$. Play the same game again for $\epsilon_1$, and we get:
$$x = \cfrac{1}{n_1 + \cfrac{1}{n_2 + \epsilon_2}}$$
Continuing this indefinitely gives us the "continued fraction expansion" for $x$. Since this is hard to write, we will adopt the convention that $x = [n_1; n_2; n_3;\ldots]$ means the continued fraction expansion with $n_1$, then $n_2$, and so on.

Proposition 3.1. If the sequence $[n_1; n_2; n_3;\ldots]$ is cyclic (that is, it repeats after some finite number of steps), then $x = [n_1; n_2; n_3;\ldots]$ is algebraic.
after some finite number of steps), then x = [n1; n2; n3; . . .] is algebraic.


3. CONTINUED FRACTIONS 25<br />

Proof. The easiest way to see this is an example. Suppose we look at x =<br />

[1; 1; 1; . . .]. Then:<br />

So then:<br />

1<br />

x =<br />

1 + 1<br />

. ..<br />

1+<br />

1<br />

= 1 + x<br />

x<br />

But then x2 − x + 1 = 0, so x is the root of a quadratic equation. In this case<br />

x = √ 5−1<br />

2 is the golden section. In general, if the continued fraction expansion is<br />

periodic after N steps, then x will be the root of an N + 1 order polynomial. <br />

Definition 3.2. We write $x = [n_1; n_2; n_3;\ldots]$ to mean:
$$x = \cfrac{1}{n_1 + \cfrac{1}{n_2 + \cfrac{1}{n_3 + \cdots}}}$$
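Since a number $x\in(0,1)$ has $n_1 = \lfloor 1/x\rfloor$ and remainder $\epsilon_1 = 1/x - n_1$, the digits of a rational $p/q$ are exactly the quotients produced by the Euclidean algorithm. A small sketch (the Fibonacci ratio $34/55$ is a convergent of the golden section $[1;1;1;\ldots]$ from the proof above):

```python
from fractions import Fraction

def cf_digits(x: Fraction):
    """Continued fraction digits [n1; n2; ...] of a rational x in (0, 1)."""
    p, q = x.numerator, x.denominator
    digits = []
    while p:
        digits.append(q // p)   # n = floor(1/x)
        p, q = q % p, p         # the remainder epsilon = 1/x - n, as a fraction
    return digits

print(cf_digits(Fraction(34, 55)))   # -> [1, 1, 1, 1, 1, 1, 1, 2]
```

The trailing 2 is the usual end-of-expansion ambiguity for rationals ($[\ldots;n;1] = [\ldots;n+1]$).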

Problem 3.3. Let $T:(0,1)\to(0,1)$ by $T([n_1;n_2;\ldots]) = [n_2;n_3;\ldots]$. This is the map $T(x) = \frac{1}{x}\bmod 1$. Is there a probability density $P$ we can put on $(0,1)$ so that $T$ will be measure preserving?

Proof. [Gauss] The probability density $dP = \frac{1}{\log 2}\,\frac{1}{1+x}\,dx$ will do the trick! Indeed, just notice that by the definition of $T$:
$$T^{-1}(a,b) = \bigcup_{n=1}^{\infty}\left(\frac{1}{b+n},\, \frac{1}{a+n}\right)$$
So the requirement $P(T^{-1}(a,b)) = P(a,b)$ gives (using $\rho$ as a probability density function):
$$\int_a^b \rho(x)\, dx = \sum_{n=1}^{\infty}\int_{\frac{1}{b+n}}^{\frac{1}{a+n}} \rho(x)\, dx$$
Taking the derivative w.r.t. $b$ here (and then renaming $b$ to $x$) gives:
$$\rho(x) = \sum_{n=1}^{\infty}\rho\left(\frac{1}{x+n}\right)\frac{1}{(x+n)^2}$$


This is hard to solve, but it's easy to verify that $\rho(x) = \frac{1}{1+x}$ works, since the LHS is $\frac{1}{1+x}$ while the RHS is:
$$\begin{aligned}
\sum_{n=1}^{\infty}\rho\left(\frac{1}{x+n}\right)\frac{1}{(x+n)^2} &= \sum_{n=1}^{\infty}\frac{1}{1+\frac{1}{x+n}}\,\frac{1}{(x+n)^2} \\
&= \sum_{n=1}^{\infty}\frac{x+n}{1+x+n}\,\frac{1}{(x+n)^2} \\
&= \sum_{n=1}^{\infty}\frac{1}{(x+n+1)(x+n)} \\
&= \sum_{n=1}^{\infty}\left(\frac{1}{x+n} - \frac{1}{x+n+1}\right) \\
&= \frac{1}{x+1}
\end{aligned}$$
which is a telescoping sum, so we can evaluate it exactly. The factor of $\frac{1}{\log 2}$ normalizes $\rho$ so that $\int_0^1 \rho(x)\,dx = 1$. Indeed:
$$\frac{1}{\log 2}\int_0^1 \frac{1}{x+1}\, dx = \frac{1}{\log 2}\left[\log(1+x)\right]_0^1 = \frac{\log 2 - \log 1}{\log 2} = 1\quad\square$$
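Both the functional equation and the normalization are easy to confirm numerically. The sketch below (illustrative only) truncates the infinite sum at a large cutoff and uses a midpoint-rule integral:

```python
import math

def rho(x):
    """The Gauss density 1 / ((1 + x) log 2)."""
    return 1.0 / ((1.0 + x) * math.log(2))

# Functional equation: rho(x) = sum_n rho(1/(x+n)) / (x+n)^2, truncated.
for x in [0.1, 0.5, 0.9]:
    rhs = sum(rho(1.0 / (x + n)) / (x + n) ** 2 for n in range(1, 200_000))
    assert abs(rhs - rho(x)) < 1e-4   # the tail of the sum is O(1/cutoff)

# Normalization: the integral of rho over [0, 1] is 1 (midpoint rule).
M = 100_000
integral = sum(rho((k + 0.5) / M) for k in range(M)) / M
print(integral)   # should be very close to 1
```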

Theorem 3.4. The shift map $T:[0,1]\to[0,1]$ given by $T([n_1;n_2;\ldots]) = [n_2;n_3;\ldots]$ is ergodic.

Proof. Fix $N\in\mathbb{N}$ and a list of integers $n_1, n_2,\ldots, n_N$. Now define:
$$n(x) := \cfrac{1}{n_1+\cfrac{1}{n_2+\cdots+\cfrac{1}{n_N+x}}}$$
For each choice of $n_1, n_2,\ldots, n_N$, the image of $[0,1]$ through $n(x)$ is an interval whose endpoints are $n(0)$ and $n(1)$. As $N$ increases, the interval $[n(0), n(1)]$ gets smaller and smaller. An easy proof by induction shows that $n(x)$ can be written as:
$$n(x) = \frac{Ax+B}{Cx+D}$$
for $A, B, C, D$ with $0\leq A\leq B$ and $1\leq C\leq D$ and with $AD - BC = \pm 1$, where the sign depends on the parity of $N$. Now, let $I = [n(0), n(1)]$ and let $J = (a,b)$ be an arbitrary interval.

Claim. $|I\cap T^{-N}(J)| \geq \frac{1}{2}|I||J|$ holds for all $N\in\mathbb{N}$.

Proof. Take $x\in I\cap T^{-N}(J)$. Notice that $x\in I$ if and only if $x = n(y)$ for some $y\in[0,1]$, by definition of $I$. So we can write $x$ as a continued fraction $x = [n_1; n_2;\ldots; n_{N-1}; n_N + y]$. On the other hand, $x\in T^{-N}(J)$ if and only if $T^N x\in J$. But $T^N x = T^N([n_1; n_2;\ldots; n_{N-1}; n_N + y]) = y$ by definition of $T$. This shows that $x\in T^{-N}(J)$ if and only if $y\in J$.

We have then, using the observation that $n$ is a fractional linear transformation, that:
$$I\cap T^{-N}(J) = \{n(y) : y\in J\} = [n(a), n(b)]$$
This shows:
$$\begin{aligned}
|I\cap T^{-N}(J)| &= |n(b) - n(a)| \\
&= \left|\frac{Ab+B}{Cb+D} - \frac{Aa+B}{Ca+D}\right| \\
&= \left|\frac{b-a}{(Ca+D)(Cb+D)}\right| \\
&\geq \frac{|b-a|}{(C+D)^2}\quad\text{since } a, b < 1 \\
&\geq \frac{1}{2}|b-a||I| \\
&= \frac{1}{2}|J||I|
\end{aligned}$$
The last inequality holds by writing out $|I|$ and using $AD - BC = \pm 1$ and the fact that $1\leq C\leq D$, so that $C + D\leq 2D$:
$$|I| = |n(0) - n(1)| = \left|\frac{A+B}{C+D} - \frac{B}{D}\right| = \frac{|AD-BC|}{D(C+D)} = \frac{1}{D(C+D)} \leq \frac{2}{(C+D)^2}$$

Finally, to see that $T$ is ergodic, take any Borel set $B\in\mathcal{F}$. By approximating $B$ by intervals, the inequality from the claim still holds:
$$\left|I\cap T^{-N}B\right| \geq \frac{1}{2}|I||B|$$
Take any set $A$ now. Again, by approximating $A$ by intervals $I$, we can use the above inequality to get:
$$\left|A\cap T^{-N}B\right| \geq \frac{1}{2}|A||B|$$
This gives what we want, for if $B$ is $T$-invariant, we have $T^{-N}B = B$ for every $N$. The choice $A = B^c$ in the above gives:
$$\frac{1}{2}|B||B^c| \leq \left|B^c\cap T^{-N}B\right| = |B^c\cap B| = 0$$
So $|B||B^c| = 0$, which is only possible if $|B| = 1$ or $|B| = 0$. This says all $T$-invariant sets are either measure zero or full measure. In other words, $T$ is ergodic. $\square$
ergodic.


CHAPTER 4

Brownian Motion

1. Motivation
Our aim is to discuss a stochastic process on $[0,1]$ (that is, a probability space $(\Omega,\mathcal{F},P)$ and a collection of random variables $B_t(\omega)$ for $t\in[0,1]$) which has the following properties:

- $B_0(\omega) = 0$ for every $\omega\in\Omega$.
- Fix a $T\in[0,1]$, and define for $t > 0$, $B_t^+ = B_{T+t} - B_T$. We want $B_t^+$ to look statistically identical to $B_t$. (This says the process has some sort of "time homogeneous" property.)
- We want $B_t^+$ as defined above to be independent of $(B_s)_{s\leq T}$. (This says that the process has some sort of Markov property.)
- $E(B_t^2) < \infty$
- $E(B_t) = 0$
- $B_t(\omega)$ is continuous in $t$ for every (or almost every) $\omega\in\Omega$.

This process is supposed to describe something like a piece of dust that you can see sometimes wiggling about in a sunbeam. Notice that the time homogeneous and Markov properties together mean we can write:
$$B_T = \sum_{k=1}^{N}\left( B_{\frac{kT}{N}} - B_{\frac{(k-1)T}{N}} \right)$$
which is a sum of many independent increments. By the central limit theorem, this suggests that $B_t\sim N(0,\sigma^2)$ is normally distributed (to get this more rigorously would take a bit more work, since the above set-up is not exactly the set-up for the central limit theorem). This is often taken as an "axiom":

- $B_t\sim N(0,\sigma^2)$

A quick calculation shows that $\sigma^2\propto t$. Let $f(t) = \sigma^2$ be the variance of $B_t$. Then:
$$\begin{aligned}
f(t+s) = E\left(B_{t+s}^2\right) &= E\left((B_{t+s} - B_s + B_s)^2\right) \\
&= E\left((B_{t+s} - B_s)^2\right) + E\left(B_s^2\right) + 2E\left((B_{t+s} - B_s)B_s\right) \\
&= f(t) + f(s) + 2\cdot 0
\end{aligned}$$
where we used the time homogeneous property and the Markov property. This functional relation means that $f(t)$ must be linear! $f(0) = 0$ holds since $B_0$ is known exactly. Hence $f(t) = c\cdot t$. It doesn't hurt to take $c = 1$, since anything we get can be rescaled for other values of $c$ if need be. Sometimes this is taken as the "axiom":

(1) $B_t\sim N(0,t)$


The following resulting property also turns out to be very useful:

Proposition 1.1. $E(B_aB_b) = \min(a,b)$.

Proof. Suppose w.l.o.g. $a < b$. Then $E(B_aB_b) = E(B_a(B_b - B_a + B_a)) = E(B_a(B_b - B_a)) + E(B_a^2) = 0 + a = \min(a,b)$. $\square$

It remains to see that such a process really exists. The main difficulty is proving that the process is continuous. There is more than one way to skin the cat here; each method is useful because it gives a different insight into what is going on.

2. Levy's Construction

We will construct Brownian motion on $t\in[0,1]$ as a uniform limit of continuous functions $B_t^N$ as $N\to\infty$. Each $B_t^N$ will be an approximation of the Brownian motion that is piecewise linear between the dyadic rationals of the form $\frac{a}{2^N}$. The real trick in the construction is the remarkable observation that the corrections from $B_t^N$ to $B_t^{N+1}$ are independent of the construction so far up to level $N$; this is the crucial fact that makes the construction so nice and allows it to converge. The property of Brownian motion that makes this possible is captured in the following proposition:

Proposition 2.1. Let $B_t$ be a Brownian path and $0 < a < b < 1$. Consider the line segment joining $B_a$ and $B_b$: $l(t) = B_a + (t-a)\frac{B_b - B_a}{b-a}$. Consider the value of the Brownian path at the midpoint time, $B_{\frac{a+b}{2}}$. The difference from this point to the line $l(t)$ is independent of $B_a$ and $B_b$. That is to say: $X = B_{\frac{a+b}{2}} - l\left(\frac{a+b}{2}\right) = B_{\frac{a+b}{2}} - \frac{1}{2}B_a - \frac{1}{2}B_b$ is independent of $B_a$ and $B_b$. Moreover, $X$ is normally distributed, $X\sim N\left(0,\frac{1}{4}(b-a)\right)$.

Proof. Firstly, we notice that the random variables $X$, $B_a$ and $B_b$ have a joint normal distribution. This can be seen without much difficulty by expanding the definition of $X$ to write any linear combination of $X$, $B_a$ and $B_b$ as a linear combination of $B_{\frac{a+b}{2}}$, $B_a$ and $B_b$. From here, rewrite it as a linear combination of $B_a$, $B_{\frac{a+b}{2}} - B_a$, and $B_b - B_{\frac{a+b}{2}}$. By the hypotheses on our Brownian motion, each of these is an independent Gaussian variable, so any linear combination of them is again Gaussian. Hence any linear combination of $X$, $B_a$ and $B_b$ is Gaussian; this property is a characterization of the joint Gaussian distribution. The observation that $X$, $B_a$ and $B_b$ are jointly normal substantially simplifies the verification of their independence, as jointly normal variables are independent if and only if they are uncorrelated. From here we calculate (with the help of the useful covariance relation $E(B_aB_b) = \min(a,b)$):
$$\begin{aligned}
E(B_aX) &= E\left(B_a\left(B_{\frac{a+b}{2}} - \tfrac{1}{2}B_a - \tfrac{1}{2}B_b\right)\right) \\
&= E\left(B_aB_{\frac{a+b}{2}}\right) - \tfrac{1}{2}E\left(B_a^2\right) - \tfrac{1}{2}E(B_aB_b) \\
&= a - \tfrac{1}{2}a - \tfrac{1}{2}a \\
&= 0
\end{aligned}$$
A similar calculation holds for $E(B_bX)$. Since these are uncorrelated and jointly normal, they are independent. A quick calculation using the covariance relation again gives $X\sim N\left(0,\frac{1}{4}(b-a)\right)$. $\square$
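The proposition is easy to test by Monte Carlo, building $(B_a, B_{(a+b)/2}, B_b)$ directly from independent Gaussian increments. A quick illustrative check, not part of the construction:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a, b = 0.3, 0.7
mid = (a + b) / 2

# Build (B_a, B_mid, B_b) from independent Gaussian increments
B_a = np.sqrt(a) * rng.standard_normal(n)
B_mid = B_a + np.sqrt(mid - a) * rng.standard_normal(n)
B_b = B_mid + np.sqrt(b - mid) * rng.standard_normal(n)

X = B_mid - 0.5 * B_a - 0.5 * B_b
print(np.cov(X, B_a)[0, 1])   # should be near 0: X is uncorrelated with B_a
print(X.var())                # should be near (b - a) / 4 = 0.1
```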


This remarkable fact gives us a nice idea to construct Brownian motion starting with an infinite sequence of standard ($E(Z) = 0$, $E(Z^2) = 1$) i.i.d. Gaussian variables $(Z_0, Z_1, Z_2,\ldots)$. The idea is to first construct $B_0 = 0$, $B_1 = Z_0$. Then, once $B_0$ and $B_1$ are constructed, by the above proposition we know that $B_{1/2} - \frac{1}{2}B_0 - \frac{1}{2}B_1$ can be modeled by $\sqrt{\frac{1}{4}}\,Z_1$, so set $B_{1/2} = \frac{1}{2}B_1 + \sqrt{\frac{1}{4}}\,Z_1$. Once $B_0$, $B_{1/2}$, $B_1$ are constructed, the above proposition gives us a way to get $B_{\frac{1}{4}}$ and $B_{\frac{3}{4}}$ using two more normal variables, $\sqrt{\frac{1}{8}}\,Z_2$ and $\sqrt{\frac{1}{8}}\,Z_3$, and so on.

The above proposition and paragraph contain the basic idea. It becomes a bit of a mouthful to write it all down; a confused reader should focus on understanding the construction above before digesting the details below.
the construction above before digesting the below details.<br />

To formalize the process, we let B N t be the construction at the N − th level of<br />

construction, which will have the correct values at points of the form a<br />

2 N , 0 ≤ a ≤<br />

2 N . We make fill in in between these points with a piecewise continuous function.<br />

After some bookkeeping, the easiest way to write this down is as follows. First<br />

2k<br />

define some “tent” functions which make little peaks in the interval 2n , 2(k+1)<br />

2n <br />

of<br />

unit height:<br />

⎧<br />

⎪⎨<br />

2<br />

Tn,k =<br />

⎪⎩<br />

n (t − (2k)) t ∈ 2k<br />

2n , 2k+1<br />

2n 2<br />

<br />

n ((2k + 2) − t) t ∈ 2k+1<br />

2n , 2k+2<br />

2n 0<br />

<br />

<br />

2k t /∈<br />

2n , 2(k+1)<br />

2n Notice that for every level n, 0 ≤ k ≤ 2 n−1 − 1 means there are 2 n−1 tents, and<br />

notice that these tents are disjoint and of unit height.<br />

Now, at every level of the construction we make sure that $B_t^N$ has the right value at points of the form $\frac{a}{2^N}$ by adding in the right tents, with heights distributed by scaled normal variables:
$$B_t^N = Z_0 t + \sum_{n=1}^{N}\sum_{k=0}^{2^{n-1}-1}\sqrt{\frac{1}{2^{n+1}}}\; Z_{n,k}\, T_{n,k}(t)$$
Explanation of this formula: the "$Z_0 t$" is the initial level-0 construction. The sum over $1\leq n\leq N$ sums over the $N$ levels of construction, and the sum over $0\leq k\leq 2^{n-1}-1$ is over the $2^{n-1}$ tents that get added at the $n$-th level. Each tent has a height distributed like $\sqrt{\frac{1}{2^{n+1}}}\,Z\sim N\left(0,\frac{1}{2^{n+1}}\right)$, where $Z\sim N(0,1)$. (This is the content of the proposition above!) For convenience, we label the infinite sequence of normal variables so that $Z_{n,k}$ controls the height of the $k$-th tent on the $n$-th level.

Finally we get the Brownian motion as $B_t = \lim_{N\to\infty} B_t^N$, which puts the Brownian motion on the same probability space as the infinite sequence of normal variables. To see that this is continuous, we show that the convergence is uniform almost surely. Since each $B_t^N$ is continuous, and a uniform limit of continuous functions is continuous, this gives that $B_t$ is continuous.
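The construction translates almost line-by-line into code. The sketch below (an illustrative implementation; the variable names are our own) runs the midpoint-displacement recursion directly on a dyadic grid and spot-checks that $\mathrm{Var}(B_{1/2})\approx\frac{1}{2}$:

```python
import numpy as np

def levy_brownian(levels, rng):
    """Levy's midpoint-displacement construction of a Brownian path,
    sampled at the dyadic points k / 2**levels."""
    n_pts = 2**levels + 1
    B = np.zeros(n_pts)
    B[-1] = rng.standard_normal()        # level 0: B_1 = Z_0 (the term Z_0 * t)
    step = n_pts - 1
    while step > 1:
        half = step // 2
        mids = np.arange(half, n_pts - 1, step)   # midpoints of current intervals
        dt = step / (n_pts - 1)                   # length of each interval being split
        # midpoint = value on the chord + independent N(0, (b - a)/4) displacement
        B[mids] = 0.5 * (B[mids - half] + B[mids + half]) \
            + np.sqrt(dt / 4) * rng.standard_normal(mids.size)
        step = half
    return B

rng = np.random.default_rng(0)
half_vals = np.array([levy_brownian(6, rng)[32] for _ in range(4000)])  # B_{1/2}
print(half_vals.var())   # should be close to Var(B_{1/2}) = 1/2
```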

Proposition 2.2. The family of functions B N t is converging uniformly almost<br />

surely.<br />

Proof. As you might suspect, the trick is to use the right summable sequence with a clever application of the Borel-Cantelli lemma. Let

H_n = \max_{t \in [0,1]} \left| \sum_{k=0}^{2^{n-1}-1} \sqrt{\frac{1}{2^{n+1}}}\, Z_{n,k} T_{n,k}(t) \right|

be the maximum height contribution to B_t at level n. Since the tent functions T_{n,k}(t) have disjoint supports, this is H_n = √(1/2^{n+1}) max_{0 ≤ k ≤ 2^{n-1}-1} |Z_{n,k}|. We now make the following estimate:

P\left( H_n > 2^{-n/2}\sqrt{2n} \right) = P\left( \sqrt{\tfrac{1}{2^{n+1}}} \max_{0 \le k \le 2^{n-1}-1} |Z_{n,k}| > 2^{-n/2}\sqrt{2n} \right)
= P\left( \max_{0 \le k \le 2^{n-1}-1} |Z_{n,k}| > 2\sqrt{n} \right)
\le 2^{n-1} P\left( |Z| > 2\sqrt{n} \right)
= 2^n P\left( Z > 2\sqrt{n} \right)
= \frac{2^n}{\sqrt{2\pi}} \int_{2\sqrt{n}}^{\infty} \exp\left( -\frac{x^2}{2} \right) dx
\le \frac{2^n}{\sqrt{2\pi}} \cdot \frac{1}{2\sqrt{n}} \exp\left( -\frac{(2\sqrt{n})^2}{2} \right) \quad \text{(this is Mill's ratio)}
= C \cdot \frac{1}{\sqrt{n}} \cdot \left( \frac{2}{e^2} \right)^n

which is a summable sequence! Hence, we know by the Borel-Cantelli lemma that this happens only finitely often almost surely. That is to say, for almost every ω ∈ Ω, we can find N ∈ N so that H_n(ω) ≤ 2^{-n/2}√(2n) for all n > N. But then we have that for all p, q > N and any t ∈ [0, 1]:

|B^p_t - B^q_t| = \left| \sum_{n=p+1}^{q} \sum_{k=0}^{2^{n-1}-1} \sqrt{\frac{1}{2^{n+1}}}\, Z_{n,k} T_{n,k}(t) \right|
\le \sum_{n=p+1}^{q} H_n
\le \sum_{n=p+1}^{q} 2^{-n/2}\sqrt{2n}
\le \sum_{n=N}^{\infty} 2^{-n/2}\sqrt{2n}

But since 2^{-n/2}√(2n) is summable, this tail can be made arbitrarily small, and we see then that B^N_t is Cauchy in the uniform norm. Since this holds for almost every ω ∈ Ω, we indeed have uniform convergence almost surely. ∎
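The Mill's ratio bound P(Z > a) ≤ exp(−a²/2)/(a√(2π)) used in the estimate above is easy to check numerically. This is my own quick check (not from the course), computing the exact Gaussian tail via the complementary error function:

```python
import math

def gaussian_tail(a):
    """Exact P(Z > a) for Z ~ N(0,1), via the complementary error function."""
    return 0.5 * math.erfc(a / math.sqrt(2))

def mills_bound(a):
    """Mill's ratio bound: P(Z > a) <= exp(-a^2/2) / (a * sqrt(2*pi)), a > 0."""
    return math.exp(-a * a / 2) / (a * math.sqrt(2 * math.pi))

# Check at the cutoffs a = 2*sqrt(n) used in the proof above.
for n in range(1, 8):
    a = 2 * math.sqrt(n)
    assert gaussian_tail(a) <= mills_bound(a)
```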

Finally, to see that the limiting process is really what we want, we just verify that E(B_t − B_s)^2 = |t − s|, from which it's easy to check the properties we want. To see this, we just use the density of the dyadic rationals in [0, 1]. The above construction fixes the values at points of the form a/2^n at step n, that is to say B_{a/2^n} = B^n_{a/2^n}. Hence for t, s dyadic rationals, we have E(B_t − B_s)^2 = E(B^n_t − B^n_s)^2 = |t − s|, which is easily checked by the construction above/the earlier proposition.



For arbitrary t now, but s still taken to be a dyadic rational, we take a sequence of dyadic rationals t_n → t. We have then, using Fatou's lemma:

E(B_t - B_s)^2 = E\left( \lim_{n\to\infty} (B_{t_n} - B_s)^2 \right)
\le \liminf_{n\to\infty} E(B_{t_n} - B_s)^2
= \lim_{n\to\infty} |t_n - s|
= |t - s|

Now consider, for any n ∈ N:

E(B_t - B_s)^2 = E(B_t - B_{t_n} + B_{t_n} - B_s)^2
= E(B_t - B_{t_n})^2 + E(B_{t_n} - B_s)^2 + 2E\left( (B_t - B_{t_n})(B_{t_n} - B_s) \right)

Since this holds for any n ∈ N, we get:

E(B_t - B_s)^2 = \lim_{n\to\infty} \left[ E(B_t - B_{t_n})^2 + E(B_{t_n} - B_s)^2 + 2E\left( (B_t - B_{t_n})(B_{t_n} - B_s) \right) \right]
= 0 + \lim_{n\to\infty} |t_n - s| + 0
= |t - s|

where we have observed that the two limits on either side are 0 by using E(B_t − B_s)^2 ≤ |t − s| in a clever way. First, lim_{n→∞} E(B_t − B_{t_n})^2 ≤ lim_{n→∞} |t − t_n| = 0, and secondly, with the help of the Cauchy-Schwarz (Hölder) inequality:

\lim_{n\to\infty} \left| E\left( (B_t - B_{t_n})(B_{t_n} - B_s) \right) \right| \le \lim_{n\to\infty} \sqrt{E\left( (B_t - B_{t_n})^2 \right)} \sqrt{E\left( (B_{t_n} - B_s)^2 \right)}
\le \lim_{n\to\infty} \sqrt{|t - t_n|} \sqrt{|s - t_n|}
= 0

Once we have E(B_t − B_s)^2 = |t − s| for arbitrary t and dyadic s, the same argument repeated again will show that E(B_t − B_s)^2 = |t − s| when both t and s are arbitrary.

3. Construction from Durrett's Book

(I call this "Durrett's construction" since I read it out of Durrett's book "Brownian Motion and Martingales in Analysis".)

The above construction is pretty elementary and gives all the desired properties. The following construction is a bit more technical; in particular, it uses a few extension results like the Caratheodory and Kolmogorov theorems. However, it gives immediately that the Brownian motion is not only continuous but Hölder continuous for exponents γ < 1/2. The "extension theorems" this construction uses are gone over briefly in the appendix.

Definition 3.1. (Constructing Brownian Motion with the Kolmogorov Extension Theorem)

The Kolmogorov Extension Theorem gives us a quick way to define a measure on a space of functions. However, since the space of functions {f : T → R} is so large, this theorem often gives us a very unwieldy space to work with, one in which we can't get our hands on the properties we want. The construction of Brownian motion below is a great example: constructing with the full Kolmogorov theorem is bad, while if we take more care and construct on only countably many points, we get what we want.

Let

P_{t_1,t_2,\ldots,t_n}(A_1 \times A_2 \times \ldots \times A_n) = \int_{A_1} dx_1 \int_{A_2} dx_2 \ldots \int_{A_n} dx_n \prod_{i=1}^{n} p_{t_i - t_{i-1}}(x_{i-1}, x_i),

where p_t(x, y) = (2\pi t)^{-1/2} \exp\left( -\frac{|y - x|^2}{2t} \right), with the conventions t_0 = 0 and x_0 = 0. This is naively what you get as the distribution of B_{t_1}, B_{t_2}, \ldots, B_{t_n} if you use the Markov property and normal distribution of the Brownian motion. By Kolmogorov, we get a measure P on the entire space of functions {f : [0, 1] → R}. This defines the Brownian motion!
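Concretely, these finite-dimensional distributions can be sampled by drawing independent Gaussian increments, since under P_{t_1,…,t_n} the differences B_{t_i} − B_{t_{i−1}} are independent N(0, t_i − t_{i−1}). A small sketch of my own (the sample size and tolerances below are arbitrary choices):

```python
import numpy as np

def sample_fdd(times, n_samples, seed=0):
    """Draw samples of (B_{t_1}, ..., B_{t_n}) from P_{t_1,...,t_n}:
    successive increments are independent N(0, t_i - t_{i-1}), with t_0 = 0."""
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)
    dt = np.diff(np.concatenate([[0.0], times]))   # the gaps t_i - t_{i-1}
    incs = rng.standard_normal((n_samples, len(times))) * np.sqrt(dt)
    return np.cumsum(incs, axis=1)                 # partial sums give B_{t_i}

X = sample_fdd([0.25, 0.5, 1.0], n_samples=200_000)
```

Empirically one can then check, for instance, that Var(B_1) ≈ 1 and E[B_{1/4} B_1] ≈ 1/4, as the density above predicts.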

Proposition 3.2. With the above description of P, it will be impossible to see that the Brownian motion is almost surely continuous, because the set of continuous functions C ⊂ {f : [0, 1] → R} is not even measurable.

Proof. Suppose by contradiction that C is measurable. Then we can find a sequence t_1, t_2, … of times and Borel sets B_1, B_2, … so that C = {f : f(t_i) ∈ B_i for all i}. (The proof of this fact comes by showing that sets of the form {f : f(t_i) ∈ B_i for all i} form a sigma-algebra which contains the cylinder sets used to generate the sigma-algebra on Ω.) Take any continuous function f now, and alter its value at a single point t ∉ {t_1, t_2, …} to get a function f̂ which agrees with f at {t_1, t_2, …} but is not continuous. But then f̂ ∈ C = {f : f(t_i) ∈ B_i for all i}, since it agrees with f at {t_1, t_2, …}; this is a contradiction. ∎

This result means that our construction is not good. It is better to construct B_t as follows:

Definition 3.3. (Constructing Brownian Motion with Uniform Continuity)

Step 1. (Define on dyadic rationals.) Let P_{t_1,…,t_n} be as above. Use the countable Kolmogorov Extension Theorem to get a measure P on the set of functions Ω = {f : [0, 1] ∩ D_2 → R} from the dyadic rationals to R.

Step 2. Check that functions in Ω are almost surely Hölder continuous, i.e. for almost all f ∈ Ω, |f(t) − f(s)| ≤ C|t − s|^γ.

Step 3. Conclude that for almost every f ∈ Ω, there is a unique way to extend f to a continuous function f : [0, 1] → R, since the dyadic rationals are dense in [0, 1].

Step 1 is pretty simple, but Step 2 requires some verification and is the real heart of the problem:

Proposition 3.4. Fix γ < 1/2. For almost every f ∈ Ω, there is a constant C so that |f(t) − f(s)| ≤ C|t − s|^γ.

We first prove a lemma.

Lemma 3.5. Fix γ < 1/2. Then there exists δ > 0 so that for almost every f ∈ Ω, there is an N ∈ N (which depends on f) so that for n ≥ N we have

|f(x) − f(y)| ≤ |x − y|^γ

whenever x = i2^{-n}, y = j2^{-n} and |x − y| ≤ (1/2)^{n(1-δ)}.

Proof. Take m ∈ N so large that m > 1/(1 − 2γ). We use the inequality E|f(t) − f(s)|^{2m} ≤ C_m |t − s|^m with C_m = E|f(1)|^{2m}. (This follows from the property that f(t) − f(s) ∼ N(0, t − s).) For any n ∈ N now, consider the


following estimates:

P\left( |f(x) - f(y)| > |x - y|^\gamma \text{ for some } x = i2^{-n},\, y = j2^{-n} \text{ with } |x - y| \le \left( \tfrac{1}{2} \right)^{n(1-\delta)} \right) \le \sum |x - y|^{-2m\gamma}\, E|f(x) - f(y)|^{2m}

where the sum on the right hand side is taken over all the possible x, y that satisfy the inequality |x − y| ≤ (1/2)^{n(1-δ)} (there are finitely many, since we are restricting ourselves to dyadic rationals x = i2^{-n}, y = j2^{-n}). We have used the Chebyshev inequality P(|X| > a) ≤ a^{-2m} E(|X|^{2m}) here. Now, by the above moment inequality, we have:

LHS \le C_m \sum |x - y|^{-2m\gamma} |x - y|^m
= C_m \sum |x - y|^{m - 2m\gamma}
\le C_m \, 2^n 2^{n\delta} \left( 2^{-n(1-\delta)} \right)^{m - 2m\gamma}
= C_m \, 2^{-n\left( -(1+\delta) + (1-\delta)(m - 2m\gamma) \right)}

The last bound comes in because |x − y| ≤ 2^{-n(1-δ)} for x, y in our sum, and there are at most 2^n choices for x and (up to a constant) 2^{nδ} choices for y once x has been fixed (remember, they are all n-th level dyadic rationals). Now, the term that appears in the exponent is:

ε = −(1 + δ) + (1 − δ)(m − 2mγ)

Since m is so large that m − 2mγ > 1, we can choose δ so small that ε > 0. We will have then that

LHS ≤ C_m 2^{-nε}

which is a summable sequence! By the Borel-Cantelli lemma, it must be the case that for almost every f ∈ Ω the event here happens only finitely many times. This is exactly the statement of the lemma which we wanted to prove. ∎
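To make the bookkeeping concrete, here is one admissible choice of the constants (my own example values, just illustrating that the exponent really can be made positive):

```python
gamma = 0.4                      # any gamma < 1/2
m = 6                            # any integer m > 1/(1 - 2*gamma) = 5 works
assert m > 1 / (1 - 2 * gamma)
assert m - 2 * m * gamma > 1     # leaves room to absorb a small delta
delta = 0.05
epsilon = -(1 + delta) + (1 - delta) * (m - 2 * m * gamma)
assert epsilon > 0               # so the sum over n of 2^(-n*epsilon) converges
```

Here m − 2mγ = 1.2 and ε = −1.05 + 0.95 · 1.2 = 0.09 > 0.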

Proposition 3.6. Fix γ < 1/2. For almost every f ∈ Ω, there is a constant C so that |f(t) − f(s)| ≤ C|t − s|^γ.

Proof. For almost every f ∈ Ω, find δ > 0 and N ∈ N as in the lemma. Take any t, s ∈ D_2 ∩ [0, 1] with t − s < 2^{-N(1-δ)}. Choose m > N now so that 2^{-(m+1)(1-δ)} ≤ t − s ≤ 2^{-m(1-δ)}. Write now t = i2^{-m} − 2^{-q_1} − 2^{-q_2} − … − 2^{-q_k} > (i − 1)2^{-m} and s = j2^{-m} + 2^{-r_1} + … + 2^{-r_l} < (j + 1)2^{-m} for some choice of q's and r's with m < q_1 < … < q_k and m < r_1 < … < r_l. Since t − s < 2^{-m(1-δ)}, we have i2^{-m} − j2^{-m} < 2^{-m(1-δ)} (up to the harmless additive term 2^{-m}), so we can apply the result from the lemma to conclude:

|f(i2^{-m}) - f(j2^{-m})| \le \left( (2^{m\delta}) 2^{-m} \right)^\gamma = 2^{-m(1-\delta)\gamma}

Now, we use the result of the lemma again many times to see that (using our clever rewriting of t):

|f(t) - f(i2^{-m})| \le |f(i2^{-m} - 2^{-q_1}) - f(i2^{-m})| + |f(i2^{-m} - 2^{-q_1} - 2^{-q_2}) - f(i2^{-m} - 2^{-q_1})| + \ldots
\le |2^{-q_1}|^\gamma + \ldots + |2^{-q_k}|^\gamma
\le \sum_{j=m+1}^{\infty} (2^{-j})^\gamma
\le C 2^{-\gamma m}

since m < q_p for each p, and where we summed the geometric series to bound the sum. We similarly get the bound |f(s) − f(j2^{-m})| ≤ C2^{-γm}. Finally then:

|f(t) - f(s)| \le C 2^{-\gamma m(1-\delta)} + C 2^{-\gamma m} + C 2^{-\gamma m}
\le C 2^{-\gamma m(1-\delta)}
= C 2^{\gamma(1-\delta)} \left( 2^{-(m+1)(1-\delta)} \right)^\gamma
\le C 2^{\gamma(1-\delta)} |t - s|^\gamma

by the choice of m so that 2^{-(m+1)(1-δ)} ≤ t − s. ∎

So from here we see that the Brownian motion is almost surely Hölder continuous for exponents γ < 1/2. This result lets us find a unique extension of f(t) from the dyadic rationals to all of [0, 1] which is not only continuous, but moreover is Hölder continuous for exponents γ < 1/2, which is a stronger result than our first construction gave. For ease of notation, we will now change our notation a little bit. We will refer to ω ∈ Ω instead of f, and we now have a family of random variables B_t(ω) = ω(t). What we have just proven is that for fixed ω, the map t → B_t(ω) is indeed a Hölder continuous path for exponents γ < 1/2.

4. Some Properties

The following slick result shows that the Brownian motion is nowhere Hölder continuous for γ > 1/2, which in particular shows that it is nowhere differentiable.

Proposition 4.1. For γ > 1/2, the set of functions which are Hölder continuous with exponent γ at some point is a null set. In other words, the Brownian motion is almost surely nowhere Hölder continuous for exponents γ > 1/2.

Proof. Fix a γ > 1/2 and C ∈ R. Choose m ∈ N so large that γ > (m+1)/2m. Define the events, starting at n > m:

A_n = \left\{ \omega : \exists s \in [0,1] \text{ such that } |B_t - B_s| \le C|t - s|^\gamma \ \forall t \in \left[ s - \tfrac{m}{n}, s + \tfrac{m}{n} \right] \right\}

Define the random variables:

Y_{n,k}(\omega) = \max_{j=0,1,\ldots,2m} \left| B\left( \tfrac{k+j}{n} \right) - B\left( \tfrac{k+j-1}{n} \right) \right|

And finally, the events:

B_n = \left\{ \text{at least one of the } Y_{n,k} \le 2C \left( \tfrac{m}{n} \right)^\gamma \right\}

We now claim that A_n ⊂ B_n: for ω ∈ A_n, we find an s so that |B_t − B_s| ≤ C|t − s|^γ for all t ∈ [s − m/n, s + m/n]; in particular, |B_t − B_s| ≤ C(m/n)^γ for all such t. By the pigeonhole principle, inside this interval we can find k so that {k/n, (k+1)/n, (k+2)/n, …, (k+2m)/n} ⊂ [s − m/n, s + m/n]. But then, for this k, we have:

Y_{n,k}(\omega) = \max_{j=0,1,\ldots,2m} \left| B\left( \tfrac{k+j}{n} \right) - B\left( \tfrac{k+j-1}{n} \right) \right|
\le \max_{j=0,\ldots,2m} \left( \left| B\left( \tfrac{k+j}{n} \right) - B(s) \right| + \left| B(s) - B\left( \tfrac{k+j-1}{n} \right) \right| \right)
\le 2C \left( \tfrac{m}{n} \right)^\gamma

So ω ∈ B_n by definition. Now consider that:

P(A_n) \le P(B_n)
\le \sum_{k=0}^{n-2m} P\left( Y_{n,k} \le 2C \left( \tfrac{m}{n} \right)^\gamma \right)
= \sum_{k=0}^{n-2m} P\left( \left| B\left( \tfrac{k+j}{n} \right) - B\left( \tfrac{k+j-1}{n} \right) \right| \le 2C \left( \tfrac{m}{n} \right)^\gamma \text{ for each } j = 0, 1, \ldots, 2m \right)
\le n P\left( |B_{1/n} - B_0| \le 2C \left( \tfrac{m}{n} \right)^\gamma \right)^{2m}
= n P\left( |B_1 - B_0| \le 2C \left( \tfrac{m}{n} \right)^\gamma \sqrt{n} \right)^{2m}
\le n \left( \frac{2}{\sqrt{2\pi}} \cdot 2C \left( \tfrac{m}{n} \right)^\gamma \sqrt{n} \right)^{2m}
= D n^{\left( \frac{1}{2} - \gamma \right) 2m + 1} = D n^{m+1-2m\gamma} \to 0

Here we used the independence of the Brownian motion's increments over disjoint intervals, the scaling relation P(B_t > a) = P(B_{ct} > √c a), and the easy inequality P(|N(0,1)| < λ) ≤ 2λ/√(2π), which comes from bounding the p.d.f. by its maximum. Finally, by the choice of m so that γ > (m+1)/2m, we know that m + 1 − 2mγ < 0, so this probability does indeed go to zero. But then, as the events A_n are increasing, this means that the A_n are all zero probability events, which is the result we wanted. ∎
are all zero probability events, which is the result we wanted.


CHAPTER 5

Appendix

1. Conditional Random Variables

Let (Ω, F, P) be a probability space and X, Y : Ω → R random variables. B is the Borel sigma-algebra of R.

Definition 1.1. We define σ(X) ⊂ F to be the sigma-algebra generated by the preimages of Borel sets under X. That is:

σ(X) = σ({X^{-1}(B) : B ∈ B})

Remark. The sub-algebra σ(X) is coarser than all of F. Intuitively, the random variable X can only "detect" events up to sets in σ(X).

Definition 1.2. Let Σ ⊂ F be a subalgebra of F. We say a random variable X : Ω → R is Σ-measurable if X^{-1}(B) ∈ Σ for all B ∈ B. Equivalently, if σ(X) ⊂ Σ.

Example 1.3. Every random variable is always F-measurable, since σ(X) ⊂ F.

Definition 1.4. Given X and Y, we can define a new random variable Z = E(Y|X) to be the unique random variable with the following two properties:

1. Z is σ(X)-measurable.
2. For any B ∈ B we have E(Z 1_{X ∈ B}) = E(Y 1_{X ∈ B}).

Remark. The existence of this random variable is proven by taking the Radon-Nikodym derivative of the measure A ↦ E(Y 1_A) with respect to P, with both measures restricted to the sigma-field σ(X).

Remark. There is no problem with picking any subalgebra Σ ⊂ F instead of σ(X). The second condition is then that for any S ∈ Σ we have E(Z 1_S) = E(Y 1_S), which recovers the condition above when Σ = σ(X).

Remark. Z = E(Y|X) is a random variable Z : Ω → R, but it is often thought of as a function Z : R → R whose input is the random variable X. This works because Z is σ(X)-measurable. The following two little results clear this up a bit:

Proposition 1.5. If f : R → R is measurable and Z : Ω → R is Σ-measurable, then the random variable f ◦ Z is Σ-measurable too.

Proof. For any B ∈ B we have (f ◦ Z)^{-1}(B) = Z^{-1}(f^{-1}(B)) ∈ Σ, since f^{-1}(B) ∈ B as f is measurable, and Z is Σ-measurable. ∎

Proposition 1.6. If Z is a σ(X)-measurable random variable, then we may think of Z as a function Z̃ : R → R whose input is X.

Proof. Define Z̃ : R → R by Z̃(x) = Z(ω) for any representative ω ∈ X^{-1}({x}). We must justify why this value is independent of the choice of ω ∈ X^{-1}({x}). Indeed, for ω_1, ω_2 ∈ X^{-1}({x}), let z = Z(ω_1). Since Z is σ(X)-measurable, we have that:

Z^{-1}({z}) ∈ σ(X)
⇒ Z^{-1}({z}) = X^{-1}(B) for some B ∈ B

But then ω_1 ∈ Z^{-1}({z}) = X^{-1}(B), so that X(ω_1) ∈ B. Since X(ω_1) = X(ω_2) = x, we have then ω_2 ∈ X^{-1}(B) = Z^{-1}({z}), which means that Z(ω_1) = Z(ω_2) = z, as desired. Hence Z̃ is well defined! With this definition of Z̃, we see that Z = Z̃ ◦ X. We often conflate Z with Z̃ in practice. ∎
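For a toy discrete example of Definition 1.4 (my own illustration, not from the course): on a six-point Ω with X taking two values, Z = E(Y|X) just averages Y over each level set of X, and property 2 can be checked directly.

```python
import numpy as np

# Six equally likely outcomes; X takes values in {0, 1}.
P = np.full(6, 1 / 6)
X = np.array([0, 0, 0, 1, 1, 1])
Y = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])

# Z = E(Y|X): constant on each level set {X = x}, equal to the average of Y there.
Z = np.empty(6)
for x in np.unique(X):
    mask = X == x
    Z[mask] = (Y[mask] * P[mask]).sum() / P[mask].sum()

# Property 1 holds by construction (Z is a function of X alone).
# Property 2: E(Z 1_{X=x}) = E(Y 1_{X=x}) for each value x of X.
for x in (0, 1):
    mask = X == x
    assert abs((Z[mask] * P[mask]).sum() - (Y[mask] * P[mask]).sum()) < 1e-12
```

Here Z equals 2 on {X = 0} and 20 on {X = 1}, which is exactly the function Z̃ of Proposition 1.6 composed with X.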

2. Extension Theorems

Theorem 2.1. [Caratheodory Extension Theorem]

Fix some (Ω, A, P_0), where Ω is a set, A is an algebra of sets (a.k.a. a field of sets), and P_0 is a finitely additive probability measure on A. Suppose we have the additional property that:

for sequences of sets A_1, A_2, … ∈ A which are pairwise disjoint and such that ∪A_n ∈ A too, we necessarily have P_0(∪A_n) = Σ P_0(A_n).

Then there is a unique extension to a probability space (Ω, σ(A), P) so that P and P_0 agree on A.

Proof. [sketch] The idea is exactly the same as the construction of the Lebesgue measure on [0, 1] from the premeasure generated by µ((a, b)) = b − a on the algebra of open sets. Define an outer measure:

P^*(E) := \inf_{E \subset \cup A_n} \sum P_0(A_n)

From here you check that P^* is indeed an outer measure: countable subadditivity and monotonicity are easy. To get that P^*(A) = P_0(A) for A ∈ A requires the special property we are given above. Once this is done, you can define measurable sets à la Caratheodory: E is measurable iff for all A ∈ A we have P^*(A) = P^*(A ∩ E) + P^*(A ∩ E^c). Then you verify that σ(A) is a subset of these measurable sets, and declare the restriction of P^* to σ(A) to be the measure P. ∎

Remark. The condition needed in the theorem can be replaced with "continuity from above at ∅":

for A_1, A_2, … ∈ A which are decreasing down to ∅, we necessarily have P_0(A_n) → 0.

The equivalence of these two conditions is not too difficult. The first condition is more intuitive, while this second condition is sometimes easier to verify in practice.

Theorem 2.2. [Countable Kolmogorov Extension Theorem]

Suppose for every n ≥ 1 we have a probability measure P_n on R^n. Suppose also that these probability measures satisfy the following consistency condition for every Borel set E ⊂ R^n:

P_{n+k}(E × R^k) = P_n(E)

Then there exists a unique measure P on the infinite product space R^∞ of sequences, so that for every Borel set E ⊂ R^n, P(E × R × R × …) = P_n(E).

Proof. [sketch] Take Ω = R^∞ to be the space of real-valued sequences. Define the field of cylinder sets to be:

A = {E × R × R × … : E ⊂ R^n is Borel}

with finitely additive measure P_0(E × R × R × …) := P_n(E). The given consistency condition on the P_n's shows this is well defined. To see continuity from above at ∅, notice that if A_k ↓ ∅, then we must have A_k = E_k × R × R × … for some Borel sets E_k with E_k ↓ ∅. But then, since each P_n is a probability measure, we have P_0(A_k) = P_n(E_k) → 0. By an application of the Caratheodory extension theorem, we get the desired measure! ∎

Theorem 2.3. [Kolmogorov Extension Theorem]

Let T ⊂ R be any interval. Suppose we have a family of probability measures P_{t_1,t_2,…,t_n} on R^n whenever t_1, t_2, …, t_n is a finite collection of points in T. Suppose also that these probability measures satisfy the following consistency condition:

P_{t_1,t_2,\ldots,t_n,\hat{t}_1,\hat{t}_2,\ldots,\hat{t}_m}(E \times R^m) = P_{t_1,t_2,\ldots,t_n}(E)

Then there exists a unique measure P on the set of functions {f : T → R} so that:

P(\{ f : (f(t_1), f(t_2), \ldots, f(t_n)) \in E \}) = P_{t_1,t_2,\ldots,t_n}(E)

Remark. This is very similar to the countable version, but requires some more work to make it work out. However, since the space of functions {f : T → R} is so large, this theorem often gives us a very unwieldy space to work with, one in which we can't get our hands on the properties we want. The construction of Brownian motion above is a great example: constructing with the uncountable Kolmogorov theorem is bad, while constructing with the countable one is good.
