Measure, Integral, and Conditional Expectation - Functional Analysis

Jan van Neerven

Measure, Integral, and Conditional Expectation

Handout, May 2, 2012


1 Measure and Integral

In this lecture we review elements of the Lebesgue theory of integration.

1.1 σ-Algebras

Let Ω be a set. A σ-algebra in Ω is a collection F of subsets of Ω with the following properties:

(1) Ω ∈ F;
(2) A ∈ F implies ∁A ∈ F;
(3) A_1, A_2, · · · ∈ F implies ⋃_{n=1}^∞ A_n ∈ F.

Here, ∁A = Ω \ A is the complement of A.

A measurable space is a pair (Ω, F), where Ω is a set and F is a σ-algebra in Ω.

The above three properties express that F is non-empty, closed under taking complements, and closed under taking countable unions. From

  ⋂_{n=1}^∞ A_n = ∁( ⋃_{n=1}^∞ ∁A_n )

it follows that F is closed under taking countable intersections. Clearly, F is closed under finite unions and intersections as well.

Example 1.1. In every set Ω there is a minimal σ-algebra, {∅, Ω}, and a maximal σ-algebra, the set P(Ω) of all subsets of Ω.

Example 1.2. If A is any collection of subsets of Ω, then there is a smallest σ-algebra, denoted σ(A), containing A. This σ-algebra is obtained as the intersection of all σ-algebras containing A and is called the σ-algebra generated by A.
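For a finite Ω, the generation procedure of Example 1.2 can be carried out mechanically: starting from A, keep adding complements and unions until the collection stabilises (since Ω is finite, countable unions reduce to finite ones). A small illustrative sketch in Python; the function name and set representation are ours, not part of the text:

```python
def generate_sigma_algebra(omega, collection):
    """Smallest sigma-algebra on the finite set `omega` containing
    the given collection of subsets (Example 1.2, finite case)."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(s) for s in collection}
    changed = True
    while changed:
        changed = False
        current = list(sigma)
        for s in current:          # close under complements
            c = omega - s
            if c not in sigma:
                sigma.add(c)
                changed = True
        for s in current:          # close under (finite) unions
            for t in current:
                u = s | t
                if u not in sigma:
                    sigma.add(u)
                    changed = True
    return sigma

sigma = generate_sigma_algebra({1, 2, 3, 4}, [{1}])
# sigma({{1}}) consists of ∅, {1}, its complement {2,3,4}, and Ω
assert len(sigma) == 4
```

The same routine generates the full 8-element σ-algebra of the partition {1}, {2}, {3, 4} when started from [{1}, {2}].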


Example 1.3. Let (Ω_1, F_1), (Ω_2, F_2), . . . be a sequence of measurable spaces. The product

  ∏_{n=1}^∞ (Ω_n, F_n)

is defined by ∏_{n=1}^∞ Ω_n = Ω_1 × Ω_2 × . . . (the cartesian product), and ∏_{n=1}^∞ F_n = F_1 × F_2 × . . . is defined as the σ-algebra generated by all sets of the form

  A_1 × · · · × A_N × Ω_{N+1} × Ω_{N+2} × . . .

with N = 1, 2, . . . and A_n ∈ F_n for n = 1, . . . , N. Finite products ∏_{n=1}^N (Ω_n, F_n) are defined similarly; here one takes the σ-algebra in Ω_1 × · · · × Ω_N generated by all sets of the form A_1 × · · · × A_N with A_n ∈ F_n for n = 1, . . . , N.

Example 1.4. The Borel σ-algebra of R^d, notation B(R^d), is the σ-algebra generated by all open sets in R^d. The sets in this σ-algebra are called the Borel sets of R^d. As an exercise, check that

  (R^d, B(R^d)) = ∏_{n=1}^d (R, B(R)).

Hint: Every open set in R^d is a countable union of rectangles of the form [a_1, b_1) × · · · × [a_d, b_d).

Example 1.5. Let f : Ω → R be any function. For Borel sets B ∈ B(R) we define

  {f ∈ B} := {ω ∈ Ω : f(ω) ∈ B}.

The collection

  σ(f) = { {f ∈ B} : B ∈ B(R) }

is a σ-algebra in Ω, the σ-algebra generated by f.

1.2 Measurable functions

Let (Ω_1, F_1) and (Ω_2, F_2) be measurable spaces. A function f : Ω_1 → Ω_2 is said to be measurable if {f ∈ A} ∈ F_1 for all A ∈ F_2. Clearly, compositions of measurable functions are measurable.

Proposition 1.6. If A_2 is a subset of F_2 with the property that σ(A_2) = F_2, then a function f : Ω_1 → Ω_2 is measurable if and only if

  {f ∈ A} ∈ F_1 for all A ∈ A_2.

Proof. Just notice that {A ∈ F_2 : {f ∈ A} ∈ F_1} is a sub-σ-algebra of F_2 containing A_2. ⊓⊔


In most applications we are concerned with functions f from a measurable space (Ω, F) to (R, B(R)). Such functions are said to be Borel measurable. In what follows we summarise some measurability properties of Borel measurable functions (the adjective 'Borel' will be omitted when no confusion is likely to arise).

Proposition 1.6 implies that a function f : Ω → R is Borel measurable if and only if {f > a} ∈ F for all a ∈ R. From this it follows that linear combinations, products, and quotients (if defined) of measurable functions are measurable. For example, if f and g are measurable, then f + g is measurable since

  {f + g > a} = ⋃_{q∈Q} {f > q} ∩ {g > a − q}.

This, in turn, implies that fg is measurable, as we have

  fg = ½[(f + g)² − (f² + g²)].

If f = sup_{n≥1} f_n pointwise and each f_n is measurable, then f is measurable since

  {f > a} = ⋃_{n=1}^∞ {f_n > a}.

It follows from inf_{n≥1} f_n = −sup_{n≥1} (−f_n) that the pointwise infimum of a sequence of measurable functions is measurable as well. From this we get that the pointwise limes superior and limes inferior of measurable functions are measurable, since

  lim sup_{n→∞} f_n = lim_{n→∞} ( sup_{k≥n} f_k ) = inf_{n≥1} ( sup_{k≥n} f_k )

and lim inf_{n→∞} f_n = −lim sup_{n→∞} (−f_n). In particular, the pointwise limit lim_{n→∞} f_n of a sequence of measurable functions is measurable.

In these results, it is implicitly understood that the suprema and infima exist and are finite pointwise. This restriction can be lifted by considering functions f : Ω → [−∞, ∞]. Such a function is said to be measurable if the sets {f ∈ B} with B ∈ B(R) as well as the sets {f = ±∞} are in F. This extension will also be useful later on.

1.3 Measures

Let (Ω, F) be a measurable space. A measure is a mapping µ : F → [0, ∞] with the following properties:

(1) µ(∅) = 0;
(2) for all disjoint sets A_1, A_2, . . . in F we have µ( ⋃_{n=1}^∞ A_n ) = ∑_{n=1}^∞ µ(A_n).


A triple (Ω, F, µ), with µ a measure on a measurable space (Ω, F), is called a measure space. A measure space (Ω, F, µ) is called finite if µ is a finite measure, that is, if µ(Ω) < ∞. If µ(Ω) = 1, then µ is called a probability measure and (Ω, F, µ) is called a probability space. In probability theory, it is customary to use the symbol P for a probability measure.

The following properties of measures are easily checked:

(a) if A_1 ⊆ A_2 in F, then µ(A_1) ≤ µ(A_2);
(b) if A_1, A_2, . . . in F, then

  µ( ⋃_{n=1}^∞ A_n ) ≤ ∑_{n=1}^∞ µ(A_n);

Hint: Consider the disjoint sets A_{n+1} \ (A_1 ∪ · · · ∪ A_n).
(c) if A_1 ⊆ A_2 ⊆ . . . in F, then

  µ( ⋃_{n=1}^∞ A_n ) = lim_{n→∞} µ(A_n);

Hint: Consider the disjoint sets A_{n+1} \ A_n.
(d) if A_1 ⊇ A_2 ⊇ . . . in F and µ(A_1) < ∞, then

  µ( ⋂_{n=1}^∞ A_n ) = lim_{n→∞} µ(A_n).

Hint: Take complements.

Note that in (c) and (d) the limits (in [0, ∞]) exist by monotonicity.

Example 1.7. Let (Ω, F) and (Ω̃, F̃) be measurable spaces and let µ be a measure on (Ω, F). If f : Ω → Ω̃ is measurable, then

  f_*(µ)(Ã) := µ{f ∈ Ã},  Ã ∈ F̃,

defines a measure f_*(µ) on (Ω̃, F̃). This measure is called the image of µ under f. In the context where (Ω, F, P) is a probability space, the image of P under f is called the distribution of f.

Concerning existence and uniqueness of measures we have the following two fundamental results.

Theorem 1.8 (Carathéodory). Let A be a non-empty collection of subsets of a set Ω that is closed under taking complements and finite unions. If µ : A → [0, ∞] is a mapping such that properties (1) and (2) in the definition of a measure hold (with the unions restricted to A), then there exists a unique measure µ̃ : σ(A) → [0, ∞] such that

  µ̃|_A = µ.
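Properties (a)-(c) can be illustrated with the simplest kind of measure, a weighted counting measure on a four-point set. A minimal sketch; the weights are an arbitrary choice of ours:

```python
from fractions import Fraction

# A measure on the power set of {1, 2, 3, 4}, given by point masses.
weights = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}

def mu(A):
    """mu(A) = sum of the point masses of the elements of A."""
    return sum(weights[x] for x in A)

A1, A2 = {1}, {1, 2}
assert mu(A1) <= mu(A2)                          # (a) monotonicity
assert mu(A1 | A2) <= mu(A1) + mu(A2)            # (b) subadditivity
chain = [{1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}]   # A_1 ⊆ A_2 ⊆ ...
assert mu(set().union(*chain)) == mu(chain[-1])  # (c) continuity from below
assert mu({1, 2, 3, 4}) == 1                     # here mu is a probability measure
```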


The proof is lengthy and rather tedious. The typical situation is that measures are given, and therefore the proof is omitted. The uniqueness part is a consequence of the next result, which is useful in many other situations as well.

Theorem 1.9 (Dynkin). Let µ_1 and µ_2 be two finite measures defined on a measurable space (Ω, F). Let A ⊆ F be a collection of sets with the following properties:

(1) A is closed under finite intersections;
(2) σ(A), the σ-algebra generated by A, equals F.

If µ_1(A) = µ_2(A) for all A ∈ A, then µ_1 = µ_2.

Proof. Let D denote the collection of all sets D ∈ F with µ_1(D) = µ_2(D). Then A ⊆ D and D is a Dynkin system, that is,

• Ω ∈ D;
• if D_1 ⊆ D_2 with D_1, D_2 ∈ D, then also D_2 \ D_1 ∈ D;
• if D_1 ⊆ D_2 ⊆ . . . with all D_n ∈ D, then also ⋃_{n=1}^∞ D_n ∈ D.

The finiteness of µ_1 and µ_2 is used to prove the second property.

By assumption we have D ⊆ F = σ(A); we will show that σ(A) ⊆ D. To this end let D_0 denote the smallest Dynkin system in F containing A. We will show that σ(A) ⊆ D_0. In view of D_0 ⊆ D, this will prove the theorem.

Let C = {D_0 ∈ D_0 : D_0 ∩ A ∈ D_0 for all A ∈ A}. Then C is a Dynkin system and A ⊆ C since A is closed under taking finite intersections. It follows that D_0 ⊆ C, since D_0 is the smallest Dynkin system containing A. But obviously, C ⊆ D_0, and therefore C = D_0.

Now let C′ = {D_0 ∈ D_0 : D_0 ∩ D ∈ D_0 for all D ∈ D_0}. Then C′ is a Dynkin system and the fact that C = D_0 implies that A ⊆ C′. Hence D_0 ⊆ C′, since D_0 is the smallest Dynkin system containing A. But obviously, C′ ⊆ D_0, and therefore C′ = D_0.

It follows that D_0 is closed under taking finite intersections. But a Dynkin system with this property is a σ-algebra. Thus, D_0 is a σ-algebra, and now A ⊆ D_0 implies that also σ(A) ⊆ D_0. ⊓⊔

Example 1.10 (Lebesgue measure). In R^d, let A be the collection of all finite unions of rectangles [a_1, b_1) × · · · × [a_d, b_d). Define µ : A → [0, ∞] by

  µ([a_1, b_1) × · · · × [a_d, b_d)) := ∏_{n=1}^d (b_n − a_n)

and extend this definition to finite disjoint unions by additivity. It can be verified that A and µ satisfy the assumptions of Carathéodory's theorem. The resulting measure µ on σ(A) = B(R^d), the Borel σ-algebra of R^d, is called the Lebesgue measure.


Example 1.11. Let (Ω_1, F_1, P_1), (Ω_2, F_2, P_2), . . . be a sequence of probability spaces. There exists a unique probability measure P = ∏_{n=1}^∞ P_n on F = ∏_{n=1}^∞ F_n which satisfies

  P(A_1 × · · · × A_N × Ω_{N+1} × Ω_{N+2} × . . . ) = ∏_{n=1}^N P_n(A_n)

for all N ≥ 1 and A_n ∈ F_n for n = 1, . . . , N. Finite products of arbitrary measures µ_1, . . . , µ_N can be defined similarly.

For example, the Lebesgue measure on R^d is the product measure of d copies of the Lebesgue measure on R.

As an illustration of the use of Dynkin's lemma we conclude this section with an elementary result on independence. In probability theory, a measurable function f : Ω → R, where (Ω, F, P) is a probability space, is called a random variable. Random variables f_1, . . . , f_N : Ω → R are said to be independent if for all B_1, . . . , B_N ∈ B(R) we have

  P{f_1 ∈ B_1, . . . , f_N ∈ B_N} = ∏_{n=1}^N P{f_n ∈ B_n}.

Proposition 1.12. The random variables f_1, . . . , f_N : Ω → R are independent if and only if ν = ∏_{n=1}^N ν_n, where ν is the distribution of (f_1, . . . , f_N) and ν_n is the distribution of f_n, n = 1, . . . , N.

Proof. By definition, f_1, . . . , f_N are independent if and only if ν(B) = ∏_{n=1}^N ν_n(B_n) for all B = B_1 × · · · × B_N with B_n ∈ B(R) for all n = 1, . . . , N. Since these sets generate B(R^N), by Dynkin's lemma the latter implies ν = ∏_{n=1}^N ν_n. ⊓⊔

1.4 The Lebesgue integral

Let (Ω, F) be a measurable space. A simple function is a function f : Ω → R that can be represented in the form

  f = ∑_{n=1}^N c_n 1_{A_n}

with coefficients c_n ∈ R and disjoint sets A_n ∈ F for all n = 1, . . . , N.

Proposition 1.13. A function f : Ω → [−∞, ∞] is measurable if and only if it is the pointwise limit of a sequence of simple functions. If in addition f is non-negative, we may arrange that 0 ≤ f_n ↑ f pointwise.

Proof. Take, for instance, f_n = ∑_{m=−2^{2n}}^{2^{2n}} (m/2^n) 1_{{f ∈ [m/2^n, (m+1)/2^n)}}. ⊓⊔
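The approximating sequence in the proof of Proposition 1.13 can be evaluated pointwise. The sketch below is our discretisation of that formula for non-negative values: it rounds f(ω) down to the dyadic grid of width 2^−n, capping the result at 2^n (the largest grid value available at stage n), and checks that the values increase to f(ω):

```python
def dyadic_approx(f_value, n):
    """n-th stage of the approximation in Proposition 1.13 at a point
    where f takes the non-negative value `f_value`."""
    if f_value >= 2 ** n:
        return 2 ** n              # cap: m only runs up to 2**(2n)
    m = int(f_value * 2 ** n)      # largest m with m / 2**n <= f_value
    return m / 2 ** n

value = 3.7                        # f(omega) for some fixed omega
approx = [dyadic_approx(value, n) for n in range(1, 12)]
# the approximations increase monotonically ...
assert all(a <= b for a, b in zip(approx, approx[1:]))
# ... and converge to f(omega), with error at most 2**-n once 2**n > f(omega)
assert abs(approx[-1] - value) <= 2 ** -11
```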


1.4.1 Integration of simple functions

Let (Ω, F, µ) be a measure space. For a non-negative simple function f = ∑_{n=1}^N c_n 1_{A_n} we define

  ∫_Ω f dµ := ∑_{n=1}^N c_n µ(A_n).

We allow µ(A_n) to be infinite; this causes no problems because the coefficients c_n are non-negative (we use the convention 0 · ∞ = 0). It is easy to check that this integral is well defined, in the sense that it does not depend on the particular representation of f as a simple function. Also, the integral is linear with respect to addition and multiplication with non-negative scalars,

  ∫_Ω af_1 + bf_2 dµ = a ∫_Ω f_1 dµ + b ∫_Ω f_2 dµ,

and monotone in the sense that if 0 ≤ f_1 ≤ f_2 pointwise, then

  ∫_Ω f_1 dµ ≤ ∫_Ω f_2 dµ.

1.4.2 Integration of non-negative functions

In what follows, a non-negative function is a function with values in [0, ∞]. For a non-negative measurable function f we choose a sequence of simple functions 0 ≤ f_n ↑ f and define

  ∫_Ω f dµ := lim_{n→∞} ∫_Ω f_n dµ.

The following lemma implies that this definition does not depend on the approximating sequence.

Lemma 1.14. For a non-negative measurable function f and non-negative simple functions f_n and g such that 0 ≤ f_n ↑ f and g ≤ f pointwise we have

  ∫_Ω g dµ ≤ lim_{n→∞} ∫_Ω f_n dµ.

Proof. First consider the case g = 1_A. Fix ε > 0 arbitrary and let A_n = {1_A f_n ≥ 1 − ε}. Then A_1 ⊆ A_2 ⊆ . . . ↑ A and therefore µ(A_n) ↑ µ(A). Since 1_A f_n ≥ (1 − ε)1_{A_n},

  lim_{n→∞} ∫_Ω 1_A f_n dµ ≥ (1 − ε) lim_{n→∞} µ(A_n) = (1 − ε)µ(A) = (1 − ε) ∫_Ω g dµ.

This proves the lemma for g = 1_A. The general case follows by linearity. ⊓⊔
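For a simple function supported on finitely many disjoint intervals, the integral against the Lebesgue measure reduces to a finite sum, exactly as the definition above states. A minimal sketch; the list-of-triples representation is our choice:

```python
def integral_simple(terms):
    """Lebesgue integral of f = sum c * 1_{[a, b)} over disjoint
    intervals [a, b): the integral is sum c * (b - a).
    `terms` is a list of triples (c, a, b) with c >= 0."""
    return sum(c * (b - a) for c, a, b in terms)

f = [(2.0, 0.0, 1.0), (0.5, 1.0, 3.0)]   # f = 2 on [0,1), f = 1/2 on [1,3)
assert integral_simple(f) == 2.0 * 1.0 + 0.5 * 2.0  # = 3.0
```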


  ∫_{Ω_1} φ ◦ f dµ = ∫_{Ω_2} φ d f_*(µ).

To prove this, note that this is trivial for simple functions φ = 1_A with A ∈ F_2. By linearity, the identity extends to non-negative simple functions φ, and by monotone convergence (using Proposition 1.13) to non-negative measurable functions φ.

From the monotone convergence theorem we deduce the following useful corollary.

Theorem 1.17 (Fatou lemma). Let (f_n)_{n=1}^∞ be a sequence of non-negative measurable functions on (Ω, F, µ). Then

  ∫_Ω lim inf_{n→∞} f_n dµ ≤ lim inf_{n→∞} ∫_Ω f_n dµ.

Proof. From inf_{k≥n} f_k ≤ f_m, m ≥ n, we infer

  ∫_Ω inf_{k≥n} f_k dµ ≤ inf_{m≥n} ∫_Ω f_m dµ.

Hence, by the monotone convergence theorem,

  ∫_Ω lim inf_{n→∞} f_n dµ = ∫_Ω lim_{n→∞} inf_{k≥n} f_k dµ = lim_{n→∞} ∫_Ω inf_{k≥n} f_k dµ ≤ lim_{n→∞} inf_{m≥n} ∫_Ω f_m dµ = lim inf_{n→∞} ∫_Ω f_n dµ. ⊓⊔

1.4.3 Integration of real-valued functions

A measurable function f : Ω → [−∞, ∞] is called integrable if

  ∫_Ω |f| dµ < ∞.

Clearly, if f and g are measurable and |g| ≤ |f| pointwise, then g is integrable if f is integrable. In particular, if f is integrable, then the non-negative functions f⁺ and f⁻ are integrable, and we define

  ∫_Ω f dµ := ∫_Ω f⁺ dµ − ∫_Ω f⁻ dµ.

For a set A ∈ F we write

  ∫_A f dµ := ∫_Ω 1_A f dµ,

noting that 1_A f is integrable. The monotonicity and additivity properties of this integral carry over to this more general situation, provided we assume that the functions we integrate are integrable.

The next result is among the most useful in analysis.


Theorem 1.18 (Dominated convergence theorem). Let (f_n)_{n=1}^∞ be a sequence of integrable functions such that lim_{n→∞} f_n = f pointwise. If there exists an integrable function g such that |f_n| ≤ |g| for all n ≥ 1, then

  lim_{n→∞} ∫_Ω |f_n − f| dµ = 0.

In particular,

  lim_{n→∞} ∫_Ω f_n dµ = ∫_Ω f dµ.

Proof. We make the preliminary observation that if (h_n)_{n=1}^∞ is a sequence of non-negative measurable functions such that lim_{n→∞} h_n = 0 pointwise and h is a non-negative integrable function such that h_n ≤ h for all n ≥ 1, then by the Fatou lemma

  ∫_Ω h dµ = ∫_Ω lim inf_{n→∞} (h − h_n) dµ ≤ lim inf_{n→∞} ∫_Ω h − h_n dµ = ∫_Ω h dµ − lim sup_{n→∞} ∫_Ω h_n dµ.

Since ∫_Ω h dµ is finite, it follows that 0 ≤ lim sup_{n→∞} ∫_Ω h_n dµ ≤ 0 and therefore

  lim_{n→∞} ∫_Ω h_n dµ = 0.

The theorem follows by applying this to h_n = |f_n − f| and h = 2|g|. ⊓⊔

Let (Ω, F, P) be a probability space. The integral of an integrable random variable f : Ω → R is called the expectation of f, notation

  Ef = ∫_Ω f dP.

Proposition 1.19. If f : Ω → R and g : Ω → R are independent integrable random variables, then fg : Ω → R is integrable and

  Efg = Ef Eg.

Proof. To prove this we first observe that for any non-negative Borel measurable φ : R → R, the random variables φ ◦ f and φ ◦ g are independent. Taking φ(x) = x⁺ and φ(x) = x⁻, it follows that we may consider the positive and negative parts of f and g separately. In doing so, we may assume that f and g are non-negative.

Denoting the distributions of f, g, and (f, g) by ν_f, ν_g, and ν_{(f,g)} respectively, by the substitution formula (with φ(x, y) = xy 1_{[0,∞)²}(x, y)) and the fact that ν_{(f,g)} = ν_f × ν_g we obtain


  E(fg) = ∫_{[0,∞)²} xy dν_{(f,g)}(x, y) = ∫_{[0,∞)²} xy d(ν_f × ν_g)(x, y) = ∫_{[0,∞)} ∫_{[0,∞)} xy dν_f(x) dν_g(y) = Ef Eg.

This proves the integrability of fg along with the desired identity. ⊓⊔

1.4.4 The role of null sets

We begin with a simple observation.

Proposition 1.20. Let f be a non-negative measurable function.

1. If ∫_Ω f dµ < ∞, then µ{f = ∞} = 0.
2. If ∫_Ω f dµ = 0, then µ{f ≠ 0} = 0.

Proof. For all c > 0 we have 0 ≤ c1_{{f=∞}} ≤ f and therefore

  0 ≤ cµ{f = ∞} ≤ ∫_Ω f dµ.

The first result follows from this by letting c → ∞. For the second, note that for all n ≥ 1 we have (1/n) 1_{{f ≥ 1/n}} ≤ f and therefore

  (1/n) µ{f ≥ 1/n} ≤ ∫_Ω f dµ = 0.

It follows that µ{f ≥ 1/n} = 0. Now note that {f > 0} = ⋃_{n=1}^∞ {f ≥ 1/n}. ⊓⊔

Proposition 1.21. If f is integrable and µ{f ≠ 0} = 0, then ∫_Ω f dµ = 0.

Proof. By considering f⁺ and f⁻ separately we may assume f is non-negative. Choose simple functions 0 ≤ f_n ↑ f. Then µ{f_n > 0} ≤ µ{f > 0} = 0 and therefore ∫_Ω f_n dµ = 0 for all n ≥ 1. The result follows from this. ⊓⊔

What these results show is that in everything that has been said so far, we may replace pointwise convergence by pointwise convergence µ-almost everywhere, where the latter means that we allow an exceptional set of µ-measure zero. This applies, in particular, to the monotonicity assumption in the monotone convergence theorem and the convergence and domination assumptions in the dominated convergence theorem.

1.5 L^p-spaces

Let µ be a measure on (Ω, F) and fix 1 ≤ p < ∞. We define ℒ^p(µ) as the set of all measurable functions f : Ω → R such that


  ∫_Ω |f|^p dµ < ∞.

For such functions we define

  ‖f‖_p := ( ∫_Ω |f|^p dµ )^{1/p}.

The next result shows that ℒ^p(µ) is a vector space:

Proposition 1.22 (Minkowski's inequality). For all f, g ∈ ℒ^p(µ) we have f + g ∈ ℒ^p(µ) and

  ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.

Proof. By elementary calculus it is checked that for all non-negative real numbers a and b one has

  (a + b)^p = inf_{t∈(0,1)} t^{1−p} a^p + (1 − t)^{1−p} b^p.

Applying this identity to |f(x)| and |g(x)| and integrating with respect to µ, for all fixed t ∈ (0, 1) we obtain

  ∫_Ω |f(x) + g(x)|^p dµ(x) ≤ ∫_Ω (|f(x)| + |g(x)|)^p dµ(x) ≤ t^{1−p} ∫_Ω |f(x)|^p dµ(x) + (1 − t)^{1−p} ∫_Ω |g(x)|^p dµ(x).

Stated differently, this says that

  ‖f + g‖_p^p ≤ t^{1−p} ‖f‖_p^p + (1 − t)^{1−p} ‖g‖_p^p.

Taking the infimum over all t ∈ (0, 1) gives the result. ⊓⊔

In spite of this result, ‖·‖_p is not a norm on ℒ^p(µ), because ‖f‖_p = 0 only implies that f = 0 µ-almost everywhere. In order to get around this imperfection, we define an equivalence relation ∼ on ℒ^p(µ) by

  f ∼ g ⇔ f = g µ-almost everywhere.

The equivalence class of a function f modulo ∼ is denoted by [f]. On the quotient space

  L^p(µ) := ℒ^p(µ)/∼

we define a scalar multiplication and addition in the natural way:

  c[f] := [cf],  [f] + [g] := [f + g].

We leave it as an exercise to check that both operations are well defined. With these operations, L^p(µ) is a normed vector space with respect to the norm
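The infimum identity used in the proof of Minkowski's inequality can be checked numerically; elementary calculus shows that the infimum over t ∈ (0, 1) is attained at t = a/(a + b). A sketch (the helper name is ours):

```python
def minkowski_rhs(a, b, p, t):
    """Right-hand side t^{1-p} a^p + (1-t)^{1-p} b^p of the identity
    used in the proof of Proposition 1.22."""
    return t ** (1 - p) * a ** p + (1 - t) ** (1 - p) * b ** p

a, b, p = 2.0, 3.0, 2.5
t_star = a / (a + b)   # minimiser, found by setting the t-derivative to zero
# at t_star the right-hand side equals (a + b)^p ...
assert abs(minkowski_rhs(a, b, p, t_star) - (a + b) ** p) < 1e-9
# ... and every other t gives a value at least as large
assert all(minkowski_rhs(a, b, p, t) >= (a + b) ** p - 1e-9
           for t in (0.1, 0.3, 0.5, 0.7, 0.9))
```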


  ‖[f]‖_p := ‖f‖_p.

Following common practice we shall make no distinction between a function in ℒ^p(µ) and its equivalence class in L^p(µ).

The next result states that L^p(µ) is a Banach space (which, by definition, is a complete normed vector space):

Theorem 1.23. The normed space L^p(µ) is complete.

Proof. Let (f_n)_{n=1}^∞ be a Cauchy sequence with respect to the norm ‖·‖_p of L^p(µ). By passing to a subsequence we may assume that

  ‖f_{n+1} − f_n‖_p ≤ 2^{−n},  n = 1, 2, . . .

Define the non-negative measurable functions

  g_N := ∑_{n=0}^{N−1} |f_{n+1} − f_n|,  g := ∑_{n=0}^∞ |f_{n+1} − f_n|,

with the convention that f_0 = 0. By the monotone convergence theorem,

  ∫_Ω g^p dµ = lim_{N→∞} ∫_Ω g_N^p dµ.

Taking p-th roots and using Minkowski's inequality we obtain

  ‖g‖_p = lim_{N→∞} ‖g_N‖_p ≤ lim_{N→∞} ∑_{n=0}^{N−1} ‖f_{n+1} − f_n‖_p = ∑_{n=0}^∞ ‖f_{n+1} − f_n‖_p ≤ 1 + ‖f_1‖_p.

It follows that g is finitely-valued µ-almost everywhere, which means that the sum defining g converges absolutely µ-almost everywhere. As a result, the sum

  f := ∑_{n=0}^∞ (f_{n+1} − f_n)

converges on the set {g < ∞}. On this set we have

  f = lim_{N→∞} ∑_{n=0}^{N−1} (f_{n+1} − f_n) = lim_{N→∞} f_N.

Defining f to be zero on the null set {g = ∞}, the resulting function f is measurable. From

  |f_N|^p = | ∑_{n=0}^{N−1} (f_{n+1} − f_n) |^p ≤ ( ∑_{n=0}^{N−1} |f_{n+1} − f_n| )^p ≤ |g|^p

and the dominated convergence theorem we conclude that


  lim_{N→∞} ‖f − f_N‖_p = 0.

We have proved that a subsequence of the original Cauchy sequence converges to f in L^p(µ). As is easily verified, this implies that the original Cauchy sequence converges to f as well. ⊓⊔

In the course of the proof we obtained the following result:

Corollary 1.24. Every convergent sequence in L^p(µ) has a µ-almost everywhere convergent subsequence.

For p = ∞ the above definitions have to be modified a little. We define ℒ^∞(µ) to be the vector space of all measurable functions f : Ω → R such that

  sup_{ω∈Ω} |f(ω)| < ∞

and define, for such functions,

  ‖f‖_∞ := sup_{ω∈Ω} |f(ω)|.

The space

  L^∞(µ) := ℒ^∞(µ)/∼

is again a normed vector space in a natural way (on equivalence classes one takes the essential supremum, ‖[f]‖_∞ := inf_{g∼f} sup_{ω∈Ω} |g(ω)|). The simple proof that L^∞(µ) is a Banach space is left as an exercise.

We continue with an approximation result, the proof of which is an easy application of Proposition 1.13 together with the dominated convergence theorem:

Proposition 1.25. For 1 ≤ p ≤ ∞, the simple functions in L^p(µ) are dense in L^p(µ).

We conclude with an important inequality, known as Hölder's inequality. For p = q = 2, it is known as the Cauchy-Schwarz inequality.

Theorem 1.26. Let 1 ≤ p, q ≤ ∞ satisfy 1/p + 1/q = 1. If f ∈ L^p(µ) and g ∈ L^q(µ), then fg ∈ L^1(µ) and

  ‖fg‖_1 ≤ ‖f‖_p ‖g‖_q.

Proof. For p = 1, q = ∞ and for p = ∞, q = 1, this follows by a direct estimate. Thus we may assume from now on that 1 < p, q < ∞. The inequality is then proved in the same way as Minkowski's inequality, this time using the identity

  ab = inf_{t>0} ( t^p a^p / p + b^q / (q t^q) ). ⊓⊔
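The identity at the end of the proof (a form of Young's inequality) can likewise be checked numerically; calculus gives the minimiser t = (b^q/a^p)^{1/(p+q)}. A sketch, assuming 1 < p < ∞ with conjugate exponent q = p/(p − 1); the helper name is ours:

```python
def holder_rhs(a, b, p, q, t):
    """Right-hand side t^p a^p / p + b^q / (q t^q) of the identity
    used in the proof of Theorem 1.26."""
    return t ** p * a ** p / p + b ** q / (q * t ** q)

a, b = 2.0, 5.0
p = 3.0
q = p / (p - 1)                                # conjugate exponent: 1/p + 1/q = 1
t_star = (b ** q / a ** p) ** (1 / (p + q))    # minimiser from calculus
# at t_star the right-hand side equals a * b ...
assert abs(holder_rhs(a, b, p, q, t_star) - a * b) < 1e-9
# ... and every other t gives a value at least as large
assert all(holder_rhs(a, b, p, q, t) >= a * b - 1e-9 for t in (0.5, 1.0, 2.0, 4.0))
```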


1.6 Fubini's theorem

A measure space (Ω, F, µ) is called σ-finite if we can write Ω = ⋃_{n=1}^∞ Ω^{(n)} with sets Ω^{(n)} ∈ F satisfying µ(Ω^{(n)}) < ∞.

Theorem 1.27 (Fubini). Let (Ω_1, F_1, µ_1) and (Ω_2, F_2, µ_2) be σ-finite measure spaces. For any non-negative measurable function f : Ω_1 × Ω_2 → R the following assertions hold:

(1) the non-negative function ω_1 ↦ ∫_{Ω_2} f(ω_1, ω_2) dµ_2(ω_2) is measurable;
(2) the non-negative function ω_2 ↦ ∫_{Ω_1} f(ω_1, ω_2) dµ_1(ω_1) is measurable;
(3) we have

  ∫_{Ω_1×Ω_2} f d(µ_1 × µ_2) = ∫_{Ω_2} ∫_{Ω_1} f dµ_1 dµ_2 = ∫_{Ω_1} ∫_{Ω_2} f dµ_2 dµ_1.

We shall not give a detailed proof, but sketch the main steps. First, for f(ω_1, ω_2) = 1_{A_1}(ω_1) 1_{A_2}(ω_2) with A_1 ∈ F_1, A_2 ∈ F_2, the result is clear. Dynkin's lemma is used to extend the result to functions f = 1_A with A ∈ F = F_1 × F_2. One has to assume first that µ_1 and µ_2 are finite; the σ-finite case then follows by approximation. By taking linear combinations, the result extends to simple functions. The result for arbitrary non-negative measurable functions then follows by monotone convergence.

A variation of the Fubini theorem holds if we replace 'non-negative measurable' by 'integrable'. In that case, for µ_1-almost all ω_1 ∈ Ω_1 the function ω_2 ↦ f(ω_1, ω_2) is integrable and for µ_2-almost all ω_2 ∈ Ω_2 the function ω_1 ↦ f(ω_1, ω_2) is integrable. Moreover, the functions ω_1 ↦ ∫_{Ω_2} f(ω_1, ω_2) dµ_2(ω_2) and ω_2 ↦ ∫_{Ω_1} f(ω_1, ω_2) dµ_1(ω_1) are integrable and the above identities hold.

1.7 Notes

The proof of the monotone convergence theorem is taken from Kallenberg [1]. The proofs of the Minkowski inequality and Hölder inequality, although not the ones that are usually presented in introductory courses, belong to the folklore of the subject.
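For measures supported on finitely many points, Fubini's theorem reduces to the statement that a finite double sum may be evaluated in either order. A sketch with two weighted counting measures; the weights and the test function are our arbitrary choices:

```python
# Weights of two measures on the finite spaces Omega_1 = {0, 1}
# and Omega_2 = {0, 1, 2}.
mu1 = {0: 1.0, 1: 2.0}
mu2 = {0: 0.5, 1: 0.5, 2: 1.0}

def f(x, y):
    """Any non-negative function on the product space."""
    return (x + 1) * (y + 2)

# Iterated integrals in both orders (assertion (3) of Theorem 1.27):
int_21 = sum(mu2[y] * sum(mu1[x] * f(x, y) for x in mu1) for y in mu2)
int_12 = sum(mu1[x] * sum(mu2[y] * f(x, y) for y in mu2) for x in mu1)
assert abs(int_21 - int_12) < 1e-12   # both equal the product-measure integral
```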


2 Conditional expectations

In this lecture we introduce conditional expectations and study some of their basic properties.

We begin by recalling some terminology. A random variable is a measurable function on a probability space (Ω, F, P). In this context, Ω is often called the sample space and the sets in F the events. For an integrable random variable f : Ω → R we write

  Ef := ∫_Ω f dP

for its expectation.

Below we fix a sub-σ-algebra G of F. For 1 ≤ p ≤ ∞ we denote by L^p(Ω, G) the subspace of all f ∈ L^p(Ω) having a G-measurable representative. With this notation, L^p(Ω) = L^p(Ω, F). We will be interested in the problem of finding good approximations of a random variable f ∈ L^p(Ω) by random variables in L^p(Ω, G).

Lemma 2.1. L^p(Ω, G) is a closed subspace of L^p(Ω).

Proof. Suppose that (f_n)_{n=1}^∞ is a sequence in L^p(Ω, G) such that lim_{n→∞} f_n = f in L^p(Ω). We may assume that the f_n are pointwise defined and G-measurable. After passing to a subsequence (when 1 ≤ p < ∞) we may furthermore assume that lim_{n→∞} f_n = f almost surely. The set C of all ω ∈ Ω where the sequence (f_n(ω))_{n=1}^∞ converges is G-measurable. Put f̃ := lim_{n→∞} 1_C f_n, where the limit exists pointwise. The random variable f̃ is G-measurable and agrees almost surely with f. This shows that f defines an element of L^p(Ω, G). ⊓⊔

Our aim is to show that L^p(Ω, G) is the range of a contractive projection in L^p(Ω). For p = 2 this is clear: we have the orthogonal decomposition

  L²(Ω) = L²(Ω, G) ⊕ L²(Ω, G)^⊥


and the projection we have in mind is the orthogonal projection, denoted by P_G, onto L²(Ω, G) along this decomposition. Following common usage we write

  E(f|G) := P_G(f),  f ∈ L²(Ω),

and call E(f|G) the conditional expectation of f with respect to G. Let us emphasise that E(f|G) is defined as an element of L²(Ω, G), that is, as an equivalence class of random variables.

Lemma 2.2. For all f ∈ L²(Ω) and G ∈ G we have

  ∫_G E(f|G) dP = ∫_G f dP.

As a consequence, if f ≥ 0 almost surely, then E(f|G) ≥ 0 almost surely.

Proof. By definition we have f − E(f|G) ⊥ L²(Ω, G). If G ∈ G, then 1_G ∈ L²(Ω, G) and therefore

  ∫_Ω 1_G (f − E(f|G)) dP = 0.

This gives the desired identity. For the second assertion, choose a G-measurable representative of g := E(f|G) and apply the identity to the G-measurable set {g < 0}. ⊓⊔

Taking G = Ω we obtain the identity E(E(f|G)) = Ef. This will be used in the next lemma, which asserts that the mapping f ↦ E(f|G) is L¹-bounded.

Lemma 2.3. For all f ∈ L²(Ω) we have E|E(f|G)| ≤ E|f|.

Proof. It suffices to check that |E(f|G)| ≤ E(|f| |G), since then the lemma follows from E|E(f|G)| ≤ EE(|f| |G) = E|f|. Splitting f into positive and negative parts, almost surely we have

  |E(f|G)| = |E(f⁺|G) − E(f⁻|G)| ≤ |E(f⁺|G)| + |E(f⁻|G)| = E(f⁺|G) + E(f⁻|G) = E(|f| |G). ⊓⊔

Since L²(Ω) is dense in L¹(Ω), this lemma shows that the conditional expectation operator has a unique extension to a contractive linear operator on L¹(Ω), which we also denote by E(·|G). This operator is a projection (this follows by noting that it is a projection in L²(Ω) and then using that L²(Ω) is dense in L¹(Ω)) and it is positive in the sense that it maps positive random variables to positive random variables; this follows from Lemma 2.2 by approximation.

Lemma 2.4 (Conditional Jensen inequality). If φ : R → R is convex, then for all f ∈ L¹(Ω) such that φ ◦ f ∈ L¹(Ω) we have, almost surely,

  φ ◦ E(f|G) ≤ E(φ ◦ f|G).


Proof. If a, b ∈ R are such that at + b ≤ φ(t) for all t ∈ R, then the positivity of the conditional expectation operator gives

  aE(f|G) + b = E(af + b|G) ≤ E(φ ◦ f|G)

almost surely. Since φ is convex we can find real sequences (a_n)_{n=1}^∞ and (b_n)_{n=1}^∞ such that φ(t) = sup_{n≥1} (a_n t + b_n) for all t ∈ R; we leave the proof of this fact as an exercise. Hence almost surely,

  φ ◦ E(f|G) = sup_{n≥1} a_n E(f|G) + b_n ≤ E(φ ◦ f|G). ⊓⊔

Theorem 2.5 (L^p-contractivity). For all 1 ≤ p ≤ ∞ the conditional expectation operator extends to a contractive positive projection on L^p(Ω) with range L^p(Ω, G). For f ∈ L^p(Ω), the random variable E(f|G) is the unique element of L^p(Ω, G) with the property that for all G ∈ G,

  ∫_G E(f|G) dP = ∫_G f dP.  (2.1)

Proof. For 1 ≤ p < ∞ the L^p-contractivity follows from Lemma 2.4 applied to the convex function φ(t) = |t|^p. For p = ∞ we argue as follows. If f ∈ L^∞(Ω), then 0 ≤ |f| ≤ ‖f‖_∞ 1_Ω and therefore 0 ≤ E(|f| |G) ≤ ‖f‖_∞ 1_Ω almost surely. Hence, E(|f| |G) ∈ L^∞(Ω) and ‖E(|f| |G)‖_∞ ≤ ‖f‖_∞.

For 2 ≤ p ≤ ∞, (2.1) follows from Lemma 2.2. For f ∈ L^p(Ω) with 1 ≤ p < 2 we choose a sequence (f_n)_{n=1}^∞ in L²(Ω) such that lim_{n→∞} f_n = f in L^p(Ω). Then lim_{n→∞} E(f_n|G) = E(f|G) in L^p(Ω) and therefore, for any G ∈ G,

  ∫_G E(f|G) dP = lim_{n→∞} ∫_G E(f_n|G) dP = lim_{n→∞} ∫_G f_n dP = ∫_G f dP.

If g ∈ L^p(Ω, G) satisfies ∫_G g dP = ∫_G f dP for all G ∈ G, then ∫_G g dP = ∫_G E(f|G) dP for all G ∈ G. Since both g and E(f|G) are G-measurable, as in the proof of the second part of Lemma 2.2 this implies that g = E(f|G) almost surely.

In particular, E(E(f|G)|G) = E(f|G) for all f ∈ L^p(Ω) and E(f|G) = f for all f ∈ L^p(Ω, G). This shows that E(·|G) is a projection onto L^p(Ω, G). ⊓⊔

The next two results develop some properties of conditional expectations.

Proposition 2.6.

(1) If f ∈ L¹(Ω) and H is a sub-σ-algebra of G, then almost surely

  E(E(f|G)|H) = E(f|H).


(2) If f ∈ L¹(Ω) is independent of G (that is, f is independent of 1_G for all G ∈ G), then almost surely

  E(f|G) = Ef.

(3) If f ∈ L^p(Ω) and g ∈ L^q(Ω, G) with 1 ≤ p, q ≤ ∞, 1/p + 1/q = 1, then almost surely

  E(gf|G) = gE(f|G).

Proof. (1): For all H ∈ H we have ∫_H E(E(f|G)|H) dP = ∫_H E(f|G) dP = ∫_H f dP by Theorem 2.5, first applied to H and then to G (observe that H ∈ G). Now the result follows from the uniqueness part of the theorem.

(2): For all G ∈ G we have ∫_G Ef dP = E1_G Ef = E1_G f = ∫_G f dP, and the result follows from the uniqueness part of Theorem 2.5.

(3): For all G, G′ ∈ G we have ∫_G 1_{G′} E(f|G) dP = ∫_{G∩G′} E(f|G) dP = ∫_{G∩G′} f dP = ∫_G f 1_{G′} dP. Hence E(f 1_{G′}|G) = 1_{G′} E(f|G) by the uniqueness part of Theorem 2.5. By linearity, this gives the result for simple functions g, and the general case follows by approximation. ⊓⊔

Example 2.7. Let f : (−r, r) → R be integrable and let G be the σ-algebra of all symmetric Borel sets in (−r, r) (we call a set B symmetric if B = −B). Then

  E(f|G) = ½(f + f̌) almost surely,

where f̌(x) = f(−x). This is verified by checking the condition (2.1).

Example 2.8. Let f : [0, 1) → R be integrable and let G be the σ-algebra generated by the sets I_n := [(n−1)/N, n/N), n = 1, . . . , N. Then

  E(f|G) = ∑_{n=1}^N m_n 1_{I_n} almost surely,

where

  m_n = (1/|I_n|) ∫_{I_n} f(x) dx.

Again this is verified by checking the condition (2.1).

If f and g are random variables on (Ω, F, P), we write E(g|f) := E(g|σ(f)), where σ(f) is the σ-algebra generated by f.

Example 2.9. Let f_1, . . . , f_N be independent integrable random variables satisfying E(f_1) = · · · = E(f_N) = 0, and put S_N := f_1 + · · · + f_N. Then, for 1 ≤ n ≤ N,

  E(S_N|f_n) = f_n almost surely.

This follows from the following almost sure identities:

  E(S_N|f_n) = E(f_n|f_n) + ∑_{m≠n} E(f_m|f_n) = f_n + ∑_{m≠n} E(f_m) = f_n.
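Example 2.8 lends itself to a direct numerical check: discretising [0, 1) by a fine grid, the conditional expectation with respect to the interval σ-algebra replaces f on each I_n by its average over I_n. A sketch; the grid size and test function f(x) = x² are our choices:

```python
N, M = 4, 4000                          # N intervals I_n, M grid points
xs = [(i + 0.5) / M for i in range(M)]  # midpoints of a fine grid on [0, 1)
f = [x * x for x in xs]                 # the random variable f(x) = x^2
bins = [int(x * N) for x in xs]         # index of the I_n containing each point

# m_n = average of f over I_n (a Riemann-sum version of the integral)
m = []
for n in range(N):
    vals = [fv for fv, b in zip(f, bins) if b == n]
    m.append(sum(vals) / len(vals))

cond = [m[b] for b in bins]             # E(f|G): constant on each I_n

# Condition (2.1) for G = Omega: E(E(f|G)) = Ef
assert abs(sum(cond) / M - sum(f) / M) < 1e-9
# Condition (2.1) for G = I_0: same integral of f and E(f|G) over I_0
i0_f = sum(fv for fv, b in zip(f, bins) if b == 0) / M
i0_c = sum(cv for cv, b in zip(cond, bins) if b == 0) / M
assert abs(i0_f - i0_c) < 1e-9
```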


Example 2.10. Let f_1, . . . , f_N be independent and identically distributed integrable random variables and put S_N := f_1 + · · · + f_N. Then

  E(f_1|S_N) = · · · = E(f_N|S_N) almost surely,

and therefore

  E(f_n|S_N) = S_N/N almost surely, n = 1, . . . , N.

It is clear that the second identity follows from the first. To prove the first, let µ denote the (common) distribution of the random variables f_n. By independence, the distribution of the R^N-valued random variable (f_1, . . . , f_N) equals the N-fold product measure µ^N := µ × · · · × µ. Then, for any Borel set B ∈ B(R) we have, using the substitution formula and Fubini's theorem,

  ∫_{{S_N ∈ B}} E(f_n|S_N) dP = ∫_{{S_N ∈ B}} f_n dP = ∫_{{x_1+···+x_N ∈ B}} x_n dµ^N(x) = ∫_{R^{N−1}} ∫_{{y ∈ B−z_1−···−z_{N−1}}} y dµ(y) dµ^{N−1}(z_1, . . . , z_{N−1}),

which is independent of n.

2.1 Notes

We recommend Williams [2] for an introduction to the subject. The bible for modern probabilists is Kallenberg [1].

The definition of conditional expectations starting from orthogonal projections in L²(Ω) follows Kallenberg [1]. Many textbooks prefer the shorter (but less elementary and transparent) route via the Radon-Nikodým theorem. In this approach, one notes that if f is an integrable random variable on (Ω, F, P) and G is a sub-σ-algebra of F, then

  Q(G) := ∫_G f dP,  G ∈ G,

defines a finite (signed) measure Q on (Ω, G) which is absolutely continuous with respect to P|_G. Hence, by the Radon-Nikodým theorem, Q has an integrable density with respect to P|_G. This density equals the conditional expectation E(f|G) almost surely.


References

1. O. Kallenberg, "Foundations of Modern Probability", second ed., Probability and its Applications (New York), Springer-Verlag, New York, 2002.
2. D. Williams, "Probability with Martingales", Cambridge Mathematical Textbooks, Cambridge University Press, Cambridge, 1991.
