Cheat Sheet - Student.cs.uwaterloo.ca
Cheat Sheet
Markov's inequality: Let $X$ be a non-negative random variable. Then for any $t > 0$,
$$\Pr(X \ge t) \le \frac{E(X)}{t}.$$
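Markov's bound can be sanity-checked by simulation. A minimal sketch, assuming an exponential distribution with mean 1 as the non-negative random variable (the distribution and thresholds are illustrative choices, not from the sheet):

```python
import random

# Monte Carlo check of Markov's inequality: Pr(X >= t) <= E(X)/t
# for a non-negative X. Here X ~ Exponential(1), so E(X) = 1.
random.seed(0)
samples = [random.expovariate(1.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)  # empirical E(X), close to 1

for t in (1.0, 2.0, 5.0):
    tail = sum(x >= t for x in samples) / len(samples)  # empirical Pr(X >= t)
    assert tail <= mean / t  # Markov's bound holds for each t
```

For the exponential the true tail is $e^{-t}$, well below $1/t$, which shows the bound is valid but often loose.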
Hoeffding's bound: Let $X_1, X_2, \ldots, X_m$ be independently distributed random variables lying in the interval $[0, 1]$, and let $X = \frac{1}{m}\sum_{i=1}^{m} X_i$. Then for any $\epsilon > 0$,
$$\Pr(|X - E(X)| \ge \epsilon) \le 2e^{-2m\epsilon^2}.$$
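Hoeffding's bound can likewise be checked empirically. A sketch using Bernoulli(0.5) variables (the sample size $m$, deviation $\epsilon$, and trial count are arbitrary illustrative values):

```python
import math
import random

# Monte Carlo check of Hoeffding's bound for Bernoulli(0.5) variables in [0, 1]:
# Pr(|X - E(X)| >= eps) <= 2 * exp(-2 * m * eps**2), where X is the sample mean.
random.seed(1)
m, eps, trials = 100, 0.1, 20_000

deviations = 0
for _ in range(trials):
    sample_mean = sum(random.random() < 0.5 for _ in range(m)) / m
    if abs(sample_mean - 0.5) >= eps:  # E(X) = 0.5 here
        deviations += 1

empirical = deviations / trials
bound = 2 * math.exp(-2 * m * eps**2)  # 2e^{-2} ~ 0.271
assert empirical <= bound
```

The empirical deviation frequency (around 0.057 for these parameters) sits comfortably under the bound of about 0.271.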
Agnostic PAC-learnability: A class $H \subseteq 2^X$ is (agnostic PAC) learnable if there exists a function $m_H^{\mathrm{learn}} : (0,1) \times (0,1) \to \mathbb{N}$ and a learning algorithm $A : \bigcup_{i=0}^{\infty} (X \times \{0,1\})^i \to 2^X$, such that for every $(\epsilon > 0, \delta > 0)$ and probability distribution $P$ over $X \times \{0,1\}$, if a sample $S$ of size $m \ge m_H^{\mathrm{learn}}(\epsilon, \delta)$ is drawn i.i.d. from $P$, then, with probability at least $(1 - \delta)$, $L_P(A(S)) \le L_P(H) + \epsilon$. (Here, $L_P(H) = \inf_{h \in H} L_P(h)$.)
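The learner $A$ in this definition is abstract. As a concrete sketch of a learner that fits the template, empirical risk minimization over a small finite class of thresholds (the class, noise rate, and sample size below are illustrative assumptions, not from the sheet):

```python
import random

# Sketch of a learner A(S) for the agnostic setting: empirical risk
# minimization (ERM) over a tiny finite class of threshold hypotheses.
random.seed(2)

# H: threshold classifiers h_t(x) = 1[x >= t] for a few fixed thresholds.
H = [lambda x, t=t: int(x >= t) for t in (0.2, 0.4, 0.6, 0.8)]

# Sample S from a noisy P: true boundary at 0.5, labels flipped w.p. 0.1,
# so even the best h in H has nonzero risk (the agnostic case).
S = []
for _ in range(500):
    x = random.random()
    y = int(x >= 0.5) ^ (random.random() < 0.1)
    S.append((x, y))

def empirical_risk(h, S):
    return sum(h(x) != y for x, y in S) / len(S)

# A(S): return the hypothesis with the lowest empirical risk on S.
erm = min(H, key=lambda h: empirical_risk(h, S))
assert empirical_risk(erm, S) <= min(empirical_risk(h, S) for h in H)
```

With enough samples, Hoeffding's bound applied to each $h \in H$ guarantees the ERM output is close to the best in the class, which is exactly the $L_P(A(S)) \le L_P(H) + \epsilon$ guarantee.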
Non-Uniform Learnability: A class $H \subseteq 2^X$ is non-uniformly learnable if there exists a function $m_H^{\mathrm{nu\text{-}learn}} : H \times (0,1) \times (0,1) \to \mathbb{N}$ and a learning algorithm $A : \bigcup_{i=0}^{\infty} (X \times \{0,1\})^i \to 2^X$, such that for every $(\epsilon > 0, \delta > 0)$, $h \in H$ and probability distribution $P$ over $X \times \{0,1\}$, if a sample $S$ of size $m \ge m_H^{\mathrm{nu\text{-}learn}}(h, \epsilon, \delta)$ is drawn i.i.d. from $P$, then, with probability at least $(1 - \delta)$, $L_P(A(S)) \le L_P(h) + \epsilon$.
Shattering & VC-dim: A class $H \subseteq 2^X$ shatters a set $A$ iff $\{A \cap h : h \in H\} = 2^A$. The Vapnik-Chervonenkis dimension of $H$ is defined as
$$\text{VC-dim}(H) = \sup\{|A| : A \subseteq X,\ H \text{ shatters } A\}.$$
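On small finite sets, shattering can be checked by brute force: enumerate the labelings $H$ realizes and compare against all $2^{|A|}$ possibilities. A sketch using the class of threshold functions on the line (the class and test points are illustrative assumptions):

```python
from itertools import product

# Brute-force shattering check. H is the class of thresholds
# h_t(x) = 1[x >= t]; the finite threshold list below realizes every
# distinct behavior of the class on the small point sets we test.
def labelings(points, thresholds):
    """All label vectors {(h_t(x) for x in points) : t in thresholds}."""
    return {tuple(int(x >= t) for x in points) for t in thresholds}

def shatters(points, thresholds):
    # H shatters A iff H realizes all 2^|A| labelings of A.
    return labelings(points, thresholds) == set(product((0, 1), repeat=len(points)))

thresholds = [-0.5, 0.5, 1.5]            # enough to cover all behaviors on {0, 1}
assert shatters([0], thresholds)         # every 1-point set is shattered
assert not shatters([0, 1], thresholds)  # the labeling (1, 0) is unrealizable
# Hence VC-dim of threshold functions is 1.
```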
Sauer's lemma: For a class $H \subseteq 2^X$ and a set $A \subseteq X$, we denote by $\Pi_H(A)$ the number of cuts realized by $H$ on $A$. Formally,
$$\Pi_H(A) = |\{A \cap h : h \in H\}|.$$
For sample size $m$, $\Pi_H(m)$ is the maximum number of cuts on any set of size $m$. Formally,
$$\Pi_H(m) = \max\{\Pi_H(A) : A \subseteq X,\ |A| = m\}.$$
Sauer's lemma says that if $\text{VC-dim}(H) = d$, then
$$\Pi_H(m) \le \sum_{i=0}^{d} \binom{m}{i}.$$
Two inequalities are useful:
$$\sum_{i=0}^{d} \binom{m}{i} \le m^d + 1 \quad \text{for } m \ge 1,\ d \ge 0,$$
$$\sum_{i=0}^{d} \binom{m}{i} \le \left(\frac{me}{d}\right)^d \quad \text{for } m \ge d \ge 1.$$
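Both auxiliary inequalities can be spot-checked numerically over a range of $(m, d)$ pairs. A sketch:

```python
from math import comb, e

# Numeric check of the two bounds on the Sauer sum:
#   sum_{i=0}^{d} C(m, i) <= m^d + 1       for m >= 1, d >= 0
#   sum_{i=0}^{d} C(m, i) <= (m*e/d)^d     for m >= d >= 1
def sauer_sum(m, d):
    return sum(comb(m, i) for i in range(d + 1))

for m in range(1, 30):
    for d in range(0, m + 1):
        assert sauer_sum(m, d) <= m**d + 1
        if d >= 1:
            assert sauer_sum(m, d) <= (m * e / d) ** d
```

The second bound is the one usually used: it shows the growth function is polynomial in $m$ of degree $d$ once $m \ge d$, rather than exponential.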
The fundamental sample complexity theorem: If $H \subseteq 2^X$ is a class with finite $\text{VC-dim}(H) = d$, then there exist constants $C_1, C_2$ such that for all $(\epsilon > 0, \delta > 0)$,
$$C_1 \cdot \frac{d + \log(1/\delta)}{\epsilon^2} \le m_H^{\mathrm{learn}}(\epsilon, \delta) \le C_2 \cdot \frac{d + \log(1/\delta)}{\epsilon^2}.$$