Cheat Sheet - Student.cs.uwaterloo.ca

Cheat Sheet

Markov's inequality: Let $X$ be a non-negative random variable. Then for any $t > 0$,
$$\Pr(X \ge t) \le \frac{E(X)}{t}.$$
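As a quick numeric sanity check (not part of the original sheet), the bound can be tested on samples from any non-negative distribution; the Exponential(1) distribution below is an arbitrary illustrative choice:

```python
import random

# Illustrative check of Markov's inequality.
# X ~ Exponential(1) is non-negative with E(X) = 1.
random.seed(0)
samples = [random.expovariate(1.0) for _ in range(100_000)]

t = 3.0
empirical_tail = sum(x >= t for x in samples) / len(samples)
markov_bound = (sum(samples) / len(samples)) / t  # E(X)/t, estimated from data

# True tail here is e^{-3} ~ 0.05, comfortably below the bound ~ 1/3.
assert empirical_tail <= markov_bound
```

Markov's bound is loose (here roughly a factor of 6), which is exactly why the exponentially decaying Hoeffding bound below is preferred for averages.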

Hoeffding's bound: Let $X_1, X_2, \ldots, X_m$ be independently distributed random variables lying in the interval $[0, 1]$, and let $X = \frac{1}{m} \sum_{i=1}^{m} X_i$. Then for any $\epsilon > 0$,
$$\Pr(|X - E(X)| \ge \epsilon) \le 2e^{-2m\epsilon^2}.$$
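A small simulation (an illustrative addition; Bernoulli(0.5) variables are an arbitrary choice of $[0,1]$-valued variables) shows the empirical deviation probability sitting below the bound:

```python
import math
import random

# Empirical check of Hoeffding's bound for the mean of m i.i.d.
# Bernoulli(0.5) variables, which take values in [0, 1].
random.seed(0)
m, eps, trials = 100, 0.1, 10_000
mu = 0.5  # E(X) for the Bernoulli(0.5) average

deviations = 0
for _ in range(trials):
    xbar = sum(random.random() < 0.5 for _ in range(m)) / m
    if abs(xbar - mu) >= eps:
        deviations += 1

empirical = deviations / trials           # roughly 0.05 here
hoeffding = 2 * math.exp(-2 * m * eps**2)  # 2e^{-2} ~ 0.27

assert empirical <= hoeffding
```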

Agnostic PAC-learnability: A class $H \subseteq 2^X$ is (agnostic PAC) learnable if there exists a function $m_H^{\mathrm{learn}} : (0,1) \times (0,1) \to \mathbb{N}$ and a learning algorithm $A : \bigcup_{i=0}^{\infty} (X \times \{0,1\})^i \to 2^X$, such that for every $(\epsilon > 0, \delta > 0)$ and probability distribution $P$ over $X \times \{0,1\}$, if a sample $S$ of size $m \ge m_H^{\mathrm{learn}}(\epsilon, \delta)$ is drawn i.i.d. from $P$, then, with probability at least $1 - \delta$,
$$L_P(A(S)) \le L_P(H) + \epsilon. \qquad \text{(Here, } L_P(H) = \inf_{h \in H} L_P(h).)$$
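The definition does not fix the learner $A$; a standard choice is empirical risk minimization. The sketch below (an illustrative assumption, not part of the definition: a finite class of thresholds $h_t(x) = \mathbb{1}[x \ge t]$ with labels flipped at rate 0.1, so even the best $h \in H$ has loss about 0.1) shows ERM landing near the optimum in the agnostic setting:

```python
import random

# ERM sketch of a learning algorithm A over a finite threshold class.
# The class, noise rate, and sample size are illustrative assumptions.
random.seed(0)
thresholds = [i / 10 for i in range(11)]  # finite class H

def erm(sample):
    """Return the hypothesis in H with minimum empirical loss on sample."""
    def emp_loss(t):
        return sum((x >= t) != y for x, y in sample) / len(sample)
    return min(thresholds, key=emp_loss)

# Labels follow 1[x >= 0.5] but are flipped with probability 0.1,
# so the best achievable loss in H is about 0.1 (agnostic setting).
sample = []
for _ in range(2000):
    x = random.random()
    y = (x >= 0.5) != (random.random() < 0.1)
    sample.append((x, y))

best = erm(sample)  # lands near the optimal threshold 0.5
```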

Non-Uniform Learnability: A class $H \subseteq 2^X$ is non-uniformly learnable if there exists a function $m_H^{\mathrm{nu\text{-}learn}} : H \times (0,1) \times (0,1) \to \mathbb{N}$ and a learning algorithm $A : \bigcup_{i=0}^{\infty} (X \times \{0,1\})^i \to 2^X$, such that for every $(\epsilon > 0, \delta > 0)$, $h \in H$ and probability distribution $P$ over $X \times \{0,1\}$, if a sample $S$ of size $m \ge m_H^{\mathrm{nu\text{-}learn}}(h, \epsilon, \delta)$ is drawn i.i.d. from $P$, then, with probability at least $1 - \delta$, $L_P(A(S)) \le L_P(h) + \epsilon$.

Shattering & VC-dim: A class $H \subseteq 2^X$ shatters a set $A$ iff $\{A \cap h : h \in H\} = 2^A$. The Vapnik-Chervonenkis dimension of $H$ is defined as
$$\text{VC-dim}(H) = \sup\{|A| : A \subseteq X,\ H \text{ shatters } A\}.$$
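Shattering can be checked by brute force on small examples. The sketch below (an illustrative addition: threshold functions $h_t(x) = \mathbb{1}[x \ge t]$ on a small finite domain) confirms that thresholds shatter every singleton but no pair, so their VC-dimension is 1:

```python
from itertools import combinations

# Brute-force shattering check for threshold functions h_t(x) = 1[x >= t]
# on a small finite domain (an illustrative choice).
domain = [1, 2, 3, 4, 5]
hypotheses = [frozenset(x for x in domain if x >= t) for t in range(0, 7)]

def shatters(A):
    """H shatters A iff {A ∩ h : h in H} is the full power set 2^A."""
    cuts = {frozenset(A) & h for h in hypotheses}
    return len(cuts) == 2 ** len(A)

# Every single point is shattered, but no pair: labeling the smaller
# point 1 and the larger point 0 would require a decreasing hypothesis.
assert all(shatters({x}) for x in domain)
assert not any(shatters(set(p)) for p in combinations(domain, 2))
# Hence VC-dim of thresholds is 1.
```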

Sauer's lemma: For a class $H \subseteq 2^X$ and a set $A \subseteq X$, we denote by $\Pi_H(A)$ the number of cuts realized by $H$ on $A$. Formally,
$$\Pi_H(A) = |\{A \cap h : h \in H\}|.$$
For sample size $m$, $\Pi_H(m)$ is the maximum number of cuts on any set of size $m$. Formally,
$$\Pi_H(m) = \max\{\Pi_H(A) : A \subseteq X,\ |A| = m\}.$$
Sauer's lemma says that if $\text{VC-dim}(H) = d$, then
$$\Pi_H(m) \le \sum_{i=0}^{d} \binom{m}{i}.$$
Two inequalities are useful:
$$\sum_{i=0}^{d} \binom{m}{i} \le m^d + 1 \quad \text{for } m \ge 1,\ d \ge 0,$$
$$\sum_{i=0}^{d} \binom{m}{i} \le \left(\frac{me}{d}\right)^d \quad \text{for } m \ge d \ge 1.$$
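Both inequalities on the binomial sum can be verified numerically over a range of $(m, d)$ pairs (a sanity check added here, not a proof):

```python
import math

# Check the two upper bounds on the binomial sum from Sauer's lemma
# over a modest range of (m, d) pairs.
def binom_sum(m, d):
    """Sum of binomial coefficients C(m, 0) + ... + C(m, d)."""
    return sum(math.comb(m, i) for i in range(d + 1))

for m in range(1, 60):
    for d in range(0, m + 1):
        s = binom_sum(m, d)
        assert s <= m ** d + 1                 # holds for m >= 1, d >= 0
        if m >= d >= 1:
            assert s <= (m * math.e / d) ** d  # holds for m >= d >= 1
```

The $(me/d)^d$ form is the one used in sample-complexity arguments, since it turns $\Pi_H(m)$ into a polynomial of degree $d$ in $m$.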




The fundamental sample complexity theorem: If $H \subseteq 2^X$ is a class with finite $\text{VC-dim}(H) = d$, then there exist constants $C_1, C_2$ such that for all $(\epsilon > 0, \delta > 0)$,
$$C_1 \cdot \frac{d + \log(1/\delta)}{\epsilon^2} \le m_H^{\mathrm{learn}}(\epsilon, \delta) \le C_2 \cdot \frac{d + \log(1/\delta)}{\epsilon^2}.$$
