Cheat Sheet - Student.cs.uwaterloo.ca
Cheat Sheet
Markov's inequality: Let $X$ be a non-negative random variable. Then for any $t > 0$,
$$\Pr(X \ge t) \le \frac{E(X)}{t}.$$
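Markov's bound can be sanity-checked by simulation. A minimal sketch, assuming an exponential distribution with mean 1 as the non-negative random variable (the distribution and thresholds are illustrative choices, not from the sheet):

```python
import random

# Monte Carlo check of Markov's inequality: Pr(X >= t) <= E(X)/t
# for a non-negative X. Here X ~ Exponential(1), so E(X) = 1.
random.seed(0)
samples = [random.expovariate(1.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)  # empirical E(X), close to 1

for t in (1.0, 2.0, 5.0):
    tail = sum(x >= t for x in samples) / len(samples)  # empirical Pr(X >= t)
    assert tail <= mean / t  # Markov's bound holds for each t
```

For the exponential the true tail is $e^{-t}$, well below $1/t$, which shows the bound is valid but often loose.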
Hoeffding's bound: Let $X_1, X_2, \ldots, X_m$ be independently distributed random variables lying in the interval $[0, 1]$, and let $X = \frac{1}{m}\sum_{i=1}^{m} X_i$. Then for any $\epsilon > 0$,
$$\Pr(|X - E(X)| \ge \epsilon) \le 2e^{-2m\epsilon^2}.$$
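Hoeffding's bound can likewise be checked empirically. A sketch using Bernoulli(0.5) variables (the sample size $m$, deviation $\epsilon$, and trial count are arbitrary illustrative values):

```python
import math
import random

# Monte Carlo check of Hoeffding's bound for Bernoulli(0.5) variables in [0, 1]:
# Pr(|X - E(X)| >= eps) <= 2 * exp(-2 * m * eps**2), where X is the sample mean.
random.seed(1)
m, eps, trials = 100, 0.1, 20_000

deviations = 0
for _ in range(trials):
    sample_mean = sum(random.random() < 0.5 for _ in range(m)) / m
    if abs(sample_mean - 0.5) >= eps:  # E(X) = 0.5 here
        deviations += 1

empirical = deviations / trials
bound = 2 * math.exp(-2 * m * eps**2)  # 2e^{-2} ~ 0.271
assert empirical <= bound
```

The empirical deviation frequency (around 0.057 for these parameters) sits comfortably under the bound of about 0.271.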
Agnostic PAC-learnability: A class $H \subseteq 2^X$ is (agnostic PAC) learnable if there exists a function $m_H^{\mathrm{learn}} : (0,1) \times (0,1) \to \mathbb{N}$ and a learning algorithm $A : \bigcup_{i=0}^{\infty} (X \times \{0,1\})^i \to 2^X$, such that for every $(\epsilon > 0, \delta > 0)$ and probability distribution $P$ over $X \times \{0,1\}$, if a sample $S$ of size $m \ge m_H^{\mathrm{learn}}(\epsilon, \delta)$ is drawn i.i.d. from $P$, then, with probability at least $(1 - \delta)$, $L_P(A(S)) \le L_P(H) + \epsilon$. (Here, $L_P(H) = \inf_{h \in H} L_P(h)$.)
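The learner $A$ in this definition is abstract. As a concrete sketch of a learner that fits the template, empirical risk minimization over a small finite class of thresholds (the class, noise rate, and sample size below are illustrative assumptions, not from the sheet):

```python
import random

# Sketch of a learner A(S) for the agnostic setting: empirical risk
# minimization (ERM) over a tiny finite class of threshold hypotheses.
random.seed(2)

# H: threshold classifiers h_t(x) = 1[x >= t] for a few fixed thresholds.
H = [lambda x, t=t: int(x >= t) for t in (0.2, 0.4, 0.6, 0.8)]

# Sample S from a noisy P: true boundary at 0.5, labels flipped w.p. 0.1,
# so even the best h in H has nonzero risk (the agnostic case).
S = []
for _ in range(500):
    x = random.random()
    y = int(x >= 0.5) ^ (random.random() < 0.1)
    S.append((x, y))

def empirical_risk(h, S):
    return sum(h(x) != y for x, y in S) / len(S)

# A(S): return the hypothesis with the lowest empirical risk on S.
erm = min(H, key=lambda h: empirical_risk(h, S))
assert empirical_risk(erm, S) <= min(empirical_risk(h, S) for h in H)
```

With enough samples, Hoeffding's bound applied to each $h \in H$ guarantees the ERM output is close to the best in the class, which is exactly the $L_P(A(S)) \le L_P(H) + \epsilon$ guarantee.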
Non-Uniform Learnability: A class $H \subseteq 2^X$ is non-uniformly learnable if there exists a function $m_H^{\mathrm{nu\text{-}learn}} : H \times (0,1) \times (0,1) \to \mathbb{N}$ and a learning algorithm $A : \bigcup_{i=0}^{\infty} (X \times \{0,1\})^i \to 2^X$, such that for every $(\epsilon > 0, \delta > 0)$, $h \in H$ and probability distribution $P$ over $X \times \{0,1\}$, if a sample $S$ of size $m \ge m_H^{\mathrm{nu\text{-}learn}}(h, \epsilon, \delta)$ is drawn i.i.d. from $P$, then, with probability at least $(1 - \delta)$, $L_P(A(S)) \le L_P(h) + \epsilon$.
Shattering & VC-dim: A class $H \subseteq 2^X$ shatters a set $A$ iff $\{A \cap h : h \in H\} = 2^A$. The Vapnik-Chervonenkis dimension of $H$ is defined as
$$\text{VC-dim}(H) = \sup\{|A| : A \subseteq X,\ H \text{ shatters } A\}.$$
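On small finite sets, shattering can be checked by brute force: enumerate the labelings $H$ realizes and compare against all $2^{|A|}$ possibilities. A sketch using the class of threshold functions on the line (the class and test points are illustrative assumptions):

```python
from itertools import product

# Brute-force shattering check. H is the class of thresholds
# h_t(x) = 1[x >= t]; the finite threshold list below realizes every
# distinct behavior of the class on the small point sets we test.
def labelings(points, thresholds):
    """All label vectors {(h_t(x) for x in points) : t in thresholds}."""
    return {tuple(int(x >= t) for x in points) for t in thresholds}

def shatters(points, thresholds):
    # H shatters A iff H realizes all 2^|A| labelings of A.
    return labelings(points, thresholds) == set(product((0, 1), repeat=len(points)))

thresholds = [-0.5, 0.5, 1.5]            # enough to cover all behaviors on {0, 1}
assert shatters([0], thresholds)         # every 1-point set is shattered
assert not shatters([0, 1], thresholds)  # the labeling (1, 0) is unrealizable
# Hence VC-dim of threshold functions is 1.
```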
Sauer's lemma: For a class $H \subseteq 2^X$ and a set $A \subseteq X$, we denote by $\Pi_H(A)$ the number of cuts realized by $H$ on $A$. Formally,
$$\Pi_H(A) = |\{A \cap h : h \in H\}|.$$
For sample size $m$, $\Pi_H(m)$ is the maximum number of cuts on any set of size $m$. Formally,
$$\Pi_H(m) = \max\{\Pi_H(A) : A \subseteq X,\ |A| = m\}.$$
Sauer's lemma says that if $\text{VC-dim}(H) = d$, then
$$\Pi_H(m) \le \sum_{i=0}^{d} \binom{m}{i}.$$
Two inequalities are useful:
$$\sum_{i=0}^{d} \binom{m}{i} \le m^d + 1 \quad \text{for } m \ge 1,\ d \ge 0,$$
$$\sum_{i=0}^{d} \binom{m}{i} \le \left(\frac{me}{d}\right)^d \quad \text{for } m \ge d \ge 1.$$
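Both auxiliary inequalities can be spot-checked numerically over a range of $(m, d)$ pairs. A sketch:

```python
from math import comb, e

# Numeric check of the two bounds on the Sauer sum:
#   sum_{i=0}^{d} C(m, i) <= m^d + 1       for m >= 1, d >= 0
#   sum_{i=0}^{d} C(m, i) <= (m*e/d)^d     for m >= d >= 1
def sauer_sum(m, d):
    return sum(comb(m, i) for i in range(d + 1))

for m in range(1, 30):
    for d in range(0, m + 1):
        assert sauer_sum(m, d) <= m**d + 1
        if d >= 1:
            assert sauer_sum(m, d) <= (m * e / d) ** d
```

The second bound is the one usually used: it shows the growth function is polynomial in $m$ of degree $d$ once $m \ge d$, rather than exponential.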
The fundamental sample complexity theorem: If $H \subseteq 2^X$ is a class with finite $\text{VC-dim}(H) = d$, then there exist constants $C_1, C_2$ such that for all $(\epsilon > 0, \delta > 0)$,
$$C_1 \cdot \frac{d + \log(1/\delta)}{\epsilon^2} \le m_H^{\mathrm{learn}}(\epsilon, \delta) \le C_2 \cdot \frac{d + \log(1/\delta)}{\epsilon^2}.$$