Lecture 18
Subgradients
November 3, 2008
Outline
• Existence of Subgradients
• Subdifferential Properties
• Optimality Conditions
Convex Optimization 1
Convex-Constrained Non-differentiable Minimization

      minimize f(x)
      subject to x ∈ C

• Characteristics:
  • The function f : R^n → R is convex and possibly non-differentiable
  • The set C ⊆ R^n is nonempty, closed, and convex
  • The optimal value f∗ is finite
• Our focus here is non-differentiability
Definition of Subgradient and Subdifferential

Def. A vector s ∈ R^n is a subgradient of f at ˆx ∈ dom f when
      f(x) ≥ f(ˆx) + s^T(x − ˆx) for all x ∈ dom f

Def. The subdifferential of f at ˆx ∈ dom f is the set of all subgradients s of f at ˆx
• The subdifferential of f at ˆx is denoted by ∂f(ˆx)
• When f is differentiable at ˆx, we have ∂f(ˆx) = {∇f(ˆx)} (the subdifferential is a singleton)
• Examples
  For f(x) = |x|:
      ∂f(x) = {sign(x)} for x ≠ 0,  [−1, 1] for x = 0
  For the function
      f(x) = x² + 2|x| − 3 for |x| > 1,  0 for |x| ≤ 1
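As a quick sanity check (my own numerical sketch, not part of the slides), the subgradient inequality can be tested pointwise for f(x) = |x| at ˆx = 0, where any s ∈ [−1, 1] should work and any slope outside that interval should fail:

```python
# Numerical check of the subgradient inequality f(x) >= f(xhat) + s*(x - xhat)
# for f(x) = |x| at xhat = 0, where the subdifferential is [-1, 1].
import numpy as np

f = lambda x: abs(x)
xhat = 0.0
xs = np.linspace(-5.0, 5.0, 1001)

for s in (-1.0, -0.5, 0.0, 0.5, 1.0):          # candidates inside [-1, 1]
    assert all(f(x) >= f(xhat) + s * (x - xhat) for x in xs)

# a slope outside [-1, 1] violates the inequality somewhere
assert any(f(x) < f(xhat) + 1.5 * (x - xhat) for x in xs)
```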
Subgradients and Epigraph
• Let s be a subgradient of f at ˆx:
      f(x) ≥ f(ˆx) + s^T(x − ˆx) for all x ∈ dom f
• The subgradient inequality is equivalent to
      −s^T ˆx + f(ˆx) ≤ −s^T x + f(x) for all x ∈ dom f
• Let f(x) > −∞ for all x ∈ R^n. Then
      epi f = {(x, w) | f(x) ≤ w, x ∈ R^n}
  Thus, −s^T ˆx + f(ˆx) ≤ −s^T x + w for all (x, w) ∈ epi f, equivalent to
      (−s, 1)^T (ˆx, f(ˆx)) ≤ (−s, 1)^T (x, w) for all (x, w) ∈ epi f
  Therefore, the hyperplane
      H = {(x, γ) ∈ R^{n+1} | (−s, 1)^T (x, γ) = (−s, 1)^T (ˆx, f(ˆx))}
  supports epi f at the vector (ˆx, f(ˆx))
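The supporting-hyperplane claim can likewise be probed numerically; a minimal sketch, assuming f(x) = |x|, ˆx = 0, and the subgradient s = 0.5:

```python
# The vector (-s, 1) should define a hyperplane in R^2 supporting epi f at
# (xhat, f(xhat)): (-s, 1)^T (x, w) >= (-s, 1)^T (xhat, f(xhat)) on epi f.
import numpy as np

f = lambda x: abs(x)
xhat, s = 0.0, 0.5                              # s = 0.5 lies in [-1, 1] = ∂f(0)
normal = np.array([-s, 1.0])
offset = normal @ np.array([xhat, f(xhat)])     # value at the support point

rng = np.random.default_rng(0)
for _ in range(1000):                           # random points (x, w) in epi f
    x = rng.uniform(-5.0, 5.0)
    w = f(x) + rng.uniform(0.0, 5.0)
    assert normal @ np.array([x, w]) >= offset - 1e-12
```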
Subdifferential Set Properties
Theorem 1 The subdifferential set ∂f(ˆx) is convex and closed
Proof H7.

Theorem 2 (Existence) Let f be convex with a nonempty dom f. Then:
(a) For x ∈ relint(dom f), we have ∂f(x) ≠ ∅.
(b) ∂f(x) is nonempty and bounded if and only if x ∈ int(dom f).

Implications
• The subdifferential ∂f(ˆx) is a nonempty, compact, convex set for every ˆx in the interior of dom f.
• When dom f = R^n, ∂f(x) is a nonempty, compact, convex set for all x
Partial Proof: If ˆx ∈ int(dom f), then ∂f(ˆx) is nonempty and bounded.
• ∂f(ˆx) Nonempty.
  Let ˆx be in the interior of dom f. The vector (ˆx, f(ˆx)) does not belong to the interior of epi f. The epigraph epi f is convex, and by the Supporting Hyperplane Theorem there is a vector (d, β) ∈ R^{n+1}, (d, β) ≠ 0, such that
      d^T ˆx + βf(ˆx) ≤ d^T x + βw for all (x, w) ∈ epi f
  We have epi f = {(x, w) | f(x) ≤ w, x ∈ dom f}. Hence,
      d^T ˆx + βf(ˆx) ≤ d^T x + βw for all x ∈ dom f, f(x) ≤ w
  We must have β ≥ 0 (let w → ∞). We cannot have β = 0 (since ˆx ∈ int(dom f), that would imply d = 0).
  Dividing by β, we see that −d/β is a subgradient of f at ˆx
• ∂f(ˆx) Bounded.
  By the subgradient inequality, we have
      f(x) ≥ f(ˆx) + s^T(x − ˆx) for all x ∈ dom f
  Suppose that the subdifferential ∂f(ˆx) is unbounded. Let s_k be a sequence of subgradients in ∂f(ˆx) with ‖s_k‖ → ∞.
  Since ˆx lies in the interior of the domain, there exists a δ > 0 such that ˆx + δy ∈ dom f for any y with ‖y‖ ≤ 1. Letting x = ˆx + δ s_k/‖s_k‖ for any k, we have
      f(ˆx + δ s_k/‖s_k‖) ≥ f(ˆx) + δ‖s_k‖ for all k
  As k → ∞, we have f(ˆx + δ s_k/‖s_k‖) − f(ˆx) → ∞.
  However, this relation contradicts the continuity of f at ˆx. [Recall, a convex function is continuous over the interior of its domain.]
Example Consider f(x) = −√x with dom f = {x | x ≥ 0}. We have ∂f(0) = ∅. Note that 0 is not in the interior of the domain of f
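The emptiness of ∂f(0) here can be seen from the one-sided difference quotients, which are unbounded below; a small illustration (my example values):

```python
# For f(x) = -sqrt(x) on dom f = [0, inf), the quotient (f(h) - f(0)) / h
# equals -1/sqrt(h), which tends to -infinity as h -> 0+; no finite slope s
# can satisfy f(h) >= f(0) + s*h for all h > 0, so ∂f(0) is empty.
import math

f = lambda x: -math.sqrt(x)
quotients = [(f(h) - f(0.0)) / h for h in (1.0, 1e-2, 1e-4, 1e-8)]
assert all(q2 < q1 for q1, q2 in zip(quotients, quotients[1:]))  # strictly decreasing
assert quotients[-1] <= -1e3                                     # already below -1000
```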
Boundedness of the Subdifferential Sets
Theorem 2 Let f : R^n → R be convex and let X be a bounded set. Then, the set
      ∪_{x∈X} ∂f(x)
is bounded.
Proof
Assume that there is an unbounded sequence of subgradients s_k, i.e.,
      lim_{k→∞} ‖s_k‖ = ∞,
where s_k ∈ ∂f(x_k) for some x_k ∈ X. The sequence {x_k} is bounded, so it has a convergent subsequence, say {x_k}_K converging to some x ∈ R^n. Consider d_k = s_k/‖s_k‖ for k ∈ K. This is a bounded sequence, and it has a convergent subsequence, say {d_k}_{K′} with K′ ⊆ K. Let d be the limit of {d_k}_{K′}.
Since s_k ∈ ∂f(x_k), we have for each k,
      f(x_k + d_k) ≥ f(x_k) + s_k^T d_k = f(x_k) + ‖s_k‖.
By letting k → ∞ with k ∈ K′, we see that
      lim sup_{k→∞, k∈K′} [f(x_k + d_k) − f(x_k)] ≥ lim sup_{k→∞, k∈K′} ‖s_k‖.
By continuity of f,
      lim sup_{k→∞, k∈K′} [f(x_k + d_k) − f(x_k)] = f(x + d) − f(x),
hence finite, implying that ‖s_k‖ is bounded. This is a contradiction ({s_k} was assumed to be unbounded).
Continuity of the Subdifferential
Theorem 3 Let f : R^n → R be convex and let {x_k} converge to some x ∈ R^n. Let s_k ∈ ∂f(x_k) for all k. Then, the sequence {s_k} is bounded and each of its limit points is a subgradient of f at x.
Proof H7.
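A small numeric illustration of Theorem 3 (my own toy example; the actual proof is the H7 exercise):

```python
# For f(x) = |x|, take x_k = 1/k -> 0 with s_k = 1 in ∂f(x_k) = {1}.
# The limit point s = 1 must lie in ∂f(0) = [-1, 1], i.e. satisfy the
# subgradient inequality |x| >= |0| + 1 * (x - 0) for all x.
xks = [1.0 / k for k in range(1, 101)]        # x_k -> 0
sks = [1.0 for _ in xks]                      # s_k in ∂f(x_k) for x_k > 0
s_limit = sks[-1]                             # every limit point equals 1
test_points = [i / 10.0 for i in range(-30, 31)]
assert all(abs(x) >= s_limit * x for x in test_points)
```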
Subdifferential and Directional Derivatives
Definition The directional derivative f′(x; d) of f at x along direction d is the following limiting value
      f′(x; d) = lim_{α↓0} [f(x + αd) − f(x)]/α.
• When f is convex, the ratio [f(x + αd) − f(x)]/α is a nondecreasing function of α > 0, and as α decreases to zero, the ratio converges to some value or decreases to −∞. (HW)
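The nondecreasing-ratio claim can be checked on a simple smooth convex function (my example: f(x) = x², x = 1, d = 1, where the quotient works out to exactly 2 + α):

```python
# For convex f, the ratio (f(x + a*d) - f(x)) / a is nondecreasing in a > 0;
# here f(x) = x**2 at x = 1 with d = 1 gives the quotient 2 + a exactly,
# which decreases to f'(x; d) = 2 as a decreases to zero.
f = lambda x: x ** 2
x, d = 1.0, 1.0
alphas = [2.0, 1.0, 0.5, 0.25, 0.125, 1e-3]     # decreasing step sizes
quots = [(f(x + a * d) - f(x)) / a for a in alphas]
assert all(q1 > q2 for q1, q2 in zip(quots, quots[1:]))  # quotient shrinks with a
assert abs(quots[-1] - 2.0) < 1e-2                       # approaches f'(x; d) = 2
```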
Theorem 4 Let x ∈ int(dom f). Then, the directional derivative f′(x; d) is finite for all d ∈ R^n. In particular, we have
      f′(x; d) = max_{s∈∂f(x)} s^T d.
Proof
When x ∈ int(dom f), the subdifferential ∂f(x) is nonempty and compact. Using the subgradient defining relation, we can see that f′(x; d) ≥ s^T d for all s ∈ ∂f(x). Therefore,
      f′(x; d) ≥ max_{s∈∂f(x)} s^T d.
To show that equality actually holds, we rely on the Separating Hyperplane Theorem. Define
      C₁ = {(z, w) | z ∈ dom f, f(z) < w},
      C₂ = {(y, v) | y = x + αd, v = f(x) + αf′(x; d), α ≥ 0}.
These sets are nonempty, convex, and disjoint (HW). By the Separating Hyperplane Theorem, there exists a nonzero vector (a, β) ∈ R^{n+1} such that
      a^T(x + αd) + β(f(x) + αf′(x; d)) ≤ a^T z + βw,     (1)
for all α ≥ 0, z ∈ dom f, and f(z) < w. We must have β ≥ 0 (why?). We cannot have β = 0 (why?).
Thus, β > 0, and we can divide the relation in (1) by β to obtain, with ã = a/β,
      ã^T(x + αd) + f(x) + αf′(x; d) ≤ ã^T z + w,     (2)
for all α ≥ 0, z ∈ dom f, and f(z) < w. Choosing α = 0 and letting w ↓ f(z), we see
      ã^T x + f(x) ≤ ã^T z + f(z),
implying that f(x) − ã^T(z − x) ≤ f(z) for all z ∈ dom f. Therefore −ã ∈ ∂f(x).
Letting z = x, w ↓ f(z), and α = 1 in (2), we obtain
      ã^T(x + d) + f(x) + f′(x; d) ≤ ã^T x + f(x),
implying f′(x; d) ≤ −ã^T d. In view of f′(x; d) ≥ max_{s∈∂f(x)} s^T d, it follows that
      f′(x; d) = max_{s∈∂f(x)} s^T d,
where the maximum is attained at the “constructed” subgradient −ã.
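Theorem 4 can be sanity-checked numerically; a sketch assuming f(x) = ‖x‖₁, whose subdifferential is a product of {sign(x_i)} factors (for x_i ≠ 0) and [−1, 1] factors (for x_i = 0):

```python
# For f(x) = ||x||_1 at x = (1, 0) with d = (1, -2):
# max_{s in ∂f(x)} s^T d = sign(1)*1 + |-2| = 3, and the one-sided
# difference quotient (f(x + a*d) - f(x)) / a should agree for small a.
import numpy as np

f = lambda v: float(np.abs(v).sum())
x = np.array([1.0, 0.0])
d = np.array([1.0, -2.0])

# maximize s^T d over ∂f(x): take sign(x_i)*d_i where x_i != 0, |d_i| where x_i == 0
max_formula = sum(np.sign(xi) * di if xi != 0 else abs(di) for xi, di in zip(x, d))

alpha = 1e-6
quotient = (f(x + alpha * d) - f(x)) / alpha
assert abs(quotient - max_formula) < 1e-6
```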
Optimality Conditions: Unconstrained Case
Unconstrained optimization
      minimize f(x)
Assumption
• The function f is convex (possibly non-differentiable) and proper
  [f proper means f(x) > −∞ for all x and dom f ≠ ∅]
Theorem Under this assumption, a vector x∗ minimizes f over R^n if and only if
      0 ∈ ∂f(x∗)
• The result is a generalization of ∇f(x∗) = 0
• Proof x∗ is optimal if and only if f(x) ≥ f(x∗) for all x, or equivalently
      f(x) ≥ f(x∗) + 0^T(x − x∗) for all x ∈ R^n
  Thus, x∗ is optimal if and only if 0 ∈ ∂f(x∗)
Examples
• The function f(x) = |x|
      ∂f(x) = {sign(x)} for x ≠ 0,  [−1, 1] for x = 0
  The minimum is at x∗ = 0, and evidently 0 ∈ ∂f(0)
• The function f(x) = ‖x‖
      ∂f(x) = {x/‖x‖} for x ≠ 0,  {s | ‖s‖ ≤ 1} for x = 0
  Again, the minimum is at x∗ = 0 and 0 ∈ ∂f(0)
• The function f(x) = max{x² + 2x − 3, x² − 2x − 3, 0} (note all three branches agree at x = ±1)
      f(x) = x² − 2x − 3 for x < −1,  0 for x ∈ [−1, 1],  x² + 2x − 3 for x > 1
      ∂f(x) = {2x − 2} for x < −1,  [−4, 0] for x = −1,  {0} for x ∈ (−1, 1),  [0, 4] for x = 1,  {2x + 2} for x > 1
• The optimal set is X∗ = [−1, 1]
• For every x∗ ∈ X∗, we have 0 ∈ ∂f(x∗)
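The condition 0 ∈ ∂f(x∗) can be verified over the whole optimal set; a sketch (taking the constant branch of the max to be 0, so that all three branches agree at x = ±1):

```python
# With f(x) = max(x**2 + 2*x - 3, x**2 - 2*x - 3, 0), every x* in [-1, 1]
# attains f(x*) = 0, and 0 in ∂f(x*) is exactly f(x) >= f(x*) for all x.
import numpy as np

f = lambda x: max(x**2 + 2*x - 3, x**2 - 2*x - 3, 0.0)
xs = np.linspace(-5.0, 5.0, 1001)
for x_star in (-1.0, -0.5, 0.0, 0.5, 1.0):      # sample points of X* = [-1, 1]
    assert f(x_star) == 0.0
    assert all(f(x) >= f(x_star) for x in xs)
```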
Optimality Conditions: Constrained Case
Constrained optimization
      minimize f(x)
      subject to x ∈ C
Assumption
• The function f is convex (non-differentiable) and proper
• The set C is nonempty, closed, and convex
Theorem Under this assumption, a vector x∗ ∈ C minimizes f over the set C if and only if there exists a subgradient d ∈ ∂f(x∗) such that
      d^T(x − x∗) ≥ 0 for all x ∈ C
• The result is a generalization of ∇f(x∗)^T(x − x∗) ≥ 0 for all x ∈ C
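A minimal sketch of the constrained condition, assuming f(x) = |x| minimized over C = [1, 2] (so x∗ = 1 and d = 1 ∈ ∂f(1)):

```python
# At the minimizer x* = 1 of |x| over C = [1, 2], the subgradient d = 1
# satisfies d*(x - x*) >= 0 for all x in C; at the non-minimizer x = 2
# (where ∂f(2) = {1}), the same inequality fails for some x in C.
import numpy as np

x_star, d = 1.0, 1.0
xs_in_C = np.linspace(1.0, 2.0, 101)
assert all(d * (x - x_star) >= 0 for x in xs_in_C)   # optimality holds at x* = 1
assert any(d * (x - 2.0) < 0 for x in xs_in_C)       # fails at x = 2
```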