
Lecture 18
Subgradients

November 3, 2008


Outline

• Existence of Subgradients
• Subdifferential Properties
• Optimality Conditions


Convex-Constrained Non-differentiable Minimization

    minimize f(x)
    subject to x ∈ C

• Characteristics:
  • The function f : R^n → R is convex and possibly non-differentiable
  • The set C ⊆ R^n is nonempty, closed, and convex
  • The optimal value f^* is finite
• Our focus here is non-differentiability


Definition of Subgradient and Subdifferential

Def. A vector s ∈ R^n is a subgradient of f at x̂ ∈ dom f when

    f(x) ≥ f(x̂) + s^T (x − x̂)   for all x ∈ dom f

Def. The subdifferential of f at x̂ ∈ dom f is the set of all subgradients s of f at x̂

• The subdifferential of f at x̂ is denoted by ∂f(x̂)
• When f is differentiable at x̂, we have ∂f(x̂) = {∇f(x̂)} (the subdifferential is a singleton)

• Examples

    f(x) = |x|:
        ∂f(x) = {sign(x)}   for x ≠ 0
        ∂f(0) = [−1, 1]

    f(x) = x^2 + 2|x| − 3 for |x| > 1,   f(x) = 0 for |x| ≤ 1
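As a quick sanity check (a sketch of mine, not part of the original slides), candidate slopes can be tested against the subgradient inequality numerically. The helper `is_subgradient`, the grid, and the slopes below are illustrative choices for f(x) = |x| at x̂ = 0.

```python
# Numerical check of the subgradient inequality f(x) >= f(xhat) + s*(x - xhat)
# for f(x) = |x| at xhat = 0; the grid and the candidate slopes are illustrative choices.
import numpy as np

f = np.abs
xhat = 0.0
xs = np.linspace(-2.0, 2.0, 401)   # sample points of dom f = R

def is_subgradient(s, tol=1e-12):
    """True if the subgradient inequality holds at xhat on the sampled grid."""
    return bool(np.all(f(xs) >= f(xhat) + s * (xs - xhat) - tol))

for s in [-1.0, -0.3, 0.0, 0.7, 1.0, 1.2]:
    print(f"s = {s:+.1f}: subgradient at 0? {is_subgradient(s)}")
# Slopes in [-1, 1] pass and s = 1.2 fails, matching ∂f(0) = [-1, 1].
```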



Subgradients and Epigraph

• Let s be a subgradient of f at x̂:

    f(x) ≥ f(x̂) + s^T (x − x̂)   for all x ∈ dom f

• The subgradient inequality is equivalent to

    −s^T x̂ + f(x̂) ≤ −s^T x + f(x)   for all x ∈ dom f

• Let f(x) > −∞ for all x ∈ R^n. Then

    epi f = {(x, w) | f(x) ≤ w, x ∈ R^n}

  Thus, −s^T x̂ + f(x̂) ≤ −s^T x + w for all (x, w) ∈ epi f, which is equivalent to

    (−s, 1)^T (x̂, f(x̂)) ≤ (−s, 1)^T (x, w)   for all (x, w) ∈ epi f

  Therefore, the hyperplane

    H = { (x, γ) ∈ R^{n+1} | (−s, 1)^T (x, γ) = (−s, 1)^T (x̂, f(x̂)) }

  supports epi f at the vector (x̂, f(x̂))
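To make the epigraph picture concrete, here is a small numerical sketch (my own illustration, with an arbitrary sampling scheme) checking that the hyperplane built from a subgradient leaves epi f on one side, using f(x) = |x|, x̂ = 0, and s = 0.5 ∈ ∂f(0).

```python
# Check that the hyperplane with normal (-s, 1) through (xhat, f(xhat)) supports epi f,
# using f(x) = |x|, xhat = 0, s = 0.5; the random sampling of epi f is an arbitrary choice.
import numpy as np

f = np.abs
xhat, s = 0.0, 0.5
normal = np.array([-s, 1.0])                      # the vector (-s, 1)
offset = normal @ np.array([xhat, f(xhat)])       # (-s, 1)^T (xhat, f(xhat))

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=10_000)
w = f(x) + rng.uniform(0.0, 3.0, size=10_000)     # points (x, w) with w >= f(x), i.e. in epi f
values = normal[0] * x + normal[1] * w            # (-s, 1)^T (x, w) for each sample
print("epi f stays on one side of H:", bool(np.all(values >= offset - 1e-12)))
```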



Subdifferential Set Properties

Theorem 1 The subdifferential set ∂f(x̂) is convex and closed.

Proof H7.

Theorem 2 (Existence) Let f be convex with a nonempty dom f. Then:

(a) For x ∈ relint(dom f), we have ∂f(x) ≠ ∅.
(b) ∂f(x) is nonempty and bounded if and only if x ∈ int(dom f).

Implications

• The subdifferential ∂f(x̂) is a nonempty, compact, convex set for every x̂ in the interior of dom f.
• When dom f = R^n, ∂f(x) is a nonempty, compact, convex set for all x ∈ R^n.



Partial Proof: If x̂ ∈ int(dom f), then ∂f(x̂) is nonempty and bounded.

• ∂f(x̂) Nonempty.
  Let x̂ be in the interior of dom f. The vector (x̂, f(x̂)) does not belong to the interior of epi f. The epigraph epi f is convex, and by the Supporting Hyperplane Theorem, there is a vector (d, β) ∈ R^{n+1}, (d, β) ≠ 0, such that

    d^T x̂ + βf(x̂) ≤ d^T x + βw   for all (x, w) ∈ epi f

  We have epi f = {(x, w) | f(x) ≤ w, x ∈ dom f}. Hence,

    d^T x̂ + βf(x̂) ≤ d^T x + βw   for all x ∈ dom f, f(x) ≤ w

  We must have β ≥ 0 (otherwise letting w → ∞ would violate the inequality). We cannot have β = 0 (it would give d^T x̂ ≤ d^T x for all x in a neighborhood of x̂, implying d = 0). Dividing by β, we see that −d/β is a subgradient of f at x̂.



• ∂f(x̂) Bounded.
  By the subgradient inequality, we have

    f(x) ≥ f(x̂) + s^T (x − x̂)   for all x ∈ dom f

  Suppose that the subdifferential ∂f(x̂) is unbounded. Let s_k be a sequence of subgradients in ∂f(x̂) with ‖s_k‖ → ∞. Since x̂ lies in the interior of the domain, there exists a δ > 0 such that x̂ + δy ∈ dom f for any y with ‖y‖ ≤ 1. Letting x = x̂ + δ s_k/‖s_k‖ for any k, we have

    f(x̂ + δ s_k/‖s_k‖) ≥ f(x̂) + δ‖s_k‖   for all k

  As k → ∞, we have f(x̂ + δ s_k/‖s_k‖) − f(x̂) → ∞. However, this relation contradicts the continuity of f at x̂. [Recall, a convex function is continuous over the interior of its domain.]

Example Consider f(x) = −√x with dom f = {x | x ≥ 0}. We have ∂f(0) = ∅. Note that 0 is not in the interior of the domain of f.
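The example can also be checked numerically. The sketch below (my own, with arbitrary test slopes and grid) shows that no slope s satisfies the subgradient inequality for f(x) = −√x at x̂ = 0: the difference quotient −1/√x tends to −∞ as x ↓ 0.

```python
# For f(x) = -sqrt(x), dom f = [0, inf), no s satisfies -sqrt(x) >= s*x for all x >= 0;
# the tested slopes and the grid below are illustrative choices.
import numpy as np

xs = np.geomspace(1e-16, 1.0, 2000)   # points of dom f approaching 0 from the right

def is_subgradient_at_zero(s):
    return bool(np.all(-np.sqrt(xs) >= s * xs))

for s in [0.0, -1.0, -10.0, -1e3, -1e6]:
    print(f"s = {s:>10}: subgradient at 0? {is_subgradient_at_zero(s)}")
# Every slope fails, since -sqrt(x)/x = -1/sqrt(x) -> -inf as x decreases to 0.
```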



Boundedness of the Subdifferential Sets

Theorem 2 Let f : R^n → R be convex and let X be a bounded set. Then, the set

    ∪_{x ∈ X} ∂f(x)

is bounded.

Proof
Assume that there is an unbounded sequence of subgradients s_k, i.e.,

    lim_{k→∞} ‖s_k‖ = ∞,

where s_k ∈ ∂f(x_k) for some x_k ∈ X. The sequence {x_k} is bounded, so it has a convergent subsequence, say {x_k}_K converging to some x ∈ R^n. Consider d_k = s_k/‖s_k‖ for k ∈ K. This is a bounded sequence, and it has a convergent subsequence, say {d_k}_{K′} with K′ ⊆ K. Let d be the limit of {d_k}_{K′}.

Since s_k ∈ ∂f(x_k), we have for each k,

    f(x_k + d_k) ≥ f(x_k) + s_k^T d_k = f(x_k) + ‖s_k‖.

By letting k → ∞ with k ∈ K′, we see that

    lim sup_{k→∞, k∈K′} [f(x_k + d_k) − f(x_k)] ≥ lim sup_{k→∞, k∈K′} ‖s_k‖.

By continuity of f,

    lim sup_{k→∞, k∈K′} [f(x_k + d_k) − f(x_k)] = f(x + d) − f(x),

hence finite, implying that ‖s_k‖ is bounded. This is a contradiction ({s_k} was assumed to be unbounded).
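A minimal numerical illustration (my own choice of example, not from the slides): for the differentiable convex function f(x) = x², the only subgradient at x is 2x, so the union of subdifferentials over X is bounded precisely because X is.

```python
# For f(x) = x^2 the subdifferential is the singleton {2x}; over a bounded X the union
# of subdifferentials is bounded, while over a huge "stand-in for unbounded" X it is not.
import numpy as np

def subgradient(x):
    return 2.0 * x   # ∂f(x) = {2x} for f(x) = x^2

X_bounded = np.linspace(-5.0, 5.0, 1001)
X_large = np.linspace(-5e6, 5e6, 1001)    # finite stand-in for an unbounded set
print("sup of |s| over bounded X:", np.max(np.abs(subgradient(X_bounded))))   # 10
print("sup of |s| over large X:  ", np.max(np.abs(subgradient(X_large))))     # 1e7
```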



Continuity of the Subdifferential

Theorem 3 Let f : R^n → R be convex and let {x_k} converge to some x ∈ R^n. Let s_k ∈ ∂f(x_k) for all k. Then, the sequence {s_k} is bounded and every one of its limit points is a subgradient of f at x.

Proof H7.
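A small numerical illustration of Theorem 3 (my own sketch, using f(x) = |x|): along a sequence x_k → 0 the subgradients s_k = sign(x_k) stay bounded, and their limit points ±1 both lie in ∂f(0) = [−1, 1].

```python
# Subgradients along x_k -> 0 for f(x) = |x|: bounded, with limit points inside ∂f(0) = [-1, 1].
import numpy as np

xk = (-0.5) ** np.arange(1, 20)   # a sequence converging to 0 from both sides
sk = np.sign(xk)                  # the unique subgradient of |x| at each x_k != 0
print("bounded by 1:", bool(np.max(np.abs(sk)) <= 1.0))
print("limit points:", sorted(set(sk)))   # [-1.0, 1.0], both in [-1, 1]
```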



Subdifferential and Directional Derivatives

Definition The directional derivative f′(x; d) of f at x along direction d is the following limiting value

    f′(x; d) = lim_{α↓0} [f(x + αd) − f(x)] / α.

• When f is convex, the ratio [f(x + αd) − f(x)] / α is a nondecreasing function of α > 0, and as α decreases to zero, the ratio converges to some value or decreases to −∞. (HW)

Theorem 4 Let x ∈ int(dom f). Then, the directional derivative f′(x; d) is finite for all d ∈ R^n. In particular, we have

    f′(x; d) = max_{s ∈ ∂f(x)} s^T d.
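As a numerical illustration of Theorem 4 (a sketch of mine; the ℓ1-norm example, the point x, and the direction d are assumptions, not the lecture's), the one-sided difference quotient is nonincreasing in α and settles at max_{s ∈ ∂f(x)} s^T d, which for f(x) = ‖x‖₁ has a simple closed form.

```python
# Difference quotients of f(x) = ||x||_1 along d, compared with max_{s in ∂f(x)} s^T d.
import numpy as np

def f(x):
    return np.sum(np.abs(x))

x = np.array([1.0, 0.0, -2.0])
d = np.array([-2.0, -1.0, 1.0])

# For the l1 norm: s_i = sign(x_i) where x_i != 0, and the maximizing choice is
# s_i = sign(d_i) where x_i = 0, giving the closed form below.
max_over_subdiff = np.sum(np.where(x != 0, np.sign(x) * d, np.abs(d)))

for alpha in [1.0, 0.5, 0.1, 0.01]:
    q = (f(x + alpha * d) - f(x)) / alpha
    print(f"alpha = {alpha:4}: difference quotient = {q:+.4f}")
print("max over subdifferential:", max_over_subdiff)   # the quotients decrease to this value
```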



Proof
When x ∈ int(dom f), the subdifferential ∂f(x) is nonempty and compact. Using the subgradient defining relation, we can see that f′(x; d) ≥ s^T d for all s ∈ ∂f(x). Therefore,

    f′(x; d) ≥ max_{s ∈ ∂f(x)} s^T d.

To show that equality actually holds, we rely on the Separating Hyperplane Theorem. Define

    C_1 = {(z, w) | z ∈ dom f, f(z) < w},
    C_2 = {(y, v) | y = x + αd, v = f(x) + αf′(x; d), α ≥ 0}.

These sets are nonempty, convex, and disjoint (HW). By the Separating Hyperplane Theorem, there exists a nonzero vector (a, β) ∈ R^{n+1} such that

    a^T (x + αd) + β(f(x) + αf′(x; d)) ≤ a^T z + βw,   (1)

for all α ≥ 0, z ∈ dom f, and f(z) < w. We must have β ≥ 0 (why?). We cannot have β = 0 (why?).

Thus, β > 0, and we can divide the relation in (1) by β to obtain, with ã = a/β,

    ã^T (x + αd) + f(x) + αf′(x; d) ≤ ã^T z + w,   (2)

for all α ≥ 0, z ∈ dom f, and f(z) < w. Choosing α = 0 and letting w ↓ f(z), we see

    ã^T x + f(x) ≤ ã^T z + f(z),

implying that f(x) − ã^T (z − x) ≤ f(z) for all z ∈ dom f. Therefore −ã ∈ ∂f(x).



Letting z = x, w ↓ f(x), and α = 1 in (2), we obtain

    ã^T (x + d) + f(x) + f′(x; d) ≤ ã^T x + f(x),

implying f′(x; d) ≤ −ã^T d. In view of f′(x; d) ≥ max_{s ∈ ∂f(x)} s^T d and −ã ∈ ∂f(x), it follows that

    f′(x; d) = max_{s ∈ ∂f(x)} s^T d,

where the maximum is attained at the “constructed” subgradient −ã.



Optimality Conditions: Unconstrained Case

Unconstrained optimization

    minimize f(x)

Assumption
• The function f is convex (possibly non-differentiable) and proper
  [f proper means f(x) > −∞ for all x and dom f ≠ ∅]

Theorem Under this assumption, a vector x^* minimizes f over R^n if and only if

    0 ∈ ∂f(x^*)

• The result is a generalization of ∇f(x^*) = 0
• Proof x^* is optimal if and only if f(x) ≥ f(x^*) for all x, or equivalently

    f(x) ≥ f(x^*) + 0^T (x − x^*)   for all x ∈ R^n

  Thus, x^* is optimal if and only if 0 ∈ ∂f(x^*)
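For a concrete check (my own example: f(x) = max(x, −2x), whose minimizer is x^* = 0 with ∂f(0) = [−2, 1]), the condition 0 ∈ ∂f(x^*) can be verified directly from the definition, since s = 0 being a subgradient at x^* means exactly that f(x) ≥ f(x^*) for all x.

```python
# Locate the minimizer of f(x) = max(x, -2x) on a grid and verify 0 ∈ ∂f(x*) from the definition.
import numpy as np

f = lambda x: np.maximum(x, -2.0 * x)
xs = np.linspace(-2.0, 2.0, 4001)
xstar = xs[np.argmin(f(xs))]
print("numerical minimizer:", xstar)                 # approximately 0

# s = 0 is a subgradient at x* iff f(x) >= f(x*) + 0*(x - x*) = f(x*) for all x.
print("0 in ∂f(x*):", bool(np.all(f(xs) >= f(xstar))))
```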



Examples

• The function f(x) = |x|

    ∂f(x) = {sign(x)}   for x ≠ 0
    ∂f(0) = [−1, 1]

  The minimum is at x^* = 0, and evidently 0 ∈ ∂f(0)

• The function f(x) = ‖x‖

    ∂f(x) = {x/‖x‖}        for x ≠ 0
    ∂f(0) = {s | ‖s‖ ≤ 1}

  Again, the minimum is at x^* = 0 and 0 ∈ ∂f(0)
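A quick numerical spot-check of the second example (my own sketch; the dimension, samples, and test vectors are arbitrary): any s with ‖s‖ ≤ 1 satisfies the subgradient inequality at 0 by the Cauchy–Schwarz inequality, while a vector with ‖s‖ > 1 does not.

```python
# Spot-check that ∂f(0) = {s : ||s|| <= 1} for f(x) = ||x||_2, via f(x) >= s^T x.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 3))            # sample points x in R^3

def is_subgradient_at_zero(s, tol=1e-12):
    # f(x) >= f(0) + s^T x  reduces to  ||x|| >= s^T x
    return bool(np.all(np.linalg.norm(X, axis=1) >= X @ s - tol))

s_inside = np.array([0.6, 0.0, -0.8])     # ||s|| = 1.0
s_outside = np.array([0.9, 0.0, -0.9])    # ||s|| ≈ 1.27
print("||s|| <= 1:", is_subgradient_at_zero(s_inside))    # True (Cauchy–Schwarz)
print("||s|| > 1 :", is_subgradient_at_zero(s_outside))   # almost surely False
```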



• The function f(x) = max{x^2 + 2x − 3, x^2 − 2x − 3, 0}

    f(x) = x^2 − 2x − 3   for x < −1
    f(x) = 0              for x ∈ [−1, 1]
    f(x) = x^2 + 2x − 3   for x > 1

    ∂f(x) = {2x − 2}   for x < −1
    ∂f(−1) = [−4, 0]
    ∂f(x) = {0}        for x ∈ (−1, 1)
    ∂f(1) = [0, 4]
    ∂f(x) = {2x + 2}   for x > 1

• The optimal set is X^* = [−1, 1]
• For every x^* ∈ X^*, we have 0 ∈ ∂f(x^*)
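This example can be verified numerically; the sketch below (grid and test points are my own choices) uses the fact that 0 ∈ ∂f(x^*) holds exactly when x^* is a global minimizer.

```python
# Check 0 ∈ ∂f(x*) for f(x) = max{x^2 + 2x - 3, x^2 - 2x - 3, 0} at several test points.
import numpy as np

def f(x):
    x = np.asarray(x, dtype=float)
    return np.maximum.reduce([x**2 + 2*x - 3, x**2 - 2*x - 3, np.zeros_like(x)])

xs = np.linspace(-4.0, 4.0, 8001)
fx = f(xs)

for xstar in [-1.0, -0.3, 0.0, 0.7, 1.0, 1.5]:
    # 0 ∈ ∂f(x*) iff f(x) >= f(x*) for all x, i.e. x* is a global minimizer.
    print(f"x* = {xstar:+.1f}: 0 in ∂f(x*)? {bool(np.all(fx >= f(xstar)))}")
# Points in [-1, 1] pass; x* = 1.5 fails since it is not a minimizer.
```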



Optimality Conditions: Constrained Case

Constrained optimization

    minimize f(x)
    subject to x ∈ C

Assumption
• The function f is convex (possibly non-differentiable) and proper
• The set C is nonempty, closed, and convex

Theorem Under this assumption, a vector x^* ∈ C minimizes f over the set C if and only if there exists a subgradient d ∈ ∂f(x^*) such that

    d^T (x − x^*) ≥ 0   for all x ∈ C

• The result is a generalization of ∇f(x^*)^T (x − x^*) ≥ 0 for all x ∈ C
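As a final hedged sketch (the problem min |x| over C = [1, 3] is my own illustration), the condition can be checked on a grid over C: at x^* = 1 the subgradient d = 1 ∈ ∂f(1) satisfies d(x − x^*) ≥ 0 on C, while at the non-optimal point x = 2 it does not.

```python
# Check d^T (x - x*) >= 0 over C for the problem: minimize |x| subject to x in C = [1, 3].
import numpy as np

C = np.linspace(1.0, 3.0, 2001)   # grid over the feasible set C = [1, 3]

def condition_holds(xstar, d, tol=1e-12):
    """True if d*(x - xstar) >= 0 for all sampled x in C."""
    return bool(np.all(d * (C - xstar) >= -tol))

print("x* = 1, d = 1:", condition_holds(1.0, 1.0))   # True: x* = 1 minimizes |x| over C
print("x  = 2, d = 1:", condition_holds(2.0, 1.0))   # False: x = 2 is not optimal
```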

