Lecture 18
Subgradients
November 3, 2008
Outline
• Existence of Subgradients
• Subdifferential Properties
• Optimality Conditions
Convex Optimization 1
Convex-Constrained Non-differentiable Minimization

      minimize f(x)
      subject to x ∈ C

• Characteristics:
  • The function f : R^n → R is convex and possibly non-differentiable
  • The set C ⊆ R^n is nonempty, closed, and convex
  • The optimal value f∗ is finite
• Our focus here is non-differentiability
Definition of Subgradient and Subdifferential

Def. A vector s ∈ R^n is a subgradient of f at ˆx ∈ dom f when
      f(x) ≥ f(ˆx) + s^T(x − ˆx) for all x ∈ dom f

Def. The subdifferential of f at ˆx ∈ dom f is the set of all subgradients s of f at ˆx
• The subdifferential of f at ˆx is denoted by ∂f(ˆx)
• When f is differentiable at ˆx, we have ∂f(ˆx) = {∇f(ˆx)} (the subdifferential is a singleton)
• Examples
  For f(x) = |x|:
      ∂f(x) = {sign(x)} for x ≠ 0,  [−1, 1] for x = 0
  For the function
      f(x) = x² + 2|x| − 3 for |x| > 1,  0 for |x| ≤ 1
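As a quick sanity check (my own numerical sketch, not part of the slides), the subgradient inequality can be tested pointwise for f(x) = |x| at ˆx = 0, where any s ∈ [−1, 1] should work and any slope outside that interval should fail:

```python
# Numerical check of the subgradient inequality f(x) >= f(xhat) + s*(x - xhat)
# for f(x) = |x| at xhat = 0, where the subdifferential is [-1, 1].
import numpy as np

f = lambda x: abs(x)
xhat = 0.0
xs = np.linspace(-5.0, 5.0, 1001)

for s in (-1.0, -0.5, 0.0, 0.5, 1.0):          # candidates inside [-1, 1]
    assert all(f(x) >= f(xhat) + s * (x - xhat) for x in xs)

# a slope outside [-1, 1] violates the inequality somewhere
assert any(f(x) < f(xhat) + 1.5 * (x - xhat) for x in xs)
```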
Subgradients and Epigraph
• Let s be a subgradient of f at ˆx:
      f(x) ≥ f(ˆx) + s^T(x − ˆx) for all x ∈ dom f
• The subgradient inequality is equivalent to
      −s^T ˆx + f(ˆx) ≤ −s^T x + f(x) for all x ∈ dom f
• Let f(x) > −∞ for all x ∈ R^n. Then
      epi f = {(x, w) | f(x) ≤ w, x ∈ R^n}
  Thus, −s^T ˆx + f(ˆx) ≤ −s^T x + w for all (x, w) ∈ epi f, equivalent to
      (−s, 1)^T (ˆx, f(ˆx)) ≤ (−s, 1)^T (x, w) for all (x, w) ∈ epi f
  Therefore, the hyperplane
      H = {(x, γ) ∈ R^{n+1} | (−s, 1)^T (x, γ) = (−s, 1)^T (ˆx, f(ˆx))}
  supports epi f at the vector (ˆx, f(ˆx))
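The supporting-hyperplane claim can likewise be probed numerically; a minimal sketch, assuming f(x) = |x|, ˆx = 0, and the subgradient s = 0.5:

```python
# The vector (-s, 1) should define a hyperplane in R^2 supporting epi f at
# (xhat, f(xhat)): (-s, 1)^T (x, w) >= (-s, 1)^T (xhat, f(xhat)) on epi f.
import numpy as np

f = lambda x: abs(x)
xhat, s = 0.0, 0.5                              # s = 0.5 lies in [-1, 1] = ∂f(0)
normal = np.array([-s, 1.0])
offset = normal @ np.array([xhat, f(xhat)])     # value at the support point

rng = np.random.default_rng(0)
for _ in range(1000):                           # random points (x, w) in epi f
    x = rng.uniform(-5.0, 5.0)
    w = f(x) + rng.uniform(0.0, 5.0)
    assert normal @ np.array([x, w]) >= offset - 1e-12
```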
Subdifferential Set Properties
Theorem 1 The subdifferential set ∂f(ˆx) is convex and closed
Proof H7.

Theorem 2 (Existence) Let f be convex with a nonempty dom f. Then:
(a) For x ∈ relint(dom f), we have ∂f(x) ≠ ∅.
(b) ∂f(x) is nonempty and bounded if and only if x ∈ int(dom f).

Implications
• The subdifferential ∂f(ˆx) is a nonempty, compact, convex set for every ˆx in the interior of dom f.
• When dom f = R^n, ∂f(x) is a nonempty, compact, convex set for all x
Partial Proof: If ˆx ∈ int(dom f), then ∂f(ˆx) is nonempty and bounded.
• ∂f(ˆx) Nonempty.
  Let ˆx be in the interior of dom f. The vector (ˆx, f(ˆx)) does not belong to the interior of epi f. The epigraph epi f is convex, and by the Supporting Hyperplane Theorem there is a vector (d, β) ∈ R^{n+1}, (d, β) ≠ 0, such that
      d^T ˆx + βf(ˆx) ≤ d^T x + βw for all (x, w) ∈ epi f
  We have epi f = {(x, w) | f(x) ≤ w, x ∈ dom f}. Hence,
      d^T ˆx + βf(ˆx) ≤ d^T x + βw for all x ∈ dom f, f(x) ≤ w
  We must have β ≥ 0 (let w → ∞). We cannot have β = 0 (since ˆx ∈ int(dom f), that would imply d = 0).
  Dividing by β, we see that −d/β is a subgradient of f at ˆx
• ∂f(ˆx) Bounded.
  By the subgradient inequality, we have
      f(x) ≥ f(ˆx) + s^T(x − ˆx) for all x ∈ dom f
  Suppose that the subdifferential ∂f(ˆx) is unbounded. Let s_k be a sequence of subgradients in ∂f(ˆx) with ‖s_k‖ → ∞.
  Since ˆx lies in the interior of the domain, there exists a δ > 0 such that ˆx + δy ∈ dom f for any y with ‖y‖ ≤ 1. Letting x = ˆx + δ s_k/‖s_k‖ for any k, we have
      f(ˆx + δ s_k/‖s_k‖) ≥ f(ˆx) + δ‖s_k‖ for all k
  As k → ∞, we have f(ˆx + δ s_k/‖s_k‖) − f(ˆx) → ∞.
  However, this relation contradicts the continuity of f at ˆx. [Recall, a convex function is continuous over the interior of its domain.]
Example Consider f(x) = −√x with dom f = {x | x ≥ 0}. We have ∂f(0) = ∅. Note that 0 is not in the interior of the domain of f
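The emptiness of ∂f(0) here can be seen from the one-sided difference quotients, which are unbounded below; a small illustration (my example values):

```python
# For f(x) = -sqrt(x) on dom f = [0, inf), the quotient (f(h) - f(0)) / h
# equals -1/sqrt(h), which tends to -infinity as h -> 0+; no finite slope s
# can satisfy f(h) >= f(0) + s*h for all h > 0, so ∂f(0) is empty.
import math

f = lambda x: -math.sqrt(x)
quotients = [(f(h) - f(0.0)) / h for h in (1.0, 1e-2, 1e-4, 1e-8)]
assert all(q2 < q1 for q1, q2 in zip(quotients, quotients[1:]))  # strictly decreasing
assert quotients[-1] <= -1e3                                     # already below -1000
```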
Boundedness of the Subdifferential Sets
Theorem 2 Let f : R^n → R be convex and let X be a bounded set. Then, the set
      ∪_{x∈X} ∂f(x)
is bounded.
Proof
Assume that there is an unbounded sequence of subgradients s_k, i.e.,
      lim_{k→∞} ‖s_k‖ = ∞,
where s_k ∈ ∂f(x_k) for some x_k ∈ X. The sequence {x_k} is bounded, so it has a convergent subsequence, say {x_k}_K converging to some x ∈ R^n. Consider d_k = s_k/‖s_k‖ for k ∈ K. This is a bounded sequence, and it has a convergent subsequence, say {d_k}_{K′} with K′ ⊆ K. Let d be the limit of {d_k}_{K′}.
Since s_k ∈ ∂f(x_k), we have for each k,
      f(x_k + d_k) ≥ f(x_k) + s_k^T d_k = f(x_k) + ‖s_k‖.
By letting k → ∞ with k ∈ K′, we see that
      lim sup_{k→∞, k∈K′} [f(x_k + d_k) − f(x_k)] ≥ lim sup_{k→∞, k∈K′} ‖s_k‖.
By continuity of f,
      lim sup_{k→∞, k∈K′} [f(x_k + d_k) − f(x_k)] = f(x + d) − f(x),
hence finite, implying that ‖s_k‖ is bounded. This is a contradiction ({s_k} was assumed to be unbounded).
Continuity of the Subdifferential
Theorem 3 Let f : R^n → R be convex and let {x_k} converge to some x ∈ R^n. Let s_k ∈ ∂f(x_k) for all k. Then, the sequence {s_k} is bounded and each of its limit points is a subgradient of f at x.
Proof H7.
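A small numeric illustration of Theorem 3 (my own toy example; the actual proof is the H7 exercise):

```python
# For f(x) = |x|, take x_k = 1/k -> 0 with s_k = 1 in ∂f(x_k) = {1}.
# The limit point s = 1 must lie in ∂f(0) = [-1, 1], i.e. satisfy the
# subgradient inequality |x| >= |0| + 1 * (x - 0) for all x.
xks = [1.0 / k for k in range(1, 101)]        # x_k -> 0
sks = [1.0 for _ in xks]                      # s_k in ∂f(x_k) for x_k > 0
s_limit = sks[-1]                             # every limit point equals 1
test_points = [i / 10.0 for i in range(-30, 31)]
assert all(abs(x) >= s_limit * x for x in test_points)
```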
Subdifferential and Directional Derivatives
Definition The directional derivative f′(x; d) of f at x along direction d is the following limiting value
      f′(x; d) = lim_{α↓0} [f(x + αd) − f(x)]/α.
• When f is convex, the ratio [f(x + αd) − f(x)]/α is a nondecreasing function of α > 0, and as α decreases to zero, the ratio converges to some value or decreases to −∞. (HW)
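The nondecreasing-ratio claim can be checked on a simple smooth convex function (my example: f(x) = x², x = 1, d = 1, where the quotient works out to exactly 2 + α):

```python
# For convex f, the ratio (f(x + a*d) - f(x)) / a is nondecreasing in a > 0;
# here f(x) = x**2 at x = 1 with d = 1 gives the quotient 2 + a exactly,
# which decreases to f'(x; d) = 2 as a decreases to zero.
f = lambda x: x ** 2
x, d = 1.0, 1.0
alphas = [2.0, 1.0, 0.5, 0.25, 0.125, 1e-3]     # decreasing step sizes
quots = [(f(x + a * d) - f(x)) / a for a in alphas]
assert all(q1 > q2 for q1, q2 in zip(quots, quots[1:]))  # quotient shrinks with a
assert abs(quots[-1] - 2.0) < 1e-2                       # approaches f'(x; d) = 2
```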
Theorem 4 Let x ∈ int(dom f). Then, the directional derivative f′(x; d) is finite for all d ∈ R^n. In particular, we have
      f′(x; d) = max_{s∈∂f(x)} s^T d.
Proof
When x ∈ int(dom f), the subdifferential ∂f(x) is nonempty and compact. Using the subgradient defining relation, we can see that f′(x; d) ≥ s^T d for all s ∈ ∂f(x). Therefore,
      f′(x; d) ≥ max_{s∈∂f(x)} s^T d.
To show that equality actually holds, we rely on the Separating Hyperplane Theorem. Define
      C₁ = {(z, w) | z ∈ dom f, f(z) < w},
      C₂ = {(y, v) | y = x + αd, v = f(x) + αf′(x; d), α ≥ 0}.
These sets are nonempty, convex, and disjoint (HW). By the Separating Hyperplane Theorem, there exists a nonzero vector (a, β) ∈ R^{n+1} such that
      a^T(x + αd) + β(f(x) + αf′(x; d)) ≤ a^T z + βw,     (1)
for all α ≥ 0, z ∈ dom f, and f(z) < w. We must have β ≥ 0 (why?). We cannot have β = 0 (why?).
Thus, β > 0, and we can divide the relation in (1) by β to obtain, with ã = a/β,
      ã^T(x + αd) + f(x) + αf′(x; d) ≤ ã^T z + w,     (2)
for all α ≥ 0, z ∈ dom f, and f(z) < w. Choosing α = 0 and letting w ↓ f(z), we see
      ã^T x + f(x) ≤ ã^T z + f(z),
implying that f(x) − ã^T(z − x) ≤ f(z) for all z ∈ dom f. Therefore −ã ∈ ∂f(x).
Letting z = x, w ↓ f(z), and α = 1 in (2), we obtain
      ã^T(x + d) + f(x) + f′(x; d) ≤ ã^T x + f(x),
implying f′(x; d) ≤ −ã^T d. In view of f′(x; d) ≥ max_{s∈∂f(x)} s^T d, it follows that
      f′(x; d) = max_{s∈∂f(x)} s^T d,
where the maximum is attained at the “constructed” subgradient −ã.
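Theorem 4 can be sanity-checked numerically; a sketch assuming f(x) = ‖x‖₁, whose subdifferential is a product of {sign(x_i)} factors (for x_i ≠ 0) and [−1, 1] factors (for x_i = 0):

```python
# For f(x) = ||x||_1 at x = (1, 0) with d = (1, -2):
# max_{s in ∂f(x)} s^T d = sign(1)*1 + |-2| = 3, and the one-sided
# difference quotient (f(x + a*d) - f(x)) / a should agree for small a.
import numpy as np

f = lambda v: float(np.abs(v).sum())
x = np.array([1.0, 0.0])
d = np.array([1.0, -2.0])

# maximize s^T d over ∂f(x): take sign(x_i)*d_i where x_i != 0, |d_i| where x_i == 0
max_formula = sum(np.sign(xi) * di if xi != 0 else abs(di) for xi, di in zip(x, d))

alpha = 1e-6
quotient = (f(x + alpha * d) - f(x)) / alpha
assert abs(quotient - max_formula) < 1e-6
```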
Optimality Conditions: Unconstrained Case
Unconstrained optimization
      minimize f(x)
Assumption
• The function f is convex (possibly non-differentiable) and proper
  [f proper means f(x) > −∞ for all x and dom f ≠ ∅]
Theorem Under this assumption, a vector x∗ minimizes f over R^n if and only if
      0 ∈ ∂f(x∗)
• The result is a generalization of ∇f(x∗) = 0
• Proof x∗ is optimal if and only if f(x) ≥ f(x∗) for all x, or equivalently
      f(x) ≥ f(x∗) + 0^T(x − x∗) for all x ∈ R^n
  Thus, x∗ is optimal if and only if 0 ∈ ∂f(x∗)
Examples
• The function f(x) = |x|
      ∂f(x) = {sign(x)} for x ≠ 0,  [−1, 1] for x = 0
  The minimum is at x∗ = 0, and evidently 0 ∈ ∂f(0)
• The function f(x) = ‖x‖
      ∂f(x) = {x/‖x‖} for x ≠ 0,  {s | ‖s‖ ≤ 1} for x = 0
  Again, the minimum is at x∗ = 0 and 0 ∈ ∂f(0)
• The function f(x) = max{x² + 2x − 3, x² − 2x − 3, 0} (note all three branches agree at x = ±1)
      f(x) = x² − 2x − 3 for x < −1,  0 for x ∈ [−1, 1],  x² + 2x − 3 for x > 1
      ∂f(x) = {2x − 2} for x < −1,  [−4, 0] for x = −1,  {0} for x ∈ (−1, 1),  [0, 4] for x = 1,  {2x + 2} for x > 1
• The optimal set is X∗ = [−1, 1]
• For every x∗ ∈ X∗, we have 0 ∈ ∂f(x∗)
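The condition 0 ∈ ∂f(x∗) can be verified over the whole optimal set; a sketch (taking the constant branch of the max to be 0, so that all three branches agree at x = ±1):

```python
# With f(x) = max(x**2 + 2*x - 3, x**2 - 2*x - 3, 0), every x* in [-1, 1]
# attains f(x*) = 0, and 0 in ∂f(x*) is exactly f(x) >= f(x*) for all x.
import numpy as np

f = lambda x: max(x**2 + 2*x - 3, x**2 - 2*x - 3, 0.0)
xs = np.linspace(-5.0, 5.0, 1001)
for x_star in (-1.0, -0.5, 0.0, 0.5, 1.0):      # sample points of X* = [-1, 1]
    assert f(x_star) == 0.0
    assert all(f(x) >= f(x_star) for x in xs)
```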
Optimality Conditions: Constrained Case
Constrained optimization
      minimize f(x)
      subject to x ∈ C
Assumption
• The function f is convex (non-differentiable) and proper
• The set C is nonempty, closed, and convex
Theorem Under this assumption, a vector x∗ ∈ C minimizes f over the set C if and only if there exists a subgradient d ∈ ∂f(x∗) such that
      d^T(x − x∗) ≥ 0 for all x ∈ C
• The result is a generalization of ∇f(x∗)^T(x − x∗) ≥ 0 for all x ∈ C
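A minimal sketch of the constrained condition, assuming f(x) = |x| minimized over C = [1, 2] (so x∗ = 1 and d = 1 ∈ ∂f(1)):

```python
# At the minimizer x* = 1 of |x| over C = [1, 2], the subgradient d = 1
# satisfies d*(x - x*) >= 0 for all x in C; at the non-minimizer x = 2
# (where ∂f(2) = {1}), the same inequality fails for some x in C.
import numpy as np

x_star, d = 1.0, 1.0
xs_in_C = np.linspace(1.0, 2.0, 101)
assert all(d * (x - x_star) >= 0 for x in xs_in_C)   # optimality holds at x* = 1
assert any(d * (x - 2.0) < 0 for x in xs_in_C)       # fails at x = 2
```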