11.01.2015 Views

Solutions to Selected Exercises in Chapter Two - CARMA

Solutions to Selected Exercises in Chapter Two - CARMA

Solutions to Selected Exercises in Chapter Two - CARMA

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>Solutions</strong> <strong>to</strong> <strong>Selected</strong> <strong>Exercises</strong> <strong>in</strong> <strong>Chapter</strong> <strong>Two</strong><br />

<strong>Exercises</strong> from Section 2.1<br />

2.1.1. See [410, Theorem 4.43].<br />

2.1.2. For x > x 0 , x ∈ I, the three-slope <strong>in</strong>equality (2.1.1) implies<br />

f(x) − f(x 0 )<br />

x − x 0<br />

≥ f ′ +(x 0 ).<br />

For x < x 0 , x ∈ I, the three-slope <strong>in</strong>equality (2.1.1) implies<br />

f(x) − f(x 0 )<br />

x − x 0<br />

≤ f ′ −(x 0 ).<br />

Hence f(x) − f(x 0 ) ≥ λ(x − x 0 ) for all x ∈ I whenever f ′ −(x 0 ) ≤ λ ≤ f ′ +(x 0 ). Consider<strong>in</strong>g x > x 0 ,<br />

the three-slope <strong>in</strong>equality (2.1.1) implies f ′ +(x 0 ) = sup{λ : λ(x−x 0 ) ≤ f(x)−f(x 0 ) for all x ∈ I},<br />

and by the above, f ′ (x 0 ) is the maximum such λ.<br />

2.1.3. Suppose T : E → F is a l<strong>in</strong>ear mapp<strong>in</strong>g, and let A := y 0 + T where y 0 ∈ F . Then for any<br />

λ ∈ R, we have<br />

A(λx + (1 − λ)y) = T (λx + (1 − λ)y) + y 0<br />

= λ(T x) + (1 − λ)(T y) + y 0<br />

= λ(T x + y 0 ) + (1 − λ)(T y + y 0 )<br />

= λAx + (1 − λ)Ay<br />

For the reverse, suppose A : E → F is an aff<strong>in</strong>e mapp<strong>in</strong>g. Let y 0 = A(0), T (x) := Ax − y 0 and<br />

k ∈ R. Then<br />

Furthermore,<br />

T (kx) = A(kx) − y 0 = A(kx + (1 − k)0) − y 0<br />

= kA(x) + (1 − k)A(0) − y 0 = kA(x) − ky 0 = kT (x).<br />

T (x + y) = A(x + y) − y 0 = A<br />

( 1<br />

2 (2x) + 1 2 (2y) )<br />

− y 0<br />

= 1 2 A(2x) + 1 2 A(2y) − y 0 = 1 2 [A(2x) − y 0] + 1 2 [A(2y) − y 0]<br />

= 1 2 T (2x) + 1 T (2y) = T (x) + T (y).<br />

2<br />

2.1.4. Suppose f is positively homogeneous and subadditive, then for x, y ∈ X and α, β ∈ R +<br />

we have<br />

f(αx + βy) ≤ f(αx) + f(βy) = αf(x) + βf(y).<br />

This shows f is subl<strong>in</strong>ear. Conversely, suppose f is subl<strong>in</strong>ear. Then clearly it is subadditive by<br />

choos<strong>in</strong>g α = 1 and β = 1 <strong>in</strong> the previous <strong>in</strong>equality. Now f(0) ≤ 0 · f(x) + 0 · f(−x) = 0 and<br />

1


f(0) ≤ f(0) + f(0) implies f(0) ≥ 0, and so f(0) = 0. When λ = 0, 0 = f(0) = f(λx) = λf(x),<br />

now for λ > 0,<br />

f(λx) = f(λx + 0) ≤ λf(x) + 1 · f(0) = λf(x) = λf(λ −1 λx) ≤ λ · λ −1 f(λx)<br />

where the second <strong>in</strong>equality follows from the first.<br />

2.1.5. Suppose f is a convex function. Consider the set F = {x ∈ E : f(x) ≤ α}. In the case F<br />

is not empty, let u, v ∈ F . Then for 0 ≤ λ ≤ 1,<br />

f(λu + (1 − λ)v) ≤ λf(u) + (1 − λ)f(v) ≤ λα + (1 − λ)α = α.<br />

Thus λu + (1 − λ)v ∈ F whenever 0 ≤ λ ≤ 1, and so f is quasi-convex. Analogously, suppose<br />

u, v ∈ dom f. Then we can choose α ∈ R so that max{f(u), f(v)} ≤ α. It follows from the<br />

previous reason<strong>in</strong>g that λu + (1 − λ)v ∈ dom f whenever 0 ≤ λ ≤ 1 and so dom f is convex.<br />

2.1.6. Observe that a po<strong>in</strong>twise suprema of convex functions is convex, because the epigraph<br />

is an <strong>in</strong>tersection of convex sets which is convex. Notice that the function need not be proper.<br />

Moreover, the epigraph will be closed if all of the functions are closed.<br />

(a) Suppose f : X → [−∞, +∞] is convex. If f ≡ +∞, then epi f = ∅ is convex. Otherwise, let<br />

(x, t), (y, s) ∈ epi f. Then for 0 ≤ λ ≤ 1 we have<br />

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) ≤ λt + (1 − λ)s.<br />

Therefore, λ(x, t) + (1 − λ)(y, s) ∈ epi f as desired.<br />

Conversely, suppose epi f is convex, if epi f = ∅ then f ≡ +∞ is convex. Otherwise, suppose<br />

f(x), f(y) < +∞. Then (x, t), (y, s)) ∈ epi f where f(x) < t and f(y) < s. Thus for 0 ≤ λ ≤ 1,<br />

we have λ(x, t)) + (1 − λ)(y, s)) ∈ epi f. This implies<br />

f(λx + (1 − λ)y) ≤ λt + (1 − λ)s,<br />

for all t > f(x) and s > f(y). It follows that f is convex. (Hence this confirms that we can def<strong>in</strong>e<br />

improper convex functions via their epigraphs as remarked earlier)<br />

(c) Let x, y ∈ E, and 0 ≤ λ ≤ 1. Then<br />

m(g(λx + (1 − λ)y) ≤ m(λ(g(x)) + (1 − λ)g(y)) ≤ λm(g(x)) + (1 − λ)m(g(y)),<br />

as desired.<br />

For a nice alternative geometric approach and explanation <strong>to</strong> (d), the reader is encouraged <strong>to</strong><br />

consult [255, Proposition 2.2.1]. For the converse, the convexity of g follows by restrict<strong>in</strong>g <strong>to</strong> the<br />

case t = 1.<br />

2.1.7. It is clear that if x 0 ∈ <strong>in</strong>t C, then x 0 ∈ core C. For the converse, suppose C is convex<br />

and x 0 ∈ core C. Then there exist δ i > 0 so that x 0 + te i ∈ C for all |t| ≤ δ i and i = 1, 2 . . . , n<br />

where {e i } n i=1 is the usual basis of Rn . Now let δ = m<strong>in</strong>{δ 1 , δ 2 , . . . , δ n }. Because C is convex, it<br />

follows that x 0 + h ∈ C whenever h = a 1 e 1 + a 2 e 2 + . . . a n e n where |a i | ≤ δ/n for i = 1, 2, . . . , n.<br />

Consequenctly, x 0 ∈ <strong>in</strong>t C ≠ ∅. A conventional example of a nonconvex set F ⊂ R 2 with<br />

(0, 0) ∈ core F \ <strong>in</strong>t F is F = {(x, y) ∈ R 2 : |y| ≥ x 2 or y = 0}; see also Figure 2.4 for another<br />

example.<br />

2


2.1.8. Let x ∈ E be a po<strong>in</strong>t of cont<strong>in</strong>uity of the convex function f. The max formula (2.1.19)<br />

ensures that ∂f(x) ≠ ∅. Moreover, f has Lipschitz constant K on some neighborhood U of x.<br />

Because 〈v, y − x〉 ≤ f(y) − f(x) for all y ∈ U and v ∈ ∂f(x), it follows that ‖v‖ ≤ K. Thus,<br />

∂f(x) ⊂ KB E . Now suppose (v n ) ⊂ ∂f(x) and v n → v. Then<br />

〈v, y − x〉 = lim<br />

n→∞ 〈v n, y − x〉 ≤ f(y) − f(x), for all y ∈ E.<br />

Therefore, ∂f(x) is closed. F<strong>in</strong>ally, suppose u, v ∈ ∂f(x) and 0 ≤ λ ≤ 1. Writ<strong>in</strong>g w = λu + (1 −<br />

λ)v, for each y ∈ E we have<br />

〈w, y − x〉 = λ〈u, y − x〉 + (1 − λ)〈v, y − x〉<br />

≤<br />

λ[f(y) − f(x)] + (1 − λ)[f(y) − f(x)] = f(y) − f(x).<br />

This shows w ∈ ∂f(x). Therefore, ∂f(x) is a nonempty, closed, convex and bounded subset of<br />

E.<br />

2.1.9. Let x ∈ dom f and d ∈ E. If x + d ∉ dom f, f(x + d) − f(x) = ∞ so the <strong>in</strong>equality is<br />

clear. In the case x + d ∈ dom f, the three slope <strong>in</strong>equality implies for 0 < t < 1,<br />

f ′ (x; d) ≤<br />

f(x + td) − f(x)<br />

t<br />

≤<br />

f(x + d) − f(x)<br />

.<br />

1<br />

as desired. Thus f(x + d) ≥ f(x) for all d ∈ E if and only if f ′ (x; d) ≥ 0 for all d ∈ E. Also,<br />

0 ∈ ∂f(x) if and only if 〈0, d〉 ≤ f(x + d) − f(x) for all d ∈ E if and only if f(x + d) ≥ f(x) for<br />

all d ∈ E.<br />

2.1.10. The measurability of φ ◦ f follows because φ is cont<strong>in</strong>uous (see [384]). Let a := ∫ fdµ<br />

and note the bounds on f ensure a ∈ I. Apply Corollary 2.1.3 <strong>to</strong> obta<strong>in</strong> λ ∈ R such that<br />

φ(t) ≥ φ(a) + λ(t − a) for all t ∈ I. Then φ(f(t)) − λ(f(t) − a) − φ(a) ≥ 0. Integrat<strong>in</strong>g we obta<strong>in</strong><br />

∫<br />

[φ(f(t)) − λ(f(t) − a) − φ(a)]dµ ≥ 0.<br />

Ω<br />

Because µ(Ω) = 1 and because of the choice of a, this implies<br />

∫<br />

(∫ ) (∫ )<br />

φ(f(t))dµ ≥ λ(a − a) + φ fdµ = φ fdµ ,<br />

Ω<br />

Ω<br />

Ω<br />

as desired.<br />

2.1.11. (a) For arbitrary a, b ∈ R and 0 < λ < 1, let f be def<strong>in</strong>ed by f(x) := a for 0 ≤ x ≤ λ<br />

and f(x) := b for λ < x ≤ 1. Then<br />

(∫ 1<br />

)<br />

g f(x) dx = g(λa + (1 − λ)b)<br />

while ∫ 1<br />

g(f) dx = λg(a) + (1 − λ)g(b).<br />

The orig<strong>in</strong>al assumption<br />

0<br />

0<br />

(∫ 1<br />

)<br />

g f(x)dx ≤<br />

0<br />

3<br />

∫ 1<br />

0<br />

g(f)dx


then implies g(λa + (1 − λ)b) ≤ λg(a) + (1 − λ)g(b). This establishes the convexity of g.<br />

(b) Apply<strong>in</strong>g Jensen’s <strong>in</strong>equality when φ := exp(·) we have<br />

[∫ ] ∫<br />

exp f dµ ≤ exp(f) dµ.<br />

Ω<br />

Ω<br />

When Ω is the f<strong>in</strong>ite set {s 1 , s 2 , . . . , s n } with µ({s i }) = 1/n and f(s i ) = y i for i = 1, 2, . . . , n, this<br />

becomes<br />

[ ]<br />

1<br />

exp<br />

n (y 1 + . . . + y n ) ≤ 1 n (ey 1<br />

+ . . . + e yn ), y i ∈ R.<br />

When x i = e y i<br />

this becomes<br />

(x 1 x 2 · · · x n ) 1/n ≤ 1 n (x 1 + x 2 + . . . + x n ),<br />

that is, the arithmetic-geometric mean <strong>in</strong>equality.<br />

2.1.12. Suppose f is not identically −∞. Then fix x 0 where f is real-valued. We may assume<br />

f(x 0 ) = 0. Suppose f(x 0 + h) > 0 for some h ∈ E. Then for t > 1,<br />

0 < f(x 0 + h) ≤ (1 − t −1 )f(x 0 ) + t −1 f(x 0 + th) = t −1 f(x 0 + th).<br />

Thus lim t→∞ f(x 0 + th) = lim t→∞ tf(x 0 + h) = ∞ which is a contradiction. F<strong>in</strong>ally, <strong>in</strong> the case<br />

f(x 0 + h) < 0 for some h ∈ E, we would deduce f(x 0 − h) > 0, and so this, <strong>to</strong>o, is impossible.<br />

2.1.13. (a) For k ≥ 0, observe that x ∈ λC if and only if kx ∈ kλC. Therefore γ C is positively<br />

homogeneous (the def<strong>in</strong>ition of γ C := <strong>in</strong>f{λ ≥ 0 : x ∈ λC} ensures γ C (0) = 0 even when 0 ∉ C).<br />

Suppose C ⊂ E is convex. Let x, y ∈ E. In the case γ C (x) = ∞ or γ C (y) = ∞ then it is clear<br />

γ C (λx + (1 − λ)y) ≤ λγ C (x) + (1 − λ)γ C (y) when 0 < λ < 1.<br />

Suppose γ C (x), γ C (y) are f<strong>in</strong>ite and ɛ > 0. Choose α, β so that γ C (x) ≤ α < γ C (x) + ɛ,<br />

γ C (y) ≤ α < γ C (y) + ɛ and 1 α x, 1 β<br />

y ∈ C. So we choose u, v ∈ C so that x = αu and y = αv (the<br />

case x = 0 or y = 0 are f<strong>in</strong>e). Then<br />

λαu + (1 − λ)βv<br />

λα + (1 − λ)β<br />

∈ C<br />

and therefore<br />

γ C (λαu + (1 − λ)βv) ≤ λα + (1 − λ)β<br />

or <strong>in</strong> other words, γ C (λx + (1 − λ)y) ≤ λγ C (x) + (1 − λ)γ C (y) + ɛ. This shows γ C is convex when<br />

C is convex, <strong>in</strong> fact, γ C is subadditive because<br />

γ C (x + y) = γ C<br />

( 1<br />

2 (2x) + 1 2 (2y) )<br />

≤ 1 2 γ C(2x) + 1 2 γ C(2y) = γ C (x) + γ C (y).<br />

Hence γ C is subl<strong>in</strong>ear when C is convex.<br />

(b) Suppose 0 ∈ core C. Given x ∈ E, there exists t > 0 so that tx ∈ C and then x ∈ 1 t<br />

C. Thus<br />

γ C (x) ≤ 1/t. Because γ C is convex and everywhere f<strong>in</strong>ite on E, it is cont<strong>in</strong>uous everywhere by<br />

Theorem 2.1.12.<br />

4


(c) Suppose 0 ∈ core C. Then γ C is cont<strong>in</strong>uous, and therefore {x ∈ E : γ C (x) ≤ 1} is closed.<br />

Observe that γ C (x) ≤ 1 for all x ∈ C, consequently cl C ⊂ {x ∈ E : γ C (x) ≤ 1}.<br />

2.1.14. For example, let f(x, y) = − 4√ xy when x ≥ 0, and y ≥ 0. The strict convexity assertion<br />

is probably best shown by comput<strong>in</strong>g the Hessian (as <strong>in</strong>troduced <strong>in</strong> Section 2.2) of f at (x, y),<br />

which is<br />

[<br />

3<br />

H = 16 x−7/4 y 1/4 − 1<br />

16 x−3/4 y −3/4 ]<br />

− 1<br />

16 x−3/4 y −3/4 3<br />

16 x1/4 y −7/4 .<br />

Then f is strictly convex on the <strong>in</strong>terior of its doma<strong>in</strong> because this matrix is positive def<strong>in</strong>ite for<br />

all (x, y) with x > 0 and y > 0 which follows because h 11 > 0, and |H| > 0 at all such (x, y).<br />

2.1.15. Let (f i ) be a family of proper functions and def<strong>in</strong>e f : E → [−∞, +∞] by<br />

{∑<br />

f(x) := <strong>in</strong>f λi f i (x i ) : ∑ λ i = 1, λ i ≥ 0, λ i f<strong>in</strong>itely nonzero, ∑ }<br />

λ i x i = x .<br />

Let u, v ∈ dom f and and let α and β be any real numbers satisfy<strong>in</strong>g f(u) < α and f(v) < β.<br />

Now choose sums as <strong>in</strong> the def<strong>in</strong>ition above so<br />

∑ ∑<br />

αi f(u i ) < α, βi f(v i ) < β where ∑ α i u i = u, ∑ β i v i = v.<br />

Then for any 0 ≤ λ ≤ 1, we have ∑ [λα i +(1−λ)β i ] = 1 and ∑ (λα i u i +(1−λ)β i v i ) = λu+(1−λ)v.<br />

Then by the def<strong>in</strong>ition of f,<br />

f(λu + (1 − λ)v) ≤ ∑ λα i f i (u i ) + (1 − λ)β i f i (v i ) < λα + (1 − λ)β.<br />

The convexity of f follows from this. Next we show that f is the largest convex function m<strong>in</strong>oriz<strong>in</strong>g<br />

the family. Indeed suppose h is convex and h m<strong>in</strong>orizes the family (f i ). Then for any x such<br />

that x = ∑ λ i x i where ∑ λ i = 1, λ i ≥ 0, and only f<strong>in</strong>itely many of the λ i are nonzero, by the<br />

convexity of h and m<strong>in</strong>orization property we have<br />

h( ∑ λ i x i ) ≤ ∑ λ i h(x i ) ≤ ∑ λ i f(x i ).<br />

Tak<strong>in</strong>g the <strong>in</strong>fimum over all such sums we see that h ≤ f.<br />

For the example, let f : R → R be def<strong>in</strong>ed by f(t) := t 2 if t ≠ 0 and f(0) = 1. Notice that<br />

epi conv f = {(x, y) ∈ R 2 : y ≤ x 2 } and conv epi f = epi conv f \ {(0, 0)}.<br />

2.1.16. Suppose T : X → Y is an open mapp<strong>in</strong>g. Then T (U) is open where U = <strong>in</strong>t B X . Then,<br />

0 ∈ <strong>in</strong>t T (U), so we choose r > 0 so that rB Y ⊂ T (U). For any y ∈ Y , choose n > 0 so that<br />

n −1 ‖y‖ < r. Then n −1 y ∈ T (U), and so we let x ∈ X be chosen so T x = n −1 y. Then T (nx) = y<br />

and so T is on<strong>to</strong> as desired.<br />

Conversely, suppose T is on<strong>to</strong>. Let y ∈ T (U) where U is an open subset of X, and choosex ∈ U<br />

so that T x = y. Let V be an open convex set so that x ∈ V ⊂ U. Now fix h ∈ Y . Because T is<br />

on<strong>to</strong>, we fix v ∈ X so that T v = h. Because V is open, we choose δ > 0 so that x + tv ∈ V for<br />

all 0 ≤ t ≤ δ. Then T (x + tv) = y + th ∈ T (V ) for all 0 ≤ t ≤ δ. Thus y ∈ core T (V ), and so<br />

y ∈ <strong>in</strong>t T (V ) ⊂ T (U). Thus T (U) is open.<br />

2.1.19. Suppose lim <strong>in</strong>f ‖x‖→∞ f(x)/‖x‖ > β > 0. Let S = {x : f(x) ≤ α} for some α ∈ R. F<strong>in</strong>d<br />

k > 0 so that kβ > α, and f(x)/‖x‖ > β for all ‖x‖ ≥ k. If ‖x‖ ≥ k, then f(x) > kβ > α.<br />

Therefore, S ⊂ kB E .<br />

5


Clearly f : R → R def<strong>in</strong>ed by f(t) := √ |t| is not convex, and {t : f(t) ≤ α} = [−α 2 , α 2 ] for each<br />

α > 0. However,<br />

lim f(t)/|t| = lim<br />

|t|→∞ |t|→∞ |t|−1/2 = 0.<br />

2.1.22. (a) For each n ∈ N, let F n := {x ∈ S : |f k (x)| ≤ n}. Then ⋃ n F n = S because<br />

f n (x) → f(x) and f(x) ∈ R. Accord<strong>in</strong>g <strong>to</strong> the Baire category theorem, there exist N ∈ N such<br />

that F N conta<strong>in</strong>s a relatively open (<strong>in</strong> E) subset U. Then |f k (x)| ≤ N for all k ∈ N, x ∈ U.<br />

(b) By shift<strong>in</strong>g, we may assume x = 0 where x is some given po<strong>in</strong>t <strong>in</strong> <strong>in</strong>t S. Now rB E ⊂ <strong>in</strong>t S<br />

for some r > 0. By part (a), there is some B r1 (y) ⊂ rB E for which f k (x) ≤ M for all k ∈ N,<br />

x ∈ B r1 (y). Replac<strong>in</strong>g M with a larger number as necessary, we may also assume f k (−y) ≤ M for<br />

all k ∈ N. By the convexity of f n , we have that f n ≤ M on conv({−y} ∪ B r1 (y)) which conta<strong>in</strong>s<br />

r 12<br />

B E . Thus f n is uniformly bounded on r 1<br />

2<br />

B E .<br />

Now let K be a compact subset of <strong>in</strong>t S. Suppose by way of contradiction there exists ɛ > 0 and<br />

a subsequence (x nk ) ⊂ K such that<br />

(1) |f nk (x nk ) − f(x nk )| > ɛ for all n k .<br />

By pass<strong>in</strong>g <strong>to</strong> a further subsequence as necessary, we may assume x nk → ¯x where ¯x ∈ K. By<br />

the previous paragraph, we f<strong>in</strong>d r > 0, so that f nk is uniformly bounded on B r (¯x). The proof of<br />

Theorem 2.1.10 shows that (f nk ) is equi-Lipschitz on B r (¯x). Hence (f n<br />

2 k<br />

) converges uniformly <strong>to</strong><br />

f on B r (¯x). This is a contraction with (1) because (x n<br />

2 k<br />

) is eventually <strong>in</strong> B r (¯x).<br />

2<br />

2.1.23. (a) Suppose f does not have Lipschitz constant K ≥ 0 on U. Fix u, v ∈ U such that<br />

f(v) − f(u) > K‖v − u‖, and let φ ∈ ∂f(v). Then<br />

〈φ, u − v〉 ≤ f(u) − f(v) < −K‖u − v‖<br />

and so ‖φ‖ > K.<br />

Conversely, suppose f has Lipschitz constant K ≥ 0 on U. Let u ∈ U. Then ∂f(u) is not empty<br />

because f is cont<strong>in</strong>uous. Moreover, let φ ∈ ∂f(u). Then<br />

〈φ, v − u〉 ≤ f(v) − f(u) ≤ K‖v − u‖<br />

for all v ∈ U. Because u is <strong>in</strong> the <strong>in</strong>terior of U, it follows that ‖φ‖ ≤ K.<br />

The “<strong>in</strong> particular”’ part, follows from the first part because f is cont<strong>in</strong>uous on U, and therefore,<br />

locally Lipschitz on U.<br />

(b) Because dom f = E, f is cont<strong>in</strong>uous on E, and the extreme value theorem implies f is<br />

bounded on bounded subsets of E. Exercise 2.1.22 then ensures f is Lipschitz on bounded subsets<br />

of E, and then part (a) of this exercise ensures ∂f maps bounded subsets of E <strong>to</strong> bounded subsets<br />

of E.<br />

<strong>Exercises</strong> from Section 2.2<br />

2.2.1. A proof of the convexity of f us<strong>in</strong>g the Hessian is sketched <strong>in</strong> [369, pp. 27–28]. Because<br />

f is convex, we have<br />

( ) x + y<br />

f ≤ 1 2 2 f(x) + 1 2 f(y)<br />

6


which means<br />

√ (x1 ) ( )<br />

+ y<br />

− n 1 xn + y n<br />

· · ·<br />

≤ − 1 2<br />

2 2 n√ x 1 x 2 · · · x n − 1 2 n√ y 1 y 2 · · · y n .<br />

Multiply<strong>in</strong>g both sides of the previous <strong>in</strong>equality by −2 yields the result.<br />

2.2.2. (a) ⇒ (b): Let g := −1/f and φ(t) := − ln(−t) for t ∈ (−∞, 0). Then φ is convex and<br />

<strong>in</strong>creas<strong>in</strong>g, and g is convex. Therefore, ln ◦f = φ ◦ g is convex.<br />

(b) ⇒ (c): g := ln ◦f is convex, therefore, f = exp(ln ◦f) is convex s<strong>in</strong>ce exp is convex and<br />

<strong>in</strong>creas<strong>in</strong>g.<br />

2.2.4. We provide details as <strong>in</strong> [34, Lemma 3.2]. For a function f on I consider the associated<br />

Bregman distance D f def<strong>in</strong>ed by D f (x, y) := f(x) − f(y) − f ′ (y)(x − y). Let g := −1/h so that<br />

g ′ = h ′ /h 2 . (a): 1/h is concave if and only if g is convex if and only if D g is nonnegative if and<br />

only if<br />

0 ≤ −1/h(x) + 1/h(y) − (h ′ (y)/h 2 (y))(x − y) for all x, y ∈ I<br />

if and only if<br />

0 ≤ h(x)h(y) − h 2 (y) − h(x)h ′ (y)(x − y) for all x, y ∈ I.<br />

Part (b) is similar, not<strong>in</strong>g that D f ≡ 0 if and only if f is aff<strong>in</strong>e. Part (c) was shown <strong>in</strong> Exercise<br />

2.2.2. Part (d): 1/h is concave if and only if g is convex if and only if g ′′ = (h 2 h ′′ −<br />

2h(h ′ ) 2 )/h 4 ≥ 0 if and only if hh ′′ ≥ 2(h ′ ) 2 .<br />

2.2.5. First, g is a real-valued convex function on [0, 1] and so Theorem 2.1.2(d) ensures that g<br />

is differentiable except at possibly countably many t ∈ [0, 1]. Then Theorem 2.2.1 implies that<br />

at po<strong>in</strong>ts of differentiability ∇g(t) = {∂g(t)}. Now let t ∈ (0, 1) be a po<strong>in</strong>t of differentiability of<br />

of g. Observe that<br />

〈φ t , sh〉 ≤ f(x + (s + t)h) − f(x + th) = g(t + s) − g(t),<br />

Hence 〈φ t , h〉 ∈ ∂g(t) and we conclude ∇g(t) = 〈φ t , h〉.<br />

2.2.8. Suppose f is Fréchet differentiable at x 0 . Let φ = f ′ (x 0 ). Given ɛ > 0 we choose δ > 0 so<br />

that δ‖φ‖ < ɛ and<br />

|f(x 0 + h) − f(x 0 ) − φ(h)| ≤ ɛ 2 ‖h‖<br />

whenever 0 < ‖h‖ < δ. Now suppose ‖x − x 0 ‖ < δ. Then x = x 0 + h where ‖h‖ < δ and the<br />

previous <strong>in</strong>equality then implies<br />

Thus, f is cont<strong>in</strong>uous at x 0 .<br />

|f(x) − f(x 0 )| ≤ |f(x 0 + h) − f(x 0 )| ≤ (‖φ‖ + ɛ/2)‖h‖ < ɛ.<br />

2.2.9. Suppose f has Lipschitz constant K <strong>in</strong> a neighborhood U of x 0 and Gâteaux differentiable<br />

at x 0 with Gâteaux derivative φ. Then ‖φ‖ ≤ K. Suppose f is not Fréchet differentiable at x 0 .<br />

Then there exists ɛ > 0 and t n → 0 + , h n ∈ S E such that<br />

|f(x 0 + t n h n ) − f(x 0 ) − φ(t n h n )| ≥ t n ɛ<br />

7


Because S X is compact, we may replace (h n ) above with one of its convergent subsequences, so<br />

we suppose h n → h ∈ S X . When ‖h n − h‖ < ɛ/3K. For n sufficiently large we have<br />

|f(x 0 + t n h) − f(x 0 ) − φ(t n h)| ≥ |f(x 0 + t n h n ) − f(x 0 ) − φ(t n h n )| − 2Kt n ‖h n − h‖<br />

≥ t n ɛ − 2K(ɛ/3K) ≥ t n ɛ/3<br />

which contradicts the Gâteaux differentiability of f. For a slightly different proof of this, see the<br />

last part of the proof of Theorem 2.5.4.<br />

2.2.13. Suppose f is convex on the <strong>in</strong>verval I, and suppose J := [a, b] is a compact sub<strong>in</strong>terval<br />

of I. Then for m aff<strong>in</strong>e and 0 ≤ λ ≤ 1, we have<br />

(f + m)(λa + (1 − λ)b) ≤ λ(f + m)(a) + (1 − λ)(f + m)(b)<br />

≤<br />

max{(f + m)(a), (f + m)(b)}.<br />

Thus the supremum of f + m is atta<strong>in</strong>ed at one of the endpo<strong>in</strong>ts a or b.<br />

Conversely, suppose a, b ∈ I. Now choose an aff<strong>in</strong>e function m such that (f +m)(a) = (f +m)(b).<br />

Because (f + m) atta<strong>in</strong>s its max on [a, b] at an endpo<strong>in</strong>t, we know it atta<strong>in</strong>s its max on [a, b] at<br />

both a and b. Then, for 0 ≤ λ ≤ 1, we have<br />

(f + m)(λa + (1 − λ)b) = f(λa + (1 − λ)b) + λm(a) + (1 − λ)m(b)<br />

≤<br />

max(f + m) = λ(f(a) + m(a)) + (1 − λ)(f(b) + m(b)).<br />

[a,b]<br />

Consequent, f(λa + (1 − λ)b) ≤ λf(a) + (1 − λ)f(b) and so f is convex as desired.<br />

2.2.16. Suppose a < b and M is an aff<strong>in</strong>e function through (a, f(a)) and (b, f(b)), and let m be<br />

an aff<strong>in</strong>e m<strong>in</strong>orant of f pass<strong>in</strong>g through ((a + b)/2, f((a + b)/2) (us<strong>in</strong>g the max formula (2.1.19)).<br />

Then m ≤ f ≤ M, and thus<br />

( ) a + b<br />

f<br />

2<br />

≤ 1<br />

b − a<br />

∫ b<br />

a<br />

f(t) dt ≤<br />

f(a) + f(b)<br />

2<br />

s<strong>in</strong>ce these quantities are the averages of m, f and M respectively on [a, b].<br />

2.2.20. First, for (x, y) ∈ dom f \ {(0, 0)}, the Hessian of f at (x, y) is<br />

[<br />

6xy<br />

H =<br />

−2 −6x 2 y −3 ]<br />

−6x 2 y −3 6x 3 y −4 .<br />

Then H is positive semidef<strong>in</strong>ite for such (x, y) because |H| = 0 and 6xy −2 ≥ 0. Also, for<br />

(x, y) ∈ dom f, we have<br />

λf(x, y) + (1 − λ)f(0, 0) = λ x3<br />

= f(λ(x, y))<br />

y2 and <strong>to</strong>gether we deduce f is convex. It is also closed because: (i) its doma<strong>in</strong> is closed (ii) x 3 /y 2<br />

is cont<strong>in</strong>uous when y ≠ 0, (iii) lim <strong>in</strong>f (x,y)→(0,0) f(x, y) ≥ f(0, 0). However, f is not cont<strong>in</strong>uous at<br />

(0, 0) even when consider<strong>in</strong>g that the underly<strong>in</strong>g <strong>to</strong>pological space is the doma<strong>in</strong>. This is because<br />

lim (x,x 2 )→(0,0) f(x, x 2 ) = +∞.<br />

8


A further observation is that this type of example cannot occur on R. Indeed, if f : [a, b] → R<br />

is convex and lower semicont<strong>in</strong>uous then f is cont<strong>in</strong>uous as a function on [a, b]. Indeed, it<br />

is cont<strong>in</strong>uous on (a, b) and further lim <strong>in</strong>f x→b − f(x) ≥ f(b) by lower semicont<strong>in</strong>uity while the<br />

convexity of f implies lim sup x→b − f(x) ≤ f(b). Similarly, f is cont<strong>in</strong>uous from the right at a.<br />

2.2.21. Suppose x, y ∈ U, and x ∗ ∈ ∂f(x), y ∗ ∈ ∂f(y) (which are not empty by the max<br />

formula (2.1.19)). Then by the subdifferential <strong>in</strong>equality<br />

〈y ∗ − x ∗ , y − x〉 = y ∗ (y − x) + x ∗ (x − y)<br />

≥ f(y) − f(x) + f(x) − f(y) = 0.<br />

Hence the subdifferential is a mono<strong>to</strong>ne mapp<strong>in</strong>g. The ‘<strong>in</strong> particular’ statement follows because<br />

∂f(x) = {∇f(x)} when f is differentiable at x.<br />

2.2.22. (a) Suppose not, then there exists x n → x 0 and ɛ > 0 so that φ n ∈ ∂f(x n ), but<br />

φ n ∉ ∂f(x 0 ) + ɛB E . Use the local Lipschitz property of f (Theorem 2.1.12) <strong>to</strong> deduce that<br />

(‖φ n ‖) n is bounded. Then use compactness <strong>to</strong> f<strong>in</strong>d convergent subsequence, say φ nk → φ. Now<br />

fix y ∈ E. Then<br />

φ(y) − φ(x 0 ) = φ(y) − φ(x nk ) + φ(x nk ) − φ(x 0 )<br />

= lim φ n k<br />

(y − x nk ) + φ nk (x nk − x 0 )<br />

k→∞<br />

≤ lim f(y) − f(x n k<br />

) + φ nk (x nk − x 0 ) = f(y) − f(x).<br />

k→∞<br />

Therefore φ ∈ ∂f(x 0 ) which contradicts that φ nk → φ.<br />

(b) This follows from (a) and the fact ∂f(x 0 ) = {f ′ (x 0 )} (Theorem 2.2.1).<br />

(c) Suppose not, then there is a subsequence (n k ) and ɛ > 0 such that φ nk ∈ ∂f nk (w nk ), w nk ∈ W<br />

but φ nk ∉ ∇f(w nk ) + ɛB E . Because f n → f uniformly on bounded sets, it follows that (f n ) is<br />

uniformly bounded on bounded sets, and thus (f n ) is eventually uniformly Lipschitz on bounded<br />

sets. Hence by pass<strong>in</strong>g <strong>to</strong> a further subsequence, if necessary, we may assume w nk → w 0 , and<br />

φ nk → φ for some w 0 , φ ∈ E. Now let y ∈ E, and observe<br />

φ(y) − φ(x 0 ) = φ(y) − φ(w nk ) + φ(w nk ) − φ(w 0 )<br />

= lim φ n k<br />

(y − w nk ) + φ nk (w nk − x 0 )<br />

k→∞<br />

≤ lim f n k<br />

(y) − f nk (w nk ) + φ nk (w nk − w 0 ) = f(y) − f(w 0 ),<br />

k→∞<br />

where the last equality follows by the uniform convergence of f nk <strong>to</strong> f on bounded sets. Thus<br />

φ ∈ ∂f(w 0 ), that is φ = ∇f(w 0 ). By (b), ∇f(w k ) → ∇f(w 0 ) = φ which yields a contradiction<br />

because ‖φ nk − ∇f(w nk )‖ > ɛ.<br />

(d) For example, let f n := max{|·|−1/n, 0} and f := |·| on R. Then ∂f n (1/n) ⊄ ∂f(1/n)+ 1 2 B R<br />

for any n ∈ N. Indeed, ∂f n (1/n) = [0, 1] while ∂f(1/n) + 1 2 B R = [1/2, 3/2]. For the rema<strong>in</strong><strong>in</strong>g<br />

part, suppose no such N exists. As <strong>in</strong> (c), choose φ nk ∈ ∂f nk (w nk ) but φ nk ∉ ∂f(w) + ɛB E for<br />

‖w − w nk ‖ < δ and as <strong>in</strong> (c), w nk → w 0 and φ nk → φ for some w 0 ∈ E and φ ∈ E. Aga<strong>in</strong>, as <strong>in</strong><br />

(c), one can show that φ ∈ ∂f(w 0 ). However, for ‖w nk − w 0 ‖ < δ, we have φ nk ∉ ∂f(w 0 ) + ɛB E<br />

which is a contradiction.<br />

9


<strong>Exercises</strong> from Section 2.3<br />

2.3.2. Us<strong>in</strong>g calculus, one can show that for f := | · | p /p on R one has f ∗ = | · | q /q. The<br />

Fenchel–Young <strong>in</strong>equality (2.3.1) then shows f(x) + f ∗ (y) ≥ xy, that is,<br />

as desired.<br />

|x| p /p + |y| q /q ≥ xy for all real x and y,<br />

2.3.3. Let ‖f‖ p = α, ‖g‖ q = β where α, β > 0 (if either α = 0 or β = 0, then fg = 0 a.e. and so<br />

the <strong>in</strong>equality is trivially true). Now we <strong>in</strong>tegrate both sides of the Young <strong>in</strong>equality:<br />

∫<br />

X<br />

f(x) g(x)<br />

α β<br />

∫X<br />

dµ ≤<br />

Multiply<strong>in</strong>g both sides by αβ yields the result.<br />

1 |f(x)| p<br />

p α p + 1 |g(x)| q<br />

q β q dµ = 1 p + 1 q = 1.<br />

2.3.4. (a) To show that x ↦→ ∑ N<br />

k=1 |x k| p is convex observe that g := | · | p is a convex function R,<br />

and then<br />

N∑<br />

x ↦→ g(P k (x)) where P k (x) = x k<br />

k=1<br />

is a sum of convex functions s<strong>in</strong>ce g ◦ P k is a convex function for each k as it is a composition of a<br />

convex function with a l<strong>in</strong>ear function. Now use the gauge construction as suggested <strong>in</strong> the h<strong>in</strong>t.<br />

(b) Alternatively, one may apply the discrete form of Hölder’s <strong>in</strong>equality of Exercise 2.3.3 as<br />

follows:<br />

N∑<br />

|x k + y k | p =<br />

k=1<br />

≤<br />

=<br />

N∑<br />

|x k ||x k + y k | p−1 + |y k ||x k + y k | p−1<br />

k=1<br />

(<br />

∑ N<br />

) 1 ( p N )<br />

∑<br />

1 q<br />

|x k | p |x k + y k | (p−1)q +<br />

k=1<br />

k=1<br />

k=1<br />

(<br />

∑ N<br />

) 1 ( p N )<br />

∑<br />

1 q<br />

|y k | p |x k + y k | (p−1)q<br />

k=1<br />

k=1<br />

(<br />

∑ N<br />

) ⎡ 1 (<br />

q N<br />

)<br />

∑<br />

1 (<br />

p N<br />

) ⎤<br />

∑<br />

1 p<br />

|x k + y k | p ⎣ |x k | p + |y k | p ⎦<br />

( N<br />

)<br />

∑<br />

1 q<br />

where we used (p−1)q = p. Now divide both sides by |x k + y k | p us<strong>in</strong>g that 1−1/q = 1/p<br />

<strong>to</strong> obta<strong>in</strong><br />

as desired.<br />

2.3.12. Part (a).<br />

k=1<br />

k=1<br />

k=1<br />

k=1<br />

(<br />

∑ N<br />

) 1 (<br />

p N<br />

)<br />

∑<br />

1 (<br />

p N<br />

)<br />

∑<br />

1 p<br />

|x k + y k | p ≤ |x k | p + |y k | p<br />

k=1<br />

k=1<br />

10


(i) Accord<strong>in</strong>g <strong>to</strong> the Fenchel–Young <strong>in</strong>equality (2.3.1) we have<br />

f(x) + g(Ax) ≥ 〈A ∗ φ, x〉 − f ∗ (Aφ) + 〈−φ, Ax〉 − g ∗ (−φ)<br />

= 〈φ, Ax〉 − f ∗ (A ∗ φ) − 〈φ, Ax〉 − g ∗ (−φ)<br />

= −f ∗ (A ∗ φ) − g ∗ (−φ).<br />

Tak<strong>in</strong>g the <strong>in</strong>fimum of the left-hand side over x ∈ E, and then tak<strong>in</strong>g the supremum of the<br />

right-hand side over φ ∈ Y establishes the weak duality <strong>in</strong>equality p ≥ d.<br />

(ii) (This part is for the subdifferential sum rule).<br />

Λ ∈ ∂g(Ax). Then for any v ∈ E we have<br />

Let x ∈ E and suppose φ ∈ ∂f(x) and<br />

〈φ + A ∗ Λ, v − x〉 = 〈φ, v − x〉 + 〈Λ, A(v − x)〉<br />

≤<br />

f(v) − f(x) + g(Av) − g(Ax).<br />

Thus φ + A ∗ Λ ∈ ∂(f + g ◦ A)(x) from which the <strong>in</strong>clusion follows.<br />

(iii) Fix u ∈ Y . Then f(x) + g(Ax + u) < ∞ for some x ∈ X if and only if there exist x ∈ dom f<br />

such that Ax+u ∈ dom g if and only if u ∈ dom g−A dom f. Thus dom h = dom g−A dom f.<br />

To check the convexity of h, suppose u, v ∈ dom h and let α, β be any numbers such that<br />

h(u) < α and h(v) < β. Choose x 1 ∈ E such that f(x 1 ) + g(Ax 1 + u) < α and x 2 ∈ E such<br />

that f(x 2 ) + g(Ax 2 + v) < β. Then for any 0 ≤ λ ≤ 1, we have<br />

h(λu + (1 − λ)v) = <strong>in</strong>f {f(x) + g(Ax + λu + (1 − λ)v)}<br />

x∈E<br />

≤ f(λx 1 + (1 − λ)x 2 ) + g(A(λx 1 + (1 − λ)x 2 ) + λu + (1 − λ)v)<br />

≤ λf(x 1 ) + (1 − λ)f(x 2 ) + λg(Ax 1 + u) + (1 − λ)g(Ax 2 + v)<br />

< λα + (1 − λ)β.<br />

Thus, h(λu + (1 − λ)v) ≤ λh(u) + (1 − λ)h(v) as desired.<br />

(iv) Let x 0 ∈ dom f be such that Ax 0 ∈ cont g. Let y 0 = Ax 0 . Because g is cont<strong>in</strong>uous at y 0 ,<br />

this implies y 0 + rB Y ⊂ dom g for some r > 0. Therefore,<br />

Part (b).<br />

rB Y = (y 0 + rB Y ) − Ax 0 ⊂ dom g − A dom f<br />

which implies 0 ∈ core(dom g − A dom f) as desired.<br />

(i) First, <strong>in</strong>clusion was completed <strong>in</strong> (a)(ii) above. Conversely suppose φ ∈ ∂(f +g ◦A)(¯x). Apply<strong>in</strong>g<br />

the Fenchel–Young <strong>in</strong>equality (2.3.1), and then apply<strong>in</strong>g the Fenchel duality theorem,<br />

we obta<strong>in</strong><br />

f(¯x) + g(A¯x) − 〈φ, ¯x〉 = <strong>in</strong>f {f(x) + g(Ax) − 〈φ, x〉} = <strong>in</strong>f {(f − φ)(x) + g(Ax)}<br />

x∈E x∈E<br />

= −(f − φ) ∗ (A ∗ ¯φ) − g ∗ (− ¯φ),<br />

where ¯φ ∈ Y is a po<strong>in</strong>t where d <strong>in</strong> the Fenchel duality theorem is atta<strong>in</strong>ed. Therefore,<br />

(f − φ)(¯x) − 〈A ∗ ¯φ, ¯x〉 + g(A¯x) − 〈− ¯φ, A¯x) = −(f − φ) ∗ (A ∗ ¯φ) − g ∗ (− ¯φ)<br />

11


and by Fenchel–Young <strong>in</strong>equaltiy (2.3.1), A ∗ ¯φ ∈ ∂(f − φ)(¯x) and − ¯φ ∈ ∂g(A¯x). The first<br />

<strong>in</strong>clusion implies A ∗ ¯φ + φ ∈ ∂f(¯x), and us<strong>in</strong>g the second <strong>in</strong>clusion we check<br />

〈−A ∗ ¯φ, u − ¯x〉 = 〈− ¯φ, A(u − ¯x)〉 ≤ g(Au) − g(A¯x) for all u ∈ E;<br />

thus − ¯φ ∈ ∂g(A¯x), consequently equality holds <strong>in</strong> the sum formula.<br />

(ii) The previous part has proved the ‘only if’ assertion, and we can essentially reverse our steps<br />

<strong>to</strong> deduce the ‘if’ assertion.<br />

2.3.13. Suppose f : E → R has Lipschitz constant k. Suppose φ ∈ E and ‖φ‖ > k. Choose<br />

x 0 ∈ E with ‖x 0 ‖ = 1 and φ(x 0 ) > k. Then lim t→∞ φ(tx) − f(tx) → ∞. So φ ∉ dom f ∗ .<br />

For the converse, assume dom f ∗ ⊂ kB E is not empty, then f is bounded below by φ − a where<br />

φ ∈ dom f ∗ and a = f ∗ (φ). (Us<strong>in</strong>g relative <strong>in</strong>terior properties, one knows that the doma<strong>in</strong> of<br />

the subdifferential of a proper convex function on E is nonempty, and hence the doma<strong>in</strong> of the<br />

conjugate is not empty; see Theorem 2.4.8). Then if dom f ≠ E, one can f<strong>in</strong>d y n ∈ dom f ∗ such<br />

that ‖y n ‖ → ∞: for example, lett<strong>in</strong>g f n := φ − a + nd C where C := dom f, one has f n ≤ f, but<br />

for x ∉ C, and y ∈ ∂f n (x), one has ‖y‖ ≥ n − ‖φ‖. S<strong>in</strong>ce y ∈ dom f ∗ , this yields a contradiction.<br />

Thus dom f = E and thus f is cont<strong>in</strong>uous. In the event f is not k-Lipschitz, one can choose<br />

u, v ∈ E such that f(v) − f(u) > k‖u − v‖. Now let y ∈ ∂f(v). Then<br />

〈y, u − v〉 < −k‖u − v‖<br />

and so ‖y‖ > k, but y ∈ dom f ∗ which is a contradiction.<br />

This result fails if f is not convex: for example, consider f(x) := √ |x| on R. Then dom f ∗ = {0},<br />

but f is not Lipschitz.<br />

2.3.14. Basic facts about <strong>in</strong>fimal convolutions.<br />

(a) Let (x, s) ∈ epi f and (y, t) ∈ epi g. Then<br />

(f g)(x + y) ≤ f(x) + g(x + y − x) = f(x) + g(y) ≤ s + t<br />

and so (x + y, s + t) ∈ epi(f g). That is, epi f + epi g ⊂ epi(f g). Now suppose h<br />

is a function such that there exists ¯x ∈ E with h(¯x) > (f g)(¯x). Choose t ∈ R such<br />

that h(¯x) > t > (f g)(¯x). Then we choose y ∈ E such that f(¯x) + g(y − ¯x) < t Then<br />

(¯x, t) ∉ epi h, but (¯x, t) ∈ epi(f g). Thus (f g) is the largest function whose epigraph<br />

conta<strong>in</strong>s epi f + epi g.<br />

(b) As suggested, let f(x) := e x and g(x) := 0. Then f and g are cont<strong>in</strong>uous and convex, but<br />

epi f + epi g = {(x, y) ∈ R 2 : y > 0}.<br />

(c) As suggested, let f(x) := x and g(x) := 0. For any u ∈ R,<br />

Thus (f g)(u) = −∞ for all u ∈ R.<br />

(f g)(u) ≤ f(−e n ) + g(u + e n ) = −e n for all n ∈ N.<br />

(d) As suggested let C := {(x, y) : y ≥ e x } and D := {(x, y) : y ≥ 0}. Then δ C<br />

which is not closed.<br />

δ D = δ {(x,y):q>0}<br />

12


(e) Suppose f and g are convex functions. Let u, v ∈ dom(f g). Let α and β be any real<br />

numbers satisfy<strong>in</strong>g (f g)(u) < α and (f g)(v) < β. Now choose x 1 , x 2 ∈ E so that<br />

f(x 1 ) + g(u − x 1 ) < α and f(x 2 ) + g(v − x 2 ) < β.<br />

Then<br />

(f g)(λu + (1 − λ)v) ≤ f(λx 1 + (1 − λ)x 2 ) + g(λ(u − x 1 ) + (1 − λ)(v − x 2 ))<br />

≤ λf(x 1 ) + (1 − λ)f(x 2 ) + λg(u − x 1 ) + (1 − λ)g(v − x 2 )<br />

< λα + (1 − λ)β.<br />

It follows that f<br />

g is convex.<br />

(f) Notice that (c) already shows this may fail if one of the functions is not bounded below,<br />

and we need <strong>to</strong> explicitly assume g is proper and let x 0 ∈ dom g. Then<br />

<strong>in</strong>f<br />

E f + <strong>in</strong>f<br />

E g ≤ (f g)(x) ≤ g(x 0) + f(x − x 0 ).<br />

When f is cont<strong>in</strong>uous, this implies (f g) is real-valued and hence cont<strong>in</strong>uous. When f is<br />

bounded on bounded sets, so is (f g). When f is Lipschitz with Lipschitz constant k ≥ 0,<br />

then f ≤ k‖ · ‖ + f(0) and so f g ≤ k‖ · ‖ + b where b := g(x 0 ) + f(0) + k‖x 0 ‖ which<br />

implies (f g) is Lipschitz with Lipschitz constant k (see Exercise 4.1.28). See also the note<br />

follow<strong>in</strong>g (g).<br />

(g) Observe that<br />

(‖ · ‖ δ C )(x) = <strong>in</strong>f<br />

y∈X ‖x − y‖ + δ C(y) = <strong>in</strong>f<br />

y∈C ‖x − y‖ = d C(x).<br />

As <strong>in</strong> the proof of (f), the convolution has Lipschitz constant 1 because the norm has<br />

Lipschitz constant 1.<br />

Further notes. One may prefer a more explicit argument <strong>in</strong> (f). Indeed, once we have established<br />

(f g) is real-valued, suppose (f g)(¯x) = ¯t. Given ɛ > 0, choose y ∈ E such that f(y)+g(¯x−y) <<br />

¯t + ɛ. Then for any h ∈ E,<br />

(f g)(h + ¯x) ≤ f(y + h) + g(¯x − y) ≤ |f(y + h) − f(y)| + f(y) + g(¯x − y)<br />

< (f g)(h + ¯x) + |f(y + h) − f(y)| + ɛ.<br />

From here, local/global Lipschitz properties of the convolution then follow directly from the<br />

local/global Lipschitz properties of f (this argument works just as well <strong>in</strong> any normed l<strong>in</strong>ear<br />

space irrespective of the dimension).<br />

2.3.16. Suppose f : C → R is Lipschitz with Lipschitz constant k, and let ˜f(x) := <strong>in</strong>f{f(y) +<br />

k‖x − y‖ : y ∈ C}. For x 0 ∈ C, tak<strong>in</strong>g y = x 0 clearly shows ˜f(x 0 ) ≤ f(x 0 ), hence ˜f ≤ f<br />

on C. Also, fix x 0 ∈ C. Then f(x) ≤ f(x 0 ) + k‖x − x 0 ‖, and so f(x) < ∞ for all x ∈ X.<br />

Moreover, f(x) ≥ f(x 0 ) − k‖x − x 0 ‖ for x ∈ C. Now fix x ∈ X, then f(y) + k‖x − y‖ ≥<br />

f(x 0 ) − k‖y − x 0 ‖ + k‖x − y‖ ≥ f(x 0 ) − k‖x − x 0 ‖ for all y ∈ C. This shows ˜f(x) > −∞ for all<br />

x ∈ X, i.e. ˜f is real-valued.<br />

Suppose ˜f(x 0 ) < f(x 0 ) for some x 0 ∈ C. Then there exists y ∈ C such that f(y) + k‖x 0 − y‖ <<br />

f(x 0 ). This violates the Lipschitz constant of f on C. Hence ˜f(x) = f(x) for all x ∈ C. Similarly,<br />

13


one can see that ˜f is globally Lipschitz with Lipschitz constant k. Indeed, suppose ˜f(u) − ˜f(v) ><br />

k‖u − v‖ + ɛ for some u, v ∈ X and ɛ > 0. Choose x 0 ∈ C such that ˜f(v) ≤ f(x 0 ) + k‖v − x 0 ‖ + ɛ.<br />

Then<br />

˜f(u) ≤ f(x 0 ) + k‖u − x 0 ‖ ≤ f(x 0 ) + k‖v − x 0 ‖ + k‖u − v‖<br />

which is a contradiction.<br />

2.3.17. (a) Suppose x and y are global m<strong>in</strong>imizers of f. Then f(λx + (1 − λ)y) ≥ f(x) =<br />

λf(x) + (1 − λ)f(y) for 0 < λ < 1. Because f is strictly convex, x = y.<br />

(b) It suffices <strong>to</strong> show this for y = 0, and it suffices <strong>to</strong> show<br />

( ) u + v<br />

(2) f < 1 2 2 f(u) + 1 f(v) for all dist<strong>in</strong>ct u, v ∈ E.<br />

2<br />

Indeed, if f(λu + (1 − λ)v) = λf(u) + (1 − λ)f(v) for some 0 < λ < 1 and dist<strong>in</strong>ct u and v,<br />

then by the convexity of f equality holds for all 0 < λ < 1. We now check that (2) is an easy<br />

consequence of the parallelogram identity:<br />

( ) u + v<br />

f<br />

2<br />

= 1 u + v<br />

2 ∥ 2 ∥<br />

2<br />

= 1 2 ‖u‖2 + 1 2 ‖v‖2 − 1 ‖u − v‖2<br />

2<br />

< 1 2 f(u) + 1 f(v) when u ≠ v.<br />

2<br />

(c) (i) Let y ∈ E and def<strong>in</strong>e f by f(x) := 1 2 ‖x − y‖2 + δ C . The strict convexity of f follows<br />

from (b), and f atta<strong>in</strong>s its m<strong>in</strong>imum on C because any m<strong>in</strong>imiz<strong>in</strong>g sequence is bounded and a<br />

convergent subsequence converges <strong>to</strong> the unique, by part (a), m<strong>in</strong>imizer.<br />

Now let y ∈ E, and suppose ȳ ∈ C satisfies 〈y − ȳ, x − ȳ〉 ≤ 0 for all x ∈ C. Then for x ∈ C<br />

‖y − ȳ‖ 2 = 〈y − ȳ, y − ȳ〉 = 〈y − ȳ, y − x〉 + 〈y − ȳ, x − ȳ〉<br />

≤<br />

〈y − ȳ, y − x〈≤ ‖y − ȳ‖ ‖y − x‖<br />

and so ‖y − ȳ‖ ≤ ‖y − x‖ for all x ∈ C, thus ȳ = P C (y).<br />

Conversely, suppose ȳ ∈ C satisfies 〈y − ȳ, x − ȳ〉 > 0 for some x ∈ C. Then for each 0 < λ < 1,<br />

the convexity of C implies the po<strong>in</strong>t x λ := λx + (1 − λ)ȳ is <strong>in</strong> C. Now compute<br />

‖y − x λ ‖ 2 = 〈y − x λ , y − x λ 〉<br />

= 〈y − ȳ − λ(x − ȳ), y − ȳ − λ(x − ȳ)〉<br />

= ‖y − ȳ‖ 2 − 2λ〈y − ȳ, x − ȳ〉 + λ 2 ‖x − ȳ‖ 2<br />

= ‖y − ȳ‖ 2 − λ[2〈y − ȳ, x − ȳ〉 − λ‖x − ȳ‖ 2 ].<br />

For λ > 0 sufficiently small, the term <strong>in</strong> the brackets is positive and then ‖y − ȳ‖ 2 > ‖y − x λ ‖ 2 ,<br />

and so ȳ ≠ P C (y).<br />

(ii) Let ¯x ∈ C. Then d ∈ N C (¯x) if and only if d ∈ ∂δ C (¯x) if and only if<br />

if and only if P C (¯x + d) = ¯x (by part (i)).<br />

(iii) In fact one can show<br />

〈¯x + d − ¯x, x − ¯x ≤ δ C (x) − δ C (¯x) = 0 for all x ∈ C<br />

(3) ‖P C (x) − P C (y)‖ 2 + ‖x − P C (x) − (y − P C (y))‖ 2 ≤ ‖x − y‖ 2 for all x, y ∈ E.<br />

14


Indeed, expand<strong>in</strong>g and rearrang<strong>in</strong>g us<strong>in</strong>g the <strong>in</strong>ner product, the left-hand side of (3) is equal <strong>to</strong><br />

(4) 〈x − y, x − y〉 + 2〈y − P C (y), P C (x) − P C (y)〉 + 2〈x − P C (x), P C (y) − P C (x)〉<br />

and by part (i) the last two <strong>in</strong>ner products <strong>in</strong> (4) are less than or equal <strong>to</strong> 0 which provides the<br />

desired conclusion.<br />

(d) For example S = {−1, 1}, then P S (0) is multi-valued, and lim x→0 + P (x) = {1} while<br />

lim x→0 − P (x) = {−1} so there is no s<strong>in</strong>gle-valued selection of P that is cont<strong>in</strong>uous at {0}.<br />

2.3.20. (a) Let f(x) := ‖x‖2<br />

2<br />

− δ S (x). The proof of Fact 4.5.6 shows<br />

1<br />

2 d2 S(y) = ‖y‖2 − f ∗ (y)<br />

2<br />

or d S (·) = ‖ · ‖ 2 − 2f ∗ (·) is a difference of convex functions as desired.<br />

(b) d C = ‖ · ‖ δ C and thus d ∗ C = (‖ · ‖)∗ + δ ∗ C = δ B E<br />

+ σ C .<br />

(c) Let x ∈ C. By Fenchel–Young, φ ∈ ∂d C (x) if and only if d ∗ C (φ) = φ(x) − d C(x) if and only<br />

if δ BE (φ) + σ C (φ) = φ(x) − d C (x) = φ(x) if and only if φ ∈ B E and φ(x) = σ C (φ) if and only if<br />

φ ∈ ∂δ C and φ ∈ B E if and only if φ ∈ N C (x) and φ ∈ B E .<br />

(d) Suppose x ∉ C. Then φ ∈ ∂d C (x) if and only if φ ∈ B E and<br />

Therefore, φ =<br />

〈φ, x − P C (x)〉 ≥ d C (x) − d C (P C (x)) = d C (x) = ‖x − P C (x)‖.<br />

1<br />

‖x − P C (x) (x − P C(x)) = 1<br />

d C (x) (x − P C(x)).<br />

(e) For x ∈ C, we obta<strong>in</strong> that ∇d 2 C<br />

(x) = 0 because<br />

lim<br />

t→0 |∇d2 C(x + th) − d 2 C(x) − 〈0, th〉| ≤ t 2 ‖h‖.<br />

For x ∉ C, the cha<strong>in</strong> rule implies ∇d 2 C (x) = 2d C(x)∇d C (x) = 2(x − P C (x)) (by part (c)). Thus<br />

(e) follows.<br />

2.3.21. Let D := {x ∈ E : Ax = b}. Let x ∈ D, then φ ∈ ∂D(x) if and only if φ(y − x) ≤ 0 for<br />

all y ∈ D if and only if φ(u) ≤ 0 for all u ∈ ker A if and only if φ(u) = 0 for all u ∈ ker A.<br />

Suppose φ ∈ A ∗ Y , that is φ = A ∗ y for some y ∈ Y . Fix u ∈ D, and let v ∈ D be arbitrary, then<br />

〈A ∗ y, v − u〉 = 〈y, A(v − u)〉 = 〈y, b − b〉 = 0.<br />

Therefore φ ∈ ∂D(u), that is φ ∈ N C (u)<br />

Conversely, suppose φ ∈ ∂δ D (x). Suppose φ ≠ 0, then fix x 0 ∈ X such that φ(x 0 ) = 1. We<br />

now express E as the direct sum ker φ ⊕ R ∪ {+∞} 0<br />

. Observe that Ax 0 ∉ A(kerφ). Indeed,<br />

suppose Ax 0 = Ax 1 for some x 1 ∈ ker φ. Then A(x 0 − x 1 ) = 0 and so by the previous paragraph<br />

φ(x 0 − x 1 ) = 0. Thus, by the basic separation theorem, we choose y ∈ Y such that y(Ax 0 ) = 1<br />

and y(A(ker φ)) = {0}. Given ¯x ∈ E, we write ¯x = h + φ(¯x)x 0 where k ∈ ker φ. Then<br />

〈A ∗ y, ¯x〉 = 〈y, A(h + φ(¯x)x 0 )〉 = 〈y, Ah〉 + φ(¯x)〈y, Ax 0 〉 = φ(¯x).<br />

Because ¯x ∈ E was arbitrary, we have φ = A ∗ y, and A ∗ Y ⊂ δ D (x) as desired.<br />

15


(a) Suppose ¯x is a local m<strong>in</strong>imizer as specified. Then ∇f(¯x)| ∈ ∂δ D (¯x) and so ∇f(¯x) ∈ A ∗ Y .<br />

(b) Suppose ∇f(¯x) ∈ A ∗ Y and f is convex. Then ∇f(¯x) ∈ ∂δ D (¯x), and because f is convex, ¯x<br />

is a global m<strong>in</strong>imizer of f| D .<br />

<strong>Exercises</strong> from Section 2.4<br />

2.4.1. Part(a).<br />

(i) As <strong>in</strong> the proof of the Fenchel duality theorem, let h : Y → [−∞, +∞] be def<strong>in</strong>ed by<br />

h(u) := <strong>in</strong>f {f(x) + g(Ax + u)},<br />

x∈E<br />

then h is convex and 0 ∈ core(dom g − A dom f) = dom h, thus by the max formula (2.1.19)<br />

∂h(0) is not empty, so we let −φ ∈ ∂h(0). Now for all x ∈ E and u ∈ Y with u = Av where<br />

v ∈ E, we have<br />

Def<strong>in</strong>e<br />

0 ≤ h(0) ≤ h(u) + 〈φ, u〉<br />

≤<br />

f(x) + g(A(x + v)) + 〈φ, Av〉<br />

(5) b := <strong>in</strong>f<br />

x∈E {f(x) − 〈A∗ φ, x〉}<br />

= [f(x) − 〈A ∗ φ, x〉] − [−g(A(x + v)) − 〈φ, A(x + v)〉].<br />

Then a ≤ b and thus for any r ∈ [a, b] we have<br />

a := sup{−g(A(z)) − 〈A ∗ φ, z〉}.<br />

z∈E<br />

f(x) ≥ r + 〈A ∗ φ, x〉 ≥ −(g ◦ A)(x) for all x ∈ E.<br />

(ii) With notation as <strong>in</strong> the Fenchel duality theorem (2.3.4), observe p ≥ 0 because f(x) ≥<br />

−g(Ax), and then the Fenchel duality theorem (2.3.4) says d = p and because the supremum<br />

<strong>in</strong> d is atta<strong>in</strong>ed, we choose φ ∈ Y such that<br />

0 ≤ p = −f ∗ (A ∗ φ) − g ∗ (−φ)<br />

≤ [f(x) − 〈φ, Ax〉] + [g(y) + 〈φ, y〉] for all x ∈ X, y ∈ Y,<br />

where the second <strong>in</strong>equality is a direct consequence of the def<strong>in</strong>itions of f ∗ (A ∗ φ) and g ∗ (−φ).<br />

Then for any z ∈ E, sett<strong>in</strong>g y = Az, <strong>in</strong> the previous <strong>in</strong>equality, we deduce a ≤ b where a<br />

and b are as <strong>in</strong> (5). Now choose r ∈ [a, b] and let α(x) = 〈A ∗ φ, x〉 + r.<br />

(iii) The <strong>in</strong>clusion is straightforward (Exercise 2.3.12 (a)(ii)), so we prove the reverse <strong>in</strong>clusion.<br />

Suppose φ ∈ ∂(f + g ◦ A)(¯x). Because shift<strong>in</strong>g by a constant does not change the<br />

subdifferential, we may assume without loss of generality that<br />

x ↦→ f(x) + g(Ax) − φ(x)<br />

atta<strong>in</strong>s its m<strong>in</strong>imum of 0 at ¯x. Accord<strong>in</strong>g <strong>to</strong> the sandwich theorem (2.4.1) there exists an<br />

aff<strong>in</strong>e function α := 〈A ∗ y, ·〉 + r with −y ∈ ∂g(A¯x) such that<br />

f(x) − φ(x) ≥ α(x) ≥ −g(Ax)<br />

for all x ∈ E, with equality when x = ¯x<br />

Then f(x) ≥ 〈φ + A ∗ y, x〉 + r and f(¯x) = 〈φ + A ∗ y, ¯x〉 + r. Thus φ + A ∗ y ∈ ∂f(¯x), and as<br />

a consequence, we have φ ∈ ∂f(¯x) + A ∗ ∂g(A¯x) as desired.<br />

16


(iv) Let the notation be as <strong>in</strong> the Hahn–Banach extension theorem (2.1.18). Then −g ≤ p where<br />

g = −f +δ S . Because p is everywhere cont<strong>in</strong>uous, we can apply the sandwich theorem (2.4.1)<br />

<strong>to</strong> f<strong>in</strong>d an aff<strong>in</strong>e mapp<strong>in</strong>g α such that −g ≤ α ≤ p, that is f ≤ α ≤ p. Then α = α(0) + φ<br />

where φ ∈ E. We know α(0) ≥ f(0) = 0 and so φ ≤ p. We claim φ(s) = f(s) for all s <strong>in</strong> the<br />

l<strong>in</strong>ear subspace S. Indeed, if this were not true, then f(x 0 ) − φ(x 0 ) ≠ 0 for some x 0 ∈ S.<br />

Then choose k ∈ R so that k(f(x 0 ) − φ(x 0 )) > α(0). This implies f(kx 0 ) > φ(kx 0 ) + α(0)<br />

which is impossible. Thus φ| S = f as desired.<br />

Part (b). Several connections were outl<strong>in</strong>ed <strong>in</strong> (a) and earlier, for now we’ll derive a couple<br />

additional easy relations, <strong>to</strong> sketch one complete circle.<br />

(i) Suppose the subdifferential sum rule is valid, and x 0 ∈ core dom f where f : E → (−∞, +∞]<br />

is convex. Then E = ∂(f + δ {x0 })(x 0 ) = ∂f(x 0 ) + ∂δ {x0 }(x 0 ) and so ∂f(x 0 ) is not empty.<br />

(ii) Suppose the subdifferential of a convex function on E at a po<strong>in</strong>t of cont<strong>in</strong>uity is not empty.<br />

Now consider a l<strong>in</strong>ear function f on a subspace Y of E and f| Y ≤ p for some subl<strong>in</strong>ear<br />

function p on E. Consider h = f p. Then h ≤ p, h| Y = f, h is cont<strong>in</strong>uous with h(0) =<br />

0. Consider φ ∈ ∂h(0). Then φ| Y = f, and φ ≤ h. Thus the Hahn-Banach extension<br />

theorem (2.1.18) follows.<br />

So one of the circles of implications we have sketched is: Hahn–Banach extension ⇒ max formula<br />

⇒ Fenchel duality theorem ⇒ sandwich theorem ⇒ nonempt<strong>in</strong>ess of subdifferential ⇒ Hahn–<br />

Banach extension theorem. Where the proofs of the respective implications are given <strong>in</strong>: proof<br />

of max formula (2.1.19), proof of the Fenchel duality theorem (2.3.4), Part(a)(ii), Part(a)(iii),<br />

Part(b)(i), Part(b)(ii).<br />

Further notes. (a) The Fenchel duality and the sandwich theorems are most easily visualized and<br />

unders<strong>to</strong>od <strong>in</strong> the classical case Y = E where A is the identity map, and yet still very powerful.<br />

In this case, the primal and dual problems are:<br />

p := <strong>in</strong>f {f(x) + g(x)} and d := sup{−f ∗ (y) − g ∗ (−y)}<br />

x∈E<br />

As before, p ≥ d by the Fenchel–Young <strong>in</strong>equality (2.3.1). We derive p = d us<strong>in</strong>g the sandwich<br />

theorem (2.4.1) when 0 ∈ core(dom g − dom f). Indeed, when p > −∞, we know f(x) ≥ p − g(x)<br />

for all x ∈ E, and thus there is an aff<strong>in</strong>e function α := φ + r such that<br />

y∈E<br />

f(x) ≥ 〈φ, x〉 + r ≥ −g(x) + p for all x ∈ E.<br />

Then 〈φ, x〉 − f(x) ≤ −r for all x ∈ E and 〈−φ, x〉 − g(x) ≤ r − p for all x ∈ E. Thus f ∗ (φ) ≤ −r<br />

and g ∗ (−φ) ≤ r − p. In other words, p ≥ d ≥ −f ∗ (φ) − g ∗ (−φ) ≥ r + p − r = p and so p = d and<br />

the sup is atta<strong>in</strong>ed at φ as desired.<br />

(b) Of course, one can derive the Fenchel dualilty theorem (2.3.4) from the sandwich theorem<br />

(2.4.1) by slightly modify<strong>in</strong>g the proof as presented <strong>in</strong> the text. Indeed, let h be as def<strong>in</strong>ed <strong>in</strong><br />

the proof of the Fenchel dualilty theorem (2.3.4), and observe h ≥ p − δ {0} , and h(0) = p. By the<br />

sandwich theorem (2.4.1), there is an aff<strong>in</strong>e function, say α := p − φ such that h ≥ α ≥ p − δ {0} ,<br />

and thus −φ ∈ ∂h(0). Now proceed as <strong>in</strong> the proof for the Fenchel dualilty theorem (2.3.4).<br />

2.4.3. (a) Fix ɛ > 0 and let ¯x ∈ cl C. Then there exists a sequence (x n ) ⊂ C such that x n → ¯x.<br />

Fix n 0 such that ‖x n0 − ¯x‖ < ɛ. Then ¯x ∈ x n0 + ɛB X ⊂ C + ɛB E .<br />

17


(b) Let x + y ∈ D + F where x ∈ D, y ∈ F . Because D is open, we choose ɛ > 0 so that<br />

x + ɛB E ⊂ D. Then x + y + ɛB E ⊂ D + F . Therefore, x + y ∈ <strong>in</strong>t(D + F ) and so D + F is open.<br />

(c) Let x ∈ <strong>in</strong>t C and choose ɛ > 0 so that x + ɛB E ⊂ C. By part (a), for each λ > 0,<br />

cl C ⊂ C + λɛ<br />

1−λ B E. Therefore,<br />

λx + (1 − λ) cl C ⊂<br />

(<br />

λx + (1 − λ) C +<br />

λɛ )<br />

1 − λ B E<br />

Because x ∈ <strong>in</strong>t C was arbitrary it follows that<br />

= λ(x − ɛB E ) + (1 − λ)C<br />

= λ(x + ɛB E ) + (1 − λ)C ⊂ C.<br />

λ <strong>in</strong>t C + (1 − λ) cl C ⊂ C, for each 0 < λ ≤ 1,<br />

and by part (b), the sum on the left hand side is open, and so<br />

λ <strong>in</strong>t C + (1 − λ) cl C ⊂ <strong>in</strong>t C, for each 0 < λ ≤ 1,<br />

as desired.<br />

(d) Because <strong>in</strong>t C ⊂ cl C, the previous <strong>in</strong>equality implies (trivially) λx + (1 − λ)y ∈ <strong>in</strong>t C for all<br />

x, y ∈ <strong>in</strong>t C and 0 < λ < 1, and so <strong>in</strong>t C is convex.<br />

(e) For any fixed x ∈ <strong>in</strong>t C, λx + (1 − λ) cl C ⊂ <strong>in</strong>t C. Lett<strong>in</strong>g λ → 0 + , we deduce that<br />

cl C ⊂ cl(<strong>in</strong>t C). Clearly this can fail when C is not convex as any nonempty set with empty<br />

<strong>in</strong>terior will do.<br />

Further notes: <strong>in</strong> any normed l<strong>in</strong>ear space it is straightforward <strong>to</strong> show the <strong>in</strong>terior of a convex<br />

set is convex: Let x, y ∈ <strong>in</strong>t C, and choose r > 0 so that x + rB X ⊂ C and y + rB X ⊂ C. Then<br />

for 0 ≤ λ ≤ 1, one has<br />

λ(x + rB X ) + (1 − λ)(y + rB X ) ⊂ C.<br />

Then, λx + (1 − λ)y + rB X ⊂ C and λx + (1 − λ)y ⊂ <strong>in</strong>t C as desired. It is also easy <strong>to</strong> see that<br />

the closure of a convex set is closed (see solution <strong>to</strong> Exercise 2.4.8). See also [383, Theorem 1.13]<br />

for more.<br />

2.4.4.(a) Let (A i ) i∈I be a collection of aff<strong>in</strong>e sets. Let A = ⋂ i∈I A i, and let x, y ∈ A and λ ∈ R.<br />

Then x, y ∈ A i for each i ∈ I and so λx+(1−λ)y ∈ A i for each i ∈ I. Therefore, λx+(1−λ)y ∈ A,<br />

and so A is aff<strong>in</strong>e.<br />

(b) Suppose A is a nonempty aff<strong>in</strong>e subset of E. Fix x 0 ∈ A. We claim that Y := A − x 0 is<br />

l<strong>in</strong>ear. Indeed, then for α, β ∈ R and x, y ∈ L we have x = u − x 0 and y = v − x 0 where u, v ∈ A,<br />

and<br />

αx + βy + x 0 = αu − αx 0 + βv − βx 0 + 1x 0 ∈ A<br />

because α − α + β − β + 1 = 1 (see part (c)). Therefore, αx + βy ∈ A − x 0 , and A − x 0 is a l<strong>in</strong>ear<br />

subspace. Conversely, if Y is l<strong>in</strong>ear, and x 0 ∈ E, then A := Y + x 0 is aff<strong>in</strong>e. Indeed, for x, y ∈ Y ,<br />

and α + β = 1, we have<br />

α(x + x 0 ) + β(y + y 0 ) = αx + βy + x 0 ∈ A,<br />

which completes the proof of (b). (Alternatively, one can directly deduce this from Lemma 2.4.5).<br />

18


(c) This part follows immediately from Lemma 2.4.5 which shows aff D = x + span(D − x) for<br />

any x ∈ D. Indeed, suppose x 1 , x 2 , . . . , x m ∈ D. Then for ∑ m<br />

i=1 λ i = 1, we have<br />

m∑<br />

λ i x i = x +<br />

i=1<br />

m∑<br />

λ i (x i − x) ∈ aff D.<br />

Conversely, suppose u ∈ aff D. Then u ∈ x + span(D − x) and so<br />

i=1<br />

m∑<br />

u = x + α i (x i − x) =<br />

i=1<br />

m∑<br />

m∑<br />

α i x i + (1 − α i )x.<br />

i=1<br />

i=1<br />

Thus u ∈ aff D and this proves (c).<br />

(d) Suppose D is nonempty. L<strong>in</strong>ear subspaces of E are closed, therefore aff D = x + span(D − x)<br />

is closed for any x ∈ D. Consequently, cl D ⊂ aff D. It then follows that aff(cl D) ⊂ aff D.<br />

Because the reverse <strong>in</strong>clusion is clear, we deduce aff(cl D) = aff D.<br />

2.4.5. (a) Consider C 1 = [0, 1] × {0} and C 2 = [0, 1] × [0, 1] as subsets of R 2 . Then ri C 1 =<br />

(0, 1) × {0} while ri C 2 = (0, 1) × (0, 1). Thus C 1 ⊂ C 2 , but ri C 1 ⊄ ri C 2 .<br />

(b) By translat<strong>in</strong>g C, we may assume that 0 ∈ C, then aff C = Y is a l<strong>in</strong>ear space, and so ri C<br />

is the <strong>in</strong>terior of C relative <strong>to</strong> Y , thus we may apply Exercise 2.4.3 us<strong>in</strong>g Y as the overspace <strong>to</strong><br />

derive the conclusion.<br />

(c) (i) ⇒ (ii): Let x ∈ ri C then there exists r > 0 so that x + ɛB E ∩ aff C ⊂ C. In particular,<br />

for y ∈ C choos<strong>in</strong>g ɛ > 0 so that ɛ‖y − x‖ < r, we have x + ɛ(x − y) ∈ C.<br />

(ii) ⇒ (iii): Let Y := {λ(c − x) : λ ≥ 0, x ∈ C}. Certa<strong>in</strong>ly 0 ∈ Y . Moreover, let λ(c − x) ∈ Y .<br />

Then αλ(c − x) ∈ Y if α ≥ 0. Suppose α < 0, then choose ɛ > 0 so that ɛ(x − c) + x ∈ C. Then<br />

αλ(c − x) = |α|λ(x − c) = |α|λ ɛ(x − c)<br />

ɛ<br />

= |α|λ [(ɛ(x − c) + x) − x],<br />

ɛ<br />

and so Y is closed under scalar multiplication. We now show Y is closed under addition. Indeed,<br />

for the nontrivial case λ 1 , λ 2 > 0 we have<br />

[<br />

λ 1<br />

λ 1 (c 1 − x) + λ 2 (c 2 − x) = (λ 1 + λ 2 ) c 1 + λ ]<br />

2<br />

c 2 − x<br />

λ 1 + λ 2 λ 1 + λ 2<br />

= (λ 1 + λ 2 )(¯c − x),<br />

where ¯c ∈ C by the convexity of C as desired. Thus Y is a l<strong>in</strong>ear subspace.<br />

(iii) ⇒ (i) Let Y = {λ(c − x) : c ∈ C}. It follows from Lemma 2.4.5 that aff C = x + Y . For a<br />

basis {y i } of Y , we can choose each of y i and −y i can be written as λ(c − x) for some λ > 0 and<br />

c ∈ C from which it follows that x is <strong>in</strong> the <strong>in</strong>terior of C relative <strong>to</strong> aff C.<br />

(d) In fact, ri(T C) = T (ri C); see [369, Theorem 6.6].<br />

2.4.6. By shift<strong>in</strong>g f we may assume 0 ∈ dom f, and let Y = span(dom f). Let x ∈ ri(dom f), the<br />

x is <strong>in</strong> the <strong>in</strong>terior of the doma<strong>in</strong> of f relatively <strong>to</strong> Y . By the max formula (2.1.19), ∂f| Y (x) ≠ ∅,<br />

that is, there exists φ ∈ Y such that<br />

(6) 〈φ, y − x〉 ≤ f(y) − f(x) for all y ∈ Y.<br />

19


Now write E = Y +Z as a direct sum, and def<strong>in</strong>e ˜φ on E by ˜φ(y +z) = φ(y). Because dom f ⊂ Y ,<br />

it follows from (6) that ˜φ ∈ ∂f(x).<br />

Notice that this result ensures the subdifferential of any proper convex function on E has<br />

nonempty doma<strong>in</strong> and range. This is because the doma<strong>in</strong> of a convex function is convex, and<br />

every nonempty convex subset of E has nonempty relative <strong>in</strong>terior.<br />

2.4.7. Suppose x 0 ∈ dom f and ∂f(x 0 ) = ∅. Because cl(ri dom f) = cl(dom f), there exists a<br />

sequence (x n ) ⊂ ri(dom f) converg<strong>in</strong>g <strong>to</strong> x 0 , and hence by Exercise 2.4.6 there exist φ n ∈ ∂f(x n ).<br />

Suppose by way of contradiction that ‖φ n ‖ ↛ ∞, hence it has a bounded subsequence, and then<br />

by compactness a convergent subsequence. Thus we suppose (φ nj ) → φ. Then for y ∈ E, we have<br />

φ(y) − φ(x 0 ) = lim<br />

j<br />

φ nj (y − x nj ) ≤ lim <strong>in</strong>f<br />

j<br />

f(y) − f(x nj ) ≤ f(y) − f(x 0 ),<br />

and so φ ∈ ∂f(x 0 ). This provides our desired contradiction. Thus ‖φ n ‖ → ∞. Further Exercise<br />

2.2.23 shows that ∂f(x 0 ) is unbounded whenever it is not empty and x 0 is <strong>in</strong> the boundary<br />

of the doma<strong>in</strong> of f.<br />

2.4.8. Suppose f is proper. Fix x 0 ∈ ri(dom f) and let φ ∈ ∂f(x 0 ). Then f(x) ≥ cl f(x) ≥<br />

f(x 0 ) + φ(x − x 0 ) for all x ∈ X and so cl f is proper. One can verify cl f is convex because<br />

its epigraph is convex as the closure of a convex set; that a closure of a convex set is convex<br />

is elementary <strong>to</strong> verify. Indeed, suppose D = cl C where C is convex. Suppose x, y ∈ D, and<br />

0 ≤ λ ≤ 1. Choose (x n ), (y n ) ⊂ C so that x n → x and y n → y. Then<br />

λx + (1 − λ)y = lim<br />

n<br />

(λx n + (1 − λ)y n ).<br />

Hence λx + (1 − λ)y ∈ D as it is a limit of elements from C.<br />

Further notes. Hence, one can use the Hessian <strong>to</strong> check convexity of convex functions on closed<br />

doma<strong>in</strong>s. For example suppose f : C → R is cont<strong>in</strong>uous and <strong>in</strong>t C ≠ ∅. Suppose f is twice<br />

Gâteaux differentiable on <strong>in</strong>t C whose Hessian is positive semidef<strong>in</strong>ite at all x ∈ <strong>in</strong>t C. Then f is<br />

convex, because f| <strong>in</strong>t C is convex, and f is the closure of f| <strong>in</strong>t C .<br />

2.4.9. The set C is closed by Carathéodory’s theorem (1.2.5) because it is the convex hull of a<br />

compact set. The set of extreme po<strong>in</strong>ts of C is not closed because (1, 0, 0) is not an extreme po<strong>in</strong>t<br />

of C but every other po<strong>in</strong>t on the circle {(x, y, z) : x 2 + y 2 = 1, z = 0} is an extreme po<strong>in</strong>t of C.<br />

2.4.20. Let m := <strong>in</strong>f C f. Then f ≥ −g where g := δ C − m. The conditions imply we can apply<br />

the sandwich theorem (2.4.1) <strong>to</strong> f<strong>in</strong>d an aff<strong>in</strong>e function α such that<br />

m − δ C ≤ α ≤ f.<br />

Then m ≤ <strong>in</strong>f C α ≤ <strong>in</strong>f C f as desired.<br />

Suppose ¯x m<strong>in</strong>imizes f on C, and write the aff<strong>in</strong>e separat<strong>in</strong>g function as α = φ + r. Then<br />

φ ∈ ∂f(¯x) and −φ ∈ ∂δ C (¯x) and so 0 = φ − φ ∈ ∂f(¯x) + N C (¯x) as desired.<br />

Conversely, suppose 0 ∈ ∂f(¯x) + δ C (¯x). Then 0 ∈ ∂(f + δ C )(¯x), and so f + δ C atta<strong>in</strong>s its<br />

m<strong>in</strong>imum at ¯x. Therefore f atta<strong>in</strong>s its m<strong>in</strong>imum on C at ¯x.<br />

20


Further notes. This last part could have been more efficiently completed us<strong>in</strong>g the subdifferential<br />

sum rule because<br />

0 ∈ ∂(f + δ C )(¯x) if and only if 0 ∈ ∂f(¯x) + ∂δ C (¯x) if and only if 0 ∈ ∂f(¯x) + N C (¯x).<br />

2.4.23. Let K be a nonempty subset of E. Then<br />

Given any x ∈ K, t ≥ 0 and φ ∈ K − we have<br />

K − := {φ ∈ E : 〈φ, x〉 ≤ 0, for all x ∈ K}<br />

〈φ, tx〉 = t〈φ, x〉 ≤ 0.<br />

Therefore, K ⊂ (K − ) − . Also, observe that for any set S, S − is a closed convex set because it is<br />

an <strong>in</strong>tersection of closed half-spaces. In particular, K −− is a closed convex set conta<strong>in</strong><strong>in</strong>g R + K.<br />

Now let C be the closed convex hull of R + K, and suppose x 0 ∉ C. By the basic separation<br />

theorem (2.1.21), we choose φ ∈ E such that φ(x 0 ) > sup C φ. Observe that sup C φ = 0 because<br />

0 ∈ C, and if sup C φ > 0, then there exists t > 0 and ¯x ∈ K such that φ(t¯x) > 0. Then<br />

lim n→∞ φ(n¯x) = ∞, and so φ would not be bounded above by φ(x 0 ) on C. S<strong>in</strong>ce sup C φ = 0,<br />

then φ ∈ K − , and so x 0 ∉ (K − ) − , and thus K −− ⊂ C as desired.<br />

2.4.24. We will prove the assertions for D + C, the other case follows by consider<strong>in</strong>g D + (−C).<br />

Suppose d 1 + c 1 , d 2 + c 2 ∈ D + C. Then for 0 ≤ λ ≤ 1, us<strong>in</strong>g the convexity of D and C we obta<strong>in</strong><br />

λ(d 1 + c 1 ) + (1 − λ)(d 2 + c 2 ) = λd 1 + (1 − λ)d 2 + λc 1 + (1 − λ)c 2 ∈ D + C.<br />

Now suppose (d n + c n ) ∞ n=1 ⊂ D + C and d n + c n → ¯x. Let (c nk ) be a convergent subsequence of<br />

(c n ), say, c nk → ¯c ∈ C. Then d nk → ¯x − ¯c. Thus ¯x − ¯c ∈ D, and so ¯x ∈ D + C as desired.<br />

<strong>Exercises</strong> from Section 2.5<br />

2.5.1. Suppose f is differentiable at x 0 (by Theorem 2.2.1 f is au<strong>to</strong>matically Fréchet differentiable),<br />

so given ɛ > 0 we choose δ > 0 so that<br />

|f(x + h) − f(x) − 〈f ′ (x 0 ), h〉| ≤ ɛ ‖h‖ whenever 0 ≤ ‖h‖ < δ.<br />

2<br />

Therefore, for ‖h‖ < δ, us<strong>in</strong>g the triangle <strong>in</strong>equality we obta<strong>in</strong><br />

|f(x + h) + f(x − h) − 2f(x)| = |f(x + h) − f(x) − 〈f ′ (x), h〉 + f(x − h) − f(x) − 〈f ′ (x), −h〉|<br />

≤<br />

ɛ 2 ‖h‖ + ɛ ‖h‖ = ɛ‖h‖.<br />

2<br />

f(x + h) + f(x − h) − 2f(x)<br />

This implies lim<br />

= 0.<br />

‖h‖→0<br />

‖h‖<br />

Conversely, suppose f is not differentiable at x. Because f is cont<strong>in</strong>uous at x, the max formula<br />

(2.1.19) ensures ∂f(x) ≠ ∅, thus we use Theorem 2.2.1 <strong>to</strong> deduce that there are dist<strong>in</strong>ct<br />

21


φ, Λ ∈ ∂f(x). Thus we choose h 0 ∈ S E so that (φ − Λ)(h 0 ) > 0. Now let ɛ = (φ − Λ). By the<br />

subdifferential <strong>in</strong>equality<br />

f(x + th 0 ) − f(x) − φ(th 0 ) + f(x − th 0 ) − f(x) − Λ(−th 0 ) ≥ 0, for all t.<br />

In particular, f(x + th 0 ) + f(x − th 0 ) − 2f(x) ≥ (φ − Λ)(th 0 ) ≥ ɛt for all t > 0 and so<br />

lim sup<br />

‖h‖→0<br />

f(x + h) + f(x − h) − 2f(x)<br />

‖h‖<br />

≥ ɛ<br />

which establishes the ‘if’ assertion.<br />

2.5.2. Let<br />

G n,m :=<br />

{<br />

x ∈ U :<br />

sup<br />

‖h‖≤ 1 m<br />

}<br />

f(x + h) + f(x − h) − 2f(x) < 1 ,<br />

nm<br />

and O n := ⋃ m≥1 G n,m and G := ⋂ n≥1 O n. Suppose x ∈ G and let ɛ > 0. Choose n such that<br />

1/n < ɛ. Now f<strong>in</strong>d m such that x ∈ G n,m , and choose δ so that 0 < δ < 1/m. The convexity of<br />

f implies<br />

Indeed, for 0 < λ ≤ 1 we have<br />

sup f(x + h) + f(x − h) − 2f(x) < 1<br />

‖h‖=α<br />

n α whenever 0 < α ≤ 1 m .<br />

f(x + λh) + f(x − λh) − 2f(x) ≤ λf(x + h) + (1 − λ)f(x) + λf(x − h) + (1 − λ)f(x) − 2f(x)<br />

= λ[f(x + h) + f(x − h) − 2f(x)].<br />

Exercise 2.5.1 implies that f is differentiable at x. Conversely, suppose f is differentiable at x,<br />

and fix n ∈ N. Choose 0 < ɛ < 1/n and use Exercise 2.5.1 <strong>to</strong> f<strong>in</strong>d δ > 0 so that<br />

f(x + h) + f(x − h) − 2f(x) ≤ ɛ‖h‖ whenever ‖h‖ ≤ δ.<br />

Then x ∈ G n,m for all m > 1/δ. It follows that x ∈ G as desired.<br />

It rema<strong>in</strong>s <strong>to</strong> verify that O n is open for each n. Indeed, fix n and suppose x ∈ O n . Then for<br />

some m 0 ∈ N, x ∈ G nm and, as above, the convexity of f implies that x ∈ G nm for all m ≥ m 0 .<br />

Now fix m > m 0 sufficiently large so that f has Lipschitz constant, say K > 0 on B 2/m (x). Then<br />

choose ɛ > 0 so that<br />

sup f(x + h) + f(x − h) − 2f(x) < 1<br />

mn − ɛ.<br />

‖h‖≤ 1 m<br />

Now we choose α > 0 so that 4Kα < ɛ and α < 1/m. Now suppose ‖u − x‖ < α, then because f<br />

has Lipschitz constant K on B 2/m (x), for ‖h‖ < 1/m we have<br />

f(u + h) + f(u − h) − 2f(u) ≤ f(x + h) + f(x − h) − 2f(x) + 4K‖x − u‖ < 1<br />

mn .<br />

This shows O n is open as desired.<br />

2.5.3. Suppose f : R n → R m is locally Lipschitz. Let p j be the j-th coord<strong>in</strong>ate projection from<br />

R m <strong>to</strong> R. Def<strong>in</strong>e f j = p j ◦ f. Then f j is locally Lipschitz. Let D = {x ∈ R n : f j is differentiable<br />

at x, j = 1, 2, . . . , m}. It follows from Rademacher’s theorem, that D c is a union of f<strong>in</strong>itely many<br />

null sets, and thus has measure 0. It rema<strong>in</strong>s <strong>to</strong> show that f is differentiable at each x ∈ D.<br />

22


Indeed, fix x ∈ D and let A be the m by n matrix whose j-th row is ∇f j (x). It is not hard <strong>to</strong><br />

verify that ∇f(x) = A, <strong>in</strong>deed for ɛ > 0, choose δ > 0 so that<br />

f i (x + th) − f i (x)<br />

∣<br />

− 〈∇f i (x), h〉<br />

t<br />

∣ < √ ɛ whenever h ∈ S R n, 0 < |t| < δ.<br />

m<br />

Then for h ∈ S R n<br />

and 0 < |t| < δ, we have<br />

f(x + th) − f(x)<br />

∥ t<br />

Hence ∇f(x) = A as desired.<br />

√ √√√ m<br />

− Ah<br />

∥ = ∑<br />

(<br />

fi (x + th) − f i (x)<br />

t<br />

i=1<br />

∑<br />

< √ m ɛ 2 m = ɛ.<br />

i=1<br />

) 2<br />

− 〈∇f i (x), h〉<br />

2.5.4. Let ɛ > 0, and let h ∈ S X . Let K > 0 be chosen so that f satisfies Lipschitz constant K<br />

<strong>in</strong> a neighborhood δB r (¯x) of ¯x and ‖y‖ ≤ K. Now fix k ∈ N with ‖h k − h‖ < ɛ/4K. Now choose<br />

0 < δ < r so that<br />

(7) |f(¯x + th k ) − f(¯x) − 〈y, th k 〉| < ɛ t whenever |t| < δ.<br />

4<br />

Now for |t| < δ we have<br />

|f(¯x + th) − f(¯x) − 〈y, th〉| ≤ |f(¯x + th k ) − f(¯x) − 〈y, th k 〉| + 2K‖th − th k ‖<br />

≤<br />

ɛ 2 |t| + 2K|t| ɛ<br />

4K = ɛ|t|.<br />

This shows f is Gâteaux differentiable at ¯x with ∇f(¯x) = y. So far, we didn’t use that X is f<strong>in</strong>itedimensional.<br />

However, because X is f<strong>in</strong>ite-dimensional, and f is Lipschitz <strong>in</strong> a neighborhood of<br />

¯x, Exercise 2.2.9 implies f is Féchet-differentiable at ¯x as desired.<br />

Further comments. The reader will notice that last part of the proof of Rademacher’s theorem<br />

also proves this fact. Observe further that even the assertion f is Gâteaux differentiable at ¯x<br />

may fail for cont<strong>in</strong>uous functions (the estimate with the Lipschitz constant above was crucial).<br />

Indeed, on R 2 let f(x, y) = 0 whenever y ≤ √ |x| and f(x, y) = y − √ |x| for y ≥ √ |x|. The<br />

hypothesis of the exercise are satisfied at ¯x := (0, 0) for every direction h ∈ S R 2 with y := (0, 0)<br />

except for the direction h := (0, 1), however, f fails <strong>to</strong> be Gâteaux differentiable at (0, 0).<br />

2.5.5. Let x be a boundary po<strong>in</strong>t of C. Because C has nonempty <strong>in</strong>terior, we choose φ ∈ S X ∗ so<br />

that φ(x) = sup C φ and φ(x) > φ(y) whenever y ∈ <strong>in</strong>t C. It follows that φ ∈ ∂d C (x). Indeed, if<br />

y ∈ C, then φ(y −x) ≤ 0 = d C (y)−d C (x). If y ∉ C, then φ(y −x) ≤ <strong>in</strong>f{‖y −u‖ : φ(u) ≤ φ(x)} ≤<br />

d C (y) = d C (y) − d C (x). Because 0 ∈ d C (x), it follows that ∂d C (x) is not a s<strong>in</strong>gle<strong>to</strong>n, so d C is<br />

not differentiable at x. Thus d C is a convex function that is not differentiable at the boundary<br />

po<strong>in</strong>ts of C, and consequently, the boundary of C is both first category and Lebesgue-null.<br />

2.5.7. (a)⇒(b): g is almost everywhere differentiable by Theorem 2.5.1. On the other hand, the<br />

subgradient <strong>in</strong>equality holds on U. Together, these facts imply (b).<br />

(b)⇒(c): trivial.<br />

23


(c)⇒(a): Fix u, v <strong>in</strong> U, u ≠ v, and t ∈ (0, 1). It is not hard <strong>to</strong> see that there exist sequences (u n )<br />

<strong>in</strong> U, (v n ) <strong>in</strong> U, (t n ) <strong>in</strong> (0, 1) with u n → u, v n → v, t n → t, and x n := t n u n + (1 − t n )v n ∈ A, for<br />

every n. By assumption, ∇g(x n )(u n − x n ) ≤ g(u n ) − g(x n ) and ∇g(x n )(v n − x n ) ≤ g(v n ) − g(x n ).<br />

Equivalently, (1 − t n )∇g(x n )(v n − u n ) ≤ g(u n ) − g(x n ) and t n ∇g(x n )(u n − v n ) ≤ g(v n ) − g(x n ).<br />

Multiply the former <strong>in</strong>equality by t n , the latter by 1 − t n and add<strong>in</strong>g we obta<strong>in</strong><br />

0 ≤ t n g(u n ) − t n (g(x n ) + (1 − t n )g(v n ) − (1 − t n )g(x n ),<br />

or <strong>in</strong> other words g(x n ) ≤ t n g(u n ) + (1 − t n )g(v n ). Now let n tend <strong>to</strong> +∞, and deduce that<br />

g ( tu + (1 − t)v ) ≤ tg(u) + (1 − t)g(v). The convexity of g follows and the proof is complete.<br />

2.5.8. Let f n (x) := n[f(x + 1/n) − f(x)]. Let G := {x : f ′ (x) exists}; then f n (x) → f ′ (x) for<br />

x ∈ G. Let F n := {x : |f j (x) − f k (x)| ≤ 1/2} for all j, k ≥ n. Then F n is closed because it<br />

is an <strong>in</strong>tersection of closed sets s<strong>in</strong>ce f j − f k is cont<strong>in</strong>uous for each k, j ∈ N. Clearly (f n (x)) is<br />

convergent for x ∈ G and so ⋃ F n ⊃ G. Suppose F n0 conta<strong>in</strong>s an open <strong>in</strong>terval I for some n 0 ∈ N.<br />

Then |f n0 (x) − f ′ (x)| ≤ 1/2 almost everywhere on I. By the Fundamental theorem of calculus,<br />

f ′ (x) = χ S − χ R\S almost everywhere, and so f ′ (x) = 1 on a dense subset of I and f ′ (x) = −1<br />

on another dense subset of I we conclude that f n0 ≥ 1/2 on a dense subset of I, and f n0 ≤ −1/2<br />

on a dense subset of I <strong>to</strong> contradict the cont<strong>in</strong>uity of f n0 . Hence F n is nowhere dense for each<br />

n ∈ N and so G is a set of first category.<br />

<strong>Exercises</strong> from Section 2.7<br />

2.7.1. Note that given a set S, conv(S) is the collection of convex comb<strong>in</strong>ations of elements <strong>in</strong><br />

S, that is element of the form ∑ m<br />

i=1 λ is i where ∑ m<br />

i=1 λ i = 1 and λ i ≥ 0 for all 1 ≤ i ≤ m.<br />

By the extreme value theorem, f atta<strong>in</strong>s its maximum at some ¯x ∈ C. By M<strong>in</strong>kowski’s theorem<br />

(2.7.2), we may write ¯x = ∑ m<br />

i=1 λ ix i where x i is an extreme po<strong>in</strong>t of C, λ i ≥ 0 for 1 ≤ i ≤ m<br />

and ∑ m<br />

i=1 λ i = 1. By the convexity of f, f(¯x) ≤ ∑ m<br />

i=1 λ if(x i ). Because f atta<strong>in</strong>s it maximum at<br />

¯x, this implies f(x i ) = f(¯x) for each 1 ≤ i ≤ n.<br />

The sketch of the basic convex hull fact is as follows. Because conv(S) is the smallest convex set<br />

conta<strong>in</strong><strong>in</strong>g S, we know it conta<strong>in</strong>s every convex comb<strong>in</strong>ation ∑ m<br />

i=1 λ is i where s i ∈ S and λ i ≥ 0<br />

for 1 ≤ i ≤ n and ∑ m<br />

i=1 λ i = 1. Conversely, let C be the collection of all convex comb<strong>in</strong>ations<br />

of elements of S. We will verify C is convex, and thus C = conv(S). Indeed, let x and y be<br />

convex comb<strong>in</strong>ations of elements of S, say x = ∑ m<br />

i=1 α iu i and y = ∑ n<br />

i=1 β iv i , then for 0 ≤ λ ≤ 1,<br />

it follows that ∑ m<br />

i=1 λα i + ∑ n<br />

i=1 (1 − λ)β i = 1 and so λx + (1 − λ)y is a convex comb<strong>in</strong>ation of<br />

elements of S, which means λx + (1 − λ)y ∈ C as desired.<br />

2.7.2.(Exposed Po<strong>in</strong>ts)<br />

(a) Suppose x 0 is an exposed po<strong>in</strong>t of a convex set C. Choose φ ∈ E such that 〈φ, x 0 〉 = sup C φ<br />

and 〈φ, x〉 < ∠φ, x 0 〉 for all x ∈ C \ {x 0 }. Let x, y ∈ C \ {x 0 }. Then for any 0 ≤ λ ≤ 1,<br />

φ(λx + (1 − λ)y) = λφ(x) + (1 − λ)φ(y) < φ(x 0 ).<br />

Thus x 0 is not a convex comb<strong>in</strong>ation of x and y.<br />

(b) Let C be a compact convex subset of E. Suppose x 0 ∈ C is an exposed po<strong>in</strong>t, exposed by<br />

φ ∈ E. Now suppose (x n ) ⊂ C is a sequence such that φ(x n ) → φ(x 0 ), but x n →→ x 0 . By<br />

the compactness of C, there is a convergent subsequence of (x n ) such that (x nk ) → ¯x where<br />

¯x ≠ x 0 and ¯x ∈ C. Then φ(¯x) = lim φ(x n ) = φ(x 0 ). This contradicts the fact φ exposed x 0<br />

<strong>in</strong> C.<br />

24


(c) A proof that the exposed po<strong>in</strong>ts are dense <strong>in</strong> the extreme po<strong>in</strong>ts of a compact convex (or<br />

any closed convex set <strong>in</strong> E) can be found <strong>in</strong> [369, Theorem 18.6, p. 167–68], the proof uses<br />

the basic separation theorem (2.1.21) and Carathéodory’s theorem (1.2.5). We will outl<strong>in</strong>e<br />

another proof that every compact convex subset of E is the closed convex hull of its strongly<br />

exposed po<strong>in</strong>ts that mimics techniques that will be used <strong>in</strong> Section 6.6.<br />

Indeed, let C be a compact convex subset of E. Then σ C : E → R is a cont<strong>in</strong>uous convex<br />

function. By Theorem 2.5.1, σ C is differentiable on a dense subset of E. Now let D be<br />

the closed convex hull of the exposed po<strong>in</strong>ts of C. Suppose by way of contradiction that<br />

D ≠ C, that is, we fix ¯x ∈ C \ D. Accord<strong>in</strong>g <strong>to</strong> the basic separation theorem (2.1.21),<br />

we choose y ∈ E such that 〈y, ¯x〉 > sup D y. Because D is bounded, and the po<strong>in</strong>ts of<br />

differentiability of σ C are dense <strong>in</strong> E, we may choose φ ∈ E such that σ C is differentiable<br />

at φ, and φ(¯x) > sup D φ.<br />

Now let x 0 ∈ E = ∇σ C (φ) and so ∂σ C (φ) = {x 0 }. It is easy <strong>to</strong> check that φ(x 0 ) = σ C φ.<br />

Indeed, 〈x 0 , 2φ − φ ≤ σ C (2φ) − σ C (φ) = σ C (φ) and 〈x 0 , φ − 0 ≥ σ C (φ) − σ C (0) = −σ C (φ).<br />

Thus φ(x 0 ) = σ C (φ). Further, x 0 ∈ C, for otherwise we would use the basic separation<br />

theorem (2.1.21) <strong>to</strong> f<strong>in</strong>d Λ ∈ E so that Λ(x 0 ) > σ C (Λ). This provides the immediate<br />

contradiction 〈x, Λ − φ) > σ C (Λ) − σ C (φ). F<strong>in</strong>ally, if u ∈ C is such that φ(u) = σ C (φ), then<br />

〈u, Λ − σ〉 ≤ σ C (Λ) − σ C (φ)<br />

and so u ∈ ∂σ C (φ) = {x 0 }. Thus φ atta<strong>in</strong>s its supremum on C uniquely at x 0 , and thus we<br />

have the contradiction that x 0 is an exposed po<strong>in</strong>t of C which is not <strong>in</strong> D.<br />

Further notes. The set C := {(x, y) ∈ R 2 : −1 ≤ x ≤ 1, − √ 1 − x 2 ≤ y ≤ 1 + √ 1 − x 2 } is a<br />

compact convex set that is not the convex hull of its exposed po<strong>in</strong>ts. Indeed, any exposed po<strong>in</strong>t<br />

(x, y) of C satisfies |x| < 1. Thus the closure is needed <strong>in</strong> (c).<br />

25

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!