
EE236A (Fall 2013-14)

Lecture 2
Piecewise-linear optimization

• piecewise-linear minimization
• ℓ1- and ℓ∞-norm approximation
• examples
• modeling software


Linear and affine functions

linear function: a function f : R^n → R is linear if

    f(αx + βy) = αf(x) + βf(y)    ∀x, y ∈ R^n, α, β ∈ R

property: f is linear if and only if f(x) = a^T x for some a

affine function: a function f : R^n → R is affine if

    f(αx + (1−α)y) = αf(x) + (1−α)f(y)    ∀x, y ∈ R^n, α ∈ R

property: f is affine if and only if f(x) = a^T x + b for some a, b


Piecewise-linear function

f : R^n → R is (convex) piecewise-linear if it can be expressed as

    f(x) = max_{i=1,...,m} (a_i^T x + b_i)

f is parameterized by m n-vectors a_i and m scalars b_i

[figure: graph of f(x) as the pointwise maximum of the affine functions a_i^T x + b_i]

(the term piecewise-affine is more accurate but less common)


Piecewise-linear minimization

    minimize f(x) = max_{i=1,...,m} (a_i^T x + b_i)

• equivalent LP (with variables x and auxiliary scalar variable t)

    minimize    t
    subject to  a_i^T x + b_i ≤ t,   i = 1,...,m

  to see the equivalence, note that for fixed x the optimal t is t = f(x)

• LP in matrix notation: minimize c̃^T x̃ subject to Ãx̃ ≤ b̃ with

    x̃ = [x; t],   c̃ = [0; 1],   Ã = [a_1^T  −1; ... ; a_m^T  −1],   b̃ = [−b_1; ... ; −b_m]
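the sketch below is an illustration, not part of the original slides: it builds c̃, Ã, b̃ for a small hypothetical instance (chosen so that f is bounded below) and solves the LP with MATLAB's linprog

    % illustrative data: rows of A are the a_i', chosen so that
    % f(x) = max_i (a_i'*x + b_i) is bounded below (here f(x) >= |x1| + |x2| + 0.1)
    A = [1 1; -1 1; 1 -1; -1 -1];
    b = [0.1; 0.2; 0.3; 0.4];
    [m, n] = size(A);
    c_t = [zeros(n, 1); 1];        % c~ : minimize t
    A_t = [A, -ones(m, 1)];        % A~ : a_i'*x + b_i <= t  <=>  [a_i' -1]*[x; t] <= -b_i
    b_t = -b;                      % b~
    xt  = linprog(c_t, A_t, b_t);  % xt(1:n) is the minimizer x, xt(end) is t = f(x)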


Minimizing a sum of piecewise-linear functions

    minimize f(x) + g(x) = max_{i=1,...,m} (a_i^T x + b_i) + max_{i=1,...,p} (c_i^T x + d_i)

• cost function is piecewise-linear: maximum of mp affine functions

    f(x) + g(x) = max_{i=1,...,m; j=1,...,p} ((a_i + c_j)^T x + (b_i + d_j))

• equivalent LP with m + p inequalities

    minimize    t_1 + t_2
    subject to  a_i^T x + b_i ≤ t_1,   i = 1,...,m
                c_i^T x + d_i ≤ t_2,   i = 1,...,p

  note that for fixed x, the optimal t_1, t_2 are t_1 = f(x), t_2 = g(x)


• equivalent LP in matrix notation

    minimize    c̃^T x̃
    subject to  Ãx̃ ≤ b̃

with

    x̃ = [x; t_1; t_2],   c̃ = [0; 1; 1],

    Ã = [ a_1^T  −1   0
           ...
          a_m^T  −1   0
          c_1^T   0  −1
           ...
          c_p^T   0  −1 ],   b̃ = [−b_1; ... ; −b_m; −d_1; ... ; −d_p]
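as a hedged sketch (the data A, b, C, d and the dimension n are assumptions, not from the slides): a modeling tool such as CVX accepts the sum of the two max terms directly and introduces the auxiliary variables t_1, t_2 internally when converting to an LP

    cvx_begin
        variable x(n)
        minimize( max(A*x + b) + max(C*x + d) )
    cvx_end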


ℓ∞-norm (Chebyshev) approximation

with A ∈ R^{m×n}, b ∈ R^m:

    minimize ‖Ax − b‖_∞

• the ℓ∞-norm (Chebyshev norm) of an m-vector y is

    ‖y‖_∞ = max_{i=1,...,m} |y_i| = max_{i=1,...,m} max{y_i, −y_i}

• equivalent LP (with variables x and auxiliary scalar variable t)

    minimize    t
    subject to  −t·1 ≤ Ax − b ≤ t·1

  (for fixed x, the optimal t is t = ‖Ax − b‖_∞; 1 denotes the vector of ones)


• equivalent LP in matrix notation

    minimize    [0; 1]^T [x; t]
    subject to  [ A  −1; −A  −1 ] [x; t] ≤ [b; −b]

  (here 1 is the m-vector of ones)
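a minimal sketch of this block-matrix form with MATLAB's linprog, assuming A and b are given:

    [m, n] = size(A);
    f   = [zeros(n, 1); 1];
    Ain = [ A, -ones(m, 1);        %   Ax - b <= t*1
           -A, -ones(m, 1)];       % -(Ax - b) <= t*1
    bin = [b; -b];
    sol = linprog(f, Ain, bin);
    x = sol(1:n); t = sol(end);    % at the optimum, t = norm(A*x - b, inf)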


ℓ1-norm approximation

    minimize ‖Ax − b‖_1

• the ℓ1-norm of an m-vector y is

    ‖y‖_1 = ∑_{i=1}^m |y_i| = ∑_{i=1}^m max{y_i, −y_i}

• equivalent LP (with variable x and auxiliary vector variable u)

    minimize    ∑_{i=1}^m u_i
    subject to  −u ≤ Ax − b ≤ u

  (for fixed x, the optimal u is u_i = |(Ax − b)_i|, i = 1,...,m)


• equivalent LP in matrix notation

    minimize    [0; 1]^T [x; u]
    subject to  [ A  −I; −A  −I ] [x; u] ≤ [b; −b]
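the analogous linprog sketch for the ℓ1 problem (A, b assumed given):

    [m, n] = size(A);
    f   = [zeros(n, 1); ones(m, 1)];  % minimize 1'*u
    Ain = [ A, -eye(m);               %   Ax - b <= u
           -A, -eye(m)];              % -(Ax - b) <= u
    bin = [b; -b];
    sol = linprog(f, Ain, bin);
    x = sol(1:n); u = sol(n+1:end);   % at the optimum, u = abs(A*x - b)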


Comparison with least-squares solution

histograms of residuals Ax − b, with randomly generated A ∈ R^{200×80}, for

    x_ls = argmin ‖Ax − b‖,    x_l1 = argmin ‖Ax − b‖_1

[figure: histograms of the residuals (Ax_ls − b)_k and (Ax_l1 − b)_k]

the ℓ1-norm residual distribution is wider, with a high peak at zero
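a sketch of this type of experiment (the dimensions match the slide, but the random instance itself is illustrative, not the original's):

    A = randn(200, 80); b = randn(200, 1);
    x_ls = A \ b;                     % least-squares solution
    cvx_begin quiet
        variable x_l1(80)
        minimize( norm(A*x_l1 - b, 1) )
    cvx_end
    hist(A*x_ls - b, 40); figure; hist(A*x_l1 - b, 40);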


Robust curve fitting

• fit affine function f(t) = α + βt to m points (t_i, y_i)

• an approximation problem Ax ≈ b with

    A = [1  t_1; ... ; 1  t_m],   x = [α; β],   b = [y_1; ... ; y_m]

[figure: data points (t_i, y_i) and two fitted lines f(t) versus t]

• dashed: minimize ‖Ax − b‖
• solid: minimize ‖Ax − b‖_1

ℓ1-norm approximation is more robust against outliers
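a minimal sketch (t, y assumed given as m-vectors of data): build A, b as above and compute both fits

    A = [ones(length(t), 1), t(:)];   % rows [1 t_i]
    b = y(:);
    x_ls = A \ b;                     % least-squares fit [alpha; beta]
    cvx_begin quiet
        variable x_l1(2)
        minimize( norm(A*x_l1 - b, 1) )
    cvx_end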


Sparse signal recovery via ℓ1-norm minimization

• x̂ ∈ R^n is unknown signal, known to be very sparse

• we make linear measurements y = Ax̂ with A ∈ R^{m×n}, m < n

estimation by ℓ1-norm minimization: compute estimate by solving

    minimize    ‖x‖_1
    subject to  Ax = y

estimate is the signal with smallest ℓ1-norm consistent with the measurements

equivalent LP (variables x, u ∈ R^n)

    minimize    1^T u
    subject to  −u ≤ x ≤ u
                Ax = y
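a sketch of this LP via linprog (A, y assumed given), using the stacked variable [x; u] and the equality constraint Ax = y:

    [m, n] = size(A);
    f   = [zeros(n, 1); ones(n, 1)];  % minimize 1'*u
    Ain = [ eye(n), -eye(n);          %   x <= u
           -eye(n), -eye(n)];         %  -x <= u
    bin = zeros(2*n, 1);
    Aeq = [A, zeros(m, n)]; beq = y;
    sol = linprog(f, Ain, bin, Aeq, beq);
    x = sol(1:n);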


Example

• exact signal x̂ ∈ R^{1000}

• 10 nonzero components

[figure: exact signal x̂_k versus k]

least-norm solutions (randomly generated A ∈ R^{100×1000})

[figure: minimum ℓ2-norm solution and minimum ℓ1-norm solution, x_k versus k]

ℓ1-norm estimate is exact
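a sketch of this type of experiment (the random instance is illustrative, not the slide's own data); for these dimensions the ℓ1 estimate is typically exact, as on the slide

    n = 1000; m = 100; k = 10;
    xhat = zeros(n, 1);
    p = randperm(n); xhat(p(1:k)) = randn(k, 1);   % 10 nonzero components
    A = randn(m, n); y = A * xhat;
    x_l2 = pinv(A) * y;               % minimum l2-norm solution
    cvx_begin quiet
        variable x_l1(n)
        minimize( norm(x_l1, 1) )
        subject to
            A * x_l1 == y;
    cvx_end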


Exact recovery

when are the following problems equivalent?

    minimize    card(x)            minimize    ‖x‖_1
    subject to  Ax = y             subject to  Ax = y

• card(x) is the cardinality (number of nonzero components) of x

• the answer depends on A and on the cardinality of the sparsest solution of Ax = y

we say A allows exact recovery of k-sparse vectors if

    x̂ = argmin_{Ax=y} ‖x‖_1   whenever y = Ax̂ and card(x̂) ≤ k

• here, argmin ‖x‖_1 denotes the unique minimizer

• a property of the nullspace of the 'measurement matrix' A


'Nullspace condition' for exact recovery

necessary and sufficient condition for exact recovery of k-sparse vectors¹:

    |z_(1)| + ··· + |z_(k)| < (1/2) ‖z‖_1   ∀z ∈ nullspace(A) \ {0}

here, z_(i) denotes the component z_i in order of decreasing magnitude:

    |z_(1)| ≥ |z_(2)| ≥ ··· ≥ |z_(n)|

• a bound on how 'concentrated' nonzero vectors in nullspace(A) can be

• implies k < n/2

• difficult to verify for general A

• holds with high probability for certain distributions of random A

¹ Feuer & Nemirovski (IEEE Trans. IT, 2003) and several other papers on compressed sensing.
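an illustrative check for a single vector (z assumed to be a nonzero vector in nullspace(A)); verifying the condition for all such z is exactly the hard part noted above

    zs = sort(abs(z), 'descend');
    ok = sum(zs(1:k)) < 0.5 * norm(z, 1);   % |z_(1)|+...+|z_(k)| < (1/2)*||z||_1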


Proof of nullspace condition

notation

• x has support I ⊆ {1,2,...,n} if x_i = 0 for i ∉ I

• |I| is the number of elements in I

• P_I is the projection matrix onto n-vectors with support I: P_I is diagonal with

    (P_I)_jj = 1 if j ∈ I,   (P_I)_jj = 0 otherwise

• A satisfies the nullspace condition if

    ‖P_I z‖_1 < (1/2) ‖z‖_1

  for all nonzero z in nullspace(A) and for all support sets I with |I| ≤ k


sufficiency: suppose A satisfies the nullspace condition

• let x̂ be k-sparse with support I (i.e., with P_I x̂ = x̂); define y = Ax̂

• consider any feasible x (i.e., satisfying Ax = y), different from x̂

• define z = x − x̂; this is a nonzero vector in nullspace(A)

    ‖x‖_1 = ‖x̂ + z‖_1
          ≥ ‖x̂ + z − P_I z‖_1 − ‖P_I z‖_1
          = ∑_{i∈I} |x̂_i| + ∑_{i∉I} |z_i| − ‖P_I z‖_1
          = ‖x̂‖_1 + ‖z‖_1 − 2 ‖P_I z‖_1
          > ‖x̂‖_1

(line 2 is the triangle inequality; the last line is the nullspace condition)

therefore x̂ = argmin_{Ax=y} ‖x‖_1


necessity: suppose A does not satisfy the nullspace condition

• for some nonzero z ∈ nullspace(A) and support set I with |I| ≤ k,

    ‖P_I z‖_1 ≥ (1/2) ‖z‖_1

• define a k-sparse vector x̂ = −P_I z and y = Ax̂

• the vector x = x̂ + z satisfies Ax = y and has ℓ1-norm

    ‖x‖_1 = ‖−P_I z + z‖_1
          = ‖z‖_1 − ‖P_I z‖_1
          ≤ 2 ‖P_I z‖_1 − ‖P_I z‖_1
          = ‖P_I z‖_1 = ‖x̂‖_1

therefore x̂ is not the unique ℓ1-minimizer


Linear classification

• given a set of points {v_1,...,v_N} with binary labels s_i ∈ {−1, 1}

• find a hyperplane that strictly separates the two classes

    a^T v_i + b > 0 if s_i = 1,    a^T v_i + b < 0 if s_i = −1

the inequalities are homogeneous in a, b, hence equivalent to the linear inequalities (in a, b)

    s_i (a^T v_i + b) ≥ 1,   i = 1,...,N


Approximate linear separation of non-separable sets

    minimize ∑_{i=1}^N max{0, 1 − s_i (a^T v_i + b)}

• penalty 1 − s_i (a^T v_i + b) for misclassifying point v_i

• can be interpreted as a heuristic for minimizing the number of misclassified points

• a piecewise-linear minimization problem with variables a, b (a CVX sketch follows below)
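a hedged CVX sketch, assuming V is a hypothetical N × n matrix with rows v_i^T, s the N-vector of labels, and b0 the offset (the names are illustrative); pos(w) is CVX's elementwise max{0, w}

    cvx_begin
        variables a(n) b0
        minimize( sum(pos(1 - s .* (V*a + b0))) )
    cvx_end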


equivalent LP (variables a ∈ R^n, b ∈ R, u ∈ R^N)

    minimize    ∑_{i=1}^N u_i
    subject to  1 − s_i (v_i^T a + b) ≤ u_i,   i = 1,...,N
                u_i ≥ 0,   i = 1,...,N

in matrix notation:

    minimize    [0; 0; 1]^T [a; b; u]
    subject to  Ã [a; b; u] ≤ b̃

with

    Ã = [ −s_1 v_1^T   −s_1   −1    0   ···    0
          −s_2 v_2^T   −s_2    0   −1   ···    0
              ...       ...    .    .   ...    .
          −s_N v_N^T   −s_N    0    0   ···   −1
               0         0    −1    0   ···    0
               0         0     0   −1   ···    0
              ...       ...    .    .   ...    .
               0         0     0    0   ···   −1 ],   b̃ = [−1; ... ; −1; 0; ... ; 0]


Modeling software

modeling tools simplify the formulation of LPs (and other problems)

• accept optimization problem in standard notation (max, ‖·‖_1, ...)

• recognize problems that can be converted to LPs

• express the problem in the input format required by a specific LP solver

examples of modeling packages

• AMPL, GAMS

• CVX, YALMIP (MATLAB)

• CVXPY, Pyomo, CVXOPT (Python)


CVX example

    minimize    ‖Ax − b‖_1
    subject to  0 ≤ x_k ≤ 1,   k = 1,...,n

MATLAB code

    cvx_begin
        variable x(n);
        minimize( norm(A*x - b, 1) )
        subject to
            x >= 0;
            x <= 1;
    cvx_end
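a note on the mechanics (hedged): for a problem like this, CVX performs essentially the transformation derived earlier in the lecture, replacing norm(A*x - b, 1) by an auxiliary vector u with constraints -u <= A*x - b <= u, and passes the resulting LP to a numerical solver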
