
EE236A (Fall 2013-14)

Lecture 2
Piecewise-linear optimization

• piecewise-linear minimization
• l1- and l∞-norm approximation
• examples
• modeling software


Linear and affine functions

linear function: a function f : R^n → R is linear if

    f(αx + βy) = αf(x) + βf(y)    ∀x, y ∈ R^n, α, β ∈ R

property: f is linear if and only if f(x) = a^T x for some a

affine function: a function f : R^n → R is affine if

    f(αx + (1−α)y) = αf(x) + (1−α)f(y)    ∀x, y ∈ R^n, α ∈ R

property: f is affine if and only if f(x) = a^T x + b for some a, b


Piecewise-linear function

f : R^n → R is (convex) piecewise-linear if it can be expressed as

    f(x) = max_{i=1,...,m} (a_i^T x + b_i)

f is parameterized by m n-vectors a_i and m scalars b_i

[figure: graph of a piecewise-linear f(x), the pointwise maximum of the affine functions a_i^T x + b_i]

(the term piecewise-affine is more accurate but less common)


Piecewise-linear minimization

    minimize f(x) = max_{i=1,...,m} (a_i^T x + b_i)

• equivalent LP (with variables x and auxiliary scalar variable t)

    minimize    t
    subject to  a_i^T x + b_i ≤ t,  i = 1,...,m

  to see the equivalence, note that for fixed x the optimal t is t = f(x)

• LP in matrix notation: minimize c̃^T x̃ subject to Ãx̃ ≤ b̃ with

    x̃ = (x, t),  c̃ = (0, 1)

    Ã = [ a_1^T  −1 ]       b̃ = [ −b_1 ]
        [  ...   .. ]            [  ... ]
        [ a_m^T  −1 ]            [ −b_m ]
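This reduction maps directly to solver input. A minimal sketch (not from the original slides), assuming the data are stacked in an m × n matrix A with rows a_i^T and an m-vector b, using MATLAB's linprog (requires the Optimization Toolbox):

    % rewrite a_i'*x + b_i <= t as a_i'*x - t <= -b_i
    [m, n] = size(A);
    c_tilde = [zeros(n, 1); 1];      % objective picks out t
    A_tilde = [A, -ones(m, 1)];      % rows [a_i', -1]
    b_tilde = -b;
    xt = linprog(c_tilde, A_tilde, b_tilde);
    x = xt(1:n);  t = xt(n+1);       % at the optimum, t = f(x)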


Minimizing a sum of piecewise-linear functions

    minimize f(x) + g(x) = max_{i=1,...,m} (a_i^T x + b_i) + max_{i=1,...,p} (c_i^T x + d_i)

• cost function is piecewise-linear: maximum of mp affine functions

    f(x) + g(x) = max_{i=1,...,m; j=1,...,p} ((a_i + c_j)^T x + (b_i + d_j))

• equivalent LP with m + p inequalities

    minimize    t_1 + t_2
    subject to  a_i^T x + b_i ≤ t_1,  i = 1,...,m
                c_i^T x + d_i ≤ t_2,  i = 1,...,p

  note that for fixed x, the optimal t_1, t_2 are t_1 = f(x), t_2 = g(x)


• equivalent LP in matrix notation

    minimize    c̃^T x̃
    subject to  Ãx̃ ≤ b̃

with

    x̃ = (x, t_1, t_2),  c̃ = (0, 1, 1)

    Ã = [ a_1^T  −1   0 ]       b̃ = [ −b_1 ]
        [  ...   ..  .. ]            [  ... ]
        [ a_m^T  −1   0 ]            [ −b_m ]
        [ c_1^T   0  −1 ]            [ −d_1 ]
        [  ...   ..  .. ]            [  ... ]
        [ c_p^T   0  −1 ]            [ −d_p ]
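Modeling tools (see the end of this lecture) accept the max-of-affine form directly and carry out the conversion to this LP automatically. A minimal CVX sketch, assuming the data are stacked as A ∈ R^{m×n}, C ∈ R^{p×n} (rows a_i^T, c_i^T) and vectors bb, dd (hypothetical names, to avoid clashing with the LP data b̃):

    cvx_begin
        variable x(n)
        minimize( max(A*x + bb) + max(C*x + dd) )   % max of a vector is convex in CVX
    cvx_end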


l∞-Norm (Chebyshev) approximation

with A ∈ R^{m×n}, b ∈ R^m:

    minimize ‖Ax − b‖_∞

• l∞-norm (Chebyshev norm) of an m-vector y is

    ‖y‖_∞ = max_{i=1,...,m} |y_i| = max_{i=1,...,m} max{y_i, −y_i}

• equivalent LP (with variables x and auxiliary scalar variable t)

    minimize    t
    subject to  −t1 ≤ Ax − b ≤ t1

  (for fixed x, the optimal t is t = ‖Ax − b‖_∞)


• equivalent LP in matrix notation

    minimize    [ 0 ]^T [ x ]
                [ 1 ]   [ t ]

    subject to  [  A   −1 ] [ x ]  ≤  [  b ]
                [ −A   −1 ] [ t ]     [ −b ]
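A minimal linprog sketch of this LP, assuming A and b are given; the stacked constraint matrix is exactly the one above:

    [m, n] = size(A);
    c = [zeros(n, 1); 1];
    G = [ A, -ones(m, 1); -A, -ones(m, 1)];   % [A -1; -A -1]
    h = [ b; -b];
    xt = linprog(c, G, h);
    x = xt(1:n);       % optimal value: xt(n+1) = norm(A*x - b, inf)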


l1-Norm approximation

    minimize ‖Ax − b‖_1

• l1-norm of an m-vector y is

    ‖y‖_1 = ∑_{i=1}^m |y_i| = ∑_{i=1}^m max{y_i, −y_i}

• equivalent LP (with variable x and auxiliary vector variable u)

    minimize    ∑_{i=1}^m u_i
    subject to  −u ≤ Ax − b ≤ u

  (for fixed x, the optimal u is u_i = |(Ax − b)_i|, i = 1,...,m)


• equivalent LP in matrix notation

    minimize    [ 0 ]^T [ x ]
                [ 1 ]   [ u ]

    subject to  [  A   −I ] [ x ]  ≤  [  b ]
                [ −A   −I ] [ u ]     [ −b ]
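The analogous linprog sketch for the l1 problem (again assuming A and b are given); compared with the Chebyshev case, the scalar t is replaced by the m-vector u and the ones columns by identity blocks:

    [m, n] = size(A);
    c = [zeros(n, 1); ones(m, 1)];
    G = [ A, -eye(m); -A, -eye(m)];           % [A -I; -A -I]
    h = [ b; -b];
    xu = linprog(c, G, h);
    x = xu(1:n);       % at the optimum, xu(n+i) = |(A*x - b)_i|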


Comparison with least-squares solution

histograms of the residuals Ax − b, with randomly generated A ∈ R^{200×80}, for

    x_ls = argmin ‖Ax − b‖,    x_l1 = argmin ‖Ax − b‖_1

[two histograms over (−1.5, 1.5): the least-squares residuals (Ax_ls − b)_k and the l1 residuals (Ax_l1 − b)_k]

l1-norm distribution is wider, with a high peak at zero
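A sketch of how such an experiment can be reproduced; the slide does not state the distribution used, so standard normal entries for A and b are an assumption here:

    A = randn(200, 80);  b = randn(200, 1);
    x_ls = A \ b;                        % least-squares solution
    cvx_begin
        variable x_l1(80)
        minimize( norm(A*x_l1 - b, 1) )
    cvx_end
    hist(A*x_ls - b, 40);  figure;       % histogram of least-squares residuals
    hist(A*x_l1 - b, 40);                % histogram of l1 residuals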


Robust curve fitting

• fit an affine function f(t) = α + βt to m points (t_i, y_i)

• an approximation problem Ax ≈ b with

    A = [ 1  t_1 ]       x = [ α ]       b = [ y_1 ]
        [ .   .  ]           [ β ]           [  .  ]
        [ 1  t_m ]                           [ y_m ]

[figure: data points with two fitted lines, f(t) versus t]

• dashed: minimize ‖Ax − b‖
• solid: minimize ‖Ax − b‖_1

l1-norm approximation is more robust against outliers
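A minimal CVX sketch of the robust (l1) fit, assuming the data points are given as m-vectors t and y:

    A = [ones(m, 1), t];                 % columns: intercept, slope
    cvx_begin
        variable x(2)                    % x = [alpha; beta]
        minimize( norm(A*x - y, 1) )
    cvx_end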


Sparse signal recovery via l1-norm minimization

• x̂ ∈ R^n is an unknown signal, known to be very sparse

• we make linear measurements y = Ax̂ with A ∈ R^{m×n}, m < n

estimation by l1-norm minimization: compute an estimate by solving

    minimize    ‖x‖_1
    subject to  Ax = y

estimate is the signal with smallest l1-norm consistent with the measurements

equivalent LP (variables x, u ∈ R^n)

    minimize    1^T u
    subject to  −u ≤ x ≤ u
                Ax = y
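A minimal CVX sketch, assuming the measurement matrix A and measurement vector y are given; CVX performs the conversion to the LP with the auxiliary variable u internally:

    cvx_begin
        variable x(n)
        minimize( norm(x, 1) )
        subject to
            A*x == y
    cvx_end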


Example

• exact signal x̂ ∈ R^1000
• 10 nonzero components

[figure: the exact signal, x̂_k versus k]

least-norm solutions (randomly generated A ∈ R^{100×1000})

[figures: the minimum l2-norm solution and the minimum l1-norm solution, x_k versus k]

l1-norm estimate is exact


Exact recovery

when are the following problems equivalent?

    minimize    card(x)                minimize    ‖x‖_1
    subject to  Ax = y                 subject to  Ax = y

• card(x) is the cardinality (number of nonzero components) of x

• the answer depends on A and on the cardinality of the sparsest solution of Ax = y

we say A allows exact recovery of k-sparse vectors if

    x̂ = argmin_{Ax=y} ‖x‖_1    whenever y = Ax̂ and card(x̂) ≤ k

• here, argmin ‖x‖_1 denotes the unique minimizer

• a property of (the nullspace of) the ‘measurement matrix’ A


‘Nullspace condition’ for exact recovery

necessary and sufficient condition for exact recovery of k-sparse vectors [1]:

    |z_(1)| + ··· + |z_(k)| < (1/2)‖z‖_1    ∀z ∈ nullspace(A) \ {0}

here, z_(i) denotes the component z_i in order of decreasing magnitude:

    |z_(1)| ≥ |z_(2)| ≥ ··· ≥ |z_(n)|

• a bound on how ‘concentrated’ nonzero vectors in nullspace(A) can be

• implies k < n/2

• difficult to verify for general A

• holds with high probability for certain distributions of random A

[1] Feuer & Nemirovski (IEEE Trans. IT, 2003) and several other papers on compressed sensing.
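In the special case of a one-dimensional nullspace (for example, m = n − 1 with A full rank), the condition only needs to be checked on a single spanning vector z, since both sides are invariant under scaling of z. A minimal sketch under that assumption, with k the target sparsity:

    z = null(A);                             % one column spanning nullspace(A)
    zs = sort(abs(z), 'descend');
    ok = sum(zs(1:k)) < 0.5 * norm(z, 1);    % nullspace condition for k-sparse recovery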


Proof of nullspace condition

notation

• x has support I ⊆ {1, 2, ..., n} if x_i = 0 for i ∉ I

• |I| is the number of elements in I

• P_I is the projection matrix onto n-vectors with support I: P_I is diagonal with

    (P_I)_jj = { 1   j ∈ I
               { 0   otherwise

• A satisfies the nullspace condition if

    ‖P_I z‖_1 < (1/2)‖z‖_1

for all nonzero z in nullspace(A) and for all support sets I with |I| ≤ k


sufficiency: suppose A satisfies the nullspace condition

• let x̂ be k-sparse with support I (i.e., with P_I x̂ = x̂); define y = Ax̂

• consider any feasible x (i.e., satisfying Ax = y), different from x̂

• define z = x − x̂; this is a nonzero vector in nullspace(A), and

    ‖x‖_1 = ‖x̂ + z‖_1
          ≥ ‖x̂ + z − P_I z‖_1 − ‖P_I z‖_1
          = ∑_{k∈I} |x̂_k| + ∑_{k∉I} |z_k| − ‖P_I z‖_1
          = ‖x̂‖_1 + ‖z‖_1 − 2‖P_I z‖_1
          > ‖x̂‖_1

(line 2 is the triangle inequality; the last line is the nullspace condition)

therefore x̂ = argmin_{Ax=y} ‖x‖_1


necessity: suppose A does not satisfy the nullspace condition

• then for some nonzero z ∈ nullspace(A) and support set I with |I| ≤ k,

    ‖P_I z‖_1 ≥ (1/2)‖z‖_1

• define a k-sparse vector x̂ = −P_I z and y = Ax̂

• the vector x = x̂ + z satisfies Ax = y and has l1-norm

    ‖x‖_1 = ‖−P_I z + z‖_1
          = ‖z‖_1 − ‖P_I z‖_1
          ≤ 2‖P_I z‖_1 − ‖P_I z‖_1
          = ‖x̂‖_1

therefore x̂ is not the unique l1-minimizer


Linear classification

• given a set of points {v_1, ..., v_N} with binary labels s_i ∈ {−1, 1}

• find a hyperplane that strictly separates the two classes:

    a^T v_i + b > 0 if s_i = 1
    a^T v_i + b < 0 if s_i = −1

the strict inequalities are homogeneous in a, b (any positive multiple of a solution is again a solution), hence equivalent to the linear inequalities (in a, b)

    s_i (a^T v_i + b) ≥ 1,  i = 1,...,N


Approximate linear separation of non-separable sets

    minimize ∑_{i=1}^N max{0, 1 − s_i(a^T v_i + b)}

• penalty 1 − s_i(a^T v_i + b) for misclassifying point v_i

• can be interpreted as a heuristic for minimizing the number of misclassified points

• a piecewise-linear minimization problem with variables a, b


equivalent LP (variables a ∈ R^n, b ∈ R, u ∈ R^N)

    minimize    ∑_{i=1}^N u_i
    subject to  1 − s_i(v_i^T a + b) ≤ u_i,  i = 1,...,N
                u_i ≥ 0,  i = 1,...,N

in matrix notation: minimize c̃^T x̃ subject to Ãx̃ ≤ b̃, with x̃ = (a, b, u), c̃ = (0, 0, 1) (blocks conforming to a, b, u), and

    Ã = [ −s_1 v_1^T  −s_1  −1   0  ···   0 ]       b̃ = [ −1 ]
        [ −s_2 v_2^T  −s_2   0  −1  ···   0 ]            [ −1 ]
        [     ...      ...   .   .  ...   . ]            [ ... ]
        [ −s_N v_N^T  −s_N   0   0  ···  −1 ]            [ −1 ]
        [      0        0   −1   0  ···   0 ]            [  0 ]
        [      0        0    0  −1  ···   0 ]            [  0 ]
        [     ...      ...   .   .  ...   . ]            [ ... ]
        [      0        0    0   0  ···  −1 ]            [  0 ]
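A minimal CVX sketch, assuming the points are stacked as rows of an N × n matrix V (a hypothetical name) and s ∈ {−1, 1}^N is the label vector; pos(z) = max{z, 0} is CVX's built-in hinge function:

    cvx_begin
        variables a(n) b
        minimize( sum(pos(1 - s .* (V*a + b))) )
    cvx_end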


Modeling software

modeling tools simplify the formulation of LPs (and other problems)

• accept an optimization problem in standard notation (max, ‖·‖_1, ...)

• recognize problems that can be converted to LPs

• express the problem in the input format required by a specific LP solver

examples of modeling packages

• AMPL, GAMS
• CVX, YALMIP (MATLAB)
• CVXPY, Pyomo, CVXOPT (Python)


CVX example

    minimize    ‖Ax − b‖_1
    subject to  0 ≤ x_k ≤ 1,  k = 1,...,n

MATLAB code

    cvx_begin
        variable x(n);
        minimize( norm(A*x - b, 1) )
        subject to
            x >= 0
            x <= 1
    cvx_end
