EE236A (Fall 2013-14)

Lecture 2
Piecewise-linear optimization

• piecewise-linear minimization
• l1- and l∞-norm approximation
• examples
• modeling software
Linear and affine functions

linear function: a function f : R^n → R is linear if

    f(αx + βy) = αf(x) + βf(y)   for all x, y ∈ R^n, α, β ∈ R

property: f is linear if and only if f(x) = a^T x for some a

affine function: a function f : R^n → R is affine if

    f(αx + (1−α)y) = αf(x) + (1−α)f(y)   for all x, y ∈ R^n, α ∈ R

property: f is affine if and only if f(x) = a^T x + b for some a, b
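The two properties are easy to spot-check numerically; a minimal sketch with arbitrary random data (the sizes and seed are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
a, b = rng.standard_normal(n), rng.standard_normal()

def f(x):
    # affine function f(x) = a^T x + b
    return a @ x + b

x, y = rng.standard_normal(n), rng.standard_normal(n)
alpha = 0.3

# affine-combination identity: f(alpha x + (1-alpha) y) = alpha f(x) + (1-alpha) f(y)
lhs = f(alpha * x + (1 - alpha) * y)
rhs = alpha * f(x) + (1 - alpha) * f(y)
assert np.isclose(lhs, rhs)

# f is linear (not just affine) only when b = 0: for general alpha, beta
# the identity f(alpha x + beta y) = alpha f(x) + beta f(y) fails when b != 0
beta = 2.0
assert not np.isclose(f(alpha * x + beta * y), alpha * f(x) + beta * f(y))
```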
Piecewise-linear function

f : R^n → R is (convex) piecewise-linear if it can be expressed as

    f(x) = max_{i=1,...,m} (a_i^T x + b_i)

f is parameterized by m n-vectors a_i and m scalars b_i

[figure: graph of f(x) as the pointwise maximum of the affine functions a_i^T x + b_i]

(the term piecewise-affine is more accurate but less common)
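Evaluating f and spot-checking its convexity takes two lines of numpy; a sketch with illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4                           # m affine pieces on R^n (illustrative sizes)
A = rng.standard_normal((m, n))       # rows are a_i^T
b = rng.standard_normal(m)

def f(x):
    # f(x) = max_i (a_i^T x + b_i): pointwise maximum of affine functions
    return np.max(A @ x + b)

# a pointwise maximum of affine functions is convex:
# f(theta x + (1-theta) y) <= theta f(x) + (1-theta) f(y)
x, y = rng.standard_normal(n), rng.standard_normal(n)
theta = 0.7
assert f(theta * x + (1 - theta) * y) <= theta * f(x) + (1 - theta) * f(y) + 1e-12
```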
Piecewise-linear minimization

    minimize   f(x) = max_{i=1,...,m} (a_i^T x + b_i)

• equivalent LP (with variables x and auxiliary scalar variable t)

    minimize   t
    subject to a_i^T x + b_i ≤ t,   i = 1,...,m

  to see the equivalence, note that for fixed x the optimal t is t = f(x)

• LP in matrix notation: minimize c̃^T x̃ subject to Ã x̃ ≤ b̃ with

    x̃ = [x; t],   c̃ = [0; 1],   Ã = [a_1^T  −1; … ; a_m^T  −1],   b̃ = [−b_1; … ; −b_m]
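The stacked LP data can be built and checked directly; a sketch with illustrative random data, verifying that for fixed x the smallest feasible t is exactly f(x):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))       # rows a_i^T
b = rng.standard_normal(m)

# stacked LP data: xtilde = (x, t), minimize ctilde^T xtilde s.t. Atilde xtilde <= btilde
Atilde = np.hstack([A, -np.ones((m, 1))])        # rows (a_i^T, -1)
btilde = -b                                      # entries -b_i
ctilde = np.concatenate([np.zeros(n), [1.0]])

# for fixed x the constraints read t >= a_i^T x + b_i for every i,
# so the smallest feasible t is f(x) = max_i (a_i^T x + b_i)
x = rng.standard_normal(n)
t_min = np.max(A @ x + b)
xt = np.concatenate([x, [t_min]])
assert np.all(Atilde @ xt <= btilde + 1e-9)      # (x, f(x)) is feasible
xt_bad = np.concatenate([x, [t_min - 1e-6]])     # any smaller t is infeasible
assert not np.all(Atilde @ xt_bad <= btilde + 1e-12)
# and the objective ctilde^T xtilde is just t
assert np.isclose(ctilde @ xt, t_min)
```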
Minimizing a sum of piecewise-linear functions

    minimize   f(x) + g(x) = max_{i=1,...,m} (a_i^T x + b_i) + max_{i=1,...,p} (c_i^T x + d_i)

• the cost function is piecewise-linear: the maximum of mp affine functions

    f(x) + g(x) = max_{i=1,...,m, j=1,...,p} ((a_i + c_j)^T x + (b_i + d_j))

• equivalent LP with m + p inequalities

    minimize   t_1 + t_2
    subject to a_i^T x + b_i ≤ t_1,   i = 1,...,m
               c_i^T x + d_i ≤ t_2,   i = 1,...,p

  note that for fixed x, the optimal t_1, t_2 are t_1 = f(x), t_2 = g(x)
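The max-of-mp-sums identity is easy to confirm numerically; a sketch with illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(3)
m, p, n = 4, 3, 5
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
C, d = rng.standard_normal((p, n)), rng.standard_normal(p)

x = rng.standard_normal(n)

# f(x) + g(x), computed as two separate maxima
fg = np.max(A @ x + b) + np.max(C @ x + d)

# the same value as one maximum over all m*p sums (a_i + c_j)^T x + (b_i + d_j)
pairwise = (A @ x + b)[:, None] + (C @ x + d)[None, :]   # m x p table of sums
assert np.isclose(fg, pairwise.max())
```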
• equivalent LP in matrix notation: minimize c̃^T x̃ subject to Ã x̃ ≤ b̃ with

    x̃ = [x; t_1; t_2],   c̃ = [0; 1; 1],

    Ã = [a_1^T  −1   0
           ⋮     ⋮    ⋮
         a_m^T  −1   0
         c_1^T   0  −1
           ⋮     ⋮    ⋮
         c_p^T   0  −1],   b̃ = [−b_1; … ; −b_m; −d_1; … ; −d_p]
l∞-Norm (Chebyshev) approximation

with A ∈ R^{m×n}, b ∈ R^m:

    minimize   ‖Ax − b‖_∞

• the l∞-norm (Chebyshev norm) of an m-vector y is

    ‖y‖_∞ = max_{i=1,...,m} |y_i| = max_{i=1,...,m} max{y_i, −y_i}

• equivalent LP (with variables x and auxiliary scalar variable t)

    minimize   t
    subject to −t1 ≤ Ax − b ≤ t1

  (for fixed x, the optimal t is t = ‖Ax − b‖_∞)
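A numerical sanity check of the reformulation (illustrative random data): t = ‖Ax − b‖_∞ is the smallest t with −t1 ≤ Ax − b ≤ t1.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 8, 3
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
x = rng.standard_normal(n)

r = A @ x - b
t = np.linalg.norm(r, np.inf)        # ||Ax - b||_inf

# t satisfies -t*1 <= Ax - b <= t*1, and no smaller value does
assert np.all(-t <= r) and np.all(r <= t)
eps = 1e-6
assert not (np.all(-(t - eps) <= r) and np.all(r <= t - eps))

# it also agrees with max_i max{y_i, -y_i}
assert np.isclose(t, np.max(np.maximum(r, -r)))
```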
• equivalent LP in matrix notation

    minimize   [0; 1]^T [x; t]
    subject to [A  −1; −A  −1] [x; t] ≤ [b; −b]

  (here 1 denotes the m-vector of ones)
l1-Norm approximation

    minimize   ‖Ax − b‖_1

• the l1-norm of an m-vector y is

    ‖y‖_1 = Σ_{i=1}^m |y_i| = Σ_{i=1}^m max{y_i, −y_i}

• equivalent LP (with variable x and auxiliary vector variable u)

    minimize   Σ_{i=1}^m u_i
    subject to −u ≤ Ax − b ≤ u

  (for fixed x, the optimal u is u_i = |(Ax − b)_i|, i = 1,...,m)
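The corresponding check for the l1 reformulation (illustrative random data): for fixed x the elementwise-smallest feasible u is |Ax − b|, and its objective is the l1-norm.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 8, 3
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
x = rng.standard_normal(n)

r = A @ x - b
u = np.abs(r)                 # elementwise-optimal u_i = |(Ax - b)_i|

assert np.all(-u <= r) and np.all(r <= u)          # u is feasible
assert np.isclose(u.sum(), np.linalg.norm(r, 1))   # objective 1^T u = ||Ax - b||_1
# any feasible u' has u'_i >= |r_i| componentwise, so no feasible u' does better
```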
• equivalent LP in matrix notation

    minimize   [0; 1]^T [x; u]
    subject to [A  −I; −A  −I] [x; u] ≤ [b; −b]
Comparison with least-squares solution

histograms of the residuals Ax − b, with randomly generated A ∈ R^{200×80}, for

    x_ls = argmin ‖Ax − b‖_2,   x_l1 = argmin ‖Ax − b‖_1

[figure: histograms of (A x_ls − b)_k and (A x_l1 − b)_k on the range −1.5 to 1.5]

the l1-norm residual distribution is wider, with a high peak at zero
Robust curve fitting

• fit an affine function f(t) = α + βt to m points (t_i, y_i)

• an approximation problem Ax ≈ b with

    A = [1  t_1; … ; 1  t_m],   x = [α; β],   b = [y_1; … ; y_m]

[figure: data points and two fitted lines f(t); dashed: minimize ‖Ax − b‖_2, solid: minimize ‖Ax − b‖_1]

the l1-norm approximation is more robust against outliers
Sparse signal recovery via l1-norm minimization

• x̂ ∈ R^n is an unknown signal, known to be very sparse
• we make linear measurements y = Ax̂ with A ∈ R^{m×n}, m < n

estimation by l1-norm minimization: compute an estimate by solving

    minimize   ‖x‖_1
    subject to Ax = y

the estimate is the signal with the smallest l1-norm that is consistent with the measurements

equivalent LP (variables x, u ∈ R^n)

    minimize   1^T u
    subject to −u ≤ x ≤ u
               Ax = y
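This LP is small enough to hand to a generic solver; a sketch using SciPy's `linprog` (assuming SciPy is available; the sizes and data are illustrative, much smaller than the example on the next slide):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)
m, n, k = 40, 80, 4
A = rng.standard_normal((m, n))

xhat = np.zeros(n)                                  # sparse ground truth
support = rng.choice(n, size=k, replace=False)
xhat[support] = rng.standard_normal(k)
y = A @ xhat

# LP variables (x, u): minimize 1^T u  s.t.  -u <= x <= u,  Ax = y
c = np.concatenate([np.zeros(n), np.ones(n)])
I = np.eye(n)
A_ub = np.block([[I, -I], [-I, -I]])                # x - u <= 0 and -x - u <= 0
b_ub = np.zeros(2 * n)
A_eq = np.hstack([A, np.zeros((m, n))])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
              bounds=[(None, None)] * (2 * n))

xsol = res.x[:n]
assert res.status == 0
assert np.allclose(A @ xsol, y, atol=1e-6)          # consistent with measurements
# xhat is feasible, so the optimal value can be no larger than ||xhat||_1
assert np.linalg.norm(xsol, 1) <= np.linalg.norm(xhat, 1) + 1e-6
```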
Example

• exact signal x̂ ∈ R^1000
• 10 nonzero components

[figure: exact signal x̂_k plotted against k]

least-norm solutions (randomly generated A ∈ R^{100×1000})

[figure: minimum l2-norm solution (left) and minimum l1-norm solution (right), each plotted as x_k against k]

the l1-norm estimate is exact
Exact recovery

when are the following two problems equivalent?

    minimize   card(x)          minimize   ‖x‖_1
    subject to Ax = y           subject to Ax = y

• card(x) is the cardinality (number of nonzero components) of x
• the answer depends on A and on the cardinality of the sparsest solution of Ax = y

we say A allows exact recovery of k-sparse vectors if

    x̂ = argmin_{Ax=y} ‖x‖_1   whenever y = Ax̂ and card(x̂) ≤ k

• here, argmin ‖x‖_1 denotes the unique minimizer
• this is a property of the nullspace of the 'measurement matrix' A
'Nullspace condition' for exact recovery

necessary and sufficient condition for exact recovery of k-sparse vectors [1]:

    |z_(1)| + ··· + |z_(k)| < (1/2)‖z‖_1   for all z ∈ nullspace(A) \ {0}

here, z_(i) denotes the component z_i in order of decreasing magnitude:

    |z_(1)| ≥ |z_(2)| ≥ ··· ≥ |z_(n)|

• a bound on how 'concentrated' nonzero vectors in nullspace(A) can be
• implies k < n/2
• difficult to verify for a general A
• holds with high probability for certain distributions of random A

[1] Feuer & Nemirovski (IEEE Trans. IT, 2003) and several other papers on compressed sensing.
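Although verifying the condition exactly is difficult, it can be sampled; a sketch (illustrative sizes and seed) that draws random vectors from nullspace(A) via the SVD and checks how concentrated they are. This only spot-checks the condition on sampled z — it does not prove it for all of nullspace(A).

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, k = 50, 100, 5
A = rng.standard_normal((m, n))

# orthonormal basis of nullspace(A): the last n - m right singular vectors
_, _, Vt = np.linalg.svd(A)
N = Vt[m:].T                          # n x (n - m), columns span nullspace(A)
assert np.allclose(A @ N, 0, atol=1e-8)

# sample z from nullspace(A) and compute (|z_(1)| + ... + |z_(k)|) / ||z||_1;
# the nullspace condition requires this ratio to stay below 1/2
ratios = []
for _ in range(200):
    z = N @ rng.standard_normal(n - m)
    mags = np.sort(np.abs(z))[::-1]   # magnitudes in decreasing order
    ratios.append(mags[:k].sum() / mags.sum())
print(max(ratios))                    # stays well below 0.5 for this random Gaussian A
```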
Proof of the nullspace condition

notation

• x has support I ⊆ {1, 2,...,n} if x_i = 0 for i ∉ I
• |I| is the number of elements of I
• P_I is the projection matrix onto n-vectors with support I: P_I is diagonal with

    (P_I)_jj = 1 if j ∈ I,   (P_I)_jj = 0 otherwise

• A satisfies the nullspace condition if

    ‖P_I z‖_1 < (1/2)‖z‖_1

  for all nonzero z in nullspace(A) and for all support sets I with |I| ≤ k
sufficiency: suppose A satisfies the nullspace condition

• let x̂ be k-sparse with support I (i.e., with P_I x̂ = x̂); define y = Ax̂
• consider any feasible x (i.e., satisfying Ax = y) different from x̂
• define z = x − x̂; this is a nonzero vector in nullspace(A), and

    ‖x‖_1 = ‖x̂ + z‖_1
          ≥ ‖x̂ + z − P_I z‖_1 − ‖P_I z‖_1
          = Σ_{k∈I} |x̂_k| + Σ_{k∉I} |z_k| − ‖P_I z‖_1
          = ‖x̂‖_1 + ‖z‖_1 − 2‖P_I z‖_1
          > ‖x̂‖_1

(line 2 is the triangle inequality; the last line uses the nullspace condition)

therefore x̂ = argmin_{Ax=y} ‖x‖_1
necessity: suppose A does not satisfy the nullspace condition

• then for some nonzero z ∈ nullspace(A) and support set I with |I| ≤ k,

    ‖P_I z‖_1 ≥ (1/2)‖z‖_1

• define the k-sparse vector x̂ = −P_I z and y = Ax̂
• the vector x = x̂ + z satisfies Ax = y and has l1-norm

    ‖x‖_1 = ‖−P_I z + z‖_1
          = ‖z‖_1 − ‖P_I z‖_1
          ≤ 2‖P_I z‖_1 − ‖P_I z‖_1
          = ‖P_I z‖_1 = ‖x̂‖_1

therefore x̂ is not the unique l1-minimizer
Linear classification

• given a set of points {v_1,...,v_N} with binary labels s_i ∈ {−1, 1}
• find a hyperplane that strictly separates the two classes:

    a^T v_i + b > 0 if s_i = 1,   a^T v_i + b < 0 if s_i = −1

the strict inequalities are homogeneous in a, b, hence (after rescaling a, b) equivalent to the linear inequalities (in a, b)

    s_i (a^T v_i + b) ≥ 1,   i = 1,...,N
Approximate linear separation of non-separable sets

    minimize   Σ_{i=1}^N max{0, 1 − s_i (a^T v_i + b)}

• the penalty 1 − s_i (a^T v_i + b) for misclassifying point v_i
• can be interpreted as a heuristic for minimizing the number of misclassified points
• a piecewise-linear minimization problem in the variables a, b
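The objective is easy to evaluate directly; a sketch with hypothetical data (points, labels, and the candidate separator are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
N, n = 20, 2
V = rng.standard_normal((N, n))                        # points v_i as rows
s = np.sign(V[:, 0] + 0.5 * rng.standard_normal(N))    # noisy labels in {-1, 1}
s[s == 0] = 1.0

def hinge_total(a, b):
    # sum_i max{0, 1 - s_i (a^T v_i + b)}: zero exactly when every
    # margin s_i (a^T v_i + b) is at least 1
    margins = s * (V @ a + b)
    return np.maximum(0.0, 1.0 - margins).sum()

# with a = 0, b = 0 every margin is 0, so each point contributes max{0, 1} = 1
assert hinge_total(np.zeros(n), 0.0) == N

a = np.array([1.0, 0.0])       # a hypothetical separator along the first coordinate
assert hinge_total(a, 0.0) >= 0.0
```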
equivalent LP (variables a ∈ R^n, b ∈ R, u ∈ R^N)

    minimize   Σ_{i=1}^N u_i
    subject to 1 − s_i (v_i^T a + b) ≤ u_i,   i = 1,...,N
               u_i ≥ 0,   i = 1,...,N

in matrix notation: minimize [0; 0; 1]^T [a; b; u] subject to

    ⎡ −s_1 v_1^T   −s_1   −1    0   ···   0 ⎤             ⎡ −1 ⎤
    ⎢ −s_2 v_2^T   −s_2    0   −1   ···   0 ⎥   ⎡ a   ⎤   ⎢ −1 ⎥
    ⎢      ⋮         ⋮      ⋮    ⋮    ⋱    ⋮ ⎥   ⎢ b   ⎥   ⎢  ⋮ ⎥
    ⎢ −s_N v_N^T   −s_N    0    0   ···  −1 ⎥   ⎢ u_1 ⎥ ≤ ⎢ −1 ⎥
    ⎢      0         0    −1    0   ···   0 ⎥   ⎢ u_2 ⎥   ⎢  0 ⎥
    ⎢      0         0     0   −1   ···   0 ⎥   ⎢  ⋮  ⎥   ⎢  0 ⎥
    ⎢      ⋮         ⋮      ⋮    ⋮    ⋱    ⋮ ⎥   ⎣ u_N ⎦   ⎢  ⋮ ⎥
    ⎣      0         0     0    0   ···  −1 ⎦             ⎣  0 ⎦
Modeling software

modeling tools simplify the formulation of LPs (and of other problems):

• they accept optimization problems in standard notation (max, ‖·‖_1, ...)
• recognize problems that can be converted to LPs
• express the problem in the input format required by a specific LP solver

examples of modeling packages

• AMPL, GAMS
• CVX, YALMIP (MATLAB)
• CVXPY, Pyomo, CVXOPT (Python)
CVX example<br />
minimize ‖Ax−b‖ 1<br />
subject to 0 ≤ x k ≤ 1, k = 1,...,n<br />
MATLAB code<br />
cvx_begin<br />
variable x(n);<br />
minimize( norm(A*x - b, 1) )<br />
subject to<br />
x >= 0<br />
x