Stochastic Optimization Seminar<br />
Lecture 6:<br />
Perturbation Analysis in Optimization<br />
Boris Defourny 1<br />
Scribe: Ethan X. Fang 1<br />
1 ORFE, Operations Research & Financial Engineering<br />
<strong>Princeton</strong> <strong>University</strong><br />
Boris Defourny (ORFE) Lecture 6: Perturbation Analysis in Optimization, March 15, 2012
1 Introduction and Notation<br />
2 Hölder and Lipschitz Continuity for Functions<br />
3 Growth Conditions<br />
4 First- and Second-Order Expansions<br />
5 Perturbation Theorems<br />
Motivation<br />
Perturbation analysis in optimization deals with the sensitivity of optimal values<br />
and optimal solution sets to perturbations of the objective function and the<br />
feasible set. It provides a theoretical foundation for analyzing algorithms that<br />
solve stochastic optimization problems approximately.<br />
Notation<br />
Write min_C f ∈ R̄ = R ∪ {−∞} ∪ {+∞} for the value of the minimum of<br />
a function f : X → R̄ over a closed set C ⊂ X. The set X is a metric<br />
space, usually R^n endowed with the Euclidean norm. We assume that C is contained in the<br />
domain of f; for minimization problems, dom f = {x ∈ X : f(x) < ∞}.<br />
The optimal solution set is argmin_C f = {x ∈ C : f(x) = min_C f}. For<br />
ε > 0, the ε-optimal solution set is<br />
argmin_{C,ε} f = {x ∈ C : f(x) ≤ min_C f + ε}. If argmin_C f is a singleton {s}, we<br />
write s = argmin_C f. For an illustration, see Figure 1 on the next slide.<br />
[Figure 1: three panels plotting x², x⁴, x⁶ and their tilted counterparts<br />
x² − 0.5x, x⁴ − 0.5x, x⁶ − 0.5x over [−1, 1].]<br />
Figure: Minimizing x^{2m} over the domain C = [0,1], for m = 1,2,3. If the objective is<br />
perturbed by −0.5x (tilt perturbation), the optimal solution is less “stable” for<br />
higher values of m. The ε-optimal solution sets are shown for ε = 0.05.<br />
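The figure's observation can be reproduced numerically. The sketch below (our own illustration, not part of the lecture) grid-minimizes x^(2m) and its tilted version x^(2m) − 0.5x over C = [0, 1] and records how far the tilt moves the minimizer:

```python
import numpy as np

# Grid search over C = [0, 1]: compare the argmin of x^(2m) with the
# argmin of the tilted objective x^(2m) - 0.5 x, for m = 1, 2, 3.
x = np.linspace(0.0, 1.0, 100001)
shifts = []
for m in (1, 2, 3):
    f = x ** (2 * m)
    g = f - 0.5 * x            # tilt perturbation
    x_f = x[np.argmin(f)]      # unperturbed minimizer (0 in each case)
    x_g = x[np.argmin(g)]      # perturbed minimizer
    shifts.append(x_g - x_f)

# The flatter the objective near its minimizer (larger m), the
# farther the same tilt moves the solution.
print(shifts)
```

The shift grows with m, matching the first-order condition 2m x^(2m−1) = 0.5, which gives x = (0.25/m)^(1/(2m−1)).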
Questions<br />
Suppose that we have an objective function f_θ and a feasible set C_θ that depend on<br />
a parameter θ ∈ Θ (see Figure 2 on the next slide). Natural questions include:<br />
• What conditions on Θ ensure that min_{C_θ} f_θ is finite for all θ ∈ Θ?<br />
• Given Θ, what is the range of the optimal value mapping θ ↦ min_{C_θ} f_θ?<br />
[Figure 2: four panels showing the polyhedra C_{θ_k} for<br />
θ_0 = [−0.06 1.64 2.13 −0.70]^T, θ_1 = [−0.26 1.44 1.93 −0.90]^T,<br />
θ_2 = [−0.46 1.24 1.73 −1.10]^T, θ_3 = [−0.66 1.04 1.53 −1.30]^T.]<br />
Here C_θ = {x ∈ R² : Ax ≤ θ}, where<br />
A = [ −0.83 −0.43 ; 0.57 0.63 ; 0.28 0.80 ; 1.14 −0.90 ] and θ_k = θ_0 − 0.2k [ 1 1 1 1 ]^T.<br />
Figure: Convex polyhedra described by C_{θ_k} = {x ∈ R² : Ax ≤ θ_k}, for k = 0,1,2,3<br />
(left to right).<br />
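A quick way to see the feasible sets C_{θ_k} shrink as k grows is a componentwise membership test. In the sketch below, A and θ_0 are read off the figure; the test point x = (0, 1) is our own choice:

```python
import numpy as np

# Membership test for the polyhedra C_{theta_k} = {x : A x <= theta_k}.
A = np.array([[-0.83, -0.43],
              [ 0.57,  0.63],
              [ 0.28,  0.80],
              [ 1.14, -0.90]])
theta0 = np.array([-0.06, 1.64, 2.13, -0.70])

def feasible(x, k, tol=1e-12):
    """True if A x <= theta_0 - 0.2 k * (1,1,1,1)^T holds componentwise."""
    theta_k = theta0 - 0.2 * k * np.ones(4)
    return bool(np.all(A @ x <= theta_k + tol))

x = np.array([0.0, 1.0])
print([feasible(x, k) for k in range(4)])  # [True, True, False, False]
```

The point (0, 1) belongs to C_{θ_0} and C_{θ_1} but drops out for k = 2, 3, illustrating how the feasible set contracts as θ is perturbed.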
Hölder and Lipschitz Continuity for Functions<br />
Definition 1 (distance from a point to a set)<br />
The distance from a point x ∈ X to a set S ⊂ X is written<br />
dist(x, S) = inf_{s∈S} ‖x − s‖.<br />
Definition 2 (neighborhood of a set)<br />
A neighborhood of a set S ⊂ X, written N_S, is a set that is a neighborhood of<br />
each point of S.<br />
Definition 3 (Hölder condition of degree α)<br />
A function f satisfies the Hölder condition of degree α on a set S ⊂ X (for<br />
some α ≥ 0) if there exists a constant c ≥ 0 such that<br />
|f(x) − f(y)| ≤ c ‖x − y‖^α for all x, y ∈ S. (1)<br />
When α = 1, this is the Lipschitz condition.<br />
Definition 4 (modulus)<br />
The modulus is the smallest constant c ≥ 0 satisfying the Hölder condition for f<br />
and degree α. Note that we implicitly assume f is real-valued; a<br />
similar definition holds for vector-valued functions, with<br />
|f(x) − f(y)| replaced by ‖f(x) − f(y)‖.<br />
Example 1<br />
f₀(x) = a x^α with α ∈ (0,1] and a ∈ R satisfies |f₀(x) − f₀(y)| ≤ |a| |x − y|^α on<br />
C = [0, u].<br />
Example 2<br />
Let f be a continuous function differentiable on an open set S, with gradient<br />
∇f(x) on S. Let C be a compact convex subset of S. Then f is Lipschitz<br />
continuous on C with modulus c = max_{x∈C} ‖∇f(x)‖.<br />
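Both examples can be spot-checked numerically. The sketch below instantiates Example 1 with a = 1, α = 1/2 (so f₀(x) = √x on [0, 1]) and Example 2 with f(x) = sin(x) on C = [0, 2]; the specific functions are our own choices:

```python
import numpy as np

# Example 1 with a = 1, alpha = 1/2:
# f0(x) = sqrt(x) satisfies |f0(x) - f0(y)| <= |x - y|**0.5 on [0, 1].
rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 1.0, size=1000)
ys = rng.uniform(0.0, 1.0, size=1000)
lhs = np.abs(np.sqrt(xs) - np.sqrt(ys))
rhs = np.abs(xs - ys) ** 0.5
holder_ok = bool(np.all(lhs <= rhs + 1e-12))

# Example 2: for smooth f(x) = sin(x) on C = [0, 2], the Lipschitz
# modulus is max_{x in C} |f'(x)| = |cos(0)| = 1.
zs = np.linspace(0.0, 2.0, 2001)
lip = float(np.max(np.abs(np.cos(zs))))
print(holder_ok, lip)
```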
Growth Conditions<br />
Motivation<br />
This section presents notions that ensure a function of interest is not “too<br />
flat”. Objective functions that are too flat in a neighborhood of their optimal<br />
solution lead to minimization problems whose solutions can be very sensitive to<br />
perturbations. This may not matter much if the utility derived from the<br />
optimization model is well reflected by the objective value. However, in some<br />
settings the utility of the model is also derived from the solution itself.<br />
Definition 6 (γ-order growth condition)<br />
Consider a function f and a nonempty set C ⊂ X. Let S ⊂ C be a nonempty<br />
set on which f is constant, with value f_S ∈ R:<br />
f(s) = f_S for all s ∈ S.<br />
The γ-order growth condition (γ > 0) holds for f on C if there exist a constant<br />
c > 0 and a neighborhood N_S of S such that<br />
f(x) − f_S ≥ c [dist(x, S)]^γ for all x ∈ N_S ∩ C.<br />
Remark<br />
Ideally, we are interested in the largest c > 0 that satisfies the inequality for<br />
a fixed γ. Compared to Hölder-type conditions, here only x varies, and<br />
no absolute value is taken on the left-hand side. In practice, S<br />
stands for a set of local minimizers.<br />
Example (strongly convex function)<br />
Let f be a twice differentiable, strongly convex function over a convex set C<br />
containing its minimum: there exists a constant m > 0 such that ∇²f(x) ⪰ mI<br />
for all x ∈ C. Since for any x, y ∈ C there exists a point z = tx + (1−t)y with<br />
t ∈ [0,1] such that<br />
f(y) − f(x) = ∇f(x)^⊤ (y − x) + ½ (y − x)^⊤ ∇²f(z) (y − x),<br />
we have, by the strong convexity assumption,<br />
f(y) − f(x) ≥ ∇f(x)^⊤ (y − x) + (m/2) ‖y − x‖².<br />
Consider S = argmin_C f, s ∈ S, and write f_S = min_C f. We have<br />
f(y) − f_S ≥ ∇f(s)^⊤ (y − s) + (m/2) ‖y − s‖² ≥ (m/2) ‖y − s‖²,<br />
using the fact that ∇f(s)^⊤ (y − s) ≥ 0 for all y ∈ C and s ∈ S, by optimality of<br />
S. This shows that S reduces to a singleton {s}, and that the second-order<br />
growth condition holds with modulus c = m/2:<br />
f(x) − f_S ≥ c [dist(x, S)]² for all x ∈ C.<br />
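The conclusion can be verified on a concrete instance. The sketch below uses a toy function of our own choosing, f(x) = x² + x⁴ on C = [−1, 1], which has f″(x) = 2 + 12x² ≥ 2 = m, minimizer s = 0, and f_S = 0, so the claim is the second-order growth condition with c = m/2 = 1:

```python
import numpy as np

# Check f(x) - f_S >= (m/2) * dist(x, S)^2 for f(x) = x^2 + x^4 on [-1, 1],
# where S = {0}, f_S = 0, and m = 2 (the lower bound on f'').
m = 2.0
x = np.linspace(-1.0, 1.0, 4001)
f = x**2 + x**4
gap = f - 0.0 - (m / 2.0) * np.abs(x) ** 2   # equals x^4, so nonnegative
growth_ok = bool(np.all(gap >= -1e-12))
print(growth_ok)
```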
First- and Second-Order Expansions<br />
Directional Differentiability<br />
A function g is directionally differentiable at x if the directional derivatives<br />
g′(x, h) = lim_{t→0⁺} [g(x + th) − g(x)] / t<br />
exist for all directions h.<br />
Remark<br />
If g is directionally differentiable, we can expand g(x + th) for t > 0 as<br />
g(x + th) = g(x) + t g′(x, h) + o(t).<br />
Second-Order Expansions<br />
A directionally differentiable function g is second-order directionally<br />
differentiable at x in the direction h if the (parabolic) second-order directional<br />
derivatives<br />
g″(x, h, w) = lim_{t→0⁺} [g(x + th + (t²/2)w) − g(x) − t g′(x, h)] / (t²/2)<br />
exist for all (parabolic) directions w. In that case we can expand<br />
g(x + th + (t²/2)w) for t > 0 as<br />
g(x + th + (t²/2)w) = g(x) + t g′(x, h) + (t²/2) g″(x, h, w) + o(t²).<br />
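For a smooth function, g′(x, h) = ∇g(x)·h and g″(x, h, w) = ∇g(x)·w + h^⊤ ∇²g(x) h, and both limits can be approximated by finite differences. The sketch below does this for a smooth test function of our own choosing, g(x) = exp(x₁) + x₁x₂ at x = 0:

```python
import numpy as np

# Finite-difference illustration of the first- and second-order
# directional derivatives for g(x) = exp(x1) + x1*x2 at x = 0,
# direction h, parabolic direction w.
def g(x):
    return np.exp(x[0]) + x[0] * x[1]

x = np.zeros(2)
h = np.array([1.0, 2.0])
w = np.array([3.0, -1.0])

grad = np.array([1.0, 0.0])                # grad g(0)
hess = np.array([[1.0, 1.0], [1.0, 0.0]])  # Hess g(0)

d1_exact = grad @ h                        # = 1
d2_exact = grad @ w + h @ hess @ h         # = 3 + 5 = 8

t = 1e-4
d1_fd = (g(x + t * h) - g(x)) / t
d2_fd = (g(x + t * h + 0.5 * t**2 * w) - g(x) - t * d1_exact) / (0.5 * t**2)
print(d1_fd, d2_fd)   # both close to the exact values 1 and 8
```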
Perturbation Theorems<br />
Fixed Feasible Set<br />
Consider the optimization problem<br />
minimize f(x) subject to x ∈ C, (2)<br />
where C ⊂ R^n. Let v̄ = min_C f and S_f = argmin_C f. Assume that<br />
• A1. S_f is nonempty.<br />
• A2. f satisfies a γ-order growth condition with modulus c in a<br />
neighborhood N of argmin_C f.<br />
Perturbed Optimization Problem<br />
Consider the perturbed optimization problem<br />
minimize f(x) + η(x) subject to x ∈ C, (3)<br />
where we assume that<br />
• B1. η is Lipschitz continuous with modulus κ on C ∩ N.<br />
Let g(x) = f(x) + η(x), and consider any ε-optimal solution x̃_g ∈ argmin_{C,ε} g<br />
such that x̃_g ∈ N.<br />
Theorem 1 [Bonnans and Shapiro (2000)]<br />
Under assumptions A1, A2 and B1, the distance between x̃_g and<br />
S_f = argmin_C f satisfies<br />
c [dist(x̃_g, S_f)]^γ ≤ κ dist(x̃_g, S_f) + ε.<br />
In particular, under the second-order growth condition (γ = 2), the relation yields<br />
dist(x̃_g, S_f) ≤ κ/(2c) + √( (κ/(2c))² + ε/c ) ≤ κ/c + √(ε/c).<br />
Under the first-order growth condition (γ = 1), if κ < c, the relation yields<br />
dist(x̃_g, S_f) ≤ ε/(c − κ).<br />
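The γ = 2 bound can be checked on a toy instance of our own choosing: f(x) = x² on C = [−1, 1] has S_f = {0} and satisfies the second-order growth condition with c = 1, and η(x) = κx is Lipschitz with modulus κ. Taking ε ≈ 0 (the grid minimizer is essentially exact), the bound reads dist ≤ κ/(2c) + √((κ/(2c))² + ε/c):

```python
import numpy as np

# Theorem 1 check (gamma = 2): the true perturbed minimizer is -kappa/2,
# so dist to S_f = {0} is kappa/2, within the bound kappa/c.
x = np.linspace(-1.0, 1.0, 200001)
c = 1.0
dists, bounds = [], []
for kappa in (0.1, 0.5, 1.0):
    g = x**2 + kappa * x
    x_tilde = x[np.argmin(g)]        # grid minimizer of the perturbed problem
    dists.append(abs(x_tilde))       # distance to S_f = {0}
    bounds.append(kappa / (2 * c) + np.sqrt((kappa / (2 * c)) ** 2))  # eps = 0
print(dists, bounds)
```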
Proof of Theorem 1<br />
Consider x̄_f ∈ argmin_C f and x̃_g ∈ argmin_{C,ε} g ∩ N. We have<br />
g(x̃_g) ≤ min_C g + ε ≤ g(x̄_f) + ε, implying g(x̃_g) − g(x̄_f) ≤ ε. Therefore,<br />
f(x̃_g) − f(x̄_f) = g(x̃_g) − η(x̃_g) − [g(x̄_f) − η(x̄_f)]<br />
= g(x̃_g) − g(x̄_f) + η(x̄_f) − η(x̃_g) ≤ ε + κ ‖x̃_g − x̄_f‖,<br />
where the last inequality uses the Lipschitz condition on η. On<br />
the other hand, f(x̃_g) − f(x̄_f) ≥ c [dist(x̃_g, S_f)]^γ. For any δ > 0, we can choose<br />
x̄_f such that ‖x̃_g − x̄_f‖ ≤ dist(x̃_g, S_f) + δ. Therefore<br />
f(x̃_g) − f(x̄_f) ≥ c [‖x̃_g − x̄_f‖ − δ]^γ.<br />
We have thus obtained c [‖x̃_g − x̄_f‖ − δ]^γ ≤ ε + κ ‖x̃_g − x̄_f‖. Letting δ → 0<br />
yields ‖x̃_g − x̄_f‖ → dist(x̃_g, S_f) and the relation<br />
c y^γ ≤ ε + κ y,<br />
where y = dist(x̃_g, S_f) ≥ 0. The results for particular values of γ follow by<br />
solving for y: when γ = 2 one solves c y² − κ y − ε ≤ 0, y ≥ 0; when<br />
γ = 1 one solves (c − κ) y ≤ ε, y ≥ 0.<br />
More Assumptions<br />
More can be said under additional conditions. Assume that<br />
• A3. The set C is compact. We fix a convex compact set U ⊂ R^n such<br />
that C ⊂ int U.<br />
• A4. f is Lipschitz continuous on U.<br />
• A5. S_f = argmin_C f is a singleton. We write x̄ = argmin_C f.<br />
• A2′. f satisfies a quadratic growth condition at x̄ with modulus c on U.<br />
• A6. f is twice continuously differentiable at x̄. In particular we can expand<br />
f(x(t)), where x(t) = x̄ + th + (t²/2)w + o(t²), as follows:<br />
f(x(t)) = f(x̄) + t f′(x̄, h) + (t²/2) f″(x̄, h, w) + o(t²)<br />
= f(x̄) + t ∇f(x̄)^⊤ h + (t²/2) (∇f(x̄)^⊤ w + h^⊤ ∇²f(x̄) h) + o(t²).<br />
• A7. C is a convex polyhedron.<br />
More Assumptions...<br />
For t > 0, consider the perturbed problems<br />
P(t): minimize f(x) + t η_t(x) subject to x ∈ C,<br />
where the perturbations η_t satisfy the following assumptions:<br />
• B1′. The functions η_t are Lipschitz continuous on U.<br />
• B2. As t ↓ 0, η_t converges to a function η that is Lipschitz continuous on U.<br />
• B3. The function η is differentiable at x̄. In particular we can expand<br />
η(x(t)), where x(t) = x̄ + th + (t²/2)w + o(t²), as follows:<br />
η(x(t)) = η(x̄) + t ∇η(x̄)^⊤ h + o(t).<br />
Let g_t(x) = f(x) + t η_t(x), and consider v(t) = min_C g_t and x̄(t) = argmin_C g_t.<br />
Theorem 2 [Shapiro, Dentcheva, and Ruszczyński (2009)]<br />
Under all the assumptions made on the previous two slides, for t ≥ 0<br />
we have the second-order expansion of the optimal value<br />
v(t) = v̄ + t η_t(x̄) + (t²/2) h_{f,η}(x̄) + o(t²),<br />
where h_{f,η}(x̄) is the optimal value of the auxiliary problem over h,<br />
minimize 2 h^⊤ ∇η(x̄) + h^⊤ ∇²f(x̄) h<br />
subject to h ∈ C_crit(x̄) := {d ∈ T_C(x̄) : d^⊤ ∇f(x̄) = 0},<br />
T_C(x) := {d ∈ R^n : dist(x + td, C) = o(t), t ≥ 0}.<br />
Moreover, if the solution to the auxiliary problem is unique, say h̄, we have for<br />
t ≥ 0 the first-order expansion of the optimal solution<br />
x̄(t) = x̄ + t h̄ + o(t).<br />
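Theorem 2 can be worked through on a toy instance of our own choosing: f(x) = x², C = [−1, 1], η_t = η with η(x) = x. Then x̄ = 0, v̄ = 0, ∇f(0) = 0, ∇²f(0) = 2, ∇η(0) = 1, and the critical cone is all of R. The auxiliary problem, minimize 2h + 2h² over h ∈ R, gives h̄ = −1/2 and optimal value −1/2, so the theorem predicts v(t) = −t²/4 + o(t²) and x̄(t) = −t/2 + o(t). The sketch below compares these predictions with grid minimization of g_t:

```python
import numpy as np

# Compare the Theorem 2 predictions v(t) ~ -(t^2)/4 and x_bar(t) ~ -t/2
# with direct minimization of g_t(x) = x^2 + t*x over C = [-1, 1].
x = np.linspace(-1.0, 1.0, 400001)
errs_v, errs_x = [], []
for t in (0.2, 0.1, 0.05):
    gt = x**2 + t * x
    v_t = gt.min()
    x_t = x[np.argmin(gt)]
    errs_v.append(abs(v_t - (-(t**2) / 4.0)))
    errs_x.append(abs(x_t - (-t / 2.0)))
print(errs_v, errs_x)   # all errors essentially zero on this instance
```

On this instance the expansions are exact (v(t) = −t²/4 and x̄(t) = −t/2 for t ≤ 2), so the only error is the grid resolution.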
References<br />
J.F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer, New York, 2000.<br />
W. Cook, A.M.H. Gerards, and A. Schrijver. Sensitivity theorems in integer linear programming. Mathematical Programming, 34:251–264, 1986.<br />
O.L. Mangasarian. Nonlinear Programming. SIAM, 1994.<br />
R.T. Rockafellar and R.J.-B. Wets. Variational Analysis. Springer, third edition, 1998.<br />
A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, 2009.<br />