
Stochastic Optimization Seminar

Lecture 6: Perturbation Analysis in Optimization

Boris Defourny (ORFE, Operations Research & Financial Engineering, Princeton University)
Scribe: Ethan X. Fang

March 15, 2012


1 Introduction and Notation
2 Hölder and Lipschitz Continuity for Functions
3 Growth Conditions
4 First- and Second-Order Expansions
5 Perturbation Theorems


1 Introduction and Notation

Motivation

Perturbation analysis in optimization deals with the sensitivity of optimal values and optimal solution sets to perturbations of the objective function and feasible set. Perturbation analysis provides a theoretical foundation that helps in analyzing algorithms that approximately solve stochastic optimization problems.

Notation

Let us write $\min_C f \in \bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty\} \cup \{+\infty\}$ for the value of the minimum of a function $f : X \to \bar{\mathbb{R}}$ over a closed set $C \subset X$. The set $X$ is a metric space, usually $\mathbb{R}^n$ endowed with the Euclidean norm. We assume that $C$ is contained in the domain of $f$; in a minimization problem, $\operatorname{dom} f = \{x \in X : f(x) < \infty\}$.

The optimal solution set is the set $\operatorname{argmin}_C f = \{x \in C : f(x) = \min_C f\}$. For $\varepsilon > 0$, the ε-optimal solution set is the set $\operatorname{argmin}_{C,\varepsilon} f = \{x \in C : f(x) \le \min_C f + \varepsilon\}$. If $\operatorname{argmin}_C f$ is a singleton $\{s\}$, we write $s = \operatorname{argmin}_C f$. For an illustration, see Figure 1 on the next slide.


[Figure 1: Minimizing $x^{2m}$ over the domain $C = [0, 1]$, for $m = 1, 2, 3$ (panel titles: $x^2$, $x^4$, $x^6$, $x^2 - 0.5x$, $x^4 - 0.5x$, $x^6 - 0.5x$). If the objective is perturbed by $-0.5x$ (tilt perturbation), the optimal solution is less "stable" with higher values of $m$. The ε-optimal solution sets are shown for $\varepsilon = 0.05$.]
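As a quick numerical illustration of this sensitivity, here is a minimal sketch (in Python with numpy; not part of the original lecture) that approximates the ε-optimal sets of Figure 1 on a grid. The printed intervals widen, and the minimizer moves further under the tilt, as $m$ grows.

```python
# A minimal sketch reproducing the tilt experiment of Figure 1: minimize
# x^(2m) and x^(2m) - 0.5x over a grid approximation of C = [0, 1] and
# report the width of the epsilon-optimal solution set.
import numpy as np

eps = 0.05
x = np.linspace(0.0, 1.0, 100_001)  # fine grid approximation of C = [0, 1]

for m in (1, 2, 3):
    for tilt in (0.0, -0.5):
        f = x ** (2 * m) + tilt * x
        fmin = f.min()
        eps_opt = x[f <= fmin + eps]  # grid points in the eps-optimal set
        print(f"m={m}, tilt={tilt:+.1f}: argmin ~ {x[f.argmin()]:.3f}, "
              f"eps-optimal set ~ [{eps_opt.min():.3f}, {eps_opt.max():.3f}]")
```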



Questions

Suppose that we have an objective function $f_\theta$ and a set $C_\theta$ that depend on some parameter $\theta \in \Theta$ (see Figure 2 on the next slide). We may ask questions such as:

• What are conditions on $\Theta$ ensuring that $\min_{C_\theta} f_\theta$ is finite for all $\theta \in \Theta$?
• Given $\Theta$, what is the range of the optimal value mapping $\theta \mapsto \min_{C_\theta} f_\theta$?


[Figure 2: Convex polyhedra described by $C_{\theta_k} = \{x \in \mathbb{R}^2 : Ax \le \theta_k\}$, for $k = 0, 1, 2, 3$ (left to right), where

$$A = \begin{bmatrix} -0.83 & -0.43 \\ 0.57 & 0.63 \\ 0.28 & 0.80 \\ 1.14 & -0.90 \end{bmatrix} \quad \text{and} \quad \theta_k = \begin{bmatrix} -0.06 \\ 1.64 \\ 2.13 \\ -0.70 \end{bmatrix} - 0.2\,k \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}.$$]

2 Hölder and Lipschitz Continuity for Functions


Definition 1 (distance from a point to a set)

The distance from a point $x \in X$ to a set $S \subset X$ is written $\operatorname{dist}(x, S) = \inf_{s \in S} \|x - s\|$.

Definition 2 (neighborhood of a set)

A neighborhood of a set $S \subset X$, written $N_S$, is a set that is a neighborhood of each point of $S$.


Definition 3 (Hölder condition of degree α)

A function $f(x)$ satisfies the Hölder condition of degree $\alpha$ on a set $S \subset X$ (for some $\alpha \ge 0$) if there exists a constant $c \ge 0$ such that

$$|f(x) - f(y)| \le c \, \|x - y\|^\alpha \quad \text{for all } x, y \in S. \tag{1}$$

When $\alpha = 1$, this is the Lipschitz condition.

Definition 4 (modulus)

The modulus is the smallest constant $c \ge 0$ satisfying the Hölder condition (1) for the function $f(x)$ and the degree $\alpha$. Note that here we implicitly assume $f(x)$ is real-valued. A similar definition is easily obtained for vector-valued functions, with $|f(x) - f(y)|$ replaced by $\|f(x) - f(y)\|$.


Example 1

$f_0(x) = a x^\alpha$ with $\alpha \in (0, 1]$ and $a \in \mathbb{R}$ satisfies $|f_0(x) - f_0(y)| \le |a| \, |x - y|^\alpha$ on $C = [0, u]$.

Example 2

Let $f$ be a continuous function differentiable on an open set $S$, with gradient $\nabla f(x)$ on $S$. Let $C$ be a compact convex subset of $S$. Then $f$ is Lipschitz continuous on $C$ with modulus $c = \max_{x \in C} \|\nabla f(x)\|$.
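As a quick check of Example 1, the following sketch (Python with numpy; not from the lecture) samples pairs on $C = [0, 1]$ for $f_0(x) = \sqrt{x}$, i.e. $a = 1$, $\alpha = 1/2$, and verifies that the sampled Hölder ratios stay below $|a| = 1$.

```python
# Estimate the Hoelder constant of f0(x) = sqrt(x) (a = 1, alpha = 1/2)
# on C = [0, 1] by sampling pairs: the ratio |f(x) - f(y)| / |x - y|^alpha
# should stay below |a| = 1.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.uniform(0.0, 1.0, size=(2, 100_000))
ratios = np.abs(np.sqrt(x) - np.sqrt(y)) / np.abs(x - y) ** 0.5
print(f"largest sampled ratio: {ratios.max():.4f}  (bound |a| = 1)")
```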


3 Growth Conditions


Motivation

This section presents notions for ensuring that a function of interest is not "too flat". Objective functions that are too flat in a neighborhood of their optimal solution lead to minimization problems whose solutions can be very sensitive to perturbations. This may not matter much if the utility derived from the optimization model is well reflected by the value of the objective function. However, there exist settings where the utility of the model is also derived from the solution itself.


Definition 6 (γ-order growth condition)

Consider a function $f$ and a nonempty set $C \subset X$. Let $S \subset C$ be a nonempty set on which $f$ is constant, with value $f_S \in \mathbb{R}$:

$$f(s) = f_S \quad \text{for all } s \in S.$$

The γ-order growth condition ($\gamma > 0$) holds for $f$ in $C$ if there exist a constant $c > 0$ and a neighborhood $N_S$ of $S$ such that

$$f(x) - f_S \ge c \, [\operatorname{dist}(x, S)]^\gamma \quad \text{for all } x \in N_S \cap C.$$
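A quick way to see a growth condition in data: the sketch below (Python with numpy; not from the lecture, with a hypothetical test function) recovers the order γ and the modulus $c$ for $f(x) = |x|^3$ with $S = \{0\}$ by a log-log fit near $S$.

```python
# Estimate the growth order gamma for f(x) = |x|^3 with S = {0}, f_S = 0,
# by a log-log fit of f(x) - f_S against dist(x, S): the slope recovers
# gamma = 3 and exp(intercept) recovers c = 1.
import numpy as np

d = np.logspace(-4, -1, 50)   # distances dist(x, S) = |x| near S
growth = d ** 3 - 0.0         # f(x) - f_S
slope, intercept = np.polyfit(np.log(d), np.log(growth), 1)
print(f"estimated gamma = {slope:.3f}, estimated c = {np.exp(intercept):.3f}")
```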


Remark

Ideally, we are interested in the largest $c > 0$ that satisfies the inequality for a fixed $\gamma$. Compared to Hölder-type conditions, here only $x$ needs to vary, and the absolute value is not taken on the left-hand side. Hence, in practice, $S$ stands for a set of local minimizers.


Example (strongly convex function)

Let $f(x)$ be a twice differentiable, strongly convex function over a convex set $C$ containing its minimum: there exists a constant $m > 0$ such that $\nabla^2 f(x) \succeq mI$ for all $x \in C$. Since for any $x, y \in C$ there exists a point $z = tx + (1 - t)y$ with $t \in [0, 1]$ such that

$$f(y) - f(x) = \nabla f(x)^\top (y - x) + \tfrac{1}{2} (y - x)^\top \nabla^2 f(z) (y - x),$$

we have, by the strong convexity assumption,

$$f(y) - f(x) \ge \nabla f(x)^\top (y - x) + \tfrac{m}{2} \|y - x\|^2.$$

Consider $S = \operatorname{argmin}_C f$, $s \in S$, and write $f_S = \min_C f$. We have

$$f(y) - f_S \ge \nabla f(s)^\top (y - s) + \tfrac{m}{2} \|y - s\|^2 \ge \tfrac{m}{2} \|y - s\|^2,$$

using the fact that $\nabla f(s)^\top (y - s) \ge 0$ for all $y \in C$ and $s \in S$, by optimality of $S$. This proves that $S$ in fact reduces to $\{s\}$, and that the second-order growth condition holds with modulus $c = m/2$:

$$f(x) - f_S \ge c \, [\operatorname{dist}(x, S)]^2 \quad \text{for all } x \in C.$$
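The conclusion can be checked numerically; here is a minimal sketch (Python with numpy; the quadratic $f$ below is a hypothetical instance, not from the lecture) verifying $f(x) - f_S \ge \tfrac{m}{2}[\operatorname{dist}(x, S)]^2$ on random samples.

```python
# Check the second-order growth condition for the strongly convex
# f(x) = x^T Q x with Q = diag(1, 3): the Hessian is 2Q, so m = 2 and
# S = {0} with f_S = 0.
import numpy as np

Q = np.diag([1.0, 3.0])
m = 2.0 * np.min(np.diag(Q))  # smallest eigenvalue of the Hessian 2Q

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(100_000, 2))
f = np.einsum("ni,ij,nj->n", X, Q, X)      # f(x) = x^T Q x, f_S = 0
growth = (m / 2.0) * np.sum(X**2, axis=1)  # (m/2) * dist(x, {0})^2
print("growth condition holds on all samples:",
      bool(np.all(f >= growth - 1e-12)))
```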


4 First- and Second-Order Expansions


Directional Differentiability

A function $g(x)$ is directionally differentiable at $x$ if the directional derivatives

$$g'(x, h) = \lim_{t \to 0^+} \frac{g(x + th) - g(x)}{t}$$

exist for all directions $h$.

Remark

If $g(x)$ is directionally differentiable, we can expand $g(x + th)$ for $t > 0$ as

$$g(x + th) = g(x) + t \, g'(x, h) + o(t).$$
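As a small illustration (Python with numpy; not from the lecture), the forward-difference quotient recovers the directional derivative of $g(x) = \|x\|_1$, which is directionally differentiable at $x = 0$ even though it is not differentiable there; one has $g'(0, h) = \|h\|_1$.

```python
# Approximate the directional derivative g'(x, h) by the forward
# difference (g(x + t h) - g(x)) / t for g(x) = ||x||_1, which is
# directionally differentiable but not differentiable at x = 0.
import numpy as np

def g(x):
    return np.sum(np.abs(x))

def dir_deriv(g, x, h, t=1e-8):
    return (g(x + t * h) - g(x)) / t

x = np.zeros(2)
h = np.array([1.0, -2.0])
print(dir_deriv(g, x, h))   # ~ 3.0 = |1| + |-2| = ||h||_1
```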


Second-Order Expansions

A directionally differentiable function $g(x)$ is second-order directionally differentiable at $x$ in the direction $h$ if the (parabolic) second-order directional derivatives

$$g''(x, h, w) = \lim_{t \to 0^+} \frac{g(x + th + \tfrac{t^2}{2} w) - [g(x) + t \, g'(x, h)]}{\tfrac{1}{2} t^2}$$

exist for all (parabolic) directions $w$. In that case we can expand $g(x + th + \tfrac{t^2}{2} w)$ for $t > 0$ as

$$g(x + th + \tfrac{t^2}{2} w) = g(x) + t \, g'(x, h) + \tfrac{t^2}{2} g''(x, h, w) + o(t^2).$$
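For a twice-differentiable $g$, the parabolic derivative reduces to $g''(x, h, w) = \nabla g(x)^\top w + h^\top \nabla^2 g(x) h$ (as in assumption A6 later in this lecture). The sketch below (Python with numpy; the test function is a hypothetical instance, not from the lecture) checks this identity against the difference quotient.

```python
# Compare the parabolic difference quotient against the closed form
# g''(x, h, w) = grad g(x)^T w + h^T Hess g(x) h for the smooth
# g(x) = x1^2 * x2 at x = (1, 1).
import numpy as np

def g(x):
    return x[0] ** 2 * x[1]

x = np.array([1.0, 1.0]); h = np.array([1.0, 2.0]); w = np.array([0.5, -1.0])
grad = np.array([2 * x[0] * x[1], x[0] ** 2])             # [2*x1*x2, x1^2]
hess = np.array([[2 * x[1], 2 * x[0]], [2 * x[0], 0.0]])  # Hessian of g
exact = grad @ w + h @ hess @ h

t = 1e-4
g1 = grad @ h                                             # g'(x, h)
approx = (g(x + t * h + 0.5 * t**2 * w) - g(x) - t * g1) / (0.5 * t**2)
print(f"finite difference: {approx:.4f}, exact: {exact:.4f}")
```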


5 Perturbation Theorems


Fixed Feasible Set

Consider the optimization problem

$$\text{minimize } f(x) \quad \text{subject to } x \in C, \tag{2}$$

where $C \subset \mathbb{R}^n$. Let $\bar{v} = \min_C f$ and $S_f = \operatorname{argmin}_C f$. Assume that:

• A1. $S_f$ is nonempty.
• A2. $f$ satisfies a γ-order growth condition with modulus $c$ in a neighborhood $N$ of $\operatorname{argmin}_C f$.


Perturbed Optimization Problem

Consider the perturbed optimization problem

$$\text{minimize } f(x) + \eta(x) \quad \text{subject to } x \in C, \tag{3}$$

where we assume that:

• B1. $\eta$ is Lipschitz continuous with modulus $\kappa$ on $C \cap N$.

Let $g(x) = f(x) + \eta(x)$, and consider any ε-optimal solution $\tilde{x}_g \in \operatorname{argmin}_{C,\varepsilon} g$ such that $\tilde{x}_g \in N$.


Theorem 1 [Bonnans and Shapiro (2000)]

Under the assumptions A1, A2, and B1, the distance between $\tilde{x}_g$ and $S_f = \operatorname{argmin}_C f$ satisfies the relation

$$c \, [\operatorname{dist}(\tilde{x}_g, S_f)]^\gamma \le \kappa \, \operatorname{dist}(\tilde{x}_g, S_f) + \varepsilon.$$

In particular, for the second-order growth condition, the relation yields

$$\operatorname{dist}(\tilde{x}_g, S_f) \le \tfrac{1}{2} \kappa/c + \sqrt{(\tfrac{1}{2} \kappa/c)^2 + \varepsilon/c} \le \kappa/c + \sqrt{\varepsilon/c}.$$

For the first-order growth condition, if $\kappa < c$, the relation yields

$$\operatorname{dist}(\tilde{x}_g, S_f) \le \varepsilon/(c - \kappa).$$
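Here is a numerical illustration of Theorem 1 (Python with numpy; the instance is hypothetical, not from the lecture): $f(x) = x^2$ on $C = [-1, 1]$ satisfies the second-order growth condition with $c = 1$ and $S_f = \{0\}$, and the tilt $\eta(x) = -\kappa x$ is Lipschitz with modulus $\kappa$.

```python
# Illustrate Theorem 1 with f(x) = x^2 on C = [-1, 1] (second-order
# growth, c = 1, S_f = {0}) and the tilt perturbation eta(x) = -kappa*x.
import numpy as np

c, kappa, eps = 1.0, 0.2, 0.01
x = np.linspace(-1.0, 1.0, 200_001)
g = x**2 - kappa * x

# every eps-optimal solution of g must stay within the theorem's bound of S_f
eps_opt = x[g <= g.min() + eps]
worst_dist = np.abs(eps_opt).max()
bound = 0.5 * kappa / c + np.sqrt((0.5 * kappa / c) ** 2 + eps / c)
print(f"worst dist(x_g, S_f) = {worst_dist:.4f} <= bound = {bound:.4f}")
```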


Proof of Theorem 1

Consider $\bar{x}_f \in \operatorname{argmin}_C f$ and $\tilde{x}_g \in \operatorname{argmin}_{C,\varepsilon} g \cap N$. We have $g(\tilde{x}_g) \le \min_C g + \varepsilon \le g(\bar{x}_f) + \varepsilon$, implying $g(\tilde{x}_g) - g(\bar{x}_f) \le \varepsilon$. Therefore,

$$f(\tilde{x}_g) - f(\bar{x}_f) = g(\tilde{x}_g) - \eta(\tilde{x}_g) - [g(\bar{x}_f) - \eta(\bar{x}_f)] = g(\tilde{x}_g) - g(\bar{x}_f) + \eta(\bar{x}_f) - \eta(\tilde{x}_g) \le \varepsilon + \kappa \|\tilde{x}_g - \bar{x}_f\|,$$

where the last inequality is obtained by using the Lipschitz condition on $\eta$. On the other hand, $f(\tilde{x}_g) - f(\bar{x}_f) \ge c \, [\operatorname{dist}(\tilde{x}_g, S_f)]^\gamma$. For any $\delta > 0$, we can choose $\bar{x}_f$ such that $\|\tilde{x}_g - \bar{x}_f\| \le \operatorname{dist}(\tilde{x}_g, S_f) + \delta$. Therefore

$$f(\tilde{x}_g) - f(\bar{x}_f) \ge c \, [\|\tilde{x}_g - \bar{x}_f\| - \delta]^\gamma.$$

We have thus obtained $c \, [\|\tilde{x}_g - \bar{x}_f\| - \delta]^\gamma \le \varepsilon + \kappa \|\tilde{x}_g - \bar{x}_f\|$. Letting $\delta \to 0$ yields $\|\tilde{x}_g - \bar{x}_f\| \to \operatorname{dist}(\tilde{x}_g, S_f)$ and the relation

$$c \, y^\gamma \le \varepsilon + \kappa y, \quad \text{where } y = \operatorname{dist}(\tilde{x}_g, S_f) \ge 0.$$

The results for particular values of $\gamma$ follow by solving for $y$: when $\gamma = 2$, one has to solve $c y^2 - \kappa y - \varepsilon \le 0$, $y \ge 0$; when $\gamma = 1$, one has to solve $(c - \kappa) y \le \varepsilon$, $y \ge 0$.


More Assumptions

More can be said under additional conditions. Assume that:

• A3. The set $C$ is compact. We define a convex compact set $U \subset \mathbb{R}^n$ such that $C \subset \operatorname{int} U$.

• A4. $f$ is Lipschitz continuous on $U$.

• A5. $S_f = \operatorname{argmin}_C f$ is a singleton. We write $\bar{x} = \operatorname{argmin}_C f$.

• A2'. $f$ satisfies a quadratic growth condition at $\bar{x}$, with modulus $c$, on $U$.

• A6. $f$ is twice continuously differentiable at $\bar{x}$. In particular, we can expand $f(x(t))$, where $x(t) = \bar{x} + th + \tfrac{t^2}{2} w + o(t^2)$, as follows:

$$f(x(t)) = f(\bar{x}) + t \, f'(\bar{x}, h) + \tfrac{t^2}{2} f''(\bar{x}, h, w) + o(t^2) = f(\bar{x}) + t \, \nabla f(\bar{x})^\top h + \tfrac{t^2}{2} \left( \nabla f(\bar{x})^\top w + h^\top \nabla^2 f(\bar{x}) h \right) + o(t^2).$$

• A7. $C$ is convex polyhedral.


More Assumptions...

For $t > 0$, consider the perturbed problems

$$P(t): \quad \text{minimize } f(x) + t \, \eta_t(x) \quad \text{subject to } x \in C,$$

where the perturbations $\eta_t(x)$ satisfy the following assumptions:

• B1'. The functions $\eta_t$ are Lipschitz continuous on $U$.

• B2. As $t \downarrow 0$, $\eta_t$ converges to a function $\eta$ which is Lipschitz continuous.

• B3. The function $\eta$ is differentiable at $\bar{x}$. In particular, we can expand $\eta(x(t))$, where $x(t) = \bar{x} + th + \tfrac{t^2}{2} w + o(t^2)$, as follows:

$$\eta(x(t)) = \eta(\bar{x}) + t \, \nabla \eta(\bar{x})^\top h + o(t).$$

Let $g_t(x) = f(x) + t \, \eta_t(x)$, and consider $v(t) = \min_C g_t$ and $\bar{x}(t) = \operatorname{argmin}_C g_t$.


Theorem 2 [Shapiro, Dentcheva, and Ruszczyński (2009)]

Under all the assumptions made on the previous two slides, for $t \ge 0$ we have the second-order expansion of the optimal value function

$$v(t) = \bar{v} + t \, \eta_t(\bar{x}) + \tfrac{t^2}{2} h_{f,\eta}(\bar{x}) + o(t^2),$$

where $h_{f,\eta}(\bar{x})$ is the optimal value of the auxiliary problem over $h$,

$$\text{minimize } 2 h^\top \nabla \eta(\bar{x}) + h^\top \nabla^2 f(\bar{x}) h \quad \text{subject to } h \in C_{\mathrm{crit}}(\bar{x}) := \{d \in T_C(\bar{x}) : d^\top \nabla f(\bar{x}) = 0\},$$

with the tangent cone $T_C(x) := \{d \in \mathbb{R}^n : \operatorname{dist}(x + td, C) = o(t), \ t \ge 0\}$.

Moreover, if the solution of the auxiliary problem is unique, say $\bar{h}$, we have for $t \ge 0$ the first-order expansion of the optimal solution

$$\bar{x}(t) = \bar{x} + t \, \bar{h} + o(t).$$
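A numerical check of Theorem 2 (Python with numpy and scipy; the instance is hypothetical, not from the lecture): take $f(x) = \|x\|^2$ over $C = [-1, 1]^2$, so $\bar{x} = 0$ lies in the interior of $C$ and the critical cone is all of $\mathbb{R}^2$, with $\eta_t = \eta$ and $\eta(x) = q^\top x$. The auxiliary problem $\min_h 2h^\top q + 2\|h\|^2$ has solution $\bar{h} = -q/2$ and value $-\|q\|^2/2$.

```python
# Check Theorem 2 on f(x) = ||x||^2 over C = [-1, 1]^2 with x_bar = 0 and
# eta(x) = q^T x: the expansion predicts v(t) ~ -(t^2/4)||q||^2 and
# x_bar(t) ~ -t q / 2.
import numpy as np
from scipy.optimize import minimize

q = np.array([0.8, -0.6])
aux_value = -np.dot(q, q) / 2.0   # optimal value of the auxiliary problem
h_bar = -q / 2.0                  # its unique solution

for t in (0.1, 0.05, 0.01):
    res = minimize(lambda x: np.dot(x, x) + t * np.dot(q, x),
                   x0=np.zeros(2), bounds=[(-1.0, 1.0)] * 2)
    v_pred = 0.5 * t**2 * aux_value   # v_bar and t*eta(x_bar) vanish here
    x_pred = t * h_bar                # x_bar(t) ~ x_bar + t*h_bar
    print(f"t={t}: v(t)={res.fun:+.6f}, predicted {v_pred:+.6f}, "
          f"|x(t) - x_pred| = {np.linalg.norm(res.x - x_pred):.2e}")
```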


References

J.F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer, New York, 2000.

W. Cook, A.M.H. Gerards, and A. Schrijver. Sensitivity theorems in integer linear programming. Mathematical Programming, 34:251–264, 1986.

O.L. Mangasarian. Nonlinear Programming. SIAM, 1994.

R.T. Rockafellar and R.J.-B. Wets. Variational Analysis. Springer, third edition, 1998.

A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, 2009.
