
Stochastic Optimization Seminar

Lecture 6: Perturbation Analysis in Optimization

Boris Defourny¹
Scribe: Ethan X. Fang¹

¹ ORFE, Operations Research & Financial Engineering, Princeton University

March 15, 2012


Outline

1 Introduction and Notation
2 Hölder and Lipschitz Continuity for Functions
3 Growth Conditions
4 First- and Second-Order Expansions
5 Perturbation Theorems


1 Introduction and Notation

Motivation

Perturbation analysis in optimization deals with the sensitivity of optimal values and optimal solution sets to perturbations of the objective function and of the feasible set. Perturbation analysis provides a theoretical foundation that helps in analyzing algorithms that solve stochastic optimization problems approximately.

Notation

Let us write min_C f ∈ R̄ = R ∪ {−∞} ∪ {+∞} for the value of the minimum of a function f : X → R̄ over a closed set C ⊂ X. The set X is a metric space, usually Rⁿ endowed with the Euclidean norm. We assume that C is in the domain of f; in a minimization problem, dom f = {x ∈ X : f(x) < ∞}.

The optimal solution set is the set argmin_C f = {x ∈ C : f(x) = min_C f}. For ε > 0, the ε-optimal solution set is the set argmin_{C,ε} f = {x ∈ C : f(x) ≤ min_C f + ε}. If argmin_C f is a singleton {s}, we write s = argmin_C f. For an illustration, see Figure 1 on the next slide.


[Figure 1: six panels over x ∈ [−1, 1]. Top row: x², x⁴, x⁶. Bottom row: x² − 0.5x, x⁴ − 0.5x, x⁶ − 0.5x.]

Figure: Minimizing x^{2m} over the domain C = [0,1], for m = 1,2,3. If the objective is perturbed by −0.5x (tilt perturbation), the optimal solution is less "stable" with higher values of m. The ε-optimal solution sets are shown for ε = 0.05.
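As a numerical companion to Figure 1, here is a minimal sketch (ours, not from the lecture; the grid resolution is an arbitrary choice) that approximates the ε-optimal solution sets of the tilted objectives x^{2m} − 0.5x by grid search, showing how the set widens as m grows:

```python
import numpy as np

# Minimal sketch: minimize x^(2m) - 0.5x over C = [0, 1] by grid search
# and report the eps-optimal solution set for eps = 0.05. Flatter
# objectives (larger m) yield wider, less "stable" eps-optimal sets.
eps = 0.05
x = np.linspace(0.0, 1.0, 100001)  # fine grid over C = [0, 1]

for m in (1, 2, 3):
    f = x**(2 * m) - 0.5 * x       # tilt-perturbed objective
    v = f.min()                    # approximate min_C f
    eps_opt = x[f <= v + eps]      # approximate eps-optimal solution set
    print(f"m={m}: argmin ~ {x[f.argmin()]:.3f}, "
          f"eps-optimal set ~ [{eps_opt.min():.3f}, {eps_opt.max():.3f}]")
```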


Questions

Suppose that we have an objective function f_θ and a feasible set C_θ that depend on some parameter θ ∈ Θ (see Figure 2 on the next slide). We may ask questions such as:

• What are conditions on Θ ensuring that min_{C_θ} f_θ is finite for all θ ∈ Θ?
• Given Θ, what is the range of the optimal value mapping θ ↦ min_{C_θ} f_θ?


[Figure 2: four panels (k = 0,1,2,3) showing the shrinking polyhedra C_{θ_k}, with panel titles
θ_0 = [−0.06 1.64 2.13 −0.70]^T, θ_1 = [−0.26 1.44 1.93 −0.90]^T,
θ_2 = [−0.46 1.24 1.73 −1.10]^T, θ_3 = [−0.66 1.04 1.53 −1.30]^T.]

Here C_θ = {x ∈ R² : Ax ≤ θ}, where

    A = [ −0.83  −0.43 ]              [ −0.06 ]          [ 1 ]
        [  0.57   0.63 ]   and  θ_k = [  1.64 ]  − 0.2k  [ 1 ]
        [  0.28   0.80 ]              [  2.13 ]          [ 1 ]
        [  1.14  −0.90 ]              [ −0.70 ]          [ 1 ]

Figure: Convex polyhedra described by C_{θ_k} = {x ∈ R² : Ax ≤ θ_k}, for k = 0,1,2,3 (left to right).

2 Hölder and Lipschitz Continuity for Functions


Definition 1 (distance from a point to a set)

The distance from a point x ∈ X to a set S ⊂ X is written dist(x,S) = inf_{s∈S} ‖x − s‖.

Definition 2 (neighborhood of a set)

A neighborhood of a set S ⊂ X, written N_S, is a set that is a neighborhood of each point of S.


Definition 3 (Hölder condition of degree α)

A function f satisfies the Hölder condition of degree α on a set S ⊂ X (for some α ≥ 0) if there exists a constant c ≥ 0 such that

    |f(x) − f(y)| ≤ c‖x − y‖^α  for all x, y ∈ S.    (1)

When α = 1, this is the Lipschitz condition.

Definition 4 (modulus)

The modulus of f for the Hölder condition of degree α is the smallest constant c ≥ 0 such that (1) holds. Note that here we implicitly assume that f is real-valued; a similar definition is obtained for vector-valued functions by replacing |f(x) − f(y)| with ‖f(x) − f(y)‖.


Example 1

f₀(x) = ax^α with α ∈ (0,1] and a ∈ R satisfies |f₀(x) − f₀(y)| ≤ |a||x − y|^α on C = [0,u].
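A quick empirical check of Example 1 (a sketch of ours; the values of a, α, u and the random sampling are arbitrary): the ratio |f₀(x) − f₀(y)| / |x − y|^α should never exceed |a| on [0, u].

```python
import numpy as np

# Sketch: empirically check the Hölder bound |f0(x)-f0(y)| <= |a| |x-y|^alpha
# for f0(x) = a * x**alpha on C = [0, u].
rng = np.random.default_rng(0)
a, alpha, u = 2.0, 0.5, 3.0

x = rng.uniform(0.0, u, size=10000)
y = rng.uniform(0.0, u, size=10000)
mask = x != y                          # avoid 0/0 in the ratio
ratio = (np.abs(a * x[mask]**alpha - a * y[mask]**alpha)
         / np.abs(x[mask] - y[mask])**alpha)
print(f"max empirical ratio: {ratio.max():.4f} (should not exceed |a| = {abs(a)})")
```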

Example 2

Let f be continuously differentiable on an open set S, with gradient ∇f(x) on S. Let C be a compact convex subset of S. Then f is Lipschitz continuous on C with modulus c = max_{x∈C} ‖∇f(x)‖.
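The following sketch (ours; the choice of f and of the box C is arbitrary) illustrates Example 2 numerically: it estimates c = max_{x∈C} ‖∇f(x)‖ on a grid over C and checks the Lipschitz inequality on random pairs in C.

```python
import numpy as np

# Sketch: Example 2 for f(x) = x1^2 + sin(x2) on the box C = [-1, 1]^2.
def f(x):
    return x[..., 0]**2 + np.sin(x[..., 1])

def grad_f(x):
    return np.stack([2 * x[..., 0], np.cos(x[..., 1])], axis=-1)

# Estimate c = max_{x in C} ||grad f(x)|| on a grid over C.
g = np.linspace(-1.0, 1.0, 201)
X = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
c = np.linalg.norm(grad_f(X), axis=-1).max()

# Check |f(x) - f(y)| <= c ||x - y|| on random pairs in C.
rng = np.random.default_rng(1)
x, y = rng.uniform(-1, 1, (2, 5000, 2))
lhs = np.abs(f(x) - f(y))
rhs = c * np.linalg.norm(x - y, axis=-1)
print(f"c ~ {c:.4f}, violations: {(lhs > rhs + 1e-12).sum()}")
```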


3 Growth Conditions


Motivation

This section presents notions for ensuring that a function of interest is not "too flat". Objective functions that are too flat in a neighborhood of their optimal solutions lead to minimization problems where solutions can be very sensitive to perturbations. This may not be too annoying if the utility derived from the optimization model is well reflected by the value of the objective function. However, there exist settings where the utility of the model is also derived from the solution itself.


Definition 6 (γ-order growth condition)

Consider a function f and a nonempty set C ⊂ X. Let S ⊂ C be a nonempty set on which f is constant, with value f_S ∈ R:

    f(s) = f_S  for all s ∈ S.

The γ-order growth condition (γ > 0) holds for f on C if there exist a constant c > 0 and a neighborhood N_S of S such that

    f(x) − f_S ≥ c[dist(x,S)]^γ  for all x ∈ N_S ∩ C.


Remark

Ideally, we are interested in the largest c > 0 that satisfies the inequality for a fixed γ. Compared to Hölder-type conditions, here only x needs to vary, and the absolute value is not taken on the left-hand side. Hence in practice, S stands for a set of local minimizers.


Example (strongly convex function)

Let f be a twice differentiable, strongly convex function over a convex set C containing its minimum: there exists a constant m > 0 such that ∇²f(x) ⪰ mI for all x ∈ C. Since for any x, y ∈ C there exists a point z = tx + (1−t)y with t ∈ [0,1] such that

    f(y) − f(x) = ∇f(x)^⊤(y − x) + (1/2)(y − x)^⊤ ∇²f(z)(y − x),

we have, by the strong convexity assumption,

    f(y) − f(x) ≥ ∇f(x)^⊤(y − x) + (m/2)‖y − x‖².

Consider S = argmin_C f, s ∈ S, and write f_S = min_C f. We have

    f(y) − f_S ≥ ∇f(s)^⊤(y − s) + (m/2)‖y − s‖² ≥ (m/2)‖y − s‖²,

using the fact that ∇f(s)^⊤(y − s) ≥ 0 for all y ∈ C and s ∈ S, by optimality of s. This proves that S in fact reduces to {s}, and that the second-order growth condition holds with modulus c = m/2:

    f(x) − f_S ≥ c[dist(x,S)]²  for all x ∈ C.
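As a sanity check of this example (our own sketch, with an arbitrary random Q), for f(x) = ½x^⊤Qx with Q ⪰ mI the minimizer is s = 0 with f_S = 0, and the second-order growth condition should hold with c = m/2:

```python
import numpy as np

# Sketch: check the second-order growth condition f(x) - f_S >= (m/2)*dist(x, S)^2
# for f(x) = 0.5 * x^T Q x with Q >= m*I. Here S = {0} and f_S = 0.
rng = np.random.default_rng(2)

A = rng.standard_normal((3, 3))
Q = A @ A.T + 0.5 * np.eye(3)          # symmetric positive definite
m = np.linalg.eigvalsh(Q).min()        # strong convexity constant: Q >= m*I

x = rng.standard_normal((10000, 3))
f_vals = 0.5 * np.einsum('ij,jk,ik->i', x, Q, x)   # f(x) = 0.5 x^T Q x
growth = 0.5 * m * np.linalg.norm(x, axis=1)**2    # (m/2) * dist(x, {0})^2
print(f"m = {m:.4f}, violations: {(f_vals < growth - 1e-10).sum()}")
```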


4 First- and Second-Order Expansions


Directional Differentiability

A function g is directionally differentiable at x if the directional derivatives

    g′(x,h) = lim_{t→0⁺} [g(x + th) − g(x)] / t

exist for all directions h.

Remark

If g is directionally differentiable at x, we can expand g(x + th) for t > 0 as

    g(x + th) = g(x) + t g′(x,h) + o(t).


Second-Order Expansions

A directionally differentiable function g is second-order directionally differentiable at x in the direction h if the (parabolic) second-order directional derivatives

    g″(x,h,w) = lim_{t→0⁺} [g(x + th + (t²/2)w) − g(x) − t g′(x,h)] / (t²/2)

exist for all (parabolic) directions w. In that case we can expand g(x + th + (t²/2)w) for t > 0 as

    g(x + th + (t²/2)w) = g(x) + t g′(x,h) + (t²/2) g″(x,h,w) + o(t²).
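To make these definitions concrete, here is a small finite-difference sketch (ours; the test function is arbitrary). For a twice continuously differentiable g, the directional derivatives reduce to g′(x,h) = ∇g(x)^⊤h and g″(x,h,w) = ∇g(x)^⊤w + h^⊤∇²g(x)h, which the difference quotients below should approach:

```python
import numpy as np

# Sketch: finite-difference estimates of the first- and second-order
# directional derivatives of a smooth g, compared with the closed forms
# g'(x,h) = grad(x).h and g''(x,h,w) = grad(x).w + h.Hess(x).h.
def g(x):
    return np.exp(x[0]) + x[0] * x[1]**2

x = np.array([0.3, -0.5])
h = np.array([1.0, 2.0])
w = np.array([-1.0, 0.5])

grad = np.array([np.exp(x[0]) + x[1]**2, 2 * x[0] * x[1]])
hess = np.array([[np.exp(x[0]), 2 * x[1]],
                 [2 * x[1],     2 * x[0]]])

t = 1e-5
d1 = (g(x + t * h) - g(x)) / t
# Use the exact g'(x,h) inside the second-order difference quotient;
# otherwise the O(t) error of d1 blows up after dividing by t^2/2.
d2 = (g(x + t * h + 0.5 * t**2 * w) - g(x) - t * (grad @ h)) / (0.5 * t**2)

print(f"g'(x,h):    fd {d1:.4f}  vs exact {grad @ h:.4f}")
print(f"g''(x,h,w): fd {d2:.4f}  vs exact {grad @ w + h @ hess @ h:.4f}")
```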


5 Perturbation Theorems


Fixed Feasible Set

Consider the optimization problem

    minimize  f(x)
    subject to  x ∈ C,    (2)

where C ⊂ Rⁿ. Let v̄ = min_C f and S_f = argmin_C f. Assume that

• A1. S_f is nonempty.
• A2. f satisfies a γ-order growth condition with modulus c in a neighborhood N of argmin_C f.


Perturbed Optimization Problem

Consider the perturbed optimization problem

    minimize  f(x) + η(x)
    subject to  x ∈ C,    (3)

where we assume that

• B1. η is Lipschitz continuous with modulus κ on C ∩ N.

Let g(x) = f(x) + η(x), and consider any ε-optimal solution x̃_g ∈ argmin_{C,ε} g such that x̃_g ∈ N.


Theorem 1 [Bonnans and Shapiro (2000)]

Under assumptions A1, A2, and B1, the distance between x̃_g and S_f = argmin_C f satisfies the relation

    c[dist(x̃_g, S_f)]^γ ≤ κ dist(x̃_g, S_f) + ε.

In particular, for the second-order growth condition (γ = 2), the relation yields

    dist(x̃_g, S_f) ≤ (1/2)(κ/c) + √((1/2)(κ/c))² + ε/c) ≤ κ/c + √(ε/c).

For the first-order growth condition (γ = 1), if κ < c, the relation yields

    dist(x̃_g, S_f) ≤ ε/(c − κ).
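Here is a numerical illustration of Theorem 1 (our own sketch; the instance is arbitrary). For f(x) = x² on C = [−1, 1], the second-order growth condition holds with c = 1 and S_f = {0}; the tilt η(x) = κx is Lipschitz with modulus κ, so every ε-minimizer of g = f + η must satisfy dist(x̃_g, S_f) ≤ κ/c + √(ε/c):

```python
import numpy as np

# Sketch: verify the Theorem 1 bound dist(x_g, S_f) <= kappa/c + sqrt(eps/c)
# for f(x) = x^2 over C = [-1, 1] (second-order growth, c = 1, S_f = {0}),
# perturbed by the tilt eta(x) = kappa * x (Lipschitz with modulus kappa).
c = 1.0
x = np.linspace(-1.0, 1.0, 200001)

for kappa in (0.1, 0.3):
    for eps in (0.01, 0.05):
        g = x**2 + kappa * x
        eps_opt = x[g <= g.min() + eps]       # eps-optimal solutions of g
        worst = np.abs(eps_opt).max()         # largest distance to S_f = {0}
        bound = kappa / c + np.sqrt(eps / c)
        print(f"kappa={kappa}, eps={eps}: worst dist {worst:.4f} "
              f"<= bound {bound:.4f}")
```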


Proof of Theorem 1

Consider x̄_f ∈ argmin_C f, and x̃_g ∈ argmin_{C,ε} g ∩ N. We have g(x̃_g) ≤ min_C g + ε ≤ g(x̄_f) + ε, implying g(x̃_g) − g(x̄_f) ≤ ε. Therefore,

    f(x̃_g) − f(x̄_f) = g(x̃_g) − η(x̃_g) − [g(x̄_f) − η(x̄_f)]
                     = g(x̃_g) − g(x̄_f) + η(x̄_f) − η(x̃_g) ≤ ε + κ‖x̃_g − x̄_f‖,

where the last inequality is obtained by using the Lipschitz condition on η. On the other hand, f(x̃_g) − f(x̄_f) ≥ c[dist(x̃_g, S_f)]^γ. For any δ > 0, we can choose x̄_f such that ‖x̃_g − x̄_f‖ ≤ dist(x̃_g, S_f) + δ. Therefore

    f(x̃_g) − f(x̄_f) ≥ c[‖x̃_g − x̄_f‖ − δ]^γ.

We have thus obtained c[‖x̃_g − x̄_f‖ − δ]^γ ≤ ε + κ‖x̃_g − x̄_f‖. Letting δ → 0 yields ‖x̃_g − x̄_f‖ → dist(x̃_g, S_f) and the relation

    c y^γ ≤ ε + κ y,

where y = dist(x̃_g, S_f) ≥ 0. The results for particular values of γ follow by solving for y. When γ = 2 one has to solve cy² − κy − ε ≤ 0, y ≥ 0. When γ = 1 one has to solve (c − κ)y ≤ ε, y ≥ 0.


More Assumptions

More can be said under additional conditions. Assume that

• A3. The set C is compact. We define a convex compact set U ⊂ Rⁿ such that C ⊂ int U.
• A4. f is Lipschitz continuous on U.
• A5. S_f = argmin_C f is a singleton. We write x̄ = argmin_C f.
• A2′. f satisfies a quadratic growth condition at x̄, with modulus c, on U.
• A6. f is twice continuously differentiable at x̄. In particular, we can expand f(x(t)) where x(t) = x̄ + th + (t²/2)w + o(t²) as follows:

    f(x(t)) = f(x̄) + t f′(x̄, h) + (t²/2) f″(x̄, h, w) + o(t²)
            = f(x̄) + t∇f(x̄)^⊤h + (t²/2)(∇f(x̄)^⊤w + h^⊤∇²f(x̄)h) + o(t²).

• A7. C is convex polyhedral.


More Assumptions...

For t > 0, consider the perturbed problems

    P(t) : minimize f(x) + tη_t(x) subject to x ∈ C,

where the perturbations η_t satisfy the following assumptions:

• B1′. The functions η_t are Lipschitz continuous on U.
• B2. As t ↓ 0, η_t converges to a function η which is Lipschitz continuous.
• B3. The function η is differentiable at x̄. In particular, we can expand η(x(t)) where x(t) = x̄ + th + (t²/2)w + o(t²) as follows:

    η(x(t)) = η(x̄) + t∇η(x̄)^⊤h + o(t).

Let g_t(x) = f(x) + tη_t(x), and consider v(t) = min_C g_t and x̄(t) = argmin_C g_t.


Theorem 2 [Shapiro, Dentcheva, and Ruszczyński (2009)]

Under all the assumptions made on the previous two slides, for t ≥ 0 we have the second-order expansion of the optimal value

    v(t) = v̄ + tη_t(x̄) + (t²/2) h_{f,η}(x̄) + o(t²),

where h_{f,η}(x̄) is the optimal value of the auxiliary problem over h,

    minimize  2h^⊤∇η(x̄) + h^⊤∇²f(x̄)h
    subject to  h ∈ C_crit(x̄) := {d ∈ T_C(x̄) : d^⊤∇f(x̄) = 0},

with T_C(x) := {d ∈ Rⁿ : dist(x + td, C) = o(t), t ≥ 0}.

Moreover, if the solution to the auxiliary problem is unique, say h̄, we have for t ≥ 0 the first-order expansion of the optimal solution

    x̄(t) = x̄ + th̄ + o(t).
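To see Theorem 2 in action, consider the simple smooth instance (our own sketch, not from the lecture; we ignore the compactness assumption A3, which is immaterial here) f(x) = ‖x‖² with C = Rⁿ, so that x̄ = 0, v̄ = 0, and the critical cone is all of Rⁿ, perturbed by a fixed linear η_t = η with η(x) = b^⊤x. The auxiliary problem minimizes 2h^⊤b + 2‖h‖², giving h̄ = −b/2 and h_{f,η}(x̄) = −‖b‖²/2; the sketch checks the predicted expansions against the exact perturbed solutions.

```python
import numpy as np

# Sketch: check Theorem 2 on f(x) = ||x||^2, C = R^n, eta(x) = b^T x.
# Exact: xbar = 0, vbar = 0, perturbed minimizer xbar(t) = -t*b/2,
#        v(t) = -t^2 ||b||^2 / 4.
# Theorem: hbar = -b/2 (auxiliary QP), h_{f,eta}(xbar) = -||b||^2 / 2,
#          so v(t) ~ vbar + t*eta(xbar) + (t^2/2)*h_{f,eta} = -t^2 ||b||^2 / 4.
b = np.array([1.0, -2.0, 0.5])

hbar = -b / 2.0                         # argmin of 2 h.b + 2||h||^2
h_val = 2 * hbar @ b + 2 * hbar @ hbar  # optimal value of the auxiliary QP

for t in (0.1, 0.01):
    x_t = -t * b / 2.0                  # exact minimizer of ||x||^2 + t b.x
    v_t = x_t @ x_t + t * b @ x_t       # exact optimal value
    v_pred = 0.0 + t * 0.0 + 0.5 * t**2 * h_val
    x_pred = t * hbar
    print(f"t={t}: v(t)={v_t:.6f}, predicted {v_pred:.6f}; "
          f"||xbar(t) - t*hbar|| = {np.linalg.norm(x_t - x_pred):.2e}")
```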


References

J.F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer, New York, 2000.

W. Cook, A.M.H. Gerards, and A. Schrijver. Sensitivity theorems in integer linear programming. Mathematical Programming, 34:251–264, 1986.

O.L. Mangasarian. Nonlinear Programming. SIAM, 1994.

R.T. Rockafellar and R.J.-B. Wets. Variational Analysis. Springer, third edition, 1998.

A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, 2009.
