v2010.10.26 - Convex Optimization

[Figure 76: Gradient in $\mathbb{R}^2$ evaluated on a grid over some open disc in the domain of the convex quadratic bowl $f(Y) = Y^{\mathsf T}Y : \mathbb{R}^2 \to \mathbb{R}$ illustrated in Figure 77. Circular contours are level sets, each defined by a constant function value. Axes: $Y_1$, $Y_2$.]

3.6 Gradient

Gradient $\nabla f$ of any differentiable multidimensional function $f$ (formally defined in §D.1) maps each entry $f_i$ to a space having the same dimension as the ambient space of its domain. Notation $\nabla f$ is shorthand for gradient $\nabla_x f(x)$ of $f$ with respect to $x$. $\nabla f(y)$ can mean $\nabla_y f(y)$ or gradient $\nabla_x f(y)$ of $f(x)$ with respect to $x$ evaluated at $y$; a distinction that should become clear from context.

The gradient of a differentiable real function $f(x) : \mathbb{R}^K \to \mathbb{R}$ with respect to its vector argument is uniquely defined

\[
\nabla f(x) =
\begin{bmatrix}
\dfrac{\partial f(x)}{\partial x_1} \\[1ex]
\dfrac{\partial f(x)}{\partial x_2} \\[1ex]
\vdots \\[1ex]
\dfrac{\partial f(x)}{\partial x_K}
\end{bmatrix}
\in \mathbb{R}^K
\tag{1759}
\]
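As a quick illustration (not from the text), the quadratic bowl of Figure 76 has $\nabla f(Y) = 2Y$, and definition (1759) can be checked entry by entry with central differences. The sketch below assumes NumPy; the names `f` and `numerical_gradient` are illustrative only.

```python
# Minimal sketch: check definition (1759) numerically for the
# quadratic bowl f(Y) = Y'Y of Figure 76, whose gradient is 2Y.
import numpy as np

def f(y):
    return float(y @ y)                 # f(Y) = Y'Y : R^K -> R

def numerical_gradient(f, y, h=1e-6):
    """Central-difference estimate of each entry of (1759)."""
    g = np.zeros_like(y, dtype=float)
    for k in range(y.size):
        e = np.zeros_like(y, dtype=float)
        e[k] = h
        g[k] = (f(y + e) - f(y - e)) / (2 * h)
    return g

y = np.array([0.7, -0.3])
print(numerical_gradient(f, y))         # approximately [ 1.4, -0.6]
print(2 * y)                            # analytic gradient 2Y
```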

The second-order gradient of a twice differentiable real function with respect to its vector argument is traditionally called the Hessian:^{3.17}

\[
\nabla^2 f(x) =
\begin{bmatrix}
\dfrac{\partial^2 f(x)}{\partial x_1^2} & \dfrac{\partial^2 f(x)}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1\,\partial x_K} \\[1ex]
\dfrac{\partial^2 f(x)}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_2\,\partial x_K} \\[1ex]
\vdots & \vdots & \ddots & \vdots \\[1ex]
\dfrac{\partial^2 f(x)}{\partial x_K\,\partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_K\,\partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_K^2}
\end{bmatrix}
\in \mathbb{S}^K
\tag{1760}
\]

The gradient can be interpreted as a vector pointing in the direction of greatest change [219, §15.6], or polar to the direction of steepest descent.^{3.18} [375] The gradient can also be interpreted as that vector normal to a sublevel set; e.g., Figure 78, Figure 67.

For the quadratic bowl in Figure 77, the gradient maps to $\mathbb{R}^2$; illustrated in Figure 76. For a one-dimensional function of real variable, for example, the gradient evaluated at any point in the function domain is just the slope (or derivative) of that function there. (confer §D.1.4.1)

For any differentiable multidimensional function, zero gradient $\nabla f = 0$ is a condition necessary for its unconstrained minimization [152, §3.2]:

3.6.0.0.1 Example. Projection on rank-1 subset.
For $A \in \mathbb{S}^N$ having eigenvalues $\lambda(A) = [\lambda_i] \in \mathbb{R}^N$, consider the unconstrained nonconvex optimization that is a projection on the rank-1 subset (§2.9.2.1) of the positive semidefinite cone $\mathbb{S}^N_+$: defining $\lambda_1 \triangleq \max_i\{\lambda(A)_i\}$ and corresponding eigenvector $v_1$,

\[
\begin{aligned}
\underset{x}{\text{minimize}}\;\; \|xx^{\mathsf T} - A\|_F^2
&= \underset{x}{\text{minimize}}\;\; \operatorname{tr}\!\left(xx^{\mathsf T}(x^{\mathsf T}x) - 2Axx^{\mathsf T} + A^{\mathsf T}A\right) \\
&= \begin{cases}
\|\lambda(A)\|^2\,, & \lambda_1 \le 0 \\
\|\lambda(A)\|^2 - \lambda_1^2\,, & \lambda_1 > 0
\end{cases}
\end{aligned}
\tag{1695}
\]

\[
\arg\,\underset{x}{\text{minimize}}\;\; \|xx^{\mathsf T} - A\|_F^2 =
\begin{cases}
0\,, & \lambda_1 \le 0 \\
v_1\sqrt{\lambda_1}\,, & \lambda_1 > 0
\end{cases}
\tag{1696}
\]

From (1789) and §D.2.1, the gradient of $\|xx^{\mathsf T} - A\|_F^2$ is

\[
\nabla_x\!\left((x^{\mathsf T}x)^2 - 2x^{\mathsf T}Ax\right) = 4(x^{\mathsf T}x)\,x - 4Ax
\tag{590}
\]

^{3.17} The Jacobian is the Hessian transpose, so commonly confused in matrix calculus.
^{3.18} Newton's direction $-\nabla^2 f(x)^{-1}\nabla f(x)$ is better for optimization of nonlinear functions well approximated locally by a quadratic function. [152, p. 105]
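Setting gradient (590) to zero recovers the eigenvalue condition $Ax = (x^{\mathsf T}x)x$, consistent with the minimizer (1696): at $x = v_1\sqrt{\lambda_1}$ we have $x^{\mathsf T}x = \lambda_1$ and $Ax = \lambda_1 x$, so the necessary condition $\nabla f = 0$ holds. The sketch below is not from the text; it assumes NumPy and a randomly generated symmetric $A$, and simply checks (1695), (1696), and (590) numerically.

```python
# Minimal sketch: numerical check of (1695), (1696), and (590)
# for a small randomly generated symmetric matrix A.
import numpy as np

rng = np.random.default_rng(0)
N = 5
B = rng.standard_normal((N, N))
A = (B + B.T) / 2                        # A in S^N

lam, V = np.linalg.eigh(A)               # eigenvalues in ascending order
lam1, v1 = lam[-1], V[:, -1]             # largest eigenvalue, its eigenvector

# Minimizer (1696)
x_star = v1 * np.sqrt(lam1) if lam1 > 0 else np.zeros(N)

# Objective value vs. closed form (1695)
obj = np.linalg.norm(np.outer(x_star, x_star) - A, 'fro') ** 2
pred = lam @ lam - (lam1 ** 2 if lam1 > 0 else 0.0)
print(obj, pred)                         # should agree to rounding error

# Gradient (590) vanishes at the minimizer
grad = 4 * (x_star @ x_star) * x_star - 4 * A @ x_star
print(np.linalg.norm(grad))              # approximately 0
```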
