v2010.10.26 - Convex Optimization
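672 APPENDIX D. MATRIX CALCULUS

Gradient of vector-valued function $g(X) : \mathbb{R}^{K\times L}\to\mathbb{R}^{N}$ on matrix domain is a cubix

\[
\begin{aligned}
\nabla g(X) &\triangleq
\begin{bmatrix}
\nabla_{X(:,1)}\, g_1(X) & \nabla_{X(:,1)}\, g_2(X) & \cdots & \nabla_{X(:,1)}\, g_N(X)\\
\nabla_{X(:,2)}\, g_1(X) & \nabla_{X(:,2)}\, g_2(X) & \cdots & \nabla_{X(:,2)}\, g_N(X)\\
\vdots & \vdots & & \vdots\\
\nabla_{X(:,L)}\, g_1(X) & \nabla_{X(:,L)}\, g_2(X) & \cdots & \nabla_{X(:,L)}\, g_N(X)
\end{bmatrix}\\[4pt]
&= \begin{bmatrix} \nabla g_1(X) & \nabla g_2(X) & \cdots & \nabla g_N(X) \end{bmatrix}
\in \mathbb{R}^{K\times N\times L}
\end{aligned}
\tag{1767}
\]

while the second-order gradient has a five-dimensional representation;

\[
\begin{aligned}
\nabla^2 g(X) &\triangleq
\begin{bmatrix}
\nabla\nabla_{X(:,1)}\, g_1(X) & \nabla\nabla_{X(:,1)}\, g_2(X) & \cdots & \nabla\nabla_{X(:,1)}\, g_N(X)\\
\nabla\nabla_{X(:,2)}\, g_1(X) & \nabla\nabla_{X(:,2)}\, g_2(X) & \cdots & \nabla\nabla_{X(:,2)}\, g_N(X)\\
\vdots & \vdots & & \vdots\\
\nabla\nabla_{X(:,L)}\, g_1(X) & \nabla\nabla_{X(:,L)}\, g_2(X) & \cdots & \nabla\nabla_{X(:,L)}\, g_N(X)
\end{bmatrix}\\[4pt]
&= \begin{bmatrix} \nabla^2 g_1(X) & \nabla^2 g_2(X) & \cdots & \nabla^2 g_N(X) \end{bmatrix}
\in \mathbb{R}^{K\times N\times L\times K\times L}
\end{aligned}
\tag{1768}
\]

The gradient of matrix-valued function $g(X) : \mathbb{R}^{K\times L}\to\mathbb{R}^{M\times N}$ on matrix domain has a four-dimensional representation called quartix (fourth-order tensor)

\[
\nabla g(X) \triangleq
\begin{bmatrix}
\nabla g_{11}(X) & \nabla g_{12}(X) & \cdots & \nabla g_{1N}(X)\\
\nabla g_{21}(X) & \nabla g_{22}(X) & \cdots & \nabla g_{2N}(X)\\
\vdots & \vdots & & \vdots\\
\nabla g_{M1}(X) & \nabla g_{M2}(X) & \cdots & \nabla g_{MN}(X)
\end{bmatrix}
\in \mathbb{R}^{M\times N\times K\times L}
\tag{1769}
\]

while the second-order gradient has six-dimensional representation

\[
\nabla^2 g(X) \triangleq
\begin{bmatrix}
\nabla^2 g_{11}(X) & \nabla^2 g_{12}(X) & \cdots & \nabla^2 g_{1N}(X)\\
\nabla^2 g_{21}(X) & \nabla^2 g_{22}(X) & \cdots & \nabla^2 g_{2N}(X)\\
\vdots & \vdots & & \vdots\\
\nabla^2 g_{M1}(X) & \nabla^2 g_{M2}(X) & \cdots & \nabla^2 g_{MN}(X)
\end{bmatrix}
\in \mathbb{R}^{M\times N\times K\times L\times K\times L}
\tag{1770}
\]

and so on.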
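The index bookkeeping in (1767) can be made concrete with a small numerical sketch. The code below is an illustration only: it assumes the cubix is stored as a nested list `G[k][n][l]` holding $\partial g_n(X)/\partial X_{kl}$ (one reading of the $K\times N\times L$ layout), it approximates the partial derivatives by central differences, and the names `cubix_gradient`, `shape`, `C`, and `b` are hypothetical choices for the example, not notation from the text.

```python
# Sketch: finite-difference gradient of a vector-valued function of a matrix,
# stored as a K x N x L nested list ("cubix"), following the layout of (1767).
# All helper names here are illustrative assumptions.

def matvec(A, v):
    """Return A v for a list-of-rows matrix A and a list v."""
    return [sum(x * vj for x, vj in zip(row, v)) for row in A]

def cubix_gradient(g, X, h=1e-6):
    """Approximate G[k][n][l] = d g_n(X) / d X[k][l] by central differences."""
    K, L = len(X), len(X[0])
    N = len(g(X))
    G = [[[0.0] * L for _ in range(N)] for _ in range(K)]
    for k in range(K):
        for l in range(L):
            Xp = [row[:] for row in X]   # copy of X with entry (k, l) nudged up
            Xm = [row[:] for row in X]   # copy of X with entry (k, l) nudged down
            Xp[k][l] += h
            Xm[k][l] -= h
            gp, gm = g(Xp), g(Xm)
            for n in range(N):
                G[k][n][l] = (gp[n] - gm[n]) / (2 * h)
    return G

def shape(A):
    """Return the dimensions of a rectangular nested list."""
    dims = []
    while isinstance(A, list):
        dims.append(len(A))
        A = A[0]
    return tuple(dims)

# Example with K = 2, L = 3, N = 4: g(X) = C X b.
C = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [0.5, 3.0]]   # 4 x 2
b = [2.0, -1.0, 0.5]                                     # length 3
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]                   # 2 x 3

G = cubix_gradient(lambda M: matvec(C, matvec(M, b)), X)
print(shape(G))  # dimensions K, N, L
```

For this linear choice of $g$ each entry has the closed form $\partial g_n/\partial X_{kl} = C_{nk}\, b_l$, which gives a quick way to sanity-check the array.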
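D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 673

D.1.2 Product rules for matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable $f(X)$ and $g(X)$

\[
\nabla_X\big( f(X)^{T} g(X) \big) = \nabla_X(f)\, g + \nabla_X(g)\, f
\tag{1771}
\]

while [53, §8.3] [315]

\[
\nabla_X \operatorname{tr}\big( f(X)^{T} g(X) \big)
= \nabla_X \Big( \operatorname{tr}\big( f(X)^{T} g(Z) \big) + \operatorname{tr}\big( g(X)\, f(Z)^{T} \big) \Big)\Big|_{Z\leftarrow X}
\tag{1772}
\]

These expressions implicitly apply as well to scalar-, vector-, or matrix-valued functions of scalar, vector, or matrix arguments.

D.1.2.0.1 Example. Cubix.
Suppose $f(X) : \mathbb{R}^{2\times 2}\to\mathbb{R}^{2} = X^{T}a$ and $g(X) : \mathbb{R}^{2\times 2}\to\mathbb{R}^{2} = Xb$. We wish to find

\[
\nabla_X\big( f(X)^{T} g(X) \big) = \nabla_X\, a^{T} X^{2} b
\tag{1773}
\]

using the product rule. Formula (1771) calls for

\[
\nabla_X\, a^{T} X^{2} b = \nabla_X(X^{T}a)\, Xb + \nabla_X(Xb)\, X^{T}a
\tag{1774}
\]

Consider the first of the two terms:

\[
\nabla_X(f)\, g = \nabla_X(X^{T}a)\, Xb
= \begin{bmatrix} \nabla (X^{T}a)_1 & \nabla (X^{T}a)_2 \end{bmatrix} Xb
\tag{1775}
\]

The gradient of $X^{T}a$ forms a cubix in $\mathbb{R}^{2\times 2\times 2}$; a.k.a. third-order tensor. Written with its two $2\times 2$ slices $\nabla (X^{T}a)_1$ and $\nabla (X^{T}a)_2$ side by side,

\[
\nabla_X(X^{T}a)\, Xb =
\left[\;
\begin{bmatrix}
\dfrac{\partial (X^{T}a)_1}{\partial X_{11}} & \dfrac{\partial (X^{T}a)_1}{\partial X_{12}}\\[8pt]
\dfrac{\partial (X^{T}a)_1}{\partial X_{21}} & \dfrac{\partial (X^{T}a)_1}{\partial X_{22}}
\end{bmatrix}
\;\;
\begin{bmatrix}
\dfrac{\partial (X^{T}a)_2}{\partial X_{11}} & \dfrac{\partial (X^{T}a)_2}{\partial X_{12}}\\[8pt]
\dfrac{\partial (X^{T}a)_2}{\partial X_{21}} & \dfrac{\partial (X^{T}a)_2}{\partial X_{22}}
\end{bmatrix}
\;\right]
\begin{bmatrix} (Xb)_1\\ (Xb)_2 \end{bmatrix}
\in \mathbb{R}^{2\times 1\times 2}
\tag{1776}
\]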
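Product rule (1774) can be checked numerically for this example. The sketch below is not the text's derivation: it assumes the cubix is stored as `G[k][n][l]` $= \partial g_n/\partial X_{kl}$, replaces the symbolic gradients with central differences, and interprets the product of a cubix with a vector as a sum over the middle index (which leaves a $2\times 1\times 2$ array, stored here as a plain $2\times 2$ list). The helper names and the particular values of `a`, `b`, and `X` are hypothetical.

```python
# Sketch: numerical check of product rule (1774) for f(X) = X^T a, g(X) = X b.
# Helper names and test values are illustrative assumptions.

def matvec(A, v):
    """Return A v for a list-of-rows matrix A and a list v."""
    return [sum(x * vj for x, vj in zip(row, v)) for row in A]

def transpose(A):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]

def cubix_gradient(g, X, h=1e-5):
    """Approximate G[k][n][l] = d g_n(X) / d X[k][l] by central differences."""
    K, L = len(X), len(X[0])
    N = len(g(X))
    G = [[[0.0] * L for _ in range(N)] for _ in range(K)]
    for k in range(K):
        for l in range(L):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[k][l] += h
            Xm[k][l] -= h
            gp, gm = g(Xp), g(Xm)
            for n in range(N):
                G[k][n][l] = (gp[n] - gm[n]) / (2 * h)
    return G

def contract(G, v):
    """Sum over the middle index: T[k][l] = sum_n G[k][n][l] * v[n]."""
    K, N, L = len(G), len(G[0]), len(G[0][0])
    return [[sum(G[k][n][l] * v[n] for n in range(N)) for l in range(L)]
            for k in range(K)]

a = [1.0, -2.0]
b = [3.0, 0.5]
X = [[1.0, 2.0], [-1.0, 0.5]]

f = lambda M: matvec(transpose(M), a)                         # X^T a
g = lambda M: matvec(M, b)                                    # X b
phi = lambda M: [sum(p * q for p, q in zip(f(M), g(M)))]      # [a^T X^2 b]

# Left side of (1774): gradient of the scalar a^T X^2 b.
lhs = contract(cubix_gradient(phi, X), [1.0])

# Right side of (1774): the two product-rule terms.
term1 = contract(cubix_gradient(f, X), g(X))   # grad_X(X^T a) X b
term2 = contract(cubix_gradient(g, X), f(X))   # grad_X(X b) X^T a
rhs = [[t1 + t2 for t1, t2 in zip(r1, r2)] for r1, r2 in zip(term1, term2)]

# Largest entrywise disagreement; limited only by finite-difference rounding.
max_gap = max(abs(p - q) for rl, rr in zip(lhs, rhs) for p, q in zip(rl, rr))
print(max_gap)
```

Because $a^{T}X^{2}b$ is quadratic in $X$, central differences carry no truncation error here, so the two sides should agree to roughly rounding precision.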