Convex Optimization & Euclidean Distance Geometry

Jon Dattorro

Mεβoo Publishing


Meboo Publishing USA
PO Box 12
Palo Alto, California 94302

Dattorro, Convex Optimization & Euclidean Distance Geometry, Mεβoo, 2005, v2010.10.26.

ISBN 0976401304 (English)
ISBN 9780615193687 (International III)

This is version 2010.10.26: available in print, as conceived, in color.

cybersearch:
  I. convex optimization
  II. semidefinite program
  III. rank constraint
  IV. convex geometry
  V. distance matrix
  VI. convex cones
  VII. cardinality minimization

programs by Matlab; typesetting with donations from SIAM and AMS.

This searchable electronic color pdfBook is click-navigable within the text by page, section, subsection, chapter, theorem, example, definition, cross reference, citation, equation, figure, table, and hyperlink. A pdfBook has no electronic copy protection and can be read and printed by most computers. The publisher hereby grants the right to reproduce this work in any format but limited to personal use.

© 2001-2010 Meboo
All rights reserved.


for Jennie Columba ♦ Antonio ♦♦ & Sze Wan


EDM^N = S^N_h ∩ (S^{N⊥}_c − S^N_+)


Prelude

    The constant demands of my department and university and the ever increasing work needed to obtain funding have stolen much of my precious thinking time, and I sometimes yearn for the halcyon days of Bell Labs.
    −Steven Chu, Nobel laureate [81]

Convex Analysis is the calculus of inequalities while Convex Optimization is its application. Analysis is inherently the domain of a mathematician while Optimization belongs to the engineer. Practitioners in the art of Convex Optimization engage themselves with discovery of which hard problems, perhaps previously believed nonconvex, can be transformed into convex equivalents; because once convex form of a problem is found, then a globally optimal solution is close at hand − the hard work is finished: Finding convex expression of a problem is itself, in a very real sense, the solution.


There is a great race under way to determine which important problems can be posed in a convex setting. Yet, that skill acquired by understanding the geometry and application of Convex Optimization will remain more an art for some time to come; the reason being, there is generally no unique transformation of a given problem to its convex equivalent. This means two researchers pondering the same problem are likely to formulate a convex equivalent differently; hence, one solution is likely different from the other for the same problem, and any convex combination of those two solutions remains optimal. Any presumption of only one right or correct solution becomes nebulous. Study of equivalence & sameness, uniqueness, and duality therefore pervades study of Optimization.

Tremendous benefit accrues when an optimization problem can be transformed to its convex equivalent, primarily because any locally optimal solution is then guaranteed globally optimal. Solving a nonlinear system, for example, by instead solving an equivalent convex optimization problem is therefore highly preferable.^0.1 Yet it can be difficult for the engineer to apply theory without an understanding of Analysis.

These pages comprise my journal over an eight year period bridging gaps between engineer and mathematician; they constitute a translation, unification, and cohering of about three hundred papers, books, and reports from several different fields of mathematics and engineering. Beacons of historical accomplishment are cited throughout. Much of what is written here will not be found elsewhere. Care to detail, clarity, accuracy, consistency, and typography accompanies removal of ambiguity and verbosity out of respect for the reader. Consequently there is much cross-referencing and background material provided in the text, footnotes, and appendices so as to be self-contained and to provide understanding of fundamental concepts.

    −Jon Dattorro
    Stanford, California
    2009

0.1 That is what motivates a convex optimization known as geometric programming, invented in the 1960s, [60] [79] which has driven great advances in the electronic circuit design industry. [34, 4.7] [248] [383] [386] [101] [180] [189] [190] [191] [192] [193] [264] [265] [309]


Convex Optimization & Euclidean Distance Geometry

1 Overview ... 21

2 Convex geometry ... 35
  2.1 Convex set ... 36
  2.2 Vectorized-matrix inner product ... 50
  2.3 Hulls ... 61
  2.4 Halfspace, Hyperplane ... 72
  2.5 Subspace representations ... 85
  2.6 Extreme, Exposed ... 92
  2.7 Cones ... 96
  2.8 Cone boundary ... 106
  2.9 Positive semidefinite (PSD) cone ... 113
  2.10 Conic independence (c.i.) ... 140
  2.11 When extreme means exposed ... 146
  2.12 Convex polyhedra ... 147
  2.13 Dual cone & generalized inequality ... 153

3 Geometry of convex functions ... 219
  3.1 Convex function ... 220
  3.2 Practical norm functions, absolute value ... 225
  3.3 Inverted functions and roots ... 235
  3.4 Affine function ... 237
  3.5 Epigraph, Sublevel set ... 241
  3.6 Gradient ... 250


  3.7 Convex matrix-valued function ... 262
  3.8 Quasiconvex ... 269
  3.9 Salient properties ... 271

4 Semidefinite programming ... 273
  4.1 Conic problem ... 274
  4.2 Framework ... 284
  4.3 Rank reduction ... 299
  4.4 Rank-constrained semidefinite program ... 308
  4.5 Constraining cardinality ... 333
  4.6 Cardinality and rank constraint examples ... 353
  4.7 Constraining rank of indefinite matrices ... 384
  4.8 Convex Iteration rank-1 ... 390

5 Euclidean Distance Matrix ... 395
  5.1 EDM ... 396
  5.2 First metric properties ... 397
  5.3 ∃ fifth Euclidean metric property ... 397
  5.4 EDM definition ... 402
  5.5 Invariance ... 434
  5.6 Injectivity of D & unique reconstruction ... 439
  5.7 Embedding in affine hull ... 445
  5.8 Euclidean metric versus matrix criteria ... 450
  5.9 Bridge: Convex polyhedra to EDMs ... 458
  5.10 EDM-entry composition ... 465
  5.11 EDM indefiniteness ... 468
  5.12 List reconstruction ... 476
  5.13 Reconstruction examples ... 480
  5.14 Fifth property of Euclidean metric ... 486

6 Cone of distance matrices ... 497
  6.1 Defining EDM cone ... 499
  6.2 Polyhedral bounds ... 501
  6.3 √EDM cone is not convex ... 502
  6.4 EDM definition in 11^T ... 503
  6.5 Correspondence to PSD cone S^{N−1}_+ ... 511
  6.6 Vectorization & projection interpretation ... 517
  6.7 A geometry of completion ... 522


  6.8 Dual EDM cone ... 529
  6.9 Theorem of the alternative ... 544
  6.10 Postscript ... 545

7 Proximity problems ... 547
  7.1 First prevalent problem: ... 555
  7.2 Second prevalent problem: ... 566
  7.3 Third prevalent problem: ... 578
  7.4 Conclusion ... 588

A Linear algebra ... 591
  A.1 Main-diagonal δ operator, λ, trace, vec ... 591
  A.2 Semidefiniteness: domain of test ... 596
  A.3 Proper statements ... 599
  A.4 Schur complement ... 611
  A.5 Eigenvalue decomposition ... 616
  A.6 Singular value decomposition, SVD ... 620
  A.7 Zeros ... 625

B Simple matrices ... 633
  B.1 Rank-one matrix (dyad) ... 634
  B.2 Doublet ... 639
  B.3 Elementary matrix ... 640
  B.4 Auxiliary V-matrices ... 642
  B.5 Orthogonal matrix ... 647

C Some analytical optimal results ... 653
  C.1 Properties of infima ... 653
  C.2 Trace, singular and eigen values ... 654
  C.3 Orthogonal Procrustes problem ... 661
  C.4 Two-sided orthogonal Procrustes ... 663
  C.5 Nonconvex quadratics ... 668

D Matrix calculus ... 669
  D.1 Directional derivative, Taylor series ... 669
  D.2 Tables of gradients and derivatives ... 690


E Projection ... 699
  E.1 Idempotent matrices ... 703
  E.2 I − P, Projection on algebraic complement ... 708
  E.3 Symmetric idempotent matrices ... 709
  E.4 Algebra of projection on affine subsets ... 715
  E.5 Projection examples ... 715
  E.6 Vectorization interpretation ... 722
  E.7 Projection on matrix subspaces ... 729
  E.8 Range/Rowspace interpretation ... 733
  E.9 Projection on convex set ... 733
  E.10 Alternating projection ... 748

F Notation and a few definitions ... 769

Bibliography ... 787

Index ... 817


List of Figures

1 Overview ... 21
  1 Orion nebula ... 22
  2 Application of trilateration is localization ... 23
  3 Molecular conformation ... 24
  4 Face recognition ... 25
  5 Swiss roll ... 26
  6 USA map reconstruction ... 28
  7 Honeycomb ... 30
  8 Robotic vehicles ... 31
  9 Reconstruction of David ... 34
  10 David by distance geometry ... 34

2 Convex geometry ... 35
  11 Slab ... 38
  12 Open, closed, convex sets ... 40
  13 Intersection of line with boundary ... 41
  14 Tangentials ... 44
  15 Inverse image ... 48
  16 Inverse image under linear map ... 49
  17 Tesseract ... 52
  18 Linear injective mapping of Euclidean body ... 54
  19 Linear noninjective mapping of Euclidean body ... 55
  20 Convex hull of a random list of points ... 61
  21 Hulls ... 63
  22 Two Fantopes ... 66
  23 Convex hull of rank-1 matrices ... 68


  24 A simplicial cone ... 71
  25 Hyperplane illustrated ∂H is a partially bounding line ... 73
  26 Hyperplanes in R^2 ... 75
  27 Affine independence ... 78
  28 {z ∈ C | a^T z = κ_i} ... 80
  29 Hyperplane supporting closed set ... 81
  30 Minimizing hyperplane over affine set in nonnegative orthant ... 89
  31 Maximizing hyperplane over convex set ... 90
  32 Closed convex set illustrating exposed and extreme points ... 94
  33 Two-dimensional nonconvex cone ... 97
  34 Nonconvex cone made from lines ... 97
  35 Nonconvex cone is convex cone boundary ... 98
  36 Union of convex cones is nonconvex cone ... 98
  37 Truncated nonconvex cone X ... 99
  38 Cone exterior is convex cone ... 99
  39 Not a cone ... 101
  40 Minimum element, Minimal element ... 104
  41 K is a pointed polyhedral cone not full-dimensional ... 108
  42 Exposed and extreme directions ... 112
  43 Positive semidefinite cone ... 115
  44 Convex Schur-form set ... 117
  45 Projection of truncated PSD cone ... 120
  46 Circular cone showing axis of revolution ... 129
  47 Circular section ... 131
  48 Polyhedral inscription ... 133
  49 Conically (in)dependent vectors ... 142
  50 Pointed six-faceted polyhedral cone and its dual ... 143
  51 Minimal set of generators for halfspace about origin ... 145
  52 Venn diagram for cones and polyhedra ... 149
  53 Unit simplex ... 152
  54 Two views of a simplicial cone and its dual ... 154
  55 Two equivalent constructions of dual cone ... 158
  56 Dual polyhedral cone construction by right angle ... 159
  57 K is a halfspace about the origin ... 160
  58 Iconic primal and dual objective functions ... 161
  59 Orthogonal cones ... 166
  60 Blades K and K* ... 167
  61 Membership w.r.t. K and orthant ... 177


  62 Shrouded polyhedral cone ... 181
  63 Simplicial cone K in R^2 and its dual ... 187
  64 Monotone nonnegative cone K_M+ and its dual ... 197
  65 Monotone cone K_M and its dual ... 199
  66 Two views of monotone cone K_M and its dual ... 200
  67 First-order optimality condition ... 204

3 Geometry of convex functions ... 219
  68 Convex functions having unique minimizer ... 221
  69 Minimum/Minimal element, dual cone characterization ... 223
  70 1-norm ball B_1 from compressed sensing/compressive sampling ... 227
  71 Cardinality minimization, signed versus unsigned variable ... 229
  72 1-norm variants ... 229
  73 Affine function ... 239
  74 Supremum of affine functions ... 241
  75 Epigraph ... 242
  76 Gradient in R^2 evaluated on grid ... 250
  77 Quadratic function convexity in terms of its gradient ... 257
  78 Contour plot of convex real function at selected levels ... 259
  79 Iconic quasiconvex function ... 270

4 Semidefinite programming ... 273
  80 Venn diagram of convex program classes ... 277
  81 Visualizing positive semidefinite cone in high dimension ... 280
  82 Primal/Dual transformations ... 290
  83 Projection versus convex iteration ... 311
  84 Trace heuristic ... 312
  85 Sensor-network localization ... 316
  86 2-lattice of sensors and anchors for localization example ... 319
  87 3-lattice of sensors and anchors for localization example ... 320
  88 4-lattice of sensors and anchors for localization example ... 321
  89 5-lattice of sensors and anchors for localization example ... 322
  90 Uncertainty ellipsoids orientation and eccentricity ... 323
  91 2-lattice localization solution ... 326
  92 3-lattice localization solution ... 326
  93 4-lattice localization solution ... 327
  94 5-lattice localization solution ... 327


  95 10-lattice localization solution ... 328
  96 100 randomized noiseless sensor localization ... 328
  97 100 randomized sensors localization ... 329
  98 Regularization curve for convex iteration ... 332
  99 1-norm heuristic ... 335
  100 Sparse sampling theorem ... 339
  101 Signal dropout ... 343
  102 Signal dropout reconstruction ... 344
  103 Simplex with intersecting line problem in compressed sensing ... 346
  104 Geometric interpretations of sparse-sampling constraints ... 349
  105 Permutation matrix column-norm and column-sum constraint ... 357
  106 max cut problem ... 366
  107 Shepp-Logan phantom ... 371
  108 MRI radial sampling pattern in Fourier domain ... 376
  109 Aliased phantom ... 378
  110 Neighboring-pixel stencil on Cartesian grid ... 380
  111 Differentiable almost everywhere ... 382
  112 MIT logo ... 386
  113 One-pixel camera ... 386
  114 One-pixel camera - compression estimates ... 388

5 Euclidean Distance Matrix ... 395
  115 Convex hull of three points ... 396
  116 Complete dimensionless EDM graph ... 399
  117 Fifth Euclidean metric property ... 401
  118 Fermat point ... 410
  119 Arbitrary hexagon in R^3 ... 412
  120 Kissing number ... 413
  121 Trilateration ... 418
  122 This EDM graph provides unique isometric reconstruction ... 422
  123 Two sensors • and three anchors ◦ ... 422
  124 Two discrete linear trajectories of sensors ... 423
  125 Coverage in cellular telephone network ... 426
  126 Contours of equal signal power ... 426
  127 Depiction of molecular conformation ... 428
  128 Square diamond ... 437
  129 Orthogonal complements in S^N abstractly oriented ... 440


  130 Elliptope E^3 ... 459
  131 Elliptope E^2 interior to S^2_+ ... 460
  132 Smallest eigenvalue of −V_N^T D V_N ... 466
  133 Some entrywise EDM compositions ... 466
  134 Map of United States of America ... 479
  135 Largest ten eigenvalues of −V_N^T O V_N ... 482
  136 Relative-angle inequality tetrahedron ... 489
  137 Nonsimplicial pyramid in R^3 ... 493

6 Cone of distance matrices ... 497
  138 Relative boundary of cone of Euclidean distance matrices ... 500
  139 Example of V_X selection to make an EDM ... 505
  140 Vector V_X spirals ... 508
  141 Three views of translated negated elliptope ... 515
  142 Halfline T on PSD cone boundary ... 519
  143 Vectorization and projection interpretation example ... 520
  144 Intersection of EDM cone with hyperplane ... 523
  145 Neighborhood graph ... 524
  146 Trefoil knot untied ... 526
  147 Trefoil ribbon ... 528
  148 Orthogonal complement of geometric center subspace ... 532
  149 EDM cone construction by flipping PSD cone ... 533
  150 Decomposing a member of polar EDM cone ... 538
  151 Ordinary dual EDM cone projected on S^3_h ... 543

7 Proximity problems ... 547
  152 Pseudo-Venn diagram ... 550
  153 Elbow placed in path of projection ... 551
  154 Convex envelope ... 571

A Linear algebra ... 591
  155 Geometrical interpretation of full SVD ... 623

B Simple matrices ... 633
  156 Four fundamental subspaces of any dyad ... 635
  157 Four fundamental subspaces of a doublet ... 639


List of Tables

2 Convex geometry
  Table 2.9.2.3.1, rank versus dimension of S^3_+ faces ... 122
  Table 2.10.0.0.1, maximum number of c.i. directions ... 141
  Cone Table 1 ... 194
  Cone Table S ... 195
  Cone Table A ... 196
  Cone Table 1* ... 201

4 Semidefinite programming
  Faces of S^3_+ corresponding to faces of S^3_+ ... 281

5 Euclidean Distance Matrix
  Précis 5.7.2: affine dimension in terms of rank ... 448

B Simple matrices
  Auxiliary V-matrix Table B.4.4 ... 646

D Matrix calculus
  Table D.2.1, algebraic gradients and derivatives ... 691
  Table D.2.2, trace Kronecker gradients ... 692
  Table D.2.3, trace gradients and derivatives ... 693
  Table D.2.4, logarithmic determinant gradients, derivatives ... 695
  Table D.2.5, determinant gradients and derivatives ... 696
  Table D.2.6, logarithmic derivatives ... 696
  Table D.2.7, exponential gradients and derivatives ... 697


Chapter 1
Overview

Convex Optimization
Euclidean Distance Geometry

    People are so afraid of convex analysis.
    −Claude Lemaréchal, 2003

In layman's terms, the mathematical science of Optimization is the study of how to make a good choice when confronted with conflicting requirements. The qualifier convex means: when an optimal solution is found, then it is guaranteed to be a best solution; there is no better choice.

Any convex optimization problem has geometric interpretation. If a given optimization problem can be transformed to a convex equivalent, then this interpretive benefit is acquired. That is a powerful attraction: the ability to visualize geometry of an optimization problem. Conversely, recent advances in geometry and in graph theory hold convex optimization within their proofs' core. [394] [319]

This book is about convex optimization, convex geometry (with particular attention to distance geometry), and nonconvex, combinatorial, and geometrical problems that can be relaxed or transformed into convex problems. A virtual flood of new applications follows by epiphany that many problems, presumed nonconvex, can be so transformed. [10] [11] [59] [93] [149] [151] [277] [298] [306] [361] [362] [391] [394] [34, 4.3, p.316-322]


Figure 1: Orion nebula. (astrophotography by Massimo Robberto)

Euclidean distance geometry is, fundamentally, a determination of point conformation (configuration, relative position or location) by inference from interpoint distance information. By inference we mean: e.g., given only distance information, determine whether there corresponds a realizable conformation of points; a list of points in some dimension that attains the given interpoint distances. Each point may represent simply location or, abstractly, any entity expressible as a vector in finite-dimensional Euclidean space; e.g., distance geometry of music [107].

It is a common misconception to presume that some desired point conformation cannot be recovered in absence of complete interpoint distance information. We might, for example, want to realize a constellation given only interstellar distance (or, equivalently, parsecs from our Sun and relative angular measurement; the Sun as vertex to two distant stars); called stellar cartography, an application evoked by Figure 1. At first it may seem O(N^2) data is required, yet there are many circumstances where this can be reduced to O(N).


Figure 2: Application of trilateration (5.4.2.3.5) is localization (determining position) of a radio signal source in 2 dimensions; more commonly known by radio engineers as the process "triangulation". In this scenario, anchors x̌2, x̌3, x̌4 are illustrated as fixed antennae. [209] The radio signal source (a sensor • x1) anywhere in affine hull of three antenna bases can be uniquely localized by measuring distance to each (dashed white arrowed line segments). Ambiguity of lone distance measurement to sensor is represented by circle about each antenna. Trilateration is expressible as a semidefinite program; hence, a convex optimization problem. [320]

If we agree that a set of points can have a shape (three points can form a triangle and its interior, for example, four points a tetrahedron), then we can ascribe shape of a set of points to their convex hull. It should be apparent: from distance, these shapes can be determined only to within a rigid transformation (rotation, reflection, translation).

Absolute position information is generally lost, given only distance information, but we can determine the smallest possible dimension in which an unknown list of points can exist; that attribute is their affine dimension (a triangle in any ambient space has affine dimension 2, for example). In circumstances where stationary reference points are also provided, it becomes possible to determine absolute position or location; e.g., Figure 2.
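The Figure 2 caption notes that trilateration is expressible as a semidefinite program; as a first experiment, though, the noiseless planar case already succumbs to elementary linear algebra. The following is a minimal Matlab sketch with hypothetical anchor positions and a hypothetical sensor (not data from the book): subtracting pairs of squared-distance equations cancels the quadratic term and leaves a small linear system for the sensor position.

    % Hypothetical anchors (fixed antennae) and an unknown sensor in R^2.
    a2 = [0;0];  a3 = [4;0];  a4 = [1;3];
    x_true = [2;1];                                  % used only to generate demo distances
    d2 = norm(x_true-a2);  d3 = norm(x_true-a3);  d4 = norm(x_true-a4);

    % Subtracting squared-distance equations removes the x'*x term:
    %   2*(a_j - a2)'*x = d2^2 - d_j^2 + a_j'*a_j - a2'*a2
    A = 2*[(a3-a2)'; (a4-a2)'];
    b = [d2^2 - d3^2 + a3'*a3 - a2'*a2;
         d2^2 - d4^2 + a4'*a4 - a2'*a2];
    x_hat = A\b                                      % recovers x_true in the noiseless case

With noisy measurements the same overdetermined system would be solved in a least-squares sense, or the problem posed as the semidefinite program cited in the caption.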


Figure 3: [188] [116] Distance data collected via nuclear magnetic resonance (NMR) helped render this 3-dimensional depiction of a protein molecule. At the beginning of the 1980s, Kurt Wüthrich [Nobel laureate] developed an idea about how NMR could be extended to cover biological molecules such as proteins. He invented a systematic method of pairing each NMR signal with the right hydrogen nucleus (proton) in the macromolecule. The method is called sequential assignment and is today a cornerstone of all NMR structural investigations. He also showed how it was subsequently possible to determine pairwise distances between a large number of hydrogen nuclei and use this information with a mathematical method based on distance-geometry to calculate a three-dimensional structure for the molecule. −[281] [378] [183]

Geometric problems involving distance between points can sometimes be reduced to convex optimization problems. Mathematics of this combined study of geometry and optimization is rich and deep. Its application has already proven invaluable discerning organic molecular conformation by measuring interatomic distance along covalent bonds; e.g., Figure 3. [88] [351] [139] [46] Many disciplines have already benefitted and simplified consequent to this theory; e.g., distance-based pattern recognition (Figure 4), localization in wireless sensor networks by measurement of intersensor distance along channels of communication, [47] [389] [45] wireless location of a radio-signal source such as a cell phone by multiple measurements of signal strength, the global positioning system (GPS), and multidimensional scaling which is a numerical representation of qualitative data by finding a low-dimensional scale.


Figure 4: This coarsely discretized triangulated algorithmically flattened human face (made by Kimmel & the Bronsteins [226]) represents a stage in machine recognition of human identity; called face recognition. Distance geometry is applied to determine discriminating features.

Euclidean distance geometry together with convex optimization have also found application in artificial intelligence:

  • to machine learning by discerning naturally occurring manifolds in:
      – Euclidean bodies (Figure 5, 6.7.0.0.1),
      – Fourier spectra of kindred utterances [212],
      – and photographic image sequences [372],
  • to robotics; e.g., automatic control of vehicles maneuvering in formation. (Figure 8)

by chapter

We study pervasive convex Euclidean bodies; their many manifestations and representations. In particular, we make convex polyhedra, cones, and dual cones visceral through illustration in chapter 2, Convex geometry, where geometric relation of polyhedral cones to nonorthogonal bases (biorthogonal expansion) is examined.


Figure 5: Swiss roll from Weinberger & Saul [372]. The problem of manifold learning, illustrated for N = 800 data points sampled from a "Swiss roll" (1). A discretized manifold is revealed by connecting each data point and its k = 6 nearest neighbors (2). An unsupervised learning algorithm unfolds the Swiss roll while preserving the local geometry of nearby data points (3). Finally, the data points are projected onto the two dimensional subspace that maximizes their variance, yielding a faithful embedding of the original manifold (4).


It is shown that coordinates are unique in any conic system whose basis cardinality equals or exceeds spatial dimension. The conic analogue to linear independence, called conic independence, is introduced as a tool for study, analysis, and manipulation of cones; a natural extension and next logical step in progression: linear, affine, conic. We explain conversion between halfspace- and vertex-description of a convex cone, we motivate the dual cone and provide formulae for finding it, and we show how first-order optimality conditions or alternative systems of linear inequality or linear matrix inequality can be explained by dual generalized inequalities with respect to convex cones. Arcane theorems of alternative generalized inequality are, in fact, simply derived from cone membership relations; generalizations of algebraic Farkas' lemma translated to geometry of convex cones.

Any convex optimization problem can be visualized geometrically. Desire to visualize in high dimension [Sagan, Cosmos −The Edge of Forever, 22:55′] is deeply embedded in the mathematical psyche. [1] Chapter 2 provides tools to make visualization easier, and we teach how to visualize in high dimension. The concepts of face, extreme point, and extreme direction of a convex Euclidean body are explained here; crucial to understanding convex optimization. How to find the smallest face of any pointed closed convex cone containing convex set C, for example, is divulged; later shown to have practical application to presolving convex programs. The convex cone of positive semidefinite matrices, in particular, is studied in depth:

  • We interpret, for example, inverse image of the positive semidefinite cone under affine transformation. (Example 2.9.1.0.2)
  • Subsets of the positive semidefinite cone, discriminated by rank exceeding some lower bound, are convex. In other words, high-rank subsets of the positive semidefinite cone boundary united with its interior are convex. (Theorem 2.9.2.9.3) There is a closed form for projection on those convex subsets.
  • The positive semidefinite cone is a circular cone in low dimension, while Geršgorin discs specify inscription of a polyhedral cone into that positive semidefinite cone. (Figure 48)

Chapter 3 Geometry of convex functions observes Fenchel's analogy between convex sets and functions: We explain, for example, how the real affine function relates to convex functions as the hyperplane relates to convex sets.


Figure 6 (two panels: original, reconstruction): About five thousand points along borders constituting the United States were used to create an exhaustive matrix of interpoint distance for each and every pair of points in an ordered set (a list); called a Euclidean distance matrix. From that noiseless distance information, it is easy to reconstruct the map exactly via the Schoenberg criterion (910). (5.13.1.0.1, confer Figure 134) Map reconstruction is exact (to within a rigid transformation) given any number of interpoint distances; the greater the number of distances, the greater the detail (as it is for all conventional map preparation).

Partly a toolbox of practical useful convex functions and a cookbook for optimization problems, methods are drawn from the appendices about matrix calculus for determining convexity and discerning geometry.

Semidefinite programming is reviewed in chapter 4 with particular attention to optimality conditions for prototypical primal and dual problems, their interplay, and a perturbation method for rank reduction of optimal solutions (extant but not well known). Positive definite Farkas' lemma is derived, and we also show how to determine if a feasible set belongs exclusively to a positive semidefinite cone boundary. An arguably good three-dimensional polyhedral analogue to the positive semidefinite cone of 3×3 symmetric matrices is introduced: a new tool for visualizing coexistence of low- and high-rank optimal solutions in six isomorphic dimensions and a mnemonic aid for understanding semidefinite programs.
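Readers who want to experiment alongside chapter 4 can pose a tiny conic problem directly. The following is a minimal sketch assuming the third-party CVX modeling package for Matlab is installed; the random cost matrix and the trace-one constraint are placeholders chosen only so the problem is bounded, not an example taken from the book.

    % Prototypical primal SDP:  minimize <C,X>  subject to  trace(X) = 1,  X PSD.
    % Its optimal value equals the smallest eigenvalue of C.
    n = 3;
    C = randn(n);  C = (C + C')/2;          % arbitrary symmetric cost matrix
    cvx_begin sdp
        variable X(n,n) symmetric
        minimize( trace(C*X) )
        subject to
            trace(X) == 1;
            X >= 0;                         % positive semidefinite (sdp mode)
    cvx_end
    [cvx_optval  min(eig(C))]               % the two numbers agree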


We find a minimal cardinality Boolean solution to an instance of Ax = b :

    minimize_x   ‖x‖_0
    subject to   Ax = b
                 x_i ∈ {0, 1} ,  i = 1 ... n          (680)

The sensor-network localization problem is solved in any dimension in this chapter. We introduce a method of convex iteration for constraining rank in the form rank G ≤ ρ and cardinality in the form card x ≤ k. Cardinality minimization is applied to a discrete image-gradient of the Shepp-Logan phantom, from Magnetic Resonance Imaging (MRI) in the field of medical imaging, for which we find a new lower bound of 1.9% cardinality. We show how to handle polynomial constraints, and how to transform a rank-constrained problem to a rank-1 problem.

The EDM is studied in chapter 5 Euclidean Distance Matrix; its properties and relationship to both positive semidefinite and Gram matrices. We relate the EDM to the four classical properties of the Euclidean metric; thereby, observing existence of an infinity of properties of the Euclidean metric beyond triangle inequality. We proceed by deriving the fifth Euclidean metric property and then explain why furthering this endeavor is inefficient because the ensuing criteria (while describing polyhedra in angle or area, volume, content, and so on ad infinitum) grow linearly in complexity and number with problem size.

Reconstruction methods are explained and applied to a map of the United States; e.g., Figure 6. We also experimentally test a conjecture of Borg & Groenen by reconstructing a distorted but recognizable isotonic map of the USA using only ordinal (comparative) distance data: Figure 134e-f. We demonstrate an elegant method for including dihedral (or torsion) angle constraints into a molecular conformation problem. We explain why trilateration (a.k.a. localization) is a convex optimization problem. We show how to recover relative position given incomplete interpoint distance information, and how to pose EDM problems or transform geometrical problems to convex optimizations; e.g., kissing number of packed spheres about a central sphere (solved in R^3 by Isaac Newton).

The set of all Euclidean distance matrices forms a pointed closed convex cone called the EDM cone: EDM^N.
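The list-reconstruction idea summarized above (Figure 6, 5.12) fits in a few lines of Matlab. This is a minimal sketch of classical multidimensional scaling applied to a hypothetical point list, not the book's Wıκımization code: build an EDM from a list, form the Gram matrix of geometrically centered points, and recover the list (to within a rigid transformation) from its eigenvalue decomposition.

    % Hypothetical point list (columns) in R^2 and its EDM.
    X = [0 1 2 0; 0 0 1 2];
    N = size(X,2);
    G = X'*X;                                          % Gram matrix of the list
    D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;  % EDM, D(i,j) = ||x_i - x_j||^2

    % Reconstruction (classical multidimensional scaling):
    V  = eye(N) - ones(N)/N;                           % geometric centering operator
    Gc = -V*D*V/2;                                     % Gram matrix of centered points
    [Q,L] = eig((Gc+Gc')/2);
    [lam,idx] = sort(diag(L),'descend');
    r  = sum(lam > 1e-9*max(lam));                     % affine dimension of the list
    Xr = diag(sqrt(lam(1:r)))*Q(:,idx(1:r))';          % r x N list, congruent to X

Xr reproduces the original list up to rotation, reflection, and translation, exactly the rigid-transformation ambiguity described in the Figure 6 caption.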


Figure 7: These bees construct a honeycomb by solving a convex optimization problem (5.4.2.3.4). The most dense packing of identical spheres about a central sphere in 2 dimensions is 6. Sphere centers describe a regular lattice.

We offer a new proof of Schoenberg's seminal characterization of EDMs:

    D ∈ EDM^N   ⇔   −V_N^T D V_N ≽ 0  and  D ∈ S^N_h          (910)

Our proof relies on fundamental geometry; assuming, any EDM must correspond to a list of points contained in some polyhedron (possibly at its vertices) and vice versa. It is known, but not obvious, this Schoenberg criterion implies nonnegativity of the EDM entries; proved herein.

We characterize the eigenvalue spectrum of an EDM, and then devise a polyhedral spectral cone for determining membership of a given matrix (in Cayley-Menger form) to the convex cone of Euclidean distance matrices; id est, a matrix is an EDM if and only if its nonincreasingly ordered vector of eigenvalues belongs to a polyhedral spectral cone for EDM^N ;

    D ∈ EDM^N   ⇔   λ([0 1^T ; 1 −D]) ∈ [R^N_+ ; R_−] ∩ ∂H   and   D ∈ S^N_h          (1125)

We will see: spectral cones are not unique.

In chapter 6 Cone of distance matrices we explain a geometric relationship between the cone of Euclidean distance matrices, two positive semidefinite cones, and the elliptope. We illustrate geometric requirements, in particular, for projection of a given matrix on a positive semidefinite cone that establish its membership to the EDM cone. The faces of the EDM cone are described, but still open is the question whether all its faces are exposed as they are for the positive semidefinite cone.
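Criterion (910) can be tried numerically. A hedged Matlab sketch, assuming the auxiliary matrix V_N = [−1^T ; I]/√2 of appendix B.4 and a small test matrix (three collinear points, chosen only for illustration):

    % Membership test sketch for (910):  D is an EDM  iff  D is symmetric hollow
    % and  -V_N'*D*V_N  is positive semidefinite.
    D = [0 1 4; 1 0 1; 4 1 0];                    % EDM of three collinear points
    N = size(D,1);
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);        % auxiliary matrix V_N (B.4)
    hollow_sym = isequal(D,D') && all(diag(D)==0);
    psd_part   = min(eig(-VN'*D*VN)) >= -1e-10;
    isEDM = hollow_sym && psd_part                % logical 1 for this D

For this D the matrix −V_N'*D*V_N has rank 1, consistent with a list of affine dimension 1.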


Figure 8: Robotic vehicles in concert can move larger objects, guard a perimeter, perform surveillance, find land mines, or localize a plume of gas, liquid, or radio waves. [138]

The Schoenberg criterion (910) for identifying a Euclidean distance matrix is revealed to be a discretized membership relation (dual generalized inequalities, a new Farkas'-like lemma) between the EDM cone and its ordinary dual: EDM^{N*}. A matrix criterion for membership to the dual EDM cone is derived that is simpler than the Schoenberg criterion:

    D* ∈ EDM^{N*}   ⇔   δ(D*1) − D* ≽ 0          (1274)

There is a concise equality, relating the convex cone of Euclidean distance matrices to the positive semidefinite cone, apparently overlooked in the literature; an equality between two large convex Euclidean bodies:

    EDM^N = S^N_h ∩ (S^{N⊥}_c − S^N_+)          (1268)
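Criterion (1274) is just as easy to test numerically; a small sketch with an arbitrary symmetric matrix standing in for D* (assumed data, not an example from the book):

    % Dual EDM cone membership sketch for (1274):
    %   Dstar in EDM^{N*}  <=>  delta(Dstar*1) - Dstar  is positive semidefinite,
    % where delta(.) places a vector on the main diagonal of a matrix (A.1).
    Dstar  = [2 1 0; 1 2 1; 0 1 2];               % arbitrary symmetric test matrix
    inDual = min(eig(diag(Dstar*ones(size(Dstar,1),1)) - Dstar)) >= -1e-10
    % evaluates to logical 1 for this particular matrix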


In chapter 7 Proximity problems we explore methods of solution to a few fundamental and prevalent Euclidean distance matrix proximity problems; the problem of finding that distance matrix closest, in some sense, to a given matrix H :

    minimize_D   ‖−V(D − H)V‖_F^2
    subject to   rank V D V ≤ ρ
                 D ∈ EDM^N

    minimize_D   ‖D − H‖_F^2
    subject to   rank V D V ≤ ρ
                 D ∈ EDM^N
                                                          (1310)
    minimize_∘√D   ‖∘√D − H‖_F^2
    subject to     rank V D V ≤ ρ
                   ∘√D ∈ √EDM^N

    minimize_∘√D   ‖−V(∘√D − H)V‖_F^2
    subject to     rank V D V ≤ ρ
                   ∘√D ∈ √EDM^N

We apply a convex iteration method for constraining rank. Known heuristics for rank minimization are also explained. We offer new geometrical proof, of a famous discovery by Eckart & Young in 1936 [129], with particular regard to Euclidean projection of a point on that generally nonconvex subset of the positive semidefinite cone boundary comprising all semidefinite matrices having rank not exceeding a prescribed bound ρ. We explain how this problem is transformed to a convex optimization for any rank ρ.
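The Eckart & Young projection invoked above has a simple computational form: eigendecompose, zero the negative eigenvalues, keep only the ρ largest. Here is a hedged Matlab sketch with arbitrary symmetric data; it is the standard truncation formula, not the book's proof or code.

    % Frobenius-norm projection of symmetric H on {X : X PSD, rank(X) <= rho}.
    H = randn(5);  H = (H + H')/2;                % arbitrary symmetric data
    rho = 2;
    [Q,L] = eig(H);
    [lam,idx] = sort(diag(L),'descend');
    lam = max(lam,0);                             % project eigenvalues on R_+
    lam(rho+1:end) = 0;                           % enforce the rank bound
    Hp = Q(:,idx)*diag(lam)*Q(:,idx)'             % nearest PSD matrix of rank <= rho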


appendices

Toolboxes are provided so as to be more self-contained:

  • linear algebra (appendix A is primarily concerned with proper statements of semidefiniteness for square matrices),
  • simple matrices (dyad, doublet, elementary, Householder, Schoenberg, orthogonal, etcetera, in appendix B),
  • collection of known analytical solutions to some important optimization problems (appendix C),
  • matrix calculus remains somewhat unsystematized when compared to ordinary calculus (appendix D concerns matrix-valued functions, matrix differentiation and directional derivatives, Taylor series, and tables of first- and second-order gradients and matrix derivatives),
  • an elaborate exposition offering insight into orthogonal and nonorthogonal projection on convex sets (the connection between projection and positive semidefiniteness, for example, or between projection and a linear objective function in appendix E),
  • Matlab code on Wıκımization to discriminate EDMs, to determine conic independence, to reduce or constrain rank of an optimal solution to a semidefinite program, compressed sensing (compressive sampling) for digital image and audio signal processing, and two distinct methods of reconstructing a map of the United States: one given only distance data, the other given only comparative distance data.


Figure 9: Three-dimensional reconstruction of David from distance data.

Figure 10: Digital Michelangelo Project, Stanford University. Measuring distance to David by laser rangefinder. Spatial resolution is 0.29mm.


Chapter 2
Convex geometry

    Convexity has an immensely rich structure and numerous applications. On the other hand, almost every "convex" idea can be explained by a two-dimensional picture.
    −Alexander Barvinok [26, p.vii]

As convex geometry and linear algebra are inextricably bonded by asymmetry (linear inequality), we provide much background material on linear algebra (especially in the appendices) although a reader is assumed comfortable with [331] [333] [202] or any other intermediate-level text. The essential references to convex analysis are [199] [307]. The reader is referred to [330] [26] [371] [42] [61] [304] [358] for a comprehensive treatment of convexity. There is relatively less published pertaining to convex matrix-valued functions. [215] [203, 6.6] [294]


2.1 Convex set

A set C is convex iff for all Y, Z ∈ C and 0 ≤ µ ≤ 1

    µY + (1 − µ)Z ∈ C          (1)

Under that defining condition on µ, the linear sum in (1) is called a convex combination of Y and Z. If Y and Z are points in real finite-dimensional Euclidean vector space R^n or R^{m×n} (matrices), then (1) represents the closed line segment joining them. All line segments are thereby convex sets. Apparent from the definition, a convex set is a connected set. [258, 3.4, 3.5] [42, p.2] A convex set can, but does not necessarily, contain the origin 0.

An ellipsoid centered at x = a (Figure 13, p.41), given matrix C ∈ R^{m×n}

    {x ∈ R^n | ‖C(x − a)‖_2 = √((x − a)^T C^T C (x − a)) ≤ 1}          (2)

is a good icon for a convex set.^2.1

2.1.1 subspace

A nonempty subset R of real Euclidean vector space R^n is called a subspace (2.5) if every vector^2.2 of the form αx + βy, for α, β ∈ R, is in R whenever vectors x and y are. [250, 2.3] A subspace is a convex set containing the origin, by definition. [307, p.4] Any subspace is therefore open in the sense that it contains no boundary, but closed in the sense [258, 2]

    R + R = R          (3)

It is not difficult to show

    R = −R          (4)

as is true for any subspace R, because x ∈ R ⇔ −x ∈ R. Given any x ∈ R

    R = x + R          (5)

The intersection of an arbitrary collection of subspaces remains a subspace. Any subspace not constituting the entire ambient vector space R^n is a proper subspace; e.g.,^2.3 any line through the origin in two-dimensional Euclidean space R^2. The vector space R^n is itself a conventional subspace, inclusively, [227, 2.1] although not proper.

2.1 This particular definition is slablike (Figure 11) in R^n when C has nontrivial nullspace.
2.2 A vector is assumed, throughout, to be a column vector.
2.3 We substitute the abbreviation e.g. in place of the Latin exempli gratia.
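A quick numerical check of definition (1) against the ellipsoid icon (2); the matrix C, center a, and the two member points below are hypothetical choices, not from the book.

    % Convex combination of two members of an ellipsoid stays in the ellipsoid.
    C = [2 0; 0 1];  a = [1;0];                   % ellipsoid {x : ||C(x-a)|| <= 1}
    inE = @(x) norm(C*(x-a)) <= 1;
    y = [1;0.5];  z = [1.4;0];                    % both satisfy inE
    mu = 0.3;
    inE(mu*y + (1-mu)*z)                          % logical 1: the combination is a member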


2.1.2 linear independence

Arbitrary given vectors in Euclidean space {Γ_i ∈ R^n, i = 1 ... N} are linearly independent (l.i.) if and only if, for all ζ ∈ R^N

    Γ_1 ζ_1 + · · · + Γ_{N−1} ζ_{N−1} − Γ_N ζ_N = 0          (6)

has only the trivial solution ζ = 0 ; in other words, iff no vector from the given set can be expressed as a linear combination of those remaining. Geometrically, two nontrivial vector subspaces are linearly independent iff they intersect only at the origin.

2.1.2.1 preservation of linear independence

(confer 2.4.2.4, 2.10.1) Linear transformation preserves linear dependence. [227, p.86] Conversely, linear independence can be preserved under linear transformation. Given Y = [y_1 y_2 · · · y_N] ∈ R^{N×N}, consider the mapping

    T(Γ) : R^{n×N} → R^{n×N} ≜ ΓY          (7)

whose domain is the set of all matrices Γ ∈ R^{n×N} holding a linearly independent set columnar. Linear independence of {Γy_i ∈ R^n, i = 1 ... N} demands, by definition, there exist no nontrivial solution ζ ∈ R^N to

    Γy_1 ζ_1 + · · · + Γy_{N−1} ζ_{N−1} − Γy_N ζ_N = 0          (8)

By factoring out Γ, we see that triviality is ensured by linear independence of {y_i ∈ R^N}.

2.1.3 Orthant:

name given to a closed convex set that is the higher-dimensional generalization of quadrant from the classical Cartesian partition of R^2; a Cartesian cone. The most common is the nonnegative orthant R^n_+ or R^{n×n}_+ (analogue to quadrant I) to which membership denotes nonnegative vector- or matrix-entries respectively; e.g.,

    R^n_+ ≜ {x ∈ R^n | x_i ≥ 0 ∀i}          (9)

The nonpositive orthant R^n_− or R^{n×n}_− (analogue to quadrant III) denotes negative and 0 entries. Orthant convexity^2.4 is easily verified by definition (1).

2.4 All orthants are selfdual simplicial cones. (2.13.5.1, 2.12.3.1.1)
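Referring back to 2.1.2: linear independence of a finite vector set is conveniently tested by rank, a one-line check (the vectors below are chosen arbitrarily for illustration).

    % The vectors Gamma_i (columns of G) are linearly independent
    % iff rank(G) equals the number of columns.
    G = [1 0 1; 0 1 1; 0 0 0];                    % third column = sum of first two
    isLI = (rank(G) == size(G,2))                 % logical 0: not linearly independent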


Figure 11: A slab is a convex Euclidean body infinite in extent but not affine. Illustrated in R^2, it may be constructed by intersecting two opposing halfspaces whose bounding hyperplanes are parallel but not coincident. Because number of halfspaces used in its construction is finite, slab is a polyhedron (2.12). (Cartesian axes + and vector inward-normal, to each halfspace-boundary, are drawn for reference.)

2.1.4 affine set

A nonempty affine set (from the word affinity) is any subset of R^n that is a translation of some subspace. Any affine set is convex and open so contains no boundary: e.g., empty set ∅, point, line, plane, hyperplane (2.4.2), subspace, etcetera. For some parallel^2.5 subspace R and any point x ∈ A

    A is affine  ⇔  A = x + R = {y | y − x ∈ R}          (10)

The intersection of an arbitrary collection of affine sets remains affine. The affine hull of a set C ⊆ R^n (2.3.1) is the smallest affine set containing it.

2.1.5 dimension

Dimension of an arbitrary set S is Euclidean dimension of its affine hull; [371, p.14]

    dim S ≜ dim aff S = dim aff(S − s) ,  s ∈ S          (11)

the same as dimension of the subspace parallel to that affine set aff S when nonempty. Hence dimension (of a set) is synonymous with affine dimension. [199, A.2.1]

2.5 Two affine sets are said to be parallel when one is a translation of the other. [307, p.4]
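Definition (11) translates directly into a computation: the subspace parallel to aff S is spanned by differences of members of S, so affine dimension is a matrix rank. A small Matlab sketch with a hypothetical collinear point list:

    % Affine dimension (11) of a point list X (points are columns).
    X = [0 1 2 3; 0 1 2 3; 1 1 1 1];              % four collinear points in R^3
    dimAff = rank(X(:,2:end) - X(:,1)*ones(1,size(X,2)-1))   % = 1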


2.1.6 empty set versus empty interior

Emptiness ∅ of a set is handled differently than interior in the classical literature. It is common for a nonempty convex set to have empty interior; e.g., paper in the real world:

    An ordinary flat sheet of paper is a nonempty convex set having empty interior in R^3 but nonempty interior relative to its affine hull.

2.1.6.1 relative interior

Although it is always possible to pass to a smaller ambient Euclidean space where a nonempty set acquires an interior [26, II.2.3], we prefer the qualifier relative which is the conventional fix to this ambiguous terminology.^2.6 So we distinguish interior from relative interior throughout: [330] [371] [358]

  • Classical interior int C is defined as a union of points: x is an interior point of C ⊆ R^n if there exists an open ball of dimension n and nonzero radius centered at x that is contained in C.
  • Relative interior rel int C of a convex set C ⊆ R^n is interior relative to its affine hull.^2.7

Thus defined, it is common (though confusing) for int C the interior of C to be empty while its relative interior is not: this happens whenever dimension of its affine hull is less than dimension of the ambient space (dim aff C < n ; e.g., were C paper) or in the exception when C is a single point; [258, 2.2.1]

    rel int{x} ≜ aff{x} = {x} ,  int{x} = ∅ ,  x ∈ R^n          (12)

In any case, closure of the relative interior of a convex set C always yields closure of the set itself;

    \overline{rel int C} = \overline{C}          (13)

Closure is invariant to translation. If C is convex then rel int C and \overline{C} are convex. [199, p.24] If C has nonempty interior, then

    rel int C = int C          (14)

2.6 Superfluous mingling of terms as in relatively nonempty set would be an unfortunate consequence. From the opposite perspective, some authors use the term full or full-dimensional to describe a set having nonempty interior.
2.7 Likewise for relative boundary (2.1.7.2), although relative closure is superfluous. [199, A.2.1]


Figure 12 (sets in R^2): (a) Closed convex set. (b) Neither open, closed, or convex. Yet PSD cone can remain convex in absence of certain boundary components (2.9.2.9.3). Nonnegative orthant with origin excluded (2.6) and positive orthant with origin adjoined [307, p.49] are convex. (c) Open convex set.

Given the intersection of convex set C with affine set A

    rel int(C ∩ A) = rel int(C) ∩ A  ⇐  rel int(C) ∩ A ≠ ∅          (15)

Because an affine set A is open

    rel int A = A          (16)

2.1.7 classical boundary

(confer 2.1.7.2) Boundary of a set C is the closure of C less its interior;

    ∂C = \overline{C} \ int C          (17)

[55, 1.1] which follows from the fact

    \overline{int C} = \overline{C}  ⇔  ∂ int C = ∂ C          (18)

and presumption of nonempty interior.^2.8 Implications are:

2.8 Otherwise, for x ∈ R^n as in (12), [258, 2.1-2.3] \overline{int{x}} = \overline{∅} = ∅ ; the empty set is both open and closed.


Figure 13: (a) Ellipsoid in R is a line segment whose boundary comprises two points. Intersection of line with ellipsoid in R, (b) in R^2, (c) in R^3. Each ellipsoid illustrated has entire boundary constituted by zero-dimensional faces; in fact, by vertices (2.6.1.0.1). Intersection of line with boundary is a point at entry to interior. These same facts hold in higher dimension.

  • int C = C \ ∂C ; a bounded open set has boundary defined but not contained in the set
  • interior of an open set is equivalent to the set itself; from which an open set is defined: [258, p.109]

        C is open  ⇔  int C = C          (19)
        C is closed  ⇔  \overline{C} = C          (20)

The set illustrated in Figure 12b, for example, is not open because it is not equivalent to its interior, it is not closed because it does not contain its boundary, and it is not convex because it does not contain all convex combinations of its boundary points.


2.1.7.1 Line intersection with boundary

A line can intersect the boundary of a convex set in any dimension at a point demarcating the line's entry to the set interior. On one side of that entry-point along the line is the exterior of the set, on the other side is the set interior. In other words,

    starting from any point of a convex set, a move toward the interior is an immediate entry into the interior. [26, II.2]

When a line intersects the interior of a convex body in any dimension, the boundary appears to the line to be as thin as a point. This is intuitively plausible because, for example, a line intersects the boundary of the ellipsoids in Figure 13 at a point in R, R^2, and R^3. Such thinness is a remarkable fact when pondering visualization of convex polyhedra (2.12, 5.14.3) in four Euclidean dimensions, for example, having boundaries constructed from other three-dimensional convex polyhedra called faces.

We formally define face in (2.6). For now, we observe the boundary of a convex body to be entirely constituted by all its faces of dimension lower than the body itself. Any face of a convex set is convex. For example: The ellipsoids in Figure 13 have boundaries composed only of zero-dimensional faces. The two-dimensional slab in Figure 11 is an unbounded polyhedron having one-dimensional faces making its boundary. The three-dimensional bounded polyhedron in Figure 20 has zero-, one-, and two-dimensional polygonal faces constituting its boundary.

2.1.7.1.1 Example. Intersection of line with boundary in R^6.
The convex cone of positive semidefinite matrices S^3_+ (2.9) in the ambient subspace of symmetric matrices S^3 (2.2.2.0.1) is a six-dimensional Euclidean body in isometrically isomorphic R^6 (2.2.1). The boundary of the positive semidefinite cone in this dimension comprises faces having only the dimensions 0, 1, and 3 ; id est, {ρ(ρ+1)/2, ρ = 0, 1, 2}.

Unique minimum-distance projection PX (E.9) of any point X ∈ S^3 on that cone S^3_+ is known in closed form (7.1.2). Given, for example, λ ∈ int R^3_+ and diagonalization (A.5.1) of exterior point

    X = QΛQ^T ∈ S^3 ,   Λ ≜ diag(λ_1, λ_2, −λ_3)          (21)


where Q ∈ R^{3×3} is an orthogonal matrix, then the projection on S^3_+ in R^6 is

    PX = Q diag(λ_1, λ_2, 0) Q^T ∈ S^3_+          (22)

This positive semidefinite matrix PX nearest X thus has rank 2, found by discarding all negative eigenvalues in Λ. The line connecting these two points is {X + (PX − X)t | t ∈ R} where t = 0 ⇔ X and t = 1 ⇔ PX. Because this line intersects the boundary of the positive semidefinite cone S^3_+ at point PX and passes through its interior (by assumption), then the matrix corresponding to an infinitesimally positive perturbation of t there should reside interior to the cone (rank 3). Indeed, for ε an arbitrarily small positive constant,

    X + (PX − X)t |_{t=1+ε} = Q(Λ + (PΛ − Λ)(1 + ε))Q^T = Q diag(λ_1, λ_2, ελ_3) Q^T ∈ int S^3_+          (23)
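Example 2.1.7.1.1 is easy to reproduce numerically. A minimal Matlab sketch, with a random orthogonal Q and an arbitrary λ ∈ int R^3_+ as the example assumes (not code from the book):

    % Numerical check of (21)-(23): project X on S^3_+ by zeroing the negative
    % eigenvalue, then step just beyond PX along the connecting line.
    lambda = [2;1;3];                             % any lambda in int R^3_+
    [Q,~]  = qr(randn(3));                        % random orthogonal matrix
    X  = Q*diag([lambda(1); lambda(2); -lambda(3)])*Q';      % exterior point (21)
    [U,S] = eig((X+X')/2);
    PX = U*max(S,0)*U';                           % projection on S^3_+  (22)
    eps_ = 1e-6;
    Xt = X + (PX - X)*(1+eps_);                   % point (23), just past the boundary
    eig((Xt+Xt')/2)'                              % three positive eigenvalues: rank 3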


Figure 14: Line tangential: (a) (b) to relative interior of a zero-dimensional face in R^2, (c) (d) to relative interior of a one-dimensional face in R^3.


Now let's move to an ambient space of three dimensions. Figure 14c shows a polygon rotated into three dimensions. For a line to pass through its zero-dimensional boundary (one of its vertices) tangentially, it must exist in at least the two dimensions of the polygon. But for a line to pass tangentially through a single arbitrarily chosen point in the relative interior of a one-dimensional face on the boundary as illustrated, it must exist in at least three dimensions.
Figure 14d illustrates a solid circular cone (drawn truncated) whose one-dimensional faces are halflines emanating from its pointed end (vertex). This cone's boundary is constituted solely by those one-dimensional halflines. A line may pass through the boundary tangentially, striking only one arbitrarily chosen point relatively interior to a one-dimensional face, if it exists in at least the three-dimensional ambient space of the cone.
From these few examples, we deduce a general rule (without proof):

   A line may pass tangentially through a single arbitrarily chosen point relatively interior to a k-dimensional face on the boundary of a convex Euclidean body if the line exists in dimension at least equal to k+2.

Now the interesting part, with regard to Figure 20 showing a bounded polyhedron in R^3; call it P : A line existing in at least four dimensions is required in order to pass tangentially (without hitting int P) through a single arbitrary point in the relative interior of any two-dimensional polygonal face on the boundary of polyhedron P. Now imagine that polyhedron P is itself a three-dimensional face of some other polyhedron in R^4. To pass a line tangentially through polyhedron P itself, striking only one point from its relative interior rel int P as claimed, requires a line existing in at least five dimensions.^2.9
It is not too difficult to deduce:

   A line may pass through a single arbitrarily chosen point interior to a k-dimensional convex Euclidean body (hitting no other interior point) if that line exists in dimension at least equal to k+1.

In layman's terms, this means: a being capable of navigating four spatial dimensions (one Euclidean dimension beyond our physical reality) could see inside three-dimensional objects.

^2.9 This rule can help determine whether there exists unique solution to a convex optimization problem whose feasible set is an intersecting line; e.g., the trilateration problem (§5.4.2.3.5).


46 CHAPTER 2. CONVEX GEOMETRY2.1.7.2 Relative boundaryThe classical definition of boundary of a set C presumes nonempty interior:∂ C = C \ int C (17)More suitable to study of convex sets is the relative boundary; defined[199,A.2.1.2]rel ∂ C C \ rel int C (24)boundary relative to affine hull of C .In the exception when C is a single point {x} , (12)rel∂{x} = {x}\{x} = ∅ , x∈ R n (25)A bounded convex polyhedron (2.3.2,2.12.0.0.1) in subspace R , forexample, has boundary constructed from two points, in R 2 from atleast three line segments, in R 3 from convex polygons, while a convexpolychoron (a bounded polyhedron in R 4 [373]) has boundary constructedfrom three-dimensional convex polyhedra. A halfspace is partially boundedby a hyperplane; its interior therefore excludes that hyperplane. An affineset has no relative boundary.2.1.8 intersection, sum, difference, product2.1.8.0.1 Theorem. Intersection. [307,2, thm.6.5]Intersection of an arbitrary collection of convex sets {C i } is convex. For afinite collection of N sets, a necessarily nonempty intersection of relativeinterior ⋂ Ni=1 rel int C i = rel int ⋂ Ni=1 C i equals relative interior of intersection.And for a possibly infinite collection, ⋂ C i = ⋂ C i .⋄In converse this theorem is implicitly false in so far as a convex set canbe formed by the intersection of sets that are not. Unions of convex sets aregenerally not convex. [199, p.22]Vector sum of two convex sets C 1 and C 2 is convex [199, p.24]C 1 + C 2 = {x + y | x ∈ C 1 , y ∈ C 2 } (26)but not necessarily closed unless at least one set is closed and bounded.


2.1. CONVEX SET 47By additive inverse, we can similarly define vector difference of two convexsetsC 1 − C 2 = {x − y | x ∈ C 1 , y ∈ C 2 } (27)which is convex. Applying this definition to nonempty convex set C 1 , itsself-difference C 1 − C 1 is generally nonempty, nontrivial, and convex; e.g.,for any convex cone K , (2.7.2) the set K − K constitutes its affine hull.[307, p.15]Cartesian product of convex sets{[ ]} [ ]x C1C 1 × C 2 = | x ∈ Cy 1 , y ∈ C 2 =(28)C 2remains convex. The converse also holds; id est, a Cartesian product isconvex iff each set is. [199, p.23]<strong>Convex</strong> results are also obtained for scaling κ C of a convex set C ,rotation/reflection Q C , or translation C+ α ; each similarly defined.Given any operator T and convex set C , we are prone to write T(C)meaningT(C) {T(x) | x∈ C} (29)Given linear operator T , it therefore follows from (26),T(C 1 + C 2 ) = {T(x + y) | x∈ C 1 , y ∈ C 2 }= {T(x) + T(y) | x∈ C 1 , y ∈ C 2 }= T(C 1 ) + T(C 2 )(30)2.1.9 inverse imageWhile epigraph (3.5) of a convex function must be convex, it generallyholds that inverse image (Figure 15) of a convex function is not. The mostprominent examples to the contrary are affine functions (3.4):2.1.9.0.1 Theorem. Inverse image. [307,3]Let f be a mapping from R p×k to R m×n .The image of a convex set C under any affine functionis convex.f(C) = {f(X) | X ∈ C} ⊆ R m×n (31)


Figure 15: (a) Image of convex set in domain of any convex function f is convex, but there is no converse. (b) Inverse image under convex function f.


Figure 16: (confer Figure 161) Action of linear map represented by A ∈ R^{m×n}: Component of vector x in nullspace N(A) maps to origin while component in rowspace R(A^T) maps to range R(A). For any A ∈ R^{m×n}, AA†Ax = b (§E) and inverse image of b ∈ R(A) is a nonempty affine set: x_p + N(A).

Inverse image of a convex set F,

   f^{−1}(F) = {X | f(X) ∈ F} ⊆ R^{p×k}   (32)

a single- or many-valued mapping, under any affine function f is convex. ⋄

In particular, any affine transformation of an affine set remains affine. [307, p.8] Ellipsoids are invariant to any [sic] affine transformation.
Although not precluded, this inverse image theorem does not require a uniquely invertible mapping f. Figure 16, for example, mechanizes inverse image under a general linear map. Example 2.9.1.0.2 and Example 3.5.0.0.2 offer further applications.
Each converse of this two-part theorem is generally false; id est, given f affine, a convex image f(C) does not imply that set C is convex, and neither does a convex inverse image f^{−1}(F) imply set F is convex. A counterexample, invalidating a converse, is easy to visualize when the affine function is an orthogonal projector [331] [250]:


50 CHAPTER 2. CONVEX GEOMETRY2.1.9.0.2 Corollary. Projection on subspace. 2.10 (1934) [307,3]Orthogonal projection of a convex set on a subspace or nonempty affine setis another convex set.⋄Again, the converse is false. Shadows, for example, are umbral projectionsthat can be convex when the body providing the shade is not.2.2 Vectorized-matrix inner productEuclidean space R n comes equipped with a linear vector inner-product〈y,z〉 y T z (33)We prefer those angle brackets to connote a geometric rather than algebraicperspective; e.g., vector y might represent a hyperplane normal (2.4.2).Two vectors are orthogonal (perpendicular) to one another if and only iftheir inner product vanishes;y ⊥ z ⇔ 〈y,z〉 = 0 (34)When orthogonal vectors each have unit norm, then they are orthonormal.A vector inner-product defines Euclidean norm (vector 2-norm,A.7.1)‖y‖ 2 = ‖y‖ √ y T y , ‖y‖ = 0 ⇔ y = 0 (35)For linear operator A , the adjoint operator A T is defined by [227,3.10]〈y,A T z〉 〈Ay,z〉 (36)For linear operation on a vector, represented by real matrix A , the adjointoperator A T is its transposition.The vector inner-product for matrices is calculated just as it is for vectors;by first transforming a matrix in R p×k to a vector in R pk by concatenatingits columns in the natural order. For lack of a better term, we shall callthat linear bijective (one-to-one and onto [227, App.A1.2]) transformation2.10 For hyperplane representations see2.4.2. For projection of convex sets on hyperplanessee [371,6.6]. A nonempty affine set is called an affine subset (2.3.1.0.1). Orthogonalprojection of points on affine subsets is reviewed inE.4.


vectorization. For example, the vectorization of Y = [y_1 y_2 ⋯ y_k] ∈ R^{p×k} [166] [327] is

   vec Y ≜ [y_1; y_2; ⋮; y_k] ∈ R^{pk}   (37)

Then the vectorized-matrix inner-product is trace of matrix inner-product; for Z ∈ R^{p×k}, [61, §2.6.1] [199, §0.3.1] [382, §8] [365, §2.2]

   〈Y, Z〉 ≜ tr(Y^T Z) = vec(Y)^T vec Z   (38)

where (§A.1.1)

   tr(Y^T Z) = tr(ZY^T) = tr(YZ^T) = tr(Z^T Y) = 1^T(Y ∘ Z)1   (39)

and where ∘ denotes the Hadamard product^2.11 of matrices [159, §1.1.4]. The adjoint A^T operation on a matrix can therefore be defined in like manner:

   〈Y, A^T Z〉 ≜ 〈AY, Z〉   (40)

Take any element C_1 from a matrix-valued set in R^{p×k}, for example, and consider any particular dimensionally compatible real vectors v and w. Then vector inner-product of C_1 with vw^T is

   〈vw^T, C_1〉 = 〈v, C_1 w〉 = v^T C_1 w = tr(wv^T C_1) = 1^T((vw^T) ∘ C_1)1   (41)

Further, linear bijective vectorization is distributive with respect to Hadamard product; id est,

   vec(Y ∘ Z) = vec(Y) ∘ vec(Z)   (42)

2.2.0.0.1 Example. Application of inverse image theorem.
Suppose set C ⊆ R^{p×k} were convex. Then for any particular vectors v ∈ R^p and w ∈ R^k, the set of vector inner-products

   Y ≜ v^T C w = 〈vw^T, C〉 ⊆ R   (43)

^2.11 Hadamard product is a simple entrywise product of corresponding entries from two matrices of like size; id est, not necessarily square. A commutative operation, the Hadamard product can be extracted from within a Kronecker product. [202, p.475]


52 CHAPTER 2. CONVEX GEOMETRYR 2 R 3(a)(b)Figure 17: (a) Cube in R 3 projected on paper-plane R 2 . Subspace projectionoperator is not an isomorphism because new adjacencies are introduced.(b) Tesseract is a projection of hypercube in R 4 on R 3 .is convex. It is easy to show directly that convex combination of elementsfrom Y remains an element of Y . 2.12 Instead given convex set Y , C mustbe convex consequent to inverse image theorem 2.1.9.0.1.More generally, vw T in (43) may be replaced with any particular matrixZ ∈ R p×k while convexity of set 〈Z , C〉⊆ R persists. Further, by replacingv and w with any particular respective matrices U and W of dimensioncompatible with all elements of convex set C , then set U T CW is convex bythe inverse image theorem because it is a linear mapping of C . 2.2.1 Frobenius’2.2.1.0.1 Definition. Isomorphic.An isomorphism of a vector space is a transformation equivalent to a linearbijective mapping. Image and inverse image under the transformationoperator are then called isomorphic vector spaces.△2.12 To verify that, take any two elements C 1 and C 2 from the convex matrix-valued set C ,and then form the vector inner-products (43) that are two elements of Y by definition.Now make a convex combination of those inner products; videlicet, for 0≤µ≤1µ 〈vw T , C 1 〉 + (1 − µ) 〈vw T , C 2 〉 = 〈vw T , µ C 1 + (1 − µ)C 2 〉The two sides are equivalent by linearity of inner product. The right-hand side remainsa vector inner-product of vw T with an element µ C 1 + (1 − µ)C 2 from the convex set C ;hence, it belongs to Y . Since that holds true for any two elements from Y , then it mustbe a convex set.


Isomorphic vector spaces are characterized by preservation of adjacency; id est, if v and w are points connected by a line segment in one vector space, then their images will be connected by a line segment in the other. Two Euclidean bodies may be considered isomorphic if there exists an isomorphism, of their vector spaces, under which the bodies correspond. [367, §I.1] Projection (§E) is not an isomorphism, Figure 17 for example; hence, perfect reconstruction (inverse projection) is generally impossible without additional information.
When Z = Y ∈ R^{p×k} in (38), Frobenius' norm is resultant from vector inner-product; (confer (1686))

   ‖Y‖²_F = ‖vec Y‖²_2 = 〈Y, Y〉 = tr(Y^T Y) = Σ_{i,j} Y²_{ij} = Σ_i λ(Y^T Y)_i = Σ_i σ(Y)²_i   (44)

where λ(Y^T Y)_i is the i th eigenvalue of Y^T Y, and σ(Y)_i the i th singular value of Y. Were Y a normal matrix (§A.5.1), then σ(Y) = |λ(Y)| [393, §8.1] thus

   ‖Y‖²_F = Σ_i λ(Y)²_i = ‖λ(Y)‖²_2 = 〈λ(Y), λ(Y)〉 = 〈Y, Y〉   (45)

The converse (45) ⇒ normal matrix Y also holds. [202, §2.5.4]
Frobenius' norm is the Euclidean norm of vectorized matrices. Because the metrics are equivalent, for X ∈ R^{p×k}

   ‖vec X − vec Y‖_2 = ‖X − Y‖_F   (46)

and because vectorization (37) is a linear bijective map, then vector space R^{p×k} is isometrically isomorphic with vector space R^{pk} in the Euclidean sense and vec is an isometric isomorphism of R^{p×k}. Because of this Euclidean structure, all the known results from convex analysis in Euclidean space R^n carry over directly to the space of real matrices R^{p×k}.

2.2.1.1 Injective linear operators

Injective mapping (transformation) means one-to-one mapping; synonymous with uniquely invertible linear mapping on Euclidean space. Linear injective mappings are fully characterized by lack of nontrivial nullspace.
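The identities (44) and (46) amount to the statement that vec is an isometry between R^{p×k} with the Frobenius norm and R^{pk} with the Euclidean norm. A minimal sketch in Python/NumPy (not from the text; the matrices are randomly generated placeholders) checks them:

import numpy as np

rng = np.random.default_rng(1)
Y = rng.standard_normal((4, 3))            # arbitrary Y in R^{p x k} (placeholder values)
X = rng.standard_normal((4, 3))

vec = lambda A: A.reshape(-1, order='F')   # column-stacking vectorization (37)

# (44): squared Frobenius norm equals <Y,Y> = ||vec Y||^2 = sum of squared singular values
assert np.isclose(np.linalg.norm(Y, 'fro')**2, vec(Y) @ vec(Y))
assert np.isclose(np.linalg.norm(Y, 'fro')**2, np.sum(np.linalg.svd(Y, compute_uv=False)**2))

# (46): vec is an isometry between R^{p x k} and R^{pk}
assert np.isclose(np.linalg.norm(vec(X) - vec(Y)), np.linalg.norm(X - Y, 'fro'))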


54 CHAPTER 2. CONVEX GEOMETRYTR 2 R 3dim domT = dim R(T)Figure 18: Linear injective mapping Tx=Ax : R 2 → R 3 of Euclidean bodyremains two-dimensional under mapping represented by skinny full-rankmatrix A∈ R 3×2 ; two bodies are isomorphic by Definition 2.2.1.0.1.2.2.1.1.1 Definition. Isometric isomorphism.An isometric isomorphism of a vector space having a metric defined on it is alinear bijective mapping T that preserves distance; id est, for all x,y ∈dom T‖Tx − Ty‖ = ‖x − y‖ (47)Then isometric isomorphism T is called a bijective isometry.Unitary linear operator Q : R k → R k , represented by orthogonal matrixQ∈ R k×k (B.5), is an isometric isomorphism; e.g., discrete Fourier transformvia (836). Suppose T(X)= UXQ is a bijective isometry where U isa dimensionally compatible orthonormal matrix. 2.13 Then we also sayFrobenius’ norm is orthogonally invariant; meaning, for X,Y ∈ R p×k‖U(X −Y )Q‖ F = ‖X −Y ‖ F (48)⎡ ⎤1 0Yet isometric operator T : R 2 → R 3 , represented by A = ⎣ 0 1 ⎦on R 2 ,0 0is injective but not a surjective map to R 3 . [227,1.6,2.6] This operator Tcan therefore be a bijective isometry only with respect to its range.2.13 any matrix U whose columns are orthonormal with respect to each other (U T U = I);these include the orthogonal matrices.△


Figure 19: Linear noninjective mapping PTx = A†Ax : R^3 → R^3 of three-dimensional Euclidean body B has affine dimension 2 under projection on rowspace of fat full-rank matrix A ∈ R^{2×3}. Set of coefficients of orthogonal projection TB = {Ax | x ∈ B} is isomorphic with projection P(TB) [sic].


56 CHAPTER 2. CONVEX GEOMETRYConcurrently, consider injective linear operator Py=A † y : R m → R nwhere R(A † )= R(A T ). P(Ax)= PTx achieves projection of vector x onthe row space R(A T ). (E.3.1) This means vector Ax can be succinctlyinterpreted as coefficients of orthogonal projection.Pseudoinverse matrix A † is skinny and full-rank, so operator Py is a linearbijection with respect to its range R(A † ). By Definition 2.2.1.0.1, imageP(T B) of projection PT(B) on R(A T ) in R n must therefore be isomorphicwith the set of projection coefficients T B = {Ax |x∈ B} in R m and have thesame affine dimension by (49). To illustrate, we present a three-dimensionalEuclidean body B in Figure 19 where any point x in the nullspace N(A)maps to the origin.2.2.2 Symmetric matrices2.2.2.0.1 Definition. Symmetric matrix subspace.Define a subspace of R M×M : the convex set of all symmetric M×M matrices;S M { A∈ R M×M | A=A T} ⊆ R M×M (50)This subspace comprising symmetric matrices S M is isomorphic with thevector space R M(M+1)/2 whose dimension is the number of free variables in asymmetric M ×M matrix. The orthogonal complement [331] [250] of S M isS M⊥ { A∈ R M×M | A=−A T} ⊂ R M×M (51)the subspace of antisymmetric matrices in R M×M ; id est,S M ⊕ S M⊥ = R M×M (52)where unique vector sum ⊕ is defined on page 772.△All antisymmetric matrices are hollow by definition (have 0 maindiagonal). Any square matrix A∈ R M×M can be written as a sum of itssymmetric and antisymmetric parts: respectively,A = 1 2 (A +AT ) + 1 2 (A −AT ) (53)The symmetric part is orthogonal in R M2 to the antisymmetric part; videlicet,tr ( (A +A T )(A −A T ) ) = 0 (54)


In the ambient space of real matrices, the antisymmetric matrix subspace can be described

   S^{M⊥} = { ½(A − A^T) | A ∈ R^{M×M} } ⊂ R^{M×M}   (55)

because any matrix in S^M is orthogonal to any matrix in S^{M⊥}. Further confined to the ambient subspace of symmetric matrices, because of antisymmetry, S^{M⊥} would become trivial.

2.2.2.1 Isomorphism of symmetric matrix subspace

When a matrix is symmetric in S^M, we may still employ the vectorization transformation (37) to R^{M²}; vec, an isometric isomorphism. We might instead choose to realize in the lower-dimensional subspace R^{M(M+1)/2} by ignoring redundant entries (below the main diagonal) during transformation. Such a realization would remain isomorphic but not isometric. Lack of isometry is a spatial distortion due now to disparity in metric between R^{M²} and R^{M(M+1)/2}. To realize isometrically in R^{M(M+1)/2}, we must make a correction: For Y = [Y_ij] ∈ S^M we take symmetric vectorization [215, §2.2.1]

   svec Y ≜ [Y_11; √2 Y_12; Y_22; √2 Y_13; √2 Y_23; Y_33; ⋮; Y_MM] ∈ R^{M(M+1)/2}   (56)

where all entries off the main diagonal have been scaled. Now for Z ∈ S^M

   〈Y, Z〉 ≜ tr(Y^T Z) = vec(Y)^T vec Z = svec(Y)^T svec Z   (57)

Then because the metrics become equivalent, for X ∈ S^M

   ‖svec X − svec Y‖_2 = ‖X − Y‖_F   (58)

and because symmetric vectorization (56) is a linear bijective mapping, then svec is an isometric isomorphism of the symmetric matrix subspace. In other words, S^M is isometrically isomorphic with R^{M(M+1)/2} in the Euclidean sense under transformation svec.


The set of all symmetric matrices S^M forms a proper subspace in R^{M×M}, so for it there exists a standard orthonormal basis in isometrically isomorphic R^{M(M+1)/2}

   {E_ij ∈ S^M} = { e_i e_i^T, i = j = 1...M ;  (1/√2)(e_i e_j^T + e_j e_i^T), 1 ≤ i < j ≤ M }   (59)

where M(M+1)/2 standard basis matrices E_ij are formed from the standard basis vectors

   e_i = [ 1, i = j ; 0, i ≠ j ] , j = 1...M  ∈ R^M   (60)

Thus we have a basic orthogonal expansion for Y ∈ S^M

   Y = Σ_{j=1}^{M} Σ_{i=1}^{j} 〈E_ij, Y〉 E_ij   (61)

whose coefficients

   〈E_ij, Y〉 = { Y_ii , i = 1...M ;  Y_ij √2 , 1 ≤ i < j ≤ M }   (62)

correspond to entries of the symmetric vectorization (56).

2.2.3 Symmetric hollow subspace

2.2.3.0.1 Definition. Hollow subspaces. [352]
Define a subspace of R^{M×M}: the convex set of all (real) symmetric M×M matrices having 0 main diagonal;

   R^{M×M}_h ≜ { A ∈ R^{M×M} | A = A^T , δ(A) = 0 } ⊂ R^{M×M}   (63)

where the main diagonal of A ∈ R^{M×M} is denoted (§A.1)

   δ(A) ∈ R^M   (1415)


2.2. VECTORIZED-MATRIX INNER PRODUCT 59Operating on a vector, linear operator δ naturally returns a diagonal matrix;δ(δ(A)) is a diagonal matrix. Operating recursively on a vector Λ∈ R N ordiagonal matrix Λ∈ S N , operator δ(δ(Λ)) returns Λ itself;δ 2 (Λ) ≡ δ(δ(Λ)) = Λ (1417)The subspace R M×Mh(63) comprising (real) symmetric hollow matrices isisomorphic with subspace R M(M−1)/2 ; its orthogonal complement isR M×M⊥h { A∈ R M×M | A=−A T + 2δ 2 (A) } ⊆ R M×M (64)the subspace of antisymmetric antihollow matrices in R M×M ; id est,R M×Mh⊕ R M×M⊥h= R M×M (65)Yet defined instead as a proper subspace of ambient S MS M h { A∈ S M | δ(A) = 0 } ≡ R M×Mh⊂ S M (66)the orthogonal complement S M⊥h of symmetric hollow subspace S M h ,S M⊥h { A∈ S M | A=δ 2 (A) } ⊆ S M (67)called symmetric antihollow subspace, is simply the subspace of diagonalmatrices; id est,S M h ⊕ S M⊥h = S M (68)△Any matrix A∈ R M×M can be written as a sum of its symmetric hollowand antisymmetric antihollow parts: respectively,( ) ( 1 1A =2 (A +AT ) − δ 2 (A) +2 (A −AT ) + δ (A))2 (69)The symmetric hollow part is orthogonal to the antisymmetric antihollowpart in R M2 ; videlicet,))1 1tr((2 (A +AT ) − δ (A))( 2 2 (A −AT ) + δ 2 (A) = 0 (70)


because any matrix in subspace R^{M×M}_h is orthogonal to any matrix in the antisymmetric antihollow subspace

   R^{M×M⊥}_h = { ½(A − A^T) + δ²(A) | A ∈ R^{M×M} } ⊆ R^{M×M}   (71)

of the ambient space of real matrices; which reduces to the diagonal matrices in the ambient space of symmetric matrices

   S^{M⊥}_h = { δ²(A) | A ∈ S^M } = { δ(u) | u ∈ R^M } ⊆ S^M   (72)

In anticipation of their utility with Euclidean distance matrices (EDMs) in §5, for symmetric hollow matrices we introduce the linear bijective vectorization dvec that is the natural analogue to symmetric matrix vectorization svec (56): for Y = [Y_ij] ∈ S^M_h

   dvec Y ≜ √2 [Y_12; Y_13; Y_23; Y_14; Y_24; Y_34; ⋮; Y_{M−1,M}] ∈ R^{M(M−1)/2}   (73)

Like svec, dvec is an isometric isomorphism on the symmetric hollow subspace. For X ∈ S^M_h

   ‖dvec X − dvec Y‖_2 = ‖X − Y‖_F   (74)

The set of all symmetric hollow matrices S^M_h forms a proper subspace in R^{M×M}, so for it there must be a standard orthonormal basis in isometrically isomorphic R^{M(M−1)/2}

   {E_ij ∈ S^M_h} = { (1/√2)(e_i e_j^T + e_j e_i^T) , 1 ≤ i < j ≤ M }   (75)

where M(M−1)/2 standard basis matrices E_ij are formed from the standard basis vectors e_i ∈ R^M.
The symmetric hollow majorization corollary A.1.2.0.2 characterizes eigenvalues of symmetric hollow matrices.
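Both svec (56) and dvec (73) can be coded directly from their definitions and their isometry properties (58), (74) verified numerically. The following is a small Python/NumPy sketch (not from the text); the symmetric test matrices are random placeholders:

import numpy as np

def svec(Y):
    # symmetric vectorization (56): column by column, off-diagonals scaled by sqrt(2)
    M = Y.shape[0]
    return np.concatenate([np.r_[np.sqrt(2)*Y[:j, j], Y[j, j]] for j in range(M)])

def dvec(Y):
    # hollow vectorization (73): strictly upper-triangular entries, scaled by sqrt(2)
    M = Y.shape[0]
    return np.concatenate([np.sqrt(2)*Y[:j, j] for j in range(1, M)])

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4));  X = (A + A.T)/2          # symmetric test matrices
B = rng.standard_normal((4, 4));  Y = (B + B.T)/2
assert np.isclose(np.linalg.norm(svec(X) - svec(Y)), np.linalg.norm(X - Y, 'fro'))      # (58)

Xh = X - np.diag(np.diag(X));  Yh = Y - np.diag(np.diag(Y))                             # hollow
assert np.isclose(np.linalg.norm(dvec(Xh) - dvec(Yh)), np.linalg.norm(Xh - Yh, 'fro'))  # (74)

The column-by-column traversal in both helpers reproduces the entry ordering displayed in (56) and (73).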


2.3. HULLS 61Figure 20: <strong>Convex</strong> hull of a random list of points in R 3 . Some pointsfrom that generating list reside interior to this convex polyhedron (2.12).[373, <strong>Convex</strong> Polyhedron] (Avis-Fukuda-Mizukoshi)2.3 HullsWe focus on the affine, convex, and conic hulls: convex sets that may beregarded as kinds of Euclidean container or vessel united with its interior.2.3.1 Affine hull, affine dimensionAffine dimension of any set in R n is the dimension of the smallest affine set(empty set, point, line, plane, hyperplane (2.4.2), translated subspace, R n )that contains it. For nonempty sets, affine dimension is the same as dimensionof the subspace parallel to that affine set. [307,1] [199,A.2.1]Ascribe the points in a list {x l ∈ R n , l=1... N} to the columns ofmatrix X :X = [x 1 · · · x N ] ∈ R n×N (76)In particular, we define affine dimension r of the N-point list X asdimension of the smallest affine set in Euclidean space R n that contains X ;r dim aff X (77)Affine dimension r is a lower bound sometimes called embedding dimension.[352] [185] That affine set A in which those points are embedded is uniqueand called the affine hull [330,2.1];A aff {x l ∈ R n , l=1... N} = aff X= x 1 + R{x l − x 1 , l=2... N} = {Xa | a T 1 = 1} ⊆ R n (78)


62 CHAPTER 2. CONVEX GEOMETRYfor which we call list X a set of generators. Hull A is parallel to subspacewhereR{x l − x 1 , l=2... N} = R(X − x 1 1 T ) ⊆ R n (79)R(A) = {Ax | ∀x} (142)Given some arbitrary set C and any x∈ Cwhere aff(C−x) is a subspace.aff C = x + aff(C − x) (80)2.3.1.0.1 Definition. Affine subset.We analogize affine subset to subspace, 2.14 defining it to be any nonemptyaffine set (2.1.4).△The affine hull of a point x is that point itself;aff ∅ ∅ (81)aff{x} = {x} (82)Affine hull of two distinct points is the unique line through them. (Figure 21)The affine hull of three noncollinear points in any dimension is that uniqueplane containing the points, and so on. The subspace of symmetric matricesS m is the affine hull of the cone of positive semidefinite matrices; (2.9)aff S m + = S m (83)2.3.1.0.2 Example. Affine hull of rank-1 correlation matrices. [218]The set of all m ×m rank-1 correlation matrices is defined by all the binaryvectors y in R m (confer5.9.1.0.1){yy T ∈ S m + | δ(yy T )=1} (84)Affine hull of the rank-1 correlation matrices is equal to the set of normalizedsymmetric matrices; id est,aff{yy T ∈ S m + | δ(yy T )=1} = {A∈ S m | δ(A)=1} (85)2.14 The popular term affine subspace is an oxymoron.


Figure 21: Given two points in Euclidean vector space of any dimension, their various hulls are illustrated: A affine hull (drawn truncated), C convex hull, K conic hull (truncated), R range or span is a plane (truncated). Each hull is a subset of range; generally, A, C, K ⊆ R ∋ 0. (Cartesian axes drawn for reference.)


64 CHAPTER 2. CONVEX GEOMETRY2.3.1.0.3 Exercise. Affine hull of correlation matrices.Prove (85) via definition of affine hull. Find the convex hull instead. 2.3.1.1 Partial order induced by R N + and S M +Notation a ≽ 0 means vector a belongs to nonnegative orthant R N + whilea ≻ 0 means vector a belongs to the nonnegative orthant’s interior int R N + .a ≽ b denotes comparison of vector a to vector b on R N with respect to thenonnegative orthant; id est, a ≽ b means a −b belongs to the nonnegativeorthant but neither a or b is necessarily nonnegative. With particularrespect to the nonnegative orthant, a ≽ b ⇔ a i ≥ b i ∀i (370).More generally, a ≽ Kb or a ≻ Kb denotes comparison with respect topointed closed convex cone K , but equivalence with entrywise comparisondoes not hold and neither a or b necessarily belongs to K . (2.7.2.2)The symbol ≥ is reserved for scalar comparison on the real line R withrespect to the nonnegative real line R + as in a T y ≥ b . Comparison ofmatrices with respect to the positive semidefinite cone S M + , like I ≽A ≽ 0in Example 2.3.2.0.1, is explained in2.9.0.1.2.3.2 <strong>Convex</strong> hullThe convex hull [199,A.1.4] [307] of any bounded 2.15 list or set of N pointsX ∈ R n×N forms a unique bounded convex polyhedron (confer2.12.0.0.1)whose vertices constitute some subset of that list;P conv {x l , l=1... N}= conv X = {Xa | a T 1 = 1, a ≽ 0} ⊆ R n(86)Union of relative interior and relative boundary (2.1.7.2) of the polyhedroncomprise its convex hull P , the smallest closed convex set that contains thelist X ; e.g., Figure 20. Given P , the generating list {x l } is not unique.But because every bounded polyhedron is the convex hull of its vertices,[330,2.12.2] the vertices of P comprise a minimal set of generators.Given some arbitrary set C ⊆ R n , its convex hull conv C is equivalent tothe smallest convex set containing it. (confer2.4.1.1.1) The convex hull is a2.15 An arbitrary set C in R n is bounded iff it can be contained in a Euclidean ball havingfinite radius. [113,2.2] (confer5.7.3.0.1) The smallest ball containing C has radiusinf sup ‖x −y‖ and center x ⋆ whose determination is a convex problem because sup‖x −y‖x y∈Cy∈Cis a convex function of x; but the supremum may be difficult to ascertain.


2.3. HULLS 65subset of the affine hull;conv C ⊆ aff C = aff C = aff C = aff conv C (87)Any closed bounded convex set C is equal to the convex hull of its boundary;C = conv ∂ C (88)conv ∅ ∅ (89)2.3.2.0.1 Example. Hull of rank-k projection matrices. [143] [287][11,4.1] [292,3] [239,2.4] [240] <strong>Convex</strong> hull of the set comprising outerproduct of orthonormal matrices has equivalent expression: for 1 ≤ k ≤ N(2.9.0.1)conv { UU T | U ∈ R N×k , U T U = I } = { A∈ S N | I ≽ A ≽ 0, 〈I , A〉=k } ⊂ S N +(90)This important convex body we call Fantope (after mathematician Ky Fan).In case k = 1, there is slight simplification: ((1619), Example 2.9.2.7.1)conv { UU T | U ∈ R N , U T U = 1 } = { A∈ S N | A ≽ 0, 〈I , A〉=1 } (91)In case k = N , the Fantope is identity matrix I . More generally, the set{UU T | U ∈ R N×k , U T U = I } (92)comprises the extreme points (2.6.0.0.1) of its convex hull. By (1463), eachand every extreme point UU T has only k nonzero eigenvalues λ and theyall equal 1 ; id est, λ(UU T ) 1:k = λ(U T U) = 1. So Frobenius’ norm of eachand every extreme point equals the same constant‖UU T ‖ 2 F = k (93)Each extreme point simultaneously lies on the boundary of the positivesemidefinite cone (when k < N ,2.9) and on the boundary of a hypersphereof dimension k(N −√k(1 k + 1) and radius − k ) centered at kI along2 2 N Nthe ray (base 0) through the identity matrix I in isomorphic vector spaceR N(N+1)/2 (2.2.2.1).Figure 22 illustrates extreme points (92) comprising the boundary of aFantope, the boundary of a disc corresponding to k = 1, N = 2 ; but thatcircumscription does not hold in higher dimension. (2.9.2.8)
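A quick numerical check of the Fantope extreme points (92) is straightforward. The sketch below (Python/NumPy, not from the text, with a randomly generated orthonormal U as a stand-in) confirms that UU^T has exactly k unit eigenvalues, that 〈I, A〉 = k, and that its squared Frobenius norm equals k as in (93):

import numpy as np

rng = np.random.default_rng(3)
N, k = 5, 2
# Random U with orthonormal columns (U^T U = I); UU^T is an extreme point of the Fantope (90)
U, _ = np.linalg.qr(rng.standard_normal((N, k)))
A = U @ U.T

print(np.round(np.linalg.eigvalsh(A), 6))            # exactly k unit eigenvalues, rest zero
print(np.isclose(np.trace(A), k))                    # <I, A> = k
print(np.isclose(np.linalg.norm(A, 'fro')**2, k))    # (93): ||UU^T||_F^2 = k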


Figure 22: Two Fantopes. Circle (radius 1/√2), shown here on boundary of positive semidefinite cone S^2_+ in isometrically isomorphic R^3 from Figure 43 (coordinates α, √2β, γ of [α β; β γ] ∈ S^2), comprises boundary of a Fantope (90) in this dimension (k = 1, N = 2). Lone point illustrated is identity matrix I, interior to PSD cone, and is that Fantope corresponding to k = 2, N = 2. (View is from inside PSD cone looking toward origin.)


2.3.2.0.2 Example. Nuclear norm ball: convex hull of rank-1 matrices.
From (91), in Example 2.3.2.0.1, we learn that the convex hull of normalized symmetric rank-1 matrices is a slice of the positive semidefinite cone. In §2.9.2.7 we find the convex hull of all symmetric rank-1 matrices to be the entire positive semidefinite cone.
In the present example we abandon symmetry; instead posing, what is the convex hull of bounded nonsymmetric rank-1 matrices:

   conv{uv^T | ‖uv^T‖ ≤ 1, u ∈ R^m, v ∈ R^n} = {X ∈ R^{m×n} | Σ_i σ(X)_i ≤ 1}   (94)

where σ(X) is a vector of singular values. (Since ‖uv^T‖ = ‖u‖‖v‖ (1610), norm of each vector constituting a dyad uv^T (§B.1) in the hull is effectively bounded above by 1.)

Proof. (⇐) Suppose Σ σ(X)_i ≤ 1. As in §A.6, define a singular value decomposition: X = UΣV^T where U = [u_1 ... u_{min{m,n}}] ∈ R^{m×min{m,n}}, V = [v_1 ... v_{min{m,n}}] ∈ R^{n×min{m,n}}, and whose sum of singular values is Σ σ(X)_i = tr Σ = κ ≤ 1. Then we may write X = Σ (σ_i/κ) (√κ u_i)(√κ v_i)^T which is a convex combination of dyads each of whose norm does not exceed 1. (Srebro)
(⇒) Now suppose we are given a convex combination of dyads X = Σ α_i u_i v_i^T such that Σ α_i = 1, α_i ≥ 0 ∀i, and ‖u_i v_i^T‖ ≤ 1 ∀i. Then by triangle inequality for singular values [203, cor.3.4.3] Σ σ(X)_i ≤ Σ σ(α_i u_i v_i^T) = Σ α_i ‖u_i v_i^T‖ ≤ Σ α_i . 

Given any particular dyad uv_p^T in the convex hull, because its polar −uv_p^T and every convex combination of the two belong to that hull, then the unique line containing those two points ±uv_p^T (their affine combination (78)) must intersect the hull's boundary at the normalized dyads {±uv^T | ‖uv^T‖ = 1}. Any point formed by convex combination of dyads in the hull must therefore be expressible as a convex combination of dyads on the boundary: Figure 23.

   conv{uv^T | ‖uv^T‖ ≤ 1, u ∈ R^m, v ∈ R^n} ≡ conv{uv^T | ‖uv^T‖ = 1, u ∈ R^m, v ∈ R^n}   (95)

id est, dyads may be normalized and the hull's boundary contains them;

   ∂{X ∈ R^{m×n} | Σ_i σ(X)_i ≤ 1} ⊇ {uv^T | ‖uv^T‖ = 1, u ∈ R^m, v ∈ R^n}   (96)

Normalized dyads constitute the set of extreme points (§2.6.0.0.1) of this nuclear norm ball which is, therefore, their convex hull.
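The forward direction of the proof above can be exercised numerically: any convex combination of norm-bounded dyads has nuclear norm (sum of singular values) at most 1. A minimal Python/NumPy sketch (not from the text), with randomly generated normalized dyads as placeholders:

import numpy as np

rng = np.random.default_rng(4)
m, n, K = 3, 4, 6
# Convex combination of dyads u_i v_i^T, each with ||u_i v_i^T|| = ||u_i|| ||v_i|| = 1
alpha = rng.random(K);  alpha /= alpha.sum()
X = np.zeros((m, n))
for a in alpha:
    u = rng.standard_normal(m);  u /= np.linalg.norm(u)
    v = rng.standard_normal(n);  v /= np.linalg.norm(v)
    X += a * np.outer(u, v)                      # normalized dyad, norm exactly 1

nuclear = np.linalg.svd(X, compute_uv=False).sum()
print(nuclear, nuclear <= 1 + 1e-12)             # sum of singular values never exceeds 1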


Figure 23: uv_p^T is a convex combination of normalized dyads ‖±uv^T‖ = 1; similarly for xy_p^T. Any point in line segment joining xy_p^T to uv_p^T is expressible as a convex combination of two to four points indicated on boundary. Affine dimension of vectorized nuclear norm ball in X ∈ R^{2×2} is 2 because (1610) σ_1 = ‖u‖‖v‖ = 1 by (95).


2.3. HULLS 692.3.2.0.3 Exercise. <strong>Convex</strong> hull of outer product.Describe the interior of a Fantope.Find the convex hull of nonorthogonal projection matrices (E.1.1):{UV T | U ∈ R N×k , V ∈ R N×k , V T U = I} (97)Find the convex hull of nonsymmetric matrices bounded under some norm:{UV T | U ∈ R m×k , V ∈ R n×k , ‖UV T ‖ ≤ 1} (98)2.3.2.0.4 Example. Permutation polyhedron. [201] [317] [261]A permutation matrix Ξ is formed by interchanging rows and columns ofidentity matrix I . Since Ξ is square and Ξ T Ξ = I , the set of all permutationmatrices is a proper subset of the nonconvex manifold of orthogonal matrices(B.5). In fact, the only orthogonal matrices having all nonnegative entriesare permutations of the identity:Ξ −1 = Ξ T , Ξ ≥ 0 (99)And the only positive semidefinite permutation matrix is the identity.[333,6.5 prob.20]Regarding the permutation matrices as a set of points in Euclidean space,its convex hull is a bounded polyhedron (2.12) described (Birkhoff, 1946)conv{Ξ = Π i (I ∈ S n )∈ R n×n , i=1... n!} = {X ∈ R n×n | X T 1=1, X1=1, X ≥ 0}(100)where Π i is a linear operator here representing the i th permutation. Thispolyhedral hull, whose n! vertices are the permutation matrices, is alsoknown as the set of doubly stochastic matrices. The permutation matricesare the minimal cardinality (fewest nonzero entries) doubly stochasticmatrices. The only orthogonal matrices belonging to this polyhedron arethe permutation matrices.It is remarkable that n! permutation matrices can be described as theextreme points (2.6.0.0.1) of a bounded polyhedron, of affine dimension(n−1) 2 , that is itself described by 2n equalities (2n−1 linearly independentequality constraints in n 2 nonnegative variables). By Carathéodory’stheorem, conversely, any doubly stochastic matrix can be described as a


convex combination of at most (n−1)² + 1 permutation matrices. [202, §8.7] [55, thm.1.2.5] This polyhedron, then, can be a device for relaxing an integer, combinatorial, or Boolean optimization problem.^2.16 [66] [283, §3.1] 

2.3.2.0.5 Example. Convex hull of orthonormal matrices. [27, §1.2]
Consider rank-k matrices U ∈ R^{n×k} such that U^T U = I. These are the orthonormal matrices; a closed bounded submanifold, of all orthogonal matrices, having dimension nk − k(k+1)/2 [52]. Their convex hull is expressed, for 1 ≤ k ≤ n

   conv{U ∈ R^{n×k} | U^T U = I} = {X ∈ R^{n×k} | ‖X‖_2 ≤ 1} = {X ∈ R^{n×k} | ‖X^T a‖ ≤ ‖a‖ ∀a ∈ R^n}   (101)

When k = n, matrices U are orthogonal and the convex hull is called the spectral norm ball which is the set of all contractions. [203, p.158] [329, p.313] The orthogonal matrices then constitute the extreme points (§2.6.0.0.1) of this hull.
By Schur complement (§A.4), the spectral norm ‖X‖_2 constraining largest singular value σ(X)_1 can be expressed as a semidefinite constraint

   ‖X‖_2 ≤ 1 ⇔ [I X; X^T I] ≽ 0   (102)

because of equivalence X^T X ≼ I ⇔ σ(X) ≼ 1 with singular values. (1566) (1450) (1451) 

2.3.3 Conic hull

In terms of a finite-length point list (or set) arranged columnar in X ∈ R^{n×N} (76), its conic hull is expressed

   K ≜ cone{x_l, l = 1...N} = cone X = {Xa | a ≽ 0} ⊆ R^n   (103)

id est, every nonnegative combination of points from the list. Conic hull of any finite-length list forms a polyhedral cone [199, §A.4.3] (§2.12.1.0.1; e.g., Figure 50a); the smallest closed convex cone (§2.7.2) that contains the list.

^2.16 Relaxation replaces an objective function with its convex envelope or expands a feasible set to one that is convex. Dantzig first showed in 1951 that, by this device, the so-called assignment problem can be formulated as a linear program. [316] [26, §II.5]
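The Schur-complement characterization (102) of the spectral norm ball is easy to test numerically. In the sketch below (Python/NumPy, not from the text; the matrix X is a random placeholder scaled to lie inside the ball), the block matrix [I X; X^T I] is verified positive semidefinite when ‖X‖_2 ≤ 1:

import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((3, 4))
X /= (np.linalg.norm(X, 2) * 1.5)                 # scale so that ||X||_2 < 1

# (102): ||X||_2 <= 1  <=>  [[I, X], [X^T, I]] is positive semidefinite
M = np.block([[np.eye(3), X], [X.T, np.eye(4)]])
print(np.linalg.norm(X, 2) <= 1, np.linalg.eigvalsh(M).min() >= 0)   # both True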


Figure 24: A simplicial cone (§2.12.3.1.1) in R^3 whose boundary is drawn truncated; constructed using A ∈ R^{3×3} and C = 0 in (287). By the most fundamental definition of a cone (§2.7.1), entire boundary can be constructed from an aggregate of rays emanating exclusively from the origin. Each of three extreme directions corresponds to an edge (§2.6.0.0.3); they are conically, affinely, and linearly independent for this cone. Because this set is polyhedral, exposed directions are in one-to-one correspondence with extreme directions; there are only three. Its extreme directions give rise to what is called a vertex-description of this polyhedral cone; simply, the conic hull of extreme directions. Obviously this cone can also be constructed by intersection of three halfspaces; hence the equivalent halfspace-description.


72 CHAPTER 2. CONVEX GEOMETRYBy convention, the aberration [330,2.1]cone ∅ {0} (104)Given some arbitrary set C , it is apparentconv C ⊆ cone C (105)2.3.4 Vertex-descriptionThe conditions in (78), (86), and (103) respectively define an affinecombination, convex combination, and conic combination of elements fromthe set or list. Whenever a Euclidean body can be described as somehull or span of a set of points, then that representation is loosely calleda vertex-description and those points are called generators.2.4 Halfspace, HyperplaneA two-dimensional affine subset is called a plane. An (n −1)-dimensionalaffine subset in R n is called a hyperplane. [307] [199] Every hyperplanepartially bounds a halfspace (which is convex, but not affine, and the onlynonempty convex set in R n whose complement is convex and nonempty).2.4.1 Halfspaces H + and H −Euclidean space R n is partitioned in two by any hyperplane ∂H ; id est,H − + H + = R n . The resulting (closed convex) halfspaces, both partiallybounded by ∂H , may be described as an asymmetryH − = {y | a T y ≤ b} = {y | a T (y − y p ) ≤ 0} ⊂ R n (106)H + = {y | a T y ≥ b} = {y | a T (y − y p ) ≥ 0} ⊂ R n (107)where nonzero vector a∈R n is an outward-normal to the hyperplane partiallybounding H − while an inward-normal with respect to H + . For any vectory −y p that makes an obtuse angle with normal a , vector y will lie in thehalfspace H − on one side (shaded in Figure 25) of the hyperplane while acuteangles denote y in H + on the other side.
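Membership in H− or H+ reduces to the sign of a^T(y − y_p), exactly as (106) and (107) indicate. A tiny Python/NumPy sketch (not from the text; the normal a and scalar b are hypothetical values):

import numpy as np

a  = np.array([1.0, 2.0])        # hypothetical outward-normal for H-
b  = 3.0
yp = np.array([3.0, 0.0])        # any particular point satisfying a^T yp = b

def side(y):
    # negative -> H-, positive -> H+, zero -> on the bounding hyperplane
    return a @ (y - yp)

print(side(np.array([0.0, 0.0])))   # -3.0 : origin lies in H-   (106)
print(side(np.array([5.0, 5.0])))   # +12.0: this point lies in H+ (107)
print(side(yp))                      #  0.0 : on the hyperplane itself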


Figure 25: Hyperplane illustrated ∂H = {y | a^T(y − y_p) = 0} = N(a^T) + y_p is a line partially bounding halfspaces H− = {y | a^T(y − y_p) ≤ 0} and H+ = {y | a^T(y − y_p) ≥ 0} in R^2. Shaded is a rectangular piece of semiinfinite H− with respect to which vector a is outward-normal to bounding hyperplane; vector a is inward-normal with respect to H+. Halfspace H− contains nullspace N(a^T) (dashed line through origin) because a^T y_p > 0. Hyperplane, halfspace, and nullspace are each drawn truncated. Points c and d are equidistant from hyperplane, and vector c − d is normal to it. ∆ is distance from origin to hyperplane.


74 CHAPTER 2. CONVEX GEOMETRYAn equivalent more intuitive representation of a halfspace comes aboutwhen we consider all the points in R n closer to point d than to point c orequidistant, in the Euclidean sense; from Figure 25,H − = {y | ‖y − d‖ ≤ ‖y − c‖} (108)This representation, in terms of proximity, is resolved with the moreconventional representation of a halfspace (106) by squaring both sides ofthe inequality in (108);}H − ={y | (c − d) T y ≤ ‖c‖2 − ‖d‖ 2=2( {y | (c − d) T y − c + d ) }≤ 02(109)2.4.1.1 PRINCIPLE 1: Halfspace-description of convex setsThe most fundamental principle in convex geometry follows from thegeometric Hahn-Banach theorem [250,5.12] [18,1] [132,I.1.2] whichguarantees any closed convex set to be an intersection of halfspaces.2.4.1.1.1 Theorem. Halfspaces. [199,A.4.2b] [42,2.4]A closed convex set in R n is equivalent to the intersection of all halfspacesthat contain it.⋄Intersection of multiple halfspaces in R n may be represented using amatrix constant A⋂H i−= {y | A T y ≼ b} = {y | A T (y − y p ) ≼ 0} (110)i⋂H i+= {y | A T y ≽ b} = {y | A T (y − y p ) ≽ 0} (111)iwhere b is now a vector, and the i th column of A is normal to a hyperplane∂H i partially bounding H i . By the halfspaces theorem, intersections likethis can describe interesting convex Euclidean bodies such as polyhedra andcones, giving rise to the term halfspace-description.


Figure 26: (a)-(d) Hyperplanes in R^2 (truncated), with normals a = [1; 1], b = [−1; −1], c = [−1; 1], d = [1; −1], e = [1; 0] and contours {y | a^T y = ±1} et cetera. Movement in normal direction increases vector inner-product. This visual concept is exploited to attain analytical solution of linear programs; e.g., Example 2.4.2.6.2, Exercise 2.5.1.2.2, Example 3.4.0.0.2, [61, exer.4.8-exer.4.20]. Each graph is also interpretable as a contour plot of a real affine function of two variables as in Figure 73. (e) Ratio |β|/‖α‖ from {x | α^T x = β} represents radius of hypersphere about 0 supported by hyperplane whose normal is α.


76 CHAPTER 2. CONVEX GEOMETRY2.4.2 Hyperplane ∂H representationsEvery hyperplane ∂H is an affine set parallel to an (n −1)-dimensionalsubspace of R n ; it is itself a subspace if and only if it contains the origin.dim∂H = n − 1 (112)so a hyperplane is a point in R , a line in R 2 , a plane in R 3 , and so on. Everyhyperplane can be described as the intersection of complementary halfspaces;[307,19]∂H = H − ∩ H + = {y | a T y ≤ b , a T y ≥ b} = {y | a T y = b} (113)a halfspace-description. Assuming normal a∈ R n to be nonzero, then anyhyperplane in R n can be described as the solution set to vector equationa T y = b (illustrated in Figure 25 and Figure 26 for R 2 );∂H {y | a T y = b} = {y | a T (y−y p ) = 0} = {Zξ+y p | ξ ∈ R n−1 } ⊂ R n (114)All solutions y constituting the hyperplane are offset from the nullspace ofa T by the same constant vector y p ∈ R n that is any particular solution toa T y=b ; id est,y = Zξ + y p (115)where the columns of Z ∈ R n×n−1 constitute a basis for the nullspaceN(a T ) = {x∈ R n | a T x=0} . 2.17Conversely, given any point y p in R n , the unique hyperplane containingit having normal a is the affine set ∂H (114) where b equals a T y p andwhere a basis for N(a T ) is arranged in Z columnar. Hyperplane dimensionis apparent from dimension of Z ; that hyperplane is parallel to the span ofits columns.2.4.2.0.1 Exercise. Hyperplane scaling.Given normal y , draw a hyperplane {x∈ R 2 | x T y =1}. Suppose z = 1y . 2On the same plot, draw the hyperplane {x∈ R 2 | x T z =1}. Now supposez = 2y , then draw the last hyperplane again with this new z . What is theapparent effect of scaling normal y ?2.17 We will later find this expression for y in terms of nullspace of a T (more generally, ofmatrix A T (144)) to be a useful device for eliminating affine equality constraints, much aswe did here.
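Representation (115) of a hyperplane as an offset span of a nullspace basis is easy to realize numerically: compute any particular solution y_p of a^T y = b together with a basis Z for N(a^T). A minimal Python/NumPy sketch (not from the text; the normal a and scalar b are hypothetical):

import numpy as np

a = np.array([1.0, -2.0, 3.0])      # hypothetical normal in R^3
b = 4.0
yp = a * b / (a @ a)                 # one particular solution of a^T y = b

# columns of Z span N(a^T): right singular vectors of a^T with zero singular value
_, _, Vt = np.linalg.svd(a.reshape(1, -1))
Z = Vt[1:].T                         # 3 x 2 basis for the nullspace

xi = np.array([0.7, -1.3])
y = Z @ xi + yp                      # (115): every such y lies on the hyperplane (114)
print(np.isclose(a @ y, b))          # True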


2.4. HALFSPACE, HYPERPLANE 772.4.2.0.2 Example. Distance from origin to hyperplane.Given the (shortest) distance ∆∈ R + from the origin to a hyperplanehaving normal vector a , we can find its representation ∂H by droppinga perpendicular. The point thus found is the orthogonal projection of theorigin on ∂H (E.5.0.0.5), equal to a∆/‖a‖ if the origin is known a priorito belong to halfspace H − (Figure 25), or equal to −a∆/‖a‖ if the originbelongs to halfspace H + ; id est, when H − ∋0∂H = { y | a T (y − a∆/‖a‖) = 0 } = { y | a T y = ‖a‖∆ } (116)or when H + ∋0∂H = { y | a T (y + a∆/‖a‖) = 0 } = { y | a T y = −‖a‖∆ } (117)Knowledge of only distance ∆ and normal a thus introduces ambiguity intothe hyperplane representation.2.4.2.1 Matrix variableAny halfspace in R mn may be represented using a matrix variable. Forvariable Y ∈ R m×n , given constants A∈ R m×n and b = 〈A , Y p 〉 ∈ RH − = {Y ∈ R mn | 〈A, Y 〉 ≤ b} = {Y ∈ R mn | 〈A, Y −Y p 〉 ≤ 0} (118)H + = {Y ∈ R mn | 〈A, Y 〉 ≥ b} = {Y ∈ R mn | 〈A, Y −Y p 〉 ≥ 0} (119)Recall vector inner-product from2.2: 〈A, Y 〉= tr(A T Y )= vec(A) T vec(Y ).Hyperplanes in R mn may, of course, also be represented using matrixvariables.∂H = {Y | 〈A, Y 〉 = b} = {Y | 〈A, Y −Y p 〉 = 0} ⊂ R mn (120)Vector a from Figure 25 is normal to the hyperplane illustrated. Likewise,nonzero vectorized matrix A is normal to hyperplane ∂H ;A ⊥ ∂H in R mn (121)2.4.2.2 Vertex-description of hyperplaneAny hyperplane in R n may be described as affine hull of a minimal set ofpoints {x l ∈ R n , l = 1... n} arranged columnar in a matrix X ∈ R n×n : (78)∂H = aff{x l ∈ R n , l = 1... n} ,dim aff{x l ∀l}=n−1= aff X , dim aff X = n−1(122)= x 1 + R{x l − x 1 , l=2... n} , dim R{x l − x 1 , l=2... n}=n−1= x 1 + R(X − x 1 1 T ) , dim R(X − x 1 1 T ) = n−1
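Given only distance ∆ and normal a with H− ∋ 0, formula (116) locates the hyperplane via the projection a∆/‖a‖ of the origin. A short Python/NumPy sketch (not from the text; a and ∆ are hypothetical values):

import numpy as np

a = np.array([3.0, 4.0])                      # hypothetical normal
Delta = 2.0                                   # given distance from origin to the hyperplane
# when H- contains the origin, the hyperplane is {y | a^T y = ||a|| Delta}   (116)
y0 = a * Delta / np.linalg.norm(a)            # orthogonal projection of the origin on dH
print(np.isclose(a @ y0, np.linalg.norm(a) * Delta))   # y0 lies on the hyperplane
print(np.isclose(np.linalg.norm(y0), Delta))           # and is at distance Delta from 0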


78 CHAPTER 2. CONVEX GEOMETRYA 1A 2A 30Figure 27: Any one particular point of three points illustrated does not belongto affine hull A i (i∈1, 2, 3, each drawn truncated) of points remaining.Three corresponding vectors in R 2 are, therefore, affinely independent (butneither linearly or conically independent).whereR(A) = {Ax | ∀x} (142)2.4.2.3 Affine independence, minimal setFor any particular affine set, a minimal set of points constituting itsvertex-description is an affinely independent generating set and vice versa.Arbitrary given points {x i ∈ R n , i=1... N} are affinely independent(a.i.) if and only if, over all ζ ∈ R N ζ T 1=1, ζ k = 0 (confer2.1.2)x i ζ i + · · · + x j ζ j − x k = 0, i≠ · · · ≠j ≠k = 1... N (123)has no solution ζ ; in words, iff no point from the given set can be expressedas an affine combination of those remaining. We deducel.i. ⇒ a.i. (124)Consequently, {x i , i=1... N} is an affinely independent set if and only if{x i −x 1 , i=2... N} is a linearly independent (l.i.) set. [[205,3] ] (Figure 27)XThis is equivalent to the property that the columns of1 T (for X ∈ R n×Nas in (76)) form a linearly independent set. [199,A.1.3]


2.4. HALFSPACE, HYPERPLANE 792.4.2.4 Preservation of affine independenceIndependence in the linear (2.1.2.1), affine, and conic (2.10.1) senses canbe preserved under linear transformation. Suppose a matrix X ∈ R n×N (76)holds an affinely independent set in its columns. Consider a transformationon the domain of such matricesT(X) : R n×N → R n×N XY (125)where fixed matrix Y [y 1 y 2 · · · y N ]∈ R N×N represents linear operator T .Affine independence of {Xy i ∈ R n , i=1... N} demands (by definition (123))there exist no solution ζ ∈ R N ζ T 1=1, ζ k = 0, toXy i ζ i + · · · + Xy j ζ j − Xy k = 0, i≠ · · · ≠j ≠k = 1... N (126)By factoring out X , we see that is ensured by affine independence of{y i ∈ R N } and by R(Y )∩ N(X) = 0 where2.4.2.5 Affine mapsN(A) = {x | Ax=0} (143)Affine transformations preserve affine hulls. Given any affine mapping T ofvector spaces and some arbitrary set C [307, p.8]aff(T C) = T(aff C) (127)2.4.2.6 PRINCIPLE 2: Supporting hyperplaneThe second most fundamental principle of convex geometry also follows fromthe geometric Hahn-Banach theorem [250,5.12] [18,1] that guaranteesexistence of at least one hyperplane in R n supporting a full-dimensionalconvex set 2.18 at each point on its boundary.The partial boundary ∂H of a halfspace that contains arbitrary set Y iscalled a supporting hyperplane ∂H to Y when the hyperplane contains atleast one point of Y . [307,11]2.18 It is customary to speak of a hyperplane supporting set C but not containing C ;called nontrivial support. [307, p.100] Hyperplanes in support of lower-dimensional bodiesare admitted.


Figure 28: (confer Figure 73) Each linear contour {z ∈ R^2 | a^T z = κ_i} (drawn for 0 > κ_3 > κ_2 > κ_1), of equal inner product in vector z with normal a, represents i th hyperplane in R^2 parametrized by scalar κ_i. Inner product κ_i increases in direction of normal a. i th line segment {z ∈ C | a^T z = κ_i} in convex set C ⊂ R^2 represents intersection with hyperplane. (Cartesian axes drawn for reference.)


Figure 29: (a) Hyperplane ∂H− (128) supporting closed set Y ⊂ R^2 (tradition). Vector a is inward-normal to hyperplane with respect to halfspace H+, but outward-normal with respect to set Y. A supporting hyperplane can be considered the limit of an increasing sequence in the normal-direction like that in Figure 28. (b) Hyperplane ∂H+ nontraditionally supporting Y. Vector ã is inward-normal to hyperplane now with respect to both halfspace H+ and set Y. Tradition [199] [307] recognizes only positive normal polarity in support function σ_Y as in (129); id est, normal a, figure (a). But both interpretations of supporting hyperplane are useful.


82 CHAPTER 2. CONVEX GEOMETRY2.4.2.6.1 Definition. Supporting hyperplane ∂H .Assuming set Y and some normal a≠0 reside in opposite halfspaces 2.19(Figure 29a), then a hyperplane supporting Y at point y p ∈ ∂Y is described∂H − = { y | a T (y − y p ) = 0, y p ∈ Y , a T (z − y p ) ≤ 0 ∀z∈Y } (128)Given only normal a , the hyperplane supporting Y is equivalently describedwhere real function∂H − = { y | a T y = sup{a T z |z∈Y} } (129)σ Y (a) = sup{a T z |z∈Y} (553)is called the support function for Y .Another equivalent but nontraditional representation 2.20 for a supportinghyperplane is obtained by reversing polarity of normal a ; (1678)∂H + = { y | ã T (y − y p ) = 0, y p ∈ Y , ã T (z − y p ) ≥ 0 ∀z∈Y }= { y | ã T y = − inf{ã T z |z∈Y} = sup{−ã T z |z∈Y} } (130)where normal ã and set Y both now reside in H + (Figure 29b).When a supporting hyperplane contains only a single point of Y , thathyperplane is termed strictly supporting. 2.21△A full-dimensional set that has a supporting hyperplane at every pointon its boundary, conversely, is convex. A convex set C ⊂ R n , for example,can be expressed as the intersection of all halfspaces partially bounded byhyperplanes supporting it; videlicet, [250, p.135]C = ⋂a∈R n {y | a T y ≤ σ C (a) } (131)by the halfspaces theorem (2.4.1.1.1).There is no geometric difference between supporting hyperplane ∂H + or∂H − or ∂H and 2.22 an ordinary hyperplane ∂H coincident with them.2.19 Normal a belongs to H + by definition.2.20 useful for constructing the dual cone; e.g., Figure 55b. Tradition would instead haveus construct the polar cone; which is, the negative dual cone.2.21 Rockafellar terms a strictly supporting hyperplane tangent to Y if it is unique there;[307,18, p.169] a definition we do not adopt because our only criterion for tangency isintersection exclusively with a relative boundary. Hiriart-Urruty & Lemaréchal [199, p.44](confer [307, p.100]) do not demand any tangency of a supporting hyperplane.2.22 If vector-normal polarity is unimportant, we may instead signify a supportinghyperplane by ∂H .
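For a finite point set Y, the support function (553) and the supporting hyperplane (129) can be computed by a simple maximization over the points. A minimal Python/NumPy sketch (not from the text; the points and direction are random placeholders):

import numpy as np

rng = np.random.default_rng(6)
Y = rng.standard_normal((2, 20))              # 20 hypothetical points in R^2 (columns)
a = np.array([1.0, 0.5])                       # chosen direction / normal

sigma = np.max(a @ Y)                          # support function sigma_Y(a) = sup a^T z   (553)
zp = Y[:, np.argmax(a @ Y)]                    # a point of Y attaining the supremum
# supporting hyperplane (129): {y | a^T y = sigma}; every z in Y satisfies a^T z <= sigma
print(np.all(a @ Y <= sigma + 1e-12), np.isclose(a @ zp, sigma))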


2.4. HALFSPACE, HYPERPLANE 832.4.2.6.2 Example. Minimization over hypercube.Consider minimization of a linear function over a hypercube, given vector cminimize c T xxsubject to −1 ≼ x ≼ 1(132)This convex optimization problem is called a linear program 2.23 because theobjective 2.24 of minimization c T x is a linear function of variable x and theconstraints describe a polyhedron (intersection of a finite number of halfspacesand hyperplanes).Any vector x satisfying the constraints is called a feasible solution.Applying graphical concepts from Figure 26, Figure 28, and Figure 29,x ⋆ = − sgn(c) is an optimal solution to this minimization problem but is notnecessarily unique. It generally holds for optimization problem solutions:optimal ⇒ feasible (133)Because an optimal solution always exists at a hypercube vertex (2.6.1.0.1)regardless of value of nonzero vector c in (132), [94] mathematicians see thisgeometry as a means to relax a discrete problem (whose desired solution isinteger or combinatorial, confer Example 4.2.3.1.1). [239,3.1] [240] 2.4.2.6.3 Exercise. Unbounded below.Suppose instead we minimize over the unit hypersphere in Example 2.4.2.6.2;‖x‖ ≤ 1. What is an expression for optimal solution now? Is that programstill linear?Now suppose minimization of absolute value in (132). Are the followingprograms equivalent for some arbitrary real convex set C ? (confer (517))minimize |x|x∈Rsubject to −1 ≤ x ≤ 1x ∈ C≡minimize α + βα , βsubject to 1 ≥ β ≥ 01 ≥ α ≥ 0α − β ∈ C(134)Many optimization problems of interest and some methods of solutionrequire nonnegative variables. The method illustrated below splits a variable2.23 The term program has its roots in economics. It was originally meant with regard toa plan or to efficient organization or systematization of some industrial process. [94,2]2.24 The objective is the function that is argument to minimization or maximization.


84 CHAPTER 2. CONVEX GEOMETRYinto parts; x = α − β (extensible to vectors). Under what conditions onvector a and scalar b is an optimal solution x ⋆ negative infinity?minimize α − βα∈R , β∈Rsubject to β ≥ 0α ≥ 0[ ] αa T = bβMinimization of the objective function entails maximization of β .(135)2.4.2.7 PRINCIPLE 3: Separating hyperplaneThe third most fundamental principle of convex geometry again follows fromthe geometric Hahn-Banach theorem [250,5.12] [18,1] [132,I.1.2] thatguarantees existence of a hyperplane separating two nonempty convex setsin R n whose relative interiors are nonintersecting. Separation intuitivelymeans each set belongs to a halfspace on an opposing side of the hyperplane.There are two cases of interest:1) If the two sets intersect only at their relative boundaries (2.1.7.2), thenthere exists a separating hyperplane ∂H containing the intersection butcontaining no points relatively interior to either set. If at least one ofthe two sets is open, conversely, then the existence of a separatinghyperplane implies the two sets are nonintersecting. [61,2.5.1]2) A strictly separating hyperplane ∂H intersects the closure of neither set;its existence is guaranteed when intersection of the closures is emptyand at least one set is bounded. [199,A.4.1]2.4.3 Angle between hyperspacesGiven halfspace-descriptions, dihedral angle between hyperplanes orhalfspaces is defined as the angle between their defining normals. Givennormals a and b respectively describing ∂H a and ∂H b , for example( ) 〈a , b〉(∂H a , ∂H b ) arccos radians (136)‖a‖ ‖b‖


2.5. SUBSPACE REPRESENTATIONS 852.5 Subspace representationsThere are two common forms of expression for Euclidean subspaces, bothcoming from elementary linear algebra: range form R and nullspace form N ;a.k.a, vertex-description and halfspace-description respectively.The fundamental vector subspaces associated with a matrix A∈ R m×n[331,3.1] are ordinarily related by orthogonal complementand of dimension:R(A T ) ⊥ N(A), N(A T ) ⊥ R(A) (137)R(A T ) ⊕ N(A) = R n , N(A T ) ⊕ R(A) = R m (138)dim R(A T ) = dim R(A) = rankA ≤ min{m,n} (139)with complementarity (a.k.a, conservation of dimension)dim N(A) = n − rankA , dim N(A T ) = m − rankA (140)These equations (137)-(140) comprise the fundamental theorem of linearalgebra. [331, p.95, p.138]From these four fundamental subspaces, the rowspace and range identifyone form of subspace description (range form or vertex-description (2.3.4))R(A T ) spanA T = {A T y | y ∈ R m } = {x∈ R n | A T y=x , y ∈R(A)} (141)R(A) spanA = {Ax | x∈ R n } = {y ∈ R m | Ax=y , x∈R(A T )} (142)while the nullspaces identify the second common form (nullspace form orhalfspace-description (113))N(A) {x∈ R n | Ax=0} = {x∈ R n | x ⊥ R(A T )} (143)N(A T ) {y ∈ R m | A T y=0} = {y ∈ R m | y ⊥ R(A)} (144)Range forms (141) (142) are realized as the respective span of the columnvectors in matrices A T and A , whereas nullspace form (143) or (144) is thesolution set to a linear equation similar to hyperplane definition (114). Yetbecause matrix A generally has multiple rows, halfspace-description N(A) isactually the intersection of as many hyperplanes through the origin; for (143),each row of A is normal to a hyperplane while each row of A T is a normalfor (144).
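The four fundamental subspaces and relations (137)-(140) can all be exhibited numerically from one singular value decomposition, which furnishes orthonormal bases for every one of them at once. A small Python/NumPy sketch (not from the text), using a randomly generated rank-2 matrix as a placeholder:

import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))   # rank-2 matrix, m=4, n=5

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10)                 # rank A
R_AT, N_A  = Vt[:r].T, Vt[r:].T       # bases for R(A^T) and N(A) in R^n
R_A,  N_AT = U[:, :r], U[:, r:]       # bases for R(A) and N(A^T) in R^m

print(r, N_A.shape[1] == 5 - r, N_AT.shape[1] == 4 - r)             # (139), (140)
print(np.allclose(R_AT.T @ N_A, 0), np.allclose(R_A.T @ N_AT, 0))   # (137): orthogonality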


86 CHAPTER 2. CONVEX GEOMETRY2.5.0.0.1 Exercise. Subspace algebra.GivenR(A) + N(A T ) = R(B) + N(B T ) = R m (145)proveR(A) ⊇ N(B T ) ⇔ N(A T ) ⊆ R(B) (146)R(A) ⊇ R(B) ⇔ N(A T ) ⊆ N(B T ) (147)e.g., Theorem A.3.1.0.6.2.5.1 Subspace or affine subset. ..Any particular vector subspace R p can be described as N(A) the nullspaceof some matrix A or as R(B) the range of some matrix B .More generally, we have the choice of expressing an n − m-dimensionalaffine subset in R n as the intersection of m hyperplanes, or as the offset spanof n − m vectors:2.5.1.1 ...as hyperplane intersectionAny affine subset A of dimension n−m can be described as an intersectionof m hyperplanes in R n ; given fat (m≤n) full-rank (rank = min{m , n})matrix⎡a T1 ⎤A ⎣. ⎦∈ R m×n (148)and vector b∈R m ,a T mA {x∈ R n | Ax=b} =m⋂ { }x | aTi x=b ii=1(149)a halfspace-description. (113)For example: The intersection of any two independent 2.25 hyperplanesin R 3 is a line, whereas three independent hyperplanes intersect at apoint. In R 4 , the intersection of two independent hyperplanes is a plane2.25 Hyperplanes are said to be independent iff the normals defining them are linearlyindependent.


(Example 2.5.1.2.1), whereas three hyperplanes intersect at a line, four at a point, and so on. A describes a subspace whenever b = 0 in (149).
For n > k

   A ∩ R^k = {x ∈ R^n | Ax = b} ∩ R^k = ⋂_{i=1}^m { x ∈ R^k | a_i(1:k)^T x = b_i }        (150)

The result in 2.4.2.2 is extensible; id est, any affine subset A also has a vertex-description:

2.5.1.2 ...as span of nullspace basis

Alternatively, we may compute a basis for nullspace of matrix A (E.3.1) and then equivalently express affine subset A as its span plus an offset: Define

   Z ≜ basis N(A) ∈ R^{n×n−m}        (151)

so AZ = 0. Then we have a vertex-description in Z ,

   A = {x ∈ R^n | Ax = b} = { Zξ + x_p | ξ ∈ R^{n−m} } ⊆ R^n        (152)

the offset span of n−m column vectors, where x_p is any particular solution to Ax = b . For example, A describes a subspace whenever x_p = 0.

2.5.1.2.1 Example. Intersecting planes in 4-space.
Two planes can intersect at a point in four-dimensional Euclidean vector space. It is easy to visualize intersection of two planes in three dimensions; a line can be formed. In four dimensions it is harder to visualize. So let's resort to the tools acquired.
Suppose an intersection of two hyperplanes in four dimensions is specified by a fat full-rank matrix A_1 ∈ R^{2×4} (m = 2, n = 4) as in (149):

   A_1 ≜ { x ∈ R^4 | [ a_11 a_12 a_13 a_14 ; a_21 a_22 a_23 a_24 ] x = b_1 }        (153)

The nullspace of A_1 is two dimensional (from Z in (152)), so A_1 represents a plane in four dimensions. Similarly define a second plane in terms of A_2 ∈ R^{2×4} :

   A_2 ≜ { x ∈ R^4 | [ a_31 a_32 a_33 a_34 ; a_41 a_42 a_43 a_44 ] x = b_2 }        (154)


If the two planes are independent (meaning any line in one is linearly independent of any line from the other), they will intersect at a point because [A_1 ; A_2] is then invertible;

   A_1 ∩ A_2 = { x ∈ R^4 | [A_1 ; A_2] x = [b_1 ; b_2] }        (155)

(A numerical sketch of this example appears below.)

2.5.1.2.2 Exercise. Linear program.
Minimize a hyperplane over affine set A in the nonnegative orthant

   minimize_x   c^T x
   subject to   Ax = b
                x ≽ 0        (156)

where A = {x | Ax = b}. Two cases of interest are drawn in Figure 30. Graphically illustrate and explain optimal solutions indicated in the caption. Why is α⋆ negative in both cases? Is there solution on the vertical axis? What causes objective unboundedness in the latter case (b)? Describe all vectors c that would yield finite optimal objective in (b).
Graphical solution to linear program

   maximize_x   c^T x
   subject to   x ∈ P        (157)

is illustrated in Figure 31. Bounded set P is an intersection of many halfspaces. Why is optimal solution x⋆ not aligned with vector c as in Cauchy-Schwarz inequality (2055)?

2.5.2 Intersection of subspaces

The intersection of nullspaces associated with two matrices A ∈ R^{m×n} and B ∈ R^{k×n} can be expressed most simply as

   N(A) ∩ N(B) = N( [A ; B] ) ≜ { x ∈ R^n | [A ; B] x = 0 }        (158)

the nullspace of their rowwise concatenation.
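Here is an illustrative Python/NumPy sketch of the offset-span description (152) and of Example 2.5.1.2.1. The matrices A1, A2 and vectors b1, b2 are invented for the demonstration, chosen so the stacked matrix is invertible; they are not from the text.

```python
import numpy as np
from scipy.linalg import null_space

# two "planes" (2-dimensional affine subsets) in R^4, as in (153)-(154);
# the entries below are arbitrary illustrative numbers
A1 = np.array([[1., 0., 1., 0.],
               [0., 1., 0., 1.]]);  b1 = np.array([1., 2.])
A2 = np.array([[1., 1., 0., 0.],
               [0., 0., 1., -1.]]); b2 = np.array([0., 3.])

# vertex-description (152):  A1 = { Z xi + xp | xi in R^2 }
Z  = null_space(A1)                          # basis for N(A1), shape 4 x 2
xp = np.linalg.lstsq(A1, b1, rcond=None)[0]  # a particular solution of A1 x = b1
xi = np.array([0.7, -1.3])                   # any coordinates
assert np.allclose(A1 @ (Z @ xi + xp), b1)   # the point stays in the affine subset

# intersection (155): the stacked system is square and invertible here
A = np.vstack([A1, A2]); b = np.hstack([b1, b2])
x = np.linalg.solve(A, b)                    # the unique point of A1 ∩ A2
print(x)                                     # [-2.  2.  3.  0.] for these data
```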


Figure 30: Minimizing hyperplane over affine set A in nonnegative orthant R^2_+ whose extreme directions (2.8.1) are the nonnegative Cartesian axes. (a) Optimal solution is • . (b) Optimal objective α⋆ = −∞. [In-figure labels, both panels: c , A = {x | Ax = b} , {x | c^T x = α}.]


Figure 31: Maximizing hyperplane ∂H , whose normal is vector c ∈ P , over polyhedral set P in R^2 is a linear program (157). Optimal solution x⋆ at • .

Suppose the columns of a matrix Z constitute a basis for N(A) while the columns of a matrix W constitute a basis for N(BZ). Then [159, 12.4.2]

   N(A) ∩ N(B) = R(ZW)        (159)

If each basis is orthonormal, then the columns of ZW constitute an orthonormal basis for the intersection.
In the particular circumstance A and B are each positive semidefinite [21, 6], or in the circumstance A and B are two linearly independent (l.i.) dyads (B.1.1), then

   N(A) ∩ N(B) = N(A + B) ,   { A, B ∈ S^M_+   or   A + B = u_1 v_1^T + u_2 v_2^T (l.i.) }        (160)

2.5.3 Visualization of matrix subspaces

Fundamental subspace relations, such as

   R(A^T) ⊥ N(A) ,   N(A^T) ⊥ R(A)        (137)


are partially defining. But to aid visualization of involved geometry, it sometimes helps to vectorize matrices. For any square matrix A , s ∈ N(A) , and w ∈ N(A^T)

   〈A , ss^T〉 = 0 ,   〈A , ww^T〉 = 0        (161)

because s^T A s = w^T A w = 0. This innocuous observation becomes a sharp instrument for visualization of diagonalizable matrices (A.5.1): for rank-ρ matrix A ∈ R^{M×M}

   A = SΛS^{−1} = [ s_1 ⋯ s_M ] Λ [ w_1^T ; ⋮ ; w_M^T ] = ∑_{i=1}^M λ_i s_i w_i^T        (1547)

where nullspace eigenvectors are real by theorem A.5.0.0.1 and where (B.1.1)

   R{s_i ∈ R^M | λ_i = 0} = R( ∑_{i=ρ+1}^M s_i s_i^T ) = N(A)
   R{w_i ∈ R^M | λ_i = 0} = R( ∑_{i=ρ+1}^M w_i w_i^T ) = N(A^T)        (162)

Define an unconventional basis among column vectors of each summation:

   basis N(A) ⊆ ∑_{i=ρ+1}^M s_i s_i^T ⊆ N(A)
   basis N(A^T) ⊆ ∑_{i=ρ+1}^M w_i w_i^T ⊆ N(A^T)        (163)

We shall regard a vectorized subspace as vectorization of any M×M matrix whose columns comprise an overcomplete basis for that subspace; e.g., (E.3.1)

   vec basis N(A) = vec ∑_{i=ρ+1}^M s_i s_i^T
   vec basis N(A^T) = vec ∑_{i=ρ+1}^M w_i w_i^T        (164)

By this reckoning, vec basis R(A) = vec A but is not unique. Now, because

   〈 A , ∑_{i=ρ+1}^M s_i s_i^T 〉 = 0 ,   〈 A , ∑_{i=ρ+1}^M w_i w_i^T 〉 = 0        (165)


92 CHAPTER 2. CONVEX GEOMETRYthen vectorized matrix A is normal to a hyperplane (of dimension M 2 −1)that contains both vectorized nullspaces (each of whose dimension is M −ρ);vec A ⊥ vec basis N(A), vec basis N(A T ) ⊥ vec A (166)These vectorized subspace orthogonality relations represent a departure(absent T ) from fundamental subspace relations (137) stated at the outset.2.6 Extreme, Exposed2.6.0.0.1 Definition. Extreme point.An extreme point x ε of a convex set C is a point, belonging to its closure C[42,3.3], that is not expressible as a convex combination of points in Cdistinct from x ε ; id est, for x ε ∈ C and all x 1 ,x 2 ∈ C \x εµx 1 + (1 − µ)x 2 ≠ x ε , µ ∈ [0, 1] (167)△In other words, x ε is an extreme point of C if and only if x ε is not apoint relatively interior to any line segment in C . [358,2.10]Borwein & Lewis offer: [55,4.1.6] An extreme point of a convex set C isa point x ε in C whose relative complement C \x ε is convex.The set consisting of a single point C ={x ε } is itself an extreme point.2.6.0.0.2 Theorem. Extreme existence. [307,18.5.3] [26,II.3.5]A nonempty closed convex set containing no lines has at least one extremepoint.⋄2.6.0.0.3 Definition. Face, edge. [199,A.2.3]Aface F of convex set C is a convex subset F ⊆ C such that everyclosed line segment x 1 x 2 in C , having a relatively interior point(x∈rel intx 1 x 2 ) in F , has both endpoints in F . The zero-dimensionalfaces of C constitute its extreme points. The empty set ∅ and C itselfare conventional faces of C . [307,18]All faces F are extreme sets by definition; id est, for F ⊆ C and allx 1 ,x 2 ∈ C \Fµx 1 + (1 − µ)x 2 /∈ F , µ ∈ [0, 1] (168)


2.6. EXTREME, EXPOSED 93A one-dimensional face of a convex set is called an edge.△Dimension of a face is the penultimate number of affinely independentpoints (2.4.2.3) belonging to it;dim F = sup dim{x 2 − x 1 , x 3 − x 1 , ... , x ρ − x 1 | x i ∈ F , i=1... ρ} (169)ρThe point of intersection in C with a strictly supporting hyperplaneidentifies an extreme point, but not vice versa. The nonempty intersection ofany supporting hyperplane with C identifies a face, in general, but not viceversa. To acquire a converse, the concept exposed face requires introduction:2.6.1 Exposure2.6.1.0.1 Definition. Exposed face, exposed point, vertex, facet.[199,A.2.3, A.2.4]Fis an exposed face of an n-dimensional convex set C iff there is asupporting hyperplane ∂H to C such thatF = C ∩ ∂H (170)Only faces of dimension −1 through n −1 can be exposed by ahyperplane.An exposed point, the definition of vertex, is equivalent to azero-dimensional exposed face; the point of intersection with a strictlysupporting hyperplane.Afacet is an (n −1)-dimensional exposed face of an n-dimensionalconvex set C ; facets exist in one-to-one correspondence with the(n −1)-dimensional faces. 2.26{exposed points} = {extreme points}{exposed faces} ⊆ {faces}△2.26 This coincidence occurs simply because the facet’s dimension is the same as thedimension of the supporting hyperplane exposing it.


Figure 32: Closed convex set in R^2 . Point A is exposed hence extreme; a classical vertex. Point B is extreme but not an exposed point. Point C is exposed and extreme; zero-dimensional exposure makes it a vertex. Point D is neither an exposed nor an extreme point although it belongs to a one-dimensional exposed face. [199, A.2.4] [330, 3.6] Closed face AB is exposed; a facet. The arc is not a conventional face, yet it is composed entirely of extreme points. Union of all rotations of this entire set about its vertical edge produces another convex set in three dimensions having no edges; but that convex set produced by rotation about horizontal edge containing D has edges.


2.6. EXTREME, EXPOSED 952.6.1.1 Density of exposed pointsFor any closed convex set C , its exposed points constitute a dense subset ofits extreme points; [307,18] [335] [330,3.6, p.115] dense in the sense [373]that closure of that subset yields the set of extreme points.For the convex set illustrated in Figure 32, point B cannot be exposedbecause it relatively bounds both the facet AB and the closed quarter circle,each bounding the set. Since B is not relatively interior to any line segmentin the set, then B is an extreme point by definition. Point B may be regardedas the limit of some sequence of exposed points beginning at vertex C .2.6.1.2 Face transitivity and algebraFaces of a convex set enjoy transitive relation. If F 1 is a face (an extreme set)of F 2 which in turn is a face of F 3 , then it is always true that F 1 is aface of F 3 . (The parallel statement for exposed faces is false. [307,18])For example, any extreme point of F 2 is an extreme point of F 3 ; inthis example, F 2 could be a face exposed by a hyperplane supportingpolyhedron F 3 . [223, def.115/6 p.358] Yet it is erroneous to presume thata face, of dimension 1 or more, consists entirely of extreme points. Nor is aface of dimension 2 or more entirely composed of edges, and so on.For the polyhedron in R 3 from Figure 20, for example, the nonemptyfaces exposed by a hyperplane are the vertices, edges, and facets; thereare no more. The zero-, one-, and two-dimensional faces are in one-to-onecorrespondence with the exposed faces in that example.2.6.1.3 Smallest faceDefine the smallest face F , that contains some element G , of a convex set C :F(C ∋G) (171)videlicet, C ⊃ rel int F(C ∋G) ∋ G . An affine set has no faces except itselfand the empty set. The smallest face, that contains G , of the intersectionof convex set C with an affine set A [239,2.4] [240]F((C ∩ A)∋G) = F(C ∋G) ∩ A (172)equals the intersection of A with the smallest face, that contains G , of set C .


96 CHAPTER 2. CONVEX GEOMETRY2.6.1.4 Conventional boundary(confer2.1.7.2) Relative boundaryis equivalent to:rel ∂ C = C \ rel int C (24)2.6.1.4.1 Definition. Conventional boundary of convex set. [199,C.3.1]The relative boundary ∂ C of a nonempty convex set C is the union of allexposed faces of C .△Equivalence to (24) comes about because it is conventionally presumedthat any supporting hyperplane, central to the definition of exposure, doesnot contain C . [307, p.100] Any face F of convex set C (that is not C itself)belongs to rel∂C . (2.8.2.1)2.7 ConesIn optimization, convex cones achieve prominence because they generalizesubspaces. Most compelling is the projection analogy: Projection on asubspace can be ascertained from projection on its orthogonal complement(Figure 165), whereas projection on a closed convex cone can be determinedfrom projection instead on its algebraic complement (2.13,E.9.2.1); calledthe polar cone.2.7.0.0.1 Definition. Ray.The one-dimensional set{ζΓ + B | ζ ≥ 0, Γ ≠ 0} ⊂ R n (173)defines a halfline called a ray in nonzero direction Γ∈ R n having baseB ∈ R n . When B=0, a ray is the conic hull of direction Γ ; hence aconvex cone.△Relative boundary of a single ray, base 0 in any dimension, is the originbecause that is the union of all exposed faces not containing the entire set.Its relative interior is the ray itself excluding the origin.


Figure 33: (a) Two-dimensional nonconvex cone drawn truncated. Boundary of this cone is itself a cone. Each polar half is itself a convex cone. (b) This convex cone (drawn truncated) is a line through the origin in any dimension. It has no relative boundary, while its relative interior comprises entire line.

Figure 34: This nonconvex cone in R^2 is a pair of lines through the origin. [250, 2.4] Because the lines are linearly independent, they are algebraic complements whose vector sum is R^2 a convex cone.


Figure 35: Boundary of a convex cone in R^2 is a nonconvex cone; a pair of rays emanating from the origin.

Figure 36: Union of two pointed closed convex cones is nonconvex cone X .


Figure 37: Truncated nonconvex cone X = {x ∈ R^2 | x_1 ≥ x_2 , x_1 x_2 ≥ 0}. Boundary is also a cone. [250, 2.4] Cartesian axes drawn for reference. Each half (about the origin) is itself a convex cone.

Figure 38: Nonconvex cone X drawn truncated in R^2 . Boundary is also a cone. [250, 2.4] Cone exterior is convex cone.


100 CHAPTER 2. CONVEX GEOMETRY2.7.1 Cone definedA set X is called, simply, cone if and only ifΓ ∈ X ⇒ ζΓ ∈ X for all ζ ≥ 0 (174)where X denotes closure of cone X . An example of such a cone is the unionof two opposing quadrants; e.g., X ={x∈ R 2 | x 1 x 2 ≥0} which is not convex.[371,2.5] Similar examples are shown in Figure 33 and Figure 37.All cones can be defined by an aggregate of rays emanating exclusivelyfrom the origin (but not all cones are convex). Hence all closed cones containthe origin 0 and are unbounded, excepting the simplest cone {0}. Theempty set ∅ is not a cone, but its conic hull is;cone ∅ = {0} (104)2.7.2 <strong>Convex</strong> coneWe call set K a convex cone iffΓ 1 , Γ 2 ∈ K ⇒ ζΓ 1 + ξΓ 2 ∈ K for all ζ,ξ ≥ 0 (175)id est, if and only if any conic combination of elements from K belongs to itsclosure. Apparent from this definition, ζΓ 1 ∈ K and ξΓ 2 ∈ K ∀ζ,ξ ≥ 0 ;meaning, K is a cone. Set K is convex since, for any particular ζ,ξ ≥ 0µζΓ 1 + (1 − µ)ξΓ 2 ∈ K ∀µ ∈ [0, 1] (176)because µζ,(1 − µ)ξ ≥ 0. Obviously,{X } ⊃ {K} (177)the set of all convex cones is a proper subset of all cones. The set ofconvex cones is a narrower but more familiar class of cone, any memberof which can be equivalently described as the intersection of a possibly(but not necessarily) infinite number of hyperplanes (through the origin)and halfspaces whose bounding hyperplanes pass through the origin; ahalfspace-description (2.4). <strong>Convex</strong> cones need not be full-dimensional.


Figure 39: Not a cone; ironically, the three-dimensional flared horn (with or without its interior) resembling mathematical symbol ≻ denoting strict cone membership and partial order.

More familiar convex cones are Lorentz cone (confer Figure 46)^{2.27}

   K_l = { [x ; t] ∈ R^n × R | ‖x‖_l ≤ t } ,   l = 2        (178)

and polyhedral cone (2.12.1.0.1); e.g., any orthant generated by Cartesian half-axes (2.1.3). Esoteric examples of convex cones include the point at the origin, any line through the origin, any ray having the origin as base such as the nonnegative real line R_+ in subspace R , any halfspace partially bounded by a hyperplane through the origin, the positive semidefinite cone S^M_+ (191), the cone of Euclidean distance matrices EDM^N (893) (6), completely positive semidefinite matrices {CC^T | C ≥ 0} [40, p.71], any subspace, and Euclidean vector space R^n .

2.27 a.k.a.: second-order cone, quadratic cone, circular cone (2.9.2.8.1), unbounded ice-cream cone united with its interior.
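A minimal membership test for the Lorentz cone (178) with l = 2, sketched in Python/NumPy (the test points below are arbitrary, chosen only to exercise both outcomes):

```python
import numpy as np

def in_lorentz_cone(x, t, tol=1e-12):
    """Membership test for the Lorentz (second-order) cone (178), l = 2."""
    return np.linalg.norm(x, 2) <= t + tol

print(in_lorentz_cone(np.array([3.0, 4.0]), 5.0))   # True:  ||x|| = 5 <= t
print(in_lorentz_cone(np.array([3.0, 4.0]), 4.0))   # False: ||x|| = 5 >  t
```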


102 CHAPTER 2. CONVEX GEOMETRY2.7.2.1 cone invarianceMore Euclidean bodies are cones, it seems, than are not. 2.28 This class ofconvex body, the convex cone, is invariant to scaling, linear and single- ormany-valued inverse linear transformation, vector summation, and Cartesianproduct, but is not invariant to translation. [307, p.22]2.7.2.1.1 Theorem. Cone intersection (nonempty).Intersection of an arbitrary collection of convex cones is a convex cone.[307,2,19]Intersection of an arbitrary collection of closed convex cones is a closedconvex cone. [258,2.3]Intersection of a finite number of polyhedral cones (Figure 50 p.143,2.12.1.0.1) remains a polyhedral cone.⋄The property pointedness is associated with a convex cone; but,pointed cone convex cone (Figure 35, Figure 36)2.7.2.1.2 Definition. Pointed convex cone. (confer2.12.2.2)A convex cone K is pointed iff it contains no line. Equivalently, K is notpointed iff there exists any nonzero direction Γ ∈ K such that −Γ ∈ K . Ifthe origin is an extreme point of K or, equivalently, ifK ∩ −K = {0} (179)then K is pointed, and vice versa. [330,2.10] A convex cone is pointed iffthe origin is the smallest nonempty face of its closure.△Then a pointed closed convex cone, by principle of separating hyperplane(2.4.2.7), has a strictly supporting hyperplane at the origin. The simplestand only bounded [371, p.75] convex cone K = {0} ⊆ R n is pointed, byconvention, but not full-dimensional. Its relative boundary is the empty set∅ (25) while its relative interior is the point 0 itself (12). The pointed convex2.28 confer Figures: 24 33 34 35 38 37 39 41 43 50 54 57 59 60 62 63 64 65 66 138151 175


2.7. CONES 103cone that is a halfline, emanating from the origin in R n , has relative boundary0 while its relative interior is the halfline itself excluding 0. Pointed areany Lorentz cone, cone of Euclidean distance matrices EDM N in symmetrichollow subspace S N h , and positive semidefinite cone S M + in ambient S M .2.7.2.1.3 Theorem. Pointed cones. [55,3.3.15, exer.20]A closed convex cone K ⊂ R n is pointed if and only if there exists a normal αsuch that the setC {x ∈ K | 〈x,α〉 = 1} (180)is closed, bounded, and K = cone C . Equivalently, K is pointed if and onlyif there exists a vector β normal to a hyperplane strictly supporting K ;id est, for some positive scalar ǫ〈x, β〉 ≥ ǫ‖x‖ ∀x∈ K (181)If closed convex cone K is not pointed, then it has no extreme point. 2.29Yet a pointed closed convex cone has only one extreme point [42,3.3]: theexposed point residing at the origin; its vertex. Pointedness is invariantto Cartesian product by (179). And from the cone intersection theorem itfollows that an intersection of convex cones is pointed if at least one of thecones is; implying, each and every nonempty exposed face of a pointed closedconvex cone is a pointed closed convex cone.⋄2.7.2.2 Pointed closed convex cone induces partial orderRelation ≼ represents partial order on some set if that relation possesses 2.30reflexivity (x≼x)antisymmetry (x≼z , z ≼x ⇒ x=z)transitivity (x≼y , y ≼z ⇒ x≼z),(x≼y , y ≺z ⇒ x≺z)2.29 nor does it have extreme directions (2.8.1).2.30 A set is totally ordered if it further obeys a comparability property of the relation: foreach and every x and y from the set, x ≼ y or y ≼ x; e.g., one-dimensional real vectorspace R is the smallest unbounded totally ordered and connected set.


Figure 40: (confer Figure 69) (a) Point x is the minimum element of set C_1 with respect to cone K because cone translated to x ∈ C_1 contains entire set. (Cones drawn truncated.) (b) Point y is a minimal element of set C_2 with respect to cone K because negative cone translated to y ∈ C_2 contains only y . These concepts, minimum/minimal, become equivalent under a total order.


2.7. CONES 105A pointed closed convex cone K induces partial order on R n or R m×n , [21,1][324, p.7] essentially defined by vector or matrix inequality;x ≼Kz ⇔ z − x ∈ K (182)x ≺Kz ⇔ z − x ∈ rel int K (183)Neither x or z is necessarily a member of K for these relations to hold.Only when K is a nonnegative orthant R n + do these inequalities reduceto ordinary entrywise comparison (2.13.4.2.3) while partial order lingers.Inclusive of that special case, we ascribe nomenclature generalized inequalityto comparison with respect to a pointed closed convex cone.We say two points x and y are comparable when x ≼ y or y ≼ xwith respect to pointed closed convex cone K . Visceral mechanics ofactually comparing points, when cone K is not an orthant, are wellillustrated in the example of Figure 63 which relies on the equivalentmembership-interpretation in definition (182) or (183).Comparable points and the minimum element of some vector- ormatrix-valued partially ordered set are thus well defined, so nonincreasingsequences with respect to cone K can therefore converge in this sense: Pointx ∈ C is the (unique) minimum element of set C with respect to cone K ifffor each and every z ∈ C we have x ≼ z ; equivalently, iff C ⊆ x + K . 2.31A closely related concept, minimal element, is useful for partially orderedsets having no minimum element: Point x ∈ C is a minimal elementof set C with respect to pointed closed convex cone K if and only if(x − K) ∩ C = x . (Figure 40) No uniqueness is implied here, althoughimplicit is the assumption: dim K ≥ dim aff C . In words, a point that isa minimal element is smaller (with respect to K) than any other point in theset to which it is comparable.Further properties of partial order with respect to pointed closed convexcone K are not defining:homogeneity (x≼y , λ≥0 ⇒ λx≼λz),(x≺y , λ>0 ⇒ λx≺λz)additivity (x≼z , u≼v ⇒ x+u ≼ z+v),(x≺z , u≼v ⇒ x+u ≺ z+v)2.31 Borwein & Lewis [55,3.3 exer.21] ignore possibility of equality to x + K in thiscondition, and require a second condition: . . . and C ⊂ y + K for some y in R n impliesx ∈ y + K .
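For a concrete feel for minimum versus minimal elements (Figure 40), here is an illustrative Python/NumPy sketch using the nonnegative orthant K = R^2_+ as the ordering cone, for which comparison (182) reduces to entrywise inequality; the point set C is invented for the example.

```python
import numpy as np

# a finite set C of points in R^2, partially ordered by K = R^2_+
C = np.array([[1., 2.],
              [2., 1.],
              [1., 1.],
              [3., 3.]])

def is_minimum(x, C):
    """x is the minimum element of C w.r.t. K = R^n_+  iff  C ⊆ x + K."""
    return bool(np.all(C >= x - 1e-12))

def is_minimal(x, C):
    """x ∈ C is a minimal element iff (x - K) ∩ C = {x}: no other y in C has y ≼ x."""
    below = np.all(C <= x + 1e-12, axis=1)   # which points satisfy y ≼ x entrywise
    return int(np.sum(below)) == 1           # only x itself

print([is_minimum(x, C) for x in C])   # [False, False, True, False]
print([is_minimal(x, C) for x in C])   # [False, False, True, False]
```

Replacing the third point by, say, (0, 3) would leave C with no minimum element while (0, 3) and (2, 1) and (1, 2) become minimal elements, mirroring panel (b) of Figure 40.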


2.7.2.2.1 Definition. Proper cone: a cone that is
  pointed
  closed
  convex
  full-dimensional.        △

A proper cone remains proper under injective linear transformation. [90, 5.1] Examples of proper cones are the positive semidefinite cone S^M_+ in the ambient space of symmetric matrices (2.9), the nonnegative real line R_+ in vector space R , or any orthant in R^n , and the set of all coefficients of univariate degree-n polynomials nonnegative on interval [0, 1] [61, exmp.2.16] or univariate degree-2n polynomials nonnegative over R [61, exer.2.37].

2.8 Cone boundary

Every hyperplane supporting a convex cone contains the origin. [199, A.4.2] Because any supporting hyperplane to a convex cone must therefore itself be a cone, then from the cone intersection theorem (2.7.2.1.1) it follows:

2.8.0.0.1 Lemma. Cone faces. [26, II.8]
Each nonempty exposed face of a convex cone is a convex cone.        ⋄

2.8.0.0.2 Theorem. Proper-cone boundary.
Suppose a nonzero point Γ lies on the boundary ∂K of proper cone K in R^n . Then it follows that the ray {ζΓ | ζ ≥ 0} also belongs to ∂K .        ⋄

Proof. By virtue of its propriety, a proper cone guarantees existence of a strictly supporting hyperplane at the origin. [307, cor.11.7.3]^{2.32} Hence the origin belongs to the boundary of K because it is the zero-dimensional exposed face. The origin belongs to the ray through Γ , and the ray belongs to K by definition (174). By the cone faces lemma, each and every nonempty exposed face must include the origin. Hence the closed line segment 0Γ must lie in an exposed face of K because both endpoints do by Definition 2.6.1.4.1.

2.32 Rockafellar's corollary yields a supporting hyperplane at the origin to any convex cone in R^n not equal to R^n .


2.8. CONE BOUNDARY 107That means there exists a supporting hyperplane ∂H to K containing 0Γ .So the ray through Γ belongs both to K and to ∂H . ∂H must thereforeexpose a face of K that contains the ray; id est,{ζΓ | ζ ≥ 0} ⊆ K ∩ ∂H ⊂ ∂K (184)Proper cone {0} in R 0 has no boundary (24) because (12)rel int{0} = {0} (185)The boundary of any proper cone in R is the origin.The boundary of any convex cone whose dimension exceeds 1 can beconstructed entirely from an aggregate of rays emanating exclusively fromthe origin.2.8.1 Extreme directionThe property extreme direction arises naturally in connection with thepointed closed convex cone K ⊂ R n , being analogous to extreme point.[307,18, p.162] 2.33 An extreme direction Γ ε of pointed K is a vectorcorresponding to an edge that is a ray emanating from the origin. 2.34Nonzero direction Γ ε in pointed K is extreme if and only ifζ 1 Γ 1 +ζ 2 Γ 2 ≠ Γ ε ∀ ζ 1 , ζ 2 ≥ 0, ∀ Γ 1 , Γ 2 ∈ K\{ζΓ ε ∈ K | ζ ≥0} (186)In words, an extreme direction in a pointed closed convex cone is thedirection of a ray, called an extreme ray, that cannot be expressed as a coniccombination of directions of any rays in the cone distinct from it.An extreme ray is a one-dimensional face of K . By (105), extremedirection Γ ε is not a point relatively interior to any line segment inK\{ζΓ ε ∈ K | ζ ≥0}. Thus, by analogy, the corresponding extreme ray{ζΓ ε ∈ K | ζ ≥0} is not a ray relatively interior to any plane segment 2.35in K .2.33 We diverge from Rockafellar’s extreme direction: “extreme point at infinity”.2.34 An edge (2.6.0.0.3) of a convex cone is not necessarily a ray. A convex cone maycontain an edge that is a line; e.g., a wedge-shaped polyhedral cone (K ∗ in Figure 41).2.35 A planar fragment; in this context, a planar cone.


Figure 41: K is a pointed polyhedral cone not full-dimensional in R^3 (drawn truncated in a plane parallel to the floor upon which you stand). Dual cone K∗ is a wedge whose truncated boundary ∂K∗ is illustrated (drawn perpendicular to the floor). In this particular instance, K ⊂ int K∗ (excepting the origin). Cartesian coordinate axes drawn for reference.


2.8. CONE BOUNDARY 1092.8.1.1 extreme distinction, uniquenessAn extreme direction is unique, but its vector representation Γ ε is notbecause any positive scaling of it produces another vector in the same(extreme) direction. Hence an extreme direction is unique to within a positivescaling. When we say extreme directions are distinct, we are referring todistinctness of rays containing them. Nonzero vectors of various length inthe same extreme direction are therefore interpreted to be identical extremedirections. 2.36The extreme directions of the polyhedral cone in Figure 24 (p.71), forexample, correspond to its three edges. For any pointed polyhedral cone,there is a one-to-one correspondence of one-dimensional faces with extremedirections.The extreme directions of the positive semidefinite cone (2.9) comprisethe infinite set of all symmetric rank-one matrices. [21,6] [195,III] Itis sometimes prudent to instead consider the less infinite but completenormalized set, for M >0 (confer (235)){zz T ∈ S M | ‖z‖= 1} (187)The positive semidefinite cone in one dimension M =1, S + the nonnegativereal line, has one extreme direction belonging to its relative interior; anidiosyncrasy of dimension 1.Pointed closed convex cone K = {0} has no extreme direction becauseextreme directions are nonzero by definition.If closed convex cone K is not pointed, then it has no extreme directionsand no vertex. [21,1]Conversely, pointed closed convex cone K is equivalent to the convex hullof its vertex and all its extreme directions. [307,18, p.167] That is thepractical utility of extreme direction; to facilitate construction of polyhedralsets, apparent from the extremes theorem:2.36 Like vectors, an extreme direction can be identified with the Cartesian point at thevector’s head with respect to the origin.


110 CHAPTER 2. CONVEX GEOMETRY2.8.1.1.1 Theorem. (Klee) Extremes. [330,3.6] [307,18, p.166](confer2.3.2,2.12.2.0.1) Any closed convex set containing no lines can beexpressed as the convex hull of its extreme points and extreme rays. ⋄It follows that any element of a convex set containing no lines maybe expressed as a linear combination of its extreme elements; e.g.,Example 2.9.2.7.1.2.8.1.2 GeneratorsIn the narrowest sense, generators for a convex set comprise any collectionof points and directions whose convex hull constructs the set.When the extremes theorem applies, the extreme points and directionsare called generators of a convex set. An arbitrary collection of generatorsfor a convex set includes its extreme elements as a subset; the set of extremeelements of a convex set is a minimal set of generators for that convex set.Any polyhedral set has a minimal set of generators whose cardinality is finite.When the convex set under scrutiny is a closed convex cone, coniccombination of generators during construction is implicit as shown inExample 2.8.1.2.1 and Example 2.10.2.0.1. So, a vertex at the origin (if itexists) becomes benign.We can, of course, generate affine sets by taking the affine hull of anycollection of points and directions. We broaden, thereby, the meaning ofgenerator to be inclusive of all kinds of hulls.Any hull of generators is loosely called a vertex-description. (2.3.4)Hulls encompass subspaces, so any basis constitutes generators for avertex-description; span basis R(A).2.8.1.2.1 Example. Application of extremes theorem.Given an extreme point at the origin and N extreme rays, denoting the i thextreme direction by Γ i ∈ R n , then their convex hull is (86)P = { [0 Γ 1 Γ 2 · · · Γ N ] aζ | a T 1 = 1, a ≽ 0, ζ ≥ 0 }{= [Γ1 Γ 2 · · · Γ N ]aζ | a T 1 ≤ 1, a ≽ 0, ζ ≥ 0 }{= [Γ1 Γ 2 · · · Γ N ]b | b ≽ 0 } (188)⊂ R na closed convex set that is simply a conic hull like (103).


2.8. CONE BOUNDARY 1112.8.2 Exposed direction2.8.2.0.1 Definition. Exposed point & direction of pointed convex cone.[307,18] (confer2.6.1.0.1)When a convex cone has a vertex, an exposed point, it resides at theorigin; there can be only one.In the closure of a pointed convex cone, an exposed direction is thedirection of a one-dimensional exposed face that is a ray emanatingfrom the origin.{exposed directions} ⊆ {extreme directions}△For a proper cone in vector space R n with n ≥ 2, we can say more:{exposed directions} = {extreme directions} (189)It follows from Lemma 2.8.0.0.1 for any pointed closed convex cone, thereis one-to-one correspondence of one-dimensional exposed faces with exposeddirections; id est, there is no one-dimensional exposed face that is not a raybase 0.The pointed closed convex cone EDM 2 , for example, is a ray inisomorphic subspace R whose relative boundary (2.6.1.4.1) is the origin.The conventionally exposed directions of EDM 2 constitute the empty set∅ ⊂ {extreme direction}. This cone has one extreme direction belonging toits relative interior; an idiosyncrasy of dimension 1.2.8.2.1 Connection between boundary and extremes2.8.2.1.1 Theorem. Exposed. [307,18.7] (confer2.8.1.1.1)Any closed convex set C containing no lines (and whose dimension is atleast 2) can be expressed as closure of the convex hull of its exposed pointsand exposed rays.⋄From Theorem 2.8.1.1.1,rel ∂ C = C \ rel int C (24)= conv{exposed points and exposed rays} \ rel int C= conv{extreme points and extreme rays} \ rel int C⎫⎪⎬⎪⎭(190)


Figure 42: Properties of extreme points carry over to extreme directions. [307, 18] Four rays (drawn truncated) on boundary of conic hull of two-dimensional closed convex set from Figure 32 lifted to R^3 . Ray through point A is exposed hence extreme. Extreme direction B on cone boundary is not an exposed direction, although it belongs to the exposed face cone{A, B}. Extreme ray through C is exposed. Point D is neither an exposed nor an extreme direction although it belongs to a two-dimensional exposed face of the conic hull.


2.9. POSITIVE SEMIDEFINITE (PSD) CONE 113Thus each and every extreme point of a convex set (that is not a point)resides on its relative boundary, while each and every extreme direction of aconvex set (that is not a halfline and contains no line) resides on its relativeboundary because extreme points and directions of such respective sets donot belong to relative interior by definition.The relationship between extreme sets and the relative boundary actuallygoes deeper: Any face F of convex set C (that is not C itself) belongs torel ∂ C , so dim F < dim C . [307,18.1.3]2.8.2.2 Converse caveatIt is inconsequent to presume that each and every extreme point and directionis necessarily exposed, as might be erroneously inferred from the conventionalboundary definition (2.6.1.4.1); although it can correctly be inferred: eachand every extreme point and direction belongs to some exposed face.Arbitrary points residing on the relative boundary of a convex set are notnecessarily exposed or extreme points. Similarly, the direction of an arbitraryray, base 0, on the boundary of a convex cone is not necessarily an exposedor extreme direction. For the polyhedral cone illustrated in Figure 24, forexample, there are three two-dimensional exposed faces constituting theentire boundary, each composed of an infinity of rays. Yet there are onlythree exposed directions.Neither is an extreme direction on the boundary of a pointed convex conenecessarily an exposed direction. Lift the two-dimensional set in Figure 32,for example, into three dimensions such that no two points in the set arecollinear with the origin. Then its conic hull can have an extreme directionB on the boundary that is not an exposed direction, illustrated in Figure 42.2.9 Positive semidefinite (PSD) coneThe cone of positive semidefinite matrices studied in this sectionis arguably the most important of all non-polyhedral cones whosefacial structure we completely understand.−Alexander Barvinok [26, p.78]


114 CHAPTER 2. CONVEX GEOMETRY2.9.0.0.1 Definition. Positive semidefinite cone.The set of all symmetric positive semidefinite matrices of particulardimension M is called the positive semidefinite cone:S M + { A ∈ S M | A ≽ 0 }= { A ∈ S M | y T Ay ≥0 ∀ ‖y‖ = 1 }= ⋂ {A ∈ S M | 〈yy T , A〉 ≥ 0 }(191)‖y‖=1= {A ∈ S M + | rankA ≤ M }formed by the intersection of an infinite number of halfspaces (2.4.1.1) invectorized variable 2.37 A , each halfspace having partial boundary containingthe origin in isomorphic R M(M+1)/2 . It is a unique immutable proper conein the ambient space of symmetric matrices S M .The positive definite (full-rank) matrices comprise the cone interiorint S M + = { A ∈ S M | A ≻ 0 }= { A ∈ S M | y T Ay>0 ∀ ‖y‖ = 1 }= ⋂ {A ∈ S M | 〈yy T , A〉 > 0 }‖y‖=1= {A ∈ S M + | rankA = M }(192)while all singular positive semidefinite matrices (having at least one0 eigenvalue) reside on the cone boundary (Figure 43); (A.7.5)∂S M + = { A ∈ S M | A ≽ 0, A ⊁ 0 }= { A ∈ S M | min{λ(A) i , i=1... M } = 0 }= { A ∈ S M + | 〈yy T , A〉=0 for some ‖y‖ = 1 }= {A ∈ S M + | rankA < M }(193)where λ(A)∈ R M holds the eigenvalues of A .The only symmetric positive semidefinite matrix in S M +0-eigenvalues resides at the origin. (A.7.3.0.1)△having M2.37 infinite in number when M >1. Because y T A y=y T A T y , matrix A is almost alwaysassumed symmetric. (A.2.1)


[Axis labels in Figure 43: α , √2β , γ ; boundary labeled svec ∂S^2_+ for A = [α β ; β γ].] Minimal set of generators are the extreme directions: svec{yy^T | y ∈ R^M }.

Figure 43: (d'Aspremont) Truncated boundary of PSD cone in S^2 plotted in isometrically isomorphic R^3 via svec (56); 0-contour of smallest eigenvalue (193). Lightest shading is closest, darkest shading is farthest and inside shell. Entire boundary can be constructed from an aggregate of rays (2.7.0.0.1) emanating exclusively from origin: { κ² [ z_1²  √2 z_1 z_2  z_2² ]^T | κ ∈ R , z ∈ R^2 }. A circular cone in this dimension (2.9.2.8), each and every ray on boundary corresponds to an extreme direction but such is not the case in any higher dimension (confer Figure 24). PSD cone geometry is not as simple in higher dimensions [26, II.12] although PSD cone is selfdual (377) in ambient real space of symmetric matrices. [195, II] PSD cone has no two-dimensional face in any dimension, its only extreme point residing at 0.


116 CHAPTER 2. CONVEX GEOMETRY2.9.0.1 MembershipObserve notation A ≽ 0 denoting a positive semidefinite matrix; 2.38 meaning(confer2.3.1.1), matrix A belongs to the positive semidefinite cone in thesubspace of symmetric matrices whereas A ≻ 0 denotes membership to thatcone’s interior. (2.13.2) Notation A ≻ 0 , denoting a positive definite matrix,can be read: symmetric matrix A is greater than the origin with respect to thepositive semidefinite cone. These notations further imply that coordinates[sic] for orthogonal expansion of a positive (semi)definite matrix must be its(nonnegative) positive eigenvalues (2.13.7.1.1,E.6.4.1.1) when expandedin its eigenmatrices (A.5.0.3); id est, eigenvalues must be (nonnegative)positive.Generalizing comparison on the real line, the notation A ≽B denotescomparison with respect to the positive semidefinite cone; (A.3.1) id est,A ≽B ⇔ A −B ∈ S M + but neither matrix A or B necessarily belongs tothe positive semidefinite cone. Yet, (1486) A ≽B , B ≽0 ⇒ A≽0 ; id est,A∈ S M + . (confer Figure 63)2.9.0.1.1 Example. Equality constraints in semidefinite program (649).Employing properties of partial order (2.7.2.2) for the pointed closed convexpositive semidefinite cone, it is easy to show, given A + S = CS ≽ 0 ⇔ A ≼ CS ≻ 0 ⇔ A ≺ C(194)2.9.1 Positive semidefinite cone is convexThe set of all positive semidefinite matrices forms a convex cone in theambient space of symmetric matrices because any pair satisfies definition(175); [202,7.1] videlicet, for all ζ 1 , ζ 2 ≥ 0 and each and every A 1 , A 2 ∈ S Mζ 1 A 1 + ζ 2 A 2 ≽ 0 ⇐ A 1 ≽ 0, A 2 ≽ 0 (195)a fact easily verified by the definitive test for positive semidefiniteness of asymmetric matrix (A):A ≽ 0 ⇔ x T Ax ≥ 0 for each and every ‖x‖ = 1 (196)2.38 the same as nonnegative definite matrix.
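A quick numerical check of convexity property (195), sketched in Python/NumPy. Membership to S^M_+ is tested here by eigenvalue nonnegativity (equivalent to the quadratic-form test (196) for symmetric matrices), and the two PSD matrices are arbitrary randomly generated Gram matrices, not examples from the text.

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Membership to S^M_+ via eigenvalue nonnegativity (cf. (196), A.3.1)."""
    A = (A + A.T) / 2                      # symmetrize against roundoff
    return bool(np.linalg.eigvalsh(A).min() >= -tol)

rng = np.random.default_rng(0)
Y1, Y2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
A1, A2 = Y1 @ Y1.T, Y2 @ Y2.T              # two PSD matrices in Gram form

zeta1, zeta2 = 0.3, 2.5                    # arbitrary nonnegative weights
print(is_psd(A1), is_psd(A2), is_psd(zeta1*A1 + zeta2*A2))   # True True True
```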


Figure 44: Convex set C = {X ∈ S × x ∈ R | X ≽ xx^T } drawn truncated.

id est, for A_1 , A_2 ≽ 0 and each and every ζ_1 , ζ_2 ≥ 0

   ζ_1 x^T A_1 x + ζ_2 x^T A_2 x ≥ 0   for each and every normalized x ∈ R^M        (197)

The convex cone S^M_+ is more easily visualized in the isomorphic vector space R^{M(M+1)/2} whose dimension is the number of free variables in a symmetric M×M matrix. When M = 2 the PSD cone is semiinfinite in expanse in R^3 , having boundary illustrated in Figure 43. When M = 3 the PSD cone is six-dimensional, and so on.

2.9.1.0.1 Example. Sets from maps of positive semidefinite cone.
The set

   C = {X ∈ S^n × x ∈ R^n | X ≽ xx^T }        (198)

is convex because it has Schur-form; (A.4)

   X − xx^T ≽ 0 ⇔ f(X , x) ≜ [ X  x ; x^T  1 ] ≽ 0        (199)

e.g., Figure 44. Set C is the inverse image (2.1.9.0.1) of S^{n+1}_+ under affine mapping f . The set {X ∈ S^n × x ∈ R^n | X ≼ xx^T } is not convex, in contrast, having no Schur-form. Yet for fixed x = x_p , the set

   {X ∈ S^n | X ≼ x_p x_p^T }        (200)

is simply the negative semidefinite cone shifted to x_p x_p^T .
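A sketch of the Schur-form membership test (199) in Python/NumPy. The particular X and x below are invented for illustration; positive semidefiniteness of the block matrix is checked by its smallest eigenvalue.

```python
import numpy as np

def in_C(X, x, tol=1e-10):
    """Test X ≽ xx^T via Schur-form (199): [[X, x], [x^T, 1]] ≽ 0."""
    n = len(x)
    F = np.zeros((n + 1, n + 1))
    F[:n, :n] = X
    F[:n, n] = x
    F[n, :n] = x
    F[n, n] = 1.0
    return bool(np.linalg.eigvalsh((F + F.T) / 2).min() >= -tol)

x = np.array([1.0, -2.0])
print(in_C(np.outer(x, x) + 0.1*np.eye(2), x))   # True:  X - xx^T = 0.1 I ≽ 0
print(in_C(np.outer(x, x) - 0.1*np.eye(2), x))   # False: X - xx^T = -0.1 I
```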


118 CHAPTER 2. CONVEX GEOMETRY2.9.1.0.2 Example. Inverse image of positive semidefinite cone.Now consider finding the set of all matrices X ∈ S N satisfyinggiven A,B∈ S N . Define the setAX + B ≽ 0 (201)X {X | AX + B ≽ 0} ⊆ S N (202)which is the inverse image of the positive semidefinite cone under affinetransformation g(X)AX+B . Set X must therefore be convex byTheorem 2.1.9.0.1.Yet we would like a less amorphous characterization of this set, so insteadwe consider its vectorization (37) which is easier to visualize:wherevec g(X) = vec(AX) + vec B = (I ⊗A) vec X + vec B (203)I ⊗A QΛQ T ∈ S N2 (204)is block-diagonal formed by Kronecker product (A.1.1 no.31,D.1.2.1).Assignx vec X ∈ R N2(205)N2b vec B ∈ Rthen make the equivalent problem: Findwherevec X = {x∈ R N2 | (I ⊗A)x + b ∈ K} (206)K vec S N + (207)is a proper cone isometrically isomorphic with the positive semidefinite conein the subspace of symmetric matrices; the vectorization of every element ofS N + . Utilizing the diagonalization (204),vec X = {x | ΛQ T x ∈ Q T (K − b)}= {x | ΦQ T x ∈ Λ † Q T N2(208)(K − b)} ⊆ Rwhere †denotes matrix pseudoinverse (E) andΦ Λ † Λ (209)


2.9. POSITIVE SEMIDEFINITE (PSD) CONE 119is a diagonal projection matrix whose entries are either 1 or 0 (E.3). Wehave the complementary sumΦQ T x + (I − Φ)Q T x = Q T x (210)So, adding (I −Φ)Q T x to both sides of the membership within (208) admitsvec X = {x∈ R N2 | Q T x ∈ Λ † Q T (K − b) + (I − Φ)Q T x}= {x | Q T x ∈ Φ ( Λ † Q T (K − b) ) ⊕ (I − Φ)R N2 }= {x ∈ QΛ † Q T (K − b) ⊕ Q(I − Φ)R N2 }= (I ⊗A) † (K − b) ⊕ N(I ⊗A)(211)where we used the facts: linear function Q T x in x on R N2 is a bijection,and ΦΛ † = Λ † .vec X = (I ⊗A) † vec(S N + − B) ⊕ N(I ⊗A) (212)In words, set vec X is the vector sum of the translated PSD cone(linearly mapped onto the rowspace of I ⊗ A (E)) and the nullspace ofI ⊗ A (synthesis of fact fromA.6.3 andA.7.3.0.1). Should I ⊗A have nonullspace, then vec X =(I ⊗A) −1 vec(S N + − B) which is the expected result.2.9.2 Positive semidefinite cone boundaryFor any symmetric positive semidefinite matrix A of rank ρ , there mustexist a rank ρ matrix Y such that A be expressible as an outer productin Y ; [331,6.3]A = Y Y T ∈ S M + , rankA=ρ , Y ∈ R M×ρ (213)Then the boundary of the positive semidefinite cone may be expressed∂S M + = { A ∈ S M + | rankA


Figure 45: (a) Projection of truncated PSD cone S^2_+ , truncated above γ = 1, on αβ-plane in isometrically isomorphic R^3 . View is from above with respect to Figure 43. (b) Truncated above γ = 2. From these plots we might infer, for example, line { [ 0  1/√2  γ ]^T | γ ∈ R } intercepts PSD cone at some large value of γ ; in fact, γ = ∞. [Axis labels: α , √2β ; boundary labeled svec S^2_+ for A = [α β ; β γ].]


2.9. POSITIVE SEMIDEFINITE (PSD) CONE 1212.9.2.1 rank ρ subset of the positive semidefinite coneFor the same reason (closure), this applies more generally; for 0≤ρ≤M{A ∈ SM+ | rankA=ρ } = { A ∈ S M + | rankA≤ρ } (216)For easy reference, we give such generally nonconvex sets a name: rank ρsubset of a positive semidefinite cone. For ρ < M this subset, nonconvex forM > 1, resides on the positive semidefinite cone boundary.2.9.2.1.1 Exercise. Closure and rank ρ subset.Prove equality in (216).For example,∂S M + = { A ∈ S M + | rankA=M− 1 } = { A ∈ S M + | rankA≤M− 1 } (217)In S 2 , each and every ray on the boundary of the positive semidefinite conein isomorphic R 3 corresponds to a symmetric rank-1 matrix (Figure 43), butthat does not hold in any higher dimension.2.9.2.2 Subspace tangent to open rank ρ subsetWhen the positive semidefinite cone subset in (216) is left unclosed as inM(ρ) { A ∈ S N + | rankA=ρ } (218)then we can specify a subspace tangent to the positive semidefinite coneat a particular member of manifold M(ρ). Specifically, the subspace R Mtangent to manifold M(ρ) at B ∈ M(ρ) [186,5, prop.1.1]R M (B) {XB + BX T | X ∈ R N×N } ⊆ S N (219)has dimension(dim svec R M (B) = ρ N − ρ − 1 )= ρ(N − ρ) +2ρ(ρ + 1)2(220)Tangent subspace R M contains no member of the positive semidefinite coneS N + whose rank exceeds ρ .Subspace R M (B) is a hyperplane supporting S N + when B ∈ M(N −1).Another good example of tangent subspace is given inE.7.2.0.2 by (2000);R M (11 T ) = S N⊥c , orthogonal complement to the geometric center subspace.(Figure 148, p.532)


2.9.2.3 Faces of PSD cone, their dimension versus rank

Each and every face of the positive semidefinite cone, having dimension less than that of the cone, is exposed. [246, 6] [215, 2.3.4] Because each and every face of the positive semidefinite cone contains the origin (2.8.0.0.1), each face belongs to a subspace of dimension the same as the face.
Define F(S^M_+ ∋ A) (171) as the smallest face, that contains a given positive semidefinite matrix A , of positive semidefinite cone S^M_+ . Then matrix A , having ordered diagonalization A = QΛQ^T ∈ S^M_+ (A.5.1), is relatively interior to^{2.39} [26, II.12] [113, 31.5.3] [239, 2.4] [240]

   F( S^M_+ ∋ A ) = {X ∈ S^M_+ | N(X) ⊇ N(A)}
                  = {X ∈ S^M_+ | 〈Q(I − ΛΛ†)Q^T , X〉 = 0}
                  = {QΛΛ† Ψ ΛΛ† Q^T | Ψ ∈ S^M_+}
                  = QΛΛ† S^M_+ ΛΛ† Q^T
                  ≃ S^{rank A}_+        (221)

which is isomorphic with convex cone S^{rank A}_+ ; e.g., Q S^M_+ Q^T = S^M_+ . The larger the nullspace of A , the smaller the face. (140) Thus dimension of the smallest face that contains given matrix A is

   dim F( S^M_+ ∋ A ) = rank(A)(rank(A) + 1)/2        (222)

in isomorphic R^{M(M+1)/2} , and each and every face of S^M_+ is isomorphic with a positive semidefinite cone having dimension the same as the face. Observe: not all dimensions are represented, and the only zero-dimensional face is the origin. The positive semidefinite cone has no facets, for example.

2.9.2.3.1 Table: Rank k versus dimension of S^3_+ faces

                k     dim F(S^3_+ ∋ rank-k matrix)
                0              0
   boundary    ≤1              1
               ≤2              3
   interior    ≤3              6

2.39 For X ∈ S^M_+ , A = QΛQ^T ∈ S^M_+ , show N(X) ⊇ N(A) ⇔ 〈Q(I − ΛΛ†)Q^T , X〉 = 0. Given 〈Q(I − ΛΛ†)Q^T , X〉 = 0 ⇔ R(X) ⊥ N(A). (A.7.4) (⇒) Assume N(X) ⊇ N(A), then R(X) ⊥ N(X) ⊇ N(A). (⇐) Assume R(X) ⊥ N(A), then X Q(I − ΛΛ†)Q^T = 0 ⇒ N(X) ⊇ N(A).
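The face formula (221) and dimension formula (222) can be exercised numerically. In this illustrative Python/NumPy sketch (the matrices are arbitrary examples, not from the text), membership of X to the smallest face containing A is tested via the nullspace inclusion N(X) ⊇ N(A).

```python
import numpy as np
from scipy.linalg import null_space

def smallest_face_dim(A, tol=1e-9):
    """dim F(S^M_+ ∋ A) = rank(A)(rank(A)+1)/2, per (222)."""
    r = np.linalg.matrix_rank((A + A.T) / 2, tol)
    return r * (r + 1) // 2

def in_smallest_face(X, A, tol=1e-9):
    """X ∈ F(S^M_+ ∋ A) iff X ≽ 0 and N(X) ⊇ N(A), per (221).
       The inclusion N(X) ⊇ N(A) holds iff X P = 0 for a basis P of N(A)."""
    if np.linalg.eigvalsh((X + X.T) / 2).min() < -tol:
        return False
    P = null_space(A)                        # orthonormal basis for N(A)
    return P.size == 0 or np.allclose(X @ P, 0, atol=1e-8)

A = np.diag([1.0, 2.0, 0.0])                 # rank-2 member of S^3_+
print(smallest_face_dim(A))                          # 3, matching the table above
print(in_smallest_face(np.diag([5., 1., 0.]), A))    # True
print(in_smallest_face(np.eye(3), A))                # False: N(I) = {0} does not contain N(A)
```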


2.9. POSITIVE SEMIDEFINITE (PSD) CONE 123For the positive semidefinite cone S 2 + in isometrically isomorphic R 3depicted in Figure 43, rank-2 matrices belong to the interior of that facehaving dimension 3 (the entire closed cone), rank-1 matrices belong torelative interior of a face having dimension 2.40 1, and the only rank-0 matrixis the point at the origin (the zero-dimensional face).2.9.2.3.2 Exercise. Bijective isometry.Prove that the smallest face of positive semidefinite cone S M + , containing aparticular full-rank matrix A having ordered diagonalization QΛQ T , is theentire cone: id est, prove Q S M + Q T = S M + from (221).2.9.2.4 rank-k face of PSD coneAny rank-k


124 CHAPTER 2. CONVEX GEOMETRY2.9.2.5 PSD cone face containing principal submatrixA principal submatrix of a matrix A∈ R M×M is formed by discarding anyparticular subset of its rows and columns having the same indices. Thereare M!/(1!(M −1)!) principal 1 × 1 submatrices, M!/(2!(M −2)!) principal2 × 2 submatrices, and so on, totaling 2 M − 1 principal submatricesincluding A itself. Principal submatrices of a symmetric matrix aresymmetric. A given symmetric matrix has rank ρ iff it has a nonsingularprincipal ρ×ρ submatrix but none larger. [297,5-10] By loading vectory in test y T Ay (A.2) with various binary patterns, it follows thatany principal submatrix must be positive (semi)definite whenever A is(Theorem A.3.1.0.4). If positive semidefinite matrix A∈ S M + has principalsubmatrix of dimension ρ with rank r , then rankA ≤ M −ρ+r by (1536).Because each and every principal submatrix of a positive semidefinitematrix in S M is positive semidefinite, then each principal submatrix belongsto a certain face of positive semidefinite cone S M + by (222). Of specialinterest are full-rank positive semidefinite principal submatrices, for thendescription of smallest face becomes simpler. We can find the smallest face,that contains a particular full-rank principal submatrix of A , by embeddingthat submatrix in a 0 matrix of the same dimension as A : Were Φ a binarydiagonal matrixΦ = δ 2 (Φ)∈ S M , Φ ii ∈ {0, 1} (224)having diagonal entry 0 corresponding to a discarded row and columnfrom A∈ S M + , then any principal submatrix 2.42 so embedded canbe expressed ΦAΦ ; id est, for an embedded principal submatrixΦAΦ∈ S M + rank ΦAΦ = rank Φ ≤ rankAF ( S M + ∋ΦAΦ ) = {X ∈ S M + | N(X) ⊇ N(ΦAΦ)}= {X ∈ S M + | 〈I − Φ , X〉 = 0}= {ΦΨΦ | Ψ ∈ S M + }≃ S rankΦ+(225)Smallest face that contains an embedded principal submatrix, whose rankis not necessarily full, may be expressed like (221): For embedded principal] I 02.42 To express a leading principal submatrix, for example, Φ =[0 T .0


2.9. POSITIVE SEMIDEFINITE (PSD) CONE 125submatrix ΦAΦ∈ S M + rank ΦAΦ ≤ rank Φ , apply ordered diagonalizationinstead toˆΦ T A ˆΦ = UΥU T ∈ S rankΦ+ (226)ThenF ( S M + ∋ΦAΦ ) = {X ∈ S M + | N(X) ⊇ N(ΦAΦ)}= {X ∈ S M + | 〈ˆΦU(I − ΥΥ † )U TˆΦT + I − Φ , X〉 = 0}= {ˆΦUΥΥ † ΨΥΥ † U TˆΦT | Ψ ∈ S rankΦ+ }≃ S rankΦAΦ+(227)where binary diagonal matrix Φ is partitioned into nonzero and zero columnsby permutation Ξ∈ R M×M ;ΦΞ T [ ˆΦ 0 ]∈ R M×M , rank ˆΦ = rank Φ , Φ = ˆΦˆΦ T ∈ S M , ˆΦTˆΦ = I (228)Any embedded principal submatrix may be expressedΦAΦ = ˆΦˆΦ T A ˆΦˆΦ T ∈ S M + (229)where ˆΦ T A ˆΦ∈ S rankΦ+ extracts the principal submatrix whereas ˆΦˆΦ T A ˆΦˆΦ Tembeds it.2.9.2.5.1 Example. Smallest face containing disparate elements.Imagine two vectorized matrices A 1 and A 2 on diametrically opposed sidesof the positive semidefinite cone boundary pictured in Figure 43. Smallestface formula (221) can be altered to accommodate a union of points:F(S M + ⊃ ⋃ iA i)={X ∈ S M +∣ N(X) ⊇ ⋂ iN(A i )}(230)To see that, regard svecA 1 as normal to a hyperplane in R 3 containing avectorized basis for its nullspace: svec basis N(A 1 ) (2.5.3). Similarly, thereis a second hyperplane containing svec basis N(A 2 ) having normal svecA 2 .While each hyperplane is two-dimensional, each nullspace has only oneaffine dimension because A 1 and A 2 are rank-1. Because our interest isonly that part of the nullspace in the positive semidefinite cone, then forX,A i ∈ S M + by〈X , A i 〉 = 0 ⇔ XA i = A i X = 0 (1590)


126 CHAPTER 2. CONVEX GEOMETRYwe may ignore the fact that vectorized nullspace svec basis N(A i ) is a propersubspace of the hyperplane. We may think instead in terms of wholehyperplanes because equivalence (1590) says that the positive semidefinitecone effectively filters that subset of the hyperplane, whose normal is A i ,that constitutes N(A i ).And so hyperplane intersection makes a line intersecting the positivesemidefinite cone but only at the origin. In this hypothetical example,smallest face containing those two matrices therefore comprises the entirecone because every positive semidefinite matrix has nullspace containing 0.The smaller the intersection, the larger the smallest face. 2.9.2.5.2 Exercise. Disparate elements.Prove that (230) holds for an arbitrary set {A i ∈ S M + }. One way is by showing⋂ N(Ai ) ∩ S M + = conv({A i }) ⊥ ∩ S M + ; with perpendicularity ⊥ as in (373). 2.432.9.2.6 face of all PSD matrices having same principal submatrixNow we ask what is the smallest face of the positive semidefinite conecontaining all matrices having a principal submatrix in common; in otherwords, that face containing all PSD matrices (of any rank) with particularentries fixed − the smallest face containing all PSD matrices whose fixedentries correspond to some given principal submatrix ΦAΦ . To maintaingenerality, 2.44 we move an extracted principal submatrix ˆΦ T A ˆΦ∈ S rankΦ+ intoleading position via permutation Ξ from (228): for A∈ S M +[ΞAΞ T ˆΦT A ˆΦ BB T C]∈ S M + (231)By properties of partitioned PSD matrices inA.4.0.1,([basis NˆΦT A ˆΦ]) [ ]B 0B T ⊇C I − CC † (232)[ ] 0Hence N(ΞXΞ T ) ⊇ N(ΞAΞ T ) span in a smallest face F formulaI2.45because all PSD matrices, given fixed principal submatrix, are admitted:2.43 Hint: (1590) (1911).2.44 to fix any principal submatrix; not only leading principal submatrices.2.45 meaning, more pertinently, I − Φ is dropped from (227).


2.9. POSITIVE SEMIDEFINITE (PSD) CONE 127Define a set of all PSD matrices having fixed principal submatrix ΦAΦ{ [S Ξ ˆΦT T A ˆΦ]}BB T Ξ ≽ 0C ∣ B ∈ RrankΦ×M−rankΦ , C ∈ S M−rankΦ+ (233)SoF ( S M + ⊇ S ) = { X ∈ S M + | N(X) ⊇ N(S) }= {X ∈ S M + | 〈ˆΦU(I − ΥΥ † )U TˆΦT , X〉 = 0}{ [ ] [ ]}UΥΥ= Ξ T †0 ΥΥ0 T Ψ† U T 0I 0 T ΞI ∣ Ψ ∈ SM +M−rank Φ+rankΦAΦ≃ S+(234)In the case that ΦAΦ is a leading principal submatrix, then Ξ=I .2.9.2.7 Extreme directions of positive semidefinite coneBecause the positive semidefinite cone is pointed (2.7.2.1.2), there is aone-to-one correspondence of one-dimensional faces with extreme directionsin any dimension M ; id est, because of the cone faces lemma (2.8.0.0.1)and direct correspondence of exposed faces to faces of S M + , it follows: thereis no one-dimensional face of the positive semidefinite cone that is not a rayemanating from the origin.Symmetric dyads constitute the set of all extreme directions: For M >1{yy T ∈ S M | y ∈ R M } ⊂ ∂S M + (235)this superset of extreme directions (infinite in number, confer (187)) for thepositive semidefinite cone is, generally, a subset of the boundary. By theextremes theorem (2.8.1.1.1), the convex hull of extreme rays and the originis the positive semidefinite cone: (2.8.1.2.1){ ∞}∑conv{yy T ∈ S M | y ∈ R M } = b i z i zi T | z i ∈ R M , b≽0 = S M + (236)i=1For two-dimensional matrices (M =2, Figure 43){yy T ∈ S 2 | y ∈ R 2 } = ∂S 2 + (237)


while for one-dimensional matrices, in exception, (M = 1, 2.7)

   {yy ∈ S | y ≠ 0} = int S_+        (238)

Each and every extreme direction yy^T makes the same angle with the identity matrix in isomorphic R^{M(M+1)/2} , dependent only on dimension; videlicet,^{2.46}

   (yy^T , I) = arccos( 〈yy^T , I〉 / (‖yy^T‖_F ‖I‖_F) ) = arccos( 1/√M )   ∀ y ∈ R^M        (239)

2.9.2.7.1 Example. Positive semidefinite matrix from extreme directions.
Diagonalizability (A.5) of symmetric matrices yields the following results: Any positive semidefinite matrix (1450) in S^M can be written in the form

   A = ∑_{i=1}^M λ_i z_i z_i^T = ÂÂ^T = ∑_i â_i â_i^T ≽ 0 ,   λ ≽ 0        (240)

a conic combination of linearly independent extreme directions (â_i â_i^T or z_i z_i^T where ‖z_i‖ = 1), where λ is a vector of eigenvalues.
If we limit consideration to all symmetric positive semidefinite matrices bounded via unity trace

   C ≜ {A ≽ 0 | trA = 1}        (91)

then any matrix A from that set may be expressed as a convex combination of linearly independent extreme directions;

   A = ∑_{i=1}^M λ_i z_i z_i^T ∈ C ,   1^T λ = 1 ,   λ ≽ 0        (241)

Implications are:
1. set C is convex (an intersection of PSD cone with hyperplane),
2. because the set of eigenvalues corresponding to a given square matrix A is unique (A.5.0.1), no single eigenvalue can exceed 1 ; id est, I ≽ A
3. and the converse holds: set C is an instance of Fantope (91).

2.46 Analogy with respect to the EDM cone is considered in [185, p.162] where it is found: angle is not constant. Extreme directions of the EDM cone can be found in 6.4.3.2. The cone's axis is −E = 11^T − I (1095).
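Both the constant-angle property (239) and the conic-combination representation (240) are easy to confirm numerically. In this Python/NumPy sketch (arbitrary random data, not from the text) the z_i are taken orthonormal, so the weights λ are exactly the eigenvalues of the constructed matrix A.

```python
import numpy as np

M = 4
rng = np.random.default_rng(1)

# angle (239) between an arbitrary extreme direction yy^T and the identity
y = rng.standard_normal(M)
yyT = np.outer(y, y)
ang = np.arccos(np.trace(yyT) / (np.linalg.norm(yyT, 'fro') * np.sqrt(M)))
print(np.isclose(ang, np.arccos(1 / np.sqrt(M))))        # True for any nonzero y

# (240): a conic combination of extreme directions z_i z_i^T is positive semidefinite
lam = rng.uniform(0, 1, M)                                # lambda ≽ 0
Z, _ = np.linalg.qr(rng.standard_normal((M, M)))          # orthonormal columns z_i
A = sum(lam[i] * np.outer(Z[:, i], Z[:, i]) for i in range(M))
print(np.linalg.eigvalsh(A).min() >= -1e-12)              # True: A ≽ 0
print(np.allclose(np.sort(np.linalg.eigvalsh(A)), np.sort(lam)))  # eigenvalues equal lambda
```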


Figure 46: This solid circular cone in R^3 continues upward infinitely. Axis of revolution is illustrated as vertical line through origin. R represents radius: distance measured from an extreme direction to axis of revolution. Were this a Lorentz cone, any plane slice containing axis of revolution would make a right angle.


2.9.2.8 Positive semidefinite cone is generally not circular

Extreme angle equation (239) suggests that the positive semidefinite cone might be invariant to rotation about its axis of revolution; id est, a circular cone. We investigate this now:

2.9.2.8.1 Definition. Circular cone:^{2.47}
a pointed closed convex cone having hyperspherical sections orthogonal to its axis of revolution about which the cone is invariant to rotation.        △

A conic section is the intersection of a cone with any hyperplane. In three dimensions, an intersecting plane perpendicular to a circular cone's axis of revolution produces a section bounded by a circle. (Figure 46) A prominent example of a circular cone in convex analysis is Lorentz cone (178). We also find that the positive semidefinite cone and cone of Euclidean distance matrices are circular cones, but only in low dimension.
The positive semidefinite cone has axis of revolution that is the ray (base 0) through the identity matrix I . Consider a set of normalized extreme directions of the positive semidefinite cone: for some arbitrary positive constant a ∈ R_+

   {yy^T ∈ S^M | ‖y‖ = √a} ⊂ ∂S^M_+        (242)

The distance from each extreme direction to the axis of revolution is radius

   R ≜ inf_c ‖yy^T − cI‖_F = a √(1 − 1/M)        (243)

which is the distance from yy^T to (a/M)I ; the length of vector yy^T − (a/M)I . Because distance R (in a particular dimension) from the axis of revolution to each and every normalized extreme direction is identical, the extreme directions lie on the boundary of a hypersphere in isometrically isomorphic R^{M(M+1)/2} . From Example 2.9.2.7.1, the convex hull (excluding vertex at the origin) of the normalized extreme directions is a conic section

   C ≜ conv{yy^T | y ∈ R^M , y^T y = a} = S^M_+ ∩ {A ∈ S^M | 〈I , A〉 = a}        (244)

orthogonal to identity matrix I ;

   〈 C − (a/M)I , I 〉 = tr( C − (a/M)I ) = 0        (245)
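A numerical check of radius formula (243), minimizing ‖yy^T − cI‖_F over c directly; the dimension M, constant a, and vector y below are arbitrary choices for the demonstration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

M, a = 5, 2.0
rng = np.random.default_rng(2)
y = rng.standard_normal(M)
y *= np.sqrt(a) / np.linalg.norm(y)          # normalize so that ||y||^2 = a
yyT = np.outer(y, y)

# R = inf_c ||yy^T - c I||_F, per (243); minimized numerically over scalar c
f = lambda c: np.linalg.norm(yyT - c * np.eye(M), 'fro')
R_numeric = minimize_scalar(f).fun
R_formula = a * np.sqrt(1 - 1 / M)
print(np.isclose(R_numeric, R_formula))      # True; the minimizer is c = a/M
```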


Figure 47: Illustrated is a section, perpendicular to axis of revolution, of circular cone from Figure 46. Radius R is distance from any extreme direction yy^T to axis at (a/M)I . Vector (a/M)11^T is an arbitrary reference by which to measure angle θ .

Proof. Although the positive semidefinite cone possesses some characteristics of a circular cone, we can show it is not by demonstrating shortage of extreme directions; id est, some extreme directions corresponding to each and every angle of rotation about the axis of revolution are nonexistent: Referring to Figure 47, [380, 1-7]

   cos θ = 〈 (a/M)11^T − (a/M)I , yy^T − (a/M)I 〉 / ( a²(1 − 1/M) )        (246)

Solving for vector y we get

   a(1 + (M − 1) cos θ) = (1^T y)²        (247)

which does not have real solution ∀ 0 ≤ θ ≤ 2π in every matrix dimension M .

2.47 A circular cone is assumed convex throughout, although not so by other authors. We also assume a right circular cone.


From the foregoing proof we can conclude that the positive semidefinite cone might be circular but only in matrix dimensions 1 and 2. Because of a shortage of extreme directions, conic section (244) cannot be hyperspherical by the extremes theorem (2.8.1.1.1, Figure 42).

2.9.2.8.2 Exercise. Circular semidefinite cone.
Prove the PSD cone to be circular in matrix dimensions 1 and 2 while it is a rotation of Lorentz cone (178) in matrix dimension 2 .^{2.48}

2.9.2.8.3 Example. PSD cone inscription in three dimensions.

Theorem. Geršgorin discs. [202, 6.1] [364] [255, p.140]
For p ∈ R^m_+ given A = [A_ij] ∈ S^m , then all eigenvalues of A belong to the union of m closed intervals on the real line;

   λ(A) ∈ ⋃_{i=1}^m { ξ ∈ R | |ξ − A_ii| ≤ ϱ_i ≜ (1/p_i) ∑_{j≠i} p_j |A_ij| } = ⋃_{i=1}^m [ A_ii − ϱ_i , A_ii + ϱ_i ]        (248)

Furthermore, if a union of k of these m [intervals] forms a connected region that is disjoint from all the remaining n−k [intervals], then there are precisely k eigenvalues of A in this region.        ⋄

To apply the theorem to determine positive semidefiniteness of symmetric matrix A , we observe that for each i we must have

   A_ii ≥ ϱ_i        (249)

Suppose

   m = 2        (250)

so A ∈ S^2 . Vectorizing A as in (56), svec A belongs to isometrically isomorphic R^3 . Then we have m 2^{m−1} = 4 inequalities, in the matrix entries A_ij with Geršgorin parameters p = [p_i] ∈ R^2_+ ,

   p_1 A_11 ≥ ±p_2 A_12
   p_2 A_22 ≥ ±p_1 A_12        (251)

2.48 Hint: Given cone { [ α  β/√2 ; β/√2  γ ] | √(α² + β²) ≤ γ } , show (1/√2)[ γ+α  β ; β  γ−α ] is a vector rotation that is positive semidefinite under the same inequality.
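The Geršgorin-based sufficient condition (249), with weights p as in (251), can be coded directly. In this illustrative Python/NumPy sketch the matrix A and the weight vectors are invented; note how the unweighted test fails for this A while a suitable p certifies positive semidefiniteness.

```python
import numpy as np

def gershgorin_psd_certificate(A, p):
    """Sufficient PSD test from the Geršgorin discs theorem (248)-(249):
       A_ii >= rho_i = (1/p_i) * sum_{j != i} p_j |A_ij|  for every i."""
    A, p = np.asarray(A, float), np.asarray(p, float)
    rho = (np.abs(A) @ p - np.diag(np.abs(A)) * p) / p   # off-diagonal weighted row sums
    return bool(np.all(np.diag(A) >= rho))

A = np.array([[2.0, 1.0],
              [1.0, 0.6]])                        # PSD: trace > 0, det = 0.2 > 0
print(gershgorin_psd_certificate(A, p=[1., 1.]))  # False: 0.6 < 1, unweighted test fails
print(gershgorin_psd_certificate(A, p=[1., 2.]))  # True:  these weights certify A ≽ 0
print(bool(np.all(np.linalg.eigvalsh(A) >= 0)))   # True:  A really is PSD
```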


[Figure 48 comprises three plots of the truncated boundary svec ∂S²_+ in coordinates A_11 , √2 A_12 , A_22 , one for each Geršgorin parameter value p = [1/2 1]^T, p = [1 1]^T, p = [2 1]^T, annotated with the halfspace inequalities p_1 A_11 ≥ ±p_2 A_12 and p_2 A_22 ≥ ±p_1 A_12 .]

Figure 48: Proper polyhedral cone K , created by intersection of halfspaces, inscribes PSD cone in isometrically isomorphic R³ as predicted by Geršgorin discs theorem for A = [A_ij] ∈ S² . Hyperplanes supporting K intersect along boundary of PSD cone. Four extreme directions of K coincide with extreme directions of PSD cone.


which describe an intersection of four halfspaces in R^{m(m+1)/2}. That intersection creates the proper polyhedral cone K (2.12.1) whose construction is illustrated in Figure 48. Drawn truncated is the boundary of the positive semidefinite cone svec S²_+ and the bounding hyperplanes supporting K .

Created by means of Geršgorin discs, K always belongs to the positive semidefinite cone for any nonnegative value of p ∈ R^m_+ . Hence any point in K corresponds to some positive semidefinite matrix A . Only the extreme directions of K intersect the positive semidefinite cone boundary in this dimension; the four extreme directions of K are extreme directions of the positive semidefinite cone. As p_1/p_2 increases in value from 0, two extreme directions of K sweep the entire boundary of this positive semidefinite cone. Because the entire positive semidefinite cone can be swept by K , the system of linear inequalities

   Y^T svec A ≜ [ p_1   ±p_2/√2   0  ;   0   ±p_1/√2   p_2 ] svec A ≽ 0    (252)

when made dynamic can replace a semidefinite constraint A ≽ 0 ; id est, for given p where Y ∈ R^{m(m+1)/2 × m2^{m−1}}

   K = {z | Y^T z ≽ 0} ⊂ svec S^m_+    (253)

but

   svec A ∈ K ⇒ A ∈ S^m_+    (254)

   ∃ p    Y^T svec A ≽ 0 ⇔ A ≽ 0    (255)

In other words, diagonal dominance [202, p.349, 7.2.3]

   A_ii ≥ ∑_{j≠i} |A_ij| ,   ∀ i = 1... m    (256)

is only a sufficient condition for membership to the PSD cone; but by dynamic weighting p in this dimension, it was made necessary and sufficient.

In higher dimension (m > 2), boundary of the positive semidefinite cone is no longer constituted completely by its extreme directions (symmetric rank-one matrices); its geometry becomes intricate. How all the extreme directions can be swept by an inscribed polyhedral cone, 2.49 similarly to the foregoing example, remains an open question.

2.9.2.8.4 Exercise. Dual inscription.
Find dual proper polyhedral cone K* from Figure 48.

2.9.2.9 Boundary constituents of the positive semidefinite cone

2.9.2.9.1 Lemma. Sum of positive semidefinite matrices.
For A,B ∈ S^M_+

   rank(A + B) = rank(µA + (1−µ)B)    (257)

over open interval (0, 1) of µ . ⋄

Proof. Any positive semidefinite matrix belonging to the PSD cone has an eigenvalue decomposition that is a positively scaled sum of linearly independent symmetric dyads. By the linearly independent dyads definition in B.1.1.0.1, rank of the sum A + B is equivalent to the number of linearly independent dyads constituting it. Linear independence is insensitive to further positive scaling by µ . The assumption of positive semidefiniteness prevents annihilation of any dyad from the sum A + B .

2.9.2.9.2 Example. Rank function quasiconcavity. (confer 3.8)
For A,B ∈ R^{m×n} [202, 0.4]

   rank A + rank B ≥ rank(A + B)    (258)

that follows from the fact [331, 3.6]

   dim R(A) + dim R(B) = dim R(A + B) + dim(R(A) ∩ R(B))    (259)

For A,B ∈ S^M_+

   rank A + rank B ≥ rank(A + B) ≥ min{rank A , rank B}    (1466)

that follows from the fact

   N(A + B) = N(A) ∩ N(B) ,   A,B ∈ S^M_+    (160)

2.49 It is not necessary to sweep the entire boundary in higher dimension.
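Lemma 2.9.2.9.1 and inequality (1466) are simple to observe numerically. A Matlab sketch (the random test matrices and their ranks are illustrative assumptions, not data from the text):

% Illustration of Lemma 2.9.2.9.1 and inequality (1466) for PSD A, B:
% rank(A+B) >= min(rank A, rank B), and rank(mu*A + (1-mu)*B) is
% constant over the open interval (0,1) of mu.
M = 6;
A = randn(M,2); A = A*A';            % PSD, rank 2 (generically)
B = randn(M,3); B = B*B';            % PSD, rank 3 (generically)
r = @(X) rank(X, 1e-9);              % rank with a small numerical tolerance
fprintf('rank A = %d, rank B = %d, rank(A+B) = %d\n', r(A), r(B), r(A+B))
for mu = [0.1 0.5 0.9]
    fprintf('mu = %.1f : rank = %d\n', mu, r(mu*A + (1-mu)*B))
end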


Rank is a quasiconcave function on S^M_+ because the right-hand inequality in (1466) has the concave form (643); videlicet, Lemma 2.9.2.9.1. 

From this example we see, unlike convex functions, quasiconvex functions are not necessarily continuous. (3.8) We also glean:

2.9.2.9.3 Theorem. Convex subsets of positive semidefinite cone.
Subsets of the positive semidefinite cone S^M_+ , for 0 ≤ ρ ≤ M

   S^M_+(ρ) ≜ {X ∈ S^M_+ | rank X ≥ ρ}    (260)

are pointed convex cones, but not closed unless ρ = 0 ; id est, S^M_+(0) = S^M_+ . ⋄

Proof. Given ρ , a subset S^M_+(ρ) is convex if and only if convex combination of any two members has rank at least ρ . That is confirmed by applying identity (257) from Lemma 2.9.2.9.1 to (1466); id est, for A,B ∈ S^M_+(ρ) on closed interval [0, 1] of µ

   rank(µA + (1−µ)B) ≥ min{rank A , rank B}    (261)

It can similarly be shown, almost identically to the proof of the lemma, that any conic combination of A,B in subset S^M_+(ρ) remains a member; id est, ∀ ζ, ξ ≥ 0

   rank(ζA + ξB) ≥ min{rank(ζA) , rank(ξB)}    (262)

Therefore, S^M_+(ρ) is a convex cone.

Another proof of convexity can be made by projection arguments:

2.9.2.10 Projection on S^M_+(ρ)

Because these cones S^M_+(ρ) indexed by ρ (260) are convex, projection on them is straightforward. Given a symmetric matrix H having diagonalization H ≜ QΛQ^T ∈ S^M (A.5.1) with eigenvalues Λ arranged in nonincreasing order, then its Euclidean projection (minimum-distance projection) on S^M_+(ρ)

   P_{S^M_+(ρ)} H = QΥ⋆Q^T    (263)

corresponds to a map of its eigenvalues:

   Υ⋆_ii = { max{ǫ , Λ_ii} ,   i = 1... ρ
           { max{0 , Λ_ii} ,   i = ρ+1... M    (264)

where ǫ is positive but arbitrarily close to 0.

2.9.2.10.1 Exercise. Projection on open convex cones.
Prove (264) using Theorem E.9.2.0.1. 

Because each H ∈ S^M has unique projection on S^M_+(ρ) (despite possibility of repeated eigenvalues in Λ), we may conclude it is a convex set by the Bunt-Motzkin theorem (E.9.0.0.1).

Compare (264) to the well-known result regarding Euclidean projection on a rank ρ subset of the positive semidefinite cone (2.9.2.1)

   S^M_+ \ S^M_+(ρ+1) = {X ∈ S^M_+ | rank X ≤ ρ}    (216)

   P_{S^M_+ \ S^M_+(ρ+1)} H = QΥ⋆Q^T    (265)

As proved in 7.1.4, this projection of H corresponds to the eigenvalue map

   Υ⋆_ii = { max{0 , Λ_ii} ,   i = 1... ρ
           { 0 ,               i = ρ+1... M    (1341)

Together these two results (264) and (1341) mean: A higher-rank solution to projection on the positive semidefinite cone lies arbitrarily close to any given lower-rank projection, but not vice versa. Were the number of nonnegative eigenvalues in Λ known a priori not to exceed ρ , then these two different projections would produce identical results in the limit ǫ → 0.

2.9.2.11 Uniting constituents

Interior of the PSD cone int S^M_+ is convex by Theorem 2.9.2.9.3, for example, because all positive semidefinite matrices having rank M constitute the cone interior.

All positive semidefinite matrices of rank less than M constitute the cone boundary; an amalgam of positive semidefinite matrices of different rank. Thus each nonconvex subset of positive semidefinite matrices, for 0 < ρ < M

   {Y ∈ S^M_+ | rank Y = ρ}    (266)


138 CHAPTER 2. CONVEX GEOMETRYhaving rank ρ successively 1 lower than M , appends a nonconvexconstituent to the cone boundary; but only in their union is the boundarycomplete: (confer2.9.2)∂S M + =⋃{Y ∈ S M + | rankY = ρ} (267)M−1ρ=0The composite sequence, the cone interior in union with each successiveconstituent, remains convex at each step; id est, for 0≤k ≤MM⋃{Y ∈ S M + | rankY = ρ} (268)ρ=kis convex for each k by Theorem 2.9.2.9.3.2.9.2.12 Peeling constituentsProceeding the other way: To peel constituents off the complete positivesemidefinite cone boundary, one starts by removing the origin; the onlyrank-0 positive semidefinite matrix. What remains is convex. Next, theextreme directions are removed because they constitute all the rank-1 positivesemidefinite matrices. What remains is again convex, and so on. Proceedingin this manner eventually removes the entire boundary leaving, at last, theconvex interior of the PSD cone; all the positive definite matrices.2.9.2.12.1 Exercise. Difference A − B .What about a difference of matrices A,B belonging to the PSD cone? Show:Difference of any two points on the boundary belongs to the boundaryor exterior.Difference A −B , where A belongs to the boundary while B is interior,belongs to the exterior.2.9.3 Barvinok’s propositionBarvinok posits existence and quantifies an upper bound on rank of a positivesemidefinite matrix belonging to the intersection of the PSD cone with anaffine subset:


2.9.3.0.1 Proposition. (Barvinok) Affine intersection with PSD cone.
[26, II.13] [24, 2.2] Consider finding a matrix X ∈ S^N satisfying

   X ≽ 0 ,   〈A_j , X〉 = b_j ,   j = 1... m    (269)

given nonzero linearly independent (vectorized) A_j ∈ S^N and real b_j . Define the affine subset

   A ≜ {X | 〈A_j , X〉 = b_j ,  j = 1... m} ⊆ S^N    (270)

If the intersection A ∩ S^N_+ is nonempty given a number m of equalities, then there exists a matrix X ∈ A ∩ S^N_+ such that

   rank X (rank X + 1)/2 ≤ m    (271)

whence the upper bound 2.50

   rank X ≤ ⌊(√(8m + 1) − 1)/2⌋    (272)

Given desired rank instead, equivalently,

   m < (rank X + 1)(rank X + 2)/2    (273)

An extreme point of A ∩ S^N_+ satisfies (272) and (273). (confer 4.1.2.2) A matrix X ≜ R^T R is an extreme point if and only if the smallest face, that contains X , of A ∩ S^N_+ has dimension 0 ; [239, 2.4] [240] id est, iff (171)

   dim F((A ∩ S^N_+) ∋ X)
     = rank(X)(rank(X) + 1)/2 − rank[ svec RA_1R^T   svec RA_2R^T   · · ·   svec RA_mR^T ]    (274)

equals 0 in isomorphic R^{N(N+1)/2}.

Now the intersection A ∩ S^N_+ is assumed bounded: Assume a given nonzero upper bound ρ on rank, a number of equalities

   m = (ρ + 1)(ρ + 2)/2    (275)

2.50 4.1.2.2 contains an intuitive explanation. This bound is itself limited above, of course, by N ; a tight limit corresponding to an interior point of S^N_+ .


and matrix dimension N ≥ ρ + 2 ≥ 3. If the intersection is nonempty and bounded, then there exists a matrix X ∈ A ∩ S^N_+ such that

   rank X ≤ ρ    (276)

This represents a tightening of the upper bound; a reduction by exactly 1 of the bound provided by (272) given the same specified number m (275) of equalities; id est,

   rank X ≤ (√(8m + 1) − 1)/2 − 1    (277)
⋄

When intersection A ∩ S^N_+ is known a priori to consist only of a single point, then Barvinok's proposition provides the greatest upper bound on its rank not exceeding N . The intersection can be a single nonzero point only if the number of linearly independent hyperplanes m constituting A satisfies 2.51

   N(N + 1)/2 − 1 ≤ m ≤ N(N + 1)/2    (278)

2.10 Conic independence (c.i.)

In contrast to extreme direction, the property conically independent direction is more generally applicable; inclusive of all closed convex cones (not only pointed closed convex cones). Arbitrary given directions {Γ_i ∈ R^n , i = 1... N} comprise a conically independent set if and only if (confer 2.1.2, 2.4.2.3)

   Γ_i ζ_i + · · · + Γ_j ζ_j − Γ_l = 0 ,   i ≠ · · · ≠ j ≠ l = 1... N    (279)

has no solution ζ ∈ R^N_+ ; in words, iff no direction from the given set can be expressed as a conic combination of those remaining; e.g., Figure 49 [test (279) Matlab implementation, Wıκımization]. Arranging any set of generators for a particular closed convex cone in a matrix columnar,

   X ≜ [ Γ_1  Γ_2  · · ·  Γ_N ] ∈ R^{n×N}    (280)

2.51 For N > 1, N(N+1)/2 − 1 independent hyperplanes in R^{N(N+1)/2} can make a line tangent to svec ∂S^N_+ at a point because all one-dimensional faces of S^N_+ are exposed. Because a pointed convex cone has only one vertex, the origin, there can be no intersection of svec ∂S^N_+ with any higher-dimensional affine subset A that will make a nonzero point.


then this test of conic independence (279) may be expressed as a set of linear feasibility problems: for l = 1... N

   find  ζ ∈ R^N
   subject to  Xζ = Γ_l
               ζ ≽ 0
               ζ_l = 0    (281)

If feasible for any particular l , then the set is not conically independent. To find all conically independent directions from a set via (281), generator Γ_l must be removed from the set once it is found (feasible) conically dependent on remaining generators in X . So, to continue testing remaining generators when Γ_l is found to be dependent, Γ_l must be discarded from matrix X before proceeding. A generator Γ_l that is instead found conically independent of remaining generators in X , on the other hand, is conically independent of any subset of remaining generators. A c.i. set thus found is not necessarily unique.

It is evident that linear independence (l.i.) of N directions implies their conic independence;

   l.i. ⇒ c.i.

which suggests, number of l.i. generators in the columns of X cannot exceed number of c.i. generators. Denoting by k the number of conically independent generators contained in X , we have the most fundamental rank inequality for convex cones

   dim aff K = dim aff[0  X] = rank X ≤ k ≤ N    (282)

Whereas N directions in n dimensions can no longer be linearly independent once N exceeds n , conic independence remains possible:

2.10.0.0.1 Table: Maximum number of c.i. directions

   dimension n    sup k (pointed)    sup k (not pointed)
        0               0                    0
        1               1                    2
        2               2                    4
        3               ∞                    ∞
        ⋮               ⋮                    ⋮
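A simplified Matlab sketch of test (281) follows (the Optimization Toolbox and the three example directions are assumptions made only for illustration; the Wıκımization implementation cited above is the reference version). Unlike the procedure prescribed in the text, this sketch tests each generator against all the others and does not discard dependent generators as it proceeds.

% Sketch of conic independence test (281): direction l is conically
% dependent iff  X*zeta = Gamma_l  has a solution with zeta >= 0, zeta_l = 0.
X = [1 0 1;                     % three directions in R^2 (illustration)
     0 1 1];
N = size(X,2);  dependent = false(1,N);
opts = optimoptions('linprog','Display','off');
for l = 1:N
    idx = setdiff(1:N, l);      % enforce zeta_l = 0 by removing column l
    [~,~,exitflag] = linprog(zeros(N-1,1), [], [], ...
        X(:,idx), X(:,l), zeros(N-1,1), [], opts);
    dependent(l) = (exitflag == 1);     % feasible => conically dependent
end
disp(dependent)    % here only the third direction is a conic combination of the others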


Figure 49: Vectors in R² : (a) affinely and conically independent, (b) affinely independent but not conically independent, (c) conically independent but not affinely independent. None of the examples exhibits linear independence. (In general, a.i. ⇎ c.i.)

Assuming veracity of this table, there is an apparent vastness between two and three dimensions. The finite numbers of conically independent directions indicate: Convex cones in dimensions 0, 1, and 2 must be polyhedral. (2.12.1)

Conic independence is certainly one convex idea that cannot be completely explained by a two-dimensional picture as Barvinok suggests [26, p.vii].

From this table it is also evident that dimension of Euclidean space cannot exceed the number of conically independent directions possible;

   n ≤ sup k

2.10.1 Preservation of conic independence

Independence in the linear (2.1.2.1), affine (2.4.2.4), and conic senses can be preserved under linear transformation. Suppose a matrix X ∈ R^{n×N} (280) holds a conically independent set columnar. Consider a transformation on the domain of such matrices

   T(X) : R^{n×N} → R^{n×N} ≜ XY    (283)

where fixed matrix Y ≜ [ y_1  y_2  · · ·  y_N ] ∈ R^{N×N} represents linear operator T . Conic independence of {Xy_i ∈ R^n , i = 1... N} demands, by definition (279),

   Xy_i ζ_i + · · · + Xy_j ζ_j − Xy_l = 0 ,   i ≠ · · · ≠ j ≠ l = 1... N    (284)


2.10. CONIC INDEPENDENCE (C.I.) 143K(a)K(b)∂K ∗Figure 50: (a) A pointed polyhedral cone (drawn truncated) in R 3 having sixfacets. The extreme directions, corresponding to six edges emanating fromthe origin, are generators for this cone; not linearly independent but theymust be conically independent. (b) The boundary of dual cone K ∗ (drawntruncated) is now added to the drawing of same K . K ∗ is polyhedral, proper,and has the same number of extreme directions as K has facets.


144 CHAPTER 2. CONVEX GEOMETRYhave no solution ζ ∈ R N + . That is ensured by conic independence of {y i ∈ R N }and by R(Y )∩ N(X) = 0 ; seen by factoring out X .2.10.1.1 linear maps of cones[21,7] If K is a convex cone in Euclidean space R and T is any linearmapping from R to Euclidean space M , then T(K) is a convex cone in Mand x ≼ y with respect to K implies T(x) ≼ T(y) with respect to T(K).If K is full-dimensional in R , then so is T(K) in M .If T is a linear bijection, then x ≼ y ⇔ T(x) ≼ T(y). If K is pointed,then so is T(K). And if K is closed, so is T(K). If F is a face of K , thenT(F) is a face of T(K).Linear bijection is only a sufficient condition for pointedness andclosedness; convex polyhedra (2.12) are invariant to any linear or inverselinear transformation [26,I.9] [307, p.44, thm.19.3].2.10.2 Pointed closed convex K & conic independenceThe following bullets can be derived from definitions (186) and (279) inconjunction with the extremes theorem (2.8.1.1.1):The set of all extreme directions from a pointed closed convex cone K ⊂ R nis not necessarily a linearly independent set, yet it must be a conicallyindependent set; (compare Figure 24 on page 71 with Figure 50a){extreme directions} ⇒ {c.i.}When a conically independent set of directions from pointed closed convexcone K is known to comprise generators, conversely, then all directions fromthat set must be extreme directions of the cone;{extreme directions} ⇔ {c.i. generators of pointed closed convex K}Barker & Carlson [21,1] call the extreme directions a minimal generatingset for a pointed closed convex cone. A minimal set of generators is thereforea conically independent set of generators, and vice versa, 2.52 for a pointedclosed convex cone.2.52 This converse does not hold for nonpointed closed convex cones as Table 2.10.0.0.1implies; e.g., ponder four conically independent generators for a plane (n=2, Figure 49).


2.10. CONIC INDEPENDENCE (C.I.) 145Hx 2x 30∂HFigure 51: Minimal set of generators X = [x 1 x 2 x 3 ]∈ R 2×3 (not extremedirections) for halfspace about origin; affinely and conically independent.x 1An arbitrary collection of n or fewer distinct extreme directions frompointed closed convex cone K ⊂ R n is not necessarily a linearly independentset; e.g., dual extreme directions (483) from Example 2.13.11.0.3.{≤ n extreme directions in R n } {l.i.}Linear dependence of few extreme directions is another convex idea thatcannot be explained by a two-dimensional picture as Barvinok suggests[26, p.vii]; indeed, it only first comes to light in four dimensions! But thereis a converse: [330,2.10.9]{extreme directions} ⇐ {l.i. generators of closed convex K}2.10.2.0.1 Example. Vertex-description of halfspace H about origin.From n + 1 points in R n we can make a vertex-description of a convexcone that is a halfspace H , where {x l ∈ R n , l=1... n} constitutes aminimal set of generators for a hyperplane ∂H through the origin. Anexample is illustrated in Figure 51. By demanding the augmented set


146 CHAPTER 2. CONVEX GEOMETRY{x l ∈ R n , l=1... n + 1} be affinely independent (we want vector x n+1 notparallel to ∂H), thenH = ⋃ (ζ x n+1 + ∂H)ζ ≥0= {ζ x n+1 + cone{x l ∈ R n , l=1... n} | ζ ≥0}= cone{x l ∈ R n , l=1... n + 1}(285)a union of parallel hyperplanes. Cardinality is one step beyond dimension ofthe ambient space, but {x l ∀l} is a minimal set of generators for this convexcone H which has no extreme elements.2.10.2.0.2 Exercise. Enumerating conically independent directions.Describe a nonpointed polyhedral cone in three dimensions having more than8 conically independent generators. (confer Table 2.10.0.0.1) 2.10.3 Utility of conic independencePerhaps the most useful application of conic independence is determinationof the intersection of closed convex cones from their halfspace-descriptions,or representation of the sum of closed convex cones from theirvertex-descriptions.⋂Ki∑KiA halfspace-description for the intersection of any number of closedconvex cones K i can be acquired by pruning normals; specifically,only the conically independent normals from the aggregate of all thehalfspace-descriptions need be retained.Generators for the sum of any number of closed convex cones K i canbe determined by retaining only the conically independent generatorsfrom the aggregate of all the vertex-descriptions.Such conically independent sets are not necessarily unique or minimal.2.11 When extreme means exposedFor any convex full-dimensional polyhedral set in R n , distinction betweenthe terms extreme and exposed vanishes [330,2.4] [113,2.2] for faces of


all dimensions except n ; their meanings become equivalent as we saw in Figure 20 (discussed in 2.6.1.2). In other words, each and every face of any polyhedral set (except the set itself) can be exposed by a hyperplane, and vice versa; e.g., Figure 24.

Lewis [246, 6] [215, 2.3.4] claims nonempty extreme proper subsets and the exposed subsets coincide for S^n_+ ; id est, each and every face of the positive semidefinite cone, whose dimension is less than dimension of the cone, is exposed. A more general discussion of cones having this property can be found in [341]; e.g., Lorentz cone (178) [20, II.A].

2.12 Convex polyhedra

Every polyhedron, such as the convex hull (86) of a bounded list X , can be expressed as the solution set of a finite system of linear equalities and inequalities, and vice versa. [113, 2.2]

2.12.0.0.1 Definition. Convex polyhedra, halfspace-description.
A convex polyhedron is the intersection of a finite number of halfspaces and hyperplanes;

   P = {y | Ay ≽ b ,  Cy = d} ⊆ R^n    (286)

where coefficients A and C generally denote matrices. Each row of C is a vector normal to a hyperplane, while each row of A is a vector inward-normal to a hyperplane partially bounding a halfspace. △

By the halfspaces theorem in 2.4.1.1.1, a polyhedron thus described is a closed convex set possibly not full-dimensional; e.g., Figure 20. Convex polyhedra 2.53 are finite-dimensional, comprising all affine sets (2.3.1), polyhedral cones, line segments, rays, halfspaces, convex polygons, solids [223, def.104/6 p.343], polychora, polytopes, 2.54 etcetera.

It follows from definition (286) by exposure that each face of a convex polyhedron is a convex polyhedron.

Projection of any polyhedron on a subspace remains a polyhedron. More generally, image and inverse image of a convex polyhedron under any linear transformation remains a convex polyhedron; [26, I.9] [307, thm.19.3] the foremost consequence being, invariance of polyhedral set closedness.

When b and d in (286) are 0, the resultant is a polyhedral cone. The set of all polyhedral cones is a subset of convex cones:

2.12.1 Polyhedral cone

From our study of cones, we see: the number of intersecting hyperplanes and halfspaces constituting a convex cone is possibly but not necessarily infinite. When the number is finite, the convex cone is termed polyhedral. That is the primary distinguishing feature between the set of all convex cones and polyhedra; all polyhedra, including polyhedral cones, are finitely generated [307, 19]. (Figure 52) We distinguish polyhedral cones in the set of all convex cones for this reason, although all convex cones of dimension 2 or less are polyhedral.

2.12.1.0.1 Definition. Polyhedral cone, halfspace-description. 2.55
(confer (103)) A polyhedral cone is the intersection of a finite number of halfspaces and hyperplanes about the origin;

   K = {y | Ay ≽ 0 ,  Cy = 0} ⊆ R^n         (a)
     = {y | Ay ≽ 0 ,  Cy ≽ 0 ,  Cy ≼ 0}     (b)
     = {y | [A ; C ; −C] y ≽ 0}             (c)    (287)

where coefficients A and C generally denote matrices of finite dimension. Each row of C is a vector normal to a hyperplane containing the origin, while each row of A is a vector inward-normal to a hyperplane containing the origin and partially bounding a halfspace. △

A polyhedral cone thus defined is closed, convex (2.4.1.1), has only a finite number of generators (2.8.1.2), and can be not full-dimensional. (Minkowski) Conversely, any finitely generated convex cone is polyhedral. (Weyl) [330, 2.8] [307, thm.19.1]

2.53 We consider only convex polyhedra throughout, but acknowledge the existence of concave polyhedra. [373, Kepler-Poinsot Solid]
2.54 Some authors distinguish bounded polyhedra via the designation polytope. [113, 2.2]
2.55 Rockafellar [307, 19] proposes affine sets be handled via complementary pairs of affine inequalities; e.g., Cy ≽ d and Cy ≼ d which, when taken together, can present severe difficulties to some interior-point methods of numerical solution.


[Figure 52 is a Venn-style diagram whose regions are labeled: convex polyhedra, bounded polyhedra, polyhedral cones, convex cones.]

Figure 52: Polyhedral cones are finitely generated, unbounded, and convex.

2.12.1.0.2 Exercise. Unbounded convex polyhedra.
Illustrate an unbounded polyhedron that is not a cone or its translation. 

From the definition it follows that any single hyperplane through the origin, or any halfspace partially bounded by a hyperplane through the origin is a polyhedral cone. The most familiar example of polyhedral cone is any quadrant (or orthant, 2.1.3) generated by Cartesian half-axes. Esoteric examples of polyhedral cone include the point at the origin, any line through the origin, any ray having the origin as base such as the nonnegative real line R_+ in subspace R , polyhedral flavor (proper) Lorentz cone (317), any subspace, and R^n . More polyhedral cones are illustrated in Figure 50 and Figure 24.

2.12.2 Vertices of convex polyhedra

By definition, a vertex (2.6.1.0.1) always lies on the relative boundary of a convex polyhedron. [223, def.115/6 p.358] In Figure 20, each vertex of the polyhedron is located at an intersection of three or more facets, and every edge belongs to precisely two facets [26, VI.1 p.252]. In Figure 24, the only vertex of that polyhedral cone lies at the origin.

The set of all polyhedral cones is clearly a subset of convex polyhedra and a subset of convex cones (Figure 52). Not all convex polyhedra are bounded; evidently, neither can they all be described by the convex hull of a bounded set of points as defined in (86). Hence a universal vertex-description


of polyhedra in terms of that same finite-length list X (76):

2.12.2.0.1 Definition. Convex polyhedra, vertex-description.
(confer 2.8.1.1.1) Denote the truncated a-vector

   a_{i:l} ≜ [ a_i ;  ⋮ ;  a_l ]    (288)

By discriminating a suitable finite-length generating list (or set) arranged columnar in X ∈ R^{n×N} , then any particular polyhedron may be described

   P = { Xa | a^T_{1:k} 1 = 1 ,  a_{m:N} ≽ 0 ,  {1... k} ∪ {m... N} = {1... N} }    (289)

where 0 ≤ k ≤ N and 1 ≤ m ≤ N+1. Setting k = 0 removes the affine equality condition. Setting m = N+1 removes the inequality. △

Coefficient indices in (289) may or may not be overlapping, but all the coefficients are assumed constrained. From (78), (86), and (103), we summarize how the coefficient conditions may be applied;

   affine sets        −→  a^T_{1:k} 1 = 1
   polyhedral cones   −→  a_{m:N} ≽ 0      }  ←− convex hull (m ≤ k)    (290)

It is always possible to describe a convex hull in the region of overlapping indices because, for 1 ≤ m ≤ k ≤ N

   {a_{m:k} | a^T_{m:k} 1 = 1 ,  a_{m:k} ≽ 0} ⊆ {a_{m:k} | a^T_{1:k} 1 = 1 ,  a_{m:N} ≽ 0}    (291)

Members of a generating list are not necessarily vertices of the corresponding polyhedron; certainly true for (86) and (289), some subset of list members reside in the polyhedron's relative interior. Conversely, when boundedness (86) applies, convex hull of the vertices is a polyhedron identical to convex hull of the generating list.

2.12.2.1 Vertex-description of polyhedral cone

Given closed convex cone K in a subspace of R^n having any set of generators for it arranged in a matrix X ∈ R^{n×N} as in (280), then that cone is described setting m = 1 and k = 0 in vertex-description (289):

   K = cone X = {Xa | a ≽ 0} ⊆ R^n    (103)

a conic hull of N generators.


2.12.2.2 Pointedness

(2.7.2.1.2) [330, 2.10] Assuming all generators constituting the columns of X ∈ R^{n×N} are nonzero, polyhedral cone K is pointed if and only if there is no nonzero a ≽ 0 that solves Xa = 0 ; id est, iff

   find  a
   subject to  Xa = 0
               1^T a = 1
               a ≽ 0    (292)

is infeasible or iff N(X) ∩ R^N_+ = 0 . 2.56 Otherwise, the cone will contain at least one line and there can be no vertex; id est, the cone cannot otherwise be pointed. Euclidean vector space R^n or any halfspace are examples of nonpointed polyhedral cone; hence, no vertex. This null-pointedness criterion Xa = 0 means that a pointed polyhedral cone is invariant to linear injective transformation.

Examples of pointed polyhedral cone K include: the origin, any 0-based ray in a subspace, any two-dimensional V-shaped cone in a subspace, any orthant in R^n or R^{m×n} ; e.g., nonnegative real line R_+ in vector space R .

2.12.3 Unit simplex

A peculiar subset of the nonnegative orthant with halfspace-description

   S ≜ {s | s ≽ 0 ,  1^T s ≤ 1} ⊆ R^n_+    (293)

is a unique bounded convex full-dimensional polyhedron called unit simplex (Figure 53) having n + 1 facets, n + 1 vertices, and dimension

   dim S = n    (294)

The origin supplies one vertex while heads of the standard basis [202] [331] {e_i , i = 1... n} in R^n constitute those remaining; 2.57 thus its vertex-description:

   S = conv{0, {e_i , i = 1... n}}
     = { [0  e_1  e_2  · · ·  e_n ] a | a^T 1 = 1 ,  a ≽ 0 }    (295)

2.56 If rank X = n , then the dual cone K* (2.13.1) is pointed. (310)
2.57 In R^0 the unit simplex is the point at the origin, in R the unit simplex is the line segment [0,1], in R² it is a triangle and its relative interior, in R³ it is the convex hull of a tetrahedron (Figure 53), in R⁴ it is the convex hull of a pentatope [373], and so on.
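Pointedness test (292) is itself a linear feasibility problem. A Matlab sketch (the Optimization Toolbox and the example generators are assumptions made only for illustration):

% Sketch of pointedness test (292): cone{X} is pointed iff there is
% no a >= 0 with 1'a = 1 solving X*a = 0.
X = [1 0 -1;                    % generators columnar (illustration):
     0 1  0];                   % cone contains the line through +/- e1, so not pointed
[n,N] = size(X);
Aeq = [X; ones(1,N)];  beq = [zeros(n,1); 1];
opts = optimoptions('linprog','Display','off');
[a,~,exitflag] = linprog(zeros(N,1), [], [], Aeq, beq, zeros(N,1), [], opts);
if exitflag == 1
    disp('cone is not pointed; certificate a ='), disp(a')
else
    disp('feasibility problem infeasible: cone is pointed')
end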


[Figure 53 plots the set S = {s | s ≽ 0 , 1^T s ≤ 1} in R³ with unit axis ticks.]

Figure 53: Unit simplex S in R³ is a unique solid tetrahedron but not regular.


2.13. DUAL CONE & GENERALIZED INEQUALITY 1532.12.3.1 SimplexThe unit simplex comes from a class of general polyhedra called simplex,having vertex-description: [94] [307] [371] [113]conv{x l ∈ R n } | l = 1... k+1, dim aff{x l } = k , n ≥ k (296)So defined, a simplex is a closed bounded convex set possibly notfull-dimensional. Examples of simplices, by increasing affine dimension, are:a point, any line segment, any triangle and its relative interior, a generaltetrahedron, any five-vertex polychoron, and so on.2.12.3.1.1 Definition. Simplicial cone.A proper polyhedral (2.7.2.2.1) cone K in R n is called simplicial iff K hasexactly n extreme directions; [20,II.A] equivalently, iff proper K has exactlyn linearly independent generators contained in any given set of generators.△simplicial cone ⇒ proper polyhedral coneThere are an infinite variety of simplicial cones in R n ; e.g., Figure 24,Figure 54, Figure 64. Any orthant is simplicial, as is any rotation thereof.2.12.4 Converting between descriptionsConversion between halfspace- (286) (287) and vertex-description (86) (289)is nontrivial, in general, [15] [113,2.2] but the conversion is easy forsimplices. [61,2.2.4] Nonetheless, we tacitly assume the two descriptions tobe equivalent. [307,19 thm.19.1] We explore conversions in2.13.4,2.13.9,and2.13.11:2.13 Dual cone & generalized inequality& biorthogonal expansionThese three concepts, dual cone, generalized inequality, and biorthogonalexpansion, are inextricably melded; meaning, it is difficult to completelydiscuss one without mentioning the others. The dual cone is critical in testsfor convergence by contemporary primal/dual methods for numerical solution


154 CHAPTER 2. CONVEX GEOMETRYFigure 54: Two views of a simplicial cone and its dual in R 3 (second view onnext page). Semiinfinite boundary of each cone is truncated for illustration.Each cone has three facets (confer2.13.11.0.3). Cartesian axes are drawnfor reference.


[Figure 54, second view.]


156 CHAPTER 2. CONVEX GEOMETRYof conic problems. [391] [277,4.5] For unique minimum-distance projectionon a closed convex cone K , the negative dual cone −K ∗ plays the role thatorthogonal complement plays for subspace projection. 2.58 (E.9.2.1) Indeed,−K ∗ is the algebraic complement in R n ;K ⊞ −K ∗ = R n (2023)where ⊞ denotes unique orthogonal vector sum.One way to think of a pointed closed convex cone is as a new kind ofcoordinate system whose basis is generally nonorthogonal; a conic system,very much like the familiar Cartesian system whose analogous cone is thefirst quadrant (the nonnegative orthant). Generalized inequality ≽ K is aformalized means to determine membership to any pointed closed convexcone K (2.7.2.2) whereas biorthogonal expansion is, fundamentally, anexpression of coordinates in a pointed conic system whose axes are linearlyindependent but not necessarily orthogonal. When cone K is the nonnegativeorthant, then these three concepts come into alignment with the Cartesianprototype: biorthogonal expansion becomes orthogonal expansion, the dualcone becomes identical to the orthant, and generalized inequality obeys atotal order entrywise.2.13.1 Dual coneFor any set K (convex or not), the dual cone [109,4.2]K ∗ {y ∈ R n | 〈y , x〉 ≥ 0 for all x ∈ K} (297)is a unique cone 2.59 that is always closed and convex because it is anintersection of halfspaces (2.4.1.1.1). Each halfspace has inward-normal x ,belonging to K , and boundary containing the origin; e.g., Figure 55a.When cone K is convex, there is a second and equivalent construction:Dual cone K ∗ is the union of each and every vector y inward-normalto a hyperplane supporting K (2.4.2.6.1); e.g., Figure 55b. When K is2.58 Namely, projection on a subspace is ascertainable from projection on its orthogonalcomplement (Figure 165).2.59 The dual cone is the negative polar cone defined by many authors; K ∗ = −K ◦ .[199,A.3.2] [307,14] [41] [26] [330,2.7]


represented by a halfspace-description such as (287), for example, where

   A ≜ [ a_1^T ;  ⋮ ;  a_m^T ] ∈ R^{m×n} ,   C ≜ [ c_1^T ;  ⋮ ;  c_p^T ] ∈ R^{p×n}    (298)

then the dual cone can be represented as the conic hull

   K* = cone{a_1 ,..., a_m , ±c_1 ,..., ±c_p}    (299)

a vertex-description, because each and every conic combination of normals from the halfspace-description of K yields another inward-normal to a hyperplane supporting K .

K* can also be constructed pointwise using projection theory from E.9.2: for P_K x the Euclidean projection of point x on closed convex cone K

   −K* = {x − P_K x | x ∈ R^n} = {x ∈ R^n | P_K x = 0}    (2024)

2.13.1.0.1 Exercise. Manual dual cone construction.
Perhaps the most instructive graphical method of dual cone construction is cut-and-try. Find the dual of each polyhedral cone from Figure 56 by using dual cone equation (297). 

2.13.1.0.2 Exercise. Dual cone definitions.
What is {x ∈ R^n | x^T z ≥ 0 ∀ z ∈ R^n} ?
What is {x ∈ R^n | x^T z ≥ 1 ∀ z ∈ R^n} ?
What is {x ∈ R^n | x^T z ≥ 1 ∀ z ∈ R^n_+} ? 

As defined, dual cone K* exists even when the affine hull of the original cone is a proper subspace; id est, even when the original cone is not full-dimensional. 2.60

To further motivate our understanding of the dual cone, consider the ease with which convergence can be ascertained in the following optimization problem (302p):

2.60 Rockafellar formulates dimension of K and K* . [307, 14.6.1] His monumental work Convex Analysis has not one figure or illustration. See [26, II.16] for a good illustration of Rockafellar's recession cone [42].


Figure 55: Two equivalent constructions of dual cone K* in R² : (a) Showing construction by intersection of halfspaces about 0 (drawn truncated). Only those two halfspaces whose bounding hyperplanes have inward-normal corresponding to an extreme direction of this pointed closed convex cone K ⊂ R² need be drawn; by (369). (b) Suggesting construction by union of inward-normals y to each and every hyperplane ∂H_+ supporting K . This interpretation is valid when K is convex because existence of a supporting hyperplane is then guaranteed (2.4.2.6).


[Figure 56 comprises two panels, (a) in R² with plot axes roughly spanning −1 to 1.5, and (b) in R³, each showing a polyhedral cone K , its dual K* , and dual-cone boundary ∂K* , annotated with the relation x ∈ K ⇔ 〈y , x〉 ≥ 0 for all y ∈ G(K*) (366).]

Figure 56: Dual cone construction by right angle. Each extreme direction of a proper polyhedral cone is orthogonal to a facet of its dual cone, and vice versa, in any dimension. (2.13.6.1) (a) This characteristic guides graphical construction of dual cone in two dimensions: It suggests finding dual-cone boundary ∂ by making right angles with extreme directions of polyhedral cone. The construction is then pruned so that each dual boundary vector does not exceed π/2 radians in angle with each and every vector from polyhedral cone. Were dual cone in R² to narrow, Figure 57 would be reached in limit. (b) Same polyhedral cone and its dual continued into three dimensions. (confer Figure 64)


Figure 57: Polyhedral cone K is a halfspace about origin in R² . Dual cone K* is a ray base 0, hence not full-dimensional in R² ; so K cannot be pointed, hence has no extreme directions. (Both convex cones appear truncated.)

2.13.1.0.3 Example. Dual problem. (confer 4.1)
Duality is a powerful and widely employed tool in applied mathematics for a number of reasons. First, the dual program is always convex even if the primal is not. Second, the number of variables in the dual is equal to the number of constraints in the primal which is often less than the number of variables in the primal program. Third, the maximum value achieved by the dual problem is often equal to the minimum of the primal. [299, 2.1.3] When not equal, the dual always provides a bound on the primal optimal objective. For convex problems, the dual variables provide necessary and sufficient optimality conditions:

Essentially, Lagrange duality theory concerns representation of a given optimization problem as half of a minimax problem. [307, 36] [61, 5.4] Given any real function f(x,z)

   minimize_x  maximize_z  f(x,z)  ≥  maximize_z  minimize_x  f(x,z)    (300)

always holds. When

   minimize_x  maximize_z  f(x,z)  =  maximize_z  minimize_x  f(x,z)    (301)


[Figure 58 depicts a saddle surface with curves labeled f(x, z_p) or f(x) and f(x_p , z) or g(z) over axes x and z .]

Figure 58: (Drawing by Lucas V. Barbosa) This serves as mnemonic icon for primal and dual problems, although objective functions from conic problems (302p) (302d) are linear. When problems are strong duals, duality gap is 0 ; meaning, functions f(x), g(z) (dotted) kiss at saddle value as depicted at center. Otherwise, dual functions never meet (f(x) > g(z)) by (300).


we have strong duality and then a saddle value [152] exists. (Figure 58) [304, p.3] Consider primal conic problem (p) (over cone K) and its corresponding dual problem (d): [293, 3.3.1] [239, 2.1] [240] given vectors α , β and matrix constant C

   (p)  minimize_x   α^T x              (d)  maximize_{y,z}  β^T z
        subject to   x ∈ K                   subject to      y ∈ K*
                     Cx = β                                  C^T z + y = α    (302)

Observe: the dual problem is also conic, and its objective function value never exceeds that of the primal;

   α^T x ≥ β^T z
   x^T (C^T z + y) ≥ (Cx)^T z
   x^T y ≥ 0    (303)

which holds by definition (297). Under the sufficient condition: (302p) is a convex problem 2.61 satisfying Slater's condition (p.285), then equality

   x^{⋆T} y^⋆ = 0    (304)

is achieved; which is necessary and sufficient for optimality (2.13.10.1.5); each problem (p) and (d) attains the same optimal value of its objective and each problem is called a strong dual to the other because the duality gap (optimal primal−dual objective difference) becomes 0. Then (p) and (d) are together equivalent to the minimax problem

   minimize_{x,y,z}  α^T x − β^T z
   subject to  x ∈ K ,  y ∈ K*
               Cx = β ,  C^T z + y = α    (p)−(d)    (305)

whose optimal objective always has the saddle value 0 (regardless of the particular convex cone K and other problem parameters). [359, 3.2] Thus determination of convergence for either primal or dual problem is facilitated.

Were convex cone K polyhedral (2.12.1), then problems (p) and (d) would be linear programs. Selfdual nonnegative orthant K yields the primal

2.61 A convex problem, essentially, has convex objective function optimized over a convex set. (4) In this context, problem (p) is convex if K is a convex cone.


2.13. DUAL CONE & GENERALIZED INEQUALITY 163prototypical linear program and its dual. Were K a positive semidefinitecone, then problem (p) has the form of prototypical semidefinite program(649) with (d) its dual. It is sometimes possible to solve a primal problem byway of its dual; advantageous when the dual problem is easier to solve thanthe primal problem, for example, because it can be solved analytically, or hassome special structure that can be exploited. [61,5.5.5] (4.2.3.1) 2.13.1.1 Key properties of dual coneFor any cone, (−K) ∗ = −K ∗For any cones K 1 and K 2 , K 1 ⊆ K 2 ⇒ K ∗ 1 ⊇ K ∗ 2 [330,2.7](Cartesian product) For closed convex cones K 1 and K 2 , theirCartesian product K = K 1 × K 2 is a closed convex cone, andK ∗ = K ∗ 1 × K ∗ 2 (306)where each dual is determined with respect to a cone’s ambient space.(conjugation) [307,14] [109,4.5] [330, p.52] When K is any convexcone, dual of the dual cone equals closure of the original cone;K ∗∗ = K (307)is the intersection of all halfspaces about the origin that contain K .Because K ∗∗∗ = K ∗ always holds,K ∗ = (K) ∗ (308)When convex cone K is closed, then dual of the dual cone is the originalcone; K ∗∗ = K ⇔ K is a closed convex cone: [330, p.53, p.95]K = {x ∈ R n | 〈y , x〉 ≥ 0 ∀y ∈ K ∗ } (309)If any cone K is full-dimensional, then K ∗ is pointed;K full-dimensional ⇒ K ∗ pointed (310)If the closure of any convex cone K is pointed, conversely, then K ∗ isfull-dimensional;K pointed ⇒ K ∗ full-dimensional (311)Given that a cone K ⊂ R n is closed and convex, K is pointed if and onlyif K ∗ − K ∗ = R n ; id est, iff K ∗ is full-dimensional. [55,3.3 exer.20]


164 CHAPTER 2. CONVEX GEOMETRY(vector sum) [307, thm.3.8] For convex cones K 1 and K 2K 1 + K 2 = conv(K 1 ∪ K 2 ) (312)is a convex cone.(dual vector-sum) [307,16.4.2] [109,4.6] For convex cones K 1 and K 2K ∗ 1 ∩ K ∗ 2 = (K 1 + K 2 ) ∗ = (K 1 ∪ K 2 ) ∗ (313)(closure of vector sum of duals) 2.62 For closed convex cones K 1 and K 2(K 1 ∩ K 2 ) ∗ = K ∗ 1 + K ∗ 2 = conv(K ∗ 1 ∪ K ∗ 2) (314)[330, p.96] where operation closure becomes superfluous under thesufficient condition K 1 ∩ int K 2 ≠ ∅ [55,3.3 exer.16,4.1 exer.7].(Krein-Rutman) Given closed convex cones K 1 ⊆ R m and K 2 ⊆ R nand any linear map A : R n → R m , then provided int K 1 ∩ AK 2 ≠ ∅[55,3.3.13, confer4.1 exer.9](A −1 K 1 ∩ K 2 ) ∗ = A T K ∗ 1 + K ∗ 2 (315)where dual of cone K 1 is with respect to its ambient space R m and dualof cone K 2 is with respect to R n , where A −1 K 1 denotes inverse image(2.1.9.0.1) of K 1 under mapping A , and where A T denotes adjointoperator. The particularly important case K 2 = R n is easy to show: forA T A = I(A T K 1 ) ∗ {y ∈ R n | x T y ≥ 0 ∀x∈A T K 1 }= {y ∈ R n | (A T z) T y ≥ 0 ∀z∈K 1 }= {A T w | z T w ≥ 0 ∀z∈K 1 }= A T K ∗ 1(316)2.62 These parallel analogous results for subspaces R 1 , R 2 ⊆ R n ; [109,4.6]R ⊥⊥ = R for any subspace R.(R 1 + R 2 ) ⊥ = R ⊥ 1 ∩ R ⊥ 2(R 1 ∩ R 2 ) ⊥ = R ⊥ 1 + R⊥ 2


K is proper if and only if K* is proper.

K is polyhedral if and only if K* is polyhedral. [330, 2.8]

K is simplicial if and only if K* is simplicial. (2.13.9.2) A simplicial cone and its dual are proper polyhedral cones (Figure 64, Figure 54), but not the converse.

K ⊞ −K* = R^n ⇔ K is closed and convex. (2023)

Any direction in a proper cone K is normal to a hyperplane separating K from −K* .

2.13.1.2 Examples of dual cone

When K is R^n , K* is the point at the origin, and vice versa.
When K is a subspace, K* is its orthogonal complement, and vice versa. (E.9.2.1, Figure 59)
When cone K is a halfspace in R^n with n > 0 (Figure 57 for example), the dual cone K* is a ray (base 0) belonging to that halfspace but orthogonal to its bounding hyperplane (that contains the origin), and vice versa.
When convex cone K is a closed halfplane in R³ (Figure 60), it is neither pointed or full-dimensional; hence, the dual cone K* can be neither full-dimensional or pointed.
When K is any particular orthant in R^n , the dual cone is identical; id est, K = K* .
When K is any quadrant in subspace R² , K* is a wedge-shaped polyhedral cone in R³ ; e.g., for K equal to quadrant I , K* = [ R²_+ ; R ].
When K is a polyhedral flavor Lorentz cone (confer (178))

   K_l = { [x ; t] ∈ R^n × R | ‖x‖_l ≤ t } ,   l ∈ {1, ∞}    (317)

its dual is the proper cone [61, exmp.2.25]

   K_q = K*_l = { [x ; t] ∈ R^n × R | ‖x‖_q ≤ t } ,   l ∈ {1, 2, ∞}    (318)

where ‖x‖*_l = ‖x‖_q is that norm dual to ‖x‖_l determined via solution to 1/l + 1/q = 1. Figure 62 illustrates K = K_1 and K* = K*_1 = K_∞ in R² × R .
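The ℓ1/ℓ∞ pairing in (317)–(318) can be probed numerically by Hölder's inequality. A Monte-Carlo Matlab sketch (the dimension, sample size, and sampling scheme are arbitrary choices for illustration): every inner product between a point of K_1 and a point of K_∞ should be nonnegative.

% Monte-Carlo probe of duality between the l1 cone (317) and l_inf cone (318):
% for (x,t) with ||x||_1 <= t and (y,s) with ||y||_inf <= s,
% the inner product x'*y + t*s is nonnegative.
n = 4;  trials = 1e4;  worst = inf;
for k = 1:trials
    x = randn(n,1);  t = norm(x,1)   + rand;   % random point of K_1
    y = randn(n,1);  s = norm(y,inf) + rand;   % random point of K_inf
    worst = min(worst, x'*y + t*s);
end
disp(worst)     % nonnegative (typically strictly positive with this sampling)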


[Figure 59 comprises two panels, one in R³ and one in R² , each showing cone K , dual cone K* , and the origin.]

Figure 59: When convex cone K is any one Cartesian axis, its dual cone is the convex hull of all axes remaining; its orthogonal complement. In R³, dual cone K* (drawn tiled and truncated) is a hyperplane through origin; its normal belongs to line K . In R², dual cone K* is a line through origin while convex cone K is that line orthogonal.


2.13. DUAL CONE & GENERALIZED INEQUALITY 167KK ∗Figure 60: K and K ∗ are halfplanes in R 3 ; blades. Both semiinfinite convexcones appear truncated. Each cone is like K from Figure 57 but embeddedin a two-dimensional subspace of R 3 . Cartesian coordinate axes drawn forreference.


168 CHAPTER 2. CONVEX GEOMETRY2.13.2 Abstractions of Farkas’ lemma2.13.2.0.1 Corollary. Generalized inequality and membership relation.[199,A.4.2] Let K be any closed convex cone and K ∗ its dual, and let xand y belong to a vector space R n . Theny ∈ K ∗ ⇔ 〈y , x〉 ≥ 0 for all x ∈ K (319)which is, merely, a statement of fact by definition of dual cone (297). Byclosure we have conjugation: [307, thm.14.1]x ∈ K ⇔ 〈y , x〉 ≥ 0 for all y ∈ K ∗ (320)which may be regarded as a simple translation of Farkas’ lemma [133] asin [307,22] to the language of convex cones, and a generalization of thewell-known Cartesian cone factx ≽ 0 ⇔ 〈y , x〉 ≥ 0 for all y ≽ 0 (321)for which implicitly K = K ∗ = R n + the nonnegative orthant.Membership relation (320) is often written instead as dual generalizedinequalities, when K and K ∗ are pointed closed convex cones,x ≽K0 ⇔ 〈y , x〉 ≥ 0 for all y ≽ 0 (322)K ∗meaning, coordinates for biorthogonal expansion of x (2.13.7.1.2,2.13.8)[365] must be nonnegative when x belongs to K . Conjugating,y ≽K ∗0 ⇔ 〈y , x〉 ≥ 0 for all x ≽K0 (323)⋄When pointed closed convex cone K is not polyhedral, coordinate axesfor biorthogonal expansion asserted by the corollary are taken from extremedirections of K ; expansion is assured by Carathéodory’s theorem (E.6.4.1.1).We presume, throughout, the obvious:x ∈ K ⇔ 〈y , x〉 ≥ 0 for all y ∈ K ∗ (320)⇔x ∈ K ⇔ 〈y , x〉 ≥ 0 for all y ∈ K ∗ , ‖y‖= 1(324)


2.13. DUAL CONE & GENERALIZED INEQUALITY 1692.13.2.0.2 Exercise. Test of dual generalized inequalities.Test Corollary 2.13.2.0.1 and (324) graphically on the two-dimensionalpolyhedral cone and its dual in Figure 56.(confer2.7.2.2) When pointed closed convex cone K is implicit from context:x ≽ 0 ⇔ x ∈ Kx ≻ 0 ⇔ x ∈ rel int K(325)Strict inequality x ≻ 0 means coordinates for biorthogonal expansion of xmust be positive when x belongs to rel int K . Strict membership relationsare useful; e.g., for any proper cone 2.63 K and its dual K ∗x ∈ int K ⇔ 〈y , x〉 > 0 for all y ∈ K ∗ , y ≠ 0 (326)x ∈ K , x ≠ 0 ⇔ 〈y , x〉 > 0 for all y ∈ int K ∗ (327)Conjugating, we get the dual relations:y ∈ int K ∗ ⇔ 〈y , x〉 > 0 for all x ∈ K , x ≠ 0 (328)y ∈ K ∗ , y ≠ 0 ⇔ 〈y , x〉 > 0 for all x ∈ int K (329)Boundary-membership relations for proper cones are also useful:x ∈ ∂K ⇔ ∃ y ≠ 0 〈y , x〉 = 0, y ∈ K ∗ , x ∈ K (330)y ∈ ∂K ∗ ⇔ ∃ x ≠ 0 〈y , x〉 = 0, x ∈ K , y ∈ K ∗ (331)which are consistent; e.g., x∈∂K ⇔ x /∈int K and x∈ K .2.13.2.0.3 Example. Linear inequality. [337,4](confer2.13.5.1.1) Consider a given matrix A and closed convex cone K .By membership relation we haveAy ∈ K ∗ ⇔ x T Ay≥0 ∀x ∈ K⇔ y T z ≥0 ∀z ∈ {A T x | x ∈ K}⇔ y ∈ {A T x | x ∈ K} ∗ (332)2.63 An open cone K is admitted to (326) and (329) by (19).


170 CHAPTER 2. CONVEX GEOMETRYThis implies{y | Ay ∈ K ∗ } = {A T x | x ∈ K} ∗ (333)When K is the selfdual nonnegative orthant (2.13.5.1), for example, thenand the dual relation{y | Ay ≽ 0} = {A T x | x ≽ 0} ∗ (334){y | Ay ≽ 0} ∗ = {A T x | x ≽ 0} (335)comes by a theorem of Weyl (p.148) that yields closedness for anyvertex-description of a polyhedral cone.2.13.2.1 Null certificate, Theorem of the alternativeIf in particular x p /∈ K a closed convex cone, then construction in Figure 55bsuggests there exists a supporting hyperplane (having inward-normalbelonging to dual cone K ∗ ) separating x p from K ; indeed, (320)x p /∈ K ⇔ ∃ y ∈ K ∗ 〈y , x p 〉 < 0 (336)Existence of any one such y is a certificate of null membership. From adifferent perspective,x p ∈ Kor in the alternative∃ y ∈ K ∗ 〈y , x p 〉 < 0(337)By alternative is meant: these two systems are incompatible; one system isfeasible while the other is not.2.13.2.1.1 Example. Theorem of the alternative for linear inequality.Myriad alternative systems of linear inequality can be explained in terms ofpointed closed convex cones and their duals.Beginning from the simplest Cartesian dual generalized inequalities (321)(with respect to nonnegative orthant R m + ),y ≽ 0 ⇔ x T y ≥ 0 for all x ≽ 0 (338)
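When K is polyhedral with known generators, a certificate y of null membership (336) can be computed by linear programming. A Matlab sketch (the generators, test point, and use of linprog are assumptions made only for illustration): the strict inequality 〈y , x_p〉 < 0 is replaced by 〈y , x_p〉 ≤ −1, which is without loss of generality because K* is a cone.

% Sketch of null-membership certificate (336) for polyhedral K = {X*a | a>=0}:
% find y in K* (i.e. X'*y >= 0) with <y, x_p> < 0.
X  = [1 0; 1 1];                % generators of K columnar (illustration)
xp = [-1; 0.5];                 % test point (illustration); not a member of K
n  = size(X,1);
% inequalities in the form A*y <= b required by linprog:
A = [-X' ; xp'];  b = [zeros(size(X,2),1); -1];   % X'*y >= 0 ,  xp'*y <= -1
opts = optimoptions('linprog','Display','off');
[y,~,exitflag] = linprog(zeros(n,1), A, b, [], [], [], [], opts);
if exitflag == 1
    disp('x_p is not a member of K; certificate y ='), disp(y')
else
    disp('no certificate found: x_p is a member of K')
end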


2.13. DUAL CONE & GENERALIZED INEQUALITY 171Given A∈ R n×m , we make vector substitution y ← A T yA T y ≽ 0 ⇔ x T A T y ≥ 0 for all x ≽ 0 (339)Introducing a new vector by calculating b Ax we getA T y ≽ 0 ⇔ b T y ≥ 0, b = Ax for all x ≽ 0 (340)By complementing sense of the scalar inequality:A T y ≽ 0or in the alternativeb T y < 0, ∃ b = Ax, x ≽ 0(341)If one system has a solution, then the other does not; define a convex coneK={y | A T y ≽0} , then y ∈ K or in the alternative y /∈ K .Scalar inequality b T y


172 CHAPTER 2. CONVEX GEOMETRYFrom membership relation (344), conversely, suppose we allow any y ∈ R m .Then because −x T Ay is unbounded below, x T (b −Ay)≥0 implies A T x=0:for y ∈ R mIn toto,A T x=0, b − Ay ∈ K ∗ ⇐ x T (b − Ay)≥ 0 ∀x ∈ K (346)b − Ay ∈ K ∗ ⇔ x T b ≥ 0, A T x=0 ∀x ∈ K (347)Vector x belongs to cone K but is also constrained to lie in a subspaceof R n specified by an intersection of hyperplanes through the origin{x∈ R n |A T x=0}. From this, alternative systems of generalized inequalitywith respect to pointed closed convex cones K and K ∗Ay ≼K ∗or in the alternativeb(348)x T b < 0, A T x=0, x ≽K0derived from (347) simply by taking the complementary sense of theinequality in x T b . These two systems are alternatives; if one system hasa solution, then the other does not. 2.64 [307, p.201]By invoking a strict membership relation between proper cones (326),we can construct a more exotic alternative strengthened by demand for aninterior point;b − Ay ≻0 ⇔ x T b > 0, A T x=0 ∀x ≽K ∗ K0, x ≠ 0 (349)2.64 If solutions at ±∞ are disallowed, then the alternative systems become insteadmutually exclusive with respect to nonpolyhedral cones. Simultaneous infeasibility of thetwo systems is not precluded by mutual exclusivity; called a weak alternative. Ye providesan example illustrating simultaneous [ ] infeasibility with [ respect ] to the positive semidefinitecone: x∈ S 2 1 00 1, y ∈ R , A = , and b = where x0 01 0T b means 〈x , b〉 .A better strategy than disallowing solutions at ±∞ is to demand an interior point asin (350) or Lemma 4.2.1.1.2. Then question of simultaneous infeasibility is moot.


From this, alternative systems of generalized inequality [61, pages: 50, 54, 262]

   Ay ≺_{K*} b
      or in the alternative
   x^T b ≤ 0 ,  A^T x = 0 ,  x ≽_K 0 ,  x ≠ 0    (350)

derived from (349) by taking complementary sense of the inequality in x^T b . And from this, alternative systems with respect to the nonnegative orthant attributed to Gordan in 1873: [160] [55, 2.2] substituting A ← A^T and setting b = 0

   A^T y ≺ 0
      or in the alternative
   Ax = 0 ,  x ≽ 0 ,  ‖x‖_1 = 1    (351)

Ben-Israel collects related results from Farkas, Motzkin, Gordan, and Stiemke in Motzkin transposition theorem. [33]

2.13.3 Optimality condition

(confer 2.13.10.1) The general first-order necessary and sufficient condition for optimality of solution x⋆ to a minimization problem ((302p) for example) with real differentiable convex objective function f(x) : R^n → R is [306, 3]

   ∇f(x⋆)^T (x − x⋆) ≥ 0   ∀ x ∈ C ,  x⋆ ∈ C    (352)

where C is a convex feasible set, 2.65 and where ∇f(x⋆) is the gradient (3.6) of f with respect to x evaluated at x⋆ . In words, negative gradient is normal to a hyperplane supporting the feasible set at a point of optimality. (Figure 67)

Direct solution to variation inequality (352), instead of the corresponding minimization, spawned from calculus of variations. [250, p.178] [132, p.37] One solution method solves an equivalent fixed point-of-projection problem

   x = P_C (x − ∇f(x))    (353)

2.65 presumably nonempty set of all variable values satisfying all given problem constraints; the set of feasible solutions.


174 CHAPTER 2. CONVEX GEOMETRYthat follows from a necessary and sufficient condition for projection on convexset C (Theorem E.9.1.0.2)P(x ⋆ −∇f(x ⋆ )) ∈ C , 〈x ⋆ −∇f(x ⋆ )−x ⋆ , x−x ⋆ 〉 ≤ 0 ∀x ∈ C (2010)Proof of equivalence [Wıκımization, Complementarity problem] is providedby Németh. Given minimum-distance projection problem1minimize ‖x − y‖2x 2subject to x ∈ C(354)on convex feasible set C for example, the equivalent fixed point problemx = P C (x − ∇f(x)) = P C (y) (355)is solved in one step.In the unconstrained case (C = R n ), optimality condition (352) reduces tothe classical rule (p.252)∇f(x ⋆ ) = 0, x ⋆ ∈ domf (356)which can be inferred from the following application:2.13.3.0.1 Example. Optimality for equality constrained problem.Given a real differentiable convex function f(x) : R n →R defined ondomain R n , a fat full-rank matrix C ∈ R p×n , and vector d∈ R p , the convexoptimization problemminimize f(x)x(357)subject to Cx = dis characterized by the well-known necessary and sufficient optimalitycondition [61,4.2.3]∇f(x ⋆ ) + C T ν = 0 (358)where ν ∈ R p is the eminent Lagrange multiplier. [305] [250, p.188] [229] Inother words, solution x ⋆ is optimal if and only if ∇f(x ⋆ ) belongs to R(C T ).Via membership relation, we now derive condition (358) from the generalfirst-order condition for optimality (352): For problem (357)C {x∈ R n | Cx = d} = {Zξ + x p | ξ ∈ R n−rank C } (359)


2.13. DUAL CONE & GENERALIZED INEQUALITY 175is the feasible set where Z ∈ R n×n−rank C holds basis N(C) columnar, andx p is any particular solution to Cx = d . Since x ⋆ ∈ C , we arbitrarily choosex p = x ⋆ which yields an equivalent optimality condition∇f(x ⋆ ) T Zξ ≥ 0 ∀ξ∈ R n−rank C (360)when substituted into (352). But this is simply half of a membership relationwhere the cone dual to R n−rank C is the origin in R n−rank C . We must thereforehaveZ T ∇f(x ⋆ ) = 0 ⇔ ∇f(x ⋆ ) T Zξ ≥ 0 ∀ξ∈ R n−rank C (361)meaning, ∇f(x ⋆ ) must be orthogonal to N(C). These conditionsare necessary and sufficient for optimality.Z T ∇f(x ⋆ ) = 0, Cx ⋆ = d (362)2.13.4 Discretization of membership relation2.13.4.1 Dual halfspace-descriptionHalfspace-description of dual cone K ∗ is equally simple as vertex-descriptionK = cone(X) = {Xa | a ≽ 0} ⊆ R n (103)for corresponding closed convex cone K : By definition (297), for X ∈ R n×Nas in (280), (confer (287))K ∗ = { y ∈ R n | z T y ≥ 0 for all z ∈ K }= { y ∈ R n | z T y ≥ 0 for all z = Xa , a ≽ 0 }= { y ∈ R n | a T X T y ≥ 0, a ≽ 0 }= { y ∈ R n | X T y ≽ 0 } (363)that follows from the generalized inequality and membership corollary (321).The semi-infinity of tests specified by all z ∈ K has been reduced to a setof generators for K constituting the columns of X ; id est, the test has beendiscretized.Whenever cone K is known to be closed and convex, the conjugatestatement must also hold; id est, given any set of generators for dual coneK ∗ arranged columnar in Y , then the consequent vertex-description of dualcone connotes a halfspace-description for K : [330,2.8]K ∗ = {Y a | a ≽ 0} ⇔ K ∗∗ = K = { z | Y T z ≽ 0 } (364)


176 CHAPTER 2. CONVEX GEOMETRY2.13.4.2 First dual-cone formulaFrom these two results (363) and (364) we deduce a general principle:From any [sic] given vertex-description (103) of closed convex cone K ,a halfspace-description of the dual cone K ∗ is immediate by matrixtransposition (363); conversely, from any given halfspace-description(287) of K , a dual vertex-description is immediate (364). [307, p.122]Various other converses are just a little trickier. (2.13.9,2.13.11)We deduce further: For any polyhedral cone K , the dual cone K ∗ is alsopolyhedral and K ∗∗ = K . [330, p.56]The generalized inequality and membership corollary is discretized in thefollowing theorem inspired by (363) and (364):2.13.4.2.1 Theorem. Discretized membership. (confer2.13.2.0.1) 2.66Given any set of generators, (2.8.1.2) denoted by G(K) for closed convexcone K ⊆ R n , and any set of generators denoted G(K ∗ ) for its dual such thatK = cone G(K) , K ∗ = cone G(K ∗ ) (365)then discretization of the generalized inequality and membership corollary(p.168) is necessary and sufficient for certifying cone membership: for x andy in vector space R nx ∈ K ⇔ 〈γ ∗ , x〉 ≥ 0 for all γ ∗ ∈ G(K ∗ ) (366)y ∈ K ∗ ⇔ 〈γ , y〉 ≥ 0 for all γ ∈ G(K) (367)⋄Proof. K ∗ = {G(K ∗ )a | a ≽ 0}. y ∈ K ∗ ⇔ y=G(K ∗ )a for some a ≽ 0.x∈ K ⇔ 〈y , x〉≥ 0 ∀y ∈ K ∗ ⇔ 〈G(K ∗ )a , x〉≥ 0 ∀a ≽ 0 (320). a ∑ i α ie iwhere e i is the i th member of a standard basis of possibly infinitecardinality. 〈G(K ∗ )a , x〉≥ 0 ∀a ≽ 0 ⇔ ∑ i α i〈G(K ∗ )e i , x〉≥ 0 ∀α i ≥ 0 ⇔〈G(K ∗ )e i , x〉≥ 0 ∀i. Conjugate relation (367) is similarly derived. 2.66 Stated in [21,1] without proof for pointed closed convex case.
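When generators for K* are known, membership test (366) reduces to a finite set of inner-product checks. A small Matlab sketch (the cone, its dual generators, and the test points are illustrative assumptions; the dual generators shown were computed for this particular cone):

% Sketch of discretized membership test (366): x belongs to K iff
% <gamma*, x> >= 0 for every generator gamma* of the dual cone K*.
% Illustration: K = cone{[1;0],[1;1]} in R^2, whose dual has the generators below.
Gdual = [0  1;                  % generators of K* columnar: [0;1] and [1;-1]
         1 -1];
inK = @(x) all(Gdual' * x >= 0);
disp(inK([3; 1]))               % 1 : x2 >= 0 and x1 >= x2, so x is in K
disp(inK([1; 2]))               % 0 : x1 < x2, so x is not in K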


2.13. DUAL CONE & GENERALIZED INEQUALITY 177xKFigure 61: x ≽ 0 with respect to K but not with respect to nonnegativeorthant R 2 + (pointed convex cone K drawn truncated).2.13.4.2.2 Exercise. Discretized dual generalized inequalities.Test Theorem 2.13.4.2.1 on Figure 56a using extreme directions there asgenerators.From the discretized membership theorem we may further deduce a moresurgical description of closed convex cone that prescribes only a finite numberof halfspaces for its construction when polyhedral: (Figure 55a)K = {x ∈ R n | 〈γ ∗ , x〉 ≥ 0 for all γ ∗ ∈ G(K ∗ )} (368)K ∗ = {y ∈ R n | 〈γ , y〉 ≥ 0 for all γ ∈ G(K)} (369)2.13.4.2.3 Exercise. Partial order induced by orthant.When comparison is with respect to the nonnegative orthant K = R n + , thenfrom the discretized membership theorem it directly follows:x ≼ z ⇔ x i ≤ z i ∀i (370)Generate simple counterexamples demonstrating that this equivalence withentrywise inequality holds only when the underlying cone inducing partialorder is the nonnegative orthant; e.g., explain Figure 61.
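In the spirit of Figure 61, a tiny numerical counterexample (the particular cone is an illustrative assumption): a point can satisfy x ≽ 0 with respect to K while having a negative entry, so the generalized inequality does not reduce to entrywise comparison unless the underlying cone is the nonnegative orthant.

% Counterexample for Exercise 2.13.4.2.3: membership x in K (x >= 0 w.r.t. K)
% need not imply entrywise nonnegativity when K is not the orthant.
X = [1 1; 2 -1];                % generators of K columnar: [1;2] and [1;-1]
a = [0.2; 0.8];                 % conic coefficients, a >= 0
x = X*a                         % x = [1; -0.4] belongs to K yet x(2) < 0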


178 CHAPTER 2. CONVEX GEOMETRY2.13.4.2.4 Example. Boundary membership to proper polyhedral cone.For a polyhedral cone, test (330) of boundary membership can be formulatedas a linear program. Say proper polyhedral cone K is specified completelyby generators that are arranged columnar inX = [ Γ 1 · · · Γ N ] ∈ R n×N (280)id est, K = {Xa | a ≽ 0}. Then membership relationc ∈ ∂K ⇔ ∃ y ≠ 0 〈y , c〉 = 0, y ∈ K ∗ , c ∈ K (330)may be expressedfinda , yy ≠ 0subject to c T y = 0X T y ≽ 0Xa = ca ≽ 0This linear feasibility problem has a solution iff c∈∂K .(371)2.13.4.3 smallest face of pointed closed convex coneGiven nonempty convex subset C of a convex set K , the smallest face of Kcontaining C is equivalent to intersection of all faces of K that contain C .[307, p.164] By (309), membership relation (330) means that each andevery point on boundary ∂K of proper cone K belongs to a hyperplanesupporting K whose normal y belongs to dual cone K ∗ . It follows that thesmallest face F , containing C ⊂ ∂K ⊂ R n on boundary of proper cone K ,is the intersection of all hyperplanes containing C whose normals are in K ∗ ;whereF(K ⊃ C) = {x ∈ K | x ⊥ K ∗ ∩ C ⊥ } (372)When C ∩ int K ≠ ∅ then F(K ⊃ C)= K .C ⊥ {y ∈ R n | 〈z, y〉=0 ∀z∈ C} (373)2.13.4.3.1 Example. Finding smallest face of cone.Suppose polyhedral cone K is completely specified by generators arrangedcolumnar inX = [ Γ 1 · · · Γ N ] ∈ R n×N (280)


2.13. DUAL CONE & GENERALIZED INEQUALITY 179To find its smallest face F(K ∋c) containing a given point c∈K , by thediscretized membership theorem 2.13.4.2.1, it is necessary and sufficient tofind generators for the smallest face. We may do so one generator at atime: 2.67 Consider generator Γ i . If there exists a vector z ∈ K ∗ orthogonal toc but not to Γ i , then Γ i cannot belong to the smallest face of K containingc . Such a vector z can be realized by a linear feasibility problem:find z ∈ R nsubject to c T z = 0X T z ≽ 0Γ T i z = 1(374)If there exists a solution z for which Γ T i z=1, thenΓ i ̸⊥ K ∗ ∩ c ⊥ = {z ∈ R n | X T z ≽0, c T z=0} (375)so Γ i /∈ F(K ∋c) ; solution z is a certificate of null membership. If thisproblem is infeasible for generator Γ i ∈ K , conversely, then Γ i ∈ F(K ∋c)by (372) and (363) because Γ i ⊥ K ∗ ∩ c ⊥ ; in that case, Γ i is a generator ofF(K ∋c).Since the constant in constraint Γ T i z =1 is arbitrary positively, then bytheorem of the alternative there is correspondence between (374) and (348)admitting the alternative linear problem: for a given point c∈Kfinda∈R N , µ∈Ra , µsubject to µc − Γ i = Xaa ≽ 0(376)Now if this problem is feasible (bounded) for generator Γ i ∈ K , then (374)is infeasible and Γ i ∈ F(K ∋c) is a generator of the smallest face thatcontains c .2.13.4.3.2 Exercise. Finding smallest face of pointed closed convex cone.Show that formula (372) and algorithms (374) and (376) apply more broadly;id est, a full-dimensional cone K is an unnecessary condition. 2.68 2.67 When finding a smallest face, generators of K in matrix X may not be diminished innumber (by discarding columns) until all generators of the smallest face have been found.2.68 Hint: A hyperplane, with normal in K ∗ , containing cone K is admissible.


180 CHAPTER 2. CONVEX GEOMETRY2.13.4.3.3 Exercise. Smallest face of positive semidefinite cone.Derive (221) from (372).2.13.5 Dual PSD cone and generalized inequalityThe dual positive semidefinite cone K ∗ is confined to S M by convention;S M ∗+ {Y ∈ S M | 〈Y , X〉 ≥ 0 for all X ∈ S M + } = S M + (377)The positive semidefinite cone is selfdual in the ambient space of symmetricmatrices [61, exmp.2.24] [39] [195,II]; K = K ∗ .Dual generalized inequalities with respect to the positive semidefinite conein the ambient space of symmetric matrices can therefore be simply stated:(Fejér)X ≽ 0 ⇔ tr(Y T X) ≥ 0 for all Y ≽ 0 (378)Membership to this cone can be determined in the isometrically isomorphicEuclidean space R M2 via (38). (2.2.1) By the two interpretations in2.13.1,positive semidefinite matrix Y can be interpreted as inward-normal to ahyperplane supporting the positive semidefinite cone.The fundamental statement of positive semidefiniteness, y T Xy ≥0 ∀y(A.3.0.0.1), evokes a particular instance of these dual generalizedinequalities (378):X ≽ 0 ⇔ 〈yy T , X〉 ≥ 0 ∀yy T (≽ 0) (1442)Discretization (2.13.4.2.1) allows replacement of positive semidefinitematrices Y with this minimal set of generators comprising the extremedirections of the positive semidefinite cone (2.9.2.7).2.13.5.1 selfdual conesFrom (131) (a consequence of the halfspaces theorem,2.4.1.1.1), where theonly finite value of the support function for a convex cone is 0 [199,C.2.3.1],or from discretized definition (369) of the dual cone we get a ratherself-evident characterization of selfdual cones:


Figure 62 (two panels, (a) and (b)): Two (truncated) views of a polyhedral cone K and its dual in R^3. Each of four extreme directions from K belongs to a face of dual cone K∗. Cone K, shrouded by its dual, is symmetrical about its axis of revolution. Each pair of diametrically opposed extreme directions from K makes a right angle. An orthant (or any rotation thereof; a simplicial cone) is not the only selfdual polyhedral cone in three or more dimensions; [20, 2.A.21] e.g., consider an equilateral having five extreme directions. In fact, every selfdual polyhedral cone in R^3 has an odd number of extreme directions. [22, thm.3]


K = K∗ ⇔ K = ⋂_{γ∈G(K)} { y | γᵀ y ≥ 0 }   (379)

In words: Cone K is selfdual iff its own extreme directions are inward-normals to a (minimal) set of hyperplanes bounding halfspaces whose intersection constructs it. This means each extreme direction of K is normal to a hyperplane exposing one of its own faces; a necessary but insufficient condition for selfdualness (Figure 62, for example).

Selfdual cones are necessarily full-dimensional. [30, I] Their most prominent representatives are the orthants (Cartesian cones), the positive semidefinite cone S^M_+ in the ambient space of symmetric matrices (377), and Lorentz cone (178) [20, II.A] [61, exmp.2.25]. In three dimensions, a plane containing the axis of revolution of a selfdual cone (and the origin) will produce a slice whose boundary makes a right angle.

2.13.5.1.1 Example. Linear matrix inequality. (confer 2.13.2.0.3)
Consider a peculiar vertex-description for a convex cone K defined over a positive semidefinite cone (instead of a nonnegative orthant as in definition (103)): for X ∈ S^n_+ given A_j ∈ S^n, j = 1...m

\mathcal{K} = \left\{ \begin{bmatrix} \langle A_1 , X\rangle \\ \vdots \\ \langle A_m , X\rangle \end{bmatrix} \;\middle|\; X \succeq 0 \right\} \subseteq \mathbb{R}^m
            = \left\{ \begin{bmatrix} \operatorname{svec}(A_1)^T \\ \vdots \\ \operatorname{svec}(A_m)^T \end{bmatrix} \operatorname{svec} X \;\middle|\; X \succeq 0 \right\}
            \triangleq \{ A \operatorname{svec} X \mid X \succeq 0 \}   (380)

where A ∈ R^{m×n(n+1)/2}, and where symmetric vectorization svec is defined in (56). Cone K is indeed convex because, by (175)

A svec X_p1 , A svec X_p2 ∈ K ⇒ A(ζ svec X_p1 + ξ svec X_p2) ∈ K for all ζ, ξ ≥ 0   (381)

since a nonnegatively weighted sum of positive semidefinite matrices must be positive semidefinite. (A.3.1.0.2) Although matrix A is finite-dimensional, K is generally not a polyhedral cone (unless m=1 or 2) simply because X ∈ S^n_+.
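A numerical sketch of the map X ↦ A svec X in (380) follows. It assumes the common symmetric-vectorization convention in which off-diagonal entries are scaled by √2 so that ⟨A, B⟩ = svec(A)ᵀ svec(B); the entry ordering and the matrices A_j below are illustrative assumptions, not taken from (56).

import numpy as np

def svec(Y):
    # Symmetric vectorization sketch: off-diagonal entries scaled by sqrt(2)
    # so that tr(A B) = svec(A).T @ svec(B) for symmetric A, B.
    # (Entry ordering here is row-major over the upper triangle.)
    idx = np.triu_indices(Y.shape[0])
    scale = np.where(idx[0] == idx[1], 1.0, np.sqrt(2.0))
    return scale * Y[idx]

# The map X |-> A svec X of (380), exercised on a random PSD matrix X
n = 3
A_mats = [np.diag([1.0, 2.0, 3.0]), np.ones((n, n))]    # hypothetical A_j in S^n
A = np.vstack([svec(Aj) for Aj in A_mats])              # A in R^{m x n(n+1)/2}
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))
Xpsd = B @ B.T                                          # X is positive semidefinite
print(np.allclose(A @ svec(Xpsd),
                  [np.trace(Aj @ Xpsd) for Aj in A_mats]))   # True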


2.13. DUAL CONE & GENERALIZED INEQUALITY 183Theorem. Inverse image closedness. [199, prop.A.2.1.12][307, thm.6.7] Given affine operator g : R m → R p , convex set D ⊆ R m ,and convex set C ⊆ R p g −1 (rel int C)≠ ∅ , thenrel intg(D)=g(rel int D), rel intg −1 C =g −1 (rel int C), g −1 C =g −1 C(382)⋄By this theorem, relative interior of K may always be expressedrel int K = {A svec X | X ≻ 0} (383)Because dim(aff K)=dim(A svec S n ) (127) then, provided the vectorized A jmatrices are linearly independent,rel int K = int K (14)meaning, cone K is full-dimensional ⇒ dual cone K ∗ is pointed by (310).<strong>Convex</strong> cone K can be closed, by this corollary:Corollary. Cone closedness invariance. [56,3] [51,3]Given linear operator A : R p → ( R m and closed)convex cone X ⊆ R p ,convex cone K = A(X) is closed A(X) = A(X) if and only ifN(A) ∩ X = {0} or N(A) ∩ X rel∂X (384)Otherwise, K = A(X) ⊇ A(X) ⊇ A(X). [307, thm.6.6]⋄If matrix A has no nontrivial nullspace, then A svec X is an isomorphismin X between cone S n + and range R(A) of matrix A ; (2.2.1.0.1,2.10.1.1)sufficient for convex cone K to be closed and have relative boundaryrel∂K = {A svec X | X ≽ 0, X ⊁ 0} (385)Now consider the (closed convex) dual cone:K ∗ = {y | 〈z , y〉 ≥ 0 for all z ∈ K} ⊆ R m= {y | 〈z , y〉 ≥ 0 for all z = A svec X , X ≽ 0}= {y | 〈A svec X , y〉 ≥ 0 for all X ≽ 0}= { y | 〈svec X , A T y〉 ≥ 0 for all X ≽ 0 }= { y | svec −1 (A T y) ≽ 0 } (386)


184 CHAPTER 2. CONVEX GEOMETRYthat follows from (378) and leads to an equally peculiar halfspace-descriptionK ∗ = {y ∈ R m |m∑y j A j ≽ 0} (387)j=1The summation inequality with respect to positive semidefinite cone S n + isknown as linear matrix inequality. [59] [151] [262] [362] Although we alreadyknow that the dual cone is convex (2.13.1), inverse image theorem 2.1.9.0.1certifies convexity of K ∗ which is the inverse image of positive semidefinitecone S n + under linear transformation g(y) ∑ y j A j . And although wealready know that the dual cone is closed, it is certified by (382). By theinverse image closedness theorem, dual cone relative interior may always beexpressedm∑rel int K ∗ = {y ∈ R m | y j A j ≻ 0} (388)Function g(y) on R m is an isomorphism when the vectorized A j matricesare linearly independent; hence, uniquely invertible. Inverse image K ∗ musttherefore have dimension equal to dim ( R(A T )∩ svec S n +)(49) and relativeboundaryrel ∂K ∗ = {y ∈ R m |j=1m∑y j A j ≽ 0,j=1m∑y j A j ⊁ 0} (389)When this dimension equals m , then dual cone K ∗ is full-dimensionalj=1rel int K ∗ = int K ∗ (14)which implies: closure of convex cone K is pointed (310).2.13.6 Dual of pointed polyhedral coneIn a subspace of R n , now we consider a pointed polyhedral cone K given interms of its extreme directions Γ i arranged columnar inX = [ Γ 1 Γ 2 · · · Γ N ] ∈ R n×N (280)The extremes theorem (2.8.1.1.1) provides the vertex-description of apointed polyhedral cone in terms of its finite number of extreme directionsand its lone vertex at the origin:


2.13. DUAL CONE & GENERALIZED INEQUALITY 1852.13.6.0.1 Definition. Pointed polyhedral cone, vertex-description.Given pointed polyhedral cone K in a subspace of R n , denoting its i th extremedirection by Γ i ∈ R n arranged in a matrix X as in (280), then that cone maybe described: (86) (confer (188) (293))K = { [0 X ] aζ | a T 1 = 1, a ≽ 0, ζ ≥ 0 }{= Xaζ | a T 1 ≤ 1, a ≽ 0, ζ ≥ 0 }{ } (390)= Xb | b ≽ 0 ⊆ Rnthat is simply a conic hull (like (103)) of a finite number N of directions.Relative interior may always be expressedrel int K = {Xb | b ≻ 0} ⊂ R n (391)but identifying the cone’s relative boundary in this mannerrel∂K = {Xb | b ≽ 0, b ⊁ 0} (392)holds only when matrix X represents a bijection onto its range; in otherwords, some coefficients meeting lower bound zero (b∈∂R N +) do notnecessarily provide membership to relative boundary of cone K . △Whenever cone K is pointed closed and convex (not only polyhedral), thendual cone K ∗ has a halfspace-description in terms of the extreme directionsΓ i of K :K ∗ = { y | γ T y ≥ 0 for all γ ∈ {Γ i , i=1... N} ⊆ rel ∂K } (393)because when {Γ i } constitutes any set of generators for K , the discretizationresult in2.13.4.1 allows relaxation of the requirement ∀x∈ K in (297) to∀γ∈{Γ i } directly. 2.69 That dual cone so defined is unique, identical to (297),polyhedral whenever the number of generators N is finiteK ∗ = { y | X T y ≽ 0 } ⊆ R n (363)and is full-dimensional because K is assumed pointed.necessarily pointed unless K is full-dimensional (2.13.1.1).But K ∗ is not2.69 The extreme directions of K constitute a minimal set of generators. Formulae andconversions to vertex-description of the dual cone are in2.13.9 and2.13.11.


186 CHAPTER 2. CONVEX GEOMETRY2.13.6.1 Facet normal & extreme directionWe see from (363) that the conically independent generators of cone K(namely, the extreme directions of pointed closed convex cone K constitutingthe N columns of X) each define an inward-normal to a hyperplanesupporting dual cone K ∗ (2.4.2.6.1) and exposing a dual facet when N isfinite. Were K ∗ pointed and finitely generated, then by closure the conjugatestatement would also hold; id est, the extreme directions of pointed K ∗ eachdefine an inward-normal to a hyperplane supporting K and exposing a facetwhen N is finite. Examine Figure 56 or Figure 62, for example.We may conclude, the extreme directions of proper polyhedral K arerespectively orthogonal to the facets of K ∗ ; likewise, the extreme directionsof proper polyhedral K ∗ are respectively orthogonal to the facets of K .2.13.7 Biorthogonal expansion by example2.13.7.0.1 Example. Relationship to dual polyhedral cone.Simplicial cone K illustrated in Figure 63 induces a partial order on R 2 . Allpoints greater than x with respect to K , for example, are contained in thetranslated cone x + K . The extreme directions Γ 1 and Γ 2 of K do not makean orthogonal set; neither do extreme directions Γ 3 and Γ 4 of dual cone K ∗ ;rather, we have the biorthogonality condition, [365]Γ T 4 Γ 1 = Γ T 3 Γ 2 = 0Γ T 3 Γ 1 ≠ 0, Γ T 4 Γ 2 ≠ 0(394)Biorthogonal expansion of x ∈ K is thenx = Γ 1Γ T 3 xΓ T 3 Γ 1+ Γ 2Γ T 4 xΓ T 4 Γ 2(395)where Γ T 3 x/(Γ T 3 Γ 1 ) is the nonnegative coefficient of nonorthogonal projection(E.6.1) of x on Γ 1 in the direction orthogonal to Γ 3 (y in Figure 163, p.717),and where Γ T 4 x/(Γ T 4 Γ 2 ) is the nonnegative coefficient of nonorthogonalprojection of x on Γ 2 in the direction orthogonal to Γ 4 (z in Figure 163);they are coordinates in this nonorthogonal system. Those coefficients mustbe nonnegative x ≽ K0 because x ∈ K (325) and K is simplicial.If we ascribe the extreme directions of K to the columns of a matrixX [ Γ 1 Γ 2 ] (396)


Figure 63: (confer Figure 163) Simplicial cone K in R^2 and its dual K∗ drawn truncated. Conically independent generators Γ_1 and Γ_2 constitute extreme directions of K while Γ_3 and Γ_4 constitute extreme directions of K∗ (with Γ_1 ⊥ Γ_4 and Γ_2 ⊥ Γ_3). Dotted ray-pairs bound translated cones K. Point x is comparable to point z (and vice versa) but not to y ; z ≽_K x ⇔ z − x ∈ K ⇔ z − x ≽_K 0 iff ∃ nonnegative coordinates for biorthogonal expansion of z − x. Point y is not comparable to z because z does not belong to y ± K. Translating a negated cone is quite helpful for visualization: u ≼_K w ⇔ u ∈ w − K ⇔ u − w ≼_K 0. Points need not belong to K to be comparable; e.g., all points less than w (with respect to K) belong to w − K.


188 CHAPTER 2. CONVEX GEOMETRYthen we find that the pseudoinverse transpose matrixX †T =[Γ 31Γ T 3 Γ 1Γ 41Γ T 4 Γ 2](397)holds the extreme directions of the dual cone. Therefore,x = XX † x (403)is the biorthogonal expansion (395) (E.0.1), and the biorthogonalitycondition (394) can be expressed succinctly (E.1.1) 2.70X † X = I (404)Expansion w=XX † w , for any w ∈ R 2 , is unique if and only if the extremedirections of K are linearly independent; id est, iff X has no nullspace. 2.13.7.1 Pointed cones and biorthogonalityBiorthogonality condition X † X = I from Example 2.13.7.0.1 means Γ 1 andΓ 2 are linearly independent generators of K (B.1.1.1); generators becauseevery x∈ K is their conic combination. From2.10.2 we know that meansΓ 1 and Γ 2 must be extreme directions of K .A biorthogonal expansion is necessarily associated with a pointed closedconvex cone; pointed, otherwise there can be no extreme directions (2.8.1).We will address biorthogonal expansion with respect to a pointed polyhedralcone not full-dimensional in2.13.8.2.13.7.1.1 Example. Expansions implied by diagonalization.(confer6.4.3.2.1) When matrix X ∈ R M×M is diagonalizable (A.5),X = SΛS −1 = [ s 1 · · · s M ] Λ⎣⎡w T1.w T M⎤⎦ =M∑λ i s i wi T (1547)2.70 Possibly confusing is the fact that formula XX † x is simultaneously the orthogonalprojection of x on R(X) (1910), and a sum of nonorthogonal projections of x ∈ R(X) onthe range of each and every column of full-rank X skinny-or-square (E.5.0.0.2).i=1


2.13. DUAL CONE & GENERALIZED INEQUALITY 189coordinates for biorthogonal expansion are its eigenvalues λ i (contained indiagonal matrix Λ) when expanded in S ;⎡X = SS −1 X = [s 1 · · · s M ] ⎣⎤w1 T X. ⎦ =wM TX M∑λ i s i wi T (398)Coordinate value depend upon the geometric relationship of X to its linearlyindependent eigenmatrices s i w T i . (A.5.0.3,B.1.1)Eigenmatrices s i w T i are linearly independent dyads constituted by rightand left eigenvectors of diagonalizable X and are generators of somepointed polyhedral cone K in a subspace of R M×M .When S is real and X belongs to that polyhedral cone K , for example,then coordinates of expansion (the eigenvalues λ i ) must be nonnegative.When X = QΛQ T is symmetric, coordinates for biorthogonal expansionare its eigenvalues when expanded in Q ; id est, for X ∈ S Mi=1X = QQ T X =M∑q i qi T X =i=1M∑λ i q i qi T ∈ S M (399)i=1becomes an orthogonal expansion with orthonormality condition Q T Q=Iwhere λ i is the i th eigenvalue of X , q i is the corresponding i th eigenvectorarranged columnar in orthogonal matrixQ = [q 1 q 2 · · · q M ] ∈ R M×M (400)and where eigenmatrix q i qiT is an extreme direction of some pointedpolyhedral cone K ⊂ S M and an extreme direction of the positive semidefinitecone S M + .Orthogonal expansion is a special case of biorthogonal expansion ofX ∈ aff K occurring when polyhedral cone K is any rotation about theorigin of an orthant belonging to a subspace.Similarly, when X = QΛQ T belongs to the positive semidefinite cone inthe subspace of symmetric matrices, coordinates for orthogonal expansion


190 CHAPTER 2. CONVEX GEOMETRYmust be its nonnegative eigenvalues (1450) when expanded in Q ; id est, forX ∈ S M +M∑M∑X = QQ T X = q i qi T X = λ i q i qi T ∈ S M + (401)i=1where λ i ≥0 is the i th eigenvalue of X . This means matrix Xsimultaneously belongs to the positive semidefinite cone and to the pointedpolyhedral cone K formed by the conic hull of its eigenmatrices. 2.13.7.1.2 Example. Expansion respecting nonpositive orthant.Suppose x ∈ K any orthant in R n . 2.71 Then coordinates for biorthogonalexpansion of x must be nonnegative; in fact, absolute value of the Cartesiancoordinates.Suppose, in particular, x belongs to the nonpositive orthant K = R n − .Then the biorthogonal expansion becomes an orthogonal expansioni=1x = XX T x =n∑−e i (−e T i x) =i=1n∑−e i |e T i x| ∈ R n − (402)i=1and the coordinates of expansion are nonnegative. For this orthant K we haveorthonormality condition X T X = I where X = −I , e i ∈ R n is a standardbasis vector, and −e i is an extreme direction (2.8.1) of K .Of course, this expansion x=XX T x applies more broadly to domain R n ,but then the coordinates each belong to all of R .2.13.8 Biorthogonal expansion, derivationBiorthogonal expansion is a means for determining coordinates in a pointedconic coordinate system characterized by a nonorthogonal basis. Studyof nonorthogonal bases invokes pointed polyhedral cones and their duals;extreme directions of a cone K are assumed to constitute the basis whilethose of the dual cone K ∗ determine coordinates.Unique biorthogonal expansion with respect to K depends upon existenceof its linearly independent extreme directions: Polyhedral cone K must bepointed; then it possesses extreme directions. Those extreme directions mustbe linearly independent to uniquely represent any point in their span.2.71 An orthant is simplicial and selfdual.
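A short numerical sketch of biorthogonal expansion follows, for a hypothetical pointed cone K = cone(X) whose extreme directions (the columns of X, skinny full-rank) are linearly independent: the pseudoinverse supplies the dual generators, X†X = I is the biorthogonality condition (404), and the coordinates of any x ∈ K in expansion (411) are nonnegative. The matrix and coefficients are illustrative.

import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                  # extreme directions of K, columnar
Xp = np.linalg.pinv(X)                      # X^dagger ; its rows are dual generators transposed
print(np.allclose(Xp @ X, np.eye(2)))       # biorthogonality X^dagger X = I   (404)

a = np.array([0.5, 2.0])                    # nonnegative conic coefficients
x = X @ a                                   # hence x in K (and x in aff K = R(X))
coords = Xp @ x                             # coordinates Gamma*_i^T x
print(np.allclose(X @ coords, x))           # expansion x = X X^dagger x reproduces x   (403)
print(coords)                               # [0.5 2. ] : nonnegative, since x in K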


2.13. DUAL CONE & GENERALIZED INEQUALITY 191We consider nonempty pointed polyhedral cone K possibly notfull-dimensional; id est, we consider a basis spanning a subspace. Then weneed only observe that section of dual cone K ∗ in the affine hull of K because,by expansion of x , membership x∈aff K is implicit and because any breachof the ordinary dual cone into ambient space becomes irrelevant (2.13.9.3).Biorthogonal expansionx = XX † x ∈ aff K = aff cone(X) (403)is expressed in the extreme directions {Γ i } of K arranged columnar inunder assumption of biorthogonalityX = [ Γ 1 Γ 2 · · · Γ N ] ∈ R n×N (280)X † X = I (404)where † denotes matrix pseudoinverse (E). We therefore seek, in thissection, a vertex-description for K ∗ ∩ aff K in terms of linearly independentdual generators {Γ ∗ i }⊂aff K in the same finite quantity 2.72 as the extremedirections {Γ i } ofK = cone(X) = {Xa | a ≽ 0} ⊆ R n (103)We assume the quantity of extreme directions N does not exceed thedimension n of ambient vector space because, otherwise, the expansion couldnot be unique; id est, assume N linearly independent extreme directionshence N ≤ n (X skinny 2.73 -or-square full-rank). In other words, fat full-rankmatrix X is prohibited by uniqueness because of existence of an infinity ofright inverses;polyhedral cones whose extreme directions number in excess of theambient space dimension are precluded in biorthogonal expansion.2.72 When K is contained in a proper subspace of R n , the ordinary dual cone K ∗ will havemore generators in any minimal set than K has extreme directions.2.73 “Skinny” meaning thin; more rows than columns.


192 CHAPTER 2. CONVEX GEOMETRY2.13.8.1 x ∈ KSuppose x belongs to K ⊆ R n . Then x =Xa for some a≽0. Vector a isunique only when {Γ i } is a linearly independent set. 2.74 Vector a∈ R N cantake the form a =Bx if R(B)= R N . Then we require Xa =XBx = x andBx=BXa = a . The pseudoinverse B =X † ∈ R N×n (E) is suitable when Xis skinny-or-square and full-rank. In that case rankX =N , and for all c ≽ 0and i=1... Na ≽ 0 ⇔ X † Xa ≽ 0 ⇔ a T X T X †T c ≥ 0 ⇔ Γ T i X †T c ≥ 0 (405)The penultimate inequality follows from the generalized inequality andmembership corollary, while the last inequality is a consequence of thatcorollary’s discretization (2.13.4.2.1). 2.75 From (405) and (393) we deduceK ∗ ∩ aff K = cone(X †T ) = {X †T c | c ≽ 0} ⊆ R n (406)is the vertex-description for that section of K ∗ in the affine hull of K becauseR(X †T ) = R(X) by definition of the pseudoinverse. From (310), we knowK ∗ ∩ aff K must be pointed if rel int K is logically assumed nonempty withrespect to aff K .Conversely, suppose full-rank skinny-or-square matrix (N ≤ n)[ ]X †T Γ ∗ 1 Γ ∗ 2 · · · Γ ∗ N ∈ R n×N (407)comprises the extreme directions {Γ ∗ i } ⊂ aff K of the dual-cone intersectionwith the affine hull of K . 2.76 From the discretized membership theorem and2.74 Conic independence alone (2.10) is insufficient to guarantee uniqueness.2.75a ≽ 0 ⇔ a T X T X †T c ≥ 0 ∀(c ≽ 0 ⇔ a T X T X †T c ≥ 0 ∀a ≽ 0)∀(c ≽ 0 ⇔ Γ T i X†T c ≥ 0 ∀i) Intuitively, any nonnegative vector a is a conic combination of the standard basis{e i ∈ R N }; a≽0 ⇔ a i e i ≽0 for all i. The last inequality in (405) is a consequence of thefact that x=Xa may be any extreme direction of K , in which case a is a standard basisvector; a = e i ≽0. Theoretically, because c ≽ 0 defines a pointed polyhedral cone (in fact,the nonnegative orthant in R N ), we can take (405) one step further by discretizing c :a ≽ 0 ⇔ Γ T i Γ ∗ j ≥ 0 for i,j =1... N ⇔ X † X ≥ 0In words, X † X must be a matrix whose entries are each nonnegative.2.76 When closed convex cone K is not full-dimensional, K ∗ has no extreme directions.


2.13. DUAL CONE & GENERALIZED INEQUALITY 193(314) we get a partial dual to (393); id est, assuming x∈aff cone X{ }x ∈ K ⇔ γ ∗T x ≥ 0 for all γ ∗ ∈ Γ ∗ i , i=1... N ⊂ ∂K ∗ ∩ aff K (408)⇔ X † x ≽ 0 (409)that leads to a partial halfspace-description,K = { x∈aff coneX | X † x ≽ 0 } (410)For γ ∗ =X †T e i , any x =Xa , and for all i we have e T i X † Xa = e T i a ≥ 0only when a ≽ 0. Hence x∈ K .When X is full-rank, then the unique biorthogonal expansion of x ∈ Kbecomes (403)x = XX † x =N∑i=1Γ i Γ ∗Ti x (411)whose coordinates Γ ∗Ti x must be nonnegative because K is assumed pointedclosed and convex. Whenever X is full-rank, so is its pseudoinverse X † .(E) In the present case, the columns of X †T are linearly independentand generators of the dual cone K ∗ ∩ aff K ; hence, the columns constituteits extreme directions. (2.10.2) That section of the dual cone is itself apolyhedral cone (by (287) or the cone intersection theorem,2.7.2.1.1) havingthe same number of extreme directions as K .2.13.8.2 x ∈ aff KThe extreme directions of K and K ∗ ∩ aff K have a distinct relationship;because X † X = I , then for i,j = 1... N , Γ T i Γ ∗ i = 1, while for i ≠ j ,Γ T i Γ ∗ j = 0. Yet neither set of extreme directions, {Γ i } nor {Γ ∗ i } , isnecessarily orthogonal. This is a biorthogonality condition, precisely,[365,2.2.4] [202] implying each set of extreme directions is linearlyindependent. (B.1.1.1)The biorthogonal expansion therefore applies more broadly; meaning,for any x ∈ aff K , vector x can be uniquely expressed x =Xb whereb∈R N because aff K contains the origin. Thus, for any such x∈ R(X)(conferE.1.1), biorthogonal expansion (411) becomes x =XX † Xb =Xb .


194 CHAPTER 2. CONVEX GEOMETRY2.13.9 Formulae finding dual cone2.13.9.1 Pointed K , dual, X skinny-or-square full-rankWe wish to derive expressions for a convex cone and its ordinary dualunder the general assumptions: pointed polyhedral K denoted by its linearlyindependent extreme directions arranged columnar in matrix X such thatThe vertex-description is given:rank(X ∈ R n×N ) = N dim aff K ≤ n (412)K = {Xa | a ≽ 0} ⊆ R n (413)from which a halfspace-description for the dual cone follows directly:By defining a matrixK ∗ = {y ∈ R n | X T y ≽ 0} (414)X ⊥ basis N(X T ) (415)(a columnar basis for the orthogonal complement of R(X)), we can sayaff cone X = aff K = {x | X ⊥T x = 0} (416)meaning K lies in a subspace, perhaps R n . Thus a halfspace-descriptionand a vertex-description 2.77 from (314)K = {x∈ R n | X † x ≽ 0, X ⊥T x = 0} (417)K ∗ = { [X †T X ⊥ −X ⊥ ]b | b ≽ 0 } ⊆ R n (418)These results are summarized for a pointed polyhedral cone, havinglinearly independent generators, and its ordinary dual:Cone Table 1 K K ∗vertex-description X X †T , ±X ⊥halfspace-description X † , X ⊥T X T2.77 These descriptions are not unique. A vertex-description of the dual cone, for example,might use four conically independent generators for a plane (2.10.0.0.1, Figure 49) whenonly three would suffice.
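Cone Table 1 translates directly into a few lines of linear algebra. The sketch below (numpy/scipy, hypothetical example data) builds X† and a columnar basis X⊥ for N(Xᵀ) as in (415), then tests membership in K by (417) and in K∗ by (414).

import numpy as np
from scipy.linalg import null_space

# A hypothetical pointed cone K = cone(X) lying in a proper subspace of R^3
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                 # rank 2, so dim aff K = 2 < n = 3
Xdag = np.linalg.pinv(X)                   # X^dagger  (halfspace-description of K)
Xperp = null_space(X.T)                    # columnar basis for N(X^T)   (415)

def in_K(x, tol=1e-9):
    # K = {x | X^dagger x >= 0 ,  X^perp.T x = 0}   (417)
    return np.all(Xdag @ x >= -tol) and np.allclose(Xperp.T @ x, 0, atol=tol)

def in_K_dual(y, tol=1e-9):
    # K* = {y | X^T y >= 0}                         (414)
    return np.all(X.T @ y >= -tol)

x = X @ np.array([1.0, 3.0])               # a point of K
print(in_K(x), in_K_dual(np.array([1.0, 1.0, 0.0])))    # True True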


2.13. DUAL CONE & GENERALIZED INEQUALITY 1952.13.9.2 Simplicial caseWhen a convex cone is simplicial (2.12.3), Cone Table 1 simplifies becausethen aff coneX = R n : For square X and assuming simplicial K such thatrank(X ∈ R n×N ) = N dim aff K = n (419)we haveCone Table S K K ∗vertex-description X X †Thalfspace-description X † X TFor example, vertex-description (418) simplifies toK ∗ = {X †T b | b ≽ 0} ⊂ R n (420)Now, because dim R(X)= dim R(X †T ) , (E) the dual cone K ∗ is simplicialwhenever K is.2.13.9.3 Cone membership relations in a subspaceIt is obvious by definition (297) of ordinary dual cone K ∗ , in ambient vectorspace R , that its determination instead in subspace S ⊆ R is identicalto its intersection with S ; id est, assuming closed convex cone K ⊆ S andK ∗ ⊆ R(K ∗ were ambient S) ≡ (K ∗ in ambient R) ∩ S (421)because{y ∈ S | 〈y , x〉 ≥ 0 for all x ∈ K} = {y ∈ R | 〈y , x〉 ≥ 0 for all x ∈ K} ∩ S(422)From this, a constrained membership relation for the ordinary dual coneK ∗ ⊆ R , assuming x,y ∈ S and closed convex cone K ⊆ Sy ∈ K ∗ ∩ S ⇔ 〈y , x〉 ≥ 0 for all x ∈ K (423)By closure in subspace S we have conjugation (2.13.1.1):x ∈ K ⇔ 〈y , x〉 ≥ 0 for all y ∈ K ∗ ∩ S (424)


196 CHAPTER 2. CONVEX GEOMETRYThis means membership determination in subspace S requires knowledge ofdual cone only in S . For sake of completeness, for proper cone K withrespect to subspace S (confer (326))x ∈ int K ⇔ 〈y , x〉 > 0 for all y ∈ K ∗ ∩ S , y ≠ 0 (425)x ∈ K , x ≠ 0 ⇔ 〈y , x〉 > 0 for all y ∈ int K ∗ ∩ S (426)(By closure, we also have the conjugate relations.) Yet when S equals aff Kfor K a closed convex conex ∈ rel int K ⇔ 〈y , x〉 > 0 for all y ∈ K ∗ ∩ aff K , y ≠ 0 (427)x ∈ K , x ≠ 0 ⇔ 〈y , x〉 > 0 for all y ∈ rel int(K ∗ ∩ aff K) (428)2.13.9.4 Subspace S = aff KAssume now a subspace S that is the affine hull of cone K : Consider againa pointed polyhedral cone K denoted by its extreme directions arrangedcolumnar in matrix X such thatrank(X ∈ R n×N ) = N dim aff K ≤ n (412)We want expressions for the convex cone and its dual in subspace S =aff K :Cone Table A K K ∗ ∩ aff Kvertex-description X X †Thalfspace-description X † , X ⊥T X T , X ⊥TWhen dim aff K = n , this table reduces to Cone Table S. These descriptionsfacilitate work in a proper subspace. The subspace of symmetric matrices S N ,for example, often serves as ambient space. 2.782.13.9.4.1 Exercise. Conically independent columns and rows.We suspect the number of conically independent columns (rows) of X tobe the same for X †T , where † denotes matrix pseudoinverse (E). Provewhether it holds that the columns (rows) of X are c.i. ⇔ the columns (rows)of X †T are c.i.2.78 The dual cone of positive semidefinite matrices S N∗+ = S N + remains in S N by convention,whereas the ordinary dual cone would venture into R N×N .


Figure 64: Simplicial cones. (a) Monotone nonnegative cone K_M+ and its dual K∗_M+ (drawn truncated) in R^2. (b) Monotone nonnegative cone and boundary of its dual (both drawn truncated) in R^3. Extreme directions of K∗_M+ are indicated. In panel (b),

X = \begin{bmatrix} 1&1&1\\ 0&1&1\\ 0&0&1 \end{bmatrix}, \qquad
X^{\dagger T}(:,1) = \begin{bmatrix} 1\\-1\\0 \end{bmatrix}, \quad
X^{\dagger T}(:,2) = \begin{bmatrix} 0\\1\\-1 \end{bmatrix}, \quad
X^{\dagger T}(:,3) = \begin{bmatrix} 0\\0\\1 \end{bmatrix}


2.13.9.4.2 Example. Monotone nonnegative cone. [61, exer.2.33] [353, 2]
Simplicial cone (2.12.3.1.1) K_M+ is the cone of all nonnegative vectors having their entries sorted in nonincreasing order:

K_M+ ≜ {x | x_1 ≥ x_2 ≥ ··· ≥ x_n ≥ 0} ⊆ R^n_+
     = {x | (e_i − e_{i+1})ᵀ x ≥ 0, i = 1...n−1, e_nᵀ x ≥ 0}
     = {x | X† x ≽ 0}   (429)

a halfspace-description where e_i is the i-th standard basis vector, and where^2.79

X†ᵀ ≜ [ e_1−e_2  e_2−e_3  ···  e_n ] ∈ R^{n×n}   (430)

For any vectors x and y, simple algebra demands

xᵀ y = ∑_{i=1}^n x_i y_i = (x_1 − x_2)y_1 + (x_2 − x_3)(y_1 + y_2) + (x_3 − x_4)(y_1 + y_2 + y_3) + ···
       + (x_{n−1} − x_n)(y_1 + ··· + y_{n−1}) + x_n(y_1 + ··· + y_n)   (431)

Because x_i − x_{i+1} ≥ 0 ∀i by assumption whenever x ∈ K_M+, we can employ dual generalized inequalities (323) with respect to the selfdual nonnegative orthant R^n_+ to find the halfspace-description of dual monotone nonnegative cone K∗_M+. We can say xᵀ y ≥ 0 for all X† x ≽ 0 [sic] if and only if

y_1 ≥ 0, y_1 + y_2 ≥ 0, ... , y_1 + y_2 + ··· + y_n ≥ 0   (432)

id est,

xᵀ y ≥ 0 ∀ X† x ≽ 0 ⇔ Xᵀ y ≽ 0   (433)

where

X = [ e_1  e_1+e_2  e_1+e_2+e_3  ···  1 ] ∈ R^{n×n}   (434)

Because X† x ≽ 0 connotes membership of x to pointed K_M+, then by (297) the dual cone we seek comprises all y for which (433) holds; thus its halfspace-description

K∗_M+ = {y ≽_{K∗_M+} 0} = {y | ∑_{i=1}^k y_i ≥ 0, k = 1...n} = {y | Xᵀ y ≽ 0} ⊂ R^n   (435)

2.79 With X† in hand, we might concisely scribe the remaining vertex- and halfspace-descriptions from the tables for K_M+ and its dual. Instead we use dual generalized inequalities in their derivation.
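A numerical sketch of this example for n = 4 follows (numpy; the test vectors are illustrative): it builds X of (434) and X†ᵀ of (430), confirms X†X = I, and exercises the halfspace-descriptions (429) and (435).

import numpy as np

n = 4
X = np.triu(np.ones((n, n)))                      # columns e1, e1+e2, ..., 1        (434)
XdagT = np.eye(n) - np.diag(np.ones(n - 1), -1)   # columns e1-e2, ..., e_{n-1}-e_n, e_n   (430)
Xdag = XdagT.T
print(np.allclose(Xdag @ X, np.eye(n)))           # True: here X^dagger = X^{-1}

x = np.array([4.0, 3.0, 1.0, 0.0])                # nonincreasing and nonnegative
print(np.all(Xdag @ x >= 0))                      # True: x in K_M+ by (429)
y = np.array([1.0, -0.5, -0.25, 0.0])
print(np.all(X.T @ y >= 0))                       # True: all partial sums of y are >= 0   (435)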


Figure 65: Monotone cone K_M and its dual K∗_M (drawn truncated) in R^2 (coordinate axes x_1, x_2).

The monotone nonnegative cone and its dual are simplicial, illustrated for two Euclidean spaces in Figure 64.

From 2.13.6.1, the extreme directions of proper K_M+ are respectively orthogonal to the facets of K∗_M+. Because K∗_M+ is simplicial, the inward-normals to its facets constitute the linearly independent rows of Xᵀ by (435). Hence the vertex-description for K_M+ employs the columns of X in agreement with Cone Table S because X† = X^{−1}. Likewise, the extreme directions of proper K∗_M+ are respectively orthogonal to the facets of K_M+ whose inward-normals are contained in the rows of X† by (429). So the vertex-description for K∗_M+ employs the columns of X†ᵀ.

2.13.9.4.3 Example. Monotone cone.
(Figure 65, Figure 66) Full-dimensional but not pointed, the monotone cone is polyhedral and defined by the halfspace-description

K_M ≜ {x ∈ R^n | x_1 ≥ x_2 ≥ ··· ≥ x_n} = {x ∈ R^n | X∗ᵀ x ≽ 0}   (436)

Its dual is therefore pointed but not full-dimensional;

K∗_M = { X∗ b ≜ [ e_1−e_2  e_2−e_3  ···  e_{n−1}−e_n ] b | b ≽ 0 } ⊂ R^n   (437)

the dual cone vertex-description where the columns of X∗ comprise its extreme directions. Because dual monotone cone K∗_M is pointed and satisfies

rank(X∗ ∈ R^{n×N}) = N ≜ dim aff K∗ ≤ n   (438)


Figure 66: Two views of monotone cone K_M and its dual K∗_M (drawn truncated) in R^3; boundary ∂K_M and dual K∗_M are labeled, with coordinate axes x_2, x_3 shown. Monotone cone is not pointed. Dual monotone cone is not full-dimensional. Cartesian coordinate axes are drawn for reference.


where N = n − 1, and because K_M is closed and convex, we may adapt Cone Table 1 (p.194) as follows:

Cone Table 1*             K∗                 K∗∗ = K
vertex-description        X∗                 X∗†ᵀ , ±X∗⊥
halfspace-description     X∗† , X∗⊥ᵀ         X∗ᵀ

The vertex-description for K_M is therefore

K_M = {[ X∗†ᵀ  X∗⊥  −X∗⊥ ] a | a ≽ 0} ⊂ R^n   (439)

where X∗⊥ = 1 and

X^{*\dagger} = \frac{1}{n}\begin{bmatrix}
 n-1 & -1  & -1  & \cdots & -1 & -1 \\
 n-2 & n-2 & -2  & \cdots & -2 & -2 \\
 n-3 & n-3 & n-3 & \cdots & -3 & -3 \\
 \vdots &  &  & \ddots &  & \vdots \\
 2 & 2 & \cdots & 2 & -(n-2) & -(n-2) \\
 1 & 1 & \cdots & 1 & 1 & -(n-1)
\end{bmatrix} \in \mathbb{R}^{\,n-1\times n}   (440)

while

K∗_M = {y ∈ R^n | X∗† y ≽ 0, X∗⊥ᵀ y = 0}   (441)

is the dual monotone cone halfspace-description.

2.13.9.4.4 Exercise. Inside the monotone cones.
Mathematically describe the respective interior of the monotone nonnegative cone and monotone cone. In three dimensions, also describe the relative interior of each face.

2.13.9.5 More pointed cone descriptions with equality condition
Consider pointed polyhedral cone K having a linearly independent set of generators and whose subspace membership is explicit; id est, we are given the ordinary halfspace-description

K = {x | Ax ≽ 0, Cx = 0} ⊆ R^n   (287a)


202 CHAPTER 2. CONVEX GEOMETRYwhere A ∈ R m×n and C ∈ R p×n . This can be equivalently written in termsof nullspace of C and vector ξ :K = {Zξ ∈ R n | AZξ ≽ 0} (442)where R(Z ∈ R n×n−rank C ) N(C). Assuming (412) is satisfiedrankX rank ( (AZ) † ∈ R n−rank C×m) = m − l = dim aff K ≤ n − rankC(443)where l is the number of conically dependent rows in AZ which mustbe removed to make ÂZ before the Cone Tables become applicable.2.80Then results collected there admit assignment ˆX ( ÂZ) † ∈ R n−rank C×m−l ,where Â∈ Rm−l×n , followed with linear transformation by Z . So we get thevertex-description, for full-rank (ÂZ)† skinny-or-square,K = {Z(ÂZ)† b | b ≽ 0} (444)From this and (363) we get a halfspace-description of the dual coneK ∗ = {y ∈ R n | (Z T Â T ) † Z T y ≽ 0} (445)From this and Cone Table 1 (p.194) we get a vertex-description, (1873)K ∗ = {[Z †T (ÂZ)T C T −C T ]c | c ≽ 0} (446)Yet becauseK = {x | Ax ≽ 0} ∩ {x | Cx = 0} (447)then, by (314), we get an equivalent vertex-description for the dual coneK ∗ = {x | Ax ≽ 0} ∗ + {x | Cx = 0} ∗= {[A T C T −C T ]b | b ≽ 0}(448)from which the conically dependent columns may, of course, be removed.2.80 When the conically dependent rows (2.10) are removed, the rows remaining must belinearly independent for the Cone Tables (p.19) to apply.
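The descriptions (442)-(445) can be sketched numerically as below (numpy/scipy, with hypothetical A and C), under the simplifying assumption that no conically dependent rows of AZ need removal (l = 0), so Â = A and the Cone Tables apply directly.

import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])            # K = {x | A x >= 0, C x = 0}   (287a)
C = np.array([[1.0, 1.0, 1.0]])
Z = null_space(C)                          # R(Z) = N(C), as in (442)
AZ = A @ Z
Xhat = np.linalg.pinv(AZ)                  # (A Z)^dagger, here skinny-or-square full-rank
gen_K = Z @ Xhat                           # columns generate K            (444)

def in_K_dual(y, tol=1e-9):
    # K* halfspace-description (445):  (Z^T A^T)^dagger Z^T y >= 0
    return np.all(np.linalg.pinv(AZ.T) @ Z.T @ y >= -tol)

print(np.all(A @ gen_K >= -1e-9), np.allclose(C @ gen_K, 0))   # generators satisfy (287a)
print(in_K_dual(np.array([1.0, 1.0, -2.0])))                   # a test point for K*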


2.13. DUAL CONE & GENERALIZED INEQUALITY 2032.13.10 Dual cone-translate(E.10.3.2.1) First-order optimality condition (352) inspires a dual-conevariant: For any set K , the negative dual of its translation by any a∈ R n is−(K − a) ∗ = {y ∈ R n | 〈y , x − a〉≤0 for all x ∈ K} K ⊥ (a)= {y ∈ R n | 〈y , x〉≤0 for all x ∈ K − a}(449)a closed convex cone called normal cone to K at point a . From this, a newmembership relation like (320):y ∈ −(K − a) ∗ ⇔ 〈y , x − a〉≤0 for all x ∈ K (450)and by closure the conjugate, for closed convex cone Kx ∈ K ⇔ 〈y , x − a〉≤0 for all y ∈ −(K − a) ∗ (451)2.13.10.1 first-order optimality condition - restatement(confer2.13.3) The general first-order necessary and sufficient condition foroptimality of solution x ⋆ to a minimization problem with real differentiableconvex objective function f(x) : R n →R over convex feasible set C is [306,3]−∇f(x ⋆ ) ∈ −(C − x ⋆ ) ∗ , x ⋆ ∈ C (452)id est, the negative gradient (3.6) belongs to the normal cone to C at x ⋆ asin Figure 67.2.13.10.1.1 Example. Normal cone to orthant.Consider proper cone K = R n + , the selfdual nonnegative orthant in R n . Thenormal cone to R n + at a∈ K is (2091)K ⊥ R n + (a∈ Rn +) = −(R n + − a) ∗ = −R n + ∩ a ⊥ , a∈ R n + (453)where −R n += −K ∗ is the algebraic complement of R n + , and a ⊥ is theorthogonal complement to range of vector a . This means: When point ais interior to R n + , the normal cone is the origin. If n p represents numberof nonzero entries in vector a∈∂R n + , then dim(−R n + ∩ a ⊥ )= n − n p andthere is a complementary relationship between the nonzero entries in vector aand the nonzero entries in any vector x∈−R n + ∩ a ⊥ .
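A tiny numerical illustration of optimality condition (452) together with (453): minimizing f(x) = ½‖x − p‖² over the nonnegative orthant is projection onto R^n_+, and at the minimizer the negative gradient lands in the normal cone −R^n_+ ∩ x⋆⊥. The point p is an illustrative assumption.

import numpy as np

p = np.array([1.5, -2.0, 0.0, -0.5])
x_star = np.maximum(p, 0.0)               # minimizer of 0.5*||x - p||^2 over R^n_+
g = x_star - p                            # gradient of f at x_star ; -g = p - x_star
print(np.all(-g <= 1e-12))                # -grad f(x_star) is entrywise nonpositive (in -R^n_+)
print(np.isclose((-g) @ x_star, 0.0))     # ... and orthogonal to x_star, per (453)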


Figure 67: (confer Figure 78) Shown is a plausible contour plot in R^2 of some arbitrary differentiable convex real function f(x) at selected levels α, β, and γ with α ≥ β ≥ γ; id est, contours of equal level f (level sets, e.g. {z | f(z) = α}) drawn dashed in function's domain; the hyperplane {y | ∇f(x⋆)ᵀ(y − x⋆) = 0, f(x⋆) = γ} supports convex set C and the γ-sublevel set at x⋆, and −∇f(x⋆) points into the normal cone. From results in 3.6.2 (p.258), gradient ∇f(x⋆) is normal to γ-sublevel set L_γ f (559) by Definition E.9.1.0.1. From 2.13.10.1, function is minimized over convex set C at point x⋆ iff negative gradient −∇f(x⋆) belongs to normal cone to C there. In circumstance depicted, normal cone is a ray whose direction is coincident with negative gradient. So, gradient is normal to a hyperplane supporting both C and the γ-sublevel set.


2.13. DUAL CONE & GENERALIZED INEQUALITY 2052.13.10.1.2 Example. Optimality conditions for conic problem.Consider a convex optimization problem having real differentiable convexobjective function f(x) : R n →R defined on domain R nminimize f(x)xsubject to x ∈ K(454)Let’s first suppose that the feasible set is a pointed polyhedral cone Kpossessing a linearly independent set of generators and whose subspacemembership is made explicit by fat full-rank matrix C ∈ R p×n ; id est, weare given the halfspace-description, for A∈ R m×nK = {x | Ax ≽ 0, Cx = 0} ⊆ R n(287a)(We’ll generalize to any convex cone K shortly.) Vertex-description of thiscone, assuming (ÂZ)† skinny-or-square full-rank, isK = {Z(ÂZ)† b | b ≽ 0} (444)where Â∈ Rm−l×n , l is the number of conically dependent rows in AZ (2.10)which must be removed, and Z ∈ R n×n−rank C holds basis N(C) columnar.From optimality condition (352),because∇f(x ⋆ ) T (Z(ÂZ)† b − x ⋆ )≥ 0 ∀b ≽ 0 (455)−∇f(x ⋆ ) T Z(ÂZ)† (b − b ⋆ )≤ 0 ∀b ≽ 0 (456)x ⋆ Z(ÂZ)† b ⋆ ∈ K (457)From membership relation (450) and Example 2.13.10.1.1〈−(Z T Â T ) † Z T ∇f(x ⋆ ), b − b ⋆ 〉 ≤ 0 for all b ∈ R m−l+⇔(458)−(Z T Â T ) † Z T ∇f(x ⋆ ) ∈ −R m−l+ ∩ b ⋆⊥Then equivalent necessary and sufficient conditions for optimality of conicproblem (454) with feasible set K are: (confer (362))(Z T Â T ) † Z T ∇f(x ⋆ ) ≽R m−l+0, b ⋆ ≽ 0, ∇f(x ⋆ ) T Z(ÂZ)† b ⋆ = 0 (459)R m−l+


206 CHAPTER 2. CONVEX GEOMETRYexpressible, by (445),∇f(x ⋆ ) ∈ K ∗ , x ⋆ ∈ K , ∇f(x ⋆ ) T x ⋆ = 0 (460)This result (460) actually applies more generally to any convex cone Kcomprising the feasible set: Necessary and sufficient optimality conditionsare in terms of objective gradient−∇f(x ⋆ ) ∈ −(K − x ⋆ ) ∗ , x ⋆ ∈ K (452)whose membership to normal cone, assuming only cone K convexity,−(K − x ⋆ ) ∗ = K ⊥ K(x ⋆ ∈ K) = −K ∗ ∩ x ⋆⊥ (2091)equivalently expresses conditions (460).When K = R n + , in particular, then C =0, A=Z =I ∈ S n ; id est,minimize f(x)xsubject to x ≽ 0 (461)Necessary and sufficient optimality conditions become (confer [61,4.2.3])R n +∇f(x ⋆ ) ≽ 0, x ⋆ ≽ 0, ∇f(x ⋆ ) T x ⋆ = 0 (462)R n +R n +equivalent to condition (330) 2.81 (under nonzero gradient) for membershipto the nonnegative orthant boundary ∂R n + .2.13.10.1.3 Example. Complementarity problem. [210]A complementarity problem in nonlinear function f is nonconvexfind z ∈ Ksubject to f(z) ∈ K ∗〈z , f(z)〉 = 0(463)yet bears strong resemblance to (460) and to (2027) Moreau’s decompositiontheorem on page 740 for projection P on mutually polar cones K and −K ∗ .2.81 and equivalent to well-known Karush-Kuhn-Tucker (KKT) optimality conditions[61,5.5.3] because the dual variable becomes gradient ∇f(x).


2.13. DUAL CONE & GENERALIZED INEQUALITY 207Identify a sum of mutually orthogonal projections x z −f(z) ; in Moreau’sterms, z=P K x and −f(z)=P −K∗x . Then f(z)∈ K ∗ (E.9.2.2 no.4) and zis a solution to the complementarity problem iff it is a fixed point ofz = P K x = P K (z − f(z)) (464)Given that a solution exists, existence of a fixed point would be guaranteedby theory of contraction. [227, p.300] But because only nonexpansivity(Theorem E.9.3.0.1) is achievable by a projector, uniqueness cannot beassured. [203, p.155] Elegant proofs of equivalence between complementarityproblem (463) and fixed point problem (464) are provided by Németh[Wıκımization, Complementarity problem].2.13.10.1.4 Example. Linear complementarity problem. [87] [272] [311]Given matrix B ∈ R n×n and vector q ∈ R n , a prototypical complementarityproblem on the nonnegative orthant K = R n + is linear in w =f(z) :find z ≽ 0subject to w ≽ 0w T z = 0w = q + Bz(465)This problem is not convex when both vectors w and z are variable. 2.82Notwithstanding, this linear complementarity problem can be solved byidentifying w ← ∇f(z)= q + Bz then substituting that gradient into (463)find z ∈ Ksubject to ∇f(z) ∈ K ∗〈z , ∇f(z)〉 = 0(466)2.82 But if one of them is fixed, then the problem becomes convex with a very simplegeometric interpretation: Define the affine subsetA {y ∈ R n | By = w − q}For w T z to vanish, there must be a complementary relationship between the nonzeroentries of vectors w and z ; id est, w i z i =0 ∀i. Given w ≽0, then z belongs to theconvex set of solutions:z ∈ −K ⊥ R n + (w ∈ Rn +) ∩ A = R n + ∩ w ⊥ ∩ Awhere K ⊥ R n +(w) is the normal cone to R n + at w (453). If this intersection is nonempty, thenthe problem is solvable.


208 CHAPTER 2. CONVEX GEOMETRYwhich is simply a restatement of optimality conditions (460) for conic problem(454). Suitable f(z) is the quadratic objective from convex problem1minimizez 2 zT Bz + q T zsubject to z ≽ 0(467)which means B ∈ S n + should be (symmetric) positive semidefinite for solutionof (465) by this method. Then (465) has solution iff (467) does. 2.13.10.1.5 Exercise. Optimality for equality constrained conic problem.Consider a conic optimization problem like (454) having real differentiableconvex objective function f(x) : R n →Rminimize f(x)xsubject to Cx = dx ∈ K(468)minimized over convex cone K but, this time, constrained to affine setA = {x | Cx = d}. Show, by means of first-order optimality condition(352) or (452), that necessary and sufficient optimality conditions are:(confer (460))x ⋆ ∈ KCx ⋆ = d∇f(x ⋆ ) + C T ν ⋆ ∈ K ∗(469)〈∇f(x ⋆ ) + C T ν ⋆ , x ⋆ 〉 = 0where ν ⋆ is any vector 2.83 satisfying these conditions.2.13.11 Proper nonsimplicial K , dual, X fat full-rankSince conically dependent columns can always be removed from X toconstruct K or to determine K ∗ [Wıκımization], then assume we are givena set of N conically independent generators (2.10) of an arbitrary properpolyhedral cone K in R n arranged columnar in X ∈ R n×N such that N > n(fat) and rankX = n . Having found formula (420) to determine the dual ofa simplicial cone, the easiest way to find a vertex-description of proper dual2.83 an optimal dual variable, these optimality conditions are equivalent to the KKTconditions [61,5.5.3].


2.13. DUAL CONE & GENERALIZED INEQUALITY 209cone K ∗ is to first decompose K into simplicial parts K i so that K = ⋃ K i . 2.84Each component simplicial cone in K corresponds to some subset of nlinearly independent columns from X . The key idea, here, is how theextreme directions of the simplicial parts must remain extreme directions ofK . Finding the dual of K amounts to finding the dual of each simplicial part:2.13.11.0.1 Theorem. Dual cone intersection. [330,2.7]Suppose proper cone K ⊂ R n equals the union of M simplicial cones K i whoseextreme directions all coincide with those of K . Then proper dual cone K ∗is the intersection of M dual simplicial cones K ∗ i ; id est,M⋃M⋂K = K i ⇒ K ∗ = K ∗ i (470)i=1Proof. For X i ∈ R n×n , a complete matrix of linearly independentextreme directions (p.145) arranged columnar, corresponding simplicial K i(2.12.3.1.1) has vertex-descriptionNow suppose,K =K =i=1K i = {X i c | c ≽ 0} (471)M⋃K i =i=1M⋃{X i c | c ≽ 0} (472)i=1The union of all K i can be equivalently expressed⎧⎡ ⎤⎪⎨[ X 1 X 2 · · · X M ] ⎢⎣⎪⎩aḅ.c⎥⎦ | a , b ... c ≽ 0 ⎫⎪ ⎬⎪ ⎭(473)Because extreme directions of the simplices K i are extreme directions of Kby assumption, then2.84 That proposition presupposes, of course, that we know how to perform simplicialdecomposition efficiently; also called “triangulation”. [303] [173,3.1] [174,3.1] Existenceof multiple simplicial parts means expansion of x∈ K , like (411), can no longer be uniquebecause number N of extreme directions in K exceeds dimension n of the space.⋄


210 CHAPTER 2. CONVEX GEOMETRYK = { [ X 1 X 2 · · · X M ] d | d ≽ 0 } (474)by the extremes theorem (2.8.1.1.1). Defining X [X 1 X 2 · · · X M ] (withany redundant [sic] columns optionally removed from X), then K ∗ can beexpressed ((363), Cone Table S p.195)K ∗ = {y | X T y ≽ 0} =M⋂{y | Xi T y ≽ 0} =i=1M⋂K ∗ i (475)i=1To find the extreme directions of the dual cone, first we observe that somefacets of each simplicial part K i are common to facets of K by assumption,and the union of all those common facets comprises the set of all facets ofK by design. For any particular proper polyhedral cone K , the extremedirections of dual cone K ∗ are respectively orthogonal to the facets of K .(2.13.6.1) Then the extreme directions of the dual cone can be foundamong inward-normals to facets of the component simplicial cones K i ; thosenormals are extreme directions of the dual simplicial cones K ∗ i . From thetheorem and Cone Table S (p.195),K ∗ =M⋂K ∗ i =i=1M⋂i=1{X †Ti c | c ≽ 0} (476)The set of extreme directions {Γ ∗ i } for proper dual cone K ∗ is thereforeconstituted by those conically independent generators, from the columnsof all the dual simplicial matrices {X †Ti } , that do not violate discretedefinition (363) of K ∗ ;{ }Γ ∗ 1 , Γ ∗ 2 ... Γ ∗ N{}= c.i. X †Ti (:,j), i=1... M , j =1... n | X † i (j,:)Γ l ≥ 0, l =1... N(477)where c.i. denotes selection of only the conically independent vectors fromthe argument set, argument (:,j) denotes the j th column while (j,:) denotesthe j th row, and {Γ l } constitutes the extreme directions of K . Figure 50b(p.143) shows a cone and its dual found via this algorithm.
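For modest problem sizes, algorithm (477) can be carried out by brute force: enumerate the n-column subsets of X that are linearly independent, take the columns of each X_i^{−ᵀ} as candidates, and keep those satisfying the discrete test (363). The sketch below (numpy + itertools) does exactly that; it returns candidate dual generators from which a conically independent subset may still need to be selected, and the closing orthant check is only a sanity test.

import numpy as np
from itertools import combinations

def dual_cone_generators(X, tol=1e-9):
    # Sketch of algorithm (477) for proper K = cone(X), X fat with rank n:
    # rows of X_i^{-1} are the columns of X_i^{-T}, i.e. candidate dual
    # generators; keep those gamma* with X^T gamma* >= 0   (363).
    n, N = X.shape
    keep = []
    for cols in combinations(range(N), n):
        Xi = X[:, cols]
        if abs(np.linalg.det(Xi)) < tol:
            continue                              # columns not linearly independent
        for g in np.linalg.inv(Xi):               # rows of X_i^{-1}
            if np.all(X.T @ g >= -tol):
                keep.append(g / np.max(np.abs(g)))   # normalize for comparison
    return np.unique(np.round(keep, 8), axis=0)

# Sanity check: the nonnegative orthant is selfdual
print(dual_cone_generators(np.eye(3)))            # rows e1, e2, e3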


2.13.11.0.2 Example. Dual of K nonsimplicial in subspace aff K.
Given conically independent generators for pointed closed convex cone K in R^4 arranged columnar in

X = [ Γ_1 Γ_2 Γ_3 Γ_4 ] = \begin{bmatrix} 1&1&0&0\\ -1&0&1&0\\ 0&-1&0&1\\ 0&0&-1&-1 \end{bmatrix}   (478)

having dim aff K = rank X = 3, (282) then performing the most inefficient simplicial decomposition in aff K we find

X_1 = \begin{bmatrix} 1&1&0\\ -1&0&1\\ 0&-1&0\\ 0&0&-1 \end{bmatrix}, \quad
X_2 = \begin{bmatrix} 1&1&0\\ -1&0&0\\ 0&-1&1\\ 0&0&-1 \end{bmatrix}, \quad
X_3 = \begin{bmatrix} 1&0&0\\ -1&1&0\\ 0&0&1\\ 0&-1&-1 \end{bmatrix}, \quad
X_4 = \begin{bmatrix} 1&0&0\\ 0&1&0\\ -1&0&1\\ 0&-1&-1 \end{bmatrix}   (479)

The corresponding dual simplicial cones in aff K have generators respectively columnar in

4X_1^{\dagger T} = \begin{bmatrix} 2&1&1\\ -2&1&1\\ 2&-3&1\\ -2&1&-3 \end{bmatrix}, \quad
4X_2^{\dagger T} = \begin{bmatrix} 1&2&1\\ -3&2&1\\ 1&-2&1\\ 1&-2&-3 \end{bmatrix}, \quad
4X_3^{\dagger T} = \begin{bmatrix} 3&2&-1\\ -1&2&-1\\ -1&-2&3\\ -1&-2&-1 \end{bmatrix}, \quad
4X_4^{\dagger T} = \begin{bmatrix} 3&-1&2\\ -1&3&-2\\ -1&-1&2\\ -1&-1&-2 \end{bmatrix}   (480)

Applying algorithm (477) we get

[ Γ^*_1 Γ^*_2 Γ^*_3 Γ^*_4 ] = \frac{1}{4}\begin{bmatrix} 1&2&3&2\\ 1&2&-1&-2\\ 1&-2&-1&2\\ -3&-2&-1&-2 \end{bmatrix}   (481)


whose rank is 3, and is the known result;^2.85 a conically independent set of generators for that pointed section of the dual cone K∗ in aff K ; id est, K∗ ∩ aff K.

2.13.11.0.3 Example. Dual of proper polyhedral K in R^4.
Given conically independent generators for a full-dimensional pointed closed convex cone K

X = [ Γ_1 Γ_2 Γ_3 Γ_4 Γ_5 ] = \begin{bmatrix} 1&1&0&1&0\\ -1&0&1&0&1\\ 0&-1&0&1&0\\ 0&0&-1&-1&0 \end{bmatrix}   (482)

we count 5!/((5−4)! 4!) = 5 component simplices.^2.86 Applying algorithm (477), we find the six extreme directions of dual cone K∗ (with Γ_2 = Γ∗_5)

X∗ = [ Γ^*_1 Γ^*_2 Γ^*_3 Γ^*_4 Γ^*_5 Γ^*_6 ] = \begin{bmatrix} 1&0&0&1&1&1\\ 1&0&0&1&0&0\\ 1&0&-1&0&-1&1\\ 1&-1&-1&1&0&0 \end{bmatrix}   (483)

which means, (2.13.6.1) this proper polyhedral K = cone(X) has six (three-dimensional) facets generated G by its {extreme directions}:

G(F_1) = {Γ_1 Γ_2 Γ_3}
G(F_2) = {Γ_1 Γ_2 Γ_5}
G(F_3) = {Γ_1 Γ_4 Γ_5}
G(F_4) = {Γ_1 Γ_3 Γ_4}
G(F_5) = {Γ_3 Γ_4 Γ_5}
G(F_6) = {Γ_2 Γ_3 Γ_5}   (484)

2.85 These calculations proceed so as to be consistent with [114, 6]; as if the ambient vector space were proper subspace aff K whose dimension is 3. In that ambient space, K may be regarded as a proper cone. Yet that author (from the citation) erroneously states dimension of the ordinary dual cone to be 3 ; it is, in fact, 4.
2.86 There are no linearly dependent combinations of three or four extreme directions in the primal cone.


2.13. DUAL CONE & GENERALIZED INEQUALITY 213whereas dual proper polyhedral cone K ∗ has only five:⎧ ⎫ ⎧F1∗ Γ ∗⎪⎨ F2∗ 1 Γ ∗ 2 Γ ∗ 3 Γ ∗ 4⎪⎬ ⎪⎨ Γ ∗G F3∗ 1 Γ ∗ 2 Γ ∗ 6= Γ ∗F4⎪⎩∗ 1 Γ ∗ 4 Γ ∗ 5 Γ ∗ 6Γ ⎪⎭ ⎪⎩∗F5∗ 3 Γ ∗ 4 Γ ∗ 5Γ ∗ 2 Γ ∗ 3 Γ ∗ 5 Γ ∗ 6⎫⎪⎬⎪⎭(485)Six two-dimensional cones, having generators respectively {Γ ∗ 1 Γ ∗ 3} {Γ ∗ 2 Γ ∗ 4}{Γ ∗ 1 Γ ∗ 5} {Γ ∗ 4 Γ ∗ 6} {Γ ∗ 2 Γ ∗ 5} {Γ ∗ 3 Γ ∗ 6} , are relatively interior to dual facets;so cannot be two-dimensional faces of K ∗ (by Definition 2.6.0.0.3).We can check this result (483) by reversing the process; we find6!/((6 −4)! 4!) − 3=12 component simplices in the dual cone. 2.87 Applyingalgorithm (477) to those simplices returns a conically independent set ofgenerators for K equivalent to (482).2.13.11.0.4 Exercise. Reaching proper polyhedral cone interior.Name two extreme directions Γ i of cone K from Example 2.13.11.0.3 whoseconvex hull passes through that cone’s interior. Explain why. Are there twosuch extreme directions of dual cone K ∗ ?2.13.12 coordinates in proper nonsimplicial systemA natural question pertains to whether a theory of unique coordinates,like biorthogonal expansion, is extensible to proper cones whose extremedirections number in excess of ambient spatial dimensionality.2.13.12.0.1 Theorem. Conic coordinates.With respect to vector v in some finite-dimensional Euclidean space R n ,define a coordinate t ⋆ v of point x in full-dimensional pointed closed convexcone Kt ⋆ v(x) sup{t∈ R | x − tv ∈ K} (486)Given points x and y in cone K , if t ⋆ v(x)= t ⋆ v(y) for each and every extremedirection v of K then x = y .⋄2.87 Three combinations of four dual extreme directions are linearly dependent; they belongto the dual facets. But there are no linearly dependent combinations of three dual extremedirections.


214 CHAPTER 2. CONVEX GEOMETRYConic coordinate definition (486) acquires its heritage from conditions(376) for generator membership to a smallest face. Coordinate t ⋆ v(c)=0, forexample, corresponds to unbounded µ in (376); indicating, extreme directionv cannot belong to the smallest face of cone K that contains c .2.13.12.0.2 Proof. Vector x −t ⋆ v must belong to the coneboundary ∂K by definition (486). So there must exist a nonzero vector λ thatis inward-normal to a hyperplane supporting cone K and containing x −t ⋆ v ;id est, by boundary membership relation for full-dimensional pointed closedconvex cones (2.13.2)x−t ⋆ v ∈ ∂K ⇔ ∃ λ ≠ 0 〈λ, x−t ⋆ v〉 = 0, λ ∈ K ∗ , x−t ⋆ v ∈ K (330)whereK ∗ = {w ∈ R n | 〈v , w〉 ≥ 0 for all v ∈ G(K)} (369)is the full-dimensional pointed closed convex dual cone. The set G(K) , ofpossibly infinite cardinality N , comprises generators for cone K ; e.g., itsextreme directions which constitute a minimal generating set. If x −t ⋆ vis nonzero, any such vector λ must belong to the dual cone boundary byconjugate boundary membership relationλ ∈ ∂K ∗ ⇔ ∃ x−t ⋆ v ≠ 0 〈λ, x−t ⋆ v〉 = 0, x−t ⋆ v ∈ K , λ ∈ K ∗ (331)whereK = {z ∈ R n | 〈λ, z〉 ≥ 0 for all λ ∈ G(K ∗ )} (368)This description of K means: cone K is an intersection of halfspaces whoseinward-normals are generators of the dual cone. Each and every face ofcone K (except the cone itself) belongs to a hyperplane supporting K . Eachand every vector x −t ⋆ v on the cone boundary must therefore be orthogonalto an extreme direction constituting generators G(K ∗ ) of the dual cone.To the i th extreme direction v = Γ i ∈ R n of cone K , ascribe a coordinatet ⋆ i(x)∈ R of x from definition (486). On domain K , the mapping⎡t ⋆ (x) =⎢⎣⎤t ⋆ 1(x)⎥. ⎦: R n → R N (487)t ⋆ N (x)


2.13. DUAL CONE & GENERALIZED INEQUALITY 215has no nontrivial nullspace. Because x −t ⋆ v must belong to ∂K by definition,the mapping t ⋆ (x) is equivalent to a convex problem (separable in index i)whose objective (by (330)) is tightly bounded below by 0 :t ⋆ (x) ≡ arg minimizet∈R NN∑i=1Γ ∗Tj(i) (x − t iΓ i )subject to x − t i Γ i ∈ K ,i=1... N(488)where index j ∈ I is dependent on i and where (by (368)) λ = Γj ∗ ∈ R n is anextreme direction of dual cone K ∗ that is normal to a hyperplane supportingK and containing x − t ⋆ iΓ i . Because extreme-direction cardinality N forcone K is not necessarily the same as for dual cone K ∗ , index j must bejudiciously selected from a set I .To prove injectivity when extreme-direction cardinality N > n exceedsspatial dimension, we need only show mapping t ⋆ (x) to be invertible;[128, thm.9.2.3] id est, x is recoverable given t ⋆ (x) :x = arg minimize˜x∈R nN∑i=1Γ ∗Tj(i) (˜x − t⋆ iΓ i )subject to ˜x − t ⋆ iΓ i ∈ K ,i=1... N(489)The feasible set of this nonseparable convex problem is an intersection oftranslated full-dimensional pointed closed convex cones ⋂ i K + t⋆ iΓ i . Theobjective function’s linear part describes movement in normal-direction −Γj∗for each of N hyperplanes. The optimal point of hyperplane intersection isthe unique solution x when {Γj ∗ } comprises n linearly independent normalsthat come from the dual cone and make the objective vanish. Because thedual cone K ∗ is full-dimensional, pointed, closed, and convex by assumption,there exist N extreme directions {Γj ∗ } from K ∗ ⊂ R n that span R n . Sowe need simply choose N spanning dual extreme directions that make theoptimal objective vanish. Because such dual extreme directions preexistby (330), t ⋆ (x) is invertible.Otherwise, in the case N ≤ n , t ⋆ (x) holds coordinates for biorthogonalexpansion. Reconstruction of x is therefore unique.2.13.12.1 reconstruction from conic coordinatesThe foregoing proof of the conic coordinates theorem is not constructive; itestablishes existence of dual extreme directions {Γ ∗ j } that will reconstruct


216 CHAPTER 2. CONVEX GEOMETRYa point x from its coordinates t ⋆ (x) via (489), but does not prescribe theindex set I . There are at least two computational methods for specifying{Γj(i) ∗ } : one is combinatorial but sure to succeed, the other is a geometricmethod that searches for a minimum of a nonconvex function. We describethe latter:<strong>Convex</strong> problem (P)(P)maximize tt∈Rsubject to x − tv ∈ Kminimize λ T xλ∈R nsubject to λ T v = 1λ ∈ K ∗ (D) (490)is equivalent to definition (486) whereas convex problem (D) is its dual; 2.88meaning, primal and dual optimal objectives are equal t ⋆ = λ ⋆T x assumingSlater’s condition (p.285) is satisfied. Under this assumption of strongduality, λ ⋆T (x − t ⋆ v)= t ⋆ (1 − λ ⋆T v)=0; which implies, the primal problemis equivalent tominimize λ ⋆T (x − tv)t∈R(P) (491)subject to x − tv ∈ Kwhile the dual problem is equivalent tominimize λ T (x − t ⋆ v)λ∈R nsubject to λ T v = 1 (D) (492)λ ∈ K ∗Instead given coordinates t ⋆ (x) and a description of cone K , we proposeinversion by alternating solution of primal and dual problemsN∑minimize Γ ∗Tx∈R n i (x − t ⋆ iΓ i )i=1(493)subject to x − t ⋆ iΓ i ∈ K , i=1... N2.88 Form a Lagrangian associated with primal problem (P):L(t, λ) = t + λ T (x − tv) = λ T x + t(1 − λ T v) ,λ ≽K ∗ 0suptL(t, λ) = λ T x , 1 − λ T v = 0Dual variable (Lagrange multiplier [250, p.216]) λ generally has a nonnegative sense forprimal maximization with any cone membership constraint, whereas λ would have anonpositive sense were the primal instead a minimization problem having a membershipconstraint.


2.13. DUAL CONE & GENERALIZED INEQUALITY 217N∑minimize Γ ∗TΓ ∗ i (x ⋆ − t ⋆ iΓ i )i ∈Rn i=1subject to Γ ∗Ti Γ i = 1 ,Γ ∗ i ∈ K ∗ ,i=1... Ni=1... N(494)where dual extreme directions Γ ∗ i are initialized arbitrarily and ultimatelyascertained by the alternation. <strong>Convex</strong> problems (493) and (494) areiterated until convergence which is guaranteed by virtue of a monotonicallynonincreasing real sequence of objective values. Convergence can be fast.The mapping t ⋆ (x) is uniquely inverted when the necessarily nonnegativeobjective vanishes; id est, when Γ ∗Ti (x ⋆ − t ⋆ iΓ i )=0 ∀i. Here, a zeroobjective can occur only at the true solution x . But this global optimalitycondition cannot be guaranteed by the alternation because the commonobjective function, when regarded in both primal x and dual Γ ∗ i variablessimultaneously, is generally neither quasiconvex or monotonic. (3.8.0.0.3)Conversely, a nonzero objective at convergence is a certificate thatinversion was not performed properly. A nonzero objective indicates thatthe global minimum of a multimodal objective function could not be foundby this alternation. That is a flaw in this particular iterative algorithm forinversion; not in theory. 2.89 A numerical remedy is to reinitialize the Γ ∗ i todifferent values.2.89 The Proof 2.13.12.0.2, that suitable dual extreme directions {Γj ∗ } always exist, meansthat a global optimization algorithm would always find the zero objective of alternation(493) (494); hence, the unique inversion x. But such an algorithm can be combinatorial.




Chapter 3Geometry of convex functionsThe link between convex sets and convex functions is via theepigraph: A function is convex if and only if its epigraph is aconvex set.−Werner FenchelWe limit our treatment of multidimensional functions 3.1 to finite-dimensionalEuclidean space. Then an icon for a one-dimensional (real) convex functionis bowl-shaped (Figure 77), whereas the concave icon is the inverted bowl;respectively characterized by a unique global minimum and maximum whoseexistence is assumed. Because of this simple relationship, usage of the termconvexity is often implicitly inclusive of concavity. Despite iconic imagery,the reader is reminded that the set of all convex, concave, quasiconvex, andquasiconcave functions contains the monotonic functions [203] [215,2.3.5];e.g., [61,3.6, exer.3.46].3.1 vector- or matrix-valued functions including the real functions. Appendix D, with itstables of first- and second-order gradients, is the practical adjunct to this chapter.2001 Jon Dattorro. co&edg version 2010.10.26. All rights reserved.citation: Dattorro, <strong>Convex</strong> <strong>Optimization</strong> & Euclidean Distance Geometry,Mεβoo Publishing USA, 2005, <strong>v2010.10.26</strong>.219


220 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONS3.1 <strong>Convex</strong> function3.1.1 real and vector-valued functionVector-valued functionf(X) : R p×k →R M =⎡⎢⎣f 1 (X).f M (X)⎤⎥⎦ (495)assigns each X in its domain domf (a subset of ambient vector space R p×k )to a specific element [258, p.3] of its range (a subset of R M ). Function f(X)is linear in X on its domain if and only if, for each and every Y,Z ∈domfand α , β ∈ Rf(αY + βZ) = αf(Y ) + βf(Z) (496)A vector-valued function f(X) : R p×k →R M is convex in X if and only ifdomf is a convex set and, for each and every Y,Z ∈domf and 0≤µ≤1f(µY + (1 − µ)Z) ≼µf(Y ) + (1 − µ)f(Z) (497)R M +As defined, continuity is implied but not differentiability (nor smoothness). 3.2Apparently some, but not all, nonlinear functions are convex. Reversing senseof the inequality flips this definition to concavity. Linear (and affine3.4) 3.3functions attain equality in this definition. Linear functions are thereforesimultaneously convex and concave.Vector-valued functions are most often compared (182) as in (497) withrespect to the M-dimensional selfdual nonnegative orthant R M + , a propercone. 3.4 In this case, the test prescribed by (497) is simply a comparisonon R of each entry f i of a vector-valued function f . (2.13.4.2.3) Thevector-valued function case is therefore a straightforward generalization ofconventional convexity theory for a real function. This conclusion followsfrom theory of dual generalized inequalities (2.13.2.0.1) which assertsf convex w.r.t R M + ⇔ w T f convex ∀w ∈ G(R M∗+ ) (498)3.2 Figure 68b illustrates a nondifferentiable convex function. Differentiability is certainlynot a requirement for optimization of convex functions by numerical methods; e.g., [243].3.3 While linear functions are not invariant to translation (offset), convex functions are.3.4 Definition of convexity can be broadened to other (not necessarily proper) cones.Referred to in the literature as K-convexity, [294] R M∗+ (498) generalizes to K ∗ .


Figure 68: (a) f₁(x), (b) f₂(x). Convex real functions here have a unique minimizer x⋆. For x ∈ R, f₁(x) = x² = ‖x‖₂² is strictly convex whereas nondifferentiable function f₂(x) = √(x²) = |x| = ‖x‖₂ is convex but not strictly. Strict convexity of a real function is only a sufficient condition for minimizer uniqueness.

shown by substitution of the defining inequality (497). Discretization allows relaxation (2.13.4.2.1) of a semiinfinite number of conditions {w ∈ R^M*₊} to

    {w ∈ G(R^M*₊)} ≡ {eᵢ ∈ R^M , i = 1...M}        (499)

(the standard basis for R^M and a minimal set of generators (2.8.1.2) for R^M₊) from which the stated conclusion follows; id est, the test for convexity of a vector-valued function is a comparison on R of each entry.

3.1.2 strict convexity

When f(X) instead satisfies, for each and every distinct Y and Z in dom f and all 0 < µ < 1,

    f(µY + (1 − µ)Z) ≺_{R^M₊} µf(Y ) + (1 − µ)f(Z)        (500)

then f is a strictly convex function.


222 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONSAny convex real function f(X) has unique minimum value over anyconvex subset of its domain. [302, p.123] Yet solution to some convexoptimization problem is, in general, not unique; e.g., given minimizationof a convex real function over some convex feasible set Cminimize f(X)Xsubject to X ∈ C(502)any optimal solution X ⋆ comes from a convex set of optimal solutionsX ⋆ ∈ {X | f(X) = inf f(Y ) } ⊆ C (503)Y ∈ CBut a strictly convex real function has a unique minimizer X ⋆ ; id est, for theoptimal solution set in (503) to be a single point, it is sufficient (Figure 68)that f(X) be a strictly convex real 3.5 function and set C convex. But strictconvexity is not necessary for minimizer uniqueness: existence of any strictlysupporting hyperplane to a convex function epigraph (p.219,3.5) at itsminimum over C is necessary and sufficient for uniqueness.Quadratic real functions x T Ax + b T x + c are convex in x iff A≽0.(3.6.4.0.1) Quadratics characterized by positive definite matrix A≻0 arestrictly convex and vice versa. The vector 2-norm square ‖x‖ 2 (Euclideannorm square) and Frobenius’ norm square ‖X‖ 2 F , for example, are strictlyconvex functions of their respective argument (each absolute norm is convexbut not strictly). Figure 68a illustrates a strictly convex real function.3.1.2.1 minimum/minimal element, dual cone characterizationf(X ⋆ ) is the minimum element of its range if and only if, for each and everyw ∈ int R M∗+ , it is the unique minimizer of w T f . (Figure 69) [61,2.6.3]If f(X ⋆ ) is a minimal element of its range, then there exists a nonzerow ∈ R M∗+ such that f(X ⋆ ) minimizes w T f . If f(X ⋆ ) minimizes w T f for somew ∈ int R M∗+ , conversely, then f(X ⋆ ) is a minimal element of its range.3.5 It is more customary to consider only a real function for the objective of a convexoptimization problem because vector- or matrix-valued functions can introduce ambiguityinto the optimal objective value. (2.7.2.2,3.1.2.1) Study of multidimensional objectivefunctions is called multicriteria- [324] or multiobjective- or vector-optimization.


Figure 69 (panels (a) and (b); function range Rf, weight vector w, axes f(X)₁, f(X)₂): (confer Figure 40) Function range is convex for a convex problem. (a) Point f(X⋆) is the unique minimum element of function range Rf. (b) Point f(X⋆) is a minimal element of depicted range. (Cartesian axes drawn for reference.)


224 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONS3.1.2.1.1 Exercise. Cone of convex functions.Prove that relation (498) implies: The set of all convex vector-valuedfunctions forms a convex cone in some space. Indeed, any nonnegativelyweighted sum of convex functions remains convex. So trivial function f = 0is convex. Relatively interior to each face of this cone are the strictly convexfunctions of corresponding dimension. 3.6 How do convex real functions fitinto this cone? Where are the affine functions?3.1.2.1.2 Example. Conic origins of Lagrangian.The cone of convex functions, implied by membership relation (498), providesfoundation for what is known as a Lagrangian function. [252, p.398] [280]Consider a conic optimization problem, for proper cone K and affine subset Aconvex w.r.t ⎣ RM +KAminimize f(x)xsubject to g(x) ≽ K 0h(x) ∈ A(504)A Cartesian product of convex functions remains convex, so we may write[ ] ⎡ ⎤[ ] ⎡ ⎤ ⎡fg fg w⎦ ⇔ [ w T λ T ν T ] convex ∀⎣λ ⎦∈h h ν⎤⎣ RM∗ +K ∗ ⎦A ⊥(505)where A ⊥ is a normal cone to A . A normal cone to an affine subset is theorthogonal complement of its parallel subspace (E.10.3.2.1).Membership relation (505) holds because of equality for h in convexitycriterion (497) and because normal-cone membership relation (451), givenpoint a∈A , becomesh ∈ A ⇔ 〈ν , h − a〉=0 for all ν ∈ A ⊥ (506)In other words: all affine functions are convex (with respect to any givenproper cone), all convex functions are translation invariant, whereas anyaffine function must satisfy (506).A real Lagrangian arises from the scalar term in (505); id est,[ ] fghL [w T λ T ν T ]= w T f + λ T g + ν T h (507)3.6 Strict case excludes cone’s point at origin and zero weighting.


3.2 Practical norm functions, absolute value

To some mathematicians, "all norms on Rⁿ are equivalent" [159, p.53]; meaning, ratios of different norms are bounded above and below by finite predeterminable constants. But to statisticians and engineers, all norms are certainly not created equal; as evidenced by the compressed sensing (sparsity) revolution, begun in 2004, whose focus is predominantly 1-norm.

A norm on Rⁿ is a convex function f : Rⁿ → R satisfying: for x, y ∈ Rⁿ, α ∈ R [227, p.59] [159, p.52]

1. f(x) ≥ 0 (f(x) = 0 ⇔ x = 0)   (nonnegativity)
2. f(x + y) ≤ f(x) + f(y)   (triangle inequality) 3.7
3. f(αx) = |α| f(x)   (nonnegative homogeneity)

Convexity follows by properties 2 and 3. Most useful are the 1-, 2-, and ∞-norms:

    ‖x‖₁ = minimize_{t∈Rⁿ}   1ᵀt
           subject to        −t ≼ x ≼ t        (508)

where |x| = t⋆ (entrywise absolute value equals optimal t). 3.8

    ‖x‖₂ = minimize_{t∈R}   t
           subject to       [ tI  x ; xᵀ  t ] ≽_{S^{n+1}₊} 0        (509)

where ‖x‖₂ = ‖x‖ ≜ √(xᵀx) = t⋆.

    ‖x‖∞ = minimize_{t∈R}   t
           subject to       −t1 ≼ x ≼ t1        (510)

where max{|xᵢ| , i = 1...n} = t⋆.

    ‖x‖₁ = minimize_{α∈Rⁿ, β∈Rⁿ}   1ᵀ(α + β)
           subject to             α, β ≽ 0
                                  x = α − β        (511)

3.7 ‖x + y‖ ≤ ‖x‖ + ‖y‖ for any norm, with equality iff x = κy where κ ≥ 0.
3.8 Vector 1 may be replaced with any positive [sic] vector to get absolute value, theoretically, although 1 provides the 1-norm.


where |x| = α⋆ + β⋆ because of complementarity α⋆ᵀβ⋆ = 0 at optimality. (508), (510), and (511) represent linear programs; (509) is a semidefinite program.

Over some convex set C, given vector constant y or matrix constant Y,

    arg inf_{x∈C} ‖x − y‖₂ = arg inf_{x∈C} ‖x − y‖₂²        (512)
    arg inf_{X∈C} ‖X − Y‖_F = arg inf_{X∈C} ‖X − Y‖_F²      (513)

are unconstrained convex quadratic problems. Equality does not hold for a sum of norms. (5.4.2.3.2) Optimal solution is norm dependent: [61, p.297]

    minimize_{x∈Rⁿ}   ‖x‖₁
    subject to        x ∈ C
  ≡                                                          (514)
    minimize_{x∈Rⁿ, t∈Rⁿ}   1ᵀt
    subject to              −t ≼ x ≼ t ,  x ∈ C

    minimize_{x∈Rⁿ}   ‖x‖₂
    subject to        x ∈ C
  ≡                                                          (515)
    minimize_{x∈Rⁿ, t∈R}   t
    subject to             [ tI  x ; xᵀ  t ] ≽_{S^{n+1}₊} 0 ,  x ∈ C

    minimize_{x∈Rⁿ}   ‖x‖∞
    subject to        x ∈ C
  ≡                                                          (516)
    minimize_{x∈Rⁿ, t∈R}   t
    subject to             −t1 ≼ x ≼ t1 ,  x ∈ C

In Rⁿ these norms represent: ‖x‖₁ is length measured along a grid (taxicab distance), ‖x‖₂ is Euclidean length, ‖x‖∞ is maximum |coordinate|.

    minimize_{x∈Rⁿ}   ‖x‖₁
    subject to        x ∈ C
  ≡                                                          (517)
    minimize_{α∈Rⁿ, β∈Rⁿ}   1ᵀ(α + β)
    subject to              α, β ≽ 0
                            x = α − β ,  x ∈ C

These foregoing problems (508)-(517) are convex whenever set C is. Their equivalence transformations make objectives smooth.
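The formulations (508)-(510) are easy to confirm numerically. The following minimal sketch (not part of the original development) assumes Python with numpy and cvxpy, and an SDP-capable solver such as SCS for the Schur-form program (509); it solves each small program and compares the optimal value against the norm computed directly.

    import numpy as np
    import cvxpy as cp

    np.random.seed(0)
    n = 5
    x_data = np.random.randn(n)

    # (508): 1-norm as a linear program, minimize 1't s.t. -t <= x <= t
    t = cp.Variable(n)
    p1 = cp.Problem(cp.Minimize(cp.sum(t)), [-t <= x_data, x_data <= t])
    p1.solve()
    print(p1.value, np.linalg.norm(x_data, 1))

    # (509): 2-norm via the Schur-form LMI  [tI x; x' t] >= 0
    s = cp.Variable()
    M = cp.Variable((n + 1, n + 1), PSD=True)
    p2 = cp.Problem(cp.Minimize(s),
                    [M[:n, :n] == s * np.eye(n),   # upper-left block tI
                     M[:n, n] == x_data,           # off-diagonal block x
                     M[n, n] == s])                # lower-right block t
    p2.solve(solver=cp.SCS)
    print(p2.value, np.linalg.norm(x_data, 2))

    # (510): infinity-norm as a linear program, minimize t s.t. -t1 <= x <= t1
    u = cp.Variable()
    p3 = cp.Problem(cp.Minimize(u), [x_data <= u, -u <= x_data])
    p3.solve()
    print(p3.value, np.linalg.norm(x_data, np.inf))

Each printed pair should agree to solver tolerance; the PSD variable M plays the role of the block matrix in (509).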


Figure 70 (depicting B₁ = {x ∈ R³ | ‖x‖₁ ≤ 1} and plane A = {x ∈ R³ | Ax = b}): 1-norm ball B₁ is convex hull of all cardinality-1 vectors of unit norm (its vertices). Ball boundary contains all points equidistant from origin in 1-norm. Cartesian axes drawn for reference. Plane A is overhead (drawn truncated). If 1-norm ball is expanded until it kisses A (intersects ball only at boundary), then distance (in 1-norm) from origin to A is achieved. Euclidean ball would be spherical in this dimension. Only were A parallel to two axes could there be a minimal cardinality least Euclidean norm solution. Yet 1-norm ball offers infinitely many, but not all, A-orientations resulting in a minimal cardinality solution. (1-norm ball is an octahedron in this dimension while ∞-norm ball is a cube.)


3.2.0.0.1 Example. Projecting the origin, on an affine subset, in 1-norm.
In (1884) we interpret least norm solution to linear system Ax = b as orthogonal projection of the origin 0 on affine subset A = {x ∈ Rⁿ | Ax = b} where A ∈ R^{m×n} is fat full-rank. Suppose, instead of the Euclidean metric, we use taxicab distance to do projection. Then the least 1-norm problem is stated, for b ∈ R(A)

    minimize_x   ‖x‖₁
    subject to   Ax = b        (518)

a.k.a the compressed sensing problem. Optimal solution can be interpreted as an oblique projection on A simply because the Euclidean metric is not employed. This problem statement sometimes returns optimal x⋆ having minimal cardinality; which can be explained intuitively with reference to Figure 70: [19]

Projection of the origin, in 1-norm, on affine subset A is equivalent to maximization (in this case) of the 1-norm ball B₁ until it kisses A ; rather, a kissing point in A achieves the distance in 1-norm from the origin to A. For the example illustrated (m = 1, n = 3), it appears that a vertex of the ball will be first to touch A. 1-norm ball vertices in R³ represent nontrivial points of minimal cardinality 1, whereas edges represent cardinality 2, while relative interiors of facets represent maximal cardinality 3. By reorienting affine subset A so it were parallel to an edge or facet, it becomes evident as we expand or contract the ball that a kissing point is not necessarily unique. 3.9

The 1-norm ball in Rⁿ has 2ⁿ facets and 2n vertices. 3.10 For n > 0

    B₁ = {x ∈ Rⁿ | ‖x‖₁ ≤ 1} = conv{±eᵢ ∈ Rⁿ , i = 1...n}        (519)

is a vertex-description of the unit 1-norm ball. Maximization of the 1-norm ball until it kisses A is equivalent to minimization of the 1-norm ball until it no longer intersects A. Then projection of the origin on affine subset A is

    minimize_{x∈Rⁿ}   ‖x‖₁
    subject to        Ax = b
  ≡                                                          (520)
    minimize_{c∈R, x∈Rⁿ}   c
    subject to             x ∈ cB₁
                           Ax = b

3.9 This is unlike the case for the Euclidean ball (1884) where minimum-distance projection on a convex set is unique (E.9); all kissable faces of the Euclidean ball are single points (vertices).
3.10 The ∞-norm ball in Rⁿ has 2n facets and 2ⁿ vertices.
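As a quick illustration of (518), the sketch below (again assuming Python with numpy and cvxpy; dimensions and random data purely illustrative) solves the least 1-norm problem for a Gaussian fat full-rank A and a right-hand side built from a planted sparse vector. With high probability the minimizer recovers that sparse vector, in line with the kissing-vertex intuition above.

    import numpy as np
    import cvxpy as cp

    np.random.seed(1)
    m, n, k = 15, 30, 3                      # fat A, planted cardinality k
    A = np.random.randn(m, n)
    x_true = np.zeros(n)
    x_true[np.random.choice(n, k, replace=False)] = np.random.randn(k)
    b = A @ x_true

    x = cp.Variable(n)
    prob = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == b])
    prob.solve()

    print("planted support:  ", np.flatnonzero(x_true))
    print("recovered support:", np.flatnonzero(np.abs(x.value) > 1e-6))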


Figure 71 (two panels, "signed" and "positive", plotting sparsity fraction k/m versus m/n; left panel corresponds to problem (518), right panel to its nonnegatively constrained counterpart (523)): Exact recovery transition: Respectively signed [117] [119] or positive [124] [122] [123] solutions x to Ax = b with sparsity k below thick curve are recoverable. For Gaussian random matrix A ∈ R^{m×n}, thick curve demarcates phase transition in ability to find sparsest solution x by linear programming. These results were empirically reproduced in [38].

Figure 72 (residual-amplitude histograms under f₂(x), f₃(x), f₄(x)): Under 1-norm f₂(x), histogram (hatched) of residual amplitudes Ax − b exhibits predominant accumulation of zero-residuals. Nonnegatively constrained 1-norm f₃(x) from (523) accumulates more zero-residuals than f₂(x). Under norm f₄(x) (not discussed), histogram would exhibit predominant accumulation of (nonzero) residuals at gradient discontinuities.


where

    cB₁ = {[I −I]a | aᵀ1 = c , a ≽ 0}        (521)

Then (520) is equivalent to

    minimize_{c∈R, x∈Rⁿ, a∈R²ⁿ}   c
    subject to                    x = [I −I]a
                                  aᵀ1 = c
                                  a ≽ 0
                                  Ax = b
  ≡                                                          (522)
    minimize_{a∈R²ⁿ}   ‖a‖₁
    subject to         [A −A]a = b
                       a ≽ 0

where x⋆ = [I −I]a⋆. (confer (517)) Significance of this result: (confer p.384) Any vector 1-norm minimization problem may have its variable replaced with a nonnegative variable of the same optimal cardinality but twice the length.

All other things being equal, nonnegative variables are easier to solve for sparse solutions. (Figure 71, Figure 72, Figure 100) The compressed sensing problem (518) becomes easier to interpret; e.g., for A ∈ R^{m×n}

    minimize_x   ‖x‖₁
    subject to   Ax = b
                 x ≽ 0
  ≡                                                          (523)
    minimize_x   1ᵀx
    subject to   Ax = b
                 x ≽ 0

movement of a hyperplane (Figure 26, Figure 30) over a bounded polyhedron always has a vertex solution [94, p.22]. Or vector b might lie on the relative boundary of a pointed polyhedral cone K = {Ax | x ≽ 0}. In the latter case, we find practical application of the smallest face F containing b from 2.13.4.3 to remove all columns of matrix A not belonging to F ; because those columns correspond to 0-entries in vector x.

3.2.0.0.2 Exercise. Combinatorial optimization.
A device commonly employed to relax combinatorial problems is to arrange desirable solutions at vertices of bounded polyhedra; e.g., the permutation matrices of dimension n, which are factorial in number, are the extreme points of a polyhedron in the nonnegative orthant described by an intersection of 2n hyperplanes (2.3.2.0.4). Minimizing a linear objective function over a bounded polyhedron is a convex problem (a linear program) that always has an optimal solution residing at a vertex.
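The variable-splitting equivalence (522) can be tried directly. A minimal sketch under the same assumptions as the previous snippet (Python, numpy, cvxpy): it solves the nonnegative split form, maps the answer back through x⋆ = [I −I]a⋆, and compares optimal objective values with the direct 1-norm solution; the two optimal values coincide even if the minimizer itself is not unique.

    import numpy as np
    import cvxpy as cp

    np.random.seed(2)
    m, n = 8, 20
    A = np.random.randn(m, n)
    x0 = np.zeros(n); x0[[1, 5, 7]] = [1.0, -2.0, 0.5]
    b = A @ x0

    # direct 1-norm minimization (518)
    x = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == b]).solve()

    # nonnegative split form (522): a >= 0, [A -A] a = b, minimize 1'a
    a = cp.Variable(2 * n, nonneg=True)
    cp.Problem(cp.Minimize(cp.sum(a)), [np.hstack([A, -A]) @ a == b]).solve()
    x_from_a = a.value[:n] - a.value[n:]          # x* = [I -I] a*

    print(np.abs(x.value).sum(), np.abs(x_from_a).sum())   # equal optimal values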


What about minimizing other functions? Given some nonsingular matrix A, geometrically describe three circumstances under which there are likely to exist vertex solutions to

    minimize_{x∈Rⁿ}   ‖Ax‖₁
    subject to        x ∈ P        (524)

optimized over some bounded polyhedron P. 3.11

3.2.1 k smallest entries

Sum of the k smallest entries of x ∈ Rⁿ is the optimal objective value from: for 1 ≤ k ≤ n

    ∑_{i=n−k+1}^{n} π(x)ᵢ = minimize_{y∈Rⁿ}   xᵀy
                            subject to        0 ≼ y ≼ 1
                                              1ᵀy = k
                          ≡                                          (525)
                            maximize_{z∈Rⁿ, t∈R}   kt + 1ᵀz
                            subject to             x ≽ t1 + z
                                                   z ≼ 0

which are dual linear programs, where π(x)₁ = max{xᵢ , i = 1...n} and where π is a nonlinear permutation-operator sorting its vector argument into nonincreasing order. Finding the k smallest entries of an n-length vector x is expressible as an infimum of n!/(k!(n − k)!) linear functions of x. The sum ∑ π(x)ᵢ is therefore a concave function of x ; in fact, monotonic (3.6.1.0.1) in this instance.

3.2.2 k largest entries

Sum of the k largest entries of x ∈ Rⁿ is the optimal objective value from: [61, exer.5.19]

    ∑_{i=1}^{k} π(x)ᵢ = maximize_{y∈Rⁿ}   xᵀy
                        subject to        0 ≼ y ≼ 1
                                          1ᵀy = k
                      ≡                                              (526)
                        minimize_{z∈Rⁿ, t∈R}   kt + 1ᵀz
                        subject to             x ≼ t1 + z
                                               z ≽ 0

3.11 Hint: Suppose, for example, P belongs to an orthant and A were orthogonal. Begin with A = I and apply level sets of the objective, as in Figure 67 and Figure 70. Or rewrite the problem as a linear program like (514) and (516) but in a composite variable [x ; t] ← y.


which are dual linear programs. Finding the k largest entries of an n-length vector x is expressible as a supremum of n!/(k!(n − k)!) linear functions of x. (Figure 74) The summation is therefore a convex function (and monotonic in this instance, 3.6.1.0.1).

3.2.2.1 k-largest norm

Let Πx be a permutation of entries xᵢ such that their absolute value becomes arranged in nonincreasing order: |Πx|₁ ≥ |Πx|₂ ≥ · · · ≥ |Πx|ₙ. Sum of the k largest entries of |x| ∈ Rⁿ is a norm, by properties of vector norm (3.2), and is the optimal objective value of a linear program:

    ‖x‖ₙₖ ≜ ∑_{i=1}^{k} |Πx|ᵢ = ∑_{i=1}^{k} π(|x|)ᵢ

          = sup_{i∈I} { aᵢᵀx | aᵢⱼ ∈ {−1, 0, 1} , card aᵢ = k }

          = minimize_{z∈Rⁿ, t∈R}   kt + 1ᵀz
            subject to             −t1 − z ≼ x ≼ t1 + z
                                   z ≽ 0                              (527)

          = maximize_{y₁, y₂∈Rⁿ}   (y₁ − y₂)ᵀx
            subject to             0 ≼ y₁ ≼ 1
                                   0 ≼ y₂ ≼ 1
                                   (y₁ + y₂)ᵀ1 = k

where the norm subscript derives from a binomial coefficient (n choose k), and

    ‖x‖ₙₙ = ‖x‖₁
    ‖x‖ₙ₁ = ‖x‖∞                                                      (528)
    ‖x‖ₙₖ = ‖π(|x|)₁:ₖ‖₁

Sum of the k largest absolute entries of an n-length vector x is expressible as a supremum of 2ᵏ n!/(k!(n − k)!) linear functions of x ; (Figure 74) hence, this norm is convex in x. [61, exer.6.3e]

    minimize_{x∈Rⁿ}   ‖x‖ₙₖ
    subject to        x ∈ C
  ≡                                                                   (529)
    minimize_{z∈Rⁿ, t∈R, x∈Rⁿ}   kt + 1ᵀz
    subject to                   −t1 − z ≼ x ≼ t1 + z
                                 z ≽ 0
                                 x ∈ C
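The primal linear program in (526) and the LP form of the k-largest norm (527) are easy to sanity-check numerically. A minimal sketch (Python with numpy and cvxpy assumed); cvxpy also happens to expose sum_largest directly, which serves as an independent reference here.

    import numpy as np
    import cvxpy as cp

    np.random.seed(3)
    n, k = 12, 4
    x = np.random.randn(n)

    # (526) primal: sum of the k largest entries of x
    y = cp.Variable(n)
    lp = cp.Problem(cp.Maximize(x @ y), [y >= 0, y <= 1, cp.sum(y) == k])
    lp.solve()
    print(lp.value, np.sort(x)[-k:].sum(), cp.sum_largest(x, k).value)

    # (527) LP form of the k-largest norm: sum of the k largest |x_i|
    z = cp.Variable(n)
    t = cp.Variable()
    knorm = cp.Problem(cp.Minimize(k * t + cp.sum(z)),
                       [-t - z <= x, x <= t + z, z >= 0])
    knorm.solve()
    print(knorm.value, np.sort(np.abs(x))[-k:].sum())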


3.2.2.1.1 Exercise. Polyhedral epigraph of k-largest norm.
Make those card I = 2ᵏ n!/(k!(n − k)!) linear functions explicit for ‖x‖₂₂ and ‖x‖₂₁ on R² and ‖x‖₃₂ on R³. Plot ‖x‖₂₂ and ‖x‖₂₁ in three dimensions.

3.2.2.1.2 Example. Compressed sensing problem.
Conventionally posed as convex problem (518), we showed: the compressed sensing problem can always be posed equivalently with a nonnegative variable as in convex statement (523). The 1-norm predominantly appears in the literature because it is convex, it inherently minimizes cardinality under some technical conditions, [68] and because the desirable 0-norm is intractable. Assuming a cardinality-k solution exists, the compressed sensing problem may be written as a difference of two convex functions: for A ∈ R^{m×n}

    minimize_{x∈Rⁿ}   ‖x‖₁ − ‖x‖ₙₖ
    subject to        Ax = b
                      x ≽ 0
  ≡                                                          (530)
    find         x ∈ Rⁿ
    subject to   Ax = b
                 x ≽ 0
                 ‖x‖₀ ≤ k

which is a nonconvex statement, a minimization of the n − k smallest entries of variable vector x, minimization of a concave function on Rⁿ₊ (3.2.1) [307, 32]; but a statement of compressed sensing more precise than (523) because of its equivalence to 0-norm. ‖x‖ₙₖ is the convex k-largest norm of x (monotonic on Rⁿ₊) while ‖x‖₀ (quasiconcave on Rⁿ₊) expresses its cardinality. Global optimality occurs at a zero objective of minimization; id est, when the smallest n − k entries of variable vector x are zeroed. Under the nonnegativity constraint, this compressed sensing problem (530) becomes the same as

    minimize_{z(x), x∈Rⁿ}   (1 − z)ᵀx
    subject to              Ax = b
                            x ≽ 0        (531)

where

    1 = ∇‖x‖₁ ,   z = ∇‖x‖ₙₖ ,   x ≽ 0        (532)


where the gradient of the k-largest norm is an optimal solution to a convex problem:

    ‖x‖ₙₖ  =  maximize_{y∈Rⁿ}   yᵀx
              subject to        0 ≼ y ≼ 1
                                yᵀ1 = k
                                                      , x ≽ 0        (533)
    ∇‖x‖ₙₖ = arg maximize_{y∈Rⁿ}   yᵀx
             subject to            0 ≼ y ≼ 1
                                   yᵀ1 = k

3.2.2.1.3 Exercise. k-largest norm gradient.
Prove (532). Find ∇‖x‖₁ and ∇‖x‖ₙₖ on Rⁿ. 3.12

3.2.3 clipping

Zeroing negative vector entries under 1-norm is accomplished:

    ‖x⁺‖₁ = minimize_{t∈Rⁿ}   1ᵀt
            subject to        x ≼ t
                              0 ≼ t        (534)

where, for x = [xᵢ , i = 1...n] ∈ Rⁿ

    x⁺ ≜ t⋆ = [ xᵢ , xᵢ ≥ 0
                0 ,  xᵢ < 0  , i = 1...n ] = ½(x + |x|)        (535)

(clipping)

    minimize_{x∈Rⁿ}   ‖x⁺‖₁
    subject to        x ∈ C
  ≡                                                          (536)
    minimize_{x∈Rⁿ, t∈Rⁿ}   1ᵀt
    subject to              x ≼ t
                            0 ≼ t
                            x ∈ C

3.12 Hint: D.2.1.
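A small numerical check of (534)-(535) (Python with numpy and cvxpy assumed; data illustrative): the LP's optimal t recovers the clipped vector x⁺ = ½(x + |x|).

    import numpy as np
    import cvxpy as cp

    np.random.seed(4)
    n = 6
    x = np.random.randn(n)

    t = cp.Variable(n)
    prob = cp.Problem(cp.Minimize(cp.sum(t)), [x <= t, t >= 0])
    prob.solve()

    x_plus = 0.5 * (x + np.abs(x))          # clipping, (535)
    print(np.round(t.value, 6))
    print(np.round(x_plus, 6))              # should match t*
    print(prob.value, np.linalg.norm(x_plus, 1))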


3.3 Inverted functions and roots

A given function f is convex iff −f is concave. Both functions are loosely referred to as convex since −f is simply f inverted about the abscissa axis, and minimization of f is equivalent to maximization of −f.

A given positive function f is convex iff 1/f is concave; f inverted about ordinate 1 is concave. Minimization of f is maximization of 1/f.

We wish to implement objectives of the form x⁻¹. Suppose we have a 2×2 matrix

    T ≜ [ x  z
          z  y ] ∈ S²        (537)

which is positive semidefinite by (1514) when

    T ≽ 0 ⇔ x > 0 and xy ≥ z²        (538)

A polynomial constraint such as this is therefore called a conic constraint. 3.13 This means we may formulate convex problems, having inverted variables, as semidefinite programs in Schur-form; e.g.,

    minimize_{x∈R}   x⁻¹
    subject to       x > 0
                     x ∈ C
  ≡                                                          (539)
    minimize_{x, y∈R}   y
    subject to          [ x  1
                          1  y ] ≽ 0
                        x ∈ C

rather

    x > 0 , y ≥ 1/x  ⇔  [ x  1
                          1  y ] ≽ 0        (540)

(inverted) For vector x = [xᵢ , i = 1...n] ∈ Rⁿ

    minimize_{x∈Rⁿ}   ∑_{i=1}^{n} xᵢ⁻¹
    subject to        x ≻ 0
                      x ∈ C
  ≡                                                          (541)
    minimize_{x∈Rⁿ, y∈R}   y
    subject to             [ xᵢ  √n
                             √n  y  ] ≽ 0 ,  i = 1...n
                           x ∈ C

rather

    x ≻ 0 , y ≥ tr(δ(x)⁻¹)  ⇔  [ xᵢ  √n
                                  √n  y  ] ≽ 0 ,  i = 1...n        (542)

3.13 In this dimension, the convex cone formed from the set of all values {x, y, z} satisfying constraint (538) is called a rotated quadratic or circular cone or positive semidefinite cone.
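To illustrate (539)-(540), the sketch below (Python with numpy and cvxpy; an SDP-capable solver such as SCS assumed; the interval constraint standing in for C is only an example) minimizes 1/x by minimizing y subject to the 2×2 Schur-form LMI. The LMI is encoded as a PSD matrix variable whose off-diagonal entry is pinned to 1, so its diagonal entries play the roles of x and y.

    import cvxpy as cp

    # T = [[x, 1], [1, y]] >= 0  encodes  x > 0, y >= 1/x   (540)
    T = cp.Variable((2, 2), PSD=True)
    x, y = T[0, 0], T[1, 1]

    # convex set C: the interval [0.5, 2]
    prob = cp.Problem(cp.Minimize(y),
                      [T[0, 1] == 1, x >= 0.5, x <= 2])
    prob.solve(solver=cp.SCS)

    print(x.value, y.value)   # expect x ~ 2 and y ~ 0.5 = 1/x

Minimizing y pushes x to the upper end of the interval, consistent with minimizing x⁻¹ over C.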


236 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONS3.3.1 fractional power[149] To implement an objective of the form x α for positive α , we quantizeα and work instead with that approximation. Choose nonnegative integer qfor adequate quantization of α like so:α k 2 q (543)where k ∈{0, 1, 2... 2 q −1}. Any k from that set may be written∑k= q b i 2 i−1 where b i ∈ {0, 1}. Define vector y=[y i , i=0... q] with y 0 =1:i=13.3.1.1 positiveThen we have the equivalent semidefinite program for maximizing a concavefunction x α , for quantized 0≤α 0x ∈ C≡maximizex∈R , y∈R q+1subject toy q[yi−1 y iy ix b i]≽ 0 ,i=1... qx ∈ C (544)where nonnegativity of y q is enforced by maximization; id est,[ ]x > 0, y q ≤ x α yi−1 y i⇔≽ 0 , i=1... q (545)3.3.1.2 negativeIt is also desirable to implement an objective of the form x −α for positive α .The technique is nearly the same as before: for quantized 0≤α 0x ∈ C≡y iminimizex , z∈R , y∈R q+1subject tox b iz[yi−1 y iy ix b i][ ]z 1≽ 01 y q≽ 0 ,i=1... qx ∈ C (546)


3.4. AFFINE FUNCTION 237rather]x > 0, z ≥ x −α⇔[yi−1 y iy ix b i[ ]z 1≽ 01 y q≽ 0 ,i=1... q(547)3.3.1.3 positive invertedNow define vector t=[t i , i=0... q] with t 0 =1. To implement an objectivex 1/α for quantized 0≤α


238 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONSdefinition (497). All affine functions satisfy a membership relation, forsome normal cone, like (506). Affine multidimensional functions are moreeasily recognized by existence of no multiplicative multivariate terms and nopolynomial terms of degree higher than 1 ; id est, entries of the function arecharacterized by only linear combinations of argument entries plus constants.A T XA + B T B is affine in X , for example. Trace is an affine function;actually, a real linear function expressible as inner product f(X) = 〈A, X〉with matrix A being the identity. The real affine function in Figure 73illustrates hyperplanes, in its domain, constituting contours of equalfunction-value (level sets (555)).3.4.0.0.1 Example. Engineering control. [388,2.2] 3.14For X ∈ S M and matrices A,B, Q, R of any compatible dimensions, forexample, the expression XAX is not affine in X whereas[ ]R Bg(X) =T XXB Q + A T (551)X + XAis an affine multidimensional function.engineering control. [59] [151]Such a function is typical in(confer Figure 16) Any single- or many-valued inverse of an affine functionis affine.3.4.0.0.2 Example. Linear objective.Consider minimization of a real affine function f(z)= a T z + b over theconvex feasible set C in its domain R 2 illustrated in Figure 73. Since scalar bis fixed, the problem posed is the same as the convex optimizationminimize a T zzsubject to z ∈ C(552)whose objective of minimization is a real linear function. Were convex set Cpolyhedral (2.12), then this problem would be called a linear program. Were3.14 The interpretation from this citation of {X ∈ S M | g(X) ≽ 0} as “an intersectionbetween a linear subspace and the cone of positive semidefinite matrices” is incorrect.(See2.9.1.0.2 for a similar example.) The conditions they state under which strongduality holds for semidefinite programming are conservative. (confer4.2.3.0.1)


Figure 73 (depicting affine subset A, set C, halfspaces H−, H+, normal a, axes z₁, z₂, and hyperplanes {z ∈ R² | aᵀz = κ₁}, {z ∈ R² | aᵀz = κ₂}, {z ∈ R² | aᵀz = κ₃}): Three hyperplanes intersecting convex set C ⊂ R² from Figure 28. Cartesian axes in R³: Plotted is affine subset A = f(R²) ⊂ R² × R ; a plane with third dimension. We say the sequence of hyperplanes, w.r.t domain R² of affine function f(z) = aᵀz + b : R² → R, is increasing in normal direction (Figure 26) because the affine function increases in direction of gradient a (3.6.0.0.3). Minimization of aᵀz + b over C is equivalent to minimization of aᵀz.


240 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONSconvex set C an intersection with a positive semidefinite cone, then thisproblem would be called a semidefinite program.There are two distinct ways to visualize this problem: one in the objectivefunction’s domain R 2 ,[ the]other including the ambient space of the objectiveR2function’s range as in . Both visualizations are illustrated in Figure 73.RVisualization in the function domain is easier because of lower dimension andbecauselevel sets (555) of any affine function are affine. (2.1.9)In this circumstance, the level sets are parallel hyperplanes with respectto R 2 . One solves optimization problem (552) graphically by finding thathyperplane intersecting feasible set C furthest right (in the direction ofnegative gradient −a (3.6)).When a differentiable convex objective function f is nonlinear, thenegative gradient −∇f is a viable search direction (replacing −a in (552)).(2.13.10.1, Figure 67) [152] Then the nonlinear objective function can bereplaced with a dynamic linear objective; linear as in (552).3.4.0.0.3 Example. Support function. [199,C.2.1-C.2.3.1]For arbitrary set Y ⊆ R n , its support function σ Y (a) : R n → R is definedσ Y (a) supa T z (553)z∈Ya positively homogeneous function of direction a whose range contains ±∞.[250, p.135] For each z ∈ Y , a T z is a linear function of vector a . Becauseσ Y (a) is a pointwise supremum of linear functions, it is convex in a(Figure 74). Application of the support function is illustrated in Figure 29afor one particular normal a . Given nonempty closed bounded convex sets Yand Z in R n and nonnegative scalars β and γ [371, p.234]σ βY+γZ (a) = βσ Y (a) + γσ Z (a) (554)3.4.0.0.4 Exercise. Level sets.Given a function f and constant κ , its level sets are definedL κ κf {z | f(z)=κ} (555)


Figure 74 (depicting lines {aᵀzᵢ + bᵢ | a ∈ R}, i = 1...5, and their pointwise supremum sup_i aᵀ_p zᵢ + bᵢ evaluated at argument a_p): Pointwise supremum of convex functions remains convex; by epigraph intersection. Supremum of affine functions in variable a evaluated at argument a_p is illustrated. Topmost affine function per a is supremum.

Give two distinct examples of convex function, that are not affine, having convex level sets.

3.4.0.0.5 Exercise. Epigraph intersection. (confer Figure 74)
Draw three hyperplanes in R³ representing max(0, x) ≜ sup{0, xᵢ | x ∈ Rⁿ} in R²×R to see why the maximum of nonnegative vector entries is a convex function of variable x. What is the normal to each hyperplane? 3.15 Why is max(x) convex?

3.5 Epigraph, Sublevel set

It is well established that a continuous real function is convex if and only if its epigraph makes a convex set; [199] [307] [358] [371] [250] epigraph is the connection between convex sets and convex functions (p.219). Piecewise-continuous convex functions are admitted, thereby, and

3.15 Hint: page 258.


Figure 75 (depicting quasiconvex q(x) and convex f(x) versus x): Quasiconvex function q epigraph is not necessarily convex, but convex function f epigraph is convex in any dimension. Sublevel sets are necessarily convex for either function, but sufficient only for quasiconvexity.

all invariant properties of convex sets carry over directly to convex functions. Generalization of epigraph to a vector-valued function f(X) : R^{p×k} → R^M is straightforward: [294]

    epi f ≜ {(X, t) ∈ R^{p×k} × R^M | X ∈ dom f , f(X) ≼_{R^M₊} t}        (556)

id est,

    f convex ⇔ epi f convex        (557)

Necessity is proven: [61, exer.3.60] Given any (X, u), (Y, v) ∈ epi f, we must show for all µ ∈ [0, 1] that µ(X, u) + (1−µ)(Y, v) ∈ epi f ; id est, we must show

    f(µX + (1−µ)Y ) ≼_{R^M₊} µu + (1−µ)v        (558)

Yet this holds by definition because f(µX + (1−µ)Y ) ≼ µf(X) + (1−µ)f(Y ). The converse also holds.

3.5.0.0.1 Exercise. Epigraph sufficiency.
Prove that converse: Given any (X, u), (Y, v) ∈ epi f, if for all µ ∈ [0, 1] µ(X, u) + (1−µ)(Y, v) ∈ epi f holds, then f must be convex.


3.5. EPIGRAPH, SUBLEVEL SET 243Sublevel sets of a convex real function are convex. Likewise, correspondingto each and every ν ∈ R ML ν f {X ∈ dom f | f(X) ≼ν } ⊆ R p×k (559)R M +sublevel sets of a convex vector-valued function are convex. As for convexreal functions, the converse does not hold. (Figure 75)To prove necessity of convex sublevel sets: For any X,Y ∈ L ν f we mustshow for each and every µ∈[0, 1] that µX + (1−µ)Y ∈ L ν f . By definition,f(µX + (1−µ)Y ) ≼R M +µf(X) + (1−µ)f(Y ) ≼R M +ν (560)When an epigraph (556) is artificially bounded above, t ≼ ν , then thecorresponding sublevel set can be regarded as an orthogonal projection ofepigraph on the function domain.Sense of the inequality is reversed in (556), for concave functions, and weuse instead the nomenclature hypograph. Sense of the inequality in (559) isreversed, similarly, with each convex set then called superlevel set.3.5.0.0.2 Example. Matrix pseudofractional function.Consider a real function of two variablesf(A, x) : S n × R n → R = x T A † x (561)on domf = S+× n R(A). This function is convex simultaneously in bothvariables when variable matrix A belongs to the entire positive semidefinitecone S n + and variable vector x is confined to range R(A) of matrix A .To explain this, we need only demonstrate that the function epigraph is


244 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONSconvex. Consider Schur-form (1511) fromA.4: for t ∈ R[ ] A zG(A, z , t) =z T ≽ 0t⇔z T (I − AA † ) = 0t − z T A † z ≥ 0A ≽ 0(562)Inverse image of the positive semidefinite cone S n+1+ under affine mappingG(A, z , t) is convex by Theorem 2.1.9.0.1. Of the equivalent conditions forpositive semidefiniteness of G , the first is an equality demanding vector zbelong to R(A). Function f(A, z)=z T A † z is convex on convex domainS+× n R(A) because the Cartesian product constituting its epigraphepif(A, z) = { (A, z , t) | A ≽ 0, z ∈ R(A), z T A † z ≤ t } = G −1( )S n+1+ (563)is convex.3.5.0.0.3 Exercise. Matrix product function.Continue Example 3.5.0.0.2 by introducing vector variable x and makingthe substitution z ←Ax . Because of matrix symmetry (E), for all x∈ R nf(A, z(x)) = z T A † z = x T A T A † Ax = x T Ax = f(A, x) (564)whose epigraph isepif(A, x) = { (A, x, t) | A ≽ 0, x T Ax ≤ t } (565)Provide two simple explanations why f(A, x) = x T Ax is not a functionconvex simultaneously in positive semidefinite matrix A and vector x ondomf = S+× n R n .3.5.0.0.4 Example. Matrix fractional function. (confer3.7.2.0.1)Continuing Example 3.5.0.0.2, now consider a real function of two variableson domf = S n +×R n for small positive constant ǫ (confer (1870))f(A, x) = ǫx T (A + ǫI) −1 x (566)


3.5. EPIGRAPH, SUBLEVEL SET 245where the inverse always exists by (1451). This function is convexsimultaneously in both variables over the entire positive semidefinite cone S n +and all x∈ R n : Consider Schur-form (1514) fromA.4: for t ∈ R[ ] A + ǫI zG(A, z , t) =z T ǫ −1 ≽ 0t⇔t − ǫz T (A + ǫI) −1 z ≥ 0A + ǫI ≻ 0(567)Inverse image of the positive semidefinite cone S n+1+ under affine mappingG(A, z , t) is convex by Theorem 2.1.9.0.1. Function f(A, z) is convex onS+×R n n because its epigraph is that inverse image:epi f(A, z) = { (A, z , t) | A + ǫI ≻ 0, ǫz T (A + ǫI) −1 z ≤ t } = G −1( )S n+1+(568)3.5.1 matrix fractional projector functionConsider nonlinear function f having orthogonal projector W as argument:f(W , x) = ǫx T (W + ǫI) −1 x (569)Projection matrix W has property W † = W T = W ≽ 0 (1916). Anyorthogonal projector can be decomposed into an outer product oforthonormal matrices W = UU T where U T U = I as explained inE.3.2. From (1870) for any ǫ > 0 and idempotent symmetric W ,ǫ(W + ǫI) −1 = I − (1 + ǫ) −1 W from whichThereforef(W , x) = ǫx T (W + ǫI) −1 x = x T( I − (1 + ǫ) −1 W ) x (570)limǫ→0 +f(W, x) = lim (W + ǫI) −1 x = x T (I − W )x (571)ǫ→0 +ǫxTwhere I − W is also an orthogonal projector (E.2).We learned from Example 3.5.0.0.4 that f(W , x)= ǫx T (W +ǫI) −1 x isconvex simultaneously in both variables over all x ∈ R n when W ∈ S n + is


246 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONSconfined to the entire positive semidefinite cone (including its boundary). Itis now our goal to incorporate f into an optimization problem such thatan optimal solution returned always comprises a projection matrix W . Theset of orthogonal projection matrices is a nonconvex subset of the positivesemidefinite cone. So f cannot be convex on the projection matrices, andits equivalent (for idempotent W )f(W , x) = x T( I − (1 + ǫ) −1 W ) x (572)cannot be convex simultaneously in both variables on either the positivesemidefinite or symmetric projection matrices.Suppose we allow domf to constitute the entire positive semidefinitecone but constrain W to a Fantope (90); e.g., for convex set C and 0 < k < nas inminimize ǫx T (W + ǫI) −1 xx∈R n , W ∈S nsubject to 0 ≼ W ≼ I(573)trW = kx ∈ CAlthough this is a convex problem, there is no guarantee that optimal W isa projection matrix because only extreme points of a Fantope are orthogonalprojection matrices UU T .Let’s try partitioning the problem into two convex parts (one for x andone for W), substitute equivalence (570), and then iterate solution of convexproblemminimize x T (I − (1 + ǫ) −1 W)xx∈R n(574)subject to x ∈ Cwith convex problem(a)minimize x ⋆T (I − (1 + ǫ) −1 W)x ⋆W ∈S nsubject to 0 ≼ W ≼ ItrW = k≡maximize x ⋆T Wx ⋆W ∈S nsubject to 0 ≼ W ≼ ItrW = k(575)until convergence, where x ⋆ represents an optimal solution of (574) fromany particular iteration. The idea is to optimally solve for the partitionedvariables which are later combined to solve the original problem (573).What makes this approach sound is that the constraints are separable, the


3.5. EPIGRAPH, SUBLEVEL SET 247partitioned feasible sets are not interdependent, and the fact that the originalproblem (though nonlinear) is convex simultaneously in both variables. 3.16But partitioning alone does not guarantee a projector as solution.To make orthogonal projector W a certainty, we must invoke a knownanalytical optimal solution to problem (575): Diagonalize optimal solutionfrom problem (574) x ⋆ x ⋆T QΛQ T (A.5.1) and set U ⋆ = Q(:, 1:k)∈ R n×kper (1700c);W = U ⋆ U ⋆T = x⋆ x ⋆T‖x ⋆ ‖ 2 + Q(:, 2:k)Q(:, 2:k)T (576)Then optimal solution (x ⋆ , U ⋆ ) to problem (573) is found, for small ǫ , byiterating solution to problem (574) with optimal (projector) solution (576)to convex problem (575).Proof. Optimal vector x ⋆ is orthogonal to the last n −1 columns oforthogonal matrix Q , sof ⋆ (574) = ‖x ⋆ ‖ 2 (1 − (1 + ǫ) −1 ) (577)after each iteration. Convergence of f(574) ⋆ is proven with the observation thatiteration (574) (575a) is a nonincreasing sequence that is bounded below by 0.Any bounded monotonic sequence in R is convergent. [258,1.2] [42,1.1]Expression (576) for optimal projector W holds at each iteration, therefore‖x ⋆ ‖ 2 (1 − (1 + ǫ) −1 ) must also represent the optimal objective value f(574)⋆at convergence.Because the objective f (573) from problem (573) is also bounded belowby 0 on the same domain, this convergent optimal objective value f(574) ⋆ (forpositive ǫ arbitrarily close to 0) is necessarily optimal for (573); id est,by (1683), andf ⋆ (574) ≥ f ⋆ (573) ≥ 0 (578)limǫ→0 +f⋆ (574) = 0 (579)Since optimal (x ⋆ , U ⋆ ) from problem (574) is feasible to problem (573), andbecause their objectives are equivalent for projectors by (570), then converged(x ⋆ , U ⋆ ) must also be optimal to (573) in the limit. Because problem (573)is convex, this represents a globally optimal solution.3.16 A convex problem has convex feasible set, and the objective surface has one and onlyone global minimum. [302, p.123]
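A minimal numerical sketch of the alternation (574)-(576) for the simplest case k = 1 follows (Python with numpy and cvxpy assumed; the affine set standing in for C, the dimensions, and the initialization are only illustrative). For k = 1 the analytic projector update (576) reduces to the normalized outer product of the current x⋆, so no eigendecomposition is needed; after a few iterations the objective settles near ‖x⋆‖²(1 − (1+ε)⁻¹), matching (577).

    import numpy as np
    import cvxpy as cp

    np.random.seed(8)
    n, eps = 5, 1e-3                               # k = 1 keeps the update simple
    B = np.random.randn(1, n); c = np.array([1.0]) # convex set C = {x : Bx = c}

    W = np.zeros((n, n))                           # arbitrary initialization
    for it in range(20):
        # (574): minimize x'(I - (1+eps)^{-1} W)x over x in C
        P = np.eye(n) - W / (1 + eps)
        x = cp.Variable(n)
        prob = cp.Problem(cp.Minimize(cp.quad_form(x, P)), [B @ x == c])
        prob.solve()
        x_star = x.value
        # (575)-(576): analytic optimal projector; for k = 1 it is the
        # normalized outer product of the current x*
        W = np.outer(x_star, x_star) / (x_star @ x_star)

    print(prob.value, (x_star @ x_star) * (1 - 1 / (1 + eps)))   # compare with (577)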


3.5.2 semidefinite program via Schur

Schur complement (1511) can be used to convert a projection problem to an optimization problem in epigraph form. Suppose, for example, we are presented with the constrained projection problem studied by Hayden & Wells in [184] (who provide analytical solution): Given A ∈ R^{M×M} and some full-rank matrix S ∈ R^{M×L} with L < M

    minimize_{X∈S^M}   ‖A − X‖²_F
    subject to         SᵀXS ≽ 0        (580)

Variable X is constrained to be positive semidefinite, but only on a subspace determined by S. First we write the epigraph form:

    minimize_{X∈S^M, t∈R}   t
    subject to              ‖A − X‖²_F ≤ t
                            SᵀXS ≽ 0        (581)

Next we use Schur complement [277, 6.4.3] [248] and matrix vectorization (2.2):

    minimize_{X∈S^M, t∈R}   t
    subject to              [ tI            vec(A − X)
                              vec(A − X)ᵀ   1          ] ≽ 0        (582)
                            SᵀXS ≽ 0

This semidefinite program (4) is an epigraph form in disguise, equivalent to (580); it demonstrates how a quadratic objective or constraint can be converted to a semidefinite constraint.

Were problem (580) instead equivalently expressed without the square

    minimize_{X∈S^M}   ‖A − X‖_F
    subject to         SᵀXS ≽ 0        (583)

then we get a subtle variation:

    minimize_{X∈S^M, t∈R}   t
    subject to              ‖A − X‖_F ≤ t
                            SᵀXS ≽ 0        (584)


that leads to an equivalent for (583) (and for (580) by (513))

    minimize_{X∈S^M, t∈R}   t
    subject to              [ tI            vec(A − X)
                              vec(A − X)ᵀ   t          ] ≽ 0        (585)
                            SᵀXS ≽ 0

3.5.2.0.1 Example. Schur anomaly.
Consider a problem abstract in the convex constraint, given symmetric matrix A

    minimize_{X∈S^M}   ‖X‖²_F − ‖A − X‖²_F
    subject to         X ∈ C        (586)

the minimization of a difference of two quadratic functions each convex in matrix X. Observe equality

    ‖X‖²_F − ‖A − X‖²_F = 2 tr(XA) − ‖A‖²_F        (587)

So problem (586) is equivalent to the convex optimization

    minimize_{X∈S^M}   tr(XA)
    subject to         X ∈ C        (588)

But this problem (586) does not have Schur-form

    minimize_{X∈S^M, α, t}   t − α
    subject to               X ∈ C
                             ‖X‖²_F ≤ t
                             ‖A − X‖²_F ≥ α        (589)

because the constraint in α is nonconvex. (2.9.1.0.1)
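Returning to the projection problem (580): a conic modeling tool performs the epigraph and Schur-form transformations (581)-(582) internally, so the problem can be stated almost verbatim. A minimal sketch (Python with numpy and cvxpy, an SDP solver such as SCS assumed; data and dimensions illustrative). The subspace constraint SᵀXS ≽ 0 is imposed by equating that block to an auxiliary PSD variable.

    import numpy as np
    import cvxpy as cp

    np.random.seed(5)
    M, L = 5, 2
    A = np.random.randn(M, M); A = (A + A.T) / 2
    S = np.random.randn(M, L)                      # full-rank, L < M

    X = cp.Variable((M, M), symmetric=True)
    Z = cp.Variable((L, L), PSD=True)              # Z >= 0 enforces S'XS >= 0
    prob = cp.Problem(cp.Minimize(cp.sum_squares(A - X)),
                      [Z == S.T @ X @ S])
    prob.solve(solver=cp.SCS)

    print(prob.value)                              # ||A - X*||_F^2
    print(np.linalg.eigvalsh(S.T @ X.value @ S))   # nonnegative up to tolerance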


Figure 76 (gradient field plotted on a grid over axes Y₁, Y₂ ∈ [−2, 2]): Gradient in R² evaluated on grid over some open disc in domain of convex quadratic bowl f(Y) = YᵀY : R² → R illustrated in Figure 77. Circular contours are level sets; each defined by a constant function-value.

3.6 Gradient

Gradient ∇f of any differentiable multidimensional function f (formally defined in D.1) maps each entry fᵢ to a space having the same dimension as the ambient space of its domain. Notation ∇f is shorthand for gradient ∇ₓf(x) of f with respect to x. ∇f(y) can mean ∇_y f(y) or gradient ∇ₓf(y) of f(x) with respect to x evaluated at y ; a distinction that should become clear from context.

Gradient of a differentiable real function f(x) : R^K → R with respect to its vector argument is uniquely defined

    ∇f(x) = [ ∂f(x)/∂x₁
              ∂f(x)/∂x₂
              ⋮
              ∂f(x)/∂x_K ] ∈ R^K        (1759)

while the second-order gradient of the twice differentiable real function with


respect to its vector argument is traditionally called the Hessian; 3.17

    ∇²f(x) = [ ∂²f(x)/∂x₁²       ∂²f(x)/∂x₁∂x₂    · · ·   ∂²f(x)/∂x₁∂x_K
               ∂²f(x)/∂x₂∂x₁     ∂²f(x)/∂x₂²      · · ·   ∂²f(x)/∂x₂∂x_K
               ⋮                 ⋮                ⋱       ⋮
               ∂²f(x)/∂x_K∂x₁    ∂²f(x)/∂x_K∂x₂   · · ·   ∂²f(x)/∂x_K²   ] ∈ S^K        (1760)

The gradient can be interpreted as a vector pointing in the direction of greatest change, [219, 15.6] or polar to direction of steepest descent. 3.18 [375] The gradient can also be interpreted as that vector normal to a sublevel set; e.g., Figure 78, Figure 67.

For the quadratic bowl in Figure 77, the gradient maps to R²; illustrated in Figure 76. For a one-dimensional function of real variable, for example, the gradient evaluated at any point in the function domain is just the slope (or derivative) of that function there. (confer D.1.4.1)

For any differentiable multidimensional function, zero gradient ∇f = 0 is a condition necessary for its unconstrained minimization [152, 3.2]:

3.6.0.0.1 Example. Projection on rank-1 subset.
For A ∈ S^N having eigenvalues λ(A) = [λᵢ] ∈ R^N, consider the unconstrained nonconvex optimization that is a projection on the rank-1 subset (2.9.2.1) of positive semidefinite cone S^N₊ : Defining λ₁ ≜ maxᵢ{λ(A)ᵢ} and corresponding eigenvector v₁

    minimize_x ‖xxᵀ − A‖²_F = minimize_x tr(xxᵀ(xᵀx) − 2Axxᵀ + AᵀA)
                            = { ‖λ(A)‖² ,        λ₁ ≤ 0
                                ‖λ(A)‖² − λ₁² ,  λ₁ > 0        (1695)

    arg minimize_x ‖xxᵀ − A‖²_F = { 0 ,       λ₁ ≤ 0
                                    v₁√λ₁ ,   λ₁ > 0        (1696)

From (1789) and D.2.1, the gradient of ‖xxᵀ − A‖²_F is

    ∇ₓ((xᵀx)² − 2xᵀAx) = 4(xᵀx)x − 4Ax        (590)

3.17 Jacobian is the Hessian transpose, so commonly confused in matrix calculus.
3.18 Newton's direction −∇²f(x)⁻¹∇f(x) is better for optimization of nonlinear functions well approximated locally by a quadratic function. [152, p.105]


Setting the gradient to 0

    Ax = x(xᵀx)        (591)

is necessary for optimal solution. Replace vector x with a normalized eigenvector vᵢ of A ∈ S^N, corresponding to a positive eigenvalue λᵢ, scaled by square root of that eigenvalue. Then (591) is satisfied

    x ← vᵢ√λᵢ  ⇒  Avᵢ = vᵢλᵢ        (592)

xxᵀ = λᵢvᵢvᵢᵀ is a rank-1 matrix on the positive semidefinite cone boundary, and the minimum is achieved (7.1.2) when λᵢ = λ₁ is the largest positive eigenvalue of A. If A has no positive eigenvalue, then x = 0 yields the minimum.

Differentiability is a prerequisite neither to convexity nor to numerical solution of a convex optimization problem. The gradient provides a necessary and sufficient condition (352) (452) for optimality in the constrained case, nevertheless, as it does in the unconstrained case: For any differentiable multidimensional convex function, zero gradient ∇f = 0 is a necessary and sufficient condition for its unconstrained minimization [61, 5.5.3]:

3.6.0.0.2 Example. Pseudoinverse.
The pseudoinverse matrix is the unique solution to an unconstrained convex optimization problem [159, 5.5.4]: given A ∈ R^{m×n}

    minimize_{X∈R^{n×m}}   ‖XA − I‖²_F        (593)

where

    ‖XA − I‖²_F = tr(AᵀXᵀXA − XA − AᵀXᵀ + I)        (594)

whose gradient (D.2.3)

    ∇_X ‖XA − I‖²_F = 2(XAAᵀ − Aᵀ) = 0        (595)

vanishes when

    XAAᵀ = Aᵀ        (596)

When A is fat full-rank, then AAᵀ is invertible, X⋆ = Aᵀ(AAᵀ)⁻¹ is the pseudoinverse A†, and AA† = I. Otherwise, we can make AAᵀ invertible by adding a positively scaled identity, for any A ∈ R^{m×n}

    X = Aᵀ(AAᵀ + tI)⁻¹        (597)
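A small numpy check of (593)-(597) (dimensions illustrative): for a fat full-rank A, the closed form Aᵀ(AAᵀ)⁻¹ agrees with numpy's pinv and satisfies AA† = I, while the regularized formula (597) approaches it as t → 0⁺.

    import numpy as np

    rng = np.random.default_rng(7)
    m, n = 3, 6                                # fat A
    A = rng.standard_normal((m, n))

    X_star = A.T @ np.linalg.inv(A @ A.T)      # A'(AA')^{-1}, the pseudoinverse A†
    print(np.allclose(X_star, np.linalg.pinv(A)))
    print(np.allclose(A @ X_star, np.eye(m)))  # AA† = I for fat full-rank A

    # regularized form (597): X(t) = A'(AA' + tI)^{-1} -> A† as t -> 0+
    for t in [1.0, 1e-3, 1e-9]:
        X_t = A.T @ np.linalg.inv(A @ A.T + t * np.eye(m))
        print(t, np.linalg.norm(X_t - X_star))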


3.6. GRADIENT 253Invertibility is guaranteed for any finite positive value of t by (1451). Thenmatrix X becomes the pseudoinverse X → A † X ⋆ in the limit t → 0 + .Minimizing instead ‖AX − I‖ 2 F yields the second flavor in (1869). 3.6.0.0.3 Example. Hyperplane, line, described by affine function.Consider the real affine function of vector variable, (confer Figure 73)f(x) : R p → R = a T x + b (598)whose domain is R p and whose gradient ∇f(x)=a is a constant vector(independent of x). This function describes the real line R (its range), andit describes a nonvertical [199,B.1.2] hyperplane ∂H in the space R p × Rfor any particular vector a (confer2.4.2);{[∂H =having nonzero normalxa T x + bη =[ a−1] }| x∈ R p ⊂ R p ×R (599)]∈ R p ×R (600)This equivalence to a hyperplane holds only for real functions. [ ]3.19 EpigraphRpof real affine function f(x) is therefore a halfspace in , so we have:RThe real affine function is to convex functionsasthe hyperplane is to convex sets.3.19 To prove that, consider a vector-valued affine functionf(x) : R p →R M = Ax + bhaving gradient ∇f(x)=A T ∈ R p×M : The affine set{[ ] }x| x∈ R p ⊂ R p ×R MAx + bis perpendicular to [ ]∇f(x)η ∈ R p×M × R M×M−Ibecause([ ] [ ])η T x 0− = 0 ∀x ∈ R pAx + b bYet η is a vector (in R p ×R M ) only when M = 1.


254 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONSSimilarly, the matrix-valued affine function of real variable x , for anyparticular matrix A∈ R M×N ,describes a line in R M×N in direction Aand describes a line in R×R M×N{[h(x) : R→R M×N = Ax + B (601){Ax + B | x∈ R} ⊆ R M×N (602)xAx + B]}| x∈ R ⊂ R×R M×N (603)whose slope with respect to x is A .3.6.1 monotonic functionA real function of real argument is called monotonic when it is exclusivelynonincreasing or nondecreasing over the whole of its domain. A realdifferentiable function of real argument is monotonic when its first derivative(not necessarily continuous) maintains sign over the function domain.3.6.1.0.1 Definition. Monotonicity.In pointed closed convex cone K , multidimensional function f(X) isnondecreasing monotonic whennonincreasing monotonic when∀X,Y ∈ dom f .Y ≽ K X ⇒ f(Y ) ≽ f(X)Y ≽ K X ⇒ f(Y ) ≼ f(X)△(604)For monotonicity of vector-valued functions, f compared with respect tothe nonnegative orthant, it is necessary and sufficient for each entry f i to bemonotonic in the same sense.Any affine function is monotonic. In K = S M + , for example, tr(Z T X) is anondecreasing monotonic function of matrix X ∈ S M when constant matrixZ is positive semidefinite; which follows from a result (378) of Fejér.A convex function can be characterized by another kind of nondecreasingmonotonicity of its gradient:


3.6. GRADIENT 2553.6.1.0.2 Theorem. Gradient monotonicity. [199,B.4.1.4][55,3.1 exer.20] Given real differentiable function f(X) : R p×k →R withmatrix argument on open convex domain, the condition〈∇f(Y ) − ∇f(X) , Y − X〉 ≥ 0 for each and every X,Y ∈ domf (605)is necessary and sufficient for convexity of f . Strict inequality and caveatY ≠ X provide necessary and sufficient conditions for strict convexity. ⋄3.6.1.0.3 Example. Composition of functions. [61,3.2.4] [199,B.2.1]Monotonic functions play a vital role determining convexity of functionsconstructed by transformation. Given functions g : R k → R andh : R n → R k , their composition f = g(h) : R n → R defined byis convex iff(x) = g(h(x)) , domf = {x∈ dom h | h(x)∈ domg} (606)g is convex nondecreasing monotonic and h is convexg is convex nonincreasing monotonic and h is concaveand composite function f is concave ifg is concave nondecreasing monotonic and h is concaveg is concave nonincreasing monotonic and h is convexwhere ∞ (−∞) is assigned to convex (concave) g when evaluated outside itsdomain. For differentiable functions, these rules are consequent to (1790).<strong>Convex</strong>ity (concavity) of any g is preserved when h is affine. If f and g are nonnegative convex real functions, then (f(x) k + g(x) k ) 1/kis also convex for integer k ≥1. [249, p.44] A squared norm is convex havingthe same minimum because a squaring operation is convex nondecreasingmonotonic on the nonnegative real line.


256 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONS3.6.1.0.4 Exercise. Order of composition.Real function f =x −2 is convex on R + but not predicted so by results inExample 3.6.1.0.3 when g=h(x) −1 and h=x 2 . Explain this anomaly. The following result for product of real functions is extensible to innerproduct of multidimensional functions on real domain:3.6.1.0.5 Exercise. Product and ratio of convex functions. [61, exer.3.32]In general the product or ratio of two convex functions is not convex. [230]However, there are some results that apply to functions on R [real domain].Prove the following. 3.20(a) If f and g are convex, both nondecreasing (or nonincreasing), andpositive functions on an interval, then fg is convex.(b) If f , g are concave, positive, with one nondecreasing and the othernonincreasing, then fg is concave.(c) If f is convex, nondecreasing, and positive, and g is concave,nonincreasing, and positive, then f/g is convex.3.6.2 first-order convexity condition, real functionDiscretization of w ≽0 in (498) invites refocus to the real-valued function:3.6.2.0.1 Theorem. Necessary and sufficient convexity condition.[61,3.1.3] [132,I.5.2] [391,1.2.3] [42,1.2] [330,4.2] [306,3]For real differentiable function f(X) : R p×k →R with matrix argument onopen convex domain, the condition (conferD.1.7)f(Y ) ≥ f(X) + 〈∇f(X) , Y − X〉 for each and every X,Y ∈ domf (607)is necessary and sufficient for convexity of f . Caveat Y ≠ X and strictinequality again provide necessary and sufficient conditions for strictconvexity. [199,B.4.1.1]⋄3.20 Hint: Prove3.6.1.0.5a by verifying Jensen’s inequality ((497) at µ= 1 2 ).
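The first-order condition of Theorem 3.6.2.0.1 is easy to probe numerically for a smooth convex function. A minimal numpy sketch (the choice f(x) = log Σᵢ exp(xᵢ), a standard smooth convex function, is only an example): over many random pairs X, Y the right-hand side of (607) never exceeds f(Y).

    import numpy as np

    def f(x):                      # log-sum-exp: a smooth convex function on R^n
        return np.log(np.sum(np.exp(x)))

    def grad_f(x):                 # its gradient (the softmax vector)
        e = np.exp(x)
        return e / e.sum()

    rng = np.random.default_rng(9)
    n = 4
    worst = np.inf
    for _ in range(1000):
        X, Y = rng.standard_normal(n), rng.standard_normal(n)
        gap = f(Y) - (f(X) + grad_f(X) @ (Y - X))   # (607): must be >= 0
        worst = min(worst, gap)
    print(worst)   # nonnegative up to rounding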


Figure 77 (depicting f(Y), supporting hyperplane ∂H−, and normal [∇f(X)ᵀ −1]ᵀ): When a real function f is differentiable at each point in its open domain, there is an intuitive geometric interpretation of function convexity in terms of its gradient ∇f and its epigraph: Drawn is a convex quadratic bowl in R²×R (confer Figure 160, p.680); f(Y) = YᵀY : R² → R versus Y on some open disc in R². Unique strictly supporting hyperplane ∂H− ⊂ R² × R (only partially drawn) and its normal vector [∇f(X)ᵀ −1]ᵀ at the particular point of support [Xᵀ f(X)]ᵀ are illustrated. The interpretation: At each and every coordinate Y, there is a unique hyperplane containing [Yᵀ f(Y)]ᵀ and supporting the epigraph of convex differentiable f.


258 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONSWhen f(X) : R p →R is a real differentiable convex function with vectorargument on open convex domain, there is simplification of the first-ordercondition (607); for each and every X,Y ∈ domff(Y ) ≥ f(X) + ∇f(X) T (Y − X) (608)From this we can find ∂H − a unique [371, p.220-229] nonvertical [199,B.1.2]hyperplane [ ](2.4), expressed in terms of function gradient, supporting epifXat : videlicet, defining f(Y /∈ domf) ∞ [61,3.1.7]f(X)[ Yt]∈ epif ⇔ t ≥ f(Y ) ⇒ [ ∇f(X) T−1 ]([ Yt]−[ ]) X≤ 0f(X)(609)This means, for each and every point X in the domain of a convex real[ function ] f(X) , there exists a hyperplane ∂H − [ in R p ] × R having normal∇f(X)Xsupporting the function epigraph at ∈ ∂H−1f(X)−{[ Y∂H − =t] [ ] Rp∈R[∇f(X) T −1 ]([ Yt]−[ ]) } X= 0f(X)(610)Such a hyperplane is strictly supporting whenever a function is strictlyconvex. One such supporting hyperplane (confer Figure 29a) is illustratedin Figure 77 for a convex quadratic.From (608) we deduce, for each and every X,Y ∈ domf in the domain,∇f(X) T (Y − X) ≥ 0 ⇒ f(Y ) ≥ f(X) (611)meaning, the gradient at X identifies a supporting hyperplane there in R p{Y ∈ R p | ∇f(X) T (Y − X) = 0} (612)to the convex sublevel sets of convex function f (confer (559))L f(X) f {Z ∈ domf | f(Z) ≤ f(X)} ⊆ R p (613)illustrated for an arbitrary convex real function in Figure 78 and Figure 67.That supporting hyperplane is unique for twice differentiable f . [219, p.501]


Figure 78 (depicting levels α ≥ β ≥ γ, gradient ∇f(X), vector Y − X, level set {Z | f(Z) = α}, and supporting hyperplane {Y | ∇f(X)ᵀ(Y − X) = 0, f(X) = α} (612)): (confer Figure 67) Shown is a plausible contour plot in R² of some arbitrary real differentiable convex function f(Z) at selected levels α, β, and γ ; contours of equal level f (level sets) drawn in the function's domain. A convex function has convex sublevel sets L_{f(X)} f (613). [307, 4.6] The sublevel set whose boundary is the level set at α, for instance, comprises all the shaded regions. For any particular convex function, the family comprising all its sublevel sets is nested. [199, p.75] Were sublevel sets not convex, we may certainly conclude the corresponding function is not convex either. Contour plots of real affine functions are illustrated in Figure 26 and Figure 73.


260 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONS3.6.3 first-order convexity condition, vector-valued fNow consider the first-order necessary and sufficient condition for convexity ofa vector-valued function: Differentiable function f(X) : R p×k →R M is convexif and only if domf is open, convex, and for each and every X,Y ∈ domff(Y ) ≽R M +f(X) +→Y −Xdf(X) f(X) + d dt∣ f(X+ t (Y − X)) (614)t=0→Y −Xwhere df(X) is the directional derivative 3.21 [219] [332] of f at X in directionY −X . This, of course, follows from the real-valued function case: by dualgeneralized inequalities (2.13.2.0.1),f(Y ) − f(X) −where→Y −Xdf(X) ≽R M +→Y −Xdf(X) =0 ⇔⎡⎢⎣〈〉→Y −Xf(Y ) − f(X) − df(X) , w ≥ 0 ∀w ≽ 0R M∗+(615)tr ( ∇f 1 (X) T (Y − X) ) ⎤tr ( ∇f 2 (X) T (Y − X) )⎥.tr ( ∇f M (X) T (Y − X) ) ⎦ ∈ RM (616)Necessary and sufficient discretization (498) allows relaxation of thesemiinfinite number of conditions {w ≽ 0} instead to {w ∈ {e i , i=1... M }}the extreme directions of the selfdual nonnegative orthant. Each extreme→Y −Xdirection picks out a real entry f i and df(X) ifrom the vector-valuedfunction and its directional derivative, then Theorem 3.6.2.0.1 applies.The vector-valued function case (614) is therefore a straightforwardapplication of the first-order convexity condition for real functions to eachentry of the vector-valued function.3.21 We extend the traditional definition of directional derivative inD.1.4 so that directionmay be indicated by a vector or a matrix, thereby broadening scope of the Taylor series(D.1.7). The right-hand side of inequality (614) is the first-order Taylor series expansionof f about X .


3.6. GRADIENT 2613.6.4 second-order convexity conditionAgain, by discretization (498), we are obliged only to consider each individualentry f i of a vector-valued function f ; id est, the real functions {f i }.For f(X) : R p →R M , a twice differentiable vector-valued function withvector argument on open convex domain,∇ 2 f i (X) ≽0 ∀X ∈ domf , i=1... M (617)S p +is a necessary and sufficient condition for convexity of f . Obviously,when M = 1, this convexity condition also serves for a real function.Condition (617) demands nonnegative curvature, intuitively, henceprecluding points of inflection as in Figure 79 (p.270).Strict inequality in (617) provides only a sufficient condition for strictconvexity, but that is nothing new; videlicet, strictly convex real functionf i (x)=x 4 does not have positive second derivative at each and every x∈ R .Quadratic forms constitute a notable exception where the strict-case converseholds reliably.3.6.4.0.1 Example. <strong>Convex</strong> quadratic.Real quadratic multivariate polynomial in matrix A and vector bx T Ax + 2b T x + c (618)is convex if and only if A≽0.gradient: (D.2.1)Proof follows by observing second order∇ 2 x(x T Ax + 2b T x + c ) = A +A T (619)Because x T (A +A T )x = 2x T Ax, matrix A can be assumed symmetric. 3.6.4.0.2 Exercise. Real fractional function. (confer3.3,3.5.0.0.4)Prove that real function f(x,y) = x/y is not convex on the first quadrant.Also exhibit this in a plot of the function. (f is quasilinear (p.271) on{y > 0} , in fact, and nonmonotonic; even on the first quadrant.)


262 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONS3.6.4.0.3 Exercise. Stress function.Define |x −y| √ (x −y) 2 andX = [x 1 · · · x N ] ∈ R 1×N (76)Given symmetric nonnegative data [h ij ] ∈ S N ∩ R N×N+ , consider functionN−1∑ N∑f(vec X) = (|x i − x j | − h ij ) 2 ∈ R (1308)i=1 j=i+1Take the gradient and Hessian of f . Then explain why f is not a convexfunction; id est, why doesn’t second-order condition (617) apply to theconstant positive semidefinite Hessian matrix you found. For N = 6 and h ijdata from (1381), apply line theorem 3.7.3.0.1 to plot f along some arbitrarylines through its domain.3.6.4.1 second-order ⇒ first-order conditionFor a twice-differentiable real function f i (X) : R p →R having open domain,a consequence of the mean value theorem from calculus allows compressionof its complete Taylor series expansion about X ∈ domf i (D.1.7) to threeterms: On some open interval of ‖Y ‖ 2 , so that each and every line segment[X,Y ] belongs to domf i , there exists an α∈[0, 1] such that [391,1.2.3][42,1.1.4]f i (Y ) = f i (X) + ∇f i (X) T (Y −X) + 1 2 (Y −X)T ∇ 2 f i (αX + (1 − α)Y )(Y −X)(620)The first-order condition for convexity (608) follows directly from this andthe second-order condition (617).3.7 <strong>Convex</strong> matrix-valued functionWe need different tools for matrix argument: We are primarily interested incontinuous matrix-valued functions g(X). We choose symmetric g(X)∈ S Mbecause matrix-valued functions are most often compared (621) with respectto the positive semidefinite cone S M + in the ambient space of symmetricmatrices. 3.223.22 Function symmetry is not a necessary requirement for convexity; indeed, for A∈R m×pand B ∈R m×k , g(X) = AX + B is a convex (affine) function in X on domain R p×k with


3.7. CONVEX MATRIX-VALUED FUNCTION 2633.7.0.0.1 Definition. <strong>Convex</strong> matrix-valued function:1) Matrix-definition.A function g(X) : R p×k →S M is convex in X iff domg is a convex set and,for each and every Y,Z ∈domg and all 0≤µ≤1 [215,2.3.7]g(µY + (1 − µ)Z) ≼µg(Y ) + (1 − µ)g(Z) (621)S M +Reversing sense of the inequality flips this definition to concavity. Strictconvexity is defined less a stroke of the pen in (621) similarly to (500).2) Scalar-definition.It follows that g(X) : R p×k →S M is convex in X iff w T g(X)w : R p×k →R isconvex in X for each and every ‖w‖= 1; shown by substituting the defininginequality (621). By dual generalized inequalities we have the equivalent butmore broad criterion, (2.13.5)g convex w.r.t S M + ⇔ 〈W , g〉 convexfor each and every W ≽S M∗+0 (622)Strict convexity on both sides requires caveat W ≠ 0. Because theset of all extreme directions for the selfdual positive semidefinite cone(2.9.2.7) comprises a minimal set of generators for that cone, discretization(2.13.4.2.1) allows replacement of matrix W with symmetric dyad ww T asproposed.△3.7.0.0.2 Example. Taxicab distance matrix.Consider an n-dimensional vector space R n with metric induced by the1-norm. Then distance between points x 1 and x 2 is the norm of theirdifference: ‖x 1 −x 2 ‖ 1 . Given a list of points arranged columnar in a matrixX = [x 1 · · · x N ] ∈ R n×N (76)respect to the nonnegative orthant R m×k+ . Symmetric convex functions share the samebenefits as symmetric matrices. Horn & Johnson [202,7.7] liken symmetric matrices toreal numbers, and (symmetric) positive definite matrices to positive real numbers.


264 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONSthen we could define a taxicab distance matrixD 1 (X) (I ⊗ 1 T n) | vec(X)1 T − 1 ⊗X | ∈ S N h ∩ R N×N+⎡⎤0 ‖x 1 − x 2 ‖ 1 ‖x 1 − x 3 ‖ 1 · · · ‖x 1 − x N ‖ 1‖x 1 − x 2 ‖ 1 0 ‖x 2 − x 3 ‖ 1 · · · ‖x 2 − x N ‖ 1=‖x⎢ 1 − x 3 ‖ 1 ‖x 2 − x 3 ‖ 1 0 ‖x 3 − x N ‖ 1⎥⎣ . .... . ⎦‖x 1 − x N ‖ 1 ‖x 2 − x N ‖ 1 ‖x 3 − x N ‖ 1 · · · 0(623)where 1 n is a vector of ones having dim1 n = n and where ⊗ representsKronecker product. This matrix-valued function is convex with respect to thenonnegative orthant since, for each and every Y,Z ∈ R n×N and all 0≤µ≤1D 1 (µY + (1 − µ)Z)≼R N×N+µD 1 (Y ) + (1 − µ)D 1 (Z) (624)3.7.0.0.3 Exercise. 1-norm distance matrix.The 1-norm is called taxicab distance because to go from one point to anotherin a city by car, road distance is a sum of grid lengths. Prove (624). 3.7.1 first-order convexity condition, matrix-valued fFrom the scalar-definition (3.7.0.0.1) of a convex matrix-valued function,for differentiable function g and for each and every real vector w of unitnorm ‖w‖= 1, we havew T g(Y )w ≥ w T →Y −XTg(X)w + w dg(X) w (625)that follows immediately from the first-order condition (607) for convexity ofa real function because→Y −XTw dg(X) w = 〈 ∇ X w T g(X)w , Y − X 〉 (626)→Y −Xwhere dg(X) is the directional derivative (D.1.4) of function g at X indirection Y −X . By discretized dual generalized inequalities, (2.13.5)g(Y ) − g(X) −→Y −Xdg(X) ≽S M +0 ⇔〈g(Y ) − g(X) −→Y −X〉dg(X) , ww T ≥ 0 ∀ww T (≽ 0)(627)S M∗+
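Before continuing, the Kronecker construction (623) is easily checked numerically against brute-force pairwise 1-norm distances; the random point list below is hypothetical, chosen only for illustration.

    % Numerical check of the taxicab distance matrix construction (623):
    % D1(X) = (I_N kron 1_n') * | vec(X)*1_N' - 1_N kron X |
    rng(2);
    n = 3;  N = 5;
    X  = randn(n,N);                                   % list of N points in R^n
    D1 = kron(eye(N), ones(1,n)) * abs( X(:)*ones(1,N) - kron(ones(N,1), X) );

    % brute-force pairwise 1-norm distances for comparison
    D  = zeros(N);
    for i = 1:N
        for j = 1:N
            D(i,j) = norm(X(:,i) - X(:,j), 1);
        end
    end
    disp(norm(D1 - D, 'fro'))                          % ~0 : constructions agree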


3.7. CONVEX MATRIX-VALUED FUNCTION 265For each and every X,Y ∈ domg (confer (614))g(Y ) ≽g(X) +→Y −Xdg(X) (628)S M +must therefore be necessary and sufficient for convexity of a matrix-valuedfunction of matrix variable on open convex domain.3.7.2 epigraph of matrix-valued function, sublevel setsWe generalize epigraph to a continuous matrix-valued function [34, p.155]g(X) : R p×k →S M :epi g {(X , T )∈ R p×k × S M | X ∈ domg , g(X) ≼T } (629)S M +from which it followsg convex ⇔ epig convex (630)Proof of necessity is similar to that in3.5 on page 242.Sublevel sets of a convex matrix-valued function corresponding to eachand every S ∈ S M (confer (559))L Sg {X ∈ domg | g(X) ≼S } ⊆ R p×k (631)S M +are convex. There is no converse.3.7.2.0.1 Example. Matrix fractional function redux. [34, p.155]Generalizing Example 3.5.0.0.4 consider a matrix-valued function of twovariables on domg = S N + ×R n×N for small positive constant ǫ (confer (1870))g(A, X) = ǫX(A + ǫI) −1 X T (632)where the inverse always exists by (1451). This function is convexsimultaneously in both variables over the entire positive semidefinite cone S N +


266 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONSand all X ∈ R n×N : Consider Schur-form (1514) fromA.4: for T ∈ S n[ ]A + ǫI XTG(A, X, T ) =X ǫ −1 ≽ 0T⇔T − ǫX(A + ǫI) −1 X T ≽ 0A + ǫI ≻ 0(633)By Theorem 2.1.9.0.1, inverse image of the positive semidefinite cone S N+n+under affine mapping G(A, X, T ) is convex. Function g(A, X) is convexon S N + ×R n×N because its epigraph is that inverse image:epig(A, X) = { (A, X, T ) | A + ǫI ≻ 0, ǫX(A + ǫI) −1 X T ≼ T } = G −1( S N+n+(634)3.7.3 second-order convexity condition, matrix-valued fThe following line theorem is a potent tool for establishing convexity of amultidimensional function. To understand it, what is meant by line must firstbe solidified. Given a function g(X) : R p×k →S M and particular X, Y ∈ R p×knot necessarily in that function’s domain, then we say a line {X+ t Y | t ∈ R}passes through domg when X+ t Y ∈ domg over some interval of t ∈ R .3.7.3.0.1 Theorem. Line theorem. [61,3.1.1]Matrix-valued function g(X) : R p×k →S M is convex in X if and only if itremains convex on the intersection of any line with its domain. ⋄Now we assume a twice differentiable function.3.7.3.0.2 Definition. Differentiable convex matrix-valued function.Matrix-valued function g(X) : R p×k →S M is convex in X iff domg is anopen convex set, and its second derivative g ′′ (X+ t Y ) : R→S M is positivesemidefinite on each point of intersection along every line {X+ t Y | t ∈ R}that intersects domg ; id est, iff for each and every X, Y ∈ R p×k such thatX+ t Y ∈ domg over some open interval of t ∈ Rd 2dt 2 g(X+ t Y ) ≽ S M +0 (635))


3.7. CONVEX MATRIX-VALUED FUNCTION 267Similarly, ifd 2dt 2 g(X+ t Y ) ≻ S M +0 (636)then g is strictly convex; the converse is generally false. [61,3.1.4] 3.23 △3.7.3.0.3 Example. Matrix inverse. (confer3.3.1)The matrix-valued function X µ is convex on int S M + for −1≤µ≤0or 1≤µ≤2 and concave for 0≤µ≤1. [61,3.6.2] In particular, thefunction g(X) = X −1 is convex on int S M + . For each and every Y ∈ S M(D.2.1,A.3.1.0.5)d 2dt 2 g(X+ t Y ) = 2(X+ t Y )−1 Y (X+ t Y ) −1 Y (X+ t Y ) −1 ≽S M +0 (637)on some open interval of t ∈ R such that X + t Y ≻0. Hence, g(X) isconvex in X . This result is extensible; 3.24 trX −1 is convex on that samedomain. [202,7.6 prob.2] [55,3.1 exer.25]3.7.3.0.4 Example. Matrix squared.Iconic real function f(x)= x 2 is strictly convex on R . The matrix-valuedfunction g(X)=X 2 is convex on the domain of symmetric matrices; forX, Y ∈ S M and any open interval of t ∈ R (D.2.1)d 2d2g(X+ t Y ) =dt2 dt 2(X+ t Y )2 = 2Y 2 (638)which is positive semidefinite when Y is symmetric because then Y 2 = Y T Y(1457). 3.25A more appropriate matrix-valued counterpart for f is g(X)=X T Xwhich is a convex function on domain {X ∈ R m×n } , and strictly convexwhenever X is skinny-or-square full-rank. This matrix-valued function canbe generalized to g(X)=X T AX which is convex whenever matrix A ispositive semidefinite (p.692), and strictly convex when A is positive definite3.23 The strict-case converse is reliably true for quadratic forms.3.24 d/dt tr g(X+ tY ) = trd/dtg(X+ tY ). [203, p.491]3.25 By (1475) inA.3.1, changing the domain instead to all symmetric and nonsymmetricpositive semidefinite matrices, for example, will not produce a convex function.


268 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONSand X is skinny-or-square full-rank (Corollary A.3.1.0.5).‖X‖ 2 sup ‖Xa‖ 2 = σ(X) 1 = √ λ(X T X) 1 = minimize‖a‖=1t∈Rsubject tot (639)[ ] tI XX T ≽ 0tIThe matrix 2-norm (spectral norm) coincides with largest singular value.This supremum of a family of convex functions in X must be convex becauseit constitutes an intersection of epigraphs of convex functions.3.7.3.0.5 Exercise. Squared maps.Give seven examples of distinct polyhedra P for which the set{X T X | X ∈ P} ⊆ S n + (640)were convex. Is this set convex, in general, for any polyhedron P ?(confer (1233) (1240)) Is the epigraph of function g(X)=X T X convex forany polyhedral domain?3.7.3.0.6 Exercise. Squared inverse. (confer3.7.2.0.1)For positive scalar a , real function f(x)= ax −2 is convex on the nonnegativereal line. Given positive definite matrix constant A , prove via line theoremthat g(X)= tr ( (X T A −1 X) −1) is generally not convex unless X ≻ 0 . 3.26From this result, show how it follows via Definition 3.7.0.0.1-2 thath(X) = (X T A −1 X) −1 is generally neither convex.3.7.3.0.7 Example. Matrix exponential.The matrix-valued function g(X) = e X : S M → S M is convex on the subspaceof circulant [168] symmetric matrices. Applying the line theorem, for all t∈Rand circulant X, Y ∈ S M , from Table D.2.7 we haved 2dt 2eX+ t Y = Y e X+ t Y Y ≽S M +0 , (XY ) T = XY (641)because all circulant matrices are commutative and, for symmetric matrices,XY = Y X ⇔ (XY ) T = XY (1474). Given symmetric argument, the matrix3.26 Hint:D.2.3.


3.8. QUASICONVEX 269exponential always resides interior to the cone of positive semidefinitematrices in the symmetric matrix subspace; e A ≻0 ∀A∈ S M (1867). Thenfor any matrix Y of compatible dimension, Y T e A Y is positive semidefinite.(A.3.1.0.5)The subspace of circulant symmetric matrices contains all diagonalmatrices. The matrix exponential of any diagonal matrix e Λ exponentiateseach individual entry on the main diagonal. [251,5.3] So, changingthe function domain to the subspace of real diagonal matrices reduces thematrix exponential to a vector-valued function in an isometrically isomorphicsubspace R M ; known convex (3.1.1) from the real-valued function case[61,3.1.5].There are, of course, multifarious methods to determine functionconvexity, [61] [42] [132] each of them efficient when appropriate.3.7.3.0.8 Exercise. log det function.Matrix determinant is neither a convex or concave function, in general,but its inverse is convex when domain is restricted to interior of a positivesemidefinite cone. [34, p.149] Show by three different methods: On interior ofthe positive semidefinite cone, log detX = − log detX −1 is concave. 3.8 QuasiconvexQuasiconvex functions [61,3.4] [199] [330] [371] [244,2] are useful inpractical problem solving because they are unimodal (by definition whennonmonotonic); a global minimum is guaranteed to exist over any convex setin the function domain; e.g., Figure 79.3.8.0.0.1 Definition. Quasiconvex function.f(X) : R p×k →R is a quasiconvex function of matrix X iff domf is a convexset and for each and every Y,Z ∈domf , 0≤µ≤1f(µY + (1 − µ)Z) ≤ max{f(Y ), f(Z)} (642)A quasiconcave function is determined:f(µY + (1 − µ)Z) ≥ min{f(Y ), f(Z)} (643)△


270 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONSFigure 79: Iconic unimodal differentiable quasiconvex function of twovariables graphed in R 2 × R on some open disc in R 2 . Note reversal ofcurvature in direction of gradient.Unlike convex functions, quasiconvex functions are not necessarilycontinuous; e.g., quasiconcave rank(X) on S M + (2.9.2.9.2) and card(x)on R M + . Although insufficient for convex functions, convexity of each andevery sublevel set serves as a definition of quasiconvexity:3.8.0.0.2 Definition. Quasiconvex multidimensional function.Scalar-, vector-, or matrix-valued function g(X) : R p×k →S M is a quasiconvexfunction of matrix X iff domg is a convex set and the sublevel setcorresponding to each and every S ∈ S ML Sg = {X ∈ dom g | g(X) ≼ S } ⊆ R p×k (631)is convex. Vectors are compared with respect to the nonnegative orthant R M +while matrices are with respect to the positive semidefinite cone S M + .<strong>Convex</strong>ity of the superlevel set corresponding to each and every S ∈ S M ,likewiseL S g = {X ∈ domg | g(X) ≽ S } ⊆ R p×k (644)is necessary and sufficient for quasiconcavity.△


3.9. SALIENT PROPERTIES 2713.8.0.0.3 Exercise. Nonconvexity of matrix product.Consider real function f on a positive definite domain[ ] [X1rel int SN+f(X) = tr(X 1 X 2 ) , X ∈ domf X 2 rel int S N +](645)with superlevel setsL s f = {X ∈ dom f | 〈X 1 , X 2 〉 ≥ s } (646)Prove: f(X) is not quasiconcave except when N = 1, nor is it quasiconvexunless X 1 = X 2 .3.8.1 bilinearBilinear function 3.27 x T y of vectors x and y is quasiconcave (monotonic) onthe entirety of the nonnegative orthants only when vectors are of dimension 1.3.8.2 quasilinearWhen a function is simultaneously quasiconvex and quasiconcave, it is calledquasilinear. Quasilinear functions are completely determined by convexlevel sets. One-dimensional function f(x) = x 3 and vector-valued signumfunction sgn(x) , for example, are quasilinear. Any monotonic function isquasilinear 3.28 (3.8.0.0.1,3.9 no.4, but not vice versa; Exercise 3.6.4.0.2).3.9 Salient propertiesof convex and quasiconvex functions1.A convex function is assumed continuous but not necessarilydifferentiable on the relative interior of its domain. [307,10]A quasiconvex function is not necessarily a continuous function.2. convex epigraph ⇔ convexity ⇒ quasiconvexity ⇔ convex sublevel sets.convex hypograph ⇔ concavity ⇒ quasiconcavity ⇔ convex superlevel.monotonicity ⇒ quasilinearity ⇔ convex level sets.3.27 <strong>Convex</strong> envelope of bilinear functions is well known. [3]3.28 e.g., a monotonic concave function is quasiconvex, but dare not confuse these terms.


272 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONS3.log-convex ⇒ convex ⇒ quasiconvex.concave ⇒ quasiconcave ⇐ log-concave ⇐ positive concave. 3.294. The line theorem (3.7.3.0.1) translates identically to quasiconvexity(quasiconcavity). [61,3.4.2]5.g convex ⇔ −g concaveg quasiconvex ⇔ −g quasiconcaveg log-convex ⇔ 1/g log-concave(translation, homogeneity) Function convexity, concavity,quasiconvexity, and quasiconcavity are invariant to offset andnonnegative scaling.6. (affine transformation of argument) Composition g(h(X)) of convex(concave) function g with any affine function h : R m×n → R p×kremains convex (concave) in X ∈ R m×n , where h(R m×n ) ∩ domg ≠ ∅ .[199,B.2.1] Likewise for the quasiconvex (quasiconcave) functions g .7. – Nonnegatively weighted sum of convex (concave) functionsremains convex (concave). (3.1.2.1.1)– Nonnegatively weighted nonzero sum of strictly convex(concave) functions remains strictly convex (concave).– Nonnegatively weighted maximum (minimum) of convex 3.30(concave) functions remains convex (concave).– Pointwise supremum (infimum) of convex (concave) functionsremains convex (concave). (Figure 74) [307,5] –Sum of quasiconvex functions not necessarily quasiconvex.– Nonnegatively weighted maximum (minimum) of quasiconvex(quasiconcave) functions remains quasiconvex (quasiconcave).– Pointwise supremum (infimum) of quasiconvex (quasiconcave)functions remains quasiconvex (quasiconcave).3.29 Log-convex means: logarithm of function f is convex on dom f .3.30 Supremum and maximum of convex functions are proven convex by intersection ofepigraphs.


Chapter 4Semidefinite programmingPrior to 1984, linear and nonlinear programming, 4.1 one a subsetof the other, had evolved for the most part along unconnectedpaths, without even a common terminology. (The use of“programming” to mean “optimization” serves as a persistentreminder of these differences.)−Forsgren, Gill, & Wright (2002) [145]Given some practical application of convex analysis, it may at first seempuzzling why a search for its solution ends abruptly with a formalizedstatement of the problem itself as a constrained optimization. Theexplanation is: typically we do not seek analytical solution because there arerelatively few. (3.5.2,C) If a problem can be expressed in convex form,rather, then there exist computer programs providing efficient numericalglobal solution. [167] [384] [385] [383] [348] [336] The goal, then, becomesconversion of a given problem (perhaps a nonconvex or combinatorialproblem statement) to an equivalent convex form or to an alternation ofconvex subproblems convergent to a solution of the original problem:4.1 nascence of polynomial-time interior-point methods of solution [363] [381].Linear programming ⊂ (convex ∩ nonlinear) programming.2001 Jon Dattorro. co&edg version 2010.10.26. All rights reserved.citation: Dattorro, <strong>Convex</strong> <strong>Optimization</strong> & Euclidean Distance Geometry,Mεβoo Publishing USA, 2005, <strong>v2010.10.26</strong>.273


274 CHAPTER 4. SEMIDEFINITE PROGRAMMINGBy the fundamental theorem of <strong>Convex</strong> <strong>Optimization</strong>, any locally optimalpoint of a convex problem is globally optimal. [61,4.2.2] [306,1] Givenconvex real objective function g and convex feasible set D ⊆domg , whichis the set of all variable values satisfying the problem constraints, we pose ageneric convex optimization problemminimize g(X)Xsubject to X ∈ D(647)where constraints are abstract here in membership of variable X to convexfeasible set D . Inequality constraint functions of a convex optimizationproblem are convex while equality constraint functions are conventionallyaffine, but not necessarily so. Affine equality constraint functions, as opposedto the superset of all convex equality constraint functions having convex levelsets (3.4.0.0.4), make convex optimization tractable.Similarly, the problemmaximize g(X)Xsubject to X ∈ D(648)is called convex were g a real concave function and feasible set D convex.As conversion to convex form is not always possible, there is much ongoingresearch to determine which problem classes have convex expression orrelaxation. [34] [59] [151] [277] [345] [149]4.1 Conic problemStill, we are surprised to see the relatively small number ofsubmissions to semidefinite programming (SDP) solvers, as thisis an area of significant current interest to the optimizationcommunity. We speculate that semidefinite programming issimply experiencing the fate of most new areas: Users have yet tounderstand how to pose their problems as semidefinite programs,and the lack of support for SDP solvers in popular modellinglanguages likely discourages submissions.−SIAM News, 2002. [115, p.9]


(confer p.162) Consider a conic problem (p) and its dual (d): [293, 3.3.1] [239, 2.1] [240]

    (p)   minimize_x   cᵀx              (d)   maximize_{y,s}   bᵀy
          subject to   x ∈ K                  subject to       s ∈ K*
                       Ax = b                                  Aᵀy + s = c        (302)

where K is a closed convex cone, K* is its dual, matrix A is fixed, and the remaining quantities are vectors.

When K is a polyhedral cone (2.12.1), then each conic problem becomes a linear program; the selfdual nonnegative orthant providing the prototypical primal linear program and its dual. [94, 3-1] 4.2 More generally, each optimization problem is convex when K is a closed convex cone. Unlike the optimal objective value, a solution to each convex problem is not necessarily unique; in other words, the optimal solution set {x*} or {y*, s*} is convex and may comprise more than a single point, although the corresponding optimal objective value is unique when the feasible set is nonempty.

4.1.1 a Semidefinite program

When K is the selfdual cone of positive semidefinite matrices S^n_+ in the subspace of symmetric matrices S^n , then each conic problem is called a semidefinite program (SDP); [277, 6.4] primal problem (P) having matrix variable X ∈ S^n while corresponding dual (D) has slack variable S ∈ S^n and vector variable y = [y_i] ∈ R^m : [10] [11, 2] [391, 1.3.8]

    (P)   minimize_{X ∈ S^n}   ⟨C , X⟩        (D)   maximize_{y ∈ R^m, S ∈ S^n}   ⟨b , y⟩
          subject to           X ⪰ 0                subject to                    S ⪰ 0
                               A svec X = b                                       svec⁻¹(Aᵀy) + S = C        (649)

This is the prototypical semidefinite program and its dual, where matrix C ∈ S^n and vector b ∈ R^m are fixed, as is

    A ≜ [ svec(A_1)ᵀ ; svec(A_2)ᵀ ; … ; svec(A_m)ᵀ ] ∈ R^{m × n(n+1)/2}        (650)

where A_i ∈ S^n , i = 1 … m , are given. Thus

4.2 Dantzig explains reasoning behind a nonnegativity constraint: ...negative quantities of activities are not possible. ...a negative number of cases cannot be shipped.


276 CHAPTER 4. SEMIDEFINITE PROGRAMMING⎡A svec X = ⎣〈A 1 , X〉.〈A m , X〉(651)∑svec −1 (A T y) = m y i A iThe vector inner-product for matrices is defined in the Euclidean/Frobeniussense in the isomorphic vector space R n(n+1)/2 ; id est,i=1〈C , X〉 tr(C T X) = svec(C) T svec X (38)where svec X defined by (56) denotes symmetric vectorization.In a national planning problem of some size, one may easily run into severalhundred variables and perhaps a hundred or more degrees of freedom. ...Itshould always be remembered that any mathematical method and particularlymethods in linear programming must be judged with reference to the type ofcomputing machinery available. Our outlook may perhaps be changed whenwe get used to the super modern, high capacity electronic computor that willbe available here from the middle of next year. −Ragnar Frisch [147]Linear programming, invented by Dantzig in 1947 [94], is now integralto modern technology. The same cannot yet be said of semidefiniteprogramming whose roots trace back to systems of positive semidefinitelinear inequalities studied by Bellman & Fan in 1963. [31] [102] Interior-pointmethods for numerical solution can be traced back to the logarithmic barrierof Frisch in 1954 and Fiacco & McCormick in 1968 [141]. Karmarkar’spolynomial-time interior-point method sparked a log-barrier renaissancein 1984, [275,11] [381] [363] [277, p.3] but numerical performanceof contemporary general-purpose semidefinite program solvers remainslimited: Computational intensity for dense systems varies as O(m 2 n)(m constraints ≪ n variables) based on interior-point methods that producesolutions no more relatively accurate than 1E-8. There are no solverscapable of handling in excess of n=100,000 variables without significant,sometimes crippling, loss of precision or time. 4.3 [35] [276, p.258] [67, p.3]4.3 Heuristics are not ruled out by SIOPT; indeed I would suspect that most successfulmethods have (appropriately described) heuristics under the hood - my codes certainly do....Of course, there are still questions relating to high-accuracy and speed, but for manyapplications a few digits of accuracy suffices and overnight runs for non-real-time deliveryis acceptable.−Nicholas I. M. Gould, Editor-in-Chief, SIOPT⎤⎦
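To make the prototypical form concrete, here is a small sketch of primal (649P) posed with the CVX modeling package (an assumed tool, not the solver used in the text); the data C , {A_i} , b are random stand-ins generated so that the problem is feasible and bounded.

    % Sketch of prototypical primal semidefinite program (649P) in CVX
    % (assumed installed), with small random problem data standing in for
    % C, {A_i}, b from (649)-(651).
    rng(5);
    n = 4;  m = 2;
    R = randn(n);  C = R.'*R;                     % C positive semidefinite, so objective is bounded below
    A = cell(m,1);  b = zeros(m,1);
    X0 = randn(n);  X0 = X0*X0.';                 % a feasible point, used only to construct attainable b
    for i = 1:m
        Ai = randn(n);  A{i} = (Ai + Ai.')/2;
        b(i) = trace(A{i}*X0);
    end

    cvx_begin sdp quiet
        variable X(n,n) symmetric
        minimize( trace(C*X) )
        subject to
            for i = 1:m
                trace(A{i}*X) == b(i);
            end
            X >= 0
    cvx_end

    disp(cvx_optval)
    disp(eig(X))     % the returned optimum typically has rank 1 here, consistent with bound (272) for m = 2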


[Figure 80 regions, nested: PC (convex program) ⊃ semidefinite program ⊃ second-order cone program and quadratic program ⊃ linear program]

Figure 80: Venn diagram of programming hierarchy. Semidefinite program is a subset of convex program PC. Semidefinite program subsumes other convex program classes excepting geometric program. Second-order cone program and quadratic program each subsume linear program. Nonconvex program \ PC comprises those for which convex equivalents have not yet been found.


278 CHAPTER 4. SEMIDEFINITE PROGRAMMINGNevertheless, semidefinite programming has recently emerged toprominence because it admits a new class of problem previously unsolvable byconvex optimization techniques, [59] and because it theoretically subsumesother convex techniques: (Figure 80) linear programming and quadraticprogramming and second-order cone programming. 4.4 Determination of theRiemann mapping function from complex analysis [286] [29,8,13], forexample, can be posed as a semidefinite program.4.1.2 Maximal complementarityIt has been shown [391,2.5.3] that contemporary interior-point methods[382] [289] [277] [11] [61,11] [145] (developed circa 1990 [151] for numericalsolution of semidefinite programs) can converge to a solution of maximalcomplementarity; [176,5] [390] [253] [158] not a vertex-solution but asolution of highest cardinality or rank among all optimal solutions. 4.5This phenomenon can be explained by recognizing that interior-pointmethods generally find solutions relatively interior to a feasible set bydesign. 4.6 [6, p.3] Log barriers are designed to fail numerically at the feasibleset boundary. So low-rank solutions, all on the boundary, are rendered moredifficult to find as numerical error becomes more prevalent there.4.1.2.1 Reduced-rank solutionA simple rank reduction algorithm, for construction of a primal optimalsolution X ⋆ to (649P) satisfying an upper bound on rank governed byProposition 2.9.3.0.1, is presented in4.3. That proposition asserts existenceof feasible solutions with an upper bound on their rank; [26,II.13.1]specifically, it asserts an extreme point (2.6.0.0.1) of primal feasible setA ∩ S n + satisfies upper bound⌊√ ⌋ 8m + 1 − 1rankX ≤(272)2where, given A∈ R m×n(n+1)/2 and b∈ R m ,A {X ∈ S n | A svec X = b} (652)4.4 SOCP came into being in the 1990s; it is not posable as a quadratic program. [248]4.5 This characteristic might be regarded as a disadvantage to this method of numericalsolution, but this behavior is not certain and depends on solver implementation.4.6 Simplex methods, in contrast, generally find vertex solutions.


4.1.2.2 Coexistence of low- and high-rank solutions; analogy

That low-rank and high-rank optimal solutions {X*} of (649P) coexist may be grasped with the following analogy: We compare a proper polyhedral cone S³₊ in R³ (illustrated in Figure 81) to the positive semidefinite cone S³₊ in isometrically isomorphic R⁶ , difficult to visualize. (Statements below about matrix rank refer to the PSD cone S³₊ ⊂ S³ ; statements about dimension refer to the polyhedral cone S³₊ ⊂ R³ of Figure 81.) The analogy is good:

- int S³₊ is constituted by rank-3 matrices; int S³₊ has three dimensions.

- boundary ∂S³₊ contains rank-0, rank-1, and rank-2 matrices; boundary ∂S³₊ contains 0-, 1-, and 2-dimensional faces.

- the only rank-0 matrix resides in the vertex at the origin.

- Rank-1 matrices are in one-to-one correspondence with extreme directions of S³₊ and S³₊ . The set of all rank-1 symmetric matrices in this dimension

      { G ∈ S³₊ | rank G = 1 }        (653)

  is not a connected set.

- Rank of a sum of members F + G in Lemma 2.9.2.9.1 and location of a difference F − G in 2.9.2.12.1 similarly hold for S³₊ and S³₊ .

- Euclidean distance from any particular rank-3 positive semidefinite matrix (in the cone interior) to the closest rank-2 positive semidefinite matrix (on the boundary) is generally less than the distance to the closest rank-1 positive semidefinite matrix. (7.1.2)

- distance from any point in ∂S³₊ to int S³₊ is infinitesimal (2.1.7.1.1); distance from any point in ∂S³₊ to int S³₊ is infinitesimal.


[Figure 81 drawing labels: C , 0 , P , Γ₁ , Γ₂ , S³₊ , A = ∂H]

Figure 81: Visualizing positive semidefinite cone in high dimension: Proper polyhedral cone S³₊ ⊂ R³ representing positive semidefinite cone S³₊ ⊂ S³ ; analogizing its intersection with hyperplane S³₊ ∩ ∂H . Number of facets is arbitrary (analogy is not inspired by eigenvalue decomposition). The rank-0 positive semidefinite matrix corresponds to the origin in R³ , rank-1 positive semidefinite matrices correspond to the edges of the polyhedral cone, rank-2 to the facet relative interiors, and rank-3 to the polyhedral cone interior. Vertices Γ₁ and Γ₂ are extreme points of polyhedron P = ∂H ∩ S³₊ , and extreme directions of S³₊ . A given vector C is normal to another hyperplane (not illustrated but independent w.r.t ∂H) containing line segment Γ₁Γ₂ minimizing real linear function ⟨C , X⟩ on P . (confer Figure 26, Figure 30)


- faces of the polyhedral cone S³₊ ⊂ R³ correspond to faces of the positive semidefinite cone S³₊ ⊂ S³ (confer Table 2.9.2.3.1):

      k              dim F(S³₊ ⊂ R³)   dim F(S³₊ ⊂ S³)   dim F(S³₊ ∋ rank-k matrix)
      0                    0                 0                    0
      boundary  1          1                 1                    1
                2          2                 3                    3
      interior  3          3                 6                    6

  Integer k indexes k-dimensional faces F of the polyhedral cone S³₊ ⊂ R³ . Positive semidefinite cone S³₊ has four kinds of faces, including the cone itself (k = 3 , boundary + interior), whose dimensions in isometrically isomorphic R⁶ are listed under dim F(S³₊ ⊂ S³). Smallest face F(S³₊ ∋ rank-k matrix) that contains a rank-k positive semidefinite matrix has dimension k(k + 1)/2 by (222).

- For A equal to intersection of m hyperplanes having independent normals, and for X ∈ S³₊ ∩ A , we have rank X ≤ m ; the analogue to (272).

Proof. With reference to Figure 81: Assume one (m = 1) hyperplane A = ∂H intersects the polyhedral cone. Every intersecting plane contains at least one matrix having rank less than or equal to 1 ; id est, from all X ∈ ∂H ∩ S³₊ there exists an X such that rank X ≤ 1. Rank 1 is therefore an upper bound in this case.

Now visualize intersection of the polyhedral cone with two (m = 2) hyperplanes having linearly independent normals. The hyperplane intersection A makes a line. Every intersecting line contains at least one matrix having rank less than or equal to 2 , providing an upper bound. In other words, there exists a positive semidefinite matrix X belonging to any line intersecting the polyhedral cone such that rank X ≤ 2.

In the case of three independent intersecting hyperplanes (m = 3), the hyperplane intersection A makes a point that can reside anywhere in the polyhedral cone. The upper bound on a point in S³₊ is also the greatest upper bound: rank X ≤ 3.


282 CHAPTER 4. SEMIDEFINITE PROGRAMMING4.1.2.2.1 Example. <strong>Optimization</strong> over A ∩ S 3 + .Consider minimization of the real linear function 〈C , X〉 overa polyhedral feasible set;P A ∩ S 3 + (654)f0 ⋆ minimize 〈C , X〉Xsubject to X ∈ A ∩ S+3(655)As illustrated for particular vector C and hyperplane A = ∂H in Figure 81,this linear function is minimized on any X belonging to the face of Pcontaining extreme points {Γ 1 , Γ 2 } and all the rank-2 matrices in between;id est, on any X belonging to the face of PF(P) = {X | 〈C , X〉 = f ⋆ 0 } ∩ A ∩ S 3 + (656)exposed by the hyperplane {X | 〈C , X〉=f ⋆ 0 }. In other words, the set of alloptimal points X ⋆ is a face of P{X ⋆ } = F(P) = Γ 1 Γ 2 (657)comprising rank-1 and rank-2 positive semidefinite matrices. Rank 1 isthe upper bound on existence in the feasible set P for this case m = 1hyperplane constituting A . The rank-1 matrices Γ 1 and Γ 2 in face F(P)are extreme points of that face and (by transitivity (2.6.1.2)) extremepoints of the intersection P as well. As predicted by analogy to Barvinok’sProposition 2.9.3.0.1, the upper bound on rank of X existent in the feasibleset P is satisfied by an extreme point. The upper bound on rank of anoptimal solution X ⋆ existent in F(P) is thereby also satisfied by an extremepoint of P precisely because {X ⋆ } constitutes F(P) ; 4.7 in particular,{X ⋆ ∈ P | rankX ⋆ ≤ 1} = {Γ 1 , Γ 2 } ⊆ F(P) (658)As all linear functions on a polyhedron are minimized on a face, [94] [252][274] [280] by analogy we so demonstrate coexistence of optimal solutions X ⋆of (649P) having assorted rank.4.7 and every face contains a subset of the extreme points of P by the extremeexistence theorem (2.6.0.0.2). This means: because the affine subset A and hyperplane{X | 〈C , X 〉 = f ⋆ 0 } must intersect a whole face of P , calculation of an upper bound onrank of X ⋆ ignores counting the hyperplane when determining m in (272).


4.1. CONIC PROBLEM 2834.1.2.3 Previous workBarvinok showed, [24,2.2] when given a positive definite matrix C and anarbitrarily small neighborhood of C comprising positive definite matrices,there exists a matrix ˜C from that neighborhood such that optimal solutionX ⋆ to (649P) (substituting ˜C) is an extreme point of A ∩ S n + and satisfiesupper bound (272). 4.8 Given arbitrary positive definite C , this meansnothing inherently guarantees that an optimal solution X ⋆ to problem (649P)satisfies (272); certainly nothing given any symmetric matrix C , as theproblem is posed. This can be proved by example:4.1.2.3.1 Example. (Ye) Maximal Complementarity.Assume dimension n to be an even positive number. Then the particularinstance of problem (649P),has optimal solution〈[ I 0minimizeX∈ S n 0 2Isubject to X ≽ 0X ⋆ =〈I , X〉 = n[ 2I 00 0] 〉, X(659)]∈ S n (660)with an equal number of twos and zeros along the main diagonal. Indeed,optimal solution (660) is a terminal solution along the central path taken bythe interior-point method as implemented in [391,2.5.3]; it is also a solutionof highest rank among all optimal solutions to (659). Clearly, rank of thisprimal optimal solution exceeds by far a rank-1 solution predicted by upperbound (272).4.1.2.4 Later developmentsThis rational example (659) indicates the need for a more generally applicableand simple algorithm to identify an optimal solution X ⋆ satisfying Barvinok’sProposition 2.9.3.0.1. We will review such an algorithm in4.3, but first weprovide more background.4.8 Further, the set of all such ˜C in that neighborhood is open and dense.
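Returning to Example 4.1.2.3.1: problem (659) is small enough to pose directly. The sketch below uses the CVX modeling package (an assumed tool, not the solver cited in the text); which optimal solution is returned, and hence its rank, depends on the particular solver, interior-point methods tending toward the maximal-rank solution (660).

    % Sketch of Ye's example (659) in CVX (assumed installed).
    n = 4;                                  % any even positive n
    C = blkdiag( eye(n/2), 2*eye(n/2) );

    cvx_begin sdp quiet
        variable X(n,n) symmetric
        minimize( trace(C*X) )
        subject to
            trace(X) == n
            X >= 0
    cvx_end

    disp(cvx_optval)      % optimal value is n
    disp(eig(X))          % eigenvalues reveal the rank of the returned solution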


284 CHAPTER 4. SEMIDEFINITE PROGRAMMING4.2 Framework4.2.1 Feasible setsDenote by D and D ∗ the convex sets of primal and dual points respectivelysatisfying the primal and dual constraints in (649), each assumed nonempty;⎧ ⎡ ⎤ ⎫⎨〈A 1 , X〉 ⎬D =⎩ X ∈ Sn + | ⎣ . ⎦= b⎭ = A ∩ Sn +〈A m , X〉(661){}m∑D ∗ = S ∈ S n + , y = [y i ]∈ R m | y i A i + S = CThese are the primal feasible set and dual feasible set. Geometrically,primal feasible A ∩ S n + represents an intersection of the positive semidefinitecone S n + with an affine subset A of the subspace of symmetric matrices S nin isometrically isomorphic R n(n+1)/2 . A has dimension n(n+1)/2 −mwhen the vectorized A i are linearly independent. Dual feasible set D ∗ isa Cartesian product of the positive semidefinite cone with its inverse image(2.1.9.0.1) under affine transformation 4.9 C − ∑ y i A i . Both feasible sets areconvex, and the objective functions are linear on a Euclidean vector space.Hence, (649P) and (649D) are convex optimization problems.4.2.1.1 A ∩ S n + emptiness determination via Farkas’ lemma4.2.1.1.1 Lemma. Semidefinite Farkas’ lemma.Given set {A i ∈ S n , i=1... m} , vector b = [b i ]∈ R m , and affine subset(652) A = {X ∈ S n |〈A i , X〉=b i , i=1... m} {A svec X |X ≽ 0} (380) is closed,then primal feasible set A ∩ S n + is nonempty if and only if y T b ≥ 0 holds foreach and every vector y = [y i ]∈ R m ∑such that m y i A i ≽ 0.Equivalently, primal feasible set A ∩ S n + is nonempty if and only if∑y T b ≥ 0 holds for each and every vector ‖y‖= 1 such that m y i A i ≽ 0. ⋄4.9 Inequality C − ∑ y i A i ≽0 follows directly from (649D) (2.9.0.1.1) and is known as alinear matrix inequality. (2.13.5.1.1) Because ∑ y i A i ≼C , matrix S is known as a slackvariable (a term borrowed from linear programming [94]) since its inclusion raises thisinequality to equality.i=1i=1i=1


4.2. FRAMEWORK 285Semidefinite Farkas’ lemma provides necessary and sufficient conditionsfor a set of hyperplanes to have nonempty intersection A ∩ S n + with thepositive semidefinite cone. Given⎡ ⎤svec(A 1 ) TA = ⎣ . ⎦ ∈ R m×n(n+1)/2 (650)svec(A m ) Tsemidefinite Farkas’ lemma assumes that a convex coneK = {A svec X | X ≽ 0} (380)is closed per membership relation (320) from which the lemma springs:[232,I] K closure is attained when matrix A satisfies the cone closednessinvariance corollary (p.183). Given closed convex cone K and its dual fromExample 2.13.5.1.1m∑K ∗ = {y | y j A j ≽ 0} (387)j=1then we can apply membership relationto obtain the lemmab ∈ K ⇔ 〈y , b〉 ≥ 0 ∀y ∈ K ∗ (320)b ∈ K ⇔ ∃X ≽ 0 A svec X = b ⇔ A ∩ S n + ≠ ∅ (662)b ∈ K ⇔ 〈y,b〉 ≥ 0 ∀y ∈ K ∗ ⇔ A ∩ S n + ≠ ∅ (663)The final equivalence synopsizes semidefinite Farkas’ lemma.While the lemma is correct as stated, a positive definite version is requiredfor semidefinite programming [391,1.3.8] because existence of a feasiblesolution in the cone interior A ∩ int S n + is required by Slater’s condition 4.10to achieve 0 duality gap (optimal primal−dual objective difference4.2.3,Figure 58). Geometrically, a positive definite lemma is required to insurethat a point of intersection closest to the origin is not at infinity; e.g.,4.10 Slater’s sufficient constraint qualification is satisfied whenever any primal or dualstrictly feasible solution exists; id est, any point satisfying the respective affine constraintsand relatively interior to the convex cone. [330,6.6] [41, p.325] If the cone were polyhedral,then Slater’s constraint qualification is satisfied when any feasible solution exists (relativelyinterior to the cone or on its relative boundary). [61,5.2.3]


286 CHAPTER 4. SEMIDEFINITE PROGRAMMINGFigure 45. Then given A∈ R m×n(n+1)/2 having rank m , we wish to detectexistence of nonempty primal feasible set interior to the PSD cone; 4.11 (383)b ∈ int K ⇔ 〈y, b〉 > 0 ∀y ∈ K ∗ , y ≠ 0 ⇔ A ∩ int S n + ≠ ∅ (664)Positive definite Farkas’ lemma is made from proper cones, K (380) and K ∗(387), and membership relation (326) for which K closedness is unnecessary:4.2.1.1.2 Lemma. Positive definite Farkas’ lemma.Given l.i. set {A i ∈ S n , i=1... m} and vector b = [b i ]∈ R m , make affine setA = {X ∈ S n |〈A i , X〉=b i , i=1... m} (652)Primal feasible cone interior A ∩ int S n + is nonempty if and only if y T b > 0∑holds for each and every vector y = [y i ]≠ 0 such that m y i A i ≽ 0.Equivalently, primal feasible cone interior A ∩ int S n + is nonempty if and∑only if y T b > 0 holds for each and every vector ‖y‖= 1 m y i A i ≽ 0. ⋄4.2.1.1.3 Example. “New” Farkas’ lemma.Lasserre [232,III] presented an example in 1995, originally offered byBen-Israel in 1969 [32, p.378], to support closedness in semidefinite Farkas’Lemma 4.2.1.1.1:[ ] svec(A1 )A Tsvec(A 2 ) T =[ 0 1 00 0 1i=1i=1] [ 1, b =0Intersection A ∩ S n + is practically empty because the solution set{[ ] }α √2 1{X ≽ 0 | A svec X = b} =≽ 0 | α∈ R√120](665)(666)is positive semidefinite only asymptotically (α→∞). Yet the dual systemm∑y i A i ≽0 ⇒ y T b≥0 erroneously indicates nonempty intersection becausei=1K (380) violates a closedness condition of the lemma; videlicet, for ‖y‖= 1[ ]01 [ ][ ]√2 0 00y 1√1+ y20 2 ≽ 0 ⇔ y = ⇒ y T b = 0 (667)0 114.11 Detection of A ∩ int S n +≠ ∅ by examining int K instead is a trick need not be lost.


4.2. FRAMEWORK 287On the other hand, positive definite Farkas’ Lemma 4.2.1.1.2 certifies thatA ∩ int S n + is empty; what we need to know for semidefinite programming.Lasserre suggested addition of another condition to semidefinite Farkas’lemma (4.2.1.1.1) to make a new lemma having no closedness condition.But positive definite Farkas’ lemma (4.2.1.1.2) is simpler and obviates theadditional condition proposed.4.2.1.2 Theorem of the alternative for semidefinite programmingBecause these Farkas’ lemmas follow from membership relations, we mayconstruct alternative systems from them. Applying the method of2.13.2.1.1,then from positive definite Farkas’ lemma we getA ∩ int S n + ≠ ∅or in the alternativem∑y T b ≤ 0, y i A i ≽ 0, y ≠ 0i=1(668)Any single vector y satisfying the alternative certifies A ∩ int S n + is empty.Such a vector can be found as a solution to another semidefinite program:for linearly independent (vectorized) set {A i ∈ S n , i=1... m}minimizeysubject toy T bm∑y i A i ≽ 0i=1‖y‖ 2 ≤ 1(669)If an optimal vector y ⋆ ≠ 0 can be found such that y ⋆T b ≤ 0, then primalfeasible cone interior A ∩ int S n + is empty.4.2.1.3 Boundary-membership criterion(confer (663) (664)) From boundary-membership relation (330) for linearmatrix inequality proper cones K (380) and K ∗ (387)b ∈ ∂K ⇔ ∃ y ≠ 0 〈y , b〉 = 0, y ∈ K ∗ , b ∈ K ⇔ ∂S n + ⊃ A ∩ S n + ≠ ∅(670)
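As an aside, certificate problem (669) can be posed directly for the data of Example 4.2.1.1.3. The sketch below uses the CVX modeling package (an assumed tool); A1 and A2 denote the matrices whose symmetric vectorizations form the rows of A in (665), and b = [1; 0].

    % Sketch of certificate problem (669) in CVX (assumed installed),
    % applied to the data of Example 4.2.1.1.3.
    A1 = [0 1/sqrt(2); 1/sqrt(2) 0];
    A2 = [0 0; 0 1];
    b  = [1; 0];

    cvx_begin sdp quiet
        variable y(2)
        minimize( b.'*y )
        subject to
            y(1)*A1 + y(2)*A2 >= 0
            norm(y) <= 1
    cvx_end

    % Any nonzero feasible y with y'*b <= 0 certifies A meets int S^2_+ nowhere.
    % The solver may return y = 0, which is feasible but uninformative, so the
    % certificate y = [0; 1] is verified explicitly here:
    y_cert = [0; 1];
    disp( min(eig(y_cert(1)*A1 + y_cert(2)*A2)) )   % >= 0
    disp( b.'*y_cert )                              % <= 0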


288 CHAPTER 4. SEMIDEFINITE PROGRAMMINGWhether vector b ∈ ∂K belongs to cone K boundary, that is a determinationwe can indeed make; one that is certainly expressible as a feasibility problem:Given linearly independent set 4.12 {A i ∈ S n , i=1... m} , for b ∈ K (662)find y ≠ 0subject to y T b = 0m∑y i A i ≽ 0i=1(671)Any such nonzero solution y certifies that affine subset A (652) intersects thepositive semidefinite cone S n + only on its boundary; in other words, nonemptyfeasible set A ∩ S n + belongs to the positive semidefinite cone boundary ∂S n + .4.2.2 DualsThe dual objective function from (649D) evaluated at any feasible solutionrepresents a lower bound on the primal optimal objective value from (649P).We can see this by direct substitution: Assume the feasible sets A ∩ S n + andD ∗ are nonempty. Then it is always true:〈 ∑i〈C , X〉 ≥ 〈b, y〉〉y i A i + S , X ≥ [ 〈A 1 , X〉 · · · 〈A m , X〉 ] y〈S , X〉 ≥ 0(672)The converse also follows becauseX ≽ 0, S ≽ 0 ⇒ 〈S , X〉 ≥ 0 (1485)Optimal value of the dual objective thus represents the greatest lower boundon the primal. This fact is known as the weak duality theorem for semidefiniteprogramming, [391,1.3.8] and can be used to detect convergence in anyprimal/dual numerical method of solution.4.12 From the results of Example 2.13.5.1.1, vector b on the boundary of K cannotbe detected simply by looking for 0 eigenvalues in matrix X . We do not consider askinny-or-square matrix A because then feasible set A ∩ S+ n is at most a single point.
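Weak duality (672) is easy to exercise numerically: a dual feasible pair (y , S) and a primal feasible X can be built by construction, whence ⟨C , X⟩ − ⟨b , y⟩ = ⟨S , X⟩ ≥ 0. The random data below are hypothetical, in Matlab.

    % Numerical illustration of weak duality (672): build a primal feasible X
    % and a dual feasible (y,S) by construction, then verify <C,X> >= <b,y>.
    rng(6);
    n = 4;  m = 3;
    A = cell(m,1);
    for i = 1:m, Ai = randn(n); A{i} = (Ai + Ai.')/2; end

    R = randn(n);  X = R*R.';                    % primal feasible: X >= 0
    b = zeros(m,1);
    for i = 1:m, b(i) = trace(A{i}*X); end       % ... and A svec X = b by construction

    y = randn(m,1);
    Q = randn(n);  S = Q*Q.';                    % dual slack S >= 0
    C = S;                                       % dual feasible: sum_i y_i A_i + S = C
    for i = 1:m, C = C + y(i)*A{i}; end

    gap = trace(C*X) - b.'*y;                    % equals <S,X> >= 0
    disp([gap  trace(S*X)])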


4.2. FRAMEWORK 2894.2.2.1 Dual problem statement is not uniqueEven subtle but equivalent restatements of a primal convex problem canlead to vastly different statements of a corresponding dual problem. Thisphenomenon is of interest because a particular instantiation of dual problemmight be easier to solve numerically or it might take one of few forms forwhich analytical solution is known.Here is a canonical restatement of prototypical dual semidefinite program(649D), for example, equivalent by (194):(D)maximizey∈R m , S∈S n 〈b, y〉subject to S ≽ 0svec −1 (A T y) + S = C≡maximizey∈R m 〈b, y〉subject to svec −1 (A T y) ≼ C(649˜D)Dual feasible cone interior in int S n + (661) (651) thereby corresponds withcanonical dual (˜D) feasible interior{}rel int ˜Dm∑∗ y ∈ R m | y i A i ≺ C(673)i=14.2.2.1.1 Exercise. Primal prototypical semidefinite program.Derive prototypical primal (649P) from its canonical dual (649˜D); id est,demonstrate that particular connectivity in Figure 82. 4.2.3 Optimality conditionsWhen primal feasible cone interior A ∩ int S n + exists in S n or when canonicaldual feasible interior rel int ˜D ∗ exists in R m , then these two problems (649P)(649D) become strong duals by Slater’s sufficient condition (p.285). In otherwords, the primal optimal objective value becomes equal to the dual optimalobjective value: there is no duality gap (Figure 58) and so determination ofconvergence is facilitated; id est, if ∃X ∈ A ∩ int S n + or ∃y ∈ rel int ˜D ∗ then〈 ∑i〈C , X ⋆ 〉 = 〈b, y ⋆ 〉y ⋆ i A i + S ⋆ , X ⋆ 〉= [ 〈A 1 , X ⋆ 〉 · · · 〈A m , X ⋆ 〉 ] y ⋆〈S ⋆ , X ⋆ 〉 = 0(674)


290 CHAPTER 4. SEMIDEFINITE PROGRAMMINGP˜PdualitydualityDtransformation˜DFigure 82: Connectivity indicates paths between particular primal and dualproblems from Exercise 4.2.2.1.1. More generally, any path between primalproblems P (and equivalent ˜P) and dual D (and equivalent ˜D) is possible:implying, any given path is not necessarily circuital; dual of a dual problemis not necessarily stated in precisely same manner as corresponding primalconvex problem, in other words, although its solution set is equivalent towithin some transformation.where S ⋆ , y ⋆ denote a dual optimal solution. 4.13 We summarize this:4.2.3.0.1 Corollary. Optimality and strong duality. [359,3.1][391,1.3.8] For semidefinite programs (649P) and (649D), assume primaland dual feasible sets A ∩ S n + ⊂ S n and D ∗ ⊂ S n × R m (661) are nonempty.ThenX ⋆ is optimal for (649P)S ⋆ , y ⋆ are optimal for (649D)duality gap 〈C,X ⋆ 〉−〈b, y ⋆ 〉 is 0if and only ifi) ∃X ∈ A ∩ int S n + or ∃y ∈ rel int ˜D ∗ii) 〈S ⋆ , X ⋆ 〉 = 0and⋄4.13 Optimality condition 〈S ⋆ , X ⋆ 〉=0 is called a complementary slackness condition, inkeeping with linear programming tradition [94], that forbids dual inequalities in (649) tosimultaneously hold strictly. [306,4]


4.2. FRAMEWORK 291For symmetric positive semidefinite matrices, requirement ii is equivalentto the complementarity (A.7.4)〈S ⋆ , X ⋆ 〉 = 0 ⇔ S ⋆ X ⋆ = X ⋆ S ⋆ = 0 (675)Commutativity of diagonalizable matrices is a necessary and sufficientcondition [202,1.3.12] for these two optimal symmetric matrices to besimultaneously diagonalizable. ThereforerankX ⋆ + rankS ⋆ ≤ n (676)Proof. To see that, the product of symmetric optimal matricesX ⋆ , S ⋆ ∈ S n must itself be symmetric because of commutativity. (1474) Thesymmetric product has diagonalization [11, cor.2.11]S ⋆ X ⋆ = X ⋆ S ⋆ = QΛ S ⋆Λ X ⋆Q T = 0 ⇔ Λ X ⋆Λ S ⋆ = 0 (677)where Q is an orthogonal matrix. The product of the nonnegative diagonal Λmatrices can be 0 if their main diagonal zeros are complementary or coincide.Due only to symmetry, rankX ⋆ = rank Λ X ⋆ and rankS ⋆ = rank Λ S ⋆ forthese optimal primal and dual solutions. (1460) So, because of thecomplementarity, the total number of nonzero diagonal entries from both Λcannot exceed n .When equality is attained in (676)rankX ⋆ + rankS ⋆ = n (678)there are no coinciding main diagonal zeros in Λ X ⋆Λ S ⋆ , and so we have whatis called strict complementarity. 4.14 Logically it follows that a necessary andsufficient condition for strict complementarity of an optimal primal and dualsolution isX ⋆ + S ⋆ ≻ 0 (679)4.2.3.1 solving primal problem via dualThe beauty of Corollary 4.2.3.0.1 is its conjugacy; id est, one can solve eitherthe primal or dual problem in (649) and then find a solution to the othervia the optimality conditions. When a dual optimal solution is known,for example, a primal optimal solution is any primal feasible solution inhyperplane {X | 〈S ⋆ , X〉=0}.4.14 distinct from maximal complementarity (4.1.2).


292 CHAPTER 4. SEMIDEFINITE PROGRAMMING4.2.3.1.1 Example. Minimal cardinality Boolean. [93] [34,4.3.4] [345](confer Example 4.5.1.5.1) Consider finding a minimal cardinality Booleansolution x to the classic linear algebra problem Ax = b given noiseless dataA∈ R m×n and b∈ R m ;minimize ‖x‖ 0xsubject to Ax = bx i ∈ {0, 1} ,i=1... n(680)where ‖x‖ 0 denotes cardinality of vector x (a.k.a, 0-norm; not a convexfunction).A minimal cardinality solution answers the question: “Which fewestlinear combination of columns in A constructs vector b ?” Cardinalityproblems have extraordinarily wide appeal, arising in many fields of scienceand across many disciplines. [318] [214] [171] [170] Yet designing an efficientalgorithm to optimize cardinality has proved difficult. In this example, wealso constrain the variable to be Boolean. The Boolean constraint forcesan identical solution were the norm in problem (680) instead the 1-norm or2-norm; id est, the two problems(680)minimize ‖x‖ 0xsubject to Ax = bx i ∈ {0, 1} ,i=1... n=minimize ‖x‖ 1xsubject to Ax = bx i ∈ {0, 1} ,(681)i=1... nare the same. The Boolean constraint makes the 1-norm problem nonconvex.Given dataA =⎡−1 1 8 1 1 0⎢1 1 1⎣ −3 2 82 3−9 4 81419− 1 2 31− 1 4 9⎤⎥⎦ , b =the obvious and desired solution to the problem posed,⎡⎢⎣11214⎤⎥⎦ (682)x ⋆ = e 4 ∈ R 6 (683)


4.2. FRAMEWORK 293has norm ‖x ⋆ ‖ 2 =1 and minimal cardinality; the minimum number ofnonzero entries in vector x . The Matlab backslash command x=A\b ,for example, finds⎡x M=⎢⎣2128051280901280⎤⎥⎦(684)having norm ‖x M‖ 2 = 0.7044 .id est, an optimal solution toCoincidentally, x Mis a 1-norm solution;minimize ‖x‖ 1xsubject to Ax = b(518)The pseudoinverse solution (rounded)⎡ ⎤−0.0456−0.1881x P= A † b =0.0623⎢ 0.2668⎥⎣ 0.3770 ⎦−0.1102(685)has least norm ‖x P‖ 2 =0.5165 ; id est, the optimal solution to (E.0.1.0.1)minimize ‖x‖ 2xsubject to Ax = b(686)Certainly none of the traditional methods provide x ⋆ = e 4 (683) because, andin general, for Ax = b∥ arg inf ‖x‖2∥∥2 ≤ ∥ ∥ arg inf ‖x‖1∥∥2 ≤ ∥ ∥ arg inf ‖x‖0∥∥2 (687)
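The ordering (687) can be glimpsed numerically. The sketch below uses random data in place of (682), whose entries are not reproduced here, and the CVX package (an assumed tool) for the 1-norm problem (518).

    % Illustration of the norm ordering (687) on hypothetical random data;
    % any fat full-rank A will do.
    rng(7);
    m = 3;  n = 6;
    A = randn(m,n);  b = randn(m,1);

    x2 = pinv(A)*b;                  % least 2-norm solution, as in (686)

    cvx_begin quiet
        variable x1(n)
        minimize( norm(x1,1) )
        subject to
            A*x1 == b
    cvx_end

    disp([ norm(x2)  norm(x1) ])     % norm(x2) <= norm(x1), consistent with (687)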


294 CHAPTER 4. SEMIDEFINITE PROGRAMMINGWe can reformulate this minimal cardinality Boolean problem (680) as asemidefinite program: First transform the variablex (ˆx + 1) 1 2(688)so ˆx i ∈ {−1, 1} ; equivalently,minimize ‖(ˆx + 1) 1‖ ˆx2 0subject to A(ˆx + 1) 1 = b 2δ(ˆxˆx T ) = 1(689)where δ is the main-diagonal linear operator (A.1). By assigning (B.1)[ ˆxG =1] [ ˆx T 1] =[ X ˆxˆx T 1][ ˆxˆxT ˆxˆx T 1]∈ S n+1 (690)problem (689) becomes equivalent to: (Theorem A.3.1.0.7)minimize 1 TˆxX∈ S n , ˆx∈R nsubject to A(ˆx + 1) 1[ = b 2] X ˆxG = (≽ 0)ˆx T 1δ(X) = 1rankG = 1(691)where solution is confined to rank-1 vertices of the elliptope in S n+1(5.9.1.0.1) by the rank constraint, the positive semidefiniteness, and theequality constraints δ(X)=1. The rank constraint makes this problemnonconvex; by removing it 4.15 we get the semidefinite program4.15 Relaxed problem (692) can also be derived via Lagrange duality; it is a dual of adual program [sic] to (691). [304] [61,5, exer.5.39] [377,IV] [150,11.3.4] The relaxedproblem must therefore be convex having a larger feasible set; its optimal objective valuerepresents a generally loose lower bound (1683) on the optimal objective of problem (691).


4.2. FRAMEWORK 295minimize 1 TˆxX∈ S n , ˆx∈R nsubject to A(ˆx + 1) 1[ = b 2] X ˆxG =ˆx T 1δ(X) = 1≽ 0(692)whose optimal solution x ⋆ (688) is identical to that of minimal cardinalityBoolean problem (680) if and only if rankG ⋆ =1.Hope 4.16 of acquiring a rank-1 solution is not ill-founded because 2 nelliptope vertices have rank 1, and we are minimizing an affine function ona subset of the elliptope (Figure 130) containing rank-1 vertices; id est, byassumption that the feasible set of minimal cardinality Boolean problem(680) is nonempty, a desired solution resides on the elliptope relativeboundary at a rank-1 vertex. 4.17For that data given in (682), our semidefinite program solversdpsol [384][385] (accurate in solution to approximately 1E-8) 4.18 finds optimal solutionto (692)⎡round(G ⋆ ) =⎢⎣1 1 1 −1 1 1 −11 1 1 −1 1 1 −11 1 1 −1 1 1 −1−1 −1 −1 1 −1 −1 11 1 1 −1 1 1 −11 1 1 −1 1 1 −1−1 −1 −1 1 −1 −1 1⎤⎥⎦(693)near a rank-1 vertex of the elliptope in S n+1 (Theorem 5.9.1.0.2); its sorted4.16 A more deterministic approach to constraining rank and cardinality is in4.6.0.0.11.4.17 Confinement to the elliptope can be regarded as a kind of normalization akin tomatrix A column normalization suggested in [120] and explored in Example 4.2.3.1.2.4.18 A typically ignored limitation of interior-point solution methods is their relativeaccuracy of only about 1E-8 on a machine using 64-bit (double precision) floating-pointarithmetic; id est, optimal solution x ⋆ cannot be more accurate than square root ofmachine epsilon (ǫ=2.2204E-16). Nonzero primal−dual objective difference is not a goodmeasure of solution accuracy.


296 CHAPTER 4. SEMIDEFINITE PROGRAMMINGeigenvalues,⎡λ(G ⋆ ) =⎢⎣6.999999777990990.000000226872410.000000022502960.00000000262974−0.00000000999738−0.00000000999875−0.00000001000000⎤⎥⎦(694)Negative eigenvalues are undoubtedly finite-precision effects. Because thelargest eigenvalue predominates by many orders of magnitude, we can expectto find a good approximation to a minimal cardinality Boolean solution bytruncating all smaller eigenvalues. We find, indeed, the desired result (683)⎛⎡x ⋆ = round⎜⎢⎝⎣0.000000001279470.000000005273690.000000001810010.999999974690440.000000014089500.00000000482903⎤⎞= e 4 (695)⎥⎟⎦⎠These numerical results are solver dependent; insofar, not all SDP solverswill return a rank-1 vertex solution.4.2.3.1.2 Example. <strong>Optimization</strong> over elliptope versus 1-norm polyhedronfor minimal cardinality Boolean Example 4.2.3.1.1.A minimal cardinality problem is typically formulated via, what is by now,the standard practice [120] [68,3.2,3.4] of column normalization applied toa 1-norm problem surrogate like (518). Suppose we define a diagonal matrix⎡Λ ⎢⎣⎤‖A(:,1)‖ 2 0‖A(:, 2)‖ 2 ⎥...0 ‖A(:, 6)‖ 2⎦ ∈ S6 (696)used to normalize the columns (assumed nonzero) of given noiseless datamatrix A . Then approximate the minimal cardinality Boolean problemminimize ‖x‖ 0xsubject to Ax = bx i ∈ {0, 1} ,i=1... n(680)
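The truncation that produced (695) from G ⋆ amounts to a best rank-1 factorization followed by the inverse of transformation (688). A Matlab sketch follows, with a synthetic G standing in for an optimal solution of (692); the test vector x_true and the noise level are hypothetical.

    % Rank-1 truncation used to extract x* as in (695).  In practice G would be
    % an optimal matrix returned by (692); here a perturbed rank-1 G is built
    % from a known Boolean vector purely to exercise the recovery.
    x_true = [0 1 1 0].';  xhat_true = 2*x_true - 1;
    G = [xhat_true; 1]*[xhat_true; 1].' + 1e-9*randn(5);  G = (G + G.')/2;

    [V,D]  = eig((G + G.')/2);               % symmetrize against roundoff
    [~,k]  = max(diag(D));
    g      = sqrt(max(D(k,k),0)) * V(:,k);   % best rank-1 factor: G ~ g*g'
    g      = g * sign(g(end));               % force the homogenizing entry positive
    xhat   = g(1:end-1) / g(end);            % recover xhat from G ~ [xhat;1]*[xhat;1]'
    x_bool = round( (xhat + 1)/2 );          % undo the transformation (688)
    disp(x_bool.')                           % recovers x_true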


4.2. FRAMEWORK 297asminimize ‖ỹ‖ 1ỹsubject to AΛ −1 ỹ = b1 ≽ Λ −1 ỹ ≽ 0(697)where optimal solutiony ⋆ = round(Λ −1 ỹ ⋆ ) (698)The inequality in (697) relaxes Boolean constraint y i ∈ {0, 1} from (680);bounding any solution y ⋆ to a nonnegative unit hypercube whose vertices arebinary numbers. <strong>Convex</strong> problem (697) is justified by the convex envelopecenv ‖x‖ 0 on {x∈ R n | ‖x‖ ∞ ≤κ} = 1 κ ‖x‖ 1 (1366)Donoho concurs with this particular formulation, equivalently expressible asa linear program via (514).Approximation (697) is therefore equivalent to minimization of an affinefunction (3.2) on a bounded polyhedron, whereas semidefinite programminimize 1 TˆxX∈ S n , ˆx∈R nsubject to A(ˆx + 1) 1[ = b 2] X ˆxG =ˆx T 1δ(X) = 1≽ 0(692)minimizes an affine function on an intersection of the elliptope withhyperplanes. Although the same Boolean solution is obtained fromthis approximation (697) as compared with semidefinite program (692),when given that particular data from Example 4.2.3.1.1, Singer confides acounterexample: Instead, given dataA =[1 01 √20 11√2]then solving approximation (697) yields, b =[11](699)


298 CHAPTER 4. SEMIDEFINITE PROGRAMMING⎛⎡y ⋆ ⎜⎢= round⎝⎣1 − 1 √21 − 1 √21⎤⎞⎥⎟⎦⎠ =⎡⎢⎣001⎤⎥⎦ (700)(infeasible, with or without rounding, with respect to original problem (680))whereas solving semidefinite program (692) produces⎡⎤1 1 −1 1round(G ⋆ ) = ⎢ 1 1 −1 1⎥⎣ −1 −1 1 −1 ⎦ (701)1 1 −1 1with sorted eigenvaluesλ(G ⋆ ) =⎡⎢⎣3.999999650572640.00000035942736−0.00000000000000−0.00000001000000⎤⎥⎦ (702)Truncating all but the largest eigenvalue, from (688) we obtain (confer y ⋆ )⎛⎡⎤⎞⎡ ⎤0.99999999625299 1x ⋆ = round⎝⎣0.99999999625299 ⎦⎠ = ⎣ 1 ⎦ (703)0.00000001434518 0the desired minimal cardinality Boolean result.4.2.3.1.3 Exercise. Minimal cardinality Boolean art.Assess general performance of standard-practice approximation (697) ascompared with the proposed semidefinite program (692). 4.2.3.1.4 Exercise. Conic independence.Matrix A from (682) is full-rank having three-dimensional nullspace. Findits four conically independent columns. (2.10) To what part of proper coneK = {Ax | x ≽ 0} does vector b belong?4.2.3.1.5 Exercise. Linear independence.Show why fat matrix A , from compressed sensing problem (518) or (523),may be regarded full-rank without loss of generality. In other words: Is aminimal cardinality solution invariant to linear dependence of rows?


4.3. RANK REDUCTION 2994.3 Rank reduction...it is not clear generally how to predict rankX ⋆ or rankS ⋆before solving the SDP problem.−Farid Alizadeh (1995) [11, p.22]The premise of rank reduction in semidefinite programming is: an optimalsolution found does not satisfy Barvinok’s upper bound (272) on rank. Theparticular numerical algorithm solving a semidefinite program may haveinstead returned a high-rank optimal solution (4.1.2; e.g., (660)) when alower-rank optimal solution was expected. Rank reduction is a means toadjust rank of an optimal solution, returned by a solver, until it satisfiesBarvinok’s upper bound.4.3.1 Posit a perturbation of X ⋆Recall from4.1.2.1, there is an extreme point of A ∩ S n + (652) satisfyingupper bound (272) on rank. [24,2.2] It is therefore sufficient to locate anextreme point of the intersection whose primal objective value (649P) isoptimal: 4.19 [113,31.5.3] [239,2.4] [240] [7,3] [291]Consider again affine subsetA = {X ∈ S n | A svec X = b} (652)where for A i ∈ S n ⎡A ⎣⎤svec(A 1 ) T. ⎦ ∈ R m×n(n+1)/2 (650)svec(A m ) TGiven any optimal solution X ⋆ tominimizeX∈ S n 〈C , X 〉subject to X ∈ A ∩ S n +(649P)4.19 There is no known construction for Barvinok’s tighter result (277).−Monique Laurent (2004)


whose rank does not satisfy upper bound (272), we posit existence of a set of perturbations

    { t_j B_j | t_j ∈ R , B_j ∈ S^n , j = 1 … n }        (704)

such that, for some 0 ≤ i ≤ n and scalars {t_j , j = 1 … i} ,

    X* + Σ_{j=1}^{i} t_j B_j        (705)

becomes an extreme point of A ∩ S^n_+ and remains an optimal solution of (649P). Membership of (705) to affine subset A is secured for the i-th perturbation by demanding

    ⟨B_i , A_j⟩ = 0 ,   j = 1 … m        (706)

while membership to the positive semidefinite cone S^n_+ is insured by small perturbation (715). Feasibility of (705) is insured in this manner; optimality is proved in 4.3.3.

The following simple algorithm has very low computational intensity and locates an optimal extreme point, assuming a nontrivial solution:

4.3.1.0.1 Procedure. Rank reduction. [Wıκımization]

    initialize: B_i = 0 ∀ i
    for iteration i = 1 … n
    {
        1. compute a nonzero perturbation matrix B_i of X* + Σ_{j=1}^{i−1} t_j B_j
        2. maximize t_i
           subject to X* + Σ_{j=1}^{i} t_j B_j ∈ S^n_+
    }


4.3. RANK REDUCTION 301A rank-reduced optimal solution is theni∑X ⋆ ← X ⋆ + t j B j (707)j=14.3.2 Perturbation formPerturbations of X ⋆ are independent of constants C ∈ S n and b∈R m inprimal and dual problems (649). Numerical accuracy of any rank-reducedresult, found by perturbation of an initial optimal solution X ⋆ , is thereforequite dependent upon initial accuracy of X ⋆ .4.3.2.0.1 Definition. Matrix step function. (conferA.6.5.0.1)Define the signum-like quasiconcave real function ψ : S n → Rψ(Z) { 1, Z ≽ 0−1, otherwise(708)The value −1 is taken for indefinite or nonzero negative semidefiniteargument.△Deza & Laurent [113,31.5.3] prove: every perturbation matrix B i ,i=1... n , is of the formB i = −ψ(Z i )R i Z i R T i ∈ S n (709)where∑i−1X ⋆ R 1 R1 T , X ⋆ + t j B j R i Ri T ∈ S n (710)j=1where the t j are scalars and R i ∈ R n×ρ is full-rank and skinny where( )∑i−1ρ rank X ⋆ + t j B jj=1(711)


302 CHAPTER 4. SEMIDEFINITE PROGRAMMINGand where matrix Z i ∈ S ρ is found at each iteration i by solving a verysimple feasibility problem: 4.20find Z i ∈ S ρsubject to 〈Z i , RiA T j R i 〉 = 0 ,j =1... m(712)Were there a sparsity pattern common to each member of the set{R T iA j R i ∈ S ρ , j =1... m} , then a good choice for Z i has 1 in each entrycorresponding to a 0 in the pattern; id est, a sparsity pattern complement.At iteration i∑i−1X ⋆ + t j B j + t i B i = R i (I − t i ψ(Z i )Z i )Ri T (713)j=1By fact (1450), therefore∑i−1X ⋆ + t j B j + t i B i ≽ 0 ⇔ 1 − t i ψ(Z i )λ(Z i ) ≽ 0 (714)j=1where λ(Z i )∈ R ρ denotes the eigenvalues of Z i .Maximization of each t i in step 2 of the Procedure reduces rank of (713)and locates a new point on the boundary ∂(A ∩ S n +) . 4.21 Maximization oft i thereby has closed form;4.20 A simple method of solution is closed-form projection of a random nonzero point onthat proper subspace of isometrically isomorphic R ρ(ρ+1)/2 specified by the constraints.(E.5.0.0.6) Such a solution is nontrivial assuming the specified intersection of hyperplanesis not the origin; guaranteed by ρ(ρ + 1)/2 > m. Indeed, this geometric intuition aboutforming the perturbation is what bounds any solution’s rank from below; m is fixed bythe number of equality constraints in (649P) while rank ρ decreases with each iteration i.Otherwise, we might iterate indefinitely.4.21 This holds because rank of a positive semidefinite matrix in S n is diminished belown by the number of its 0 eigenvalues (1460), and because a positive semidefinite matrixhaving one or more 0 eigenvalues corresponds to a point on the PSD cone boundary (193).Necessity and sufficiency are due to the facts: R i can be completed to a nonsingular matrix(A.3.1.0.5), and I − t i ψ(Z i )Z i can be padded with zeros while maintaining equivalencein (713).


$(t_i^\star)^{-1} = \max\{\psi(Z_i)\lambda(Z_i)_j\,,\ j=1\ldots\rho\}$   (715)

When $Z_i$ is indefinite, direction of perturbation (determined by $\psi(Z_i)$) is arbitrary. We may take an early exit from the Procedure were $Z_i$ to become 0 or were

$\operatorname{rank}\bigl[\operatorname{svec} R_i^{T}A_1 R_i\ \ \operatorname{svec} R_i^{T}A_2 R_i\ \cdots\ \operatorname{svec} R_i^{T}A_m R_i\bigr] = \rho(\rho+1)/2$   (716)

which characterizes rank $\rho$ of any [sic] extreme point in $\mathcal{A}\cap\mathbb{S}^n_+$. [239, §2.4] [240]

Proof. Assuming the form of every perturbation matrix is indeed (709), then by (712)

$\operatorname{svec} Z_i \;\perp\; \bigl[\operatorname{svec}(R_i^{T}A_1 R_i)\ \ \operatorname{svec}(R_i^{T}A_2 R_i)\ \cdots\ \operatorname{svec}(R_i^{T}A_m R_i)\bigr]$   (717)

By orthogonal complement we have

$\operatorname{rank}\bigl[\operatorname{svec}(R_i^{T}A_1 R_i)\ \cdots\ \operatorname{svec}(R_i^{T}A_m R_i)\bigr]^{\perp} + \operatorname{rank}\bigl[\operatorname{svec}(R_i^{T}A_1 R_i)\ \cdots\ \operatorname{svec}(R_i^{T}A_m R_i)\bigr] = \rho(\rho+1)/2$   (718)

When $Z_i$ can only be 0, then the perturbation is null because an extreme point has been found; thus

$\bigl[\operatorname{svec}(R_i^{T}A_1 R_i)\ \cdots\ \operatorname{svec}(R_i^{T}A_m R_i)\bigr]^{\perp} = 0$   (719)

from which the stated result (716) directly follows.
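Feasibility problem (712) may be solved as footnote 4.20 suggests, by projecting a random symmetric matrix onto the orthogonal complement of the span of the matrices $R_i^{T}A_j R_i$. A minimal Matlab sketch on stand-in (hypothetical) data follows; it uses vec followed by symmetrization in place of svec, which yields an equally valid $Z_i$:

  rho = 3;
  R  = randn(5,rho);                               % stand-in factor R_i from (710)
  A1 = diag([1 -2 0 3 1]);  A2 = diag([0 1 4 -1 2]);    % stand-in constraint matrices
  V  = [reshape(R'*A1*R,[],1), reshape(R'*A2*R,[],1)];  % vec of R'*A_j*R
  z  = randn(rho^2,1);                             % random nonzero point
  z  = z - V*(V\z);                                % project onto orthogonal complement
  Z  = reshape(z,rho,rho);  Z = (Z+Z')/2;          % symmetrize; still orthogonal to R'*A_j*R
  [trace(Z*R'*A1*R), trace(Z*R'*A2*R)]             % both ~0, so (712) is satisfied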


4.3.3 Optimality of perturbed $X^\star$

We show that the optimal objective value is unaltered by perturbation (709); id est,

$\bigl\langle C\,,\ X^\star + \sum_{j=1}^{i} t_j B_j\bigr\rangle = \langle C\,, X^\star\rangle$   (720)

Proof. From Corollary 4.2.3.0.1 we have the necessary and sufficient relationship between optimal primal and dual solutions under assumption of nonempty primal feasible cone interior $\mathcal{A}\cap\operatorname{int}\mathbb{S}^n_+$:

$S^\star X^\star = S^\star R_1 R_1^{T} = X^\star S^\star = R_1 R_1^{T} S^\star = 0$   (721)

This means $\mathcal{R}(R_1)\subseteq\mathcal{N}(S^\star)$ and $\mathcal{R}(S^\star)\subseteq\mathcal{N}(R_1^{T})$. From (710) and (713) we get the sequence:

$X^\star = R_1 R_1^{T}$
$X^\star + t_1 B_1 = R_2 R_2^{T} = R_1(I - t_1\psi(Z_1)Z_1)R_1^{T}$
$X^\star + t_1 B_1 + t_2 B_2 = R_3 R_3^{T} = R_2(I - t_2\psi(Z_2)Z_2)R_2^{T} = R_1(I - t_1\psi(Z_1)Z_1)(I - t_2\psi(Z_2)Z_2)R_1^{T}$
  ⋮
$X^\star + \sum_{j=1}^{i} t_j B_j = R_1\Bigl(\prod_{j=1}^{i}(I - t_j\psi(Z_j)Z_j)\Bigr)R_1^{T}$   (722)

Substituting $C = \operatorname{svec}^{-1}(A^{T}y^\star) + S^\star$ from (649),

$\Bigl\langle C\,,\ X^\star + \sum_{j=1}^{i} t_j B_j\Bigr\rangle = \Bigl\langle \operatorname{svec}^{-1}(A^{T}y^\star) + S^\star\,,\ R_1\Bigl(\prod_{j=1}^{i}(I - t_j\psi(Z_j)Z_j)\Bigr)R_1^{T}\Bigr\rangle$
$\quad = \Bigl\langle \sum_{k=1}^{m} y_k^\star A_k\,,\ X^\star + \sum_{j=1}^{i} t_j B_j\Bigr\rangle$
$\quad = \Bigl\langle \sum_{k=1}^{m} y_k^\star A_k + S^\star\,,\ X^\star\Bigr\rangle = \langle C\,, X^\star\rangle$   (723)

because $\langle B_i\,, A_j\rangle = 0\ \forall\, i,j$ by design (706).


4.3.3.0.1 Example. $A\,\delta(X) = b$.
This academic example demonstrates that a solution found by rank reduction can certainly have rank less than Barvinok's upper bound (272): Assume a given vector $b$ belongs to the conic hull of columns of a given matrix $A$

$A = \begin{bmatrix} -1 & 1 & 8 & 1 & 1\\ -3 & 2 & 8 & \frac{1}{2} & \frac{1}{3}\\ -9 & 4 & 8 & \frac{1}{4} & \frac{1}{9} \end{bmatrix}\in\mathbb{R}^{m\times n}\,,\qquad b = \begin{bmatrix} 1\\ \frac{1}{2}\\ \frac{1}{4} \end{bmatrix}\in\mathbb{R}^m$   (724)

Consider the convex optimization problem

$\begin{array}{ll}\mbox{minimize}_{X\in\mathbb{S}^5} & \operatorname{tr} X\\ \mbox{subject to} & X\succeq 0\\ & A\,\delta(X) = b\end{array}$   (725)

that minimizes the 1-norm of the main diagonal; id est, problem (725) is the same as

$\begin{array}{ll}\mbox{minimize}_{X\in\mathbb{S}^5} & \|\delta(X)\|_1\\ \mbox{subject to} & X\succeq 0\\ & A\,\delta(X) = b\end{array}$   (726)

that finds a solution to $A\,\delta(X) = b$. Rank-3 solution $X^\star = \delta(x_{\mathcal M})$ is optimal, where (confer (684))

$x_{\mathcal M} = \begin{bmatrix} \frac{2}{128}\\ 0\\ \frac{5}{128}\\ 0\\ \frac{90}{128} \end{bmatrix}$   (727)

Yet upper bound (272) predicts existence of at most a

rank-$\Bigl(\bigl\lfloor \frac{\sqrt{8m+1}-1}{2}\bigr\rfloor = 2\Bigr)$   (728)

feasible solution from $m = 3$ equality constraints. To find a lower-rank $\rho$ optimal solution to (725) (barring combinatorics), we invoke Procedure 4.3.1.0.1:


Initialize: $C = I$, $\rho = 3$, $A_j \triangleq \delta(A(j,:))$, $j = 1,2,3$, $X^\star = \delta(x_{\mathcal M})$, $m = 3$, $n = 5$.

{Iteration $i=1$:

Step 1:

$R_1 = \begin{bmatrix} \sqrt{\frac{2}{128}} & 0 & 0\\ 0 & 0 & 0\\ 0 & \sqrt{\frac{5}{128}} & 0\\ 0 & 0 & 0\\ 0 & 0 & \sqrt{\frac{90}{128}} \end{bmatrix}$

find $Z_1\in\mathbb{S}^3$ subject to $\langle Z_1\,, R_1^{T}A_j R_1\rangle = 0\,,\ j = 1,2,3$   (729)

A nonzero randomly selected matrix $Z_1$, having 0 main diagonal, is a solution yielding nonzero perturbation matrix $B_1$. Choose arbitrarily

$Z_1 = \mathbf{1}\mathbf{1}^{T} - I \in \mathbb{S}^3$   (730)

then (rounding)

$B_1 = \begin{bmatrix} 0 & 0 & 0.0247 & 0 & 0.1048\\ 0 & 0 & 0 & 0 & 0\\ 0.0247 & 0 & 0 & 0 & 0.1657\\ 0 & 0 & 0 & 0 & 0\\ 0.1048 & 0 & 0.1657 & 0 & 0 \end{bmatrix}$   (731)

Step 2: $t_1^\star = 1$ because $\lambda(Z_1) = [\,-1\ \ -1\ \ 2\,]^{T}$. So,

$X^\star \leftarrow \delta(x_{\mathcal M}) + B_1 = \begin{bmatrix} \frac{2}{128} & 0 & 0.0247 & 0 & 0.1048\\ 0 & 0 & 0 & 0 & 0\\ 0.0247 & 0 & \frac{5}{128} & 0 & 0.1657\\ 0 & 0 & 0 & 0 & 0\\ 0.1048 & 0 & 0.1657 & 0 & \frac{90}{128} \end{bmatrix}$   (732)

has rank $\rho\leftarrow 1$ and produces the same optimal objective value.}
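A minimal Matlab sketch of this single iteration, assuming the data of this example and the arbitrary choice (730), so the initial SDP solve is bypassed by starting from the known optimal $X^\star = \delta(x_{\mathcal M})$:

  A  = [-1 1 8 1 1; -3 2 8 1/2 1/3; -9 4 8 1/4 1/9];    % data (724)
  b  = [1; 1/2; 1/4];
  xM = [2; 0; 5; 0; 90]/128;                             % optimal diagonal (727)
  X  = diag(xM);                                         % rank-3 optimal X*
  Aj = {diag(A(1,:)), diag(A(2,:)), diag(A(3,:))};       % constraint matrices A_j
  % Step 1: factor X = R*R' and pick Z with zero main diagonal, here (730)
  [Q,L] = eig(X);  keep = diag(L) > 1e-9;
  R  = Q(:,keep)*sqrt(L(keep,keep));                     % R_1 in (710), 5-by-3
  R  = R*diag(sign(sum(R)));                             % fix eigenvector signs to match R_1
  Z  = ones(3) - eye(3);                                 % choice (730)
  cellfun(@(M) trace(Z*R'*M*R), Aj)                      % all ~0, verifying (729)
  % Step 2: Z is indefinite so psi(Z) = -1; t* from (715); B_1 from (709)
  psi = -1;
  t   = 1/max(psi*eig(Z));                               % t* = 1
  B   = -psi*R*Z*R';                                     % matches (731)
  Xnew = X + t*B;                                        % matches (732)
  [rank(X), rank(Xnew,1e-9), trace(X), trace(Xnew)]      % rank 3 -> 1, traces equal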


4.3.3.0.2 Exercise. Rank reduction of maximal complementarity.
Apply rank reduction Procedure 4.3.1.0.1 to the maximal complementarity example (§4.1.2.3.1). Demonstrate a rank-1 solution; which can certainly be found (by Barvinok's Proposition 2.9.3.0.1) because there is only one equality constraint.

4.3.4 thoughts regarding rank reduction

Because rank reduction Procedure 4.3.1.0.1 is guaranteed only to produce another optimal solution conforming to Barvinok's upper bound (272), the Procedure will not necessarily produce solutions of arbitrarily low rank; but if they exist, the Procedure can. Arbitrariness of search direction when matrix $Z_i$ becomes indefinite, mentioned on page 303, and the enormity of choices for $Z_i$ (712) are liabilities for this algorithm.

4.3.4.1 inequality constraints

The question naturally arises: what to do when a semidefinite program (not in prototypical form (649)) 4.22 has linear inequality constraints of the form

$\alpha_i^{T}\operatorname{svec} X \preceq \beta_i\,,\quad i = 1\ldots k$   (733)

where $\{\beta_i\}$ are given scalars and $\{\alpha_i\}$ are given vectors. One expedient way to handle this circumstance is to convert the inequality constraints to equality constraints by introducing a slack variable $\gamma$; id est,

$\alpha_i^{T}\operatorname{svec} X + \gamma_i = \beta_i\,,\quad i = 1\ldots k\,,\quad \gamma\succeq 0$   (734)

thereby converting the problem to prototypical form.

Alternatively, we say the $i^{\rm th}$ inequality constraint is active when it is met with equality; id est, when for particular $i$ in (733), $\alpha_i^{T}\operatorname{svec} X^\star = \beta_i$. An optimal high-rank solution $X^\star$ is, of course, feasible (satisfying all the constraints). But for the purpose of rank reduction, inactive inequality constraints are ignored while active inequality constraints are interpreted as

4.22 Contemporary numerical packages for solving semidefinite programs can solve a range of problems wider than prototype (649). Generally, they do so by transforming a given problem into prototypical form by introducing new constraints and variables. [11] [385] We are momentarily considering a departure from the primal prototype that augments the constraint set with linear inequalities.


equality constraints. In other words, we take the union of active inequality constraints (as equalities) with equality constraints $A\operatorname{svec} X = b$ to form a composite affine subset $\hat{\mathcal A}$ substituting for (652). Then we proceed with rank reduction of $X^\star$ as though the semidefinite program were in prototypical form (649P).

4.4 Rank-constrained semidefinite program

We generalize the trace heuristic (§7.2.2.1), for finding low-rank optimal solutions to SDPs of a more general form:

4.4.1 rank constraint by convex iteration

Consider a semidefinite feasibility problem of the form

$\begin{array}{ll}\mbox{find}_{G\in\mathbb{S}^N} & G\\ \mbox{subject to} & G\in\mathcal{C}\\ & G\succeq 0\\ & \operatorname{rank} G\le n\end{array}$   (735)

where $\mathcal{C}$ is a convex set presumed to contain positive semidefinite matrices of rank $n$ or less; id est, $\mathcal{C}$ intersects the positive semidefinite cone boundary. We propose that this rank-constrained feasibility problem can be equivalently expressed as the convex problem sequence (736) (1700a):

$\begin{array}{ll}\mbox{minimize}_{G\in\mathbb{S}^N} & \langle G\,, W\rangle\\ \mbox{subject to} & G\in\mathcal{C}\\ & G\succeq 0\end{array}$   (736)

where direction vector 4.23 $W$ is an optimal solution to semidefinite program, for $0\le n\le N-1$

$\sum_{i=n+1}^{N}\lambda(G^\star)_i \;=\; \begin{array}[t]{ll}\mbox{minimize}_{W\in\mathbb{S}^N} & \langle G^\star, W\rangle\\ \mbox{subject to} & 0\preceq W\preceq I\\ & \operatorname{tr} W = N-n\end{array}$   (1700a)

4.23 Search direction $W$ is a hyperplane-normal pointing opposite to direction of movement describing minimization of a real linear function $\langle G\,, W\rangle$ (p.75).


whose feasible set is a Fantope (§2.3.2.0.1), and where $G^\star$ is an optimal solution to problem (736) given some iterate $W$. The idea is to iterate solution of (736) and (1700a) until convergence as defined in §4.4.1.2: 4.24

$\sum_{i=n+1}^{N}\lambda(G^\star)_i = \langle G^\star, W^\star\rangle = \lambda(G^\star)^{T}\lambda(W^\star) \triangleq 0$   (737)

defines global convergence of the iteration; a vanishing objective that is a certificate of global optimality but cannot be guaranteed. Optimal direction vector $W^\star$ is defined as any positive semidefinite matrix yielding optimal solution $G^\star$, of rank $n$ or less, to the then convex equivalent (736) of feasibility problem (735):

$\begin{array}{ll}\mbox{find}_{G\in\mathbb{S}^N} & G\\ \mbox{subject to} & G\in\mathcal{C}\\ & G\succeq 0\\ & \operatorname{rank} G\le n\end{array}\ \ (735)\qquad\equiv\qquad \begin{array}{ll}\mbox{minimize}_{G\in\mathbb{S}^N} & \langle G\,, W^\star\rangle\\ \mbox{subject to} & G\in\mathcal{C}\\ & G\succeq 0\end{array}\ \ (736)$

id est, any direction vector for which the last $N-n$ nonincreasingly ordered eigenvalues $\lambda$ of $G^\star$ are zero. In any semidefinite feasibility problem, a solution of lowest rank must be an extreme point of the feasible set. Then there must exist a linear objective function such that this lowest-rank feasible solution optimizes the resultant SDP. [239, §2.4] [240]

We emphasize that convex problem (736) is not a relaxation of rank-constrained feasibility problem (735); at global convergence, convex iteration (736) (1700a) makes it instead an equivalent problem.

4.4.1.1 direction matrix interpretation

(confer §4.5.1.2) The feasible set of direction matrices in (1700a) is the convex hull of outer product of all rank-$(N-n)$ orthonormal matrices; videlicet,

$\operatorname{conv}\{UU^{T} \mid U\in\mathbb{R}^{N\times N-n},\ U^{T}U = I\} = \{A\in\mathbb{S}^N \mid I\succeq A\succeq 0\,,\ \langle I\,, A\rangle = N-n\}$   (90)

This set (92), argument to $\operatorname{conv}\{\ \}$, comprises the extreme points of this Fantope (90). An optimal solution $W$ to (1700a), that is an extreme point,

4.24 Proposed iteration is neither dual projection (Figure 164) nor alternating projection (Figure 167). Sum of eigenvalues follows from results of Ky Fan (page 658). Inner product of eigenvalues follows from (1590) and properties of commutative matrix products (page 605).


is known in closed form (p.658): Given ordered diagonalization $G^\star = Q\Lambda Q^{T}\in\mathbb{S}^N$ (§A.5.1), then direction matrix $W = U^\star U^{\star T}$ is optimal and extreme where $U^\star = Q(:,\,n\!+\!1{:}N)\in\mathbb{R}^{N\times N-n}$. Eigenvalue vector $\lambda(W)$ has 1 in each entry corresponding to the $N-n$ smallest entries of $\delta(\Lambda)$ and has 0 elsewhere. By (221) (223), polar direction $-W$ can be regarded as pointing toward the set of all rank-$n$ (or less) positive semidefinite matrices whose nullspace contains that of $G^\star$. For that particular closed-form solution $W$, consequently (confer (760))

$\sum_{i=n+1}^{N}\lambda(G^\star)_i = \langle G^\star, W\rangle = \lambda(G^\star)^{T}\lambda(W)$   (738)

This is the connection to cardinality minimization of vectors; 4.25 id est, eigenvalue $\lambda$ cardinality (rank) is analogous to vector $x$ cardinality via (760).

So as not to be misconstrued under closed-form solution $W$ to (1700a): Define (confer (221))

$\mathcal{S}_n \triangleq \{(I-W)G(I-W) \mid G\in\mathbb{S}^N\} = \{X\in\mathbb{S}^N \mid \mathcal{N}(X)\supseteq\mathcal{N}(G^\star)\}$   (739)

as the symmetric subspace of rank-$\le n$ matrices whose nullspace contains $\mathcal{N}(G^\star)$. Then projection of $G^\star$ on $\mathcal{S}_n$ is $(I-W)G^\star(I-W)$. (§E.7) Direction of projection is $-WG^\star W$. (Figure 83) $\operatorname{tr}(WG^\star W)$ is a measure of proximity to $\mathcal{S}_n$ because its orthogonal complement is $\mathcal{S}_n^{\perp} = \{WGW \mid G\in\mathbb{S}^N\}$; the point being, convex iteration incorporating constrained $\operatorname{tr}(WGW) = \langle G\,, W\rangle$ minimization is not a projection method: certainly, not on these two subspaces.

Closed-form solution $W$ to problem (1700a), though efficient, comes with a caveat: there exist cases where this projection matrix solution $W$ does not provide the most efficient route to an optimal rank-$n$ solution $G^\star$; id est, direction $W$ is not unique. So we sometimes choose to solve (1700a) instead of employing a known closed-form solution.

4.25 not trace minimization of a nonnegative diagonal matrix $\delta(x)$ as in [135, §1] [300, §2]. To make rank-constrained problem (735) resemble cardinality problem (530), we could make $\mathcal{C}$ an affine subset:

$\begin{array}{ll}\mbox{find} & X\in\mathbb{S}^N\\ \mbox{subject to} & A\operatorname{svec} X = b\\ & X\succeq 0\\ & \operatorname{rank} X\le n\end{array}$
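A minimal Matlab sketch of that closed-form direction matrix, on a stand-in (hypothetical) iterate Gstar with target rank $n$:

  N = 4;  n = 2;
  V = [1 0; 2 1; 0 1; 1 1];
  Gstar = V*V' + 0.05*diag([1 2 3 4]);             % stand-in optimal iterate G*
  [Q,L]   = eig(Gstar);
  [lam,i] = sort(diag(L),'descend');               % ordered diagonalization G* = Q*diag(lam)*Q'
  U = Q(:,i(n+1:N));                               % eigenvectors of the N-n smallest eigenvalues
  W = U*U';                                        % direction matrix: optimal, extreme point of (90)
  [sum(lam(n+1:N)), trace(Gstar*W)]                % equal, confirming (738)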


Figure 83: (confer Figure 165) Projection of $G^\star$ on subspace $\mathcal{S}_n$ of rank-$\le n$ matrices whose nullspace contains $\mathcal{N}(G^\star)$. This direction $W$ is closed-form solution to (1700a). (Labels in the drawing: $\mathcal{S}_n$, $(I-W)G^\star(I-W)$, $G^\star$, $WG^\star W$, $\mathcal{S}_n^{\perp}$.)


Figure 84: (confer Figure 99) Trace heuristic can be interpreted as minimization of a hyperplane, with normal $I$, over positive semidefinite cone drawn here in isometrically isomorphic $\mathbb{R}^3$. Polar of direction vector $W = I$ points toward origin. (Labels in the drawing: $I$, $0$, $\mathbb{S}^2_+$, $\partial\mathcal{H} = \{G \mid \langle G, I\rangle = \kappa\}$.)


4.4.1.2.1 Proof. Suppose $\langle G^\star, W\rangle = \tau$ is satisfied for some nonnegative constant $\tau$ after any particular iteration (736) (1700a) of the two minimization problems. Once a particular value of $\tau$ is achieved, it can never be exceeded by subsequent iterations because existence of feasible $G$ and $W$ having that vector inner-product $\tau$ has been established simultaneously in each problem. Because the infimum of vector inner-product of two positive semidefinite matrix variables is zero, the nonincreasing sequence of iterations is thus bounded below hence convergent because any bounded monotonic sequence in $\mathbb{R}$ is convergent. [258, §1.2] [42, §1.1] Local convergence to some nonnegative objective value $\tau$ is thereby established.

Local convergence, in this context, means convergence to a fixed point of possibly infeasible rank. Only local convergence can be established because objective $\langle G\,, W\rangle$, when instead regarded simultaneously in two variables $(G\,, W)$, is generally multimodal. (§3.8.0.0.3)

Local convergence, convergence to $\tau\neq 0$ and definition of a stall, never implies nonexistence of a rank-$n$ feasible solution to (736). A nonexistent rank-$n$ feasible solution would mean certain failure to converge globally by definition (737) (convergence to $\tau\neq 0$) but, as proved, convex iteration always converges locally if not globally.

When a rank-$n$ feasible solution to (736) exists, it remains an open problem to state conditions under which $\langle G^\star, W^\star\rangle = \tau = 0$ (737) is achieved by iterative solution of semidefinite programs (736) and (1700a). Then $\operatorname{rank} G^\star\le n$ and pair $(G^\star, W^\star)$ becomes a globally optimal fixed point of iteration. There can be no proof of global convergence because of the implicit high-dimensional multimodal manifold in variables $(G\,, W)$. When stall occurs, direction vector $W$ can be manipulated to steer out; e.g., reversal of search direction as in Example 4.6.0.0.1, or reinitialization to a random rank-$(N-n)$ matrix in the same positive semidefinite cone face (§2.9.2.3) demanded by the current iterate: given ordered diagonalization $G^\star = Q\Lambda Q^{T}\in\mathbb{S}^N$, then $W = U^\star\Phi U^{\star T}$ where $U^\star = Q(:,\,n\!+\!1{:}N)\in\mathbb{R}^{N\times N-n}$ and where eigenvalue vector $\lambda(W)_{1:N-n} = \lambda(\Phi)$ has nonnegative uniformly distributed random entries in $(0,1]$ by selection of $\Phi\in\mathbb{S}^{N-n}_+$ while $\lambda(W)_{N-n+1:N} = 0$. When this direction works, rank with respect to iteration tends to be noisily monotonic. Zero eigenvalues act as memory while randomness largely reduces likelihood of stall.
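A minimal Matlab sketch of that randomized restart, on a stand-in (hypothetical) stalled iterate Gstar:

  N = 4;  n = 2;  Gstar = gallery('randcorr',N);   % stand-in stalled iterate G*
  [Q,L] = eig(Gstar);  [~,i] = sort(diag(L),'descend');
  U   = Q(:,i(n+1:N));                             % U* from ordered diagonalization
  Phi = diag(1 - rand(N-n,1));                     % random eigenvalues in (0,1]
  W   = U*Phi*U';                                  % restart direction in the same PSD cone face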


4.4.1.2.2 Exercise. Completely positive semidefinite matrix. [40]
Given rank-2 positive semidefinite matrix

$G = \begin{bmatrix} 0.50 & 0.55 & 0.20\\ 0.55 & 0.61 & 0.22\\ 0.20 & 0.22 & 0.08 \end{bmatrix}$

find a positive factorization $G = X^{T}X$ (900) by solving

$\begin{array}{ll}\mbox{find}_{X\in\mathbb{R}^{2\times 3}} & X\ge 0\\ \mbox{subject to} & Z = \begin{bmatrix} I & X\\ X^{T} & G\end{bmatrix}\succeq 0\\ & \operatorname{rank} Z\le 2\end{array}$   (740)

via convex iteration.

4.4.1.2.3 Exercise. Nonnegative matrix factorization.
Given rank-2 nonnegative matrix

$X = \begin{bmatrix} 17 & 28 & 42\\ 16 & 47 & 51\\ 17 & 82 & 72 \end{bmatrix}$

find a nonnegative factorization

$X = WH$   (741)

by solving

$\begin{array}{ll}\mbox{find}_{A\in\mathbb{S}^3,\,B\in\mathbb{S}^3,\,W\in\mathbb{R}^{3\times 2},\,H\in\mathbb{R}^{2\times 3}} & W\,,\ H\\ \mbox{subject to} & Z = \begin{bmatrix} I & W^{T} & H\\ W & A & X\\ H^{T} & X^{T} & B\end{bmatrix}\succeq 0\\ & W\ge 0\\ & H\ge 0\\ & \operatorname{rank} Z\le 2\end{array}$   (742)

which follows from the fact, at optimality,

$Z^\star = \begin{bmatrix} I\\ W\\ H^{T}\end{bmatrix}\begin{bmatrix} I & W^{T} & H\end{bmatrix}$   (743)

Use the known closed-form solution for a direction vector $Y$ to regulate rank by convex iteration; set $Z^\star = Q\Lambda Q^{T}\in\mathbb{S}^8$ to an ordered diagonalization and $U^\star = Q(:,3{:}8)\in\mathbb{R}^{8\times 6}$, then $Y = U^\star U^{\star T}$ (§4.4.1.1).
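A minimal Matlab sketch of that convex iteration for Exercise 4.4.1.2.3, assuming CVX (the modeling package cited later in this chapter); the blocks of (742) are carved out of a single symmetric variable $Z$, so the blocks $A$ and $B$ are implicit:

  X = [17 28 42; 16 47 51; 17 82 72];       % given rank-2 nonnegative matrix (741)
  Y = eye(8);                               % initial direction matrix (trace heuristic)
  for iter = 1:50
      cvx_begin quiet
          variable Z(8,8) semidefinite
          minimize( trace(Y*Z) )
          subject to
              Z(1:2,1:2) == eye(2);         % upper-left identity block of (742)
              Z(3:5,6:8) == X;              % data block
              Z(3:5,1:2) >= 0;              % W >= 0 elementwise
              Z(1:2,6:8) >= 0;              % H >= 0 elementwise
      cvx_end
      [Q,L] = eig(Z);  [lam,i] = sort(diag(L),'descend');
      U = Q(:,i(3:8));  Y = U*U';           % closed-form direction Y = U*U'  (sec 4.4.1.1)
      if sum(lam(3:8)) < 1e-8, break, end   % rank Z <= 2 achieved
  end
  W = Z(3:5,1:2);  H = Z(1:2,6:8);          % recover the factors
  norm(X - W*H)                             % ~0 at convergence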


In summary, initialize $Y$ then iterate numerical solution of (convex) semidefinite program

$\begin{array}{ll}\mbox{minimize}_{A\in\mathbb{S}^3,\,B\in\mathbb{S}^3,\,W\in\mathbb{R}^{3\times 2},\,H\in\mathbb{R}^{2\times 3}} & \langle Z\,, Y\rangle\\ \mbox{subject to} & Z = \begin{bmatrix} I & W^{T} & H\\ W & A & X\\ H^{T} & X^{T} & B\end{bmatrix}\succeq 0\\ & W\ge 0\\ & H\ge 0\end{array}$   (744)

with $Y = U^\star U^{\star T}$ until convergence (which is global and occurs in very few iterations for this instance).

Now, an application to optimal regulation of affine dimension:

4.4.1.2.4 Example. Sensor-Network Localization or Wireless Location.
Heuristic solution to a sensor-network localization problem, proposed by Carter, Jin, Saunders, & Ye in [72], 4.26 is limited to two Euclidean dimensions and applies semidefinite programming (SDP) to little subproblems. There, a large network is partitioned into smaller subnetworks (as small as one sensor − a mobile point, whereabouts unknown) and then semidefinite programming and heuristics called spaseloc are applied to localize each and every partition by two-dimensional distance geometry. Their partitioning procedure is one-pass, yet termed iterative; a term applicable only in so far as adjoining partitions can share localized sensors and anchors (absolute sensor positions known a priori). But there is no iteration on the entire network, hence the term "iterative" is perhaps inappropriate. As partitions are selected based on "rule sets" (heuristics, not geographics), they also term the partitioning adaptive. But no adaptation of a partition actually occurs once it has been determined.

One can reasonably argue that semidefinite programming methods are unnecessary for localization of small partitions of large sensor networks. [278] [86] In the past, these nonlinear localization problems were solved algebraically and computed by least squares solution to hyperbolic equations;

4.26 The paper constitutes Jin's dissertation for University of Toronto [216] although her name appears as second author. Ye's authorship is honorary.


called multilateration. 4.27 [228] [266] Indeed, practical contemporary numerical methods for global positioning (GPS) by satellite do not rely on convex optimization. [290]

Figure 85: Sensor-network localization in $\mathbb{R}^2$, illustrating connectivity and circular radio-range per sensor. Smaller dark grey regions each hold an anchor at their center; known fixed sensor positions. Sensor/anchor distance is measurable with negligible uncertainty for sensor within those grey regions. (Graphic by Geoff Merrett.)

Modern distance geometry is inextricably melded with semidefinite programming. The beauty of semidefinite programming, as relates to localization, lies in convex expression of classical multilateration: So & Ye showed [320] that the problem of finding unique solution, to a noiseless nonlinear system describing the common point of intersection of hyperspheres in real Euclidean vector space, can be expressed as a semidefinite program

4.27 Multilateration − literally, having many sides; shape of a geometric figure formed by nearly intersecting lines of position. In navigation systems, therefore: Obtaining a fix from multiple lines of position. Multilateration can be regarded as noisy trilateration.


via distance geometry.

But the need for SDP methods in Carter & Jin et alii is enigmatic for two more reasons: 1) guessing solution to a partition whose intersensor measurement data or connectivity is inadequate for localization by distance geometry, 2) reliance on complicated and extensive heuristics for partitioning a large network that could instead be efficiently solved whole by one semidefinite program [224, §3]. While partitions range in size between 2 and 10 sensors, 5 sensors optimally, heuristics provided are only for two spatial dimensions (no higher-dimensional heuristics are proposed). For these small numbers it remains unclarified as to precisely what advantage is gained over traditional least squares: it is difficult to determine what part of their noise performance is attributable to SDP and what part is attributable to their heuristic geometry.

Partitioning of large sensor networks is a compromise to rapid growth of SDP computational intensity with problem size. But when impact of noise on distance measurement is of most concern, one is averse to a partitioning scheme because noise-effects vary inversely with problem size. [53, §2.2] (§5.13.2) Since an individual partition's solution is not iterated in Carter & Jin and is interdependent with adjoining partitions, we expect errors to propagate from one partition to the next; the ultimate partition solved, expected to suffer most.

Heuristics often fail on real-world data because of unanticipated circumstances. When heuristics fail, generally they are repaired by adding more heuristics. Tenuous is any presumption, for example, that distance measurement errors have distribution characterized by circular contours of equal probability about an unknown sensor-location. (Figure 85) That presumption effectively appears within Carter & Jin's optimization problem statement as affine equality constraints relating unknowns to distance measurements that are corrupted by noise. Yet in almost all urban environments, this measurement noise is more aptly characterized by ellipsoids of varying orientation and eccentricity as one recedes from a sensor. (Figure 126) Each unknown sensor must therefore instead be bound to its own particular range of distance, primarily determined by the terrain. 4.28 The nonconvex problem we must instead solve is:

4.28 A distinct contour map corresponding to each anchor is required in practice.


$\begin{array}{ll}\mbox{find} & \{x_i\,, x_j\}\,,\quad i\,, j\in\mathcal{I}\\ \mbox{subject to} & \underline{d}_{ij} \le \|x_i - x_j\|^2 \le \overline{d}_{ij}\end{array}$   (745)

where $x_i$ represents sensor location, and where $\underline{d}_{ij}$ and $\overline{d}_{ij}$ respectively represent lower and upper bounds on measured distance-square from $i^{\rm th}$ to $j^{\rm th}$ sensor (or from sensor to anchor). Figure 90 illustrates contours of equal sensor-location uncertainty. By establishing these individual upper and lower bounds, orientation and eccentricity can effectively be incorporated into the problem statement.

Generally speaking, there can be no unique solution to the sensor-network localization problem because there is no unique formulation; that is the art of Optimization. Any optimal solution obtained depends on whether or how a network is partitioned, whether distance data is complete, presence of noise, and how the problem is formulated. When a particular formulation is a convex optimization problem, then the set of all optimal solutions forms a convex set containing the actual or true localization. Measurement noise precludes equality constraints representing distance. The optimal solution set is consequently expanded; necessitated by introduction of distance inequalities admitting more and higher-rank solutions. Even were the optimal solution set a single point, it is not necessarily the true localization because there is little hope of exact localization by any algorithm once significant noise is introduced.

Carter & Jin gauge performance of their heuristics to the SDP formulation of author Biswas whom they regard as vanguard to the art. [14, §1] Biswas posed localization as an optimization problem minimizing a distance measure. [47] [45] Intuitively, minimization of any distance measure yields compacted solutions; (confer §6.7.0.0.1) precisely the anomaly motivating Carter & Jin. Their two-dimensional heuristics outperformed Biswas' localizations both in execution-time and proximity to the desired result. Perhaps, instead of heuristics, Biswas' approach to localization can be improved: [44] [46].

The sensor-network localization problem is considered difficult. [14, §2] Rank constraints in optimization are considered more difficult. Control of affine dimension in Carter & Jin is suboptimal because of implicit projection on $\mathbb{R}^2$. In what follows, we present the localization problem as a semidefinite program (equivalent to (745)) having an explicit rank constraint which controls affine dimension of an optimal solution. We show how to achieve that rank constraint only if the feasible set contains a matrix of desired rank.


Figure 86: 2-lattice in $\mathbb{R}^2$, hand-drawn. Nodes 3 and 4 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc.

Our problem formulation is extensible to any spatial dimension.

proposed standardized test
Jin proposes an academic test in two-dimensional real Euclidean space $\mathbb{R}^2$ that we adopt. In essence, this test is a localization of sensors and anchors arranged in a regular triangular lattice. Lattice connectivity is solely determined by sensor radio range; a connectivity graph is assumed incomplete. In the interest of test standardization, we propose adoption of a few small examples: Figure 86 through Figure 89 and their particular connectivity represented by matrices (746) through (749) respectively.

0 • ? •
• 0 • •
? • 0 ◦
• • ◦ 0   (746)

Matrix entries dot • indicate measurable distance between nodes while unknown distance is denoted by ? (question mark). Matrix entries hollow dot ◦ represent known distance between anchors (to high accuracy) while zero distance is denoted 0. Because measured distances are quite unreliable in practice, our solution to the localization problem substitutes a distinct


range of possible distance for each measurable distance; equality constraints exist only for anchors.

Figure 87: 3-lattice in $\mathbb{R}^2$, hand-drawn. Nodes 7, 8, and 9 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc.

Anchors are chosen so as to increase difficulty for algorithms dependent on existence of sensors in their convex hull. The challenge is to find a solution in two dimensions close to the true sensor positions given incomplete noisy intersensor distance information.

0 • • ? • ? ? • •
• 0 • • ? • ? • •
• • 0 • • • • • •
? • • 0 ? • • • •
• ? • ? 0 • • • •
? • • • • 0 • • •
? ? • • • • 0 ◦ ◦
• • • • • • ◦ 0 ◦
• • • • • • ◦ ◦ 0   (747)


Figure 88: 4-lattice in $\mathbb{R}^2$, hand-drawn. Nodes 13, 14, 15, and 16 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc.

0 ? ? • ? ? • ? ? ? ? ? ? ? • •
? 0 • • • • ? • ? ? ? ? ? • • •
? • 0 ? • • ? ? • ? ? ? ? ? • •
• • ? 0 • ? • • ? • ? ? • • • •
? • • • 0 • ? • • ? • • • • • •
? • • ? • 0 ? • • ? • • ? ? ? ?
• ? ? • ? ? 0 ? ? • ? ? • • • •
? • ? • • • ? 0 • • • • • • • •
? ? • ? • • ? • 0 ? • • • ? • ?
? ? ? • ? ? • • ? 0 • ? • • • ?
? ? ? ? • • ? • • • 0 • • • • ?
? ? ? ? • • ? • • ? • 0 ? ? ? ?
? ? ? • • ? • • • • • ? 0 ◦ ◦ ◦
? • ? • • ? • • ? • • ? ◦ 0 ◦ ◦
• • • • • ? • • • • • ? ◦ ◦ 0 ◦
• • • • • ? • • ? ? ? ? ◦ ◦ ◦ 0   (748)


Figure 89: 5-lattice in $\mathbb{R}^2$. Nodes 21 through 25 are anchors.

0 • ? ? • • ? ? • ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
• 0 ? ? • • ? ? ? • ? ? ? ? ? ? ? ? ? ? ? ? ? • •
? ? 0 • ? • • • ? ? • • ? ? ? ? ? ? ? ? ? ? • • •
? ? • 0 ? ? • • ? ? ? • ? ? ? ? ? ? ? ? ? ? ? • ?
• • ? ? 0 • ? ? • • ? ? • • ? ? • ? ? ? ? ? • ? •
• • • ? • 0 • ? • • • ? ? • ? ? ? ? ? ? ? ? • • •
? ? • • ? • 0 • ? ? • • ? ? • • ? ? ? ? ? ? • • •
? ? • • ? ? • 0 ? ? • • ? ? • • ? ? ? ? ? ? ? • ?
• ? ? ? • • ? ? 0 • ? ? • • ? ? • • ? ? ? ? ? ? ?
? • ? ? • • ? ? • 0 • ? • • ? ? ? • ? ? • • • • •
? ? • ? ? • • • ? • 0 • ? • • • ? ? • ? ? • • • •
? ? • • ? ? • • ? ? • 0 ? ? • • ? ? • • ? • • • ?
? ? ? ? • ? ? ? • • ? ? 0 • ? ? • • ? ? • • ? ? ?
? ? ? ? • • ? ? • • • ? • 0 • ? • • • ? • • • • ?
? ? ? ? ? ? • • ? ? • • ? • 0 • ? ? • • • • • • ?
? ? ? ? ? ? • • ? ? • • ? ? • 0 ? ? • • ? • ? ? ?
? ? ? ? • ? ? ? • ? ? ? • • ? ? 0 • ? ? • ? ? ? ?
? ? ? ? ? ? ? ? • • ? ? • • ? ? • 0 • ? • • • ? ?
? ? ? ? ? ? ? ? ? ? • • ? • • • ? • 0 • • • • ? ?
? ? ? ? ? ? ? ? ? ? ? • ? ? • • ? ? • 0 • • ? ? ?
? ? ? ? ? ? ? ? ? • ? ? • • • ? • • • • 0 ◦ ◦ ◦ ◦
? ? ? ? ? ? ? ? ? • • • • • • • ? • • • ◦ 0 ◦ ◦ ◦
? ? • ? • • • ? ? • • • ? • • ? ? • • ? ◦ ◦ 0 ◦ ◦
? • • • ? • • • ? • • • ? • • ? ? ? ? ? ◦ ◦ ◦ 0 ◦
? • • ? • • • ? ? • • ? ? ? ? ? ? ? ? ? ◦ ◦ ◦ ◦ 0   (749)


Figure 90: Location uncertainty ellipsoid in $\mathbb{R}^2$ for each of 15 sensors • within three city blocks in downtown San Francisco. Data by Polaris Wireless. [322]

problem statement
Ascribe points in a list $\{x_l\in\mathbb{R}^n,\ l=1\ldots N\}$ to the columns of a matrix $X$;

$X = [\,x_1\ \cdots\ x_N\,]\in\mathbb{R}^{n\times N}$   (76)

where $N$ is regarded as cardinality of list $X$. Positive semidefinite matrix $X^{T}X$, formed from inner product of the list, is a Gram matrix; [250, §3.6]

$G = X^{T}X = \begin{bmatrix} \|x_1\|^2 & x_1^{T}x_2 & x_1^{T}x_3 & \cdots & x_1^{T}x_N\\ x_2^{T}x_1 & \|x_2\|^2 & x_2^{T}x_3 & \cdots & x_2^{T}x_N\\ x_3^{T}x_1 & x_3^{T}x_2 & \|x_3\|^2 & \ddots & x_3^{T}x_N\\ \vdots & \vdots & \ddots & \ddots & \vdots\\ x_N^{T}x_1 & x_N^{T}x_2 & x_N^{T}x_3 & \cdots & \|x_N\|^2 \end{bmatrix}\in\mathbb{S}^N_+$   (900)

where $\mathbb{S}^N_+$ is the convex cone of $N\times N$ positive semidefinite matrices in the symmetric matrix subspace $\mathbb{S}^N$.

Existence of noise precludes measured distance from the input data. We instead assign measured distance to a range estimate specified by individual upper and lower bounds: $\overline{d}_{ij}$ is an upper bound on distance-square from $i^{\rm th}$


to $j^{\rm th}$ sensor, while $\underline{d}_{ij}$ is a lower bound. These bounds become the input data. Each measurement range is presumed different from the others because of measurement uncertainty; e.g., Figure 90.

Our mathematical treatment of anchors and sensors is not dichotomized. 4.29 A sensor position that is known a priori to high accuracy (with absolute certainty) $\check{x}_i$ is called an anchor. Then the sensor-network localization problem (745) can be expressed equivalently: Given a number of anchors $m$, and $\mathcal{I}$ a set of indices (corresponding to all measurable distances •), for $0 < n < N$

$\begin{array}{ll}\mbox{find}_{G\in\mathbb{S}^N,\ X\in\mathbb{R}^{n\times N}} & X\\ \mbox{subject to} & \underline{d}_{ij} \le \langle G\,, (e_i - e_j)(e_i - e_j)^{T}\rangle \le \overline{d}_{ij}\quad \forall\,(i,j)\in\mathcal{I}\\ & \langle G\,, e_i e_i^{T}\rangle = \|\check{x}_i\|^2\,,\quad i = N-m+1\ldots N\\ & \langle G\,, (e_i e_j^{T} + e_j e_i^{T})/2\rangle = \check{x}_i^{T}\check{x}_j\,,\quad i < j\,,\ \forall\, i,j\in\{N-m+1\ldots N\}\\ & X(:,\,N\!-\!m\!+\!1{:}N) = [\,\check{x}_{N-m+1}\ \cdots\ \check{x}_N\,]\\ & Z = \begin{bmatrix} I & X\\ X^{T} & G\end{bmatrix}\succeq 0\\ & \operatorname{rank} Z = n\end{array}$   (750)

where $e_i$ is the $i^{\rm th}$ member of the standard basis for $\mathbb{R}^N$. Distance-square

$d_{ij} = \|x_i - x_j\|_2^2 = \langle x_i - x_j\,,\ x_i - x_j\rangle$   (887)

is related to Gram matrix entries $G\triangleq[g_{ij}]$ by vector inner-product

$d_{ij} = g_{ii} + g_{jj} - 2g_{ij} = \langle G\,, (e_i - e_j)(e_i - e_j)^{T}\rangle = \operatorname{tr}(G^{T}(e_i - e_j)(e_i - e_j)^{T})$   (902)

hence the scalar inequalities. Each linear equality constraint in $G\in\mathbb{S}^N$ represents a hyperplane in isometrically isomorphic Euclidean vector space $\mathbb{R}^{N(N+1)/2}$, while each linear inequality pair represents a convex Euclidean body known as slab (an intersection of two parallel but opposing halfspaces, Figure 11). By Schur complement (§A.4), any solution $(G\,, X)$ provides comparison with respect to the positive semidefinite cone

$G\succeq X^{T}X$   (937)

4.29 Wireless location problem thus stated identically; difference being: fewer sensors.


which is a convex relaxation of the desired equality constraint

$\begin{bmatrix} I & X\\ X^{T} & G\end{bmatrix} = \begin{bmatrix} I\\ X^{T}\end{bmatrix}\begin{bmatrix} I & X\end{bmatrix}$   (938)

The rank constraint ensures this equality holds, by Theorem A.4.0.1.3, thus restricting solution to $\mathbb{R}^n$. Assuming full-rank solution (list) $X$

$\operatorname{rank} Z = \operatorname{rank} G = \operatorname{rank} X$   (751)

convex equivalent problem statement
Problem statement (750) is nonconvex because of the rank constraint. We do not eliminate or ignore the rank constraint; rather, we find a convex way to enforce it: for $0 < n < N$

$\begin{array}{ll}\mbox{minimize}_{G\in\mathbb{S}^N,\ X\in\mathbb{R}^{n\times N}} & \langle Z\,, W\rangle\\ \mbox{subject to} & \underline{d}_{ij} \le \langle G\,, (e_i - e_j)(e_i - e_j)^{T}\rangle \le \overline{d}_{ij}\quad \forall\,(i,j)\in\mathcal{I}\\ & \langle G\,, e_i e_i^{T}\rangle = \|\check{x}_i\|^2\,,\quad i = N-m+1\ldots N\\ & \langle G\,, (e_i e_j^{T} + e_j e_i^{T})/2\rangle = \check{x}_i^{T}\check{x}_j\,,\quad i < j\,,\ \forall\, i,j\in\{N-m+1\ldots N\}\\ & X(:,\,N\!-\!m\!+\!1{:}N) = [\,\check{x}_{N-m+1}\ \cdots\ \check{x}_N\,]\\ & Z = \begin{bmatrix} I & X\\ X^{T} & G\end{bmatrix}\succeq 0\end{array}$   (752)

Convex function $\operatorname{tr} Z$ is a well-known heuristic whose sole purpose is to represent convex envelope of $\operatorname{rank} Z$. (§7.2.2.1) In this convex optimization problem (752), a semidefinite program, we substitute a vector inner-product objective function for trace;

$\operatorname{tr} Z = \langle Z\,, I\rangle\ \leftarrow\ \langle Z\,, W\rangle$   (753)

a generalization of the trace heuristic for minimizing convex envelope of rank, where $W\in\mathbb{S}^{N+n}_+$ is constant with respect to (752). Matrix $W$ is normal to a hyperplane in $\mathbb{S}^{N+n}$ minimized over a convex feasible set specified by the constraints in (752). Matrix $W$ is chosen so $-W$ points in direction of rank-$n$ feasible solutions $G$. For properly chosen $W$, problem (752) becomes equivalent to (750). Thus the purpose of vector inner-product objective (753) is to locate a rank-$n$ feasible Gram matrix assumed existent on the boundary of positive semidefinite cone $\mathbb{S}^N_+$, as explained beginning in §4.4.1; how to choose direction vector $W$ is explained there and in what follows:
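A rough Matlab/CVX sketch of that iteration (752) (1700a), using the closed-form direction matrix of §4.4.1.1: the geometry below is synthetic (random points with ±10% distance-square bounds and all pairs measurable), not one of the standardized lattices, and serves only to illustrate the construction.

  rng(1);  n = 2;  N = 8;  m = 3;                  % 5 sensors, 3 anchors (last m columns)
  P  = rand(n,N);                                  % true positions (unknown to the solver)
  D  = sum(P.^2,1)' + sum(P.^2,1) - 2*(P'*P);      % true distance-square matrix
  dl = 0.81*D;  du = 1.21*D;                       % lower/upper bounds (cf. (755))
  W  = eye(N+n);                                   % initialize with the trace heuristic
  for iter = 1:30
      cvx_begin quiet
          variable Z(N+n,N+n) semidefinite
          X = Z(1:n,n+1:N+n);  G = Z(n+1:N+n,n+1:N+n);
          minimize( trace(W*Z) )
          subject to
              Z(1:n,1:n) == eye(n);
              X(:,N-m+1:N) == P(:,N-m+1:N);        % anchor positions known
              G(N-m+1:N,N-m+1:N) == P(:,N-m+1:N)'*P(:,N-m+1:N);
              for i = 1:N
                  for j = i+1:N                    % here the index set I is all pairs
                      dl(i,j) <= G(i,i)+G(j,j)-2*G(i,j);   % (902), bounded below
                      G(i,i)+G(j,j)-2*G(i,j) <= du(i,j);   % and above
                  end
              end
      cvx_end
      [Q,L] = eig(Z);  [lam,k] = sort(diag(L),'descend');
      U = Q(:,k(n+1:end));  W = U*U';              % direction matrix (1700a), closed form
      if sum(lam(n+1:end)) < 1e-8, break, end      % rank Z = n achieved
  end
  Xhat = Z(1:n,n+1:N+n);                           % localized sensor (and anchor) positions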


Figure 91: Typical solution for 2-lattice in Figure 86 with noise factor η = 0.1. Two red rightmost nodes are anchors; two remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 1.14. Actual sensor indicated by target while its localization is indicated by bullet •. Rank-2 solution found in 1 iteration (752) (1700a) subject to reflection error.

Figure 92: Typical solution for 3-lattice in Figure 87 with noise factor η = 0.1. Three red vertical middle nodes are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 1.12. Actual sensor indicated by target while its localization is indicated by bullet •. Rank-2 solution found in 2 iterations (752) (1700a).


Figure 93: Typical solution for 4-lattice in Figure 88 with noise factor η = 0.1. Four red vertical middle-left nodes are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 0.75. Actual sensor indicated by target while its localization is indicated by bullet •. Rank-2 solution found in 7 iterations (752) (1700a).

Figure 94: Typical solution for 5-lattice in Figure 89 with noise factor η = 0.1. Five red vertical middle nodes are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 0.56. Actual sensor indicated by target while its localization is indicated by bullet •. Rank-2 solution found in 3 iterations (752) (1700a).


Figure 95: Typical solution for 10-lattice with noise factor η = 0.1 compares better than Carter & Jin [72, fig.4.2]. Ten red vertical middle nodes are anchors; the rest are sensors. Radio range of sensor 1 indicated by arc; radius = 0.25. Actual sensor indicated by target while its localization is indicated by bullet •. Rank-2 solution found in 5 iterations (752) (1700a).

Figure 96: Typical localization of 100 randomized noiseless sensors (η = 0) is exact despite incomplete EDM. Ten red vertical middle nodes are anchors; remaining nodes are sensors. Radio range of sensor at origin indicated by arc; radius = 0.25. Actual sensor indicated by target while its localization is indicated by bullet •. Rank-2 solution found in 3 iterations (752) (1700a).


Figure 97: Typical solution for 100 randomized sensors with noise factor η = 0.1; worst measured average sensor error ≈ 0.0044 compares better than Carter & Jin's 0.0154 computed in 0.71s [72, p.19]. Ten red vertical middle nodes are anchors; same as before. Remaining nodes are sensors. Interior anchor placement makes localization difficult. Radio range of sensor at origin indicated by arc; radius = 0.25. Actual sensor indicated by target while its localization is indicated by bullet •. After 1 iteration rank G = 92, after 2 iterations rank G = 4. Rank-2 solution found in 3 iterations (752) (1700a). (Regular lattice in Figure 95 is actually harder to solve, requiring more iterations.) Runtime for SDPT3 [348] under cvx [167] is a few minutes on 2009 vintage laptop Core 2 Duo CPU (Intel T6400@2GHz, 800MHz FSB).


direction matrix W
Denote by $Z^\star$ an optimal composite matrix from semidefinite program (752). Then for $Z^\star\in\mathbb{S}^{N+n}$ whose eigenvalues $\lambda(Z^\star)\in\mathbb{R}^{N+n}$ are arranged in nonincreasing order, (Ky Fan)

$\sum_{i=n+1}^{N+n}\lambda(Z^\star)_i \;=\; \begin{array}[t]{ll}\mbox{minimize}_{W\in\mathbb{S}^{N+n}} & \langle Z^\star, W\rangle\\ \mbox{subject to} & 0\preceq W\preceq I\\ & \operatorname{tr} W = N\end{array}$   (1700a)

which has an optimal solution that is known in closed form (p.658, §4.4.1.1). This eigenvalue sum is zero when $Z^\star$ has rank $n$ or less.

Foreknowledge of optimal $Z^\star$, to make possible this search for $W$, implies iteration; id est, semidefinite program (752) is solved for $Z^\star$ initializing $W = I$ or $W = 0$. Once found, $Z^\star$ becomes constant in semidefinite program (1700a) where a new normal direction $W$ is found as its optimal solution. Then this cycle (752) (1700a) iterates until convergence. When $\operatorname{rank} Z^\star = n$, solution via this convex iteration solves sensor-network localization problem (745) and its equivalent (750).

numerical solution
In all examples to follow, number of anchors

$m = \sqrt{N}$   (754)

equals square root of cardinality $N$ of list $X$. Indices set $\mathcal{I}$ identifying all measurable distances • is ascertained from connectivity matrix (746), (747), (748), or (749). We solve iteration (752) (1700a) in dimension $n = 2$ for each respective example illustrated in Figure 86 through Figure 89.

In presence of negligible noise, true position is reliably localized for every standardized example; noteworthy in so far as each example represents an incomplete graph. This implies that the set of all optimal solutions having lowest rank must be small.

To make the examples interesting and consistent with previous work, we randomize each range of distance-square that bounds $\langle G\,, (e_i - e_j)(e_i - e_j)^{T}\rangle$ in (752); id est, for each and every $(i,j)\in\mathcal{I}$

$\overline{d}_{ij} = d_{ij}\bigl(1 + \sqrt{3}\,\eta\,\chi_l\bigr)^2\,,\qquad \underline{d}_{ij} = d_{ij}\bigl(1 - \sqrt{3}\,\eta\,\chi_{l+1}\bigr)^2$   (755)


where $\eta = 0.1$ is a constant noise factor, $\chi_l$ is the $l^{\rm th}$ sample of a noise process realization uniformly distributed in the interval $(0,1)$ like rand(1) from Matlab, and $d_{ij}$ is actual distance-square from $i^{\rm th}$ to $j^{\rm th}$ sensor. Because of distinct function calls to rand(), each range of distance-square $[\,\underline{d}_{ij}\,,\ \overline{d}_{ij}\,]$ is not necessarily centered on actual distance-square $d_{ij}$. Unit stochastic variance is provided by factor $\sqrt{3}$.

Figure 91 through Figure 94 each illustrate one realization of numerical solution to the standardized lattice problems posed by Figure 86 through Figure 89 respectively. Exact localization, by any method, is impossible because of measurement noise. Certainly, by inspection of their published graphical data, our results are better than those of Carter & Jin. (Figure 95, 96, 97) Obviously our solutions do not suffer from those compaction-type errors (clustering of localized sensors) exhibited by Biswas' graphical results for the same noise factor $\eta$.

localization example conclusion
Solution to this sensor-network localization problem became apparent by understanding geometry of optimization. Trace of a matrix, to a student of linear algebra, is perhaps a sum of eigenvalues. But to us, trace represents the normal $I$ to some hyperplane in Euclidean vector space. (Figure 84) Our solutions are globally optimal, requiring: 1) no centralized-gradient postprocessing heuristic refinement as in [44] because there is effectively no relaxation of (750) at optimality, 2) no implicit postprojection on rank-2 positive semidefinite matrices induced by nonzero $G - X^{T}X$ denoting suboptimality as occurs in [45] [46] [47] [72] [216] [224]; indeed, $G^\star = X^{\star T}X^\star$ by convex iteration.

Numerical solution to noisy problems, containing sensor variables well in excess of 100, becomes difficult via the holistic semidefinite program we proposed. When problem size is within reach of contemporary general-purpose semidefinite program solvers, then the convex iteration we presented inherently overcomes limitations of Carter & Jin with respect to both noise performance and ability to localize in any desired affine dimension.

The legacy of Carter, Jin, Saunders, & Ye [72] is a sobering demonstration of the need for more efficient methods for solution of semidefinite programs, while that of So & Ye [320] forever bonds distance geometry to semidefinite programming. Elegance of our semidefinite problem statement (752), for constraining affine dimension of sensor-network localization, should


provide some impetus to focus more research on computational intensity of general-purpose semidefinite program solvers. An approach different from interior-point methods is required; higher speed and greater accuracy from a simplex-like solver is what is needed.

Figure 98: Regularization curve, parametrized by weight $w$ for real convex objective $f$ minimization (756) with rank constraint to $k$ by convex iteration. (Axis labels in the drawing: $\langle Z\,, W\rangle$, rank $Z$, $w_c$, $k$, $0$, $w$, $f(Z)$.)

4.4.1.3 regularization
We numerically tested the foregoing technique for constraining rank on a wide range of problems including localization of randomized positions (Figure 97), stress (§7.2.2.7.1), ball packing (§5.4.2.3.4), and cardinality problems (§4.6). We have had some success introducing the direction matrix inner-product (753) as a regularization term 4.30

$\begin{array}{ll}\mbox{minimize}_{Z\in\mathbb{S}^N} & f(Z) + w\langle Z\,, W\rangle\\ \mbox{subject to} & Z\in\mathcal{C}\\ & Z\succeq 0\end{array}$   (756)

4.30 called multiobjective- or vector optimization. Proof of convergence for this convex iteration is identical to that in §4.4.1.2.1 because $f$ is a convex real function, hence bounded below, and $f(Z^\star)$ is constant in (757).


$\begin{array}{ll}\mbox{minimize}_{W\in\mathbb{S}^N} & f(Z^\star) + w\langle Z^\star, W\rangle\\ \mbox{subject to} & 0\preceq W\preceq I\\ & \operatorname{tr} W = N-n\end{array}$   (757)

whose purpose is to constrain rank, affine dimension, or cardinality:

The abstraction, that is Figure 98, is a synopsis; a broad generalization of accumulated empirical evidence: There exists a critical (smallest) weight $w_c$ for which a minimal-rank constraint is met. Graphical discontinuity can subsequently exist when there is a range of greater $w$ providing required rank $k$ but not necessarily increasing a minimization objective function $f$; e.g., Example 4.6.0.0.2. Positive scalar $w$ is well chosen by cut-and-try.

4.5 Constraining cardinality

The convex iteration technique for constraining rank can be applied to cardinality problems. There are parallels in its development analogous to how prototypical semidefinite program (649) resembles linear program (302) on page 275 [382]:

4.5.1 nonnegative variable

Our goal has been to reliably constrain rank in a semidefinite program. There is a direct analogy to linear programming that is simpler to present but, perhaps, more difficult to solve. In Optimization, that analogy is known as the cardinality problem.

Consider a feasibility problem $Ax = b$, but with an upper bound $k$ on cardinality $\|x\|_0$ of a nonnegative solution $x$: for $A\in\mathbb{R}^{m\times n}$ and vector $b\in\mathcal{R}(A)$

$\begin{array}{ll}\mbox{find} & x\in\mathbb{R}^n\\ \mbox{subject to} & Ax = b\\ & x\succeq 0\\ & \|x\|_0\le k\end{array}$   (530)

where $\|x\|_0\le k$ means 4.31 vector $x$ has at most $k$ nonzero entries; such a vector is presumed existent in the feasible set. Nonnegativity constraint

4.31 Although it is a metric (§5.2), cardinality $\|x\|_0$ cannot be a norm (§3.2) because it is not positively homogeneous.


$x\succeq 0$ is analogous to positive semidefiniteness; the notation means vector $x$ belongs to the nonnegative orthant $\mathbb{R}^n_+$. Cardinality is quasiconcave on $\mathbb{R}^n_+$ just as rank is quasiconcave on $\mathbb{S}^n_+$. [61, §3.4.2]

4.5.1.1 direction vector
We propose that cardinality-constrained feasibility problem (530) can be equivalently expressed as a sequence of convex problems: for $0\le k\le n-1$

$\begin{array}{ll}\mbox{minimize}_{x\in\mathbb{R}^n} & \langle x\,, y\rangle\\ \mbox{subject to} & Ax = b\\ & x\succeq 0\end{array}$   (156)

$\sum_{i=k+1}^{n}\pi(x^\star)_i \;=\; \begin{array}[t]{ll}\mbox{minimize}_{y\in\mathbb{R}^n} & \langle x^\star, y\rangle\\ \mbox{subject to} & 0\preceq y\preceq\mathbf{1}\\ & y^{T}\mathbf{1} = n-k\end{array}$   (525)

where $\pi$ is the (nonincreasing) presorting function. This sequence is iterated until $x^{\star T}y^\star$ vanishes; id est, until desired cardinality is achieved. But this global convergence cannot always be guaranteed. 4.32

Problem (525) is analogous to the rank constraint problem; (p.308)

$\sum_{i=k+1}^{N}\lambda(G^\star)_i \;=\; \begin{array}[t]{ll}\mbox{minimize}_{W\in\mathbb{S}^N} & \langle G^\star, W\rangle\\ \mbox{subject to} & 0\preceq W\preceq I\\ & \operatorname{tr} W = N-k\end{array}$   (1700a)

Linear program (525) sums the $n-k$ smallest entries from vector $x$. In context of problem (530), we want $n-k$ entries of $x$ to sum to zero; id est, we want a globally optimal objective $x^{\star T}y^\star$ to vanish: more generally,

$\sum_{i=k+1}^{n}\pi(|x^\star|)_i = \langle |x^\star|\,, y^\star\rangle \triangleq 0$   (758)

defines global convergence for the iteration. Then $n-k$ entries of $x^\star$ are themselves zero whenever their absolute sum is, and cardinality of $x^\star\in\mathbb{R}^n$ is

4.32 When it succeeds, a sequence may be regarded as a homotopy to minimal 0-norm.


at most $k$. Optimal direction vector $y^\star$ is defined as any nonnegative vector for which

$\begin{array}{ll}\mbox{find} & x\in\mathbb{R}^n\\ \mbox{subject to} & Ax = b\\ & x\succeq 0\\ & \|x\|_0\le k\end{array}\ \ (530)\qquad\equiv\qquad \begin{array}{ll}\mbox{minimize}_{x\in\mathbb{R}^n} & \langle x\,, y^\star\rangle\\ \mbox{subject to} & Ax = b\\ & x\succeq 0\end{array}\ \ (156)$

Existence of such a $y^\star$, whose nonzero entries are complementary to those of $x^\star$, is obvious assuming existence of a cardinality-$k$ solution $x^\star$.

Figure 99: (confer Figure 84) 1-norm heuristic for cardinality minimization can be interpreted as minimization of a hyperplane, $\partial\mathcal{H}$ with normal $\mathbf{1}$, over nonnegative orthant drawn here in $\mathbb{R}^3$. Polar of direction vector $y = \mathbf{1}$ points toward origin. (Labels in the drawing: $0$, $\mathbb{R}^3_+$, $\partial\mathcal{H} = \{x \mid \langle x, \mathbf{1}\rangle = \kappa\}$.)

4.5.1.2 direction vector interpretation
(confer §4.4.1.1) Vector $y$ may be interpreted as a negative search direction; it points opposite to direction of movement of hyperplane $\{x \mid \langle x\,, y\rangle = \tau\}$ in a minimization of real linear function $\langle x\,, y\rangle$ over the feasible set in linear program (156). (p.75) Direction vector $y$ is not unique. The feasible set of direction vectors in (525) is the convex hull of all cardinality-$(n-k)$ one-vectors; videlicet,


$\operatorname{conv}\{u\in\mathbb{R}^n \mid \operatorname{card} u = n-k\,,\ u_i\in\{0,1\}\} = \{a\in\mathbb{R}^n \mid \mathbf{1}\succeq a\succeq 0\,,\ \langle\mathbf{1}\,, a\rangle = n-k\}$   (759)

This set, argument to $\operatorname{conv}\{\ \}$, comprises the extreme points of set (759) which is a nonnegative hypercube slice. An optimal solution $y$ to (525), that is an extreme point of its feasible set, is known in closed form: it has 1 in each entry corresponding to the $n-k$ smallest entries of $x^\star$ and has 0 elsewhere. That particular polar direction $-y$ can be interpreted 4.33 by Proposition 7.1.3.0.3 as pointing toward the nonnegative orthant in the Cartesian subspace, whose basis is a subset of the Cartesian axes, containing all cardinality $k$ (or less) vectors having the same ordering as $x^\star$. Consequently, for that closed-form solution, (confer (738))

$\sum_{i=k+1}^{n}\pi(|x^\star|)_i = \langle |x^\star|\,, y\rangle = |x^\star|^{T}y$   (760)

When $y = \mathbf{1}$, as in 1-norm minimization for example, then polar direction $-y$ points directly at the origin (the cardinality-0 nonnegative vector) as in Figure 99. We sometimes solve (525) instead of employing a known closed form because a direction vector is not unique. Setting direction vector $y$ instead in accordance with an iterative inverse weighting scheme, called reweighting [161], was described for the 1-norm by Huo [207, §4.11.3] in 1999.

4.5.1.3 convergence can mean stalling
Convex iteration (156) (525) always converges to a locally optimal solution, a fixed point of possibly infeasible cardinality, by virtue of a monotonically nonincreasing real objective sequence. [258, §1.2] [42, §1.1] There can be no proof of global convergence, defined by (758). Constraining cardinality, solution to problem (530), can often be achieved but simple examples can be contrived that stall at a fixed point of infeasible cardinality; at a positive objective value $\langle x^\star, y\rangle = \tau > 0$. Direction vector $y$ is then manipulated, as countermeasure, to steer out of local minima; e.g., complete randomization as in Example 4.5.1.5.1, or reinitialization to a random cardinality-$(n-k)$ vector in the same nonnegative orthant face demanded by the current

4.33 Convex iteration (156) (525) is not a projection method because there is no thresholding or discard of variable-vector $x$ entries. An optimal direction vector $y$ must always reside on the feasible set boundary in (525) page 334; id est, it is ill-advised to attempt simultaneous optimization of variables $x$ and $y$.


iterate: $y$ has nonnegative uniformly distributed random entries in $(0,1]$ corresponding to the $n-k$ smallest entries of $x^\star$ and has 0 elsewhere. When this particular heuristic is successful, cardinality versus iteration is characterized by noisy monotonicity. Zero entries behave like memory while randomness greatly diminishes likelihood of a stall.

4.5.1.4 algebraic derivation of direction vector for convex iteration
In §3.2.2.1.2, the compressed sensing problem was precisely represented as a nonconvex difference of convex functions bounded below by 0

$\begin{array}{ll}\mbox{find} & x\in\mathbb{R}^n\\ \mbox{subject to} & Ax = b\\ & x\succeq 0\\ & \|x\|_0\le k\end{array}\ \ (530)\qquad\equiv\qquad \begin{array}{ll}\mbox{minimize}_{x\in\mathbb{R}^n} & \|x\|_1 - \|x\|_{n\,k}\\ \mbox{subject to} & Ax = b\\ & x\succeq 0\end{array}$

where convex $k$-largest norm $\|x\|_{n\,k}$ is monotonic on $\mathbb{R}^n_+$. There we showed how (530) is equivalently stated in terms of gradients

$\begin{array}{ll}\mbox{minimize}_{x\in\mathbb{R}^n} & \bigl\langle x\,,\ \nabla\|x\|_1 - \nabla\|x\|_{n\,k}\bigr\rangle\\ \mbox{subject to} & Ax = b\\ & x\succeq 0\end{array}$   (761)

because

$\|x\|_1 = x^{T}\nabla\|x\|_1\,,\qquad \|x\|_{n\,k} = x^{T}\nabla\|x\|_{n\,k}\,,\qquad x\succeq 0$   (762)

The objective function from (761) is a directional derivative (at $x$ in direction $x$, §D.1.6, confer §D.1.4.1.1) of the objective function from (530) while the direction vector of convex iteration

$y = \nabla\|x\|_1 - \nabla\|x\|_{n\,k}$   (763)

is an objective gradient where $\nabla\|x\|_1 = \mathbf{1}$ under nonnegativity and

$\nabla\|x\|_{n\,k} = \arg\ \begin{array}[t]{ll}\mbox{maximize}_{z\in\mathbb{R}^n} & z^{T}x\\ \mbox{subject to} & 0\preceq z\preceq\mathbf{1}\\ & z^{T}\mathbf{1} = k\end{array}\,,\qquad x\succeq 0$   (533)


Substituting $1 - z \leftarrow z$ the direction vector becomes

$y = \mathbf{1} - \arg\ \begin{array}[t]{ll}\mbox{maximize}_{z\in\mathbb{R}^n} & z^{T}x\\ \mbox{subject to} & 0\preceq z\preceq\mathbf{1}\\ & z^{T}\mathbf{1} = k\end{array}\quad\leftarrow\quad \arg\ \begin{array}[t]{ll}\mbox{minimize}_{z\in\mathbb{R}^n} & z^{T}x\\ \mbox{subject to} & 0\preceq z\preceq\mathbf{1}\\ & z^{T}\mathbf{1} = n-k\end{array}$   (525)

4.5.1.5 optimality conditions for compressed sensing
Now we see how global optimality conditions can be stated without reference to a dual problem: From conditions (469) for optimality of (530), it is necessary [61, §5.5.3] that

$x^\star\succeq 0$   (1)
$Ax^\star = b$   (2)
$\nabla\|x^\star\|_1 - \nabla\|x^\star\|_{n\,k} + A^{T}\nu^\star \succeq 0$   (3)
$\langle \nabla\|x^\star\|_1 - \nabla\|x^\star\|_{n\,k} + A^{T}\nu^\star\,,\ x^\star\rangle = 0$   (4l)
(764)

These conditions must hold at any optimal solution (locally or globally). By (762), the fourth condition is identical to

$\|x^\star\|_1 - \|x^\star\|_{n\,k} + \nu^{\star T}Ax^\star = 0$   (4l) (765)

Because a 1-norm

$\|x\|_1 = \|x\|_{n\,k} + \|\pi(|x|)_{k+1:n}\|_1$   (766)

is separable into $k$ largest and $n-k$ smallest absolute entries,

$\|\pi(|x|)_{k+1:n}\|_1 = 0 \ \Leftrightarrow\ \|x\|_0\le k$   (4g) (767)

is a necessary condition for global optimality. By assumption, matrix $A$ is fat and $b\neq 0 \Rightarrow Ax^\star\neq 0$. This means $\nu^\star\in\mathcal{N}(A^{T})\subset\mathbb{R}^m$, and $\nu^\star = 0$ when $A$ is full-rank. By definition, $\nabla\|x\|_1\succeq\nabla\|x\|_{n\,k}$ always holds. Assuming existence of a cardinality-$k$ solution, then only three of the four conditions are necessary and sufficient for global optimality of (530):

$x^\star\succeq 0$   (1)
$Ax^\star = b$   (2)
$\|x^\star\|_1 - \|x^\star\|_{n\,k} = 0$   (4g)
(768)

meaning, global optimality of a feasible solution to (530) is identified by a zero objective.
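A minimal Matlab sketch of convex iteration (156) (525), assuming linprog for the LP step; the data are those of (682) in the example that follows, for which the sparsest nonnegative solution is $e_4$. The stall detector and the randomized restart are simplifications of the countermeasures described in §4.5.1.3:

  A = [-1 1 8 1   1   0;
       -3 2 8 1/2 1/3 1/2-1/3;
       -9 4 8 1/4 1/9 1/4-1/9];                      % data (682)
  b = [1; 1/2; 1/4];
  [m,n] = size(A);  k = 1;                           % desired cardinality
  y = ones(n,1);                                     % y = 1: start as the 1-norm heuristic
  prev = inf;
  for iter = 1:100
      x = linprog(y, [],[], A, b, zeros(n,1), []);   % problem (156)
      [xs,idx] = sort(x,'descend');
      tail = sum(xs(k+1:n));                         % <x,y*> for the closed-form y*
      if tail < 1e-9, break, end                     % (758): desired cardinality achieved
      y = zeros(n,1);  y(idx(k+1:n)) = 1;            % closed-form solution of (525)
      if abs(tail - prev) < 1e-12, y = rand(n,1); end% stall detected: randomize y
      prev = tail;
  end
  x.'                                                % ideally ~ e4 (cardinality 1)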


Figure 100: For Gaussian random matrix $A\in\mathbb{R}^{m\times n}$, graph illustrates Donoho/Tanner least lower bound on number of measurements $m$ below which recovery of $k$-sparse $n$-length signal $x$ by linear programming fails with overwhelming probability. Hard problems are below curve, but not the reverse; id est, failure above depends on proximity. Inequality demarcates approximation (dashed curve) empirically observed in [23]. Problems having nonnegativity constraint (dotted) are easier to solve. [122] [123] (Axes: $m/k$ versus $k/n$; in-plot labels: Donoho bound; approximation $m > k\log_2(1+n/k)$; the 1-norm problem (518); and its nonnegatively constrained counterpart (523).)


4.5.1.5.1 Example. Sparsest solution to $Ax = b$. [70] [118]
(confer Example 4.5.1.8.1) Problem (682) has sparsest solution not easily recoverable by least 1-norm; id est, not by compressed sensing because of proximity to a theoretical lower bound on number of measurements $m$ depicted in Figure 100: for $A\in\mathbb{R}^{m\times n}$

Given data from Example 4.2.3.1.1, for $m=3$, $n=6$, $k=1$

$A = \begin{bmatrix} -1 & 1 & 8 & 1 & 1 & 0\\ -3 & 2 & 8 & \frac{1}{2} & \frac{1}{3} & \frac{1}{2}-\frac{1}{3}\\ -9 & 4 & 8 & \frac{1}{4} & \frac{1}{9} & \frac{1}{4}-\frac{1}{9} \end{bmatrix}\,,\qquad b = \begin{bmatrix} 1\\ \frac{1}{2}\\ \frac{1}{4} \end{bmatrix}$   (682)

the sparsest solution to classical linear equation $Ax = b$ is $x = e_4\in\mathbb{R}^6$ (confer (695)).

Although the sparsest solution is recoverable by inspection, we discern it instead by convex iteration; namely, by iterating problem sequence (156) (525) on page 334. From the numerical data given, cardinality $\|x\|_0 = 1$ is expected. Iteration continues until $x^{T}y$ vanishes (to within some numerical precision); id est, until desired cardinality is achieved. But this comes not without a stall.

Stalling, whose occurrence is sensitive to initial conditions of convex iteration, is a consequence of finding a local minimum of a multimodal objective $\langle x\,, y\rangle$ when regarded as simultaneously variable in $x$ and $y$. (§3.8.0.0.3) Stalls are simply detected as fixed points $x$ of infeasible cardinality, sometimes remedied by reinitializing direction vector $y$ to a random positive state.

Bolstered by success in breaking out of a stall, we then apply convex iteration to 22,000 randomized problems:

Given random data for $m=3$, $n=6$, $k=1$, in Matlab notation

A=randn(3,6), index=round(5*rand(1))+1, b=rand(1)*A(:,index)   (769)

the sparsest solution $x\propto e_{\rm index}$ is a scaled standard basis vector.

Without convex iteration or a nonnegativity constraint $x\succeq 0$, rate of failure for this minimal cardinality problem $Ax=b$ by 1-norm minimization of $x$ is 22%. That failure rate drops to 6% with a nonnegativity constraint. If


we then engage convex iteration, detect stalls, and randomly reinitialize the direction vector, failure rate drops to 0% but the amount of computation is approximately doubled.

Stalling is not an inevitable behavior. For some problem types (beyond mere $Ax = b$), convex iteration succeeds nearly all the time. Here is a cardinality problem, with noise, whose statement is just a bit more intricate but easy to solve in a few convex iterations:

4.5.1.5.2 Example. Signal dropout. [121, §6.2]
Signal dropout is an old problem; well studied from both an industrial and academic perspective. Essentially dropout means momentary loss or gap in a signal, while passing through some channel, caused by some man-made or natural phenomenon. The signal lost is assumed completely destroyed somehow. What remains within the time-gap is system or idle channel noise. The signal could be voice over Internet protocol (VoIP), for example, audio data from a compact disc (CD) or video data from a digital video disc (DVD), a television transmission over cable or the airwaves, or a typically ravaged cell phone communication, etcetera.

Here we consider signal dropout in a discrete-time signal corrupted by additive white noise assumed uncorrelated to the signal. The linear channel is assumed to introduce no filtering. We create a discretized windowed signal for this example by positively combining $k$ randomly chosen vectors from a discrete cosine transform (DCT) basis denoted $\Psi\in\mathbb{R}^{n\times n}$. Frequency increases, in the Fourier sense, from DC toward Nyquist as column index of basis $\Psi$ increases. Otherwise, details of the basis are unimportant except for its orthogonality $\Psi^{T} = \Psi^{-1}$. Transmitted signal is denoted

$s = \Psi z \in \mathbb{R}^n$   (770)

whose upper bound on DCT basis coefficient cardinality $\operatorname{card} z\le k$ is assumed known; 4.34 hence a critical assumption: transmitted signal $s$ is sparsely supported ($k < n$) on the DCT basis. It is further assumed that nonzero signal coefficients in vector $z$ place each chosen basis vector above the noise floor.

4.34 This simplifies exposition, although it may be an unrealistic assumption in many applications.


We also assume that the gap's beginning and ending in time are precisely localized to within a sample; id est, index $l$ locates the last sample prior to the gap's onset, while index $n-l+1$ locates the first sample subsequent to the gap: for rectangularly windowed received signal $g$ possessing a time-gap loss and additive noise $\eta\in\mathbb{R}^n$

$g = \begin{bmatrix} s_{1:l} + \eta_{1:l}\\ \eta_{l+1:n-l}\\ s_{n-l+1:n} + \eta_{n-l+1:n} \end{bmatrix}\in\mathbb{R}^n$   (771)

The window is thereby centered on the gap and short enough so that the DCT spectrum of signal $s$ can be assumed static over the window's duration $n$. Signal to noise ratio within this window is defined

$\mathrm{SNR} \triangleq 20\log\dfrac{\left\|\begin{bmatrix} s_{1:l}\\ s_{n-l+1:n}\end{bmatrix}\right\|}{\|\eta\|}$   (772)

In absence of noise, knowing the signal DCT basis and having a good estimate of basis coefficient cardinality makes perfectly reconstructing gap-loss easy: it amounts to solving a linear system of equations and requires little or no optimization; with caveat, number of equations exceeds cardinality of signal representation (roughly $l\ge k$) with respect to DCT basis.

But addition of a significant amount of noise $\eta$ increases level of difficulty dramatically; a 1-norm based method of reducing cardinality, for example, almost always returns DCT basis coefficients numbering in excess of minimal cardinality. We speculate that is because signal cardinality $2l$ becomes the predominant cardinality. DCT basis coefficient cardinality is an explicit constraint to the optimization problem we shall pose: In presence of noise, constraints equating reconstructed signal $f$ to received signal $g$ are not possible. We can instead formulate the dropout recovery problem as a best approximation:

$\begin{array}{ll}\mbox{minimize}_{x\in\mathbb{R}^n} & \left\|\begin{bmatrix} f_{1:l} - g_{1:l}\\ f_{n-l+1:n} - g_{n-l+1:n}\end{bmatrix}\right\|\\ \mbox{subject to} & f = \Psi x\\ & x\succeq 0\\ & \operatorname{card} x\le k\end{array}$   (773)


4.5. CONSTRAINING CARDINALITY 343

[Figure 101 appears here: two panels of amplitude versus sample index (0 to 500). Panel (a), titled "flatline and g", shows s + η, η, and the dropout region (s = 0); panel (b), titled "f and g", shows the reconstruction f.]

Figure 101: (a) Signal dropout in signal s corrupted by noise η (SNR = 10dB, g = s + η). Flatline indicates duration of signal dropout. (b) Reconstructed signal f (red) overlaid with corrupted signal g.


344 CHAPTER 4. SEMIDEFINITE PROGRAMMING

[Figure 102 appears here: two panels of amplitude versus sample index (0 to 500). Panel (a) is titled "f − s"; panel (b) is titled "f and s".]

Figure 102: (a) Error signal power (reconstruction f less original noiseless signal s) is 36dB below s. (b) Original signal s overlaid with reconstruction f (red) from signal g having dropout plus noise.


4.5. CONSTRAINING CARDINALITY 345

We propose solving this nonconvex problem (773) by moving the cardinality constraint to the objective as a regularization term as explained in 4.5; id est, by iteration of two convex problems until convergence:

$$
\begin{array}{cl}
\underset{x\in\mathbb{R}^n}{\text{minimize}} & \langle x\,,\,y\rangle \;+\; \left\|\begin{bmatrix} f_{1:l}-g_{1:l}\\ f_{n-l+1:n}-g_{n-l+1:n} \end{bmatrix}\right\|\\
\text{subject to} & f = \Psi x\\
& x \succeq 0
\end{array} \tag{774}
$$

and

$$
\begin{array}{cl}
\underset{y\in\mathbb{R}^n}{\text{minimize}} & \langle x^\star,\,y\rangle\\
\text{subject to} & 0 \preceq y \preceq \mathbf{1}\\
& y^{\mathrm{T}}\mathbf{1} = n-k
\end{array} \tag{525}
$$

Signal cardinality 2l is implicit to the problem statement. When number of samples in the dropout region exceeds half the window size, then that deficient cardinality of signal remaining becomes a source of degradation to reconstruction in presence of noise. Thus, by observation, we divine a reconstruction rule for this signal dropout problem to attain good noise suppression: l must exceed a maximum of cardinality bounds; 2l ≥ max{2k, n/2}.

Figure 101 and Figure 102 show one realization of this dropout problem. Original signal s is created by adding four (k = 4) randomly selected DCT basis vectors, from Ψ (n = 500 in this example), whose amplitudes are randomly selected from a uniform distribution above the noise floor; in the interval [10^{−10/20}, 1]. Then a 240-sample dropout is realized (l = 130) and Gaussian noise η added to make corrupted signal g (from which a best approximation f will be made) having 10dB signal to noise ratio (772). The time gap contains much noise, as apparent from Figure 101a. But in only a few iterations (774) (525), original signal s is recovered with relative error power 36dB down; illustrated in Figure 102. Correct cardinality is also recovered (card x = card z) along with the basis vector indices used to make original signal s. Approximation error is due to DCT basis coefficient estimate error. When this experiment is repeated 1000 times on noisy signals averaging 10dB SNR, the correct cardinality and indices are recovered 99% of the time with average relative error power 30dB down. Without noise, we get perfect reconstruction in one iteration. [Matlab code: Wıκımization]
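A compact CVX-based sketch of this iteration follows (an illustrative outline, not the Wıκımization implementation; it assumes CVX is installed and that Psi, g, gap index l, and cardinality bound k are defined as in the synthesis sketch above, and it uses the closed-form direction-vector update in place of the linear program (525)):

```matlab
% Sketch of convex iteration (774) <-> (525); assumes CVX, plus Psi, g, l, k.
n = length(g);
keep = [1:l, n-l+1:n]';               % samples outside the dropout gap
y = ones(n,1);                        % initial direction vector
for iter = 1:50
    cvx_begin quiet                   % problem (774)
        variable x(n)
        f = Psi*x;
        minimize( x'*y + norm(f(keep) - g(keep)) )
        subject to
            x >= 0;
    cvx_end
    [~, order] = sort(x, 'descend');  % problem (525) solved in closed form:
    y = ones(n,1);  y(order(1:k)) = 0;%   0 on the k largest entries of x, 1 elsewhere
    if x'*y < 1e-9, break, end        % converged: x and y complementary
end
f = Psi*x;                            % reconstructed signal
```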


346 CHAPTER 4. SEMIDEFINITE PROGRAMMING

[Figure 103 appears here: a drawing in R³ of nonnegative simplex S = {s | s ≽ 0, 1ᵀs ≤ c} intersected by line A at a point on face F.]

Figure 103: Line A intersecting two-dimensional (cardinality-2) face F of nonnegative simplex S from Figure 53 emerges at a (cardinality-1) vertex. S is first quadrant of 1-norm ball B₁ from Figure 70. Kissing point (• on edge) is achieved as ball contracts or expands under optimization.


4.5. CONSTRAINING CARDINALITY 3474.5.1.6 Compressed sensing geometry with a nonnegative variableIt is well known that cardinality problem (530), on page 233, is easier to solveby linear programming when variable x is nonnegatively constrained thanwhen not; e.g., Figure 71, Figure 100. We postulate a simple geometricalexplanation: Figure 70 illustrates 1-norm ball B 1 in R 3 and affine subsetA = {x |Ax=b}. Prototypical compressed sensing problem, for A∈ R m×nminimize ‖x‖ 1xsubject to Ax = b(518)is solved when the 1-norm ball B 1 kisses affine subset A .If variable x is constrained to the nonnegative orthantminimize ‖x‖ 1xsubject to Ax = bx ≽ 0(523)then 1-norm ball B 1 (Figure 70) becomes simplex S in Figure 103.Whereas 1-norm ball B 1 has only six vertices in R 3 corresponding tocardinality-1 solutions, simplex S has three edges (along the Cartesian axes)containing an infinity of cardinality-1 solutions. And whereas B 1 has twelveedges containing cardinality-2 solutions, S has three (out of total four)facets constituting cardinality-2 solutions. In other words, likelihood of alow-cardinality solution is higher by kissing nonnegative simplex S (523) thanby kissing 1-norm ball B 1 (518) because facial dimension (corresponding togiven cardinality) is higher in S .Empirically, this conclusion also holds in other Euclidean dimensions(Figure 71, Figure 100).4.5.1.7 cardinality-1 compressed sensing problem always solvableIn the special case of cardinality-1 solution to prototypical compressedsensing problem (523), there is a geometrical interpretation that leadsto an algorithm more efficient than convex iteration. Figure 103 showsa vertex-solution to problem (523) when desired cardinality is 1. Butfirst-quadrant S of 1-norm ball B 1 does not kiss line A ; which requiresexplanation: Under the assumption that nonnegative cardinality-1 solutionsexist in feasible set A , it so happens,


348 CHAPTER 4. SEMIDEFINITE PROGRAMMINGcolumns of measurement matrix A , corresponding to any particularsolution (to (523)) of cardinality greater than 1, may be deprecated andthe problem solved again with those columns missing. Such columnsare recursively removed from A until a cardinality-1 solution is found.Either a solution to problem (523) is cardinality-1 or column indices of A ,corresponding to a higher cardinality solution, do not intersect that indexcorresponding to the cardinality-1 solution. When problem (523) is firstsolved, in the example of Figure 103, solution is cardinality-2 at the kissingpoint • on the indicated edge of simplex S . Imagining that the correspondingcardinality-2 face F has been removed, then the simplex collapses to a linesegment along one of the Cartesian axes. When that line segment kissesline A , then the cardinality-1 vertex-solution illustrated has been found. Asimilar argument holds for any orientation of line A and point of entry tosimplex S .Because signed compressed sensing problem (518) can be equivalentlyexpressed in a nonnegative variable, as we learned in Example 3.2.0.0.1(p.230), and because a cardinality-1 constraint in (518) transforms toa cardinality-1 constraint in its nonnegative equivalent (522), then thiscardinality-1 recursive reconstruction algorithm continues to hold for a signedvariable as in (518). This cardinality-1 reconstruction algorithm also holdsmore generally when affine subset A has any higher dimension n−m .Although it is more efficient to search over the columns of matrix A fora cardinality-1 solution known a priori to exist, a combinatorial approach,tables are turned when cardinality exceeds 1 :4.5.1.8 cardinality-k geometric presolverThis idea of deprecating columns has foundation in convex cone theory.(2.13.4.3) Removing columns (and rows) 4.35 from A∈ R m×n in a linearprogram like (523) (3.2) is known in the industry as presolving; 4.36 theelimination of redundant constraints and identically zero variables prior tonumerical solution. We offer a different and geometric presolver:4.35 Rows of matrix A are removed based upon linear dependence. Assuming b∈R(A) ,corresponding entries of vector b may also be removed without loss of generality.4.36 The conic technique proposed here can exceed performance of the best industrialpresolvers in terms of number of columns removed, but not in execution time. Thisgeometric presolver becomes attractive when a linear or integer program is not solvableby other means; perhaps because of sheer dimension.


4.5. CONSTRAINING CARDINALITY 349

[Figure 104 appears here: (a) a drawing in Rⁿ of polyhedron P formed by intersecting nonnegative orthant Rⁿ₊ with affine subset A; (b) a drawing in Rᵐ of polyhedral cone K containing point b.]

Figure 104: Constraint interpretations: (a) Halfspace-description of feasible set in problem (523) is a polyhedron P formed by intersection of nonnegative orthant Rⁿ₊ with hyperplanes A prescribed by equality constraint. (Drawing by Pedro Sánchez) (b) Vertex-description of constraints in problem (523): point b belongs to polyhedral cone K = {Ax | x ≽ 0}. Number of extreme directions in K may exceed dimensionality of ambient space.


350 CHAPTER 4. SEMIDEFINITE PROGRAMMING

Two interpretations of the constraints from problem (523) are realized in Figure 104. Assuming that a cardinality-k solution exists and matrix A describes a pointed polyhedral cone K = {Ax | x ≽ 0}, columns are removed from A if they do not belong to the smallest face of K containing vector b. Columns so removed correspond to 0-entries in variable vector x; in other words, generators of that smallest face always hold a minimal cardinality solution.

Benefit accrues when vector b does not belong to relative interior of K; there would be no columns to remove were b ∈ rel int K since the smallest face becomes cone K itself (Example 4.5.1.8.1). Were b an extreme direction, at the other end of our spectrum, then the smallest face is an edge that is a ray containing b; this is a geometrical description of a cardinality-1 case where all columns, save one, would be removed from A.

When vector b resides in a face of K that is not K itself, benefit is realized as a reduction in computational intensity because the consequent equivalent problem has smaller dimension. Number of columns removed depends completely on particular geometry of a given problem; particularly, location of b within K.

There are always exactly n linear feasibility problems to solve in order to discern generators of the smallest face of K containing b; the topic of 2.13.4.3. So comparison of computational intensity for this conic approach pits combinatorial complexity ∝ a binomial coefficient $\binom{n}{k}$ versus n linear feasibility problems plus numerical solution of the presolved problem.

4.5.1.8.1 Example. Presolving for cardinality-2 solution to Ax = b.
(confer Example 4.5.1.5.1) Again taking data from Example 4.2.3.1.1, for m = 3, n = 6, k = 2 (A ∈ R^{m×n}, desired cardinality of x is k)

$$
A \;=\; \begin{bmatrix}
-1 & 1 & 8 & 1 & 1 & 0\\[2pt]
-3 & 2 & 8 & \tfrac{1}{2} & \tfrac{1}{3} & \tfrac{1}{2}-\tfrac{1}{3}\\[2pt]
-9 & 4 & 8 & \tfrac{1}{4} & \tfrac{1}{9} & \tfrac{1}{4}-\tfrac{1}{9}
\end{bmatrix},
\qquad
b \;=\; \begin{bmatrix} 1\\[2pt] \tfrac{1}{2}\\[2pt] \tfrac{1}{4} \end{bmatrix} \tag{682}
$$

proper cone K = {Ax | x ≽ 0} is pointed as proven by method of 2.12.2.2. A cardinality-2 solution is known to exist; sum of the last two columns of matrix A. Generators of the smallest face that contains vector b, found by the method in Example 2.13.4.3.1, comprise the entire A matrix because b ∈ int K (2.13.4.2.4). So geometry of this particular problem does not


4.5. CONSTRAINING CARDINALITY 351permit number of generators to be reduced below n by discerning thesmallest face. 4.37There is wondrous bonus to presolving when a constraint matrix is sparse.After columns are removed by theory of convex cones (finding the smallestface), some remaining rows may become 0 T , identical to other rows, ornonnegative. When nonnegative rows appear in an equality constraint to 0,all nonnegative variables corresponding to nonnegative entries in those rowsmust vanish (A.7.1); meaning, more columns may be removed. Once rowsand columns have been removed from a constraint matrix, even more rowsand columns may be removed by repeating the presolver procedure.4.5.2 constraining cardinality of signed variableNow consider a feasibility problem equivalent to the classical problem fromlinear algebra Ax = b , but with an upper bound k on cardinality ‖x‖ 0 : forvector b∈R(A)find x ∈ R nsubject to Ax = b‖x‖ 0 ≤ k(775)where ‖x‖ 0 ≤ k means vector x has at most k nonzero entries; such a vectoris presumed existent in the feasible set. <strong>Convex</strong> iteration (4.5.1) utilizes anonnegative variable; so absolute value |x| is needed here. We propose thatnonconvex problem (775) can be equivalently written as a sequence of convexproblems that move the cardinality constraint to the objective:minimizex∈R n 〈|x| , y〉subject to Ax = b≡minimize 〈t , y + ε1〉x∈R n , t∈R nsubject to Ax = b−t ≼ x ≼ t(776)minimizey∈R n 〈t ⋆ , y + ε1〉subject to 0 ≼ y ≼ 1y T 1 = n − k(525)4.37 But a canonical set of conically independent generators of K comprise only the firsttwo and last two columns of A.


352 CHAPTER 4. SEMIDEFINITE PROGRAMMINGwhere ε is a relatively small positive constant. This sequence is iterated untila direction vector y is found that makes |x ⋆ | T y ⋆ vanish. The term 〈t , ε1〉in (776) is necessary to determine absolute value |x ⋆ | = t ⋆ (3.2) becausevector y can have zero-valued entries. By initializing y to (1−ε)1, thefirst iteration of problem (776) is a 1-norm problem (514); id est,minimize 〈t , 1〉x∈R n , t∈R nsubject to Ax = b−t ≼ x ≼ t≡minimizex∈R n ‖x‖ 1subject to Ax = b(518)Subsequent iterations of problem (776) engaging cardinality term 〈t , y〉 canbe interpreted as corrections to this 1-norm problem leading to a 0-normsolution; vector y can be interpreted as a direction of search.4.5.2.1 local convergenceAs before (4.5.1.3), convex iteration (776) (525) always converges to a locallyoptimal solution; a fixed point of possibly infeasible cardinality.4.5.2.2 simple variations on a signed variableSeveral useful equivalents to linear programs (776) (525) are easily devised,but their geometrical interpretation is not as apparent: e.g., equivalent inthe limit ε→0 +minimize 〈t , y〉x∈R n , t∈R nsubject to Ax = b(777)−t ≼ x ≼ tminimizey∈R n 〈|x ⋆ |, y〉subject to 0 ≼ y ≼ 1y T 1 = n − k(525)We get another equivalent to linear programs (776) (525), in the limit, byinterpreting problem (518) as infimum to a vertex-description of the 1-normball (Figure 70, Example 3.2.0.0.1, confer (517)):minimizex∈R n ‖x‖ 1subject to Ax = b≡minimizea∈R 2n 〈a , y〉subject to [A −A ]a = ba ≽ 0(778)


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 353minimizey∈R 2n 〈a ⋆ , y〉subject to 0 ≼ y ≼ 1y T 1 = 2n − k(525)where x ⋆ = [I −I ]a ⋆ ; from which it may be construed that any vector1-norm minimization problem has equivalent expression in a nonnegativevariable.4.6 Cardinality and rank constraint examples4.6.0.0.1 Example. Projection on ellipsoid boundary. [52] [148,5.1][247,2] Consider classical linear equation Ax = b but with constraint onnorm of solution x , given matrices C , fat A , and vector b∈R(A)find x ∈ R Nsubject to Ax = b‖Cx‖ = 1(779)The set {x | ‖Cx‖=1} (2) describes an ellipsoid boundary (Figure 13). Thisis a nonconvex problem because solution is constrained to that boundary.AssignG =[ Cx1] [ ][x T C T 1] X Cx=x T C T 1[ Cxx T C T Cxx T C T 1]∈ S N+1 (780)Any rank-1 solution must have this form. (B.1.0.2) Ellipsoidally constrainedfeasibility problem (779) is equivalent to:findX∈S Nx ∈ R Nsubject to Ax = b[X CxG =x T C T 1rankG = 1trX = 1](≽ 0)(781)


354 CHAPTER 4. SEMIDEFINITE PROGRAMMINGThis is transformed to an equivalent convex problem by moving the rankconstraint to the objective: We iterate solution ofwithminimize 〈G , Y 〉X∈S N , x∈R Nsubject to Ax = b[X CxG =x T C T 1trX = 1minimizeY ∈ S N+1 〈G ⋆ , Y 〉subject to 0 ≼ Y ≼ ItrY = N]≽ 0(782)(783)until convergence. Direction matrix Y ∈ S N+1 , initially 0, regulates rank.(1700a) Singular value decomposition G ⋆ = UΣQ T ∈ S N+1+ (A.6) provides anew direction matrix Y = U(:, 2:N+1)U(:, 2:N+1) T that optimally solves(783) at each iteration. An optimal solution to (779) is thereby found in afew iterations, making convex problem (782) its equivalent.It remains possible for the iteration to stall; were a rank-1 G matrix notfound. In that case, the current search direction is momentarily reversedwith an added randomized element:Y = −U(:,2:N+1) ∗ (U(:,2:N+1) ′ + randn(N,1) ∗U(:,1) ′ ) (784)in Matlab notation. This heuristic is quite effective for problem (779) whichis exceptionally easy to solve by convex iteration.When b /∈R(A) then problem (779) must be restated as a projection:minimize ‖Ax − b‖x∈R Nsubject to ‖Cx‖ = 1(785)This is a projection of point b on an ellipsoid boundary because any affinetransformation of an ellipsoid remains an ellipsoid. Problem (782) in turnbecomesminimize 〈G , Y 〉 + ‖Ax − b‖X∈S N , x∈R N[ ]X Cxsubject to G =x T C T ≽ 0 (786)1trX = 1


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 355We iterate this with calculation (783) of direction matrix Y as before untila rank-1 G matrix is found.4.6.0.0.2 Example. Procrustes problem. [52]Example 4.6.0.0.1 is extensible. An orthonormal matrix Q∈ R n×p ischaracterized Q T Q = I . Consider the particular case Q = [x y ]∈ R n×2 asvariable to a Procrustes problem (C.3): given A∈ R m×n and B ∈ R m×2minimize ‖AQ − B‖ FQ∈R n×2subject to Q T Q = I(787)which is nonconvex. By vectorizing matrix Q we can make the assignment:⎡ ⎤ ⎡ ⎤ ⎡ ⎤x [ x T y T 1] X Z x xx T xy T xG = ⎣ y ⎦ = ⎣ Z T Y y ⎦⎣yx T yy T y ⎦∈ S 2n+1 (788)1x T y T 1 x T y T 1Now orthonormal Procrustes problem (787) can be equivalently restated:minimizeX , Y ∈ S , Z , x , y‖A[x y ] − B‖ F⎡X Z⎤xsubject to G = ⎣ Z T Y y ⎦(≽ 0)x T y T 1rankG = 1trX = 1trY = 1trZ = 0(789)To solve this, we form the convex problem sequence:minimizeX , Y , Z , x , y‖A[x y ] −B‖ F + 〈G , W 〉⎡X Z⎤xsubject to G = ⎣ Z T Y y ⎦ ≽ 0x T y T 1trX = 1trY = 1trZ = 0(790)


356 CHAPTER 4. SEMIDEFINITE PROGRAMMINGandminimizeW ∈ S 2n+1 〈G ⋆ , W 〉subject to 0 ≼ W ≼ ItrW = 2n(791)which has an optimal solution W that is known in closed form (p.658).These two problems are iterated until convergence and a rank-1 G matrix isfound. A good initial value for direction matrix W is 0. Optimal Q ⋆ equals[x ⋆ y ⋆ ].Numerically, this Procrustes problem is easy to solve; a solution seemsalways to be found in one or few iterations. This problem formulation isextensible, of course, to orthogonal (square) matrices Q . 4.6.0.0.3 Example. Combinatorial Procrustes problem.In case A,B∈ R n , when vector A = ΞB is known to be a permutation ofvector B , solution to orthogonal Procrustes problemminimize ‖A − XB‖ FX∈R n×n(1712)subject to X T = X −1is not necessarily a permutation matrix Ξ even though an optimal objectivevalue of 0 is found by the known analytical solution (C.3). The simplestmethod of solution finds permutation matrix X ⋆ = Ξ simply by sortingvector B with respect to A .Instead of sorting, we design two different convex problems each of whoseoptimal solution is a permutation matrix: one design is based on rankconstraint, the other on cardinality. Because permutation matrices are sparseby definition, we depart from a traditional Procrustes problem by insteaddemanding a vector 1-norm which is known to produce solutions more sparsethan Frobenius’ norm.There are two principal facts exploited by the first convex iteration design(4.4.1) we propose. Permutation matrices Ξ constitute:1) the set of all nonnegative orthogonal matrices,2) all points extreme to the polyhedron (100) of doubly stochasticmatrices.That means:


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 357{X(:, i) | X(:, i) T X(:, i) = 1}{X(:, i) | 1 T X(:, i) = 1}Figure 105: Permutation matrix i th column-sum and column-norm constraintin two dimensions when rank constraint is satisfied. Optimal solutions resideat intersection of hyperplane with unit circle.1) norm of each row and column is 1 , 4.38‖Ξ(:, i)‖ = 1, ‖Ξ(i , :)‖ = 1, i=1... n (792)2) sum of each nonnegative row and column is 1, (2.3.2.0.4)Ξ T 1=1, Ξ1=1, Ξ ≥ 0 (793)solution via rank constraintThe idea is to individually constrain each column of variable matrix X tohave unity norm. Matrix X must also belong to that polyhedron, (100) inthe nonnegative orthant, implied by constraints (793); so each row-sum and4.38 This fact would be superfluous were the objective of minimization linear, because thepermutation matrices reside at the extreme points of a polyhedron (100) implied by (793).But as posed, only either rows or columns need be constrained to unit norm becausematrix orthogonality implies transpose orthogonality. (B.5.1) Absence of vanishing innerproduct constraints that help define orthogonality, like trZ = 0 from Example 4.6.0.0.2,is a consequence of nonnegativity; id est, the only orthogonal matrices having exclusivelynonnegative entries are permutations of the identity.


358 CHAPTER 4. SEMIDEFINITE PROGRAMMINGcolumn-sum of X must also be unity. It is this combination of nonnegativity,sum, and sum square constraints that extracts the permutation matrices:(Figure 105) given nonzero vectors A, BminimizeX∈R n×n , G i ∈ S n+1subject to∑‖A − XB‖ 1 + w n 〈G i , W i 〉[ i=1 ] ⎫Gi (1:n, 1:n) X(:, i) ⎬G i =X(:, i) T ≽ 01⎭ ,trG i = 2i=1... n(794)X T 1 = 1X1 = 1X ≥ 0where w ≈10 positively weights the rank regularization term. Optimalsolutions G ⋆ i are key to finding direction matrices W i for the next iterationof semidefinite programs (794) (795):minimizeW i ∈ S n+1 〈G ⋆ i , W i 〉subject to 0 ≼ W i ≼ ItrW i = n⎫⎪⎬, i=1... n (795)⎪⎭Direction matrices thus found lead toward rank-1 matrices G ⋆ i on subsequentiterations. Constraint on trace of G ⋆ i normalizes the i th column of X ⋆ to unitybecause (confer p.419)[ ] XG ⋆ i =⋆ (:, i) [ X ⋆ (:, i) T 1]1(796)at convergence. Binary-valued X ⋆ column entries result from the furthersum constraint X1=1. Columnar orthogonality is a consequence of thefurther transpose-sum constraint X T 1=1 in conjunction with nonnegativityconstraint X ≥ 0 ; but we leave proof of orthogonality an exercise. Theoptimal objective value is 0 for both semidefinite programs when vectors Aand B are related by permutation. In any case, optimal solution X ⋆ becomesa permutation matrix Ξ .
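Each direction-matrix subproblem (795) has the closed-form solution alluded to on page 658; a minimal Matlab sketch of that computation for one optimal G⋆ᵢ (illustrative only, assuming G is a symmetric positive semidefinite matrix of side n+1 already computed from (794)):

```matlab
% Closed-form solution of (795): keep eigenvectors of the n smallest eigenvalues.
% G is an (n+1)x(n+1) symmetric PSD optimal solution G_i^* from (794).
[Q, L] = eig(G);
[~, order] = sort(diag(L), 'descend');   % eigenvalues in nonincreasing order
Q = Q(:, order);
W = Q(:, 2:end)*Q(:, 2:end)';            % discards only the principal eigenvector
% trace(W) = n, eigenvalues of W lie in [0,1], and <G,W> = sum of n smallest eigenvalues of G
```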


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 359Because there are n direction matrices W i to find, it can be advantageousto invoke a known closed-form solution for each from page 658. What makesthis combinatorial problem more tractable are relatively small semidefiniteconstraints in (794). (confer (790)) When a permutation A of vector B exists,number of iterations can be as small as 1. But this combinatorial Procrustesproblem can be made even more challenging when vector A has repeatedentries.solution via cardinality constraintNow the idea is to force solution at a vertex of permutation polyhedron (100)by finding a solution of desired sparsity. Because permutation matrix X isn-sparse by assumption, this combinatorial Procrustes problem may insteadbe formulated as a compressed sensing problem with convex iteration oncardinality of vectorized X (4.5.1): given nonzero vectors A, BminimizeX∈R n×n ‖A − XB‖ 1 + w〈X , Y 〉subject to X T 1 = 1X1 = 1X ≥ 0where direction vector Y is an optimal solution tominimizeY ∈R n×n 〈X ⋆ , Y 〉subject to 0 ≤ Y ≤ 11 T Y 1 = n 2 − n(525)(797)each a linear program. In this circumstance, use of closed-form solution fordirection vector Y is discouraged. When vector A is a permutation of B ,both linear programs have objectives that converge to 0. When vectors A andB are permutations and no entries of A are repeated, optimal solution X ⋆can be found as soon as the first iteration.In any case, X ⋆ = Ξ is a permutation matrix.4.6.0.0.4 Exercise. Combinatorial Procrustes constraints.Assume that the objective of semidefinite program (794) is 0 at optimality.Prove that the constraints in program (794) are necessary and sufficientto produce a permutation matrix as optimal solution. Alternatively andequivalently, prove those constraints necessary and sufficient to optimallyproduce a nonnegative orthogonal matrix.
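A CVX-based sketch of the cardinality design (797) alternated with linear program (525), from the preceding example, might look as follows (an illustrative outline, not from the text; the weight w, iteration cap, and tolerance are arbitrary choices, and CVX is assumed installed):

```matlab
% Sketch of combinatorial Procrustes by cardinality: alternate (797) and (525).
% A, B are given n-vectors (B a permutation of A); assumes CVX.
n = length(A);  w = 10;  Y = zeros(n);
for iter = 1:20
    cvx_begin quiet                     % problem (797)
        variable X(n,n)
        minimize( norm(A - X*B, 1) + w*trace(X'*Y) )
        subject to
            X'*ones(n,1) == ones(n,1);
            X*ones(n,1)  == ones(n,1);
            X >= 0;
    cvx_end
    cvx_begin quiet                     % direction vector problem (525)
        variable Y(n,n)
        minimize( trace(X'*Y) )
        subject to
            Y >= 0;  Y <= 1;
            sum(Y(:)) == n^2 - n;
    cvx_end
    if trace(X'*Y) < 1e-8, break, end   % <X,Y> vanishes when X is a permutation matrix
end
```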


360 CHAPTER 4. SEMIDEFINITE PROGRAMMING4.6.0.0.5 Example. Tractable polynomial constraint.The set of all coefficients for which a multivariate polynomial were convex isgenerally difficult to determine. But the ability to handle rank constraintsmakes any nonconvex polynomial constraint transformable to a convexconstraint. All optimization problems having polynomial objective andpolynomial constraints can be reformulated as a semidefinite program witha rank-1 constraint. [285] Suppose we requireIdentify3 + 2x − xy ≤ 0 (798)⎡ ⎤ ⎡ ⎤x [ x y 1] x 2 xy xG = ⎣ y ⎦ = ⎣ xy y 2 y ⎦∈ S 3 (799)1x y 1Then nonconvex polynomial constraint (798) is equivalent to constraint settr(GA) ≤ 0G 33 = 1(G ≽ 0)rankG = 1(800)with direct correspondence to sense of trace inequality where G is assumedsymmetric (B.1.0.2) and⎡ ⎤0 − 1 12A = ⎣ − 1 0 0 ⎦∈ S 3 (801)21 0 3Then the method of convex iteration from4.4.1 is applied to implement therank constraint.4.6.0.0.6 Exercise. Binary Pythagorean theorem.Technique of Example 4.6.0.0.5 is extensible to any quadratic constraint; e.g.,x T Ax + 2b T x + c ≤ 0 , x T Ax + 2b T x + c ≥ 0 , and x T Ax + 2b T x + c = 0.Write a rank-constrained semidefinite program to solve (Figure 105){ x + y = 1x 2 + y 2 (802)= 1whose feasible set is not connected. Implement this system in cvx [167] byconvex iteration.
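As a concrete sketch of how a rank constraint of this kind might be implemented (an illustration under assumptions, not a prescribed implementation: it merely seeks a feasible rank-1 G for constraint set (800), with A from (801), updating direction matrix W by the usual eigenvector closed form; the same pattern could be adapted to the exercise above):

```matlab
% Convex iteration for a rank-1 G satisfying (800); assumes CVX.
A = [0 -1/2 1; -1/2 0 0; 1 0 3];         % matrix (801)
W = zeros(3);
for iter = 1:50
    cvx_begin sdp quiet
        variable G(3,3) symmetric
        minimize( trace(G*W) )            % regularization standing in for rank G = 1
        subject to
            trace(G*A) <= 0;              % lifted form of 3 + 2x - xy <= 0
            G(3,3) == 1;
            G >= 0;                       % G positive semidefinite (sdp mode)
    cvx_end
    [Q, L] = eig(G);
    [~, order] = sort(diag(L), 'descend');
    Q = Q(:, order);
    W = Q(:, 2:3)*Q(:, 2:3)';             % direction matrix, as in (1700a)
    if sum(eig(G) > 1e-6) == 1, break, end    % numerical rank 1 attained
end
x = G(1,3);  y = G(2,3);                  % a point satisfying 3 + 2x - xy <= 0
```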


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 361

4.6.0.0.7 Example. Higher-order polynomials.
Consider nonconvex problem from Canadian Mathematical Olympiad 1999:

$$
\begin{array}{cl}
\underset{x,\,y,\,z\,\in\mathbb{R}}{\text{find}} & x,\,y,\,z\\
\text{subject to} & x^2 y + y^2 z + z^2 x = \tfrac{2^2}{3^3}\\
& x + y + z = 1\\
& x,\,y,\,z \geq 0
\end{array} \tag{803}
$$

We wish to solve for what is known to be a tight upper bound 2²/3³ on the constrained polynomial x²y + y²z + z²x by transformation to a rank-constrained semidefinite program; identify

$$
G = \begin{bmatrix} x\\ y\\ z\\ 1 \end{bmatrix}\begin{bmatrix} x & y & z & 1 \end{bmatrix}
= \begin{bmatrix}
x^2 & xy & zx & x\\
xy & y^2 & yz & y\\
zx & yz & z^2 & z\\
x & y & z & 1
\end{bmatrix}\in\mathbb{S}^4 \tag{804}
$$

$$
X = \begin{bmatrix} x^2\\ y^2\\ z^2\\ x\\ y\\ z\\ 1 \end{bmatrix}\begin{bmatrix} x^2 & y^2 & z^2 & x & y & z & 1 \end{bmatrix}
= \begin{bmatrix}
x^4 & x^2y^2 & z^2x^2 & x^3 & x^2y & zx^2 & x^2\\
x^2y^2 & y^4 & y^2z^2 & xy^2 & y^3 & y^2z & y^2\\
z^2x^2 & y^2z^2 & z^4 & z^2x & yz^2 & z^3 & z^2\\
x^3 & xy^2 & z^2x & x^2 & xy & zx & x\\
x^2y & y^3 & yz^2 & xy & y^2 & yz & y\\
zx^2 & y^2z & z^3 & zx & yz & z^2 & z\\
x^2 & y^2 & z^2 & x & y & z & 1
\end{bmatrix}\in\mathbb{S}^7 \tag{805}
$$

then apply convex iteration (4.4.1) to implement rank constraints:

$$
\begin{array}{cl}
\underset{A,\,C\,\in\,\mathbb{S},\;b}{\text{find}} & b\\
\text{subject to} & \operatorname{tr}(XE) = \tfrac{2^2}{3^3}\\[4pt]
& G = \begin{bmatrix} A & b\\ b^{\mathrm{T}} & 1 \end{bmatrix}\;(\succeq 0)\\[8pt]
& X = \begin{bmatrix} C & \begin{bmatrix} \delta(A)\\ b \end{bmatrix}\\ \begin{bmatrix} \delta(A)^{\mathrm{T}} & b^{\mathrm{T}} \end{bmatrix} & 1 \end{bmatrix}\;(\succeq 0)\\[10pt]
& \mathbf{1}^{\mathrm{T}} b = 1\\
& b \succeq 0\\
& \operatorname{rank} G = 1\\
& \operatorname{rank} X = 1
\end{array} \tag{806}
$$


362 CHAPTER 4. SEMIDEFINITE PROGRAMMING

where

$$
E \;=\; \frac{1}{2}\begin{bmatrix}
0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0 & 0\\
1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}\in\mathbb{S}^7 \tag{807}
$$

(Matlab code on Wıκımization) Positive semidefiniteness is optional only when rank-1 constraints are explicit by Theorem A.3.1.0.7. Optimal solution to problem (803) is not unique: (x, y, z) = (0, 2/3, 1/3).

4.6.0.0.8 Example. Boolean vector satisfying Ax ≼ b. (confer 4.2.3.1.1)
Now we consider solution to a discrete problem whose only known analytical method of solution is combinatorial in complexity: given A ∈ R^{M×N} and b ∈ R^M

$$
\begin{array}{cl}
\text{find} & x \in \mathbb{R}^N\\
\text{subject to} & Ax \preceq b\\
& \delta(xx^{\mathrm{T}}) = \mathbf{1}
\end{array} \tag{808}
$$

This nonconvex problem demands a Boolean solution [x_i = ±1, i = 1…N].

Assign a rank-1 matrix of variables; symmetric variable matrix X and solution vector x:

$$
G = \begin{bmatrix} x\\ 1 \end{bmatrix}\begin{bmatrix} x^{\mathrm{T}} & 1 \end{bmatrix}
= \begin{bmatrix} X & x\\ x^{\mathrm{T}} & 1 \end{bmatrix}
\triangleq \begin{bmatrix} xx^{\mathrm{T}} & x\\ x^{\mathrm{T}} & 1 \end{bmatrix}\in\mathbb{S}^{N+1} \tag{809}
$$

Then design an equivalent semidefinite feasibility problem to find a Boolean solution to Ax ≼ b:

$$
\begin{array}{cl}
\underset{X\in\mathbb{S}^N,\;x\in\mathbb{R}^N}{\text{find}} & x \in \mathbb{R}^N\\
\text{subject to} & Ax \preceq b\\[2pt]
& G = \begin{bmatrix} X & x\\ x^{\mathrm{T}} & 1 \end{bmatrix}\;(\succeq 0)\\[6pt]
& \operatorname{rank} G = 1\\
& \delta(X) = \mathbf{1}
\end{array} \tag{810}
$$


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 363where x ⋆ i ∈ {−1, 1}, i=1... N . The two variables X and x are madedependent via their assignment to rank-1 matrix G . By (1620), an optimalrank-1 matrix G ⋆ must take the form (809).As before, we regularize the rank constraint by introducing a directionmatrix Y into the objective:minimize 〈G, Y 〉X∈S N , x∈R Nsubject to Ax ≼ b[ X xG =x T 1δ(X) = 1]≽ 0(811)Solution of this semidefinite program is iterated with calculation of thedirection matrix Y from semidefinite program (783). At convergence, in thesense (737), convex problem (811) becomes equivalent to nonconvex Booleanproblem (808).Direction matrix Y can be an orthogonal projector having closed-formexpression, by (1700a), although convex iteration is not a projection method.(4.4.1.1) Given randomized data A and b for a large problem, we find thatstalling becomes likely (convergence of the iteration to a positive objective〈G ⋆ , Y 〉). To overcome this behavior, we introduce a heuristic into theimplementation on Wıκımization that momentarily reverses direction ofsearch (like (784)) upon stall detection. We find that rate of convergencecan be sped significantly by detecting stalls early.4.6.0.0.9 Example. Variable-vector normalization.Suppose, within some convex optimization problem, we want vector variablesx, y ∈ R N constrained by a nonconvex equality:x‖y‖ = y (812)id est, ‖x‖ = 1 and x points in the same direction as y≠0 ; e.g.,minimize f(x, y)x , ysubject to (x, y) ∈ Cx‖y‖ = y(813)where f is some convex function and C is some convex set. We can realizethe nonconvex equality by constraining rank and adding a regularization


364 CHAPTER 4. SEMIDEFINITE PROGRAMMINGterm to the objective. Make the assignment:⎡ ⎤ ⎡ ⎤ ⎡x [ x T y T 1] X Z x xx T xy T xG = ⎣ y ⎦ = ⎣ Z Y y ⎦⎣yx T yy T y1x T y T 1 x T y T 1⎤⎦∈ S 2N+1 (814)where X , Y ∈ S N , also Z ∈ S N [sic] . Any rank-1 solution must take theform of (814). (B.1) The problem statement equivalent to (813) is thenwrittenminimizeX , Y ∈ S , Z , x , ysubject tof(x, y) + ‖X − Y ‖ F(x, y) ∈ C⎡ ⎤X Z xG = ⎣ Z Y y ⎦(≽ 0)x T y T 1rankG = 1tr(X) = 1δ(Z) ≽ 0(815)The trace constraint on X normalizes vector x while the diagonal constrainton Z maintains sign between respective entries of x and y . Regularizationterm ‖X −Y ‖ F then makes x equal to y to within a real scalar; (C.2.0.0.2)in this case, a positive scalar. To make this program solvable by convexiteration, as explained in Example 4.4.1.2.4 and other previous examples, wemove the rank constraint to the objectiveminimizeX , Y , Z , x , ysubject to (x, y) ∈ C⎡f(x, y) + ‖X − Y ‖ F + 〈G, W 〉G = ⎣tr(X) = 1δ(Z) ≽ 0X Z xZ Y yx T y T 1⎤⎦≽ 0by introducing a direction matrix W found from (1700a):(816)minimizeW ∈ S 2N+1 〈G ⋆ , W 〉subject to 0 ≼ W ≼ Itr W = 2N(817)


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 365

This semidefinite program has an optimal solution that is known in closed form. Iteration (816) (817) terminates when rank G = 1 and linear regularization 〈G, W〉 vanishes to within some numerical tolerance in (816); typically, in two iterations. If function f competes too much with the regularization, positively weighting each regularization term will become required. At convergence, problem (816) becomes a convex equivalent to the original nonconvex problem (813).

4.6.0.0.10 Example. fast max cut. [113]

Let Γ be an n-node graph, and let the arcs (i, j) of the graph be associated with [ ] weights a_ij. The problem is to find a cut of the largest possible weight, i.e., to partition the set of nodes into two parts S, S′ in such a way that the total weight of all arcs linking S and S′ (i.e., with one incident node in S and the other one in S′ [Figure 106]) is as large as possible. [34, 4.3.3]

Literature on the max cut problem is vast because this problem has elegant primal and dual formulation, its solution is very difficult, and there exist many commercial applications; e.g., semiconductor design [125], quantum computing [387].

Our purpose here is to demonstrate how iteration of two simple convex problems can quickly converge to an optimal solution of the max cut problem with a 98% success rate, on average. 4.39 max cut is stated:

$$
\begin{array}{cl}
\underset{x}{\text{maximize}} & \displaystyle\frac{1}{2}\sum_{1\leq i<j\leq n} a_{ij}\,(1 - x_i x_j)\\
\text{subject to} & \delta(xx^{\mathrm{T}}) = \mathbf{1}
\end{array} \tag{818}
$$


366 CHAPTER 4. SEMIDEFINITE PROGRAMMING

[Figure 106 appears here: a 16-node graph with weighted arcs, partitioned by a cut into node sets S and S′.]

Figure 106: A cut partitions nodes {i = 1…16} of this graph into S and S′. Linear arcs have circled weights. The problem is to find a cut maximizing total weight of all arcs linking partitions made by the cut.
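Before proceeding, a small Matlab sketch (illustrative only; the weights here are random, not those of Figure 106) evaluates the max cut objective (818) for a given ±1 labeling x, together with the equivalent inner-product form derived on the next page; the two agree identically:

```matlab
% Evaluate cut weight of a +/-1 labeling x two ways; they agree.
n = 16;
A = triu(round(5*randn(n)), 1);  A = A + A';   % random symmetric weights, zero diagonal
x = sign(randn(n,1));                          % an arbitrary cut: x_i = +/-1
w1 = 0;                                        % objective of (818)
for i = 1:n-1
    for j = i+1:n
        w1 = w1 + A(i,j)*(1 - x(i)*x(j))/2;
    end
end
w2 = (sum(A(:)) - x'*A*x)/4;                   % (1/4)<11' - xx', A>
disp([w1 w2])                                  % equal: only arcs crossing the cut contribute
```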


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 367max cut statement (818) is the same as, for A = [a ij ]∈ S n1maximizex 4 〈11T − xx T , A〉subject to δ(xx T ) = 1Because of Boolean assumption δ(xx T ) = 1so problem (820) is the same as(820)〈11 T − xx T , A〉 = 〈xx T , δ(A1) − A〉 (821)1maximizex 4 〈xxT , δ(A1) − A〉subject to δ(xx T ) = 1This max cut problem is combinatorial (nonconvex).(822)Because an estimate of upper bound to max cut is needed to ascertainconvergence when vector x has large dimension, we digress to derive thedual problem: Directly from (822), its Lagrangian is [61,5.1.5] (1418)L(x, ν) = 1 4 〈xxT , δ(A1) − A〉 + 〈ν , δ(xx T ) − 1〉= 1 4 〈xxT , δ(A1) − A〉 + 〈δ(ν), xx T 〉 − 〈ν , 1〉= 1 4 〈xxT , δ(A1 + 4ν) − A〉 − 〈ν , 1〉(823)where quadratic x T (δ(A1+ 4ν)−A)x has supremum 0 if δ(A1+ 4ν)−A isnegative semidefinite, and has supremum ∞ otherwise. The finite supremumof dual functiong(ν) = supL(x, ν) =x{ −〈ν , 1〉 , A − δ(A1 + 4ν) ≽ 0∞otherwiseis chosen to be the objective of minimization to dual (convex) problemminimize −ν T 1νsubject to A − δ(A1 + 4ν) ≽ 0(824)(825)whose optimal value provides a least upper bound to max cut, but is nottight ( 1 4 〈xxT , δ(A1)−A〉< g(ν) , duality gap is nonzero). [157] In fact, wefind that the bound’s variance with problem instance is too large to beuseful for this problem; thus ending our digression. 4.404.40 Taking the dual of dual problem (825) would provide (826) but without the rankconstraint. [150] Dual of a dual of even a convex primal problem is not necessarily thesame primal problem; although, optimal solution of one can be obtained from the other.
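A CVX rendering of dual problem (825), used only to estimate an upper bound, might read as follows (an illustrative sketch assuming CVX and a symmetric weight matrix A already in the workspace):

```matlab
% Upper bound on max cut from dual problem (825); assumes CVX, symmetric A.
n = size(A,1);
cvx_begin sdp quiet
    variable nu(n)
    minimize( -sum(nu) )
    subject to
        A - diag(A*ones(n,1) + 4*nu) >= 0;   % A - delta(A1 + 4nu) positive semidefinite
cvx_end
upper_bound = -sum(nu);   % least upper bound; generally not tight (nonzero duality gap)
```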


368 CHAPTER 4. SEMIDEFINITE PROGRAMMINGTo transform max cut to its convex equivalent, first defineX = xx T ∈ S n (830)then max cut (822) becomesmaximize1X∈ S n 4〈X , δ(A1) − A〉subject to δ(X) = 1(X ≽ 0)rankX = 1(826)whose rank constraint can be regularized as inmaximize1X∈ S n 4subject to δ(X) = 1X ≽ 0〈X , δ(A1) − A〉 − w〈X , W 〉(827)where w ≈1000 is a nonnegative fixed weight, and W is a direction matrixdetermined fromn∑λ(X ⋆ ) ii=2= minimizeW ∈ S n 〈X ⋆ , W 〉subject to 0 ≼ W ≼ ItrW = n − 1(1700a)which has an optimal solution that is known in closed form. These twoproblems (827) and (1700a) are iterated until convergence as defined onpage 309.Because convex problem statement (827) is so elegant, it is numericallysolvable for large binary vectors within reasonable time. 4.41 To test ourconvex iterative method, we compare an optimal convex result to anactual solution of the max cut problem found by performing a brute forcecombinatorial search of (822) 4.42 for a tight upper bound. Search-time limitsbinary vector lengths to 24 bits (about five days CPU time). 98% accuracy,4.41 We solved for a length-250 binary vector in only a few minutes and convex iterationson a 2006 vintage laptop Core 2 CPU (Intel T7400@2.16GHz, 666MHz FSB).4.42 more computationally intensive than the proposed convex iteration by many ordersof magnitude. Solving max cut by searching over all binary vectors of length 100, forexample, would occupy a contemporary supercomputer for a million years.


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 369actually obtained, is independent of binary vector length (12, 13, 20, 24)when averaged over more than 231 problem instances including planar,randomized, and toroidal graphs. 4.43 When failure occurred, large andsmall errors were manifest. That same 98% average accuracy is presumedmaintained when binary vector length is further increased. A Matlabprogram is provided on Wıκımization.4.6.0.0.11 Example. Cardinality/rank problem.d’Aspremont, El Ghaoui, Jordan, & Lanckriet [95] propose approximating apositive semidefinite matrix A∈ S N + by a rank-one matrix having constrainton cardinality c : for 0 < c < Nminimize ‖A − zz T ‖ Fzsubject to cardz ≤ c(828)which, they explain, is a hard problem equivalent tomaximize x T Axxsubject to ‖x‖ = 1card x ≤ c(829)where z √ λx and where optimal solution x ⋆ is a principal eigenvector(1694) (A.5) of A and λ = x ⋆T Ax ⋆ is the principal eigenvalue [159, p.331]when c is true cardinality of that eigenvector. This is principal componentanalysis with a cardinality constraint which controls solution sparsity. Definethe matrix variableX xx T ∈ S N (830)whose desired rank is 1, and whose desired diagonal cardinalitycardδ(X) ≡ cardx (831)4.43 Existence of a polynomial-time approximation to max cut with accuracy provablybetter than 94.11% would refute NP-hardness; which Håstad believes to be highly unlikely.[181, thm.8.2] [182]


370 CHAPTER 4. SEMIDEFINITE PROGRAMMINGis equivalent to cardinality c of vector x . Then we can transform cardinalityproblem (829) to an equivalent problem in new variable X : 4.44maximizeX∈S N 〈X , A〉subject to 〈X , I〉 = 1(X ≽ 0)rankX = 1card δ(X) ≤ c(832)We transform problem (832) to an equivalent convex problem byintroducing two direction matrices into regularization terms: W to achievedesired cardinality cardδ(X) , and Y to find an approximating rank-onematrix X :maximize 〈X , A − w 1 Y 〉 − w 2 〈δ(X) , δ(W)〉X∈S Nsubject to 〈X , I〉 = 1X ≽ 0(833)where w 1 and w 2 are positive scalars respectively weighting tr(XY ) andδ(X) T δ(W) just enough to insure that they vanish to within some numericalprecision, where direction matrix Y is an optimal solution to semidefiniteprogramminimize 〈X ⋆ , Y 〉Y ∈ S Nsubject to 0 ≼ Y ≼ I(834)trY = N − 1and where diagonal direction matrix W ∈ S N optimally solves linear programminimize 〈δ(X ⋆ ) , δ(W)〉W=δ 2 (W)subject to 0 ≼ δ(W) ≼ 1trW = N − c(835)Both direction matrix programs are derived from (1700a) whose analyticalsolution is known but is not necessarily unique. We emphasize (confer p.309):because this iteration (833) (834) (835) (initial Y,W = 0) is not a projection4.44 A semidefiniteness constraint X ≽ 0 is not required, theoretically, because positivesemidefiniteness of a rank-1 matrix is enforced by symmetry. (Theorem A.3.1.0.7)


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 371phantom(256)Figure 107: Shepp-Logan phantom from Matlab image processing toolbox.method (4.4.1.1), success relies on existence of matrices in the feasible setof (833) having desired rank and diagonal cardinality. In particular, thefeasible set of convex problem (833) is a Fantope (91) whose extreme pointsconstitute the set of all normalized rank-one matrices; among those are foundrank-one matrices of any desired diagonal cardinality.<strong>Convex</strong> problem (833) is neither a relaxation of cardinality problem (829);instead, problem (833) is a convex equivalent to (829) at convergence ofiteration (833) (834) (835). Because the feasible set of convex problem (833)contains all normalized (B.1) symmetric rank-one matrices of every nonzerodiagonal cardinality, a constraint too low or high in cardinality c willnot prevent solution. An optimal rank-one solution X ⋆ , whose diagonalcardinality is equal to cardinality of a principal eigenvector of matrix A ,will produce the lowest residual Frobenius norm (to within machine noiseprocesses) in the original problem statement (828).4.6.0.0.12 Example. Compressive sampling of a phantom.In summer of 2004, Candès, Romberg, & Tao [70] and Donoho [118]released papers on perfect signal reconstruction from samples that stand inviolation of Shannon’s classical sampling theorem. These defiant signals areassumed sparse inherently or under some sparsifying affine transformation.Essentially, they proposed sparse sampling theorems asserting averagesample rate independent of signal bandwidth and less than Shannon’s rate.


372 CHAPTER 4. SEMIDEFINITE PROGRAMMINGMinimum sampling rate:of Ω-bandlimited signal: 2Ω (Shannon)of k-sparse length-n signal: k log 2 (1+n/k)(Candès/Donoho)(Figure 100). Certainly, much was already known about nonuniformor random sampling [36] [208] and about subsampling or multiratesystems [91] [357]. Vetterli, Marziliano, & Blu [366] had congealed atheory of noiseless signal reconstruction, in May 2001, from samples thatviolate the Shannon rate. They anticipated the sparsifying transformby recognizing: it is the innovation (onset) of functions constituting a(not necessarily bandlimited) signal that determines minimum samplingrate for perfect reconstruction. Average onset (sparsity) Vetterli et aliicall rate of innovation. The vector inner-products that Candès/Donohocall samples or measurements, Vetterli calls projections. From thoseprojections Vetterli demonstrates reconstruction (by digital signal processingand “root finding”) of a Dirac comb, the very same prototypicalsignal from which Candès probabilistically derives minimum samplingrate (Compressive Sampling and Frontiers in Signal Processing, Universityof Minnesota, June 6, 2007). Combining their terminology, we paraphrase asparse sampling theorem:Minimum sampling rate, asserted by Candès/Donoho, is proportionalto Vetterli’s rate of innovation (a.k.a, information rate, degrees offreedom, ibidem June 5, 2007).What distinguishes these researchers are their methods of reconstruction.Properties of the 1-norm were also well understood by June 2004finding applications in deconvolution of linear systems [83], penalized linearregression (Lasso) [318], and basis pursuit [213]. But never before hadthere been a formalized and rigorous sense that perfect reconstruction werepossible by convex optimization of 1-norm when information lost in asubsampling process became nonrecoverable by classical methods. Donohonamed this discovery compressed sensing to describe a nonadaptive perfectreconstruction method by means of linear programming. By the time Candès’and Donoho’s landmark papers were finally published by IEEE in 2006,compressed sensing was old news that had spawned intense research whichstill persists; notably, from prominent members of the wavelet community.


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 373Reconstruction of the Shepp-Logan phantom (Figure 107), from aseverely aliased image (Figure 109) obtained by Magnetic ResonanceImaging (MRI), was the impetus driving Candès’ quest for a sparse samplingtheorem. He realized that line segments appearing in the aliased imagewere regions of high total variation. There is great motivation, in themedical community, to apply compressed sensing to MRI because it translatesto reduced scan-time which brings great technological and physiologicalbenefits. MRI is now about 35 years old, beginning in 1973 with Nobellaureate Paul Lauterbur from Stony Brook USA. There has been muchprogress in MRI and compressed sensing since 2004, but there have alsobeen indications of 1-norm abandonment (indigenous to reconstruction bycompressed sensing) in favor of criteria closer to 0-norm because of acorrespondingly smaller number of measurements required to accuratelyreconstruct a sparse signal: 4.455481 complex samples (22 radial lines, ≈256 complex samples per) wererequired in June 2004 to reconstruct a noiseless 256×256-pixel Shepp-Loganphantom by 1-norm minimization of an image-gradient integral estimatecalled total variation; id est, 8.4% subsampling of 65536 data. [70,1.1][69,3.2] It was soon discovered that reconstruction of the Shepp-Loganphantom were possible with only 2521 complex samples (10 radial lines,1Figure 108); 3.8% subsampled data input to a (nonconvex) -norm 2total-variation minimization. [76,IIIA] The closer to 0-norm, the fewer thesamples required for perfect reconstruction.Passage of a few years witnessed an algorithmic speedup and dramaticreduction in minimum number of samples required for perfect reconstructionof the noiseless Shepp-Logan phantom. But minimization of total variationis ideally suited to recovery of any piecewise-constant image, like a phantom,because the gradient of such images is highly sparse by design.There is no inherent characteristic of real-life MRI images that wouldmake reasonable an expectation of sparse gradient. Sparsification of adiscrete image-gradient tends to preserve edges. Then minimization of totalvariation seeks an image having fewest edges. There is no deeper theoreticalfoundation than that. When applied to human brain scan or angiogram,4.45 Efficient techniques continually emerge urging 1-norm criteria abandonment; [80] [356][355,IID] e.g., five techniques for compressed sensing are compared in [37] demonstratingthat 1-norm performance limits for cardinality minimization can be reliably exceeded.


374 CHAPTER 4. SEMIDEFINITE PROGRAMMINGwith as much as 20% of 256×256 Fourier samples, we have observed 4.46 a30dB image/reconstruction-error ratio 4.47 barrier that seems impenetrableby the total-variation objective. Total-variation minimization has met withmoderate success, in retrospect, only because some medical images aremoderately piecewise-constant signals. One simply hopes a reconstruction,that is in some sense equal to a known subset of samples and whose gradientis most sparse, is that unique image we seek. 4.48The total-variation objective, operating on an image, is expressible asnorm of a linear transformation (854). It is natural to ask whether thereexist other sparsifying transforms that might break the real-life 30dB barrier(any sampling pattern @20% 256×256 data) in MRI. There has beenmuch research into application of wavelets, discrete cosine transform (DCT),randomized orthogonal bases, splines, etcetera, but with suspiciously littlefocus on objective measures like image/error or illustration of differenceimages; the predominant basis of comparison instead being subjectivelyvisual (Duensing & Huang, ISMRM Toronto 2008). 4.49 Despite choice oftransform, there seems yet to have been a breakthrough of the 30dB barrier.Application of compressed sensing to MRI, therefore, remains fertile in 2008for continued research.4.46 Experiments with real-life images were performed by Christine Law at Lucas Centerfor Imaging, Stanford University.4.47 Noise considered here is due only to the reconstruction process itself; id est, noisein excess of that produced by the best reconstruction of an image from a complete setof samples in the sense of Shannon. At less than 30dB image/error, artifacts generallyremain visible to the naked eye. We estimate about 50dB is required to eliminate noticeabledistortion in a visual A/B comparison.4.48 In vascular radiology, diagnoses are almost exclusively based on morphology of vesselsand, in particular, presence of stenoses. There is a compelling argument for total-variationreconstruction of magnetic resonance angiogram because it helps isolate structures ofparticular interest.4.49 I have never calculated the PSNR of these reconstructed images [of Barbara].−Jean-Luc StarckThe sparsity of the image is the percentage of transform coefficients sufficient fordiagnostic-quality reconstruction. Of course the term “diagnostic quality” is subjective....I have yet to see an “objective” measure of image quality. Difference images, in myexperience, definitely do not tell the whole story. Often I would show people some of myresults and get mixed responses, but when I add artificial Gaussian noise to an image,often people say that it looks better.−Michael Lustig


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 375

Lagrangian form of compressed sensing in imaging

We now repeat Candès' image reconstruction experiment from 2004 which led to discovery of sparse sampling theorems. [70, 1.2] But we achieve perfect reconstruction with an algorithm based on vanishing gradient of a compressed sensing problem's Lagrangian, which is computationally efficient. Our contraction method (p.381) is fast also because matrix multiplications are replaced by fast Fourier transforms and number of constraints is cut in half by sampling symmetrically. Convex iteration for cardinality minimization (4.5) is incorporated which allows perfect reconstruction of a phantom at 4.1% subsampling rate; 50% Candès' rate. By making neighboring-pixel selection adaptive, convex iteration reduces discrete image-gradient sparsity of the Shepp-Logan phantom to 1.9%; 33% lower than previously reported.

We demonstrate application of discrete image-gradient sparsification to the n×n = 256×256 Shepp-Logan phantom, simulating idealized acquisition of MRI data by radial sampling in the Fourier domain (Figure 108). 4.50 Define a Nyquist-centric discrete Fourier transform (DFT) matrix

$$
F \;\triangleq\; \frac{1}{\sqrt{n}}\begin{bmatrix}
1 & 1 & 1 & 1 & \cdots & 1\\
1 & e^{-j2\pi/n} & e^{-j4\pi/n} & e^{-j6\pi/n} & \cdots & e^{-j(n-1)2\pi/n}\\
1 & e^{-j4\pi/n} & e^{-j8\pi/n} & e^{-j12\pi/n} & \cdots & e^{-j(n-1)4\pi/n}\\
1 & e^{-j6\pi/n} & e^{-j12\pi/n} & e^{-j18\pi/n} & \cdots & e^{-j(n-1)6\pi/n}\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
1 & e^{-j(n-1)2\pi/n} & e^{-j(n-1)4\pi/n} & e^{-j(n-1)6\pi/n} & \cdots & e^{-j(n-1)^2 2\pi/n}
\end{bmatrix}\in\mathbb{C}^{n\times n} \tag{836}
$$

a symmetric (nonHermitian) unitary matrix characterized

$$
F = F^{\mathrm{T}}, \qquad F^{-1} = F^{\mathrm{H}} \tag{837}
$$

Denoting an unknown image U ∈ R^{n×n}, its two-dimensional discrete Fourier transform F is

$$
\mathcal{F}(U) \;\triangleq\; F\,U F \tag{838}
$$

hence the inverse discrete transform

$$
U = F^{\mathrm{H}}\mathcal{F}(U)\,F^{\mathrm{H}} \tag{839}
$$

4.50 k-space is conventional acquisition terminology indicating domain of the continuous raw data provided by an MRI machine. An image is reconstructed by inverse discrete Fourier transform of that data interpolated on a Cartesian grid in two dimensions.
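For readers who prefer to verify the unitary two-dimensional transform (838) numerically, the following Matlab sketch (an illustrative check, not from the text; the size n is arbitrary) confirms that F(U) = FUF coincides with the built-in fft2 up to normalization, and that (839) inverts it:

```matlab
% Minimal numerical check of (838)/(839); n is an arbitrary small size.
n = 8;
w = exp(-1j*2*pi/n);
F = w.^((0:n-1)'*(0:n-1)) / sqrt(n);   % Nyquist-centric unitary DFT matrix (836)
U = rand(n);                           % an arbitrary real "image"
FU = F*U*F;                            % two-dimensional transform (838)
norm(FU - fft2(U)/n, 'fro')            % ~1e-15 : F*U*F equals fft2(U)/n
norm(U - F'*FU*F', 'fro')              % ~1e-15 : inverse transform (839)
```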


376 CHAPTER 4. SEMIDEFINITE PROGRAMMINGΦFigure 108: MRI radial sampling pattern, in DC-centric Fourier domain,representing 4.1% (10 lines) subsampled data. Only half of these complexsamples, in any halfspace about the origin in theory, need be acquiredfor a real image because of conjugate symmetry. Due to MRI machineimperfections, samples are generally taken over full extent of each radialline segment. MRI acquisition time is proportional to number of lines.FromA.1.1 no.31 we have a vectorized two-dimensional DFT via Kroneckerproduct ⊗vecF(U) (F ⊗F ) vec U (840)and from (839) its inverse [166, p.24]vec U = (F H ⊗F H )(F ⊗F ) vec U = (F H F ⊗ F H F ) vec U (841)Idealized radial sampling in the Fourier domain can be simulated byHadamard product ◦ with a binary mask Φ∈ R n×n whose nonzero entriescould, for example, correspond with the radial line segments in Figure 108.To make the mask Nyquist-centric, like DFT matrix F , define a circulant[168] symmetric permutation matrix 4.51Θ [ 0 II 0]∈ S n (842)Then given subsampled Fourier domain (MRI k-space) measurements in4.51 Matlab fftshift()


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 377incomplete K ∈ C n×n , we might constrain F(U) thus:and in vector form, (42) (1787)ΘΦΘ ◦ F UF = K (843)δ(vec ΘΦΘ)(F ⊗F ) vec U = vec K (844)Because measurements K are complex, there are actually twice the numberof equality constraints as there are measurements.We can cut that number of constraints in half via vertical and horizontalmask Φ symmetry which forces the imaginary inverse transform to 0 : Theinverse subsampled transform in matrix form isand in vector formF H (ΘΦΘ ◦ F UF)F H = F H KF H (845)(F H ⊗F H )δ(vec ΘΦΘ)(F ⊗F ) vec U = (F H ⊗F H ) vec K (846)later abbreviatedP vec U = f (847)whereP (F H ⊗F H )δ(vec ΘΦΘ)(F ⊗F ) ∈ C n2 ×n 2 (848)Because of idempotence P = P 2 , P is a projection matrix. Because of itsHermitian symmetry [166, p.24]P = (F H ⊗F H )δ(vec ΘΦΘ)(F ⊗F ) = (F ⊗F ) H δ(vec ΘΦΘ)(F H ⊗F H ) H = P H(849)P is an orthogonal projector. 4.52 P vec U is real when P is real; id est, whenfor positive even integer n[Φ =]Φ 11 Φ(1, 2:n)Ξ∈ R n×n (850)ΞΦ(2:n, 1) ΞΦ(2:n, 2:n)Ξwhere Ξ∈ S n−1 is the order-reversing permutation matrix (1728). In words,this necessary and sufficient condition on Φ (for a real inverse subsampled


378 CHAPTER 4. SEMIDEFINITE PROGRAMMING

[Figure 109 appears here: the aliased image vec⁻¹ f.]

Figure 109: Aliasing of Shepp-Logan phantom in Figure 107 resulting from k-space subsampling pattern in Figure 108. This image is real because binary mask Φ is vertically and horizontally symmetric. It is remarkable that the phantom can be reconstructed, by convex iteration, given only U₀ = vec⁻¹ f.

Define

$$
\Delta \;\triangleq\; \begin{bmatrix}
1 & 0 & & & 0\\
-1 & 1 & 0 & &\\
& -1 & 1 & \ddots &\\
& & \ddots & \ddots & 0\\
\mathbf{0}^{\mathrm{T}} & & & -1 & 1
\end{bmatrix}\in\mathbb{R}^{n\times n} \tag{851}
$$

Express an image-gradient estimate

$$
\nabla U \;\triangleq\; \begin{bmatrix} U\,\Delta\\ U\,\Delta^{\mathrm{T}}\\ \Delta\, U\\ \Delta^{\mathrm{T}} U \end{bmatrix}\in\mathbb{R}^{4n\times n} \tag{852}
$$


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 379

that is a simple first-order difference of neighboring pixels (Figure 110) to the right, left, above, and below. 4.54 By A.1.1 no.31, its vectorization: for Ψ_i ∈ R^{n²×n²}

$$
\operatorname{vec}\nabla U \;=\;
\begin{bmatrix} \Delta^{\mathrm{T}}\otimes I\\ \Delta\otimes I\\ I\otimes\Delta\\ I\otimes\Delta^{\mathrm{T}} \end{bmatrix}\operatorname{vec} U
\;\triangleq\;
\begin{bmatrix} \Psi_1\\ \Psi_1^{\mathrm{T}}\\ \Psi_2\\ \Psi_2^{\mathrm{T}} \end{bmatrix}\operatorname{vec} U
\;\triangleq\; \Psi\operatorname{vec} U \;\in\mathbb{R}^{4n^2} \tag{853}
$$

where Ψ ∈ R^{4n²×n²}. A total-variation minimization for reconstructing MRI image U, that is known suboptimal [207] [71], may be concisely posed

$$
\begin{array}{cl}
\underset{U}{\text{minimize}} & \|\Psi\operatorname{vec} U\|_1\\
\text{subject to} & P\operatorname{vec} U = f
\end{array} \tag{854}
$$

where

$$
f = (F^{\mathrm{H}}\otimes F^{\mathrm{H}})\operatorname{vec} K \;\in\mathbb{C}^{n^2} \tag{855}
$$

is the known inverse subsampled Fourier data (a vectorized aliased image, Figure 109), and where a norm of discrete image-gradient ∇U is equivalently expressed as norm of a linear transformation Ψ vec U.

Although this simple problem statement (854) is equivalent to a linear program (3.2), its numerical solution is beyond the capability of even the most highly regarded of contemporary commercial solvers. 4.55 Our only recourse is to recast the problem in Lagrangian form (3.1.2.1.2) and write customized code to solve it:

$$
\begin{array}{cl}
\underset{U}{\text{minimize}} & \langle|\Psi\operatorname{vec} U|\,,\,y\rangle\\
\text{subject to} & P\operatorname{vec} U = f
\end{array} \tag{856a}
$$

$$
\equiv\qquad
\underset{U}{\text{minimize}}\;\; \langle|\Psi\operatorname{vec} U|\,,\,y\rangle + \tfrac{1}{2}\lambda\,\|P\operatorname{vec} U - f\|_2^2 \tag{856b}
$$

4.54 There is significant improvement in reconstruction quality by augmentation of a nominally two-point discrete image-gradient estimate to four points per pixel by inclusion of two polar directions. Improvement is due to centering; symmetry of discrete differences about a central pixel. We find small improvement on real-life images, ≈1dB empirically, by further augmentation with diagonally adjacent pixel differences.
4.55 for images as small as 128×128 pixels. Obstacle to numerical solution is not a computer resource: e.g., execution time, memory. The obstacle is, in fact, inadequate numerical precision. Even when all dependent equality constraints are manually removed, the best commercial solvers fail simply because computer numerics become nonsense; id est, numerical errors enter significant digits and the algorithm exits prematurely, loops indefinitely, or produces an infeasible solution.


380 CHAPTER 4. SEMIDEFINITE PROGRAMMINGFigure 110: Neighboring-pixel stencil [356] for image-gradient estimation onCartesian grid. Implementation selects adaptively from darkest four • aboutcentral. Continuous image-gradient from two pixels holds only in a limit. Fordiscrete differences, better practical estimates are obtained when centered.where multiobjective parameter λ∈ R + is quite large (λ≈1E8) so as to enforcethe equality constraint: P vec U −f = 0 ⇔ ‖P vec U −f‖ 2 2=0 (A.7.1). Weintroduce a direction vector y ∈ R 4n2+ as part of a convex iteration (4.5.2)to overcome that known suboptimal minimization of discrete image-gradientcardinality: id est, there exists a vector y ⋆ with entries yi ⋆ ∈ {0, 1} such thatminimize ‖Ψ vec U‖ 0Usubject to P vec U = f≡minimizeU〈|Ψ vec U| , y ⋆ 〉 + 1 2 λ‖P vec U − f‖2 2(857)Existence of such a y ⋆ , complementary to an optimal vector Ψ vec U ⋆ , isobvious by definition of global optimality 〈|Ψ vec U ⋆ | , y ⋆ 〉= 0 (758) underwhich a cardinality-c optimal objective ‖Ψ vec U ⋆ ‖ 0 is assumed to exist.Because (856b) is an unconstrained convex problem, a zero in theobjective function’s gradient is necessary and sufficient for optimality(2.13.3); id est, (D.2.1)Ψ T δ(y) sgn(Ψ vec U) + λP H (P vec U − f) = 0 (858)Because of P idempotence and Hermitian symmetry and sgn(x)= x/|x| ,


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 381

this is equivalent to

$$
\Bigl(\lim_{\epsilon\to 0}\,\Psi^{\mathrm{T}}\delta(y)\,\delta(|\Psi\operatorname{vec} U| + \epsilon\mathbf{1})^{-1}\Psi \;+\; \lambda P\Bigr)\operatorname{vec} U \;=\; \lambda P f \tag{859}
$$

where small positive constant ǫ ∈ R₊ has been introduced for invertibility. The analytical effect of introducing ǫ is to make the objective's gradient valid everywhere in the function domain; id est, it transforms absolute value in (856b) from a function differentiable almost everywhere into a differentiable function. An example of such a transformation in one dimension is illustrated in Figure 111. When small enough for practical purposes 4.56 (ǫ ≈ 1E-3), we may ignore the limiting operation. Then the mapping, for 0 ≼ y ≼ 1

$$
\operatorname{vec} U_{t+1} \;=\; \bigl(\Psi^{\mathrm{T}}\delta(y)\,\delta(|\Psi\operatorname{vec} U_t| + \epsilon\mathbf{1})^{-1}\Psi + \lambda P\bigr)^{-1}\lambda P f \tag{860}
$$

is a contraction in U_t that can be solved recursively in t for its unique fixed point; id est, until U_{t+1} → U_t. [227, p.300] [203, p.155] Calculating this inversion directly is not possible for large matrices on contemporary computers because of numerical precision, so instead we apply the conjugate gradient method of solution to

$$
\bigl(\Psi^{\mathrm{T}}\delta(y)\,\delta(|\Psi\operatorname{vec} U_t| + \epsilon\mathbf{1})^{-1}\Psi + \lambda P\bigr)\operatorname{vec} U_{t+1} \;=\; \lambda P f \tag{861}
$$

which is linear in U_{t+1} at each recursion in the Matlab program. 4.57

Observe that P (848), in the equality constraint from problem (856a), is not a fat matrix. 4.58 Although number of Fourier samples taken is equal to the number of nonzero entries in binary mask Φ, matrix P is square but never actually formed during computation. Rather, a two-dimensional fast Fourier transform of U is computed followed by masking with ΘΦΘ and then an inverse fast Fourier transform. This technique significantly reduces memory requirements and, together with contraction method of solution, is the principal reason for relatively fast computation.

4.56 We are looking for at least 50dB image/error ratio from only 4.1% subsampled data (10 radial lines in k-space). With this setting of ǫ, we actually attain in excess of 100dB from a simple Matlab program in about a minute on a 2006 vintage laptop Core 2 CPU (Intel T7400@2.16GHz, 666MHz FSB). By trading execution time and treating discrete image-gradient cardinality as a known quantity for this phantom, over 160dB is achievable.
4.57 Conjugate gradient method requires positive definiteness. [152, 4.8.3.2]
4.58 Fat is typical of compressed sensing problems; e.g., [69] [76].
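The following Matlab fragment (an illustrative sketch, not the Wıκımization program) shows how one recursion of (861) might be carried out matrix-free with the built-in conjugate gradient solver pcg, applying P via fft2/ifft2 and the mask M = ΘΦΘ; the names M, Psi, y, lambda, eps0, f, and the current iterate vecU are assumed already defined:

```matlab
% One contraction step (861), matrix-free.  Assumes: mask M = Theta*Phi*Theta (n x n,
% symmetric per (850)), sparse gradient matrix Psi (4n^2 x n^2), direction vector y,
% weights lambda and eps0, data vector f, and current iterate vecU (n^2 x 1).
n2 = numel(vecU);  n = sqrt(n2);
Pop = @(v) reshape(real(ifft2(M .* fft2(reshape(v,n,n)))), n2, 1);  % v -> P*v (real by symmetry of M)
d   = y ./ (abs(Psi*vecU) + eps0);                                  % delta(y)*delta(|Psi vecU|+eps1)^-1
Afun = @(v) Psi'*(d .* (Psi*v)) + lambda*Pop(v);                    % left-hand operator of (861)
vecU = pcg(Afun, lambda*Pop(f), 1e-6, 200, [], [], vecU);           % solve for vec U_{t+1}
```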


382 CHAPTER 4. SEMIDEFINITE PROGRAMMING

[Figure 111 appears here: the curve |x| superimposed upon the smoothed curve ∫₋₁ˣ y/(|y|+ǫ) dy on the interval [−1, 1].]

Figure 111: Real absolute value function f₂(x) = |x| on x ∈ [−1, 1] from Figure 68b superimposed upon integral of its derivative at ǫ = 0.05 which smooths objective function.

convex iteration
By convex iteration we mean alternation of solution to (856a) and (862) until convergence. Direction vector y is initialized to 1 until the first fixed point is found; which means, the contraction recursion begins calculating a (1-norm) solution U⋆ to (854) via problem (856b). Once U⋆ is found, vector y is updated according to an estimate of discrete image-gradient cardinality c: Sum of the 4n²−c smallest entries of |Ψ vec U⋆| ∈ R^{4n²} is the optimal objective value from a linear program, for 0 ≤ c ≤ 4n²−1 (525)

$$
\sum_{i=c+1}^{4n^2} \pi\bigl(|\Psi\operatorname{vec} U^\star|\bigr)_i \;=\;
\begin{array}[t]{cl}
\underset{y\,\in\,\mathbb{R}^{4n^2}}{\text{minimize}} & \langle|\Psi\operatorname{vec} U^\star|\,,\,y\rangle\\
\text{subject to} & 0 \preceq y \preceq \mathbf{1}\\
& y^{\mathrm{T}}\mathbf{1} = 4n^2 - c
\end{array} \tag{862}
$$

where π is the nonlinear permutation-operator sorting its vector argument into nonincreasing order. An optimal solution y to (862), that is an extreme point of its feasible set, is known in closed form: it has 1 in each entry corresponding to the 4n²−c smallest entries of |Ψ vec U⋆| and has 0 elsewhere (page 336). Updated image U⋆ is assigned to U_t, the contraction is recomputed solving (856b), direction vector y is updated again, and so on until convergence which is guaranteed by virtue of a monotonically nonincreasing real sequence of objective values in (856a) and (862).
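That closed-form solution to (862) amounts to a sort; a minimal Matlab sketch (illustrative, not from the text; Psi, vecU, and the cardinality estimate c are assumed given):

```matlab
% Closed-form direction vector for (862): 1 on the 4n^2-c smallest entries
% of g = |Psi*vecU|, 0 on the c largest.
g = abs(Psi*vecU);                 % magnitudes of the discrete image-gradient
[~, order] = sort(g, 'descend');   % pi(): nonincreasing ordering
y = ones(size(g));
y(order(1:c)) = 0;                 % zero out the c largest entries
objective = g'*y;                  % equals sum of the 4n^2-c smallest entries of g
```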


4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 383There are two features that distinguish problem formulation (856b) andour particular implementation of it [Matlab on Wıκımization]:1) An image-gradient estimate may engage any combination of fouradjacent pixels. In other words, the algorithm is not lockedinto a four-point gradient estimate (Figure 110); number of pointsconstituting an estimate is directly determined by direction vectory . 4.59 Indeed, we find only c = 5092 zero entries in y ⋆ for theShepp-Logan phantom; meaning, discrete image-gradient sparsity isactually closer to 1.9% than the 3% reported elsewhere; e.g., [355,IIB].2) Numerical precision of the fixed point of contraction (860) (≈1E-2for almost perfect reconstruction @−103dB error) is a parameter tothe implementation; meaning, direction vector y is updated aftercontraction begins but prior to its culmination. Impact of thisidiosyncrasy tends toward simultaneous optimization in variables Uand y while insuring y settles on a boundary point of its feasible set(nonnegative hypercube slice) in (862) at every iteration; for only aboundary point 4.60 can yield the sum of smallest entries in |Ψ vec U ⋆ |.Almost perfect reconstruction of the Shepp-Logan phantom at 103dBimage/error is achieved in a Matlab minute with 4.1% subsampled data(2671 complex samples); well below an 11% least lower bound predicted bythe sparse sampling theorem. Because reconstruction approaches optimalsolution to a 0-norm problem, minimum number of Fourier-domain samplesis bounded below by cardinality of discrete image-gradient at 1.9%. 4.6.0.0.13 Exercise. Contraction operator.Determine conditions on λ and ǫ under which (860) is a contraction andΨ T δ(y)δ(|Ψ vec U t | + ǫ1) −1 Ψ + λP from (861) is positive definite. 4.59 This adaptive gradient was not contrived. It is an artifact of the convex iterationmethod for minimal cardinality solution; in this case, cardinality minimization of a discreteimage-gradient.4.60 Simultaneous optimization of these two variables U and y should never be a pinnacleof aspiration; for then, optimal y might not attain a boundary point.


384 CHAPTER 4. SEMIDEFINITE PROGRAMMING4.7 Constraining rank of indefinite matricesExample 4.7.0.0.1, which follows, demonstrates that convex iteration is moregenerally applicable: to indefinite or nonsquare matrices X ∈ R m×n ; not onlyto positive semidefinite matrices. Indeed,findX∈R m×nXsubject to X ∈ CrankX ≤ k≡findX, Y, ZXsubject to X ∈ C[ ] Y XG =X T ZrankG ≤ k(863)Proof. rankG ≤ k ⇒ rankX ≤ k because X is the projection of compositematrix G on subspace R m×n . For symmetric Y and Z , any rank-k positivesemidefinite composite matrix G can be factored into rank-k terms R ;G = R T R where R [B C ] and rankB, rankC ≤ rankR and B ∈ R k×mand C ∈ R k×n . Because Y and Z and X = B T C are variable, (1465)rankX ≤ rankB, rankC ≤ rankR = rankG is tight.So there must exist an optimal direction vector W ⋆ such thatfindX, Y, ZXsubject to X ∈ C[ ] Y XG =X T ZrankG ≤ k≡minimizeX, Y, Z〈G, W ⋆ 〉subject to X ∈ C[ Y XG =X T Z]≽ 0(864)Were W ⋆ = I , by (1688) the optimal resulting trace objective would beequivalent to the minimization of nuclear norm of X over C . This means:(confer p.230) Any nuclear norm minimization problem may have itsvariable replaced with a composite semidefinite variable of the sameoptimal rank but doubly dimensioned.Then Figure 84 becomes an accurate geometrical description of a consequentcomposite semidefinite problem objective. But there are better directionvectors than identity I which occurs only under special conditions:


4.7. CONSTRAINING RANK OF INDEFINITE MATRICES 3854.7.0.0.1 Example. Compressed sensing, compressive sampling. [300]As our modern technology-driven civilization acquires and exploitsever-increasing amounts of data, everyone now knows that most of thedata we acquire can be thrown away with almost no perceptual loss − witnessthe broad success of lossy compression formats for sounds, images, andspecialized technical data. The phenomenon of ubiquitous compressibilityraises very natural questions: Why go to so much effort to acquire all thedata when most of what we get will be thrown away? Can’t we just directlymeasure the part that won’t end up being thrown away? −David Donoho [118]Lossy data compression techniques like JPEG are popular, but it isalso well known that compression artifacts become quite perceptible withsignal postprocessing that goes beyond mere playback of a compressedsignal. [220] [245] Spatial or audio frequencies presumed masked by asimultaneity are not encoded, for example, so rendered imperceptible evenwith significant postfiltering (of the compressed signal) that is meant toreveal them; id est, desirable artifacts are forever lost, so highly compresseddata is not amenable to analysis and postprocessing: e.g., sound effects [97][98] [99] or image enhancement (Photoshop). 4.61 Further, there can be nouniversally acceptable unique metric of perception for gauging exactly howmuch data can be tossed. For these reasons, there will always be need forraw (noncompressed) data.In this example we throw out only so much information as to leave perfectreconstruction within reach. Specifically, the MIT logo in Figure 112 isperfectly reconstructed from 700 time-sequential samples {y i } acquired bythe one-pixel camera illustrated in Figure 113. The MIT-logo image in thisexample impinges a 46×81 array micromirror device. This mirror arrayis modulated by a pseudonoise source that independently positions all theindividual mirrors. A single photodiode (one pixel) integrates incidentlight from all mirrors. After stabilizing the mirrors to a fixed butpseudorandom pattern, light so collected is then digitized into one sampley i by analog-to-digital (A/D) conversion. This sampling process is repeatedwith the micromirror array modulated to a new pseudorandom pattern.The most important questions are: How many samples do we need for4.61 As simple a process as upward scaling of signal amplitude or image size will alwaysintroduce noise; even to a noncompressed signal. But scaling-noise is particularlynoticeable in a JPEG-compressed image; e.g., text or any sharp edge.


386 CHAPTER 4. SEMIDEFINITE PROGRAMMING

[Figure 112 is an image of the MIT logo (Scene Y). Figure 113 is a block diagram of the one-pixel camera, showing Scene Y and the measurement sequence y i .]

Figure 112: Massachusetts Institute of Technology (MIT) logo, including its white boundary, may be interpreted as a rank-5 matrix. (Stanford University logo rank is much higher.) This constitutes Scene Y observed by the one-pixel camera in Figure 113 for Example 4.7.0.0.1.

Figure 113: One-pixel camera. [344] [370] Compressive imaging camera block diagram. Incident lightfield (corresponding to the desired image Y ) is reflected off a digital micromirror device (DMD) array whose mirror orientations are modulated in the pseudorandom pattern supplied by the random number generators (RNG). Each different mirror pattern produces a voltage at the single photodiode that corresponds to one measurement y i .
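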


4.7. CONSTRAINING RANK OF INDEFINITE MATRICES 387perfect reconstruction? Does that number of samples represent compressionof the original data?We claim that perfect reconstruction of the MIT logo can be reliablyachieved with as few as 700 samples y=[y i ]∈ R 700 from this one-pixelcamera. That number represents only 19% of information obtainable from3726 micromirrors. 4.62 (Figure 114)Our approach to reconstruction is to look for low-rank solution to anunderdetermined system:find XX ∈ R 46×81subject to A vec X = yrankX ≤ 5(865)where vec X is the vectorized (37) unknown image matrix. Each row offat matrix A is one realization of a pseudorandom pattern applied to themicromirrors. Since these patterns are deterministic (known), then the i thsample y i equals A(i, :) vec Y ; id est, y = A vec Y . Perfect reconstructionhere means optimal solution X ⋆ equals scene Y ∈ R 46×81 to within machineprecision.Because variable matrix X is generally not square or positive semidefinite,we constrain its rank by rewriting the problem equivalentlyfindW 1 ∈ R 46×46 , W 2 ∈ R 81×81 , X ∈ R 46×81subject toXA vec[X = y]W1 XrankX T ≤ 5W 2(866)This rank constraint on the composite (block) matrix insures rank X ≤ 5for any choice of dimensionally compatible matrices W 1 and W 2 . But tosolve this problem by convex iteration, we alternate solution of semidefinite4.62 That number (700 samples) is difficult to achieve, as reported in [300,6]. If a minimalbasis for the MIT logo were instead constructed, only five rows or columns worth ofdata (from a 46×81 matrix) are linearly independent. This means a lower bound onachievable compression is about 5×46 = 230 samples plus 81 samples column encoding;which corresponds to 8% of the original information. (Figure 114)


388 CHAPTER 4. SEMIDEFINITE PROGRAMMING1 2 3 4 5samples0 1000 2000 3000 3726Figure 114: Estimates of compression for various encoding methods:1) linear interpolation (140 samples),2) minimal columnar basis (311 samples),3) convex iteration (700 samples) can achieve lower bound predicted bycompressed sensing (670 samples, n=46×81, k =140, Figure 100) whereasnuclear norm minimization alone does not [300,6],4) JPEG @100% quality (2588 samples),5) no compression (3726 samples).programwith semidefinite programminimizeW 1 ∈ S 46 , W 2 ∈ S 81 , X ∈ R 46×81subject tominimizeZ ∈ S 46+81([ ] )W1 XtrX T ZW 2A vec X = y[ ]W1 XX T ≽ 0W 2([ ] ⋆ )W1 XtrX T ZW 2subject to 0 ≼ Z ≼ ItrZ = 46 + 81 − 5(867)(868)(which has an optimal solution that is known in closed form, p.658) until arank-5 composite matrix is found.With 1000 samples {y i } , convergence occurs in two iterations; 700samples require more than ten iterations but reconstruction remainsperfect. Iterating more admits taking of fewer samples. Reconstruction isindependent of pseudorandom sequence parameters; e.g., binary sequencessucceed with the same efficiency as Gaussian or uniformly distributedsequences.
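Here is a hedged sketch (assuming CVX is installed) of one pass of the alternation (867)/(868); the 700×3726 matrix A , data y , and current direction matrix Z are assumed given, with Z initialized for example to the identity (the trace heuristic discussed earlier). The closed form cited for (868) is realized by an eigenvalue decomposition of the optimal composite matrix.

% One pass of convex iteration (867)/(868); A (700 x 3726), y, and the
% current direction matrix Z (127 x 127) are assumed given.
cvx_begin sdp
    variable X(46,81)
    variable W1(46,46) symmetric
    variable W2(81,81) symmetric
    minimize( trace([W1, X; X', W2]*Z) )     % objective of (867)
    subject to
        A*X(:) == y                          % measurement constraint
        [W1, X; X', W2] >= 0                 % composite matrix positive semidefinite
cvx_end
% closed-form solution of (868): Z projects onto eigenvectors belonging to
% the 46+81-5 smallest eigenvalues of the optimal composite matrix
G = [W1, X; X', W2];
[Q, L] = eig((G + G')/2);
[~, idx] = sort(diag(L), 'descend');
U = Q(:, idx(6:end));                        % discard the 5 principal eigenvectors
Z = U*U';                                    % next direction matrix; iterate until rank X = 5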


4.7. CONSTRAINING RANK OF INDEFINITE MATRICES 3894.7.1 rank-constraint midsummaryWe find that this direction matrix idea works well and quite independentlyof desired rank or affine dimension. This idea of direction matrix is goodprincipally because of its simplicity: When confronted with a problemotherwise convex if not for a rank or cardinality constraint, then thatconstraint becomes a linear regularization term in the objective.There exists a common thread through all these Examples; that being,convex iteration with a direction matrix as normal to a linear regularization(a generalization of the well-known trace heuristic). But each problem type(per Example) possesses its own idiosyncrasies that slightly modify how arank-constrained optimal solution is actually obtained: The ball packingproblem in Chapter 5.4.2.3.4, for example, requires a problem sequence ina progressively larger number of balls to find a good initial value for thedirection matrix, whereas many of the examples in the present chapterrequire an initial value of 0. Finding a Boolean solution in Example 4.6.0.0.8requires a procedure to detect stalls, while other problems have no suchrequirement. The combinatorial Procrustes problem in Example 4.6.0.0.3allows use of a known closed-form solution for direction vector when solvedvia rank constraint, but not when solved via cardinality constraint. Someproblems require a careful weighting of the regularization term, whereas otherproblems do not, and so on. It would be nice if there were a universallyapplicable method for constraining rank; one that is less susceptible to quirksof a particular problem type.Poor initialization of the direction matrix from the regularization canlead to an erroneous result. We speculate one reason to be a simple dearthof optimal solutions of desired rank or cardinality; 4.63 an unfortunate choiceof initial search direction leading astray. Ease of solution by convex iterationoccurs when optimal solutions abound. With this speculation in mind, wenow propose a further generalization of convex iteration for constraining rankthat attempts to ameliorate quirks and unify problem types:4.63 Recall that, in <strong>Convex</strong> <strong>Optimization</strong>, an optimal solution generally comes from aconvex set of optimal solutions that can be large.


390 CHAPTER 4. SEMIDEFINITE PROGRAMMING4.8 <strong>Convex</strong> Iteration rank-1We now develop a general method for constraining rank that first decomposesa given problem via standard diagonalization of matrices (A.5). Thismethod is motivated by observation (4.4.1.1) that an optimal directionmatrix can be simultaneously diagonalizable with an optimal variablematrix. This suggests minimization of an objective function directly interms of eigenvalues. A second motivating observation is that variableorthogonal matrices seem easily found by convex iteration; e.g., ProcrustesExample 4.6.0.0.2.It turns out that this general method always requires solution to a rank-1constrained problem regardless of desired rank from the original problem. Todemonstrate, we pose a semidefinite feasibility problemfindX ∈ S nsubject to A svec X = bX ≽ 0rankX ≤ ρ(869)given an upper bound 0 < ρ < n on rank, vector b ∈ R m , and typically fatfull-rank⎡ ⎤svec(A 1 ) TA = ⎣ . ⎦∈ R m×n(n+1)/2 (650)svec(A m ) Twhere A i ∈ S n , i=1... m are also given. Symmetric matrix vectorizationsvec is defined in (56). Thus⎡ ⎤tr(A 1 X)A svec X = ⎣ . ⎦ (651)tr(A m X)This program (869) states the classical problem of finding a matrix ofspecified rank in the intersection of the positive semidefinite cone witha number m of hyperplanes in the subspace of symmetric matrices S n .[26,II.13] [24,2.2]Express the nonincreasingly ordered diagonalization of variable matrixn∑X QΛQ T = λ i Q ii ∈ S n (870)i=1


4.8. CONVEX ITERATION RANK-1 391which is a sum of rank-1 orthogonal projection matrices Q ii weighted byeigenvalues λ i where Q ij q i q T j ∈ R n×n , Q = [q 1 · · · q n ]∈ R n×n , Q T = Q −1 ,Λ ii = λ i ∈ R , and⎡Λ = ⎢⎣⎤λ 1 0λ 2 ⎥ ... ⎦ ∈ Sn (871)0 T λ nThe factΛ ≽ 0 ⇔ X ≽ 0 (1450)allows splitting semidefinite feasibility problem (869) into two parts:( ρ) minimizeΛ∥ A svec ∑λ i Q ii − b∥i=1[ Rρ] (872)+subject to δ(Λ) ∈0( ρ) minimizeQ∥ A svec ∑λ ⋆ i Q ii − b∥i=1(873)subject to Q T = Q −1The linear equality constraint A svec X = b has been moved to the objectivewithin a norm because these two problems (872) (873) are iterated; equalitymight only become attainable near convergence. This iteration alwaysconverges to a local minimum because the sequence of objective values ismonotonic and nonincreasing; any monotonically nonincreasing real sequenceconverges. [258,1.2] [42,1.1] A rank ρ matrix X solving the originalproblem (869) is found when the objective converges to 0 ; a certificate ofglobal optimality for the iteration. Positive semidefiniteness of matrix Xwith an upper bound ρ on rank is assured by the constraint on eigenvaluematrix Λ in convex problem (872); it means, the main diagonal of Λ mustbelong to the nonnegative orthant in a ρ-dimensional subspace of R n .The second problem (873) in the iteration is not convex. We propose


392 CHAPTER 4. SEMIDEFINITE PROGRAMMINGsolving it by convex iteration: Make the assignment⎡ ⎤q 1 [ q1 T · · · qρ T ]G = ⎣.⎦∈ S nρq ρ⎡⎤ ⎡Q 11 · · · Q 1ρ q 1 q1 T · · · q 1 qρT= ⎣ .... . ⎦⎣.... .Q T 1ρ · · · Q ρρ q ρ q1 T · · · q ρ qρT⎤⎦(874)Given ρ eigenvalues λ ⋆ i , then a problem equivalent to (873) is∥ ( ∥∥∥ ρ) ∑minimize A svec λ ⋆Q ii ∈S n , Q ij ∈R n×ni Q ii − b∥i=1subject to tr Q ii = 1, i=1... ρtr Q ij = 0,i


4.8. CONVEX ITERATION RANK-1 393the latter providing direction of search W for a rank-1 matrix G in (876).An optimal solution to (877) has closed form (p.658). These convex problems(876) (877) are iterated with (872) until a rank-1 G matrix is found and theobjective of (876) vanishes.For a fixed number of equality constraints m and upper bound ρ , thefeasible set in rank-constrained semidefinite feasibility problem (869) growswith matrix dimension n . Empirically, this semidefinite feasibility problembecomes easier to solve by method of convex iteration as n increases. Wefind a good value of weight w ≈ 5 for randomized problems. Initial value ofdirection matrix W is not critical. With dimension n = 26 and numberof equality constraints m = 10, convergence to a rank-ρ = 2 solution tosemidefinite feasibility problem (869) is achieved for nearly all realizationsin less than 10 iterations. 4.64 For small n , stall detection is required; onenumerical implementation is disclosed on Wıκımization.This diagonalization decomposition technique is extensible to otherproblem types, of course, including optimization problems having nontrivialobjectives. Because of a greatly expanded feasible set, for example,find X ∈ S nsubject to A svec X ≽ bX ≽ 0rankX ≤ ρ(878)this relaxation of (869) [225,III] is more easily solved by convex iterationrank-1 . 4.65 Because of the nonconvex nature of a rank-constrained problem,more generally, there can be no proof of global convergence of convex iterationfrom an arbitrary initialization; although, local convergence is guaranteed byvirtue of monotonically nonincreasing real objective sequences. [258,1.2][42,1.1]4.64 This finding is significant in so far as Barvinok’s Proposition 2.9.3.0.1 predictsexistence of matrices having rank 4 or less in this intersection, but his upper bound canbe no tighter than 3.4.65 The inequality in A remains in the constraints after diagonalization.
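To make the split concrete, here is a hedged CVX sketch (CVX assumed available) of eigenvalue subproblem (872) with the orthogonal matrix Q held fixed from the previous iterate; the data A i of (650) are assumed supplied as a cell array Acell , and b , m , rho are given. The names are hypothetical.

% Eigenvalue subproblem (872) with Q fixed:  A svec( sum_i lam_i Q_ii ) = M*lam
% where M(k,i) = tr( A_k q_i q_i' ).  Acell{1..m}, b, rho are assumed given.
M = zeros(m, rho);
for k = 1:m
    for i = 1:rho
        M(k,i) = trace(Acell{k}*(Q(:,i)*Q(:,i)'));
    end
end
cvx_begin
    variable lam(rho) nonnegative            % delta(Lambda) in [ R^rho_+ ; 0 ]
    minimize( norm(M*lam - b) )
cvx_end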




Chapter 5Euclidean Distance MatrixThese results [(910)] were obtained by Schoenberg (1935), asurprisingly late date for such a fundamental property ofEuclidean geometry.−John Clifford Gower [162,3]By itself, distance information between many points in Euclidean space islacking. We might want to know more; such as, relative or absolute positionor dimension of some hull. A question naturally arising in some fields(e.g., geodesy, economics, genetics, psychology, biochemistry, engineering)[104] asks what facts can be deduced given only distance information. Whatcan we know about the underlying points that the distance informationpurports to describe? We also ask what it means when given distanceinformation is incomplete; or suppose the distance information is not reliable,available, or specified only by certain tolerances (affine inequalities). Thesequestions motivate a study of interpoint distance, well represented in anyspatial dimension by a simple matrix from linear algebra. 5.1 In what follows,we will answer some of these questions via Euclidean distance matrices.5.1 e.g.,◦√D ∈ R N×N , a classical two-dimensional matrix representation of absoluteinterpoint distance because its entries (in ordered rows and columns) can be written neatlyon a piece of paper. Matrix D will be reserved throughout to hold distance-square.2001 Jon Dattorro. co&edg version 2010.10.26. All rights reserved.citation: Dattorro, <strong>Convex</strong> <strong>Optimization</strong> & Euclidean Distance Geometry,Mεβoo Publishing USA, 2005, <strong>v2010.10.26</strong>.395


396 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX

[Figure 115 depicts three points labelled 1, 2, 3 joined by segments of length √5, 2, and 1, together with the matrix

        ⎡ 0 1 5 ⎤
    D = ⎢ 1 0 4 ⎥
        ⎣ 5 4 0 ⎦

whose rows and columns are indexed by the point labels.]

Figure 115: Convex hull of three points (N = 3) is shaded in R 3 (n = 3). Dotted lines are imagined vectors to points whose affine dimension is 2.

5.1 EDM

Euclidean space R n is a finite-dimensional real vector space having an inner product defined on it, inducing a metric. [227,3.1] A Euclidean distance matrix, an EDM in R N×N + , is an exhaustive table of distance-square d ij between points taken by pair from a list of N points {x l , l=1... N} in R n ; the squared metric, the measure of distance-square:

    d ij = ‖x i − x j ‖ 2 2 = 〈x i − x j , x i − x j 〉    (879)

Each point is labelled ordinally, hence the row or column index of an EDM, i or j =1... N , individually addresses all the points in the list. Consider the following example of an EDM for the case N = 3 :

               ⎡ d 11  d 12  d 13 ⎤   ⎡ 0    d 12  d 13 ⎤   ⎡ 0 1 5 ⎤
    D = [d ij ] = ⎢ d 21  d 22  d 23 ⎥ = ⎢ d 12  0    d 23 ⎥ = ⎢ 1 0 4 ⎥    (880)
               ⎣ d 31  d 32  d 33 ⎦   ⎣ d 13  d 23  0   ⎦   ⎣ 5 4 0 ⎦

Matrix D has N 2 entries but only N(N −1)/2 pieces of information. In Figure 115 are shown three points in R 3 that can be arranged in a list


5.2. FIRST METRIC PROPERTIES 397to correspond to D in (880). But such a list is not unique because anyrotation, reflection, or translation (5.5) of those points would produce thesame EDM D .5.2 First metric propertiesFor i,j =1... N , absolute distance between points x i and x j must satisfythe defining requirements imposed upon any metric space: [227,1.1][258,1.7] namely, for Euclidean metric √ d ij (5.4) in R n1. √ d ij ≥ 0, i ≠ j nonnegativity2. √ d ij = 0 ⇔ x i = x j self-distance3. √ d ij = √ d ji symmetry4. √ d ij ≤ √ d ik + √ d kj , i≠j ≠k triangle inequalityThen all entries of an EDM must be in concord with these Euclidean metricproperties: specifically, each entry must be nonnegative, 5.2 the main diagonalmust be 0 , 5.3 and an EDM must be symmetric. The fourth propertyprovides upper and lower bounds for each entry. Property 4 is true moregenerally when there are no restrictions on indices i,j,k , but furnishes nonew information.5.3 ∃ fifth Euclidean metric propertyThe four properties of the Euclidean metric provide information insufficientto certify that a bounded convex polyhedron more complicated than atriangle has a Euclidean realization. [162,2] Yet any list of points or thevertices of any bounded convex polyhedron must conform to the properties.5.2 Implicit from the terminology, √ d ij ≥ 0 ⇔ d ij ≥ 0 is always assumed.5.3 What we call self-distance, Marsden calls nondegeneracy. [258,1.6] Kreyszig callsthese first metric properties axioms of the metric; [227, p.4] Blumenthal refers to them aspostulates. [50, p.15]
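As a small numerical illustration, the following Matlab fragment (names incidental) checks these four properties entrywise for the example EDM (880).

% Check metric properties 1-4 entrywise for the example EDM (880).
D = [0 1 5; 1 0 4; 5 4 0];
R = sqrt(D);                                  % absolute distances sqrt(d_ij)
p1 = all(R(:) >= 0);                          % nonnegativity
p2 = all(diag(R) == 0);                       % self-distance
p3 = isequal(R, R');                          % symmetry
N = size(R,1); p4 = true;
for i = 1:N, for j = 1:N, for k = 1:N
    p4 = p4 && (R(i,j) <= R(i,k) + R(k,j));   % triangle inequality
end, end, end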


398 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX5.3.0.0.1 Example. Triangle.Consider the EDM in (880), but missing one of its entries:⎡0⎤1 d 13D = ⎣ 1 0 4 ⎦ (881)d 31 4 0Can we determine unknown entries of D by applying the metric properties?Property 1 demands √ d 13 , √ d 31 ≥ 0, property 2 requires the main diagonalbe 0, while property 3 makes √ d 31 = √ d 13 . The fourth property tells us1 ≤ √ d 13 ≤ 3 (882)Indeed, described over that closed interval [1, 3] is a family of triangularpolyhedra whose angle at vertex x 2 varies from 0 to π radians. So, yes wecan determine the unknown entries of D , but they are not unique; nor shouldthey be from the information given for this example.5.3.0.0.2 Example. Small completion problem, I.Now consider the polyhedron in Figure 116b formed from an unknownlist {x 1 ,x 2 ,x 3 ,x 4 }. The corresponding EDM less one critical piece ofinformation, d 14 , is given by⎡⎤0 1 5 d 14D = ⎢ 1 0 4 1⎥⎣ 5 4 0 1 ⎦ (883)d 14 1 1 0From metric property 4 we may write a few inequalities for the two trianglescommon to d 14 ; we find√5−1 ≤√d14 ≤ 2 (884)We cannot further narrow those bounds on √ d 14 using only the four metricproperties (5.8.3.1.1). Yet there is only one possible choice for √ d 14 becausepoints x 2 ,x 3 ,x 4 must be collinear. All other values of √ d 14 in the interval[ √ 5−1, 2] specify impossible distances in any dimension; id est, in thisparticular example the triangle inequality does not yield an interval for √ d 14over which a family of convex polyhedra can be reconstructed.
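The bounds (884) follow from ordinary interval arithmetic on the two triangles sharing edge 14; a small Matlab fragment makes the computation explicit, with numbers taken from (883).

% Interval arithmetic behind (884): bounds on sqrt(d14) from the two
% triangles {x1,x2,x4} and {x1,x3,x4} of the partial EDM (883).
r12 = 1; r24 = 1;                            % sqrt(d12), sqrt(d24)
r13 = sqrt(5); r34 = 1;                      % sqrt(d13), sqrt(d34)
lo = max(abs(r12 - r24), abs(r13 - r34));    % |a - b| <= c in each triangle
hi = min(r12 + r24, r13 + r34);              %  c <= a + b in each triangle
% yields lo = sqrt(5) - 1 and hi = 2, as in (884)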


5.3. ∃ FIFTH EUCLIDEAN METRIC PROPERTY 399

[Figure 116 shows two renderings (a) and (b) of a four-point dimensionless EDM graph on vertices x 1 , x 2 , x 3 , x 4 with labelled edge lengths √5, 1, 2, and 1.]

Figure 116: (a) Complete dimensionless EDM graph. (b) Emphasizing obscured segments x 2 x 4 , x 4 x 3 , and x 2 x 3 , now only five (2N −3) absolute distances are specified. EDM so represented is incomplete, missing d 14 as in (883), yet the isometric reconstruction (5.4.2.3.7) is unique as proved in 5.9.3.0.1 and 5.14.4.1.1. First four properties of Euclidean metric are not a recipe for reconstruction of this polyhedron.

We will return to this simple Example 5.3.0.0.2 to illustrate more elegant methods of solution in 5.8.3.1.1, 5.9.3.0.1, and 5.14.4.1.1. Until then, we can deduce some general principles from the foregoing examples:

Unknown d ij of an EDM are not necessarily uniquely determinable.

The triangle inequality does not produce necessarily tight bounds. 5.4

Four Euclidean metric properties are insufficient for reconstruction.

5.4 The term tight with reference to an inequality means equality is achievable.


400 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX5.3.1 LookaheadThere must exist at least one requirement more than the four properties ofthe Euclidean metric that makes them altogether necessary and sufficient tocertify realizability of bounded convex polyhedra. Indeed, there are infinitelymany more; there are precisely N +1 necessary and sufficient Euclideanmetric requirements for N points constituting a generating list (2.3.2). Hereis the fifth requirement:5.3.1.0.1 Fifth Euclidean metric property. Relative-angle inequality.(confer5.14.2.1.1) Augmenting the four fundamental properties of theEuclidean metric in R n , for all i,j,l ≠ k ∈{1... N}, i


5.3. ∃ FIFTH EUCLIDEAN METRIC PROPERTY 401

[Figure 117 depicts a generally irregular tetrahedron with vertices i , j , k , l and the three angles θ ikl , θ ikj , θ jkl made at vertex k .]

Figure 117: Fifth Euclidean metric property nomenclature. Each angle θ is made by a vector pair at vertex k while i,j,k,l index four points at the vertices of a generally irregular tetrahedron. The fifth property is necessary for realization of four or more points; a reckoning by three angles in any dimension. Together with the first four Euclidean metric properties, this fifth property is necessary and sufficient for realization of four points.


402 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXNow we develop some invaluable concepts, moving toward a link of theEuclidean metric properties to matrix criteria.5.4 EDM definitionAscribe points in a list {x l ∈ R n , l=1... N} to the columns of a matrixX = [x 1 · · · x N ] ∈ R n×N (76)where N is regarded as cardinality of list X . When matrix D =[d ij ] is anEDM, its entries must be related to those points constituting the list by theEuclidean distance-square: for i,j=1... N (A.1.1 no.33)d ij = ‖x i − x j ‖ 2 = (x i − x j ) T (x i − x j ) = ‖x i ‖ 2 + ‖x j ‖ 2 − 2x T ix j= [ ] [ ][ ]x T i x T I −I xij−I I x j= vec(X) T (Φ ij ⊗ I) vecX = 〈Φ ij , X T X〉(887)where⎡vec X = ⎢⎣⎤x 1x 2⎥. ⎦ ∈ RnN (888)x Nand where ⊗ signifies Kronecker product (D.1.2.1). Φ ij ⊗ I is positivesemidefinite (1482) having I ∈ S n in its ii th and jj th block of entries while−I ∈ S n fills its ij th and ji th block; id est,Φ ij δ((e i e T j + e j e T i )1) − (e i e T j + e j e T i ) ∈ S N += e i e T i + e j e T j − e i e T j − e j e T i(889)= (e i − e j )(e i − e j ) Twhere {e i ∈ R N , i=1... N} is the set of standard basis vectors. Thus eachentry d ij is a convex quadratic function (A.4.0.0.2) of vecX (37). [307,6]
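The following Matlab fragment (dimensions arbitrary, names incidental) numerically confirms the three equivalent expressions for d ij in (887), with Φ ij built per (889) and vec X realized by column stacking (888).

% Three equivalent expressions for d_ij in (887), for one arbitrary pair (i,j).
n = 3; N = 5; X = randn(n, N);               % arbitrary list
i = 2; j = 4; e = eye(N);
Phi = (e(:,i) - e(:,j))*(e(:,i) - e(:,j))';  % Phi_ij (889)
d1 = norm(X(:,i) - X(:,j))^2;                % ||x_i - x_j||^2
d2 = trace(Phi*(X'*X));                      % <Phi_ij , X'X>
d3 = X(:)'*kron(Phi, eye(n))*X(:);           % vec(X)'(Phi_ij kron I)vec(X), vec by (888)
% d1, d2, d3 agree to machine precision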


5.4. EDM DEFINITION 403The collection of all Euclidean distance matrices EDM N is a convex subsetof R N×N+ called the EDM cone (6, Figure 152 p.550);0 ∈ EDM N ⊆ S N h ∩ R N×N+ ⊂ S N (890)An EDM D must be expressible as a function of some list X ; id est, it musthave the formD(X) δ(X T X)1 T + 1δ(X T X) T − 2X T X ∈ EDM N (891)= [vec(X) T (Φ ij ⊗ I) vecX , i,j=1... N] (892)Function D(X) will make an EDM given any X ∈ R n×N , conversely, butD(X) is not a convex function of X (5.4.1). Now the EDM cone may bedescribed:EDM N = { D(X) | X ∈ R N−1×N} (893)Expression D(X) is a matrix definition of EDM and so conforms to theEuclidean metric properties:Nonnegativity of EDM entries (property 1,5.2) is obvious from thedistance-square definition (887), so holds for any D expressible in the formD(X) in (891).When we say D is an EDM, reading from (891), it implicitly meansthe main diagonal must be 0 (property 2, self-distance) and D must besymmetric (property 3); δ(D) = 0 and D T = D or, equivalently, D ∈ S N hare necessary matrix criteria.5.4.0.1 homogeneityFunction D(X) is homogeneous in the sense, for ζ ∈ R√ √◦D(ζX) = |ζ|◦D(X) (894)where the positive square root is entrywise ( ◦ ).Any nonnegatively scaled EDM remains an EDM; id est, the matrix classEDM is invariant to nonnegative scaling (αD(X) for α≥0) because allEDMs of dimension N constitute a convex cone EDM N (6, Figure 144).
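For concreteness, here is a small Matlab fragment (names incidental) evaluating the operator D(X) of (891) on a list chosen so that the result reproduces the example EDM (880).

% EDM operator D(X) of (891) on a list chosen to reproduce (880).
X = [0 0 2;
     0 1 1;
     0 0 0];                                  % columns x1, x2, x3 in R^3
N = size(X,2);
g = diag(X'*X);                               % delta(X'X)
D = g*ones(1,N) + ones(N,1)*g' - 2*(X'*X);    % (891); returns [0 1 5; 1 0 4; 5 4 0]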


404 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX5.4.1 −V T N D(X)V N convexity([ ])xiWe saw that EDM entries d ij are convex quadratic functions. Yetx j−D(X) (891) is not a quasiconvex function of matrix X ∈ R n×N because thesecond directional derivative (3.8)− d2dt 2 ∣∣∣∣t=0D(X+ t Y ) = 2 ( −δ(Y T Y )1 T − 1δ(Y T Y ) T + 2Y T Y ) (895)is indefinite for any Y ∈ R n×N since its main diagonal is 0. [159,4.2.8][202,7.1 prob.2] Hence −D(X) can neither be convex in X .The outcome is different when instead we consider−V T N D(X)V N = 2V T NX T XV N (896)where we introduce the full-rank skinny Schoenberg auxiliary matrix (B.4.2)⎡V N √ 12 ⎢⎣(N(V N )=0) having range−1 −1 · · · −11 01. . .0 1⎤⎥⎦= 1 √2[ −1TI]∈ R N×N−1 (897)R(V N ) = N(1 T ) , V T N 1 = 0 (898)Matrix-valued function (896) meets the criterion for convexity in3.7.3.0.2over its domain that is all of R n×N ; videlicet, for any Y ∈ R n×N− d2dt 2 V T N D(X + t Y )V N = 4V T N Y T Y V N ≽ 0 (899)Quadratic matrix-valued function −VN TD(X)V N is therefore convex in Xachieving its minimum, with respect to a positive semidefinite cone (2.7.2.2),at X = 0. When the penultimate number of points exceeds the dimensionof the space n < N −1, strict convexity of the quadratic (896) becomesimpossible because (899) could not then be positive definite.


5.4. EDM DEFINITION 4055.4.2 Gram-form EDM definitionPositive semidefinite matrix X T X in (891), formed from inner product oflist X , is known as a Gram matrix; [250,3.6]⎡⎤‖x 1 ‖ 2 x T⎡ ⎤1x 2 x T 1x 3 · · · x T 1x Nx T 1 [ x 1 · · · x N ]xG X T ⎢ ⎥T 2x 1 ‖x 2 ‖ 2 x T 2x 3 · · · x T 2x NX = ⎣ . ⎦ =x T 3x 1 x T 3x 2 ‖x 3 ‖ 2 ... x T 3x N∈ S N +xNT ⎢⎥⎣ . ....... . ⎦xN Tx 1 xN Tx 2 xN Tx 3 · · · ‖x N ‖ 2⎡⎤⎛⎡⎤⎞1 cos ψ 12 cos ψ 13 · · · cos ψ 1N ⎛⎡⎤⎞‖x 1 ‖‖x ‖x 2 ‖cos ψ 12 1 cos ψ 23 · · · cos ψ 2N1 ‖= δ⎜⎢⎥⎟⎝⎣. ⎦⎠cos ψ 13 cos ψ 23 1... cos ψ ‖x 2 ‖3Nδ⎜⎢⎥⎟⎢⎥ ⎝⎣. ⎦⎠‖x N ‖⎣ . ....... . ⎦‖x N ‖cos ψ 1N cosψ 2N cos ψ 3N · · · 1 √ δ 2 (G) Ψ √ δ 2 (G) (900)where ψ ij (919) is angle between vectors x i and x j , and where δ 2 denotesa diagonal matrix in this case. Positive semidefiniteness of interpoint anglematrix Ψ implies positive semidefiniteness of Gram matrix G ;G ≽ 0 ⇐ Ψ ≽ 0 (901)When δ 2 (G) is nonsingular, then G ≽ 0 ⇔ Ψ ≽ 0. (A.3.1.0.5)Distance-square d ij (887) is related to Gram matrix entries G T = G [g ij ]d ij = g ii + g jj − 2g ij= 〈Φ ij , G〉(902)where Φ ij is defined in (889). Hence the linear EDM definition}D(G) δ(G)1 T + 1δ(G) T − 2G ∈ EDM N⇐ G ≽ 0 (903)= [〈Φ ij , G〉 , i,j=1... N]The EDM cone may be described, (confer (989)(995))EDM N = { }D(G) | G ∈ S N +(904)


406 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX5.4.2.1 First point at originAssume the first point x 1 in an unknown list X resides at the origin;Xe 1 = 0 ⇔ Ge 1 = 0 (905)Consider the symmetric translation (I − 1e T 1 )D(G)(I − e 1 1 T ) that shiftsthe first row and column of D(G) to the origin; setting Gram-form EDMoperator D(G) = D for convenience,− ( D − (De 1 1 T + 1e T 1D) + 1e T 1De 1 1 T) 12 = G − (Ge 11 T + 1e T 1G) + 1e T 1Ge 1 1 Twheree 1 ⎡⎢⎣10.0⎤(906)⎥⎦ (907)is the first vector from the standard basis. Then it follows for D ∈ S N hwhereG = − ( D − (De 1 1 T + 1e T 1D) ) 12 , x 1 = 0= − [ 0 √ ] TD [ √ ]2V N 0 1 2VN2[ ] 0 0T=0 −VN TDV NV T N GV N = −V T N DV N 1 2I − e 1 1 T =[0 √ ]2V N∀Xis a projector nonorthogonally projecting (E.1) on subspaceS N 1 = {G∈ S N | Ge 1 = 0}{ [0 √ ] T [ √ ]= 2VN Y 0 2VN | Y ∈ SN} (2002)(908)(909)in the Euclidean sense. From (908) we get sufficiency of the first matrixcriterion for an EDM proved by Schoenberg in 1935; [312] 5.7{−VTD ∈ EDM NN DV N ∈ S N−1+⇔(910)D ∈ S N h5.7 From (898) we know R(V N )= N(1 T ) , so (910) is the same as (886). In fact, anymatrix V in place of V N will satisfy (910) whenever R(V )= R(V N )= N(1 T ). But V N isthe matrix implicit in Schoenberg’s seminal exposition.
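Below is a minimal numerical sketch of criterion (910) in Matlab, using the auxiliary matrix V N of (897) and an eigenvalue test under a tolerance; it is only an illustration, not the isedm() program referenced in the sequel on Wıκımization.

% Numerical sketch of Schoenberg criterion (910) under a tolerance tol.
function tf = schoenberg_test(D, tol)
    N  = size(D,1);
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);                        % auxiliary matrix V_N (897)
    hollow = norm(diag(D)) <= tol && norm(D - D','fro') <= tol;   % D in S^N_h
    tf = hollow && min(eig(-VN'*D*VN)) >= -tol;                   % -V_N' D V_N in S^(N-1)_+
end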


5.4. EDM DEFINITION 407We provide a rigorous complete more geometric proof of this fundamentalSchoenberg criterion in5.9.1.0.4. [ [isedm(D) ] on Wıκımization]0 0TBy substituting G =0 −VN TDV (908) into D(G) (903),N[ ]0D =δ ( [−VN TDV ) 1 T + 1 0 δ ( ) ] [ ]−VNDV T T 0 0TN − 2N0 −VN TDV (1009)Nassuming x 1 = 0. Details of this bijection are provided in5.6.2.5.4.2.2 0 geometric centerAssume the geometric center (5.5.1.0.1) of an unknown list X is the origin;X1 = 0 ⇔ G1 = 0 (911)Now consider the calculation (I − 1 N 11T )D(G)(I − 1 N 11T ) , a geometriccentering or projection operation. (E.7.2.0.2) Setting D(G) = D forconvenience as in5.4.2.1,G = − ( D − 1 N (D11T + 11 T D) + 1N 2 11 T D11 T) 12 , X1 = 0= −V DV 1 2V GV = −V DV 1 2∀X(912)where more properties of the auxiliary (geometric centering, projection)matrixV I − 1 N 11T ∈ S N (913)are found inB.4. V GV may be regarded as a covariance matrix of means 0.From (912) and the assumption D ∈ S N h we get sufficiency of the more popularform of Schoenberg’s criterion:D ∈ EDM N⇔{−V DV ∈ SN+D ∈ S N h(914)Of particular utility when D ∈ EDM N is the fact, (B.4.2 no.20) (887)tr ( ( )∑−V DV 2) 1 =1∑d2N ij = 12N vec(X)T Φ ij ⊗ I vec Xi,ji,j= tr(V GV ) , G ≽ 0∑= trG = N ‖x l ‖ 2 = ‖X‖ 2 F , X1 = 0l=1(915)


408 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXwhere ∑ Φ ij ∈ S N + (889), therefore convex in vecX . We will find this traceuseful as a heuristic to minimize affine dimension of an unknown list arrangedcolumnar in X (7.2.2), but it tends to facilitate reconstruction of a listconfiguration having least energy; id est, it compacts a reconstructed list byminimizing total norm-square of the vertices.By substituting G=−V DV 1 (912) into D(G) (903), assuming X1=02D = δ ( −V DV 2) 1 1 T + 1δ ( −V DV 2) 1 T ( )− 2 −V DV12(999)Details of this bijection can be found in5.6.1.1.5.4.2.3 hypersphereThese foregoing relationships allow combination of distance and Gramconstraints in any optimization problem we might pose:Interpoint angle Ψ can be constrained by fixing all the individual pointlengths δ(G) 1/2 ; thenΨ = − 1 2 δ2 (G) −1/2 V DV δ 2 (G) −1/2 (916)(confer5.9.1.0.3, (1098) (1241)) Constraining all main diagonal entriesg ii of a Gram matrix to 1, for example, is equivalent to the constraintthat all points lie on a hypersphere of radius 1 centered at the origin.D = 2(g 11 11 T − G) ∈ EDM N (917)Requiring 0 geometric center then becomes equivalent to the constraint:D1 = 2N1. [89, p.116] Any further constraint on that Gram matrixapplies only to interpoint angle matrix Ψ = G .Because any point list may be constrained to lie on a hypersphere boundarywhose affine dimension exceeds that of the list, a Gram matrix may alwaysbe constrained to have equal positive values along its main diagonal.(Laura Klanfer 1933 [312,3]) This observation renewed interest in theelliptope (5.9.1.0.1).
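As a small illustration of (917), the following fragment (names incidental) builds an EDM directly from the Gram matrix of points normalized to the unit hypersphere.

% Illustration of (917): points normalized to the unit hypersphere give an
% EDM directly from the Gram matrix.
n = 3; N = 6;
X = randn(n, N);
X = X ./ sqrt(sum(X.^2, 1));          % unit-length columns, so g_ii = 1
G = X'*X;                             % Gram matrix with unit main diagonal
D = 2*(ones(N) - G);                  % (917) with g_11 = 1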


5.4. EDM DEFINITION 4095.4.2.3.1 Example. List-member constraints via Gram matrix.Capitalizing on identity (912) relating Gram and EDM D matrices, aconstraint set such astr ( − 1V DV e )2 ie T i = ‖xi ‖ 2⎫⎪tr ( ⎬− 1V DV (e 2 ie T j + e j e T i ) 2) 1 = xTi x jtr ( (918)− 1V DV e ) ⎪2 je T j = ‖xj ‖ 2 ⎭relates list member x i to x j to within an isometry through inner-productidentity [380,1-7]cos ψ ij =xT i x j‖x i ‖ ‖x j ‖(919)where ψ ij is angle between the two vectors as in (900). For M list members,there total M(M+1)/2 such constraints. Angle constraints are incorporatedin Example 5.4.2.3.3 and Example 5.4.2.3.10.Consider the academic problem of finding a Gram matrix subject toconstraints on each and every entry of the corresponding EDM:findD∈S N h−V DV 1 2 ∈ SNsubject to 〈 D , (e i e T j + e j e T i ) 1 2〉= ď ij , i,j=1... N , i < j−V DV ≽ 0(920)where the ďij are given nonnegative constants. EDM D can, of course,be replaced with the equivalent Gram-form (903). Requiring only theself-adjointness property (1418) of the main-diagonal linear operator δ weget, for A∈ S N〈D , A〉 = 〈 δ(G)1 T + 1δ(G) T − 2G , A 〉 = 〈G , δ(A1) − A〉 2 (921)Then the problem equivalent to (920) becomes, for G∈ S N c ⇔ G1=0findG∈S N csubject toG ∈ S N〈G , δ ( (e i e T j + e j e T i )1 ) 〉− (e i e T j + e j e T i ) = ďij , i,j=1... N , i < jG ≽ 0 (922)
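Here is a hedged CVX sketch (CVX assumed installed; data names dcheck and N hypothetical) of feasibility problem (922), with each equality written via identity (902) rather than the operator δ(A1) − A.

% Feasibility problem (922) in CVX: find a centered positive semidefinite
% Gram matrix whose induced EDM matches given entries dcheck(i,j), each
% constraint written via (902): d_ij = g_ii + g_jj - 2 g_ij.
cvx_begin sdp
    variable G(N,N) symmetric
    G*ones(N,1) == 0;                                   % geometric centering, G in S^N_c
    G >= 0;                                             % positive semidefinite
    for i = 1:N
        for j = i+1:N
            G(i,i) + G(j,j) - 2*G(i,j) == dcheck(i,j);
        end
    end
cvx_end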


410 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXFigure 118: Rendering of Fermat point in acrylic on canvas by Suman Vaze.Three circles intersect at Fermat point of minimum total distance from threevertices of (and interior to) red/black/white triangle.Barvinok’s Proposition 2.9.3.0.1 predicts existence for either formulation(920) or (922) such that implicit equality constraints induced by subspacemembership are ignored⌊√ ⌋8(N(N −1)/2) + 1 − 1rankG , rankV DV ≤= N − 1 (923)2because, in each case, the Gram matrix is confined to a face of positivesemidefinite cone S N + isomorphic with S N−1+ (6.6.1). (E.7.2.0.2) This boundis tight (5.7.1.1) and is the greatest upper bound. 5.85.4.2.3.2 Example. First duality.Kuhn reports that the first dual optimization problem 5.9 to be recorded inthe literature dates back to 1755. [Wıκımization] Perhaps more intriguing5.8 −V DV | N←1 = 0 (B.4.1)5.9 By dual problem is meant, in the strongest sense: the optimal objective achieved bya maximization problem, dual to a given minimization problem (related to each otherby a Lagrangian function), is always equal to the optimal objective achieved by theminimization. (Figure 58 Example 2.13.1.0.3) A dual problem is always convex.


5.4. EDM DEFINITION 411is the fact: this earliest instance of duality is a two-dimensional Euclideandistance geometry problem known as a Fermat point (Figure 118) namedafter the French mathematician. Given N distinct points in the plane{x i ∈ R 2 , i=1... N} , the Fermat point y is an optimal solution tominimizeyN∑‖y − x i ‖ (924)i=1a convex minimization of total distance. The historically first dual problemformulation asks for the smallest equilateral triangle encompassing (N = 3)three points x i . Another problem dual to (924) (Kuhn 1967)maximize{z i }subject toN∑〈z i , x i 〉i=1N∑z i = 0i=1‖z i ‖ ≤ 1∀i(925)has interpretation as minimization of work required to balance potentialenergy in an N-way tug-of-war between equally matched opponents situatedat {x i }. [374]It is not so straightforward to write the Fermat point problem (924)equivalently in terms of a Gram matrix from this section. Squaring insteadminimizeαN∑‖α−x i ‖ 2 ≡i=1minimizeD∈S N+1 〈−V , D〉subject to 〈D , e i e T j + e j e T i 〉 1 2 = ďij∀(i,j)∈ I−V DV ≽ 0 (926)yields an inequivalent convex geometric centering problem whoseequality constraints comprise EDM D main-diagonal zeros and knowndistances-square. 5.10 Going the other way, a problem dual to totaldistance-square maximization (Example 6.7.0.0.1) is a penultimate minimumeigenvalue problem having application to PageRank calculation by searchengines [231,4]. [340]Fermat function (924) is empirically compared with (926) in [61,8.7.3],but for multiple unknowns in R 2 , where propensity of (924) for producing5.10 α ⋆ is geometric center of points x i (973). For three points, I = {1,2,3} ; optimalaffine dimension (5.7) must be 2 because a third dimension can only increase totaldistance. Minimization of 〈−V,D〉 is a heuristic for rank minimization. (7.2.2)


412 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXx 3x 4x 5x 6Figure 119: Arbitrary hexagon in R 3 whose vertices are labelled clockwise.x 1x 2zero distance between unknowns is revealed. An optimal solution to (924)gravitates toward gradient discontinuities (D.2.1), as in Figure 72, whereasoptimal solution to (926) is less compact in the unknowns. 5.11 5.4.2.3.3 Example. Hexagon.Barvinok [25,2.6] poses a problem in geometric realizability of an arbitraryhexagon (Figure 119) having:1. prescribed (one-dimensional) face-lengths l2. prescribed angles ϕ between the three pairs of opposing faces3. a constraint on the sum of norm-square of each and every vertex x ;ten affine equality constraints in all on a Gram matrix G∈ S 6 (912). Let’srealize this as a convex feasibility problem (with constraints written in thesame order) also assuming 0 geometric center (911):findD∈S 6 h−V DV 1 2 ∈ S6subject to tr ( )D(e i e T j + e j e T i ) 1 2 = l2ij , j−1 = (i = 1... 6) mod 6tr ( − 1V DV (A 2 i + A T i ) 2) 1 = cos ϕi , i = 1, 2, 3tr(− 1 2 V DV ) = 1−V DV ≽ 0 (927)5.11 Optimal solution to (924) has mechanical interpretation in terms of interconnectingsprings with constant force when distance is nonzero; otherwise, 0 force. Problem (926) isinterpreted instead using linear springs.


5.4. EDM DEFINITION 413Figure 120: Sphere-packing illustration from [373, kissing number].Translucent balls illustrated all have the same diameter.where, for A i ∈ R 6×6 (919)A 1 = (e 1 − e 6 )(e 3 − e 4 ) T /(l 61 l 34 )A 2 = (e 2 − e 1 )(e 4 − e 5 ) T /(l 12 l 45 )A 3 = (e 3 − e 2 )(e 5 − e 6 ) T /(l 23 l 56 )(928)and where the first constraint on length-square l 2 ij can be equivalently writtenas a constraint on the Gram matrix −V DV 1 via (921). We show how to2numerically solve such a problem by alternating projection inE.10.2.1.1.Barvinok’s Proposition 2.9.3.0.1 asserts existence of a list, correspondingto Gram matrix G solving this feasibility problem, whose affine dimension(5.7.1.1) does not exceed 3 because the convex feasible set is bounded bythe third constraint tr(− 1 V DV ) = 1 (915).25.4.2.3.4 Example. Kissing number of sphere packing.Two nonoverlapping Euclidean balls are said to kiss if they touch. Anelementary geometrical problem can be posed: Given hyperspheres, eachhaving the same diameter 1, how many hyperspheres can simultaneouslykiss one central hypersphere? [394] The noncentral hyperspheres are allowed,but not required, to kiss.As posed, the problem seeks the maximal number of spheres K kissinga central sphere in a particular dimension. The total number of spheres isN = K + 1. In one dimension the answer to this kissing problem is 2. In twodimensions, 6. (Figure 7)


414 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXThe question was presented, in three dimensions, to Isaac Newton byDavid Gregory in the context of celestial mechanics. And so was borna controversy between the two scholars on the campus of Trinity CollegeCambridge in 1694. Newton correctly identified the kissing number as 12(Figure 120) while Gregory argued for 13. Their dispute was finally resolvedin 1953 by Schütte & van der Waerden. [298] In 2003, Oleg Musin tightenedthe upper bound on kissing number K in four dimensions from 25 to K = 24by refining a method of Philippe Delsarte from 1973. Delsarte’s methodprovides an infinite number [16] of linear inequalities necessary for convertinga rank-constrained semidefinite program 5.12 to a linear program. 5.13 [273]There are no proofs known for kissing number in higher dimensionexcepting dimensions eight and twenty four. Interest persists [84] becausesphere packing has found application to error correcting codes from thefields of communications and information theory; specifically to quantumcomputing. [92]Translating this problem to an EDM graph realization (Figure 116,Figure 121) is suggested by Pfender & Ziegler. Imagine the centers of eachsphere are connected by line segments. Then the distance between centersmust obey simple criteria: Each sphere touching the central sphere has a linesegment of length exactly 1 joining its center to the central sphere’s center.All spheres, excepting the central sphere, must have centers separated by adistance of at least 1.From this perspective, the kissing problem can be posed as a semidefiniteprogram. Assign index 1 to the central sphere, and assume a total of Nspheres:minimize − trWV TD∈S NN DV Nsubject to D 1j = 1, j = 2... N(929)D ij ≥ 1, 2 ≤ i < j = 3... ND ∈ EDM NThen the kissing numberK = N max − 1 (930)5.12 whose feasible set belongs to that subset of an elliptope (5.9.1.0.1) bounded aboveby some desired rank.5.13 Simplex-method solvers for linear programs produce numerically better results thancontemporary log-barrier (interior-point method) solvers for semidefinite programs byabout 7 orders of magnitude; and they are far more predisposed to vertex solutions.


5.4. EDM DEFINITION 415is found from the maximal number of spheres N that solve this semidefiniteprogram in a given affine dimension r . Matrix W can be interpreted as thedirection of search through the positive semidefinite cone S N−1+ for a rank-roptimal solution −VN TD⋆ V N ; it is constant, in this program, determined bya method disclosed in4.4.1. In two dimensions,⎡W =⎢⎣4 1 2 −1 −1 11 4 −1 −1 2 12 −1 4 1 1 −1−1 −1 1 4 1 2−1 2 1 1 4 −11 1 −1 2 −1 4⎤1⎥6⎦(931)In three dimensions,⎡W =⎢⎣9 1 −2 −1 3 −1 −1 1 2 1 −2 11 9 3 −1 −1 1 1 −2 1 2 −1 −1−2 3 9 1 2 −1 −1 2 −1 −1 1 2−1 −1 1 9 1 −1 1 −1 3 2 −1 13 −1 2 1 9 1 1 −1 −1 −1 1 −1−1 1 −1 −1 1 9 2 −1 2 −1 2 3−1 1 −1 1 1 2 9 3 −1 1 −2 −11 −2 2 −1 −1 −1 3 9 2 −1 1 12 1 −1 3 −1 2 −1 2 9 −1 1 −11 2 −1 2 −1 −1 1 −1 −1 9 3 1−2 −1 1 −1 1 2 −2 1 1 3 9 −11 −1 2 1 −1 3 −1 1 −1 1 −1 9⎤112⎥⎦(932)A four-dimensional solution has rational direction matrix as well, but thesedirection matrices are not unique and their precision not critical. Here is anoptimal point list 5.14 in Matlab output format:5.14 An optimal five-dimensional point list is known: The answer was known at least 175years ago. I believe Gauss knew it. Moreover, Korkine & Zolotarev proved in 1882 that D 5is the densest lattice in five dimensions. So they proved that if a kissing arrangement infive dimensions can be extended to some lattice, then k(5)= 40. Of course, the conjecturein the general case also is: k(5)= 40. You would like to see coordinates? Easily.Let A= √ 2. Then p(1)=(A,A,0,0,0), p(2)=(−A,A,0,0,0), p(3)=(A, −A,0,0,0), ...p(40)=(0,0,0, −A, −A); i.e., we are considering points with coordinates that have twoA and three 0 with any choice of signs and any ordering of the coordinates; the same


416 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXColumns 1 through 6X = 0 -0.1983 -0.4584 0.1657 0.9399 0.74160 0.6863 0.2936 0.6239 -0.2936 0.39270 -0.4835 0.8146 -0.6448 0.0611 -0.42240 0.5059 0.2004 -0.4093 -0.1632 0.3427Columns 7 through 12-0.4815 -0.9399 -0.7416 0.1983 0.4584 -0.28320 0.2936 -0.3927 -0.6863 -0.2936 -0.6863-0.8756 -0.0611 0.4224 0.4835 -0.8146 -0.3922-0.0372 0.1632 -0.3427 -0.5059 -0.2004 -0.5431Columns 13 through 180.2832 -0.2926 -0.6473 0.0943 0.3640 -0.36400.6863 0.9176 -0.6239 -0.2313 -0.0624 0.06240.3922 0.1698 -0.2309 -0.6533 -0.1613 0.16130.5431 -0.2088 0.3721 0.7147 -0.9152 0.9152Columns 19 through 25-0.0943 0.6473 -0.1657 0.2926 -0.5759 0.5759 0.48150.2313 0.6239 -0.6239 -0.9176 0.2313 -0.2313 00.6533 0.2309 0.6448 -0.1698 -0.2224 0.2224 0.8756-0.7147 -0.3721 0.4093 0.2088 -0.7520 0.7520 0.0372This particular optimal solution was found by solving a problem sequence inincreasing number of spheres.Numerical problems begin to arise with matrices of this size due tointerior-point methods of solution. By eliminating some equality constraintsfor this particular problem, matrix size can be reduced: From5.8.3 we have[ 0−VNDV T N = 11 T T− [0 I ] DI] 12(933)(which does not hold more generally) where identity matrix I ∈ S N−1 hascoordinates-expression in dimensions 3 and 4.The first miracle happens in dimension 6. There are better packings than D 6(Conjecture: k(6)=72). It’s a real miracle how dense the packing is in eight dimensions(E 8 =Korkine & Zolotarev packing that was discovered in 1880s) and especially indimension 24, that is the so-called Leech lattice.Actually, people in coding theory have conjectures on the kissing numbers for dimensionsup to 32 (or even greater?). However, sometimes they found better lower bounds. I knowthat Ericson & Zinoviev a few years ago discovered (by hand, no computer) in dimensions13 and 14 better kissing arrangements than were known before. −Oleg Musin


5.4. EDM DEFINITION 417one less dimension than EDM D . By defining an EDM principal submatrix[ ] 0TˆD [0 I ] D ∈ S N−1Ih(934)we get a convex problem equivalent to (929)minimize − tr(W ˆD)ˆD∈S N−1subject to ˆDij ≥ 1 , 1 ≤ i < j = 2... N −111 T − ˆD 1 2 ≽ 0δ( ˆD) = 0(935)Any feasible solution 11 T − ˆD 1 2 belongs to an elliptope (5.9.1.0.1). This next example shows how finding the common point of intersectionfor three circles in a plane, a nonlinear problem, has convex expression.5.4.2.3.5 Example. Trilateration in wireless sensor network.Given three known absolute point positions in R 2 (three anchors ˇx 2 , ˇx 3 , ˇx 4 )and only one unknown point (one sensor x 1 ), the sensor’s absolute position isdetermined from its noiseless measured distance-square ďi1 to each of threeanchors (Figure 2, Figure 121a). This trilateration can be expressed as aconvex optimization problem in terms of list X [x 1 ˇx 2 ˇx 3 ˇx 4 ]∈ R 2×4 andGram matrix G∈ S 4 (900):minimize trGG∈S 4 , X∈R2×4 subject to tr(GΦ i1 ) = ďi1 , i = 2, 3, 4tr ( )Ge i e T i = ‖ˇx i ‖ 2 , i = 2, 3, 4tr(G(e i e T j + e j e T i )/2) = ˇx T i ˇx j , 2 ≤ i < j = 3, 4X(:, 2:4) = [ ˇx 2 ˇx 3 ˇx 4 ][ ] I XX T≽ 0G(936)whereΦ ij = (e i − e j )(e i − e j ) T ∈ S N + (889)and where the constraint on distance-square ďi1 is equivalently written asa constraint on the Gram matrix via (902). There are 9 independent affine


418 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX(a)(c)x 3x 4 x 5x 5 x 6√d13√d14x 3ˇx 3 ˇx 4√d12ˇx 2x 1x 1x 2x 4x 1x 1Figure 121: (a) Given three distances indicated with absolute pointpositions ˇx 2 , ˇx 3 , ˇx 4 known and noncollinear, absolute position of x 1 in R 2can be precisely and uniquely determined by trilateration; solution to asystem of nonlinear equations. Dimensionless EDM graphs (b) (c) (d)represent EDMs in various states of completion. Line segments representknown absolute distances and may cross without vertex at intersection.(b) Four-point list must always be embeddable in affine subset havingdimension rankVN TDV N not exceeding 3. To determine relative position ofx 2 ,x 3 ,x 4 , triangle inequality is necessary and sufficient (5.14.1). Knowingall distance information, then (by injectivity of D (5.6)) point position x 1is uniquely determined to within an isometry in any dimension. (c) Whenfifth point is introduced, only distances to x 3 ,x 4 ,x 5 are required todetermine relative position of x 2 in R 2 . Graph represents first instanceof missing distance information; √ d 12 . (d) Three distances are absent( √ d 12 , √ d 13 , √ d 23 ) from complete set of interpoint distances, yet uniqueisometric reconstruction (5.4.2.3.7) of six points in R 2 is certain.(b)(d)


5.4. EDM DEFINITION 419equality constraints on that Gram matrix while the sensor is constrainedonly by dimensioning to lie in R 2 . Although the objective trG ofminimization 5.15 insures a solution on the boundary of positive semidefinitecone S 4 + , for this problem, we claim that the set of feasible Gram matricesforms a line (2.5.1.1) in isomorphic R 10 tangent (2.1.7.1.2) to the positivesemidefinite cone boundary. (5.4.2.3.6, confer4.2.1.3)By Schur complement (A.4,2.9.1.0.1), any feasible G and X provideG ≽ X T X (937)which is a convex relaxation of the desired (nonconvex) equality constraint[ ] [ ] I X I [I X ]X T =G X T (938)expected positive semidefinite rank-2 under noiseless conditions.by (1491), the relaxation admitsBut,(3 ≥) rankG ≥ rankX (939)(a third dimension corresponding to an intersection of three spheres,not circles, were there noise). If rank of an optimal solution equals 2,[ ]I Xrank⋆X ⋆T G ⋆ = 2 (940)then G ⋆ = X ⋆T X ⋆ by Theorem A.4.0.1.3.As posed, this localization problem does not require affinely independent(Figure 27, three noncollinear) anchors. Assuming the anchors exhibitno rotational or reflective symmetry in their affine hull (5.5.2) andassuming the sensor x 1 lies in that affine hull, then sensor position solutionx ⋆ 1= X ⋆ (:, 1) is unique under noiseless measurement. [320] This preceding transformation of trilateration to a semidefinite programworks all the time ((940) holds) despite relaxation (937) because the optimalsolution set is a unique point.5.15 Trace (tr G = 〈I , G〉) minimization is a heuristic for rank minimization. (7.2.2.1)It may be interpreted as squashing G which is bounded below by X T X as in (937);id est, G−X T X ≽ 0 ⇒ trG ≥ tr X T X (1489). δ(G−X T X)=0 ⇔ G=X T X (A.7.2)⇒ tr G = tr X T X which is a condition necessary for equality.


420 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX5.4.2.3.6 Proof (sketch). Only the sensor location x 1 is unknown.The objective function together with the equality constraints make a linearsystem of equations in Gram matrix variable GtrG = ‖x 1 ‖ 2 + ‖ˇx 2 ‖ 2 + ‖ˇx 3 ‖ 2 + ‖ˇx 4 ‖ 2tr(GΦ i1 = ďi1 , i = 2, 3, 4tr ( ) (941)Ge i e T i = ‖ˇxi ‖ 2 , i = 2, 3, 4tr(G(e i e T j + e j e T i )/2) = ˇx T i ˇx j , 2 ≤ i < j = 3, 4which is invertible:⎡⎤svec(I) Tsvec(Φ 21 ) Tsvec(Φ 31 ) Tsvec(Φ 41 ) Tsvec(e 2 e T 2 ) Tsvec G =svec(e 3 e T 3 ) Tsvec(e 4 e T 4 ) Tsvec ( (e⎢ 2 e T 3 + e 3 e T 2 )/2 ) T⎣ svec ( (e 2 e T 4 + e 4 e T 2 )/2 ) T ⎥svec ( (e 3 e T 4 + e 4 e T 3 )/2 ) ⎦T−1⎡‖x 1 ‖ 2 + ‖ˇx 2 ‖ 2 + ‖ˇx 3 ‖ 2 + ‖ˇx 4 ‖ 2 ⎤ď 21ď 31ď 41‖ˇx 2 ‖ 2‖ˇx 3 ‖ 2‖ˇx 4 ‖ 2⎢ ˇx T 2ˇx 3⎥⎣ ˇx T 2ˇx 4⎦ˇx T 3ˇx 4(942)That line in the ambient space S 4 of G , claimed on page 419, is traced by‖x 1 ‖ 2 ∈ R on the right-hand side, as it turns out. One must show this line tobe tangential (2.1.7.1.2) to S 4 + in order to prove uniqueness. Tangency ispossible for affine dimension 1 or 2 while its occurrence depends completelyon the known measurement data.But as soon as significant noise is introduced or whenever distance data isincomplete, such problems can remain convex although the set of all optimalsolutions generally becomes a convex set bigger than a single point (and stillcontaining the noiseless solution).5.4.2.3.7 Definition. Isometric reconstruction. (confer5.5.3)Isometric reconstruction from an EDM means building a list X correct towithin a rotation, reflection, and translation; in other terms, reconstructionof relative position, correct to within an isometry, correct to within a rigidtransformation.△


5.4. EDM DEFINITION 421How much distance information is needed to uniquely localize a sensor(to recover actual relative position)? The narrative in Figure 121 helpsdispel any notion of distance data proliferation in low affine dimension(r


422 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX

[Figure 122 depicts a dimensionless EDM graph on six vertices x 1 ... x 6 . Figure 123 depicts two sensors x 1 , x 2 and three anchors ˇx 3 , ˇx 4 , ˇx 5 .]

Figure 122: Incomplete EDM corresponding to this dimensionless EDM graph provides unique isometric reconstruction in R 2 . (drawn freehand; no symmetry intended, confer (943))

Figure 123: (Ye) Two sensors • and three anchors ◦ in R 2 . Connecting line-segments denote known absolute distances. Incomplete EDM corresponding to this dimensionless EDM graph provides unique isometric reconstruction in R 2 .


5.4. EDM DEFINITION 423

[Figure 124 plots the two sensor trajectories and the anchors ˇx 3 , ˇx 4 , ˇx 5 in the plane; the horizontal axis spans roughly −6 to 8, the vertical axis roughly −15 to 10.]

Figure 124: Given in red are two discrete linear trajectories of sensors x 1 and x 2 in R 2 localized by algorithm (944) as indicated by blue bullets • . Anchors ˇx 3 , ˇx 4 , ˇx 5 corresponding to Figure 123 are indicated by ⊗ . When targets and bullets • coincide under these noiseless conditions, localization is successful. On this run, two visible localization errors are due to rank-3 Gram optimal solutions. These errors can be corrected by choosing a different normal in objective of minimization.


424 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX5.4.2.3.8 Example. Tandem trilateration in wireless sensor network.Given three known absolute point-positions in R 2 (three anchors ˇx 3 , ˇx 4 , ˇx 5 ),two unknown sensors x 1 , x 2 ∈ R 2 have absolute position determinable fromtheir noiseless distances-square (as indicated in Figure 123) assuming theanchors exhibit no rotational or reflective symmetry in their affine hull(5.5.2). This example differs from Example 5.4.2.3.5 in so far as trilaterationof each sensor is now in terms of one unknown position: the other sensor. Weexpress this localization as a convex optimization problem (a semidefiniteprogram,4.1) in terms of list X [x 1 x 2 ˇx 3 ˇx 4 ˇx 5 ]∈ R 2×5 and Grammatrix G∈ S 5 (900) via relaxation (937):minimize trGG∈S 5 , X∈R2×5 subject to tr(GΦ i1 ) = ďi1 , i = 2, 4, 5tr(GΦ i2 ) = ďi2 , i = 3, 5tr ( )Ge i e T i = ‖ˇx i ‖ 2 , i = 3, 4, 5tr(G(e i e T j + e j e T i )/2) = ˇx T i ˇx j , 3 ≤ i < j = 4, 5X(:, 3:5) = [ ˇx 3 ˇx 4 ˇx 5 ][ ] I XX T≽ 0G(944)whereΦ ij = (e i − e j )(e i − e j ) T ∈ S N + (889)This problem realization is fragile because of the unknown distances betweensensors and anchors. Yet there is no more information we may include beyondthe 11 independent equality constraints on the Gram matrix (nonredundantconstraints not antithetical) to reduce the feasible set. 5.17Exhibited in Figure 124 are two mistakes in solution X ⋆ (:,1:2) dueto a rank-3 optimal Gram matrix G ⋆ . The trace objective is a heuristicminimizing convex envelope of quasiconcave function 5.18 rankG. (2.9.2.9.2,7.2.2.1) A rank-2 optimal Gram matrix can be found and the errorscorrected by choosing a different normal for the linear objective function,5.17 By virtue of their dimensioning, the sensors are already constrained to R 2 the affinehull of the anchors.5.18 Projection on that nonconvex subset of all N ×N-dimensional positive semidefinitematrices, in an affine subset, whose rank does not exceed 2 is a problem considered difficultto solve. [353,4]


now implicitly the identity matrix I; id est,

$$\operatorname{tr} G = \langle G,\, I\rangle \ \leftarrow\ \langle G,\, \delta(u)\rangle \tag{945}$$

where vector u ∈ R^5 is randomly selected. A random search for a good normal δ(u) in only a few iterations is quite easy and effective because: the problem is small, an optimal solution is known a priori to exist in two dimensions, a good normal direction is not necessarily unique, and (we speculate) because the feasible affine-subset slices the positive semidefinite cone thinly in the Euclidean sense. 5.19

We explore ramifications of noise and incomplete data throughout; their individual effect being to expand the optimal solution set, introducing more solutions and higher-rank solutions. Hence our focus shifts in 4.4 to discovery of a reliable means for diminishing the optimal solution set by introduction of a rank constraint.

Now we illustrate how a problem in distance geometry can be solved without equality constraints representing measured distance; instead, we have only upper and lower bounds on distances measured:

5.4.2.3.9 Example. Wireless location in a cellular telephone network.
Utilizing measurements of distance, time of flight, angle of arrival, or signal power in the context of wireless telephony, multilateration is the process of localizing (determining absolute position of) a radio signal source • by inferring geometry relative to multiple fixed base stations ◦ whose locations are known.

We consider localization of a cellular telephone by distance geometry, so we assume distance to any particular base station can be inferred from received signal power. On a large open flat expanse of terrain, signal-power measurement corresponds well with inverse distance. But it is not uncommon for measurement of signal power to suffer 20 decibels in loss caused by factors such as multipath interference (signal reflections), mountainous terrain, man-made structures, turning one's head, or rolling the windows up in an automobile. Consequently, contours of equal signal power are no longer circular; their geometry is irregular and would more aptly be approximated

5.19 The log det rank-heuristic from 7.2.2.4 does not work here because it chooses the wrong normal. Rank reduction (4.1.2.1) is unsuccessful here because Barvinok's upper bound (2.9.3.0.1) on rank of G⋆ is 4.


Figure 125: Regions of coverage by base stations ◦ in a cellular telephone network. The term cellular arises from packing of regions best covered by neighboring base stations. Illustrated is a pentagonal cell best covered by base station ˇx_2. Like a Voronoi diagram, cell geometry depends on base-station arrangement. In some US urban environments, it is not unusual to find base stations spaced approximately 1 mile apart. There can be as many as 20 base-station antennae capable of receiving signal from any given cell phone •; practically, that number is closer to 6.

Figure 126: Some fitted contours of equal signal power in R^2 transmitted from a commercial cellular telephone • over about 1 mile suburban terrain outside San Francisco in 2005. Data courtesy Polaris Wireless. [322]


by translated ellipsoids of graduated orientation and eccentricity as in Figure 126.

Depicted in Figure 125 is one cell phone x_1 whose signal power is automatically and repeatedly measured by 6 base stations ◦ nearby. 5.20 Those signal power measurements are transmitted from that cell phone to base station ˇx_2 who decides whether to transfer (hand-off or hand-over) responsibility for that call should the user roam outside its cell. 5.21

Due to noise, at least one distance measurement more than the minimum number of measurements is required for reliable localization in practice; 3 measurements are minimum in two dimensions, 4 in three. 5.22 Existence of noise precludes measured distance from the input data. We instead assign measured distance to a range estimate specified by individual upper and lower bounds: $\overline d_{i1}$ is the upper bound on distance-square from the cell phone to the i-th base station, while $\underline d_{i1}$ is the lower bound. These bounds become the input data. Each measurement range is presumed different from the others. Then convex problem (936) takes the form:

$$
\begin{array}{ll}
\underset{G\in\mathbb{S}^7,\ X\in\mathbb{R}^{2\times7}}{\text{minimize}} & \operatorname{tr} G\\
\text{subject to} & \underline d_{i1} \le \operatorname{tr}(G\Phi_{i1}) \le \overline d_{i1}\,,\quad i=2\ldots7\\
& \operatorname{tr}(G\,e_i e_i^T) = \|\check x_i\|^2\,,\quad i=2\ldots7\\
& \operatorname{tr}(G(e_i e_j^T + e_j e_i^T)/2) = \check x_i^T\check x_j\,,\quad 2\le i<j=3\ldots7\\
& X(:,2\!:\!7) = [\,\check x_2\ \check x_3\ \check x_4\ \check x_5\ \check x_6\ \check x_7\,]\\[2pt]
& \begin{bmatrix} I & X\\ X^T & G\end{bmatrix} \succeq 0
\end{array}
\tag{946}
$$

where

$$\Phi_{ij} = (e_i - e_j)(e_i - e_j)^T \in \mathbb{S}^N_+ \tag{889}$$

This semidefinite program realizes the wireless location problem illustrated in Figure 125. Location X⋆(:,1) is taken as solution, although measurement

5.20 Cell phone signal power is typically encoded logarithmically with 1-decibel increment and 64-decibel dynamic range.
5.21 Because distance to base station is quite difficult to infer from signal power measurements in an urban environment, localization of a particular cell phone • by distance geometry would be far easier were the whole cellular system instead conceived so cell phone x_1 also transmits (to base station ˇx_2) its signal power as received by all other cell phones within range.
5.22 In Example 4.4.1.2.4, we explore how this convex optimization algorithm fares in the face of measurement noise.


noise will often cause rank G⋆ to exceed 2. Randomized search for a rank-2 optimal solution is not so easy here as in Example 5.4.2.3.8. We introduce a method in 4.4 for enforcing the stronger rank-constraint (940). To formulate this same problem in three dimensions, point list X is simply redimensioned in the semidefinite program.

Figure 127: A depiction of molecular conformation. [116]

5.4.2.3.10 Example. (Biswas, Nigam, Ye) Molecular Conformation.
The subatomic measurement technique called nuclear magnetic resonance spectroscopy (NMR) is employed to ascertain physical conformation of molecules; e.g., Figure 3, Figure 127. From this technique, distance, angle, and dihedral angle measurements can be obtained. Dihedral angles arise consequent to a phenomenon where atom subsets are physically constrained to Euclidean planes.

In the rigid covalent geometry approximation, the bond lengths and angles are treated as completely fixed, so that a given spatial structure can be described very compactly indeed by a list of torsion angles alone... These are the dihedral angles between the planes spanned by the two consecutive triples in a chain of four covalently bonded atoms.
−G. M. Crippen & T. F. Havel (1988) [88, 1.1]


Crippen & Havel recommend working exclusively with distance data because they consider angle data to be mathematically cumbersome. The present example shows instead how inclusion of dihedral angle data into a problem statement can be made elegant and convex.

As before, ascribe position information to the matrix

$$X = [\,x_1 \cdots x_N\,] \in \mathbb{R}^{3\times N} \tag{76}$$

and introduce a matrix ℵ holding normals η to planes respecting dihedral angles ϕ:

$$\aleph \triangleq [\,\eta_1 \cdots \eta_M\,] \in \mathbb{R}^{3\times M} \tag{947}$$

As in the other examples, we preferentially work with Gram matrices G because of the bridge they provide between other variables; we define

$$
\begin{bmatrix} G_\aleph & Z\\ Z^T & G_X \end{bmatrix}
\triangleq
\begin{bmatrix} \aleph^T\aleph & \aleph^T X\\ X^T\aleph & X^T X \end{bmatrix}
=
\begin{bmatrix} \aleph^T\\ X^T \end{bmatrix}
[\,\aleph\ \ X\,]
\ \in\ \mathbb{R}^{N+M\times N+M}
\tag{948}
$$

whose rank is 3 by assumption. So our problem's variables are the two Gram matrices G_ℵ and G_X and matrix Z = ℵ^T X of cross products. Then measurements of distance-square d can be expressed as linear constraints on G_X as in (946), dihedral angle ϕ measurements can be expressed as linear constraints on G_ℵ by (919), and normal-vector η conditions can be expressed by vanishing linear constraints on cross-product matrix Z: Consider three points x labelled 1, 2, 3 assumed to lie in the l-th plane whose normal is η_l. There might occur, for example, the independent constraints

$$
\begin{aligned}
\eta_l^T(x_1 - x_2) &= 0\\
\eta_l^T(x_2 - x_3) &= 0
\end{aligned}
\tag{949}
$$

which are expressible in terms of constant matrices A ∈ R^{M×N};

$$
\begin{aligned}
\langle Z,\, A_{l12}\rangle &= 0\\
\langle Z,\, A_{l23}\rangle &= 0
\end{aligned}
\tag{950}
$$

Although normals η can be constrained exactly to unit length,

$$\delta(G_\aleph) = \mathbf{1} \tag{951}$$

NMR data is noisy; so measurements are given as upper and lower bounds. Given bounds on dihedral angles respecting 0 ≤ ϕ_j ≤ π and bounds on


distances d_i and given constant matrices A_k (950) and symmetric matrices Φ_i (889) and B_j per (919), then a molecular conformation problem can be expressed:

$$
\begin{array}{lll}
\underset{G_\aleph\in\mathbb{S}^M,\ G_X\in\mathbb{S}^N,\ Z\in\mathbb{R}^{M\times N}}{\text{find}} & G_X\\
\text{subject to} & \underline d_i \le \operatorname{tr}(G_X\Phi_i) \le \overline d_i\,, & \forall i\in\mathcal{I}_1\\
& \cos\overline\varphi_j \le \operatorname{tr}(G_\aleph B_j) \le \cos\underline\varphi_j\,, & \forall j\in\mathcal{I}_2\\
& \langle Z,\, A_k\rangle = 0\,, & \forall k\in\mathcal{I}_3\\
& G_X\mathbf{1} = 0\\
& \delta(G_\aleph) = \mathbf{1}\\[2pt]
& \begin{bmatrix} G_\aleph & Z\\ Z^T & G_X\end{bmatrix} \succeq 0\\[8pt]
& \operatorname{rank}\begin{bmatrix} G_\aleph & Z\\ Z^T & G_X\end{bmatrix} = 3
\end{array}
\tag{952}
$$

where G_X 1 = 0 provides a geometrically centered list X (5.4.2.2). Ignoring the rank constraint would tend to force cross-product matrix Z to zero. What binds these three variables is the rank constraint; we show how to satisfy it in 4.4.

5.4.3 Inner-product form EDM definition

We might, for example, want to realize a constellation given only interstellar distance (or, equivalently, parsecs from our Sun and relative angular measurement; the Sun as vertex to two distant stars); called stellar cartography. (p.22)

Equivalent to (887) is [380, 1-7] [331, 3.2]

$$
d_{ij} = d_{ik} + d_{kj} - 2\sqrt{d_{ik} d_{kj}}\,\cos\theta_{ikj}
= \begin{bmatrix}\sqrt{d_{ik}} & \sqrt{d_{kj}}\end{bmatrix}
\begin{bmatrix} 1 & -e^{\imath\theta_{ikj}}\\ -e^{-\imath\theta_{ikj}} & 1\end{bmatrix}
\begin{bmatrix}\sqrt{d_{ik}}\\ \sqrt{d_{kj}}\end{bmatrix}
\tag{953}
$$

called law of cosines where ı ≜ √−1, i, j, k are positive integers, and θ_{ikj} is the angle at vertex x_k formed by vectors x_i − x_k and x_j − x_k;

$$
\cos\theta_{ikj}
= \frac{\tfrac12(d_{ik} + d_{kj} - d_{ij})}{\sqrt{d_{ik}\, d_{kj}}}
= \frac{(x_i - x_k)^T(x_j - x_k)}{\|x_i - x_k\|\,\|x_j - x_k\|}
\tag{954}
$$


where the numerator forms an inner product of vectors. Distance-square $d_{ij}\!\left(\begin{bmatrix}\sqrt{d_{ik}}\\ \sqrt{d_{kj}}\end{bmatrix}\right)$ is a convex quadratic function 5.23 on $\mathbb{R}^2_+$ whereas $d_{ij}(\theta_{ikj})$ is quasiconvex (3.8) minimized over domain $\{-\pi\le\theta_{ikj}\le\pi\}$ by $\theta^\star_{ikj}=0$; we get the Pythagorean theorem when $\theta_{ikj}=\pm\pi/2$, and $d_{ij}(\theta_{ikj})$ is maximized when $\theta^\star_{ikj}=\pm\pi$;

$$
\begin{aligned}
d_{ij} &= \bigl(\sqrt{d_{ik}} + \sqrt{d_{kj}}\bigr)^2, & \theta_{ikj} &= \pm\pi\\
d_{ij} &= d_{ik} + d_{kj}, & \theta_{ikj} &= \pm\tfrac{\pi}{2}\\
d_{ij} &= \bigl(\sqrt{d_{ik}} - \sqrt{d_{kj}}\bigr)^2, & \theta_{ikj} &= 0
\end{aligned}
\tag{955}
$$

so

$$\bigl|\sqrt{d_{ik}} - \sqrt{d_{kj}}\bigr| \;\le\; \sqrt{d_{ij}} \;\le\; \sqrt{d_{ik}} + \sqrt{d_{kj}} \tag{956}$$

Hence the triangle inequality, Euclidean metric property 4, holds for any EDM D.

We may construct an inner-product form of the EDM definition for matrices by evaluating (953) for k = 1: By defining

$$
\Theta^T\Theta \triangleq
\begin{bmatrix}
d_{12} & \sqrt{d_{12}d_{13}}\cos\theta_{213} & \sqrt{d_{12}d_{14}}\cos\theta_{214} & \cdots & \sqrt{d_{12}d_{1N}}\cos\theta_{21N}\\
\sqrt{d_{12}d_{13}}\cos\theta_{213} & d_{13} & \sqrt{d_{13}d_{14}}\cos\theta_{314} & \cdots & \sqrt{d_{13}d_{1N}}\cos\theta_{31N}\\
\sqrt{d_{12}d_{14}}\cos\theta_{214} & \sqrt{d_{13}d_{14}}\cos\theta_{314} & d_{14} & \ddots & \sqrt{d_{14}d_{1N}}\cos\theta_{41N}\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
\sqrt{d_{12}d_{1N}}\cos\theta_{21N} & \sqrt{d_{13}d_{1N}}\cos\theta_{31N} & \sqrt{d_{14}d_{1N}}\cos\theta_{41N} & \cdots & d_{1N}
\end{bmatrix}
\in\mathbb{S}^{N-1}
\tag{957}
$$

then any EDM may be expressed

$$
D(\Theta) \triangleq
\begin{bmatrix} 0\\ \delta(\Theta^T\Theta)\end{bmatrix}\mathbf{1}^T
+ \mathbf{1}\begin{bmatrix} 0 & \delta(\Theta^T\Theta)^T\end{bmatrix}
- 2\begin{bmatrix} 0 & 0^T\\ 0 & \Theta^T\Theta\end{bmatrix}
=
\begin{bmatrix} 0 & \delta(\Theta^T\Theta)^T\\ \delta(\Theta^T\Theta) & \delta(\Theta^T\Theta)\mathbf{1}^T + \mathbf{1}\delta(\Theta^T\Theta)^T - 2\Theta^T\Theta\end{bmatrix}
\in \mathrm{EDM}^N
\tag{958}
$$

$$\mathrm{EDM}^N = \bigl\{ D(\Theta) \mid \Theta\in\mathbb{R}^{N-1\times N-1}\bigr\} \tag{959}$$

for which all Euclidean metric properties hold. Entries of Θ^TΘ result from vector inner-products as in (954); id est,

5.23 $\begin{bmatrix} 1 & -e^{\imath\theta_{ikj}}\\ -e^{-\imath\theta_{ikj}} & 1\end{bmatrix} \succeq 0$, having eigenvalues {0, 2}. Minimum is attained for $\begin{bmatrix}\sqrt{d_{ik}}\\ \sqrt{d_{kj}}\end{bmatrix} = \begin{cases}\mu\mathbf{1},\ \mu\ge0, & \theta_{ikj}=0\\ 0, & -\pi\le\theta_{ikj}\le\pi,\ \theta_{ikj}\ne0\end{cases}$. (D.2.1, [61, exmp.4.5])


$$\Theta = [\,x_2 - x_1\ \ x_3 - x_1\ \cdots\ x_N - x_1\,] = X\sqrt{2}\,V_N \in \mathbb{R}^{n\times N-1} \tag{960}$$

Inner product Θ^TΘ is obviously related to a Gram matrix (900),

$$G = \begin{bmatrix} 0 & 0^T\\ 0 & \Theta^T\Theta\end{bmatrix}, \qquad x_1 = 0 \tag{961}$$

For D = D(Θ) and no condition on the list X (confer (908) (912))

$$\Theta^T\Theta = -V_N^T D V_N \in \mathbb{R}^{N-1\times N-1} \tag{962}$$

5.4.3.1 Relative-angle form

The inner-product form EDM definition is not a unique definition of Euclidean distance matrix; there are approximately five flavors distinguished by their argument to operator D. Here is another one:

Like D(X) (891), D(Θ) will make an EDM given any Θ ∈ R^{n×N−1}; it is likewise not a convex function of Θ (5.4.3.2), and it is homogeneous in the sense (894). Scrutinizing Θ^TΘ (957) we find that because of the arbitrary choice k = 1, distances therein are all with respect to point x_1. Similarly, relative angles in Θ^TΘ are between all vector pairs having vertex x_1. Yet picking arbitrary θ_{i1j} to fill Θ^TΘ will not necessarily make an EDM; inner product (957) must be positive semidefinite.

$$
\Theta^T\Theta = \sqrt{\delta(d)}\;\Omega\;\sqrt{\delta(d)} \triangleq
\begin{bmatrix}\sqrt{d_{12}} & & 0\\ & \ddots & \\ 0 & & \sqrt{d_{1N}}\end{bmatrix}
\begin{bmatrix} 1 & \cos\theta_{213} & \cdots & \cos\theta_{21N}\\ \cos\theta_{213} & 1 & \ddots & \cos\theta_{31N}\\ \vdots & \ddots & \ddots & \vdots\\ \cos\theta_{21N} & \cos\theta_{31N} & \cdots & 1\end{bmatrix}
\begin{bmatrix}\sqrt{d_{12}} & & 0\\ & \ddots & \\ 0 & & \sqrt{d_{1N}}\end{bmatrix}
\tag{963}
$$

Expression D(Θ) defines an EDM for any positive semidefinite relative-angle matrix

$$\Omega = [\cos\theta_{i1j}\,,\ i,j = 2\ldots N] \in \mathbb{S}^{N-1} \tag{964}$$

and any nonnegative distance vector

$$d = [d_{1j}\,,\ j = 2\ldots N] = \delta(\Theta^T\Theta) \in \mathbb{R}^{N-1} \tag{965}$$


because (A.3.1.0.5)

$$\Omega \succeq 0 \ \Rightarrow\ \Theta^T\Theta \succeq 0 \tag{966}$$

Decomposition (963) and the relative-angle matrix inequality Ω ≽ 0 lead to a different expression of an inner-product form EDM definition (958)

$$
D(\Omega,d) \triangleq
\begin{bmatrix} 0\\ d\end{bmatrix}\mathbf{1}^T + \mathbf{1}\begin{bmatrix} 0 & d^T\end{bmatrix}
- 2\sqrt{\delta\!\left(\begin{bmatrix} 0\\ d\end{bmatrix}\right)}
\begin{bmatrix} 0 & 0^T\\ 0 & \Omega\end{bmatrix}
\sqrt{\delta\!\left(\begin{bmatrix} 0\\ d\end{bmatrix}\right)}
=
\begin{bmatrix} 0 & d^T\\ d & d\mathbf{1}^T + \mathbf{1}d^T - 2\sqrt{\delta(d)}\,\Omega\,\sqrt{\delta(d)}\end{bmatrix}
\in \mathrm{EDM}^N
\tag{967}
$$

and another expression of the EDM cone:

$$\mathrm{EDM}^N = \Bigl\{ D(\Omega,d) \;\Big|\; \Omega \succeq 0\,,\ \sqrt{\delta(d)} \succeq 0 \Bigr\} \tag{968}$$

In the particular circumstance x_1 = 0, we can relate interpoint angle matrix Ψ from the Gram decomposition in (900) to relative-angle matrix Ω in (963). Thus,

$$\Psi \equiv \begin{bmatrix} 1 & 0^T\\ 0 & \Omega\end{bmatrix}, \qquad x_1 = 0 \tag{969}$$

5.4.3.2 Inner-product form −V_N^T D(Θ) V_N convexity

On page 431 we saw that each EDM entry d_ij is a convex quadratic function of $\begin{bmatrix}\sqrt{d_{ik}}\\ \sqrt{d_{kj}}\end{bmatrix}$ and a quasiconvex function of θ_ikj. Here the situation for inner-product form EDM operator D(Θ) (958) is identical to that in 5.4.1 for list-form D(X); −D(Θ) is not a quasiconvex function of Θ by the same reasoning, and from (962)

$$-V_N^T D(\Theta) V_N = \Theta^T\Theta \tag{970}$$

is a convex quadratic function of Θ on domain R^{n×N−1} achieving its minimum at Θ = 0.
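As a quick numerical corroboration of (960), (962), and (970), the following Matlab sketch (base Matlab, random data) builds an EDM from a list and confirms Θ^TΘ = −V_N^T D V_N.

% Verify Theta'*Theta = -V_N'*D*V_N  (962), with Theta = X*sqrt(2)*V_N  (960).
n = 3;  N = 6;
X  = randn(n,N);                                  % arbitrary list
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);            % auxiliary matrix V_N (897)
g  = diag(X'*X);
D  = g*ones(1,N) + ones(N,1)*g' - 2*(X'*X);       % list-form EDM (891)
Theta = X*sqrt(2)*VN;                             % columns x_j - x_1
norm(Theta'*Theta - (-VN'*D*VN), 'fro')           % ~ 0
min(eig(-VN'*D*VN))                               % >= 0 up to roundoff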


5.4.3.3 Inner-product form, discussion

We deduce that knowledge of interpoint distance is equivalent to knowledge of distance and angle from the perspective of one point, x_1 in our chosen case. The total amount of information N(N−1)/2 in Θ^TΘ is unchanged 5.24 with respect to EDM D.

5.5 Invariance

When D is an EDM, there exist an infinite number of corresponding N-point lists X (76) in Euclidean space. All those lists are related by isometric transformation: rotation, reflection, and translation (offset or shift).

5.5.1 Translation

Any translation common among all the points x_l in a list will be cancelled in the formation of each d_ij. Proof follows directly from (887). Knowing that translation α in advance, we may remove it from the list constituting the columns of X by subtracting α1^T. Then it stands to reason by list-form definition (891) of an EDM, for any translation α ∈ R^n

$$D(X - \alpha\mathbf{1}^T) = D(X) \tag{971}$$

In words, interpoint distances are unaffected by offset; EDM D is translation invariant. When α = x_1 in particular,

$$[\,x_2 - x_1\ \ x_3 - x_1\ \cdots\ x_N - x_1\,] = X\sqrt{2}\,V_N \in \mathbb{R}^{n\times N-1} \tag{960}$$

and so

$$D(X - x_1\mathbf{1}^T) = D(X - Xe_1\mathbf{1}^T) = D\bigl(X[\,0\ \ \sqrt{2}V_N\,]\bigr) = D(X) \tag{972}$$

5.24 The reason for amount O(N²) information is because of the relative measurements. Use of a fixed reference in measurement of angles and distances would reduce required information but is antithetical. In the particular case n = 2, for example, ordering all points x_l (in a length-N list) by increasing angle of vector x_l − x_1 with respect to x_2 − x_1, θ_{i1j} becomes equivalent to $\sum_{k=i}^{j-1}\theta_{k,1,k+1} \le 2\pi$ and the amount of information is reduced to 2N−3; rather, O(N).
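Translation invariance (971) is immediate to check numerically; a short Matlab sketch with random data:

% Verify D(X - alpha*1') = D(X)  (971): any common offset cancels.
n = 2;  N = 5;
edm = @(X) diag(X'*X)*ones(1,size(X,2)) + ones(size(X,2),1)*diag(X'*X)' - 2*(X'*X);
X = randn(n,N);  alpha = randn(n,1);
norm( edm(X - alpha*ones(1,N)) - edm(X), 'fro' )  % ~ 0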


5.5.1.0.1 Example. Translating geometric center to origin.
We might choose to shift the geometric center α_c of an N-point list {x_l} (arranged columnar in X) to the origin; [352] [163]

$$\alpha = \alpha_c \triangleq Xb_c \triangleq X\mathbf{1}\tfrac{1}{N} \in \mathcal{P} \subseteq \mathcal{A} \tag{973}$$

where A represents the list's affine hull. If we were to associate a point-mass m_l with each of the points x_l in the list, then their center of mass (or gravity) would be (∑ x_l m_l)/∑ m_l. The geometric center is the same as the center of mass under the assumption of uniform mass density across points. [219] The geometric center always lies in the convex hull P of the list; id est, α_c ∈ P because b_c^T 1 = 1 and b_c ≽ 0. 5.25 Subtracting the geometric center from every list member,

$$X - \alpha_c\mathbf{1}^T = X - \tfrac{1}{N}X\mathbf{1}\mathbf{1}^T = X(I - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T) = XV \in \mathbb{R}^{n\times N} \tag{974}$$

where V is the geometric centering matrix (913). So we have (confer (891))

$$D(X) = D(XV) = \delta(V^T X^T X V)\mathbf{1}^T + \mathbf{1}\delta(V^T X^T X V)^T - 2V^T X^T X V \in \mathrm{EDM}^N \tag{975}$$

5.5.1.1 Gram-form invariance

Following from (975) and the linear Gram-form EDM operator (903):

$$D(G) = D(VGV) = \delta(VGV)\mathbf{1}^T + \mathbf{1}\delta(VGV)^T - 2VGV \in \mathrm{EDM}^N \tag{976}$$

The Gram-form consequently exhibits invariance to translation by a doublet (B.2) u1^T + 1u^T

$$D(G) = D\bigl(G - (u\mathbf{1}^T + \mathbf{1}u^T)\bigr) \tag{977}$$

because, for any u ∈ R^N, D(u1^T + 1u^T) = 0. The collection of all such doublets forms the nullspace (993) to the operator; the translation-invariant subspace S_c^{N⊥} (2000) of the symmetric matrices S^N. This means matrix G is not unique and can belong to an expanse more broad than a positive semidefinite cone; id est, G ∈ S_+^N − S_c^{N⊥}. So explains Gram matrix sufficiency in EDM definition (903). 5.26

5.25 Any b from α = Xb chosen such that b^T 1 = 1, more generally, makes an auxiliary V-matrix. (B.4.5)
5.26 A constraint G1 = 0 would prevent excursion into the translation-invariant subspace (numerical unboundedness).
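Both forms of Gram-form invariance above admit a one-line check; a Matlab sketch (random data) for geometric centering (976) and doublet translation (977):

% Verify D(G) = D(V*G*V)  (976)  and  D(G) = D(G - (u*1' + 1*u'))  (977).
N = 5;
X = randn(2,N);   G = X'*X;   u = randn(N,1);
V = eye(N) - ones(N)/N;                                       % geometric centering matrix (913)
Dop = @(G) diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;      % Gram-form operator (903)
norm( Dop(G) - Dop(V*G*V), 'fro' )                            % ~ 0
norm( Dop(G) - Dop(G - (u*ones(1,N) + ones(N,1)*u')), 'fro' ) % ~ 0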


436 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX5.5.2 Rotation/ReflectionRotation of the list X ∈ R n×N about some arbitrary point α∈ R n , orreflection through some affine subset containing α , can be accomplishedvia Q(X −α1 T ) where Q is an orthogonal matrix (B.5).We rightfully expectD ( Q(X − α1 T ) ) = D(QX − β1 T ) = D(QX) = D(X) (978)Because list-form D(X) is translation invariant, we may safely ignoreoffset and consider only the impact of matrices that premultiply X .Interpoint distances are unaffected by rotation or reflection; we say,EDM D is rotation/reflection invariant. Proof follows from the fact,Q T =Q −1 ⇒ X T Q T QX =X T X . So (978) follows directly from (891).The class of premultiplying matrices for which interpoint distances areunaffected is a little more broad than orthogonal matrices. Looking at EDMdefinition (891), it appears that any matrix Q p such thatX T Q T pQ p X = X T X (979)will have the propertyD(Q p X) = D(X) (980)An example is skinny Q p ∈ R m×n (m>n) having orthonormal columns. Wecall such a matrix orthonormal.5.5.2.0.1 Example. Reflection prevention and quadrature rotation.Consider the EDM graph in Figure 128b representing known distancebetween vertices (Figure 128a) of a tilted-square diamond in R 2 . Supposesome geometrical optimization problem were posed where isometrictransformation is allowed excepting reflection, and where rotation mustbe quantized so that only quadrature rotations are allowed; only multiplesof π/2.In two dimensions, a counterclockwise rotation of any vector about theorigin by angle θ is prescribed by the orthogonal matrix[ ]cos θ − sin θQ =sin θ cosθ(981)


5.5. INVARIANCE 437x 2 x 2x 3 x 1 x 3x 1x 4 x 4(a) (b) (c)Figure 128: (a) Four points in quadrature in two dimensions about theirgeometric center. (b) Complete EDM graph of diamond-shaped vertices.(c) Quadrature rotation of a Euclidean body in R 2 first requires a shroudthat is the smallest Cartesian square containing it.whereas reflection of any point through a hyperplane containing the origin{ ∣ ∣∣∣∣ [ ] Tcos θ∂H = x∈ R 2 x = 0}(982)sin θis accomplished via multiplication with symmetric orthogonal matrix (B.5.2)[ ]sin(θ)R =2 − cos(θ) 2 −2 sin(θ) cos(θ)−2 sin(θ) cos(θ) cos(θ) 2 − sin(θ) 2 (983)Rotation matrix Q is characterized by identical diagonal entries and byantidiagonal entries equal but opposite in sign, whereas reflection matrix Ris characterized in the reverse sense.Assign the diamond vertices { x l ∈ R 2 , l=1... 4 } to columns of a matrixX = [x 1 x 2 x 3 x 4 ] ∈ R 2×4 (76)Our scheme to prevent reflection enforces a rotation matrix characteristicupon the coordinates of adjacent points themselves: First shift the geometriccenter of X to the origin; for geometric centering matrix V ∈ S 4 (5.5.1.0.1),defineY XV ∈ R 2×4 (984)


438 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXTo maintain relative quadrature between points (Figure 128a) and to preventreflection, it is sufficient that all interpoint distances be specified and thatadjacencies Y (:, 1:2) , Y (:, 2:3) , and Y (:, 3:4) be proportional to 2×2rotation matrices; any clockwise rotation would ascribe a reflection matrixcharacteristic. Counterclockwise rotation is thereby enforced by constrainingequality among diagonal and antidiagonal entries as prescribed by (981);[ ] 0 1Y (:,1:3) = Y (:, 2:4) (985)−1 0Quadrature quantization of rotation can be regarded as a constrainton tilt of the smallest Cartesian square containing the diamond as inFigure 128c. Our scheme to quantize rotation requires that all squarevertices be described by vectors whose entries are nonnegative when thesquare is translated anywhere interior to the nonnegative orthant. Wecapture the four square vertices as columns of a product Y C where⎡ ⎤1 0 0 1C = ⎢ 1 1 0 0⎥⎣ 0 1 1 0 ⎦ (986)0 0 1 1Then, assuming a unit-square shroud, the affine constraint[ ] 1/2Y C + 1 T ≥ 0 (987)1/2quantizes rotation, as desired.5.5.2.1 Inner-product form invarianceLikewise, D(Θ) (958) is rotation/reflection invariant;so (979) and (980) similarly apply.5.5.3 Invariance conclusionD(Q p Θ) = D(QΘ) = D(Θ) (988)In the making of an EDM, absolute rotation, reflection, and translationinformation is lost. Given an EDM, reconstruction of point position (5.12,the list X) can be guaranteed correct only in affine dimension r and relativeposition. Given a noiseless complete EDM, this isometric reconstruction isunique in so far as every realization of a corresponding list X is congruent:


5.6. INJECTIVITY OF D & UNIQUE RECONSTRUCTION 4395.6 Injectivity of D & unique reconstructionInjectivity implies uniqueness of isometric reconstruction (5.4.2.3.7); hence,we endeavor to demonstrate it.EDM operators list-form D(X) (891), Gram-form D(G) (903), andinner-product form D(Θ) (958) are many-to-one surjections (5.5) onto thesame range; the EDM cone (6): (confer (904) (995))EDM N = { D(X) : R N−1×N → S N h | X ∈ R N−1×N}= { }D(G) : S N → S N h | G ∈ S N + − S N⊥c= { (989)D(Θ) : R N−1×N−1 → S N h | Θ ∈ R N−1×N−1}where (5.5.1.1)S N⊥c = {u1 T + 1u T | u∈ R N } ⊆ S N (2000)5.6.1 Gram-form bijectivityBecause linear Gram-form EDM operatorD(G) = δ(G)1 T + 1δ(G) T − 2G (903)has no nullspace [85,A.1] on the geometric center subspace 5.27 (E.7.2.0.2)S N c {G∈ S N | G1 = 0} (1998)= {G∈ S N | N(G) ⊇ 1} = {G∈ S N | R(G) ⊆ N(1 T )}= {V Y V | Y ∈ S N } ⊂ S N (1999)≡ {V N AVN T | A ∈ SN−1 }(990)then D(G) on that subspace is injective.5.27 The equivalence ≡ in (990) follows from the fact: Given B = V Y V = V N AVN T ∈ SN cwith only matrix A∈ S N−1 unknown, then V † †TNBV N = A or V † N Y V †TN = A.


[Figure 129 annotations: dim S^N_c = dim S^N_h = N(N−1)/2 in R^{N(N+1)/2}; dim S^{N⊥}_c = dim S^{N⊥}_h = N in R^{N(N+1)/2}; basis S^N_c = V{E_ij}V (confer (59)).]

Figure 129: Orthogonal complements in S^N abstractly oriented in isometrically isomorphic R^{N(N+1)/2}. Case N = 2 accurately illustrated in R^3. Orthogonal projection of basis for S^{N⊥}_h on S^{N⊥}_c yields another basis for S^{N⊥}_c. (Basis vectors for S^{N⊥}_c are illustrated lying in a plane orthogonal to S^N_c in this dimension. Basis vectors for each ⊥ space outnumber those for its respective orthogonal complement; such is not the case in higher dimension.)


5.6. INJECTIVITY OF D & UNIQUE RECONSTRUCTION 441To prove injectivity of D(G) on S N c : Any matrix Y ∈ S N can bedecomposed into orthogonal components in S N ;Y = V Y V + (Y − V Y V ) (991)where V Y V ∈ S N c and Y −V Y V ∈ S N⊥c (2000). Because of translationinvariance (5.5.1.1) and linearity, D(Y −V Y V )=0 hence N(D)⊇ S N⊥c .It remains only to showD(V Y V ) = 0 ⇔ V Y V = 0 (992)(⇔ Y = u1 T + 1u T for some u∈ R N) . D(V Y V ) will vanish whenever2V Y V = δ(V Y V )1 T + 1δ(V Y V ) T . But this implies R(1) (B.2) were asubset of R(V Y V ) , which is contradictory. Thus we haveN(D) = {Y | D(Y )=0} = {Y | V Y V = 0} = S N⊥c (993)Since G1=0 ⇔ X1=0 (911) simply means list X is geometricallycentered at the origin, and because the Gram-form EDM operator D istranslation invariant and N(D) is the translation-invariant subspace S N⊥c ,then EDM definition D(G) (989) on 5.28 (confer6.5.1,6.6.1,A.7.4.0.1)S N c ∩ S N + = {V Y V ≽ 0 | Y ∈ S N } ≡ {V N AV T Nmust be surjective onto EDM N ; (confer (904))EDM N = { D(G) | G ∈ S N c ∩ S N +5.6.1.1 Gram-form operator D inversion| A∈ S N−1+ } ⊂ S N (994)Define the linear geometric centering operator V ; (confer (912))}(995)V(D) : S N → S N −V DV 1 2(996)[89,4.3] 5.29 This orthogonal projector V has no nullspace onS N h = aff EDM N (1249)5.28 Equivalence ≡ in (994) follows from the fact: Given B = V Y V = V N AVN T ∈ SN + withonly matrix A unknown, then V † †TNBV N= A and A∈ SN−1 + must be positive semidefiniteby positive semidefiniteness of B and Corollary A.3.1.0.5.5.29 Critchley cites Torgerson (1958) [349, ch.11,2] for a history and derivation of (996).


442 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXbecause the projection of −D/2 on S N c (1998) can be 0 if and only ifD ∈ S N⊥c ; but S N⊥c ∩ S N h = 0 (Figure 129). Projector V on S N h is thereforeinjective hence uniquely invertible. Further, −V S N h V/2 is equivalent to thegeometric center subspace S N c in the ambient space of symmetric matrices; asurjection,S N c = V(S N ) = V ( S N h ⊕ S N⊥h) ( )= V SNh(997)because (72)V ( ) ( ) (S N h ⊇ V SN⊥h = V δ 2 (S N ) ) (998)Because D(G) on S N c is injective, and aff D ( V(EDM N ) ) = D ( V(aff EDM N ) )by property (127) of the affine hull, we find for D ∈ S N hid est,orD(−V DV 1 2 ) = δ(−V DV 1 2 )1T + 1δ(−V DV 1 2 )T − 2(−V DV 1 2 ) (999)D = D ( V(D) ) (1000)−V DV = V ( D(−V DV ) ) (1001)S N h = D ( V(S N h ) ) (1002)−V S N h V = V ( D(−V S N h V ) ) (1003)These operators V and D are mutual inverses.The Gram-form D ( )S N c (903) is equivalent to SNh ;D ( S N c)= D(V(SNh ⊕ S N⊥h) ) = S N h + D ( V(S N⊥h ) ) = S N h (1004)because S N h ⊇ D ( V(S N⊥h ) ) . In summary, for the Gram-form we have theisomorphisms [90,2] [89, p.76, p.107] [7,2.1] 5.30 [6,2] [8,18.2.1] [2,2.1]and from bijectivity results in5.6.1,S N h = D(S N c ) (1005)S N c = V(S N h ) (1006)EDM N = D(S N c ∩ S N +) (1007)S N c ∩ S N + = V(EDM N ) (1008)5.30 In [7, p.6, line 20], delete sentence: Since G is also...not a singleton set.[7, p.10, line 11] x 3 =2 (not 1).
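The mutual inversion (1000)-(1001) summarized above can be observed directly; a Matlab sketch with random EDM data:

% Verify D = D(V(D)) for D in S^N_h:  applying the Gram-form operator (903)
% to V(D) = -V*D*V/2  (996) recovers D  (999)-(1000).
n = 3;  N = 6;
X = randn(n,N);   g = diag(X'*X);
D = g*ones(1,N) + ones(N,1)*g' - 2*(X'*X);         % an EDM
V = eye(N) - ones(N)/N;
G = -V*D*V/2;                                      % V(D): geometrically centered Gram matrix
Drec = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
norm(D - Drec, 'fro')                              % ~ 0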


5.6. INJECTIVITY OF D & UNIQUE RECONSTRUCTION 4435.6.2 Inner-product form bijectivityThe Gram-form EDM operator D(G)= δ(G)1 T + 1δ(G) T − 2G (903) is aninjective map, for example, on the domain that is the subspace of symmetricmatrices having all zeros in the first row and columnS N 1 = {G∈ S N | Ge 1 = 0}{[ ] [ 0 0T 0 0T= Y0 I 0 I]| Y ∈ S N }(2002)because it obviously has no nullspace there. Since Ge 1 = 0 ⇔ Xe 1 = 0 (905)means the first point in the list X resides at the origin, then D(G) on S N 1 ∩ S N +must be surjective onto EDM N .Substituting Θ T Θ ← −VN TDV N (970) into inner-product form EDMdefinition D(Θ) (958), it may be further decomposed:[0D(D) =δ ( −VN TDV )N] [1 T + 1 0 δ ( −VN TDV ) ] TN[ ] 0 0T− 20 −VN TDV N(1009)This linear operator D is another flavor of inner-product form and an injectivemap of the EDM cone onto itself. Yet when its domain is instead the entiresymmetric hollow subspace S N h = aff EDM N , D(D) becomes an injectivemap onto that same subspace. Proof follows directly from the fact: linear Dhas no nullspace [85,A.1] on S N h = aff D(EDM N )= D(aff EDM N ) (127).5.6.2.1 Inversion of D ( −VN TDV )NInjectivity of D(D) suggests inversion of (confer (908))V N (D) : S N → S N−1 −V T NDV N (1010)a linear surjective 5.31 mapping onto S N−1 having nullspace 5.32 S N⊥c ;V N (S N h ) = S N−1 (1011)5.31 Surjectivity of V N (D) is demonstrated via the Gram-form EDM operator D(G):Since S N h = D(S N c ) (1004), then for any Y ∈ S N−1 , −VN T †TD(VN Y V † N /2)V N = Y .5.32 N(V N ) ⊇ S N⊥c is apparent. There exists a linear mappingT(V N (D)) V †TN V N(D)V † N = −V DV 1 2 = V(D)


444 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXinjective on domain S N h because S N⊥c ∩ S N h = 0. Revising the argument ofthis inner-product form (1009), we get another flavorD ( [−VN TDV )N =0δ ( −VN TDV )N] [1 T + 1 0 δ ( −VN TDV ) ] TNand we obtain mutual inversion of operators V N and D , for D ∈ S N h[ 0 0T− 20 −VN TDV N(1012)]orD = D ( V N (D) ) (1013)−V T NDV N = V N(D(−VTN DV N ) ) (1014)S N h = D ( V N (S N h ) ) (1015)−V T N S N h V N = V N(D(−VTN S N h V N ) ) (1016)Substituting Θ T Θ ← Φ into inner-product form EDM definition (958),any EDM may be expressed by the new flavor[ ]0D(Φ) 1δ(Φ)T + 1 [ 0 δ(Φ) ] [ ] 0 0 T T− 2∈ EDM N0 Φ⇔(1017)Φ ≽ 0where this D is a linear surjective operator onto EDM N by definition,injective because it has no nullspace on domain S N−1+ . More broadly,aff D(S N−1+ )= D(aff S N−1+ ) (127),S N h = D(S N−1 )S N−1 = V N (S N h )(1018)demonstrably isomorphisms, and by bijectivity of this inner-product form:such thatEDM N = D(S N−1+ ) (1019)S N−1+ = V N (EDM N ) (1020)N(T(V N )) = N(V) ⊇ N(V N ) ⊇ S N⊥c= N(V)where the equality S N⊥c = N(V) is known (E.7.2.0.2).


5.7. EMBEDDING IN AFFINE HULL 4455.7 Embedding in affine hullThe affine hull A (78) of a point list {x l } (arranged columnar in X ∈ R n×N(76)) is identical to the affine hull of that polyhedron P (86) formed from allconvex combinations of the x l ; [61,2] [307,17]A = aff X = aff P (1021)Comparing hull definitions (78) and (86), it becomes obvious that the x land their convex hull P are embedded in their unique affine hull A ;A ⊇ P ⊇ {x l } (1022)Recall: affine dimension r is a lower bound on embedding, equal todimension of the subspace parallel to that nonempty affine set A in whichthe points are embedded. (2.3.1) We define dimension of the convex hull Pto be the same as dimension r of the affine hull A [307,2], but r is notnecessarily equal to rank of X (1041).For the particular example illustrated in Figure 115, P is the triangle inunion with its relative interior while its three vertices constitute the entirelist X . Affine hull A is the unique plane that contains the triangle, so affinedimension r = 2 in that example while rank of X is 3. Were there only twopoints in Figure 115, then the affine hull would instead be the unique linepassing through them; r would become 1 while rank would then be 2.5.7.1 Determining affine dimensionKnowledge of affine dimension r becomes important because we lose anyabsolute offset common to all the generating x l in R n when reconstructingconvex polyhedra given only distance information. (5.5.1) To calculate r , wefirst remove any offset that serves to increase dimensionality of the subspacerequired to contain polyhedron P ; subtracting any α ∈ A in the affine hullfrom every list member will work,translating A to the origin: 5.33X − α1 T (1023)A − α = aff(X − α1 T ) = aff(X) − α (1024)P − α = conv(X − α1 T ) = conv(X) − α (1025)5.33 The manipulation of hull functions aff and conv follows from their definitions.


446 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXBecause (1021) and (1022) translate,R n ⊇ A −α = aff(X − α1 T ) = aff(P − α) ⊇ P − α ⊇ {x l − α} (1026)where from the previous relations it is easily shownaff(P − α) = aff(P) − α (1027)Translating A neither changes its dimension or the dimension of theembedded polyhedron P ; (77)r dim A = dim(A − α) dim(P − α) = dim P (1028)For any α ∈ R n , (1024)-(1028) remain true. [307, p.4, p.12] Yet whenα ∈ A , the affine set A − α becomes a unique subspace of R n in whichthe {x l − α} and their convex hull P − α are embedded (1026), and whosedimension is more easily calculated.5.7.1.0.1 Example. Translating first list-member to origin.Subtracting the first member α x 1 from every list member will translatetheir affine hull A and their convex hull P and, in particular, x 1 ∈ P ⊆ A tothe origin in R n ; videlicet,X −x 1 1 T = X −Xe 1 1 T = X(I −e 1 1 T ) = X[0 √ ]2V N ∈ R n×N (1029)where V N is defined in (897), and e 1 in (907). Applying (1026) to (1029),R n ⊇ R(XV N ) = A−x 1 = aff(X −x 1 1 T ) = aff(P −x 1 ) ⊇ P −x 1 ∋ 0(1030)where XV N ∈ R n×N−1 . Hencer = dim R(XV N ) (1031)Since shifting the geometric center to the origin (5.5.1.0.1) translates theaffine hull to the origin as well, then it must also be truer = dim R(XV ) (1032)


5.7. EMBEDDING IN AFFINE HULL 447For any matrix whose range is R(V )= N(1 T ) we get the same result; e.g.,becauser = dim R(XV †TN ) (1033)R(XV ) = {Xz | z ∈ N(1 T )} (1034)and R(V ) = R(V N ) = R(V †TN) (E). These auxiliary matrices (B.4.2) aremore closely related;V = V N V † N(1656)5.7.1.1 Affine dimension r versus rankNow, suppose D is an EDM as defined byD(X) = δ(X T X)1 T + 1δ(X T X) T − 2X T X ∈ EDM N (891)and we premultiply by −V T N and postmultiply by V N . Then because V T N 1=0(898), it is always true that−V T NDV N = 2V T NX T XV N = 2V T N GV N ∈ S N−1 (1035)where G is a Gram matrix.(confer (912))Similarly pre- and postmultiplying by V−V DV = 2V X T XV = 2V GV ∈ S N (1036)always holds because V 1=0 (1646). Likewise, multiplying inner-productform EDM definition (958), it always holds:−V T NDV N = Θ T Θ ∈ S N−1 (962)For any matrix A , rankA T A = rankA = rankA T . [202,0.4] 5.34 So, by(1034), affine dimensionr = rankXV = rankXV N = rankXV †TN= rank Θ= rankV DV = rankV GV = rankVN TDV N = rankVN TGV N(1037)5.34 For A∈R m×n , N(A T A) = N(A). [331,3.3]


By conservation of dimension, (A.7.3.0.1)

$$r + \dim\mathcal{N}(V_N^T D V_N) = N - 1 \tag{1038}$$
$$r + \dim\mathcal{N}(VDV) = N \tag{1039}$$

For D ∈ EDM^N

$$-V_N^T D V_N \succ 0 \ \Leftrightarrow\ r = N - 1 \tag{1040}$$

but −VDV ⊁ 0. The general fact 5.35 (confer (923))

$$r \le \min\{n,\, N-1\} \tag{1041}$$

is evident from (1029) but can be visualized in the example illustrated in Figure 115. There we imagine a vector from the origin to each point in the list. Those three vectors are linearly independent in R^3, but affine dimension r is 2 because the three points lie in a plane. When that plane is translated to the origin, it becomes the only subspace of dimension r = 2 that can contain the translated triangular polyhedron.

5.7.2 Précis

We collect expressions for affine dimension r: for list X ∈ R^{n×N}, Gram matrix G ∈ S^N_+, and D ∈ EDM^N

$$
\begin{aligned}
r \;&\triangleq\; \dim(\mathcal{P}-\alpha) = \dim\mathcal{P} = \dim\operatorname{conv} X\\
&= \dim(\mathcal{A}-\alpha) = \dim\mathcal{A} = \dim\operatorname{aff} X\\
&= \operatorname{rank}(X - x_1\mathbf{1}^T) = \operatorname{rank}(X - \alpha_c\mathbf{1}^T)\\
&= \operatorname{rank}\Theta \quad(960)\\
&= \operatorname{rank} XV_N = \operatorname{rank} XV = \operatorname{rank} XV_N^{\dagger T}\\
&= \operatorname{rank} X\,,\qquad Xe_1 = 0\ \text{ or }\ X\mathbf{1} = 0\\
&= \operatorname{rank} V_N^T G V_N = \operatorname{rank} VGV = \operatorname{rank} V_N^\dagger G V_N\\
&= \operatorname{rank} G\,,\qquad Ge_1 = 0\ (908)\ \text{ or }\ G\mathbf{1} = 0\ (912)\\
&= \operatorname{rank} V_N^T D V_N = \operatorname{rank} VDV = \operatorname{rank} V_N^\dagger D V_N = \operatorname{rank} V_N(V_N^T D V_N)V_N^T\\
&= \operatorname{rank}\Lambda \quad(1128)\\
&= N - 1 - \dim\mathcal{N}\!\left(\begin{bmatrix} 0 & \mathbf{1}^T\\ \mathbf{1} & -D\end{bmatrix}\right)
= \operatorname{rank}\begin{bmatrix} 0 & \mathbf{1}^T\\ \mathbf{1} & -D\end{bmatrix} - 2 \quad(1049)
\end{aligned}
\tag{1042}
$$

5.35 rank X ≤ min{n, N}
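Several of the rank equalities collected in (1042) are easily corroborated numerically; a Matlab sketch using points confined to a plane in R^3 (so r = 2 while rank X = 3):

% Affine dimension r from several equivalent expressions in (1042).
N = 8;
X = [randn(2,N); zeros(1,N)] + randn(3,1)*ones(1,N);   % planar points, arbitrary offset
g = diag(X'*X);
D  = g*ones(1,N) + ones(N,1)*g' - 2*(X'*X);
V  = eye(N) - ones(N)/N;
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
tol = 1e-9;
[rank(X*VN,tol)  rank(X*V,tol)  rank(V*D*V,tol)  rank(VN'*D*VN,tol)  rank(V*(X'*X)*V,tol)]
% each equals 2 = r, whereas rank(X,tol) = 3 here because of the offset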


5.7.3 Eigenvalues of −VDV versus −V_N^† D V_N

Suppose for D ∈ EDM^N we are given eigenvectors v_i ∈ R^N of −VDV and corresponding eigenvalues λ ∈ R^N so that

$$-VDV\,v_i = \lambda_i v_i\,,\qquad i = 1\ldots N \tag{1043}$$

From these we can determine the eigenvectors and eigenvalues of −V_N^† D V_N: Define

$$\nu_i \triangleq V_N^\dagger v_i\,,\qquad \lambda_i \ne 0 \tag{1044}$$

Then we have:

$$-VD\,V_N V_N^\dagger v_i = \lambda_i v_i \tag{1045}$$
$$-V_N^\dagger V D V_N\,\nu_i = \lambda_i V_N^\dagger v_i \tag{1046}$$
$$-V_N^\dagger D V_N\,\nu_i = \lambda_i\nu_i \tag{1047}$$

the eigenvectors of −V_N^† D V_N are given by (1044) while its corresponding nonzero eigenvalues are identical to those of −VDV although −V_N^† D V_N is not necessarily positive semidefinite. In contrast, −V_N^T D V_N is positive semidefinite but its nonzero eigenvalues are generally different.

5.7.3.0.1 Theorem. EDM rank versus affine dimension r.
[163, 3] [184, 3] [162, 3] For D ∈ EDM^N (confer (1201))

1. r = rank(D) − 1 ⇔ 1^T D^† 1 ≠ 0
Points constituting a list X generating the polyhedron corresponding to D lie on the relative boundary of an r-dimensional circumhypersphere having

$$\text{diameter} = \sqrt{2}\,(\mathbf{1}^T D^\dagger\mathbf{1})^{-1/2}\,,\qquad \text{circumcenter} = \frac{X D^\dagger\mathbf{1}}{\mathbf{1}^T D^\dagger\mathbf{1}} \tag{1048}$$

2. r = rank(D) − 2 ⇔ 1^T D^† 1 = 0
There can be no circumhypersphere whose relative boundary contains a generating list for the corresponding polyhedron.

3. In Cayley-Menger form [113, 6.2] [88, 3.3] [50, 40] (5.11.2),

$$r = N - 1 - \dim\mathcal{N}\!\left(\begin{bmatrix} 0 & \mathbf{1}^T\\ \mathbf{1} & -D\end{bmatrix}\right) = \operatorname{rank}\begin{bmatrix} 0 & \mathbf{1}^T\\ \mathbf{1} & -D\end{bmatrix} - 2 \tag{1049}$$

Circumhyperspheres exist for r < rank(D) − 2. [347, 7] ⋄
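Item 1 of the theorem is simple to exercise numerically; a Matlab sketch placing points on a circle of radius 3 and recovering circumradius and circumcenter from D alone via (1048):

% Circumhypersphere from D:  diameter = sqrt(2)*(1'*pinv(D)*1)^(-1/2),
% circumcenter = X*pinv(D)*1 / (1'*pinv(D)*1)   (1048).
N = 7;  rho = 3;  c = [1; -2];
theta = 2*pi*rand(1,N);
X = c*ones(1,N) + rho*[cos(theta); sin(theta)];    % points on a circle (r = 2)
g = diag(X'*X);
D = g*ones(1,N) + ones(N,1)*g' - 2*(X'*X);
Dp = pinv(D);
radius = (sqrt(2)/2) * (ones(1,N)*Dp*ones(N,1))^(-1/2)   % = rho
center = X*Dp*ones(N,1) / (ones(1,N)*Dp*ones(N,1))       % = c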


For all practical purposes, (1041)

$$\max\{0,\ \operatorname{rank}(D) - 2\} \;\le\; r \;\le\; \min\{n,\ N-1\} \tag{1050}$$

5.8 Euclidean metric versus matrix criteria

5.8.1 Nonnegativity property 1

When D = [d_ij] is an EDM (891), then it is apparent from (1035)

$$2V_N^T X^T X V_N = -V_N^T D V_N \succeq 0 \tag{1051}$$

because for any matrix A, A^T A ≽ 0. 5.36 We claim nonnegativity of the d_ij is enforced primarily by the matrix inequality (1051); id est,

$$
\left.\begin{aligned} -V_N^T D V_N &\succeq 0\\ D &\in \mathbb{S}^N_h\end{aligned}\right\}
\ \Rightarrow\ d_{ij} \ge 0\,,\ i \ne j
\tag{1052}
$$

(The matrix inequality to enforce strict positivity differs by a stroke of the pen. (1055))

We now support our claim: If any matrix A ∈ R^{m×m} is positive semidefinite, then its main diagonal δ(A) ∈ R^m must have all nonnegative entries. [159, 4.2] Given D ∈ S^N_h

$$
-V_N^T D V_N =
\begin{bmatrix}
d_{12} & \tfrac12(d_{12}+d_{13}-d_{23}) & \cdots & \tfrac12(d_{12}+d_{1N}-d_{2N})\\
\tfrac12(d_{12}+d_{13}-d_{23}) & d_{13} & \cdots & \tfrac12(d_{13}+d_{1N}-d_{3N})\\
\vdots & \ddots & \ddots & \vdots\\
\tfrac12(d_{12}+d_{1N}-d_{2N}) & \tfrac12(d_{13}+d_{1N}-d_{3N}) & \cdots & d_{1N}
\end{bmatrix}
= \tfrac12\bigl(\mathbf{1}D_{1,2:N} + D_{2:N,1}\mathbf{1}^T - D_{2:N,2:N}\bigr) \in \mathbb{S}^{N-1}
\tag{1053}
$$

5.36 For A ∈ R^{m×n}, A^T A ≽ 0 ⇔ y^T A^T A y = ‖Ay‖² ≥ 0 for all ‖y‖ = 1. When A is full-rank skinny-or-square, A^T A ≻ 0.
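The closed form (1053) for −V_N^T D V_N, and its positive semidefiniteness (1051), can be confirmed entrywise; a Matlab sketch with a random list:

% Compare -V_N'*D*V_N against the block formula (1053) and check it is PSD.
n = 3;  N = 5;
X = randn(n,N);  g = diag(X'*X);
D = g*ones(1,N) + ones(N,1)*g' - 2*(X'*X);
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
A = -VN'*D*VN;
B = ( D(2:N,1)*ones(1,N-1) + ones(N-1,1)*D(1,2:N) - D(2:N,2:N) )/2;   % (1053)
[ norm(A - B,'fro')   min(eig(A)) ]            % ~ 0  and  >= 0 up to roundoff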


5.8. EUCLIDEAN METRIC VERSUS MATRIX CRITERIA 451where row,column indices i,j ∈ {1... N −1}. [312] It follows:−V T N DV N ≽ 0D ∈ S N h}⎡⇒ δ(−VNDV T N ) = ⎢⎣⎤d 12d 13⎥.d 1N⎦ ≽ 0 (1054)Multiplication of V N by any permutation matrix Ξ has null effect on its rangeand nullspace. In other words, any permutation of the rows or columns of V Nproduces a basis for N(1 T ); id est, R(Ξ r V N )= R(V N Ξ c )= R(V N )= N(1 T ).Hence, −VN TDV N ≽ 0 ⇔ −VN TΞT rDΞ r V N ≽ 0 (⇔ −Ξ T c VN TDV N Ξ c ≽ 0).Various permutation matrices 5.37 will sift the remaining d ij similarlyto (1054) thereby proving their nonnegativity. Hence −VN TDV N ≽ 0 isa sufficient test for the first property (5.2) of the Euclidean metric,nonnegativity.When affine dimension r equals 1, in particular, nonnegativity symmetryand hollowness become necessary and sufficient criteria satisfying matrixinequality (1051). (6.5.0.0.1)5.8.1.1 Strict positivityShould we require the points in R n to be distinct, then entries of D off themain diagonal must be strictly positive {d ij > 0, i ≠ j} and only those entriesalong the main diagonal of D are 0. By similar argument, the strict matrixinequality is a sufficient test for strict positivity of Euclidean distance-square;−V T N DV N ≻ 0D ∈ S N h5.8.2 Triangle inequality property 4}⇒ d ij > 0, i ≠ j (1055)In light of Kreyszig’s observation [227,1.1 prob.15] that properties 2through 4 of the Euclidean metric (5.2) together imply nonnegativityproperty 1,5.37 The rule of thumb is: If Ξ r (i,1) = 1, then δ(−V T N ΞT rDΞ r V N )∈ R N−1 is somepermutation of the i th row or column of D excepting the 0 entry from the main diagonal.


$$2\sqrt{d_{jk}} = \sqrt{d_{jk}} + \sqrt{d_{kj}} \ \ge\ \sqrt{d_{jj}} = 0\,,\qquad j \ne k \tag{1056}$$

nonnegativity criterion (1052) suggests that matrix inequality −V_N^T D V_N ≽ 0 might somehow take on the role of triangle inequality; id est,

$$
\left.\begin{aligned} \delta(D) &= 0\\ D^T &= D\\ -V_N^T D V_N &\succeq 0\end{aligned}\right\}
\ \Rightarrow\ \sqrt{d_{ij}} \le \sqrt{d_{ik}} + \sqrt{d_{kj}}\,,\ i\ne j\ne k
\tag{1057}
$$

We now show that is indeed the case: Let T be the leading principal submatrix in S^2 of −V_N^T D V_N (upper left 2×2 submatrix from (1053));

$$T \triangleq \begin{bmatrix} d_{12} & \tfrac12(d_{12}+d_{13}-d_{23})\\ \tfrac12(d_{12}+d_{13}-d_{23}) & d_{13}\end{bmatrix} \tag{1058}$$

Submatrix T must be positive (semi)definite whenever −V_N^T D V_N is. (A.3.1.0.4, 5.8.3) Now we have,

$$
\begin{aligned}
-V_N^T D V_N \succeq 0 \ &\Rightarrow\ T \succeq 0 \ \Leftrightarrow\ \lambda_1 \ge \lambda_2 \ge 0\\
-V_N^T D V_N \succ 0 \ &\Rightarrow\ T \succ 0 \ \Leftrightarrow\ \lambda_1 > \lambda_2 > 0
\end{aligned}
\tag{1059}
$$

where λ_1 and λ_2 are the eigenvalues of T, real due only to symmetry of T:

$$
\begin{aligned}
\lambda_1 &= \tfrac12\Bigl( d_{12} + d_{13} + \sqrt{d_{23}^2 - 2(d_{12}+d_{13})d_{23} + 2(d_{12}^2 + d_{13}^2)}\,\Bigr) \in \mathbb{R}\\
\lambda_2 &= \tfrac12\Bigl( d_{12} + d_{13} - \sqrt{d_{23}^2 - 2(d_{12}+d_{13})d_{23} + 2(d_{12}^2 + d_{13}^2)}\,\Bigr) \in \mathbb{R}
\end{aligned}
\tag{1060}
$$

Nonnegativity of eigenvalue λ_1 is guaranteed by only nonnegativity of the d_ij which in turn is guaranteed by matrix inequality (1052). Inequality between the eigenvalues in (1059) follows from only realness of the d_ij. Since λ_1 always equals or exceeds λ_2, conditions for the positive (semi)definiteness of submatrix T can be completely determined by examining λ_2 the smaller of its two eigenvalues. A triangle inequality is made apparent when we express


5.8. EUCLIDEAN METRIC VERSUS MATRIX CRITERIA 453T eigenvalue nonnegativity in terms of D matrix entries; videlicet,T ≽ 0 ⇔ detT = λ 1 λ 2 ≥ 0 , d 12 ,d 13 ≥ 0 (c)⇔λ 2 ≥ 0(b)⇔| √ d 12 − √ d 23 | ≤ √ d 13 ≤ √ d 12 + √ d 23 (a)(1061)Triangle inequality (1061a) (confer (956) (1073)), in terms of three rootedentries from D , is equivalent to metric property 4√d13 ≤ √ d 12 + √ d 23√d23≤ √ d 12 + √ d 13√d12≤ √ d 13 + √ d 23(1062)for the corresponding points x 1 ,x 2 ,x 3 from some length-N list. 5.385.8.2.1 CommentGiven D whose dimension N equals or exceeds 3, there are N!/(3!(N − 3)!)distinct triangle inequalities in total like (956) that must be satisfied, of whicheach d ij is involved in N −2, and each point x i is in (N −1)!/(2!(N −1 − 2)!).We have so far revealed only one of those triangle inequalities; namely,(1061a) that came from T (1058). Yet we claim if −VN TDV N ≽ 0 thenall triangle inequalities will be satisfied simultaneously;| √ d ik − √ d kj | ≤ √ d ij ≤ √ d ik + √ d kj , i


454 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX5.8.2.1.1 Shore. The columns of Ξ r V N Ξ c hold a basis for N(1 T )when Ξ r and Ξ c are permutation matrices. In other words, any permutationof the rows or columns of V N leaves its range and nullspace unchanged;id est, R(Ξ r V N Ξ c )= R(V N )= N(1 T ) (898). Hence, two distinct matrixinequalities can be equivalent tests of the positive semidefiniteness of D onR(V N ) ; id est, −VN TDV N ≽ 0 ⇔ −(Ξ r V N Ξ c ) T D(Ξ r V N Ξ c )≽0. By properlychoosing permutation matrices, 5.39 the leading principal submatrix T Ξ ∈ S 2of −(Ξ r V N Ξ c ) T D(Ξ r V N Ξ c ) may be loaded with the entries of D needed totest any particular triangle inequality (similarly to (1053)-(1061)). Becauseall the triangle inequalities can be individually tested using a test equivalentto the lone matrix inequality −VN TDV N ≽0, it logically follows that the lonematrix inequality tests all those triangle inequalities simultaneously. Weconclude that −VN TDV N ≽0 is a sufficient test for the fourth property of theEuclidean metric, triangle inequality.5.8.2.2 Strict triangle inequalityWithout exception, all the inequalities in (1061) and (1062) can be madestrict while their corresponding implications remain true. The thenstrict inequality (1061a) or (1062) may be interpreted as a strict triangleinequality under which collinear arrangement of points is not allowed.[223,24/6, p.322] Hence by similar reasoning, −VN TDV N ≻ 0 is a sufficienttest of all the strict triangle inequalities; id est,δ(D) = 0D T = D−V T N DV N ≻ 0⎫⎬⎭ ⇒ √ d ij < √ d ik + √ d kj , i≠j ≠k (1064)5.8.3 −V T N DV N nestingFrom (1058) observe that T =−VN TDV N | N←3 . In fact, for D ∈ EDM N ,the leading principal submatrices of −VN TDV N form a nested sequence (byinclusion) whose members are individually positive semidefinite [159] [202][331] and have the same form as T ; videlicet, 5.405.39 To individually test triangle inequality | √ d ik − √ d kj | ≤ √ d ij ≤ √ d ik + √ d kj forparticular i,k,j, set Ξ r (i,1)= Ξ r (k,2)= Ξ r (j,3)=1 and Ξ c = I .5.40 −V DV | N←1 = 0 ∈ S 0 + (B.4.1)


$$
\begin{aligned}
-V_N^T D V_N|_{N\leftarrow1} &= [\,\varnothing\,] &&\text{(o)}\\[2pt]
-V_N^T D V_N|_{N\leftarrow2} &= [\,d_{12}\,] \in \mathbb{S}_+ &&\text{(a)}\\[2pt]
-V_N^T D V_N|_{N\leftarrow3} &= \begin{bmatrix} d_{12} & \tfrac12(d_{12}+d_{13}-d_{23})\\ \tfrac12(d_{12}+d_{13}-d_{23}) & d_{13}\end{bmatrix} = T \in \mathbb{S}^2_+ &&\text{(b)}\\[2pt]
-V_N^T D V_N|_{N\leftarrow4} &= \begin{bmatrix} d_{12} & \tfrac12(d_{12}+d_{13}-d_{23}) & \tfrac12(d_{12}+d_{14}-d_{24})\\ \tfrac12(d_{12}+d_{13}-d_{23}) & d_{13} & \tfrac12(d_{13}+d_{14}-d_{34})\\ \tfrac12(d_{12}+d_{14}-d_{24}) & \tfrac12(d_{13}+d_{14}-d_{34}) & d_{14}\end{bmatrix} &&\text{(c)}\\
&\ \ \vdots\\
-V_N^T D V_N|_{N\leftarrow i} &= \begin{bmatrix} -V_N^T D V_N|_{N\leftarrow i-1} & \nu(i)\\ \nu(i)^T & d_{1i}\end{bmatrix} \in \mathbb{S}^{i-1}_+ &&\text{(d)}\\
&\ \ \vdots\\
-V_N^T D V_N &= \begin{bmatrix} -V_N^T D V_N|_{N\leftarrow N-1} & \nu(N)\\ \nu(N)^T & d_{1N}\end{bmatrix} \in \mathbb{S}^{N-1}_+ &&\text{(e)}
\end{aligned}
\tag{1065}
$$

where

$$
\nu(i) \;\triangleq\; \tfrac12
\begin{bmatrix} d_{12}+d_{1i}-d_{2i}\\ d_{13}+d_{1i}-d_{3i}\\ \vdots\\ d_{1,i-1}+d_{1i}-d_{i-1,i}\end{bmatrix}
\in \mathbb{R}^{i-2}\,,\qquad i > 2
\tag{1066}
$$

Hence, the leading principal submatrices of EDM D must also be EDMs. 5.41

Bordered symmetric matrices in the form (1065d) are known to have intertwined [331, 6.4] (or interlaced [202, 4.3] [328, IV.4.1]) eigenvalues; (confer 5.11.1) that means, for the particular submatrices (1065a) and (1065b),

5.41 In fact, each and every principal submatrix of an EDM D is another EDM. [235, 4.1]


$$\lambda_2 \le d_{12} \le \lambda_1 \tag{1067}$$

where d_12 is the eigenvalue of submatrix (1065a) and λ_1, λ_2 are the eigenvalues of T (1065b) (1058). Intertwining in (1067) predicts that should d_12 become 0, then λ_2 must go to 0. 5.42 The eigenvalues are similarly intertwined for submatrices (1065b) and (1065c);

$$\gamma_3 \le \lambda_2 \le \gamma_2 \le \lambda_1 \le \gamma_1 \tag{1068}$$

where γ_1, γ_2, γ_3 are the eigenvalues of submatrix (1065c). Intertwining likewise predicts that should λ_2 become 0 (a possibility revealed in 5.8.3.1), then γ_3 must go to 0. Combining results so far for N = 2, 3, 4: (1067) (1068)

$$\gamma_3 \le \lambda_2 \le d_{12} \le \lambda_1 \le \gamma_1 \tag{1069}$$

The preceding logic extends by induction through the remaining members of the sequence (1065).

5.8.3.1 Tightening the triangle inequality

Now we apply Schur complement from A.4 to tighten the triangle inequality from (1057) in case: cardinality N = 4. We find that the gains by doing so are modest. From (1065) we identify:

$$\begin{bmatrix} A & B\\ B^T & C\end{bmatrix} \triangleq -V_N^T D V_N|_{N\leftarrow4} \tag{1070}$$

$$A \triangleq T = -V_N^T D V_N|_{N\leftarrow3} \tag{1071}$$

both positive semidefinite by assumption, where B = ν(4) (1066), and C = d_14. Using nonstrict CC^†-form (1511), C ≽ 0 by assumption (5.8.1) and CC^† = I. So by the positive semidefinite ordering of eigenvalues theorem (A.3.1.0.1),

$$
-V_N^T D V_N|_{N\leftarrow4} \succeq 0 \ \Leftrightarrow\ T \succeq d_{14}^{-1}\nu(4)\nu(4)^T
\ \Rightarrow\
\begin{cases} \lambda_1 \ge d_{14}^{-1}\|\nu(4)\|^2\\ \lambda_2 \ge 0\end{cases}
\tag{1072}
$$

where {d_14^{-1}‖ν(4)‖², 0} are the eigenvalues of d_14^{-1}ν(4)ν(4)^T while λ_1, λ_2 are the eigenvalues of T.

5.42 If d_12 were 0, eigenvalue λ_2 becomes 0 (1060) because d_13 must then be equal to d_23; id est, d_12 = 0 ⇔ x_1 = x_2. (5.4)
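A Matlab sketch (random planar points) exercising the Schur-complement bound (1072):

% For N = 4, the leading 2x2 block T of W = -V_N'*D*V_N satisfies
% lambda_1(T) >= ||nu(4)||^2 / d_14  whenever W is PSD  (1072).
N = 4;
X = randn(2,N);  g = diag(X'*X);
D = g*ones(1,N) + ones(N,1)*g' - 2*(X'*X);
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
W  = -VN'*D*VN;                        % PSD since D is an EDM
T  = W(1:2,1:2);   nu4 = W(1:2,3);   d14 = W(3,3);    % d14 = D(1,4)
[ max(eig(T))   (nu4'*nu4)/d14 ]       % first entry >= second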


5.8.3.1.1 Example. Small completion problem, II.
Applying the inequality for λ_1 in (1072) to the small completion problem on page 398 Figure 116, the lower bound on √d_14 (1.236 in (884)) is tightened to 1.289. The correct value of √d_14 to three significant figures is 1.414.

5.8.4 Affine dimension reduction in two dimensions

(confer 5.14.4) The leading principal 2×2 submatrix T of −V_N^T D V_N has largest eigenvalue λ_1 (1060) which is a convex function of D. 5.43 λ_1 can never be 0 unless d_12 = d_13 = d_23 = 0. Eigenvalue λ_1 can never be negative while the d_ij are nonnegative. The remaining eigenvalue λ_2 is a concave function of D that becomes 0 only at the upper and lower bounds of triangle inequality (1061a) and its equivalent forms: (confer (1063))

$$
\begin{aligned}
&\bigl|\sqrt{d_{12}} - \sqrt{d_{23}}\bigr| \le \sqrt{d_{13}} \le \sqrt{d_{12}} + \sqrt{d_{23}} &&\text{(a)}\\
\Leftrightarrow\quad &\bigl|\sqrt{d_{12}} - \sqrt{d_{13}}\bigr| \le \sqrt{d_{23}} \le \sqrt{d_{12}} + \sqrt{d_{13}} &&\text{(b)}\\
\Leftrightarrow\quad &\bigl|\sqrt{d_{13}} - \sqrt{d_{23}}\bigr| \le \sqrt{d_{12}} \le \sqrt{d_{13}} + \sqrt{d_{23}} &&\text{(c)}
\end{aligned}
\tag{1073}
$$

In between those bounds, λ_2 is strictly positive; otherwise, it would be negative but prevented by the condition T ≽ 0.

When λ_2 becomes 0, it means triangle △_123 has collapsed to a line segment; a potential reduction in affine dimension r. The same logic is valid for any particular principal 2×2 submatrix of −V_N^T D V_N, hence applicable to other triangles.

5.43 The largest eigenvalue of any symmetric matrix is always a convex function of its entries, while the smallest eigenvalue is always concave. [61, exmp.3.10] In our particular case, say d ≜ [d_12; d_13; d_23] ∈ R^3. Then the Hessian (1760) ∇²λ_1(d) ≽ 0 certifies convexity whereas ∇²λ_2(d) ≼ 0 certifies concavity. Each Hessian has rank equal to 1. The respective gradients ∇λ_1(d) and ∇λ_2(d) are nowhere 0.


5.9 Bridge: Convex polyhedra to EDMs

The criteria for the existence of an EDM include, by definition (891) (958), the properties imposed upon its entries d_ij by the Euclidean metric. From 5.8.1 and 5.8.2, we know there is a relationship of matrix criteria to those properties. Here is a snapshot of what we are sure: for i, j, k ∈ {1...N} (confer 5.2)

$$
\left.\begin{aligned}
\sqrt{d_{ij}} &\ge 0\,,\ i \ne j\\
\sqrt{d_{ij}} &= 0\,,\ i = j\\
\sqrt{d_{ij}} &= \sqrt{d_{ji}}\\
\sqrt{d_{ij}} &\le \sqrt{d_{ik}} + \sqrt{d_{kj}}\,,\ i\ne j\ne k
\end{aligned}\right\}
\ \Leftarrow\
\begin{aligned}
-V_N^T D V_N &\succeq 0\\
\delta(D) &= 0\\
D^T &= D
\end{aligned}
\tag{1074}
$$

all implied by D ∈ EDM^N. In words, these four Euclidean metric properties are necessary conditions for D to be a distance matrix. At the moment, we have no converse. As of concern in 5.3, we have yet to establish metric requirements beyond the four Euclidean metric properties that would allow D to be certified an EDM or might facilitate polyhedron or list reconstruction from an incomplete EDM. We deal with this problem in 5.14. Our present goal is to establish ab initio the necessary and sufficient matrix criteria that will subsume all the Euclidean metric properties and any further requirements 5.44 for all N > 1 (5.8.3); id est,

$$
\left.\begin{aligned} -V_N^T D V_N &\succeq 0\\ D &\in \mathbb{S}^N_h\end{aligned}\right\}
\ \Leftrightarrow\ D \in \mathrm{EDM}^N
\tag{910}
$$

or for EDM definition (967),

$$
\left.\begin{aligned} \Omega &\succeq 0\\ \sqrt{\delta(d)} &\succeq 0\end{aligned}\right\}
\ \Leftrightarrow\ D = D(\Omega,d) \in \mathrm{EDM}^N
\tag{1075}
$$

5.44 In 1935, Schoenberg [312, (1)] first extolled matrix product −V_N^T D V_N (1053) (predicated on symmetry and self-distance) specifically incorporating V_N, albeit algebraically. He showed: nonnegativity −y^T V_N^T D V_N y ≥ 0, for all y ∈ R^{N−1}, is necessary and sufficient for D to be an EDM. Gower [162, 3] remarks how surprising it is that such a fundamental property of Euclidean geometry was obtained so late.
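The necessary and sufficient test (910) reduces to a few lines of Matlab; a minimal sketch (save as isedm.m; tolerance handling kept deliberately simple):

function tf = isedm(D, tol)
% ISEDM  true when D is symmetric hollow and -V_N'*D*V_N is PSD  (910).
if nargin < 2, tol = 1e-10; end
N  = size(D,1);
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
hollow = norm(D - D','fro') < tol  &&  norm(diag(D)) < tol;
tf = hollow && ( min(eig(-VN'*D*VN)) > -tol );
end

For instance, ones(4) − eye(4) passes the test (four points of a regular unit simplex), whereas [0 1 9; 1 0 1; 9 1 0] fails it: √9 = 3 exceeds √1 + √1, violating the triangle inequality.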


Figure 130: Elliptope E^3 in isometrically isomorphic R^6 (projected on R^3) is a convex body that appears to possess some kind of symmetry in this dimension; it resembles a malformed pillow in the shape of a bulging tetrahedron. Elliptope relative boundary is not smooth and comprises all set members (1076) having at least one 0 eigenvalue. [238, 2.1] This elliptope has an infinity of vertices, but there are only four vertices corresponding to a rank-1 matrix. Those yy^T, evident in the illustration, have binary vector y ∈ R^3 with entries in {±1}.


Figure 131: Elliptope E^2 in isometrically isomorphic R^3 is a line segment illustrated interior to positive semidefinite cone S^2_+ (Figure 43).

5.9.1 Geometric arguments

5.9.1.0.1 Definition. Elliptope: [238] [235, 2.3] [113, 31.5]
a unique bounded immutable convex Euclidean body in S^n; intersection of positive semidefinite cone S^n_+ with that set of n hyperplanes defined by unity main diagonal;

$$\mathcal{E}^n \triangleq \mathbb{S}^n_+ \cap \{\Phi\in\mathbb{S}^n \mid \delta(\Phi) = \mathbf{1}\} \tag{1076}$$

a.k.a, the set of all correlation matrices of dimension

$$\dim\mathcal{E}^n = n(n-1)/2 \ \text{ in }\ \mathbb{R}^{n(n+1)/2} \tag{1077}$$

An elliptope E^n is not a polyhedron, in general, but has some polyhedral faces and an infinity of vertices. 5.45 Of those, 2^{n−1} vertices (some extreme points of the elliptope) are extreme directions yy^T of the positive semidefinite cone, where the entries of vector y ∈ R^n belong to {±1} and exercise every combination. Each of the remaining vertices has rank, greater than one, belonging to the set {k > 0 | k(k+1)/2 ≤ n}. Each and every face of an elliptope is exposed. △

5.45 Laurent defines vertex distinctly from the sense herein (2.6.1.0.1); she defines vertex as a point with full-dimensional (nonempty interior) normal cone (E.10.3.2.1). Her definition excludes point C in Figure 32, for example.


In fact, any positive semidefinite matrix whose entries belong to {±1} is a rank-one correlation matrix; and vice versa: 5.46

5.9.1.0.2 Theorem. Elliptope vertices rank-one. [125, 2.1.1]
For Y ∈ S^n, y ∈ R^n, and all i,j ∈ {1...n} (confer 2.3.1.0.2)

$$Y \succeq 0\,,\ Y_{ij}\in\{\pm1\} \ \Leftrightarrow\ Y = yy^T,\ y_i\in\{\pm1\} \tag{1078}$$
⋄

The elliptope for dimension n = 2 is a line segment in isometrically isomorphic R^{n(n+1)/2} (Figure 131). Obviously, cone(E^n) ≠ S^n_+. The elliptope for dimension n = 3 is realized in Figure 130.

5.9.1.0.3 Lemma. Hypersphere. (confer bullet p.408) [17, 4]
Matrix Ψ = [Ψ_ij] ∈ S^N belongs to the elliptope in S^N iff there exist N points p on the boundary of a hypersphere in R^{rank Ψ} having radius 1 such that

$$\|p_i - p_j\|^2 = 2(1 - \Psi_{ij})\,,\qquad i,j = 1\ldots N \tag{1079}$$
⋄

There is a similar theorem for Euclidean distance matrices: We derive matrix criteria for D to be an EDM, validating (910) using simple geometry; distance to the polyhedron formed by the convex hull of a list of points (76) in Euclidean space R^n.

5.9.1.0.4 EDM assertion.
D is a Euclidean distance matrix if and only if D ∈ S^N_h and distances-square from the origin

$$\{\|p(y)\|^2 = -y^T V_N^T D V_N\, y \mid y \in \mathcal{S} - \beta\} \tag{1080}$$

correspond to points p in some bounded convex polyhedron

$$\mathcal{P} - \alpha = \{p(y) \mid y \in \mathcal{S} - \beta\} \tag{1081}$$

having N or fewer vertices embedded in an r-dimensional subspace A − α of R^n, where α ∈ A = aff P and where domain of linear surjection p(y) is the unit simplex S ⊂ R^{N−1}_+ shifted such that its vertex at the origin is translated to −β in R^{N−1}. When β = 0, then α = x_1. ⋄

5.46 As there are few equivalent conditions for rank constraints, this device is rather important for relaxing integer, combinatorial, or Boolean problems.
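The hypersphere lemma is likewise easy to see numerically; a Matlab sketch sampling unit vectors:

% Lemma 5.9.1.0.3: with unit-norm columns p_i, the Gram matrix Psi = P'*P is an
% elliptope member and the interpoint distances obey (1079).
N = 6;  r = 3;
P = randn(r,N);
P = P ./ (ones(r,1)*sqrt(sum(P.^2)));        % normalize columns to the unit sphere
Psi = P'*P;                                   % delta(Psi) = 1 and Psi is PSD
Dsq = sum(P.^2)'*ones(1,N) + ones(N,1)*sum(P.^2) - 2*Psi;   % ||p_i - p_j||^2
norm( Dsq - 2*(1 - Psi), 'fro' )              % ~ 0   (1079)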


462 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXIn terms of V N , the unit simplex (293) in R N−1 has an equivalentrepresentation:S = {s ∈ R N−1 | √ 2V N s ≽ −e 1 } (1082)where e 1 is as in (907). Incidental to the EDM assertion, shifting theunit-simplex domain in R N−1 translates the polyhedron P in R n . Indeed,there is a map from vertices of the unit simplex to members of the listgenerating P ;p : R N−1⎛⎧⎪⎨p⎜⎝⎪⎩−βe 1 − βe 2 − β.e N−1 − β⎫⎞⎪⎬⎟⎠⎪⎭→=⎧⎪⎨⎪⎩R nx 1 − αx 2 − αx 3 − α.x N − α⎫⎪⎬⎪⎭(1083)5.9.1.0.5 Proof. EDM assertion.(⇒) We demonstrate that if D is an EDM, then each distance-square ‖p(y)‖ 2described by (1080) corresponds to a point p in some embedded polyhedronP − α . Assume D is indeed an EDM; id est, D can be made from some listX of N unknown points in Euclidean space R n ; D = D(X) for X ∈ R n×Nas in (891). Since D is translation invariant (5.5.1), we may shift the affinehull A of those unknown points to the origin as in (1023). Then take anypoint p in their convex hull (86);P − α = {p = (X − Xb1 T )a | a T 1 = 1, a ≽ 0} (1084)where α = Xb ∈ A ⇔ b T 1 = 1. Solutions to a T 1 = 1 are: 5.47a ∈{e 1 + √ }2V N s | s ∈ R N−1(1085)where e 1 is as in (907). Similarly, b = e 1 + √ 2V N β .P − α = {p = X(I − (e 1 + √ 2V N β)1 T )(e 1 + √ 2V N s) | √ 2V N s ≽ −e 1 }= {p = X √ 2V N (s − β) | √ 2V N s ≽ −e 1 } (1086)5.47 Since R(V N )= N(1 T ) and N(1 T )⊥ R(1) , then over all s∈ R N−1 , V N s is ahyperplane through the origin orthogonal to 1. Thus the solutions {a} constitute ahyperplane orthogonal to the vector 1, and offset from the origin in R N by any particularsolution; in this case, a = e 1 .


5.9. BRIDGE: CONVEX POLYHEDRA TO EDMS 463that describes the domain of p(s) as the unit simplexMaking the substitution s − β ← yS = {s | √ 2V N s ≽ −e 1 } ⊂ R N−1+ (1082)P − α = {p = X √ 2V N y | y ∈ S − β} (1087)Point p belongs to a convex polyhedron P−α embedded in an r-dimensionalsubspace of R n because the convex hull of any list forms a polyhedron, andbecause the translated affine hull A − α contains the translated polyhedronP − α (1026) and the origin (when α ∈ A), and because A has dimensionr by definition (1028). Now, any distance-square from the origin to thepolyhedron P − α can be formulated{p T p = ‖p‖ 2 = 2y T V T NX T XV N y | y ∈ S − β} (1088)Applying (1035) to (1088) we get (1080).(⇐) To validate the EDM assertion in the reverse direction, we prove: If eachdistance-square ‖p(y)‖ 2 (1080) on the shifted unit-simplex S −β ⊂ R N−1corresponds to a point p(y) in some embedded polyhedron P − α , thenD is an EDM. The r-dimensional subspace A − α ⊆ R n is spanned byp(S − β) = P − α (1089)because A − α = aff(P − α) ⊇ P − α (1026). So, outside domain S − βof linear surjection p(y) , simplex complement \S − β ⊂ R N−1 must containdomain of the distance-square ‖p(y)‖ 2 = p(y) T p(y) to remaining points insubspace A − α ; id est, to the polyhedron’s relative exterior \P − α . For‖p(y)‖ 2 to be nonnegative on the entire subspace A − α , −VN TDV N mustbe positive semidefinite and is assumed symmetric; 5.48−V T NDV N Θ T pΘ p (1090)where 5.49 Θ p ∈ R m×N−1 for some m ≥ r . Because p(S − β) is a convexpolyhedron, it is necessarily a set of linear combinations of points from some5.48 The antisymmetric part ( −V T N DV N − (−V T N DV N) T) /2 is annihilated by ‖p(y)‖ 2 . Bythe same reasoning, any positive (semi)definite matrix A is generally assumed symmetricbecause only the symmetric part (A +A T )/2 survives the test y T Ay ≥ 0. [202,7.1]5.49 A T = A ≽ 0 ⇔ A = R T R for some real matrix R . [331,6.3]


464 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXlength-N list because every convex polyhedron having N or fewer verticescan be generated that way (2.12.2). Equivalent to (1080) are{p T p | p ∈ P − α} = {p T p = y T Θ T pΘ p y | y ∈ S − β} (1091)Because p ∈ P − α may be found by factoring (1091), the list Θ p is foundby factoring (1090). A unique EDM can be made from that list usinginner-product form definition D(Θ)| Θ=Θp (958). That EDM will be identicalto D if δ(D)=0, by injectivity of D (1009).5.9.2 Necessity and sufficiencyFrom (1051) we learned that matrix inequality −VN TDV N ≽ 0 is a necessarytest for D to be an EDM. In5.9.1, the connection between convex polyhedraand EDMs was pronounced by the EDM assertion; the matrix inequalitytogether with D ∈ S N h became a sufficient test when the EDM assertiondemanded that every bounded convex polyhedron have a correspondingEDM. For all N >1 (5.8.3), the matrix criteria for the existence of an EDMin (910), (1075), and (886) are therefore necessary and sufficient and subsumeall the Euclidean metric properties and further requirements.5.9.3 Example revisitedNow we apply the necessary and sufficient EDM criteria (910) to an earlierproblem.5.9.3.0.1 Example. Small completion problem, III. (confer5.8.3.1.1)Continuing Example 5.3.0.0.2 pertaining to Figure 116 where N = 4,distance-square d 14 is ascertainable from the matrix inequality −VN TDV N ≽0.Because all distances in (883) are known except √ d 14 , we may simplycalculate the smallest eigenvalue of −VN TDV N over a range of d 14 as inFigure 132. We observe a unique value of d 14 satisfying (910) where theabscissa axis is tangent to the hypograph of the smallest eigenvalue. Sincethe smallest eigenvalue of a symmetric matrix is known to be a concavefunction (5.8.4), we calculate its second partial derivative with respect tod 14 evaluated at 2 and find −1/3. We conclude there are no other satisfyingvalues of d 14 . Further, that value of d 14 does not meet an upper or lowerbound of a triangle inequality like (1063), so neither does it cause the collapse


5.10. EDM-ENTRY COMPOSITION 465of any triangle. Because the smallest eigenvalue is 0, affine dimension r ofany point list corresponding to D cannot exceed N −2. (5.7.1.1) 5.10 EDM-entry compositionLaurent [235,2.3] applies results from Schoenberg (1938) [313] to showcertain nonlinear compositions of individual EDM entries yield EDMs;in particular,D ∈ EDM N ⇔ [1 − e −αd ij] ∈ EDM N ∀α > 0 (a)⇔ [e −αd ij] ∈ E N ∀α > 0 (b)(1092)where D = [d ij ] and E N is the elliptope (1076).5.10.0.0.1 Proof. (Monique Laurent, 2003) [313] (confer [227])Lemma 2.1. from A Tour d’Horizon ... on Completion Problems.[235] For D=[d ij , i,j=1... N]∈ S N h and E N the elliptope in S N(5.9.1.0.1), the following assertions are equivalent:(i) D ∈ EDM N(ii) e −αD [e −αd ij] ∈ E N for all α > 0(iii) 11 T − e −αD [1 − e −αd ij] ∈ EDM N for all α > 0⋄1) Equivalence of Lemma 2.1 (i) (ii) is stated in Schoenberg’s Theorem 1[313, p.527].2) (ii) ⇒ (iii) can be seen from the statement in the beginning of section 3,saying that a distance space embeds in L 2 iff some associated matrixis PSD. We reformulate it:Let d =(d ij ) i,j=0,1...N be a distance space on N+1 points(i.e., symmetric hollow matrix of order N+1) and letp =(p ij ) i,j=1...N be the symmetric matrix of order N related by:(A) 2p ij = d 0i + d 0j − d ij for i,j = 1... Nor equivalently(B) d 0i = p ii , d ij = p ii + p jj − 2p ij for i,j = 1... N


[Figure 132: Smallest eigenvalue of $-V_N^T D V_N$ makes it a PSD matrix for only one value of $d_{14}$: 2. (Axes: $d_{14}$ versus smallest eigenvalue.)]

[Figure 133: Some entrywise EDM compositions of $d_{ij}^{1/\alpha}$, $\log_2(1+d_{ij}^{1/\alpha})$, and $1-e^{-\alpha d_{ij}}$: (a) $\alpha = 2$. Concave nondecreasing in $d_{ij}$. (b) Trajectory convergence in $\alpha$ for $d_{ij} = 2$.]


5.10. EDM-ENTRY COMPOSITION 467Then d embeds in L 2 iff p is a positive semidefinite matrix iff dis of negative type (second half page 525/top of page 526 in [313]).For the implication from (ii) to (iii), set: p = e −αd and define d ′from p using (B) above. Then d ′ is a distance space on N+1points that embeds in L 2 . Thus its subspace of N points alsoembeds in L 2 and is precisely 1 − e −αd .Note that (iii) ⇒ (ii) cannot be read immediately from this argumentsince (iii) involves the subdistance of d ′ on N points (and not the fulld ′ on N+1 points).3) Show (iii) ⇒ (i) by using the series expansion of the function 1 − e −αd :the constant term cancels, α factors out; there remains a summationof d plus a multiple of α . Letting α go to 0 gives the result.This is not explicitly written in Schoenberg, but he also uses suchan argument; expansion of the exponential function then α → 0 (firstproof on [313, p.526]).Schoenberg’s results [313,6 thm.5] (confer [227, p.108-109]) also suggestcertain finite positive roots of EDM entries produce EDMs; specifically,D ∈ EDM N ⇔ [d 1/αij ] ∈ EDM N ∀α > 1 (1093)The special case α = 2 is of interest because it corresponds to absolutedistance; e.g.,D ∈ EDM N ⇒ ◦√ D ∈ EDM N (1094)Assuming that points constituting a corresponding list X are distinct(1055), then it follows: for D ∈ S N hlimα→∞ [d1/α ij] = limα→∞[1 − e −αd ij] = −E 11 T − I (1095)Negative elementary matrix −E (B.3) is relatively interior to the EDM cone(6.5) and terminal to respective trajectories (1092a) and (1093) as functionsof α . Both trajectories are confined to the EDM cone; in engineering terms,the EDM cone is an invariant set [310] to either trajectory. Further, if D isnot an EDM but for some particular α p it becomes an EDM, then for allgreater values of α it remains an EDM.
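Composition rule (1092a) can be spot-checked numerically: build an EDM from a random list via (891), apply the entrywise map $1 - e^{-\alpha d_{ij}}$, and run the Schoenberg test (910). This is a minimal Matlab sketch, not the Wıκımization code; the point list, dimensions, and values of $\alpha$ are arbitrary choices of ours.

% check (1092a): if D is an EDM then [1 - exp(-alpha*d_ij)] is an EDM too
N = 6;  X = randn(3,N);                         % arbitrary list of N points in R^3
G = X'*X;
D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;       % EDM via (891)
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);                  % auxiliary matrix V_N
for alpha = [0.1 1 10]
   C = ones(N) - exp(-alpha*D);                         % entrywise composition
   M = -VN'*C*VN;  M = (M+M')/2;                        % symmetrize against roundoff
   fprintf('alpha = %5.1f   min eig = %9.2e\n', alpha, min(eig(M)))
end                                                     % all >= 0 to roundoff: Schoenberg (910)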


468 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX5.10.0.0.2 Exercise. Concave nondecreasing EDM-entry composition.Given EDM D = [d ij ] , empirical evidence suggests that the composition[log 2 (1 + d 1/αij )] is also an EDM for each fixed α ≥ 1 [sic] . Its concavityin d ij is illustrated in Figure 133 together with functions from (1092a) and(1093). Prove whether it holds more generally: Any concave nondecreasingcomposition of individual EDM entries d ij on R + produces another EDM.5.10.0.0.3 Exercise. Taxicab distance matrix as EDM.Determine whether taxicab distance matrices (D 1 (X) in Example 3.7.0.0.2)are all numerically equivalent to EDMs. Explain why or why not. 5.10.1 EDM by elliptope(confer (917)) For some κ∈ R + and C ∈ S N + in the elliptope E N (5.9.1.0.1),Alfakih asserts any given EDM D is expressible [9] [113,31.5]D = κ(11 T − C) ∈ EDM N (1096)This expression exhibits nonlinear combination of variables κ and C . Wetherefore propose a different expression requiring redefinition of the elliptope(1076) by scalar parametrization;E n t S n + ∩ {Φ∈ S n | δ(Φ)=t1} (1097)where, of course, E n = E n 1 . Then any given EDM D is expressiblewhich is linear in variables t∈ R + and E∈ E N t .5.11 EDM indefinitenessD = t11 T − E ∈ EDM N (1098)By known result (A.7.2) regarding a 0-valued entry on the main diagonal ofa symmetric positive semidefinite matrix, there can be no positive or negativesemidefinite EDM except the 0 matrix because EDM N ⊆ S N h (890) andS N h ∩ S N + = 0 (1099)the origin. So when D ∈ EDM N , there can be no factorization D =A T Aor −D =A T A . [331,6.3] Hence eigenvalues of an EDM are neither allnonnegative or all nonpositive; an EDM is indefinite and possibly invertible.
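Expression (1098) is equally easy to exercise: scale a random correlation matrix, a member of elliptope $\mathcal{E}^N$, by $t$ and subtract it from $t\mathbf{1}\mathbf{1}^T$; the result is hollow and passes the Schoenberg test (910). A minimal sketch; normalizing a random Gram matrix is merely one convenient way to sample the elliptope, and the variable names are ours.

% exercise (1098): D = t*1*1' - E , with E in the elliptope E^N_t , is an EDM
N = 5;  t = 3;
A = randn(N);  G = A*A';                                % random PSD matrix
C = diag(1./sqrt(diag(G)))*G*diag(1./sqrt(diag(G)));    % correlation matrix, member of E^N
E = t*C;                                                % member of E^N_t : delta(E) = t*1
D = t*ones(N) - E;                                      % candidate EDM, linear in t and E
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
M = -VN'*D*VN;  M = (M+M')/2;
[norm(diag(D))  min(eig(M))]                            % hollow, and PSD test passes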


5.11. EDM INDEFINITENESS 4695.11.1 EDM eigenvalues, congruence transformationFor any symmetric −D , we can characterize its eigenvalues by congruencetransformation: [331,6.3][ ]VT−W T NDW = − D [V N1 T1 ] = −[VTN DV N V T N D11 T DV N 1 T D1]∈ S N (1100)BecauseW [V N 1 ] ∈ R N×N (1101)is full-rank, then (1516)inertia(−D) = inertia ( −W T DW ) (1102)the congruence (1100) has the same number of positive, zero, and negativeeigenvalues as −D . Further, if we denote by {γ i , i=1... N −1} theeigenvalues of −VN TDV N and denote eigenvalues of the congruence −W T DWby {ζ i , i=1... N} and if we arrange each respective set of eigenvalues innonincreasing order, then by theory of interlacing eigenvalues for borderedsymmetric matrices [202,4.3] [331,6.4] [328,IV.4.1]ζ N ≤ γ N−1 ≤ ζ N−1 ≤ γ N−2 ≤ · · · ≤ γ 2 ≤ ζ 2 ≤ γ 1 ≤ ζ 1 (1103)When D ∈ EDM N , then γ i ≥ 0 ∀i (1450) because −V T N DV N ≽0 as weknow. That means the congruence must have N −1 nonnegative eigenvalues;ζ i ≥ 0, i=1... N −1. The remaining eigenvalue ζ N cannot be nonnegativebecause then −D would be positive semidefinite, an impossibility; so ζ N < 0.By congruence, nontrivial −D must therefore have exactly one negativeeigenvalue; 5.50 [113,2.4.5]5.50 All entries of the corresponding eigenvector must have the same sign, with respectto each other, [89, p.116] because that eigenvector is the Perron vector corresponding tospectral radius; [202,8.3.1] the predominant characteristic of square nonnegative matrices.Unlike positive semidefinite matrices, nonnegative matrices are guaranteed only to haveat least one nonnegative eigenvalue.


$D \in \mathrm{EDM}^N \;\Rightarrow\; \begin{cases} \lambda(-D)_i \ge 0\,, & i = 1 \ldots N-1\\ \sum_{i=1}^{N} \lambda(-D)_i = 0\\ D \in \mathbb{S}^N_h \cap \mathbb{R}^{N\times N}_+ \end{cases}$    (1104)

where the $\lambda(-D)_i$ are nonincreasingly ordered eigenvalues of $-D$ whose sum must be 0 only because $\mathrm{tr}\,D = 0$ [331, §5.1]. The eigenvalue summation condition, therefore, can be considered redundant. Even so, all these conditions are insufficient to determine whether some given $H \in \mathbb{S}^N_h$ is an EDM, as shown by counterexample.5.51

5.11.1.0.1 Exercise. Spectral inequality.
Prove whether it holds: for $D = [d_{ij}] \in \mathrm{EDM}^N$

$\lambda(-D)_1 \;\ge\; d_{ij} \;\ge\; \lambda(-D)_{N-1} \qquad \forall\, i \ne j$    (1105)

Terminology: a spectral cone is a convex cone containing all eigenspectra [227, p.365] [328, p.26] corresponding to some set of matrices. Any positive semidefinite matrix, for example, possesses a vector of nonnegative eigenvalues corresponding to an eigenspectrum contained in a spectral cone that is a nonnegative orthant.

5.11.2 Spectral cones

Denoting the eigenvalues of Cayley-Menger matrix $\begin{bmatrix} 0 & \mathbf{1}^T\\ \mathbf{1} & -D \end{bmatrix} \in \mathbb{S}^{N+1}$ by

$\lambda\!\left(\begin{bmatrix} 0 & \mathbf{1}^T\\ \mathbf{1} & -D \end{bmatrix}\right) \in \mathbb{R}^{N+1}$    (1106)

5.51 When $N = 3$, for example, the symmetric hollow matrix
$H = \begin{bmatrix} 0 & 1 & 1\\ 1 & 0 & 5\\ 1 & 5 & 0 \end{bmatrix} \in \mathbb{S}^N_h \cap \mathbb{R}^{N\times N}_+$
is not an EDM, although $\lambda(-H) = [\,5\ \ 0.3723\ \ {-5.3723}\,]^T$ conforms to (1104).


5.11. EDM INDEFINITENESS 471we have the Cayley-Menger form (5.7.3.0.1) of necessary and sufficientconditions for D ∈ EDM N from the literature: [184,3] 5.52 [74,3] [113,6.2](confer (910) (886))D ∈ EDM N ⇔⎧⎪⎨⎪⎩([ 0 1Tλ1 −DD ∈ S N h])≥ 0 ,ii = 1... N⎫⎪⎬⎪⎭ ⇔ {−VTN DV N ≽ 0D ∈ S N h(1107)These conditions say the Cayley-Menger form has ([ one and])only one negative0 1Teigenvalue. When D is an EDM, eigenvalues λ belong to that1 −Dparticular orthant in R N+1 having the N+1 th coordinate as sole negativecoordinate 5.53 :[ ]RN+= cone {eR 1 , e 2 , · · · e N , −e N+1 } (1108)−5.11.2.1 Cayley-Menger versus SchoenbergConnection to the Schoenberg criterion (910) is made when theCayley-Menger form is further partitioned:[ 0 1T1 −D⎡ [ ] [ ] ⎤] 0 1 1 T⎢= ⎣ 1 0 −D 1,2:N⎥⎦ (1109)[1 −D 2:N,1 ] −D 2:N,2:N[ ] 0 1Matrix D ∈ S N h is an EDM if and only if the Schur complement of1 0(A.4) in this partition is positive semidefinite; [17,1] [217,3] id est,5.52 Recall: for D ∈ S N h , −V T N DV N ≽ 0 subsumes nonnegativity property 1 (5.8.1).5.53 Empirically, all except one entry of the corresponding eigenvector have the same signwith respect to each other.


472 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX(confer (1053))D ∈ EDM N⇔[ ][ ]0 1 1−D 2:N,2:N − [1 −D 2:N,1 ]T= −2VN 1 0 −D TDV N ≽ 01,2:Nand(1110)D ∈ S N hPositive semidefiniteness of that Schur complement insures nonnegativity(D ∈ R N×N+ ,5.8.1), whereas complementary inertia (1518) insures existenceof that lone negative eigenvalue of the Cayley-Menger form.Now we apply results from chapter 2 with regard to polyhedral cones andtheir duals.5.11.2.2 Ordered eigenspectraConditions (1107) specify eigenvalue [ membership ] to the smallest pointed0 1Tpolyhedral spectral cone for1 −EDM N :K λ {ζ ∈ R N+1 | ζ 1 ≥ ζ 2 ≥ · · · ≥ ζ N ≥ 0 ≥ ζ N+1 , 1 T ζ = 0}[ ]RN+= K M ∩ ∩ ∂HR −([ ])0 1T= λ1 −EDM N(1111)where∂H {ζ ∈ R N+1 | 1 T ζ = 0} (1112)is a hyperplane through the origin, and K M is the monotone cone(2.13.9.4.3, implying ordered eigenspectra) which is full-dimensional but isnot pointed;K M = {ζ ∈ R N+1 | ζ 1 ≥ ζ 2 ≥ · · · ≥ ζ N+1 } (436)K ∗ M = { [e 1 − e 2 e 2 −e 3 · · · e N −e N+1 ]a | a ≽ 0 } ⊂ R N+1 (437)


5.11. EDM INDEFINITENESS 473So because of the hyperplane,indicating K λ is not full-dimensional. Defining⎡⎡e T 1 − e T ⎤2A ⎢e T 2 − e T 3 ⎥⎣ . ⎦ ∈ RN×N+1 , B ⎢e T N − ⎣eT N+1we have the halfspace-description:dim aff K λ = dim∂H = N (1113)e T 1e T2 .e T N−e T N+1⎤⎥⎦ ∈ RN+1×N+1 (1114)K λ = {ζ ∈ R N+1 | Aζ ≽ 0, Bζ ≽ 0, 1 T ζ = 0} (1115)From this and (444) we get a vertex-description for a pointed spectral conethat is not full-dimensional:{ ([ ] ) †}ÂK λ = V N V N b | b ≽ 0(1116)ˆBwhere V N ∈ R N+1×N , and where [sic]ˆB = e T N ∈ R 1×N+1 (1117)and⎡e T 1 − e T ⎤2Â = ⎢e T 2 − e T 3 ⎥⎣ . ⎦ ∈ RN−1×N+1 (1118)e T N−1 − eT Nhold those[ rows]of A and B corresponding to conically independent rowsA(2.10) in VB N .Conditions (1107) can be equivalently restated in terms of a spectral conefor Euclidean distance matrices:⎧ ([ ]) [ ]⎪⎨ 0 1TRN+λ∈ KD ∈ EDM N ⇔ 1 −DM ∩ ∩ ∂HR − (1119)⎪⎩D ∈ S N h
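Condition (1119) is directly checkable: for an EDM built from a random list, the ordered eigenvalues of the Cayley-Menger matrix show exactly one negative entry (the last), the remaining $N$ nonnegative to roundoff, and a zero sum. A minimal sketch; the cardinality and planar test list are arbitrary choices of ours.

% eigenvalue signs of the Cayley-Menger matrix, conditions (1107)/(1119)
N = 5;  X = randn(2,N);  G = X'*X;
D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;       % EDM (891)
zeta = sort(eig([0 ones(1,N); ones(N,1) -D]),'descend'); % ordered eigenspectrum
[zeta(N+1)  min(zeta(1:N))  sum(zeta)]                   % one negative, N nonnegative, zero sum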


474 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXVertex-description of the dual spectral cone is, (314)K ∗ λ = K∗ M + [RN+R −] ∗+ ∂H ∗ ⊆ R N+1= { [ A T B T 1 −1 ] b | b ≽ 0 } =From (1116) and (445) we get a halfspace-description:{ }[ÂT ˆBT 1 −1]a | a ≽ 0(1120)K ∗ λ = {y ∈ R N+1 | (V T N [ÂT ˆBT ]) † V T N y ≽ 0} (1121)This polyhedral dual spectral cone K ∗ λ is closed, convex, full-dimensionalbecause K λ is pointed, but is not pointed because K λ is not full-dimensional.5.11.2.3 Unordered eigenspectraSpectral cones are not unique; eigenspectra ordering can be rendered benignwithin a cone by presorting a vector of eigenvalues into nonincreasingorder. 5.54 Then things simplify: Conditions (1107) now specify eigenvaluemembership to the spectral cone([ ]) [ ]0 1T RNλ1 −EDM N += ∩ ∂HR − (1122)= {ζ ∈ R N+1 | Bζ ≽ 0, 1 T ζ = 0}where B is defined in (1114), and ∂H in (1112). From (444) we get avertex-description for a pointed spectral cone not full-dimensional:([ ])0 1T {λ1 −EDM N = V N ( ˜BV}N ) † b | b ≽ 0{[ ] } (1123)I=−1 T b | b ≽ 0where V N ∈ R N+1×N and˜B ⎡⎢⎣e T 1e T2 .e T N⎤⎥⎦ ∈ RN×N+1 (1124)5.54 Eigenspectra ordering (represented by a cone having monotone description such as(1111)) becomes benign in (1332), for example, where projection of a given presorted vectoron the nonnegative orthant in a subspace is equivalent to its projection on the monotonenonnegative cone in that same subspace; equivalence is a consequence of presorting.


5.11. EDM INDEFINITENESS 475holds only those rows of B corresponding to conically independent rowsin BV N .For presorted eigenvalues, (1107) can be equivalently restatedD ∈ EDM N⇔⎧⎪⎨⎪⎩([ 0 1Tλ1 −DD ∈ S N h])∈[ ]RN+∩ ∂HR −(1125)Vertex-description of the dual spectral cone is, (314)([ ]) 0 1T ∗λ1 −EDM N =[ ]RN++ ∂H ∗ ⊆ R N+1R −= {[ B T 1 −1 ] b | b ≽ 0 } =From (445) we get a halfspace-description:{[ }˜BT 1 −1]a | a ≽ 0(1126)([ ]) 0 1T ∗λ1 −EDM N = {y ∈ R N+1 | (VN T ˜B T ) † VN Ty ≽ 0}= {y ∈ R N+1 | [I −1 ]y ≽ 0}(1127)This polyhedral dual spectral cone is closed, convex, full-dimensional but isnot pointed. (Notice that any nonincreasingly ordered eigenspectrum belongsto this dual spectral cone.)5.11.2.4 Dual cone versus dual spectral coneAn open question regards the relationship of convex cones and their duals tothe corresponding spectral cones and their duals. A positive semidefinitecone, for example, is selfdual. Both the nonnegative orthant and themonotone nonnegative cone are spectral cones for it. When we considerthe nonnegative orthant, then that spectral cone for the selfdual positivesemidefinite cone is also selfdual.


476 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX5.12 List reconstructionThe term metric multidimensional scaling 5.55 [257] [106] [353] [104] [260][89] refers to any reconstruction of a list X ∈ R n×N in Euclidean spacefrom interpoint distance information, possibly incomplete (6.7), ordinal(5.13.2), or specified perhaps only by bounding-constraints (5.4.2.3.9)[351]. Techniques for reconstruction are essentially methods for optimallyembedding an unknown list of points, corresponding to given Euclideandistance data, in an affine subset of desired or minimum dimension.The oldest known precursor is called principal component analysis [165]which analyzes the correlation matrix (5.9.1.0.1); [53,22] a.k.a,Karhunen-Loéve transform in the digital signal processing literature.Isometric reconstruction (5.5.3) of point list X is best performed byeigenvalue decomposition of a Gram matrix; for then, numerical errors offactorization are easily spotted in the eigenvalues: Now we consider howrotation/reflection and translation invariance factor into a reconstruction.5.12.1 x 1 at the origin. V NAt the stage of reconstruction, we have D ∈ EDM N and wish to find agenerating list (2.3.2) for polyhedron P − α by factoring Gram matrix−V T N DV N (908) as in (1090). One way to factor −V T N DV N is viadiagonalization of symmetric matrices; [331,5.6] [202] (A.5.1,A.3)−V T NDV N QΛQ T (1128)QΛQ T ≽ 0 ⇔ Λ ≽ 0 (1129)where Q∈ R N−1×N−1 is an orthogonal matrix containing eigenvectorswhile Λ∈ S N−1 is a diagonal matrix containing corresponding nonnegativeeigenvalues ordered by nonincreasing value. From the diagonalization,identify the list using (1035);−V T NDV N = 2V T NX T XV N Q √ ΛQ T pQ p√ΛQT(1130)5.55 Scaling [349] means making a scale, i.e., a numerical representation of qualitative data.If the scale is multidimensional, it’s multidimensional scaling.−Jan de LeeuwA goal of multidimensional scaling is to find a low-dimensional representation of listX so that distances between its elements best fit a given set of measured pairwisedissimilarities. When data comprises measurable distances, then reconstruction is termedmetric multidimensional scaling. In one dimension, N coordinates in X define the scale.


5.12. LIST RECONSTRUCTION 477where √ ΛQ T pQ p√Λ Λ =√Λ√Λ and where Qp ∈ R n×N−1 is unknown as isits dimension n . Rotation/reflection is accounted for by Q p yet only itsfirst r columns are necessarily orthonormal. 5.56 Assuming membership tothe unit simplex y ∈ S (1087), then point p = X √ 2V N y = Q p√ΛQ T y in R nbelongs to the translated polyhedronP − x 1 (1131)whose generating list constitutes the columns of (1029)[ √ ] [ √ ]0 X 2VN = 0 Q p ΛQT∈ R n×N= [0 x 2 −x 1 x 3 −x 1 · · · x N −x 1 ](1132)The scaled auxiliary matrix V N represents that translation. A simple choicefor Q p has n set to N − 1; id est, Q p = I . Ideally, each member of thegenerating list has at most r nonzero entries; r being, affine dimensionrankV T NDV N = rankQ p√ΛQ T = rank Λ = r (1133)Each member then has at least N −1 − r zeros in its higher-dimensionalcoordinates because r ≤ N −1. (1041) To truncate those zeros, choose nequal to affine dimension which is the smallest n possible because XV N hasrank r ≤ n (1037). 5.57 In that case, the simplest choice for Q p is [I 0 ]having dimension r ×N −1.We may wish to verify the list (1132) found from the diagonalization of−VN TDV N . Because of rotation/reflection and translation invariance (5.5),EDM D can be uniquely made from that list by calculating: (891)5.56 Recall r signifies affine dimension. Q p is not necessarily an orthogonal matrix. Q p isconstrained such that only its first r columns are necessarily orthonormal because thereare only r nonzero eigenvalues in Λ when −VN TDV N has rank r (5.7.1.1). Remainingcolumns of Q p are arbitrary.⎡⎤⎡q T1 ⎤λ 1 0. .. 5.57 If we write Q T = ⎣. .. ⎦ as rowwise eigenvectors, Λ = ⎢ λr ⎥qN−1T ⎣ 0 ... ⎦0 0in terms of eigenvalues, and Q p = [ ]q p1 · · · q pN−1 as column vectors, then√Q p Λ Q T ∑= r √λi q pi qiT is a sum of r linearly independent rank-one matrices (B.1.1).i=1Hence the summation has rank r .


478 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXD(X) = D(X[0 √ 2V N ]) = D(Q p [0 √ ΛQ T ]) = D([0 √ ΛQ T ]) (1134)This suggests a way to find EDM D given −V T N DV N (confer (1013))[0D =δ ( −VN TDV )N] [1 T + 1 0 δ ( ) ] [−VNDV T T 0 0TN − 20 −VN TDV N](1009)5.12.2 0 geometric center. VAlternatively we may perform reconstruction using auxiliary matrix V(B.4.1) and Gram matrix −V DV 1 (912) instead; to find a generating list2for polyhedronP − α c (1135)whose geometric center has been translated to the origin. Redimensioningdiagonalization factors Q, Λ∈R N×N and unknown Q p ∈ R n×N , (1036)−V DV = 2V X T XV Q √ ΛQ T pQ p√ΛQ T QΛQ T (1136)where the geometrically centered generating list constitutes (confer (1132))XV = 1 √2Q p√ΛQ T ∈ R n×N= [x 1 − 1 N X1 x 2 − 1 N X1 x 3 − 1 N X1 · · · x N − 1 N X1 ] (1137)where α c = 1 N X1. (5.5.1.0.1) The simplest choice for Q p is [ I 0 ]∈ R r×N .Now EDM D can be uniquely made from the list found: (891)D(X) = D(XV ) = D( 1 √2Q p√ΛQ T ) = D( √ ΛQ T ) 1 2(1138)This EDM is, of course, identical to (1134). Similarly to (1009), from −V DVwe can find EDM D (confer (1000))D = δ(−V DV 1 2 )1T + 1δ(−V DV 1 2 )T − 2(−V DV 1 2 ) (999)
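Either reconstruction reduces to a few lines of Matlab: diagonalize Gram matrix $-V D V\,\frac{1}{2}$, retain the $r$ nonzero eigenvalues, rebuild the centered list per (1137), and confirm that (1138) reproduces $D$. A minimal sketch along the lines of §5.12.2; the random test list and the rank tolerance are our own choices.

% isometric reconstruction per section 5.12.2: factor -V*D*V/2 and rebuild the list
N = 6;  X0 = randn(3,N);  G0 = X0'*X0;
D  = diag(G0)*ones(1,N) + ones(N,1)*diag(G0)' - 2*G0;   % given EDM (891)
V  = eye(N) - ones(N)/N;                                % geometric centering matrix
B  = -V*D*V/2;  B = (B+B')/2;                           % centered Gram matrix
[Q,L] = eig(B);  [l,k] = sort(diag(L),'descend');  Q = Q(:,k);
r  = sum(l > 1e-9*max(l));                              % affine dimension
XV = diag(sqrt(l(1:r)))*Q(:,1:r)';                      % centered list (1137), r x N
G  = XV'*XV;
Drec = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;    % (1138)
norm(D - Drec,'fro')                                    % ~ machine precision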


[Figure 134: Map of United States of America showing some state boundaries and the Great Lakes. All plots made by connecting 5020 points. Any difference in scale in (a) through (d) is an artifact of the plotting routine.
(a) Original map made from decimated (latitude, longitude) data.
(b) Original map data rotated (freehand) to highlight curvature of Earth.
(c) Map isometrically reconstructed from an EDM (from distance only).
(d) Same reconstructed map illustrating curvature.
(e)(f) Two views of one isotonic reconstruction (from comparative distance); problem (1147) with no sort constraint $\Pi d$ (and no hidden line removal).]


480 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX5.13 Reconstruction examples5.13.1 Isometric reconstruction5.13.1.0.1 Example. Cartography.The most fundamental application of EDMs is to reconstruct relative pointposition given only interpoint distance information. Drawing a map of theUnited States is a good illustration of isometric reconstruction from completedistance data. We obtained latitude and longitude information for the coast,border, states, and Great Lakes from the usalo atlas data file within MatlabMapping Toolbox; conversion to Cartesian coordinates (x,y,z) via:φ π/2 − latitudeθ longitudex = sin(φ) cos(θ)y = sin(φ) sin(θ)z = cos(φ)(1139)We used 64% of the available map data to calculate EDM D from N = 5020points. The original (decimated) data and its isometric reconstruction via(1130) are shown in Figure 134a-d. Matlab code is on Wıκımization. Theeigenvalues computed for (1128) areλ(−V T NDV N ) = [199.8 152.3 2.465 0 0 0 · · · ] T (1140)The 0 eigenvalues have absolute numerical error on the order of 2E-13 ;meaning, the EDM data indicates three dimensions (r = 3) are required forreconstruction to nearly machine precision.5.13.2 Isotonic reconstructionSometimes only comparative information about distance is known (Earth iscloser to the Moon than it is to the Sun). Suppose, for example, EDM D forthree points is unknown:D = [d ij ] =⎡⎣0 d 12 d 13d 12 0 d 23d 13 d 23 0but comparative distance data is available:⎤⎦ ∈ S 3 h (880)d 13 ≥ d 23 ≥ d 12 (1141)


With vectorization $d = [\,d_{12}\ d_{13}\ d_{23}\,]^T \in \mathbb{R}^3$, we express the comparative data as the nonincreasing sorting

$\Pi d \;=\; \begin{bmatrix} 0&1&0\\ 0&0&1\\ 1&0&0 \end{bmatrix}\!\begin{bmatrix} d_{12}\\ d_{13}\\ d_{23} \end{bmatrix} \;=\; \begin{bmatrix} d_{13}\\ d_{23}\\ d_{12} \end{bmatrix} \;\in\; \mathcal{K}_{\mathrm{M}+}$    (1142)

where $\Pi$ is a given permutation matrix expressing known sorting action on the entries of unknown EDM $D$, and $\mathcal{K}_{\mathrm{M}+}$ is the monotone nonnegative cone (§2.13.9.4.2)

$\mathcal{K}_{\mathrm{M}+} = \{\,z \mid z_1 \ge z_2 \ge \cdots \ge z_{N(N-1)/2} \ge 0\,\} \subseteq \mathbb{R}^{N(N-1)/2}_+$    (429)

where $N(N-1)/2 = 3$ for the present example. From sorted vectorization (1142) we create the sort-index matrix

$O = \begin{bmatrix} 0 & 1^2 & 3^2\\ 1^2 & 0 & 2^2\\ 3^2 & 2^2 & 0 \end{bmatrix} \in \mathbb{S}^3_h \cap \mathbb{R}^{3\times 3}_+$    (1143)

generally defined

$O_{ij} \;\triangleq\; k^2 \;\big|\; d_{ij} = (\Xi\,\Pi d)_k\,,\qquad j \ne i$    (1144)

where $\Xi$ is a permutation matrix (1728) completely reversing order of vector entries.
Replacing EDM data with indices-square of a nonincreasing sorting like this is, of course, a heuristic we invented and may be regarded as a nonlinear introduction of much noise into the Euclidean distance matrix. For large data sets, this heuristic makes an otherwise intense problem computationally tractable; we see an example in relaxed problem (1148).
Any process of reconstruction that leaves comparative distance information intact is called ordinal multidimensional scaling or isotonic reconstruction. Beyond rotation, reflection, and translation error (§5.5), list reconstruction by isotonic reconstruction is subject to error in absolute scale (dilation) and distance ratio. Yet Borg & Groenen argue: [53, §2.2] reconstruction from complete comparative distance information for a large number of points is as highly constrained as reconstruction from an EDM; the larger the number, the better.


[Figure 135: Largest ten eigenvalues of $-V_N^T O V_N$ for map of USA, sorted by nonincreasing value. (Axes: index $j$ versus $\lambda(-V_N^T O V_N)_j$, scale 0 to 900.)]

5.13.2.1 Isotonic cartography

To test Borg & Groenen's conjecture, suppose we make a complete sort-index matrix $O \in \mathbb{S}^N_h \cap \mathbb{R}^{N\times N}_+$ for the map of the USA and then substitute $O$ in place of EDM $D$ in the reconstruction process of §5.12. Whereas EDM $D$ returned only three significant eigenvalues (1140), the sort-index matrix $O$ is generally not an EDM (certainly not an EDM with corresponding affine dimension 3) so returns many more. The eigenvalues, calculated with absolute numerical error approximately 5E-7, are plotted in Figure 135: (In the code on Wıκımization, matrix $O$ is normalized by $(N(N-1)/2)^2$.)

$\lambda(-V_N^T O V_N) = [\,880.1\ \ 463.9\ \ 186.1\ \ 46.20\ \ 17.12\ \ 9.625\ \ 8.257\ \ 1.701\ \ 0.7128\ \ 0.6460\ \cdots\,]^T$    (1145)

The extra eigenvalues indicate that affine dimension corresponding to an EDM near $O$ is likely to exceed 3. To realize the map, we must simultaneously reduce that dimensionality and find an EDM $D$ closest to $O$ in some sense 5.58 while maintaining the known comparative distance relationship. For example: given permutation matrix $\Pi$ expressing the known sorting action like (1142) on entries

5.58 a problem explored more in §7.


5.13. RECONSTRUCTION EXAMPLES 483d 1 √2dvec D =⎡⎢⎣⎤d 12d 13d 23d 14d 24d 34⎥⎦.d N−1,N∈ R N(N−1)/2 (1146)of unknown D ∈ S N h , we can make sort-index matrix O input to theoptimization problemminimize ‖−VN T(D − O)V N ‖ FDsubject to rankVN TDV N ≤ 3(1147)Πd ∈ K M+D ∈ EDM Nthat finds the EDM D (corresponding to affine dimension not exceeding 3 inisomorphic dvec EDM N ∩ Π T K M+ ) closest to O in the sense of Schoenberg(910).Analytical solution to this problem, ignoring the sort constraintΠd ∈ K M+ , is known [353]: we get the convex optimization [sic] (7.1)minimize ‖−VN T(D − O)V N ‖ FDsubject to rankVN TDV N ≤ 3(1148)D ∈ EDM NOnly the three largest nonnegative eigenvalues in (1145) need be retained tomake list (1132); the rest are discarded. The reconstruction from EDM Dfound in this manner is plotted in Figure 134e-f. Matlab code is onWıκımization. From these plots it becomes obvious that inclusion of thesort constraint is necessary for isotonic reconstruction.That sort constraint demands: any optimal solution D ⋆ must possess theknown comparative distance relationship that produces the original ordinaldistance data O (1144). Ignoring the sort constraint, apparently, violates it.Yet even more remarkable is how much the map reconstructed using onlyordinal data still resembles the original map of the USA after suffering themany violations produced by solving relaxed problem (1148). This suggeststhe simple reconstruction techniques of5.12 are robust to a significantamount of noise.
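The closed-form solution to relaxed problem (1148) amounts to retaining only the three largest nonnegative eigenvalues of $-V_N^T O V_N$ when forming list (1132). What follows is a minimal sketch of that truncation on a small synthetic sort-index matrix of our own making; the genuine USA-map code referenced above resides on Wıκımization.

% relaxed problem (1148): keep only the three largest eigenvalues of -VN'*O*VN
N  = 20;  X0 = randn(3,N);  G0 = X0'*X0;
D0 = diag(G0)*ones(1,N) + ones(N,1)*diag(G0)' - 2*G0;   % underlying EDM
mask = logical(tril(ones(N),-1));  d = D0(mask);        % vectorized distances
[~,k] = sort(d,'ascend');  o = zeros(size(d));  o(k) = ((1:numel(d)).').^2;
O  = zeros(N);  O(mask) = o;  O = O + O';               % sort-index matrix per (1144)
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
M  = -VN'*O*VN;  M = (M+M')/2;
[Q,L] = eig(M);  [l,k] = sort(diag(L),'descend');  Q = Q(:,k);
Xrec = [zeros(3,1)  diag(sqrt(max(l(1:3),0)))*Q(:,1:3)'];   % list (1132), x_1 at origin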


484 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX5.13.2.2 Isotonic solution with sort constraintBecause problems involving rank are generally difficult, we will partition(1147) into two problems we know how to solve and then alternate theirsolution until convergence:minimize ‖−VN T(D − O)V N ‖ FDsubject to rankVN TDV N ≤ 3D ∈ EDM Nminimize ‖σ − Πd‖σsubject to σ ∈ K M+(a)(b)(1149)where sort-index matrix O (a given constant in (a)) becomes an implicitvector variable o i solving the i th instance of (1149b)1√2dvec O i = o i Π T σ ⋆ ∈ R N(N−1)/2 , i∈{1, 2, 3...} (1150)As mentioned in discussion of relaxed problem (1148), a closed-formsolution to problem (1149a) exists. Only the first iteration of (1149a)sees the original sort-index matrix O whose entries are nonnegative wholenumbers; id est, O 0 =O∈ S N h ∩ R N×N+ (1144). Subsequent iterations i takethe previous solution of (1149b) as inputO i = dvec −1 ( √ 2o i ) ∈ S N (1151)real successors, estimating distance-square not order, to the sort-indexmatrix O .New convex problem (1149b) finds the unique minimum-distanceprojection of Πd on the monotone nonnegative cone K M+ . By definingY †T = [e 1 − e 2 e 2 −e 3 e 3 −e 4 · · · e m ] ∈ R m×m (430)where mN(N −1)/2, we may rewrite (1149b) as an equivalent quadraticprogram; a convex problem in terms of the halfspace-description of K M+ :minimize (σ − Πd) T (σ − Πd)σsubject to Y † σ ≽ 0(1152)


5.13. RECONSTRUCTION EXAMPLES 485This quadratic program can be converted to a semidefinite program viaSchur-form (3.5.2); we get the equivalent problemminimizet∈R , σsubject tot[tI σ − Πd(σ − Πd) T 1]≽ 0(1153)Y † σ ≽ 05.13.2.3 ConvergenceInE.10 we discuss convergence of alternating projection on intersectingconvex sets in a Euclidean vector space; convergence to a point in theirintersection. Here the situation is different for two reasons:Firstly, sets of positive semidefinite matrices having an upper bound onrank are generally not convex. Yet in7.1.4.0.1 we prove (1149a) is equivalentto a projection of nonincreasingly ordered eigenvalues on a subset of thenonnegative orthant:minimize ‖−VN T(D − O)V N ‖ FDsubject to rankVN TDV N ≤ 3D ∈ EDM N≡minimize ‖Υ − Λ‖ FΥ [ ]R3+subject to δ(Υ) ∈0(1154)where −VN TDV N UΥU T ∈ S N−1 and −VN TOV N QΛQ T ∈ S N−1 areordered diagonalizations (A.5). It so happens: optimal orthogonal U ⋆always equals Q given. Linear operator T(A) = U ⋆T AU ⋆ , acting on squarematrix A , is an isometry because Frobenius’ norm is orthogonally invariant(48). This isometric isomorphism T thus maps a nonconvex problem to aconvex one that preserves distance.Secondly, the second half (1149b) of the alternation takes place in adifferent vector space; S N h (versus S N−1 ). From5.6 we know these twovector spaces are related by an isomorphism, S N−1 =V N (S N h ) (1018), butnot by an isometry.We have, therefore, no guarantee from theory of alternating projectionthat the alternation (1149) converges to a point, in the set of allEDMs corresponding to affine dimension not in excess of 3, belonging todvec EDM N ∩ Π T K M+ .
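For modest $m$, projection (1149b), equivalently quadratic program (1152), can be prototyped directly, here with quadprog from the Optimization Toolbox; the constraint $Y^\dagger\sigma \succeq 0$ merely says successive differences of $\sigma$ are nonnegative and so is its last entry. A minimal sketch under those assumptions; the stand-in data and all names are ours.

% projection of Pi*d on the monotone nonnegative cone K_M+ , problem (1152)
m    = 10;  Pid = randn(m,1);                 % stand-in for the sorted data Pi*d
Ydag = eye(m) - diag(ones(m-1,1),1);          % Y-dagger of (430): rows e_k - e_{k+1}, then e_m
H = 2*eye(m);  f = -2*Pid;                    % (sigma - Pid)'(sigma - Pid), constant dropped
sigma = quadprog(H, f, -Ydag, zeros(m,1));    % -Ydag*sigma <= 0  <=>  Ydag*sigma >= 0
[Pid sigma]                                   % sigma is nonincreasing and nonnegative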


486 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX5.13.2.4 InterludeMap reconstruction from comparative distance data, isotonic reconstruction,would also prove invaluable to stellar cartography where absolute interstellardistance is difficult to acquire. But we have not yet implemented the secondhalf (1152) of alternation (1149) for USA map data because memory-demandsexceed capability of our computer.5.13.2.4.1 Exercise. Convergence of isotonic solution by alternation.Empirically demonstrate convergence, discussed in5.13.2.3, on a smallerdata set.It would be remiss not to mention another method of solution to thisisotonic reconstruction problem: Once again we assume only comparativedistance data like (1141) is available. Given known set of indices Iminimize rankV DVD(1155)subject to d ij ≤ d kl ≤ d mn ∀(i,j,k,l,m,n)∈ ID ∈ EDM Nthis problem minimizes affine dimension while finding an EDM whoseentries satisfy known comparative relationships. Suitable rank heuristicsare discussed in4.4.1 and7.2.2 that will transform this to a convexoptimization problem.Using contemporary computers, even with a rank heuristic in place of theobjective function, this problem formulation is more difficult to compute thanthe relaxed counterpart problem (1148). That is because there exist efficientalgorithms to compute a selected few eigenvalues and eigenvectors from avery large matrix. Regardless, it is important to recognize: the optimalsolution set for this problem (1155) is practically always different from theoptimal solution set for its counterpart, problem (1147).5.14 Fifth property of Euclidean metricWe continue now with the question raised in5.3 regarding the necessityfor at least one requirement more than the four properties of the Euclideanmetric (5.2) to certify realizability of a bounded convex polyhedron or to


reconstruct a generating list for it from incomplete distance information. There we saw that the four Euclidean metric properties are necessary for $D \in \mathrm{EDM}^N$ in the case $N = 3$, but become insufficient when cardinality $N$ exceeds 3 (regardless of affine dimension).

5.14.1 Recapitulate

In the particular case $N = 3$, $-V_N^T D V_N \succeq 0$ (1059) and $D \in \mathbb{S}^3_h$ are necessary and sufficient conditions for $D$ to be an EDM. By (1061), triangle inequality is then the only Euclidean condition bounding the necessarily nonnegative $d_{ij}$; and those bounds are tight. That means the first four properties of the Euclidean metric are necessary and sufficient conditions for $D$ to be an EDM in the case $N = 3$; for $i,j \in \{1,2,3\}$

$\left.\begin{array}{l} \sqrt{d_{ij}} \ge 0\,,\ i \ne j\\ \sqrt{d_{ij}} = 0\,,\ i = j\\ \sqrt{d_{ij}} = \sqrt{d_{ji}}\\ \sqrt{d_{ij}} \le \sqrt{d_{ik}} + \sqrt{d_{kj}}\,,\ i \ne j \ne k \end{array}\right\} \;\Leftrightarrow\; \begin{array}{l} -V_N^T D V_N \succeq 0\\ D \in \mathbb{S}^3_h \end{array} \;\Leftrightarrow\; D \in \mathrm{EDM}^3$    (1156)

Yet those four properties become insufficient when $N > 3$.

5.14.2 Derivation of the fifth

Correspondence between the triangle inequality and the EDM was developed in §5.8.2 where a triangle inequality (1061a) was revealed within the leading principal $2\times 2$ submatrix of $-V_N^T D V_N$ when positive semidefinite. Our choice of the leading principal submatrix was arbitrary; actually, a unique triangle inequality like (956) corresponds to any one of the $(N-1)!/(2!(N-1-2)!)$ principal $2\times 2$ submatrices.5.59 Assuming $D \in \mathbb{S}^4_h$ and $-V_N^T D V_N \in \mathbb{S}^3$, then by the positive (semi)definite principal submatrices theorem (§A.3.1.0.4) it is sufficient to prove: all $d_{ij}$ are nonnegative, all triangle inequalities are satisfied, and $\det(-V_N^T D V_N)$ is nonnegative. When $N = 4$, in other words, that nonnegative determinant becomes the fifth and last Euclidean metric requirement for $D \in \mathrm{EDM}^N$. We now endeavor to ascribe geometric meaning to it.

5.59 There are fewer principal $2\times 2$ submatrices in $-V_N^T D V_N$ than there are triangles made by four or more points because there are $N!/(3!(N-3)!)$ triangles made by point triples. The triangles corresponding to those submatrices all have vertex $x_1$. (confer §5.8.2.1)


5.14.2.1 Nonnegative determinant

By (962) when $D \in \mathrm{EDM}^4$, $-V_N^T D V_N$ is equal to inner product (957),

$\Theta^T\Theta = \begin{bmatrix} d_{12} & \sqrt{d_{12}d_{13}}\cos\theta_{213} & \sqrt{d_{12}d_{14}}\cos\theta_{214}\\ \sqrt{d_{12}d_{13}}\cos\theta_{213} & d_{13} & \sqrt{d_{13}d_{14}}\cos\theta_{314}\\ \sqrt{d_{12}d_{14}}\cos\theta_{214} & \sqrt{d_{13}d_{14}}\cos\theta_{314} & d_{14} \end{bmatrix}$    (1157)

Because Euclidean space is an inner-product space, the more concise inner-product form of the determinant is admitted;

$\det(\Theta^T\Theta) = -d_{12}\,d_{13}\,d_{14}\bigl(\cos(\theta_{213})^2 + \cos(\theta_{214})^2 + \cos(\theta_{314})^2 - 2\cos\theta_{213}\cos\theta_{214}\cos\theta_{314} - 1\bigr)$    (1158)

The determinant is nonnegative if and only if

$\cos\theta_{214}\cos\theta_{314} - \sqrt{\sin(\theta_{214})^2\sin(\theta_{314})^2} \;\le\; \cos\theta_{213} \;\le\; \cos\theta_{214}\cos\theta_{314} + \sqrt{\sin(\theta_{214})^2\sin(\theta_{314})^2}$
$\Leftrightarrow$
$\cos\theta_{213}\cos\theta_{314} - \sqrt{\sin(\theta_{213})^2\sin(\theta_{314})^2} \;\le\; \cos\theta_{214} \;\le\; \cos\theta_{213}\cos\theta_{314} + \sqrt{\sin(\theta_{213})^2\sin(\theta_{314})^2}$
$\Leftrightarrow$
$\cos\theta_{213}\cos\theta_{214} - \sqrt{\sin(\theta_{213})^2\sin(\theta_{214})^2} \;\le\; \cos\theta_{314} \;\le\; \cos\theta_{213}\cos\theta_{214} + \sqrt{\sin(\theta_{213})^2\sin(\theta_{214})^2}$    (1159)

which simplifies, for $0 \le \theta_{i1l}\,,\theta_{l1j}\,,\theta_{i1j} \le \pi$ and all $i \ne j \ne l \in \{2,3,4\}$, to

$\cos(\theta_{i1l} + \theta_{l1j}) \;\le\; \cos\theta_{i1j} \;\le\; \cos(\theta_{i1l} - \theta_{l1j})$    (1160)

Analogously to triangle inequality (1073), the determinant is 0 upon equality on either side of (1160) which is tight. Inequality (1160) can be equivalently written linearly as a triangle inequality between relative angles [393, §1.4];

$|\theta_{i1l} - \theta_{l1j}| \;\le\; \theta_{i1j} \;\le\; \theta_{i1l} + \theta_{l1j}$
$\theta_{i1l} + \theta_{l1j} + \theta_{i1j} \;\le\; 2\pi$
$0 \;\le\; \theta_{i1l}\,,\theta_{l1j}\,,\theta_{i1j} \;\le\; \pi$    (1161)
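Equivalence of $\det(\Theta^T\Theta) \ge 0$ with (1160) can be spot-checked: draw four points, form $\Theta^T\Theta = -V_N^T D V_N$ via (962), extract the three vertex angles, and compare. A minimal sketch; the anonymous angle function is our own shorthand.

% spot-check (1158)-(1160) on a random 4-point configuration
X  = randn(3,4);  G = X'*X;
D  = diag(G)*ones(1,4) + ones(4,1)*diag(G)' - 2*G;      % EDM (891)
VN = [-ones(1,3); eye(3)]/sqrt(2);
T  = -VN'*D*VN;                                         % Theta'*Theta as in (962)
ang = @(i,j) acos(T(i,j)/sqrt(T(i,i)*T(j,j)));          % angle at vertex x_1
t213 = ang(1,2);  t214 = ang(1,3);  t314 = ang(2,3);
det(T)                                                  % nonnegative
cos(t214 + t314) <= cos(t213) & cos(t213) <= cos(t214 - t314)   % inequality (1160)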


[Figure 136: The relative-angle inequality tetrahedron (1162) bounding $\mathrm{EDM}^4$ is regular; drawn in entirety. Each angle $\theta$ (954) must belong to this solid to be realizable. (Axes: $\theta_{213}$, $\theta_{214}$, $\theta_{314}$, each spanning 0 to $\pi$.)]

Generalizing this:

5.14.2.1.1 Fifth property of Euclidean metric - restatement.
Relative-angle inequality. (confer §5.3.1.0.1) [49] [50, p.17, p.107] [235, §3.1]
Augmenting the four fundamental Euclidean metric properties in $\mathbb{R}^n$: for all $i,j,l \ne k \in \{1 \ldots N\}$, $i < j < l$,

$|\theta_{ikl} - \theta_{lkj}| \;\le\; \theta_{ikj} \;\le\; \theta_{ikl} + \theta_{lkj}$    (1162a)
$\theta_{ikl} + \theta_{lkj} + \theta_{ikj} \;\le\; 2\pi$    (1162b)
$0 \;\le\; \theta_{ikl}\,,\theta_{lkj}\,,\theta_{ikj} \;\le\; \pi$

⋄


490 CHAPTER 5. EUCLIDEAN DISTANCE MATRIXBecause point labelling is arbitrary, this fifth Euclidean metricrequirement must apply to each of the N points as though each were in turnlabelled x 1 ; hence the new index k in (1162). Just as the triangle inequalityis the ultimate test for realizability of only three points, the relative-angleinequality is the ultimate test for only four. For four distinct points, thetriangle inequality remains a necessary although penultimate test; (5.4.3)Four Euclidean metric properties (5.2).Angle θ inequality (885) or (1162).⇔ −V T N DV N ≽ 0D ∈ S 4 h⇔ D = D(Θ) ∈ EDM 4The relative-angle inequality, for this case, is illustrated in Figure 136.5.14.2.2 Beyond the fifth metric property(1163)When cardinality N exceeds 4, the first four properties of the Euclideanmetric and the relative-angle inequality together become insufficientconditions for realizability. In other words, the four Euclidean metricproperties and relative-angle inequality remain necessary but become asufficient test only for positive semidefiniteness of all the principal 3 × 3submatrices [sic] in −VN TDV N . Relative-angle inequality can be consideredthe ultimate test only for realizability at each vertex x k of each and everypurported tetrahedron constituting a hyperdimensional body.When N = 5 in particular, relative-angle inequality becomes thepenultimate Euclidean metric requirement while nonnegativity of thenunwieldy det(Θ T Θ) corresponds (by the positive (semi)definite principalsubmatrices theorem inA.3.1.0.4) to the sixth and last Euclidean metricrequirement. Together these six tests become necessary and sufficient, andso on.Yet for all values of N , only assuming nonnegative d ij , relative-anglematrix inequality in (1075) is necessary and sufficient to certify realizability;(5.4.3.1)Euclidean metric property 1 (5.2).Angle matrix inequality Ω ≽ 0 (963).⇔ −V T N DV N ≽ 0D ∈ S N h⇔ D = D(Ω,d) ∈ EDM N(1164)Like matrix criteria (886), (910), and (1075), the relative-angle matrixinequality and nonnegativity property subsume all the Euclidean metricproperties and further requirements.


5.14. FIFTH PROPERTY OF EUCLIDEAN METRIC 4915.14.3 Path not followedAs a means to test for realizability of four or more points, an intuitivelyappealing way to augment the four Euclidean metric properties is torecognize generalizations of the triangle inequality: In the case ofcardinality N = 4, the three-dimensional analogue to triangle & distance istetrahedron & facet-area, while in case N = 5 the four-dimensional analogueis polychoron & facet-volume, ad infinitum. For N points, N + 1 metricproperties are required.5.14.3.1 N = 4Each of the four facets of a general tetrahedron is a triangle and its relativeinterior. Suppose we identify each facet of the tetrahedron by its area-square:c 1 , c 2 , c 3 , c 4 . Then analogous to metric property 4, we may write a tight 5.60area inequality for the facets√ci ≤ √ c j + √ c k + √ c l , i≠j ≠k≠l∈{1, 2, 3, 4} (1165)which is a generalized “triangle” inequality [227,1.1] that follows from√ci = √ c j cos ϕ ij + √ c k cos ϕ ik + √ c l cos ϕ il (1166)[242] [373, Law of Cosines] where ϕ ij is the dihedral angle at the commonedge between triangular facets i and j .If D is the EDM corresponding to the whole tetrahedron, then area-squareof the i th triangular facet has a convenient formula in terms of D i ∈ EDM N−1the EDM corresponding to that particular facet: From the Cayley-Mengerdeterminant 5.61 for simplices, [373] [131] [162,4] [88,3.3] the i th facetarea-square for i∈{1... N} is (A.4.1)5.60 The upper bound is met when all angles in (1166) are simultaneously 0; that occurs,for example, if one point is relatively interior to the convex hull of the three remaining.5.61 whose foremost characteristic is: the determinant vanishes [ if and ] only if affine0 1Tdimension does not equal penultimate cardinality; id est, det = 0 ⇔ r < N −11 −Dwhere D is any EDM (5.7.3.0.1). Otherwise, the determinant is negative.


$c_i \;=\; \dfrac{-1}{2^{N-2}(N-2)!^2}\,\det\!\begin{bmatrix} 0 & \mathbf{1}^T\\ \mathbf{1} & -D_i \end{bmatrix}$    (1167)

$\;=\; \dfrac{(-1)^N}{2^{N-2}(N-2)!^2}\,\det D_i\,\bigl(\mathbf{1}^T D_i^{-1}\mathbf{1}\bigr)$    (1168)

$\;=\; \dfrac{(-1)^N}{2^{N-2}(N-2)!^2}\,\mathbf{1}^T\mathrm{cof}(D_i)^T\mathbf{1}$    (1169)

where $D_i$ is the $i$th principal $N-1 \times N-1$ submatrix 5.62 of $D \in \mathrm{EDM}^N$, and $\mathrm{cof}(D_i)$ is the $N-1 \times N-1$ matrix of cofactors [331, §4] corresponding to $D_i$. The number of principal $3\times 3$ submatrices in $D$ is, of course, equal to the number of triangular facets in the tetrahedron; four ($N!/(3!(N-3)!)$) when $N = 4$.

5.14.3.1.1 Exercise. Sufficiency conditions for an EDM of four points.
Triangle inequality (property 4) and area inequality (1165) are conditions necessary for $D$ to be an EDM. Prove their sufficiency in conjunction with the remaining three Euclidean metric properties.

5.14.3.2 N = 5

Moving to the next level, we might encounter a Euclidean body called polychoron: a bounded polyhedron in four dimensions.5.63 Our polychoron has five ($N!/(4!(N-4)!)$) facets, each of them a general tetrahedron whose volume-square $c_i$ is calculated using the same formula; (1167) where $D$ is the EDM corresponding to the polychoron, and $D_i$ is the EDM corresponding to the $i$th facet (the principal $4\times 4$ submatrix of $D \in \mathrm{EDM}^N$ corresponding to the $i$th tetrahedron). The analogue to triangle & distance is now polychoron & facet-volume. We could then write another generalized "triangle" inequality like (1165) but in terms of facet volume; [379, §IV]

$\sqrt{c_i} \;\le\; \sqrt{c_j} + \sqrt{c_k} + \sqrt{c_l} + \sqrt{c_m}\,,\qquad i \ne j \ne k \ne l \ne m \in \{1 \ldots 5\}$    (1170)

5.62 Every principal submatrix of an EDM remains an EDM. [235, §4.1]
5.63 The simplest polychoron is called a pentatope [373]; a regular simplex hence convex. (A pentahedron is a three-dimensional body having five vertices.)
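Formula (1167) is easy to exercise: for the regular tetrahedron with unit squared sides, a test case of our own choosing, every facet is an equilateral triangle of area-square $3/16$, and area inequality (1165) holds with room to spare. A minimal sketch.

% facet area-square via (1167) for the regular tetrahedron with unit squared sides
N = 4;  D = ones(N) - eye(N);
c = zeros(N,1);
for i = 1:N
   k  = setdiff(1:N,i);  Di = D(k,k);                   % EDM of the i-th triangular facet
   c(i) = -det([0 ones(1,N-1); ones(N-1,1) -Di]) / (2^(N-2)*factorial(N-2)^2);
end
sqrt(c)'                                                % each facet area = sqrt(3)/4
max(sqrt(c)) <= sum(sqrt(c)) - max(sqrt(c))             % area inequality (1165)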


[Figure 137: Length of one-dimensional face $a$ equals height $h = a = 1$ of this convex nonsimplicial pyramid in $\mathbb{R}^3$ with square base inscribed in a circle of radius $R$ centered at the origin. [373, Pyramid]]

5.14.3.2.1 Exercise. Sufficiency for an EDM of five points.
For $N = 5$, triangle (distance) inequality (§5.2), area inequality (1165), and volume inequality (1170) are conditions necessary for $D$ to be an EDM. Prove their sufficiency.

5.14.3.3 Volume of simplices

There is no known formula for the volume of a bounded general convex polyhedron expressed either by halfspace or vertex-description. [391, §2.1] [282, p.173] [233] [173] [174] Volume is a concept germane to $\mathbb{R}^3$; in higher dimensions it is called content. Applying the EDM assertion (§5.9.1.0.4) and a result from [61, p.407], a general nonempty simplex (§2.12.3) in $\mathbb{R}^{N-1}$ corresponding to an EDM $D \in \mathbb{S}^N_h$ has content

$\sqrt{c} \;=\; \mathrm{content}(\mathcal{S})\,\sqrt{\det(-V_N^T D V_N)}$    (1171)

where content-square of the unit simplex $\mathcal{S} \subset \mathbb{R}^{N-1}$ is proportional to its Cayley-Menger determinant;

$\mathrm{content}(\mathcal{S})^2 \;=\; \dfrac{-1}{2^{N-1}(N-1)!^2}\,\det\!\begin{bmatrix} 0 & \mathbf{1}^T\\ \mathbf{1} & -D([\,0\ \ e_1\ \ e_2\ \cdots\ e_{N-1}\,]) \end{bmatrix}$    (1172)

where $e_i \in \mathbb{R}^{N-1}$ and the EDM operator used is $D(X)$ (891).


5.14.3.3.1 Example. Pyramid.
A formula for volume of a pyramid is known;5.64 it is $\frac{1}{3}$ the product of its base area with its height. [223] The pyramid in Figure 137 has volume $\frac{1}{3}$. To find its volume using EDMs, we must first decompose the pyramid into simplicial parts. Slicing it in half along the plane containing the line segments corresponding to radius $R$ and height $h$ we find the vertices of one simplex,

$X = \begin{bmatrix} 1/2 & 1/2 & -1/2 & 0\\ 1/2 & -1/2 & -1/2 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix} \in \mathbb{R}^{n\times N}$    (1173)

where $N = n + 1$ for any nonempty simplex in $\mathbb{R}^n$. The volume of this simplex is half that of the entire pyramid; id est, $\sqrt{c} = \frac{1}{6}$ found by evaluating (1171).

With that, we conclude digression of path.

5.14.4 Affine dimension reduction in three dimensions

(confer §5.8.4) The determinant of any $M\times M$ matrix is equal to the product of its $M$ eigenvalues. [331, §5.1] When $N = 4$ and $\det(\Theta^T\Theta)$ is 0, that means one or more eigenvalues of $\Theta^T\Theta \in \mathbb{R}^{3\times 3}$ are 0. The determinant will go to 0 whenever equality is attained on either side of (885), (1162a), or (1162b), meaning that a tetrahedron has collapsed to a lower affine dimension; id est, $r = \mathrm{rank}\,\Theta^T\Theta = \mathrm{rank}\,\Theta$ is reduced below $N-1$ exactly by the number of 0 eigenvalues (§5.7.1.1).

In solving completion problems of any size $N$ where one or more entries of an EDM are unknown, therefore, dimension $r$ of the affine hull required to contain the unknown points is potentially reduced by selecting distances to attain equality in (885) or (1162a) or (1162b).

5.64 Pyramid volume is independent of the paramount vertex position as long as its height remains constant.
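The pyramid example (5.14.3.3.1) can be verified numerically from (1171): with the list $X$ of (1173), $\det(-V_N^T D V_N) = 1$ and, taking $\mathrm{content}(\mathcal{S}) = 1/(N-1)!$ for the unit simplex, $\sqrt{c} = 1/3! = 1/6$; doubling recovers pyramid volume $\frac{1}{3}$. A minimal sketch; the $1/(N-1)!$ value is our own evaluation of (1172).

% simplex content (1171) for the half-pyramid list X of (1173)
X = [ 1/2  1/2 -1/2  0
      1/2 -1/2 -1/2  0
       0    0    0   1 ];
N = size(X,2);  G = X'*X;
D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;       % EDM (891)
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
content = sqrt(det(-VN'*D*VN))/factorial(N-1)           % = 1/6 ; whole pyramid = 1/3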


5.14.4.1 Exemplum redux

We now apply the fifth Euclidean metric property to an earlier problem:

5.14.4.1.1 Example. Small completion problem, IV. (confer §5.9.3.0.1)
Returning again to Example 5.3.0.0.2 that pertains to Figure 116 where $N = 4$, distance-square $d_{14}$ is ascertainable from the fifth Euclidean metric property. Because all distances in (883) are known except $\sqrt{d_{14}}$, then $\cos\theta_{123} = 0$ and $\theta_{324} = 0$ result from identity (954). Applying (885),

$\cos(\theta_{123} + \theta_{324}) \;\le\; \cos\theta_{124} \;\le\; \cos(\theta_{123} - \theta_{324})$
$0 \;\le\; \cos\theta_{124} \;\le\; 0$    (1174)

It follows again from (954) that $d_{14}$ can only be 2. As explained in this subsection, affine dimension $r$ cannot exceed $N-2$ because equality is attained in (1174).




Chapter 6Cone of distance matricesFor N > 3, the cone of EDMs is no longer a circular cone andthe geometry becomes complicated...−Hayden, Wells, Liu, & Tarazaga (1991) [185,3]In the subspace of symmetric matrices S N , we know that the convex cone ofEuclidean distance matrices EDM N (the EDM cone) does not intersect thepositive semidefinite cone S N + (PSD cone) except at the origin, their onlyvertex; there can be no positive or negative semidefinite EDM. (1099) [235]EDM N ∩ S N + = 0 (1175)Even so, the two convex cones can be related. In6.8.1 we prove the equalityEDM N = S N h ∩ ( )S N⊥c − S N + (1268)2001 Jon Dattorro. co&edg version 2010.10.26. All rights reserved.citation: Dattorro, <strong>Convex</strong> <strong>Optimization</strong> & Euclidean Distance Geometry,Mεβoo Publishing USA, 2005, <strong>v2010.10.26</strong>.497


498 CHAPTER 6. CONE OF DISTANCE MATRICESa resemblance to EDM definition (891) whereS N h = { A ∈ S N | δ(A) = 0 } (66)is the symmetric hollow subspace (2.2.3) and whereS N⊥c = {u1 T + 1u T | u∈ R N } (2000)is the orthogonal complement of the geometric center subspace (E.7.2.0.2)S N c = {Y ∈ S N | Y 1 = 0} (1998)6.0.1 gravityEquality (1268) is equally important as the known isomorphisms (1007)(1008) (1019) (1020) relating the EDM cone EDM N to positive semidefinitecone S N−1+ (5.6.2.1) or to an N(N −1)/2-dimensional face of S N +(5.6.1.1). 6.1 But those isomorphisms have never led to this equality relatingwhole cones EDM N and S N + .Equality (1268) is not obvious from the various EDM definitions such as(891) or (1191) because inclusion must be proved algebraically in order toestablish equality; EDM N ⊇ S N h ∩ (S N⊥c − S N +). We will instead prove (1268)using purely geometric methods.6.0.2 highlightIn6.8.1.7 we show: the Schoenberg criterion for discriminating Euclideandistance matricesD ∈ EDM N⇔{−VTN DV N ∈ S N−1+D ∈ S N h(910)is a discretized membership relation (2.13.4, dual generalized inequalities)between the EDM cone and its ordinary dual.6.1 Because both positive semidefinite cones are frequently in play, dimension is explicit.


6.1. DEFINING EDM CONE 4996.1 Defining EDM coneWe invoke a popular matrix criterion to illustrate correspondence between theEDM and PSD cones belonging to the ambient space of symmetric matrices:{−V DV ∈ SND ∈ EDM N+⇔(914)D ∈ S N hwhere V ∈ S N is the geometric centering matrix (B.4). The set of all EDMsof dimension N ×N forms a closed convex cone EDM N because any pair ofEDMs satisfies the definition of a convex cone (175); videlicet, for each andevery ζ 1 , ζ 2 ≥ 0 (A.3.1.0.2)ζ 1 V D 1 V + ζ 2 V D 2 V ≽ 0ζ 1 D 1 + ζ 2 D 2 ∈ S N h⇐ V D 1V ≽ 0, V D 2 V ≽ 0D 1 ∈ S N h , D 2 ∈ S N h(1176)and convex cones are invariant to inverse linear transformation [307, p.22].6.1.0.0.1 Definition. Cone of Euclidean distance matrices.In the subspace of symmetric matrices, the set of all Euclidean distancematrices forms a unique immutable pointed closed convex cone called theEDM cone: for N > 0EDM N { }D ∈ S N h | −V DV ∈ S N += ⋂ {D ∈ S N | 〈zz T , −D〉≥0, δ(D)=0 } (1177)z∈N(1 T )The EDM cone in isomorphic R N(N+1)/2 [sic] is the intersection of an infinitenumber (when N >2) of halfspaces about the origin and a finite numberof hyperplanes through the origin in vectorized variable D = [d ij ] . HenceEDM N is not full-dimensional with respect to S N because it is confined to thesymmetric hollow subspace S N h . The EDM cone relative interior comprisesrel int EDM N = ⋂ {D ∈ S N | 〈zz T , −D〉>0, δ(D)=0 }z∈N(1 T )= { D ∈ EDM N | rank(V DV ) = N −1 } (1178)while its relative boundary comprisesrel∂EDM N = { D ∈ EDM N | 〈zz T , −D〉 = 0 for some z ∈ N(1 T ) }= { D ∈ EDM N | rank(V DV ) < N −1 } (1179)△
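Definition 6.1.0.0.1 translates into a two-line membership test: hollowness plus positive semidefiniteness of $-VDV$ per (914). A minimal sketch of such a test follows; the function name and default tolerance are our own.

% membership test for EDM^N per (914)/(1177); save as isEDM.m
function tf = isEDM(D, tol)
   if nargin < 2, tol = 1e-10; end
   N = size(D,1);
   V = eye(N) - ones(N)/N;                              % geometric centering matrix V (B.4)
   M = -V*D*V;  M = (M+M')/2;
   tf = isequal(D,D') && norm(diag(D)) <= tol ...       % D in the symmetric hollow subspace
        && min(eig(M)) >= -tol;                         % -V*D*V positive semidefinite
end

For instance, isEDM(ones(3)-eye(3)) returns true, while isEDM([0 1 1; 1 0 5; 1 5 0]) returns false; the latter matrix is the counterexample of footnote 5.51.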


[Figure 138: Relative boundary (tiled) of EDM cone $\mathrm{EDM}^3$ drawn truncated in isometrically isomorphic subspace $\mathbb{R}^3$. (a) EDM cone drawn in usual distance-square coordinates $d_{ij}$. View is from interior toward origin. Unlike positive semidefinite cone, EDM cone is not selfdual; neither is it proper in ambient symmetric subspace (dual EDM cone for this example belongs to isomorphic $\mathbb{R}^6$). (b) Drawn in its natural coordinates $\sqrt{d_{ij}}$ (absolute distance), cone remains convex (confer §5.10); intersection of three halfspaces (1062) whose partial boundaries each contain origin. Cone geometry becomes nonconvex (nonpolyhedral) in higher dimension. (§6.3) (c) Two coordinate systems artificially superimposed. Coordinate transformation from $d_{ij}$ to $\sqrt{d_{ij}}$ appears a topological contraction. (d) Sitting on its vertex 0, pointed $\mathrm{EDM}^3$ is a circular cone having axis of revolution $\mathrm{dvec}(-E) = \mathrm{dvec}(\mathbf{1}\mathbf{1}^T - I)$ (1095) (73). (Rounded vertex is plot artifact.)]


6.2. POLYHEDRAL BOUNDS 501This cone is more easily visualized in the isomorphic vector subspaceR N(N−1)/2 corresponding to S N h :In the case N = 1 point, the EDM cone is the origin in R 0 .In the case N = 2, the EDM cone is the nonnegative real line in R ; ahalfline in a subspace of the realization in Figure 142.The EDM cone in the case N = 3 is a circular cone in R 3 illustrated inFigure 138(a)(d); rather, the set of all matrices⎡D = ⎣0 d 12 d 13d 12 0 d 23d 13 d 23 0⎤⎦ ∈ EDM 3 (1180)makes a circular cone in this dimension. In this case, the first four Euclideanmetric properties are necessary and sufficient tests to certify realizabilityof triangles; (1156). Thus triangle inequality property 4 describes threehalfspaces (1062) whose intersection makes a polyhedral cone in R 3 ofrealizable √ d ij (absolute distance); an isomorphic subspace representationof the set of all EDMs D in the natural coordinates⎡ √ √ ⎤0 d12 d13◦√ √d12 √ D ⎣ 0 d23 ⎦√d13 √(1181)d23 0illustrated in Figure 138b.6.2 Polyhedral boundsThe convex cone of EDMs is nonpolyhedral in d ij for N > 2 ; e.g.,Figure 138a. Still we found necessary and sufficient bounding polyhedralrelations consistent with EDM cones for cardinality N = 1, 2, 3, 4:N = 3. Transforming distance-square coordinates d ij by taking their positivesquare root provides the polyhedral cone in Figure 138b; polyhedral√because an intersection of three halfspaces in natural coordinatesdij is provided by triangle inequalities (1062). This polyhedralcone implicitly encompasses necessary and sufficient metric properties:nonnegativity, self-distance, symmetry, and triangle inequality.


502 CHAPTER 6. CONE OF DISTANCE MATRICESN = 4. Relative-angle inequality (1162) together with four Euclidean metricproperties are necessary and sufficient tests for realizability oftetrahedra. (1163) Albeit relative angles θ ikj (954) are nonlinearfunctions of the d ij , relative-angle inequality provides a regulartetrahedron in R 3 [sic] (Figure 136) bounding angles θ ikj at vertexx k consistently with EDM 4 . 6.2Yet were we to employ the procedure outlined in5.14.3 for makinggeneralized triangle inequalities, then we would find all the necessary andsufficient d ij -transformations for generating bounding polyhedra consistentwith EDMs of any higher dimension (N > 3).6.3 √ EDM cone is not convexFor some applications, like a molecular conformation problem (Figure 3,Figure 127) or multidimensional scaling [103] [354], absolute distance √ d ijis the preferred variable. Taking square root of the entries in all EDMs Dof dimension N , we get another cone but not a convex cone when N > 3(Figure 138b): [89,4.5.2]√EDM N { ◦√ D | D ∈ EDM N } (1182)where ◦√ D is defined like (1181). It is a cone simply because any coneis completely constituted by rays emanating from the origin: (2.7) Anygiven ray {ζΓ∈ R N(N−1)/2 | ζ ≥0} remains a ray under entrywise square root:{ √ ζΓ∈ R N(N−1)/2 | ζ ≥0}. It is already established thatD ∈ EDM N ⇒ ◦√ D ∈ EDM N (1094)But because of how √ EDM N is defined, it is obvious that (confer5.10)D ∈ EDM N ⇔ ◦√ √D ∈ EDM N (1183)Were √ EDM N convex, then given◦ √ D 1 , ◦√ D 2 ∈ √ EDM N we wouldexpect their conic combination ◦√ D 1 + ◦√ D 2 to be a member of √ EDM N .6.2 Still, property-4 triangle inequalities (1062) corresponding to each principal 3 ×3submatrix of −VN TDV N demand that the corresponding √ d ij belong to a polyhedralcone like that in Figure 138b.


6.4. EDM DEFINITION IN 11 T 503That is easily proven false by counterexample via (1183), for then( ◦√ D 1 + ◦√ D 2 ) ◦( ◦√ D 1 + ◦√ D 2 ) would need to be a member of EDM N .Notwithstanding, √EDM N ⊆ EDM N (1184)by (1094) (Figure 138), and we learn how to transform a nonconvexproximity problem in the natural coordinates √ d ij to a convex optimizationin7.2.1.6.4 EDM definition in 11 TAny EDM D corresponding to affine dimension r has representationD(V X ) δ(V X V T X )1 T + 1δ(V X V T X ) T − 2V X V T X ∈ EDM N (1185)where R(V X ∈ R N×r )⊆ N(1 T ) = 1 ⊥VX T V X = δ 2 (VX T V X ) and V X is full-rank with orthogonal columns.(1186)Equation (1185) is simply the standard EDM definition (891) with acentered list X as in (975); Gram matrix X T X has been replaced withthe subcompact singular value decomposition (A.6.2) 6.3V X V T X ≡ V T X T XV ∈ S N c ∩ S N + (1187)This means: inner product VX TV X is an r×r diagonal matrix Σ of nonzerosingular values.Vector δ(V X VX T ) may me decomposed into complementary parts byprojecting it on orthogonal subspaces 1 ⊥ and R(1) : namely,P 1 ⊥(δ(VX V T X ) ) = V δ(V X V T X ) (1188)P 1(δ(VX V T X ) ) = 1 N 11T δ(V X V T X ) (1189)Of courseδ(V X V T X ) = V δ(V X V T X ) + 1 N 11T δ(V X V T X ) (1190)6.3 Subcompact SVD: V X VX T Q√ Σ √ ΣQ T ≡ V T X T XV . So VX T is not necessarily XV(5.5.1.0.1), although affine dimension r = rank(VX T ) = rank(XV ). (1032)


504 CHAPTER 6. CONE OF DISTANCE MATRICESby (913). Substituting this into EDM definition (1185), we get theHayden, Wells, Liu, & Tarazaga EDM formula [185,2]whereD(V X , y) y1 T + 1y T + λ N 11T − 2V X V T X ∈ EDM N (1191)λ 2‖V X ‖ 2 F = 1 T δ(V X V T X )2 and y δ(V X V T X ) − λ2N 1 = V δ(V XV T X )(1192)and y=0 if and only if 1 is an eigenvector of EDM D . Scalar λ becomesan eigenvalue when corresponding eigenvector 1 exists. 6.4Then the particular dyad sum from (1191)y1 T + 1y T + λ N 11T ∈ S N⊥c (1193)must belong to the orthogonal complement of the geometric center subspace(p.731), whereas V X V T X ∈ SN c ∩ S N + (1187) belongs to the positive semidefinitecone in the geometric center subspace.Proof. We validate eigenvector 1 and eigenvalue λ .(⇒) Suppose 1 is an eigenvector of EDM D . Then becauseit followsV T X 1 = 0 (1194)D1 = δ(V X V T X )1T 1 + 1δ(V X V T X )T 1 = N δ(V X V T X ) + ‖V X ‖ 2 F 1⇒ δ(V X V T X ) ∝ 1 (1195)For some κ∈ R +δ(V X V T X ) T 1 = N κ = tr(V T X V X ) = ‖V X ‖ 2 F ⇒ δ(V X V T X ) = 1 N ‖V X ‖ 2 F1 (1196)so y=0.(⇐) Now suppose δ(V X VX T)= λ 1 ; id est, y=0. Then2ND = λ N 11T − 2V X V T X ∈ EDM N (1197)1 is an eigenvector with corresponding eigenvalue λ . 6.4 e.g., when X = I in EDM definition (891).
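Construction (1185) is easy to reproduce: orthogonalize a few random columns inside $\mathcal{N}(\mathbf{1}^T)$, scale them, and assemble $D(V_X)$; the result passes the Schoenberg test and exhibits affine dimension $r$. A minimal sketch; the column scaling and tolerances are arbitrary choices of ours.

% build an EDM via (1185) from V_X having orthogonal columns in N(1^T)
N = 6;  r = 2;
V  = eye(N) - ones(N)/N;
VX = orth(V*randn(N,r))*diag(1+rand(r,1));   % R(V_X) in N(1^T), columns orthogonal
dX = diag(VX*VX');
D  = dX*ones(1,N) + ones(N,1)*dX' - 2*(VX*VX');         % D(V_X), equation (1185)
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
M  = -VN'*D*VN;  M = (M+M')/2;
min(eig(M))                                             % >= 0 : Schoenberg test (910)
rank(-V*D*V, 1e-9)                                      % affine dimension r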


[Figure 139: Example of $V_X$ selection to make an EDM corresponding to cardinality $N = 3$ and affine dimension $r = 1$; $V_X$ is a vector in nullspace $\mathcal{N}(\mathbf{1}^T) \subset \mathbb{R}^3$. Nullspace of $\mathbf{1}^T$ is hyperplane in $\mathbb{R}^3$ (drawn truncated) having normal $\mathbf{1}$. Vector $\delta(V_X V_X^T)$ may or may not be in plane spanned by $\{\mathbf{1}, V_X\}$, but belongs to nonnegative orthant which is strictly supported by $\mathcal{N}(\mathbf{1}^T)$.]


506 CHAPTER 6. CONE OF DISTANCE MATRICES6.4.1 Range of EDM DFromB.1.1 pertaining to linear independence of dyad sums: If the transposehalves of all the dyads in the sum (1185) 6.5 make a linearly independent set,then the nontranspose halves constitute a basis for the range of EDM D .Saying this mathematically: For D ∈ EDM NR(D)= R([δ(V X V T X ) 1 V X ]) ⇐ rank([δ(V X V T X ) 1 V X ])= 2 + rR(D)= R([1 V X ]) ⇐ otherwise (1198)6.4.1.0.1 Proof. We need that condition under which the rankequality is satisfied: We know R(V X )⊥1, but what is the relativegeometric orientation of δ(V X VX T) ? δ(V X V X T)≽0 because V X V X T ≽0,and δ(V X VX T)∝1 remains possible (1195); this means δ(V X V X T) /∈ N(1T )simply because it has no negative entries. (Figure 139) If the projection ofδ(V X VX T) on N(1T ) does not belong to R(V X ) , then that is a necessary andsufficient condition for linear independence (l.i.) of δ(V X VX T ) with respect toR([1 V X ]) ; id est,V δ(V X V T X ) ≠ V X a for any a∈ R r(I − 1 N 11T )δ(V X V T X ) ≠ V X aδ(V X V T X ) − 1 N ‖V X ‖ 2 F 1 ≠ V X aδ(V X V T X ) − λ2N 1 = y ≠ V X a ⇔ {1, δ(V X V T X ), V X } is l.i.(1199)When this condition is violated (when (1192) y=V X a p for some particulara∈ R r ), on the other hand, then from (1191) we haveR ( D = y1 T + 1y T + λ N 11T − 2V X V T X)= R((VX a p + λ N 1)1T + (1a T p − 2V X )V T X= R([V X a p + λ N 1 1aT p − 2V X ])= R([1 V X ]) (1200)An example of such a violation is (1197) where, in particular, a p = 0. )6.5 Identifying columns V X [v 1 · · · v r ] , then V X V T X = ∑ iv i v T iis also a sum of dyads.


6.4. EDM DEFINITION IN 11 T 507Then a statement parallel to (1198) is, for D ∈ EDM N (Theorem 5.7.3.0.1)rank(D) = r + 2 ⇔ y /∈ R(V X ) ( ⇔ 1 T D † 1 = 0 )rank(D) = r + 1 ⇔ y ∈ R(V X ) ( ⇔ 1 T D † 1 ≠ 0 ) (1201)6.4.2 Boundary constituents of EDM coneExpression (1185) has utility in forming the set of all EDMs correspondingto affine dimension r :{D ∈ EDM N | rank(V DV )= r }= { D(V X ) | V X ∈ R N×r , rankV X =r , V T X V X = δ2 (V T X V X ), R(V X)⊆ N(1 T ) }(1202)whereas {D ∈ EDM N | rank(V DV )≤ r} is the closure of this same set;{D ∈ EDM N | rank(V DV )≤ r } = { D ∈ EDM N | rank(V DV )= r } (1203)For example,rel ∂EDM N = { D ∈ EDM N | rank(V DV )< N −1 }= N−2⋃ {D ∈ EDM N | rank(V DV )= r } (1204)r=0None of these are necessarily convex sets, althoughEDM N = N−1 ⋃ {D ∈ EDM N | rank(V DV )= r }r=0= { D ∈ EDM N | rank(V DV )= N −1 }rel int EDM N = { D ∈ EDM N | rank(V DV )= N −1 } (1205)are pointed convex cones.When cardinality N = 3 and affine dimension r = 2, for example, therelative interior rel int EDM 3 is realized via (1202). (6.5)When N = 3 and r = 1, the relative boundary of the EDM conedvec rel∂EDM 3 is realized in isomorphic R 3 as in Figure 138d. This figurecould be constructed via (1203) by spiraling vector V X tightly about theorigin in N(1 T ) ; as can be imagined with aid of Figure 139. Vectors closeto the origin in N(1 T ) are correspondingly close to the origin in EDM N .As vector V X orbits the origin in N(1 T ) , the corresponding EDM orbitsthe axis of revolution while remaining on the boundary of the circular conedvec rel∂EDM 3 . (Figure 140)


Figure 140: (a) Vector V_X ∈ R^{3×1} from Figure 139 spirals in N(1^T) ⊂ R^3 decaying toward origin. (Spiral is two-dimensional in vector space R^3.) (b) Corresponding trajectory dvec D(V_X) ⊂ dvec rel ∂EDM^3 on EDM cone relative boundary creates a vortex also decaying toward origin. There are two complete orbits on EDM cone boundary about axis of revolution for every single revolution of V_X about origin. (Vortex is three-dimensional in isometrically isomorphic R^3.)


6.4.3 Faces of EDM cone

Like the positive semidefinite cone, EDM cone faces are EDM cones.

6.4.3.0.1 Exercise. Isomorphic faces.
Prove that in high cardinality N , any set of EDMs made via (1202) or (1203) with particular affine dimension r is isomorphic with any set admitting the same affine dimension but made in lower cardinality.

6.4.3.1 smallest face that contains an EDM

Now suppose we are given a particular EDM D(V_Xp) ∈ EDM^N corresponding to affine dimension r and parametrized by V_Xp in (1185). The EDM cone's smallest face that contains D(V_Xp) is

F( EDM^N ∋ D(V_Xp) )
  = { D(V_X) | V_X ∈ R^{N×r} , rank V_X = r , V_X^T V_X = δ^2(V_X^T V_X) , R(V_X) ⊆ R(V_Xp) }
  ≃ EDM^{r+1}                                                                          (1206)

which is isomorphic^{6.6} with convex cone EDM^{r+1} , hence of dimension

dim F( EDM^N ∋ D(V_Xp) ) = (r + 1) r / 2                                               (1207)

in isomorphic R^{N(N−1)/2}. Not all dimensions are represented; e.g., the EDM cone has no two-dimensional faces.

When cardinality N = 4 and affine dimension r = 2 so that R(V_Xp) is any two-dimensional subspace of three-dimensional N(1^T) in R^4 , for example, then the corresponding face of EDM^4 is isometrically isomorphic with: (1203)

EDM^3 = { D ∈ EDM^3 | rank(V D V) ≤ 2 } ≃ F( EDM^4 ∋ D(V_Xp) )                         (1208)

Each two-dimensional subspace of N(1^T) corresponds to another three-dimensional face.

Because each and every principal submatrix of an EDM in EDM^N (§5.14.3) is another EDM [235, §4.1], for example, then each principal submatrix belongs to a particular face of EDM^N.

^{6.6} The fact that the smallest face is isomorphic with another EDM cone (perhaps smaller than EDM^N) is implicit in [185, §2].


6.4.3.2 extreme directions of EDM cone

In particular, extreme directions (§2.8.1) of EDM^N correspond to affine dimension r = 1 and are simply represented: for any particular cardinality N ≥ 2 (§2.8.2) and each and every nonzero vector z in N(1^T)

Γ ≜ (z ∘ z) 1^T + 1 (z ∘ z)^T − 2 z z^T
  = δ(z z^T) 1^T + 1 δ(z z^T)^T − 2 z z^T   ∈ EDM^N                                    (1209)

is an extreme direction corresponding to a one-dimensional face of the EDM cone EDM^N that is a ray in isomorphic subspace R^{N(N−1)/2}.

Proving this would exercise the fundamental definition (186) of extreme direction. Here is a sketch: Any EDM may be represented

D(V_X) = δ(V_X V_X^T) 1^T + 1 δ(V_X V_X^T)^T − 2 V_X V_X^T ∈ EDM^N                     (1185)

where matrix V_X (1186) has orthogonal columns. For the same reason (1467) that z z^T is an extreme direction of the positive semidefinite cone (§2.9.2.7) for any particular nonzero vector z , there is no conic combination of distinct EDMs (each conically independent of Γ (§2.10)) equal to Γ.  ∎

6.4.3.2.1 Example. Biorthogonal expansion of an EDM. (confer §2.13.7.1.1)
When matrix D belongs to the EDM cone, nonnegative coordinates for biorthogonal expansion are the eigenvalues λ ∈ R^N of −(1/2) V D V : For any D ∈ S^N_h it holds

D = δ(−(1/2) V D V) 1^T + 1 δ(−(1/2) V D V)^T − 2 (−(1/2) V D V)                       (999)

By diagonalization −(1/2) V D V ≜ Q Λ Q^T ∈ S^N_c (§A.5.1) we may write

D = δ( ∑_{i=1}^N λ_i q_i q_i^T ) 1^T + 1 δ( ∑_{i=1}^N λ_i q_i q_i^T )^T − 2 ∑_{i=1}^N λ_i q_i q_i^T
  = ∑_{i=1}^N λ_i ( δ(q_i q_i^T) 1^T + 1 δ(q_i q_i^T)^T − 2 q_i q_i^T )                 (1210)

where q_i is the i-th eigenvector of −(1/2) V D V arranged columnar in orthogonal matrix

Q = [ q_1  q_2  ···  q_N ] ∈ R^{N×N}                                                    (400)


and where { δ(q_i q_i^T) 1^T + 1 δ(q_i q_i^T)^T − 2 q_i q_i^T , i = 1 ... N } are extreme directions of some pointed polyhedral cone K ⊂ S^N_h and extreme directions of EDM^N. Invertibility of (1210)

−(1/2) V D V = −(1/2) V ( ∑_{i=1}^N λ_i ( δ(q_i q_i^T) 1^T + 1 δ(q_i q_i^T)^T − 2 q_i q_i^T ) ) V
             = ∑_{i=1}^N λ_i q_i q_i^T                                                  (1211)

implies linear independence of those extreme directions. Then the biorthogonal expansion is expressed

dvec D = Y Y^† dvec D = Y λ( −(1/2) V D V )                                             (1212)

where

Y ≜ [ dvec( δ(q_i q_i^T) 1^T + 1 δ(q_i q_i^T)^T − 2 q_i q_i^T ) , i = 1 ... N ] ∈ R^{N(N−1)/2 × N}   (1213)

When D belongs to the EDM cone in the subspace of symmetric hollow matrices, unique coordinates Y^† dvec D for this biorthogonal expansion must be the nonnegative eigenvalues λ of −(1/2) V D V. This means D simultaneously belongs to the EDM cone and to the pointed polyhedral cone dvec K = cone(Y).

6.4.3.3 open question

Result (1207) is analogous to that for the positive semidefinite cone (222), although the question remains open whether all faces of EDM^N (whose dimension is less than dimension of the cone) are exposed like they are for the positive semidefinite cone.^{6.7} (§2.9.2.3) [347]

6.5 Correspondence to PSD cone S^{N−1}_+

Hayden, Wells, Liu, & Tarazaga [185, §2] assert one-to-one correspondence of EDMs with positive semidefinite matrices in the symmetric subspace.

^{6.7} Elementary example of face not exposed is given by the closed convex set in Figure 32 and in Figure 42.


Because rank(V D V) ≤ N−1 (§5.7.1.1), that PSD cone corresponding to the EDM cone can only be S^{N−1}_+. [8, §18.2.1] To clearly demonstrate this correspondence, we invoke the inner-product form EDM definition

D(Φ) = [ 0 ; δ(Φ) ] 1^T + 1 [ 0  δ(Φ)^T ] − 2 [ 0  0^T ; 0  Φ ] ∈ EDM^N   ⇔   Φ ≽ 0     (1017)

Then the EDM cone may be expressed

EDM^N = { D(Φ) | Φ ∈ S^{N−1}_+ }                                                        (1214)

Hayden & Wells' assertion can therefore be equivalently stated in terms of an inner-product form EDM operator

D(S^{N−1}_+) = EDM^N                                                                    (1019)
V_N(EDM^N) = S^{N−1}_+                                                                  (1020)

identity (1020) holding because R(V_N) = N(1^T) (898), linear functions D(Φ) and V_N(D) = −V_N^T D V_N (§5.6.2.1) being mutually inverse.

In terms of affine dimension r , Hayden & Wells claim particular correspondence between PSD and EDM cones:

r = N−1 : Symmetric hollow matrices −D positive definite on N(1^T) correspond to points relatively interior to the EDM cone.

r < N−1 : Symmetric hollow matrices −D positive semidefinite on N(1^T) , where −V_N^T D V_N has at least one 0 eigenvalue, correspond to points on the relative boundary of the EDM cone.

r = 1 : Symmetric hollow nonnegative matrices rank-one on N(1^T) correspond to extreme directions (1209) of the EDM cone; id est, for some nonzero vector u (§A.3.1.0.7)

{ rank V_N^T D V_N = 1 ,  D ∈ S^N_h ∩ R^{N×N}_+ }   ⇔   { D ∈ EDM^N ,  D is an extreme direction }   ⇔   { −V_N^T D V_N ≡ u u^T ,  D ∈ S^N_h }        (1215)


6.5.0.0.1 Proof. Case r = 1 is easily proved: From the nonnegativity development in §5.8.1, extreme direction (1209), and Schoenberg criterion (910), we need show only sufficiency; id est, prove

{ rank V_N^T D V_N = 1 ,  D ∈ S^N_h ∩ R^{N×N}_+ }   ⇒   { D ∈ EDM^N ,  D is an extreme direction }

Any symmetric matrix D satisfying the rank condition must have the form, for z , q ∈ R^N and nonzero z ∈ N(1^T) ,

D = ±( 1 q^T + q 1^T − 2 z z^T )                                                        (1216)

because (§5.6.2.1, confer §E.7.2.0.2)

N( V_N(D) ) = { 1 q^T + q 1^T | q ∈ R^N } ⊆ S^N                                         (1217)

Hollowness demands q = δ(z z^T) while nonnegativity demands choice of positive sign in (1216). Matrix D thus takes the form of an extreme direction (1209) of the EDM cone.  ∎

The foregoing proof is not extensible in rank: An EDM with corresponding affine dimension r has the general form, for { z_i ∈ N(1^T) , i = 1 ... r } an independent set,

D = 1 δ( ∑_{i=1}^r z_i z_i^T )^T + δ( ∑_{i=1}^r z_i z_i^T ) 1^T − 2 ∑_{i=1}^r z_i z_i^T ∈ EDM^N   (1218)

The EDM so defined relies principally on the sum ∑ z_i z_i^T having positive summand coefficients (⇔ −V_N^T D V_N ≽ 0)^{6.8}. Then it is easy to find a sum incorporating negative coefficients while meeting rank, nonnegativity, and symmetric hollowness conditions but not positive semidefiniteness on subspace R(V_N) ; e.g., from page 470,

−(1/2) V [ 0 1 1 ; 1 0 5 ; 1 5 0 ] V = z_1 z_1^T − z_2 z_2^T                            (1219)

^{6.8} (⇐) For a_i ∈ R^{N−1} , let z_i = V_N^{†T} a_i.
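Counterexample (1219) can be confirmed in a few lines of Matlab; the check below merely illustrates that this symmetric, hollow, nonnegative matrix fails positive semidefiniteness on R(V_N) and so is not an EDM.

    D = [0 1 1; 1 0 5; 1 5 0];             % matrix from (1219)
    V = eye(3) - ones(3)/3;                % geometric centering matrix
    eig(-V*D*V/2)                          % one eigenvalue is negative, so D is not
                                           % an EDM despite symmetry, hollowness,
                                           % and nonnegativity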


6.5.0.0.2 Example. Extreme rays versus rays on the boundary.
The EDM

D = [ 0 1 4 ; 1 0 1 ; 4 1 0 ]

is an extreme direction of EDM^3 where u = [1 ; 2] in (1215). Because −V_N^T D V_N has eigenvalues {0, 5}, the ray whose direction is D also lies on the relative boundary of EDM^3.

In exception, EDM D = κ [ 0 1 ; 1 0 ] , for any particular κ > 0 , is an extreme direction of EDM^2 but −V_N^T D V_N has only one eigenvalue: {κ}. Because EDM^2 is a ray whose relative boundary (§2.6.1.4.1) is the origin, this conventional boundary does not include D which belongs to the relative interior in this dimension. (§2.7.0.0.1)

6.5.1 Gram-form correspondence to S^{N−1}_+

With respect to D(G) = δ(G) 1^T + 1 δ(G)^T − 2 G (903), the linear Gram-form EDM operator, results in §5.6.1 provide [2, §2.6]

EDM^N = D( V(EDM^N) ) ≡ D( V_N S^{N−1}_+ V_N^T )                                        (1220)

V_N S^{N−1}_+ V_N^T ≡ V( D( V_N S^{N−1}_+ V_N^T ) ) = V(EDM^N) ≜ −(1/2) V EDM^N V = S^N_c ∩ S^N_+   (1221)

a one-to-one correspondence between EDM^N and S^{N−1}_+.

6.5.2 EDM cone by elliptope

Having defined the elliptope parametrized by scalar t > 0

E^N_t = S^N_+ ∩ { Φ ∈ S^N | δ(Φ) = t 1 }                                                (1097)

then following Alfakih [9] we have

EDM^N = cone{ 11^T − E^N_1 } = { t (11^T − E^N_1) | t ≥ 0 }                             (1222)

Identification E^N = E^N_1 equates the standard elliptope (§5.9.1.0.1, Figure 130) to our parametrized elliptope.


EDM^N = cone{ 11^T − E^N } = { t (11^T − E^N) | t ≥ 0 }                                 (1222)

Figure 141: Three views of translated negated elliptope 11^T − E^3_1 (confer Figure 130) shrouded by truncated EDM cone; labeled in the drawing are dvec rel ∂EDM^3 and dvec(11^T − E^3). Fractal on EDM cone relative boundary is numerical artifact belonging to intersection with elliptope relative boundary. The fractal is trying to convey existence of a neighborhood about the origin where the translated elliptope boundary and EDM cone boundary intersect.


6.5.2.0.1 Expository. Normal cone, tangent cone, elliptope.
Define T_E(11^T) to be the tangent cone to the elliptope E at point 11^T ; id est,

T_E(11^T) ≜ { t (E − 11^T) | t ≥ 0 }                                                    (1223)

The normal cone K^⊥_E(11^T) to the elliptope at 11^T is a closed convex cone defined (§E.10.3.2.1, Figure 175)

K^⊥_E(11^T) ≜ { B | ⟨B , Φ − 11^T⟩ ≤ 0 , Φ ∈ E }                                        (1224)

The polar cone of any set K is the closed convex cone (confer (297))

K° ≜ { B | ⟨B , A⟩ ≤ 0 , for all A ∈ K }                                                (1225)

The normal cone is well known to be the polar of the tangent cone, and vice versa; [199, §A.5.2.4]

K^⊥_E(11^T) = T_E(11^T)°                                                                (1226)
K^⊥_E(11^T)° = T_E(11^T)                                                                (1227)

From Deza & Laurent [113, p.535] we have the EDM cone

EDM = −T_E(11^T)                                                                        (1228)

The polar EDM cone is also expressible in terms of the elliptope. From (1226) we have

EDM° = −K^⊥_E(11^T)                                                                     (1229)   ⋆

In §5.10.1 we proposed the expression for EDM D

D = t 11^T − E ∈ EDM^N                                                                  (1098)

where t ∈ R_+ and E belongs to the parametrized elliptope E^N_t. We further propose, for any particular t > 0

EDM^N = cone{ t 11^T − E^N_t }                                                          (1230)

Proof. Pending.  ∎

6.5.2.0.2 Exercise. EDM cone from elliptope.
Relationship of the translated negated elliptope with the EDM cone is illustrated in Figure 141. Prove whether it holds that

EDM^N = lim_{t→∞} ( t 11^T − E^N_t )                                                    (1231)


6.6 Vectorization & projection interpretation

In §E.7.2.0.2 we learn: −V D V can be interpreted as orthogonal projection [6, §2] of vectorized −D ∈ S^N_h on the subspace of geometrically centered symmetric matrices

S^N_c = { G ∈ S^N | G 1 = 0 }                                                           (1998)
      = { G ∈ S^N | N(G) ⊇ 1 } = { G ∈ S^N | R(G) ⊆ N(1^T) }
      = { V Y V | Y ∈ S^N } ⊂ S^N                                                       (1999)
      ≡ { V_N A V_N^T | A ∈ S^{N−1} }                                                   (990)

because elementary auxiliary matrix V is an orthogonal projector (§B.4.1). Yet there is another useful projection interpretation:

Revising the fundamental matrix criterion for membership to the EDM cone (886),^{6.9}

{ ⟨z z^T , −D⟩ ≥ 0  ∀ z z^T | 11^T z z^T = 0 ,  D ∈ S^N_h }   ⇔   D ∈ EDM^N             (1232)

this is equivalent, of course, to the Schoenberg criterion

{ −V_N^T D V_N ≽ 0 ,  D ∈ S^N_h }   ⇔   D ∈ EDM^N                                       (910)

because N(11^T) = R(V_N). When D ∈ EDM^N , correspondence (1232) means −z^T D z is proportional to a nonnegative coefficient of orthogonal projection (§E.6.4.2, Figure 143) of −D in isometrically isomorphic R^{N(N+1)/2} on the range of each and every vectorized (§2.2.2.1) symmetric dyad (§B.1) in the nullspace of 11^T ; id est, on each and every member of

T ≜ { svec(z z^T) | z ∈ N(11^T) = R(V_N) } ⊂ svec ∂S^N_+
  = { svec(V_N υ υ^T V_N^T) | υ ∈ R^{N−1} }                                             (1233)

whose dimension is

dim T = N(N−1)/2                                                                        (1234)

^{6.9} N(11^T) = N(1^T) and R(z z^T) = R(z)


The set of all symmetric dyads { z z^T | z ∈ R^N } constitute the extreme directions of the positive semidefinite cone (§2.8.1, §2.9) S^N_+ , hence lie on its boundary. Yet only those dyads in R(V_N) are included in the test (1232), thus only a subset T of all vectorized extreme directions of S^N_+ is observed.

In the particularly simple case D ∈ EDM^2 = { D ∈ S^2_h | d_12 ≥ 0 } , for example, only one extreme direction of the PSD cone is involved:

z z^T = [ 1  −1 ; −1  1 ]                                                               (1235)

Any nonnegative scaling of vectorized z z^T belongs to the set T illustrated in Figure 142 and Figure 143.

6.6.1 Face of PSD cone S^N_+ containing V

In any case, set T (1233) constitutes the vectorized extreme directions of an N(N−1)/2-dimensional face of the PSD cone S^N_+ containing auxiliary matrix V ; a face isomorphic with S^{N−1}_+ = S^{rank V}_+ (§2.9.2.3).

To show this, we must first find the smallest face that contains auxiliary matrix V and then determine its extreme directions. From (221),

F( S^N_+ ∋ V ) = { W ∈ S^N_+ | N(W) ⊇ N(V) } = { W ∈ S^N_+ | N(W) ⊇ 1 }
              = { V Y V ≽ 0 | Y ∈ S^N } ≡ { V_N B V_N^T | B ∈ S^{N−1}_+ }
              ≃ S^{rank V}_+ = −V_N^T EDM^N V_N                                         (1236)

where the equivalence ≡ is from §5.6.1 while isomorphic equality ≃ with transformed EDM cone is from (1020). Projector V belongs to F( S^N_+ ∋ V ) because V_N V_N^† V_N^{†T} V_N^T = V. (§B.4.3) Each and every rank-one matrix belonging to this face is therefore of the form:

V_N υ υ^T V_N^T  |  υ ∈ R^{N−1}                                                          (1237)

Because F( S^N_+ ∋ V ) is isomorphic with a positive semidefinite cone S^{N−1}_+ , then T constitutes the vectorized extreme directions of F , the origin constitutes the extreme points of F , and auxiliary matrix V is some convex combination of those extreme points and directions by the extremes theorem (§2.8.1.1.1).


T ≜ { svec(z z^T) | z ∈ N(11^T) = { κ [1 ; −1] , κ ∈ R } } ⊂ svec ∂S^2_+

Figure 142: Truncated boundary of positive semidefinite cone S^2_+ in isometrically isomorphic R^3 (via svec (56), coordinates d_11 , √2 d_12 , d_22) is, in this dimension, constituted solely by its extreme directions. Truncated cone of Euclidean distance matrices EDM^2 in isometrically isomorphic subspace R. Relative boundary of EDM cone is constituted solely by matrix 0. Halfline T = { κ^2 [ 1  −√2  1 ]^T | κ ∈ R } on PSD cone boundary depicts that lone extreme ray (1235) on which orthogonal projection of −D must be positive semidefinite if D is to belong to EDM^2. aff cone T = svec S^2_c. (1240) Dual EDM cone is halfspace in R^3 whose bounding hyperplane has inward-normal svec EDM^2.


Projection of vectorized −D on range of vectorized z z^T :

P_{svec z z^T}( svec(−D) ) = ( ⟨z z^T , −D⟩ / ⟨z z^T , z z^T⟩ ) z z^T

D ∈ EDM^N   ⇔   { ⟨z z^T , −D⟩ ≥ 0  ∀ z z^T | 11^T z z^T = 0 ,  D ∈ S^N_h }             (1232)

Figure 143: Given-matrix D is assumed to belong to symmetric hollow subspace S^2_h ; a line in this dimension (same svec coordinates as Figure 142). Negating D , we find its polar along S^2_h. Set T (1233) has only one ray member in this dimension; not orthogonal to S^2_h. Orthogonal projection of −D on T (indicated by half dot) has nonnegative projection coefficient. Matrix D must therefore be an EDM.


In fact the smallest face, that contains auxiliary matrix V , of the PSD cone S^N_+ is the intersection with the geometric center subspace (1998) (1999);

F( S^N_+ ∋ V ) = cone{ V_N υ υ^T V_N^T | υ ∈ R^{N−1} }
              = S^N_c ∩ S^N_+
              ≡ { X ≽ 0 | ⟨X , 11^T⟩ = 0 }   (1593)                                     (1238)

In isometrically isomorphic R^{N(N+1)/2}

svec F( S^N_+ ∋ V ) = cone T                                                            (1239)

related to S^N_c by

aff cone T = svec S^N_c                                                                 (1240)

6.6.2 EDM criteria in 11^T

(confer §6.4, (917)) Laurent specifies an elliptope trajectory condition for EDM cone membership: [235, §2.3]

D ∈ EDM^N   ⇔   [ 1 − e^{−α d_ij} ] ∈ EDM^N   ∀ α > 0                                   (1092a)

From the parametrized elliptope E^N_t in §6.5.2 and §5.10.1 we propose

D ∈ EDM^N   ⇔   ∃ t ∈ R_+ , E ∈ E^N_t  such that  D = t 11^T − E                        (1241)

Chabrillac & Crouzeix [74, §4] prove a different criterion they attribute to Finsler (1937) [144]. We apply it to EDMs: for D ∈ S^N_h (1040)

−V_N^T D V_N ≻ 0   ⇔   ∃ κ > 0 such that −D + κ 11^T ≻ 0
                   ⇔   D ∈ EDM^N with corresponding affine dimension r = N−1            (1242)

This Finsler criterion has geometric interpretation in terms of the vectorization & projection already discussed in connection with (1232). With reference to Figure 142, the offset 11^T is simply a direction orthogonal to T in isomorphic R^3. Intuitively, translation of −D in direction 11^T is like orthogonal projection on T insofar as similar information can be obtained.
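The Finsler criterion (1242) is simple to exercise numerically. In the Matlab sketch below, a generic EDM of full affine dimension r = N−1 is generated from random points (an illustrative assumption, not a construction from the text); the loop doubles κ until −D + κ11^T becomes positive definite, which (1242) guarantees must eventually happen.

    N = 5;  X = randn(N-1,N);                         % random points, r = N-1
    g = diag(X'*X);
    D = g*ones(1,N) + ones(N,1)*g' - 2*(X'*X);        % EDM of full affine dimension
    kappa = 1;
    while min(eig(-D + kappa*ones(N))) <= 0
        kappa = 2*kappa;                              % grow offset along direction 11'
    end
    kappa, min(eig(-D + kappa*ones(N)))               % strictly positive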


When the Finsler criterion (1242) is applied despite lower affine dimension, the constant κ can go to infinity, making the test −D + κ 11^T ≽ 0 impractical for numerical computation. Chabrillac & Crouzeix invent a criterion for the semidefinite case, but it is no more practical: for D ∈ S^N_h

D ∈ EDM^N   ⇔   ∃ κ_p > 0 such that ∀ κ ≥ κ_p , −D − κ 11^T [sic] has exactly one negative eigenvalue   (1243)

6.7 A geometry of completion

It is not known how to proceed if one wishes to restrict the dimension of the Euclidean space in which the configuration of points may be constructed.
   −Michael W. Trosset (2000) [352, §1]

Given an incomplete noiseless EDM, intriguing is the question of whether a list in X ∈ R^{n×N} (76) may be reconstructed and under what circumstances reconstruction is unique. [2] [4] [5] [6] [8] [17] [67] [205] [217] [234] [235] [236]

If one or more entries of a particular EDM are fixed, then geometric interpretation of the feasible set of completions is the intersection of the EDM cone EDM^N in isomorphic subspace R^{N(N−1)/2} with as many hyperplanes as there are fixed symmetric entries.^{6.10} Assuming a nonempty intersection, then the number of completions is generally infinite, and those corresponding to particular affine dimension r


Figure 144: (a) In isometrically isomorphic subspace R^3 , intersection of EDM^3 with hyperplane ∂H representing one fixed symmetric entry d_23 = κ (both drawn truncated, rounded vertex is artifact of plot). EDMs in this dimension corresponding to affine dimension 1 comprise relative boundary of EDM cone (§6.5). Since intersection illustrated includes a nontrivial subset of cone's relative boundary, then it is apparent there exist infinitely many EDM completions corresponding to affine dimension 1. In this dimension it is impossible to represent a unique nonzero completion corresponding to affine dimension 1, for example, using a single hyperplane because any hyperplane supporting relative boundary at a particular point Γ contains an entire ray { ζΓ | ζ ≥ 0 } belonging to rel ∂EDM^3 by Lemma 2.8.0.0.1. (b) d_13 = κ. (c) d_12 = κ.


Figure 145: (a) 2 nearest neighbors. (b) 3 nearest neighbors. Neighborhood graph (dashed) with one dimensionless EDM subgraph completion (solid) superimposed (but not obscuring dashed). Local view of a few dense samples from relative interior of some arbitrary Euclidean manifold whose affine dimension appears two-dimensional in this neighborhood. All line segments measure absolute distance. Dashed line segments help visually locate nearest neighbors; suggesting, best number of nearest neighbors can be greater than value of embedding dimension after topological transformation. (confer [212, §2]) Solid line segments represent completion of EDM subgraph from available distance data for an arbitrarily chosen sample and its nearest neighbors. Each distance from EDM subgraph becomes distance-square in corresponding EDM submatrix.


[372, §2.2] The physical process is intuitively described as unfurling, unfolding, diffusing, decompacting, or unraveling. In particular instances, the process is a sort of flattening by stretching until taut (but not by crushing); e.g., unfurling a three-dimensional Euclidean body resembling a billowy national flag reduces that manifold's affine dimension to r = 2.

Data input to the proposed process originates from distances between neighboring relatively dense samples of a given manifold. Figure 145 realizes a densely sampled neighborhood; called, neighborhood graph. Essentially, the algorithmic process preserves local isometry between nearest neighbors allowing distant neighbors to excurse expansively by "maximizing variance" (Figure 5). The common number of nearest neighbors to each sample is a data-dependent algorithmic parameter whose minimum value connects the graph. The dimensionless EDM subgraph between each sample and its nearest neighbors is completed from available data and included as input; one such EDM subgraph completion is drawn superimposed upon the neighborhood graph in Figure 145.^{6.11} The consequent dimensionless EDM graph comprising all the subgraphs is incomplete, in general, because the neighbor number is relatively small; incomplete even though it is a superset of the neighborhood graph. Remaining distances (those not graphed at all) are squared then made variables within the algorithm; it is this variability that admits unfurling.

To demonstrate, consider untying the trefoil knot drawn in Figure 146a. A corresponding Euclidean distance matrix D = [ d_ij , i , j = 1 ... N ] employing only 2 nearest neighbors is banded having the incomplete form

D = [  0         ď_12      ď_13      ?         ···       ?         ď_1,N−1    ď_1N
       ď_12      0         ď_23      ď_24      ⋱         ?         ?          ď_2N
       ď_13      ď_23      0         ď_34      ⋱         ?         ?          ?
       ?         ď_24      ď_34      0         ⋱         ⋱         ?          ?
       ⋮         ⋱         ⋱         ⋱         ⋱         ⋱         ⋱          ?
       ?         ?         ?         ⋱         ⋱         0         ď_N−2,N−1  ď_N−2,N
       ď_1,N−1   ?         ?         ?         ⋱         ď_N−2,N−1 0          ď_N−1,N
       ď_1N      ď_2N      ?         ?         ?         ď_N−2,N   ď_N−1,N    0      ]   (1244)

^{6.11} Local reconstruction of point position, from the EDM submatrix corresponding to a complete dimensionless EDM subgraph, is unique to within an isometry (§5.6, §5.12).


Figure 146: (a) Trefoil knot in R^3 from Weinberger & Saul [372]. (b) Topological transformation algorithm employing 4 nearest neighbors and N = 539 samples reduces affine dimension of knot to r = 2. Choosing instead 2 nearest neighbors would make this embedding more circular.


where ď_ij denotes a given fixed distance-square. The unfurling algorithm can be expressed as an optimization problem; constrained total distance-square maximization:

maximize_D    ⟨−V , D⟩
subject to    ⟨D , e_i e_j^T + e_j e_i^T⟩ (1/2) = ď_ij   ∀ (i , j) ∈ I
              rank(V D V) = 2
              D ∈ EDM^N                                                                 (1245)

where e_i ∈ R^N is the i-th member of the standard basis, where set I indexes the given distance-square data like that in (1244), where V ∈ R^{N×N} is the geometric centering matrix (§B.4.1), and where

⟨−V , D⟩ = tr(−V D V) = 2 tr G = (1/N) ∑_{i,j} d_ij                                     (915)

where G is the Gram matrix producing D assuming G 1 = 0.

If the (rank) constraint on affine dimension is ignored, then problem (1245) becomes convex, a corresponding solution D^⋆ can be found, and a nearest rank-2 solution is then had by ordered eigenvalue decomposition of −V D^⋆ V followed by spectral projection (§7.1.3) on [ R^2_+ ; 0 ] ⊂ R^N. This two-step process is necessarily suboptimal. Yet because the decomposition for the trefoil knot reveals only two dominant eigenvalues, the spectral projection is nearly benign. Such a reconstruction of point position (§5.12) utilizing 4 nearest neighbors is drawn in Figure 146b; a low-dimensional embedding of the trefoil knot.

This problem (1245) can, of course, be written equivalently in terms of Gram matrix G , facilitated by (921); videlicet, for Φ_ij as in (889)

maximize_{G ∈ S^N_c}    ⟨I , G⟩
subject to    ⟨G , Φ_ij⟩ = ď_ij   ∀ (i , j) ∈ I
              rank G = 2
              G ≽ 0                                                                     (1246)

The advantage to converting EDM to Gram is: Gram matrix G is a bridge between point list X and EDM D ; constraints on any or all of these three variables may now be introduced. (Example 5.4.2.3.5)
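A minimal computational sketch of the Gram form (1246), with the rank constraint dropped so the problem is convex, can be written for the CVX modeling package in Matlab. CVX is an outside tool, not part of this text, and the small ring of sample points and the NaN encoding of unknown entries below are our own illustrative assumptions.

    % build a small partial EDM (points on a circle, 2 nearest neighbors each)
    N = 20;  th = 2*pi*(0:N-1)'/N;  X = [cos(th) sin(th)]';
    g = diag(X'*X);
    Dfull  = g*ones(1,N) + ones(N,1)*g' - 2*(X'*X);
    Dgiven = nan(N);
    for i = 1:N
        j = mod(i,N)+1;                              % ring neighbor
        Dgiven(i,j) = Dfull(i,j);  Dgiven(j,i) = Dfull(i,j);
    end
    % convex relaxation of (1246): maximize variance given known distance-squares
    [ik,jk] = find(triu(~isnan(Dgiven),1));          % index set I
    cvx_begin sdp
        variable G(N,N) symmetric
        maximize( trace(G) )                         % <I,G>
        subject to
            G*ones(N,1) == 0;                        % geometric center subspace
            for k = 1:numel(ik)
                i = ik(k); j = jk(k);
                G(i,i) + G(j,j) - 2*G(i,j) == Dgiven(i,j);   % <G,Phi_ij> = d_ij
            end
            G == semidefinite(N);
    cvx_end
    g = diag(G);
    D = g*ones(1,N) + ones(N,1)*g' - 2*G;            % completed EDM via (903)

A nearest rank-2 solution can then be obtained, as described above, by truncating the eigenvalue decomposition of G (equivalently of −V D^⋆ V / 2) to its two largest eigenvalues.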


Figure 147: Trefoil ribbon; courtesy, Kilian Weinberger. Same topological transformation algorithm with 5 nearest neighbors and N = 1617 samples.


Confinement to the geometric center subspace S^N_c (implicit constraint G 1 = 0) keeps G independent of its translation-invariant subspace S^{N⊥}_c (§5.5.1.1, Figure 148) so as not to become numerically unbounded.

A problem dual to maximum variance unfolding problem (1246) (less the Gram rank constraint) has been called the fastest mixing Markov process. That dual has simple interpretations in graph and circuit theory and in mechanical and thermal systems, explored in [340], and has direct application to quick calculation of PageRank by search engines [231, §4]. Optimal Gram rank turns out to be tightly bounded above by minimum multiplicity of the second smallest eigenvalue of a dual optimal variable.

6.8 Dual EDM cone

6.8.1 Ambient S^N

We consider finding the ordinary dual EDM cone in ambient space S^N where EDM^N is pointed, closed, convex, but not full-dimensional. The set of all EDMs in S^N is a closed convex cone because it is the intersection of halfspaces about the origin in vectorized variable D (§2.4.1.1.1, §2.7.2):

EDM^N = ⋂_{z ∈ N(1^T) , i = 1 ... N} { D ∈ S^N | ⟨e_i e_i^T , D⟩ ≥ 0 , ⟨e_i e_i^T , D⟩ ≤ 0 , ⟨z z^T , −D⟩ ≥ 0 }   (1247)

By definition (297), dual cone K^* comprises each and every vector inward-normal to a hyperplane supporting convex cone K. The dual EDM cone in the ambient space of symmetric matrices is therefore expressible as the aggregate of every conic combination of inward-normals from (1247):

EDM^{N*} = cone{ e_i e_i^T , −e_j e_j^T | i , j = 1 ... N } − cone{ z z^T | 11^T z z^T = 0 }
         = { ∑_{i=1}^N ζ_i e_i e_i^T − ∑_{j=1}^N ξ_j e_j e_j^T | ζ_i , ξ_j ≥ 0 } − cone{ z z^T | 11^T z z^T = 0 }
         = { δ(u) | u ∈ R^N } − cone{ V_N υ υ^T V_N^T | υ ∈ R^{N−1} , (‖υ‖ = 1) } ⊂ S^N
         = { δ^2(Y) − V_N Ψ V_N^T | Y ∈ S^N , Ψ ∈ S^{N−1}_+ }                            (1248)

The EDM cone is not selfdual in ambient S^N because its affine hull belongs to a proper subspace

aff EDM^N = S^N_h                                                                        (1249)


The ordinary dual EDM cone cannot, therefore, be pointed. (§2.13.1.1)

When N = 1 , the EDM cone is the point at the origin in R. Auxiliary matrix V_N is empty [∅] , and dual cone EDM^* is the real line.

When N = 2 , the EDM cone is a nonnegative real line in isometrically isomorphic R^3 ; there S^2_h is a real line containing the EDM cone. Dual cone EDM^{2*} is the particular halfspace in R^3 whose boundary has inward-normal EDM^2. Diagonal matrices { δ(u) } in (1248) are represented by a hyperplane through the origin { d | [ 0 1 0 ] d = 0 } while the term cone{ V_N υ υ^T V_N^T } is represented by the halfline T in Figure 142 belonging to the positive semidefinite cone boundary. The dual EDM cone is formed by translating the hyperplane along the negative semidefinite halfline −T ; the union of each and every translation. (confer §2.10.2.0.1)

When cardinality N exceeds 2, the dual EDM cone can no longer be polyhedral simply because the EDM cone cannot. (§2.13.1.1)

6.8.1.1 EDM cone and its dual in ambient S^N

Consider the two convex cones

K_1 ≜ S^N_h
K_2 ≜ ⋂_{y ∈ N(1^T)} { A ∈ S^N | ⟨y y^T , −A⟩ ≥ 0 }
    = { A ∈ S^N | −z^T V A V z ≥ 0  ∀ z z^T (≽ 0) }
    = { A ∈ S^N | −V A V ≽ 0 }                                                           (1250)

so

K_1 ∩ K_2 = EDM^N                                                                        (1251)

Dual cone K_1^* = S^{N⊥}_h ⊆ S^N (72) is the subspace of diagonal matrices. From (1248) via (314),

K_2^* = −cone{ V_N υ υ^T V_N^T | υ ∈ R^{N−1} } ⊂ S^N                                     (1252)

Gaffke & Mathar [148, §5.3] observe that projection on K_1 and K_2 have simple closed forms: Projection on subspace K_1 is easily performed by symmetrization and zeroing the main diagonal or vice versa, while projection of H ∈ S^N on K_2 is

P_{K_2} H = H − P_{S^N_+}(V H V)                                                         (1253)


Proof. First, we observe membership of H − P_{S^N_+}(V H V) to K_2 because

P_{S^N_+}(V H V) − H = ( P_{S^N_+}(V H V) − V H V ) + ( V H V − H )                      (1254)

The term P_{S^N_+}(V H V) − V H V necessarily belongs to the (dual) positive semidefinite cone by Theorem E.9.2.0.1. V^2 = V , hence

−V ( H − P_{S^N_+}(V H V) ) V ≽ 0                                                        (1255)

by Corollary A.3.1.0.5.

Next, we require

⟨ P_{K_2} H − H , P_{K_2} H ⟩ = 0                                                        (1256)

Expanding,

⟨ −P_{S^N_+}(V H V) , H − P_{S^N_+}(V H V) ⟩ = 0                                         (1257)
⟨ P_{S^N_+}(V H V) , ( P_{S^N_+}(V H V) − V H V ) + ( V H V − H ) ⟩ = 0                  (1258)
⟨ P_{S^N_+}(V H V) , ( V H V − H ) ⟩ = 0                                                 (1259)

Product V H V belongs to the geometric center subspace; (§E.7.2.0.2)

V H V ∈ S^N_c = { Y ∈ S^N | N(Y) ⊇ 1 }                                                   (1260)

Diagonalize V H V ≜ Q Λ Q^T (§A.5) whose nullspace is spanned by the eigenvectors corresponding to 0 eigenvalues by Theorem A.7.3.0.1. Projection of V H V on the PSD cone (§7.1) simply zeroes negative eigenvalues in diagonal matrix Λ. Then

N( P_{S^N_+}(V H V) ) ⊇ N(V H V)   ( ⊇ N(V) )                                            (1261)

from which it follows:

P_{S^N_+}(V H V) ∈ S^N_c                                                                 (1262)

so P_{S^N_+}(V H V) ⊥ (V H V − H) because V H V − H ∈ S^{N⊥}_c.

Finally, we must have P_{K_2} H − H = −P_{S^N_+}(V H V) ∈ K_2^*. From §6.6.1 we know dual cone K_2^* = −F( S^N_+ ∋ V ) is the negative of the positive semidefinite cone's smallest face that contains auxiliary matrix V. Thus P_{S^N_+}(V H V) ∈ F( S^N_+ ∋ V ) ⇔ N( P_{S^N_+}(V H V) ) ⊇ N(V) (§2.9.2.3) which was already established in (1261).  ∎
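Formula (1253) translates directly into a few lines of Matlab; the function name below is ours, not from the text.

    function PH = projK2(H)
    % projection of symmetric H on K_2 = {A | -V A V >= 0} via (1253)
        N = size(H,1);
        V = eye(N) - ones(N)/N;                    % geometric centering matrix
        W = V*H*V;  W = (W + W')/2;                % V H V, symmetrized for roundoff
        [Q,L] = eig(W);
        PH = H - Q*diag(max(diag(L),0))*Q';        % H - P_{S+}(V H V)
    end

Membership of the result may be verified by checking that −V*projK2(H)*V has no negative eigenvalues, up to roundoff.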


EDM^2 = S^2_h ∩ ( S^{2⊥}_c − S^2_+ )

Figure 148: Orthogonal complement S^{2⊥}_c (2000) (§B.2) of geometric center subspace (a plane in isometrically isomorphic R^3 ; drawn is a tiled fragment) apparently supporting positive semidefinite cone. (Rounded vertex is artifact of plot.) Line svec S^2_c = aff cone T (1240) intersects svec ∂S^2_+ , also drawn in Figure 142; it runs along PSD cone boundary. (confer Figure 129)


EDM^2 = S^2_h ∩ ( S^{2⊥}_c − S^2_+ )

Figure 149: EDM cone construction in isometrically isomorphic R^3 by adding polar PSD cone to svec S^{2⊥}_c. Difference svec( S^{2⊥}_c − S^2_+ ) is halfspace partially bounded by svec S^{2⊥}_c. EDM cone is nonnegative halfline along svec S^2_h in this dimension.


From results in §E.7.2.0.2, we know matrix product V H V is the orthogonal projection of H ∈ S^N on the geometric center subspace S^N_c. Thus the projection product

P_{K_2} H = H − P_{S^N_+} P_{S^N_c} H                                                    (1263)

6.8.1.1.1 Lemma. Projection on PSD cone ∩ geometric center subspace.

P_{S^N_+ ∩ S^N_c} = P_{S^N_+} P_{S^N_c}                                                  (1264)   ⋄

Proof. For each and every H ∈ S^N , projection of P_{S^N_c} H on the positive semidefinite cone remains in the geometric center subspace

S^N_c = { G ∈ S^N | G 1 = 0 }                                                            (1998)
      = { G ∈ S^N | N(G) ⊇ 1 } = { G ∈ S^N | R(G) ⊆ N(1^T) }
      = { V Y V | Y ∈ S^N } ⊂ S^N                                                        (1999) (990)

That is because: eigenvectors of P_{S^N_c} H corresponding to 0 eigenvalues span its nullspace N( P_{S^N_c} H ). (§A.7.3.0.1) To project P_{S^N_c} H on the positive semidefinite cone, its negative eigenvalues are zeroed. (§7.1.2) The nullspace is thereby expanded while eigenvectors originally spanning N( P_{S^N_c} H ) remain intact. Because the geometric center subspace is invariant to projection on the PSD cone, then the rule for projection on a convex set in a subspace governs (§E.9.5, projectors do not commute) and statement (1264) follows directly.  ∎

From the lemma it follows

{ P_{S^N_+} P_{S^N_c} H | H ∈ S^N } = { P_{S^N_+ ∩ S^N_c} H | H ∈ S^N }                  (1265)

Then from (2024)

−( S^N_c ∩ S^N_+ )^* = { H − P_{S^N_+} P_{S^N_c} H | H ∈ S^N }                           (1266)

From (314) we get closure of a vector sum

K_2 = −( S^N_c ∩ S^N_+ )^* = S^{N⊥}_c − S^N_+                                            (1267)


therefore the equality [100]

EDM^N = K_1 ∩ K_2 = S^N_h ∩ ( S^{N⊥}_c − S^N_+ )                                         (1268)

whose veracity is intuitively evident, in hindsight, [89, p.109] from the most fundamental EDM definition (891). Formula (1268) is not a matrix criterion for membership to the EDM cone, it is not an EDM definition, and it is not an equivalence between EDM operators or an isomorphism. Rather, it is a recipe for constructing the EDM cone whole from large Euclidean bodies: the positive semidefinite cone, orthogonal complement of the geometric center subspace, and symmetric hollow subspace. A realization of this construction in low dimension is illustrated in Figure 148 and Figure 149.

The dual EDM cone follows directly from (1268) by standard properties of cones (§2.13.1.1):

EDM^{N*} = K_1^* + K_2^* = S^{N⊥}_h − S^N_c ∩ S^N_+                                      (1269)

which bears strong resemblance to (1248).

6.8.1.2 nonnegative orthant contains EDM^N

That EDM^N is a proper subset of the nonnegative orthant is not obvious from (1268). We wish to verify

EDM^N = S^N_h ∩ ( S^{N⊥}_c − S^N_+ ) ⊂ R^{N×N}_+                                         (1270)

While there are many ways to prove this, it is sufficient to show that all entries of the extreme directions of EDM^N must be nonnegative; id est, for any particular nonzero vector z = [ z_i , i = 1 ... N ] ∈ N(1^T) (§6.4.3.2),

δ(z z^T) 1^T + 1 δ(z z^T)^T − 2 z z^T ≥ 0                                                (1271)

where the inequality denotes entrywise comparison. The inequality holds because the i,j-th entry of an extreme direction is squared: (z_i − z_j)^2.

We observe that the dyad 2 z z^T ∈ S^N_+ belongs to the positive semidefinite cone, the doublet

δ(z z^T) 1^T + 1 δ(z z^T)^T ∈ S^{N⊥}_c                                                   (1272)

to the orthogonal complement (2000) of the geometric center subspace, while their difference is a member of the symmetric hollow subspace S^N_h.


Here is an algebraic method to prove nonnegativity: Suppose we are given A ∈ S^{N⊥}_c and B = [ B_ij ] ∈ S^N_+ and A − B ∈ S^N_h. Then we have, for some vector u , A = u 1^T + 1 u^T = [ A_ij ] = [ u_i + u_j ] and δ(B) = δ(A) = 2u. Positive semidefiniteness of B requires nonnegativity A − B ≥ 0 because

(e_i − e_j)^T B (e_i − e_j) = (B_ii − B_ij) − (B_ji − B_jj) = 2(u_i + u_j) − 2 B_ij ≥ 0   (1273)

6.8.1.3 Dual Euclidean distance matrix criterion

Conditions necessary for membership of a matrix D^* ∈ S^N to the dual EDM cone EDM^{N*} may be derived from (1248): D^* ∈ EDM^{N*} ⇒ D^* = δ(y) − V_N A V_N^T for some vector y and positive semidefinite matrix A ∈ S^{N−1}_+. This in turn implies δ(D^* 1) = δ(y). Then, for D^* ∈ S^N

D^* ∈ EDM^{N*}   ⇔   δ(D^* 1) − D^* ≽ 0                                                  (1274)

where, for any symmetric matrix D^*

δ(D^* 1) − D^* ∈ S^N_c                                                                   (1275)

To show sufficiency of the matrix criterion in (1274), recall Gram-form EDM operator

D(G) = δ(G) 1^T + 1 δ(G)^T − 2 G                                                         (903)

where Gram matrix G is positive semidefinite by definition, and recall the self-adjointness property of the main-diagonal linear operator δ (§A.1):

⟨D , D^*⟩ = ⟨ δ(G) 1^T + 1 δ(G)^T − 2 G , D^* ⟩ = 2 ⟨ G , δ(D^* 1) − D^* ⟩               (921)

Assuming ⟨ G , δ(D^* 1) − D^* ⟩ ≥ 0 (1485), then we have known membership relation (§2.13.2.0.1)

D^* ∈ EDM^{N*}   ⇔   ⟨D , D^*⟩ ≥ 0  ∀ D ∈ EDM^N                                          (1276)
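Criterion (1274) and membership relation (1276) can be exercised together in Matlab; the random constructions below, and the use of V_N as in (897), are illustrations only.

    N  = 4;
    Vn = [-ones(1,N-1); eye(N-1)]/sqrt(2);        % auxiliary matrix V_N
    A  = randn(N-1);  A = A*A';                   % arbitrary PSD matrix
    Ds = diag(randn(N,1)) - Vn*A*Vn';             % member of EDM^{N*} per (1248)
    min(eig(diag(Ds*ones(N,1)) - Ds))             % >= 0 up to roundoff : criterion (1274)
    X  = randn(2,N);  g = diag(X'*X);
    D  = g*ones(1,N) + ones(N,1)*g' - 2*(X'*X);   % some EDM
    D(:)'*Ds(:)                                   % nonnegative inner product (1276)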


Elegance of this matrix criterion (1274) for membership to the dual EDM cone is lack of any other assumptions except that D^* be symmetric:^{6.12} Linear Gram-form EDM operator D(Y) (903) has adjoint, for Y ∈ S^N

D^T(Y) ≜ 2 ( δ(Y 1) − Y )                                                                (1277)

Then from (1276) and (904) we have: [89, p.111]

EDM^{N*} = { D^* ∈ S^N | ⟨D , D^*⟩ ≥ 0  ∀ D ∈ EDM^N }
         = { D^* ∈ S^N | ⟨D(G) , D^*⟩ ≥ 0  ∀ G ∈ S^N_+ }
         = { D^* ∈ S^N | ⟨ G , D^T(D^*) ⟩ ≥ 0  ∀ G ∈ S^N_+ }
         = { D^* ∈ S^N | δ(D^* 1) − D^* ≽ 0 }                                            (1278)

the dual EDM cone expressed in terms of the adjoint operator. A dual EDM cone determined this way is illustrated in Figure 151.

6.8.1.3.1 Exercise. Dual EDM spectral cone.
Find a spectral cone as in §5.11.2 corresponding to EDM^{N*}.

6.8.1.4 Nonorthogonal components of dual EDM

Now we tie construction (1269) of the dual EDM cone together with the matrix criterion (1274) for dual EDM cone membership. For any D^* ∈ S^N it is obvious:

δ(D^* 1) ∈ S^{N⊥}_h                                                                      (1279)

any diagonal matrix belongs to the subspace of diagonal matrices (67). We know when D^* ∈ EDM^{N*}

δ(D^* 1) − D^* ∈ S^N_c ∩ S^N_+                                                           (1280)

this adjoint expression (1277) belongs to that face (1238) of the positive semidefinite cone S^N_+ in the geometric center subspace. Any nonzero dual EDM

D^* = δ(D^* 1) − ( δ(D^* 1) − D^* ) ∈ S^{N⊥}_h ⊖ S^N_c ∩ S^N_+ = EDM^{N*}                (1281)

can therefore be expressed as the difference of two linearly independent (when vectorized) nonorthogonal components (Figure 129, Figure 150).

^{6.12} Recall: Schoenberg criterion (910) for membership to the EDM cone requires membership to the symmetric hollow subspace.


D° = δ(D° 1) + ( D° − δ(D° 1) ) ∈ S^{N⊥}_h ⊕ S^N_c ∩ S^N_+ = EDM^{N°}

Figure 150: Hand-drawn abstraction of polar EDM cone (drawn truncated). Any member D° of polar EDM cone can be decomposed into two linearly independent nonorthogonal components: δ(D° 1) and D° − δ(D° 1).


6.8.1.5 Affine dimension complementarity

From §6.8.1.3 we have, for some A ∈ S^{N−1}_+ (confer (1280))

δ(D^* 1) − D^* = V_N A V_N^T ∈ S^N_c ∩ S^N_+                                             (1282)

if and only if D^* belongs to the dual EDM cone. Call rank(V_N A V_N^T) dual affine dimension. Empirically, we find a complementary relationship in affine dimension between the projection of some arbitrary symmetric matrix H on the polar EDM cone, EDM^{N°} = −EDM^{N*} , and its projection on the EDM cone; id est, the optimal solution of^{6.13}

minimize_{D° ∈ S^N}    ‖D° − H‖_F
subject to    D° − δ(D° 1) ≽ 0                                                           (1283)

has dual affine dimension complementary to affine dimension corresponding to the optimal solution of

minimize_{D ∈ S^N_h}    ‖D − H‖_F
subject to    −V_N^T D V_N ≽ 0                                                           (1284)

Precisely,

rank( D°^⋆ − δ(D°^⋆ 1) ) + rank( V_N^T D^⋆ V_N ) = N − 1                                 (1285)

and rank( D°^⋆ − δ(D°^⋆ 1) ) ≤ N−1 because vector 1 is always in the nullspace of rank's argument. This is similar to the known result for projection on the selfdual positive semidefinite cone and its polar:

rank P_{−S^N_+} H + rank P_{S^N_+} H = N                                                 (1286)

^{6.13} This polar projection can be solved quickly (without semidefinite programming) via Lemma 6.8.1.1.1; rewriting,

minimize_{D° ∈ S^N}    ‖( D° − δ(D° 1) ) − ( H − δ(D° 1) )‖_F
subject to    D° − δ(D° 1) ≽ 0

which is the projection of affinely transformed optimal solution H − δ(D°^⋆ 1) on S^N_c ∩ S^N_+ ;

D°^⋆ − δ(D°^⋆ 1) = P_{S^N_+} P_{S^N_c}( H − δ(D°^⋆ 1) )

Foreknowledge of an optimal solution D°^⋆ as argument to projection suggests recursion.
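The PSD analogue (1286) of this complementarity is simple to observe numerically; a generic random symmetric H is assumed below so that no zero eigenvalues occur.

    N = 6;  H = randn(N);  H = (H + H')/2;
    [Q,L] = eig(H);  l = diag(L);
    Pp = Q*diag(max(l,0))*Q';                % projection of H on  S^N_+
    Pm = Q*diag(min(l,0))*Q';                % projection of H on -S^N_+
    rank(Pp) + rank(Pm)                      % = N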


When low affine dimension is a desirable result of projection on the EDM cone, projection on the polar EDM cone should be performed instead. Convex polar problem (1283) can be solved for D°^⋆ by transforming to an equivalent Schur-form semidefinite program (§3.5.2). Interior-point methods for numerically solving semidefinite programs tend to produce high-rank solutions. (§4.1.2) Then D^⋆ = H − D°^⋆ ∈ EDM^N by Corollary E.9.2.2.1, and D^⋆ will tend to have low affine dimension. This approach breaks when attempting projection on a cone subset discriminated by affine dimension or rank, because then we have no complementarity relation like (1285) or (1286). (§7.1.4.1)

6.8.1.6 EDM cone is not selfdual

In §5.6.1.1, via Gram-form EDM operator

D(G) = δ(G) 1^T + 1 δ(G)^T − 2 G ∈ EDM^N   ⇐   G ≽ 0                                     (903)

we established clear connection between the EDM cone and that face (1238) of positive semidefinite cone S^N_+ in the geometric center subspace:

EDM^N = D( S^N_c ∩ S^N_+ )                                                               (1007)
V(EDM^N) = S^N_c ∩ S^N_+                                                                 (1008)

where

V(D) = −(1/2) V D V                                                                      (996)

In §5.6.1 we established

S^N_c ∩ S^N_+ = V_N S^{N−1}_+ V_N^T                                                      (994)

Then from (1274), (1282), and (1248) we can deduce

δ(EDM^{N*} 1) − EDM^{N*} = V_N S^{N−1}_+ V_N^T = S^N_c ∩ S^N_+                           (1287)

which, by (1007) and (1008), means the EDM cone can be related to the dual EDM cone by an equality:

EDM^N = D( δ(EDM^{N*} 1) − EDM^{N*} )                                                    (1288)
V(EDM^N) = δ(EDM^{N*} 1) − EDM^{N*}                                                      (1289)

This means projection −V(EDM^N) of the EDM cone on the geometric center subspace S^N_c (§E.7.2.0.2) is a linear transformation of the dual EDM cone: EDM^{N*} − δ(EDM^{N*} 1). Secondarily, it means the EDM cone is not selfdual in S^N.


6.8.1.7 Schoenberg criterion is discretized membership relation

We show the Schoenberg criterion

{ −V_N^T D V_N ∈ S^{N−1}_+ ,  D ∈ S^N_h }   ⇔   D ∈ EDM^N                                (910)

to be a discretized membership relation (§2.13.4) between a closed convex cone K and its dual K^* like

⟨y , x⟩ ≥ 0 for all y ∈ G(K^*)   ⇔   x ∈ K                                               (366)

where G(K^*) is any set of generators whose conic hull constructs closed convex dual cone K^* :

The Schoenberg criterion is the same as

{ ⟨z z^T , −D⟩ ≥ 0  ∀ z z^T | 11^T z z^T = 0 ,  D ∈ S^N_h }   ⇔   D ∈ EDM^N             (1232)

which, by (1233), is the same as

{ ⟨z z^T , −D⟩ ≥ 0  ∀ z z^T ∈ { V_N υ υ^T V_N^T | υ ∈ R^{N−1} } ,  D ∈ S^N_h }   ⇔   D ∈ EDM^N   (1290)

where the z z^T constitute a set of generators G for the positive semidefinite cone's smallest face F( S^N_+ ∋ V ) (§6.6.1) that contains auxiliary matrix V.

From the aggregate in (1248) we get the ordinary membership relation, assuming only D ∈ S^N [199, p.58]

⟨D^* , D⟩ ≥ 0  ∀ D^* ∈ EDM^{N*}   ⇔   D ∈ EDM^N
⟨D^* , D⟩ ≥ 0  ∀ D^* ∈ { δ(u) | u ∈ R^N } − cone{ V_N υ υ^T V_N^T | υ ∈ R^{N−1} }   ⇔   D ∈ EDM^N   (1291)

Discretization (366) yields:

⟨D^* , D⟩ ≥ 0  ∀ D^* ∈ { e_i e_i^T , −e_j e_j^T , −V_N υ υ^T V_N^T | i , j = 1 ... N , υ ∈ R^{N−1} }   ⇔   D ∈ EDM^N   (1292)


Because ⟨ { δ(u) | u ∈ R^N } , D ⟩ ≥ 0 ⇔ D ∈ S^N_h , we can restrict observation to the symmetric hollow subspace without loss of generality. Then for D ∈ S^N_h

⟨D^* , D⟩ ≥ 0  ∀ D^* ∈ { −V_N υ υ^T V_N^T | υ ∈ R^{N−1} }   ⇔   D ∈ EDM^N               (1293)

this discretized membership relation becomes (1290); identical to the Schoenberg criterion.

Hitherto a correspondence between the EDM cone and a face of a PSD cone, the Schoenberg criterion is now accurately interpreted as a discretized membership relation between the EDM cone and its ordinary dual.

6.8.2 Ambient S^N_h

When instead we consider the ambient space of symmetric hollow matrices (1249), then still we find the EDM cone is not selfdual for N > 2. The simplest way to prove this is as follows:

Given a set of generators G = {Γ} (1209) for the pointed closed convex EDM cone, the discretized membership theorem in §2.13.4.2.1 asserts that members of the dual EDM cone in the ambient space of symmetric hollow matrices can be discerned via discretized membership relation:

EDM^{N*} ∩ S^N_h ≜ { D^* ∈ S^N_h | ⟨Γ , D^*⟩ ≥ 0  ∀ Γ ∈ G(EDM^N) }
  = { D^* ∈ S^N_h | ⟨ δ(z z^T) 1^T + 1 δ(z z^T)^T − 2 z z^T , D^* ⟩ ≥ 0  ∀ z ∈ N(1^T) }
  = { D^* ∈ S^N_h | ⟨ 1 δ(z z^T)^T − z z^T , D^* ⟩ ≥ 0  ∀ z ∈ N(1^T) }                   (1294)

By comparison

EDM^N = { D ∈ S^N_h | ⟨ −z z^T , D ⟩ ≥ 0  ∀ z ∈ N(1^T) }                                 (1295)

the term δ(z z^T)^T D^* 1 foils any hope of selfdualness in ambient S^N_h.

To find the dual EDM cone in ambient S^N_h per §2.13.9.4 we prune the aggregate in (1248) describing the ordinary dual EDM cone, removing any member having nonzero main diagonal:

EDM^{N*} ∩ S^N_h = cone{ δ^2(V_N υ υ^T V_N^T) − V_N υ υ^T V_N^T | υ ∈ R^{N−1} }
                 = { δ^2(V_N Ψ V_N^T) − V_N Ψ V_N^T | Ψ ∈ S^{N−1}_+ }                    (1296)
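Members of EDM^{N*} ∩ S^N_h generated per (1296) can be tested against the extreme directions (1209), as the discretized relation (1294) prescribes; the random Ψ and z in the Matlab illustration below are our own choices.

    N  = 4;
    Vn = [-ones(1,N-1); eye(N-1)]/sqrt(2);           % auxiliary matrix V_N
    P  = randn(N-1);  P = P*P';                      % Psi in S^{N-1}_+
    W  = Vn*P*Vn';
    Ds = diag(diag(W)) - W;                          % member of EDM^{N*} ∩ S^N_h per (1296)
    z  = null(ones(1,N))*randn(N-1,1);               % arbitrary z in N(1')
    d  = diag(z*z');
    G  = d*ones(1,N) + ones(N,1)*d' - 2*(z*z');      % extreme direction Gamma (1209)
    G(:)'*Ds(:)                                      % >= 0 for every such Gamma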


D^* ∈ EDM^{N*}   ⇔   δ(D^* 1) − D^* ≽ 0                                                  (1274)

Figure 151: Ordinary dual EDM cone projected on S^3_h shrouds EDM^3 ; drawn tiled in isometrically isomorphic R^3 , with relative boundaries dvec rel ∂EDM^3 and dvec rel ∂(EDM^{3*} ∩ S^3_h) labeled. (It so happens: intersection EDM^{N*} ∩ S^N_h (§2.13.9.3) is identical to projection of dual EDM cone on S^N_h.)


When N = 1 , the EDM cone and its dual in ambient S_h each comprise the origin in isomorphic R^0 ; thus, selfdual in this dimension. (confer (104))

When N = 2 , the EDM cone is the nonnegative real line in isomorphic R. (Figure 142) EDM^{2*} in S^2_h is identical, thus selfdual in this dimension. This result is in agreement with (1294), verified directly: for all κ ∈ R , z = κ [1 ; −1] and δ(z z^T) = κ^2 [1 ; 1] ⇒ d^*_12 ≥ 0.

The first case adverse to selfdualness N = 3 may be deduced from Figure 138; the EDM cone is a circular cone in isomorphic R^3 corresponding to no rotation of Lorentz cone (178) (the selfdual circular cone). Figure 151 illustrates the EDM cone and its dual in ambient S^3_h ; no longer selfdual.

6.9 Theorem of the alternative

In §2.13.2.1.1 we showed how alternative systems of generalized inequality can be derived from closed convex cones and their duals. This section is, therefore, a fitting postscript to the discussion of the dual EDM cone.

6.9.0.0.1 Theorem. EDM alternative. [163, §1]
Given D ∈ S^N_h

D ∈ EDM^N

or in the alternative

∃ z such that  { 1^T z = 1 ,  D z = 0 }                                                  (1297)

In words, either N(D) intersects hyperplane { z | 1^T z = 1 } or D is an EDM; the alternatives are incompatible.  ⋄

When D is an EDM [259, §2]

N(D) ⊂ N(1^T) = { z | 1^T z = 0 }                                                        (1298)

Because [163, §2] (§E.0.1)

D D^† 1 = 1 ,   1^T D^† D = 1^T                                                          (1299)

then

R(1) ⊂ R(D)                                                                              (1300)


6.10 Postscript

We provided an equality (1268) relating the convex cone of Euclidean distance matrices to the convex cone of positive semidefinite matrices. Projection on a positive semidefinite cone, constrained by an upper bound on rank, is easy and well known; [129] simply, a matter of truncating a list of eigenvalues. Projection on a positive semidefinite cone with such a rank constraint is, in fact, a convex optimization problem. (§7.1.4)

In the past, it was difficult to project on the EDM cone under a constraint on rank or affine dimension. A surrogate method was to invoke the Schoenberg criterion (910) and then project on a positive semidefinite cone under a rank constraint bounding affine dimension from above. But a solution acquired that way is necessarily suboptimal.

In §7.3.3 we present a method for projecting directly on the EDM cone under a constraint on rank or affine dimension.




Chapter 7

Proximity problems

In the "extremely large-scale case" (N of order of tens and hundreds of thousands), [iteration cost O(N^3)] rules out all advanced convex optimization techniques, including all known polynomial time algorithms.
   −Arkadi Nemirovski (2004)

A problem common to various sciences is to find the Euclidean distance matrix (EDM) D ∈ EDM^N closest in some sense to a given complete matrix of measurements H under a constraint on affine dimension 0 ≤ r ≤ N−1 (§2.3.1, §5.7.1.1); rather, r is bounded above by desired affine dimension ρ.


7.0.1 Measurement matrix H

Ideally, we want a given matrix of measurements H ∈ R^{N×N} to conform with the first three Euclidean metric properties (§5.2); to belong to the intersection of the orthant of nonnegative matrices R^{N×N}_+ with the symmetric hollow subspace S^N_h (§2.2.3.0.1). Geometrically, we want H to belong to the polyhedral cone (§2.12.1.0.1)

K ≜ S^N_h ∩ R^{N×N}_+                                                                    (1301)

Yet in practice, H can possess significant measurement uncertainty (noise).

Sometimes realization of an optimization problem demands that its input, the given matrix H , possess some particular characteristics; perhaps symmetry and hollowness or nonnegativity. When that H given does not have the desired properties, then we must impose them upon H prior to optimization:

When measurement matrix H is not symmetric or hollow, taking its symmetric hollow part is equivalent to orthogonal projection on the symmetric hollow subspace S^N_h.

When measurements of distance in H are negative, zeroing negative entries effects unique minimum-distance projection on the orthant of nonnegative matrices R^{N×N}_+ in isomorphic R^{N^2} (§E.9.2.2.3).

7.0.1.1 Order of imposition

Since convex cone K (1301) is the intersection of an orthant with a subspace, we want to project on that subset of the orthant belonging to the subspace; on the nonnegative orthant in the symmetric hollow subspace that is, in fact, the intersection. For that reason alone, unique minimum-distance projection of H on K (that member of K closest to H in isomorphic R^{N^2} in the Euclidean sense) can be attained by first taking its symmetric hollow part, and only then clipping negative entries of the result to 0 ; id est, there is only one correct order of projection, in general, on an orthant intersecting a subspace:


project on the subspace, then project the result on the orthant in that subspace. (confer §E.9.5) In contrast, order of projection on an intersection of subspaces is arbitrary.

That order-of-projection rule applies more generally, of course, to the intersection of any convex set C with any subspace. Consider the proximity problem^{7.1} over convex feasible set S^N_h ∩ C given nonsymmetric nonhollow H ∈ R^{N×N} :

minimize_{B ∈ S^N_h}    ‖B − H‖_F^2
subject to    B ∈ C                                                                      (1302)

a convex optimization problem. Because the symmetric hollow subspace S^N_h is orthogonal to the antisymmetric antihollow subspace R^{N×N⊥}_h (§2.2.3), then for B ∈ S^N_h

tr( B^T ( (1/2)(H − H^T) + δ^2(H) ) ) = 0                                                (1303)

so the objective function is equivalent to

‖B − H‖_F^2 ≡ ‖ B − ( (1/2)(H + H^T) − δ^2(H) ) ‖_F^2 + ‖ (1/2)(H − H^T) + δ^2(H) ‖_F^2   (1304)

This means the antisymmetric antihollow part of given matrix H would be ignored by minimization with respect to symmetric hollow variable B under Frobenius' norm; id est, minimization proceeds as though given the symmetric hollow part of H.

This action of Frobenius' norm (1304) is effectively a Euclidean projection (minimum-distance projection) of H on the symmetric hollow subspace S^N_h prior to minimization. Thus minimization proceeds inherently following the correct order for projection on S^N_h ∩ C. Therefore we may either assume H ∈ S^N_h , or take its symmetric hollow part prior to optimization.

^{7.1} There are two equivalent interpretations of projection (§E.9): one finds a set normal, the other, minimum distance between a point and a set. Here we realize the latter view.
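The correct order of projection on K is then a two-step operation in Matlab; the random H below is an illustration only.

    N  = 5;  H = randn(N);                    % noisy, nonsymmetric, nonhollow
    Hs = (H + H')/2;  Hs(1:N+1:end) = 0;      % 1) symmetric hollow part: project on S^N_h
    HK = max(Hs,0);                           % 2) then clip negatives: P_K(H)

Reversing the two steps generally yields a different, and incorrect, result.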


Figure 152: Pseudo-Venn diagram: The EDM cone belongs to the intersection of the symmetric hollow subspace with the nonnegative orthant; EDM^N ⊆ K (890). EDM^N cannot exist outside S^N_h , but R^{N×N}_+ does. (Regions drawn: R^{N×N} , S^N , S^N_h , 0 , EDM^N , K = S^N_h ∩ R^{N×N}_+.)

7.0.1.2 Egregious input error under nonnegativity demand

More pertinent to the optimization problems presented herein where

C ≜ EDM^N ⊆ K = S^N_h ∩ R^{N×N}_+                                                        (1305)

then should some particular realization of a proximity problem demand input H be nonnegative, and were we only to zero negative entries of a nonsymmetric nonhollow input H prior to optimization, then the ensuing projection on EDM^N would be guaranteed incorrect (out of order).

Now comes a surprising fact: Even were we to correctly follow the order-of-projection rule and provide H ∈ K prior to optimization, then the ensuing projection on EDM^N will be incorrect whenever input H has negative entries and some proximity problem demands nonnegative input H.


Figure 153: Pseudo-Venn diagram from Figure 152 showing elbow placed in path of projection of H on EDM^N ⊂ S^N_h by an optimization problem demanding nonnegative input matrix H. The first two line segments leading away from H result from correct order-of-projection required to provide nonnegative H prior to optimization. Were H nonnegative, then its projection on S^N_h would instead belong to K ; making the elbow disappear. (confer Figure 166)


This is best understood referring to Figure 152: Suppose nonnegative input H is demanded, and then the problem realization correctly projects its input first on S^N_h and then directly on C = EDM^N. That demand for nonnegativity effectively requires imposition of K on input H prior to optimization so as to obtain correct order of projection (on S^N_h first). Yet such an imposition prior to projection on EDM^N generally introduces an elbow into the path of projection (illustrated in Figure 153) caused by the technique itself; that being, a particular proximity problem realization requiring nonnegative input.

Any procedure for imposition of nonnegativity on input H can only be incorrect in this circumstance. There is no resolution unless input H is guaranteed nonnegative with no tinkering. Otherwise, we have no choice but to employ a different problem realization; one not demanding nonnegative input.

7.0.2 Lower bound

Most of the problems we encounter in this chapter have the general form:

minimize_B    ‖B − A‖_F
subject to    B ∈ C                                                                      (1306)

where A ∈ R^{m×n} is given data. This particular objective denotes Euclidean projection (§E) of vectorized matrix A on the set C which may or may not be convex. When C is convex, then the projection is unique minimum-distance because Frobenius' norm when squared is a strictly convex function of variable B and because the optimal solution is the same regardless of the square (513). When C is a subspace, then the direction of projection is orthogonal to C.

Denoting by A ≜ U_A Σ_A Q_A^T and B ≜ U_B Σ_B Q_B^T their full singular value decompositions (whose singular values are always nonincreasingly ordered (§A.6)), there exists a tight lower bound on the objective over the manifold of orthogonal matrices;

‖Σ_B − Σ_A‖_F ≤ inf_{U_A , U_B , Q_A , Q_B} ‖B − A‖_F                                    (1307)

This least lower bound holds more generally for any orthogonally invariant norm on R^{m×n} (§2.2.1) including the Frobenius and spectral norm [328, §II.3]. [202, §7.4.51]
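Bound (1307) is easy to observe numerically: for any two matrices of the same size (random below, dimensions arbitrary), the distance between their ordered singular value spectra never exceeds the Frobenius distance between the matrices themselves.

    A = randn(6,4);  B = randn(6,4);
    lhs = norm(svd(B) - svd(A));              % || Sigma_B - Sigma_A ||_F
    rhs = norm(B - A,'fro');
    [lhs rhs]                                 % lhs <= rhs always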


7.0.3 Problem approach

Problems traditionally posed in terms of point position { x_i ∈ R^n , i = 1 ... N } such as

minimize_{x_i}    ∑_{i , j ∈ I} ( ‖x_i − x_j‖ − h_ij )^2                                 (1308)

or

minimize_{x_i}    ∑_{i , j ∈ I} ( ‖x_i − x_j‖^2 − h_ij )^2                               (1309)

(where I is an abstract set of indices and h_ij is given data) are everywhere converted herein to the distance-square variable D or to Gram matrix G ; the Gram matrix acting as bridge between position and distance. (That conversion is performed regardless of whether known data is complete.) Then the techniques of chapter 5 or chapter 6 are applied to find relative or absolute position. This approach is taken because we prefer introduction of rank constraints into convex problems rather than searching a googol of local minima in nonconvex problems like (1308) [105] (§3.6.4.0.3, §7.2.2.7.1) or (1309).

7.0.4 Three prevalent proximity problems

There are three statements of the closest-EDM problem prevalent in the literature, the multiplicity due primarily to choice of projection on the EDM versus positive semidefinite (PSD) cone and vacillation between the distance-square variable d_ij versus absolute distance √d_ij. In their most fundamental form, the three prevalent proximity problems are (1310.1), (1310.2), and (1310.3): [342] for D ≜ [d_ij] and ◦√D ≜ [√d_ij]

(1)  minimize_D      ‖−V (D − H) V‖_F^2
     subject to      rank V D V ≤ ρ
                     D ∈ EDM^N

(2)  minimize_D      ‖D − H‖_F^2
     subject to      rank V D V ≤ ρ
                     D ∈ EDM^N

(3)  minimize_{◦√D}  ‖◦√D − H‖_F^2
     subject to      rank V D V ≤ ρ
                     ◦√D ∈ √EDM^N

(4)  minimize_{◦√D}  ‖−V (◦√D − H) V‖_F^2
     subject to      rank V D V ≤ ρ
                     ◦√D ∈ √EDM^N                                                        (1310)


where we have made explicit an imposed upper bound ρ on affine dimension

r = rank V_N^T D V_N = rank V D V                                                        (1042)

that is benign when ρ = N−1 or H were realizable with r ≤ ρ. Problems (1310.2) and (1310.3) are Euclidean projections of a vectorized matrix H on an EDM cone (§6.3), whereas problems (1310.1) and (1310.4) are Euclidean projections of a vectorized matrix −V H V on a PSD cone.^{7.2} Problem (1310.4) is not posed in the literature because it has limited theoretical foundation.^{7.3}

Analytical solution to (1310.1) is known in closed form for any bound ρ although, as the problem is stated, it is a convex optimization only in the case ρ = N−1. We show in §7.1.4 how (1310.1) becomes a convex optimization problem for any ρ when transformed to the spectral domain. When expressed as a function of point list in a matrix X as in (1308), problem (1310.2) becomes a variant of what is known in statistics literature as a stress problem. [53, p.34] [103] [354] Problems (1310.2) and (1310.3) are convex optimization problems in D for the case ρ = N−1 wherein (1310.3) becomes equivalent to (1309). Even with the rank constraint removed from (1310.2), we will see that the convex problem remaining inherently minimizes affine dimension.

Generally speaking, each problem in (1310) produces a different result because there is no isometry relating them. Of the various auxiliary V-matrices (§B.4), the geometric centering matrix V (913) appears in the literature most often although V_N (897) is the auxiliary matrix naturally consequent to Schoenberg's seminal exposition [312]. Substitution of any auxiliary matrix or its pseudoinverse into these problems produces another valid problem.

Substitution of V_N^T for left-hand V in (1310.1), in particular, produces a different result because

minimize_D    ‖−V (D − H) V‖_F^2
subject to    D ∈ EDM^N                                                                  (1311)

^{7.2} Because −V H V is orthogonal projection of −H on the geometric center subspace S^N_c (§E.7.2.0.2), problems (1310.1) and (1310.4) may be interpreted as oblique (nonminimum distance) projections of −H on a positive semidefinite cone.

^{7.3} D ∈ EDM^N ⇒ ◦√D ∈ EDM^N , −V ◦√D V ∈ S^N_+ (§5.10)


finds D to attain Euclidean distance of vectorized −V H V to the positive semidefinite cone in ambient isometrically isomorphic R^{N(N+1)/2} , whereas

minimize_D    ‖−V_N^T (D − H) V_N‖_F^2
subject to    D ∈ EDM^N                                                                  (1312)

attains Euclidean distance of vectorized −V_N^T H V_N to the positive semidefinite cone in isometrically isomorphic subspace R^{N(N−1)/2} ; quite different projections^{7.4} regardless of whether affine dimension is constrained. But substitution of auxiliary matrix V_W^T (§B.4.3) or V_N^† yields the same result as (1310.1) because V = V_W V_W^T = V_N V_N^† ; id est,

‖−V (D − H) V‖_F^2 = ‖−V_W V_W^T (D − H) V_W V_W^T‖_F^2 = ‖−V_W^T (D − H) V_W‖_F^2
                   = ‖−V_N V_N^† (D − H) V_N V_N^†‖_F^2 = ‖−V_N^† (D − H) V_N‖_F^2       (1313)

We see no compelling reason to prefer one particular auxiliary V-matrix over another. Each has its own coherent interpretations; e.g., §5.4.2, §6.6, §B.4.5. Neither can we say any particular problem formulation produces generally better results than another.^{7.5}

7.1 First prevalent problem: Projection on PSD cone

This first problem

minimize_D    ‖−V_N^T (D − H) V_N‖_F^2
subject to    rank V_N^T D V_N ≤ ρ
              D ∈ EDM^N                                        Problem 1                 (1314)

^{7.4} The isomorphism T(Y) = V_N^{†T} Y V_N^† onto S^N_c = { V X V | X ∈ S^N } relates the map in (1312) to that in (1311), but is not an isometry.
^{7.5} All four problem formulations (1310) produce identical results when affine dimension r , implicit to a realizable measurement matrix H , does not exceed desired affine dimension ρ ; because, the optimal objective value will vanish (‖·‖ = 0).


556 CHAPTER 7. PROXIMITY PROBLEMS

poses a Euclidean projection of −V_N^T H V_N in subspace S^{N−1} on a generally nonconvex subset (when ρ < N−1) of the positive semidefinite cone boundary. Its known analytical solution is, in terms of the eigenvalue decomposition (1317) below, the rank-ρ eigenvalue truncation

 −V_N^T D⋆ V_N = Σ_{i=1}^{ρ} max{0, λ_i} v_i v_i^T      (1316)

where

 −V_N^T H V_N = Σ_{i=1}^{N−1} λ_i v_i v_i^T ∈ S^{N−1}      (1317)

with eigenvalues λ_i arranged in nonincreasing order.


7.1. FIRST PREVALENT PROBLEM: 557

7.1.1 Closest-EDM Problem 1, convex case

7.1.1.0.1 Proof. Solution (1316), convex case.
When desired affine dimension is unconstrained, ρ = N−1, the rank function disappears from (1314) leaving a convex optimization problem; a simple unique minimum-distance projection on the positive semidefinite cone S_+^{N−1}: videlicet, by (910),

 minimize_{D ∈ S_h^N}   ‖−V_N^T (D − H) V_N‖_F²
 subject to             −V_N^T D V_N ≽ 0      (1319)

Because

 S_+^{N−1} = −V_N^T S_h^N V_N      (1011)

then the necessary and sufficient conditions for projection in isometrically isomorphic R^{N(N−1)/2} on the selfdual (377) positive semidefinite cone S_+^{N−1} are:^7.8 (§E.9.2.0.1) (1590) (confer (2031))

 −V_N^T D⋆ V_N ≽ 0
 −V_N^T D⋆ V_N (−V_N^T D⋆ V_N + V_N^T H V_N) = 0      (1320)
 −V_N^T D⋆ V_N + V_N^T H V_N ≽ 0

Symmetric −V_N^T H V_N is diagonalizable hence decomposable in terms of its eigenvectors v and eigenvalues λ as in (1317). Therefore (confer (1316))

 −V_N^T D⋆ V_N = Σ_{i=1}^{N−1} max{0, λ_i} v_i v_i^T      (1321)

satisfies (1320), optimally solving (1319). To see that, recall: these eigenvectors constitute an orthogonal set and

 −V_N^T D⋆ V_N + V_N^T H V_N = −Σ_{i=1}^{N−1} min{0, λ_i} v_i v_i^T      (1322)

7.8 The Karush-Kuhn-Tucker (KKT) optimality conditions [280, p.328] [61, §5.5.3] for problem (1319) are identical to these conditions for projection on a convex cone.
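The projection conditions (1320) and the eigenvalue-clipping solution (1321) are easy to check numerically. What follows is a minimal illustrative sketch in Python/NumPy (not drawn from the book's own software); the symmetric matrix A merely stands in for −V_N^T H V_N and is randomly generated.

import numpy as np

def psd_projection(A):
    # unique minimum-distance (Frobenius) projection of symmetric A on the
    # positive semidefinite cone: clip negative eigenvalues at zero, cf. (1321)
    lam, Q = np.linalg.eigh(A)
    return Q @ np.diag(np.maximum(lam, 0)) @ Q.T

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
B = psd_projection(A)

# the three conditions (1320): B >= 0, B(B - A) = 0, B - A >= 0
print(np.linalg.eigvalsh(B).min() >= -1e-9)
print(np.allclose(B @ (B - A), 0, atol=1e-9))
print(np.linalg.eigvalsh(B - A).min() >= -1e-9)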


558 CHAPTER 7. PROXIMITY PROBLEMS

7.1.2 Generic problem

Prior to determination of D⋆, analytical solution (1316) to Problem 1 is equivalent to solution of a generic rank-constrained projection problem: Given desired affine dimension ρ and

 A ≜ −V_N^T H V_N = Σ_{i=1}^{N−1} λ_i v_i v_i^T ∈ S^{N−1}      (1317)

Euclidean projection on a rank ρ subset of a PSD cone (on a generally nonconvex subset of the PSD cone boundary ∂S_+^{N−1} when ρ < N−1) is posed by the generic problem

 minimize_{B ∈ S^{N−1}}   ‖B − A‖_F²
 subject to               rank B ≤ ρ      (1323)
                          B ≽ 0

whose feasible set comprises all positive semidefinite matrices having rank no greater than desired affine


7.1. FIRST PREVALENT PROBLEM: 559dimension ρ ; 7.9 called rank ρ subset: (260)S N−1+ \ S N−1+ (ρ + 1) = {X ∈ S N−1+ | rankX ≤ ρ} (216)7.1.3 Choice of spectral coneSpectral projection substitutes projection on a polyhedral cone, containinga complete set of eigenspectra, in place of projection on a convex set ofdiagonalizable matrices; e.g., (1336). In this section we develop a method ofspectral projection for constraining rank of positive semidefinite matrices ina proximity problem like (1323). We will see why an orthant turns out to bethe best choice of spectral cone, and why presorting is critical.Define a nonlinear permutation-operator π(x) : R n → R n that sorts itsvector argument x into nonincreasing order.7.1.3.0.1 Definition. Spectral projection.Let R be an orthogonal matrix and Λ a nonincreasingly ordered diagonalmatrix of eigenvalues. Spectral projection means unique minimum-distanceprojection of a rotated (R ,B.5.4) nonincreasingly ordered (π) vector (δ)of eigenvaluesπ ( δ(R T ΛR) ) (1325)on a polyhedral cone containing all eigenspectra corresponding to a rank ρsubset of a positive semidefinite cone (2.9.2.1) or the EDM cone (inCayley-Menger form,5.11.2.3).△In the simplest and most common case, projection on a positivesemidefinite cone, orthogonal matrix R equals I (7.1.4.0.1) and diagonalmatrix Λ is ordered during diagonalization (A.5.1). Then spectralprojection simply means projection of δ(Λ) on a subset of the nonnegativeorthant, as we shall now ascertain:It is curious how nonconvex Problem 1 has such a simple analyticalsolution (1316). Although solution to generic problem (1323) is well knownsince 1936 [129], its equivalence was observed in 1997 [353,2] to projection7.9 Recall: affine dimension is a lower bound on embedding (2.3.1), equal to dimensionof the smallest affine set in which points from a list X corresponding to an EDM D canbe embedded.


560 CHAPTER 7. PROXIMITY PROBLEMS

of an ordered vector of eigenvalues (in diagonal matrix Λ) on a subset of the monotone nonnegative cone (§2.13.9.4.2)

 K_M+ = {υ | υ_1 ≥ υ_2 ≥ · · · ≥ υ_{N−1} ≥ 0} ⊆ R_+^{N−1}      (429)

Of interest, momentarily, is only the smallest convex subset of the monotone nonnegative cone K_M+ containing every nonincreasingly ordered eigenspectrum corresponding to a rank ρ subset of the positive semidefinite cone S_+^{N−1}; id est,

 K_M+^ρ ≜ {υ ∈ R^ρ | υ_1 ≥ υ_2 ≥ · · · ≥ υ_ρ ≥ 0} ⊆ R_+^ρ      (1326)

a pointed polyhedral cone, a ρ-dimensional convex subset of the monotone nonnegative cone K_M+ ⊆ R_+^{N−1} having property, for λ denoting eigenspectra,

 [K_M+^ρ ; 0] = π(λ(rank ρ subset)) ⊆ K_M+^{N−1} ≜ K_M+      (1327)

For each and every elemental eigenspectrum

 γ ∈ λ(rank ρ subset) ⊆ R_+^{N−1}      (1328)

of the rank ρ subset (ordered or unordered in λ), there is a nonlinear surjection π(γ) onto K_M+^ρ.

7.1.3.0.2 Exercise. Smallest spectral cone.
Prove that there is no convex subset of K_M+ smaller than K_M+^ρ containing every ordered eigenspectrum corresponding to the rank ρ subset of a positive semidefinite cone (§2.9.2.1).

7.1.3.0.3 Proposition. (Hardy-Littlewood-Pólya) Inequalities. [179, §X] [55, §1.2]
Any vectors σ and γ in R^{N−1} satisfy a tight inequality

 π(σ)^T π(γ) ≥ σ^T γ ≥ π(σ)^T Ξ π(γ)      (1329)

where Ξ is the order-reversing permutation matrix defined in (1728), and permutator π(γ) is a nonlinear function that sorts vector γ into nonincreasing order thereby providing the greatest upper bound and least lower bound with respect to every possible sorting.  ⋄


7.1. FIRST PREVALENT PROBLEM: 561

7.1.3.0.4 Corollary. Monotone nonnegative sort.
Any given vectors σ, γ ∈ R^{N−1} satisfy a tight Euclidean distance inequality

 ‖π(σ) − π(γ)‖ ≤ ‖σ − γ‖      (1330)

where nonlinear function π(γ) sorts vector γ into nonincreasing order thereby providing the least lower bound with respect to every possible sorting.  ⋄

Given γ ∈ R^{N−1},

 inf_{σ ∈ R_+^{N−1}} ‖σ−γ‖ = inf_{σ ∈ R_+^{N−1}} ‖π(σ)−π(γ)‖ = inf_{σ ∈ R_+^{N−1}} ‖σ−π(γ)‖ = inf_{σ ∈ K_M+} ‖σ−π(γ)‖      (1331)

Yet for γ representing an arbitrary vector of eigenvalues, because

 inf_{σ ∈ [R_+^ρ ; 0]} ‖σ − γ‖² ≥ inf_{σ ∈ [R_+^ρ ; 0]} ‖σ − π(γ)‖² = inf_{σ ∈ [K_M+^ρ ; 0]} ‖σ − π(γ)‖²      (1332)

then projection of γ on the eigenspectra of a rank ρ subset can be tightened simply by presorting γ into nonincreasing order.

Proof. Simply because π(γ)_{1:ρ} ≽ π(γ_{1:ρ}),

 inf_{σ ∈ [R_+^ρ ; 0]} ‖σ − γ‖² = γ_{ρ+1:N−1}^T γ_{ρ+1:N−1} + inf_{σ ∈ R_+^{N−1}} ‖σ_{1:ρ} − γ_{1:ρ}‖²
                                = γ^T γ + inf_{σ ∈ R_+^{N−1}} σ_{1:ρ}^T σ_{1:ρ} − 2 σ_{1:ρ}^T γ_{1:ρ}
 inf_{σ ∈ [R_+^ρ ; 0]} ‖σ − γ‖² ≥ γ^T γ + inf_{σ ∈ R_+^{N−1}} σ_{1:ρ}^T σ_{1:ρ} − 2 σ_{1:ρ}^T π(γ)_{1:ρ}      (1333)
                                = inf_{σ ∈ [R_+^ρ ; 0]} ‖σ − π(γ)‖²
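Proposition 7.1.3.0.3 and Corollary 7.1.3.0.4 are easily corroborated by random trial; a brief illustrative Python/NumPy sketch (dimensions and number of trials are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
for _ in range(1000):
    s, g = rng.standard_normal(7), rng.standard_normal(7)
    ps, pg = np.sort(s)[::-1], np.sort(g)[::-1]      # pi(sigma), pi(gamma): nonincreasing sorts
    assert ps @ pg >= s @ g - 1e-12                                   # upper bound in (1329)
    assert np.linalg.norm(ps - pg) <= np.linalg.norm(s - g) + 1e-12   # (1330)
print('inequalities (1329) and (1330) held on every trial')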


562 CHAPTER 7. PROXIMITY PROBLEMS7.1.3.1 Orthant is best spectral cone for Problem 1This means unique minimum-distance projection of γ on the nearestspectral member of the rank ρ subset is tantamount to presorting γ intononincreasing order. Only then does unique spectral projection on a subsetK ρ M+of the monotone nonnegative cone become equivalent to unique spectralprojection on a subset R ρ + of the nonnegative orthant (which is simpler);in other words, unique minimum-distance projection of sorted γ on thenonnegative orthant in a ρ-dimensional subspace of R N is indistinguishablefrom its projection on the subset K ρ M+of the monotone nonnegative cone inthat same subspace.7.1.4 Closest-EDM Problem 1, “nonconvex” caseProof of solution (1316), for projection on a rank ρ subset of the positivesemidefinite cone S N−1+ , can be algebraic in nature. [353,2] Here we derivethat known result but instead using a more geometric argument via spectralprojection on a polyhedral cone (subsuming the proof in7.1.1). In sodoing, we demonstrate how nonconvex Problem 1 is transformed to a convexoptimization:7.1.4.0.1 Proof. Solution (1316), nonconvex case.As explained in7.1.2, we may instead work with the more facile genericproblem (1323). With diagonalization of unknownB UΥU T ∈ S N−1 (1334)given desired affine dimension 0 ≤ ρ ≤ N −1 and diagonalizableA QΛQ T ∈ S N−1 (1335)having eigenvalues in Λ arranged in nonincreasing order, by (48) the genericproblem is equivalent tominimize ‖B − A‖ 2B∈S N−1Fsubject to rankB ≤ ρB ≽ 0≡minimize ‖Υ − R T ΛR‖ 2 FR , Υsubject to rank Υ ≤ ρ (1336)Υ ≽ 0R −1 = R T


7.1. FIRST PREVALENT PROBLEM: 563whereR Q T U ∈ R N−1×N−1 (1337)in U on the set of orthogonal matrices is a bijection. We propose solving(1336) by instead solving the problem sequence:minimize ‖Υ − R T ΛR‖ 2 FΥsubject to rank Υ ≤ ρΥ ≽ 0minimize ‖Υ ⋆ − R T ΛR‖ 2 FRsubject to R −1 = R T(a)(b)(1338)Problem (1338a) is equivalent to: (1) orthogonal projection of R T ΛRon an N − 1-dimensional subspace of isometrically isomorphic R N(N−1)/2containing δ(Υ)∈ R N−1+ , (2) nonincreasingly ordering the result,[(3) uniqueRρ]+minimum-distance projection of the ordered result on . (E.9.5)0Projection on that N−1-dimensional subspace amounts to zeroing R T ΛR atall entries off the main diagonal; thus, the equivalent sequence leading witha spectral projection:minimize ‖δ(Υ) − π ( δ(R T ΛR) ) ‖ 2Υ [ Rρ]+subject to δ(Υ) ∈0minimize ‖Υ ⋆ − R T ΛR‖ 2 FRsubject to R −1 = R T(a)(b)(1339)Because any permutation matrix is an orthogonal matrix, δ(R T ΛR)∈ R N−1can always be arranged in nonincreasing order without loss of generality;hence, permutation operator π . Unique minimum-distance projectionof vector π ( δ(R T ΛR) ) [ Rρ]+on the ρ-dimensional subset of nonnegative0


564 CHAPTER 7. PROXIMITY PROBLEMS

orthant R_+^{N−1} requires: (§E.9.2.0.1)

 δ(Υ⋆)_{ρ+1:N−1} = 0
 δ(Υ⋆) ≽ 0
 δ(Υ⋆)^T (δ(Υ⋆) − π(δ(R^T Λ R))) = 0      (1340)
 δ(Υ⋆) − π(δ(R^T Λ R)) ≽ 0

which are necessary and sufficient conditions. Any value Υ⋆ satisfying conditions (1340) is optimal for (1339a). So

 δ(Υ⋆)_i = { max{0, π(δ(R^T Λ R))_i} ,  i = 1 ... ρ
           { 0 ,                        i = ρ+1 ... N−1      (1341)

specifies an optimal solution. The lower bound on the objective with respect to R in (1339b) is tight: by (1307)

 ‖ |Υ⋆| − |Λ| ‖_F ≤ ‖Υ⋆ − R^T Λ R‖_F      (1342)

where |·| denotes absolute entry-value. For selection of Υ⋆ as in (1341), this lower bound is attained when (confer §C.4.2.2)

 R⋆ = I      (1343)

which is the known solution.

7.1.4.1 Significance

Importance of this well-known [129] optimal solution (1316) for projection on a rank ρ subset of a positive semidefinite cone should not be dismissed: Problem 1, as stated, is generally nonconvex. This analytical solution at once encompasses projection on a rank ρ subset (216) of the positive semidefinite cone (generally, a nonconvex subset of its boundary) from either the exterior or interior of that cone.^7.10 By problem transformation to the spectral domain, projection on a rank ρ subset becomes a convex optimization problem.

7.10 Projection on the boundary from the interior of a convex Euclidean body is generally a nonconvex problem.
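Solution (1316), equivalently (1341) with R⋆ = I (1343), amounts to a simple recipe: diagonalize, sort the eigenvalues nonincreasingly, clip them at zero, and retain only the first ρ. A minimal illustrative Python/NumPy sketch of that recipe (the test matrix is arbitrary symmetric data):

import numpy as np

def rank_rho_psd_projection(A, rho):
    # minimum-distance projection of symmetric A on the rank-rho subset of the
    # PSD cone (1316): keep at most the rho largest eigenvalues, clipped at zero
    lam, Q = np.linalg.eigh(A)
    idx = np.argsort(lam)[::-1]            # nonincreasing order
    lam, Q = lam[idx], Q[:, idx]
    kept = np.maximum(lam[:rho], 0)        # (1341), with R* = I (1343)
    return (Q[:, :rho] * kept) @ Q[:, :rho].T

A = np.array([[3., 1., 0.],
              [1., -2., 1.],
              [0., 1., 1.]])
B = rank_rho_psd_projection(A, rho=1)
print(np.linalg.matrix_rank(B), np.linalg.eigvalsh(B))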


7.1. FIRST PREVALENT PROBLEM: 565This solution is closed form.This solution is equivalent to projection on a polyhedral cone inthe spectral domain (spectral projection, projection on a spectralcone7.1.3.0.1); a necessary and sufficient condition (A.3.1) formembership of a symmetric matrix to a rank ρ subset of a positivesemidefinite cone (2.9.2.1).Because U ⋆ = Q , a minimum-distance projection on a rank ρ subsetof the positive semidefinite cone is a positive semidefinite matrixorthogonal (in the Euclidean sense) to direction of projection. 7.11For the convex case problem (1319), this solution is always unique.Otherwise, distinct eigenvalues (multiplicity 1) in Λ guaranteesuniqueness of this solution by the reasoning inA.5.0.1 . 7.127.1.5 Problem 1 in spectral norm, convex caseWhen instead we pose the matrix 2-norm (spectral norm) in Problem 1(1314) for the convex case ρ = N −1, then the new problemminimize ‖−VN T(D − H)V N ‖ 2D(1344)subject to D ∈ EDM Nis convex although its solution is not necessarily unique; 7.13 giving rise tononorthogonal projection (E.1) on the positive semidefinite cone S N−1+ .Indeed, its solution set includes the Frobenius solution (1316) for the convexcase whenever −VN THV N is a normal matrix. [184,1] [177] [61,8.1.1]Proximity problem (1344) is equivalent tominimize µµ , Dsubject to −µI ≼ −VN T(D − H)V N ≼ µI (1345)D ∈ EDM N7.11 But Theorem E.9.2.0.1 for unique projection on a closed convex cone does not applyhere because the direction of projection is not necessarily a member of the dual PSD cone.This occurs, for example, whenever positive eigenvalues are truncated.7.12 Uncertainty of uniqueness prevents the erroneous conclusion that a rank ρ subset(216) were a convex body by the Bunt-Motzkin[theorem](E.9.0.0.1).2 07.13 For each and every |t|≤2, for example, has the same spectral-norm value.0 t


566 CHAPTER 7. PROXIMITY PROBLEMSby (1704) where{ ∣µ ⋆ = maxλ ( −VN T (D ⋆ − H)V N∣ , i = 1... N −1i)i} ∈ R + (1346)the minimized largest absolute eigenvalue (due to matrix symmetry).For lack of a unique solution here, we prefer the Frobenius rather thanspectral norm.7.2 Second prevalent problem:Projection on EDM cone in √ d ijLet◦√D [√dij ] ∈ K = S N h ∩ R N×N+ (1347)be an unknown matrix of absolute distance; id est,D = [d ij ] ◦√ D ◦ ◦√ D ∈ EDM N (1348)where ◦ denotes Hadamard product. The second prevalent proximityproblem is a Euclidean projection (in the natural coordinates √ d ij ) of matrixH on a nonconvex subset of the boundary of the nonconvex cone of Euclideanabsolute-distance matrices rel∂ √ EDM N : (6.3, confer Figure 138b)minimize ‖ ◦√ ⎫D − H‖◦√ 2 FD⎪⎬subject to rankVN TDV N ≤ ρ Problem 2 (1349)◦√ √D ∈ EDMN⎪⎭where√EDM N = { ◦√ D | D ∈ EDM N } (1182)This statement of the second proximity problem is considered difficult tosolve because of the constraint on desired affine dimension ρ (5.7.2) andbecause the objective function‖ ◦√ D − H‖ 2 F = ∑ i,j( √ d ij − h ij ) 2 (1350)is expressed in the natural coordinates; projection on a doubly nonconvexset.


7.2. SECOND PREVALENT PROBLEM: 567Our solution to this second problem prevalent in the literature requiresmeasurement matrix H to be nonnegative;H = [h ij ] ∈ R N×N+ (1351)If the H matrix given has negative entries, then the technique of solutionpresented here becomes invalid. As explained in7.0.1, projection of Hon K = S N h ∩ R N×N+ (1301) prior to application of this proposed solution isincorrect.7.2.1 <strong>Convex</strong> caseWhen ρ = N − 1, the rank constraint vanishes and a convex problem thatis equivalent to (1308) emerges: 7.14minimize◦√Dsubject to‖ ◦√ D − H‖ 2 F◦√D ∈√EDMN⇔minimizeD∑ √d ij − 2h ij dij + h 2 iji,jsubject to D ∈ EDM N (1352)For any fixed i and j , the argument of summation is a convex functionof d ij because (for nonnegative constant h ij ) the negative square root isconvex in nonnegative d ij and because d ij + h 2 ij is affine (convex). Becausethe sum of any number of convex functions in D remains convex [61,3.2.1]and because the feasible set is convex in D , we have a convex optimizationproblem:minimize 1 T (D − 2H ◦ ◦√ D )1 + ‖H‖ 2 FD(1353)subject to D ∈ EDM NThe objective function being a sum of strictly convex functions is,moreover, strictly convex in D on the nonnegative orthant. Existenceof a unique solution D ⋆ for this second prevalent problem depends uponnonnegativity of H and a convex feasible set (3.1.2). 7.157.14 still thought to be a nonconvex problem as late as 1997 [354] even though discoveredconvex by de Leeuw in 1993. [103] [53,13.6] Yet using methods from3, it can be easilyascertained: ‖ ◦√ D − H‖ F is not convex in D .7.15 The transformed problem in variable D no longer describes √ Euclidean projection onan EDM cone. Otherwise we might erroneously conclude EDM N were a convex bodyby the Bunt-Motzkin theorem (E.9.0.0.1).


568 CHAPTER 7. PROXIMITY PROBLEMS

7.2.1.1 Equivalent semidefinite program, Problem 2, convex case

Convex problem (1352) is numerically solvable for its global minimum using an interior-point method [391] [289] [277] [382] [11] [145]. We translate (1352) to an equivalent semidefinite program (SDP) for a pedagogical reason made clear in §7.2.2.2 and because there exist readily available computer programs for numerical solution [167] [384] [385] [359] [34] [383] [348] [336].

Substituting a new matrix variable Y ≜ [y_ij] ∈ R_+^{N×N}

 h_ij √d_ij ← y_ij      (1354)

Boyd proposes: problem (1352) is equivalent to the semidefinite program

 minimize_{D, Y}   Σ_{i,j} d_ij − 2 y_ij + h_ij²
 subject to        [ d_ij   y_ij
                     y_ij   h_ij² ] ≽ 0 ,   i, j = 1 ... N      (1355)
                   D ∈ EDM^N

To see that, recall d_ij ≥ 0 is implicit to D ∈ EDM^N (§5.8.1, (910)). So when H ∈ R_+^{N×N} is nonnegative as assumed,

 [ d_ij   y_ij
   y_ij   h_ij² ] ≽ 0   ⇔   h_ij √d_ij ≥ |y_ij|      (1356)

Minimization of the objective function implies maximization of y_ij that is bounded above. Hence nonnegativity of y_ij is implicit to (1355) and, as desired, y_ij → h_ij √d_ij as optimization proceeds.

If the given matrix H is now assumed symmetric and nonnegative,

 H = [h_ij] ∈ S^N ∩ R_+^{N×N}      (1357)

then Y = H ◦ ◦√D must belong to K = S_h^N ∩ R_+^{N×N} (1301). Because Y ∈ S_h^N (§B.4.2 no.20), then

 ‖◦√D − H‖_F² = Σ_{i,j} d_ij − 2 y_ij + h_ij² = −N tr(V (D − 2Y) V) + ‖H‖_F²      (1358)
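Before passing to the reformulation that follows, here is a small illustrative sketch of (1355) in Python with the CVXPY modeling package (assuming an SDP-capable solver such as SCS is installed; the nonnegative symmetric matrix H below is hypothetical data). The 2×2 semidefinite blocks are imposed in the equivalent scalar form (1356), y_ij² ≤ h_ij² d_ij, while membership D ∈ EDM^N is imposed by hollowness together with −V_N^T D V_N ≽ 0.

import numpy as np
import cvxpy as cp

H = np.array([[0., 2., 3., 1.],       # hypothetical nonnegative symmetric data
              [2., 0., 1., 2.],
              [3., 1., 0., 2.],
              [1., 2., 2., 0.]])
N = H.shape[0]
Vn = np.vstack([-np.ones((1, N-1)), np.eye(N-1)]) / np.sqrt(2)   # V_N as in (897)

D = cp.Variable((N, N), symmetric=True)
Y = cp.Variable((N, N), symmetric=True)
S = cp.Variable((N-1, N-1), PSD=True)            # S = -Vn' D Vn : Schoenberg criterion

constraints = [cp.diag(D) == 0, cp.diag(Y) == 0, S == -Vn.T @ D @ Vn]
for i in range(N):
    for j in range(i + 1, N):
        constraints.append(cp.square(Y[i, j]) <= H[i, j]**2 * D[i, j])   # scalar form of (1356)

prob = cp.Problem(cp.Minimize(cp.sum(D) - 2*cp.sum(Y) + np.sum(H**2)), constraints)  # (1355)
prob.solve()
print(prob.value)                            # equals ||sqrt(D) - H||_F^2 at the optimum
print(np.sqrt(np.maximum(D.value, 0)))       # optimal absolute-distance matrix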


7.2. SECOND PREVALENT PROBLEM: 569So convex problem (1355) is equivalent to the semidefinite programminimize − tr(V (D − 2Y )V )D , Y[ ]dij y ijsubject to≽ 0 ,N ≥ j > i = 1... N −1y ij h 2 ij(1359)Y ∈ S N hD ∈ EDM Nwhere the constants h 2 ij and N have been dropped arbitrarily from theobjective.7.2.1.2 Gram-form semidefinite program, Problem 2, convex caseThere is great advantage to expressing problem statement (1359) inGram-form because Gram matrix G is a bidirectional bridge between pointlist X and distance matrix D ; e.g., Example 5.4.2.3.5, Example 6.7.0.0.1.This way, problem convexity can be maintained while simultaneouslyconstraining point list X , Gram matrix G , and distance matrix D at ourdiscretion.<strong>Convex</strong> problem (1359) may be equivalently written via linear bijective(5.6.1) EDM operator D(G) (903);minimizeG∈S N c , Y ∈ S N hsubject to− tr(V (D(G) − 2Y )V )[ ]〈Φij , G〉 y ij≽ 0 ,y ijh 2 ijN ≥ j > i = 1... N −1(1360)G ≽ 0where distance-square D = [d ij ] ∈ S N h (887) is related to Gram matrix entriesG = [g ij ] ∈ S N c ∩ S N + bywhered ij = g ii + g jj − 2g ij= 〈Φ ij , G〉(902)Φ ij = (e i − e j )(e i − e j ) T ∈ S N + (889)


570 CHAPTER 7. PROXIMITY PROBLEMSConfinement of G to the geometric center subspace provides numericalstability and no loss of generality (confer (1246)); implicit constraint G1 = 0is otherwise unnecessary.To include constraints on the list X ∈ R n×N , we would first rewrite (1360)minimize − tr(V (D(G) − 2Y )V )G∈S N c , Y ∈ SN h , X∈ Rn×N [ ]〈Φij , G〉 y ijsubject to≽ 0 ,y ijh 2 ij[ ] I XX T ≽ 0GX ∈ CN ≥ j > i = 1... N −1(1361)and then add the constraints, realized here in abstract membership to someconvex set C . This problem realization includes a convex relaxation of thenonconvex constraint G = X T X and, if desired, more constraints on G couldbe added. This technique is discussed in5.4.2.3.5.7.2.2 Minimization of affine dimension in Problem 2When desired affine dimension ρ is diminished, the rank function becomesreinserted into problem (1355) that is then rendered difficult to solve becausefeasible set {D , Y } loses convexity in S N h × R N×N . Indeed, the rank functionis quasiconcave (3.8) on the positive semidefinite cone; (2.9.2.9.2) id est,its sublevel sets are not convex.7.2.2.1 Rank minimization heuristicA remedy developed in [262] [135] [136] [134] introduces convex envelope ofthe quasiconcave rank function: (Figure 154)7.2.2.1.1 Definition. <strong>Convex</strong> envelope. [198]<strong>Convex</strong> envelope cenv f of a function f : C →R is defined to be the largestconvex function g such that g ≤ f on convex domain C ⊆ R n . 7.16 △7.16 Provided f ≢+∞ and there exists an affine function h ≤f on R n , then the convexenvelope is equal to the convex conjugate (the Legendre-Fenchel transform) of the convexconjugate of f ; id est, the conjugate-conjugate function f ∗∗ . [199,E.1]


7.2. SECOND PREVALENT PROBLEM: 571

[Figure 154 appears here: a plot of rank X and of its convex envelope cenv rank X, with a vertical bar labelled g marking the gap between the two curves.]

Figure 154: Abstraction of convex envelope of rank function. Rank is a quasiconcave function on positive semidefinite cone, but its convex envelope is the largest convex function whose epigraph contains it. Vertical bar labelled g measures a trace/rank gap; id est, rank found always exceeds estimate; large decline in trace required here for only small decrease in rank.


572 CHAPTER 7. PROXIMITY PROBLEMS

[135] [134] Convex envelope of rank function: for σ_i a singular value, (1566)

 cenv(rank A) on {A ∈ R^{m×n} | ‖A‖_2 ≤ κ} = (1/κ) 1^T σ(A) = (1/κ) tr √(A^T A)      (1362)
 cenv(rank A) on {A normal | ‖A‖_2 ≤ κ} = (1/κ) ‖λ(A)‖_1 = (1/κ) tr √(A^T A)      (1363)
 cenv(rank A) on {A ∈ S_+^n | ‖A‖_2 ≤ κ} = (1/κ) 1^T λ(A) = (1/κ) tr(A)      (1364)

A properly scaled trace thus represents the best convex lower bound on rank for positive semidefinite matrices. The idea, then, is to substitute convex envelope for rank of some variable A ∈ S_+^M (§A.6.5)

 rank A ← cenv(rank A) ∝ tr A = Σ_i σ(A)_i = Σ_i λ(A)_i      (1365)

which is equivalent to the sum of all eigenvalues or singular values.

[134] Convex envelope of the cardinality function is proportional to the 1-norm:

 cenv(card x) on {x ∈ R^n | ‖x‖_∞ ≤ κ} = (1/κ) ‖x‖_1      (1366)
 cenv(card x) on {x ∈ R_+^n | ‖x‖_∞ ≤ κ} = (1/κ) 1^T x      (1367)
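Relation (1364) is easily seen numerically: for A ≽ 0 with ‖A‖_2 ≤ κ, the scaled trace (1/κ) tr A = Σ_i λ_i / λ_max can never exceed the number of nonzero eigenvalues. A short Python/NumPy illustration on random data:

import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 3))
A = X @ X.T                                   # positive semidefinite with rank 3
kappa = np.linalg.norm(A, 2)                  # tightest bound ||A||_2 <= kappa
print(np.trace(A) / kappa, np.linalg.matrix_rank(A))   # lower bound (1364) versus rank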


7.2. SECOND PREVALENT PROBLEM: 573where κ ∈ R + is a constant determined by cut-and-try. The equivalentsemidefinite program makes κ variable: for nonnegative and symmetric HminimizeD , Y , κsubject toκρ + 2 tr(V Y V )[ ]dij y ij≽ 0 ,y ijh 2 ijN ≥ j > i = 1... N −1(1370)− tr(V DV ) ≤ κρY ∈ S N hD ∈ EDM Nwhich is the same as (1359), the problem with no explicit constraint on affinedimension. As the present problem is stated, the desired affine dimension ρyields to the variable scale factor κ ; ρ is effectively ignored.Yet this result is an illuminant for problem (1359) and it equivalents(all the way back to (1352)): When the given measurement matrix His nonnegative and symmetric, finding the closest EDM D as in problem(1352), (1355), or (1359) implicitly entails minimization of affine dimension(confer5.8.4,5.14.4). Those non−rank-constrained problems are eachinherently equivalent to cenv(rank)-minimization problem (1370), in otherwords, and their optimal solutions are unique because of the strictly convexobjective function in (1352).7.2.2.3 Rank-heuristic insightMinimization of affine dimension by use of this trace rank-heuristic (1368)tends to find a list configuration of least energy; rather, it tends to optimizecompaction of the reconstruction by minimizing total distance. (915) It is bestused where some physical equilibrium implies such an energy minimization;e.g., [352,5].For this Problem 2, the trace rank-heuristic arose naturally in theobjective in terms of V . We observe: V (in contrast to VN T ) spreads energyover all available distances (B.4.2 no.20, contrast no.22) although the rankfunction itself is insensitive to choice of auxiliary matrix.Trace rank-heuristic (1364) is useless when a main diagonal is constrainedto be constant. Such would be the case were optimization over anelliptope (5.4.2.3), or when the diagonal represents a Boolean vector; e.g.,Example 4.2.3.1.1, Example 4.6.0.0.8.


574 CHAPTER 7. PROXIMITY PROBLEMS

7.2.2.4 Rank minimization heuristic beyond convex envelope

Fazel, Hindi, & Boyd [136] [387] [137] propose a rank heuristic more potent than trace (1365) for problems of rank minimization;

 rank Y ← log det(Y + εI)      (1371)

the concave surrogate function log det in place of quasiconcave rank Y (§2.9.2.9.2) when Y ∈ S_+^n is variable and where ε is a small positive constant. They propose minimization of the surrogate by substituting a sequence comprising infima of a linearized surrogate about the current estimate Y_i; id est, from the first-order Taylor series expansion about Y_i on some open interval of ‖Y‖_2 (§D.1.7)

 log det(Y + εI) ≈ log det(Y_i + εI) + tr((Y_i + εI)^{−1} (Y − Y_i))      (1372)

we make the surrogate sequence of infima over bounded convex feasible set C where, for i = 0 ...

 arg inf_{Y ∈ C} rank Y ← lim_{i→∞} Y_{i+1}      (1373)
 Y_{i+1} = arg inf_{Y ∈ C} tr((Y_i + εI)^{−1} Y)      (1374)

a matrix analogue to the reweighting scheme disclosed in [207, §4.11.3]. Choosing Y_0 = I, the first step becomes equivalent to finding the infimum of tr Y; the trace rank-heuristic (1365). The intuition underlying (1374) is the new term in the argument of trace; specifically, (Y_i + εI)^{−1} weights Y so that relatively small eigenvalues of Y found by the infimum are made even smaller. To see that, substitute the nonincreasingly ordered diagonalizations

 Y_i + εI ≜ Q(Λ + εI)Q^T      (a)
 Y ≜ U Υ U^T                  (b)      (1375)

into (1374). Then from (1701) we have,

 inf_{Υ ∈ U⋆^T C U⋆} δ((Λ + εI)^{−1})^T δ(Υ) = inf_{Υ ∈ U^T C U} inf_{R^T = R^{−1}} tr((Λ + εI)^{−1} R^T Υ R)
                                             ≤ inf_{Y ∈ C} tr((Y_i + εI)^{−1} Y)      (1376)

where R ≜ Q^T U in U on the set of orthogonal matrices is a bijection. The role of ε is, therefore, to limit maximum weight; the smallest entry on the main diagonal of Υ gets the largest weight.
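A compact numerical sketch of iteration (1374) in Python with CVXPY (assuming an SDP-capable solver such as SCS; the bounded convex feasible set C here is hypothetical, namely random linear measurements of a planted rank-1 matrix, chosen only so that a low-rank feasible point is known to exist). Starting from Y_0 = I the first pass is the trace heuristic; subsequent passes reweight by (Y_i + εI)^{−1}, and the printed numerical rank of the iterates typically falls:

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
n, m, eps = 5, 8, 1e-2
x = rng.standard_normal((n, 1))
Ytrue = x @ x.T                                  # planted rank-1 PSD matrix
A, b = [], []
for _ in range(m):                               # random symmetric measurements of Ytrue
    M = rng.standard_normal((n, n)); M = (M + M.T) / 2
    A.append(M); b.append(np.trace(M @ Ytrue))

Yi = np.eye(n)                                   # Y_0 = I : first step is the trace heuristic
for it in range(5):
    W = np.linalg.inv(Yi + eps * np.eye(n))      # weight (Y_i + eps I)^{-1} of (1374)
    Y = cp.Variable((n, n), PSD=True)
    cons = [cp.trace(Ak @ Y) == bk for Ak, bk in zip(A, b)]
    cp.Problem(cp.Minimize(cp.trace(W @ Y)), cons).solve()
    Yi = Y.value
    lam = np.linalg.eigvalsh(Yi)
    print(it, int(np.sum(lam > 1e-6 * lam.max())))   # numerical rank of the iterate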


7.2. SECOND PREVALENT PROBLEM: 5757.2.2.5 Applying log det rank-heuristic to Problem 2When the log det rank-heuristic is inserted into Problem 2, problem (1370)becomes the problem sequence in iminimizeD , Y , κsubject toκρ + 2 tr(V Y V )[ ]djl y jl≽ 0 ,y jlh 2 jll > j = 1... N −1− tr((−V D i V + εI) −1 V DV ) ≤ κρ(1377)Y ∈ S N hD ∈ EDM Nwhere D i+1 D ⋆ ∈ EDM N and D 0 11 T − I .7.2.2.6 Tightening this log det rank-heuristicLike the trace method, this log det technique for constraining rank offersno provision for meeting a predetermined upper bound ρ . Yet sinceeigenvalues are simply determined, λ(Y i + εI)=δ(Λ + εI) , we may certainlyforce selected weights to ε −1 by manipulating diagonalization (1375a).Empirically we find this sometimes leads to better results, although affinedimension of a solution cannot be guaranteed.7.2.2.7 Cumulative summary of rank heuristicsWe have studied a perturbation method of rank reduction in4.3 as well asthe trace heuristic (convex envelope method7.2.2.1.1) and log det heuristicin7.2.2.4. There is another good contemporary method called LMIRank[285] based on alternating projection (E.10). 7.177.2.2.7.1 Example. Unidimensional scaling.We apply the convex iteration method from4.4.1 to numerically solve aninstance of Problem 2; a method empirically superior to the foregoing convexenvelope and log det heuristics for rank regularization and enforcing affinedimension.7.17 that does not solve the ball packing problem presented in5.4.2.3.4.


576 CHAPTER 7. PROXIMITY PROBLEMSUnidimensional scaling, [105] a historically practical application ofmultidimensional scaling (5.12), entails solution of an optimization problemhaving local minima whose multiplicity varies as the factorial of point-listcardinality; geometrically, it means reconstructing a list constrained to lie inone affine dimension. In terms of point list, the nonconvex problem is: givennonnegative symmetric matrix H = [h ij ] ∈ S N ∩ R N×N+ (1357) whose entriesh ij are all known,minimize{x i ∈R}N∑(|x i − x j | − h ij ) 2 (1308)i , j=1called a raw stress problem [53, p.34] which has an implicit constraint ondimensional embedding of points {x i ∈ R , i=1... N}. This problem hasproven NP-hard; e.g., [73].As always, we first transform variables to distance-square D ∈ S N h ; sobegin with convex problem (1359) on page 569minimize − tr(V (D − 2Y )V )D , Y[ ]dij y ijsubject to≽ 0 ,y ijh 2 ijY ∈ S N hD ∈ EDM NrankV T N DV N = 1N ≥ j > i = 1... N −1(1378)that becomes equivalent to (1308) by making explicit the constraint on affinedimension via rank. The iteration is formed by moving the dimensionalconstraint to the objective:minimize −〈V (D − 2Y )V , I〉 − w〈VN TDV N , W 〉D , Y[ ]dij y ijsubject to≽ 0 , N ≥ j > i = 1... N −1y ij h 2 ij(1379)Y ∈ S N hD ∈ EDM Nwhere w (≈ 10) is a positive scalar just large enough to make 〈V T N DV N , W 〉vanish to within some numerical precision, and where direction matrix W is


7.2. SECOND PREVALENT PROBLEM: 577

an optimal solution to semidefinite program (1700a)

 minimize_W   −⟨V_N^T D⋆ V_N , W⟩
 subject to   0 ≼ W ≼ I
              tr W = N − 1      (1380)

one of which is known in closed form. Semidefinite programs (1379) and (1380) are iterated until convergence in the sense defined on page 309. This iteration is not a projection method. (§4.4.1.1) Convex problem (1379) is not a relaxation of unidimensional scaling problem (1378); instead, problem (1379) is a convex equivalent to (1378) at convergence of the iteration.

Jan de Leeuw provided us with some test data

 H = [ 0.000000  5.235301  5.499274  6.404294  6.486829  6.263265
       5.235301  0.000000  3.208028  5.840931  3.559010  5.353489
       5.499274  3.208028  0.000000  5.679550  4.020339  5.239842
       6.404294  5.840931  5.679550  0.000000  4.862884  4.543120
       6.486829  3.559010  4.020339  4.862884  0.000000  4.618718
       6.263265  5.353489  5.239842  4.543120  4.618718  0.000000 ]      (1381)

and a globally optimal solution

 X⋆ = [ −4.981494  −2.121026  −1.038738  4.555130  0.764096  2.822032 ]
    = [ x⋆_1  x⋆_2  x⋆_3  x⋆_4  x⋆_5  x⋆_6 ]      (1382)

found by searching 6! local minima of (1308) [105]. By iterating convex problems (1379) and (1380) about twenty times (initial W = 0) we find the global infimum 98.12812 of stress problem (1308), and by (1132) we find a corresponding one-dimensional point list that is a rigid transformation in R of X⋆.

Here we found the infimum to accuracy of the given data, but that ceases to hold as problem size increases. Because of machine numerical precision and an interior-point method of solution, we speculate, accuracy degrades quickly as problem size increases beyond this.
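The reported infimum is easy to corroborate from the data alone by evaluating stress objective (1308) at point list (1382); a short Python/NumPy verification (values transcribed from (1381) and (1382) above):

import numpy as np

H = np.array([
 [0.000000, 5.235301, 5.499274, 6.404294, 6.486829, 6.263265],
 [5.235301, 0.000000, 3.208028, 5.840931, 3.559010, 5.353489],
 [5.499274, 3.208028, 0.000000, 5.679550, 4.020339, 5.239842],
 [6.404294, 5.840931, 5.679550, 0.000000, 4.862884, 4.543120],
 [6.486829, 3.559010, 4.020339, 4.862884, 0.000000, 4.618718],
 [6.263265, 5.353489, 5.239842, 4.543120, 4.618718, 0.000000]])
x = np.array([-4.981494, -2.121026, -1.038738, 4.555130, 0.764096, 2.822032])

stress = np.sum((np.abs(x[:, None] - x[None, :]) - H)**2)   # objective (1308)
print(stress)           # approximately 98.128, the global infimum quoted above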


578 CHAPTER 7. PROXIMITY PROBLEMS7.3 Third prevalent problem:Projection on EDM cone in d ijIn summary, we find that the solution to problem [(1310.3) p.553]is difficult and depends on the dimension of the space as thegeometry of the cone of EDMs becomes more complex.−Hayden, Wells, Liu, & Tarazaga (1991) [185,3]Reformulating Problem 2 (p.566) in terms of EDM D changes theproblem considerably:⎫minimize ‖D − H‖ 2 F ⎪⎬Dsubject to rankVN TDV N ≤ ρ Problem 3 (1383)⎪D ∈ EDM N ⎭This third prevalent proximity problem is a Euclidean projection of givenmatrix H on a generally nonconvex subset (ρ < N −1) of ∂EDM N theboundary of the convex cone of Euclidean distance matrices relativeto subspace S N h (Figure 138d). Because coordinates of projection aredistance-square and H now presumably holds distance-square measurements,numerical solution to Problem 3 is generally different than that of Problem 2.For the moment, we need make no assumptions regarding measurementmatrix H .7.3.1 <strong>Convex</strong> caseminimize ‖D − H‖ 2 FD(1384)subject to D ∈ EDM NWhen the rank constraint disappears (for ρ = N −1), this third problembecomes obviously convex because the feasible set is then the entire EDMcone and because the objective function‖D − H‖ 2 F = ∑ i,j(d ij − h ij ) 2 (1385)


7.3. THIRD PREVALENT PROBLEM: 579is a strictly convex quadratic in D ; 7.18∑minimize d 2 ij − 2h ij d ij + h 2 ijDi,j(1386)subject to D ∈ EDM NOptimal solution D ⋆ is therefore unique, as expected, for this simpleprojection on the EDM cone equivalent to (1309).7.3.1.1 Equivalent semidefinite program, Problem 3, convex caseIn the past, this convex problem was solved numerically by means ofalternating projection. (Example 7.3.1.1.1) [155] [148] [185,1] We translate(1386) to an equivalent semidefinite program because we have a good solver:Assume the given measurement matrix H to be nonnegative andsymmetric; 7.19H = [h ij ] ∈ S N ∩ R N×N+ (1357)We then propose: Problem (1386) is equivalent to the semidefinite program,for∂ [d 2 ij ] = D ◦D (1387)a matrix of distance-square squared,minimize − tr(V (∂ − 2H ◦D)V )∂ , D[ ]∂ij d ijsubject to≽ 0 , N ≥ j > i = 1... N −1d ij 1(1388)D ∈ EDM N∂ ∈ S N h7.18 For nonzero Y ∈ S N h and some open interval of t∈R (3.7.3.0.2,D.2.3)d 2dt 2 ‖(D + tY ) − H‖2 F = 2 tr Y T Y > 07.19 If that H given has negative entries, then the technique of solution presented herebecomes invalid. Projection of H on K (1301) prior to application of this proposedtechnique, as explained in7.0.1, is incorrect.


580 CHAPTER 7. PROXIMITY PROBLEMS

where

 [ ∂_ij   d_ij
   d_ij    1   ] ≽ 0   ⇔   ∂_ij ≥ d_ij²      (1389)

Symmetry of input H facilitates trace in the objective (§B.4.2 no.20), while its nonnegativity causes ∂_ij → d_ij² as optimization proceeds.

7.3.1.1.1 Example. Alternating projection on nearest EDM.
By solving (1388) we confirm the result from an example given by Glunt, Hayden, Hong, & Wells [155, §6] who found analytical solution to convex optimization problem (1384) for particular cardinality N = 3 by using the alternating projection method of von Neumann (§E.10):

 H = [ 0  1  1            D⋆ = [  0     19/9   19/9
       1  0  9                   19/9    0     76/9
       1  9  0 ] ,               19/9   76/9    0   ]      (1390)

The original problem (1384) of projecting H on the EDM cone is transformed to an equivalent iterative sequence of projections on the two convex cones (1250) from §6.8.1.1. Using ordinary alternating projection, input H goes to D⋆ with an accuracy of four decimal places in about 17 iterations. Affine dimension corresponding to this optimal solution is r = 1. (A numerical check of this solution appears below, after (1391).)
Obviation of semidefinite programming's computational expense is the principal advantage of this alternating projection technique.

7.3.1.2 Schur-form semidefinite program, Problem 3 convex case

Semidefinite program (1388) can be reformulated by moving the objective function in

 minimize_D   ‖D − H‖_F²
 subject to   D ∈ EDM^N      (1384)

to the constraints. This makes an equivalent epigraph form of the problem: for any measurement matrix H

 minimize_{t∈R, D}   t
 subject to          ‖D − H‖_F² ≤ t      (1391)
                     D ∈ EDM^N
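Returning to Example 7.3.1.1.1, solution (1390) is readily verified; a short Python/NumPy check that D⋆ satisfies the Schoenberg criterion (hence is an EDM), has affine dimension r = 1, and gives the attained projection distance:

import numpy as np

H = np.array([[0., 1., 1.],
              [1., 0., 9.],
              [1., 9., 0.]])
Dstar = np.array([[ 0., 19., 19.],
                  [19.,  0., 76.],
                  [19., 76.,  0.]]) / 9.
N = 3
V = np.eye(N) - np.ones((N, N)) / N               # geometric centering matrix V (913)
print(np.linalg.eigvalsh(-V @ Dstar @ V))         # all nonnegative: Dstar is an EDM
print(np.linalg.matrix_rank(-V @ Dstar @ V))      # 1, so affine dimension r = 1
print(np.linalg.norm(Dstar - H, 'fro'))           # attained projection distance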


7.3. THIRD PREVALENT PROBLEM: 581We can transform this problem to an equivalent Schur-form semidefiniteprogram; (3.5.2)minimizet∈R , Dsubject tot[]tI vec(D − H)vec(D − H) T ≽ 0 (1392)1D ∈ EDM Ncharacterized by great sparsity and structure. The advantage of this SDP islack of conditions on input H ; e.g., negative entries would invalidate anysolution provided by (1388). (7.0.1.2)7.3.1.3 Gram-form semidefinite program, Problem 3 convex caseFurther, this problem statement may be equivalently written in terms of aGram matrix via linear bijective (5.6.1) EDM operator D(G) (903);minimizeG∈S N c , t∈Rsubject tot[tI vec(D(G) − H)vec(D(G) − H) T 1]≽ 0(1393)G ≽ 0To include constraints on the list X ∈ R n×N , we would rewrite this:minimizetG∈S N c , t∈R , X∈ Rn×Nsubject to[tI vec(D(G) − H)vec(D(G) − H) T 1[ ] I XX T ≽ 0G]≽ 0(1394)X ∈ Cwhere C is some abstract convex set. This technique is discussed in5.4.2.3.5.7.3.1.4 Dual interpretation, projection on EDM coneFromE.9.1.1 we learn that projection on a convex set has a dual form. Inthe circumstance K is a convex cone and point x exists exterior to the cone


582 CHAPTER 7. PROXIMITY PROBLEMSor on its boundary, distance to the nearest point Px in K is found as theoptimal value of the objective‖x − Px‖ = maximize a T xasubject to ‖a‖ ≤ 1 (2016)a ∈ K ◦where K ◦ is the polar cone.Applying this result to (1384), we get a convex optimization for any givensymmetric matrix H exterior to or on the EDM cone boundary:maximize 〈A ◦ , H〉minimize ‖D − H‖ 2 A ◦FDsubject to D ∈ EDM ≡ subject to ‖A ◦ ‖ N F ≤ 1 (1395)A ◦ ∈ EDM N◦Then from (2018) projection of H on cone EDM N isD ⋆ = H − A ◦⋆ 〈A ◦⋆ , H〉 (1396)Critchley proposed, instead, projection on the polar EDM cone in his 1980thesis [89, p.113]: In that circumstance, by projection on the algebraiccomplement (E.9.2.2.1),which is equal to (1396) when A ⋆ solvesD ⋆ = A ⋆ 〈A ⋆ , H〉 (1397)maximize 〈A , H〉Asubject to ‖A‖ F = 1(1398)A ∈ EDM NThis projection of symmetric H on polar cone EDM N◦ can be made a convexproblem, of course, by relaxing the equality constraint (‖A‖ F ≤ 1).7.3.2 Minimization of affine dimension in Problem 3When desired affine dimension ρ is diminished, Problem 3 (1383) is difficultto solve [185,3] because the feasible set in R N(N−1)/2 loses convexity. By


7.3. THIRD PREVALENT PROBLEM: 583substituting rank envelope (1368) into Problem 3, then for any given H weget a convex problemminimize ‖D − H‖ 2 FDsubject to − tr(V DV ) ≤ κρ(1399)D ∈ EDM Nwhere κ ∈ R + is a constant determined by cut-and-try. Given κ , problem(1399) is a convex optimization having unique solution in any desiredaffine dimension ρ ; an approximation to Euclidean projection on thatnonconvex subset of the EDM cone containing EDMs with correspondingaffine dimension no greater than ρ .The SDP equivalent to (1399) does not move κ into the variables as onpage 573: for nonnegative symmetric input H and distance-square squaredvariable ∂ as in (1387),minimize − tr(V (∂ − 2H ◦D)V )∂ , D[ ]∂ij d ijsubject to≽ 0 , N ≥ j > i = 1... N −1d ij 1− tr(V DV ) ≤ κρD ∈ EDM N(1400)∂ ∈ S N hThat means we will not see equivalence of this cenv(rank)-minimizationproblem to the non−rank-constrained problems (1386) and (1388) like wesaw for its counterpart (1370) in Problem 2.Another approach to affine dimension minimization is to project insteadon the polar EDM cone; discussed in6.8.1.5.7.3.3 Constrained affine dimension, Problem 3When one desires affine dimension diminished further below what can beachieved via cenv(rank)-minimization as in (1400), spectral projection can beconsidered a natural means in light of its successful application to projectionon a rank ρ subset of the positive semidefinite cone in7.1.4.Yet it is wrong here to zero eigenvalues of −V DV or −V GV or a variantto reduce affine dimension, because that particular method comes from


584 CHAPTER 7. PROXIMITY PROBLEMSprojection on a positive semidefinite cone (1336); zeroing those eigenvalueshere in Problem 3 would place an elbow in the projection path (Figure 153)thereby producing a result that is necessarily suboptimal. Problem 3 isinstead a projection on the EDM cone whose associated spectral cone isconsiderably different. (5.11.2.3) Proper choice of spectral cone is demandedby diagonalization of that variable argument to the objective:7.3.3.1 Cayley-Menger formWe use Cayley-Menger composition of the Euclidean distance matrix to solvea problem that is the same as Problem 3 (1383): (5.7.3.0.1)[ ] [ ]∥ minimize0 1T 0 1T ∥∥∥2D∥ −1 −D 1 −H[ ]F0 1T(1401)subject to rank≤ ρ + 21 −DD ∈ EDM Na projection of H on a generally nonconvex subset (when ρ < N −1) of theEuclidean distance matrix cone boundary rel∂EDM N ; id est, projectionfrom the EDM cone interior or exterior on a subset of its relative boundary(6.5, (1179)).Rank of an optimal solution is intrinsically bounded above and below;[ ] 0 1T2 ≤ rank1 −D ⋆ ≤ ρ + 2 ≤ N + 1 (1402)Our proposed strategy ([ for low-rank ]) solution is projection on that subset0 1Tof a spectral cone λ1 −EDM N (5.11.2.3) corresponding to affinedimension not in excess of that ρ desired; id est, spectral projection on⎡ ⎤⎣ Rρ+1 +0 ⎦ ∩ ∂H ⊂ R N+1 (1403)R −where∂H = {λ ∈ R N+1 | 1 T λ = 0} (1112)is a hyperplane through the origin. This pointed polyhedral cone (1403), towhich membership subsumes the rank constraint, is not full-dimensional.


7.3. THIRD PREVALENT PROBLEM: 585Given desired affine dimension 0 ≤ ρ ≤ N −1 and diagonalization (A.5)of unknown EDM D [ ] 0 1T UΥU T ∈ S N+11 −Dh(1404)and given symmetric H in diagonalization[ ] 0 1T QΛQ T ∈ S N+1 (1405)1 −Hhaving eigenvalues arranged in nonincreasing order, then by (1125) problem(1401) is equivalent to∥minimize ∥δ(Υ) − π ( δ(R T ΛR) )∥ ∥ 2Υ , R⎡ ⎤subject to δ(Υ) ∈ ⎣ Rρ+1 +0 ⎦ ∩ ∂H(1406)R −δ(QRΥR T Q T ) = 0R −1 = R Twhere π is the permutation operator from7.1.3 arranging its vectorargument in nonincreasing order, 7.20 whereR Q T U ∈ R N+1×N+1 (1407)in U on the set of orthogonal matrices is a bijection, and where ∂H insuresone negative eigenvalue. Hollowness constraint δ(QRΥR T Q T ) = 0 makesproblem (1406) difficult by making the two variables dependent.Our plan is to instead divide problem (1406) into two and then iteratetheir solution:minimizeΥsubject to δ(Υ) ∈(∥ δ(Υ) − π δ(R T ΛR) )∥ ∥ 2⎡ ⎤⎣ Rρ+1 +0 ⎦ ∩ ∂HR −(a)(1408)minimize ‖R Υ ⋆ R T − Λ‖ 2 FRsubject to δ(QR Υ ⋆ R T Q T ) = 0R −1 = R T(b)7.20 Recall, any permutation matrix is an orthogonal matrix.


586 CHAPTER 7. PROXIMITY PROBLEMSProof. We justify disappearance of the hollowness constraint inconvex optimization problem (1408a): From the arguments in7.1.3with regard ⎡ to⎤π the permutation operator, cone membership constraintδ(Υ) ∈⎣ Rρ+1 +0 ⎦∩ ∂H from (1408a) is equivalent toR −⎡δ(Υ) ∈ ⎣ Rρ+1 +0R −⎤⎦ ∩ ∂H ∩ K M (1409)where K M is the monotone cone (2.13.9.4.3). Membership of δ(Υ) to thepolyhedral cone of majorization (Theorem A.1.2.0.1)K ∗ λδ = ∂H ∩ K ∗ M+ (1426)where K ∗ M+ is the dual monotone nonnegative cone (2.13.9.4.2), is acondition (in absence of a hollowness [ constraint) ] that would insure existence0 1Tof a symmetric hollow matrix . Curiously, intersection of1 −D⎡ ⎤this feasible superset ⎣ Rρ+1 +0 ⎦∩ ∂H ∩ K M from (1409) with the cone ofmajorization K ∗ λδR −is a benign operation; id est,∂H ∩ K ∗ M+ ∩ K M = ∂H ∩ K M (1410)verifiable by observing conic dependencies (2.10.3) among the aggregate ofhalfspace-description normals. The cone membership constraint in (1408a)therefore inherently insures existence of a symmetric hollow matrix.


7.3. THIRD PREVALENT PROBLEM: 587<strong>Optimization</strong> (1408b) would be a Procrustes problem (C.4) were itnot for the hollowness constraint; it is, instead, a minimization over theintersection of the nonconvex manifold of orthogonal matrices with anothernonconvex set in variable R specified by the hollowness constraint.We solve problem (1408b) by a method introduced in4.6.0.0.2: DefineR = [r 1 · · · r N+1 ]∈ R N+1×N+1 and make the assignment⎡G = ⎢⎣r 1.r N+11⎤[ r1 T · · · rN+1 T 1] ⎥∈ S (N+1)2 +1⎦⎡⎤ ⎡R 11 · · · R 1,N+1 r 1.....= ⎢⎣ R1,N+1 T ⎥R N+1,N+1 r N+1 ⎦ ⎢⎣r1 T · · · rN+1 T 1(1411)⎤r 1 r1 T · · · r 1 rN+1 T r 1.....r N+1 r1 T r N+1 rN+1 T ⎥r N+1 ⎦r1 T · · · rN+1 T 1where R ij r i rj T ∈ R N+1×N+1 and Υ ⋆ ii ∈ R . Since R Υ ⋆ R T = N+1 ∑ΥiiR ⋆ ii thenproblem (1408b) is equivalently expressed:∥ ∥∥∥ N+12∑minimize Υ ⋆R ii ∈S , R ij , r iiiR ii − Λ∥i=1Fsubject to trR ii = 1, i=1... N+1trR ij ⎡= 0,⎤i


588 CHAPTER 7. PROXIMITY PROBLEMS∥ ∥∥∥ N+12∑minimize Υ ⋆R ij , r iiiR ii − Λ∥ + 〈G, W 〉i=1Fsubject to trR ii = 1, i=1... N+1trR ij ⎡= 0,⎤i


7.4. CONCLUSION 589filtering, multidimensional scaling or principal component analysis (5.12),medical imaging (Figure 107), molecular conformation (Figure 127), andsensor-network localization or wireless location (Figure 85).There has been little progress in spectral projection since the discovery byEckart & Young in 1936 [129] leading to a formula for projection on a rank ρsubset of a positive semidefinite cone (2.9.2.1). [146] The only closed-formspectral method presently available for solving proximity problems, having aconstraint on rank, is based on their discovery (Problem 1,7.1,5.13).One popular recourse is intentional misapplication of Eckart & Young’sresult by introducing spectral projection on a positive semidefinite coneinto Problem 3 via D(G) (903), for example. [73] Since Problem 3instead demands spectral projection on the EDM cone, any solutionacquired that way is necessarily suboptimal.A second recourse is problem redesign: A presupposition to allproximity problems in this chapter is that matrix H is given.We considered H having various properties such as nonnegativity,symmetry, hollowness, or lack thereof. It was assumed that if H didnot already belong to the EDM cone, then we wanted an EDM closestto H in some sense; id est, input-matrix H was assumed corruptedsomehow. For practical problems, it withstands reason that such aproximity problem could instead be reformulated so that some or allentries of H were unknown but bounded above and below by knownlimits; the norm objective is thereby eliminated as in the developmentbeginning on page 323. That particular redesign (the art, p.8), in termsof the Gram-matrix bridge between point-list X and EDM D , at onceencompasses proximity and completion problems.A third recourse is to apply the method of convex iteration just likewe did in7.2.2.7.1. This technique is applicable to any semidefiniteproblem requiring a rank constraint; it places a regularization term inthe objective that enforces the rank constraint.




Appendix A
Linear algebra

A.1 Main-diagonal δ operator, λ, trace, vec

We introduce notation δ denoting the main-diagonal linear self-adjoint operator. When linear function δ operates on a square matrix A ∈ R^{N×N}, δ(A) returns a vector composed of all the entries from the main diagonal in the natural order;

 δ(A) ∈ R^N      (1415)

Operating on a vector y ∈ R^N, δ naturally returns a diagonal matrix;

 δ(y) ∈ S^N      (1416)

Operating recursively on a vector Λ ∈ R^N or diagonal matrix Λ ∈ S^N, δ(δ(Λ)) returns Λ itself;

 δ²(Λ) ≡ δ(δ(Λ)) ≜ Λ      (1417)

Defined in this manner, main-diagonal linear operator δ is self-adjoint [227, §3.10, §9.5-1];^A.1 videlicet, (§2.2)

 δ(A)^T y = ⟨δ(A), y⟩ = ⟨A, δ(y)⟩ = tr(A^T δ(y))      (1418)

A.1 Linear operator T : R^{m×n} → R^{M×N} is self-adjoint when, ∀ X_1, X_2 ∈ R^{m×n}
 ⟨T(X_1), X_2⟩ = ⟨X_1, T(X_2)⟩
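In NumPy a single function, numpy.diag, plays both roles of δ: applied to a matrix it returns the main diagonal as a vector (1415), applied to a vector it returns a diagonal matrix (1416). A brief illustrative sketch checking (1417) and self-adjointness (1418) on random data:

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
y = rng.standard_normal(4)

print(np.allclose(np.diag(np.diag(np.diag(y))), np.diag(y)))   # delta(delta(Lambda)) = Lambda, (1417)
print(np.isclose(np.diag(A) @ y, np.trace(A.T @ np.diag(y))))  # delta(A)' y = tr(A' delta(y)), (1418)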


592 APPENDIX A. LINEAR ALGEBRAA.1.1IdentitiesThis δ notation is efficient and unambiguous as illustrated in the followingexamples where: A ◦ B denotes Hadamard product [202] [159,1.1.4] ofmatrices of like size, ⊗ Kronecker product [166] (D.1.2.1), y a vector, Xa matrix, e i the i th member of the standard basis for R n , S N h the symmetrichollow subspace, σ(A) a vector of (nonincreasingly) ordered singular values,and λ(A) a vector of nonincreasingly ordered eigenvalues of matrix A :1. δ(A) = δ(A T )2. tr(A) = tr(A T ) = δ(A) T 1 = 〈I, A〉3. δ(cA) = cδ(A) c∈R4. tr(cA) = c tr(A) = c1 T λ(A) c∈R5. vec(cA) = c vec(A) c∈R6. A ◦ cB = cA ◦ B c∈R7. A ⊗ cB = cA ⊗ B c∈R8. δ(A + B) = δ(A) + δ(B)9. tr(A + B) = tr(A) + tr(B)10. vec(A + B) = vec(A) + vec(B)11. (A + B) ◦ C = A ◦ C + B ◦ CA ◦ (B + C) = A ◦ B + A ◦ C12. (A + B) ⊗ C = A ⊗ C + B ⊗ CA ⊗ (B + C) = A ⊗ B + A ⊗ C13. sgn(c)λ(|c|A) = cλ(A) c∈R14. sgn(c)σ(|c|A) = cσ(A) c∈R15. tr(c √ A T A) = c tr √ A T A = c1 T σ(A) c∈R16. π(δ(A)) = λ(I ◦A) where π is the presorting function.17. δ(AB) = (A ◦ B T )1 = (B T ◦ A)118. δ(AB) T = 1 T (A T ◦ B) = 1 T (B ◦ A T )


A.1. MAIN-DIAGONAL δ OPERATOR, λ , TRACE, VEC 593⎡ ⎤u 1 v 119. δ(uv T ⎢ ⎥) = ⎣ . ⎦ = u ◦ v ,u N v Nu,v ∈ R N20. tr(A T B) = tr(AB T ) = tr(BA T ) = tr(B T A)= 1 T (A ◦ B)1 = 1 T δ(AB T ) = δ(A T B) T 1 = δ(BA T ) T 1 = δ(B T A) T 121. D = [d ij ] ∈ S N h , H = [h ij ] ∈ S N h , V = I − 1 N 11T ∈ S N (conferB.4.2 no.20)N tr(−V (D ◦ H)V ) = tr(D T H) = 1 T (D ◦ H)1 = tr(11 T (D ◦ H)) = ∑ d ij h iji,j22. tr(ΛA) = δ(Λ) T δ(A), δ 2 (Λ) Λ ∈ S N23. y T Bδ(A) = tr ( Bδ(A)y T) = tr ( δ(B T y)A ) = tr ( Aδ(B T y) )= δ(A) T B T y = tr ( yδ(A) T B T) = tr ( A T δ(B T y) ) = tr ( δ(B T y)A T)24. δ 2 (A T A) = ∑ ie i e T iA T Ae i e T i25. δ ( δ(A)1 T) = δ ( 1δ(A) T) = δ(A)26. δ(A1)1 = δ(A11 T ) = A1 , δ(y)1 = δ(y1 T ) = y27. δ(I1) = δ(1) = I28. δ(e i e T j 1) = δ(e i ) = e i e T i29. For ζ =[ζ i ]∈ R k and x=[x i ]∈ R k ,30.∑ζ i /x i = ζ T δ(x) −1 1vec(A ◦ B) = vec(A) ◦ vec(B) = δ(vecA) vec(B)= vec(B) ◦ vec(A) = δ(vecB) vec(A)31. vec(AXB) = (B T ⊗ A) vec X32. vec(BXA) = (A T ⊗ B) vec Xi(42)(1787)(notH )33.34.tr(AXBX T ) = vec(X) T vec(AXB) = vec(X) T (B T ⊗ A) vec X [166]= δ ( vec(X) vec(X) T (B T ⊗ A) ) T1tr(AX T BX) = vec(X) T vec(BXA) = vec(X) T (A T ⊗ B) vec X= δ ( vec(X) vec(X) T (A T ⊗ B) ) T1


594 APPENDIX A. LINEAR ALGEBRA35. For any permutation matrix Ξ and dimensionally compatible vector yor matrix Aδ(Ξy) = Ξδ(y) Ξ T (1419)δ(ΞAΞ T ) = Ξδ(A) (1420)So given any permutation matrix Ξ and any dimensionally compatiblematrix B , for example,δ 2 (B) = Ξδ 2 (Ξ T B Ξ)Ξ T (1421)36. A ⊗ 1 = 1 ⊗ A = A37. A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C38. (A ⊗ B)(C ⊗ D) = AC ⊗ BD39. For A a vector, (A ⊗ B) = (A ⊗ I)B40. For B a row vector, (A ⊗ B) = A(I ⊗ B)41. (A ⊗ B) T = A T ⊗ B T42. (A ⊗ B) −1 = A −1 ⊗ B −143. tr(A ⊗ B) = trA trB44. For A∈ R m×m , B ∈ R n×n , det(A ⊗ B) = det n (A) det m (B)45. There exist permutation matrices Ξ 1 and Ξ 2 such that [166, p.28]A ⊗ B = Ξ 1 (B ⊗ A)Ξ 2 (1422)46. For eigenvalues λ(A)∈ C n and eigenvectors v(A)∈ C n×n such thatA = vδ(λ)v −1 ∈ R n×nλ(A ⊗ B) = λ(A) ⊗ λ(B) , v(A ⊗ B) = v(A) ⊗ v(B) (1423)47. Given analytic function [82] f : C n×n → C n×n , then f(I ⊗A)= I ⊗f(A)and f(A ⊗ I) = f(A) ⊗ I [166, p.28]
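Identities such as no.17 and no.31 above are conveniently spot-checked numerically; a short Python/NumPy sketch with arbitrary rectangular data (vec here meaning columnwise stacking, to match the Kronecker identity):

import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((3, 4)); B = rng.standard_normal((4, 3))
X = rng.standard_normal((4, 5)); C = rng.standard_normal((5, 3))
vec = lambda M: M.reshape(-1, order='F')

print(np.allclose(np.diag(A @ B), (A * B.T) @ np.ones(4)))      # no.17: delta(AB) = (A o B')1
print(np.allclose(vec(A @ X @ C), np.kron(C.T, A) @ vec(X)))    # no.31: vec(AXB) = (B' kron A) vec X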


A.1. MAIN-DIAGONAL δ OPERATOR, λ , TRACE, VEC 595A.1.2MajorizationA.1.2.0.1 Theorem. (Schur) Majorization. [393,7.4] [202,4.3][203,5.5] Let λ∈ R N denote a given vector of eigenvalues and letδ ∈ R N denote a given vector of main diagonal entries, both arranged innonincreasing order. Thenand conversely∃A∈ S N λ(A)=λ and δ(A)= δ ⇐ λ − δ ∈ K ∗ λδ (1424)A∈ S N ⇒ λ(A) − δ(A) ∈ K ∗ λδ (1425)The difference belongs to the pointed polyhedral cone of majorization (nota full-dimensional cone, confer (313))K ∗ λδ K ∗ M+ ∩ {ζ1 | ζ ∈ R} ∗ (1426)where K ∗ M+ is the dual monotone nonnegative cone (435), and where thedual of the line is a hyperplane; ∂H = {ζ1 | ζ ∈ R} ∗ = 1 ⊥ .⋄Majorization cone K ∗ λδ is naturally consequent to the definition ofmajorization; id est, vector y ∈ R N majorizes vector x if and only ifk∑x i ≤i=1k∑y i ∀ 1 ≤ k ≤ N (1427)i=1and1 T x = 1 T y (1428)Under these circumstances, rather, vector x is majorized by vector y .In the particular circumstance δ(A)=0 we get:A.1.2.0.2 Corollary. Symmetric hollow majorization.Let λ∈ R N denote a given vector of eigenvalues arranged in nonincreasingorder. Then∃A∈ S N h λ(A)=λ ⇐ λ ∈ K ∗ λδ (1429)and converselywhere K ∗ λδis defined in (1426).⋄A∈ S N h ⇒ λ(A) ∈ K ∗ λδ (1430)
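Schur's majorization theorem A.1.2.0.1 asserts that the ordered main diagonal of any symmetric matrix is majorized by its ordered eigenvalues: partial sums obey (1427) while the totals agree (1428), the latter being the trace. A quick Python/NumPy check on random symmetric data:

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 6)); A = (A + A.T) / 2
lam = np.sort(np.linalg.eigvalsh(A))[::-1]       # eigenvalues, nonincreasing
d = np.sort(np.diag(A))[::-1]                    # main diagonal, nonincreasing

print(np.all(np.cumsum(lam) >= np.cumsum(d) - 1e-12))   # partial sums (1427)
print(np.isclose(lam.sum(), d.sum()))                   # equal totals (1428): the trace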


596 APPENDIX A. LINEAR ALGEBRAA.2 Semidefiniteness: domain of testThe most fundamental necessary, sufficient, and definitive test for positivesemidefiniteness of matrix A ∈ R n×n is: [203,1]x T Ax ≥ 0 for each and every x ∈ R n such that ‖x‖ = 1 (1431)Traditionally, authors demand evaluation over broader domain; namely,over all x ∈ R n which is sufficient but unnecessary. Indeed, that standardtextbook requirement is far over-reaching because if x T Ax is nonnegative forparticular x = x p , then it is nonnegative for any αx p where α∈ R . Thus,only normalized x in R n need be evaluated.Many authors add the further requirement that the domain be complex;the broadest domain. By so doing, only Hermitian matrices (A H = A wheresuperscript H denotes conjugate transpose) A.2 are admitted to the set ofpositive semidefinite matrices (1434); an unnecessary prohibitive condition.A.2.1Symmetry versus semidefinitenessWe call (1431) the most fundamental test of positive semidefiniteness. Yetsome authors instead say, for real A and complex domain {x∈ C n } , thecomplex test x H Ax≥0 is most fundamental. That complex broadening of thedomain of test causes nonsymmetric real matrices to be excluded from the setof positive semidefinite matrices. Yet admitting nonsymmetric real matricesor not is a matter of preference A.3 unless that complex test is adopted, as weshall now explain.Any real square matrix A has a representation in terms of its symmetricand antisymmetric parts; id est,A = (A +AT )2+ (A −AT )2(53)Because, for all real A , the antisymmetric part vanishes under real test,x T(A −AT )2x = 0 (1432)A.2 Hermitian symmetry is the complex analogue; the real part of a Hermitian matrixis symmetric while its imaginary part is antisymmetric. A Hermitian matrix has realeigenvalues and real main diagonal.A.3 Golub & Van Loan [159,4.2.2], for example, admit nonsymmetric real matrices.


A.2. SEMIDEFINITENESS: DOMAIN OF TEST 597only the symmetric part of A , (A +A T )/2, has a role determining positivesemidefiniteness. Hence the oft-made presumption that only symmetricmatrices may be positive semidefinite is, of course, erroneous under (1431).Because eigenvalue-signs of a symmetric matrix translate unequivocally toits semidefiniteness, the eigenvalues that determine semidefiniteness arealways those of the symmetrized matrix. (A.3) For that reason, andbecause symmetric (or Hermitian) matrices must have real eigenvalues,the convention adopted in the literature is that semidefinite matrices aresynonymous with symmetric semidefinite matrices. Certainly misleadingunder (1431), that presumption is typically bolstered with compellingexamples from the physical sciences where symmetric matrices occur withinthe mathematical exposition of natural phenomena. A.4 [140,52]Perhaps a better explanation of this pervasive presumption of symmetrycomes from Horn & Johnson [202,7.1] whose perspective A.5 is the complexmatrix, thus necessitating the complex domain of test throughout theirtreatise. They explain, if A∈C n×n...and if x H Ax is real for all x ∈ C n , then A is Hermitian.Thus, the assumption that A is Hermitian is not necessary in thedefinition of positive definiteness. It is customary, however.Their comment is best explained by noting, the real part of x H Ax comesfrom the Hermitian part (A +A H )/2 of A ;rather,re(x H Ax) = xHA +AHx (1433)2x H Ax ∈ R ⇔ A H = A (1434)because the imaginary part of x H Ax comes from the anti-Hermitian part(A −A H )/2 ;im(x H Ax) = xHA −AHx (1435)2that vanishes for nonzero x if and only if A = A H . So the Hermitiansymmetry assumption is unnecessary, according to Horn & Johnson, notA.4 Symmetric matrices are certainly pervasive in the our chosen subject as well.A.5 A totally complex perspective is not necessarily more advantageous. The positivesemidefinite cone, for example, is not selfdual (2.13.5) in the ambient space of Hermitianmatrices. [195,II]


598 APPENDIX A. LINEAR ALGEBRA

because nonHermitian matrices could be regarded positive semidefinite, rather because nonHermitian (includes nonsymmetric real) matrices are not comparable on the real line under x^H A x. Yet that complex edifice is dismantled in the test of real matrices (1431) because the domain of test is no longer necessarily complex; meaning, x^T A x will certainly always be real, regardless of symmetry, and so real A will always be comparable.

In summary, if we limit the domain of test to all x in R^n as in (1431), then nonsymmetric real matrices are admitted to the realm of semidefinite matrices because they become comparable on the real line. One important exception occurs for rank-one matrices Ψ = u v^T where u and v are real vectors: Ψ is positive semidefinite if and only if Ψ = u u^T. (§A.3.1.0.7)

We might choose to expand the domain of test to all x in C^n so that only symmetric matrices would be comparable. The alternative to expanding domain of test is to assume all matrices of interest to be symmetric; that is commonly done, hence the synonymous relationship with semidefinite matrices.

A.2.1.0.1 Example. Nonsymmetric positive definite product.
Horn & Johnson assert and Zhang agrees:

 If A, B ∈ C^{n×n} are positive definite, then we know that the product AB is positive definite if and only if AB is Hermitian. [202, §7.6 prob.10] [393, §6.2, §3.2]

Implicitly in their statement, A and B are assumed individually Hermitian and the domain of test is assumed complex.
We prove that assertion to be false for real matrices under (1431) that adopts a real domain of test.

 A^T = A = [  3  0 −1  0            λ(A) = [ 5.9
              0  5  1  0                     4.5
             −1  1  4  1                     3.4
              0  0  1  4 ] ,                 2.0 ]      (1436)

 B^T = B = [  4  4 −1 −1            λ(B) = [ 8.8
              4  5  0  0                     5.5
             −1  0  5  1                     3.3
             −1  0  1  4 ] ,                 0.24 ]      (1437)


A.3. PROPER STATEMENTS 599

 (AB)^T ≠ AB = [ 13  12  −8  −4            λ(AB) = [ 36.
                 19  25   5   1                      29.
                 −5   1  22   9                      10.
                 −5   0   9  17 ] ,                   0.72 ]      (1438)

 (1/2)(AB + (AB)^T) = [ 13    15.5  −6.5  −4.5          λ((1/2)(AB + (AB)^T)) = [ 36.
                        15.5  25     3     0.5                                   30.
                        −6.5   3    22     9                                     10.
                        −4.5   0.5   9    17 ] ,                                  0.014 ]      (1439)

Whenever A ∈ S_+^n and B ∈ S_+^n, then λ(AB) = λ(√A B √A) will always be a nonnegative vector by (1463) and Corollary A.3.1.0.5. Yet positive definiteness of product AB is certified instead by the nonnegative eigenvalues λ((1/2)(AB + (AB)^T)) in (1439) (§A.3.1.0.1) despite the fact that AB is not symmetric.^A.6 Horn & Johnson and Zhang resolve the anomaly by choosing to exclude nonsymmetric matrices and products; they do so by expanding the domain of test to C^n.

A.3 Proper statements of positive semidefiniteness

Unlike Horn & Johnson and Zhang, we never adopt a complex domain of test with real matrices. So motivated is our consideration of proper statements of positive semidefiniteness under real domain of test. This restriction, ironically, complicates the facts when compared to corresponding statements for the complex case (found elsewhere [202] [393]).
We state several fundamental facts regarding positive semidefiniteness of real matrix A and the product AB and sum A + B of real matrices under fundamental real test (1431); a few require proof as they depart from the standard texts, while those remaining are well established or obvious.

A.6 It is a little more difficult to find a counterexample in R^{2×2} or R^{3×3}; which may have served to advance any confusion.
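Example A.2.1.0.1 is simple to reproduce; a short Python/NumPy check that the eigenvalues of AB and of its symmetrization (1439) are all positive while AB itself is not symmetric:

import numpy as np

A = np.array([[ 3., 0., -1., 0.],
              [ 0., 5.,  1., 0.],
              [-1., 1.,  4., 1.],
              [ 0., 0.,  1., 4.]])
B = np.array([[ 4., 4., -1., -1.],
              [ 4., 5.,  0.,  0.],
              [-1., 0.,  5.,  1.],
              [-1., 0.,  1.,  4.]])
AB = A @ B
print(np.sort(np.linalg.eigvals(AB).real)[::-1])           # cf. lambda(AB) in (1438)
print(np.sort(np.linalg.eigvalsh((AB + AB.T) / 2))[::-1])  # cf. (1439)
print(np.allclose(AB, AB.T))                               # False: AB is not symmetric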


600 APPENDIX A. LINEAR ALGEBRAA.3.0.0.1 Theorem. Positive (semi)definite matrix.A∈ S M is positive semidefinite if and only if for each and every vector x∈ R Mof unit norm, ‖x‖ = 1 , A.7 we have x T Ax ≥ 0 (1431);A ≽ 0 ⇔ tr(xx T A) = x T Ax ≥ 0 ∀xx T (1440)Matrix A ∈ S M is positive definite if and only if for each and every ‖x‖ = 1we have x T Ax > 0 ;A ≻ 0 ⇔ tr(xx T A) = x T Ax > 0 ∀xx T , xx T ≠ 0 (1441)Proof. Statements (1440) and (1441) are each a particular instanceof dual generalized inequalities (2.13.2) with respect to the positivesemidefinite cone; videlicet, [360]⋄A ≽ 0 ⇔ 〈xx T , A〉 ≥ 0 ∀xx T (≽ 0)A ≻ 0 ⇔ 〈xx T , A〉 > 0 ∀xx T (≽ 0), xx T ≠ 0(1442)This says: positive semidefinite matrix A must belong to the normal sideof every hyperplane whose normal is an extreme direction of the positivesemidefinite cone. Relations (1440) and (1441) remain true when xx T isreplaced with “for each and every” positive semidefinite matrix X ∈ S M +(2.13.5) of unit norm, ‖X‖= 1, as inA ≽ 0 ⇔ tr(XA) ≥ 0 ∀X ∈ S M +A ≻ 0 ⇔ tr(XA) > 0 ∀X ∈ S M + , X ≠ 0(1443)But that condition is more than what is necessary. By the discretizedmembership theorem in2.13.4.2.1, the extreme directions xx T of the positivesemidefinite cone constitute a minimal set of generators necessary andsufficient for discretization of dual generalized inequalities (1443) certifyingmembership to that cone.A.7 The traditional condition requiring all x∈ R M for defining positive (semi)definitenessis actually more than what is necessary. The set of norm-1 vectors is necessary andsufficient to establish positive semidefiniteness; actually, any particular norm and anynonzero norm-constant will work.


A.3. PROPER STATEMENTS 601

A.3.1 Semidefiniteness, eigenvalues, nonsymmetric

When A∈ R^{n×n} , let λ( (A + A^T)/2 ) ∈ R^n denote eigenvalues of the symmetrized matrix A.8 arranged in nonincreasing order.

By positive semidefiniteness of A∈ R^{n×n} we mean, A.9 [272, 1.3.1] (confer A.3.1.0.1)

x^T A x ≥ 0 ∀x∈ R^n ⇔ A + A^T ≽ 0 ⇔ λ(A + A^T) ≽ 0   (1444)

(2.9.0.1)

A ≽ 0 ⇒ A^T = A   (1445)

A ≽ B ⇔ A − B ≽ 0 ⇏ A ≽ 0 or B ≽ 0   (1446)

x^T A x ≥ 0 ∀x ⇏ A^T = A   (1447)

Matrix symmetry is not intrinsic to positive semidefiniteness;

A^T = A , λ(A) ≽ 0 ⇒ x^T A x ≥ 0 ∀x   (1448)

λ(A) ≽ 0 ⇐ A^T = A , x^T A x ≥ 0 ∀x   (1449)

If A^T = A then

λ(A) ≽ 0 ⇔ A ≽ 0   (1450)

meaning, matrix A belongs to the positive semidefinite cone in the subspace of symmetric matrices if and only if its eigenvalues belong to the nonnegative orthant.

〈A , A〉 = 〈λ(A) , λ(A)〉   (45)

For µ∈ R , A∈ R^{n×n} , and vector λ(A)∈ C^n holding the ordered eigenvalues of A

λ(µI + A) = µ1 + λ(A)   (1451)

Proof: A = MJM^{−1} and µI + A = M(µI + J)M^{−1} where J is the Jordan form for A ; [331, 5.6, App.B] id est, δ(J) = λ(A) , so λ(µI + A) = δ(µI + J) because µI + J is also a Jordan form.

A.8 The symmetrization of A is (A + A^T)/2 . λ( (A + A^T)/2 ) = λ(A + A^T)/2 .
A.9 Strang agrees [331, p.334] it is not λ(A) that requires observation. Yet he is mistaken by proposing the Hermitian part alone x^H(A + A^H)x be tested, because the anti-Hermitian part does not vanish under complex test unless A is Hermitian. (1435)


602 APPENDIX A. LINEAR ALGEBRABy similar reasoning,λ(I + µA) = 1 + λ(µA) (1452)For vector σ(A) holding the singular values of any matrix Aσ(I + µA T A) = π(|1 + µσ(A T A)|) (1453)σ(µI + A T A) = π(|µ1 + σ(A T A)|) (1454)where π is the nonlinear permutation-operator sorting its vectorargument into nonincreasing order.For A∈ S M and each and every ‖w‖= 1 [202,7.7 prob.9]w T Aw ≤ µ ⇔ A ≼ µI ⇔ λ(A) ≼ µ1 (1455)[202,2.5.4] (confer (44))A is normal matrix ⇔ ‖A‖ 2 F = λ(A) T λ(A) (1456)For A∈ R m×n A T A ≽ 0, AA T ≽ 0 (1457)because, for dimensionally compatible vector x , x T A T Ax = ‖Ax‖ 2 2 ,x T AA T x = ‖A T x‖ 2 2 .For A∈ R n×n and c∈Rtr(cA) = c tr(A) = c1 T λ(A)(A.1.1 no.4)For m a nonnegative integer, (1868)det(A m ) =tr(A m ) =n∏λ(A) m i (1458)i=1n∑λ(A) m i (1459)i=1


A.3. PROPER STATEMENTS 603For A diagonalizable (A.5), A = SΛS −1 , (confer [331, p.255])rankA = rankδ(λ(A)) = rank Λ (1460)meaning, rank is equal to the number of nonzero eigenvalues in vectorby the 0 eigenvalues theorem (A.7.3.0.1).(Ky Fan) For A,B∈ S n [55,1.2] (confer (1741))λ(A) δ(Λ) (1461)tr(AB) ≤ λ(A) T λ(B) (1727)with equality (Theobald) when A and B are simultaneouslydiagonalizable [202] with the same ordering of eigenvalues.For A∈ R m×n and B ∈ R n×mtr(AB) = tr(BA) (1462)and η eigenvalues of the product and commuted product are identical,including their multiplicity; [202,1.3.20] id est,λ(AB) 1:η = λ(BA) 1:η , ηmin{m , n} (1463)Any eigenvalues remaining are zero. By the 0 eigenvalues theorem(A.7.3.0.1),rank(AB) = rank(BA), AB and BA diagonalizable (1464)For any compatible matrices A,B [202,0.4]min{rankA, rankB} ≥ rank(AB) (1465)For A,B ∈ S n +rankA + rankB ≥ rank(A + B) ≥ min{rankA, rankB} ≥ rank(AB)(1466)For linearly independent matrices A,B ∈ S n + (2.1.2, R(A) ∩ R(B)=0,R(A T ) ∩ R(B T )=0,B.1.1),rankA + rankB = rank(A + B) > min{rankA, rankB} ≥ rank(AB)(1467)


604 APPENDIX A. LINEAR ALGEBRABecause R(A T A)= R(A T ) and R(AA T )= R(A) (p.634), for anyA∈ R m×n rank(AA T ) = rank(A T A) = rankA = rankA T (1468)For A∈ R m×n having no nullspace, and for any B ∈ R n×krank(AB) = rank(B) (1469)Proof. For any compatible matrix C , N(CAB)⊇ N(AB)⊇ N(B)is obvious. By assumption ∃A † A † A = I . Let C = A † , thenN(AB)= N(B) and the stated result follows by conservation ofdimension (1583).For A∈ S n and any nonsingular matrix Yinertia(A) = inertia(YAY T ) (1470)a.k.a, Sylvester’s law of inertia. (1516) [113,2.4.3]For A,B∈R n×n square, [202,0.3.5]Yet for A∈ R m×n and B ∈ R n×m [77, p.72]det(AB) = det(BA) (1471)det(AB) = detA detB (1472)det(I + AB) = det(I + BA) (1473)For A,B ∈ S n , product AB is symmetric if and only if AB iscommutative;(AB) T = AB ⇔ AB = BA (1474)Proof. (⇒) Suppose AB=(AB) T . (AB) T =B T A T =BA .AB=(AB) T ⇒ AB=BA .(⇐) Suppose AB=BA . BA=B T A T =(AB) T . AB=BA ⇒AB=(AB) T .Commutativity alone is insufficient for symmetry of the product.[331, p.26]


A.3. PROPER STATEMENTS 605Diagonalizable matrices A,B∈R n×n commute if and only if they aresimultaneously diagonalizable. [202,1.3.12] A product of diagonalmatrices is always commutative.For A,B ∈ R n×n and AB = BAx T Ax ≥ 0, x T Bx ≥ 0 ∀x ⇒ λ(A+A T ) i λ(B+B T ) i ≥ 0 ∀i x T ABx ≥ 0 ∀x(1475)the negative result arising because of the schism between the productof eigenvalues λ(A + A T ) i λ(B + B T ) i and the eigenvalues of thesymmetrized matrix product λ(AB + (AB) T ) i. For example, X 2 isgenerally not positive semidefinite unless matrix X is symmetric; then(1457) applies. Simply substituting symmetric matrices changes theoutcome:For A,B ∈ S n and AB = BAA ≽ 0, B ≽ 0 ⇒ λ(AB) i =λ(A) i λ(B) i ≥0 ∀i ⇔ AB ≽ 0 (1476)Positive semidefiniteness of A and B is sufficient but not a necessarycondition for positive semidefiniteness of the product AB .Proof. Because all symmetric matrices are diagonalizable, (A.5.1)[331,5.6] we have A=SΛS T and B=T∆T T , where Λ and ∆ arereal diagonal matrices while S and T are orthogonal matrices. Because(AB) T =AB , then T must equal S , [202,1.3] and the eigenvalues ofA are ordered in the same way as those of B ; id est, λ(A) i =δ(Λ) i andλ(B) i =δ(∆) i correspond to the same eigenvector.(⇒) Assume λ(A) i λ(B) i ≥0 for i=1... n . AB=SΛ∆S T issymmetric and has nonnegative eigenvalues contained in diagonalmatrix Λ∆ by assumption; hence positive semidefinite by (1444). Nowassume A,B ≽0. That, of course, implies λ(A) i λ(B) i ≥0 for all ibecause all the individual eigenvalues are nonnegative.(⇐) Suppose AB=SΛ∆S T ≽ 0. Then Λ∆≽0 by (1444),and so all products λ(A) i λ(B) i must be nonnegative; meaning,sgn(λ(A))= sgn(λ(B)). We may, therefore, conclude nothing aboutthe semidefiniteness of A and B .


606 APPENDIX A. LINEAR ALGEBRAFor A,B ∈ S n and A ≽ 0, B ≽ 0 (Example A.2.1.0.1)AB = BA ⇒ λ(AB) i =λ(A) i λ(B) i ≥ 0 ∀i ⇒ AB ≽ 0 (1477)AB = BA ⇒ λ(AB) i ≥ 0, λ(A) i λ(B) i ≥ 0 ∀i ⇔ AB ≽ 0 (1478)For A,B ∈ S n [202,7.7 prob.3] [203,4.2.13,5.2.1]A ≽ 0, B ≽ 0 ⇒ A ⊗ B ≽ 0 (1479)A ≽ 0, B ≽ 0 ⇒ A ◦ B ≽ 0 (1480)A ≻ 0, B ≻ 0 ⇒ A ⊗ B ≻ 0 (1481)A ≻ 0, B ≻ 0 ⇒ A ◦ B ≻ 0 (1482)where Kronecker and Hadamard products are symmetric.For A,B ∈ S n , (1450) A ≽ 0 ⇔ λ(A) ≽ 0 yetA ≽ 0 ⇒ δ(A) ≽ 0 (1483)A ≽ 0 ⇒ trA ≥ 0 (1484)A ≽ 0, B ≽ 0 ⇒ trA trB ≥ tr(AB) ≥ 0 (1485)[393,6.2] Because A≽0, B ≽0 ⇒ λ(AB)=λ( √ AB √ A)≽0 by(1463) and Corollary A.3.1.0.5, then we have tr(AB) ≥ 0.For A,B,C ∈ S n (Löwner)A ≽ 0 ⇔ tr(AB)≥ 0 ∀B ≽ 0 (378)A ≼ B , B ≼ C ⇒ A ≼ CA ≼ B ⇔ A + C ≼ B + CA ≼ B , A ≽ B ⇒ A = BA ≼ AA ≼ B , B ≺ C ⇒ A ≺ CA ≺ B ⇔ A + C ≺ B + C(transitivity)(additivity)(antisymmetry)(reflexivity)(strict transitivity)(strict additivity)(1486)(1487)For A,B ∈ R n×n x T Ax ≥ x T Bx ∀x ⇒ trA ≥ trB (1488)Proof. x T Ax≥x T Bx ∀x ⇔ λ((A −B) + (A −B) T )/2 ≽ 0 ⇒tr(A+A T −(B+B T ))/2 = tr(A −B)≥0. There is no converse.


A.3. PROPER STATEMENTS 607

For A,B ∈ S^n [393, 6.2] (Theorem A.3.1.0.4)

A ≽ B ⇒ trA ≥ trB   (1489)
A ≽ B ⇒ δ(A) ≽ δ(B)   (1490)

There is no converse, and restriction to the positive semidefinite cone does not improve the situation. All-strict versions hold.

A ≽ B ≽ 0 ⇒ rankA ≥ rankB   (1491)
A ≽ B ≽ 0 ⇒ detA ≥ detB ≥ 0   (1492)
A ≻ B ≽ 0 ⇒ detA > detB ≥ 0   (1493)

For A,B ∈ int S^n_+ [34, 4.2] [202, 7.7.4]

A ≽ B ⇔ A^{−1} ≼ B^{−1} ,  A ≻ 0 ⇔ A^{−1} ≻ 0   (1494)

For A,B ∈ S^n [393, 6.2]

A ≽ B ≽ 0 ⇒ √A ≽ √B
A ≽ 0 ⇐ A^{1/2} ≽ 0   (1495)

For A,B ∈ S^n and AB = BA [393, 6.2 prob.3]

A ≽ B ≽ 0 ⇒ A^k ≽ B^k ,  k = 1, 2,...   (1496)

A.3.1.0.1 Theorem. Positive semidefinite ordering of eigenvalues.
For A,B∈ R^{M×M} , place the eigenvalues of each symmetrized matrix into the respective vectors λ( (A + A^T)/2 ) , λ( (B + B^T)/2 ) ∈ R^M . Then [331, 6]

x^T A x ≥ 0 ∀x ⇔ λ(A + A^T) ≽ 0   (1497)
x^T A x > 0 ∀x ≠ 0 ⇔ λ(A + A^T) ≻ 0   (1498)

because x^T(A − A^T)x = 0. (1432) Now arrange the entries of λ( (A + A^T)/2 ) and λ( (B + B^T)/2 ) in nonincreasing order so λ( (A + A^T)/2 )_1 holds the largest eigenvalue of symmetrized A while λ( (B + B^T)/2 )_1 holds the largest eigenvalue of symmetrized B , and so on. Then [202, 7.7 prob.1 prob.9] for κ ∈ R

x^T A x ≥ x^T B x ∀x ⇒ λ(A + A^T) ≽ λ(B + B^T)
x^T A x ≥ x^T I x κ ∀x ⇔ λ( (A + A^T)/2 ) ≽ κ1   (1499)


608 APPENDIX A. LINEAR ALGEBRA

Now let A,B ∈ S^M have diagonalizations A = QΛQ^T and B = UΥU^T with λ(A) = δ(Λ) and λ(B) = δ(Υ) arranged in nonincreasing order. Then

A ≽ B ⇔ λ(A − B) ≽ 0   (1500)
A ≽ B ⇒ λ(A) ≽ λ(B)   (1501)
A ≽ B ⇍ λ(A) ≽ λ(B)   (1502)
S^T A S ≽ B ⇐ λ(A) ≽ λ(B)   (1503)

where S = QU^T . [393, 7.5]   ⋄

A.3.1.0.2 Theorem. (Weyl) Eigenvalues of sum. [202, 4.3.1]
For A,B∈ R^{M×M} , place the eigenvalues of each symmetrized matrix into the respective vectors λ( (A + A^T)/2 ) , λ( (B + B^T)/2 ) ∈ R^M in nonincreasing order so λ( (A + A^T)/2 )_1 holds the largest eigenvalue of symmetrized A while λ( (B + B^T)/2 )_1 holds the largest eigenvalue of symmetrized B , and so on. Then, for any k ∈ {1... M}

λ(A + A^T)_k + λ(B + B^T)_M ≤ λ( (A + A^T) + (B + B^T) )_k ≤ λ(A + A^T)_k + λ(B + B^T)_1   (1504)   ⋄

Weyl's theorem establishes: concavity of the smallest λ_M and convexity of the largest eigenvalue λ_1 of a symmetric matrix, via (497), and positive semidefiniteness of a sum of positive semidefinite matrices; for A,B∈ S^M_+

λ(A)_k + λ(B)_M ≤ λ(A + B)_k ≤ λ(A)_k + λ(B)_1   (1505)

Because S^M_+ is a convex cone (2.9.0.0.1), then by (175)

A,B ≽ 0 ⇒ ζA + ξB ≽ 0 for all ζ,ξ ≥ 0   (1506)

A.3.1.0.3 Corollary. Eigenvalues of sum and difference. [202, 4.3]
For A∈ S^M and B ∈ S^M_+ , place the eigenvalues of each matrix into respective vectors λ(A), λ(B)∈ R^M in nonincreasing order so λ(A)_1 holds the largest eigenvalue of A while λ(B)_1 holds the largest eigenvalue of B , and so on. Then, for any k ∈ {1... M}

λ(A − B)_k ≤ λ(A)_k ≤ λ(A + B)_k   (1507)   ⋄


A.3. PROPER STATEMENTS 609When B is rank-one positive semidefinite, the eigenvalues interlace; id est,for B = qq Tλ(A) k−1 ≤ λ(A − qq T ) k ≤ λ(A) k ≤ λ(A + qq T ) k ≤ λ(A) k+1 (1508)A.3.1.0.4 Theorem. Positive (semi)definite principal submatrices. A.10A∈ S M is positive semidefinite if and only if all M principal submatricesof dimension M−1 are positive semidefinite and detA is nonnegative.A ∈ S M is positive definite if and only if any one principal submatrixof dimension M −1 is positive definite and detA is positive. ⋄If any one principal submatrix of dimension M−1 is not positive definite,conversely, then A can neither be. Regardless of symmetry, if A∈ R M×M ispositive (semi)definite, then the determinant of each and every principalsubmatrix is (nonnegative) positive. [272,1.3.1]A.3.1.0.5[202, p.399]Corollary. Positive (semi)definite symmetric products.If A∈ S M is positive definite and any particular dimensionallycompatible matrix Z has no nullspace, then Z T AZ is positive definite.If matrix A∈ S M is positive (semi)definite then, for any matrix Z ofcompatible dimension, Z T AZ is positive semidefinite.A∈ S M is positive (semi)definite if and only if there exists a nonsingularZ such that Z T AZ is positive (semi)definite.If A∈ S M is positive semidefinite and singular it remains possible, forsome skinny Z ∈ R M×N with N


610 APPENDIX A. LINEAR ALGEBRAWe can deduce from these, given nonsingular matrix Z and any particulardimensionally[ ]compatible Y : matrix A∈ S M is positive semidefinite if andZTonly ifY T A[Z Y ] is positive semidefinite. In other words, from theCorollary it follows: for dimensionally compatible ZA ≽ 0 ⇔ Z T AZ ≽ 0 and Z T has a left inverseProducts such as Z † Z and ZZ † are symmetric and positive semidefinitealthough, given A ≽ 0, Z † AZ and ZAZ † are neither necessarily symmetricor positive semidefinite.A.3.1.0.6 Theorem. Symmetric projector semidefinite. [20,III][21,6] [222, p.55] For symmetric idempotent matrices P and RP,R ≽ 0P ≽ R ⇔ R(P ) ⊇ R(R) ⇔ N(P ) ⊆ N(R)(1509)Projector P is never positive definite [333,6.5 prob.20] unless it is theidentity matrix.⋄A.3.1.0.7 Theorem. Symmetric positive semidefinite. [202, p.400]Given real matrix Ψ with rank Ψ = 1Ψ ≽ 0 ⇔ Ψ = uu T (1510)where u is some real vector; id est, symmetry is necessary and sufficient forpositive semidefiniteness of a rank-1 matrix.⋄Proof. Any rank-one matrix must have the form Ψ = uv T . (B.1)Suppose Ψ is symmetric; id est, v = u . For all y ∈ R M , y T uu T y ≥ 0.Conversely, suppose uv T is positive semidefinite. We know that can hold ifand only if uv T + vu T ≽ 0 ⇔ for all normalized y ∈ R M , 2y T uv T y ≥ 0 ;but that is possible only if v = u .The same does not hold true for matrices of higher rank, as Example A.2.1.0.1shows.
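Several of the foregoing eigenvalue facts are easy to spot-check numerically. The following Matlab fragment is an illustrative sketch, not part of the original text; it checks λ(AB)_{1:η} = λ(BA)_{1:η} from (1463), the Ky Fan inequality (1727), and the commuting case (1476) where the commuting pair is built, as an assumption, from a shared set of eigenvectors:

% (1463): nonzero eigenvalues of AB and BA coincide
A = randn(3,5);  B = randn(5,3);
disp(sort(abs(eig(A*B)))')               % three eigenvalue magnitudes
disp(sort(abs(eig(B*A)))')               % the same three plus two (near) zeros

% Ky Fan (1727): tr(CD) <= lambda(C)'*lambda(D) with eigenvalues like-ordered
C = randn(4); C = C + C';   D = randn(4); D = D + D';
lC = sort(eig(C),'descend');  lD = sort(eig(D),'descend');
disp([trace(C*D)  lC'*lD])               % first entry never exceeds the second

% (1476): commuting positive semidefinite matrices have a PSD product
[S,~] = qr(randn(4));                    % shared orthonormal eigenvectors
P = S*diag([3 2 1 0])*S';   Q = S*diag([5 0 4 1])*S';
disp(norm(P*Q - Q*P,'fro'))              % ~0 : P and Q commute
disp(min(eig((P*Q + Q*P)/2)))            % nonnegative up to roundoff: PQ is PSD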


A.4. SCHUR COMPLEMENT 611A.4 Schur complementConsider Schur-form partitioned matrix G : Given A T = A and C T = C ,then [59][ ] A BG =B T ≽ 0C(1511)⇔ A ≽ 0, B T (I −AA † ) = 0, C −B T A † B ≽ 0⇔ C ≽ 0, B(I −CC † ) = 0, A−BC † B T ≽ 0where A † denotes the Moore-Penrose (pseudo)inverse (E). In the firstinstance, I − AA † is a symmetric projection matrix orthogonally projectingon N(A T ). (1910) It is apparently requiredR(B) ⊥ N(A T ) (1512)which precludes A = 0 when B is any nonzero matrix. Note that A ≻ 0 ⇒A † =A −1 ; thereby, the projection matrix vanishes. Likewise, in the secondinstance, I − CC † projects orthogonally on N(C T ). It is requiredR(B T ) ⊥ N(C T ) (1513)which precludes C =0 for B nonzero. Again, C ≻ 0 ⇒ C † = C −1 . So weget, for A or C nonsingular,[ ] A BG =B T ≽ 0C⇔A ≻ 0, C −B T A −1 B ≽ 0orC ≻ 0, A−BC −1 B T ≽ 0(1514)When A is full-rank then, for all B of compatible dimension, R(B) is inR(A). Likewise, when C is full-rank, R(B T ) is in R(C). Thus the flavor,for A and C nonsingular,[ ] A BG =B T ≻ 0C⇔ A ≻ 0, C −B T A −1 B ≻ 0⇔ C ≻ 0, A−BC −1 B T ≻ 0(1515)where C − B T A −1 B is called the Schur complement of A in G , while theSchur complement of C in G is A − BC −1 B T . [153,4.8]
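The nonsingular case (1514) is convenient to verify numerically. A hedged Matlab sketch (illustrative only; the particular construction of A , B , C is an assumption, not from the text):

A = randn(3); A = A*A' + eye(3);         % A is positive definite
B = randn(3,2);
C = B'*(A\B) + 0.1*eye(2);               % makes the Schur complement equal 0.1*I
G = [A B; B' C];
disp(min(eig((G+G')/2)))                 % positive: G is positive definite
disp(min(eig(C - B'*(A\B))))             % the Schur complement of A in G, here 0.1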


612 APPENDIX A. LINEAR ALGEBRAOrigin of the term Schur complement is from complementary inertia:[113,2.4.4] Defineinertia ( G∈ S M) {p,z,n} (1516)where p,z,n respectively represent number of positive, zero, and negativeeigenvalues of G ; id est,M = p + z + n (1517)Then, when A is invertible,and when C is invertible,inertia(G) = inertia(A) + inertia(C − B T A −1 B) (1518)inertia(G) = inertia(C) + inertia(A − BC −1 B T ) (1519)A.4.0.0.1 Example. Equipartition inertia. [55,1.2 exer.17]When A=C = 0, denoting nonincreasingly ordered singular values of matrixB ∈ R m×m by σ(B)∈ R m + , then we have eigenvaluesand[ ] A bb T c([ 0 Bλ(G) = λB T 0])=[σ(B)−Ξσ(B)](1520)inertia(G) = inertia(B T B) + inertia(−B T B) (1521)where Ξ is the order-reversing permutation matrix defined in (1728). A.4.0.0.2 Example. Nonnegative polynomial. [34, p.163]Quadratic multivariate polynomial x T Ax + 2b T x + c is a convex function ofvector x if and only if A ≽ 0, but sublevel set {x | x T Ax + 2b T x + c ≤ 0}is convex if A ≽ 0 yet not vice versa. Schur-form positive semidefinitenessis sufficient for polynomial convexity but necessary and sufficient fornonnegativity:≽ 0 ⇔ [xT 1] [ A bb T c][ x1]≥ 0 ⇔ x T Ax+2b T x+c ≥ 0 (1522)All is extensible to univariate polynomials; e.g., x [t n t n−1 t n−2 · · · t ] T .


A.4. SCHUR COMPLEMENT 613A.4.0.0.3 Example. Schur-form fractional function trace minimization.From (1484),[ ] A BB T ≽ 0 ⇒ tr(A + C) ≥ 0C⇕[ A 00 T C −B T A −1 B]≽ 0 ⇒ tr(C −B T A −1 B) ≥ 0(1523)⇕[ A−BC −1 B T 00 T C]≽ 0 ⇒ tr(A−BC −1 B T ) ≥ 0Since tr(C −B T A −1 B)≥ 0 ⇔ trC ≥ tr(B T A −1 B) ≥ 0 for example, thenminimization of trC is necessary and sufficient [ for ] minimization ofA Btr(C −B T A −1 B) when both are under constraintB T ≽ 0. CA.4.0.1Schur-form nullspace basisFrom (1511),[ A BG =B T C]≽ 0⇕[ A 00 T C −B T A † B]≽ 0 and B T (I −AA † ) = 0(1524)⇕[ A−BC † B T 00 T C]≽ 0 and B(I −CC † ) = 0These facts plus Moore-Penrose condition (E.0.1) provide a partial basis:([ ]) [ ]A B I −AA†0basis NB T ⊇C 0 T I −CC † (1525)


614 APPENDIX A. LINEAR ALGEBRAA.4.0.1.1 Example. Sparse Schur conditions.Setting matrix A to the identity simplifies the Schur conditions. Oneconsequence relates definiteness of three quantities:[ ][ ]I BI 0B T ≽ 0 ⇔ C − B T B ≽ 0 ⇔C0 T C −B T ≽ 0 (1526)BA.4.0.1.2 Exercise. Eigenvalues λ of sparse Schur-form.Prove: given C −B T B = 0, for B ∈ R m×n and C ∈ S n⎧([ ]) ⎪⎨ 1 + λ(C) i , 1 ≤ i ≤ nI BλB T = 1, n < i ≤ mCi⎪⎩0, otherwise(1527)A.4.0.1.3 Theorem. Rank of partitioned matrices. [393,2.2 prob.7]When symmetric matrix A is invertible and C is symmetric,[ ] [ ]A B A 0rankB T = rankC 0 T C −B T A −1 B(1528)= rankA + rank(C −B T A −1 B)equals rank of main diagonal block A plus rank of its Schur complement.Similarly, when symmetric matrix C is invertible and A is symmetric,[ ] [ ]A B A − BCrankB T = rank−1 B T 0C0 T C(1529)= rank(A − BC −1 B T ) + rankCProof. The first assertion (1528) holds if and only if [202,0.4.6c][ ] [ ]A B A 0∃ nonsingular X,Y XB T Y =C 0 T C −B T A −1 (1530)BLet [202,7.7.6]Y = X T =[ I −A −1 B0 T I]⋄(1531)


A.4. SCHUR COMPLEMENT 615From Corollary A.3.1.0.3 eigenvalues are related by0 ≼ λ(C −B T A −1 B) ≼ λ(C) (1532)0 ≼ λ(A − BC −1 B T ) ≼ λ(A) (1533)which meansrank(C −B T A −1 B) ≤ rankC (1534)rank(A − BC −1 B T ) ≤ rankA (1535)Therefore[ A BrankB T C]≤ rankA + rankC (1536)A.4.0.1.4 Lemma. Rank of Schur-form block. [136] [134]Matrix B ∈ R m×n has rankB≤ ρ if and only if there exist matrices A∈ S mand C ∈ S n such that[ ][ ]A 0A Brank0 T ≤ 2ρ and G =CB T ≽ 0 (1537)C⋄Schur-form positive semidefiniteness alone implies rankA ≥ rankB andrankC ≥ rankB . But, even in absence of semidefiniteness, we must alwayshave rankG ≥ rankA, rankB, rankC by fundamental linear algebra.A.4.1Determinant[ ] A BG =B T C(1538)We consider again a matrix G partitioned like (1511), but not necessarilypositive (semi)definite, where A and C are symmetric.When A is invertible,When C is invertible,detG = detA det(C − B T A −1 B) (1539)detG = detC det(A − BC −1 B T ) (1540)


616 APPENDIX A. LINEAR ALGEBRAWhen B is full-rank and skinny, C = 0, and A ≽ 0, then [61,10.1.1]detG ≠ 0 ⇔ A + BB T ≻ 0 (1541)When B is a (column) vector, then for all C ∈ R and all A of dimensioncompatible with GdetG = det(A)C − B T A T cofB (1542)while for C ≠ 0detG = C det(A − 1 C BBT ) (1543)where A cof is the matrix of cofactors [331,4] corresponding to A .When B is full-rank and fat, A = 0, and C ≽ 0, thendetG ≠ 0 ⇔ C + B T B ≻ 0 (1544)When B is a row-vector, then for A ≠ 0 and all C of dimensioncompatible with Gwhile for all A∈ RdetG = A det(C − 1 A BT B) (1545)detG = det(C)A − BC T cofB T (1546)where C cof is the matrix of cofactors corresponding to C .A.5 Eigenvalue decompositionAll square matrices have associated eigenvalues λ and eigenvectors; if notsquare, Ax = λ i x becomes impossible dimensionally. Eigenvectors mustbe nonzero. Prefix eigen is from the German; in this context meaning,something akin to “characteristic”. [328, p.14]When a square matrix X ∈ R m×m is diagonalizable, [331,5.6] then⎡ ⎤w T1 m∑X = SΛS −1 = [s 1 · · · s m ] Λ⎣. ⎦ = λ i s i wi T (1547)w T mi=1


A.5. EIGENVALUE DECOMPOSITION 617where {s i ∈ N(X − λ i I)⊆ C m } are l.i. (right-)eigenvectors constituting thecolumns of S ∈ C m×m defined byXS = SΛ rather Xs i λ i s i , i = 1... m (1548){w i ∈ N(X T − λ i I)⊆ C m } are linearly independent left-eigenvectors of X(eigenvectors of X T ) constituting the rows of S −1 defined by [202]S −1 X = ΛS −1 rather w T i X λ i w T i , i = 1... m (1549)and where {λ i ∈ C} are eigenvalues (1461)δ(λ(X)) = Λ ∈ C m×m (1550)corresponding to both left and right eigenvectors; id est, λ(X) = λ(X T ).There is no connection between diagonalizability and invertibility of X .[331,5.2] Diagonalizability is guaranteed by a full set of linearly independenteigenvectors, whereas invertibility is guaranteed by all nonzero eigenvalues.distinct eigenvalues ⇒ l.i. eigenvectors ⇔ diagonalizablenot diagonalizable ⇒ repeated eigenvalue(1551)A.5.0.0.1 Theorem. Real eigenvector.Eigenvectors of a real matrix corresponding to real eigenvalues must be real.⋄Proof. Ax = λx . Given λ=λ ∗ , x H Ax = λx H x = λ‖x‖ 2 = x T Ax ∗x = x ∗ , where x H =x ∗T . The converse is equally simple. ⇒A.5.0.1UniquenessFrom the fundamental theorem of algebra, [219] which guarantees existenceof zeros for a given polynomial, it follows: eigenvalues, including theirmultiplicity, for a given square matrix are unique; meaning, there is no otherset of eigenvalues for that matrix. (Conversely, many different matrices mayshare the same unique set of eigenvalues; e.g., for any X , λ(X) = λ(X T ).)Uniqueness of eigenvectors, in contrast, disallows multiplicity of the samedirection:


618 APPENDIX A. LINEAR ALGEBRAA.5.0.1.1 Definition. Unique eigenvectors.When eigenvectors are unique, we mean: unique to within a real nonzeroscaling, and their directions are distinct.△If S is a matrix of eigenvectors of X as in (1547), for example, then −Sis certainly another matrix of eigenvectors decomposing X with the sameeigenvalues.For any square matrix, the eigenvector corresponding to a distincteigenvalue is unique; [328, p.220]distinct eigenvalues ⇒ eigenvectors unique (1552)Eigenvectors corresponding to a repeated eigenvalue are not unique for adiagonalizable matrix;repeated eigenvalue ⇒ eigenvectors not unique (1553)Proof follows from the observation: any linear combination of distincteigenvectors of diagonalizable X , corresponding to a particular eigenvalue,produces another eigenvector. For eigenvalue λ whose multiplicity A.12dim N(X −λI) exceeds 1, in other words, any choice of independentvectors from N(X −λI) (of the same multiplicity) constitutes eigenvectorscorresponding to λ .Caveat diagonalizability insures linear independence which impliesexistence of distinct eigenvectors. We may conclude, for diagonalizablematrices,distinct eigenvalues ⇔ eigenvectors unique (1554)A.5.0.2Invertible matrixWhen diagonalizable matrix X ∈ R m×m is nonsingular (no zero eigenvalues),then it has an inverse obtained simply by inverting eigenvalues in (1547):X −1 = SΛ −1 S −1 (1555)A.12 A matrix is diagonalizable iff algebraic multiplicity (number of occurrences of sameeigenvalue) equals geometric multiplicity dim N(X −λI) = m − rank(X −λI) [328, p.15](number of Jordan blocks w.r.t λ or number of corresponding l.i. eigenvectors).
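A brief numerical illustration of (1547)-(1549) and (1555) follows; it is a hedged Matlab sketch, not part of the original text, relying on the fact that a generic random matrix is diagonalizable:

X = randn(4);
[S,L] = eig(X);                    % columns of S: right eigenvectors; L = Lambda
W = inv(S);                        % rows of W: left eigenvectors w_i'
disp(norm(X - S*L*W))              % ~0 : X = S*Lambda*S^-1            (1547)
disp(norm(X*S - S*L))              % ~0 : X*s_i = lambda_i*s_i         (1548)
disp(norm(W*X - L*W))              % ~0 : w_i'*X = lambda_i*w_i'       (1549)
disp(norm(inv(X) - S/L*W))         % ~0 : X^-1 = S*Lambda^-1*S^-1      (1555)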


A.5. EIGENVALUE DECOMPOSITION 619A.5.0.3eigenmatrixThe (right-)eigenvectors {s i } (1547) are naturally orthogonal w T i s j =0to left-eigenvectors {w i } except, for i=1... m , w T i s i =1 ; called abiorthogonality condition [365,2.2.4] [202] because neither set of left or righteigenvectors is necessarily an orthogonal set. Consequently, each dyad from adiagonalization is an independent (B.1.1) nonorthogonal projector becauses i w T i s i w T i = s i w T i (1556)(whereas the dyads of singular value decomposition are not inherentlyprojectors (confer (1563))).Dyads of eigenvalue decomposition can be termed eigenmatrices becauseSum of the eigenmatrices is the identity;A.5.1X s i w T i = λ i s i w T i (1557)m∑s i wi T = I (1558)i=1Symmetric matrix diagonalizationThe set of normal matrices is, precisely, that set of all real matrices havinga complete orthonormal set of eigenvectors; [393,8.1] [333, prob.10.2.31]id est, any matrix X for which XX T = X T X ; [159,7.1.3] [328, p.3]e.g., orthogonal and circulant matrices [168]. All normal matrices arediagonalizable.A symmetric matrix is a special normal matrix whose eigenvalues Λmust be real A.13 and whose eigenvectors S can be chosen to make a realorthonormal set; [333,6.4] [331, p.315] id est, for X ∈ S ms T1X = SΛS T = [s 1 · · · s m ] Λ⎣.⎡s T m⎤⎦ =m∑λ i s i s T i (1559)A.13 Proof. Suppose λ i is an eigenvalue corresponding to eigenvector s i of real A=A T .Then s H i As i= s T i As∗ i (by transposition) ⇒ s ∗Ti λ i s i = s T i λ∗ i s∗ i because (As i ) ∗ = (λ i s i ) ∗ byassumption. So we have λ i ‖s i ‖ 2 = λ ∗ i ‖s i‖ 2 . There is no converse.i=1


620 APPENDIX A. LINEAR ALGEBRAwhere δ 2 (Λ) = Λ∈ S m (A.1) and S −1 = S T ∈ R m×m (orthogonal matrix,B.5) because of symmetry: SΛS −1 = S −T ΛS T . By 0 eigenvalues theoremA.7.3.0.1,R{s i |λ i ≠0} = R(A) = R(A T )R{s i |λ i =0} = N(A T (1560)) = N(A)A.5.1.1Diagonal matrix diagonalizationBecause arrangement of eigenvectors and their corresponding eigenvalues isarbitrary, we almost always arrange eigenvalues in nonincreasing order asis the convention for singular value decomposition. Then to diagonalize asymmetric matrix that is already a diagonal matrix, orthogonal matrix Sbecomes a permutation matrix.A.5.1.2Invertible symmetric matrixWhen symmetric matrix X ∈ S m is nonsingular (invertible), then its inverse(obtained by inverting eigenvalues in (1559)) is also symmetric:A.5.1.3X −1 = SΛ −1 S T ∈ S m (1561)Positive semidefinite matrix square rootWhen X ∈ S m + , its unique positive semidefinite matrix square root is defined√X S√ΛS T ∈ S m + (1562)where the square root of nonnegative diagonal matrix √ Λ is taken entrywiseand positive. Then X = √ X √ X .A.6 Singular value decomposition, SVDA.6.1Compact SVD[159,2.5.4] For any A∈ R m×n⎡q T1A = UΣQ T = [ u 1 · · · u η ] Σ⎣.⎤∑⎦ = η σ i u i qiTqηT i=1(1563)U ∈ R m×η , Σ ∈ R η×η , Q ∈ R n×η


A.6. SINGULAR VALUE DECOMPOSITION, SVD 621

where U and Q are always skinny-or-square, each having orthonormal columns, and where

η ≜ min{m , n}   (1564)

Square matrix Σ is diagonal (A.1.1)

δ^2(Σ) = Σ ∈ R^{η×η}   (1565)

holding the singular values {σ_i ∈ R} of A which are always arranged in nonincreasing order by convention and are related to eigenvalues λ by A.14

σ(A)_i = σ(A^T)_i = √(λ(A^T A)_i) = √(λ(AA^T)_i) = λ(√(A^T A))_i = λ(√(AA^T))_i > 0 ,  1 ≤ i ≤ ρ
σ(A)_i = 0 ,  ρ < i ≤ η   (1566)

of which the last η − ρ are 0 , A.15 where

ρ ≜ rankA = rank Σ   (1567)

A point sometimes lost: Any real matrix may be decomposed in terms of its real singular values σ(A)∈ R^η and real matrices U and Q as in (1563), where [159, 2.5.3]

R{u_i | σ_i ≠ 0} = R(A)
R{u_i | σ_i = 0} ⊆ N(A^T)
R{q_i | σ_i ≠ 0} = R(A^T)
R{q_i | σ_i = 0} ⊆ N(A)   (1568)

A.6.2 Subcompact SVD

Some authors allow only nonzero singular values. In that case the compact decomposition can be made smaller; it can be redimensioned in terms of rank ρ because, for any A∈ R^{m×n}

ρ = rankA = rank Σ = max{i∈{1... η} | σ_i ≠ 0} ≤ η   (1569)

There are η singular values. For any flavor SVD, rank is equivalent to the number of nonzero singular values on the main diagonal of Σ .

A.14 When matrix A is normal, σ(A) = |λ(A)| . [393, 8.1]
A.15 For η = n , σ(A) = √(λ(A^T A)) = λ(√(A^T A)) . For η = m , σ(A) = √(λ(AA^T)) = λ(√(AA^T)) .
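Relation (1566) and the compact decomposition (1563) can be spot-checked numerically; this short Matlab fragment is an illustrative sketch, not part of the original text:

A = randn(5,3);
s = svd(A);                                   % nonincreasing by convention
disp([s  sqrt(sort(eig(A'*A),'descend'))])    % columns agree: sigma = sqrt(lambda(A'A))
[U,S,Q] = svd(A,'econ');                      % compact SVD (1563): U 5x3, S 3x3, Q 3x3
disp(norm(A - U*S*Q','fro'))                  % ~0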


622 APPENDIX A. LINEAR ALGEBRANowq T1A = UΣQ T = [ u 1 · · · u ρ ] Σ⎣.⎡⎤∑⎦ = ρ σ i u i qiTqρT i=1(1570)U ∈ R m×ρ , Σ ∈ R ρ×ρ , Q ∈ R n×ρwhere the main diagonal of diagonal matrix Σ has no 0 entries, andA.6.3Full SVDR{u i } = R(A)R{q i } = R(A T )(1571)Another common and useful expression of the SVD makes U and Qsquare; making the decomposition larger than compact SVD. Completingthe nullspace bases in U and Q from (1568) provides what is called thefull singular value decomposition of A∈ R m×n [331, App.A]. Orthonormalmatrices U and Q become orthogonal matrices (B.5):R{u i |σ i ≠0} = R(A)R{u i |σ i =0} = N(A T )R{q i |σ i ≠0} = R(A T )R{q i |σ i =0} = N(A)For any matrix A having rank ρ (= rank Σ)⎡ ⎤q T1 ∑A = UΣQ T = [ u 1 · · · u m ] Σ⎣. ⎦ = η σ i u i qiTq T n⎡σ 1= [ m×ρ basis R(A) m×m−ρ basis N(A T ) ] σ 2⎢⎣...i=1(1572)⎤⎡ ( n×ρ basis R(A T ) ) ⎤T⎥⎣⎦⎦(n×n−ρ basis N(A)) TU ∈ R m×m , Σ ∈ R m×n , Q ∈ R n×n (1573)where upper limit of summation η is defined in (1564). Matrix Σ is nolonger necessarily square, now padded with respect to (1565) by m−ηzero rows or n−η zero columns; the nonincreasingly ordered (possibly 0)singular values appear along its main diagonal as for compact SVD (1566).


A.6. SINGULAR VALUE DECOMPOSITION, SVD 623

[Figure 155 (graphic): the full SVD A = UΣQ^T = [u_1 · · · u_m] Σ [q_1^T ; ... ; q_n^T] = Σ_{i=1}^η σ_i u_i q_i^T acting on the unit circle; Q^T rotates the axes q_1, q_2 onto the coordinate axes, Σ stretches the circle into an ellipse, and U rotates the ellipse into final position with principal axes u_1, u_2.]

Figure 155: Geometrical interpretation of full SVD [271]: Image of circle {x∈ R^2 | ‖x‖_2 = 1} under matrix multiplication Ax is, in general, an ellipse. For the example illustrated, U ≜ [u_1 u_2] ∈ R^{2×2} , Q ≜ [q_1 q_2] ∈ R^{2×2} .


624 APPENDIX A. LINEAR ALGEBRAAn important geometrical interpretation of SVD is given in Figure 155for m = n = 2 : The image of the unit sphere under any m × n matrixmultiplication is an ellipse. Considering the three factors of the SVDseparately, note that Q T is a pure rotation of the circle. Figure 155 showshow the axes q 1 and q 2 are first rotated by Q T to coincide with the coordinateaxes. Second, the circle is stretched by Σ in the directions of the coordinateaxes to form an ellipse. The third step rotates the ellipse by U into itsfinal position. Note how q 1 and q 2 are rotated to end up as u 1 and u 2 , theprincipal axes of the final ellipse. A direct calculation shows that Aq j = σ j u j .Thus q j is first rotated to coincide with the j th coordinate axis, stretched bya factor σ j , and then rotated to point in the direction of u j . All of thisis beautifully illustrated for 2 ×2 matrices by the Matlab code eigshow.m(see [334]).A direct consequence of the geometric interpretation is that the largestsingular value σ 1 measures the “magnitude” of A (its 2-norm):‖A‖ 2 = sup ‖Ax‖ 2 = σ 1 (1574)‖x‖ 2 =1This means that ‖A‖ 2 is the length of the longest principal semiaxis of theellipse.Expressions for U , Q , and Σ follow readily from (1573),AA T U = UΣΣ T and A T AQ = QΣ T Σ (1575)demonstrating that the columns of U are the eigenvectors of AA T and thecolumns of Q are the eigenvectors of A T A. −Muller, Magaia, & Herbst [271]A.6.4Pseudoinverse by SVDMatrix pseudoinverse (E) is nearly synonymous with singular valuedecomposition because of the elegant expression, given A = UΣQ T ∈ R m×nA † = QΣ †T U T ∈ R n×m (1576)that applies to all three flavors of SVD, where Σ † simply inverts nonzeroentries of matrix Σ .Given symmetric matrix A∈ S n and its diagonalization A = SΛS T(A.5.1), its pseudoinverse simply inverts all nonzero eigenvalues:A † = SΛ † S T ∈ S n (1577)
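Expression (1576) is easy to confirm against Matlab's built-in pinv; the following fragment is an illustrative sketch, not part of the original text, and the rank-deficient construction and tolerance are assumptions:

A = randn(4,6);  A(4,:) = A(1,:);        % rank-deficient by construction (rank 3)
[U,S,Q] = svd(A);                        % full SVD: A = U*S*Q', with S 4x6
Sdag = S;                                % Sigma-dagger: invert only the nonzero entries of S
idx = find(S > 1e-10);
Sdag(idx) = 1./S(idx);
Adag = Q*Sdag'*U';                       % (1576): A^dagger = Q * Sigma^dagger' * U'
disp(norm(Adag - pinv(A)))               % ~0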


A.7. ZEROS 625A.6.5SVD of symmetric matricesFrom (1566) and (1562) for A = A T⎧√ (√⎨ λ(A2 ) i = λ A 2)= |λ(A) i | > 0, 1 ≤ i ≤ ρσ(A) i =i⎩0, ρ < i ≤ η(1578)A.6.5.0.1 Definition. Step function. (confer4.3.2.0.1)Define the signum-like quasilinear function ψ : R n → R n that takes value 1corresponding to a 0-valued entry in its argument:[ψ(a) limx i →a ix i|x i | = { 1, ai ≥ 0−1, a i < 0 , i=1... n ]∈ R n (1579)Eigenvalue signs of a symmetric matrix having diagonalizationA = SΛS T (1559) can be absorbed either into real U or real Q from thefull SVD; [350, p.34] (conferC.4.2.1)△orA = SΛS T = Sδ(ψ(δ(Λ))) |Λ|S T U ΣQ T ∈ S n (1580)A = SΛS T = S|Λ|δ(ψ(δ(Λ)))S T UΣ Q T ∈ S n (1581)where matrix of singular values Σ = |Λ| denotes entrywise absolute value ofdiagonal eigenvalue matrix Λ .A.7 ZerosA.7.1norm zeroFor any given norm, by definition,‖x‖ l= 0 ⇔ x = 0 (1582)Consequently, a generally nonconvex constraint in x like ‖Ax − b‖ = κbecomes convex when κ = 0.


626 APPENDIX A. LINEAR ALGEBRAA.7.20 entryIf a positive semidefinite matrix A = [A ij ] ∈ R n×n has a 0 entry A ii on itsmain diagonal, then A ij + A ji = 0 ∀j . [272,1.3.1]Any symmetric positive semidefinite matrix having a 0 entry on its maindiagonal must be 0 along the entire row and column to which that 0 entrybelongs. [159,4.2.8] [202,7.1 prob.2]A.7.3 0 eigenvalues theoremThis theorem is simple, powerful, and widely applicable:A.7.3.0.1 Theorem. Number of 0 eigenvalues.For any matrix A∈ R m×nrank(A) + dim N(A) = n (1583)by conservation of dimension. [202,0.4.4]For any square matrix A∈ R m×m , number of 0 eigenvalues is at leastequal to dim N(A)dim N(A) ≤ number of 0 eigenvalues ≤ m (1584)while all eigenvectors corresponding to those 0 eigenvalues belong to N(A).[331,5.1] A.16For diagonalizable matrix A (A.5), the number of 0 eigenvalues isprecisely dim N(A) while the corresponding eigenvectors span N(A). Realand imaginary parts of the eigenvectors remaining span R(A).A.16 We take as given the well-known fact that the number of 0 eigenvalues cannot be lessthan dimension of the nullspace. We offer an example of the converse:⎡A = ⎢⎣1 0 1 00 0 1 00 0 0 01 0 0 0dim N(A)=2, λ(A)=[0 0 0 1] T ; three eigenvectors in the nullspace but only two areindependent. The right-hand side of (1584) is tight for nonzero matrices; e.g., (B.1) dyaduv T ∈ R m×m has m 0-eigenvalues when u ∈ v ⊥ .⎤⎥⎦
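The footnote's example is easily reproduced. The Matlab lines below are an illustrative sketch, not part of the original text; the loose tolerance is an assumption made because the repeated 0 eigenvalue of this defective matrix is computed only approximately:

A = [1 0 1 0; 0 0 1 0; 0 0 0 0; 1 0 0 0];
disp(rank(A) + size(null(A),2))     % = 4 : conservation of dimension (1583)
disp(size(null(A),2))               % dim N(A) = 2
disp(sum(abs(eig(A)) < 1e-6))       % number of 0 eigenvalues = 3 > dim N(A), so (1584) can be strict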


A.7. ZEROS 627(Transpose.)Likewise, for any matrix A∈ R m×nrank(A T ) + dim N(A T ) = m (1585)For any square A∈ R m×m , number of 0 eigenvalues is at least equalto dim N(A T )=dim N(A) while all left-eigenvectors (eigenvectors of A T )corresponding to those 0 eigenvalues belong to N(A T ).For diagonalizable A , number of 0 eigenvalues is precisely dim N(A T )while the corresponding left-eigenvectors span N(A T ). Real and imaginaryparts of the left-eigenvectors remaining span R(A T ).⋄Proof. First we show, for a diagonalizable matrix, the number of 0eigenvalues is precisely the dimension of its nullspace while the eigenvectorscorresponding to those 0 eigenvalues span the nullspace:Any diagonalizable matrix A∈ R m×m must possess a complete set oflinearly independent eigenvectors. If A is full-rank (invertible), then allm=rank(A) eigenvalues are nonzero. [331,5.1]Suppose rank(A)< m . Then dim N(A) = m−rank(A). Thus there isa set of m−rank(A) linearly independent vectors spanning N(A). Eachof those can be an eigenvector associated with a 0 eigenvalue becauseA is diagonalizable ⇔ ∃ m linearly independent eigenvectors. [331,5.2]Eigenvectors of a real matrix corresponding to 0 eigenvalues must be real. A.17Thus A has at least m−rank(A) eigenvalues equal to 0.Now suppose A has more than m−rank(A) eigenvalues equal to 0.Then there are more than m−rank(A) linearly independent eigenvectorsassociated with 0 eigenvalues, and each of those eigenvectors must be inN(A). Thus there are more than m−rank(A) linearly independent vectorsin N(A) ; a contradiction.Diagonalizable A therefore has rank(A) nonzero eigenvalues and exactlym−rank(A) eigenvalues equal to 0 whose corresponding eigenvectorsspan N(A).By similar argument, the left-eigenvectors corresponding to 0 eigenvaluesspan N(A T ).Next we show when A is diagonalizable, the real and imaginary parts ofits eigenvectors (corresponding to nonzero eigenvalues) span R(A) :A.17 Proof. Let ∗ denote complex conjugation. Suppose A=A ∗ and As i =0. Thens i = s ∗ i ⇒ As i =As ∗ i ⇒ As ∗ i =0. Conversely, As∗ i =0 ⇒ As i=As ∗ i ⇒ s i= s ∗ i .


628 APPENDIX A. LINEAR ALGEBRAThe (right-)eigenvectors of a diagonalizable matrix A∈ R m×m are linearlyindependent if and only if the left-eigenvectors are. So, matrix A hasa representation in terms of its right- and left-eigenvectors; from thediagonalization (1547), assuming 0 eigenvalues are ordered last,A =m∑λ i s i wi T =i=1k∑≤ mi=1λ i ≠0λ i s i w T i (1586)From the linearly independent dyads theorem (B.1.1.0.2), the dyads {s i w T i }must be independent because each set of eigenvectors are; hence rankA = k ,the number of nonzero eigenvalues. Complex eigenvectors and eigenvaluesare common for real matrices, and must come in complex conjugate pairs forthe summation to remain real. Assume that conjugate pairs of eigenvaluesappear in sequence. Given any particular conjugate pair from (1586), we getthe partial summationλ i s i w T i + λ ∗ i s ∗ iw ∗Ti = 2re(λ i s i w T i )= 2 ( res i re(λ i w T i ) − im s i im(λ i w T i ) ) (1587)where A.18 λ ∗ i λ i+1 , s ∗ i s i+1 , and w ∗ i w i+1 . Then (1586) isequivalently writtenA = 2 ∑ iλ ∈ Cλ i ≠0res 2i re(λ 2i w T 2i) − im s 2i im(λ 2i w T 2i) + ∑ jλ ∈ Rλ j ≠0λ j s j w T j (1588)The summation (1588) shows: A is a linear combination of real and imaginaryparts of its (right-)eigenvectors corresponding to nonzero eigenvalues. Thek vectors {re s i ∈ R m , ims i ∈ R m | λ i ≠0, i∈{1... m}} must therefore spanthe range of diagonalizable matrix A .The argument is similar regarding span of the left-eigenvectors. A.7.40 trace and matrix productFor X,A∈ R M×N+ (39)tr(X T A) = 0 ⇔ X ◦ A = A ◦ X = 0 (1589)A.18 Complex conjugate of w is denoted w ∗ . Conjugate transpose is denoted w H = w ∗T .


A.7. ZEROS 629For X,A∈ S M +[34,2.6.1 exer.2.8] [359,3.1]tr(XA) = 0 ⇔ XA = AX = 0 (1590)Proof. (⇐) Suppose XA = AX = 0. Then tr(XA)=0 is obvious.(⇒) Suppose tr(XA)=0. tr(XA)= tr( √ AX √ A) whose argument ispositive semidefinite by Corollary A.3.1.0.5. Trace of any square matrix isequivalent to the sum of its eigenvalues. Eigenvalues of a positive semidefinitematrix can total 0 if and only if each and every nonnegative eigenvalue is 0.The only positive semidefinite matrix, having all 0 eigenvalues, resides at theorigin; (confer (1614)) id est,√AX√A =(√X√A) T√X√A = 0 (1591)implying √ X √ A = 0 which in turn implies √ X( √ X √ A) √ A = XA = 0.Arguing similarly yields AX = 0.Diagonalizable matrices A and X are simultaneously diagonalizable if andonly if they are commutative under multiplication; [202,1.3.12] id est, iffthey share a complete set of eigenvectors.A.7.4.0.1 Example. An equivalence in nonisomorphic spaces.Identity (1590) leads to an unusual equivalence relating convex geometry totraditional linear algebra: The convex sets, given A ≽ 0{X | 〈X , A〉 = 0} ∩ {X ≽ 0} ≡ {X | N(X) ⊇ R(A)} ∩ {X ≽ 0} (1592)(one expressed in terms of a hyperplane, the other in terms of nullspace andrange) are equivalent only when symmetric matrix A is positive semidefinite.We might apply this equivalence to the geometric center subspace, forexample,S M c = {Y ∈ S M | Y 1 = 0}= {Y ∈ S M | N(Y ) ⊇ 1} = {Y ∈ S M | R(Y ) ⊆ N(1 T )}(1998)from which we derive (confer (994))S M c ∩ S M + ≡ {X ≽ 0 | 〈X , 11 T 〉 = 0} (1593)


630 APPENDIX A. LINEAR ALGEBRA

A.7.5 Zero definite

The domain over which an arbitrary real matrix A is zero definite can exceed its left and right nullspaces. For any positive semidefinite matrix A∈ R^{M×M} (for A + A^T ≽ 0)

{x | x^T A x = 0} = N(A + A^T)   (1594)

because ∃R  A + A^T = R^T R , ‖Rx‖ = 0 ⇔ Rx = 0 , and N(A + A^T) = N(R) . Then given any particular vector x_p , x_p^T A x_p = 0 ⇔ x_p ∈ N(A + A^T) . For any positive definite matrix A (for A + A^T ≻ 0)

{x | x^T A x = 0} = 0   (1595)

Further, [393, 3.2 prob.5]

{x | x^T A x = 0} = R^M ⇔ A^T = −A   (1596)

while

{x | x^H A x = 0} = C^M ⇔ A = 0   (1597)

The positive semidefinite matrix

A = [ 1 2 ; 0 1 ]   (1598)

for example, has no nullspace. Yet

{x | x^T A x = 0} = {x | 1^T x = 0} ⊂ R^2   (1599)

which is the nullspace of the symmetrized matrix. Symmetric matrices are not spared from the excess; videlicet,

B = [ 1 2 ; 2 1 ]   (1600)

has eigenvalues {−1, 3} , no nullspace, but is zero definite on A.19

X ≜ {x∈ R^2 | x_2 = (−2 ± √3)x_1}   (1601)

A.19 These two lines represent the limit in the union of two generally distinct hyperbolae; id est, for matrix B and set X as defined
lim_{ε→0+} {x∈ R^2 | x^T B x = ε} = X
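A quick numerical confirmation of (1600)-(1601) can be run in Matlab; this is an illustrative sketch, not part of the original text:

B = [1 2; 2 1];
disp(eig(B))                              % {-1, 3}
disp(size(null(B),2))                     % 0 : no nullspace
x = [1; -2 + sqrt(3)];  disp(x'*B*x)      % ~0 : B is zero definite along this line
x = [1; -2 - sqrt(3)];  disp(x'*B*x)      % ~0 : and along this one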


A.7. ZEROS 631A.7.5.0.1 Proposition. (Sturm/Zhang) Dyad-decompositions. [337,5.2]Let positive semidefinite matrix X ∈ S M + have rank ρ . Then given symmetricmatrix A∈ S M , 〈A, X〉 = 0 if and only if there exists a dyad-decompositionsatisfyingX =ρ∑x j xj T (1602)j=1〈A , x j x T j 〉 = 0 for each and every j ∈ {1... ρ} (1603)⋄The dyad-decomposition of X proposed is generally not that obtainedfrom a standard diagonalization by eigenvalue decomposition, unless ρ =1or the given matrix A is simultaneously diagonalizable (A.7.4) with X .That means, elemental dyads in decomposition (1602) constitute a generallynonorthogonal set. Sturm & Zhang give a simple procedure for constructingthe dyad-decomposition [Wıκımization]; matrix A may be regarded as aparameter.A.7.5.0.2 Example. Dyad.The dyad uv T ∈ R M×M (B.1) is zero definite on all x for which eitherx T u=0 or x T v=0;{x | x T uv T x = 0} = {x | x T u=0} ∪ {x | v T x=0} (1604)id est, on u ⊥ ∪ v ⊥ . Symmetrizing the dyad does not change the outcome:{x | x T (uv T + vu T )x/2 = 0} = {x | x T u=0} ∪ {x | v T x=0} (1605)




Appendix BSimple matricesMathematicians also attempted to develop algebra of vectors butthere was no natural definition of the product of two vectorsthat held in arbitrary dimensions. The first vector algebra thatinvolved a noncommutative vector product (that is, v×w need notequal w×v) was proposed by Hermann Grassmann in his bookAusdehnungslehre (1844). Grassmann’s text also introduced theproduct of a column matrix and a row matrix, which resulted inwhat is now called a simple or a rank-one matrix. In the late19th century the American mathematical physicist Willard Gibbspublished his famous treatise on vector analysis. In that treatiseGibbs represented general matrices, which he called dyadics, assums of simple matrices, which Gibbs called dyads. Later thephysicist P. A. M. Dirac introduced the term “bra-ket” for whatwe now call the scalar product of a “bra” (row) vector times a“ket” (column) vector and the term “ket-bra” for the product of aket times a bra, resulting in what we now call a simple matrix, asabove. Our convention of identifying column matrices and vectorswas introduced by physicists in the 20th century.−Marie A. Vitulli [368]2001 Jon Dattorro. co&edg version 2010.10.26. All rights reserved.citation: Dattorro, <strong>Convex</strong> <strong>Optimization</strong> & Euclidean Distance Geometry,Mεβoo Publishing USA, 2005, <strong>v2010.10.26</strong>.633


634 APPENDIX B. SIMPLE MATRICESB.1 Rank-one matrix (dyad)Any matrix formed from the unsigned outer product of two vectors,Ψ = uv T ∈ R M×N (1606)where u ∈ R M and v ∈ R N , is rank-one and called a dyad. Conversely, anyrank-one matrix must have the form Ψ . [202, prob.1.4.1] Product −uv T isa negative dyad. For matrix products AB T , in general, we haveR(AB T ) ⊆ R(A) , N(AB T ) ⊇ N(B T ) (1607)with equality when B =A [331,3.3,3.6] B.1 or respectively when B isinvertible and N(A)=0. Yet for all nonzero dyads we haveR(uv T ) = R(u) , N(uv T ) = N(v T ) ≡ v ⊥ (1608)where dim v ⊥ =N −1.It is obvious a dyad can be 0 only when u or v is 0 ;Ψ = uv T = 0 ⇔ u = 0 or v = 0 (1609)The matrix 2-norm for Ψ is equivalent to Frobenius’ norm;‖Ψ‖ 2 = σ 1 = ‖uv T ‖ F = ‖uv T ‖ 2 = ‖u‖ ‖v‖ (1610)When u and v are normalized, the pseudoinverse is the transposed dyad.Otherwise,Ψ † = (uv T ) † vu T=(1611)‖u‖ 2 ‖v‖ 2B.1 Proof. R(AA T ) ⊆ R(A) is obvious.R(AA T ) = {AA T y | y ∈ R m }⊇ {AA T y | A T y ∈ R(A T )} = R(A) by (142)


B.1. RANK-ONE MATRIX (DYAD) 635

[Figure 156 (diagram): R^N = R(v) ⊞ N(uv^T) is mapped to N(u^T) ⊞ R(uv^T) = R^M ; subspace R(v) is carried onto R(Ψ) = R(u) while N(Ψ) = N(v^T) is collapsed to 0 , and N(u^T) receives nothing.]

Figure 156: The four fundamental subspaces [333, 3.6] of any dyad Ψ = uv^T ∈ R^{M×N} : R(v) ⊥ N(Ψ) & N(u^T) ⊥ R(Ψ) . Ψ(x) ≜ uv^T x is a linear mapping from R^N to R^M . Map from R(v) to R(u) is bijective. [331, 3.1]

When dyad uv^T ∈ R^{N×N} is square, uv^T has at least N−1 0-eigenvalues and corresponding eigenvectors spanning v^⊥ . The remaining eigenvector u spans the range of uv^T with corresponding eigenvalue

λ = v^T u = tr(uv^T) ∈ R   (1612)

Determinant is a product of the eigenvalues; so, it is always true that

det Ψ = det(uv^T) = 0   (1613)

When λ = 1 , the square dyad is a nonorthogonal projector projecting on its range (Ψ^2 = Ψ , E.6); a projector dyad. It is quite possible that u∈ v^⊥ making the remaining eigenvalue instead 0 ; B.2 λ = 0 together with the first N−1 0-eigenvalues; id est, it is possible uv^T were nonzero while all its eigenvalues are 0. The matrix

[ 1 ; −1 ][ 1 1 ] = [ 1 1 ; −1 −1 ]   (1614)

for example, has two 0-eigenvalues. In other words, eigenvector u may simultaneously be a member of the nullspace and range of the dyad. The explanation is, simply, because u and v share the same dimension, dim u = M = dim v = N :

B.2 A dyad is not always diagonalizable (A.5) because its eigenvectors are not necessarily independent.


636 APPENDIX B. SIMPLE MATRICES

Proof. Figure 156 shows the four fundamental subspaces for the dyad. Linear operator Ψ : R^N → R^M provides a map between vector spaces that remain distinct when M = N ;

u ∈ R(uv^T)
u ∈ N(uv^T) ⇔ v^T u = 0
R(uv^T) ∩ N(uv^T) = ∅   (1615)

B.1.0.1 rank-one modification

For A∈ R^{M×N} , x∈ R^N , y ∈ R^M , and y^T A x ≠ 0 [206, 2.1]

rank( A − (A x y^T A)/(y^T A x) ) = rank(A) − 1   (1616)

If A∈ R^{N×N} is any nonsingular matrix and 1 + v^T A^{−1} u ≠ 0 , then [221, App.6] [393, 2.3 prob.16] [153, 4.11.2] (Sherman-Morrison)

(A + uv^T)^{−1} = A^{−1} − (A^{−1} u v^T A^{−1}) / (1 + v^T A^{−1} u)   (1617)

B.1.0.2 dyad symmetry

In the specific circumstance that v = u , then uu^T ∈ R^{N×N} is symmetric, rank-one, and positive semidefinite having exactly N−1 0-eigenvalues. In fact, (Theorem A.3.1.0.7)

uv^T ≽ 0 ⇔ v = u   (1618)

and the remaining eigenvalue is almost always positive;

λ = u^T u = tr(uu^T) > 0 unless u = 0   (1619)

The matrix

[ Ψ u ; u^T 1 ]   (1620)

for example, is rank-1 positive semidefinite if and only if Ψ = uu^T .
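The Sherman-Morrison identity (1617) is easily confirmed numerically; a brief Matlab check (illustrative only, not part of the original text; the construction of A is an arbitrary assumption chosen so that A and 1 + v^T A^{−1} u are safely nonzero):

n = 5;
A = randn(n) + n*eye(n);                 % nonsingular with overwhelming probability
u = randn(n,1);  v = randn(n,1);
lhs = inv(A + u*v');
rhs = inv(A) - (A\u)*(v'/A)/(1 + v'*(A\u));
disp(norm(lhs - rhs,'fro'))              % ~0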


B.1. RANK-ONE MATRIX (DYAD) 637B.1.1Dyad independenceNow we consider a sum of dyads like (1606) as encountered in diagonalizationand singular value decomposition:( k∑)k∑R s i wiT = R ( )k∑s i wiT = R(s i ) ⇐ w i ∀i are l.i. (1621)i=1i=1range of summation is the vector sum of ranges. B.3 (Theorem B.1.1.1.1)Under the assumption the dyads are linearly independent (l.i.), then vectorsums are unique (p.772): for {w i } l.i. and {s i } l.i.( k∑R s i wiTi=1)i=1= R ( s 1 w T 1)⊕ ... ⊕ R(sk w T k)= R(s1 ) ⊕ ... ⊕ R(s k ) (1622)B.1.1.0.1 Definition. Linearly independent dyads. [211, p.29 thm.11][339, p.2] The set of k dyads{si w T i | i=1... k } (1623)where s i ∈ C M and w i ∈ C N , is said to be linearly independent iff()k∑rank SW T s i wiT = k (1624)i=1where S [s 1 · · · s k ] ∈ C M×k and W [w 1 · · · w k ] ∈ C N×k .△Dyad independence does not preclude existence of a nullspace N(SW T ) ,as defined, nor does it imply SW T were full-rank. In absence of assumptionof independence, generally, rankSW T ≤ k . Conversely, any rank-k matrixcan be written in the form SW T by singular value decomposition. (A.6)B.1.1.0.2 Theorem. Linearly independent (l.i.) dyads.Vectors {s i ∈ C M , i=1... k} are l.i. and vectors {w i ∈ C N , i=1... k} arel.i. if and only if dyads {s i wi T ∈ C M×N , i=1... k} are l.i.⋄B.3 Move of range R to inside summation depends on linear independence of {w i }.
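Definition (1624) and Theorem B.1.1.0.2 (next page) can be checked numerically; the Matlab fragment below is an illustrative sketch, not part of the original text:

M = 6;  N = 5;  k = 3;
S = randn(M,k);  W = randn(N,k);    % generic columns are linearly independent
disp(rank(S*W'))                    % = k : the k dyads s_i*w_i' are independent (1624)
W(:,3) = W(:,1) + W(:,2);           % destroy independence of {w_i}
disp(rank(S*W'))                    % = 2 < k : the dyads are no longer independent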


638 APPENDIX B. SIMPLE MATRICESProof. Linear independence of k dyads is identical to definition (1624).(⇒) Suppose {s i } and {w i } are each linearly independent sets. InvokingSylvester’s rank inequality, [202,0.4] [393,2.4]rankS+rankW − k ≤ rank(SW T ) ≤ min{rankS , rankW } (≤ k) (1625)Then k ≤rank(SW T )≤k which implies the dyads are independent.(⇐) Conversely, suppose rank(SW T )=k . Thenk ≤ min{rankS , rankW } ≤ k (1626)implying the vector sets are each independent.B.1.1.1Biorthogonality condition, Range and Nullspace of SumDyads characterized by biorthogonality condition W T S =I are independent;id est, for S ∈ C M×k and W ∈ C N×k , if W T S =I then rank(SW T )=k bythe linearly independent dyads theorem because (conferE.1.1)W T S =I ⇒ rankS=rankW =k≤M =N (1627)To see that, we need only show: N(S)=0 ⇔ ∃ B BS =I . B.4(⇐) Assume BS =I . Then N(BS)=0={x |BSx = 0} ⊇ N(S). (1607)(⇒) If N(S)=0[ then]S must be full-rank skinny-or-square.B∴ ∃ A,B,C [S A] = I (id est, [ S A] is invertible) ⇒ BS =I .CLeft inverse B is given as W T here. Because of reciprocity with S , itimmediately follows: N(W)=0 ⇔ ∃ S S T W =I . Dyads produced by diagonalization, for example, are independent becauseof their inherent biorthogonality. (A.5.0.3) The converse is generally false;id est, linearly independent dyads are not necessarily biorthogonal.B.1.1.1.1 Theorem. Nullspace and range of dyad sum.Given a sum of dyads represented by SW T where S ∈C M×k and W ∈ C N×kN(SW T ) = N(W T ) ⇐R(SW T ) = R(S) ⇐B.4 Left inverse is not unique, in general.∃ B BS = I∃ Z W T Z = I(1628)⋄


B.2. DOUBLET 639R([u v ])R(Π)= R([u v])0 0 N(Π)=v ⊥ ∩ u ⊥([ ]) vR N T= R([u v ]) ⊞ Nu Tv ⊥ ∩ u ⊥([ ]) vTNu T ⊞ R([u v ]) = R NFigure 157: The four fundamental subspaces [333,3.6] of a doubletΠ = uv T + vu T ∈ S N . Π(x)=(uv T + vu T )x is a linear bijective mappingfrom R([u v ]) to R([u v ]).Proof. (⇒) N(SW T )⊇ N(W T ) and R(SW T )⊆ R(S) are obvious.(⇐) Assume the existence of a left inverse B ∈ R k×N and a right inverseZ ∈ R N×k . B.5N(SW T ) = {x | SW T x = 0} ⊆ {x | BSW T x = 0} = N(W T ) (1629)R(SW T ) = {SW T x | x∈ R N } ⊇ {SW T Zy | Zy ∈ R N } = R(S) (1630)B.2 DoubletConsider a sum of two linearly independent square dyads, one a transpositionof the other:Π = uv T + vu T = [u v ] [ ]v Tu T = SW T ∈ S N (1631)where u,v ∈ R N . Like the dyad, a doublet can be 0 only when u or v is 0 ;Π = uv T + vu T = 0 ⇔ u = 0 or v = 0 (1632)By assumption of independence, a nonzero doublet has two nonzeroeigenvaluesλ 1 u T v + ‖uv T ‖ , λ 2 u T v − ‖uv T ‖ (1633)B.5 By counterexample, the theorem’s converse cannot be true; e.g., S = W = [1 0].


640 APPENDIX B. SIMPLE MATRICES

[Figure 158 (diagram): R^N = N(u^T) ⊞ N(E) is mapped to R(v) ⊞ R(E) = R^N ; subspace N(u^T) is carried onto R(E) = N(v^T) while N(E) = R(u) is collapsed to 0 .]

Figure 158: v^T u = 1/ζ . The four fundamental subspaces [333, 3.6] of elementary matrix E as a linear mapping E(x) = ( I − uv^T/(v^T u) ) x .

where λ_1 > 0 > λ_2 , with corresponding eigenvectors

x_1 ≜ u/‖u‖ + v/‖v‖ ,   x_2 ≜ u/‖u‖ − v/‖v‖   (1634)

spanning the doublet range. Eigenvalue λ_1 cannot be 0 unless u and v have opposing directions, but that is antithetical since then the dyads would no longer be independent. Eigenvalue λ_2 is 0 if and only if u and v share the same direction, again antithetical. Generally we have λ_1 > 0 and λ_2 < 0 , so Π is indefinite.
By the nullspace and range of dyad sum theorem, doublet Π has N−2 zero-eigenvalues remaining and corresponding eigenvectors spanning N( [ v^T ; u^T ] ) . We therefore have

R(Π) = R([u v ]) ,  N(Π) = v^⊥ ∩ u^⊥   (1635)

of respective dimension 2 and N−2.

B.3 Elementary matrix

A matrix of the form

E = I − ζuv^T ∈ R^{N×N}   (1636)

where ζ ∈ R is finite and u,v ∈ R^N , is called an elementary matrix or a rank-one modification of the identity. [204] Any elementary matrix in R^{N×N}


B.3. ELEMENTARY MATRIX 641has N −1 eigenvalues equal to 1 corresponding to real eigenvectors thatspan v ⊥ . The remaining eigenvalueλ = 1 − ζv T u (1637)corresponds to eigenvector u . B.6 From [221, App.7.A.26] the determinant:detE = 1 − tr ( ζuv T) = λ (1638)If λ ≠ 0 then E is invertible; [153] (conferB.1.0.1)E −1 = I + ζ λ uvT (1639)Eigenvectors corresponding to 0 eigenvalues belong to N(E) , andthe number of 0 eigenvalues must be at least dim N(E) which, here,can be at most one. (A.7.3.0.1) The nullspace exists, therefore,when λ=0 ; id est, when v T u=1/ζ ; rather, whenever u belongs tohyperplane {z ∈R N | v T z=1/ζ}. Then (when λ=0) elementary matrix Eis a nonorthogonal projector projecting on its range (E 2 =E ,E.1) andN(E)= R(u) ; eigenvector u spans the nullspace when it exists. Byconservation of dimension, dim R(E)=N −dim N(E). It is apparentfrom (1636) that v ⊥ ⊆ R(E) , but dimv ⊥ =N −1. Hence R(E)≡v ⊥ whenthe nullspace exists, and the remaining eigenvectors span it.In summary, when a nontrivial nullspace of E exists,R(E) = N(v T ), N(E) = R(u), v T u = 1/ζ (1640)illustrated in Figure 158, which is opposite to the assignment of subspacesfor a dyad (Figure 156). Otherwise, R(E)= R N .When E = E T , the spectral norm isB.3.1Householder matrix‖E‖ 2 = max{1, |λ|} (1641)An elementary matrix is called a Householder matrix when it has the definingform, for nonzero vector u [159,5.1.2] [153,4.10.1] [331,7.3] [202,2.2]H = I − 2 uuTu T u ∈ SN (1642)B.6 Elementary matrix E is not always diagonalizable because eigenvector u need not beindependent of the others; id est, u∈v ⊥ is possible.


642 APPENDIX B. SIMPLE MATRICES

which is a symmetric orthogonal (reflection) matrix (H^{−1} = H^T = H (B.5.2)). Vector u is normal to an N−1-dimensional subspace u^⊥ through which this particular H effects pointwise reflection; e.g., Hu^⊥ = u^⊥ while Hu = −u .
Matrix H has N−1 orthonormal eigenvectors spanning that reflecting subspace u^⊥ with corresponding eigenvalues equal to 1. The remaining eigenvector u has corresponding eigenvalue −1 ; so

detH = −1   (1643)

Due to symmetry of H , the matrix 2-norm (the spectral norm) is equal to the largest eigenvalue-magnitude. A Householder matrix is thus characterized,

H^T = H ,  H^{−1} = H^T ,  ‖H‖_2 = 1 ,  H ⋡ 0   (1644)

For example, the permutation matrix

Ξ = [ 1 0 0 ; 0 0 1 ; 0 1 0 ]   (1645)

is a Householder matrix having u = [ 0 1 −1 ]^T / √2 . Not all permutation matrices are Householder matrices, although all permutation matrices are orthogonal matrices (B.5.1, Ξ^T Ξ = I) [331, 3.4] because they are made by permuting rows and columns of the identity matrix. Neither are all symmetric permutation matrices Householder matrices; e.g.,

Ξ = [ 0 0 0 1 ; 0 0 1 0 ; 0 1 0 0 ; 1 0 0 0 ]   (1728)

is not a Householder matrix.

B.4 Auxiliary V -matrices

B.4.1 Auxiliary projector matrix V

It is convenient to define a matrix V that arises naturally as a consequence of translating the geometric center α_c (5.5.1.0.1) of some list X to the origin. In place of X − α_c 1^T we may write XV as in (974) where

V = I − (1/N) 11^T ∈ S^N   (913)


B.4. AUXILIARY V -MATRICES 643is an elementary matrix called the geometric centering matrix.Any elementary matrix in R N×N has N −1 eigenvalues equal to 1. For theparticular elementary matrix V , the N th eigenvalue equals 0. The numberof 0 eigenvalues must equal dim N(V ) = 1, by the 0 eigenvalues theorem(A.7.3.0.1), because V =V T is diagonalizable. BecauseV 1 = 0 (1646)the nullspace N(V )= R(1) is spanned by the eigenvector 1. The remainingeigenvectors span R(V ) ≡ 1 ⊥ = N(1 T ) that has dimension N −1.BecauseV 2 = V (1647)and V T = V , elementary matrix V is also a projection matrix (E.3)projecting orthogonally on its range N(1 T ) which is a hyperplane containingthe origin in R N V = I − 1(1 T 1) −1 1 T (1648)The {0, 1} eigenvalues also indicate diagonalizable V is a projectionmatrix. [393,4.1 thm.4.1] Symmetry of V denotes orthogonal projection;from (1916),V T = V , V † = V , ‖V ‖ 2 = 1, V ≽ 0 (1649)Matrix V is also circulant [168].B.4.1.0.1 Example. Relationship of Auxiliary to Householder matrix.Let H ∈ S N be a Householder matrix (1642) defined by⎡ ⎤u = ⎢1.⎥⎣ 11 + √ ⎦ ∈ RN (1650)NThen we have [155,2]Let D ∈ S N h and define[ I 0V = H0 T 0[ A b−HDH −b T c]H (1651)](1652)
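Both the Householder example (1645) and the geometric centering matrix V just described are easy to verify numerically before turning to the example that relates them. The Matlab fragment below is an illustrative sketch, not part of the original text:

% Householder matrix (1642) built from u = [0 1 -1]'/sqrt(2) reproduces Xi in (1645)
u = [0; 1; -1]/sqrt(2);
H = eye(3) - 2*(u*u')/(u'*u);
disp(H)                         % the permutation matrix [1 0 0; 0 0 1; 0 1 0]
disp(norm(H*H' - eye(3)))       % ~0 : orthogonal (and symmetric by construction)
disp(norm(H*u + u))             % ~0 : Hu = -u, reflection through u-perp
disp(det(H))                    % -1, cf. (1643)

% geometric centering matrix (913): annihilates 1, idempotent, eigenvalues {0,1}
N = 5;
V = eye(N) - ones(N)/N;
disp(norm(V*ones(N,1)))         % 0 : N(V) = R(1)
disp(norm(V*V - V,'fro'))       % ~0 : V is a projection matrix (1647)
disp(sort(eig(V))')             % one 0 and N-1 ones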


644 APPENDIX B. SIMPLE MATRICESwhere b is a vector. Then because H is nonsingular (A.3.1.0.5) [184,3][ A 0−V DV = −H0 T 0]H ≽ 0 ⇔ −A ≽ 0 (1653)and affine dimension is r = rankA when D is a Euclidean distance matrix.B.4.2Schoenberg auxiliary matrix V N1. V N = 1 √2[ −1TI2. V T N 1 = 03. I − e 1 1 T = [ 0 √ 2V N]4. [ 0 √ 2V N]VN = V N5. [ 0 √ 2V N]V = V]∈ R N×N−16. V [ 0 √ 2V N]=[0√2VN]7. [ 0 √ 2V N] [0√2VN]=[0√2VN]8. [ 0 √ 2V N] †=[ 0 0T0 I]V9. [ 0 √ 2V N] †V =[0√2VN] †10. [ 0 √ ] [ √ ] †2V N 0 2VN = V11. [ 0 √ [ ]] † [ √ ] 0 0T2V N 0 2VN =0 I12. [ 0 √ ] [ ]0 02V TN = [ 0 √ ]2V0 IN[ ] [ ]0 0T [0 √ ] 0 0T13.2VN =0 I0 I


B.4. AUXILIARY V -MATRICES 64514. [V N1 √21 ] −1 =[ ]V†N√2N 1T15. V † N = √ 2 [ − 1 N 1 I − 1 N 11T] ∈ R N−1×N ,16. V † N 1 = 017. V † N V N = I18. V T = V = V N V † N = I − 1 N 11T ∈ S N(I −1N 11T ∈ S N−1)19. −V † N (11T − I)V N = I ,(11 T − I ∈ EDM N)20. D = [d ij ] ∈ S N h (915)tr(−V DV ) = tr(−V D) = tr(−V † N DV N) = 1 N 1T D 1 = 1 N tr(11T D) = 1 NAny elementary matrix E ∈ S N of the particular form∑d iji,jE = k 1 I − k 2 11 T (1654)where k 1 , k 2 ∈ R , B.7 will make tr(−ED) proportional to ∑ d ij .21. D = [d ij ] ∈ S Ntr(−V DV ) = 1 N∑i,ji≠jd ij − N−1N∑d ii = 1 N 1T D 1 − trDi22. D = [d ij ] ∈ S N htr(−V T N DV N) = ∑ jd 1j23. For Y ∈ S NV (Y − δ(Y 1))V = Y − δ(Y 1)B.7 If k 1 is 1−ρ while k 2 equals −ρ∈R , then all eigenvalues of E for −1/(N −1) < ρ < 1are guaranteed positive and therefore E is guaranteed positive definite. [301]


646 APPENDIX B. SIMPLE MATRICESB.4.3The skinny matrix⎡V W ⎢⎣Orthonormal auxiliary matrix V W−1 √N1 + −1N+ √ N−1N+ √ N.−1N+ √ N√−1−1N· · · √N−1N+ √ N......−1N+ √ N· · ·−1N+ √ N...−1N+ √ N...· · · 1 + −1N+ √ N.⎤∈ R N×N−1 (1655)⎥⎦has R(V W )= N(1 T ) and orthonormal columns. [6] We defined threeauxiliary V -matrices: V , V N (897), and V W sharing some attributes listedin Table B.4.4. For example, V can be expressedV = V W V T W = V N V † N(1656)but VW TV W = I means V is an orthogonal projector (1913) andB.4.4V † W = V T W , ‖V W ‖ 2 = 1 , V T W1 = 0 (1657)Auxiliary V -matrix TabledimV rankV R(V ) N(V T ) V T V V V T V V †V N ×N N −1 N(1 T ) R(1) V V V[ ]V N N ×(N −1) N −1 N(1 T 1) R(1) (I + 2 11T 1 N −1 −1T) V2 −1 IV W N ×(N −1) N −1 N(1 T ) R(1) I V VB.4.5More auxiliary matricesMathar shows [259,2] that any elementary matrix (B.3) of the formV S = I − b1 T ∈ R N×N (1658)such that b T 1 = 1 (confer [162,2]), is an auxiliary V -matrix havingR(VS T)= N(bT ), R(V S ) = N(1 T )N(V S ) = R(b), N(VS T)= R(1) (1659)


B.5. ORTHOGONAL MATRIX 647Given X ∈ R n×N , the choice b = 1 N 1 (V S = V ) minimizes ‖X(I − b1 T )‖ F .[164,3.2.1]B.5 Orthogonal matrixB.5.1Vector rotationThe property Q −1 = Q T completely defines an orthogonal matrix Q∈ R n×nemployed to effect vector rotation; [331,2.6,3.4] [333,6.5] [202,2.1]for any x ∈ R n ‖Qx‖ 2 = ‖x‖ 2 (1660)In other words, the 2-norm is orthogonally invariant. A unitary matrix is acomplex generalization of orthogonal matrix; conjugate transpose defines it:U −1 = U H . An orthogonal matrix is simply a real unitary matrix.Orthogonal matrix Q is a normal matrix further characterized:Q −1 = Q T , ‖Q‖ 2 = 1 (1661)Applying characterization (1661) to Q T we see it too is an orthogonal matrix.Hence the rows and columns of Q respectively form an orthonormal set.Normalcy guarantees diagonalization (A.5.1) so, for Q SΛS HSΛ −1 S H = S ∗ ΛS T , ‖δ(Λ)‖ ∞ = 1 (1662)characterizes an orthogonal matrix in terms of eigenvalues and eigenvectors.All permutation matrices Ξ , for example, are nonnegative orthogonalmatrices; and vice versa. Any product of permutation matrices remainsa permutator. Any product of a permutation matrix with an orthogonalmatrix remains orthogonal. In fact, any product of orthogonal matrices AQremains orthogonal by definition. Given any other dimensionally compatibleorthogonal matrix U , the mapping g(A)= U T AQ is a bijection on thedomain of orthogonal matrices (a nonconvex manifold of dimension 1 n(n −1)2[52]). [239,2.1] [240]The largest magnitude entry of an orthogonal matrix is 1; for each andevery j ∈1... n‖Q(j,:)‖ ∞ ≤ 1‖Q(:, j)‖ ∞ ≤ 1(1663)


Each and every eigenvalue of a (real) orthogonal matrix has magnitude 1

    λ(Q) ∈ C^n ,   |λ(Q)| = 1        (1664)

while only the identity matrix can be simultaneously positive definite and orthogonal.

B.5.2  Reflection

A matrix for pointwise reflection is defined by imposing symmetry upon the orthogonal matrix; id est, a reflection matrix is completely defined by Q^{-1} = Q^T = Q . The reflection matrix is a symmetric orthogonal matrix, and vice versa, characterized:

    Q^T = Q ,   Q^{-1} = Q^T ,   ‖Q‖_2 = 1        (1665)

The Householder matrix (§B.3.1) is an example of symmetric orthogonal (reflection) matrix.

Reflection matrices have eigenvalues equal to ±1 and so det Q = ±1. It is natural to expect a relationship between reflection and projection matrices because all projection matrices have eigenvalues belonging to {0, 1}. In fact, any reflection matrix Q is related to some orthogonal projector P by [204, §1 prob.44]

    Q = I − 2P        (1666)

Yet P is, generally, neither orthogonal nor invertible. (§E.3.2)

    λ(Q) ∈ R^n ,   |λ(Q)| = 1        (1667)

Reflection is with respect to R(P)^⊥ . Matrix 2P − I represents antireflection. Every orthogonal matrix can be expressed as the product of a rotation and a reflection. The collection of all orthogonal matrices of particular dimension does not form a convex set.

B.5.3  Rotation of range and rowspace

Given orthogonal matrix Q , column vectors of a matrix X are simultaneously rotated about the origin via product QX . In three dimensions (X ∈ R^{3×N}),


Figure 159: Gimbal: a mechanism imparting three degrees of dimensional freedom to a Euclidean body suspended at the device's center. Each ring is free to rotate about one axis. (Courtesy of The MathWorks Inc.)

the precise meaning of rotation is best illustrated in Figure 159 where the gimbal aids visualization of what is achievable; mathematically, (§5.5.2.0.1)

    Q = [ cos θ   0   −sin θ ] [ 1     0        0   ] [ cos φ   −sin φ   0 ]
        [   0     1      0   ] [ 0   cos ψ   −sin ψ ] [ sin φ    cos φ   0 ]        (1668)
        [ sin θ   0    cos θ ] [ 0   sin ψ    cos ψ ] [   0        0     1 ]

B.5.3.0.1  Example. One axis of revolution.
Partition an n+1-dimensional Euclidean space R^{n+1} ≜ [ R^n ; R ] and define an n-dimensional subspace

    R ≜ {λ ∈ R^{n+1} | 1^T λ = 0}        (1669)

(a hyperplane through the origin). We want an orthogonal matrix that rotates a list in the columns of matrix X ∈ R^{n+1×N} through the dihedral angle between R^n and R (§2.4.3)

    ∠(R^n , R) = arccos( ⟨e_{n+1} , 1⟩ / (‖e_{n+1}‖ ‖1‖) ) = arccos( 1/√(n+1) ) radians        (1670)

The vertex-description of the nonnegative orthant in R^{n+1} is

    {[e_1  e_2  ···  e_{n+1}] a | a ≽ 0} = {a ≽ 0} = R^{n+1}_+ ⊂ R^{n+1}        (1671)


Consider rotation of these vertices via orthogonal matrix

    Q ≜ [ (1/√(n+1)) 1    Ξ V_W ] Ξ ∈ R^{n+1×n+1}        (1672)

where permutation matrix Ξ ∈ S^{n+1} is defined in (1728), and V_W ∈ R^{n+1×n} is the orthonormal auxiliary matrix defined in §B.4.3. This particular orthogonal matrix is selected because it rotates any point in subspace R^n about one axis of revolution onto R ; e.g., rotation Q e_{n+1} aligns the last standard basis vector with subspace normal R^⊥ = 1. The rotated standard basis vectors remaining are orthonormal spanning R .

Another interpretation of product QX is rotation/reflection of R(X). Rotation of X as in Q X Q^T is a simultaneous rotation/reflection of range and rowspace. B.8

Proof. Any matrix can be expressed as a singular value decomposition X = U Σ W^T (1563) where δ²(Σ) = Σ , R(U) ⊇ R(X) , and R(W) ⊇ R(X^T).

B.5.4  Matrix rotation

Orthogonal matrices are also employed to rotate/reflect other matrices like vectors: [159, §12.4.1] Given orthogonal matrix Q , the product Q^T A will rotate A ∈ R^{n×n} in the Euclidean sense in R^{n²} because Frobenius' norm is orthogonally invariant (§2.2.1);

    ‖Q^T A‖_F = √tr(A^T Q Q^T A) = ‖A‖_F        (1673)

(likewise for AQ). Were A symmetric, such a rotation would depart from S^n . One remedy is to instead form product Q^T A Q because

    ‖Q^T A Q‖_F = √tr(Q^T A^T Q Q^T A Q) = ‖A‖_F        (1674)

By §A.1.1 no.31,

    vec Q^T A Q = (Q ⊗ Q)^T vec A        (1675)

B.8 The product Q^T A Q can be regarded as a coordinate transformation; e.g., given linear map y = Ax : R^n → R^n and orthogonal Q , the transformation Qy = AQx is a rotation/reflection of range and rowspace (141) of matrix A where Qy ∈ R(A) and Qx ∈ R(A^T) (142).


which is a rotation of the vectorized A matrix because any Kronecker product of orthogonal matrices remains orthogonal; id est, by §A.1.1 no.38,

    (Q ⊗ Q)^T (Q ⊗ Q) = I        (1676)

Matrix A is orthogonally equivalent to B if B = S^T A S for some orthogonal matrix S . Every square matrix, for example, is orthogonally equivalent to a matrix having equal entries along the main diagonal. [202, §2.2, prob.3]
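The Frobenius-norm invariance (1673)-(1674) and the vec/Kronecker identities (1675)-(1676) can be confirmed directly; the sketch below assumes the column-stacking vec convention (37).

```python
# Matrix rotation: Frobenius invariance and the vec/Kronecker identity.
import numpy as np

n = 4
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = rng.standard_normal((n, n))

vec = lambda M: M.reshape(-1, order='F')          # column-stacking vectorization

print(np.allclose(np.linalg.norm(Q.T @ A, 'fro'), np.linalg.norm(A, 'fro')))   # (1673)
print(np.allclose(vec(Q.T @ A @ Q), np.kron(Q, Q).T @ vec(A)))                 # (1675)
print(np.allclose(np.kron(Q, Q).T @ np.kron(Q, Q), np.eye(n*n)))               # (1676)
```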




Appendix C

Some analytical optimal results

    People have been working on Optimization since the ancient Greeks [Zenodorus, circa 200 BC] learned that a string encloses the most area when it is formed into the shape of a circle.
                                        −Roman Polyak

We speculate that optimization problems possessing analytical solution have convex transformation or constructive global optimality conditions, perhaps yet unknown; e.g., §7.1.4, (1700), §C.3.0.1.

C.1  Properties of infima

    inf ∅ ≜ ∞
    sup ∅ ≜ −∞        (1677)

Given f(x) : X → R defined on arbitrary set X [199, §0.1.2]

    inf_{x∈X} f(x) = − sup_{x∈X} −f(x)
    sup_{x∈X} f(x) = − inf_{x∈X} −f(x)        (1678)

    arg inf_{x∈X} f(x) = arg sup_{x∈X} −f(x)
    arg sup_{x∈X} f(x) = arg inf_{x∈X} −f(x)        (1679)


654 APPENDIX C. SOME ANALYTICAL OPTIMAL RESULTSGiven scalar κ and f(x) : X →R and g(x) : X →R defined onarbitrary set X [199,0.1.2]inf (κ + f(x)) = κ + infx∈Xarg infx∈Xf(x)x∈X(κ + f(x)) = arg inf f(x) (1680)x∈Xinf κf(x) = κ infx∈Xarg infx∈Xf(x)⎫⎬x∈Xκf(x) = arg inf f(x) ⎭ , κ > 0 (1681)x∈Xinf (f(x) + g(x)) ≥ inf f(x) + inf g(x) (1682)x∈X x∈X x∈XGiven f(x) : X ∪ Y →R and arbitrary sets X and Y [199,0.1.2]X ⊂ Y ⇒ inf f(x) ≥ inf f(x) (1683)x∈X x∈Yinf f(x) = min{inf f(x), inf f(x)} (1684)x∈X ∪Y x∈X x∈Yinf f(x) ≥ max{inf f(x), inf f(x)} (1685)x∈X ∩Y x∈X x∈YC.2 Trace, singular and eigen valuesFor A∈ R m×n and σ(A) 1 connoting spectral norm,σ(A) 1 = √ λ(A T A) 1 = ‖A‖ 2 = sup ‖Ax‖ 2 = minimize t(639)‖x‖=1t∈R [ ] tI Asubject toA T ≽ 0tI∑iFor A∈ R m×n and σ(A) denoting its singular values, the nuclear norm(Ky Fan norm) of matrix A (confer (44), (1566), [203, p.200]) isσ(A) i = tr √ A T A = sup tr(X T A) = maximize tr(X T A)‖X‖ 2 ≤1X∈R m×n [ I Xsubject toX T I= 1 2 minimizeX∈S m , Y ∈S nsubject to]≽ 0(1686)trX + trY[ ] X AA T ≽ 0Y


C.2. TRACE, SINGULAR AND EIGEN VALUES 655This nuclear norm is convex C.1 and dual to spectral norm. [203, p.214][61,A.1.6] Given singular value decomposition A = SΣQ T ∈ R m×n(A.6), then X ⋆ = SQ T ∈ R m×n is an optimal solution to maximization(confer2.3.2.0.5) while X ⋆ = SΣS T ∈ S m and Y ⋆ = QΣQ T ∈ S n is anoptimal solution to minimization [135]. Srebro [323] asserts∑iσ(A) i = 1 minimize ‖U‖ 2 2 F + ‖V U,V‖2 Fsubject to A = UV T(1687)= minimize ‖U‖ F ‖V ‖ FU,Vsubject to A = UV TC.2.0.0.1 Exercise. Optimal matrix factorization.Prove (1687). C.2For X ∈ S m , Y ∈ S n , A∈ C ⊆ R m×n for set C convex, and σ(A)denoting the singular values of A [135,3]minimizeA∑σ(A) isubject to A ∈ Ci≡1minimize 2A , X , Ysubject totrX + trY[ ] X AA T ≽ 0YA ∈ C(1688)C.1 discernible as envelope of the rank function (1362) or as supremum of functions linearin A (Figure 74).C.2 Hint: Write A = SΣQ T ∈ R m×n and[ ] [ X A UA T =Y V][ U T V T ] ≽ 0Show U ⋆ = S √ Σ∈ R m×min{m,n} and V ⋆ = Q √ Σ∈ R n×min{m,n} , hence ‖U ⋆ ‖ 2 F = ‖V ⋆ ‖ 2 F .


656 APPENDIX C. SOME ANALYTICAL OPTIMAL RESULTSFor A∈ S N + and β ∈ Rβ trA = maximize tr(XA)X∈ S Nsubject to X ≼ βI(1689)But the following statement is numerically stable, preventing anunbounded solution in direction of a 0 eigenvalue:maximize sgn(β) tr(XA)X∈ S Nsubject to X ≼ |β|IX ≽ −|β|I(1690)where β trA = tr(X ⋆ A). If β ≥ 0, then X ≽−|β|I ← X ≽ 0.For symmetric A∈ S N , its smallest and largest eigenvalue in λ(A)∈ R Nis respectively [11,4.1] [43,I.6.15] [202,4.2] [239,2.1] [240]min{λ(A) i } = inf x T Ax = minimizei‖x‖=1X∈ S N +subject to trX = 1max{λ(A) i } = sup x T Ax = maximizei‖x‖=1X∈ S N +subject to trX = 1tr(XA) = maximize tt∈Rsubject to A ≽ tI(1691)tr(XA) = minimize tt∈Rsubject to A ≼ tI(1692)The largest eigenvalue λ 1 is always convex in A∈ S N because, givenparticular x , x T Ax is linear in matrix A ; supremum of a family oflinear functions is convex, as illustrated in Figure 74. So for A,B∈ S N ,λ 1 (A + B) ≤ λ 1 (A) + λ 1 (B). (1504) Similarly, the smallest eigenvalueλ N of any symmetric matrix is a concave function of its entries;λ N (A + B) ≥ λ N (A) + λ N (B). (1504) For v 1 a normalized eigenvectorof A corresponding to the largest eigenvalue, and v N a normalizedeigenvector corresponding to the smallest eigenvalue,v N = arg inf x T Ax (1693)‖x‖=1v 1 = arg sup x T Ax (1694)‖x‖=1
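A numerical illustration (numpy; this checks the variational characterization rather than solving the semidefinite programs) of (1691)-(1694): the extreme eigenvalues of a symmetric matrix bound the Rayleigh quotient, and the bounds are attained at the corresponding unit eigenvectors.

```python
# Extreme eigenvalues bound x^T A x over the unit sphere.
import numpy as np

N = 6
rng = np.random.default_rng(2)
B = rng.standard_normal((N, N))
A = (B + B.T)/2                          # symmetric A

w, V = np.linalg.eigh(A)                 # ascending eigenvalues, orthonormal eigenvectors
vN, v1 = V[:, 0], V[:, -1]               # eigenvectors of smallest / largest eigenvalue

x = rng.standard_normal(N); x /= np.linalg.norm(x)
print(w[0] <= x @ A @ x <= w[-1])                    # Rayleigh quotient bracketed
print(np.allclose(vN @ A @ vN, w[0]))                # infimum attained at v_N   (1693)
print(np.allclose(v1 @ A @ v1, w[-1]))               # supremum attained at v_1  (1694)
```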


C.2. TRACE, SINGULAR AND EIGEN VALUES 657For A∈ S N having eigenvalues λ(A)∈ R N , consider the unconstrainednonconvex optimization that is a projection on the rank-1 subset(2.9.2.1,3.6.0.0.1) of the boundary of positive semidefinite cone S N + :Defining λ 1 max i {λ(A) i } and corresponding eigenvector v 1minimizex‖xx T − A‖ 2 F = minimize tr(xx T (x T x) − 2Axx T + A T A)x{‖λ(A)‖ 2 , λ 1 ≤ 0=(1695)‖λ(A)‖ 2 − λ 2 1 , λ 1 > 0arg minimizex‖xx T − A‖ 2 F ={0 , λ1 ≤ 0v 1√λ1 , λ 1 > 0(1696)Proof. This is simply the Eckart & Young solution from7.1.2:x ⋆ x ⋆T ={0 , λ1 ≤ 0λ 1 v 1 v T 1 , λ 1 > 0(1697)minimizexGiven nonincreasingly ordered diagonalization A = QΛQ T whereΛ = δ(λ(A)) (A.5), then (1695) has minimum value⎧‖QΛQ T ‖ 2 F = ‖δ(Λ)‖2 , λ 1 ≤ 0⎪⎨⎛⎡‖xx T −A‖ 2 F =λ 1 Q ⎜⎢0⎝⎣. ..⎪⎩ ∥0⎤ ⎞ ∥ ∥∥∥∥∥∥2⎡⎥ ⎟⎦− Λ⎠Q T ⎢=⎣∥Fλ 10.0⎤⎥⎦− δ(Λ)∥2(1698), λ 1 > 0C.2.0.0.2 Exercise. Rank-1 approximation.Given symmetric matrix A∈ S N , prove:v 1 = arg minimize ‖xx T − A‖ 2 Fxsubject to ‖x‖ = 1(1699)where v 1 is a normalized eigenvector of A corresponding to its largesteigenvalue. What is the objective’s optimal value?
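The Eckart & Young rank-1 solution (1695)-(1697) is easily verified numerically; the sketch below assumes λ_1 > 0 (enforced here by shifting A) and samples random competitors against x⋆ = √λ_1 v_1 .

```python
# Projection of a symmetric matrix on the rank-1 PSD subset.
import numpy as np

N = 5
rng = np.random.default_rng(3)
B = rng.standard_normal((N, N))
A = (B + B.T)/2 + 2*np.eye(N)            # shift so the largest eigenvalue is positive

lam, V = np.linalg.eigh(A)
lam1, v1 = lam[-1], V[:, -1]
xstar = np.sqrt(lam1) * v1

fstar = np.linalg.norm(np.outer(xstar, xstar) - A, 'fro')**2
print(np.allclose(fstar, np.sum(lam**2) - lam1**2))        # optimal value, per (1695)

for _ in range(1000):                                      # no sampled x does better
    x = rng.standard_normal(N)
    assert np.linalg.norm(np.outer(x, x) - A, 'fro')**2 >= fstar - 1e-9
print("no sampled x beats x* = sqrt(lambda_1) v_1")
```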


658 APPENDIX C. SOME ANALYTICAL OPTIMAL RESULTS(Ky Fan, 1949) For B ∈ S N whose eigenvalues λ(B)∈ R N are arrangedin nonincreasing order, and for 1≤k ≤N [11,4.1] [215] [202,4.3.18][359,2] [239,2.1] [240]N∑i=N−k+1λ(B) i = infU∈ R N×kU T U=Itr(UU T B) = minimizeX∈ S N +tr(XB)subject to X ≼ ItrX = k(a)k∑λ(B) i = supi=1U∈ R N×kU T U=I= maximize (k − N)µ + tr(B − Z)µ∈R , Z∈S N +subject to µI + Z ≽ Btr(UU T B) = maximizeX∈ S N +tr(XB)subject to X ≼ ItrX = k(b)(c)= minimize kµ + trZµ∈R , Z∈S N +subject to µI + Z ≽ B(d)(1700)Given ordered diagonalization B = QΛQ T , (A.5.1) then an optimalU for the infimum is U ⋆ = Q(:, N − k+1:N)∈ R N×k whereasU ⋆ = Q(:, 1:k)∈ R N×k for the supremum is more reliably computed.In both cases, X ⋆ = U ⋆ U ⋆T . <strong>Optimization</strong> (a) searches the convex hullof outer product UU T of all N ×k orthonormal matrices. (2.3.2.0.1)For B ∈ S N whose eigenvalues λ(B)∈ R N are arranged in nonincreasingorder, and for diagonal matrix Υ∈ S k whose diagonal entries arearranged in nonincreasing order where 1≤k ≤N , we utilize themain-diagonal δ operator’s self-adjointness property (1418): [12,4.2]k∑Υ ii λ(B) N−i+1 = inf tr(ΥU T BU) = inf δ(Υ) T δ(U T BU)i=1U∈ R N×kU∈ R N×kU T U=I= minimizeV i ∈S Ni=1U T U=I()∑tr B k (Υ ii −Υ i+1,i+1 )V isubject to trV i = i ,I ≽ V i ≽ 0,(1701)i=1... ki=1... k


C.2. TRACE, SINGULAR AND EIGEN VALUES 659where Υ k+1,k+1 0. We speculate,k∑Υ ii λ(B) i = supi=1tr(ΥU T BU) = sup δ(Υ) T δ(U T BU) (1702)U∈ R N×kU∈ R N×kU T U=IU T U=Ik∑i=1Alizadeh shows: [11,4.2]k∑Υ ii λ(B) i = minimize iµ i + trZ iµ∈R k , Z i ∈S N i=1subject to µ i I + Z i − (Υ ii −Υ i+1,i+1 )B ≽ 0, i=1... k= maximizeV i ∈S Nwhere Υ k+1,k+1 0.Z i ≽ 0,()∑tr B k (Υ ii −Υ i+1,i+1 )V ii=1i=1... ksubject to trV i = i , i=1... kI ≽ V i ≽ 0, i=1... k (1703)The largest eigenvalue magnitude µ of A∈ S Nmax { |λ(A) i | } = minimize µi µ∈Rsubject to −µI ≼ A ≼ µI(1704)is minimized over convex set C by semidefinite program: (confer7.1.5)minimize ‖A‖ 2Asubject to A ∈ C≡minimize µµ , Asubject to −µI ≼ A ≼ µIA ∈ C(1705)id est,µ ⋆ maxi{ |λ(A ⋆ ) i | , i = 1... N } ∈ R + (1706)


660 APPENDIX C. SOME ANALYTICAL OPTIMAL RESULTSFor B ∈ S N whose eigenvalues λ(B)∈ R N are arranged in nonincreasingorder, let Πλ(B) be a permutation of eigenvalues λ(B) suchthat their absolute value becomes arranged in nonincreasingorder: |Πλ(B)| 1 ≥ |Πλ(B)| 2 ≥ · · · ≥ |Πλ(B)| N . Then, for 1≤k ≤N[11,4.3] C.3k∑|Πλ(B)| ii=1k∑i=1= minimize kµ + trZµ∈R , Z∈S N +subject to µI + Z + B ≽ 0µI + Z − B ≽ 0= maximize 〈B , V − W 〉V ,W ∈ S N +subject to I ≽ V , Wtr(V + W)=k(1707)For diagonal matrix Υ∈ S k whose diagonal entries are arranged innonincreasing order where 1≤k ≤Nk∑Υ ii |Πλ(B)| i = minimize iµ i + trZ iµ∈R k , Z i ∈S N i=1subject to µ i I + Z i + (Υ ii −Υ i+1,i+1 )B ≽ 0, i=1... k= maximizeV i ,W i ∈S Nwhere Υ k+1,k+1 0.µ i I + Z i − (Υ ii −Υ i+1,i+1 )B ≽ 0, i=1... kZ i ≽ 0,()∑tr B k (Υ ii −Υ i+1,i+1 )(V i − W i )i=1i=1... ksubject to tr(V i + W i ) = i , i=1... kI ≽ V i ≽ 0, i=1... kI ≽ W i ≽ 0, i=1... k (1708)C.2.0.0.3 Exercise. Weighted sum of largest eigenvalues.Prove (1702).C.3 We eliminate a redundant positive semidefinite variable from Alizadeh’s minimization.There exist typographical errors in [292, (6.49) (6.55)] for this minimization.
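Returning to Ky Fan's result (1700)(b), here is a small numerical check (optimal U taken from the eigenvectors as stated above; the semidefinite programs themselves are not solved):

```python
# Sum of the k largest eigenvalues equals sup tr(U U^T B) over orthonormal U.
import numpy as np

N, k = 7, 3
rng = np.random.default_rng(4)
C = rng.standard_normal((N, N))
B = (C + C.T)/2

lam, Q = np.linalg.eigh(B)                       # ascending order
Ustar = Q[:, -k:]                                # eigenvectors of the k largest eigenvalues
print(np.allclose(np.trace(Ustar.T @ B @ Ustar), lam[-k:].sum()))

for _ in range(500):                             # random orthonormal U never exceeds it
    U, _ = np.linalg.qr(rng.standard_normal((N, k)))
    assert np.trace(U.T @ B @ U) <= lam[-k:].sum() + 1e-9
print("sup over sampled U matches the sum of the k largest eigenvalues")
```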


For A, B ∈ S^N whose eigenvalues λ(A), λ(B) ∈ R^N are respectively arranged in nonincreasing order, and for nonincreasingly ordered diagonalizations A = W_A Υ W_A^T and B = W_B Λ W_B^T [200] [239, §2.1] [240]

    λ(A)^T λ(B) = sup_{U ∈ R^{N×N}, U^T U = I} tr(A^T U^T B U) ≥ tr(A^T B)        (1727)

(confer (1732)) where optimal U is

    U⋆ = W_B W_A^T ∈ R^{N×N}        (1724)

We can push that upper bound higher using a result in §C.4.2.1:

    |λ(A)|^T |λ(B)| = sup_{U ∈ C^{N×N}, U^H U = I} re tr(A^T U^H B U)        (1709)

For step function ψ as defined in (1579), optimal U becomes

    U⋆ = W_B √δ(ψ(δ(Λ)))^H √δ(ψ(δ(Υ))) W_A^T ∈ C^{N×N}        (1710)

C.3  Orthogonal Procrustes problem

Given matrices A, B ∈ R^{n×N} , their product having full singular value decomposition (§A.6.3)

    A B^T ≜ U Σ Q^T ∈ R^{n×n}        (1711)

then an optimal solution R⋆ to the orthogonal Procrustes problem

    minimize_R   ‖A − R^T B‖_F
    subject to   R^T = R^{-1}        (1712)

maximizes tr(A^T R^T B) over the nonconvex manifold of orthogonal matrices: [202, §7.4.8]

    R⋆ = Q U^T ∈ R^{n×n}        (1713)

A necessary and sufficient condition for optimality

    A B^T R⋆ ≽ 0        (1714)

holds whenever R⋆ is an orthogonal matrix. [164, §4]
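A minimal numerical sketch of the Procrustes solution (1711)-(1714). The hidden orthogonal map Rtrue below is an assumption introduced only so that the optimal objective value is zero and therefore easy to recognize.

```python
# Orthogonal Procrustes: R* = Q U^T from the SVD of A B^T.
import numpy as np

n, N = 3, 10
rng = np.random.default_rng(5)
B = rng.standard_normal((n, N))
Rtrue, _ = np.linalg.qr(rng.standard_normal((n, n)))   # hidden orthogonal map
A = Rtrue.T @ B                                        # so A = R^T B exactly

U, s, Qt = np.linalg.svd(A @ B.T)      # A B^T = U Sigma Q^T   (1711)
Rstar = Qt.T @ U.T                     # R* = Q U^T            (1713)

print(np.allclose(Rstar.T @ Rstar, np.eye(n)))                     # R* is orthogonal
print(np.linalg.norm(A - Rstar.T @ B, 'fro'))                      # ~0 : hidden map recovered
print(np.all(np.linalg.eigvalsh(A @ B.T @ Rstar) >= -1e-9))        # optimality (1714)
```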


minimizeR662 APPENDIX C. SOME ANALYTICAL OPTIMAL RESULTSOptimal solution R ⋆ can reveal rotation/reflection (5.5.2,B.5) of onelist in the columns of matrix A with respect to another list in B . Solution isunique if rankBV N = n . [113,2.4.1] In the case that A is a vector andpermutation of B , solution R ⋆ is not necessarily a permutation matrix(4.6.0.0.3) although the optimal objective will be zero. More generally, theoptimal value for objective of minimization istr ( A T A + B T B − 2AB T R ⋆) = tr(A T A) + tr(B T B) − 2 tr(UΣU T )= ‖A‖ 2 F + ‖B‖2 F − 2δ(Σ)T 1while the optimal value for corresponding trace maximization is(1715)sup tr(A T R T B) = tr(A T R ⋆T B) = δ(Σ) T 1 ≥ tr(A T B) (1716)R T =R −1The same optimal solution R ⋆ solvesC.3.0.1Procrustes relaxationmaximize ‖A + R T B‖ FR(1717)subject to R T = R −1By replacing its feasible set with the convex hull of orthogonal matrices(Example 2.3.2.0.5), we relax Procrustes problem (1712) to a convex problem‖A − R T B‖ 2 Fsubject to R T = R −1= tr(AT A + B T B) − 2 maximizeRsubject towhose adjusted objective must always equal Procrustes’. C.4C.3.1Effect of translationtr(A T R T B)[ ] I RR T ≽ 0 (1718)IConsider the impact on problem (1712) of dc offset in known listsA,B∈ R n×N . Rotation of B there is with respect to the origin, so betterresults may be obtained if offset is first accounted. Because the geometriccenters of the lists AV and BV are the origin, instead we solveminimize ‖AV − R T BV ‖ FR(1719)subject to R T = R −1C.4 (because orthogonal matrices are the extreme points of this hull) and whose optimalnumerical solution (SDPT3 [348]) [167] is reliably observed to be orthogonal for n≤N .


C.4. TWO-SIDED ORTHOGONAL PROCRUSTES 663where V ∈ S N is the geometric centering matrix (B.4.1). Now we define thefull singular value decompositionand an optimal rotation matrixAV B T UΣQ T ∈ R n×n (1720)R ⋆ = QU T ∈ R n×n (1713)The desired result is an optimally rotated offset listR ⋆T BV + A(I − V ) ≈ A (1721)which most closely matches the list in A . Equality is attained when the listsare precisely related by a rotation/reflection and an offset. When R ⋆T B=Aor B1=A1=0, this result (1721) reduces to R ⋆T B ≈ A .C.3.1.1Translation of extended listSuppose an optimal rotation matrix R ⋆ ∈ R n×n were derived as beforefrom matrix B ∈ R n×N , but B is part of a larger list in the columns of[C B ]∈ R n×M+N where C ∈ R n×M . In that event, we wish to applythe rotation/reflection and translation to the larger list. The expressionsupplanting the approximation in (1721) makes 1 T of compatible dimension;R ⋆T [C −B11 T 1 NBV ] + A11 T 1 N(1722)id est, C −B11 T 1 N ∈ Rn×M and A11 T 1 N ∈ Rn×M+N .C.4 Two-sided orthogonal ProcrustesC.4.0.1MinimizationGiven symmetric A,B∈ S N , each having diagonalization (A.5.1)A Q A Λ A Q T A , B Q B Λ B Q T B (1723)where eigenvalues are arranged in their respective diagonal matrix Λ innonincreasing order, then an optimal solution [130]R ⋆ = Q B Q T A ∈ R N×N (1724)


664 APPENDIX C. SOME ANALYTICAL OPTIMAL RESULTSto the two-sided orthogonal Procrustes problemminimize ‖A − R T BR‖ FR= minimize tr ( A T A − 2A T R T BR + B T B )subject to R T = R −1 Rsubject to R T = R −1 (1725)maximizes tr(A T R T BR) over the nonconvex manifold of orthogonal matrices.Optimal product R ⋆T BR ⋆ has the eigenvectors of A but the eigenvalues of B .[164,7.5.1] The optimal value for the objective of minimization is, by (48)‖Q A Λ A Q T A −R ⋆T Q B Λ B Q T BR ⋆ ‖ F = ‖Q A (Λ A −Λ B )Q T A ‖ F = ‖Λ A −Λ B ‖ F (1726)while the corresponding trace maximization has optimal valuesup tr(A T R T BR) = tr(A T R ⋆T BR ⋆ ) = tr(Λ A Λ B ) ≥ tr(A T B) (1727)R T =R −1The lower bound on inner product of eigenvalues is due to Fan (p.603).C.4.0.2MaximizationAny permutation matrix is an orthogonal matrix. Defining a row- andcolumn-swapping permutation matrix (a reflection matrix,B.5.2)⎡ ⎤0 1·Ξ = Ξ T =⎢ ·⎥(1728)⎣ 1 ⎦1 0then an optimal solution R ⋆ to the maximization problem [sic]minimizes tr(A T R T BR) : [200] [239,2.1] [240]maximize ‖A − R T BR‖ FR(1729)subject to R T = R −1R ⋆ = Q B ΞQ T A ∈ R N×N (1730)The optimal value for the objective of maximization is‖Q A Λ A Q T A − R⋆T Q B Λ B Q T B R⋆ ‖ F = ‖Q A Λ A Q T A − Q A ΞT Λ B ΞQ T A ‖ F= ‖Λ A − ΞΛ B Ξ‖ F(1731)while the corresponding trace minimization has optimal valueinf tr(A T R T BR) = tr(A T R ⋆T BR ⋆ ) = tr(Λ A ΞΛ B Ξ) (1732)R T =R −1


C.4. TWO-SIDED ORTHOGONAL PROCRUSTES 665C.4.1Procrustes’ relation to linear programmingAlthough these two-sided Procrustes problems are nonconvex, a connectionwith linear programming [94] was discovered by Anstreicher & Wolkowicz[12,3] [239,2.1] [240]: Given A,B∈ S N , this semidefinite program in Sand Tminimize tr(A T R T BR) = maximize tr(S + T ) (1733)RS , T ∈S Nsubject to R T = R −1 subject to A T ⊗ B − I ⊗ S − T ⊗ I ≽ 0(where ⊗ signifies Kronecker product (D.1.2.1)) has optimal objectivevalue (1732). These two problems in (1733) are strong duals (2.13.1.0.3).Given ordered diagonalizations (1723), make the observation:infR tr(AT R T BR) = inf tr(Λ ˆRT A Λ B ˆR) (1734)ˆRbecause ˆRQ B TRQ A on the set of orthogonal matrices (which includes thepermutation matrices) is a bijection. This means, basically, diagonal matricesof eigenvalues Λ A and Λ B may be substituted for A and B , so only the maindiagonals of S and T come into play;maximizeS,T ∈S N 1 T δ(S + T )subject to δ(Λ A ⊗ (ΞΛ B Ξ) − I ⊗ S − T ⊗ I) ≽ 0(1735)a linear program in δ(S) and δ(T) having the same optimal objective valueas the semidefinite program (1733).We relate their results to Procrustes problem (1725) by manipulatingsigns (1678) and permuting eigenvalues:maximize tr(A T R T BR) = minimize 1 T δ(S + T )RS , T ∈S Nsubject to R T = R −1 subject to δ(I ⊗ S + T ⊗ I − Λ A ⊗ Λ B ) ≽ 0= minimize tr(S + T )(1736)S , T ∈S Nsubject to I ⊗ S + T ⊗ I − A T ⊗ B ≽ 0This formulation has optimal objective value identical to that in (1727).


666 APPENDIX C. SOME ANALYTICAL OPTIMAL RESULTSC.4.2Two-sided orthogonal Procrustes via SVDBy making left- and right-side orthogonal matrices independent, we can pushthe upper bound on trace (1727) a little further: Given real matrices A,Beach having full singular value decomposition (A.6.3)A U A Σ A Q T A ∈ R m×n , B U B Σ B Q T B ∈ R m×n (1737)then a well-known optimal solution R ⋆ , S ⋆ to the problemminimize ‖A − SBR‖ FR , Ssubject to R H = R −1(1738)S H = S −1maximizes re tr(A T SBR) : [314] [288] [52] [194] optimal orthogonal matricesS ⋆ = U A U H B ∈ R m×m , R ⋆ = Q B Q H A ∈ R n×n (1739)[sic] are not necessarily unique [202,7.4.13] because the feasible set is notconvex. The optimal value for the objective of minimization is, by (48)‖U A Σ A Q H A − S ⋆ U B Σ B Q H BR ⋆ ‖ F = ‖U A (Σ A − Σ B )Q H A ‖ F = ‖Σ A − Σ B ‖ F (1740)while the corresponding trace maximization has optimal value [43,III.6.12]sup | tr(A T SBR)| = sup re tr(A T SBR) = re tr(A T S ⋆ BR ⋆ ) = tr(ΣA TΣ B ) ≥ tr(AT B)R H =R −1R H =R −1S H =S −1 S H =S −1 (1741)for which it is necessaryA T S ⋆ BR ⋆ ≽ 0 , BR ⋆ A T S ⋆ ≽ 0 (1742)The lower bound on inner product of singular values in (1741) is due tovon Neumann. Equality is attained if U HA U B =I and QH B Q A =I .C.4.2.1Symmetric matricesNow optimizing over the complex manifold of unitary matrices (B.5.1),the upper bound on trace (1727) is thereby raised: Suppose we are givendiagonalizations for (real) symmetric A,B (A.5)A = W A ΥW TA ∈ S n , δ(Υ) ∈ K M (1743)


C.4. TWO-SIDED ORTHOGONAL PROCRUSTES 667B = W B ΛW T B ∈ S n , δ(Λ) ∈ K M (1744)having their respective eigenvalues in diagonal matrices Υ, Λ ∈ S n arrangedin nonincreasing order (membership to the monotone cone K M (436)). Thenby splitting eigenvalue signs, we invent a symmetric SVD-like decompositionA U A Σ A Q H A ∈ S n , B U B Σ B Q H B ∈ S n (1745)where U A , U B , Q A , Q B ∈ C n×n are unitary matrices defined by (conferA.6.5)U A W A√δ(ψ(δ(Υ))) , QA W A√δ(ψ(δ(Υ)))H, ΣA = |Υ| (1746)U B W B√δ(ψ(δ(Λ))) , QB W B√δ(ψ(δ(Λ)))H, ΣB = |Λ| (1747)where step function ψ is defined in (1579). In this circumstance,S ⋆ = U A U H B = R ⋆T ∈ C n×n (1748)optimal matrices (1739) now unitary are related by transposition.optimal value of objective (1740) isThe‖U A Σ A Q H A − S ⋆ U B Σ B Q H BR ⋆ ‖ F = ‖ |Υ| − |Λ| ‖ F (1749)while the corresponding optimal value of trace maximization (1741) isC.4.2.2supR H =R −1S H =S −1 re tr(A T SBR) = tr(|Υ| |Λ|) (1750)Diagonal matricesNow suppose A and B are diagonal matricesA = Υ = δ 2 (Υ) ∈ S n , δ(Υ) ∈ K M (1751)B = Λ = δ 2 (Λ) ∈ S n , δ(Λ) ∈ K M (1752)both having their respective main diagonal entries arranged in nonincreasingorder:minimize ‖Υ − SΛR‖ FR , Ssubject to R H = R −1(1753)S H = S −1


Then we have a symmetric decomposition from unitary matrices as in (1745) where

    U_A ≜ √δ(ψ(δ(Υ))) ,   Q_A ≜ √δ(ψ(δ(Υ)))^H ,   Σ_A = |Υ|        (1754)
    U_B ≜ √δ(ψ(δ(Λ))) ,   Q_B ≜ √δ(ψ(δ(Λ)))^H ,   Σ_B = |Λ|        (1755)

Procrustes solution (1739) again sees the transposition relationship

    S⋆ = U_A U_B^H = R⋆^T ∈ C^{n×n}        (1748)

but both optimal unitary matrices are now themselves diagonal. So,

    S⋆ Λ R⋆ = δ(ψ(δ(Υ))) Λ δ(ψ(δ(Λ))) = δ(ψ(δ(Υ))) |Λ|        (1756)

C.5  Nonconvex quadratics

[346, §2] [321, §2] Given A ∈ S^n , b ∈ R^n , and ρ > 0

    minimize_x   ½ x^T A x + b^T x
    subject to   ‖x‖ ≤ ρ        (1757)

is a nonconvex problem for symmetric A unless A ≽ 0. But necessary and sufficient global optimality conditions are known for any symmetric A : vector x⋆ solves (1757) iff there exists a Lagrange multiplier λ⋆ ≥ 0 such that

    i)    (A + λ⋆ I) x⋆ = −b
    ii)   λ⋆ (‖x⋆‖ − ρ) = 0 ,   ‖x⋆‖ ≤ ρ
    iii)  A + λ⋆ I ≽ 0

λ⋆ is unique. C.5  x⋆ is unique if A + λ⋆ I ≻ 0. Equality constrained problem

    minimize_x   ½ x^T A x + b^T x
    subject to   ‖x‖ = ρ        (1758)

is nonconvex for any symmetric A matrix. x⋆ solves (1758) iff there exists λ⋆ ∈ R satisfying the associated conditions

    i)    (A + λ⋆ I) x⋆ = −b
    ii)   ‖x⋆‖ = ρ
    iii)  A + λ⋆ I ≽ 0

λ⋆ and x⋆ are unique as before. In 1998, Hiriart-Urruty [197] disclosed global optimality conditions for maximizing a convexly constrained convex quadratic; a nonconvex problem [307, §32].

C.5 The first two are necessary KKT conditions [61, §5.5.3] while condition iii governs passage to nonconvex global optimality.
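The conditions above suggest a simple solution procedure for (1757) whenever the so-called hard case does not occur: find λ⋆ ≥ max{0, −λ_min(A)} with (A + λ⋆I)x⋆ = −b , ‖x⋆‖ ≤ ρ , and complementarity. The following sketch (bisection on the secular equation; easy case only, and the random data and tolerances are arbitrary) then verifies conditions i)-iii) numerically.

```python
# Trust-region subproblem (1757), easy case, with an optimality-conditions check.
import numpy as np

n, rho = 5, 1.0
rng = np.random.default_rng(6)
M = rng.standard_normal((n, n))
A = (M + M.T)/2                                # symmetric, typically indefinite
b = rng.standard_normal(n)

lam_min = np.linalg.eigvalsh(A)[0]
xnorm = lambda lam: np.linalg.norm(np.linalg.solve(A + lam*np.eye(n), b))

if lam_min > 0 and xnorm(0.0) <= rho:
    lam = 0.0                                  # unconstrained minimizer already feasible
else:
    lo = max(0.0, -lam_min) + 1e-10            # bisection for ||x(lam)|| = rho
    hi = lo + 1.0
    while xnorm(hi) > rho:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5*(lo + hi)
        lo, hi = (mid, hi) if xnorm(mid) > rho else (lo, mid)
    lam = hi

x = np.linalg.solve(A + lam*np.eye(n), -b)
print(np.allclose((A + lam*np.eye(n)) @ x, -b))                                     # i
print(abs(lam*(np.linalg.norm(x) - rho)) < 1e-8, np.linalg.norm(x) <= rho + 1e-8)   # ii
print(np.linalg.eigvalsh(A + lam*np.eye(n))[0] >= -1e-9)                            # iii
```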


Appendix D

Matrix calculus

    From too much study, and from extreme passion, cometh madnesse.
                                        −Isaac Newton [154, §5]

D.1  Directional derivative, Taylor series

D.1.1  Gradients

Gradient of a differentiable real function f(x) : R^K → R with respect to its vector argument is defined uniquely in terms of partial derivatives

    ∇f(x) ≜ [ ∂f(x)/∂x_1 ; ∂f(x)/∂x_2 ; ... ; ∂f(x)/∂x_K ] ∈ R^K        (1759)

while the second-order gradient of the twice differentiable real function with respect to its vector argument is traditionally called the Hessian ;

    ∇²f(x) ≜ [ ∂²f(x)/∂x_1²        ∂²f(x)/∂x_1∂x_2   ···   ∂²f(x)/∂x_1∂x_K ]
             [ ∂²f(x)/∂x_2∂x_1     ∂²f(x)/∂x_2²      ···   ∂²f(x)/∂x_2∂x_K ]   ∈ S^K        (1760)
             [        ⋮                   ⋮            ⋱          ⋮        ]
             [ ∂²f(x)/∂x_K∂x_1     ∂²f(x)/∂x_K∂x_2   ···   ∂²f(x)/∂x_K²    ]


670 APPENDIX D. MATRIX CALCULUSThe gradient of vector-valued function v(x) : R→R N on real domain isa row-vector[∇v(x) ∂v 1 (x)∂xwhile the second-order gradient is∂v 2 (x)∂x· · ·∂v N (x)∂x]∈ R N (1761)[∇ 2 v(x) ∂ 2 v 1 (x)∂x 2∂ 2 v 2 (x)∂x 2 · · ·]∂ 2 v N (x)∈ R N (1762)∂x 2Gradient of vector-valued function h(x) : R K →R N on vector domain is⎡∇h(x) ⎢⎣∂h 1 (x)∂x 1∂h 1 (x)∂x 2.∂h 2 (x)∂x 1· · ·∂h 2 (x)∂x 2· · ·.∂h N (x)∂x 1∂h N (x)∂x 2.⎦(1763)∂h 1 (x) ∂h 2 (x) ∂h∂x K ∂x K· · · N (x)∂x K= [ ∇h 1 (x) ∇h 2 (x) · · · ∇h N (x) ] ∈ R K×Nwhile the second-order gradient has a three-dimensional representationdubbed cubix ; D.1⎡∇ 2 h(x) ⎢⎣∇ ∂h 1(x)∂x 1∇ ∂h 1(x)∂x 2.⎤⎥∇ ∂h 2(x)∂x 1· · · ∇ ∂h N(x)∂x 1∇ ∂h 2(x)∂x 2· · · ∇ ∂h N(x)∂x 2. .∇ ∂h 2(x)∂x K· · · ∇ ∂h N(x)⎦(1764)∇ ∂h 1(x)∂x K∂x K= [ ∇ 2 h 1 (x) ∇ 2 h 2 (x) · · · ∇ 2 h N (x) ] ∈ R K×N×Kwhere the gradient of each real entry is with respect to vector x as in (1759).⎤⎥D.1 The word matrix comes from the Latin for womb ; related to the prefix matri- derivedfrom mater meaning mother.


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 671The gradient of real function g(X) : R K×L →R on matrix domain is⎡∇g(X) ⎢⎣=∂g(X)∂X 11∂g(X)∂X 21.∂g(X)∂X K1∂g(X)∂X 12· · ·∂g(X)[∇X(:,1) g(X)∂X 22.· · ·∂g(X)∂X K2· · ·∂g(X)∂X 1L∂g(X)∂X 2L.∂g(X)∂X KL⎤∈ R K×L⎥⎦∇ X(:,2) g(X)...∇ X(:,L) g(X) ] ∈ R K×1×L (1765)where gradient ∇ X(:,i) is with respect to the i th column of X . The strangeappearance of (1765) in R K×1×L is meant to suggest a third dimensionperpendicular to the page (not a diagonal matrix). The second-order gradienthas representation⎡∇ 2 g(X) ⎢⎣=∇ ∂g(X)∂X 11∇ ∂g(X)∂X 21.∇ ∂g(X)∂X K1[∇∇X(:,1) g(X)∇ ∂g(X)∂X 12· · · ∇ ∂g(X)∂X 1L∇ ∂g(X)∂X 22· · · ∇ ∂g(X)∂X 2L. .∇ ∂g(X)∂X K2· · · ∇ ∂g(X)∂X KL⎤∈ R K×L×K×L⎥⎦∇∇ X(:,2) g(X)...∇∇ X(:,L) g(X) ] ∈ R K×1×L×K×L (1766)where the gradient ∇ is with respect to matrix X .


672 APPENDIX D. MATRIX CALCULUSGradient of vector-valued function g(X) : R K×L →R N on matrix domainis a cubix[∇X(:,1) g 1 (X) ∇ X(:,1) g 2 (X) · · · ∇ X(:,1) g N (X)∇g(X) ∇ X(:,2) g 1 (X) ∇ X(:,2) g 2 (X) · · · ∇ X(:,2) g N (X). . .. . .. . .∇ X(:,L) g 1 (X) ∇ X(:,L) g 2 (X) · · · ∇ X(:,L) g N (X) ]= [ ∇g 1 (X) ∇g 2 (X) · · · ∇g N (X) ] ∈ R K×N×L (1767)while the second-order gradient has a five-dimensional representation;[∇∇X(:,1) g 1 (X) ∇∇ X(:,1) g 2 (X) · · · ∇∇ X(:,1) g N (X)∇ 2 g(X) ∇∇ X(:,2) g 1 (X) ∇∇ X(:,2) g 2 (X) · · · ∇∇ X(:,2) g N (X).........∇∇ X(:,L) g 1 (X) ∇∇ X(:,L) g 2 (X) · · · ∇∇ X(:,L) g N (X) ]= [ ∇ 2 g 1 (X) ∇ 2 g 2 (X) · · · ∇ 2 g N (X) ] ∈ R K×N×L×K×L (1768)The gradient of matrix-valued function g(X) : R K×L →R M×N on matrixdomain has a four-dimensional representation called quartix (fourth-ordertensor)⎡⎤∇g 11 (X) ∇g 12 (X) · · · ∇g 1N (X)∇g(X) ∇g⎢ 21 (X) ∇g 22 (X) · · · ∇g 2N (X)⎥⎣ . .. ⎦ ∈ RM×N×K×L (1769)∇g M1 (X) ∇g M2 (X) · · · ∇g MN (X)while the second-order gradient has six-dimensional representation⎡⎤∇ 2 g 11 (X) ∇ 2 g 12 (X) · · · ∇ 2 g 1N (X)∇ 2 g(X) ∇⎢2 g 21 (X) ∇ 2 g 22 (X) · · · ∇ 2 g 2N (X)⎥⎣ . .. ⎦ ∈ RM×N×K×L×K×L∇ 2 g M1 (X) ∇ 2 g M2 (X) · · · ∇ 2 g MN (X)(1770)and so on.


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 673D.1.2Product rules for matrix-functionsGiven dimensionally compatible matrix-valued functions of matrix variablef(X) and g(X)while [53,8.3] [315]∇ X(f(X) T g(X) ) = ∇ X (f)g + ∇ X (g)f (1771)∇ X tr ( f(X) T g(X) ) = ∇ X(tr ( f(X) T g(Z) ) + tr ( g(X)f(Z) T))∣ ∣∣Z←X (1772)These expressions implicitly apply as well to scalar-, vector-, or matrix-valuedfunctions of scalar, vector, or matrix arguments.D.1.2.0.1 Example. Cubix.Suppose f(X) : R 2×2 →R 2 = X T a and g(X) : R 2×2 →R 2 = Xb . We wishto find∇ X(f(X) T g(X) ) = ∇ X a T X 2 b (1773)using the product rule. Formula (1771) calls for∇ X a T X 2 b = ∇ X (X T a)Xb + ∇ X (Xb)X T a (1774)Consider the first of the two terms:Á∂(XÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁ∇ X (f)g = ∇ X (X T a)Xb= [ ]Á∇(X T a) 1 ∇(XÁ(1775)T a) 2 Xb⎤T a) 2∂X 11(1776)⎡⎥The gradient of X T a forms a cubix in R 2×2×2 ; a.k.a, third-order tensor.⎡∂(X T a) 1∂X 11Á Á∂(X T a) 1∂(X T a) 2∂X 12Á⎤(Xb) 1∂X 12∇ X (X T a)Xb =⎢ ⎥⎣ ⎦ ∈ R 2×1×2∂(X T a) 1∂(X T a) 2(Xb) ∂X 21∂X 212⎢⎥⎣⎦∂(X T a) 1∂X 22∂(X T a) 2∂X 22


674 APPENDIX D. MATRIX CALCULUSBecause gradient of the product (1773) requires total change with respectto change in each entry of matrix X , the Xb vector must make an innerproduct with each vector in the second dimension of the cubix (indicated bydotted line segments);⎡ ⎤a 1 0∇ X (X T 0 a 1[ ]b1 Xa)Xb =11 + b 2 X 12⎢ ⎥∈ R 2×1×2⎣ a 2 0 ⎦ b 1 X 21 + b 2 X 220 a 2[ ]a1 (b= 1 X 11 + b 2 X 12 ) a 1 (b 1 X 21 + b 2 X 22 )∈ R 2×2a 2 (b 1 X 11 + b 2 X 12 ) a 2 (b 1 X 21 + b 2 X 22 )= ab T X T (1777)where the cubix appears as a complete 2 ×2×2 matrix. In like manner forthe second term ∇ X (g)f⎡ ⎤b 1 0∇ X (Xb)X T b 2 0[ ]X11 aa =1 + X 21 a 2⎢ ⎥∈ R 2×1×2⎣ 0 b 1⎦ X 12 a 1 + X 22 a 2 (1778)0 b 2= X T ab T ∈ R 2×2The solution∇ X a T X 2 b = ab T X T + X T ab T (1779)can be found from Table D.2.1 or verified using (1772).D.1.2.1Kronecker productA partial remedy for venturing into hyperdimensional matrix representations,such as the cubix or quartix, is to first vectorize matrices as in (37). Thisdevice gives rise to the Kronecker product of matrices ⊗ ; a.k.a, directproduct or tensor product. Although it sees reversal in the literature,[327,2.1] we adopt the definition: for A∈ R m×n and B ∈ R p×q⎡⎤B 11 A B 12 A · · · B 1q AB 21 A B 22 A · · · B 2q AB ⊗ A ⎢⎥⎣ . . . ⎦ ∈ Rpm×qn (1780)B p1 A B p2 A · · · B pq A


for which A ⊗ 1 = 1 ⊗ A = A (real unity acts like identity).

One advantage to vectorization is existence of a traditional two-dimensional matrix representation (second-order tensor) for the second-order gradient of a real function with respect to a vectorized matrix. For example, from §A.1.1 no.33 (§D.2.1) for square A, B ∈ R^{n×n} [166, §5.2] [12, §3]

    ∇²_{vec X} tr(A X B X^T) = ∇²_{vec X} vec(X)^T (B^T ⊗ A) vec X = B ⊗ A^T + B^T ⊗ A ∈ R^{n²×n²}        (1781)

A disadvantage is a large new but known set of algebraic rules (§A.1.1) and the fact that its mere use does not generally guarantee two-dimensional matrix representation of gradients.

Another application of the Kronecker product is to reverse order of appearance in a matrix product: Suppose we wish to weight the columns of a matrix S ∈ R^{M×N} , for example, by respective entries w_i from the main diagonal in

    W ≜ [ w_1 0 ; ⋱ ; 0^T w_N ] ∈ S^N        (1782)

A conventional means for accomplishing column weighting is to multiply S by diagonal matrix W on the right-hand side:

    S W = S [ w_1 0 ; ⋱ ; 0^T w_N ] = [ S(:,1)w_1  ···  S(:,N)w_N ] ∈ R^{M×N}        (1783)

To reverse product order such that diagonal matrix W instead appears to the left of S : for I ∈ S^M (Sze Wan)

    S W = (δ(W)^T ⊗ I) [ S(:,1)  0  0 ; 0  S(:,2)  ⋱ ; ⋱  ⋱  0 ; 0  0  S(:,N) ] ∈ R^{M×N}        (1784)

where the block matrix stacks the columns S(:,i) block-diagonally. To instead weight the rows of S via diagonal matrix W ∈ S^M , for I ∈ S^N

    W S = [ S(1,:)  0  0 ; 0  S(2,:)  ⋱ ; ⋱  ⋱  0 ; 0  0  S(M,:) ] (δ(W) ⊗ I) ∈ R^{M×N}        (1785)
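The identities invoked above are easy to confirm; the sketch below (column-stacking vec assumed) checks vec(AXB) = (B^T ⊗ A) vec X — the quadratic form behind (1781) — and the column-weighting reversal (1783)-(1784).

```python
# Kronecker product: vec identity and reversal of product order.
import numpy as np

rng = np.random.default_rng(7)
M, N = 4, 3
A = rng.standard_normal((M, M)); X = rng.standard_normal((M, N)); B = rng.standard_normal((N, N))
vec = lambda Z: Z.reshape(-1, order='F')

print(np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X)))                       # vec identity
print(np.allclose(np.trace(A @ X @ B @ X.T), vec(X) @ np.kron(B.T, A) @ vec(X)))   # behind (1781)

w = rng.standard_normal(N); W = np.diag(w)
S = rng.standard_normal((M, N))
blocks = np.zeros((M*N, N))                 # block-diagonal stack of the columns S(:,i), as in (1784)
for i in range(N):
    blocks[i*M:(i+1)*M, i] = S[:, i]
print(np.allclose(S @ W, np.kron(w.reshape(1, -1), np.eye(M)) @ blocks))            # (1784)
```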


676 APPENDIX D. MATRIX CALCULUSFor any matrices of like size, S,Y ∈ R M×N⎡⎤S(:, 1) 0 0S ◦Y = [ δ(Y (:, 1)) · · · δ(Y (:, N)) ] ⎢ 0 S(:, 2)...⎥⎣ ...... 0 ⎦ ∈ RM×N0 0 S(:, N)(1786)which converts a Hadamard product into a standard matrix product. In thespecial case that S = s and Y = y are vectors in R MD.1.3s ◦ y = δ(s)y (1787)s T ⊗ y = ys Ts ⊗ y T = sy T (1788)Chain rules for composite matrix-functionsGiven dimensionally compatible matrix-valued functions of matrix variablef(X) and g(X) [219,15.7]∇ X g ( f(X) T) = ∇ X f T ∇ f g (1789)∇ 2 X g( f(X) T) = ∇ X(∇X f T ∇ f g ) = ∇ 2 X f ∇ f g + ∇ X f T ∇ 2f g ∇ Xf (1790)D.1.3.1Two arguments∇ X g ( f(X) T , h(X) T) = ∇ X f T ∇ f g + ∇ X h T ∇ h g (1791)D.1.3.1.1 Example. Chain rule for two arguments. [42,1.1]∇ x g ( f(x) T , h(x) T) =g ( f(x) T , h(x) T) = (f(x) + h(x)) T A(f(x) + h(x)) (1792)[ ] [ ]x1εx1f(x) = , h(x) =εx 2 x 2(1793)[ 1 00 ε][ ε 0(A +A T )(f + h) +0 1](A +A T )(f + h)(1794)


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 677∇ x g ( f(x) T , h(x) T) =[ 1 + ε 00 1 + ε] ([ ] [ ])(A +A T x1 εx1) +εx 2 x 2(1795)from Table D.2.1.limε→0 ∇ xg ( f(x) T , h(x) T) = (A +A T )x (1796)These foregoing formulae remain correct when gradient produceshyperdimensional representation:D.1.4First directional derivativeAssume that a differentiable function g(X) : R K×L →R M×N has continuousfirst- and second-order gradients ∇g and ∇ 2 g over domg which is an openset. We seek simple expressions for the first and second directional derivativesin direction Y ∈R K×L →Y: respectively, dg ∈ R M×N and dg →Y2 ∈ R M×N .Assuming that the limit exists, we may state the partial derivative of themn th entry of g with respect to the kl th entry of X ;∂g mn (X)∂X klg mn (X + ∆t e= limk e T l ) − g mn(X)∆t→0 ∆t∈ R (1797)where e k is the k th standard basis vector in R K while e l is the l th standardbasis vector in R L . The total number of partial derivatives equals KLMNwhile the gradient is defined in their terms; the mn th entry of the gradient is⎡∇g mn (X) =⎢⎣∂g mn(X)∂X 11∂g mn(X)∂X 21.∂g mn(X)∂X K1∂g mn(X)∂X 12· · ·∂g mn(X)∂X 22· · ·.∂g mn(X)∂X K2· · ·∂g mn(X)∂X 1L∂g mn(X)∂X 2L.∂g mn(X)∂X KL⎤∈ R K×L (1798)⎥⎦while the gradient is a quartix⎡⎤∇g 11 (X) ∇g 12 (X) · · · ∇g 1N (X)∇g 21 (X) ∇g 22 (X) · · · ∇g 2N (X)∇g(X) = ⎢⎥⎣ . .. ⎦ ∈ RM×N×K×L (1799)∇g M1 (X) ∇g M2 (X) · · · ∇g MN (X)


678 APPENDIX D. MATRIX CALCULUSBy simply rotating our perspective of the four-dimensional representation ofgradient matrix, we find one of three useful transpositions of this quartix(connoted T 1 ):⎡∇g(X) T 1=⎢⎣∂g(X)∂X 11∂g(X)∂X 21.∂g(X)∂X K1∂g(X)∂X 12· · ·∂g(X)∂X 22.· · ·∂g(X)∂X K2· · ·∂g(X)∂X 1L∂g(X)∂X 2L.∂g(X)∂X KL⎤⎥⎦ ∈ RK×L×M×N (1800)When the limit for ∆t∈ R exists, it is easy to show by substitution ofvariables in (1797)∂g mn (X) g mn (X + ∆t Y kl eY kl = limk e T l ) − g mn(X)∂X kl ∆t→0 ∆t∈ R (1801)which may be interpreted as the change in g mn at X when the change in X klis equal to Y kl , the kl th entry of any Y ∈ R K×L . Because the total changein g mn (X) due to Y is the sum of change with respect to each and everyX kl , the mn th entry of the directional derivative is the corresponding totaldifferential [219,15.8]dg mn (X)| dX→Y= ∑ k,l∂g mn (X)∂X klY kl = tr ( ∇g mn (X) T Y ) (1802)= ∑ g mn (X + ∆t Y kl elimk e T l ) − g mn(X)(1803)∆t→0 ∆tk,lg mn (X + ∆t Y ) − g mn (X)= lim(1804)∆t→0 ∆t= d dt∣ g mn (X+ t Y ) (1805)t=0where t∈ R . Assuming finite Y , equation (1804) is called the Gâteauxdifferential [41, App.A.5] [199,D.2.1] [358,5.28] whose existence is impliedby existence of the Fréchet differential (the sum in (1802)). [250,7.2] Eachmay be understood as the change in g mn at X when the change in X is equal


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 679in magnitude and direction to Y . D.2 Hence the directional derivative,⎡⎤dg 11 (X) dg 12 (X) · · · dg 1N (X)→Ydg 21 (X) dg 22 (X) · · · dg 2N (X)dg(X) ⎢⎥∈ R M×N⎣ . . . ⎦∣dg M1 (X) dg M2 (X) · · · dg MN (X)⎡= ⎢⎣⎡∣dX→Ytr ( ∇g 11 (X) T Y ) tr ( ∇g 12 (X) T Y ) · · · tr ( ∇g 1N (X) T Y ) ⎤tr ( ∇g 21 (X) T Y ) tr ( ∇g 22 (X) T Y ) · · · tr ( ∇g 2N (X) T Y )⎥...tr ( ∇g M1 (X) T Y ) tr ( ∇g M2 (X) T Y ) · · · tr ( ∇g MN (X) T Y ) ⎦∑k,l∑=k,l⎢⎣ ∑k,l∂g 11 (X)∂X klY kl∑k,l∂g 21 (X)∂X klY kl∑k,l.∂g M1 (X)∑∂X klY klk,l∂g 12 (X)∂X klY kl · · ·∂g 22 (X)∂X klY kl · · ·.∂g M2 (X)∂X klY kl · · ·∑k,l∑k,l∑k,l∂g 1N (X)∂X klY kl∂g 2N (X)∂X klY kl.∂g MN (X)∂X kl⎤⎥⎦Y kl(1806)from which it follows→Ydg(X) = ∑ k,l∂g(X)∂X klY kl (1807)Yet for all X ∈ domg , any Y ∈R K×L , and some open interval of t∈ Rg(X+ t Y ) = g(X) + t →Ydg(X) + o(t 2 ) (1808)which is the first-order Taylor series expansion about X . [219,18.4][152,2.3.4] Differentiation with respect to t and subsequent t-zeroingisolates the second term of expansion. Thus differentiating and zeroingg(X+ t Y ) in t is an operation equivalent to individually differentiating andzeroing every entry g mn (X+ t Y ) as in (1805). So the directional derivativeof g(X) : R K×L →R M×N in any direction Y ∈ R K×L evaluated at X ∈ dom gbecomes→Ydg(X) = d dt∣ g(X+ t Y ) ∈ R M×N (1809)t=0D.2 Although Y is a matrix, we may regard it as a vector in R KL .


Figure 160: Strictly convex quadratic bowl in R²×R ; f(x) = x^T x : R² → R versus x on some open disc in R². Plane slice ∂H is perpendicular to function domain. Slice intersection with domain connotes bidirectional vector y . Slope of tangent line T at point (α , f(α)) is value of ∇_x f(α)^T y directional derivative (1834) at α in slice direction y . Negative gradient −∇_x f(x) ∈ R² is direction of steepest descent. [375] [219, §15.6] [152] When vector υ ∈ R³ entry υ_3 is half directional derivative in gradient direction at α and when [υ_1 ; υ_2] = ∇_x f(α) , then −υ points directly toward bowl bottom.

[277, §2.1, §5.4.5] [34, §6.3.1] which is simplest. In case of a real function g(X) : R^{K×L} → R

    →Y dg(X) = tr(∇g(X)^T Y)        (1831)

In case g(X) : R^K → R

    →Y dg(X) = ∇g(X)^T Y        (1834)

Unlike gradient, directional derivative does not expand dimension; directional derivative (1809) retains the dimensions of g . The derivative with respect to t makes the directional derivative resemble ordinary calculus (§D.2); e.g., when g(X) is linear, →Y dg(X) = g(Y). [250, §7.2]


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 681D.1.4.1Interpretation of directional derivativeIn the case of any differentiable real function g(X) : R K×L →R , thedirectional derivative of g(X) at X in any direction Y yields the slope ofg along the line {X+ t Y | t ∈ R} through its domain evaluated at t = 0.For higher-dimensional functions, by (1806), this slope interpretation can beapplied to each entry of the directional derivative.Figure 160, for example, shows a plane slice of a real convex bowl-shapedfunction f(x) along a line {α + t y | t ∈ R} through its domain. The slicereveals a one-dimensional real function of t ; f(α + t y). The directionalderivative at x = α in direction y is the slope of f(α + t y) with respectto t at t = 0. In the case of a real function having vector argumenth(X) : R K →R , its directional derivative in the normalized direction of itsgradient is the gradient magnitude. (1834) For a real function of real variable,the directional derivative evaluated at any point in the function domain isjust the slope of that function there scaled by the real direction. (confer3.6)Directional derivative generalizes our one-dimensional notion of derivativeto a multidimensional domain. When direction Y coincides with a memberof the standard Cartesian basis e k e T l (60), then a single partial derivative∂g(X)/∂X kl is obtained from directional derivative (1807); such is each entryof gradient ∇g(X) in equalities (1831) and (1834), for example.D.1.4.1.1 Theorem. Directional derivative optimality condition.[250,7.4] Suppose f(X) : R K×L →R is minimized on convex set C ⊆ R p×kby X ⋆ , and the directional derivative of f exists there. Then for all X ∈ C→X−X ⋆df(X) ≥ 0 (1810)⋄D.1.4.1.2 Example. Simple bowl.Bowl function (Figure 160)f(x) : R K → R (x − a) T (x − a) − b (1811)has function offset −b ∈ R , axis of revolution at x = a , and positive definiteHessian (1760) everywhere in its domain (an open hyperdisc in R K ); id est,strictly convex quadratic f(x) has unique global minimum equal to −b at


∇ ∂g mn(X)∂X kl682 APPENDIX D. MATRIX CALCULUSx = a . A vector −υ based anywhere in domf × R pointing toward theunique bowl-bottom is specified:[ ] x − aυ ∝∈ R K × R (1812)f(x) + bSuch a vector issince the gradient isυ =⎡⎢⎣∇ x f(x)→∇ xf(x)1df(x)2⎤⎥⎦ (1813)∇ x f(x) = 2(x − a) (1814)and the directional derivative in the direction of the gradient is (1834)D.1.5→∇ xf(x)df(x) = ∇ x f(x) T ∇ x f(x) = 4(x − a) T (x − a) = 4(f(x) + b) (1815)Second directional derivativeBy similar argument, it so happens: the second directional derivative isequally simple. Given g(X) : R K×L →R M×N on open domain,⎡∂ 2 g mn(X)⎤∂X kl ∂X 12· · ·= ∂∇g mn(X)∂X kl=⎡∇ 2 g mn (X) =⎢⎣⎡=⎢⎣⎢⎣∇ ∂gmn(X)∂X 11∇ ∂gmn(X)∂X 21.∇ ∂gmn(X)∂X K1∂∇g mn(X)∂X 11∂∇g mn(X)∂X 21.∂∇g mn(X)∂X K1∂ 2 g mn(X)∂X kl ∂X 11∂ 2 g mn(X)∂X kl ∂X 21.∂ 2 g mn(X)∂X kl ∂X K1∂ 2 g mn(X)∂X kl ∂X 22· · ·.∂ 2 g mn(X)∂X kl ∂X K2· · ·∇ ∂gmn(X)∂X 12· · · ∇ ∂gmn(X)∂X 1L∇ ∂gmn(X)∂X 22· · · ∇ ∂gmn(X)∂X 2L∂ 2 g mn(X)∂X kl ∂X 1L∂ 2 g mn(X)∂X kl ∂X 2L.∂ 2 g mn(X)∂X kl ∂X KL∈ R K×L (1816)⎥⎦∈ R K×L×K×L⎥. . ⎦∇ ∂gmn(X)∂X K2· · · ∇ ∂gmn(X)∂X KL∂∇g mn(X)∂X 12· · ·∂∇g mn(X)∂X 22.· · ·∂∇g mn(X)∂X K2· · ·∂∇g mn(X)∂X 1L∂∇g mn(X)∂X 2L.∂∇g mn(X)∂X KL⎤⎥⎦⎤(1817)


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 683Rotating our perspective, we get several views of the second-order gradient:⎡∇ 2 g(X) = ⎢⎣∇ 2 g 11 (X) ∇ 2 g 12 (X) · · · ∇ 2 g 1N (X)∇ 2 g 21 (X) ∇ 2 g 22 (X) · · · ∇ 2 g 2N (X). .∇ 2 g M1 (X) ∇ 2 g M2 (X) · · ·.∇ 2 g MN (X)⎤⎥⎦ ∈ RM×N×K×L×K×L (1818)⎡∇ 2 g(X) T 1=⎢⎣∇ ∂g(X)∂X 11∇ ∂g(X)∂X 21.∇ ∂g(X)∂X K1∇ ∂g(X)∂X 12· · · ∇ ∂g(X)∂X 1L∇ ∂g(X)∂X 22.· · · ∇ ∂g(X)∂X 2L.∇ ∂g(X)∂X K2· · · ∇ ∂g(X)∂X KL⎤⎥⎦ ∈ RK×L×M×N×K×L (1819)⎡∇ 2 g(X) T 2=⎢⎣∂∇g(X)∂X 11∂∇g(X)∂X 21.∂∇g(X)∂X K1∂∇g(X)∂X 12· · ·∂∇g(X)∂X 22.· · ·∂∇g(X)∂X K2· · ·∂∇g(X)∂X 1L∂∇g(X)∂X 2L.∂∇g(X)∂X KL⎤⎥⎦ ∈ RK×L×K×L×M×N (1820)Assuming the limits exist, we may state the partial derivative of the mn thentry of g with respect to the kl th and ij th entries of X ;∂ 2 g mn(X)∂X kl ∂X ij=g mn(X+∆t elimk e T l +∆τ e i eT j )−gmn(X+∆t e k eT l )−(gmn(X+∆τ e i eT)−gmn(X))j∆τ,∆t→0∆τ ∆t(1821)Differentiating (1801) and then scaling by Y ij∂ 2 g mn(X)∂X kl ∂X ijY kl Y ij = lim∆t→0∂g mn(X+∆t Y kl e k e T l )−∂gmn(X)∂X ijY∆tij(1822)g mn(X+∆t Y kl e= limk e T l +∆τ Y ij e i e T j )−gmn(X+∆t Y kl e k e T l )−(gmn(X+∆τ Y ij e i e T)−gmn(X))j∆τ,∆t→0∆τ ∆t


684 APPENDIX D. MATRIX CALCULUSwhich can be proved by substitution of variables in (1821).second-order total differential due to any Y ∈R K×L isThe mn thd 2 g mn (X)| dX→Y= ∑ i,j∑k,l∂ 2 g mn (X)Y kl Y ij = tr(∇ X tr ( ∇g mn (X) T Y ) )TY∂X kl ∂X ij(1823)= ∑ ∂g mn (X + ∆t Y ) − ∂g mn (X)limY ij∆t→0 ∂Xi,jij ∆t(1824)g mn (X + 2∆t Y ) − 2g mn (X + ∆t Y ) + g mn (X)= lim∆t→0∆t 2 (1825)= d2dt 2 ∣∣∣∣t=0g mn (X+ t Y ) (1826)Hence the second directional derivative,→Ydg 2 (X) ⎡⎢⎣⎡tr(∇tr ( ∇g 11 (X) T Y ) )TY =tr(∇tr ( ∇g 21 (X) T Y ) )TY ⎢⎣ .tr(∇tr ( ∇g M1 (X) T Y ) )TYd 2 g 11 (X) d 2 g 12 (X) · · · d 2 g 1N (X)d 2 g 21 (X) d 2 g 22 (X) · · · d 2 g 2N (X). .d 2 g M1 (X) d 2 g M2 (X) · · ·.d 2 g MN (X)⎤⎥∈ R M×N⎦∣∣dX→Ytr(∇tr ( ∇g 12 (X) T Y ) )TY · · · tr(∇tr ( ∇g 1N (X) T Y ) ) ⎤TYtr(∇tr ( ∇g 22 (X) T Y ) )TY · · · tr(∇tr ( ∇g 2N (X) T Y ) )T Y ..tr(∇tr ( ∇g M2 (X) T Y ) )TY · · · tr(∇tr ( ∇g MN (X) T Y ) ⎥) ⎦TY⎡=⎢⎣∑ ∑i,jk,l∑ ∑i,jk,l∂ 2 g 11 (X)∂X kl ∂X ijY kl Y ij∂ 2 g 21 (X)∂X kl ∂X ijY kl Y ij.∑ ∑∂ 2 g M1 (X)∂X kl ∂X ijY kl Y iji,jk,l∑ ∑∂ 2 g 12 (X)∂X kl ∂X ijY kl Y ij · · ·i,ji,jk,l∑ ∑∂ 2 g 22 (X)∂X kl ∂X ijY kl Y ij · · ·k,l.∑ ∑∂ 2 g M2 (X)∂X kl ∂X ijY kl Y ij · · ·i,jk,l∑ ∑ ⎤∂ 2 g 1N (X)∂X kl ∂X ijY kl Y iji,j k,l∑ ∑∂ 2 g 2N (X)∂X kl ∂X ijY kl Y iji,j k,l. ⎥∑ ∑ ∂ 2 g MN (X) ⎦∂X kl ∂X ijY kl Y iji,jk,l(1827)


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 685from which it follows→Ydg 2 (X) = ∑ i,j∑k,l∂ 2 g(X)∂X kl ∂X ijY kl Y ij = ∑ i,j∂∂X ij→Ydg(X)Y ij (1828)Yet for all X ∈ domg , any Y ∈R K×L , and some open interval of t∈Rg(X+ t Y ) = g(X) + t dg(X) →Y+ 1 →Yt2dg 2 (X) + o(t 3 ) (1829)2!which is the second-order Taylor series expansion about X . [219,18.4][152,2.3.4] Differentiating twice with respect to t and subsequent t-zeroingisolates the third term of the expansion. Thus differentiating and zeroingg(X+ t Y ) in t is an operation equivalent to individually differentiating andzeroing every entry g mn (X+ t Y ) as in (1826). So the second directionalderivative of g(X) : R K×L →R M×N becomes [277,2.1,5.4.5] [34,6.3.1]→Ydg 2 (X) = d2dt 2 ∣∣∣∣t=0g(X+ t Y ) ∈ R M×N (1830)which is again simplest. (confer (1809)) Directional derivative retains thedimensions of g .D.1.6directional derivative expressionsIn the case of a real function g(X) : R K×L →R , all its directional derivativesare in R :→Ydg(X) = tr ( ∇g(X) T Y ) (1831)→Ydg 2 (X) = tr(∇ X tr ( ∇g(X) T Y ) ) ( )T→YY = tr ∇ X dg(X) T Y(→Ydg 3 (X) = tr ∇ X tr(∇ X tr ( ∇g(X) T Y ) ) ) ( )T TY →YY = tr ∇ X dg 2 (X) T Y(1832)(1833)


686 APPENDIX D. MATRIX CALCULUSIn the case g(X) : R K →R has vector argument, they further simplify:and so on.D.1.7Taylor series→Ydg(X) = ∇g(X) T Y (1834)→Ydg 2 (X) = Y T ∇ 2 g(X)Y (1835)→Ydg 3 (X) = ∇ X(Y T ∇ 2 g(X)Y ) TY (1836)Series expansions of the differentiable matrix-valued function g(X) , ofmatrix argument, were given earlier in (1808) and (1829). Assuming g(X)has continuous first-, second-, and third-order gradients over the open setdomg , then for X ∈ domg and any Y ∈ R K×L the complete Taylor seriesis expressed on some open interval of µ∈Rg(X+µY ) = g(X) + µ dg(X) →Y+ 1 →Yµ2dg 2 (X) + 1 →Yµ3dg 3 (X) + o(µ 4 ) (1837)2! 3!or on some open interval of ‖Y ‖ 2g(Y ) = g(X) +→Y −X→Y −X→Y −Xdg(X) + 1 dg 2 (X) + 1 dg 3 (X) + o(‖Y ‖ 4 ) (1838)2! 3!which are third-order expansions about X . The mean value theorem fromcalculus is what insures finite order of the series. [219] [42,1.1] [41, App.A.5][199,0.4] These somewhat unbelievable formulae imply that a function canbe determined over the whole of its domain by knowing its value and all itsdirectional derivatives at a single point X .D.1.7.0.1 Example. Inverse-matrix function.Say g(Y )= Y −1 . From the table on page 692,→Ydg(X) = d dt∣ g(X+ t Y ) = −X −1 Y X −1 (1839)t=0→Ydg 2 (X) = d2dt 2 ∣∣∣∣t=0g(X+ t Y ) = 2X −1 Y X −1 Y X −1 (1840)→Ydg 3 (X) = d3dt 3 ∣∣∣∣t=0g(X+ t Y ) = −6X −1 Y X −1 Y X −1 Y X −1 (1841)


Let's find the Taylor series expansion of g about X = I : Since g(I) = I , for ‖Y‖_2 < 1 (µ = 1 in (1837))

    g(I + Y) = (I + Y)^{-1} = I − Y + Y² − Y³ + ...        (1842)

If Y is small, (I + Y)^{-1} ≈ I − Y . D.3  Now we find Taylor series expansion about X :

    g(X + Y) = (X + Y)^{-1} = X^{-1} − X^{-1}Y X^{-1} + X^{-1}Y X^{-1}Y X^{-1} − ...        (1843)

If Y is small, (X + Y)^{-1} ≈ X^{-1} − X^{-1}Y X^{-1} .

D.1.7.0.2  Exercise. log det. (confer [61, p.644])
Find the first three terms of a Taylor series expansion for log det Y . Specify an open interval over which the expansion holds in vicinity of X .

D.1.8  Correspondence of gradient to derivative

From the foregoing expressions for directional derivative, we derive a relationship between the gradient with respect to matrix X and the derivative with respect to real variable t :

D.1.8.1  first-order

Removing evaluation at t = 0 from (1809), D.4  we find an expression for the directional derivative of g(X) in direction Y evaluated anywhere along a line {X + t Y | t ∈ R} intersecting dom g

    →Y dg(X + t Y) = d/dt g(X + t Y)        (1844)

In the general case g(X) : R^{K×L} → R^{M×N} , from (1802) and (1805) we find

    tr(∇_X g_mn(X + t Y)^T Y) = d/dt g_mn(X + t Y)        (1845)

D.3 Had we instead set g(Y) = (I + Y)^{-1} , then the equivalent expansion would have been about X = 0.
D.4 Justified by replacing X with X + t Y in (1802)-(1804); beginning,
    dg_mn(X + t Y)|_{dX→Y} = Σ_{k,l} ∂g_mn(X + t Y)/∂X_kl  Y_kl


688 APPENDIX D. MATRIX CALCULUSwhich is valid at t = 0, of course, when X ∈ dom g . In the important caseof a real function g(X) : R K×L →R , from (1831) we have simplytr ( ∇ X g(X+ t Y ) T Y ) = d g(X+ t Y ) (1846)dtWhen, additionally, g(X) : R K →R has vector argument,∇ X g(X+ t Y ) T Y = d g(X+ t Y ) (1847)dtD.1.8.1.1 Example. Gradient.g(X) = w T X T Xw , X ∈ R K×L , w ∈R L . Using the tables inD.2,tr ( ∇ X g(X+ t Y ) T Y ) = tr ( 2ww T (X T + t Y T )Y ) (1848)Applying the equivalence (1846),= 2w T (X T Y + t Y T Y )w (1849)ddt g(X+ t Y ) = d dt wT (X+ t Y ) T (X+ t Y )w (1850)= w T( X T Y + Y T X + 2t Y T Y ) w (1851)= 2w T (X T Y + t Y T Y )w (1852)which is the same as (1849); hence, equivalence is demonstrated.It is easy to extract ∇g(X) from (1852) knowing only (1846):D.1.8.2tr ( ∇ X g(X+ t Y ) T Y ) = 2w T (X T Y + t Y T Y )w= 2 tr ( ww T (X T + t Y T )Y )tr ( ∇ X g(X) T Y ) = 2 tr ( ww T X T Y )(1853)⇔∇ X g(X) = 2Xww Tsecond-orderLikewise removing the evaluation at t = 0 from (1830),→Ydg 2 (X+ t Y ) = d2dt2g(X+ t Y ) (1854)we can find a similar relationship between the second-order gradient and thesecond derivative: In the general case g(X) : R K×L →R M×N from (1823) and(1826),


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 689tr(∇ X tr ( ∇ X g mn (X+ t Y ) T Y ) )TY = d2dt 2g mn(X+ t Y ) (1855)In the case of a real function g(X) : R K×L →R we have, of course,tr(∇ X tr ( ∇ X g(X+ t Y ) T Y ) )TY = d2dt2g(X+ t Y ) (1856)From (1835), the simpler case, where the real function g(X) : R K →R hasvector argument,Y T ∇ 2 Xd2g(X+ t Y )Y =dt2g(X+ t Y ) (1857)D.1.8.2.1 Example. Second-order gradient.Given real function g(X) = log detX having domain int S K + , we want tofind ∇ 2 g(X)∈ R K×K×K×K . From the tables inD.2,h(X) ∇g(X) = X −1 ∈ int S K + (1858)so ∇ 2 g(X)=∇h(X). By (1845) and (1808), for Y ∈ S Ktr ( ∇h mn (X) T Y ) =d dt∣ h mn (X+ t Y ) (1859)( t=0)d =dt∣∣ h(X+ t Y ) (1860)t=0 mn( )d =dt∣∣ (X+ t Y ) −1 (1861)t=0mn= − ( X −1 Y X −1) mn(1862)Setting Y to a member of {e k e T l ∈ R K×K | k,l=1... K} , and employing aproperty (39) of the trace function we find∇ 2 g(X) mnkl = tr ( )∇h mn (X) T e k e T l = ∇hmn (X) kl = − ( X −1 e k e T l X−1) mn(1863)∇ 2 g(X) kl = ∇h(X) kl = − ( X −1 e k e T l X −1) ∈ R K×K (1864)From all these first- and second-order expressions, we may generate newones by evaluating both sides at arbitrary t (in some open interval) but onlyafter the differentiation.
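Both worked examples of this correspondence can be confirmed by central finite differences; the sketch below checks ∇_X (w^T X^T X w) = 2Xww^T from Example D.1.8.1.1 and ∂(X^{-1})/∂X_kl = −X^{-1} e_k e_l^T X^{-1} from (1864). The step size h is an arbitrary small number, not from the text.

```python
# Finite-difference spot check of two gradients derived above.
import numpy as np

K, L = 4, 3
rng = np.random.default_rng(11)
X = rng.standard_normal((K, L)); w = rng.standard_normal(L)
h = 1e-6

g = lambda Z: w @ Z.T @ Z @ w
G = np.zeros((K, L))
for k in range(K):
    for l in range(L):
        E = np.zeros((K, L)); E[k, l] = 1
        G[k, l] = (g(X + h*E) - g(X - h*E)) / (2*h)
print(np.allclose(G, 2 * X @ np.outer(w, w), atol=1e-6))      # grad = 2 X w w^T

S = rng.standard_normal((K, K)) + K*np.eye(K)                 # nonsingular square matrix
k, l = 0, 2
E = np.zeros((K, K)); E[k, l] = 1
num = (np.linalg.inv(S + h*E) - np.linalg.inv(S - h*E)) / (2*h)
Si = np.linalg.inv(S)
print(np.allclose(num, -np.outer(Si[:, k], Si[l, :]), atol=1e-5))   # (1864)
```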


690 APPENDIX D. MATRIX CALCULUSD.2 Tables of gradients and derivativesResults may be numerically proven by Romberg extrapolation. [108]When proving results for symmetric matrices algebraically, it is criticalto take gradients ignoring symmetry and to then substitute symmetricentries afterward. [166] [65]a,b∈R n , x,y ∈ R k , A,B∈ R m×n , X,Y ∈ R K×L , t,µ∈R ,i,j,k,l,K,L,m,n,M,N are integers, unless otherwise noted.x µ means δ(δ(x) µ ) for µ∈R ; id est, entrywise vector exponentiation.δ is the main-diagonal linear operator (1415). x 0 1, X 0 I if square.d⎡ ⎢dx ⎣ddx 1.ddx k⎤⎥⎦,→ydg(x) ,→ydg 2 (x) (directional derivativesD.1), log x , e x ,√|x| , sgnx, x/y (Hadamard quotient), x (entrywise square root),etcetera, are maps f : R k → R k that maintain dimension; e.g., (A.1.1)ddx x−1 ∇ x 1 T δ(x) −1 1 (1865)For A a scalar or square matrix, we have the Taylor series [77,3.6]e A ∞∑k=01k! Ak (1866)Further, [331,5.4]e A ≻ 0 ∀A ∈ S m (1867)For all square A and integer kdet k A = detA k (1868)


D.2. TABLES OF GRADIENTS AND DERIVATIVES 691D.2.1algebraic∇ x x = ∇ x x T = I ∈ R k×k ∇ X X = ∇ X X T I ∈ R K×L×K×L (identity)∇ x (Ax − b) = A T∇ x(x T A − b T) = A∇ x (Ax − b) T (Ax − b) = 2A T (Ax − b)∇ 2 x (Ax − b) T (Ax − b) = 2A T A∇ x ‖Ax − b‖ = A T (Ax − b)/‖Ax − b‖∇ x z T |Ax − b| = A T δ(z)sgn(Ax ( − b), z i ≠ 0 ⇒ (Ax − b) i ≠ 0∇ x 1 T f(|Ax − b|) = A T df(y)δ ∣)sgn(Ax − b)y=|Ax−b|dy(∇ x x T Ax + 2x T By + y T Cy ) = ( A +A T) x + 2By∇ x (x(+ y) T A(x + y) = (A +A T )(x + y)∇x2 x T Ax + 2x T By + y T Cy ) = A +A T ∇ X a T Xb = ∇ X b T X T a = ab T∇ X a T X 2 b = X T ab T + ab T X T∇ X a T X −1 b = −X −T ab T X −T∇ X (X −1 ) kl = ∂X−1∂X kl= −X −1 e ke T l X−1 ,∇ x a T x T xb = 2xa T b ∇ X a T X T Xb = X(ab T + ba T )confer(1800)(1864)∇ x a T xx T b = (ab T + ba T )x ∇ X a T XX T b = (ab T + ba T )X∇ x a T x T xa = 2xa T a∇ x a T xx T a = 2aa T x∇ x a T yx T b = ba T y∇ x a T y T xb = yb T a∇ x a T xy T b = ab T y∇ x a T x T yb = ya T b∇ X a T X T Xa = 2Xaa T∇ X a T XX T a = 2aa T X∇ X a T YX T b = ba T Y∇ X a T Y T Xb = Y ab T∇ X a T XY T b = ab T Y∇ X a T X T Y b = Y ba T


692 APPENDIX D. MATRIX CALCULUSalgebraic continuedd(X+ t Y ) = Ydtddt BT (X+ t Y ) −1 A = −B T (X+ t Y ) −1 Y (X+ t Y ) −1 Addt BT (X+ t Y ) −T A = −B T (X+ t Y ) −T Y T (X+ t Y ) −T Addt BT (X+ t Y ) µ A = ..., −1 ≤ µ ≤ 1, X, Y ∈ S M +B T (X+ t Y ) −1 A = 2B T (X+ t Y ) −1 Y (X+ t Y ) −1 Y (X+ t Y ) −1 Adt 2 d 3B T (X+ t Y ) −1 A = −6B T (X+ t Y ) −1 Y (X+ t Y ) −1 Y (X+ t Y ) −1 Y (X+ t Y ) −1 Adt 3d 2ddt((X+ t Y ) T A(X+ t Y ) ) = Y T AX + X T AY + 2tY T AY(d 2dt (X+ t Y ) T A(X+ t Y ) ) = 2Y T AY2((X+ t Y ) T A(X+ t Y ) ) −1ddtddt= − ( (X+ t Y ) T A(X+ t Y ) ) −1(Y T AX + X T AY + 2tY T AY ) ( (X+ t Y ) T A(X+ t Y ) ) −1((X+ t Y )A(X+ t Y )) = YAX + XAY + 2tYAYd 2dt 2 ((X+ t Y )A(X+ t Y )) = 2YAYD.2.1.0.1 Exercise. Expand these tables.Provide unfinished table entries indicated by ... throughoutD.2.D.2.1.0.2 Exercise. log. (D.1.7)Find the first four terms of the Taylor series expansion for log x about x = 1.Prove that log x ≤ x −1; alternatively, plot the supporting hyperplane tothe hypograph of log x at[ xlog x]=[ 10]. D.2.2trace Kronecker∇ vec X tr(AXBX T ) = ∇ vec X vec(X) T (B T ⊗ A) vec X = (B ⊗A T + B T ⊗A) vec X∇ 2 vec X tr(AXBXT ) = ∇ 2 vec X vec(X)T (B T ⊗ A) vec X = B ⊗A T + B T ⊗A


D.2. TABLES OF GRADIENTS AND DERIVATIVES 693D.2.3trace∇ x µ x = µI ∇ X tr µX = ∇ X µ trX = µI∇ x 1 T δ(x) −1 1 = ddx x−1 = −x −2∇ x 1 T δ(x) −1 y = −δ(x) −2 y∇ X tr X −1 = −X −2T∇ X tr(X −1 Y ) = ∇ X tr(Y X −1 ) = −X −T Y T X −Tddx xµ = µx µ−1 ∇ X tr X µ = µX µ−1 , X ∈ S M∇ X tr X j = jX (j−1)T∇ x (b − a T x) −1 = (b − a T x) −2 a∇ X tr ( (B − AX) −1) = ( (B − AX) −2 A ) T∇ x (b − a T x) µ = −µ(b − a T x) µ−1 a∇ x x T y = ∇ x y T x = y∇ X tr(X T Y ) = ∇ X tr(Y X T ) = ∇ X tr(Y T X) = ∇ X tr(XY T ) = Y∇ X tr(AXBX T ) = ∇ X tr(XBX T A) = A T XB T + AXB∇ X tr(AXBX) = ∇ X tr(XBXA) = A T X T B T + B T X T A T∇ X tr(AXAXAX) = ∇ X tr(XAXAXA) = 3(AXAXA) T∇ X tr(Y X k ) = ∇ X tr(X k Y ) = k−1 ∑ (X i Y X k−1−i) Ti=0∇ X tr(Y T XX T Y ) = ∇ X tr(X T Y Y T X) = 2 Y Y T X∇ X tr(Y T X T XY ) = ∇ X tr(XY Y T X T ) = 2XY Y T∇ X tr ( (X + Y ) T (X + Y ) ) = 2(X + Y ) = ∇ X ‖X + Y ‖ 2 F∇ X tr((X + Y )(X + Y )) = 2(X + Y ) T∇ X tr(A T XB) = ∇ X tr(X T AB T ) = AB T∇ X tr(A T X −1 B) = ∇ X tr(X −T AB T ) = −X −T AB T X −T∇ X a T Xb = ∇ X tr(ba T X) = ∇ X tr(Xba T ) = ab T∇ X b T X T a = ∇ X tr(X T ab T ) = ∇ X tr(ab T X T ) = ab T∇ X a T X −1 b = ∇ X tr(X −T ab T ) = −X −T ab T X −T∇ X a T X µ b = ...


694 APPENDIX D. MATRIX CALCULUStrace continueddtrg(X+ t Y ) = tr d g(X+ t Y )dt dt[203, p.491]dtr(X+ t Y ) = trYdtddt trj (X+ t Y ) = j tr j−1 (X+ t Y ) tr Ydtr(X+ t Y dt )j = j tr((X+ t Y ) j−1 Y )(∀j)d2tr((X+ t Y )Y ) = trYdtdtr( (X+ t Y ) k Y ) = d tr(Y (X+ t Y dt dt )k ) = k tr ( (X+ t Y ) k−1 Y 2) , k ∈{0, 1, 2}dtr( (X+ t Y ) k Y ) = d tr(Y (X+ t Y dt dt )k ) = tr k−1 ∑(X+ t Y ) i Y (X+ t Y ) k−1−i Ydtr((X+ t Y dt )−1 Y ) = − tr((X+ t Y ) −1 Y (X+ t Y ) −1 Y )dtr( B T (X+ t Y ) −1 A ) = − tr ( B T (X+ t Y ) −1 Y (X+ t Y ) −1 A )dtdtr( B T (X+ t Y ) −T A ) = − tr ( B T (X+ t Y ) −T Y T (X+ t Y ) −T A )dtdtr( B T (X+ t Y ) −k A ) = ..., k>0dtdtr( B T (X+ t Y ) µ A ) = ..., −1 ≤ µ ≤ 1, X, Y ∈ S M dt +d 2tr ( B T (X+ t Y ) −1 A ) = 2 tr ( B T (X+ t Y ) −1 Y (X+ t Y ) −1 Y (X+ t Y ) −1 A )dt 2dtr( (X+ t Y ) T A(X+ t Y ) ) = tr ( Y T AX + X T AY + 2tY T AY )dtd 2tr ( (X+ t Y ) T A(X+ t Y ) ) = 2 tr ( Y T AY )dt 2 (d ((X+ tr dt t Y ) T A(X+ t Y ) ) ) −1( ((X+= − tr t Y ) T A(X+ t Y ) ) −1(Y T AX + X T AY + 2tY T AY ) ( (X+ t Y ) T A(X+ t Y ) ) ) −1ddttr((X+ t Y )A(X+ t Y )) = tr(YAX + XAY + 2tYAY )d 2dt 2 tr((X+ t Y )A(X+ t Y )) = 2 tr(YAY )i=0


D.2. TABLES OF GRADIENTS AND DERIVATIVES 695D.2.4logarithmic determinantx≻0, detX >0 on some neighborhood of X , and det(X+ t Y )>0 onsome open interval of t ; otherwise, log( ) would be discontinuous. [82, p.75]dlog x = x−1∇dxddx log x−1 = −x −1ddx log xµ = µx −1X log detX = X −T∇ 2 X log det(X) kl = ∂X−T∂X kl= − ( X −1 e k e T l X−1) T, confer (1817)(1864)∇ X log detX −1 = −X −T∇ X log det µ X = µX −T∇ X log detX µ = µX −T∇ X log detX k = ∇ X log det k X = kX −T∇ X log det µ (X+ t Y ) = µ(X+ t Y ) −T∇ x log(a T x + b) = a 1a T x+b∇ X log det(AX+ B) = A T (AX+ B) −T∇ X log det(I ± A T XA) = ±A(I ± A T XA) −T A T∇ X log det(X+ t Y ) k = ∇ X log det k (X+ t Y ) = k(X+ t Y ) −Tdlog det(X+ t Y ) = tr((X+ t Y dt )−1 Y )d 2dt 2 log det(X+ t Y ) = − tr((X+ t Y ) −1 Y (X+ t Y ) −1 Y )dlog det(X+ t Y dt )−1 = − tr((X+ t Y ) −1 Y )d 2dt 2 log det(X+ t Y ) −1 = tr((X+ t Y ) −1 Y (X+ t Y ) −1 Y )ddt log det(δ(A(x + t y) + a)2 + µI)= tr ( (δ(A(x + t y) + a) 2 + µI) −1 2δ(A(x + t y) + a)δ(Ay) )
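Consistent with the earlier remark that table entries may be proven numerically, here is a finite-difference spot check (central differences, arbitrary step h) of two entries above: ∇_X log det X = X^{-T} and d/dt log det(X + tY) = tr((X + tY)^{-1} Y).

```python
# Finite-difference check of two log-determinant table entries.
import numpy as np

n = 4
rng = np.random.default_rng(9)
M = rng.standard_normal((n, n))
X = M @ M.T + n*np.eye(n)                 # positive definite X
Y = rng.standard_normal((n, n)); Y = (Y + Y.T)/2
h = 1e-6

# d/dt log det(X + tY) at t = 0
num = (np.linalg.slogdet(X + h*Y)[1] - np.linalg.slogdet(X - h*Y)[1]) / (2*h)
print(np.allclose(num, np.trace(np.linalg.solve(X, Y)), atol=1e-5))

# gradient entry: d/dX_kl log det X = (X^{-T})_kl = (X^{-1})_lk
k, l = 1, 2
E = np.zeros((n, n)); E[k, l] = 1
num_kl = (np.linalg.slogdet(X + h*E)[1] - np.linalg.slogdet(X - h*E)[1]) / (2*h)
print(np.allclose(num_kl, np.linalg.inv(X)[l, k], atol=1e-5))
```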


D.2.5 determinant

∇_X det X = ∇_X det X^T = det(X) X^{-T}
∇_X det X^{-1} = −det(X^{-1}) X^{-T} = −det(X)^{-1} X^{-T}
∇_X det^µ X = µ det^µ(X) X^{-T}
∇_X det X^µ = µ det(X^µ) X^{-T}
∇_X det X^k = k det^{k−1}(X)(tr(X)I − X^T) ,  X ∈ R^{2×2}
∇_X det X^k = ∇_X det^k X = k det(X^k) X^{-T} = k det^k(X) X^{-T}
∇_X det^µ(X+tY) = µ det^µ(X+tY) (X+tY)^{-T}
∇_X det(X+tY)^k = ∇_X det^k(X+tY) = k det^k(X+tY) (X+tY)^{-T}
d/dt det(X+tY) = det(X+tY) tr((X+tY)^{-1}Y)
d²/dt² det(X+tY) = det(X+tY) (tr²((X+tY)^{-1}Y) − tr((X+tY)^{-1} Y (X+tY)^{-1} Y))
d/dt det(X+tY)^{-1} = −det(X+tY)^{-1} tr((X+tY)^{-1}Y)
d²/dt² det(X+tY)^{-1} = det(X+tY)^{-1} (tr²((X+tY)^{-1}Y) + tr((X+tY)^{-1} Y (X+tY)^{-1} Y))
d/dt det^µ(X+tY) = µ det^µ(X+tY) tr((X+tY)^{-1}Y)

D.2.6 logarithmic

Matrix logarithm. [203, p.493]

d/dt log(X+tY)^µ = µY(X+tY)^{-1} = µ(X+tY)^{-1}Y ,    XY = YX
d/dt log(I − tY)^µ = −µY(I − tY)^{-1} = −µ(I − tY)^{-1}Y
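A corresponding check for the determinant derivative, again an illustrative NumPy sketch with assumed random test data:

```python
# Numerical sanity check (illustrative):
#   d/dt det(X + tY) = det(X + tY) tr((X + tY)^{-1} Y)   evaluated at t = 0.
import numpy as np

rng = np.random.default_rng(3)
n = 4
X = rng.standard_normal((n, n)) + n*np.eye(n)     # diagonally weighted to keep X invertible
Y = rng.standard_normal((n, n))

h = 1e-6
numeric  = (np.linalg.det(X + h*Y) - np.linalg.det(X - h*Y)) / (2*h)
analytic = np.linalg.det(X) * np.trace(np.linalg.solve(X, Y))
print(np.isclose(numeric, analytic, rtol=1e-5))   # True
```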


D.2.7 exponential

Matrix exponential. [77,3.6,4.5] [331,5.4]

∇_X e^{tr(Y^T X)} = ∇_X det e^{Y^T X} = e^{tr(Y^T X)} Y    (∀X,Y)
∇_X tr e^{YX} = e^{Y^T X^T} Y^T = Y^T e^{X^T Y^T}
∇_x 1^T e^{Ax} = A^T e^{Ax}
∇_x 1^T e^{|Ax|} = A^T δ(sgn(Ax)) e^{|Ax|} ,    (Ax)_i ≠ 0
∇_x log(1^T e^x) = (1/(1^T e^x)) e^x
∇²_x log(1^T e^x) = (1/(1^T e^x)) (δ(e^x) − (1/(1^T e^x)) e^x e^{xT})
∇_x Π_{i=1}^k x_i^{1/k} = (1/k)(Π_{i=1}^k x_i^{1/k}) (1/x)
∇²_x Π_{i=1}^k x_i^{1/k} = −(1/k)(Π_{i=1}^k x_i^{1/k}) (δ(x)^{-2} − (1/k)(1/x)(1/x)^T)
d/dt e^{tY} = e^{tY} Y = Y e^{tY}
d/dt e^{X+tY} = e^{X+tY} Y = Y e^{X+tY} ,    XY = YX
d²/dt² e^{X+tY} = e^{X+tY} Y² = Y e^{X+tY} Y = Y² e^{X+tY} ,    XY = YX
d^j/dt^j e^{tr(X+tY)} = e^{tr(X+tY)} tr^j(Y)
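The first exponential derivative can be checked numerically as well. The sketch below (illustrative, not from the text) restricts to symmetric Y so that the matrix exponential may be computed by eigendecomposition with NumPy alone.

```python
# Numerical sanity check (illustrative): for symmetric Y,
#   d/dt e^{tY} = Y e^{tY} = e^{tY} Y ,
# computing e^{tY} via eigendecomposition  e^{tY} = V diag(e^{t lam}) V^T.
import numpy as np

rng = np.random.default_rng(4)
n = 4
Y = rng.standard_normal((n, n)); Y = Y + Y.T        # symmetric test matrix
lam, V = np.linalg.eigh(Y)

expm = lambda t: (V * np.exp(t*lam)) @ V.T           # scales columns of V, then V^T

t0, h = 0.3, 1e-6
numeric  = (expm(t0 + h) - expm(t0 - h)) / (2*h)
analytic = Y @ expm(t0)
print(np.allclose(numeric, analytic, atol=1e-5))     # True
```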




Appendix E

Projection

For any A ∈ R^{m×n}, the pseudoinverse [202,7.3 prob.9] [250,6.12 prob.19] [159,5.5.4] [331, App.A]

    A† ≜ lim_{t→0+} (A^T A + tI)^{-1} A^T = lim_{t→0+} A^T (AA^T + tI)^{-1}  ∈ R^{n×m}    (1869)

is a unique matrix from the optimal solution set to minimize_X ‖AX − I‖²_F (3.6.0.0.2). For any t > 0

    I − A(A^T A + tI)^{-1}A^T = t(AA^T + tI)^{-1}    (1870)

Equivalently, pseudoinverse A† is that unique matrix satisfying the Moore-Penrose conditions: [204,1.3] [373]

    1. AA†A = A        3. (AA†)^T = AA†
    2. A†AA† = A†      4. (A†A)^T = A†A

which are necessary and sufficient to establish the pseudoinverse whose principal action is to injectively map R(A) onto R(A^T) (Figure 161). Conditions 1 and 3 are necessary and sufficient for AA† to be the orthogonal projector on R(A), while conditions 2 and 4 hold iff A†A is the orthogonal projector on R(A^T).

Range and nullspace of the pseudoinverse [267] [328,III.1 exer.1]

    R(A†) = R(A^T) ,    R(A†^T) = R(A)    (1871)
    N(A†) = N(A^T) ,    N(A†^T) = N(A)    (1872)

can be derived by singular value decomposition (A.6).
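The limit definition (1869) and the four Moore-Penrose conditions are easy to confirm numerically; the following sketch (illustrative, with an assumed rank-deficient random matrix) uses NumPy's pinv.

```python
# Illustrative NumPy check of the limit formula (1869) and the
# Moore-Penrose conditions; A is made rank deficient on purpose.
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))   # 6x5, rank 3
Ap = np.linalg.pinv(A)                                          # A-dagger

# Moore-Penrose conditions 1-4
print(np.allclose(A @ Ap @ A, A),
      np.allclose(Ap @ A @ Ap, Ap),
      np.allclose((A @ Ap).T, A @ Ap),
      np.allclose((Ap @ A).T, Ap @ A))                          # True True True True

# limit definition (1869), approximated with a small t > 0
for t in (1e-2, 1e-4, 1e-6):
    approx = np.linalg.solve(A.T @ A + t*np.eye(5), A.T)
    print(t, np.linalg.norm(approx - Ap))                       # error shrinks with t
```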


Figure 161: (confer Figure 16) Pseudoinverse A† ∈ R^{n×m} action: [331, p.449] Component of b in N(A^T) maps to 0, while component of b in R(A) maps to rowspace R(A^T). For any A ∈ R^{m×n}, inversion is bijective ∀p ∈ R(A).
x = A†b  ⇔  x ∈ R(A^T) & b − Ax ⊥ R(A)  ⇔  x ⊥ N(A) & b − Ax ∈ N(A^T). [48]

The following relations reliably hold without qualification:

    a. A^{T†} = A^{†T}
    b. A^{††} = A
    c. (AA^T)† = A^{†T} A†
    d. (A^T A)† = A† A^{†T}
    e. (AA†)† = AA†
    f. (A†A)† = A†A

Yet for arbitrary A, B it is generally true that (AB)† ≠ B†A† :

E.0.0.0.1 Theorem. Pseudoinverse of product. [169] [57] [241, exer.7.23]
For A ∈ R^{m×n} and B ∈ R^{n×k}

    (AB)† = B†A†    (1873)

if and only if

    R(A^T AB) ⊆ R(B)  and  R(BB^T A^T) ⊆ R(A^T)    (1874)    ⋄


U^T = U† for orthonormal (including the orthogonal) matrices U. So, for orthonormal matrices U, Q and arbitrary A

    (UAQ^T)† = QA†U^T    (1875)

E.0.0.0.2 Exercise. Kronecker inverse.
Prove:
    (A ⊗ B)† = A† ⊗ B†    (1876)

E.0.1 Logical deductions

When A is invertible, A† = A^{-1}; so A†A = AA† = I. Otherwise, for A ∈ R^{m×n} [153,5.3.3.1] [241,7] [295]

    g. A†A = I ,  A† = (A^T A)^{-1}A^T ,  rank A = n
    h. AA† = I ,  A† = A^T(AA^T)^{-1} ,  rank A = m
    i. A†Aω = ω ,  ω ∈ R(A^T)
    j. AA†υ = υ ,  υ ∈ R(A)
    k. A†A = AA† ,  A normal
    l. A^{k†} = A^{†k} ,  A normal, k an integer

Equivalent to the corresponding Moore-Penrose condition:

    1. A^T = A^T AA†  or  A^T = A†AA^T
    2. A^{†T} = A^{†T}A†A  or  A^{†T} = AA†A^{†T}

When A is symmetric, A† is symmetric and (A.6)

    A ≽ 0  ⇔  A† ≽ 0    (1877)

E.0.1.0.1 Example. Solution to classical linear equation Ax = b.
In 2.5.1.1, the solution set to matrix equation Ax = b was represented as an intersection of hyperplanes. Regardless of rank of A or its shape (fat or skinny), interpretation as a hyperplane intersection describing a possibly empty affine set generally holds. If matrix A is rank deficient or fat, there is an infinity of solutions x when b ∈ R(A). A unique solution occurs when the hyperplanes intersect at a single point.
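Exercise E.0.0.0.2 can be spot-checked numerically; the sketch below is illustrative only (random test matrices assumed) and of course does not replace a proof.

```python
# Illustrative NumPy check of Exercise E.0.0.0.2:  (A kron B)^dagger = A^dagger kron B^dagger
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((4, 2))

lhs = np.linalg.pinv(np.kron(A, B))
rhs = np.kron(np.linalg.pinv(A), np.linalg.pinv(B))
print(np.allclose(lhs, rhs))        # True
```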


Given arbitrary matrix A (of any rank and dimension) and vector b not necessarily in R(A), we wish to find a best solution x⋆ to

    Ax ≈ b    (1878)

in a Euclidean sense by solving an algebraic expression for orthogonal projection of b on R(A)

    minimize_x ‖Ax − b‖₂    (1879)

Necessary and sufficient condition for optimal solution to this unconstrained optimization is the so-called normal equation that results from zeroing the convex objective's gradient: (D.2.1)

    A^T Ax = A^T b    (1880)

normal because error vector b − Ax is perpendicular to R(A); id est, A^T(b − Ax) = 0. Given any matrix A and any vector b, the normal equation is solvable exactly; always so, because R(A^T A) = R(A^T) and A^T b ∈ R(A^T).

Given particular x̃ ∈ R(A^T) solving (1880), then (Figure 161) it is necessarily unique and x̃ = x⋆ = A†b. When A is skinny-or-square full-rank, normal equation (1880) can be solved exactly by inversion:

    x⋆ = (A^T A)^{-1}A^T b ≡ A†b    (1881)

For matrix A of arbitrary rank and shape, on the other hand, A^T A might not be invertible. Yet the normal equation can always be solved exactly by: (1869)

    x⋆ = lim_{t→0+} (A^T A + tI)^{-1}A^T b = A†b    (1882)

invertible for any positive value of t by (1451). The exact inversion (1881) and this pseudoinverse solution (1882) each solve a limited regularization

    lim_{t→0+} minimize_x ‖Ax − b‖₂² + t‖x‖₂²   ≡   lim_{t→0+} minimize_x ‖ [ A ; √t I ] x − [ b ; 0 ] ‖₂²    (1883)

simultaneously providing least squares solution to (1879) and the classical least norm solution^{E.1} [331, App.A.4] [48] (confer 3.2.0.0.1)

    arg minimize_x ‖x‖₂   subject to   Ax = AA†b    (1884)

E.1 This means: optimal solutions of lesser norm than the so-called least norm solution (1884) can be obtained (at expense of approximation Ax ≈ b hence, of perpendicularity) by ignoring the limiting operation and introducing finite positive values of t into (1883).
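A small numerical illustration of (1880)-(1884), assuming NumPy and a randomly generated rank-deficient A: the pseudoinverse solution satisfies the normal equation, has least norm among least-squares solutions, and is the limit of the ridge-regularized solutions (1883).

```python
# Illustrative check: x* = A^dagger b solves the normal equation, is least-norm
# among least-squares solutions, and is approached by the regularized problem.
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((8, 4)) @ rng.standard_normal((4, 6))   # 8x6, rank 4
b = rng.standard_normal(8)

x_star = np.linalg.pinv(A) @ b

print(np.allclose(A.T @ A @ x_star, A.T @ b))   # normal equation (1880) holds

# any other least-squares solution x* + z, z in N(A), has larger norm
Z = np.linalg.svd(A)[2].T[:, 4:]                # last right singular vectors span N(A)
z = Z @ rng.standard_normal(2)
print(np.linalg.norm(A @ (x_star + z) - b) >= np.linalg.norm(A @ x_star - b) - 1e-12,
      np.linalg.norm(x_star + z) > np.linalg.norm(x_star))      # True True

# regularized problem (1883) approaches x* as t -> 0+
for t in (1e-2, 1e-4, 1e-6):
    x_t = np.linalg.solve(A.T @ A + t*np.eye(6), A.T @ b)
    print(t, np.linalg.norm(x_t - x_star))      # error shrinks with t
```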


where AA†b is the orthogonal projection of vector b on R(A). Least norm solution can be interpreted as orthogonal projection of the origin 0 on affine subset A = {x | Ax = AA†b}; (E.5.0.0.5, E.5.0.0.6)

    arg minimize_x ‖x − 0‖₂   subject to   x ∈ A    (1885)

equivalently, maximization of the Euclidean ball until it kisses A; rather, arg dist(0, A).

E.1 Idempotent matrices

Projection matrices are square and defined by idempotence, P² = P; [331,2.6] [204,1.3] equivalent to the condition: P be diagonalizable [202,3.3 prob.3] with eigenvalues φ_i ∈ {0, 1}. [393,4.1 thm.4.1] Idempotent matrices are not necessarily symmetric. The transpose of an idempotent matrix remains idempotent; P^T P^T = P^T. Solely excepting P = I, all projection matrices are neither orthogonal (B.5) nor invertible. [331,3.4] The collection of all projection matrices of particular dimension does not form a convex set.

Suppose we wish to project nonorthogonally (obliquely) on the range of any particular matrix A ∈ R^{m×n}. All idempotent matrices projecting nonorthogonally on R(A) may be expressed:

    P = A(A† + BZ^T) ∈ R^{m×m}    (1886)

where R(P) = R(A),^{E.2} B ∈ R^{n×k} for k ∈ {1 ... m} is otherwise arbitrary, and Z ∈ R^{m×k} is any matrix whose range is in N(A^T); id est,

    A^T Z = A†Z = 0    (1887)

Evidently, the collection of nonorthogonal projectors projecting on R(A) is an affine subset

    P_k = {A(A† + BZ^T) | B ∈ R^{n×k}}    (1888)

E.2 Proof. R(P) ⊆ R(A) is obvious [331,3.6]. By (141) and (142),
    R(A† + BZ^T) = {(A† + BZ^T)y | y ∈ R^m} ⊇ {(A† + BZ^T)y | y ∈ R(A)} = R(A^T)
    R(P) = {A(A† + BZ^T)y | y ∈ R^m} ⊇ {A(A† + BZ^T)y | (A† + BZ^T)y ∈ R(A^T)} = R(A)


When matrix A in (1886) is skinny full-rank (A†A = I) or has orthonormal columns (A^T A = I), either property leads to a biorthogonal characterization of nonorthogonal projection:

E.1.1 Biorthogonal characterization

Any nonorthogonal projector P² = P ∈ R^{m×m} projecting on nontrivial R(U) can be defined by a biorthogonality condition Q^T U = I; the biorthogonal decomposition of P being (confer (1886))

    P = UQ^T ,    Q^T U = I    (1889)

where^{E.3} (B.1.1.1)

    R(P) = R(U) ,    N(P) = N(Q^T)    (1890)

and where generally (confer (1916))^{E.4}

    P^T ≠ P ,    P† ≠ P ,    ‖P‖₂ ≠ 1 ,    P ⋡ 0    (1891)

and P is not nonexpansive (1917).

(⇐) To verify assertion (1889) we observe: because idempotent matrices are diagonalizable (A.5), [202,3.3 prob.3] they must have the form (1547)

    P = SΦS^{-1} = Σ_{i=1}^m φ_i s_i w_i^T = Σ_{i=1}^{k ≤ m} s_i w_i^T    (1892)

that is a sum of k = rank P independent projector dyads (idempotent dyads, B.1.1, E.6.2.1) where φ_i ∈ {0, 1} are the eigenvalues of P [393,4.1 thm.4.1] in diagonal matrix Φ ∈ R^{m×m} arranged in nonincreasing

E.3 Proof. Obviously, R(P) ⊆ R(U). Because Q^T U = I
    R(P) = {UQ^T x | x ∈ R^m} ⊇ {UQ^T Uy | y ∈ R^k} = R(U)
E.4 Orthonormal decomposition (1913) (confer E.3.4) is a special case of biorthogonal decomposition (1889) characterized by (1916). So, these characteristics (1891) are not necessary conditions for biorthogonality.


order, and where s_i, w_i ∈ R^m are the right- and left-eigenvectors of P, respectively, which are independent and real.^{E.5} Therefore

    U ≜ S(:, 1:k) = [s₁ ··· s_k] ∈ R^{m×k}    (1893)

is the full-rank matrix S ∈ R^{m×m} having m − k columns truncated (corresponding to 0 eigenvalues), while

    Q^T ≜ S^{-1}(1:k, :) = [w₁^T ; ... ; w_k^T] ∈ R^{k×m}    (1894)

is matrix S^{-1} having the corresponding m − k rows truncated. By the 0 eigenvalues theorem (A.7.3.0.1), R(U) = R(P), R(Q) = R(P^T), and

    R(P) = span{s_i | φ_i = 1 ∀i}
    N(P) = span{s_i | φ_i = 0 ∀i}
    R(P^T) = span{w_i | φ_i = 1 ∀i}
    N(P^T) = span{w_i | φ_i = 0 ∀i}    (1895)

Thus biorthogonality Q^T U = I is a necessary condition for idempotence, and so the collection of nonorthogonal projectors projecting on R(U) is the affine subset P_k = UQ_k^T where Q_k = {Q | Q^T U = I , Q ∈ R^{m×k}}.

(⇒) Biorthogonality is a sufficient condition for idempotence;

    P² = Σ_{i=1}^k s_i w_i^T  Σ_{j=1}^k s_j w_j^T = P    (1896)

id est, if the cross-products are annihilated, then P² = P.

Nonorthogonal projection of x on R(P) has expression like a biorthogonal expansion,

    Px = UQ^T x = Σ_{i=1}^k w_i^T x s_i    (1897)

When the domain is restricted to range of P, say x = Uξ for ξ ∈ R^k, then x = Px = UQ^T Uξ = Uξ and expansion is unique because the eigenvectors

E.5 Eigenvectors of a real matrix corresponding to real eigenvalues must be real. (A.5.0.0.1)


Figure 162: Nonorthogonal projection of x ∈ R³ on R(U) = R² under biorthogonality condition; id est, Px = UQ^T x such that Q^T U = I. Any point along imaginary line T connecting x to Px will be projected nonorthogonally on Px with respect to horizontal plane constituting R² = aff cone(U) in this example. Extreme directions of cone(U) correspond to two columns of U; likewise for cone(Q). For purpose of illustration, we truncate each conic hull by truncating coefficients of conic combination at unity. Conic hull cone(Q) is headed upward at an angle, out of plane of page. Nonorthogonal projection would fail were N(Q^T) in R(U) (were T parallel to a line in R(U)).


are linearly independent. Otherwise, any component of x in N(P) = N(Q^T) will be annihilated. Direction of nonorthogonal projection is orthogonal to R(Q) ⇔ Q^T U = I; id est, for Px ∈ R(U)

    Px − x ⊥ R(Q) in R^m    (1898)

E.1.1.0.1 Example. Illustration of nonorthogonal projector.
Figure 162 shows cone(U), the conic hull of the columns of

    U = [ 1  1 ; −0.5  0.3 ; 0  0 ]    (1899)

from nonorthogonal projector P = UQ^T. Matrix U has a limitless number of left inverses because N(U^T) is nontrivial. Similarly depicted is left inverse Q^T from (1886)

    Q = U^{†T} + ZB^T = [ 0.3750  0.6250 ; −1.2500  1.2500 ; 0  0 ] + [ 0 ; 0 ; 1 ][ 0.5  0.5 ]
      = [ 0.3750  0.6250 ; −1.2500  1.2500 ; 0.5000  0.5000 ]    (1900)

where Z ∈ N(U^T) and matrix B is selected arbitrarily; id est, Q^T U = I because U is full-rank.

Direction of projection on R(U) is orthogonal to R(Q). Any point along line T in the figure, for example, will have the same projection. Were matrix Z instead equal to 0, then cone(Q) would become the relative dual to cone(U) (sharing the same affine hull; 2.13.8, confer Figure 56a). In that case, projection Px = UU†x of x on R(U) becomes orthogonal projection (and unique minimum-distance).

E.1.2 Idempotence summary

Nonorthogonal subspace-projector P is a (convex) linear operator defined by idempotence or biorthogonal decomposition (1889), but characterized not by symmetry or positive semidefiniteness or nonexpansivity (1917).
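The numbers in Example E.1.1.0.1 can be reproduced directly; the following is an illustrative NumPy sketch (not part of the original text) confirming biorthogonality, idempotence, and asymmetry of the resulting projector.

```python
# Reproducing Example E.1.1.0.1 numerically (illustrative).
import numpy as np

U = np.array([[ 1.0, 1.0],
              [-0.5, 0.3],
              [ 0.0, 0.0]])
Z = np.array([[0.0], [0.0], [1.0]])          # R(Z) = N(U^T)
B = np.array([[0.5], [0.5]])                 # the arbitrary choice made in (1900)

Q = np.linalg.pinv(U).T + Z @ B.T            # Q = U^{dagger T} + Z B^T
P = U @ Q.T                                  # nonorthogonal projector on R(U)

print(np.allclose(Q.T @ U, np.eye(2)))       # biorthogonality  Q^T U = I
print(np.allclose(P @ P, P))                 # idempotence      P^2 = P
print(np.allclose(P, P.T))                   # False: P is not symmetric
```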


E.2 I−P , Projection on algebraic complement

It follows from the diagonalizability of idempotent matrices that I − P must also be a projection matrix because it too is idempotent, and because it may be expressed

    I − P = S(I − Φ)S^{-1} = Σ_{i=1}^m (1 − φ_i) s_i w_i^T    (1901)

where (1 − φ_i) ∈ {1, 0} are the eigenvalues of I − P (1452) whose eigenvectors s_i, w_i are identical to those of P in (1892). A consequence of that complementary relationship of eigenvalues is the fact, [343,2] [338,2] for subspace projector P = P² ∈ R^{m×m}

    R(P) = span{s_i | φ_i = 1 ∀i} = span{s_i | (1 − φ_i) = 0 ∀i} = N(I − P)
    N(P) = span{s_i | φ_i = 0 ∀i} = span{s_i | (1 − φ_i) = 1 ∀i} = R(I − P)
    R(P^T) = span{w_i | φ_i = 1 ∀i} = span{w_i | (1 − φ_i) = 0 ∀i} = N(I − P^T)
    N(P^T) = span{w_i | φ_i = 0 ∀i} = span{w_i | (1 − φ_i) = 1 ∀i} = R(I − P^T)    (1902)

that is easy to see from (1892) and (1901). Idempotent I − P therefore projects vectors on its range: N(P). Because all eigenvectors of a real idempotent matrix are real and independent, the algebraic complement of R(P) [227,3.3] is equivalent to N(P);^{E.6} id est,

    R(P)⊕N(P) = R(P^T)⊕N(P^T) = R(P^T)⊕N(P) = R(P)⊕N(P^T) = R^m    (1903)

because R(P) ⊕ R(I−P) = R^m. For idempotent P ∈ R^{m×m}, consequently,

    rank P + rank(I − P) = m    (1904)

E.2.0.0.1 Theorem. Rank/Trace. [393,4.1 prob.9] (confer (1921))

    P² = P  ⇔  rank P = tr P  and  rank(I − P) = tr(I − P)    (1905)    ⋄

E.6 The same phenomenon occurs with symmetric (nonidempotent) matrices, for example. When summands in A ⊕ B = R^m are orthogonal vector spaces, the algebraic complement is the orthogonal complement.


E.2.1 Universal projector characteristic

Although projection is not necessarily orthogonal and R(P) ⊥̸ R(I − P) in general, still for any projector P and any x ∈ R^m

    Px + (I − P)x = x    (1906)

must hold where R(I − P) = N(P) is the algebraic complement of R(P). The algebraic complement of closed convex cone K , for example, is the negative dual cone −K*. (2023)

E.3 Symmetric idempotent matrices

When idempotent matrix P is symmetric, P is an orthogonal projector. In other words, the direction of projection of point x ∈ R^m on subspace R(P) is orthogonal to R(P); id est, for P² = P ∈ S^m and projection Px ∈ R(P)

    Px − x ⊥ R(P) in R^m    (1907)

(confer (1898)) Perpendicularity is a necessary and sufficient condition for orthogonal projection on a subspace. [109,4.9]

A condition equivalent to (1907) is: Norm of direction x − Px is the infimum over all nonorthogonal projections of x on R(P); [250,3.3] for P² = P ∈ S^m, R(P) = R(A), matrices A, B, Z and positive integer k as defined for (1886), and given x ∈ R^m

    ‖x − Px‖₂ = ‖x − AA†x‖₂ = inf_{B∈R^{n×k}} ‖x − A(A† + BZ^T)x‖₂ = dist(x, R(P))    (1908)

The infimum is attained for R(B) ⊆ N(A) over any affine subset of nonorthogonal projectors (1888) indexed by k.

Proof is straightforward: The vector 2-norm is a convex function. Setting gradient of the norm-square to 0, applying D.2,

    (A^T ABZ^T − A^T(I − AA†)) xx^T A = 0   ⇔   A^T ABZ^T xx^T A = 0    (1909)

because A^T = A^T AA†. Projector P = AA† is therefore unique; the minimum-distance projector is the orthogonal projector, and vice versa.


We get P = AA† so this projection matrix must be symmetric. Then for any matrix A ∈ R^{m×n}, symmetric idempotent P projects a given vector x in R^m orthogonally on R(A). Under either condition (1907) or (1908), the projection Px is unique minimum-distance; for subspaces, perpendicularity and minimum-distance conditions are equivalent.

E.3.1 Four subspaces

We summarize the orthogonal projectors projecting on the four fundamental subspaces: for A ∈ R^{m×n}

    A†A : R^n on ( R(A†A) = R(A^T) )
    AA† : R^m on ( R(AA†) = R(A) )
    I − A†A : R^n on ( R(I − A†A) = N(A) )
    I − AA† : R^m on ( R(I − AA†) = N(A^T) )    (1910)

A basis for each fundamental subspace comprises the linearly independent column vectors from its associated symmetric projection matrix:

    basis R(A^T) ⊆ A†A ⊆ R(A^T)
    basis R(A) ⊆ AA† ⊆ R(A)
    basis N(A) ⊆ I − A†A ⊆ N(A)
    basis N(A^T) ⊆ I − AA† ⊆ N(A^T)    (1911)

For completeness:^{E.7} (1902)

    N(A†A) = N(A)
    N(AA†) = N(A^T)
    N(I − A†A) = R(A^T)
    N(I − AA†) = R(A)    (1912)

E.3.2 Orthogonal characterization

Any symmetric projector P² = P ∈ S^m projecting on nontrivial R(Q) can be defined by the orthonormality condition Q^T Q = I. When skinny matrix

E.7 Proof is by singular value decomposition (A.6.2): N(A†A) ⊆ N(A) is obvious. Conversely, suppose A†Ax = 0. Then x^T A†Ax = x^T QQ^T x = ‖Q^T x‖² = 0 where A = UΣQ^T is the subcompact singular value decomposition. Because R(Q) = R(A^T), then x ∈ N(A) which implies N(A†A) ⊇ N(A). ∴ N(A†A) = N(A).
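The four projectors of (1910) can be formed directly from the pseudoinverse; a short illustrative NumPy sketch (assumed random rank-deficient test matrix) follows.

```python
# Illustrative check of the four orthogonal projectors (1910).
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 7))   # 5x7, rank 3
Ap = np.linalg.pinv(A)

P_rowspace = Ap @ A                  # projects R^7 on R(A^T)
P_range    = A @ Ap                  # projects R^5 on R(A)
P_null     = np.eye(7) - Ap @ A      # projects R^7 on N(A)
P_leftnull = np.eye(5) - A @ Ap      # projects R^5 on N(A^T)

for P in (P_rowspace, P_range, P_null, P_leftnull):
    assert np.allclose(P, P.T) and np.allclose(P @ P, P)        # symmetric idempotent

x = rng.standard_normal(7)
print(np.allclose(A @ (P_null @ x), 0))               # N(A) component is annihilated by A
print(np.allclose(P_rowspace @ x + P_null @ x, x))    # complementary split of R^7
```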


Q ∈ R^{m×k} has orthonormal columns, then Q† = Q^T by the Moore-Penrose conditions. Hence, any P having an orthonormal decomposition (E.3.4)

    P = QQ^T ,    Q^T Q = I    (1913)

where [331,3.3] (1607)

    R(P) = R(Q) ,    N(P) = N(Q^T)    (1914)

is an orthogonal projector projecting on R(Q) having, for Px ∈ R(Q) (confer (1898))

    Px − x ⊥ R(Q) in R^m    (1915)

From (1913), orthogonal projector P is obviously positive semidefinite (A.3.1.0.6); necessarily,

    P^T = P ,    P† = P ,    ‖P‖₂ = 1 ,    P ≽ 0    (1916)

and ‖Px‖ = ‖QQ^T x‖ = ‖Q^T x‖ because ‖Qy‖ = ‖y‖ ∀y ∈ R^k. All orthogonal projectors are therefore nonexpansive because

    √⟨Px, x⟩ = ‖Px‖ = ‖Q^T x‖ ≤ ‖x‖ ∀x ∈ R^m    (1917)

the Bessel inequality, [109] [227] with equality when x ∈ R(Q).

From the diagonalization of idempotent matrices (1892)

    P = SΦS^T = Σ_{i=1}^m φ_i s_i s_i^T = Σ_{i=1}^{k ≤ m} s_i s_i^T    (1918)

orthogonal projection of point x on R(P) has expression like an orthogonal expansion [109,4.10]

    Px = QQ^T x = Σ_{i=1}^k s_i^T x s_i    (1919)

where

    Q = S(:, 1:k) = [s₁ ··· s_k] ∈ R^{m×k}    (1920)

and where the s_i [sic] are orthonormal eigenvectors of symmetric idempotent P. When the domain is restricted to range of P, say x = Qξ for ξ ∈ R^k, then x = Px = QQ^T Qξ = Qξ and expansion is unique. Otherwise, any component of x in N(Q^T) will be annihilated.


E.3.2.0.1 Theorem. Symmetric rank/trace. (confer (1905) (1456))

    P^T = P , P² = P   ⇔   rank P = tr P = ‖P‖²_F  and  rank(I − P) = tr(I − P) = ‖I − P‖²_F    (1921)    ⋄

Proof. We take as given Theorem E.2.0.0.1 establishing idempotence. We have left only to show tr P = ‖P‖²_F ⇒ P^T = P, established in [393,7.1].

E.3.3 Summary, symmetric idempotent

(confer E.1.2) Orthogonal projector P is a (convex) linear operator defined [199,A.3.1] by idempotence and symmetry, and characterized by positive semidefiniteness and nonexpansivity. The algebraic complement (E.2) to R(P) becomes the orthogonal complement R(I − P); id est, R(P) ⊥ R(I − P).

E.3.4 Orthonormal decomposition

When Z = 0 in the general nonorthogonal projector A(A† + BZ^T) (1886), an orthogonal projector results (for any matrix A) characterized principally by idempotence and symmetry. Any real orthogonal projector may, in fact, be represented by an orthonormal decomposition such as (1913). [204,1 prob.42]

To verify that assertion for the four fundamental subspaces (1910), we need only to express A by subcompact singular value decomposition (A.6.2): From pseudoinverse (1576) of A = UΣQ^T ∈ R^{m×n}

    AA† = UΣΣ†U^T = UU^T ,            A†A = QΣ†ΣQ^T = QQ^T
    I − AA† = I − UU^T = U^⊥U^{⊥T} ,    I − A†A = I − QQ^T = Q^⊥Q^{⊥T}    (1922)

where U^⊥ ∈ R^{m×m−rank A} holds columnar an orthonormal basis for the orthogonal complement of R(U), and likewise for Q^⊥ ∈ R^{n×n−rank A}. Existence of an orthonormal decomposition is sufficient to establish idempotence and symmetry of an orthogonal projector (1913).


E.3.5 Unifying trait of all projectors: direction

Relation (1922) shows: orthogonal projectors simultaneously possess a biorthogonal decomposition (confer E.1.1) (for example, AA† for skinny-or-square A full-rank) and an orthonormal decomposition (UU^T whence Px = UU^T x).

E.3.5.1 orthogonal projector, orthonormal decomposition

Consider orthogonal expansion of x ∈ R(U):

    x = UU^T x = Σ_{i=1}^n u_i u_i^T x    (1923)

a sum of one-dimensional orthogonal projections (E.6.3), where

    U ≜ [u₁ ··· u_n]  and  U^T U = I    (1924)

and where the subspace projector has two expressions, (1922)

    AA† ≜ UU^T    (1925)

where A ∈ R^{m×n} has rank n. The direction of projection of x on u_j for some j ∈ {1 ... n}, for example, is orthogonal to u_j but parallel to a vector in the span of all the remaining vectors constituting the columns of U;

    u_j^T(u_j u_j^T x − x) = 0
    u_j u_j^T x − x = u_j u_j^T x − UU^T x ∈ R({u_i | i = 1 ... n , i ≠ j})    (1926)

E.3.5.2 orthogonal projector, biorthogonal decomposition

We get a similar result for the biorthogonal expansion of x ∈ R(A). Define

    A ≜ [a₁ a₂ ··· a_n] ∈ R^{m×n}    (1927)

and the rows of the pseudoinverse^{E.8}

    A† ≜ [a*₁^T ; a*₂^T ; ... ; a*_n^T] ∈ R^{n×m}    (1928)

E.8 Notation * in this context connotes extreme direction of a dual cone; e.g., (407) or Example E.5.0.0.3.


under biorthogonality condition A†A = I. In biorthogonal expansion (2.13.8)

    x = AA†x = Σ_{i=1}^n a_i a*_i^T x    (1929)

the direction of projection of x on a_j for some particular j ∈ {1 ... n}, for example, is orthogonal to a*_j and parallel to a vector in the span of all the remaining vectors constituting the columns of A;

    a*_j^T(a_j a*_j^T x − x) = 0
    a_j a*_j^T x − x = a_j a*_j^T x − AA†x ∈ R({a_i | i = 1 ... n , i ≠ j})    (1930)

E.3.5.3 nonorthogonal projector, biorthogonal decomposition

Because the result in E.3.5.2 is independent of matrix symmetry AA† = (AA†)^T, we must get the same result for any nonorthogonal projector characterized by a biorthogonality condition; namely, for nonorthogonal projector P = UQ^T (1889) under biorthogonality condition Q^T U = I, in the biorthogonal expansion of x ∈ R(U)

    x = UQ^T x = Σ_{i=1}^k u_i q_i^T x    (1931)

where

    U ≜ [u₁ ··· u_k] ∈ R^{m×k} ,    Q^T ≜ [q₁^T ; ... ; q_k^T] ∈ R^{k×m}    (1932)

the direction of projection of x on u_j is orthogonal to q_j and parallel to a vector in the span of the remaining u_i:

    q_j^T(u_j q_j^T x − x) = 0
    u_j q_j^T x − x = u_j q_j^T x − UQ^T x ∈ R({u_i | i = 1 ... k , i ≠ j})    (1933)


E.4 Algebra of projection on affine subsets

Let P_A x denote projection of x on affine subset A ≜ R + α where R is a subspace and α ∈ A. Then, because R is parallel to A, it holds:

    P_A x = P_{R+α} x = (I − P_R)(α) + P_R x
          = P_R(x − α) + α    (1934)

Subspace projector P_R is a linear operator (P_A is not), and P_R(x + y) = P_R x whenever y ⊥ R and P_R is an orthogonal projector.

E.4.0.0.1 Theorem. Orthogonal projection on affine subset. [109,9.26]
Let A = R + α be an affine subset where α ∈ A, and let R^⊥ be the orthogonal complement of subspace R. Then P_A x is the orthogonal projection of x ∈ R^n on A if and only if

    P_A x ∈ A ,    ⟨P_A x − x , a − α⟩ = 0  ∀a ∈ A    (1935)

or if and only if

    P_A x ∈ A ,    P_A x − x ∈ R^⊥    (1936)    ⋄

E.5 Projection examples

E.5.0.0.1 Example. Orthogonal projection on orthogonal basis.
Orthogonal projection on a subspace can instead be accomplished by orthogonally projecting on the individual members of an orthogonal basis for that subspace. Suppose, for example, matrix A ∈ R^{m×n} holds an orthonormal basis for R(A) in its columns; A ≜ [a₁ a₂ ··· a_n]. Then orthogonal projection of vector x ∈ R^m on R(A) is a sum of one-dimensional orthogonal projections

    Px = AA†x = A(A^T A)^{-1}A^T x = AA^T x = Σ_{i=1}^n a_i a_i^T x    (1937)

where each symmetric dyad a_i a_i^T is an orthogonal projector projecting on R(a_i). (E.6.3) Because ‖x − Px‖ is minimized by orthogonal projection, Px is considered to be the best approximation (in the Euclidean sense) to x from the set R(A). [109,4.9]


E.5.0.0.2 Example. Orthogonal projection on span of nonorthogonal basis.
Orthogonal projection on a subspace can also be accomplished by projecting nonorthogonally on the individual members of any nonorthogonal basis for that subspace. This interpretation is in fact the principal application of the pseudoinverse we discussed. Now suppose matrix A holds a nonorthogonal basis for R(A) in its columns,

    A = [a₁ a₂ ··· a_n] ∈ R^{m×n}    (1927)

and define the rows a*_i^T of its pseudoinverse A† as in (1928). Then orthogonal projection of vector x ∈ R^m on R(A) is a sum of one-dimensional nonorthogonal projections

    Px = AA†x = Σ_{i=1}^n a_i a*_i^T x    (1938)

where each nonsymmetric dyad a_i a*_i^T is a nonorthogonal projector projecting on R(a_i), (E.6.1) idempotent because of biorthogonality condition A†A = I.

The projection Px is regarded as the best approximation to x from the set R(A), as it was in Example E.5.0.0.1.

E.5.0.0.3 Example. Biorthogonal expansion as nonorthogonal projection.
Biorthogonal expansion can be viewed as a sum of components, each a nonorthogonal projection on the range of an extreme direction of a pointed polyhedral cone K; e.g., Figure 163.

Suppose matrix A ∈ R^{m×n} holds a nonorthogonal basis for R(A) in its columns as in (1927), and the rows of pseudoinverse A† are defined as in (1928). Assuming the most general biorthogonality condition (A† + BZ^T)A = I with BZ^T defined as for (1886), then biorthogonal expansion of vector x is a sum of one-dimensional nonorthogonal projections; for x ∈ R(A)

    x = A(A† + BZ^T)x = AA†x = Σ_{i=1}^n a_i a*_i^T x    (1939)

where each dyad a_i a*_i^T is a nonorthogonal projector projecting on R(a_i). (E.6.1) The extreme directions of K = cone(A) are {a₁ ,..., a_n} the linearly independent columns of A, while the extreme directions {a*₁ ,..., a*_n}


Figure 163: (confer Figure 63) Biorthogonal expansion of point x ∈ aff K is found by projecting x nonorthogonally on extreme directions of polyhedral cone K ⊂ R². (Dotted lines of projection bound this translated negated cone.) Direction of projection on extreme direction a₁ is orthogonal to extreme direction a*₁ of dual cone K* and parallel to a₂ (E.3.5); similarly, direction of projection on a₂ is orthogonal to a*₂ and parallel to a₁. Point x is sum of nonorthogonal projections: x on R(a₁) and x on R(a₂). Expansion is unique because extreme directions of K are linearly independent. Were a₁ orthogonal to a₂, then K would be identical to K* and nonorthogonal projections would become orthogonal.


of relative dual cone K* ∩ aff K = cone(A^{†T}) (2.13.9.4) correspond to the linearly independent (B.1.1.1) rows of A†. Directions of nonorthogonal projection are determined by the pseudoinverse; id est, direction of projection a_i a*_i^T x − x on R(a_i) is orthogonal to a*_i.^{E.9}

Because the extreme directions of this cone K are linearly independent, the component projections are unique in the sense:

    there is only one linear combination of extreme directions of K that yields a particular point x ∈ R(A) whenever

    R(A) = aff K = R(a₁) ⊕ R(a₂) ⊕ ... ⊕ R(a_n)    (1940)

E.5.0.0.4 Example. Nonorthogonal projection on elementary matrix.
Suppose P_Y is a linear nonorthogonal projector projecting on subspace Y, and suppose the range of a vector u is linearly independent of Y; id est, for some other subspace M containing Y suppose

    M = R(u) ⊕ Y    (1941)

Assuming P_M x = P_u x + P_Y x holds, then it follows for vector x ∈ M

    P_u x = x − P_Y x ,    P_Y x = x − P_u x    (1942)

nonorthogonal projection of x on R(u) can be determined from nonorthogonal projection of x on Y, and vice versa.

Such a scenario is realizable were there some arbitrary basis for Y populating a full-rank skinny-or-square matrix A

    A ≜ [ basis Y   u ] ∈ R^{N×n+1}    (1943)

Then P_M = AA† fulfills the requirements, with P_u = A(:, n+1)A†(n+1, :) and P_Y = A(:, 1:n)A†(1:n, :). Observe, P_M is an orthogonal projector whereas P_Y and P_u are nonorthogonal projectors.

Now suppose, for example, P_Y is an elementary matrix (B.3); in particular,

    P_Y = I − e₁1^T = [ 0  √2 V_N ] ∈ R^{N×N}    (1944)

E.9 This remains true in high dimension although only a little more difficult to visualize in R³; confer Figure 64.


where Y = N(1^T). We have M = R^N, A = [ √2 V_N   e₁ ], and u = e₁. Thus P_u = e₁1^T is a nonorthogonal projector projecting on R(u) in a direction parallel to a vector in Y (E.3.5), and P_Y x = x − e₁1^T x is a nonorthogonal projection of x on Y in a direction parallel to u.

E.5.0.0.5 Example. Projecting origin on a hyperplane.
(confer 2.4.2.0.2) Given the hyperplane representation having b ∈ R and nonzero normal a ∈ R^m

    ∂H = {y | a^T y = b} ⊂ R^m    (114)

orthogonal projection of the origin P0 on that hyperplane is the unique optimal solution to a minimization problem: (1908)

    ‖P0 − 0‖₂ = inf_{y∈∂H} ‖y − 0‖₂ = inf_{ξ∈R^{m−1}} ‖Zξ + x‖₂    (1945)

where x is any solution to a^T y = b, and where the columns of Z ∈ R^{m×m−1} constitute a basis for N(a^T) so that y = Zξ + x ∈ ∂H for all ξ ∈ R^{m−1}.

The infimum can be found by setting the gradient (with respect to ξ) of the strictly convex norm-square to 0. We find the minimizing argument

    ξ⋆ = −(Z^T Z)^{-1}Z^T x    (1946)

so

    y⋆ = (I − Z(Z^T Z)^{-1}Z^T) x    (1947)

and from (1910)

    P0 = y⋆ = a(a^T a)^{-1}a^T x = (a/‖a‖)(a^T x/‖a‖) = (a/‖a‖²) a^T x ≜ A†Ax = a b/‖a‖²    (1948)

In words, any point x in the hyperplane ∂H projected on its normal a (confer (1972)) yields that point y⋆ in the hyperplane closest to the origin.
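Formula (1948) is readily checked; the sketch below is illustrative (random normal a and an arbitrarily chosen b assumed), confirming that y⋆ = a b/‖a‖² is feasible and no other feasible point is closer to the origin.

```python
# Illustrative check of (1948): the point of dH = {y | a^T y = b}
# nearest the origin is  y* = a b / ||a||^2 .
import numpy as np

rng = np.random.default_rng(9)
m = 4
a = rng.standard_normal(m)
b = 1.7
y_star = a * b / (a @ a)

print(np.isclose(a @ y_star, b))                 # y* lies in the hyperplane
# another feasible point: y* plus a step within N(a^T)
Z = np.linalg.svd(a[None, :])[2].T[:, 1:]        # orthonormal basis for N(a^T)
x = y_star + Z @ rng.standard_normal(m - 1)
print(np.isclose(a @ x, b), np.linalg.norm(x) >= np.linalg.norm(y_star))   # True True
```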


E.5.0.0.6 Example. Projection on affine subset.
The technique of Example E.5.0.0.5 is extensible. Given an intersection of hyperplanes

    A = {y | Ay = b} ⊂ R^n    (1949)

where each row of A ∈ R^{m×n} is nonzero and b ∈ R(A), then the orthogonal projection Px of any point x ∈ R^n on A is the solution to a minimization problem:

    ‖Px − x‖₂ = inf_{y∈A} ‖y − x‖₂ = inf_{ξ∈R^{n−rank A}} ‖Zξ + y_p − x‖₂    (1950)

where y_p is any solution to Ay = b, and where the columns of Z ∈ R^{n×n−rank A} constitute a basis for N(A) so that y = Zξ + y_p ∈ A for all ξ ∈ R^{n−rank A}.

The infimum is found by setting the gradient of the strictly convex norm-square to 0. The minimizing argument is

    ξ⋆ = −(Z^T Z)^{-1}Z^T(y_p − x)    (1951)

so

    y⋆ = (I − Z(Z^T Z)^{-1}Z^T)(y_p − x) + x    (1952)

and from (1910),

    Px = y⋆ = x − A†(Ax − b) = (I − A†A)x + A†Ay_p    (1953)

which is a projection of x on N(A) then translated perpendicularly with respect to the nullspace until it meets the affine subset A.

E.5.0.0.7 Example. Projection on affine subset, vertex-description.
Suppose now we instead describe the affine subset A in terms of some given minimal set of generators arranged columnar in X ∈ R^{n×N} (76); id est,

    A = aff X = {Xa | a^T 1 = 1} ⊆ R^n    (78)

Here minimal set means XV_N = [x₂−x₁  x₃−x₁  ···  x_N−x₁]/√2 (960) is full-rank (2.4.2.2) where V_N ∈ R^{N×N−1} is the Schoenberg auxiliary matrix


(B.4.2). Then the orthogonal projection Px of any point x ∈ R^n on A is the solution to a minimization problem:

    ‖Px − x‖₂ = inf_{a^T 1=1} ‖Xa − x‖₂ = inf_{ξ∈R^{N−1}} ‖X(V_N ξ + a_p) − x‖₂    (1954)

where a_p is any solution to a^T 1 = 1. We find the minimizing argument

    ξ⋆ = −(V_N^T X^T X V_N)^{-1} V_N^T X^T (Xa_p − x)    (1955)

and so the orthogonal projection is [205,3]

    Px = Xa⋆ = (I − XV_N (XV_N)†) Xa_p + XV_N (XV_N)† x    (1956)

a projection of point x on R(XV_N) then translated perpendicularly with respect to that range until it meets the affine subset A.

E.5.0.0.8 Example. Projecting on hyperplane, halfspace, slab.
Given hyperplane representation having b ∈ R and nonzero normal a ∈ R^m

    ∂H = {y | a^T y = b} ⊂ R^m    (114)

orthogonal projection of any point x ∈ R^m on that hyperplane [206,3.1] is unique:

    Px = x − a(a^T a)^{-1}(a^T x − b)    (1957)

Orthogonal projection of x on the halfspace parametrized by b ∈ R and nonzero normal a ∈ R^m

    H₋ = {y | a^T y ≤ b} ⊂ R^m    (106)

is the point

    Px = x − a(a^T a)^{-1} max{0 , a^T x − b}    (1958)

Orthogonal projection of x on the convex slab (Figure 11), for c < b

    B ≜ {y | c ≤ a^T y ≤ b} ⊂ R^m    (1959)

is the point [148,5.1]

    Px = x − a(a^T a)^{-1}( max{0 , a^T x − b} − max{0 , c − a^T x} )    (1960)
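These three closed-form projections translate directly into code; the following is an illustrative sketch (function names and test data are assumptions, not from the text).

```python
# Illustrative implementations of (1957), (1958), (1960), assuming
# nonzero normal a and slab bounds c < b.
import numpy as np

def proj_hyperplane(x, a, b):                      # {y | a^T y = b}
    return x - a * (a @ x - b) / (a @ a)

def proj_halfspace(x, a, b):                       # {y | a^T y <= b}
    return x - a * max(0.0, a @ x - b) / (a @ a)

def proj_slab(x, a, c, b):                         # {y | c <= a^T y <= b}
    return x - a * (max(0.0, a @ x - b) - max(0.0, c - a @ x)) / (a @ a)

rng = np.random.default_rng(10)
a, x = rng.standard_normal(5), rng.standard_normal(5)
c, b = -0.2, 0.3
for p in (proj_hyperplane(x, a, b), proj_halfspace(x, a, b), proj_slab(x, a, c, b)):
    print(a @ p)        # equals b, is <= b, and lies within [c, b] respectively
```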


E.6 Vectorization interpretation, projection on a matrix

E.6.1 Nonorthogonal projection on a vector

Nonorthogonal projection of vector x on the range of vector y is accomplished using a normalized dyad P₀ (B.1); videlicet,

    (⟨z , x⟩ / ⟨z , y⟩) y = (z^T x / z^T y) y = (y z^T / z^T y) x ≜ P₀x    (1961)

where ⟨z , x⟩/⟨z , y⟩ is the coefficient of projection on y. Because P₀² = P₀ and R(P₀) = R(y), rank-one matrix P₀ is a nonorthogonal projector dyad projecting on R(y). Direction of nonorthogonal projection is orthogonal to z; id est,

    P₀x − x ⊥ R(P₀^T)    (1962)

E.6.2 Nonorthogonal projection on vectorized matrix

Formula (1961) is extensible. Given X, Y, Z ∈ R^{m×n}, we have the one-dimensional nonorthogonal projection of X in isomorphic R^{mn} on the range of vectorized Y: (2.2)

    (⟨Z , X⟩ / ⟨Z , Y⟩) Y ,    ⟨Z , Y⟩ ≠ 0    (1963)

where ⟨Z , X⟩/⟨Z , Y⟩ is the coefficient of projection. The inequality accounts for the fact: projection on R(vec Y) is in a direction orthogonal to vec Z.

E.6.2.1 Nonorthogonal projection on dyad

Now suppose we have nonorthogonal projector dyad

    P₀ = y z^T / (z^T y) ∈ R^{m×m}    (1964)

Analogous to (1961), for X ∈ R^{m×m}

    P₀XP₀ = (y z^T/(z^T y)) X (y z^T/(z^T y)) = (z^T Xy/(z^T y)²) y z^T = (⟨z y^T , X⟩ / ⟨z y^T , y z^T⟩) y z^T    (1965)


is a nonorthogonal projection of matrix X on the range of vectorized dyad P₀; from which it follows:

    P₀XP₀ = (z^T Xy/(z^T y)) (y z^T/(z^T y)) = ⟨z y^T/(z^T y) , X⟩ (y z^T/(z^T y)) = ⟨P₀^T , X⟩ P₀ = (⟨P₀^T , X⟩ / ⟨P₀^T , P₀⟩) P₀    (1966)

Yet this relationship between matrix product and vector inner-product only holds for a dyad projector. When nonsymmetric projector P₀ is rank-one as in (1964), therefore,

    R(vec P₀XP₀) = R(vec P₀) in R^{m²}    (1967)

and

    P₀XP₀ − X ⊥ P₀^T in R^{m²}    (1968)

E.6.2.1.1 Example. λ as coefficients of nonorthogonal projection.
Any diagonalization (A.5)

    X = SΛS^{-1} = Σ_{i=1}^m λ_i s_i w_i^T ∈ R^{m×m}    (1547)

may be expressed as a sum of one-dimensional nonorthogonal projections of X, each on the range of a vectorized eigenmatrix P_j ≜ s_j w_j^T;

    X = Σ_{i,j=1}^m ⟨(S e_i e_j^T S^{-1})^T , X⟩ S e_i e_j^T S^{-1}
      = Σ_{j=1}^m ⟨(s_j w_j^T)^T , X⟩ s_j w_j^T + Σ_{i,j=1, j≠i}^m ⟨(S e_i e_j^T S^{-1})^T , SΛS^{-1}⟩ S e_i e_j^T S^{-1}
      = Σ_{j=1}^m ⟨(s_j w_j^T)^T , X⟩ s_j w_j^T
      ≜ Σ_{j=1}^m ⟨P_j^T , X⟩ P_j = Σ_{j=1}^m s_j w_j^T X s_j w_j^T
      = Σ_{j=1}^m λ_j s_j w_j^T
      = Σ_{j=1}^m P_j X P_j    (1969)

This biorthogonal expansion of matrix X is a sum of nonorthogonal projections because the term outside the projection coefficient ⟨ ⟩ is not


identical to the inside-term. (E.6.4) The eigenvalues λ_j are coefficients of nonorthogonal projection of X, while the remaining M(M −1)/2 coefficients (for i ≠ j) are zeroed by projection. When P_j is rank-one as in (1969),

    R(vec P_j XP_j) = R(vec s_j w_j^T) = R(vec P_j) in R^{m²}    (1970)

and

    P_j XP_j − X ⊥ P_j^T in R^{m²}    (1971)

Were matrix X symmetric, then its eigenmatrices would also be. So the one-dimensional projections would become orthogonal. (E.6.4.1.1)

E.6.3 Orthogonal projection on a vector

The formula for orthogonal projection of vector x on the range of vector y (one-dimensional projection) is basic analytic geometry; [13,3.3] [331,3.2] [365,2.2] [380,1-8]

    (⟨y , x⟩ / ⟨y , y⟩) y = (y^T x / y^T y) y = (y y^T / y^T y) x ≜ P₁x    (1972)

where ⟨y , x⟩/⟨y , y⟩ is the coefficient of projection on R(y). An equivalent description is: Vector P₁x is the orthogonal projection of vector x on R(P₁) = R(y). Rank-one matrix P₁ is a projection matrix because P₁² = P₁. The direction of projection is orthogonal

    P₁x − x ⊥ R(P₁)    (1973)

because P₁^T = P₁.

E.6.4 Orthogonal projection on a vectorized matrix

From (1972), given instead X, Y ∈ R^{m×n}, we have the one-dimensional orthogonal projection of matrix X in isomorphic R^{mn} on the range of vectorized Y: (2.2)

    (⟨Y , X⟩ / ⟨Y , Y⟩) Y    (1974)

where ⟨Y , X⟩/⟨Y , Y⟩ is the coefficient of projection.

For orthogonal projection, the term outside the vector inner-products ⟨ ⟩ must be identical to the terms inside in three places.


E.6.4.1 Orthogonal projection on dyad

There is opportunity for insight when Y is a dyad yz^T (B.1): Instead given X ∈ R^{m×n}, y ∈ R^m, and z ∈ R^n

    (⟨yz^T , X⟩ / ⟨yz^T , yz^T⟩) yz^T = (y^T Xz / (y^T y z^T z)) yz^T    (1975)

is the one-dimensional orthogonal projection of X in isomorphic R^{mn} on the range of vectorized yz^T. To reveal the obscured symmetric projection matrices P₁ and P₂ we rewrite (1975):

    (y^T Xz / (y^T y z^T z)) yz^T = (yy^T/y^T y) X (zz^T/z^T z) ≜ P₁XP₂    (1976)

So for projector dyads, projection (1976) is the orthogonal projection in R^{mn} if and only if projectors P₁ and P₂ are symmetric;^{E.10} in other words,

    for orthogonal projection on the range of a vectorized dyad yz^T, the term outside the vector inner-products ⟨ ⟩ in (1975) must be identical to the terms inside in three places.

When P₁ and P₂ are rank-one symmetric projectors as in (1976), (37)

    R(vec P₁XP₂) = R(vec yz^T) in R^{mn}    (1977)

and

    P₁XP₂ − X ⊥ yz^T in R^{mn}    (1978)

When y = z then P₁ = P₂ = P₂^T and

    P₁XP₁ = ⟨P₁ , X⟩ P₁ = (⟨P₁ , X⟩ / ⟨P₁ , P₁⟩) P₁    (1979)

E.10 For diagonalizable X ∈ R^{m×m} (A.5), its orthogonal projection in isomorphic R^{m²} on the range of vectorized yz^T ∈ R^{m×m} becomes:
    P₁XP₂ = Σ_{i=1}^m λ_i P₁ s_i w_i^T P₂
When R(P₁) = R(w_j) and R(P₂) = R(s_j), the jth dyad term from the diagonalization is isolated but only, in general, to within a scale factor because neither set of left or right eigenvectors is necessarily orthonormal unless X is a normal matrix [393,3.2]. Yet when R(P₂) = R(s_k), k ≠ j ∈ {1 ... m}, then P₁XP₂ = 0.


meaning, P₁XP₁ is equivalent to orthogonal projection of matrix X on the range of vectorized projector dyad P₁. Yet this relationship between matrix product and vector inner-product does not hold for general symmetric projector matrices.

E.6.4.1.1 Example. Eigenvalues λ as coefficients of orthogonal projection.
Let C represent any convex subset of subspace S^M, and let C₁ be any element of C. Then C₁ can be expressed as the orthogonal expansion

    C₁ = Σ_{i=1}^M Σ_{j=1, j≥i}^M ⟨E_ij , C₁⟩ E_ij  ∈ C ⊂ S^M    (1980)

where E_ij ∈ S^M is a member of the standard orthonormal basis for S^M (59). This expansion is a sum of one-dimensional orthogonal projections of C₁; each projection on the range of a vectorized standard basis matrix. Vector inner-product ⟨E_ij , C₁⟩ is the coefficient of projection of svec C₁ on R(svec E_ij).

When C₁ is any member of a convex set C whose dimension is L, Carathéodory's theorem [113] [307] [199] [41] [42] guarantees that no more than L+1 affinely independent members from C are required to faithfully represent C₁ by their linear combination.^{E.11}

Dimension of S^M is L = M(M+1)/2 in isometrically isomorphic R^{M(M+1)/2}. Yet because any symmetric matrix can be diagonalized, (A.5.1) C₁ ∈ S^M is a linear combination of its M eigenmatrices q_i q_i^T (A.5.0.3) weighted by its eigenvalues λ_i;

    C₁ = QΛQ^T = Σ_{i=1}^M λ_i q_i q_i^T    (1981)

where Λ ∈ S^M is a diagonal matrix having δ(Λ)_i = λ_i, and Q = [q₁ ··· q_M] is an orthogonal matrix in R^{M×M} containing corresponding eigenvectors.

To derive eigenvalue decomposition (1981) from expansion (1980), M standard basis matrices E_ij are rotated (B.5) into alignment with the M

E.11 Carathéodory's theorem guarantees existence of a biorthogonal expansion for any element in aff C when C is any pointed closed convex cone.


eigenmatrices q_i q_i^T of C₁ by applying a similarity transformation; [331,5.6]

    {Q E_ij Q^T} = { q_i q_i^T ,  i = j = 1 ... M
                     (1/√2)(q_i q_j^T + q_j q_i^T) ,  1 ≤ i < j ≤ M }    (1982)

which remains an orthonormal basis for S^M. Then remarkably

    C₁ = Σ_{i,j=1, j≥i}^M ⟨Q E_ij Q^T , C₁⟩ Q E_ij Q^T
       = Σ_{i=1}^M ⟨q_i q_i^T , C₁⟩ q_i q_i^T + Σ_{i,j=1, j>i}^M ⟨Q E_ij Q^T , QΛQ^T⟩ Q E_ij Q^T
       = Σ_{i=1}^M ⟨q_i q_i^T , C₁⟩ q_i q_i^T
       ≜ Σ_{i=1}^M ⟨P_i , C₁⟩ P_i = Σ_{i=1}^M q_i q_i^T C₁ q_i q_i^T
       = Σ_{i=1}^M λ_i q_i q_i^T
       = Σ_{i=1}^M P_i C₁ P_i    (1983)

this orthogonal expansion becomes the diagonalization; still a sum of one-dimensional orthogonal projections. The eigenvalues

    λ_i = ⟨q_i q_i^T , C₁⟩    (1984)

are clearly coefficients of projection of C₁ on the range of each vectorized eigenmatrix. (confer E.6.2.1.1) The remaining M(M −1)/2 coefficients (i ≠ j) are zeroed by projection. When P_i is rank-one symmetric as in (1983),

    R(svec P_i C₁ P_i) = R(svec q_i q_i^T) = R(svec P_i) in R^{M(M+1)/2}    (1985)

and

    P_i C₁ P_i − C₁ ⊥ P_i in R^{M(M+1)/2}    (1986)


728 APPENDIX E. PROJECTION(A.2) It is a fact that y T Xy is always proportional to a coefficient oforthogonal projection; letting z in formula (1975) become y ∈ R m , thenP 2 =P 1 =yy T /y T y=yy T /‖yy T ‖ 2 (confer (1610)) and formula (1976) becomes〈yy T , X〉〈yy T , yy T 〉 yyT = yT Xyy T yyy Ty T y = yyTy T y X yyTy T y P 1XP 1 (1987)By (1974), product P 1 XP 1 is the one-dimensional orthogonal projection ofX in isomorphic R m2 on the range of vectorized P 1 because, for rankP 1 =1and P 21 =P 1 ∈ S m (confer (1966))P 1 XP 1 = yT Xyy T y〈 〉yy T yyT yyTy T y = y T y , X y T y = 〈P 1 , X〉 P 1 = 〈P 1 , X〉〈P 1 , P 1 〉 P 1(1988)The coefficient of orthogonal projection 〈P 1 , X〉= y T Xy/(y T y) is also knownas Rayleigh’s quotient. E.12 When P 1 is rank-one symmetric as in (1987),R(vec P 1 XP 1 ) = R(vec P 1 ) in R m2 (1989)andP 1 XP 1 − X ⊥ P 1 in R m2 (1990)E.12 When y becomes the j th eigenvector s j of diagonalizable X , for example, 〈P 1 , X 〉becomes the j th eigenvalue: [195,III]( m∑)s T j λ i s i wiT s ji=1〈P 1 , X 〉| y=sj=s T j s = λ jjSimilarly for y = w j , the j th left-eigenvector,( m∑〈P 1 , X 〉| y=wj=w T jλ i s i wiTi=1wj Tw j)w j= λ jA quandary may arise regarding the potential annihilation of the antisymmetric part ofX when s T j Xs j is formed. Were annihilation to occur, it would imply the eigenvalue thusfound came instead from the symmetric part of X . The quandary is resolved recognizingthat diagonalization of real X admits complex eigenvectors; hence, annihilation could onlycome about by forming re(s H j Xs j) = s H j (X +XT )s j /2 [202,7.1] where (X +X T )/2 isthe symmetric part of X , and s H j denotes conjugate transpose.


The test for positive semidefiniteness, then, is a test for nonnegativity of the coefficient of orthogonal projection of X on the range of each and every vectorized extreme direction yy^T (2.8.1) from the positive semidefinite cone in the ambient space of symmetric matrices.

E.6.4.3 PXP ≽ 0

In some circumstances, it may be desirable to limit the domain of test y^T Xy ≥ 0 for positive semidefiniteness; e.g., {‖y‖ = 1}. Another example of limiting domain-of-test is central to Euclidean distance geometry: For R(V) = N(1^T), the test −V DV ≽ 0 determines whether D ∈ S^N_h is a Euclidean distance matrix. The same test may be stated: For D ∈ S^N_h (and optionally ‖y‖ = 1)

    D ∈ EDM^N ⇔ −y^T Dy = ⟨yy^T , −D⟩ ≥ 0 ∀y ∈ R(V)    (1991)

The test −V DV ≽ 0 is therefore equivalent to a test for nonnegativity of the coefficient of orthogonal projection of −D on the range of each and every vectorized extreme direction yy^T from the positive semidefinite cone S^N_+ such that R(yy^T) = R(y) ⊆ R(V). (The validity of this result is independent of whether V is itself a projection matrix.)

E.7 Projection on matrix subspaces

E.7.1 PXP misinterpretation for higher-rank P

For a projection matrix P of rank greater than 1, PXP is generally not commensurate with (⟨P , X⟩/⟨P , P⟩) P as is the case for projector dyads (1988). Yet for a symmetric idempotent matrix P of any rank we are tempted to say "PXP is the orthogonal projection of X ∈ S^m on R(vec P)". The fallacy is: vec PXP does not necessarily belong to the range of vectorized P; the most basic requirement for projection on R(vec P).

E.7.2 Orthogonal projection on matrix subspaces

With A₁ ∈ R^{m×n}, B₁ ∈ R^{n×k}, Z₁ ∈ R^{m×k}, A₂ ∈ R^{p×n}, B₂ ∈ R^{n×k}, Z₂ ∈ R^{p×k} as defined for nonorthogonal projector (1886), and defining

    P₁ ≜ A₁A₁† ∈ S^m ,    P₂ ≜ A₂A₂† ∈ S^p    (1992)


then, given compatible X

    ‖X − P₁XP₂‖_F = inf_{B₁,B₂ ∈ R^{n×k}} ‖X − A₁(A₁† + B₁Z₁^T) X (A₂^{†T} + Z₂B₂^T) A₂^T‖_F    (1993)

As for all subspace projectors, range of the projector is the subspace on which projection is made: {P₁YP₂ | Y ∈ R^{m×p}}. For projectors P₁ and P₂ of any rank, altogether, this means projection P₁XP₂ is unique minimum-distance, orthogonal

    P₁XP₂ − X ⊥ {P₁YP₂ | Y ∈ R^{m×p}} in R^{mp}    (1994)

and P₁ and P₂ must each be symmetric (confer (1976)) to attain the infimum.

E.7.2.0.1 Proof. Minimum Frobenius norm (1993).
Defining P ≜ A₁(A₁† + B₁Z₁^T),

    inf_{B₁,B₂} ‖X − A₁(A₁† + B₁Z₁^T) X (A₂^{†T} + Z₂B₂^T) A₂^T‖²_F
    = inf_{B₁,B₂} ‖X − PX(A₂^{†T} + Z₂B₂^T) A₂^T‖²_F
    = inf_{B₁,B₂} tr( (X^T − A₂(A₂† + B₂Z₂^T) X^T P^T)(X − PX(A₂^{†T} + Z₂B₂^T) A₂^T) )
    = inf_{B₁,B₂} tr( X^T X − X^T PX(A₂^{†T}+Z₂B₂^T)A₂^T − A₂(A₂†+B₂Z₂^T)X^T P^T X
                      + A₂(A₂†+B₂Z₂^T)X^T P^T PX(A₂^{†T}+Z₂B₂^T)A₂^T )    (1995)

Necessary conditions for a global minimum are ∇_{B₁} = 0 and ∇_{B₂} = 0. Terms not containing B₂ in (1995) will vanish from gradient ∇_{B₂}; (D.2.3)

    ∇_{B₂} tr( −X^T PXZ₂B₂^T A₂^T − A₂B₂Z₂^T X^T P^T X + A₂A₂†X^T P^T PXZ₂B₂^T A₂^T
               + A₂B₂Z₂^T X^T P^T PXA₂^{†T}A₂^T + A₂B₂Z₂^T X^T P^T PXZ₂B₂^T A₂^T )
    = −2A₂^T X^T PXZ₂ + 2A₂^T A₂A₂†X^T P^T PXZ₂ + 2A₂^T A₂B₂Z₂^T X^T P^T PXZ₂
    = 2A₂^T( −X^T + A₂A₂†X^T P^T + A₂B₂Z₂^T X^T P^T ) PXZ₂
    = 0   ⇔   R(B₁) ⊆ N(A₁)  and  R(B₂) ⊆ N(A₂)    (1996)

(or Z₂ = 0) because A^T = A^T AA†. Symmetry requirement (1992) is implicit. Were instead P^T ≜ (A₂^{†T} + Z₂B₂^T) A₂^T and the gradient with respect to B₁ observed, then similar results are obtained. The projector is unique. Perpendicularity (1994) establishes uniqueness [109,4.9] of projection P₁XP₂ on a matrix subspace. The minimum-distance projector is the orthogonal projector, and vice versa.


E.7.2.0.2 Example. PXP redux & N(V).
Suppose we define a subspace of m×n matrices, each elemental matrix having columns constituting a list whose geometric center (5.5.1.0.1) is the origin in R^m:

    R^{m×n}_c ≜ {Y ∈ R^{m×n} | Y1 = 0}
             = {Y ∈ R^{m×n} | N(Y) ⊇ 1} = {Y ∈ R^{m×n} | R(Y^T) ⊆ N(1^T)}
             = {XV | X ∈ R^{m×n}} ⊂ R^{m×n}    (1997)

the nonsymmetric geometric center subspace. Further suppose V ∈ S^n is a projection matrix having N(V) = R(1) and R(V) = N(1^T). Then linear mapping T(X) = XV is the orthogonal projection of any X ∈ R^{m×n} on R^{m×n}_c in the Euclidean (vectorization) sense because V is symmetric, N(XV) ⊇ 1, and R(VX^T) ⊆ N(1^T).

Now suppose we define a subspace of symmetric n×n matrices each of whose columns constitute a list having the origin in R^n as geometric center,

    S^n_c ≜ {Y ∈ S^n | Y1 = 0}
          = {Y ∈ S^n | N(Y) ⊇ 1} = {Y ∈ S^n | R(Y) ⊆ N(1^T)}    (1998)

the geometric center subspace. Further suppose V ∈ S^n is a projection matrix, the same as before. Then V XV is the orthogonal projection of any X ∈ S^n on S^n_c in the Euclidean sense (1994) because V is symmetric, V XV 1 = 0, and R(V XV) ⊆ N(1^T). Two-sided projection is necessary only to remain in the ambient symmetric matrix subspace. Then

    S^n_c = {V XV | X ∈ S^n} ⊂ S^n    (1999)

has dim S^n_c = n(n−1)/2 in isomorphic R^{n(n+1)/2}. We find its orthogonal complement as the aggregate of all negative directions of orthogonal projection on S^n_c: the translation-invariant subspace (5.5.1.1)

    S^{n⊥}_c ≜ {X − V XV | X ∈ S^n} = {u1^T + 1u^T | u ∈ R^n} ⊂ S^n    (2000)

characterized by doublet u1^T + 1u^T (B.2).^{E.13} Defining geometric center

E.13 Proof.
    {X − V XV | X ∈ S^n} = {X − (I − (1/n)11^T) X (I − (1/n)11^T) | X ∈ S^n}
                         = {(1/n)11^T X + (1/n)X11^T − (1/n)11^T X 11^T(1/n) | X ∈ S^n}
Because {X1 | X ∈ S^n} = R^n,
    {X − V XV | X ∈ S^n} = {1ζ^T + ζ1^T − 11^T(1^T ζ/n) | ζ ∈ R^n}
                         = {1ζ^T(I − 11^T/(2n)) + (I − 11^T/(2n))ζ1^T | ζ ∈ R^n}
where I − 11^T/(2n) is invertible.
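A small numerical illustration of this example follows (illustrative sketch, with an assumed random symmetric X): taking V = I − (1/n)11^T, VXV lands in S^n_c, the residual X − VXV is orthogonal to it, and the residual indeed has the doublet form of (2000).

```python
# Illustrative check of Example E.7.2.0.2 with V = I - (1/n) 1 1^T.
import numpy as np

n = 5
rng = np.random.default_rng(11)
one = np.ones((n, 1))
V = np.eye(n) - one @ one.T / n                    # projector with R(V) = N(1^T)

X = rng.standard_normal((n, n)); X = X + X.T       # X in S^n
Y = V @ X @ V                                      # projection on S^n_c
R = X - Y                                          # residual in the complement

print(np.allclose(Y @ one, 0))                     # Y 1 = 0, so Y lies in S^n_c
print(np.isclose(np.tensordot(Y, R), 0))           # <Y, X - VXV> = 0 (orthogonality)

u = X @ one / n - one * (one.T @ X @ one) / (2 * n**2)
print(np.allclose(R, u @ one.T + one @ u.T))       # residual is a doublet u1^T + 1u^T
```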


mapping

    V(X) = −V XV /2    (996)

consistently with (996), then N(V) = R(I − V) on domain S^n analogously to vector projectors (E.2); id est,

    N(V) = S^{n⊥}_c    (2001)

a subspace of S^n whose dimension is dim S^{n⊥}_c = n in isomorphic R^{n(n+1)/2}. Intuitively, operator V is an orthogonal projector; any argument duplicitously in its range is a fixed point. So, this symmetric operator's nullspace must be orthogonal to its range.

Now compare the subspace of symmetric matrices having all zeros in the first row and column

    S^n_1 ≜ {Y ∈ S^n | Y e₁ = 0}
         = { [ 0  0^T ; 0  I ] X [ 0  0^T ; 0  I ] | X ∈ S^n }
         = { [ 0  √2 V_N ]^T Z [ 0  √2 V_N ] | Z ∈ S^N }    (2002)

where P = [ 0  0^T ; 0  I ] is an orthogonal projector. Then, similarly, PXP is the orthogonal projection of any X ∈ S^n on S^n_1 in the Euclidean sense (1994), and

    S^{n⊥}_1 ≜ { X − [ 0  0^T ; 0  I ] X [ 0  0^T ; 0  I ] | X ∈ S^n } = { u e₁^T + e₁ u^T | u ∈ R^n } ⊂ S^n    (2003)

Obviously, S^n_1 ⊕ S^{n⊥}_1 = S^n.


E.8 Range/Rowspace interpretation

For idempotent matrices P₁ and P₂ of any rank, P₁XP₂^T is a projection of R(X) on R(P₁) and a projection of R(X^T) on R(P₂): For any given X = UΣQ^T = Σ_{i=1}^η σ_i u_i q_i^T ∈ R^{m×p}, as in compact SVD (1563),

    P₁XP₂^T = Σ_{i=1}^η σ_i P₁u_i q_i^T P₂^T = Σ_{i=1}^η σ_i P₁u_i (P₂q_i)^T    (2004)

where η ≜ min{m , p}. Recall u_i ∈ R(X) and q_i ∈ R(X^T) when the corresponding singular value σ_i is nonzero. (A.6.1) So P₁ projects u_i on R(P₁) while P₂ projects q_i on R(P₂); id est, the range and rowspace of any X are respectively projected on the ranges of P₁ and P₂.^{E.14}

E.9 Projection on convex set

Thus far we have discussed only projection on subspaces. Now we generalize, considering projection on arbitrary convex sets in Euclidean space; convex because projection is, then, unique minimum-distance and a convex optimization problem:

For projection P_C x of point x on any closed set C ⊆ R^n it is obvious:

    C ≡ {P_C x | x ∈ R^n}    (2005)

where P_C is a projection operator that is convex when C is convex. [61, p.88] If C ⊆ R^n is a closed convex set, then for each and every x ∈ R^n there exists a unique point P_C x belonging to C that is closest to x in the Euclidean sense. Like (1908), unique projection Px (or P_C x) of a point x on convex set C is that point in C closest to x; [250,3.12]

    ‖x − Px‖₂ = inf_{y∈C} ‖x − y‖₂ = dist(x, C)    (2006)

There exists a converse (in finite-dimensional Euclidean space):

E.14 When P₁ and P₂ are symmetric and R(P₁) = R(u_j) and R(P₂) = R(q_j), then the jth dyad term from the singular value decomposition of X is isolated by the projection. Yet if R(P₂) = R(q_l), l ≠ j ∈ {1 ... η}, then P₁XP₂ = 0.


E.9.0.0.1 Theorem. (Bunt-Motzkin) Convex set if projections unique. [371,7.5] [196]
If C ⊆ R^n is a nonempty closed set and if for each and every x in R^n there is a unique Euclidean projection Px of x on C belonging to C, then C is convex.    ⋄

Borwein & Lewis propose, for closed convex set C [55,3.3 exer.12d]

    ∇‖x − Px‖²₂ = 2(x − Px)    (2007)

for any point x whereas, for x ∉ C

    ∇‖x − Px‖₂ = (x − Px) ‖x − Px‖₂^{-1}    (2008)

E.9.0.0.2 Exercise. Norm gradient.
Prove (2007) and (2008). (Not proved in [55].)

A well-known equivalent characterization of projection on a convex set is a generalization of the perpendicularity condition (1907) for projection on a subspace:

E.9.1 Dual interpretation of projection on convex set

E.9.1.0.1 Definition. Normal vector. [307, p.15]
Vector z is normal to convex set C at point Px ∈ C if

    ⟨z , y − Px⟩ ≤ 0 ∀y ∈ C    (2009)    △

A convex set has at least one nonzero normal at each of its boundary points. [307, p.100] (Figure 67) Hence, the normal or dual interpretation of projection:

E.9.1.0.2 Theorem. Unique minimum-distance projection. [199,A.3.1] [250,3.12] [109,4.1] [78] (Figure 169b)
Given a closed convex set C ⊆ R^n, point Px is the unique projection of a given point x ∈ R^n on C (Px is that point in C nearest x) if and only if

    Px ∈ C ,    ⟨x − Px , y − Px⟩ ≤ 0 ∀y ∈ C    (2010)    ⋄


As for subspace projection, convex operator P is idempotent in the sense: for each and every x ∈ R^n, P(Px) = Px. Yet operator P is nonlinear;

    Projector P is a linear operator if and only if convex set C (on which projection is made) is a subspace. (E.4)

E.9.1.0.3 Theorem. Unique projection via normal cone.^{E.15} [109,4.3]
Given closed convex set C ⊆ R^n, point Px is the unique projection of a given point x ∈ R^n on C if and only if

    Px ∈ C ,    Px − x ∈ (C − Px)*    (2011)

In other words, Px is that point in C nearest x if and only if Px − x belongs to that cone dual to translate C − Px.    ⋄

E.9.1.1 Dual interpretation as optimization

Deutsch [111, thm.2.3] [112,2] and Luenberger [250, p.134] carry forward Nirenberg's dual interpretation of projection [279] as solution to a maximization problem: Minimum distance from a point x ∈ R^n to a convex set C ⊂ R^n can be found by maximizing distance from x to hyperplane ∂H over the set of all hyperplanes separating x from C. Existence of a separating hyperplane (2.4.2.7) presumes point x lies on the boundary or exterior to set C.

The optimal separating hyperplane is characterized by the fact it also supports C. Any hyperplane supporting C (Figure 29a) has form

    ∂H₋ = { y ∈ R^n | a^T y = σ_C(a) }    (129)

where the support function is convex, defined

    σ_C(a) ≜ sup_{z∈C} a^T z    (553)

When point x is finite and set C contains finite points, under this projection interpretation, if the supporting hyperplane is a separating hyperplane then the support function is finite. From Example E.5.0.0.8, projection P_{∂H₋}x of x on any given supporting hyperplane ∂H₋ is

    P_{∂H₋}x = x − a(a^T a)^{-1}( a^T x − σ_C(a) )    (2012)

E.15 −(C − Px)* is the normal cone to set C at point Px. (E.10.3.2)


Figure 164: Dual interpretation of projection of point x on convex set C in R². (a) κ = (a^T a)^{-1}( a^T x − σ_C(a) ) (b) Minimum distance from x to C is found by maximizing distance to all hyperplanes supporting C and separating it from x. A convex problem for any convex set, distance of maximization is unique.


With reference to Figure 164, identifying

    H₊ = { y ∈ R^n | a^T y ≥ σ_C(a) }    (107)

then

    ‖x − P_C x‖ = sup_{∂H₋ | x∈H₊} ‖x − P_{∂H₋}x‖ = sup_{a | x∈H₊} ‖a(a^T a)^{-1}(a^T x − σ_C(a))‖
                = sup_{a | x∈H₊} (1/‖a‖) |a^T x − σ_C(a)|    (2013)

which can be expressed as a convex optimization, for arbitrary positive constant τ

    ‖x − P_C x‖ = (1/τ) maximize_a  a^T x − σ_C(a)
                  subject to  ‖a‖ ≤ τ    (2014)

The unique minimum-distance projection on convex set C is therefore

    P_C x = x − a⋆( a⋆^T x − σ_C(a⋆) ) (1/τ²)    (2015)

where optimally ‖a⋆‖ = τ.

E.9.1.1.1 Exercise. Dual projection technique on polyhedron.
Test that projection paradigm from Figure 164 on any convex polyhedral set.

E.9.1.2 Dual interpretation of projection on cone

In the circumstance set C is a closed convex cone K and there exists a hyperplane separating given point x from K, then optimal σ_K(a⋆) takes value 0 [199,C.2.3.1]. So problem (2014) for projection of x on K becomes

    ‖x − P_K x‖ = (1/τ) maximize_a  a^T x
                  subject to  ‖a‖ ≤ τ
                              a ∈ K°    (2016)

The norm inequality in (2016) can be handled by Schur complement (3.5.2). Normals a to all hyperplanes supporting K belong to the polar cone K° = −K* by definition: (319)

    a ∈ K° ⇔ ⟨a , x⟩ ≤ 0 for all x ∈ K    (2017)


Projection on cone K is

    P_K x = (I − (1/τ²) a⋆a⋆^T) x    (2018)

whereas projection on the polar cone −K* is (E.9.2.2.1)

    P_{K°}x = x − P_K x = (1/τ²) a⋆a⋆^T x    (2019)

Negating vector a, this maximization problem (2016) becomes a minimization (the same problem) and the polar cone becomes the dual cone:

    ‖x − P_K x‖ = −(1/τ) minimize_a  a^T x
                  subject to  ‖a‖ ≤ τ
                              a ∈ K*    (2020)

E.9.2 Projection on cone

When convex set C is a cone, there is a finer statement of optimality conditions:

E.9.2.0.1 Theorem. Unique projection on cone. [199,A.3.2]
Let K ⊆ R^n be a closed convex cone, and K* its dual (2.13.1). Then Px is the unique minimum-distance projection of x ∈ R^n on K if and only if

    Px ∈ K ,    ⟨Px − x , Px⟩ = 0 ,    Px − x ∈ K*    (2021)

In words, Px is the unique minimum-distance projection of x on K if and only if

    1) projection Px lies in K
    2) direction Px − x is orthogonal to the projection Px
    3) direction Px − x lies in the dual cone K*.

As the theorem is stated, it admits projection on K not full-dimensional; id est, on closed convex cones in a proper subspace of R^n.    ⋄


Figure 165: (confer Figure 83) Given orthogonal projection (I − P)x of x on orthogonal complement R(P)^⊥, projection on R(P) is immediate: x − (I − P)x.

Projection on K of any point x ∈ −K^∗, belonging to the negative dual cone, is the origin. By (2021): the set of all points reaching the origin, when projecting on K, constitutes the negative dual cone; a.k.a the polar cone

    K° = −K^∗ = { x ∈ R^n | Px = 0 }                                     (2022)

E.9.2.1  Relationship to subspace projection

Conditions 1 and 2 of the theorem are common with orthogonal projection on a subspace R(P): Condition 1 corresponds to the most basic requirement; namely, the projection Px ∈ R(P) belongs to the subspace. Recall the perpendicularity requirement for projection on a subspace;

    Px − x ⊥ R(P)   or   Px − x ∈ R(P)^⊥                                 (1907)

which corresponds to condition 2.
Yet condition 3 is a generalization of subspace projection; id est, for unique minimum-distance projection on a closed convex cone K, polar cone −K^∗ plays the role R(P)^⊥ plays for subspace projection (Figure 165:


P_R x = x − P_{R^⊥} x). Indeed, −K^∗ is the algebraic complement in the orthogonal vector sum (p.773) [269] [199, A.3.2.5]

    K ⊞ −K^∗ = R^n ⇔ cone K is closed and convex                         (2023)

Also, given unique minimum-distance projection Px on K satisfying Theorem E.9.2.0.1, then by projection on the algebraic complement via I − P in E.2 we have

    −K^∗ = { x − Px | x ∈ R^n } = { x ∈ R^n | Px = 0 }                   (2024)

consequent to Moreau (2027). Recalling that any subspace is a closed convex cone E.16

    K = R(P) ⇔ −K^∗ = R(P)^⊥                                             (2025)

meaning, when a cone is a subspace R(P), then the dual cone becomes its orthogonal complement R(P)^⊥. [61, 2.6.1] In this circumstance, condition 3 becomes coincident with condition 2.
The properties of projection on cones following in E.9.2.2 further generalize to subspaces by: (4)

    K = R(P) ⇔ −K = R(P)                                                 (2026)

E.9.2.2  Salient properties: Projection Px on closed convex cone K

[199, A.3.2] [109, 5.6] For x, x_1, x_2 ∈ R^n

1. P_K(αx) = α P_K x  ∀α ≥ 0    (nonnegative homogeneity)
2. ‖P_K x‖ ≤ ‖x‖
3. P_K x = 0 ⇔ x ∈ −K^∗
4. P_K(−x) = −P_{−K} x
5. (Jean-Jacques Moreau (1962)) [269]

        x = x_1 + x_2 ,  x_1 ∈ K ,  x_2 ∈ −K^∗ ,  x_1 ⊥ x_2
        ⇔                                                                (2027)
        x_1 = P_K x ,  x_2 = P_{−K^∗} x

6. K = { x − P_{−K^∗} x | x ∈ R^n } = { x ∈ R^n | P_{−K^∗} x = 0 }
7. −K^∗ = { x − P_K x | x ∈ R^n } = { x ∈ R^n | P_K x = 0 }              (2024)

E.16 but a proper subspace is not a proper cone (2.7.2.2.1).
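Property 5, the Moreau decomposition, is easy to exercise on the selfdual positive semidefinite cone, where projection clips negative eigenvalues (E.9.4) and projection on the polar cone keeps them. A minimal numerical sketch in Python with NumPy (illustrative, assuming that particular cone):

    import numpy as np

    np.random.seed(2)
    n = 5
    A = np.random.randn(n, n)
    X = (A + A.T) / 2                               # symmetric, with eigenvalues of both signs

    w, Q = np.linalg.eigh(X)
    X1 = Q @ np.diag(np.maximum(w, 0)) @ Q.T        # x1 = P_K X,  K the PSD cone
    X2 = Q @ np.diag(np.minimum(w, 0)) @ Q.T        # x2 = P_{-K*} X = P_{-K} X  (selfdual cone)

    print(np.allclose(X, X1 + X2))                  # x = x1 + x2
    print(np.isclose(np.trace(X1 @ X2), 0))         # x1 ⊥ x2 in the Frobenius inner product
    print(np.all(np.linalg.eigvalsh(X1) >= -1e-9))  # x1 in K
    print(np.all(np.linalg.eigvalsh(X2) <=  1e-9))  # x2 in -K* = -K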


E.9.2.2.1 Corollary. I − P for cones. (confer E.2)
Denote by K ⊆ R^n a closed convex cone, and call K^∗ its dual. Then x − P_{−K^∗} x is the unique minimum-distance projection of x ∈ R^n on K if and only if P_{−K^∗} x is the unique minimum-distance projection of x on −K^∗ the polar cone.  ⋄

Proof. Assume x_1 = P_K x. Then by Theorem E.9.2.0.1 we have

    x_1 ∈ K ,   x_1 − x ⊥ x_1 ,   x_1 − x ∈ K^∗                          (2028)

Now assume x − x_1 = P_{−K^∗} x. Then we have

    x − x_1 ∈ −K^∗ ,   −x_1 ⊥ x − x_1 ,   −x_1 ∈ −K                      (2029)

But these two assumptions are apparently identical. We must therefore have

    x − P_{−K^∗} x = x_1 = P_K x                                         (2030)

E.9.2.2.2 Corollary. Unique projection via dual or normal cone.
[109, 4.7] (E.10.3.2, confer Theorem E.9.1.0.3) Given point x ∈ R^n and closed convex cone K ⊆ R^n, the following are equivalent statements:

1. point Px is the unique minimum-distance projection of x on K
2. Px ∈ K ,  x − Px ∈ −(K − Px)^∗ = −K^∗ ∩ (Px)^⊥
3. Px ∈ K ,  〈x − Px, Px〉 = 0 ,  〈x − Px, y〉 ≤ 0  ∀y ∈ K               ⋄

E.9.2.2.3 Example. Unique projection on nonnegative orthant.
(confer (1320)) From Theorem E.9.2.0.1, to project matrix H ∈ R^{m×n} on the selfdual orthant (2.13.5.1) of nonnegative matrices R^{m×n}_+ in isomorphic R^{mn}, the necessary and sufficient conditions are:

    H^⋆ ≥ 0
    tr( (H^⋆ − H)^T H^⋆ ) = 0                                            (2031)
    H^⋆ − H ≥ 0


where the inequalities denote entrywise comparison. The optimal solution H^⋆ is simply H having all its negative entries zeroed;

    H^⋆_ij = max{ H_ij , 0 } ,   i,j ∈ {1...m} × {1...n}                 (2032)

Now suppose the nonnegative orthant is translated by T ∈ R^{m×n}; id est, consider R^{m×n}_+ + T. Then projection on the translated orthant is [109, 4.8]

    H^⋆_ij = max{ H_ij , T_ij }                                          (2033)

E.9.2.2.4 Example. Unique projection on truncated convex cone.
Consider the problem of projecting a point x on a closed convex cone that is artificially bounded; really, a bounded convex polyhedron having a vertex at the origin:

    minimize_{y∈R^N}   ‖x − Ay‖_2
    subject to         y ≽ 0                                             (2034)
                       ‖y‖_∞ ≤ 1

where the convex cone has vertex-description (2.12.2.0.1), for A ∈ R^{n×N}

    K = { Ay | y ≽ 0 }                                                   (2035)

and where ‖y‖_∞ ≤ 1 is the artificial bound. This is a convex optimization problem having no known closed-form solution, in general. It arises, for example, in the fitting of hearing aids designed around a programmable graphic equalizer (a filter bank whose only adjustable parameters are gain per band, each bounded above by unity). [96] The problem is equivalent to a Schur-form semidefinite program (3.5.2)

    minimize_{y∈R^N, t∈R}   t
    subject to              ⎡ tI          x − Ay ⎤
                            ⎣ (x − Ay)^T    t    ⎦ ≽ 0                   (2036)
                            0 ≼ y ≼ 1
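One way to solve (2034) numerically is with a generic convex-optimization modeling tool; the following sketch uses CVXPY (an assumption of convenience, not a tool used in this book) on random problem data, noting that y ≽ 0 together with ‖y‖_∞ ≤ 1 is simply 0 ≼ y ≼ 1:

    import numpy as np
    import cvxpy as cp

    np.random.seed(3)
    n, N = 8, 5
    A = np.random.randn(n, N)     # cone K = {Ay | y >= 0}, artificially bounded by ||y||_inf <= 1
    x = np.random.randn(n)

    y = cp.Variable(N)
    problem = cp.Problem(cp.Minimize(cp.norm(x - A @ y, 2)),
                         [y >= 0, y <= 1])
    problem.solve()

    P_x = A @ y.value             # projection of x on the truncated cone
    print(problem.value, np.linalg.norm(x - P_x))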


E.9.3  nonexpansivity

E.9.3.0.1 Theorem. Nonexpansivity. [175, 2] [109, 5.3]
When C ⊂ R^n is an arbitrary closed convex set, projector P projecting on C is nonexpansive in the sense: for any vectors x, y ∈ R^n

    ‖Px − Py‖ ≤ ‖x − y‖                                                  (2037)

with equality when x − Px = y − Py. E.17   ⋄

Proof. [54]

    ‖x − y‖² = ‖Px − Py‖² + ‖(I − P)x − (I − P)y‖²
               + 2〈x − Px , Px − Py〉 + 2〈y − Py , Py − Px〉              (2038)

Nonnegativity of the last two terms follows directly from the unique minimum-distance projection theorem (E.9.1.0.2).

The foregoing proof reveals another flavor of nonexpansivity; for each and every x, y ∈ R^n

    ‖Px − Py‖² + ‖(I − P)x − (I − P)y‖² ≤ ‖x − y‖²                       (2039)

Deutsch shows yet another: [109, 5.5]

    ‖Px − Py‖² ≤ 〈x − y , Px − Py〉                                       (2040)

E.9.4  Easy projections

• To project any matrix H ∈ R^{n×n} orthogonally in Euclidean/Frobenius sense on subspace of symmetric matrices S^n in isomorphic R^{n²}, take symmetric part of H; (2.2.2.0.1) id est, (H + H^T)/2 is the projection.

• To project any H ∈ R^{n×n} orthogonally on symmetric hollow subspace S^n_h in isomorphic R^{n²} (2.2.3.0.1, 7.0.1), take symmetric part then zero all entries along main diagonal or vice versa (because this is projection on intersection of two subspaces); id est, (H + H^T)/2 − δ²(H).

E.17 This condition for equality corrects an error in [78] (where the norm is applied to each side of the condition given here) easily revealed by counterexample.


• To project a matrix on nonnegative orthant R^{m×n}_+, simply clip all negative entries to 0. Likewise, projection on nonpositive orthant R^{m×n}_− sees all positive entries clipped to 0. Projection on other orthants is equally simple with appropriate clipping.

• Projecting on hyperplane, halfspace, slab: E.5.0.0.8.

• Projection of y ∈ R^n exterior to Euclidean ball B = { x ∈ R^n | ‖x − a‖ ≤ c }: for y ≠ a, P_B y = (y − a) c/‖y − a‖ + a. (A point already in B is its own projection.)

• Clipping in excess of |1| each entry of point x ∈ R^n is equivalent to unique minimum-distance projection of x on a hypercube centered at the origin. (confer E.10.3.2)

• Projection of x ∈ R^n on a (rectangular) hyperbox: [61, 8.1.1]

    C = { y ∈ R^n | l ≼ y ≼ u , l ≺ u }                                  (2041)

    P(x)_k = ⎧ l_k ,  x_k ≤ l_k
             ⎨ x_k ,  l_k ≤ x_k ≤ u_k        k = 1...n                   (2042)
             ⎩ u_k ,  x_k ≥ u_k

• Orthogonal projection of x on a Cartesian subspace, whose basis is some given subset of the Cartesian axes, zeroes entries corresponding to the remaining (complementary) axes.

• Projection of x on set of all cardinality-k vectors { y | card y ≤ k } keeps k entries of greatest magnitude and clips to 0 those remaining.

• Unique minimum-distance projection of H ∈ S^n on positive semidefinite cone S^n_+ in Euclidean/Frobenius sense is accomplished by eigenvalue decomposition (diagonalization) followed by clipping all negative eigenvalues to 0.

• Unique minimum-distance projection on generally nonconvex subset of all matrices belonging to S^n_+ having rank not exceeding ρ (2.9.2.1) is accomplished by clipping all negative eigenvalues to 0 and zeroing smallest nonnegative eigenvalues keeping only ρ largest. (7.1.2) (A numerical sketch of these two spectral projections follows below; the list of easy projections continues after it.)
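A compact sketch of those two spectral projections, in Python with NumPy (illustrative only):

    import numpy as np

    def proj_psd(H):
        # projection of symmetric H on the positive semidefinite cone: clip negative eigenvalues
        w, Q = np.linalg.eigh(H)
        return Q @ np.diag(np.maximum(w, 0)) @ Q.T

    def proj_psd_rank(H, rho):
        # projection on the (generally nonconvex) subset of S^n_+ of rank not exceeding rho:
        # clip negative eigenvalues, then keep only the rho largest of what remains
        w, Q = np.linalg.eigh(H)          # eigenvalues returned in ascending order
        w = np.maximum(w, 0)
        w[:-rho] = 0                      # zero all but the rho largest
        return Q @ np.diag(w) @ Q.T

    np.random.seed(4)
    B = np.random.randn(6, 6)
    H = (B + B.T) / 2
    X = proj_psd_rank(H, rho=2)
    print(np.linalg.matrix_rank(X) <= 2, np.all(np.linalg.eigvalsh(X) >= -1e-9))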


• Unique minimum-distance projection, in Euclidean/Frobenius sense, of H ∈ R^{m×n} on generally nonconvex subset of all m×n matrices of rank no greater than k is singular value decomposition (A.6) of H having all singular values beyond k th zeroed. This is also a solution to projection in sense of spectral norm. [328, p.79, p.208]

• Projection on monotone nonnegative cone K_{M+} ⊂ R^n in less than one cycle (in sense of alternating projections E.10): [Wıκımization].

• Fast projection on a simplicial cone: [Wıκımization].

• Projection on closed convex cone K of any point x ∈ −K^∗, belonging to polar cone, is equivalent to projection on origin. (E.9.2)

• P_{S^N_+ ∩ S^N_c} = P_{S^N_+} P_{S^N_c}      (1264)

• P_{R^{N×N}_+ ∩ S^N_h} = P_{R^{N×N}_+} P_{S^N_h}      (7.0.1.1)

• P_{R^{N×N}_+ ∩ S^N} = P_{R^{N×N}_+} P_{S^N}      (E.9.5)

E.9.4.0.1 Exercise. Projection on spectral norm ball.
Find the unique minimum-distance projection on the convex set of all m×n matrices whose largest singular value does not exceed 1; id est, on { X ∈ R^{m×n} | ‖X‖_2 ≤ 1 } the spectral norm ball (2.3.2.0.5). 

E.9.4.1  notes

Projection on Lorentz (second-order) cone: [61, exer.8.3c].
Deutsch [112] provides an algorithm for projection on polyhedral cones.
Youla [392, 2.5] lists eleven “useful projections”, of square-integrable uni- and bivariate real functions on various convex sets, in closed form.
Unique minimum-distance projection on an ellipsoid: Example 4.6.0.0.1.
Numerical algorithms for projection on the nonnegative simplex and 1-norm ball are in [126].

E.9.5  Projection on convex set in subspace

Suppose a convex set C is contained in some subspace R^n. Then unique minimum-distance projection of any point in R^n ⊕ R^{n⊥} on C can be


Figure 166: Closed convex set C belongs to subspace R^n (shown bounded in sketch and drawn without proper perspective). Point y is unique minimum-distance projection of x on C; equivalent to product of orthogonal projection of x on R^n and minimum-distance projection of result z on C.


accomplished by first projecting orthogonally on that subspace, and then projecting the result on C; [109, 5.14] id est, the ordered product of two individual projections that is not commutable.

Proof. (⇐) To show that, suppose unique minimum-distance projection P_C x on C ⊂ R^n is y as illustrated in Figure 166;

    ‖x − y‖ ≤ ‖x − q‖   ∀q ∈ C                                           (2043)

Further suppose P_{R^n} x equals z. By the Pythagorean theorem

    ‖x − y‖² = ‖x − z‖² + ‖z − y‖²                                       (2044)

because x − z ⊥ z − y. (1907) [250, 3.3] Then point y = P_C x is the same as P_C z because

    ‖z − y‖² = ‖x − y‖² − ‖x − z‖² ≤ ‖z − q‖² = ‖x − q‖² − ‖x − z‖²   ∀q ∈ C    (2045)

which holds by assumption (2043).
(⇒) Now suppose z = P_{R^n} x and

    ‖z − y‖ ≤ ‖z − q‖   ∀q ∈ C                                           (2046)

meaning y = P_C z. Then point y is identical to P_C x because

    ‖x − y‖² = ‖x − z‖² + ‖z − y‖² ≤ ‖x − q‖² = ‖x − z‖² + ‖z − q‖²   ∀q ∈ C    (2047)

by assumption (2046).

This proof is extensible via translation argument. (E.4) Unique minimum-distance projection on a convex set contained in an affine subset is, therefore, similarly accomplished.

Projecting matrix H ∈ R^{n×n} on convex cone K = S^n ∩ R^{n×n}_+ in isomorphic R^{n²} can be accomplished, for example, by first projecting on S^n and only then projecting the result on R^{n×n}_+ (confer 7.0.1). This is because that projection product is equivalent to projection on the subset of the nonnegative orthant in the symmetric matrix subspace.
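A short numerical sketch of that projection product (illustrative, not part of the original text): symmetrize first, then clip negative entries; the result lies in S^n ∩ R^{n×n}_+. In Python with NumPy:

    import numpy as np

    np.random.seed(5)
    H = np.random.randn(5, 5)

    H_sym = (H + H.T) / 2             # projection on the symmetric matrix subspace S^n
    P_H   = np.maximum(H_sym, 0)      # then projection on the nonnegative orthant

    print(np.allclose(P_H, P_H.T), np.all(P_H >= 0))   # result lies in S^n ∩ R^{n x n}_+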


E.10  Alternating projection

Alternating projection is an iterative technique for finding a point in the intersection of a number of arbitrary closed convex sets C_k, or for finding the distance between two nonintersecting closed convex sets. Because it can sometimes be difficult or inefficient to compute the intersection or express it analytically, one naturally asks whether it is possible to instead project (unique minimum-distance) alternately on the individual C_k; that is often easier, and it is what motivates adoption of this technique. Once a cycle of alternating projections (an iteration) is complete, we then iterate (repeat the cycle) until convergence. If the intersection of two closed convex sets is empty, then by convergence we mean the iterate (the result after a cycle of alternating projections) settles to a point of minimum distance separating the sets.

While alternating projection can find some point in a nonempty intersection, it does not necessarily find the point of the intersection closest to a given point b. Finding that closest point is made dependable by an elegantly simple enhancement via correction to the alternating projection technique: this Dykstra algorithm (2086) for projection on the intersection is one of the most beautiful projection algorithms ever discovered. It is accurately interpreted as the discovery of what alternating projection originally sought to accomplish: unique minimum-distance projection on the nonempty intersection of a number of arbitrary closed convex sets C_k. Alternating projection is, in fact, a special case of the Dykstra algorithm whose discussion we defer until E.10.3.

E.10.0.1  commutative projectors

Given two arbitrary convex sets C_1 and C_2 and their respective minimum-distance projection operators P_1 and P_2, if projectors commute for each and every x ∈ R^n then it is easy to show P_1 P_2 x ∈ C_1 ∩ C_2 and P_2 P_1 x ∈ C_1 ∩ C_2. When projectors commute (P_1 P_2 = P_2 P_1), a point in the intersection can be found in a finite number of steps; while commutativity is a sufficient condition, it is not necessary (6.8.1.1.1 for example).

When C_1 and C_2 are subspaces, in particular, projectors P_1 and P_2 commute if and only if P_1 P_2 = P_{C_1 ∩ C_2} or iff P_2 P_1 = P_{C_1 ∩ C_2} or iff P_1 P_2 is the orthogonal projection on a Euclidean subspace. [109, lem.9.2] Subspace projectors will commute, for example, when P_1(C_2) ⊂ C_2 or P_2(C_1) ⊂ C_1 or C_1 ⊂ C_2 or C_2 ⊂ C_1 or C_1 ⊥ C_2. When subspace projectors commute, this


Figure 167: First several alternating projections in von Neumann-style projection (2059), of point b, converging on closest point Pb in intersection of two closed convex sets in R^2: C_1 and C_2 are partially drawn in vicinity of their intersection. Pointed normal cone K^⊥ (449) is translated to Pb, the unique minimum-distance projection of b on intersection. For this particular example, it is possible to start anywhere in a large neighborhood of b and still converge to Pb. Alternating projections are themselves robust with respect to significant noise because they belong to translated normal cone.

means we can find a point in the intersection of those subspaces in a finite number of steps; we find, in fact, the closest point.

E.10.0.1.1 Theorem. Kronecker projector. [327, 2.7]
Given any projection matrices P_1 and P_2 (subspace projectors), then

    P_1 ⊗ P_2   and   P_1 ⊗ I                                            (2048)

are projection matrices. The product preserves symmetry if present.  ⋄

E.10.0.2  noncommutative projectors

Typically, one considers the method of alternating projection when projectors do not commute; id est, when P_1 P_2 ≠ P_2 P_1.
The iconic example for noncommutative projectors illustrated in Figure 167 shows the iterates converging to the closest point in the intersection of two arbitrary convex sets. Yet simple examples like Figure 168 reveal that noncommutative alternating projection does not


Figure 168: The sets {C_k} in this example comprise two halfspaces H_1 and H_2. The von Neumann-style alternating projection in R^2 quickly converges to P_1 P_2 b (feasibility). The unique minimum-distance projection on the intersection is, of course, Pb.

always yield the closest point, although we shall show it always yields some point in the intersection or a point that attains the distance between two convex sets.

Alternating projection is also known as successive projection [178] [175] [63], cyclic projection [148] [260, 3.2], successive approximation [78], or projection on convex sets [325] [326, 6.4]. It is traced back to von Neumann (1933) [369] and later Wiener [376] who showed that higher iterates of a product of two orthogonal projections on subspaces converge at each point in the ambient space to the unique minimum-distance projection on the intersection of the two subspaces. More precisely, if R_1 and R_2 are closed subspaces of a Euclidean space and P_1 and P_2 respectively denote orthogonal projection on R_1 and R_2, then for each vector b in that space,

    lim_{i→∞} (P_1 P_2)^i b = P_{R_1 ∩ R_2} b                            (2049)

Deutsch [109, thm.9.8, thm.9.35] shows rate of convergence for subspaces to be geometric [391, 1.4.4]; bounded above by κ^{2i+1} ‖b‖, i = 0, 1, 2..., where


0 ≤ κ < 1.


Figure 169:
(a) (distance) Intersection of two convex sets in R^2 is empty. Method of alternating projection would be applied to find that point in C_1 nearest C_2.
(b) (distance) Given b ∈ C_2, then P_1 b ∈ C_1 is nearest b iff (y − P_1 b)^T (b − P_1 b) ≤ 0 ∀y ∈ C_1 by the unique minimum-distance projection theorem (E.9.1.0.2). When P_1 b attains the distance between the two sets, hyperplane { y | (b − P_1 b)^T (y − P_1 b) = 0 } separates C_1 from C_2. [61, 2.5.1]
(c) (0 distance) Intersection is nonempty.
(optimization) We may want the point Pb in ⋂ C_k nearest point b.
(feasibility) We may instead be satisfied with a fixed point of the projection product P_1 P_2 b in ⋂ C_k.


E.10.1  Distance and existence

Existence of a fixed point is established:

E.10.1.0.1 Theorem. Distance. [78]
Given any two closed convex sets C_1 and C_2 in R^n, then P_1 b ∈ C_1 is a fixed point of projection product P_1 P_2 if and only if P_1 b is a point of C_1 nearest C_2.  ⋄

Proof. (⇒) Given fixed point a = P_1 P_2 a ∈ C_1 with b ≜ P_2 a ∈ C_2 in tandem so that a = P_1 b, then by the unique minimum-distance projection theorem (E.9.1.0.2)

    (b − a)^T (u − a) ≤ 0   ∀u ∈ C_1
    (a − b)^T (v − b) ≤ 0   ∀v ∈ C_2
    ⇔                                                                    (2054)
    ‖a − b‖ ≤ ‖u − v‖   ∀u ∈ C_1 and ∀v ∈ C_2

by Cauchy-Schwarz inequality [307]

    |〈x, y〉| ≤ ‖x‖ ‖y‖                                                   (2055)

(with equality iff x = κy where κ ∈ R [227, p.137]).
(⇐) Suppose a ∈ C_1 and ‖a − P_2 a‖ ≤ ‖u − P_2 u‖ ∀u ∈ C_1 and we choose u = P_1 P_2 a. Then

    ‖u − P_2 u‖ = ‖P_1 P_2 a − P_2 P_1 P_2 a‖ ≤ ‖a − P_2 a‖ ⇔ a = P_1 P_2 a    (2056)

Thus a = P_1 b (with b = P_2 a ∈ C_2) is a fixed point in C_1 of the projection product P_1 P_2. E.19

E.10.2  Feasibility and convergence

The set of all fixed points of any nonexpansive mapping is a closed convex set. [156, lem.3.4] [28, 1] The projection product P_1 P_2 is nonexpansive by Theorem E.9.3.0.1 because, for any vectors x, a ∈ R^n

    ‖P_1 P_2 x − P_1 P_2 a‖ ≤ ‖P_2 x − P_2 a‖ ≤ ‖x − a‖                  (2057)

E.19 Point b = P_2 a can be shown, similarly, to be a fixed point of product P_2 P_1.
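The Distance theorem is easy to watch in action. For instance (an illustration, not part of the original text), take C_1 = R^2_+ and the nonintersecting hyperplane C_2 = { y | 1^T y = −1 }; iterating the projection product P_1 P_2 drives the iterate to the origin, the point of C_1 nearest C_2, at distance 1/√2. A minimal sketch in Python with NumPy:

    import numpy as np

    a, beta = np.ones(2), -1.0                        # hyperplane C2 = {y | a^T y = beta}
    P2 = lambda y: y - a * (a @ y - beta) / (a @ a)   # projection on the hyperplane
    P1 = lambda y: np.maximum(y, 0)                   # projection on C1 = nonnegative orthant

    x = np.array([2.0, 3.0])
    for _ in range(100):                              # iterate the projection product P1 P2
        x = P1(P2(x))

    print(x)                                          # fixed point: the origin, point of C1 nearest C2
    print(np.linalg.norm(x - P2(x)))                  # distance between C1 and C2, here 1/sqrt(2)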


If the intersection of two closed convex sets C_1 ∩ C_2 is empty, then the iterates converge to a point of minimum distance, a fixed point of the projection product. Otherwise, convergence is to some fixed point in their intersection (a feasible solution) whose existence is guaranteed by virtue of the fact that each and every point in the convex intersection is in one-to-one correspondence with fixed points of the nonexpansive projection product.

Bauschke & Borwein [28, 2] argue that any sequence monotonic in the sense of Fejér is convergent: E.20

E.10.2.0.1 Definition. Fejér monotonicity. [270]
Given closed convex set C ≠ ∅, then a sequence x_i ∈ R^n, i = 0, 1, 2..., is monotonic in the sense of Fejér with respect to C iff

    ‖x_{i+1} − c‖ ≤ ‖x_i − c‖   for all i ≥ 0 and each and every c ∈ C   (2058)
                                                                          △
Given x_0 ≜ b, if we express each iteration of alternating projection by

    x_{i+1} = P_1 P_2 x_i ,   i = 0, 1, 2...                             (2059)

and define any fixed point a = P_1 P_2 a, then sequence x_i is Fejér monotone with respect to fixed point a because

    ‖P_1 P_2 x_i − a‖ ≤ ‖x_i − a‖   ∀i ≥ 0                               (2060)

by nonexpansivity. The nonincreasing sequence ‖P_1 P_2 x_i − a‖ is bounded below hence convergent because any bounded monotonic sequence in R is convergent; [258, 1.2] [42, 1.1] in the limit, P_1 P_2 x_{i+1} = P_1 P_2 x_i = x_{i+1}. Sequence x_i therefore converges to some fixed point. If the intersection C_1 ∩ C_2 is nonempty, convergence is to some point there by the distance theorem. Otherwise, x_i converges to a point in C_1 of minimum distance to C_2.

E.10.2.0.2 Example. Hyperplane/orthant intersection.
Find a feasible solution (2052) belonging to the nonempty intersection of two convex sets: given A ∈ R^{m×n}, β ∈ R(A)

    C_1 ∩ C_2 = R^n_+ ∩ A = { y | y ≽ 0 } ∩ { y | Ay = β } ⊂ R^n         (2061)

E.20 Other authors prove convergence by different means; e.g., [175] [63].


Figure 170: From Example E.10.2.0.2 in R^2, showing von Neumann-style alternating projection to find a feasible solution belonging to intersection of nonnegative orthant C_1 = R^2_+ with hyperplane C_2 = A = { y | [1 1] y = 1 }. Point Pb lies at intersection of hyperplane with ordinate axis. In this particular example, feasible solution found is coincidentally optimal. Rate of convergence depends upon angle θ; as it becomes more acute, convergence slows. [175, 3]

Figure 171: Example E.10.2.0.2 in R^1000; geometric convergence, in norm, of iterates ‖ x_i − (∏_{j=1}^∞ ∏_{k=1}^2 P_k) b ‖.


the nonnegative orthant with affine subset A an intersection of hyperplanes. Projection of an iterate x_i ∈ R^n on A is calculated

    P_2 x_i = x_i − A^T (AA^T)^{−1} (A x_i − β)                          (1953)

while, thereafter, projection of the result on the orthant is simply

    x_{i+1} = P_1 P_2 x_i = max{ 0 , P_2 x_i }                           (2062)

where the maximum is entrywise (E.9.2.2.3).
One realization of this problem in R^2 is illustrated in Figure 170: For A = [1 1], β = 1, and x_0 = b = [−3 1/2]^T, iterates converge to a feasible solution Pb = [0 1]^T.
To give a more palpable sense of convergence in higher dimension, we do this example again but now we compute an alternating projection for the case A ∈ R^{400×1000}, β ∈ R^400, and b ∈ R^1000, all of whose entries are independently and randomly set to a uniformly distributed real number in the interval [−1, 1]. Convergence is illustrated in Figure 171. 

This application of alternating projection to feasibility is extensible to any finite number of closed convex sets.

E.10.2.0.3 Example. Under- and over-projection. [58, 3]
Consider the following variation of alternating projection: We begin with some point x_0 ∈ R^n then project that point on convex set C and then project that same point x_0 on convex set D. To the first iterate we assign x_1 = ( P_C(x_0) + P_D(x_0) ) / 2. More generally,

    x_{i+1} = ( P_C(x_i) + P_D(x_i) ) / 2 ,   i = 0, 1, 2...             (2063)

Because the Cartesian product of convex sets remains convex, (2.1.8) we can reformulate this problem.
Consider the convex set

    S ≜ ⎡ C ⎤                                                            (2064)
        ⎣ D ⎦

representing Cartesian product C × D. Now, those two projections P_C and P_D are equivalent to one projection on the Cartesian product; id est,

    P_S ( ⎡ x_i ⎤ )  =  ⎡ P_C(x_i) ⎤                                     (2065)
          ⎣ x_i ⎦        ⎣ P_D(x_i) ⎦


Define the subspace

    R ≜ { v ∈ ⎡ R^n ⎤ | [ I −I ] v = 0 }                                 (2066)
              ⎣ R^n ⎦

By the results in Example E.5.0.0.6

    P_R P_S ( ⎡ x_i ⎤ )  =  P_R ( ⎡ P_C(x_i) ⎤ )  =  (1/2) ⎡ P_C(x_i) + P_D(x_i) ⎤    (2067)
              ⎣ x_i ⎦           ⎣ P_D(x_i) ⎦             ⎣ P_C(x_i) + P_D(x_i) ⎦

This means the proposed variation of alternating projection is equivalent to an alternation of projection on convex sets S and R. If S and R intersect, these iterations will converge to a point in their intersection; hence, to a point in the intersection of C and D.
We need not apply equal weighting to the projections, as supposed in (2063). In that case, definition of R would change accordingly. 

E.10.2.1  Relative measure of convergence

Inspired by Fejér monotonicity, the alternating projection algorithm from the example of convergence illustrated by Figure 171 employs a redundant sequence: The first sequence (indexed by j) estimates point (∏_{j=1}^∞ ∏_{k=1}^L P_k) b in the presumably nonempty intersection of L convex sets, then the quantity

    ‖ x_i − ( ∏_{j=1}^∞ ∏_{k=1}^L P_k ) b ‖                              (2068)

in second sequence x_i is observed per iteration i for convergence. A priori knowledge of a feasible solution (2052) is both impractical and antithetical. We need another measure:
Nonexpansivity implies

    ‖ ( ∏_{l=1}^L P_l ) x_{k,i−1} − ( ∏_{l=1}^L P_l ) x_{k,i} ‖ = ‖ x_{k,i} − x_{k,i+1} ‖ ≤ ‖ x_{k,i−1} − x_{k,i} ‖    (2069)

where

    x_{k,i} ≜ P_k x_{k+1,i} ∈ R^n ,   x_{L+1,i} ≜ x_{1,i−1}              (2070)


x_{k,i} represents unique minimum-distance projection of x_{k+1,i} on convex set k at iteration i. So a good convergence measure is total monotonic sequence

    ε_i ≜ Σ_{k=1}^L ‖ x_{k,i} − x_{k,i+1} ‖ ,   i = 0, 1, 2...           (2071)

where lim_{i→∞} ε_i = 0 whether or not the intersection is nonempty.

E.10.2.1.1 Example. Affine subset ∩ positive semidefinite cone.
Consider the problem of finding X ∈ S^n that satisfies

    X ≽ 0 ,   〈A_j , X〉 = b_j ,   j = 1...m                              (2072)

given nonzero A_j ∈ S^n and real b_j. Here we take C_1 to be the positive semidefinite cone S^n_+ while C_2 is the affine subset of S^n

    C_2 = A ≜ { X | tr(A_j X) = b_j , j = 1...m } ⊆ S^n
            = { X | ⎡ svec(A_1)^T ⎤ svec X = b }
                    ⎢      ⋮      ⎥
                    ⎣ svec(A_m)^T ⎦
            ≜ { X | A svec X = b }                                        (2073)

where b = [b_j] ∈ R^m, A ∈ R^{m×n(n+1)/2}, and symmetric vectorization svec is defined by (56). Projection of iterate X_i ∈ S^n on A is: (E.5.0.0.6)

    P_2 svec X_i = svec X_i − A^† (A svec X_i − b)                        (2074)

Euclidean distance from X_i to A is therefore

    dist(X_i , A) = ‖ X_i − P_2 X_i ‖_F = ‖ A^† (A svec X_i − b) ‖_2      (2075)

Projection of P_2 X_i ≜ Σ_j λ_j q_j q_j^T on the positive semidefinite cone (7.1.2) is found from its eigenvalue decomposition (A.5.1);

    P_1 P_2 X_i = Σ_{j=1}^n max{ 0 , λ_j } q_j q_j^T                      (2076)
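This pair of projections is straightforward to iterate numerically. The sketch below (an illustration, not part of the original text) uses full vectorization vec in place of svec for brevity; that substitution is harmless here because each A_j is symmetric, so the affine projection (2074) computed in R^{n²} returns a symmetric matrix and preserves Frobenius distances. The data b are manufactured from a known positive semidefinite matrix so the intersection is nonempty. In Python with NumPy:

    import numpy as np

    np.random.seed(6)
    n, m = 6, 4
    sym = lambda B: (B + B.T) / 2
    A_list = [sym(np.random.randn(n, n)) for _ in range(m)]
    G = np.random.randn(n, n)
    X_feas = G @ G.T                                      # PSD matrix, used only to make (2072) feasible
    b = np.array([np.sum(A * X_feas) for A in A_list])    # b_j = <A_j, X_feas>

    M = np.vstack([A.reshape(1, -1) for A in A_list])     # rows vec(A_j)^T ; stands in for A of (2073)
    M_pinv = np.linalg.pinv(M)

    def proj_affine(X):                                   # projection on affine subset A, confer (2074)
        x = X.reshape(-1)
        return (x - M_pinv @ (M @ x - b)).reshape(n, n)

    def proj_psd(X):                                      # projection on S^n_+, confer (2076)
        w, Q = np.linalg.eigh(X)
        return Q @ np.diag(np.maximum(w, 0)) @ Q.T

    X = np.zeros((n, n))
    for _ in range(200):                                  # alternating projection P1 P2
        X = proj_psd(proj_affine(X))

    print(np.linalg.norm(M @ X.reshape(-1) - b))          # affine residual, decreasing toward 0
    print(np.linalg.eigvalsh(X).min() >= -1e-8)           # X (numerically) positive semidefinite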


Distance from P_2 X_i to the positive semidefinite cone is therefore

    dist( P_2 X_i , S^n_+ ) = ‖ P_2 X_i − P_1 P_2 X_i ‖_F = √( Σ_{j=1}^n ( min{ 0 , λ_j } )² )    (2077)

When the intersection is empty, A ∩ S^n_+ = ∅, the iterates converge to that positive semidefinite matrix closest to A in the Euclidean sense. Otherwise, convergence is to some point in the nonempty intersection.

Barvinok (2.9.3.0.1) shows that if a solution to (2072) exists, then there exists an X ∈ A ∩ S^n_+ such that

    rank X ≤ ⌊ ( √(8m + 1) − 1 ) / 2 ⌋                                    (272)

E.10.2.1.2 Example. Semidefinite matrix completion.
Continuing Example E.10.2.1.1: When m ≤ n(n + 1)/2 and the A_j matrices are distinct members of the standard orthonormal basis { E_lq ∈ S^n } (59)

    { A_j ∈ S^n , j = 1...m } ⊆ { E_lq } = { e_l e_l^T ,                        l = q = 1...n
                                             (1/√2)( e_l e_q^T + e_q e_l^T ) ,  1 ≤ l < q ≤ n }    (2078)

and when the constants b_j are set to constrained entries of variable X ≜ [X_lq] ∈ S^n

    { b_j , j = 1...m } ⊆ { X_lq ,     l = q = 1...n
                            X_lq √2 ,  1 ≤ l < q ≤ n } = { 〈X , E_lq〉 }          (2079)


Figure 172: Distance (confer (2077)) between positive semidefinite cone and iterate (2080) in affine subset A (2073) for Laurent's completion problem; initially, decreasing geometrically.

Using this technique, we find a positive semidefinite completion for

    ⎡ 4 3 ? 2 ⎤
    ⎢ 3 4 3 ? ⎥
    ⎢ ? 3 4 3 ⎥                                                          (2081)
    ⎣ 2 ? 3 4 ⎦

Initializing the unknown entries to 0, they all converge geometrically to 1.5858 (rounded) after about 42 iterations.
Laurent gives a problem for which no positive semidefinite completion exists: [236]

    ⎡ 1 1 ? 0 ⎤
    ⎢ 1 1 1 ? ⎥
    ⎢ ? 1 1 1 ⎥                                                          (2082)
    ⎣ 0 ? 1 1 ⎦

Initializing unknowns to 0, by alternating projection we find the constrained


matrix closest to the positive semidefinite cone,

    ⎡ 1      1      0.5454 0      ⎤
    ⎢ 1      1      1      0.5454 ⎥
    ⎢ 0.5454 1      1      1      ⎥                                      (2083)
    ⎣ 0      0.5454 1      1      ⎦

and we find the positive semidefinite matrix closest to the affine subset A (2073):

    ⎡ 1.0521 0.9409 0.5454 0.0292 ⎤
    ⎢ 0.9409 1.0980 0.9451 0.5454 ⎥
    ⎢ 0.5454 0.9451 1.0980 0.9409 ⎥                                      (2084)
    ⎣ 0.0292 0.5454 0.9409 1.0521 ⎦

These matrices (2083) and (2084) attain the Euclidean distance dist(A , S^n_+). Convergence is illustrated in Figure 172.

E.10.3  Optimization and projection

Unique projection on the nonempty intersection of arbitrary convex sets to find the closest point therein is a convex optimization problem. The first successful application of alternating projection to this problem is attributed to Dykstra [127] [62] who in 1983 provided an elegant algorithm that prevails today. In 1988, Han [178] rediscovered the algorithm and provided a primal−dual convergence proof. A synopsis of the history of alternating projection E.21 can be found in [64] where it becomes apparent that Dykstra's work is seminal.

E.10.3.1  Dykstra's algorithm

Assume we are given some point b ∈ R^n and closed convex sets { C_k ⊂ R^n | k = 1...L }. Let x_{k,i} ∈ R^n and y_{k,i} ∈ R^n respectively denote a primal and dual vector (whose meaning can be deduced from Figure 173 and Figure 174) associated with set k at iteration i. Initialize

    y_{k,0} = 0  ∀k = 1...L   and   x_{1,0} = b                          (2085)

E.21 For a synopsis of alternating projection applied to distance geometry, see [353, 3.1].


Figure 173: H_1 and H_2 are the same halfspaces as in Figure 168. Dykstra's alternating projection algorithm generates the alternations b, x_{21}, x_{11}, x_{22}, x_{12}, x_{12}... x_{12}. The path illustrated from b to x_{12} in R^2 terminates at the desired result, Pb in Figure 168. The { y_{k,i} } correspond to the first two difference vectors drawn (in the first iteration i = 1), then oscillate between zero and a negative vector thereafter. These alternations are not so robust in presence of noise as for the example in Figure 167.

Denoting by P_k t the unique minimum-distance projection of t on C_k, and for convenience x_{L+1,i} = x_{1,i−1} (2070), iterate x_{1,i} calculation proceeds: E.22

    for i = 1, 2, ... until convergence {
        for k = L ... 1 {
            t = x_{k+1,i} − y_{k,i−1}
            x_{k,i} = P_k t
            y_{k,i} = P_k t − t
        }
    }                                                                    (2086)

E.22 We reverse order of projection (k = L...1) in the algorithm for continuity of exposition.
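A direct transcription of (2086), with the individual projections supplied as a list of functions, is only a few lines. The sketch below (illustrative, not part of the original text) applies it to a pair of halfspaces chosen arbitrarily for demonstration, not those of Figure 173; for closed convex sets with nonempty intersection the returned iterate tends to the minimum-distance projection Pb. In Python with NumPy:

    import numpy as np

    def dykstra(b, projections, iterations=100):
        # Dykstra's algorithm (2086): b in R^n, projections = [P_1, ..., P_L]
        L, x = len(projections), b.astype(float).copy()
        y = [np.zeros_like(x) for _ in range(L)]      # dual vectors y_k, initialized to 0 (2085)
        for _ in range(iterations):
            for k in reversed(range(L)):              # k = L ... 1
                t = x - y[k]
                x = projections[k](t)                 # x_k = P_k t
                y[k] = x - t                          # y_k = P_k t - t
        return x

    def halfspace(a, beta):                           # {y | a^T y <= beta}, projected by clipping
        a = np.asarray(a, float)
        return lambda y: y - a * max(0.0, a @ y - beta) / (a @ a)

    b = np.array([2.0, 1.0])
    P = [halfspace([1.0, 0.0], 0.0), halfspace([1.0, 1.0], 0.0)]
    print(dykstra(b, P))                              # approx [0, 0], projection of b on the intersection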


Figure 174: Two examples (truncated): Normal cone to H_1 ∩ H_2 at the origin, and at point Pb on the boundary. H_1 and H_2 are the same halfspaces from Figure 173. The normal cone at the origin K^⊥_{H_1 ∩ H_2}(0) is simply −K^∗.

Assuming a nonempty intersection, then the iterates converge to the unique minimum-distance projection of point b on that intersection; [109, 9.24]

    Pb = lim_{i→∞} x_{1,i}                                               (2087)

In the case that all C_k are affine, then calculation of y_{k,i} is superfluous and the algorithm becomes identical to alternating projection. [109, 9.26] [148, 1] Dykstra's algorithm is so simple and elegant, and represents such a tiny increment in computational intensity over alternating projection, that it is nearly always arguably cost effective.

E.10.3.2  Normal cone

Glunt [155, 4] observes that the overall effect of Dykstra's iterative procedure is to drive t toward the translated normal cone to ⋂ C_k at the solution Pb (translated to Pb). The normal cone gets its name from its graphical construction; which is, loosely speaking, to draw the outward-normals at Pb (Definition E.9.1.0.1) to all the convex sets C_k touching Pb. Relative interior of the normal cone subtends these normal vectors.


Figure 175: A few renderings (including next page) of normal cone K^⊥_{E^3} to elliptope E^3 (Figure 130), at point 11^T, projected on R^3. In [237, fig.2], normal cone is claimed circular in this dimension. (Severe numerical artifacts corrupt boundary and make relative interior corporeal; drawn truncated.)


E.10.3.2.1 Definition. Normal cone. (2.13.10)
The normal cone [268] [42, p.261] [199, A.5.2] [55, 2.1] [306, 3] [307, p.15] to any set S ⊆ R^n at any particular point a ∈ R^n is defined as the closed cone

    K^⊥_S(a) = { z ∈ R^n | z^T (y − a) ≤ 0  ∀y ∈ S } = −(S − a)^∗
             = { z ∈ R^n | z^T y ≤ 0  ∀y ∈ S − a }                       (449)

an intersection of halfspaces about the origin in R^n, hence convex regardless of convexity of S; it is the negative dual cone to translate S − a; the set of all vectors normal to S at a (E.9.1.0.1).   △

Examples of normal cone construction are illustrated in Figure 67, Figure 174, Figure 175, and Figure 176. Normal cone at 0 in Figure 174 is the vector sum (2.1.8) of two normal cones; [55, 3.3 exer.10] for H_1 ∩ int H_2 ≠ ∅

    K^⊥_{H_1 ∩ H_2}(0) = K^⊥_{H_1}(0) + K^⊥_{H_2}(0)                     (2088)

This formula applies more generally to other points in the intersection.
The normal cone to any affine set A at α ∈ A, for example, is the orthogonal complement of A − α. When A = 0, K^⊥_A(0) = A^⊥ is R^n, the ambient space of A.
Projection of any point in the translated normal cone K^⊥_C(a ∈ C) + a on convex set C is identical to a; in other words, point a is that point in C


Figure 176: Rough sketch of normal cone to set C ⊂ R^2 as C wanders toward infinity. A point at which a normal cone is determined, here the origin, need not belong to the set. Normal cone to C_1 is a ray. But as C moves outward, normal cone approaches a halfspace.

closest to any point belonging to the translated normal cone K^⊥_C(a) + a; e.g., Theorem E.4.0.0.1. The normal cone to convex cone K at the origin

    K^⊥_K(0) = −K^∗                                                      (2089)

is the negative dual cone. Any point belonging to −K^∗, projected on K, projects on the origin. More generally, [109, 4.5]

    K^⊥_K(a) = −(K − a)^∗                                                (2090)

    K^⊥_K(a ∈ K) = −K^∗ ∩ a^⊥                                            (2091)

The normal cone to ⋂ C_k at Pb in Figure 168 is ray { ξ(b − Pb) | ξ ≥ 0 } illustrated in Figure 174. Applying Dykstra's algorithm to that example, convergence to the desired result is achieved in two iterations as illustrated in Figure 173. Yet applying Dykstra's algorithm to the example in Figure 167 does not improve rate of convergence, unfortunately, because the given point b and all the alternating projections already belong to the translated normal cone at the vertex of intersection.


E.10.3.3  speculation

Dykstra's algorithm always converges at least as quickly as classical alternating projection, never slower [109], and it succeeds where alternating projection fails. Rate of convergence is wholly dependent on particular geometry of a given problem. From these few examples we surmise that unique minimum-distance projection on blunt (not sharp or acute, informally) full-dimensional polyhedral cones may be found by Dykstra's algorithm in a few iterations. But the total number of alternating projections, constituting those iterations, can never be less than the number of convex sets.




Appendix FNotation and a few definitionsbb ib i:jb k (i:j)b Tb HAA TA −2TA T 1skinny(italic abcdefghijklmnopqrstuvwxyz)column vector, scalar, or logical conditioni th entry of vector b=[b i , i=1... n] or i th b vector from a set or list{b j , j =1... n} or i th iterate of vector bor b(i:j) , truncated vector comprising i th through j th entry of vector bor b i:j,k , truncated vector comprising i th through j th entry of vector b kvector transposeHermitian (complex conjugate) transpose b ∗Tmatrix, scalar, or logical condition(italic ABCDEFGHIJKLMNOPQRSTUV WXY Z)Matrix transpose. Regarding A as a linear operator, A T is its adjoint.matrix transpose of squared inverse; and vice versa, ( A −2) T=(AT ) −2first of various transpositions of a cubix or quartix A (p.678, p.683)⎡ ⎤a skinny matrix; meaning, more rows than columns:⎣⎦.When there are more equations than unknowns, we say that the systemAx = b is overdetermined. [159,5.3]2001 Jon Dattorro. co&edg version 2010.10.26. All rights reserved.citation: Dattorro, <strong>Convex</strong> <strong>Optimization</strong> & Euclidean Distance Geometry,Mεβoo Publishing USA, 2005, <strong>v2010.10.26</strong>.769


770 APPENDIX F. NOTATION AND A FEW DEFINITIONSfata fat matrix; meaning, more columns than rows:underdetermined[ ].AF(C ∋A)G(K)some set (calligraphic ABCDEFGHIJ KLMN OPQRST UVWX YZ)smallest face (171) that contains element A of set Cgenerators (2.13.4.2.1) of set K ; any collection of points and directionswhose hull constructs KL ν ν level set (555)L ν sublevel set (559)L ν superlevel set (644)A −1A †√√linverse of matrix AMoore-Penrose pseudoinverse of matrix Apositive square rootpositive l th rootA 1/2 and √ A A 1/2 is any matrix √ such that A 1/2 A 1/2 =A .For A ∈ S n + , A ∈ Sn+ is unique and √ A √ A =A . [55,1.2] (A.5.1.3)◦√DA= [√dij ] . (1347) Hadamard positive square root: D = ◦√ D ◦ ◦√ DEuler Fraktur ABCDEFGHIJKLMNOPQRSTUVWXYZL Lagrangian (507)Emember of elliptope E t (1097) parametrized by scalar tE elliptope (1076)EE ijelementary matrixmember of standard orthonormal basis for symmetric (59) or symmetrichollow (75) matrices


771A ij or A(i, j) , ij th entry of matrix A = ⎣or rank-one matrix a i a T j (4.8)⎡1 2 3· · ·· · ·· · ·⎤⎦123A(i , j)A iA(i, :)A(:, j)A i:j,k:le.g.videlicetno.a.i.c.i.l.i.w.r.tA is a function of i and ji th matrix from a set or i th principal submatrix or i th iterate of Ai th row of matrix Aj th column of matrix A [159,1.1.8]or A(i:j, k:l) , submatrix taken from i th through j th row andk th through l th columnexempli gratia, from the Latin meaning for sake of examplefrom the Latin meaning it is permitted to seenumber, from the Latin numeroaffinely independent (2.4.2.3)conically independent (2.10)linearly independentwith respect toa.k.a also known asreimı or jreal partimaginary part√ −1⊂ ⊃ ∩ ∪ standard set theory, subset, superset, intersection, union∈membership, element belongs to, or element is a member of∋ membership, contains as in C ∋ y (C contains element y)


772 APPENDIX F. NOTATION AND A FEW DEFINITIONS∃∴∀&∝∞≡≈≃∼ =such thatthere existsthereforefor all, or over all(ampersand) andproportional toinfinityequivalent todefined equal to, equal by definitionapproximately equal toisomorphic to or withcongruent to or withHadamard quotient as in, for x,y ∈ R n ,xy [x i/y i , i=1... n ]∈ R n◦ Hadamard product of matrices: x ◦ y [x i y i , i=1... n ]∈ R n⊗Kronecker product of matrices (D.1.2.1)⊕ vector sum of sets X = Y ⊕ Z where every element x∈X hasunique expression x = y + z where y ∈ Y and z ∈ Z ; [307, p.19]then summands are algebraic complements. X = Y ⊕ Z ⇒ X = Y + Z .Now assume Y and Z are nontrivial subspaces. X = Y + Z ⇒X = Y ⊕ Z ⇔ Y ∩ Z =0 [308,1.2] [109,5.8]. Each element froma vector sum (+) of subspaces has unique expression (⊕) when a basisfrom each subspace is linearly independent of bases from all the othersubspaces.⊖likewise, the vector difference of sets


773⊞orthogonal vector sum of sets X = Y ⊞ Z where every element x∈Xhas unique orthogonal expression x = y + z where y ∈ Y , z ∈ Z ,and y ⊥ z . [330, p.51] X = Y ⊞ Z ⇒ X = Y + Z . If Z ⊆ Y ⊥ thenX = Y ⊞ Z ⇔ X = Y ⊕ Z . [109,5.8] If Z = Y ⊥ then summands areorthogonal complements.± plus or minus or plus and minus⊥as in A ⊥ B meaning A is orthogonal to B (and vice versa), whereA and B are sets, vectors, or matrices. When A and B arevectors (or matrices under Frobenius’ norm), A ⊥ B ⇔ 〈A,B〉 = 0⇔ ‖A + B‖ 2 = ‖A‖ 2 + ‖B‖ 2\ as in \A means logical not A , or relative complement of set A ;id est, \A = {x /∈A} ; e.g., B\A {x∈ B | x /∈A} ≡ B ∩\A⇒ or ⇐⇔is or ←→t → 0 +sufficient or necessary, implies, or is implied by; e.g.,A is sufficient: A ⇒ B , A is necessary: A ⇐ B ,A ⇒ B ⇔ \A ⇐ \B , A ⇐ B ⇔ \A ⇒ \B ,if A then B , if B then A ,A only if B . B only if A .if and only if (iff) or corresponds with or necessary and sufficientor logical equivalenceas in A is B means A ⇒ B ; conventional usage of English languageimposed by logiciansinsufficient or unnecessary, does not imply, or is not implied by; e.g.,A B ⇔ \A \B . A B ⇔ \A \B .is replaced with; substitution, assignmentgoes to, or approaches, or maps tot goes to 0 from above; meaning, from the positive [199, p.2].... · · · as in 1 · · · 1 and [s 1 · · · s N ] meaning continuation; respectively, onesin a row and a matrix whose columns are s i for i=1... N... as in i=1... N meaning, i is a sequence of successive integersbeginning with 1 and ending with N ; id est, 1... N = 1:N


774 APPENDIX F. NOTATION AND A FEW DEFINITIONS: as in f : R n → R m meaning f is a mapping,or sequence of successive integers specified by bounds as in i:j = i ... j(if j < i then sequence is descending)f : M → Rmeaning f is a mapping from ambient space M to ambient R , notnecessarily denoting either domain or range| as in f(x) | x∈ C means with the condition(s) or such that orevaluated for, or as in {f(x) | x∈ C} means evaluated for each andevery x belonging to set Cg| xpexpression g evaluated at x pA, B as in, for example, A, B ∈ S N means A ∈ S N and B ∈ S N(A, B) open interval between A and B in R ,or variable pair perhaps of disparate dimension[A, B ]closed interval or line segment between A and B in R( ) hierarchal, parenthetical, optional{ } curly braces denote a set or list, e.g., {Xa | a≽0} the set comprisingXa evaluated for each and every a≽0 where membership of a tosome space is implicit, a union〈 〉 angle brackets denote vector inner-product (33) (38)[ ] matrix or vector, or quote insertion, or citation[d ij ] matrix whose ij th entry is d ij[x i ] vector whose i th entry is x ix pparticular value of xx 0 particular instance of x , or initial value of a sequence x ix 1 first entry of vector x , or first element of a set or list {x i }x εextreme point


775x +vector x whose negative entries are replaced with 0 ; x + = 1 (x + |x|)2(535) or clipped vector x or nonnegative part of xx −ˇxx ⋆x ∗x − 1(x − |x|) or nonpositive part of x = x 2 ++ x −known dataoptimal value of variable x . optimal ⇒ feasiblecomplex conjugate or dual variable or extreme direction of dual conef ∗ convex conjugate function f ∗ (s)= sup{〈s , x〉 − f(x) | x∈domf }P C x or PxP k xδ(A)δ 2 (A)δ(A) 2λ i (X)λ(X) iλ(A)σ(A)Σ∑π(γ)ΞΠprojection of point x on set C , P is operator or idempotent matrixprojection of point x on set C k or on range of implicit vector(a.k.a diag(A) ,A.1) vector made from main diagonal of A if A isa matrix; otherwise, diagonal matrix made from vector A≡ δ(δ(A)). For vector or diagonal matrix Λ , δ 2 (Λ) = Λ= δ(A)δ(A) where A is a vectori th entry of vector λ is function of Xi th entry of vector-valued function of Xvector of eigenvalues of matrix A , (1461) typically arranged innonincreasing ordervector of singular values of matrix A (always arranged in nonincreasingorder), or support function in direction Adiagonal matrix of singular values, not necessarily squaresumnonlinear permutation operator (or presorting function) arrangesvector γ into nonincreasing order (7.1.3)permutation matrixdoublet or permutation operator or matrix


776 APPENDIX F. NOTATION AND A FEW DEFINITIONS∏ψ(Z)DDD T (X)D(X) TD −1 (X)D(X) −1D ⋆D ∗productsignum-like step function that returns a scalar for matrix argument(708), it returns a vector for vector argument (1579)symmetric hollow matrix of distance-square,or Euclidean distance matrixEuclidean distance matrix operatoradjoint operatortranspose of D(X)inverse operatorinverse of D(X)optimal value of variable Ddual to variable DD ◦ polar variable D∂√d ijd ijpartial derivative or partial differential or matrix of distance-squaresquared (1387) or boundary of set K as in ∂K (17) (24)(absolute) distance scalardistance-square scalar, EDM entryV geometric centering operator, V(D)= −V DV 1 2(996)V N V N (D)= −V T N DV N (1010)VN ×N symmetric elementary, auxiliary, projector, geometric centeringmatrix, R(V )= N(1 T ) , N(V )= R(1) , V 2 =V (B.4.1)V N N ×N −1 Schoenberg auxiliary matrix, R(V N )= N(1 T ) ,N(VN T )= R(1) (B.4.2)V X V X V T X ≡ V T X T XV (1187)


777X point list ((76) having cardinality N) arranged columnar in R n×N ,or set of generators, or extreme directions, or matrix variableGrkknNinonontooverone-to-oneepidomRfGram matrix X T Xaffine dimensionnumber of conically independent generatorsraw-data domain of Magnetic Resonance Imaging machine as in k-spaceEuclidean (ambient spatial) dimension of list X ∈ R n×N , or integercardinality of list X ∈ R n×N , or integerfunction f in x means x as argument to for x in C means element x is a member of set Cfunction f(x) on A means A is domfor relation ≼ on A means A is set whose elements are subject to ≼or projection of x on A means A is body on which projection is madeor operating on vector identifies argument type to f as “vector”function f(x) maps onto M means f over its domain is a surjectionwith respect to Mfunction f(x) over C means f evaluated at each and every element ofset Cinjective map or unique correspondence between setsfunction epigraphfunction domainfunction rangeR(A) the subspace: range of A , or span basis R(A) ; R(A) ⊥ N(A T )spanbasis R(A)as in spanA = R(A) = {Ax | x∈ R n } when A is a matrixovercomplete columnar basis for range of Aor minimal set constituting generators for vertex-description of R(A)or linearly independent set of vectors spanning R(A)


778 APPENDIX F. NOTATION AND A FEW DEFINITIONSN(A) the subspace: nullspace of A ; N(A) ⊥ R(A T )R nEuclidean n-dimensional real vector space (nonnegative integer n).R 0 = 0, R = R 1 or vector space of unspecified dimension.R m×n[ RmR n ]Euclidean vector space of m by n dimensional real matrices× Cartesian product. R m×n−m R m×(n−m) . K 1 × K 2 =R m × R n = R m+n[K1K 2]C n or C n×nEuclidean complex vector space of respective dimension n and n×nR n + or R n×n+ nonnegative orthant in Euclidean vector space of respective dimensionn and n×nR n −or R n×n−S nnonpositive orthant in Euclidean vector space of respective dimensionn and n×nsubspace of real symmetric n×n matrices; the symmetric matrixsubspace. S = S 1 or symmetric subspace of unspecified dimension.S n⊥ orthogonal complement of S n in R n×n , the antisymmetric matrices (51)S n +int S n +S+(ρ)nEDM N√EDMNPSDconvex cone comprising all (real) symmetric positive semidefinite n×nmatrices, the positive semidefinite coneinterior of convex cone comprising all (real) symmetric positivesemidefinite n×n matrices; id est, positive definite matrices= {X ∈ S n + | rankX ≥ ρ} (260) convex set of all positive semidefiniten×n symmetric matrices whose rank equals or exceeds ρcone of N ×N Euclidean distance matrices in the symmetric hollowsubspacenonconvex cone of N ×N Euclidean absolute distance matrices in thesymmetric hollow subspace (6.3)positive semidefinite


779SDPSVDSNRdBEDMS n 1S n hS n⊥hS n csemidefinite programsingular value decompositionsignal to noise ratiodecibelEuclidean distance matrixsubspace comprising all symmetric n×n matrices having all zeros infirst row and column (2002)subspace comprising all symmetric hollow n×n matrices (0 maindiagonal), the symmetric hollow subspace (66)orthogonal complement of S n h in Sn (67), the set of all diagonal matricessubspace comprising all geometrically centered symmetric n×nmatrices; geometric center subspace S N c = {Y ∈ S N | Y 1=0} (1998)S n⊥c orthogonal complement of S n c in S n (2000)R m×ncsubspace comprising all geometrically centered m×n matricesX ⊥ basis N(X T ) (2.13.9)x ⊥ N(x T ) ; {y ∈ R n | x T y = 0} (2.13.10.1.1)R(P ) ⊥ N(P T ) (fundamental subspace relations (137))N(P ) ⊥ R(P T )R ⊥ = {y ∈ R n | 〈x,y〉=0 ∀x∈ R} (373).Orthogonal complement of R in R n when R is a subspaceK ⊥ normal cone (449)A ⊥ normal cone to affine subset A (3.1.2.1.2)KK ∗conedual cone


780 APPENDIX F. NOTATION AND A FEW DEFINITIONSK ◦ polar cone; K ∗ = −K ◦ , or angular degree as in 360 ◦K M+K MK λK ∗ λδHH −H +∂H∂H∂H −∂H +dmonotone nonnegative conemonotone conespectral conecone of majorizationhalfspacehalfspace described using an outward-normal (106) to the hyperplanepartially bounding ithalfspace described using an inward-normal (107) to the hyperplanepartially bounding ithyperplane; id est, partial boundary of halfspacesupporting hyperplanea supporting hyperplane having outward-normal with respect to set itsupportsa supporting hyperplane having inward-normal with respect to set itsupportsvector of distance-squared ijlower bound on distance-square d ijd ijABABCupper bound on distance-square d ijclosed line segment between points A and Bmatrix multiplication of A and Bclosure of set Cdecomposition orthonormal (1913) page 711, biorthogonal (1889) page 704expansion orthogonal (1923) page 713, biorthogonal (403) page 191


781vectorentrycubixquartixfeasible setsolution setoptimalfeasiblesameequivalentobjectiveprogramnatural ordercolumn vector in R nscalar element or real variable constituting a vector or matrixmember of R M×N×Lmember of R M×N×L×Kmost simply, the set of all variable values satisfying all constraints ofan optimization problemmost simply, the set of all optimal solutions to an optimization problem;a subset of the feasible set and not necessarily a single pointas in optimal solution, means solution to an optimization problem.optimal ⇒ feasibleas in feasible solution, means satisfies the (“subject to”) constraints ofan optimization problem, may or may not be optimalas in same problem, means optimal solution set for one problem isidentical to optimal solution set of another (without transformation)as in equivalent problem, means optimal solution to one problem can bederived from optimal solution to another via suitable transformationThe three objectives of <strong>Optimization</strong> are minimize, maximize, and findSemidefinite program is any convex minimization, maximization, orfeasibility problem constraining a variable to a subset of a positivesemidefinite cone.Prototypical semidefinite program conventionally means: a semidefiniteprogram having linear objective, affine equality constraints, but noinequality constraints except for cone membership. (4.1.1)Linear program is any feasibility problem, or minimization ormaximization of a linear objective, constraining the variable to anypolyhedron. (2.13.1.0.3)with reference to stacking columns in a vectorization means a vectormade from superposing column 1 on top of column 2 then superposingthe result on column 3 and so on; as in a vector made from entries of themain diagonal δ(A) means taken from left to right and top to bottom


782 APPENDIX F. NOTATION AND A FEW DEFINITIONSpartial orderoperatortightg ′g ′′→Ydg→Ydg 2∇∇ 2relation ≼ is a partial order, on a set, if it possesses reflexivity,antisymmetry, and transitivity (2.7.2.2)mapping to a vector space (a multidimensional function)with reference to a bound means a bound that can be met,with reference to an inequality means equality is achievablefirst derivative of possibly multidimensional function with respect toreal argumentsecond derivative with respect to real argumentfirst directional derivative of possibly multidimensional function g indirection Y ∈R K×L (maintains dimensions of g)second directional derivative of g in direction Ygradient from calculus, ∇f is shorthand for ∇ x f(x). ∇f(y) means∇ y f(y) or gradient ∇ x f(y) of f(x) with respect to x evaluated at ysecond-order gradient∆ distance scalar (Figure 25), or first-order difference matrix (851),or infinitesimal difference operator (D.1.4)△ ijkIII∅triangle made by vertices i , j , and kRoman numeral oneidentity matrix; I =δ 2 (I), δ(I)=1. Variant: I m I ∈ S mindex set, a discrete set of indicesempty set, an implicit member of every set0 real zero0 origin or vector or matrix of zerosOsort-index matrix


783Oorder of magnitude information required, or computational intensity:O(N) is first order, O(N 2 ) is second, and so on1 real one1 vector of ones. Variant: 1 m 1 ∈ R me i vector whose i th entry is 1 (otherwise 0),i th member of the standard basis for R n (60)maxmaximizexargsup Xarg supf(x)subject tominminimizexfindxinf Xmaximum [199,0.1.1] or largest element of a totally ordered setfind maximum of objective function w.r.t independent variables x .Subscript x ← x∈ C may hold implicit constraints if context clear; e.g.,semidefinitenessargument of operator or function, or variable of optimizationsupremum of totally ordered set X , least upper bound, may or maynot belong to set [199,0.1.1]argument x at supremum of function f ; not necessarily unique or amember of function domainspecifies constraints of an optimization problem; generally, inequalitiesand affine equalities. Subject to implies: anything not an independentvariable is constant, an assignment, or substitutionminimum [199,0.1.1] or smallest element of a totally ordered setfind objective function minimum w.r.t independent variables x .Subscript x ← x∈ C may hold implicit constraints if context clear; e.g.,semidefinitenessfind any feasible solution, specified by the (“subject to”) constraints,w.r.t independent variables x . Subscript x ← x∈ C may hold implicitconstraints if context clear; e.g., semidefiniteness. “Find” is the thirdobjective of <strong>Optimization</strong>infimum of totally ordered set X , greatest lower bound, may or maynot belong to set [199,0.1.1]


784 APPENDIX F. NOTATION AND A FEW DEFINITIONSarg inf f(x)argument x at infimum of function f ; not necessarily unique or amember of function domainiff if and only if , necessary and sufficient; also the meaningindiscriminately attached to appearance of the word “if ” in thestatement of a mathematical definition, [128, p.106] [258, p.4] anesoteric practice worthy of abolition because of ambiguity thusconferredrelintlimsgnroundmodtrrankrelativeinteriorlimitsignum function or signround to nearest integermodulus functionmatrix traceas in rankA, rank of matrix A ; dim R(A)dim dimension, dim R n = n , dim(x∈ R n ) = n , dim R(x∈ R n ) = 1,dim R(A∈ R m×n )= rank(A)affdim affaffine hullaffine dimensioncard cardinality, number of nonzero entries cardx ‖x‖ 0or N is cardinality of list X ∈ R n×N (p.323)convconecenvcontentconvex hull (2.3.2)conic hull (2.3.3)convex envelope (7.2.2.1)of high-dimensional bounded polyhedron, volume in 3 dimensions, areain 2, and so on


785cofmatrix of cofactors corresponding to matrix argumentdist distance between point or set arguments; e.g., dist(x, B)vec columnar vectorization of m ×n matrix, Euclidean dimension mn (37)svec columnar vectorization of symmetric n ×n matrix, Euclideandimension n(n + 1)/2 (56)dveccolumnar vectorization of symmetric hollow n ×n matrix, Euclideandimension n(n − 1)/2 (73)(x,y) angle between vectors x and y , or dihedral angle between affinesubsets≽generalized inequality; e.g., A ≽ 0 means: vector or matrix A must beexpressible in a biorthogonal expansion having nonnegative coordinateswith respect to extreme directions of some implicit pointed closedconvex cone K (2.13.2.0.1,2.13.7.1.1), or comparison to the originwith respect to some implicit pointed closed convex cone (2.7.2.2),or (when K= S n +) matrix A belongs to the positive semidefinite coneof symmetric matrices (2.9.0.1), or (when K= R n +) vector A belongsto the nonnegative orthant (each vector entry is nonnegative,2.3.1.1)≽Kas in x ≽Kz means x − z ∈ K (182)≻⊁≥strict generalized inequality, membership to cone interiornot positive definitescalar inequality, greater than or equal to; comparison of scalars,or entrywise comparison of vectors or matrices with respect to R +nonnegative for α∈ R n , α ≽ 0> greater thanpositive for α∈ R n , α ≻ 0 ; id est, no zero entries when with respect tononnegative orthant, no vector on boundary with respect to pointedclosed convex cone K


786 APPENDIX F. NOTATION AND A FEW DEFINITIONS⌊ ⌋ floor function, ⌊x⌋ is greatest integer not exceeding x| | entrywise absolute value of scalars, vectors, and matriceslogdetnatural (or Napierian) logarithmmatrix determinant‖x‖ vector 2-norm or Euclidean norm ‖x‖ 2√n∑‖x‖ l= l |x j | l vector l-norm for l ≥ 1 (convex)j=1n∑|x j | l vector l-norm for 0 ≤ l < 1 (violates3.2 no.3)j=1‖x‖ ∞ = max{|x j | ∀j} infinity-norm‖x‖ 2 2 = x T x = 〈x , x〉 Euclidean norm square‖x‖ 1 = 1 T |x| 1-norm, dual infinity-norm‖x‖ 0 = 1 T |x| 0 (0 0 0) 0-norm (4.5.1), cardinality of vector x (cardx)‖x‖nk‖X‖ 2k-largest norm (3.2.2.1)= sup ‖Xa‖ 2 = σ 1 = √ λ(X T X) 1 (639) matrix 2-norm (spectral norm),‖a‖=1largest singular value, ‖Xa‖ 2 ≤ ‖X‖ 2 ‖a‖ 2 [159, p.56], ‖δ(x)‖ 2 = ‖x‖ ∞‖X‖ ∗ 2 = 1 T σ(X) nuclear norm, dual spectral norm‖X‖ = ‖X‖ F Frobenius’ matrix norm


Bibliography[1] Edwin A. Abbott. Flatland: A Romance of Many Dimensions. Seely & Co., London,sixth edition, 1884.[2] Suliman Al-Homidan and Henry Wolkowicz. Approximate and exact completionproblems for Euclidean distance matrices using semidefinite programming. LinearAlgebra and its Applications, 406:109–141, September 2005.orion.math.uwaterloo.ca/~hwolkowi/henry/reports/edmapr04.ps[3] Faiz A. Al-Khayyal and James E. Falk. Jointly constrained biconvex programming.Mathematics of Operations Research, 8(2):273–286, May 1983.http://www.convexoptimization.com/TOOLS/Falk.pdf[4] Abdo Y. Alfakih. On the uniqueness of Euclidean distance matrix completions.Linear Algebra and its Applications, 370:1–14, 2003.[5] Abdo Y. Alfakih. On the uniqueness of Euclidean distance matrix completions: thecase of points in general position. Linear Algebra and its Applications, 397:265–277,2005.[6] Abdo Y. Alfakih, Amir Khandani, and Henry Wolkowicz. Solving Euclideandistance matrix completion problems via semidefinite programming. Computational<strong>Optimization</strong> and Applications, 12(1):13–30, January 1999.http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.8307[7] Abdo Y. Alfakih and Henry Wolkowicz. On the embeddability of weighted graphsin Euclidean spaces. Research Report CORR 98-12, Department of Combinatoricsand <strong>Optimization</strong>, University of Waterloo, May 1998.http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.1160Erratum: p.442 herein.[8] Abdo Y. Alfakih and Henry Wolkowicz. Matrix completion problems. In HenryWolkowicz, Romesh Saigal, and Lieven Vandenberghe, editors, Handbook ofSemidefinite Programming: Theory, Algorithms, and Applications, chapter 18.Kluwer, 2000.http://www.convexoptimization.com/TOOLS/Handbook.pdf[9] Abdo Y. Alfakih and Henry Wolkowicz. Two theorems on Euclidean distancematrices and Gale transform. Linear Algebra and its Applications, 340:149–154,2002.787


[10] Farid Alizadeh. Combinatorial Optimization with Interior Point Methods and Semi-Definite Matrices. PhD thesis, University of Minnesota, Computer Science Department, Minneapolis Minnesota USA, October 1991.

[11] Farid Alizadeh. Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM Journal on Optimization, 5(1):13–51, February 1995.

[12] Kurt Anstreicher and Henry Wolkowicz. On Lagrangian relaxation of quadratic matrix constraints. SIAM Journal on Matrix Analysis and Applications, 22(1):41–55, 2000.

[13] Howard Anton. Elementary Linear Algebra. Wiley, second edition, 1977.

[14] James Aspnes, David Goldenberg, and Yang Richard Yang. On the computational complexity of sensor network localization. In Proceedings of the First International Workshop on Algorithmic Aspects of Wireless Sensor Networks (ALGOSENSORS), volume 3121 of Lecture Notes in Computer Science, pages 32–44, Turku Finland, July 2004. Springer-Verlag.
cs-www.cs.yale.edu/homes/aspnes/localization-abstract.html

[15] D. Avis and K. Fukuda. A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra. Discrete and Computational Geometry, 8:295–313, 1992.

[16] Christine Bachoc and Frank Vallentin. New upper bounds for kissing numbers from semidefinite programming. Journal of the American Mathematical Society, 21(3):909–924, July 2008.
http://arxiv.org/abs/math/0608426

[17] Mihály Bakonyi and Charles R. Johnson. The Euclidean distance matrix completion problem. SIAM Journal on Matrix Analysis and Applications, 16(2):646–654, April 1995.

[18] Keith Ball. An elementary introduction to modern convex geometry. In Silvio Levy, editor, Flavors of Geometry, volume 31, chapter 1, pages 1–58. MSRI Publications, 1997.
www.msri.org/publications/books/Book31/files/ball.pdf

[19] Richard G. Baraniuk. Compressive sensing [lecture notes]. IEEE Signal Processing Magazine, 24(4):118–121, July 2007.
http://www.convexoptimization.com/TOOLS/GeometryCardinality.pdf

[20] George Phillip Barker. Theory of cones. Linear Algebra and its Applications, 39:263–291, 1981.

[21] George Phillip Barker and David Carlson. Cones of diagonally dominant matrices. Pacific Journal of Mathematics, 57(1):15–32, 1975.

[22] George Phillip Barker and James Foran. Self-dual cones in Euclidean spaces. Linear Algebra and its Applications, 13:147–155, 1976.


[23] Dror Baron, Michael B. Wakin, Marco F. Duarte, Shriram Sarvotham, and Richard G. Baraniuk. Distributed compressed sensing. Technical Report ECE-0612, Rice University, Electrical and Computer Engineering Department, December 2006.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.4789

[24] Alexander I. Barvinok. Problems of distance geometry and convex properties of quadratic maps. Discrete & Computational Geometry, 13(2):189–202, 1995.

[25] Alexander I. Barvinok. A remark on the rank of positive semidefinite matrices subject to affine constraints. Discrete & Computational Geometry, 25(1):23–31, 2001.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.4627

[26] Alexander I. Barvinok. A Course in Convexity. American Mathematical Society, 2002.

[27] Alexander I. Barvinok. Approximating orthogonal matrices by permutation matrices. Pure and Applied Mathematics Quarterly, 2:943–961, 2006.

[28] Heinz H. Bauschke and Jonathan M. Borwein. On projection algorithms for solving convex feasibility problems. SIAM Review, 38(3):367–426, September 1996.

[29] Steven R. Bell. The Cauchy Transform, Potential Theory, and Conformal Mapping. CRC Press, 1992.

[30] Jean Bellissard and Bruno Iochum. Homogeneous and facially homogeneous self-dual cones. Linear Algebra and its Applications, 19:1–16, 1978.

[31] Richard Bellman and Ky Fan. On systems of linear inequalities in Hermitian matrix variables. In Victor L. Klee, editor, Convexity, volume VII of Proceedings of Symposia in Pure Mathematics, pages 1–11. American Mathematical Society, 1963.

[32] Adi Ben-Israel. Linear equations and inequalities on finite dimensional, real or complex, vector spaces: A unified theory. Journal of Mathematical Analysis and Applications, 27:367–389, 1969.

[33] Adi Ben-Israel. Motzkin's transposition theorem, and the related theorems of Farkas, Gordan and Stiemke. In Michiel Hazewinkel, editor, Encyclopaedia of Mathematics. Springer-Verlag, 2001.
http://www.convexoptimization.com/TOOLS/MOTZKIN.pdf

[34] Aharon Ben-Tal and Arkadi Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM, 2001.

[35] Aharon Ben-Tal and Arkadi Nemirovski. Non-Euclidean restricted memory level method for large-scale convex optimization. Mathematical Programming, 102(3):407–456, January 2005.
http://www2.isye.gatech.edu/~nemirovs/Bundle-Mirror rev fin.pdf

[36] John J. Benedetto and Paulo J.S.G. Ferreira, editors. Modern Sampling Theory: Mathematics and Applications. Birkhäuser, 2001.


[37] Christian R. Berger, Javier Areta, Krishna Pattipati, and Peter Willett. Compressed sensing − A look beyond linear programming. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 3857–3860, 2008.

[38] Radu Berinde, Anna C. Gilbert, Piotr Indyk, Howard J. Karloff, and Martin J. Strauss. Combining geometry and combinatorics: A unified approach to sparse signal recovery. In 46th Annual Allerton Conference on Communication, Control, and Computing, pages 798–805. IEEE, September 2008.
http://arxiv.org/abs/0804.4666

[39] Abraham Berman. Cones, Matrices, and Mathematical Programming, volume 79 of Lecture Notes in Economics and Mathematical Systems. Springer-Verlag, 1973.

[40] Abraham Berman and Naomi Shaked-Monderer. Completely Positive Matrices. World Scientific, 2003.

[41] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, second edition, 1999.

[42] Dimitri P. Bertsekas, Angelia Nedić, and Asuman E. Ozdaglar. Convex Analysis and Optimization. Athena Scientific, 2003.

[43] Rajendra Bhatia. Matrix Analysis. Springer-Verlag, 1997.

[44] Pratik Biswas, Tzu-Chen Liang, Kim-Chuan Toh, Yinyu Ye, and Ta-Chung Wang. Semidefinite programming approaches for sensor network localization with noisy distance measurements. IEEE Transactions on Automation Science and Engineering, 3(4):360–371, October 2006.

[45] Pratik Biswas, Tzu-Chen Liang, Ta-Chung Wang, and Yinyu Ye. Semidefinite programming based algorithms for sensor network localization. ACM Transactions on Sensor Networks, 2(2):188–220, May 2006.
http://www.stanford.edu/~yyye/combined rev3.pdf

[46] Pratik Biswas, Kim-Chuan Toh, and Yinyu Ye. A distributed SDP approach for large-scale noisy anchor-free graph realization with applications to molecular conformation. SIAM Journal on Scientific Computing, 30(3):1251–1277, March 2008.
http://www.stanford.edu/~yyye/SISC-molecule-revised-2.pdf

[47] Pratik Biswas and Yinyu Ye. Semidefinite programming for ad hoc wireless sensor network localization. In Proceedings of the Third International Symposium on Information Processing in Sensor Networks (IPSN), pages 46–54. IEEE, April 2004.
http://www.stanford.edu/~yyye/adhocn4.pdf

[48] Åke Björck and Tommy Elfving. Accelerated projection methods for computing pseudoinverse solutions of systems of linear equations. BIT Numerical Mathematics, 19(2):145–163, June 1979.
http://www.convexoptimization.com/TOOLS/BIT.pdf


[49] Leonard M. Blumenthal. On the four-point property. Bulletin of the American Mathematical Society, 39:423–426, 1933.

[50] Leonard M. Blumenthal. Theory and Applications of Distance Geometry. Oxford University Press, 1953.

[51] Radu Ioan Boţ, Ernö Robert Csetnek, and Gert Wanka. Regularity conditions via quasi-relative interior in convex programming. SIAM Journal on Optimization, 19(1):217–233, 2008.
http://www.convexoptimization.com/TOOLS/Wanka.pdf

[52] A. W. Bojanczyk and A. Lutoborski. The Procrustes problem for orthogonal Stiefel matrices. SIAM Journal on Scientific Computing, 21(4):1291–1304, December 1999.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.15.9063

[53] Ingwer Borg and Patrick Groenen. Modern Multidimensional Scaling. Springer-Verlag, 1997.

[54] Jonathan M. Borwein and Heinz Bauschke. Projection algorithms and monotone operators, 1998.
http://oldweb.cecm.sfu.ca/personal/jborwein/projections4.pdf

[55] Jonathan M. Borwein and Adrian S. Lewis. Convex Analysis and Nonlinear Optimization: Theory and Examples. Springer-Verlag, 2000.
http://www.convexoptimization.com/TOOLS/Borwein.pdf

[56] Jonathan M. Borwein and Warren B. Moors. Stability of closedness of convex cones under linear mappings. Journal of Convex Analysis, 16(3):699–705, 2009.
http://www.convexoptimization.com/TOOLS/BorweinMoors.pdf

[57] Richard Bouldin. The pseudo-inverse of a product. SIAM Journal on Applied Mathematics, 24(4):489–495, June 1973.
http://www.convexoptimization.com/TOOLS/Pseudoinverse.pdf

[58] Stephen Boyd and Jon Dattorro. Alternating projections, 2003.
http://www.stanford.edu/class/ee392o/alt proj.pdf

[59] Stephen Boyd, Laurent El Ghaoui, Eric Feron, and Venkataramanan Balakrishnan. Linear Matrix Inequalities in System and Control Theory. SIAM, 1994.

[60] Stephen Boyd, Seung-Jean Kim, Lieven Vandenberghe, and Arash Hassibi. A tutorial on geometric programming. Optimization and Engineering, 8(1):1389–4420, March 2007.
http://www.stanford.edu/~boyd/papers/gp tutorial.html

[61] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
http://www.stanford.edu/~boyd/cvxbook

[62] James P. Boyle and Richard L. Dykstra. A method for finding projections onto the intersection of convex sets in Hilbert spaces. In R. Dykstra, T. Robertson, and F. T. Wright, editors, Advances in Order Restricted Statistical Inference, pages 28–47. Springer-Verlag, 1986.


[63] Lev M. Brègman. The method of successive projection for finding a common point of convex sets. Soviet Mathematics, 162(3):487–490, 1965. AMS translation of Doklady Akademii Nauk SSSR, 6:688-692.

[64] Lev M. Brègman, Yair Censor, Simeon Reich, and Yael Zepkowitz-Malachi. Finding the projection of a point onto the intersection of convex sets via projections onto halfspaces. Journal of Approximation Theory, 124(2):194–218, October 2003.
http://www.optimization-online.org/DB FILE/2003/06/669.pdf

[65] Mike Brookes. Matrix reference manual: Matrix calculus, 2002.
http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html

[66] Richard A. Brualdi. Combinatorial Matrix Classes. Cambridge University Press, 2006.

[67] Jian-Feng Cai, Emmanuel J. Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion, October 2008.
http://arxiv.org/abs/0810.3286

[68] Emmanuel J. Candès. Compressive sampling. In Proceedings of the International Congress of Mathematicians, volume III, pages 1433–1452, Madrid, August 2006. European Mathematical Society.
http://www.acm.caltech.edu/~emmanuel/papers/CompressiveSampling.pdf

[69] Emmanuel J. Candès and Justin K. Romberg. l1-magic: Recovery of sparse signals via convex programming, October 2005.
http://www.acm.caltech.edu/l1magic/downloads/l1magic.pdf

[70] Emmanuel J. Candès, Justin K. Romberg, and Terence Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, February 2006.
http://arxiv.org/abs/math.NA/0409186 (2004 DRAFT)
http://www.acm.caltech.edu/~emmanuel/papers/ExactRecovery.pdf

[71] Emmanuel J. Candès, Michael B. Wakin, and Stephen P. Boyd. Enhancing sparsity by reweighted l1 minimization. Journal of Fourier Analysis and Applications, 14(5-6):877–905, October 2008.
http://arxiv.org/abs/0711.1612v1

[72] Michael Carter, Holly Hui Jin, Michael Saunders, and Yinyu Ye. spaseloc: An adaptive subproblem algorithm for scalable wireless sensor network localization. SIAM Journal on Optimization, 17(4):1102–1128, December 2006.
http://www.convexoptimization.com/TOOLS/JinSIAM.pdf
Erratum: p.315 herein.

[73] Lawrence Cayton and Sanjoy Dasgupta. Robust Euclidean embedding. In Proceedings of the 23rd International Conference on Machine Learning (ICML), Pittsburgh Pennsylvania USA, 2006.
http://cseweb.ucsd.edu/~lcayton/robEmb.pdf


[74] Yves Chabrillac and Jean-Pierre Crouzeix. Definiteness and semidefiniteness of quadratic forms revisited. Linear Algebra and its Applications, 63:283–292, 1984.

[75] Manmohan K. Chandraker, Sameer Agarwal, Fredrik Kahl, David Nistér, and David J. Kriegman. Autocalibration via rank-constrained estimation of the absolute quadric. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2007.
http://vision.ucsd.edu/~manu/cvpr07 quadric.pdf

[76] Rick Chartrand. Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Processing Letters, 14(10):707–710, October 2007.
math.lanl.gov/Research/Publications/Docs/chartrand-2007-exact.pdf

[77] Chi-Tsong Chen. Linear System Theory and Design. Oxford University Press, 1999.

[78] Ward Cheney and Allen A. Goldstein. Proximity maps for convex sets. Proceedings of the American Mathematical Society, 10:448–450, 1959. Erratum: p.743 herein.

[79] Mung Chiang. Geometric Programming for Communication Systems. Now, 2005.

[80] Stéphane Chrétien. An alternating l1 approach to the compressed sensing problem, September 2009.
http://arxiv.org/PS cache/arxiv/pdf/0809/0809.0660v3.pdf

[81] Steven Chu. Autobiography from Les Prix Nobel, 1997.
nobelprize.org/physics/laureates/1997/chu-autobio.html

[82] Ruel V. Churchill and James Ward Brown. Complex Variables and Applications. McGraw-Hill, fifth edition, 1990.

[83] Jon F. Claerbout and Francis Muir. Robust modeling of erratic data. Geophysics, 38:826–844, 1973.

[84] J. H. Conway and N. J. A. Sloane. Sphere Packings, Lattices and Groups. Springer-Verlag, third edition, 1999.

[85] John B. Conway. A Course in Functional Analysis. Springer-Verlag, second edition, 1990.

[86] Jose A. Costa, Neal Patwari, and Alfred O. Hero III. Distributed weighted-multidimensional scaling for node localization in sensor networks. ACM Transactions on Sensor Networks, 2(1):39–64, February 2006.
http://www.convexoptimization.com/TOOLS/Costa.pdf

[87] Richard W. Cottle, Jong-Shi Pang, and Richard E. Stone. The Linear Complementarity Problem. Academic Press, 1992.

[88] G. M. Crippen and T. F. Havel. Distance Geometry and Molecular Conformation. Wiley, 1988.

[89] Frank Critchley. Multidimensional scaling: a critical examination and some new proposals. PhD thesis, University of Oxford, Nuffield College, 1980.


[90] Frank Critchley. On certain linear mappings between inner-product and squared-distance matrices. Linear Algebra and its Applications, 105:91–107, 1988.
http://www.convexoptimization.com/TOOLS/Critchley.pdf

[91] Ronald E. Crochiere and Lawrence R. Rabiner. Multirate Digital Signal Processing. Prentice-Hall, 1983.

[92] Lawrence B. Crowell. Quantum Fluctuations of Spacetime. World Scientific, 2005.

[93] Joachim Dahl, Bernard H. Fleury, and Lieven Vandenberghe. Approximate maximum-likelihood estimation using semidefinite programming. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume VI, pages 721–724, April 2003.
http://www.convexoptimization.com/TOOLS/Dahl.pdf

[94] George B. Dantzig. Linear Programming and Extensions. Princeton University Press, 1963 (Rand Corporation).

[95] Alexandre d'Aspremont, Laurent El Ghaoui, Michael I. Jordan, and Gert R. G. Lanckriet. A direct formulation for sparse PCA using semidefinite programming. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems 17, pages 41–48. MIT Press, Cambridge Massachusetts USA, 2005.
http://books.nips.cc/papers/files/nips17/NIPS2004 0645.pdf

[96] Jon Dattorro. Constrained least squares fit of a filter bank to an arbitrary magnitude frequency response, 1991.
http://ccrma.stanford.edu/~dattorro/Hearing.htm
http://ccrma.stanford.edu/~dattorro/PhiLS.pdf

[97] Jon Dattorro. Effect Design, part 1: Reverberator and other filters. Journal of the Audio Engineering Society, 45(9):660–684, September 1997.
http://www.convexoptimization.com/TOOLS/EffectDesignPart1.pdf

[98] Jon Dattorro. Effect Design, part 2: Delay-line modulation and chorus. Journal of the Audio Engineering Society, 45(10):764–788, October 1997.
http://www.convexoptimization.com/TOOLS/EffectDesignPart2.pdf

[99] Jon Dattorro. Effect Design, part 3: Oscillators: Sinusoidal and pseudonoise. Journal of the Audio Engineering Society, 50(3):115–146, March 2002.
http://www.convexoptimization.com/TOOLS/EffectDesignPart3.pdf

[100] Jon Dattorro. Equality relating Euclidean distance cone to positive semidefinite cone. Linear Algebra and its Applications, 428(11+12):2597–2600, June 2008.
http://www.convexoptimization.com/TOOLS/DattorroLAA.pdf

[101] Joel Dawson, Stephen Boyd, Mar Hershenson, and Thomas Lee. Optimal allocation of local feedback in multistage amplifiers via geometric programming. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 48(1):1–11, January 2001.
http://www.stanford.edu/~boyd/papers/fdbk alloc.html


[102] Etienne de Klerk. Aspects of Semidefinite Programming: Interior Point Algorithms and Selected Applications. Kluwer Academic Publishers, 2002.

[103] Jan de Leeuw. Fitting distances by least squares. UCLA Statistics Series Technical Report No. 130, Interdivisional Program in Statistics, UCLA, Los Angeles California USA, 1993.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.7472

[104] Jan de Leeuw. Multidimensional scaling. In International Encyclopedia of the Social & Behavioral Sciences, pages 13512–13519. Elsevier, 2004.
http://preprints.stat.ucla.edu/274/274.pdf

[105] Jan de Leeuw. Unidimensional scaling. In Brian S. Everitt and David C. Howell, editors, Encyclopedia of Statistics in Behavioral Science, volume 4, pages 2095–2097. Wiley, 2005.
http://www.convexoptimization.com/TOOLS/deLeeuw2005.pdf

[106] Jan de Leeuw and Willem Heiser. Theory of multidimensional scaling. In P. R. Krishnaiah and L. N. Kanal, editors, Handbook of Statistics, volume 2, chapter 13, pages 285–316. North-Holland Publishing, Amsterdam, 1982.

[107] Erik D. Demaine, Francisco Gomez-Martin, Henk Meijer, David Rappaport, Perouz Taslakian, Godfried T. Toussaint, Terry Winograd, and David R. Wood. The distance geometry of music. Computational Geometry: Theory and Applications, 42(5):429–454, July 2009.
http://www.convexoptimization.com/TOOLS/dgm.pdf

[108] John R. D'Errico. DERIVEST, 2007.
http://www.convexoptimization.com/TOOLS/DERIVEST.pdf

[109] Frank Deutsch. Best Approximation in Inner Product Spaces. Springer-Verlag, 2001.

[110] Frank Deutsch and Hein Hundal. The rate of convergence of Dykstra's cyclic projections algorithm: The polyhedral case. Numerical Functional Analysis and Optimization, 15:537–565, 1994.

[111] Frank Deutsch and Peter H. Maserick. Applications of the Hahn-Banach theorem in approximation theory. SIAM Review, 9(3):516–530, July 1967.

[112] Frank Deutsch, John H. McCabe, and George M. Phillips. Some algorithms for computing best approximations from convex cones. SIAM Journal on Numerical Analysis, 12(3):390–403, June 1975.

[113] Michel Marie Deza and Monique Laurent. Geometry of Cuts and Metrics. Springer-Verlag, 1997.

[114] Carolyn Pillers Dobler. A matrix approach to finding a set of generators and finding the polar (dual) of a class of polyhedral cones. SIAM Journal on Matrix Analysis and Applications, 15(3):796–803, July 1994. Erratum: p.212 herein.

[115] Elizabeth D. Dolan, Robert Fourer, Jorge J. Moré, and Todd S. Munson. Optimization on the NEOS server. SIAM News, 35(6):4,8,9, August 2002.


[116] Bruce Randall Donald. 3-D structure in chemistry and molecular biology, 1998.
http://www.cs.duke.edu/brd/Teaching/Previous/Bio

[117] David L. Donoho. Neighborly polytopes and sparse solution of underdetermined linear equations. Technical Report 2005-04, Stanford University, Department of Statistics, January 2005.
www-stat.stanford.edu/~donoho/Reports/2005/NPaSSULE-01-28-05.pdf

[118] David L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, April 2006.
www-stat.stanford.edu/~donoho/Reports/2004/CompressedSensing091604.pdf

[119] David L. Donoho. High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension. Discrete & Computational Geometry, 35(4):617–652, May 2006.

[120] David L. Donoho, Michael Elad, and Vladimir Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Transactions on Information Theory, 52(1):6–18, January 2006.
www-stat.stanford.edu/~donoho/Reports/2004/StableSparse-Donoho-etal.pdf

[121] David L. Donoho and Philip B. Stark. Uncertainty principles and signal recovery. SIAM Journal on Applied Mathematics, 49(3):906–931, June 1989.

[122] David L. Donoho and Jared Tanner. Neighborliness of randomly projected simplices in high dimensions. Proceedings of the National Academy of Sciences, 102(27):9452–9457, July 2005.

[123] David L. Donoho and Jared Tanner. Sparse nonnegative solution of underdetermined linear equations by linear programming. Proceedings of the National Academy of Sciences, 102(27):9446–9451, July 2005.

[124] David L. Donoho and Jared Tanner. Counting faces of randomly projected polytopes when the projection radically lowers dimension. Journal of the American Mathematical Society, 22(1):1–53, January 2009.

[125] Miguel Nuno Ferreira Fialho dos Anjos. New Convex Relaxations for the Maximum Cut and VLSI Layout Problems. PhD thesis, University of Waterloo, Ontario Canada, Department of Combinatorics and Optimization, 2001.
http://etd.uwaterloo.ca/etd/manjos2001.pdf

[126] John Duchi, Shai Shalev-Shwartz, Yoram Singer, and Tushar Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning (ICML), pages 272–279, Helsinki Finland, July 2008. Association for Computing Machinery (ACM).
http://icml2008.cs.helsinki.fi/papers/361.pdf

[127] Richard L. Dykstra. An algorithm for restricted least squares regression. Journal of the American Statistical Association, 78(384):837–842, 1983.

[128] Peter J. Eccles. An Introduction to Mathematical Reasoning: numbers, sets and functions. Cambridge University Press, 1997.


[129] Carl Eckart and Gale Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, September 1936.
www.convexoptimization.com/TOOLS/eckart&young.1936.pdf

[130] Alan Edelman, Tomás A. Arias, and Steven T. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.

[131] John J. Edgell. Graphics calculator applications on 4-D constructs, 1996.
http://archives.math.utk.edu/ICTCM/EP-9/C47/pdf/paper.pdf

[132] Ivar Ekeland and Roger Témam. Convex Analysis and Variational Problems. SIAM, 1999.

[133] Julius Farkas. Theorie der einfachen Ungleichungen. Journal für die reine und angewandte Mathematik, 124:1–27, 1902.
http://gdz.sub.uni-goettingen.de/no cache/dms/load/img/?IDDOC=261361

[134] Maryam Fazel. Matrix Rank Minimization with Applications. PhD thesis, Stanford University, Department of Electrical Engineering, March 2002.
http://www.cds.caltech.edu/~maryam/thesis-final.pdf

[135] Maryam Fazel, Haitham Hindi, and Stephen P. Boyd. A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the American Control Conference, volume 6, pages 4734–4739. American Automatic Control Council (AACC), June 2001.
http://www.cds.caltech.edu/~maryam/nucnorm.html

[136] Maryam Fazel, Haitham Hindi, and Stephen P. Boyd. Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. In Proceedings of the American Control Conference, volume 3, pages 2156–2162. American Automatic Control Council (AACC), June 2003.
http://www.cds.caltech.edu/~maryam/acc03 final.pdf

[137] Maryam Fazel, Haitham Hindi, and Stephen P. Boyd. Rank minimization and applications in system theory. In Proceedings of the American Control Conference, volume 4, pages 3273–3278. American Automatic Control Council (AACC), June 2004.
http://www.cds.caltech.edu/~maryam/acc04-tutorial.pdf

[138] J. T. Feddema, R. H. Byrne, J. J. Harrington, D. M. Kilman, C. L. Lewis, R. D. Robinett, B. P. Van Leeuwen, and J. G. Young. Advanced mobile networking, sensing, and controls. Sandia Report SAND2005-1661, Sandia National Laboratories, Albuquerque New Mexico USA, March 2005.
www.prod.sandia.gov/cgi-bin/techlib/access-control.pl/2005/051661.pdf

[139] César Fernández, Thomas Szyperski, Thierry Bruyère, Paul Ramage, Egon Mösinger, and Kurt Wüthrich. NMR solution structure of the pathogenesis-related protein P14a. Journal of Molecular Biology, 266:576–593, 1997.


798 BIBLIOGRAPHY[140] Richard Phillips Feynman, Robert B. Leighton, and Matthew L. Sands. TheFeynman Lectures on Physics: Commemorative Issue, volume I. Addison-Wesley,1989.[141] Anthony V. Fiacco and Garth P. McCormick. Nonlinear Programming: SequentialUnconstrained Minimization Techniques. SIAM, 1990.[142] Önder Filiz and Aylin Yener. Rank constrained temporal-spatial filters for CDMAsystems with base station antenna arrays. In Proceedings of the Johns HopkinsUniversity Conference on Information Sciences and Systems, March 2003.labs.ee.psu.edu/labs/wcan/Publications/filiz-yener ciss03.pdf[143] P. A. Fillmore and J. P. Williams. Some convexity theorems for matrices. GlasgowMathematical Journal, 12:110–117, 1971.[144] Paul Finsler. Über das Vorkommen definiter und semidefiniter Formen in Scharenquadratischer Formen. Commentarii Mathematici Helvetici, 9:188–192, 1937.[145] Anders Forsgren, Philip E. Gill, and Margaret H. Wright. Interior methods fornonlinear optimization. SIAM Review, 44(4):525–597, 2002.[146] Shmuel Friedland and Anatoli Torokhti. Generalized rank-constrained matrixapproximations. SIAM Journal on Matrix Analysis and Applications, 29(2):656–659,March 2007.http://www2.math.uic.edu/~friedlan/frtor8.11.06.pdf[147] Ragnar Frisch. The multiplex method for linear programming. SANKHYĀ : TheIndian Journal of Statistics, 18(3/4):329–362, September 1957.http://www.convexoptimization.com/TOOLS/Frisch57.pdf[148] Norbert Gaffke and Rudolf Mathar. A cyclic projection algorithm via duality.Metrika, 36:29–54, 1989.[149] Jérôme Galtier. Semi-definite programming as a simple extension to linearprogramming: convex optimization with queueing, equity and other telecomfunctionals. In 3ème Rencontres Francophones sur les Aspects Algorithmiques desTélécommunications (AlgoTel), pages 21–28. INRIA, Saint Jean de Luz FRANCE,May 2001.http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.7378[150] Laurent El Ghaoui. EE 227A: <strong>Convex</strong> <strong>Optimization</strong> and Applications, Lecture 11− October 3. University of California, Berkeley, Fall 2006. Scribe: Nikhil Shetty.http://www.convexoptimization.com/TOOLS/Ghaoui.pdf[151] Laurent El Ghaoui and Silviu-Iulian Niculescu, editors. Advances in Linear MatrixInequality Methods in Control. SIAM, 2000.[152] Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical <strong>Optimization</strong>.Academic Press, 1981.[153] Philip E. Gill, Walter Murray, and Margaret H. Wright. Numerical Linear Algebraand <strong>Optimization</strong>, volume 1. Addison-Wesley, 1991.


BIBLIOGRAPHY 799[154] James Gleik. Isaac Newton. Pantheon Books, 2003.[155] W. Glunt, Tom L. Hayden, S. Hong, and J. Wells. An alternating projectionalgorithm for computing the nearest Euclidean distance matrix. SIAM Journalon Matrix Analysis and Applications, 11(4):589–600, 1990.[156] K. Goebel and W. A. Kirk. Topics in Metric Fixed Point Theory. CambridgeUniversity Press, 1990.[157] Michel X. Goemans and David P. Williamson. Improved approximation algorithmsfor maximum cut and satisfiability problems using semidefinite programming.Journal of the Association for Computing Machinery, 42(6):1115–1145, November1995.http://www.convexoptimization.com/TOOLS/maxcut-jacm.pdf[158] D. Goldfarb and K. Scheinberg. Interior point trajectories in semidefiniteprogramming. SIAM Journal on <strong>Optimization</strong>, 8(4):871–886, 1998.[159] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins,third edition, 1996.[160] P. Gordan. Ueber die auflösung linearer gleichungen mit reellen coefficienten.Mathematische Annalen, 6:23–28, 1873.[161] Irina F. Gorodnitsky and Bhaskar D. Rao. Sparse signal reconstruction from limiteddata using FOCUSS: A re-weighted minimum norm algorithm. IEEE Transactionson Signal Processing, 45(3):600–616, March 1997.http://www.convexoptimization.com/TOOLS/focuss.pdf[162] John Clifford Gower. Euclidean distance geometry. The Mathematical Scientist,7:1–14, 1982.http://www.convexoptimization.com/TOOLS/Gower2.pdf[163] John Clifford Gower. Properties of Euclidean and non-Euclidean distance matrices.Linear Algebra and its Applications, 67:81–97, 1985.http://www.convexoptimization.com/TOOLS/Gower1.pdf[164] John Clifford Gower and Garmt B. Dijksterhuis. Procrustes Problems. OxfordUniversity Press, 2004.[165] John Clifford Gower and David J. Hand. Biplots. Chapman & Hall, 1996.[166] Alexander Graham. Kronecker Products and Matrix Calculus with Applications.Ellis Horwood Limited, 1981.[167] Michael Grant, Stephen Boyd, and Yinyu Ye.cvx: Matlab software for disciplined convex programming, 2007.http://www.stanford.edu/~boyd/cvx[168] Robert M. Gray. Toeplitz and circulant matrices: A review. Foundations andTrends in Communications and Information Theory, 2(3):155–239, 2006.http://www-ee.stanford.edu/~gray/toeplitz.pdf


800 BIBLIOGRAPHY[169] T. N. E. Greville. Note on the generalized inverse of a matrix product. SIAMReview, 8:518–521, 1966.[170] Rémi Gribonval and Morten Nielsen. Sparse representations in unions of bases.IEEE Transactions on Information Theory, 49(12):3320–3325, December 2003.http://people.math.aau.dk/~mnielsen/reprints/sparse unions.pdf[171] Rémi Gribonval and Morten Nielsen. Highly sparse representations from dictionariesare unique and independent of the sparseness measure. Applied and ComputationalHarmonic Analysis, 22(3):335–355, May 2007.http://www.convexoptimization.com/TOOLS/R-2003-16.pdf[172] Karolos M. Grigoriadis and Eric B. Beran. Alternating projection algorithms forlinear matrix inequalities problems with rank constraints. In Laurent El Ghaouiand Silviu-Iulian Niculescu, editors, Advances in Linear Matrix Inequality Methodsin Control, chapter 13, pages 251–267. SIAM, 2000.[173] Peter Gritzmann and Victor Klee. On the complexity of some basic problemsin computational convexity: II. volume and mixed volumes. Technical ReportTR:94-31, DIMACS, Rutgers University, 1994.ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/1994/94-31.ps[174] Peter Gritzmann and Victor Klee. On the complexity of some basic problemsin computational convexity: II. volume and mixed volumes. In T. Bisztriczky,P. McMullen, R. Schneider, and A. Ivić Weiss, editors, Polytopes: Abstract, <strong>Convex</strong>and Computational, pages 373–466. Kluwer Academic Publishers, 1994.[175] L. G. Gubin, B. T. Polyak, and E. V. Raik. The method of projections forfinding the common point of convex sets. U.S.S.R. Computational Mathematicsand Mathematical Physics, 7(6):1–24, 1967.[176] Osman Güler and Yinyu Ye. Convergence behavior of interior-point algorithms.Mathematical Programming, 60(2):215–228, 1993.[177] P. R. Halmos. Positive approximants of operators. Indiana University MathematicsJournal, 21:951–960, 1972.[178] Shih-Ping Han. A successive projection method. Mathematical Programming,40:1–14, 1988.[179] Godfrey H. Hardy, John E. Littlewood, and George Pólya. Inequalities. CambridgeUniversity Press, second edition, 1952.[180] Arash Hassibi and Mar Hershenson. Automated optimal design of switched-capacitorfilters. In Proceedings of the Conference on Design, Automation, and Test in Europe,page 1111, March 2002.[181] Johan Håstad. Some optimal inapproximability results. In Proceedings of theTwenty-Ninth Annual ACM Symposium on Theory of Computing (STOC), pages1–10, El Paso Texas USA, 1997. Association for Computing Machinery (ACM).http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.183 , 2002.


BIBLIOGRAPHY 801[182] Johan Håstad. Some optimal inapproximability results. Journal of the Associationfor Computing Machinery, 48(4):798–859, July 2001.[183] Timothy F. Havel and Kurt Wüthrich. An evaluation of the combined use ofnuclear magnetic resonance and distance geometry for the determination of proteinconformations in solution. Journal of Molecular Biology, 182:281–294, 1985.[184] Tom L. Hayden and Jim Wells. Approximation by matrices positive semidefinite ona subspace. Linear Algebra and its Applications, 109:115–130, 1988.[185] Tom L. Hayden, Jim Wells, Wei-Min Liu, and Pablo Tarazaga. The cone of distancematrices. Linear Algebra and its Applications, 144:153–169, 1991.[186] Uwe Helmke and John B. Moore. <strong>Optimization</strong> and Dynamical Systems.Springer-Verlag, 1994.[187] Bruce Hendrickson. Conditions for unique graph realizations. SIAM Journal onComputing, 21(1):65–84, February 1992.[188] T. Herrmann, Peter Güntert, and Kurt Wüthrich. Protein NMR structuredetermination with automated NOE assignment using the new software CANDIDand the torsion angle dynamics algorithm DYANA. Journal of Molecular Biology,319(1):209–227, May 2002.[189] Mar Hershenson. Design of pipeline analog-to-digital converters via geometricprogramming. In Proceedings of the IEEE/ACM International Conference onComputer Aided Design (ICCAD), pages 317–324, November 2002.[190] Mar Hershenson. Efficient description of the design space of analog circuits. InProceedings of the 40 th ACM/IEEE Design Automation Conference, pages 970–973,June 2003.[191] Mar Hershenson, Stephen Boyd, and Thomas Lee. Optimal design of a CMOSOpAmp via geometric programming. IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems, 20(1):1–21, January 2001.http://www.stanford.edu/~boyd/papers/opamp.html[192] Mar Hershenson, Dave Colleran, Arash Hassibi, and Navraj Nandra. Synthesizablefull custom mixed-signal IP. Electronics Design Automation Consortium (EDA),2002.http://www.eda.org/edps/edp02/PAPERS/edp02-s6 2.pdf[193] Mar Hershenson, Sunderarajan S. Mohan, Stephen Boyd, and Thomas Lee.<strong>Optimization</strong> of inductor circuits via geometric programming. In Proceedings of the36 th ACM/IEEE Design Automation Conference, pages 994–998, June 1999.http://www.stanford.edu/~boyd/papers/inductor opt.html[194] Nick Higham. Matrix Procrustes problems, 1995.http://www.ma.man.ac.uk/~higham/talksLecture notes.[195] Richard D. Hill and Steven R. Waters. On the cone of positive semidefinite matrices.Linear Algebra and its Applications, 90:81–88, 1987.


802 BIBLIOGRAPHY[196] Jean-Baptiste Hiriart-Urruty. Ensembles de Tchebychev vs. ensembles convexes:l’état de la situation vu via l’analyse convexe non lisse. Annales des SciencesMathématiques du Québec, 22(1):47–62, 1998.[197] Jean-Baptiste Hiriart-Urruty. Global optimality conditions in maximizing aconvex quadratic function under convex quadratic constraints. Journal of Global<strong>Optimization</strong>, 21(4):445–455, December 2001.[198] Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. <strong>Convex</strong> Analysisand Minimization Algorithms II: Advanced Theory and Bundle Methods.Springer-Verlag, second edition, 1996.[199] Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Fundamentals of <strong>Convex</strong>Analysis. Springer-Verlag, 2001.[200] Alan J. Hoffman and Helmut W. Wielandt. The variation of the spectrum of anormal matrix. Duke Mathematical Journal, 20:37–40, 1953.[201] Alfred Horn. Doubly stochastic matrices and the diagonal of a rotation matrix.American Journal of Mathematics, 76(3):620–630, July 1954.http://www.convexoptimization.com/TOOLS/AHorn.pdf[202] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge UniversityPress, 1987.[203] Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. CambridgeUniversity Press, 1994.[204] Alston S. Householder. The Theory of Matrices in Numerical Analysis. Dover, 1975.[205] Hong-Xuan Huang, Zhi-An Liang, and Panos M. Pardalos. Some properties for theEuclidean distance matrix and positive semidefinite matrix completion problems.Journal of Global <strong>Optimization</strong>, 25(1):3–21, January 2003.http://www.convexoptimization.com/TOOLS/pardalos.pdf[206] Lawrence Hubert, Jacqueline Meulman, and Willem Heiser. Two purposes formatrix factorization: A historical appraisal. SIAM Review, 42(1):68–82, 2000.http://www.convexoptimization.com/TOOLS/hubert.pdf[207] Xiaoming Huo. Sparse Image Representation via Combined Transforms. PhDthesis, Stanford University, Department of Statistics, August 1999.www.convexoptimization.com/TOOLS/ReweightingFrom1999XiaomingHuo.pdf[208] Robert J. Marks II, editor. Advanced Topics in Shannon Sampling and InterpolationTheory. Springer-Verlag, 1993.[209] 5W Infographic. Wireless 911. Technology Review, 107(5):78–79, June 2004.http://www.technologyreview.com[210] George Isac. Complementarity Problems. Springer-Verlag, 1992.[211] Nathan Jacobson. Lectures in Abstract Algebra, vol. II - Linear Algebra.Van Nostrand, 1953.


BIBLIOGRAPHY 803[212] Viren Jain and Lawrence K. Saul. Exploratory analysis and visualization of speechand music by locally linear embedding. In Proceedings of the IEEE InternationalConference on Acoustics, Speech, and Signal Processing, volume 3, pages 984–987,May 2004.http://www.cs.ucsd.edu/~saul/papers/lle icassp04.pdf[213] Joakim Jaldén. Bi-criterion l 1 /l 2 -norm optimization. Master’s thesis, RoyalInstitute of Technology (KTH), Department of Signals Sensors and Systems,Stockholm Sweden, September 2002.www.ee.kth.se/php/modules/publications/reports/2002/IR-SB-EX-0221.pdf[214] Joakim Jaldén, Cristoff Martin, and Björn Ottersten. Semidefinite programmingfor detection in linear systems − Optimality conditions and space-time decoding.In Proceedings of the IEEE International Conference on Acoustics, Speech, andSignal Processing, volume IV, pages 9–12, April 2003.http://www.s3.kth.se/signal/reports/03/IR-S3-SB-0309.pdf[215] Florian Jarre. <strong>Convex</strong> analysis on symmetric matrices. In Henry Wolkowicz,Romesh Saigal, and Lieven Vandenberghe, editors, Handbook of SemidefiniteProgramming: Theory, Algorithms, and Applications, chapter 2. Kluwer, 2000.http://www.convexoptimization.com/TOOLS/Handbook.pdf[216] Holly Hui Jin. Scalable Sensor Localization Algorithms for Wireless SensorNetworks. PhD thesis, University of Toronto, Graduate Department of Mechanicaland Industrial Engineering, 2005.www.stanford.edu/group/SOL/dissertations/holly-thesis.pdf[217] Charles R. Johnson and Pablo Tarazaga. Connections between the real positivesemidefinite and distance matrix completion problems. Linear Algebra and itsApplications, 223/224:375–391, 1995.[218] Charles R. Johnson and Pablo Tarazaga. Binary representation of normalizedsymmetric and correlation matrices. Linear and Multilinear Algebra, 52(5):359–366,2004.[219] George B. Thomas, Jr. Calculus and Analytic Geometry.Addison-Wesley, fourth edition, 1972.[220] Mark Kahrs and Karlheinz Brandenburg, editors. Applications of Digital SignalProcessing to Audio and Acoustics. Kluwer Academic Publishers, 1998.[221] Thomas Kailath. Linear Systems. Prentice-Hall, 1980.[222] Tosio Kato. Perturbation Theory for Linear Operators.Springer-Verlag, 1966.[223] Paul J. Kelly and Norman E. Ladd. Geometry. Scott, Foresman and Company,1965.[224] Sunyoung Kim, Masakazu Kojima, Hayato Waki, and Makoto Yamashita. sfsdp:a Sparse version of Full SemiDefinite Programming relaxation for sensor networklocalization problems. Research Report B-457, Department of Mathematical and


804 BIBLIOGRAPHYComputing Sciences, Tokyo Institute of Technology, Japan, July 2009.http://math.ewha.ac.kr/~skim/Research/B-457.pdf[225] Yoonsoo Kim and Mehran Mesbahi. On the rank minimization problem. InProceedings of the American Control Conference, volume 3, pages 2015–2020.American Automatic Control Council (AACC), June 2004.[226] Ron Kimmel. Numerical Geometry of Images: Theory, Algorithms, andApplications. Springer-Verlag, 2003.[227] Erwin Kreyszig. Introductory Functional Analysis with Applications. Wiley, 1989.[228] Anthony Kuh, Chaopin Zhu, and Danilo Mandic. Sensor network localization usingleast squares kernel regression. In 10 th International Conference of Knowledge-BasedIntelligent Information and Engineering Systems (KES), volume 4253(III) of LectureNotes in Computer Science, pages 1280–1287, Bournemouth UK, October 2006.Springer-Verlag.http://www.convexoptimization.com/TOOLS/Kuh.pdf[229] Harold W. Kuhn. Nonlinear programming: a historical view. In Richard W. Cottleand Carlton E. Lemke, editors, Nonlinear Programming, pages 1–26. AmericanMathematical Society, 1976.[230] Takahito Kuno, Yasutoshi Yajima, and Hiroshi Konno. An outer approximationmethod for minimizing the product of several convex functions on a convex set.Journal of Global <strong>Optimization</strong>, 3(3):325–335, September 1993.[231] Amy N. Langville and Carl D. Meyer. Google’s PageRank and Beyond: The Scienceof Search Engine Rankings. Princeton University Press, 2006.[232] Jean B. Lasserre. A new Farkas lemma for positive semidefinite matrices. IEEETransactions on Automatic Control, 40(6):1131–1133, June 1995.[233] Jean B. Lasserre and Eduardo S. Zeron. A Laplace transform algorithm for thevolume of a convex polytope. Journal of the Association for Computing Machinery,48(6):1126–1140, November 2001.http://arxiv.org/abs/math/0106168 (title revision).[234] Monique Laurent. A connection between positive semidefinite and Euclideandistance matrix completion problems. Linear Algebra and its Applications, 273:9–22,1998.[235] Monique Laurent. A tour d’horizon on positive semidefinite and Euclideandistance matrix completion problems. In Panos M. Pardalos and Henry Wolkowicz,editors, Topics in Semidefinite and Interior-Point Methods, pages 51–76. AmericanMathematical Society, 1998.[236] Monique Laurent. Matrix completion problems. In Christodoulos A. Floudas andPanos M. Pardalos, editors, Encyclopedia of <strong>Optimization</strong>, volume III (Interior-M),pages 221–229. Kluwer, 2001.http://homepages.cwi.nl/~monique/files/opt.ps


BIBLIOGRAPHY 805[237] Monique Laurent and Svatopluk Poljak. On a positive semidefinite relaxation of thecut polytope. Linear Algebra and its Applications, 223/224:439–461, 1995.[238] Monique Laurent and Svatopluk Poljak. On the facial structure of the setof correlation matrices. SIAM Journal on Matrix Analysis and Applications,17(3):530–547, July 1996.[239] Monique Laurent and Franz Rendl. Semidefinite programming and integerprogramming. <strong>Optimization</strong> Online, 2002.http://www.optimization-online.org/DB HTML/2002/12/585.html[240] Monique Laurent and Franz Rendl. Semidefinite programming and integerprogramming. In K. Aardal, George L. Nemhauser, and R. Weismantel, editors,Discrete <strong>Optimization</strong>, volume 12 of Handbooks in Operations Research andManagement Science, chapter 8, pages 393–514. Elsevier, 2005.[241] Charles L. Lawson and Richard J. Hanson. Solving Least Squares Problems. SIAM,1995.[242] Jung Rye Lee. The law of cosines in a tetrahedron. Journal of the Korea Societyof Mathematical Education Series B: The Pure and Applied Mathematics, 4(1):1–6,1997.[243] Claude Lemaréchal. Note on an extension of “Davidon” methods to nondifferentiablefunctions. Mathematical Programming, 7(1):384–387, December 1974.http://www.convexoptimization.com/TOOLS/Lemarechal.pdf[244] Vladimir L. Levin. Quasi-convex functions and quasi-monotone operators. Journalof <strong>Convex</strong> Analysis, 2(1/2):167–172, 1995.[245] Scott Nathan Levine. Audio Representations for Data Compression and CompressedDomain Processing. PhD thesis, Stanford University, Department of ElectricalEngineering, 1999.http://www-ccrma.stanford.edu/~scottl/thesis/thesis.pdf[246] Adrian S. Lewis. Eigenvalue-constrained faces. Linear Algebra and its Applications,269:159–181, 1998.[247] Anhua Lin. Projection algorithms in nonlinear programming. PhD thesis, JohnsHopkins University, 2003.[248] Miguel Sousa Lobo, Lieven Vandenberghe, Stephen Boyd, and Hervé Lebret.Applications of second-order cone programming. Linear Algebra and itsApplications, 284:193–228, November 1998. Special Issue on Linear Algebrain Control, Signals and Image Processing.http://www.stanford.edu/~boyd/socp.html[249] Lee Lorch and Donald J. Newman. On the composition of completely monotonicfunctions and completely monotonic sequences and related questions. Journal ofthe London Mathematical Society (second series), 28:31–45, 1983.http://www.convexoptimization.com/TOOLS/Lorch.pdf


806 BIBLIOGRAPHY[250] David G. Luenberger. <strong>Optimization</strong> by Vector Space Methods. Wiley, 1969.[251] David G. Luenberger. Introduction to Dynamic Systems: Theory, Models, &Applications. Wiley, 1979.[252] David G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, secondedition, 1989.[253] Zhi-Quan Luo, Jos F. Sturm, and Shuzhong Zhang. Superlinear convergence ofa symmetric primal-dual path following algorithm for semidefinite programming.SIAM Journal on <strong>Optimization</strong>, 8(1):59–81, 1998.[254] Zhi-Quan Luo and Wei Yu. An introduction to convex optimization forcommunications and signal processing. IEEE Journal On Selected Areas InCommunications, 24(8):1426–1438, August 2006.[255] Morris Marden. Geometry of Polynomials. American Mathematical Society, secondedition, 1985.[256] K. V. Mardia. Some properties of classical multi-dimensional scaling.Communications in Statistics: Theory and Methods, A7(13):1233–1241, 1978.[257] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press,1979.[258] Jerrold E. Marsden and Michael J. Hoffman. Elementary Classical Analysis.Freeman, second edition, 1995.[259] Rudolf Mathar. The best Euclidean fit to a given distance matrix in prescribeddimensions. Linear Algebra and its Applications, 67:1–6, 1985.[260] Rudolf Mathar. Multidimensionale Skalierung. B. G. Teubner Stuttgart, 1997.[261] Nathan S. Mendelsohn and A. Lloyd Dulmage. The convex hull of sub-permutationmatrices. Proceedings of the American Mathematical Society, 9(2):253–254, April1958.http://www.convexoptimization.com/TOOLS/permu.pdf[262] Mehran Mesbahi and G. P. Papavassilopoulos. On the rank minimization problemover a positive semi-definite linear matrix inequality. IEEE Transactions onAutomatic Control, 42(2):239–243, February 1997.[263] Mehran Mesbahi and G. P. Papavassilopoulos. Solving a class of rank minimizationproblems via semi-definite programs, with applications to the fixed order outputfeedback synthesis. In Proceedings of the American Control Conference, volume 1,pages 77–80. American Automatic Control Council (AACC), June 1997.http://www.convexoptimization.com/TOOLS/Mesbahi.pdf[264] Sunderarajan S. Mohan, Mar Hershenson, Stephen Boyd, and Thomas Lee. Simpleaccurate expressions for planar spiral inductances. IEEE Journal of Solid-StateCircuits, 34(10):1419–1424, October 1999.http://www.stanford.edu/~boyd/papers/inductance expressions.html


BIBLIOGRAPHY 807[265] Sunderarajan S. Mohan, Mar Hershenson, Stephen Boyd, and Thomas Lee.Bandwidth extension in CMOS with optimized on-chip inductors. IEEE Journal ofSolid-State Circuits, 35(3):346–355, March 2000.http://www.stanford.edu/~boyd/papers/bandwidth ext.html[266] David Moore, John Leonard, Daniela Rus, and Seth Teller. Robust distributednetwork localization with noisy range measurements. In Proceedings of the SecondInternational Conference on Embedded Networked Sensor Systems (SenSys´04),pages 50–61, Baltimore Maryland USA, November 2004. Association for ComputingMachinery (ACM, Winner of the Best Paper Award).http://rvsn.csail.mit.edu/netloc/sensys04.pdf[267] E. H. Moore. On the reciprocal of the general algebraic matrix. Bulletin of theAmerican Mathematical Society, 26:394–395, 1920. Abstract.[268] B. S. Mordukhovich. Maximum principle in the problem of time optimal responsewith nonsmooth constraints. Journal of Applied Mathematics and Mechanics,40:960–969, 1976.[269] Jean-Jacques Moreau. Décomposition orthogonale d’un espace Hilbertien selon deuxcônes mutuellement polaires. Comptes Rendus de l’Académie des Sciences, Paris,255:238–240, 1962.[270] T. S. Motzkin and I. J. Schoenberg. The relaxation method for linear inequalities.Canadian Journal of Mathematics, 6:393–404, 1954.[271] Neil Muller, Lourenço Magaia, and B. M. Herbst. Singular value decomposition,eigenfaces, and 3D reconstructions. SIAM Review, 46(3):518–545, September 2004.[272] Katta G. Murty and Feng-Tien Yu. Linear Complementarity, Linear and NonlinearProgramming. Heldermann Verlag, Internet edition, 1988.ioe.engin.umich.edu/people/fac/books/murty/linear complementarity webbook[273] Oleg R. Musin. An extension of Delsarte’s method. The kissing problem in three andfour dimensions. In Proceedings of the 2004 COE Workshop on Sphere Packings,Kyushu University Japan, 2005.http://arxiv.org/abs/math.MG/0512649[274] Stephen G. Nash and Ariela Sofer. Linear and Nonlinear Programming.McGraw-Hill, 1996.[275] John Lawrence Nazareth. Differentiable <strong>Optimization</strong> and Equation Solving: ATreatise on Algorithmic Science and the Karmarkar Revolution. Springer-Verlag,2003.[276] Arkadi Nemirovski. Lectures on Modern <strong>Convex</strong> <strong>Optimization</strong>.http://www.convexoptimization.com/TOOLS/LectModConvOpt.pdf , 2005.[277] Yurii Nesterov and Arkadii Nemirovskii. Interior-Point Polynomial Algorithms in<strong>Convex</strong> Programming. SIAM, 1994.


808 BIBLIOGRAPHY[278] Ewa Niewiadomska-Szynkiewicz and Micha̷l Marks. <strong>Optimization</strong> schemes forwireless sensor network localization. International Journal of Applied Mathematicsand Computer Science, 19(2):291–302, 2009.http://www.convexoptimization.com/TOOLS/Marks.pdf[279] L. Nirenberg. Functional Analysis. New York University, New York, 1961. Lecturesgiven in 1960-1961, notes by Lesley Sibner.[280] Jorge Nocedal and Stephen J. Wright. Numerical <strong>Optimization</strong>. Springer-Verlag,1999.[281] Royal Swedish Academy of Sciences. Nobel prize in chemistry, 2002.http://nobelprize.org/chemistry/laureates/2002/public.html[282] C. S. Ogilvy. Excursions in Geometry. Dover, 1990. Citation: Proceedings of theCUPM Geometry Conference, Mathematical Association of America, No.16 (1967),p.21.[283] Ricardo Oliveira, João Costeira, and João Xavier. Optimal point correspondencethrough the use of rank constraints. In Proceedings of the IEEE ComputerSociety Conference on Computer Vision and Pattern Recognition, volume 2, pages1016–1021, June 2005.[284] Alan V. Oppenheim and Ronald W. Schafer. Discrete-Time Signal Processing.Prentice-Hall, 1989.[285] Robert Orsi, Uwe Helmke, and John B. Moore. A Newton-like method for solvingrank constrained linear matrix inequalities. Automatica, 42(11):1875–1882, 2006.users.rsise.anu.edu.au/~robert/publications/OHM06 extended version.pdf[286] Brad Osgood. Notes on the Ahlfors mapping of a multiply connected domain, 2000.www-ee.stanford.edu/~osgood/Ahlfors-Bergman-Szego.pdf[287] M. L. Overton and R. S. Womersley. On the sum of the largest eigenvalues of asymmetric matrix. SIAM Journal on Matrix Analysis and Applications, 13:41–45,1992.[288] Pythagoras Papadimitriou. Parallel Solution of SVD-Related Problems, WithApplications. PhD thesis, Department of Mathematics, University of Manchester,October 1993.[289] Panos M. Pardalos and Henry Wolkowicz, editors. Topics in Semidefinite andInterior-Point Methods. American Mathematical Society, 1998.[290] Bradford W. Parkinson, James J. Spilker Jr., Penina Axelrad, and Per Enge, editors.Global Positioning System: Theory and Applications, Volume I. American Instituteof Aeronautics and Astronautics, 1996.[291] Gábor Pataki. Cone-LP’s and semidefinite programs: Geometry and a simplex-typemethod. In William H. Cunningham, S. Thomas McCormick, and MauriceQueyranne, editors, Proceedings of the 5 th International Conference on IntegerProgramming and Combinatorial <strong>Optimization</strong> (IPCO), volume 1084 of Lecture


BIBLIOGRAPHY 809Notes in Computer Science, pages 162–174, Vancouver, British Columbia, Canada,June 1996. Springer-Verlag.[292] Gábor Pataki. On the rank of extreme matrices in semidefinite programs andthe multiplicity of optimal eigenvalues. Mathematics of Operations Research,23(2):339–358, 1998.http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.49.3831Erratum: p.660 herein.[293] Gábor Pataki. The geometry of semidefinite programming. In Henry Wolkowicz,Romesh Saigal, and Lieven Vandenberghe, editors, Handbook of SemidefiniteProgramming: Theory, Algorithms, and Applications, chapter 3. Kluwer, 2000.http://www.convexoptimization.com/TOOLS/Handbook.pdf[294] Teemu Pennanen and Jonathan Eckstein. Generalized Jacobians of vector-valuedconvex functions. Technical Report RRR 6-97, RUTCOR, Rutgers University, May1997.http://rutcor.rutgers.edu/pub/rrr/reports97/06.ps[295] Roger Penrose. A generalized inverse for matrices. In Proceedings of the CambridgePhilosophical Society, volume 51, pages 406–413, 1955.[296] Chris Perkins. A convergence analysis of Dykstra’s algorithm for polyhedral sets.SIAM Journal on Numerical Analysis, 40(2):792–804, 2002.[297] Sam Perlis. Theory of Matrices. Addison-Wesley, 1958.[298] Florian Pfender and Günter M. Ziegler. Kissing numbers, sphere packings, and someunexpected proofs. Notices of the American Mathematical Society, 51(8):873–883,September 2004.http://www.convexoptimization.com/TOOLS/Pfender.pdf[299] Benjamin Recht. <strong>Convex</strong> Modeling with Priors. PhD thesis, Massachusetts Instituteof Technology, Media Arts and Sciences Department, 2006.http://www.convexoptimization.com/TOOLS/06.06.Recht.pdf[300] Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-ranksolutions of linear matrix equations via nuclear norm minimization. <strong>Optimization</strong>Online, June 2007.http://faculty.washington.edu/mfazel/low-rank-v2.pdf[301] Alex Reznik. Problem 3.41, from Sergio Verdú. Multiuser Detection. CambridgeUniversity Press, 1998.www.ee.princeton.edu/~verdu/mud/solutions/3/3.41.areznik.pdf , 2001.[302] Arthur Wayne Roberts and Dale E. Varberg. <strong>Convex</strong> Functions. Academic Press,1973.[303] Sara Robinson. Hooked on meshing, researcher creates award-winning triangulationprogram. SIAM News, 36(9), November 2003.http://www.siam.org/news/news.php?id=370


810 BIBLIOGRAPHY[304] R. Tyrrell Rockafellar. Conjugate Duality and <strong>Optimization</strong>. SIAM, 1974.[305] R. Tyrrell Rockafellar. Lagrange multipliers in optimization. SIAM-AmericanMathematical Society Proceedings, 9:145–168, 1976.[306] R. Tyrrell Rockafellar. Lagrange multipliers and optimality. SIAM Review,35(2):183–238, June 1993.[307] R. Tyrrell Rockafellar. <strong>Convex</strong> Analysis. Princeton University Press, 1997. Firstpublished in 1970.[308] C. K. Rushforth. Signal restoration, functional analysis, and Fredholm integralequations of the first kind. In Henry Stark, editor, Image Recovery: Theory andApplication, chapter 1, pages 1–27. Academic Press, 1987.[309] Peter Santos. Application platforms and synthesizable analog IP. In Proceedings ofthe Embedded Systems Conference, San Francisco California USA, 2003.http://www.convexoptimization.com/TOOLS/Santos.pdf[310] Shankar Sastry. Nonlinear Systems: Analysis, Stability, and Control.Springer-Verlag, 1999.[311] Uwe Schäfer. A linear complementarity problem with a P-matrix. SIAM Review,46(2):189–201, June 2004.[312] Isaac J. Schoenberg. Remarks to Maurice Fréchet’s article “Sur ladéfinition axiomatique d’une classe d’espace distanciés vectoriellement applicablesur l’espace de Hilbert”. Annals of Mathematics, 36(3):724–732, July 1935.http://www.convexoptimization.com/TOOLS/Schoenberg2.pdf[313] Isaac J. Schoenberg. Metric spaces and positive definite functions. Transactions ofthe American Mathematical Society, 44:522–536, 1938.http://www.convexoptimization.com/TOOLS/Schoenberg3.pdf[314] Peter H. Schönemann. A generalized solution of the orthogonal Procrustes problem.Psychometrika, 31(1):1–10, 1966.[315] Peter H. Schönemann, Tim Dorcey, and K. Kienapple. Subadditive concatenationin dissimilarity judgements. Perception and Psychophysics, 38:1–17, 1985.[316] Alexander Schrijver. On the history of combinatorial optimization (till 1960). InK. Aardal, G. L. Nemhauser, and R. Weismantel, editors, Handbook of Discrete<strong>Optimization</strong>, pages 1–68. Elsevier, 2005.http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.3362[317] Seymour Sherman. Doubly stochastic matrices and complex vector spaces.American Journal of Mathematics, 77(2):245–246, April 1955.http://www.convexoptimization.com/TOOLS/Sherman.pdf[318] Joshua A. Singer. Log-Penalized Linear Regression. PhD thesis, Stanford University,Department of Electrical Engineering, June 2004.http://www.convexoptimization.com/TOOLS/Josh.pdf


[319] Anthony Man-Cho So and Yinyu Ye. A semidefinite programming approach to tensegrity theory and realizability of graphs. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 766–775, Miami Florida USA, January 2006. Association for Computing Machinery (ACM).
http://www.convexoptimization.com/TOOLS/YeSo.pdf

[320] Anthony Man-Cho So and Yinyu Ye. Theory of semidefinite programming for sensor network localization. Mathematical Programming: Series A and B, 109(2):367–384, January 2007.
http://www.stanford.edu/~yyye/local-theory.pdf

[321] D. C. Sorensen. Newton’s method with a model trust region modification. SIAM Journal on Numerical Analysis, 19(2):409–426, April 1982.
http://www.convexoptimization.com/TOOLS/soren.pdf

[322] Steve Spain. Polaris Wireless corp., 2005.
http://www.polariswireless.com
Personal communication.

[323] Nathan Srebro, Jason D. M. Rennie, and Tommi S. Jaakkola. Maximum-margin matrix factorization. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems 17 (NIPS 2004), pages 1329–1336. MIT Press, 2005.

[324] Wolfram Stadler, editor. Multicriteria Optimization in Engineering and in the Sciences. Springer-Verlag, 1988.

[325] Henry Stark, editor. Image Recovery: Theory and Application. Academic Press, 1987.

[326] Henry Stark. Polar, spiral, and generalized sampling and interpolation. In Robert J. Marks II, editor, Advanced Topics in Shannon Sampling and Interpolation Theory, chapter 6, pages 185–218. Springer-Verlag, 1993.

[327] Willi-Hans Steeb. Matrix Calculus and Kronecker Product with Applications and C++ Programs. World Scientific, 1997.

[328] Gilbert W. Stewart and Ji-guang Sun. Matrix Perturbation Theory. Academic Press, 1990.

[329] Josef Stoer. On the characterization of least upper bound norms in matrix space. Numerische Mathematik, 6(1):302–314, December 1964.
http://www.convexoptimization.com/TOOLS/Stoer.pdf

[330] Josef Stoer and Christoph Witzgall. Convexity and Optimization in Finite Dimensions I. Springer-Verlag, 1970.

[331] Gilbert Strang. Linear Algebra and its Applications. Harcourt Brace, third edition, 1988.

[332] Gilbert Strang. Calculus. Wellesley-Cambridge Press, 1992.


[333] Gilbert Strang. Introduction to Linear Algebra. Wellesley-Cambridge Press, second edition, 1998.

[334] Gilbert Strang. Course 18.06: Linear algebra, 2004.
http://web.mit.edu/18.06/www

[335] Stefan Straszewicz. Über exponierte Punkte abgeschlossener Punktmengen. Fundamenta Mathematicae, 24:139–143, 1935.
http://www.convexoptimization.com/TOOLS/Straszewicz.pdf

[336] Jos F. Sturm. SeDuMi (self-dual minimization). Software for optimization over symmetric cones, 2003.
http://sedumi.ie.lehigh.edu

[337] Jos F. Sturm and Shuzhong Zhang. On cones of nonnegative quadratic functions. Mathematics of Operations Research, 28(2):246–267, May 2003.
www.optimization-online.org/DB HTML/2001/05/324.html

[338] George P. H. Styan. A review and some extensions of Takemura’s generalizations of Cochran’s theorem. Technical Report 56, Stanford University, Department of Statistics, September 1982.

[339] George P. H. Styan and Akimichi Takemura. Rank additivity and matrix polynomials. Technical Report 57, Stanford University, Department of Statistics, September 1982.

[340] Jun Sun, Stephen Boyd, Lin Xiao, and Persi Diaconis. The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem. SIAM Review, 48(4):681–699, December 2006.
http://stanford.edu/~boyd/papers/fmmp.html

[341] Chen Han Sung and Bit-Shun Tam. A study of projectionally exposed cones. Linear Algebra and its Applications, 139:225–252, 1990.

[342] Yoshio Takane. On the relations among four methods of multidimensional scaling. Behaviormetrika, 4:29–43, 1977.
http://takane.brinkster.net/Yoshio/p008.pdf

[343] Akimichi Takemura. On generalizations of Cochran’s theorem and projection matrices. Technical Report 44, Stanford University, Department of Statistics, August 1980.

[344] Dharmpal Takhar, Jason N. Laska, Michael B. Wakin, Marco F. Duarte, Dror Baron, Shriram Sarvotham, Kevin F. Kelly, and Richard G. Baraniuk. A new compressive imaging camera architecture using optical-domain compression. In Proceedings of the SPIE Conference on Computational Imaging IV, volume 6065, pages 43–52, February 2006.
http://www.convexoptimization.com/TOOLS/csCamera-SPIE-dec05.pdf

[345] Peng Hui Tan and Lars K. Rasmussen. The application of semidefinite programming for detection in CDMA. IEEE Journal on Selected Areas in Communications, 19(8), August 2001.


[346] Pham Dinh Tao and Le Thi Hoai An. A D.C. optimization algorithm for solving the trust-region subproblem. SIAM Journal on Optimization, 8(2):476–505, 1998.
http://www.convexoptimization.com/TOOLS/pham.pdf

[347] Pablo Tarazaga. Faces of the cone of Euclidean distance matrices: Characterizations, structure and induced geometry. Linear Algebra and its Applications, 408:1–13, 2005.

[348] Kim-Chuan Toh, Michael J. Todd, and Reha H. Tütüncü. SDPT3 − a Matlab software for semidefinite-quadratic-linear programming, February 2009.
http://www.math.nus.edu.sg/~mattohkc/sdpt3.html

[349] Warren S. Torgerson. Theory and Methods of Scaling. Wiley, 1958.

[350] Lloyd N. Trefethen and David Bau, III. Numerical Linear Algebra. SIAM, 1997.

[351] Michael W. Trosset. Applications of multidimensional scaling to molecular conformation. Computing Science and Statistics, 29:148–152, 1998.

[352] Michael W. Trosset. Distance matrix completion by numerical optimization. Computational Optimization and Applications, 17(1):11–22, October 2000.

[353] Michael W. Trosset. Extensions of classical multidimensional scaling: Computational theory. convexoptimization.com/TOOLS/TrossetPITA.pdf , 2001. Revision of technical report entitled “Computing distances between convex sets and subsets of the positive semidefinite matrices” first published in 1997.

[354] Michael W. Trosset and Rudolf Mathar. On the existence of nonglobal minimizers of the STRESS criterion for metric multidimensional scaling. In Proceedings of the Statistical Computing Section, pages 158–162. American Statistical Association, 1997.

[355] Joshua Trzasko and Armando Manduca. Highly undersampled magnetic resonance image reconstruction via homotopic l0-minimization. IEEE Transactions on Medical Imaging, 28(1):106–121, January 2009.
www.convexoptimization.com/TOOLS/SparseRecon.pdf

[356] Joshua Trzasko, Armando Manduca, and Eric Borisch. Sparse MRI reconstruction via multiscale L0-continuation. In Proceedings of the IEEE 14th Workshop on Statistical Signal Processing, pages 176–180, August 2007.
http://www.convexoptimization.com/TOOLS/L0MRI.pdf

[357] P. P. Vaidyanathan. Multirate Systems and Filter Banks. Prentice-Hall, 1993.

[358] Jan van Tiel. Convex Analysis, an Introductory Text. Wiley, 1984.

[359] Lieven Vandenberghe and Stephen Boyd. Semidefinite programming. SIAM Review, 38(1):49–95, March 1996.

[360] Lieven Vandenberghe and Stephen Boyd. Connections between semi-infinite and semidefinite programming. In R. Reemtsen and J.-J. Rückmann, editors, Semi-Infinite Programming, chapter 8, pages 277–294. Kluwer Academic Publishers, 1998.


[361] Lieven Vandenberghe and Stephen Boyd. Applications of semidefinite programming. Applied Numerical Mathematics, 29(3):283–299, March 1999.

[362] Lieven Vandenberghe, Stephen Boyd, and Shao-Po Wu. Determinant maximization with linear matrix inequality constraints. SIAM Journal on Matrix Analysis and Applications, 19(2):499–533, April 1998.

[363] Robert J. Vanderbei. Convex optimization: Interior-point methods and applications. Lecture notes, 1999.
http://orfe.princeton.edu/~rvdb/pdf/talks/pumath/talk.pdf

[364] Richard S. Varga. Geršgorin and His Circles. Springer-Verlag, 2004.

[365] Martin Vetterli and Jelena Kovačević. Wavelets and Subband Coding. Prentice-Hall, 1995.

[366] Martin Vetterli, Pina Marziliano, and Thierry Blu. Sampling signals with finite rate of innovation. IEEE Transactions on Signal Processing, 50(6):1417–1428, June 2002.
http://bigwww.epfl.ch/publications/vetterli0201.pdf
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.5630

[367] È. B. Vinberg. The theory of convex homogeneous cones. Transactions of the Moscow Mathematical Society, 12:340–403, 1963. American Mathematical Society and London Mathematical Society joint translation, 1965.

[368] Marie A. Vitulli. A brief history of linear algebra and matrix theory, 2004.
darkwing.uoregon.edu/~vitulli/441.sp04/LinAlgHistory.html

[369] John von Neumann. Functional Operators, Volume II: The Geometry of Orthogonal Spaces. Princeton University Press, 1950. Reprinted from mimeographed lecture notes first distributed in 1933.

[370] Michael B. Wakin, Jason N. Laska, Marco F. Duarte, Dror Baron, Shriram Sarvotham, Dharmpal Takhar, Kevin F. Kelly, and Richard G. Baraniuk. An architecture for compressive imaging. In Proceedings of the IEEE International Conference on Image Processing (ICIP), pages 1273–1276, October 2006.
http://www.convexoptimization.com/TOOLS/CSCam-ICIP06.pdf

[371] Roger Webster. Convexity. Oxford University Press, 1994.

[372] Kilian Q. Weinberger and Lawrence K. Saul. Unsupervised learning of image manifolds by semidefinite programming. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 988–995, 2004.
http://www.cs.ucsd.edu/~saul/papers/sde cvpr04.pdf

[373] Eric W. Weisstein. Mathworld – A Wolfram Web Resource, 2007.
http://mathworld.wolfram.com/search

[374] D. J. White. An analogue derivation of the dual of the general Fermat problem. Management Science, 23(1):92–94, September 1976.
http://www.convexoptimization.com/TOOLS/White.pdf


[375] Bernard Widrow and Samuel D. Stearns. Adaptive Signal Processing. Prentice-Hall, 1985.

[376] Norbert Wiener. On factorization of matrices. Commentarii Mathematici Helvetici, 29:97–111, 1955.

[377] Ami Wiesel, Yonina C. Eldar, and Shlomo Shamai (Shitz). Semidefinite relaxation for detection of 16-QAM signaling in MIMO channels. IEEE Signal Processing Letters, 12(9):653–656, September 2005.

[378] Michael P. Williamson, Timothy F. Havel, and Kurt Wüthrich. Solution conformation of proteinase inhibitor IIA from bull seminal plasma by 1H nuclear magnetic resonance and distance geometry. Journal of Molecular Biology, 182:295–315, 1985.

[379] Willie W. Wong. Cayley-Menger determinant and generalized N-dimensional Pythagorean theorem, November 2003. Application of Linear Algebra: Notes on Talk given to Princeton University Math Club.
http://www.convexoptimization.com/TOOLS/gp-r.pdf

[380] William Wooton, Edwin F. Beckenbach, and Frank J. Fleming. Modern Analytic Geometry. Houghton Mifflin, 1975.

[381] Margaret H. Wright. The interior-point revolution in optimization: History, recent developments, and lasting consequences. Bulletin of the American Mathematical Society, 42(1):39–56, January 2005.

[382] Stephen J. Wright. Primal-Dual Interior-Point Methods. SIAM, 1997.

[383] Shao-Po Wu. max-det Programming with Applications in Magnitude Filter Design. A dissertation submitted to the department of Electrical Engineering, Stanford University, December 1997.

[384] Shao-Po Wu and Stephen Boyd. sdpsol: A parser/solver for semidefinite programming and determinant maximization problems with matrix structure, May 1996.
http://www.stanford.edu/~boyd/old software/SDPSOL.html

[385] Shao-Po Wu and Stephen Boyd. sdpsol: A parser/solver for semidefinite programs with matrix structure. In Laurent El Ghaoui and Silviu-Iulian Niculescu, editors, Advances in Linear Matrix Inequality Methods in Control, chapter 4, pages 79–91. SIAM, 2000.
http://www.stanford.edu/~boyd/sdpsol.html

[386] Shao-Po Wu, Stephen Boyd, and Lieven Vandenberghe. FIR filter design via spectral factorization and convex optimization. In Biswa Nath Datta, editor, Applied and Computational Control, Signals, and Circuits, volume 1, chapter 5, pages 215–246. Birkhäuser, 1998.
http://www.stanford.edu/~boyd/papers/magdes.html


[387] Naoki Yamamoto and Maryam Fazel. Computational approach to quantum encoder design for purity optimization. Physical Review A, 76(1), July 2007.
http://arxiv.org/abs/quant-ph/0606106

[388] David D. Yao, Shuzhong Zhang, and Xun Yu Zhou. Stochastic linear-quadratic control via primal-dual semidefinite programming. SIAM Review, 46(1):87–111, March 2004. Erratum: p.238 herein.

[389] Yinyu Ye. Semidefinite programming for Euclidean distance geometric optimization. Lecture notes, 2003.
http://www.stanford.edu/class/ee392o/EE392o-yinyu-ye.pdf

[390] Yinyu Ye. Convergence behavior of central paths for convex homogeneous self-dual cones, 1996.
http://www.stanford.edu/~yyye/yyye/ye.ps

[391] Yinyu Ye. Interior Point Algorithms: Theory and Analysis. Wiley, 1997.

[392] D. C. Youla. Mathematical theory of image restoration by the method of convex projection. In Henry Stark, editor, Image Recovery: Theory and Application, chapter 2, pages 29–77. Academic Press, 1987.

[393] Fuzhen Zhang. Matrix Theory: Basic Results and Techniques. Springer-Verlag, 1999.

[394] Günter M. Ziegler. Kissing numbers: Surprises in dimension four. Emissary, pages 4–5, Spring 2004.
http://www.msri.org/communications/emissary


Index∅, see empty set0-norm, 229, 233, 292, 333, 339, 340,346–349, 369, 380, 786Boolean, 2921-norm, 225–229, 231–233, 351–353, 572,786ball, see ballBoolean, 292distance matrix, 263, 264, 468gradient, 233heuristic, 335, 336, 572nonnegative, 229, 230, 339, 346–3502-norm, 50, 221, 222, 225, 226, 353, 691, 786ball, see ballBoolean, 292matrix, 70, 268, 552, 565, 624, 634, 641,642, 654, 745, 786dual, 655, 786inequality, 70, 786Schur-form, 268, 654∞-norm, 225, 226, 232, 786ball, see balldual, 786k-largest norm, 232, 233, 337, 338gradient, 234≻, 64, 101, 103, 116, 785≽, 64, 103, 116, 785≥, 64, 785Ax = b, 49, 86, 227, 228, 233, 296, 298,305, 333, 337, 340, 351–353, 387,700, 701Boolean, 292nonnegative, 88, 89, 229, 275, 339,346–350V , 642δ , 591, 606, 775ı , 430, 771j, 375, 771λ , 592, 606, 775∂ , 40, 46, 669π , 231, 334, 559, 563, 585, 775ψ , 301, 625σ , 592, 775×, 47, 163, 756, 778tr , 606, 784e, 58, 783− A − , 38active set, 307adaptive, 375, 380, 383additivity, 105, 116, 606adjacency, 53adjointoperator, 50, 51, 164, 537, 769, 776self, 409, 536, 591affine, 38, 237algebra, 38, 40, 62, 79, 95, 715boundary, 38, 46combination, 67, 72, 78, 150dimension, 23, 38, 61, 445, 447, 448,457, 539complementary, 539cone, 141low, 421minimization, 570, 582Précis, 448reduction, 494spectral projection, 583face, 95hull, see hullindependence, 71, 78, 142, 145preservation, 79817


818 INDEXinterior, 40map, see transformationmembership relation, 224normal cone, 765set, 38, 61, 62parallel, 38subset, 62, 72hyperplane intersection, 86nullspace basis span, 87projector, 703, 715transformation, see transformationalgebraaffine, 38, 40, 62, 79, 95, 715arg, 338, 653, 654cone, 164face, 95functionlinear, 220support, 240fundamental theorem, 617linear, 85interior, 40, 46, 183linear, 85, 591nullspace, 634optimization, 653, 654projection, 715subspace, 36, 85, 86, 634algebraiccomplement, 96, 97, 156, 203, 708, 712,772projection on, 708, 740multiplicity, 618Alizadeh, 299alternation, 485alternative, see theoremambient, 529, 542vector space, 36anchor, 315, 316, 417, 424angle, 84acute, 72alternating projection, 755brackets, 50, 51constraint, 408, 409, 412, 428dihedral, 29, 84, 429, 491, 649EDM cone, 128halfspace, 73inequality, 400, 489matrix, 433interpoint, 405obtuse, 72positive semidefinite cone, 128relative, 430, 432right, 129, 159, 181, 182torsion, 29, 428Anstreicher, 665antidiagonal, 437antisymmetry, 103, 606arg, 783algebra, 338, 653, 654artificial intelligence, 25asymmetry, 35, 72axioms of the metric, 397axis of revolution, 129, 130, 649− B −, 228ball1-norm, 227, 228nonnegative, 347∞-norm, 227Euclidean, 39, 227, 228, 413, 703, 744smallest, 64nuclear norm, 67packing, 413spectral norm, 70, 745Barvinok, 35, 113, 138, 142, 145, 299, 393,425base, 96station, 425basis, 58, 190, 777discrete cosine transform, 341nonorthogonal, 190, 716projection on, 716nullspace, 87, 91, 613, 710orthogonalprojection, 715orthonormal, 58, 90, 712pursuit, 372standardmatrix, 58, 60, 770vector, 58, 151, 406, 783


INDEX 819bees, 30, 413Bellman, 276bijective, 50, 53, 56, 119, 185Gram form, 439, 442inner-product form, 443, 444invertible, 53, 700isometry, 54, 485linear, 52, 54, 57, 60, 144orthogonal, 54, 485, 563, 647, 665biorthogonaldecomposition, 704, 713expansion, 153, 168, 186, 190, 705EDM, 510projection, 716unique, 188, 190, 193, 511, 638, 717biorthogonality, 188, 191condition, 186, 619, 638Birkhoff, 69Boolean, see problemboundlower, 552polyhedral, 501boundary, 39, 40, 46, 64, 96-point, 94extreme, 111of affine, 38, 46of cone, 106, 178, 183, 185EDM, 499, 507, 514membership, 169, 178, 185, 287PSD, 40, 114, 119, 135–138, 287, 519ray, 107of halfspace, 72of hypercube slice, 383of open set, 41of point, 46of ray, 96relative, 39, 40, 46, 64, 96bounded, 64, 84bowl, 680, 681box, 83, 744bridge, 429, 458, 527, 553, 569Bunt, 137, 734calculus− C −, 36matrix, 669of inequalities, 7of variations, 173Carathéodory’s theorem, 168, 726cardinality, 333, 402, 784, 7861-norm heuristic, 335, 336, 572-one, 227, 347constraint, see constraintconvex envelope, see convexgeometry, 227, 230, 347minimization, 227, 228, 333, 340,351–353, 380Boolean, 292nonnegative, 229, 230, 233, 333, 337,339, 346–350rank connection, 310rank one, 369rank, full, 298quasiconcavity, 233, 334regularization, 345, 370Cartesianaxes, 38, 63, 80, 99, 108, 154, 166, 336,744cone, see orthantcoordinates, 480product, 47, 284, 756, 778cone, 102, 103, 163function, 224subspace, 336, 744Cauchy-Schwarz, 88, 753Cayley-Mengerdeterminant, 491, 493form, 449, 470, 471, 559, 584cellular, 426telephone network, 425centergeometric, 407, 411, 435, 478gravity, 435mass, 435central path, 283certificate, 217, 287, 288, 309, 391null, 170, 179chain rule, 676two argument, 676Chu, 7


820 INDEXclipping, 234, 548, 744, 775closed, see setcoefficientbinomial, 232, 350projection, 186, 517, 722, 728cofactor, 492compaction, 318, 331, 408, 412, 525, 573comparable, 105, 187complement, 72algebraic, 96, 97, 156, 203, 708, 712,772projection on, 708, 740orthogonal, 56, 85, 310, 708, 712, 773,779cone, 156, 166, 740, 765vector, 779relative, 92, 773complementarity, 203, 226, 291, 309, 334linear, 207maximal, 278, 283, 307problem, 206linear, 207strict, 291complementaryaffine dimension, 539dimension, 85eigenvalues, 708halfspace, 76inertia, 472, 612slackness, 290subspace, see complementcompressed sensing, 225, 227–230, 233, 298,337–340, 351–353, 371–375, 381,385formula, 339, 372nonnegative, 229, 230, 339, 346–350Procrustes, 359compressive sampling, see compressedcomputational complexity, 276concave, 219, 220, 235conditionbiorthogonality, 186, 619, 638convexity, 256first order, 256, 260, 262, 264second order, 261, 266Moore-Penrose, 699necessary, 773optimality, 289dual, 160first order, 173, 174, 203–208KKT, 206, 208, 557, 668semidefinite program, 290Slater, 162, 216, 285, 289, 290unconstrained, 174, 251, 252, 730orthonormality, 189sufficient, 773cone, 70, 96, 100, 784algebra, 164blade, 167boundary, see boundaryCartesian, see orthantproduct, 102, 103, 163circular, 45, 101, 129–132, 544Lorentz, see cone, Lorentzright, 131closed, see set, closedconvex, 70, 100–102, 109difference, 47, 163dimension, 141dual, 153, 156, 158–160, 163–166, 208,211, 212, 475algorithm, 208construction, 158, 159dual, 163, 176facet, 186, 213formula, 175, 176, 194, 201, 202halfspace-description, 175in subspace, 164, 169, 191, 195, 196Lorentz, 165, 182orthant, see orthantpointed, 151, 163product, Cartesian, 163properties, 163self, 170, 180, 181unique, 156, 185vertex-description, 194, 208EDM, 403, 497, 499, 500, 523√EDM , 500, 566√EDM , 502boundary, 499, 507, 514


INDEX 821circular, 130construction, 508convexity, 502decomposition, 538dual, 529, 530, 535, 540, 542–544extreme direction, 510face, 509, 511interior, 499, 507one-dimensional, 111positive semidefinite, 511, 519, 535projection, 566, 578, 581empty set, 100face, 106intersection, 178pointed, 103PSD, 147PSD, smallest, 122–126, 518, 531,541smallest, 102, 139, 178, 179, 230, 281,350, 509facet, 186, 213halfline, 96, 101, 103, 109, 111, 142, 151halfspace, 151, 160hull, 164ice cream, see cone, circularinterior, see interiorintersection, 102, 164pointed, 103invariance, see invarianceLorentzcircular, 101, 129dual, 165, 182polyhedral, 149, 165majorization, 595membership, see membership relationmonotone, 199–201, 472, 586, 667nonnegative, 197–199, 474, 475, 481,483, 484, 560–562, 586, 595, 745negative, 187, 717nonconvex, 97–100normal, 203, 204, 206, 224, 516, 735,741, 763–766affine, 765elliptope, 764membership relation, 203orthant, 203one-dimensional, 96, 101, 103, 142, 151EDM, 111PSD, 109origin, 72, 100, 102, 103, 107, 109, 142orthant, see orthantorthogonal complement, 166, 740, 765pointed, 102, 103, 106–109, 151, 168,188, 201closed convex, 103, 144dual, 151, 163extreme, 103, 109, 151, 160face, 103polyhedral, 108, 151, 184, 185polar, 96, 156, 516, 739, 780polyhedral, 70, 142, 148–151, 160dual, 143, 165, 170, 176, 181, 184,186equilateral, 181facet, 186, 213halfspace-description, 148majorization, 586pointed, 108, 151, 184, 185proper, 143vertex-description, 70, 150, 170, 185positive semidefinite, 43, 113–116, 533boundary, 114, 119, 135, 287, 519circular, 129–132, 235dimension, 122dual, 180EDM, 535extreme direction, 109, 115, 127face, 147face, smallest, 122–126, 518, 531, 541hyperplane, supporting, 121, 123inscription, 133interior, 114, 286inverse image, 117, 118one-dimensional, 109optimization, 282polyhedral, 132rank, 122–126, 136, 279visualization, 280productCartesian, 102, 103, 163


822 INDEXproper, 106, 208dual, 165quadratic, 101ray, 96, 101, 103, 142, 151EDM, 111positive semidefinite, 109recession, 157second order, see cone, Lorentzselfdual, 170, 180, 181shrouded, 181, 543simplicial, 71, 153–155, 187, 745dual, 165, 187, 195spectral, 470, 472, 559dual, 475orthant, 562subspace, 101, 149, 151sum, 102, 164supporting hyperplane, see hyperplanetangent, 516transformation, see transformationtwo-dimensional, 142union, 98, 100, 164, 209unique, 114, 156, 499vertex, 103congruence, 438, 469coniccombination, 72, 100, 150constraint, 205, 206, 208, 235coordinates, 153, 190, 193, 213–217hull, see hullindependence, 27, 71, 140–146, 298versus dimension, 141dual cone, 196extreme direction, 144preservation, 142unique, 141, 146, 192, 194problem, 162, 205, 206, 208, 224, 275dual, 162, 275section, 130circular cone, 131conjugatecomplex, 376, 596, 628, 647, 728, 769,775convex, 163, 168, 169, 175, 186, 195,203function, 570, 775gradient, 381conservationdimension, 85, 448, 626, 627, 641constellation, 22constraintangle, 408, 409, 412, 428binary, 292, 362cardinality, 333, 340, 351–353, 380full-rank, 298nonnegative, 229, 233, 333, 337, 339,346–350projection on PSD cone, 369rank one, 369conic, 205, 206, 208, 235equality, 174, 208affine, 274inequality, 206, 307matrixGram, 408, 409orthogonal, see Procrustes, 391orthonormal, 355nonnegative, 69, 88, 89, 206, 275norm, 353, 363, 625spectral, 70polynomial, 235, 360, 361qualification, see Slaterrank, 308, 355, 360, 361, 387, 390, 575,583cardinality, 369fat, 384indefinite, 384projection on matrices, 745projection on PSD cone, 251, 369,558, 564, 657Schur-form, 70sort, 484tractable, 274content, 493, 784contour plot, 204, 259contraction, 70, 207, 381, 383convergence, 311, 485, 748geometric, 750global, 309, 311, 313, 334, 391local, 313, 336, 340, 352, 391, 393


INDEX 823measure, 295convex, 21, 36, 220combination, 36, 72, 150envelope, 570–572bilinear, 271cardinality, 297, 572rank, 325, 424, 570–572equivalent, 325form, 7, 273geometry, 35fundamental, 74, 79, 84, 395, 407hull, see hulliterationcardinality, 333–340, 351, 352convergence, 309, 311, 313, 334, 336,352, 393rank, 308–313, 575rank one, 355, 390, 587rank, fat, 384rank, indefinite, 384stall, 313, 336, 340, 341, 354, 363,389, 393log, 272optimization, 7, 21, 222, 274, 417art, 8, 298, 318, 589fundamental, 274polyhedron, see polyhedronproblem, 162, 174, 205, 208, 226, 238,247, 249, 278, 284, 564, 733definition, 274geometry, 21, 27nonlinear, 273statement as solution, 7, 273tractable, 274program, see problemquadratic, see functionconvexity, 219, 235conditionfirst order, 256, 260, 262, 264second order, 261, 266K-, 220log, 272norm, 70, 221, 222, 225, 226, 255, 268,655, 786strict, see function, convexcoordinatesconic, 153, 190, 193, 213–217Crippen & Havel, 428crippling, 276criterionEDM, 517, 521dual, 536Finsler, 521metric versus matrix, 450Schoenberg, 30, 407, 471, 541, 729cubix, 670, 673, 781curvature, 261, 270− D −, 406Dantzig, 276decomposition, 780biorthogonal, 704, 713completely PSD, 101, 314dyad, 631eigenvalue, 616, 618–620nonnegative matrix factorization, 314orthonormal, 711, 712singular value, see singular valuedeconvolution, 372definite, see matrixdeLeeuw, 476, 567, 577Delsarte, 414dense, 95derivative, 251directional, 260, 677, 680, 681, 685–687dimension, 680, 690gradient, 681second, 682gradient, 687partial, 669, 776table, 690descriptionconversion, 153halfspace-, 71, 74of affine, 86of dual cone, 175of polyhedral cone, 148vertex-, 71, 72, 110of affine, 87of dual cone, 194, 208


824 INDEXof halfspace, 145of hyperplane, 77of polyhedral cone, 70, 150, 170, 185of polyhedron, 64, 150projection on affine, 720determinant, 269, 488, 602, 615, 690, 696Cayley-Menger, 491, 493inequality, 607Kronecker, 594product, 604Deza, 301, 516diag(), see δdiagonal, 591, 606, 775δ , 591, 606, 775binary, 704dominance, 134inequality, 607matrix, 617commutative, 605diagonalization, 620valueseigen, 617singular, 621diagonalizable, 188, 616simultaneously, 123, 291, 603, 605, 629diagonalization, 42, 91, 616, 618–620diagonal matrix, 620expansion by, 188symmetric, 619diamond, 437differenceconvex functions, 233, 249, 337differentiable, see functiondifferentialFréchet, 678partial, 776diffusion, 525dilation, 481dimension, 38affine, see affinecomplementary, 85conservation, 85, 448, 626, 627, 641domain, 55embedding, 61Euclidean, 777invariance, 55nullspace, 85positive semidefinite coneface, 122Précis, 448range, 55rank, 141, 447, 448, 784direction, 96, 109, 677matrix, 308, 325, 330, 370, 415, 576interpretation, 309Newton, 251steepest descent, 251, 680vector, 308–313, 334–337interpretation, 335, 336, 572disc, 65discretization, 175, 221, 261, 541dist, 751, 785distance, 751, 7521-norm, 226–228, 263, 264absolute, 397, 467comparative, see distance, ordinalEuclidean, 402geometry, 22, 425matrix, see EDMordinal, 479–486origin to affine, 77, 227, 228, 703, 719nonnegative, 229, 230, 346property, 397taxicab, see distance, 1-normDonoho, 385doublet, see matrixdualaffine dimension, 539dual, 163, 170, 176, 290, 294, 367feasible set, 284norm, 165, 655, 786problem, 160, 161, 216, 231, 232, 275,288, 290, 291, 367, 410, 529projection, 734, 736on cone, 737on convex set, 734, 736on EDM cone, 581optimization, 735strong, 161, 162, 289, 665variable, 160, 206, 208, 216, 775


INDEX 825duality, 160, 410gap, 161, 162, 285, 289, 290, 367strong, 162, 216, 238, 290weak, 288dvec, 60, 483, 785dyad, see matrixDykstra, 751, 763algorithm, 748, 761, 762, 767− E −, 460Eckart, 556, 558, 589edge, 93 √EDM , 502EDM, 28, 395, 396, 403, 544closest, 557composition, 465–468, 500–502construction, 505criterion, 517, 521dual, 536definition, 402, 503Gram form, 405, 514inner-product form, 430interpoint angle, 405relative-angle form, 432dual, 536, 537exponential entry, 465graph, 399, 414, 418, 421, 422, 525indefinite, 468membership relation, 541nonnegative, 535projection, 520range, 506test, numerical, 400, 407unique, 464, 477, 478, 558eigen, 616matrix, 116, 189, 619spectrum, 470, 559ordered, 472unordered, 474value, 601, 616, 617, 654, 656λ , 592, 606, 775coefficients, 723, 726decomposition, 616, 618–620EDM, 449, 469, 482, 504inequality, 603, 664interlaced, 455, 469, 609intertwined, 455inverse, 618, 620, 624Kronecker, 594largest, 457, 608, 656, 657, 659left, 617of sum, 608principal, 369product, 603, 605, 606, 661real, 596, 617, 619repeated, 137, 617, 618Schur, 614singular, 621, 624smallest, 457, 466, 529, 608, 656sum of, 53, 602, 658, 660symmetric matrix, 619, 625transpose, 617unique, 128, 617zero, 626, 635vector, 616, 656, 657distinct, 618EDM, 504Kronecker, 594left, 617normal, 619principal, 369, 371real, 617, 619, 627symmetric, 619unique, 618elbow, 551, 552ellipsoid, 36, 41, 43, 353invariance, 49elliptope, 294, 296, 408, 417, 459, 460, 468,514–516smooth, 459vertex, 295, 460, 461embedding, 445dimension, 61emptyinterior, 39set, 38, 39, 46, 653affine, 38closed, 41cone, 100face, 92


826 INDEXhull, 62, 65, 72open, 41entry, 37, 220, 769, 774–776, 781largest, 231smallest, 231epigraph, 219, 222, 241, 242, 265, 271form, 248, 580, 581intersection, 241, 268, 272equalityconstraint, 174, 208affine, 274EDM, 535normal equation, 702errata, 212, 238, 315, 442, 660, 743Euclideanball, 39, 227, 228, 413, 703, 744smallest, 64distance, 396, 402geometry, 22, 418, 425, 729matrix, see EDMmetric, 397, 451fifth property, 397, 400, 401, 486,489, 495norm, see 2-normprojection, 136space, 36, 396, 778exclusivemutually, 172expansion, 191, 780biorthogonal, 153, 168, 186, 190, 705EDM, 510projection, 716unique, 188, 190, 193, 511, 638, 717implied by diagonalization, 188orthogonal, 58, 156, 189, 711, 713w.r.t orthant, 190exponential, 697exposed, 92closed, 511direction, 111extreme, 94, 103, 146face, 93, 147point, 93, 94, 111density, 95extreme, 92boundary, 111direction, 89, 107, 112, 144distinct, 109dual cone, 143, 186, 191, 193, 196,210–212EDM cone, 111, 510, 535one-dimensional, 109, 111origin, 109orthant, 89pointed cone, 103, 109, 151, 160PSD cone, 109, 115, 127shared, 212supporting hyperplane, 186unique, 109exposed, 94, 103, 146point, 69, 92, 94, 95, 283cone, 103, 139Fantope, 309hypercube slice, 336ray, 107, 514− F −, 95face, 42, 92affine, 95algebra, 95cone, see coneexposed, see exposedintersection, 178isomorphic, 122, 410, 509of face, 95polyhedron, 147recognition, 25smallest, 95, 122–126, 178, 179, seeconetransitivity, 95facet, 93, 186, 213factorizationmatrix, 655nonnegative matrix, 314Fan, 65, 276, 603, 658Fantope, 65, 66, 128, 246, 309extreme point, 309Farkas’ lemma, 168, 171positive definite, 286not, 287


INDEX 827positive semidefinite, 284fat, see matrixFejér, 180, 754Fenchel, 219Fermat point, 410, 411Fiacco, 276find, 783finitely generated, 148floor, 786Forsgren & Gill, 273Fouriertransform, discrete, 54, 375Frisch, 276Frobenius, 52, see normfull, 39-dimensional, 39, 106-rank, 86, 298function, see operatoraffine, 47, 224, 237–240, 253supremum, 241, 655bilinear, 271composition, 235, 255, 676affine, 255, 272concave, 219, 235, 271, 574, 656conjugate, 570, 775continuity, 220, 262, 271convex, 219, 220, 404, 433difference, 233, 249, 337invariance, 220, 241, 272, 653, 654matrix, 262nonlinear, 220, 240simultaneously, 244, 245, 265strictly, 221, 222, 224, 255, 256, 258,261, 267, 268, 567, 573, 579, 680,681sum, 224, 255, 272supremum, 268, 272trivial, 224vector, 220differentiable, 220, 240, 258, 266non, 220, 221, 226, 252, 271, 381, 382distance1-norm, 263, 264Euclidean, 402, 403fractional, 244, 256, 261, 265, 613inverted, 235, 690, 697power, 236, 697projector, 245root, 235, 607, 620, 770inverted, see function, fractionalinvertible, 49, 53, 55linear, see operatormonotonic, 219, 231–233, 254, 271, 337multidimensional, 219, 250, 266, 782affine, 224, 238negative, 235objective, see objectivepresorting, 231, 334, 559, 563, 585, 775product, 244, 256, 271projection, see projectorpseudofractional, 243quadratic, 222, 248, 257, 261, 267, 431,612, 727convex, strictly, 222, 261, 267, 579,680, 681distance, 402nonconvex, 360, 668nonnegative, 612quasiconcave, 269–271, 301cardinality, 233, 334rank, 135quasiconvex, 136, 242, 269–271continuity, 270, 271quasilinear, 261, 271, 625quotient, see function, fractionalratio, see function, fractionalsmooth, 220, 226sorting, 231, 334, 559, 563, 585, 775step, 625matrix, 301stress, 262, 554, 576support, 82, 240algebra, 240fundamentalconvexgeometry, 74, 79, 84, 395, 407optimization, 274metric property, 397, 451semidefiniteness test, 180, 596, 727


828 INDEXsubspace, 49, 85, 86, 635, 639, 640, 700,710vectorized, 90− G − , 176Gâteaux differential, 678Gaffke, 530generators, 62, 72, 110, 176c.i., 144hull, 150affine, 62, 77, 78, 720conic, 72, 144, 145convex, 64l.i., 145minimal set, 64, 77, 78, 144, 777affine, 720extreme, 110, 185halfspace, 145hyperplane, 145orthant, 221positive semidefinite cone, 115, 180,263geometriccenter, 407, 411, 435, 478operator, 441subspace, 121, 435, 439, 440, 498,529, 532–534, 570, 629, 731Hahn-Banach theorem, 74, 79, 84mean, 697multiplicity, 618Geršgorin, 132gimbal, 649global positioning system, 24, 316Glunt, 580, 763Gower, 395gradient, 173, 204, 240, 250, 258, 669, 688composition, 676conjugate, 381derivative, 687directional, 681dimension, 250first order, 687image, 378monotonic, 255norm, 234, 691, 693, 709, 719, 720, 730,734product, 673second order, 688, 689sparsity, 373table, 690zero, 174, 251, 252, 730Gram matrix, 323, 405, 417constraint, 408, 409unique, 435, 439− H −, 72Hadamardproduct, 51, 566, 592, 593, 676, 772positive semidefinite, 606quotient, 690, 697, 772square root, 770halfline, 96halfspace, 72–74, 160boundary, 72description, 71, 74of affine, 86of dual cone, 175of polyhedral cone, 148vertex-, 145interior, 46intersection, 74, 163, 182polarity, 72Hardy-Littlewood-Pólya, 560Hayden & Wells, 248, 497, 504, 511, 578Herbst, 624Hessian, 251, 669hexagon, 412Hiriart-Urruty, 82homogeneity, 105, 225, 272, 403homotopy, 334honeycomb, 30Hong, 580Horn & Johnson, 597, 598horn, flared, 101hull, 61affine, 38, 61–65, 79, 150correlation matrices rank-1, 62EDM cone, 441, 529empty set, 62


INDEX 829point, 39, 62positive semidefinite cone, 62unique, 61conic, 63, 70, 150empty set, 72convex, 61, 63, 64, 150, 396cone, 164dyads, 67empty set, 65extreme directions, 110orthogonal matrices, 70orthonormal matrices, 70outer product, 65permutation matrices, 69positive semidefinite cone, 127projection matrices, 65rank one matrices, 67, 68rank one symmetric matrices, 127unique, 64hyperboloid, 630hyperbox, 83, 744hypercube, 52, 83, 227, 744nonnegative, 297slice, nonnegative, 336, 383hyperdimensional, 674hyperdisc, 681hyperplane, 72, 73, 75–77, 253hypersphere radius, 75independence, 86intersection, 86, 178convex set, 80, 523movement, 75, 89, 90, 230normal, 76separating, 84, 165, 170, 171strictly, 84supporting, 79, 81, 82, 204, 257–259cone, 156, 158, 178, 186cone, EDM, 529cone, PSD, 121, 123exposed face, 93polarity, 82strictly, 82, 102, 103, 257, 258trivially, 79, 179unique, 257, 258vertex-description, 77hypersphere, 65, 75, 408, 413, 461circumhypersphere, 449hypograph, 243, 271, 692− I −, 782idempotent, see matrixiff, 784image, 47inverse, 47–49, 183cone, 117, 118, 164, 184magnetic resonance, 373in, 777independenceaffine, 71, 78, 142, 145preservation, 79conic, 27, 71, 140–146, 298versus dimension, 141dual cone, 196extreme direction, 144preservation, 142unique, 141, 146, 192, 194linear, 37, 71, 141, 142, 145, 298, 348,637matrix, 603of affine, 86, 88preservation, 37subspace, 37, 772inequalityangle, 400, 489matrix, 433Bessel, 711Cauchy-Schwarz, 88, 753constraint, 206, 307determinant, 607diagonal, 607eigenvalue, 603, 664generalized, 105, 153, 168dual, 168dual PSD, 180partial order, 103, 105, 116Hardy-Littlewood-Pólya, 560Jensen, 256linear, 35, 72, 169matrix, 182–184, 284, 287log, 692


830 INDEXnorm, 293, 786triangle, 225rank, 607singular value, 70, 666spectral, 70, 470trace, 607triangle, 397, 398, 431, 451–458, 464,487–492, 501, 502norm, 225strict, 454unique, 487vector, 225variation, 173inertia, 469, 612complementary, 472, 612Sylvester’s law, 604infimum, 272, 653, 656, 709, 730, 783inflection, 261injective, 53, 54, 106, 215, 439, 777Gram form, 435, 439, 442inner-product form, 443, 444invertible, 53, 699, 700non, 55nullspace, 53innovationrate, 372interior, 39, 40, 46, 96-point, 39method, 148, 273, 276, 278, 295, 332,414, 416, 540, 577algebra, 40, 46, 183empty, 39of affine, 40of cone, 183, 185EDM, 499, 507membership, 169positive semidefinite, 114, 286of halfspace, 46of point, 39, 41of ray, 96relative, 39, 40, 46, 96transformation, 183intersection, 46cone, 102, 164pointed, 103epigraph, 241, 268, 272face, 178halfspace, 74, 163, 182hyperplane, 86, 178convex set, 80, 523line with boundary, 42nullspace, 88planes, 87positive semidefinite coneaffine, 139, 287geometric center subspace, 629line, 419subspace, 88tangential, 43invarianceclosure, 39, 148, 183cone, 102, 103, 144, 164, 499pointed, 106, 151convexity, 46, 47dimension, 55ellipsoid, 49function, 653, 654convex, 220, 241, 272Gram form, 435inner-product form, 438isometric, 23, 420, 434, 438, 476, 477optimization, 239, 654orthogonal, 54, 485, 552, 647, 650reflection, 436rotation, 130, 436scaling, 272, 403, 653, 654set, 467translation, 434–436, 441, 462function, 220, 272, 654subspace, 529, 731inversionconic coordinates, 215Gram form, 441injective, 53matrix, see matrix, inversenullspace, 49, 53operator, 49, 53, 55, 183is, 773isedm(), 400, 407isometry, 54, 485, 554


INDEX 831isomorphic, 52, 56, 59, 772face, 122, 410, 509isometrically, 42, 53, 118, 180non, 629isomorphism, 52, 183, 184, 442, 444, 485,498isometric, 53, 54, 57, 60, 485symmetric hollow subspace, 60symmetric matrix subspace, 57isotonic, 481iterate, 748, 758, 759, 762iterationalternating projection, 580, 748, 754,763convex, see convex− j − , 375Jacobian, 251Johnson, 597, 598− K − , 100Karhunen-Loéve transform, 476kissing, 161, 227, 228, 346–348, 413, 703number, 413KKT conditions, 206, 208, 557, 668Klanfer, 408Klee, 110Krein-Rutman, 164Kronecker product, 118, 592–594, 665, 674,675, 692eigen, 594inverse, 376orthogonal, 651positive semidefinite, 402, 606projector, 749pseudoinverse, 701trace, 594vector, 676vectorization, 593Kuhn, 411− L −, 240Lagrange multiplier, 174, 668sign, 216Lagrangian, 216, 224, 367, 379, 410Lasso, 372lattice, 319–322, 326–328regular, 30, 319Laurent, 299, 301, 460, 465, 516, 521, 760law of cosines, 430least1-norm, 227, 228, 293, 347energy, 408, 573norm, 227, 293, 702squares, 315, 317, 702Legendre-Fenchel transform, 570Lemaréchal, 21, 82line, 253, 266tangential, 44linearalgebra, 85, 591bijection, see bijectivecomplementarity, 207function, see operatorindependence, 37, 71, 141, 142, 145,298, 348, 637matrix, 603of affine, 86, 88preservation, 37subspace, 37, 772inequality, 35, 72, 169matrix, 182–184, 284, 287injective, see injectiveoperator, 47, 59, 220, 237, 238, 405,441, 443, 485, 512, 591, 732projector, 707, 712, 715, 735program, 83, 88–90, 162, 226, 238, 273,275–278, 414, 781dual, 162, 275prototypical, 163, 275regression, 372surjective, see surjectivetransformation, see transformationlist, 28Littlewood, 560Liu, 497, 504, 511, 578localization, 24, 419sensor network, 315, 316


832 INDEXstandardized test, 319unique, 23, 316, 318, 418, 421wireless, 425log, 692, 696-convex, 272barrier, 414det, 269, 425, 574, 687, 695Lowner, 116, 606− M − , 199machine learning, 25Magaia, 624majorization, 595cone, 595symmetric hollow, 595manifold, 25, 26, 69, 70, 121, 522, 524, 552,587, 647, 661map, see transformationisotonic, 479, 482USA, 28, 479, 480Mardia, 556Markov process, 529mater, 670Mathar, 530Matlab, 33, 140, 293, 331, 340, 345, 354,362, 369, 415, 480, 483, 624MRI, 371, 376, 381, 383matrix, 670angleinterpoint, 405, 408relative, 432antisymmetric, 56, 596, 778antihollow, 59, 60, 549auxiliary, 407, 642, 646orthonormal, 646projector, 642Schoenberg, 404, 644table, 646binary, see matrix, BooleanBoolean, 295, 461bordered, 455, 469calculus, 669circulant, 268, 619, 643permutation, 376symmetric, 376commutative, see productcompletion, see problemcorrelation, 62, 460, 461covariance, 407determinant, see determinantdiagonal, see diagonaldiagonalizable, see diagonalizabledirection, 308, 325, 330, 370, 415, 576interpretation, 309distance1-norm, 263, 468Euclidean, see EDMdoublet, 435, 639, 731nullspace, 639, 640range, 639, 640dyad, 631, 634-decomposition, 631independence, 90, 189, 477, 506, 637,638negative, 634nullspace, 634, 635projection on, 722, 725projector, 635, 704, 722range, 634, 635sum, 616, 620, 638symmetric, 127, 609, 610, 636EDM, see EDMelementary, 467, 640, 643nullspace, 640, 641range, 640, 641exponential, 268, 697fat, 55, 86, 384, 770Fourier, 54, 375full-rank, 86Gale, 421geometric centering, 407, 435, 499, 527,643Gram, 323, 405, 417constraint, 408, 409unique, 435, 439Hermitian, 596hollow, 56, 59Householder, 641auxiliary, 643idempotent, 707


INDEX 833nonsymmetric, 703range, 703, 704symmetric, 709, 712identity, 782indefinite, 384, 468inverse, 267, 607, 618, 636, 686, 687injective, 700symmetric, 620Jordan form, 601, 618measurement, 548nonnegative, 469, 512, 535, 548, 741definite, see positive semidefinitefactorization, 314nonsingular, 618, 620normal, 53, 602, 619, 621normalization, 296nullspace, see nullspaceorthogonal, 54, 356, 357, 436, 620,647–650manifold, 647product, 647, 651symmetric, 648orthonormal, 54, 65, 70, 309, 355, 436,622, 646, 658, 711manifold, 70pseudoinverse, 701partitioned, 611–615, see Schurpermutation, 69, 356, 357, 451, 594,620, 647circulant, 376extreme point, 69positive semidefinite, 69product, 647symmetric, 376, 642, 664positive definite, 116inverse, 607positive real numbers, 263positive factorization, 314positive semidefinite, 116, 469, 596,599, 601, 609completely, 101, 314difference, 138domain of test, 596, 727from extreme directions, 128Hadamard, 606Kronecker, 402, 606nonnegative versus, 469nonsymmetric, 601permutation, 69projection, 610, 711pseudoinverse, 701rank, see ranksquare root, 607, 620, 770sum, eigenvalues, 608sum, nullspace, 90sum, rank, see ranksymmetry versus, 596zero entry, 626product, see productprojection, 119, 643, 711Kronecker, 749nonorthogonal, 703orthogonal, 709positive semidefinite, 610, 711product, 748pseudoinverse, 118, 191, 252, 293, 611,699Kronecker, 701nullspace, 699, 700orthonormal, 701positive semidefinite, 701product, 700range, 699, 700SVD, 624symmetric, 624, 701transpose, 188unique, 699quotient, Hadamard, 690, 697, 772range, see rangerank, see rankreflection, 436, 437, 648, 664rotation, 436, 647, 649, 663of range, 648rotation of, 650Schur-form, see Schursimple, 633skinny, 54, 191, 769sort index, 481square root, 770, see matrix, positivesemidefinite


834 INDEXsquared, 267standard basis, 58, 60, 770Stiefel, see matrix, orthonormalstochastic, 69, 356sumeigenvalues, 608nullspace, 90rank, see ranksymmetric, 56, 619inverse, 620pseudoinverse, 624, 701real numbers, 263subspace of, 56symmetrized, 597trace, see traceunitary, 647symmetric, 375, 378zero definite, 630max cut problem, 365maximization, 241maximum variance unfolding, 522, 529McCormick, 276membership, 64, 103, 105, 116, 785boundary, 185interior, 185relation, 168affine, 224boundary, 169, 178, 287discretized, 175, 176, 180, 541EDM, 541interior, 169, 286normal cone, 203orthant, 64, 177subspace, 195metric, 54, 263, 396postulate, 397property, 397, 451fifth, 397, 400, 401, 486, 489, 495space, 397minimalcardinality, see minimizationelement, 104, 105, 222, 223set, see setminimization, 173, 203, 318, 681affine function, 239cardinality, 227, 228, 333, 340, 351–353,380Boolean, 292nonnegative, 229, 230, 233, 333, 337,339, 346–350rank connection, 310rank one, 369rank, full, 298fractional, inverted, see functionnorm1-, 226, see problem2-, 226, 410, 411∞-, 226Frobenius, 226, 248, 249, 251, 252,553, 730nuclear, 384, 572, 655on hypercube, 83rank, 308, 355, 387, 390, 575, 583cardinality connection, 310cardinality constrained, 369fat, 384indefinite, 384trace, 310–312, 384, 419, 570, 572, 573,613, 655minimizer, 221, 222minimumelement, 104, 222unique, 105, 223globalunique, 219, 681valueunique, 222Minkowski, 148MIT, 386molecular conformation, 24, 428monotonic, 247, 254, 313, 754, 758Fejér, 754function, 219, 231–233, 254, 271, 337gradient, 255Moore-Penroseconditions, 699inverse, see matrix, pseudoinverseMoreau, 206, 740Motzkin, 137, 173, 734MRI, 371, 373, 378


INDEX 835Muller, 624multidimensionalfunction, 219, 250, 266, 782affine, 224, 238objective, 104, 105, 222, 223, 332, 380scaling, 24, 476ordinal, 481multilateration, 316, 425multiobjective optimization, 104, 105,222–224, 332, 380multipath, 425multiplicity, 603, 618Musin, 414, 416− N − , 85nearest neighbors, 525neighborhood graph, 524, 525Nemirovski, 547nesting, 454Newton, 414, 669direction, 251nondegeneracy, 397nonexpansive, 711, 743nonnegative, 450, 785constraint, 69, 88, 89, 206, 275matrix factorization, 314part, 234, 548, 744, 775polynomial, 106, 612nonnegativity, 225, 397nonvertical, 253norm, 222, 225, 226, 229, 333, 7860, 6250-, see 0-norm1-, see 1-norm2-, see 2-norm∞-, see ∞-normk-largest, 232, 233, 337, 338gradient, 234ball, see ballconstraint, 353, 363, 625spectral, 70convexity, 70, 221, 222, 225, 226, 255,268, 655, 786dual, 165, 655, 786Euclidean, see 2-normFrobenius, 53, 222, 226, 552minimization, 226, 248, 249, 251,252, 553, 730Schur-form, 248gradient, 234, 691, 693, 709, 719, 720,730, 734inequality, 293, 786triangle, 225Ky Fan, 654least, 227, 293, 702nuclear, 384, 572, 654, 655, 786ball, 67orthogonally invariant, 54, 485, 552,647, 650property, 225regularization, 364, 702spectral, 70, 268, 552, 565, 624, 634,641, 642, 654, 745, 786ball, 70, 745dual, 655, 786inequality, 70, 786Schur-form, 268, 654square, 222, 693, 719, 720, 730, 786vs. square root, 226, 255, 552, 709normal, 76, 204, 734cone, see coneequation, 702facet, 186inward, 72, 529outward, 72vector, 76, 734, 765Notation, 769NP-hard, 576nuclearmagnetic resonance, 428norm, 384, 572, 654, 655, 786ball, 67nullspace, 85, 643, 732algebra, 634basis, 87, 91, 613, 710dimension, 85doublet, 639, 640dyad, 634, 635elementary matrix, 640, 641form, 85


836 INDEXhyperplane, 73injective, 53intersection, 88orthogonal complement, 85product, 604, 634, 638, 710pseudoinverse, 699, 700set, 178sum, 90vectorized, 92− O − , 782objective, 83, 781convex, 274strictly, 567, 573function, 83, 84, 162, 173, 203, 205, 781linear, 238, 325, 419multidimensional, 104, 105, 222, 223,332, 380nonlinear, 240polynomial, 360quadratic, 248nonconvex, 668real, 222value, 288, 304unique, 275offset, see invariance, translationon, 777one-to-one, 777, see injectiveonly if, 773onto, 777, see surjectiveopen, see setoperator, 782adjoint, 50, 51, 164, 537, 769, 776self, 409, 536, 591affine, see functioninjective, see injectiveinvertible, 49, 53, 55, 183linear, 47, 59, 220, 237, 238, 405, 441,443, 485, 512, 591, 732projector, 707, 712, 715, 735nonlinear, 220, 240nullspace, 435, 439, 441, 443, 732permutation, 231, 334, 559, 563, 585,775projection, see projectorsurjective, see surjectiveunitary, 54optimalitycondition, 289directional derivative, 681dual, 160first order, 173, 174, 203–208KKT, 206, 208, 557, 668semidefinite program, 290Slater, 162, 216, 285, 289, 290unconstrained, 174, 251, 252, 730constraintconic, 205, 206, 208equality, 174, 208inequality, 206nonnegative, 206global, 8, 222, 247, 274perturbation, 304optimization, 21, 273algebra, 653, 654combinatorial, 69, 83, 230, 292, 356,362, 365, 461, 588conic, see problemconvex, 7, 21, 222, 274, 417art, 8, 298, 318, 589fundamental, 274invariance, 239, 654multiobjective, 104, 105, 222–224, 332,380tractable, 274vector, 104, 105, 222–224, 332, 380ordernatural, 50, 591, 781nonincreasing, 561, 620, 621, 775of projection, 548partial, 64, 101, 103–105, 116, 177, 186,187, 606, 782, 785generalized inequality, 103, 105, 116orthant, 177property, 103, 105total, 103, 104, 156, 783transitivity, 103, 116, 606origin, 36cone, 72, 100, 102, 103, 107, 109, 142projection of, 227, 228, 703, 719


INDEX 837projection on, 739, 745translation to, 446Orion nebula, 22orthant, 37, 153, 156, 168, 177, 190, 203dual, 182extreme direction, 89nonnegative, 170, 535orthogonal, 50, 709, 730complement, 56, 85, 310, 708, 712, 773,779cone, 156, 166, 740, 765vector, 779equivalence, 651expansion, 58, 156, 189, 711, 713invariance, 54, 485, 552, 647, 650set, 178sum, 773orthonormal, 50decomposition, 711matrix, see matrixorthonormality condition, 189over, 777overdetermined, 769− P − , 703PageRank, 529parallel, 38, 713pattern recognition, 24Penrose conditions, 699pentahedron, 492pentatope, 151, 492perpendicular, see orthogonalperturbation, 299, 301optimality, 304phantom, 373plane, 72segment, 107pointboundary, 94of hypercube slice, 383exposed, see exposedfeasible, see solutionfixed, 313, 381, 732, 751–754inflection, 261Pólya, 560Polyak, 653polychoron, 46, 147, 153, 491, 492polyhedron, 42, 61, 83, 147, 458bounded, 64, 69, 149, 230face, 147halfspace-description, 147norm ball, 228permutation, 69, 356, 359stochastic, 69, 356, 359transformationlinear, 144, 148, 170unbounded, 38, 42, 149vertex, 69, 149, 228vertex-description, 64, 150polynomialconstraint, 235, 360, 361convex, 261nonconvex, 360nonnegative, 106, 612objective, 360quadratic, see functionpolytope, 147positive, 785strictly, 451power, see function, fractionalPrécis, 448precision, numerical, 148, 276, 278, 295,332, 379, 414, 416, 547, 577presolver, 348–351primalfeasible set, 278, 284problem, 162, 275via dual, 163, 291principalcomponent analysis, 369, 476eigenvalue, 369eigenvector, 369, 371submatrix, 124, 417, 487, 492, 509, 609leading, 124, 452, 454principlehalfspace-description, 74separating hyperplane, 84supporting hyperplane, 79problem1-norm, 226–230, 293, 351–353


838 INDEXnonnegative, 229, 230, 335, 336, 339,346–350, 572Boolean, 69, 83, 292, 362, 461cardinality, see cardinalitycombinatorial, 69, 83, 230, 292, 356,362, 365, 461, 588complementarity, 206linear, 207completion, 319–322, 330, 376, 398,399, 418, 421, 422, 457, 464, 465,494, 495, 522–524, 589geometry, 522semidefinite, 759, 760compressed sensing, see compressedconcave, 274conic, 162, 205, 206, 208, 224, 275dual, 162, 275convex, 162, 174, 205, 208, 226, 238,247, 249, 278, 284, 564, 733definition, 274geometry, 21, 27nonlinear, 273statement as solution, 7, 273tractable, 274dual, see dualepigraph form, 248, 580, 581equivalent, 309, 781feasibility, 141, 151, 178, 179, 783semidefinite, 288, 308, 309, 390fractional, inverted, see functionmax cut, 365minimax, 64, 160, 162nonlinear, 417convex, 273open, 30, 135, 313, 475, 511, 588primal, see primalProcrustes, see Procrustesproximity, 547, 549, 552, 555, 566, 578EDM, nonconvex, 562in spectral norm, 565rank heuristic, 572, 575SDP, Gram form, 569semidefinite program, 558, 568, 579quadraticnonconvex, 668rank, see ranksame, 781stress, 262, 554, 576tractable, 274procedurerank reduction, 300Procrustescombinatorial, 356compressed sensing, 359diagonal, 667linear program, 665maximization, 664orthogonal, 661two sided, 663, 666orthonormal, 355symmetric, 666translation, 662unique solution, 662product, 271, 675Cartesian, 47, 284, 756, 778cone, 102, 103, 163function, 224commutative, 268, 291, 603–605, 629,748determinant, 604direct, 674eigenvalue, 603, 605, 606, 661matrix, 605, 606fractional, see functionfunction, see functiongradient, 673Hadamard, 51, 566, 592, 593, 676, 772positive semidefinite, 606innermatrix, 405vector, 50, 271, 276, 372, 431, 723,774vectorized matrix, 50, 601Kronecker, 118, 592–594, 665, 674, 675,692eigen, 594inverse, 376orthogonal, 651positive semidefinite, 402, 606projector, 749


INDEX 839pseudoinverse, 701trace, 594vector, 676vectorization, 593nullspace, 604, 634, 638, 710orthogonal, 647outer, 65permutation, 647positive definite, 598nonsymmetric, 598positive semidefinite, 605, 606, 609, 610nonsymmetric, 605zero, 628projector, 534, 723, 748pseudofractional, see functionpseudoinverse, 700Kronecker, 701range, 604, 634, 638, 710rank, 603, 604tensor, 674trace, see tracezero, 628program, 83, 273convex, see problemgeometric, 8, 277integer, 69, 83, 348, 461linear, 83, 88–90, 162, 226, 238, 273,275–278, 414, 781dual, 162, 275prototypical, 163, 275nonlinearconvex, 273quadratic, 277, 278, 484second-order cone, 277, 278semidefinite, 163, 226, 240, 273–275,277, 278, 358, 781dual, 163, 275equality constraint, 116Gram form, 581prototypical, 163, 275, 289, 307, 781Schur-form, 235, 248, 249, 268, 485,568, 580, 581, 654, 742projection, 53, 699I − P, 708, 740algebra, 715alternating, 413, 748, 749angle, 755convergence, 753, 755, 757, 760distance, 751, 752Dykstra algorithm, 748, 761, 762feasibility, 751–753iteration, 748, 763on affine/psd cone, 758on EDM cone, 580on halfspaces, 750on orthant/hyperplane, 754, 755optimization, 751, 752, 761over/under, 756biorthogonal expansion, 716coefficient, 186, 517, 722, 728cyclic, 750direction, 709, 713, 724nonorthogonal, 706, 707, 722dual, see dual, projectioneasy, 743Euclidean, 136, 549, 552, 556, 558, 566,578, 734matrix, see matrixminimum-distance, 174, 707, 710, 730,733nonorthogonal, 406, 565, 703, 706, 707,722eigenvalue coefficients, 723oblique, 227, 228, 554, see projection,nonorthogonalof convex set, 50of hypercube, 52of matrix, 729, 731of origin, 227, 228, 703, 719of polyhedron, 147of PSD cone, 120on 1-norm ball, 745on affine, 50, 715, 720hyperplane, 719, 721of origin, 227, 228, 703, 719vertex-description, 720on basisnonorthogonal, 716orthogonal, 715on box, 744


840 INDEXon cardinality k , 744on complement, 708, 740on cone, 738, 740, 745dual, 96, 156, 740EDM, 566, 567, 578, 583, 584Lorentz (second-order), 745monotone nonnegative, 474, 484,560–562, 745polyhedral, 745simplicial, 745truncated, 742on cone, PSD, 136, 555, 557, 558, 744cardinality constrained, 369geometric center subspace, 534rank constrained, 251, 558, 564, 657rank one, 369on convex set, 174, 733–735in affine subset, 745in subspace, 745, 746on convex sets, 750on dyad, 722, 725on ellipsoid, 353on Euclidean ball, 744on function domain, 243on geometric center subspace, 534, 731on halfspace, 721on hyperbox, 744on hypercube, 744on hyperplane, 719, 721, see projection,on affineon hypersphere, 744on intersection, 745on matricesrank constrained, 745on matrix, 722, 724on origin, 739, 745on orthant, 561, 741, 744in subspace, 561on range, 702on rowspace, 55on simplex, 745on slab, 721on spectral norm ball, 745on subspace, 50, 710Cartesian, 744elementary, 718hollow symmetric, 556, 743matrix, 729orthogonal complement, 96, 156, 739polyhedron, 147symmetric matrices, 743on vectorcardinality k , 744nonorthogonal, 722orthogonal, 724one-dimensional, 724order of, 548, 550, 745orthogonal, 702, 709, 724eigenvalue coefficients, 726semidefiniteness test as, 727range/rowspace, 733spectral, 527, 559, 562norm, 565, 745unique, 562successive, 750two sided, 729, 731, 733umbral, 50unique, 707, 710, 730, 733–735vectorization, 517, 731projectorbiorthogonal, 713characteristic, 704, 707, 708, 711, 712commutative, 748non, 534, 745, 749complement, 709convexity, 707, 712, 733, 735direction, 709, 713, 724nonorthogonal, 706, 707, 722dyad, 635, 704, 722Kronecker, 749linear operator, 55, 707, 712, 715, 735nonorthogonal, 406, 707, 714, 722orthogonal, 710orthonormal, 713product, 534, 723, 748range, 710semidefinite, 610, 711prototypicalcomplementarity problem, 207compressed sensing, 347


INDEX 841coordinate system, 156linear program, 275SDP, 163, 275, 289, 307, 781signal, 372pyramid, 493, 494− Q − , 647quadrant, 37quadratic, see functionquadrature, 436quartix, 672, 781quasi-, see functionquotientHadamard, 690, 697, 772Rayleigh, 728− R −, 85range, 63, 85, 730, 777dimension, 55doublet, 639, 640dyad, 634, 635elementary matrix, 640, 641form, 85idempotent, 703, 704orthogonal complement, 85product, 604, 634, 638, 710pseudoinverse, 699, 700rotation, 648rank, 603, 604, 621, 622, 784-ρ subset, 121, 137, 556, 558–565, 583,589constraint, see constraintconvex envelope, see convexconvex iteration, see convexdiagonalizable, 603dimension, 141, 447, 448, 784affine, 449, 575full, 86, 298heuristic, 572, 575inequality, 607log det, 574, 575lowest, 309minimization, 308, 355, 387, 390, 575,583cardinality connection, 310cardinality constrained, 369fat, 384indefinite, 384one, 631, 634Boolean, 295, 461convex iteration, 355, 390, 587hull, see hull, convexmodification, 636, 640PSD, 127, 609, 610, 636subset, 251symmetric, 127, 609, 610, 636update, see rank, one, modificationpartitioned matrix, 614positive semidefinite cone, 136, 279face, 122–126Précis, 448product, 603, 604projection matrix, 708, 712quasiconcavity, 135reduction, 278, 299, 300, 305procedure, 300regularization, 332, 355, 365, 389, 390,575, 587Schur-form, 614, 615sum, 135positive semidefinite, 135, 603trace heuristic, 310–312, 384, 570–573,655ray, 96boundary of, 96cone, 96, 100, 101, 103, 142, 151boundary, 107EDM, 111positive semidefinite, 109extreme, 107, 514interior, 96Rayleigh quotient, 728realizable, 22, 412, 554, 555reconstruction, 476isometric, 420, 480isotonic, 479–482, 484, 486list, 476unique, 399, 421, 422, 438, 439recursion, 381, 539, 591, 609


reflection, 436, 437, 664
  invariance, 436
  vector, 648
reflexivity, 103, 606
regular
  lattice, 30, 319
  simplex, 492
  tetrahedron, 489, 502
regularization
  cardinality, 345, 370
  norm, 364, 702
  rank, 332, 355, 365, 389, 390, 575, 587
relaxation, 70, 83, 230, 294, 297, 419, 461, 483
reweighting, 336, 574
Riemann, 278
robotics, 25, 31
root, see function, fractional
rotation, 436, 649, 663
  invariance, 436
  of matrix, 650
  of range, 648
  vector, 132, 436, 647
round, 784
Rutman, 164

− S −, 151
saddle value, 161, 162
scaling, 476
  multidimensional, 24, 476
  ordinal, 481
  unidimensional, 576
Schoenberg, 395, 406, 458, 465, 467, 483, 498, 554, 556, 721
  auxiliary matrix, 404, 644
  criterion, 30, 407, 471, 541, 729
Schur, 595
  -form, 611–615
    anomaly, 249
    constraint, 70
    convex set, 117
    norm, Frobenius, 248
    norm, spectral, 268, 654
    nullspace, 613
    rank, 614, 615
    semidefinite program, 235, 248, 249, 268, 485, 568, 580, 581, 654, 742
    sparse, 614
  complement, 324, 419, 611, 612, 614
Schwarz, 88, 753
self-distance, 397
semidefinite program, see program
sensor, 315, 316
  network localization, 315, 316, 417, 424
sequence, 754
set
  active, 307
  Cartesian product, 47
  closed, 36, 39–41, 72–74, 183
    cone, 114, 119, 144, 148, 163, 164, 170, 183, 284–286, 499
    exposed, 511
    polyhedron, 144, 147, 148
  connected, 36
  convex, 36
    invariance, 46, 47
    projection on, 174, 733–735
  difference, 47, 163
  empty, 38, 39, 46, 653
    affine, 38
    closed, 41
    cone, 100
    face, 92
    hull, 62, 65, 72
    open, 41
  feasible, 45, 70, 173, 203, 222, 274, 781
    dual, 284
    primal, 278, 284
  intersection, see intersection
  invariance, 467
  level, 204, 238, 240, 250, 259, 271, 274
    affine, 240
  minimal, 64, 77, 78, 144, 777
    affine, 720
    extreme, 110, 185
    halfspace, 145
    hyperplane, 145
    orthant, 221
    positive semidefinite cone, 115, 180, 263
  nonconvex, 40, 97–101
  nullspace, 178
  open, 36, 38–41, 169
    and closed, 36, 41
  orthogonal, 178
  projection of, 50
  Schur-form, 117
  smooth, 459
  solution, 222, 781
  sublevel, 204, 242, 251, 258, 265, 270
    convex, 243, 259, 612
    quasiconvex, 270
  sum, 46, 772
  superlevel, 243, 270
  union, 46, 96, 98, 100, 125
Shannon, 372
shape, 23
Sherman-Morrison, 636
shift, see invariance, translation
shroud, 437, 438, 515
  cone, 181, 543
SIAM, 274
signal
  dropout, 341
  processing
    digital, 341, 372, 385, 476
simplex, 151–153
  area, 491
  content, 493
  method, 332, 414
  nonnegative, 346, 347
  regular, 492
  unit, 151, 152
  volume, 493
singular value, 621, 654
  σ, 592, 775
  decomposition, 620–625, 655
    compact, 620
    diagonal, 668
    full, 622
    geometrical, 623
    pseudoinverse, 624
    subcompact, 621
    symmetric, 625, 667
  eigenvalue, 621, 624, 625
  inequality, 70, 666
  inverse, 624
  largest, 70, 268, 654, 745, 786
  normal matrix, 53, 621
  sum of, 53, 67, 654, 655
  symmetric matrix, 624, 625
  triangle inequality, 67
skinny, 191, 769
slab, 38, 324, 721
slack
  complementary, 290
  variable, 275, 284, 307
Slater, 162, 216, 285, 289, 290
slice, 129, 182, 425, 680, 681
slope, 251
solid, 147, 152, 489
solution
  analytical, 248, 273, 653
  feasible, 83, 173, 285, 751–755, 781
    dual, 289, 290
    strictly, 285, 289
  numerical, 330
  optimal, 83, 88–90, 222, 781
  problem statement as, 7, 273
  set, 222, 781
  trivial, 37
  unique, 45, 221, 222, 419, 420, 565, 567, 573, 583
  vertex, 83, 230, 359, 414
sort
  function, 231, 334, 559, 563, 585, 775
  index matrix, 481
  largest entries, 231
  monotone nonnegative, 561
  smallest entries, 231
span, 63, 777
sparsity, 225, 227–230, 233, 292, 298, 333, 334, 337–340, 351–353, 369, 372
  gradient, 373
  nonnegative, 229, 339, 346–350
spectral
  cone, 470, 472, 559
    dual, 475
    orthant, 562
  inequality, 70, 470
  norm, 70, 268, 552, 565, 624, 634, 641, 642, 654, 745, 786
    ball, 70, 745
    dual, 655, 786
    inequality, 70, 786
    Schur-form, 268, 654
  projection, see projection
sphere packing, 413
steepest descent, 251, 680
strict
  complementarity, 291
  feasibility, 285, 289
  positivity, 451
  triangle inequality, 454
Sturm, 631
subject to, 783
submatrix
  principal, 124, 417, 487, 492, 509, 609
    leading, 124, 452, 454
subspace, 36
  algebra, 36, 85, 86, 634
  antihollow, 59
    antisymmetric, 59, 60, 549
  antisymmetric, 56
  Cartesian, 336, 744
  complementary, see complement
  fundamental, 49, 85, 86, 635, 639, 640, 700, 710
  geometric center, 439, 440, 498, 529, 532–534, 570, 629, 731
    orthogonal complement, 121, 435, 529, 731
  hollow, 58, 440, 533
  intersection, 88
    hyperplane, 86
  nullspace basis span, 87
  proper, 36
  representation, 85
  symmetric hollow, 58
  tangent, 121
  translation invariant, 435, 441, 529, 731
  vectorized, 90, 92
successive
  approximation, 750
  projection, 750
sum, 46
  of eigenvalues, 53, 602
  of functions, 224, 255, 272
  of matrices
    eigenvalues, 608
    nullspace, 90
    rank, 135, 603
  of singular values, 53, 67, 654, 655
  vector, 46, 637, 772
    orthogonal, 773
    unique, 56, 156, 637, 772, 773
supremum, 64, 272, 653–656, 783
  of affine functions, 241, 655
  of convex functions, 268, 272
surjective, 54, 439, 441, 443, 560, 777
  linear, 443, 444, 461, 463
svec, 57, 275, 785
Swiss roll, 26
symmetry, 397

− T −, 769
tangent
  line, 43, 419
  subspace, 121
tangential, 43, 82, 420
Taylor series, 679, 685–687
tensor, 672–675
tesseract, 52
tetrahedron, 151–153, 346, 491
  angle inequality, 489
  regular, 489, 502
Theobald, 603
theorem
  0 eigenvalues, 626
  alternating projection
    distance, 753
  alternative, 170–173, 179
    EDM, 544
    semidefinite, 287
    weak, 172
  Barvinok, 139
  Bunt-Motzkin, 734
  Carathéodory, 168, 726
  compressed sensing, 339, 372
  cone
    faces, 106
    intersection, 102
  conic coordinates, 213
  convexity condition, 256
  decomposition
    dyad, 631
  directional derivative, 681
  discretized membership, 176
  dual cone intersection, 209
  duality, weak, 288
  EDM, 461
  eigenvalue
    of difference, 608
    of sum, 608
    order, 607
    zero, 626
  elliptope vertices, 461
  exposed, 111
  extreme existence, 92
  extremes, 110
  Farkas’ lemma, 168, 171
    not positive definite, 287
    positive definite, 286
    positive semidefinite, 284
  fundamental
    algebra, 617
    algebra, linear, 85
    convex optimization, 274
  generalized inequality and membership, 168
  Geršgorin discs, 132
  gradient monotonicity, 255
  Hahn-Banach, 74, 79, 84
  halfspaces, 74
  Hardy-Littlewood-Pólya, 560
  hypersphere, 461
  inequalities, 560
  intersection, 46
  inverse image, 47, 51
    closedness, 183
  Klee, 110
  line, 266, 272
  linearly independent dyads, 637
  majorization, 595
  mean value, 686
  Minkowski, 148
  monotone nonnegative sort, 561
  Moreau decomposition, 206, 740
  Motzkin, 734
    transposition, 173
  nonexpansivity, 743
  pointed cones, 103
  positive semidefinite, 600
    cone subsets, 136
    matrix sum, 135
    principal submatrix, 609
    symmetric, 610
  projection
    algebraic complement, 741
    minimum-distance, 734, 735
    on affine, 715
    on cone, 738
    on convex set, 734, 735
    on PSD geometric intersection, 534
    on subspace, 50
    via dual cone, 741
    via normal cone, 735
  projector
    rank/trace, 712
    semidefinite, 610
  proper-cone boundary, 106
  Pythagorean, 360, 431, 747
  range of dyad sum, 638
  rank
    affine dimension, 449
    partitioned matrix, 614
    Schur-form, 614, 615
    trace, 708
  real eigenvector, 617
  sparse sampling, 339, 372
  sparsity, 229
  Tour, 465
  Weyl, 148
    eigenvalue, 608
tight, 399, 782
trace, 238, 591, 592, 654, 656, 693
  tr, 606, 784
  /rank gap, 571
  eigenvalues, 602
  heuristic, 310–312, 384, 570–573, 655
  inequality, 606, 607
  Kronecker, 594
  maximization, 527
  minimization, 310–312, 384, 419, 570, 572, 573, 613, 655
  of product, 51, 603, 606
  product, 594, 606
  projection matrix, 708
  zero, 628
trajectory, 467, 521
  sensor, 423
transformation
  affine, 47, 49, 79, 183, 255, 272
    positive semidefinite cone, 117, 118
  bijective, see bijective
  Fourier, discrete, 54, 375
  injective, see injective
  invertible, 49, 53, 55
  linear, 37, 49, 79, 142, 700
    cone, 102, 106, 144, 148, 170, 183, 185
    cone, dual, 164, 169, 540
    inverse, 49, 499, 700
    polyhedron, 144, 148, 170
  rigid, 23, 420
  similarity, 727
  surjective, see surjective
transitivity
  face, 95
  order, 103, 116, 606
trefoil, 525, 526, 528
triangle, 398
  inequality, 397, 398, 431, 451–458, 464, 487–492, 501, 502
    norm, 225
    strict, 454
    unique, 487
    vector, 225
triangulation, 23
trilateration, 23, 45, 418
  tandem, 424
Tucker, 171, 206, 557

− U −, 65
unbounded below, 83, 88, 89, 172
underdetermined, 387, 770
unfolding, 525, 529
unfurling, 525
unimodal, 269, 270
unraveling, 525, 526, 528
untieing, 526, 528

− V −, 404
value
  absolute, 225, 262
  minimum
    unique, 222
  objective, 288, 304
    unique, 275
variable
  dual, 160, 206, 208, 216, 775
  matrix, 77
  nonnegative, see constraint
  slack, 275, 284, 307
variation
  calculus, 173
  total, 379
vec, 51, 376, 592, 593, 785
vector, 36, 781
  binary, 62, 297, 362, 459
  direction, 308–313, 334–337
    interpretation, 335, 336, 572
  dual, 761
  normal, 76, 734, 765
  optimization, 104, 105, 222–224, 332, 380
  orthogonal complement, 779
  parallel, 713
  Perron, 469
  primal, 761
  product
    inner, 50, 271, 276, 372, 431, 601, 723, 774
    Kronecker, 676
  quadrature, 436
  reflection, 648
  rotation, 132, 436, 647
  space, 36, 396, 778
  standard basis, 58, 151, 406, 783
vectorization, 51
  projection, 517
  symmetric, 57
    hollow, 60
Venn
  EDM, 550, 551
  programs, 277
  sets, 149
vertex, 41, 45, 69, 93, 94, 228
  cone, 103
  description, 71, 72, 110
    of affine, 87
    of dual cone, 194, 208
    of halfspace, 145
    of hyperplane, 77
    of polyhedral cone, 70, 150, 170, 185
    of polyhedron, 64, 150
    projection on affine, 720
  elliptope, 295, 460, 461
  polyhedron, 69, 149, 228
  solution, 83, 230, 359, 414
Vitulli, 633
von Neumann, 580, 666, 749, 750
vortex, 508

− W −, 308
Wıκımization, 33, 140, 174, 207, 208, 300, 345, 362, 363, 369, 383, 393, 400, 407, 410, 480, 482, 483, 631, 745
wavelet, 372, 374
wedge, 107, 108, 165
Weyl, 148, 608
Wiener, 750
wireless location, 24, 315, 425
Wolkowicz, 665
womb, 670
Wuthrich, 24

− X −, 779

− Y −, 266
Young, 556, 558, 589

− Z −, 76
zero, 625
  definite, 630
  entry, 626
  gradient, 174, 251, 252, 730
  matrix product, 628
  norm-, 625
  trace, 628


A searchable color pdfBook,
Dattorro, Convex Optimization & Euclidean Distance Geometry,
Mεβoo Publishing USA, 2005, v2010.10.26,
is available online from Meboo Publishing.


Convex Optimization & Euclidean Distance Geometry, Dattorro
Mεβoo Publishing USA
