Dattorro

CONVEX OPTIMIZATION & EUCLIDEAN DISTANCE GEOMETRY

Mεβoo Publishing


Convex Optimization & Euclidean Distance Geometry


EDM^N = S^N_h ∩ (S^{N⊥}_c − S^N_+)


Convex Optimization & Euclidean Distance Geometry
Jon Dattorro
Meboo Publishing USA


Meboo Publishing USA
345 Stanford Shopping Ctr., Suite 410
Palo Alto, California 94304-1413

Jon Dattorro, Convex Optimization & Euclidean Distance Geometry, Meboo, 2005.
ISBN 0-9764013-0-4

printed book available as conceived in color, version 06.24.2006

cybersearch:
I. convex optimization
II. convex cones
III. convex geometry
IV. distance geometry
V. distance matrix

programs and graphics by Matlab; typesetting by LaTeX, with donations from SIAM and AMS.

This searchable electronic color pdfBook is click-navigable within the text by page, section, subsection, chapter, theorem, example, definition, cross reference, citation, equation, figure, table, and hyperlink. A pdfBook has no electronic copy protection and can be read and printed by most computers. The publisher hereby grants the right to reproduce this work in any format but limited to personal use.

© 2005 Jon Dattorro. All rights reserved.


for Jennie Columba ♦ Antonio ♦♦ & Sze Wan


Acknowledgements

Reflecting on help given me all along the way, I am awed by my heavy debt of gratitude:

Anthony, Teppo Aatola, Bob Adams, Chuck Bagnaschi, Jeffrey Barish, Keith Barr, Florence Barker, David Beausejour, Leonard Bellin, Dave Berners, Barry Blesser, Richard Bolt, Stephen Boyd, Merril Bradshaw, Gary Brooks, Jack Brown, Patty Brown, Joe Bryan, Barb Burk, John Calderone, Kathryn Calí, Jon Caliri, Terri Carter, Reginald Centracchio, Bonnie Charvez, Michael Chille, Jan Chomysin, John Cioffi, Robert Cogan, Tony, Sandra, Josie, and Freddy Coletta, Fred, Rita, and Kathy Copley, Gary Corsini, John, Christine, Donna, Sylvia, Ida, Dottie, Sheila, and Yvonne D’Attoro, Cosmo Dellagrotta, Robin DeLuca, Lou D’Orsaneo, Antonietta DeCotis, Donna DeMichele, Tom Destefano, Tom Dickie, Peggy Donatelli, Jeanne Duguay, Pozzi Escot, Courtney Esposito, Terry Fayer, Carol Feola, Valerie, Robin, Eddie, and Evelyn Ferruolo, Martin Fischer, Salvatore Fransosi, Stanley Freedman, Tony Garnett, Edouard Gauthier, Amelia Giglio, Chae Teok Goh, Gene Golub, Patricia Goodrich, Heather Graham, Robert Gray, David Griesinger, Robert Haas, George Heath, Bruce, Carl, John, Joy, and Harley, Michelle, Martin Henry, Mar Hershenson, Evan Holland, Leland Jackson, Rene Jaeger, Sung Dae Jun, Max Kamenetsky, Seymour Kass, Patricia Keefe, Christine Kerrick, James Koch, Mary Kohut, Paul Kruger, Helen Kung, Lynne Lamberis, Paul Lansky, Monique Laurent, Karen Law, Francis Lee, Scott Levine, Hal Lipset, Nando Lopez-Lezcano, Pat Macdonald, Ginny Magee, Ed Mahler, Sylvia, Frankie, and David Martinelli, Bill Mauchly, Jamie Jacobs May, Paul & Angelia McLean, Joyce, Al, and Tilly Menchini, Teresa Meng, Tom Metcalf, J. Andy Moorer, Walter Murray, John O’Donnell, Brad Osgood, Ray Ouellette, Sal Pandozzi, Bob Peretti, Bob Poirier, Manny Pokotilow, Michael Radentz, Timothy Rapa, Doris Rapisardi, Katharina Reissing, Flaviu, Nancy, and Myron Romanul, Marvin Rous, Carlo, Anthony, and Peters Russo, Eric Safire, Jimmy Scialdone, Anthony, Susan, and Esther Scorpio, John Senior, Robin Silver, Josh, Richard, and Jeremy Singer, Albert Siu-Toung, Julius Smith, Terry Stancil, Brigitte Strain, John Strawn, David Taraborelli, Robin Tepper, Susan Thomas, Stavros Toumpis, Georgette Trezvant, Rebecca, Daniel, Jack, and Stella Tuttle, Maryanne Uccello, Joe Votta, Julia Wade, Shelley Marged Weber, Jennifer Westerlind, Todd Wilson, Yinyu Ye, George & MaryAnn Young, Donna & Joe Zagami

From: Jon Dattorro, Convex Optimization & Euclidean Distance Geometry, Meboo Publishing USA, 2005.


Prelude

    The constant demands of my department and university and the ever increasing work needed to obtain funding have stolen much of my precious thinking time, and I sometimes yearn for the halcyon days of Bell Labs.
    −Steven Chu, Nobel laureate [50]

Convex Analysis is the calculus of inequalities while Convex Optimization is its application. Analysis is inherently the domain of the mathematician while Optimization belongs to the engineer.

There is a great race under way to determine which important problems can be posed in a convex setting. Yet, that skill acquired by understanding the geometry and application of Convex Optimization will remain more an art for some time to come; the reason being, there is generally no unique transformation of a given problem to its convex equivalent. This means, two researchers pondering the same problem are likely to formulate the convex equivalent differently; hence, one solution is likely different from the other for the same problem. Any presumption of only one right or correct solution becomes nebulous. Study of equivalence, sameness, and uniqueness therefore pervades the study of optimization.


There is tremendous benefit in transforming a given optimization problem to its convex equivalent, primarily because any locally optimal solution is then guaranteed globally optimal. Solving a nonlinear system, for example, by instead solving an equivalent convex optimization problem is therefore highly preferable. 0.1 Yet it can be difficult for the engineer to apply theory without understanding Analysis. Boyd & Vandenberghe’s book Convex Optimization [41] is a marvelous bridge between Analysis and Optimization; rigorous though accessible to nonmathematicians. 0.2

These pages comprise my journal over a six-year period, filling some remaining gaps between mathematician and engineer; they constitute a translation, unification, and cohering of about two hundred papers, books, and reports from several different areas of mathematics and engineering. Beacons of historical accomplishment are cited throughout. Extraordinary attention is paid to detail, clarity, accuracy, and consistency, accompanied by immense effort to eliminate ambiguity and verbosity. Consequently there is heavy cross-referencing and much background material provided in the text, footnotes, and appendices so as to be self-contained and to provide understanding of fundamental concepts.

Much of what is written here will not be found elsewhere. Material contained in Boyd & Vandenberghe is, for the most part, not duplicated here, although topics found particularly useful in the study of distance geometry are expanded or elaborated; e.g., matrix-valued functions in chapter 3 and calculus in appendix D, linear algebra in appendix A, convex geometry in chapter 2, nonorthogonal and orthogonal projection in appendix E, semidefinite programming in chapter 6, etcetera. In that sense, this book may be regarded a companion to Convex Optimization.

−Jon Dattorro
Stanford, California
2005

0.1 This is the idea motivating a form of convex optimization known as geometric programming [41, p.188] that has already driven great advances in the electronic circuit design industry. [26, §4.7] [161] [260] [263] [59] [112] [119] [120] [121] [122] [123] [124] [172] [173] [180] [204]

0.2 Their text was conventionally published in 2004 having been taught at Stanford University and made freely available on the world-wide web for ten prior years; a dissemination motivated by the belief that a virtual flood of new applications would follow by epiphany that many problems hitherto presumed nonconvex could be transformed or relaxed into convexity. [8] [9] [26, §4.3, p.316-322] [40] [56] [85] [86] [182] [196] [201] [242] [243] [267] [270] Course enrollment for Spring quarter 2005 at Stanford was 170 students.


Convex Optimization & Euclidean Distance Geometry

1 Overview

2 Convex geometry
  2.1 Convex set
    2.1.1 subspace
    2.1.2 linear independence
      2.1.2.1 preservation
    2.1.3 Orthant:
    2.1.4 affine set
    2.1.5 dimension
    2.1.6 empty set versus empty interior
      2.1.6.1 relative interior
    2.1.7 classical boundary
      2.1.7.1 Line intersection with boundary
      2.1.7.2 Tangential line intersection with boundary
    2.1.8 intersection, sum, difference, product
    2.1.9 inverse image
  2.2 Vectorized-matrix inner product
    2.2.1 Frobenius’
    2.2.2 Symmetric matrices
      2.2.2.1 Isomorphism on symmetric matrix subspace
    2.2.3 Symmetric hollow subspace
  2.3 Hulls
    2.3.1 Affine dimension, affine hull
    2.3.2 Convex hull
      2.3.2.1 Comparison with respect to R^N_+
    2.3.3 Conic hull
    2.3.4 Vertex-description
  2.4 Halfspace, Hyperplane
    2.4.1 Halfspaces H+ and H−
      2.4.1.1 PRINCIPLE 1: Halfspace-description
    2.4.2 Hyperplane ∂H representations
      2.4.2.1 Matrix variable
      2.4.2.2 Vertex-description of hyperplane
      2.4.2.3 Affine independence, minimal set
      2.4.2.4 Preservation of affine independence
      2.4.2.5 affine maps
      2.4.2.6 PRINCIPLE 2: Supporting hyperplane
      2.4.2.7 PRINCIPLE 3: Separating hyperplane
  2.5 Subspace representations
    2.5.1 Subspace or affine set
      2.5.1.1 ...as hyperplane intersection
      2.5.1.2 ...as span of nullspace basis
    2.5.2 Intersection of subspaces
  2.6 Extreme, Exposed
    2.6.1 Exposure
      2.6.1.1 Density of exposed points
      2.6.1.2 Face transitivity and algebra
      2.6.1.3 Boundary
  2.7 Cones
    2.7.1 Cone defined
    2.7.2 Convex cone
      2.7.2.1 cone invariance
      2.7.2.2 Pointed closed convex cone and partial order
  2.8 Cone boundary
    2.8.1 Extreme direction
      2.8.1.1 extreme distinction, uniqueness
      2.8.1.2 Generators
    2.8.2 Exposed direction
      2.8.2.1 Connection between boundary and extremes
      2.8.2.2 Converse caveat
  2.9 Positive semidefinite (PSD) cone
    2.9.0.1 Membership
    2.9.1 PSD cone is convex
    2.9.2 PSD cone boundary
      2.9.2.1 rank ρ subset of the PSD cone
      2.9.2.2 open rank-ρ subset
      2.9.2.3 Faces of PSD cone
      2.9.2.4 Extreme directions of the PSD cone
      2.9.2.5 PSD cone is generally not circular
      2.9.2.6 Boundary constituents of the PSD cone
      2.9.2.7 Projection on S^M_+(ρ)
      2.9.2.8 Uniting constituents
      2.9.2.9 Peeling constituents
    2.9.3 Barvinok’s proposition
  2.10 Conic independence (c.i.)
    2.10.1 Preservation of conic independence
      2.10.1.1 linear maps of cones
    2.10.2 Pointed closed convex K & conic independence
    2.10.3 Utility of conic independence
  2.11 When extreme means exposed
  2.12 Convex polyhedra
    2.12.1 Polyhedral cone
    2.12.2 Vertices of convex polyhedra
      2.12.2.1 Vertex-description of polyhedral cone
      2.12.2.2 Pointedness
    2.12.3 Unit simplex
      2.12.3.1 Simplex
    2.12.4 Converting between descriptions
  2.13 Dual cone & generalized inequality
    2.13.1 Dual cone
      2.13.1.1 Key properties of dual cone
      2.13.1.2 Examples of dual cone
    2.13.2 Abstractions of Farkas’ lemma
      2.13.2.1 Null certificate, Theorem of the alternative
    2.13.3 Optimality condition
    2.13.4 Discretization of membership relation
      2.13.4.1 Dual halfspace-description
      2.13.4.2 First dual-cone formula
    2.13.5 Dual PSD cone and generalized inequality
      2.13.5.1 self-dual cones
    2.13.6 Dual of pointed polyhedral cone
      2.13.6.1 Facet normal & extreme direction
    2.13.7 Biorthogonal expansion by example
      2.13.7.1 Pointed cones and biorthogonality
    2.13.8 Biorthogonal expansion, derivation
      2.13.8.1 x ∈ K
      2.13.8.2 x ∈ aff K
    2.13.9 Formulae, algorithm finding dual cone
      2.13.9.1 Pointed K, dual, X skinny-or-square
      2.13.9.2 Simplicial case
      2.13.9.3 Membership relations in a subspace
      2.13.9.4 Subspace M = aff K
      2.13.9.5 More pointed cone descriptions
    2.13.10 Dual cone-translate
      2.13.10.1 first-order optimality condition - restatement
    2.13.11 Proper nonsimplicial K, dual, X fat full-rank

3 Geometry of convex functions
  3.1 Convex function
    3.1.1 Vector-valued function
      3.1.1.1 Strict convexity
      3.1.1.2 Affine function
      3.1.1.3 Epigraph, sublevel set
      3.1.1.4 Gradient
      3.1.1.5 First-order convexity condition, real function
      3.1.1.6 First-order, vector-valued function
      3.1.1.7 Second-order, vector-valued function
      3.1.1.8 second-order ⇒ first-order condition
    3.1.2 Matrix-valued function
      3.1.2.1 First-order condition, matrix-valued function
      3.1.2.2 Epigraph of matrix-valued function
      3.1.2.3 Second-order, matrix-valued function
  3.2 Quasiconvex
    3.2.1 Differentiable and quasiconvex
  3.3 Salient properties

4 Euclidean Distance Matrix
  4.1 EDM
  4.2 First metric properties
  4.3 ∃ fifth Euclidean metric property
    4.3.1 Lookahead
  4.4 EDM definition
    4.4.0.1 homogeneity
    4.4.1 −V_N^T D(X) V_N convexity
    4.4.2 Gram-form EDM definition
      4.4.2.1 First point at origin
      4.4.2.2 0 geometric center
    4.4.3 Inner-product form EDM definition
      4.4.3.1 Relative-angle form
      4.4.3.2 Inner-product form −V_N^T D(Θ) V_N convexity
      4.4.3.3 Inner-product form, discussion
  4.5 Invariance
    4.5.1 Translation
      4.5.1.1 Gram-form invariance
    4.5.2 Rotation/Reflection
      4.5.2.1 Inner-product form invariance
    4.5.3 Invariance conclusion
  4.6 Injectivity of D & unique reconstruction
    4.6.1 Gram-form bijectivity
      4.6.1.1 Gram-form operator D inversion
    4.6.2 Inner-product form bijectivity
      4.6.2.1 Inversion of D(−V_N^T D V_N)
  4.7 Embedding in affine hull
    4.7.1 Determining affine dimension
      4.7.1.1 Affine dimension r versus rank
    4.7.2 Précis
    4.7.3 Eigenvalues of −V D V versus −V_N^† D V_N
  4.8 Euclidean metric versus matrix criteria
    4.8.1 Nonnegativity property 1
      4.8.1.1 Strict positivity
    4.8.2 Triangle inequality property 4
      4.8.2.1 Comment
      4.8.2.2 Strict triangle inequality
    4.8.3 −V_N^T D V_N nesting
      4.8.3.1 Tightening the triangle inequality
    4.8.4 Affine dimension reduction in two dimensions
  4.9 Bridge: Convex polyhedra to EDMs
    4.9.1 Geometric arguments
    4.9.2 Necessity and sufficiency
    4.9.3 Example revisited
  4.10 EDM-entry composition
    4.10.1 EDM by elliptope
  4.11 EDM indefiniteness
    4.11.1 EDM eigenvalues, congruence transformation
    4.11.2 Spectral cones
      4.11.2.1 Cayley-Menger versus Schoenberg
      4.11.2.2 Ordered eigenvalues
      4.11.2.3 Unordered eigenvalues
      4.11.2.4 Dual cone versus dual spectral cone
  4.12 List reconstruction
    4.12.1 x_1 at the origin. V_N
    4.12.2 0 geometric center. V
  4.13 Reconstruction examples
    4.13.1 Isometric reconstruction
    4.13.2 Isotonic reconstruction
      4.13.2.1 Isotonic map of the USA
      4.13.2.2 Isotonic solution with sort constraint
      4.13.2.3 Convergence
      4.13.2.4 Interlude
  4.14 Fifth property of Euclidean metric
    4.14.1 Recapitulate
    4.14.2 Derivation of the fifth
      4.14.2.1 Nonnegative determinant
      4.14.2.2 Beyond the fifth metric property
    4.14.3 Path not followed
      4.14.3.1 N = 4
      4.14.3.2 N = 5
      4.14.3.3 Volume of simplices
    4.14.4 Affine dimension reduction in three dimensions
      4.14.4.1 Exemplum redux

5 EDM cone
  5.0.1 gravity
  5.0.2 highlight
  5.1 Defining EDM cone
  5.2 Polyhedral bounds
  5.3 √EDM cone is not convex
  5.4 a geometry of completion
  5.5 EDM definition in 11^T
    5.5.1 Range of EDM D
    5.5.2 Boundary constituents of EDM cone
    5.5.3 Faces of EDM cone
      5.5.3.1 Isomorphic faces
      5.5.3.2 Extreme direction of EDM cone
      5.5.3.3 Smallest face
      5.5.3.4 Open question
  5.6 Correspondence to PSD cone S^{N−1}_+
    5.6.1 Gram-form correspondence to S^{N−1}_+
    5.6.2 EDM cone by elliptope
  5.7 Vectorization & projection interpretation
    5.7.1 Face of PSD cone S^N_+ containing V
    5.7.2 EDM criteria in 11^T
  5.8 Dual EDM cone
    5.8.1 Ambient S^N
      5.8.1.1 EDM cone and its dual in ambient S^N
      5.8.1.2 nonnegative orthant
      5.8.1.3 Dual Euclidean distance matrix criterion
      5.8.1.4 Nonorthogonal components of dual EDM
      5.8.1.5 Affine dimension complementarity
      5.8.1.6 EDM cone duality
      5.8.1.7 Schoenberg criterion is membership relation
    5.8.2 Ambient S^N_h
  5.9 Theorem of the alternative
  5.10 postscript

6 Semidefinite programming
  6.1 Conic problem
    6.1.1 Maximal complementarity
      6.1.1.1 Reduced-rank solution
      6.1.1.2 Coexistence of low- and high-rank solutions
      6.1.1.3 Previous work
      6.1.1.4 Later developments
  6.2 Framework
    6.2.1 Feasible sets
      6.2.1.1 A ∩ S^n_+ emptiness determination via Farkas’
      6.2.1.2 Theorem of the alternative for semidefinite
      6.2.1.3 boundary-membership criterion
    6.2.2 Duals
    6.2.3 Optimality conditions
  6.3 Rank reduction
    6.3.1 Posit a perturbation of X⋆
    6.3.2 Perturbation form
    6.3.3 Optimality of perturbed X⋆
    6.3.4 Final thoughts regarding rank reduction
      6.3.4.1 Inequality constraints
  6.4 Rank-constrained semidefinite program
    6.4.1 Sensor-Network Localization
      6.4.1.1 proposed standardized test
      6.4.1.2 problem statement
      6.4.1.3 convergence
      6.4.1.4 numerical solution
      6.4.1.5 localization conclusion
      6.4.1.6 postscript

7 EDM proximity
  7.0.1 Measurement matrix H
    7.0.1.1 Order of imposition
    7.0.1.2 Egregious input error under nonnegativity
  7.0.2 Lower bound
  7.0.3 Problem approach
  7.0.4 Three prevalent proximity problems
  7.1 First prevalent problem:
    7.1.1 Closest-EDM Problem 1, convex case
    7.1.2 Generic problem
      7.1.2.1 Projection on rank ρ subset of PSD cone
    7.1.3 Choice of spectral cone
      7.1.3.1 Orthant is best spectral cone for Problem 1
    7.1.4 Closest-EDM Problem 1, “nonconvex” case
      7.1.4.1 Significance
    7.1.5 Problem 1 in spectral norm, convex case
  7.2 Second prevalent problem:
    7.2.1 Convex case
      7.2.1.1 Equivalent semidefinite program, Problem 2
      7.2.1.2 Gram-form semidefinite program, Problem 2
    7.2.2 Minimization of affine dimension in Problem 2
      7.2.2.1 Rank minimization heuristic
      7.2.2.2 Applying trace rank-heuristic to Problem 2
      7.2.2.3 Rank-heuristic insight
      7.2.2.4 Rank minimization heuristic beyond convex
      7.2.2.5 Applying log det rank-heuristic to Problem 2
      7.2.2.6 Tightening this log det rank-heuristic
  7.3 Third prevalent problem:
    7.3.1 Convex case
      7.3.1.1 Equivalent semidefinite program, Problem 3
      7.3.1.2 Schur-form semidefinite program, Problem 3
      7.3.1.3 Gram-form semidefinite program, Problem 3
      7.3.1.4 Dual interpretation, projection on EDM cone
    7.3.2 Minimization of affine dimension in Problem 3
    7.3.3 Constrained affine dimension, Problem 3
      7.3.3.1 Cayley-Menger form
  7.4 Conclusion

A Linear algebra
  A.1 Main-diagonal δ operator, trace, vec
    A.1.1 Identities
  A.2 Semidefiniteness: domain of test
    A.2.1 Symmetry versus semidefiniteness
  A.3 Proper statements
    A.3.1 Semidefiniteness, eigenvalues, nonsymmetric
  A.4 Schur complement
    A.4.1 Semidefinite program via Schur
    A.4.2 Determinant
  A.5 eigen decomposition
    A.5.0.1 Uniqueness
    A.5.1 eigenmatrix
    A.5.2 Symmetric matrix diagonalization
      A.5.2.1 Positive semidefinite matrix square root
  A.6 Singular value decomposition, SVD
    A.6.1 Compact SVD
    A.6.2 Subcompact SVD
    A.6.3 Full SVD
    A.6.4 SVD of symmetric matrices
    A.6.5 Pseudoinverse by SVD
  A.7 Zeros
    A.7.1 0 entry
    A.7.2 0 eigenvalues theorem
    A.7.3 0 trace and matrix product
      A.7.3.1 an equivalence in nonisomorphic spaces
    A.7.4 Zero definite

B Simple matrices
  B.1 Rank-one matrix (dyad)
    B.1.0.1 rank-one modification
    B.1.0.2 dyad symmetry
    B.1.1 Dyad independence
      B.1.1.1 Biorthogonality condition, Range, Nullspace
  B.2 Doublet
  B.3 Elementary matrix
    B.3.1 Householder matrix
  B.4 Auxiliary V-matrices
    B.4.1 Auxiliary projector matrix V
    B.4.2 Schoenberg auxiliary matrix V_N
    B.4.3 Orthonormal auxiliary matrix V_W
    B.4.4 Auxiliary V-matrix Table
    B.4.5 More auxiliary matrices
  B.5 Orthogonal matrix
    B.5.1 Vector rotation
    B.5.2 Reflection
    B.5.3 Rotation of range and rowspace
    B.5.4 Matrix rotation
      B.5.4.1 bijection

C Some analytical optimal results
  C.1 properties of infima
  C.2 involving absolute value
  C.3 involving diagonal, trace, eigenvalues
  C.4 Orthogonal Procrustes problem
    C.4.1 Effect of translation
      C.4.1.1 Translation of extended list
  C.5 Two-sided orthogonal Procrustes
    C.5.0.1 Minimization
    C.5.0.2 Maximization
    C.5.1 Procrustes’ relation to linear programming
    C.5.2 Two-sided orthogonal Procrustes via SVD
      C.5.2.1 Symmetric matrices
      C.5.2.2 Diagonal matrices

D Matrix calculus
  D.1 Directional derivative, Taylor series
    D.1.1 Gradients
    D.1.2 Product rules for matrix-functions
      D.1.2.1 Kronecker product
    D.1.3 Chain rules for composite matrix-functions
      D.1.3.1 Two arguments
    D.1.4 First directional derivative
      D.1.4.1 Interpretation directional derivative
    D.1.5 Second directional derivative
    D.1.6 Taylor series
    D.1.7 Correspondence of gradient to derivative
      D.1.7.1 first-order
      D.1.7.2 second-order
  D.2 Tables of gradients and derivatives
    D.2.1 Algebraic
    D.2.2 Trace Kronecker
    D.2.3 Trace
    D.2.4 Log determinant
    D.2.5 Determinant
    D.2.6 Logarithmic
    D.2.7 Exponential

E Projection
  E.0.1 Logical deductions
  E.1 Idempotent matrices
    E.1.1 Biorthogonal characterization
    E.1.2 Idempotence summary
  E.2 I − P, Projection on algebraic complement
    E.2.1 Universal projector characteristic
  E.3 Symmetric idempotent matrices
    E.3.1 Four subspaces
    E.3.2 Orthogonal characterization
    E.3.3 Summary, symmetric idempotent
    E.3.4 Orthonormal decomposition
      E.3.4.1 Unifying trait of all projectors: direction
      E.3.4.2 orthogonal projector, orthonormal
      E.3.4.3 orthogonal projector, biorthogonal
      E.3.4.4 nonorthogonal projector, biorthogonal
  E.4 Algebra of projection on affine subsets
  E.5 Projection examples
  E.6 Vectorization interpretation
    E.6.1 Nonorthogonal projection on a vector
    E.6.2 Nonorthogonal projection on vectorized matrix
      E.6.2.1 Nonorthogonal projection on dyad
    E.6.3 Orthogonal projection on a vector
    E.6.4 Orthogonal projection on a vectorized matrix
      E.6.4.1 Orthogonal projection on dyad
      E.6.4.2 Positive semidefiniteness test as projection
      E.6.4.3 PXP ≽ 0
  E.7 on vectorized matrices of higher rank
    E.7.1 PXP misinterpretation for higher-rank P
    E.7.2 Orthogonal projection on matrix subspaces
  E.8 Range/Rowspace interpretation
  E.9 Projection on convex set
    E.9.1 Dual interpretation of projection on convex set
      E.9.1.1 Dual interpretation as optimization
      E.9.1.2 Dual interpretation of projection on cone
    E.9.2 Projection on cone
      E.9.2.1 Relation to subspace projection
      E.9.2.2 Salient properties: Projection on cone
    E.9.3 nonexpansivity
    E.9.4 Easy projections
    E.9.5 Projection on convex set in subspace
  E.10 Alternating projection
    E.10.0.1 commutative projectors
    E.10.0.2 noncommutative projectors
    E.10.0.3 the bullets
    E.10.1 Distance and existence
    E.10.2 Feasibility and convergence
      E.10.2.1 Relative measure of convergence
    E.10.3 Optimization and projection
      E.10.3.1 Dykstra’s algorithm
      E.10.3.2 Normal cone
      E.10.3.3 speculation

F Proof of EDM composition
  F.1 EDM-entry exponential

G Matlab programs
  G.1 isedm()
    G.1.1 Subroutines for isedm()
      G.1.1.1 chop()
      G.1.1.2 Vn()
      G.1.1.3 Vm()
      G.1.1.4 signeig()
      G.1.1.5 Dx()
  G.2 conic independence, conici()
    G.2.1 lp()
  G.3 Map of the USA
    G.3.1 EDM, mapusa()
      G.3.1.1 USA map input-data decimation, decimate()
    G.3.2 EDM using ordinal data, omapusa()
  G.4 Rank reduction subroutine, RRf()
    G.4.1 svect()
    G.4.2 svectinv()
  G.5 Sturm’s procedure

H Notation and a few definitions

Bibliography

Index


List of Figures

1 Overview
  1 Orion nebula
  2 Application of trilateration is localization
  3 Molecular conformation
  4 Face recognition
  5 Swiss roll
  6 USA map reconstruction
  7 Honeycomb

2 Convex geometry
  8 Slab
  9 Intersection of line with boundary
  10 Tangentials
  11 Convex hull of a random list of points
  12 A simplicial cone
  13 Fantope
  14 Hulls
  15 Hyperplane illustrated ∂H is a line partially bounding
  16 Hyperplanes in R^2
  17 Affine independence
  18 {z ∈ C | a^T z = κ_i}
  19 Hyperplane supporting closed set
  20 Closed convex set illustrating exposed and extreme points
  21 Two-dimensional nonconvex cone
  22 Nonconvex cone made from lines
  23 Nonconvex cone is convex cone boundary
  24 Cone exterior is convex cone
  25 Truncated nonconvex cone X
  26 Not a cone
  27 K is a pointed polyhedral cone having empty interior
  28 Exposed and extreme directions
  29 Positive semidefinite cone
  30 Convex Schur-form set
  31 Projection of the PSD cone
  32 Circular cone showing axis of revolution
  33 Circular section
  34 Polyhedral inscription
  35 Conically (in)dependent vectors
  36 Pointed six-faceted polyhedral cone and its dual
  37 Minimal set of generators for halfspace about origin
  38 Unit simplex
  39 Two views of a simplicial cone and its dual
  40 Two equivalent constructions of dual cone
  41 Dual polyhedral cone construction by right-angle
  42 K is a halfspace about the origin
  43 Orthogonal cones
  44 Blades K and K*
  45 Shrouded polyhedral cone
  46 Simplicial cone K in R^2 and its dual
  47 Monotone nonnegative cone K_M+ and its dual
  48 Monotone cone K_M and its dual
  49 Two views of monotone cone K_M and its dual
  50 First-order optimality condition

3 Geometry of convex functions
  51 Convex functions having unique minimizer
  52 Affine function
  53 Supremum of affine functions
  54 Gradient in R^2 evaluated on grid
  55 Quadratic function convexity in terms of its gradient
  56 Plausible contour plot
  57 Iconic quasiconvex function

4 Euclidean Distance Matrix
  58 Convex hull of three points
  59 Complete dimensionless EDM graph
  60 Fifth Euclidean metric property
  61 Arbitrary hexagon in R^3
  62 Kissing number
  63 Trilateration
  64 This EDM graph provides unique isometric reconstruction
  65 Two sensors • and three anchors ◦
  66 Two discrete linear trajectories of sensors
  67 Coverage in cellular telephone network
  68 Example of molecular conformation
  69 Orthogonal complements in S^N abstractly oriented
  70 Elliptope E^3
  71 Elliptope E^2 interior to S^2_+
  72 Minimum eigenvalue of −V_N^T D V_N
  73 Some entrywise EDM compositions
  74 Map of United States of America
  75 Largest ten eigenvalues of −V_N^T O V_N
  76 Relative-angle inequality tetrahedron
  77 Nonsimplicial pyramid in R^3

5 EDM cone
  78 Relative boundary of cone of Euclidean distance matrices
  79 Intersection of EDM cone with hyperplane
  80 Neighborhood graph
  81 Trefoil knot untied
  82 Trefoil ribbon
  83 Example of V_X selection to make an EDM
  84 Vector V_X spirals
  85 Three views of translated negated elliptope
  86 Halfline T on PSD cone boundary
  87 Vectorization and projection interpretation example
  88 Orthogonal complement of geometric center subspace
  89 EDM cone construction by flipping PSD cone
  90 Decomposing a member of polar EDM cone
  91 Ordinary dual EDM cone projected on S^3_h

6 Semidefinite programming
  92 Visualizing positive semidefinite cone in high dimension
  93 2-lattice of sensors and anchors for wireless location example
  94 3-lattice
  95 4-lattice
  96 5-lattice
  97 a 2-lattice solution for wireless location example
  98 a 3-lattice solution
  99 a 4-lattice solution
  100 a 5-lattice solution

7 EDM proximity
  101 Pseudo-Venn diagram
  102 Elbow placed in path of projection

A Linear algebra
  103 Geometrical interpretation of full SVD

B Simple matrices
  104 Four fundamental subspaces of any dyad
  105 Four fundamental subspaces of a doublet
  106 Four fundamental subspaces of elementary matrix
  107 Gimbal

D Matrix calculus
  108 Convex quadratic bowl in R^2 × R

E Projection
  109 Nonorthogonal projection of x ∈ R^3 on R(U) = R^2
  110 Biorthogonal expansion of point x ∈ aff K
  111 Dual interpretation of projection on convex set
  112 Projection product on convex set in subspace
  113 von Neumann-style projection of point b
  114 Alternating projection on two halfspaces
  115 Distance, optimization, feasibility
  116 Alternating projection on nonnegative orthant and hyperplane
  117 Geometric convergence of iterates in norm
  118 Distance between PSD cone and iterate in A
  119 Dykstra’s alternating projection algorithm
  120 Polyhedral normal cones
  121 Normal cone to elliptope


List of Tables

2 Convex Geometry
  Table 2.9.2.3.1, rank versus dimension of S^3_+ faces
  Table 2.10.0.0.1, maximum number c.i. directions
  Cone Table 1
  Cone Table S
  Cone Table A
  Cone Table 1*

6 Semidefinite programming
  Faces of S^3_+ correspond to faces of S^3_+

B Simple matrices
  Auxiliary V-matrix Table B.4.4

D Matrix calculus
  Table D.2.1, algebraic gradients and derivatives
  Table D.2.2, trace Kronecker gradients
  Table D.2.3, trace gradients and derivatives
  Table D.2.4, log determinant gradients and derivatives
  Table D.2.5, determinant gradients and derivatives
  Table D.2.7, exponential gradients and derivatives


Chapter 1

Overview

Convex Optimization
Euclidean Distance Geometry

    People are so afraid of convex analysis.
    −Claude Lemaréchal, 2003

In layman’s terms, the mathematical science of Optimization is the study of how to make a good choice when confronted with conflicting requirements. The qualifier convex means: when an optimal solution is found, then it is guaranteed to be a best solution; there is no better choice.

Any convex optimization problem has geometric interpretation. If a given optimization problem can be transformed to a convex equivalent, then this interpretive benefit is acquired. That is a powerful attraction: the ability to visualize geometry of an optimization problem. Conversely, recent advances in geometry and in graph theory hold convex optimization within their proofs’ core. [270] [213]

This book is about convex optimization, convex geometry (with particular attention to distance geometry), geometrical problems, and problems that can be transformed into geometrical problems.


Figure 1: Orion nebula. (astrophotography by Massimo Robberto)

Euclidean distance geometry is, fundamentally, a determination of point conformation (configuration, relative position or location) by inference from interpoint distance information. By inference we mean: e.g., given only distance information, determine whether there corresponds a realizable conformation of points; a list of points in some dimension that attains the given interpoint distances. Each point may represent simply location or, abstractly, any entity expressible as a vector in finite-dimensional Euclidean space.

It is a common misconception to presume that some desired point conformation cannot be recovered in the absence of complete interpoint distance information. We might, for example, realize a constellation given only interstellar distance (or, equivalently, distance from Earth and relative angular measurement; the Earth as vertex to two stars). At first it may seem O(N^2) data is required, yet there are many circumstances where this can be reduced to O(N).
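To make interpoint distance information concrete, here is a minimal Matlab sketch (Matlab being the language of the book’s appendix G programs, though this fragment is illustrative and not drawn from them) that assembles the matrix of squared interpoint distances for a hypothetical list of N points stored columnwise in X:

% Assemble the matrix of squared interpoint distances for N points
% in R^n stored as the columns of X. All sizes here are hypothetical.
n = 3; N = 5;
X = randn(n, N);                           % example list of points
d = sum(X.^2, 1);                          % squared norm of each point
D = d'*ones(1,N) + ones(N,1)*d - 2*X'*X;   % D(i,j) = ||x_i - x_j||^2
% D is symmetric and hollow (zero main diagonal) with nonnegative entries.

Such a D is a Euclidean distance matrix by construction; the central question of the book runs the other way: deciding whether a given candidate matrix could have arisen this way.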


Figure 2: Application of trilateration (§4.4.2.2.4) is localization (determining position) of a radio signal source in 2 dimensions; more commonly known by radio engineers as the process “triangulation”. In this scenario, anchors x̌_2, x̌_3, x̌_4 are illustrated as fixed antennae. [136] The radio signal source (a sensor • x_1) anywhere in affine hull of three antenna bases can be uniquely localized by measuring distance to each (dashed white arrowed line segments). Ambiguity of lone distance measurement to sensor is represented by circle about each antenna. So & Ye proved trilateration is expressible as a semidefinite program; hence, a convex optimization problem. [212]
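The caption credits So & Ye with expressing trilateration as a semidefinite program. A much more modest Matlab sketch, assuming noiseless squared distances and hypothetical anchor positions, is the classical linearization that subtracts pairs of squared-distance equations to cancel the quadratic term; this is not the So & Ye formulation, only an indication of why distances to three affinely independent anchors pin down a point in the plane:

% Planar trilateration by linearization (noiseless, hypothetical data).
A = [0 4 2; 0 0 3];                        % anchor positions as columns
x_true = [1; 1];                           % source position, for demo only
d = sum((A - x_true*ones(1,3)).^2, 1)';    % squared distance to each anchor
% Subtracting ||x-a_i||^2 = d_i from ||x-a_1||^2 = d_1 leaves a linear
% equation in x; stacking two such equations gives a 2-by-2 system.
M = 2*(A(:,2:3) - A(:,1)*ones(1,2))';      % rows are 2(a_i - a_1)'
b = d(1) - d(2:3) + sum(A(:,2:3).^2,1)' - sum(A(:,1).^2);
x = M \ b                                  % recovers x_true exactly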


Figure 3: [118] [69] Distance data collected via nuclear magnetic resonance (NMR) helped render this 3-dimensional depiction of a protein molecule. At the beginning of the 1980s, Kurt Wüthrich [Nobel laureate] developed an idea about how NMR could be extended to cover biological molecules such as proteins. He invented a systematic method of pairing each NMR signal with the right hydrogen nucleus (proton) in the macromolecule. The method is called sequential assignment and is today a cornerstone of all NMR structural investigations. He also showed how it was subsequently possible to determine pairwise distances between a large number of hydrogen nuclei and use this information with a mathematical method based on distance-geometry to calculate a three-dimensional structure for the molecule. [184] [255] [113]

If we agree that a set of points can have a shape (three points can form a triangle and its interior, for example, four points a tetrahedron), then we can ascribe shape of a set of points to their convex hull. It should be apparent: these shapes can be determined only to within a rigid transformation (a rotation, reflection, and a translation).

Absolute position information is generally lost, given only distance information, but we can determine the smallest possible dimension in which an unknown list of points can exist; that attribute is their affine dimension (a triangle in any ambient space has affine dimension 2, for example). In circumstances where fixed points of reference are also provided, it becomes possible to determine absolute position or location; e.g., Figure 2.
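Affine dimension is computable directly from distance data. A hedged Matlab fragment, reusing the matrix D built in the earlier sketch and constructing the Schoenberg auxiliary matrix V_N per its definition in appendix B.4.2:

% Affine dimension of the list behind an EDM D (cf. §4.7.1):
N  = size(D, 1);
Vn = [-ones(1,N-1); eye(N-1)] / sqrt(2);   % Schoenberg auxiliary matrix V_N
r  = rank(-Vn' * D * Vn)                   % affine dimension of the list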


Figure 4: This coarsely discretized triangulated algorithmically flattened human face (made by Kimmel & the Bronsteins [146]) represents a stage in machine recognition of human identity; called face recognition. Distance geometry is applied to determine discriminating features.

Geometric problems involving distance between points can sometimes be reduced to convex optimization problems. Mathematics of this combined study of geometry and optimization is rich and deep. Its application has already proven invaluable discerning organic molecular conformation by measuring interatomic distance along covalent bonds; e.g., Figure 3. [53] [235] [113] [255] [80] Many disciplines have already benefitted and simplified consequent to this theory; e.g., distance-based pattern recognition (Figure 4), localization in wireless sensor networks by measurement of intersensor distance along channels of communication, [32, §5] [265] [31] wireless location of a radio-signal source such as a cell phone by multiple measurements of signal strength, the global positioning system (GPS), and multidimensional scaling which is a numerical representation of qualitative data by finding a low-dimensional scale. Euclidean distance geometry has also found application in artificial intelligence: to machine learning by discerning naturally occurring manifolds in Euclidean bodies (Figure 5, §5.4.0.0.1), in Fourier spectra of kindred utterances [138], and in image sequences [251].


Figure 5: Swiss roll from Weinberger & Saul [251]. The problem of manifold learning, illustrated for N = 800 data points sampled from a “Swiss roll” ①. A discretized manifold is revealed by connecting each data point and its k = 6 nearest neighbors ②. An unsupervised learning algorithm unfolds the Swiss roll while preserving the local geometry of nearby data points ③. Finally, the data points are projected onto the two-dimensional subspace that maximizes their variance, yielding a faithful embedding of the original manifold ④.


by chapter

We study the pervasive convex Euclidean bodies and their various representations. In particular, we make convex polyhedra, cones, and dual cones more visceral through illustration in chapter 2, Convex geometry, and we study the geometric relation of polyhedral cones to nonorthogonal bases (biorthogonal expansion). We explain conversion between halfspace- and vertex-descriptions of convex cones, we motivate dual cones and provide formulae describing them, and we show how first-order optimality conditions or linear matrix inequalities [40] or established alternative systems of linear inequality can be explained by generalized inequalities with respect to convex cones and their duals. The conic analogue to linear independence, called conic independence, is introduced as a new tool in the study of cones; the logical next step in the progression: linear, affine, conic.

Any convex optimization problem can be visualized geometrically. Desire to visualize in high dimension is deeply embedded in the mathematical psyche. Chapter 2 provides tools to make visualization easier. The concepts of face, extreme point, and extreme direction of a convex Euclidean body are explained here, crucial to understanding convex optimization. The convex cone of positive semidefinite matrices, in particular, is studied in depth. We interpret, for example, inverse image of the positive semidefinite cone under affine transformation. We explain how higher-rank subsets of the positive semidefinite cone boundary united with its interior are convex, and we give a closed form for projection on those convex subsets.

Chapter 3, Geometry of convex functions, observes analogies between convex sets and functions: The set of all vector-valued convex functions of particular dimension is a closed convex cone. Included among the examples in this chapter, we show how the real affine function relates to convex functions as the hyperplane relates to convex sets. Here, also, pertinent results for multidimensional convex functions are presented that are largely ignored in the literature; tricks and tips for determining their convexity and discerning their geometry, particularly with regard to matrix calculus which remains largely unsystematized when compared with traditional (ordinary) calculus. Consequently, we collect some results of matrix differentiation in the appendices.


Figure 6: (original, reconstruction) About five thousand points along the borders constituting the United States were used to create an exhaustive matrix of interpoint distance for each and every pair of points in the ordered set (a list); called a Euclidean distance matrix. From that noiseless distance information, it is easy to reconstruct the map exactly via the Schoenberg criterion (487). (§4.13.1.0.1, confer Figure 74) Map reconstruction is exact (to within a rigid transformation) given any number of interpoint distances; the greater the number of distances, the greater the detail (just as it is for all conventional map preparation).
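The reconstruction just described can be sketched numerically. The following Matlab fragment is a minimal sketch, not the book's software from appendix G; the point list P and tolerance are hypothetical. It builds a noiseless EDM from a list, then recovers the list, to within a rigid transformation, by eigendecomposition of the Gram matrix obtained via the Schoenberg criterion (487):

    % a minimal sketch: exact map reconstruction from a noiseless EDM
    N = 20;  P = randn(2,N);                    % hypothetical planar point list
    d = diag(P'*P);
    D = d*ones(1,N) + ones(N,1)*d' - 2*(P'*P);  % EDM: D(i,j) = |p_i - p_j|^2
    V = eye(N) - ones(N)/N;                     % geometric centering matrix
    G = -V*D*V/2;                               % Gram matrix of the centered list
    [Q,L]     = eig((G+G')/2);                  % symmetrize against roundoff
    [lam,idx] = sort(diag(L),'descend');
    r = sum(lam > 1e-9*max(lam));               % affine dimension (here 2)
    X = diag(sqrt(lam(1:r)))*Q(:,idx(1:r))';    % reconstructed list, r x N
    dd = diag(X'*X);
    norm(D - (dd*ones(1,N) + ones(N,1)*dd' - 2*(X'*X)),'fro')   % ~0

Reconstructed coordinates X can differ from P only by rotation, reflection, and translation; interpoint distances are nonetheless reproduced exactly.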


Figure 7: These bees construct a honeycomb by solving a convex optimization problem (§4.4.2.2.3). The most dense packing of identical spheres about a central sphere in two dimensions is 6. Packed sphere centers describe a regular lattice.

The EDM is studied in chapter 4, Euclidean distance matrix, its properties and relationship to both positive semidefinite and Gram matrices. We relate the EDM to the four classical properties of the Euclidean metric; thereby observing existence of an infinity of properties of the Euclidean metric beyond the triangle inequality. We proceed by deriving the fifth Euclidean metric property and then explain why furthering this endeavor is inefficient, because the ensuing criteria (while describing polyhedra in angle or area, volume, content, and so on ad infinitum) grow linearly in complexity and number with problem size.

Reconstruction methods are discussed and applied to a map of the United States; e.g., Figure 6. We also generate a distorted but recognizable isotonic map using only comparative distance information (only ordinal distance data). We demonstrate an elegant method for including dihedral (or torsion) angle constraints into a molecular conformation problem. More geometrical problems solvable via EDMs are presented with the best methods for posing them, EDM problems are posed as convex optimizations, and we show how to recover exact relative position given incomplete noiseless interpoint distance information.


We offer a new proof of the Schoenberg characterization of Euclidean distance matrices in chapter 4:

    D ∈ EDM^N  ⇔  −V_N^T D V_N ≽ 0  and  D ∈ S^N_h    (487)

Our proof relies on fundamental geometry; assuming any EDM must correspond to a list of points contained in some polyhedron (possibly at its vertices) and vice versa. It is known, but not obvious, that this Schoenberg criterion implies nonnegativity of the EDM entries; that is proved here.

We characterize the eigenvalue spectrum of an EDM in chapter 4, and then devise a polyhedral cone required for determining membership of a candidate matrix (in Cayley-Menger form) to the convex cone of Euclidean distance matrices (EDM cone); id est, a candidate is an EDM if and only if its eigenspectrum belongs to a spectral cone for EDM^N :

    D ∈ EDM^N  ⇔  λ([0 1^T ; 1 −D]) ∈ [R^N_+ ; R_−] ∩ ∂H  and  D ∈ S^N_h    (689)

We will see spectral cones are not unique.

In chapter 5, EDM cone, we explain the geometric relationship between the cone of Euclidean distance matrices, two positive semidefinite cones, and the elliptope. We illustrate geometric requirements, in particular, for projection of a candidate matrix on a positive semidefinite cone that establish its membership to the EDM cone. The faces of the EDM cone are described, but still open is the question whether all its faces are exposed as they are for the positive semidefinite cone. The Schoenberg criterion (487), relating the EDM cone and a positive semidefinite cone, is revealed to be a discretized membership relation (dual generalized inequalities, a new Farkas’-like lemma) between the EDM cone and its ordinary dual, EDM^{N∗}. A matrix criterion for membership to the dual EDM cone is derived that is simpler than the Schoenberg criterion:

    D^∗ ∈ EDM^{N∗}  ⇔  δ(D^∗ 1) − D^∗ ≽ 0    (833)

We derive a new concise equality of the EDM cone to two subspaces and a positive semidefinite cone;

    EDM^N = S^N_h ∩ (S^{N⊥}_c − S^N_+)    (827)
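As a quick numerical companion to (487) and (833), here is a Matlab sketch (ours, with a hypothetical tolerance) testing membership of a candidate matrix D already in the workspace, e.g., the EDM built in the preceding sketch; V_N here denotes the full-rank skinny matrix [−1^T ; I]/√2 used with the Schoenberg criterion in chapter 4:

    N  = size(D,1);
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);              % V_N, full-rank and skinny
    isEDM = isequal(D,D') && all(diag(D)==0) && ...
            min(eig(-VN'*D*VN)) >= -1e-9;               % Schoenberg criterion (487)
    Dstar  = D;                                         % any symmetric candidate
    inDual = min(eig(diag(Dstar*ones(N,1)) - Dstar)) >= -1e-9;   % dual test (833)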


Semidefinite programming is reviewed in chapter 6 with particular attention to optimality conditions of prototypical primal and dual conic programs, their interplay, and the perturbation method of rank reduction of optimal solutions (extant but not well known). A positive definite Farkas’ lemma is derived, and a three-dimensional polyhedral analogue for the positive semidefinite cone of 3 × 3 symmetric matrices is introduced. This analogue is a new tool for visualizing coexistence of low- and high-rank optimal solutions in 6 dimensions. We solve one instance of a combinatorial optimization problem via semidefinite program relaxation; id est,

    minimize_x   ‖x‖_0
    subject to   Ax = b
                 x_i ∈ {0, 1} ,  i = 1... n    (891)

an optimal minimum cardinality Boolean solution to Ax = b . The sensor-network localization problem is solved in any dimension in this chapter. We show how to constrain rank in the form rank G = ρ for any small semidefinite feasibility problem.

In chapter 7, EDM proximity, we explore methods of solution to a few fundamental and prevalent Euclidean distance matrix proximity problems; the problem of finding that Euclidean distance matrix closest to a given matrix in some sense. We discuss several heuristics for the problems when compounded with rank minimization: (965)

    minimize_D   ‖−V(D − H)V‖²_F         minimize_D   ‖D − H‖²_F
    subject to   rank V D V ≤ ρ          subject to   rank V D V ≤ ρ
                 D ∈ EDM^N                            D ∈ EDM^N

    minimize_{∘√D}  ‖∘√D − H‖²_F         minimize_{∘√D}  ‖−V(∘√D − H)V‖²_F
    subject to   rank V D V ≤ ρ          subject to   rank V D V ≤ ρ
                 ∘√D ∈ √EDM^N                         ∘√D ∈ √EDM^N

We offer a new geometrical proof of a famous result discovered by Eckart & Young in 1936 [72] regarding Euclidean projection of a point on that nonconvex subset of the positive semidefinite cone comprising all positive semidefinite matrices having rank not exceeding a prescribed bound ρ . We explain how this problem is transformed to a convex optimization for any rank ρ .
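A minimal sketch of the Eckart & Young idea just mentioned, under our assumption (for illustration only) of projecting a hypothetical symmetric matrix H on the rank-ρ subset of the positive semidefinite cone: eigenvalues are clipped at zero and all but the ρ largest are discarded.

    % Euclidean projection of symmetric H on {A psd | rank A <= rho};
    % a sketch of the Eckart & Young result [72], not the book's software
    H = randn(6);  H = (H + H')/2;          % hypothetical symmetric data
    rho = 2;
    [Q,L]     = eig(H);
    [lam,idx] = sort(diag(L),'descend');
    lam = max(lam,0);                       % clip negative eigenvalues (PSD)
    lam(rho+1:end) = 0;                     % enforce rank <= rho
    P = Q(:,idx)*diag(lam)*Q(:,idx)'        % the projection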


Eight appendices are provided so as to be more self-contained:

- linear algebra (appendix A is primarily concerned with proper statements of semidefiniteness for square matrices),
- simple matrices (dyad, doublet, elementary, Householder, Schoenberg, orthogonal, etcetera, in appendix B),
- a collection of known analytical solutions to some important optimization problems (appendix C),
- matrix calculus (appendix D concerns matrix-valued functions, their derivatives and directional derivatives, Taylor series, and tables of first- and second-order gradients and derivatives),
- an elaborate and insightful exposition of orthogonal and nonorthogonal projection on convex sets (the connection between projection and positive semidefiniteness, or between projection and a linear objective function, for example, in appendix E),
- software to discriminate EDMs, conic independence, software to reduce rank of an optimal solution to a semidefinite program, and two distinct methods of reconstructing a map of the United States given only distance information (appendix G).


Chapter 2

Convex geometry

    Convexity has an immensely rich structure and numerous applications. On the other hand, almost every “convex” idea can be explained by a two-dimensional picture.
    −Alexander Barvinok [19, p.vii]

There is relatively less published pertaining to matrix-valued convex sets and functions. [140] [132, §6.6] [193] As convex geometry and linear algebra are inextricably bonded, we provide much background material on linear algebra (especially in the appendices) although it is assumed the reader is comfortable with [220], [222], [131], or any other intermediate-level text. The essential references to convex analysis are [129] [202]. The reader is referred to [218] [19] [250] [29] [41] [199] [239] for a comprehensive treatment of convexity.


2.1 Convex set

A set C is convex iff for all Y, Z ∈ C and 0 ≤ µ ≤ 1

    µY + (1 − µ)Z ∈ C    (1)

Under that defining condition on µ , the linear sum in (1) is called a convex combination of Y and Z . If Y and Z are points in Euclidean real vector space R^n or R^{m×n} (matrices), then (1) represents the closed line segment joining them. All line segments are thereby convex sets. Apparent from the definition, a convex set is a connected set. [168, §3.4, §3.5] [29, p.2]

The idealized chicken egg, an ellipsoid (Figure 9(c), p.52), is a good organic icon for a convex set in three dimensions R^3 .

2.1.1 subspace

A subset of Euclidean real vector space R^n is called a subspace (formally defined in §2.5) if every vector^{2.1} of the form αx + βy , for α, β ∈ R , is in the subset whenever vectors x and y are. [162, §2.3] A subspace is a convex set containing the origin, by definition. [202, p.4] Any subspace is therefore open in the sense that it contains no boundary, but closed in the sense [168, §2]

    R^n + R^n = R^n    (2)

It is not difficult to show

    R^n = −R^n    (3)

as is true for any subspace R , because x ∈ R^n ⇔ −x ∈ R^n . The intersection of an arbitrary collection of subspaces remains a subspace. Any subspace not constituting the entire ambient vector space R^n is a proper subspace; e.g.,^{2.2} any line through the origin in two-dimensional Euclidean space R^2 . The vector space R^n is itself a conventional subspace, inclusively, [147, §2.1] although not proper.

^{2.1} A vector is assumed, throughout, to be a column vector.
^{2.2} We substitute the abbreviation e.g. in place of the Latin exempli gratia.
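Definition (1) is easy to exercise numerically. A small Matlab sketch (the ellipsoid shape matrix and test points are hypothetical) verifies that convex combinations of two members of an ellipsoid {x | x^T A x ≤ 1} remain members:

    A = diag([1 2 3]);                 % hypothetical ellipsoid {x | x'*A*x <= 1}
    Y = [0.5;0;0];  Z = [0;0.4;0.3];   % two members (membership easily checked)
    for mu = 0:0.1:1
        x = mu*Y + (1-mu)*Z;           % convex combination (1)
        assert(x'*A*x <= 1)            % remains in the ellipsoid
    end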


Figure 8: A slab is a convex Euclidean body infinite in extent but not affine. Illustrated in R^2 , it may be constructed by intersecting two opposing halfspaces whose bounding hyperplanes are parallel but not coincident. Because the number of halfspaces used in its construction is finite, the slab is a polyhedron. (Cartesian axes and vector inward-normal to each halfspace-boundary are drawn for reference.)

2.1.2 linear independence

Arbitrary given vectors in Euclidean space {Γ_i ∈ R^n , i = 1... N} are linearly independent (l.i.) if and only if, for all ζ ∈ R^N ,

    Γ_1 ζ_1 + · · · + Γ_{N−1} ζ_{N−1} + Γ_N ζ_N = 0    (4)

has only the trivial solution ζ = 0 ; in other words, iff no vector from the given set can be expressed as a linear combination of those remaining.

2.1.2.1 preservation

Linear independence can be preserved under linear transformation. Given matrix Y ≜ [y_1 y_2 · · · y_N] ∈ R^{N×N} , consider the mapping

    T(Γ) : R^{n×N} → R^{n×N} ≜ ΓY    (5)

whose domain is the set of all matrices Γ ∈ R^{n×N} holding a linearly independent set columnar. Linear independence of {Γy_i ∈ R^n , i = 1... N} demands, by definition, there exist no nontrivial solution ζ ∈ R^N to

    Γy_1 ζ_1 + · · · + Γy_{N−1} ζ_{N−1} + Γy_N ζ_N = 0    (6)

By factoring Γ , we see that is ensured by linear independence of {y_i ∈ R^N}.
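A two-line numerical illustration of §2.1.2.1 (random data, hence full rank only almost surely): postmultiplying Γ by a nonsingular Y preserves linear independence of the columns.

    n = 5;  N = 3;
    Gamma = randn(n,N);            % columns linearly independent (almost surely)
    Y     = randn(N,N);            % nonsingular (almost surely)
    [rank(Gamma), rank(Gamma*Y)]   % both N : independence preserved under T (5)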


2.1.3 Orthant:

name given to a closed convex set that is the higher-dimensional generalization of quadrant from the classical Cartesian partition of R^2 . The most common is the nonnegative orthant R^n_+ or R^{n×n}_+ (analogue to quadrant I) to which membership denotes nonnegative vector- or matrix-entries respectively; e.g.,

    R^n_+ ≜ {x ∈ R^n | x_i ≥ 0 ∀i}    (7)

The nonpositive orthant R^n_− or R^{n×n}_− (analogue to quadrant III) denotes negative and 0 entries. Orthant convexity^{2.3} is easily verified by definition (1).

2.1.4 affine set

A nonempty affine set (from the word affinity) is any subset of R^n that is a translation of some subspace. An affine set is convex and open so contains no boundary: e.g., ∅ , point, line, plane, hyperplane (§2.4.2), subspace, etcetera. For some parallel^{2.4} subspace M and any point x ∈ A ,

    A is affine  ⇔  A = x + M = {y | y − x ∈ M}    (8)

The intersection of an arbitrary collection of affine sets remains affine. The affine hull of a set C ⊆ R^n (§2.3.1) is the smallest affine set containing it.

2.1.5 dimension

Dimension of an arbitrary set S is the dimension of its affine hull; [250, p.14]

    dim S ≜ dim aff S = dim aff(S − s) ,  s ∈ S    (9)

the same as the dimension of the subspace parallel to that affine set aff S when nonempty. Hence dimension (of a set) is synonymous with affine dimension. [129, §A.2.1]

^{2.3} All orthants are self-dual simplicial cones. (§2.13.5.1, §2.12.3.1.1)
^{2.4} Two affine sets are said to be parallel when one is a translation of the other. [202, p.4]


2.1.6 empty set versus empty interior

Emptiness ∅ of a set is handled differently than interior in the classical literature. It is common for a nonempty convex set to have empty interior; e.g., paper in the real world. Thus the term relative is the conventional fix to this ambiguous terminology:^{2.5}

An ordinary flat piece of paper is an example of a nonempty convex set in R^3 having empty interior but relatively nonempty interior.

2.1.6.1 relative interior

We distinguish interior from relative interior throughout. [218] [250] [239] The relative interior rel int C of a convex set C ⊆ R^n is its interior relative to its affine hull.^{2.6} Thus defined, it is common (though confusing) for int C the interior of C to be empty while its relative interior is not: this happens whenever the dimension of its affine hull is less than the dimension of the ambient space (dim aff C < n , e.g., were C a flat piece of paper in R^3 ) or in the exception when C is a single point; [168, §2.2.1]

    rel int{x} ≜ aff{x} = {x} ,  int{x} = ∅ ,  x ∈ R^n    (10)

In any case, closure of the relative interior of a convex set C always yields the closure of the set itself;

    \overline{rel int C} = \overline{C}    (11)

If C is convex then rel int C and \overline{C} are convex, [129, p.24] and it is always possible to pass to a smaller ambient Euclidean space where a nonempty set acquires an interior. [19, §II.2.3]

Given the intersection of convex set C with an affine set A ,

    rel int(C ∩ A) = rel int(C) ∩ A    (12)

If C has nonempty interior, then rel int C = int C .

^{2.5} Superfluous mingling of terms as in relatively nonempty set would be an unfortunate consequence. From the opposite perspective, some authors use the term full or full-dimensional to describe a set having nonempty interior.
^{2.6} Likewise for relative boundary (§2.6.1.3), although relative closure is superfluous. [129, §A.2.1]


Figure 9: (a) Ellipsoid in R is a line segment whose boundary comprises two points. Intersection of line with ellipsoid in R , (b) in R^2 , (c) in R^3 . Each ellipsoid illustrated has entire boundary constituted by zero-dimensional faces; in fact, by vertices (§2.6.1.0.1). Intersection of line with boundary is a point at entry to interior. These same facts hold in higher dimension.

2.1.7 classical boundary

(confer §2.6.1.3) Boundary of a set C is the closure of C less its interior presumed nonempty; [38, §1.1]

    ∂C = \overline{C} \ int C    (13)

which follows from the fact

    \overline{int C} = \overline{C}    (14)

assuming nonempty interior.^{2.7} One implication is: an open set has a boundary defined although not contained in the set.

^{2.7} Otherwise, for x ∈ R^n as in (10), [168, §2.1, §2.3] int{x} = ∅ = \overline{∅} ; the empty set is both open and closed.


2.1.7.1 Line intersection with boundary

A line can intersect the boundary of a convex set in any dimension at a point demarcating the line’s entry to the set interior. On one side of that entry-point along the line is the exterior of the set, on the other side is the set interior. In other words, starting from any point of a convex set, a move toward the interior is an immediate entry into the interior. [19, §II.2]

When a line intersects the interior of a convex body in any dimension, the boundary appears to the line to be as thin as a point. This is intuitively plausible because, for example, a line intersects the boundary of the ellipsoids in Figure 9 at a point in R , R^2 , and R^3 . Such thinness is a remarkable fact when pondering visualization of convex polyhedra (§2.12, §4.14.3) in four dimensions, for example, having boundaries constructed from other three-dimensional convex polyhedra called faces.

We formally define face in §2.6. For now, we observe the boundary of a convex body to be entirely constituted by all its faces of dimension lower than the body itself. For example: The ellipsoids in Figure 9 have boundaries composed only of zero-dimensional faces. The two-dimensional slab in Figure 8 is a polyhedron having one-dimensional faces making its boundary. The three-dimensional bounded polyhedron in Figure 11 has zero-, one-, and two-dimensional polygonal faces constituting its boundary.

2.1.7.1.1 Example. Intersection of line with boundary in R^6 .
The convex cone of positive semidefinite matrices S^3_+ (§2.9), in the ambient subspace of symmetric matrices S^3 (§2.2.2.0.1), is a six-dimensional Euclidean body in isometrically isomorphic R^6 (§2.2.1). The boundary of the positive semidefinite cone in this dimension comprises faces having only the dimensions 0, 1, and 3 ; id est, {ρ(ρ+1)/2 , ρ = 0, 1, 2}.

Unique minimum-distance projection PX (§E.9) of any point X ∈ S^3 on that cone is known in closed form (§7.1.2). Given, for example, λ ∈ int R^3_+ and diagonalization (§A.5.2) of exterior point

    X = QΛQ^T ∈ S^3 ,  Λ ≜ [λ_1 0 0 ; 0 λ_2 0 ; 0 0 −λ_3]    (15)


where Q ∈ R^{3×3} is an orthogonal matrix, then the projection on S^3_+ in R^6 is

    PX = Q [λ_1 0 0 ; 0 λ_2 0 ; 0 0 0] Q^T ∈ S^3_+    (16)

This positive semidefinite matrix PX nearest X thus has rank 2, found by discarding all negative eigenvalues. The line connecting these two points is {X + (PX − X)t , t ∈ R} where t = 0 ⇔ X and t = 1 ⇔ PX . Because this line intersects the boundary of the positive semidefinite cone S^3_+ at point PX and passes through its interior (by assumption), the matrix corresponding to an infinitesimally positive perturbation of t there should reside interior to the cone (rank 3). Indeed, for ε an arbitrarily small positive constant,

    X + (PX − X)t |_{t=1+ε} = Q(Λ + (PΛ − Λ)(1+ε))Q^T = Q [λ_1 0 0 ; 0 λ_2 0 ; 0 0 ελ_3] Q^T ∈ int S^3_+    (17)
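Example 2.1.7.1.1 is readily reproduced in Matlab; a minimal sketch, with random orthogonal Q and hypothetical λ and ε :

    lambda = [3;2;1];                          % any lambda in int R^3_+
    [Q,~]  = qr(randn(3));                     % a random orthogonal matrix
    X = Q*diag([lambda(1:2); -lambda(3)])*Q';  % exterior point (15)
    [Qx,Lx] = eig((X+X')/2);
    PX = Qx*max(Lx,0)*Qx';                     % discard negative eigenvalues (16)
    rank(PX)                                   % 2
    eps_ = 1e-6;                               % small positive perturbation past t=1
    min(eig(X + (PX - X)*(1+eps_)))            % > 0 : interior of S^3_+ , as in (17)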


2.1.7.2 Tangential line intersection with boundary

A higher-dimensional boundary ∂C of a convex Euclidean body C is simply a dimensionally larger set through which a line can pass when it does not intersect the body’s interior. Still, for example, a line existing in five or more dimensions may pass tangentially (intersecting no point interior to C [142, §15.3]) through a single point relatively interior to a three-dimensional face on ∂C . Let’s understand why by inductive reasoning.

Figure 10(a) shows a vertical line-segment whose boundary comprises its two endpoints. For a line to pass through the boundary tangentially (intersecting no point relatively interior to the line-segment), it must exist in an ambient space of at least two dimensions. Otherwise, the line is confined to the same one-dimensional space as the line-segment and must pass along the segment to reach the end points.

Figure 10(b) illustrates a two-dimensional ellipsoid whose boundary is constituted entirely by zero-dimensional faces. Again, a line must exist in at least two dimensions to tangentially pass through any single arbitrarily chosen point on the boundary (without intersecting the ellipsoid interior).

Figure 10: Line tangential (a) (b) to relative interior of a zero-dimensional face in R^2 , (c) (d) to relative interior of a one-dimensional face in R^3 .


Now let’s move to an ambient space of three dimensions. Figure 10(c) shows a polygon rotated into three dimensions. For a line to pass through its zero-dimensional boundary (one of its vertices) tangentially, it must exist in at least the two dimensions of the polygon. But for a line to pass tangentially through a single arbitrarily chosen point in the relative interior of a one-dimensional face on the boundary as illustrated, it must exist in at least three dimensions.

Figure 10(d) illustrates a solid circular pyramid (upside-down) whose one-dimensional faces are line-segments emanating from its pointed end (its vertex). This pyramid’s boundary is constituted solely by these one-dimensional line-segments. A line may pass through the boundary tangentially, striking only one arbitrarily chosen point relatively interior to a one-dimensional face, if it exists in at least the three-dimensional ambient space of the pyramid.

From these few examples, we deduce a general rule (without proof): A line may pass tangentially through a single arbitrarily chosen point relatively interior to a k-dimensional face on the boundary of a convex Euclidean body if the line exists in dimension at least equal to k+2 .

Now the interesting part, with regard to Figure 11 showing a bounded polyhedron in R^3 ; call it P : A line existing in at least four dimensions is required in order to pass tangentially (without hitting int P) through a single arbitrary point in the relative interior of any two-dimensional polygonal face on the boundary of polyhedron P . Now imagine that polyhedron P is itself a three-dimensional face of some other polyhedron in R^4 . To pass a line tangentially through polyhedron P itself, striking only one point from its relative interior rel int P as claimed, requires a line existing in at least five dimensions.

This rule can help determine whether there exists unique solution to a convex optimization problem whose feasible set is an intersecting line; e.g., the trilateration problem (§4.4.2.2.4).

2.1.8 intersection, sum, difference, product

2.1.8.0.1 Theorem. Intersection. [41, §2.3.1] [202, §2]
The intersection of an arbitrary collection of convex sets is convex. ⋄


This theorem in converse is implicitly false insofar as a convex set can be formed by the intersection of sets that are not.

Vector sum of two convex sets C_1 and C_2 ,

    C_1 + C_2 = {x + y | x ∈ C_1 , y ∈ C_2}    (18)

is convex.

By additive inverse, we can similarly define vector difference of two convex sets,

    C_1 − C_2 = {x − y | x ∈ C_1 , y ∈ C_2}    (19)

which is convex. Applying this definition to nonempty convex set C_1 , its self-difference C_1 − C_1 is generally nonempty, nontrivial, and convex; e.g., for any convex cone K (§2.7.2), the set K − K constitutes its affine hull. [202, p.15]

Cartesian product of convex sets,

    C_1 × C_2 = {[x ; y] | x ∈ C_1 , y ∈ C_2} = [C_1 ; C_2]    (20)

remains convex. The converse also holds; id est, a Cartesian product is convex iff each set is. [129, p.23]

Convex results are also obtained for scaling κC of a convex set C , rotation/reflection QC , or translation C + α ; all similarly defined.

Given any operator T and convex set C , we are prone to write T(C) meaning

    T(C) ≜ {T(x) | x ∈ C}    (21)

Given linear operator T , it therefore follows from (18),

    T(C_1 + C_2) = {T(x + y) | x ∈ C_1 , y ∈ C_2}
                 = {T(x) + T(y) | x ∈ C_1 , y ∈ C_2}
                 = T(C_1) + T(C_2)    (22)


2.1.9 inverse image

2.1.9.0.1 Theorem. Image, Inverse image. [202, §3] [41, §2.3.2]
Let f be a mapping from R^{p×k} to R^{m×n} .

The image of a convex set C under any affine function (§3.1.1.2)

    f(C) = {f(X) | X ∈ C} ⊆ R^{m×n}    (23)

is convex.

The inverse image^{2.8} of a convex set F ,

    f^{−1}(F) = {X | f(X) ∈ F} ⊆ R^{p×k}    (24)

a single- or many-valued mapping, under any affine function f , is convex. ⋄

In particular, any affine transformation of an affine set remains affine. [202, p.8]

Each converse of this two-part theorem is generally false; id est, given f affine, a convex image f(C) does not imply that set C is convex, and neither does a convex inverse image f^{−1}(F) imply set F is convex. A counter-example is easy to visualize when the affine function is an orthogonal projector [220] [162]:

2.1.9.0.2 Corollary. Projection on subspace. [202, §3]^{2.9}
Orthogonal projection of a convex set on a subspace is another convex set. ⋄

Again, the converse is false. Shadows, for example, are umbral projections that can be convex when the body providing the shade is not.

^{2.8} See §2.9.1.0.2 for an example.
^{2.9} The corollary holds more generally for projection on hyperplanes (§2.4.2); [250, §6.6] hence, for projection on affine subsets (§2.3.1, nonempty intersections of hyperplanes). Orthogonal projection on affine subsets is reviewed in §E.4.0.0.1.


2.2 Vectorized-matrix inner product

Euclidean space R^n comes equipped with a linear vector inner-product

    ⟨y, z⟩ ≜ y^T z    (25)

We prefer those angle brackets to connote a geometric rather than algebraic perspective. Two vectors are orthogonal (perpendicular) to one another if and only if their inner product vanishes;

    y ⊥ z ⇔ ⟨y, z⟩ = 0    (26)

A vector inner-product defines a norm

    ‖y‖_2 ≜ √(y^T y) ,  ‖y‖_2 = 0 ⇔ y = 0    (27)

When orthogonal vectors each have unit norm, then they are orthonormal. For linear operation A on a vector, represented by a real matrix, the adjoint operation A^T is transposition and defined for matrix A by [147, §3.10]

    ⟨y, A^T z⟩ ≜ ⟨Ay, z⟩    (28)

The vector inner-product for matrices is calculated just as it is for vectors; by first transforming a matrix in R^{p×k} to a vector in R^{pk} by concatenating its columns in the natural order. For lack of a better term, we shall call that linear bijective (one-to-one and onto [147, App.A1.2]) transformation vectorization. For example, the vectorization of Y = [y_1 y_2 · · · y_k] ∈ R^{p×k} [99] [216] is

    vec Y ≜ [y_1 ; y_2 ; ⋮ ; y_k] ∈ R^{pk}    (29)

Then the vectorized-matrix inner-product is trace of matrix inner-product; for Z ∈ R^{p×k} , [41, §2.6.1] [129, §0.3.1] [259, §8] [246, §2.2]

    ⟨Y , Z⟩ ≜ tr(Y^T Z) = vec(Y)^T vec Z    (30)

where (§A.1.1)

    tr(Y^T Z) = tr(ZY^T) = tr(YZ^T) = tr(Z^T Y) = 1^T (Y ◦ Z) 1    (31)
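Identities (30) and (31) can be checked numerically in Matlab, whose colon operator Y(:) realizes vectorization (29); a sketch with hypothetical dimensions:

    p = 4;  k = 3;  Y = randn(p,k);  Z = randn(p,k);
    abs(trace(Y'*Z) - Y(:)'*Z(:))                    % ~0 : (30)
    abs(trace(Y'*Z) - ones(1,p)*(Y.*Z)*ones(k,1))    % ~0 : (31), Hadamard form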


and where ◦ denotes the Hadamard product^{2.10} of matrices [131] [93, §1.1.4]. The adjoint operation A^T on a matrix can therefore be defined in like manner:

    ⟨Y , A^T Z⟩ ≜ ⟨AY , Z⟩    (32)

For example, take any element C_1 from a matrix-valued set in R^{p×k} , and consider any particular dimensionally compatible real vectors v and w . Then vector inner-product of C_1 with vw^T is

    ⟨vw^T , C_1⟩ = v^T C_1 w = tr(wv^T C_1) = 1^T ((vw^T) ◦ C_1) 1    (33)

2.2.0.0.1 Example. Application of the image theorem.
Suppose the set C ⊆ R^{p×k} is convex. Then for any particular vectors v ∈ R^p and w ∈ R^k , the set of vector inner-products

    Y ≜ v^T Cw = ⟨vw^T , C⟩ ⊆ R    (34)

is convex. This result is a consequence of the image theorem. Yet it is easy to show directly that a convex combination of elements from Y remains an element of Y .^{2.11}

More generally, vw^T in (34) may be replaced with any particular matrix Z ∈ R^{p×k} while convexity of the set ⟨Z , C⟩ ⊆ R persists. Further, by replacing v and w with any particular respective matrices U and W of dimension compatible with all elements of convex set C , the set U^T CW is convex by the image theorem because it is a linear mapping of C .

^{2.10} The Hadamard product is a simple entrywise product of corresponding entries from two matrices of like size; id est, not necessarily square.
^{2.11} To verify that, take any two elements C_1 and C_2 from the convex matrix-valued set C , and then form the vector inner-products (34) that are two elements of Y by definition. Now make a convex combination of those inner products; videlicet, for 0 ≤ µ ≤ 1 ,

    µ⟨vw^T , C_1⟩ + (1 − µ)⟨vw^T , C_2⟩ = ⟨vw^T , µC_1 + (1 − µ)C_2⟩

The two sides are equivalent by linearity of inner product. The right-hand side remains a vector inner-product of vw^T with an element µC_1 + (1 − µ)C_2 from the convex set C ; hence it belongs to Y . Since that holds true for any two elements from Y , it must be a convex set.


2.2.1 Frobenius’

2.2.1.0.1 Definition. Isomorphic.
An isomorphism of a vector space is a transformation equivalent to a linear bijective mapping. The image and inverse image under the transformation operator are then called isomorphic vector spaces. △

Isomorphic vector spaces are characterized by preservation of adjacency; id est, if v and w are points connected by a line segment in one vector space, then their images will also be connected by a line segment. Two Euclidean bodies may be considered isomorphic if there exists an isomorphism of their corresponding ambient spaces. [247, §I.1]

When Z = Y ∈ R^{p×k} in (30), Frobenius’ norm is resultant from vector inner-product;

    ‖Y‖²_F = ‖vec Y‖²_2 = ⟨Y , Y⟩ = tr(Y^T Y) = Σ_{i,j} Y²_{ij} = Σ_i λ(Y^T Y)_i = Σ_i σ(Y)²_i    (35)

where λ(Y^T Y)_i is the i th eigenvalue of Y^T Y , and σ(Y)_i the i th singular value of Y . Were Y a normal matrix (§A.5.2), then σ(Y) = |λ(Y)| [269, §8.1]; thus

    ‖Y‖²_F = Σ_i λ(Y)²_i = ‖λ(Y)‖²_2    (36)

The converse (36) ⇒ normal matrix Y also holds. [131, §2.5.4]

Because the metrics are equivalent,

    ‖vec X − vec Y‖_2 = ‖X − Y‖_F    (37)

and because vectorization (29) is a linear bijective map, vector space R^{p×k} is isometrically isomorphic with vector space R^{pk} in the Euclidean sense, and vec is an isometric isomorphism on R^{p×k} .^{2.12} Because of this Euclidean structure, all the known results from convex analysis in Euclidean space R^n carry over directly to the space of real matrices R^{p×k} .

^{2.12} Given matrix A , its range R(A) (§2.5) is isometrically isomorphic with its vectorized range vec R(A) but not with R(vec A).


2.2.1.0.2 Definition. Isometric isomorphism.
An isometric isomorphism of a vector space having a metric defined on it is a linear bijective mapping T that preserves distance; id est, for all x, y ∈ dom T ,

    ‖Tx − Ty‖ = ‖x − y‖    (38)

Then the isometric isomorphism T is a bijective isometry. △

Unitary linear operator Q : R^n → R^n representing orthogonal matrix Q ∈ R^{n×n} (§B.5), for example, is an isometric isomorphism. Yet isometric operator T : R^2 → R^3 representing T = [1 0 ; 0 1 ; 0 0] on R^2 is injective but not a surjective map [147, §1.6] to R^3 .

The Frobenius norm is orthogonally invariant; meaning, for X, Y ∈ R^{p×k} and dimensionally compatible orthonormal matrix^{2.13} U and orthogonal matrix Q ,

    ‖U(X − Y)Q‖_F = ‖X − Y‖_F    (39)

2.2.2 Symmetric matrices

2.2.2.0.1 Definition. Symmetric matrix subspace.
Define a subspace of R^{M×M} : the convex set of all symmetric M×M matrices;

    S^M ≜ {A ∈ R^{M×M} | A = A^T} ⊆ R^{M×M}    (40)

This subspace comprising symmetric matrices S^M is isomorphic with the vector space R^{M(M+1)/2} whose dimension is the number of free variables in a symmetric M×M matrix. The orthogonal complement [220] [162] of S^M is

    S^{M⊥} ≜ {A ∈ R^{M×M} | A = −A^T} ⊂ R^{M×M}    (41)

the subspace of antisymmetric matrices in R^{M×M} ; id est,

    S^M ⊕ S^{M⊥} = R^{M×M}    (42)

where unique vector sum ⊕ is defined on page 644. △

^{2.13} Any matrix whose columns are orthonormal with respect to each other; this includes the orthogonal matrices.


All antisymmetric matrices are hollow by definition (have 0 main-diagonal). Any square matrix A ∈ R^{M×M} can be written as the sum of its symmetric and antisymmetric parts: respectively,

    A = ½(A + A^T) + ½(A − A^T)    (43)

The symmetric part is orthogonal in R^{M²} to the antisymmetric part; videlicet,

    tr((A^T + A)(A − A^T)) = 0    (44)

In the ambient space of real matrices, the antisymmetric matrix subspace can be described

    S^{M⊥} ≜ {½(A − A^T) | A ∈ R^{M×M}} ⊂ R^{M×M}    (45)

because any matrix in S^M is orthogonal to any matrix in S^{M⊥} . Further confined to the ambient subspace of symmetric matrices, because of antisymmetry, S^{M⊥} would become trivial.

2.2.2.1 Isomorphism on symmetric matrix subspace

When a matrix is symmetric in S^M , we may still employ the vectorization transformation (29) to R^{M²} ; vec , an isometric isomorphism. We might instead choose to realize in the lower-dimensional subspace R^{M(M+1)/2} by ignoring redundant entries (below the main diagonal) during transformation. Such a realization would remain isomorphic but not isometric. Lack of isometry is a spatial distortion due now to disparity in metric between R^{M²} and R^{M(M+1)/2} . To realize isometrically in R^{M(M+1)/2} , we must make a correction: For Y = [Y_{ij}] ∈ S^M we introduce the symmetric vectorization

    svec Y ≜ [Y_{11} ; √2 Y_{12} ; Y_{22} ; √2 Y_{13} ; √2 Y_{23} ; Y_{33} ; ⋮ ; Y_{MM}] ∈ R^{M(M+1)/2}    (46)


where all entries off the main diagonal have been scaled. Now for Z ∈ S^M ,

    ⟨Y , Z⟩ ≜ tr(Y^T Z) = vec(Y)^T vec Z = svec(Y)^T svec Z    (47)

Then because the metrics become equivalent, for X ∈ S^M ,

    ‖svec X − svec Y‖_2 = ‖X − Y‖_F    (48)

and because symmetric vectorization (46) is a linear bijective mapping, svec is an isometric isomorphism on the symmetric matrix subspace. In other words, S^M is isometrically isomorphic with R^{M(M+1)/2} in the Euclidean sense under transformation svec .

The set of all symmetric matrices S^M forms a proper subspace in R^{M×M} , so for it there exists a standard orthonormal basis in isometrically isomorphic R^{M(M+1)/2} :

    {E_{ij} ∈ S^M} = { e_i e_i^T ,  i = j = 1... M ;  (1/√2)(e_i e_j^T + e_j e_i^T) ,  1 ≤ i < j ≤ M }    (49)

where M(M+1)/2 standard basis matrices E_{ij} are formed from the standard basis vectors e_i ∈ R^M . Thus we have a basic orthogonal expansion for Y ∈ S^M ,

    Y = Σ_{j=1}^M Σ_{i=1}^j ⟨E_{ij} , Y⟩ E_{ij}    (50)

whose coefficients

    ⟨E_{ij} , Y⟩ = { Y_{ii} ,  i = 1... M ;  Y_{ij} √2 ,  1 ≤ i < j ≤ M }    (51)

correspond to entries of the symmetric vectorization (46).
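A minimal Matlab implementation of (46), a sketch only (the book's software collection in appendix G is the authority), saved for example as a hypothetical file svec.m:

    function y = svec(Y)          % symmetric vectorization (46); assumes Y = Y'
    M = size(Y,1);
    y = zeros(M*(M+1)/2,1);  t = 0;
    for j = 1:M
        y(t+(1:j-1)) = sqrt(2)*Y(1:j-1,j);   % off-diagonal entries scaled
        y(t+j)       = Y(j,j);               % diagonal entry unscaled
        t = t + j;
    end

With this, svec(Y)'*svec(Z) reproduces trace(Y'*Z) for symmetric Y, Z as in (47), and norm(svec(X)-svec(Y)) matches norm(X-Y,'fro') as in (48).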


2.2.3 Symmetric hollow subspace

2.2.3.0.1 Definition. Hollow subspaces. [236]
Define a subspace of R^{M×M} : the convex set of all (real) symmetric M×M matrices having 0 main-diagonal;

    R^{M×M}_h ≜ {A ∈ R^{M×M} | A = A^T , δ(A) = 0} ⊂ R^{M×M}    (52)

where the main diagonal of A ∈ R^{M×M} is denoted (§A.1)

    δ(A) ∈ R^M    (1061)

Operating on a vector, linear operator δ naturally returns a diagonal matrix; δ(δ(A)) is a diagonal matrix. Operating recursively on a vector Λ ∈ R^N or diagonal matrix Λ ∈ R^{N×N} , operator δ(δ(Λ)) returns Λ itself;

    δ²(Λ) ≡ δ(δ(Λ)) ≜ Λ    (1062)

The subspace R^{M×M}_h (52) comprising (real) symmetric hollow matrices is isomorphic with subspace R^{M(M−1)/2} . The orthogonal complement of R^{M×M}_h is

    R^{M×M⊥}_h ≜ {A ∈ R^{M×M} | A = −A^T + 2δ²(A)} ⊆ R^{M×M}    (53)

the subspace of antisymmetric antihollow matrices in R^{M×M} ; id est,

    R^{M×M}_h ⊕ R^{M×M⊥}_h = R^{M×M}    (54)

Yet defined instead as a proper subspace of S^M ,

    S^M_h ≜ {A ∈ S^M | δ(A) = 0} ⊂ S^M    (55)

the orthogonal complement S^{M⊥}_h of S^M_h in ambient S^M ,

    S^{M⊥}_h ≜ {A ∈ S^M | A = δ²(A)} ⊆ S^M    (56)

is simply the subspace of diagonal matrices; id est,

    S^M_h ⊕ S^{M⊥}_h = S^M    (57)    △


Any matrix A ∈ R^{M×M} can be written as the sum of its symmetric hollow and antisymmetric antihollow parts: respectively,

    A = (½(A + A^T) − δ²(A)) + (½(A − A^T) + δ²(A))    (58)

The symmetric hollow part is orthogonal in R^{M²} to the antisymmetric antihollow part; videlicet,

    tr((½(A + A^T) − δ²(A))(½(A − A^T) + δ²(A))) = 0    (59)

In the ambient space of real matrices, the antisymmetric antihollow subspace is described

    S^{M⊥}_h ≜ {½(A − A^T) + δ²(A) | A ∈ R^{M×M}} ⊆ R^{M×M}    (60)

because any matrix in S^M_h is orthogonal to any matrix in S^{M⊥}_h . Yet in the ambient space of symmetric matrices S^M , the antihollow subspace is nontrivial;

    S^{M⊥}_h ≜ {δ²(A) | A ∈ S^M} = {δ(u) | u ∈ R^M} ⊆ S^M    (61)

In anticipation of their utility with Euclidean distance matrices (EDMs) in chapter 4, for symmetric hollow matrices we introduce the linear bijective vectorization dvec that is the natural analogue to symmetric matrix vectorization svec (46): for Y = [Y_{ij}] ∈ S^M_h ,

    dvec Y ≜ √2 [Y_{12} ; Y_{13} ; Y_{23} ; Y_{14} ; Y_{24} ; Y_{34} ; ⋮ ; Y_{M−1,M}] ∈ R^{M(M−1)/2}    (62)

Like svec (46), dvec is an isometric isomorphism on the symmetric hollow subspace.


Figure 11: Convex hull of a random list of points in R^3 . Some points from that generating list reside in the interior of this convex polyhedron. [252, Convex Polyhedron] (Avis-Fukuda-Mizukoshi)

The set of all symmetric hollow matrices S^M_h forms a proper subspace in R^{M×M} , so for it there must be a standard orthonormal basis in isometrically isomorphic R^{M(M−1)/2} :

    {E_{ij} ∈ S^M_h} = {(1/√2)(e_i e_j^T + e_j e_i^T) ,  1 ≤ i < j ≤ M}    (63)

where M(M−1)/2 standard basis matrices E_{ij} are formed from the standard basis vectors e_i ∈ R^M .

The symmetric hollow majorization corollary on page 454 characterizes eigenvalues of symmetric hollow matrices.

2.3 Hulls

2.3.1 Affine dimension, affine hull

Affine dimension of any set in R^n is the dimension of the smallest affine set (empty set, point, line, plane, hyperplane (§2.4.2), subspace, R^n) that contains it. For nonempty sets, affine dimension is the same as the dimension of the subspace parallel to that affine set. [202, §1] [129, §A.2.1]

Ascribe the points in a list {x_l ∈ R^n , l = 1... N} to the columns of matrix X :

    X = [x_1 · · · x_N] ∈ R^{n×N}    (64)


In particular, we define affine dimension r of the N-point list X as dimension of the smallest affine set in Euclidean space R^n that contains X ;

    r ≜ dim aff X    (65)

Affine dimension r is a lower bound sometimes called embedding dimension. [236] [115] That affine set A in which those points are embedded is unique and called the affine hull [41, §2.1.2] [218, §2.1];

    A ≜ aff{x_l ∈ R^n , l = 1... N} = aff X = x_1 + R{x_l − x_1 , l = 2... N} = {Xa | a^T 1 = 1} ⊆ R^n    (66)

Given some arbitrary set C and any x ∈ C ,

    aff C = x + aff(C − x)    (67)

where aff(C − x) is a subspace.

2.3.1.0.1 Definition. Affine subset.
We analogize affine subset to subspace,^{2.14} defining it to be any nonempty affine set (§2.1.4). △

The affine hull of the empty set is empty;

    aff ∅ ≜ ∅    (68)

The affine hull of a point x is that point itself;

    aff{x} = {x}    (69)

The affine hull of two distinct points is the unique line through them. (Figure 14) The affine hull of three noncollinear points in any dimension is that unique plane containing the points, and so on. The subspace of symmetric matrices S^m is the affine hull of the cone of positive semidefinite matrices; (§2.9)

    aff S^m_+ = S^m    (70)

^{2.14} The popular term affine subspace is an oxymoron.
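Affine dimension (65) is computable as the rank of the list less any one of its points, per (66); a Matlab sketch with hypothetical data and a deliberately dependent fourth point:

    n = 5;  X = randn(n,3);
    X = [X, X*[1;1;-1]];                 % 4th point: affine combination (1+1-1 = 1)
    rank(X - X(:,1)*ones(1,size(X,2)))   % 2 : affine dimension of the list (65)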


Figure 12: A simplicial cone (§2.12.3.1.1) in R^3 whose boundary is drawn truncated; constructed using A ∈ R^{3×3} and C = 0 in (240). By the most fundamental definition of a cone (§2.7.1), the entire boundary can be constructed from an aggregate of rays emanating exclusively from the origin. Each of three extreme directions corresponds to an edge (§2.6.0.0.3); they are conically, affinely, and linearly independent for this cone. Because this set is polyhedral, exposed directions are in one-to-one correspondence with extreme directions; there are only three. Its extreme directions give rise to what is called a vertex-description of this polyhedral cone; simply, the conic hull of extreme directions. Obviously this cone can also be constructed by intersection of three halfspaces; hence the equivalent halfspace-description.


2.3.2 Convex hull

The convex hull [129, §A.1.4] [41, §2.1.4] [202] of any bounded^{2.15} list (or set) of N points X ∈ R^{n×N} forms a unique convex polyhedron (§2.12.0.0.1) whose vertices constitute some subset of that list;

    P ≜ conv{x_l , l = 1... N} = conv X = {Xa | a^T 1 = 1 , a ≽ 0} ⊆ R^n    (71)

The union of relative interior and relative boundary (§2.6.1.3) of the polyhedron comprise the convex hull P , the smallest closed convex set that contains the list X ; e.g., Figure 11. Given P , the generating list {x_l} is not unique.

Given some arbitrary set C ⊆ R^n , its convex hull conv C is equivalent to the smallest closed convex set containing it. (confer §2.4.1.1.1) The convex hull is a subset of the affine hull;

    conv C ⊆ aff C = aff \overline{C} = \overline{aff C} = aff conv C    (72)

Any closed bounded convex set C is equal to the convex hull of its boundary;

    C = conv ∂C    (73)

The convex hull of the empty set is empty;

    conv ∅ ≜ ∅    (74)

2.3.2.1 Comparison with respect to R^N_+

The notation a ≽ 0 means vector a belongs to the nonnegative orthant R^N_+ , whereas a ≽ b denotes comparison of vector a to vector b on R^N with respect to the nonnegative orthant; id est, a ≽ b means a − b belongs to the nonnegative orthant. In particular, a ≽ b ⇔ a_i ≽ b_i ∀i . (304)

Comparison of matrices with respect to the positive semidefinite cone, like I ≽ A ≽ 0 in Example 2.3.2.1.1, is explained in §2.9.0.1.

^{2.15} A set in R^n is bounded if and only if it can be contained in a Euclidean ball having finite radius. [66, §2.2] (confer §4.7.3.0.1)
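Membership x ∈ P in the vertex-description (71) is a linear feasibility problem in the coefficients a . A sketch, assuming the Optimization Toolbox function linprog is available (data hypothetical):

    n = 2;  N = 6;  X = randn(n,N);        % hypothetical generating list
    x = X*ones(N,1)/N;                     % centroid: a point surely in conv X
    Aeq = [X; ones(1,N)];  beq = [x; 1];   % X*a = x and 1'*a = 1
    [a,~,flag] = linprog(zeros(N,1),[],[],Aeq,beq,zeros(N,1),[]);
    flag                                   % 1 : feasible, so x is in the hull

Exit flag 1 certifies a feasible convex combination; infeasibility (flag −2) would certify x ∉ P .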


Figure 13: The circle (radius 1/√2), shown here on the boundary of positive semidefinite cone S^2_+ in isometrically isomorphic R^3 from Figure 29, comprises the boundary of a Fantope (75) in this dimension (k = 1, N = 2). The lone point illustrated is the identity matrix I and corresponds to the case k = 2. (Axes are labeled by entries α , β , γ of [α β ; β γ] in svec ∂S^2_+ , the β axis scaled by √2. View is from inside the PSD cone looking toward the origin.)


2.3.2.1.1 Example. Convex hull of outer product. [187] [9, §4.1] [191, §3] [156, §2.4]
The hull of the set comprising outer products of orthonormal matrices has equivalent expression:

    conv{UU^T | U ∈ R^{N×k} , U^T U = I} = {A ∈ S^N | I ≽ A ≽ 0 , ⟨I , A⟩ = k}    (75)

This we call a Fantope. In case k = 1, there is slight simplification: ((1244), Example 2.9.2.4.1)

    conv{UU^T | U ∈ R^N , U^T U = I} = {A ∈ S^N | A ≽ 0 , ⟨I , A⟩ = 1}    (76)

The set, for any particular k > 0 ,

    {UU^T | U ∈ R^{N×k} , U^T U = I}    (77)

comprises the extreme points (§2.6.0.0.1) of its convex hull. By (1104), each and every extreme point UU^T has only k nonzero eigenvalues λ and they all equal 1 ; id est, λ(UU^T)_{1:k} = λ(U^T U) = 1 . So the Frobenius norm of each and every extreme point equals the same constant

    ‖UU^T‖²_F = k    (78)

Each extreme point simultaneously lies on the boundary of the positive semidefinite cone (when k < N , §2.9) and on the boundary of a hypersphere of dimension k(N − (k+1)/2) and radius √(k(1 − k/N)) centered at kI/N along the ray (base 0) through the identity matrix I in isomorphic vector space R^{N(N+1)/2} (§2.2.2.1).

Figure 13 illustrates the extreme points (77) comprising the boundary of a Fantope; but that circumscription does not hold in higher dimension. (§2.9.2.5)

2.3.3 Conic hull

In terms of a finite-length point list (or set) arranged columnar in X ∈ R^{n×N} (64), its conic hull is expressed

    K ≜ cone{x_l , l = 1... N} = cone X = {Xa | a ≽ 0} ⊆ R^n    (79)

id est, every nonnegative combination of points from the list. The conic hull of any finite-length list forms a polyhedral cone [129, §A.4.3] (§2.12.1.0.1; e.g., Figure 12); the smallest closed convex cone that contains the list.


Figure 14: Given two points in Euclidean vector space of any dimension, their various hulls are illustrated: A affine hull (drawn truncated), C convex hull, K conic hull (truncated), R range or span, a plane (truncated). Each hull is a subset of range; generally, A, C, K ⊆ R . (Cartesian axes drawn for reference.)


By convention, the aberration [218, §2.1]

    cone ∅ ≜ {0}    (80)

Given some arbitrary set C , it is apparent

    conv C ⊆ cone C    (81)

2.3.4 Vertex-description

The conditions in (66), (71), and (79) respectively define an affine, convex, and conic combination of elements from the set or list. Whenever a Euclidean body can be described as some hull or span of a set of points, then that representation is loosely called a vertex-description.

2.4 Halfspace, Hyperplane

A two-dimensional affine set is called a plane. An (n−1)-dimensional affine set in R^n is called a hyperplane. [202] [129] Every hyperplane partially bounds a halfspace (which is convex but not affine).

2.4.1 Halfspaces H_+ and H_−

Euclidean space R^n is partitioned into two halfspaces by any hyperplane ∂H ; id est, H_− + H_+ = R^n . The resulting (closed convex) halfspaces, both partially bounded by ∂H , may be described

    H_− = {y | a^T y ≤ b} = {y | a^T (y − y_p) ≤ 0} ⊂ R^n    (82)
    H_+ = {y | a^T y ≥ b} = {y | a^T (y − y_p) ≥ 0} ⊂ R^n    (83)

where nonzero vector a ∈ R^n is an outward-normal to the hyperplane partially bounding H_− while an inward-normal with respect to H_+ . For any vector y − y_p that makes an obtuse angle with normal a , vector y will lie in the halfspace H_− on one side (shaded in Figure 15) of the hyperplane, while acute angles denote y in H_+ on the other side.

An equivalent, more intuitive representation of a halfspace comes about when we consider all the points in R^n closer to point d than to point c or equidistant, in the Euclidean sense; from Figure 15,


Figure 15: Hyperplane illustrated ∂H = {y | a^T (y − y_p) = 0} is a line partially bounding halfspaces H_− = {y | a^T (y − y_p) ≤ 0} and H_+ = {y | a^T (y − y_p) ≥ 0} in R^2 . Shaded is a rectangular piece of semi-infinite H_− with respect to which vector a is outward-normal to bounding hyperplane; vector a is inward-normal with respect to H_+ . Halfspace H_− contains nullspace N(a^T) = {y | a^T y = 0} (dashed line through origin) because a^T y_p > 0 . Hyperplane, halfspace, and nullspace are each drawn truncated. Points c and d are equidistant from hyperplane, and vector c − d is normal to it. ∆ is distance from origin to hyperplane.


    H_− = {y | ‖y − d‖ ≤ ‖y − c‖}    (84)

This representation, in terms of proximity, is resolved with the more conventional representation of a halfspace (82) by squaring both sides of the inequality in (84);

    H_− = {y | (c − d)^T y ≤ (‖c‖² − ‖d‖²)/2} = {y | (c − d)^T (y − (c + d)/2) ≤ 0}    (85)

2.4.1.1 PRINCIPLE 1: Halfspace-description of convex sets

The most fundamental principle in convex geometry follows from the geometric Hahn-Banach theorem [162, §5.12] [15, §1] [75, §I.1.2] which guarantees any closed convex set to be an intersection of halfspaces.

2.4.1.1.1 Theorem. Halfspaces. [41, §2.3.1] [202, §18] [129, §A.4.2(b)] [29, §2.4]
A closed convex set in R^n is equivalent to the intersection of all halfspaces that contain it. ⋄

Intersection of multiple halfspaces in R^n may be represented using a matrix constant A ;

    ∩_i H_{i−} = {y | A^T y ≼ b} = {y | A^T (y − y_p) ≼ 0}    (86)
    ∩_i H_{i+} = {y | A^T y ≽ b} = {y | A^T (y − y_p) ≽ 0}    (87)

where b is now a vector, and the i th column of A is normal to a hyperplane ∂H_i partially bounding H_i . By the halfspaces theorem, intersections like this can describe interesting convex Euclidean bodies such as polyhedra and cones, giving rise to the term halfspace-description.


Figure 16: (a)-(d) Hyperplanes in R^2 (truncated): (a) a = [1; 1] with contours {y | a^T y = 1} and {y | a^T y = −1} , (b) b = [−1; −1] , (c) c = [−1; 1] , (d) d = [1; −1] , (e) e = [1; 0] . Movement in normal direction increases vector inner-product. This visual concept is exploited to attain analytical solution of linear programs; e.g., Example 2.4.2.6.2, Example 3.1.1.2.1, [41, exer.4.8-exer.4.20]. Each graph is also interpretable as a contour plot of a real affine function of two variables as in Figure 52. (e) Whenever normal is norm 1, nonnegative b in {x | a^T x = b} represents radius of hypersphere about 0 supported by hyperplane.


2.4.2 Hyperplane ∂H representations

Every hyperplane ∂H is an affine set parallel to an (n−1)-dimensional subspace of R^n ; it is itself a subspace if and only if it contains the origin.

    dim ∂H = n − 1    (88)

so a hyperplane is a point in R , a line in R^2 , a plane in R^3 , and so on. Every hyperplane can be described as the intersection of complementary halfspaces; [202, §19]

    ∂H = H_− ∩ H_+ = {y | a^T y ≤ b , a^T y ≥ b} = {y | a^T y = b}    (89)

a halfspace-description. Assuming normal a ∈ R^n to be nonzero, then any hyperplane in R^n can be described as the solution set to vector equation a^T y = b , illustrated in Figure 15 and Figure 16 for R^2 ;

    ∂H ≜ {y | a^T y = b} = {y | a^T (y − y_p) = 0} = {Zξ + y_p | ξ ∈ R^{n−1}} ⊂ R^n    (90)

All solutions y constituting the hyperplane are offset from the nullspace of a^T by the same constant vector y_p ∈ R^n that is any particular solution to a^T y = b ; id est,

    y = Zξ + y_p    (91)

where the columns of Z ∈ R^{n×n−1} constitute a basis for the nullspace N(a^T) = {x ∈ R^n | a^T x = 0} .^{2.16}

Conversely, given any point y_p in R^n , the unique hyperplane containing it having normal a is the affine set ∂H (90) where b equals a^T y_p and where a basis for N(a^T) is arranged in Z columnar. Hyperplane dimension is apparent from the dimensions of Z ; that hyperplane is parallel to the span of its columns.

^{2.16} We will later find this expression for y in terms of nullspace of a^T (more generally, of matrix A^T (117)) to be a useful device for eliminating affine equality constraints, much as we did here.
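Representation (90)-(91) translates directly into Matlab's null function; a sketch with hypothetical normal a and offset b :

    n = 4;  a = randn(n,1);  b = 1.7;  % hypothetical normal and offset
    Z  = null(a');                     % n x (n-1) basis for N(a')
    yp = a*b/(a'*a);                   % a particular solution to a'*y = b
    y  = Z*randn(n-1,1) + yp;          % (91): an arbitrary hyperplane point
    a'*y - b                           % ~0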


2.4.2.0.1 Example. Distance from origin to hyperplane.
Given the (shortest) distance ∆ ∈ R_+ from the origin to a hyperplane having normal vector a , we can find its representation ∂H by dropping a perpendicular. The point thus found is the orthogonal projection of the origin on ∂H (§E.5.0.0.5), equal to a∆/‖a‖ if the origin is known a priori to belong to halfspace H_− (Figure 15), or equal to −a∆/‖a‖ if the origin belongs to halfspace H_+ ; id est, when H_− ∋ 0 ,

    ∂H = {y | a^T (y − a∆/‖a‖) = 0} = {y | a^T y = ‖a‖∆}    (92)

or when H_+ ∋ 0 ,

    ∂H = {y | a^T (y + a∆/‖a‖) = 0} = {y | a^T y = −‖a‖∆}    (93)

Knowledge of only distance ∆ and normal a thus introduces ambiguity into the hyperplane representation.

2.4.2.1 Matrix variable

Any halfspace in R^{mn} may be represented using a matrix variable. For variable Y ∈ R^{m×n} , given constants A ∈ R^{m×n} and b = ⟨A , Y_p⟩ ∈ R ,

    H_− = {Y ∈ R^{mn} | ⟨A , Y⟩ ≤ b} = {Y ∈ R^{mn} | ⟨A , Y − Y_p⟩ ≤ 0}    (94)
    H_+ = {Y ∈ R^{mn} | ⟨A , Y⟩ ≥ b} = {Y ∈ R^{mn} | ⟨A , Y − Y_p⟩ ≥ 0}    (95)

Recall vector inner-product from §2.2: ⟨A , Y⟩ = tr(A^T Y) .

Hyperplanes in R^{mn} may, of course, also be represented using matrix variables;

    ∂H = {Y | ⟨A , Y⟩ = b} = {Y | ⟨A , Y − Y_p⟩ = 0} ⊂ R^{mn}    (96)

Vector a from Figure 15 is normal to the hyperplane illustrated. Likewise, nonzero vectorized matrix A is normal to hyperplane ∂H ;

    A ⊥ ∂H in R^{mn}    (97)


Figure 17: Any one particular point of three points illustrated does not belong to affine hull A_i (i ∈ 1, 2, 3 , each drawn truncated) of points remaining. Three corresponding vectors in R^2 are, therefore, affinely independent (but neither linearly nor conically independent).

2.4.2.2 Vertex-description of hyperplane

Any hyperplane in R^n may be described as the affine hull of a minimal set of points {x_l ∈ R^n , l = 1... n} arranged columnar in a matrix X ∈ R^{n×n} (64):

    ∂H = aff{x_l ∈ R^n , l = 1... n} ,            dim aff{x_l ∀l} = n−1
       = aff X ,                                  dim aff X = n−1
       = x_1 + R{x_l − x_1 , l = 2... n} ,        dim R{x_l − x_1 , l = 2... n} = n−1
       = x_1 + R(X − x_1 1^T) ,                   dim R(X − x_1 1^T) = n−1    (98)

where

    R(A) = {Ax | x ∈ R^n}    (115)

2.4.2.3 Affine independence, minimal set

For any particular affine set, a minimal set of points constituting its vertex-description is an affinely independent descriptive set and vice versa.


Arbitrary given points {x_i ∈ R^n , i = 1... N} are affinely independent (a.i.) if and only if, over all ζ ∈ R^N with ζ^T 1 = 1 and ζ_k = 0 (confer §2.1.2),

    x_i ζ_i + · · · + x_j ζ_j − x_k = 0 ,  i ≠ · · · ≠ j ≠ k = 1... N    (99)

has no solution ζ ; in words, iff no point from the given set can be expressed as an affine combination of those remaining. We deduce

    l.i. ⇒ a.i.    (100)

Consequently, {x_i , i = 1... N} is an affinely independent set if and only if {x_i − x_1 , i = 2... N} is a linearly independent (l.i.) set. [135, §3] (Figure 17) This is equivalent to the property that the columns of [X ; 1^T] (for X ∈ R^{n×N} as in (64)) form a linearly independent set. [129, §A.1.3]

2.4.2.4 Preservation of affine independence

Independence in the linear (§2.1.2.1), affine, and conic (§2.10.1) senses can be preserved under linear transformation. Suppose a matrix X ∈ R^{n×N} (64) holds an affinely independent set in its columns. Consider a transformation

    T(X) : R^{n×N} → R^{n×N} ≜ XY    (101)

where the given matrix Y ≜ [y_1 y_2 · · · y_N] ∈ R^{N×N} is represented by linear operator T . Affine independence of {Xy_i ∈ R^n , i = 1... N} demands (by definition (99)) there exists no solution ζ ∈ R^N with ζ^T 1 = 1 and ζ_k = 0 to

    Xy_i ζ_i + · · · + Xy_j ζ_j − Xy_k = 0 ,  i ≠ · · · ≠ j ≠ k = 1... N    (102)

By factoring X , we see that is ensured by affine independence of {y_i ∈ R^N} and by R(Y) ∩ N(X) = 0 where

    R(A) = {Ax | x ∈ R^n}    (115)
    N(A) = {x ∈ R^n | Ax = 0}    (116)

2.4.2.5 affine maps

Affine transformations preserve affine hulls. Given any affine mapping T of vector spaces and some arbitrary set C , [202, p.8]

    aff(T C) = T(aff C)    (103)
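The [X ; 1^T] rank test for affine independence stated in §2.4.2.3 above, sketched on hypothetical data:

    n = 3;  X = randn(n,3);      % three generic points: a.i. (almost surely)
    Xd = [X, X*[2;-1;0]];        % append affine combination (2-1+0 = 1)
    rank([X ; ones(1,3)])        % 3 : affinely independent
    rank([Xd; ones(1,4)])        % 3 < 4 : affine dependence detected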


Figure 18: Each shaded line segment {z ∈ C | a^T z = κ_i} belonging to set C ⊂ R^2 shows intersection with hyperplane parametrized by scalar κ_i (where 0 > κ_3 > κ_2 > κ_1); each shows a (linear) contour in vector z of equal inner product with normal vector a . Halfspaces H_− and H_+ are also labeled. Cartesian axes drawn for reference. (confer Figure 52)


Figure 19: (a) tradition: Hyperplane ∂H_− (104) supporting closed set Y ∈ R^2 . Vector a is inward-normal to hyperplane with respect to halfspace H_+ , but outward-normal with respect to set Y . A supporting hyperplane can be considered the limit of an increasing sequence in the normal-direction like that in Figure 18. (b) nontraditional: Hyperplane ∂H_+ nontraditionally supporting Y . Vector ã is inward-normal to hyperplane now with respect to both halfspace H_+ and set Y . Tradition [129] [202] recognizes only positive normal polarity in support function σ_Y as in (104); id est, normal a , figure (a). But both interpretations of supporting hyperplane are useful.


2.4.2.6 PRINCIPLE 2: Supporting hyperplane

The second most fundamental principle of convex geometry also follows from the geometric Hahn-Banach theorem [162, §5.12] [15, §1] that guarantees existence of at least one hyperplane in R^n supporting a convex set (having nonempty interior)^{2.17} at each point on its boundary.

2.4.2.6.1 Definition. Supporting hyperplane ∂H .
The partial boundary ∂H of a closed halfspace containing arbitrary set Y is called a supporting hyperplane ∂H to Y when it contains at least one point of Y . [202, §11] Specifically, given normal a ≠ 0 (belonging to H_+ by definition), the supporting hyperplane to Y at y_p ∈ ∂Y [sic] is

    ∂H_− = {y | a^T (y − y_p) = 0 , y_p ∈ Y , a^T (z − y_p) ≤ 0 ∀ z ∈ Y}
         = {y | a^T y = sup{a^T z | z ∈ Y}}    (104)

where normal a and set Y reside in opposite halfspaces. (Figure 19(a)) Real function

    σ_Y(a) ≜ sup{a^T z | z ∈ Y}    (412)

is called the support function for Y .

An equivalent but nontraditional representation^{2.18} for a supporting hyperplane is obtained by reversing polarity of normal a ; (1299)

    ∂H_+ = {y | ã^T (y − y_p) = 0 , y_p ∈ Y , ã^T (z − y_p) ≥ 0 ∀ z ∈ Y}
         = {y | ã^T y = −inf{ã^T z | z ∈ Y} = sup{−ã^T z | z ∈ Y}}    (105)

where normal ã and set Y now both reside in H_+ . (Figure 19(b))

When a supporting hyperplane contains only a single point of Y , that hyperplane is termed strictly supporting (and termed tangent to Y if the supporting hyperplane is unique there [202, §18, p.169]). △

^{2.17} It is conventional to speak of a hyperplane supporting set C but not containing C ; called nontrivial support. [202, p.100] Hyperplanes in support of lower-dimensional bodies are admitted.
^{2.18} useful for constructing the dual cone; e.g., Figure 40(b). Tradition recognizes the polar cone; which is the negative of the dual cone.


A closed convex set C ⊂ R^n , for example, can be expressed as the intersection of all halfspaces partially bounded by hyperplanes supporting it; videlicet, [162, p.135]

    C = ∩_{a ∈ R^n} {y | a^T y ≤ σ_C(a)}    (106)

by the halfspaces theorem (§2.4.1.1.1).

There is no geometric difference^{2.19} between supporting hyperplane ∂H_+ or ∂H_− or ∂H and an ordinary hyperplane ∂H coincident with them.

2.4.2.6.2 Example. Minimization on the unit cube.
Consider minimization of a linear function on a hypercube, given vector c :

    minimize_x   c^T x
    subject to   −1 ≼ x ≼ 1    (107)

This convex optimization problem is called a linear program because the constraints describe intersection of a finite number of halfspaces (and hyperplanes). Applying graphical concepts from Figure 16, Figure 18, and Figure 19, an optimal solution can be shown to be x^⋆ = −sgn(c) , but it is not necessarily unique. Because a solution always exists at a hypercube vertex regardless of the value of nonzero vector c , [57] mathematicians see this as a means to relax a problem whose desired solution is integer (confer Example 6.2.3.0.2).

2.4.2.6.3 Exercise. Unbounded below.
Suppose instead we minimize over the unit hypersphere in Example 2.4.2.6.2: ‖x‖ ≤ 1 . What is an expression for optimal solution now? Is that program still linear?

Now suppose we instead minimize absolute value in (107). Are the following programs equivalent for some arbitrary real convex set C ? (confer (1311))

    minimize_{x ∈ R}   |x|              minimize_{x_+ , x_−}   x_+ + x_−
    subject to   −1 ≤ x ≤ 1      ≡      subject to   1 ≥ x_− ≥ 0
                 x ∈ C                               1 ≥ x_+ ≥ 0
                                                     x_+ − x_− ∈ C    (108)

^{2.19} If vector-normal polarity is unimportant, we may instead signify a supporting hyperplane by ∂H .


Many optimization problems of interest and some older methods of solution require nonnegative variables. The method illustrated below splits a variable into its nonnegative and negative parts; x = x_+ − x_− (extensible to vectors). Under what conditions on vector a and scalar b is an optimal solution x^⋆ negative infinity?

    minimize_{x_+ ∈ R , x_− ∈ R}   x_+ − x_−
    subject to   x_− ≥ 0
                 x_+ ≥ 0
                 a^T [x_+ ; x_−] = b    (109)

Minimization of the objective function^{2.20} entails maximization of x_− .

2.4.2.7 PRINCIPLE 3: Separating hyperplane

The third most fundamental principle of convex geometry again follows from the geometric Hahn-Banach theorem [162, §5.12] [15, §1] [75, §I.1.2] that guarantees existence of a hyperplane separating two nonempty convex sets in R^n whose relative interiors are nonintersecting. Separation intuitively means each set belongs to a halfspace on an opposing side of the hyperplane. There are two cases of interest:

1) If the two sets intersect only at their relative boundaries (§2.6.1.3), then there exists a separating hyperplane ∂H containing the intersection but containing no points relatively interior to either set. If at least one of the two sets is open, conversely, then the existence of a separating hyperplane implies the two sets are nonintersecting. [41, §2.5.1]

2) A strictly separating hyperplane ∂H intersects the closure of neither set; its existence is guaranteed when the intersection of the closures is empty and at least one set is bounded. [129, §A.4.1]

^{2.20} The objective is the function that is argument to minimization or maximization.
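Returning to Example 2.4.2.6.2, the closed-form optimum there is easy to confirm numerically (random c , hence almost surely with no zero entries):

    n = 8;  c = randn(n,1);
    xstar = -sign(c);             % an optimal hypercube vertex (not nec. unique)
    abs(c'*xstar + norm(c,1))     % ~0 : minimum value of (107) is -||c||_1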


2.5 Subspace representations

There are two common forms of expression for subspaces, both coming from elementary linear algebra: range form and nullspace form; a.k.a. vertex-description and halfspace-description respectively.

The fundamental vector subspaces associated with a matrix A ∈ R^{m×n} [220, §3.1] are ordinarily related

R(A^T) ⊥ N(A),  N(A^T) ⊥ R(A)   (110)
R(A^T) ⊕ N(A) = R^n,  N(A^T) ⊕ R(A) = R^m   (111)

and of dimension:

dim R(A^T) = dim R(A) = rank A ≤ min{m, n}   (112)
dim N(A) = n − rank A,  dim N(A^T) = m − rank A   (113)

From these four fundamental subspaces, the rowspace and range identify one form of subspace description (range form or vertex-description (§2.3.4))

R(A^T) ≜ span A^T = {A^T y | y ∈ R^m} = {x ∈ R^n | A^T y = x, y ∈ R(A)}   (114)
R(A) ≜ span A = {Ax | x ∈ R^n} = {y ∈ R^m | Ax = y, x ∈ R(A^T)}   (115)

while the nullspaces identify the second common form (nullspace form or halfspace-description (89))

N(A) ≜ {x ∈ R^n | Ax = 0}   (116)
N(A^T) ≜ {y ∈ R^m | A^T y = 0}   (117)

Range forms (114) (115) are realized as the respective span of the column vectors in matrices A^T and A , whereas nullspace form (116) or (117) is the solution set to a linear equation similar to hyperplane definition (90). Yet because matrix A generally has multiple rows, halfspace-description N(A) is actually the intersection of as many hyperplanes through the origin; for (116), each row of A is normal to a hyperplane, while each row of A^T is a normal for (117).
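These relations are easy to confirm numerically; a minimal Matlab sketch using orth and null:

    % Four fundamental subspaces of a random fat matrix A.
    m = 3;  n = 5;
    A  = randn(m,n);
    Ra = orth(A');                     % orthonormal basis for R(A'), rank A columns
    Za = null(A);                      % orthonormal basis for N(A), n - rank A columns
    disp(norm(Ra'*Za))                 % ~0 : R(A') orthogonal to N(A), per (110)
    disp([size(Ra,2) + size(Za,2), n]) % dimensions sum to n, per (111) and (113)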


2.5.1 Subspace or affine set ...

Any particular vector subspace R^p can be described as N(A), the nullspace of some matrix A , or as R(B), the range of some matrix B .

More generally, we have the choice of expressing an (n − m)-dimensional affine subset in R^n as the intersection of m hyperplanes, or as the offset span of n − m vectors:

2.5.1.1 ...as hyperplane intersection

Any affine subset A of dimension n − m can be described as an intersection of m hyperplanes in R^n ; given fat full-rank (rank = min{m, n}) matrix

A ≜ [ a_1^T ; ⋮ ; a_m^T ] ∈ R^{m×n}   (118)

and vector b ∈ R^m,

A ≜ {x ∈ R^n | Ax = b} = ⋂_{i=1}^m { x | a_i^T x = b_i }   (119)

a halfspace-description. (89)

For example: The intersection of any two independent^{2.21} hyperplanes in R^3 is a line, whereas three independent hyperplanes intersect at a point. In R^4, the intersection of two independent hyperplanes is a plane (Example 2.5.1.2.1), whereas three hyperplanes intersect at a line, four at a point, and so on.

For n > k

A ∩ R^k = {x ∈ R^n | Ax = b} ∩ R^k = ⋂_{i=1}^m { x ∈ R^k | a_i(1:k)^T x = b_i }   (120)

The result in §2.4.2.2 is extensible; id est, any affine subset A also has a vertex-description.

2.21 Hyperplanes are said to be independent iff the normals defining them are linearly independent.


2.5.1.2 ...as span of nullspace basis

Alternatively, we may compute a basis for the nullspace of matrix A in (118) and then equivalently express the affine subset as its range plus an offset: Define

Z ≜ basis N(A) ∈ R^{n×(n−m)}   (121)

so that AZ = 0. Then we have the vertex-description

A = {x ∈ R^n | Ax = b} = { Zξ + x_p | ξ ∈ R^{n−m} } ⊆ R^n   (122)

the offset span of n − m column vectors, where x_p is any particular solution to Ax = b .

2.5.1.2.1 Example. Intersecting planes in 4-space.
Two planes can intersect at a point in four-dimensional Euclidean vector space. It is easy to visualize intersection of two planes in three dimensions; a line can be formed. In four dimensions it is harder to visualize. So let's resort to the tools acquired.

Suppose an intersection of two hyperplanes in four dimensions is specified by a fat full-rank matrix A_1 ∈ R^{2×4} (m = 2, n = 4) as in (119):

A_1 ≜ { x ∈ R^4 | [ a_11 a_12 a_13 a_14 ; a_21 a_22 a_23 a_24 ] x = b_1 }   (123)

The nullspace of A_1 is two-dimensional (from Z in (122)), so A_1 represents a plane in four dimensions. Similarly define a second plane in terms of A_2 ∈ R^{2×4}:

A_2 ≜ { x ∈ R^4 | [ a_31 a_32 a_33 a_34 ; a_41 a_42 a_43 a_44 ] x = b_2 }   (124)

If the two planes are independent (meaning any line in one is linearly independent of any line from the other), they will intersect at a point because [A_1 ; A_2] is then invertible;

A_1 ∩ A_2 = { x ∈ R^4 | [A_1 ; A_2] x = [b_1 ; b_2] }   (125)
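A quick numerical rendition of this example (a sketch; the planes here are randomly generated, hence independent with probability 1):

    % Two planes in R^4 intersecting at a point.
    A1 = randn(2,4);  b1 = randn(2,1);   % first plane:   A1*x = b1
    A2 = randn(2,4);  b2 = randn(2,1);   % second plane:  A2*x = b2
    disp(size(null(A1),2))               % 2 : nullspace basis Z from (121) is 4x2
    x = [A1; A2] \ [b1; b2];             % unique intersection point, per (125)
    disp(norm([A1; A2]*x - [b1; b2]))    % ~0 : x lies on both planes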


2.5.2 Intersection of subspaces

The intersection of nullspaces associated with two matrices A ∈ R^{m×n} and B ∈ R^{k×n} can be expressed most simply as

N(A) ∩ N(B) = N([A ; B]) ≜ { x ∈ R^n | [A ; B] x = 0 }   (126)

the nullspace of their rowwise concatenation.

Suppose the columns of a matrix Z constitute a basis for N(A) while the columns of a matrix W constitute a basis for N(BZ). Then [93, §12.4.2]

N(A) ∩ N(B) = R(ZW)   (127)

If each basis is orthonormal, then the columns of ZW constitute an orthonormal basis for the intersection.

In the particular circumstance that A and B are each positive semidefinite [17, §6], or in the circumstance that A and B are two linearly independent dyads (§B.1.1), then

N(A) ∩ N(B) = N(A + B),   A, B ∈ S^M_+  or  A + B = u_1 v_1^T + u_2 v_2^T (l.i.)   (128)

2.6 Extreme, Exposed

2.6.0.0.1 Definition. Extreme point.
An extreme point x_ε of a convex set C is a point, belonging to its closure C̄ [29, §3.3], that is not expressible as a convex combination of points in C distinct from x_ε ; id est, for x_ε ∈ C and all x_1, x_2 ∈ C \ x_ε

μ x_1 + (1 − μ) x_2 ≠ x_ε,  μ ∈ [0, 1]   (129)  △
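Identity (127) translates directly into Matlab; a minimal sketch:

    % Orthonormal basis for N(A) ∩ N(B) via (127).
    A  = randn(2,6);  B = randn(3,6);
    Z  = null(A);                          % orthonormal basis for N(A)
    W  = null(B*Z);                        % orthonormal basis for N(B*Z)
    ZW = Z*W;                              % basis for the intersection
    disp([norm(A*ZW), norm(B*ZW)])         % both ~0
    disp(norm(ZW'*ZW - eye(size(ZW,2))))   % ~0 : columns are orthonormal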


In other words, x_ε is an extreme point of C if and only if x_ε is not a point relatively interior to any line segment in C . [239, §2.10]

Borwein & Lewis offer [38, §4.1.6]: an extreme point of a convex set C is a point x_ε in C whose relative complement C \ x_ε is convex.

The set consisting of a single point, C = {x_ε}, is itself an extreme point.

2.6.0.0.2 Theorem. Extreme existence. [202, §18.5.3] [19, §II.3.5]
A nonempty closed convex set containing no lines has at least one extreme point. ⋄

2.6.0.0.3 Definition. Face, edge. [129, §A.2.3]
A face F of convex set C is a convex subset F ⊆ C such that every closed line segment x_1x_2 in C , having a relatively interior point (x ∈ rel int x_1x_2) in F , has both endpoints in F . The zero-dimensional faces of C constitute its extreme points. The empty set and C itself are conventional faces of C . [202, §18]

All faces F are extreme sets by definition; id est, for F ⊆ C and all x_1, x_2 ∈ C \ F

μ x_1 + (1 − μ) x_2 ∉ F,  μ ∈ [0, 1]   (130)

A one-dimensional face of a convex set is called an edge. △

Dimension of a face is the penultimate number of affinely independent points (§2.4.2.3) belonging to it;

dim F = sup_ρ dim{ x_2 − x_1, x_3 − x_1, ..., x_ρ − x_1 | x_i ∈ F, i = 1 ... ρ }   (131)

The point of intersection in C with a strictly supporting hyperplane identifies an extreme point, but not vice versa. The nonempty intersection of any supporting hyperplane with C identifies a face, in general, but not vice versa. To acquire a converse, the concept of exposed face requires introduction:


Figure 20: Closed convex set in R^2. Point A is exposed hence extreme; a classical vertex. Point B is extreme but not an exposed point. Point C is exposed and extreme; zero-dimensional exposure makes it a vertex. Point D is neither an exposed nor an extreme point although it belongs to a one-dimensional exposed face. [129, §A.2.4] [218, §3.6] Closed face AB is exposed; a facet. The arc is not a conventional face, yet it is composed entirely of extreme points. Union of all rotations of this entire set about its vertical edge produces another convex set in three dimensions having no edges; but that convex set produced by rotation about the horizontal edge containing D has edges.


2.6.1 Exposure

2.6.1.0.1 Definition. Exposed face, exposed point, vertex, facet. [129, §A.2.3, §A.2.4]

• F is an exposed face of an n-dimensional convex set C iff there is a supporting hyperplane ∂H to C such that

F = C ∩ ∂H   (132)

Only faces of dimension −1 through n − 1 can be exposed by a hyperplane.

• An exposed point, the definition of vertex, is equivalent to a zero-dimensional exposed face; the point of intersection with a strictly supporting hyperplane.

• A facet is an (n − 1)-dimensional exposed face of an n-dimensional convex set C ; facets exist in one-to-one correspondence with the (n − 1)-dimensional faces.^{2.22}

• {exposed points} ⊆ {extreme points}
  {exposed faces} ⊆ {faces}   △

2.6.1.1 Density of exposed points

For any closed convex set C , its exposed points constitute a dense subset of its extreme points; [202, §18] [223] [218, §3.6, p.115] dense in the sense [252] that closure of that subset yields the set of extreme points.

For the convex set illustrated in Figure 20, point B cannot be exposed because it relatively bounds both the facet AB and the closed quarter circle, each bounding the set. Since B is not relatively interior to any line segment in the set, B is an extreme point by definition. Point B may be regarded as the limit of some sequence of exposed points beginning at vertex C .

2.22 This coincidence occurs simply because the facet's dimension is the same as the dimension of the supporting hyperplane exposing it.


2.6.1.2 Face transitivity and algebra

Faces of a convex set enjoy transitive relation. If F_1 is a face (an extreme set) of F_2, which in turn is a face of F_3, then it is always true that F_1 is a face of F_3. (The parallel statement for exposed faces is false. [202, §18]) For example, any extreme point of F_2 is an extreme point of F_3 ; in this example, F_2 could be a face exposed by a hyperplane supporting polyhedron F_3. [145, def.115/6, p.358] Yet it is erroneous to presume that a face, of dimension 1 or more, consists entirely of extreme points; nor is a face of dimension 2 or more entirely composed of edges, and so on.

For the polyhedron in R^3 from Figure 11, for example, the nonempty faces exposed by a hyperplane are the vertices, edges, and facets; there are no more. The zero-, one-, and two-dimensional faces are in one-to-one correspondence with the exposed faces in that example.

Define the smallest face F that contains some element G of a convex set C :

F(C ∋ G)   (133)

videlicet, C ⊇ F(C ∋ G) ∋ G . An affine set has no faces except itself and the empty set. The smallest face, containing G , of the intersection of convex set C with an affine set A [156, §2.4]

F((C ∩ A) ∋ G) = F(C ∋ G) ∩ A   (134)

equals the intersection of A with the smallest face of C that contains G .

2.6.1.3 Boundary

The classical definition of boundary of a set C presumes nonempty interior:

∂C = C̄ \ int C   (13)

More suitable for the study of convex sets is the relative boundary, defined [129, §A.2.1.2]

rel ∂C = C̄ \ rel int C   (135)

the boundary relative to the affine hull of C , conventionally equivalent to:


2.6.1.3.1 Definition. Conventional boundary of convex set. [129, §C.3.1]
The relative boundary ∂C of a nonempty convex set C is the union of all the exposed faces of C . △

Equivalence of this definition to (135) comes about because it is conventionally presumed that any supporting hyperplane, central to the definition of exposure, does not contain C . [202, p.100]

Any face F of convex set C (that is not C itself) belongs to rel ∂C . (§2.8.2.1) In the exception when C is a single point {x}, (10)

rel ∂{x} = {x} \ {x} = ∅,  x ∈ R^n   (136)

A bounded convex polyhedron (§2.12.0.0.1) having nonempty interior, for example, in R has a boundary constructed from two points, in R^2 from at least three line segments, in R^3 from convex polygons, while a convex polychoron (a bounded polyhedron in R^4 [252]) has a boundary constructed from three-dimensional convex polyhedra. By Definition 2.6.1.3.1, an affine set has no relative boundary.

2.7 Cones

In optimization, convex cones achieve prominence because they generalize subspaces. Most compelling is the projection analogy: Projection on a subspace can be ascertained from projection on its orthogonal complement (§E), whereas projection on a closed convex cone can be determined from projection instead on its algebraic complement (§2.13, §E.9.2.1), called the polar cone.

2.7.0.0.1 Definition. Ray.
The one-dimensional set

{ ζΓ + B | ζ ≥ 0, Γ ≠ 0 } ⊂ R^n   (137)

defines a halfline called a ray in nonzero direction Γ ∈ R^n having base B ∈ R^n. When B = 0, a ray is the conic hull of direction Γ ; hence a convex cone. △

The conventional boundary of a single ray, base 0, in any dimension is the origin, because that is the union of all exposed faces not containing the entire set. Its relative interior is the ray itself excluding the origin.


Figure 21: (a) Two-dimensional nonconvex cone drawn truncated. Boundary of this cone is itself a cone. Each polar half is itself a convex cone. (b) This convex cone (drawn truncated) is a line through the origin in any dimension. It has no relative boundary, while its relative interior comprises the entire line.

Figure 22: This nonconvex cone in R^2 is a pair of lines through the origin. [162, §2.4]


Figure 23: Boundary of a convex cone in R^2 is a nonconvex cone; a pair of rays emanating from the origin.

Figure 24: Nonconvex cone X drawn truncated in R^2. Boundary is also a cone. [162, §2.4] Cone exterior is a convex cone.


Figure 25: Truncated nonconvex cone X = { x ∈ R^2 | x_1 ≥ x_2, x_1 x_2 ≥ 0 }. Boundary is also a cone. [162, §2.4] Cartesian axes drawn for reference. Each half (about the origin) is itself a convex cone.

2.7.1 Cone defined

A set X is called, simply, cone if and only if

Γ ∈ X ⇒ ζΓ ∈ X̄ for all ζ ≥ 0   (138)

where X̄ denotes closure of cone X . An example of such a cone is the union of two opposing quadrants; e.g., X = { x ∈ R^2 | x_1 x_2 ≥ 0 }, which is not convex. [250, §2.5] Similar examples are shown in Figure 21 and Figure 25.

All cones can be defined by an aggregate of rays emanating exclusively from the origin (but not all cones are convex). Hence all closed cones contain the origin and are unbounded, excepting the simplest cone {0}. The empty set ∅ is not a cone, but

cone ∅ ≜ {0}   (80)


2.7.2 Convex cone

We call the set K ⊆ R^M a convex cone iff

Γ_1, Γ_2 ∈ K ⇒ ζΓ_1 + ξΓ_2 ∈ K for all ζ, ξ ≥ 0   (139)

Apparent from this definition: ζΓ_1 ∈ K and ξΓ_2 ∈ K for all ζ, ξ ≥ 0. The set K is convex since, for any particular ζ, ξ ≥ 0,

μζΓ_1 + (1 − μ)ξΓ_2 ∈ K  ∀ μ ∈ [0, 1]   (140)

because μζ, (1 − μ)ξ ≥ 0. Obviously,

{X} ⊃ {K}   (141)

the set of all convex cones is a proper subset of all cones. The set of convex cones is a narrower but more familiar class of cone, any member of which can be equivalently described as the intersection of a possibly (but not necessarily) infinite number of hyperplanes (through the origin) and halfspaces whose bounding hyperplanes pass through the origin; a halfspace-description (§2.4). The interior of a convex cone is possibly empty.

Figure 26: Not a cone; ironically, the three-dimensional flared horn (with or without its interior) resembling the mathematical symbol ≻ denoting cone membership and partial order.


Familiar examples of convex cones include an unbounded ice-cream cone united with its interior (a.k.a.: second-order cone, quadratic cone, circular cone (§2.9.2.5.1), Lorentz cone (confer Figure 32) [41, exmps.2.3 & 2.25]),

K_l = { [x ; t] ∈ R^n × R | ‖x‖_l ≤ t },  l = 2   (142)

and any polyhedral cone (§2.12.1.0.1); e.g., any orthant generated by Cartesian axes (§2.1.3). Esoteric examples of convex cones include the point at the origin, any line through the origin, any ray having the origin as base such as the nonnegative real line R_+ in subspace R , any halfspace partially bounded by a hyperplane through the origin, the positive semidefinite cone S^M_+ (155), the cone of Euclidean distance matrices EDM^N (470) (§5), any subspace, and Euclidean vector space R^n.

2.7.2.1 cone invariance

(confer figures: 12, 21, 22, 23, 24, 25, 26, 27, 29, 36, 39, 42, 43, 44, 45, 46, 47, 48, 49, 78, 91, 121) More Euclidean bodies are cones, it seems, than are not. This class of convex body, the convex cone, is invariant to nonnegative scaling, vector summation, affine and inverse affine transformation, Cartesian product, and intersection, [202, p.22] but is not invariant to projection; e.g., Figure 31.

2.7.2.1.1 Theorem. Cone intersection. [202, §2, §19]
The intersection of an arbitrary collection of convex cones is a convex cone. Intersection of an arbitrary collection of closed convex cones is a closed convex cone. [168, §2.3] Intersection of a finite number of polyhedral cones (§2.12.1.0.1, Figure 36 p.135) is polyhedral. ⋄

The property pointedness is associated with a convex cone.

2.7.2.1.2 Definition. Pointed convex cone. (confer §2.12.2.2)
A convex cone K is pointed iff it contains no line. Equivalently, K is not pointed iff there exists any nonzero direction Γ ∈ K such that −Γ ∈ K . [41, §2.4.1] If the origin is an extreme point of K or, equivalently, if

K ∩ −K = {0}   (143)

then K is pointed, and vice versa. [218, §2.10] △


Thus the simplest and only bounded [250, p.75] convex cone K = {0} ⊆ R^n is pointed, by convention, but has empty interior. Its relative boundary is the empty set (136) while its relative interior is the point itself (10). The pointed convex cone that is a halfline emanating from the origin in R^n has the origin as relative boundary while its relative interior is the halfline itself, excluding the origin.

2.7.2.1.3 Theorem. Pointed cones. [38, §3.3.15, exer.20]
A closed convex cone K ⊂ R^n is pointed if and only if there exists a normal α such that the set

C ≜ { x ∈ K | 〈x, α〉 = 1 }   (144)

is closed, bounded, and K = cone C . Equivalently, K is pointed if and only if there exist a vector β and positive scalar ε such that

〈x, β〉 ≥ ε‖x‖  ∀ x ∈ K   (145)  ⋄

If closed convex cone K is not pointed, then it has no extreme point. Yet a pointed closed convex cone has only one extreme point; it resides at the origin. [29, §3.3]

From the cone intersection theorem it follows that an intersection of convex cones is pointed if at least one of the cones is.

2.7.2.2 Pointed closed convex cone and partial order

A pointed closed convex cone K induces partial order [252] on R^n or R^{m×n}, [17, §1] respectively defined by vector or matrix inequality:

x ≼_K z ⇔ z − x ∈ K   (146)
x ≺_K z ⇔ z − x ∈ rel int K   (147)

Neither x nor z is necessarily a member of K for these relations to hold. Only when K is the nonnegative orthant do these inequalities reduce to ordinary entrywise comparison. (§2.13.4.2.3) Inclusive of that special case, we ascribe


nomenclature generalized inequality to comparison with respect to a pointed closed convex cone.

The visceral mechanics of actually comparing vectors, when the cone is not an orthant, are well illustrated in the example of Figure 46, which relies on the equivalent membership interpretation in (146) or (147). Comparable points and the minimum element of some vector- or matrix-valued set are thus well defined (and decreasing sequences with respect to K therefore converge): Vector x ∈ C is the (unique) minimum element of set C with respect to cone K if and only if for each and every z ∈ C we have x ≼ z ; equivalently, iff C ⊆ x + K .^{2.23} A closely related concept, minimal element, is useful for sets having no minimum element: Vector x ∈ C is a minimal element of set C with respect to K if and only if (x − K) ∩ C = x .

More properties of partial ordering with respect to K are cataloged in [41, §2.4] with respect to proper cones:^{2.24} such as reflexivity (x ≼ x), antisymmetry (x ≼ z, z ≼ x ⇒ x = z), transitivity (x ≼ y, y ≼ z ⇒ x ≼ z), additivity (x ≼ z, u ≼ v ⇒ x + u ≼ z + v), strict properties, and preservation under nonnegative scaling or limiting operations.

2.7.2.2.1 Definition. Proper cone: [41, §2.4.1] a cone that is
• pointed
• closed
• convex
• has nonempty interior. △

A proper cone remains proper under injective linear transformation. [55, §5.1]

2.23 Borwein & Lewis [38, §3.3, exer.21] ignore possibility of equality to x + K in this condition, and require a second condition: ... and C ⊂ y + K for some y in R^n implies x ∈ y + K .
2.24 We distinguish pointed closed convex cones here because the generalized inequality and membership corollary (§2.13.2.0.1) remains intact. [129, §A.4.2.7]


Examples of proper cones are the positive semidefinite cone S^M_+ in the ambient space of symmetric matrices (§2.9), the nonnegative real line R_+ in vector space R , or any orthant in R^n.

2.8 Cone boundary

Every hyperplane supporting a convex cone contains the origin. [129, §A.4.2] Because any supporting hyperplane to a convex cone must therefore itself be a cone, it follows from the cone intersection theorem:

2.8.0.0.1 Lemma. Cone faces. [19, §II.8]
Each nonempty exposed face of a convex cone is a convex cone. ⋄

2.8.0.0.2 Theorem. Proper-cone boundary.
Suppose a nonzero point Γ lies on the boundary ∂K of proper cone K in R^n. Then it follows that the ray {ζΓ | ζ ≥ 0} also belongs to ∂K . ⋄

Proof. By virtue of its propriety, a proper cone guarantees existence of a strictly supporting hyperplane at the origin. [202, cor.11.7.3]^{2.25} Hence the origin belongs to the boundary of K because it is the zero-dimensional exposed face. The origin belongs to the ray through Γ , and the ray belongs to K by definition (138). By the cone faces lemma, each and every nonempty exposed face must include the origin. Hence the closed line segment 0Γ must lie in an exposed face of K because both endpoints do, by Definition 2.6.1.3.1. That means there exists a supporting hyperplane ∂H to K containing 0Γ . So the ray through Γ belongs both to K and to ∂H . ∂H must therefore expose a face of K that contains the ray; id est,

{ζΓ | ζ ≥ 0} ⊆ K ∩ ∂H ⊂ ∂K   (148)

2.25 Rockafellar's corollary yields a supporting hyperplane at the origin to any convex cone in R^n not equal to R^n.


Proper cone {0} in R^0 has no boundary (135) because (10)

rel int{0} = {0}   (149)

The boundary of any proper cone in R is the origin. The boundary of any proper cone whose dimension exceeds 1 can be constructed entirely from an aggregate of rays emanating exclusively from the origin.

2.8.1 Extreme direction

The property extreme direction arises naturally in connection with the pointed closed convex cone K ⊂ R^n, being analogous to extreme point. [202, §18, p.162]^{2.26} An extreme direction Γ_ε of pointed K is a vector corresponding to an edge that is a ray emanating from the origin.^{2.27} Nonzero direction Γ_ε in pointed K is extreme if and only if

ζ_1 Γ_1 + ζ_2 Γ_2 ≠ Γ_ε  ∀ ζ_1, ζ_2 ≥ 0,  ∀ Γ_1, Γ_2 ∈ K \ {ζΓ_ε ∈ K | ζ ≥ 0}   (150)

In words, an extreme direction in a pointed closed convex cone is the direction of a ray, called an extreme ray, that cannot be expressed as a conic combination of any ray directions in the cone distinct from it.

An extreme ray is a one-dimensional face of K . By (81), extreme direction Γ_ε is not a point relatively interior to any line segment in K \ {ζΓ_ε ∈ K | ζ ≥ 0}. Thus, by analogy, the corresponding extreme ray {ζΓ_ε ∈ K | ζ ≥ 0} is not a ray relatively interior to any plane segment^{2.28} in K .

2.8.1.1 extreme distinction, uniqueness

An extreme direction is unique, but its vector representation Γ_ε is not, because any positive scaling of it produces another vector in the same (extreme) direction. Hence an extreme direction is unique to within a positive scaling. When we say extreme directions are distinct, we are referring to distinctness of the rays containing them. Nonzero vectors of various length in

2.26 We diverge from Rockafellar's extreme direction: "extreme point at infinity".
2.27 An edge (§2.6.0.0.3) of a convex cone is not necessarily a ray. A convex cone may contain an edge that is a line; e.g., a wedge-shaped polyhedral cone (K* in Figure 27).
2.28 A planar fragment; in this context, a planar cone.


Figure 27: K is a pointed polyhedral cone having empty interior in R^3 (drawn truncated and in a plane parallel to the floor upon which you stand). K* is a wedge whose truncated boundary is illustrated (drawn perpendicular to the floor). In this particular instance, K ⊂ int K* (excepting the origin). Cartesian coordinate axes drawn for reference.


the same extreme direction are therefore interpreted to be identical extreme directions.^{2.29}

The extreme directions of the polyhedral cone in Figure 12 (page 69), for example, correspond to its three edges. The extreme directions of the positive semidefinite cone (§2.9) comprise the infinite set of all symmetric rank-one matrices. [17, §6] [126, §III] It is sometimes prudent to instead consider the less infinite but complete normalized set, for M > 0 (confer (187))

{ zz^T ∈ S^M | ‖z‖ = 1 }   (151)

The positive semidefinite cone in one dimension M = 1, S_+ the nonnegative real line, has one extreme direction belonging to its relative interior; an idiosyncrasy of dimension 1.

Pointed closed convex cone K = {0} has no extreme direction because extreme directions are nonzero by definition.

If closed convex cone K is not pointed, then it has no extreme directions and no vertex. [17, §1] Conversely, pointed closed convex cone K is equivalent to the convex hull of its vertex and all its extreme directions. [202, §18, p.167] That is the practical utility of extreme direction: to facilitate construction of polyhedral sets, apparent from the extremes theorem:

2.8.1.1.1 Theorem. (Klee) Extremes. [218, §3.6] [202, §18, p.166] (confer §2.3.2, §2.12.2.0.1)
Any closed convex set containing no lines can be expressed as the convex hull of its extreme points and extreme rays. ⋄

2.29 Like vectors, an extreme direction can be identified by the Cartesian point at the vector's head with respect to the origin.


It follows that any element of a convex set containing no lines may be expressed as a linear combination of its extreme elements; e.g., Example 2.9.2.4.1.

2.8.1.2 Generators

In the narrowest sense, generators for a convex set comprise any collection of points and directions whose convex hull constructs the set. When the extremes theorem applies, the extreme points and directions are called generators of a convex set. An arbitrary collection of generators for a convex set includes its extreme elements as a subset; the set of extreme elements of a convex set is a minimal set of generators for that convex set. Any polyhedral set has a minimal set of generators whose cardinality is finite.

When the convex set under scrutiny is a closed convex cone, conic combination of generators during construction is implicit, as shown in Example 2.8.1.2.1 and Example 2.10.2.0.1. So a vertex at the origin (if it exists) becomes benign.

We can, of course, generate affine sets by taking the affine hull of any collection of points and directions. We broaden, thereby, the meaning of generator to be inclusive of all kinds of hulls. Any hull of generators is loosely called a vertex-description. (§2.3.4) Hulls encompass subspaces, so any basis constitutes generators for a vertex-description; span basis R(A).

2.8.1.2.1 Example. Application of extremes theorem.
Given an extreme point at the origin and N extreme rays, denoting the i-th extreme direction by Γ_i ∈ R^n, then the convex hull is (71)

P = { [0 Γ_1 Γ_2 ··· Γ_N] a ζ | a^T 1 = 1, a ≽ 0, ζ ≥ 0 }
  = { [Γ_1 Γ_2 ··· Γ_N] a ζ | a^T 1 ≤ 1, a ≽ 0, ζ ≥ 0 }
  = { [Γ_1 Γ_2 ··· Γ_N] b | b ≽ 0 } ⊂ R^n   (152)

a closed convex set that is simply a conic hull like (79).


2.8.2 Exposed direction

2.8.2.0.1 Definition. Exposed point & direction of pointed convex cone. [202, §18] (confer §2.6.1.0.1)

• When a convex cone has a vertex, an exposed point, it resides at the origin; there can be only one.

• In the closure of a pointed convex cone, an exposed direction is the direction of a one-dimensional exposed face that is a ray emanating from the origin.

• {exposed directions} ⊆ {extreme directions}   △

For a proper cone in vector space R^n with n ≥ 2, we can say more:

{exposed directions} = {extreme directions}   (153)

It follows from Lemma 2.8.0.0.1, for any pointed closed convex cone, that there is a one-to-one correspondence of one-dimensional exposed faces with exposed directions; id est, there is no one-dimensional exposed face that is not a ray with base 0.

The pointed closed convex cone EDM^2, for example, is a ray in isomorphic subspace R whose relative boundary (§2.6.1.3.1) is the origin. The conventionally exposed directions of EDM^2 constitute the empty set ∅ ⊂ {extreme direction}. This cone has one extreme direction belonging to its relative interior; an idiosyncrasy of dimension 1.

2.8.2.1 Connection between boundary and extremes

2.8.2.1.1 Theorem. Exposed. [202, §18.7] (confer §2.8.1.1.1)
Any closed convex set C containing no lines (and whose dimension is at least 2) can be expressed as the closure of the convex hull of its exposed points and exposed rays. ⋄


Figure 28: Properties of extreme points carry over to extreme directions. [202, §18] Four rays (drawn truncated) on the boundary of the conic hull of the two-dimensional closed convex set from Figure 20, lifted to R^3. Ray through point A is exposed hence extreme. Extreme direction B on the cone boundary is not an exposed direction, although it belongs to the exposed face cone{A, B}. Extreme ray through C is exposed. Point D is neither an exposed nor an extreme direction although it belongs to a two-dimensional exposed face of the conic hull.


From Theorem 2.8.1.1.1,

rel ∂C = C̄ \ rel int C   (135)
       = conv{exposed points and exposed rays} \ rel int C
       = conv{extreme points and extreme rays} \ rel int C   (154)

Thus each and every extreme point of a convex set (that is not a point) resides on its relative boundary, while each and every extreme direction of a convex set (that is not a halfline and contains no line) resides on its relative boundary, because extreme points and directions of such respective sets do not belong to the relative interior by definition.

The relationship between extreme sets and the relative boundary actually goes deeper: Any face F of convex set C (that is not C itself) belongs to rel ∂C , so dim F < dim C . [202, §18.1.3]

2.8.2.2 Converse caveat

It is inconsequent to presume that each and every extreme point and direction is necessarily exposed, as might be erroneously inferred from the conventional boundary definition (§2.6.1.3.1); although it can correctly be inferred: each and every extreme point and direction belongs to some exposed face.

Arbitrary points residing on the relative boundary of a convex set are not necessarily exposed or extreme points. Similarly, the direction of an arbitrary ray, base 0, on the boundary of a convex cone is not necessarily an exposed or extreme direction. For the polyhedral cone illustrated in Figure 12, for example, there are three two-dimensional exposed faces constituting the entire boundary, each composed of an infinity of rays. Yet there are only three exposed directions.

Neither is an extreme direction on the boundary of a pointed convex cone necessarily an exposed direction. Lift the two-dimensional set in Figure 20, for example, into three dimensions such that no two points in the set are collinear with the origin. Then its conic hull can have an extreme direction B on the boundary that is not an exposed direction, as illustrated in Figure 28.


2.9 Positive semidefinite (PSD) cone

    The cone of positive semidefinite matrices studied in this section is arguably the most important of all non-polyhedral cones whose facial structure we completely understand.
        −Alexander Barvinok [19, p.78]

2.9.0.0.1 Definition. Positive semidefinite (PSD) cone.
The set of all symmetric positive semidefinite matrices of particular dimension M is called the positive semidefinite cone:

S^M_+ ≜ { A ∈ S^M | A ≽ 0 }
     = { A ∈ S^M | y^T A y ≥ 0 ∀ ‖y‖ = 1 }
     = ⋂_{‖y‖=1} { A ∈ S^M | 〈yy^T, A〉 ≥ 0 }   (155)

formed by the intersection of an infinite number of halfspaces (§2.4.1.1) in vectorized variable A ,^{2.30} each halfspace having partial boundary containing the origin in isomorphic R^{M(M+1)/2}. It is a unique immutable proper cone in the ambient space of symmetric matrices S^M.

The positive definite (full-rank) matrices comprise the cone interior,^{2.31}

int S^M_+ = { A ∈ S^M | A ≻ 0 }
         = { A ∈ S^M | y^T A y > 0 ∀ ‖y‖ = 1 }
         = { A ∈ S^M_+ | rank A = M }   (156)

while all singular positive semidefinite matrices (having at least one 0 eigenvalue) reside on the cone boundary (Figure 29); (§A.7.4)

∂S^M_+ = { A ∈ S^M | min{λ(A)_i, i = 1 ... M} = 0 }
      = { A ∈ S^M_+ | 〈yy^T, A〉 = 0 for some ‖y‖ = 1 }
      = { A ∈ S^M_+ | rank A < M }   (157)

where λ(A) ∈ R^M holds the eigenvalues of A . △

2.30 infinite in number when M > 1. Because y^T A y = y^T A^T y , matrix A is almost always assumed symmetric. (§A.2.1)
2.31 The remaining inequalities in (155) also become strict for membership to the cone interior.


The only symmetric positive semidefinite matrix in S^M_+ having M 0-eigenvalues resides at the origin. (§A.7.2.0.1)

2.9.0.1 Membership

Observe the notation A ≽ 0 ; meaning,^{2.32} matrix A is symmetric and belongs to the positive semidefinite cone in the subspace of symmetric matrices, whereas A ≻ 0 denotes membership to that cone's interior. (§2.13.2) This notation further implies that coordinates [sic] for orthogonal expansion of a positive (semi)definite matrix must be its (nonnegative) positive eigenvalues (§2.13.7.1.1, §E.6.4.1.1) when expanded in its eigenmatrices (§A.5.1).

2.9.0.1.1 Example. Equality constraints in semidefinite program (862).
Employing properties of partial ordering (§2.7.2.2) for the pointed closed convex positive semidefinite cone, it is easy to show, given A + S = C ,

S ≽ 0 ⇔ A ≼ C   (158)

2.9.1 PSD cone is convex

The set of all positive semidefinite matrices forms a convex cone in the ambient space of symmetric matrices because any pair satisfies definition (139); [131, §7.1] videlicet, for all ζ_1, ζ_2 ≥ 0 and each and every A_1, A_2 ∈ S^M

ζ_1 A_1 + ζ_2 A_2 ≽ 0 ⇐ A_1 ≽ 0, A_2 ≽ 0   (159)

a fact easily verified by the definitive test for positive semidefiniteness of a symmetric matrix (§A):

A ≽ 0 ⇔ x^T A x ≥ 0 for each and every ‖x‖ = 1   (160)

2.32 For matrices, notation A ≽ B denotes comparison on S^M with respect to the positive semidefinite cone; (§A.3.1) id est, A ≽ B ⇔ A − B ∈ S^M_+, a generalization of comparison on the real line. The symbol ≥ is reserved for scalar comparison on the real line R with respect to the nonnegative real line R_+, as in a^T y ≥ b , while a ≽ b denotes comparison of vectors on R^M with respect to the nonnegative orthant R^M_+.
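Numerically, membership tests (155)-(157) reduce to an eigenvalue check; a minimal Matlab sketch (the tolerance 1e-10 is an arbitrary choice):

    % PSD cone membership via minimum eigenvalue.
    isPSD = @(X) min(eig((X + X')/2)) >= -1e-10;   % symmetrize, then test
    A = randn(4);  A = A*A';                % manufactured PSD matrix
    S = randn(4);  S = (S + S')/2;          % arbitrary symmetric matrix
    disp([isPSD(A), isPSD(S)])              % A passes; a random S usually fails
    onBoundary = abs(min(eig(A))) < 1e-10;  % boundary test, per (157)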


Figure 29: Truncated boundary of PSD cone in S^2 plotted in isometrically isomorphic R^3 via svec (46), with coordinates (α, √2 β, γ) for [α β ; β γ] ∈ S^2 ; courtesy, Alexandre W. d'Aspremont. (Plotted is the 0-contour of minimum eigenvalue (157). Lightest shading is closest. Darkest shading is furthest and inside the shell.) Entire boundary can be constructed from an aggregate of rays (§2.7.0.0.1) emanating exclusively from the origin, { κ²[z_1² √2 z_1 z_2 z_2²]^T | κ ∈ R }. A minimal set of generators is the set of extreme directions: svec{yy^T | y ∈ R^M}. In this dimension the cone is circular (§2.9.2.5) while each and every ray on the boundary corresponds to an extreme direction, but such is not the case in any higher dimension (confer Figure 12). PSD cone geometry is not as simple in higher dimensions [19, §II.12], although for real matrices it is self-dual (305) in the ambient space of symmetric matrices. [126, §II] The PSD cone has no two-dimensional faces in any dimension, and its only extreme point resides at the origin.


Figure 30: Convex set C = { X ∈ S × x ∈ R | X ≽ xx^T } drawn truncated.

id est, for A_1, A_2 ≽ 0 and each and every ζ_1, ζ_2 ≥ 0

ζ_1 x^T A_1 x + ζ_2 x^T A_2 x ≥ 0 for each and every normalized x ∈ R^M   (161)

The convex cone S^M_+ is more easily visualized in the isomorphic vector space R^{M(M+1)/2} whose dimension is the number of free variables in a symmetric M×M matrix. When M = 2 the PSD cone is semi-infinite in expanse in R^3, having boundary illustrated in Figure 29. When M = 3 the PSD cone is six-dimensional, and so on.

2.9.1.0.1 Example. Sets from maps of the PSD cone.
The set

C = { X ∈ S^n × x ∈ R^n | X ≽ xx^T }   (162)

is convex because it has a Schur form; (§A.4)

X − xx^T ≽ 0 ⇔ f(X, x) ≜ [ X x ; x^T 1 ] ≽ 0   (163)

e.g., Figure 30. Set C is the inverse image (§2.1.9.0.1) of S^{n+1}_+ under the affine mapping f . The set { X ∈ S^n × x ∈ R^n | X ≼ xx^T } is not convex, in contrast, having no Schur form. Yet for fixed x = x_p, the set

{ X ∈ S^n | X ≼ x_p x_p^T }   (164)

is simply the negative semidefinite cone shifted to x_p x_p^T.


2.9.1.0.2 Example. Inverse image of the PSD cone.
Now consider finding the set of all matrices X ∈ S^N satisfying

AX + B ≽ 0   (165)

given A, B ∈ S^N. Define the set

X ≜ { X | AX + B ≽ 0 } ⊆ S^N   (166)

which is the inverse image of the positive semidefinite cone under the affine transformation g(X) ≜ AX + B . Set X must therefore be convex by Theorem 2.1.9.0.1.

Yet we would like a less amorphous characterization of this set, so instead we consider its vectorization (29), which is easier to visualize:

vec g(X) = vec(AX) + vec B = (I ⊗ A) vec X + vec B   (167)

where

I ⊗ A ≜ QΛQ^T ∈ S^{N²}   (168)

is a block-diagonal matrix formed by Kronecker product (§A.1 no.20, §D.1.2.1). Assign

x ≜ vec X ∈ R^{N²},  b ≜ vec B ∈ R^{N²}   (169)

then make the equivalent problem: Find

vec X = { x ∈ R^{N²} | (I ⊗ A)x + b ∈ K }   (170)

where

K ≜ vec S^N_+   (171)

is a proper cone isometrically isomorphic with the positive semidefinite cone in the subspace of symmetric matrices; the vectorization of every element of S^N_+. Utilizing the diagonalization (168),

vec X = { x | ΛQ^T x ∈ Q^T(K − b) }
      = { x | ΦQ^T x ∈ Λ†Q^T(K − b) } ⊆ R^{N²}   (172)

where

Φ ≜ Λ†Λ   (173)
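The vectorization step (167) rests on the identity vec(AX) = (I ⊗ A) vec X, which can be confirmed in a few lines of Matlab (a quick sketch):

    % Kronecker identity behind (167).
    N = 3;
    A = randn(N);  A = (A + A')/2;
    X = randn(N);  X = (X + X')/2;
    lhs = reshape(A*X, [], 1);             % vec(A X)
    rhs = kron(eye(N), A) * X(:);          % (I ⊗ A) vec X
    disp(norm(lhs - rhs))                  % ~0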


is a diagonal projection matrix whose entries are either 1 or 0 (§E.3). We have the complementary sum

ΦQ^T x + (I − Φ)Q^T x = Q^T x   (174)

So, adding (I − Φ)Q^T x to both sides of the membership within (172) admits

vec X = { x ∈ R^{N²} | Q^T x ∈ Λ†Q^T(K − b) + (I − Φ)Q^T x }
      = { x | Q^T x ∈ Φ(Λ†Q^T(K − b)) ⊕ (I − Φ)R^{N²} }
      = { x ∈ QΛ†Q^T(K − b) ⊕ Q(I − Φ)R^{N²} }
      = (I ⊗ A)†(K − b) ⊕ N(I ⊗ A)   (175)

where we used the facts: linear function Q^T x in x on R^{N²} is a bijection, and ΦΛ† = Λ†. Hence

vec X = (I ⊗ A)† vec(S^N_+ − B) ⊕ N(I ⊗ A)   (176)

In words, set vec X is the vector sum of the translated PSD cone (linearly mapped onto the rowspace of I ⊗ A (§E)) and the nullspace of I ⊗ A (a synthesis of fact from §A.6.3 and §A.7.2.0.1). Should I ⊗ A have no nullspace, then vec X = (I ⊗ A)^{−1} vec(S^N_+ − B), which is the expected result.

2.9.2 PSD cone boundary

For any symmetric positive semidefinite matrix A of rank ρ , there must exist a rank-ρ matrix Y such that A is expressible as an outer product in Y ; [220, §6.3]

A = Y Y^T ∈ S^M_+,  rank A = ρ,  Y ∈ R^{M×ρ}   (177)

Then the boundary of the positive semidefinite cone may be expressed

∂S^M_+ = { A ∈ S^M_+ | rank A < M } = { Y Y^T | Y ∈ R^{M×(M−1)} }   (178)

since the rank of an outer product Y Y^T with Y ∈ R^{M×(M−1)} cannot exceed M − 1 ; by closure,

cl{ A ∈ S^M_+ | rank A = M − 1 } = { A ∈ S^M_+ | rank A ≤ M − 1 } = ∂S^M_+   (179)


Figure 31: (a) Projection of the PSD cone S^2_+, truncated above γ = 1, on the αβ-plane in isometrically isomorphic R^3. View is from above with respect to Figure 29. (b) Truncated above γ = 2. From these plots we may infer, for example, that the line { [0 1/√2 γ]^T | γ ∈ R } intercepts the PSD cone at some large value of γ ; in fact, γ = ∞.


2.9.2.1 rank ρ subset of the PSD cone

For the same reason (closure), this applies more generally; for 0 ≤ ρ ≤ M

cl{ A ∈ S^M_+ | rank A = ρ } = { A ∈ S^M_+ | rank A ≤ ρ }   (180)

For easy reference, we give such generally nonconvex sets a name: rank ρ subset of the positive semidefinite cone. For ρ < M this subset, nonconvex for M > 1, resides on the positive semidefinite cone boundary.

2.9.2.1.1 Exercise.
Prove equality in (180).

For example,

∂S^M_+ = cl{ A ∈ S^M_+ | rank A = M − 1 } = { A ∈ S^M_+ | rank A ≤ M − 1 }   (181)

In S^2, each and every ray on the boundary of the positive semidefinite cone in isomorphic R^3 corresponds to a symmetric rank-1 matrix (Figure 29), but that does not hold in any higher dimension.

2.9.2.2 open rank-ρ subset

When the positive semidefinite cone subset in (180) is left unclosed, as in

M(ρ) ≜ { A ∈ S^M_+ | rank A = ρ }   (182)

then we can specify a subspace tangent to the positive semidefinite cone at a particular member of manifold M(ρ). Specifically, the subspace R_M tangent to manifold M(ρ) at B ∈ M(ρ) [116, §5, prop.1.1]

R_M(B) ≜ { XB + BX^T | X ∈ R^{M×M} } ⊆ S^M   (183)

has dimension

dim svec R_M(B) = ρ(M − (ρ − 1)/2) = ρ(M − ρ) + ρ(ρ + 1)/2   (184)

Tangent subspace R_M contains no member of the positive semidefinite cone S^M_+ whose rank exceeds ρ .

A good example of such a tangent subspace is given in §E.7.2.0.2 by (1602); R_M(11^T) = S^M⊥_c , orthogonal complement to the geometric center subspace. (Figure 88, p.348)


2.9.2.3 Faces of PSD cone, their dimension versus rank

Each and every face of the positive semidefinite cone, having dimension less than that of the cone, is exposed. [160, §6] [140, §2.3.4] Because each and every face of the positive semidefinite cone contains the origin (§2.8.0.0.1), each face belongs to a subspace of the same dimension.

Given positive semidefinite matrix A ∈ S^M_+, define F(S^M_+ ∋ A) (133) as the smallest face that contains A of the positive semidefinite cone S^M_+. Then A , having diagonalization QΛQ^T (§A.5.2), is relatively interior to [19, §II.12] [66, §31.5.3] [156, §2.4]

F(S^M_+ ∋ A) = { X ∈ S^M_+ | N(X) ⊇ N(A) }
            = { X ∈ S^M_+ | 〈Q(I − ΛΛ†)Q^T, X〉 = 0 }
            ≃ S^{rank A}_+   (185)

which is isomorphic with the convex cone S^{rank A}_+. Thus dimension of the smallest face containing given matrix A is

dim F(S^M_+ ∋ A) = rank(A)(rank(A) + 1)/2   (186)

in isomorphic R^{M(M+1)/2}, and each and every face of S^M_+ is isomorphic with a positive semidefinite cone having dimension the same as the face. Observe: not all dimensions are represented, and the only zero-dimensional face is the origin. The PSD cone has no facets, for example.

2.9.2.3.1 Table: Rank versus dimension of S^3_+ faces

                k    dim F(S^3_+ ∋ rank-k matrix)
                0    0
    boundary    1    1
                2    3
    interior    3    6

For the positive semidefinite cone in isometrically isomorphic R^3 depicted in Figure 29, for example, rank-2 matrices belong to the interior of the face having dimension 3 (the entire closed cone), while rank-1 matrices belong to the relative interior of a face having dimension 1 (the boundary constitutes all the one-dimensional faces, in this dimension, which are rays emanating from the origin), and the only rank-0 matrix is the point at the origin (the zero-dimensional face).


2.9.2.4 Extreme directions of the PSD cone

Because the positive semidefinite cone is pointed (§2.7.2.1.2), there is a one-to-one correspondence of one-dimensional faces with extreme directions in any dimension M ; id est, because of the cone faces lemma (§2.8.0.0.1) and the direct correspondence of exposed faces to faces of S^M_+, it follows that there is no one-dimensional face of the positive semidefinite cone that is not a ray emanating from the origin.

Symmetric dyads constitute the set of all extreme directions: for M > 0 (151)

{ yy^T ∈ S^M | y ∈ R^M } ⊂ ∂S^M_+   (187)

this superset of extreme directions for the positive semidefinite cone is, generally, a subset of the boundary. For two-dimensional matrices (Figure 29),

{ yy^T ∈ S^2 | y ∈ R^2 } = ∂S^2_+   (188)

while for one-dimensional matrices, in exception, (§2.7)

{ yy^T ∈ S | y ≠ 0 } = int S_+   (189)

Each and every extreme direction yy^T makes the same angle with the identity matrix in isomorphic R^{M(M+1)/2}, dependent only on dimension; videlicet,^{2.33}

∠(yy^T, I) = arccos( 〈yy^T, I〉 / (‖yy^T‖_F ‖I‖_F) ) = arccos(1/√M)  ∀ y ∈ R^M   (190)

2.9.2.4.1 Example. Positive semidefinite matrix from extreme directions.
Diagonalizability (§A.5) of symmetric matrices yields the following results: Any symmetric positive semidefinite matrix (1094) can be written in the form

A = Σ_i λ_i z_i z_i^T = ÂÂ^T = Σ_i â_i â_i^T ≽ 0,  λ ≽ 0   (191)

a conic combination of linearly independent extreme directions (â_i â_i^T or z_i z_i^T where ‖z_i‖ = 1), where λ is a vector of eigenvalues.

2.33 Analogy with respect to the EDM cone is considered by Hayden & Wells et alii [115, p.162] where it is found: angle is not constant. The extreme directions of the EDM cone can be found in §5.5.3.2 while the cone axis is −E = 11^T − I (659).
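Both the constant angle (190) and the eigen-decomposition (191) are easy to verify numerically; a minimal Matlab sketch:

    % Verify (190) and (191).
    M = 4;
    y = randn(M,1);  D = y*y';             % an extreme direction of S^M_+
    ang = acos(trace(D)/(norm(D,'fro')*sqrt(M)));   % note ||I||_F = sqrt(M)
    disp([ang, acos(1/sqrt(M))])           % identical, per (190)
    A = randn(M);  A = A*A';               % random PSD matrix
    [Q,L] = eig(A);
    B = zeros(M);
    for i = 1:M, B = B + L(i,i)*Q(:,i)*Q(:,i)'; end   % conic combination (191)
    disp(norm(A - B, 'fro'))               % ~0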


If we limit consideration to all symmetric positive semidefinite matrices bounded such that tr A = 1,

C ≜ { A ≽ 0 | tr A = 1 }   (192)

then any matrix A from that set may be expressed as a convex combination of linearly independent extreme directions;

A = Σ_i λ_i z_i z_i^T ∈ C,  1^T λ = 1,  λ ≽ 0   (193)

Implications are:

1. Set C is convex (it is an intersection of the PSD cone with a hyperplane).
2. Because the set of eigenvalues corresponding to a given square matrix A is unique, no single eigenvalue can exceed 1 ; id est, I ≽ A .

2.9.2.5 PSD cone is generally not circular

Extreme angle equation (190) suggests that the positive semidefinite cone might be invariant to rotation about its axis of revolution; id est, a circular cone. We investigate this now:

2.9.2.5.1 Definition. Circular cone:^{2.34}
a pointed closed convex cone having hyperspherical sections orthogonal to its axis of revolution about which the cone is invariant to rotation. △

A conic section is the intersection of a cone with any hyperplane. In three dimensions, an intersecting plane perpendicular to a circular cone's axis of revolution produces a section bounded by a circle. (Figure 32) A prominent example of a circular cone in convex analysis is the Lorentz cone (142). We also find that the positive semidefinite cone and the cone of Euclidean distance matrices are circular cones, but only in low dimension.

2.34 A circular cone is assumed convex throughout, although not so by other authors. We also assume a right circular cone.


Figure 32: This circular cone continues upward infinitely. Axis of revolution is illustrated as a vertical line segment through the origin. R is the radius, the distance measured from any extreme direction to the axis of revolution. Were this a Lorentz cone, any plane slice containing the axis of revolution would make a right angle.


The positive semidefinite cone has axis of revolution that is the ray (base 0) through the identity matrix I . Consider the set of normalized extreme directions of the positive semidefinite cone: for some arbitrary positive constant a ,

{ yy^T ∈ S^M | ‖y‖ = √a } ⊂ ∂S^M_+   (194)

The distance from each extreme direction to the axis of revolution is the radius

R ≜ inf_c ‖yy^T − cI‖_F = a √(1 − 1/M)   (195)

which is the distance from yy^T to (a/M)I ; the length of vector yy^T − (a/M)I .

Because distance R (in a particular dimension) from the axis of revolution to each and every normalized extreme direction is identical, the extreme directions lie on the boundary of a hypersphere in isometrically isomorphic R^{M(M+1)/2}. From Example 2.9.2.4.1, the convex hull (excluding the vertex at the origin) of the normalized extreme directions is a conic section

C ≜ conv{ yy^T | y ∈ R^M, y^T y = a } = S^M_+ ∩ { A ∈ S^M | 〈I, A〉 = a }   (196)

orthogonal to the identity matrix I ;

〈C − (a/M)I, I〉 = tr(C − (a/M)I) = 0   (197)

Although the positive semidefinite cone possesses some characteristics of a circular cone, we can prove it is not by demonstrating a shortage of extreme directions; id est, some extreme directions corresponding to each and every angle of rotation about the axis of revolution are nonexistent: Referring to Figure 33, [257, §1-7]

cos θ = 〈 (a/M)11^T − (a/M)I, yy^T − (a/M)I 〉 / ( a²(1 − 1/M) )   (198)

Solving for vector y we get

a(1 + (M − 1) cos θ) = (1^T y)²   (199)

Because this does not have a real solution for every matrix dimension M and for all 0 ≤ θ ≤ 2π , we can conclude that the positive semidefinite cone might be circular but only in matrix dimensions 1 and 2.^{2.35}

2.35 In fact, the positive semidefinite cone is circular in those dimensions while it is a rotation of the Lorentz cone in dimension 2.
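As a concrete check of (199): for M = 3 and θ = π , the right-hand side would have to equal a(1 + 2 cos π) = −a < 0, impossible for the square (1^T y)² ; hence no extreme direction exists at that angle of rotation, and S^3_+ cannot be circular. For M = 2, by contrast, a(1 + cos θ) is nonnegative for every θ , consistent with circularity in that dimension.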


Figure 33: Illustrated is a section, perpendicular to the axis of revolution, of the circular cone from Figure 32. Radius R is the distance from any extreme direction yy^T to the axis at (a/M)I . Vector (a/M)11^T is an arbitrary reference by which to measure angle θ .

Because of the shortage of extreme directions, the conic section (196) cannot be hyperspherical by the extremes theorem (§2.8.1.1.1).

2.9.2.5.2 Example. PSD cone inscription in three dimensions.

Theorem. Geršgorin discs. [131, §6.1] [245]
For p ∈ R^m_+ given A = [A_ij] ∈ S^m, all the eigenvalues of A belong to the union of m closed intervals on the real line;

λ(A) ∈ ⋃_{i=1}^m { ξ ∈ R | |ξ − A_ii| ≤ ϱ_i ≜ (1/p_i) Σ_{j=1, j≠i}^m p_j |A_ij| } = ⋃_{i=1}^m [A_ii − ϱ_i, A_ii + ϱ_i]   (200)

Furthermore, if a union of k of these m [intervals] forms a connected region that is disjoint from all the remaining m − k [intervals], then there are precisely k eigenvalues of A in this region. ⋄


Figure 34: Polyhedral proper cone K , created by intersection of halfspaces, inscribes PSD cone in isometrically isomorphic R^3 as predicted by Geršgorin discs theorem for A = [A_ij] ∈ S^2. Three panels plot svec ∂S^2_+, with coordinates (A_11, √2 A_12, A_22), for Geršgorin parameters p = [1/2 1]^T, p = [1 1]^T, and p = [2 1]^T ; the halfspaces are p_1 A_11 ≥ ±p_2 A_12 and p_2 A_22 ≥ ±p_1 A_12. Hyperplanes supporting K intersect along the boundary of the PSD cone. Four extreme directions of K coincide with extreme directions of the PSD cone.


To apply the theorem to determine positive semidefiniteness of symmetric matrix A , we observe that for each i we must have

A_ii ≥ ϱ_i   (201)

Suppose

m = 2   (202)

so A ∈ S^2. Vectorizing A as in (46), svec A belongs to isometrically isomorphic R^3. Then we have m 2^{m−1} = 4 inequalities, in the matrix entries A_ij with Geršgorin parameters p = [p_i] ∈ R^2_+,

p_1 A_11 ≥ ±p_2 A_12
p_2 A_22 ≥ ±p_1 A_12   (203)

which describe an intersection of four halfspaces in R^{m(m+1)/2}. That intersection creates the polyhedral proper cone K (§2.12.1) whose construction is illustrated in Figure 34. Drawn truncated is the boundary of the positive semidefinite cone svec S^2_+ and the bounding hyperplanes supporting K .

Created by means of Geršgorin discs, K always belongs to the positive semidefinite cone for any nonnegative value of p ∈ R^m_+. Hence any point in K corresponds to some positive semidefinite matrix A . Only the extreme directions of K intersect the positive semidefinite cone boundary in this dimension; the four extreme directions of K are extreme directions of the positive semidefinite cone. As p_1/p_2 increases in value from 0, two extreme directions of K sweep the entire boundary of this positive semidefinite cone. Because the entire positive semidefinite cone can be swept by K , the system of linear inequalities

Y^T svec A ≜ [ p_1  ±p_2/√2  0 ; 0  ±p_1/√2  p_2 ] svec A ≽ 0   (204)

when made dynamic can replace a semidefinite constraint A ≽ 0 ; id est, for given p , where Y ∈ R^{m(m+1)/2 × m2^{m−1}},

K = { z | Y^T z ≽ 0 } ⊂ svec S^m_+   (205)

svec A ∈ K ⇒ A ∈ S^m_+   (206)
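The system (204) is readily assembled for one fixed choice of parameters; a minimal Matlab sketch (here p = [2; 1] and A = [2 1; 1 3] are arbitrary choices):

    % Sufficient PSD test (206) from the Geršgorin construction.
    p  = [2; 1];
    Yt = [ p(1)  p(2)/sqrt(2)  0            % the four rows of Y' in (204)
           p(1) -p(2)/sqrt(2)  0
           0     p(1)/sqrt(2)  p(2)
           0    -p(1)/sqrt(2)  p(2) ];
    A  = [2 1; 1 3];
    sA = [A(1,1); sqrt(2)*A(1,2); A(2,2)];  % svec A, per (46)
    disp(all(Yt*sA >= 0))                   % true here => A is PSD (sufficiency)
    disp(min(eig(A)) >= 0)                  % definitive test agrees

For this fixed p the test can reject PSD matrices that a different p would accept; only by sweeping p is necessity recovered, per (207) below.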


but

∃p :  Y^T svec A ≽ 0 ⇔ A ≽ 0   (207)

In other words, diagonal dominance [131, p.349, §7.2.3]

A_ii ≥ Σ_{j=1, j≠i}^m |A_ij| ,  ∀ i = 1 ... m   (208)

is only a sufficient condition for membership to the PSD cone; but by dynamic weighting p in this dimension, it was made necessary and sufficient.

In higher dimension (m > 2), the boundary of the positive semidefinite cone is no longer constituted completely by its extreme directions (symmetric rank-one matrices); the geometry becomes complicated. How all the extreme directions can be swept by an inscribed polyhedral cone,^{2.36} similarly to the foregoing example, remains an open question.

2.9.2.5.3 Exercise.
Find the dual polyhedral proper cone K* from Figure 34.

2.9.2.6 Boundary constituents of the PSD cone

2.9.2.6.1 Lemma. Sum of positive semidefinite matrices.
For A, B ∈ S^M_+

rank(A + B) = rank(μA + (1 − μ)B)   (209)

over the open interval (0, 1) of μ . ⋄

Proof. Any positive semidefinite matrix belonging to the PSD cone has an eigen decomposition that is a positively scaled sum of linearly independent symmetric dyads. By the linearly independent dyads definition in §B.1.1.0.1, rank of the sum A + B is equivalent to the number of linearly independent dyads constituting it. Linear independence is insensitive to further positive scaling by μ . The assumption of positive semidefiniteness prevents annihilation of any dyad from the sum A + B .

2.36 It is not necessary to sweep the entire boundary in higher dimension.


2.9.2.6.2 Example. Rank function quasiconcavity. (confer §3.2)
For A, B ∈ R^{m×n} [131, §0.4]

rank A + rank B ≥ rank(A + B)   (210)

that follows from the fact [220, §3.6]

dim R(A) + dim R(B) = dim R(A + B) + dim(R(A) ∩ R(B))   (211)

For A, B ∈ S^M_+ [41, §3.4.2]

rank A + rank B ≥ rank(A + B) ≥ min{rank A, rank B}   (212)

that follows from the fact

N(A + B) = N(A) ∩ N(B) ,  A, B ∈ S^M_+   (128)

Rank is a quasiconcave function on S^M_+ because the right-hand inequality in (212) has the concave form (452); videlicet, Lemma 2.9.2.6.1.

From this example we see, unlike convex functions, quasiconvex functions are not necessarily continuous. (§3.2) We also glean:

2.9.2.6.3 Theorem. Convex subsets of the positive semidefinite cone.
The subsets of the positive semidefinite cone S^M_+, for 0 ≤ ρ ≤ M

S^M_+(ρ) ≜ { X ∈ S^M_+ | rank X ≥ ρ }   (213)

are pointed convex cones, but not closed unless ρ = 0 ; id est, S^M_+(0) = S^M_+. ⋄

Proof. Given ρ , a subset S^M_+(ρ) is convex if and only if convex combination of any two members has rank at least ρ . That is confirmed by applying identity (209) from Lemma 2.9.2.6.1 to (212); id est, for A, B ∈ S^M_+(ρ) on the closed interval μ ∈ [0, 1]

rank(μA + (1 − μ)B) ≥ min{rank A, rank B}   (214)

It can similarly be shown, almost identically to proof of the lemma, that any conic combination of A, B in subset S^M_+(ρ) remains a member; id est, ∀ ζ, ξ ≥ 0

rank(ζA + ξB) ≥ min{rank(ζA), rank(ξB)}   (215)

Therefore, S^M_+(ρ) is a convex cone.

Another proof of convexity can be made by projection arguments:


2.9.2.7 Projection on S^M_+(ρ)

Because these cones S^M_+(ρ) indexed by ρ (213) are convex, projection on them is straightforward. Given a symmetric matrix H having diagonalization H ≜ QΛQ^T ∈ S^M (§A.5.2) with eigenvalues Λ arranged in nonincreasing order, its Euclidean projection (minimum-distance projection) on S^M_+(ρ)

P_{S^M_+(ρ)} H = QΥ⋆Q^T   (216)

corresponds to a map of its eigenvalues:

Υ⋆_ii = max{ε, Λ_ii} ,  i = 1 ... ρ
Υ⋆_ii = max{0, Λ_ii} ,  i = ρ+1 ... M   (217)

where ε is positive but arbitrarily close to 0.

2.9.2.7.1 Exercise.
Prove (217) using Theorem E.9.2.0.1.

Because each H ∈ S^M has unique projection on S^M_+(ρ) (despite the possibility of repeated eigenvalues in Λ), we may conclude it is a convex set by the Bunt-Motzkin theorem (§E.9.0.0.1).

Compare (217) to the well-known result regarding Euclidean projection on a rank ρ subset (§2.9.2.1) of the positive semidefinite cone

S^M_+ \ S^M_+(ρ + 1) = { X ∈ S^M_+ | rank X ≤ ρ }   (218)

P_{S^M_+ \ S^M_+(ρ+1)} H = QΥ⋆Q^T   (219)

As proved in §7.1.4, this projection of H corresponds to the eigenvalue map

Υ⋆_ii = max{0, Λ_ii} ,  i = 1 ... ρ
Υ⋆_ii = 0 ,  i = ρ+1 ... M   (996)

Together these two results (217) and (996) mean: A higher-rank solution to projection on the positive semidefinite cone lies arbitrarily close to any given lower-rank projection, but not vice versa. Were the number of nonnegative eigenvalues in Λ known a priori not to exceed ρ , then these two different projections would produce identical results in the limit ε → 0.
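Both eigenvalue maps are short computations once H is diagonalized; a minimal Matlab sketch of (216)-(217) and (219)/(996):

    % Projections of symmetric H on S^M_+(rho) and on the rank-rho subset.
    M = 4;  rho = 2;  eps_ = 1e-8;          % eps_ plays the role of ε in (217)
    H = randn(M);  H = (H + H')/2;
    [Q,L] = eig(H);
    [lam,ix] = sort(diag(L),'descend');  Q = Q(:,ix);
    u217 = [max(eps_, lam(1:rho)); max(0, lam(rho+1:M))];   % map (217)
    u996 = [max(0,    lam(1:rho)); zeros(M-rho,1)];         % map (996)
    P1 = Q*diag(u217)*Q';                   % projection on S^M_+(rho): rank >= rho
    P2 = Q*diag(u996)*Q';                   % projection on rank-rho subset: rank <= rho
    disp([rank(P1,1e-10), rank(P2,1e-10)])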


2.9.2.8 Uniting constituents

Interior of the PSD cone, int S^M_+, is convex by Theorem 2.9.2.6.3, for example, because all positive semidefinite matrices having rank M constitute the cone interior.

All positive semidefinite matrices of rank less than M constitute the cone boundary; an amalgam of positive semidefinite matrices of different rank. Thus each nonconvex subset of positive semidefinite matrices, for 0 < ρ < M

{ Y ∈ S^M_+ | rank Y = ρ }   (220)

appends a nonconvex constituent to the cone boundary; only in their union is the boundary complete: (confer (157))

∂S^M_+ = ⋃_{ρ=0}^{M−1} { Y ∈ S^M_+ | rank Y = ρ }   (221)

The composite sequence, the cone interior in union with each successive constituent, remains convex at each step; id est, for 0 ≤ k ≤ M

⋃_{ρ=k}^{M} { Y ∈ S^M_+ | rank Y = ρ }   (222)

is convex for each k by Theorem 2.9.2.6.3; videlicet, this union is the convex cone S^M_+(k) defined in (213).


2.9.2.9.1 Exercise. Difference A − B.
What about the difference of matrices A, B belonging to the positive semidefinite cone? Show:

• The difference of any two points on the boundary belongs to the boundary or exterior.

• The difference A − B , where A belongs to the boundary while B is interior, belongs to the exterior.

2.9.3 Barvinok's proposition

Barvinok posits existence and quantifies an upper bound on rank of a positive semidefinite matrix belonging to the intersection of the PSD cone with an affine subset:

2.9.3.0.1 Proposition. (Barvinok) Affine intersection with PSD cone. [19, §II.13] [20, §2.2]
Consider finding a matrix X ∈ S^N satisfying

X ≽ 0 ,  〈A_j, X〉 = b_j ,  j = 1 ... m   (223)

given nonzero linearly independent A_j ∈ S^N and real b_j . Define the affine subset

A ≜ { X | 〈A_j, X〉 = b_j , j = 1 ... m } ⊆ S^N   (224)

If the intersection A ∩ S^N_+ is nonempty, then there exists a matrix X ∈ A ∩ S^N_+ such that, given a number of equalities m ,

rank X (rank X + 1)/2 ≤ m   (225)

whence the upper bound^{2.37}

rank X ≤ ⌊ (√(8m + 1) − 1)/2 ⌋   (226)

or, given desired rank instead, equivalently,

m < (rank X + 1)(rank X + 2)/2   (227)

2.37 §6.1.1.2 contains an intuitive explanation. This bound is itself limited above, of course, by N ; a tight limit corresponding to an interior point of S^N_+.


An extreme point of A ∩ S^N_+ satisfies (226) and (227). (confer §6.1.1.2) A matrix X ≜ R^T R is an extreme point if and only if the smallest face that contains X of A ∩ S^N_+ has dimension 0 ; [156, §2.4] id est, iff (133)

dim F((A ∩ S^N_+) ∋ X)
  = rank(X)(rank(X) + 1)/2 − rank[ svec RA_1R^T  svec RA_2R^T  ···  svec RA_mR^T ]   (228)

equals 0 in isomorphic R^{N(N+1)/2}.

Now assume the intersection A ∩ S^N_+ is bounded: Assume a given nonzero upper bound ρ on rank, a number of equalities

m = (ρ + 1)(ρ + 2)/2   (229)

and matrix dimension N ≥ ρ + 2 ≥ 3. If the intersection is nonempty and bounded, then there exists a matrix X ∈ A ∩ S^N_+ such that

rank X ≤ ρ   (230)

This represents a tightening of the upper bound; a reduction by exactly 1 of the bound provided by (226), given the same specified number m (229) of equalities; id est,

rank X ≤ (√(8m + 1) − 1)/2 − 1   (231)  ⋄

When the intersection A ∩ S^N_+ is known a priori to consist of only a single point, then Barvinok's proposition provides the greatest upper bound on its rank not exceeding N . The intersection can be a single nonzero point only if the number of linearly independent hyperplanes m constituting A satisfies^{2.38}

N(N + 1)/2 − 1 ≤ m ≤ N(N + 1)/2   (232)

2.38 For N > 1, N(N+1)/2 − 1 independent hyperplanes in R^{N(N+1)/2} can make a line tangent to svec ∂S^N_+ at a point because all one-dimensional faces of S^N_+ are exposed. Because a pointed convex cone has only one vertex, the origin, there can be no intersection of svec ∂S^N_+ with any higher-dimensional affine subset A that will make a nonzero point.
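Bound (226) is a one-line computation; a small Matlab sketch tabulating it for a few values of m :

    % Barvinok's upper bound (226) on rank, as a function of m.
    barv = @(m) floor((sqrt(8*m + 1) - 1)/2);
    disp(arrayfun(barv, [1 2 3 6 10]))     % -> 1 1 2 3 4
    % With m = (rho+1)(rho+2)/2 and a bounded nonempty intersection,
    % (230) guarantees rank X <= rho : one less than barv(m), per (231).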


2.10 Conic independence (c.i.)

In contrast to extreme direction, the property conically independent direction is more generally applicable; inclusive of all closed convex cones (not only pointed closed convex cones). Similar to the definition for linear independence, arbitrary given directions {Γ_i ∈ R^n, i = 1 ... N} are conically independent if and only if, for all ζ ∈ R^N_+,

Γ_i ζ_i + · · · + Γ_j ζ_j − Γ_l ζ_l = 0,  i ≠ · · · ≠ j ≠ l = 1 ... N    (233)

has only the trivial solution ζ = 0; in words, iff no direction from the given set can be expressed as a conic combination of those remaining. (Figure 35, for example. A Matlab implementation of test (233) is given in §G.2.) It is evident that linear independence (l.i.) of N directions implies their conic independence:

l.i. ⇒ c.i.

Arranging any set of generators for a particular convex cone in a matrix columnar,

X ≜ [Γ_1 Γ_2 · · · Γ_N] ∈ R^{n×N}    (234)

the relationship l.i. ⇒ c.i. suggests: the number of l.i. generators in the columns of X cannot exceed the number of c.i. generators. Denoting by k the number of conically independent generators contained in X, we have the most fundamental rank inequality for convex cones

dim aff K = dim aff[0 X] = rank X ≤ k ≤ N    (235)

Whereas N directions in n dimensions can no longer be linearly independent once N exceeds n, conic independence remains possible:

2.10.0.0.1 Table: Maximum number of c.i. directions

n    sup k (pointed)    sup k (not pointed)
0          0                    0
1          1                    2
2          2                    4
3          ∞                    ∞
⋮          ⋮                    ⋮


Figure 35: Vectors in R^2: (a) affinely and conically independent, (b) affinely independent but not conically independent, (c) conically independent but not affinely independent. None of the examples exhibits linear independence. (In general, a.i. neither implies nor is implied by c.i.)

Assuming veracity of this table, there is an apparent vastness between two and three dimensions. The finite numbers of conically independent directions indicate:

- Convex cones in dimensions 0, 1, and 2 must be polyhedral. (§2.12.1)
- Conic independence is certainly one convex idea that cannot be completely explained by a two-dimensional picture. [19, p. vii]

From this table it is also evident that the dimension of Euclidean space cannot exceed the number of conically independent directions possible;

n ≤ sup k

We suspect the number of conically independent columns (rows) of X to be the same for X†T:

The columns (rows) of X are c.i. ⇔ the columns (rows) of X†T are c.i.

Proof. Pending.
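Test (233) lends itself to linear programming: direction Γ_i is conically dependent iff it is a conic combination of the remaining directions. The book's reference implementation is in §G.2; the following is only a minimal Matlab sketch, assuming linprog from the Optimization Toolbox and at least two directions:

    % Sketch: conic independence test (233) for the columns of X.
    % Column i is conically dependent iff X(:,others)*a = X(:,i)
    % admits a solution a >= 0 ; posed as an LP feasibility problem.
    function ci = is_conically_independent(X)
      [~,N] = size(X); ci = true;
      opts = optimoptions('linprog','Display','none');
      for i = 1:N
        others = setdiff(1:N, i);
        [~,~,flag] = linprog(zeros(N-1,1), [], [], ...
                             X(:,others), X(:,i), zeros(N-1,1), [], opts);
        if flag == 1, ci = false; return, end   % feasible: dependence found
      end
    end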


Figure 36: (a) A pointed polyhedral cone (drawn truncated) in R^3 having six facets. The extreme directions, corresponding to six edges emanating from the origin, are generators for this cone; not linearly independent, but they must be conically independent. (b) The boundary of dual cone K* (drawn truncated) is now added to the drawing of the same K. K* is polyhedral, proper, and has the same number of extreme directions as K has facets.


2.10.1 Preservation of conic independence

Independence in the linear (§2.1.2.1), affine (§2.4.2.4), and conic senses can be preserved under linear transformation. Suppose a matrix X ∈ R^{n×N} (234) holds a conically independent set columnar. Consider the transformation

T(X) : R^{n×N} → R^{n×N} ≜ XY    (236)

where the given matrix Y ≜ [y_1 y_2 · · · y_N] ∈ R^{N×N} is represented by linear operator T. Conic independence of {Xy_i ∈ R^n, i = 1 ... N} demands, by definition (233), that

Xy_i ζ_i + · · · + Xy_j ζ_j − Xy_l ζ_l = 0,  i ≠ · · · ≠ j ≠ l = 1 ... N    (237)

have no nontrivial solution ζ ∈ R^N_+. That is ensured by conic independence of {y_i ∈ R^N} and by R(Y) ∩ N(X) = 0; as seen by factoring out X.

2.10.1.1 Linear maps of cones

[17, §7] If K is a convex cone in Euclidean space R and T is any linear mapping from R to Euclidean space M, then T(K) is a convex cone in M, and x ≼ y with respect to K implies T(x) ≼ T(y) with respect to T(K). If K is closed or has nonempty interior in R, then so is T(K) in M. If T is a linear bijection, then x ≼ y ⇔ T(x) ≼ T(y). Further, if F is a face of K, then T(F) is a face of T(K).

2.10.2 Pointed closed convex K & conic independence

The following bullets can be derived from definitions (150) and (233) in conjunction with the extremes theorem (§2.8.1.1.1):

- The set of all extreme directions from a pointed closed convex cone K ⊂ R^n is not necessarily a linearly independent set, yet it must be a conically independent set; (compare Figure 12 on page 69 with Figure 36(a))

{extreme directions} ⇒ {c.i.}

Conversely, when a conically independent set of directions from pointed closed convex cone K is known a priori to comprise generators, then all directions from that set must be extreme directions of the cone;

{extreme directions} ⇔ {c.i. generators of pointed closed convex K}


Figure 37: Minimal set of generators X = [x_1 x_2 x_3] ∈ R^{2×3} for a halfspace H about the origin, with boundary ∂H.

Barker & Carlson [17, §1] call the extreme directions a minimal generating set for a pointed closed convex cone. A minimal set of generators is therefore a conically independent set of generators, and vice versa,^2.39 for a pointed closed convex cone.

- Any collection of n or fewer extreme directions from pointed closed convex cone K ⊂ R^n must be linearly independent;

{≤ n extreme directions in R^n} ⇒ {l.i.}

Conversely, because l.i. ⇒ c.i.,

{extreme directions} ⇐ {l.i. generators of pointed closed convex K}

^2.39 This converse does not hold for nonpointed closed convex cones as Table 2.10.0.0.1 implies; e.g., ponder four conically independent generators for a plane (case n = 2).


2.10.2.0.1 Example. Vertex-description of halfspace H about origin.
From n+1 points in R^n we can make a vertex-description of a convex cone that is a halfspace H, where {x_l ∈ R^n, l = 1 ... n} constitutes a minimal set of generators for a hyperplane ∂H through the origin. An example is illustrated in Figure 37. By demanding the augmented set {x_l ∈ R^n, l = 1 ... n+1} be affinely independent (we want x_{n+1} not parallel to ∂H), then

H = ⋃_{ζ≥0} (ζ x_{n+1} + ∂H)
  = {ζ x_{n+1} + cone{x_l ∈ R^n, l = 1 ... n} | ζ ≥ 0}    (238)
  = cone{x_l ∈ R^n, l = 1 ... n+1}

a union of parallel hyperplanes. Cardinality is one step beyond the dimension of the ambient space, but {x_l ∀l} is a minimal set of generators for this convex cone H, which has no extreme elements.

2.10.3 Utility of conic independence

Perhaps the most useful application of conic independence is determination of the intersection of closed convex cones from their halfspace-descriptions, or representation of the sum of closed convex cones from their vertex-descriptions.

⋂ K_i : A halfspace-description for the intersection of any number of closed convex cones K_i can be acquired by pruning normals; specifically, only the conically independent normals from the aggregate of all the halfspace-descriptions need be retained.

∑ K_i : Generators for the sum of any number of closed convex cones K_i can be determined by retaining only the conically independent generators from the aggregate of all the vertex-descriptions.

Such conically independent sets are not necessarily unique or minimal.


2.11 When extreme means exposed

For any convex polyhedral set in R^n having nonempty interior, the distinction between the terms extreme and exposed vanishes [218, §2.4] [66, §2.2] for faces of all dimensions except n; their meanings become equivalent as we saw in Figure 11 (discussed in §2.6.1.2). In other words, each and every face of any polyhedral set (except the set itself) can be exposed by a hyperplane, and vice versa; e.g., Figure 12.

Lewis [160, §6] [140, §2.3.4] claims nonempty extreme proper subsets and the exposed subsets coincide for S^n_+; id est, each and every face of the positive semidefinite cone, whose dimension is less than the dimension of the cone, is exposed. A more general discussion of cones having this property can be found in [228]; e.g., the Lorentz cone (142) [16, §II.A].

2.12 Convex polyhedra

Every polyhedron, such as the convex hull (71) of a bounded list X, can be expressed as the solution set of a finite system of linear equalities and inequalities, and vice versa. [66, §2.2]

2.12.0.0.1 Definition. Convex polyhedra, halfspace-description.
[41, §2.2.4] A convex polyhedron is the intersection of a finite number of halfspaces and hyperplanes;

P = {y | Ay ≽ b, Cy = d} ⊆ R^n    (239)

where coefficients A and C generally denote matrices. Each row of C is a vector normal to a hyperplane, while each row of A is a vector inward-normal to a hyperplane partially bounding a halfspace. △

By the halfspaces theorem in §2.4.1.1.1, a polyhedron thus described is a closed convex set having possibly empty interior; e.g., Figure 11. Convex polyhedra^2.40 are finite-dimensional, comprising all affine sets (§2.3.1), polyhedral cones, line segments, rays, halfspaces, convex polygons, solids [145, def. 104/6, p. 343], polychora, polytopes,^2.41 etcetera.

^2.40 We consider only convex polyhedra throughout, but acknowledge the existence of concave polyhedra. [252, Kepler-Poinsot Solid]
^2.41 Some authors distinguish bounded polyhedra via the designation polytope. [66, §2.2]


It follows from definition (239) by exposure that each face of a convex polyhedron is a convex polyhedron.

The projection of any polyhedron on a subspace remains a polyhedron. More generally, the image of a polyhedron under any linear transformation is a polyhedron. [19, §I.9]

When b and d in (239) are 0, the resultant is a polyhedral cone. The set of all polyhedral cones is a subset of convex cones:

2.12.1 Polyhedral cone

From our study of cones we see the number of intersecting hyperplanes and halfspaces constituting a convex cone is possibly but not necessarily infinite. When the number is finite, the convex cone is termed polyhedral. That is the primary distinguishing feature between the set of all convex cones and polyhedra; all polyhedra, including polyhedral cones, are finitely generated [202, §19]. We distinguish polyhedral cones in the set of all convex cones for this reason, although all convex cones of dimension 2 or less are polyhedral.

2.12.1.0.1 Definition. Polyhedral cone, halfspace-description.^2.42
(confer (246)) A polyhedral cone is the intersection of a finite number of halfspaces and hyperplanes about the origin;

K = {y | Ay ≽ 0, Cy = 0} ⊆ R^n        (a)
  = {y | Ay ≽ 0, Cy ≽ 0, Cy ≼ 0}      (b)    (240)
  = {y | [A; C; −C] y ≽ 0}            (c)

(where [A; C; −C] denotes rowwise stacking) and where coefficients A and C generally denote matrices of finite dimension. Each row of C is a vector normal to a hyperplane containing the origin, while each row of A is a vector inward-normal to a hyperplane containing the origin and partially bounding a halfspace. △

A polyhedral cone thus defined is closed, convex, has possibly empty interior, and has only a finite number of generators (§2.8.1.2), and vice versa. (Minkowski/Weyl) [218, §2.8] [202, thm. 19.1]

^2.42 Rockafellar [202, §19] proposes affine sets be handled via complementary pairs of affine inequalities; e.g., Cy ≽ d and Cy ≼ d.


From the definition it follows that any single hyperplane through the origin, or any halfspace partially bounded by a hyperplane through the origin, is a polyhedral cone. The most familiar example of a polyhedral cone is any quadrant (or orthant, §2.1.3) generated by Cartesian axes. Esoteric examples of polyhedral cone include the point at the origin, any line through the origin, any ray having the origin as base such as the nonnegative real line R_+ in subspace R, polyhedral flavors of the (proper) Lorentz cone (confer (142))

K_l = { [x; t] ∈ R^n × R | ‖x‖_l ≤ t },  l = 1 or ∞    (241)

any subspace, and R^n. More examples are illustrated in Figure 36 and Figure 12.

2.12.2 Vertices of convex polyhedra

By definition, a vertex (§2.6.1.0.1) always lies on the relative boundary of a convex polyhedron. [145, def. 115/6, p. 358] In Figure 11, each vertex of the polyhedron is located at the intersection of three or more facets, and every edge belongs to precisely two facets [19, §VI.1, p. 252]. In Figure 12, the only vertex of that polyhedral cone lies at the origin.

The set of all polyhedral cones is clearly a subset of convex polyhedra and a subset of convex cones. Not all convex polyhedra are bounded, evidently; neither can they all be described by the convex hull of a bounded set of points as we defined it in (71). Hence we propose a universal vertex-description of polyhedra in terms of that same finite-length list X (64):

2.12.2.0.1 Definition. Convex polyhedra, vertex-description.
(confer §2.8.1.1.1) Denote the truncated a-vector

a_{i:l} = [a_i ; ⋯ ; a_l]    (242)

By discriminating a suitable finite-length generating list (or set) arranged columnar in X ∈ R^{n×N}, then any particular polyhedron may be described

P = { Xa | a_{1:k}^T 1 = 1, a_{m:N} ≽ 0, {1...k} ∪ {m...N} = {1...N} }    (243)

where 0 ≤ k ≤ N and 1 ≤ m ≤ N+1. Setting k = 0 removes the affine equality condition. Setting m = N+1 removes the inequality. △


Coefficient indices in (243) may or may not be overlapping, but all the coefficients are assumed constrained. From (66), (71), and (79), we summarize how the coefficient conditions may be applied;

affine sets       −→  a_{1:k}^T 1 = 1  }
polyhedral cones  −→  a_{m:N} ≽ 0      }  ←− convex hull (m ≤ k)    (244)

It is always possible to describe a convex hull in the region of overlapping indices because, for 1 ≤ m ≤ k ≤ N,

{a_{m:k} | a_{m:k}^T 1 = 1, a_{m:k} ≽ 0} ⊆ {a_{m:k} | a_{1:k}^T 1 = 1, a_{m:N} ≽ 0}    (245)

Members of a generating list are not necessarily vertices of the corresponding polyhedron; certainly true for (71) and (243), where some subset of list members may reside in the polyhedron's relative interior. Conversely, when boundedness (71) applies, the convex hull of the vertices is a polyhedron identical to the convex hull of the generating list.

2.12.2.1 Vertex-description of polyhedral cone

Given closed convex cone K in a subspace of R^n having any set of generators for it arranged in a matrix X ∈ R^{n×N} as in (234), then that cone is described setting m = 1 and k = 0 in vertex-description (243):

K = cone(X) = {Xa | a ≽ 0} ⊆ R^n    (246)

a conic hull, like (79), of N generators. This vertex-description is extensible to an infinite number of generators; which follows from the extremes theorem (§2.8.1.1.1) and Example 2.8.1.2.1.

2.12.2.2 Pointedness

[218, §2.10] Assuming all generators constituting the columns of X ∈ R^{n×N} are nonzero, K is pointed (§2.7.2.1.2) if and only if there is no nonzero a ≽ 0 that solves Xa = 0; id est, iff N(X) ∩ R^N_+ = 0. (If rank X = n, then the dual cone K* is pointed. (260))

A polyhedral proper cone in R^n must have at least n linearly independent generators, or be the intersection of at least n halfspaces whose partial boundaries have normals that are linearly independent. Otherwise, the cone will contain at least one line and there can be no vertex; id est, the cone cannot otherwise be pointed.
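The pointedness test of §2.12.2.2 is a linear-program feasibility question. A minimal Matlab sketch, assuming linprog and nonzero columns in X: K = cone(X) is pointed iff the only a ≽ 0 with Xa = 0 is a = 0, probed here by maximizing 1^T a over the bounded set {a | Xa = 0, 0 ≼ a ≼ 1}:

    % Sketch: pointedness of K = cone(X) per N(X) ∩ R^N_+ = 0.
    function pointed = is_pointed(X)
      [n,N] = size(X);
      opts = optimoptions('linprog','Display','none');
      a = linprog(-ones(N,1), [], [], X, zeros(n,1), ...
                  zeros(N,1), ones(N,1), opts);
      pointed = sum(a) < 1e-8;    % any nonzero optimum exposes a line in K
    end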


Figure 38: Unit simplex S = {s | s ≽ 0, 1^T s ≤ 1} in R^3 is a unique solid tetrahedron, but is not regular.

For any pointed polyhedral cone, there is a one-to-one correspondence of one-dimensional faces with extreme directions.

Examples of pointed closed convex cones K are not limited to polyhedral cones: the origin, any 0-based ray in a subspace, any two-dimensional V-shaped cone in a subspace, the Lorentz (ice-cream) cone and its polyhedral flavors, the cone of Euclidean distance matrices EDM^N in S^N_h, the proper cones: S^M_+ in ambient S^M, any orthant in R^n or R^{m×n}; e.g., the nonnegative real line R_+ in vector space R.


2.12.3 Unit simplex

A peculiar convex subset of the nonnegative orthant having halfspace-description

S ≜ {s | s ≽ 0, 1^T s ≤ 1} ⊆ R^n_+    (247)

is a unique bounded convex polyhedron called the unit simplex (Figure 38) having nonempty interior, n+1 vertices, and dimension [41, §2.2.4]

dim S = n    (248)

The origin supplies one vertex while heads of the standard basis [131] [220] {e_i, i = 1 ... n} in R^n constitute those remaining;^2.43 thus its vertex-description:

S = conv{0, {e_i, i = 1 ... n}}
  = { [0 e_1 e_2 · · · e_n] a | a^T 1 = 1, a ≽ 0 }    (249)

2.12.3.1 Simplex

The unit simplex comes from a class of general polyhedra called simplex, having vertex-description: [57] [202] [250] [66]

conv{x_l ∈ R^n, l = 1 ... k+1 | dim aff{x_l} = k},  n ≥ k    (250)

So defined, a simplex is a closed bounded convex set having possibly empty interior. Examples of simplices, by increasing affine dimension, are: a point, any line segment, any triangle and its relative interior, a general tetrahedron, a polychoron, and so on.

2.12.3.1.1 Definition. Simplicial cone.
A polyhedral proper (§2.7.2.2.1) cone K in R^n is called simplicial iff K has exactly n extreme directions; [16, §II.A] equivalently, iff proper K has exactly n linearly independent generators contained in any given set of generators. △

There are an infinite variety of simplicial cones in R^n; e.g., Figure 12, Figure 39, Figure 47. Any orthant is simplicial, as is any rotation thereof.

^2.43 In R^0 the unit simplex is the point at the origin, in R the unit simplex is the line segment [0,1], in R^2 it is a triangle and its relative interior, in R^3 it is the convex hull of a tetrahedron (Figure 38), in R^4 it is the convex hull of a pentatope [252], and so on.


Figure 39: Two views of a simplicial cone and its dual in R^3 (second view on next page). The semi-infinite boundary of each cone is truncated for illustration. Cartesian axes are drawn for reference.


(Figure 39, second view.)


2.12.4 Converting between descriptions

Conversion between halfspace-descriptions (239) (240) and equivalent vertex-descriptions (71) (243) is nontrivial, in general, [13] [66, §2.2] but the conversion is easy for simplices. [41, §2.2] Nonetheless, we tacitly assume the two descriptions to be equivalent. [202, §19, thm. 19.1] We explore conversions in §2.13.4 and §2.13.9.

2.13 Dual cone & generalized inequality & biorthogonal expansion

These three concepts, dual cone, generalized inequality, and biorthogonal expansion, are inextricably melded; meaning, it is difficult to completely discuss one without mentioning the others. The dual cone is critical in tests for convergence by contemporary primal/dual methods for numerical solution of conic problems. [267] [182, §4.5] For unique minimum-distance projection on a closed convex cone K, the negative dual cone −K* plays the role that orthogonal complement plays for subspace projection.^2.44 (§E.9.2.1) Indeed, −K* is the algebraic complement in R^n;

K ⊞ −K* = R^n    (251)

where ⊞ denotes unique orthogonal vector sum.

One way to think of a pointed closed convex cone is as a new kind of coordinate system whose basis is generally nonorthogonal; a conic system, very much like the familiar Cartesian system whose analogous cone is the first quadrant or nonnegative orthant. Generalized inequality ≽_K is a formalized means to determine membership to any pointed closed convex cone (§2.7.2.2), while biorthogonal expansion is simply a formulation for expressing coordinates in a pointed conic coordinate system. When cone K is the nonnegative orthant, then these three concepts come into alignment with the Cartesian prototype; biorthogonal expansion becomes orthogonal expansion.

^2.44 Namely, projection on a subspace is ascertainable from its projection on the orthogonal complement.


2.13.1 Dual cone

For any set K (convex or not), the dual cone [41, §2.6.1] [64, §4.2]

K* ≜ { y ∈ R^n | ⟨y, x⟩ ≥ 0 for all x ∈ K }    (252)

is a unique cone^2.45 that is always closed and convex because it is an intersection of halfspaces (halfspaces theorem, §2.4.1.1.1) whose partial boundaries each contain the origin, each halfspace having inward-normal x belonging to K; e.g., Figure 40(a).

When cone K is convex, there is a second and equivalent construction: dual cone K* is the union of each and every vector y inward-normal to a hyperplane supporting K or bounding a halfspace containing K; e.g., Figure 40(b). When K is represented by a halfspace-description such as (240), for example, where

A ≜ [a_1^T ; ⋯ ; a_m^T] ∈ R^{m×n},  C ≜ [c_1^T ; ⋯ ; c_p^T] ∈ R^{p×n}    (253)

then the dual cone can be represented as the conic hull

K* = cone{a_1, ..., a_m, ±c_1, ..., ±c_p}    (254)

a vertex-description, because each and every conic combination of normals from the halfspace-description of K yields another inward-normal to a hyperplane supporting or bounding a halfspace containing K.

2.13.1.0.1 Exercise.
Perhaps the most instructive graphical method of dual cone construction is cut-and-try. Find the dual of each polyhedral cone from Figure 41 by using dual cone equation (252).

^2.45 The dual cone is the negative polar cone defined by many authors; K* = −K°. [129, §A.3.2] [202, §14] [28] [19] [218, §2.7]


Figure 40: Two equivalent constructions of dual cone K* in R^2: (a) showing construction by intersection of halfspaces about 0 (drawn truncated). Only those two halfspaces whose bounding hyperplanes have inward-normal corresponding to an extreme direction of this pointed closed convex cone K ⊂ R^2 need be drawn; by (303). (b) Suggesting construction by union of inward-normals y to each and every hyperplane ∂H_+ supporting K. This interpretation is valid when K is convex because existence of a supporting hyperplane is then guaranteed (§2.4.2.6).


Figure 41: Dual cone construction by right-angle. Each extreme direction of a polyhedral cone is orthogonal to a facet of its dual cone, and vice versa, in any dimension. (§2.13.6.1) The figure displays relation x ∈ K ⇔ ⟨y, x⟩ ≥ 0 for all y ∈ G(K*) (301). (a) In R^2 this characteristic guides graphical construction of the dual cone: it suggests finding the dual-cone boundary ∂K* by making right angles with extreme directions of the polyhedral cone. The construction is then pruned so that each dual boundary vector does not exceed π/2 radians in angle with each and every vector from the polyhedral cone. Were the dual cone in R^2 to narrow, Figure 42 would be reached in the limit. (b) The same polyhedral cone and its dual continued into three dimensions, R^3. (confer Figure 47)


Figure 42: K is a halfspace about the origin in R^2. K* is a ray based at 0, hence has empty interior in R^2; so K cannot be pointed. (Both convex cones appear truncated.)

As defined, dual cone K* exists even when the affine hull of the original cone is a proper subspace; id est, even when the original cone has empty interior. Rockafellar formulates the dimension of K and K*. [202, §14]^2.46

To further motivate our understanding of the dual cone, consider the ease with which convergence can be observed in the following optimization problem (p):

2.13.1.0.2 Example. Dual problem. (confer §6.1)
Essentially, duality theory concerns the representation of a given optimization problem as half a minimax problem (minimize_x maximize_y f̂(x,y)) [202, §36] [41, §5.4] whose saddle-value [88] exists. [199, p. 3] Consider primal conic problem (p) and its corresponding dual problem (d): [192, §3.3.1] [156, §2.1] given vectors α, β and matrix constant C,

(p)  minimize_x   α^T x          (d)  maximize_{y,z}  β^T z
     subject to   x ∈ K               subject to      y ∈ K*         (255)
                  Cx = β                              C^T z + y = α

^2.46 His monumental work Convex Analysis has not one figure or illustration. See [19, §II.16] for a good illustration of Rockafellar's recession cone [29].


Observe the dual problem is also conic, and its objective function value never exceeds that of the primal;

α^T x ≥ β^T z
x^T (C^T z + y) ≥ (Cx)^T z    (256)
x^T y ≥ 0

which holds by definition (252). Under the sufficient condition that (255p) is a convex problem satisfying Slater's condition,^2.47 each problem (p) and (d) attains the same optimal value of its objective, and each problem is called a strong dual to the other because the duality gap (primal-dual objective difference) is 0. Then (p) and (d) are together equivalent to the minimax problem

(p)−(d)  minimize_{x,y,z}  α^T x − β^T z
         subject to        x ∈ K,  y ∈ K*    (257)
                           Cx = β,  C^T z + y = α

whose optimal objective always has saddle-value 0 (regardless of the particular convex cone K and other problem parameters). [240, §3.2] Thus determination of convergence for either primal or dual problem is facilitated.

^2.47 A convex problem, essentially, has a convex objective function optimized over a convex set. (§6) In this context, (p) is convex if K is a convex cone. Slater's condition is satisfied whenever any primal strictly feasible point exists. (p. 374)
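Weak duality (256) is easy to observe numerically. A Matlab sketch for the self-dual cone K = K* = R^n_+ using linprog (the random problem data are an assumption, for illustration only):

    % (p) minimize alpha'*x  s.t. C*x = beta, x >= 0
    % (d) maximize beta'*z   s.t. y = alpha - C'*z >= 0
    rng(0); n = 6; p = 2;
    C = randn(p,n); alpha = rand(n,1); beta = C*rand(n,1);      % feasible data
    opts = optimoptions('linprog','Display','none');
    x = linprog(alpha, [], [], C, beta, zeros(n,1), [], opts);  % primal
    z = linprog(-beta, C', alpha, [], [], [], [], opts);        % dual
    gap = alpha'*x - beta'*z   % nonnegative by (256); 0 here (LP strong duality)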


2.13.1.1 Key properties of dual cone

- For any cone, (−K)* = −K*.
- For any cones K_1 and K_2, K_1 ⊆ K_2 ⇒ K_1* ⊇ K_2*. [218, §2.7]
- For closed convex cones K_1 and K_2, their Cartesian product K = K_1 × K_2 is a closed convex cone, and

K* = K_1* × K_2*    (258)

- (conjugation) [202, §14] [64, §4.5] When K is any convex cone, the dual of the dual cone is the closure of the original cone; K** = K̄. Because K*** = K* always holds,

K* = (K̄)*    (259)

When K is closed and convex, then the dual of the dual cone is the original cone; K** = K.

- If any cone K has nonempty interior, then K* is pointed;

K nonempty interior ⇒ K* pointed    (260)

Conversely, if the closure of any convex cone K is pointed, then K* has nonempty interior;

K̄ pointed ⇒ K* nonempty interior    (261)

Given that a cone K ⊂ R^n is closed and convex, K is pointed if and only if K* − K* = R^n; id est, iff K* has nonempty interior. [38, §3.3, exer. 20]

- (vector sum) [202, thm. 3.8] For convex cones K_1 and K_2,

K_1 + K_2 = conv(K_1 ∪ K_2)    (262)

- (dual vector-sum) [202, §16.4.2] [64, §4.6] For convex cones K_1 and K_2,

K_1* ∩ K_2* = (K_1 + K_2)* = (K_1 ∪ K_2)*    (263)

- (closure of vector sum of duals)^2.48 For closed convex cones K_1 and K_2,

(K_1 ∩ K_2)* = closure(K_1* + K_2*) = closure(conv(K_1* ∪ K_2*))    (264)

where closure becomes superfluous under the condition K_1 ∩ int K_2 ≠ ∅ [38, §3.3, exer. 16, §4.1, exer. 7].

^2.48 These parallel analogous results for subspaces R_1, R_2 ⊆ R^n: [64, §4.6]

(R_1 + R_2)^⊥ = R_1^⊥ ∩ R_2^⊥
(R_1 ∩ R_2)^⊥ = R_1^⊥ + R_2^⊥
R^⊥⊥ = R for any subspace R.


- (Krein-Rutman) For closed convex cones K_1 ⊆ R^m and K_2 ⊆ R^n and any linear map A : R^n → R^m, provided int K_1 ∩ AK_2 ≠ ∅ [38, §3.3.13, confer §4.1, exer. 9],

(A^{−1}K_1 ∩ K_2)* = A^T K_1* + K_2*    (265)

where the dual of cone K_1 is with respect to its ambient space R^m and the dual of cone K_2 is with respect to R^n, where A^{−1}K_1 denotes the inverse image (§2.1.9.0.1) of K_1 under mapping A, and where A^T denotes the adjoint operation.

- K is proper if and only if K* is proper.
- K is polyhedral if and only if K* is polyhedral. [218, §2.8]
- K is simplicial if and only if K* is simplicial. (§2.13.9.2) A simplicial cone and its dual are polyhedral proper cones (Figure 47, Figure 39), but not conversely.
- K ⊞ −K* = R^n ⇔ K is closed and convex. (1625) (p. 644)
- Any direction in a proper cone K is normal to a hyperplane separating K from −K*.

2.13.1.2 Examples of dual cone

When K is R^n, K* is the point at the origin, and vice versa. When K is a subspace, K* is its orthogonal complement, and vice versa. (§E.9.2.1, Figure 43)

When cone K is a halfspace in R^n with n > 0 (Figure 42 for example), the dual cone K* is a ray (base 0) belonging to that halfspace but orthogonal to its bounding hyperplane (that contains the origin), and vice versa.

When convex cone K is a closed halfplane in R^3 (Figure 44), it is neither pointed nor of nonempty interior; hence, the dual cone K* can be neither of nonempty interior nor pointed.

When K is any particular orthant in R^n, the dual cone is identical; id est, K = K*.


Figure 43: When convex cone K is any one Cartesian axis, its dual cone is the convex hull of all axes remaining. In R^3, dual cone K* (drawn tiled and truncated) is a hyperplane through the origin; its normal belongs to line K. In R^2, dual cone K* is a line through the origin while convex cone K is the line orthogonal to it.


Figure 44: K and K* are halfplanes in R^3; blades. Both semi-infinite convex cones appear truncated. Each cone is like K in Figure 42, but embedded in a two-dimensional subspace of R^3. Cartesian coordinate axes are drawn for reference.

When K is any quadrant in subspace R^2, K* is a wedge-shaped polyhedral cone in R^3; e.g., for K equal to quadrant I, K* = R^2_+ × R.

When K is a polyhedral flavor of the Lorentz cone K_l (241), the dual is the polyhedral proper cone K_q: for l = 1 or ∞,

K_q = K_l* = { [x; t] ∈ R^n × R | ‖x‖_q ≤ t }    (266)

where ‖x‖_q is the dual norm determined via solution to 1/l + 1/q = 1.


2.13.2 Abstractions of Farkas' lemma

2.13.2.0.1 Corollary. Generalized inequality and membership relation.
[129, §A.4.2] Let K be any closed convex cone and K* its dual, and let x and y belong to a vector space R^n. Then

x ∈ K ⇔ ⟨y, x⟩ ≥ 0 for all y ∈ K*    (267)

which is a simple translation of the Farkas lemma [76], as in [202, §22], to the language of convex cones, and a generalization of the well-known Cartesian fact

x ≽ 0 ⇔ ⟨y, x⟩ ≥ 0 for all y ≽ 0    (268)

for which implicitly K = K* = R^n_+, the nonnegative orthant. By closure we have conjugation:

y ∈ K* ⇔ ⟨y, x⟩ ≥ 0 for all x ∈ K    (269)

merely a statement of fact by definition of the dual cone (252).

When K and K* are pointed closed convex cones, membership relation (267) is often written instead using dual generalized inequalities

x ≽_K 0 ⇔ ⟨y, x⟩ ≥ 0 for all y ≽_{K*} 0    (270)

meaning, coordinates for biorthogonal expansion of x (§2.13.8) [246] must be nonnegative when x belongs to K. By conjugation [202, thm. 14.1]

y ≽_{K*} 0 ⇔ ⟨y, x⟩ ≥ 0 for all x ≽_K 0    (271)  ⋄

When pointed closed convex cone K is not polyhedral, coordinate axes for biorthogonal expansion asserted by the corollary are taken from extreme directions of K; expansion is assured by Carathéodory's theorem (§E.6.4.1.1).

We presume, throughout, the obvious:

x ∈ K ⇔ ⟨y, x⟩ ≥ 0 for all y ∈ K* (267)
⇔
x ∈ K ⇔ ⟨y, x⟩ ≥ 0 for all y ∈ K*, ‖y‖ = 1    (272)


2.13.2.0.2 Exercise.
Test Corollary 2.13.2.0.1 and (272) graphically on the two-dimensional polyhedral cone and its dual in Figure 41.

When pointed closed convex cone K is implicit from context: (confer §2.7.2.2)

x ≽ 0 ⇔ x ∈ K
x ≻ 0 ⇔ x ∈ rel int K    (273)

Strict inequality x ≻ 0 means coordinates for biorthogonal expansion of x must be positive when x belongs to rel int K. Strict membership relations are useful; e.g., for any proper cone K and its dual K*,

x ∈ int K ⇔ ⟨y, x⟩ > 0 for all y ∈ K*, y ≠ 0    (274)
x ∈ K, x ≠ 0 ⇔ ⟨y, x⟩ > 0 for all y ∈ int K*    (275)

By conjugation, we also have the dual relations:

y ∈ int K* ⇔ ⟨y, x⟩ > 0 for all x ∈ K, x ≠ 0    (276)
y ∈ K*, y ≠ 0 ⇔ ⟨y, x⟩ > 0 for all x ∈ int K    (277)

Boundary-membership relations for proper cones are also useful:

x ∈ ∂K ⇔ ∃ y ≠ 0 such that ⟨y, x⟩ = 0, y ∈ K*, x ∈ K    (278)
y ∈ ∂K* ⇔ ∃ x ≠ 0 such that ⟨y, x⟩ = 0, x ∈ K, y ∈ K*    (279)

2.13.2.0.3 Example. Dual linear transformation. [225, §4]
Consider a given matrix A and closed convex cone K. By membership relation we have

Ay ∈ K* ⇔ x^T Ay ≥ 0 ∀x ∈ K
        ⇔ y^T z ≥ 0 ∀z ∈ {A^T x | x ∈ K}
        ⇔ y ∈ {A^T x | x ∈ K}*    (280)

This implies

{y | Ay ∈ K*} = {A^T x | x ∈ K}*    (281)

If we regard A as a linear operator, then A^T is its adjoint.


2.13.2.1 Null certificate, Theorem of the alternative

If, in particular, x_p ∉ K, a closed convex cone, then the construction in Figure 40(b) suggests there exists a hyperplane having inward-normal belonging to dual cone K* separating x_p from K; indeed, (267)

x_p ∉ K ⇔ ∃ y ∈ K* such that ⟨y, x_p⟩ < 0    (282)

The existence of any one such y is a certificate of null membership. From a different perspective,

x_p ∈ K
or in the alternative
∃ y ∈ K* such that ⟨y, x_p⟩ < 0    (283)

By alternative is meant: these two systems are incompatible; one system is feasible while the other is not.

2.13.2.1.1 Example. Theorem of the alternative for linear inequality.
Myriad alternative systems of linear inequality can be explained in terms of pointed closed convex cones and their duals. From membership relation (269) with affine transformation of the dual variable we write, for example,

b − Ay ∈ K* ⇔ x^T(b − Ay) ≥ 0 ∀x ∈ K    (284)
A^T x = 0, b − Ay ∈ K* ⇒ x^T b ≥ 0 ∀x ∈ K    (285)

where A ∈ R^{n×m} and b ∈ R^n are given. Given membership relation (284), conversely, suppose we allow any y ∈ R^m. Then, because −x^T Ay is unbounded below, x^T(b − Ay) ≥ 0 implies A^T x = 0: for y ∈ R^m,

A^T x = 0, b − Ay ∈ K* ⇐ x^T(b − Ay) ≥ 0 ∀x ∈ K    (286)

In toto,

b − Ay ∈ K* ⇔ x^T b ≥ 0, A^T x = 0 ∀x ∈ K    (287)

Vector x belongs to cone K but is constrained to lie in a subspace of R^n specified by an intersection of hyperplanes through the origin {x ∈ R^n | A^T x = 0}. From this, alternative systems of generalized inequality with respect to pointed closed convex cones K and K*


Ay ≼_{K*} b
or in the alternative    (288)
x^T b < 0, A^T x = 0, x ≽_K 0

derived from (287) simply by taking the complementary sense of the inequality in x^T b. These two systems are alternatives; if one system has a solution, then the other does not.^2.49 [202, p. 201]

By invoking a strict membership relation between proper cones (274), we can construct a more exotic interdependency strengthened by the demand for an interior point;

b − Ay ≻_{K*} 0 ⇔ x^T b > 0, A^T x = 0 ∀x ≽_K 0, x ≠ 0    (289)

From this, alternative systems of generalized inequality [41, pp. 50, 54, 262]

Ay ≺_{K*} b
or in the alternative    (290)
x^T b ≤ 0, A^T x = 0, x ≽_K 0, x ≠ 0

derived from (289) by taking the complementary sense of the inequality in x^T b.

^2.49 If solutions at ±∞ are disallowed, then the alternative systems become instead mutually exclusive with respect to nonpolyhedral cones. Simultaneous infeasibility of the two systems is not precluded by mutual exclusivity; this is called a weak alternative. Ye provides an example illustrating simultaneous infeasibility with respect to the positive semidefinite cone: x ∈ S^2, y ∈ R, A = [1 0; 0 0], and b = [0 1; 1 0], where x^T b means ⟨x, b⟩. A better strategy than disallowing solutions at ±∞ is to demand an interior point as in (290) or Lemma 6.2.1.1.2. Then the question of simultaneous infeasibility is moot.


And from this, alternative systems with respect to the nonnegative orthant, attributed to Gordan in 1873: [94] [38, §2.2] substituting A ← A^T and setting b = 0,

A^T y ≺ 0
or in the alternative    (291)
Ax = 0, x ≽ 0, ‖x‖_1 = 1

2.13.3 Optimality condition

The general first-order necessary and sufficient condition for optimality of solution x⋆ to a minimization problem ((255p) for example) with real differentiable convex objective function f̂(x) : R^n → R is [201, §3] (confer §2.13.10.1) (Figure 50)

∇f̂(x⋆)^T (x − x⋆) ≥ 0 ∀x ∈ C, x⋆ ∈ C    (292)

where C is the feasible set, a convex set of all variable values satisfying the problem constraints, and where ∇f̂(x⋆) is the gradient of f̂ (§3.1.1.4) with respect to x evaluated at x⋆.

2.13.3.0.1 Example. Equality constrained problem.
Given a real differentiable convex function f̂(x) : R^n → R defined on domain R^n, a fat full-rank matrix C ∈ R^{p×n}, and vector d ∈ R^p, the convex optimization problem

minimize_x  f̂(x)
subject to  Cx = d    (293)

is characterized by the well-known necessary and sufficient optimality condition [41, §4.2.3]

∇f̂(x⋆) + C^T ν = 0    (294)

where ν ∈ R^p is the eminent Lagrange multiplier. [200] Feasible solution x⋆ is optimal, in other words, if and only if ∇f̂(x⋆) belongs to R(C^T). Via membership relation, we now derive this particular condition from the general first-order condition for optimality (292): in this case, the feasible set is

C ≜ {x ∈ R^n | Cx = d} = {Zξ + x_p | ξ ∈ R^{n−rank C}}    (295)


where Z ∈ R^{n×(n−rank C)} holds a basis for N(C) columnar, and x_p is any particular solution to Cx = d. Since x⋆ ∈ C, we arbitrarily choose x_p = x⋆, which yields the equivalent optimality condition

∇f̂(x⋆)^T Zξ ≥ 0 ∀ξ ∈ R^{n−rank C}    (296)

But this is simply half of a membership relation, and the cone dual to R^{n−rank C} is the origin in R^{n−rank C}. We must therefore have

Z^T ∇f̂(x⋆) = 0 ⇔ ∇f̂(x⋆)^T Zξ ≥ 0 ∀ξ ∈ R^{n−rank C}    (297)

meaning, ∇f̂(x⋆) must be orthogonal to N(C). This condition

Z^T ∇f̂(x⋆) = 0,  x⋆ ∈ C    (298)

is necessary and sufficient for optimality of x⋆.

2.13.4 Discretization of membership relation

2.13.4.1 Dual halfspace-description

A halfspace-description of the dual cone is equally simple (and extensible to an infinite number of generators) as vertex-description (246) for the corresponding closed convex cone: by definition (252), for X ∈ R^{n×N} as in (234), (confer (240))

K* = { y ∈ R^n | z^T y ≥ 0 for all z ∈ K }
   = { y ∈ R^n | z^T y ≥ 0 for all z = Xa, a ≽ 0 }
   = { y ∈ R^n | a^T X^T y ≥ 0, a ≽ 0 }    (299)
   = { y ∈ R^n | X^T y ≽ 0 }

which follows from the generalized inequality and membership corollary (268). The semi-infinity of tests specified by all z ∈ K has been reduced to a set of generators for K constituting the columns of X; id est, the test has been discretized.

Whenever K is known to be closed and convex, the converse must also hold; id est, given any set of generators for K* arranged columnar in Y, then the consequent vertex-description of the dual cone connotes a halfspace-description for K: [218, §2.8]

K* = {Ya | a ≽ 0} ⇔ K** = K = { y | Y^T y ≽ 0 }    (300)
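Descriptions (299) and (300) amount to finite numeric membership tests. A minimal Matlab sketch (the generators X and tolerance tol are assumptions, for illustration):

    % y ∈ K* iff X'*y >= 0, where the columns of X generate K = cone(X).
    X   = [1 0 1; 0 1 1];          % three generators of a cone in R^2
    tol = 1e-10;
    y   = [1; 1];
    in_dual = all(X'*y >= -tol)    % true: y makes a nonnegative inner
                                   % product with every generator of K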


2.13.4.2 First dual-cone formula

From these two results (299) and (300) we deduce a general principle: from any given vertex-description of a convex cone K, a halfspace-description of the dual cone K* is immediate by matrix transposition; conversely, from any given halfspace-description, a dual vertex-description is immediate. Various other converses are just a little trickier. (§2.13.9)

We deduce further: for any polyhedral cone K, the dual cone K* is also polyhedral and K** = K. [218, §2.8]

The generalized inequality and membership corollary is discretized in the following theorem [17, §1]^2.50 that follows directly from (299) and (300):

2.13.4.2.1 Theorem. Discrete membership. (confer §2.13.2.0.1)
Given any set of generators (§2.8.1.2) denoted by G(K) for closed convex cone K ⊆ R^n and any set of generators denoted G(K*) for its dual, let x and y belong to vector space R^n. Then discretization of the generalized inequality and membership corollary is necessary and sufficient for certifying membership:

x ∈ K ⇔ ⟨γ*, x⟩ ≥ 0 for all γ* ∈ G(K*)    (301)
y ∈ K* ⇔ ⟨γ, y⟩ ≥ 0 for all γ ∈ G(K)    (302)  ⋄

2.13.4.2.2 Exercise.
Test Theorem 2.13.4.2.1 on Figure 41(a) using the extreme directions as generators.

From this we may further deduce a more surgical description of the dual cone that prescribes only a finite number of halfspaces for its construction when polyhedral: (Figure 40(a))

K* = {y ∈ R^n | ⟨γ, y⟩ ≥ 0 for all γ ∈ G(K)}    (303)

^2.50 Barker & Carlson state the theorem only for the pointed closed convex case.


2.13.4.2.3 Example. Comparison with respect to orthant.
When comparison is with respect to the nonnegative orthant K = R^n_+, then from the discrete membership theorem it directly follows:

x ≼ z ⇔ x_i ≤ z_i ∀i    (304)

Simple examples show entrywise inequality holds only with respect to the nonnegative orthant.

2.13.5 Dual PSD cone and generalized inequality

The dual positive semidefinite cone K* is confined to S^M by convention;

S^M_+* ≜ {Y ∈ S^M | ⟨Y, X⟩ ≥ 0 for all X ∈ S^M_+} = S^M_+    (305)

The positive semidefinite cone is self-dual in the ambient space of symmetric matrices [41, exmp. 2.24] [27] [126, §II]; K = K*.

Dual generalized inequalities with respect to the positive semidefinite cone in the ambient space of symmetric matrices can therefore be simply stated: (Fejér)

X ≽ 0 ⇔ tr(Y^T X) ≥ 0 for all Y ≽ 0    (306)

Membership to this cone can be determined in the isometrically isomorphic Euclidean space R^{M²} via (30). (§2.2.1) By the two interpretations in §2.13.1, positive semidefinite matrix Y can be interpreted as inward-normal to a hyperplane supporting the positive semidefinite cone.

The fundamental statement of positive semidefiniteness, y^T Xy ≥ 0 ∀y (§A.3.0.0.1), is a particular instance of these generalized inequalities (306):

X ≽ 0 ⇔ ⟨yy^T, X⟩ ≥ 0 ∀yy^T (≽ 0)    (1086)

Discretization (§2.13.4.2.1) allows replacement of positive semidefinite matrices Y with this minimal set of generators comprising the extreme directions of the positive semidefinite cone (§2.9.2.4).
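Relation (306) and its rank-one discretization (1086) can be spot-checked numerically. A Matlab sketch (random sampling of generators yy^T is an assumption, for illustration; it probes, but cannot certify, membership):

    % Fejer spot-check of (306)/(1086) for a PSD matrix X:
    % <yy', X> = y'*X*y must be nonnegative for every y.
    rng(1); M = 4;
    R = randn(M); X = R'*R;             % X is positive semidefinite
    ok = true;
    for k = 1:1000
        y = randn(M,1);
        ok = ok && (y'*X*y >= -1e-10);  % inner product with generator yy'
    end
    ok                                  % true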


Figure 45: Two (truncated) views of a polyhedral cone K and its dual in R^3. Each of four extreme directions from K belongs to a face of dual cone K*. Shrouded (inside) cone K is symmetrical about its axis of revolution. Each pair of diametrically opposed extreme directions from K makes a right angle. An orthant (or any rotation thereof; a simplicial cone) is not the only self-dual polyhedral cone in three or more dimensions; [16, §2.A.21] e.g., consider an equilateral having five extreme directions. In fact, every self-dual polyhedral cone in R^3 has an odd number of extreme directions. [18, thm. 3]


2.13.5.1 Self-dual cones

From (106) (a consequence of the halfspaces theorem, §2.4.1.1.1), where the only finite value of the support function for a convex cone is 0 [129, §C.2.3.1], or from discretized definition (303) of the dual cone, we get a rather self-evident characterization of self-duality:

K = K* ⇔ K = ⋂_{γ∈G(K)} {y | γ^T y ≥ 0}    (307)

In words: cone K is self-dual iff its own extreme directions are inward-normals to a (minimal) set of hyperplanes bounding halfspaces whose intersection constructs it. This means each extreme direction of K is normal to a hyperplane exposing one of its own faces; a necessary but insufficient condition for self-duality (Figure 45, for example).

Self-dual cones necessarily have nonempty interior [24, §I] and are invariant to rotation about the origin. Their most prominent representatives are the orthants, the positive semidefinite cone S^M_+ in the ambient space of symmetric matrices (305), and the Lorentz cone (142) [16, §II.A] [41, exmp. 2.25]. In three dimensions, a plane containing the axis of revolution of a self-dual cone (and the origin) will produce a slice whose boundary makes a right angle.

2.13.5.1.1 Example. Linear matrix inequality.
Consider a peculiar vertex-description for a closed convex cone defined over the positive semidefinite cone (instead of the nonnegative orthant as in definition (79)): for X ∈ S^n, given A_j ∈ S^n, j = 1 ... m,

K = { [⟨A_1, X⟩ ; ⋯ ; ⟨A_m, X⟩] | X ≽ 0 } ⊆ R^m
  = { [svec(A_1)^T ; ⋯ ; svec(A_m)^T] svec X | X ≽ 0 }    (308)
  ≜ { A svec X | X ≽ 0 }

where A ∈ R^{m×n(n+1)/2}, and where symmetric vectorization svec is defined in (46). K is indeed a convex cone because by (139)

A svec X_{p1}, A svec X_{p2} ∈ K ⇒ A(ζ svec X_{p1} + ξ svec X_{p2}) ∈ K for all ζ, ξ ≥ 0    (309)


since a nonnegatively weighted sum of positive semidefinite matrices must be positive semidefinite. (§A.3.1.0.2) Although matrix A is finite-dimensional, K is generally not a polyhedral cone (unless m equals 1 or 2) because X ∈ S^n_+. Provided the A_j matrices are linearly independent, then

rel int K = int K    (310)

meaning, the cone interior is nonempty, implying the dual cone is pointed by (260).

If matrix A has no nullspace, on the other hand, then (by §2.10.1.1 and Definition 2.2.1.0.1) A svec X is an isomorphism in X between the positive semidefinite cone and R(A). In that case, convex cone K has relative interior

rel int K = {A svec X | X ≻ 0}    (311)

and boundary

rel ∂K = {A svec X | X ≽ 0, X ⊁ 0}    (312)

Now consider the (closed convex) dual cone:

K* = { y | ⟨A svec X, y⟩ ≥ 0 for all X ≽ 0 } ⊆ R^m
   = { y | ⟨svec X, A^T y⟩ ≥ 0 for all X ≽ 0 }    (313)
   = { y | svec⁻¹(A^T y) ≽ 0 }

which follows from (306) and leads to an equally peculiar halfspace-description

K* = { y ∈ R^m | ∑_{j=1}^m y_j A_j ≽ 0 }    (314)

The summation inequality with respect to the positive semidefinite cone is known as a linear matrix inequality. [40] [86] [171] [243]

When the A_j matrices are linearly independent, the function g(y) ≜ ∑ y_j A_j on R^m is a linear bijection. The inverse image of the positive semidefinite cone under g(y) must therefore have dimension m. In that circumstance, the dual cone interior is nonempty:

int K* = { y ∈ R^m | ∑_{j=1}^m y_j A_j ≻ 0 }    (315)

having boundary

∂K* = { y ∈ R^m | ∑_{j=1}^m y_j A_j ≽ 0, ∑_{j=1}^m y_j A_j ⊁ 0 }    (316)
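Membership of y to dual cone (314) reduces to an eigenvalue test on the matrix sum. A Matlab sketch (the particular symmetric A_j below are an assumption, for illustration):

    % y ∈ K* per (314) iff sum_j y(j)*A{j} is positive semidefinite.
    n = 3; m = 2;
    A = { [2 0 0; 0 1 0; 0 0 1], [0 1 0; 1 0 0; 0 0 0] };  % symmetric A_j
    y = [1; 0.5];
    S = zeros(n);
    for j = 1:m, S = S + y(j)*A{j}; end
    in_dual = min(eig(S)) >= -1e-10   % LMI holds iff min eigenvalue >= 0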


2.13.6 Dual of pointed polyhedral cone

In a subspace of R^n, we now consider a pointed polyhedral cone K given in terms of its extreme directions Γ_i arranged columnar in X;

X = [Γ_1 Γ_2 · · · Γ_N] ∈ R^{n×N}    (234)

The extremes theorem (§2.8.1.1.1) provides the vertex-description of a pointed polyhedral cone in terms of its finite number of extreme directions and its lone vertex at the origin:

2.13.6.0.1 Definition. Pointed polyhedral cone, vertex-description.
(encore) (confer (246) (152)) Given pointed polyhedral cone K in a subspace of R^n, denoting its i-th extreme direction by Γ_i ∈ R^n arranged in a matrix X as in (234), then that cone may be described: (71) (confer (247))

K = { [0 X] aζ | a^T 1 = 1, a ≽ 0, ζ ≥ 0 }
  = { Xaζ | a^T 1 ≤ 1, a ≽ 0, ζ ≥ 0 }    (317)
  = { Xb | b ≽ 0 } ⊆ R^n

which is simply a conic hull (like (79)) of a finite number N of directions. △

Whenever cone K is pointed closed and convex (not only polyhedral), then dual cone K* has a halfspace-description in terms of the extreme directions Γ_i of K:

K* = { y | γ^T y ≥ 0 for all γ ∈ {Γ_i, i = 1 ... N} ⊆ rel ∂K }    (318)

because when {Γ_i} constitutes any set of generators for K, the discretization result in §2.13.4.1 allows relaxation of the requirement ∀x ∈ K in (252) to ∀γ ∈ {Γ_i} directly.^2.51 That dual cone so defined is unique, identical to (252), polyhedral whenever the number of generators N is finite,

K* = { y | X^T y ≽ 0 } ⊆ R^n    (299)

and has nonempty interior because K is assumed pointed (but K* is not necessarily pointed unless K has nonempty interior (§2.13.1.1)).

^2.51 The extreme directions of K constitute a minimal set of generators.


2.13.6.1 Facet normal & extreme direction

We see from (299) that the conically independent generators of cone K (namely, the extreme directions of pointed closed convex cone K constituting the columns of X) each define an inward-normal to a hyperplane supporting K* (§2.4.2.6.1) and exposing a dual facet when N is finite. Were K* pointed and finitely generated, then by conjugation the dual statement would also hold; id est, the extreme directions of pointed K* each define an inward-normal to a hyperplane supporting K and exposing a facet when N is finite. Examine Figure 41 or Figure 45, for example.

We may conclude that the extreme directions of polyhedral proper K are respectively orthogonal to the facets of K*; likewise, the extreme directions of polyhedral proper K* are respectively orthogonal to the facets of K.

2.13.7 Biorthogonal expansion by example

2.13.7.0.1 Example. Relationship to dual polyhedral cone.
The simplicial cone K illustrated in Figure 46 induces a partial order on R^2. All points greater than x with respect to K, for example, are contained in the translated cone x + K. The extreme directions Γ_1 and Γ_2 of K do not make an orthogonal set; neither do extreme directions Γ_3 and Γ_4 of dual cone K*; rather, we have the biorthogonality condition [246]

Γ_4^T Γ_1 = Γ_3^T Γ_2 = 0
Γ_3^T Γ_1 ≠ 0,  Γ_4^T Γ_2 ≠ 0    (319)

Biorthogonal expansion of x ∈ K is then

x = Γ_1 (Γ_3^T x)/(Γ_3^T Γ_1) + Γ_2 (Γ_4^T x)/(Γ_4^T Γ_2)    (320)

where Γ_3^T x/(Γ_3^T Γ_1) is the nonnegative coefficient of nonorthogonal projection (§E.6.1) of x on Γ_1 in the direction orthogonal to Γ_3, and where Γ_4^T x/(Γ_4^T Γ_2) is the nonnegative coefficient of nonorthogonal projection of x on Γ_2 in the direction orthogonal to Γ_4; they are coordinates in this nonorthogonal system. Those coefficients must be nonnegative, x ≽_K 0, because x ∈ K (273) and K is simplicial.

If we ascribe the extreme directions of K to the columns of a matrix

X ≜ [Γ_1 Γ_2]    (321)


Figure 46: Simplicial cone K in R^2 and its dual K* drawn truncated, with Γ_1 ⊥ Γ_4 and Γ_2 ⊥ Γ_3. Conically independent generators Γ_1 and Γ_2 constitute extreme directions of K while Γ_3 and Γ_4 constitute extreme directions of K*. Dotted ray-pairs bound translated cones K. Point x is comparable to point z (and vice versa) but not to y; z ≽ x ⇔ z − x ∈ K ⇔ z − x ≽_K 0 iff there exist nonnegative coordinates for biorthogonal expansion of z − x. Point y is not comparable to z because z does not belong to y ± K. Flipping a translated cone is quite helpful for visualization: x ≼ z ⇔ x ∈ z − K ⇔ x − z ≼_K 0. Points need not belong to K to be comparable; e.g., all points greater than w belong to w + K.


then we find

X†T = [ Γ_3/(Γ_3^T Γ_1)   Γ_4/(Γ_4^T Γ_2) ]    (322)

Therefore,

x = XX†x    (328)

is the biorthogonal expansion (320) (§E.0.1), and the biorthogonality condition (319) can be expressed succinctly (§E.1.1)^2.52

X†X = I    (329)

Expansion w = XX†w for any w ∈ R^2 is unique if and only if the extreme directions of K are linearly independent; id est, iff X has no nullspace.

2.13.7.1 Pointed cones and biorthogonality

Biorthogonality condition X†X = I from Example 2.13.7.0.1 means Γ_1 and Γ_2 are linearly independent generators of K (§B.1.1.1); generators because every x ∈ K is their conic combination. From §2.10.2 we know that means Γ_1 and Γ_2 must be extreme directions of K.

A biorthogonal expansion is necessarily associated with a pointed closed convex cone; pointed, otherwise there can be no extreme directions (§2.8.1). We will address biorthogonal expansion with respect to a pointed polyhedral cone having empty interior in §2.13.8.

^2.52 Possibly confusing is the fact that formula XX†x is simultaneously the orthogonal projection of x on R(X) (1510), and a sum of nonorthogonal projections of x ∈ R(X) on the range of each and every column of full-rank X skinny-or-square (§E.5.0.0.2).
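A Matlab sketch of expansion (328) and condition (329) for a simplicial cone in R^2 (the particular generators are an assumption, for illustration):

    % Biorthogonal expansion x = X*pinv(X)*x ; the coordinates pinv(X)*x
    % are nonnegative exactly when x belongs to K = cone(X).
    X = [1 0.5; 0 1];        % linearly independent extreme directions
    pinv(X)*X                % biorthogonality (329): identity matrix
    x = X*[2; 3];            % a point of K (conic combination, a >= 0)
    c = pinv(X)*x            % coordinates: recovers [2; 3], both >= 0
    norm(x - X*c)            % expansion reproduces x, up to roundoff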


2.13.7.1.1 Example. Expansions implied by diagonalization.
(confer §5.5.3.2.1) When matrix X ∈ R^{M×M} is diagonalizable (§A.5),

X = SΛS^{−1} = [s_1 ⋯ s_M] Λ [w_1^T ; ⋮ ; w_M^T] = ∑_{i=1}^M λ_i s_i w_i^T    (1181)

coordinates for biorthogonal expansion are its eigenvalues λ_i (contained in diagonal matrix Λ) when expanded in S;

X = SS^{−1}X = [s_1 ⋯ s_M] [w_1^T X ; ⋮ ; w_M^T X] = ∑_{i=1}^M λ_i s_i w_i^T    (323)

Coordinate values depend upon the geometric relationship of X to its linearly independent eigenmatrices s_i w_i^T. (§A.5.1, §B.1.1)

Eigenmatrices s_i w_i^T are linearly independent dyads constituted by right and left eigenvectors of diagonalizable X and are generators of some pointed polyhedral cone K in a subspace of R^{M×M}. When S is real and X belongs to that polyhedral cone K, for example, then coordinates of expansion (the eigenvalues λ_i) must be nonnegative.

When X = QΛQ^T is symmetric, coordinates for biorthogonal expansion are its eigenvalues when expanded in Q; id est, for X ∈ S^M,

X = QQ^T X = ∑_{i=1}^M q_i q_i^T X = ∑_{i=1}^M λ_i q_i q_i^T ∈ S^M    (324)

becomes an orthogonal expansion with orthonormality condition Q^T Q = I, where λ_i is the i-th eigenvalue of X, q_i is the corresponding i-th eigenvector arranged columnar in orthogonal matrix

Q = [q_1 q_2 · · · q_M] ∈ R^{M×M}    (325)

and where eigenmatrix q_i q_i^T is an extreme direction of some pointed polyhedral cone K ⊂ S^M and an extreme direction of the positive semidefinite cone S^M_+.

Orthogonal expansion is a special case of biorthogonal expansion of X ∈ aff K occurring when polyhedral cone K is any rotation about the origin of an orthant belonging to a subspace.

Similarly, when X = QΛQ^T belongs to the positive semidefinite cone in the subspace of symmetric matrices, coordinates for orthogonal expansion must be its nonnegative eigenvalues (1094) when expanded in Q; id est, for X ∈ S^M_+,

X = QQ^T X = ∑_{i=1}^M q_i q_i^T X = ∑_{i=1}^M λ_i q_i q_i^T ∈ S^M_+    (326)

where λ_i ≥ 0 is the i-th eigenvalue of X. This means X simultaneously belongs to the positive semidefinite cone and to the pointed polyhedral cone K formed by the conic hull of its eigenmatrices.
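A Matlab sketch of orthogonal expansion (324), accumulating eigenmatrices q_i q_i^T of a symmetric matrix:

    % X = sum_i lambda_i * q_i*q_i'  with  Q'*Q = I .
    rng(2); M = 4;
    B = randn(M); X = B + B';              % symmetric test matrix
    [Q,Lam] = eig(X);                      % orthonormal Q for symmetric X
    Xr = zeros(M);
    for i = 1:M
        Xr = Xr + Lam(i,i)*Q(:,i)*Q(:,i)'; % add i-th eigenmatrix term
    end
    norm(X - Xr)                           % zero up to roundoff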


2.13.7.1.2 Example. Expansion respecting nonpositive orthant.
Suppose x ∈ K, any orthant in R^n.^2.53 Then coordinates for biorthogonal expansion of x must be nonnegative; in fact, the absolute value of the Cartesian coordinates.

Suppose, in particular, x belongs to the nonpositive orthant K = R^n_−. Then the biorthogonal expansion becomes an orthogonal expansion

x = XX^T x = ∑_{i=1}^n −e_i(−e_i^T x) = ∑_{i=1}^n −e_i |e_i^T x| ∈ R^n_−    (327)

and the coordinates of expansion are nonnegative. For this orthant K we have orthonormality condition X^T X = I, where X = −I, e_i ∈ R^n is a standard basis vector, and −e_i is an extreme direction (§2.8.1) of K.

Of course, this expansion x = XX^T x applies more broadly to domain R^n, but then the coordinates each belong to all of R.

2.13.8 Biorthogonal expansion, derivation

Biorthogonal expansion is a means for determining coordinates in a pointed conic coordinate system characterized by a nonorthogonal basis. Study of nonorthogonal bases invokes pointed polyhedral cones and their duals; extreme directions of a cone K are assumed to constitute the basis while those of the dual cone K* determine coordinates.

Unique biorthogonal expansion with respect to K depends upon existence of its linearly independent extreme directions: polyhedral cone K must be pointed; then it possesses extreme directions. Those extreme directions must be linearly independent to uniquely represent any point in their span.

We consider nonempty pointed polyhedral cone K having possibly empty interior; id est, we consider a basis spanning a subspace. Then we need only observe that section of dual cone K* in the affine hull of K because, by expansion of x, membership x ∈ aff K is implicit and because any breach of the ordinary dual cone into ambient space becomes irrelevant (§2.13.9.3). Biorthogonal expansion

x = XX†x ∈ aff K = aff cone(X)    (328)

is expressed in the extreme directions {Γ_i} of K arranged columnar in

X = [Γ_1 Γ_2 · · · Γ_N] ∈ R^{n×N}    (234)

^2.53 An orthant is simplicial and self-dual.


under assumption of biorthogonality

X†X = I    (329)

We therefore seek, in this section, a vertex-description for K* ∩ aff K in terms of linearly independent dual generators {Γ_i*} ⊂ aff K in the same finite quantity^2.54 as the extreme directions {Γ_i} of

K = cone(X) = {Xa | a ≽ 0} ⊆ R^n    (246)

We assume the quantity of extreme directions N does not exceed the dimension n of the ambient vector space because, otherwise, the expansion could not be unique; id est, we assume N linearly independent extreme directions, hence N ≤ n (X skinny^2.55-or-square, full-rank). In other words, a fat full-rank matrix X is prohibited by uniqueness because of the existence of an infinity of right-inverses; polyhedral cones whose extreme directions number in excess of the ambient space dimension are precluded in biorthogonal expansion.

2.13.8.1 x ∈ K

Suppose x belongs to K ⊆ R^n. Then x = Xa for some a ≽ 0. Vector a is unique only when {Γ_i} is a linearly independent set.^2.56 Vector a ∈ R^N can take the form a = Bx if R(B) = R^N. Then we require Xa = XBx = x and Bx = BXa = a. The pseudoinverse B = X† ∈ R^{N×n} (§E) is suitable when X is skinny-or-square and full-rank. In that case rank X = N, and for all c ≽ 0 and i = 1 ... N,

a ≽ 0 ⇔ X†Xa ≽ 0 ⇔ a^T X^T X†T c ≥ 0 ⇔ Γ_i^T X†T c ≥ 0    (330)

^2.54 When K is contained in a proper subspace of R^n, the ordinary dual cone K* will have more generators in any minimal set than K has extreme directions.
^2.55 "Skinny" meaning thin; more rows than columns.
^2.56 Conic independence alone (§2.10) is insufficient to guarantee uniqueness.


The penultimate inequality follows from the generalized inequality and membership corollary, while the last inequality is a consequence of that corollary's discretization (§2.13.4.2.1).^2.57 From (330) and (318) we deduce

K* ∩ aff K = cone(X†T) = {X†T c | c ≽ 0} ⊆ R^n    (331)

is the vertex-description for that section of K* in the affine hull of K because R(X†T) = R(X) by definition of the pseudoinverse. From (260), we know K* ∩ aff K must be pointed if rel int K is logically assumed nonempty with respect to aff K.

Conversely, suppose full-rank skinny-or-square matrix

X†T ≜ [Γ_1* Γ_2* · · · Γ_N*] ∈ R^{n×N}    (332)

comprises the extreme directions {Γ_i*} ⊂ aff K of the dual-cone intersection with the affine hull of K.^2.58 From the discrete membership theorem and (264) we get a partial dual to (318); id est, assuming x ∈ aff cone X,

x ∈ K ⇔ γ*^T x ≥ 0 for all γ* ∈ {Γ_i*, i = 1 ... N} ⊂ ∂K* ∩ aff K    (333)
     ⇔ X†x ≽ 0    (334)

which leads to a partial halfspace-description,

K = { x ∈ aff cone X | X†x ≽ 0 }    (335)

For γ* = X†T e_i, any x = Xa, and for all i, we have e_i^T X†Xa = e_i^T a ≥ 0 only when a ≽ 0. Hence x ∈ K.

^2.57 a ≽ 0 ⇔ a^T X^T X†T c ≥ 0  ∀(c ≽ 0 ⇔ a^T X^T X†T c ≥ 0 ∀a ≽ 0)  ∀(c ≽ 0 ⇔ Γ_i^T X†T c ≥ 0 ∀i). Intuitively, any nonnegative vector a is a conic combination of the standard basis {e_i ∈ R^N}; a ≽ 0 ⇔ a_i e_i ≽ 0 for all i. The last inequality in (330) is a consequence of the fact that x = Xa may be any extreme direction of K, in which case a is a standard basis vector; a = e_i ≽ 0. Theoretically, because c ≽ 0 defines a pointed polyhedral cone (in fact, the nonnegative orthant in R^N), we can take (330) one step further by discretizing c:

a ≽ 0 ⇔ Γ_i^T Γ_j* ≥ 0 for i,j = 1 ... N ⇔ X†X ≥ 0

In words, X†X must be a matrix whose entries are each nonnegative.
^2.58 When closed convex cone K has empty interior, K* has no extreme directions.


When X is full-rank, then the unique biorthogonal expansion of x ∈ K becomes (328)

    x = XX†x = Σ_{i=1}^N Γ_i Γ*_i^T x    (336)

whose coordinates Γ*_i^T x must be nonnegative because K is assumed pointed, closed, and convex. Whenever X is full-rank, so is its pseudoinverse X†. (E) In the present case, the columns of X†^T are linearly independent and generators of the dual cone K* ∩ aff K ; hence, the columns constitute its extreme directions. (2.10) That section of the dual cone is itself a polyhedral cone (by (240) or the cone intersection theorem, 2.7.2.1.1) having the same number of extreme directions as K.

2.13.8.2 x ∈ aff K

The extreme directions of K and K* ∩ aff K have a distinct relationship; because X†X = I, then for i,j = 1... N, Γ_i^T Γ*_i = 1 while for i ≠ j, Γ_i^T Γ*_j = 0. Yet neither set of extreme directions, {Γ_i} nor {Γ*_i}, is necessarily orthogonal. This is, precisely, a biorthogonality condition [246, 2.2.4] [131] implying each set of extreme directions is linearly independent. (B.1.1.1)

The biorthogonal expansion therefore applies more broadly; meaning, for any x ∈ aff K, vector x can be uniquely expressed x = Xb where b ∈ R^N because aff K contains the origin. Thus, for any such x ∈ R(X) (confer E.1.1), biorthogonal expansion (336) becomes x = XX†Xb = Xb.


2.13.9 Formulae, algorithm finding dual cone

2.13.9.1 Pointed K, dual, X skinny-or-square full-rank

We wish to derive expressions for a convex cone and its ordinary dual under the general assumptions: pointed polyhedral K denoted by its linearly independent extreme directions arranged columnar in matrix X such that

    rank(X ∈ R^{n×N}) = N ≜ dim aff K ≤ n    (337)

The vertex-description is given:

    K = {Xa | a ≽ 0} ⊆ R^n    (338)

from which a halfspace-description for the dual cone follows directly:

    K* = {y ∈ R^n | X^T y ≽ 0}    (339)

By defining a matrix

    X⊥ ≜ basis N(X^T)    (340)

(a columnar basis for the orthogonal complement of R(X)), we can say

    aff cone X = aff K = {x | X⊥^T x = 0}    (341)

meaning K lies in a subspace, perhaps R^n. Thus we have a halfspace-description

    K = {x ∈ R^n | X†x ≽ 0, X⊥^T x = 0}    (342)

and from (264), a vertex-description^2.59

    K* = { [X†^T  X⊥  −X⊥]b | b ≽ 0 } ⊆ R^n    (343)

These results are summarized for a pointed polyhedral cone, having linearly independent generators, and its ordinary dual:

    Cone Table 1             K               K*
    vertex-description       X               X†^T, ±X⊥
    halfspace-description    X†, X⊥^T        X^T

2.13.9.2 Simplicial case

When a convex cone is simplicial (2.12.3), Cone Table 1 simplifies because then aff cone X = R^n : For square X and assuming simplicial K such that

    rank(X ∈ R^{n×N}) = N ≜ dim aff K = n    (344)

we have

    Cone Table S             K               K*
    vertex-description       X               X†^T
    halfspace-description    X†              X^T

2.59 These descriptions are not unique. A vertex-description of the dual cone, for example, might use four conically independent generators for a plane (2.10.0.0.1) when only three would suffice.
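Every entry of Cone Table 1 is computable mechanically; a MATLAB sketch, with an arbitrary example X having linearly independent generators:

    % Cone Table 1 entries for K = cone(X), X skinny full-rank (337)-(343).
    X     = [1 0; 1 1; 0 1];   % generators of pointed K in R^3 (example)
    Xdag  = pinv(X);           % X† : halfspace-description rows of K (342)
    Xperp = null(X');          % X⊥ : basis N(X^T) (340)
    % K  = {x | Xdag*x >= 0, Xperp'*x = 0}                        (342)
    % K* = cone of [Xdag', Xperp, -Xperp]                         (343)
    y = [Xdag', Xperp, -Xperp]*[1; 2; 0.5; 0];  % member of K* (b ≽ 0)
    disp(X'*y)                 % X^T y ≽ 0 confirms y ∈ K* (339)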


For example, vertex-description (343) simplifies to

    K* = {X†^T b | b ≽ 0} ⊂ R^n    (345)

Now, because dim R(X) = dim R(X†^T), (E) the dual cone K* is simplicial whenever K is.

2.13.9.3 Membership relations in a subspace

It is obvious by definition (252) of the ordinary dual cone K* in ambient vector space R that its determination instead in subspace M ⊆ R is identical to its intersection with M ; id est, assuming closed convex cone K ⊆ M and K* ⊆ R

    (K* were ambient M) ≡ (K* in ambient R) ∩ M    (346)

because

    {y ∈ M | ⟨y, x⟩ ≥ 0 for all x ∈ K} = {y ∈ R | ⟨y, x⟩ ≥ 0 for all x ∈ K} ∩ M    (347)

From this, a constrained membership relation for the ordinary dual cone K* ⊆ R, assuming x, y ∈ M and closed convex cone K ⊆ M

    y ∈ K* ∩ M ⇔ ⟨y, x⟩ ≥ 0 for all x ∈ K    (348)

By closure in subspace M we have conjugation (2.13.1.1):

    x ∈ K ⇔ ⟨y, x⟩ ≥ 0 for all y ∈ K* ∩ M    (349)

This means membership determination in subspace M requires knowledge of the dual cone only in M. For sake of completeness, for proper cone K with respect to subspace M (confer (274))

    x ∈ int K ⇔ ⟨y, x⟩ > 0 for all y ∈ K* ∩ M, y ≠ 0    (350)
    x ∈ K, x ≠ 0 ⇔ ⟨y, x⟩ > 0 for all y ∈ int K* ∩ M    (351)

(By conjugation, we also have the dual relations.) Yet when M equals aff K for K a closed convex cone

    x ∈ rel int K ⇔ ⟨y, x⟩ > 0 for all y ∈ K* ∩ aff K, y ≠ 0    (352)
    x ∈ K, x ≠ 0 ⇔ ⟨y, x⟩ > 0 for all y ∈ rel int(K* ∩ aff K)    (353)


2.13.9.4 Subspace M = aff K

Assume now a subspace M that is the affine hull of cone K : Consider again a pointed polyhedral cone K denoted by its extreme directions arranged columnar in matrix X such that

    rank(X ∈ R^{n×N}) = N ≜ dim aff K ≤ n    (337)

We want expressions for the convex cone and its dual in subspace M = aff K :

    Cone Table A             K               K* ∩ aff K
    vertex-description       X               X†^T
    halfspace-description    X†, X⊥^T        X^T, X⊥^T

When dim aff K = n, this table reduces to Cone Table S. These descriptions facilitate work in a proper subspace. The subspace of symmetric matrices S^N, for example, often serves as ambient space.^2.60

2.13.9.4.1 Example. Monotone nonnegative cone.
[41, exer.2.33] [237, 2] Simplicial cone (2.12.3.1.1) K_M+ is the cone of all nonnegative vectors having their entries sorted in nonincreasing order:

    K_M+ ≜ {x | x_1 ≥ x_2 ≥ · · · ≥ x_n ≥ 0} ⊆ R^n_+
         = {x | (e_i − e_{i+1})^T x ≥ 0, i = 1... n−1, e_n^T x ≥ 0}
         = {x | X†x ≽ 0}    (354)

a halfspace-description where e_i is the i th standard basis vector, and where

    X†^T ≜ [ e_1−e_2  e_2−e_3  · · ·  e_n ] ∈ R^{n×n}    (355)

(With X† in hand, we might concisely scribe the remaining vertex- and halfspace-descriptions from the tables for K_M+ and its dual. Instead we use generalized inequality in their derivation.) For any vectors x and y, simple algebra demands

    x^T y = Σ_{i=1}^n x_i y_i = (x_1 − x_2)y_1 + (x_2 − x_3)(y_1 + y_2) + (x_3 − x_4)(y_1 + y_2 + y_3) + · · ·
            + (x_{n−1} − x_n)(y_1 + · · · + y_{n−1}) + x_n(y_1 + · · · + y_n)    (356)

2.60 The dual cone of positive semidefinite matrices S^{N*}_+ = S^N_+ remains in S^N by convention, whereas the ordinary dual cone would venture into R^{N×N}.


[Figure 47: Simplicial cones. (a) Monotone nonnegative cone K_M+ and its dual K*_M+ (drawn truncated) in R^2. (b) Monotone nonnegative cone and boundary of its dual (both drawn truncated) in R^3. Extreme directions of K*_M+ are indicated. The three-dimensional panel is annotated with X = [1 1 1; 0 1 1; 0 0 1] and dual extreme directions X†^T(:,1) = [1; −1; 0], X†^T(:,2) = [0; 1; −1], X†^T(:,3) = [0; 0; 1].]


Because x_i − x_{i+1} ≥ 0 ∀i by assumption whenever x ∈ K_M+, we can employ dual generalized inequalities (271) with respect to the self-dual nonnegative orthant R^n_+ to find the halfspace-description of the dual cone K*_M+. We can say x^T y ≥ 0 for all X†x ≽ 0 [sic] if and only if

    y_1 ≥ 0, y_1 + y_2 ≥ 0, ... , y_1 + y_2 + · · · + y_n ≥ 0    (357)

id est,

    x^T y ≥ 0 ∀X†x ≽ 0 ⇔ X^T y ≽ 0    (358)

where

    X = [ e_1  e_1+e_2  e_1+e_2+e_3  · · ·  1 ] ∈ R^{n×n}    (359)

Because X†x ≽ 0 connotes membership of x to pointed K_M+, then by (252) the dual cone we seek comprises all y for which (358) holds; thus its halfspace-description

    K*_M+ = {y ≽_{K*_M+} 0} = {y | Σ_{i=1}^k y_i ≥ 0, k = 1... n} = {y | X^T y ≽ 0} ⊂ R^n    (360)

The monotone nonnegative cone and its dual are simplicial, illustrated for two Euclidean spaces in Figure 47.

From 2.13.6.1, the extreme directions of proper K_M+ are respectively orthogonal to the facets of K*_M+. Because K*_M+ is simplicial, the inward-normals to its facets constitute the linearly independent rows of X^T by (360). Hence the vertex-description for K_M+ employs the columns of X in agreement with Cone Table S because X† = X^{−1}. Likewise, the extreme directions of proper K*_M+ are respectively orthogonal to the facets of K_M+ whose inward-normals are contained in the rows of X† by (354). So the vertex-description for K*_M+ employs the columns of X†^T.
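A small numeric sketch of this example for n = 4 (test vectors arbitrary): build X of (359), confirm X† = X^{−1} has the difference rows of (354)-(355), and test membership in both cones.

    % Monotone nonnegative cone K_M+ and its dual, n = 4 (354)-(360).
    n = 4;
    X = triu(ones(n));            % columns e1, e1+e2, ..., 1 of (359)
    Xdag = inv(X);                % X† = X^{-1}: rows (e_i - e_{i+1})', e_n' (354)
    x = [5; 3; 3; 1];             % sorted nonincreasing, nonnegative: x ∈ K_M+
    disp(Xdag*x)                  % X†x ≽ 0 (354)
    y = [1; -0.5; 0.2; -0.3];     % candidate for the dual cone
    disp(cumsum(y))               % partial sums ≥ 0 ⇔ y ∈ K*_M+ (360)
    disp(X'*y)                    % same test via halfspace-description X^T y ≽ 0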


[Figure 48: Monotone cone K_M and its dual K*_M (drawn truncated) in R^2.]

2.13.9.4.2 Example. Monotone cone.
(Figure 48, Figure 49) Of nonempty interior but not pointed, the monotone cone is polyhedral and defined by the halfspace-description

    K_M ≜ {x ∈ R^n | x_1 ≥ x_2 ≥ · · · ≥ x_n} = {x ∈ R^n | X*^T x ≽ 0}    (361)

Its dual is therefore pointed but of empty interior, having vertex-description

    K*_M = { X*b ≜ [ e_1−e_2  e_2−e_3  · · ·  e_{n−1}−e_n ]b | b ≽ 0 } ⊂ R^n    (362)

where the columns of X* comprise the extreme directions of K*_M. Because K*_M is pointed and satisfies

    rank(X* ∈ R^{n×N}) = N ≜ dim aff K* ≤ n    (363)

where N = n−1, and because K_M is closed and convex, we may adapt Cone Table 1 as follows:

    Cone Table 1*            K*                K** = K
    vertex-description       X*                X*†^T, ±X*⊥
    halfspace-description    X*†, X*⊥^T        X*^T

The vertex-description for K_M is therefore

    K_M = { [X*†^T  X*⊥  −X*⊥]a | a ≽ 0 } ⊂ R^n    (364)

where X*⊥ = 1 and

                ⎡ n−1   −1    −1   · · ·  −1      −1     ⎤
                ⎢ n−2   n−2   −2   · · ·  −2      −2     ⎥
    X*† = (1/n) ⎢ n−3   n−3   n−3   ⋱     ⋮       −3     ⎥ ∈ R^{n−1×n}    (365)
                ⎢  ⋮     ⋮          ⋱     ⋱       ⋮      ⎥
                ⎢  2     2    · · ·  2   −(n−2)  −(n−2)  ⎥
                ⎣  1     1    · · ·  1    1      −(n−1)  ⎦

id est, row k of nX*† holds n−k in its first k entries and −k thereafter.


while

    K*_M = {y ∈ R^n | X*†y ≽ 0, X*⊥^T y = 0}    (366)

is the dual monotone cone halfspace-description.

[Figure 49: Two views of monotone cone K_M and its dual K*_M (drawn truncated) in R^3. Monotone cone is not pointed. Dual monotone cone has empty interior. Cartesian coordinate axes are drawn for reference.]
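A numeric sketch of this example for n = 4 (test vectors arbitrary): generate X* of (362), recover X*† by pseudoinverse, and exercise descriptions (361), (362), and (366).

    % Monotone cone K_M and its dual K*_M, n = 4 (361)-(366).
    n = 4;  E = eye(n);
    Xs = E(:,1:n-1) - E(:,2:n);     % X* = [e1-e2  e2-e3  e3-e4] (362)
    Xsdag = pinv(Xs);               % X*† of (365), computed numerically
    disp(n*Xsdag)                   % integer pattern of (365)
    x = [4; 2; 2; -1];              % nonincreasing entries: x ∈ K_M
    disp(Xs'*x)                     % X*^T x ≽ 0 confirms membership (361)
    y = Xs*[1; 0; 2];               % y ∈ K*_M by vertex-description (362)
    disp([Xsdag*y; ones(1,n)*y])    % X*†y ≽ 0 and 1^T y = 0 (366)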


2.13.9.5 More pointed cone descriptions with equality condition

Consider pointed polyhedral cone K having a linearly independent set of generators and whose subspace membership is explicit; id est, we are given the ordinary halfspace-description

    K = {x | Ax ≽ 0, Cx = 0} ⊆ R^n    (240a)

where A ∈ R^{m×n} and C ∈ R^{p×n}. This can be equivalently written in terms of nullspace of C and vector ξ :

    K = {Zξ ∈ R^n | AZξ ≽ 0}    (367)

where R(Z ∈ R^{n×n−rank C}) ≜ N(C). Assuming (337) is satisfied

    rank X̂ ≜ rank((ÂZ)† ∈ R^{n−rank C×m−l}) = m − l = dim aff K ≤ n − rank C    (368)

where l is the number of conically dependent rows in AZ (2.10) that must be removed to make ÂZ before the cone tables become applicable.^2.61 Then the results collected in the cone tables admit the assignment X̂ ≜ (ÂZ)† ∈ R^{n−rank C×m−l}, where Â ∈ R^{m−l×n}, followed with linear transformation by Z. So we get the vertex-description, for (ÂZ)† skinny-or-square full-rank,

    K = {Z(ÂZ)†b | b ≽ 0}    (369)

From this and (299) we get a halfspace-description of the dual cone

    K* = {y ∈ R^n | (Z^T Â^T)†Z^T y ≽ 0}    (370)

From this and Cone Table 1 (p.177) we get a vertex-description, (1482)

    K* = { [Z†^T(ÂZ)^T  C^T  −C^T]c | c ≽ 0 }    (371)

Yet because

    K = {x | Ax ≽ 0} ∩ {x | Cx = 0}    (372)

then, by (264), we get an equivalent vertex-description for the dual cone

    K* = {x | Ax ≽ 0}* + {x | Cx = 0}*
       = { [A^T  C^T  −C^T]b | b ≽ 0 }    (373)

from which the conically dependent columns may, of course, be removed.

2.61 When the conically dependent rows are removed, the rows remaining must be linearly independent for the cone tables to apply.
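A minimal MATLAB sketch of (367)-(373), using an arbitrary example chosen so that no rows of AZ are conically dependent (l = 0, hence Â = A):

    % Pointed cone with equality condition: K = {x | Ax >= 0, Cx = 0} (240a).
    A = [1 0 0; 0 1 0];              % m = 2 inequality rows (example)
    C = [1 1 1];                     % p = 1 equality row (example)
    Z = null(C);                     % R(Z) = N(C) (367)
    AZ = A*Z;                        % rows conically independent here (l = 0)
    V = Z*pinv(AZ);                  % generators of K (369): K = {V*b | b >= 0}
    x = V*[1; 2];                    % some member of K
    disp([A*x; C*x])                 % Ax >= 0 and Cx = 0, as required
    w = [A', C', -C']*[1; 1; 0.5; 0];% member of K* by (373)
    disp(w'*x)                       % nonnegative inner product, as expected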


2.13.10 Dual cone-translate

First-order optimality condition (292) inspires a dual-cone variant: For any set K, the negative dual of its translation by any a ∈ R^n is

    −(K − a)* ≜ {y ∈ R^n | ⟨y, x − a⟩ ≤ 0 for all x ∈ K}
             = {y ∈ R^n | ⟨y, x⟩ ≤ 0 for all x ∈ K − a}    (374)

a closed convex cone called the normal cone to K at point a. (E.10.3.2.1) From this, a new membership relation like (267) for closed convex cone K :

    y ∈ −(K − a)* ⇔ ⟨y, x − a⟩ ≤ 0 for all x ∈ K    (375)

2.13.10.1 first-order optimality condition - restatement

The general first-order necessary and sufficient condition for optimality of solution x⋆ to a minimization problem with real differentiable convex objective function f̂(x) : R^n → R over convex feasible set C is [201, 3] (confer (292))

    −∇f̂(x⋆) ∈ −(C − x⋆)*, x⋆ ∈ C    (376)

id est, the negative gradient (3.1.1.4) belongs to the normal cone at x⋆ as in Figure 50.


[Figure 50: Shown is a plausible contour plot in R^2 of some arbitrary differentiable real convex function f̂(x) at selected levels α, β, and γ with α ≥ β ≥ γ ; id est, contours of equal level f̂ (level sets) drawn (dashed) in function's domain. Function is minimized over convex set C at point x⋆ iff negative gradient −∇f̂(x⋆) belongs to normal cone to C there. In circumstance depicted, normal cone is a ray whose direction is coincident with negative gradient. From results in 3.1.1.5 (p.205), ∇f̂(x⋆) is normal to the γ-sublevel set by Definition E.9.0.0.3.]


2.13.10.1.1 Example. Normal cone to orthant.
Consider proper cone K = R^n_+, the self-dual nonnegative orthant in R^n. The normal cone to R^n_+ at a ∈ K is (1691)

    K⊥_{R^n_+}(a ∈ R^n_+) = −(R^n_+ − a)* = −R^n_+ ∩ a⊥, a ∈ R^n_+    (377)

where −R^n_+ = −K* is the algebraic complement of R^n_+, and a⊥ is the orthogonal complement of point a. This means: When point a is interior to R^n_+, the normal cone is the origin. If n_p represents the number of nonzero entries in point a ∈ ∂R^n_+, then dim(−R^n_+ ∩ a⊥) = n − n_p and there is a complementary relationship between the nonzero entries in point a and the nonzero entries in any vector x ∈ −R^n_+ ∩ a⊥.

2.13.10.1.2 Example. Optimality conditions for conic problem.
Consider a convex optimization problem having real differentiable convex objective function f̂(x) : R^n → R defined on domain R^n ;

    minimize_x  f̂(x)
    subject to  x ∈ K    (378)

The feasible set is a pointed polyhedral cone K possessing a linearly independent set of generators and whose subspace membership is made explicit by fat full-rank matrix C ∈ R^{p×n} ; id est, we are given the halfspace-description

    K = {x | Ax ≽ 0, Cx = 0} ⊆ R^n    (240a)

where A ∈ R^{m×n}. The vertex-description of this cone, assuming (ÂZ)† skinny-or-square full-rank, is

    K = {Z(ÂZ)†b | b ≽ 0}    (369)

where Â ∈ R^{m−l×n}, l is the number of conically dependent rows in AZ (2.10) that must be removed, and Z ∈ R^{n×n−rank C} holds basis N(C) columnar.


From optimality condition (292),

    ∇f̂(x⋆)^T (Z(ÂZ)†b − x⋆) ≥ 0 ∀b ≽ 0    (379)
    −∇f̂(x⋆)^T Z(ÂZ)†(b − b⋆) ≤ 0 ∀b ≽ 0    (380)

because

    x⋆ ≜ Z(ÂZ)†b⋆ ∈ K    (381)

From membership relation (375) and Example 2.13.10.1.1

    ⟨−(Z^T Â^T)†Z^T ∇f̂(x⋆), b − b⋆⟩ ≤ 0 for all b ∈ R^{m−l}_+
    ⇔
    −(Z^T Â^T)†Z^T ∇f̂(x⋆) ∈ −R^{m−l}_+ ∩ b⋆⊥    (382)

Then the equivalent necessary and sufficient conditions for optimality of the conic program (378) with pointed polyhedral feasible set K are: (confer (298))

    (Z^T Â^T)†Z^T ∇f̂(x⋆) ≽_{R^{m−l}_+} 0,   b⋆ ≽_{R^{m−l}_+} 0,   ∇f̂(x⋆)^T Z(ÂZ)†b⋆ = 0    (383)

When K = R^n_+, in particular, then C = 0, A = Z = I ∈ S^n ; id est,

    minimize_x  f̂(x)
    subject to  x ≽_{R^n_+} 0    (384)

The necessary and sufficient conditions become (confer [41, 4.2.3])

    ∇f̂(x⋆) ≽_{R^n_+} 0,   x⋆ ≽_{R^n_+} 0,   ∇f̂(x⋆)^T x⋆ = 0    (385)
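Conditions (385) are easy to exercise numerically. A minimal sketch, assuming the particular objective f̂(x) = ½‖x − c‖², whose minimizer over the nonnegative orthant is x⋆ = max(c, 0) entrywise:

    % Optimality over the nonnegative orthant (385), f(x) = 0.5*||x - c||^2.
    c     = [3; -2; 0; -5];
    xstar = max(c, 0);              % projection of c onto R^n_+
    g     = xstar - c;              % gradient of f at x*
    assert(all(g >= 0))             % dual feasibility: grad f(x*) ≽ 0
    assert(all(xstar >= 0))         % primal feasibility: x* ≽ 0
    assert(g'*xstar == 0)           % complementary slackness (385)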


2.13.10.1.3 Example. Linear complementarity. [179] [206]
Given matrix A ∈ R^{n×n} and vector q ∈ R^n, the complementarity problem is a feasibility problem:

    find  w, z
    subject to  w ≽ 0
                z ≽ 0
                w^T z = 0
                w = q + Az    (386)

Volumes have been written about this problem, most notably by Cottle [52]. The problem is not convex if both vectors w and z are variable. But if one of them is fixed, then the problem becomes convex with a very simple geometric interpretation: Define the affine subset

    𝒜 ≜ {y ∈ R^n | Ay = w − q}    (387)

For w^T z to vanish, there must be a complementary relationship between the nonzero entries of vectors w and z ; id est, w_i z_i = 0 ∀i. Given w ≽ 0, then z belongs to the convex set of feasible solutions:

    z ∈ −K⊥_{R^n_+}(w ∈ R^n_+) ∩ 𝒜 = R^n_+ ∩ w⊥ ∩ 𝒜    (388)

where K⊥_{R^n_+}(w) is the normal cone to R^n_+ at w (377). If this intersection is nonempty, then the problem is solvable.
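The feasibility conditions (386) are straightforward to verify for a candidate pair; a toy sketch (all data hypothetical), constructing q so that a chosen complementary pair (w, z) is feasible:

    % Linear complementarity (386): verify a candidate pair (w, z).
    A = [2 1; 1 3];                 % example data
    z = [1; 0];  w = [0; 2];        % complementary: w_i z_i = 0 for all i
    q = w - A*z;                    % choose q so that w = q + Az holds
    assert(all(w >= 0) && all(z >= 0))
    assert(w'*z == 0)               % z ∈ R^n_+ ∩ w⊥ , as in (388)
    assert(norm(w - (q + A*z)) == 0)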


2.13.11 Proper nonsimplicial K, dual, X fat full-rank

Assume we are given a set of N conically independent generators^2.62 (2.10) of an arbitrary polyhedral proper cone K in R^n arranged columnar in X ∈ R^{n×N} such that N > n (fat) and rank X = n. Having found formula (345) to determine the dual of a simplicial cone, the easiest way to find a vertex-description of the proper dual cone K* is to first decompose K into simplicial parts K_i so that K = ∪ K_i.^2.63 Each component simplicial cone in K corresponds to some subset of n linearly independent columns from X. The key idea, here, is how the extreme directions of the simplicial parts must remain extreme directions of K. Finding the dual of K amounts to finding the dual of each simplicial part:

2.13.11.0.1 Theorem. Dual cone intersection. [218, 2.7]
Suppose proper cone K ⊂ R^n equals the union of M simplicial cones K_i whose extreme directions all coincide with those of K. Then proper dual cone K* is the intersection of M dual simplicial cones K*_i ; id est,

    K = ∪_{i=1}^M K_i ⇒ K* = ∩_{i=1}^M K*_i    (389)

Proof. For X_i ∈ R^{n×n}, a complete matrix of linearly independent extreme directions (p.137) arranged columnar, corresponding simplicial K_i (2.12.3.1.1) has vertex-description

    K_i = {X_i c | c ≽ 0}    (390)

Now suppose,

    K = ∪_{i=1}^M K_i = ∪_{i=1}^M {X_i c | c ≽ 0}    (391)

The union of all K_i can be equivalently expressed

    { [ X_1 X_2 · · · X_M ] [a; b; ... ; c] | a, b, ... , c ≽ 0 }    (392)

Because extreme directions of the simplices K_i are extreme directions of K by assumption, then by the extremes theorem (2.8.1.1.1),

    K = { [ X_1 X_2 · · · X_M ]d | d ≽ 0 }    (393)

Defining X ≜ [X_1 X_2 · · · X_M] (with any redundant [sic] columns optionally removed from X), then K* can be expressed, (299) (Cone Table S, p.177)

    K* = {y | X^T y ≽ 0} = ∩_{i=1}^M {y | X_i^T y ≽ 0} = ∩_{i=1}^M K*_i    (394)
    ⋄

2.62 We can always remove conically dependent columns from X to construct K or to determine K*. (G.2)
2.63 That proposition presupposes, of course, that we know how to perform simplicial decomposition efficiently; also called "triangulation". [198] [105, 3.1] [106, 3.1] Existence of multiple simplicial parts means expansion of x ∈ K like (336) can no longer be unique because N, the number of extreme directions in K, exceeds n, the dimension of the space.


To find the extreme directions of the dual cone, first we observe that some facets of each simplicial part K_i are common to facets of K by assumption, and the union of all those common facets comprises the set of all facets of K by design. For any particular polyhedral proper cone K, the extreme directions of dual cone K* are respectively orthogonal to the facets of K. (2.13.6.1) Then the extreme directions of the dual cone can be found among the inward-normals to facets of the component simplicial cones K_i ; those normals are extreme directions of the dual simplicial cones K*_i. From the theorem and Cone Table S (p.177),

    K* = ∩_{i=1}^M K*_i = ∩_{i=1}^M {X_i†^T c | c ≽ 0}    (395)

The set of extreme directions {Γ*_i} for proper dual cone K* is therefore constituted by the conically independent generators, from the columns of all the dual simplicial matrices {X_i†^T}, that do not violate discrete definition (299) of K* ;

    {Γ*_1, Γ*_2 ... Γ*_N} = c.i.{ X_i†^T(:,j), i = 1... M, j = 1... n | X_i†(j,:)Γ_l ≥ 0, l = 1... N }    (396)

where c.i. denotes selection of only the conically independent vectors from the argument set, argument (:,j) denotes the j th column while (j,:) denotes the j th row, and {Γ_l} constitutes the extreme directions of K. Figure 36(b) (p.135) shows a cone and its dual found via this formulation.

2.13.11.0.2 Example. Dual of K nonsimplicial in subspace aff K.
Given conically independent generators for pointed closed convex cone K in R^4 arranged columnar in

                              ⎡  1  1  0  0 ⎤
    X = [ Γ_1 Γ_2 Γ_3 Γ_4 ] = ⎢ −1  0  1  0 ⎥    (397)
                              ⎢  0 −1  0  1 ⎥
                              ⎣  0  0 −1 −1 ⎦


having dim aff K = rank X = 3, then performing the most inefficient simplicial decomposition in aff K we find

          ⎡  1  1  0 ⎤          ⎡  1  1  0 ⎤
    X_1 = ⎢ −1  0  1 ⎥ ,  X_2 = ⎢ −1  0  0 ⎥
          ⎢  0 −1  0 ⎥          ⎢  0 −1  1 ⎥
          ⎣  0  0 −1 ⎦          ⎣  0  0 −1 ⎦

          ⎡  1  0  0 ⎤          ⎡  1  0  0 ⎤
    X_3 = ⎢ −1  1  0 ⎥ ,  X_4 = ⎢  0  1  0 ⎥    (398)
          ⎢  0  0  1 ⎥          ⎢ −1  0  1 ⎥
          ⎣  0 −1 −1 ⎦          ⎣  0 −1 −1 ⎦

The corresponding dual simplicial cones in aff K have generators respectively columnar in

              ⎡  2  1  1 ⎤              ⎡  1  2  1 ⎤
    4X_1†^T = ⎢ −2  1  1 ⎥ ,  4X_2†^T = ⎢ −3  2  1 ⎥
              ⎢  2 −3  1 ⎥              ⎢  1 −2  1 ⎥
              ⎣ −2  1 −3 ⎦              ⎣  1 −2 −3 ⎦

              ⎡  3  2 −1 ⎤              ⎡  3 −1  2 ⎤
    4X_3†^T = ⎢ −1  2 −1 ⎥ ,  4X_4†^T = ⎢ −1  3 −2 ⎥    (399)
              ⎢ −1 −2  3 ⎥              ⎢ −1 −1  2 ⎥
              ⎣ −1 −2 −1 ⎦              ⎣ −1 −1 −2 ⎦

Applying (396) we get

                                    ⎡  1  2  3  2 ⎤
    [ Γ*_1 Γ*_2 Γ*_3 Γ*_4 ] = (1/4) ⎢  1  2 −1 −2 ⎥    (400)
                                    ⎢  1 −2 −1  2 ⎥
                                    ⎣ −3 −2 −1 −2 ⎦

whose rank is 3, and is the known result;^2.64 the conically independent generators for that pointed section of the dual cone K* in aff K ; id est, K* ∩ aff K.

2.64 These calculations proceed so as to be consistent with [67, 6]; as if the ambient vector space were the proper subspace aff K whose dimension is 3. In that ambient space, K may be regarded as a proper cone. Yet that author (from the citation) erroneously states the dimension of the ordinary dual cone to be 3 ; it is, in fact, 4.
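This example can be verified mechanically; a MATLAB sketch that reconstructs each X_i† by pseudoinverse, then keeps only those dual-simplex generators obeying discrete definition (299) against every extreme direction of K :

    % Verify Example 2.13.11.0.2: dual generators via (395)-(396), (400).
    X  = [1 1 0 0; -1 0 1 0; 0 -1 0 1; 0 0 -1 -1];     % (397)
    idx = {[1 2 3], [1 2 4], [1 3 4], [2 3 4]};        % simplicial parts (398)
    G = [];
    for i = 1:4
        Di = pinv(X(:,idx{i}))';          % columns: candidate dual generators
        keep = all(Di'*X >= -1e-12, 2);   % retain columns with Γ*'Γ_l ≥ 0 ∀l
        G = [G, Di(:,keep)];
    end
    disp(4*G)   % columns duplicate those of (400), up to repetition and order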


Chapter 3

Geometry of convex functions

    The link between convex sets and convex functions is via the epigraph: A function is convex if and only if its epigraph is a convex set.
    −Stephen Boyd & Lieven Vandenberghe [41, 3.1.7]

We limit our treatment of multidimensional functions^3.1 to finite-dimensional Euclidean space. Then the icon for the one-dimensional (real) convex function is bowl-shaped (Figure 55), whereas the concave icon is the inverted bowl; respectively characterized by a unique global minimum and maximum whose existence is assumed. Because of this simple relationship, usage of the term convexity is often implicitly inclusive of concavity in the literature. Despite the iconic imagery, the reader is reminded that the set of all convex, concave, quasiconvex, and quasiconcave functions contains the monotone [132] [140, 2.3.5] functions; e.g., [41, 3.6, exer.3.46].

3.1 vector- or matrix-valued functions including the real functions. Appendix D, with its tables of first- and second-order gradients, is the practical adjunct to this chapter.


3.1 Convex function

3.1.1 Vector-valued function

Vector-valued function f(X) : R^{p×k} → R^M assigns each X in its domain dom f (a subset of ambient vector space R^{p×k}) to a specific element [168, p.3] of its range (a subset of R^M). Function f(X) is linear in X on its domain if and only if, for each and every Y, Z ∈ dom f and α, β ∈ R

    f(αY + βZ) = αf(Y) + βf(Z)    (401)

A vector-valued function f(X) : R^{p×k} → R^M is convex in X if and only if dom f is a convex set and, for each and every Y, Z ∈ dom f and 0 ≤ µ ≤ 1

    f(µY + (1 − µ)Z) ≼_{R^M_+} µf(Y) + (1 − µ)f(Z)    (402)

As defined, continuity is implied but not differentiability. Reversing the sense of the inequality flips this definition to concavity. Linear functions are, apparently, simultaneously convex and concave.

Vector-valued functions are most often compared (146) as in (402) with respect to the M-dimensional self-dual nonnegative orthant R^M_+, a proper cone.^3.2 In this case, the test prescribed by (402) is simply a comparison on R of each entry of the vector-valued function. (2.13.4.2.3) The vector-valued function case is therefore a straightforward generalization of conventional convexity theory for a real function. [41, 3, 4] This conclusion follows from theory of generalized inequality (2.13.2.0.1) which asserts

    f convex ⇔ w^T f convex ∀w ∈ G(R^M_+)    (403)

shown by substitution of the defining inequality (402). Discretization (2.13.4.2.1) allows relaxation of the semi-infinite number of conditions ∀w ≽ 0 to:

    ∀w ∈ G(R^M_+) = {e_i, i = 1... M}    (404)

(the standard basis for R^M and a minimal set of generators (2.8.1.2) for R^M_+) from which the stated conclusion follows; id est, the test for convexity of a vector-valued function is a comparison on R of each entry.

3.2 The definition of convexity can be broadened to other (not necessarily proper) cones; referred to in the literature as K-convexity. [193]


[Figure 51: Each convex real function has a unique minimizer x⋆ but, for x ∈ R, (a) f̂_1(x) = x² is strictly convex whereas (b) f̂_2(x) = √(x²) = |x| is not. Strict convexity of a real function is therefore only a sufficient condition for minimizer uniqueness.]

Relation (403) implies the set of all vector-valued convex functions in R^M is a convex cone. (Proof pending.) Indeed, any nonnegatively weighted sum of (strictly) convex functions remains (strictly) convex. Interior to the cone would be the strictly convex functions.

3.1.1.1 Strict convexity

When f(X) instead satisfies, for each and every distinct Y and Z in dom f and all 0 < µ < 1,

    f(µY + (1 − µ)Z) ≺_{R^M_+} µf(Y) + (1 − µ)f(Z)    (405)

then we say f is a strictly convex function.


Any convex real function f̂(X) has unique minimum value over any convex subset of its domain. Yet solution to some convex optimization problem is, in general, not unique; e.g., given a minimization of a convex real function over some abstracted convex set C

    minimize_X  f̂(X)
    subject to  X ∈ C    (407)

any optimal solution X⋆ comes from a convex set of optimal solutions

    X⋆ ∈ {X | f̂(X) = inf_{Y ∈ C} f̂(Y)}    (408)

But a strictly convex real function has a unique minimizer X⋆ ; id est, for the optimal solution set in (408) to be a single point, it is sufficient (Figure 51) that f̂(X) be a strictly convex real function and set C convex.

It is customary to consider only a real function for the objective of a convex optimization problem because vector- or matrix-valued functions can introduce ambiguity into the optimal value of the objective. (2.7.2.2)

Quadratic real functions x^T Px + q^T x + r characterized by a symmetric positive definite matrix P are strictly convex. The vector 2-norm squared ‖x‖² (Euclidean norm squared) and Frobenius norm squared ‖X‖²_F, for example, are strictly convex functions of their respective argument (each norm is convex but not strictly convex). Figure 51(a) illustrates a strictly convex real function.

3.1.1.2 Affine function

A function f(X) is affine when it is continuous and has the dimensionally extensible form (confer 2.9.1.0.2)

    f(X) = AX + B    (409)

When B = 0 then f(X) is a linear function. All affine functions are simultaneously convex and concave.

The real affine function in Figure 52 illustrates hyperplanes in its domain constituting contours of equal function-value (level sets).

Variegated multidimensional affine functions are recognized by the existence of no multivariate terms in argument entries and no polynomial terms in argument entries of degree higher than 1 ; id est, entries of the function are characterized only by linear combinations of the argument entries plus constants.


For X ∈ S^M and matrices A, B, Q, R of any compatible dimensions, for example, the expression XAX is not affine in X whereas

    g(X) = ⎡ R     B^T X          ⎤
           ⎣ XB    Q + A^T X + XA ⎦    (410)

is an affine multidimensional function. Such a function is typical in engineering control. [264, 2.2]^3.4 [40] [86]

3.4 The interpretation from this citation of {X ∈ S^M | g(X) ≽ 0} as "an intersection between a linear subspace and the cone of positive semidefinite matrices" is incorrect. (See 2.9.1.0.2 for a similar example.) The conditions they state under which strong duality holds for semidefinite programming are conservative. (confer 6.2.3.0.1)
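Affinity of (410) is easy to confirm numerically; a sketch with random symmetric arguments and arbitrary small dimensions:

    % Affinity check for g(X) of (410): g(mu*X+(1-mu)*Y) = mu*g(X)+(1-mu)*g(Y).
    M = 3;
    A = randn(M); B = randn(M); Q = randn(M); R = randn(M);
    g = @(X) [R, B'*X; X*B, Q + A'*X + X*A];
    X = randn(M); X = X + X';  Y = randn(M); Y = Y + Y';  % symmetric arguments
    mu = 0.3;
    disp(norm(g(mu*X + (1-mu)*Y) - (mu*g(X) + (1-mu)*g(Y)), 'fro'))  % ~0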


3.1.1.2.1 Example. Linear objective.
Consider minimization of a real affine function f̂(z) = a^T z + b over the convex feasible set C in its domain R² illustrated in Figure 52. Since vector b is fixed, the problem posed is the same as the convex optimization

    minimize_z  a^T z
    subject to  z ∈ C    (411)

whose objective of minimization is a real linear function. Were convex set C polyhedral (2.12), then this problem would be called a linear program. Were C a positive semidefinite cone, then this problem would be called a semidefinite program.

There are two distinct ways to visualize this problem: one in the objective function's domain R², the other including the ambient space R² × R of the objective function's range. Both visualizations are illustrated in Figure 52. Visualization in the function domain is easier because of lower dimension and because level sets of any affine function are affine (2.1.9). In this circumstance, the level sets are parallel hyperplanes with respect to R². One solves optimization problem (411) graphically by finding that hyperplane intersecting feasible set C furthest right (in the direction of negative gradient −a (3.1.1.4)).

When a differentiable convex objective function f̂ is nonlinear, the negative gradient −∇f̂ is a viable search direction (replacing −a in (411)). (2.13.10.1, Figure 50) Then the nonlinear objective function can be replaced with a dynamic linear objective; linear as in (411).

3.1.1.2.2 Example. Support function. [41, 3.2]
For arbitrary set Y ⊆ R^n, its support function σ_Y(a) : R^n → R is defined

    σ_Y(a) ≜ sup_{z ∈ Y} a^T z    (412)

whose range contains ±∞. [162, p.135] [129, C.2.3.1] For each z ∈ Y, a^T z is a linear function of vector a. Because σ_Y(a) is the pointwise supremum of linear functions, it is convex in a. (Figure 53) Application of the support function is illustrated in Figure 19(a) for one particular normal a.
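For a finite set Y the supremum in (412) is a maximum over the list; a sketch with arbitrary points:

    % Support function (412) of a finite set Y = {z_1,...,z_5} in R^2.
    Z = [0 1 2 -1 0.5; 0 0 1 2 -1];        % points z_i columnar (example)
    sigma = @(a) max(a'*Z);                % sigma_Y(a) = max_i a'*z_i
    a1 = [1; 0];  a2 = [-1; 1];
    disp([sigma(a1), sigma(a2)])
    % convexity in a, checked at a midpoint (pointwise max of linear functions):
    mu = 0.5;  am = mu*a1 + (1-mu)*a2;
    disp(sigma(am) <= mu*sigma(a1) + (1-mu)*sigma(a2))   % true (1)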


[Figure 53: Pointwise supremum of convex functions remains a convex function. Illustrated is a supremum of affine functions a^T z_i + b_i in variable a evaluated for a particular argument a_p. Topmost affine function is supremum for each value of a.]


3.1.1.3 Epigraph, sublevel set

It is well established that a continuous real function is convex if and only if its epigraph makes a convex set. [129] [202] [239] [250] [162] Thereby, piecewise-continuous convex functions are admitted. Epigraph is the connection between convex sets and convex functions. Its generalization to a vector-valued function f(X) : R^{p×k} → R^M is straightforward: [193]

    epi f ≜ {(X, t) | X ∈ dom f, f(X) ≼_{R^M_+} t} ⊆ R^{p×k} × R^M    (413)

id est,

    f convex ⇔ epi f convex    (414)

Necessity is proven: [41, exer.3.60] Given any (X, u), (Y, v) ∈ epi f, we must show for all µ ∈ [0, 1] that µ(X, u) + (1−µ)(Y, v) ∈ epi f ; id est, we must show

    f(µX + (1−µ)Y) ≼_{R^M_+} µu + (1−µ)v    (415)

Yet this holds by definition because f(µX + (1−µ)Y) ≼ µf(X) + (1−µ)f(Y). The converse also holds.

Sublevel sets of a real convex function are convex. Likewise, sublevel sets

    L_ν f ≜ {X ∈ dom f | f(X) ≼_{R^M_+} ν} ⊆ R^{p×k}    (416)

of a vector-valued convex function are convex. As for real functions, the converse does not hold.

To prove necessity of convex sublevel sets: For any X, Y ∈ L_ν f we must show for each and every µ ∈ [0, 1] that µX + (1−µ)Y ∈ L_ν f. By definition,

    f(µX + (1−µ)Y) ≼_{R^M_+} µf(X) + (1−µ)f(Y) ≼_{R^M_+} ν    (417)


[Figure 54: Gradient in R², evaluated on a grid over some open disc in the domain of convex quadratic bowl f̂(Y) = Y^T Y : R² → R illustrated in Figure 55. Circular contours are level sets; each defined by a constant function-value.]

When an epigraph is artificially bounded above, then the corresponding sublevel set can be regarded as an orthogonal projection of the epigraph on the function domain.

3.1.1.4 Gradient

The gradient ∇f (D.1) of any differentiable multidimensional function f maps each entry f_i to a space having the same dimension as the ambient space of its domain. The gradient can be interpreted as a vector pointing in the direction of greatest change. [142, 15.6] The gradient can also be interpreted as that vector normal to a level set; e.g., Figure 56, Figure 50. For the quadratic bowl in Figure 55, the gradient maps to R² ; illustrated in Figure 54. For a one-dimensional function of real variable, for example, the gradient evaluated at any point in the function domain is just the slope (or derivative) of that function there. (confer D.1.4.1)


3.1.1.4.1 Example. Hyperplane, line, described by affine function.
Consider the real affine function of vector variable,

    f̂(x) : R^p → R = a^T x + b    (418)

whose domain is R^p and whose gradient ∇f̂(x) = a is a constant vector (independent of x). This function describes the real line R (its range), and it describes a nonvertical [129, B.1.2] hyperplane ∂H in the space R^p × R for any particular vector a (confer 2.4.2);

    ∂H = { [x; a^T x + b] | x ∈ R^p } ⊂ R^p × R    (419)

having nonzero normal

    η = [a; −1] ∈ R^p × R    (420)

This equivalence to a hyperplane holds only for real functions.^3.5 The epigraph of the real affine function f̂(x) is therefore a halfspace in R^p × R, so we have:

    The real affine function is to convex functions
    as
    the hyperplane is to convex sets.

3.5 To prove that, consider a vector-valued affine function

    f(x) : R^p → R^M = Ax + b

having gradient ∇f(x) = A^T ∈ R^{p×M} : The affine set

    { [x; Ax + b] | x ∈ R^p } ⊂ R^p × R^M

is perpendicular to

    η ≜ [∇f(x); −I] ∈ R^{p×M} × R^{M×M}

because

    η^T ([x; Ax + b] − [0; b]) = 0 ∀x ∈ R^p

Yet η is a vector (in R^p × R^M) only when M = 1.


Similarly, the matrix-valued affine function of real variable x, for any particular matrix A ∈ R^{M×N},

    h(x) : R → R^{M×N} = Ax + B    (421)

describes a line in R^{M×N} in direction A

    {Ax + B | x ∈ R} ⊆ R^{M×N}    (422)

and describes a line in R × R^{M×N}

    { [x; Ax + B] | x ∈ R } ⊂ R × R^{M×N}    (423)

whose slope with respect to x is A.

3.1.1.5 First-order convexity condition, real function

Discretization of w ≽ 0 in (403) invites refocus to the real-valued function:

3.1.1.5.1 Theorem. Necessary and sufficient convexity condition.
[41, 3.1.3] [75, I.5.2] [267, 1.2.3] [29, 1.2] [218, 4.2] [201, 3] For real differentiable function f̂(X) : R^{p×k} → R with matrix argument on open convex domain, the condition (D.1.6)

    f̂(Y) ≥ f̂(X) + ⟨∇f̂(X), Y − X⟩ for each and every X, Y ∈ dom f̂    (424)

is necessary and sufficient for convexity of f̂. ⋄

When f̂(X) : R^p → R is a real differentiable convex function with vector argument on open convex domain, there is simplification of the first-order condition (424); for each and every X, Y ∈ dom f̂

    f̂(Y) ≥ f̂(X) + ∇f̂(X)^T (Y − X)    (425)

From this we can find a unique [250, 5.5.4] nonvertical [129, B.1.2] hyperplane ∂H− (2.4), expressed in terms of the function gradient,


[Figure 55: When a real function f̂ is differentiable at each point in its open domain, there is an intuitive geometric interpretation of function convexity in terms of its gradient ∇f̂ and its epigraph: Drawn is a convex quadratic bowl in R² × R (confer Figure 108, p.535); f̂(Y) = Y^T Y : R² → R versus Y on some open disc in R². Supporting hyperplane ∂H− ∈ R² × R (which is tangent, only partially drawn) and its normal vector [∇f̂(X)^T −1]^T at the particular point of support [X^T f̂(X)]^T are illustrated. The interpretation: At each and every coordinate Y, there is such a hyperplane containing [Y^T f̂(Y)]^T and supporting the epigraph.]


supporting epi f̂ at [X; f̂(X)] : videlicet, defining f̂(Y ∉ dom f̂) ≜ ∞ [41, 3.1.7]

    [Y; t] ∈ epi f̂ ⇔ t ≥ f̂(Y) ⇒ [∇f̂(X)^T −1] ([Y; t] − [X; f̂(X)]) ≤ 0    (426)

This means, for each and every point X in the domain of a real convex function f̂(X), there exists a hyperplane ∂H− in R^p × R having normal [∇f̂(X); −1] supporting the function epigraph at [X; f̂(X)] ∈ ∂H−

    ∂H− = { [Y; t] ∈ R^p × R | [∇f̂(X)^T −1] ([Y; t] − [X; f̂(X)]) = 0 }    (427)

One such supporting hyperplane (confer Figure 19(a)) is illustrated in Figure 55 for a convex quadratic.

From (425) we deduce, for each and every X, Y ∈ dom f̂

    ∇f̂(X)^T (Y − X) ≥ 0 ⇒ f̂(Y) ≥ f̂(X)    (428)

meaning, the gradient at X identifies a supporting hyperplane there in R^p

    {Y ∈ R^p | ∇f̂(X)^T (Y − X) = 0}    (429)

to the convex sublevel sets of convex function f̂ (confer (416))

    L_{f̂(X)} f̂ ≜ {Y ∈ dom f̂ | f̂(Y) ≤ f̂(X)} ⊆ R^p    (430)

illustrated for an arbitrary real convex function in Figure 56.
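The first-order condition (425) is simple to probe numerically; a sketch for one particular convex quadratic (data arbitrary):

    % First-order convexity condition (425) for f(x) = x'*P*x with P ≻ 0.
    P = [2 0.5; 0.5 1];                    % symmetric positive definite (example)
    f = @(x) x'*P*x;  grad = @(x) 2*P*x;
    X = [1; -1];
    for trial = 1:5
        Y = randn(2,1);
        assert(f(Y) >= f(X) + grad(X)'*(Y - X) - 1e-12)   % (425) holds
    end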


[Figure 56: Shown is a plausible contour plot in R² of some arbitrary real convex function f̂(Z) at selected levels α, β, and γ with α ≥ β ≥ γ ; contours of equal level f̂ (level sets) drawn in the function's domain. A convex function has convex sublevel sets L_{f̂(X)} f̂ (430). [202, 4.6] The sublevel set whose boundary is the level set at α, for instance, comprises all the shaded regions. For any particular convex function, the family comprising all its sublevel sets is nested. [129, p.75] Were the sublevel sets not convex, we may certainly conclude the corresponding function is not convex. Contour plots of real affine functions are illustrated in Figure 16 and Figure 52.]


3.1.1.5.2 Theorem. Gradient monotonicity. [129, B.4.1.4] [38, 3.1, exer.20] Given f̂(X) : R^{p×k} → R a real differentiable function with matrix argument on open convex domain, the condition

    ⟨∇f̂(Y) − ∇f̂(X), Y − X⟩ ≥ 0 for each and every X, Y ∈ dom f̂    (431)

is necessary and sufficient for convexity of f̂. Strict inequality and caveat distinct Y, X provide necessary and sufficient conditions for strict convexity. ⋄

3.1.1.6 First-order convexity condition, vector-valued function

Now consider the first-order necessary and sufficient condition for convexity of a vector-valued function: Differentiable function f(X) : R^{p×k} → R^M is convex if and only if dom f is open, convex, and for each and every X, Y ∈ dom f

    f(Y) ≽_{R^M_+} f(X) + df(X)^{→Y−X} = f(X) + d/dt|_{t=0} f(X + t(Y − X))    (432)

where df(X)^{→Y−X} is the directional derivative^3.6 [142] [221] of f at X in direction Y − X. This, of course, follows from the real-valued function case: by generalized inequality (2.13.2.0.1),

    f(Y) − f(X) − df(X)^{→Y−X} ≽_{R^M_+} 0 ⇔ ⟨f(Y) − f(X) − df(X)^{→Y−X}, w⟩ ≥ 0 ∀w ≽ 0    (433)

where

    df(X)^{→Y−X} = ⎡ tr(∇f_1(X)^T (Y − X)) ⎤
                   ⎢ tr(∇f_2(X)^T (Y − X)) ⎥ ∈ R^M    (434)
                   ⎢          ⋮            ⎥
                   ⎣ tr(∇f_M(X)^T (Y − X)) ⎦

3.6 We extend the traditional definition of directional derivative in D.1.4 so that direction may be indicated by a vector or a matrix, thereby broadening the scope of the Taylor series (D.1.6). The right-hand side of the inequality (432) is the first-order Taylor series expansion of f about X.


Necessary and sufficient discretization (2.13.4.2.1) allows relaxation of the semi-infinite number of conditions w ≽ 0 instead to w ∈ {e_i, i = 1... M}, the extreme directions of the nonnegative orthant. Each extreme direction picks out an entry from the vector-valued function and its directional derivative, satisfying Theorem 3.1.1.5.1.

The vector-valued function case (432) is therefore a straightforward application of the first-order convexity condition for real functions to each entry of the vector-valued function.

3.1.1.7 Second-order convexity condition, vector-valued function

Again, by discretization, we are obliged only to consider each individual entry f_i of a vector-valued function f. For f(X) : R^p → R^M, a twice differentiable vector-valued function with vector argument on open convex domain,

    ∇²f_i(X) ≽_{S^p_+} 0 ∀X ∈ dom f, i = 1... M    (435)

is a necessary and sufficient condition for convexity of f. Intuitively, this condition precludes points of inflection, as in Figure 57 on page 213.

Strict inequality is a sufficient condition for strict convexity, but that is nothing new; videlicet, the strictly convex real function f_i(x) = x⁴ does not have positive second derivative at each and every x ∈ R. Quadratic forms constitute a notable exception where the strict-case converse is reliably true.

3.1.1.8 second-order ⇒ first-order condition

For a twice-differentiable real function f_i(X) : R^p → R having open domain, a consequence of the mean value theorem from calculus allows compression of its complete Taylor series expansion about X ∈ dom f_i (D.1.6) to three terms: On some open interval of ‖Y‖, so that each and every line segment [X, Y] belongs to dom f_i, there exists an α ∈ [0, 1] such that [267, 1.2.3] [29, 1.1.4]

    f_i(Y) = f_i(X) + ∇f_i(X)^T (Y − X) + ½(Y − X)^T ∇²f_i(αX + (1 − α)Y)(Y − X)    (436)

The first-order condition for convexity (425) follows directly from this and the second-order condition (435).
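A numeric sketch of the second-order condition (435), checking positive semidefiniteness of the Hessian for one assumed example entry, f_i(x) = log(e^{x_1} + e^{x_2}), a standard convex function:

    % Second-order condition (435): Hessian of f(x) = log(sum(exp(x))) is PSD.
    x = randn(2,1);
    p = exp(x)/sum(exp(x));          % gradient of log-sum-exp
    H = diag(p) - p*p';              % its Hessian
    disp(min(eig(H)) >= -1e-12)      % eigenvalues nonnegative: true (1)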


3.1.2 Matrix-valued function

We need different tools for matrix argument: We are primarily interested in continuous matrix-valued functions g(X). We choose symmetric g(X) ∈ S^M because matrix-valued functions are most often compared (437) with respect to the positive semidefinite cone S^M_+ in the ambient space of symmetric matrices.^3.7

3.1.2.0.1 Definition. Convex matrix-valued function.

1) Matrix-definition.
A function g(X) : R^{p×k} → S^M is convex in X iff dom g is a convex set and, for each and every Y, Z ∈ dom g and all 0 ≤ µ ≤ 1 [140, 2.3.7]

    g(µY + (1 − µ)Z) ≼_{S^M_+} µg(Y) + (1 − µ)g(Z)    (437)

Reversing the sense of the inequality flips this definition to concavity. Strict convexity is defined less a stroke of the pen in (437) similarly to (405).

2) Scalar-definition.
It follows that g(X) : R^{p×k} → S^M is convex in X iff w^T g(X)w : R^{p×k} → R is convex in X for each and every ‖w‖ = 1 ; shown by substituting the defining inequality (437). By generalized inequality we have the equivalent but more broad criterion, (2.13.5)

    g convex ⇔ ⟨W, g⟩ convex ∀W ≽_{S^M_+} 0    (438)

Strict convexity on both sides requires caveat W ≠ 0. Because the set of all extreme directions for the positive semidefinite cone (2.9.2.4) comprises a minimal set of generators for that cone, discretization (2.13.4.2.1) allows replacement of matrix W with symmetric dyad ww^T as proposed. △

3.7 Function symmetry is not a necessary requirement for convexity; indeed, for A ∈ R^{m×p} and B ∈ R^{m×k}, g(X) = AX + B is a convex (affine) function in X on domain R^{p×k} with respect to the nonnegative orthant R^{m×k}_+. Symmetric convex functions share the same benefits as symmetric matrices. Horn & Johnson [131, 7.7] liken symmetric matrices to real numbers, and (symmetric) positive definite matrices to positive real numbers.


3.1.2.1 First-order convexity condition, matrix-valued function

From the scalar-definition we have, for differentiable matrix-valued function g and for each and every real vector w of unit norm ‖w‖ = 1,

    w^T g(Y)w ≥ w^T g(X)w + w^T (dg(X)^{→Y−X}) w    (439)

that follows immediately from the first-order condition (424) for convexity of a real function because

    w^T (dg(X)^{→Y−X}) w = ⟨∇_X w^T g(X)w, Y − X⟩    (440)

where dg(X)^{→Y−X} is the directional derivative (D.1.4) of function g at X in direction Y − X. By discretized generalized inequality, (2.13.5)

    g(Y) − g(X) − dg(X)^{→Y−X} ≽_{S^M_+} 0 ⇔ ⟨g(Y) − g(X) − dg(X)^{→Y−X}, ww^T⟩ ≥ 0 ∀ww^T (≽_{S^M_+} 0)    (441)

Hence, for each and every X, Y ∈ dom g (confer (432))

    g(Y) ≽_{S^M_+} g(X) + dg(X)^{→Y−X}    (442)

must therefore be necessary and sufficient for convexity of a matrix-valued function of matrix variable on open convex domain.

3.1.2.2 Epigraph of matrix-valued function, sublevel sets

We generalize the epigraph to a continuous matrix-valued function g(X) : R^{p×k} → S^M :

    epi g ≜ {(X, T) | X ∈ dom g, g(X) ≼_{S^M_+} T} ⊆ R^{p×k} × S^M    (443)

from which it follows

    g convex ⇔ epi g convex    (444)

Proof of necessity is similar to that in 3.1.1.3.


Sublevel sets of a matrix-valued convex function (confer (416))

    L_V g ≜ {X ∈ dom g | g(X) ≼_{S^M_+} V} ⊆ R^{p×k}    (445)

are convex. There is no converse.

3.1.2.3 Second-order convexity condition, matrix-valued function

3.1.2.3.1 Theorem. Line theorem. [41, 3.1.1]
Matrix-valued function g(X) : R^{p×k} → S^M is convex in X if and only if it remains convex on the intersection of any line with its domain. ⋄

Now we assume a twice differentiable function and drop the subscript S^M_+ from an inequality when apparent.

3.1.2.3.2 Definition. Differentiable convex matrix-valued function.
Matrix-valued function g(X) : R^{p×k} → S^M is convex in X iff dom g is an open convex set, and its second derivative g''(X + tY) : R → S^M is positive semidefinite on each point along every line X + tY that intersects dom g ; id est, iff for each and every X, Y ∈ R^{p×k} such that X + tY ∈ dom g over some open interval of t ∈ R

    d²/dt² g(X + tY) ≽ 0    (446)

Similarly, if

    d²/dt² g(X + tY) ≻ 0    (447)

then g is strictly convex; the converse is generally false. [41, 3.1.4]^3.8 △

3.8 Quadratic forms constitute a notable exception where the strict-case converse is reliably true.


3.1.2.3.3 Example. Matrix inverse.
The matrix-valued function X^µ is convex on int S^M_+ for −1 ≤ µ ≤ 0 or 1 ≤ µ ≤ 2, and concave for 0 ≤ µ ≤ 1. [41, 3.6.2] In particular, the function g(X) = X^{−1} is convex on int S^M_+. For each and every Y ∈ S^M (D.2.1, A.3.1.0.5)

    d²/dt² g(X + tY) = 2(X + tY)^{−1} Y (X + tY)^{−1} Y (X + tY)^{−1} ≽ 0    (448)

on some open interval of t ∈ R such that X + tY ≻ 0. Hence, g(X) is convex in X. This result is extensible;^3.9 tr X^{−1} is convex on that same domain. [131, 7.6, prob.2] [38, 3.1, exer.25]

3.1.2.3.4 Example. Matrix products.
Iconic real function f̂(x) = x² is strictly convex on R. The matrix-valued function g(X) = X² is convex on the domain of symmetric matrices; for X, Y ∈ S^M and any open interval of t ∈ R (D.2.1)

    d²/dt² g(X + tY) = d²/dt² (X + tY)² = 2Y²    (449)

which is positive semidefinite when Y is symmetric because then Y² = Y^T Y (1099).^3.10 A more appropriate matrix-valued counterpart for f̂ is g(X) = X^T X which is a convex function on domain X ∈ R^{m×n}, and strictly convex whenever X is skinny-or-square full-rank.

3.9 d/dt tr g(X + tY) = tr d/dt g(X + tY).
3.10 By (1114) in A.3.1, changing the domain instead to all symmetric and nonsymmetric positive semidefinite matrices, for example, will not produce a convex function.
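Convexity of g(X) = X^{−1} on int S^M_+ (Example 3.1.2.3.3) can be spot-checked at a midpoint, since (437) there reads ½X^{−1} + ½Y^{−1} − ((X+Y)/2)^{−1} ≽ 0 ; a sketch with random positive definite matrices:

    % Midpoint test of (437) for g(X) = inv(X) on positive definite matrices.
    M = 4;
    A = randn(M); X = A*A' + eye(M);     % X ≻ 0
    B = randn(M); Y = B*B' + eye(M);     % Y ≻ 0
    S = 0.5*inv(X) + 0.5*inv(Y) - inv(0.5*X + 0.5*Y);
    disp(min(eig((S + S')/2)) >= -1e-10) % PSD: displays true (1)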


3.1.2.3.5 Example. Matrix exponential.
The matrix-valued function g(X) = e^X : S^M → S^M is convex on the subspace of circulant [101] symmetric matrices. Applying the line theorem, for all t ∈ R and circulant X, Y ∈ S^M, from Table D.2.7 we have

    d²/dt² e^{X + tY} = Y e^{X + tY} Y ≽ 0, (XY)^T = XY    (450)

because all circulant matrices are commutative and, for symmetric matrices, XY = YX ⇔ (XY)^T = XY (1113). Given symmetric argument, the matrix exponential always resides interior to the cone of positive semidefinite matrices in the symmetric matrix subspace; e^A ≻ 0 ∀A ∈ S^M (1477). Then for any matrix Y of compatible dimension, Y^T e^A Y is positive semidefinite. (A.3.1.0.5)

The subspace of circulant symmetric matrices contains all diagonal matrices. The matrix exponential of any diagonal matrix e^Λ exponentiates each individual entry on the main diagonal. [163, 5.3] So, changing the function domain to the subspace of real diagonal matrices reduces the matrix exponential to a vector-valued function in an isometrically isomorphic subspace R^M ; known convex (3.1.1) from the real-valued function case [41, 3.1.5].

There are, of course, multifarious methods to determine function convexity, [41] [29] [75] each of them efficient when appropriate.

[Figure 57: Iconic unimodal differentiable quasiconvex function of two variables graphed in R² × R on some open disc in R². Note reversal of curvature in direction of gradient.]


3.2 Quasiconvex

Quasiconvex functions [41, 3.4] [129] [218] [250] [159, 2] are useful in practical problem solving because they are unimodal (by definition when nonmonotone); a global minimum is guaranteed to exist over any convex set in the function domain; e.g., Figure 57. The scalar definition of convexity carries over to quasiconvexity:

3.2.0.0.1 Definition. Quasiconvex matrix-valued function.
g(X) : R^{p×k} → S^M is a quasiconvex function of matrix X iff dom g is a convex set and for each and every Y, Z ∈ dom g, 0 ≤ µ ≤ 1, and real vector w of unit norm,

    w^T g(µY + (1 − µ)Z)w ≤ max{w^T g(Y)w, w^T g(Z)w}    (451)

A quasiconcave function is characterized

    w^T g(µY + (1 − µ)Z)w ≥ min{w^T g(Y)w, w^T g(Z)w}    (452)

In either case, vector w becomes superfluous for real functions. △

Unlike convex functions, quasiconvex functions are not necessarily continuous. Although insufficient for convex functions, convexity of each and every sublevel set is necessary and sufficient for quasiconvexity.

3.2.0.0.2 Definition. Quasiconvex function by sublevel sets.
Matrix-valued function g(X) : R^{p×k} → S^M is a quasiconvex function of matrix X if and only if dom g is a convex set and the sublevel set corresponding to each and every V ∈ S^M (confer (416))

    L_V g = {X ∈ dom g | g(X) ≼ V} ⊆ R^{p×k}    (445)

is convex. △
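For a real function, (451) collapses to f(µy + (1−µ)z) ≤ max{f(y), f(z)}. A sketch testing this for the assumed example f(x) = √|x|, which is quasiconvex on R though not convex:

    % Quasiconvexity test (451), real case, for f(x) = sqrt(abs(x)).
    f = @(x) sqrt(abs(x));
    for trial = 1:5
        y = randn; z = randn; mu = rand;
        assert(f(mu*y + (1-mu)*z) <= max(f(y), f(z)) + 1e-12)
    end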


Necessity of convex sublevel sets is proven as follows: For any X, Y ∈ L_V g we must show for each and every µ ∈ [0, 1] that µX + (1−µ)Y ∈ L_V g. By definition we have, for each and every real vector w of unit norm,

    w^T g(µX + (1−µ)Y)w ≤ max{w^T g(X)w, w^T g(Y)w} ≤ w^T V w    (453)

Likewise, convexity of all superlevel sets^3.11 is necessary and sufficient for quasiconcavity.

3.2.1 Differentiable and quasiconvex

3.2.1.0.1 Definition. Differentiable quasiconvex matrix-valued function.
Assume function g(X) : R^{p×k} → S^M is twice differentiable, and dom g is an open convex set. Then g(X) is quasiconvex in X if, wherever in its domain the directional derivative (D.1.4) becomes 0, the second directional derivative (D.1.5) is positive definite there [41, 3.4.3] in the same direction Y ; id est, g is quasiconvex if for each and every point X ∈ dom g, all nonzero directions Y ∈ R^{p×k}, and for t ∈ R

    d/dt|_{t=0} g(X + tY) = 0 ⇒ d²/dt²|_{t=0} g(X + tY) ≻ 0    (454)

If g(X) is quasiconvex, conversely, then for each and every X ∈ dom g and all Y ∈ R^{p×k}

    d/dt|_{t=0} g(X + tY) = 0 ⇒ d²/dt²|_{t=0} g(X + tY) ≽ 0    (455)
    △

3.11 The superlevel set is similarly defined: L^S g = {X ∈ dom g | g(X) ≽ S}


3.3 Salient properties of convex and quasiconvex functions

1. A convex (or concave) function is assumed continuous but not necessarily differentiable on the relative interior of its domain. [202, 10] A quasiconvex (or quasiconcave) function is not necessarily a continuous function.

2. convexity ⇒ quasiconvexity ⇔ convex sublevel sets
   concavity ⇒ quasiconcavity ⇔ convex superlevel sets

3. g convex ⇔ −g concave
   g quasiconvex ⇔ −g quasiconcave

4. The scalar-definition of matrix-valued function convexity (3.1.2.0.1) and the line theorem (3.1.2.3.1) [41, 3.4.2] translate identically to quasiconvexity (and quasiconcavity).

5. Composition g(h(X)) of a convex (concave) function g with any affine function h : R^{m×n} → R^{p×k}, such that h(R^{m×n}) ∩ dom g ≠ ∅, becomes convex (concave) in X ∈ R^{m×n}. [129, B.2.1] Likewise for the quasiconvex (quasiconcave) functions g.

6. Convexity, concavity, quasiconvexity, and quasiconcavity are invariant to nonnegative scaling.

7. A sum of (strictly) convex (concave) functions remains (strictly) convex (concave). A pointwise supremum (infimum) of convex (concave) functions remains convex (concave). [202, 5] Quasiconvexity (quasiconcavity) is invariant to pointwise supremum (infimum) of quasiconvex (quasiconcave) functions.


Chapter 4

Euclidean Distance Matrix

    These results were obtained by Schoenberg (1935), a surprisingly late date for such a fundamental property of Euclidean geometry.
    −John Clifford Gower [95, 3]

By itself, distance information between many points in Euclidean space is lacking. We might want to know more; such as, relative or absolute position or dimension of some hull. A question naturally arising in some fields (e.g., geodesy, economics, genetics, psychology, biochemistry, engineering) [61] asks what facts can be deduced given only distance information. What can we know about the underlying points that the distance information purports to describe? We also ask what it means when given distance information is incomplete; or suppose the distance information is not reliable, available, or specified only by certain tolerances (affine inequalities). These questions motivate a study of interpoint distance, well represented in any spatial dimension by a simple matrix from linear algebra.^4.1 In what follows, we will answer some of these questions via Euclidean distance matrices.

4.1 e.g., ◦√D ∈ R^{N×N}, a classical two-dimensional matrix representation of absolute interpoint distance because its entries (in ordered rows and columns) can be written neatly on a piece of paper. Matrix D will be reserved throughout to hold distance-square.


[Figure 58: Convex hull of three points (N = 3) is shaded in R³ (n = 3). Dotted lines are imagined vectors to points; edges are annotated with lengths √5, 2, and 1, and with the matrix D = [0 1 5; 1 0 4; 5 4 0] whose rows and columns are indexed by the point labels 1, 2, 3.]

4.1 EDM

Euclidean space R^n is a finite-dimensional real vector space having an inner product defined on it, hence a metric as well. [147, 3.1] A Euclidean distance matrix, an EDM in R^{N×N}_+, is an exhaustive table of distance-square d_ij between points taken by pair from a list of N points {x_l, l = 1... N} in R^n ; the squared metric the measure of distance-square:

    d_ij = ‖x_i − x_j‖²_2 ≜ ⟨x_i − x_j, x_i − x_j⟩    (456)

Each point is labelled ordinally, hence the row or column index of an EDM, i or j = 1... N, individually addresses all the points in the list.

Consider the following example of an EDM for the case N = 3 :

                 ⎡ d_11 d_12 d_13 ⎤   ⎡ 0    d_12 d_13 ⎤   ⎡ 0 1 5 ⎤
    D = [d_ij] = ⎢ d_21 d_22 d_23 ⎥ = ⎢ d_12 0    d_23 ⎥ = ⎢ 1 0 4 ⎥    (457)
                 ⎣ d_31 d_32 d_33 ⎦   ⎣ d_13 d_23 0    ⎦   ⎣ 5 4 0 ⎦

Matrix D has N² entries but only N(N−1)/2 pieces of information.
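One list realizing (457) is easy to produce by hand; a sketch (coordinates chosen for illustration, placing x_1 at the origin) that rebuilds D entrywise from (456):

    % One point list realizing the EDM of (457); distance-square recomputed.
    X = [0 1 1;
         0 0 2];                 % columns x1, x2, x3 in R^2
    N = 3;  D = zeros(N);
    for i = 1:N
        for j = 1:N
            D(i,j) = norm(X(:,i) - X(:,j))^2;   % (456)
        end
    end
    disp(D)                      % reproduces [0 1 5; 1 0 4; 5 4 0]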


In Figure 58 are shown three points in R³ that can be arranged in a list to correspond to D in (457). Such a list is not unique because any rotation, reflection, or translation (4.5) of the points in Figure 58 would produce the same EDM D.

4.2 First metric properties

For i, j = 1... N, the Euclidean distance between points x_i and x_j must satisfy the requirements imposed by any metric space: [147, 1.1] [168, 1.7]

1. √d_ij ≥ 0, i ≠ j    (nonnegativity)
2. √d_ij = 0, i = j    (self-distance)
3. √d_ij = √d_ji    (symmetry)
4. √d_ij ≤ √d_ik + √d_kj, i ≠ j ≠ k    (triangle inequality)

where √d_ij is the Euclidean metric in R^n (4.4). Then all entries of an EDM must be in concord with these Euclidean metric properties: specifically, each entry must be nonnegative,^4.2 the main diagonal must be 0,^4.3 and an EDM must be symmetric. The fourth property provides upper and lower bounds for each entry. Property 4 is true more generally when there are no restrictions on indices i, j, k, but furnishes no new information.

4.3 ∃ fifth Euclidean metric property

The four properties of the Euclidean metric provide information insufficient to certify that a bounded convex polyhedron more complicated than a triangle has a Euclidean realization. [95, 2] Yet any list of points or the vertices of any bounded convex polyhedron must conform to the properties.

4.2 Implicit from the terminology, √d_ij ≥ 0 ⇔ d_ij ≥ 0 is always assumed.
4.3 What we call self-distance, Marsden calls nondegeneracy. [168, 1.6] Kreyszig calls these first metric properties axioms of the metric; [147, p.4] Blumenthal refers to them as postulates. [34, p.15]


4.3.0.0.1 Example. Triangle.
Consider the EDM in (457), but missing one of its entries:

        ⎡ 0    1 d_13 ⎤
    D = ⎢ 1    0 4    ⎥    (458)
        ⎣ d_31 4 0    ⎦

Can we determine unknown entries of D by applying the metric properties? Property 1 demands √d_13, √d_31 ≥ 0, property 2 requires the main diagonal be 0, while property 3 makes √d_31 = √d_13. The fourth property tells us

    1 ≤ √d_13 ≤ 3    (459)

Indeed, described over that closed interval [1, 3] is a family of triangular polyhedra whose angle at vertex x_2 varies from 0 to π radians. So, yes, we can determine the unknown entries of D, but they are not unique; nor should they be from the information given for this example.

4.3.0.0.2 Example. Small completion problem, I.
Now consider the polyhedron in Figure 59(b) formed from an unknown list {x_1, x_2, x_3, x_4}. The corresponding EDM less one critical piece of information, d_14, is given by

        ⎡ 0    1 5 d_14 ⎤
    D = ⎢ 1    0 4 1    ⎥    (460)
        ⎢ 5    4 0 1    ⎥
        ⎣ d_14 1 1 0    ⎦

From metric property 4 we may write a few inequalities for the two triangles common to d_14 ; we find

    √5 − 1 ≤ √d_14 ≤ 2    (461)

We cannot further narrow those bounds on √d_14 using only the four metric properties (4.8.3.1.1). Yet there is only one possible choice for √d_14 because points x_2, x_3, x_4 must be collinear. All other values of √d_14 in the interval [√5 − 1, 2] specify impossible distances in any dimension; id est, in this particular example the triangle inequality does not yield an interval for √d_14 over which a family of convex polyhedra can be reconstructed.
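Bounds like (461) follow mechanically from property 4 applied through each intermediate point; a sketch for the data of (460):

    % Triangle-inequality bounds (461) on sqrt(d14) from known entries of (460).
    d = sqrt([0 1 5; 1 0 4; 5 4 0]);     % known distances among x1, x2, x3
    d4 = sqrt([1 1]);                    % known distances x2-x4, x3-x4
    lo = 0;  hi = inf;
    for k = 2:3                          % route x1 -- xk -- x4
        lo = max(lo, abs(d(1,k) - d4(k-1)));
        hi = min(hi, d(1,k) + d4(k-1));
    end
    disp([lo, hi])                       % [sqrt(5)-1, 2] ~ [1.2361, 2]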


[Figure 59: (a) Complete dimensionless EDM graph. (b) Emphasizing obscured segments x₂x₄, x₄x₃, and x₂x₃, now only five (2N−3) absolute distances are specified. EDM so represented is incomplete, missing d₁₄ as in (460), yet the isometric reconstruction (§4.4.2.2.5) is unique as proved in §4.9.3.0.1 and §4.14.4.1.1. First four properties of Euclidean metric are not a recipe for reconstruction of this polyhedron.]

We will return to this simple Example 4.3.0.0.2 to illustrate more elegant methods of solution in §4.8.3.1.1, §4.9.3.0.1, and §4.14.4.1.1. Until then, we can deduce some general principles from the foregoing examples:

- Unknown d_ij of an EDM are not necessarily uniquely determinable.
- The triangle inequality does not necessarily produce tight bounds.^{4.4}
- Four Euclidean metric properties are insufficient for reconstruction.

^{4.4} The term tight with reference to an inequality means equality is achievable.


4.3.1 Lookahead

There must exist at least one requirement more than the four properties of the Euclidean metric that makes them altogether necessary and sufficient to certify realizability of bounded convex polyhedra. Indeed, there are infinitely many more; there are precisely N + 1 necessary and sufficient Euclidean metric requirements for N points constituting a generating list (§2.3.2). Here is the fifth requirement:

4.3.1.0.1 Fifth Euclidean metric property. Relative-angle inequality.
(confer §4.14.2.1.1) Augmenting the four fundamental properties of the Euclidean metric in Rⁿ, for all i, j, l ≠ k ∈ {1...N}, i < j < l, and for N ≥ 4 distinct points {x_k}, the inequalities

    cos(θ_ikl + θ_lkj) ≤ cos θ_ikj ≤ cos(θ_ikl − θ_lkj)
    0 ≤ θ_ikl , θ_lkj , θ_ikj ≤ π    (462)

where θ_ikj = θ_jki is the angle between vectors at vertex x_k (Figure 60), must be satisfied at each point x_k regardless of affine dimension.


[Figure 60: Nomenclature for fifth Euclidean metric property: four points are indexed i, j, k, l; each angle θ_ikl, θ_ikj, θ_jkl is made by a vector pair at vertex k. The fifth property is necessary for realization of four or more points reckoned by three angles in any dimension.]


Now we develop some invaluable concepts, moving toward a link of the Euclidean metric properties to matrix criteria.

4.4 EDM definition

Ascribe points in a list {x_l ∈ Rⁿ, l = 1...N} to the columns of a matrix X;

    X = [x_1 \,\cdots\, x_N] ∈ R^{n×N}    (64)

where N is regarded as cardinality of list X. When matrix D = [d_ij] is an EDM, its entries must be related to those points constituting the list by the Euclidean distance-square: for i, j = 1...N (§A.1.1 no.22)

    d_{ij} = \|x_i - x_j\|^2 = (x_i - x_j)^T (x_i - x_j) = \|x_i\|^2 + \|x_j\|^2 - 2 x_i^T x_j
           = \begin{bmatrix} x_i^T & x_j^T \end{bmatrix} \begin{bmatrix} I & -I \\ -I & I \end{bmatrix} \begin{bmatrix} x_i \\ x_j \end{bmatrix}
           = \operatorname{vec}(X)^T (\Phi_{ij} \otimes I) \operatorname{vec} X = \langle \Phi_{ij} \,, X^T X \rangle    (464)

where

    \operatorname{vec} X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} ∈ R^{nN}    (465)

and where Φ_ij ⊗ I has I ∈ Sⁿ in its ii-th and jj-th block of entries while −I ∈ Sⁿ fills its ij-th and ji-th block; id est,

    \Phi_{ij} \triangleq \delta\big((e_i e_j^T + e_j e_i^T)\mathbf{1}\big) - (e_i e_j^T + e_j e_i^T)
              = e_i e_i^T + e_j e_j^T - e_i e_j^T - e_j e_i^T
              = (e_i - e_j)(e_i - e_j)^T \;\in\; S^N_+    (466)

where {e_i ∈ R^N, i = 1...N} is the set of standard basis vectors, and ⊗ signifies the Kronecker product (§D.1.2.1). Thus each entry d_ij is a convex quadratic function [41, §3, §4] of vec X (29). [202, §6]
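Identity (464) is easy to verify numerically; the following plain-Matlab sketch checks one entry against the Kronecker quadratic form (vec X is X(:) in Matlab):

```matlab
n = 3; N = 4;
X = randn(n,N);  e = eye(N);
i = 2; j = 4;
Phi = (e(:,i)-e(:,j))*(e(:,i)-e(:,j))';    % Phi_ij  (466)
lhs = norm(X(:,i)-X(:,j))^2;               % d_ij directly
rhs = X(:)'*kron(Phi,eye(n))*X(:);         % vec(X)'(Phi_ij kron I)vec X (464)
disp(abs(lhs-rhs))                         % ~ 0
```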


The collection of all Euclidean distance matrices EDM^N is a convex subset of R^{N×N}_+ called the EDM cone (§5, Figure 101, p.416);

    0 ∈ EDM^N ⊆ S^N_h ∩ R^{N×N}_+ ⊂ S^N    (467)

An EDM D must be expressible as a function of some list X; id est, it must have the form

    D(X) ≜ δ(X^T X) 1^T + 1 δ(X^T X)^T − 2 X^T X ∈ EDM^N    (468)
         = [vec(X)^T (Φ_ij ⊗ I) vec X , i, j = 1...N]    (469)

Conversely, function D(X) will make an EDM given any X ∈ R^{n×N}, but D(X) is not a convex function of X (§4.4.1). Now the EDM cone may be described:

    EDM^N = { D(X) | X ∈ R^{N−1×N} }    (470)

Expression D(X) is a matrix definition of EDM and so conforms to the Euclidean metric properties:

Nonnegativity of EDM entries (property 1, §4.2) is obvious from the distance-square definition (464), so holds for any D expressible in the form D(X) in (468).

When we say D is an EDM, reading from (468), it implicitly means the main diagonal must be 0 (property 2, self-distance) and D must be symmetric (property 3); δ(D) = 0 and D^T = D or, equivalently, D ∈ S^N_h are necessary matrix criteria.

4.4.0.1 homogeneity

Function D(X) is homogeneous in the sense, for ζ ∈ R

    °√D(ζX) = |ζ| °√D(X)    (471)

where the positive square root is entrywise.

Any nonnegatively scaled EDM remains an EDM; id est, the matrix class EDM is invariant to nonnegative scaling (αD(X) for α ≥ 0) because all EDMs of dimension N constitute a convex cone EDM^N (§5, Figure 79).


4.4.1 −V_N^T D(X) V_N convexity

We saw that EDM entries d_ij are convex quadratic functions of [x_i ; x_j]. Yet −D(X) (468) is not a quasiconvex function of matrix X ∈ R^{n×N} because the second directional derivative (§3.2)

    − (d²/dt²)|_{t=0} D(X + tY) = 2( −δ(Y^T Y) 1^T − 1 δ(Y^T Y)^T + 2 Y^T Y )    (472)

is indefinite for any Y ∈ R^{n×N} since its main diagonal is 0. [93, §4.2.8] [131, §7.1, prob.2] Hence −D(X) is not convex in X either.

The outcome is different when instead we consider

    −V_N^T D(X) V_N = 2 V_N^T X^T X V_N    (473)

where we introduce the full-rank skinny Schoenberg auxiliary matrix (§B.4.2)

    V_N \triangleq \frac{1}{\sqrt{2}} \begin{bmatrix} -1 & -1 & \cdots & -1 \\ 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} -\mathbf{1}^T \\ I \end{bmatrix} ∈ R^{N×N−1}    (474)

(N(V_N) = 0) having range

    R(V_N) = N(1^T) ,  V_N^T 1 = 0    (475)

Matrix-valued function (473) meets the criterion for convexity in §3.1.2.3.2 over its domain that is all of R^{n×N}; videlicet, for any Y ∈ R^{n×N}

    − (d²/dt²) V_N^T D(X + tY) V_N = 4 V_N^T Y^T Y V_N ≽ 0    (476)

Quadratic matrix-valued function −V_N^T D(X) V_N is therefore convex in X, achieving its minimum, with respect to a positive semidefinite cone (§2.7.2.2), at X = 0. When the penultimate number of points exceeds the dimension of the space, n < N−1, strict convexity of the quadratic (473) becomes impossible because (476) could not then be positive definite.
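A short numeric sketch of (473)-(476), assuming only base Matlab: build V_N from (474) and confirm both the identity (473) and positive semidefiniteness:

```matlab
n = 2; N = 5;
X  = randn(n,N);
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);                        % V_N  (474)
disp(norm(VN'*ones(N,1)))                                     % V_N'1 = 0 (475)
D  = diag(X'*X)*ones(1,N) + ones(N,1)*diag(X'*X)' - 2*(X'*X); % D(X) (468)
M  = -VN'*D*VN;
disp(norm(M - 2*VN'*(X'*X)*VN,'fro'))                         % identity (473)
disp(min(eig(M)))                                             % >= 0 up to roundoff (476)
```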


4.4.2 Gram-form EDM definition

Positive semidefinite matrix X^T X in (468), formed from inner product of the list, is known as a Gram matrix; [162, §3.6]

    G \triangleq X^T X = \begin{bmatrix} \|x_1\|^2 & x_1^T x_2 & x_1^T x_3 & \cdots & x_1^T x_N \\ x_2^T x_1 & \|x_2\|^2 & x_2^T x_3 & \cdots & x_2^T x_N \\ x_3^T x_1 & x_3^T x_2 & \|x_3\|^2 & \ddots & x_3^T x_N \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ x_N^T x_1 & x_N^T x_2 & x_N^T x_3 & \cdots & \|x_N\|^2 \end{bmatrix} ∈ S^N_+

      = \delta\!\left(\begin{bmatrix} \|x_1\| \\ \vdots \\ \|x_N\| \end{bmatrix}\right) \begin{bmatrix} 1 & \cos\psi_{12} & \cos\psi_{13} & \cdots & \cos\psi_{1N} \\ \cos\psi_{12} & 1 & \cos\psi_{23} & \cdots & \cos\psi_{2N} \\ \cos\psi_{13} & \cos\psi_{23} & 1 & \ddots & \cos\psi_{3N} \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ \cos\psi_{1N} & \cos\psi_{2N} & \cos\psi_{3N} & \cdots & 1 \end{bmatrix} \delta\!\left(\begin{bmatrix} \|x_1\| \\ \vdots \\ \|x_N\| \end{bmatrix}\right)

      \triangleq \delta^2(G)^{1/2} \, \Psi \, \delta^2(G)^{1/2}    (477)

where ψ_ij (497) is the angle between vectors x_i and x_j, and where δ² denotes a diagonal matrix in this case. Positive semidefiniteness of interpoint angle matrix Ψ implies positive semidefiniteness of Gram matrix G; [41, §8.3]

    G ≽ 0 ⇐ Ψ ≽ 0    (478)

When δ²(G) is nonsingular, then G ≽ 0 ⇔ Ψ ≽ 0. (§A.3.1.0.5)

Distance-square d_ij (464) is related to Gram matrix entries G^T = G ≜ [g_ij] by

    d_{ij} = g_{ii} + g_{jj} − 2 g_{ij} = \langle \Phi_{ij} \,, G \rangle    (479)

where Φ_ij is defined in (466). Hence the linear EDM definition

    D(G) ≜ δ(G) 1^T + 1 δ(G)^T − 2G = [\langle \Phi_{ij}, G \rangle \,, i, j = 1...N] ∈ EDM^N  ⇐  G ≽ 0    (480)

The EDM cone may be described, (confer (555)(561))

    EDM^N = { D(G) | G ∈ S^N_+ }    (481)


4.4.2.1 First point at origin

Assume the first point x₁ in an unknown list X resides at the origin;

    X e₁ = 0 ⇔ G e₁ = 0    (482)

Consider the symmetric translation (I − 1e₁^T) D(G) (I − e₁1^T) that shifts the first row and column of D(G) to the origin; setting Gram-form EDM operator D(G) = D for convenience,

    −½( D − (D e₁ 1^T + 1 e₁^T D) + 1 e₁^T D e₁ 1^T ) = G − (G e₁ 1^T + 1 e₁^T G) + 1 e₁^T G e₁ 1^T    (483)

where

    e₁ ≜ [1 \; 0 \; \cdots \; 0]^T    (484)

is the first vector from the standard basis. Then it follows for D ∈ S^N_h

    G = −½( D − (D e₁ 1^T + 1 e₁^T D) ) ,  x₁ = 0
      = −½ [0 \; √2 V_N]^T D [0 \; √2 V_N]
      = \begin{bmatrix} 0 & 0^T \\ 0 & −V_N^T D V_N \end{bmatrix}

    V_N^T G V_N = −½ V_N^T D V_N ,  ∀X    (485)

where

    I − e₁ 1^T = [0 \; √2 V_N]    (486)

is a projector nonorthogonally projecting (§E.1) on

    S^N_1 = { G ∈ S^N | G e₁ = 0 } = { [0 \; √2 V_N]^T Y [0 \; √2 V_N] | Y ∈ S^N }    (1604)

in the Euclidean sense. From (485) we get sufficiency of the first matrix criterion for an EDM proved by Schoenberg in 1935; [207]^{4.7}

^{4.7} From (475) we know R(V_N) = N(1^T), so (487) is the same as (463). In fact, any matrix V in place of V_N will satisfy (487) whenever R(V) = R(V_N) = N(1^T). But V_N is the matrix implicit in Schoenberg's seminal exposition.


    D ∈ EDM^N  ⇔  −V_N^T D V_N ∈ S^{N−1}_+  and  D ∈ S^N_h    (487)

We provide a rigorous, complete, more geometric proof of this Schoenberg criterion in §4.9.1.0.3.

By substituting G = [0 0^T ; 0 −V_N^T D V_N] (485) into D(G) (480), assuming x₁ = 0,

    D = \begin{bmatrix} 0 \\ δ(−V_N^T D V_N) \end{bmatrix} 1^T + 1 \begin{bmatrix} 0 & δ(−V_N^T D V_N)^T \end{bmatrix} − 2 \begin{bmatrix} 0 & 0^T \\ 0 & −V_N^T D V_N \end{bmatrix}    (488)

We provide details of this bijection in §4.6.2.

4.4.2.2 0 geometric center

Assume the geometric center (§4.5.1.0.1) of an unknown list X is the origin;

    X1 = 0 ⇔ G1 = 0    (489)

Now consider the calculation (I − (1/N)11^T) D(G) (I − (1/N)11^T), a geometric centering or projection operation. (§E.7.2.0.2) Setting D(G) = D for convenience as in §4.4.2.1,

    G = −½( D − (1/N)(D11^T + 11^T D) + (1/N²) 11^T D 11^T ) ,  X1 = 0
      = −½ V D V

    V G V = −½ V D V ,  ∀X    (490)

where more properties of the auxiliary (geometric centering, projection) matrix

    V ≜ I − (1/N) 11^T ∈ S^N    (491)

are found in §B.4. From (490) and the assumption D ∈ S^N_h we get sufficiency of the more popular form of Schoenberg's criterion:


    D ∈ EDM^N  ⇔  −V D V ∈ S^N_+  and  D ∈ S^N_h    (492)

Of particular utility when D ∈ EDM^N is the fact, (§B.4.2 no.20) (464)

    tr(−½ V D V) = (1/2N) Σ_{i,j} d_ij = (1/2N) vec(X)^T ( Σ_{i,j} Φ_ij ⊗ I ) vec X = tr(V G V) ,  G ≽ 0
                 = tr G = Σ_{l=1}^N ‖x_l‖² = ‖X‖²_F ,  X1 = 0    (493)

where Σ Φ_ij ∈ S^N_+ (466), therefore convex in vec X. We will find this trace useful as a heuristic to minimize affine dimension of an unknown list arranged columnar in X (§7.2.2), but it tends to facilitate reconstruction of a list configuration having least energy; id est, it compacts a reconstructed list by minimizing total norm-square of the vertices.

By substituting G = −½ V D V (490) into D(G) (480), assuming X1 = 0 (confer §4.6.1)

    D = δ(−½ V D V) 1^T + 1 δ(−½ V D V)^T − 2(−½ V D V)    (494)

These relationships will allow combination of distance and Gram constraints in any optimization problem we may pose:

- Constraining all main-diagonal entries of a Gram matrix to 1, for example, is equivalent to the constraint that all points lie on a hypersphere (§4.9.1.0.2) of radius 1 centered at the origin. This is equivalent to the EDM constraint: D1 = 2N1. [54, p.116] Any further constraint on that Gram matrix then applies only to interpoint angle Ψ.

- More generally, interpoint angle Ψ can be constrained by fixing all the individual point lengths δ(G)^{1/2}; then

    Ψ = −½ δ²(G)^{−1/2} V D V δ²(G)^{−1/2}    (495)
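Schoenberg criterion (492) yields a one-screen membership test; here is a minimal plain-Matlab sketch (the function name and tolerance are ours, not the book's):

```matlab
function tf = isedm(D)
% ISEDM  true when D satisfies Schoenberg criterion (492):
%        D symmetric hollow and -V*D*V positive semidefinite.
   N   = size(D,1);
   V   = eye(N) - ones(N)/N;          % geometric centering matrix V (491)
   tol = 1e-9;
   tf  = isequal(D,D') && all(abs(diag(D)) < tol) ...
         && min(eig(-V*D*V/2)) > -tol;
end
```

For instance, isedm([0 1 5; 1 0 4; 5 4 0]) returns true, while negating any off-diagonal entry returns false since a PSD matrix cannot have a negative main-diagonal entry after centering.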


4.4.2.2.1 Example. List member constraints via Gram matrix.
Capitalizing on identity (490) relating Gram and EDM D matrices, a constraint set such as

    tr(−½ V D V e_i e_i^T) = ‖x_i‖²
    tr(−½ V D V · ½(e_i e_j^T + e_j e_i^T)) = x_i^T x_j    (496)
    tr(−½ V D V e_j e_j^T) = ‖x_j‖²

relates list member x_i to x_j to within an isometry through inner-product identity [257, §1-7]

    cos ψ_ij = x_i^T x_j / (‖x_i‖ ‖x_j‖)    (497)

For M list members, there are a total of M(M+1)/2 such constraints.

Consider the academic problem of finding a Gram matrix subject to constraints on each and every entry of the corresponding EDM:

    find_{D ∈ S^N_h}  −½ V D V ∈ S^N
    subject to  ⟨D , ½(e_i e_j^T + e_j e_i^T)⟩ = ď_ij ,  i, j = 1...N , i < j
                −V D V ≽ 0    (498)

where the ď_ij are given nonnegative constants. EDM D can, of course, be replaced with the equivalent Gram-form (480). Requiring only the self-adjointness property (1063) of the main-diagonal linear operator δ we get, for A ∈ S^N

    ⟨D , A⟩ = ⟨δ(G)1^T + 1δ(G)^T − 2G , A⟩ = 2⟨G , δ(A1) − A⟩    (499)

Then the problem equivalent to (498) becomes, for G ∈ S^N_c ⇔ G1 = 0:

    find_{G ∈ S^N_c}  G ∈ S^N
    subject to  ⟨G , δ((e_i e_j^T + e_j e_i^T)1) − (e_i e_j^T + e_j e_i^T)⟩ = ď_ij ,  i, j = 1...N , i < j
                G ≽ 0    (500)


[Figure 61: Arbitrary hexagon in R³ whose vertices x₁ ... x₆ are labelled clockwise.]

Barvinok's Proposition 2.9.3.0.1 predicts existence for either formulation (498) or (500) such that implicit equality constraints induced by subspace membership are ignored

    rank G , rank V D V ≤ ⌊(√(8(N(N−1)/2) + 1) − 1)/2⌋ = N − 1    (501)

because, in each case, the Gram matrix is confined to a face of positive semidefinite cone S^N_+ isomorphic with S^{N−1}_+ (§5.7.1). (§E.7.2.0.2) This bound is tight (§4.7.1.1) and is the greatest upper bound.^{4.8}

4.4.2.2.2 Example. Hexagon.
Barvinok [21, §2.6] poses a problem in geometric realizability of an arbitrary hexagon (Figure 61) having:

1. prescribed (one-dimensional) face-lengths l
2. prescribed angles ϕ between the three pairs of opposing faces
3. a constraint on the sum of norm-square of each and every vertex x ;

^{4.8} −V D V |_{N←1} = 0 (§B.4.1)


ten affine equality constraints in all on a Gram matrix G ∈ S⁶ (490). Let's realize this as a convex feasibility problem (with constraints written in the same order) also assuming 0 geometric center (489):

    find_{D ∈ S⁶_h}  −½ V D V ∈ S⁶
    subject to  tr(½ D(e_i e_j^T + e_j e_i^T)) = l²_ij ,  j−1 = (i = 1...6) mod 6
                tr(−½ V D V · ½(A_i + A_i^T)) = cos ϕ_i ,  i = 1, 2, 3
                tr(−½ V D V) = 1
                −V D V ≽ 0    (502)

where, for A_i ∈ R^{6×6} (497)

    A₁ = (e₁ − e₆)(e₃ − e₄)^T / (l₆₁ l₃₄)
    A₂ = (e₂ − e₁)(e₄ − e₅)^T / (l₁₂ l₄₅)    (503)
    A₃ = (e₃ − e₂)(e₅ − e₆)^T / (l₂₃ l₅₆)

and where the first constraint on length-square l²_ij can be equivalently written as a constraint on the Gram matrix −½ V D V via (499). We show how to numerically solve such a problem by alternating projection in §E.10.2.1.1. Barvinok's Proposition 2.9.3.0.1 asserts existence of a list, corresponding to Gram matrix G solving this feasibility problem, whose affine dimension (§4.7.1.1) does not exceed 3 because the convex feasible set is bounded by the third constraint tr(−½ V D V) = 1 (493).

4.4.2.2.3 Example. Kissing number of ball packing.
Two Euclidean balls are said to kiss if they touch. An elementary geometrical problem can be posed: Given hyperspheres, each having the same diameter 1, how many hyperspheres can simultaneously kiss one central hypersphere? [270] The noncentral hyperspheres are allowed but not required to kiss.

As posed, the problem seeks the maximum number of spheres K kissing a central sphere in a particular dimension. The total number of spheres is N = K + 1. In one dimension the answer to this kissing problem is 2. In two dimensions, 6. (Figure 7)


[Figure 62: Sphere-packing illustration from [252, kissing number]. Translucent balls illustrated all have the same diameter.]

The question was presented in three dimensions to Isaac Newton in the context of celestial mechanics, and became a controversy with David Gregory on the campus of Cambridge University in 1694. Newton correctly identified the kissing number as 12 (Figure 62) while Gregory argued for 13. In 2003, Oleg Musin tightened the upper bound on kissing number K in four dimensions from 25 to K = 24 by refining a proof offered by Philippe Delsarte in 1973 based on linear programming. There are no proofs known for kissing number in higher dimension excepting dimensions eight and twenty four.

Ye proposes translating this problem to an EDM graph realization (Figure 59, Figure 63). Imagine the centers of each sphere are connected by line segments. Then the distance between centers must obey simple criteria: Each sphere touching the central sphere has a line segment of length exactly 1 joining its center to the central sphere's center. All spheres, excepting the central sphere, must have centers separated by a distance of at least 1.

From this perspective, the kissing problem can be posed as a semidefinite program. Assign index 1 to the central sphere, and assume a total of N spheres:

    minimize_{D ∈ S^N}  −tr(W V_N^T D V_N)
    subject to  D_{1j} = 1 ,  j = 2...N
                D_{ij} ≥ 1 ,  2 ≤ i < j = 3...N    (504)
                D ∈ EDM^N
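Here is one way program (504) might be transcribed in two dimensions (N = 7 spheres, K = 6), assuming the CVX modeling toolbox for Matlab is available; this is a sketch using the direction matrix (506), not the authors' code:

```matlab
W = [ 4  1  2 -1 -1  1;
      1  4 -1 -1  2  1;
      2 -1  4  1  1 -1;
     -1 -1  1  4  1  2;
     -1  2  1  1  4 -1;
      1  1 -1  2 -1  4]/6;               % direction matrix (506)
N  = size(W,1) + 1;
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);   % Schoenberg auxiliary matrix (474)
cvx_begin sdp quiet
   variable D(N,N) symmetric
   minimize( -trace(W*VN'*D*VN) )
   subject to
      diag(D) == 0;                      % D in hollow subspace S^N_h
      D(1,2:N) == 1;                     % kissing spheres touch the center
      for i = 2:N-1
         for j = i+1:N
            D(i,j) >= 1;                 % noncentral separation at least 1
         end
      end
      -VN'*D*VN >= 0;                    % D in EDM^N via criterion (487)
cvx_end
```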


Then the kissing number

    K = N_max − 1    (505)

is found from the maximum number of spheres N that solve this semidefinite program in a given affine dimension r. Matrix W can be interpreted as the direction of search through the positive semidefinite cone S^{N−1}_+ for a rank-r optimal solution −V_N^T D^⋆ V_N; it is constant, in this program, determined by a method discussed in §6.4. In two dimensions,

    W = \frac{1}{6}\begin{bmatrix} 4 & 1 & 2 & -1 & -1 & 1 \\ 1 & 4 & -1 & -1 & 2 & 1 \\ 2 & -1 & 4 & 1 & 1 & -1 \\ -1 & -1 & 1 & 4 & 1 & 2 \\ -1 & 2 & 1 & 1 & 4 & -1 \\ 1 & 1 & -1 & 2 & -1 & 4 \end{bmatrix}    (506)

In three dimensions,

    W = \frac{1}{12}\begin{bmatrix}
     9 &  1 & -2 & -1 &  3 & -1 & -1 &  1 &  2 &  1 & -2 &  1 \\
     1 &  9 &  3 & -1 & -1 &  1 &  1 & -2 &  1 &  2 & -1 & -1 \\
    -2 &  3 &  9 &  1 &  2 & -1 & -1 &  2 & -1 & -1 &  1 &  2 \\
    -1 & -1 &  1 &  9 &  1 & -1 &  1 & -1 &  3 &  2 & -1 &  1 \\
     3 & -1 &  2 &  1 &  9 &  1 &  1 & -1 & -1 & -1 &  1 & -1 \\
    -1 &  1 & -1 & -1 &  1 &  9 &  2 & -1 &  2 & -1 &  2 &  3 \\
    -1 &  1 & -1 &  1 &  1 &  2 &  9 &  3 & -1 &  1 & -2 & -1 \\
     1 & -2 &  2 & -1 & -1 & -1 &  3 &  9 &  2 & -1 &  1 &  1 \\
     2 &  1 & -1 &  3 & -1 &  2 & -1 &  2 &  9 & -1 &  1 & -1 \\
     1 &  2 & -1 &  2 & -1 & -1 &  1 & -1 & -1 &  9 &  3 &  1 \\
    -2 & -1 &  1 & -1 &  1 &  2 & -2 &  1 &  1 &  3 &  9 & -1 \\
     1 & -1 &  2 &  1 & -1 &  3 & -1 &  1 & -1 &  1 & -1 &  9 \end{bmatrix}    (507)

A four-dimensional solution has rational direction matrix as well. Here is an optimal point list^{4.9} in Matlab output format:

^{4.9} An optimal five-dimensional point list is, to date, unknown.


Columns 1 through 6

X =      0   -0.1983   -0.4584    0.1657    0.9399    0.7416
         0    0.6863    0.2936    0.6239   -0.2936    0.3927
         0   -0.4835    0.8146   -0.6448    0.0611   -0.4224
         0    0.5059    0.2004   -0.4093   -0.1632    0.3427

Columns 7 through 12

   -0.4815   -0.9399   -0.7416    0.1983    0.4584   -0.2832
         0    0.2936   -0.3927   -0.6863   -0.2936   -0.6863
   -0.8756   -0.0611    0.4224    0.4835   -0.8146   -0.3922
   -0.0372    0.1632   -0.3427   -0.5059   -0.2004   -0.5431

Columns 13 through 18

    0.2832   -0.2926   -0.6473    0.0943    0.3640   -0.3640
    0.6863    0.9176   -0.6239   -0.2313   -0.0624    0.0624
    0.3922    0.1698   -0.2309   -0.6533   -0.1613    0.1613
    0.5431   -0.2088    0.3721    0.7147   -0.9152    0.9152

Columns 19 through 25

   -0.0943    0.6473   -0.1657    0.2926   -0.5759    0.5759    0.4815
    0.2313    0.6239   -0.6239   -0.9176    0.2313   -0.2313         0
    0.6533    0.2309    0.6448   -0.1698   -0.2224    0.2224    0.8756
   -0.7147   -0.3721    0.4093    0.2088   -0.7520    0.7520    0.0372

This particular optimal solution was found by solving a problem sequence in increasing number of spheres. Numerical problems begin to arise with matrices of this cardinality due to interior-point methods of solution.

By eliminating some equality constraints for this particular problem, matrix variable dimension can be reduced. From §4.8.3 we have

    −V_N^T D V_N = 11^T − ½ [0 \; I] \, D \, [0 \; I]^T    (508)

(which does not generally hold) where identity matrix I ∈ S^{N−1} has one less dimension than EDM D. By defining an EDM principal submatrix

    D̂ ≜ [0 \; I] \, D \, [0 \; I]^T ∈ S^{N−1}_h    (509)


we get a convex problem equivalent to (504)

    minimize_{D̂ ∈ S^{N−1}}  −tr(W D̂)
    subject to  D̂_ij ≥ 1 ,  1 ≤ i < j = 2...N−1
                11^T − ½ D̂ ≽ 0    (510)
                δ(D̂) = 0

Any feasible matrix 11^T − ½ D̂ belongs to an elliptope (§4.9.1.0.1).

This next example shows how finding the common point of intersection for three circles in a plane, a nonlinear problem, has convex expression.

4.4.2.2.4 Example. So & Ye trilateration in wireless sensor network.
Given three known absolute point positions in R² (three anchors x̌₂, x̌₃, x̌₄) and only one unknown point (one sensor x₁ ∈ R²), the sensor's absolute position is determined from its noiseless measured distance-square ď_i1 to each of three anchors (Figure 2, Figure 63(a)). This trilateration can be expressed as a convex optimization problem in terms of list X ≜ [x₁ x̌₂ x̌₃ x̌₄] ∈ R^{2×4} and Gram matrix G ∈ S⁴ (477):

    minimize_{G ∈ S⁴, X ∈ R^{2×4}}  tr G
    subject to  tr(G Φ_i1) = ď_i1 ,  i = 2, 3, 4
                tr(G e_i e_i^T) = ‖x̌_i‖² ,  i = 2, 3, 4
                tr(G(e_i e_j^T + e_j e_i^T)/2) = x̌_i^T x̌_j ,  2 ≤ i < j = 3, 4
                X(:, 2:4) = [x̌₂ x̌₃ x̌₄]
                \begin{bmatrix} I & X \\ X^T & G \end{bmatrix} ≽ 0    (511)

where

    Φ_ij = (e_i − e_j)(e_i − e_j)^T ∈ S^N_+    (466)
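A transcription of (511) might look as follows, again assuming CVX for Matlab; the anchor coordinates and the noiseless distance data ď_i1 below are made-up for illustration, not data from the book:

```matlab
xa = [0 1; 2 0; 1 2]';                 % hypothetical anchors x2,x3,x4
xt = [0.5; 0.7];                       % true sensor, used only to make data
d  = sum((xa - repmat(xt,1,3)).^2);    % noiseless d_i1, i = 2,3,4
e  = eye(4);
cvx_begin sdp quiet
   variable G(4,4) symmetric
   variable X(2,4)
   minimize( trace(G) )
   subject to
      for i = 1:3
         Phi = (e(:,1)-e(:,i+1))*(e(:,1)-e(:,i+1))';
         trace(G*Phi) == d(i);                    % distance-square data
         G(i+1,i+1) == xa(:,i)'*xa(:,i);          % anchor norm-squares
      end
      G(2,3) == xa(:,1)'*xa(:,2);                 % anchor inner products
      G(2,4) == xa(:,1)'*xa(:,3);
      G(3,4) == xa(:,2)'*xa(:,3);
      X(:,2:4) == xa;
      [eye(2) X; X' G] >= 0;                      % Schur-form constraint
cvx_end
X(:,1)                                            % recovered sensor position
```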


[Figure 63: (a) Given three distances indicated with absolute point positions x̌₂, x̌₃, x̌₄ known and noncollinear, absolute position of x₁ in R² can be precisely and uniquely determined by trilateration; solution to a system of nonlinear equations. Dimensionless EDM graphs (b) (c) (d) represent EDMs in various states of completion. Line segments represent known absolute distances and may cross without vertex at intersection. (b) Four-point list must always be embeddable in affine subset having dimension rank V_N^T D V_N not exceeding 3. To determine relative position of x₂, x₃, x₄, triangle inequality is necessary and sufficient (§4.14.1). Knowing all distance information, then (by injectivity of D (§4.6)) point position x₁ is uniquely determined to within an isometry in any dimension. (c) When fifth point is introduced, only distances to x₃, x₄, x₅ are required to determine relative position of x₂ in R². Graph represents first instance of missing distance information; √d₁₂. (d) Three distances are absent (√d₁₂, √d₁₃, √d₂₃) from complete set of interpoint distances, yet unique isometric reconstruction (§4.4.2.2.5) of six points in R² is certain.]


and where the constraint on distance-square ď_i1 is equivalently written as a constraint on the Gram matrix via (479). There are 9 independent affine equality constraints on that Gram matrix while the sensor is constrained only by dimensioning to lie in R². Although the objective of minimization^{4.10} insures a solution on the boundary of positive semidefinite cone S⁴_+, we claim that the set of feasible Gram matrices forms a line (§2.5.1.1) in isomorphic R^{10} tangent (§2.1.7.2) to the positive semidefinite cone boundary. (confer §6.2.1.3)

By Schur complement (§A.4) any feasible G and X provide

    G ≽ X^T X    (512)

which is a convex relaxation (§2.9.1.0.1) of the desired (nonconvex) equality constraint

    \begin{bmatrix} I & X \\ X^T & G \end{bmatrix} = \begin{bmatrix} I \\ X^T \end{bmatrix} \begin{bmatrix} I & X \end{bmatrix}    (513)

expected positive semidefinite rank-2 under noiseless conditions. But by (1126), the relaxation admits

    (3 ≥) rank G ≥ rank X    (514)

(a third dimension corresponding to an intersection of three spheres (not circles) were there noise). If rank of an optimal solution [I X^⋆; X^{⋆T} G^⋆] equals 2, then G^⋆ = X^{⋆T} X^⋆ by Theorem A.4.0.0.2.

As posed, this localization problem does not require affinely independent (Figure 17, three noncollinear) anchors. Assuming the anchors exhibit no rotational or reflective symmetry in their affine hull (§4.5.2) and assuming the sensor lies in that affine hull, then solution X^⋆(:, 1) is unique under noiseless measurement. [212]

^{4.10} Trace (tr G = ⟨I, G⟩) minimization is a heuristic for rank minimization. (§7.2.2.1) It may be interpreted as squashing G, which is bounded below by X^T X as in (512).


This preceding transformation of trilateration to a semidefinite program works all the time (rank G^⋆ = 2) despite relaxation (512) because the optimal solution set is a unique point.

Proof (sketch). Only the sensor location x₁ is unknown. The objective function together with the equality constraints make a linear system of equations in Gram matrix variable G

    tr G = ‖x₁‖² + ‖x̌₂‖² + ‖x̌₃‖² + ‖x̌₄‖²
    tr(G Φ_i1) = ď_i1 ,  i = 2, 3, 4
    tr(G e_i e_i^T) = ‖x̌_i‖² ,  i = 2, 3, 4    (515)
    tr(G(e_i e_j^T + e_j e_i^T)/2) = x̌_i^T x̌_j ,  2 ≤ i < j = 3, 4

which is invertible:

    svec G = \begin{bmatrix} svec(I)^T \\ svec(Φ_{21})^T \\ svec(Φ_{31})^T \\ svec(Φ_{41})^T \\ svec(e_2 e_2^T)^T \\ svec(e_3 e_3^T)^T \\ svec(e_4 e_4^T)^T \\ svec((e_2 e_3^T + e_3 e_2^T)/2)^T \\ svec((e_2 e_4^T + e_4 e_2^T)/2)^T \\ svec((e_3 e_4^T + e_4 e_3^T)/2)^T \end{bmatrix}^{-1} \begin{bmatrix} ‖x₁‖² + ‖x̌₂‖² + ‖x̌₃‖² + ‖x̌₄‖² \\ ď_{21} \\ ď_{31} \\ ď_{41} \\ ‖x̌₂‖² \\ ‖x̌₃‖² \\ ‖x̌₄‖² \\ x̌₂^T x̌₃ \\ x̌₂^T x̌₄ \\ x̌₃^T x̌₄ \end{bmatrix}    (516)

That line in the ambient space S⁴ of G is traced by the right-hand side. One must show this line to be tangential (§2.1.7.2) to S⁴_+ in order to prove uniqueness. Tangency is possible for affine dimension 1 or 2 while its occurrence depends completely on the known measurement data. ∎

But as soon as significant noise is introduced or whenever distance data is incomplete, such problems can remain convex although the set of all optimal solutions generally becomes a convex set bigger than a single point but containing the noiseless solution.


4.4.2.2.5 Definition. Isometric reconstruction. (confer §4.5.3)
Isometric reconstruction from an EDM means building a list X correct to within a rotation, reflection, and translation; in other terms, reconstruction of relative position, correct to within an isometry, or to within a rigid transformation. △

How much distance information is needed to uniquely localize a sensor (to recover actual relative position)? The narrative in Figure 63 helps dispel any notion of distance data proliferation in low affine dimension.


[Figure 64: Incomplete EDM corresponding to this dimensionless EDM graph on six points x₁ ... x₆ provides unique isometric reconstruction in R². (drawn freehand, no symmetry intended)]

[Figure 65: Two sensors • (x₁, x₂) and three anchors ◦ (x̌₃, x̌₄, x̌₅) in R². (Ye) Connecting line-segments denote known absolute distances. Incomplete EDM corresponding to this dimensionless EDM graph provides unique isometric reconstruction in R².]


[Figure 66: Given in red are two discrete linear trajectories of sensors x₁ and x₂ in R² localized by algorithm (518) as indicated by blue bullets •. Anchors x̌₃, x̌₄, x̌₅ corresponding to Figure 65 are indicated by ⊗. When targets and bullets • coincide under these noiseless conditions, localization is successful. On this run, two visible localization errors are due to rank-3 Gram optimal solutions. These errors can be corrected by choosing a different normal in objective of minimization.]


4.4.2.2.6 Example. Tandem trilateration in wireless sensor network.
Given three known absolute point-positions in R² (three anchors x̌₃, x̌₄, x̌₅) and two unknown sensors x₁, x₂ ∈ R², the sensors' absolute positions are determinable from their noiseless distances-square (as indicated in Figure 65) assuming the anchors exhibit no rotational or reflective symmetry in their affine hull (§4.5.2). This example differs from Example 4.4.2.2.4 in so far as trilateration of each sensor is now in terms of one unknown position: the other sensor. We express this localization as a convex optimization problem (a semidefinite program, §6.1) in terms of list X ≜ [x₁ x₂ x̌₃ x̌₄ x̌₅] ∈ R^{2×5} and Gram matrix G ∈ S⁵ (477) via relaxation (512):

    minimize_{G ∈ S⁵, X ∈ R^{2×5}}  tr G
    subject to  tr(G Φ_i1) = ď_i1 ,  i = 2, 4, 5
                tr(G Φ_i2) = ď_i2 ,  i = 3, 5
                tr(G e_i e_i^T) = ‖x̌_i‖² ,  i = 3, 4, 5
                tr(G(e_i e_j^T + e_j e_i^T)/2) = x̌_i^T x̌_j ,  3 ≤ i < j = 4, 5
                X(:, 3:5) = [x̌₃ x̌₄ x̌₅]
                \begin{bmatrix} I & X \\ X^T & G \end{bmatrix} ≽ 0    (518)

where

    Φ_ij = (e_i − e_j)(e_i − e_j)^T ∈ S^N_+    (466)

This problem realization is fragile because of the unknown distances between sensors and anchors. Yet there is no more information we may include beyond the 11 independent equality constraints on the Gram matrix (nonredundant constraints not antithetical) to reduce the feasible set.^{4.12} (By virtue of their dimensioning, the sensors are already constrained to R², the affine hull of the anchors.)

Exhibited in Figure 66 are two errors in solution X^⋆(:, 1:2) due to a rank-3 optimal Gram matrix G^⋆. The trace objective is a heuristic minimizing convex envelope of quasiconcave rank function^{4.13} rank G. (§2.9.2.6.2, §7.2.2.1) A rank-2 optimal Gram matrix can be found and

^{4.12} the presumably nonempty convex set of all points G and X satisfying the constraints.
^{4.13} Projection on that nonconvex subset of all N×N-dimensional positive semidefinite matrices whose rank does not exceed 2, in an affine subset, is a problem considered difficult to solve. [237, §4]


the errors corrected by choosing a different normal for the linear objective function, now implicitly the identity matrix I; id est,

    tr G = ⟨G , I⟩ ← ⟨G , δ(u)⟩    (519)

where vector u ∈ R⁵ is randomly selected. A random search for a good normal δ(u) in only a few iterations is quite easy and effective because the problem is small, an optimal solution is known a priori to exist in two dimensions, a good normal direction is not necessarily unique, and (we speculate) because the feasible affine-subset slices the positive semidefinite cone thinly in the Euclidean sense.^{4.14}

We explore ramifications of noise and incomplete data throughout; their individual effect being to expand the optimal solution set, introducing more solutions and higher-rank solutions. Hence our focus shifts to discovery of a reliable means for diminishing the optimal solution set by introduction of a rank constraint in §6.4.

4.4.2.2.7 Example. Cellular telephone network.
Utilizing measurements of distance, time of flight, angle of arrival, or signal power, multilateration is the process of localizing (determining absolute position of) a radio signal source • by inferring geometry relative to multiple fixed base stations ◦ whose locations are known.

We consider localization of a regular cellular telephone by distance geometry, so we assume distance to any particular base station can be inferred from received signal power. On a large open flat expanse of terrain, signal-power measurement corresponds well with inverse distance. But it is not uncommon for measurement of signal power to suffer 20 decibels in loss caused by factors such as multipath interference (reflections), mountainous terrain, man-made structures, turning one's head, or rolling the windows up in an automobile. Consequently, contours of equal signal power are no longer circular; their geometry is irregular and would more aptly be approximated by translated ellipses of various orientation.

^{4.14} The log det rank-heuristic from §7.2.2.4 does not work here because it chooses the wrong normal. Rank reduction (§6.1.1.1) is unsuccessful here because Barvinok's upper bound (§2.9.3.0.1) on rank of G^⋆ is 4.


[Figure 67: Regions of coverage by base stations ◦ (x̌₂ ... x̌₇) in a cellular telephone network. The term cellular arises from packing of regions best covered by neighboring base stations. Illustrated is a pentagonal cell best covered by base station x̌₂. Like a Voronoi diagram, cell geometry depends on base-station arrangement. In some US urban environments, it is not unusual to find base stations spaced approximately 1 mile apart. There can be as many as 20 base-station antennae capable of receiving signal from any given cell phone •; practically, that number is closer to 6. Depicted is one cell phone x₁ whose signal power is automatically and repeatedly measured by 6 base stations ◦ nearby. (Cell phone signal power is typically encoded logarithmically with 1-decibel increment and 64-decibel dynamic range.) Those signal power measurements are transmitted from that cell phone to base station x̌₂ who decides whether to transfer (hand-off or hand-over) responsibility for that call should the user roam outside its cell. (Because distance to base station is quite difficult to infer from signal power measurements in an urban environment, localization of a particular cell phone • by distance geometry would be far easier were the whole cellular system instead conceived so cell phone x₁ also transmits (to base station x̌₂) its signal power as received by all other cell phones within range.)]


Due to noise, at least one distance measurement more than the minimum number of measurements is required for reliable localization in practice; 3 measurements are minimum in two dimensions, 4 in three. (We are curious to know how these convex optimization algorithms fare in the face of measurement noise, and how they compare with traditional methods solving simultaneous hyperbolic equations; but we leave that question open for now as our purpose here is to illustrate how a problem in distance geometry can be solved without any measured distance data, just upper and lower bounds on it.) Existence of noise precludes measured distance from the input data. We instead assign measured distance to a known range specified by individual upper and lower bounds; d̄_i1 is the upper bound on actual distance-square from the cell phone to the i-th base station, while d̲_i1 is the lower bound. These bounds become the input data. Each measurement range is presumed different from the others.

Then convex problem (511) takes the form:

    minimize_{G ∈ S⁷, X ∈ R^{2×7}}  tr G
    subject to  d̲_i1 ≤ tr(G Φ_i1) ≤ d̄_i1 ,  i = 2...7
                tr(G e_i e_i^T) = ‖x̌_i‖² ,  i = 2...7
                tr(G(e_i e_j^T + e_j e_i^T)/2) = x̌_i^T x̌_j ,  2 ≤ i < j = 3...7
                X(:, 2:7) = [x̌₂ x̌₃ x̌₄ x̌₅ x̌₆ x̌₇]
                \begin{bmatrix} I & X \\ X^T & G \end{bmatrix} ≽ 0    (520)

where

    Φ_ij = (e_i − e_j)(e_i − e_j)^T ∈ S^N_+    (466)

This semidefinite program realizes the problem illustrated in Figure 67. We take X^⋆(:, 1) as solution, although measurement noise will often cause rank G^⋆ to exceed 2. Randomized search for a rank-2 optimal solution is not so easy here as in Example 4.4.2.2.6. We introduce a method for rank constraint in §6.4.

To formulate the same problem in three dimensions, X is simply redimensioned in the semidefinite program.


[Figure 68: Example of molecular conformation. [69]]

4.4.2.2.8 Example. (Biswas, Nigam, Ye) Molecular Conformation.
The subatomic measurement technique called nuclear magnetic resonance spectroscopy (NMR) is employed to ascertain physical conformation of molecules; e.g., Figure 3, Figure 68. From this technique, distance, angle, and dihedral angle data is obtained. Dihedral angles arise consequent to a phenomenon where atom subsets are physically constrained to Euclidean planes.

  In the rigid covalent geometry approximation, the bond lengths and angles are treated as completely fixed, so that a given spatial structure can be described very compactly indeed by a list of torsion angles alone... These are the dihedral angles between the planes spanned by the two consecutive triples in a chain of four covalently bonded atoms. [53, §1.1]

Crippen & Havel recommend working exclusively with distance data because they consider angle data to be mathematically cumbersome. The present example shows instead how inclusion of dihedral angle data into a problem statement can be made elegant and convex.

As before, ascribe position information to the matrix

    X = [x₁ \cdots x_N] ∈ R^{3×N}    (64)

and introduce a matrix ℵ holding normals η to planes respecting dihedral angles ϕ:

    ℵ ≜ [η₁ \cdots η_M] ∈ R^{3×M}    (521)


As in the other examples, we preferentially work with Gram matrices G because of the bridge they provide between other variables; we define

    \begin{bmatrix} G_ℵ & Z \\ Z^T & G_X \end{bmatrix} ≜ \begin{bmatrix} ℵ^T ℵ & ℵ^T X \\ X^T ℵ & X^T X \end{bmatrix} = \begin{bmatrix} ℵ^T \\ X^T \end{bmatrix} \begin{bmatrix} ℵ & X \end{bmatrix} ∈ R^{N+M×N+M}    (522)

whose rank is 3 by assumption. So our problem's variables are the two Gram matrices and the matrix Z of inner products. Then measurements of distance-square can be expressed as linear constraints on G_X as in (520), dihedral angle measurements can be expressed as linear constraints on G_ℵ as in (502), and the normal-vector conditions can be expressed by vanishing linear constraints on inner-product matrix Z. Consider three points x labelled 1, 2, 3 in the l-th plane and its corresponding normal η_l. Then we may have, for example, the independent constraints

    η_l^T (x₁ − x₂) = 0
    η_l^T (x₂ − x₃) = 0    (523)

expressible in terms of constant symmetric matrices A;

    ⟨Z , A_{l12}⟩ = 0
    ⟨Z , A_{l23}⟩ = 0    (524)

NMR data is noisy, of course, so given measurements are described by upper and lower bounds, although normals η can be constrained to be exactly unit length;

    δ(G_ℵ) = 1    (525)

Then we can express the molecular conformation problem: for 0 ≤ ϕ ≤ π

    find_{G_ℵ ∈ S^M, G_X ∈ S^N, Z ∈ R^{M×N}}  G_X
    subject to  d̲_i ≤ tr(G_X Φ_i) ≤ d̄_i ,  ∀i ∈ I₁
                \underline{\cos ϕ_j} ≤ tr(G_ℵ B_j) ≤ \overline{\cos ϕ_j} ,  ∀j ∈ I₂
                ⟨Z , A_k⟩ = 0 ,  ∀k ∈ I₃
                δ(G_ℵ) = 1
                \begin{bmatrix} G_ℵ & Z \\ Z^T & G_X \end{bmatrix} ≽ 0
                rank \begin{bmatrix} G_ℵ & Z \\ Z^T & G_X \end{bmatrix} = 3    (526)


Ignoring the rank constraint tends to force inner-product matrix Z to zero. What binds these three variables is the rank constraint. We show how to meet the rank constraint in §6.4.

4.4.3 Inner-product form EDM definition

  [p.36] We might, for example, realize a constellation given only interstellar distance (or, equivalently, distance from Earth and relative angular measurement; the Earth as vertex to two stars).

Equivalent to (464) is [257, §1-7] [220, §3.2]

    d_{ij} = d_{ik} + d_{kj} − 2 √(d_{ik} d_{kj}) cos θ_{ikj}
           = \begin{bmatrix} √d_{ik} & √d_{kj} \end{bmatrix} \begin{bmatrix} 1 & −e^{ıθ_{ikj}} \\ −e^{−ıθ_{ikj}} & 1 \end{bmatrix} \begin{bmatrix} √d_{ik} \\ √d_{kj} \end{bmatrix}    (527)

called the law of cosines, where ı ≜ √−1, i, k, j are positive integers, and θ_ikj is the angle at vertex x_k formed by vectors x_i − x_k and x_j − x_k;

    cos θ_{ikj} = \frac{½(d_{ik} + d_{kj} − d_{ij})}{√(d_{ik} d_{kj})} = \frac{(x_i − x_k)^T (x_j − x_k)}{‖x_i − x_k‖ ‖x_j − x_k‖}    (528)

where the numerator forms an inner product of vectors. Distance-square d_ij([√d_ik ; √d_kj]) is a convex quadratic function^{4.15} on R²_+ whereas d_ij(θ_ikj) is quasiconvex (§3.2) minimized over domain −π ≤ θ_ikj ≤ π by θ^⋆_ikj = 0, we get the Pythagorean theorem when θ_ikj = ±π/2, and d_ij(θ_ikj) is maximized when θ^⋆_ikj = ±π;

^{4.15} \begin{bmatrix} 1 & −e^{ıθ_{ikj}} \\ −e^{−ıθ_{ikj}} & 1 \end{bmatrix} ≽ 0, having eigenvalues {0, 2}. Minimum is attained for [√d_ik ; √d_kj] = μ1, μ ≥ 0, when θ_ikj = 0; it is attained only at 0 when −π ≤ θ_ikj ≤ π, θ_ikj ≠ 0. (§D.2.1, [41, exmp.4.5])


    d_{ij} = (√d_{ik} + √d_{kj})² ,  θ_ikj = ±π
    d_{ij} = d_{ik} + d_{kj} ,  θ_ikj = ±π/2    (529)
    d_{ij} = (√d_{ik} − √d_{kj})² ,  θ_ikj = 0

so

    |√d_{ik} − √d_{kj}| ≤ √d_{ij} ≤ √d_{ik} + √d_{kj}    (530)

Hence the triangle inequality, Euclidean metric property 4, holds for any EDM D.

We may construct an inner-product form of the EDM definition for matrices by evaluating (527) for k = 1: By defining

    Θ^T Θ ≜ \begin{bmatrix} d_{12} & √(d_{12}d_{13})\cosθ_{213} & √(d_{12}d_{14})\cosθ_{214} & \cdots & √(d_{12}d_{1N})\cosθ_{21N} \\ √(d_{12}d_{13})\cosθ_{213} & d_{13} & √(d_{13}d_{14})\cosθ_{314} & \cdots & √(d_{13}d_{1N})\cosθ_{31N} \\ √(d_{12}d_{14})\cosθ_{214} & √(d_{13}d_{14})\cosθ_{314} & d_{14} & \ddots & √(d_{14}d_{1N})\cosθ_{41N} \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ √(d_{12}d_{1N})\cosθ_{21N} & √(d_{13}d_{1N})\cosθ_{31N} & √(d_{14}d_{1N})\cosθ_{41N} & \cdots & d_{1N} \end{bmatrix} ∈ S^{N−1}    (531)

then any EDM may be expressed

    D(Θ) ≜ \begin{bmatrix} 0 \\ δ(Θ^T Θ) \end{bmatrix} 1^T + 1 \begin{bmatrix} 0 & δ(Θ^T Θ)^T \end{bmatrix} − 2 \begin{bmatrix} 0 & 0^T \\ 0 & Θ^T Θ \end{bmatrix}
         = \begin{bmatrix} 0 & δ(Θ^T Θ)^T \\ δ(Θ^T Θ) & δ(Θ^T Θ)1^T + 1δ(Θ^T Θ)^T − 2Θ^T Θ \end{bmatrix} ∈ EDM^N    (532)

    EDM^N = { D(Θ) | Θ ∈ R^{N−1×N−1} }    (533)

for which all Euclidean metric properties hold. Entries of Θ^T Θ result from vector inner-products as in (528); id est,


    Θ = [x₂ − x₁ \;\; x₃ − x₁ \;\; \cdots \;\; x_N − x₁] = X √2 V_N ∈ R^{n×N−1}    (534)

Inner product Θ^T Θ is obviously related to a Gram matrix (477),

    G = \begin{bmatrix} 0 & 0^T \\ 0 & Θ^T Θ \end{bmatrix} ,  x₁ = 0    (535)

For D = D(Θ) and no condition on the list X (confer (485) (490))

    Θ^T Θ = −V_N^T D V_N ∈ R^{N−1×N−1}    (536)

4.4.3.1 Relative-angle form

The inner-product form EDM definition is not a unique definition of Euclidean distance matrix; there are approximately five flavors distinguished by their argument to operator D. Here is another one:

Like D(X) (468), D(Θ) will make an EDM given any Θ ∈ R^{n×N−1}; it is likewise not a convex function of Θ (§4.4.3.2), and it is homogeneous in the sense (471). Scrutinizing Θ^T Θ (531) we find that because of the arbitrary choice k = 1, distances therein are all with respect to point x₁. Similarly, relative angles in Θ^T Θ are between all vector pairs having vertex x₁. Yet picking arbitrary θ_i1j to fill Θ^T Θ will not necessarily make an EDM; inner product (531) must be positive semidefinite.

    Θ^T Θ = √δ(d) \, Ω \, √δ(d) ≜ \begin{bmatrix} √d_{12} & & & \\ & √d_{13} & & \\ & & \ddots & \\ & & & √d_{1N} \end{bmatrix} \begin{bmatrix} 1 & \cosθ_{213} & \cdots & \cosθ_{21N} \\ \cosθ_{213} & 1 & \ddots & \cosθ_{31N} \\ \vdots & \ddots & \ddots & \vdots \\ \cosθ_{21N} & \cosθ_{31N} & \cdots & 1 \end{bmatrix} \begin{bmatrix} √d_{12} & & & \\ & √d_{13} & & \\ & & \ddots & \\ & & & √d_{1N} \end{bmatrix}    (537)

Expression D(Θ) defines an EDM for any positive semidefinite relative-angle matrix

    Ω = [\cos θ_{i1j} \,, i, j = 2...N] ∈ S^{N−1}    (538)

and any nonnegative distance vector

    d = [d_{1j} \,, j = 2...N] = δ(Θ^T Θ) ∈ R^{N−1}    (539)


because (§A.3.1.0.5)

    Ω ≽ 0 ⇒ Θ^T Θ ≽ 0    (540)

Decomposition (537) and the relative-angle matrix inequality Ω ≽ 0 lead to a different expression of an inner-product form EDM definition (532)

    D(Ω, d) ≜ \begin{bmatrix} 0 \\ d \end{bmatrix} 1^T + 1 \begin{bmatrix} 0 & d^T \end{bmatrix} − 2 \begin{bmatrix} 0 & 0^T \\ 0 & √δ(d)\,Ω\,√δ(d) \end{bmatrix}
            = \begin{bmatrix} 0 & d^T \\ d & d1^T + 1d^T − 2√δ(d)\,Ω\,√δ(d) \end{bmatrix} ∈ EDM^N    (541)

and another expression of the EDM cone:

    EDM^N = { D(Ω, d) | Ω ≽ 0 , √δ(d) ≽ 0 }    (542)

In the particular circumstance x₁ = 0, we can relate interpoint angle matrix Ψ from the Gram decomposition in (477) to relative-angle matrix Ω in (537). Thus,

    Ψ ≡ \begin{bmatrix} 1 & 0^T \\ 0 & Ω \end{bmatrix} ,  x₁ = 0    (543)

4.4.3.2 Inner-product form −V_N^T D(Θ) V_N convexity

We saw that each EDM entry d_ij is a convex quadratic function of [√d_ik ; √d_kj] and a quasiconvex function of θ_ikj. Here the situation for inner-product form EDM operator D(Θ) (532) is identical to that in §4.4.1 for list-form D(X); −D(Θ) is not a quasiconvex function of Θ by the same reasoning, and from (536)

    −V_N^T D(Θ) V_N = Θ^T Θ    (544)

is a convex quadratic function of Θ on domain R^{n×N−1} achieving its minimum at Θ = 0.
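A numeric sketch of (534)-(536) and the angle identity (528), assuming only base Matlab:

```matlab
n = 2; N = 4;
X  = randn(n,N);
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
D  = diag(X'*X)*ones(1,N) + ones(N,1)*diag(X'*X)' - 2*(X'*X);
Theta = X*sqrt(2)*VN;                     % = [x2-x1 ... xN-x1]  (534)
disp(norm(Theta'*Theta + VN'*D*VN,'fro')) % identity (536), ~ 0
i = 2; j = 3;                             % cos(theta_{i1j}) two ways (528)
u  = X(:,i)-X(:,1);  v = X(:,j)-X(:,1);
c1 = u'*v/(norm(u)*norm(v));
c2 = (D(1,i)+D(1,j)-D(i,j))/(2*sqrt(D(1,i)*D(1,j)));
disp(abs(c1-c2))                          % ~ 0
```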


4.4.3.3 Inner-product form, discussion

We deduce that knowledge of interpoint distance is equivalent to knowledge of distance and angle from the perspective of one point, x₁ in our chosen case. The total amount of information in Θ^T Θ, N(N−1)/2, is unchanged^{4.16} with respect to EDM D.

4.5 Invariance

When D is an EDM, there exist an infinite number of corresponding N-point lists X (64) in Euclidean space. All those lists are related by isometric transformation: rotation, reflection, and translation (offset or shift).

4.5.1 Translation

Any translation common among all the points x_l in a list will be cancelled in the formation of each d_ij. Proof follows directly from (464). Knowing that translation α in advance, we may remove it from the list constituting the columns of X by subtracting α1^T. Then it stands to reason by list-form definition (468) of an EDM, for any translation α ∈ Rⁿ

    D(X − α1^T) = D(X)    (545)

In words, interpoint distances are unaffected by offset; EDM D is translation invariant. When α = x₁ in particular,

    [x₂ − x₁ \;\; x₃ − x₁ \;\; \cdots \;\; x_N − x₁] = X √2 V_N ∈ R^{n×N−1}    (534)

and so

    D(X − x₁1^T) = D(X − Xe₁1^T) = D(X [0 \; √2 V_N]) = D(X)    (546)

^{4.16} The amount of information is O(N²) because the measurements are relative: the use of a fixed reference in the measurement of angles and distances would reduce the required information but is antithetical. In the particular case n = 2, for example, ordering all points x_l in a length-N list by increasing angle of vector x_l − x₁ with respect to x₂ − x₁, θ_i1j becomes equivalent to Σ_{k=i}^{j−1} θ_{k,1,k+1} ≤ 2π and the amount of information is reduced to 2N−3; rather, O(N).


4.5.1.0.1 Example. Translating geometric center to origin.
We might choose to shift the geometric center α_c of an N-point list {x_l} (arranged columnar in X) to the origin; [236] [96]

    α = α_c ≜ X b_c ≜ (1/N) X1 ∈ P ⊆ A    (547)

If we were to associate a point-mass m_l with each of the points x_l in the list, then their center of mass (or gravity) would be (Σ x_l m_l)/Σ m_l. The geometric center is the same as the center of mass under the assumption of uniform mass density across points. [142] The geometric center always lies in the convex hull P of the list; id est, α_c ∈ P because b_c^T 1 = 1 and b_c ≽ 0.^{4.17} Subtracting the geometric center from every list member,

    X − α_c 1^T = X − (1/N) X11^T = X(I − (1/N)11^T) = XV ∈ R^{n×N}    (548)

So we have (confer (468))

    D(X) = D(XV) = δ(V^T X^T X V)1^T + 1δ(V^T X^T X V)^T − 2V^T X^T X V ∈ EDM^N    (549)

4.5.1.1 Gram-form invariance

Consequent to list translation invariance, the Gram-form EDM operator (480) exhibits invariance to translation by a doublet (§B.2) u1^T + 1u^T;

    D(G) = D(G − (u1^T + 1u^T))    (550)

because, for any u ∈ R^N, D(u1^T + 1u^T) = 0. The collection of all such doublets forms the nullspace to the operator (559); the translation-invariant subspace S^{N⊥}_c (1602) of the symmetric matrices S^N. This means matrix G can belong to an expanse more broad than a positive semidefinite cone; id est, G ∈ S^N_+ − S^{N⊥}_c. So explains Gram matrix sufficiency in EDM definition (480).

^{4.17} Any b from α = Xb chosen such that b^T 1 = 1, more generally, makes an auxiliary V-matrix. (§B.4.5)
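Invariances (545) and (549) are quickly checked numerically; a plain-Matlab sketch:

```matlab
n = 3; N = 5;
X = randn(n,N);  a = randn(n,1);
edm = @(Y) diag(Y'*Y)*ones(1,N) + ones(N,1)*diag(Y'*Y)' - 2*(Y'*Y); % (468)
V = eye(N) - ones(N)/N;                           % geometric centering (491)
disp(norm(edm(X) - edm(X - a*ones(1,N)),'fro'))   % translation (545), ~ 0
disp(norm(edm(X) - edm(X*V),'fro'))               % centered list (549), ~ 0
```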


4.5.2 Rotation/Reflection

Rotation of the list X ∈ R^{n×N} about some arbitrary point α ∈ Rⁿ, or reflection through some affine subset containing α, can be accomplished via Q(X − α1^T) where Q is an orthogonal matrix (§B.5).

We rightfully expect

    D(Q(X − α1^T)) = D(QX − β1^T) = D(QX) = D(X)    (551)

Because list-form D(X) is translation invariant, we may safely ignore offset and consider only the impact of matrices that premultiply X. Interpoint distances are unaffected by rotation or reflection; we say, EDM D is rotation/reflection invariant. Proof follows from the fact, Q^T = Q^{−1} ⇒ X^T Q^T Q X = X^T X. So (551) follows directly from (468).

The class of premultiplying matrices for which interpoint distances are unaffected is a little more broad than orthogonal matrices. Looking at EDM definition (468), it appears that any matrix Q_p such that

    X^T Q_p^T Q_p X = X^T X    (552)

will have the property

    D(Q_p X) = D(X)    (553)

An example is skinny Q_p ∈ R^{m×n} (m > n) having orthonormal columns. We call such a matrix orthonormal.

4.5.2.1 Inner-product form invariance

Likewise, D(Θ) (532) is rotation/reflection invariant;

    D(Q_p Θ) = D(QΘ) = D(Θ)    (554)

so (552) and (553) similarly apply.

4.5.3 Invariance conclusion

In the making of an EDM, absolute rotation, reflection, and translation information is lost. Given an EDM, reconstruction of point position (§4.12, the list X) can be guaranteed correct only in affine dimension r and relative position. Given a noiseless complete EDM, this isometric reconstruction is unique in so far as every realization of a corresponding list X is congruent:


4.6 Injectivity of D & unique reconstruction

Injectivity implies uniqueness of isometric reconstruction; hence, we endeavor to demonstrate it.

EDM operators list-form D(X) (468), Gram-form D(G) (480), and inner-product form D(Θ) (532) are many-to-one surjections (§4.5) onto the same range: the EDM cone (§5); (confer (481) (561))

    EDM^N = { D(X) : R^{N−1×N} → S^N_h | X ∈ R^{N−1×N} }
          = { D(G) : S^N → S^N_h | G ∈ S^N_+ − S^{N⊥}_c }    (555)
          = { D(Θ) : R^{N−1×N−1} → S^N_h | Θ ∈ R^{N−1×N−1} }

where (§4.5.1.1)

    S^{N⊥}_c = { u1^T + 1u^T | u ∈ R^N } ⊆ S^N    (1602)

4.6.1 Gram-form bijectivity

Because linear Gram-form EDM operator

    D(G) = δ(G)1^T + 1δ(G)^T − 2G    (480)

has no nullspace [51, §A.1] on the geometric center subspace^{4.18} (§E.7.2.0.2)

    S^N_c ≜ { G ∈ S^N | G1 = 0 }    (1600)
         = { G ∈ S^N | N(G) ⊇ 1 } = { G ∈ S^N | R(G) ⊆ N(1^T) }
         = { V Y V | Y ∈ S^N } ⊂ S^N    (1601)
         ≡ { V_N A V_N^T | A ∈ S^{N−1} }    (556)

then D(G) on that subspace is injective.

^{4.18} The equivalence ≡ in (556) follows from the fact: Given B = V Y V = V_N A V_N^T ∈ S^N_c with only matrix A ∈ S^{N−1} unknown, then V_N^† B V_N^{†T} = A or V_N^† Y V_N^{†T} = A.


[Figure 69: Orthogonal complements in S^N abstractly oriented in isometrically isomorphic R^{N(N+1)/2}; case N = 2 accurately illustrated in R³. Annotations: dim S^N_c = dim S^N_h = N(N−1)/2 and dim S^{N⊥}_c = dim S^{N⊥}_h = N in R^{N(N+1)/2}; basis S^N_c = V{E_ij}V (confer (49)). Orthogonal projection of basis for S^{N⊥}_h on S^{N⊥}_c yields another basis for S^{N⊥}_c. (Basis vectors for S^{N⊥}_c are illustrated lying in a plane orthogonal to S^N_c in this dimension. Basis vectors for each ⊥ space outnumber those for its respective orthogonal complement; such is not the case in higher dimension.)]


To prove injectivity of D(G) on S^N_c: Any matrix Y ∈ S^N can be decomposed into orthogonal components in S^N;

    Y = V Y V + (Y − V Y V)    (557)

where V Y V ∈ S^N_c and Y − V Y V ∈ S^{N⊥}_c (1602). Because of translation invariance (§4.5.1.1) and linearity, D(Y − V Y V) = 0, hence N(D) ⊇ S^{N⊥}_c. It remains only to show

    D(V Y V) = 0 ⇔ V Y V = 0    (558)

(⇔ Y = u1^T + 1u^T for some u ∈ R^N). D(V Y V) will vanish whenever 2V Y V = δ(V Y V)1^T + 1δ(V Y V)^T. But this implies R(1) (§B.2) were a subset of R(V Y V), which is contradictory. Thus we have

    N(D) = { Y | D(Y) = 0 } = { Y | V Y V = 0 } = S^{N⊥}_c    (559)

Since G1 = 0 ⇔ X1 = 0 (489) simply means list X is geometrically centered at the origin, and because the Gram-form EDM operator D is translation invariant and N(D) is the translation-invariant subspace S^{N⊥}_c, then EDM definition D(G) (555) on^{4.19} (confer §5.6.1, §5.7.1, §A.7.3.1)

    S^N_c ∩ S^N_+ = { V Y V ≽ 0 | Y ∈ S^N } ≡ { V_N A V_N^T | A ∈ S^{N−1}_+ } ⊂ S^N    (560)

must be surjective onto EDM^N; (confer (481))

    EDM^N = { D(G) | G ∈ S^N_c ∩ S^N_+ }    (561)

4.6.1.1 Gram-form operator D inversion

Define the linear geometric centering operator V; (confer (490))

    V(D) : S^N → S^N ≜ −½ V D V    (562)

[54, §4.3]^{4.20} This orthogonal projector V has no nullspace on

    S^N_h = aff EDM^N    (808)

^{4.19} Equivalence ≡ in (560) follows from the fact: Given B = V Y V = V_N A V_N^T ∈ S^N_+ with only matrix A unknown, then V_N^† B V_N^{†T} = A and A ∈ S^{N−1}_+ must be positive semidefinite by positive semidefiniteness of B and Corollary A.3.1.0.5.
^{4.20} Critchley cites Torgerson (1958) [233, ch.11, §2] for a history and derivation of (562).


because the projection of −D/2 on S^N_c (1600) can be 0 if and only if D ∈ S^{N⊥}_c; but S^{N⊥}_c ∩ S^N_h = 0 (Figure 69). Projector V on S^N_h is therefore injective, hence invertible. Further, −½ V S^N_h V is equivalent to the geometric center subspace S^N_c in the ambient space of symmetric matrices; a surjection,

    S^N_c = V(S^N) = V(S^N_h ⊕ S^{N⊥}_h) = V(S^N_h)    (563)

because (61)

    V(S^N_h) ⊇ V(S^{N⊥}_h) = V(δ²(S^N))    (564)

Because D(G) on S^N_c is injective, and aff D(V(EDM^N)) = D(V(aff EDM^N)) by property (103) of the affine hull, we find for D ∈ S^N_h (confer (494))

    D(−½ V D V) = δ(−½ V D V)1^T + 1δ(−½ V D V)^T − 2(−½ V D V)    (565)

id est,

    D = D(V(D))    (566)
    −V D V = V(D(−V D V))    (567)

or

    S^N_h = D(V(S^N_h))    (568)
    −V S^N_h V = V(D(−V S^N_h V))    (569)

These operators V and D are mutual inverses.

The Gram-form D(S^N_c) (480) is equivalent to S^N_h;

    D(S^N_c) = D(V(S^N_h ⊕ S^{N⊥}_h)) = S^N_h + D(V(S^{N⊥}_h)) = S^N_h    (570)

because S^N_h ⊇ D(V(S^{N⊥}_h)). In summary, for the Gram-form we have the isomorphisms [55, §2] [54, p.76, p.107] [5, §2.1]^{4.21} [4, §2] [6, §18.2.1] [1, §2.1]

    S^N_h = D(S^N_c)    (571)
    S^N_c = V(S^N_h)    (572)

and from the bijectivity results in §4.6.1,

    EDM^N = D(S^N_c ∩ S^N_+)    (573)
    S^N_c ∩ S^N_+ = V(EDM^N)    (574)

^{4.21} [5, p.6, line 20] Delete sentence: Since G is also...not a singleton set. [5, p.10, line 11] x₃ = 2 (not 1).
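Mutual inversion (566) is likewise one line of numerics; a sketch for a randomly generated EDM:

```matlab
N = 5;  X = randn(2,N);
D = diag(X'*X)*ones(1,N) + ones(N,1)*diag(X'*X)' - 2*(X'*X);
V = eye(N) - ones(N)/N;
G = -V*D*V/2;                                          % V(D)  (562)
Dback = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;  % D(G)  (480)
disp(norm(D - Dback,'fro'))                            % D = D(V(D)) (566)
```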


4.6.2 Inner-product form bijectivity

The Gram-form EDM operator D(G) = δ(G)1^T + 1δ(G)^T − 2G (480) is an injective map, for example, on the domain that is the subspace of symmetric matrices having all zeros in the first row and column

    S^N_1 ≜ { G ∈ S^N | G e₁ = 0 } = { \begin{bmatrix} 0 & 0^T \\ 0 & I \end{bmatrix} Y \begin{bmatrix} 0 & 0^T \\ 0 & I \end{bmatrix} | Y ∈ S^N }    (1604)

because it obviously has no nullspace there. Since G e₁ = 0 ⇔ X e₁ = 0 (482) means the first point in the list X resides at the origin, then D(G) on S^N_1 ∩ S^N_+ must be surjective onto EDM^N.

Substituting Θ^T Θ ← −V_N^T D V_N (544) into inner-product form EDM definition D(Θ) (532), it may be further decomposed: (confer (488))

    D(D) = \begin{bmatrix} 0 \\ δ(−V_N^T D V_N) \end{bmatrix} 1^T + 1 \begin{bmatrix} 0 & δ(−V_N^T D V_N)^T \end{bmatrix} − 2 \begin{bmatrix} 0 & 0^T \\ 0 & −V_N^T D V_N \end{bmatrix}    (575)

This linear operator D is another flavor of inner-product form and an injective map of the EDM cone onto itself. Yet when its domain is instead the entire symmetric hollow subspace S^N_h = aff EDM^N, D(D) becomes an injective map onto that same subspace. Proof follows directly from the fact: linear D has no nullspace [51, §A.1] on S^N_h = aff D(EDM^N) = D(aff EDM^N) (103).

4.6.2.1 Inversion of D(−V_N^T D V_N)

Injectivity of D(D) suggests inversion of (confer (485))

    V_N(D) : S^N → S^{N−1} ≜ −V_N^T D V_N    (576)

a linear surjective^{4.22} mapping onto S^{N−1} having nullspace^{4.23} S^{N⊥}_c;

    V_N(S^N_h) = S^{N−1}    (577)

^{4.22} Surjectivity of V_N(D) is demonstrated via the Gram-form EDM operator D(G): Since S^N_h = D(S^N_c) (570), then for any Y ∈ S^{N−1}, −V_N^T D(V_N^{†T} Y V_N^† /2) V_N = Y.
^{4.23} N(V_N) ⊇ S^{N⊥}_c is apparent. There exists a linear mapping

    T(V_N(D)) ≜ V_N^{†T} V_N(D) V_N^† = −½ V D V = V(D)


injective on domain S^N_h because S^{N⊥}_c ∩ S^N_h = 0. Revising the argument of this inner-product form (575), we get another flavor

    D(−V_N^T D V_N) = \begin{bmatrix} 0 \\ δ(−V_N^T D V_N) \end{bmatrix} 1^T + 1 \begin{bmatrix} 0 & δ(−V_N^T D V_N)^T \end{bmatrix} − 2 \begin{bmatrix} 0 & 0^T \\ 0 & −V_N^T D V_N \end{bmatrix}    (578)

and we obtain mutual inversion of operators V_N and D, for D ∈ S^N_h

    D = D(V_N(D))    (579)
    −V_N^T D V_N = V_N(D(−V_N^T D V_N))    (580)

or

    S^N_h = D(V_N(S^N_h))    (581)
    −V_N^T S^N_h V_N = V_N(D(−V_N^T S^N_h V_N))    (582)

Substituting Θ^T Θ ← Φ into inner-product form EDM definition (532), any EDM may be expressed by the new flavor

    D(Φ) ≜ \begin{bmatrix} 0 \\ δ(Φ) \end{bmatrix} 1^T + 1 \begin{bmatrix} 0 & δ(Φ)^T \end{bmatrix} − 2 \begin{bmatrix} 0 & 0^T \\ 0 & Φ \end{bmatrix} ∈ EDM^N  ⇔  Φ ≽ 0    (583)

where this D is a linear surjective operator onto EDM^N by definition, injective because it has no nullspace on domain S^{N−1}_+. More broadly, aff D(S^{N−1}_+) = D(aff S^{N−1}_+) (103),

    S^N_h = D(S^{N−1})
    S^{N−1} = V_N(S^N_h)    (584)

demonstrably isomorphisms, and by bijectivity of this inner-product form:

    EDM^N = D(S^{N−1}_+)    (585)
    S^{N−1}_+ = V_N(EDM^N)    (586)

such that

    N(T(V_N)) = N(V) ⊇ N(V_N) ⊇ S^{N⊥}_c = N(V)

where the equality S^{N⊥}_c = N(V) is known (§E.7.2.0.2).


4.7 Embedding in affine hull

The affine hull A (66) of a point list {x_l} (arranged columnar in X ∈ R^{n×N} (64)) is identical to the affine hull of that polyhedron P (71) formed from all convex combinations of the x_l; [41, §2] [202, §17]

    A = aff X = aff P    (587)

Comparing hull definitions (66) and (71), it becomes obvious that the x_l and their convex hull P are embedded in their unique affine hull A;

    A ⊇ P ⊇ {x_l}    (588)

Recall: affine dimension r is a lower bound on embedding, equal to dimension of the subspace parallel to that nonempty affine set A in which the points are embedded. (§2.3.1) We define dimension of the convex hull P to be the same as dimension r of the affine hull A [202, §2], but r is not necessarily equal to the rank of X (607).

For the particular example illustrated in Figure 58, P is the triangle plus its relative interior while its three vertices constitute the entire list X. The affine hull A is the unique plane that contains the triangle, so r = 2 in that example while the rank of X is 3. Were there only two points in Figure 58, then the affine hull would instead be the unique line passing through them; r would become 1 while the rank would then be 2.

4.7.1 Determining affine dimension

Knowledge of affine dimension r becomes important because we lose any absolute offset common to all the generating x_l in Rⁿ when reconstructing convex polyhedra given only distance information. (§4.5.1) To calculate r, we first remove any offset that serves to increase dimensionality of the subspace required to contain polyhedron P; subtracting any α ∈ A in the affine hull from every list member will work,

    X − α1^T    (589)

translating A to the origin:^{4.24}

    A − α = aff(X − α1^T) = aff(X) − α    (590)
    P − α = conv(X − α1^T) = conv(X) − α    (591)

^{4.24} The manipulation of hull functions aff and conv follows from their definitions.


Because (587) and (588) translate,

    Rⁿ ⊇ A − α = aff(X − α1^T) = aff(P − α) ⊇ P − α ⊇ {x_l − α}    (592)

where from the previous relations it is easily shown

    aff(P − α) = aff(P) − α    (593)

Translating A neither changes its dimension nor the dimension of the embedded polyhedron P; (65)

    r ≜ dim A = dim(A − α) ≜ dim(P − α) = dim P    (594)

For any α ∈ Rⁿ, (590)-(594) remain true. [202, p.4, p.12] Yet when α ∈ A, the affine set A − α becomes a unique subspace of Rⁿ in which the {x_l − α} and their convex hull P − α are embedded (592), and whose dimension is more easily calculated.

4.7.1.0.1 Example. Translating first list-member to origin.
Subtracting the first member α ≜ x₁ from every list member will translate their affine hull A and their convex hull P and, in particular, x₁ ∈ P ⊆ A to the origin in Rⁿ; videlicet,

    X − x₁1^T = X − Xe₁1^T = X(I − e₁1^T) = X [0 \; √2 V_N] ∈ R^{n×N}    (595)

where V_N is defined in (474), and e₁ in (484). Applying (592) to (595),

    Rⁿ ⊇ R(X V_N) = A − x₁ = aff(X − x₁1^T) = aff(P − x₁) ⊇ P − x₁ ∋ 0    (596)

where X V_N ∈ R^{n×N−1}. Hence

    r = dim R(X V_N)    (597)

Since shifting the geometric center to the origin (§4.5.1.0.1) translates the affine hull to the origin as well, then it must also be true

    r = dim R(XV)    (598)
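A sketch of (597)-(598) in plain Matlab: affine dimension is the rank of XV_N (equivalently XV), which can be strictly less than rank X; three standard-basis points make the point, echoing the Figure 58 discussion:

```matlab
X  = eye(3);                             % three points in R^3; rank X = 3
N  = size(X,2);
VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);   % (474)
V  = eye(N) - ones(N)/N;                 % (491)
disp([rank(X) rank(X*VN) rank(X*V)])     % [3 2 2]; affine dimension r = 2
```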


For any matrix whose range is R(V) = N(1^T) we get the same result; e.g.,

    r = dim R(X V_N^{†T})    (599)

because

    R(XV) = { Xz | z ∈ N(1^T) }    (600)

and R(V) = R(V_N) = R(V_N^{†T}) (§E). These auxiliary matrices (§B.4.2) are more closely related;

    V = V_N V_N^†    (1281)

4.7.1.1 Affine dimension r versus rank

Now, suppose D is an EDM as defined by

    D(X) ≜ δ(X^T X)1^T + 1δ(X^T X)^T − 2X^T X ∈ EDM^N    (468)

and we premultiply by −V_N^T and postmultiply by V_N. Then because V_N^T 1 = 0 (475), it is always true that

    −V_N^T D V_N = 2 V_N^T X^T X V_N = 2 V_N^T G V_N ∈ S^{N−1}    (601)

where G is a Gram matrix. Similarly pre- and postmultiplying by V (confer (490))

    −V D V = 2 V X^T X V = 2 V G V ∈ S^N    (602)

always holds because V1 = 0 (1271). Likewise, multiplying inner-product form EDM definition (532), it always holds:

    −V_N^T D V_N = Θ^T Θ ∈ S^{N−1}    (536)

For any matrix A, rank A^T A = rank A = rank A^T. [131, §0.4]^{4.25} So, by (600), affine dimension

    r = rank XV = rank X V_N = rank X V_N^{†T} = rank Θ
      = rank V D V = rank V G V = rank V_N^T D V_N = rank V_N^T G V_N    (603)

^{4.25} For A ∈ R^{m×n}, N(A^T A) = N(A). [220, §3.3]


By conservation of dimension, (§A.7.2.0.1)

  $r + \dim\mathcal{N}(V_N^TDV_N) = N-1$   (604)

  $r + \dim\mathcal{N}(VDV) = N$   (605)

For $D\in\mathrm{EDM}^N$,

  $-V_N^TDV_N \succ 0 \Leftrightarrow r = N-1$   (606)

but $-VDV \nsucc 0$. The general fact^{4.26} (confer (501))

  $r \le \min\{n,\ N-1\}$   (607)

is evident from (595) but can be visualized in the example illustrated in Figure 58. There we imagine a vector from the origin to each point in the list. Those three vectors are linearly independent in $\mathbb{R}^3$, but affine dimension $r$ is 2 because the three points lie in a plane. When that plane is translated to the origin, it becomes the only subspace of dimension $r=2$ that can contain the translated triangular polyhedron.

4.7.2 Précis

We collect expressions for affine dimension: for list $X\in\mathbb{R}^{n\times N}$ and Gram matrix $G\in\mathbb{S}^N_+$, with $D\in\mathrm{EDM}^N$,

  $r \triangleq \dim(\mathcal{P}-\alpha) = \dim\mathcal{P} = \dim\operatorname{conv} X$
  $\ \ = \dim(\mathcal{A}-\alpha) = \dim\mathcal{A} = \dim\operatorname{aff} X$
  $\ \ = \operatorname{rank}(X - x_1 1^T) = \operatorname{rank}(X - \alpha_c 1^T)$
  $\ \ = \operatorname{rank}\Theta$ (534)
  $\ \ = \operatorname{rank} XV_N = \operatorname{rank} XV = \operatorname{rank} XV_N^{\dagger T}$
  $\ \ = \operatorname{rank} X\,,\quad Xe_1 = 0\ \text{or}\ X1 = 0$
  $\ \ = \operatorname{rank} V_N^TGV_N = \operatorname{rank} VGV = \operatorname{rank} V_N^{\dagger}GV_N$
  $\ \ = \operatorname{rank} G\,,\quad Ge_1 = 0$ (485) $\text{or}\ G1 = 0$ (490)
  $\ \ = \operatorname{rank} V_N^TDV_N = \operatorname{rank} VDV = \operatorname{rank} V_N^{\dagger}DV_N = \operatorname{rank} V_N(V_N^TDV_N)V_N^T$
  $\ \ = \operatorname{rank}\Lambda$ (692)
  $\ \ = N-1 - \dim\mathcal{N}\!\left(\begin{bmatrix}0 & 1^T\\ 1 & -D\end{bmatrix}\right) = \operatorname{rank}\begin{bmatrix}0 & 1^T\\ 1 & -D\end{bmatrix} - 2$ (615)
   (608)

^{4.26} $\operatorname{rank}X \le \min\{n,\ N\}$


4.7.3 Eigenvalues of $-VDV$ versus $-V_N^{\dagger}DV_N$

Suppose for $D\in\mathrm{EDM}^N$ we are given eigenvectors $v_i\in\mathbb{R}^N$ of $-VDV$ and corresponding eigenvalues $\lambda\in\mathbb{R}^N$ so that

  $-VDV\,v_i = \lambda_i v_i\,,\quad i = 1\ldots N$   (609)

From these we can determine the eigenvectors and eigenvalues of $-V_N^{\dagger}DV_N$: Define

  $\nu_i \triangleq V_N^{\dagger}v_i\,,\quad \lambda_i \ne 0$   (610)

Then we have

  $-VDV_NV_N^{\dagger}v_i = \lambda_i v_i$   (611)

  $-V_N^{\dagger}VDV_N\,\nu_i = \lambda_i V_N^{\dagger}v_i$   (612)

  $-V_N^{\dagger}DV_N\,\nu_i = \lambda_i\nu_i$   (613)

The eigenvectors of $-V_N^{\dagger}DV_N$ are given by (610) while its corresponding nonzero eigenvalues are identical to those of $-VDV$, although $-V_N^{\dagger}DV_N$ is not necessarily positive semidefinite. In contrast, $-V_N^TDV_N$ is positive semidefinite but its nonzero eigenvalues are generally different.

4.7.3.0.1 Theorem. EDM rank versus affine dimension r. [96, §3] [114, §3] [95, §3]
For $D\in\mathrm{EDM}^N$ (confer (763)):

1. $r = \operatorname{rank}(D) - 1 \Leftrightarrow 1^TD^{\dagger}1 \ne 0$. Points constituting a list $X$ generating the polyhedron corresponding to $D$ lie on the relative boundary of an $r$-dimensional circumhypersphere having

  diameter $= \sqrt{2}\,(1^TD^{\dagger}1)^{-1/2}$,  circumcenter $= \dfrac{XD^{\dagger}1}{1^TD^{\dagger}1}$   (614)

2. $r = \operatorname{rank}(D) - 2 \Leftrightarrow 1^TD^{\dagger}1 = 0$. There can be no circumhypersphere whose relative boundary contains a generating list for the corresponding polyhedron.

3. In Cayley-Menger form [66, §6.2] [53, §3.3] [34, §40] (§4.11.2),

  $r = N-1 - \dim\mathcal{N}\!\left(\begin{bmatrix}0 & 1^T\\ 1 & -D\end{bmatrix}\right) = \operatorname{rank}\begin{bmatrix}0 & 1^T\\ 1 & -D\end{bmatrix} - 2$   (615)

Circumhyperspheres exist for $r < \operatorname{rank}(D)-2$. [232, §7]  ⋄
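The eigenpair correspondence (609)-(613) is easy to confirm numerically. A minimal sketch, assuming a hypothetical random list:

% Sketch: nonzero eigenpairs of -V*D*V map to eigenpairs of -pinv(VN)*D*VN
% via nu = pinv(VN)*v (610)-(613); hypothetical random list.
N = 5;  X = randn(3,N);
G = X'*X;  D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;   % (468)
V  = eye(N) - ones(N)/N;
VN = (1/sqrt(2))*[-ones(1,N-1); eye(N-1)];
[Q,L] = eig(-V*D*V);  lam = diag(L);
[~,i] = max(lam);  v = Q(:,i);          % any eigenpair with lam(i) ~= 0
nu = pinv(VN)*v;                        % (610)
norm(-pinv(VN)*D*VN*nu - lam(i)*nu)     % ~ 0, confirming (613)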


For all practical purposes,

  $\max\{0,\ \operatorname{rank}(D)-2\} \le r \le \min\{n,\ N-1\}$   (616)

4.8 Euclidean metric versus matrix criteria

4.8.1 Nonnegativity property 1

When $D=[d_{ij}]$ is an EDM (468), then it is apparent from (601) that

  $2V_N^TX^TXV_N = -V_N^TDV_N \succeq 0$   (617)

because for any matrix $A$, $A^TA\succeq 0$.^{4.27} We claim nonnegativity of the $d_{ij}$ is enforced primarily by the matrix inequality (617); id est,

  $\left.\begin{array}{r} -V_N^TDV_N \succeq 0\\ D\in\mathbb{S}^N_h \end{array}\right\} \Rightarrow d_{ij} \ge 0\,,\ i\ne j$   (618)

(The matrix inequality to enforce strict positivity differs by a stroke of the pen. (621))

We now support our claim: If any matrix $A\in\mathbb{R}^{m\times m}$ is positive semidefinite, then its main diagonal $\delta(A)\in\mathbb{R}^m$ must have all nonnegative entries. [93, §4.2] Given $D\in\mathbb{S}^N_h$,

  $-V_N^TDV_N = \begin{bmatrix} d_{12} & \frac{1}{2}(d_{12}+d_{13}-d_{23}) & \cdots & \frac{1}{2}(d_{12}+d_{1N}-d_{2N})\\[2pt] \frac{1}{2}(d_{12}+d_{13}-d_{23}) & d_{13} & \cdots & \frac{1}{2}(d_{13}+d_{1N}-d_{3N})\\ \vdots & \vdots & \ddots & \vdots\\ \frac{1}{2}(d_{12}+d_{1N}-d_{2N}) & \frac{1}{2}(d_{13}+d_{1N}-d_{3N}) & \cdots & d_{1N} \end{bmatrix}$

with generic entry $\frac{1}{2}(d_{1,i+1}+d_{1,j+1}-d_{i+1,j+1})$ in row $i$, column $j$;

  $-V_N^TDV_N = \tfrac{1}{2}\left(1D_{1,2:N} + D_{2:N,1}1^T - D_{2:N,2:N}\right) \in \mathbb{S}^{N-1}$   (619)

^{4.27} For $A\in\mathbb{R}^{m\times n}$, $A^TA\succeq 0 \Leftrightarrow y^TA^TAy = \|Ay\|^2 \ge 0$ for all $\|y\|=1$. When $A$ is full-rank skinny-or-square, $A^TA\succ 0$.


where row, column indices $i,j\in\{1\ldots N-1\}$. [207] It follows:

  $\left.\begin{array}{r}-V_N^TDV_N \succeq 0\\ D\in\mathbb{S}^N_h\end{array}\right\} \Rightarrow \delta(-V_N^TDV_N) = \begin{bmatrix}d_{12}\\ d_{13}\\ \vdots\\ d_{1N}\end{bmatrix} \succeq 0$   (620)

Multiplication of $V_N$ by any permutation matrix $\Xi$ has null effect on its range and nullspace. In other words, any permutation of the rows or columns of $V_N$ produces a basis for $\mathcal{N}(1^T)$; id est, $\mathcal{R}(\Xi_r V_N) = \mathcal{R}(V_N\Xi_c) = \mathcal{R}(V_N) = \mathcal{N}(1^T)$. Hence, $-V_N^TDV_N \succeq 0 \Leftrightarrow -V_N^T\Xi_r^TD\Xi_rV_N \succeq 0$ ($\Leftrightarrow -\Xi_c^TV_N^TDV_N\Xi_c \succeq 0$). Various permutation matrices^{4.28} will sift the remaining $d_{ij}$ similarly to (620), thereby proving their nonnegativity. Hence $-V_N^TDV_N \succeq 0$ is a sufficient test for the first property (§4.2) of the Euclidean metric, nonnegativity.

When affine dimension $r$ equals 1, in particular, nonnegativity, symmetry, and hollowness become necessary and sufficient criteria satisfying matrix inequality (617). (§5.6.0.0.1)

4.8.1.1 Strict positivity

Should we require the points in $\mathbb{R}^n$ to be distinct, then entries of $D$ off the main diagonal must be strictly positive $\{d_{ij} > 0\,,\ i\ne j\}$ and only those entries along the main diagonal of $D$ are 0. By similar argument, the strict matrix inequality is a sufficient test for strict positivity of Euclidean distance-square;

  $\left.\begin{array}{r}-V_N^TDV_N \succ 0\\ D\in\mathbb{S}^N_h\end{array}\right\} \Rightarrow d_{ij} > 0\,,\ i\ne j$   (621)

^{4.28} The rule of thumb is: If $\Xi_r(i,1)=1$, then $\delta(-V_N^T\Xi_r^TD\Xi_rV_N)\in\mathbb{R}^{N-1}$ is some permutation of the $i$th row or column of $D$ excepting the 0 entry from the main diagonal.
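The diagonal identity (620) and the permutation sifting of footnote 4.28 are both one-liners to check. A sketch on a hypothetical list:

% Sketch of (620): the main diagonal of -VN'*D*VN recovers row 1 of D;
% reordering the points sifts another row (footnote 4.28). Hypothetical list.
N = 4;  X = randn(2,N);
G = X'*X;  D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
VN = (1/sqrt(2))*[-ones(1,N-1); eye(N-1)];
diag(-VN'*D*VN)'          % = [d12 d13 d14]
p = [2 1 3 4];            % reorder so old point 2 comes first
diag(-VN'*D(p,p)*VN)'     % = [d12 d23 d24]: row 2 of D sifted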


4.8.2 Triangle inequality property 4

In light of Kreyszig's observation [147, §1.1, prob.15] that properties 2 through 4 of the Euclidean metric (§4.2) together imply property 1, the nonnegativity criterion (618) suggests that the matrix inequality $-V_N^TDV_N \succeq 0$ might somehow take on the role of triangle inequality; id est,

  $\left.\begin{array}{r}\delta(D)=0\\ D^T=D\\ -V_N^TDV_N\succeq 0\end{array}\right\} \Rightarrow \sqrt{d_{ij}} \le \sqrt{d_{ik}} + \sqrt{d_{kj}}\,,\quad i\ne j\ne k$   (622)

We now show that is indeed the case: Let $T$ be the leading principal submatrix in $\mathbb{S}^2$ of $-V_N^TDV_N$ (upper left 2×2 submatrix from (619));

  $T \triangleq \begin{bmatrix} d_{12} & \frac{1}{2}(d_{12}+d_{13}-d_{23})\\[2pt] \frac{1}{2}(d_{12}+d_{13}-d_{23}) & d_{13}\end{bmatrix}$   (623)

Submatrix $T$ must be positive (semi)definite whenever $-V_N^TDV_N$ is. (§A.3.1.0.4, §4.8.3) Now we have

  $-V_N^TDV_N \succeq 0 \Rightarrow T\succeq 0 \Leftrightarrow \lambda_1 \ge \lambda_2 \ge 0$
  $-V_N^TDV_N \succ 0 \Rightarrow T\succ 0 \Leftrightarrow \lambda_1 > \lambda_2 > 0$   (624)

where $\lambda_1$ and $\lambda_2$ are the eigenvalues of $T$, real due only to symmetry of $T$:

  $\lambda_1 = \tfrac{1}{2}\left(d_{12} + d_{13} + \sqrt{d_{23}^2 - 2(d_{12}+d_{13})d_{23} + 2(d_{12}^2 + d_{13}^2)}\right) \in \mathbb{R}$
  $\lambda_2 = \tfrac{1}{2}\left(d_{12} + d_{13} - \sqrt{d_{23}^2 - 2(d_{12}+d_{13})d_{23} + 2(d_{12}^2 + d_{13}^2)}\right) \in \mathbb{R}$   (625)

Nonnegativity of eigenvalue $\lambda_1$ is guaranteed by only nonnegativity of the $d_{ij}$, which in turn is guaranteed by matrix inequality (618). Inequality between the eigenvalues in (624) follows from only realness of the $d_{ij}$. Since $\lambda_1$ always equals or exceeds $\lambda_2$, conditions for the positive (semi)definiteness of submatrix $T$ can be completely determined by examining $\lambda_2$, the smaller of its two eigenvalues. A triangle inequality is made apparent when we express $T$ eigenvalue nonnegativity in terms of $D$ matrix entries; videlicet,


  $T\succeq 0 \Leftrightarrow \det T = \lambda_1\lambda_2 \ge 0\,,\ d_{12},d_{13}\ge 0$  (c)
  $\ \ \Leftrightarrow \lambda_2 \ge 0$  (b)
  $\ \ \Leftrightarrow |\sqrt{d_{12}} - \sqrt{d_{23}}| \le \sqrt{d_{13}} \le \sqrt{d_{12}} + \sqrt{d_{23}}$  (a)   (626)

Triangle inequality (626a) (confer (530) (638)), in terms of three rooted entries from $D$, is equivalent to metric property 4:

  $\sqrt{d_{13}} \le \sqrt{d_{12}} + \sqrt{d_{23}}$
  $\sqrt{d_{23}} \le \sqrt{d_{12}} + \sqrt{d_{13}}$
  $\sqrt{d_{12}} \le \sqrt{d_{13}} + \sqrt{d_{23}}$   (627)

for the corresponding points $x_1, x_2, x_3$ from some length-$N$ list.^{4.29}

4.8.2.1 Comment

Given $D$ whose dimension $N$ equals or exceeds 3, there are $N!/(3!(N-3)!)$ distinct triangle inequalities in total like (530) that must be satisfied, of which each $d_{ij}$ is involved in $N-2$, and each point $x_i$ is in $(N-1)!/(2!(N-1-2)!)$. We have so far revealed only one of those triangle inequalities; namely, (626a), which came from $T$ (623). Yet we claim: if $-V_N^TDV_N \succeq 0$ then all triangle inequalities will be satisfied simultaneously;

  $|\sqrt{d_{ik}} - \sqrt{d_{kj}}| \le \sqrt{d_{ij}} \le \sqrt{d_{ik}} + \sqrt{d_{kj}}\,,\quad i\ne j\ne k$   (628)
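Equivalence of $\lambda_2 \ge 0$ with (626a) is simple to verify numerically. A sketch on hypothetical squared distances:

% Sketch of (625)-(626): lambda2 >= 0 iff the triangle inequality holds
% among sqrt(d12), sqrt(d13), sqrt(d23); hypothetical squared distances.
d12 = 1; d13 = 4; d23 = 2;       % rooted distances: 1, 2, 1.414
s = sqrt(d23^2 - 2*(d12+d13)*d23 + 2*(d12^2+d13^2));
lambda2 = (d12 + d13 - s)/2      % = 0.3787 >= 0
ok = abs(sqrt(d12)-sqrt(d23)) <= sqrt(d13) && sqrt(d13) <= sqrt(d12)+sqrt(d23)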


4.8.2.1.1 Shore. The columns of $\Xi_rV_N\Xi_c$ hold a basis for $\mathcal{N}(1^T)$ when $\Xi_r$ and $\Xi_c$ are permutation matrices. In other words, any permutation of the rows or columns of $V_N$ leaves its range and nullspace unchanged; id est, $\mathcal{R}(\Xi_rV_N\Xi_c) = \mathcal{R}(V_N) = \mathcal{N}(1^T)$ (475). Hence, two distinct matrix inequalities can be equivalent tests of the positive semidefiniteness of $D$ on $\mathcal{R}(V_N)$; id est, $-V_N^TDV_N \succeq 0 \Leftrightarrow -(\Xi_rV_N\Xi_c)^TD(\Xi_rV_N\Xi_c) \succeq 0$. By properly choosing permutation matrices,^{4.30} the leading principal submatrix $T_\Xi\in\mathbb{S}^2$ of $-(\Xi_rV_N\Xi_c)^TD(\Xi_rV_N\Xi_c)$ may be loaded with the entries of $D$ needed to test any particular triangle inequality (similarly to (619)-(626)). Because all the triangle inequalities can be individually tested using a test equivalent to the lone matrix inequality $-V_N^TDV_N\succeq 0$, it logically follows that the lone matrix inequality tests all those triangle inequalities simultaneously. We conclude that $-V_N^TDV_N\succeq 0$ is a sufficient test for the fourth property of the Euclidean metric, triangle inequality.

4.8.2.2 Strict triangle inequality

Without exception, all the inequalities in (626) and (627) can be made strict while their corresponding implications remain true. The then strict inequality (626a) or (627) may be interpreted as a strict triangle inequality under which collinear arrangement of points is not allowed. [145, §24/6, p.322] Hence, by similar reasoning, $-V_N^TDV_N\succ 0$ is a sufficient test of all the strict triangle inequalities; id est,

  $\left.\begin{array}{r}\delta(D)=0\\ D^T=D\\ -V_N^TDV_N\succ 0\end{array}\right\} \Rightarrow \sqrt{d_{ij}} < \sqrt{d_{ik}} + \sqrt{d_{kj}}\,,\quad i\ne j\ne k$   (629)

4.8.3 $-V_N^TDV_N$ nesting

From (623) observe that $T = -V_N^TDV_N|_{N\leftarrow 3}$. In fact, for $D\in\mathrm{EDM}^N$, the leading principal submatrices of $-V_N^TDV_N$ form a nested sequence (by inclusion) whose members are individually positive semidefinite [93] [131] [220] and have the same form as $T$; videlicet,^{4.31}

^{4.30} To individually test triangle inequality $|\sqrt{d_{ik}} - \sqrt{d_{kj}}| \le \sqrt{d_{ij}} \le \sqrt{d_{ik}} + \sqrt{d_{kj}}$ for particular $i,k,j$, set $\Xi_r(i,1) = \Xi_r(k,2) = \Xi_r(j,3) = 1$ and $\Xi_c = I$.
^{4.31} $-VDV|_{N\leftarrow 1} = 0 \in \mathbb{S}^0_+$ (§B.4.1)


  $-V_N^TDV_N|_{N\leftarrow 1} = [\,\emptyset\,]$  (o)

  $-V_N^TDV_N|_{N\leftarrow 2} = [\,d_{12}\,] \in \mathbb{S}_+$  (a)

  $-V_N^TDV_N|_{N\leftarrow 3} = \begin{bmatrix} d_{12} & \frac{1}{2}(d_{12}+d_{13}-d_{23})\\[2pt] \frac{1}{2}(d_{12}+d_{13}-d_{23}) & d_{13}\end{bmatrix} = T \in \mathbb{S}^2_+$  (b)

  $-V_N^TDV_N|_{N\leftarrow 4} = \begin{bmatrix} d_{12} & \frac{1}{2}(d_{12}+d_{13}-d_{23}) & \frac{1}{2}(d_{12}+d_{14}-d_{24})\\[2pt] \frac{1}{2}(d_{12}+d_{13}-d_{23}) & d_{13} & \frac{1}{2}(d_{13}+d_{14}-d_{34})\\[2pt] \frac{1}{2}(d_{12}+d_{14}-d_{24}) & \frac{1}{2}(d_{13}+d_{14}-d_{34}) & d_{14}\end{bmatrix}$  (c)

  $\vdots$

  $-V_N^TDV_N|_{N\leftarrow i} = \begin{bmatrix} -V_N^TDV_N|_{N\leftarrow i-1} & \nu(i)\\ \nu(i)^T & d_{1i}\end{bmatrix} \in \mathbb{S}^{i-1}_+$  (d)

  $\vdots$

  $-V_N^TDV_N = \begin{bmatrix} -V_N^TDV_N|_{N\leftarrow N-1} & \nu(N)\\ \nu(N)^T & d_{1N}\end{bmatrix} \in \mathbb{S}^{N-1}_+$  (e)   (630)

where

  $\nu(i) \triangleq \tfrac{1}{2}\begin{bmatrix} d_{12}+d_{1i}-d_{2i}\\ d_{13}+d_{1i}-d_{3i}\\ \vdots\\ d_{1,i-1}+d_{1i}-d_{i-1,i}\end{bmatrix} \in \mathbb{R}^{i-2}\,,\quad i>2$   (631)

Hence, the leading principal submatrices of EDM $D$ must also be EDMs.^{4.32}

Bordered symmetric matrices in the form (630d) are known to have intertwined [220, §6.4] (or interlaced [131, §4.3] [217, §IV.4.1]) eigenvalues; (confer §4.11.1) that means, for the particular submatrices (630a) and (630b),

^{4.32} In fact, each and every principal submatrix of an EDM $D$ is another EDM. [152, §4.1]


  $\lambda_2 \le d_{12} \le \lambda_1$   (632)

where $d_{12}$ is the eigenvalue of submatrix (630a) and $\lambda_1, \lambda_2$ are the eigenvalues of $T$ (630b) (623). Intertwining in (632) predicts that should $d_{12}$ become 0, then $\lambda_2$ must go to 0.^{4.33} The eigenvalues are similarly intertwined for submatrices (630b) and (630c);

  $\gamma_3 \le \lambda_2 \le \gamma_2 \le \lambda_1 \le \gamma_1$   (633)

where $\gamma_1, \gamma_2, \gamma_3$ are the eigenvalues of submatrix (630c). Intertwining likewise predicts that should $\lambda_2$ become 0 (a possibility revealed in §4.8.3.1), then $\gamma_3$ must go to 0. Combining results so far for $N = 2, 3, 4$: (632) (633)

  $\gamma_3 \le \lambda_2 \le d_{12} \le \lambda_1 \le \gamma_1$   (634)

The preceding logic extends by induction through the remaining members of the sequence (630).

4.8.3.1 Tightening the triangle inequality

Now we apply the Schur complement from §A.4 to tighten the triangle inequality in the case cardinality $N = 4$. We find that the gains by doing so are modest. From (630) we identify:

  $\begin{bmatrix} A & B\\ B^T & C\end{bmatrix} \triangleq -V_N^TDV_N|_{N\leftarrow 4}$   (635)

  $A \triangleq T = -V_N^TDV_N|_{N\leftarrow 3}$   (636)

both positive semidefinite by assumption, $B = \nu(4)$ defined in (631), and $C = d_{14}$. Using nonstrict $CC^{\dagger}$-form (1145), $C\succeq 0$ by assumption (§4.8.1) and $CC^{\dagger} = I$. So by the positive semidefinite ordering of eigenvalues theorem (§A.3.1.0.1),

  $-V_N^TDV_N|_{N\leftarrow 4} \succeq 0 \Leftrightarrow T \succeq d_{14}^{-1}\nu(4)\nu(4)^T \Rightarrow \begin{cases}\lambda_1 \ge d_{14}^{-1}\|\nu(4)\|^2\\ \lambda_2 \ge 0\end{cases}$   (637)

where $\{d_{14}^{-1}\|\nu(4)\|^2,\ 0\}$ are the eigenvalues of $d_{14}^{-1}\nu(4)\nu(4)^T$ while $\lambda_1, \lambda_2$ are the eigenvalues of $T$.

^{4.33} If $d_{12}$ were 0, eigenvalue $\lambda_2$ becomes 0 (625) because $d_{13}$ must then be equal to $d_{23}$; id est, $d_{12} = 0 \Leftrightarrow x_1 = x_2$. (§4.4)


4.8.3.1.1 Example. Small completion problem, II.
Applying the inequality for $\lambda_1$ in (637) to the small completion problem on page 220, Figure 59, the lower bound on $\sqrt{d_{14}}$ (1.236 in (461)) is tightened to 1.289. The correct value of $\sqrt{d_{14}}$ to three significant figures is 1.414.

4.8.4 Affine dimension reduction in two dimensions

(confer §4.14.4) The leading principal 2×2 submatrix $T$ of $-V_N^TDV_N$ has largest eigenvalue $\lambda_1$ (625), which is a convex function of $D$.^{4.34} $\lambda_1$ can never be 0 unless $d_{12} = d_{13} = d_{23} = 0$. Eigenvalue $\lambda_1$ can never be negative while the $d_{ij}$ are nonnegative. The remaining eigenvalue $\lambda_2$ is a concave function of $D$ that becomes 0 only at the upper and lower bounds of inequality (626a) and its equivalent forms: (confer (628))

  $|\sqrt{d_{12}}-\sqrt{d_{23}}| \le \sqrt{d_{13}} \le \sqrt{d_{12}}+\sqrt{d_{23}}$  (a)
  $\ \ \Leftrightarrow |\sqrt{d_{12}}-\sqrt{d_{13}}| \le \sqrt{d_{23}} \le \sqrt{d_{12}}+\sqrt{d_{13}}$  (b)
  $\ \ \Leftrightarrow |\sqrt{d_{13}}-\sqrt{d_{23}}| \le \sqrt{d_{12}} \le \sqrt{d_{13}}+\sqrt{d_{23}}$  (c)   (638)

In between those bounds, $\lambda_2$ is strictly positive; otherwise, it would be negative but is prevented by the condition $T\succeq 0$.

When $\lambda_2$ becomes 0, it means triangle $\triangle_{123}$ has collapsed to a line segment; a potential reduction in affine dimension $r$. The same logic is valid for any particular principal 2×2 submatrix of $-V_N^TDV_N$, hence applicable to other triangles.

^{4.34} The maximum eigenvalue of any symmetric matrix is always a convex function of its entries, while the minimum eigenvalue is always concave. [41, exmp.3.10] In our particular case, say $d \triangleq [d_{12}\ \ d_{13}\ \ d_{23}]^T \in\mathbb{R}^3$. Then the Hessian (1382) $\nabla^2\lambda_1(d)\succeq 0$ certifies convexity whereas $\nabla^2\lambda_2(d)\preceq 0$ certifies concavity. Each Hessian has rank equal to 1. The respective gradients $\nabla\lambda_1(d)$ and $\nabla\lambda_2(d)$ are nowhere 0.


4.9 Bridge: Convex polyhedra to EDMs

The criteria for the existence of an EDM include, by definition (468) (532), the properties imposed upon its entries $d_{ij}$ by the Euclidean metric. From §4.8.1 and §4.8.2, we know there is a relationship of matrix criteria to those properties. Here is a snapshot of what we are sure: for $i, j, k\in\{1\ldots N\}$ (confer §4.2)

  $\left.\begin{array}{l} \sqrt{d_{ij}} \ge 0\,,\ i\ne j\\ \sqrt{d_{ij}} = 0\,,\ i=j\\ \sqrt{d_{ij}} = \sqrt{d_{ji}}\\ \sqrt{d_{ij}} \le \sqrt{d_{ik}} + \sqrt{d_{kj}}\,,\ i\ne j\ne k \end{array}\right\} \Leftarrow \begin{array}{l} -V_N^TDV_N \succeq 0\\ \delta(D) = 0\\ D^T = D \end{array}$   (639)

all implied by $D\in\mathrm{EDM}^N$. In words, these four Euclidean metric properties are necessary conditions for $D$ to be a distance matrix. At the moment, we have no converse. As of concern in §4.3, we have yet to establish metric requirements beyond the four Euclidean metric properties that would allow $D$ to be certified an EDM or might facilitate polyhedron or list reconstruction from an incomplete EDM. We deal with this problem in §4.14. Our present goal is to establish ab initio the necessary and sufficient matrix criteria that will subsume all the Euclidean metric properties and any further requirements^{4.35} for all $N>1$ (§4.8.3); id est,

  $\left.\begin{array}{r}-V_N^TDV_N \succeq 0\\ D\in\mathbb{S}^N_h\end{array}\right\} \Leftrightarrow D\in\mathrm{EDM}^N$   (487)

or for EDM definition (541),

  $\left.\begin{array}{r}\Omega \succeq 0\\ \delta(d) \succeq 0\end{array}\right\} \Leftrightarrow D = D(\Omega,d)\in\mathrm{EDM}^N$   (640)

^{4.35} In 1935, Schoenberg [207, (1)] first extolled matrix product $-V_N^TDV_N$ (619) (predicated on symmetry and self-distance) specifically incorporating $V_N$, albeit algebraically. He showed: nonnegativity $-y^TV_N^TDV_Ny \ge 0$, for all $y\in\mathbb{R}^{N-1}$, is necessary and sufficient for $D$ to be an EDM. Gower [95, §3] remarks how surprising it is that such a fundamental property of Euclidean geometry was obtained so late.
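The Schoenberg criterion (487) is directly computable. A minimal test sketch; the function name isedm_ and the tolerance handling are ours, not the book's §G code:

function tf = isedm_(D, tol)
% Sketch of the Schoenberg test (487): D is an EDM iff D is symmetric
% hollow and -VN'*D*VN is positive semidefinite. Tolerance is ad hoc.
if nargin < 2, tol = 1e-10; end
N  = size(D,1);
VN = (1/sqrt(2))*[-ones(1,N-1); eye(N-1)];
hollow = norm(D - D','fro') < tol && norm(diag(D)) < tol;
tf = hollow && min(eig(-VN'*D*VN)) > -tol;
end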


Figure 70: Elliptope $\mathcal{E}^3$ in isometrically isomorphic $\mathbb{R}^6$ (projected on $\mathbb{R}^3$) is a convex body that appears to possess some kind of symmetry in this dimension; it resembles a malformed pillow in the shape of a bulging tetrahedron. Elliptope relative boundary is not smooth and comprises all set members (641) having at least one 0 eigenvalue. [155, §2.1] This elliptope has an infinity of vertices, but there are only four vertices corresponding to a rank-1 matrix. Those $yy^T$, evident in the illustration, have binary vector $y\in\mathbb{R}^3$ with entries in $\{\pm 1\}$.


Figure 71: Elliptope $\mathcal{E}^2$ in isometrically isomorphic $\mathbb{R}^3$ is a line segment illustrated interior to positive semidefinite cone $\mathbb{S}^2_+$ (Figure 29).

4.9.1 Geometric arguments

4.9.1.0.1 Definition. Elliptope: [155] [152, §2.3] [66, §31.5]
a unique bounded immutable convex Euclidean body in $\mathbb{S}^n$; intersection of positive semidefinite cone $\mathbb{S}^n_+$ with that set of $n$ hyperplanes defined by unity main diagonal;

  $\mathcal{E}^n \triangleq \mathbb{S}^n_+ \cap \{\Phi\in\mathbb{S}^n \mid \delta(\Phi) = 1\}$   (641)

a.k.a the set of all correlation matrices, of dimension

  $\dim\mathcal{E}^n = n(n-1)/2$ in $\mathbb{R}^{n(n+1)/2}$   (642)

An elliptope $\mathcal{E}^n$ is not a polyhedron, in general, but has some polyhedral faces and an infinity of vertices.^{4.36} Of those, $2^{n-1}$ vertices are extreme directions $yy^T$ of the positive semidefinite cone, where the entries of vector $y\in\mathbb{R}^n$ each belong to $\{\pm 1\}$ while the vector exercises every combination. Each of the remaining vertices has rank belonging to the set $\{k>0 \mid k(k+1)/2 \le n\}$. Each and every face of an elliptope is exposed.  △

^{4.36} Laurent defines vertex distinctly from the sense herein (§2.6.1.0.1); she defines vertex as a point with full-dimensional (nonempty interior) normal cone (§E.10.3.2.1). Her definition excludes point C in Figure 20, for example.


The elliptope for dimension $n = 2$ is a line segment in isometrically isomorphic $\mathbb{R}^{n(n+1)/2}$ (Figure 71). Obviously, $\operatorname{cone}(\mathcal{E}^n) \ne \mathbb{S}^n_+$. The elliptope for dimension $n = 3$ is realized in Figure 70.

4.9.1.0.2 Lemma. Hypersphere. [14, §4] (confer bullet p.230)
Matrix $A = [A_{ij}]\in\mathbb{S}^N$ belongs to the elliptope in $\mathbb{S}^N$ iff there exist $N$ points $p$ on the boundary of a hypersphere having radius 1 in $\mathbb{R}^{\operatorname{rank}A}$ such that

  $\|p_i - p_j\| = \sqrt{2}\sqrt{1 - A_{ij}}\,,\quad i,j = 1\ldots N$   (643)  ⋄

There is a similar theorem for Euclidean distance matrices: We derive matrix criteria for $D$ to be an EDM, validating (487) using simple geometry; distance to the polyhedron formed by the convex hull of a list of points (64) in Euclidean space $\mathbb{R}^n$.

4.9.1.0.3 EDM assertion.
$D$ is a Euclidean distance matrix if and only if $D\in\mathbb{S}^N_h$ and distances-square from the origin

  $\{\|p(y)\|^2 = -y^TV_N^TDV_Ny \mid y\in\mathcal{S}-\beta\}$   (644)

correspond to points $p$ in some bounded convex polyhedron

  $\mathcal{P} - \alpha = \{p(y) \mid y\in\mathcal{S}-\beta\}$   (645)

having $N$ or fewer vertices embedded in an $r$-dimensional subspace $\mathcal{A}-\alpha$ of $\mathbb{R}^n$, where $\alpha\in\mathcal{A} = \operatorname{aff}\mathcal{P}$ and where the domain of linear surjection $p(y)$ is the unit simplex $\mathcal{S}\subset\mathbb{R}^{N-1}_+$ shifted such that its vertex at the origin is translated to $-\beta$ in $\mathbb{R}^{N-1}$. When $\beta = 0$, then $\alpha = x_1$.  ⋄

In terms of $V_N$, the unit simplex (247) in $\mathbb{R}^{N-1}$ has an equivalent representation:

  $\mathcal{S} = \{s\in\mathbb{R}^{N-1} \mid \sqrt{2}V_Ns \succeq -e_1\}$   (646)

where $e_1$ is as in (484). Incidental to the EDM assertion, shifting the unit-simplex domain in $\mathbb{R}^{N-1}$ translates the polyhedron $\mathcal{P}$ in $\mathbb{R}^n$. Indeed,


there is a map from vertices of the unit simplex to members of the list generating $\mathcal{P}$;

  $p : \mathbb{R}^{N-1} \to \mathbb{R}^n$

  $p\left(\left\{\begin{array}{c}-\beta\\ e_1-\beta\\ e_2-\beta\\ \vdots\\ e_{N-1}-\beta\end{array}\right\}\right) = \left\{\begin{array}{c}x_1-\alpha\\ x_2-\alpha\\ x_3-\alpha\\ \vdots\\ x_N-\alpha\end{array}\right\}$   (647)

4.9.1.0.4 Proof. EDM assertion.
(⇒) We demonstrate that if $D$ is an EDM, then each distance-square $\|p(y)\|^2$ described by (644) corresponds to a point $p$ in some embedded polyhedron $\mathcal{P}-\alpha$. Assume $D$ is indeed an EDM; id est, $D$ can be made from some list $X$ of $N$ unknown points in Euclidean space $\mathbb{R}^n$; $D = D(X)$ for $X\in\mathbb{R}^{n\times N}$ as in (468). Since $D$ is translation invariant (§4.5.1), we may shift the affine hull $\mathcal{A}$ of those unknown points to the origin as in (589). Then take any point $p$ in their convex hull (71);

  $\mathcal{P}-\alpha = \{p = (X - Xb1^T)a \mid a^T1 = 1\,,\ a\succeq 0\}$   (648)

where $\alpha = Xb\in\mathcal{A} \Leftrightarrow b^T1 = 1$. Solutions to $a^T1 = 1$ are:^{4.37}

  $a \in \{e_1 + \sqrt{2}V_Ns \mid s\in\mathbb{R}^{N-1}\}$   (649)

where $e_1$ is as in (484). Similarly, $b = e_1 + \sqrt{2}V_N\beta$.

  $\mathcal{P}-\alpha = \{p = X(I - (e_1+\sqrt{2}V_N\beta)1^T)(e_1+\sqrt{2}V_Ns) \mid \sqrt{2}V_Ns \succeq -e_1\}$
  $\phantom{\mathcal{P}-\alpha} = \{p = X\sqrt{2}V_N(s-\beta) \mid \sqrt{2}V_Ns \succeq -e_1\}$   (650)

that describes the domain of $p(s)$ as the unit simplex

  $\mathcal{S} = \{s \mid \sqrt{2}V_Ns \succeq -e_1\} \subset \mathbb{R}^{N-1}_+$   (646)

^{4.37} Since $\mathcal{R}(V_N) = \mathcal{N}(1^T)$ and $\mathcal{N}(1^T)\perp\mathcal{R}(1)$, then over all $s\in\mathbb{R}^{N-1}$, $V_Ns$ is a hyperplane through the origin orthogonal to 1. Thus the solutions $\{a\}$ constitute a hyperplane orthogonal to the vector 1, and offset from the origin in $\mathbb{R}^N$ by any particular solution; in this case, $a = e_1$.


Making the substitution $s - \beta \leftarrow y$,

  $\mathcal{P}-\alpha = \{p = X\sqrt{2}V_Ny \mid y\in\mathcal{S}-\beta\}$   (651)

Point $p$ belongs to a convex polyhedron $\mathcal{P}-\alpha$ embedded in an $r$-dimensional subspace of $\mathbb{R}^n$ because the convex hull of any list forms a polyhedron, and because the translated affine hull $\mathcal{A}-\alpha$ contains the translated polyhedron $\mathcal{P}-\alpha$ (592) and the origin (when $\alpha\in\mathcal{A}$), and because $\mathcal{A}$ has dimension $r$ by definition (594). Now, any distance-square from the origin to the polyhedron $\mathcal{P}-\alpha$ can be formulated

  $\{p^Tp = \|p\|^2 = 2y^TV_N^TX^TXV_Ny \mid y\in\mathcal{S}-\beta\}$   (652)

Applying (601) to (652) we get (644).

(⇐) To validate the EDM assertion in the reverse direction, we prove: If each distance-square $\|p(y)\|^2$ (644) on the shifted unit-simplex $\mathcal{S}-\beta\subset\mathbb{R}^{N-1}$ corresponds to a point $p(y)$ in some embedded polyhedron $\mathcal{P}-\alpha$, then $D$ is an EDM. The $r$-dimensional subspace $\mathcal{A}-\alpha\subseteq\mathbb{R}^n$ is spanned by

  $p(\mathcal{S}-\beta) = \mathcal{P}-\alpha$   (653)

because $\mathcal{A}-\alpha = \operatorname{aff}(\mathcal{P}-\alpha) \supseteq \mathcal{P}-\alpha$ (592). So, outside the domain $\mathcal{S}-\beta$ of linear surjection $p(y)$, the simplex complement $\backslash\mathcal{S}-\beta\subset\mathbb{R}^{N-1}$ must contain the domain of the distance-square $\|p(y)\|^2 = p(y)^Tp(y)$ to remaining points in the subspace $\mathcal{A}-\alpha$; id est, to the polyhedron's relative exterior $\backslash\mathcal{P}-\alpha$. For $\|p(y)\|^2$ to be nonnegative on the entire subspace $\mathcal{A}-\alpha$, $-V_N^TDV_N$ must be positive semidefinite and is assumed symmetric;^{4.38}

  $-V_N^TDV_N \triangleq \Theta_p^T\Theta_p$   (654)

where^{4.39} $\Theta_p\in\mathbb{R}^{m\times N-1}$ for some $m\ge r$. Because $p(\mathcal{S}-\beta)$ is a convex polyhedron, it is necessarily a set of linear combinations of points from some length-$N$ list because every convex polyhedron having $N$ or fewer vertices can be generated that way (§2.12.2). Equivalent to (644) are

  $\{p^Tp \mid p\in\mathcal{P}-\alpha\} = \{p^Tp = y^T\Theta_p^T\Theta_py \mid y\in\mathcal{S}-\beta\}$   (655)

^{4.38} The antisymmetric part $(-V_N^TDV_N - (-V_N^TDV_N)^T)/2$ is annihilated by $\|p(y)\|^2$. By the same reasoning, any positive (semi)definite matrix $A$ is generally assumed symmetric because only the symmetric part $(A+A^T)/2$ survives the test $y^TAy\ge 0$. [131, §7.1]
^{4.39} $A^T = A \succeq 0 \Leftrightarrow A = R^TR$ for some real matrix $R$. [220, §6.3]


Because $p\in\mathcal{P}-\alpha$ may be found by factoring (655), the list $\Theta_p$ is found by factoring (654). A unique EDM can be made from that list using inner-product form definition $D(\Theta)|_{\Theta=\Theta_p}$ (532). That EDM will be identical to $D$ if $\delta(D) = 0$, by injectivity of $D$ (575).

4.9.2 Necessity and sufficiency

From (617) we learned that the matrix inequality $-V_N^TDV_N\succeq 0$ is a necessary test for $D$ to be an EDM. In §4.9.1, the connection between convex polyhedra and EDMs was pronounced by the EDM assertion; the matrix inequality together with $D\in\mathbb{S}^N_h$ became a sufficient test when the EDM assertion demanded that every bounded convex polyhedron have a corresponding EDM. For all $N > 1$ (§4.8.3), the matrix criteria for the existence of an EDM in (487), (640), and (463) are therefore necessary and sufficient and subsume all the Euclidean metric properties and further requirements.

4.9.3 Example revisited

Now we apply the necessary and sufficient EDM criteria (487) to an earlier problem.

4.9.3.0.1 Example. Small completion problem, III. (confer §4.8.3.1.1)
Continuing Example 4.3.0.0.2 pertaining to Figure 59 where $N = 4$, distance-square $d_{14}$ is ascertainable from the matrix inequality $-V_N^TDV_N \succeq 0$. Because all distances in (460) are known except $\sqrt{d_{14}}$, we may simply calculate the minimum eigenvalue of $-V_N^TDV_N$ over a range of $d_{14}$ as in Figure 72. We observe a unique value of $d_{14}$ satisfying (487) where the abscissa is tangent to the hypograph of the minimum eigenvalue. Since the minimum eigenvalue of a symmetric matrix is known to be a concave function (§4.8.4), we calculate its second partial derivative with respect to $d_{14}$ evaluated at 2 and find $-1/3$. We conclude there are no other satisfying values of $d_{14}$. Further, that value of $d_{14}$ does not meet an upper or lower bound of a triangle inequality like (628), so neither does it cause the collapse of any triangle. Because the minimum eigenvalue is 0, affine dimension $r$ of any point list corresponding to $D$ cannot exceed $N-2$. (§4.7.1.1)
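The eigenvalue sweep behind Figure 72 is easy to reproduce. A sketch on assumed data (a unit square with $d_{14}$ deleted, not the book's (460); for this data the minimum eigenvalue is nonnegative on a whole interval, $1\le d_{14}\le 5$, rather than at a single tangent point):

% Sketch of the completion idea behind (487) and Figure 72, assumed data.
d12=1; d13=2; d23=1; d24=2; d34=1;      % known squared distances
VN = (1/sqrt(2))*[-ones(1,3); eye(3)];
mineig = @(d14) min(eig(-VN'*[0 d12 d13 d14; d12 0 d23 d24; ...
                             d13 d23 0 d34; d14 d24 d34 0]*VN));
t = 0:0.01:6;  m = arrayfun(mineig, t);
plot(t, m), xlabel('d_{14}'), ylabel('minimum eigenvalue')
% zero crossings mark the boundary of EDM feasibility; a lone tangency,
% as in Figure 72, marks a unique completion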


Figure 72: Minimum eigenvalue of $-V_N^TDV_N$, positive semidefinite for only one value of $d_{14}$: 2. [Plot: minimum eigenvalue versus $d_{14}$.]

Figure 73: Some entrywise EDM compositions: (a) $\alpha = 2$. Concave nondecreasing in $d_{ij}$. (b) Trajectory convergence in $\alpha$ for $d_{ij} = 2$. [Curves plotted: $d_{ij}^{1/\alpha}$, $\log_2(1+d_{ij}^{1/\alpha})$, $1-e^{-\alpha d_{ij}}$.]


4.10 EDM-entry composition

Laurent [152, §2.3] applies results from Schoenberg (1938) [208] to show certain nonlinear compositions of individual EDM entries yield EDMs; in particular,

  $D\in\mathrm{EDM}^N \Leftrightarrow [1 - e^{-\alpha d_{ij}}]\in\mathrm{EDM}^N\ \ \forall\,\alpha > 0 \Leftrightarrow [e^{-\alpha d_{ij}}]\in\mathcal{E}^N\ \ \forall\,\alpha > 0$   (656)

where $D = [d_{ij}]$ and $\mathcal{E}^N$ is the elliptope (641). (Proof: §F.)

Schoenberg's results [208, §6, thm.5] (confer [147, p.108-109]) also suggest certain finite positive roots of EDM entries produce EDMs; specifically,

  $D\in\mathrm{EDM}^N \Leftrightarrow [d_{ij}^{1/\alpha}]\in\mathrm{EDM}^N\ \ \forall\,\alpha > 1$   (657)

The special case $\alpha = 2$ is of interest because it corresponds to absolute distance; e.g.,

  $D\in\mathrm{EDM}^N \Rightarrow \circ\!\sqrt{D}\in\mathrm{EDM}^N$   (658)

Assuming that points constituting a corresponding list $X$ are distinct (621), then it follows: for $D\in\mathbb{S}^N_h$

  $\lim_{\alpha\to\infty}[d_{ij}^{1/\alpha}] = \lim_{\alpha\to\infty}[1 - e^{-\alpha d_{ij}}] = -E \triangleq 11^T - I$   (659)

Negative elementary matrix $-E$ (§B.3) is relatively interior to the EDM cone (§5.6) and terminal to the respective trajectories (656) and (657) as functions of $\alpha$. Both trajectories are confined to the EDM cone; in engineering terms, the EDM cone is an invariant set [205] to either trajectory. Further, if $D$ is not an EDM but for some particular $\alpha_p$ it becomes an EDM, then for all greater values of $\alpha$ it remains an EDM.

These preliminary findings lead one to speculate whether any concave nondecreasing composition of individual EDM entries $d_{ij}$ on $\mathbb{R}_+$ will produce another EDM; e.g., empirical evidence suggests that given EDM $D$, for each fixed $\alpha \ge 1$ [sic] the composition $[\log_2(1+d_{ij}^{1/\alpha})]$ is also an EDM. Figure 73 illustrates that composition's concavity in $d_{ij}$ together with functions from (656) and (657).
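The compositions (656) and (657) are easily exercised. A sketch using the isedm_ test sketched earlier, on a hypothetical random list:

% Sketch of (656)-(657): entrywise compositions of an EDM remain EDMs.
N = 5;  X = randn(3,N);
G = X'*X;  D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
alpha = 3;
D1 = 1 - exp(-alpha*D);  D1(1:N+1:end) = 0;  % [1 - e^{-alpha d_ij}]  (656)
D2 = D.^(1/alpha);                           % [d_ij^{1/alpha}]       (657)
[isedm_(D), isedm_(D1), isedm_(D2)]          % all true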


4.10.1 EDM by elliptope

For some $\kappa\in\mathbb{R}_+$ and $C\in\mathbb{S}^N_+$ in the elliptope $\mathcal{E}^N$ (§4.9.1.0.1), Alfakih asserts any given EDM $D$ is expressible [7] [66, §31.5]

  $D = \kappa(11^T - C) \in \mathrm{EDM}^N$   (660)

This expression exhibits nonlinear combination of variables $\kappa$ and $C$. We therefore propose a different expression requiring redefinition of the elliptope (641) by parametrization;

  $\mathcal{E}^n_t \triangleq \mathbb{S}^n_+ \cap \{\Phi\in\mathbb{S}^n \mid \delta(\Phi) = t1\}$   (661)

where, of course, $\mathcal{E}^n = \mathcal{E}^n_1$. Then any given EDM $D$ is expressible

  $D = t11^T - E \in \mathrm{EDM}^N$   (662)

which is linear in variables $t\in\mathbb{R}_+$ and $E\in\mathcal{E}^N_t$.

4.11 EDM indefiniteness

By the known result in §A.7.1 regarding a 0-valued entry on the main diagonal of a symmetric positive semidefinite matrix, there can be no positive nor negative semidefinite EDM except the 0 matrix, because $\mathrm{EDM}^N\subseteq\mathbb{S}^N_h$ (467) and

  $\mathbb{S}^N_h \cap \mathbb{S}^N_+ = 0$   (663)

the origin. So when $D\in\mathrm{EDM}^N$, there can be no factorization $D = A^TA$ nor $-D = A^TA$. [220, §6.3] Hence eigenvalues of an EDM are neither all nonnegative nor all nonpositive; an EDM is indefinite and possibly invertible.


4.11.1 EDM eigenvalues, congruence transformation

For any symmetric $-D$, we can characterize its eigenvalues by congruence transformation: [220, §6.3]

  $-W^TDW = -\begin{bmatrix}V_N^T\\ 1^T\end{bmatrix} D\,[\,V_N\ \ 1\,] = -\begin{bmatrix}V_N^TDV_N & V_N^TD1\\ 1^TDV_N & 1^TD1\end{bmatrix} \in \mathbb{S}^N$   (664)

Because

  $W \triangleq [\,V_N\ \ 1\,] \in \mathbb{R}^{N\times N}$   (665)

is full-rank, then (1151)

  $\operatorname{inertia}(-D) = \operatorname{inertia}(-W^TDW)$   (666)

the congruence (664) has the same number of positive, zero, and negative eigenvalues as $-D$. Further, if we denote by $\{\gamma_i\,,\ i = 1\ldots N-1\}$ the eigenvalues of $-V_N^TDV_N$ and denote eigenvalues of the congruence $-W^TDW$ by $\{\zeta_i\,,\ i = 1\ldots N\}$, and if we arrange each respective set of eigenvalues in nonincreasing order, then by theory of interlacing eigenvalues for bordered symmetric matrices [131, §4.3] [220, §6.4] [217, §IV.4.1]

  $\zeta_N \le \gamma_{N-1} \le \zeta_{N-1} \le \gamma_{N-2} \le \cdots \le \gamma_2 \le \zeta_2 \le \gamma_1 \le \zeta_1$   (667)

When $D\in\mathrm{EDM}^N$, then $\gamma_i \ge 0\ \forall i$ (1094) because $-V_N^TDV_N\succeq 0$ as we know. That means the congruence must have $N-1$ nonnegative eigenvalues; $\zeta_i\ge 0\,,\ i = 1\ldots N-1$. The remaining eigenvalue $\zeta_N$ cannot be nonnegative because then $-D$ would be positive semidefinite, an impossibility; so $\zeta_N < 0$. By congruence, nontrivial $-D$ must therefore have exactly one negative eigenvalue;^{4.40} [66, §2.4.5]

^{4.40} All the entries of the corresponding eigenvector must have the same sign with respect to each other [54, p.116] because that eigenvector is the Perron vector corresponding to the spectral radius; [131, §8.2.6] the predominant characteristic of negative [sic] matrices.


  $D\in\mathrm{EDM}^N \Rightarrow \begin{cases} \lambda(-D)_i \ge 0\,,\ i = 1\ldots N-1\\[2pt] \sum_{i=1}^N \lambda(-D)_i = 0\\[2pt] D\in\mathbb{S}^N_h\cap\mathbb{R}^{N\times N}_+ \end{cases}$   (668)

where the $\lambda(-D)_i$ are nonincreasingly ordered eigenvalues of $-D$ whose sum must be 0 only because $\operatorname{tr}D = 0$ [220, §5.1]. The eigenvalue summation condition, therefore, can be considered redundant. Even so, all these conditions are insufficient to determine whether some given $H\in\mathbb{S}^N_h$ is an EDM, as shown by counter-example.^{4.41}

4.11.1.0.1 Exercise. Spectral inequality.
Prove whether it holds: for $D = [d_{ij}]\in\mathrm{EDM}^N$,

  $\lambda(-D)_1 \ge d_{ij} \ge \lambda(-D)_{N-1} \quad \forall\,i\ne j$   (669)

4.11.2 Spectral cones

Denoting the eigenvalues of Cayley-Menger matrix $\begin{bmatrix}0 & 1^T\\ 1 & -D\end{bmatrix}\in\mathbb{S}^{N+1}$ by

  $\lambda\!\left(\begin{bmatrix}0 & 1^T\\ 1 & -D\end{bmatrix}\right) \in \mathbb{R}^{N+1}$   (670)

we have the Cayley-Menger form (§4.7.3.0.1) of necessary and sufficient conditions for $D\in\mathrm{EDM}^N$ from the literature; [114, §3]^{4.42} [47, §3] [66, §6.2] (confer (487) (463))

^{4.41} When $N = 3$, for example, the symmetric hollow matrix
  $H = \begin{bmatrix}0 & 1 & 1\\ 1 & 0 & 5\\ 1 & 5 & 0\end{bmatrix} \in \mathbb{S}^N_h\cap\mathbb{R}^{N\times N}_+$
is not an EDM, although $\lambda(-H) = [\,5\ \ 0.3723\ \ -5.3723\,]^T$ conforms to (668).
^{4.42} Recall: for $D\in\mathbb{S}^N_h$, $-V_N^TDV_N\succeq 0$ subsumes nonnegativity property 1 (§4.8.1).


  $D\in\mathrm{EDM}^N \Leftrightarrow \left\{\begin{array}{l} \lambda\!\left(\begin{bmatrix}0 & 1^T\\ 1 & -D\end{bmatrix}\right)_i \ge 0\,,\quad i = 1\ldots N\\[4pt] D\in\mathbb{S}^N_h \end{array}\right. \Leftrightarrow \left\{\begin{array}{l}-V_N^TDV_N\succeq 0\\ D\in\mathbb{S}^N_h\end{array}\right.$   (671)

These conditions say the Cayley-Menger form has one and only one negative eigenvalue. When $D$ is an EDM, the eigenvalues $\lambda\!\left(\begin{bmatrix}0 & 1^T\\ 1 & -D\end{bmatrix}\right)$ belong to the particular orthant in $\mathbb{R}^{N+1}$ having the $N{+}1$th coordinate as sole negative coordinate^{4.43};

  $\begin{bmatrix}\mathbb{R}^N_+\\ \mathbb{R}_-\end{bmatrix} = \operatorname{cone}\{e_1, e_2, \cdots e_N, -e_{N+1}\}$   (672)

4.11.2.1 Cayley-Menger versus Schoenberg

Connection to the Schoenberg criterion (487) is made when the Cayley-Menger form is further partitioned:

  $\begin{bmatrix}0 & 1^T\\ 1 & -D\end{bmatrix} = \begin{bmatrix}\begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix} & \begin{bmatrix}1^T\\ -D_{1,2:N}\end{bmatrix}\\[4pt] \begin{bmatrix}1 & -D_{2:N,1}\end{bmatrix} & -D_{2:N,2:N}\end{bmatrix}$   (673)

Matrix $D\in\mathbb{S}^N_h$ is an EDM if and only if the Schur complement of $\begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}$ in this partition (§A.4) is positive semidefinite; [14, §1] [141, §3] id est, (confer (619))

  $D\in\mathrm{EDM}^N \Leftrightarrow \left\{\begin{array}{l} -D_{2:N,2:N} - [\,1\ \ -D_{2:N,1}\,]\begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}^{-1}\begin{bmatrix}1^T\\ -D_{1,2:N}\end{bmatrix} = -2V_N^TDV_N \succeq 0\\[6pt] \text{and}\ \ D\in\mathbb{S}^N_h \end{array}\right.$   (674)

^{4.43} Empirically, all except one entry of the corresponding eigenvector have the same sign with respect to each other.
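Both spectral characterizations, (668) for $-D$ and (671) for the Cayley-Menger form, are quick to confirm numerically. A sketch on a hypothetical list:

% Sketch of (668) and (671): -D has exactly one negative eigenvalue
% (its eigenvalues summing to 0 since tr D = 0), and so does the
% Cayley-Menger form. Hypothetical random list.
N = 6;  X = randn(3,N);
G = X'*X;  D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
lam = sort(eig(-D),'descend');
[sum(lam), nnz(lam < 0)]               % ~0 and 1, per (668)
CM = [0 ones(1,N); ones(N,1) -D];      % Cayley-Menger matrix (670)
nnz(eig(CM) < 0)                       % = 1, per (671)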


Now we apply results from chapter 2 with regard to polyhedral cones and their duals.

4.11.2.2 Ordered eigenvalues

When eigenvalues $\lambda$ are nonincreasingly ordered, conditions (671) specify their membership to the smallest pointed polyhedral spectral cone for $\begin{bmatrix}0 & 1^T\\ 1 & -\mathrm{EDM}^N\end{bmatrix}$:

  $\mathcal{K}_\lambda \triangleq \{\zeta\in\mathbb{R}^{N+1} \mid \zeta_1\ge\zeta_2\ge\cdots\ge\zeta_N\ge 0\ge\zeta_{N+1}\,,\ 1^T\zeta = 0\}$
  $\phantom{\mathcal{K}_\lambda} = \mathcal{K}_M \cap \begin{bmatrix}\mathbb{R}^N_+\\ \mathbb{R}_-\end{bmatrix} \cap \partial\mathcal{H}$
  $\phantom{\mathcal{K}_\lambda} = \lambda\!\left(\begin{bmatrix}0 & 1^T\\ 1 & -\mathrm{EDM}^N\end{bmatrix}\right)$   (675)

where

  $\partial\mathcal{H} = \{\zeta\in\mathbb{R}^{N+1} \mid 1^T\zeta = 0\}$   (676)

is a hyperplane through the origin, and $\mathcal{K}_M$ is the monotone cone (§2.13.9.4.2), which has nonempty interior but is not pointed;

  $\mathcal{K}_M = \{\zeta\in\mathbb{R}^{N+1} \mid \zeta_1\ge\zeta_2\ge\cdots\ge\zeta_{N+1}\}$   (361)

  $\mathcal{K}_M^* = \{[\,e_1-e_2\ \ e_2-e_3\ \cdots\ e_N-e_{N+1}\,]\,a \mid a\succeq 0\} \subset \mathbb{R}^{N+1}$   (362)

So because of the hyperplane,

  $\dim\operatorname{aff}\mathcal{K}_\lambda = \dim\partial\mathcal{H} = N$   (677)

indicating $\mathcal{K}_\lambda$ has empty interior. Defining

  $A \triangleq \begin{bmatrix}e_1^T - e_2^T\\ e_2^T - e_3^T\\ \vdots\\ e_N^T - e_{N+1}^T\end{bmatrix} \in \mathbb{R}^{N\times N+1}\,,\qquad B \triangleq \begin{bmatrix}e_1^T\\ e_2^T\\ \vdots\\ e_N^T\\ -e_{N+1}^T\end{bmatrix} \in \mathbb{R}^{N+1\times N+1}$   (678)


we have the halfspace-description:

  $\mathcal{K}_\lambda = \{\zeta\in\mathbb{R}^{N+1} \mid A\zeta\succeq 0\,,\ B\zeta\succeq 0\,,\ 1^T\zeta = 0\}$   (679)

From this and (369) we get a vertex-description for a pointed spectral cone having empty interior:

  $\mathcal{K}_\lambda = \left\{V_N\left(\begin{bmatrix}\hat{A}\\ \hat{B}\end{bmatrix}V_N\right)^{\dagger}b \mid b\succeq 0\right\}$   (680)

where $V_N\in\mathbb{R}^{N+1\times N}$, and where [sic]

  $\hat{B} = e_N^T \in \mathbb{R}^{1\times N+1}$   (681)

and

  $\hat{A} = \begin{bmatrix}e_1^T - e_2^T\\ e_2^T - e_3^T\\ \vdots\\ e_{N-1}^T - e_N^T\end{bmatrix} \in \mathbb{R}^{N-1\times N+1}$   (682)

hold those rows of $A$ and $B$ corresponding to conically independent rows in $\begin{bmatrix}A\\ B\end{bmatrix}V_N$.

For eigenvalues arranged in nonincreasing order, conditions (671) can be restated in terms of a spectral cone for Euclidean distance matrices:

  $D\in\mathrm{EDM}^N \Leftrightarrow \left\{\begin{array}{l}\lambda\!\left(\begin{bmatrix}0 & 1^T\\ 1 & -D\end{bmatrix}\right) \in \mathcal{K}_M \cap \begin{bmatrix}\mathbb{R}^N_+\\ \mathbb{R}_-\end{bmatrix} \cap \partial\mathcal{H}\\[6pt] D\in\mathbb{S}^N_h\end{array}\right.$   (683)

Vertex-description of the dual spectral cone is, (264)

  $\mathcal{K}_\lambda^* = \mathcal{K}_M^* + \begin{bmatrix}\mathbb{R}^N_+\\ \mathbb{R}_-\end{bmatrix}^* + \partial\mathcal{H}^* \subseteq \mathbb{R}^{N+1}$
  $\phantom{\mathcal{K}_\lambda^*} = \{[\,A^T\ \ B^T\ \ 1\ \ -1\,]\,b \mid b\succeq 0\} = \{[\,\hat{A}^T\ \ \hat{B}^T\ \ 1\ \ -1\,]\,a \mid a\succeq 0\}$   (684)


From (680) and (370) we get a halfspace-description:

  $\mathcal{K}_\lambda^* = \{y\in\mathbb{R}^{N+1} \mid (V_N^T[\,\hat{A}^T\ \ \hat{B}^T\,])^{\dagger}V_N^Ty \succeq 0\}$   (685)

This polyhedral dual spectral cone $\mathcal{K}_\lambda^*$ is closed, convex, has nonempty interior because $\mathcal{K}_\lambda$ is pointed, but is not pointed because $\mathcal{K}_\lambda$ has empty interior.

4.11.2.3 Unordered eigenvalues

Spectral cones are not unique. When eigenvalue ordering is of no concern,^{4.44} things simplify: The conditions (671) specify eigenvalue membership to

  $\lambda\!\left(\begin{bmatrix}0 & 1^T\\ 1 & -\mathrm{EDM}^N\end{bmatrix}\right) = \begin{bmatrix}\mathbb{R}^N_+\\ \mathbb{R}_-\end{bmatrix} \cap \partial\mathcal{H} = \{\zeta\in\mathbb{R}^{N+1} \mid B\zeta\succeq 0\,,\ 1^T\zeta = 0\}$   (686)

where $B$ is defined in (678), and $\partial\mathcal{H}$ in (676). From (369) we get a vertex-description for a pointed spectral cone having empty interior:

  $\lambda\!\left(\begin{bmatrix}0 & 1^T\\ 1 & -\mathrm{EDM}^N\end{bmatrix}\right) = \{V_N(\tilde{B}V_N)^{\dagger}b \mid b\succeq 0\} = \left\{\begin{bmatrix}I\\ -1^T\end{bmatrix}b \mid b\succeq 0\right\}$   (687)

where $V_N\in\mathbb{R}^{N+1\times N}$ and

  $\tilde{B} \triangleq \begin{bmatrix}e_1^T\\ e_2^T\\ \vdots\\ e_N^T\end{bmatrix} \in \mathbb{R}^{N\times N+1}$   (688)

holds only those rows of $B$ corresponding to conically independent rows in $BV_N$.

^{4.44} Eigenvalue ordering (represented by a cone having monotone description such as (675)) would be of no concern in (987), for example, where projection of a given presorted set on the nonnegative orthant in a subspace is equivalent to its projection on the monotone nonnegative cone in that same subspace; equivalence is a consequence of presorting.


For unordered eigenvalues, (671) can be equivalently restated

  $D\in\mathrm{EDM}^N \Leftrightarrow \left\{\begin{array}{l}\lambda\!\left(\begin{bmatrix}0 & 1^T\\ 1 & -D\end{bmatrix}\right) \in \begin{bmatrix}\mathbb{R}^N_+\\ \mathbb{R}_-\end{bmatrix} \cap \partial\mathcal{H}\\[6pt] D\in\mathbb{S}^N_h\end{array}\right.$   (689)

Vertex-description of the dual spectral cone is, (264)

  $\lambda\!\left(\begin{bmatrix}0 & 1^T\\ 1 & -\mathrm{EDM}^N\end{bmatrix}\right)^* = \begin{bmatrix}\mathbb{R}^N_+\\ \mathbb{R}_-\end{bmatrix} + \partial\mathcal{H}^* \subseteq \mathbb{R}^{N+1}$
  $\phantom{\lambda(\cdot)^*} = \{[\,B^T\ \ 1\ \ -1\,]\,b \mid b\succeq 0\} = \{[\,\tilde{B}^T\ \ 1\ \ -1\,]\,a \mid a\succeq 0\}$   (690)

From (370) we get a halfspace-description:

  $\lambda\!\left(\begin{bmatrix}0 & 1^T\\ 1 & -\mathrm{EDM}^N\end{bmatrix}\right)^* = \{y\in\mathbb{R}^{N+1} \mid (V_N^T\tilde{B}^T)^{\dagger}V_N^Ty \succeq 0\} = \{y\in\mathbb{R}^{N+1} \mid [\,I\ \ -1\,]\,y \succeq 0\}$   (691)

This polyhedral dual spectral cone is closed, convex, has nonempty interior, but is not pointed. (Notice that any nonincreasingly ordered eigenspectrum belongs to this dual spectral cone.)

4.11.2.4 Dual cone versus dual spectral cone

An open question regards the relationship of convex cones and their duals to the corresponding spectral cones and their duals. The positive semidefinite cone, for example, is self-dual. Both the nonnegative orthant and the monotone nonnegative cone are spectral cones for it. When we consider the nonnegative orthant, then the spectral cone for the self-dual positive semidefinite cone is also self-dual.


4.12 List reconstruction

The traditional term multidimensional scaling^{4.45} [167] [63] [237] [61] [170] [54] refers to any reconstruction of a list $X\in\mathbb{R}^{n\times N}$ in Euclidean space from interpoint distance information, possibly incomplete (§5.4), ordinal (§4.13.2), or specified perhaps only by bounding-constraints (§4.4.2.2.7) [235]. Techniques for reconstruction are essentially methods for optimally embedding an unknown list of points, corresponding to given Euclidean distance data, in an affine subset of desired or minimum dimension. The oldest known precursor is called principal component analysis [98], which analyzes the correlation matrix (§4.9.1.0.1); [36, §22] a.k.a the Karhunen-Loève transform in the digital signal processing literature. Isometric reconstruction (§4.5.3) of point list $X$ is best performed by eigen decomposition of a Gram matrix; for then, numerical errors of factorization are easily spotted in the eigenvalues.

We now consider how rotation/reflection and translation invariance factor into a reconstruction.

4.12.1 $x_1$ at the origin. $V_N$

At the stage of reconstruction, we have $D\in\mathrm{EDM}^N$ and we wish to find a generating list (§2.3.2) for $\mathcal{P}-\alpha$ by factoring positive semidefinite $-V_N^TDV_N$ (654) as suggested in §4.9.1.0.4. One way to factor $-V_N^TDV_N$ is via diagonalization of symmetric matrices; [220, §5.6] [131] (§A.5.2, §A.3)

  $-V_N^TDV_N \triangleq Q\Lambda Q^T$   (692)

  $Q\Lambda Q^T \succeq 0 \Leftrightarrow \Lambda\succeq 0$   (693)

where $Q\in\mathbb{R}^{N-1\times N-1}$ is an orthogonal matrix containing eigenvectors while $\Lambda\in\mathbb{R}^{N-1\times N-1}$ is a diagonal matrix containing corresponding nonnegative eigenvalues ordered by nonincreasing value. From the diagonalization, identify the list via positive square root (§A.5.2.1) using (601);

  $-V_N^TDV_N = 2V_N^TX^TXV_N \triangleq Q\sqrt{\Lambda}\,Q_p^TQ_p\sqrt{\Lambda}\,Q^T$   (694)

where $\sqrt{\Lambda}\,Q_p^TQ_p\sqrt{\Lambda} \triangleq \Lambda = \sqrt{\Lambda}\sqrt{\Lambda}$ and where $Q_p\in\mathbb{R}^{n\times N-1}$ is unknown, as is its dimension $n$. Rotation/reflection is accounted for by $Q_p$, yet only its

^{4.45} Scaling [233] means making a scale, i.e., a numerical representation of qualitative data. If the scale is multidimensional, it's multidimensional scaling. -Jan de Leeuw


first $r$ columns are necessarily orthonormal.^{4.46} Assuming membership to the unit simplex $y\in\mathcal{S}$ (651), then point $p = X\sqrt{2}V_Ny = Q_p\sqrt{\Lambda}\,Q^Ty$ in $\mathbb{R}^n$ belongs to the translated polyhedron

  $\mathcal{P} - x_1$   (695)

whose generating list constitutes the columns of (595)

  $[\,0\ \ X\sqrt{2}V_N\,] = [\,0\ \ \sqrt{-V_N^TDV_N}\,] = [\,0\ \ Q_p\sqrt{\Lambda}\,Q^T\,] \in \mathbb{R}^{n\times N}$
  $\phantom{[\,0\ \ X\sqrt{2}V_N\,]} = [\,0\ \ x_2{-}x_1\ \ x_3{-}x_1\ \cdots\ x_N{-}x_1\,]$   (696)

The scaled auxiliary matrix $V_N$ represents that translation. A simple choice for $Q_p$ has $n$ set to $N-1$; id est, $Q_p = I$. Ideally, each member of the generating list has at most $r$ nonzero entries; $r$ being, affine dimension

  $\operatorname{rank}V_N^TDV_N = \operatorname{rank}Q_p\sqrt{\Lambda}\,Q^T = \operatorname{rank}\Lambda = r$   (697)

Each member then has at least $N-1-r$ zeros in its higher-dimensional coordinates because $r\le N-1$. (607) To truncate those zeros, choose $n$ equal to affine dimension, which is the smallest $n$ possible because $XV_N$ has rank $r\le n$ (603).^{4.47} In that case, the simplest choice for $Q_p$ is $[\,I\ \ 0\,]$ having dimensions $r\times N-1$.

We may wish to verify the list (696) found from the diagonalization of $-V_N^TDV_N$. Because of rotation/reflection and translation invariance (§4.5), EDM $D$ can be uniquely made from that list by calculating: (468)

^{4.46} Recall $r$ signifies affine dimension. $Q_p$ is not necessarily an orthogonal matrix. $Q_p$ is constrained such that only its first $r$ columns are necessarily orthonormal because there are only $r$ nonzero eigenvalues in $\Lambda$ when $-V_N^TDV_N$ has rank $r$ (§4.7.1.1). Remaining columns of $Q_p$ are arbitrary.
^{4.47} If we write $Q^T = \begin{bmatrix}q_1^T\\ \vdots\\ q_{N-1}^T\end{bmatrix}$ as rowwise eigenvectors, $\Lambda = \begin{bmatrix}\lambda_1 & & &\\ & \ddots & &\\ & & \lambda_r &\\ & & & 0\end{bmatrix}$ in terms of eigenvalues, and $Q_p = [\,q_{p1}\ \cdots\ q_{pN-1}\,]$ as column vectors, then $Q_p\sqrt{\Lambda}\,Q^T = \sum_{i=1}^r \sqrt{\lambda_i}\,q_{pi}q_i^T$ is a sum of $r$ linearly independent rank-one matrices (§B.1.1). Hence the summation has rank $r$.


  $D(X) = D(X[\,0\ \ \sqrt{2}V_N\,]) = D(Q_p[\,0\ \ \sqrt{\Lambda}\,Q^T\,]) = D([\,0\ \ \sqrt{\Lambda}\,Q^T\,])$   (698)

This suggests a way to find EDM $D$ given $-V_N^TDV_N$; (confer (579))

  $D = \begin{bmatrix}0\\ \delta(-V_N^TDV_N)\end{bmatrix}1^T + 1\begin{bmatrix}0 & \delta(-V_N^TDV_N)^T\end{bmatrix} - 2\begin{bmatrix}0 & 0^T\\ 0 & -V_N^TDV_N\end{bmatrix}$   (488)

4.12.2 0 geometric center. $V$

Alternatively, we may perform reconstruction using instead the auxiliary matrix $V$ (§B.4.1), corresponding to the polyhedron

  $\mathcal{P} - \alpha_c$   (699)

whose geometric center has been translated to the origin. Redimensioning $Q,\Lambda\in\mathbb{R}^{N\times N}$ and $Q_p\in\mathbb{R}^{n\times N}$, (602)

  $-VDV = 2VX^TXV \triangleq Q\sqrt{\Lambda}\,Q_p^TQ_p\sqrt{\Lambda}\,Q^T$   (700)

where the geometrically centered generating list constitutes (confer (696))

  $XV = \sqrt{-\tfrac{1}{2}VDV} = \tfrac{1}{\sqrt{2}}Q_p\sqrt{\Lambda}\,Q^T \in \mathbb{R}^{n\times N}$
  $\phantom{XV} = [\,x_1-\tfrac{1}{N}X1\ \ x_2-\tfrac{1}{N}X1\ \ x_3-\tfrac{1}{N}X1\ \cdots\ x_N-\tfrac{1}{N}X1\,]$   (701)

where $\alpha_c = \tfrac{1}{N}X1$. (§4.5.1.0.1) Now EDM $D$ can be uniquely made from the list found, by calculating: (468)

  $D(X) = D(XV) = D(\tfrac{1}{\sqrt{2}}Q_p\sqrt{\Lambda}\,Q^T) = \tfrac{1}{2}D(\sqrt{\Lambda}\,Q^T)$   (702)

This EDM is, of course, identical to (698). Similarly to (488), from $-VDV$ we can find EDM $D$; (confer (566))

  $D = \delta(-\tfrac{1}{2}VDV)1^T + 1\delta(-\tfrac{1}{2}VDV)^T - 2(-\tfrac{1}{2}VDV)$   (494)
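Both reconstructions are a few lines of Matlab. A sketch of the $V_N$ route (hypothetical list; recovery is only up to rotation/reflection and translation, so we verify by regenerating $D$):

% Sketch of isometric reconstruction (692)-(698): factor -VN'*D*VN,
% rebuild a list, and confirm it regenerates D. Hypothetical data.
N = 5;  X = randn(2,N);
edm = @(X) diag(X'*X)*ones(1,size(X,2)) + ones(size(X,2),1)*diag(X'*X)' - 2*(X'*X);
D  = edm(X);
VN = (1/sqrt(2))*[-ones(1,N-1); eye(N-1)];
[Q,L] = eig(-VN'*D*VN);                 % (692)
[lam,idx] = sort(diag(L),'descend');  Q = Q(:,idx);
r  = sum(lam > 1e-10*max(lam));         % affine dimension (697)
Xr = [zeros(r,1)  diag(sqrt(lam(1:r)))*Q(:,1:r)'];   % list (696), Qp = [I 0]
norm(edm(Xr) - D, 'fro')                % ~ 0  (698)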


Figure 74: Map of United States of America showing some state boundaries and the Great Lakes. All plots made using 5020 connected points. Any difference in scale in (a) through (d) is an artifact of the plotting routine.
(a) shows original map made from decimated (latitude, longitude) data.
(b) Original map data rotated (freehand) to highlight curvature of Earth.
(c) Map isometrically reconstructed from the EDM.
(d) Same reconstructed map illustrating curvature.
(e)(f) Two views of one isotonic reconstruction; problem (711) with no sort constraint $\Xi d$ (and no hidden-line removal).


4.13 Reconstruction examples

4.13.1 Isometric reconstruction

4.13.1.0.1 Example. Map of the USA.
The most fundamental application of EDMs is to reconstruct relative point position given only interpoint distance information. Drawing a map of the United States is a good illustration of isometric reconstruction from complete distance data. We obtained latitude and longitude information for the coast, border, states, and Great Lakes from the usalo atlas data file within the Matlab Mapping Toolbox; the conversion to Cartesian coordinates $(x,y,z)$ via:

  $\phi \triangleq \pi/2 - \text{latitude}$
  $\theta \triangleq \text{longitude}$
  $x = \sin(\phi)\cos(\theta)$
  $y = \sin(\phi)\sin(\theta)$
  $z = \cos(\phi)$   (703)

We used 64% of the available map data to calculate EDM $D$ from $N = 5020$ points. The original (decimated) data and its isometric reconstruction via (694) are shown in Figure 74(a)-(d). The Matlab code is in §G.3.1. The eigenvalues computed for (692) are

  $\lambda(-V_N^TDV_N) = [\,199.8\ \ 152.3\ \ 2.465\ \ 0\ \ 0\ \ 0\ \cdots\,]^T$   (704)

The 0 eigenvalues have absolute numerical error on the order of 2E-13; meaning, the EDM data indicates three dimensions ($r = 3$) are required for reconstruction to nearly machine precision.

4.13.2 Isotonic reconstruction

Sometimes only comparative information about distance is known (Earth is closer to the Moon than it is to the Sun). Suppose, for example, the EDM $D$ for three points is unknown:


  $D = [d_{ij}] = \begin{bmatrix}0 & d_{12} & d_{13}\\ d_{12} & 0 & d_{23}\\ d_{13} & d_{23} & 0\end{bmatrix} \in \mathbb{S}^3_h$   (457)

but the comparative data is available:

  $d_{13} \ge d_{23} \ge d_{12}$   (705)

With the vectorization $d = [\,d_{12}\ \ d_{13}\ \ d_{23}\,]^T\in\mathbb{R}^3$, we express the comparative distance relationship as the nonincreasing sorting

  $\Xi d = \begin{bmatrix}0 & 1 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0\end{bmatrix}\begin{bmatrix}d_{12}\\ d_{13}\\ d_{23}\end{bmatrix} = \begin{bmatrix}d_{13}\\ d_{23}\\ d_{12}\end{bmatrix} \in \mathcal{K}_{M+}$   (706)

where $\Xi$ is a given permutation matrix expressing the known sorting action on the entries of unknown EDM $D$, and $\mathcal{K}_{M+}$ is the monotone nonnegative cone (§2.13.9.4.1)

  $\mathcal{K}_{M+} \triangleq \{z \mid z_1 \ge z_2 \ge \cdots \ge z_{N(N-1)/2} \ge 0\} \subseteq \mathbb{R}^{N(N-1)/2}_+$   (354)

where $N(N-1)/2 = 3$ for the present example. From the sorted vectorization (706) we create the sort-index matrix

  $O = \begin{bmatrix}0 & 1^2 & 3^2\\ 1^2 & 0 & 2^2\\ 3^2 & 2^2 & 0\end{bmatrix} \in \mathbb{S}^3_h\cap\mathbb{R}^{3\times 3}_+$   (707)

generally defined

  $O_{ij} \triangleq k^2 \mid d_{ij} = (\Pi\,\Xi d)_k\,,\quad j\ne i$   (708)

where $\Pi$ is a permutation matrix completely reversing order of vector entries. Replacing EDM data with indices-square of a nonincreasing sorting like this is, of course, a heuristic we invented and may be regarded as a nonlinear introduction of much noise into the Euclidean distance matrix. For large data sets, this heuristic makes an otherwise intense problem computationally tractable; we see an example in relaxed problem (712).
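Construction (708) is mechanical. A sketch for this three-point example (squareform, from the Statistics Toolbox, is assumed available; it uses the same vector ordering as $d$ here):

% Sketch of sort-index matrix construction (706)-(708), N = 3.
d = [1 9 4]';                 % hypothetical [d12 d13 d23] obeying (705)
[~,k] = sort(d,'ascend');     % Pi*Xi*d: entries in nondecreasing order
rankidx = zeros(size(d));  rankidx(k) = (1:numel(d)).^2;
O = squareform(rankidx)       % sort-index matrix (707): [0 1 9; 1 0 4; 9 4 0]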


Figure 75: Largest ten eigenvalues of $-V_N^TOV_N$ for map of USA, sorted by nonincreasing value. In the code (§G.3.2), we normalize $O$ by $(N(N-1)/2)^2$. [Plot: $\lambda(-V_N^TOV_N)_j$ versus $j = 1\ldots 10$.]

Any process of reconstruction that leaves comparative distance information intact is called ordinal multidimensional scaling or isotonic reconstruction. Beyond rotation, reflection, and translation error (§4.5), list reconstruction by isotonic reconstruction is subject to error in absolute scale (dilation) and distance ratio. Yet Borg & Groenen argue: [36, §2.2] reconstruction from complete comparative distance information for a large number of points is as highly constrained as reconstruction from an EDM; the larger the number, the better.

4.13.2.1 Isotonic map of the USA

To test Borg & Groenen's conjecture, suppose we make a complete sort-index matrix $O\in\mathbb{S}^N_h\cap\mathbb{R}^{N\times N}_+$ for the map of the USA and then substitute $O$ in place of EDM $D$ in the reconstruction process of §4.12. Whereas EDM $D$ returned only three significant eigenvalues (704), the sort-index matrix $O$ is generally not an EDM (certainly not an EDM with corresponding affine dimension 3), so returns many more. The eigenvalues, calculated with absolute numerical error approximately 5E-7, are plotted in Figure 75:

  $\lambda(-V_N^TOV_N) = [\,880.1\ \ 463.9\ \ 186.1\ \ 46.20\ \ 17.12\ \ 9.625\ \ 8.257\ \ 1.701\ \ 0.7128\ \ 0.6460\ \cdots\,]^T$   (709)


The extra eigenvalues indicate that affine dimension corresponding to an EDM near $O$ is likely to exceed 3. To realize the map, we must simultaneously reduce that dimensionality and find an EDM $D$ closest to $O$ in some sense (a problem explored more in §7) while maintaining the known comparative distance relationship; e.g., given permutation matrix $\Xi$ expressing the known sorting action on the entries $d$ of unknown $D\in\mathbb{S}^N_h$, (62)

  $d \triangleq \tfrac{1}{\sqrt{2}}\operatorname{dvec}D = \begin{bmatrix}d_{12}\\ d_{13}\\ d_{23}\\ d_{14}\\ d_{24}\\ d_{34}\\ \vdots\\ d_{N-1,N}\end{bmatrix} \in \mathbb{R}^{N(N-1)/2}$   (710)

we can make the sort-index matrix $O$ input to the optimization problem

  minimize_$D$  $\|-V_N^T(D-O)V_N\|_F$
  subject to  $\operatorname{rank}V_N^TDV_N \le 3$
        $\Xi d \in \mathcal{K}_{M+}$
        $D \in \mathrm{EDM}^N$   (711)

that finds the EDM $D$ (corresponding to affine dimension not exceeding 3 in isomorphic $\operatorname{dvec}\mathrm{EDM}^N\cap\,\Xi^T\mathcal{K}_{M+}$) closest to $O$ in the sense of Schoenberg (487).

Analytical solution to this problem, ignoring the sort constraint $\Xi d\in\mathcal{K}_{M+}$, is known [237]: we get the convex optimization [sic] (§7.1)

  minimize_$D$  $\|-V_N^T(D-O)V_N\|_F$
  subject to  $\operatorname{rank}V_N^TDV_N \le 3$
        $D \in \mathrm{EDM}^N$   (712)

Only the three largest nonnegative eigenvalues in (709) need be retained to make list (696); the rest are discarded. The reconstruction from EDM $D$ found in this manner is plotted in Figure 74(e)(f), from which it becomes obvious that inclusion of the sort constraint is necessary for isotonic reconstruction.
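The known solution just described amounts to spectral truncation. A sketch, presuming a sort-index matrix already in workspace variable O:

% Sketch of the known solution to relaxed problem (712): project the
% spectrum of -VN'*O*VN onto its three largest nonnegative eigenvalues,
% then rebuild a list as in (696). O is an assumed sort-index matrix.
N  = size(O,1);
VN = (1/sqrt(2))*[-ones(1,N-1); eye(N-1)];
[Q,L] = eig(-VN'*O*VN);
[lam,idx] = sort(diag(L),'descend');  Q = Q(:,idx);
lam = max(lam,0);  lam(4:end) = 0;    % keep three largest nonnegative eigenvalues
Xr  = [zeros(3,1)  diag(sqrt(lam(1:3)))*Q(:,1:3)'];   % reconstructed list (696)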


That sort constraint demands: any optimal solution $D^\star$ must possess the known comparative distance relationship that produces the original ordinal distance data $O$ (708). Ignoring the sort constraint, apparently, violates it. Yet even more remarkable is how much the map reconstructed using only ordinal data still resembles the original map of the USA after suffering the many violations produced by solving relaxed problem (712). This suggests the simple reconstruction techniques of §4.12 are robust to a significant amount of noise.

4.13.2.2 Isotonic solution with sort constraint

Because problems involving rank are generally difficult, we will partition (711) into two problems we know how to solve and then alternate their solution until convergence:

  (a)  minimize_$D$  $\|-V_N^T(D-O)V_N\|_F$
     subject to  $\operatorname{rank}V_N^TDV_N \le 3$,  $D \in \mathrm{EDM}^N$

  (b)  minimize_$\sigma$  $\|\sigma - \Xi d\|$
     subject to  $\sigma \in \mathcal{K}_{M+}$   (713)

where the sort-index matrix $O$ (a given constant in (a)) becomes an implicit vectorized variable $o_i$ solving the $i$th instance of (713b)

  $o_i \triangleq \Xi^T\sigma^\star = \tfrac{1}{\sqrt{2}}\operatorname{dvec}O_i \in \mathbb{R}^{N(N-1)/2}\,,\quad i\in\{1, 2, 3\ldots\}$   (714)

As mentioned in discussion of relaxed problem (712), a closed-form solution to problem (713a) exists. Only the first iteration of (713a) sees the original sort-index matrix $O$ whose entries are nonnegative whole numbers; id est, $O_0 = O\in\mathbb{S}^N_h\cap\mathbb{R}^{N\times N}_+$ (708). Subsequent iterations $i$ take the previous solution of (713b) as input

  $O_i = \operatorname{dvec}^{-1}(\sqrt{2}\,o_i) \in \mathbb{S}^N$   (715)

real successors to the sort-index matrix $O$.

New problem (713b) finds the unique minimum-distance projection of $\Xi d$ on the monotone nonnegative cone $\mathcal{K}_{M+}$. By defining

  $Y^{\dagger T} \triangleq [\,e_1-e_2\ \ e_2-e_3\ \ e_3-e_4\ \cdots\ e_m\,] \in \mathbb{R}^{m\times m}$   (355)


where $m \triangleq N(N-1)/2$, we may rewrite (713b) as an equivalent quadratic program; a convex optimization problem [41, §4] in terms of the halfspace-description of $\mathcal{K}_{M+}$:

  minimize_$\sigma$  $(\sigma - \Xi d)^T(\sigma - \Xi d)$
  subject to  $Y^{\dagger}\sigma \succeq 0$   (716)

This quadratic program can be converted to a semidefinite program via Schur form (§A.4.1); we get the equivalent problem

  minimize_{$t\in\mathbb{R},\,\sigma$}  $t$
  subject to  $\begin{bmatrix}tI & \sigma - \Xi d\\ (\sigma - \Xi d)^T & 1\end{bmatrix} \succeq 0$
        $Y^{\dagger}\sigma \succeq 0$   (717)

4.13.2.3 Convergence

In §E.10 we discuss convergence of alternating projection on intersecting convex sets in a Euclidean vector space; convergence to a point in their intersection. Here the situation is different for two reasons:

Firstly, sets of positive semidefinite matrices having an upper bound on rank are generally not convex. Yet in §7.1.4.0.1 we prove (713a) is equivalent to a projection of nonincreasingly ordered eigenvalues on a subset of the nonnegative orthant:

  minimize_$D$  $\|-V_N^T(D-O)V_N\|_F$        minimize_$\Upsilon$  $\|\Upsilon - \Lambda\|_F$
  subject to  $\operatorname{rank}V_N^TDV_N \le 3$   ≡   subject to  $\delta(\Upsilon) \in \begin{bmatrix}\mathbb{R}^3_+\\ 0\end{bmatrix}$
        $D \in \mathrm{EDM}^N$   (718)

where $-V_N^TDV_N \triangleq U\Upsilon U^T\in\mathbb{S}^{N-1}$ and $-V_N^TOV_N \triangleq Q\Lambda Q^T\in\mathbb{S}^{N-1}$ are ordered diagonalizations (§A.5). It so happens: optimal orthogonal $U^\star$ always equals $Q$ given. Linear operator $T(A) = U^{\star T}AU^\star$, acting on square matrix $A$, is a bijective isometry because the Frobenius norm is orthogonally invariant (39). This isometric isomorphism $T$ thus maps a nonconvex problem to a convex one that preserves distance.


Secondly, the second half (713b) of the alternation takes place in a different vector space; $\mathbb{S}^N_h$ (versus $\mathbb{S}^{N-1}$). From §4.6 we know these two vector spaces are related by an isomorphism, $\mathbb{S}^{N-1} = V_N(\mathbb{S}^N_h)$ (584), but not by an isometry.

We have, therefore, no guarantee from theory of alternating projection that the alternation (713) converges to a point, in the set of all EDMs corresponding to affine dimension not in excess of 3, belonging to $\operatorname{dvec}\mathrm{EDM}^N\cap\,\Xi^T\mathcal{K}_{M+}$.

4.13.2.4 Interlude

We have not implemented the second half (716) of alternation (713) for USA map data because memory demands exceed the capability of our 32-bit laptop computer.

4.13.2.4.1 Exercise.
Empirically demonstrate convergence, discussed in §4.13.2.3, on a smaller data set.

It would be remiss not to mention another method of solution to this isotonic reconstruction problem: Once again we assume only comparative distance data like (705) is available. Given known set of indices $I$,

  minimize_$D$  $\operatorname{rank}VDV$
  subject to  $d_{ij} \le d_{kl} \le d_{mn} \quad \forall\,(i,j,k,l,m,n)\in I$
        $D \in \mathrm{EDM}^N$   (719)

this problem minimizes affine dimension while finding an EDM whose entries satisfy known comparative relationships. Suitable rank heuristics are discussed in §7.2.2 that will transform this to a convex optimization problem. Using contemporary computers, even with a rank heuristic in place of the objective function, this problem formulation is more difficult to compute than the relaxed counterpart problem (712). That is because there exist efficient algorithms to compute a selected few eigenvalues and eigenvectors from a very large matrix. Regardless, it is important to recognize: the optimal solution set for this problem (719) is practically always different from the optimal solution set for its counterpart, problem (711).
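Returning to subproblem (716): the projection of $\Xi d$ on $\mathcal{K}_{M+}$ is a small quadratic program. A sketch using quadprog (Optimization Toolbox assumed), on hypothetical data:

% Sketch of (716): project Xi*d onto the monotone nonnegative cone K_M+.
xd = [3 5 1 0 -2]';                     % assumed Xi*d, m = 5
m  = numel(xd);
Yd = -diff([eye(m); zeros(1,m)]);       % rows: e_i'-e_{i+1}', last row e_m'
% minimize (sigma-xd)'*(sigma-xd)  s.t.  Yd*sigma >= 0
sigma = quadprog(2*eye(m), -2*xd, -Yd, zeros(m,1))   % = [4 4 1 0 0]'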


4.14 Fifth property of Euclidean metric

We continue now with the question raised in §4.3 regarding the necessity for at least one requirement more than the four properties of the Euclidean metric (§4.2) to certify realizability of a bounded convex polyhedron or to reconstruct a generating list for it from incomplete distance information. There we saw that the four Euclidean metric properties are necessary for $D\in\mathrm{EDM}^N$ in the case $N = 3$, but become insufficient when cardinality $N$ exceeds 3 (regardless of affine dimension).

4.14.1 Recapitulate

In the particular case $N = 3$, $-V_N^TDV_N\succeq 0$ (624) and $D\in\mathbb{S}^3_h$ are necessary and sufficient conditions for $D$ to be an EDM. By (626), triangle inequality is then the only Euclidean condition bounding the necessarily nonnegative $d_{ij}$; and those bounds are tight. That means the first four properties of the Euclidean metric are necessary and sufficient conditions for $D$ to be an EDM in the case $N = 3$; for $i,j\in\{1, 2, 3\}$,

  $\left.\begin{array}{l} \sqrt{d_{ij}} \ge 0\,,\ i\ne j\\ \sqrt{d_{ij}} = 0\,,\ i=j\\ \sqrt{d_{ij}} = \sqrt{d_{ji}}\\ \sqrt{d_{ij}} \le \sqrt{d_{ik}} + \sqrt{d_{kj}}\,,\ i\ne j\ne k \end{array}\right\} \Leftrightarrow \begin{array}{l}-V_N^TDV_N\succeq 0\\ D\in\mathbb{S}^3_h\end{array} \Leftrightarrow D\in\mathrm{EDM}^3$   (720)

Yet those four properties become insufficient when $N > 3$.

4.14.2 Derivation of the fifth

Correspondence between the triangle inequality and the EDM was developed in §4.8.2, where a triangle inequality (626a) was revealed within the leading principal 2×2 submatrix of $-V_N^TDV_N$ when positive semidefinite. Our choice of the leading principal submatrix was arbitrary; actually, a unique triangle inequality like (530) corresponds to any one of the $(N-1)!/(2!(N-1-2)!)$ principal 2×2 submatrices.^{4.48} Assuming $D\in\mathbb{S}^4_h$ and $-V_N^TDV_N\in\mathbb{S}^3$, then by the positive (semi)definite principal submatrices

^{4.48} There are fewer principal 2×2 submatrices in $-V_N^TDV_N$ than there are triangles made by four or more points because there are $N!/(3!(N-3)!)$ triangles made by point triples. The triangles corresponding to those submatrices all have vertex $x_1$. (confer §4.8.2.1)


theorem (§A.3.1.0.4) it is sufficient to prove all $d_{ij}$ are nonnegative, all triangle inequalities are satisfied, and $\det(-V_N^TDV_N)$ is nonnegative. When $N = 4$, in other words, that nonnegative determinant becomes the fifth and last Euclidean metric requirement for $D\in\mathrm{EDM}^N$. We now endeavor to ascribe geometric meaning to it.

4.14.2.1 Nonnegative determinant

By (536), when $D\in\mathrm{EDM}^4$, $-V_N^TDV_N$ is equal to inner product (531),

  $\Theta^T\Theta = \begin{bmatrix} d_{12} & \sqrt{d_{12}d_{13}}\cos\theta_{213} & \sqrt{d_{12}d_{14}}\cos\theta_{214}\\[2pt] \sqrt{d_{12}d_{13}}\cos\theta_{213} & d_{13} & \sqrt{d_{13}d_{14}}\cos\theta_{314}\\[2pt] \sqrt{d_{12}d_{14}}\cos\theta_{214} & \sqrt{d_{13}d_{14}}\cos\theta_{314} & d_{14} \end{bmatrix}$   (721)

Because Euclidean space is an inner-product space, the more concise inner-product form of the determinant is admitted;

  $\det(\Theta^T\Theta) = -d_{12}d_{13}d_{14}\left(\cos(\theta_{213})^2 + \cos(\theta_{214})^2 + \cos(\theta_{314})^2 - 2\cos\theta_{213}\cos\theta_{214}\cos\theta_{314} - 1\right)$   (722)

The determinant is nonnegative if and only if

  $\cos\theta_{214}\cos\theta_{314} - \sqrt{\sin(\theta_{214})^2\sin(\theta_{314})^2} \le \cos\theta_{213} \le \cos\theta_{214}\cos\theta_{314} + \sqrt{\sin(\theta_{214})^2\sin(\theta_{314})^2}$
  $\Leftrightarrow$
  $\cos\theta_{213}\cos\theta_{314} - \sqrt{\sin(\theta_{213})^2\sin(\theta_{314})^2} \le \cos\theta_{214} \le \cos\theta_{213}\cos\theta_{314} + \sqrt{\sin(\theta_{213})^2\sin(\theta_{314})^2}$
  $\Leftrightarrow$
  $\cos\theta_{213}\cos\theta_{214} - \sqrt{\sin(\theta_{213})^2\sin(\theta_{214})^2} \le \cos\theta_{314} \le \cos\theta_{213}\cos\theta_{214} + \sqrt{\sin(\theta_{213})^2\sin(\theta_{214})^2}$   (723)

which simplifies, for $0\le\theta_{i1l},\theta_{l1j},\theta_{i1j}\le\pi$ and all $i\ne j\ne l\in\{2, 3, 4\}$, to

  $\cos(\theta_{i1l} + \theta_{l1j}) \le \cos\theta_{i1j} \le \cos(\theta_{i1l} - \theta_{l1j})$   (724)

Analogously to triangle inequality (638), the determinant is 0 upon equality on either side of (724), which is tight. Inequality (724) can be equivalently written linearly as a "triangle inequality", but between relative angles [269, §1.4];

  $|\theta_{i1l} - \theta_{l1j}| \le \theta_{i1j} \le \theta_{i1l} + \theta_{l1j}$
  $\theta_{i1l} + \theta_{l1j} + \theta_{i1j} \le 2\pi$
  $0 \le \theta_{i1l},\theta_{l1j},\theta_{i1j} \le \pi$   (725)

Generalizing this:


Figure 76: The relative-angle inequality tetrahedron (726) bounding $\mathrm{EDM}^4$ is regular; drawn in entirety. Each angle $\theta$ (528) must belong to this solid to be realizable. [Axes: $\theta_{213}$, $\theta_{214}$, $\theta_{314}$, each ranging to $\pi$.]

4.14.2.1.1 Fifth property of Euclidean metric - restatement.
Relative-angle inequality. (confer §4.3.1.0.1) [33] [34, p.17, p.107] [152, §3.1]
Augmenting the four fundamental Euclidean metric properties in $\mathbb{R}^n$, for all $i,j,l \ne k\in\{1\ldots N\}$, $i<j<l$, and for $N\ge 4$ distinct points $\{x_k\}$, the inequalities

  $\cos(\theta_{ikl} + \theta_{lkj}) \le \cos\theta_{ikj} \le \cos(\theta_{ikl} - \theta_{lkj})$
  $0 \le \theta_{ikl},\theta_{lkj},\theta_{ikj} \le \pi$   (726)

where $\theta_{ikj} = \theta_{jki}$ is the angle between vectors at vertex $x_k$ (528), must be satisfied at each point $x_k$ regardless of affine dimension.  △


In light of this fifth property, the triangle inequality remains a necessary although penultimate test; (§4.4.3)

  Four Euclidean metric properties (§4.2).    $-V_N^TDV_N\succeq 0$
  Angle $\theta$ inequality (462) or (726).  ⇔  $D\in\mathbb{S}^4_h$  ⇔  $D = D(\Theta)\in\mathrm{EDM}^4$   (727)

The relative-angle inequality, for this case, is illustrated in Figure 76.

4.14.2.2 Beyond the fifth metric property

When cardinality $N$ exceeds 4, the first four properties of the Euclidean metric and the relative-angle inequality together become insufficient conditions for realizability. In other words, the four Euclidean metric properties and relative-angle inequality remain necessary but become a sufficient test only for positive semidefiniteness of all the principal 3×3 submatrices [sic] in $-V_N^TDV_N$. Relative-angle inequality can be considered the ultimate test only for realizability at each vertex $x_k$ of each and every purported tetrahedron constituting a hyperdimensional body.

When $N = 5$ in particular, relative-angle inequality becomes the penultimate Euclidean metric requirement while nonnegativity of the then unwieldy $\det(\Theta^T\Theta)$ corresponds (by the positive (semi)definite principal submatrices theorem in §A.3.1.0.4) to the sixth and last Euclidean metric requirement. Together these six tests become necessary and sufficient, and so on.

Yet for all values of $N$, only assuming nonnegative $d_{ij}$, relative-angle matrix inequality in (640) is necessary and sufficient to certify realizability; (§4.4.3.1)

  Euclidean metric property 1 (§4.2).      $-V_N^TDV_N\succeq 0$
  Angle matrix inequality $\Omega\succeq 0$ (537).  ⇔  $D\in\mathbb{S}^N_h$  ⇔  $D = D(\Omega,d)\in\mathrm{EDM}^N$   (728)

Like matrix criteria (463), (487), and (640), the relative-angle matrix inequality and nonnegativity property subsume all the Euclidean metric properties and further requirements.

4.14.3 Path not followed

As a means to test for realizability of four or more points, an intuitively appealing way to augment the four Euclidean metric properties


is to recognize generalizations of the triangle inequality: In the case N = 4, the three-dimensional analogue to triangle & distance is tetrahedron & facet-area, while in the case N = 5 the four-dimensional analogue is polychoron & facet-volume, ad infinitum. For N points, N + 1 metric properties are required.

4.14.3.1 N = 4

Each of the four facets of a general tetrahedron is a triangle and its relative interior. Suppose we identify each facet of the tetrahedron by its area-squared: c_1, c_2, c_3, c_4. Then analogous to metric property 4, we may write a tight^{4.49} area inequality for the facets

    √c_i ≤ √c_j + √c_k + √c_l ,   i≠j≠k≠l ∈ {1, 2, 3, 4}   (729)

which is a generalized "triangle" inequality [147, §1.1] that follows from

    √c_i = √c_j cosϕ_ij + √c_k cosϕ_ik + √c_l cosϕ_il   (730)

[158] [252, Law of Cosines] where ϕ_ij is the dihedral angle at the common edge between triangular facets i and j.

If D is the EDM corresponding to the whole tetrahedron, then area-squared of the i-th triangular facet has a convenient formula in terms of D_i ∈ EDM^{N−1}, the EDM corresponding to that particular facet: From the Cayley-Menger determinant^{4.50} for simplices, [252] [74] [95, §4] [53, §3.3] the i-th facet area-squared for i ∈ {1...N} is (§A.4.2)

    c_i = −1/(2^{N−2} (N−2)!²) det [ 0   1^T
                                     1  −D_i ]   (731)

        = (−1)^N/(2^{N−2} (N−2)!²) det(D_i) (1^T D_i^{−1} 1)   (732)

        = (−1)^N/(2^{N−2} (N−2)!²) 1^T cof(D_i)^T 1   (733)

where D_i is the i-th principal N−1×N−1 submatrix^{4.51} of D ∈ EDM^N, and cof(D_i) is the N−1×N−1 matrix of cofactors [220, §4] corresponding to D_i.

4.49 The upper bound is met when all angles in (730) are simultaneously 0; that occurs, for example, if one point is relatively interior to the convex hull of the three remaining.
4.50 whose foremost characteristic is: the determinant vanishes if and only if affine dimension does not equal penultimate cardinality; id est, det [ 0 1^T ; 1 −D ] = 0 ⇔ r < N−1 where D is any EDM (§4.7.3.0.1). Otherwise, the determinant is negative.
4.51 Every principal submatrix of an EDM remains an EDM. [152, §4.1]
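A quick numeric sanity check of (731) takes a few lines of Matlab; this minimal sketch assumes the facet is a right triangle with both legs of unit length:

    % Facet area-squared via Cayley-Menger formula (731); Di holds the
    % distance-squares of a hypothetical right triangle with legs 1, 1.
    Di = [0 1 1; 1 0 2; 1 2 0];      % 3x3 EDM of the triangular facet
    N  = 4;                           % the facet bounds a tetrahedron
    ci = -det([0 ones(1,3); ones(3,1) -Di])/(2^(N-2)*factorial(N-2)^2);
    sqrt(ci)                          % = 1/2, the familiar area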


The number of principal 3×3 submatrices in D is, of course, equal to the number of triangular facets in the tetrahedron; four (N!/(3!(N−3)!)) when N = 4.

The triangle inequality (property 4) and area inequality (729) are conditions necessary for D to be an EDM; we do not prove their sufficiency in conjunction with the remaining three Euclidean metric properties.

4.14.3.2 N = 5

Moving to the next level, we might encounter a Euclidean body called a polychoron, a bounded polyhedron in four dimensions.^{4.52} The polychoron has five (N!/(4!(N−4)!)) facets, each of them a general tetrahedron whose volume-squared c_i is calculated using the same formula; (731) where D is the EDM corresponding to the polychoron, and D_i is the EDM corresponding to the i-th facet (the principal 4×4 submatrix of D ∈ EDM^N corresponding to the i-th tetrahedron). The analogue to triangle & distance is now polychoron & facet-volume. We could then write another generalized "triangle" inequality like (729) but in terms of facet volume; [256, §IV]

    √c_i ≤ √c_j + √c_k + √c_l + √c_m ,   i≠j≠k≠l≠m ∈ {1...5}   (734)

Now, for N = 5, the triangle (distance) inequality (§4.2) and the area inequality (729) and the volume inequality (734) are conditions necessary for D to be an EDM. We do not prove their sufficiency.

4.14.3.3 Volume of simplices

There is no known formula for the volume of a bounded general convex polyhedron expressed either by halfspace or vertex-description. [267, §2.1] [185, p.173] [149] [150] [105] [106] Volume is a concept germane to R³; in higher dimensions it is called content. Applying the EDM assertion (§4.9.1.0.3) and a result given in [41, §8.3.1], a general nonempty simplex (§2.12.3) in R^{N−1} corresponding to an EDM D ∈ S^N_h has content

    √c = content(S) det^{1/2}(−V_N^T D V_N)   (735)

4.52 The simplest polychoron is called a pentatope [252]; a regular simplex, hence convex. (A pentahedron is a three-dimensional body having five vertices.)


Figure 77: Length of one-dimensional face a equals height h = a = 1 of this nonsimplicial pyramid in R³ with square base inscribed in a circle of radius R centered at the origin. [252, Pyramid]

where the content-squared of the unit simplex S ⊂ R^{N−1} is proportional to its Cayley-Menger determinant;

    content(S)² = −1/(2^{N−1} (N−1)!²) det [ 0   1^T
                                             1  −D([0 e_1 e_2 ··· e_{N−1}]) ]   (736)

where e_i ∈ R^{N−1} and the EDM operator used is D(X) (468).

4.14.3.3.1 Example. Pyramid.
A formula for volume of a pyramid is known; it is 1/3 the product of its base area with its height.^{4.53} [145] The pyramid in Figure 77 has volume 1/3. To find its volume using EDMs, we must first decompose the pyramid into simplicial parts. Slicing it in half along the plane containing the line segments corresponding to radius R and height h, we find the vertices of one simplex,

    X = [ 1/2   1/2  −1/2   0
          1/2  −1/2  −1/2   0
           0     0     0    1 ] ∈ R^{n×N}   (737)

where N = n + 1 for any nonempty simplex in R^n. The volume of this simplex is half that of the entire pyramid; id est, √c = 1/6, found by evaluating (735).

With that, we conclude digression of path.

4.53 Pyramid volume is independent of the paramount vertex position as long as its height remains constant.
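Formula (735) is also easy to evaluate numerically; a minimal Matlab sketch for this example, using content(S) = 1/(N−1)! for the unit simplex and the auxiliary matrix V_N = (1/√2)[−1^T; I]:

    % Content of the half-pyramid simplex (737) via (735).
    X  = [.5 .5 -.5 0; .5 -.5 -.5 0; 0 0 0 1];   % vertices from (737)
    N  = 4;  G = X'*X;
    D  = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
    Vn = [-ones(1,N-1); eye(N-1)]/sqrt(2);       % auxiliary matrix V_N
    c  = sqrt(det(-Vn'*D*Vn))/factorial(N-1)     % = 1/6, half the pyramid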


4.14.4 Affine dimension reduction in three dimensions

(confer §4.8.4) The determinant of any M×M matrix is equal to the product of its M eigenvalues. [220, §5.1] When N = 4 and det(Θ^T Θ) is 0, that means one or more eigenvalues of Θ^T Θ ∈ R^{3×3} are 0. The determinant will go to 0 whenever equality is attained on either side of (462), (726a), or (726b), meaning that a tetrahedron has collapsed to a lower affine dimension; id est, r = rank Θ^T Θ = rank Θ is reduced below N − 1 exactly by the number of 0 eigenvalues (§4.7.1.1).

In solving completion problems of any size N where one or more entries of an EDM are unknown, therefore, dimension r of the affine hull required to contain the unknown points is potentially reduced by selecting distances to attain equality in (462) or (726a) or (726b).

4.14.4.1 Exemplum redux

We now apply the fifth Euclidean metric property to an earlier problem:

4.14.4.1.1 Example. Small completion problem, IV. (confer §4.9.3.0.1)
Returning again to Example 4.3.0.0.2 that pertains to Figure 59 where N = 4, distance-square d_14 is ascertainable from the fifth Euclidean metric property. Because all distances in (460) are known except √d_14, cosθ_123 = 0 and θ_324 = 0 result from identity (528). Applying (462),

    cos(θ_123 + θ_324) ≤ cosθ_124 ≤ cos(θ_123 − θ_324)
    0 ≤ cosθ_124 ≤ 0   (738)

It follows again from (528) that d_14 can only be 2. As explained in this subsection, affine dimension r cannot exceed N − 2 because equality is attained in (738).




Chapter 5

EDM cone

    For N > 3, the cone of EDMs is no longer a circular cone and the geometry becomes complicated...
    −Hayden, Wells, Liu, & Tarazaga (1991) [115, §3]

In the subspace of symmetric matrices S^N, we know the convex cone of Euclidean distance matrices EDM^N does not intersect the positive semidefinite (PSD) cone S^N_+ except at the origin, their only vertex; there can be no positive nor negative semidefinite EDM. (663) [152]

    EDM^N ∩ S^N_+ = 0   (739)

Even so, the two convex cones can be related. We prove the new equality

    EDM^N = S^N_h ∩ ( S^{N⊥}_c − S^N_+ )   (827)


a resemblance to EDM definition (468), where

    S^N_h ≜ { A ∈ S^N | δ(A) = 0 }   (55)

is the symmetric hollow subspace (§2.2.3) and where

    S^{N⊥}_c = { u1^T + 1u^T | u ∈ R^N }   (1602)

is the orthogonal complement of the geometric center subspace (§E.7.2.0.2)

    S^N_c ≜ { Y ∈ S^N | Y1 = 0 }   (1600)

5.0.1 gravity

Equality (827) is equally important as the known isomorphisms (573) (574) (585) (586) relating the EDM cone EDM^N to an N(N−1)/2-dimensional face of S^N_+ (§4.6.1.1), or to S^{N−1}_+ (§4.6.2.1).^{5.1} Those isomorphisms have never led to this equality (827) relating the whole cones EDM^N and S^N_+. Equality (827) is not obvious from the various EDM matrix definitions such as (468) or (751) because it is nontrivial to prove inclusion; EDM^N ⊇ S^N_h ∩ (S^{N⊥}_c − S^N_+). Specifically, while it is easy to demonstrate how S^N_+ can be spanned, any proof of inclusion must likewise show how S^{N⊥}_c may be spanned. Otherwise, that equality claimed by (827) would not hold. Although an algebraic proof might be attractive, we instead prove (827) using purely geometric methods.

5.0.2 highlight

In §5.8.1.7 we show: the Schoenberg criterion for discriminating Euclidean distance matrices

    D ∈ EDM^N   ⇔   −V_N^T D V_N ∈ S^{N−1}_+  and  D ∈ S^N_h   (487)

is a discretized membership relation (§2.13.4) between the EDM cone and its ordinary dual.

5.1 Because both positive semidefinite cones are frequently in play, dimension is notated.


5.1 Defining EDM cone

We invoke a popular matrix criterion to illustrate correspondence between the EDM and PSD cones belonging to the ambient space of symmetric matrices:

    D ∈ EDM^N   ⇔   −V D V ∈ S^N_+  and  D ∈ S^N_h   (492)

where V ∈ S^N is the geometric centering projection matrix (§B.4). The set of all EDMs of dimension N×N forms a closed convex cone EDM^N because any pair of EDMs satisfies the definition of a convex cone (139); videlicet, for each and every ζ_1, ζ_2 ≥ 0 (§A.3.1.0.2)

    ζ_1 V D_1 V + ζ_2 V D_2 V ≽ 0, ζ_1 D_1 + ζ_2 D_2 ∈ S^N_h   ⇐   V D_1 V ≽ 0, V D_2 V ≽ 0, D_1 ∈ S^N_h, D_2 ∈ S^N_h   (740)

and convex cones are invariant to inverse affine transformation [202, p.22].

5.1.0.0.1 Definition. Cone of Euclidean distance matrices.
In the subspace of symmetric matrices, the set of all Euclidean distance matrices forms a unique immutable pointed closed convex cone called the EDM cone: for N > 0

    EDM^N ≜ { D ∈ S^N_h | −V D V ∈ S^N_+ }
          = ⋂_{z ∈ N(1^T)} { D ∈ S^N | ⟨zz^T, −D⟩ ≥ 0, δ(D) = 0 }   (741)

The EDM cone in isomorphic R^{N(N+1)/2} [sic] is the intersection of an infinite number (when N > 2) of halfspaces about the origin and a finite number of hyperplanes through the origin in vectorized variable D = [d_ij]. Hence EDM^N has empty interior with respect to S^N because it is confined to the symmetric hollow subspace S^N_h. The EDM cone relative interior comprises

    rel int EDM^N = ⋂_{z ∈ N(1^T)} { D ∈ S^N | ⟨zz^T, −D⟩ > 0, δ(D) = 0 }
                  = { D ∈ EDM^N | rank(V D V) = N − 1 }   (742)

while its relative boundary comprises

    rel ∂EDM^N = { D ∈ EDM^N | ⟨zz^T, −D⟩ = 0 for some z ∈ N(1^T) }
               = { D ∈ EDM^N | rank(V D V) < N − 1 }   (743)   △
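Definition 5.1.0.0.1 (equivalently, the Schoenberg criterion (487)) translates directly into a numeric membership test; a minimal Matlab sketch, with a small tolerance standing in for exact semidefiniteness:

    % Numeric membership test for EDM^N (save as isEDM.m): symmetric,
    % hollow, and -Vn'*D*Vn positive semidefinite up to rounding.
    function tf = isEDM(D)
      N  = size(D,1);
      Vn = [-ones(1,N-1); eye(N-1)]/sqrt(2);
      tf = isequal(D,D') && all(diag(D)==0) && min(eig(-Vn'*D*Vn)) > -1e-10;
    end

For example, isEDM([0 1 4; 1 0 1; 4 1 0]) returns true, while isEDM([0 1 9; 1 0 1; 9 1 0]) returns false because the triangle inequality fails in absolute distance (1 + 1 < 3).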


Figure 78: Relative boundary (tiled) of EDM cone EDM³ drawn truncated in isometrically isomorphic subspace R³. (a) EDM cone drawn in usual distance-square coordinates d_ij. View is from interior toward origin. Unlike positive semidefinite cone, EDM cone is not self-dual, neither is it proper in ambient symmetric subspace (dual EDM cone for this example belongs to isomorphic R⁶). (b) Drawn in its natural coordinates √d_ij (absolute distance), cone remains convex (confer §4.10); intersection of three halfspaces (627) whose partial boundaries each contain origin. Cone geometry becomes "complicated" (nonpolyhedral) in higher dimension. [115, §3] (c) Two coordinate systems artificially superimposed. Coordinate transformation from d_ij to √d_ij appears a topological contraction. (d) Sitting on its vertex 0, pointed EDM³ is a circular cone having axis of revolution dvec(−E) = dvec(11^T − I) (659) (62). Rounded vertex is plot artifact.


This cone is more easily visualized in the isomorphic vector subspace R^{N(N−1)/2} corresponding to S^N_h:

In the case N = 1 point, the EDM cone is the origin in R⁰.

In the case N = 2, the EDM cone is the nonnegative real line in R; a halfline in a subspace of the realization in Figure 86.

The EDM cone in the case N = 3 is a circular cone in R³ illustrated in Figure 78(a)(d); rather, the set of all matrices

    D = [  0    d_12  d_13
          d_12   0    d_23
          d_13  d_23   0  ] ∈ EDM³   (744)

makes a circular cone in this dimension. In this case, the first four Euclidean metric properties are necessary and sufficient tests to certify realizability of triangles; (720). Thus triangle inequality property 4 describes three halfspaces (627) whose intersection makes a polyhedral cone in R³ of realizable √d_ij (absolute distance); an isomorphic subspace representation of the set of all EDMs D in the natural coordinates

    ◦√D ≜ [   0    √d_12  √d_13
            √d_12    0    √d_23
            √d_13  √d_23    0  ]   (745)

illustrated in Figure 78(b).

5.2 Polyhedral bounds

The convex cone of EDMs is nonpolyhedral in d_ij for N > 2; e.g., Figure 78(a). Still we found necessary and sufficient bounding polyhedral relations consistent with EDM cones for cardinality N = 1, 2, 3, 4:

N = 3. Transforming distance-square coordinates d_ij by taking their positive square root provides the polyhedral cone in Figure 78(b); polyhedral because an intersection of three halfspaces in natural coordinates √d_ij is provided by triangle inequalities (627). This polyhedral cone implicitly encompasses necessary and sufficient metric properties: nonnegativity, self-distance, symmetry, and triangle inequality.


Figure 79: (a) In isometrically isomorphic subspace R³, intersection of EDM³ with hyperplane ∂H representing one fixed symmetric entry d_23 = κ (both drawn truncated; rounded vertex is artifact of plot). EDMs in this dimension corresponding to affine dimension 1 comprise relative boundary of EDM cone (§5.6). Since intersection illustrated includes a nontrivial subset of cone's relative boundary, then it is apparent there exist infinitely many EDM completions corresponding to affine dimension 1. In this dimension it is impossible to represent a unique nonzero completion corresponding to affine dimension 1, for example, using a single hyperplane because any hyperplane supporting relative boundary at a particular point Γ contains an entire ray {ζΓ | ζ ≥ 0} belonging to rel ∂EDM³ by Lemma 2.8.0.0.1. (b) d_13 = κ. (c) d_12 = κ.


N = 4. Relative-angle inequality (726) together with four Euclidean metric properties are necessary and sufficient tests for realizability of tetrahedra. (727) Albeit relative angles θ_ikj (528) are nonlinear functions of the d_ij, relative-angle inequality provides a regular tetrahedron in R³ [sic] (Figure 76) bounding angles θ_ikj at vertex x_k consistently with EDM⁴.^{5.2}

Yet were we to employ the procedure outlined in §4.14.3 for making generalized triangle inequalities, then we would find all the necessary and sufficient d_ij-transformations for generating bounding polyhedra consistent with EDMs of any higher dimension (N > 3).

5.3 √EDM cone is not convex

For some applications, like the molecular conformation problem (Figure 3) or multidimensional scaling, [60] [238] absolute distance √d_ij is the preferred variable. Taking square root of the entries in all EDMs D of dimension N, we get another cone but not a convex cone when N > 3 (Figure 78(b)): [54, §4.5.2]

    √EDM^N ≜ { ◦√D | D ∈ EDM^N }   (746)

where ◦√D is defined as in (745). It is a cone simply because any cone is completely constituted by rays emanating from the origin: (§2.7) Any given ray {ζΓ ∈ R^{N(N−1)/2} | ζ ≥ 0} remains a ray under entrywise square root: {√ζ Γ ∈ R^{N(N−1)/2} | ζ ≥ 0}. Because of how √EDM^N is defined, it is obvious that (confer §4.10)

    D ∈ EDM^N ⇔ ◦√D ∈ √EDM^N   (747)

Were √EDM^N convex, then given ◦√D_1, ◦√D_2 ∈ √EDM^N we would expect their conic combination ◦√D_1 + ◦√D_2 to be a member of √EDM^N. That is easily proven false by counterexample via (747), for then (◦√D_1 + ◦√D_2) ∘ (◦√D_1 + ◦√D_2) would need to be a member of EDM^N.

Notwithstanding, in §7.2.1 we learn how to transform a nonconvex proximity problem in the natural coordinates √d_ij to a convex optimization.

5.2 Still, property-4 triangle inequalities (627) corresponding to each principal 3×3 submatrix of −V_N^T D V_N demand that the corresponding √d_ij belong to a polyhedral cone like that in Figure 78(b).
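Such a counterexample is readily exhibited numerically. In the following minimal Matlab sketch the two collinear point lists p and q are our hypothetical choices; each yields a legitimate member of √EDM⁴, yet the entrywise square (747) of their sum violates the Schoenberg criterion:

    % Counterexample to convexity of sqrt(EDM^4): p, q are hypothetical
    % collinear configurations; L1, L2 are entrywise square roots of EDMs.
    p = [0 1 2 3];  q = [0 1 0 1];  N = 4;
    L1 = abs(p'*ones(1,N) - ones(N,1)*p);   % absolute distances for p
    L2 = abs(q'*ones(1,N) - ones(N,1)*q);   % absolute distances for q
    S  = L1 + L2;                           % conic combination in sqrt(EDM^4)?
    Vn = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    min(eig(-Vn'*(S.*S)*Vn))                % negative: S.*S is not an EDM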


Figure 80: (a) 2 nearest neighbors. (b) 3 nearest neighbors. Neighborhood graph (dashed) with dimensionless EDM subgraph completion (solid) superimposed (but not covering dashed). Local view of a few dense samples from relative interior of some arbitrary Euclidean manifold whose affine dimension appears two-dimensional in this neighborhood. All line segments measure absolute distance. Dashed line segments help visually locate nearest neighbors; suggesting, best number of nearest neighbors can be greater than value of embedding dimension after topological transformation. (confer [138, §2]) Solid line segments represent completion of EDM subgraph from available distance data for an arbitrarily chosen sample and its nearest neighbors. Each distance from EDM subgraph becomes distance-square in corresponding EDM submatrix.


5.4 a geometry of completion

Intriguing is the question of whether the list in X ∈ R^{n×N} (64) may be reconstructed given an incomplete noiseless EDM, and under what circumstances reconstruction is unique. [1] [2] [3] [4] [6] [14] [135] [141] [151] [152] [153]

If one or more entries of a particular EDM are fixed, then geometric interpretation of the feasible set of completions is the intersection of the EDM cone EDM^N in isomorphic subspace R^{N(N−1)/2} with as many hyperplanes as there are fixed symmetric entries. Assuming a nonempty intersection, the number of completions is generally infinite, and those corresponding to particular affine dimension r < N−1 belong to some generally nonconvex subset of that intersection.


The number of nearest neighbors is a data-dependent algorithmic parameter whose minimum value connects the graph. The dimensionless EDM subgraph between each sample and its nearest neighbors is completed from available data and included as input; one such EDM subgraph completion is drawn superimposed upon the neighborhood graph in Figure 80.^{5.3} The consequent dimensionless EDM graph comprising all the subgraphs is incomplete, in general, because the neighbor number is relatively small; incomplete even though it is a superset of the neighborhood graph. Remaining distances (those not graphed at all) are squared then made variables within the algorithm; it is this variability that admits unfurling.

To demonstrate, consider untying the trefoil knot drawn in Figure 81(a). A corresponding Euclidean distance matrix D = [d_ij, i,j = 1...N] employing only 2 nearest neighbors is banded having the incomplete form

    D = [  0          ď_12       ď_13       ?     ···      ?          ď_{1,N−1}    ď_{1N}
           ď_12       0          ď_23       ď_24  ⋱        ?          ?            ď_{2N}
           ď_13       ď_23       0          ď_34  ⋱        ?          ?            ?
           ?          ď_24       ď_34       0     ⋱        ⋱          ?            ?
           ⋮          ⋱          ⋱          ⋱     ⋱        ⋱          ⋱            ⋮
           ?          ?          ?          ⋱     ⋱        0          ď_{N−2,N−1}  ď_{N−2,N}
           ď_{1,N−1}  ?          ?          ?     ⋱    ď_{N−2,N−1}    0            ď_{N−1,N}
           ď_{1N}     ď_{2N}     ?          ?     ?    ď_{N−2,N}      ď_{N−1,N}    0        ]   (748)

where ď_ij denotes a given fixed distance-square. The unfurling algorithm can be expressed as an optimization problem; constrained distance-square maximization:

    maximize_D   ⟨−V, D⟩
    subject to   ⟨D, e_i e_j^T + e_j e_i^T⟩ (1/2) = ď_ij   ∀(i,j) ∈ I
                 rank(V D V) = 2
                 D ∈ EDM^N   (749)

5.3 Local reconstruction of point position from the EDM submatrix corresponding to a complete dimensionless EDM subgraph is unique to within an isometry (§4.6, §4.12).


Figure 81: (a) Trefoil knot in R³ from Weinberger & Saul [251]. (b) Topological transformation algorithm employing 4 nearest neighbors and N = 539 samples reduces affine dimension of knot to r = 2. Choosing instead 2 nearest neighbors would make this embedding more circular.


where e_i ∈ R^N is the i-th member of the standard basis, where set I indexes the given distance-square data like that in (748), where V ∈ R^{N×N} is the geometric centering projection matrix (§B.4.1), and where

    ⟨−V, D⟩ = tr(−V D V) = 2 tr G = (1/N) Σ_{i,j} d_ij   (493)

where G is the Gram matrix producing D assuming G1 = 0.

If we ignore the (rank) constraint on affine dimension, then problem (749) becomes convex, a corresponding solution D⋆ can be found, and then a nearest rank-2 solution can be found by ordered eigen decomposition of −V D⋆ V followed by spectral projection (§7.1.3) on [ R²_+ ; 0 ] ⊂ R^N. This two-step process is necessarily suboptimal. Yet because the decomposition for the trefoil knot reveals only two dominant eigenvalues, the spectral projection is nearly benign. Such a reconstruction of point position (§4.12) utilizing 4 nearest neighbors is drawn in Figure 81(b); a low-dimensional embedding of the trefoil knot.

This problem (749) can, of course, be written equivalently in terms of Gram matrix G, facilitated by (499); videlicet, for Φ_ij as in (466)

    maximize_{G ∈ S^N_c}   ⟨I, G⟩
    subject to             ⟨G, Φ_ij⟩ = ď_ij   ∀(i,j) ∈ I
                           rank G = 2
                           G ≽ 0   (750)

The advantage to converting EDM to Gram is: Gram matrix G is a bridge between point list X and EDM D; constraints on any or all of these three variables may now be introduced. (Example 4.4.2.2.4) Confining G to the geometric center subspace suffers no loss of generality and serves no theoretical purpose; numerically, this implicit constraint G1 = 0 keeps G independent of its translation-invariant subspace S^{N⊥}_c (§4.5.1.1, Figure 88) so as not to become unbounded.
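Once the rank constraint is dropped, (749) is a semidefinite program solvable by off-the-shelf software. A minimal sketch, assuming the CVX modeling package is installed and that I (a list of fixed index pairs, one pair per row) and dfix (their given distance-squares) are hypothetical problem data supplied beforehand:

    % Convex relaxation of (749), rank constraint omitted; CVX assumed.
    V  = eye(N) - ones(N)/N;            % geometric centering matrix
    Vn = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    cvx_begin sdp
      variable D(N,N) symmetric
      maximize( trace(-V*D*V) )
      subject to
        diag(D) == 0;
        -Vn'*D*Vn >= 0;                 % Schoenberg: D in EDM^N
        for k = 1:size(I,1)
          D(I(k,1),I(k,2)) == dfix(k);
        end
    cvx_end

A nearby rank-2 configuration then follows by the spectral projection of −V D⋆ V/2 described above.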


Figure 82: Trefoil ribbon; courtesy, Kilian Weinberger. Same topological transformation algorithm with 5 nearest neighbors and N = 1617 samples.


5.5 EDM definition in 11^T

Any EDM D corresponding to affine dimension r has representation (confer (468))

    D(V_X, y) ≜ y1^T + 1y^T − 2V_X V_X^T + (λ/N)11^T ∈ EDM^N   (751)

where R(V_X ∈ R^{N×r}) ⊆ N(1^T),

    V_X^T V_X = δ²(V_X^T V_X),  V_X is full-rank with orthogonal columns,   (752)

    λ ≜ 2‖V_X‖²_F ,  and  y ≜ δ(V_X V_X^T) − (λ/2N)1   (753)

where y = 0 if and only if 1 is an eigenvector of EDM D. [115, §2] Scalar λ becomes an eigenvalue when corresponding eigenvector 1 exists; e.g., when X = I in EDM definition (468).

Formula (751) can be validated by substituting (753); we find

    D(V_X) ≜ δ(V_X V_X^T)1^T + 1δ(V_X V_X^T)^T − 2V_X V_X^T ∈ EDM^N   (754)

is simply the standard EDM definition (468) where X^T X has been replaced with the subcompact singular value decomposition (§A.6.2)

    V_X V_X^T ≡ V X^T X V   (755)

Then inner product V_X^T V_X is an r×r diagonal matrix Σ of nonzero singular values.^{5.4}

5.4 Subcompact SVD: V_X V_X^T = QΣ^{1/2}Σ^{1/2}Q^T ≡ V X^T X V. So V_X^T is not necessarily XV (§4.5.1.0.1), although affine dimension r = rank(V_X^T) = rank(XV). (598)


Proof. Next we validate eigenvector 1 and eigenvalue λ.
(⇒) Suppose 1 is an eigenvector of EDM D. Then because

    V_X^T 1 = 0   (756)

it follows

    D1 = δ(V_X V_X^T)1^T1 + 1δ(V_X V_X^T)^T1 = N δ(V_X V_X^T) + ‖V_X‖²_F 1   ⇒   δ(V_X V_X^T) ∝ 1   (757)

For some κ ∈ R_+

    δ(V_X V_X^T)^T 1 = Nκ = tr(V_X^T V_X) = ‖V_X‖²_F   ⇒   δ(V_X V_X^T) = (1/N)‖V_X‖²_F 1   (758)

so y = 0.
(⇐) Now suppose δ(V_X V_X^T) = (λ/2N)1; id est, y = 0. Then

    D = (λ/N)11^T − 2V_X V_X^T ∈ EDM^N   (759)

1 is an eigenvector with corresponding eigenvalue λ. ∎

5.5.1 Range of EDM D

From §B.1.1 pertaining to linear independence of dyad sums: If the transpose halves of all the dyads in the sum (754)^{5.5} make a linearly independent set, then the nontranspose halves constitute a basis for the range of EDM D. Saying this mathematically: For D ∈ EDM^N

    R(D) = R([δ(V_X V_X^T) 1 V_X]) ⇐ rank([δ(V_X V_X^T) 1 V_X]) = 2 + r
    R(D) = R([1 V_X]) ⇐ otherwise   (760)

5.5 Identifying the columns V_X ≜ [v_1 ··· v_r], then V_X V_X^T = Σ_i v_i v_i^T is a sum of dyads.


Figure 83: Example of V_X selection to make an EDM corresponding to cardinality N = 3 and affine dimension r = 1; V_X is a vector in nullspace N(1^T) ⊂ R³. Nullspace of 1^T is hyperplane in R³ (drawn truncated) having normal 1. Vector δ(V_X V_X^T) may or may not be in plane spanned by {1, V_X}, but belongs to nonnegative orthant which is strictly supported by N(1^T).


To prove that, we need the condition under which the rank equality is satisfied: We know R(V_X) ⊥ 1, but what is the relative geometric orientation of δ(V_X V_X^T)? δ(V_X V_X^T) ≽ 0 because V_X V_X^T ≽ 0, and δ(V_X V_X^T) ∝ 1 remains possible (757); this means δ(V_X V_X^T) ∉ N(1^T) simply because it has no negative entries. (Figure 83) If the projection of δ(V_X V_X^T) on N(1^T) does not belong to R(V_X), then that is a necessary and sufficient condition for linear independence (l.i.) of δ(V_X V_X^T) with respect to R([1 V_X]); id est,

    V δ(V_X V_X^T) ≠ V_X a   for any a ∈ R^r
    (I − (1/N)11^T) δ(V_X V_X^T) ≠ V_X a
    δ(V_X V_X^T) − (1/N)‖V_X‖²_F 1 ≠ V_X a
    δ(V_X V_X^T) − (λ/2N)1 = y ≠ V_X a   ⇔   {1, δ(V_X V_X^T), V_X} is l.i.   (761)

On the other hand when this condition is violated (when (753) y = V_X a_p for some particular a ∈ R^r), then from (751) we have

    R( D = y1^T + 1y^T − 2V_X V_X^T + (λ/N)11^T ) = R( (V_X a_p + (λ/N)1)1^T + (1a_p^T − 2V_X)V_X^T )
      = R([ V_X a_p + (λ/N)1   1a_p^T − 2V_X ])
      = R([1 V_X])   (762)

An example of such a violation is (759) where, in particular, a_p = 0. ∎

Then a statement parallel to (760) is, for D ∈ EDM^N (Theorem 4.7.3.0.1)

    rank(D) = r + 2 ⇔ y ∉ R(V_X) ( ⇔ 1^T D† 1 = 0 )
    rank(D) = r + 1 ⇔ y ∈ R(V_X) ( ⇔ 1^T D† 1 ≠ 0 )   (763)

5.5.2 Boundary constituents of EDM cone

Expression (754) has utility in forming the set of all EDMs corresponding to affine dimension r:


Figure 84: (a) Vector V_X from Figure 83 spirals in N(1^T) ⊂ R³ decaying toward origin. (Spiral is two-dimensional in vector space R³.) (b) Corresponding trajectory D(V_X) on EDM cone relative boundary creates a vortex also decaying toward origin. There are two complete orbits on EDM cone boundary about axis of revolution for every single revolution of V_X about origin. (Vortex is three-dimensional in isometrically isomorphic R³.)


    { D ∈ EDM^N | rank(V D V) = r }
      = { D(V_X) | V_X ∈ R^{N×r}, rank V_X = r, V_X^T V_X = δ²(V_X^T V_X), R(V_X) ⊆ N(1^T) }   (764)

whereas { D ∈ EDM^N | rank(V D V) ≤ r } is the closure of this same set;

    { D ∈ EDM^N | rank(V D V) ≤ r } = closure{ D ∈ EDM^N | rank(V D V) = r }   (765)

For example,

    rel ∂EDM^N = { D ∈ EDM^N | rank(V D V) < N − 1 }
               = ⋃_{r=0}^{N−2} { D ∈ EDM^N | rank(V D V) = r }   (766)

None of these are necessarily convex sets, although

    EDM^N = ⋃_{r=0}^{N−1} { D ∈ EDM^N | rank(V D V) = r }
          = closure{ D ∈ EDM^N | rank(V D V) = N − 1 }

    rel int EDM^N = { D ∈ EDM^N | rank(V D V) = N − 1 }   (767)

are pointed convex cones.

When cardinality N = 3 and affine dimension r = 2, for example, the relative interior rel int EDM³ is realized via (764). (§5.6)

When N = 3 and r = 1, the relative boundary of the EDM cone dvec rel ∂EDM³ is realized in isomorphic R³ as in Figure 78(d). This figure could be constructed via (765) by spiraling vector V_X tightly about the origin in N(1^T); as can be imagined with aid of Figure 83. Vectors close to the origin in N(1^T) are correspondingly close to the origin in EDM^N. As vector V_X orbits the origin in N(1^T), the corresponding EDM orbits the axis of revolution while remaining on the boundary of the circular cone dvec rel ∂EDM³. (Figure 84)
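Set (764) is easy to sample numerically; a minimal Matlab sketch, choosing V_X with r orthonormal columns inside N(1^T) and confirming the resulting affine dimension:

    % D(V_X) per (754) has rank(V*D*V) = r, so it lies on the relative
    % boundary of EDM^N whenever r < N-1.
    N = 5;  r = 2;
    A = randn(N,r);  A = A - ones(N,1)*mean(A);   % columns forced into N(1^T)
    [Vx,~] = qr(A,0);                             % orthonormal columns
    d = diag(Vx*Vx');
    D = d*ones(1,N) + ones(N,1)*d' - 2*(Vx*Vx');  % D(V_X), definition (754)
    V = eye(N) - ones(N)/N;
    rank(V*D*V)                                   % returns r = 2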


5.5.3 Faces of EDM cone

5.5.3.1 Isomorphic faces

In high cardinality N, any set of EDMs made via (764) or (765) with particular affine dimension r is isomorphic with any set admitting the same affine dimension but made in lower cardinality. We do not prove that here.

5.5.3.2 Extreme direction of EDM cone

In particular, extreme directions (§2.8.1) of EDM^N correspond to affine dimension r = 1 and are simply represented: for any particular cardinality N ≥ 2 (§2.8.2) and each and every nonzero vector z in N(1^T)

    Γ ≜ (z ∘ z)1^T + 1(z ∘ z)^T − 2zz^T ∈ EDM^N
      = δ(zz^T)1^T + 1δ(zz^T)^T − 2zz^T   (768)

is an extreme direction corresponding to a one-dimensional face of the EDM cone EDM^N that is a ray in isomorphic subspace R^{N(N−1)/2}.

Proving this would exercise the fundamental definition (150) of extreme direction. Here is a sketch: Any EDM may be represented

    D(V_X) ≜ δ(V_X V_X^T)1^T + 1δ(V_X V_X^T)^T − 2V_X V_X^T ∈ EDM^N   (754)

where matrix V_X (752) has orthogonal columns. For the same reason (1108) that zz^T is an extreme direction of the positive semidefinite cone (§2.9.2.4) for any particular nonzero vector z, there is no conic combination of distinct EDMs (each conically independent of Γ) equal to Γ.

5.5.3.2.1 Example. Biorthogonal expansion of an EDM.
(confer §2.13.7.1.1) When matrix D belongs to the EDM cone, nonnegative coordinates for biorthogonal expansion are the eigenvalues λ ∈ R^N of −½VDV: For any D ∈ S^N_h it holds


    D = δ(−½VDV)1^T + 1δ(−½VDV)^T − 2(−½VDV)   (565)

By diagonalization −½VDV ≜ QΛQ^T ∈ S^N_c (§A.5.2) we may write

    D = δ( Σ_{i=1}^N λ_i q_i q_i^T )1^T + 1δ( Σ_{i=1}^N λ_i q_i q_i^T )^T − 2 Σ_{i=1}^N λ_i q_i q_i^T
      = Σ_{i=1}^N λ_i ( δ(q_i q_i^T)1^T + 1δ(q_i q_i^T)^T − 2q_i q_i^T )   (769)

where q_i is the i-th eigenvector of −½VDV arranged columnar in orthogonal matrix

    Q = [q_1 q_2 ··· q_N] ∈ R^{N×N}   (325)

and where { δ(q_i q_i^T)1^T + 1δ(q_i q_i^T)^T − 2q_i q_i^T , i = 1...N } are extreme directions of some pointed polyhedral cone K ⊂ S^N_h and extreme directions of EDM^N. Invertibility of (769)

    −½VDV = −½ V ( Σ_{i=1}^N λ_i ( δ(q_i q_i^T)1^T + 1δ(q_i q_i^T)^T − 2q_i q_i^T ) ) V
           = Σ_{i=1}^N λ_i q_i q_i^T   (770)

implies linear independence of those extreme directions. Then the biorthogonal expansion is expressed

    dvec D = Y Y† dvec D = Y λ(−½VDV)   (771)

where

    Y ≜ [ dvec( δ(q_i q_i^T)1^T + 1δ(q_i q_i^T)^T − 2q_i q_i^T ), i = 1...N ] ∈ R^{N(N−1)/2 × N}   (772)

When D belongs to the EDM cone in the subspace of symmetric hollow matrices, unique coordinates Y† dvec D for this biorthogonal expansion must be the nonnegative eigenvalues λ of −½VDV. This means D simultaneously belongs to the EDM cone and to the pointed polyhedral cone dvec K = cone(Y).


5.5.3.3 Smallest face

Now suppose we are given a particular EDM D(V_{Xp}) ∈ EDM^N corresponding to affine dimension r and parametrized by V_{Xp} in (754). The EDM cone's smallest face that contains D(V_{Xp}) is

    F( EDM^N ∋ D(V_{Xp}) )
      = { D(V_X) | V_X ∈ R^{N×r}, rank V_X = r, V_X^T V_X = δ²(V_X^T V_X), R(V_X) ⊆ R(V_{Xp}) }   (773)

which is isomorphic^{5.6} with the convex cone EDM^{r+1}, hence of dimension

    dim F( EDM^N ∋ D(V_{Xp}) ) = (r + 1)r/2   (774)

in isomorphic R^{N(N−1)/2}. Not all dimensions are represented; e.g., the EDM cone has no two-dimensional faces.

When cardinality N = 4 and affine dimension r = 2 so that R(V_{Xp}) is any two-dimensional subspace of three-dimensional N(1^T) in R⁴, for example, then the corresponding face of EDM⁴ is isometrically isomorphic with: (765)

    EDM³ = { D ∈ EDM³ | rank(V D V) ≤ 2 } ≃ F( EDM⁴ ∋ D(V_{Xp}) )   (775)

Each two-dimensional subspace of N(1^T) corresponds to another three-dimensional face.

Because each and every principal submatrix of an EDM in EDM^N (§4.14.3) is another EDM [152, §4.1], for example, then each principal submatrix belongs to a particular face of EDM^N.

5.5.3.4 Open question

This result (774) is analogous to that for the positive semidefinite cone, although the question remains open whether all faces of EDM^N (whose dimension is less than the dimension of the cone) are exposed like they are for the positive semidefinite cone. (§2.9.2.3) [232]

5.6 The fact that the smallest face is isomorphic with another (perhaps smaller) EDM cone is implicit in [115, §2].


5.6 Correspondence to PSD cone S^{N−1}_+

Hayden & Wells et alii [115, §2] assert one-to-one correspondence of EDMs with positive semidefinite matrices in the symmetric subspace. Because rank(V D V) ≤ N−1 (§4.7.1.1), the positive semidefinite cone corresponding to the EDM cone can only be S^{N−1}_+. [6, §18.2.1] To clearly demonstrate that correspondence, we invoke the inner-product form EDM definition

    D(Φ) ≜ [ 0; δ(Φ) ] 1^T + 1 [ 0  δ(Φ)^T ] − 2 [ 0  0^T
                                                   0   Φ  ] ∈ EDM^N   ⇔   Φ ≽ 0   (583)

Then the EDM cone may be expressed

    EDM^N = { D(Φ) | Φ ∈ S^{N−1}_+ }   (776)

Hayden & Wells' assertion can therefore be equivalently stated in terms of an inner-product form EDM operator

    D(S^{N−1}_+) = EDM^N   (585)
    V_N(EDM^N) = S^{N−1}_+   (586)

identity (586) holding because R(V_N) = N(1^T) (475), linear functions D(Φ) and V_N(D) = −V_N^T D V_N (§4.6.2.1) being mutually inverse.

In terms of affine dimension r, Hayden & Wells claim particular correspondence between PSD and EDM cones:

r = N−1: Symmetric hollow matrices −D positive definite on N(1^T) correspond to points relatively interior to the EDM cone.

r < N−1: Symmetric hollow matrices −D positive semidefinite on N(1^T), where −V_N^T D V_N has at least one 0 eigenvalue, correspond to points on the relative boundary of the EDM cone.

r = 1: Symmetric hollow nonnegative matrices rank-one on N(1^T) correspond to extreme directions (768) of the EDM cone; id est, for some nonzero vector u (§A.3.1.0.7)

    rank V_N^T D V_N = 1
    D ∈ S^N_h ∩ R^{N×N}_+   ⇔   D is an extreme direction   ⇔   −V_N^T D V_N ≡ uu^T
                                                                 D ∈ S^N_h   (777)


5.6.0.0.1 Proof. Case r = 1 is easily proved: From the nonnegativity development in §4.8.1, extreme direction (768), and Schoenberg criterion (487), we need show only sufficiency; id est, prove

    rank V_N^T D V_N = 1 and D ∈ S^N_h ∩ R^{N×N}_+   ⇒   D ∈ EDM^N and D is an extreme direction

Any symmetric matrix D satisfying the rank condition must have the form, for z,q ∈ R^N and nonzero z ∈ N(1^T),

    D = ±(1q^T + q1^T − 2zz^T)   (778)

because (§4.6.2.1, confer §E.7.2.0.2)

    N(V_N(D)) = { 1q^T + q1^T | q ∈ R^N } ⊆ S^N   (779)

Hollowness demands q = δ(zz^T) while nonnegativity demands choice of positive sign in (778). Matrix D thus takes the form of an extreme direction (768) of the EDM cone. ∎

The foregoing proof is not extensible in rank: An EDM with corresponding affine dimension r has the general form, for { z_i ∈ N(1^T), i = 1...r } an independent set,

    D = 1δ( Σ_{i=1}^r z_i z_i^T )^T + δ( Σ_{i=1}^r z_i z_i^T )1^T − 2 Σ_{i=1}^r z_i z_i^T ∈ EDM^N   (780)

The EDM so defined relies principally on the sum Σ z_i z_i^T having positive summand coefficients (⇔ −V_N^T D V_N ≽ 0).^{5.7} Then it is easy to find a sum incorporating negative coefficients while meeting rank, nonnegativity, and symmetric hollowness conditions but not positive semidefiniteness on subspace R(V_N); e.g., from page 287,

    −½ V [ 0 1 1
           1 0 5
           1 5 0 ] V = z_1 z_1^T − z_2 z_2^T   (781)

5.7 (⇐) For a_i ∈ R^{N−1}, let z_i = V_N^{†T} a_i.


5.6.0.0.2 Example. Extreme rays versus rays on the boundary.
The EDM

    D = [ 0 1 4
          1 0 1
          4 1 0 ]

is an extreme direction of EDM³ where u = [1; 2] in (777). Because −V_N^T D V_N has eigenvalues {0, 5}, the ray whose direction is D also lies on the relative boundary of EDM³.

In exception, EDM D = κ[0 1; 1 0], for any particular κ > 0, is an extreme direction of EDM² but −V_N^T D V_N has only one eigenvalue: {κ}. Because EDM² is a ray whose relative boundary (§2.6.1.3.1) is the origin, this conventional boundary does not include D, which belongs to the relative interior in this dimension. (§2.7.0.0.1)

5.6.1 Gram-form correspondence to S^{N−1}_+

With respect to D(G) = δ(G)1^T + 1δ(G)^T − 2G (480), the linear Gram-form EDM operator, results in §4.6.1 provide [1, §2.6]

    EDM^N = D( V(EDM^N) ) ≡ D( V_N S^{N−1}_+ V_N^T )   (782)

    V_N S^{N−1}_+ V_N^T ≡ V( D( V_N S^{N−1}_+ V_N^T ) ) = V(EDM^N) ≜ −½ V EDM^N V = S^N_c ∩ S^N_+   (783)

a one-to-one correspondence between EDM^N and S^{N−1}_+.

5.6.2 EDM cone by elliptope

Defining the elliptope parametrized by scalar t > 0

    E^N_t ≜ S^N_+ ∩ { Φ ∈ S^N | δ(Φ) = t1 }   (661)

then following Alfakih [7] we have

    EDM^N = cone{ 11^T − E^N_1 } = { t(11^T − E^N_1) | t ≥ 0 }   (784)

Identification E^N = E^N_1 equates the standard elliptope (§4.9.1.0.1, Figure 70) to our parametrized elliptope.


    EDM^N = cone{ 11^T − E^N } = { t(11^T − E^N) | t ≥ 0 }   (784)

Figure 85: Three views of translated negated elliptope 11^T − E³_1 (confer Figure 70) shrouded by truncated EDM cone. Fractal on EDM cone relative boundary is numerical artifact belonging to intersection with elliptope relative boundary. The fractal is trying to convey existence of a neighborhood about the origin where the translated elliptope boundary and EDM cone boundary intersect.


5.6.2.0.1 Expository. Define T_E(11^T) to be the tangent cone to the elliptope E at point 11^T; id est,

    T_E(11^T) ≜ { t(E − 11^T) | t ≥ 0 }   (785)

The normal cone K⊥_E(11^T) to the elliptope at 11^T is a closed convex cone defined (§E.10.3.2.1, Figure 121)

    K⊥_E(11^T) ≜ { B | ⟨B, Φ − 11^T⟩ ≤ 0, Φ ∈ E }   (786)

The polar cone of any set K is the closed convex cone (confer (252))

    K° ≜ { B | ⟨B, A⟩ ≤ 0, for all A ∈ K }   (787)

The normal cone is well known to be the polar of the tangent cone, and vice versa; [129, §A.5.2.4]

    K⊥_E(11^T) = T_E(11^T)°   (788)
    K⊥_E(11^T)° = T_E(11^T)   (789)

From Deza & Laurent [66, p.535] we have the EDM cone

    EDM = −T_E(11^T)   (790)

The polar EDM cone is also expressible in terms of the elliptope. From (788) we have

    EDM° = −K⊥_E(11^T)   (791)   ⋆

In §4.10.1 we proposed the expression for EDM D

    D = t11^T − E ∈ EDM^N   (662)

where t ∈ R_+ and E belongs to the parametrized elliptope E^N_t. We further propose, for any particular t > 0

    EDM^N = cone{ t11^T − E^N_t }   (792)

Proof. Pending.

Relationship of the translated negated elliptope with the EDM cone is illustrated in Figure 85. We speculate

    EDM^N = lim_{t→∞} t11^T − E^N_t   (793)


5.7 Vectorization & projection interpretation

In §E.7.2.0.2 we learn: −V D V can be interpreted as orthogonal projection [4, §2] of vectorized −D ∈ S^N_h on the subspace of geometrically centered symmetric matrices

    S^N_c = { G ∈ S^N | G1 = 0 }   (1600)
          = { G ∈ S^N | N(G) ⊇ 1 } = { G ∈ S^N | R(G) ⊆ N(1^T) }
          = { V Y V | Y ∈ S^N } ⊂ S^N   (1601)
          ≡ { V_N A V_N^T | A ∈ S^{N−1} }   (556)

because elementary auxiliary matrix V is an orthogonal projector (§B.4.1). Yet there is another useful projection interpretation:

Revising the fundamental matrix criterion for membership to the EDM cone (463),^{5.8}

    ⟨zz^T, −D⟩ ≥ 0 ∀ zz^T | 11^T zz^T = 0
    D ∈ S^N_h                              ⇔   D ∈ EDM^N   (794)

this is equivalent, of course, to the Schoenberg criterion

    −V_N^T D V_N ≽ 0
    D ∈ S^N_h          ⇔   D ∈ EDM^N   (487)

because N(11^T) = R(V_N). When D ∈ EDM^N, correspondence (794) means −z^T D z is proportional to a nonnegative coefficient of orthogonal projection (§E.6.4.2, Figure 87) of −D in isometrically isomorphic R^{N(N+1)/2} on the range of each and every vectorized (§2.2.2.1) symmetric dyad (§B.1) in the nullspace of 11^T; id est, on each and every member of

    T ≜ { svec(zz^T) | z ∈ N(11^T) = R(V_N) } ⊂ svec ∂S^N_+
      = { svec(V_N υυ^T V_N^T) | υ ∈ R^{N−1} }   (795)

whose dimension is

    dim T = N(N − 1)/2   (796)

5.8 N(11^T) = N(1^T) and R(zz^T) = R(z)


The set of all symmetric dyads { zz^T | z ∈ R^N } constitute the extreme directions of the positive semidefinite cone (§2.8.1, §2.9) S^N_+, hence lie on its boundary. Yet only those dyads in R(V_N) are included in the test (794), thus only a subset T of all vectorized extreme directions of S^N_+ is observed.

In the particularly simple case D ∈ EDM² = { D ∈ S²_h | d_12 ≥ 0 }, for example, only one extreme direction of the PSD cone is involved:

    zz^T = [  1 −1
             −1  1 ]   (797)

Any nonnegative scaling of vectorized zz^T belongs to the set T illustrated in Figure 86 and Figure 87.

5.7.1 Face of PSD cone S^N_+ containing V

In any case, set T (795) constitutes the vectorized extreme directions of an N(N−1)/2-dimensional face of the PSD cone S^N_+ containing auxiliary matrix V; a face isomorphic with S^{N−1}_+ = S^{rank V}_+ (§2.9.2.3).

To show this, we must first find the smallest face that contains auxiliary matrix V and then determine its extreme directions. From (185),

    F( S^N_+ ∋ V ) = { W ∈ S^N_+ | N(W) ⊇ N(V) } = { W ∈ S^N_+ | N(W) ⊇ 1 }
                   = { V Y V ≽ 0 | Y ∈ S^N } ≡ { V_N B V_N^T | B ∈ S^{N−1}_+ }
                   ≃ S^{rank V}_+ = −V_N^T EDM^N V_N   (798)

where the equivalence ≡ is from §4.6.1 while isomorphic equality ≃ with transformed EDM cone is from (586). Projector V belongs to F( S^N_+ ∋ V ) because V_N V_N† V_N^{†T} V_N^T = V. (§B.4.3) Each and every rank-one matrix belonging to this face is therefore of the form:

    V_N υυ^T V_N^T | υ ∈ R^{N−1}   (799)

Because F( S^N_+ ∋ V ) is isomorphic with a positive semidefinite cone S^{N−1}_+, then T constitutes the vectorized extreme directions of F, the origin constitutes the extreme points of F, and auxiliary matrix V is some convex combination of those extreme points and directions by the extremes theorem (§2.8.1.1.1).


Figure 86: Truncated boundary of positive semidefinite cone S²_+ in isometrically isomorphic R³ (via svec (46)) is, in this dimension, constituted solely by its extreme directions. Truncated cone of Euclidean distance matrices EDM² in isometrically isomorphic subspace R. Relative boundary of EDM cone is constituted solely by matrix 0. Halfline

    T = { κ² [ 1  −√2  1 ]^T | κ ∈ R } ⊂ svec ∂S²_+

on PSD cone boundary depicts that lone extreme ray (797) on which orthogonal projection of −D must be positive semidefinite if D is to belong to EDM². aff cone T = svec S²_c. (802) Dual EDM cone is halfspace in R³ whose bounding hyperplane has inward-normal svec EDM².


Projection of vectorized −D on range of vectorized zz^T:

    P_{svec zz^T}( svec(−D) ) = ( ⟨zz^T, −D⟩ / ⟨zz^T, zz^T⟩ ) zz^T

    D ∈ EDM^N   ⇔   ⟨zz^T, −D⟩ ≥ 0 ∀ zz^T | 11^T zz^T = 0  and  D ∈ S^N_h   (794)

Figure 87: Candidate matrix D is assumed to belong to symmetric hollow subspace S²_h; a line in this dimension. Negating D, we find its polar along S²_h. Set T (795) has only one ray member in this dimension; not orthogonal to S²_h. Orthogonal projection of −D on T (indicated by half dot) has nonnegative projection coefficient. Candidate D must therefore be EDM.


In fact, the smallest face of the PSD cone S^N_+ that contains auxiliary matrix V is the intersection with the geometric center subspace (1600) (1601);

    F( S^N_+ ∋ V ) = cone{ V_N υυ^T V_N^T | υ ∈ R^{N−1} }
                   = S^N_c ∩ S^N_+   (800)

In isometrically isomorphic R^{N(N+1)/2}

    svec F( S^N_+ ∋ V ) = cone T   (801)

related to S^N_c by

    aff cone T = svec S^N_c   (802)

5.7.2 EDM criteria in 11^T

Laurent specifies an elliptope trajectory condition for EDM cone membership: [152, §2.3]

    D ∈ EDM^N ⇔ [1 − e^{−α d_ij}] ∈ EDM^N ∀ α > 0   (656)

From the parametrized elliptope E^N_t in §5.6.2 we propose

    D ∈ EDM^N ⇔ ∃ t ∈ R_+ , E ∈ E^N_t such that D = t11^T − E   (803)

Chabrillac & Crouzeix [47, §4] prove a different criterion they attribute to Finsler (1937) [82]. We apply it to EDMs: for D ∈ S^N_h (606)

    −V_N^T D V_N ≻ 0 ⇔ ∃ κ > 0 such that −D + κ11^T ≻ 0
    ⇔ D ∈ EDM^N with corresponding affine dimension r = N−1   (804)

This Finsler criterion has geometric interpretation in terms of the vectorization & projection already discussed in connection with (794). With reference to Figure 86, the offset 11^T is simply a direction orthogonal to T in isomorphic R³. Intuitively, translation of −D in direction 11^T is like orthogonal projection on T in so far as similar information can be obtained.
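The Finsler criterion is also easy to observe numerically. A minimal Matlab sketch, assuming the EDM of a unit right triangle (affine dimension r = N−1 = 2): the minimum eigenvalue of −D + κ11^T turns positive once κ is sufficiently large.

    % Finsler criterion (804) on the EDM of a right triangle.
    X = [0 1 0; 0 0 1];  N = 3;  G = X'*X;
    D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;   % affine dimension 2
    for kappa = [0.5 1 2 4]
      fprintf('kappa = %g   min eig = %g\n', kappa, min(eig(-D + kappa*ones(N))));
    end
    % min eig is nonpositive for kappa <= 1 in this example, positive after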


When the Finsler criterion (804) is applied despite lower affine dimension, the constant κ can go to infinity making the test −D + κ11^T ≽ 0 impractical for numerical computation. Chabrillac & Crouzeix invent a criterion for the semidefinite case, but it is no more practical: for D ∈ S^N_h

    D ∈ EDM^N
    ⇔
    ∃ κ_p > 0 such that, ∀ κ ≥ κ_p, −D − κ11^T [sic] has exactly one negative eigenvalue   (805)

5.8 Dual EDM cone

5.8.1 Ambient S^N

We consider finding the ordinary dual EDM cone in ambient space S^N where EDM^N is pointed, closed, convex, but has empty interior. The set of all EDMs in S^N is a closed convex cone because it is the intersection of halfspaces about the origin in vectorized variable D (§2.4.1.1.1, §2.7.2):

    EDM^N = ⋂_{z ∈ N(1^T), i = 1...N} { D ∈ S^N | ⟨e_i e_i^T, D⟩ ≥ 0, ⟨e_i e_i^T, D⟩ ≤ 0, ⟨zz^T, −D⟩ ≥ 0 }   (806)

By definition (252), dual cone K* comprises each and every vector inward-normal to a hyperplane supporting convex cone K (§2.4.2.6.1) or bounding a halfspace containing K. The dual EDM cone in the ambient space of symmetric matrices is therefore expressible as the aggregate of every conic combination of inward-normals from (806):

    EDM^{N*} = cone{ e_i e_i^T, −e_j e_j^T | i,j = 1...N } − cone{ zz^T | 11^T zz^T = 0 }
      = { Σ_{i=1}^N ζ_i e_i e_i^T − Σ_{j=1}^N ξ_j e_j e_j^T | ζ_i, ξ_j ≥ 0 } − cone{ zz^T | 11^T zz^T = 0 }
      = { δ(u) | u ∈ R^N } − cone{ V_N υυ^T V_N^T | υ ∈ R^{N−1}, (‖υ‖ = 1) } ⊂ S^N
      = { δ²(Y) − V_N Ψ V_N^T | Y ∈ S^N, Ψ ∈ S^{N−1}_+ }   (807)

The EDM cone is not self-dual in ambient S^N because its affine hull belongs to a proper subspace

    aff EDM^N = S^N_h   (808)


The ordinary dual EDM cone cannot, therefore, be pointed. (§2.13.1.1)

When N = 1, the EDM cone is the point at the origin in R. Auxiliary matrix V_N is empty [∅], and dual cone EDM* is the real line.

When N = 2, the EDM cone is a nonnegative real line in isometrically isomorphic R³; there S²_h is a real line containing the EDM cone. Dual cone EDM^{2*} is the particular halfspace in R³ whose boundary has inward-normal EDM². Diagonal matrices {δ(u)} in (807) are represented by a hyperplane through the origin { d | [0 1 0] d = 0 } while the term cone{ V_N υυ^T V_N^T } is represented by the halfline T in Figure 86 belonging to the positive semidefinite cone boundary. The dual EDM cone is formed by translating the hyperplane along the negative semidefinite halfline −T; the union of each and every translation. (confer §2.10.2.0.1)

When cardinality N exceeds 2, the dual EDM cone can no longer be polyhedral simply because the EDM cone cannot. (§2.13.1.1)

5.8.1.1 EDM cone and its dual in ambient S^N

Consider the two convex cones

    K_1 ≜ S^N_h
    K_2 ≜ ⋂_{y ∈ N(1^T)} { A ∈ S^N | ⟨yy^T, −A⟩ ≥ 0 }
        = { A ∈ S^N | −z^T V A V z ≥ 0 ∀ zz^T (≽ 0) }
        = { A ∈ S^N | −V A V ≽ 0 }   (809)

so

    K_1 ∩ K_2 = EDM^N   (810)

The dual cone K*_1 = S^{N⊥}_h ⊆ S^N (61) is the subspace of diagonal matrices. From (807) via (264),

    K*_2 = −cone{ V_N υυ^T V_N^T | υ ∈ R^{N−1} } ⊂ S^N   (811)

Gaffke & Mathar [84, §5.3] observe that projection on K_1 and K_2 have simple closed forms: Projection on subspace K_1 is easily performed by symmetrization and zeroing the main diagonal or vice versa, while projection of H ∈ S^N on K_2 is

    P_{K_2} H = H − P_{S^N_+}(V H V)   (812)


Proof. First, we observe membership of H − P_{S^N_+}(V H V) to K_2 because

    P_{S^N_+}(V H V) − H = ( P_{S^N_+}(V H V) − V H V ) + ( V H V − H )   (813)

The term P_{S^N_+}(V H V) − V H V necessarily belongs to the (dual) positive semidefinite cone by Theorem E.9.2.0.1. V² = V, hence

    −V( H − P_{S^N_+}(V H V) )V ≽ 0   (814)

by Corollary A.3.1.0.5.

Next, we require

    ⟨ P_{K_2} H − H , P_{K_2} H ⟩ = 0   (815)

Expanding,

    ⟨ −P_{S^N_+}(V H V) , H − P_{S^N_+}(V H V) ⟩ = 0   (816)
    ⟨ P_{S^N_+}(V H V) , (P_{S^N_+}(V H V) − V H V) + (V H V − H) ⟩ = 0   (817)
    ⟨ P_{S^N_+}(V H V) , (V H V − H) ⟩ = 0   (818)

Product V H V belongs to the geometric center subspace; (§E.7.2.0.2)

    V H V ∈ S^N_c = { Y ∈ S^N | N(Y) ⊇ 1 }   (819)

Diagonalize V H V ≜ QΛQ^T (§A.5) whose nullspace is spanned by the eigenvectors corresponding to 0 eigenvalues by Theorem A.7.2.0.1. Projection of V H V on the PSD cone (§7.1) simply zeros negative eigenvalues in diagonal matrix Λ. Then

    N( P_{S^N_+}(V H V) ) ⊇ N(V H V) (⊇ N(V))   (820)

from which it follows:

    P_{S^N_+}(V H V) ∈ S^N_c   (821)

so P_{S^N_+}(V H V) ⊥ (V H V − H) because V H V − H ∈ S^{N⊥}_c.

Finally, we must have P_{K_2} H − H = −P_{S^N_+}(V H V) ∈ K*_2. From §5.7.1 we know dual cone K*_2 = −F( S^N_+ ∋ V ) is the negative of the positive semidefinite cone's smallest face that contains auxiliary matrix V. Thus P_{S^N_+}(V H V) ∈ F( S^N_+ ∋ V ) ⇔ N( P_{S^N_+}(V H V) ) ⊇ N(V) (§2.9.2.3), which was already established in (820). ∎
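Projection (812) amounts to one eigenvalue decomposition; a minimal Matlab sketch, assuming an arbitrary symmetric H, that also confirms membership of the result to K_2 (id est, −V(P_{K_2}H)V ≽ 0 up to rounding):

    % Projection (812) on K_2 = {A in S^N | -V*A*V psd}.
    N = 4;  H = randn(N);  H = (H + H')/2;   % arbitrary symmetric H
    V = eye(N) - ones(N)/N;                  % geometric centering projector
    [Q,L] = eig(V*H*V);                      % V*H*V = P_{S_c}(H)
    PK2H = H - Q*max(L,0)*Q';                % H - P_{S+}(V H V)
    min(eig(-V*PK2H*V))                      % >= 0 up to rounding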


    EDM² = S²_h ∩ ( S^{2⊥}_c − S²_+ )

Figure 88: Orthogonal complement S^{2⊥}_c (1602) (§B.2) of geometric center subspace (a plane in isometrically isomorphic R³; drawn is a tiled fragment) apparently supporting positive semidefinite cone. (Rounded vertex is artifact of plot.) Line svec S²_c = aff cone T (802) intersects svec ∂S²_+, also drawn in Figure 86; it runs along PSD cone boundary. (confer Figure 69)


    EDM² = S²_h ∩ ( S^{2⊥}_c − S²_+ )

Figure 89: EDM cone construction in isometrically isomorphic R³ by adding polar PSD cone to svec S^{2⊥}_c. Difference svec( S^{2⊥}_c − S²_+ ) is halfspace partially bounded by svec S^{2⊥}_c. EDM cone is nonnegative halfline along svec S²_h in this dimension.


From the results in §E.7.2.0.2, we know matrix product V H V is the orthogonal projection of H ∈ S^N on the geometric center subspace S^N_c. Thus the projection product

    P_{K_2} H = H − P_{S^N_+} P_{S^N_c} H   (822)

5.8.1.1.1 Lemma. Projection on PSD cone ∩ geometric center subspace.

    P_{S^N_+ ∩ S^N_c} = P_{S^N_+} P_{S^N_c}   (823)   ⋄

Proof. For each and every H ∈ S^N, projection of P_{S^N_c} H on the positive semidefinite cone remains in the geometric center subspace

    S^N_c = { G ∈ S^N | G1 = 0 } = { G ∈ S^N | N(G) ⊇ 1 } = { G ∈ S^N | R(G) ⊆ N(1^T) } = { V Y V | Y ∈ S^N } ⊂ S^N   (1600) (1601) (556)

That is because: Eigenvectors of P_{S^N_c} H corresponding to 0 eigenvalues span its nullspace. (§A.7.2.0.1) To project P_{S^N_c} H on the positive semidefinite cone, its negative eigenvalues are zeroed. (§7.1.2) The nullspace is thereby expanded while the eigenvectors originally spanning N(P_{S^N_c} H) remain intact. Because the geometric center subspace is invariant to projection on the PSD cone, the rule for projection on a convex set in a subspace governs (§E.9.5, projectors do not commute), and statement (823) follows directly. ∎

From the lemma it follows

    { P_{S^N_+} P_{S^N_c} H | H ∈ S^N } = { P_{S^N_+ ∩ S^N_c} H | H ∈ S^N }   (824)

Then from (1626)

    −( S^N_c ∩ S^N_+ )* = { H − P_{S^N_+} P_{S^N_c} H | H ∈ S^N }   (825)

From (264) we get closure of a vector sum

    K_2 = −( S^N_c ∩ S^N_+ )* = S^{N⊥}_c − S^N_+   (826)

therefore the new equality

    EDM^N = K_1 ∩ K_2 = S^N_h ∩ ( S^{N⊥}_c − S^N_+ )   (827)


whose veracity is intuitively evident, in hindsight, [54, p.109] from the most fundamental EDM definition (468). Formula (827) is not a matrix criterion for membership to the EDM cone, and it is not an equivalence between EDM operators. Rather, it is a recipe for constructing the EDM cone whole from large Euclidean bodies: the positive semidefinite cone, orthogonal complement of the geometric center subspace, and symmetric hollow subspace. A realization of this construction in low dimension is illustrated in Figure 88 and Figure 89.

The dual EDM cone follows directly from (827) by standard properties of cones (§2.13.1.1):

    EDM^{N*} = K*_1 + K*_2 = S^{N⊥}_h − S^N_c ∩ S^N_+   (828)

which bears strong resemblance to (807).

5.8.1.2 nonnegative orthant

That EDM^N is a proper subset of the nonnegative orthant is not obvious from (827). We wish to verify

    EDM^N = S^N_h ∩ ( S^{N⊥}_c − S^N_+ ) ⊂ R^{N×N}_+   (829)

While there are many ways to prove this, it is sufficient to show that all entries of the extreme directions of EDM^N must be nonnegative; id est, for any particular nonzero vector z = [z_i, i = 1...N] ∈ N(1^T) (§5.5.3.2),

    δ(zz^T)1^T + 1δ(zz^T)^T − 2zz^T ≥ 0   (830)

where the inequality denotes entrywise comparison. The inequality holds because the i,j-th entry of an extreme direction is squared: (z_i − z_j)².

We observe that the dyad 2zz^T ∈ S^N_+ belongs to the positive semidefinite cone, the doublet

    δ(zz^T)1^T + 1δ(zz^T)^T ∈ S^{N⊥}_c   (831)

to the orthogonal complement (1602) of the geometric center subspace, while their difference is a member of the symmetric hollow subspace S^N_h.

Here is an algebraic method provided by a reader to prove nonnegativity: Suppose we are given A ∈ S^{N⊥}_c and B = [B_ij] ∈ S^N_+ and A − B ∈ S^N_h. Then we have, for some vector u, A = u1^T + 1u^T = [A_ij] = [u_i + u_j] and δ(B) = δ(A) = 2u. Positive semidefiniteness of B requires nonnegativity A − B ≥ 0 because

    (e_i − e_j)^T B (e_i − e_j) = (B_ii − B_ij) − (B_ji − B_jj) = 2(u_i + u_j) − 2B_ij ≥ 0   (832)


5.8.1.3 Dual Euclidean distance matrix criterion

Conditions necessary for membership of a matrix D* ∈ S^N to the dual EDM cone EDM^{N*} may be derived from (807): D* ∈ EDM^{N*} ⇒ D* = δ(y) − V_N A V_N^T for some vector y and positive semidefinite matrix A ∈ S^{N−1}_+. This in turn implies δ(D*1) = δ(y). Then, for D* ∈ S^N

    D* ∈ EDM^{N*} ⇔ δ(D*1) − D* ≽ 0   (833)

where, for any symmetric matrix D*

    δ(D*1) − D* ∈ S^N_c   (834)

To show sufficiency of the matrix criterion in (833), recall Gram-form EDM operator

    D(G) = δ(G)1^T + 1δ(G)^T − 2G   (480)

where Gram matrix G is positive semidefinite by definition, and recall the self-adjointness property of the main-diagonal linear operator δ (§A.1):

    ⟨D, D*⟩ = ⟨ δ(G)1^T + 1δ(G)^T − 2G , D* ⟩ = 2⟨ G , δ(D*1) − D* ⟩   (499)

Assuming ⟨G, δ(D*1) − D*⟩ ≥ 0 (1119), then we have known membership relation (§2.13.2.0.1)

    D* ∈ EDM^{N*} ⇔ ⟨D, D*⟩ ≥ 0 ∀ D ∈ EDM^N   (835)

Elegance of this matrix criterion (833) for membership to the dual EDM cone is the lack of any other assumptions except that D* be symmetric. (Recall: Schoenberg criterion (487) for membership to the EDM cone requires membership to the symmetric hollow subspace.)

The linear Gram-form EDM operator has adjoint, for Y ∈ S^N

    D^T(Y) ≜ 2( δ(Y1) − Y )   (836)

Then we have: [54, p.111]

    EDM^{N*} = { Y ∈ S^N | δ(Y1) − Y ≽ 0 }   (837)

the dual EDM cone expressed in terms of the adjoint operator. A dual EDM cone determined this way is illustrated in Figure 91.
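Criterion (833) is a one-line numeric test; a minimal Matlab sketch, assuming D* symmetric and a small rounding tolerance:

    % Dual-cone membership test (833): Ds is in EDM^{N*} iff
    % delta(Ds*1) - Ds is positive semidefinite.
    isDualEDM = @(Ds) min(eig(diag(Ds*ones(size(Ds,1),1)) - Ds)) > -1e-10;
    isDualEDM(diag([1 2 3]))   % true: diagonal matrices belong to EDM^{N*}
    isDualEDM(-ones(3))        % false: delta(Ds*1) - Ds = 11' - 3I here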


5.8.1.3.1 Exercise.
Find a spectral cone as in §4.11.2 corresponding to EDM^{N*}.

5.8.1.4 Nonorthogonal components of dual EDM

Now we tie construct (828) for the dual EDM cone together with the matrix criterion (833) for dual EDM cone membership. For any D* ∈ S^N it is obvious:

    δ(D*1) ∈ S^{N⊥}_h   (838)

any diagonal matrix belongs to the subspace of diagonal matrices (56). We know when D* ∈ EDM^{N*}

    δ(D*1) − D* ∈ S^N_c ∩ S^N_+   (839)

this adjoint expression (836) belongs to that face (800) of the positive semidefinite cone S^N_+ in the geometric center subspace. Any nonzero dual EDM

    D* = δ(D*1) − ( δ(D*1) − D* ) ∈ S^{N⊥}_h ⊖ S^N_c ∩ S^N_+ = EDM^{N*}   (840)

can therefore be expressed as the difference of two linearly independent nonorthogonal components (Figure 69, Figure 90).

5.8.1.5 Affine dimension complementarity

From §5.8.1.3 we have, for some A ∈ S^{N−1}_+ (confer (839))

    δ(D*1) − D* = V_N A V_N^T ∈ S^N_c ∩ S^N_+   (841)

if and only if D* belongs to the dual EDM cone. Call rank(V_N A V_N^T) dual affine dimension. Empirically, we find a complementary relationship in affine dimension between the projection of some arbitrary symmetric matrix H on the polar EDM cone, EDM^{N°} = −EDM^{N*}, and its projection on the EDM cone; id est, the optimal solution of^{5.9}

    minimize_{D° ∈ S^N}   ‖D° − H‖_F
    subject to            D° − δ(D°1) ≽ 0   (842)

5.9 This dual projection can be solved quickly (without semidefinite programming) via Lemma 5.8.1.1.1; rewriting,

    minimize_{D° ∈ S^N}   ‖ (D° − δ(D°1)) − (H − δ(D°1)) ‖_F
    subject to            D° − δ(D°1) ≽ 0

which is the projection of affinely transformed optimal solution H − δ(D°⋆1) on S^N_c ∩ S^N_+;

    D°⋆ − δ(D°⋆1) = P_{S^N_+} P_{S^N_c}( H − δ(D°⋆1) )

Foreknowledge of an optimal solution D°⋆ as argument to projection suggests recursion.


Figure 90: Hand-drawn abstraction of polar EDM cone (drawn truncated). Any member D° of polar EDM cone can be decomposed into two linearly independent nonorthogonal components: δ(D°1) and D° − δ(D°1);

    D° = δ(D°1) + ( D° − δ(D°1) ) ∈ S^{N⊥}_h ⊕ S^N_c ∩ S^N_+ = EDM^{N°}


has dual affine dimension complementary to affine dimension corresponding to the optimal solution of

    minimize_{D ∈ S^N_h}   ‖D − H‖_F
    subject to             −V_N^T D V_N ≽ 0   (843)

Precisely,

    rank( D°⋆ − δ(D°⋆ 1) ) + rank( V_N^T D⋆ V_N ) = N − 1   (844)

and rank( D°⋆ − δ(D°⋆ 1) ) ≤ N−1 because vector 1 is always in the nullspace of rank's argument. This is similar to the known result for projection on the self-dual positive semidefinite cone and its polar:

    rank P_{−S^N_+} H + rank P_{S^N_+} H = N   (845)

When low affine dimension is a desirable result of projection on the EDM cone, projection on the polar EDM cone should be performed instead. Convex polar problem (842) can be solved for D°⋆ by transforming to an equivalent Schur-form semidefinite program (§A.4.1). Interior-point methods for numerically solving semidefinite programs tend to produce high-rank solutions. (§6.1.1) Then D⋆ = H − D°⋆ ∈ EDM^N by Corollary E.9.2.2.1, and D⋆ will tend to have low affine dimension. This approach breaks when attempting projection on a cone subset discriminated by affine dimension or rank, because then we have no complementarity relation like (844) or (845). (§7.1.4.1)


5.8.1.6 EDM cone duality

In §4.6.1.1, via Gram-form EDM operator

    D(G) = δ(G)1^T + 1δ(G)^T − 2G ∈ EDM^N ⇐ G ≽ 0   (480)

we established clear connection between the EDM cone and that face (800) of positive semidefinite cone S^N_+ in the geometric center subspace:

    EDM^N = D( S^N_c ∩ S^N_+ )   (573)
    V(EDM^N) = S^N_c ∩ S^N_+   (574)

where

    V(D) = −½ V D V   (562)

In §4.6.1 we established

    S^N_c ∩ S^N_+ = V_N S^{N−1}_+ V_N^T   (560)

Then from (833), (841), and (807) we can deduce

    δ(EDM^{N*} 1) − EDM^{N*} = V_N S^{N−1}_+ V_N^T = S^N_c ∩ S^N_+   (846)

which, by (573) and (574), means the EDM cone can be related to the dual EDM cone by an equality:

    EDM^N = D( δ(EDM^{N*} 1) − EDM^{N*} )   (847)
    V(EDM^N) = δ(EDM^{N*} 1) − EDM^{N*}   (848)

This means projection −V(EDM^N) of the EDM cone on the geometric center subspace S^N_c (§E.7.2.0.2) is an affine transformation of the dual EDM cone: EDM^{N*} − δ(EDM^{N*} 1). Secondarily, it means the EDM cone is not self-dual.


5.8.1.7 Schoenberg criterion is discretized membership relation

We show the Schoenberg criterion

    −V_N^T D V_N ∈ S^{N−1}_+
    D ∈ S^N_h                   ⇔ D ∈ EDM^N    (487)

to be a discretized membership relation (§2.13.4) between a closed convex cone K and its dual K∗ like

    ⟨y , x⟩ ≥ 0 for all y ∈ G(K∗) ⇔ x ∈ K    (301)

where G(K∗) is any set of generators whose conic hull constructs closed convex dual cone K∗:

The Schoenberg criterion is the same as

    ⟨zz^T , −D⟩ ≥ 0 ∀ zz^T | 11^T zz^T = 0
    D ∈ S^N_h                   ⇔ D ∈ EDM^N    (794)

which, by (795), is the same as

    ⟨zz^T , −D⟩ ≥ 0 ∀ zz^T ∈ {V_N υυ^T V_N^T | υ ∈ R^{N−1}}
    D ∈ S^N_h                   ⇔ D ∈ EDM^N    (849)

where the zz^T constitute a set of generators G for the positive semidefinite cone's smallest face F(S^N_+ ∋ V) (§5.7.1) that contains auxiliary matrix V.

From the aggregate in (807) we get the ordinary membership relation, assuming only D ∈ S^N [129, p.58]

    ⟨D∗ , D⟩ ≥ 0 ∀ D∗ ∈ EDM^{N∗} ⇔ D ∈ EDM^N
    ⟨D∗ , D⟩ ≥ 0 ∀ D∗ ∈ {δ(u) | u ∈ R^N} − cone{V_N υυ^T V_N^T | υ ∈ R^{N−1}} ⇔ D ∈ EDM^N    (850)

Discretization yields (301):

    ⟨D∗ , D⟩ ≥ 0 ∀ D∗ ∈ {e_i e_i^T , −e_j e_j^T , −V_N υυ^T V_N^T | i, j = 1...N , υ ∈ R^{N−1}} ⇔ D ∈ EDM^N    (851)
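Because the Schoenberg criterion (487) is a finite test, EDM-cone membership is verifiable with a single eigenvalue decomposition. A minimal Matlab sketch, a plain restatement of (487) (the function name isedm is a local choice, not from the text):

    function tf = isedm(D, tol)
    % Schoenberg criterion (487): D symmetric hollow and -V_N'*D*V_N PSD
    if nargin < 2, tol = 1e-9; end
    N  = size(D,1);
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);      % auxiliary matrix V_N
    hollow = norm(D - D','fro') < tol && norm(diag(D)) < tol;  % D in S^N_h
    tf = hollow && min(eig(-VN'*D*VN)) > -tol;  % membership test (487)

For example, the matrix of squared distances among collinear points passes the test while its negation does not:

    x = [0 1 3];  D = (x.' - x).^2;   % squared distances (implicit expansion)
    isedm(D)                          % true
    isedm(-D)                         % false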


Because ⟨{δ(u) | u ∈ R^N}, D⟩ ≥ 0 ⇔ D ∈ S^N_h , we can restrict observation to the symmetric hollow subspace without loss of generality. Then for D ∈ S^N_h

    ⟨D∗ , D⟩ ≥ 0 ∀ D∗ ∈ {−V_N υυ^T V_N^T | υ ∈ R^{N−1}} ⇔ D ∈ EDM^N    (852)

this discretized membership relation becomes (849); identical to the Schoenberg criterion.

Hitherto a correspondence between the EDM cone and a face of a PSD cone, the Schoenberg criterion is now accurately interpreted as a discretized membership relation between the EDM cone and its ordinary dual.

5.8.2 Ambient S^N_h

When instead we consider the ambient space of symmetric hollow matrices (808), then still we find the EDM cone is not self-dual for N > 2. The simplest way to prove this is as follows:

Given a set of generators G = {Γ} (768) for the pointed closed convex EDM cone, the discrete membership theorem in §2.13.4.2.1 asserts that members of the dual EDM cone in the ambient space of symmetric hollow matrices can be discerned via discretized membership relation:

    EDM^{N∗} ∩ S^N_h ≜ {D∗ ∈ S^N_h | ⟨Γ , D∗⟩ ≥ 0 ∀ Γ ∈ G(EDM^N)}
                     = {D∗ ∈ S^N_h | ⟨δ(zz^T)1^T + 1δ(zz^T)^T − 2zz^T , D∗⟩ ≥ 0 ∀ z ∈ N(1^T)}
                     = {D∗ ∈ S^N_h | ⟨1δ(zz^T)^T − zz^T , D∗⟩ ≥ 0 ∀ z ∈ N(1^T)}    (853)

By comparison

    EDM^N = {D ∈ S^N_h | ⟨−zz^T , D⟩ ≥ 0 ∀ z ∈ N(1^T)}    (854)

the term δ(zz^T)^T D∗ 1 foils any hope of self-duality in ambient S^N_h .


To find the dual EDM cone in ambient S^N_h per §2.13.9.4 we prune the aggregate in (807) describing the ordinary dual EDM cone, removing any member having nonzero main diagonal:

    EDM^{N∗} ∩ S^N_h = cone{δ²(V_N υυ^T V_N^T) − V_N υυ^T V_N^T | υ ∈ R^{N−1}}
                     = {δ²(V_N Ψ V_N^T) − V_N Ψ V_N^T | Ψ ∈ S^{N−1}_+}    (855)

When N = 1, the EDM cone and its dual in ambient S_h each comprise the origin in isomorphic R⁰; thus, self-dual in this dimension. (confer (80))

When N = 2, the EDM cone is the nonnegative real line in isomorphic R. (Figure 86) EDM^{2∗} in S²_h is identical, thus self-dual in this dimension. This result is in agreement with (853), verified directly: for all κ ∈ R,

    z = κ [1 ; −1]  and  δ(zz^T) = κ² [1 ; 1]  ⇒  d∗₁₂ ≥ 0

The first case adverse to self-duality, N = 3, may be deduced from Figure 78; the EDM cone is a circular cone in isomorphic R³ corresponding to no rotation of the Lorentz cone (142) (the self-dual circular cone). Figure 91 illustrates the EDM cone and its dual in ambient S³_h ; no longer self-dual.

5.9 Theorem of the alternative

In §2.13.2.1.1 we showed how alternative systems of generalized inequality can be derived from closed convex cones and their duals. This section is, therefore, a fitting postscript to the discussion of the dual EDM cone.

5.9.0.0.1 Theorem. EDM alternative. [96, §1]
Given D ∈ S^N_h

    D ∈ EDM^N

or in the alternative

    ∃ z such that  1^T z = 1 ,  Dz = 0    (856)

In words, either N(D) intersects hyperplane {z | 1^T z = 1} or D is an EDM; the alternatives are incompatible. ⋄
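The alternative (856) is testable by elementary linear algebra: N(D) intersects the hyperplane {z | 1^T z = 1} exactly when some null vector of D has nonzero entry-sum. A Matlab sketch (illustrative, not from the text):

    Z = null(D);                    % orthonormal basis for N(D)
    c = ones(1, size(D,1))*Z;       % row vector 1'*Z
    if any(abs(c) > 1e-9)
        z = Z*(c'/(c*c'));          % then 1'*z = 1 and D*z = 0: alternative holds
    else
        disp('N(D) lies in N(1^T) per (857); D may be an EDM')
    end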


    D∗ ∈ EDM^{N∗} ⇔ δ(D∗1) − D∗ ≽ 0    (833)

Figure 91: Ordinary dual EDM cone projected on S³_h shrouds EDM³; drawn tiled in isometrically isomorphic R³, showing dvec rel∂ EDM³ and dvec rel∂(EDM^{3∗} ∩ S³_h). (It so happens: intersection EDM^{N∗} ∩ S^N_h (§2.13.9.3) is identical to projection of dual EDM cone on S^N_h .)


When D is an EDM [169, §2]

    N(D) ⊂ N(1^T) = {z | 1^T z = 0}    (857)

Because [96, §2] (§E.0.1)

    DD†1 = 1 ,   1^T D†D = 1^T    (858)

then

    R(1) ⊂ R(D)    (859)

5.10 postscript

We provided an equality (827) relating the convex cone of Euclidean distance matrices to the convex cone of positive semidefinite matrices. Projection on the positive semidefinite cone constrained by an upper bound on rank is easy and well known; [72] simply a matter of truncating a list of eigenvalues. Projection on the positive semidefinite cone with such a rank constraint is, in fact, a convex optimization problem. (§7.1.4)

Yet, it remains difficult to project on the EDM cone under a constraint on rank or affine dimension. One thing we can do is invoke the Schoenberg criterion (487) and then project on a positive semidefinite cone under a constraint bounding affine dimension from above.

It is our hope that the equality herein relating EDM and PSD cones will become a step toward understanding projection on the EDM cone under a rank constraint.
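The easy projection just cited, on the positive semidefinite cone under an upper bound k on rank, amounts to truncating the eigenvalue list. A Matlab sketch under that reading (function name hypothetical):

    function Xp = projpsdrank(H, k)
    % project symmetric H on {X | X PSD, rank X <= k} by eigenvalue truncation
    [Q, L] = eig((H + H')/2);
    lam = diag(L);
    [lam, idx] = sort(lam, 'descend');
    lam = max(lam, 0);                   % clip negative eigenvalues
    lam(k+1:end) = 0;                    % truncate the list to rank k
    Xp = Q(:,idx)*diag(lam)*Q(:,idx)';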




Chapter 6

Semidefinite programming

    Prior to 1984,^{6.1} linear and nonlinear programming, one a subset of the other, had evolved for the most part along unconnected paths, without even a common terminology. (The use of "programming" to mean "optimization" serves as a persistent reminder of these differences.)
        −Forsgren, Gill, & Wright (2002) [83]

Given some application of convex analysis, it may at first seem puzzling why the search for its solution ends abruptly with a formalized statement of the problem itself as a constrained optimization. The explanation is: typically we do not seek analytical solution because there are relatively few. (§C) If a problem can be expressed in convex form, rather, then there exist computer programs providing efficient numerical global solution. [224] [100] [260] [261] [262] [267]

^{6.1} nascence of interior-point methods of solution [244] [258].


The goal, then, becomes conversion of a given problem (perhaps a nonconvex problem statement) to an equivalent convex form or to an alternation of convex subproblems convergent to a solution of the original problem: A fundamental property of convex optimization problems is that any locally optimal point is also (globally) optimal. [41, §4.2.2] [201, §1] Given convex real objective function g and convex feasible set C ⊆ dom g, which is the set of all variable values satisfying the problem constraints, we have the generic convex optimization problem

    minimize_X   g(X)
    subject to   X ∈ C    (860)

where constraints are abstract here in the membership of variable X to feasible set C. Inequality constraint functions of a convex optimization problem are convex while equality constraint functions are conventionally affine, but not necessarily so. Affine equality constraint functions (necessarily convex), as opposed to the larger set of all convex equality constraint functions having convex level sets, make convex optimization tractable. Similarly, the problem

    maximize_X   g(X)
    subject to   X ∈ C    (861)

is convex were g a real concave function. As conversion to convex form is not always possible, there is much ongoing research to determine which problem classes have convex expression or relaxation. [26] [40] [86] [182] [231] [85]


Consider a prototypical conic problem (p) and its dual (d): [192, §3.3.1] [156, §2.1]

    (p)  minimize_x   c^T x          (d)  maximize_{y,s}   b^T y
         subject to   x ∈ K               subject to       s ∈ K∗
                      Ax = b                               A^T y + s = c    (255)

where K is a closed convex cone, K∗ is its dual, matrix A is fixed, and the remaining quantities are vectors.

When K is a polyhedral cone (§2.12.1), then each conic problem becomes a linear program [57]. More generally, each optimization problem is convex when K is a closed convex cone. Unlike the optimal objective value, a solution to each problem is not necessarily unique; in other words, the optimal solution set {x⋆} or {y⋆, s⋆} is convex and may comprise more than a single point although the corresponding optimal objective value is unique when the feasible set is nonempty.

When K is the self-dual cone of positive semidefinite matrices in the subspace of symmetric matrices, then each conic problem is called a semidefinite program (SDP); [182, §6.4] primal problem (P) having matrix variable X ∈ S^n while corresponding dual (D) has matrix slack variable S ∈ S^n and vector variable y = [y_i] ∈ R^m : [8] [9, §2] [267, §1.3.8]

    (P)  minimize_{X ∈ S^n}   ⟨C , X⟩     (D)  maximize_{y ∈ R^m, S ∈ S^n}   ⟨b , y⟩
         subject to           X ≽ 0            subject to                    S ≽ 0
                              A svec X = b                                   svec⁻¹(A^T y) + S = C    (862)

where matrix C ∈ S^n and vector b ∈ R^m are fixed, as is

    A ≜ [ svec(A₁)^T ; ... ; svec(A_m)^T ] ∈ R^{m × n(n+1)/2}    (863)

where A_i ∈ S^n , i = 1...m , are given. Thus

    A svec X = [ ⟨A₁ , X⟩ ; ... ; ⟨A_m , X⟩ ]
    svec⁻¹(A^T y) = Σ_{i=1}^m y_i A_i    (864)
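Operators svec and svec⁻¹ are simple to realize. A Matlab sketch assuming the columnwise upper-triangular ordering of (46), with √2 scaling of off-diagonals so that ⟨A,B⟩ = svec(A)^T svec(B) per (30); function names svec and smat are local choices (each would live in its own file):

    function v = svec(X)                 % symmetric vectorization (46)
    n = size(X,1);  v = [];
    for j = 1:n
        v = [v; sqrt(2)*X(1:j-1,j); X(j,j)];
    end

    function X = smat(v)                 % inverse vectorization svec^{-1}
    n = round((sqrt(8*numel(v)+1) - 1)/2);
    X = zeros(n);  k = 0;
    for j = 1:n
        X(1:j-1,j) = v(k+1:k+j-1)/sqrt(2);
        X(j,j)     = v(k+j);
        k = k + j;
    end
    X = X + triu(X,1)';                  % symmetrize

With these, the data matrix (863) is A = [svec(A₁)' ; ... ; svec(A_m)'], and A*svec(X) evaluates (864).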


The vector inner-product for matrices is defined in the Euclidean/Frobenius sense in the isomorphic vector space R^{n(n+1)/2}; id est,

    ⟨C , X⟩ ≜ tr(C^T X) = svec(C)^T svec X    (30)

where svec X defined by (46) denotes symmetric vectorization.

Semidefinite programming has emerged recently to prominence primarily because it admits a new class of problem previously unsolvable by convex optimization techniques, [40] secondarily because it theoretically subsumes other convex techniques such as linear, quadratic, and second-order cone programming. Determination of the Riemann mapping function from complex analysis [186] [23, §8, §13], for example, can be posed as a semidefinite program.

6.1.1 Maximal complementarity

It has been shown that contemporary interior-point methods (developed circa 1990 [86]) [41, §11] [259] [189] [182] [9] [83] for numerical solution of semidefinite programs can converge to a solution of maximal complementarity; [108, §5] [266] [165] [92] not a vertex-solution but a solution of highest cardinality or rank among all optimal solutions.^{6.2} [267, §2.5.3]

6.1.1.1 Reduced-rank solution

A simple rank reduction algorithm for construction of a primal optimal solution X⋆ to (862P) satisfying an upper bound on rank governed by Proposition 2.9.3.0.1 is presented in §6.3. That proposition asserts existence of feasible solutions with an upper bound on their rank; [19, §II.13.1] specifically, it asserts an extreme point (§2.6.0.0.1) of the primal feasible set A ∩ S^n_+ satisfies upper bound

    rank X ≤ ⌊(√(8m + 1) − 1)/2⌋    (226)

where, given A ∈ R^{m × n(n+1)/2} and b ∈ R^m ,

    A ≜ {X ∈ S^n | A svec X = b}    (865)

is the affine subset from primal problem (862P).

^{6.2} This characteristic might be regarded as a disadvantage to this method of numerical solution, but this behavior is not certain and depends on solver implementation.


Figure 92: Visualizing positive semidefinite cone in high dimension: Proper polyhedral cone S_+^3 ⊂ R³ representing positive semidefinite cone S³_+ ⊂ S³; analogizing its intersection with hyperplane S³_+ ∩ ∂H. Number of facets is arbitrary (analogy is not inspired by eigen decomposition). The rank-0 positive semidefinite matrix corresponds to the origin in R³, rank-1 positive semidefinite matrices correspond to the edges of the polyhedral cone, rank-2 to the facet relative interiors, and rank-3 to the polyhedral cone interior. Γ₁ and Γ₂ are extreme points of polyhedron P = ∂H ∩ S_+^3 , and extreme directions of S_+^3 . A given vector C is normal to another hyperplane (not illustrated but independent w.r.t. ∂H) containing line segment Γ₁Γ₂ minimizing real linear function ⟨C , X⟩ on P. (confer Figure 16)


6.1.1.2 Coexistence of low- and high-rank solutions; analogy

That low-rank and high-rank optimal solutions {X⋆} of (862P) coexist may be grasped with the following analogy: We compare a proper polyhedral cone S_+^3 in R³ (illustrated in Figure 92) to the positive semidefinite cone S³_+ in isometrically isomorphic R⁶, difficult to visualize. The analogy is good:

• int S³_+ is constituted by rank-3 matrices; int S_+^3 has three dimensions

• boundary ∂S³_+ contains rank-0, rank-1, and rank-2 matrices; boundary ∂S_+^3 contains 0-, 1-, and 2-dimensional faces

• the only rank-0 matrix resides in the vertex at the origin

• Rank-1 matrices are in one-to-one correspondence with extreme directions of S_+^3 and S³_+ . The set of all rank-1 symmetric matrices in this dimension

    {G ∈ S³_+ | rank G = 1}    (866)

  is not a connected set.

    In any SDP feasibility problem, the SDP feasible solution with the lowest rank must be an extreme point of the feasible set. Thus, there must exist an SDP objective function such that this lowest-rank feasible solution uniquely optimizes it.    −Ye, 2006

• Rank of a sum of members F+G in Lemma 2.9.2.6.1 and location of a difference F−G in §2.9.2.9.1 similarly hold for S_+^3 and S³_+ .

• Euclidean distance from any particular rank-3 positive semidefinite matrix (in the cone interior) to the closest rank-2 positive semidefinite matrix (on the boundary) is generally less than the distance to the closest rank-1 positive semidefinite matrix. (§7.1.2)

• distance from any point in ∂S³_+ to int S³_+ is infinitesimal (§2.1.7.1.1); distance from any point in ∂S_+^3 to int S_+^3 is infinitesimal


• faces of S_+^3 correspond to faces of S³_+ (confer Table 2.9.2.3.1):

              k    dim F(S_+^3)    dim F(S³_+)    dim F(S³_+ ∋ rank-k matrix)
              0         0               0                    0
    boundary  1         1               1                    1
              2         2               3                    3
    interior  3         3               6                    6

  Integer k indexes k-dimensional faces F of S_+^3 . Positive semidefinite cone S³_+ has four kinds of faces, including cone itself (k = 3, boundary + interior), whose dimensions in isometrically isomorphic R⁶ are listed under dim F(S³_+). Smallest face F(S³_+ ∋ rank-k matrix) that contains a rank-k positive semidefinite matrix has dimension k(k + 1)/2 by (186).

• For A equal to intersection of m hyperplanes having independent normals, and for X ∈ S_+^3 ∩ A , we have rank X ≤ m ; the analogue to (226).

Proof. With reference to Figure 92: Assume one (m = 1) hyperplane A = ∂H intersects the polyhedral cone. Every intersecting plane contains at least one matrix having rank less than or equal to 1; id est, from all X ∈ ∂H ∩ S_+^3 there exists an X such that rank X ≤ 1. Rank 1 is therefore an upper bound in this case.

Now visualize intersection of the polyhedral cone with two (m = 2) hyperplanes having linearly independent normals. The hyperplane intersection A makes a line. Every intersecting line contains at least one matrix having rank less than or equal to 2, providing an upper bound. In other words, there exists a positive semidefinite matrix X belonging to any line intersecting the polyhedral cone such that rank X ≤ 2.

In the case of three independent intersecting hyperplanes (m = 3), the hyperplane intersection A makes a point that can reside anywhere in the polyhedral cone. The upper bound on a point in S_+^3 is also the greatest upper bound: rank X ≤ 3.


6.1.1.2.1 Example. Optimization on A ∩ S_+^3 .
Consider minimization of the real linear function ⟨C , X⟩ on a polyhedral feasible set;

    P ≜ A ∩ S_+^3    (867)

    f⋆₀ ≜ minimize_X   ⟨C , X⟩
          subject to   X ∈ A ∩ S_+^3    (868)

As illustrated for particular C and A = ∂H in Figure 92, this linear function is minimized (confer Figure 16) on any X belonging to the face of P containing extreme points {Γ₁ , Γ₂} and all the rank-2 matrices in between; id est, on any X belonging to the face of P

    F(P) = {X | ⟨C , X⟩ = f⋆₀} ∩ A ∩ S_+^3    (869)

exposed by the hyperplane {X | ⟨C , X⟩ = f⋆₀}. In other words, the set of all optimal points X⋆ is a face of P

    {X⋆} = F(P) = Γ₁Γ₂    (870)

comprising rank-1 and rank-2 positive semidefinite matrices. Rank 1 is the upper bound on existence in the feasible set P for this case m = 1 hyperplane constituting A. The rank-1 matrices Γ₁ and Γ₂ in face F(P) are extreme points of that face and (by transitivity (§2.6.1.2)) extreme points of the intersection P as well. As predicted by analogy to Barvinok's Proposition 2.9.3.0.1, the upper bound on rank of X existent in the feasible set P is satisfied by an extreme point. The upper bound on rank of an optimal solution X⋆ existent in F(P) is thereby also satisfied by an extreme point of P precisely because {X⋆} constitutes F(P);^{6.3} in particular,

    {X⋆ ∈ P | rank X⋆ ≤ 1} = {Γ₁ , Γ₂} ⊆ F(P)    (871)

As all linear functions on a polyhedron are minimized on a face, [57] [164] [181] [183] by analogy we so demonstrate coexistence of optimal solutions X⋆ of (862P) having assorted rank.

^{6.3} and every face contains a subset of the extreme points of P by the extreme existence theorem (§2.6.0.0.2). This means: because the affine subset A and hyperplane {X | ⟨C , X⟩ = f⋆₀} must intersect a whole face of P , calculation of an upper bound on rank of X⋆ ignores counting the hyperplane when determining m in (226).


6.1.1.3 Previous work

Barvinok showed [20, §2.2] when given a positive definite matrix C and an arbitrarily small neighborhood of C comprising positive definite matrices, there exists a matrix C̃ from that neighborhood such that optimal solution X⋆ to (862P) (substituting C̃) is an extreme point of A ∩ S^n_+ and satisfies upper bound (226).^{6.4} Given arbitrary positive definite C , this means nothing inherently guarantees that an optimal solution X⋆ to problem (862P) satisfies (226); certainly nothing given any symmetric matrix C , as the problem is posed. This can be proved by example:

6.1.1.3.1 Example. (Ye) Maximal Complementarity.
Assume dimension n to be an even positive number. Then the particular instance of problem (862P),

    minimize_{X ∈ S^n}   ⟨ [I 0 ; 0 2I] , X ⟩
    subject to           X ≽ 0
                         ⟨I , X⟩ = n    (872)

has optimal solution

    X⋆ = [2I 0 ; 0 0] ∈ S^n    (873)

with an equal number of twos and zeros along the main diagonal. Indeed, optimal solution (873) is a terminal solution along the central path taken by the interior-point method as implemented in [267, §2.5.3]; it is also a solution of highest rank among all optimal solutions to (872). Clearly, rank of this primal optimal solution exceeds by far a rank-1 solution predicted by upper bound (226).

^{6.4} Further, the set of all such C̃ in that neighborhood is open and dense.
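This behavior is reproducible. A Matlab sketch of (872), assuming the CVX modelling package (any interior-point SDP solver should behave similarly):

    n = 4;                               % any even positive dimension
    C = blkdiag(eye(n/2), 2*eye(n/2));   % cost matrix in (872)
    cvx_begin sdp quiet
        variable X(n,n) symmetric
        minimize( trace(C*X) )
        trace(X) == n
        X >= 0
    cvx_end
    sort(eig(X), 'descend')'             % tends toward (873): here [2 2 0 0]

An interior-point solver tends to return the maximal-complementarity solution (873) of rank n/2, far exceeding the rank-1 bound (226) for this single-constraint problem.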


6.1.1.4 Later developments

This rational example (872) indicates the need for a more generally applicable and simple algorithm to identify an optimal solution X⋆ satisfying Barvinok's Proposition 2.9.3.0.1. We will review such an algorithm in §6.3, but first we provide more background.

6.2 Framework

6.2.1 Feasible sets

Denote by C and C∗ the convex sets of primal and dual points respectively satisfying the primal and dual constraints in (862), each assumed nonempty;

    C = { X ∈ S^n_+ | [ ⟨A₁ , X⟩ ; ... ; ⟨A_m , X⟩ ] = b } = A ∩ S^n_+
    C∗ = { S ∈ S^n_+ , y = [y_i] ∈ R^m | Σ_{i=1}^m y_i A_i + S = C }    (874)

These are the primal feasible set and dual feasible set in domain intersection of the respective constraint functions. Geometrically, primal feasible A ∩ S^n_+ represents an intersection of the positive semidefinite cone S^n_+ with an affine subset A of the subspace of symmetric matrices S^n in isometrically isomorphic R^{n(n+1)/2}. The affine subset has dimension n(n+1)/2 − m when the A_i are linearly independent. Dual feasible set C∗ is the Cartesian product of the positive semidefinite cone with its inverse image (§2.1.9.0.1) under affine transformation C − Σ y_i A_i .^{6.5} Both sets are closed and convex and the objective functions on a Euclidean vector space are linear, hence (862P) and (862D) are convex optimization problems.

^{6.5} The inequality C − Σ y_i A_i ≽ 0 follows directly from (862D) (§2.9.0.1.1) and is known as a linear matrix inequality. (§2.13.5.1.1) Because Σ y_i A_i ≼ C , matrix S is known as a slack variable (a term borrowed from linear programming [57]) since its inclusion raises this inequality to equality.


6.2.1.1 A ∩ S^n_+ emptiness determination via Farkas' lemma

6.2.1.1.1 Lemma. Semidefinite Farkas' lemma.
Given an arbitrary set {A_i ∈ S^n , i = 1...m} and a vector b = [b_i] ∈ R^m , define the affine subset

    A = {X ∈ S^n | ⟨A_i , X⟩ = b_i , i = 1...m}    (865)

Primal feasible set A ∩ S^n_+ is nonempty if and only if y^T b ≥ 0 holds for each and every vector y = [y_i] ∈ R^m such that Σ_{i=1}^m y_i A_i ≽ 0.

Equivalently, primal feasible set A ∩ S^n_+ is nonempty if and only if y^T b ≥ 0 holds for each and every norm-1 vector ‖y‖ = 1 such that Σ_{i=1}^m y_i A_i ≽ 0. ⋄

Semidefinite Farkas' lemma follows directly from a membership relation (§2.13.2.0.1) and the closed convex cones from linear matrix inequality example 2.13.5.1.1; given convex cone K and its dual

    K = {A svec X | X ≽ 0}    (308)
    K∗ = {y | Σ_{j=1}^m y_j A_j ≽ 0}    (314)

where

    A = [ svec(A₁)^T ; ... ; svec(A_m)^T ] ∈ R^{m × n(n+1)/2}    (863)

then we have membership relation

    b ∈ K ⇔ ⟨y , b⟩ ≥ 0 ∀ y ∈ K∗    (267)

and equivalents

    b ∈ K ⇔ ∃ X ≽ 0  A svec X = b ⇔ A ∩ S^n_+ ≠ ∅    (875)


Semidefinite Farkas' lemma provides the conditions required for a set of hyperplanes to have a nonempty intersection A ∩ S^n_+ with the positive semidefinite cone. While the lemma as stated is correct, Ye points out [267, §1.3.8] that a positive definite version of this lemma is required for semidefinite programming because any feasible point in the relative interior A ∩ int S^n_+ is required by Slater's condition^{6.6} to achieve 0 duality gap (primal-dual objective difference, §6.2.3). In our circumstance, assuming a nonempty intersection, a positive definite lemma is required to insure a point of intersection closest to the origin is not at infinity; e.g., Figure 31. Then given A ∈ R^{m × n(n+1)/2} having rank m , we wish to detect existence of a nonempty relative interior of the primal feasible set;^{6.7}

    b ∈ int K ⇔ ⟨y , b⟩ > 0 for all y ∈ K∗ , y ≠ 0 ⇔ A ∩ int S^n_+ ≠ ∅    (876)

A positive definite Farkas' lemma can easily be constructed from this membership relation (274) and these proper convex cones K (308) and K∗ (314):

6.2.1.1.2 Lemma. Positive definite Farkas' lemma.
Given a linearly independent set {A_i ∈ S^n , i = 1...m} and a vector b = [b_i] ∈ R^m , define the affine subset

    A = {X ∈ S^n | ⟨A_i , X⟩ = b_i , i = 1...m}    (865)

Primal feasible set relative interior A ∩ int S^n_+ is nonempty if and only if y^T b > 0 holds for each and every vector y = [y_i] ≠ 0 such that Σ_{i=1}^m y_i A_i ≽ 0.

Equivalently, primal feasible set relative interior A ∩ int S^n_+ is nonempty if and only if y^T b > 0 holds for each and every norm-1 vector ‖y‖ = 1 such that Σ_{i=1}^m y_i A_i ≽ 0. ⋄

^{6.6} Slater's sufficient condition is satisfied whenever any primal strictly feasible point exists; id est, any point feasible with the affine equality (or affine inequality) constraint functions and relatively interior to convex cone K. If cone K is polyhedral, then Slater's condition is satisfied when any feasible point exists relatively interior to K or on its relative boundary. [41, §5.2.3] [28, p.325]

^{6.7} Detection of A ∩ int S^n_+ by examining K interior is a trick that need not be lost.


6.2.1.1.3 Example. "New" Farkas' lemma.
In 1995, Lasserre [148, §III] presented an example originally offered by Ben-Israel in 1969 [25, p.378] as evidence of failure in semidefinite Farkas' Lemma 6.2.1.1.1:

    A ≜ [ svec(A₁)^T ; svec(A₂)^T ] = [ 0 1 0 ; 0 0 1 ] ,   b = [ 1 ; 0 ]    (877)

The intersection A ∩ S^n_+ is practically empty because the solution set

    {X ≽ 0 | A svec X = b} = { [ α  1/√2 ; 1/√2  0 ] ≽ 0 | α ∈ R }    (878)

is positive semidefinite only asymptotically (α → ∞). Yet the dual system Σ_{i=1}^m y_i A_i ≽ 0 ⇒ y^T b ≥ 0 indicates nonempty intersection; videlicet, for ‖y‖ = 1

    y₁ [ 0  1/√2 ; 1/√2  0 ] + y₂ [ 0 0 ; 0 1 ] ≽ 0 ⇔ y = [ 0 ; 1 ] ⇒ y^T b = 0    (879)

On the other hand, positive definite Farkas' Lemma 6.2.1.1.2 shows A ∩ int S^n_+ is empty; what we need to know for semidefinite programming.

Based on Ben-Israel's example, Lasserre suggested addition of another condition to semidefinite Farkas' Lemma 6.2.1.1.1 to make a "new" lemma. Ye recommends positive definite Farkas' Lemma 6.2.1.1.2 instead; which is simpler and obviates Lasserre's proposed additional condition.

6.2.1.2 Theorem of the alternative for semidefinite programming

Because these Farkas' lemmas follow from membership relations, we may construct alternative systems from them. From positive definite Farkas' lemma and using the method of §2.13.2.1.1, we get

    A ∩ int S^n_+ ≠ ∅

or in the alternative

    y^T b ≤ 0 ,  Σ_{i=1}^m y_i A_i ≽ 0 ,  y ≠ 0    (880)


Any single vector y satisfying the alternative certifies A ∩ int S^n_+ is empty. Such a vector can be found as a feasible point of another semidefinite program: for linearly independent set {A_i ∈ S^n , i = 1...m}

    minimize_y   y^T b
    subject to   Σ_{i=1}^m y_i A_i ≽ 0    (881)

If y ≠ 0 is a feasible point and y^T b ≤ 0, then the relative interior of the primal feasible set A ∩ int S^n_+ from (874) is empty.

6.2.1.3 boundary-membership criterion

From the boundary-membership relation (278) for proper cones and from linear matrix inequality cones K (308) and K∗ (314)

    b ∈ ∂K ⇔ ∃ y  ⟨y , b⟩ = 0 , y ∈ K∗ , y ≠ 0 , b ∈ K ⇔ A ∩ S^n_+ ⊂ ∂S^n_+    (882)

Determination of whether b ∈ ∂K is expressible as a semidefinite program: assuming linearly independent set {A_i ∈ S^n , i = 1...m}^{6.8}

    maximize_{y, X}   1^T y
    subject to        y^T b = 0
                      Σ_{i=1}^m y_i A_i ≽ 0
                      ‖y‖² ≤ 1
                      A svec X = b
                      X ≽ 0    (883)

If such an optimal vector y⋆ ≠ 0 can be found, then affine subset A (865) intersects the positive semidefinite cone but only on its boundary.

^{6.8} From the results of Example 2.13.5.1.1, vector b on the boundary of K cannot be detected simply by looking for 0 eigenvalues in matrix X. Therefore, none of these constraints are redundant. We do not consider the case where matrix A has no nullspace because then the feasible set {X} is at most a single point.


6.2.2 Duals

The dual objective function evaluated at any feasible point represents a lower bound on the primal optimal objective value. We can see this by direct substitution: Assume the feasible sets A ∩ S^n_+ and C∗ are nonempty. Then it is always true:

    ⟨C , X⟩ ≥ ⟨b , y⟩
    ⟨ Σ_i y_i A_i + S , X ⟩ ≥ [ ⟨A₁ , X⟩ ··· ⟨A_m , X⟩ ] y
    ⟨S , X⟩ ≥ 0    (884)

The converse also follows because

    X ≽ 0 , S ≽ 0 ⇒ ⟨S , X⟩ ≥ 0    (1119)

The optimal value of the dual objective thus represents the greatest lower bound on the primal. This fact is known as the weak duality theorem for semidefinite programming, [267, §1.3.8] and can be used to detect convergence in any primal/dual numerical method of solution.

6.2.3 Optimality conditions

When any primal feasible point exists relatively interior to A ∩ S^n_+ in S^n , or when any dual feasible point exists relatively interior to C∗ in S^n × R^m , then by Slater's sufficient condition these two problems (862P) and (862D) become strong duals. In other words, the primal optimal objective value becomes equivalent to the dual optimal objective value: there is no duality gap; id est, if ∃ X ∈ A ∩ int S^n_+ or ∃ S , y ∈ rel int C∗ then

    ⟨C , X⋆⟩ = ⟨b , y⋆⟩
    ⟨ Σ_i y⋆_i A_i + S⋆ , X⋆ ⟩ = [ ⟨A₁ , X⋆⟩ ··· ⟨A_m , X⋆⟩ ] y⋆
    ⟨S⋆ , X⋆⟩ = 0    (885)

where S⋆ , y⋆ denote a dual optimal solution.^{6.9} We summarize this:

^{6.9} Optimality condition ⟨S⋆ , X⋆⟩ = 0 is called a complementary slackness condition, in keeping with the tradition of linear programming, [57] that forbids dual inequalities in (862) to simultaneously hold strictly. [201, §4]


6.2.3.0.1 Corollary. Optimality and strong duality. [240, §3.1] [267, §1.3.8]
For semidefinite programs (862P) and (862D), assume primal and dual feasible sets A ∩ S^n_+ ⊂ S^n and C∗ ⊂ S^n × R^m (874) are nonempty. Then

    X⋆ is optimal for (P)
    S⋆ , y⋆ are optimal for (D)
    the duality gap ⟨C , X⋆⟩ − ⟨b , y⋆⟩ is 0

if and only if

    i) ∃ X ∈ A ∩ int S^n_+ or ∃ S , y ∈ rel int C∗
and
    ii) ⟨S⋆ , X⋆⟩ = 0    ⋄

For symmetric positive semidefinite matrices, requirement ii is equivalent to the complementarity (§A.7.3)

    ⟨S⋆ , X⋆⟩ = 0 ⇔ S⋆X⋆ = X⋆S⋆ = 0    (886)

Commutativity of diagonalizable matrices is a necessary and sufficient condition [131, §1.3.12] for these two optimal symmetric matrices to be simultaneously diagonalizable. Therefore

    rank X⋆ + rank S⋆ ≤ n    (887)

Proof. To see that, note the product of symmetric optimal matrices X⋆ , S⋆ ∈ S^n must itself be symmetric because of commutativity. (1113) The symmetric product has diagonalization [9, cor.2.11]

    S⋆X⋆ = X⋆S⋆ = Q Λ_{S⋆} Λ_{X⋆} Q^T = 0 ⇔ Λ_{X⋆} Λ_{S⋆} = 0    (888)

where Q is an orthogonal matrix. The product of the nonnegative diagonal Λ matrices can be 0 if their main-diagonal zeros are complementary or coincide. Due only to symmetry, rank X⋆ = rank Λ_{X⋆} and rank S⋆ = rank Λ_{S⋆} for these optimal primal and dual solutions. (1101) So, because of the complementarity, the total number of nonzero diagonal entries from both Λ cannot exceed n.


When equality is attained in (887)

    rank X⋆ + rank S⋆ = n    (889)

there are no coinciding main-diagonal zeros in Λ_{X⋆} Λ_{S⋆} , and so we have what is called strict complementarity.^{6.10} Logically it follows that a necessary and sufficient condition for strict complementarity of an optimal primal and dual solution is

    X⋆ + S⋆ ≻ 0    (890)

The beauty of Corollary 6.2.3.0.1 is its conjugacy; id est, one can solve either the primal or dual problem and then find a solution to the other via the optimality conditions. When a dual optimal solution is known, for example, a primal optimal solution belongs to the hyperplane {X | ⟨S⋆ , X⟩ = 0}.

6.2.3.0.2 Example. Minimum cardinality Boolean. [56] [26, §4.3.4] [231]
Consider finding a minimum cardinality Boolean solution x to the classic linear algebra problem Ax = b given noiseless data A ∈ R^{m×n} and b ∈ R^m ;

    minimize_x   ‖x‖₀
    subject to   Ax = b
                 x_i ∈ {0, 1} ,  i = 1...n    (891)

where ‖x‖₀ denotes cardinality of vector x (a.k.a, 0-norm; not a convex function).

A minimum cardinality solution answers the question: "Which fewest linear combination of columns in A constructs vector b?" Cardinality problems have extraordinarily wide appeal, arising in many fields of science and across many disciplines. [211] [139] [103] [104] Yet designing an efficient algorithm to optimize cardinality has proved difficult. In this example, we also constrain the variable to be Boolean. The Boolean constraint forces an identical solution were the norm in problem (891) instead the 1-norm or 2-norm; id est, the two problems

    (891)  minimize_x   ‖x‖₀                    minimize_x   ‖x‖₁
           subject to   Ax = b            =     subject to   Ax = b              (892)
                        x_i ∈ {0,1}, i=1...n                 x_i ∈ {0,1}, i=1...n

are the same. The Boolean constraint makes the 1-norm problem nonconvex. Given data^{6.11}

    A = [ −1  1  8   1    1      0
          −3  2  8  1/2  1/3  1/2−1/3
          −9  4  8  1/4  1/9  1/4−1/9 ] ,   b = [ 1 ; 1/2 ; 1/4 ]    (893)

the obvious and desired solution to the problem posed,

    x⋆ = e₄ ∈ R⁶    (894)

has norm ‖x⋆‖₂ = 1 and minimum cardinality; the minimum number of nonzero entries in vector x. Though solution (894) is obvious, the simplest numerical method of solution is, more generally, combinatorial. The Matlab backslash command x=A\b , for example, finds

    x_M = [ 2/128 ; 0 ; 5/128 ; 0 ; 90/128 ; 0 ]    (895)

having norm ‖x_M‖₂ = 0.7044. Coincidentally, x_M is a 1-norm solution; id est, an optimal solution to

    minimize_x   ‖x‖₁
    subject to   Ax = b    (896)

^{6.10} distinct from maximal complementarity (§6.1.1).

^{6.11} This particular matrix A is full-rank having three-dimensional nullspace (but the columns are not conically independent).
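The traditional solutions are immediate to reproduce in Matlab from data (893):

    A = [ -1 1 8  1   1   0       ;
          -3 2 8 1/2 1/3 1/2-1/3  ;
          -9 4 8 1/4 1/9 1/4-1/9 ];
    b = [ 1; 1/2; 1/4 ];
    x_M = A\b;             % basic solution (895), norm about 0.7044
    x_P = pinv(A)*b;       % minimum 2-norm solution (897), norm about 0.5165
    [norm(x_M) norm(x_P)]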


The pseudoinverse solution (rounded)

    x_P = A†b = [ −0.0456 ; −0.1881 ; 0.0623 ; 0.2668 ; 0.3770 ; −0.1102 ]    (897)

has minimum norm ‖x_P‖₂ = 0.5165; id est, the optimal solution to

    minimize_x   ‖x‖₂
    subject to   Ax = b    (898)

Certainly, none of the traditional methods provide x⋆ = e₄ (894).

We can reformulate this minimum cardinality Boolean problem (891) as a semidefinite program: First transform the variable

    x ≜ (x̂ + 1)½    (899)

so x̂_i ∈ {−1, 1}; equivalently,

    minimize_x̂   ‖(x̂ + 1)½‖₀
    subject to   A(x̂ + 1)½ = b
                 δ(x̂)² = I    (900)

where δ is the main-diagonal linear operator (§A.1). By defining a matrix

    X ≜ [ x̂ ; 1 ][ x̂^T  1 ] = [ x̂x̂^T  x̂ ; x̂^T  1 ] ∈ S^{n+1}    (901)


problem (900) becomes equivalent to:

    minimize_{X ∈ S^{n+1}}   ⟨ X , [ 0  1 ; 1^T  0 ] ⟩
    subject to   [ A  0 ; 0^T  1 ] X [ A^T  0 ; 0^T  1 ] = [ (2b−A1)(2b−A1)^T   2b−A1 ; (2b−A1)^T   1 ]
                 δ(X) = 1
                 X ≽ 0
                 rank X = 1    (902)

where solution is confined to rank-1 vertices of the elliptope in S^{n+1} (§4.9.1.0.1) by the rank constraint, the positive semidefiniteness, and the equality constraints δ(X) = 1. The rank constraint makes this problem nonconvex. By removing it^{6.12} we get the semidefinite program

    minimize_{X ∈ S^{n+1}}   tr( X [ 0  1 ; 1^T  0 ] )
    subject to   [ A  0 ; 0^T  1 ] X [ A^T  0 ; 0^T  1 ] = [ (2b−A1)(2b−A1)^T   2b−A1 ; (2b−A1)^T   1 ]
                 δ(X) = 1
                 X ≽ 0    (903)

whose optimal solution X⋆ is identical to that of minimum cardinality Boolean problem (891) if and only if rank X⋆ = 1. The hope of acquiring a rank-1 solution is not ill-founded because 2^n elliptope vertices have rank 1, and we are minimizing an affine function on a subset of the elliptope (Figure 70) containing rank-1 vertices; id est, by assumption that the feasible set of the minimum cardinality Boolean problem (891) is nonempty, a desired solution resides on the elliptope relative boundary at a rank-1 vertex. Confinement to the elliptope can be regarded as a kind of normalization akin to column-normalization of A as suggested in [70] and explored in Example 6.2.3.0.3.

^{6.12} Relaxed problem (903) can also be derived via Lagrange duality; [199] [41, §5] it is a dual of a dual program [sic] to (902). The relaxed problem must therefore be convex.


For the data given in (893), our semidefinite program solver (accurate to approximately 1E-8)^{6.13} finds optimal solution to (903)

    round(X⋆) = [  1  1  1 −1  1  1 −1
                   1  1  1 −1  1  1 −1
                   1  1  1 −1  1  1 −1
                  −1 −1 −1  1 −1 −1  1
                   1  1  1 −1  1  1 −1
                   1  1  1 −1  1  1 −1
                  −1 −1 −1  1 −1 −1  1 ]    (904)

near a rank-1 vertex of the elliptope in S^{n+1}; its sorted eigenvalues,

    λ(X⋆) = [  6.99999977799099
               0.00000022687241
               0.00000002250296
               0.00000000262974
              −0.00000000999738
              −0.00000000999875
              −0.00000001000000 ]    (905)

The negative eigenvalues are undoubtedly finite-precision effects. Because the largest eigenvalue predominates by many orders of magnitude, we can expect to find a good approximation to a minimum cardinality Boolean solution by truncating all smaller eigenvalues. By so doing we find, indeed,

    x⋆ = round( [ 0.00000000127947
                  0.00000000527369
                  0.00000000181001
                  0.99999997469044
                  0.00000001408950
                  0.00000000482903 ] ) = e₄    (906)

the desired result (894).

^{6.13} A typically ignored limitation of interior-point methods of solution is their relative accuracy of only about 1E-8 on a machine using 64-bit (double precision) arithmetic; id est, optimal solution cannot be more accurate than square root of machine epsilon (2.2204E-16).
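Semidefinite program (903) and the subsequent eigenvalue truncation can be written compactly; a sketch assuming the CVX modelling package, with data A, b from (893) in the workspace:

    [m, n] = size(A);
    c = 2*b - A*ones(n,1);
    T = blkdiag(A, 1);                   % [A 0; 0' 1]
    E = [c*c', c; c', 1];                % right-hand side in (903)
    cvx_begin sdp quiet
        variable X(n+1,n+1) symmetric
        minimize( trace([zeros(n) ones(n,1); ones(1,n) 0]*X) )
        T*X*T' == E
        diag(X) == 1                     % delta(X) = 1 : the elliptope
        X >= 0
    cvx_end
    [Q, L] = eig(X);                     % keep only the largest eigenvalue
    [lmax, i] = max(diag(L));
    v = Q(:,i)*sqrt(lmax);
    v = v/v(end);                        % scale so homogenizing entry is 1
    x = round((v(1:n) + 1)/2)            % Boolean recovery via (899); expect e_4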


6.2.3.0.3 Example. Optimization on elliptope versus 1-norm polyhedron.
A minimum cardinality problem is typically formulated via, what is by now, the standard practice [70] of column normalization applied to a 1-norm problem surrogate like (896). Suppose we define a diagonal matrix

    Λ ≜ [ ‖A(:,1)‖₂                     0
                     ‖A(:,2)‖₂
                               ⋱
          0                     ‖A(:,6)‖₂ ] ∈ S⁶    (907)

used to normalize the columns (assumed nonzero) of given noiseless data matrix A. Then approximate the minimum cardinality Boolean problem

    minimize_x   ‖x‖₀
    subject to   Ax = b
                 x_i ∈ {0, 1} ,  i = 1...n    (891)

as

    minimize_ỹ   ‖ỹ‖₁
    subject to   AΛ⁻¹ỹ = b
                 1 ≽ Λ⁻¹ỹ ≽ 0    (908)

where optimal solution

    y⋆ = round(Λ⁻¹ỹ⋆)    (909)

The inequality in (908) relaxes the Boolean constraint y_i ∈ {0, 1} as in (891); serving to bound any solution y⋆ to a unit cube whose vertices are binary numbers. Convex problem (908) is one of many conceivable approximations, but Donoho concurs with this particular formulation equivalently expressible as a linear program via (1308).

Approximation (908) is therefore equivalent to minimization of an affine function on a bounded polyhedron, whereas semidefinite program

    minimize_{X ∈ S^{n+1}}   tr( X [ 0  1 ; 1^T  0 ] )
    subject to   [ A  0 ; 0^T  1 ] X [ A^T  0 ; 0^T  1 ] = [ (2b−A1)(2b−A1)^T   2b−A1 ; (2b−A1)^T   1 ]
                 δ(X) = 1
                 X ≽ 0    (903)


minimizes an affine function on the elliptope intersected by hyperplanes. Although the same Boolean solution is obtained from this approximation (908) as compared with semidefinite program (903) when given that particular data from Example 6.2.3.0.2, Singer confides a counter-example: Instead, given data

    A = [ 1  0  1/√2 ; 0  1  1/√2 ] ,   b = [ 1 ; 1 ]    (910)

then solving approximation (908) yields

    y⋆ = round( [ 1 − 1/√2 ; 1 − 1/√2 ; 1 ] ) = [ 0 ; 0 ; 1 ]    (911)

(infeasible, with or without rounding, with respect to original problem (891)) whereas solving semidefinite program (903) produces

    round(X⋆) = [  1  1 −1  1
                   1  1 −1  1
                  −1 −1  1 −1
                   1  1 −1  1 ]    (912)

with sorted eigenvalues

    λ(X⋆) = [  3.99999965057264
               0.00000035942736
              −0.00000000000000
              −0.00000001000000 ]    (913)

Truncating all but the largest eigenvalue, versus y⋆ we obtain

    x⋆ = round( [ 0.99999999625299 ; 0.99999999625299 ; 0.00000001434518 ] ) = [ 1 ; 1 ; 0 ]    (914)

the desired minimum cardinality Boolean result.

We leave assessment of general performance of (908) as compared with semidefinite program (903) an open question.


6.3 Rank reduction

    ...it is not clear generally how to predict rank X⋆ or rank S⋆ before solving the SDP problem.
        −Farid Alizadeh (1995) [9, p.22]

The premise of rank reduction in semidefinite programming is: an optimal solution found does not satisfy Barvinok's upper bound (226) on rank. The particular numerical algorithm solving a semidefinite program may have instead returned a high-rank optimal solution (§6.1.1; e.g., (873)) when a lower-rank optimal solution was expected.

6.3.1 Posit a perturbation of X⋆

Recall from §6.1.1.1, there is an extreme point of A ∩ S^n_+ (865) satisfying upper bound (226) on rank. [20, §2.2] It is therefore sufficient to locate an extreme point of the intersection whose primal objective value (862P) is optimal:^{6.14} [66, §31.5.3] [156, §2.4] [5, §3] [190]

Consider again the affine subset

    A = {X ∈ S^n | A svec X = b}    (865)

where for A_i ∈ S^n

    A ≜ [ svec(A₁)^T ; ... ; svec(A_m)^T ] ∈ R^{m × n(n+1)/2}    (863)

Given any optimal solution X⋆ to

    (P)  minimize_{X ∈ S^n}   ⟨C , X⟩
         subject to           X ∈ A ∩ S^n_+    (862)

whose rank does not satisfy upper bound (226), we posit existence of a set of perturbations

    {t_j B_j | t_j ∈ R , B_j ∈ S^n , j = 1...n}    (915)

such that, for some 0 ≤ i ≤ n and scalars {t_j , j = 1...i} ,

    X⋆ + Σ_{j=1}^i t_j B_j    (916)

becomes an extreme point of A ∩ S^n_+ and remains an optimal solution of (862P). Membership of (916) to affine subset A is secured for the i-th perturbation by demanding

    ⟨B_i , A_j⟩ = 0 ,  j = 1...m    (917)

while membership to the positive semidefinite cone S^n_+ is insured by small perturbation (926). In this manner feasibility is insured. Optimality is proved in §6.3.3.

The following simple algorithm has very low computational intensity and locates an optimal extreme point, assuming a nontrivial solution:

6.3.1.0.1 Procedure. Rank reduction.

    initialize: B_i = 0 ∀ i
    for iteration i = 1...n
    {
        1. compute a nonzero perturbation matrix B_i of X⋆ + Σ_{j=1}^{i−1} t_j B_j
        2. maximize t_i
           subject to X⋆ + Σ_{j=1}^{i} t_j B_j ∈ S^n_+
    }

^{6.14} There is no known construction for Barvinok's tighter result (231). −Monique Laurent


A rank-reduced optimal solution is then

    X⋆ ← X⋆ + Σ_{j=1}^i t_j B_j    (918)

6.3.2 Perturbation form

The perturbations are independent of constants C ∈ S^n and b ∈ R^m in primal and dual programs (862). Numerical accuracy of any rank-reduced result, found by perturbation of an initial optimal solution X⋆, is therefore quite dependent upon initial accuracy of X⋆.

6.3.2.0.1 Definition. Matrix step function. (confer §A.6.4.0.1)
Define the signum-like real function ψ : S^n → R

    ψ(Z) ≜ {  1 ,  Z ≽ 0
             −1 ,  otherwise    (919)

The value −1 is taken for indefinite or nonzero negative semidefinite argument. △

Deza & Laurent [66, §31.5.3] prove: every perturbation matrix B_i , i = 1...n , is of the form

    B_i = −ψ(Z_i) R_i Z_i R_i^T ∈ S^n    (920)

where

    X⋆ ≜ R₁R₁^T ,   X⋆ + Σ_{j=1}^{i−1} t_j B_j ≜ R_i R_i^T ∈ S^n    (921)

where the t_j are scalars and R_i ∈ R^{n×ρ} is full-rank and skinny where

    ρ ≜ rank( X⋆ + Σ_{j=1}^{i−1} t_j B_j )    (922)

and where matrix Z_i ∈ S^ρ is found at each iteration i by solving a very simple feasibility problem:^{6.15}

    find Z_i ∈ S^ρ
    subject to ⟨Z_i , R_i^T A_j R_i⟩ = 0 ,  j = 1...m    (923)

Were there a sparsity pattern common to each member of the set {R_i^T A_j R_i ∈ S^ρ , j = 1...m} , then a good choice for Z_i has 1 in each entry corresponding to a 0 in the pattern; id est, a sparsity pattern complement. At iteration i

    X⋆ + Σ_{j=1}^{i−1} t_j B_j + t_i B_i = R_i (I − t_i ψ(Z_i) Z_i) R_i^T    (924)

By fact (1094), therefore

    X⋆ + Σ_{j=1}^{i−1} t_j B_j + t_i B_i ≽ 0 ⇔ 1 − t_i ψ(Z_i) λ(Z_i) ≽ 0    (925)

where λ(Z_i) ∈ R^ρ denotes the eigenvalues of Z_i .

Maximization of each t_i in step 2 of the Procedure reduces rank of (924) and locates a new point on the boundary ∂(A ∩ S^n_+).^{6.16} Maximization of t_i thereby has closed form;

^{6.15} A simple method of solution is closed-form projection of a random nonzero point on that proper subspace of isometrically isomorphic R^{ρ(ρ+1)/2} specified by the constraints. (§E.5.0.0.6) Such a solution is nontrivial assuming the specified intersection of hyperplanes is not the origin; guaranteed by ρ(ρ+1)/2 > m. Indeed, this geometric intuition about forming the perturbation is what bounds any solution's rank from below; m is fixed by the number of equality constraints in (862P) while rank ρ decreases with each iteration i. Otherwise, we might iterate indefinitely.

^{6.16} This holds because rank of a positive semidefinite matrix in S^n is diminished below n by the number of its 0 eigenvalues (1101), and because a positive semidefinite matrix having one or more 0 eigenvalues corresponds to a point on the PSD cone boundary (157). Necessity and sufficiency are due to the facts: R_i can be completed to a nonsingular matrix (§A.3.1.0.5), and I − t_i ψ(Z_i) Z_i can be padded with zeros while maintaining equivalence in (924).


    (t⋆_i)⁻¹ = max{ψ(Z_i) λ(Z_i)_j , j = 1...ρ}    (926)

When Z_i is indefinite, the direction of perturbation (determined by ψ(Z_i)) is arbitrary. We may take an early exit from the Procedure were Z_i to become 0 or were

    rank[ svec R_i^T A₁ R_i   svec R_i^T A₂ R_i   ···   svec R_i^T A_m R_i ] = ρ(ρ+1)/2    (927)

which characterizes the rank ρ of any [sic] extreme point in A ∩ S^n_+ . [156, §2.4]

Proof. Assuming the form of every perturbation matrix is indeed (920), then by (923)

    svec Z_i ⊥ [ svec(R_i^T A₁ R_i)   svec(R_i^T A₂ R_i)   ···   svec(R_i^T A_m R_i) ]    (928)

By orthogonal complement we have

    rank[ svec(R_i^T A₁ R_i) ··· svec(R_i^T A_m R_i) ]^⊥ + rank[ svec(R_i^T A₁ R_i) ··· svec(R_i^T A_m R_i) ] = ρ(ρ+1)/2    (929)

When Z_i can only be 0, then the perturbation is null because an extreme point has been found; thus

    [ svec(R_i^T A₁ R_i) ··· svec(R_i^T A_m R_i) ]^⊥ = 0    (930)

from which the stated result (927) directly follows.
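Procedure 6.3.1.0.1 thus reduces to repeated eigen-factorization, a null-space computation for (923), and the closed-form step (926). One possible Matlab realization (a sketch, not the text's implementation; it reuses the svec and smat helpers sketched in §6.1 and draws a feasible Z_i from a null-space basis rather than a sparsity-pattern complement):

    function X = rankreduce(X, Acell, tol)
    % Acell = {A_1,...,A_m};  X an optimal solution of (862P)
    if nargin < 3, tol = 1e-9; end
    m = numel(Acell);  n = size(X,1);
    for i = 1:n
        [Q, L] = eig((X + X')/2);  lam = diag(L);
        keep = lam > tol*max(abs(lam));
        R = Q(:,keep)*diag(sqrt(lam(keep)));        % X = R*R' as in (921)
        rho = size(R,2);
        M = zeros(m, rho*(rho+1)/2);                % constraints of (923)
        for j = 1:m
            M(j,:) = svec(R'*Acell{j}*R)';
        end
        NS = null(M);
        if isempty(NS), break; end                  % extreme point found (927)
        Z = smat(NS(:,1));                          % feasible Z_i for (923)
        lamZ = eig(Z);
        psiZ = 2*(min(lamZ) >= -tol) - 1;           % step function psi (919)
        t = 1/max(psiZ*lamZ);                       % closed form (926)
        X = R*(eye(rho) - t*psiZ*Z)*R';             % update (924)
    end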


6.3.3 Optimality of perturbed X⋆

We show that the optimal objective value is unaltered by perturbation (920); id est,

    ⟨C , X⋆ + Σ_{j=1}^i t_j B_j⟩ = ⟨C , X⋆⟩    (931)

Proof. From Corollary 6.2.3.0.1 we have the necessary and sufficient relationship between optimal primal and dual solutions under the assumption of existence of a relatively interior feasible point:

    S⋆X⋆ = S⋆R₁R₁^T = X⋆S⋆ = R₁R₁^T S⋆ = 0    (932)

This means R(R₁) ⊆ N(S⋆) and R(S⋆) ⊆ N(R₁^T). From (921) and (924) we get the sequence:

    X⋆ = R₁R₁^T
    X⋆ + t₁B₁ = R₂R₂^T = R₁(I − t₁ψ(Z₁)Z₁)R₁^T
    X⋆ + t₁B₁ + t₂B₂ = R₃R₃^T = R₂(I − t₂ψ(Z₂)Z₂)R₂^T = R₁(I − t₁ψ(Z₁)Z₁)(I − t₂ψ(Z₂)Z₂)R₁^T
    ⋮
    X⋆ + Σ_{j=1}^i t_j B_j = R₁( Π_{j=1}^i (I − t_j ψ(Z_j) Z_j) ) R₁^T    (933)

Substituting C = svec⁻¹(A^T y⋆) + S⋆ from (862),

    ⟨C , X⋆ + Σ_{j=1}^i t_j B_j⟩ = ⟨ svec⁻¹(A^T y⋆) + S⋆ , R₁( Π_{j=1}^i (I − t_j ψ(Z_j) Z_j) ) R₁^T ⟩
        = ⟨ Σ_{k=1}^m y⋆_k A_k , X⋆ + Σ_{j=1}^i t_j B_j ⟩
        = ⟨ Σ_{k=1}^m y⋆_k A_k + S⋆ , X⋆ ⟩ = ⟨C , X⋆⟩    (934)

because ⟨B_i , A_j⟩ = 0 ∀ i, j by design (917).


6.3.3.0.1 Example. Aδ(X) = b.
This academic example demonstrates that a solution found by rank reduction can certainly have rank less than Barvinok's upper bound (226): Assume a given vector b ∈ R^m belongs to the conic hull of the columns of a given matrix A ∈ R^{m×n};

    A = [ −1  1  8   1    1
          −3  2  8  1/2  1/3
          −9  4  8  1/4  1/9 ] ,   b = [ 1 ; 1/2 ; 1/4 ]    (935)

Consider the convex optimization problem

    minimize_{X ∈ S⁵}   tr X
    subject to          X ≽ 0
                        Aδ(X) = b    (936)

that minimizes the 1-norm of the main diagonal; id est, problem (936) is the same as

    minimize_{X ∈ S⁵}   ‖δ(X)‖₁
    subject to          X ≽ 0
                        Aδ(X) = b    (937)

that finds a solution to Aδ(X) = b. Rank-3 solution X⋆ = δ(x_M) is optimal, where

    x_M = [ 2/128 ; 0 ; 5/128 ; 0 ; 90/128 ]    (938)

Yet upper bound (226) predicts existence of at most a

    rank-⌊(√(8m + 1) − 1)/2⌋ = 2    (939)

feasible solution from m = 3 equality constraints. To find a lower rank ρ optimal solution to (936) (barring combinatorics), we invoke Procedure 6.3.1.0.1:


Initialize: C = I , ρ = 3 , A_j ≜ δ(A(j, :)) , j = 1, 2, 3 , X⋆ = δ(x_M) , m = 3 , n = 5.

{
Iteration i = 1:

Step 1:

    R₁ = [ √(2/128)      0          0
           0             0          0
           0             √(5/128)   0
           0             0          0
           0             0          √(90/128) ]

    find Z₁ ∈ S³
    subject to ⟨Z₁ , R₁^T A_j R₁⟩ = 0 ,  j = 1, 2, 3    (940)

A nonzero randomly selected matrix Z₁ having 0 main diagonal is feasible and yields a nonzero perturbation matrix. Choose, arbitrarily,

    Z₁ = 11^T − I ∈ S³    (941)

then (rounding)

    B₁ = [ 0       0  0.0247  0  0.1048
           0       0  0       0  0
           0.0247  0  0       0  0.1657
           0       0  0       0  0
           0.1048  0  0.1657  0  0      ]    (942)

Step 2: t⋆₁ = 1 because λ(Z₁) = [−1 −1 2]^T. So,

    X⋆ ← δ(x_M) + B₁ = [ 2/128   0  0.0247  0  0.1048
                         0       0  0       0  0
                         0.0247  0  5/128   0  0.1657
                         0       0  0       0  0
                         0.1048  0  0.1657  0  90/128 ]    (943)

has rank ρ ← 1 and produces the same optimal objective value.
}
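This iteration is reproducible with the rankreduce sketch given earlier. Because Z₁ there is drawn from a null-space basis rather than chosen as (941), the particular B₁ may differ; but rank reduction, feasibility, and the optimal objective value (931) are preserved:

    A = [ -1 1 8  1   1  ;
          -3 2 8 1/2 1/3 ;
          -9 4 8 1/4 1/9 ];
    b  = [1; 1/2; 1/4];
    xM = [2; 0; 5; 0; 90]/128;
    X  = diag(xM);                        % rank-3 optimal solution (938)
    Acell = {diag(A(1,:)), diag(A(2,:)), diag(A(3,:))};
    X = rankreduce(X, Acell);
    rank(X, 1e-8)                         % rank 1 (or 2), versus bound (939)
    norm(A*diag(X) - b)                   % feasibility maintained
    trace(X)                              % objective value unchanged (931)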


Another demonstration of rank reduction Procedure 6.3.1.0.1 would apply to the maximal complementarity example (§6.1.1.3.1); a rank-1 solution can certainly be found (by Barvinok's Proposition 2.9.3.0.1) because there is only one equality constraint.

6.3.4 Final thoughts regarding rank reduction

Because the rank reduction procedure is guaranteed only to produce another optimal solution conforming to Barvinok's upper bound (226), the Procedure will not necessarily produce solutions of arbitrarily low rank; but if they exist, the Procedure can. Arbitrariness of search direction when matrix Z_i becomes indefinite, mentioned earlier, and the enormity of choices for Z_i (923) are liabilities for this algorithm.

6.3.4.1 Inequality constraints

The question naturally arises: what to do when a semidefinite program (not in prototypical form)^{6.17} has inequality constraints of the form

    α_i^T svec X ≼ β_i ,  i = 1...k    (944)

where the β_i are scalars. One expedient way to handle this circumstance is to convert the inequality constraints to equality constraints by introducing a slack variable; id est,

    α_i^T svec X + γ_i = β_i ,  i = 1...k ,  γ ≽ 0    (945)

thereby converting the problem to prototypical form.

^{6.17} Contemporary numerical packages for solving semidefinite programs can solve a wider range of problem than our conic prototype (862). Generally, they do so by transforming a given problem into some prototypical form by introducing new constraints and variables. [9] [262] We are momentarily considering a departure from the primal prototype that augments the constraint set with affine inequalities.


Alternatively, we say an inequality constraint is active when it is met with equality; id est, when for particular i in (944), α_i^T svec X⋆ = β_i . An optimal high-rank solution X⋆ is, of course, feasible satisfying all the constraints. But for the purpose of rank reduction, the inactive inequality constraints are ignored while the active inequality constraints are interpreted as equality constraints. In other words, we take the union of the active inequality constraints (as equalities) with the equality constraints A svec X = b to form a composite affine subset Â for the primal problem (862P). Then we proceed with rank reduction of X⋆ as though the semidefinite program were in prototypical form (862P).

6.4 Rank-constrained semidefinite program

6.4.1 Sensor-Network Localization and Wireless Location

A solution to a sensor-network localization problem in two dimensions presented by Carter & Jin in [46] is, in their belief, state-of-the-art. There, a large network is partitioned into smaller subnetworks (as small as one sensor) and then semidefinite programming (SDP) and heuristics are applied to localize each and every partition by distance geometry. Their partitioning procedure is one-pass, yet termed iterative; a term applicable in so far as adjoining partitions can share anchors (absolute sensor positions known a priori) and localized sensors. As partitions are selected based not on geographic criteria, they also term the partitioning adaptive.

One can reasonably argue that semidefinite programming methods are unnecessary for localization of large sensor networks. In the past, these nonlinear localization problems were solved algebraically and solutions computed by least squares; called multilateration.^{6.18} Indeed, practical contemporary numerical methods for global positioning (GPS) do not rely on semidefinite programming.

^{6.18} Multilateration: literally, having many sides; the shape of a geometric figure formed by nearly-intersecting lines-of-position. Therefore, in navigation systems, the obtaining of a fix from multiple lines of position.


The beauty of semidefinite programming as relates to localization lies in convex expression of classical multilateration: So & Ye showed that the problem of finding unique solution to a noiseless nonlinear system describing the common point of intersection of hyperspheres in real Euclidean vector space can be expressed as a semidefinite program via distance geometry. [212]

The need for SDP methods is also a question logically consequent to Carter & Jin's reliance on complicated and extensive heuristics for partitioning a large network and for solving a partition whose intersensor measurement data is inadequate for localization by distance geometry. Partitions range in size between 2 and 10 sensors; 5 sensors optimally. For these small numbers it is unclear what advantage SDP offers over traditional least squares.

Heuristics often fail on real-world data because of unanticipated circumstances. When heuristics fail, generally they are repaired by adding more heuristics. Tenuous is any presumption that distance measurement errors have distribution characterized by circular contours of equal probability about an unknown sensor-location. That presumption effectively appears within Carter & Jin's optimization problem statement as affine equality constraints relating unknowns to distance measurements corrupted by noise. Yet in most all urban environments, this measurement noise is more aptly characterized by ellipses of varying orientation and eccentricity as one recedes from a sensor. Each unknown must therefore instead be bound to its own particular range of distance, primarily determined by the terrain; a distinct contour map is required for each anchor. [134]

Partitioning of large sensor networks is the logical alternative to exponential growth of SDP computational complexity with problem size. But when impact of noise on distance measurement is of most concern, one is averse to a partitioning scheme because noise-effects vary inversely with problem size. [36, §2.2] (§4.13.2) Since an individual partition's solution is not iterated in Carter & Jin and is interdependent with adjoining partitions, we expect errors to propagate from one partition to the next; the ultimate partition solved, expected to suffer most.


Generally speaking, there can be no unique solution to the sensor-network localization problem because there is no unique formulation; that is the art of optimization. Any optimal solution obtained depends on whether or how the network is partitioned and how the problem is formulated. When a particular formulation is a convex optimization problem, then the set of all optimal solutions forms a convex set containing the actual or true localization. Even were that solution set a single point, it is not necessarily the true localization because there is little hope of exact localization by any algorithm once significant noise is introduced.

Carter & Jin compare performance of their heuristics to the SDP formulation of author Biswas whom they feel a vanguard of the art. [32] [31] Their heuristics far outperform Biswas' localizations both in execution-time and proximity to the desired result. Perhaps the lesson is instead: Biswas' approach to localization is not the best. [12, §1] This can be intuitively understood: Biswas poses localization as an optimization minimizing a distance measure. Experience tells us that minimization of any distance measure yields compacted solutions; [251] (§5.4) precisely the anomaly observed by Carter & Jin.

The sensor-network localization problem is considered difficult. [12, §2] Rank constraints in optimization are also considered difficult. In what follows, we present a proper formulation of the localization problem as a semidefinite program having an explicit rank constraint. We show how to achieve the rank constraint if and only if the feasible solution set contains a matrix of the desired rank.

6.4.1.1 proposed standardized test

Jin proposes an academic test in real Euclidean two-dimensional space R² that we adopt. In the interest of evaluating performance of all localization algorithms against some standard, we recommend standardization and adoption of this test by other researchers.

In essence, Jin's test is a localization of sensors and anchors arranged in a regular triangular lattice. Lattice connectivity is solely determined by sensor radio range (assumed common to all sensors) and true positions; a connectivity graph is assumed incomplete. In the interest of standardization, we propose adoption of small examples: Figure 93 through Figure 96 and their particular connectivity, matrices (946) through (949) respectively.


Figure 93: Jin's 2-lattice in R², hand-drawn. Nodes 3 and 4 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc. Range of each other sensor is likewise.

    0 • ? •
    • 0 • •
    ? • 0 ◦
    • • ◦ 0    (946)

Matrix entries dot • indicate measurable distance between nodes while unknown distances are denoted by question mark. Matrix entries ◦ denote known distance between anchors (to very high accuracy) while zero distance is denoted 0. Because measured distances are quite unreliable in practice, our solution to the localization problem substitutes a distinct range of possible distance for each measurable distance; equality constraints exist only for anchors.

Anchors are chosen so as to increase difficulty for algorithms dependent on existence of sensors in their convex hull. The challenge is to find a solution in two dimensions close to the true sensor positions given incomplete noisy intersensor distance information.


Figure 94: Jin's 3-lattice in R², hand-drawn. Nodes 7, 8, and 9 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc; range of each other sensor is likewise.

    0 • • ? • ? ? • •
    • 0 • • ? • ? • •
    • • 0 • • • • • •
    ? • • 0 ? • • • •
    • ? • ? 0 • • • •
    ? • • • • 0 • • •
    ? ? • • • • 0 ◦ ◦
    • • • • • • ◦ 0 ◦
    • • • • • • ◦ ◦ 0    (947)


Figure 95: Jin's 4-lattice in R², hand-drawn. Nodes 13, 14, 15, and 16 are anchors; remaining nodes are sensors.

    0 ? ? • ? ? • ? ? ? ? ? ? ? • •
    ? 0 • • • • ? • ? ? ? ? ? • • •
    ? • 0 ? • • ? ? • ? ? ? ? ? • •
    • • ? 0 • ? • • ? • ? ? • • • •
    ? • • • 0 • ? • • ? • • • • • •
    ? • • ? • 0 ? • • ? • • ? ? ? ?
    • ? ? • ? ? 0 ? ? • ? ? • • • •
    ? • ? • • • ? 0 • • • • • • • •
    ? ? • ? • • ? • 0 ? • • • ? • ?
    ? ? ? • ? ? • • ? 0 • ? • • • ?
    ? ? ? ? • • ? • • • 0 • • • • ?
    ? ? ? ? • • ? • • ? • 0 ? ? ? ?
    ? ? ? • • ? • • • • • ? 0 ◦ ◦ ◦
    ? • ? • • ? • • ? • • ? ◦ 0 ◦ ◦
    • • • • • ? • • • • • ? ◦ ◦ 0 ◦
    • • • • • ? • • ? ? ? ? ◦ ◦ ◦ 0    (948)


Figure 96: Jin's 5-lattice in R². Nodes 21 through 25 are anchors.

    0 • ? ? • • ? ? • ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
    • 0 ? ? • • ? ? ? • ? ? ? ? ? ? ? ? ? ? ? ? ? • •
    ? ? 0 • ? • • • ? ? • • ? ? ? ? ? ? ? ? ? ? • • •
    ? ? • 0 ? ? • • ? ? ? • ? ? ? ? ? ? ? ? ? ? ? • ?
    • • ? ? 0 • ? ? • • ? ? • • ? ? • ? ? ? ? ? • ? •
    • • • ? • 0 • ? • • • ? ? • ? ? ? ? ? ? ? ? • • •
    ? ? • • ? • 0 • ? ? • • ? ? • • ? ? ? ? ? ? • • •
    ? ? • • ? ? • 0 ? ? • • ? ? • • ? ? ? ? ? ? ? • ?
    • ? ? ? • • ? ? 0 • ? ? • • ? ? • • ? ? ? ? ? ? ?
    ? • ? ? • • ? ? • 0 • ? • • ? ? ? • ? ? • • • • •
    ? ? • ? ? • • • ? • 0 • ? • • • ? ? • ? ? • • • •
    ? ? • • ? ? • • ? ? • 0 ? ? • • ? ? • • ? • • • ?
    ? ? ? ? • ? ? ? • • ? ? 0 • ? ? • • ? ? • • ? ? ?
    ? ? ? ? • • ? ? • • • ? • 0 • ? • • • ? • • • • ?
    ? ? ? ? ? ? • • ? ? • • ? • 0 • ? ? • • • • • • ?
    ? ? ? ? ? ? • • ? ? • • ? ? • 0 ? ? • • ? • ? ? ?
    ? ? ? ? • ? ? ? • ? ? ? • • ? ? 0 • ? ? • ? ? ? ?
    ? ? ? ? ? ? ? ? • • ? ? • • ? ? • 0 • ? • • • ? ?
    ? ? ? ? ? ? ? ? ? ? • • ? • • • ? • 0 • • • • ? ?
    ? ? ? ? ? ? ? ? ? ? ? • ? ? • • ? ? • 0 • • ? ? ?
    ? ? ? ? ? ? ? ? ? • ? ? • • • ? • • • • 0 ◦ ◦ ◦ ◦
    ? ? ? ? ? ? ? ? ? • • • • • • • ? • • • ◦ 0 ◦ ◦ ◦
    ? ? • ? • • • ? ? • • • ? • • ? ? • • ? ◦ ◦ 0 ◦ ◦
    ? • • • ? • • • ? • • • ? • • ? ? ? ? ? ◦ ◦ ◦ 0 ◦
    ? • • ? • • • ? ? • • ? ? ? ? ? ? ? ? ? ◦ ◦ ◦ ◦ 0    (949)


6.4.1.2 problem statement

Ascribe points in a list {x_l ∈ R^n, l = 1 ... N} to the columns of a matrix X;

$$X = [\,x_1\ \cdots\ x_N\,] \in \mathbb{R}^{n\times N}\qquad(64)$$

where N is regarded as cardinality of list X. Positive semidefinite matrix X^T X, formed from inner product of the list, is a Gram matrix; [162, §3.6]

$$G \triangleq X^TX = \begin{bmatrix} \|x_1\|^2 & x_1^Tx_2 & x_1^Tx_3 & \cdots & x_1^Tx_N\\ x_2^Tx_1 & \|x_2\|^2 & x_2^Tx_3 & \cdots & x_2^Tx_N\\ x_3^Tx_1 & x_3^Tx_2 & \|x_3\|^2 & \ddots & x_3^Tx_N\\ \vdots & \vdots & \ddots & \ddots & \vdots\\ x_N^Tx_1 & x_N^Tx_2 & x_N^Tx_3 & \cdots & \|x_N\|^2 \end{bmatrix} \in \mathbb{S}^N_+\qquad(477)$$

where S^N_+ is the convex cone of N×N positive semidefinite matrices in the real symmetric matrix subspace S^N.

Existence of noise precludes measured distance from the input data. We instead assign measured distance to a known range specified by individual upper and lower bounds; $\overline{d}_{ij}$ is the upper bound on actual distance-square from the i-th to the j-th sensor, while $\underline{d}_{ij}$ is the lower bound. These bounds become the input data. Each measurement range is presumed different from the others.

Our mathematical treatment of anchors and sensors is not dichotomized. A sensor position that is known to high accuracy, x̌_i, is an anchor. The wireless location problem is stated identically; the difference being, fewer sensors.

Then the sensor-network localization problem can be expressed: Given the index set I of all existing distance measurements • and the number of anchors m, then for 0 < n < N


$$\begin{array}{cl}
\underset{G\in\mathbb{S}^N,\;X\in\mathbb{R}^{n\times N}}{\text{minimize}} & \text{tr}\,G\\
\text{subject to} & \underline{d}_{ij} \,\leq\, \langle G,\,(e_i-e_j)(e_i-e_j)^T\rangle \,\leq\, \overline{d}_{ij}\quad\forall\,(i,j)\in I\\
& \langle G,\,e_ie_i^T\rangle = \|\check{x}_i\|^2\,,\quad i = N-m+1\,\ldots\,N\\
& \langle G,\,(e_ie_j^T+e_je_i^T)/2\rangle = \check{x}_i^T\check{x}_j\,,\quad i<j\,,\ i,j\in\{N-m+1\,\ldots\,N\}\\
& X(:,\,N-m+1\!:\!N) = [\,\check{x}_{N-m+1}\ \cdots\ \check{x}_N\,]\\
& \begin{bmatrix} I & X\\ X^T & G\end{bmatrix} \succeq 0\\
& \text{rank}\,G = n
\end{array}\qquad(950)$$

where e_i is the i-th member of the standard basis for R^N. The objective function tr G is a heuristic whose sole purpose is to represent the convex envelope of rank G. (§7.2.2.1.1) Distance-square

$$d_{ij} \,=\, \|x_i-x_j\|_2^2 \,\triangleq\, \langle x_i-x_j\,,\;x_i-x_j\rangle\qquad(464)$$

is related to Gram matrix entries G ≜ [g_ij] by a vector inner-product

$$\begin{array}{rl} d_{ij} &=\; g_{ii}+g_{jj}-2g_{ij}\\ &=\; \langle G,\,(e_i-e_j)(e_i-e_j)^T\rangle \;\triangleq\; \text{tr}\big(G^T(e_i-e_j)(e_i-e_j)^T\big)\end{array}\qquad(479)$$

hence the scalar inequalities. By Schur complement (§A.4), any feasible G and X provide a comparison with respect to the positive semidefinite cone

$$G \succeq X^TX\qquad(512)$$

which is a convex relaxation of the desired equality constraint

$$\begin{bmatrix} I & X\\ X^T & G\end{bmatrix} = \begin{bmatrix} I\\ X^T\end{bmatrix}\begin{bmatrix} I & X\end{bmatrix}\qquad(513)$$

The rank constraint insures this equality holds, thus restricting solution to R^n.


6.4.1.2.1 convex equivalent problem statement

Problem statement (950) is nonconvex because of the rank constraint. We do not eliminate or ignore the rank constraint; rather, we find a convex way to enforce it: for 0 < n < N

$$\begin{array}{cl}
\underset{G\in\mathbb{S}^N,\;X\in\mathbb{R}^{n\times N}}{\text{minimize}} & \langle G,\,W\rangle\\
\text{subject to} & \underline{d}_{ij} \,\leq\, \langle G,\,(e_i-e_j)(e_i-e_j)^T\rangle \,\leq\, \overline{d}_{ij}\quad\forall\,(i,j)\in I\\
& \langle G,\,e_ie_i^T\rangle = \|\check{x}_i\|^2\,,\quad i = N-m+1\,\ldots\,N\\
& \langle G,\,(e_ie_j^T+e_je_i^T)/2\rangle = \check{x}_i^T\check{x}_j\,,\quad i<j\,,\ i,j\in\{N-m+1\,\ldots\,N\}\\
& X(:,\,N-m+1\!:\!N) = [\,\check{x}_{N-m+1}\ \cdots\ \check{x}_N\,]\\
& \begin{bmatrix} I & X\\ X^T & G\end{bmatrix} \succeq 0
\end{array}\qquad(951)$$

Each affine equality constraint in G ∈ S^N represents a hyperplane in isometrically isomorphic Euclidean vector space R^{N(N+1)/2}, while each affine inequality pair represents a convex Euclidean body known as a slab (an intersection of two parallel but opposing halfspaces). In this convex optimization problem (951), a semidefinite program, we substitute a vector inner-product objective function for trace from nonconvex problem (950);

$$\langle G,\,I\rangle = \text{tr}\,G \quad\leftarrow\quad \langle G,\,W\rangle\qquad(952)$$

a generalization of the known trace heuristic [78] for minimizing convex envelope of rank, where W ∈ S^N_+ is constant with respect to (951). Matrix W is normal to a hyperplane that is minimized over the convex feasible set specified by the constraints. Matrix W is chosen so that −W points in the direction of a feasible rank-n Gram matrix. Thus the only purpose of the vector inner-product objective is to locate a feasible rank-n Gram matrix on the boundary of the positive semidefinite cone S^N_+.

6.4.1.2.2 direction matrix W

Denote by G⋆ an optimal Gram matrix from semidefinite program (951). Then for G⋆ ∈ S^N whose eigenvalues λ(G⋆) ∈ R^N are arranged in nonincreasing order, and for 1 ≤ N−n ≤ N (Fan)


$$\sum_{i=n+1}^{N}\lambda(G^\star)_i \;=\; \begin{array}[t]{cl}\underset{W\in\mathbb{S}^N}{\text{minimize}} & \langle G^\star,\,W\rangle\\ \text{subject to} & 0 \preceq W \preceq I\\ & \text{tr}\,W = N-n\end{array}\qquad(1328a)$$

whose closed-form solution is known. This eigenvalue sum is zero when G⋆ has rank n.

Foreknowledge of optimal G⋆, to make possible this search for W, implies recursion; id est, semidefinite program (951) is solved for G⋆ initializing W = I. Once found, G⋆ becomes constant in semidefinite program (1328a) where a new normal direction W is found as its optimal solution. Then the cycle (951) (1328a) iterates until, presumably, convergence.

We make no assumption regarding uniqueness of direction matrix W. The feasible set of direction matrices in (1328a) is the convex hull of outer product of all rank-(N−n) orthonormal matrices; videlicet, (§2.3.2.1.1)

$$\text{conv}\{\,UU^T \mid U\in\mathbb{R}^{N\times N-n},\ U^TU = I\,\} \;=\; \{\,A\in\mathbb{S}^N \mid I \succeq A \succeq 0\,,\ \langle I,\,A\rangle = N-n\,\}\qquad(75)$$

Set {UU^T | U ∈ R^{N×N−n}, U^TU = I} comprises the extreme points of its convex hull. Each and every extreme point has only N−n nonzero eigenvalues λ and they all equal 1; id est, λ(UU^T)_{1:N−n} = λ(U^TU) = 1.

6.4.1.3 convergence

Denote by W⋆ a particular optimal direction matrix from semidefinite program (1328a) such that

$$\langle G^\star,\,W^\star\rangle = 0\qquad(953)$$

Then we define convergence of the iteration (951) (1328a) to correspond with this vanishing vector inner-product. We study convergence to ascertain conditions under which a direction matrix will reveal a feasible rank-n Gram matrix in semidefinite program (951).
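The known closed-form solution to (1328a) follows directly from the eigenvalue decomposition of G⋆. A minimal Matlab sketch, assuming Gstar holds an optimal Gram matrix from (951) and n the desired rank (variable names are ours):

    % Direction matrix W solving (1328a) in closed form: the projector onto
    % the span of eigenvectors belonging to the N-n smallest eigenvalues.
    [U,L]     = eig((Gstar+Gstar')/2);     % symmetrize against roundoff
    [lam,idx] = sort(diag(L),'descend');   % nonincreasing eigenvalue order
    U         = U(:,idx);
    N         = size(Gstar,1);
    W         = U(:,n+1:N)*U(:,n+1:N)';    % 0 <= W <= I ,  trace(W) = N-n
    gap       = sum(lam(n+1:N));           % equals <Gstar,W>; zero at convergence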


Because this iterative rank minimization technique is not a projection method, it will find a rank-n solution ((953) will be satisfied) if and only if one exists in the feasible set of program (951).

Proof. Pending.

Failure to converge by definition (953) means a rank-n feasible solution is nonexistent.

6.4.1.4 numerical solution

In all examples to follow, the number of anchors

$$m = \sqrt{N}\qquad(954)$$

equals the square root of cardinality N of list X. The index set I identifying all existing distance measurements • is ascertained from connectivity matrix (946), (947), (948), or (949). We solve iteration (951) (1328a) in dimension n = 2 for each respective example illustrated in Figure 93 through Figure 96.

In presence of negligible noise, actual position is reliably localized for every standardized example; noteworthy insofar as each example represents an incomplete graph. This means the set of all solutions having lowest rank is a single point.

To make the examples more interesting and consistent with previous work, we randomize each range of distance-square bounding ⟨G, (e_i−e_j)(e_i−e_j)^T⟩ in (951); id est, for each and every (i,j) ∈ I

$$\begin{array}{l}\overline{d}_{ij} = d_{ij}\big(1+\sqrt{3}\,\text{rand(1)}\,\eta\big)^2\\[2pt] \underline{d}_{ij} = d_{ij}\big(1-\sqrt{3}\,\text{rand(1)}\,\eta\big)^2\end{array}\qquad(955)$$

where η = 0.1 is a constant noise factor, rand(1) is the Matlab function providing one sample of uniformly distributed noise in the interval [0, 1], and d_ij is actual distance-square from the i-th to the j-th sensor. Because of the separate function calls rand(1), each range of distance-square [d̲_ij, d̄_ij] is not necessarily centered on actual distance-square d_ij. The factor √3 provides unit variance on the stochastic range.
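In Matlab, the randomized bounds (955) for one pair (i,j) ∈ I look like the following sketch, assuming scalar dij holds the actual distance-square (variable names are ours):

    eta  = 0.1;                              % constant noise factor of (955)
    dmax = dij*(1 + sqrt(3)*rand(1)*eta)^2;  % upper bound on distance-square
    dmin = dij*(1 - sqrt(3)*rand(1)*eta)^2;  % lower bound on distance-square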


Figure 97: Fairly typical is this solution for Jin's 2-lattice in Figure 93 with noise factor η = 0.1. Two rightmost nodes are anchors; two remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 1.14. Actual sensor position indicated by target while its localization is indicated by bullet •. Rank-2 solution found in 1 iteration (951) (1328a).


Figure 98: Typical solution for Jin's 3-lattice in Figure 94 with noise factor η = 0.1. Three red middle nodes are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 1.12. Actual sensor position indicated by target while its localization is indicated by bullet •. Rank-2 solution found in 2 iterations (951) (1328a).


Figure 99: Typical solution for Jin's 4-lattice in Figure 95 with noise factor η = 0.1. Four red nodes are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 0.75. Rank-2 solution found in 7 iterations (951) (1328a).


Figure 100: Typical solution for Jin's 5-lattice in Figure 96 with noise factor η = 0.1. Five red middle nodes are anchors; remaining nodes are sensors. Sensor 1 radius = 0.56. Rank-2 solution found in 3 iterations (951) (1328a).


Figure 97 through Figure 100 each illustrate one realization of numerical solution to the standardized lattice problems posed by Figure 93 through Figure 96 respectively. Exact localization is impossible because of measurement noise. Certainly, by inspection of their published graphical data, these new results requiring no heuristics are competitive with those of Carter & Jin. Obviously our solutions do not suffer from those compaction-type errors exhibited by Biswas' graphical results; which is all we intended to show.

6.4.1.5 localization conclusion

Solution to this sensor-network localization problem became apparent by understanding the geometry of optimization. Trace of a matrix, to a student of linear algebra, is perhaps a sum of eigenvalues. But to us, trace represents the normal I to some hyperplane in Euclidean vector space.

The legacy of Carter & Jin provided by [46] is a sobering demonstration of the need for more efficient methods for solution of semidefinite programs, while that of So & Ye in [212] is the bonding of distance geometry to semidefinite programming. Elegance of our problem statement for a sensor-network localization problem in any dimension should provide some impetus to focus more research on computational intensity.

6.4.1.6 postscript

This technique for constraining rank of a semidefinite feasibility problem has been tested numerically on a wide range of problem types well outside of localization, including randomized problems and some cardinality problems.^{6.19} We find that it works well and quite independently of desired rank or affine dimension, with the caveat that problem size is small.^{6.20} We have also tested the technique as a regularization (Pareto optimization) introduced into various convex problems having a pre-existing objective. In all cases, we find the technique works well but only for small problems. At this moment we speculate the reason to be sensitivity to numerical precision, induced by interior-point methods of solution.

^{6.19} Conic independence of vertices in constraints like Ax = b plays some empirical role in cardinality problems.
^{6.20} Small is hard to define. An upper bound on problem size is certainly larger in dimension than that corresponding to the PSD cone depicted in Figure 29.




Chapter 7

EDM proximity

    In summary, we find that the solution to problem [(965.3) p.419] is difficult and depends on the dimension of the space as the geometry of the cone of EDMs becomes more complex.
        −Hayden, Wells, Liu, & Tarazaga (1991) [115, §3]

A problem common to various sciences is to find the Euclidean distance matrix (EDM) D ∈ EDM^N closest in some sense to a given complete matrix of measurements H under a constraint on affine dimension 0 ≤ r ≤ N−1 (§2.3.1, §4.7.1.1); rather, r is bounded above by desired affine dimension ρ.


7.0.1 Measurement matrix H

Ideally, we want a given matrix of measurements H ∈ R^{N×N} to conform with the first three Euclidean metric properties (§4.2); to belong to the intersection of the orthant of nonnegative matrices R^{N×N}_+ with the symmetric hollow subspace S^N_h (§2.2.3.0.1). Geometrically, we want H to belong to the polyhedral cone (§2.12.1.0.1)

$$\mathcal{K} \,\triangleq\, \mathbb{S}^N_h \cap \mathbb{R}^{N\times N}_+\qquad(956)$$

Yet in practice, H can possess significant measurement uncertainty (noise).

Sometimes realization of an optimization problem demands that its input, the given matrix H, possess some particular characteristics; perhaps symmetry and hollowness or nonnegativity. When the H given does not possess the desired properties, then we must impose them upon H prior to optimization:

- When measurement matrix H is not symmetric or hollow, taking its symmetric hollow part is equivalent to orthogonal projection on the symmetric hollow subspace S^N_h.

- When measurements of distance in H are negative, zeroing negative entries effects unique minimum-distance projection on the orthant of nonnegative matrices R^{N×N}_+ in isomorphic R^{N^2} (§E.9.2.2.3).

7.0.1.1 Order of imposition

Since convex cone K (956) is the intersection of an orthant with a subspace, we want to project on that subset of the orthant belonging to the subspace; on the nonnegative orthant in the symmetric hollow subspace that is, in fact, the intersection. For that reason alone, unique minimum-distance projection of H on K (that member of K closest to H in isomorphic R^{N^2} in the Euclidean sense) can be attained by first taking its symmetric hollow part, and only then clipping negative entries of the result to 0; id est, there is only one correct order of projection, in general, on an orthant intersecting a subspace: project on the subspace, then project the result on the orthant in that subspace. (confer §E.9.5)


In contrast, order of projection on an intersection of subspaces is arbitrary.

That order-of-projection rule applies more generally, of course, to the intersection of any convex set C with any subspace. Consider the proximity problem^{7.1} over convex feasible set S^N_h ∩ C given nonsymmetric nonhollow H ∈ R^{N×N}:

$$\begin{array}{cl}\underset{B\in\mathbb{S}^N_h}{\text{minimize}} & \|B-H\|_F^2\\ \text{subject to} & B\in\mathcal{C}\end{array}\qquad(957)$$

a convex optimization problem. Because the symmetric hollow subspace is orthogonal to the antisymmetric antihollow subspace (§2.2.3), then for B ∈ S^N_h

$$\text{tr}\Big(B^T\big(\tfrac{1}{2}(H-H^T)+\delta^2(H)\big)\Big) = 0\qquad(958)$$

so the objective function is equivalent to

$$\|B-H\|_F^2 \;\equiv\; \Big\|B-\big(\tfrac{1}{2}(H+H^T)-\delta^2(H)\big)\Big\|_F^2 + \Big\|\tfrac{1}{2}(H-H^T)+\delta^2(H)\Big\|_F^2\qquad(959)$$

This means the antisymmetric antihollow part of given matrix H would be ignored by minimization with respect to symmetric hollow variable B under the Frobenius norm; id est, minimization proceeds as though given the symmetric hollow part of H.

This action of the Frobenius norm (959) is effectively a Euclidean projection (minimum-distance projection) of H on the symmetric hollow subspace S^N_h prior to minimization. Thus minimization proceeds inherently following the correct order for projection on S^N_h ∩ C. Therefore we may either assume H ∈ S^N_h, or take its symmetric hollow part prior to optimization.

^{7.1} There are two equivalent interpretations of projection (§E.9): one finds a set normal, the other, minimum distance between a point and a set. Here we realize the latter view.


Figure 101: Pseudo-Venn diagram: The EDM cone belongs to the intersection of the symmetric hollow subspace with the nonnegative orthant; EDM^N ⊆ K (467). EDM^N cannot exist outside S^N_h, but R^{N×N}_+ does.

7.0.1.2 Egregious input error under nonnegativity demand

More pertinent to the optimization problems presented herein, where

$$\mathcal{C} \,\triangleq\, \text{EDM}^N \subseteq \mathcal{K} = \mathbb{S}^N_h \cap \mathbb{R}^{N\times N}_+\qquad(960)$$

then should some particular realization of a proximity problem demand input H be nonnegative, and were we only to zero negative entries of a nonsymmetric nonhollow input H prior to optimization, then the ensuing projection on EDM^N would be guaranteed incorrect (out of order).

Now comes a surprising fact: Even were we to correctly follow the order-of-projection rule and provide H ∈ K prior to optimization, then the ensuing projection on EDM^N will be incorrect whenever the original input H has negative entries and some proximity problem demands nonnegative input H.


Figure 102: Pseudo-Venn diagram from Figure 101 showing elbow placed in path of projection of H on EDM^N ⊂ S^N_h by an optimization problem demanding nonnegative input matrix H. The first two line segments leading away from H result from the correct order-of-projection required to provide nonnegative H prior to optimization. Were H nonnegative, then its projection on S^N_h would instead belong to K; making the elbow disappear. (confer Figure 112)


This is best understood referring to Figure 101: Suppose nonnegative input H is demanded, and then the problem realization correctly projects its input first on S^N_h and then directly on C = EDM^N. That demand for nonnegativity effectively requires imposition of K on input H prior to optimization so as to obtain correct order of projection (on S^N_h first). Yet such an imposition prior to projection on EDM^N generally introduces an elbow into the path of projection (illustrated in Figure 102) caused by the technique itself; that being, a particular proximity problem realization requiring nonnegative input.

Any procedure for imposition of nonnegativity on input H can only be incorrect in this circumstance. There is no resolution unless input H is guaranteed nonnegative with no tinkering. Otherwise, we have no choice but to employ a different problem realization; one not demanding nonnegative input.

7.0.2 Lower bound

Most of the problems we encounter in this chapter have the general form:

$$\begin{array}{cl}\underset{B}{\text{minimize}} & \|B-A\|_F\\ \text{subject to} & B\in\mathcal{C}\end{array}\qquad(961)$$

where A ∈ R^{m×n} is given data. This particular objective denotes Euclidean projection (§E) of vectorized matrix A on the set C, which may or may not be convex. When C is convex, then the projection is unique minimum-distance because the Frobenius norm when squared is a strictly convex function of variable B and because the optimal solution is the same regardless of the square (1307). When C is a subspace, then the direction of projection is orthogonal to C.

Denoting by A = U_A Σ_A Q_A^T and B = U_B Σ_B Q_B^T their full singular value decompositions (whose singular values are always nonincreasingly ordered (§A.6)), there exists a tight lower bound on the objective over the manifold of orthogonal matrices;

$$\|\Sigma_B-\Sigma_A\|_F \;\leq\; \inf_{U_A,\,U_B,\,Q_A,\,Q_B}\|B-A\|_F\qquad(962)$$

This least lower bound holds more generally for any orthogonally invariant norm on R^{m×n} (§2.2.1), including the Frobenius and spectral norm. [217, §II.3] [131, §7.4.51]


7.0.3 Problem approach

Problems traditionally posed in terms of point position {x_i , i = 1 ... N}, such as

$$\underset{\{x_i\}}{\text{minimize}}\;\sum_{i,\,j\,\in\,\mathcal{I}}\big(\|x_i-x_j\| - h_{ij}\big)^2\qquad(963)$$

or

$$\underset{\{x_i\}}{\text{minimize}}\;\sum_{i,\,j\,\in\,\mathcal{I}}\big(\|x_i-x_j\|^2 - h_{ij}^2\big)^2\qquad(964)$$

(where I is an abstract set of indices and h_ij is given data) are everywhere converted (in this book) to the distance-square variable D or to Gram matrix G; the Gram matrix sometimes acting as bridge between position and distance. That conversion is performed regardless of whether known data is complete. Then the techniques of chapter 4 or chapter 5 are applied to find relative or absolute position. This approach is taken because we prefer introduction of rank constraints into convex problems rather than searching an infinitude of local minima in (963) or (964) [62].

7.0.4 Three prevalent proximity problems

There are three statements of the closest-EDM problem prevalent in the literature, the multiplicity due primarily to choice of projection on the EDM versus positive semidefinite (PSD) cone, and to vacillation between the distance-square variable d_ij versus absolute distance √d_ij. In their most fundamental form, the three prevalent proximity problems are (965.1), (965.2), and (965.3): [229]

$$(1)\quad\begin{array}[t]{cl}\underset{D}{\text{minimize}} & \|-V(D-H)V\|_F^2\\ \text{subject to} & \text{rank}\,VDV \leq \rho\\ & D\in\text{EDM}^N\end{array}
\qquad
(2)\quad\begin{array}[t]{cl}\underset{\circ\sqrt{D}}{\text{minimize}} & \|\circ\sqrt{D}-H\|_F^2\\ \text{subject to} & \text{rank}\,VDV \leq \rho\\ & \circ\sqrt{D}\in\sqrt{\text{EDM}^N}\end{array}$$

$$(3)\quad\begin{array}[t]{cl}\underset{D}{\text{minimize}} & \|D-H\|_F^2\\ \text{subject to} & \text{rank}\,VDV \leq \rho\\ & D\in\text{EDM}^N\end{array}
\qquad
(4)\quad\begin{array}[t]{cl}\underset{\circ\sqrt{D}}{\text{minimize}} & \|-V(\circ\sqrt{D}-H)V\|_F^2\\ \text{subject to} & \text{rank}\,VDV \leq \rho\\ & \circ\sqrt{D}\in\sqrt{\text{EDM}^N}\end{array}\qquad(965)$$


where we have made explicit an imposed upper bound ρ on affine dimension

$$r \,=\, \text{rank}\,V_N^TDV_N \,=\, \text{rank}\,VDV\qquad(608)$$

that is benign when ρ = N−1, and where D ≜ [d_ij] and ◦√D ≜ [√d_ij].

Problems (965.2) and (965.3) are Euclidean projections of a vectorized matrix H on an EDM cone (§5.3), whereas problems (965.1) and (965.4) are Euclidean projections of a vectorized matrix −VHV on a PSD cone. Problem (965.4) is not posed in the literature because it has limited theoretical foundation.^{7.2}

Analytical solution to (965.1) is known in closed form for any bound ρ although, as the problem is stated, it is a convex optimization only in the case ρ = N−1. We show in §7.1.4 how (965.1) becomes a convex optimization problem for any ρ when transformed to the spectral domain. When expressed as a function of point list in a matrix X, problem (965.2) is a variant of what is known in statistics literature as the stress problem. [36, p.34] [60] [238] Problems (965.2) and (965.3) are convex optimization problems in D for the case ρ = N−1. Even with the rank constraint removed from (965.2), we will see that the remaining convex problem inherently minimizes affine dimension.

Generally speaking, each problem in (965) produces a different result because there is no isometry relating them. Of the various auxiliary V-matrices (§B.4), the geometric centering matrix V (491) appears in the literature most often although V_N (474) is the auxiliary matrix naturally consequent to Schoenberg's seminal exposition [207]. Substitution of any auxiliary matrix or its pseudoinverse into these problems produces another valid problem.

Substitution of V_N^T for left-hand V in (965.1), in particular, produces a different result because

$$\begin{array}{cl}\underset{D}{\text{minimize}} & \|-V(D-H)V\|_F^2\\ \text{subject to} & D\in\text{EDM}^N\end{array}\qquad(966)$$

^{7.2} D ∈ EDM^N ⇒ ◦√D ∈ EDM^N, −V ◦√D V ∈ S^N_+ (§4.10)


finds D to attain the Euclidean distance of vectorized −VHV to the positive semidefinite cone in ambient isometrically isomorphic R^{N(N+1)/2}, whereas

$$\begin{array}{cl}\underset{D}{\text{minimize}} & \|-V_N^T(D-H)V_N\|_F^2\\ \text{subject to} & D\in\text{EDM}^N\end{array}\qquad(967)$$

attains Euclidean distance of vectorized −V_N^T H V_N to the positive semidefinite cone in isometrically isomorphic subspace R^{N(N−1)/2}; quite different projections^{7.3} regardless of whether affine dimension is constrained. But substitution of auxiliary matrix V_W^T (§B.4.3) or V_N^† yields the same result as (965.1) because V = V_W V_W^T = V_N V_N^†; id est,

$$\begin{array}{rl}\|-V(D-H)V\|_F^2 &=\; \|-V_WV_W^T(D-H)V_WV_W^T\|_F^2 \;=\; \|-V_W^T(D-H)V_W\|_F^2\\[2pt] &=\; \|-V_NV_N^\dagger(D-H)V_NV_N^\dagger\|_F^2 \;=\; \|-V_N^\dagger(D-H)V_N\|_F^2\end{array}\qquad(968)$$

We see no compelling reason to prefer one particular auxiliary V-matrix over another. Each has its own coherent interpretations; e.g., §4.4.2, §5.7. Neither can we say any particular problem formulation produces generally better results than another.

7.1 First prevalent problem: Projection on PSD cone

This first problem

$$\left.\begin{array}{cl}\underset{D}{\text{minimize}} & \|-V_N^T(D-H)V_N\|_F^2\\ \text{subject to} & \text{rank}\,V_N^TDV_N \leq \rho\\ & D\in\text{EDM}^N\end{array}\right\}\quad\text{Problem 1}\qquad(969)$$

^{7.3} The isomorphism T(Y) = V_N^{†T} Y V_N^† onto S^N_c = {V X V | X ∈ S^N} relates the map in (967) to that in (966), but is not an isometry.


poses a Euclidean projection of −V_N^T H V_N in subspace S^{N−1} on a generally nonconvex subset (when ρ < N−1) of the positive semidefinite cone boundary. Its known analytical solution (971) is expressed in terms of the diagonalization

$$-V_N^THV_N \,\triangleq\, \sum_{i=1}^{N-1}\lambda_i\,v_iv_i^T \,\in\, \mathbb{S}^{N-1}\qquad(972)$$


7.1.1 Closest-EDM Problem 1, convex case

7.1.1.0.1 Proof. Solution (971), convex case.
When desired affine dimension is unconstrained, ρ = N−1, the rank function disappears from (969) leaving a convex optimization problem; a simple unique minimum-distance projection on the positive semidefinite cone S^{N−1}_+ : videlicet

$$\begin{array}{cl}\underset{D\in\mathbb{S}^N_h}{\text{minimize}} & \|-V_N^T(D-H)V_N\|_F^2\\ \text{subject to} & -V_N^TDV_N \succeq 0\end{array}\qquad(974)$$

by (487). Because

$$\mathbb{S}^{N-1}_+ \,=\, -V_N^T\,\mathbb{S}^N_h\,V_N\qquad(577)$$

then the necessary and sufficient conditions for projection in isometrically isomorphic R^{N(N−1)/2} on the self-dual (305) positive semidefinite cone S^{N−1}_+ are:^{7.6} (§E.9.2.0.1) (1216) (confer (1632))

$$\begin{array}{r}-V_N^TD^\star V_N \succeq 0\\[2pt] -V_N^TD^\star V_N\big(-V_N^TD^\star V_N + V_N^THV_N\big) = 0\\[2pt] -V_N^TD^\star V_N + V_N^THV_N \succeq 0\end{array}\qquad(975)$$

Symmetric −V_N^T H V_N is diagonalizable, hence decomposable in terms of its eigenvectors v and eigenvalues λ as in (972). Therefore (confer (971))

$$-V_N^TD^\star V_N \,=\, \sum_{i=1}^{N-1}\max\{0,\,\lambda_i\}\,v_iv_i^T\qquad(976)$$

satisfies (975), optimally solving (974). To see that, recall: these eigenvectors constitute an orthogonal set and

$$-V_N^TD^\star V_N + V_N^THV_N \,=\, -\sum_{i=1}^{N-1}\min\{0,\,\lambda_i\}\,v_iv_i^T\qquad(977)$$

^{7.6} These conditions for projection on a convex cone are identical to the Karush-Kuhn-Tucker (KKT) optimality conditions for problem (974).
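Solution (976) is directly computable. A minimal Matlab sketch, assuming H given, N its dimension, and V_N constructed as in (474) (variable names are ours):

    VN    = [-ones(1,N-1); eye(N-1)]/sqrt(2);  % auxiliary matrix V_N (474)
    A     = -VN'*H*VN;
    [v,L] = eig((A+A')/2);                     % eigenvectors v, eigenvalues L
    B     = v*max(L,0)*v';                     % -VN'*Dstar*VN of (976)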


7.1.2 Generic problem

Prior to determination of D⋆, analytical solution to Problem 1

$$B^\star \,\triangleq\, -V_N^TD^\star V_N \,=\, \sum_{i=1}^{\rho}\max\{0,\,\lambda_i\}\,v_iv_i^T \;\in\;\mathbb{S}^{N-1}\qquad(971)$$

is equivalent to solution of a generic rank-constrained projection problem; a Euclidean projection, of given matrix A ≜ −V_N^T H V_N ∈ S^{N−1},

$$\begin{array}{cl}\underset{B\,\in\,\mathbb{S}^{N-1}}{\text{minimize}} & \|B-A\|_F^2\\ \text{subject to} & \text{rank}\,B \leq \rho\\ & B \succeq 0\end{array}\qquad(978)$$

on a subset of a PSD cone (on a generally nonconvex subset of the PSD cone boundary ∂S^{N−1}_+ when ρ < N−1) of matrices whose rank does not exceed desired affine


dimension^{7.7} ρ; called rank ρ subset: (180) (213)

$$\mathbb{S}^{N-1}_+\backslash\,\mathbb{S}^{N-1}_+(\rho+1) \,=\, \{X\in\mathbb{S}^{N-1}_+ \mid \text{rank}\,X \leq \rho\}\qquad(218)$$

7.1.3 Choice of spectral cone

In this section we develop a method of spectral projection, on polyhedral cones containing eigenspectra, for constraining rank of positive semidefinite matrices in a proximity problem like (978). We will see why an orthant turns out to be the best choice of spectral cone, and why presorting (π) is critical.

7.1.3.0.1 Definition. Spectral projection.
Let R be an orthogonal matrix and Λ a nonincreasingly ordered diagonal matrix of eigenvalues. Spectral projection means unique minimum-distance projection of a rotated (R, §B.5.4) nonincreasingly ordered (π) eigenspectrum

$$\pi\big(\delta(R^T\Lambda R)\big) \,\in\, \mathbb{R}^{N-1}\qquad(980)$$

on a polyhedral cone containing all eigenspectra corresponding to a rank ρ subset (§2.9.2.1) of the positive semidefinite cone S^{N−1}_+. △

In the simplest and most common case, projection on the positive semidefinite cone, orthogonal matrix R equals I (§7.1.4.0.1) and diagonal matrix Λ is ordered during diagonalization. Then spectral projection simply means projection of δ(Λ) on a subset of the nonnegative orthant, as we shall now ascertain:

It is curious how nonconvex Problem 1 has such a simple analytical solution (971). Although solution to generic problem (978) has been known since 1936 [72], Trosset [237, §2] first observed its equivalence in 1997 to spectral projection of an ordered eigenspectrum (in diagonal matrix Λ) on a subset of the monotone nonnegative cone (§2.13.9.4.1)

$$\mathcal{K}_{M+} \,=\, \{\upsilon \mid \upsilon_1 \geq \upsilon_2 \geq \cdots \geq \upsilon_{N-1} \geq 0\} \,\subseteq\, \mathbb{R}^{N-1}_+\qquad(354)$$

Of interest, momentarily, is only the smallest convex subset of the monotone nonnegative cone K_{M+} containing every nonincreasingly ordered

^{7.7} Recall: affine dimension is a lower bound on embedding (§2.3.1), equal to dimension of the smallest affine set in which points from a list X corresponding to an EDM D can be embedded.


eigenspectrum corresponding to a rank ρ subset of the positive semidefinite cone S^{N−1}_+; id est,

$$\mathcal{K}^{\rho}_{M+} \,\triangleq\, \{\upsilon\in\mathbb{R}^{\rho} \mid \upsilon_1 \geq \upsilon_2 \geq \cdots \geq \upsilon_{\rho} \geq 0\} \,\subseteq\, \mathbb{R}^{\rho}_+\qquad(981)$$

a pointed polyhedral cone, a ρ-dimensional convex subset of the monotone nonnegative cone K_{M+} ⊆ R^{N−1}_+ having the property, for λ denoting eigenspectra,

$$\begin{bmatrix}\mathcal{K}^{\rho}_{M+}\\ 0\end{bmatrix} = \pi\big(\lambda(\text{rank }\rho\text{ subset})\big) \,\subseteq\, \mathcal{K}^{N-1}_{M+} \triangleq \mathcal{K}_{M+}\qquad(982)$$

For each and every elemental eigenspectrum

$$\gamma \,\in\, \lambda(\text{rank }\rho\text{ subset}) \,\subseteq\, \mathbb{R}^{N-1}_+\qquad(983)$$

of the rank ρ subset (ordered or unordered in λ), there is a nonlinear surjection π(γ) onto K^ρ_{M+}. There is no convex subset of K_{M+} smaller than K^ρ_{M+} containing every ordered eigenspectrum corresponding to the rank ρ subset; a fact not proved here.

7.1.3.0.2 Proposition. (Hardy-Littlewood-Pólya) [111, §X] [38, §1.2]
Any vectors σ and γ in R^{N−1} satisfy a tight inequality

$$\pi(\sigma)^T\pi(\gamma) \;\geq\; \sigma^T\gamma \;\geq\; \pi(\sigma)^T\,\Xi\,\pi(\gamma)\qquad(984)$$

where Ξ is the order-reversing permutation matrix defined in (1352), and permutator π(γ) is a nonlinear function that sorts vector γ into nonincreasing order, thereby providing the greatest upper bound and least lower bound with respect to every possible sorting. ⋄

7.1.3.0.3 Corollary. Monotone nonnegative sort.
Any given vectors σ, γ ∈ R^{N−1} satisfy a tight Euclidean distance inequality

$$\|\pi(\sigma)-\pi(\gamma)\| \;\leq\; \|\sigma-\gamma\|\qquad(985)$$

where nonlinear function π(γ) sorts vector γ into nonincreasing order, thereby providing the least lower bound with respect to every possible sorting. ⋄


Given γ ∈ R^{N−1}

$$\inf_{\sigma\in\mathbb{R}^{N-1}_+}\|\sigma-\gamma\| \,=\, \inf_{\sigma\in\mathbb{R}^{N-1}_+}\|\pi(\sigma)-\pi(\gamma)\| \,=\, \inf_{\sigma\in\mathbb{R}^{N-1}_+}\|\sigma-\pi(\gamma)\| \,=\, \inf_{\sigma\in\mathcal{K}_{M+}}\|\sigma-\pi(\gamma)\|\qquad(986)$$

Yet for γ representing an arbitrary eigenspectrum, because

$$\inf_{\sigma\in\left[\begin{smallmatrix}\mathbb{R}^{\rho}_+\\0\end{smallmatrix}\right]}\|\sigma-\gamma\|^2 \;\geq\; \inf_{\sigma\in\left[\begin{smallmatrix}\mathbb{R}^{\rho}_+\\0\end{smallmatrix}\right]}\|\sigma-\pi(\gamma)\|^2 \;=\; \inf_{\sigma\in\left[\begin{smallmatrix}\mathcal{K}^{\rho}_{M+}\\0\end{smallmatrix}\right]}\|\sigma-\pi(\gamma)\|^2\qquad(987)$$

then projection of γ on the eigenspectra of a rank ρ subset can be tightened simply by presorting γ into nonincreasing order.

Proof. Simply because π(γ)_{1:ρ} ⪰ π(γ_{1:ρ})

$$\begin{array}{rl}
\displaystyle\inf_{\sigma\in\left[\begin{smallmatrix}\mathbb{R}^{\rho}_+\\0\end{smallmatrix}\right]}\|\sigma-\gamma\|^2 &=\; \gamma_{\rho+1:N-1}^T\gamma_{\rho+1:N-1} + \displaystyle\inf_{\sigma\in\mathbb{R}^{\rho}_+}\|\sigma_{1:\rho}-\gamma_{1:\rho}\|^2\\[4pt]
&=\; \gamma^T\gamma + \displaystyle\inf_{\sigma\in\mathbb{R}^{\rho}_+}\;\sigma_{1:\rho}^T\sigma_{1:\rho} - 2\,\sigma_{1:\rho}^T\gamma_{1:\rho}\\[4pt]
&\geq\; \gamma^T\gamma + \displaystyle\inf_{\sigma\in\mathbb{R}^{\rho}_+}\;\sigma_{1:\rho}^T\sigma_{1:\rho} - 2\,\sigma_{1:\rho}^T\pi(\gamma)_{1:\rho}\\[4pt]
&=\; \displaystyle\inf_{\sigma\in\left[\begin{smallmatrix}\mathbb{R}^{\rho}_+\\0\end{smallmatrix}\right]}\|\sigma-\pi(\gamma)\|^2
\end{array}\qquad(988)$$

7.1.3.1 Orthant is best spectral cone for Problem 1

This means unique minimum-distance projection of γ on the nearest spectral member of the rank ρ subset is tantamount to presorting γ into nonincreasing order. Only then does unique spectral projection on a subset K^ρ_{M+} of the monotone nonnegative cone become equivalent to unique spectral projection on a subset R^ρ_+ of the nonnegative orthant (which is simpler); in other words, unique minimum-distance projection of sorted γ on the nonnegative orthant in a ρ-dimensional subspace of R^N is indistinguishable from its projection on the subset K^ρ_{M+} of the monotone nonnegative cone in that same subspace.
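Corollary 7.1.3.0.3 is easy to corroborate numerically; a small Matlab check, with the permutator π realized by sort( · ,'descend'):

    sigma = randn(5,1);  gamma = randn(5,1);
    lhs = norm(sort(sigma,'descend') - sort(gamma,'descend'));
    rhs = norm(sigma - gamma);
    lhs <= rhs        % returns true (1) for every realization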


7.1.4 Closest-EDM Problem 1, "nonconvex" case

Trosset's proof of solution (971), for projection on a rank ρ subset of the positive semidefinite cone S^{N−1}_+, was algebraic in nature. [237, §2] Here we derive that known result, but instead using a more geometric argument via spectral projection on a polyhedral cone (subsuming the proof in §7.1.1). In so doing, we demonstrate how Problem 1 is implicitly a convex optimization:

7.1.4.0.1 Proof. Solution (971), nonconvex case.
As explained in §7.1.2, we may instead work with the more facile generic problem (978). With diagonalization of unknown

$$B \,\triangleq\, U\Upsilon U^T \in\, \mathbb{S}^{N-1}\qquad(989)$$

given desired affine dimension 0 ≤ ρ ≤ N−1 and diagonalizable

$$A \,\triangleq\, Q\Lambda Q^T \in\, \mathbb{S}^{N-1}\qquad(990)$$

having eigenvalues in Λ arranged in nonincreasing order, by (39) the generic problem is equivalent to

$$\begin{array}{cl}\underset{R\,,\,\Upsilon}{\text{minimize}} & \|\Upsilon-R^T\Lambda R\|_F^2\\ \text{subject to} & \text{rank}\,\Upsilon \leq \rho\\ & \Upsilon \succeq 0\\ & R^{-1} = R^T\end{array}\qquad(991)$$

where

$$R \,\triangleq\, Q^TU \in\, \mathbb{R}^{N-1\times N-1}\qquad(992)$$

in U on the set of orthogonal matrices is a linear bijection. By (1301), this is equivalent to the problem sequence:

$$\begin{array}{cl}\underset{\Upsilon}{\text{minimize}} & \|\Upsilon-R^T\Lambda R\|_F^2\\ \text{subject to} & \text{rank}\,\Upsilon \leq \rho\\ & \Upsilon \succeq 0\end{array}\qquad\text{(a)}$$

$$\begin{array}{cl}\underset{R}{\text{minimize}} & \|\Upsilon^\star-R^T\Lambda R\|_F^2\\ \text{subject to} & R^{-1} = R^T\end{array}\qquad\text{(b)}\qquad(993)$$


Problem (993a) is equivalent to: (1) orthogonal projection of R^TΛR on an (N−1)-dimensional subspace of isometrically isomorphic R^{N(N−1)/2} containing δ(Υ) ∈ R^{N−1}_+, (2) nonincreasingly ordering the result, (3) unique minimum-distance projection of the ordered result on $\left[\begin{smallmatrix}\mathbb{R}^{\rho}_+\\0\end{smallmatrix}\right]$. (§E.9.5) Projection on that (N−1)-dimensional subspace amounts to zeroing R^TΛR at all entries off the main diagonal; thus, the equivalent sequence leading with a spectral projection:

$$\begin{array}{cl}\underset{\Upsilon}{\text{minimize}} & \big\|\delta(\Upsilon)-\pi\big(\delta(R^T\Lambda R)\big)\big\|^2\\ \text{subject to} & \delta(\Upsilon)\in\begin{bmatrix}\mathbb{R}^{\rho}_+\\0\end{bmatrix}\end{array}\qquad\text{(a)}$$

$$\begin{array}{cl}\underset{R}{\text{minimize}} & \|\Upsilon^\star-R^T\Lambda R\|_F^2\\ \text{subject to} & R^{-1} = R^T\end{array}\qquad\text{(b)}\qquad(994)$$

Because any permutation matrix is an orthogonal matrix, it is always feasible that δ(R^TΛR) ∈ R^{N−1} be arranged in nonincreasing order; hence, the permutation operator π. Unique minimum-distance projection of vector π(δ(R^TΛR)) on the ρ-dimensional subset $\left[\begin{smallmatrix}\mathbb{R}^{\rho}_+\\0\end{smallmatrix}\right]$ of the nonnegative orthant R^{N−1}_+ requires: (§E.9.2.0.1)

$$\begin{array}{r}\delta(\Upsilon^\star)_{\rho+1:N-1} = 0\\[2pt] \delta(\Upsilon^\star) \succeq 0\\[2pt] \delta(\Upsilon^\star)^T\big(\delta(\Upsilon^\star)-\pi(\delta(R^T\Lambda R))\big) = 0\\[2pt] \delta(\Upsilon^\star)-\pi(\delta(R^T\Lambda R)) \succeq 0\end{array}\qquad(995)$$

which are necessary and sufficient conditions. Any value Υ⋆ satisfying conditions (995) is optimal for (994a). So

$$\delta(\Upsilon^\star)_i \,=\, \left\{\begin{array}{ll}\max\big\{0\,,\ \pi\big(\delta(R^T\Lambda R)\big)_i\big\}\,, & i = 1\,\ldots\,\rho\\ 0\,, & i = \rho+1\,\ldots\,N-1\end{array}\right.\qquad(996)$$

specifies an optimal solution. The lower bound on the objective with respect to R in (994b) is tight; by (962)


$$\|\,|\Upsilon^\star| - |\Lambda|\,\|_F \;\leq\; \|\Upsilon^\star - R^T\Lambda R\|_F\qquad(997)$$

where | | denotes absolute entry-value. For selection of Υ⋆ as in (996), this lower bound is attained when

$$R^\star = I\qquad(998)$$

which is the known solution.

7.1.4.1 Significance

Importance of this well-known [72] optimal solution (971) for projection on a rank ρ subset of the positive semidefinite cone should not be dismissed:

- This solution is closed-form and the only method known for enforcing a constraint on rank of an EDM in projection problems such as (965). (See the Matlab sketch following §7.1.5.)

- This solution is equivalent to projection on a polyhedral cone in the spectral domain (projection on a spectral cone); a necessary and sufficient condition (§A.3.1) for membership of a symmetric matrix to a rank ρ subset (§2.9.2.1) of the positive semidefinite cone.

- This solution at once encompasses projection on a rank ρ subset (218) of the positive semidefinite cone (generally, a nonconvex subset of its boundary) from either the exterior or interior of that cone.^{7.8} By transforming the problem to the spectral domain, projection on a rank ρ subset became a convex optimization problem.

- For the convex case problem, this solution is always unique. Otherwise, distinct eigenvalues (multiplicity 1) in Λ guarantee uniqueness of this solution by the reasoning in §A.5.0.1.^{7.9}

^{7.8} Projection on the boundary from the interior of a convex Euclidean body is generally a nonconvex problem.
^{7.9} Uncertainty of uniqueness prevents the erroneous conclusion that a rank ρ subset (180) were a convex body by the Bunt-Motzkin theorem (§E.9.0.0.1).


- Because U⋆ = Q, a minimum-distance projection on a rank ρ subset of the positive semidefinite cone is a positive semidefinite matrix orthogonal (in the Euclidean sense) to the direction of projection.^{7.10}

7.1.5 Problem 1 in spectral norm, convex case

When instead we pose the matrix 2-norm (spectral norm) in Problem 1 (969) for the convex case ρ = N−1, then the new problem

$$\begin{array}{cl}\underset{D}{\text{minimize}} & \|-V_N^T(D-H)V_N\|_2\\ \text{subject to} & D\in\text{EDM}^N\end{array}\qquad(999)$$

is convex although its solution is not necessarily unique;^{7.11} giving rise to oblique projection (§E) on the positive semidefinite cone S^{N−1}_+. Indeed, its solution set includes the Frobenius solution (971) for the convex case whenever −V_N^T H V_N is a normal matrix. [114, §1] [109] [41, §8.1.1] Singular value problem (999) is equivalent to

$$\begin{array}{cl}\underset{\mu\,,\,D}{\text{minimize}} & \mu\\ \text{subject to} & -\mu I \preceq -V_N^T(D-H)V_N \preceq \mu I\\ & D\in\text{EDM}^N\end{array}\qquad(1000)$$

where

$$\mu^\star \,=\, \max\big\{\,\big|\lambda\big(-V_N^T(D^\star-H)V_N\big)_i\big|\,,\ i = 1\,\ldots\,N-1\,\big\} \;\in\, \mathbb{R}_+\qquad(1001)$$

the maximum absolute eigenvalue (due to matrix symmetry).

For lack of a unique solution here, we prefer the Frobenius rather than spectral norm.

^{7.10} But Theorem E.9.2.0.1 for unique projection on a closed convex cone does not apply here because the direction of projection is not necessarily a member of the dual PSD cone. This occurs, for example, whenever positive eigenvalues are truncated.
^{7.11} For each and every |t| ≤ 2, $\begin{bmatrix}2&0\\0&t\end{bmatrix}$ has the same spectral norm.
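Collecting the results of §7.1.4, projection on a rank ρ subset of the positive semidefinite cone reduces to a few lines of Matlab. A sketch (the function name is ours) implementing (996) with R⋆ = I (998):

    function B = rankrho_psd_proj(A, rho)
    % Projection (971) of symmetric A on the rank-rho subset (218) of the
    % PSD cone: keep the rho largest eigenvalues, clipped at zero.
    [Q,L]   = eig((A+A')/2);
    [lam,k] = sort(diag(L),'descend');  % Upsilon* ordering of (996)
    lam     = max(lam,0);               % clip negative eigenvalues
    lam(rho+1:end) = 0;                 % zero all but the rho largest
    B = Q(:,k)*diag(lam)*Q(:,k)';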


7.2 Second prevalent problem: Projection on EDM cone in √d_ij

Let

$$\circ\sqrt{D} \,\triangleq\, [\sqrt{d_{ij}}\,] \,\in\, \mathcal{K} = \mathbb{S}^N_h \cap \mathbb{R}^{N\times N}_+\qquad(1002)$$

be an unknown matrix of absolute distance; id est,

$$D = [d_{ij}] \,\triangleq\, \circ\sqrt{D}\,\circ\,\circ\sqrt{D} \,\in\, \text{EDM}^N\qquad(1003)$$

where ◦ denotes Hadamard product. The second prevalent proximity problem is a Euclidean projection (in the natural coordinates √d_ij) of matrix H on a nonconvex subset of the boundary of the nonconvex cone of Euclidean absolute-distance matrices rel ∂√EDM^N : (§5.3, confer Figure 78(b))

$$\left.\begin{array}{cl}\underset{\circ\sqrt{D}}{\text{minimize}} & \|\circ\sqrt{D}-H\|_F^2\\ \text{subject to} & \text{rank}\,V_N^TDV_N \leq \rho\\ & \circ\sqrt{D}\in\sqrt{\text{EDM}^N}\end{array}\right\}\quad\text{Problem 2}\qquad(1004)$$

where

$$\sqrt{\text{EDM}^N} = \{\,\circ\sqrt{D} \mid D\in\text{EDM}^N\}\qquad(746)$$

This statement of the second proximity problem is considered difficult to solve because of the constraint on desired affine dimension ρ (§4.7.2) and because the objective function

$$\|\circ\sqrt{D}-H\|_F^2 \,=\, \sum_{i,j}\big(\sqrt{d_{ij}}-h_{ij}\big)^2\qquad(1005)$$

is expressed in the natural coordinates; projection on a doubly nonconvex set.

Our solution to this second problem prevalent in the literature requires measurement matrix H to be nonnegative;

$$H = [h_{ij}] \,\in\, \mathbb{R}^{N\times N}_+\qquad(1006)$$

If the H matrix given has negative entries, then the technique of solution presented here becomes invalid. As explained in §7.0.1, projection of H on K = S^N_h ∩ R^{N×N}_+ (956) prior to application of this proposed solution is incorrect.


7.2.1 Convex case

When ρ = N−1, the rank constraint vanishes and a convex problem emerges:^{7.12}

$$\begin{array}{cl}\underset{\circ\sqrt{D}}{\text{minimize}} & \|\circ\sqrt{D}-H\|_F^2\\ \text{subject to} & \circ\sqrt{D}\in\sqrt{\text{EDM}^N}\end{array}
\quad\Leftrightarrow\quad
\begin{array}{cl}\underset{D}{\text{minimize}} & \displaystyle\sum_{i,j}\,d_{ij}-2h_{ij}\sqrt{d_{ij}}+h_{ij}^2\\ \text{subject to} & D\in\text{EDM}^N\end{array}\qquad(1007)$$

For any fixed i and j, the argument of summation is a convex function of d_ij because (for nonnegative constant h_ij) the negative square root is convex in nonnegative d_ij and because d_ij + h²_ij is affine (convex). Because the sum of any number of convex functions in D remains convex [41, §3.2.1] and because the feasible set is convex in D, we have a convex optimization problem:

$$\begin{array}{cl}\underset{D}{\text{minimize}} & \mathbf{1}^T\big(D-2H\circ\circ\sqrt{D}\,\big)\mathbf{1} + \|H\|_F^2\\ \text{subject to} & D\in\text{EDM}^N\end{array}\qquad(1008)$$

The objective function, being a sum of strictly convex functions, is moreover strictly convex in D on the nonnegative orthant. Existence of a unique solution D⋆ for this second prevalent problem depends upon nonnegativity of H and a convex feasible set (§3.1.1.1).^{7.13}

7.2.1.1 Equivalent semidefinite program, Problem 2, convex case

Convex problem (1007) is numerically solvable for its global minimum using an interior-point method [41, §11] [267] [189] [182] [259] [9] [83]. We translate (1007) to an equivalent semidefinite program (SDP) for a pedagogical reason made clear in §7.2.2.2 and because there exist readily available computer programs for numerical solution [224] [100] [240] [26] [260] [261] [262].

Substituting a new matrix variable Y ≜ [y_ij] ∈ R^{N×N}_+

$$h_{ij}\sqrt{d_{ij}} \;\leftarrow\; y_{ij}\qquad(1009)$$

^{7.12} still thought to be a nonconvex problem as late as 1997 [238] even though discovered convex by de Leeuw in 1993. [60] [36, §13.6] Yet using methods from §3, it can be easily ascertained: ‖◦√D − H‖_F is not convex in D.
^{7.13} The transformed problem in variable D no longer describes Euclidean projection on the EDM cone √EDM^N. Otherwise we might erroneously conclude √EDM^N were a convex body by the Bunt-Motzkin theorem (§E.9.0.0.1).


Boyd proposes: problem (1007) is equivalent to the semidefinite program

$$\begin{array}{cl}\underset{D\,,\,Y}{\text{minimize}} & \displaystyle\sum_{i,j}\,d_{ij}-2y_{ij}+h_{ij}^2\\ \text{subject to} & \begin{bmatrix}d_{ij} & y_{ij}\\ y_{ij} & h_{ij}^2\end{bmatrix} \succeq 0\,,\quad i,j = 1\,\ldots\,N\\ & D\in\text{EDM}^N\end{array}\qquad(1010)$$

To see that, recall d_ij ≥ 0 is implicit to D ∈ EDM^N (§4.8.1, (487)). So when H ∈ R^{N×N}_+ is nonnegative as assumed,

$$\begin{bmatrix}d_{ij} & y_{ij}\\ y_{ij} & h_{ij}^2\end{bmatrix} \succeq 0 \quad\Leftrightarrow\quad h_{ij}\sqrt{d_{ij}} \,\geq\, |y_{ij}|\qquad(1011)$$

Minimization of the objective function implies maximization of y_ij, which is bounded above. Hence nonnegativity of y_ij is implicit to (1010) and, as desired, y_ij → h_ij√d_ij as optimization proceeds.

If the given matrix H is now assumed symmetric and nonnegative,

$$H = [h_{ij}] \,\in\, \mathbb{S}^N \cap \mathbb{R}^{N\times N}_+\qquad(1012)$$

then Y = H ◦ ◦√D must belong to K = S^N_h ∩ R^{N×N}_+ (956). Then because Y ∈ S^N_h (§B.4.2 no.20)

$$\|\circ\sqrt{D}-H\|_F^2 \,=\, \sum_{i,j}\,d_{ij}-2y_{ij}+h_{ij}^2 \,=\, -N\,\text{tr}\big(V(D-2Y)V\big) + \|H\|_F^2\qquad(1013)$$

so convex problem (1010) is equivalent to the semidefinite program

$$\begin{array}{cl}\underset{D\,,\,Y}{\text{minimize}} & -\text{tr}\big(V(D-2Y)V\big)\\ \text{subject to} & \begin{bmatrix}d_{ij} & y_{ij}\\ y_{ij} & h_{ij}^2\end{bmatrix} \succeq 0\,,\quad j > i = 1\,\ldots\,N-1\\ & Y\in\mathbb{S}^N_h\\ & D\in\text{EDM}^N\end{array}\qquad(1014)$$

where the constants h²_ij and N have been dropped arbitrarily from the objective.
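Semidefinite program (1014) is readily expressed with an SDP modeling package. A hedged Matlab sketch assuming the CVX package (CVX is our assumption; any of the solvers cited above would serve), with membership D ∈ EDM^N imposed via the Schoenberg criterion −V_N^T D V_N ⪰ 0 on hollow D:

    N  = size(H,1);                    % H : given symmetric nonnegative matrix
    V  = eye(N) - ones(N)/N;           % geometric centering matrix V (491)
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    cvx_begin sdp
      variable D(N,N) symmetric
      variable Y(N,N) symmetric
      minimize( -trace(V*(D - 2*Y)*V) )
      subject to
        diag(D) == 0;  diag(Y) == 0;   % D , Y hollow
        -VN'*D*VN >= 0;                % D is an EDM
        for i = 1:N-1
          for j = i+1:N
            [D(i,j) Y(i,j); Y(i,j) H(i,j)^2] >= 0;
          end
        end
    cvx_end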


7.2.1.2 Gram-form semidefinite program, Problem 2, convex case

There is great advantage to expressing problem statement (1014) in Gram-form because Gram matrix G is a bidirectional bridge between point list X and distance matrix D; e.g., Example 4.4.2.2.4, Example 5.4.0.0.1. This way, problem convexity can be maintained while simultaneously constraining point list X, Gram matrix G, and distance matrix D at our discretion.

Convex problem (1014) may be equivalently written via the linear bijective (§4.6.1) EDM operator D(G) (480);

$$\begin{array}{cl}\underset{G\in\mathbb{S}^N_c\,,\;Y\in\mathbb{S}^N_h}{\text{minimize}} & -\text{tr}\big(V(\mathcal{D}(G)-2Y)V\big)\\ \text{subject to} & \begin{bmatrix}\langle\Phi_{ij}\,,\,G\rangle & y_{ij}\\ y_{ij} & h_{ij}^2\end{bmatrix} \succeq 0\,,\quad j > i = 1\,\ldots\,N-1\\ & G \succeq 0\end{array}\qquad(1015)$$

where distance-square D = [d_ij] ∈ S^N_h (464) is related to Gram matrix entries G = [g_ij] ∈ S^N_c ∩ S^N_+ by

$$\begin{array}{rl}d_{ij} &=\; g_{ii}+g_{jj}-2g_{ij}\\ &=\; \langle\Phi_{ij}\,,\,G\rangle\end{array}\qquad(479)$$

where

$$\Phi_{ij} = (e_i-e_j)(e_i-e_j)^T \,\in\, \mathbb{S}^N_+\qquad(466)$$

Confinement of G to the geometric center subspace provides numerical stability and no loss of generality (confer (750)); the implicit constraint G1 = 0 is otherwise unnecessary.

To include constraints on the list X ∈ R^{n×N}, we would first rewrite (1015)

$$\begin{array}{cl}\underset{G\in\mathbb{S}^N_c\,,\;Y\in\mathbb{S}^N_h\,,\;X\in\mathbb{R}^{n\times N}}{\text{minimize}} & -\text{tr}\big(V(\mathcal{D}(G)-2Y)V\big)\\ \text{subject to} & \begin{bmatrix}\langle\Phi_{ij}\,,\,G\rangle & y_{ij}\\ y_{ij} & h_{ij}^2\end{bmatrix} \succeq 0\,,\quad j > i = 1\,\ldots\,N-1\\ & \begin{bmatrix}I & X\\ X^T & G\end{bmatrix} \succeq 0\\ & X\in\mathcal{C}\end{array}\qquad(1016)$$


and then add the constraints, realized here in abstract membership to some convex set C. This problem realization includes a relaxation of the nonconvex constraint G = X^TX and, if desired, more constraints on G could be added. This technique is discussed in §4.4.2.2.4.

7.2.2 Minimization of affine dimension in Problem 2

When desired affine dimension ρ is diminished, the rank function becomes reinserted into problem (1010), which is then rendered difficult to solve because the feasible set {D, Y} loses convexity in S^N_h × R^{N×N}. Indeed, the rank function is quasiconcave (§3.2) on the positive semidefinite cone; (§2.9.2.6.2) id est, its sublevel sets are not convex.

7.2.2.1 Rank minimization heuristic

A remedy developed in [78] [171] [79] [77] introduces the convex envelope (cenv) of the quasiconcave rank function:

7.2.2.1.1 Definition. Convex envelope. [128]
The convex envelope of a function f : C → R is defined as the largest convex function g such that g ≤ f on convex domain C ⊆ R^n.^{7.14} △

The convex envelope of the rank function is proportional to the trace function when its argument is constrained to be symmetric and positive semidefinite. A properly scaled trace thus represents the best convex lower bound on rank. The idea, then, is to substitute the convex envelope for the rank of some variable A ∈ S^M_+

$$\text{rank}\,A \;\leftarrow\; \text{cenv}(\text{rank}\,A) \,\propto\, \text{tr}\,A \,=\, \sum_i\sigma(A)_i \,=\, \sum_i\lambda(A)_i \,=\, \|\lambda(A)\|_1\qquad(1017)$$

equivalent to the sum of all eigenvalues or singular values.

^{7.14} Provided f ≢ +∞ and there exists an affine function h ≤ f on R^n, then the convex envelope is equal to the convex conjugate (the Legendre-Fenchel transform) of the convex conjugate of f; id est, the conjugate-conjugate function f**. [129, §E.1]


7.2.2.2 Applying trace rank-heuristic to Problem 2

Substituting the rank envelope for the rank function in Problem 2, for D ∈ EDM^N (confer (608))

$$\text{cenv}\,\text{rank}(-V_N^TDV_N) \,=\, \text{cenv}\,\text{rank}(-VDV) \,\propto\, -\text{tr}(VDV)\qquad(1018)$$

and for desired affine dimension ρ ≤ N−1 and nonnegative H [sic], we have the relaxed convex optimization problem

$$\begin{array}{cl}\underset{D}{\text{minimize}} & \|\circ\sqrt{D}-H\|_F^2\\ \text{subject to} & -\text{tr}(VDV) \leq \kappa\,\rho\\ & D\in\text{EDM}^N\end{array}\qquad(1019)$$

where κ ∈ R_+ is a constant determined by cut-and-try.^{7.15} The equivalent semidefinite program makes κ variable: for nonnegative and symmetric H

$$\begin{array}{cl}\underset{D\,,\,Y,\,\kappa}{\text{minimize}} & \kappa\,\rho + 2\,\text{tr}(VYV)\\ \text{subject to} & \begin{bmatrix}d_{ij} & y_{ij}\\ y_{ij} & h_{ij}^2\end{bmatrix} \succeq 0\,,\quad j > i = 1\,\ldots\,N-1\\ & -\text{tr}(VDV) \leq \kappa\,\rho\\ & Y\in\mathbb{S}^N_h\\ & D\in\text{EDM}^N\end{array}\qquad(1020)$$

which is the same as (1014), the problem with no explicit constraint on affine dimension. As the present problem is stated, the desired affine dimension ρ yields to the variable scale factor κ; ρ is effectively ignored.

Yet this result is an illuminant for problem (1014) and its equivalents (all the way back to (1007)): When the given measurement matrix H is nonnegative and symmetric, finding the closest EDM D as in problem (1007), (1010), or (1014) implicitly entails minimization of affine dimension (confer §4.8.4, §4.14.4). Those non-rank-constrained problems are each inherently equivalent to cenv(rank)-minimization problem (1020), in other words, and their optimal solutions are unique because of the strictly convex objective function in (1007).

^{7.15} cenv(rank A) on {A ∈ S^n_+ | ‖A‖₂ ≤ κ} equals tr(A)/κ [78] [77]


7.2.2.3 Rank-heuristic insight

Minimization of affine dimension by use of this trace rank-heuristic (1018) tends to find the list configuration of least energy; rather, it tends to optimize compaction of the reconstruction by minimizing total distance. (493) It is best used where some physical equilibrium implies such an energy minimization; e.g., [236, §5].

For this Problem 2, the trace rank-heuristic arose naturally in the objective in terms of V. We observe: V (in contrast to V_N^T) spreads energy over all available distances (§B.4.2 no.20, no.22) although the rank function itself is insensitive to the choice.

7.2.2.4 Rank minimization heuristic beyond convex envelope

Fazel, Hindi, and Boyd [79] propose a rank heuristic more potent than trace (1017) for problems of rank minimization;

$$\text{rank}\,Y \;\leftarrow\; \log\det(Y+\varepsilon I)\qquad(1021)$$

the concave surrogate function log det in place of quasiconcave rank Y (§2.9.2.6.2) when Y ∈ S^n_+ is variable and where ε is a small positive constant. They propose minimization of the surrogate by substituting a sequence comprising infima of a linearized surrogate about the current estimate Y_i; id est, from the first-order Taylor series expansion about Y_i on some open interval of ‖Y‖ (§D.1.6)

$$\log\det(Y+\varepsilon I) \;\approx\; \log\det(Y_i+\varepsilon I) + \text{tr}\big((Y_i+\varepsilon I)^{-1}(Y-Y_i)\big)\qquad(1022)$$

we make the surrogate sequence of infima over bounded convex feasible set C where, for i = 0 ...

$$\arg\inf_{Y\in\mathcal{C}}\,\text{rank}\,Y \;\leftarrow\; \lim_{i\to\infty} Y_{i+1}\qquad(1023)$$

$$Y_{i+1} \,=\, \arg\inf_{Y\in\mathcal{C}}\,\text{tr}\big((Y_i+\varepsilon I)^{-1}\,Y\big)\qquad(1024)$$

Choosing Y₀ = I, the first step becomes equivalent to finding the infimum of tr Y; the trace rank-heuristic (1017). The intuition underlying (1024) is the new term in the argument of trace; specifically, (Y_i + εI)^{-1} weights Y so that relatively small eigenvalues of Y found by the infimum are made even smaller.
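In outline, iteration (1024) looks like the following Matlab sketch, where solve_weighted_trace is a hypothetical helper solving the convex subproblem inf{ tr(Winv Y) : Y ∈ C } by any SDP solver (n and maxiter assumed given):

    epsl = 1e-3;                           % the small constant eps of (1021)
    Y = eye(n);                            % Y0 = I : first pass is the trace heuristic
    for k = 1:maxiter
        Winv = inv(Y + epsl*eye(n));       % weight (Y_i + eps I)^{-1} of (1024)
        Y    = solve_weighted_trace(Winv); % Y_{i+1} = arg inf tr(Winv*Y) over C
    end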


To see that, substitute the nonincreasingly ordered diagonalizations

$$\begin{array}{ll}Y_i+\varepsilon I \,\triangleq\, Q(\Lambda+\varepsilon I)Q^T & \text{(a)}\\ Y \,\triangleq\, U\Upsilon U^T & \text{(b)}\end{array}\qquad(1025)$$

into (1024). Then from (1329) we have,

$$\inf_{\Upsilon\in U^{\star T}\mathcal{C}U^\star}\delta\big((\Lambda+\varepsilon I)^{-1}\big)^T\delta(\Upsilon) \;=\; \inf_{\Upsilon\in U^T\mathcal{C}U}\;\inf_{R^T=R^{-1}}\text{tr}\big((\Lambda+\varepsilon I)^{-1}R^T\Upsilon R\big) \;\leq\; \inf_{Y\in\mathcal{C}}\,\text{tr}\big((Y_i+\varepsilon I)^{-1}Y\big)\qquad(1026)$$

where R ≜ Q^T U in U on the set of orthogonal matrices is a bijection. The role of ε is, therefore, to limit the maximum weight; the smallest entry on the main diagonal of Υ gets the largest weight.

7.2.2.5 Applying log det rank-heuristic to Problem 2

When the log det rank-heuristic is inserted into Problem 2, problem (1020) becomes the problem sequence in i

$$\begin{array}{cl}\underset{D\,,\,Y,\,\kappa}{\text{minimize}} & \kappa\,\rho + 2\,\text{tr}(VYV)\\ \text{subject to} & \begin{bmatrix}d_{jl} & y_{jl}\\ y_{jl} & h_{jl}^2\end{bmatrix} \succeq 0\,,\quad l > j = 1\,\ldots\,N-1\\ & -\text{tr}\big((-VD_iV+\varepsilon I)^{-1}VDV\big) \leq \kappa\,\rho\\ & Y\in\mathbb{S}^N_h\\ & D\in\text{EDM}^N\end{array}\qquad(1027)$$

where D_{i+1} ≜ D⋆ ∈ EDM^N, and D₀ ≜ 11^T − I.

7.2.2.6 Tightening this log det rank-heuristic

Like the trace method, this log det technique for constraining rank offers no provision for meeting a predetermined upper bound ρ. Yet since the eigenvalues of the sum are simply determined, λ(Y_i + εI) = δ(Λ + εI), we may certainly force selected weights to ε^{-1} by manipulating diagonalization (1025a). Empirically we find this sometimes leads to better results, although affine dimension of a solution cannot be guaranteed.


7.3 Third prevalent problem: Projection on EDM cone in d_ij

Reformulating Problem 2 (p.432) in terms of EDM D changes the problem considerably:

$$\left.\begin{array}{cl}\underset{D}{\text{minimize}} & \|D-H\|_F^2\\ \text{subject to} & \text{rank}\,V_N^TDV_N \leq \rho\\ & D\in\text{EDM}^N\end{array}\right\}\quad\text{Problem 3}\qquad(1028)$$

This third prevalent proximity problem is a Euclidean projection of given matrix H on a generally nonconvex subset (when ρ < N−1) of the boundary of the convex cone of Euclidean distance matrices ∂EDM^N relative to subspace S^N_h (Figure 78(d)). Because coordinates of projection are distance-square and H presumably now holds distance-square measurements, numerical solution to Problem 3 is generally different than that of Problem 2.

For the moment, we need make no assumptions regarding measurement matrix H.

7.3.1 Convex case

$$\begin{array}{cl}\underset{D}{\text{minimize}} & \|D-H\|_F^2\\ \text{subject to} & D\in\text{EDM}^N\end{array}\qquad(1029)$$

When the rank constraint disappears (for ρ = N−1), this third problem becomes obviously convex because the feasible set is then the entire EDM cone and because the objective function

$$\|D-H\|_F^2 \,=\, \sum_{i,j}(d_{ij}-h_{ij})^2\qquad(1030)$$

is a strictly convex quadratic in D;^{7.16}

^{7.16} For nonzero Y ∈ S^N_h and some open interval of t ∈ R (§3.1.2.3.2, §D.2.3)

$$\frac{d^2}{dt^2}\,\|(D+tY)-H\|_F^2 \,=\, 2\,\text{tr}\,Y^TY \,>\, 0$$


$$\begin{array}{cl}\underset{D}{\text{minimize}} & \displaystyle\sum_{i,j}\,d_{ij}^2 - 2h_{ij}\,d_{ij} + h_{ij}^2\\ \text{subject to} & D\in\text{EDM}^N\end{array}\qquad(1031)$$

Optimal solution D⋆ is therefore unique, as expected, for this simple projection on the EDM cone.

7.3.1.1 Equivalent semidefinite program, Problem 3, convex case

In the past, this convex problem was solved numerically by means of alternating projection. (Example 7.3.1.1.1) [90] [84] [115, §1] We translate (1031) to an equivalent semidefinite program because we have a good solver:

Assume the given measurement matrix H to be nonnegative and symmetric;^{7.17}

$$H = [h_{ij}] \,\in\, \mathbb{S}^N \cap \mathbb{R}^{N\times N}_+\qquad(1012)$$

We then propose: Problem (1031) is equivalent to the semidefinite program, for ∂ ≜ [d²_ij] = D ◦ D, distance-square squared,

$$\begin{array}{cl}\underset{\partial\,,\,D}{\text{minimize}} & -\text{tr}\big(V(\partial-2H\circ D)V\big)\\ \text{subject to} & \begin{bmatrix}\partial_{ij} & d_{ij}\\ d_{ij} & 1\end{bmatrix} \succeq 0\,,\quad j > i = 1\,\ldots\,N-1\\ & D\in\text{EDM}^N\\ & \partial\in\mathbb{S}^N_h\end{array}\qquad(1032)$$

where

$$\begin{bmatrix}\partial_{ij} & d_{ij}\\ d_{ij} & 1\end{bmatrix} \succeq 0 \quad\Leftrightarrow\quad \partial_{ij} \geq d_{ij}^2\qquad(1033)$$

Symmetry of input H facilitates trace in the objective (§B.4.2 no.20), while its nonnegativity causes ∂_ij → d²_ij as optimization proceeds.

^{7.17} If that H given has negative entries, then the technique of solution presented here becomes invalid. Projection of H on K (956) prior to application of this proposed technique, as explained in §7.0.1, is incorrect.


7.3.1.1.1 Example. Alternating projection on nearest EDM.
By solving (1032) we confirm the result from an example given by Glunt, Hayden, et alii [90, §6] who found an analytical solution to the convex optimization problem (1029) for the particular cardinality N = 3 by using the alternating projection method of von Neumann (§E.10):

$$H = \begin{bmatrix}0&1&1\\1&0&9\\1&9&0\end{bmatrix}\,,\qquad D^\star = \begin{bmatrix}0&\frac{19}{9}&\frac{19}{9}\\[2pt]\frac{19}{9}&0&\frac{76}{9}\\[2pt]\frac{19}{9}&\frac{76}{9}&0\end{bmatrix}\qquad(1034)$$

The original problem (1029) of projecting H on the EDM cone is transformed to an equivalent iterative sequence of projections on the two convex cones (809) from §5.8.1.1. Using ordinary alternating projection, input H goes to D⋆ with an accuracy of four decimal places in about 17 iterations. Affine dimension corresponding to this optimal solution is r = 1. Obviation of semidefinite programming's computational expense is the principal advantage of this alternating projection technique. (A numerical verification sketch appears after §7.3.1.2 below.)

7.3.1.2 Schur-form semidefinite program, Problem 3 convex case

Semidefinite program (1032) can be reformulated by moving the objective function in

$$\begin{array}{cl}\underset{D}{\text{minimize}} & \|D-H\|_F^2\\ \text{subject to} & D\in\text{EDM}^N\end{array}\qquad(1029)$$

to the constraints. This makes an equivalent second-order cone program: for any measurement matrix H

$$\begin{array}{cl}\underset{t\in\mathbb{R}\,,\,D}{\text{minimize}} & t\\ \text{subject to} & \|D-H\|_F^2 \leq t\\ & D\in\text{EDM}^N\end{array}\qquad(1035)$$

We can transform this problem to an equivalent Schur-form semidefinite program; (§A.4.1)

$$\begin{array}{cl}\underset{t\in\mathbb{R}\,,\,D}{\text{minimize}} & t\\ \text{subject to} & \begin{bmatrix}tI & \text{vec}(D-H)\\ \text{vec}(D-H)^T & 1\end{bmatrix} \succeq 0\\ & D\in\text{EDM}^N\end{array}\qquad(1036)$$

characterized by great sparsity and structure. The advantage of this SDP is lack of conditions on input H; e.g., negative entries would invalidate any solution provided by (1032). (§7.0.1.2)
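Example 7.3.1.1.1 is easily verified by solving (1029) directly with an SDP modeling package; a Matlab sketch assuming CVX (our assumption):

    H  = [0 1 1; 1 0 9; 1 9 0];
    N  = 3;
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
    cvx_begin sdp
      variable D(N,N) symmetric
      minimize( norm(D - H,'fro') )
      subject to
        diag(D) == 0;                  % D hollow
        -VN'*D*VN >= 0;                % D is an EDM
    cvx_end
    % expect D = [0 19 19; 19 0 76; 19 76 0]/9 , in agreement with (1034)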


7.3.1.3 Gram-form semidefinite program, Problem 3 convex case

Further, this problem statement may be equivalently written in terms of a Gram matrix via the linear bijective (§4.6.1) EDM operator D(G) (480);

$$\begin{array}{cl}\underset{G\in\mathbb{S}^N_c\,,\;t\in\mathbb{R}}{\text{minimize}} & t\\ \text{subject to} & \begin{bmatrix}tI & \text{vec}(\mathcal{D}(G)-H)\\ \text{vec}(\mathcal{D}(G)-H)^T & 1\end{bmatrix} \succeq 0\\ & G \succeq 0\end{array}\qquad(1037)$$

To include constraints on the list X ∈ R^{n×N}, we would rewrite this:

$$\begin{array}{cl}\underset{G\in\mathbb{S}^N_c\,,\;t\in\mathbb{R}\,,\;X\in\mathbb{R}^{n\times N}}{\text{minimize}} & t\\ \text{subject to} & \begin{bmatrix}tI & \text{vec}(\mathcal{D}(G)-H)\\ \text{vec}(\mathcal{D}(G)-H)^T & 1\end{bmatrix} \succeq 0\\ & \begin{bmatrix}I & X\\ X^T & G\end{bmatrix} \succeq 0\\ & X\in\mathcal{C}\end{array}\qquad(1038)$$

where C is some abstract convex set. This technique is discussed in §4.4.2.2.4.

7.3.1.4 Dual interpretation, projection on EDM cone

From §E.9.1.1 we learn that projection on a convex set has a dual form. In the circumstance K is a convex cone and point x exists exterior to the cone or on its boundary, distance to the nearest point Px in K is found as the optimal value of the objective

$$\|x-Px\| \;=\; \begin{array}[t]{cl}\underset{a}{\text{maximize}} & a^Tx\\ \text{subject to} & \|a\| \leq 1\\ & a\in\mathcal{K}^\circ\end{array}\qquad(1618)$$

where K° is the polar cone.


Applying this result to (1029), we get a convex optimization for any given symmetric matrix H exterior to or on the EDM cone boundary:

$$\begin{array}{cl}\underset{D}{\text{minimize}} & \|D-H\|_F^2\\ \text{subject to} & D\in\text{EDM}^N\end{array}
\quad\equiv\quad
\begin{array}{cl}\underset{A^\circ}{\text{maximize}} & \langle A^\circ,\,H\rangle\\ \text{subject to} & \|A^\circ\|_F \leq 1\\ & A^\circ\in\text{EDM}^{N\circ}\end{array}\qquad(1039)$$

Then from (1620) projection of H on cone EDM^N is

$$D^\star \,=\, H - A^{\circ\star}\langle A^{\circ\star},\,H\rangle\qquad(1040)$$

Critchley proposed, instead, projection on the polar EDM cone in his 1980 thesis [54, p.113]: In that circumstance, by projection on the algebraic complement (§E.9.2.2.1),

$$D^\star \,=\, A^\star\langle A^\star,\,H\rangle\qquad(1041)$$

which is equal to (1040) when A⋆ solves

$$\begin{array}{cl}\underset{A}{\text{maximize}} & \langle A,\,H\rangle\\ \text{subject to} & \|A\|_F = 1\\ & A\in\text{EDM}^N\end{array}\qquad(1042)$$

This projection of symmetric H on polar cone EDM^{N°} can be made a convex problem, of course, by relaxing the equality constraint (‖A‖_F ≤ 1).

7.3.2 Minimization of affine dimension in Problem 3

When the desired affine dimension ρ is diminished, Problem 3 (1028) is difficult to solve [115, §3] because the feasible set in R^{N(N−1)/2} loses convexity. By substituting the rank envelope (1018) into Problem 3, then for any given H we have the relaxed convex problem

$$\begin{array}{cl}\underset{D}{\text{minimize}} & \|D-H\|_F^2\\ \text{subject to} & -\text{tr}(VDV) \leq \kappa\,\rho\\ & D\in\text{EDM}^N\end{array}\qquad(1043)$$

where κ ∈ R_+ is a constant determined by cut-and-try. Given κ, problem (1043) is a convex optimization having unique solution in any desired


affine dimension ρ; an approximation to Euclidean projection on that nonconvex subset of the EDM cone containing EDMs with corresponding affine dimension no greater than ρ.

The SDP equivalent to (1043) does not move κ into the variables as on page 437: for nonnegative symmetric input H

$$\begin{array}{cl}\underset{\partial\,,\,D}{\text{minimize}} & -\text{tr}\big(V(\partial-2H\circ D)V\big)\\ \text{subject to} & \begin{bmatrix}\partial_{ij} & d_{ij}\\ d_{ij} & 1\end{bmatrix} \succeq 0\,,\quad j > i = 1\,\ldots\,N-1\\ & -\text{tr}(VDV) \leq \kappa\,\rho\\ & D\in\text{EDM}^N\\ & \partial\in\mathbb{S}^N_h\end{array}\qquad(1044)$$

That means we will not see equivalence of this cenv(rank)-minimization problem to the non-rank-constrained problems (1031) and (1032), like we saw for its counterpart (1020) in Problem 2.

Another approach to affine dimension minimization is to project instead on the polar EDM cone; discussed in §5.8.1.5.

7.3.3 Constrained affine dimension, Problem 3

When one desires affine dimension diminished further below what can be achieved via cenv(rank)-minimization as in (1044), spectral projection is a natural consideration. Spectral projection is motivated by its successful application to projection on a rank ρ subset of the positive semidefinite cone in §7.1.4. Yet it is wrong here to zero eigenvalues of −VDV or −VGV or a variant to reduce affine dimension, because that particular method is germane to projection on a positive semidefinite cone; zeroing eigenvalues here would place an elbow in the projection path. (confer Figure 102) Problem 3 is instead a projection on the EDM cone, whose associated spectral cone is considerably different. (§4.11.2.3)

We shall now show why direct application of spectral projection to the EDM cone is difficult. Although philosophically questionable to provide a negative example, it is instructive and will hopefully motivate future researchers to overcome the obstacles we encountered.


7.3.3.1 Cayley-Menger form

We use Cayley-Menger composition of the Euclidean distance matrix to solve a problem that is the same as Problem 3 (1028): (§4.7.3.0.1)

$$\begin{array}{cl}\underset{D}{\text{minimize}} & \left\|\begin{bmatrix}0&\mathbf{1}^T\\\mathbf{1}&-D\end{bmatrix}-\begin{bmatrix}0&\mathbf{1}^T\\\mathbf{1}&-H\end{bmatrix}\right\|_F^2\\ \text{subject to} & \text{rank}\begin{bmatrix}0&\mathbf{1}^T\\\mathbf{1}&-D\end{bmatrix} \leq \rho+2\\ & D\in\text{EDM}^N\end{array}\qquad(1045)$$

a projection of H on a generally nonconvex subset (when ρ < N−1) of the Euclidean distance matrix cone boundary rel ∂EDM^N; id est, projection from the EDM cone interior or exterior on a subset of its relative boundary (§5.6, (743)).

Rank of an optimal solution is intrinsically bounded above and below;

$$2 \;\leq\; \text{rank}\begin{bmatrix}0&\mathbf{1}^T\\\mathbf{1}&-D^\star\end{bmatrix} \;\leq\; \rho+2 \;\leq\; N+1\qquad(1046)$$

Our proposed strategy for low-rank solution is projection on that subset of a spectral cone $\lambda\!\left(\begin{bmatrix}0&\mathbf{1}^T\\\mathbf{1}&-\text{EDM}^N\end{bmatrix}\right)$ (§4.11.2) corresponding to affine dimension not in excess of that ρ desired; id est, spectral projection on the convex cone

$$\begin{bmatrix}\mathbb{R}^{\rho+1}_+\\0\\\mathbb{R}_-\end{bmatrix}\cap\,\partial\mathcal{H} \;\subset\; \mathbb{R}^{N+1}\qquad(1047)$$

where

$$\partial\mathcal{H} \,\triangleq\, \{\lambda\in\mathbb{R}^{N+1} \mid \mathbf{1}^T\lambda = 0\}\qquad(1048)$$

This pointed polyhedral cone (1047), to which membership subsumes the rank constraint, has empty interior.

Given desired affine dimension 0 ≤ ρ ≤ N−1 and diagonalization (§A.5) of unknown EDM D

$$\begin{bmatrix}0&\mathbf{1}^T\\\mathbf{1}&-D\end{bmatrix} \,\triangleq\, U\Upsilon U^T \,\in\, \mathbb{S}^{N+1}_h\qquad(1049)$$

and given symmetric H in diagonalization

$$\begin{bmatrix}0&\mathbf{1}^T\\\mathbf{1}&-H\end{bmatrix} \,\triangleq\, Q\Lambda Q^T \,\in\, \mathbb{S}^{N+1}\qquad(1050)$$


having eigenvalues arranged in nonincreasing order, then by (671) problem (1045) is equivalent to

$$\begin{array}{cl}\underset{\Upsilon,\,R}{\text{minimize}} & \|\Upsilon-R^T\Lambda R\|_F^2\\ \text{subject to} & \text{rank}\,\Upsilon \leq \rho+2\\ & \delta^2(\Upsilon) = \Upsilon\\ & \Upsilon\ \text{holds exactly one negative eigenvalue}\\ & \delta(QR\Upsilon R^TQ^T) = 0\\ & R^{-1} = R^T\end{array}\qquad(1051)$$

where

$$R \,\triangleq\, Q^TU \,\in\, \mathbb{R}^{N+1\times N+1}\qquad(1052)$$

in U on the set of orthogonal matrices is a bijection. Two eigenvalue constraints can be precisely and equivalently restated in terms of spectral cone (1047):

$$\begin{array}{cl}\underset{\Upsilon,\,R}{\text{minimize}} & \|\Upsilon-R^T\Lambda R\|_F^2\\ \text{subject to} & \delta(\Upsilon)\in\begin{bmatrix}\mathbb{R}^{\rho+1}_+\\0\\\mathbb{R}_-\end{bmatrix}\cap\,\partial\mathcal{H}\\ & \delta(QR\Upsilon R^TQ^T) = 0\\ & R^{-1} = R^T\end{array}\qquad(1053)$$

where ∂H insures one negative eigenvalue. Diagonal constraint δ(QRΥR^TQ^T) = 0 makes problem (1053) difficult by making the two variables dependent. Without it, the problem could be solved exactly by (1301), reducing it to a sequence of two infima. Our strategy is to instead divide problem (1053) into two and then alternate their solution:

$$\begin{array}{cl}\underset{\Upsilon}{\text{minimize}} & \big\|\delta(\Upsilon)-\pi\big(\delta(R^T\Lambda R)\big)\big\|^2\\ \text{subject to} & \delta(\Upsilon)\in\begin{bmatrix}\mathbb{R}^{\rho+1}_+\\0\\\mathbb{R}_-\end{bmatrix}\cap\,\partial\mathcal{H}\end{array}\qquad\text{(a)}$$

$$\begin{array}{cl}\underset{R}{\text{minimize}} & \|\Upsilon-R^T\Lambda R\|_F^2\\ \text{subject to} & \delta(QR\Upsilon R^TQ^T) = 0\\ & R^{-1} = R^T\end{array}\qquad\text{(b)}\qquad(1054)$$


where π is the permutation operator from §7.1.3 arranging its vector argument in nonincreasing order. (Recall, any permutation matrix is an orthogonal matrix.)

We justify disappearance of the equality constraint in convex optimization problem (1054a): From the arguments in §7.1.3 with regard to π the permutation operator, cone membership constraint δ(Υ) ∈ [R^{ρ+1}_+ ; 0 ; R_−] ∩ ∂H is equivalent to

    δ(Υ) ∈ [R^{ρ+1}_+ ; 0 ; R_−] ∩ ∂H ∩ K_M          (1055)

where K_M is the monotone cone (§2.13.9.4.2). Membership to the polyhedral cone of majorization (§A.1.1 no.27)

    K*_λδ = ∂H ∩ K*_M+          (1070)

where K*_M+ is the dual monotone nonnegative cone (§2.13.9.4.1), is a condition (in the absence of a main-diagonal δ constraint) insuring existence of a symmetric hollow matrix [0 1ᵀ ; 1 −D]. Intersection of this feasible superset [R^{ρ+1}_+ ; 0 ; R_−] ∩ ∂H ∩ K_M with K*_λδ is a benign operation; id est,

    ∂H ∩ K*_M+ ∩ K_M = ∂H ∩ K_M          (1056)

verifiable by observing conic dependencies (§2.10.3) among the aggregate of halfspace-description normals. The cone membership constraint in (1054a) therefore insures existence of a symmetric hollow matrix.

Optimization (1054b) would be a Procrustes problem (§C.5) were it not for the equality constraint; it is, instead, a minimization over the intersection of the nonconvex manifold of orthogonal matrices with another nonconvex set specified by the equality constraint. That is the first obstacle.

The second obstacle is deriving conditions for convergence of the iteration to the global minimum.


7.4 Conclusion

At the present time, the compounded difficulties presented by existence of a rank constraint in problems formulated as spectral projection appear insurmountable. [237, §4] There has been little progress since the discovery by Eckart & Young in 1936 [72] of a formula for spectral projection on a rank-ρ subset (§2.9.2.1) of the positive semidefinite cone. The only spectral method presently available for solving proximity problems, having a constraint on rank, is based on their discovery (Problem 1, §7.1, §4.13).

The importance and application of solving rank-constrained problems are enormous, a conclusion generally accepted gratis by the mathematics and engineering communities. The ability to handle rank constraints makes polynomial constraints (generally nonconvex) transformable to convex constraints, for example. Suppose we require the constraint (Orsi)

    3 + 2x − xy ≤ 0          (1057)

Define a new variable

    X ≜ [1 ; x ; y][1  x  y] ∈ S³          (1058)

Then the polynomial constraint (1057) is equivalent to the constraint set

    tr(AX) ≤ 0
    X₁₁ = 1
    X ≽ 0
    rank X = 1          (1059)

where

    A = [3  1  0 ; 1  0  −1/2 ; 0  −1/2  0]          (1060)

and then the method of §6.4 is applied to implement the rank constraint. We therefore encourage further research be directed to that method.
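The lifting (1057)-(1060) is easy to spot-check numerically; a minimal Matlab sketch (our own construction, variable names ours) confirming tr(AX) reproduces the polynomial for random scalars:

    % verify the lifting: tr(A*X) equals 3 + 2x - xy for X from (1058)
    x = randn;  y = randn;
    X = [1; x; y]*[1 x y];                 % rank-one lifting (1058)
    A = [3 1 0; 1 0 -1/2; 0 -1/2 0];       % matrix from (1060)
    assert(abs(trace(A*X) - (3 + 2*x - x*y)) < 1e-10)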




Appendix A

Linear algebra

A.1 Main-diagonal δ operator, trace, vec

We introduce notation δ denoting the main-diagonal linear self-adjoint operator. When linear function δ operates on a square matrix A ∈ R^{N×N}, δ(A) returns a vector composed of all the entries from the main diagonal in the natural order;

    δ(A) ∈ R^N          (1061)

Operating on a vector, δ naturally returns a diagonal matrix. Operating recursively on a vector Λ ∈ R^N or diagonal matrix Λ ∈ R^{N×N}, δ(δ(Λ)) returns Λ itself;

    δ²(Λ) ≡ δ(δ(Λ)) ≜ Λ          (1062)

Defined in this manner, main-diagonal linear operator δ is self-adjoint [147, §3.10, §9.5-1];^{A.1} videlicet, for y ∈ R^N (§2.2)

    δ(A)ᵀy = 〈δ(A), y〉 = 〈A, δ(y)〉 = tr(Aᵀδ(y))          (1063)

A.1 Linear operator T : R^{m×n} → R^{M×N} is self-adjoint when, for each and every X₁, X₂ ∈ R^{m×n}, 〈T(X₁), X₂〉 = 〈X₁, T(X₂)〉.


A.1.1 Identities

This δ notation is efficient and unambiguous as illustrated in the following examples where A∘B denotes Hadamard product [131] [93, §1.1.4] of matrices of like size, ⊗ the Kronecker product (§D.1.2.1), y a vector, X a matrix, e_i the i-th member of the standard basis for R^n, S^N_h the symmetric hollow subspace, σ a vector of (nonincreasingly) ordered singular values, and λ denotes a vector of nonincreasingly ordered eigenvalues:

1. δ(A) = δ(Aᵀ)
2. tr(A) = tr(Aᵀ) = δ(A)ᵀ1
3. 〈I, A〉 = tr A
4. δ(cA) = c δ(A) ,  c ∈ R
5. tr(c√(AᵀA)) = c tr√(AᵀA) = c 1ᵀσ(A) ,  c ∈ R
6. tr(cA) = c tr(A) = c 1ᵀλ(A) ,  c ∈ R
7. δ(A + B) = δ(A) + δ(B)
8. tr(A + B) = tr(A) + tr(B)
9. δ(AB) = (A ∘ Bᵀ)1 = (Bᵀ ∘ A)1
10. δ(AB)ᵀ = 1ᵀ(Aᵀ ∘ B) = 1ᵀ(B ∘ Aᵀ)
11. δ(uvᵀ) = [u₁v₁ ; ... ; u_N v_N] = u ∘ v ,  u, v ∈ R^N
12. tr(AᵀB) = tr(ABᵀ) = tr(BAᵀ) = tr(BᵀA) = 1ᵀ(A∘B)1 = 1ᵀδ(ABᵀ) = δ(AᵀB)ᵀ1 = δ(BAᵀ)ᵀ1 = δ(BᵀA)ᵀ1
13. For D = [d_ij] ∈ S^N_h , H = [h_ij] ∈ S^N_h , V = I − (1/N)11ᵀ ∈ S^N (confer §B.4.2 no.20)
    N tr(−V(D∘H)V) = tr(DᵀH) = 1ᵀ(D∘H)1 = tr(11ᵀ(D∘H)) = Σ_{i,j} d_ij h_ij
14. yᵀB δ(A) = tr(B δ(A) yᵀ) = tr(δ(Bᵀy) A) = tr(A δ(Bᵀy)) = δ(A)ᵀBᵀy = tr(y δ(A)ᵀ Bᵀ) = tr(Aᵀ δ(Bᵀy)) = tr(δ(Bᵀy) Aᵀ)


15. δ²(AᵀA) = Σ_i e_i e_iᵀ AᵀA e_i e_iᵀ
16. δ(δ(A)1ᵀ) = δ(1 δ(A)ᵀ) = δ(A)
17. δ(A1)1 = δ(A11ᵀ) = A1 ,  δ(y)1 = δ(y1ᵀ) = y
18. δ(I1) = δ(1) = I
19. δ(e_i e_jᵀ 1) = δ(e_i) = e_i e_iᵀ
20. vec(AXB) = (Bᵀ ⊗ A) vec X
21. vec(BXA) = (Aᵀ ⊗ B) vec X
22. tr(AXBXᵀ) = vec(X)ᵀ vec(AXB) = vec(X)ᵀ(Bᵀ ⊗ A) vec X  [99]
23. tr(AXᵀBX) = vec(X)ᵀ vec(BXA) = vec(X)ᵀ(Aᵀ ⊗ B) vec X = δ(vec(X) vec(X)ᵀ (Aᵀ ⊗ B))ᵀ 1
24. For ζ = [ζ_i] ∈ R^k and x = [x_i] ∈ R^k ,  Σ_i ζ_i/x_i = ζᵀδ(x)⁻¹1
25. Let λ(A) ∈ C^N denote the eigenvalues of A ∈ R^{N×N}. Then
    δ(A) = λ(I∘A)          (1064)
26. For any permutation matrix Ξ and dimensionally compatible vector y or matrix A
    δ(Ξy) = Ξ δ(y) Ξᵀ          (1065)
    δ(ΞAΞᵀ) = Ξ δ(A)          (1066)
    So given any permutation matrix Ξ and any dimensionally compatible matrix B , for example,
    δ²(B) = Ξ δ²(ΞᵀB Ξ) Ξᵀ          (1067)
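Several of these identities are convenient to verify numerically; a minimal Matlab sketch (our own construction) checking no.9 and no.20:

    % no.9:  delta(A*B) = (A o B')*ones     (diag plays the role of delta)
    N = 5;  A = randn(N);  B = randn(N);  X = randn(N);
    assert(norm(diag(A*B) - (A.*B.')*ones(N,1)) < 1e-10)
    % no.20: vec(A*X*B) = kron(B', A)*vec(X)
    assert(norm(reshape(A*X*B, [], 1) - kron(B.', A)*X(:)) < 1e-10)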


27. Theorem (Schur). Majorization. [269, §7.4] [131, §4.3] [132, §5.5]
Let λ ∈ R^N denote a vector of eigenvalues and let δ ∈ R^N denote a vector of main-diagonal entries, both arranged in nonincreasing order. Then

    ∃ A ∈ S^N  λ(A) = λ and δ(A) = δ  ⇐  λ − δ ∈ K*_λδ          (1068)

and conversely

    A ∈ S^N  ⇒  λ(A) − δ(A) ∈ K*_λδ          (1069)

the pointed (empty-interior) polyhedral cone of majorization (confer (263))

    K*_λδ ≜ K*_M+ ∩ {ζ1 | ζ ∈ R}*          (1070)

where K*_M+ is the dual monotone nonnegative cone (360), and where the dual of the line is a hyperplane; ∂H = {ζ1 | ζ ∈ R}* = 1^⊥.  ⋄

The majorization cone K*_λδ is naturally consequent to the definition of majorization; id est, vector y ∈ R^N majorizes vector x if and only if

    Σ_{i=1}^k x_i ≤ Σ_{i=1}^k y_i   ∀ 1 ≤ k ≤ N          (1071)

and

    1ᵀx = 1ᵀy          (1072)

Under these circumstances, rather, vector x is majorized by vector y .

In the particular circumstance δ(A) = 0, we get:

Corollary. Symmetric hollow majorization.
Let λ ∈ R^N denote a vector of eigenvalues arranged in nonincreasing order. Then

    ∃ A ∈ S^N_h  λ(A) = λ  ⇐  λ ∈ K*_λδ          (1073)

and conversely

    A ∈ S^N_h  ⇒  λ(A) ∈ K*_λδ          (1074)

where K*_λδ is defined in (1070).  ⋄


A.2 Semidefiniteness: domain of test

The most fundamental necessary, sufficient, and definitive test for positive semidefiniteness of matrix A ∈ R^{n×n} is: [132, §1]

    xᵀAx ≥ 0 for each and every x ∈ R^n such that ‖x‖ = 1          (1075)

Traditionally, authors demand evaluation over a broader domain; namely, over all x ∈ R^n, which is sufficient but unnecessary. Indeed, that standard textbook requirement is far over-reaching because if xᵀAx is nonnegative for particular x = x_p , then it is nonnegative for any αx_p where α ∈ R . Thus, only normalized x in R^n need be evaluated.

Many authors add the further requirement that the domain be complex; the broadest domain. By so doing, only Hermitian matrices (A^H = A where superscript H denotes conjugate transpose)^{A.2} are admitted to the set of positive semidefinite matrices (1078); an unnecessarily prohibitive condition.

A.2.1 Symmetry versus semidefiniteness

We call (1075) the most fundamental test of positive semidefiniteness. Yet some authors instead say, for real A and complex domain (x ∈ C^n), the complex test x^H Ax ≥ 0 is most fundamental. That complex broadening of the domain of test causes nonsymmetric real matrices to be excluded from the set of positive semidefinite matrices. Yet admitting nonsymmetric real matrices or not is a matter of preference^{A.3} unless that complex test is adopted, as we shall now explain.

Any real square matrix A has a representation in terms of its symmetric and antisymmetric parts; id est,

    A = (A + Aᵀ)/2 + (A − Aᵀ)/2          (43)

Because, for all real A , the antisymmetric part vanishes under real test,

    xᵀ ((A − Aᵀ)/2) x = 0          (1076)

A.2 Hermitian symmetry is the complex analogue; the real part of a Hermitian matrix is symmetric while its imaginary part is antisymmetric. A Hermitian matrix has real eigenvalues and real main diagonal.
A.3 Golub & Van Loan [93, §4.2.2], for example, admit nonsymmetric real matrices.


only the symmetric part of A , (A + Aᵀ)/2, has a role determining positive semidefiniteness. Hence the oft-made presumption that only symmetric matrices may be positive semidefinite is, of course, erroneous under (1075). Because eigenvalue-signs of a symmetric matrix translate unequivocally to its semidefiniteness, the eigenvalues that determine semidefiniteness are always those of the symmetrized matrix. (§A.3) For that reason, and because symmetric (or Hermitian) matrices must have real eigenvalues, the convention adopted in the literature is that semidefinite matrices are synonymous with symmetric semidefinite matrices. Certainly misleading under (1075), that presumption is typically bolstered with compelling examples from the physical sciences where symmetric matrices occur within the mathematical exposition of natural phenomena.^{A.4} [81, §52]

Perhaps a better explanation of this pervasive presumption of symmetry comes from Horn & Johnson [131, §7.1] whose perspective^{A.5} is the complex matrix, thus necessitating the complex domain of test throughout their treatise. They explain, if A ∈ C^{n×n}

    ...and if x^H Ax is real for all x ∈ C^n, then A is Hermitian. Thus, the assumption that A is Hermitian is not necessary in the definition of positive definiteness. It is customary, however.

Their comment is best explained by noting, the real part of x^H Ax comes from the Hermitian part (A + A^H)/2 of A ;

    Re(x^H Ax) = x^H ((A + A^H)/2) x          (1077)

rather,

    x^H Ax ∈ R ⇔ A^H = A          (1078)

because the imaginary part of x^H Ax comes from the anti-Hermitian part (A − A^H)/2 ;

    Im(x^H Ax) = x^H ((A − A^H)/2) x          (1079)

that vanishes for nonzero x if and only if A = A^H. So the Hermitian symmetry assumption is unnecessary, according to Horn & Johnson, not

A.4 Symmetric matrices are certainly pervasive in our chosen subject as well.
A.5 A totally complex perspective is not necessarily more advantageous. The positive semidefinite cone, for example, is not self-dual (§2.13.5) in the ambient space of Hermitian matrices. [126, §II]


because nonHermitian matrices could be regarded positive semidefinite, rather because nonHermitian (includes nonsymmetric real) matrices are not comparable on the real line under x^H Ax . Yet that complex edifice is dismantled in the test of real matrices (1075) because the domain of test is no longer necessarily complex; meaning, xᵀAx will certainly always be real, regardless of symmetry, and so real A will always be comparable.

In summary, if we limit the domain of test to x in R^n as in (1075), then nonsymmetric real matrices are admitted to the realm of semidefinite matrices because they become comparable on the real line. One important exception occurs for rank-one matrices Ψ = uvᵀ where u and v are real vectors: Ψ is positive semidefinite if and only if Ψ = uuᵀ. (§A.3.1.0.7)

We might choose to expand the domain of test to x in C^n so that only symmetric matrices would be comparable. The alternative to expanding the domain of test is to assume all matrices of interest to be symmetric; that is commonly done, hence the synonymous relationship with semidefinite matrices.

A.2.1.0.1 Example. Nonsymmetric positive definite product.
Horn & Johnson assert and Zhang agrees:

    If A, B ∈ C^{n×n} are positive definite, then we know that the product AB is positive definite if and only if AB is Hermitian. [131, §7.6, prob.10] [269, §6.2, §3.2]

Implicit in their statement, A and B are assumed individually Hermitian and the domain of test is assumed complex.

We prove that assertion to be false for real matrices under (1075) that adopts a real domain of test:

    A = Aᵀ = [3 0 −1 0 ; 0 5 1 0 ; −1 1 4 1 ; 0 0 1 4] ,   λ(A) = [5.9 ; 4.5 ; 3.4 ; 2.0]          (1080)

    B = Bᵀ = [4 4 −1 −1 ; 4 5 0 0 ; −1 0 5 1 ; −1 0 1 4] ,   λ(B) = [8.8 ; 5.5 ; 3.3 ; 0.24]          (1081)


    (AB)ᵀ ≠ AB = [13 12 −8 −4 ; 19 25 5 1 ; −5 1 22 9 ; −5 0 9 17] ,   λ(AB) = [36. ; 29. ; 10. ; 0.72]          (1082)

    ½(AB + (AB)ᵀ) = [13 15.5 −6.5 −4.5 ; 15.5 25 3 0.5 ; −6.5 3 22 9 ; −4.5 0.5 9 17] ,   λ(½(AB + (AB)ᵀ)) = [36. ; 30. ; 10. ; 0.014]          (1083)

Whenever A ∈ S^n_+ and B ∈ S^n_+ , then λ(AB) = λ(A^{1/2}BA^{1/2}) will always be a nonnegative vector by (1104) and Corollary A.3.1.0.5. Yet positive definiteness of the product AB is certified instead by the nonnegative eigenvalues λ(½(AB + (AB)ᵀ)) in (1083) (§A.3.1.0.1) despite the fact AB is not symmetric.^{A.6} Horn & Johnson and Zhang resolve the anomaly by choosing to exclude nonsymmetric matrices and products; they do so by expanding the domain of test to C^n.

A.3 Proper statements of positive semidefiniteness

Unlike Horn & Johnson and Zhang, we never adopt the complex domain of test in regard to real matrices. So motivated is our consideration of proper statements of positive semidefiniteness under real domain of test. This restriction, ironically, complicates the facts when compared to the corresponding statements for the complex case (found elsewhere, [131] [269]).

We state several fundamental facts regarding positive semidefiniteness of real matrix A and the product AB and sum A + B of real matrices under fundamental real test (1075); a few require proof as they depart from the standard texts, while those remaining are well established or obvious.

A.6 It is a little more difficult to find a counterexample in R^{2×2} or R^{3×3}; which may have served to advance any confusion.


A.3.0.0.1 Theorem. Positive (semi)definite matrix.
A ∈ S^M is positive semidefinite if and only if for each and every real vector x of unit norm, ‖x‖ = 1,^{A.7} we have xᵀAx ≥ 0 (1075);

    A ≽ 0 ⇔ tr(xxᵀA) = xᵀAx ≥ 0          (1084)

Matrix A ∈ S^M is positive definite if and only if for each and every ‖x‖ = 1 we have xᵀAx > 0 ;

    A ≻ 0 ⇔ tr(xxᵀA) = xᵀAx > 0          (1085)
⋄

Proof. Statements (1084) and (1085) are each a particular instance of dual generalized inequalities (§2.13.2) with respect to the positive semidefinite cone; videlicet, [241]

    A ≽ 0 ⇔ 〈xxᵀ, A〉 ≥ 0  ∀ xxᵀ (≽ 0)
    A ≻ 0 ⇔ 〈xxᵀ, A〉 > 0  ∀ xxᵀ (≽ 0), xxᵀ ≠ 0          (1086)

Relations (1084) and (1085) remain true when xxᵀ is replaced with "for each and every" X ∈ S^M_+ [41, §2.6.1] (§2.13.5) of unit norm ‖X‖ = 1 as in

    A ≽ 0 ⇔ tr(XA) ≥ 0  ∀ X ∈ S^M_+
    A ≻ 0 ⇔ tr(XA) > 0  ∀ X ∈ S^M_+ , X ≠ 0          (1087)

but this condition is far more than what is necessary. By the discrete membership theorem in §2.13.4.2.1, the extreme directions xxᵀ of the positive semidefinite cone constitute a minimal set of generators necessary and sufficient for discretization of dual generalized inequalities (1087) certifying membership to that cone. ∎

A.7 The traditional condition requiring all x ∈ R^M for defining positive (semi)definiteness is actually far more than what is necessary. The set of norm-1 vectors is necessary and sufficient to establish positive semidefiniteness; actually, any particular norm and any nonzero norm-constant will work.
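The test is easy to exercise numerically; a minimal Matlab sketch, reusing the matrices of Example A.2.1.0.1, certifies positive definiteness of the nonsymmetric product AB via eigenvalues of its symmetrized part:

    A = [3 0 -1 0; 0 5 1 0; -1 1 4 1; 0 0 1 4];          % (1080)
    B = [4 4 -1 -1; 4 5 0 0; -1 0 5 1; -1 0 1 4];        % (1081)
    G = A*B;
    assert(norm(G - G.', 'fro') > 0)                     % AB is not symmetric
    assert(min(eig((G + G.')/2)) > 0)                    % yet x'*G*x > 0 for x ~= 0
    % spot-check (1075) over random unit-norm vectors
    for i = 1:1000, x = randn(4,1); x = x/norm(x); assert(x.'*G*x > 0), end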


A.3.1 Semidefiniteness, eigenvalues, nonsymmetric

When A ∈ R^{n×n}, let λ(½(A + Aᵀ)) ∈ R^n denote eigenvalues of the symmetrized matrix^{A.8} arranged in nonincreasing order.

• By positive semidefiniteness of A ∈ R^{n×n} we mean,^{A.9} [179, §1.3.1] (confer §A.3.1.0.1)

    xᵀAx ≥ 0 ∀ x ∈ R^n ⇔ A + Aᵀ ≽ 0 ⇔ λ(A + Aᵀ) ≽ 0          (1088)

(§2.9.0.1)

    A ≽ 0 ⇒ Aᵀ = A          (1089)
    A ≽ B ⇔ A − B ≽ 0 ⇏ A ≽ 0 or B ≽ 0          (1090)
    xᵀAx ≥ 0 ∀x ⇏ Aᵀ = A          (1091)

• Matrix symmetry is not intrinsic to positive semidefiniteness;

    Aᵀ = A , λ(A) ≽ 0 ⇒ xᵀAx ≥ 0 ∀x          (1092)

If Aᵀ = A then

    λ(A) ≽ 0 ⇐ Aᵀ = A , xᵀAx ≥ 0 ∀x          (1093)
    λ(A) ≽ 0 ⇔ A ≽ 0          (1094)

meaning, matrix A belongs to the positive semidefinite cone in the subspace of symmetric matrices if and only if its eigenvalues belong to the nonnegative orthant.

    〈A, A〉 = 〈λ(A), λ(A)〉          (1095)

• For µ ∈ R , A ∈ R^{n×n}, and vector λ(A) ∈ C^n holding the ordered eigenvalues of A

    λ(I + µA) = 1 + λ(µA)          (1096)

Proof: A = MJM⁻¹ and I + µA = M(I + µJ)M⁻¹ where J is the Jordan form for A ; [220, §5.6] id est, δ(J) = λ(A) , so λ(I + µA) = δ(I + µJ) because I + µJ is also a Jordan form. ∎

A.8 The symmetrization of A is (A + Aᵀ)/2. λ(½(A + Aᵀ)) = λ(A + Aᵀ)/2.
A.9 Strang agrees [220, p.334] it is not λ(A) that requires observation. Yet he is mistaken by proposing the Hermitian part alone x^H(A + A^H)x be tested, because the anti-Hermitian part does not vanish under complex test unless A is Hermitian. (1079)


Similarly, λ(µI + A) = µ1 + λ(A). For vector σ(A) holding the singular values of any matrix A , σ(I + µAᵀA) = π(|1 + µσ(AᵀA)|) and σ(µI + AᵀA) = π(|µ1 + σ(AᵀA)|) where π is a nonlinear operator sorting its vector argument into nonincreasing order.

• For A ∈ S^M and each and every ‖w‖ = 1 [131, §7.7, prob.9]

    wᵀAw ≤ µ ⇔ A ≼ µI ⇔ λ(A) ≼ µ1          (1097)

• [131, §2.5.4] (confer (35))

    A is normal ⇔ ‖A‖²_F = λ(A)ᵀλ(A)          (1098)

• For A ∈ R^{m×n}

    AᵀA ≽ 0 ,  AAᵀ ≽ 0          (1099)

because, for dimensionally compatible vector x , xᵀAᵀAx = ‖Ax‖²₂ and xᵀAAᵀx = ‖Aᵀx‖²₂.

• For A ∈ R^{n×n} and c ∈ R

    tr(cA) = c tr(A) = c 1ᵀλ(A)     (§A.1.1 no.6)

• det A = Π_i λ(A)_i . For m a nonnegative integer,

    tr(A^m) = Σ_{i=1}^n λ_i^m          (1100)

• For A diagonalizable, (confer [220, p.255])

    rank A = rank δ(λ(A))          (1101)

meaning, rank is the same as the number of nonzero eigenvalues in vector λ(A) by the 0 eigenvalues theorem (§A.7.2.0.1).


• (Fan) For A, B ∈ S^n [38, §1.2] (confer (1365))

    tr(AB) ≤ λ(A)ᵀλ(B)          (1102)

with equality (Theobald) when A and B are simultaneously diagonalizable [131] with the same ordering of eigenvalues.

• For A ∈ R^{m×n} and B ∈ R^{n×m}

    tr(AB) = tr(BA)          (1103)

and η eigenvalues of the product and commuted product are identical, including their multiplicity; [131, §1.3.20] id est,

    λ(AB)_{1:η} = λ(BA)_{1:η} ,   η ≜ min{m, n}          (1104)

Any eigenvalues remaining are zero. By the 0 eigenvalues theorem (§A.7.2.0.1),

    rank(AB) = rank(BA) ,  AB and BA diagonalizable          (1105)

• For any compatible matrices A, B [131, §0.4]

    min{rank A, rank B} ≥ rank(AB)          (1106)

• For A, B ∈ S^n_+ (212)

    rank A + rank B ≥ rank(A + B) ≥ min{rank A, rank B} ≥ rank(AB)          (1107)

• For A, B ∈ S^n_+ linearly independent, each rank-1, (§B.1.1)

    rank A + rank B = rank(A + B) > min{rank A, rank B} = rank(AB)          (1108)


• For A ∈ R^{m×n} having no nullspace, and for any B ∈ R^{n×k}

    rank(AB) = rank(B)          (1109)

Proof. For any compatible matrix C , N(CAB) ⊇ N(AB) ⊇ N(B) is obvious. By assumption ∃ A†  A†A = I . Let C = A† ; then N(AB) = N(B) and the stated result follows by conservation of dimension (1210). ∎

• For A ∈ S^n and any nonsingular matrix Y

    inertia(A) = inertia(YAYᵀ)          (1110)

a.k.a. Sylvester's law of inertia. (1151) [66, §2.4.3]

• For A, B ∈ R^{n×n} square,

    det(AB) = det(BA)          (1111)

Yet for A ∈ R^{m×n} and B ∈ R^{n×m} [48, p.72]

    det(I + AB) = det(I + BA)          (1112)

• For A, B ∈ S^n , product AB is symmetric if and only if AB is commutative;

    (AB)ᵀ = AB ⇔ AB = BA          (1113)

Proof. (⇒) Suppose AB = (AB)ᵀ. Since (AB)ᵀ = BᵀAᵀ = BA , then AB = (AB)ᵀ ⇒ AB = BA .
(⇐) Suppose AB = BA . Then BA = BᵀAᵀ = (AB)ᵀ, so AB = BA ⇒ AB = (AB)ᵀ. ∎

Commutativity alone is insufficient for symmetry of the product. [220, p.26]

• Diagonalizable matrices A, B ∈ R^{n×n} commute if and only if they are simultaneously diagonalizable. [131, §1.3.12]


• For A, B ∈ R^{n×n} and AB = BA

    xᵀAx ≥ 0, xᵀBx ≥ 0 ∀x ⇒ λ(A+Aᵀ)_i λ(B+Bᵀ)_i ≥ 0 ∀i ⇏ xᵀABx ≥ 0 ∀x          (1114)

the negative result arising because of the schism between the product of eigenvalues λ(A+Aᵀ)_i λ(B+Bᵀ)_i and the eigenvalues of the symmetrized matrix product λ(AB + (AB)ᵀ)_i . For example, X² is generally not positive semidefinite unless matrix X is symmetric; then (1099) applies. Simply substituting symmetric matrices changes the outcome:

• For A, B ∈ S^n and AB = BA

    A ≽ 0, B ≽ 0 ⇒ λ(AB)_i = λ(A)_i λ(B)_i ≥ 0 ∀i ⇔ AB ≽ 0          (1115)

Positive semidefiniteness of A and B is sufficient but not a necessary condition for positive semidefiniteness of the product AB .

Proof. Because all symmetric matrices are diagonalizable, (§A.5.2) [220, §5.6] we have A = SΛSᵀ and B = T∆Tᵀ, where Λ and ∆ are real diagonal matrices while S and T are orthogonal matrices. Because (AB)ᵀ = AB , then T must equal S , [131, §1.3] and the eigenvalues of A are ordered identically to those of B ; id est, λ(A)_i = δ(Λ)_i and λ(B)_i = δ(∆)_i correspond to the same eigenvector.
(⇒) Assume λ(A)_i λ(B)_i ≥ 0 for i = 1...n . AB = SΛ∆Sᵀ is symmetric and has nonnegative eigenvalues contained in diagonal matrix Λ∆ by assumption; hence positive semidefinite by (1088). Now assume A, B ≽ 0. That, of course, implies λ(A)_i λ(B)_i ≥ 0 for all i because all the individual eigenvalues are nonnegative.
(⇐) Suppose AB = SΛ∆Sᵀ ≽ 0. Then Λ∆ ≽ 0 by (1088), and so all products λ(A)_i λ(B)_i must be nonnegative; meaning, sgn(λ(A)) = sgn(λ(B)). We may, therefore, conclude nothing about the semidefiniteness of A and B . ∎

• For A, B ∈ S^n and A ≽ 0, B ≽ 0 (Example A.2.1.0.1)

    AB = BA ⇒ λ(AB)_i = λ(A)_i λ(B)_i ≥ 0 ∀i ⇒ AB ≽ 0          (1116)
    AB = BA ⇒ λ(AB)_i ≥ 0 , λ(A)_i λ(B)_i ≥ 0 ∀i ⇔ AB ≽ 0          (1117)


• For A, B ∈ S^n [269, §6.2]

    A ≽ 0 ⇒ tr A ≥ 0          (1118)
    A ≽ 0, B ≽ 0 ⇒ tr A tr B ≥ tr(AB) ≥ 0          (1119)

Because A ≽ 0, B ≽ 0 ⇒ λ(AB) = λ(A^{1/2}BA^{1/2}) ≽ 0 by (1104) and Corollary A.3.1.0.5, then we have tr(AB) ≥ 0.

• For A, B, C ∈ S^n (Löwner)

    A ≽ 0 ⇔ tr(AB) ≥ 0  ∀ B ≽ 0          (306)
    A ≼ B , B ≼ C ⇒ A ≼ C          (1120)
    A ≼ B ⇔ A + C ≼ B + C          (1121)
    A ≼ B , A ≽ B ⇒ A = B          (1122)

• For A, B ∈ R^{n×n}

    xᵀAx ≥ xᵀBx ∀x ⇒ tr A ≥ tr B          (1123)

Proof. xᵀAx ≥ xᵀBx ∀x ⇔ λ((A − B) + (A − B)ᵀ)/2 ≽ 0 ⇒ tr(A + Aᵀ − (B + Bᵀ))/2 = tr(A − B) ≥ 0. There is no converse. ∎

• For A, B ∈ S^n [269, §6.2, prob.1] (Theorem A.3.1.0.4)

    A ≽ B ⇒ tr A ≥ tr B          (1124)
    A ≽ B ⇒ δ(A) ≽ δ(B)          (1125)

There is no converse, and restriction to the positive semidefinite cone does not improve the situation. The all-strict versions hold. From [269, §6.2]

    A ≽ B ≽ 0 ⇒ rank A ≥ rank B          (1126)
    A ≽ B ≽ 0 ⇒ det A ≥ det B ≥ 0          (1127)
    A ≻ B ≽ 0 ⇒ det A > det B ≥ 0          (1128)

• For A, B ∈ int S^n_+ [26, §4.2] [131, §7.7.4]

    A ≽ B ⇔ A⁻¹ ≼ B⁻¹          (1129)


• For A, B ∈ S^n [269, §6.2]

    A ≽ B ≽ 0 ⇒ A^{1/2} ≽ B^{1/2}          (1130)

• For A, B ∈ S^n and AB = BA [269, §6.2, prob.3]

    A ≽ B ≽ 0 ⇒ A^k ≽ B^k ,  k = 1, 2, ...          (1131)

A.3.1.0.1 Theorem. Positive semidefinite ordering of eigenvalues.
For A, B ∈ R^{M×M}, place the eigenvalues of each symmetrized matrix into the respective vectors λ(½(A+Aᵀ)), λ(½(B+Bᵀ)) ∈ R^M. Then, [220, §6]

    xᵀAx ≥ 0 ∀x ⇔ λ(A + Aᵀ) ≽ 0          (1132)
    xᵀAx > 0 ∀x ≠ 0 ⇔ λ(A + Aᵀ) ≻ 0          (1133)

because xᵀ(A − Aᵀ)x = 0. (1076) Now arrange the entries of λ(½(A+Aᵀ)) and λ(½(B+Bᵀ)) in nonincreasing order so λ(½(A+Aᵀ))₁ holds the largest eigenvalue of symmetrized A while λ(½(B+Bᵀ))₁ holds the largest eigenvalue of symmetrized B , and so on. Then [131, §7.7, prob.1, prob.9] for κ ∈ R

    xᵀAx ≥ xᵀBx ∀x ⇒ λ(A + Aᵀ) ≽ λ(B + Bᵀ)
    xᵀAx ≥ xᵀIx κ ∀x ⇔ λ(½(A + Aᵀ)) ≽ κ1          (1134)

Now let A, B ∈ S^M have diagonalizations A = QΛQᵀ and B = UΥUᵀ with λ(A) = δ(Λ) and λ(B) = δ(Υ) arranged in nonincreasing order. Then

    A ≽ B ⇔ λ(A − B) ≽ 0          (1135)
    A ≽ B ⇒ λ(A) ≽ λ(B)          (1136)
    A ≽ B ⇍ λ(A) ≽ λ(B)          (1137)
    SᵀAS ≽ B ⇐ λ(A) ≽ λ(B)          (1138)

where S = QUᵀ. [269, §7.5]  ⋄


A.3.1.0.2 Theorem. (Weyl) Eigenvalues of sum. [131, §4.3]
For A, B ∈ R^{M×M}, place the eigenvalues of each symmetrized matrix into the respective vectors λ(½(A+Aᵀ)), λ(½(B+Bᵀ)) ∈ R^M in nonincreasing order so λ(½(A+Aᵀ))₁ holds the largest eigenvalue of symmetrized A while λ(½(B+Bᵀ))₁ holds the largest eigenvalue of symmetrized B , and so on. Then, for any k ∈ {1...M}

    λ(A+Aᵀ)_k + λ(B+Bᵀ)_M ≤ λ((A+Aᵀ) + (B+Bᵀ))_k ≤ λ(A+Aᵀ)_k + λ(B+Bᵀ)₁          (1139)
⋄

Weyl's theorem establishes positive semidefiniteness of a sum of positive semidefinite matrices. Because S^M_+ is a convex cone (§2.9.0.0.1), then by (139)

    A, B ≽ 0 ⇒ ζA + ξB ≽ 0 for all ζ, ξ ≥ 0          (1140)

A.3.1.0.3 Corollary. Eigenvalues of sum and difference. [131, §4.3]
For A ∈ S^M and B ∈ S^M_+ , place the eigenvalues of each matrix into the respective vectors λ(A), λ(B) ∈ R^M in nonincreasing order so λ(A)₁ holds the largest eigenvalue of A while λ(B)₁ holds the largest eigenvalue of B , and so on. Then, for any k ∈ {1...M}

    λ(A − B)_k ≤ λ(A)_k ≤ λ(A + B)_k          (1141)
⋄

When B is rank-one positive semidefinite, the eigenvalues interlace; id est, for B = qqᵀ (indices taken in nonincreasing order)

    λ(A)_{k+1} ≤ λ(A − qqᵀ)_k ≤ λ(A)_k ≤ λ(A + qqᵀ)_k ≤ λ(A)_{k−1}          (1142)
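A minimal Matlab sketch (random data, our own construction) of the rank-one interlacing (1142), with eigenvalues sorted nonincreasingly:

    M = 6;  A = randn(M);  A = (A + A.')/2;  q = randn(M,1);
    lA = sort(eig(A), 'descend');
    lp = sort(eig(A + q*q.'), 'descend');    % rank-one update
    lm = sort(eig(A - q*q.'), 'descend');    % rank-one downdate
    tol = 1e-10;
    for k = 1:M
        assert(lm(k) <= lA(k) + tol && lA(k) <= lp(k) + tol)
        if k < M, assert(lA(k+1) <= lm(k) + tol), end
        if k > 1, assert(lp(k) <= lA(k-1) + tol), end
    end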


A.3.1.0.4 Theorem. Positive (semi)definite principal submatrices.^{A.10}

• A ∈ S^M is positive semidefinite if and only if all M principal submatrices of dimension M−1 are positive semidefinite and det A is nonnegative.
• A ∈ S^M is positive definite if and only if any one principal submatrix of dimension M−1 is positive definite and det A is positive.  ⋄

If any one principal submatrix of dimension M−1 is not positive definite, conversely, then A can neither be. Regardless of symmetry, if A ∈ R^{M×M} is positive (semi)definite, then the determinant of each and every principal submatrix is (nonnegative) positive. [179, §1.3.1]

A.3.1.0.5 Corollary. Positive (semi)definite symmetric products. [131, p.399]

• If A ∈ S^M is positive definite and any particular dimensionally compatible matrix Z has no nullspace, then ZᵀAZ is positive definite.
• If matrix A ∈ S^M is positive (semi)definite then, for any matrix Z of compatible dimension, ZᵀAZ is positive semidefinite.
• A ∈ S^M is positive (semi)definite if and only if there exists a nonsingular Z such that ZᵀAZ is positive (semi)definite.
• If A ∈ S^M is positive semidefinite and singular it remains possible, for some skinny Z ∈ R^{M×N} with N < M , that ZᵀAZ becomes positive definite.  ⋄


We can deduce from these, given nonsingular matrix Z and any particular dimensionally compatible Y : matrix A ∈ S^M is positive semidefinite if and only if [Zᵀ ; Yᵀ] A [Z Y] is positive semidefinite. In other words, from the Corollary it follows: for dimensionally compatible Z

    A ≽ 0 ⇔ ZᵀAZ ≽ 0 and Zᵀ has a left inverse

Products such as Z†Z and ZZ† are symmetric and positive semidefinite although, given A ≽ 0, Z†AZ and ZAZ† are neither necessarily symmetric nor positive semidefinite.

A.3.1.0.6 Theorem. Symmetric projector semidefinite. [16, §III] [17, §6] [144, p.55]
For symmetric idempotent matrices P and R

    P, R ≽ 0
    P ≽ R ⇔ R(P) ⊇ R(R) ⇔ N(P) ⊆ N(R)          (1143)

Projector P is never positive definite [222, §6.5, prob.20] unless it is the identity matrix.  ⋄

A.3.1.0.7 Theorem. Symmetric positive semidefinite.
Given real matrix Ψ with rank Ψ = 1

    Ψ ≽ 0 ⇔ Ψ = uuᵀ          (1144)

where u is a real vector.  ⋄

Proof. Any rank-one matrix must have the form Ψ = uvᵀ. (§B.1) Suppose Ψ is symmetric; id est, v = u . For all y ∈ R^M, yᵀuuᵀy ≥ 0. Conversely, suppose uvᵀ is positive semidefinite. We know that can hold if and only if uvᵀ + vuᵀ ≽ 0 ⇔ for all normalized y ∈ R^M, 2yᵀuvᵀy ≥ 0 ; but that is possible only if v = u . ∎

The same does not hold true for matrices of higher rank, as Example A.2.1.0.1 shows.


A.4 Schur complement

Consider the block matrix G : Given Aᵀ = A and Cᵀ = C , then [40]

    G = [A B ; Bᵀ C] ≽ 0
    ⇔ A ≽ 0 , Bᵀ(I − AA†) = 0 , C − BᵀA†B ≽ 0
    ⇔ C ≽ 0 , B(I − CC†) = 0 , A − BC†Bᵀ ≽ 0          (1145)

where A† denotes the Moore-Penrose (pseudo)inverse (§E). In the first instance, I − AA† is a symmetric projection matrix orthogonally projecting on N(Aᵀ). (1510) It is apparently required

    R(B) ⊥ N(Aᵀ)          (1146)

which precludes A = 0 when B is any nonzero matrix. Note that A ≻ 0 ⇒ A† = A⁻¹; thereby, the projection matrix vanishes. Likewise, in the second instance, I − CC† projects orthogonally on N(Cᵀ). It is required

    R(Bᵀ) ⊥ N(Cᵀ)          (1147)

which precludes C = 0 for B nonzero. Again, C ≻ 0 ⇒ C† = C⁻¹. So we get, for A or C nonsingular,

    G = [A B ; Bᵀ C] ≽ 0
    ⇔ A ≻ 0 , C − BᵀA⁻¹B ≽ 0
    or
    C ≻ 0 , A − BC⁻¹Bᵀ ≽ 0          (1148)

When A is full-rank then, for all B of compatible dimension, R(B) is in R(A). Likewise, when C is full-rank, R(Bᵀ) is in R(C). Thus the flavor, for A and C nonsingular,

    G = [A B ; Bᵀ C] ≻ 0
    ⇔ A ≻ 0 , C − BᵀA⁻¹B ≻ 0
    ⇔ C ≻ 0 , A − BC⁻¹Bᵀ ≻ 0          (1149)

where C − BᵀA⁻¹B is called the Schur complement of A in G , while the Schur complement of C in G is A − BC⁻¹Bᵀ. [87, §4.8]
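A minimal Matlab sketch (our own construction) of equivalence (1148) for nonsingular A, building G so its Schur complement is PSD by design:

    m = 4;  n = 3;
    A = randn(m);  A = A*A.' + eye(m);      % A > 0
    B = randn(m,n);
    C = B.'*(A\B) + 0.1*eye(n);             % makes Schur complement = 0.1*I
    G = [A B; B.' C];
    assert(min(eig((G + G.')/2)) > -1e-10)  % G is positive semidefinite
    S = C - B.'*(A\B);                      % Schur complement of A in G
    assert(min(eig((S + S.')/2)) > -1e-10)  % so is the Schur complement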


A.4.0.0.1 Example. Sparse Schur conditions.
Setting matrix A to the identity simplifies the Schur conditions. One consequence relates the definiteness of three quantities:

    [I B ; Bᵀ C] ≽ 0 ⇔ C − BᵀB ≽ 0 ⇔ [I 0 ; 0ᵀ C − BᵀB] ≽ 0          (1150)

Origin of the term Schur complement is from complementary inertia: [66, §2.4.4] Define

    inertia(G ∈ S^M) ≜ {p, z, n}          (1151)

where p, z, n respectively represent the number of positive, zero, and negative eigenvalues of G ; id est,

    M = p + z + n          (1152)

Then, when C is invertible,

    inertia(G) = inertia(C) + inertia(A − BC⁻¹Bᵀ)          (1153)

and when A is invertible,

    inertia(G) = inertia(A) + inertia(C − BᵀA⁻¹B)          (1154)

When A = C = 0, denoting by σ(B) ∈ R^m the nonincreasingly ordered singular values of matrix B ∈ R^{m×m}, then we have the eigenvalues [38, §1.2, prob.17]

    λ(G) = λ([0 B ; Bᵀ 0]) = [σ(B) ; −Ξσ(B)]          (1155)

and

    inertia(G) = inertia(BᵀB) + inertia(−BᵀB)          (1156)

where Ξ is the order-reversing permutation matrix defined in (1352).


A.4.0.0.2 Theorem. Rank of partitioned matrices.
When symmetric matrix A is invertible and C is symmetric,

    rank [A B ; Bᵀ C] = rank [A 0 ; 0ᵀ C − BᵀA⁻¹B] = rank A + rank(C − BᵀA⁻¹B)          (1157)

equals rank of a block on the main diagonal plus rank of its Schur complement [269, §2.2, prob.7]. Similarly, when symmetric matrix C is invertible and A is symmetric,

    rank [A B ; Bᵀ C] = rank [A − BC⁻¹Bᵀ 0 ; 0ᵀ C] = rank(A − BC⁻¹Bᵀ) + rank C          (1158)
⋄

Proof. The first assertion (1157) holds if and only if [131, §0.4.6(c)]

    ∃ nonsingular X, Y   X [A B ; Bᵀ C] Y = [A 0 ; 0ᵀ C − BᵀA⁻¹B]          (1159)

Let [131, §7.7.6]

    Y = Xᵀ = [I −A⁻¹B ; 0ᵀ I]          (1160)
∎

A.4.0.0.3 Lemma. Rank of Schur-form block. [79] [77]
Matrix B ∈ R^{m×n} has rank B ≤ ρ if and only if there exist matrices A ∈ S^m and C ∈ S^n such that

    rank A + rank C ≤ 2ρ   and   [A B ; Bᵀ C] ≽ 0          (1161)
⋄

A.4.1 Semidefinite program via Schur

Schur complement (1145) can be used to convert a projection problem to an optimization problem in epigraph form. Suppose, for example, we are presented with the constrained projection problem studied by Hayden & Wells in [114] (who provide analytical solution): Given A ∈ R^{M×M} and some full-rank matrix S ∈ R^{M×L} with L < M


    minimize_{X ∈ S^M}   ‖A − X‖²_F
    subject to   SᵀXS ≽ 0          (1162)

Variable X is constrained to be positive semidefinite, but only on a subspace determined by S . First we write the epigraph form (§3.1.1.3):

    minimize_{X ∈ S^M, t ∈ R}   t
    subject to   ‖A − X‖²_F ≤ t
                 SᵀXS ≽ 0          (1163)

Next we use the Schur complement [182, §6.4.3] [161] and matrix vectorization (§2.2):

    minimize_{X ∈ S^M, t ∈ R}   t
    subject to   [tI  vec(A − X) ; vec(A − X)ᵀ  1] ≽ 0
                 SᵀXS ≽ 0          (1164)

This semidefinite program is an epigraph form in disguise, equivalent to (1162).

Were problem (1162) instead equivalently expressed without the square

    minimize_{X ∈ S^M}   ‖A − X‖_F
    subject to   SᵀXS ≽ 0          (1165)

then we get a subtle variation:

    minimize_{X ∈ S^M, t ∈ R}   t
    subject to   ‖A − X‖_F ≤ t
                 SᵀXS ≽ 0          (1166)

that leads to an equivalent for (1165)

    minimize_{X ∈ S^M, t ∈ R}   t
    subject to   [tI  vec(A − X) ; vec(A − X)ᵀ  t] ≽ 0
                 SᵀXS ≽ 0          (1167)
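The Schur-form constraint in (1164) is easy to probe numerically; a minimal Matlab sketch (our own construction) showing [tI v ; vᵀ 1] ≽ 0 exactly when vᵀv ≤ t , with v = vec(A − X):

    M = 3;
    A = randn(M);  A = (A + A.')/2;
    X = randn(M);  X = (X + X.')/2;
    v = reshape(A - X, [], 1);
    for t = [0.5 2]*(v.'*v)                 % one infeasible, one feasible t
        K = [t*eye(numel(v)) v; v.' 1];
        fprintf('t = %8.4f   v''*v = %8.4f   min eig = %9.2e\n', ...
                t, v.'*v, min(eig(K)))
    end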


A.4.1.0.1 Example. Schur anomaly.
Consider a problem, abstract in the convex constraint, given symmetric matrix A

    minimize_{X ∈ S^M}   ‖X‖²_F − ‖A − X‖²_F
    subject to   X ∈ C          (1168)

a difference of two functions convex in matrix X . Observe the equality

    ‖X‖²_F − ‖A − X‖²_F = 2 tr(XA) − ‖A‖²_F          (1169)

so problem (1168) is equivalent to the convex optimization

    minimize_{X ∈ S^M}   tr(XA)
    subject to   X ∈ C          (1170)

But problem (1168) does not have a Schur form

    minimize_{X ∈ S^M, α, t}   t − α
    subject to   X ∈ C
                 ‖X‖²_F ≤ t
                 ‖A − X‖²_F ≥ α          (1171)

because the constraint in α is nonconvex. (§2.9.1.0.1)

A.4.2 Determinant

    G = [A B ; Bᵀ C]          (1172)

We consider again a matrix G partitioned similarly to (1145), but not necessarily positive (semi)definite, where A and C are symmetric.


• When A is invertible,

    det G = det A det(C − BᵀA⁻¹B)          (1173)

When C is invertible,

    det G = det C det(A − BC⁻¹Bᵀ)          (1174)

• When B is full-rank and skinny, C = 0, and A ≽ 0, then [41, §10.1.1]

    det G ≠ 0 ⇔ A + BBᵀ ≻ 0          (1175)

• When B is a (column) vector, then for all C ∈ R and all A of dimension compatible with G

    det G = det(A) C − Bᵀ A_cofᵀ B          (1176)

while for C ≠ 0

    det G = C det(A − (1/C) BBᵀ)          (1177)

where A_cof is the matrix of cofactors [220, §4] corresponding to A .

• When B is full-rank and fat, A = 0, and C ≽ 0, then

    det G ≠ 0 ⇔ C + BᵀB ≻ 0          (1178)

• When B is a row vector, then for A ≠ 0 and all C of dimension compatible with G

    det G = A det(C − (1/A) BᵀB)          (1179)

while for all A ∈ R

    det G = det(C) A − B C_cofᵀ Bᵀ          (1180)

where C_cof is the matrix of cofactors corresponding to C .
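A minimal Matlab check (our own construction) of determinant identities (1173) and (1174):

    m = 4;  n = 3;
    A = randn(m);  A = (A + A.')/2 + m*eye(m);   % symmetric, invertible
    C = randn(n);  C = (C + C.')/2 + n*eye(n);   % symmetric, invertible
    B = randn(m,n);
    G = [A B; B.' C];
    tol = 1e-8*max(1, abs(det(G)));
    assert(abs(det(G) - det(A)*det(C - B.'*(A\B)))  < tol)
    assert(abs(det(G) - det(C)*det(A - B*(C\B.'))) < tol)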


A.5 Eigen decomposition

When a square matrix X ∈ R^{m×m} is diagonalizable, [220, §5.6] then

    X = SΛS⁻¹ = [s₁ ··· s_m] Λ [w₁ᵀ ; ⋮ ; w_mᵀ] = Σ_{i=1}^m λᵢ sᵢwᵢᵀ          (1181)

where sᵢ ∈ C^m are linearly independent (right-)eigenvectors^{A.12} constituting the columns of S ∈ C^{m×m} defined by

    XS = SΛ          (1182)

wᵢᵀ ∈ C^m are linearly independent left-eigenvectors of X constituting the rows of S⁻¹ defined by [131]

    S⁻¹X = ΛS⁻¹          (1183)

and where λᵢ ∈ C are eigenvalues (populating diagonal matrix Λ ∈ C^{m×m}) corresponding to both left and right eigenvectors.

There is no connection between diagonalizability and invertibility of X . [220, §5.2] Diagonalizability is guaranteed by a full set of linearly independent eigenvectors, whereas invertibility is guaranteed by all nonzero eigenvalues.

    distinct eigenvalues ⇒ l.i. eigenvectors ⇔ diagonalizable
    not diagonalizable ⇒ repeated eigenvalue          (1184)

A.5.0.0.1 Theorem. Real eigenvector. Eigenvectors of a real matrix corresponding to real eigenvalues must be real.  ⋄

Proof. Ax = λx . Given λ = λ*, x^H Ax = λ x^H x = λ‖x‖² = xᵀAx* ⇒ x = x*, where x^H = x*ᵀ. The converse is equally simple. ∎

A.5.0.1 Uniqueness

From the fundamental theorem of algebra it follows: eigenvalues, including their multiplicity, for a given square matrix are unique; meaning, there is no other set of eigenvalues for that matrix. (Conversely, many different matrices may share the same unique set of eigenvalues.)

Uniqueness of eigenvectors, in contrast, disallows multiplicity of the same direction.

A.12 Eigenvectors must, of course, be nonzero. The prefix eigen is from the German; in this context meaning, something akin to "characteristic". [217, p.14]


A.5.0.1.1 Definition. Unique eigenvectors.
When eigenvectors are unique, we mean unique to within a real nonzero scaling and their directions are distinct.  △

If S is a matrix of eigenvectors of X as in (1181), for example, then −S is certainly another matrix of eigenvectors decomposing X with the same eigenvalues.

For any square matrix, [217, p.220]

    distinct eigenvalues ⇒ eigenvectors unique          (1185)

Eigenvectors corresponding to a repeated eigenvalue are not unique for a diagonalizable matrix;

    repeated eigenvalue ⇒ eigenvectors not unique          (1186)

Proof follows from the observation: any linear combination of distinct eigenvectors of diagonalizable X , corresponding to a particular eigenvalue, produces another eigenvector. For eigenvalue λ whose multiplicity^{A.13} dim N(X − λI) exceeds 1, in other words, any choice of independent vectors from N(X − λI) (of the same multiplicity) constitutes eigenvectors corresponding to λ .

Caveat diagonalizability insures linear independence which implies existence of distinct eigenvectors. We may conclude, for diagonalizable matrices,

    distinct eigenvalues ⇔ eigenvectors unique          (1187)

A.5.1 Eigenmatrix

The (right-)eigenvectors {sᵢ} are naturally orthogonal to the left-eigenvectors {wᵢ} except, for i = 1...m , wᵢᵀsᵢ = 1 ; called a biorthogonality condition [246, §2.2.4] [131] because neither set of left or right eigenvectors is necessarily an orthogonal set. Consequently, each dyad from a diagonalization is an independent (§B.1.1) nonorthogonal projector because

A.13 For a diagonalizable matrix, algebraic multiplicity is the same as geometric multiplicity. [217, p.15]


    sᵢwᵢᵀ sᵢwᵢᵀ = sᵢwᵢᵀ          (1188)

(whereas the dyads of singular value decomposition are not inherently projectors (confer (1192))).

The dyads of eigen decomposition can be termed eigenmatrices because

    X sᵢwᵢᵀ = λᵢ sᵢwᵢᵀ          (1189)

A.5.2 Symmetric matrix diagonalization

The set of normal matrices is, precisely, that set of all real matrices having a complete orthonormal set of eigenvectors; [269, §8.1] [222, prob.10.2.31] e.g., orthogonal and circulant matrices [101]. All normal matrices are diagonalizable. A symmetric matrix is a special normal matrix whose eigenvalues must be real and whose eigenvectors can be chosen to make a real orthonormal set; [222, §6.4] [220, p.315] id est, for X ∈ S^m

    X = SΛSᵀ = [s₁ ··· s_m] Λ [s₁ᵀ ; ⋮ ; s_mᵀ] = Σ_{i=1}^m λᵢ sᵢsᵢᵀ          (1190)

where δ²(Λ) = Λ ∈ R^{m×m} (§A.1) and S⁻¹ = Sᵀ ∈ R^{m×m} (§B.5) because of the symmetry: SΛS⁻¹ = S⁻ᵀΛSᵀ. Because the arrangement of eigenvectors and their corresponding eigenvalues is arbitrary, we almost always arrange the eigenvalues in nonincreasing order as is the convention for singular value decomposition.

A.5.2.1 Positive semidefinite matrix square root

When X ∈ S^m_+ , its unique positive semidefinite matrix square root is defined

    √X ≜ S√Λ Sᵀ ∈ S^m_+          (1191)

where the square root of nonnegative diagonal matrix √Λ is taken entrywise and positive. Then X = √X √X .
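A minimal Matlab sketch of (1191), computing the square root by symmetric eigen decomposition (the max(·,0) guard against roundoff is our own choice):

    m = 5;
    S0 = randn(m);  X = S0*S0.';               % X in S^m_+
    [S, L] = eig((X + X.')/2);                 % X = S*L*S'
    sqrtX = S*diag(sqrt(max(diag(L), 0)))*S.'; % entrywise positive sqrt of L
    assert(norm(sqrtX*sqrtX - X, 'fro') < 1e-8*norm(X, 'fro'))
    assert(min(eig(sqrtX)) > -1e-10)           % sqrtX is itself PSD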


A.6 Singular value decomposition, SVD

A.6.1 Compact SVD

[93, §2.5.4] For any A ∈ R^{m×n}

    A = UΣQᵀ = [u₁ ··· u_η] Σ [q₁ᵀ ; ⋮ ; q_ηᵀ] = Σ_{i=1}^η σᵢ uᵢqᵢᵀ
    U ∈ R^{m×η} ,  Σ ∈ R^{η×η} ,  Q ∈ R^{n×η}          (1192)

where U and Q are always skinny-or-square, each having orthonormal columns, and where

    η ≜ min{m, n}          (1193)

Square matrix Σ is diagonal (§A.1.1)

    δ²(Σ) = Σ ∈ R^{η×η}          (1194)

holding the singular values σᵢ of A which are always arranged in nonincreasing order by convention and are related to eigenvalues by^{A.14}

    σ(A)ᵢ = √λ(AᵀA)ᵢ = √λ(AAᵀ)ᵢ = λ(√(AᵀA))ᵢ = λ(√(AAᵀ))ᵢ > 0 ,  i = 1...ρ
    σ(A)ᵢ = 0 ,  i = ρ+1...η          (1195)

of which the last η − ρ are 0 ,^{A.15} where

    ρ ≜ rank A = rank Σ          (1196)

A point sometimes lost: any real matrix may be decomposed in terms of its real singular values σ(A) ∈ R^η and real matrices U and Q as in (1192), where [93, §2.5.3]

    R{uᵢ | σᵢ ≠ 0} = R(A)
    R{uᵢ | σᵢ = 0} ⊆ N(Aᵀ)
    R{qᵢ | σᵢ ≠ 0} = R(Aᵀ)
    R{qᵢ | σᵢ = 0} ⊆ N(A)          (1197)

A.14 When A is normal, σ(A) = |λ(A)|. [269, §8.1]
A.15 For η = n , σ(A) = √λ(AᵀA) = λ(√(AᵀA)) where λ denotes eigenvalues. For η = m , σ(A) = √λ(AAᵀ) = λ(√(AAᵀ)).


A.6.2 Subcompact SVD

Some authors allow only nonzero singular values. In that case the compact decomposition can be made smaller; it can be redimensioned in terms of rank ρ because, for any A ∈ R^{m×n},

    ρ = rank A = rank Σ = max{i ∈ {1...η} | σᵢ ≠ 0} ≤ η          (1198)

There are η singular values. Rank is equivalent to the number of nonzero singular values on the main diagonal of Σ for any flavor SVD, as is well known. Now

    A = UΣQᵀ = [u₁ ··· u_ρ] Σ [q₁ᵀ ; ⋮ ; q_ρᵀ] = Σ_{i=1}^ρ σᵢ uᵢqᵢᵀ
    U ∈ R^{m×ρ} ,  Σ ∈ R^{ρ×ρ} ,  Q ∈ R^{n×ρ}          (1199)

where the main diagonal of diagonal matrix Σ has no 0 entries, and

    R{uᵢ} = R(A)
    R{qᵢ} = R(Aᵀ)          (1200)

A.6.3 Full SVD

Another common and useful expression of the SVD makes U and Q square; making the decomposition larger than compact SVD. Completing the nullspace bases in U and Q from (1197) provides what is called the full singular value decomposition of A ∈ R^{m×n} [220, App.A]. Orthonormal matrices U and Q become orthogonal matrices (§B.5):

    R{uᵢ | σᵢ ≠ 0} = R(A)
    R{uᵢ | σᵢ = 0} = N(Aᵀ)
    R{qᵢ | σᵢ ≠ 0} = R(Aᵀ)
    R{qᵢ | σᵢ = 0} = N(A)          (1201)


For any matrix A having rank ρ (= rank Σ)

    A = UΣQᵀ = [u₁ ··· u_m] Σ [q₁ᵀ ; ⋮ ; q_nᵀ] = Σ_{i=1}^η σᵢ uᵢqᵢᵀ
      = [ m×ρ basis R(A)   m×(m−ρ) basis N(Aᵀ) ] δ(σ₁, σ₂, ...) [ (n×ρ basis R(Aᵀ))ᵀ ; (n×(n−ρ) basis N(A))ᵀ ]
    U ∈ R^{m×m} ,  Σ ∈ R^{m×n} ,  Q ∈ R^{n×n}          (1202)

where upper limit of summation η is defined in (1193). Matrix Σ is no longer necessarily square, now padded with respect to (1194) by m − η zero rows or n − η zero columns; the nonincreasingly ordered (possibly 0) singular values appear along its main diagonal as for compact SVD (1195).

An important geometrical interpretation of SVD is given in Figure 103 for m = n = 2 : the image of the unit sphere under any m × n matrix multiplication is an ellipse. Considering the three factors of the SVD separately, note that Qᵀ is a pure rotation of the circle. Figure 103 shows how the axes q₁ and q₂ are first rotated by Qᵀ to coincide with the coordinate axes. Second, the circle is stretched by Σ in the directions of the coordinate axes to form an ellipse. The third step rotates the ellipse by U into its final position. Note how q₁ and q₂ are rotated to end up as u₁ and u₂ , the principal axes of the final ellipse. A direct calculation shows that Aq_j = σ_j u_j . Thus q_j is first rotated to coincide with the j-th coordinate axis, stretched by a factor σ_j , and then rotated to point in the direction of u_j . All of this is beautifully illustrated for 2×2 matrices by the Matlab code eigshow.m (see [219]).

A direct consequence of the geometric interpretation is that the largest singular value σ₁ measures the "magnitude" of A (its 2-norm):

    ‖A‖₂ = sup_{‖x‖₂=1} ‖Ax‖₂ = σ₁          (1203)

This means that ‖A‖₂ is the length of the longest principal semiaxis of the ellipse.
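A minimal Matlab companion (our own construction) to this interpretation, verifying Aq_j = σ_j u_j and ‖A‖₂ = σ₁ :

    A = randn(2);
    [U, Sig, Q] = svd(A);                  % full SVD
    for j = 1:2
        assert(norm(A*Q(:,j) - Sig(j,j)*U(:,j)) < 1e-10)
    end
    assert(abs(norm(A) - Sig(1,1)) < 1e-10)   % 2-norm equals sigma_1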


[Figure 103: Geometrical interpretation of full SVD [178]: image of circle {x ∈ R² | ‖x‖₂ = 1} under matrix multiplication Ax is, in general, an ellipse; Qᵀ rotates q₁, q₂ onto the coordinate axes, Σ stretches, then U rotates onto u₁, u₂. For the example illustrated, U ≜ [u₁ u₂] ∈ R^{2×2}, Q ≜ [q₁ q₂] ∈ R^{2×2}.]


Expressions for U , Q , and Σ follow readily from (1202),

    AAᵀU = UΣΣᵀ   and   AᵀAQ = QΣᵀΣ          (1204)

demonstrating that the columns of U are the eigenvectors of AAᵀ and the columns of Q are the eigenvectors of AᵀA.  −Neil Muller et alii [178]

A.6.4 SVD of symmetric matrices

A.6.4.0.1 Definition. Step function. (confer §6.3.2.0.1)
Define the signum-like entrywise vector-valued function ψ : R^n → R^n that takes the value 1 corresponding to an entry-value 0 in the argument:

    ψ(a) ≜ [ lim_{x_i→a_i} x_i/|x_i| = { 1, a_i ≥ 0 ; −1, a_i < 0 } ,  i = 1...n ] ∈ R^n          (1205)
△

Eigenvalue signs of a symmetric matrix having diagonalization A = SΛSᵀ (1190) can be absorbed either into real U or real Q from the full SVD; [234, p.34] (confer §C.5.2.1)

    A = SΛSᵀ = S δ(ψ(δ(Λ))) |Λ| Sᵀ ≜ U ΣQᵀ ∈ S^n          (1206)

or

    A = SΛSᵀ = S |Λ| δ(ψ(δ(Λ))) Sᵀ ≜ UΣ Qᵀ ∈ S^n          (1207)

where |Λ| denotes entrywise absolute value of diagonal matrix Λ .

A.6.5 Pseudoinverse by SVD

Matrix pseudoinverse (§E) is nearly synonymous with singular value decomposition because of the elegant expression, given A = UΣQᵀ,

    A† = QΣ†ᵀUᵀ ∈ R^{n×m}          (1208)

that applies to all three flavors of SVD, where Σ† simply inverts nonzero entries of matrix Σ .

Given symmetric matrix A ∈ S^n and its diagonalization A = SΛSᵀ (§A.5.2), its pseudoinverse simply inverts all nonzero eigenvalues;

    A† = SΛ†Sᵀ          (1209)
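A minimal Matlab sketch of (1208), assembling the pseudoinverse from a full SVD and comparing against pinv (the rank-deficient test matrix and the 1e-12 threshold are our own choices):

    m = 5;  n = 3;
    A = randn(m,2)*randn(2,n);              % rank-deficient: rank(A) = 2 < n
    [U, Sig, Q] = svd(A);                   % full SVD, Sig is m-by-n
    d = diag(Sig);
    dinv = zeros(size(d));  dinv(d > 1e-12) = 1./d(d > 1e-12);
    Sdag = zeros(size(Sig));                % Sig-dagger: invert nonzero entries
    Sdag(1:numel(d), 1:numel(d)) = diag(dinv);
    Adag = Q*Sdag.'*U.';                    % (1208)
    assert(norm(Adag - pinv(A), 'fro') < 1e-8)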


A.7 Zeros

A.7.1 0 entry

If a positive semidefinite matrix A = [A_ij] ∈ R^{n×n} has a 0 entry A_ii on its main diagonal, then A_ij + A_ji = 0 ∀j . [179, §1.3.1]

Any symmetric positive semidefinite matrix having a 0 entry on its main diagonal must be 0 along the entire row and column to which that 0 entry belongs. [93, §4.2.8] [131, §7.1, prob.2]

A.7.2 0 eigenvalues theorem

This theorem is simple, powerful, and widely applicable:

A.7.2.0.1 Theorem. Number of 0 eigenvalues.
For any matrix A ∈ R^{m×n}

    rank(A) + dim N(A) = n          (1210)

by conservation of dimension. [131, §0.4.4]

For any square matrix A ∈ R^{m×m}, the number of 0 eigenvalues is at least equal to dim N(A)

    dim N(A) ≤ number of 0 eigenvalues ≤ m          (1211)

while the eigenvectors corresponding to those 0 eigenvalues belong to N(A). [220, §5.1]^{A.16}

For diagonalizable matrix A (§A.5), the number of 0 eigenvalues is precisely dim N(A) while the corresponding eigenvectors span N(A). The real and imaginary parts of the eigenvectors remaining span R(A).

A.16 We take as given the well-known fact that the number of 0 eigenvalues cannot be less than the dimension of the nullspace. We offer an example of the converse:

    A = [1 0 1 0 ; 0 0 1 0 ; 0 0 0 0 ; 1 0 0 0]

dim N(A) = 2, λ(A) = [0 0 0 1]ᵀ; three eigenvectors in the nullspace but only two are independent. The right-hand side of (1211) is tight for nonzero matrices; e.g., (§B.1) dyad uvᵀ ∈ R^{m×m} has m 0-eigenvalues when u ∈ v^⊥.


(TRANSPOSE.) Likewise, for any matrix A ∈ R^{m×n}

    rank(Aᵀ) + dim N(Aᵀ) = m          (1212)

For any square A ∈ R^{m×m}, the number of 0 eigenvalues is at least equal to dim N(Aᵀ) = dim N(A) while the left-eigenvectors (eigenvectors of Aᵀ) corresponding to those 0 eigenvalues belong to N(Aᵀ).

For diagonalizable A , the number of 0 eigenvalues is precisely dim N(Aᵀ) while the corresponding left-eigenvectors span N(Aᵀ). The real and imaginary parts of the left-eigenvectors remaining span R(Aᵀ).  ⋄

Proof. First we show, for a diagonalizable matrix, the number of 0 eigenvalues is precisely the dimension of its nullspace while the eigenvectors corresponding to those 0 eigenvalues span the nullspace:

Any diagonalizable matrix A ∈ R^{m×m} must possess a complete set of linearly independent eigenvectors. If A is full-rank (invertible), then all m = rank(A) eigenvalues are nonzero. [220, §5.1]

Suppose rank(A) < m . Then dim N(A) = m − rank(A). Thus there is a set of m − rank(A) linearly independent vectors spanning N(A). Each of those can be an eigenvector associated with a 0 eigenvalue because A is diagonalizable ⇔ ∃ m linearly independent eigenvectors. [220, §5.2] Eigenvectors of a real matrix corresponding to 0 eigenvalues must be real.^{A.17} Thus A has at least m − rank(A) eigenvalues equal to 0.

Now suppose A has more than m − rank(A) eigenvalues equal to 0. Then there are more than m − rank(A) linearly independent eigenvectors associated with 0 eigenvalues, and each of those eigenvectors must be in N(A). Thus there are more than m − rank(A) linearly independent vectors in N(A) ; a contradiction.

Therefore diagonalizable A has rank(A) nonzero eigenvalues and exactly m − rank(A) eigenvalues equal to 0 whose corresponding eigenvectors span N(A).

By similar argument, the left-eigenvectors corresponding to 0 eigenvalues span N(Aᵀ).

Next we show when A is diagonalizable, the real and imaginary parts of its eigenvectors (corresponding to nonzero eigenvalues) span R(A) :

A.17 Let * denote complex conjugation. Suppose A = A* and As_i = 0. Then s_i = s_i* ⇒ As_i = As_i* ⇒ As_i* = 0. Conversely, As_i* = 0 ⇒ As_i = As_i* ⇒ s_i = s_i*.


The right-eigenvectors of a diagonalizable matrix A ∈ R^{m×m} are linearly independent if and only if the left-eigenvectors are. So, matrix A has a representation in terms of its right- and left-eigenvectors; from the diagonalization (1181), assuming 0 eigenvalues are ordered last,

    A = Σ_{i=1}^m λᵢ sᵢwᵢᵀ = Σ_{i=1, λᵢ≠0}^{k ≤ m} λᵢ sᵢwᵢᵀ          (1213)

From the linearly independent dyads theorem (§B.1.1.0.2), the dyads {sᵢwᵢᵀ} must be independent because each set of eigenvectors is; hence rank A = k , the number of nonzero eigenvalues. Complex eigenvectors and eigenvalues are common for real matrices, and must come in complex conjugate pairs for the summation to remain real. Assume that conjugate pairs of eigenvalues appear in sequence. Given any particular conjugate pair from (1213), we get the partial summation

    λᵢ sᵢwᵢᵀ + λᵢ* sᵢ*wᵢ*ᵀ = 2 Re(λᵢ sᵢwᵢᵀ) = 2 (Re sᵢ Re(λᵢwᵢᵀ) − Im sᵢ Im(λᵢwᵢᵀ))          (1214)

where^{A.18} λᵢ* ≜ λ_{i+1} , sᵢ* ≜ s_{i+1} , and wᵢ* ≜ w_{i+1} . Then (1213) is equivalently written

    A = 2 Σ_{i : λ∈C, λᵢ≠0} (Re s_{2i} Re(λ_{2i} w_{2i}ᵀ) − Im s_{2i} Im(λ_{2i} w_{2i}ᵀ)) + Σ_{j : λ∈R, λ_j≠0} λ_j s_j w_jᵀ          (1215)

The summation (1215) shows: A is a linear combination of real and imaginary parts of its right-eigenvectors corresponding to nonzero eigenvalues. The k vectors {Re sᵢ ∈ R^m, Im sᵢ ∈ R^m | λᵢ ≠ 0, i ∈ {1...m}} must therefore span the range of diagonalizable matrix A .

The argument is similar regarding the span of the left-eigenvectors. ∎

A.18 The complex conjugate of w is denoted w*, while its conjugate transpose is denoted by w^H = w*ᵀ.


A.7.3 0 trace and matrix product

For X, A ∈ S^M_+ [26, §2.6.1, exer.2.8] [240, §3.1]

    tr(XA) = 0 ⇔ XA = AX = 0          (1216)

Proof. (⇐) Suppose XA = AX = 0. Then tr(XA) = 0 is obvious.
(⇒) Suppose tr(XA) = 0. tr(XA) = tr(A^{1/2}XA^{1/2}) whose argument is positive semidefinite by Corollary A.3.1.0.5. Trace of any square matrix is equivalent to the sum of its eigenvalues. Eigenvalues of a positive semidefinite matrix can total 0 if and only if each and every nonnegative eigenvalue is 0. The only feasible positive semidefinite matrix, having all 0 eigenvalues, resides at the origin; id est,

    A^{1/2}XA^{1/2} = (X^{1/2}A^{1/2})ᵀ X^{1/2}A^{1/2} = 0          (1217)

implying X^{1/2}A^{1/2} = 0 which in turn implies X^{1/2}(X^{1/2}A^{1/2})A^{1/2} = XA = 0. Arguing similarly yields AX = 0. ∎

Symmetric matrices A and X are simultaneously diagonalizable if and only if they are commutative under multiplication; (1113) id est, they share a complete set of eigenvectors.

A.7.3.1 An equivalence in nonisomorphic spaces

Identity (1216) leads to an unusual equivalence relating convex geometry to traditional linear algebra: the convex sets

    {X | 〈X, A〉 = 0} ∩ {X ≽ 0} ≡ {X | N(X) ⊇ R(A)} ∩ {X ≽ 0}          (1218)

(one expressed in terms of a hyperplane, the other in terms of nullspace and range) are equivalent only when given A ≽ 0.

We might apply this equivalence to the geometric center subspace, for example,

    S^M_c = {Y ∈ S^M | Y1 = 0}
          = {Y ∈ S^M | N(Y) ⊇ 1} = {Y ∈ S^M | R(Y) ⊆ N(1ᵀ)}          (1600)

from which we derive (confer (560))

    S^M_c ∩ S^M_+ ≡ {X ≽ 0 | 〈X, 11ᵀ〉 = 0}          (1219)
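A minimal Matlab sketch of (1216), using PSD matrices whose ranges are mutually orthogonal by construction (our own example):

    M = 5;
    [Qo, ~] = qr(randn(M));                   % orthogonal Qo
    X = Qo(:,1:2)*diag([1 2])*Qo(:,1:2).';    % PSD, range = span Qo(:,1:2)
    A = Qo(:,3:4)*diag([3 4])*Qo(:,3:4).';    % PSD, range orthogonal to X's
    assert(abs(trace(X*A)) < 1e-12)
    assert(norm(X*A, 'fro') < 1e-12 && norm(A*X, 'fro') < 1e-12)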


A.7.4 Zero definite

The domain over which an arbitrary real matrix A is zero definite can exceed its left and right nullspaces. For any positive semidefinite matrix A ∈ R^{M×M} (for A + Aᵀ ≽ 0)

    {x | xᵀAx = 0} = N(A + Aᵀ)          (1220)

because ∃R  A + Aᵀ = RᵀR , ‖Rx‖ = 0 ⇔ Rx = 0, and N(A + Aᵀ) = N(R). Then given any particular vector x_p , x_pᵀAx_p = 0 ⇔ x_p ∈ N(A + Aᵀ). For any positive definite matrix A (for A + Aᵀ ≻ 0)

    {x | xᵀAx = 0} = 0          (1221)

Further, [269, §3.2, prob.5]

    {x | xᵀAx = 0} = R^M ⇔ Aᵀ = −A          (1222)

while

    {x | x^H Ax = 0} = C^M ⇔ A = 0          (1223)

The positive semidefinite matrix

    A = [1 2 ; 0 1]          (1224)

for example, has no nullspace. Yet

    {x | xᵀAx = 0} = {x | 1ᵀx = 0} ⊂ R²          (1225)

which is the nullspace of the symmetrized matrix. Symmetric matrices are not spared from the excess; videlicet,

    B = [1 2 ; 2 1]          (1226)

has eigenvalues {−1, 3}, no nullspace, but is zero definite on^{A.19}

    X ≜ {x ∈ R² | x₂ = (−2 ± √3) x₁}          (1227)

A.19 These two lines represent the limit in the union of two generally distinct hyperbolae; id est, for matrix B and set X as defined

    lim_{ε↓0} {x ∈ R² | xᵀBx = ε} = X


A.7.4.0.1 Proposition. (Sturm) Dyad-decompositions. [225, §5.2]
Given symmetric matrix A ∈ S^M, let positive semidefinite matrix X ∈ S^M_+ have rank ρ . Then 〈A, X〉 = 0 if and only if there exists a dyad-decomposition

    X = Σ_{j=1}^ρ x_j x_jᵀ          (1228)

satisfying

    〈A, x_j x_jᵀ〉 = 0 for each and every j          (1229)
⋄

The dyad-decomposition proposed is generally not that obtained from a standard diagonalization of X by eigen decomposition, unless ρ = 1 or the given matrix A is positive semidefinite. That means, elemental dyads in decomposition (1228) constitute a generally nonorthogonal set. Sturm & Zhang give a simple procedure for constructing the dyad-decomposition. (§G.5)

A.7.4.0.2 Example. Dyad.
The dyad uvᵀ ∈ R^{M×M} (§B.1) is zero definite on all x for which either xᵀu = 0 or xᵀv = 0 ;

    {x | xᵀuvᵀx = 0} = {x | xᵀu = 0} ∪ {x | vᵀx = 0}          (1230)

id est, on u^⊥ ∪ v^⊥. Symmetrizing the dyad does not change the outcome:

    {x | xᵀ(uvᵀ + vuᵀ)x/2 = 0} = {x | xᵀu = 0} ∪ {x | vᵀx = 0}          (1231)




Appendix B

Simple matrices

    Mathematicians also attempted to develop algebra of vectors but there was no natural definition of the product of two vectors that held in arbitrary dimensions. The first vector algebra that involved a noncommutative vector product (that is, v×w need not equal w×v) was proposed by Hermann Grassmann in his book Ausdehnungslehre (1844). Grassmann's text also introduced the product of a column matrix and a row matrix, which resulted in what is now called a simple or a rank-one matrix. In the late 19th century the American mathematical physicist Willard Gibbs published his famous treatise on vector analysis. In that treatise Gibbs represented general matrices, which he called dyadics, as sums of simple matrices, which Gibbs called dyads. Later the physicist P. A. M. Dirac introduced the term "bra-ket" for what we now call the scalar product of a "bra" (row) vector times a "ket" (column) vector and the term "ket-bra" for the product of a ket times a bra, resulting in what we now call a simple matrix, as above. Our convention of identifying column matrices and vectors was introduced by physicists in the 20th century.
    −Marie A. Vitulli [248]


B.1 Rank-one matrix (dyad)

Any matrix formed from the unsigned outer product of two vectors,

    Ψ = uvᵀ ∈ R^{M×N}          (1232)

where u ∈ R^M and v ∈ R^N, is rank-one and called a dyad. Conversely, any rank-one matrix must have the form Ψ . [131, prob.1.4.1] The product −uvᵀ is a negative dyad. For matrix products ABᵀ, in general, we have

    R(ABᵀ) ⊆ R(A) ,  N(ABᵀ) ⊇ N(Bᵀ)          (1233)

with equality when B = A [220, §3.3, §3.6]^{B.1} or respectively when B is invertible and N(A) = 0. Yet for all nonzero dyads we have

    R(uvᵀ) = R(u) ,  N(uvᵀ) = N(vᵀ) ≡ v^⊥          (1234)

where dim v^⊥ = N − 1.

It is obvious a dyad can be 0 only when u or v is 0 ;

    Ψ = uvᵀ = 0 ⇔ u = 0 or v = 0          (1235)

The matrix 2-norm for Ψ is equivalent to the Frobenius norm;

    ‖Ψ‖₂ = ‖uvᵀ‖_F = ‖uvᵀ‖₂ = ‖u‖ ‖v‖          (1236)

When u and v are normalized, the pseudoinverse is the transposed dyad. Otherwise,

    Ψ† = (uvᵀ)† = vuᵀ / (‖u‖² ‖v‖²)          (1237)

When dyad uvᵀ ∈ R^{N×N} is square, uvᵀ has at least N−1 0-eigenvalues and corresponding eigenvectors spanning v^⊥. The remaining eigenvector u spans the range of uvᵀ with corresponding eigenvalue

    λ = vᵀu = tr(uvᵀ) ∈ R          (1238)

B.1 Proof. R(AAᵀ) ⊆ R(A) is obvious. R(AAᵀ) = {AAᵀy | y ∈ R^m} ⊇ {AAᵀy | Aᵀy ∈ R(Aᵀ)} = R(A) by (115). ∎


[Figure 104: The four fundamental subspaces [222, §3.6] of any dyad Ψ = uvᵀ ∈ R^{M×N}: R^N = R(v) ⊕ N(uvᵀ) = R(v) ⊕ N(vᵀ) and N(uᵀ) ⊕ R(uvᵀ) = N(uᵀ) ⊕ R(u) = R^M. Ψ(x) ≜ uvᵀx is a linear mapping from R^N to R^M. The map from R(v) to R(u) is bijective. [220, §3.1]]

The determinant is the product of the eigenvalues; so, it is always true that

    det Ψ = det(uvᵀ) = 0          (1239)

When λ = 1, the square dyad is a nonorthogonal projector projecting on its range (Ψ² = Ψ , §E.1). It is quite possible that u ∈ v^⊥, making the remaining eigenvalue instead 0 ;^{B.2} λ = 0 together with the first N−1 0-eigenvalues; id est, it is possible uvᵀ were nonzero while all its eigenvalues are 0. The matrix

    [1 ; −1][1  1] = [1 1 ; −1 −1]          (1240)

for example, has two 0-eigenvalues. In other words, the value of eigenvector u may simultaneously be a member of the nullspace and range of the dyad. The explanation is, simply, because u and v share the same dimension, dim u = M = dim v = N :

Proof. Figure 104 shows the four fundamental subspaces for the dyad. Linear operator Ψ : R^N → R^M provides a map between vector spaces that remain distinct when M = N ;

    u ∈ R(uvᵀ)
    u ∈ N(uvᵀ) ⇔ vᵀu = 0
    R(uvᵀ) ∩ N(uvᵀ) = ∅          (1241)
∎

B.2 The dyad is not always diagonalizable (§A.5) because the eigenvectors are not necessarily independent.


B.1.0.1 Rank-one modification

If A ∈ R^{N×N} is any nonsingular matrix and 1 + vᵀA⁻¹u ≠ 0, then [143, App.6] [269, §2.3, prob.16] [87, §4.11.2] (Sherman-Morrison)

    (A + uvᵀ)⁻¹ = A⁻¹ − (A⁻¹uvᵀA⁻¹)/(1 + vᵀA⁻¹u)          (1242)

B.1.0.2 Dyad symmetry

In the specific circumstance that v = u , then uuᵀ ∈ R^{N×N} is symmetric, rank-one, and positive semidefinite having exactly N−1 0-eigenvalues. In fact, (Theorem A.3.1.0.7)

    uvᵀ ≽ 0 ⇔ v = u          (1243)

and the remaining eigenvalue is almost always positive;

    λ = uᵀu = tr(uuᵀ) > 0 unless u = 0          (1244)

The matrix

    [Ψ u ; uᵀ 1]          (1245)

for example, is rank-1 positive semidefinite if and only if Ψ = uuᵀ.

B.1.1 Dyad independence

Now we consider a sum of dyads like (1232) as encountered in diagonalization and singular value decomposition:

    R(Σ_{i=1}^k sᵢwᵢᵀ) = Σ_{i=1}^k R(sᵢwᵢᵀ) = Σ_{i=1}^k R(sᵢ)  ⇐  wᵢ ∀i are l.i.          (1246)

because the range of the summation is the vector sum of ranges.^{B.3} (Theorem B.1.1.1.1) Under the assumption the dyads are linearly independent (l.i.), then the vector sums are unique (p.644): for {wᵢ} l.i. and {sᵢ} l.i.

    R(Σ_{i=1}^k sᵢwᵢᵀ) = R(s₁w₁ᵀ) ⊕ ... ⊕ R(s_k w_kᵀ) = R(s₁) ⊕ ... ⊕ R(s_k)          (1247)

B.3 Move of range R to inside the summation depends on linear independence of {wᵢ}.
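Returning to (1242), a minimal Matlab check of the Sherman-Morrison formula on random data (our own construction):

    N = 5;
    A = randn(N) + N*eye(N);                 % generically nonsingular
    u = randn(N,1);  v = randn(N,1);
    assert(abs(1 + v.'*(A\u)) > 1e-8)        % hypothesis 1 + v'*inv(A)*u ~= 0
    lhs = inv(A + u*v.');
    rhs = inv(A) - (A\u)*(v.'/A)/(1 + v.'*(A\u));
    assert(norm(lhs - rhs, 'fro') < 1e-8)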


B.1.1.0.1 Definition. Linearly independent dyads. [137, p.29, thm.11] [227, p.2] The set of k dyads

    {s_i w_i^T | i = 1 ... k}    (1248)

where s_i ∈ C^M and w_i ∈ C^N, is said to be linearly independent iff

    $\operatorname{rank}\!\left(SW^T \triangleq \sum_{i=1}^{k} s_i w_i^T\right) = k$    (1249)

where S ≜ [s₁ ⋯ s_k] ∈ C^{M×k} and W ≜ [w₁ ⋯ w_k] ∈ C^{N×k}.    △

As defined, dyad independence does not preclude existence of a nullspace N(SW^T), nor does it imply SW^T is full-rank. In absence of an assumption of independence, generally, rank SW^T ≤ k. Conversely, any rank-k matrix can be written in the form SW^T by singular value decomposition. (§A.6)

B.1.1.0.2 Theorem. Linearly independent (l.i.) dyads. Vectors {s_i ∈ C^M, i = 1 ... k} are l.i. and vectors {w_i ∈ C^N, i = 1 ... k} are l.i. if and only if dyads {s_i w_i^T ∈ C^{M×N}, i = 1 ... k} are l.i.    ⋄

Proof. Linear independence of k dyads is identical to definition (1249).
(⇒) Suppose {s_i} and {w_i} are each linearly independent sets. Invoking Sylvester's rank inequality, [131, §0.4] [269, §2.4]

    rank S + rank W − k ≤ rank(SW^T) ≤ min{rank S, rank W} (≤ k)    (1250)

Then k ≤ rank(SW^T) ≤ k, which implies the dyads are independent.
(⇐) Conversely, suppose rank(SW^T) = k. Then

    k ≤ min{rank S, rank W} ≤ k    (1251)

implying the vector sets are each independent.    ∎
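Definition (1249) is directly testable: stack the vectors into S and W and compare rank(S*W') with k. A small Matlab sketch (arbitrary data; not from the book):

    % Sketch (illustrative only): dyad independence test (1249).
    rng(1);  M = 6;  N = 5;  k = 3;
    S = randn(M,k);  W = randn(N,k);    % generic => columns l.i.
    rank(S*W')                          % returns k = 3 : dyads independent
    S(:,3) = S(:,1) + S(:,2);           % make {s_i} linearly dependent
    rank(S*W')                          % returns 2 < k : independence lost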


B.1.1.1 Biorthogonality condition, Range and Nullspace of Sum

Dyads characterized by a biorthogonality condition W^T S = I are independent; id est, for S ∈ C^{M×k} and W ∈ C^{N×k}, if W^T S = I then rank(SW^T) = k by the linearly independent dyads theorem because (confer §E.1.1)

    W^T S = I ⇔ rank S = rank W = k ≤ M = N    (1252)

To see that, we need only show: N(S) = 0 ⇔ ∃ B such that BS = I.^{B.4}
(⇐) Assume BS = I. Then N(BS) = 0 = {x | BSx = 0} ⊇ N(S). (1233)
(⇒) If N(S) = 0 then S must be full-rank skinny-or-square.

    $\therefore\ \exists\, A, B, C \quad \begin{bmatrix} B \\ C \end{bmatrix} [\,S\ \ A\,] = I$  (id est, [S A] is invertible)  ⇒ BS = I.

Left inverse B is given as W^T here. Because of reciprocity with S, it immediately follows: N(W) = 0 ⇔ ∃ S such that S^T W = I.    ∎

Dyads produced by diagonalization, for example, are independent because of their inherent biorthogonality. (§A.5.1) The converse is generally false; id est, linearly independent dyads are not necessarily biorthogonal.

B.1.1.1.1 Theorem. Nullspace and range of dyad sum. Given a sum of dyads represented by SW^T where S ∈ C^{M×k} and W ∈ C^{N×k},

    N(SW^T) = N(W^T)  ⇐  ∃ B  BS = I
    R(SW^T) = R(S)    ⇐  ∃ Z  W^T Z = I    (1253)
⋄

Proof. (⇒) N(SW^T) ⊇ N(W^T) and R(SW^T) ⊆ R(S) are obvious.
(⇐) Assume the existence of a left inverse B ∈ R^{k×N} and a right inverse Z ∈ R^{N×k}.^{B.5}

    N(SW^T) = {x | SW^T x = 0} ⊆ {x | BSW^T x = 0} = N(W^T)    (1254)
    R(SW^T) = {SW^T x | x ∈ R^N} ⊇ {SW^T Zy | Zy ∈ R^N} = R(S)    (1255)
∎

^{B.4} Left inverse is not unique, in general.
^{B.5} By counterexample, the theorem's converse cannot be true; e.g., S = W = [1 0].


Figure 105: Four fundamental subspaces [222, §3.6] of a doublet Π = uv^T + vu^T ∈ S^N: R^N = R([u v]) ⊕ N([v u]^T) while N([v u]^T) ⊕ R([u v]) = R^N, where R(Π) = R([u v]) and N(Π) = v^⊥ ∩ u^⊥. Π(x) = (uv^T + vu^T)x is a linear bijective mapping from R([u v]) to R([u v]).

B.2 Doublet

Consider a sum of two linearly independent square dyads, one a transposition of the other:

    $\Pi = uv^T + vu^T = [\,u\ \ v\,]\begin{bmatrix} v^T \\ u^T \end{bmatrix} = SW^T \in \mathbb{S}^N$    (1256)

where u, v ∈ R^N. Like the dyad, a doublet can be 0 only when u or v is 0;

    Π = uv^T + vu^T = 0 ⇔ u = 0 or v = 0    (1257)

By assumption of independence, a nonzero doublet has two nonzero eigenvalues

    λ₁ ≜ u^T v + ‖uv^T‖ ,  λ₂ ≜ u^T v − ‖uv^T‖    (1258)

where λ₁ > 0 > λ₂, with corresponding eigenvectors

    $x_1 \triangleq \frac{u}{\|u\|} + \frac{v}{\|v\|}\,, \qquad x_2 \triangleq \frac{u}{\|u\|} - \frac{v}{\|v\|}$    (1259)

spanning the doublet range. Eigenvalue λ₁ cannot be 0 unless u and v have opposing directions, but that is antithetical since then the dyads would no longer be independent. Eigenvalue λ₂ is 0 if and only if u and v share the same direction, again antithetical. Generally we have λ₁ > 0 and λ₂ < 0, so Π is indefinite.
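Formulas (1258), and the fact that the remaining N−2 eigenvalues are zero, are easily confirmed. A Matlab sketch with arbitrary independent u and v (not from the book), using ‖uv^T‖ = ‖u‖‖v‖ from (1236):

    % Sketch (illustrative only): doublet eigenvalues (1258).
    u = [1; 2; 0; -1];  v = [2; -1; 1; 3];       % arbitrary, independent
    Pi = u*v' + v*u';
    lam = sort(eig(Pi),'descend')'               % two nonzero, two ~0
    nrm = norm(u)*norm(v);                       % = ||u v'||_2
    [u'*v + nrm,  u'*v - nrm]                    % matches lam(1), lam(end)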


Figure 106: The four fundamental subspaces [222, §3.6] of elementary matrix E as a linear mapping E(x) = (I − uv^T/(v^T u))x, drawn for v^T u = 1/ζ: R^N = N(u^T) ⊕ N(E) while R(v) ⊕ R(E) = R^N, where R(E) = N(v^T) and N(E) = R(u).

By the nullspace and range of dyad sum theorem, doublet Π has N−2 zero-eigenvalues remaining and corresponding eigenvectors spanning $\mathcal{N}\!\left(\begin{bmatrix} v^T \\ u^T \end{bmatrix}\right)$. We therefore have

    R(Π) = R([u v]) ,  N(Π) = v^⊥ ∩ u^⊥    (1260)

of respective dimension 2 and N−2.

B.3 Elementary matrix

A matrix of the form

    E = I − ζuv^T ∈ R^{N×N}    (1261)

where ζ ∈ R is finite and u, v ∈ R^N, is called an elementary matrix or a rank-one modification of the identity. [133] Any elementary matrix in R^{N×N} has N−1 eigenvalues equal to 1 corresponding to real eigenvectors that span v^⊥. The remaining eigenvalue

    λ = 1 − ζ v^T u    (1262)

corresponds to eigenvector u.^{B.6} From [143, App.7.A.26] the determinant:

    det E = 1 − tr(ζuv^T) = λ    (1263)

^{B.6} Elementary matrix E is not always diagonalizable because eigenvector u need not be independent of the others; id est, u ∈ v^⊥ is possible.


If λ ≠ 0 then E is invertible; [87]

    $E^{-1} = I + \frac{\zeta}{\lambda}\, uv^T$    (1264)

Eigenvectors corresponding to 0 eigenvalues belong to N(E), and the number of 0 eigenvalues must be at least dim N(E) which, here, can be at most one. (§A.7.2.0.1) The nullspace exists, therefore, when λ = 0; id est, when v^T u = 1/ζ; rather, whenever u belongs to the hyperplane {z ∈ R^N | v^T z = 1/ζ}. Then (when λ = 0) elementary matrix E is a nonorthogonal projector projecting on its range (E² = E, §E.1) and N(E) = R(u); eigenvector u spans the nullspace when it exists. By conservation of dimension, dim R(E) = N − dim N(E). It is apparent from (1261) that v^⊥ ⊆ R(E), but dim v^⊥ = N−1. Hence R(E) ≡ v^⊥ when the nullspace exists, and the remaining eigenvectors span it.

In summary, when a nontrivial nullspace of E exists,

    R(E) = N(v^T) ,  N(E) = R(u) ,  v^T u = 1/ζ    (1265)

illustrated in Figure 106, which is opposite to the assignment of subspaces for a dyad (Figure 104). Otherwise, R(E) = R^N.

When E = E^T, the spectral norm is

    ‖E‖₂ = max{1, |λ|}    (1266)

B.3.1 Householder matrix

An elementary matrix is called a Householder matrix when it has the defining form, for nonzero vector u [93, §5.1.2] [87, §4.10.1] [220, §7.3] [131, §2.2]

    $H = I - 2\,\frac{uu^T}{u^T u} \in \mathbb{S}^N$    (1267)

which is a symmetric orthogonal (reflection) matrix (H^{-1} = H^T = H, §B.5.2). Vector u is normal to an N−1-dimensional subspace u^⊥ through which this particular H effects pointwise reflection; e.g., Hu^⊥ = u^⊥ while Hu = −u.

Matrix H has N−1 orthonormal eigenvectors spanning that reflecting subspace u^⊥ with corresponding eigenvalues equal to 1. The remaining eigenvector u has corresponding eigenvalue −1; so

    det H = −1    (1268)
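The reflection behavior of (1267) can be checked directly. A minimal Matlab sketch with an arbitrary nonzero u (not from the book):

    % Sketch (illustrative only): Householder matrix (1267) is a symmetric
    % orthogonal reflector.
    u = [2; -1; 3];
    H = eye(3) - 2*(u*u')/(u'*u);
    norm(H - H','fro')          % ~0 : symmetric
    norm(H*H - eye(3),'fro')    % ~0 : H^{-1} = H^T = H
    H*u + u                     % ~0 : Hu = -u, reflection through u-perp
    det(H)                      % -1 : (1268)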


Due to symmetry of H, the matrix 2-norm (the spectral norm) is equal to the largest eigenvalue-magnitude. A Householder matrix is thus characterized,

    H^T = H ,  H^{-1} = H^T ,  ‖H‖₂ = 1 ,  H ⋡ 0    (1269)

For example, the permutation matrix

    $\Xi = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$    (1270)

is a Householder matrix having u = [0 1 −1]^T/√2. Not all permutation matrices are Householder matrices, although all permutation matrices are orthogonal matrices. [220, §3.4] Neither are all symmetric permutation matrices Householder matrices; e.g.,

    $\Xi = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}$    (1352)

is not a Householder matrix.

B.4 Auxiliary V-matrices

B.4.1 Auxiliary projector matrix V

It is convenient to define a matrix V that arises naturally as a consequence of translating the geometric center α_c (§4.5.1.0.1) of some list X to the origin. In place of X − α_c 1^T we may write XV as in (548) where

    V ≜ I − (1/N)11^T ∈ S^N    (491)

is an elementary matrix called the geometric centering matrix.

Any elementary matrix in R^{N×N} has N−1 eigenvalues equal to 1. For the particular elementary matrix V, the N-th eigenvalue equals 0. The number of 0 eigenvalues must equal dim N(V) = 1, by the 0 eigenvalues theorem (§A.7.2.0.1), because V = V^T is diagonalizable. Because

    V1 = 0    (1271)

the nullspace N(V) = R(1) is spanned by the eigenvector 1. The remaining eigenvectors span R(V) ≡ 1^⊥ = N(1^T), which has dimension N−1.


Because

    V² = V    (1272)

and V^T = V, elementary matrix V is also a projection matrix (§E.3) projecting orthogonally on its range N(1^T).

    V = I − 1(1^T 1)^{-1} 1^T    (1273)

The {0, 1} eigenvalues also indicate diagonalizable V is a projection matrix. [269, §4.1, thm.4.1] Symmetry of V denotes orthogonal projection; from (1515),

    V^T = V ,  V† = V ,  ‖V‖₂ = 1 ,  V ≽ 0    (1274)

Matrix V is also circulant. [101]

B.4.1.0.1 Example. Relationship of auxiliary to Householder matrix. Let H ∈ S^N be a Householder matrix (1267) defined by

    $u = \begin{bmatrix} 1 \\ \vdots \\ 1 \\ 1+\sqrt{N} \end{bmatrix} \in \mathbb{R}^N$    (1275)

Then we have [90, §2]

    $V = H \begin{bmatrix} I & 0 \\ 0^T & 0 \end{bmatrix} H$    (1276)

Let D ∈ S^N_h and define

    $-HDH \triangleq \begin{bmatrix} A & b \\ b^T & c \end{bmatrix}$    (1277)

where b is a vector. Then because H is nonsingular (§A.3.1.0.5) [114, §3]

    $-VDV = -H \begin{bmatrix} A & 0 \\ 0^T & 0 \end{bmatrix} H \succeq 0 \quad \Leftrightarrow \quad -A \succeq 0$    (1278)

and affine dimension is r = rank A when D is a Euclidean distance matrix.
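Identity (1276), relating the geometric centering matrix to a Householder matrix via u from (1275), is another quick numerical check. A Matlab sketch (not from the book; N chosen arbitrarily):

    % Sketch (illustrative only): verify V = H*[I 0; 0' 0]*H  (1276).
    N = 6;
    V = eye(N) - ones(N)/N;              % geometric centering matrix (491)
    u = [ones(N-1,1); 1+sqrt(N)];        % (1275)
    H = eye(N) - 2*(u*u')/(u'*u);        % Householder matrix (1267)
    P = blkdiag(eye(N-1), 0);            % [I 0; 0' 0]
    norm(V - H*P*H,'fro')                % ~1e-15
    norm(V*V - V,'fro')                  % ~0 : V is a projector (1272)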


B.4.2 Schoenberg auxiliary matrix V_N

1.  $V_N = \frac{1}{\sqrt{2}}\begin{bmatrix} -\mathbf{1}^T \\ I \end{bmatrix} \in \mathbb{R}^{N\times N-1}$
2.  V_N^T 1 = 0
3.  I − e₁1^T = [0 √2 V_N]
4.  [0 √2 V_N] V_N = V_N
5.  [0 √2 V_N] V = V
6.  V [0 √2 V_N] = [0 √2 V_N]
7.  [0 √2 V_N][0 √2 V_N] = [0 √2 V_N]
8.  $[\,0\ \sqrt{2}V_N\,]^{\dagger} = \begin{bmatrix} 0 & 0^T \\ 0 & I \end{bmatrix} V$
9.  [0 √2 V_N]† V = [0 √2 V_N]†
10. [0 √2 V_N][0 √2 V_N]† = V
11. $[\,0\ \sqrt{2}V_N\,]^{\dagger}[\,0\ \sqrt{2}V_N\,] = \begin{bmatrix} 0 & 0^T \\ 0 & I \end{bmatrix}$
12. $[\,0\ \sqrt{2}V_N\,]\begin{bmatrix} 0 & 0^T \\ 0 & I \end{bmatrix} = [\,0\ \sqrt{2}V_N\,]$
13. $\begin{bmatrix} 0 & 0^T \\ 0 & I \end{bmatrix}[\,0\ \sqrt{2}V_N\,] = \begin{bmatrix} 0 & 0^T \\ 0 & I \end{bmatrix}$
14. $[\,V_N\ \ \tfrac{1}{\sqrt{2}}\mathbf{1}\,]^{-1} = \begin{bmatrix} V_N^{\dagger} \\ \tfrac{\sqrt{2}}{N}\mathbf{1}^T \end{bmatrix}$
15. $V_N^{\dagger} = \sqrt{2}\,\big[-\tfrac{1}{N}\mathbf{1}\ \ \ I-\tfrac{1}{N}\mathbf{1}\mathbf{1}^T\big] \in \mathbb{R}^{N-1\times N}$,  $\big(I-\tfrac{1}{N}\mathbf{1}\mathbf{1}^T \in \mathbb{S}^{N-1}\big)$
16. V_N† 1 = 0
17. V_N† V_N = I


18. V^T = V = V_N V_N† = I − (1/N)11^T ∈ S^N
19. −V_N†(11^T − I)V_N = I    (11^T − I ∈ EDM^N)
20. D = [d_ij] ∈ S^N_h    (493)

    $\operatorname{tr}(-VDV) = \operatorname{tr}(-VD) = \operatorname{tr}(-V_N^{\dagger}DV_N) = \tfrac{1}{N}\mathbf{1}^TD\mathbf{1} = \tfrac{1}{N}\operatorname{tr}(\mathbf{1}\mathbf{1}^TD) = \tfrac{1}{N}\sum_{i,j} d_{ij}$

    Any elementary matrix E ∈ S^N of the particular form

        E = k₁I − k₂11^T    (1279)

    where k₁, k₂ ∈ R,^{B.7} will make tr(−ED) proportional to Σ d_ij.
21. D = [d_ij] ∈ S^N

    $\operatorname{tr}(-VDV) = \tfrac{1}{N}\sum_{i,j:\,i\neq j} d_{ij} - \tfrac{N-1}{N}\sum_i d_{ii} = \tfrac{1}{N}\mathbf{1}^TD\mathbf{1} - \operatorname{tr}D$

22. D = [d_ij] ∈ S^N_h

    $\operatorname{tr}(-V_N^TDV_N) = \sum_j d_{1j}$

23. For Y ∈ S^N

    V(Y − δ(Y1))V = Y − δ(Y1)

B.4.3 Orthonormal auxiliary matrix V_W

The skinny matrix

    $V_W \triangleq \begin{bmatrix} \frac{-1}{\sqrt{N}} & \frac{-1}{\sqrt{N}} & \cdots & \frac{-1}{\sqrt{N}} \\[3pt] 1+\frac{-1}{N+\sqrt{N}} & \frac{-1}{N+\sqrt{N}} & \cdots & \frac{-1}{N+\sqrt{N}} \\[3pt] \frac{-1}{N+\sqrt{N}} & 1+\frac{-1}{N+\sqrt{N}} & & \vdots \\[3pt] \vdots & & \ddots & \frac{-1}{N+\sqrt{N}} \\[3pt] \frac{-1}{N+\sqrt{N}} & \cdots & \frac{-1}{N+\sqrt{N}} & 1+\frac{-1}{N+\sqrt{N}} \end{bmatrix} \in \mathbb{R}^{N\times N-1}$    (1280)

^{B.7} If k₁ is 1−ρ while k₂ equals −ρ ∈ R, then all eigenvalues of E for −1/(N−1) < ρ < 1 are guaranteed positive and therefore E is guaranteed positive definite. [197]


has R(V_W) = N(1^T) and orthonormal columns. [4] We defined three auxiliary V-matrices: V, V_N (474), and V_W, sharing some attributes listed in Table B.4.4. For example, V can be expressed

    V = V_W V_W^T = V_N V_N†    (1281)

but V_W^T V_W = I means V is an orthogonal projector (1512) and

    V_W† = V_W^T ,  ‖V_W‖₂ = 1 ,  V_W^T 1 = 0    (1282)

B.4.4 Auxiliary V-matrix Table

            dim V       rank V   R(V)     N(V^T)   V^T V                 V V^T                                      V V†
    V       N×N         N−1      N(1^T)   R(1)     V                     V                                          V
    V_N     N×(N−1)     N−1      N(1^T)   R(1)     ½(I + 11^T)           $\tfrac{1}{2}\begin{bmatrix} N-1 & -\mathbf{1}^T \\ -\mathbf{1} & I \end{bmatrix}$   V
    V_W     N×(N−1)     N−1      N(1^T)   R(1)     I                     V                                          V

B.4.5 More auxiliary matrices

Mathar shows [169, §2] that any elementary matrix (§B.3) of the form

    V_M = I − b1^T ∈ R^{N×N}    (1283)

such that b^T 1 = 1 (confer [95, §2]), is an auxiliary V-matrix having

    R(V_M^T) = N(b^T) ,  R(V_M) = N(1^T)
    N(V_M) = R(b) ,      N(V_M^T) = R(1)    (1284)

Given X ∈ R^{n×N}, the choice b = (1/N)1 (V_M = V) minimizes ‖X(I − b1^T)‖_F. [97, §3.2.1]


B.5. ORTHOGONAL MATRIX 505B.5 Orthogonal matrixB.5.1Vector rotationThe property Q −1 = Q T completely defines an orthogonal matrix Q ∈ R n×nemployed to effect vector rotation; [220,2.6,3.4] [222,6.5] [131,2.1] forx ∈ R n ‖Qx‖ = ‖x‖ (1285)The orthogonal matrix is characterized:Q −1 = Q T , ‖Q‖ 2 = 1 (1286)Applying characterization (1286) to Q T we see it too is an orthogonal matrix.Hence the rows and columns of Q respectively form an orthonormal set.All permutation matrices Ξ , for example, are orthogonal matrices. Thelargest magnitude entry of any orthogonal matrix is 1; for each and everyj ∈1... n‖Q(j,:)‖ ∞ ≤ 1(1287)‖Q(:, j)‖ ∞ ≤ 1Each and every eigenvalue of a (real) orthogonal matrix has magnitude 1λ(Q) ∈ C n , |λ(Q)| = 1 (1288)while only the identity matrix can be simultaneously positive definite andorthogonal.A unitary matrix is a complex generalization of the orthogonal matrix.The conjugate transpose defines it: U −1 = U H . An orthogonal matrix issimply a real unitary matrix.B.5.2ReflectionA matrix for pointwise reflection is defined by imposing symmetry uponthe orthogonal matrix; id est, a reflection matrix is completely definedby Q −1 = Q T = Q . The reflection matrix is an orthogonal matrix,characterized:Q T = Q , Q −1 = Q T , ‖Q‖ 2 = 1 (1289)The Householder matrix (B.3.1) is an example of a symmetric orthogonal(reflection) matrix.


Figure 107: Gimbal: a mechanism imparting three degrees of dimensional freedom to a Euclidean body suspended at its center. Each ring is free to rotate about one axis. (Drawing courtesy of The MathWorks Inc.)

Reflection matrices have eigenvalues equal to ±1 and so det Q = ±1. It is natural to expect a relationship between reflection and projection matrices because all projection matrices have eigenvalues belonging to {0, 1}. In fact, any reflection matrix Q is related to some orthogonal projector P by [133, §1, prob.44]

    Q = I − 2P    (1290)

Yet P is, generally, neither orthogonal nor invertible. (§E.3.2)

    λ(Q) ∈ R^n ,  |λ(Q)| = 1    (1291)

Reflection is with respect to R(P)^⊥. Matrix 2P − I represents antireflection. Every orthogonal matrix can be expressed as the product of a rotation and a reflection. The collection of all orthogonal matrices of particular dimension does not form a convex set.

B.5.3 Rotation of range and rowspace

Given orthogonal matrix Q, column vectors of a matrix X are simultaneously rotated by the product QX. In three dimensions (X ∈ R^{3×N}), the precise meaning of rotation is best illustrated in Figure 107 where the gimbal aids visualization of rotation achievable about the origin.


B.5.3.0.1 Example. One axis of revolution. Partition an n+1-dimensional Euclidean space $\mathbb{R}^{n+1} \triangleq \begin{bmatrix} \mathbb{R}^n \\ \mathbb{R} \end{bmatrix}$ and define an n-dimensional subspace

    R ≜ {λ ∈ R^{n+1} | 1^T λ = 0}    (1292)

(a hyperplane through the origin). We want an orthogonal matrix that rotates a list in the columns of matrix X ∈ R^{n+1×N} through the dihedral angle between R^n and R: (R^n, R) = arccos(1/√(n+1)) radians. The vertex-description of the nonnegative orthant in R^{n+1} is

    {[e₁ e₂ ⋯ e_{n+1}]a | a ≽ 0} = {a ≽ 0} ⊂ R^{n+1}    (1293)

Consider rotation of these vertices via orthogonal matrix

    $Q \triangleq \big[\,\mathbf{1}\tfrac{1}{\sqrt{n+1}}\ \ \ \Xi V_W\,\big]\,\Xi \in \mathbb{R}^{n+1\times n+1}$    (1294)

where permutation matrix Ξ ∈ S^{n+1} is defined in (1352), and V_W ∈ R^{n+1×n} is the orthonormal auxiliary matrix defined in §B.4.3. This particular orthogonal matrix is selected because it rotates any point in R^n about one axis of revolution onto R; e.g., rotation Qe_{n+1} aligns the last standard basis vector with subspace normal R^⊥ = 1, and from these two vectors we get (R^n, R). The rotated standard basis vectors remaining are orthonormal spanning R.

Another interpretation of product QX is rotation/reflection of R(X). Rotation of X as in QXQ^T is the simultaneous rotation/reflection of range and rowspace.^{B.8}

Proof. Any matrix can be expressed as a singular value decomposition X = UΣW^T (1192) where δ²(Σ) = Σ, R(U) ⊇ R(X), and R(W) ⊇ R(X^T). ∎

^{B.8} The product Q^T A Q can be regarded as a coordinate transformation; e.g., given linear map y = Ax : R^n → R^n and orthogonal Q, the transformation Qy = AQx is a rotation/reflection of the range and rowspace (114) of matrix A where Qy ∈ R(A) and Qx ∈ R(A^T) (115).


508 APPENDIX B. SIMPLE MATRICESB.5.4Matrix rotationOrthogonal matrices are also employed to rotate/reflect like vectors othermatrices: [sic] [93,12.4.1] Given orthogonal matrix Q , the product Q T Awill rotate A∈ R n×n in the Euclidean sense in R n2 because the Frobeniusnorm is orthogonally invariant (2.2.1);‖Q T A‖ F = √ tr(A T QQ T A) = ‖A‖ F (1295)(likewise for AQ). Were A symmetric, such a rotation would depart fromS n . One remedy is to instead form the product Q T AQ because‖Q T AQ‖ F =√tr(Q T A T QQ T AQ) = ‖A‖ F (1296)Matrix A is orthogonally equivalent to B if B=S T AS for someorthogonal matrix S . Every square matrix, for example, is orthogonallyequivalent to a matrix having equal entries along the main diagonal.[131,2.2, prob.3]B.5.4.1bijectionAny product AQ of orthogonal matrices remains orthogonal. Givenany other dimensionally compatible orthogonal matrix U , the mappingg(A)= U T AQ is a linear bijection on the domain of orthogonal matrices.[156,2.1]


Appendix C

Some analytical optimal results

C.1 properties of infima

    inf ∅ ≜ ∞    (1297)
    sup ∅ ≜ −∞    (1298)

• Given f(x) : X → R defined on arbitrary set X [129, §0.1.2]

    inf_{x∈X} f(x) = − sup_{x∈X} −f(x) ,  sup_{x∈X} f(x) = − inf_{x∈X} −f(x)    (1299)

    arg inf_{x∈X} f(x) = arg sup_{x∈X} −f(x) ,  arg sup_{x∈X} f(x) = arg inf_{x∈X} −f(x)    (1300)

• Given g(x, y) : X × Y → R with independent variables x and y defined on mutually independent arbitrary sets X and Y [129, §0.1.2]

    inf_{x∈X, y∈Y} g(x, y) = inf_{x∈X} inf_{y∈Y} g(x, y) = inf_{y∈Y} inf_{x∈X} g(x, y)    (1301)

The variables are independent if and only if the corresponding sets are not interdependent. An objective function of partial infimum is not necessarily unique.


• Given f(x) : X → R and g(x) : X → R defined on arbitrary set X [129, §0.1.2]

    inf_{x∈X} (f(x) + g(x)) ≥ inf_{x∈X} f(x) + inf_{x∈X} g(x)    (1302)

• Given f(x) : X ∪ Y → R and arbitrary sets X and Y [129, §0.1.2]

    X ⊂ Y ⇒ inf_{x∈X} f(x) ≥ inf_{x∈Y} f(x)    (1303)
    inf_{x∈X∪Y} f(x) = min{inf_{x∈X} f(x), inf_{x∈Y} f(x)}    (1304)
    inf_{x∈X∩Y} f(x) ≥ max{inf_{x∈X} f(x), inf_{x∈Y} f(x)}    (1305)

• Over some convex set C, given vector constant y or matrix constant Y

    arg inf_{x∈C} ‖x − y‖₂ = arg inf_{x∈C} ‖x − y‖₂²    (1306)
    arg inf_{X∈C} ‖X − Y‖_F = arg inf_{X∈C} ‖X − Y‖_F²    (1307)

C.2 involving absolute value

Optimal solution is norm dependent. [41, p.297]

    minimize_x ‖x‖₁ subject to x ∈ C   ≡   minimize_{t,x} 1^T t subject to −t ≼ x ≼ t , x ∈ C    (1308)

    minimize_x ‖x‖₂ subject to x ∈ C   ≡   minimize_{t,x} t subject to $\begin{bmatrix} tI & x \\ x^T & t \end{bmatrix} \succeq 0$ , x ∈ C    (1309)

    minimize_x ‖x‖_∞ subject to x ∈ C   ≡   minimize_{t,x} t subject to −t1 ≼ x ≼ t1 , x ∈ C    (1310)

In R^n the norms respectively represent: length measured along a grid, Euclidean length, maximum |coordinate|.


• (Ye) Assuming existence of a minimum element (p.101):

    minimize_x |x| subject to x ∈ C   ≡   minimize_{α,β,x} 1^T(α + β) subject to α, β ≽ 0 , x = α − β , x ∈ C    (1311)

All these problems are convex when C is.

C.3 involving diagonal, trace, eigenvalues

• For A ∈ R^{m×n} and σ(A) denoting its singular values, [41, §A.1.6] [78, §1]

    $\sum_i \sigma(A)_i = \operatorname{tr}\sqrt{A^TA} = \sup_{\|X\|_2 \le 1} \operatorname{tr}(X^TA) = \underset{X \in \mathbb{R}^{m\times n}}{\text{maximize}}\ \operatorname{tr}(X^TA)\ \ \text{subject to}\ \begin{bmatrix} I & X \\ X^T & I \end{bmatrix} \succeq 0$    (1312)

• For X ∈ S^m, Y ∈ S^n, A ∈ C ⊆ R^{m×n} for set C convex, and σ(A) denoting the singular values of A [78, §3]

    minimize_A Σ_i σ(A)_i subject to A ∈ C   ≡   minimize_{A,X,Y} tr X + tr Y subject to $\begin{bmatrix} X & A \\ A^T & Y \end{bmatrix} \succeq 0$ , A ∈ C    (1313)

• For A ∈ S^N₊ and β ∈ R

    β tr A = maximize_{X∈S^N} tr(XA) subject to X ≼ βI    (1314)


• For A ∈ S^N having eigenvalues λ(A) ∈ R^N [156, §2.1] [30, §I.6.15]

    min_i{λ(A)_i} = inf_{‖x‖=1} x^T A x = minimize_{X∈S^N₊} tr(XA) subject to tr X = 1
                  = maximize_{t∈R} t subject to A ≽ tI    (1315)

    max_i{λ(A)_i} = sup_{‖x‖=1} x^T A x = maximize_{X∈S^N₊} tr(XA) subject to tr X = 1
                  = minimize_{t∈R} t subject to A ≼ tI    (1316)

The minimum eigenvalue of any symmetric matrix is always a concave function of its entries, while the maximum eigenvalue is always convex. [41, exmp.3.10] For v₁ a normalized eigenvector of A corresponding to the largest eigenvalue, and v_N a normalized eigenvector corresponding to the smallest eigenvalue,

    v_N = arg inf_{‖x‖=1} x^T A x    (1317)
    v₁ = arg sup_{‖x‖=1} x^T A x    (1318)

• For A ∈ S^N having eigenvalues λ(A) ∈ R^N, consider the unconstrained nonconvex optimization that is a projection (§7.1.2) on the rank-1 subset (§2.9.2.1) of the boundary of positive semidefinite cone S^N₊: defining λ₁ ≜ max_i{λ(A)_i} and corresponding eigenvector v₁

    minimize_x ‖xx^T − A‖²_F = minimize_x tr(xx^T(x^T x) − 2Axx^T + A^TA)
        = { ‖λ(A)‖² ,        λ₁ ≤ 0
          { ‖λ(A)‖² − λ₁² ,  λ₁ > 0    (1319)


    arg minimize_x ‖xx^T − A‖²_F = { 0 ,        λ₁ ≤ 0
                                   { v₁√λ₁ ,   λ₁ > 0    (1320)

Proof. From (1404) and §D.2.1 the gradient is

    ∇_x((x^T x)² − 2x^T Ax) = 4(x^T x)x − 4Ax    (1321)

When the gradient is set to 0,

    Ax = x(x^T x)    (1322)

becomes necessary for optimal solution. Suppose we replace vector x with any normalized eigenvector v_i (of A ∈ S^N) corresponding to a positive eigenvalue λ_i (positive by Theorem A.3.1.0.7). If we scale that eigenvector by square root of the eigenvalue, then (1322) is satisfied

    x ← v_i√λ_i  ⇒  Av_i = v_i λ_i    (1323)
    xx^T = λ_i v_i v_i^T    (1324)

and the minimum is achieved when λ_i = λ₁ is the largest positive eigenvalue of A; if A has no positive eigenvalue then x = 0 yields the minimum. Given nonincreasingly ordered diagonalization A = QΛQ^T where Λ = δ(λ(A)) (§A.5), then (1319) has minimum value

    minimize_x ‖xx^T − A‖²_F =
      { ‖QΛQ^T‖²_F = ‖δ(Λ)‖² ,  λ₁ ≤ 0
      { $\left\| Q\!\left(\begin{bmatrix} \lambda_1 & & & \\ & 0 & & \\ & & \ddots & \\ & & & 0 \end{bmatrix} - \Lambda\right) Q^T \right\|_F^2 = \left\| \begin{bmatrix} \lambda_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} - \delta(\Lambda) \right\|^2$ ,  λ₁ > 0    (1325)
∎
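The closed form (1319)–(1320) can be verified by comparing objective values. A Matlab sketch (random symmetric test matrix, not from the book):

    % Sketch (illustrative only): projection on the rank-one PSD subset.
    rng(2);  N = 5;
    A = randn(N);  A = (A + A')/2;              % symmetric test matrix
    [Q,L] = eig(A);  [lam,i] = max(diag(L));    % largest eigenvalue, eigenvector
    if lam > 0, x = Q(:,i)*sqrt(lam); else x = zeros(N,1); end   % (1320)
    norm(x*x' - A,'fro')^2                      % achieved objective value
    norm(eig(A))^2 - (lam > 0)*lam^2            % matches (1319)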


• Given some convex set C, maximum eigenvalue magnitude µ of A ∈ S^N is minimized over C by the semidefinite program (confer §7.1.5)

    minimize_A ‖A‖₂ subject to A ∈ C   ≡   minimize_{µ,A} µ subject to −µI ≼ A ≼ µI , A ∈ C    (1326)

    µ* ≜ max_i{|λ(A*)_i| , i = 1 ... N} ∈ R₊    (1327)

• (Fan) For B ∈ S^N whose eigenvalues λ(B) ∈ R^N are arranged in nonincreasing order, and for 1 ≤ k ≤ N [9, §4.1] [140] [131, §4.3.18] [240, §2] [156, §2.1]

    Σ_{i=N−k+1}^{N} λ(B)_i = inf_{U∈R^{N×k}, U^TU=I} tr(UU^T B)
        = minimize_{X∈S^N₊} tr(XB) subject to X ≼ I , tr X = k    (a)
        = maximize_{µ∈R, Z∈S^N₊} (k − N)µ + tr(B − Z) subject to µI + Z ≽ B    (b)

    Σ_{i=1}^{k} λ(B)_i = sup_{U∈R^{N×k}, U^TU=I} tr(UU^T B)
        = maximize_{X∈S^N₊} tr(XB) subject to X ≼ I , tr X = k    (c)
        = minimize_{µ∈R, Z∈S^N₊} kµ + tr Z subject to µI + Z ≽ B    (d)
                                                                    (1328)

Given ordered diagonalization B = WΛW^T, then optimal U for the infimum is U* = W(:, N−k+1:N) ∈ R^{N×k} whereas U* = W(:, 1:k) ∈ R^{N×k} for the supremum. In both cases, X* = U*U*^T. Optimization (a) searches the convex hull of the outer product UU^T of all N×k orthonormal matrices. (§2.3.2.1.1)
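The extremes in (1328) are attainable directly from an ordered eigendecomposition, without solving a semidefinite program. A Matlab sketch (not from the book) confirming that the stated U* achieve the supremum and infimum:

    % Sketch (illustrative only): Fan's theorem (1328) via tr(U*U'*B).
    rng(3);  N = 6;  k = 2;
    B = randn(N);  B = (B + B')/2;
    [W,L] = eig(B);  [lam,i] = sort(diag(L),'descend');  W = W(:,i);
    U = W(:,1:k);                       % optimal U for the supremum (c)
    trace(U*U'*B) - sum(lam(1:k))       % ~0
    U = W(:,N-k+1:N);                   % optimal U for the infimum (a)
    trace(U*U'*B) - sum(lam(N-k+1:N))   % ~0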


• For B ∈ S^N whose eigenvalues λ(B) ∈ R^N are arranged in nonincreasing order, and for diagonal matrix Υ ∈ S^k whose diagonal entries are arranged in nonincreasing order where 1 ≤ k ≤ N, we utilize the main-diagonal δ operator's property of self-adjointness (1063); [10, §4.2]

    Σ_{i=1}^{k} Υ_ii λ(B)_{N−i+1} = inf_{U∈R^{N×k}, U^TU=I} tr(ΥU^T BU) = inf_{U∈R^{N×k}, U^TU=I} δ(Υ)^T δ(U^T BU)
        = minimize_{V_i∈S^N} tr(B Σ_{i=1}^{k} (Υ_ii − Υ_{i+1,i+1})V_i)
          subject to tr V_i = i ,  i = 1 ... k
                     I ≽ V_i ≽ 0 ,  i = 1 ... k    (1329)

where Υ_{k+1,k+1} ≜ 0. We speculate,

    Σ_{i=1}^{k} Υ_ii λ(B)_i = sup_{U∈R^{N×k}, U^TU=I} tr(ΥU^T BU) = sup_{U∈R^{N×k}, U^TU=I} δ(Υ)^T δ(U^T BU)    (1330)

Alizadeh shows: [9, §4.2]

    Σ_{i=1}^{k} Υ_ii λ(B)_i = minimize_{µ∈R^k, Z_i∈S^N} Σ_{i=1}^{k} iµ_i + tr Z_i
        subject to µ_i I + Z_i − (Υ_ii − Υ_{i+1,i+1})B ≽ 0 ,  i = 1 ... k
                   Z_i ≽ 0 ,  i = 1 ... k
      = maximize_{V_i∈S^N} tr(B Σ_{i=1}^{k} (Υ_ii − Υ_{i+1,i+1})V_i)
        subject to tr V_i = i ,  i = 1 ... k
                   I ≽ V_i ≽ 0 ,  i = 1 ... k    (1331)

where Υ_{k+1,k+1} ≜ 0.


• For B ∈ S^N whose eigenvalues λ(B) ∈ R^N are arranged in nonincreasing order, let Ξλ(B) be a permutation of λ(B) such that their absolute values become arranged in nonincreasing order: |Ξλ(B)|₁ ≥ |Ξλ(B)|₂ ≥ ⋯ ≥ |Ξλ(B)|_N. Then, for 1 ≤ k ≤ N [9, §4.3]^{C.1}

    Σ_{i=1}^{k} |Ξλ(B)|_i = minimize_{µ∈R, Y,Z∈S^N₊} kµ + tr(Y + Z)
        subject to µI + Y + B ≽ 0
                   µI + Z − B ≽ 0
      = maximize_{V,W∈S^N₊} ⟨B, V − W⟩ subject to I ≽ V, W ;  tr(V + W) = k    (1332)

For diagonal matrix Υ ∈ S^k whose diagonal entries are arranged in nonincreasing order where 1 ≤ k ≤ N

    Σ_{i=1}^{k} Υ_ii |Ξλ(B)|_i = minimize_{µ∈R^k, Y_i,Z_i∈S^N} Σ_{i=1}^{k} iµ_i + tr(Y_i + Z_i)
        subject to µ_i I + Y_i + (Υ_ii − Υ_{i+1,i+1})B ≽ 0 ,  i = 1 ... k
                   µ_i I + Z_i − (Υ_ii − Υ_{i+1,i+1})B ≽ 0 ,  i = 1 ... k
                   Y_i, Z_i ≽ 0 ,  i = 1 ... k
      = maximize_{V_i,W_i∈S^N} tr(B Σ_{i=1}^{k} (Υ_ii − Υ_{i+1,i+1})(V_i − W_i))
        subject to tr(V_i + W_i) = i ,  i = 1 ... k
                   I ≽ V_i ≽ 0 ,  i = 1 ... k
                   I ≽ W_i ≽ 0 ,  i = 1 ... k    (1333)

where Υ_{k+1,k+1} ≜ 0.

• For A, B ∈ S^N whose eigenvalues λ(A), λ(B) ∈ R^N are respectively arranged in nonincreasing order, and for nonincreasingly ordered diagonalizations A = W_A Υ W_A^T and B = W_B Λ W_B^T [130] [156, §2.1]

    λ(A)^T λ(B) = sup_{U∈R^{N×N}, U^TU=I} tr(A^T U^T BU) ≥ tr(A^T B)    (1351)

^{C.1} There exist typographical errors in [191, (6.49) (6.55)] for this minimization.


(confer (1356)) where optimal U is

    U* = W_B W_A^T ∈ R^{N×N}    (1348)

We can push that upper bound higher using a result in §C.5.2.1:

    |λ(A)|^T |λ(B)| = sup_{U∈C^{N×N}, U^HU=I} Re tr(A^T U^H BU)    (1334)

For step function ψ as defined in (1205), optimal U becomes

    $U^{\star} = W_B \sqrt{\delta(\psi(\delta(\Lambda)))}^{\,H} \sqrt{\delta(\psi(\delta(\Upsilon)))}\; W_A^T \in \mathbb{C}^{N\times N}$    (1335)

C.4 Orthogonal Procrustes problem

Given matrices A, B ∈ R^{n×N}, their product having full singular value decomposition (§A.6.3)

    AB^T ≜ UΣQ^T ∈ R^{n×n}    (1336)

then an optimal solution R* to the orthogonal Procrustes problem

    minimize_R ‖A − R^T B‖_F subject to R^T = R^{-1}    (1337)

maximizes tr(A^T R^T B) over the nonconvex manifold of orthogonal matrices: [131, §7.4.8]

    R* = QU^T ∈ R^{n×n}    (1338)

A necessary and sufficient condition for optimality

    AB^T R* ≽ 0    (1339)

holds whenever R* is an orthogonal matrix. [97, §4]

Solution to problem (1337) can reveal rotation/reflection (§4.5.2, §B.5) of one list in the columns of A with respect to another list B. Solution is unique if rank BV_N = n. [66, §2.4.1] The optimal value for the objective of minimization is

    tr(A^TA + B^TB − 2AB^TR*) = tr(A^TA) + tr(B^TB) − 2 tr(UΣU^T)
                               = ‖A‖²_F + ‖B‖²_F − 2δ(Σ)^T 1    (1340)


while the optimal value for corresponding trace maximization is

    sup_{R^T=R^{-1}} tr(A^T R^T B) = tr(A^T R*^T B) = δ(Σ)^T 1 ≥ tr(A^T B)    (1341)

The same optimal solution R* solves

    maximize_R ‖A + R^T B‖_F subject to R^T = R^{-1}    (1342)

C.4.1 Effect of translation

Consider the impact of dc offset in known lists A, B ∈ R^{n×N} on problem (1337). Rotation of B there is with respect to the origin, so better results may be obtained if offset is first accounted. Because the geometric centers of the lists AV and BV are the origin, instead we solve

    minimize_R ‖AV − R^T BV‖_F subject to R^T = R^{-1}    (1343)

where V ∈ S^N is the geometric centering matrix (§B.4.1). Now we define the full singular value decomposition

    AVB^T ≜ UΣQ^T ∈ R^{n×n}    (1344)

and an optimal rotation matrix

    R* = QU^T ∈ R^{n×n}    (1338)

The desired result is an optimally rotated offset list

    R*^T BV + A(I − V) ≈ A    (1345)

which most closely matches the list in A. Equality is attained when the lists are precisely related by a rotation/reflection and an offset. When R*^T B = A or B1 = A1 = 0, this result (1345) reduces to R*^T B ≈ A.
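The closed-form solution (1338) and optimal value (1340) reduce to a few lines of code. A Matlab sketch (not from the book) that plants an orthogonal matrix and recovers it from noiseless data:

    % Sketch (illustrative only): orthogonal Procrustes (1337)-(1340).
    rng(4);  n = 3;  N = 10;
    [R0,~] = qr(randn(n));        % planted orthogonal matrix
    B = randn(n,N);  A = R0'*B;   % A = R0'B, so R* should reproduce R0
    [U,S,Q] = svd(A*B');          % AB' = U*S*Q'   (1336)
    R = Q*U';                     % optimal R*     (1338)
    norm(A - R'*B,'fro')          % ~0 : exact fit
    norm(A,'fro')^2 + norm(B,'fro')^2 - 2*sum(diag(S))   % ~0 : (1340)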


C.4.1.1 Translation of extended list

Suppose an optimal rotation matrix R* ∈ R^{n×n} were derived as before from matrix B ∈ R^{n×N}, but B is part of a larger list in the columns of [C B] ∈ R^{n×M+N} where C ∈ R^{n×M}. In that event, we wish to apply the rotation/reflection and translation to the larger list. The expression supplanting the approximation in (1345) makes 1^T of compatible dimension;

    $R^{\star T}\!\left[\,C - B\mathbf{1}\tfrac{\mathbf{1}^T}{N}\ \ \ BV\,\right] + A\mathbf{1}\tfrac{\mathbf{1}^T}{N}$    (1346)

id est, $C - B\mathbf{1}\tfrac{\mathbf{1}^T}{N} \in \mathbb{R}^{n\times M}$ and $A\mathbf{1}\tfrac{\mathbf{1}^T}{N} \in \mathbb{R}^{n\times M+N}$.

C.5 Two-sided orthogonal Procrustes

C.5.0.1 Minimization

Given symmetric A, B ∈ S^N, each having diagonalization (§A.5.2)

    A ≜ Q_A Λ_A Q_A^T ,  B ≜ Q_B Λ_B Q_B^T    (1347)

where eigenvalues are arranged in their respective diagonal matrix Λ in nonincreasing order, then an optimal solution [73]

    R* = Q_B Q_A^T ∈ R^{N×N}    (1348)

to the two-sided orthogonal Procrustes problem

    minimize_R ‖A − R^T BR‖_F subject to R^T = R^{-1}
      ≡ minimize_R tr(A^TA − 2A^T R^T BR + B^TB) subject to R^T = R^{-1}    (1349)

maximizes tr(A^T R^T BR) over the nonconvex manifold of orthogonal matrices. Optimal product R*^T BR* has the eigenvectors of A but the eigenvalues of B. [97, §7.5.1] The optimal value for the objective of minimization is, by (39)

    ‖Q_A Λ_A Q_A^T − R*^T Q_B Λ_B Q_B^T R*‖_F = ‖Q_A(Λ_A − Λ_B)Q_A^T‖_F = ‖Λ_A − Λ_B‖_F    (1350)

while the corresponding trace maximization has optimal value

    sup_{R^T=R^{-1}} tr(A^T R^T BR) = tr(A^T R*^T BR*) = tr(Λ_A Λ_B) ≥ tr(A^T B)    (1351)


C.5.0.2 Maximization

Any permutation matrix is an orthogonal matrix. Defining a row and column swapping permutation matrix (a reflection matrix, §B.5.2) with ones along its antidiagonal

    $\Xi = \Xi^T = \begin{bmatrix} & & & 1 \\ & & ⋰ & \\ & 1 & & \\ 1 & & & \end{bmatrix}$    (1352)

then an optimal solution R* to the maximization problem [sic]

    maximize_R ‖A − R^T BR‖_F subject to R^T = R^{-1}    (1353)

minimizes tr(A^T R^T BR): [130] [156, §2.1]

    R* = Q_B Ξ Q_A^T ∈ R^{N×N}    (1354)

The optimal value for the objective of maximization is

    ‖Q_A Λ_A Q_A^T − R*^T Q_B Λ_B Q_B^T R*‖_F = ‖Q_A Λ_A Q_A^T − Q_A Ξ^T Λ_B Ξ Q_A^T‖_F
                                              = ‖Λ_A − ΞΛ_B Ξ‖_F    (1355)

while the corresponding trace minimization has optimal value

    inf_{R^T=R^{-1}} tr(A^T R^T BR) = tr(A^T R*^T BR*) = tr(Λ_A ΞΛ_B Ξ)    (1356)
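Both extremes (1350) and (1355) follow from eigendecompositions alone. A Matlab sketch (not from the book) comparing objective values at R* = Q_B Q_A^T and R* = Q_B Ξ Q_A^T:

    % Sketch (illustrative only): two-sided Procrustes extremes.
    rng(5);  N = 5;
    A = randn(N);  A = (A+A')/2;   B = randn(N);  B = (B+B')/2;
    [QA,LA] = eig(A);  [la,i] = sort(diag(LA),'descend');  QA = QA(:,i);
    [QB,LB] = eig(B);  [lb,i] = sort(diag(LB),'descend');  QB = QB(:,i);
    R = QB*QA';                                    % minimizer (1348)
    norm(A - R'*B*R,'fro') - norm(la - lb)         % ~0 : (1350)
    Xi = fliplr(eye(N));                           % antidiagonal permutation (1352)
    R = QB*Xi*QA';                                 % maximizer (1354)
    norm(A - R'*B*R,'fro') - norm(la - flipud(lb)) % ~0 : (1355)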


(where ⊗ signifies Kronecker product (§D.1.2.1)) has optimal objective value (1356). These two problems are strong duals (§2.13.1.0.2). Given ordered diagonalizations (1347), make the observation:

    inf_R tr(A^T R^T BR) = inf_{R̂} tr(Λ_A R̂^T Λ_B R̂)    (1358)

because R̂ ≜ Q_B^T R Q_A on the set of orthogonal matrices (which includes the permutation matrices) is a bijection. This means, basically, diagonal matrices of eigenvalues Λ_A and Λ_B may be substituted for A and B, so only the main diagonals of S and T come into play;

    maximize_{S,T∈S^N} 1^T δ(S + T) subject to δ(Λ_A ⊗ (ΞΛ_B Ξ) − I ⊗ S − T ⊗ I) ≽ 0    (1359)

a linear program in δ(S) and δ(T) having the same optimal objective value as the semidefinite program.

We relate their results to Procrustes problem (1349) by manipulating signs (1299) and permuting eigenvalues:

    maximize_R tr(A^T R^T BR) subject to R^T = R^{-1}
      = minimize_{S,T∈S^N} 1^T δ(S + T) subject to δ(I ⊗ S + T ⊗ I − Λ_A ⊗ Λ_B) ≽ 0
      = minimize_{S,T∈S^N} tr(S + T) subject to I ⊗ S + T ⊗ I − A^T ⊗ B ≽ 0    (1360)

This formulation has optimal objective value identical to that in (1351).

C.5.1 Procrustes' relation to linear programming

Although these two-sided Procrustes problems are nonconvex, a connection with linear programming [57] was discovered by Anstreicher & Wolkowicz [10, §3] [156, §2.1]: Given A, B ∈ S^N, this semidefinite program in S and T

    minimize_R tr(A^T R^T BR) subject to R^T = R^{-1}
      = maximize_{S,T∈S^N} tr(S + T) subject to A^T ⊗ B − I ⊗ S − T ⊗ I ≽ 0    (1357)


522 APPENDIX C. SOME ANALYTICAL OPTIMAL RESULTSmaximizes Re tr(A T SBR) : [209] [188] [35] [125] optimal orthogonal matricesS ⋆ = U A U H B ∈ R m×m , R ⋆ = Q B Q H A ∈ R n×n (1363)[sic] are not necessarily unique [131,7.4.13] because the feasible set is notconvex. The optimal value for the objective of minimization is, by (39)‖U A Σ A Q H A − S ⋆ U B Σ B Q H B R ⋆ ‖ F = ‖U A (Σ A − Σ B )Q H A ‖ F = ‖Σ A − Σ B ‖ F (1364)while the corresponding trace maximization has optimal value [30,III.6.12]sup | tr(A T SBR)| = sup Re tr(A T SBR) = Re tr(A T S ⋆ BR ⋆ ) = tr(ΣA TΣ B ) ≥ tr(AT B)R H =R −1R H =R −1S H =S −1 S H =S −1 (1365)for which it is necessaryA T S ⋆ BR ⋆ ≽ 0 , BR ⋆ A T S ⋆ ≽ 0 (1366)The lower bound on inner product of singular values in (1365) is due tovon Neumann. Equality is attained if U HA U B =I and QH B Q A =I .C.5.2.1Symmetric matricesNow optimizing over the complex manifold of unitary matrices (B.5.1),the upper bound on trace (1351) is thereby raised: Suppose we are givendiagonalizations for (real) symmetric A,B (A.5)A = W A ΥW TA ∈ S n , δ(Υ) ∈ K M (1367)B = W B ΛW T B ∈ S n , δ(Λ) ∈ K M (1368)having their respective eigenvalues in diagonal matrices Υ, Λ ∈ S n arrangedin nonincreasing order (membership to the monotone cone K M (361)). Thenby splitting eigenvalue signs, we invent a symmetric SVD-like decompositionA ∆ = U A Σ A Q H A ∈ S n , B ∆ = U B Σ B Q H B ∈ S n (1369)where U A , U B , Q A , Q B ∈ C n×n are unitary matrices defined by (conferA.6.4)U A ∆ = W A√δ(ψ(δ(Υ))) , QA ∆ = W A√δ(ψ(δ(Υ)))H, ΣA = |Υ| (1370)U B ∆ = W B√δ(ψ(δ(Λ))) , QB ∆ = W B√δ(ψ(δ(Λ)))H, ΣB = |Λ| (1371)


C.5. TWO-SIDED ORTHOGONAL PROCRUSTES 523where step function ψ is defined in (1205). In this circumstance,S ⋆ = U A U H B = R ⋆T ∈ C n×n (1372)optimal matrices (1363) now unitary are related by transposition.optimal value of objective (1364) isThe‖U A Σ A Q H A − S ⋆ U B Σ B Q H B R ⋆ ‖ F = ‖ |Υ| − |Λ| ‖ F (1373)while the corresponding optimal value of trace maximization (1365) isC.5.2.2supR H =R −1S H =S −1 Re tr(A T SBR) = tr(|Υ| |Λ|) (1374)Diagonal matricesNow suppose A and B are diagonal matricesA = Υ = δ 2 (Υ) ∈ S n , δ(Υ) ∈ K M (1375)B = Λ = δ 2 (Λ) ∈ S n , δ(Λ) ∈ K M (1376)both having their respective main-diagonal entries arranged in nonincreasingorder:minimize ‖Υ − SΛR‖ FR , Ssubject to R H = R −1(1377)S H = S −1Then we have a symmetric decomposition from unitary matrices as in (1369)whereU A ∆ = √ δ(ψ(δ(Υ))) , Q A ∆ = √ δ(ψ(δ(Υ))) H , Σ A = |Υ| (1378)U B ∆ = √ δ(ψ(δ(Λ))) , Q B ∆ = √ δ(ψ(δ(Λ))) H , Σ B = |Λ| (1379)Procrustes solution (1363) again sees the transposition relationshipS ⋆ = U A U H B = R ⋆T ∈ C n×n (1372)but both optimal unitary matrices are now themselves diagonal. So,S ⋆ ΛR ⋆ = δ(ψ(δ(Υ)))Λδ(ψ(δ(Λ))) = δ(ψ(δ(Υ)))|Λ| (1380)




Appendix D

Matrix calculus

From too much study, and from extreme passion, cometh madnesse.
    −Isaac Newton [89, §5]

D.1 Directional derivative, Taylor series

D.1.1 Gradients

Gradient of a differentiable real function f(x) : R^K → R with respect to its vector domain is defined

    $\nabla f(x) \triangleq \begin{bmatrix} \frac{\partial f(x)}{\partial x_1} \\[2pt] \frac{\partial f(x)}{\partial x_2} \\[2pt] \vdots \\[2pt] \frac{\partial f(x)}{\partial x_K} \end{bmatrix} \in \mathbb{R}^K$    (1381)

while the second-order gradient of the twice differentiable real function with respect to its vector domain is traditionally called the Hessian;

    $\nabla^2 f(x) \triangleq \begin{bmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_K} \\[3pt] \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_K} \\[3pt] \vdots & \vdots & \ddots & \vdots \\[3pt] \frac{\partial^2 f(x)}{\partial x_K \partial x_1} & \frac{\partial^2 f(x)}{\partial x_K \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_K^2} \end{bmatrix} \in \mathbb{S}^K$    (1382)


526 APPENDIX D. MATRIX CALCULUSThe gradient of vector-valued function v(x) : R→R N on real domain isa row-vector[∇v(x) =∆∂v 1 (x)∂xwhile the second-order gradient is∂v 2 (x)∂x· · ·∂v N (x)∂x]∈ R N (1383)[∇ 2 v(x) =∆∂ 2 v 1 (x)∂x 2∂ 2 v 2 (x)∂x 2 · · ·]∂ 2 v N (x)∈ R N (1384)∂x 2Gradient of vector-valued function h(x) : R K →R N on vector domain is⎡∇h(x) =∆ ⎢⎣∂h 1 (x)∂x 1∂h 1 (x)∂x 2.∂h 2 (x)∂x 1· · ·∂h 2 (x)∂x 2· · ·.∂h N (x)∂x 1∂h N (x)∂x 2.⎦(1385)∂h 1 (x) ∂h 2 (x) ∂h∂x K ∂x K· · · N (x)∂x K= [ ∇h 1 (x) ∇h 2 (x) · · · ∇h N (x) ] ∈ R K×Nwhile the second-order gradient has a three-dimensional representationdubbed cubix ; D.1⎡∇ 2 h(x) =∆ ⎢⎣∇ ∂h 1(x)∂x 1∇ ∂h 1(x)∂x 2.⎤⎥∇ ∂h 2(x)∂x 1· · · ∇ ∂h N(x)∂x 1∇ ∂h 2(x)∂x 2· · · ∇ ∂h N(x)∂x 2. .∇ ∂h 2(x)∂x K· · · ∇ ∂h N(x)⎦(1386)∇ ∂h 1(x)∂x K∂x K= [ ∇ 2 h 1 (x) ∇ 2 h 2 (x) · · · ∇ 2 h N (x) ] ∈ R K×N×Kwhere the gradient of each real entry is with respect to vector x as in (1381).⎤⎥D.1 The word matrix comes from the Latin for womb ; related to the prefix matri- derivedfrom mater meaning mother.


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 527The gradient of real function g(X) : R K×L →R on matrix domain is⎡∇g(X) =∆ ⎢⎣=∂g(X)∂X 11∂g(X)∂X 21.∂g(X)∂X K1∂g(X)∂X 12· · ·∂g(X)[∇X(:,1) g(X)∂X 22.· · ·∂g(X)∂X K2· · ·∂g(X)∂X 1L∂g(X)∂X 2L.∂g(X)∂X KL⎤∈ R K×L⎥⎦∇ X(:,2) g(X)...∇ X(:,L) g(X) ] ∈ R K×1×L (1387)where the gradient ∇ X(:,i) is with respect to the i th column of X . Thestrange appearance of (1387) in R K×1×L is meant to suggest a third dimensionperpendicular to the page (not a diagonal matrix). The second-order gradienthas representation⎡∇ 2 g(X) =∆ ⎢⎣=∇ ∂g(X)∂X 11∇ ∂g(X)∂X 21.∇ ∂g(X)∂X K1[∇∇X(:,1) g(X)∇ ∂g(X)∂X 12· · · ∇ ∂g(X)∂X 1L∇ ∂g(X)∂X 22· · · ∇ ∂g(X)∂X 2L. .∇ ∂g(X)∂X K2· · · ∇ ∂g(X)∂X KL⎤∈ R K×L×K×L⎥⎦∇∇ X(:,2) g(X)...∇∇ X(:,L) g(X) ] ∈ R K×1×L×K×L (1388)where the gradient ∇ is with respect to matrix X .


528 APPENDIX D. MATRIX CALCULUSGradient of vector-valued function g(X) : R K×L →R N on matrix domainis a cubix[∇X(:,1) g 1 (X) ∇ X(:,1) g 2 (X) · · · ∇ X(:,1) g N (X)∇g(X) ∆ =∇ X(:,2) g 1 (X) ∇ X(:,2) g 2 (X) · · · ∇ X(:,2) g N (X).........∇ X(:,L) g 1 (X) ∇ X(:,L) g 2 (X) · · · ∇ X(:,L) g N (X) ]= [ ∇g 1 (X) ∇g 2 (X) · · · ∇g N (X) ] ∈ R K×N×L (1389)while the second-order gradient has a five-dimensional representation;[∇∇X(:,1) g 1 (X) ∇∇ X(:,1) g 2 (X) · · · ∇∇ X(:,1) g N (X)∇ 2 g(X) ∆ =∇∇ X(:,2) g 1 (X) ∇∇ X(:,2) g 2 (X) · · · ∇∇ X(:,2) g N (X).........∇∇ X(:,L) g 1 (X) ∇∇ X(:,L) g 2 (X) · · · ∇∇ X(:,L) g N (X) ]= [ ∇ 2 g 1 (X) ∇ 2 g 2 (X) · · · ∇ 2 g N (X) ] ∈ R K×N×L×K×L (1390)The gradient of matrix-valued function g(X) : R K×L →R M×N on matrixdomain has a four-dimensional representation called quartix⎡∇g(X) =∆ ⎢⎣∇g 11 (X) ∇g 12 (X) · · · ∇g 1N (X)∇g 21 (X) ∇g 22 (X) · · · ∇g 2N (X). .∇g M1 (X) ∇g M2 (X) · · ·.∇g MN (X)⎤⎥⎦ ∈ RM×N×K×L (1391)while the second-order gradient has six-dimensional representation⎡∇ 2 g(X) =∆ ⎢⎣and so on.∇ 2 g 11 (X) ∇ 2 g 12 (X) · · · ∇ 2 g 1N (X)∇ 2 g 21 (X) ∇ 2 g 22 (X) · · · ∇ 2 g 2N (X). .∇ 2 g M1 (X) ∇ 2 g M2 (X) · · ·.∇ 2 g MN (X)⎤⎥⎦ ∈ RM×N×K×L×K×L(1392)


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 529D.1.2Product rules for matrix-functionsGiven dimensionally compatible matrix-valued functions of matrix variablef(X) and g(X)while [36,8.3] [210]∇ X(f(X) T g(X) ) = ∇ X (f)g + ∇ X (g)f (1393)∇ X tr ( f(X) T g(X) ) = ∇ X(tr ( f(X) T g(Z) ) + tr ( g(X)f(Z) T))∣ ∣∣Z←X (1394)These expressions implicitly apply as well to scalar-, vector-, or matrix-valuedfunctions of scalar, vector, or matrix arguments.D.1.2.0.1 Example. Cubix.Suppose f(X) : R 2×2 →R 2 = X T a and g(X) : R 2×2 →R 2 = Xb . We wishto find∇ X(f(X) T g(X) ) = ∇ X a T X 2 b (1395)using the product rule. Formula (1393) calls for∇ X a T X 2 b = ∇ X (X T a)Xb + ∇ X (Xb)X T a (1396)Consider the first of the two terms:∇ X (f)g = ∇ X (X T a)Xb= [ ∇(X T a) 1 ∇(X T a) 2]Xb(1397)The gradient of X T a forms a cubix in R 2×2×2 .⎡∇ X (X T a)Xb =⎢⎣∂(X T a) 1∂X 11 ∂(X T a) 1∂X 21 ∂(X T a) 2∂X 11 ∂(X T a) 1∂(X T a) 2∂X 12 ∂X 12∂(X T a) 2∂X 21 ∂(X T a) 1∂(X T a) 2∂X 22 ∂X 22⎤⎡⎢⎣⎥⎦(1398)⎤(Xb) 1⎥⎦ ∈ R 2×1×2(Xb) 2


530 APPENDIX D. MATRIX CALCULUSBecause gradient of the product (1395) requires total change with respectto change in each entry of matrix X , the Xb vector must make an innerproduct with each vector in the second dimension of the cubix (indicated bydotted line segments);⎡∇ X (X T a)Xb = ⎢⎣⎤a 1 00 a 1⎥a 2 0 ⎦0 a 2[ ]b1 X 11 + b 2 X 12∈ R 2×1×2b 1 X 21 + b 2 X 22(1399)[ ]a1 (b= 1 X 11 + b 2 X 12 ) a 1 (b 1 X 21 + b 2 X 22 )∈ R 2×2a 2 (b 1 X 11 + b 2 X 12 ) a 2 (b 1 X 21 + b 2 X 22 )= ab T X Twhere the cubix appears as a complete 2 ×2×2 matrix. In like manner forthe second term ∇ X (g)f⎡ ⎤b 1 0∇ X (Xb)X T b 2 0[ ]X11 aa =1 + X 21 a 2⎢ ⎥∈ R 2×1×2⎣ 0 b 1⎦ X 12 a 1 + X 22 a 2 (1400)0 b 2= X T ab T ∈ R 2×2The solution∇ X a T X 2 b = ab T X T + X T ab T (1401)can be found from Table D.2.1 or verified using (1394).D.1.2.1Kronecker productA partial remedy for venturing into hyperdimensional representations, suchas the cubix or quartix, is to first vectorize matrices as in (29). This devicegives rise to the Kronecker product of matrices ⊗ ; a.k.a, direct productor tensor product. Although it sees reversal in the literature, [216,2.1] weadopt the definition: for A∈ R m×n and B ∈ R p×q⎡⎤B 11 A B 12 A · · · B 1q AB ⊗ A =∆ B 21 A B 22 A · · · B 2q A⎢⎥⎣ . . . ⎦ ∈ Rpm×qn (1402)B p1 A B p2 A · · · B pq A
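Before putting ⊗ to work, it helps to confirm its basic vectorization behavior numerically; the sketch below (Matlab, not from the book; arbitrary data) also previews the Hessian identity (1403) stated next, using the quadratic form tr(AXBX^T) = vec(X)^T(B^T⊗A)vec X.

    % Sketch (illustrative only): Kronecker form of tr(AXBX') and its Hessian.
    rng(6);  n = 4;
    A = randn(n);  B = randn(n);  X = randn(n);
    trace(A*X*B*X') - X(:)'*kron(B',A)*X(:)     % ~0 : quadratic form in vec X
    H = kron(B,A') + kron(B',A);                % Hessian, identity (1403)
    f = @(X) trace(A*X*B*X');
    Y = zeros(n);  Y(2,3) = 1;                  % direction E_23
    t = 0.5;                                    % exact for a quadratic
    (f(X+t*Y) - 2*f(X) + f(X-t*Y))/t^2 - Y(:)'*H*Y(:)   % ~0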


One advantage to vectorization is existence of a traditional two-dimensional matrix representation for the second-order gradient of a real function with respect to a vectorized matrix. For example, from §A.1.1 no.22 (§D.2.1) for square A, B ∈ R^{n×n} [99, §5.2] [10, §3]

    ∇²_{vec X} tr(AXBX^T) = ∇²_{vec X} vec(X)^T(B^T⊗A) vec X = B⊗A^T + B^T⊗A ∈ R^{n²×n²}    (1403)

To disadvantage is a large new but known set of algebraic rules and the fact that its mere use does not generally guarantee two-dimensional matrix representation of gradients.

D.1.3 Chain rules for composite matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable f(X) and g(X) [142, §15.7]

    ∇_X g(f(X)^T) = ∇_X f^T ∇_f g    (1404)
    ∇²_X g(f(X)^T) = ∇_X(∇_X f^T ∇_f g) = ∇²_X f ∇_f g + ∇_X f^T ∇²_f g ∇_X f    (1405)

D.1.3.1 Two arguments

    ∇_X g(f(X)^T, h(X)^T) = ∇_X f^T ∇_f g + ∇_X h^T ∇_h g    (1406)

D.1.3.1.1 Example. Chain rule for two arguments. [29, §1.1]

    g(f(x)^T, h(x)^T) = (f(x) + h(x))^T A (f(x) + h(x))    (1407)

    $f(x) = \begin{bmatrix} x_1 \\ \varepsilon x_2 \end{bmatrix}\,, \qquad h(x) = \begin{bmatrix} \varepsilon x_1 \\ x_2 \end{bmatrix}$    (1408)

    $\nabla_x\, g\big(f(x)^T, h(x)^T\big) = \begin{bmatrix} 1 & 0 \\ 0 & \varepsilon \end{bmatrix}(A + A^T)(f + h) + \begin{bmatrix} \varepsilon & 0 \\ 0 & 1 \end{bmatrix}(A + A^T)(f + h)$    (1409)

    $\nabla_x\, g\big(f(x)^T, h(x)^T\big) = \begin{bmatrix} 1+\varepsilon & 0 \\ 0 & 1+\varepsilon \end{bmatrix}(A + A^T)\left(\begin{bmatrix} x_1 \\ \varepsilon x_2 \end{bmatrix} + \begin{bmatrix} \varepsilon x_1 \\ x_2 \end{bmatrix}\right)$    (1410)


532 APPENDIX D. MATRIX CALCULUSfrom Table D.2.1.limε→0 ∇ xg ( f(x) T , h(x) T) = (A +A T )x (1411)These formulae remain correct when the gradients producehyperdimensional representations:D.1.4First directional derivativeAssume that a differentiable function g(X) : R K×L →R M×N has continuousfirst- and second-order gradients ∇g and ∇ 2 g over domg which is an openset. We seek simple expressions for the first and second directional derivativesin direction Y ∈R K×L →Y, dg ∈ R M×N and dg →Y2 ∈ R M×N respectively.Assuming that the limit exists, we may state the partial derivative of themn th entry of g with respect to the kl th entry of X ;∂g mn (X)∂X klg mn (X + ∆t e= limk e T l ) − g mn(X)∆t→0 ∆t∈ R (1412)where e k is the k th standard basis vector in R K while e l is the l th standardbasis vector in R L . The total number of partial derivatives equals KLMNwhile the gradient is defined in their terms; the mn th entry of the gradient is⎡∇g mn (X) =⎢⎣∂g mn(X)∂X 11∂g mn(X)∂X 21.∂g mn(X)∂X K1while the gradient is a quartix∂g mn(X)∂X 12· · ·∂g mn(X)∂X 22· · ·.∂g mn(X)∂X K2· · ·∂g mn(X)∂X 1L∂g mn(X)∂X 2L.∂g mn(X)∂X KL⎤∈ R K×L (1413)⎥⎦⎡⎤∇g 11 (X) ∇g 12 (X) · · · ∇g 1N (X)∇g 21 (X) ∇g 22 (X) · · · ∇g 2N (X)∇g(X) = ⎢⎥⎣ . .. ⎦ ∈ RM×N×K×L (1414)∇g M1 (X) ∇g M2 (X) · · · ∇g MN (X)


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 533By simply rotating our perspective of the four-dimensional representation ofthe gradient matrix, we find one of three useful transpositions of this quartix(connoted T 1 ):⎡∇g(X) T 1=⎢⎣∂g(X)∂X 11∂g(X)∂X 21.∂g(X)∂X K1∂g(X)∂X 12· · ·∂g(X)∂X 22.· · ·∂g(X)∂X K2· · ·∂g(X)∂X 1L∂g(X)∂X 2L.∂g(X)∂X KL⎤⎥⎦ ∈ RK×L×M×N (1415)When the limit for ∆t ∈ R exists, it is easy to show by substitution ofvariables in (1412)∂g mn (X) g mn (X + ∆t Y kl eY kl = limk e T l ) − g mn(X)∂X kl ∆t→0 ∆t∈ R (1416)which may be interpreted as the change in g mn at X when the change in X klis equal to Y kl , the kl th entry of any Y ∈ R K×L . Because the total changein g mn (X) due to Y is the sum of change with respect to each and everyX kl , the mn th entry of the directional derivative is the corresponding totaldifferential [142,15.8]dg mn (X)| dX→Y= ∑ k,l∂g mn (X)∂X klY kl = tr ( ∇g mn (X) T Y ) (1417)= ∑ g mn (X + ∆t Y kl elimk e T l ) − g mn(X)(1418)∆t→0 ∆tk,lg mn (X + ∆t Y ) − g mn (X)= lim(1419)∆t→0 ∆t= d dt∣ g mn (X+ t Y ) (1420)t=0where t ∈ R . Assuming finite Y , equation (1419) is called the Gâteauxdifferential [28, App.A.5] [129,D.2.1] [239,5.28] whose existence is impliedby the existence of the Fréchet differential, the sum in (1417). [162,7.2] Eachmay be understood as the change in g mn at X when the change in X is equal


534 APPENDIX D. MATRIX CALCULUSin magnitude and direction to Y . D.2 Hence the directional derivative,⎡⎤dg 11 (X) dg 12 (X) · · · dg 1N (X)→Ydg(X) =∆ dg 21 (X) dg 22 (X) · · · dg 2N (X)⎢⎥∈ R M×N⎣ . . . ⎦∣dg M1 (X) dg M2 (X) · · · dg MN (X)⎡= ⎢⎣⎡∣dX→Ytr ( ∇g 11 (X) T Y ) tr ( ∇g 12 (X) T Y ) · · · tr ( ∇g 1N (X) T Y ) ⎤tr ( ∇g 21 (X) T Y ) tr ( ∇g 22 (X) T Y ) · · · tr ( ∇g 2N (X) T Y )⎥...tr ( ∇g M1 (X) T Y ) tr ( ∇g M2 (X) T Y ) · · · tr ( ∇g MN (X) T Y ) ⎦∑k,l∑=k,l⎢⎣ ∑k,l∂g 11 (X)∂X klY kl∑k,l∂g 21 (X)∂X klY kl∑k,l.∂g M1 (X)∑∂X klY klk,l∂g 12 (X)∂X klY kl · · ·∂g 22 (X)∂X klY kl · · ·.∂g M2 (X)∂X klY kl · · ·∑k,l∑k,l∑k,l∂g 1N (X)∂X klY kl∂g 2N (X)∂X klY kl.∂g MN (X)∂X kl⎤⎥⎦Y kl(1421)from which it follows→Ydg(X) = ∑ k,l∂g(X)∂X klY kl (1422)Yet for all X ∈ domg , any Y ∈R K×L , and some open interval of t∈Rg(X+ t Y ) = g(X) + t →Ydg(X) + o(t 2 ) (1423)which is the first-order Taylor series expansion about X . [142,18.4][88,2.3.4] Differentiation with respect to t and subsequent t-zeroing isolatesthe second term of the expansion. Thus differentiating and zeroing g(X+t Y )in t is an operation equivalent to individually differentiating and zeroingevery entry g mn (X+ t Y ) as in (1420). So the directional derivative of g(X)in any direction Y ∈R K×L evaluated at X ∈ domg becomes→Ydg(X) = d dt∣ g(X+ t Y ) ∈ R M×N (1424)t=0D.2 Although Y is a matrix, we may regard it as a vector in R KL .


Figure 108: Drawn is a convex quadratic bowl in R²×R; f̂(x) = x^T x : R² → R versus x on some open disc in R². Plane slice ∂H is perpendicular to function domain. Slice intersection with domain connotes bidirectional vector y. Tangent line T slope at point (f̂(α), α) is directional derivative value ∇_x f̂(α)^T y (1451) at α in slice direction y. Negative gradient −∇_x f̂(x) ∈ R² is direction of steepest descent. [253] [142, §15.6] [88] When vector υ ∈ R³ entry υ₃ is half the directional derivative in gradient direction at α, and when [υ₁; υ₂] = ∇_x f̂(α), id est

    $\upsilon \triangleq \begin{bmatrix} \nabla_x \hat f(\alpha) \\[3pt] \tfrac{1}{2}\,\overset{\to \nabla_x \hat f(\alpha)}{d\hat f}(\alpha) \end{bmatrix}$

then −υ points directly toward the bowl bottom.

[182, §2.1, §5.4.5] [26, §6.3.1] which is simplest. The derivative with respect to t makes the directional derivative (1424) resemble ordinary calculus (§D.2); e.g., when g(X) is linear, →Y dg(X) = g(Y). [162, §7.2]

D.1.4.1 Interpretation of directional derivative

In the case of any differentiable real function f̂(X) : R^{K×L} → R, the directional derivative of f̂(X) at X in any direction Y yields the slope of f̂ along the line X + tY through its domain (parametrized by t ∈ R) evaluated at t = 0. For higher-dimensional functions, by (1421), this slope interpretation can be applied to each entry of the directional derivative. Unlike the gradient, directional derivative does not expand dimension; e.g.,


directional derivative in (1424) retains the dimensions of g.

Figure 108, for example, shows a plane slice of a real convex bowl-shaped function f̂(x) along a line α + ty through its domain. The slice reveals a one-dimensional real function of t; f̂(α + ty). The directional derivative at x = α in direction y is the slope of f̂(α + ty) with respect to t at t = 0. In the case of a real function having vector argument h(X) : R^K → R, its directional derivative in the normalized direction of its gradient is the gradient magnitude. (1451) For a real function of real variable, the directional derivative evaluated at any point in the function domain is just the slope of that function there scaled by the real direction. (confer §3.1.1.4)

D.1.4.1.1 Theorem. Directional derivative condition for optimization. [162, §7.4] Suppose f̂(X) : R^{K×L} → R is minimized on convex set C ⊆ R^{K×L} by X*, and the directional derivative of f̂ exists there. Then for all X ∈ C

    →X−X* df̂(X*) ≥ 0    (1425)
⋄

D.1.4.1.2 Example. Simple bowl. Bowl function (Figure 108)

    f̂(x) : R^K → R ≜ (x − a)^T(x − a) − b    (1426)

has function offset −b ∈ R, axis of revolution at x = a, and positive definite Hessian (1382) everywhere in its domain (an open hyperdisc in R^K); id est, strictly convex quadratic f̂(x) has unique global minimum equal to −b at x = a. A vector −υ based anywhere in dom f̂ × R pointing toward the unique bowl-bottom is specified:

    $\upsilon \propto \begin{bmatrix} x - a \\ \hat f(x) + b \end{bmatrix} \in \mathbb{R}^K \times \mathbb{R}$    (1427)

Such a vector is

    $\upsilon = \begin{bmatrix} \nabla_x \hat f(x) \\[3pt] \tfrac{1}{2}\,\overset{\to \nabla_x \hat f(x)}{d\hat f}(x) \end{bmatrix}$    (1428)


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 537since the gradient is∇ x ˆf(x) = 2(x − a) (1429)and the directional derivative in the direction of the gradient is (1451)→∇ x ˆf(x)d ˆf(x)( )= ∇ x ˆf(x) T ∇ x ˆf(x) = 4(x − a) T (x − a) = 4 ˆf(x) + b(1430)∇ ∂g mn(X)∂X klD.1.5Second directional derivativeBy similar argument, it so happens: the second directional derivative isequally simple. Given g(X) : R K×L →R M×N on open domain,= ∂∇g mn(X)∂X kl=⎡∇ 2 g mn (X) =⎢⎣⎡=⎢⎣⎡⎢⎣∇ ∂gmn(X)∂X 11∇ ∂gmn(X)∂X 21.∇ ∂gmn(X)∂X K1∂∇g mn(X)∂X 11∂∇g mn(X)∂X 21.∂∇g mn(X)∂X K1∂ 2 g mn(X)∂X kl ∂X 11∂ 2 g mn(X)∂X kl ∂X 21∂ 2 g mn(X)∂X kl ∂X K1∂ 2 g mn(X)∂X kl ∂X 12· · ·∂ 2 g mn(X)∂X kl ∂X 1L∂ 2 g mn(X) ∂∂X kl ∂X 22· · ·2 g mn(X)∂X kl ∂X 2L∈ R K×L (1431)⎥. . . ⎦∂ 2 g mn(X) ∂∂X kl ∂X K2· · ·2 g mn(X)∂X kl ∂X KL⎤∇ ∂gmn(X)∂X 12· · · ∇ ∂gmn(X)∂X 1L∇ ∂gmn(X)∂X 22· · · ∇ ∂gmn(X)∂X 2L∈ R K×L×K×L⎥. . ⎦∇ ∂gmn(X)∂X K2· · · ∇ ∂gmn(X)∂X KL(1432)∂∇g mn(X)⎤∂X 12· · ·∂∇g mn(X)∂X 22.· · ·∂∇g mn(X)∂X K2· · ·∂∇g mn(X)∂X 1L∂∇g mn(X)∂X 2L.∂∇g mn(X)∂X KLRotating our perspective, we get several views of the second-order gradient:⎡⎤∇ 2 g 11 (X) ∇ 2 g 12 (X) · · · ∇ 2 g 1N (X)∇ 2 ∇ 2 g 21 (X) ∇ 2 g 22 (X) · · · ∇ 2 g 2N (X)g(X) = ⎢⎥⎣ . .. ⎦ ∈ RM×N×K×L×K×L (1433)∇ 2 g M1 (X) ∇ 2 g M2 (X) · · · ∇ 2 g MN (X)⎥⎦⎤


538 APPENDIX D. MATRIX CALCULUS⎡∇ 2 g(X) T 1=⎢⎣⎡∇ 2 g(X) T 2=⎢⎣∇ ∂g(X)∂X 11∇ ∂g(X)∂X 21.∇ ∂g(X)∂X K1∂∇g(X)∂X 11∂∇g(X)∂X 21.∂∇g(X)∂X K1∇ ∂g(X)∂X 12· · · ∇ ∂g(X)∂X 1L∇ ∂g(X)∂X 22.· · · ∇ ∂g(X)∂X 2L.∇ ∂g(X)∂X K2· · · ∇ ∂g(X)∂X KL∂∇g(X)∂X 12· · ·∂∇g(X)∂X 22.· · ·∂∇g(X)∂X K2· · ·∂∇g(X)∂X 1L∂∇g(X)∂X 2L.∂∇g(X)∂X KL⎤⎥⎦ ∈ RK×L×M×N×K×L (1434)⎤⎥⎦ ∈ RK×L×K×L×M×N (1435)Assuming the limits exist, we may state the partial derivative of the mn thentry of g with respect to the kl th and ij th entries of X ;∂ 2 g mn(X)∂X kl ∂X ij=g mn(X+∆t elimk e T l +∆τ e i eT j )−gmn(X+∆t e k eT l )−(gmn(X+∆τ e i eT )−gmn(X))j∆τ,∆t→0∆τ ∆t(1436)Differentiating (1416) and then scaling by Y ij∂ 2 g mn(X)∂X kl ∂X ijY kl Y ij = lim∆t→0∂g mn(X+∆t Y kl e k e T l )−∂gmn(X)∂X ijY∆tij(1437)g mn(X+∆t Y kl e= limk e T l +∆τ Y ij e i e T j )−gmn(X+∆t Y kl e k e T l )−(gmn(X+∆τ Y ij e i e T )−gmn(X))j∆τ,∆t→0∆τ ∆twhich can be proved by substitution of variables in (1436).second-order total differential due to any Y ∈R K×L isd 2 g mn (X)| dX→Y= ∑ i,j∑k,l∂ 2 g mn (X)Y kl Y ij = tr(∇ X tr ( ∇g mn (X) T Y ) )TY∂X kl ∂X ijThe mn th(1438)= ∑ ∂g mn (X + ∆t Y ) − ∂g mn (X)limY ij∆t→0 ∂Xi,jij ∆t(1439)g mn (X + 2∆t Y ) − 2g mn (X + ∆t Y ) + g mn (X)= lim∆t→0∆t 2 (1440)= d2dt 2 ∣∣∣∣t=0g mn (X+ t Y ) (1441)Hence the second directional derivative,


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 539→Ydg 2 (X) ∆ =⎡⎢⎣⎡tr(∇tr ( ∇g 11 (X) T Y ) )TY =tr(∇tr ( ∇g 21 (X) T Y ) )TY ⎢⎣ .tr(∇tr ( ∇g M1 (X) T Y ) )TY⎡∑ ∑i,j k,l∑ ∑=i,j k,l⎢⎣ ∑ ∑i,jk,l∂ 2 g 11 (X)∂X kl ∂X ijY kl Y ij∂ 2 g 21 (X)∂X kl ∂X ijY kl Y ij.∂ 2 g M1 (X)∂X kl ∂X ijY kl Y ijd 2 g 11 (X) d 2 g 12 (X) · · · d 2 g 1N (X)d 2 g 21 (X) d 2 g 22 (X) · · · d 2 g 2N (X). .d 2 g M1 (X) d 2 g M2 (X) · · ·.d 2 g MN (X)⎤⎥∈ R M×N⎦∣∣dX→Ytr(∇tr ( ∇g 12 (X) T Y ) )TY · · · tr(∇tr ( ∇g 1N (X) T Y ) ) ⎤TYtr(∇tr ( ∇g 22 (X) T Y ) )TY · · · tr(∇tr ( ∇g 2N (X) T Y ) )T Y ..tr(∇tr ( ∇g M2 (X) T Y ) )TY · · · tr(∇tr ( ∇g MN (X) T Y ) ⎥) ⎦TY∑ ∑∂ 2 g 12 (X)∂X kl ∂X ijY kl Y ij · · ·i,ji,jk,l∑ ∑∂ 2 g 22 (X)∂X kl ∂X ijY kl Y ij · · ·k,l.∑ ∑∂ 2 g M2 (X)∂X kl ∂X ijY kl Y ij · · ·from which it follows→Ydg 2 (X) = ∑ ∑ ∂ 2 g(X)Y kl Y ij = ∑ ∂Xi,j kl ∂X iji,jk,li,jk,l∑∑⎤∂ 2 g 1N (X)∂X kl ∂X ijY kl Y iji,j k,l∑∑∂ 2 g 2N (X)∂X kl ∂X ijY kl Y iji,j k,l. ⎥∑ ∑ ∂ 2 g MN (X) ⎦∂X kl ∂X ijY kl Y iji,jk,l(1442)∂∂X ij→Ydg(X)Y ij (1443)Yet for all X ∈ domg , any Y ∈R K×L , and some open interval of t∈Rg(X+ t Y ) = g(X) + t dg(X) →Y+ 1 →Yt2dg 2 (X) + o(t 3 ) (1444)2!which is the second-order Taylor series expansion about X . [142,18.4][88,2.3.4] Differentiating twice with respect to t and subsequent t-zeroingisolates the third term of the expansion. Thus differentiating and zeroingg(X+tY ) in t is an operation equivalent to individually differentiating andzeroing every entry g mn (X + t Y ) as in (1441). So the second directionalderivative becomes→Ydg 2 (X) = d2dt 2 ∣∣∣∣t=0g(X+ t Y ) ∈ R M×N (1445)[182,2.1,5.4.5] [26,6.3.1] which is again simplest. (confer (1424))


540 APPENDIX D. MATRIX CALCULUSD.1.6Taylor seriesSeries expansions of the differentiable matrix-valued function g(X) , ofmatrix argument, were given earlier in (1423) and (1444). Assuming g(X)has continuous first-, second-, and third-order gradients over the open setdomg , then for X ∈ domg and any Y ∈ R K×L the complete Taylor serieson some open interval of µ∈R is expressedg(X+µY ) = g(X) + µ dg(X) →Y+ 1 →Yµ2dg 2 (X) + 1 →Yµ3dg 3 (X) + o(µ 4 ) (1446)2! 3!or on some open interval of ‖Y ‖g(Y ) = g(X) +→Y −X→Y −X→Y −Xdg(X) + 1 dg 2 (X) + 1 dg 3 (X) + o(‖Y ‖ 4 ) (1447)2! 3!which are third-order expansions about X . The mean value theorem fromcalculus is what insures the finite order of the series. [29,1.1] [28, App.A.5][129,0.4] [142]In the case of a real function g(X) : R K×L →R , all the directionalderivatives are in R :→Ydg(X) = tr ( ∇g(X) T Y ) (1448)→Ydg 2 (X) = tr(∇ X tr ( ∇g(X) T Y ) )TY(= tr∇ X→Ydg(X) T Y(→Ydg 3 (X) = tr ∇ X tr(∇ X tr ( ∇g(X) T Y ) ) ) ( )T TY →YY = tr ∇ X dg 2 (X) T YIn the case g(X) : R K →R has vector argument, they further simplify:and so on.)(1449)(1450)→Ydg(X) = ∇g(X) T Y (1451)→Ydg 2 (X) = Y T ∇ 2 g(X)Y (1452)→Ydg 3 (X) = ∇ X(Y T ∇ 2 g(X)Y ) TY (1453)D.1.6.0.1 Exercise. log det. (confer [41, p.644])Find the first two terms of the Taylor series expansion (1447) for log detX .


D.1. DIRECTIONAL DERIVATIVE, TAYLOR SERIES 541D.1.7Correspondence of gradient to derivativeFrom the foregoing expressions for directional derivative, we derive arelationship between the gradient with respect to matrix X and the derivativewith respect to real variable t :D.1.7.1first-orderRemoving from (1424) the evaluation at t = 0 , D.3 we find an expression forthe directional derivative of g(X) in direction Y evaluated anywhere alonga line X+ t Y (parametrized by t) intersecting domg→Ydg(X+ t Y ) = d g(X+ t Y ) (1454)dtIn the general case g(X) : R K×L →R M×N , from (1417) and (1420) we findtr ( ∇ X g mn (X+ t Y ) T Y ) = d dt g mn(X+ t Y ) (1455)which is valid at t = 0, of course, when X ∈ domg . In the important caseof a real function g(X) : R K×L →R , from (1448) we have simplytr ( ∇ X g(X+ t Y ) T Y ) = d g(X+ t Y ) (1456)dtWhen, additionally, g(X) : R K →R has vector argument,∇ X g(X+ t Y ) T Y = d g(X+ t Y ) (1457)dtD.3 Justified by replacing X with X+ tY in (1417)-(1419); beginning,dg mn (X+ tY )| dX→Y= ∑ k,l∂g mn (X+ tY )Y kl∂X kl


D.1.7.1.1 Example. Gradient.
g(X) = w^T X^T Xw, X ∈ R^{K×L}, w ∈ R^L. Using the tables in §D.2,

    tr(∇_X g(X+tY)^T Y) = tr(2ww^T(X^T + tY^T)Y)    (1458)
                        = 2w^T(X^T Y + tY^T Y)w    (1459)

Applying the equivalence (1456),

    d/dt g(X+tY) = d/dt w^T(X+tY)^T(X+tY)w    (1460)
                 = w^T(X^T Y + Y^T X + 2tY^T Y)w    (1461)
                 = 2w^T(X^T Y + tY^T Y)w    (1462)

which is the same as (1459); hence, equivalence is demonstrated.

It is easy to extract ∇g(X) from (1462) knowing only (1456):

    tr(∇_X g(X+tY)^T Y) = 2w^T(X^T Y + tY^T Y)w
                        = 2 tr(ww^T(X^T + tY^T)Y)
    tr(∇_X g(X)^T Y)    = 2 tr(ww^T X^T Y)    (1463)
        ⇔
    ∇_X g(X) = 2Xww^T

D.1.7.2 second-order

Likewise removing the evaluation at t = 0 from (1445),

    →Y dg²(X+tY) = d²/dt² g(X+tY)    (1464)

we can find a similar relationship between the second-order gradient and the second derivative: In the general case g(X) : R^{K×L} → R^{M×N}, from (1438) and (1441),

    tr(∇_X tr(∇_X g_mn(X+tY)^T Y)^T Y) = d²/dt² g_mn(X+tY)    (1465)

In the case of a real function g(X) : R^{K×L} → R we have, of course,

    tr(∇_X tr(∇_X g(X+tY)^T Y)^T Y) = d²/dt² g(X+tY)    (1466)
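Returning to Example D.1.7.1.1, correspondence (1456) invites a finite-difference check of the gradient ∇g(X) = 2Xww^T from (1463). A Matlab sketch (not from the book; arbitrary data):

    % Sketch (illustrative only): d/dt g(X+tY)|_{t=0} = tr(grad'*Y)  (1456)
    % for g(X) = w'X'Xw with gradient 2*X*w*w' from (1463).
    rng(7);  K = 4;  L = 3;
    X = randn(K,L);  Y = randn(K,L);  w = randn(L,1);
    g = @(X) w'*(X'*X)*w;
    grad = 2*X*(w*w');                  % gradient (1463)
    t = 0.1;                            % central difference exact: quadratic in t
    (g(X+t*Y) - g(X-t*Y))/(2*t) - trace(grad'*Y)   % ~0 : confirms (1456)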


From (1452), the simpler case, where the real function g(X) : R^K → R has vector argument,

    Y^T ∇²_X g(X+tY) Y = d²/dt² g(X+tY)    (1467)

D.1.7.2.1 Example. Second-order gradient. Given real function g(X) = log det X having domain int S^K₊, we want to find ∇²g(X) ∈ R^{K×K×K×K}. From the tables in §D.2,

    h(X) ≜ ∇g(X) = X^{-1} ∈ int S^K₊    (1468)

so ∇²g(X) = ∇h(X). By (1455) and (1423), for Y ∈ S^K

    tr(∇h_mn(X)^T Y) = d/dt|_{t=0} h_mn(X+tY)    (1469)
                     = (d/dt|_{t=0} h(X+tY))_mn    (1470)
                     = (d/dt|_{t=0} (X+tY)^{-1})_mn    (1471)
                     = −(X^{-1} Y X^{-1})_mn    (1472)

Setting Y to a member of the standard basis E_kl = e_k e_l^T, for k, l ∈ {1 ... K}, and employing a property of the trace function (31) we find

    ∇²g(X)_mnkl = tr(∇h_mn(X)^T E_kl) = ∇h_mn(X)_kl = −(X^{-1} E_kl X^{-1})_mn    (1473)

    ∇²g(X)_kl = ∇h(X)_kl = −(X^{-1} E_kl X^{-1}) ∈ R^{K×K}    (1474)

From all these first- and second-order expressions, we may generate new ones by evaluating both sides at arbitrary t (in some open interval) but only after the differentiation.
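The closed form (1472)–(1474) can likewise be checked by finite differences on h(X) = X^{-1}. A Matlab sketch in a small dimension (not from the book):

    % Sketch (illustrative only): directional derivative of h(X) = inv(X)
    % in direction E_kl equals -inv(X)*E_kl*inv(X), per (1472)-(1474).
    rng(8);  K = 4;
    X = randn(K);  X = X*X' + eye(K);   % a point in int S^K_+
    k = 2;  l = 3;
    E = zeros(K);  E(k,l) = 1;          % standard basis matrix E_kl
    t = 1e-5;
    dnum = (inv(X + t*E) - inv(X - t*E))/(2*t);
    norm(dnum + inv(X)*E*inv(X),'fro')  % ~1e-10 : matches -X^{-1}E_kl X^{-1}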


544 APPENDIX D. MATRIX CALCULUS

D.2  Tables of gradients and derivatives

[99] [45]

When proving results for symmetric matrices algebraically, it is critical to take gradients ignoring symmetry and to then substitute symmetric entries afterward.

a,b ∈ Rⁿ ; x,y ∈ R^k ; A,B ∈ R^{m×n} ; X,Y ∈ R^{K×L} ; t,µ ∈ R ; i,j,k,l,K,L,m,n,M,N are integers, unless otherwise noted.

x^µ means δ(δ(x)^µ) for µ ∈ R ; id est, entrywise vector exponentiation. δ is the main-diagonal linear operator (1061). x⁰ ≜ 1, X⁰ ≜ I if square.

d/dx ≜ [d/dx₁ ··· d/dx_k]ᵀ ; →y dg(x), →y dg²(x) (directional derivatives, §D.1), log x, sgn x, sin x, x/y (Hadamard quotient), √x (entrywise square root), etcetera, are maps f : R^k → R^k that maintain dimension; e.g., (§A.1.1)

    d/dx x⁻¹ ≜ ∇_x 1ᵀδ(x)⁻¹1                                             (1475)

The standard basis: {E_kl = e_k e_lᵀ ∈ R^{K×K} | k,l ∈ {1...K}}

For A a scalar or matrix, we have the Taylor series [48, §3.6]

    e^A = Σ_{k=0}^∞ (1/k!) A^k                                           (1476)

Further, [220, §5.4]

    e^A ≻ 0   ∀A ∈ S^m                                                   (1477)

For all square A and integer k

    det^k A = det A^k                                                    (1478)

Table entries with notation X ∈ R^{2×2} have been algebraically verified in that dimension but may hold more broadly.


D.2. TABLES OF GRADIENTS AND DERIVATIVES 545

D.2.1  Algebraic

∇_x x = ∇_x xᵀ = I ∈ R^{k×k}        ∇_X X = ∇_X Xᵀ ≜ I ∈ R^{K×L×K×L}   (identity)

∇_x (Ax − b) = Aᵀ
∇_x (xᵀA − bᵀ) = A
∇_x (Ax − b)ᵀ(Ax − b) = 2Aᵀ(Ax − b)
∇²_x (Ax − b)ᵀ(Ax − b) = 2AᵀA
∇_x (xᵀAx + 2xᵀBy + yᵀCy) = (A + Aᵀ)x + 2By
∇²_x (xᵀAx + 2xᵀBy + yᵀCy) = A + Aᵀ

∇_X aᵀXb = ∇_X bᵀXᵀa = abᵀ
∇_X aᵀX²b = Xᵀabᵀ + abᵀXᵀ
∇_X aᵀX⁻¹b = −X⁻ᵀabᵀX⁻ᵀ
∇_X (X⁻¹)_kl = ∂X⁻¹/∂X_kl = −X⁻¹E_kl X⁻¹ ,  confer (1415)(1474)

∇_x aᵀxᵀxb = 2xaᵀb                  ∇_X aᵀXᵀXb = X(abᵀ + baᵀ)
∇_x aᵀxxᵀb = (abᵀ + baᵀ)x           ∇_X aᵀXXᵀb = (abᵀ + baᵀ)X
∇_x aᵀxᵀxa = 2xaᵀa                  ∇_X aᵀXᵀXa = 2Xaaᵀ
∇_x aᵀxxᵀa = 2aaᵀx                  ∇_X aᵀXXᵀa = 2aaᵀX
∇_x aᵀyxᵀb = baᵀy                   ∇_X aᵀYXᵀb = baᵀY
∇_x aᵀyᵀxb = ybᵀa                   ∇_X aᵀYᵀXb = Yabᵀ
∇_x aᵀxyᵀb = abᵀy                   ∇_X aᵀXYᵀb = abᵀY
∇_x aᵀxᵀyb = yaᵀb                   ∇_X aᵀXᵀYb = Ybaᵀ
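Entries from this table are straightforward to spot-check by central differences; e.g. (an illustrative NumPy sketch, ours):

    import numpy as np

    rng = np.random.default_rng(3)
    K = 4
    X = rng.standard_normal((K, K)) + K * np.eye(K)   # keep X invertible
    a, b = rng.standard_normal(K), rng.standard_normal(K)

    def num_grad(f, X, eps=1e-6):
        G = np.zeros_like(X)
        for k in range(X.shape[0]):
            for l in range(X.shape[1]):
                E = np.zeros_like(X); E[k, l] = eps
                G[k, l] = (f(X + E) - f(X - E)) / (2 * eps)
        return G

    # ∇_X aᵀXᵀXb = X(abᵀ + baᵀ)
    f1 = lambda X: a @ X.T @ X @ b
    print(np.allclose(num_grad(f1, X),
                      X @ (np.outer(a, b) + np.outer(b, a)), atol=1e-5))

    # ∇_X aᵀX⁻¹b = −X⁻ᵀ a bᵀ X⁻ᵀ
    f2 = lambda X: a @ np.linalg.inv(X) @ b
    Xit = np.linalg.inv(X).T
    print(np.allclose(num_grad(f2, X),
                      -Xit @ np.outer(a, b) @ Xit, atol=1e-5))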


546 APPENDIX D. MATRIX CALCULUS

Algebraic continued

d/dt (X+tY) = Y

d/dt Bᵀ(X+tY)⁻¹A = −Bᵀ(X+tY)⁻¹ Y (X+tY)⁻¹ A
d/dt Bᵀ(X+tY)⁻ᵀA = −Bᵀ(X+tY)⁻ᵀ Yᵀ (X+tY)⁻ᵀ A
d/dt Bᵀ(X+tY)^µ A = ... ,  −1 ≤ µ ≤ 1 ,  X,Y ∈ S^M_+
d²/dt² Bᵀ(X+tY)⁻¹A = 2Bᵀ(X+tY)⁻¹ Y (X+tY)⁻¹ Y (X+tY)⁻¹ A

d/dt ((X+tY)ᵀA(X+tY)) = YᵀAX + XᵀAY + 2tYᵀAY
d²/dt² ((X+tY)ᵀA(X+tY)) = 2YᵀAY
d/dt ((X+tY)A(X+tY)) = YAX + XAY + 2tYAY
d²/dt² ((X+tY)A(X+tY)) = 2YAY

D.2.2  Trace Kronecker

∇_{vec X} tr(AXBXᵀ) = ∇_{vec X} vec(X)ᵀ(Bᵀ⊗A) vec X = (B⊗Aᵀ + Bᵀ⊗A) vec X
∇²_{vec X} tr(AXBXᵀ) = ∇²_{vec X} vec(X)ᵀ(Bᵀ⊗A) vec X = B⊗Aᵀ + Bᵀ⊗A


D.2. TABLES OF GRADIENTS AND DERIVATIVES 547

D.2.3  Trace

∇_x µx = µI                         ∇_X tr µX = ∇_X µ tr X = µI

∇_x 1ᵀδ(x)⁻¹1 = d/dx x⁻¹ = −x⁻²     ∇_X tr X⁻¹ = −X⁻²ᵀ
∇_x 1ᵀδ(x)⁻¹y = −δ(x)⁻²y            ∇_X tr(X⁻¹Y) = ∇_X tr(YX⁻¹) = −X⁻ᵀYᵀX⁻ᵀ

d/dx x^µ = µx^{µ−1}                 ∇_X tr X^µ = µX^{(µ−1)ᵀ} ,  X ∈ R^{2×2}
                                    ∇_X tr X^j = jX^{(j−1)ᵀ}

∇_x (b − aᵀx)⁻¹ = (b − aᵀx)⁻²a      ∇_X tr((B − AX)⁻¹) = ((B − AX)⁻²A)ᵀ
∇_x (b − aᵀx)^µ = −µ(b − aᵀx)^{µ−1}a

∇_x xᵀy = ∇_x yᵀx = y
∇_X tr(XᵀY) = ∇_X tr(YXᵀ) = ∇_X tr(YᵀX) = ∇_X tr(XYᵀ) = Y
∇_X tr(AXBXᵀ) = ∇_X tr(XBXᵀA) = AᵀXBᵀ + AXB
∇_X tr(AXBX) = ∇_X tr(XBXA) = AᵀXᵀBᵀ + BᵀXᵀAᵀ
∇_X tr(AXAXAX) = ∇_X tr(XAXAXA) = 3(AXAXA)ᵀ
∇_X tr(YX^k) = ∇_X tr(X^kY) = Σ_{i=0}^{k−1} (X^i Y X^{k−1−i})ᵀ
∇_X tr(YᵀXXᵀY) = ∇_X tr(XᵀYYᵀX) = 2YYᵀX
∇_X tr(YᵀXᵀXY) = ∇_X tr(XYYᵀXᵀ) = 2XYYᵀ
∇_X tr((X+Y)ᵀ(X+Y)) = 2(X+Y)
∇_X tr((X+Y)(X+Y)) = 2(X+Y)ᵀ
∇_X tr(AᵀXB) = ∇_X tr(XᵀABᵀ) = ABᵀ
∇_X tr(AᵀX⁻¹B) = ∇_X tr(X⁻ᵀABᵀ) = −X⁻ᵀABᵀX⁻ᵀ

∇_X aᵀXb = ∇_X tr(baᵀX) = ∇_X tr(Xbaᵀ) = abᵀ
∇_X bᵀXᵀa = ∇_X tr(Xᵀabᵀ) = ∇_X tr(abᵀXᵀ) = abᵀ
∇_X aᵀX⁻¹b = ∇_X tr(X⁻ᵀabᵀ) = −X⁻ᵀabᵀX⁻ᵀ
∇_X aᵀX^µ b = ...


548 APPENDIX D. MATRIX CALCULUS

Trace continued

d/dt tr g(X+tY) = tr d/dt g(X+tY)

d/dt tr(X+tY) = tr Y
d/dt tr^j(X+tY) = j tr^{j−1}(X+tY) tr Y
d/dt tr(X+tY)^j = j tr((X+tY)^{j−1} Y)   (∀j)

d/dt tr((X+tY)Y) = tr Y²
d/dt tr((X+tY)^k Y) = d/dt tr(Y(X+tY)^k) = k tr((X+tY)^{k−1} Y²) ,  k ∈ {0,1,2}
d/dt tr((X+tY)^k Y) = d/dt tr(Y(X+tY)^k) = tr Σ_{i=0}^{k−1} (X+tY)^i Y (X+tY)^{k−1−i} Y

d/dt tr((X+tY)⁻¹Y) = −tr((X+tY)⁻¹ Y (X+tY)⁻¹ Y)
d/dt tr(Bᵀ(X+tY)⁻¹A) = −tr(Bᵀ(X+tY)⁻¹ Y (X+tY)⁻¹ A)
d/dt tr(Bᵀ(X+tY)⁻ᵀA) = −tr(Bᵀ(X+tY)⁻ᵀ Yᵀ (X+tY)⁻ᵀ A)
d/dt tr(Bᵀ(X+tY)⁻^k A) = ... ,  k > 0
d/dt tr(Bᵀ(X+tY)^µ A) = ... ,  −1 ≤ µ ≤ 1 ,  X,Y ∈ S^M_+
d²/dt² tr(Bᵀ(X+tY)⁻¹A) = 2 tr(Bᵀ(X+tY)⁻¹ Y (X+tY)⁻¹ Y (X+tY)⁻¹ A)

d/dt tr((X+tY)ᵀA(X+tY)) = tr(YᵀAX + XᵀAY + 2tYᵀAY)
d²/dt² tr((X+tY)ᵀA(X+tY)) = 2 tr(YᵀAY)
d/dt tr((X+tY)A(X+tY)) = tr(YAX + XAY + 2tYAY)
d²/dt² tr((X+tY)A(X+tY)) = 2 tr(YAY)
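E.g., the entry d/dt tr((X+tY)⁻¹Y) checks out numerically; sketch ours, assuming X+tY is invertible at the chosen t :

    import numpy as np

    rng = np.random.default_rng(4)
    K = 4
    X = rng.standard_normal((K, K)) + K * np.eye(K)
    Y = rng.standard_normal((K, K))

    f = lambda t: np.trace(np.linalg.inv(X + t * Y) @ Y)

    t, eps = 0.3, 1e-6
    numeric = (f(t + eps) - f(t - eps)) / (2 * eps)

    M = np.linalg.inv(X + t * Y)
    analytic = -np.trace(M @ Y @ M @ Y)

    print(np.isclose(numeric, analytic))   # True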


D.2. TABLES OF GRADIENTS AND DERIVATIVES 549

D.2.4  Log determinant

x ≻ 0, det X > 0 on some neighborhood of X, and det(X+tY) > 0 on some open interval of t ; otherwise, log( ) would be discontinuous.

d/dx log x = x⁻¹                    ∇_X log det X = X⁻ᵀ

∇²_X log det(X)_kl = ∂X⁻ᵀ/∂X_kl = −(X⁻¹E_kl X⁻¹)ᵀ ,  confer (1432)(1474)

d/dx log x⁻¹ = −x⁻¹                 ∇_X log det X⁻¹ = −X⁻ᵀ
d/dx log x^µ = µx⁻¹                 ∇_X log det^µ X = µX⁻ᵀ

∇_X log det X^µ = µX⁻ᵀ ,  X ∈ R^{2×2}
∇_X log det X^k = ∇_X log det^k X = kX⁻ᵀ
∇_X log det^µ(X+tY) = µ(X+tY)⁻ᵀ

∇_x log(aᵀx + b) = a (1/(aᵀx+b))
∇_X log det(AX+B) = Aᵀ(AX+B)⁻ᵀ
∇_X log det(I ± AᵀXA) = ...
∇_X log det(X+tY)^k = ∇_X log det^k(X+tY) = k(X+tY)⁻ᵀ

d/dt log det(X+tY) = tr((X+tY)⁻¹Y)
d²/dt² log det(X+tY) = −tr((X+tY)⁻¹ Y (X+tY)⁻¹ Y)
d/dt log det(X+tY)⁻¹ = −tr((X+tY)⁻¹Y)
d²/dt² log det(X+tY)⁻¹ = tr((X+tY)⁻¹ Y (X+tY)⁻¹ Y)

d/dt log det(δ(A(x+ty) + a)² + µI)
    = tr((δ(A(x+ty) + a)² + µI)⁻¹ 2δ(A(x+ty) + a) δ(Ay))
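E.g., a numerical check of d/dt log det(X+tY) = tr((X+tY)⁻¹Y) on the positive definite domain assumed above (sketch ours):

    import numpy as np

    rng = np.random.default_rng(5)
    K = 5
    A = rng.standard_normal((K, K))
    X = A @ A.T + K * np.eye(K)                    # positive definite
    Y = rng.standard_normal((K, K)); Y = (Y + Y.T) / 2

    f = lambda t: np.linalg.slogdet(X + t * Y)[1]

    t, eps = 0.1, 1e-6
    numeric = (f(t + eps) - f(t - eps)) / (2 * eps)
    analytic = np.trace(np.linalg.solve(X + t * Y, Y))

    print(np.isclose(numeric, analytic))   # True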


550 APPENDIX D. MATRIX CALCULUS

D.2.5  Determinant

∇_X det X = ∇_X det Xᵀ = det(X) X⁻ᵀ
∇_X det X⁻¹ = −det(X⁻¹) X⁻ᵀ = −det(X)⁻¹ X⁻ᵀ
∇_X det^µ X = µ det^µ(X) X⁻ᵀ
∇_X det X^µ = µ det(X^µ) X⁻ᵀ ,  X ∈ R^{2×2}

∇_X det X^k = k det^{k−1}(X) (tr(X)I − Xᵀ) ,  X ∈ R^{2×2}
∇_X det X^k = ∇_X det^k X = k det(X^k) X⁻ᵀ = k det^k(X) X⁻ᵀ

∇_X det^µ(X+tY) = µ det^µ(X+tY) (X+tY)⁻ᵀ
∇_X det(X+tY)^k = ∇_X det^k(X+tY) = k det^k(X+tY) (X+tY)⁻ᵀ

d/dt det(X+tY) = det(X+tY) tr((X+tY)⁻¹Y)
d²/dt² det(X+tY) = det(X+tY)(tr²((X+tY)⁻¹Y) − tr((X+tY)⁻¹ Y (X+tY)⁻¹ Y))
d/dt det(X+tY)⁻¹ = −det(X+tY)⁻¹ tr((X+tY)⁻¹Y)
d²/dt² det(X+tY)⁻¹ = det(X+tY)⁻¹(tr²((X+tY)⁻¹Y) + tr((X+tY)⁻¹ Y (X+tY)⁻¹ Y))
d/dt det^µ(X+tY) = ...


D.2. TABLES OF GRADIENTS AND DERIVATIVES 551

D.2.6  Logarithmic

d/dt log(X+tY)^µ = ... ,  −1 ≤ µ ≤ 1 ,  X,Y ∈ S^M_+   [132, §6.6, prob.20]

D.2.7  Exponential

[48, §3.6, §4.5] [220, §5.4]

∇_X e^{tr(YᵀX)} = ∇_X det e^{YᵀX} = e^{tr(YᵀX)} Y   (∀X,Y)
∇_X tr e^{YX} = e^{YᵀXᵀ} Yᵀ = Yᵀ e^{XᵀYᵀ}

log-sum-exp & geometric mean [41, p.74] ...

d^j/dt^j e^{tr(X+tY)} = e^{tr(X+tY)} tr^j(Y)

d/dt e^{tY} = e^{tY} Y = Y e^{tY}
d/dt e^{X+tY} = e^{X+tY} Y = Y e^{X+tY} ,               XY = YX
d²/dt² e^{X+tY} = e^{X+tY} Y² = Y e^{X+tY} Y = Y² e^{X+tY} ,  XY = YX

e^X for symmetric X of dimension less than 3 [41, p.110] ...




Appendix E

Projection

For all A ∈ R^{m×n}, the pseudoinverse [131, §7.3, prob.9] [162, §6.12, prob.19] [93, §5.5.4] [220, App.A]

    A† = lim_{t→0⁺} (AᵀA + tI)⁻¹Aᵀ = lim_{t→0⁺} Aᵀ(AAᵀ + tI)⁻¹ ∈ R^{n×m}    (1479)

is a unique matrix having [174] [217, §III.1, exer.1] E.1

    R(A†) = R(Aᵀ) ,   R(A†ᵀ) = R(A)                                      (1480)
    N(A†) = N(Aᵀ) ,   N(A†ᵀ) = N(A)                                      (1481)

and satisfies the Penrose conditions: [252, Moore-Penrose] [133, §1.3]

1. AA†A = A
2. A†AA† = A†
3. (AA†)ᵀ = AA†
4. (A†A)ᵀ = A†A

The Penrose conditions are necessary and sufficient to establish the pseudoinverse, whose principal action is to injectively map R(A) onto R(Aᵀ).

E.1 Proof of (1480) and (1481) is by singular value decomposition (§A.6).
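Both the limit formula (1479) and the four Penrose conditions are easy to verify numerically; a minimal sketch (ours), using a small t as proxy for the limit:

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((5, 3))

    # limit definition (1479), small t standing in for t → 0⁺
    t = 1e-12
    Ap_lim = np.linalg.solve(A.T @ A + t * np.eye(3), A.T)
    Ap = np.linalg.pinv(A)
    print(np.allclose(Ap_lim, Ap, atol=1e-6))

    # the four Penrose conditions
    print(np.allclose(A @ Ap @ A, A),
          np.allclose(Ap @ A @ Ap, Ap),
          np.allclose((A @ Ap).T, A @ Ap),
          np.allclose((Ap @ A).T, Ap @ A))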


554 APPENDIX E. PROJECTION

The following relations are reliably true without qualification:

a. Aᵀ† = A†ᵀ
b. A†† = A
c. (AAᵀ)† = A†ᵀA†
d. (AᵀA)† = A†A†ᵀ
e. (AA†)† = AA†
f. (A†A)† = A†A

Yet for arbitrary A,B it is generally true that (AB)† ≠ B†A† :

E.0.0.0.1 Theorem. Pseudoinverse of product. [102] [157, exer.7.23]
For A ∈ R^{m×n} and B ∈ R^{n×k}

    (AB)† = B†A†                                                         (1482)

if and only if

    R(AᵀAB) ⊆ R(B)   and   R(BBᵀAᵀ) ⊆ R(Aᵀ)                              (1483)
                                                                           ⋄

For orthogonal matrices U,Q and arbitrary A [217, §III.1]

    (UAQᵀ)† = QA†Uᵀ                                                      (1484)

E.0.1  Logical deductions

When A is invertible, A† = A⁻¹, of course; so A†A = AA† = I. Otherwise, for A ∈ R^{m×n} [87, §5.3.3.1] [157, §7] [194]

g. A†A = I ,  A† = (AᵀA)⁻¹Aᵀ ,  rank A = n
h. AA† = I ,  A† = Aᵀ(AAᵀ)⁻¹ ,  rank A = m
i. A†Aω = ω ,  ω ∈ R(Aᵀ)
j. AA†υ = υ ,  υ ∈ R(A)
k. A†A = AA† ,  A normal
l. A^k† = A†^k ,  A normal, k an integer

When A is symmetric, A† is symmetric and (§A.6)

    A ≽ 0 ⇔ A† ≽ 0                                                       (1485)


E.1. IDEMPOTENT MATRICES 555

E.1  Idempotent matrices

Projection matrices are square and defined by idempotence, P² = P ; [220, §2.6] [133, §1.3] equivalent to the condition that P be diagonalizable [131, §3.3, prob.3] with eigenvalues φᵢ ∈ {0,1}. [269, §4.1, thm.4.1] Idempotent matrices are not necessarily symmetric. The transpose of an idempotent matrix remains idempotent; PᵀPᵀ = Pᵀ. Solely excepting P = I, all projection matrices are neither orthogonal (§B.5) nor invertible. [220, §3.4] The collection of all projection matrices of particular dimension does not form a convex set.

Suppose we wish to project nonorthogonally (obliquely) on the range of any particular matrix A ∈ R^{m×n}. All idempotent matrices projecting nonorthogonally on R(A) may be expressed:

    P = A(A† + BZᵀ)                                                      (1486)

where R(P) = R(A), E.2 B ∈ R^{n×k} for k ∈ {1...m} is otherwise arbitrary, and Z ∈ R^{m×k} is any matrix in N(Aᵀ); id est,

    AᵀZ = A†Z = 0                                                        (1487)

Evidently, the collection of nonorthogonal projectors projecting on R(A) is an affine set

    P_k = {A(A† + BZᵀ) | B ∈ R^{n×k}}                                    (1488)

When matrix A in (1486) is full-rank (A†A = I) or even has orthonormal columns (AᵀA = I), this characteristic leads to a biorthogonal characterization of nonorthogonal projection:

E.2 Proof. R(P) ⊆ R(A) is obvious [220, §3.6]. By (114) and (115),

    R(A† + BZᵀ) = {(A† + BZᵀ)y | y ∈ R^m} ⊇ {(A† + BZᵀ)y | y ∈ R(A)} = R(Aᵀ)
    R(P) = {A(A† + BZᵀ)y | y ∈ R^m} ⊇ {A(A† + BZᵀ)y | (A† + BZᵀ)y ∈ R(Aᵀ)} = R(A)
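Before that biorthogonal characterization, a quick numerical instance of (1486)-(1487) (sketch ours; Z is built from the singular vectors spanning N(Aᵀ)):

    import numpy as np

    rng = np.random.default_rng(7)
    m, n, k = 6, 3, 2
    A = rng.standard_normal((m, n))               # full column rank a.s.

    U, s, Vt = np.linalg.svd(A)                   # full SVD
    Z = U[:, n:n+k]                               # columns span part of N(Aᵀ)
    B = rng.standard_normal((n, k))               # otherwise arbitrary

    P = A @ (np.linalg.pinv(A) + B @ Z.T)         # (1486)

    print(np.allclose(P @ P, P))                  # idempotent: P² = P
    print(np.allclose(P @ A, A))                  # P acts as identity on R(A)
    print(np.allclose(A.T @ Z, 0))                # AᵀZ = 0, (1487)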


556 APPENDIX E. PROJECTION

E.1.1  Biorthogonal characterization

Any nonorthogonal projector P² = P ∈ R^{m×m} projecting on nontrivial R(U) can be defined by a biorthogonality condition QᵀU = I ; the biorthogonal decomposition of P being (confer (1486))

    P = UQᵀ ,   QᵀU = I                                                  (1489)

where E.3 (§B.1.1.1)

    R(P) = R(U) ,   N(P) = N(Qᵀ)                                         (1490)

and where generally (confer (1515)) E.4

    Pᵀ ≠ P ,   P† ≠ P ,   ‖P‖₂ ≠ 1 ,   P ⋡ 0                             (1491)

and P is not nonexpansive (1516).

(⇐) To verify assertion (1489) we observe: because idempotent matrices are diagonalizable (§A.5), [131, §3.3, prob.3] they must have the form (1181)

    P = SΦS⁻¹ = Σ_{i=1}^m φᵢ sᵢwᵢᵀ = Σ_{i=1}^{k≤m} sᵢwᵢᵀ                  (1492)

that is a sum of k = rank P independent projector dyads (idempotent dyads, §B.1.1), where φᵢ ∈ {0,1} are the eigenvalues of P [269, §4.1, thm.4.1] in diagonal matrix Φ ∈ R^{m×m} arranged in nonincreasing order, and where sᵢ, wᵢ ∈ R^m are the right- and left-eigenvectors of P, respectively, which are independent and real. E.5 Therefore

    U ≜ S(:,1:k) = [s₁ ··· s_k] ∈ R^{m×k}                                (1493)

E.3 Proof. Obviously, R(P) ⊆ R(U). Because QᵀU = I

    R(P) = {UQᵀx | x ∈ R^m} ⊇ {UQᵀUy | y ∈ R^k} = R(U)

E.4 Orthonormal decomposition (1512) (confer §E.3.4) is a special case of biorthogonal decomposition (1489) characterized by (1515). So, these characteristics (1491) are not necessary conditions for biorthogonality.
E.5 Eigenvectors of a real matrix corresponding to real eigenvalues must be real. (§A.5.0.0.1)


E.1. IDEMPOTENT MATRICES 557

is the full-rank matrix S ∈ R^{m×m} having m − k columns truncated (corresponding to 0 eigenvalues), while

    Qᵀ ≜ S⁻¹(1:k,:) = [w₁ᵀ ; ... ; w_kᵀ] ∈ R^{k×m}                       (1494)

is matrix S⁻¹ having the corresponding m − k rows truncated. By the 0 eigenvalues theorem (§A.7.2.0.1), R(U) = R(P), R(Q) = R(Pᵀ), and

    R(P)  = span{sᵢ | φᵢ = 1 ∀i}
    N(P)  = span{sᵢ | φᵢ = 0 ∀i}
    R(Pᵀ) = span{wᵢ | φᵢ = 1 ∀i}
    N(Pᵀ) = span{wᵢ | φᵢ = 0 ∀i}                                         (1495)

Thus biorthogonality QᵀU = I is a necessary condition for idempotence, and so the collection of nonorthogonal projectors projecting on R(U) is the affine set P_k = UQ_kᵀ where Q_k = {Q | QᵀU = I , Q ∈ R^{m×k}}.

(⇒) Biorthogonality is a sufficient condition for idempotence;

    P² = Σ_{i=1}^k sᵢwᵢᵀ Σ_{j=1}^k sⱼwⱼᵀ = P                             (1496)

id est, if the cross-products are annihilated, then P² = P.

Nonorthogonal projection of x on R(P) has expression like a biorthogonal expansion,

    Px = UQᵀx = Σ_{i=1}^k wᵢᵀx sᵢ                                        (1497)

When the domain is restricted to the range of P, say x = Uξ for ξ ∈ R^k, then x = Px = UQᵀUξ = Uξ. Otherwise, any component of x in N(P) = N(Qᵀ) will be annihilated. The direction of nonorthogonal projection is orthogonal to R(Q) ⇔ QᵀU = I ; id est, for Px ∈ R(U)

    Px − x ⊥ R(Q)  in R^m                                                (1498)


558 APPENDIX E. PROJECTION

[Figure 109: Nonorthogonal projection of x ∈ R³ on R(U) = R² under biorthogonality condition; id est, Px = UQᵀx such that QᵀU = I. Any point along imaginary line T connecting x to Px will be projected nonorthogonally on Px with respect to horizontal plane constituting R² in this example. Extreme directions of cone(U) correspond to two columns of U ; likewise for cone(Q). For purpose of illustration, we truncate each conic hull by truncating coefficients of conic combination at unity. Conic hull cone(Q) is headed upward at an angle, out of plane of page. Nonorthogonal projection would fail were N(Qᵀ) in R(U) (were T parallel to a line in R(U)).]


E.1. IDEMPOTENT MATRICES 559

E.1.1.0.1 Example. Illustration of nonorthogonal projector.
Figure 109 shows cone(U), the conic hull of the columns of

    U = [  1     1
          −0.5   0.3
           0     0   ]                                                   (1499)

from nonorthogonal projector P = UQᵀ. Matrix U has a limitless number of left inverses because N(Uᵀ) is nontrivial. Similarly depicted is left inverse Qᵀ from (1486)

    Q = U†ᵀ + ZBᵀ = [  0.3750   0.6250        [ 0
                      −1.2500   1.2500    +     0   [0.5  0.5]
                       0        0      ]        1 ]

      = [  0.3750   0.6250
          −1.2500   1.2500
           0.5000   0.5000 ]                                             (1500)

where Z ∈ N(Uᵀ) and matrix B is selected arbitrarily; id est, QᵀU = I because U is full-rank.

Direction of projection on R(U) is orthogonal to R(Q). Any point along line T in the figure, for example, will have the same projection. Were matrix Z instead equal to 0, then cone(Q) would become the relative dual to cone(U) (sharing the same affine hull; §2.13.8, confer Figure 41(a)). In that case, projection Px = UU†x of x on R(U) becomes orthogonal projection (and unique minimum-distance).

E.1.2  Idempotence summary

Summarizing, nonorthogonal subspace-projector P is a linear operator defined by idempotence or biorthogonal decomposition (1489), but characterized neither by symmetry nor positive semidefiniteness nor nonexpansivity (1516).
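The numbers of Example E.1.1.0.1 are easily reproduced; a sketch (ours), using the same U, Z, and B :

    import numpy as np

    U = np.array([[ 1.0, 1.0],
                  [-0.5, 0.3],
                  [ 0.0, 0.0]])
    Z = np.array([[0.0], [0.0], [1.0]])     # basis for N(Uᵀ)
    B = np.array([[0.5], [0.5]])            # the arbitrary B of (1500)

    Q = np.linalg.pinv(U).T + Z @ B.T       # (1500)
    P = U @ Q.T                             # nonorthogonal projector on R(U)

    print(np.allclose(Q.T @ U, np.eye(2)))  # QᵀU = I
    print(np.allclose(P @ P, P))            # idempotent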


560 APPENDIX E. PROJECTION

E.2  I−P , Projection on algebraic complement

It follows from the diagonalizability of idempotent matrices that I − P must also be a projection matrix because it too is idempotent, and because it may be expressed

    I − P = S(I − Φ)S⁻¹ = Σ_{i=1}^m (1 − φᵢ) sᵢwᵢᵀ                       (1501)

where (1 − φᵢ) ∈ {1,0} are the eigenvalues of I − P (1096) whose eigenvectors sᵢ, wᵢ are identical to those of P in (1492). A consequence of that complementary relationship of eigenvalues is the fact, [230, §2] [226, §2] for subspace projector P = P² ∈ R^{m×m},

    R(P)  = span{sᵢ | φᵢ = 1 ∀i} = span{sᵢ | (1−φᵢ) = 0 ∀i} = N(I−P)
    N(P)  = span{sᵢ | φᵢ = 0 ∀i} = span{sᵢ | (1−φᵢ) = 1 ∀i} = R(I−P)
    R(Pᵀ) = span{wᵢ | φᵢ = 1 ∀i} = span{wᵢ | (1−φᵢ) = 0 ∀i} = N(I−Pᵀ)
    N(Pᵀ) = span{wᵢ | φᵢ = 0 ∀i} = span{wᵢ | (1−φᵢ) = 1 ∀i} = R(I−Pᵀ)    (1502)

that is easy to see from (1492) and (1501). Idempotent I − P therefore projects vectors on its range, N(P). Because all eigenvectors of a real idempotent matrix are real and independent, the algebraic complement of R(P) [147, §3.3] is equivalent to N(P) ; E.6 id est,

    R(P)⊕N(P) = R(Pᵀ)⊕N(Pᵀ) = R(Pᵀ)⊕N(P) = R(P)⊕N(Pᵀ) = R^m             (1503)

because R(P) ⊕ R(I−P) = R^m. For idempotent P ∈ R^{m×m}, consequently,

    rank P + rank(I − P) = m                                             (1504)

E.2.0.0.1 Theorem. Rank/Trace. [269, §4.1, prob.9] (confer (1520))

    P² = P  ⇔  rank P = tr P  and  rank(I − P) = tr(I − P)               (1505)
                                                                           ⋄

E.6 The same phenomenon occurs with symmetric (nonidempotent) matrices, for example. When the summands in A ⊕ B = R^m are orthogonal vector spaces, the algebraic complement is the orthogonal complement.


E.3. SYMMETRIC IDEMPOTENT MATRICES 561

E.2.1  Universal projector characteristic

Although projection is not necessarily orthogonal and R(P) ⊥̸ R(I−P) in general, still for any projector P and any x ∈ R^m

    Px + (I − P)x = x                                                    (1506)

must hold where R(I − P) = N(P) is the algebraic complement of R(P). The algebraic complement of closed convex cone K, for example, is the negative dual cone −K*. (1625)

E.3  Symmetric idempotent matrices

When idempotent matrix P is symmetric, P is an orthogonal projector. In other words, the projection direction of point x ∈ R^m on subspace R(P) is orthogonal to R(P) ; id est, for P² = P ∈ S^m and projection Px ∈ R(P)

    Px − x ⊥ R(P)  in R^m                                                (1507)

Perpendicularity is a necessary and sufficient condition for orthogonal projection on a subspace. [64, §4.9]

A condition equivalent to (1507) is: The norm of direction x − Px is the infimum over all nonorthogonal projections of x on R(P) ; [162, §3.3] for P² = P ∈ S^m, R(P) = R(A), matrices A,B,Z and integer k as defined for (1486), and any given x ∈ R^m

    ‖x − Px‖₂ = inf_{B∈R^{n×k}} ‖x − A(A† + BZᵀ)x‖₂                      (1508)

The infimum is attained for R(B) ⊆ N(A) over any affine set of nonorthogonal projectors (1488) indexed by k.

The proof is straightforward: The vector 2-norm is a convex function. Applying §D.2, setting the gradient of the norm-square to 0,

    (AᵀABZᵀ − Aᵀ(I − AA†)) xxᵀA = 0   ⇔   AᵀABZᵀxxᵀA = 0                 (1509)

Projection Px = AA†x is therefore unique.
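The unique orthogonal projector AA† and its companions, summarized next in (1510), are easy to check numerically; an illustrative sketch (ours):

    import numpy as np

    rng = np.random.default_rng(8)
    A = rng.standard_normal((5, 3))
    Ap = np.linalg.pinv(A)

    P_rowspace = Ap @ A                 # projects R³ on R(Aᵀ)
    P_range    = A @ Ap                 # projects R⁵ on R(A)
    P_null     = np.eye(3) - Ap @ A     # projects R³ on N(A)
    P_leftnull = np.eye(5) - A @ Ap     # projects R⁵ on N(Aᵀ)

    for P in (P_rowspace, P_range, P_null, P_leftnull):
        assert np.allclose(P @ P, P) and np.allclose(P, P.T)

    # any x ∈ R⁵ splits orthogonally: x = P_range x + P_leftnull x
    x = rng.standard_normal(5)
    print(np.allclose(P_range @ x + P_leftnull @ x, x),
          np.isclose((P_range @ x) @ (P_leftnull @ x), 0.0))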


562 APPENDIX E. PROJECTION

In any case, P = AA†, so the projection matrix must be symmetric. Then for any A ∈ R^{m×n}, P = AA† projects any vector x in R^m orthogonally on R(A). Under either condition (1507) or (1508), the projection Px is unique minimum-distance; for subspaces, perpendicularity and minimum-distance conditions are equivalent.

E.3.1  Four subspaces

We summarize the orthogonal projectors projecting on the four fundamental subspaces: for A ∈ R^{m×n}

    A†A :     R^n on R(A†A)   = R(Aᵀ)
    AA† :     R^m on R(AA†)   = R(A)
    I−A†A :   R^n on R(I−A†A) = N(A)
    I−AA† :   R^m on R(I−AA†) = N(Aᵀ)                                    (1510)

For completeness: E.7 (1502)

    N(A†A)   = N(A)
    N(AA†)   = N(Aᵀ)
    N(I−A†A) = R(Aᵀ)
    N(I−AA†) = R(A)                                                      (1511)

E.3.2  Orthogonal characterization

Any symmetric projector P² = P ∈ S^m projecting on nontrivial R(Q) can be defined by the orthonormality condition QᵀQ = I. When skinny matrix Q ∈ R^{m×k} has orthonormal columns, then Q† = Qᵀ by the Penrose conditions. Hence, any P having an orthonormal decomposition (§E.3.4)

    P = QQᵀ ,   QᵀQ = I                                                  (1512)

where [220, §3.3] (1233)

    R(P) = R(Q) ,   N(P) = N(Qᵀ)                                         (1513)

E.7 Proof is by singular value decomposition (§A.6.2): N(A†A) ⊆ N(A) is obvious. Conversely, suppose A†Ax = 0. Then xᵀA†Ax = xᵀQQᵀx = ‖Qᵀx‖² = 0 where A = UΣQᵀ is the subcompact singular value decomposition. Because R(Q) = R(Aᵀ), then x ∈ N(A), which implies N(A†A) ⊇ N(A).


E.3. SYMMETRIC IDEMPOTENT MATRICES 563

is an orthogonal projector projecting on R(Q) having, for Px ∈ R(Q) (confer (1498))

    Px − x ⊥ R(Q)  in R^m                                                (1514)

From (1512), orthogonal projector P is obviously positive semidefinite (§A.3.1.0.6); necessarily,

    Pᵀ = P ,   P† = P ,   ‖P‖₂ = 1 ,   P ≽ 0                             (1515)

and ‖Px‖ = ‖QQᵀx‖ = ‖Qᵀx‖ because ‖Qy‖ = ‖y‖ ∀y ∈ R^k. All orthogonal projectors are therefore nonexpansive because

    √⟨Px, x⟩ = ‖Px‖ = ‖Qᵀx‖ ≤ ‖x‖   ∀x ∈ R^m                             (1516)

the Bessel inequality, [64] [147] with equality when x ∈ R(Q).

From the diagonalization of idempotent matrices (1492) on page 556,

    P = SΦSᵀ = Σ_{i=1}^m φᵢ sᵢsᵢᵀ = Σ_{i=1}^{k≤m} sᵢsᵢᵀ                   (1517)

orthogonal projection of point x on R(P) has expression like an orthogonal expansion [64, §4.10]

    Px = QQᵀx = Σ_{i=1}^k sᵢᵀx sᵢ                                        (1518)

where

    Q = S(:,1:k) = [s₁ ··· s_k] ∈ R^{m×k}                                (1519)

and where the sᵢ [sic] are orthonormal eigenvectors of symmetric idempotent P. When the domain is restricted to the range of P, say x = Qξ for ξ ∈ R^k, then x = Px = QQᵀQξ = Qξ. Otherwise, any component of x in N(Qᵀ) will be annihilated.

E.3.2.0.1 Theorem. Symmetric rank/trace. (confer (1505) (1098))

    Pᵀ = P , P² = P
    ⇔
    rank P = tr P = ‖P‖²_F  and  rank(I − P) = tr(I − P) = ‖I − P‖²_F    (1520)
                                                                           ⋄


564 APPENDIX E. PROJECTION

Proof. We take as given Theorem E.2.0.0.1 establishing idempotence. We have left only to show tr P = ‖P‖²_F ⇒ Pᵀ = P, established in [269, §7.1].

E.3.3  Summary, symmetric idempotent

In summary, orthogonal projector P is a linear operator defined [129, §A.3.1] by idempotence and symmetry, and characterized by positive semidefiniteness and nonexpansivity. The algebraic complement (§E.2) to R(P) becomes the orthogonal complement R(I − P) ; id est, R(P) ⊥ R(I − P).

E.3.4  Orthonormal decomposition

When Z = 0 in the general nonorthogonal projector A(A† + BZᵀ) (1486), an orthogonal projector results (for any matrix A) characterized principally by idempotence and symmetry. Any real orthogonal projector may, in fact, be represented by an orthonormal decomposition such as (1512). [133, §1, prob.42]

To verify that assertion for the four fundamental subspaces (1510), we need only express A by subcompact singular value decomposition (§A.6.2): From pseudoinverse (1208) of A = UΣQᵀ ∈ R^{m×n},

    AA† = UΣΣ†Uᵀ = UUᵀ ,            A†A = QΣ†ΣQᵀ = QQᵀ
    I − AA† = I − UUᵀ = U⊥U⊥ᵀ ,     I − A†A = I − QQᵀ = Q⊥Q⊥ᵀ            (1521)

where U⊥ ∈ R^{m×m−rank A} holds columnar an orthonormal basis for the orthogonal complement of R(U), and likewise for Q⊥ ∈ R^{n×n−rank A}. The existence of an orthonormal decomposition is sufficient to establish idempotence and symmetry of an orthogonal projector (1512).

E.3.4.1  Unifying trait of all projectors: direction

Relation (1521) shows: orthogonal projectors simultaneously possess a biorthogonal decomposition (confer §E.1.1) (for example, AA† for skinny-or-square A full-rank) and an orthonormal decomposition (UUᵀ whence Px = UUᵀx).


E.3. SYMMETRIC IDEMPOTENT MATRICES 565

E.3.4.2  orthogonal projector, orthonormal decomposition

Consider orthogonal expansion of x ∈ R(U) :

    x = UUᵀx = Σ_{i=1}^n uᵢuᵢᵀx                                          (1522)

a sum of one-dimensional orthogonal projections (§E.6.3), where

    U ≜ [u₁ ··· u_n]   and   UᵀU = I                                     (1523)

and where the subspace projector has two expressions, (1521)

    AA† ≜ UUᵀ                                                            (1524)

where A ∈ R^{m×n} has rank n. The direction of projection of x on uⱼ for some j ∈ {1...n}, for example, is orthogonal to uⱼ but parallel to a vector in the span of all the remaining vectors constituting the columns of U ;

    uⱼᵀ(uⱼuⱼᵀx − x) = 0
    uⱼuⱼᵀx − x = uⱼuⱼᵀx − UUᵀx ∈ R({uᵢ | i=1...n, i≠j})                  (1525)

E.3.4.3  orthogonal projector, biorthogonal decomposition

We get a similar result for the biorthogonal expansion of x ∈ R(A). Define

    A ≜ [a₁ a₂ ··· a_n] ∈ R^{m×n}                                        (1526)

and the rows of the pseudoinverse

    A† ≜ [a†₁ ; a†₂ ; ... ; a†ₙ] ∈ R^{n×m}                               (1527)

under the biorthogonality condition A†A = I. In the biorthogonal expansion (§2.13.8)

    x = AA†x = Σ_{i=1}^n aᵢ a†ᵢ x                                        (1528)


566 APPENDIX E. PROJECTION

the direction of projection of x on aⱼ for some particular j ∈ {1...n}, for example, is orthogonal to a†ⱼ and parallel to a vector in the span of all the remaining vectors constituting the columns of A ;

    a†ⱼ(aⱼ a†ⱼ x − x) = 0
    aⱼ a†ⱼ x − x = aⱼ a†ⱼ x − AA†x ∈ R({aᵢ | i=1...n, i≠j})              (1529)

E.3.4.4  nonorthogonal projector, biorthogonal decomposition

Because the result in §E.3.4.3 is independent of symmetry AA† = (AA†)ᵀ, we must have the same result for any nonorthogonal projector characterized by a biorthogonality condition; namely, for nonorthogonal projector P = UQᵀ (1489) under biorthogonality condition QᵀU = I, in the biorthogonal expansion of x ∈ R(U)

    x = UQᵀx = Σ_{i=1}^k uᵢqᵢᵀx                                          (1530)

where

    U ≜ [u₁ ··· u_k] ∈ R^{m×k} ,   Qᵀ ≜ [q₁ᵀ ; ... ; q_kᵀ] ∈ R^{k×m}     (1531)

the direction of projection of x on uⱼ is orthogonal to qⱼ and parallel to a vector in the span of the remaining uᵢ :

    qⱼᵀ(uⱼqⱼᵀx − x) = 0
    uⱼqⱼᵀx − x = uⱼqⱼᵀx − UQᵀx ∈ R({uᵢ | i=1...k, i≠j})                  (1532)

E.4  Algebra of projection on affine subsets

Let P_A x denote projection of x on affine subset A ≜ R + α where R is a subspace and α ∈ A. Then, because R is parallel to A, it holds:

    P_A x = P_{R+α} x = (I − P_R)(α) + P_R x
          = P_R(x − α) + α                                               (1533)

Subspace projector P_R is a linear operator (§E.1.2), and P_R(x + y) = P_R x whenever y ⊥ R and P_R is an orthogonal projector.


E.5. PROJECTION EXAMPLES 567

E.4.0.0.1 Theorem. Orthogonal projection on affine subset. [64, §9.26]
Let A = R + α be an affine subset where α ∈ A, and let R⊥ be the orthogonal complement of subspace R. Then P_A x is the orthogonal projection of x ∈ R^n on A if and only if

    P_A x ∈ A ,   ⟨P_A x − x , a − α⟩ = 0  ∀a ∈ A                        (1534)

or if and only if

    P_A x ∈ A ,   P_A x − x ∈ R⊥                                         (1535)
                                                                           ⋄

E.5  Projection examples

E.5.0.0.1 Example. Orthogonal projection on orthogonal basis.
Orthogonal projection on a subspace can instead be accomplished by orthogonally projecting on the individual members of an orthogonal basis for that subspace. Suppose, for example, matrix A ∈ R^{m×n} holds an orthonormal basis for R(A) in its columns; A ≜ [a₁ a₂ ··· a_n]. Then orthogonal projection of vector x ∈ R^m on R(A) is a sum of one-dimensional orthogonal projections

    Px = AA†x = A(AᵀA)⁻¹Aᵀx = AAᵀx = Σ_{i=1}^n aᵢaᵢᵀx                    (1536)

where each symmetric dyad aᵢaᵢᵀ is an orthogonal projector projecting on R(aᵢ). (§E.6.3) Because ‖x − Px‖ is minimized by orthogonal projection, Px is considered to be the best approximation (in the Euclidean sense) to x from the set R(A). [64, §4.9]

E.5.0.0.2 Example. Orthogonal projection on span of nonorthogonal basis.
Orthogonal projection on a subspace can also be accomplished by projecting nonorthogonally on the individual members of any nonorthogonal basis for that subspace. This interpretation is in fact the principal application of the pseudoinverse we discussed. Now suppose matrix A holds a nonorthogonal basis for R(A) in its columns;

    A ≜ [a₁ a₂ ··· a_n] ∈ R^{m×n}                                        (1537)


568 APPENDIX E. PROJECTION

and define the rows of the pseudoinverse

    A† ≜ [a†₁ ; a†₂ ; ... ; a†ₙ] ∈ R^{n×m}                               (1538)

with A†A = I. Then orthogonal projection of vector x ∈ R^m on R(A) is a sum of one-dimensional nonorthogonal projections

    Px = AA†x = Σ_{i=1}^n aᵢ a†ᵢ x                                       (1539)

where each nonsymmetric dyad aᵢ a†ᵢ is a nonorthogonal projector projecting on R(aᵢ), (§E.6.1) idempotent because of the biorthogonality condition A†A = I.

The projection Px is regarded as the best approximation to x from the set R(A), as it was in Example E.5.0.0.1.

E.5.0.0.3 Example. Biorthogonal expansion as nonorthogonal projection.
Biorthogonal expansion can be viewed as a sum of components, each a nonorthogonal projection on the range of an extreme direction of a pointed polyhedral cone K ; e.g., Figure 110.

Suppose matrix A ∈ R^{m×n} holds a nonorthogonal basis for R(A) in its columns as in (1537), and the rows of pseudoinverse A† are defined as in (1538). Assuming the most general biorthogonality condition (A† + BZᵀ)A = I with BZᵀ defined as for (1486), then biorthogonal expansion of vector x is a sum of one-dimensional nonorthogonal projections; for x ∈ R(A)

    x = A(A† + BZᵀ)x = AA†x = Σ_{i=1}^n aᵢ a†ᵢ x                         (1540)

where each dyad aᵢ a†ᵢ is a nonorthogonal projector projecting on R(aᵢ). (§E.6.1) The extreme directions of K = cone(A) are {a₁,...,a_n}, the linearly independent columns of A, while {a†₁ᵀ,...,a†ₙᵀ}, the extreme directions of relative dual cone K* ∩ aff K = cone(A†ᵀ) (§2.13.9.4), correspond to the linearly independent (§B.1.1.1) rows of A†. The directions of


E.5. PROJECTION EXAMPLES 569

[Figure 110: (confer Figure 46) Biorthogonal expansion of point x ∈ aff K is found by projecting x nonorthogonally on ranges of extreme directions of polyhedral cone K ⊂ R². Direction of projection on extreme direction a₁ is orthogonal to extreme direction a†₁ᵀ of dual cone K* and parallel to a₂ (§E.3.4.1); similarly, direction of projection on a₂ is orthogonal to a†₂ᵀ and parallel to a₁. Point x is sum of nonorthogonal projections: x = y + z = P_{a₁}x + P_{a₂}x, with x projected on R(a₁) and on R(a₂). Expansion is unique because extreme directions of K are linearly independent. Were a₁ orthogonal to a₂, then K would be identical to K* and nonorthogonal projections would become orthogonal.]


570 APPENDIX E. PROJECTION

nonorthogonal projection are determined by the pseudoinverse; id est, direction of projection aᵢ a†ᵢ x − x on R(aᵢ) is orthogonal to a†ᵢᵀ. E.8

Because the extreme directions of this cone K are linearly independent, the component projections are unique in the sense:

    there is only one linear combination of extreme directions of K that yields a particular point x ∈ R(A)

whenever

    R(A) = aff K = R(a₁) ⊕ R(a₂) ⊕ ... ⊕ R(a_n)                          (1541)

E.5.0.0.4 Example. Nonorthogonal projection on elementary matrix.
Suppose P_Y is a linear nonorthogonal projector projecting on subspace Y, and suppose the range of a vector u is linearly independent of Y ; id est, for some other subspace M containing Y suppose

    M = R(u) ⊕ Y                                                         (1542)

Assuming P_M x = P_u x + P_Y x holds, then it follows for vector x ∈ M

    P_u x = x − P_Y x ,   P_Y x = x − P_u x                              (1543)

nonorthogonal projection of x on R(u) can be determined from nonorthogonal projection of x on Y, and vice versa.

Such a scenario is realizable were there some arbitrary basis for Y populating a full-rank skinny-or-square matrix A

    A ≜ [ basis Y   u ] ∈ R^{n+1}                                        (1544)

Then P_M = AA† fulfills the requirements, with P_u = A(:,n+1)A†(n+1,:) and P_Y = A(:,1:n)A†(1:n,:). Observe, P_M is an orthogonal projector whereas P_Y and P_u are nonorthogonal projectors.

Now suppose, for example, P_Y is an elementary matrix (§B.3); in particular,

    P_Y = I − e₁1ᵀ = [0  √2 V_N] ∈ R^{N×N}                               (1545)

where Y = N(1ᵀ). We have M = R^N, A = [√2 V_N  e₁], and u = e₁. Thus P_u = e₁1ᵀ is a nonorthogonal projector projecting on R(u) in a direction parallel to a vector in Y (§E.3.4.1), and P_Y x = x − e₁1ᵀx is a nonorthogonal projection of x on Y in a direction parallel to u.

E.8 This remains true in high dimension although only a little more difficult to visualize in R³ ; confer Figure 47.
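The expansions (1539) and (1540) are easily confirmed numerically; a minimal sketch (ours):

    import numpy as np

    rng = np.random.default_rng(9)
    m, n = 5, 3
    A = rng.standard_normal((m, n))          # nonorthogonal basis for R(A)
    Ap = np.linalg.pinv(A)                   # rows a†ᵢ
    x = rng.standard_normal(m)

    # (1539): Px = Σᵢ aᵢ a†ᵢ x, a sum of nonorthogonal dyad projections
    Px_sum = sum(np.outer(A[:, i], Ap[i]) @ x for i in range(n))
    print(np.allclose(Px_sum, A @ Ap @ x))   # equals AA†x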


E.5. PROJECTION EXAMPLES 571

E.5.0.0.5 Example. Projecting the origin on a hyperplane.
(confer §2.4.2.0.1) Given the hyperplane representation having b ∈ R and nonzero normal a ∈ R^m

    ∂H = {y | aᵀy = b} ⊂ R^m                                             (90)

the orthogonal projection of the origin P0 on that hyperplane is the solution to a minimization problem: (1508)

    ‖P0 − 0‖₂ = inf_{y∈∂H} ‖y − 0‖₂ = inf_{ξ∈R^{m−1}} ‖Zξ + x‖₂          (1546)

where x is any solution to aᵀy = b, and where the columns of Z ∈ R^{m×m−1} constitute a basis for N(aᵀ) so that y = Zξ + x ∈ ∂H for all ξ ∈ R^{m−1}.

The infimum can be found by setting the gradient (with respect to ξ) of the strictly convex norm-square to 0. We find the minimizing argument

    ξ* = −(ZᵀZ)⁻¹Zᵀx                                                     (1547)

so

    y* = (I − Z(ZᵀZ)⁻¹Zᵀ)x                                               (1548)

and from (1510)

    P0 = y* = a(aᵀa)⁻¹aᵀx = (a/‖a‖)(aᵀ/‖a‖)x ≜ AA†x = a b/‖a‖²           (1549)

In words, any point x in the hyperplane ∂H projected on its normal a (confer (1574)) yields that point y* in the hyperplane closest to the origin.

E.5.0.0.6 Example. Projection on affine subset.
The technique of Example E.5.0.0.5 is extensible. Given an intersection of hyperplanes

    A = {y | Ay = b} ⊂ R^n                                               (1550)

where each row of A ∈ R^{m×n} is nonzero and b ∈ R(A), then the orthogonal projection Px of any point x ∈ R^n on A is the solution to a minimization problem:

    ‖Px − x‖₂ = inf_{y∈A} ‖y − x‖₂ = inf_{ξ∈R^{n−rank A}} ‖Zξ + y_p − x‖₂    (1551)


572 APPENDIX E. PROJECTION

where y_p is any solution to Ay = b, and where the columns of Z ∈ R^{n×n−rank A} constitute a basis for N(A) so that y = Zξ + y_p ∈ A for all ξ ∈ R^{n−rank A}.

The infimum is found by setting the gradient of the strictly convex norm-square to 0. The minimizing argument is

    ξ* = −(ZᵀZ)⁻¹Zᵀ(y_p − x)                                             (1552)

so

    y* = (I − Z(ZᵀZ)⁻¹Zᵀ)(y_p − x) + x                                   (1553)

and from (1510),

    Px = y* = x − A†(Ax − b) = (I − A†A)x + A†Ay_p                       (1554)

which is a projection of x on N(A) then translated perpendicularly with respect to the nullspace until it meets the affine subset A.

E.5.0.0.7 Example. Projection on affine subset, vertex-description.
Suppose now we instead describe the affine subset A in terms of some given minimal set of generators arranged columnar in X ∈ R^{n×N} (64); id est,

    A ≜ aff X = {Xa | aᵀ1 = 1} ⊆ R^n                                     (1555)

Here minimal set means XV_N = [x₂−x₁  x₃−x₁ ··· x_N−x₁]/√2 (534) is full-rank (§2.4.2.2) where V_N ∈ R^{N×N−1} is the Schoenberg auxiliary matrix (§B.4.2). Then the orthogonal projection Px of any point x ∈ R^n on A is the solution to a minimization problem:

    ‖Px − x‖₂ = inf_{aᵀ1=1} ‖Xa − x‖₂ = inf_{ξ∈R^{N−1}} ‖X(V_N ξ + a_p) − x‖₂    (1556)

where a_p is any solution to aᵀ1 = 1. We find the minimizing argument

    ξ* = −(V_NᵀXᵀXV_N)⁻¹V_NᵀXᵀ(Xa_p − x)                                 (1557)

and so the orthogonal projection is [135, §3]

    Px = Xa* = (I − XV_N(XV_N)†)Xa_p + XV_N(XV_N)†x                      (1558)

a projection of point x on R(XV_N) then translated perpendicularly with respect to that range until it meets the affine subset A.
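A short numerical check of closed forms (1549) and (1554), not from the book (solver-free NumPy, variable names ours):

    import numpy as np

    rng = np.random.default_rng(10)

    # (1549): point of hyperplane ∂H = {y | aᵀy = b} nearest the origin
    a, b = rng.standard_normal(4), 1.7
    P0 = a * b / (a @ a)
    print(np.isclose(a @ P0, b))                 # P0 ∈ ∂H

    # (1554): projection of x on the affine subset A = {y | Ay = b}
    m, n = 2, 5
    A = rng.standard_normal((m, n))
    bb = rng.standard_normal(m)
    x = rng.standard_normal(n)

    Ap = np.linalg.pinv(A)
    Px = x - Ap @ (A @ x - bb)
    print(np.allclose(A @ Px, bb))               # Px ∈ A
    print(np.allclose((np.eye(n) - Ap @ A) @ (x - Px), 0))   # x−Px ⊥ N(A)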


E.6. VECTORIZATION INTERPRETATION 573

E.5.0.0.8 Example. Projecting on hyperplane, halfspace, slab.
Given the hyperplane representation having b ∈ R and nonzero normal a ∈ R^m

    ∂H = {y | aᵀy = b} ⊂ R^m                                             (90)

the orthogonal projection of any point x ∈ R^m on that hyperplane is

    Px = x − a(aᵀa)⁻¹(aᵀx − b)                                           (1559)

Orthogonal projection of x on the halfspace parametrized by b ∈ R and nonzero normal a ∈ R^m

    H₋ = {y | aᵀy ≤ b} ⊂ R^m                                             (82)

is the point

    Px = x − a(aᵀa)⁻¹ max{0, aᵀx − b}                                    (1560)

Orthogonal projection of x on the convex slab (Figure 8), for c < b,

    B ≜ {y | c ≤ aᵀy ≤ b} ⊂ R^m                                          (1561)

is the point [84, §5.1]

    Px = x − a(aᵀa)⁻¹(max{0, aᵀx − b} − max{0, c − aᵀx})                 (1562)

E.6  Vectorization interpretation, projection on a matrix

E.6.1  Nonorthogonal projection on a vector

Nonorthogonal projection of vector x on the range of vector y is accomplished using a normalized dyad P₀ (§B.1); videlicet,

    (⟨z,x⟩/⟨z,y⟩) y = (zᵀx/zᵀy) y = (yzᵀ/zᵀy) x ≜ P₀x                    (1563)

where ⟨z,x⟩/⟨z,y⟩ is the coefficient of projection on y. Because P₀² = P₀ and R(P₀) = R(y), rank-one matrix P₀ is a nonorthogonal projector projecting on R(y). The direction of nonorthogonal projection is orthogonal to z ; id est,

    P₀x − x ⊥ R(P₀ᵀ)                                                     (1564)
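Returning for a moment to Example E.5.0.0.8: the closed forms (1559), (1560), (1562) translate directly into code; a sketch (ours, function names hypothetical):

    import numpy as np

    def proj_hyperplane(x, a, b):           # (1559)
        return x - a * (a @ x - b) / (a @ a)

    def proj_halfspace(x, a, b):            # (1560)
        return x - a * max(0.0, a @ x - b) / (a @ a)

    def proj_slab(x, a, c, b):              # (1562), requires c < b
        t = max(0.0, a @ x - b) - max(0.0, c - a @ x)
        return x - a * t / (a @ a)

    rng = np.random.default_rng(12)
    a, x = rng.standard_normal(3), rng.standard_normal(3)
    b, c = 1.0, -1.0

    print(np.isclose(a @ proj_hyperplane(x, a, b), b))
    print(a @ proj_halfspace(x, a, b) <= b + 1e-12)
    y = proj_slab(x, a, c, b); print(c - 1e-12 <= a @ y <= b + 1e-12)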


574 APPENDIX E. PROJECTION

E.6.2  Nonorthogonal projection on vectorized matrix

Formula (1563) is extensible. Given X,Y,Z ∈ R^{m×n}, we have the one-dimensional nonorthogonal projection of X, in isomorphic R^{mn}, on the range of vectorized Y : (§2.2)

    (⟨Z,X⟩/⟨Z,Y⟩) Y ,   ⟨Z,Y⟩ ≠ 0                                        (1565)

where ⟨Z,X⟩/⟨Z,Y⟩ is the coefficient of projection. The inequality accounts for the fact: projection on R(vec Y) is in a direction orthogonal to vec Z.

E.6.2.1  Nonorthogonal projection on dyad

Now suppose we have nonorthogonal projector dyad

    P₀ = yzᵀ/(zᵀy) ∈ R^{m×m}                                             (1566)

Analogous to (1563), for X ∈ R^{m×m}

    P₀XP₀ = (yzᵀ/zᵀy) X (yzᵀ/zᵀy) = (zᵀXy/(zᵀy)²) yzᵀ = (⟨zyᵀ,X⟩/⟨zyᵀ,yzᵀ⟩) yzᵀ    (1567)

is a nonorthogonal projection of matrix X on the range of vectorized dyad P₀ ; from which it follows:

    P₀XP₀ = (zᵀXy/zᵀy)(yzᵀ/zᵀy) = ⟨(yzᵀ/zᵀy)ᵀ, X⟩ (yzᵀ/zᵀy)
          = ⟨P₀ᵀ, X⟩ P₀ = (⟨P₀ᵀ, X⟩/⟨P₀ᵀ, P₀⟩) P₀                        (1568)

Yet this relationship between matrix product and vector inner-product only holds for a dyad projector. When nonsymmetric projector P₀ is rank-one as in (1566), therefore,

    R(vec P₀XP₀) = R(vec P₀)  in R^{m²}                                  (1569)

and

    P₀XP₀ − X ⊥ P₀ᵀ  in R^{m²}                                           (1570)


E.6. VECTORIZATION INTERPRETATION 575

E.6.2.1.1 Example. λ as coefficients of nonorthogonal projection.
Any diagonalization (§A.5)

    X = SΛS⁻¹ = Σ_{i=1}^m λᵢ sᵢwᵢᵀ ∈ R^{m×m}                             (1181)

may be expressed as a sum of one-dimensional nonorthogonal projections of X, each on the range of a vectorized eigenmatrix Pⱼ ≜ sⱼwⱼᵀ ;

    X = Σ_{i,j=1}^m ⟨(SeᵢeⱼᵀS⁻¹)ᵀ, X⟩ SeᵢeⱼᵀS⁻¹
      = Σ_{j=1}^m ⟨(sⱼwⱼᵀ)ᵀ, X⟩ sⱼwⱼᵀ + Σ_{i,j=1, j≠i}^m ⟨(SeᵢeⱼᵀS⁻¹)ᵀ, SΛS⁻¹⟩ SeᵢeⱼᵀS⁻¹
      = Σ_{j=1}^m ⟨(sⱼwⱼᵀ)ᵀ, X⟩ sⱼwⱼᵀ
      ≜ Σ_{j=1}^m ⟨Pⱼᵀ, X⟩ Pⱼ = Σ_{j=1}^m sⱼwⱼᵀXsⱼwⱼᵀ = Σ_{j=1}^m PⱼXPⱼ
      = Σ_{j=1}^m λⱼ sⱼwⱼᵀ                                               (1571)

This biorthogonal expansion of matrix X is a sum of nonorthogonal projections because the term outside the projection coefficient ⟨ ⟩ is not identical to the inside-term. (§E.6.4) The eigenvalues λⱼ are coefficients of nonorthogonal projection of X, while the remaining M(M−1)/2 coefficients (for i≠j) are zeroed by projection. When Pⱼ is rank-one as in (1571),

    R(vec PⱼXPⱼ) = R(vec sⱼwⱼᵀ) = R(vec Pⱼ)  in R^{m²}                   (1572)

and

    PⱼXPⱼ − X ⊥ Pⱼᵀ  in R^{m²}                                           (1573)

Were matrix X symmetric, then its eigenmatrices would also be. So the one-dimensional projections would become orthogonal. (§E.6.4.1.1)
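A numerical instance of expansion (1571), sketch ours (a generic real matrix may have complex eigenpairs, which NumPy carries transparently; the imaginary parts cancel in the sums):

    import numpy as np

    rng = np.random.default_rng(15)
    m = 4
    X = rng.standard_normal((m, m))        # generic ⇒ diagonalizable

    lam, S = np.linalg.eig(X)
    W = np.linalg.inv(S).T                 # columns wⱼ: left-eigenvectors

    # (1571): X = Σⱼ λⱼ sⱼwⱼᵀ = Σⱼ Pⱼ X Pⱼ with Pⱼ = sⱼwⱼᵀ
    P = [np.outer(S[:, j], W[:, j]) for j in range(m)]
    X1 = sum(lam[j] * P[j] for j in range(m))
    X2 = sum(P[j] @ X @ P[j] for j in range(m))

    print(np.allclose(X1, X), np.allclose(X2, X))   # True True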


576 APPENDIX E. PROJECTION

E.6.3  Orthogonal projection on a vector

The formula for orthogonal projection of vector x on the range of vector y (one-dimensional projection) is basic analytic geometry; [11, §3.3] [220, §3.2] [246, §2.2] [257, §1-8]

    (⟨y,x⟩/⟨y,y⟩) y = (yᵀx/yᵀy) y = (yyᵀ/yᵀy) x ≜ P₁x                    (1574)

where ⟨y,x⟩/⟨y,y⟩ is the coefficient of projection on R(y). An equivalent description is: Vector P₁x is the orthogonal projection of vector x on R(P₁) = R(y). Rank-one matrix P₁ is a projection matrix because P₁² = P₁. The direction of projection is orthogonal,

    P₁x − x ⊥ R(P₁)                                                      (1575)

because P₁ᵀ = P₁.

E.6.4  Orthogonal projection on a vectorized matrix

From (1574), given instead X,Y ∈ R^{m×n}, we have the one-dimensional orthogonal projection of matrix X, in isomorphic R^{mn}, on the range of vectorized Y : (§2.2)

    (⟨Y,X⟩/⟨Y,Y⟩) Y                                                      (1576)

where ⟨Y,X⟩/⟨Y,Y⟩ is the coefficient of projection.

For orthogonal projection, the term outside the vector inner-products ⟨ ⟩ must be identical to the terms inside in three places.

E.6.4.1  Orthogonal projection on dyad

There is opportunity for insight when Y is a dyad yzᵀ (§B.1): Instead given X ∈ R^{m×n}, y ∈ R^m, and z ∈ R^n,

    (⟨yzᵀ,X⟩/⟨yzᵀ,yzᵀ⟩) yzᵀ = (yᵀXz/(yᵀy zᵀz)) yzᵀ                       (1577)

is the one-dimensional orthogonal projection of X, in isomorphic R^{mn}, on the range of vectorized yzᵀ. To reveal the obscured symmetric projection


E.6. VECTORIZATION INTERPRETATION 577

matrices P₁ and P₂ we rewrite (1577):

    (yᵀXz/(yᵀy zᵀz)) yzᵀ = (yyᵀ/yᵀy) X (zzᵀ/zᵀz) ≜ P₁XP₂                 (1578)

So for projector dyads, projection (1578) is the orthogonal projection in R^{mn} if and only if projectors P₁ and P₂ are symmetric; E.9 in other words,

    for orthogonal projection on the range of a vectorized dyad yzᵀ, the term outside the vector inner-products ⟨ ⟩ in (1577) must be identical to the terms inside in three places.

When P₁ and P₂ are rank-one symmetric projectors as in (1578), (29)

    R(vec P₁XP₂) = R(vec yzᵀ)  in R^{mn}                                 (1579)

and

    P₁XP₂ − X ⊥ yzᵀ  in R^{mn}                                           (1580)

When y = z then P₁ = P₂ = P₂ᵀ and

    P₁XP₁ = ⟨P₁, X⟩ P₁ = (⟨P₁, X⟩/⟨P₁, P₁⟩) P₁                           (1581)

meaning, P₁XP₁ is equivalent to orthogonal projection of matrix X on the range of vectorized projector dyad P₁. Yet this relationship between matrix product and vector inner-product does not hold for general symmetric projector matrices.

E.9 For diagonalizable X ∈ R^{m×m} (§A.5), its orthogonal projection in isomorphic R^{m²} on the range of vectorized yzᵀ ∈ R^{m×m} becomes:

    P₁XP₂ = Σ_{i=1}^m λᵢ P₁sᵢwᵢᵀP₂

When R(P₁) = R(wⱼ) and R(P₂) = R(sⱼ), the jth dyad term from the diagonalization is isolated but only, in general, to within a scale factor because neither set of left or right eigenvectors is necessarily orthonormal unless X is normal [269, §3.2]. Yet when R(P₂) = R(s_k), k ≠ j ∈ {1...m}, then P₁XP₂ = 0.


578 APPENDIX E. PROJECTION

E.6.4.1.1 Example. Eigenvalues λ as coefficients of orthogonal projection.
Let C represent any convex subset of subspace S^M, and let C₁ be any element of C. Then C₁ can be expressed as the orthogonal expansion

    C₁ = Σ_{i=1}^M Σ_{j=1, j≥i}^M ⟨E_ij, C₁⟩ E_ij  ∈ C ⊂ S^M             (1582)

where E_ij ∈ S^M is a member of the standard orthonormal basis for S^M (49). This expansion is a sum of one-dimensional orthogonal projections of C₁ ; each projection on the range of a vectorized standard basis matrix. Vector inner-product ⟨E_ij, C₁⟩ is the coefficient of projection of svec C₁ on R(svec E_ij).

When C₁ is any member of a convex set C whose dimension is L, Carathéodory's theorem [66] [202] [129] [28] [29] guarantees that no more than L+1 affinely independent members from C are required to faithfully represent C₁ by their linear combination. E.10

Dimension of S^M is L = M(M+1)/2 in isometrically isomorphic R^{M(M+1)/2}. Yet because any symmetric matrix can be diagonalized, (§A.5.2) C₁ ∈ S^M is a linear combination of its M eigenmatrices qᵢqᵢᵀ (§A.5.1) weighted by its eigenvalues λᵢ ;

    C₁ = QΛQᵀ = Σ_{i=1}^M λᵢ qᵢqᵢᵀ                                       (1583)

where Λ ∈ S^M is a diagonal matrix having δ(Λ)ᵢ = λᵢ, and Q = [q₁ ··· q_M] is an orthogonal matrix in R^{M×M} containing corresponding eigenvectors.

To derive eigen decomposition (1583) from expansion (1582), M standard basis matrices E_ij are rotated (§B.5) into alignment with the M eigenmatrices qᵢqᵢᵀ of C₁ by applying a similarity transformation; [220, §5.6]

    {QE_ijQᵀ} = { qᵢqᵢᵀ ,                      i = j = 1...M
                { (1/√2)(qᵢqⱼᵀ + qⱼqᵢᵀ) ,      1 ≤ i < j ≤ M              (1584)

E.10 Carathéodory's theorem guarantees existence of a biorthogonal expansion for any element in aff C when C is any pointed closed convex cone.


E.6. VECTORIZATION INTERPRETATION 579

which remains an orthonormal basis for S^M. Then remarkably

    C₁ = Σ_{i,j=1, j≥i}^M ⟨QE_ijQᵀ, C₁⟩ QE_ijQᵀ
       = Σ_{i=1}^M ⟨qᵢqᵢᵀ, C₁⟩ qᵢqᵢᵀ + Σ_{i,j=1, j>i}^M ⟨QE_ijQᵀ, QΛQᵀ⟩ QE_ijQᵀ
       = Σ_{i=1}^M ⟨qᵢqᵢᵀ, C₁⟩ qᵢqᵢᵀ
       ≜ Σ_{i=1}^M ⟨Pᵢ, C₁⟩ Pᵢ = Σ_{i=1}^M qᵢqᵢᵀC₁qᵢqᵢᵀ = Σ_{i=1}^M PᵢC₁Pᵢ
       = Σ_{i=1}^M λᵢ qᵢqᵢᵀ                                              (1585)

this orthogonal expansion becomes the diagonalization; still a sum of one-dimensional orthogonal projections. The eigenvalues

    λᵢ = ⟨qᵢqᵢᵀ, C₁⟩                                                     (1586)

are clearly coefficients of projection of C₁ on the range of each vectorized eigenmatrix. (confer §E.6.2.1.1) The remaining M(M−1)/2 coefficients (i≠j) are zeroed by projection. When Pᵢ is rank-one symmetric as in (1585),

    R(svec PᵢC₁Pᵢ) = R(svec qᵢqᵢᵀ) = R(svec Pᵢ)  in R^{M(M+1)/2}         (1587)

and

    PᵢC₁Pᵢ − C₁ ⊥ Pᵢ  in R^{M(M+1)/2}                                    (1588)

E.6.4.2  Positive semidefiniteness test as orthogonal projection

For any given X ∈ R^{m×m} the familiar quadratic construct yᵀXy ≥ 0, over broad domain, is a fundamental test for positive semidefiniteness. (§A.2) It is a fact that yᵀXy is always proportional to a coefficient of orthogonal projection; letting z in formula (1577) become y ∈ R^m, then P₂ = P₁ = yyᵀ/yᵀy = yyᵀ/‖yyᵀ‖₂ (confer (1236)) and formula (1578) becomes

    (⟨yyᵀ,X⟩/⟨yyᵀ,yyᵀ⟩) yyᵀ = (yᵀXy/yᵀy)(yyᵀ/yᵀy) = (yyᵀ/yᵀy) X (yyᵀ/yᵀy) ≜ P₁XP₁    (1589)


580 APPENDIX E. PROJECTION

By (1576), product P₁XP₁ is the one-dimensional orthogonal projection of X, in isomorphic R^{m²}, on the range of vectorized P₁ because, for rank P₁ = 1 and P₁² = P₁ ∈ S^m (confer (1568))

    P₁XP₁ = (yᵀXy/yᵀy)(yyᵀ/yᵀy) = ⟨yyᵀ/yᵀy, X⟩ (yyᵀ/yᵀy)
          = ⟨P₁, X⟩ P₁ = (⟨P₁, X⟩/⟨P₁, P₁⟩) P₁                           (1590)

The coefficient of orthogonal projection ⟨P₁, X⟩ = yᵀXy/(yᵀy) is also known as Rayleigh's quotient. E.11 When P₁ is rank-one symmetric as in (1589),

    R(vec P₁XP₁) = R(vec P₁)  in R^{m²}                                  (1591)

and

    P₁XP₁ − X ⊥ P₁  in R^{m²}                                            (1592)

The test for positive semidefiniteness, then, is a test for nonnegativity of the coefficient of orthogonal projection of X on the range of each and every vectorized extreme direction yyᵀ (§2.8.1) from the positive semidefinite cone in the ambient space of symmetric matrices.

E.11 When y becomes the jth eigenvector sⱼ of diagonalizable X, for example, ⟨P₁, X⟩ becomes the jth eigenvalue: [126, §III]

    ⟨P₁, X⟩|_{y=sⱼ} = (sⱼᵀ(Σ_{i=1}^m λᵢsᵢwᵢᵀ)sⱼ) / (sⱼᵀsⱼ) = λⱼ

Similarly for y = wⱼ, the jth left-eigenvector,

    ⟨P₁, X⟩|_{y=wⱼ} = (wⱼᵀ(Σ_{i=1}^m λᵢsᵢwᵢᵀ)wⱼ) / (wⱼᵀwⱼ) = λⱼ

A quandary may arise regarding the potential annihilation of the antisymmetric part of X when sⱼᵀXsⱼ is formed. Were annihilation to occur, it would imply the eigenvalue thus found came instead from the symmetric part of X. The quandary is resolved recognizing that diagonalization of real X admits complex eigenvectors; hence, annihilation could only come about by forming Re(sⱼᴴXsⱼ) = sⱼᴴ(X+Xᵀ)sⱼ/2 [131, §7.1] where (X+Xᵀ)/2 is the symmetric part of X, and sⱼᴴ denotes the conjugate transpose.


E.7. ON VECTORIZED MATRICES OF HIGHER RANK 581

E.6.4.3  PXP ≽ 0

In some circumstances, it may be desirable to limit the domain of test yᵀXy ≥ 0 for positive semidefiniteness; e.g., ‖y‖ = 1. Another example of limiting domain-of-test is central to Euclidean distance geometry: For R(V) = N(1ᵀ), the test −VDV ≽ 0 determines whether D ∈ S^N_h is a Euclidean distance matrix. The same test may be stated: For D ∈ S^N_h (and optionally ‖y‖ = 1)

    D ∈ EDM^N  ⇔  −yᵀDy = ⟨yyᵀ, −D⟩ ≥ 0  ∀y ∈ R(V)                       (1593)

The test −VDV ≽ 0 is therefore equivalent to a test for nonnegativity of the coefficient of orthogonal projection of −D on the range of each and every vectorized extreme direction yyᵀ from the positive semidefinite cone S^N_+ such that R(yyᵀ) = R(y) ⊆ R(V). (The validity of this result is independent of whether V is itself a projection matrix.)

E.7  on vectorized matrices of higher rank

E.7.1  PXP misinterpretation for higher-rank P

For a projection matrix P of rank greater than 1, PXP is generally not commensurate with (⟨P,X⟩/⟨P,P⟩) P as is the case for projector dyads (1590). Yet for a symmetric idempotent matrix P of any rank we are tempted to say " PXP is the orthogonal projection of X ∈ S^m on R(vec P) ". The fallacy is: vec PXP does not necessarily belong to the range of vectorized P ; the most basic requirement for projection on R(vec P).

E.7.2  Orthogonal projection on matrix subspaces

With A₁,B₁,Z₁,A₂,B₂,Z₂ as defined for nonorthogonal projector (1486), and defining

    P₁ ≜ A₁A₁† ∈ S^m ,   P₂ ≜ A₂A₂† ∈ S^p                                (1594)

where A₁ ∈ R^{m×n}, Z₁ ∈ R^{m×k}, A₂ ∈ R^{p×n}, Z₂ ∈ R^{p×k}, then given any X

    ‖X − P₁XP₂‖_F = inf_{B₁,B₂∈R^{n×k}} ‖X − A₁(A₁† + B₁Z₁ᵀ)X(A₂†ᵀ + Z₂B₂ᵀ)A₂ᵀ‖_F    (1595)


582 APPENDIX E. PROJECTION

As for all subspace projectors, the range of the projector is the subspace on which projection is made: {P₁YP₂ | Y ∈ R^{m×p}}. Altogether, for projectors P₁ and P₂ of any rank this means the projection P₁XP₂ is unique, orthogonal

    P₁XP₂ − X ⊥ {P₁YP₂ | Y ∈ R^{m×p}}  in R^{mp}                         (1596)

and projectors P₁ and P₂ must each be symmetric (confer (1578)) to attain the infimum.

E.7.2.0.1 Proof. Minimum Frobenius norm (1595).
Defining P ≜ A₁(A₁† + B₁Z₁ᵀ),

    inf_{B₁,B₂} ‖X − A₁(A₁† + B₁Z₁ᵀ)X(A₂†ᵀ + Z₂B₂ᵀ)A₂ᵀ‖²_F
      = inf_{B₁,B₂} ‖X − PX(A₂†ᵀ + Z₂B₂ᵀ)A₂ᵀ‖²_F
      = inf_{B₁,B₂} tr((Xᵀ − A₂(A₂† + B₂Z₂ᵀ)XᵀPᵀ)(X − PX(A₂†ᵀ + Z₂B₂ᵀ)A₂ᵀ))
      = inf_{B₁,B₂} tr(XᵀX − XᵀPX(A₂†ᵀ+Z₂B₂ᵀ)A₂ᵀ − A₂(A₂†+B₂Z₂ᵀ)XᵀPᵀX
                       + A₂(A₂†+B₂Z₂ᵀ)XᵀPᵀPX(A₂†ᵀ+Z₂B₂ᵀ)A₂ᵀ)             (1597)

The Frobenius norm is a convex function. [41, §8.1] Necessary and sufficient conditions for the global minimum are ∇_{B₁} = 0 and ∇_{B₂} = 0. (§D.1.3.1) Terms not containing B₂ in (1597) will vanish from gradient ∇_{B₂} ; (§D.2.3)

    ∇_{B₂} tr(−XᵀPXZ₂B₂ᵀA₂ᵀ − A₂B₂Z₂ᵀXᵀPᵀX + A₂A₂†XᵀPᵀPXZ₂B₂ᵀA₂ᵀ
              + A₂B₂Z₂ᵀXᵀPᵀPXA₂†ᵀA₂ᵀ + A₂B₂Z₂ᵀXᵀPᵀPXZ₂B₂ᵀA₂ᵀ)
      = −2A₂ᵀXᵀPXZ₂ + 2A₂ᵀA₂A₂†XᵀPᵀPXZ₂ + 2A₂ᵀA₂B₂Z₂ᵀXᵀPᵀPXZ₂
      = 2A₂ᵀ(−Xᵀ + A₂A₂†XᵀPᵀ + A₂B₂Z₂ᵀXᵀPᵀ)PXZ₂
      = 0
      ⇔
    R(B₁) ⊆ N(A₁)  and  R(B₂) ⊆ N(A₂)                                    (1598)

The same conclusion is obtained were instead Pᵀ ≜ (A₂†ᵀ + Z₂B₂ᵀ)A₂ᵀ and the gradient with respect to B₁ observed. The projection P₁XP₂ (1594) is therefore unique.


E.7. ON VECTORIZED MATRICES OF HIGHER RANK 583

E.7.2.0.2 Example. PXP redux & N(V).
Suppose we define a subspace of m×n matrices, each elemental matrix having columns constituting a list whose geometric center (§4.5.1.0.1) is the origin in R^m :

    R^{m×n}_c ≜ {Y ∈ R^{m×n} | Y1 = 0}
             = {Y ∈ R^{m×n} | N(Y) ⊇ 1} = {Y ∈ R^{m×n} | R(Yᵀ) ⊆ N(1ᵀ)}
             = {XV | X ∈ R^{m×n}} ⊂ R^{m×n}                              (1599)

the nonsymmetric geometric center subspace. Further suppose V ∈ S^n is a projection matrix having N(V) = R(1) and R(V) = N(1ᵀ). Then linear mapping T(X) = XV is the orthogonal projection of any X ∈ R^{m×n} on R^{m×n}_c in the Euclidean (vectorization) sense because V is symmetric, N(XV) ⊇ 1, and R(VXᵀ) ⊆ N(1ᵀ).

Now suppose we define a subspace of symmetric n×n matrices each of whose columns constitute a list having the origin in R^n as geometric center,

    S^n_c ≜ {Y ∈ S^n | Y1 = 0}
         = {Y ∈ S^n | N(Y) ⊇ 1} = {Y ∈ S^n | R(Y) ⊆ N(1ᵀ)}               (1600)

the geometric center subspace. Further suppose V ∈ S^n is a projection matrix, the same as before. Then VXV is the orthogonal projection of any X ∈ S^n on S^n_c in the Euclidean sense (1596) because V is symmetric, VXV1 = 0, and R(VXV) ⊆ N(1ᵀ). Two-sided projection is necessary only to remain in the ambient symmetric matrix subspace. Then

    S^n_c = {VXV | X ∈ S^n} ⊂ S^n                                        (1601)

has dim S^n_c = n(n−1)/2 in isomorphic R^{n(n+1)/2}. We find its orthogonal complement as the aggregate of all negative directions of orthogonal projection on S^n_c : the translation-invariant subspace (§4.5.1.1)

    S^{n⊥}_c ≜ {X − VXV | X ∈ S^n} ⊂ S^n
            = {u1ᵀ + 1uᵀ | u ∈ R^n}                                      (1602)


584 APPENDIX E. PROJECTION

characterized by the doublet u1ᵀ + 1uᵀ (§B.2). E.12 Defining the geometric center mapping V(X) = −½ VXV consistently with (562), then N(V) = R(I − V) on domain S^n, analogously to vector projectors (§E.2); id est,

    N(V) = S^{n⊥}_c                                                      (1603)

a subspace of S^n whose dimension is dim S^{n⊥}_c = n in isomorphic R^{n(n+1)/2}.

Intuitively, operator V is an orthogonal projector; any argument duplicitously in its range is a fixed point. So, this symmetric operator's nullspace must be orthogonal to its range.

Now compare the subspace of symmetric matrices having all zeros in the first row and column

    S^n_1 ≜ {Y ∈ S^n | Ye₁ = 0}
         = { [0 0ᵀ; 0 I] X [0 0ᵀ; 0 I] | X ∈ S^n }
         = { [0 √2V_N]ᵀ Z [0 √2V_N] | Z ∈ S^N }                          (1604)

where P = [0 0ᵀ; 0 I] is an orthogonal projector. Then, similarly, PXP is the orthogonal projection of any X ∈ S^n on S^n_1 in the Euclidean sense (1596), and

    S^{n⊥}_1 ≜ { X − [0 0ᵀ; 0 I] X [0 0ᵀ; 0 I] | X ∈ S^n } ⊂ S^n
            = { ue₁ᵀ + e₁uᵀ | u ∈ R^n }                                  (1605)

and obviously S^n_1 ⊕ S^{n⊥}_1 = S^n.

E.12 Proof.

    {X − VXV | X ∈ S^n} = {X − (I − (1/n)11ᵀ)X(I − 11ᵀ(1/n)) | X ∈ S^n}
                        = {(1/n)11ᵀX + X11ᵀ(1/n) − (1/n)11ᵀX11ᵀ(1/n) | X ∈ S^n}

Because {X1 | X ∈ S^n} = R^n,

    {X − VXV | X ∈ S^n} = {1ζᵀ + ζ1ᵀ − 11ᵀ(1ᵀζ(1/n)) | ζ ∈ R^n}
                        = {1ζᵀ(I − 11ᵀ/(2n)) + (I − 11ᵀ/(2n))ζ1ᵀ | ζ ∈ R^n}

where I − 11ᵀ/(2n) is invertible.


E.8. RANGE/ROWSPACE INTERPRETATION 585

E.8  Range/Rowspace interpretation

For idempotent matrices P₁ and P₂ of any rank, P₁XP₂ᵀ is a projection of R(X) on R(P₁) and a projection of R(Xᵀ) on R(P₂) : For any given X = UΣQᵀ ∈ R^{m×p}, as in compact singular value decomposition (1192),

    P₁XP₂ᵀ = Σ_{i=1}^η σᵢ P₁uᵢqᵢᵀP₂ᵀ = Σ_{i=1}^η σᵢ P₁uᵢ(P₂qᵢ)ᵀ           (1606)

where η ≜ min{m, p}. Recall uᵢ ∈ R(X) and qᵢ ∈ R(Xᵀ) when the corresponding singular value σᵢ is nonzero. (§A.6.1) So P₁ projects uᵢ on R(P₁) while P₂ projects qᵢ on R(P₂) ; id est, the range and rowspace of any X are respectively projected on the ranges of P₁ and P₂. E.13

E.9  Projection on convex set

Thus far we have discussed only projection on subspaces. Now we generalize, considering projection on arbitrary convex sets in Euclidean space; convex because projection is, then, unique minimum-distance and a convex optimization problem:

For projection P_C x of point x on any closed set C ⊆ R^n it is obvious:

    C = {P_C x | x ∈ R^n}                                                (1607)

If C ⊆ R^n is a closed convex set, then for each and every x ∈ R^n there exists a unique point Px belonging to C that is closest to x in the Euclidean sense. Like (1508), unique projection Px (or P_C x) of a point x on convex set C is that point in C closest to x ; [162, §3.12]

    ‖x − Px‖₂ = inf_{y∈C} ‖x − y‖₂                                       (1608)

There exists a converse:

E.13 When P₁ and P₂ are symmetric and R(P₁) = R(uⱼ) and R(P₂) = R(qⱼ), then the jth dyad term from the singular value decomposition of X is isolated by the projection. Yet if R(P₂) = R(q_l), l ≠ j ∈ {1...η}, then P₁XP₂ = 0.


586 APPENDIX E. PROJECTION

E.9.0.0.1 Theorem. (Bunt-Motzkin) Convex set if projections unique.
[250, §7.5] [127] If C ⊆ R^n is a nonempty closed set and if for each and every x in R^n there is a unique Euclidean projection Px of x on C belonging to C, then C is convex.                                                      ⋄

Borwein & Lewis propose, for closed convex set C [38, §3.3, exer.12]

    ∇‖x − Px‖²₂ = 2(x − Px)                                              (1609)

whereas, for x ∉ C

    ∇‖x − Px‖₂ = (x − Px) ‖x − Px‖₂⁻¹                                    (1610)

E.9.0.0.2 Exercise.
Prove (1609) and (1610). (Not proved in [38].)

A well-known equivalent characterization of projection on a convex set is a generalization of the perpendicularity condition (1507) for projection on a subspace:

E.9.0.0.3 Definition. Normal vector. [202, p.15]
Vector z is normal to convex set C at point Px ∈ C if

    ⟨z , y − Px⟩ ≤ 0   ∀y ∈ C                                            (1611)
                                                                           △

A convex set has a nonzero normal at each of its boundary points. [202, p.100] Hence, the normal or dual interpretation of projection:

E.9.1  Dual interpretation of projection on convex set

E.9.1.0.1 Theorem. Unique minimum-distance projection. [129, §A.3.1] [162, §3.12] [64, §4.1] [49] (Figure 115(b), p.603) Given a closed convex set C ⊆ R^n, point Px is the unique projection of a given point x ∈ R^n on C (Px is that point in C nearest x) if and only if

    Px ∈ C ,   ⟨x − Px , y − Px⟩ ≤ 0  ∀y ∈ C                             (1612)
                                                                           ⋄


E.9. PROJECTION ON CONVEX SET 587

As for subspace projection, operator P is idempotent in the sense: for each and every x ∈ R^n, P(Px) = Px. Yet operator P is not linear; projector P is a linear operator if and only if convex set C (on which projection is made) is a subspace. (§E.4)

E.9.1.0.2 Theorem. Unique projection via normal cone. E.14 [64, §4.3]
Given closed convex set C ⊆ R^n, point Px is the unique projection of a given point x ∈ R^n on C if and only if

    Px ∈ C ,   Px − x ∈ (C − Px)*                                        (1613)

In other words, Px is that point in C nearest x if and only if Px − x belongs to that cone dual to translate C − Px.                               ⋄

E.9.1.1  Dual interpretation as optimization

Luenberger [162, p.134] carries forward the dual interpretation of projection as solution to a maximization problem: Minimum distance from a point x ∈ R^n to a convex set C ⊂ R^n can be found by maximizing distance from x to hyperplane ∂H over the set of all hyperplanes separating x from C. Existence of a separating hyperplane (§2.4.2.7) presumes point x lies on the boundary or exterior to set C.

The optimal separating hyperplane is characterized by the fact it also supports C. Any hyperplane supporting C (Figure 19(a)) has form

    ∂H₋ = {y ∈ R^n | aᵀy = σ_C(a)}                                       (104)

where the support function is convex, defined

    σ_C(a) ≜ sup_{z∈C} aᵀz                                               (412)

When point x is finite and set C contains finite points, under this projection interpretation, if the supporting hyperplane is a separating hyperplane then the support function is finite. From Example E.5.0.0.8, projection P_{∂H₋}x of x on any given supporting hyperplane ∂H₋ is

    P_{∂H₋}x = x − a(aᵀa)⁻¹(aᵀx − σ_C(a))                                (1614)

E.14 −(C − Px)* is the normal cone to set C at point Px. (§E.10.3.2)


588 APPENDIX E. PROJECTION

[Figure 111: Dual interpretation of projection of point x on convex set C in R². (a) κ = (aᵀa)⁻¹(aᵀx − σ_C(a)). (b) Minimum distance from x to C is found by maximizing distance to all hyperplanes supporting C and separating it from x. A convex problem for any convex set, distance of maximization is unique.]


E.9. PROJECTION ON CONVEX SET 589

With reference to Figure 111, identifying

    H₊ = {y ∈ R^n | aᵀy ≥ σ_C(a)}                                        (83)

then

    ‖x − P_C x‖ = sup_{∂H₋ | x∈H₊} ‖x − P_{∂H₋}x‖ = sup_{a | x∈H₊} ‖a(aᵀa)⁻¹(aᵀx − σ_C(a))‖
                = sup_{a | x∈H₊} (1/‖a‖) |aᵀx − σ_C(a)|                  (1615)

which can be expressed as a convex optimization, for arbitrary positive constant τ

    ‖x − P_C x‖ = maximize_a   aᵀx − σ_C(a)
                  subject to   ‖a‖ ≤ τ                                   (1616)

The unique minimum-distance projection on convex set C is therefore

    P_C x = x − a*(a*ᵀx − σ_C(a*)) (1/τ²)                                (1617)

where optimally ‖a*‖ = τ.

E.9.1.1.1 Exercise.
Test that projection paradigm from Figure 111 on any convex polyhedral set.

E.9.1.2  Dual interpretation of projection on cone

In the circumstance set C is a closed convex cone K and there exists a hyperplane separating given point x from K, then optimal σ_K(a*) takes value 0 [129, §C.2.3.1]. So problem (1616) for projection of x on K becomes

    ‖x − P_K x‖ = maximize_a   aᵀx
                  subject to   ‖a‖ ≤ τ
                               a ∈ K°                                    (1618)

The norm inequality in (1618) can be handled by Schur complement (§A.4). Normals a to all hyperplanes supporting K belong to the polar cone K° = −K* by definition: (269)

    a ∈ K°  ⇔  ⟨a, x⟩ ≤ 0  for all x ∈ K                                 (1619)


590 APPENDIX E. PROJECTION

Projection on cone K is

    P_K x = (I − (1/τ²) a*a*ᵀ) x                                         (1620)

whereas projection on the polar cone −K* is (§E.9.2.2.1)

    P_{K°} x = x − P_K x = (1/τ²) a*a*ᵀ x                                (1621)

Negating vector a, this maximization problem (1618) becomes a minimization (the same problem) and the polar cone becomes the dual cone:

    ‖x − P_K x‖ = −minimize_a   aᵀx
                   subject to   ‖a‖ ≤ τ
                                a ∈ K*                                   (1622)

E.9.2  Projection on cone

When convex set C is a cone, there is a finer statement of optimality conditions:

E.9.2.0.1 Theorem. Unique projection on cone. [129, §A.3.2]
Let K ⊆ R^n be a closed convex cone, and K* its dual (§2.13.1). Then Px is the unique minimum-distance projection of x ∈ R^n on K if and only if

    Px ∈ K ,   ⟨Px − x , Px⟩ = 0 ,   Px − x ∈ K*                         (1623)

In words, Px is the unique minimum-distance projection of x on K if and only if

1) projection Px lies in K
2) direction Px − x is orthogonal to the projection Px
3) direction Px − x lies in the dual cone K*.

As the theorem is stated, it admits projection on K having empty interior; id est, on convex cones in a proper subspace of R^n. Projection on K of any point x ∈ −K*, belonging to the negative dual cone, is on the origin.    ⋄


E.9.2.1 Relation to subspace projection
Conditions 1 and 2 of the theorem are common with orthogonal projection
on a subspace R(P ) : Condition 1 is the most basic requirement;
namely, Px∈ R(P ) , the projection belongs to the subspace. Invoking
perpendicularity condition (1507), we recall the second requirement for
projection on a subspace:

    Px − x ⊥ R(P )   or   Px − x ∈ R(P ) ⊥                         (1624)

which corresponds to condition 2. Yet condition 3 is a generalization
of subspace projection; id est, for unique minimum-distance projection on
a closed convex cone K , polar cone −K ∗ plays the role R(P ) ⊥ plays
for subspace projection (P R x = x − P R⊥ x). Indeed, −K ∗ is the algebraic
complement in the orthogonal vector sum (p.644) [176] [129,A.3.2.5]

    K ⊞ −K ∗ = R n  ⇔  cone K is closed and convex                 (1625)

Also, given unique minimum-distance projection Px on K satisfying
Theorem E.9.2.0.1, then by projection on the algebraic complement via I −P
in E.2 we have

    −K ∗ = {x − Px | x∈ R n } = {x∈ R n | Px = 0}                  (1626)

consequent to Moreau (1628). Recalling any subspace is a closed convex
cone, E.15

    K = R(P ) ⇔ −K ∗ = R(P ) ⊥                                     (1627)

meaning, when a cone is a subspace R(P ) then the dual cone becomes its
orthogonal complement R(P ) ⊥ . [41,2.6.1] In this circumstance, condition 3
becomes coincident with condition 2.

E.15 but a proper subspace is not a proper cone (2.7.2.2.1).


E.9.2.2 Salient properties: Projection Px on closed convex cone K
[129,A.3.2] [64,5.6] For x, x 1 , x 2 ∈ R n

1. P K (αx) = α P K x  ∀α≥0   (nonnegative homogeneity)
2. ‖P K x‖ ≤ ‖x‖
3. P K x = 0 ⇔ x ∈ −K ∗
4. P K (−x) = −P −K x
5. (Jean-Jacques Moreau (1962)) [176]
      x = x 1 + x 2 ,  x 1 ∈ K ,  x 2 ∈−K ∗ ,  x 1 ⊥ x 2
      ⇔
      x 1 = P K x ,  x 2 = P −K∗ x                                 (1628)
6. K = {x − P −K∗ x | x∈ R n } = {x∈ R n | P −K∗ x = 0}
7. −K ∗ = {x − P K x | x∈ R n } = {x∈ R n | P K x = 0}             (1626)

E.9.2.2.1 Corollary. I −P for cones. (confer E.2)
Denote by K ⊆ R n a closed convex cone, and call K ∗ its dual. Then
x −P −K∗ x is the unique minimum-distance projection of x∈ R n on K if and
only if P −K∗ x is the unique minimum-distance projection of x on −K ∗ the
polar cone.   ⋄

Proof. Assume x 1 = P K x . Then by Theorem E.9.2.0.1 we have

    x 1 ∈ K ,  x 1 − x ⊥ x 1 ,  x 1 − x ∈ K ∗                      (1629)

Now assume x − x 1 = P −K∗ x . Then we have

    x − x 1 ∈ −K ∗ ,  −x 1 ⊥ x − x 1 ,  −x 1 ∈ −K                  (1630)

But these two assumptions are apparently identical. We must therefore have

    x −P −K∗ x = x 1 = P K x                                       (1631)
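Property 5, the Moreau decomposition, is easily demonstrated numerically.
The following minimal Matlab sketch is ours, not from the original text; it
assumes K the self-dual nonnegative orthant, so −K ∗ is the nonpositive
orthant and both projectors are entrywise clipping.

%Moreau decomposition (1628) for K = R^n_+ (self dual). -demo
x = randn(6,1);
x1 = max(x,0);              %P_K x , projection on nonnegative orthant
x2 = min(x,0);              %P_{-K*} x , projection on nonpositive orthant
disp(norm(x - (x1 + x2)))   %x = x1 + x2 , should be 0
disp(x1'*x2)                %x1 perpendicular to x2 , should be 0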


E.9.2.2.2 Corollary. Unique projection via dual or normal cone.
[64,4.7] (E.10.3.2, confer Theorem E.9.1.0.2) Given point x∈ R n and
closed convex cone K ⊆ R n , the following are equivalent statements:
1. point Px is the unique minimum-distance projection of x on K
2. Px ∈ K ,  x − Px ∈ −(K − Px) ∗ = −K ∗ ∩ (Px) ⊥
3. Px ∈ K ,  〈x − Px, Px〉 = 0 ,  〈x − Px, y〉 ≤ 0 ∀y ∈ K   ⋄

E.9.2.2.3 Example. Unique projection on nonnegative orthant.
(confer (975)) From Theorem E.9.2.0.1, to project matrix H ∈ R m×n on the
self-dual orthant (2.13.5.1) of nonnegative matrices R m×n + in isomorphic
R mn , the necessary and sufficient conditions are:

    H ⋆ ≥ 0
    tr ( (H ⋆ − H) T H ⋆ ) = 0                                     (1632)
    H ⋆ − H ≥ 0

where the inequalities denote entrywise comparison. The optimal solution
H ⋆ is simply H having all its negative entries zeroed;

    H ⋆ ij = max{H ij , 0} ,  i,j∈{1... m} × {1... n}              (1633)

Now suppose the nonnegative orthant is translated by T ∈ R m×n ; id est,
consider R m×n + + T . Then projection on the translated orthant is [64,4.8]

    H ⋆ ij = max{H ij , T ij }                                     (1634)

E.9.2.2.4 Example. Unique projection on truncated convex cone.
Consider the problem of projecting a point x on a closed convex cone that
is artificially bounded; really, a bounded convex polyhedron having a vertex
at the origin:

    minimize y∈R N   ‖x − Ay‖ 2
    subject to       y ≽ 0                                         (1635)
                     ‖y‖ ∞ ≤ 1


where the convex cone has vertex-description (2.12.2.0.1), for A∈ R n×N

    K = {Ay | y ≽ 0}                                               (1636)

and where ‖y‖ ∞ ≤ 1 is the artificial bound. This is a convex optimization
problem having no known closed-form solution, in general. It arises, for
example, in the fitting of hearing aids designed around a programmable
graphic equalizer (a filter bank whose only adjustable parameters are gain
per band, each bounded above by unity). [58] The problem is equivalent to a
Schur-form semidefinite program (A.4.1)

    minimize y∈R N , t∈R   t
    subject to   [ tI          x − Ay ]
                 [ (x − Ay) T  t      ]  ≽ 0                       (1637)
                 0 ≼ y ≼ 1

E.9.3 nonexpansivity

E.9.3.0.1 Theorem. Nonexpansivity. [107,2] [64,5.3]
When C ⊂ R n is an arbitrary closed convex set, projector P projecting on C
is nonexpansive in the sense: for any vectors x,y ∈ R n

    ‖Px − Py‖ ≤ ‖x − y‖                                            (1638)

with equality when x −Px = y −Py . E.16   ⋄

Proof. [37]

    ‖x − y‖ 2 = ‖Px − Py‖ 2 + ‖(I − P )x − (I − P )y‖ 2
                + 2〈x − Px, Px − Py〉 + 2〈y − Py , Py − Px〉        (1639)

Nonnegativity of the last two terms follows directly from the unique
minimum-distance projection theorem (E.9.1.0.1).

E.16 This condition for equality corrects an error in [49] (where the norm is applied to each
side of the condition given here) easily revealed by counter-example.


The foregoing proof reveals another flavor of nonexpansivity; for each and
every x,y ∈ R n

    ‖Px − Py‖ 2 + ‖(I − P )x − (I − P )y‖ 2 ≤ ‖x − y‖ 2            (1640)

Deutsch shows yet another: [64,5.5]

    ‖Px − Py‖ 2 ≤ 〈x − y , Px − Py〉                                (1641)

E.9.4 Easy projections
Projecting any matrix H ∈ R n×n in the Euclidean/Frobenius sense
orthogonally on the subspace of symmetric matrices S n in isomorphic
R n2 amounts to taking the symmetric part of H ; (2.2.2.0.1) id est,
(H+H T )/2 is the projection.
To project any H ∈ R n×n orthogonally on the symmetric hollow
subspace S n h in isomorphic R n2 (2.2.3.0.1), we may take the symmetric
part and then zero all entries along the main diagonal, or vice versa
(because this is projection on the intersection of two subspaces); id est,
(H + H T )/2 − δ 2 (H) .
To project on the nonnegative orthant R m×n + , simply clip all negative
entries to 0. Likewise, projection on the nonpositive orthant R m×n − sees
all positive entries clipped to 0. Projection on other orthants is equally
simple with appropriate clipping.
Clipping in excess of |1| each entry of a point x ∈ R n is equivalent
to unique minimum-distance projection of x on the unit hypercube
centered at the origin. (confer E.10.3.2)
Projecting on hyperplane, halfspace, slab: E.5.0.0.8.
Projection of x∈ R n on a hyper-rectangle: [41,8.1.1]

    C = {y ∈ R n | l ≼ y ≼ u , l ≺ u}                              (1642)

    P(x) k = { l k ,  x k ≤ l k
             { x k ,  l k ≤ x k ≤ u k                              (1643)
             { u k ,  x k ≥ u k


Unique minimum-distance projection of H ∈ S n on the positive
semidefinite cone S n + in the Euclidean/Frobenius sense is accomplished
by eigen decomposition (diagonalization) followed by clipping all
negative eigenvalues to 0.
Unique minimum-distance projection on the generally nonconvex
subset of all matrices belonging to S n + having rank not exceeding ρ
(2.9.2.1) is accomplished by clipping all negative eigenvalues to 0 and
zeroing the smallest nonnegative eigenvalues, keeping only the ρ largest.
(7.1.2)
Unique minimum-distance projection of H ∈ R m×n on the set of all
m ×n matrices of rank no greater than k in the Euclidean/Frobenius
sense is the singular value decomposition (A.6) of H having all
singular values beyond the k th zeroed. [217, p.208] This is also a solution
to the projection in the sense of spectral norm. [41,8.1]
Projection on Lorentz cone: [41, exer.8.3(c)]
Projector products equal to projection on an intersection:

    P S N+ ∩ S N c  =  P S N+ P S N c                              (823)
    P R N×N+ ∩ S N h  =  P R N×N+ P S N h                          (7.0.1.1)
    P R N×N+ ∩ S N  =  P R N×N+ P S N                              (E.9.5)

E.9.4.0.1 Exercise. Largest singular value.
Find the unique minimum-distance projection on the set of all m ×n
matrices whose largest singular value does not exceed 1.

Unique minimum-distance projection on an ellipsoid (Figure 9) is not
easy but there are several known techniques. [35] [84,5.1]
Youla [268,2.5] lists eleven “useful projections” of square-integrable
uni- and bivariate real functions on various convex sets, expressible in closed
form.
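Each of the easy projections above is a line or two of Matlab. The following
sketch is ours, not from the original appendix, with arbitrary made-up data;
it exercises the symmetric part, the hollow subspace, the nonnegative
orthant (1633) with a spot-check of conditions (1632), the hyper-rectangle
(1643), the PSD cone, and the rank-k set.

%Easy projections of E.9.4 illustrated. -demo
H = randn(5,5);

Psym = (H + H')/2;                  %projection on symmetric subspace S^n
Phollow = Psym - diag(diag(Psym));  %projection on symmetric hollow subspace

Porthant = max(H,0);                %nonnegative orthant projection (1633)
trace((Porthant - H)'*Porthant)     %orthogonality condition of (1632), ~0

x = randn(5,1);  l = -ones(5,1);  u = ones(5,1);
Pbox = min(max(x,l),u);             %hyper-rectangle projection (1643)

[Q,L] = eig(Psym);                  %PSD cone: clip negative eigenvalues
Ppsd = Q*max(L,0)*Q';

k = 2;                              %rank-k set: zero singular values beyond kth
[U,S,V] = svd(H);
S(k+1:end,k+1:end) = 0;
Prank = U*S*V';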


Figure 112: Closed convex set C belongs to subspace R n (shown bounded
in sketch and drawn without proper perspective). Point y is unique
minimum-distance projection of x on C ; equivalent to product of orthogonal
projection of x on R n and minimum-distance projection of result z on C .


E.9.5 Projection on convex set in subspace
Suppose a convex set C is contained in some subspace R n . Then unique
minimum-distance projection of any point in R n ⊕ R n⊥ on C can be
accomplished by first projecting orthogonally on that subspace, and then
projecting the result on C ; [64,5.14] id est, the ordered product of two
individual projections.

Proof. (⇐) To show that, suppose unique minimum-distance projection
P C x on C ⊂ R n is y as illustrated in Figure 112;

    ‖x − y‖ ≤ ‖x − q‖  ∀q ∈ C                                      (1644)

Further suppose P Rn x equals z . By the Pythagorean theorem

    ‖x − y‖ 2 = ‖x − z‖ 2 + ‖z − y‖ 2                              (1645)

because x − z ⊥ z − y . (1507) [162,3.3] Then point y = P C x is the same
as P C z because

    ‖z − y‖ 2 = ‖x − y‖ 2 − ‖x − z‖ 2 ≤ ‖z − q‖ 2 = ‖x − q‖ 2 − ‖x − z‖ 2   ∀q ∈ C   (1646)

which holds by assumption (1644).
(⇒) Now suppose z = P Rn x and

    ‖z − y‖ ≤ ‖z − q‖  ∀q ∈ C                                      (1647)

meaning y = P C z . Then point y is identical to P C x because

    ‖x − y‖ 2 = ‖x − z‖ 2 + ‖z − y‖ 2 ≤ ‖x − q‖ 2 = ‖x − z‖ 2 + ‖z − q‖ 2   ∀q ∈ C   (1648)

by assumption (1647).

Projecting matrix H ∈ R n×n on convex cone K = S n ∩ R n×n + in isomorphic
R n2 can be accomplished, for example, by first projecting on S n and only then
projecting the result on R n×n + (confer 7.0.1). This is because that projection
product is equivalent to projection on the subset of the nonnegative orthant
in the symmetric matrix subspace.
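For instance, a two-line Matlab sketch of ours realizing that ordered product
for K = S 4 ∩ R 4×4 + (arbitrary data):

%Projection on K = S^n ∩ R^{nxn}_+ as an ordered product of projections. -demo
H = randn(4,4);
P1H = (H + H')/2;     %first, orthogonal projection on subspace S^n
PKH = max(P1H,0);     %then projection on nonnegative orthant; PKH lies in K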


E.10 Alternating projection
Alternating projection is an iterative technique for finding a point in the
intersection of a number of arbitrary closed convex sets C k , or for finding
the distance between two nonintersecting closed convex sets. Because it can
sometimes be difficult or inefficient to compute the intersection or express
it analytically, one naturally asks whether it is possible to instead project
(unique minimum-distance) alternately on the individual C k , often easier.
Once a cycle of alternating projections (an iteration) is complete, we then
iterate (repeat the cycle) until convergence. If the intersection of two closed
convex sets is empty, then by convergence we mean the iterate (the result
after a cycle of alternating projections) settles to a point of minimum distance
separating the sets.
While alternating projection may find the point in the nonempty
intersection closest to a given point b , it does not necessarily do so.
Dependably finding that point is solved by an elegantly simple enhancement
to the alternating projection technique: this Dykstra algorithm (1685)
for projection on the intersection is one of the most beautiful projection
algorithms ever discovered. It is accurately interpreted as the discovery
of what alternating projection originally sought to accomplish; unique
minimum-distance projection on the nonempty intersection of a number of
arbitrary closed convex sets C k . Alternating projection is, in fact, a special
case of the Dykstra algorithm whose discussion we defer until E.10.3.

E.10.0.1 commutative projectors
Given two arbitrary convex sets C 1 and C 2 and their respective
minimum-distance projection operators P 1 and P 2 , if projectors commute
for each and every x∈ R n then it is easy to show P 1 P 2 x∈ C 1 ∩ C 2 and
P 2 P 1 x∈ C 1 ∩ C 2 . When projectors commute (P 1 P 2 =P 2 P 1 ), a point in the
intersection can be found in a finite number of steps; while commutativity is
a sufficient condition, it is not necessary (5.8.1.1.1 for example).
When C 1 and C 2 are subspaces, in particular, projectors P 1 and P 2
commute if and only if P 1 P 2 = P C1 ∩ C2 or iff P 2 P 1 = P C1 ∩ C2 or iff P 1 P 2 is
the orthogonal projection on a Euclidean subspace. [64, lem.9.2] Subspace
projectors will commute, for example, when P 1 (C 2 ) ⊂ C 2 or P 2 (C 1 ) ⊂ C 1 or
C 1 ⊂ C 2 or C 2 ⊂ C 1 or C 1 ⊥ C 2 . When subspace projectors commute, this
means we can find a point in the intersection of those subspaces in a finite


number of steps; we find, in fact, the closest point.

Figure 113: First several alternating projections (1658) in
von Neumann-style projection of point b converging on closest point
Pb in intersection of two closed convex sets in R 2 ; C 1 and C 2 are partially
drawn in vicinity of their intersection. The pointed normal cone K ⊥ (1687)
is translated to Pb , the unique minimum-distance projection of b on
intersection. For this particular example, it is possible to start anywhere
in a large neighborhood of b and still converge to Pb . The alternating
projections are themselves robust with respect to some significant amount
of noise because they belong to translated normal cone.

E.10.0.2 noncommutative projectors
Typically, one considers the method of alternating projection when projectors
do not commute; id est, when P 1 P 2 ≠ P 2 P 1 .
The iconic example for noncommutative projectors illustrated in
Figure 113 shows the iterates converging to the closest point in the
intersection of two arbitrary convex sets. Yet simple examples like
Figure 114 reveal that noncommutative alternating projection does not
always yield the closest point, although we shall show it always yields some
point in the intersection or a point that attains the distance between two
convex sets.
Alternating projection is also known as successive projection [110] [107]
[43], cyclic projection [84] [170,3.2], successive approximation [49], or
projection on convex sets [214] [215,6.4]. It is traced back to von Neumann
(1933) [249] and later Wiener [254] who showed that higher iterates of a


product of two orthogonal projections on subspaces converge at each point
in the ambient space to the unique minimum-distance projection on the
intersection of the two subspaces. More precisely, if R 1 and R 2 are closed
subspaces of a Euclidean space and P 1 and P 2 respectively denote orthogonal
projection on R 1 and R 2 , then for each vector b in that space,

    lim i→∞ (P 1 P 2 ) i b = P R1 ∩ R2 b                           (1649)

Deutsch [64, thm.9.8, thm.9.35] shows rate of convergence for subspaces to
be geometric [267,1.4.4]; bounded above by κ 2i+1 ‖b‖ , i=0, 1, 2..., where
0 ≤ κ < 1 is the cosine of the angle between the two subspaces.

Figure 114: The sets {C k } in this example comprise two halfspaces H 1 and
H 2 . The von Neumann-style alternating projection in R 2 quickly converges
to P 1 P 2 b (feasibility). The unique minimum-distance projection on the
intersection is, of course, Pb .


Figure 117 illustrates one application where convergence is reasonably
geometric and the result is the unique minimum-distance projection.
Figure 114, in contrast, demonstrates convergence in one iteration to a
fixed point (of the projection product) E.17 in the intersection of two
halfspaces; a.k.a, feasibility problem. It was Dykstra who in 1983 [71]
(E.10.3) first solved this projection problem.

E.10.0.3 the bullets
Alternating projection has, therefore, various meaning dependent on the
application or field of study; it may be interpreted to be: a distance problem,
a feasibility problem (von Neumann), or a projection problem (Dykstra):

Distance. Figure 115(a)(b). Find a unique point of projection P 1 b∈C 1
that attains the distance between any two closed convex sets C 1 and C 2 ;

    ‖P 1 b − b‖ = dist(C 1 , C 2 ) ∆= inf z∈C2 ‖P 1 z − z‖          (1651)

Feasibility. Figure 115(c), ⋂ C k ≠ ∅ . Given a number of indexed
closed convex sets C k ⊂ R n , find any fixed point in their intersection
by iterating (i) a projection product starting from b ;

    ( ∏ i=1...∞ ∏ k P k ) b ∈ ⋂ k C k                              (1652)

Optimization. Figure 115(c), ⋂ C k ≠ ∅ . Given a number of indexed
closed convex sets C k ⊂ R n , uniquely project a given point b on ⋂ C k ;

    ‖Pb − b‖ = inf x∈⋂Ck ‖x − b‖                                   (1653)

E.17 A fixed point of a mapping T : R n → R n is a point x whose image is identical under
the map; id est, Tx = x .


Figure 115:
(a) (distance) Intersection of two convex sets in R 2 is empty. Method of
alternating projection would be applied to find that point in C 1 nearest C 2 .
(b) (distance) Given b ∈ C 2 , then P 1 b ∈ C 1 is nearest b iff
(y −P 1 b) T (b −P 1 b)≤0 ∀y ∈ C 1 by the unique minimum-distance projection
theorem (E.9.1.0.1). When P 1 b attains the distance between the two sets,
hyperplane {y | (b −P 1 b) T (y −P 1 b)=0} separates C 1 from C 2 . [41,2.5.1]
(c) (0 distance) Intersection is nonempty.
(optimization) We may want the point Pb in ⋂ C k nearest point b .
(feasibility) We may instead be satisfied with a fixed point of the projection
product P 1 P 2 b in ⋂ C k .


E.10.1 Distance and existence
Existence of a fixed point is established:

E.10.1.0.1 Theorem. Distance. [49]
Given any two closed convex sets C 1 and C 2 in R n , then P 1 b∈ C 1 is a fixed
point of the projection product P 1 P 2 if and only if P 1 b is a point of C 1
nearest C 2 .   ⋄

Proof. (⇒) Given fixed point a = P 1 P 2 a ∈ C 1 with b ∆= P 2 a ∈ C 2 in
tandem so that a = P 1 b , then by the unique minimum-distance projection
theorem (E.9.1.0.1)

    (b − a) T (u − a) ≤ 0  ∀u∈ C 1
    (a − b) T (v − b) ≤ 0  ∀v ∈ C 2                                 (1654)
    ⇔
    ‖a − b‖ ≤ ‖u − v‖  ∀u∈ C 1 and ∀v ∈ C 2

by Schwarz inequality |〈x,y〉| ≤ ‖x‖ ‖y‖ [147] [202].
(⇐) Suppose a∈ C 1 and ‖a − P 2 a‖ ≤ ‖u − P 2 u‖ ∀u∈ C 1 . Now suppose we
choose u =P 1 P 2 a . Then

    ‖u − P 2 u‖ = ‖P 1 P 2 a − P 2 P 1 P 2 a‖ ≤ ‖a − P 2 a‖ ⇔ a = P 1 P 2 a   (1655)

Thus a = P 1 b (with b =P 2 a∈ C 2 ) is a fixed point in C 1 of the projection
product P 1 P 2 . E.18

E.10.2 Feasibility and convergence
The set of all fixed points of any nonexpansive mapping is a closed convex
set. [91, lem.3.4] [22,1] The projection product P 1 P 2 is nonexpansive by
Theorem E.9.3.0.1 because, for any vectors x,a∈ R n

    ‖P 1 P 2 x − P 1 P 2 a‖ ≤ ‖P 2 x − P 2 a‖ ≤ ‖x − a‖             (1656)

If the intersection of two closed convex sets C 1 ∩ C 2 is empty, then the iterates
converge to a point of minimum distance, a fixed point of the projection
product. Otherwise, convergence is to some fixed point in their intersection

E.18 Point b=P 2 a can be shown, similarly, to be a fixed point of the product P 2 P 1 .


(a feasible point) whose existence is guaranteed by virtue of the fact that each
and every point in the convex intersection is in one-to-one correspondence
with fixed points of the nonexpansive projection product.
Bauschke & Borwein [22,2] argue that any sequence monotone in the
sense of Fejér is convergent. E.19

E.10.2.0.1 Definition. Fejér monotonicity. [177]
Given closed convex set C ≠ ∅ , then a sequence x i ∈ R n , i=0, 1, 2..., is
monotone in the sense of Fejér with respect to C iff

    ‖x i+1 − c‖ ≤ ‖x i − c‖ for all i≥0 and each and every c ∈ C   (1657)
                                                                    △

Given x 0 ∆= b , if we express each iteration of alternating projection by

    x i+1 = P 1 P 2 x i ,  i=0, 1, 2...                             (1658)

and define any fixed point a =P 1 P 2 a , then the sequence x i is Fejér
monotone with respect to fixed point a because

    ‖P 1 P 2 x i − a‖ ≤ ‖x i − a‖  ∀i ≥ 0                           (1659)

by nonexpansivity. The nonincreasing sequence ‖P 1 P 2 x i − a‖ is bounded
below hence convergent because any bounded monotone sequence in R is
convergent; [168,1.2] [29,1.1] in the limit, P 1 P 2 x i+1 = P 1 P 2 x i = x i+1 .
Sequence x i therefore converges to some fixed point. If the intersection
C 1 ∩ C 2 is nonempty, convergence is to some point there by the distance
theorem. Otherwise, x i converges to a point in C 1 of minimum distance
to C 2 .

E.10.2.0.2 Example. Hyperplane/orthant intersection.
Find a feasible point (1652) belonging to the nonempty intersection of two
convex sets: given A∈ R m×n , β ∈ R(A)

    C 1 ∩ C 2 = R n + ∩ A = {y | y ≽ 0} ∩ {y | Ay = β} ⊂ R n        (1660)

the nonnegative orthant with affine subset A an intersection of hyperplanes.
Projection of an iterate x i ∈ R n on A is calculated

    P 2 x i = x i − A T (AA T ) −1 (Ax i − β)                       (1554)

E.19 Other authors prove convergence by different means; e.g., [107] [43].


Figure 116: From Example E.10.2.0.2 in R 2 , showing von Neumann-style
alternating projection to find feasible point belonging to intersection of
nonnegative orthant C 1 = R 2 + with hyperplane C 2 = A = {y | [ 1 1 ]y = 1} .
Point Pb lies at intersection of hyperplane with ordinate axis. In this
particular example, the feasible point found is coincidentally optimal. The
rate of convergence depends upon angle θ ; as it becomes more acute,
convergence slows. [107,3]

Figure 117: Geometric convergence of iterates in norm, for
Example E.10.2.0.2 in R 1000 . (Vertical axis: ‖x i − (∏ j=1...∞ ∏ k P k )b‖ ;
horizontal axis: iteration i .)


while, thereafter, projection of the result on the orthant is simply

    x i+1 = P 1 P 2 x i = max{0 , P 2 x i }                         (1661)

where the maximum is entrywise (E.9.2.2.3).
One realization of this problem in R 2 is illustrated in Figure 116: For
A = [ 1 1 ] , β = 1 , and x 0 = b = [ −3 1/2 ] T , the iterates converge to the
feasible point Pb = [ 0 1 ] T .
To give a more palpable sense of convergence in higher dimension, we
do this example again but now we compute an alternating projection for
the case A ∈ R 400×1000 , β ∈ R 400 , and b ∈ R 1000 , all of whose entries are
independently and randomly set to a uniformly distributed real number in
the interval [−1, 1] . Convergence is illustrated in Figure 117.
This application of alternating projection to feasibility is extensible to
any finite number of closed convex sets.

E.10.2.0.3 Example. Under- and over-projection. [39,3]
Consider the following variation of alternating projection: We begin with
some point x 0 ∈ R n then project that point on convex set C and then
project that same point x 0 on convex set D . To the first iterate we assign
x 1 = (P C (x 0 ) + P D (x 0 )) (1/2) . More generally,

    x i+1 = (P C (x i ) + P D (x i )) (1/2) ,  i=0, 1, 2...         (1662)

Because the Cartesian product of convex sets remains convex, (2.1.8) we
can reformulate this problem.
Consider the convex set

    S ∆= [ C ; D ]                                                 (1663)

representing Cartesian product C × D . Now, those two projections P C and
P D are equivalent to one projection on the Cartesian product; id est,

    P S [ x i ; x i ] = [ P C (x i ) ; P D (x i ) ]                 (1664)

Define the subspace

    R ∆= { v ∈ [ R n ; R n ] | [ I −I ]v = 0 }                     (1665)


By the results in Example E.5.0.0.6

    P R P S [ x i ; x i ] = P R [ P C (x i ) ; P D (x i ) ]
        = [ P C (x i ) + P D (x i ) ; P C (x i ) + P D (x i ) ] (1/2)   (1666)

This means the proposed variation of alternating projection is equivalent to
an alternation of projection on convex sets S and R . If S and R intersect,
these iterations will converge to a point in their intersection; hence, to a point
in the intersection of C and D .
We need not apply equal weighting to the projections, as supposed in
(1662). In that case, definition of R would change accordingly.

E.10.2.1 Relative measure of convergence
Inspired by Fejér monotonicity, the alternating projection algorithm from
the example of convergence illustrated by Figure 117 employs a redundant
sequence: The first sequence (indexed by j) estimates point ( ∏ j=1...∞ ∏ k P k )b
in the presumably nonempty intersection, then the quantity

    ‖ x i − ( ∏ j=1...∞ ∏ k P k )b ‖                               (1667)

in second sequence x i is observed per iteration i for convergence. A priori
knowledge of a feasible point (1652) is both impractical and antithetical. We
need another measure:
Nonexpansivity implies

    ‖ ( ∏ l P l )x k,i−1 − ( ∏ l P l )x ki ‖ = ‖x ki − x k,i+1 ‖ ≤ ‖x k,i−1 − x ki ‖   (1668)

where

    x ki ∆= P k x k+1,i ∈ R n                                      (1669)

represents unique minimum-distance projection of x k+1,i on convex set k at
iteration i . So a good convergence measure is the total monotonic sequence

    ε i ∆= ∑ k ‖x ki − x k,i+1 ‖ ,  i=0, 1, 2...                   (1670)

where lim i→∞ ε i = 0 whether or not the intersection is nonempty.
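Example E.10.2.0.2 is short enough to transcribe directly. The following
Matlab sketch is ours, not from the original text; it reproduces the
higher-dimensional experiment behind Figure 117, monitoring the affine
residual instead of the impractical measure (1667), and assumes A has full
row rank (almost surely true for this random data).

%von Neumann-style alternating projection for Example E.10.2.0.2. -demo
m = 400;  n = 1000;
A = 2*(rand(m,n)-0.5);  beta = 2*(rand(m,1)-0.5);
x = 2*(rand(n,1)-0.5);                 %x0 = b
pinvA = A'/(A*A');                     %A'(AA')^-1 ; A assumed full row rank
for i = 1:100
   x = x - pinvA*(A*x - beta);         %P2: project on affine subset (1554)
   x = max(0,x);                       %P1: project on orthant (1661)
end
norm(A*x - beta)                       %residual after final orthant clipping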


E.10.2.1.1 Example. Affine subset ∩ positive semidefinite cone.
Consider the problem of finding X ∈ S n that satisfies

    X ≽ 0 ,  〈A j , X〉 = b j ,  j =1... m                          (1671)

given nonzero A j ∈ S n and real b j . Here we take C 1 to be the positive
semidefinite cone S n + while C 2 is the affine subset of S n

    C 2 = A ∆= {X | tr(A j X)=b j , j =1... m}
            = {X | [ svec(A 1 ) T ; ... ; svec(A m ) T ] svec X = b}
            ∆= {X | A svec X = b}  ⊆ S n                            (1672)

where b = [b j ] ∈ R m , A ∈ R m×n(n+1)/2 , and symmetric vectorization svec is
defined by (46). Projection of iterate X i ∈ S n on A is: (E.5.0.0.6)

    P 2 svec X i = svec X i − A † (A svec X i − b)                   (1673)

The Euclidean distance from X i to A is therefore

    dist(X i , A) = ‖X i − P 2 X i ‖ F = ‖A † (A svec X i − b)‖ 2    (1674)

Projection on the positive semidefinite cone (7.1.2) is found from the eigen
decomposition P 2 X i = ∑ j λ j q j q j T ;

    P 1 P 2 X i = ∑ j=1...n max{0 , λ j } q j q j T                  (1675)

The distance from P 2 X i to the positive semidefinite cone is therefore

    dist(P 2 X i , S n +) = ‖P 2 X i − P 1 P 2 X i ‖ F = √( ∑ j=1...n min{0 , λ j } 2 )   (1676)

When the intersection is empty A ∩ S n + = ∅ , the iterates converge to that
positive semidefinite matrix closest to A in the Euclidean sense. Otherwise,
convergence is to some point in the nonempty intersection.
Barvinok (2.9.3.0.1) shows that if a point feasible with (1671) exists,
then there exists an X ∈ A ∩ S n + such that

    rankX ≤ ⌊ ( √(8m + 1) − 1 ) / 2 ⌋                               (226)
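A Matlab transcription of this alternating projection, ours rather than the
book's, using the svect() and svectinv() subroutines that appear in G.4
and random data standing in for the A j and b j :

%Alternating projection between PSD cone and affine subset (1673)(1675). -demo
n = 4;  m = 3;
Amat = zeros(m, n*(n+1)/2);
for j=1:m
   Aj = randn(n);  Aj = (Aj + Aj')/2;   %random symmetric constraint matrices
   Amat(j,:) = svect(Aj)';
end
b = randn(m,1);
X = zeros(n);
for i=1:100
   x = svect(X);
   x = x - pinv(Amat)*(Amat*x - b);     %P2: project on affine subset (1673)
   X = svectinv(x);
   [Q,L] = eig((X+X')/2);               %P1: clip negative eigenvalues (1675)
   X = Q*max(L,0)*Q';
end
norm(Amat*svect(X) - b)                 %distance to A , per (1674)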


E.10.2.1.2 Example. Semidefinite matrix completion.
Continuing Example E.10.2.1.1: When m ≤ n(n + 1)/2 and the A j matrices
are distinct members of the standard orthonormal basis {E lq ∈ S n } (49)

    {A j ∈ S n , j =1... m} ⊆ {E lq } = { e l e l T ,                l = q = 1... n
                                        { (1/√2)(e l e q T + e q e l T ) ,  1 ≤ l < q ≤ n   (1677)

and when the constants b j are set to constrained entries of variable
X ∆= [X lq ]∈ S n

    {b j , j =1... m} ⊆ { X lq ,      l = q = 1... n
                        { X lq √2 ,  1 ≤ l < q ≤ n }  = {〈X , E lq 〉}   (1678)

then the equality constraints in (1671) fix individual entries of X ∈ S n . Thus
the feasibility problem becomes a positive semidefinite matrix completion
problem. Projection of iterate X i ∈ S n on A simplifies to (confer (1673))

    P 2 svec X i = svec X i − A T (A svec X i − b)                   (1679)

From this we can see that orthogonal projection is achieved simply by
setting corresponding entries of P 2 X i to the known entries of X , while
the remaining entries of P 2 X i are set to corresponding entries of the current
iterate X i .
Using this technique, we find a positive semidefinite completion for

    [ 4 3 ? 2 ]
    [ 3 4 3 ? ]
    [ ? 3 4 3 ]                                                      (1680)
    [ 2 ? 3 4 ]

Initializing the unknown entries to 0, they all converge geometrically to
1.5858 (rounded) after about 42 iterations.
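For the completion problem, projection (1679) reduces to restoring the known
entries, so the whole iteration is a few lines of Matlab. This sketch is ours:

%Positive semidefinite completion of (1680) by alternating projection. -demo
known = [4 3 0 2; 3 4 3 0; 0 3 4 3; 2 0 3 4];     %0 marks unknown entries
mask  = logical([1 1 0 1; 1 1 1 0; 0 1 1 1; 1 0 1 1]);
X = known;                                         %unknowns initialized to 0
for i=1:100
   X(mask) = known(mask);        %P2: restore the fixed entries (1679)
   [Q,L] = eig((X+X')/2);        %P1: project on PSD cone
   X = Q*max(L,0)*Q';
end
X(1,3)                           %converges to about 1.5858, as claimed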


Figure 118: Distance (confer (1676)) between PSD cone and iterate (1679)
in affine subset A (1672) for Laurent's problem; initially, decreasing
geometrically. (Vertical axis: dist(P 2 X i , S n +) on a logarithmic scale;
horizontal axis: iteration i .)

Laurent gives a problem for which no positive semidefinite completion
exists: [153]

    [ 1 1 ? 0 ]
    [ 1 1 1 ? ]
    [ ? 1 1 1 ]                                                      (1681)
    [ 0 ? 1 1 ]

Initializing unknowns to 0, by alternating projection we find the constrained
matrix closest to the positive semidefinite cone,

    [ 1      1      0.5454 0      ]
    [ 1      1      1      0.5454 ]
    [ 0.5454 1      1      1      ]                                  (1682)
    [ 0      0.5454 1      1      ]

and we find the positive semidefinite matrix closest to the affine subset A
(1672):

    [ 1.0521 0.9409 0.5454 0.0292 ]
    [ 0.9409 1.0980 0.9451 0.5454 ]
    [ 0.5454 0.9451 1.0980 0.9409 ]                                  (1683)
    [ 0.0292 0.5454 0.9409 1.0521 ]

These matrices (1682) and (1683) attain the Euclidean distance dist(A , S n +) .
Convergence is illustrated in Figure 118.


Figure 119: H 1 and H 2 are the same halfspaces as in Figure 114.
Dykstra's alternating projection algorithm generates the alternations
b, x 21 , x 11 , x 22 , x 12 ... The path illustrated from b to x 12 in R 2
terminates at the desired result, Pb . The alternations are not so robust in
presence of noise as for the example in Figure 113.

E.10.3 Optimization and projection
Unique projection on the nonempty intersection of arbitrary convex sets to
find the closest point therein is a convex optimization problem. The first
successful application of alternating projection to this problem is attributed
to Dykstra [71] [42] who in 1983 provided an elegant algorithm that prevails
today. In 1988, Han [110] rediscovered the algorithm and provided a
primal-dual convergence proof. A synopsis of the history of alternating
projection E.20 can be found in [44] where it becomes apparent that Dykstra's
work is seminal.

E.20 For a synopsis of alternating projection applied to distance geometry, see [237,3.1].


E.10.3.1 Dykstra's algorithm
Assume we are given some point b ∈ R n and closed convex sets
{C k ⊂ R n | k=1... L} . Let x ki ∈ R n and y ki ∈ R n respectively denote a
primal and dual vector (whose meaning can be deduced from Figure 119
and Figure 120) associated with set k at iteration i . Initialize

    y k0 = 0 ∀k=1... L   and   x 1,0 = b                            (1684)

Denoting by P k t the unique minimum-distance projection of t on C k , and
for convenience x L+1,i ∆= x 1,i−1 , calculation of the iterates x 1i proceeds: E.21

    for i=1, 2,... until convergence {
       for k=L...1 {
          t = x k+1,i − y k,i−1
          x ki = P k t                                              (1685)
          y ki = P k t − t
       }
    }

Assuming a nonempty intersection, then the iterates converge to the unique
minimum-distance projection of point b on that intersection; [64,9.24]

    Pb = lim i→∞ x 1i                                               (1686)

In the case all the C k are affine, then calculation of y ki is superfluous
and the algorithm becomes identical to alternating projection. [64,9.26]
[84,1] Dykstra's algorithm is so simple, elegant, and represents such a tiny
increment in computational intensity over alternating projection, it is nearly
always arguably cost-effective.

E.10.3.2 Normal cone
Glunt [90,4] observes that the overall effect of Dykstra's iterative procedure
is to drive t toward the translated normal cone to ⋂ C k at the solution
Pb (translated to Pb). The normal cone gets its name from its graphical
construction; which is, loosely speaking, to draw the outward-normals at Pb
(Definition E.9.0.0.3) to all the convex sets C k touching Pb . The relative
interior of the normal cone subtends these normal vectors.

E.21 We reverse order of projection (k=L...1) in the algorithm for continuity of exposition.
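Algorithm (1685) transcribes almost verbatim into Matlab. The following
sketch is ours, not from the original text; it uses function handles for the
projectors and the two sets of Example E.10.2.0.2, normalizing the
hyperplane so its projector takes the simple unit-normal form.

%Dykstra's algorithm (1685) for L=2 sets, projectors as function handles. -demo
P{1} = @(t) max(t,0);                   %C1: nonnegative orthant
a = [1;1]/sqrt(2);  beta = 1/sqrt(2);   %C2: hyperplane {y | [1 1]y = 1}
P{2} = @(t) t - a*(a'*t - beta);        %projector for unit-normal hyperplane

b = [-3; 0.5];  L = 2;
x = b;  y = {zeros(2,1), zeros(2,1)};   %initialize per (1684)
for i=1:50
   for k=L:-1:1
      t = x - y{k};
      x = P{k}(t);                      %x_ki = P_k t
      y{k} = x - t;                     %y_ki = P_k t - t
   end
end
disp(x')                                %Pb = [0 1] , projection of b on C1 ∩ C2

For this data the iterates reach Pb = [ 0 1 ] T , the unique
minimum-distance projection, whereas plain alternating projection would
stop at some other feasible point.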


Figure 120: Two examples (truncated): Normal cone to H 1 ∩ H 2 at the
origin, and at point Pb on the boundary. H 1 and H 2 are the same halfspaces
from Figure 119. The normal cone at the origin K ⊥ H1 ∩ H2 (0) is simply −K ∗ .

E.10.3.2.1 Definition. Normal cone. [175] [29, p.261] [129,A.5.2]
[38,2.1] [201,3] The normal cone to any set S ⊆ R n at any particular
point a∈ R n is defined as the closed cone

    K ⊥ S (a) ∆= {z ∈ R n | z T (y −a)≤0 ∀y ∈ S} = −(S − a) ∗        (1687)

an intersection of halfspaces about the origin in R n hence convex regardless
of the convexity of S ; the negative dual cone to the translate S − a .   △

Examples of normal cone construction are illustrated in Figure 120: The
normal cone at the origin is the vector sum (2.1.8) of two normal cones;
[38,3.3, exer.10] for H 1 ∩ int H 2 ≠ ∅

    K ⊥ H1 ∩ H2 (0) = K ⊥ H1 (0) + K ⊥ H2 (0)                        (1688)

This formula applies more generally to other points in the intersection.
The normal cone to any affine set A at α∈ A , for example, is the
orthogonal complement of A − α . Projection of any point in the translated
normal cone K ⊥ C (a∈ C) + a on convex set C is identical to a ; in other words,
point a is that point in C closest to any point belonging to the translated
normal cone K ⊥ C (a) + a ; e.g., Theorem E.4.0.0.1.


Figure 121: A few renderings of normal cone K ⊥ E3 to elliptope
E 3 (Figure 70) at point 11 T , projected on R 3 . In [154, fig.2], normal
cone is claimed circular in this dimension (severe numerical artifacts corrupt
boundary and make interior corporeal, drawn truncated).


When set S is a convex cone K , then the normal cone to K at the origin

    K ⊥ K (0) = −K ∗                                                (1689)

is the negative dual cone. Any point belonging to −K ∗ , projected on K ,
projects on the origin. More generally, [64,4.5]

    K ⊥ K (a) = −(K − a) ∗                                          (1690)

    K ⊥ K (a∈ K) = −K ∗ ∩ a ⊥                                       (1691)

The normal cone to ⋂ C k at Pb in Figure 114 is the ray
{ξ(b −Pb) | ξ ≥0} illustrated in Figure 120. Applying Dykstra's algorithm to
that example, convergence to the desired result is achieved in two iterations as
illustrated in Figure 119. Yet applying Dykstra's algorithm to the example
in Figure 113 does not improve rate of convergence, unfortunately, because
the given point b and all the alternating projections already belong to the
translated normal cone at the vertex of intersection.

E.10.3.3 speculation
From these few examples we surmise, unique minimum-distance projection on
blunt polyhedral cones having nonempty interior may be found by Dykstra's
algorithm in few iterations.


Appendix F
Proof of EDM composition

F.1 EDM-entry exponential
(4.10)

    D ∈ EDM n ⇔ [1 − e −λd ij ] ∈ EDM n  ∀λ > 0                     (656)

Lemma 2.1. from A Tour d'Horizon ...on Completion Problems. [152]
The following assertions are equivalent: for D =[d ij , i,j =1... n] ∈ S n h and
E n the elliptope in S n (4.9.1.0.1),
(i) D ∈ EDM n
(ii) e −λD ∆= [e −λd ij ] ∈ E n for all λ > 0
(iii) 11 T − e −λD ∆= [1 − e −λd ij ] ∈ EDM n for all λ > 0   ⋄
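Composition (656) can be spot-checked numerically with the isedm() routine
of G.1 (with its subroutines, including Dx() , on the path). This little test
is ours and is, of course, no substitute for the proof below.

%Numerical spot-check of (656) using isedm() from G.1. -demo
X = randn(3,10);                       %random list of N=10 points in R^3
D = Dx(X);                             %EDM made from the point list (G.1.1.5)
for lambda = [0.1 1 10]
   [Dc,Xc,isisnot] = isedm(ones(10) - exp(-lambda*D));
   disp(isisnot)                       %expect 'is' for every lambda > 0
end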


Proof. [208] (confer [147])

Date: Fri, 06 Jun 2003 10:42:47 +0200
From: Monique Laurent
To: Jon Dattorro
Subject: Re: Tour

Hallo Jon,

I looked again at the paper of Schoenberg and what I can see is
the following:

1) the equivalence of Lemma 2.1 (i) (ii) (my paper) is stated in
Schoenberg's Theorem 1 (page 527).

2) (ii) implies (iii) can be seen from the statement in the beginning
of section 3, saying that a distance space embeds in $L_2$ iff
some associated matrix is PSD. Let me reformulate it:

Let $d=(d_{ij})_{i,j=0,1,..,n}$ be a distance space on $n+1$ points
(i.e., symmetric matrix of order $n+1$ with zero diagonal)
and let $p=(p_{ij})_{i,j=1,..,n}$ be the symmetric matrix of order
$n$ related by relations:

(A) $p_{ij} = {d_{0i}+d_{0j}-d_{ij} \over 2}$ for $i,j=1,..,n$

or equivalently

(B) $d_{0i} = p_{ii}, d_{ij} = p_{ii}+p_{jj}-2p_{ij}$
for $i,j=1,..,n$

Then, $d$ embeds in $L_2$ iff $p$ is positive semidefinite matrix
iff $d$ is of negative type
(see second half of page 525 and top of page 526)


For the implication from (ii) to (iii), set:
$p = exp(-\lambda d)$ and define $d'$ from $p$ using (B) above.
Then, $d'$ is a distance space on $n+1$ points that embeds
in $L_2$. Thus its subspace of $n$ points also embeds in $L_2$
and is precisely $1-exp(-\lambda d)$.

Note that (iii) implies (ii) cannot be read immediately from this
argument since (iii) involves the subdistance of $d'$ on $n$ points
(and not the full $d'$ on $n+1$ points).

3) Show (iii) implies (i) by using the series expansion of the function
$1 - exp(-\lambda d)$; the constant term cancels; $\lambda$ factors out;
remains a summation of $d$ plus a multiple of $\lambda$; letting
$\lambda$ go to 0 gives the result.

As far as I can see this is not explicitly written in Schoenberg.
But Schoenberg also uses such an argument of expansion of the
exponential function plus letting $\lambda$ go to 0
(see first proof in page 526).

I hope this helps. If not just ask again.

Best regards, Monique

> Hi Monique
>
> I'm reading your "A Tour d'Horizon..." from the AMS book "Topics in
> Semidefinite and Interior-Point Methods".
>
> On page 56, Lemma 2.1(iii), 1-exp(-\lambda D) is EDM <=> D is EDM.
> You cite Schoenberg 1938; a paper I have acquired. I am missing the
> connection of your Lemma(iii) to that paper; most likely because of my
> lack of understanding of Schoenberg's results. I am wondering if you
> might provide a hint how you arrived at that result in terms of
> Schoenberg's results.
>
> Jon Dattorro




Appendix G
Matlab programs

G.1 isedm()

%Is real D a Euclidean Distance Matrix. -Jon Dattorro
%
%[Dclosest,X,isisnot,r] = isedm(D,tolerance,verbose,dimension,V)
%
%Returns: closest EDM in Schoenberg sense (default output),
%         a generating list X,
%         string 'is' or 'isnot' EDM,
%         actual affine dimension r of EDM output.
%Input: candidate matrix D,
%       optional absolute numerical tolerance for EDM determination,
%       optional verbosity 'on' or 'off',
%       optional desired affine dim of generating list X output,
%       optional choice of 'Vn' auxiliary matrix (default) or 'V'.

function [Dclosest,X,isisnot,r] = isedm(D,tolerance_in,verbose,dim,V);

isisnot = 'is';
N = length(D);


if nargin < 2 | isempty(tolerance_in)
   tolerance_in = eps;
end
tolerance = max(tolerance_in, eps*N*norm(D));
if nargin < 3 | isempty(verbose)
   verbose = 'on';
end
if nargin < 5 | isempty(V)
   use = 'Vn';
else
   use = 'V';
end

%is empty
if N < 1
   if strcmp(verbose,'on'), disp('Input D is empty.'), end
   X = [ ];
   Dclosest = [ ];
   isisnot = 'isnot';
   r = [ ];
   return
end

%is square
if size(D,1) ~= size(D,2)
   if strcmp(verbose,'on'), disp('An EDM must be square.'), end
   X = [ ];
   Dclosest = [ ];
   isisnot = 'isnot';
   r = [ ];
   return
end

%is real
if ~isreal(D)
   if strcmp(verbose,'on'), disp('Because an EDM is real,'), end
   isisnot = 'isnot';
   D = real(D);
end


%is nonnegative
if sum(sum(chop(D,tolerance) < 0))
   isisnot = 'isnot';
   if strcmp(verbose,'on'), disp('Because an EDM is nonnegative,'), end
end

%is symmetric
if sum(sum(abs(chop((D - D')/2,tolerance)) > 0))
   isisnot = 'isnot';
   if strcmp(verbose,'on'), disp('Because an EDM is symmetric,'), end
   D = (D + D')/2;   %only required condition
end

%has zero diagonal
if sum(abs(diag(chop(D,tolerance))) > 0)
   isisnot = 'isnot';
   if strcmp(verbose,'on')
      disp('Because an EDM has zero main diagonal,')
   end
end

%is EDM
if strcmp(use,'Vn')
   VDV = -Vn(N)'*D*Vn(N);
else
   VDV = -Vm(N)'*D*Vm(N);
end
[Evecs Evals] = signeig(VDV);
if ~isempty(find(chop(diag(Evals),...
                 max(tolerance_in,eps*N*normest(VDV))) < 0))
   isisnot = 'isnot';
   if strcmp(verbose,'on'), disp('Because -VDV < 0,'), end
end
if strcmp(verbose,'on')
   if strcmp(isisnot,'isnot')
      disp('matrix input is not EDM.')
   elseif tolerance_in == eps
      disp('Matrix input is EDM to machine precision.')
   else
      disp('Matrix input is EDM to specified tolerance.')
   end
end


%find generating list
r = max(find(chop(diag(Evals),...
             max(tolerance_in,eps*N*normest(VDV))) > 0));
if isempty(r)
   r = 0;
end
if nargin < 4 | isempty(dim)
   dim = r;
else
   dim = round(dim);
end
t = r;
r = min(r,dim);
if r == 0
   X = zeros(1,N);
else
   if strcmp(use,'Vn')
      X = [zeros(r,1) diag(sqrt(diag(Evals(1:r,1:r))))*Evecs(:,1:r)'];
   else
      X = [diag(sqrt(diag(Evals(1:r,1:r))))*Evecs(:,1:r)']/sqrt(2);
   end
end
if strcmp(isisnot,'isnot') | dim < t
   Dclosest = Dx(X);
else
   Dclosest = D;
end
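A brief example of use (ours):

%Example use of isedm(). -demo
X = randn(3,5);                         %five points in R^3
D = Dx(X);                              %their EDM (G.1.1.5)
[Dclosest,Xr,isisnot,r] = isedm(D);
disp(isisnot)                           %'is'; affine dimension r is at most 3
[Dclosest,Xr,isisnot] = isedm(D - 0.1); %perturbed: diagonal no longer zero
disp(isisnot)                           %'isnot'; Dclosest is the nearest EDM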


G.1.1 Subroutines for isedm()
G.1.1.1 chop()

%zeroing entries below specified absolute tolerance threshold
%-Jon Dattorro
function Y = chop(A,tolerance)
R = real(A);
I = imag(A);
if nargin == 1
   tolerance = max(size(A))*norm(A)*eps;
end
idR = find(abs(R) < tolerance);
idI = find(abs(I) < tolerance);
R(idR) = 0;
I(idI) = 0;
Y = R + i*I;

G.1.1.2 Vn()

function y = Vn(N)
y = [-ones(1,N-1); eye(N-1)]/sqrt(2);

G.1.1.3 Vm()

%returns EDM V matrix
function V = Vm(n)
V = [eye(n)-ones(n,n)/n];


G.1.1.4 signeig()

%Sorts signed real part of eigenvalues
%and applies sort to values and vectors.
%[Q, lam] = signeig(A)
%-Jon Dattorro
function [Q, lam] = signeig(A);
[q l] = eig(A);
lam = diag(l);
[junk id] = sort(real(lam));
id = id(length(id):-1:1);
lam = diag(lam(id));
Q = q(:,id);
if nargout < 2
   Q = diag(lam);
end

G.1.1.5 Dx()

%Make EDM from point list
function D = Dx(X)
[n,N] = size(X);
one = ones(N,1);
del = diag(X'*X);
D = del*one' + one*del' - 2*X'*X;


G.2 conic independence, conici()
The recommended subroutine lp() (G.2.1) is a linear program solver from
Matlab's Optimization Toolbox v2.0 (R11). Later releases of Matlab
replace lp() with linprog() that we find quite inferior to lp() on a wide
range of problems.
Given an arbitrary set of directions, this c.i. subroutine removes the
conically dependent members. Yet a conically independent set returned is
not necessarily unique. In that case, if desired, the set returned may be
altered by reordering the set input.

% Test for c.i. of arbitrary directions in rows or columns of X.
% -Jon Dattorro
function [Xci, indep_str, how_many_depend] = conici(X,rowORcol,tol);
if nargin < 3
   tol = max(size(X))*eps*norm(X);
end
if nargin < 2 | strcmp(rowORcol,'col')
   rowORcol = 'col';
   Xin = X;
elseif strcmp(rowORcol,'row')
   Xin = X';
else
   disp('Invalid rowORcol input.')
   return
end
[n, N] = size(Xin);

indep_str = 'conically independent';
how_many_depend = 0;
if rank(Xin) == N
   Xci = X;
   return
end


count = 1;
new_N = N;

%remove zero rows or columns
for i=1:N
   if chop(Xin(:,count),tol)==0
      how_many_depend = how_many_depend + 1;
      indep_str = 'conically Dependent';
      Xin(:,count) = [ ];
      new_N = new_N - 1;
   else
      count = count + 1;
   end
end

%remove conic dependencies
count = 1;
newer_N = new_N;
for i=1:new_N
   if newer_N > 1
      A = [Xin(:,1:count-1) Xin(:,count+1:newer_N); -eye(newer_N-1)];
      b = [Xin(:,count); zeros(newer_N-1,1)];
      [a, lambda, how] = lp(zeros(newer_N-1,1),A,b,[ ],[ ],[ ],n,-1);
      if ~strcmp(how,'infeasible')
         how_many_depend = how_many_depend + 1;
         indep_str = 'conically Dependent';
         Xin(:,count) = [ ];
         newer_N = newer_N - 1;
      else
         count = count + 1;
      end
   end
end
if strcmp(rowORcol,'col')
   Xci = Xin;
else
   Xci = Xin';
end


G.2.1 lp()

LP     Linear programming.
       X = LP(f,A,b) solves the linear programming problem:

            min f'x    subject to:  Ax <= b
             x


G.3 Map of the USA
G.3.1 EDM, mapusa()

%Find map of USA using only distance information.
% -Jon Dattorro
%Reconstruction from EDM.
clear all;
close all;

load usalo;  %From Matlab Mapping Toolbox
%http://www-ccs.ucsd.edu/matlab/toolbox/map/usalo.html

%To speed-up execution (decimate map data), make
%'factor' bigger positive integer.
factor = 1;
Mg = 2*factor;  %Relative decimation factors
Ms = factor;
Mu = 2*factor;

gtlakelat = decimate(gtlakelat,Mg);
gtlakelon = decimate(gtlakelon,Mg);
statelat = decimate(statelat,Ms);
statelon = decimate(statelon,Ms);
uslat = decimate(uslat,Mu);
uslon = decimate(uslon,Mu);

lat = [gtlakelat; statelat; uslat]*pi/180;
lon = [gtlakelon; statelon; uslon]*pi/180;
phi = pi/2 - lat;
theta = lon;
x = sin(phi).*cos(theta);
y = sin(phi).*sin(theta);
z = cos(phi);

%plot original data
plot3(x,y,z), axis equal, axis off


lengthNaN = length(lat);
id = find(isfinite(x));
X = [x(id)'; y(id)'; z(id)'];
N = length(X(1,:))

% Make the distance matrix
clear gtlakelat gtlakelon statelat statelon
clear factor x y z phi theta conus
clear uslat uslon Mg Ms Mu lat lon
D = diag(X'*X)*ones(1,N) + ones(N,1)*diag(X'*X)' - 2*X'*X;

%destroy input data
clear X
Vn = [-ones(1,N-1); speye(N-1)];
VDV = (-Vn'*D*Vn)/2;
clear D Vn
pack

[evec evals flag] = eigs(VDV, speye(size(VDV)), 10, 'LR');
if flag, disp('convergence problem'), return, end;
evals = real(diag(evals));

index = find(abs(evals) > eps*normest(VDV)*N);
n = sum(evals(index) > 0);
Xs = [zeros(n,1) diag(sqrt(evals(index)))*evec(:,index)'];

warning off; Xsplot=zeros(3,lengthNaN)*(0/0); warning on;
Xsplot(:,id) = Xs;

figure(2)
%plot map found via EDM.
plot3(Xsplot(1,:), Xsplot(2,:), Xsplot(3,:))
axis equal, axis off


G.3.1.1 USA map input-data decimation, decimate()

function xd = decimate(x,m)
roll = 0;
rock = 1;
for i=1:length(x)
   if isnan(x(i))
      roll = 0;
      xd(rock) = x(i);
      rock = rock+1;
   elseif ~mod(roll,m)
      xd(rock) = x(i);
      rock = rock+1;
   end
   roll = roll+1;
end
xd = xd';

G.3.2 EDM using ordinal data, omapusa()

%Find map of USA using ORDINAL distance information.
% -Jon Dattorro
clear all;
close all;

load usalo;  %From Matlab Mapping Toolbox
%http://www-ccs.ucsd.edu/matlab/toolbox/map/usalo.html

factor = 1;
Mg = 2*factor;  %Relative decimation factors
Ms = factor;
Mu = 2*factor;

gtlakelat = decimate(gtlakelat,Mg);
gtlakelon = decimate(gtlakelon,Mg);
statelat = decimate(statelat,Ms);
statelon = decimate(statelon,Ms);


uslat = decimate(uslat,Mu);
uslon = decimate(uslon,Mu);

lat = [gtlakelat; statelat; uslat]*pi/180;
lon = [gtlakelon; statelon; uslon]*pi/180;
phi = pi/2 - lat;
theta = lon;
x = sin(phi).*cos(theta);
y = sin(phi).*sin(theta);
z = cos(phi);

%plot original data
plot3(x,y,z), axis equal, axis off

lengthNaN = length(lat);
id = find(isfinite(x));
X = [x(id)'; y(id)'; z(id)'];
N = length(X(1,:))

% Make the distance matrix
clear gtlakelat gtlakelon statelat
clear statelon state stateborder greatlakes
clear factor x y z phi theta conus
clear uslat uslon Mg Ms Mu lat lon
D = diag(X'*X)*ones(1,N) + ones(N,1)*diag(X'*X)' - 2*X'*X;

%ORDINAL MDS - vectorize D
count = 1;
M = (N*(N-1))/2;
f = zeros(M,1);
for j=2:N
   for i=1:j-1
      f(count) = D(i,j);
      count = count + 1;
   end
end
%sorted is f(idx)
[sorted idx] = sort(f);


clear D sorted X
f(idx) = ((1:M).^2)/M^2;

%Create ordinal data matrix
O = zeros(N,N);
count = 1;
for j=2:N
   for i=1:j-1
      O(i,j) = f(count);
      O(j,i) = f(count);
      count = count+1;
   end
end
clear f idx

Vn = sparse([-ones(1,N-1); eye(N-1)]);
VOV = (-Vn'*O*Vn)/2;
clear O Vn
pack

[evec evals flag] = eigs(VOV, speye(size(VOV)), 10, 'LR');
if flag, disp('convergence problem'), return, end;
evals = real(diag(evals));

Xs = [zeros(3,1) diag(sqrt(evals(1:3)))*evec(:,1:3)'];

warning off; Xsplot=zeros(3,lengthNaN)*(0/0); warning on;
Xsplot(:,id) = Xs;

figure(2)
%plot map found via Ordinal MDS.
plot3(Xsplot(1,:), Xsplot(2,:), Xsplot(3,:))
axis equal, axis off


G.4 Rank reduction subroutine, RRf()

%Rank Reduction function -Jon Dattorro
%Inputs are:
%   Xstar matrix,
%   affine equality constraint matrix A whose rows are in svec format.
%
% Tolerance scheme needs revision...
function X = RRf(Xstar,A);
rand('seed',23);
m = size(A,1);
n = size(Xstar,1);
if size(Xstar,1)~=size(Xstar,2)
   disp('Rank Reduction subroutine: Xstar not square')
   pause
end
toler = norm(eig(Xstar))*size(Xstar,1)*1e-9;
if sum(chop(eig(Xstar),toler) < 0)
   disp('Rank Reduction subroutine: Xstar not positive semidefinite')
   pause
end
X = Xstar;


%try to find sparsity pattern for Z_i
tolerance = norm(X,'fro')*size(X,1)*1e-9;
Ztem = zeros(rho,rho);
pattern = find(chop(cumu,tolerance)==0);
if isempty(pattern)   %if no sparsity, do random projection
   ranp = svect(2*(rand(rho,rho)-0.5));
   Z(1:rho,1:rho,i)...
      = svectinv((eye(rho*(rho+1)/2)-pinv(svectRAR)*svectRAR)*ranp);
else
   disp('sparsity pattern found')
   Ztem(pattern) = 1;
   Z(1:rho,1:rho,i) = Ztem;
end

phiZ = 1;
toler = norm(eig(Z(1:rho,1:rho,i)))*rho*1e-9;
if sum(chop(eig(Z(1:rho,1:rho,i)),toler) < 0)
   phiZ = -1;
end


G.4.1 svect()

%Map from symmetric matrix to vector
% -Jon Dattorro
function y = svect(Y,N)
if nargin == 1
   N = size(Y,1);
end
y = zeros(N*(N+1)/2,1);
count = 1;
for j=1:N
   for i=1:j
      if i~=j
         y(count) = sqrt(2)*Y(i,j);
      else
         y(count) = Y(i,j);
      end
      count = count + 1;
   end
end


G.4.2 svectinv()

%convert vector into symmetric matrix. m is dim of matrix.
% -Jon Dattorro
function A = svectinv(y)
m = round((sqrt(8*length(y)+1)-1)/2);
if length(y) ~= m*(m+1)/2
   disp('dimension error in svectinv()');
   pause
end
A = zeros(m,m);
count = 1;
for j=1:m
   for i=1:m
      if i < j
         A(i,j) = y(count)/sqrt(2);
         A(j,i) = A(i,j);
         count = count + 1;
      elseif i == j
         A(i,j) = y(count);
         count = count + 1;
      end
   end
end
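A one-line round-trip check (ours) that svectinv() inverts svect() :

%Round-trip check of svect()/svectinv(). -demo
Y = randn(5);  Y = (Y + Y')/2;
norm(svectinv(svect(Y)) - Y, 'fro')    %should be ~0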


G.5 Sturm's procedure
This is a demonstration program that can easily be transformed to a
subroutine for decomposing positive semidefinite matrix X . Its attraction
is that it is a nonorthogonal alternative to eigen decomposition. (A.7.4.0.1)
That particular decomposition obtained is dependent on choice of matrix A .

%procedure to find dyad decomposition of X -Jon Dattorro
clear all
N = 4;
r = 2;
X = 2*(rand(r,N)-0.5);
X = X'*X;

t = null(svect(X)');
A = svectinv(t(:,1));

%Suppose given matrix A is positive semidefinite (N=3, r=2)
%[v,d] = signeig(X);
%d(1,1)=0; d(2,2)=0; d(3,3)=pi;
%A = v*d*v';

tol = 1e-8;
Y = X;
y = zeros(size(X,1),r);
rho = r;
for k=1:r
   [v,d] = signeig(Y);
   v = v*sqrt(chop(d,1e-14));
   viol = 0;
   j = [ ];
   for i=2:rho
      if chop((v(:,1)'*A*v(:,1))*(v(:,i)'*A*v(:,i)),tol) ~= 0
         viol = 1;
      end
      if (v(:,1)'*A*v(:,1))*(v(:,i)'*A*v(:,i)) < 0
         j = i;


         break
      end
   end
   if ~viol
      y(:,k) = v(:,1);
   else
      alpha = (-2*(v(:,1)'*A*v(:,j)) + sqrt((2*v(:,1)'*A*v(:,j)).^2 ...
              -4*(v(:,j)'*A*v(:,j))*(v(:,1)'*A*v(:,1))))/(2*(v(:,j)'*A*v(:,j)));
      if chop(y(:,k)'*A*y(:,k),tol) ~= 0
         alpha = (-2*(v(:,1)'*A*v(:,j)) - sqrt((2*v(:,1)'*A*v(:,j)).^2 ...
                 -4*(v(:,j)'*A*v(:,j))*(v(:,1)'*A*v(:,1))))/(2*(v(:,j)'*A*v(:,j)));
      end
      y(:,k) = (v(:,1) + alpha*v(:,j))/sqrt(1+alpha^2);
   end
   Y = Y - y(:,k)*y(:,k)';
   rho = rho - 1;
end

z = zeros(size(y));
e = zeros(N,N);
for i=1:r
   z(:,i) = y(:,i)/norm(y(:,i));
   e(i,i) = norm(y(:,i))^2;
end

lam = diag(e);
[junk id] = sort(real(lam));
id = id(length(id):-1:1);
z = [z(:,id(1:r)) null(z')]
e = diag(lam(id))
[v,d] = signeig(X)
X - z*e*z'


Appendix H
Notation and a few definitions

b            vector, scalar, or logical condition (italic abcdefghijklmnopqrstuvwxyz)
b i          i th element of vector b , or i th b vector from a set or list {b j }
b i:j        truncated vector comprising i th through j th entry of vector b
b T          transpose
b H          Hermitian (conjugate) transpose
A T 1        first of various transpositions of a cubix or quartix A
A            matrix, vector, scalar, or logical condition
             (italic ABCDEFGHIJKLMNOPQRSTUVWXYZ)
skinny       a skinny matrix, meaning more rows than columns. When there
             are more equations than unknowns, we say that the system Ax = b
             is overdetermined. [93,5.3]
fat          a fat matrix, meaning more columns than rows; underdetermined
A            some set (calligraphic ABCDEFGHIJKLMNOPQRSTUVWXYZ)
F(C ∋A)      smallest face (133) that contains element A of set C
G(K)         generators (2.8.1.2) of set K ; any collection of points and directions
             whose hull constructs K
A −1         inverse of matrix A
A †          Moore-Penrose pseudoinverse of matrix A
√            positive square root
A 1/2 and √A   A 1/2 is any matrix such that A 1/2 A 1/2 =A .
             For A ∈ S n + , √A ∈ S n + is unique and √A √A =A . [38,1.2]
◦√D          ∆= [√d ij ] (1002). Hadamard positive square root: D = ◦√D ◦ ◦√D .
A ij         ij th element of matrix A
E ij         member of standard orthonormal basis for symmetric (49) or
             symmetric hollow (63) matrices
E            elementary matrix
λ i (X)      i th entry of vector λ is function of X
λ(X) i       i th entry of vector-valued function of X
A(:,i)       i th column of matrix A [93,1.1.8]
A(j,:)       j th row of matrix A
A i:j,k:l    or A(i:j, k:l) , submatrix taken from i through j th row and k
             through l th column
e.g.         exempli gratia, from the Latin meaning for sake of example
no.          number, from the Latin numero
a.i.         affinely independent
c.i.         conically independent
l.i.         linearly independent
w.r.t        with respect to
a.k.a        also known as
Re           real part
Im           imaginary part
⊂ ⊃ ∩ ∪      standard set theory: subset, superset, intersection, union
∈            membership, element belongs to, or element is a member of
∋            membership, contains as in C ∋ y (C contains element y); also read
             such that
∃            there exists
∴            therefore
∀            for all
∝            proportional to
∞            infinity
≡            equivalent to
∆=           defined equal to
≈            approximately equal to
≃            isomorphic to or with
≅            congruent to or with
◦            Hadamard product of matrices
x/y          Hadamard quotient as in, for x,y ∈ R n , x/y ∆= [x i /y i , i=1... n]∈ R n
⊗            Kronecker product of matrices
×            Cartesian product
⊕            vector sum of sets X = Y ⊕ Z where every element x∈X has unique
             expression x = y + z where y ∈ Y and z ∈ Z ; [202, p.19] then the
             summands are algebraic complements. X = Y ⊕ Z ⇒ X = Y + Z .
             Now assume Y and Z are nontrivial subspaces. X = Y + Z ⇒
             X = Y ⊕ Z ⇔ Y ∩ Z =0 [203,1.2] [64,5.8]. Each element from
             a vector sum (+) of subspaces has a unique representation (⊕) when a
             basis from each subspace is linearly independent of bases from all the
             other subspaces.
⊖            vector difference of sets
⊞            orthogonal vector sum of sets X = Y ⊞ Z where every element x∈X
             has unique orthogonal expression x = y + z where y ∈ Y , z ∈ Z ,
             and y ⊥ z . [218, p.51] X = Y ⊞ Z ⇒ X = Y + Z . If Z ⊆ Y ⊥ then
             X = Y ⊞ Z ⇔ X = Y ⊕ Z . [64,5.8] If Z = Y ⊥ then the summands
             are orthogonal complements.
±            plus or minus
⊥            as in A ⊥ B meaning A is orthogonal to B (and vice versa), where
             A and B are sets, vectors, or matrices. When A and B are
             vectors (or matrices under Frobenius norm), A ⊥ B ⇔ 〈A,B〉 = 0
             ⇔ ‖A + B‖ 2 = ‖A‖ 2 + ‖B‖ 2
\            as in \A means logical not A , or relative complement of set A ; e.g.,
             B\A = {x∈B | x /∈A}
⇒ or ⇐       sufficiency or necessity, implies; e.g., A ⇒ B ⇔ \A ⇐ \B
⇔            if and only if (iff) or corresponds to or necessary and sufficient or
             the same as
is           as in A is B means A ⇒ B ; conventional usage of English language
             by mathematicians
⇏            does not imply
→            goes to, or maps to
↓            goes to from above; e.g., above might mean positive in some context
             [129, p.2]


←            is replaced with; substitution
|            as in f(x) | x∈ C means with the condition(s) or such that or
             evaluated for, or as in {f(x) | x∈ C} means evaluated for each and
             every x belonging to set C
g| x p       expression g evaluated at x p
:            as in f : R n → R m meaning f is a mapping, or sequence of successive
             integers specified by bounds as in i:j (if j < i then sequence is
             descending)
f : M → R    meaning f is a mapping from ambient space M to ambient R , not
             necessarily denoting either domain or range
[A, B ]      closed interval or line segment between A and B in R
(A, B)       open interval between A and B in R , or variable pair perhaps of
             disparate dimension
A, B         as in, for example, A, B ∈ S N means A ∈ S N and B ∈ S N
( )          hierarchal, parenthetical, optional
{ }          curly braces denote a set or list, e.g., {Xa | a ≽ 0} the set of all Xa
             for each and every a ≽ 0 where membership of a to some space is
             implicit, a union
〈 〉          angle brackets denote vector inner-product
[ ]          matrix or vector, or quote insertion
[d ij ]      matrix whose ij th entry is d ij
[x i ]       vector whose i th entry is x i
x p          particular value of x
x 0          particular instance of x , or initial value of a sequence x i
x 1          first entry of vector x , or first element of a set or list {x i }
x ⋆          optimal value of variable x
x ∗          complex conjugate
f ∗          convex conjugate function
ˆf           real function (one-dimensional)
K            cone
K ∗          dual cone
K ◦          polar cone; K ∗ = −K ◦
P C x or Px  projection of point x on set C , P is operator or idempotent matrix
P k x        projection of point x on set C k
δ(A)         (A.1) vector made from the main diagonal of A if A is a matrix;
             otherwise, diagonal matrix made from vector A
δ 2 (A)      ≡ δ(δ(A)). For vector or diagonal matrix Λ , δ 2 (Λ) = Λ
δ(A) 2       = δ(A)δ(A) where A is a vector
λ(A)         vector of eigenvalues of matrix A , typically arranged in nonincreasing
             order
σ(A)         vector of singular values of matrix A (always arranged in
             nonincreasing order), or support function
π(γ)         permutation function (or presorting function) arranges vector γ into
             nonincreasing order
ψ(a)         signum-like step function (919) (1205)
V            N ×N symmetric elementary, auxiliary, projector, geometric centering
             matrix, R(V )= N(1 T ) , N(V )= R(1) , V 2 =V
V N          N ×N −1 Schoenberg auxiliary matrix, R(V N )= N(1 T ) ,
             N(V N T )= R(1)
V X          V X V X T ≡ V T X T XV (755)
X            point list (having cardinality N) arranged columnar in R n×N , or set
             of generators, or extreme directions, or matrix variable
G            Gram matrix X T X
D            symmetric hollow matrix of distance-square, or Euclidean distance
             matrix, or EDM candidate
D(X)         Euclidean distance matrix operator
D T (X)      adjoint operator
D(X) T       transpose of D(X)
D −1 (X)     inverse operator
D(X) −1      inverse of D(X)
D ⋆          optimal value of variable D
D ∗          dual to variable D
D ◦          polar variable D
∂            partial derivative or matrix of distance-square squared or, as in ∂K ,
             boundary of set K
∂y           partial differential of y
V            geometric centering operator, V(D)= −V DV (1/2)
V N          V N (D)= −V N T DV N
r            affine dimension
n            dimension of list X , or integer
N            cardinality of list X , or integer
dom          function domain
on           function f(x) on A means A is domf , or projection of x on A
             means A is Euclidean body on which projection of x is made
onto         function f(x) maps onto M means f over its domain is a surjection
             with respect to M
epi          function epigraph
span         as in spanA = R(A) = {Ax | x∈ R n }
R(A)         the subspace, range of A , or span basis R(A) ; R(A) ⊥ N(A T )
basis R(A)   columnar basis for range of A , or a minimal set constituting
             generators for the vertex-description of R(A) , or a linearly
             independent set of vectors spanning R(A)
N(A)         the subspace, nullspace of A ; N(A) ⊥ R(A T )
ı or j       √−1
[R m ; R n ] R m × R n = R m+n
R m×n        Euclidean vector space of m by n dimensional real matrices
R n          Euclidean n-dimensional real vector space
C n or C n×n Euclidean complex vector space of respective dimension n and n×n
R n + or R n×n +   nonnegative orthant in Euclidean vector space of respective
             dimension n and n×n
R n − or R n×n −   nonpositive orthant in Euclidean vector space of respective
             dimension n and n×n
S n          subspace comprising all (real) symmetric n×n matrices, the
             symmetric matrix subspace
S n⊥         orthogonal complement of S n in R n×n , the antisymmetric matrices
S n +        convex cone comprising all (real) symmetric positive semidefinite
             n×n matrices, the positive semidefinite cone
int S n +    interior of convex cone comprising all (real) symmetric positive
             semidefinite n×n matrices; id est, positive definite matrices
S n +(ρ)     convex set of all positive semidefinite n×n matrices whose rank
             equals or exceeds ρ


EDM^N    cone of N×N Euclidean distance matrices in the symmetric hollow subspace
√EDM^N    nonconvex cone of N×N Euclidean absolute distance matrices in the symmetric hollow subspace
PSD    positive semidefinite
SDP    semidefinite program
EDM    Euclidean distance matrix
S^n_1    subspace comprising all symmetric n×n matrices having all zeros in first row and column (1604)
S^n_h    subspace comprising all symmetric hollow n×n matrices (0 main diagonal), the symmetric hollow subspace (55)
S^{n⊥}_h    orthogonal complement of S^n_h in S^n (56), the set of all diagonal matrices
S^n_c    subspace comprising all geometrically centered symmetric n×n matrices; geometric center subspace S^N_c ≜ {Y ∈ S^N | Y1 = 0} (1600)
S^{n⊥}_c    orthogonal complement of S^n_c in S^n (1602)
R^{m×n}_c    subspace comprising all geometrically centered m×n matrices
X^⊥    basis N(Xᵀ)
x^⊥    N(xᵀ), {y | xᵀy = 0}
R(P)^⊥    N(Pᵀ)
R^⊥    orthogonal complement of R ⊆ R^n; R^⊥ ≜ {y ∈ R^n | 〈x, y〉 = 0 ∀x ∈ R}
K^⊥    normal cone
K_{M+}    monotone nonnegative cone
K_M    monotone cone
K*_{λδ}    cone of majorization
H    halfspace
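
Membership to the symmetric hollow subspace S^N_h and to the geometric center subspace S^N_c can be tested directly; a minimal sketch (mine, not the book's code):

    % Hedged sketch: S^N_h has zero main diagonal; S^N_c = {Y | Y*1 = 0}.
    N = 4;
    Y  = randn(N); Y = (Y + Y')/2;      % a symmetric matrix
    Yh = Y - diag(diag(Y));             % member of S^N_h (diagonal zeroed)
    V  = eye(N) - ones(N)/N;
    Yc = V*Y*V;                         % member of S^N_c ...
    norm(Yc*ones(N,1))                  % ... since Yc*1 = 0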


∂H    hyperplane; id est, partial boundary of halfspace
∂H    supporting hyperplane
∂H_−    a supporting hyperplane having outward-normal with respect to set it supports
∂H_+    a supporting hyperplane having inward-normal with respect to set it supports
d    vector of distance-square
d̲_ij    lower bound on distance-square d_ij
d̄_ij    upper bound on distance-square d_ij
AB    closed line segment AB
C̄    closure of set C
decomposition    orthonormal (1512) page 562, biorthogonal (1489) page 556
expansion    orthogonal (1522) page 565, biorthogonal (328) page 173
cubix    member of R^{M×N×L}
quartix    member of R^{M×N×L×K}
feasible set    most simply, the set of all variable values satisfying all constraints of an optimization problem
solution set    most simply, the set of all optimal solutions to an optimization problem; a subset of the feasible set and not necessarily a single point
natural order    with reference to stacking columns in a vectorization, means a vector made from superposing column 1 on top of column 2 then superposing the result on column 3 and so on; as in a vector made from entries of the main diagonal δ(A), means taken from left to right and top to bottom
g′    first derivative of possibly multidimensional function with respect to real argument
g′′    second derivative with respect to real argument


→Y dg    first directional derivative of possibly multidimensional function g in direction Y ∈ R^{K×L} (maintains dimensions of g)
→Y dg²    second directional derivative of g in direction Y
∇    gradient from calculus; ∇_X is gradient with respect to X, ∇² is second-order gradient
∆    distance scalar, or matrix of absolute distance, or difference operator, or diagonal matrix
△_ijk    triangle made by vertices i, j, and k
√d_ij    (absolute) distance scalar
d_ij    distance-square scalar, EDM entry
I    identity matrix
I    index set, a discrete set of indices
∅    the empty set is an implicit member of every set
0    real zero
0    vector or matrix of zeros
O    order of magnitude of information required or computational intensity: O(N) is first order, O(N²) is second, and so on
1    real one
1    vector of ones
e_i    vector whose i th entry is 1 (otherwise 0), or i th member of the standard basis for R^n
tight    with reference to a bound, means a bound that can be met; with reference to an inequality, means equality is achievable
max    maximum [129, §0.1.1]
maximize_x    find the maximum with respect to variable x
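
The first directional derivative can be checked against a finite difference; e.g., for g(X) = tr(XᵀX) the derivative in direction Y is 2 tr(XᵀY). A sketch of mine, not the book's:

    % Hedged sketch: numeric vs. analytic first directional derivative.
    X = randn(3,4); Y = randn(3,4); t = 1e-6;
    numeric  = (trace((X+t*Y)'*(X+t*Y)) - trace(X'*X))/t;
    analytic = 2*trace(X'*Y);
    abs(numeric - analytic)            % small, on the order of t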


arg    argument of operator or function, or variable of optimization
sup    supremum, least upper bound [129, §0.1.1]
arg sup f(x)    argument x at supremum of function f; not necessarily a member of function domain
subject to    specifies constraints to an optimization problem
min    minimum [129, §0.1.1]
minimize_x    find the minimum with respect to variable x
inf    infimum, greatest lower bound [129, §0.1.1]
arg inf f(x)    argument x at infimum of function f; not necessarily a member of function domain
iff    if and only if, necessary and sufficient; also the meaning indiscriminately attached to appearance of the word "if" in the statement of a mathematical definition, an esoteric practice worthy of abolition because of the ambiguity thus conferred
rel int    relative interior
sgn    signum function
round    round to nearest integer
mod    modulus function
tr    matrix trace
rank    as in rank A, rank of matrix A; dim R(A)
dim    dimension; dim R^n = n, dim(x ∈ R^n) = n, dim R(x ∈ R^n) = 1, dim R(A ∈ R^{m×n}) = rank(A)
aff    affine hull
conv    convex hull


cenv    convex envelope
cone    conic hull
content    content of high-dimensional bounded polyhedron, volume in 3 dimensions, area in 2, and so on
cof    matrix of cofactors corresponding to matrix argument
dist    distance between point or set arguments
vec    vectorization of m×n matrix, Euclidean dimension mn
svec    vectorization of symmetric n×n matrix, Euclidean dimension n(n+1)/2
dvec    vectorization of symmetric hollow n×n matrix, Euclidean dimension n(n−1)/2
(x, y)    angle between vectors x and y, or dihedral angle between affine subsets
≽    generalized inequality; e.g., A ≽ 0 means vector or matrix A must be expressible in a biorthogonal expansion having nonnegative coordinates with respect to extreme directions of some implicit pointed closed convex cone K, or comparison to the origin with respect to some implicit pointed closed convex cone, or, when K = S^n_+, matrix A belongs to the positive semidefinite cone of symmetric matrices
≻    strict generalized inequality
⊁    not positive definite
≥    scalar inequality, greater than or equal to; comparison of scalars, or entrywise comparison of vectors or matrices with respect to R_+
nonnegative    for α ∈ R, α ≥ 0
>    greater than
positive    for α ∈ R, α > 0
⌊ ⌋    floor function, ⌊x⌋ is greatest integer not exceeding x
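
svec reaches the quoted dimension n(n+1)/2 by flattening one triangle of a symmetric matrix; a common convention, assumed in this sketch of mine (the book may weight differently), scales off-diagonal entries by √2 so that 〈A, B〉 = svec(A)ᵀ svec(B):

    % Hedged sketch of svec under an assumed sqrt(2) off-diagonal weighting.
    n = 4;
    A = randn(n); A = (A + A')/2;
    B = randn(n); B = (B + B')/2;
    W = sqrt(2)*ones(n) + (1 - sqrt(2))*eye(n);   % weights: 1 on diagonal
    M = tril(true(n));                            % lower-triangular mask
    sA = A(M).*W(M);  sB = B(M).*W(M);            % svec(A), svec(B)
    abs(sA'*sB - trace(A*B))                      % inner product preserved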


| |    entrywise absolute value of scalars, vectors, and matrices
det    matrix determinant
‖x‖    vector 2-norm or Euclidean norm ‖x‖_2
‖x‖_l = ( Σ_{j=1}^n |x_j|^l )^{1/l}    vector l-norm
‖x‖_∞ = max{|x_j| ∀ j}    infinity-norm
‖x‖_2² = xᵀx = 〈x, x〉
‖x‖_1 = 1ᵀ|x|    1-norm, dual infinity-norm
‖x‖_0    0-norm, cardinality of vector x, 0⁰ ≜ 0
‖X‖_2 = sup_{‖a‖=1} ‖Xa‖_2 = σ_1 = √λ(XᵀX)_1    matrix 2-norm (spectral norm), largest singular value
‖X‖ = ‖X‖_F    Frobenius' matrix norm
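
Each of these norms corresponds to a Matlab built-in, so the definitions are directly checkable (a sketch assuming stock Matlab only):

    % Hedged sketch: the norm definitions above vs. Matlab built-ins.
    x = randn(5,1); X = randn(4,3);
    abs(norm(x,1)     - sum(abs(x)))          % 1-norm
    abs(norm(x,inf)   - max(abs(x)))          % infinity-norm
    abs(norm(x)^2     - x'*x)                 % 2-norm squared = <x,x>
    abs(norm(X,2)     - max(svd(X)))          % spectral norm = sigma_1
    abs(norm(X,'fro') - sqrt(trace(X'*X)))    % Frobenius norm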


Bibliography

[1] Suliman Al-Homidan and Henry Wolkowicz. Approximate and exact completion problems for Euclidean distance matrices using semidefinite programming, April 2004. orion.math.uwaterloo.ca/~hwolkowi/henry/reports/edmapr04.ps .
[2] Abdo Y. Alfakih. On the uniqueness of Euclidean distance matrix completions. Linear Algebra and its Applications, 370:1–14, 2003.
[3] Abdo Y. Alfakih. On the uniqueness of Euclidean distance matrix completions: the case of points in general position. Linear Algebra and its Applications, 397:265–277, 2005.
[4] Abdo Y. Alfakih, Amir Khandani, and Henry Wolkowicz. Solving Euclidean distance matrix completion problems via semidefinite programming. Computational Optimization and Applications, 12(1):13–30, January 1999. http://citeseer.ist.psu.edu/alfakih97solving.html .
[5] Abdo Y. Alfakih and Henry Wolkowicz. On the embeddability of weighted graphs in Euclidean spaces. Research Report CORR 98-12, Department of Combinatorics and Optimization, University of Waterloo, May 1998. Erratum: p.260 herein. http://citeseer.ist.psu.edu/alfakih98embeddability.html .
[6] Abdo Y. Alfakih and Henry Wolkowicz. Matrix completion problems. In Henry Wolkowicz, Romesh Saigal, and Lieven Vandenberghe, editors, Handbook of Semidefinite Programming: Theory, Algorithms, and Applications, chapter 18. Kluwer, 2000.


[7] Abdo Y. Alfakih and Henry Wolkowicz. Two theorems on Euclidean distance matrices and Gale transform. Linear Algebra and its Applications, 340:149–154, 2002.
[8] Farid Alizadeh. Combinatorial Optimization with Interior Point Methods and Semi-Definite Matrices. PhD thesis, University of Minnesota, Minneapolis, Computer Science Department, October 1991.
[9] Farid Alizadeh. Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM Journal on Optimization, 5(1):13–51, February 1995.
[10] Kurt Anstreicher and Henry Wolkowicz. On Lagrangian relaxation of quadratic matrix constraints. SIAM Journal on Matrix Analysis and Applications, 22(1):41–55, 2000.
[11] Howard Anton. Elementary Linear Algebra. Wiley, second edition, 1977.
[12] James Aspnes, David Goldenberg, and Yang Richard Yang. On the computational complexity of sensor network localization. In Proceedings of the First International Workshop on Algorithmic Aspects of Wireless Sensor Networks: ALGOSENSORS 2004, volume 3121 of Lecture Notes in Computer Science, pages 32–44, Turku, Finland, July 2004. Springer-Verlag. cs-www.cs.yale.edu/homes/aspnes/localization-abstract.html .
[13] D. Avis and K. Fukuda. A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra. Discrete and Computational Geometry, 8:295–313, 1992.
[14] Mihály Bakonyi and Charles R. Johnson. The Euclidean distance matrix completion problem. SIAM Journal on Matrix Analysis and Applications, 16(2):646–654, April 1995.
[15] Keith Ball. An elementary introduction to modern convex geometry. In Silvio Levy, editor, Flavors of Geometry, volume 31, chapter 1, pages 1–58. MSRI Publications, 1997. http://www.msri.org/publications/books/Book31/files/ball.pdf .


[16] George Phillip Barker. Theory of cones. Linear Algebra and its Applications, 39:263–291, 1981.
[17] George Phillip Barker and David Carlson. Cones of diagonally dominant matrices. Pacific Journal of Mathematics, 57(1):15–32, 1975.
[18] George Phillip Barker and James Foran. Self-dual cones in Euclidean spaces. Linear Algebra and its Applications, 13:147–155, 1976.
[19] Alexander Barvinok. A Course in Convexity. American Mathematical Society, 2002.
[20] Alexander I. Barvinok. Problems of distance geometry and convex properties of quadratic maps. Discrete & Computational Geometry, 13(2):189–202, 1995.
[21] Alexander I. Barvinok. A remark on the rank of positive semidefinite matrices subject to affine constraints. Discrete & Computational Geometry, 25(1):23–31, 2001. http://citeseer.ist.psu.edu/304448.html .
[22] Heinz H. Bauschke and Jonathan M. Borwein. On projection algorithms for solving convex feasibility problems. SIAM Review, 38(3):367–426, September 1996.
[23] Steven R. Bell. The Cauchy Transform, Potential Theory, and Conformal Mapping. CRC Press, 1992.
[24] Jean Bellissard and Bruno Iochum. Homogeneous and facially homogeneous self-dual cones. Linear Algebra and its Applications, 19:1–16, 1978.
[25] Adi Ben-Israel. Linear equations and inequalities on finite dimensional, real or complex, vector spaces: A unified theory. Journal of Mathematical Analysis and Applications, 27:367–389, 1969.
[26] Aharon Ben-Tal and Arkadi Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM, 2001.


[27] Abraham Berman. Cones, Matrices, and Mathematical Programming, volume 79 of Lecture Notes in Economics and Mathematical Systems. Springer-Verlag, 1973.
[28] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, second edition, 1999.
[29] Dimitri P. Bertsekas, Angelia Nedić, and Asuman E. Ozdaglar. Convex Analysis and Optimization. Athena Scientific, 2003.
[30] Rajendra Bhatia. Matrix Analysis. Springer-Verlag, 1997.
[31] Pratik Biswas, Tzu-Chen Liang, Ta-Chung Wang, and Yinyu Ye. Semidefinite programming based algorithms for sensor network localization. http://www.stanford.edu/~yyye/combined_rev3.pdf , 2005.
[32] Pratik Biswas and Yinyu Ye. Semidefinite programming for ad hoc wireless sensor network localization. http://www.stanford.edu/~yyye/adhocn2.pdf , 2003.
[33] Leonard M. Blumenthal. On the four-point property. Bulletin of the American Mathematical Society, 39:423–426, 1933.
[34] Leonard M. Blumenthal. Theory and Applications of Distance Geometry. Oxford University Press, 1953.
[35] A. W. Bojanczyk and A. Lutoborski. The Procrustes problem for orthogonal Stiefel matrices. SIAM Journal on Scientific Computing, 21(4):1291–1304, February 1998. Date is electronic publication: http://citeseer.ist.psu.edu/500185.html .
[36] Ingwer Borg and Patrick Groenen. Modern Multidimensional Scaling. Springer-Verlag, 1997.
[37] Jonathan M. Borwein and Heinz Bauschke. Projection algorithms and monotone operators. http://oldweb.cecm.sfu.ca/personal/jborwein/projections4.pdf , 1998.
[38] Jonathan M. Borwein and Adrian S. Lewis. Convex Analysis and Nonlinear Optimization: Theory and Examples. Springer-Verlag, 2000.


[39] Stephen Boyd and Jon Dattorro. Alternating projections, 2003. http://www.stanford.edu/class/ee392o/alt_proj.pdf .
[40] Stephen Boyd, Laurent El Ghaoui, Eric Feron, and Venkataramanan Balakrishnan. Linear Matrix Inequalities in System and Control Theory. SIAM, 1994.
[41] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004. http://www.stanford.edu/~boyd/cvxbook .
[42] James P. Boyle and Richard L. Dykstra. A method for finding projections onto the intersection of convex sets in Hilbert spaces. In R. Dykstra, T. Robertson, and F. T. Wright, editors, Advances in Order Restricted Statistical Inference, pages 28–47. Springer-Verlag, 1986.
[43] Lev M. Brègman. The method of successive projection for finding a common point of convex sets. Soviet Mathematics, 162(3):487–490, 1965. AMS translation of Doklady Akademii Nauk SSSR, 6:688-692.
[44] Lev M. Brègman, Yair Censor, Simeon Reich, and Yael Zepkowitz-Malachi. Finding the projection of a point onto the intersection of convex sets via projections onto halfspaces. http://www.optimization-online.org/DB_FILE/2003/06/669.pdf , 2003.
[45] Mike Brookes. Matrix reference manual: Matrix calculus. http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html , 2002.
[46] Michael Carter, Holly Jin, Michael Saunders, and Yinyu Ye. SPASELOC: An adaptive subproblem algorithm for scalable wireless sensor network localization. SIAM Journal on Optimization, 2006. www.stanford.edu/~dattorro/JinSIAM.pdf .
[47] Yves Chabrillac and Jean-Pierre Crouzeix. Definiteness and semidefiniteness of quadratic forms revisited. Linear Algebra and its Applications, 63:283–292, 1984.
[48] Chi-Tsong Chen. Linear System Theory and Design. Oxford University Press, 1999.


[49] Ward Cheney and Allen A. Goldstein. Proximity maps for convex sets. Proceedings of the American Mathematical Society, 10:448–450, 1959.
[50] Steven Chu. Autobiography from Les Prix Nobel, 1997. http://nobelprize.org/physics/laureates/1997/chu-autobio.html .
[51] John B. Conway. A Course in Functional Analysis. Springer-Verlag, second edition, 1990.
[52] Richard W. Cottle, Jong-Shi Pang, and Richard E. Stone. The Linear Complementarity Problem. Academic Press, 1992.
[53] G. M. Crippen and T. F. Havel. Distance Geometry and Molecular Conformation. Wiley, 1988.
[54] Frank Critchley. Multidimensional scaling: a critical examination and some new proposals. PhD thesis, University of Oxford, Nuffield College, 1980.
[55] Frank Critchley. On certain linear mappings between inner-product and squared-distance matrices. Linear Algebra and its Applications, 105:91–107, 1988.
[56] Joachim Dahl, Bernard H. Fleury, and Lieven Vandenberghe. Approximate maximum-likelihood estimation using semidefinite programming. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume VI, pages 721–724, April 2003. www.ee.ucla.edu/faculty/papers/vandenberghe_ICASSP03_apr03.pdf .
[57] George B. Dantzig. Linear Programming and Extensions. Princeton University Press, 1998.
[58] Jon Dattorro. Constrained least squares fit of a filter bank to an arbitrary magnitude frequency response, 1991. http://www.stanford.edu/~dattorro/Hearing.htm .
[59] Joel Dawson, Stephen Boyd, Mar Hershenson, and Thomas Lee. Optimal allocation of local feedback in multistage amplifiers via geometric programming. IEEE Circuits and Systems I: Fundamental Theory and Applications, 2001.


[60] Jan de Leeuw. Fitting distances by least squares. UCLA Statistics Series Technical Report No. 130, Interdivisional Program in Statistics, UCLA, Los Angeles, CA, 1993. gifi.stat.ucla.edu/work/courses/240/mds/distancia.pdf .
[61] Jan de Leeuw. Multidimensional scaling. In International Encyclopedia of the Social & Behavioral Sciences. Elsevier, 2001. http://preprints.stat.ucla.edu/274/274.pdf .
[62] Jan de Leeuw. Unidimensional scaling. In Brian S. Everitt and David C. Howell, editors, Encyclopedia of Statistics in Behavioral Science, volume 4, pages 2095–2097. Wiley, 2005. http://repositories.cdlib.org/uclastat/papers/2004032701/ .
[63] Jan de Leeuw and Willem Heiser. Theory of multidimensional scaling. In P. R. Krishnaiah and L. N. Kanal, editors, Handbook of Statistics, volume 2, chapter 13, pages 285–316. North-Holland Publishing, Amsterdam, 1982.
[64] Frank Deutsch. Best Approximation in Inner Product Spaces. Springer-Verlag, 2001.
[65] Frank Deutsch and Hein Hundal. The rate of convergence of Dykstra's cyclic projections algorithm: The polyhedral case. Numerical Functional Analysis and Optimization, 15:537–565, 1994.
[66] Michel Marie Deza and Monique Laurent. Geometry of Cuts and Metrics. Springer-Verlag, 1997.
[67] Carolyn Pillers Dobler. A matrix approach to finding a set of generators and finding the polar (dual) of a class of polyhedral cones. SIAM Journal on Matrix Analysis and Applications, 15(3):796–803, July 1994. Erratum: p.192 herein.
[68] Elizabeth D. Dolan, Robert Fourer, Jorge J. Moré, and Todd S. Munson. Optimization on the NEOS server. SIAM News, 35(6):4,8,9, August 2002.
[69] Bruce Randall Donald. 3-D structure in chemistry and molecular biology, 1998. http://www.cs.dartmouth.edu/~brd/Teaching/Bio/ .


[70] David L. Donoho, Michael Elad, and Vladimir Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise, February 2004. StableSparse-Donoho-etal.pdf .
[71] Richard L. Dykstra. An algorithm for restricted least squares regression. Journal of the American Statistical Association, 78(384):837–842, 1983.
[72] Carl Eckart and Gale Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, September 1936. http://www.stanford.edu/~dattorro/eckart&young.1936.pdf .
[73] Alan Edelman, Tomás A. Arias, and Steven T. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
[74] John J. Edgell. Graphics calculator applications on 4-D constructs. http://archives.math.utk.edu/ICTCM/EP-9/C47/pdf/paper.pdf , 1996.
[75] Ivar Ekeland and Roger Témam. Convex Analysis and Variational Problems. SIAM, 1999.
[76] J. Farkas. Theorie der einfachen ungleichungen. Journal für die Reine und Angewandte Mathematik, 124:1–27, 1902.
[77] Maryam Fazel. Matrix Rank Minimization with Applications. PhD thesis, Stanford University, Department of Electrical Engineering, March 2002. http://www.cds.caltech.edu/~maryam/thesis-final.pdf .
[78] Maryam Fazel, Haitham Hindi, and Stephen P. Boyd. A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the American Control Conference, volume 6, pages 4734–4739. American Automatic Control Council (AACC), June 2001. http://www.cds.caltech.edu/~maryam/nucnorm.html .
[79] Maryam Fazel, Haitham Hindi, and Stephen P. Boyd. Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. In Proceedings of the American Control Conference. American Automatic Control Council (AACC), June 2003. http://www.cds.caltech.edu/~maryam/acc03_final.pdf .
[80] César Fernández, Thomas Szyperski, Thierry Bruyère, Paul Ramage, Egon Mösinger, and Kurt Wüthrich. NMR solution structure of the pathogenesis-related protein P14a. Journal of Molecular Biology, 266:576–593, 1997.
[81] Richard Phillips Feynman, Robert B. Leighton, and Matthew L. Sands. The Feynman Lectures on Physics: Commemorative Issue, volume I. Addison-Wesley, 1989.
[82] Paul Finsler. Über das Vorkommen definiter und semidefiniter Formen in Scharen quadratischer Formen. Commentarii Mathematici Helvetici, 9:188–192, 1937.
[83] Anders Forsgren, Philip E. Gill, and Margaret H. Wright. Interior methods for nonlinear optimization. SIAM Review, 44(4):525–597, 2002.
[84] Norbert Gaffke and Rudolf Mathar. A cyclic projection algorithm via duality. Metrika, 36:29–54, 1989.
[85] Jérôme Galtier. Semi-definite programming as a simple extension to linear programming: convex optimization with queueing, equity and other telecom functionals. In 3ème Rencontres Francophones sur les Aspects Algorithmiques des Télécommunications (AlgoTel). INRIA, France, 2001. www-sop.inria.fr/mascotte/Jerome.Galtier/misc/Galtier01b.pdf .
[86] Laurent El Ghaoui and Silviu-Iulian Niculescu, editors. Advances in Linear Matrix Inequality Methods in Control. SIAM, 2000.
[87] Philip E. Gill, Walter Murray, and Margaret H. Wright. Numerical Linear Algebra and Optimization, volume 1. Addison-Wesley, 1991.
[88] Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization. Academic Press, 1999.
[89] James Gleik. Isaac Newton. Pantheon Books, 2003.


[90] W. Glunt, Tom L. Hayden, S. Hong, and J. Wells. An alternating projection algorithm for computing the nearest Euclidean distance matrix. SIAM Journal on Matrix Analysis and Applications, 11(4):589–600, 1990.
[91] K. Goebel and W. A. Kirk. Topics in Metric Fixed Point Theory. Cambridge University Press, 1990.
[92] D. Goldfarb and K. Scheinberg. Interior point trajectories in semidefinite programming. SIAM Journal on Optimization, 8(4):871–886, 1998.
[93] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins, third edition, 1996.
[94] P. Gordan. Ueber die auflösung linearer gleichungen mit reellen coefficienten. Mathematische Annalen, 6:23–28, 1873.
[95] John Clifford Gower. Euclidean distance geometry. The Mathematical Scientist, 7:1–14, 1982.
[96] John Clifford Gower. Properties of Euclidean and non-Euclidean distance matrices. Linear Algebra and its Applications, 67:81–97, 1985.
[97] John Clifford Gower and Garmt B. Dijksterhuis. Procrustes Problems. Oxford University Press, 2004.
[98] John Clifford Gower and David J. Hand. Biplots. Chapman & Hall, 1996.
[99] Alexander Graham. Kronecker Products and Matrix Calculus: with Applications. Wiley, 1981.
[100] Michael Grant, Stephen Boyd, and Yinyu Ye. cvx: Matlab software for disciplined convex programming. http://www.stanford.edu/~boyd/cvx/ , 2006. A symbolic front-end for SeDuMi and other optimization software.
[101] Robert M. Gray. Toeplitz and circulant matrices: A review. http://www-ee.stanford.edu/~gray/toeplitz.pdf , 2002.


[102] T. N. E. Greville. Note on the generalized inverse of a matrix product. SIAM Review, 8:518–521, 1966.
[103] Rémi Gribonval and Morten Nielsen. Highly sparse representations from dictionaries are unique and independent of the sparseness measure, October 2003. http://www.math.auc.dk/research/reports/R-2003-16.pdf .
[104] Rémi Gribonval and Morten Nielsen. Sparse representations in unions of bases. IEEE Transactions on Information Theory, 49(12):1320–1325, December 2003. http://www.math.auc.dk/~mnielsen/papers.htm .
[105] Peter Gritzmann and Victor Klee. On the complexity of some basic problems in computational convexity: II. Volume and mixed volumes. Technical Report TR:94-31, DIMACS, Rutgers University, 1994. dimacs/TechnicalReports/TechReports/1994/94-31.ps .
[106] Peter Gritzmann and Victor Klee. On the complexity of some basic problems in computational convexity: II. Volume and mixed volumes. In T. Bisztriczky, P. McMullen, R. Schneider, and A. Ivić Weiss, editors, Polytopes: Abstract, Convex and Computational, pages 373–466. Kluwer Academic Publishers, 1994.
[107] L. G. Gubin, B. T. Polyak, and E. V. Raik. The method of projections for finding the common point of convex sets. U.S.S.R. Computational Mathematics and Mathematical Physics, 7(6):1–24, 1967.
[108] Osman Güler and Yinyu Ye. Convergence behavior of interior-point algorithms. Mathematical Programming, 60(2):215–228, 1993.
[109] P. R. Halmos. Positive approximants of operators. Indiana University Mathematics Journal, 21:951–960, 1972.
[110] Shih-Ping Han. A successive projection method. Mathematical Programming, 40:1–14, 1988.
[111] Godfrey H. Hardy, John E. Littlewood, and George Pólya. Inequalities. Cambridge University Press, second edition, 1952.


[112] Arash Hassibi and Mar Hershenson. Automated optimal design of switched-capacitor filters. Design Automation and Test in Europe Conference, 2001.
[113] Timothy F. Havel and Kurt Wüthrich. An evaluation of the combined use of nuclear magnetic resonance and distance geometry for the determination of protein conformations in solution. Journal of Molecular Biology, 182:281–294, 1985.
[114] Tom L. Hayden and Jim Wells. Approximation by matrices positive semidefinite on a subspace. Linear Algebra and its Applications, 109:115–130, 1988.
[115] Tom L. Hayden, Jim Wells, Wei-Min Liu, and Pablo Tarazaga. The cone of distance matrices. Linear Algebra and its Applications, 144:153–169, 1991.
[116] Uwe Helmke and John B. Moore. Optimization and Dynamical Systems. Springer-Verlag, 1994.
[117] Bruce Hendrickson. Conditions for unique graph realizations. SIAM Journal on Computing, 21(1):65–84, February 1992.
[118] T. Herrmann, Peter Güntert, and Kurt Wüthrich. Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. Journal of Molecular Biology, 319(1):209–227, May 2002.
[119] Mar Hershenson. Design of pipeline analog-to-digital converters via geometric programming. International Conference on Computer Aided Design - ICCAD, 2002.
[120] Mar Hershenson. Efficient description of the design space of analog circuits. 40th Design Automation Conference, 2003.
[121] Mar Hershenson, Stephen Boyd, and Thomas Lee. Optimal design of a CMOS OpAmp via geometric programming. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2001.
[122] Mar Hershenson, Dave Colleran, Arash Hassibi, and Navraj Nandra. Synthesizable full custom mixed-signal IP. Electronics Design Automation Consortium - EDA, 2002.


[123] Mar Hershenson, Sunderarajan S. Mohan, Stephen Boyd, and Thomas Lee. Optimization of inductor circuits via geometric programming, 1999.
[124] Mar Hershenson and Xiling Shen. A new analog design flow. Cadence, 2002.
[125] Nick Higham. Matrix Procrustes problems. http://www.ma.man.ac.uk/~higham/talks/ , 1995. Lecture notes.
[126] Richard D. Hill and Steven R. Waters. On the cone of positive semidefinite matrices. Linear Algebra and its Applications, 90:81–88, 1987.
[127] Jean-Baptiste Hiriart-Urruty. Ensembles de Tchebychev vs. ensembles convexes: l'état de la situation vu via l'analyse convexe non lisse. Annales des Sciences Mathématiques du Québec, 22(1):47–62, 1998. Association Mathématique du Québec, www.lacim.uqam.ca/~annales/volumes/22-1/PDF/47-62.pdf .
[128] Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods. Springer-Verlag, second edition, 1996.
[129] Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Fundamentals of Convex Analysis. Springer-Verlag, 2001.
[130] Alan J. Hoffman and Helmut W. Wielandt. The variation of the spectrum of a normal matrix. Duke Mathematical Journal, 20:37–40, 1953.
[131] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1987.
[132] Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1994.
[133] Alston S. Householder. The Theory of Matrices in Numerical Analysis. Dover, 1975.
[134] http://www.polariswireless.com/ . Polaris Wireless, 2005.


[135] Hong-Xuan Huang, Zhi-An Liang, and Panos M. Pardalos. Some properties for the Euclidean distance matrix and positive semidefinite matrix completion problems. Journal of Global Optimization, 25(1):3–21, 2003. http://portal.acm.org/citation.cfm?id=607918 .
[136] 5W Infographic. Wireless 911. Technology Review, 107(5):78–79, June 2004. http://www.technologyreview.com/ .
[137] Nathan Jacobson. Lectures in Abstract Algebra, vol. II - Linear Algebra. Van Nostrand, 1953.
[138] Viren Jain and Lawrence K. Saul. Exploratory analysis and visualization of speech and music by locally linear embedding. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 984–987, 2004. http://www.cis.upenn.edu/~lsaul/papers/lle_icassp04.pdf .
[139] Joakim Jaldén, Cristoff Martin, and Björn Ottersten. Semidefinite programming for detection in linear systems − Optimality conditions and space-time decoding. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume VI, April 2003. http://www.s3.kth.se/signal/reports/03/IR-S3-SB-0309.pdf .
[140] Florian Jarre. Convex analysis on symmetric matrices. In Henry Wolkowicz, Romesh Saigal, and Lieven Vandenberghe, editors, Handbook of Semidefinite Programming: Theory, Algorithms, and Applications, chapter 2. Kluwer, 2000.
[141] Charles R. Johnson and Pablo Tarazaga. Connections between the real positive semidefinite and distance matrix completion problems. Linear Algebra and its Applications, 223/224:375–391, 1995.
[142] George B. Thomas, Jr. Calculus and Analytic Geometry. Addison-Wesley, fourth edition, 1972.
[143] Thomas Kailath. Linear Systems. Prentice-Hall, 1980.
[144] Tosio Kato. Perturbation Theory for Linear Operators. Springer-Verlag, 1966.


[145] Paul J. Kelly and Norman E. Ladd. Geometry. Scott, Foresman and Company, 1965.
[146] Ron Kimmel. Numerical Geometry of Images: Theory, Algorithms, and Applications. Springer-Verlag, 2003.
[147] Erwin Kreyszig. Introductory Functional Analysis with Applications. Wiley, 1989.
[148] Jean B. Lasserre. A new Farkas lemma for positive semidefinite matrices. IEEE Transactions on Automatic Control, 40(6):1131–1133, June 1995.
[149] Jean B. Lasserre and Eduardo S. Zeron. A Laplace transform algorithm for the volume of a convex polytope. Journal of the Association for Computing Machinery, 48(6):1126–1140, November 2001.
[150] Jean B. Lasserre and Eduardo S. Zeron. A new algorithm for the volume of a convex polytope. arXiv.org, June 2001. http://arxiv.org/PS_cache/math/pdf/0106/0106168.pdf .
[151] Monique Laurent. A connection between positive semidefinite and Euclidean distance matrix completion problems. Linear Algebra and its Applications, 273:9–22, 1998.
[152] Monique Laurent. A tour d'horizon on positive semidefinite and Euclidean distance matrix completion problems. In Panos M. Pardalos and Henry Wolkowicz, editors, Topics in Semidefinite and Interior-Point Methods, pages 51–76. American Mathematical Society, 1998.
[153] Monique Laurent. Matrix completion problems. In Christodoulos A. Floudas and Panos M. Pardalos, editors, Encyclopedia of Optimization, volume III (Interior - M), pages 221–229. Kluwer, 2001. http://homepages.cwi.nl/~monique/files/opt.ps .
[154] Monique Laurent and Svatopluk Poljak. On a positive semidefinite relaxation of the cut polytope. Linear Algebra and its Applications, 223/224:439–461, 1995.


[155] Monique Laurent and Svatopluk Poljak. On the facial structure of the set of correlation matrices. SIAM Journal on Matrix Analysis and Applications, 17(3):530–547, July 1996.
[156] Monique Laurent and Franz Rendl. Semidefinite programming and integer programming. Optimization Online, 2002. http://www.optimization-online.org/DB_HTML/2002/12/585.html .
[157] Charles L. Lawson and Richard J. Hanson. Solving Least Squares Problems. SIAM, 1995.
[158] Jung Rye Lee. The law of cosines in a tetrahedron. Journal of the Korea Society of Mathematical Education Series B: The Pure and Applied Mathematics, 4(1):1–6, 1997.
[159] Vladimir L. Levin. Quasi-convex functions and quasi-monotone operators. Journal of Convex Analysis, 2(1/2):167–172, 1995.
[160] Adrian S. Lewis. Eigenvalue-constrained faces. Linear Algebra and its Applications, 269:159–181, 1998.
[161] Miguel Sousa Lobo, Lieven Vandenberghe, Stephen Boyd, and Hervé Lebret. Applications of second-order cone programming. Linear Algebra and its Applications, 284:193–228, November 1998. Special Issue on Linear Algebra in Control, Signals and Image Processing. http://www.stanford.edu/~boyd/socp.html .
[162] David G. Luenberger. Optimization by Vector Space Methods. Wiley, 1969.
[163] David G. Luenberger. Introduction to Dynamic Systems: Theory, Models, & Applications. Wiley, 1979.
[164] David G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, second edition, 1989.
[165] Zhi-Quan Luo, Jos F. Sturm, and Shuzhong Zhang. Superlinear convergence of a symmetric primal-dual path following algorithm for semidefinite programming. SIAM Journal on Optimization, 8(1):59–81, 1998.


[166] K. V. Mardia. Some properties of classical multi-dimensional scaling. Communications in Statistics: Theory and Methods, A7(13):1233–1241, 1978.
[167] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press, 1979.
[168] Jerrold E. Marsden and Michael J. Hoffman. Elementary Classical Analysis. Freeman, second edition, 1995.
[169] Rudolf Mathar. The best Euclidean fit to a given distance matrix in prescribed dimensions. Linear Algebra and its Applications, 67:1–6, 1985.
[170] Rudolf Mathar. Multidimensionale Skalierung. B. G. Teubner Stuttgart, 1997.
[171] Mehran Mesbahi and G. P. Papavassilopoulos. On the rank minimization problem over a positive semi-definite linear matrix inequality. IEEE Transactions on Automatic Control, 42(2):239–243, 1997.
[172] Sunderarajan S. Mohan, Mar Hershenson, Stephen Boyd, and Thomas Lee. Simple accurate expressions for planar spiral inductances. IEEE Journal of Solid-State Circuits, 1999.
[173] Sunderarajan S. Mohan, Mar Hershenson, Stephen Boyd, and Thomas Lee. Bandwidth extension in CMOS with optimized on-chip inductors. IEEE Journal of Solid-State Circuits, 2000.
[174] E. H. Moore. On the reciprocal of the general algebraic matrix. Bulletin of the American Mathematical Society, 26:394–395, 1920. Abstract.
[175] B. S. Mordukhovich. Maximum principle in the problem of time optimal response with nonsmooth constraints. Journal of Applied Mathematics and Mechanics, 40:960–969, 1976.
[176] Jean-Jacques Moreau. Décomposition orthogonale d'un espace Hilbertien selon deux cônes mutuellement polaires. Comptes Rendus de l'Académie des Sciences, Paris, 255:238–240, 1962.


[177] T. S. Motzkin and I. J. Schoenberg. The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6:393–404, 1954.
[178] Neil Muller, Lourenço Magaia, and B. M. Herbst. Singular value decomposition, eigenfaces, and 3D reconstructions. SIAM Review, 46(3):518–545, September 2004.
[179] Katta G. Murty and Feng-Tien Yu. Linear Complementarity, Linear and Nonlinear Programming. Heldermann Verlag, internet edition, 1988. linear_complementarity_webbook/ .
[180] Navraj Nandra. Synthesizable analog IP. IP Based Design Workshop, 2002.
[181] Stephen G. Nash and Ariela Sofer. Linear and Nonlinear Programming. McGraw-Hill, 1996.
[182] Yurii Nesterov and Arkadii Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. SIAM, 1994.
[183] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer-Verlag, 1999.
[184] Royal Swedish Academy of Sciences. Nobel prize in chemistry, 2002. http://nobelprize.org/chemistry/laureates/2002/public.html .
[185] C. S. Ogilvy. Excursions in Geometry. Dover, 1990. Citation: Proceedings of the CUPM Geometry Conference, Mathematical Association of America, No.16 (1967), p.21.
[186] Brad Osgood. Notes on the Ahlfors mapping of a multiply connected domain, 2000. www-ee.stanford.edu/~osgood/Ahlfors-Bergman-Szego.pdf .
[187] M. L. Overton and R. S. Womersley. On the sum of the largest eigenvalues of a symmetric matrix. SIAM Journal on Matrix Analysis and Applications, 13:41–45, 1992.
[188] Pythagoras Papadimitriou. Parallel Solution of SVD-Related Problems, With Applications. PhD thesis, Department of Mathematics, University of Manchester, October 1993.


[189] Panos M. Pardalos and Henry Wolkowicz, editors. Topics in Semidefinite and Interior-Point Methods. American Mathematical Society, 1998.
[190] Gábor Pataki. Cone-LP's and semidefinite programs: Geometry and a simplex-type method. In William H. Cunningham, S. Thomas McCormick, and Maurice Queyranne, editors, Integer Programming and Combinatorial Optimization, Proceedings of the 5th International IPCO Conference, Vancouver, British Columbia, Canada, June 3-5, 1996, volume 1084 of Lecture Notes in Computer Science, pages 162–174. Springer-Verlag, 1996.
[191] Gábor Pataki. On the rank of extreme matrices in semidefinite programs and the multiplicity of optimal eigenvalues. Mathematics of Operations Research, 23(2):339–358, 1998. http://citeseer.ist.psu.edu/pataki97rank.html . Erratum: p.516 herein.
[192] Gábor Pataki. The geometry of semidefinite programming. In Henry Wolkowicz, Romesh Saigal, and Lieven Vandenberghe, editors, Handbook of Semidefinite Programming: Theory, Algorithms, and Applications, chapter 3. Kluwer, 2000.
[193] Teemu Pennanen and Jonathan Eckstein. Generalized Jacobians of vector-valued convex functions. Technical Report RRR 6-97, RUTCOR, Rutgers University, May 1997. rutcor.rutgers.edu/pub/rrr/reports97/06.ps .
[194] R. Penrose. A generalized inverse for matrices. In Proceedings of the Cambridge Philosophical Society, volume 51, pages 406–413, 1955.
[195] Chris Perkins. A convergence analysis of Dykstra's algorithm for polyhedral sets. SIAM Journal on Numerical Analysis, 40(2):792–804, 2002.
[196] Florian Pfender and Günter M. Ziegler. Kissing numbers, sphere packings, and some unexpected proofs. Notices of the American Mathematical Society, 51(8):873–883, September 2004.
[197] Alex Reznik. Problem 3.41, from Sergio Verdú. Multiuser Detection. Cambridge University Press, 1998. www.ee.princeton.edu/~verdu/mud/solutions/3/3.41.areznik.pdf , 2001.
[198] Sara Robinson. Hooked on meshing, researcher creates award-winning triangulation program. SIAM News, 36(9), November 2003. http://siam.org/siamnews/11-03/meshing.pdf .
[199] R. Tyrrell Rockafellar. Conjugate Duality and Optimization. SIAM, 1974.
[200] R. Tyrrell Rockafellar. Lagrange multipliers in optimization. SIAM-American Mathematical Society Proceedings, 9:145–168, 1976.
[201] R. Tyrrell Rockafellar. Lagrange multipliers and optimality. SIAM Review, 35(2):183–238, June 1993.
[202] R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1997. First published in 1970.
[203] C. K. Rushforth. Signal restoration, functional analysis, and Fredholm integral equations of the first kind. In Henry Stark, editor, Image Recovery: Theory and Application, chapter 1, pages 1–27. Academic Press, 1987.
[204] Peter Santos. Application platforms and synthesizable analog IP. Embedded Systems Conference, 2003.
[205] Shankar Sastry. Nonlinear Systems: Analysis, Stability, and Control. Springer-Verlag, 1999.
[206] Uwe Schäfer. A linear complementarity problem with a P-matrix. SIAM Review, 46(2):189–201, June 2004.
[207] I. J. Schoenberg. Remarks to Maurice Fréchet's article "Sur la définition axiomatique d'une classe d'espace distanciés vectoriellement applicable sur l'espace de Hilbert". Annals of Mathematics, 36(3):724–732, July 1935. http://www.stanford.edu/~dattorro/Schoenberg1.pdf .
[208] I. J. Schoenberg. Metric spaces and positive definite functions. Transactions of the American Mathematical Society, 44:522–536, 1938.


[209] Peter H. Schönemann. A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31(1):1–10, 1966.
[210] Peter H. Schönemann, Tim Dorcey, and K. Kienapple. Subadditive concatenation in dissimilarity judgements. Perception and Psychophysics, 38:1–17, 1985.
[211] Joshua A. Singer. Log-Penalized Linear Regression. PhD thesis, Stanford University, Department of Electrical Engineering, June 2004.
[212] Anthony Man-Cho So and Yinyu Ye. Theory of semidefinite programming for sensor network localization, 2004. http://www.stanford.edu/~yyye/local-theory.pdf .
[213] Anthony Man-Cho So and Yinyu Ye. A semidefinite programming approach to tensegrity theory and realizability of graphs, 2005. http://www.stanford.edu/~yyye/d-real-soda2.pdf .
[214] Henry Stark, editor. Image Recovery: Theory and Application. Academic Press, 1987.
[215] Henry Stark. Polar, spiral, and generalized sampling and interpolation. In Robert J. Marks II, editor, Advanced Topics in Shannon Sampling and Interpolation Theory, chapter 6, pages 185–218. Springer-Verlag, 1993.
[216] Willi-Hans Steeb. Matrix Calculus and Kronecker Product with Applications and C++ Programs. World Scientific Publishing Co., 1997.
[217] Gilbert W. Stewart and Ji-guang Sun. Matrix Perturbation Theory. Academic Press, 1990.
[218] Josef Stoer and Christoph Witzgall. Convexity and Optimization in Finite Dimensions I. Springer-Verlag, 1970.
[219] Gilbert Strang. Course 18.06: Linear algebra. http://web.mit.edu/18.06/www , 2004.
[220] Gilbert Strang. Linear Algebra and its Applications. Harcourt Brace, third edition, 1988.


[221] Gilbert Strang. Calculus. Wellesley-Cambridge Press, 1992.
[222] Gilbert Strang. Introduction to Linear Algebra. Wellesley-Cambridge Press, second edition, 1998.
[223] S. Straszewicz. Über exponierte Punkte abgeschlossener Punktmengen. Fund. Math., 24:139–143, 1935.
[224] Jos F. Sturm. SeDuMi (self-dual minimization). http://sedumi.mcmaster.ca/ , 2005. Software for optimization over symmetric cones.
[225] Jos F. Sturm and Shuzhong Zhang. On cones of nonnegative quadratic functions. Optimization Online, April 2001. http://www.optimization-online.org/DB_HTML/2001/05/324.html .
[226] George P. H. Styan. A review and some extensions of Takemura's generalizations of Cochran's theorem. Technical Report 56, Stanford University, Department of Statistics, September 1982.
[227] George P. H. Styan and Akimichi Takemura. Rank additivity and matrix polynomials. Technical Report 57, Stanford University, Department of Statistics, September 1982.
[228] Chen Han Sung and Bit-Shun Tam. A study of projectionally exposed cones. Linear Algebra and its Applications, 139:225–252, 1990.
[229] Yoshio Takane. On the relations among four methods of multidimensional scaling. Behaviormetrika, 4:29–43, 1977. http://takane.brinkster.net/Yoshio/p008.pdf .
[230] Akimichi Takemura. On generalizations of Cochran's theorem and projection matrices. Technical Report 44, Stanford University, Department of Statistics, August 1980.
[231] Peng Hui Tan and Lars K. Rasmussen. The application of semidefinite programming for detection in CDMA. IEEE Journal on Selected Areas in Communications, 19(8), August 2001.
[232] Pablo Tarazaga. Faces of the cone of Euclidean distance matrices: Characterizations, structure and induced geometry. Linear Algebra and its Applications, 408:1–13, 2005.


[233] Warren S. Torgerson. Theory and Methods of Scaling. Wiley, 1958.
[234] Lloyd N. Trefethen and David Bau, III. Numerical Linear Algebra. SIAM, 1997.
[235] Michael W. Trosset. Applications of multidimensional scaling to molecular conformation. Computing Science and Statistics, 29:148–152, 1998.
[236] Michael W. Trosset. Distance matrix completion by numerical optimization. Computational Optimization and Applications, 17(1):11–22, October 2000.
[237] Michael W. Trosset. Extensions of classical multidimensional scaling: Computational theory. www.math.wm.edu/~trosset/r.mds.html , 2001. Revision of technical report entitled "Computing distances between convex sets and subsets of the positive semidefinite matrices" first published in 1997.
[238] Michael W. Trosset and Rudolf Mathar. On the existence of nonglobal minimizers of the STRESS criterion for metric multidimensional scaling. In Proceedings of the Statistical Computing Section, pages 158–162. American Statistical Association, 1997.
[239] Jan van Tiel. Convex Analysis, an Introductory Text. Wiley, 1984.
[240] Lieven Vandenberghe and Stephen Boyd. Semidefinite programming. SIAM Review, 38(1):49–95, March 1996.
[241] Lieven Vandenberghe and Stephen Boyd. Connections between semi-infinite and semidefinite programming. In R. Reemtsen and J.-J. Rückmann, editors, Semi-Infinite Programming, chapter 8, pages 277–294. Kluwer Academic Publishers, 1998.
[242] Lieven Vandenberghe and Stephen Boyd. Applications of semidefinite programming. Applied Numerical Mathematics, 29(3):283–299, March 1999.
[243] Lieven Vandenberghe, Stephen Boyd, and Shao-Po Wu. Determinant maximization with linear matrix inequality constraints. SIAM Journal on Matrix Analysis and Applications, 19(2):499–533, April 1998.


[244] Robert J. Vanderbei. Convex optimization: Interior-point methods and applications, 1999. www.sor.princeton.edu/~rvdb/pdf/talks/pumath/talk.pdf .
[245] Richard S. Varga. Geršgorin and His Circles. Springer-Verlag, 2004.
[246] Martin Vetterli and Jelena Kovačević. Wavelets and Subband Coding. Prentice Hall, 1995.
[247] È. B. Vinberg. The theory of convex homogeneous cones. Transactions of the Moscow Mathematical Society, 12:340–403, 1963. American Mathematical Society and London Mathematical Society joint translation, 1965.
[248] Marie A. Vitulli. A brief history of linear algebra and matrix theory. darkwing.uoregon.edu/~vitulli/441.sp04/LinAlgHistory.html , 2004.
[249] John von Neumann. Functional Operators, Volume II: The Geometry of Orthogonal Spaces. Princeton University Press, 1950. Reprinted from mimeographed lecture notes first distributed in 1933.
[250] Roger Webster. Convexity. Oxford University Press, 1994.
[251] Kilian Q. Weinberger and Lawrence K. Saul. Unsupervised learning of image manifolds by semidefinite programming. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 988–995, 2004. http://www.cis.upenn.edu/~lsaul/papers/sde_cvpr04.pdf .
[252] Eric W. Weisstein. MathWorld – A Wolfram Web Resource. http://mathworld.wolfram.com/search/ , 2004.
[253] Bernard Widrow and Samuel D. Stearns. Adaptive Signal Processing. Prentice-Hall, 1985.
[254] Norbert Wiener. On factorization of matrices. Commentarii Mathematici Helvetici, 29:97–111, 1955.
[255] Michael P. Williamson, Timothy F. Havel, and Kurt Wüthrich. Solution conformation of proteinase inhibitor IIA from bull seminal plasma by 1H nuclear magnetic resonance and distance geometry. Journal of Molecular Biology, 182:295–315, 1985.
[256] Willie W. Wong. Cayley-Menger determinant and generalized N-dimensional Pythagorean theorem. http://www.princeton.edu/~wwong/papers/gp-r.pdf , November 2003. Application of Linear Algebra: Notes on Talk given to Princeton University Math Club.
[257] William Wooton, Edwin F. Beckenbach, and Frank J. Fleming. Modern Analytic Geometry. Houghton Mifflin, 1975.
[258] Margaret H. Wright. The interior-point revolution in optimization: History, recent developments, and lasting consequences. Bulletin of the American Mathematical Society, 42(1):39–56, January 2005.
[259] Stephen J. Wright. Primal-Dual Interior-Point Methods. SIAM, 1997.
[260] Shao-Po Wu. max-det Programming with Applications in Magnitude Filter Design. A dissertation submitted to the department of Electrical Engineering, Stanford University, December 1997.
[261] Shao-Po Wu and Stephen Boyd. sdpsol: A parser/solver for semidefinite programming and determinant maximization problems with matrix structure. User's guide. http://www.stanford.edu/~boyd/SDPSOL.html , 1996.
[262] Shao-Po Wu and Stephen Boyd. sdpsol: A parser/solver for semidefinite programs with matrix structure. In Laurent El Ghaoui and Silviu-Iulian Niculescu, editors, Advances in Linear Matrix Inequality Methods in Control, chapter 4, pages 79–91. SIAM, 2000. http://www.stanford.edu/~boyd/sdpsol.html .
[263] Shao-Po Wu, Stephen Boyd, and Lieven Vandenberghe. FIR filter design via spectral factorization and convex optimization. http://www.stanford.edu/~boyd/reports/magdes.pdf , 1997.
[264] David D. Yao, Shuzhong Zhang, and Xun Yu Zhou. Stochastic linear-quadratic control via primal-dual semidefinite programming. SIAM Review, 46(1):87–111, March 2004. Erratum: p.197 herein.


[265] Yinyu Ye. Semidefinite programming for Euclidean distance geometric optimization. Lecture notes, 2003. http://www.stanford.edu/class/ee392o/EE392o-yinyu-ye.pdf .
[266] Yinyu Ye. Convergence behavior of central paths for convex homogeneous self-dual cones, 1996. http://www.stanford.edu/~yyye/yyye/ye.ps .
[267] Yinyu Ye. Interior Point Algorithms: Theory and Analysis. Wiley, 1997.
[268] D. C. Youla. Mathematical theory of image restoration by the method of convex projection. In Henry Stark, editor, Image Recovery: Theory and Application, chapter 2, pages 29–77. Academic Press, 1987.
[269] Fuzhen Zhang. Matrix Theory: Basic Results and Techniques. Springer-Verlag, 1999.
[270] Günter M. Ziegler. Kissing numbers: Surprises in dimension four. Emissary, pages 4–5, Spring 2004. http://www.msri.org/publications/emissary/ .


Index0 eigenvalues theorem, 4842 lattice, 398, 4073 lattice, 399, 4084 lattice, 400, 4095 lattice, 401, 410active, 395adaptive, 395adjacency, 61adjoint operation, 59affine, 196dimension, 50, 68hull, 50, 68independence, 80preservation, 81map, 81set, 50as hyperplane intersection, 88as span of nullspace basis, 89subset, 68affinely independent, 81affinity, 50algebralinear, 451algebraiccomplement, 95, 147multiplicity, 477Alizadeh, 386alternating projection, 233, 599alternation, 303alternative, 159ambient, 345, 358vector space, 48anchor, 237, 244, 395antisymmetric, 62antihollow, 65artificial intelligence, 39axioms of the metric, 219axis of revolution, 121, 507Barvinok, 47, 111, 131base, 95station, 245basis, 64, 648nonorthogonalprojection, 567orthogonalprojection, 567bees, 43bijection, 116, 508bijective, 59isometry, 62bijectivityGram form, 257inner-product form, 261biorthogonal, 171, 556expansion, 147, 173biorthogonality, 174, 496condition, 169blunt, 616Boolean, 379bound681


682 INDEXlower, 418polyhedral, 317boundary, 52, 94, 337and extreme, 108classical, 52cone, 103conventional, 95criterion, 376PSD cone, 116bounded, 70bowl, 535, 536bridge, 419calculusmatrix, 525Carathéodory’s theorem, 157, 578cardinality, 224, 402minimum, 379Cayley-Menger, 267, 288, 308, 446cellular, 246telephone network, 245centerof gravity, 255of mass, 255central path, 371certificatenull, 159chain rule, 531two argument, 531chop(), 625Chu, 11circulant, 212circular cone, 100circumhypersphere, 267closure, 51cofactor, 309comparable points, 102comparison, 70orthant, 164complementalgebraicprojection on, 560complementarity, 378linear, 189maximal, 366, 371problem, 189complementaryslackness condition, 377completiongeometry, 321problem, 220, 275, 282, 311semidefinite, 610composition, 216EDM, 283concave, 193concavity, 193conditionoptimality, 377cone, 72, 95, 98, 653blade, 156circular, 121, 122right, 121convex, 99positive semidefinite, 112dual, 147, 148, 189, 191, 292algorithm, 176construction, 149, 150examples, 154formula, 163, 176properties, 152unique, 148, 168EDM, 313, 315, 316, 318boundary, 329construction, 330convexity, 319decomposition, 354


INDEX 683dual, 345, 346, 356, 360extreme direction, 332face, 332, 334projection, 432PSD, 335, 342halfspace, 151invariance, 100membership, 112monotone, 181–183nonnegative, 180nonconvex, 96–99normal, 185, 339, 593, 613, 614elliptope, 615orthant, 187orthogonal, 155pointed, 100, 171, 184closed convex, 101, 136polyhedral, 105polar, 95, 148, 339polyhedral, 140dual, 165, 168, 169halfspace description, 140majorization, 448pointed, 168proper, 135positive semidefinite, 111, 349boundary, 342circular, 121face, 341inscription, 125inverse image, 115optimization, 370polyhedral, 124rank, 368visualization, 367proper, 102, 189recession, 151self dual, 166simplicial, 69, 144, 145, 170dual, 170, 177spectral, 287, 425dual, 292orthant, 427tangent, 339unique, 111, 148, 315congruence, 286congruent, 256conicindependence, 41preservation, 136problem, 365sectioncircular cone, 124conically independent, 133conici(), 627conservation of dimension, 266constraintequality, 161Gram, 231inequality, 394polynomial, 449rank, 395, 424sort, 301content, 309, 653contour plot, 186, 206convergence, 302, 405, 599geometric, 601convexcombination, 48cone, 57envelope, 244, 403, 436equivalent, 404form, 363function, 193geometry, 47hull, 70


684 INDEXof outer product, 72polyhedron, 70, 139halfspace description, 139vertices, 141problem, 152set, 48Schur form, 114strictly, 195convexity, 193, 253first order, 203, 207, 208, 210second order, 208, 211coordinate, 176Crippen & Havel, 248criteriamatrix, 268cubix, 526, 529, 650curvature, 213decimate(), 632decomposition, 650dyad, 489eigen, 476orthonormal, 562, 564SVD, 479compact, 479full, 480subcompact, 480dense, 93derivativedirectional, 207, 525, 532, 535second, 537table, 544descriptionconversion, 147halfspace, 76dual cone, 162vertex, 74, 168of halfspace, 138determinant, 305, 474, 550diagonal, 451, 511dominance, 127diagonalizable, 171, 476simultaneously, 378, 463diagonalization, 53and expansion, 171symmetric, 478difference, 56PSD matrices, 131vector, 57dihedral, 308dilation, 299dimension, 50affine, 38, 67, 263, 265, 275, 353low, 241minimization, 436, 444reduction, 311spectral projection, 445versus rank, 265direction, 95, 104exposed, 108extreme, 104, 169PSD cone, 113distanceorigin to hyperplane, 79doublet, 255, 497range, 497dualaffine dimension, 353cone, 142feasible set, 372norm, 156problem, 151projection, 586PSD cone, 164duality gap, 152dvec, 66


Dx(), 626
dyad, 489, 492
    independence, 494, 495
    negative, 492
    projector, 556
    range, 493
    sum
        range of, 496
    symmetry, 494
Dykstra algorithm, 599
edge, 91
EDM, 359
    closest, 423
    composition, 617
    cone, 225, 315
        projection dual, 443
    construction, 328
    criterion, 344
        dual, 352
    definition, 224, 326, 340
        Gram form, 227, 337
        inner-product form, 250
        relative-angle form, 252
    dual, 353
    exponential entry, 617
    graph, 221, 234, 241, 242, 322
    projection, 343
    range, 327
    unique, 282, 294, 295, 424
eigen, 476, 511
    decomposition, 476
    matrix, 112, 477
    value, 267, 460
        EDM, 286, 299
        interlaced, 273, 286
        intertwined, 273
        minimum, 283
        ordered, 289
        unique, 121, 476
        zero, 484
    vector
        distinct, 477
        left, 476
        unique, 477
elbow, 417, 418
element
    minimal, 102
    minimum, 102
elementary matrix, 498
ellipsoids, 52
elliptope, 237, 277, 278, 285, 337, 338, 384
embedding, 263
    dimension, 68
empty, 51
    interior, 51
    set, 51
epigraph, 200, 210
    form, 472
Euclidean
    distance matrix, 42
    norm, 654
    projection, 129
expansion, 173, 650
    biorthogonal, 147, 169, 173, 569
        as projection, 568
        EDM, 332
        unique, 171, 173, 176, 333, 569
    implied by diagonalization, 171
    w.r.t orthant, 173
exponential, 551
exposed, 90
    direction, 108
    extreme, 92, 139
    face, 91, 93
    point, 93, 108
        density, 93
extreme, 90
    and boundary, 108
    direction, 104, 109
        distinct, 104
        unique, 104
    exposed, 92, 139
    point, 90, 405
    ray, 104
face, 53, 91
    algebra, 94
    transitivity, 94
face recognition, 39
facet, 93
Fantope, 71, 72
Farkas’ lemma, 157
    definite, 374
    semidefinite, 373
fat, 641
    full-rank, 88
fifth Euclidean metric property, 306, 311
finitely generated, 140
Finsler criterion, 344
flared horn, 99
floor, 653
Forsgren & Gill, 363
Fréchet differential, 533
Frobenius
    norm, 61
        minimum, 582
full, 51
    dimensional, 51
function
    affine, 196, 202
        supremum, 199
    convex, 193
        differentiable, 211
        matrix, 209
    monotone, 193
    objective, 86, 152, 161, 185, 187
        linear, 198
        nonlinear, 198
    quadratic, 204
    quasiconvex, 213
        differentiable, 215
    step, 483
        matrix, 388
    support, 198
    vector, 194
fundamental
    test, 455
    theorem algebra, 476
Gâteaux differential, 533
Gale matrix, 241
generalized inequality, 41, 102
generating list, 70
generator, 107
geometric
    center, 229, 255, 295
        subspace, 350, 583
    centering
        matrix, 500
        operator, 259
    Hahn-Banach theorem, 76
    multiplicity, 477
    realizability, 232
gimbal, 506
global positioning system, 39, 395
Gower, 217
gradient, 186, 198, 201, 525, 542
    derivative, 541
    first order, 541
    product, 529
    second order, 542, 543
    table, 544
Gram matrix, 227
halfline, 95
halfspace, 74, 75, 151
    description, 69, 76
hand
    off, 246
    over, 246
Hardy-Littlewood-Pólya, 426
Hayden & Wells, 313, 413
Hermitian matrix, 455
Hessian, 525
hexagon, 232
homogeneity, 225
honeycomb, 43
Horn & Johnson, 456, 457
hull, 67, 73
    affine, 67
        unique, 68
    conic, 72
    convex, 67, 70, 218
        of outer product, 72
        unique, 70
hyperdimensional, 530
hyperdisc, 536
hyperplane, 74, 75, 78, 202
    independent, 88
    movement, 77
    separating, 86
    supporting, 84
        unique, 84, 203
    tangent, 84
    vertex description, 80
hypersphere, 72, 279
ice cream cone, 100
idempotent, 555, 559
    symmetric, 561, 564
iff, 652
image
    inverse, 58
indefinite, 285
independence
    affine, 80, 134
    conic, 133, 134, 136, 138
        preservation, 136
    linear, 49, 134
        preservation, 49
inequality
    generalized, 147, 157
        dual PSD cone, 164
    linear
        matrix, 166
    spectral, 287
    triangle, 270
        unique, 304
infima, 509
inflection, 208
injective, 62
injectivity, 257
inner product
    vectorized matrix, 59
input error, 416
interior, 51
interior-point method, 236, 355, 366
interpoint angle matrix, 227
intersection, 56
    cone, 100
    hyperplane with convex set, 82
    line with boundary, 53
    of subspaces, 90
    planes, 89
    PSD cone
        affine, 131
        tangential, 54
invariance, 254
    Gram form, 255
    inner-product form, 256
    translation, 254
invariant set, 284
inversion
    Gram form, 259
inward normal, 74
is, 644
isedm(), 621
isomorphic, 61, 62, 332
    isometrically, 53, 61
isomorphism, 61
    isometric, 62
        symmetric matrix subspace, 63
isotonic reconstruction, 299
iterate, 599
iteration, 599
iterative, 395
K-convexity, 194
Karhunen–Loève transform, 293
kissing number, 234
Kronecker, 546
    product, 530
Lagrange multiplier, 161
law of cosines, 250
Legendre-Fenchel transform, 436
Lemaréchal, 35
line, 202
    tangential, 55
linear
    function, 196
    matrix inequality, 41, 167, 372, 373, 376
    program, 85, 234, 365
localization, 39, 239, 411
    sensor network, 395
    standardized test, 397
    unique, 37, 238, 241, 396, 397
log det, 540, 549
logarithm, 551
Lorentz cone, 100
lp(), 629
machine learning, 39
majorization, 454
    symmetric hollow, 454
manifold, 39, 40, 118, 320, 321, 418, 448, 517
map
    isotonic, 299
    linear
        of cone, 136
    USA, 42, 297, 630
mapusa(), 630
mater, 526
Matlab, 297, 621
    lp(), 627
matrix, 526
    auxiliary, 500, 504
        orthonormal, 503
        projector, 500
        Schoenberg, 502
        table, 504
    bordered, 286
    calculus, 525
    correlation, 278
    direction, 404, 405
    elementary, 498
        range, 498
    Euclidean distance, 217
    exponential, 212, 551
    Gale, 241
    geometric centering, 500
    Gram, 227
    Householder, 499
        auxiliary, 501
    inverse, 212
    measurement, 414
    normal, 61, 478
    orthogonal, 505
    positive semidefinite, 455
        from extreme directions, 120
    product, 212, 457, 462, 529
        zero, 487
    rotation, 508
    simple, 491
    sort index, 298
    square root, 478
    step function, 388
    symmetric, 62
        subspace of, 62
    unitary, 505
maximal complementarity, 366
membership relation, 157
    discretized, 162, 357
    in subspace, 178
metric property, 219
    fifth, 219, 222, 223, 304
minimal
    element, 102
    generating set, 137
    set, 80, 572
minimax problem, 151
minimization
    on unit cube, 85
minimum
    cardinality, 379
    element, 102
    global
        unique, 193, 536
    unique, 196
molecular conformation, 38, 39, 248
monotonicity
    Fejér, 605
Muller, 483
multidimensional
    function, 193
    scaling, 39, 293
multilateration, 245, 395
multipath, 245
mutually exclusive, 160
natural order, 650
nearest neighbors, 321
neighborhood graph, 320, 321
nesting, 272
Newton, 525
nondegeneracy, 219
nonexpansive, 563, 594
nonisomorphic, 487
nonnegative, 268, 653
nonsymmetric, 460
nontrivial support, 84
nonvertical, 202
norm, 59, 654
normal, 186, 586
    cone, 185, 593, 614
        facet, 169
    matrix, 61, 478
    vector, 586
Notation, 641
nuclear magnetic resonance, 248
nullspace form, 87
offset, 254
omapusa(), 632
on, 647
one-dimensional projection, 576
onto, 647
optimal
    analytical results, 509
    solution, 196
optimality, 161, 185, 378
    conic, 187
    equality constraint, 161
    first order, 186
order of projection, 414
ordinal multidimensional scaling, 299
origin, 264, 571
Orion nebula, 36
orthant, 50, 173, 187
    nonnegative, 351
orthogonal, 59
    complement, 62
    expansion, 64
orthogonally
    equivalent, 508
    invariant, 62
orthonormal, 59, 256
    decomposition, 562
    matrix, 62
orthonormality condition, 172
outward normal, 74
parallel, 50
partial order, 101
pattern recognition, 39
Penrose conditions, 553
pentahedron, 309
perpendicular, 59, 562
perturbation, 386, 388
    optimality, 391
plane, 74
    segment, 104
point
    exposed, 92
    extreme, 92
    fixed, 602
    inflection, 208
pointed, 100, 142
polychoron, 95
polyhedral
    cone, 72, 140
polyhedron, 53, 276
    vertex description, 141
    vertices, 141
polytope, 139
positive, 653
    definite Farkas’ lemma, 374
    semidefinite, 458
        Farkas’ lemma, 373
        square root, 478
    semidefinite cone, 54, 111, 113, 350
        boundary, 127
        dimension, 119
        extreme direction, 120
        face, 119
        inscription, 125
        rank, 119
    strictly, 269
postulates, 219
primal feasible set, 372
principal
    component analysis, 293
    submatrix
        leading, 270
        theorem, 468
problem
    conic, 364
    feasibility, 189
    minimax, 151
    prevalent, 419
    proximity, 421
    stress, 420
procedure
    dyad-decomposition, 639
    rank reduction, 387
    Sturm, 639
Procrustes
    diagonal, 523
    linear program, 520
    maximization, 520
    orthogonal, 517
        two sided, 519, 521
        solution
            unique, 517
    symmetric, 522
    translation, 518
product, 56
    Cartesian, 57
    direct, 530
    Kronecker, 530
    matrix, 212, 457, 462, 529
        zero, 487
    positive definite
        nonsymmetric, 457
    pseudoinverse, 554
    semidefinite
        symmetric, 468
    tensor, 530
program
    dual, 377, 378
    geometric, 12
    semidefinite, 363, 433
projection, 340, 553
    algebra, 566
    alternating, 599, 600
        convergence, 604, 606, 608, 611
        distance, 602, 603
        Dykstra, 612, 613
        feasibility, 602–604
        on affine/psd cone, 609
        on EDM cone, 442
        on halfspaces, 601
        on orthant/hyperplane, 605, 606
        optimization, 602, 603, 612
        over/under, 607
    biorthogonal expansion, 568
    cyclic, 600
    dual
        as optimization, 587
        on cone, 589
        on convex set, 586, 588
        on EDM cone, 443
    easy, 595
    Euclidean, 129, 415, 418, 422, 424, 432, 440, 586
    minimum-distance, 559, 562, 585
    nonorthogonal, 558, 573
        eigenvalue coefficients, 575
        on dyad, 574
        on elementary matrix, 570
        on matrix, 574
    oblique, 431
    on affine, 567, 571
        vertex description, 572
    on cone, 590, 592
    on convex set, 585
        in subspace, 597, 598
    on convex sets, 600
    on EDM cone, 433, 440
    on halfspace, 573
    on hyperplane, 573
        origin, 571
    on matrix
        vectorized, 573, 581
    on nonorthogonal basis, 567
    on orthant, 593
    on orthogonal basis, 567
    on PSD cone, 117, 129, 421, 424
        rank constrained, 424
    on slab, 573
    on subspace, 591
    on truncated cone, 593
    one dimensional, 576
    order, 414
    orthogonal, 576
        eigenvalue coefficients, 578
        on dyad, 576
        on function domain, 201
        on matrix subspace, 581
        on vectorized matrix, 576
    semidefiniteness as, 579
    spectral, 425, 427
        unique, 427
    successive, 600
    two sided, 583
        range, 585
    unique, 559, 562, 582, 585, 586
projector
    biorthogonal, 565
    commutative, 599
    direction
        universal, 564
    dyad, 556
    noncommutative, 600
    nonorthogonal, 559, 566
    orthogonal, 562
    orthonormal, 565
    range, 562
    universal, 561
proper subspace, 48
proximity
    EDM, 413
        nonconvex, 428
    in spectral norm, 431
    problem, 415
    rank heuristic, 437, 439
    semidefinite program, 433, 441
        Gram form, 435
pseudoinverse, 174, 554
    product, 554
    unique, 553
pyramid, 310
quadrant, 50
quadratic
    cone, 100
    program, 302
quartix, 528, 650
quasiconcave, 128
quasiconvex, 128, 214
    function, 214
range
    form, 87
    rotation, 506
rank, 652
    heuristic, 437, 439
        log det, 438, 439
    lowest, 368
    minimization, 436
    one, 494
        modification, 498
        quasiconcavity, 128
    reduction, 366, 386, 387, 392
        procedure, 387
    trace, 438
        heuristic, 436
rank ρ subset, 118
ray, 95
    extreme, 337
Rayleigh’s quotient, 580
realizable, 36
reconstruction, 293
    isometric, 241, 297
    isotonic, 297
    unique, 221, 241, 242, 256, 257
reflection, 256
regularization, 411
relative, 51
    angle
        inequality, 222, 306
        matrix, 252
        matrix inequality, 253
    boundary, 51, 70
    interior, 51
rotation, 256
    invariant, 256
round, 652
RRf(), 635
saddle-value, 151
scaling, 293
    multidimensional, 39, 293
Schoenberg, 288, 357
    auxiliary matrix, 226
    criterion, 44, 229
Schur
    complement, 239, 403, 470, 471
    form
        anomaly, 474
        convex set, 114
        semidefinite program, 472, 594
        sparse, 471
second order
    cone, 100
    program, 366
section, 121
self
    adjoint, 451
    dual, 164
semidefinite, 460
    domain, 455
    Farkas’ lemma, 373
    program, 198, 365
        equality constraint, 112
        Gram form, 443
        Schur form, 442
    versus symmetry, 455
sensor, 237, 244
separation, 86
sequence, 605
set
    feasible, 56, 161, 185, 372, 650
    invariant, 284
    level, 186, 196
    minimal, 80, 572
    solution, 650
    sublevel, 200, 214
    superlevel, 215
shape, 38
shift, 254
SIAM, 364
signeig(), 626
simplex, 143, 144, 309
    unit, 143, 144
simplicial, 144
simultaneously diagonalizable, 378, 463
singular value problem, 431
skinny, 174, 641
slab, 49, 404
slack variable, 365
Slater’s condition, 152, 374
smooth, 277
solid, 139
solution
    numerical, 406
    set, 650
    unique, 56, 195, 196, 239, 368, 430, 433, 437, 444
sort
    index matrix, 298
    monotone nonnegative, 426
span, 648
spectral
    cone, 289
    norm, 431
    projection, 324, 425
standard basis, 144
stars, 36
steepest descent, 535
step function, 483
strict
    complementarity, 379
    positivity, 269
    triangle inequality, 272
strictly
    convex function, 195
    feasible, 374
    separating hyperplane, 86
    supporting, 84
strong
    dual, 152, 377
    duality, 378
subject to, 652
sublevel set, 200
subspace, 48
    as hyperplane intersection, 88
    as span of nullspace basis, 89
    geometric center, 258, 348, 583
        orthogonal complement, 118, 583
    hollow, 65, 258, 349
    projection, 58
    representation, 87
    symmetric hollow, 65
    tangent, 118
    translation invariant, 583
successive
    approximation, 600
    projection, 600
sum, 56
    vector, 57
    unique, 62, 147, 494
superlevel set, 215
support function, 84, 198
supporting hyperplane, 83
surjective, 62
SVD
    geometrical, 482
    pseudoinverse, 483
    symmetric, 483
svec, 63
svect(), 637
svectinv(), 638
Swiss roll, 40
Sylvester’s law of inertia, 463
symmetrized, 456
tangent, 84, 118, 204
tangential, 54, 240
Taylor series, 525, 540
tetrahedron
    angle inequality, 306
theorem
    0 eigenvalue, 484
    alternating projection
        distance, 604
    alternative, 159, 359
        semidefinite, 375
    Barvinok, 131
    Bunt-Motzkin, 586
    Carathéodory, 157, 578
    cone faces, 103
    convexity condition, 203
    decomposition
        dyad, 489
    directional derivative, 536
    discrete membership, 163
    dual cone intersection, 190
    EDM, 279
    eigenvalue
        0, 484
        of difference, 467
        order, 466
        sum, 467
    exposed, 108
    extreme existence, 91
    extremes, 106, 107
    Farkas’ lemma, 157
        definite, 374
        semidefinite, 373
    Geršgorin discs, 124
    gradient monotonicity, 207
    halfspaces, 76
    image, 58, 60
    intersection, 56
    inverse image, 58
    line, 211
    linearly independent dyads, 495
    mean value, 540
    monotone nonnegative sort, 426
    nonexpansivity, 594
    pointed cones, 101
    positive semidefinite, 459
        cone subsets, 128
        matrix sum, 127
        principal submatrix, 468
        symmetric, 469
    projection, 350
        algebraic complement, 592
        minimum-distance, 586
        on affine, 567
        on cone, 590
        via dual cone, 593
        via normal cone, 587
    projector
        rank/trace, 563
        semidefinite, 469
    proper-cone boundary, 103
    Pythagorean, 250, 598
    range of dyad sum, 496
    rank versus
        affine dimension, 267
        partitioned matrix, 472
        trace, 560
    real eigenvector, 476
    Schur, 454
    Schur form
        rank, 472
    subspace projection, 58
    Tour, 617
tight, 221, 651
torsion, 43
trace, 451, 511, 547
trajectory
    sensor, 243
transformation
    linear
        dual, 158
    rigid, 38
    similarity, 578
translation invariant, 254
    subspace, 255, 583
trefoil, 322, 323, 325
triangle, 220
triangulation, 37
trilateration, 37, 56, 238
    tandem, 244
Trosset, 321
unbounded
    below, 85
unfurling, 321
unimodal, 214
unique, 11, 102, 104, 240, 476, 509
unit simplex, 144
unitary
    linear operator, 62
    matrix, 505
USA, 296
value
    absolute, 510
    minimum
        unique, 196
    objective
        unique, 365
    singular
        largest, 596
variable
    matrix, 79
    slack, 365
vec, 59, 451
vector, 48
    dual, 613
    normal, 586
    Perron, 286
    primal, 613
vectorization, 59, 340
Venn, 416
vertex, 56, 93
    description, 69, 74
        of halfspace, 137
        polyhedral cone, 142
vertices, 52, 56
Vitulli, 491
Vm(), 625
Vn(), 625
Wüthrich, 38
weak
    alternative, 160
    duality theorem, 377
wireless
    location, 39, 395
    sensor network, 237, 244
womb, 526
zero, 484
    definite, 488
    entry, 484
    matrix product, 487
    trace, 487


A searchable pdfBook is available online from Meboo Publishing.


Convex Optimization & Euclidean Distance Geometry, Dattorro
Meboo Publishing USA

