Algorithms for Linear Least Squares problems on the Stiefel manifold


Let $\beta_i$, $i = 1, 2, \ldots$, be the positive real solutions of $P_{2t-1}(\beta) = 0$, ordered such that $\beta_i < \beta_{i+1}$. Since $s$ is a descent direction, $\beta_1$ is always a minimum or a saddle point of (8). Consider the interval $(0, \mu]$ where $\mu > 0$. The optimal step length on this interval is given by

\[ \hat{\beta} = \arg\{\min(P_{2t}(\beta_i), P_{2t}(\mu)) \;\; \forall\, \beta_i < \mu\}. \]

Typically, when considering local convergence, $\mu = 1$ is used, corresponding to a full step length.

4 The overall algorithm

To get a starting approximation $Q_0$ for the nonlinear solver, a least squares problem with an equality constraint is first solved,

\[ \tilde{Q}_0 = \arg\min_Q \{ \|F\,\mathrm{vec}(Q) - b\|_2^2, \text{ subject to } \mathrm{vec}(Q)^T \mathrm{vec}(Q) = \sqrt{n} \}. \qquad (9) \]

Since $\tilde{Q}_0 \in \mathbb{R}^{m \times n}$ does not necessarily have orthonormal columns, the OPP

\[ Q_0 = \arg\{\min_Q \|Q - \tilde{Q}_0\|_F^2, \text{ subject to } Q \in V_{m,n}\} \qquad (10) \]

is solved, yielding $Q_0 \in V_{m,n}$, which is used as the initial value for the iterative algorithm. The algorithm to compute a minimum of (1) works as follows.

Algorithm: $\min \|f(Q) - b\|_2^2$ subject to $Q \in V_{m,n}$
0. Compute $Q_0$ by solving (9) and (10).
1. $j = 0$, $\mu = 1$, $\hat{\Delta} = 10^{-10}$, $\Delta_0 = \hat{\Delta} + 1$.
2. While $\Delta_j > \hat{\Delta}$
   2.1. If $(J^T J + H)$ is positive definite
        2.1.1. compute a Newton search direction $s = s_N$ (7),
   2.2. else
        2.2.1. take a Gauss-Newton search direction $s = s_{GN}$ (6).
   2.3. Compute the optimal step length $\hat{\beta}$ on the interval $(0, \mu]$ by solving (8).
   2.4. Update $Q_{j+1} = [Q_j, (Q_j)_\perp] \exp(S(\hat{\beta} s)) I_{m,n}$.
   2.5. $j = j + 1$.
   2.6. $\Delta_{j+1} = \dfrac{\|J^T (f(Q_j) - b)\|_2}{\|J\|_2 \, \|f(Q_j) - b\|_2}$.
3. end While.

The algorithm has been implemented in MATLAB, and can be downloaded from http://www.cs.umu.se/~viklands/WOPP/index.html.
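As a rough illustration of step 0 and of the stopping measure in step 2.6, the following NumPy sketch projects a matrix onto $V_{m,n}$ and evaluates $\Delta$. It assumes that $\tilde{Q}_0$ has full column rank, in which case the minimizer of (10) is the orthogonal polar factor obtained from a thin SVD. The function names `nearest_stiefel` and `relative_gradient` are hypothetical and are not taken from the MATLAB implementation.

```python
import numpy as np

def nearest_stiefel(Q_tilde):
    # Thin SVD Q_tilde = U diag(s) V^T; the product U V^T minimizes
    # ||Q - Q_tilde||_F over matrices Q with orthonormal columns
    # (assumption: Q_tilde has full column rank).
    U, _, Vt = np.linalg.svd(Q_tilde, full_matrices=False)
    return U @ Vt

def relative_gradient(J, r):
    # Delta = ||J^T r||_2 / (||J||_2 ||r||_2), with r = f(Q) - b.
    return np.linalg.norm(J.T @ r) / (np.linalg.norm(J, 2) * np.linalg.norm(r))

# Hypothetical usage: project a random 6 x 3 matrix onto V_{6,3}.
rng = np.random.default_rng(0)
Q0 = nearest_stiefel(rng.standard_normal((6, 3)))
print(np.allclose(Q0.T @ Q0, np.eye(3)))
```

The measure $\Delta$ is scale invariant in both $J$ and the residual, which is presumably why a fixed tolerance such as $10^{-10}$ can be used across problems of different size.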

5 Computational experiments

In this section, the algorithm is tested on randomly generated problems of different dimensions, $Q \in \mathbb{R}^{m \times n}$ and $F \in \mathbb{R}^{k \times mn}$. We mainly investigate the ability to compute a minimizer and the number of iterations needed to do so, but we also present some results regarding the efficiency of computing the global minimum.

The matrix $F$ is generated as a matrix with normally distributed random numbers. Different condition numbers can then be chosen by manipulating the singular values of $F$. A random solution $\hat{Q}$ is generated, and the exact model is then $\hat{b} = F\,\mathrm{vec}(\hat{Q})$. To generate $b$, let

\[ b = \hat{b} + \gamma \bar{b}, \]

where $\bar{b}$ is a perturbation and $\gamma > 0$ a scalar. The following methods of choosing $\bar{b} \in \mathbb{R}^k$ have been considered.

1. Let each element $\bar{b}_i = \epsilon_i |\hat{b}_i|$, $i = 1, \ldots, k$, where $\epsilon_i$ is a scalar chosen randomly from the normal distribution.
2. Assume that $Q$ is parameterized with $p$ parameters. If $k > p$, we can compute the Jacobian $J$ at $f(\hat{Q})$ and let $N \in \mathbb{R}^{k \times (k-p)}$ be a basis of the null space of $J^T$. Now take $\bar{b} = \rho N x$, where $x \in \mathbb{R}^{k-p}$ is a vector with normally distributed random numbers and $\rho$ is a scalar used to make $\|\bar{b}\|_2 = \|f(\hat{Q})\|_2$.

In Item 1, relative perturbations are generated. This type of perturbation changes the initially chosen solution $\hat{Q}$; that is, the generated $\hat{Q}$ is not a minimum (critical point) of the optimization problem. With Item 2, the initial solution $\hat{Q}$ is always a critical point with residual $\gamma \bar{b}$ (but not necessarily a minimum). Here $\gamma$ is used to make the norm of the residual proportional to the norm of $f(\hat{Q})$. For instance, $\gamma = 0.1$ means that the magnitude of the residual is 10% of the magnitude of the function value $f(\hat{Q})$. For small values of $\gamma$, $\hat{Q}$ should still be a global minimum after adding the perturbation. Choosing too large a value of $\gamma$ often turns $\hat{Q}$ into a local minimum, saddle point or maximum. We want to add a perturbation small enough that $\hat{Q}$ is still the global minimum, but large enough to cause trouble; that is, not so small that the algorithm becomes 100% successful in computing the global (generated) minimum $\hat{Q}$.

5.1 Relative perturbations

The tables in Appendix C.1 display results for different dimensions $m$, $n$ and $F \in \mathbb{R}^{mn \times mn}$ using relative perturbations (Item 1 above). For a given condition number $\kappa(F)$ and noise level $\gamma$, 100 tests are randomly generated. The tables display the average number of iterations, with the corresponding standard deviation in parentheses. For example, for $m = 3$ and $n = 2$ with $\kappa(F) = 5$ and $\gamma = 0.01$ (1% relative noise level), 100 tests were done. The average number of iterations needed to compute a solution is 3.77, with a standard deviation of 0.42.
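The test-problem generation with relative perturbations (Item 1) could be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the singular values of $F$ are manipulated, so logarithmically spaced singular values between 1 and $1/\kappa$ are assumed here, and $\hat{Q}$ is drawn by a QR factorization of a Gaussian matrix. All function names are hypothetical.

```python
import numpy as np

def random_stiefel(m, n, rng):
    # Reduced QR of a Gaussian matrix gives an m x n matrix with orthonormal columns.
    Q, _ = np.linalg.qr(rng.standard_normal((m, n)))
    return Q

def matrix_with_condition(k, p, kappa, rng):
    # Keep the singular vectors of a Gaussian matrix and replace its singular
    # values so that sigma_max / sigma_min = kappa (log spacing is an assumption).
    U, _, Vt = np.linalg.svd(rng.standard_normal((k, p)), full_matrices=False)
    sigma = np.logspace(0.0, -np.log10(kappa), min(k, p))
    return (U * sigma) @ Vt

def relative_perturbation_problem(m, n, kappa, gamma, rng):
    F = matrix_with_condition(m * n, m * n, kappa, rng)
    Q_hat = random_stiefel(m, n, rng)
    b_hat = F @ Q_hat.reshape(-1, order="F")    # vec(Q) stacks the columns of Q
    eps = rng.standard_normal(b_hat.size)
    b = b_hat + gamma * eps * np.abs(b_hat)     # Item 1: b_bar_i = eps_i * |b_hat_i|
    return F, Q_hat, b

# Hypothetical usage mirroring the example in the text: m = 3, n = 2, kappa = 5, gamma = 0.01.
F, Q_hat, b = relative_perturbation_problem(3, 2, 5.0, 0.01, np.random.default_rng(1))
print(F.shape, b.shape)
```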

5.2 Null space perturbations

For a given dimension $m$, $n$ and noise level $\gamma$, each table in Appendix C.2.1 corresponds to a set of tests where the condition number of $F \in \mathbb{R}^{mn \times mn}$ is varied. For each condition number, 100 test problems were generated with a perturbation according to Item 2 above. The tables contain the following information.

κ(F): Condition number of the matrix $F$.
Iterations: The average number of iterations, with the corresponding standard deviation in parentheses (computed after running 100 test problems).
Fails: The number of tests that resulted in a non-minimum solution because 100 iterations were exceeded. Some generated problems are expected to yield very slow convergence, hence the algorithm was set to terminate at 100 iterations.
New min: For a given test problem generated with the exact solution $\hat{Q}$, let $\bar{Q}$ be the solution computed by the algorithm. The number of tests for which $\hat{Q} \neq \bar{Q}$ is shown here, determined by checking whether $\|\hat{Q} - \bar{Q}\|_F > 10^{-4}$.
Not global: The number of tests for which $\|f(\hat{Q}) - b\|_2 < \|f(\bar{Q}) - b\|_2$, that is, the computed solution had a greater residual norm than the generated solution.

The ideal results are, e.g., those shown in Table 14. The 'Fails' column containing only zeros indicates that the algorithm managed to compute a minimum for all test problems. The 'New min' column indicates that the computed solution is the same as the generated solution. Table 13, for example, also shows good results. Here the computed solutions differ on several occasions from the generated solutions, as seen in the 'New min' column.
However, only a few of the computed solutions resulted in a greater residual norm, as seen in the 'Not global' column.

5.2.1 Tests with non-square F

For $Q \in \mathbb{R}^{m \times n}$ and $F \in \mathbb{R}^{k \times mn}$, the computational experiments in the previous sections used $k = mn$ (so that $F$ is a square matrix). Here we consider the case $k < mn$, with perturbations according to Item 2. For the results shown in Appendix C.2.2, $k = mn - n$ was used. As before, 100 tests were made for each condition number $\kappa(F)$. The tables show the same information as described in Section 5.2.

6 Summary of computational experiments

The computational experiments presented in Appendix C show that the algorithm is efficient in computing a solution to (1). The tables indicate that around 5 to 15 iterations were needed on average, depending on the problem dimension and noise level.

When it comes to the success rate of computing the global solution, the algorithm seems quite successful. First of all, no global optimization algorithm was used during the experiments. By using small $\gamma$ values and perturbations according to Item 2 above, $\hat{Q}$ should most often be the global minimizer. Typically this was the case when using $\gamma = 0.05$ and $\gamma = 0.1$, while for $\gamma = 0.2$, $\hat{Q}$ would more often become a local minimum, saddle point or maximum.

For the test problems generated in Appendix C.2.1, with $m = 6, 10, 12$, the algorithm most often computed a solution $\bar{Q}$ that was better than or the same as $\hat{Q}$ (better in the sense that $\bar{Q}$ resulted in a smaller residual). Rather surprisingly, the worst results appear in the low-dimensional case with $m = 3$, where around 10% of the computed solutions yielded a greater residual norm than $\hat{Q}$. For all tests, using $\gamma = 0.2$ quite often meant that $\hat{Q}$ was not a global minimum. Subtracting the 'Not global' column from the 'New min' column gives the number of test problems where $\|f(\bar{Q}) - b\|_2 < \|f(\hat{Q}) - b\|_2$. In most of these cases, $\hat{Q}$ typically became a saddle point when the perturbation was added.

For a non-square $F$, in Appendix C.2.2, the tables show a noticeable increase in the average number of iterations. In the tables with $m = 6$ and $n = 5$, 10% to 40% of the experiments resulted in a computed solution with a greater residual than $\hat{Q}$. However, in the tables with $m = 10$ and $n = 4$, many of the tests resulted in $\hat{Q}$ becoming a local minimum (or maximum/saddle point) after adding the perturbation, and the computed solution yielded a smaller residual norm. Even though these problems are of different dimensions, $6 \times 5$ and $10 \times 4$, it is not clear why the results vary so much in this respect.

Nevertheless, in total 41800 tests are presented here. Out of these tests, 5 resulted in the algorithm terminating because more than 100 iterations were performed (without fulfilling the desired tolerance). Since the algorithm uses Gauss-Newton steps unless the Hessian $J^T J + H$ is positive definite, this can in some cases (with large residuals) result in slow convergence, especially if the computed initial matrix $Q_0$ is a bad starting value. Overall, however, this was a rare scenario in these tests.

Appendix

A The canonical form of a WOPP

Proposition A.1 The matrices $A \in \mathbb{R}^{m_A \times m}$ and $X \in \mathbb{R}^{n \times n_X}$ with $\mathrm{Rank}(A) = m$ and $\mathrm{Rank}(X) = n$ belonging to a WOPP

\[ \min \tfrac{1}{2} \|AQX - B\|_F^2, \text{ subject to } Q^T Q = I_n, \]

can always be considered as $m$ by $m$ and $n$ by $n$ diagonal matrices, respectively.

Proof. Let $A = U_A \Sigma_A V_A^T$ and $X = U_X \Sigma_X V_X^T$ be the singular value decompositions of $A$ and $X$. Then

\[ \|U_A \Sigma_A V_A^T Q U_X \Sigma_X V_X^T - B\|_F^2 = \|U_A \Sigma_A Z \Sigma_X V_X^T - B\|_F^2, \]

where $Z = V_A^T Q U_X \in \mathbb{R}^{m \times n}$ has orthonormal columns. Since $U_A^T U_A = I_{m_A}$ and $V_X^T V_X = I_{n_X}$ it follows that

\begin{align*}
\|U_A \Sigma_A Z \Sigma_X V_X^T - B\|_F^2
&= \mathrm{tr}\big((U_A \Sigma_A Z \Sigma_X V_X^T - B)^T (U_A \Sigma_A Z \Sigma_X V_X^T - B)\big) \\
&= \mathrm{tr}\big(V_X \Sigma_X Z^T \Sigma_A^2 Z \Sigma_X V_X^T - 2 V_X \Sigma_X Z^T \Sigma_A U_A^T B + B^T B\big) \\
&= \mathrm{tr}(\Sigma_X Z^T \Sigma_A^2 Z \Sigma_X) - \mathrm{tr}(2 \Sigma_X Z^T \Sigma_A U_A^T B V_X) + \mathrm{tr}(B^T B) \\
&= \mathrm{tr}\big((\Sigma_A Z \Sigma_X - U_A^T B V_X)^T (\Sigma_A Z \Sigma_X - U_A^T B V_X)\big) \\
&= \|\Sigma_A Z \Sigma_X - U_A^T B V_X\|_F^2.
\end{align*}

Hence, without loss of generality we can assume that $A = \mathrm{diag}(\alpha_1, \ldots, \alpha_m)$ and $X = \mathrm{diag}(\chi_1, \ldots, \chi_n)$ with $\alpha_i \geq \alpha_{i+1} \geq 0$ and $\chi_i \geq \chi_{i+1} \geq 0$. $\Box$

B Parametrization of $V_{m,n}$ by using the Cayley transform

The Cayley transform is often used to represent orthogonal matrices with positive determinants as

\[ Q(S) = (I + S)(I - S)^{-1}, \qquad (11) \]

where $S \in \mathbb{R}^{m \times m}$ is skew-symmetric, $S = -S^T$. Since a skew-symmetric matrix has purely imaginary eigenvalues, $(I - S)$ always has full rank. However, this parametrization fails in some cases, namely when the matrix $Q$ to be represented is such that $(Q + I)$ is singular. As an example, there exists no $S \in \mathbb{R}^{2 \times 2}$ such that $Q(S) = \mathrm{diag}(-1, -1)$. Instead of using (11) as a parametrization of orthogonal matrices, a local parametrization can be used. Given a point $\tilde{Q} \in V_{m,m}$, we can express any $Q \in V_{m,m}$ in the vicinity of $\tilde{Q}$ by using

\[ Q(S) = \tilde{Q}(I + S)(I - S)^{-1}. \qquad (12) \]

To get a local parametrization of $V_{m,n}$ when $n \leq m$, (12) is modified as follows. Given a point $\tilde{Q} \in V_{m,n}$, a parametrization for any $Q \in V_{m,n}$ in the vicinity of $\tilde{Q}$ can be written as

\[ Q(S) = [\tilde{Q}, \tilde{Q}_\perp](I + S)(I - S)^{-1} I_{m,n}. \qquad (13) \]

Here $\tilde{Q}_\perp$ is any extension such that $[\tilde{Q}, \tilde{Q}_\perp] \in \mathbb{R}^{m \times m}$ is orthogonal and

\[ I_{m,n} = \begin{bmatrix} I_n \\ 0 \end{bmatrix} \in \mathbb{R}^{m \times n}. \]

$S$ is skew-symmetric according to

\[ S = \begin{bmatrix} S_{11} & -S_{21}^T \\ S_{21} & 0 \end{bmatrix}, \qquad (14) \]

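where $S_{11} \in \mathbb{R}^{n \times n}$ is skew-symmetric and $S_{21} \in \mathbb{R}^{(m-n) \times n}$ is arbitrary. The remaining lower right part of $S$ is a zero matrix. Observe that if $m = n$, then (13) is the same as (12).

B.1 Search directions with Cayley representation

For a given search direction $s$ at a point $\tilde{Q}$, moving along the surface of $f(Q)$ can be done by using

\[ Q(\phi) = [\tilde{Q}, \tilde{Q}_\perp] C_\phi(\phi) I_{m,n}, \]

where

\[ C_\phi(\phi) = \sum_{j=1}^{\tilde{p}} U_j \begin{bmatrix} \cos(\phi_j) & -\sin(\phi_j) \\ \sin(\phi_j) & \cos(\phi_j) \end{bmatrix} U_j^T \qquad (15) \]

\[ = \sum_{j=1}^{\tilde{p}} \left( \cos(\phi_j) U_j U_j^T + \sin(\phi_j) U_j \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} U_j^T \right). \qquad (16) \]

By using the spectral decomposition of $S$, $S = W D W^H$, the decomposition $C_\phi(\phi) = U \Phi(\phi) U^T$ is derived [6]. $U \in \mathbb{R}^{m \times m}$ is orthogonal,

\[ \Phi(\phi) = \begin{cases} \mathrm{diag}(\Phi_1, \Phi_2, \ldots, \Phi_{m/2}) & \text{if } m \text{ is even} \Rightarrow \tilde{p} = m/2, \\ \mathrm{diag}(\Phi_1, \Phi_2, \ldots, \Phi_{(m-1)/2}, 1) & \text{otherwise} \Rightarrow \tilde{p} = (m-1)/2 + 1, \end{cases} \]

where

\[ \Phi_i = \begin{bmatrix} \cos(\phi_i) & -\sin(\phi_i) \\ \sin(\phi_i) & \cos(\phi_i) \end{bmatrix}. \]

Now using (16) to express $C_\phi(\phi)$ yields

\begin{align*}
f(Q(\phi)) &= \sum_{j=1}^{\tilde{p}} \left( \cos(\phi_j) f([\tilde{Q}, \tilde{Q}_\perp] U_j U_j^T I_{m,n}) + \sin(\phi_j) f\!\left([\tilde{Q}, \tilde{Q}_\perp] U_j \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} U_j^T I_{m,n}\right) \right) \\
&= \cos(\phi_1) f_{1,\cos} + \sin(\phi_1) f_{1,\sin} + \ldots + \cos(\phi_{\tilde{p}}) f_{\tilde{p},\cos} + \sin(\phi_{\tilde{p}}) f_{\tilde{p},\sin}.
\end{align*}

The optimal $C_\phi(\phi)$ is given by solving the least squares problem

\[ \min_\phi \left\| A_{f_1} \begin{bmatrix} \cos\phi_1 \\ \sin\phi_1 \end{bmatrix} + A_{f_2} \begin{bmatrix} \cos\phi_2 \\ \sin\phi_2 \end{bmatrix} + \ldots + A_{f_{\tilde{p}}} \begin{bmatrix} \cos\phi_{\tilde{p}} \\ \sin\phi_{\tilde{p}} \end{bmatrix} - b \right\|_2^2, \qquad (17) \]

where $A_{f_i} = [f_{i,\cos}, f_{i,\sin}] \in \mathbb{R}^{k \times 2}$.

Two different approaches to solving this subproblem are considered. A traditional Gauss-Newton or Newton method can be used to solve (17). However, empirical studies have shown that the Jacobian matrix of $f(Q(\phi))$ can occasionally become ill-conditioned, so a Gauss-Newton method can converge slowly. Switching to a Newton method might then result in convergence towards a maximum. Since the parameters $\phi_i$ are periodic, a large search direction when solving (17) can result in a seemingly randomized step.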
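The local Cayley parametrization (13), with $S$ structured as in (14), can be checked numerically with a short NumPy sketch. It is an illustration only: the completion $\tilde{Q}_\perp$ is taken from a full QR factorization, which is one of many valid choices, and the function name `cayley_point` is hypothetical.

```python
import numpy as np

def cayley_point(Q_tilde, S11, S21):
    # Q(S) = [Q_tilde, Q_perp] (I + S)(I - S)^{-1} I_{m,n}, with S built as in (14).
    m, n = Q_tilde.shape
    # The last m - n columns of a full QR factor span the orthogonal
    # complement of range(Q_tilde) (assumption: any such completion is allowed).
    Q_full, _ = np.linalg.qr(Q_tilde, mode="complete")
    Q_perp = Q_full[:, n:]
    S = np.zeros((m, m))
    S[:n, :n] = S11          # skew-symmetric n x n block
    S[n:, :n] = S21          # arbitrary (m - n) x n block
    S[:n, n:] = -S21.T
    I = np.eye(m)
    # (I + S) and (I - S)^{-1} commute, so solve(I - S, I + S) is the Cayley factor.
    C = np.linalg.solve(I - S, I + S)
    return np.hstack([Q_tilde, Q_perp]) @ C[:, :n]   # C[:, :n] equals C @ I_{m,n}

# Hypothetical usage: a random point on V_{5,3} and a random skew-symmetric S.
rng = np.random.default_rng(2)
Q_tilde, _ = np.linalg.qr(rng.standard_normal((5, 3)))
A = rng.standard_normal((3, 3))
Q = cayley_point(Q_tilde, A - A.T, rng.standard_normal((2, 3)))
print(np.allclose(Q.T @ Q, np.eye(3)))
```

Setting $S = 0$ returns $\tilde{Q}$ itself, which is what makes the parametrization local around $\tilde{Q}$.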

In the cases when Newton type algorithms fail, a coordinate-wise search can be used. This is done by keeping every angle but one, $\phi_i \in \phi$, fixed and minimizing over $\phi_i$ alone. This is then repeated for all angles $\phi_j \in \phi$, $j = 1, \ldots, \tilde{p}$.

Algorithm: Coordinate-wise search
0. Given a search direction $s$ ($S$), compute $U$.
1. Set $\phi_1 = \phi_2 = \ldots = \phi_{\tilde{p}} = 0$.
2. While $\phi$ is not a minimizer of (17)
   2.1. for $i = 1$ to $\tilde{p}$
        2.1.1. $c = \sum_{j=1, j \neq i}^{\tilde{p}} A_{f_j} \begin{bmatrix} \cos\phi_j \\ \sin\phi_j \end{bmatrix}$
        2.1.2. Let $\phi_i$ be the solution of
               \[ \min_{\phi_i} \left\| A_{f_i} \begin{bmatrix} \cos\phi_i \\ \sin\phi_i \end{bmatrix} - (b - c) \right\|_2^2. \qquad (18) \]
   2.2. end for
   2.3. end While

The subproblem (18) is solved optimally by computing all solutions of a fourth degree polynomial, see [13]. This is a very robust method for minimizing (17), but for larger problems it can be time consuming. It has also been shown to result in step lengths that are too short, giving slow convergence.
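One way to arrive at a fourth degree polynomial for the single-angle subproblem (18) is the tangent half-angle substitution $t = \tan(\phi/2)$, sketched below in NumPy. This is an assumption made for illustration; the formulation actually used in [13] is not reproduced here, and the function name `solve_angle_subproblem` is hypothetical. Here `A` plays the role of $A_{f_i}$ and `r` the role of $b - c$.

```python
import numpy as np

def solve_angle_subproblem(A, r):
    # Minimize g(phi) = ||A [cos phi, sin phi]^T - r||_2^2 for A in R^{k x 2}.
    M = A.T @ A
    c = A.T @ r
    d = M[1, 1] - M[0, 0]
    # With t = tan(phi / 2), the condition g'(phi) = 0 becomes a quartic in t
    # (coefficients listed from t^4 down to t^0).
    coeffs = [M[0, 1] + c[1], 2.0 * (c[0] - d), -6.0 * M[0, 1],
              2.0 * (c[0] + d), M[0, 1] - c[1]]
    roots = np.roots(coeffs)
    real_t = roots[np.abs(roots.imag) < 1e-9].real
    # phi = pi corresponds to t = infinity and is therefore checked separately.
    candidates = np.append(2.0 * np.arctan(real_t), np.pi)
    u = np.stack([np.cos(candidates), np.sin(candidates)])
    residuals = np.linalg.norm(A @ u - r[:, None], axis=0)
    return candidates[np.argmin(residuals)]

# Hypothetical usage on random data.
rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2))
r = rng.standard_normal(5)
phi = solve_angle_subproblem(A, r)
print(np.linalg.norm(A @ np.array([np.cos(phi), np.sin(phi)]) - r))
```

Because every critical point of the one-dimensional problem is evaluated, the returned angle is a global minimizer of (18) for the given data, which is consistent with the robustness noted above; the cost is one polynomial root computation per angle and sweep.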

C Results from computational experiments

C.1 Tables for the relative type of perturbations

| κ(F) | γ = 0.001 | γ = 0.01 | γ = 0.1 | γ = 0.2 | γ = 0.5 |
|---|---|---|---|---|---|
| 2 | 3 (0) | 3.13 (0.34) | 3.77 (0.42) | 3.92 (0.37) | 4.18 (0.52) |
| 5 | 3.15 (0.36) | 3.77 (0.42) | 4.21 (0.46) | 4.4 (0.57) | 4.73 (0.97) |
| 10 | 3.34 (0.48) | 3.93 (0.48) | 4.46 (0.63) | 4.69 (0.72) | 4.91 (1.06) |
| 50 | 3.69 (0.49) | 4.13 (0.58) | 5.13 (1.01) | 5.15 (1.28) | 5.18 (0.93) |
| 100 | 3.77 (0.6) | 4.25 (0.73) | 5.2 (1.05) | 5.39 (1.14) | 5.14 (1.06) |
| 250 | 3.72 (0.6) | 4.74 (1.28) | 5.21 (1.09) | 5.24 (1.06) | 5.58 (1.61) |
| 500 | 3.92 (0.8) | 4.87 (1.51) | 5.4 (1.41) | 5.76 (1.96) | 5.15 (1.04) |
| 1000 | 3.87 (0.77) | 4.84 (1.22) | 5.26 (1.3) | 5.66 (1.36) | 5.55 (1.31) |
| 2500 | 4.15 (1.1) | 5.14 (2.43) | 5.27 (1.12) | 5.53 (1.36) | 5.35 (1.1) |
| 5000 | 4.54 (1.53) | 5.2 (1.37) | 5.38 (1.43) | 5.3 (1.12) | 5.4 (1.56) |
| 10000 | 4.6 (1.34) | 5.17 (1.41) | 5.29 (1.13) | 5.63 (1.76) | 5.52 (1.32) |

Table 1: m = 3, n = 2.

| κ(F) | γ = 0.001 | γ = 0.01 | γ = 0.1 | γ = 0.2 | γ = 0.5 |
|---|---|---|---|---|---|
| 2 | 3.05 (0.5) | 3.29 (0.52) | 4.00 (0) | 4 (0) | 4.27 (0.49) |
| 5 | 3.19 (0.39) | 4 (0) | 4.4 (0.49) | 4.94 (0.51) | 5.46 (0.77) |
| 10 | 3.71 (0.46) | 4 (0) | 4.95 (0.5) | 5.35 (0.63) | 6.07 (0.96) |
| 50 | 3.98 (0.14) | 4.44 (0.54) | 5.66 (0.79) | 6.26 (1.38) | 7.07 (3.27) |
| 100 | 4.02 (0.25) | 4.75 (0.61) | 5.79 (1.15) | 6.32 (1.65) | 7.13 (3.08) |
| 250 | 4.1 (0.3) | 4.99 (0.82) | 5.88 (0.79) | 6.32 (1.06) | 6.95 (2.04) |
| 500 | 4.11 (0.37) | 5.12 (0.83) | 5.93 (0.87) | 6.59 (2.18) | 7.33 (2.31) |
| 1000 | 4.27 (0.57) | 5.19 (0.85) | 5.95 (1.91) | 6.44 (1.56) | 7.45 (3.34) |
| 2500 | 4.59 (0.93) | 5.18 (0.78) | 5.81 (0.66) | 6.28 (1.16) | 7.11 (2.7) |
| 5000 | 4.63 (0.86) | 5.48 (1.18) | 6.08 (1.35) | 6.2 (1.06) | 6.76 (1.24) |
| 10000 | 4.92 (1.01) | 5.29 (1) | 5.89 (0.91) | 6.33 (1.23) | 6.89 (1.49) |

Table 2: m = 6, n = 5.

| κ(F) | γ = 0.001 | γ = 0.01 | γ = 0.1 | γ = 0.2 | γ = 0.5 |
|---|---|---|---|---|---|
| 2 | 3.07 (0.7) | 3.4 (0.6) | 4 (0) | 4 (0) | 4.17 (0.43) |
| 5 | 3.41 (0.49) | 4 (0) | 4.72 (0.45) | 5.01 (0.33) | 5.58 (0.67) |
| 10 | 3.99 (0.1) | 4.02 (0.14) | 5.3 (0.48) | 5.83 (0.57) | 6.19 (1.24) |
| 50 | 4.01 (0.1) | 4.98 (0.53) | 6.3 (0.7) | 6.51 (0.87) | 7.32 (2.06) |
| 100 | 4.07 (0.26) | 5.28 (0.65) | 6.41 (0.79) | 6.7 (1.11) | 7.47 (2.02) |
| 250 | 4.32 (0.47) | 5.48 (0.89) | 6.44 (0.9) | 7.01 (1.45) | 6.86 (1.1) |
| 500 | 4.33 (0.49) | 5.74 (0.8) | 6.54 (0.88) | 6.79 (1.08) | 7.11 (1.54) |
| 1000 | 4.67 (0.74) | 5.63 (0.79) | 6.53 (0.81) | 7.02 (2.16) | 7.58 (2.77) |
| 2500 | 4.73 (0.85) | 5.87 (0.98) | 6.64 (1.19) | 6.95 (1.4) | 7.44 (2.28) |
| 5000 | 5.23 (1.04) | 5.82 (0.98) | 6.65 (0.89) | 6.85 (1.1) | 7.06 (1.51) |
| 10000 | 5.16 (1.13) | 5.87 (1.02) | 6.6 (1.02) | 6.73 (1.05) | 7.29 (2.26) |

Table 3: m = 12, n = 5.

| κ(F) | γ = 0.001 | γ = 0.01 | γ = 0.1 | γ = 0.2 | γ = 0.5 |
|---|---|---|---|---|---|
| 2 | 3.04 (0.4) | 3.53 (0.61) | 4 (0) | 4 (0) | 4.36 (0.48) |
| 5 | 3.29 (0.46) | 4 (0) | 4.7 (0.46) | 5.03 (0.22) | 6.15 (1.77) |
| 10 | 3.99 (0.1) | 4 (0) | 5.09 (0.29) | 5.58 (0.54) | 6.39 (1.11) |
| 50 | 4 (0) | 4.69 (0.46) | 5.97 (0.63) | 6.41 (0.98) | 8.65 (7.51) |
| 100 | 4 (0) | 4.95 (0.41) | 6.01 (0.58) | 6.64 (1.09) | 7.75 (1.95) |
| 250 | 4.08 (0.27) | 5.2 (0.68) | 6.18 (0.67) | 6.6 (0.94) | 7.8 (3.24) |
| 500 | 4.19 (0.42) | 5.48 (0.81) | 6.26 (0.75) | 6.39 (0.71) | 7.74 (3.02) |
| 1000 | 4.39 (0.62) | 5.44 (0.72) | 6.13 (0.65) | 6.48 (0.9) | 8.13 (5.31) |
| 2500 | 4.64 (0.82) | 5.62 (0.76) | 6.25 (0.69) | 6.6 (0.9) | 7.59 (2.82) |
| 5000 | 4.76 (0.81) | 5.39 (0.72) | 6.1 (0.73) | 6.59 (0.94) | 8.36 (4.15) |
| 10000 | 5.09 (1) | 5.51 (0.83) | 6.07 (0.54) | 6.41 (0.65) | 8 (2.86) |

Table 4: m = 10, n = 7.

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 4.22 (0.42) | 0 | 0 | 0 |
| 5 | 6.12 (1.27) | 0 | 4 | 1 |
| 10 | 8.06 (3.64) | 0 | 27 | 0 |
| 50 | 9.5 (4.24) | 0 | 70 | 2 |
| 100 | 10.09 (7.04) | 0 | 67 | 1 |
| 250 | 8.91 (2.87) | 0 | 69 | 3 |
| 500 | 8.93 (2.09) | 2 | 61 | 1 |
| 1000 | 9.34 (4.38) | 0 | 63 | 0 |
| 2500 | 10.28 (6.75) | 0 | 65 | 2 |
| 5000 | 10.16 (5.02) | 0 | 72 | 1 |
| 10000 | 9.47 (4.73) | 0 | 68 | 1 |

Table 13: m = 12, n = 5, γ = 0.2.

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 4 (0) | 0 | 0 | 0 |
| 5 | 4.6 (0.49) | 0 | 0 | 0 |
| 10 | 5.03 (0.17) | 0 | 0 | 0 |
| 50 | 5.9 (0.61) | 0 | 0 | 0 |
| 100 | 6.09 (0.64) | 0 | 0 | 0 |
| 250 | 6.09 (0.6) | 0 | 0 | 0 |
| 500 | 6.03 (0.64) | 0 | 0 | 0 |
| 1000 | 6.14 (0.64) | 0 | 0 | 0 |
| 2500 | 6.19 (0.54) | 0 | 0 | 0 |
| 5000 | 6.05 (0.52) | 0 | 0 | 0 |
| 10000 | 6.23 (0.58) | 0 | 0 | 0 |

Table 14: m = 10, n = 5, γ = 0.05.

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 4 (0) | 0 | 0 | 0 |
| 5 | 5.01 (0.1) | 0 | 0 | 0 |
| 10 | 5.65 (0.61) | 0 | 0 | 0 |
| 50 | 6.13 (0.56) | 0 | 0 | 0 |
| 100 | 6.54 (0.86) | 0 | 2 | 0 |
| 250 | 6.44 (0.88) | 0 | 0 | 0 |
| 500 | 6.56 (0.81) | 0 | 0 | 0 |
| 1000 | 6.63 (0.91) | 0 | 0 | 0 |
| 2500 | 6.62 (0.83) | 0 | 2 | 0 |
| 5000 | 6.45 (0.77) | 0 | 1 | 0 |
| 10000 | 6.61 (1.38) | 0 | 2 | 0 |

Table 15: m = 10, n = 5, γ = 0.1.

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 4.03 (0.17) | 0 | 0 | 0 |
| 5 | 5.68 (0.55) | 0 | 0 | 0 |
| 10 | 6.56 (0.98) | 0 | 5 | 0 |
| 50 | 8.99 (4.06) | 0 | 20 | 1 |
| 100 | 9.02 (5.49) | 0 | 22 | 5 |
| 250 | 10.67 (10.24) | 0 | 31 | 4 |
| 500 | 9.44 (9.4) | 0 | 28 | 1 |
| 1000 | 8.48 (2.89) | 0 | 33 | 0 |
| 2500 | 8.04 (2.13) | 0 | 27 | 4 |
| 5000 | 9.24 (5.15) | 0 | 26 | 2 |
| 10000 | 8.73 (3.52) | 0 | 24 | 1 |

Table 16: m = 10, n = 5, γ = 0.2.

C.2.2 Using non-square matrix $F \in \mathbb{R}^{(mn-n) \times mn}$

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 7.25 (1.48) | 0 | 9 | 9 |
| 5 | 7.65 (1.47) | 0 | 10 | 10 |
| 10 | 9.29 (4.78) | 0 | 10 | 10 |
| 50 | 10.03 (3.25) | 0 | 18 | 17 |
| 100 | 11.9 (4.44) | 0 | 15 | 14 |
| 250 | 11.81 (4.08) | 0 | 24 | 23 |
| 500 | 12.76 (3.95) | 0 | 31 | 28 |
| 1000 | 13.34 (4.34) | 0 | 36 | 34 |
| 2500 | 13.33 (3.83) | 0 | 35 | 35 |
| 5000 | 14.08 (4.38) | 0 | 25 | 24 |
| 10000 | 13.65 (3.88) | 0 | 35 | 33 |

Table 17: m = 6, n = 5, γ = 0.05.

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 7.36 (1.84) | 0 | 8 | 8 |
| 5 | 8.86 (2.78) | 0 | 8 | 8 |
| 10 | 9.67 (3.6) | 0 | 15 | 11 |
| 50 | 10.7 (3.34) | 0 | 39 | 28 |
| 100 | 12.44 (4.53) | 0 | 35 | 24 |
| 250 | 13.47 (4.39) | 0 | 43 | 33 |
| 500 | 13.82 (4.56) | 0 | 43 | 33 |
| 1000 | 14.72 (7.26) | 0 | 36 | 29 |
| 2500 | 14.52 (5.25) | 0 | 47 | 35 |
| 5000 | 14.48 (5.23) | 0 | 41 | 35 |
| 10000 | 14.94 (4.93) | 0 | 42 | 31 |

Table 18: m = 6, n = 5, γ = 0.1.

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 8.01 (2.63) | 0 | 10 | 7 |
| 5 | 9.24 (2.79) | 0 | 26 | 10 |
| 10 | 10.11 (3.2) | 0 | 46 | 19 |
| 50 | 12.18 (4.93) | 1 | 60 | 26 |
| 100 | 13.21 (3.81) | 0 | 58 | 33 |
| 250 | 14.91 (7.41) | 0 | 71 | 36 |
| 500 | 13.16 (5.52) | 0 | 67 | 29 |
| 1000 | 12.92 (3.52) | 0 | 62 | 27 |
| 2500 | 14.73 (5.48) | 0 | 58 | 32 |
| 5000 | 15.05 (8.87) | 0 | 67 | 38 |
| 10000 | 14.47 (5.44) | 0 | 61 | 32 |

Table 19: m = 6, n = 5, γ = 0.2.

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 7.35 (1.45) | 0 | 2 | 2 |
| 5 | 8.21 (1.47) | 0 | 4 | 0 |
| 10 | 8.92 (1.79) | 0 | 14 | 3 |
| 50 | 11.57 (3.21) | 0 | 42 | 9 |
| 100 | 12.47 (3.25) | 0 | 53 | 7 |
| 250 | 14.38 (4.49) | 0 | 50 | 11 |
| 500 | 13.81 (3.06) | 0 | 40 | 4 |
| 1000 | 15.37 (4.99) | 0 | 39 | 8 |
| 2500 | 14.77 (3.54) | 0 | 38 | 4 |
| 5000 | 14.79 (3.87) | 0 | 49 | 9 |
| 10000 | 15.18 (4.99) | 0 | 45 | 9 |

Table 20: m = 10, n = 4, γ = 0.05.

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 7.55 (1.36) | 0 | 19 | 2 |
| 5 | 8.96 (5.19) | 0 | 29 | 1 |
| 10 | 9.54 (3.19) | 0 | 46 | 6 |
| 50 | 11.49 (2.49) | 0 | 62 | 2 |
| 100 | 13.09 (3.87) | 0 | 55 | 3 |
| 250 | 13.72 (5.38) | 0 | 68 | 4 |
| 500 | 14.2 (3.53) | 0 | 59 | 6 |
| 1000 | 14.15 (3.66) | 0 | 69 | 6 |
| 2500 | 14.6 (4.3) | 0 | 74 | 9 |
| 5000 | 14.38 (4.37) | 0 | 59 | 4 |
| 10000 | 14.78 (3.49) | 0 | 74 | 7 |

Table 21: m = 10, n = 4, γ = 0.1.

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 8.63 (3.83) | 0 | 60 | 2 |
| 5 | 9.39 (3.44) | 0 | 70 | 2 |
| 10 | 10.29 (3.26) | 0 | 85 | 5 |
| 50 | 12.42 (3.53) | 0 | 79 | 4 |
| 100 | 13.61 (4.69) | 0 | 79 | 0 |
| 250 | 14.38 (7.06) | 0 | 88 | 3 |
| 500 | 13.64 (3.12) | 0 | 81 | 3 |
| 1000 | 15.04 (4.18) | 1 | 85 | 2 |
| 2500 | 14.79 (6.26) | 0 | 85 | 3 |
| 5000 | 15.51 (9.29) | 0 | 81 | 2 |
| 10000 | 13.62 (3.09) | 1 | 78 | 2 |

Table 22: m = 10, n = 4, γ = 0.2.

References

[1] M. T. Chu and N. T. Trendafilov. On a Differential Equation Approach to the Weighted Orthogonal Procrustes Problem. Statistics and Computing, 8(2):125–133, 1998.
[2] M. T. Chu and N. T. Trendafilov. The Orthogonally Constrained Regression Revisited. J. Comput. Graph. Stat., 10:746–771, 2001.
[3] A. Edelman, T. A. Arias, and S. T. Smith. The Geometry of Algorithms with Orthogonality Constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
[4] L. Eldén and H. Park. A Procrustes problem on the Stiefel manifold. Numer. Math., 82(4):599–619, 1999.
[5] W. Gander. Least Squares with a Quadratic Constraint. Numer. Math., 36:291–307, 1981.

[6] P. R. Halmos. Finite-dimensional vector spaces. Van Nostrand, 1958.
[7] M. A. Koschat and D. F. Swayne. A Weighted Procrustes Criterion. Psychometrika, 56(2):229–239, 1991.
[8] A. Mooijaart and J. J. F. Commandeur. A General Solution of the Weighted Orthonormal Procrustes Problem. Psychometrika, 55(4):657–663, 1990.
[9] T. Rapcsak. On Minimization on Stiefel manifolds. European J. Oper. Res., 143(2):365–376, 2002.
[10] I. Söderkvist. Some Numerical Methods for Kinematical Analysis. ISSN-0348-0542, UMINF-186.90, Department of Computing Science, Umeå University, 1990.
[11] I. Söderkvist and Per-Åke Wedin. On Condition Numbers and Algorithms for Determining a Rigid Body Movement. BIT, 34:424–436, 1994.
[12] E. Stiefel. Richtungsfelder und Fernparallelismus in n-dimensionalen Mannigfaltigkeiten. Commentarii Math. Helvetici, 8:305–353, 1935-1936.
[13] P. Å. Wedin and T. Viklands. Algorithms for 3-dimensional Weighted Orthogonal Procrustes Problems. Technical Report UMINF-06.06, Department of Computing Science, Umeå University, Umeå, Sweden, 2006.
