
Algorithms for Linear Least Squares problems on the Stiefel manifold




Let $\beta_i$, $i = 1, 2, \ldots$, be the positive real solutions of $P_{2t-1}(\beta) = 0$, ordered such that $\beta_i < \beta_{i+1}$. Since $s$ is a descent direction, $\beta_1$ is always a minimum or saddle point of (8). Consider the interval $(0, \mu]$ where $\mu > 0$. The optimal step length on this interval is given by

$$\hat{\beta} = \arg\{\min(P_{2t}(\beta_i),\, P_{2t}(\mu)) \;\; \forall\, \beta_i < \mu\}.$$

Typically, when considering local convergence, $\mu = 1$ is used, corresponding to a full step length.

4 The overall algorithm

To get a starting approximation $Q_0$ for the nonlinear solver, a least squares problem with an equality constraint is solved first,

$$\tilde{Q}_0 = \arg\min_Q \{\, \|F\,\mathrm{vec}(Q) - b\|_2^2, \ \text{subject to}\ \mathrm{vec}(Q)^T \mathrm{vec}(Q) = \sqrt{n} \,\}. \qquad (9)$$

Since $\tilde{Q}_0 \in \mathbb{R}^{m \times n}$ does not necessarily have orthonormal columns, the OPP

$$Q_0 = \arg\{\min_Q \|Q - \tilde{Q}_0\|_F^2, \ \text{subject to}\ Q \in V_{m,n}\} \qquad (10)$$

is solved, yielding $Q_0 \in V_{m,n}$, which is used as the initial value for the iterative algorithm. The algorithm to compute a minimum of (1) works as follows.

Algorithm: $\min \|f(Q) - b\|_2^2$ subject to $Q \in V_{m,n}$

0. Compute $Q_0$ by solving (9) and (10).
1. $j = 0$, $\mu = 1$, $\hat{\Delta} = 10^{-10}$, $\Delta_0 = \hat{\Delta} + 1$.
2. While $\Delta_j > \hat{\Delta}$
   2.1. If $(J^T J + H)$ is positive definite
        2.1.1. compute a Newton search direction $s = s_N$ (7),
   2.2. else
        2.2.1. take a Gauss-Newton search direction $s = s_{GN}$ (6).
   2.3. Compute the optimal step length $\hat{\beta}$ on the interval $(0, \mu]$ by solving (8).
   2.4. Update $Q_{j+1} = [Q_j, (Q_j)_\perp] \exp(S(\hat{\beta} s)) I_{m,n}$.
   2.5. $j = j + 1$.
   2.6. $\Delta_{j+1} = \dfrac{\|J^T (f(Q_j) - b)\|_2}{\|J\|_2 \, \|f(Q_j) - b\|_2}$.
3. end While.

The algorithm has been implemented in MATLAB, and can be downloaded from http://www.cs.umu.se/~viklands/WOPP/index.html.
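The step-length rule above (compare the polynomial values at the stationary points inside the interval and at the end point $\mu$) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficients of $P_{2t}$ and the positive real roots $\beta_i$ of $P_{2t-1}$ are taken as given inputs, and the names `horner` and `best_step` are hypothetical, not part of the MATLAB implementation mentioned in the text.

```python
def horner(coeffs, x):
    """Evaluate a polynomial (coefficients in descending order) at x."""
    value = 0.0
    for c in coeffs:
        value = value * x + c
    return value


def best_step(p_coeffs, stationary_points, mu=1.0):
    """Return the candidate step in (0, mu] with the smallest value of P_2t.

    p_coeffs          -- coefficients of P_2t, highest degree first (assumed given)
    stationary_points -- positive real roots beta_i of P_(2t-1) (assumed given)
    """
    # Keep only the stationary points strictly inside the interval.
    candidates = [beta for beta in stationary_points if 0.0 < beta < mu]
    # The end point mu always competes with the interior candidates.
    candidates.append(mu)
    return min(candidates, key=lambda beta: horner(p_coeffs, beta))


# Toy example (made up): P(beta) = beta^2 - beta, with P'(beta) = 0 at beta = 0.5.
beta_hat = best_step([1.0, -1.0, 0.0], [0.5], mu=1.0)
```

With $\mu = 1$ the interior point 0.5 wins in this toy case. With a shorter interval such as $\mu = 0.25$ no stationary point is admissible and the end point is returned.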


5 Computational experiments

In this section, the algorithm presented is tested on randomly generated problems of different dimensions $Q \in \mathbb{R}^{m \times n}$ and $F \in \mathbb{R}^{k \times mn}$. We mainly investigate the ability to compute a minimizer, and the number of iterations needed to do so. We also present some results regarding the efficiency of computing the global minimum.

The matrix $F$ is generated as a matrix with normally distributed random numbers. Then, by manipulating the singular values of $F$, different condition numbers can be chosen. A random solution $\hat{Q}$ is generated, and the exact model is then $\hat{b} = F\,\mathrm{vec}(\hat{Q})$. To generate $b$, let

$$b = \hat{b} + \gamma \bar{b},$$

where $\bar{b}$ is a perturbation and $\gamma > 0$ a scalar. Two different methods to choose $\bar{b} \in \mathbb{R}^k$ have been considered.

1. Let each element $\bar{b}_i = \epsilon_i |\hat{b}_i|$, $i = 1, \ldots, k$, where $\epsilon_i$ is a scalar chosen randomly from the normal distribution.

2. Assume that $Q$ is parameterized with $p$ parameters. If $k > p$, we can compute the Jacobian $J$ of $f$ at $\hat{Q}$ and let $N \in \mathbb{R}^{k \times (k-p)}$ be a basis of the null space of $J^T$. Now take $\bar{b} = \rho N x$, where $x \in \mathbb{R}^{k-p}$ is a vector with normally distributed random numbers and $\rho$ is a scalar used to make $\|\bar{b}\|_2 = \|f(\hat{Q})\|_2$.

In Item 1, relative perturbations are generated. Using this type of perturbation changes the initially chosen solution $\hat{Q}$. That is, the generated $\hat{Q}$ is not a minimum (critical point) of the optimization problem. By using Item 2, the initial solution $\hat{Q}$ will always be a critical point with residual $\gamma \bar{b}$ (but not necessarily a minimum). Here $\gamma$ is used to make the norm of the residual proportional to the norm of $f(\hat{Q})$. For instance, using $\gamma = 0.1$ means that the magnitude of the residual is 10% of the magnitude of the function value $f(\hat{Q})$. For small values of $\gamma$, $\hat{Q}$ should still be a global minimum after adding the perturbation. Choosing $\gamma$ too large often turns $\hat{Q}$ into a merely local minimum, a saddle point or a maximum. We want to add a perturbation small enough that $\hat{Q}$ is still the global minimum, but large enough to cause trouble. That is, not so small that the algorithm becomes 100% successful in computing the global (generated) minimum $\hat{Q}$.

5.1 Relative perturbations

The tables in Appendix C.1 display results for different dimensions $m$, $n$ and $F \in \mathbb{R}^{mn \times mn}$ using relative perturbations (Item 1 above). For a given condition number $\kappa(F)$ and noise level $\gamma$, 100 tests are randomly generated. The tables display the average number of iterations, with the corresponding standard deviation in parentheses. For example, for $m = 3$ and $n = 2$ with $\kappa(F) = 5$ and $\gamma = 0.01$ (1% relative noise level), 100 tests were done. The average number of iterations needed to compute a solution is 3.77, with a standard deviation of 0.42.
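The relative perturbation of Item 1 is simple enough to sketch directly. The snippet below is a minimal illustration, assuming a standard normal distribution for $\epsilon_i$ (the text only says "the normal distribution") and using a made-up vector in place of $\hat{b} = F\,\mathrm{vec}(\hat{Q})$. The function name `relative_perturbation` is hypothetical.

```python
import random


def relative_perturbation(b_exact, gamma, rng):
    """Return b = b_exact + gamma * b_bar with b_bar_i = eps_i * |b_exact_i|."""
    # eps_i is drawn from N(0, 1); this choice of variance is an assumption.
    b_bar = [rng.gauss(0.0, 1.0) * abs(value) for value in b_exact]
    return [value + gamma * delta for value, delta in zip(b_exact, b_bar)]


rng = random.Random(0)             # fixed seed, used only for reproducibility
b_exact = [2.0, -1.0, 0.0, 4.0]    # made-up stand-in for F vec(Q_hat)
b = relative_perturbation(b_exact, gamma=0.01, rng=rng)
```

Because each perturbation is scaled by $|\hat{b}_i|$, entries of $\hat{b}$ that are exactly zero stay unperturbed, which is one visible difference from an absolute noise model.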


5.2 Null space perturbations

For a given dimension $m$, $n$ and noise level $\gamma$, each table in Appendix C.2.1 corresponds to a set of tests where the condition number of $F \in \mathbb{R}^{mn \times mn}$ is varied. For each condition number, 100 test problems were generated with a perturbation according to Item 2 above. The tables contain the following information.

κ(F)        Condition number of the matrix $F$.

Iterations  The average number of iterations, with the corresponding standard deviation in parentheses (computed after running 100 test problems).

Fails       The number of tests that resulted in a non-minimum solution, due to exceeding 100 iterations. It is expected that some generated problems can yield very slow convergence, hence the algorithm was set to terminate at 100 iterations.

New min     For a given test problem generated with the exact solution $\hat{Q}$, let $\bar{Q}$ be the solution computed by the algorithm. The number of tests that resulted in $\hat{Q} \neq \bar{Q}$ is shown here. This was determined by checking whether $\|\hat{Q} - \bar{Q}\|_F > 10^{-4}$.

Not global  The number of tests in which $\|f(\hat{Q}) - b\|_2 < \|f(\bar{Q}) - b\|_2$ occurred. That is, the computed solution resulted in a greater residual norm than the generated solution.

The ideal results are, e.g., those shown in Table 14. The 'Fails' column containing only zeros indicates that the algorithm managed to compute a minimum for all test problems. The 'New min' column containing only zeros indicates that the computed solution is the same as the generated solution. Table 13, for example, also shows good results. Here the computed solutions differ on several occasions from the generated solutions, as seen in the column 'New min'. However, only a few of the computed solutions resulted in a greater residual norm, as seen in the 'Not global' column.

5.2.1 Tests with non-square F

For $Q \in \mathbb{R}^{m \times n}$ and $F \in \mathbb{R}^{k \times mn}$, the computational experiments in previous sections used $k = mn$ (so that $F$ is a square matrix). Here we consider the case when $k < mn$, with perturbations according to Item 2. For the results shown in Appendix C.2.2, $k = mn - n$ was used. As earlier, for each condition number $\kappa(F)$, 100 tests were made. The tables show the same information as described in Section 5.2.

6 Summary of computational experiments

The computational experiments presented in Appendix C show that the algorithm is efficient in computing a solution to (1). The tables indicate that around 5 to 15 iterations were needed on average, depending on the problem dimension and noise level.

When it comes to the success rate of computing the global solution, the algorithm seems quite successful. First of all, no global optimization algorithm has been used during the experiments. By using small $\gamma$ values and perturbations according to Item 2 above, $\hat{Q}$ should most often be the global minimizer. Typically this was the case when using $\gamma = 0.05$ and $\gamma = 0.1$, while for $\gamma = 0.2$ $\hat{Q}$ would more often become a local minimum, saddle point or maximum.

For the test problems generated in Appendix C.2.1, with $m = 6, 10, 12$, the algorithm most often computed a solution $\bar{Q}$ better than or the same as $\hat{Q}$ (better in the sense that $\bar{Q}$ resulted in a smaller residual). Rather surprisingly, the worst results appear in the low-dimensional case with $m = 3$, where around 10% of the computed solutions yielded a greater residual norm than $\hat{Q}$. For all tests, using $\gamma = 0.2$ quite often resulted in $\hat{Q}$ not being a global minimum. Subtracting the 'Not global' column from the 'New min' column gives the number of test problems where $\|f(\bar{Q}) - b\|_2 < \|f(\hat{Q}) - b\|_2$. Typically $\hat{Q}$ would become a saddle point in most of the cases when the perturbation was added.

For a non-square $F$, in Appendix C.2.2, the tables show a noticeable increase in the average number of iterations. In the tables with $m = 6$ and $n = 5$, in 10% to 40% of the experiments the computed solution yielded a greater residual than $\hat{Q}$. However, in the tables with $m = 10$ and $n = 4$, in many of the tests $\hat{Q}$ became a local minimum (or maximum/saddle point) after adding the perturbation, and the computed solution yielded a smaller residual norm. Even though these problems are of different dimensions, $6 \times 5$ and $10 \times 4$, it is not clear why the results vary so much in this sense.

Nevertheless, in total 41800 tests are presented here. Out of these tests, 5 resulted in the algorithm terminating because more than 100 iterations were performed (without fulfilling the desired tolerance). Since the algorithm uses Gauss-Newton steps unless the Hessian $J^T J + H$ is positive definite, this can in some cases (with large residuals) result in slow convergence, especially if the computed initial matrix $Q_0$ is a bad starting value. However, over all these tests, it was a rare scenario.

Appendix

A The canonical form of a WOPP

Proposition A.1 The matrices $A \in \mathbb{R}^{m_A \times m}$ and $X \in \mathbb{R}^{n \times n_X}$ with $\mathrm{Rank}(A) = m$ and $\mathrm{Rank}(X) = n$ belonging to a WOPP

$$\min \tfrac{1}{2} \|AQX - B\|_F^2, \quad \text{subject to } Q^T Q = I_n,$$

can always be considered as $m$ by $m$ and $n$ by $n$ diagonal matrices, respectively.


Proof. Let $A = U_A \Sigma_A V_A^T$ and $X = U_X \Sigma_X V_X^T$ be the singular value decompositions of $A$ and $X$. Then

$$\|U_A \Sigma_A V_A^T Q U_X \Sigma_X V_X^T - B\|_F^2 = \|U_A \Sigma_A Z \Sigma_X V_X^T - B\|_F^2,$$

where $Z = V_A^T Q U_X \in \mathbb{R}^{m \times n}$ has orthonormal columns. Since $U_A^T U_A = I_{m_A}$ and $V_X^T V_X = I_{n_X}$, it follows that

$$
\begin{aligned}
\|U_A \Sigma_A Z \Sigma_X V_X^T - B\|_F^2
&= \mathrm{tr}\,(U_A \Sigma_A Z \Sigma_X V_X^T - B)^T (U_A \Sigma_A Z \Sigma_X V_X^T - B) \\
&= \mathrm{tr}\,(V_X \Sigma_X Z^T \Sigma_A^2 Z \Sigma_X V_X^T - 2 V_X \Sigma_X Z^T \Sigma_A U_A^T B + B^T B) \\
&= \mathrm{tr}\,(\Sigma_X Z^T \Sigma_A^2 Z \Sigma_X) - \mathrm{tr}\,(2 \Sigma_X Z^T \Sigma_A U_A^T B V_X) + \mathrm{tr}\,(B^T B) \\
&= \mathrm{tr}\,(\Sigma_A Z \Sigma_X - U_A^T B V_X)^T (\Sigma_A Z \Sigma_X - U_A^T B V_X)
 = \|\Sigma_A Z \Sigma_X - U_A^T B V_X\|_F^2 .
\end{aligned}
$$

Hence, without loss of generality we can assume that $A = \mathrm{diag}(\alpha_1, \ldots, \alpha_m)$ and $X = \mathrm{diag}(\chi_1, \ldots, \chi_n)$ with $\alpha_i \geq \alpha_{i+1} \geq 0$ and $\chi_i \geq \chi_{i+1} \geq 0$. □

B Parametrization of $V_{m,n}$ by using the Cayley transform

The Cayley transform is often used to represent orthogonal matrices with positive determinant as

$$Q(S) = (I + S)(I - S)^{-1}, \qquad (11)$$

where $S \in \mathbb{R}^{m \times m}$ is skew-symmetric, $S = -S^T$. Since a skew-symmetric matrix has purely imaginary eigenvalues, $(I - S)$ always has full rank. However, this parametrization fails in some cases, namely when the orthogonal matrix $\tilde{Q}$ to be represented is such that $(\tilde{Q} + I)$ is singular. As an example, there exists no $S \in \mathbb{R}^{2 \times 2}$ such that $Q(S) = \mathrm{diag}(-1, -1)$. Instead of using (11) as a parametrization of orthogonal matrices, a local parametrization can be used. Given a point $\tilde{Q} \in V_{m,m}$, we can express any $Q \in V_{m,m}$ in the vicinity of $\tilde{Q}$ by using

$$Q(S) = \tilde{Q}(I + S)(I - S)^{-1}. \qquad (12)$$

To get a local parametrization of $V_{m,n}$ when $n \leq m$, (12) is modified as follows. Given a point $\tilde{Q} \in V_{m,n}$, a parametrization for any $Q \in V_{m,n}$ in the vicinity of $\tilde{Q}$ can be written as

$$Q(S) = [\tilde{Q}, \tilde{Q}_\perp](I + S)(I - S)^{-1} I_{m,n}. \qquad (13)$$

Here $\tilde{Q}_\perp$ is any extension such that $[\tilde{Q}, \tilde{Q}_\perp] \in \mathbb{R}^{m \times m}$ is orthogonal, and

$$I_{m,n} = \begin{bmatrix} I_n \\ 0 \end{bmatrix} \in \mathbb{R}^{m \times n}.$$

The matrix $S$ is skew-symmetric with the block structure

$$S = \begin{bmatrix} S_{11} & -S_{21}^T \\ S_{21} & 0 \end{bmatrix}. \qquad (14)$$
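The failure case of the Cayley parametrization (11) can be made concrete for $m = 2$. The following sketch, with a hypothetical helper `cayley_2x2`, evaluates $(I + S)(I - S)^{-1}$ for $S = \begin{bmatrix} 0 & s \\ -s & 0 \end{bmatrix}$ using the closed-form inverse of a $2 \times 2$ matrix. It is only meant to illustrate why $\mathrm{diag}(-1, -1)$ is never reached for finite $s$.

```python
def cayley_2x2(s):
    """Cayley transform (I + S)(I - S)^{-1} for S = [[0, s], [-s, 0]]."""
    i_plus = [[1.0, s], [-s, 1.0]]            # I + S
    det = 1.0 + s * s                         # det(I - S), never zero
    # Inverse of I - S = [[1, -s], [s, 1]] via the 2x2 adjugate formula.
    i_minus_inv = [[1.0 / det, s / det], [-s / det, 1.0 / det]]
    return [[sum(i_plus[r][k] * i_minus_inv[k][c] for k in range(2))
             for c in range(2)] for r in range(2)]


# The (1,1) entry equals (1 - s^2) / (1 + s^2); it approaches -1 as s grows
# but never attains it, so diag(-1, -1) is not representable.
for s in (1.0, 10.0, 1000.0):
    assert cayley_2x2(s)[0][0] > -1.0
```

For $s = 0.5$ the result is the rotation with entries $0.6$ and $\pm 0.8$, which has orthonormal columns and determinant one, as expected from (11).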


In (14), $S_{11} \in \mathbb{R}^{n \times n}$ is skew-symmetric and $S_{21} \in \mathbb{R}^{(m-n) \times n}$ is arbitrary. The remaining lower right part of $S$ is a zero matrix. Observe that if $m = n$, then (13) is the same as (12).

B.1 Search directions with Cayley representation

For a given search direction $s$ at a point $\tilde{Q}$, moving along the surface of $f(Q)$ can be done by using

$$Q(\phi) = [\tilde{Q}, \tilde{Q}_\perp]\, C_\phi(\phi)\, I_{m,n},$$

where

$$C_\phi(\phi) = \sum_{j=1}^{\tilde{p}} U_j \begin{bmatrix} \cos\phi_j & -\sin\phi_j \\ \sin\phi_j & \cos\phi_j \end{bmatrix} U_j^T \qquad (15)$$

$$\phantom{C_\phi(\phi)} = \sum_{j=1}^{\tilde{p}} \left( \cos(\phi_j)\, U_j U_j^T + \sin(\phi_j)\, U_j \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} U_j^T \right). \qquad (16)$$

By using the spectral decomposition of $S$, $S = W D W^H$, the decomposition $C_\phi(\phi) = U \Phi(\phi) U^T$ is derived [6]. Here $U \in \mathbb{R}^{m \times m}$ is orthogonal and

$$\Phi(\phi) = \begin{cases} \mathrm{diag}(\Phi_1, \Phi_2, \ldots, \Phi_{m/2}) & \text{if } m \text{ is even} \ \Rightarrow\ \tilde{p} = m/2, \\ \mathrm{diag}(\Phi_1, \Phi_2, \ldots, \Phi_{(m-1)/2}, 1) & \text{otherwise} \ \Rightarrow\ \tilde{p} = (m-1)/2 + 1, \end{cases}$$

where

$$\Phi_i = \begin{bmatrix} \cos\phi_i & -\sin\phi_i \\ \sin\phi_i & \cos\phi_i \end{bmatrix}.$$

Now using (16) to express $C_\phi(\phi)$ yields

$$
\begin{aligned}
f(Q(\phi)) &= \sum_{j=1}^{\tilde{p}} \left( \cos(\phi_j)\, f([\tilde{Q}, \tilde{Q}_\perp] U_j U_j^T I_{m,n}) + \sin(\phi_j)\, f\!\left([\tilde{Q}, \tilde{Q}_\perp] U_j \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} U_j^T I_{m,n}\right) \right) \\
&= \cos(\phi_1) f_{1,\cos} + \sin(\phi_1) f_{1,\sin} + \ldots + \cos(\phi_{\tilde{p}}) f_{\tilde{p},\cos} + \sin(\phi_{\tilde{p}}) f_{\tilde{p},\sin}.
\end{aligned}
$$

The optimal $C_\phi(\phi)$ is given by solving the least squares problem

$$\min_\phi \left\| A_{f_1} \begin{bmatrix} \cos\phi_1 \\ \sin\phi_1 \end{bmatrix} + A_{f_2} \begin{bmatrix} \cos\phi_2 \\ \sin\phi_2 \end{bmatrix} + \ldots + A_{f_{\tilde{p}}} \begin{bmatrix} \cos\phi_{\tilde{p}} \\ \sin\phi_{\tilde{p}} \end{bmatrix} - b \right\|_2^2, \qquad (17)$$

where $A_{f_i} = [f_{i,\cos}, f_{i,\sin}] \in \mathbb{R}^{k \times 2}$.

Two different approaches to solve this subproblem are considered. A traditional Gauss-Newton or Newton method can be used to solve (17). However, empirical studies have shown that the Jacobian matrix of $f(Q(\phi))$ can occasionally become ill-conditioned. Hence using a Gauss-Newton method can result in slow convergence. Switching to a Newton method might then result in convergence towards a maximum. Since the parameters $\phi_i$ are periodic, a large search direction when solving (17) can result in a seemingly randomized step.
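The $2 \times 2$ blocks $\Phi_i$ are plane rotations, and for $m = 2$ the exponential of the skew-symmetric matrix $\begin{bmatrix} 0 & -\phi \\ \phi & 0 \end{bmatrix}$ is exactly such a block. The sketch below checks this numerically with a truncated Taylor series for the matrix exponential. The helper names are hypothetical and the series is used only because it is easy to write in plain Python; it is not how a production code would compute the exponential.

```python
import math


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def expm_2x2_series(mat, terms=30):
    """Matrix exponential by a truncated Taylor series (adequate for small norms)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        # term becomes mat^k / k! by multiplying the previous term by mat / k.
        term = [[entry / k for entry in row] for row in matmul2(term, mat)]
        result = [[result[r][c] + term[r][c] for c in range(2)] for r in range(2)]
    return result


def rotation_block(phi):
    """The block Phi_i = [[cos, -sin], [sin, cos]] for the angle phi."""
    return [[math.cos(phi), -math.sin(phi)], [math.sin(phi), math.cos(phi)]]


phi = 0.7  # arbitrary test angle
exp_s = expm_2x2_series([[0.0, -phi], [phi, 0.0]])
block = rotation_block(phi)
```

The two matrices agree to rounding error, which also makes the periodicity of the parameters $\phi_i$ visible: adding $2\pi$ to an angle leaves the block unchanged.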


In the cases when Newton type algorithms fail, a coordinate-wise search can be used. This is done by keeping every angle but one, $\phi_i \in \phi$, fixed and minimizing over that single angle. This is then repeated for all angles $\phi_j \in \phi$, $j = 1, \ldots, \tilde{p}$.

Algorithm: Coordinate-wise search

0. Given a search direction $s$ ($S$), compute $U$.
1. Set $\phi_1 = \phi_2 = \ldots = \phi_{\tilde{p}} = 0$.
2. While $\phi$ is not a minimizer of (17)
   2.1. for $i = 1$ to $\tilde{p}$
        2.1.1. $c = \sum_{j=1,\, j \neq i}^{\tilde{p}} A_{f_j} \begin{bmatrix} \cos\phi_j \\ \sin\phi_j \end{bmatrix}$
        2.1.2. Let $\phi_i$ be the solution of
               $$\min_{\phi_i} \left\| A_{f_i} \begin{bmatrix} \cos\phi_i \\ \sin\phi_i \end{bmatrix} - (b - c) \right\|_2^2. \qquad (18)$$
   2.2. end for
3. end While

The subproblem (18) is solved optimally by computing all solutions of a fourth degree polynomial, see [13]. This is a very robust method for minimizing (17), but for larger problems it can be a time-consuming task. It has also been shown to result in too short step lengths, resulting in slow convergence.
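To give a feel for the single-angle subproblem (18), here is a small Python sketch. Note that it does not use the fourth degree polynomial approach of [13]. Instead it substitutes a coarse grid search followed by a golden-section refinement, purely as an illustrative stand-in. The function names, grid size and iteration count are arbitrary assumptions.

```python
import math


def objective(a_rows, d, phi):
    """||A [cos(phi), sin(phi)]^T - d||_2^2, with A given as a list of 2-entry rows."""
    c, s = math.cos(phi), math.sin(phi)
    return sum((row[0] * c + row[1] * s - di) ** 2 for row, di in zip(a_rows, d))


def solve_angle(a_rows, d, grid=720, iters=60):
    """Approximate minimizer of (18) over [0, 2*pi): grid search, then refinement."""
    step = 2.0 * math.pi / grid
    best = min(range(grid), key=lambda i: objective(a_rows, d, i * step))
    lo, hi = (best - 1) * step, (best + 1) * step   # bracket around best grid point
    ratio = (math.sqrt(5.0) - 1.0) / 2.0            # golden-section ratio
    for _ in range(iters):
        x1 = hi - ratio * (hi - lo)
        x2 = lo + ratio * (hi - lo)
        if objective(a_rows, d, x1) < objective(a_rows, d, x2):
            hi = x2
        else:
            lo = x1
    return (0.5 * (lo + hi)) % (2.0 * math.pi)


# Made-up data: A = I (2x2) and d = (0, 1), so the exact minimizer is phi = pi/2.
phi_i = solve_angle([[1.0, 0.0], [0.0, 1.0]], [0.0, 1.0])
```

In the coordinate-wise search, `d` would play the role of $b - c$ from step 2.1.1 and `a_rows` the role of $A_{f_i}$. The objective is a low-degree trigonometric polynomial in $\phi_i$, which is why a moderately fine grid is usually enough to bracket the global minimizer in this sketch.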


C Results from computational experiments

C.1 Tables for the relative type of perturbations

Each entry is the average number of iterations, with the standard deviation in parentheses.

| κ(F) | γ = 0.001 | γ = 0.01 | γ = 0.1 | γ = 0.2 | γ = 0.5 |
|---|---|---|---|---|---|
| 2 | 3 (0) | 3.13 (0.34) | 3.77 (0.42) | 3.92 (0.37) | 4.18 (0.52) |
| 5 | 3.15 (0.36) | 3.77 (0.42) | 4.21 (0.46) | 4.4 (0.57) | 4.73 (0.97) |
| 10 | 3.34 (0.48) | 3.93 (0.48) | 4.46 (0.63) | 4.69 (0.72) | 4.91 (1.06) |
| 50 | 3.69 (0.49) | 4.13 (0.58) | 5.13 (1.01) | 5.15 (1.28) | 5.18 (0.93) |
| 100 | 3.77 (0.6) | 4.25 (0.73) | 5.2 (1.05) | 5.39 (1.14) | 5.14 (1.06) |
| 250 | 3.72 (0.6) | 4.74 (1.28) | 5.21 (1.09) | 5.24 (1.06) | 5.58 (1.61) |
| 500 | 3.92 (0.8) | 4.87 (1.51) | 5.4 (1.41) | 5.76 (1.96) | 5.15 (1.04) |
| 1000 | 3.87 (0.77) | 4.84 (1.22) | 5.26 (1.3) | 5.66 (1.36) | 5.55 (1.31) |
| 2500 | 4.15 (1.1) | 5.14 (2.43) | 5.27 (1.12) | 5.53 (1.36) | 5.35 (1.1) |
| 5000 | 4.54 (1.53) | 5.2 (1.37) | 5.38 (1.43) | 5.3 (1.12) | 5.4 (1.56) |
| 10000 | 4.6 (1.34) | 5.17 (1.41) | 5.29 (1.13) | 5.63 (1.76) | 5.52 (1.32) |

Table 1: m = 3, n = 2.

| κ(F) | γ = 0.001 | γ = 0.01 | γ = 0.1 | γ = 0.2 | γ = 0.5 |
|---|---|---|---|---|---|
| 2 | 3.05 (0.5) | 3.29 (0.52) | 4.00 (0) | 4 (0) | 4.27 (0.49) |
| 5 | 3.19 (0.39) | 4 (0) | 4.4 (0.49) | 4.94 (0.51) | 5.46 (0.77) |
| 10 | 3.71 (0.46) | 4 (0) | 4.95 (0.5) | 5.35 (0.63) | 6.07 (0.96) |
| 50 | 3.98 (0.14) | 4.44 (0.54) | 5.66 (0.79) | 6.26 (1.38) | 7.07 (3.27) |
| 100 | 4.02 (0.25) | 4.75 (0.61) | 5.79 (1.15) | 6.32 (1.65) | 7.13 (3.08) |
| 250 | 4.1 (0.3) | 4.99 (0.82) | 5.88 (0.79) | 6.32 (1.06) | 6.95 (2.04) |
| 500 | 4.11 (0.37) | 5.12 (0.83) | 5.93 (0.87) | 6.59 (2.18) | 7.33 (2.31) |
| 1000 | 4.27 (0.57) | 5.19 (0.85) | 5.95 (1.91) | 6.44 (1.56) | 7.45 (3.34) |
| 2500 | 4.59 (0.93) | 5.18 (0.78) | 5.81 (0.66) | 6.28 (1.16) | 7.11 (2.7) |
| 5000 | 4.63 (0.86) | 5.48 (1.18) | 6.08 (1.35) | 6.2 (1.06) | 6.76 (1.24) |
| 10000 | 4.92 (1.01) | 5.29 (1) | 5.89 (0.91) | 6.33 (1.23) | 6.89 (1.49) |

Table 2: m = 6, n = 5.


| κ(F) | γ = 0.001 | γ = 0.01 | γ = 0.1 | γ = 0.2 | γ = 0.5 |
|---|---|---|---|---|---|
| 2 | 3.07 (0.7) | 3.4 (0.6) | 4 (0) | 4 (0) | 4.17 (0.43) |
| 5 | 3.41 (0.49) | 4 (0) | 4.72 (0.45) | 5.01 (0.33) | 5.58 (0.67) |
| 10 | 3.99 (0.1) | 4.02 (0.14) | 5.3 (0.48) | 5.83 (0.57) | 6.19 (1.24) |
| 50 | 4.01 (0.1) | 4.98 (0.53) | 6.3 (0.7) | 6.51 (0.87) | 7.32 (2.06) |
| 100 | 4.07 (0.26) | 5.28 (0.65) | 6.41 (0.79) | 6.7 (1.11) | 7.47 (2.02) |
| 250 | 4.32 (0.47) | 5.48 (0.89) | 6.44 (0.9) | 7.01 (1.45) | 6.86 (1.1) |
| 500 | 4.33 (0.49) | 5.74 (0.8) | 6.54 (0.88) | 6.79 (1.08) | 7.11 (1.54) |
| 1000 | 4.67 (0.74) | 5.63 (0.79) | 6.53 (0.81) | 7.02 (2.16) | 7.58 (2.77) |
| 2500 | 4.73 (0.85) | 5.87 (0.98) | 6.64 (1.19) | 6.95 (1.4) | 7.44 (2.28) |
| 5000 | 5.23 (1.04) | 5.82 (0.98) | 6.65 (0.89) | 6.85 (1.1) | 7.06 (1.51) |
| 10000 | 5.16 (1.13) | 5.87 (1.02) | 6.6 (1.02) | 6.73 (1.05) | 7.29 (2.26) |

Table 3: m = 12, n = 5.

| κ(F) | γ = 0.001 | γ = 0.01 | γ = 0.1 | γ = 0.2 | γ = 0.5 |
|---|---|---|---|---|---|
| 2 | 3.04 (0.4) | 3.53 (0.61) | 4 (0) | 4 (0) | 4.36 (0.48) |
| 5 | 3.29 (0.46) | 4 (0) | 4.7 (0.46) | 5.03 (0.22) | 6.15 (1.77) |
| 10 | 3.99 (0.1) | 4 (0) | 5.09 (0.29) | 5.58 (0.54) | 6.39 (1.11) |
| 50 | 4 (0) | 4.69 (0.46) | 5.97 (0.63) | 6.41 (0.98) | 8.65 (7.51) |
| 100 | 4 (0) | 4.95 (0.41) | 6.01 (0.58) | 6.64 (1.09) | 7.75 (1.95) |
| 250 | 4.08 (0.27) | 5.2 (0.68) | 6.18 (0.67) | 6.6 (0.94) | 7.8 (3.24) |
| 500 | 4.19 (0.42) | 5.48 (0.81) | 6.26 (0.75) | 6.39 (0.71) | 7.74 (3.02) |
| 1000 | 4.39 (0.62) | 5.44 (0.72) | 6.13 (0.65) | 6.48 (0.9) | 8.13 (5.31) |
| 2500 | 4.64 (0.82) | 5.62 (0.76) | 6.25 (0.69) | 6.6 (0.9) | 7.59 (2.82) |
| 5000 | 4.76 (0.81) | 5.39 (0.72) | 6.1 (0.73) | 6.59 (0.94) | 8.36 (4.15) |
| 10000 | 5.09 (1) | 5.51 (0.83) | 6.07 (0.54) | 6.41 (0.65) | 8 (2.86) |

Table 4: m = 10, n = 7.


| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 4.22 (0.42) | 0 | 0 | 0 |
| 5 | 6.12 (1.27) | 0 | 4 | 1 |
| 10 | 8.06 (3.64) | 0 | 27 | 0 |
| 50 | 9.5 (4.24) | 0 | 70 | 2 |
| 100 | 10.09 (7.04) | 0 | 67 | 1 |
| 250 | 8.91 (2.87) | 0 | 69 | 3 |
| 500 | 8.93 (2.09) | 2 | 61 | 1 |
| 1000 | 9.34 (4.38) | 0 | 63 | 0 |
| 2500 | 10.28 (6.75) | 0 | 65 | 2 |
| 5000 | 10.16 (5.02) | 0 | 72 | 1 |
| 10000 | 9.47 (4.73) | 0 | 68 | 1 |

Table 13: m = 12, n = 5, γ = 0.2.

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 4 (0) | 0 | 0 | 0 |
| 5 | 4.6 (0.49) | 0 | 0 | 0 |
| 10 | 5.03 (0.17) | 0 | 0 | 0 |
| 50 | 5.9 (0.61) | 0 | 0 | 0 |
| 100 | 6.09 (0.64) | 0 | 0 | 0 |
| 250 | 6.09 (0.6) | 0 | 0 | 0 |
| 500 | 6.03 (0.64) | 0 | 0 | 0 |
| 1000 | 6.14 (0.64) | 0 | 0 | 0 |
| 2500 | 6.19 (0.54) | 0 | 0 | 0 |
| 5000 | 6.05 (0.52) | 0 | 0 | 0 |
| 10000 | 6.23 (0.58) | 0 | 0 | 0 |

Table 14: m = 10, n = 5, γ = 0.05.

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 4 (0) | 0 | 0 | 0 |
| 5 | 5.01 (0.1) | 0 | 0 | 0 |
| 10 | 5.65 (0.61) | 0 | 0 | 0 |
| 50 | 6.13 (0.56) | 0 | 0 | 0 |
| 100 | 6.54 (0.86) | 0 | 2 | 0 |
| 250 | 6.44 (0.88) | 0 | 0 | 0 |
| 500 | 6.56 (0.81) | 0 | 0 | 0 |
| 1000 | 6.63 (0.91) | 0 | 0 | 0 |
| 2500 | 6.62 (0.83) | 0 | 2 | 0 |
| 5000 | 6.45 (0.77) | 0 | 1 | 0 |
| 10000 | 6.61 (1.38) | 0 | 2 | 0 |

Table 15: m = 10, n = 5, γ = 0.1.


| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 4.03 (0.17) | 0 | 0 | 0 |
| 5 | 5.68 (0.55) | 0 | 0 | 0 |
| 10 | 6.56 (0.98) | 0 | 5 | 0 |
| 50 | 8.99 (4.06) | 0 | 20 | 1 |
| 100 | 9.02 (5.49) | 0 | 22 | 5 |
| 250 | 10.67 (10.24) | 0 | 31 | 4 |
| 500 | 9.44 (9.4) | 0 | 28 | 1 |
| 1000 | 8.48 (2.89) | 0 | 33 | 0 |
| 2500 | 8.04 (2.13) | 0 | 27 | 4 |
| 5000 | 9.24 (5.15) | 0 | 26 | 2 |
| 10000 | 8.73 (3.52) | 0 | 24 | 1 |

Table 16: m = 10, n = 5, γ = 0.2.

C.2.2 Using non-square matrix $F \in \mathbb{R}^{(mn-n) \times mn}$

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 7.25 (1.48) | 0 | 9 | 9 |
| 5 | 7.65 (1.47) | 0 | 10 | 10 |
| 10 | 9.29 (4.78) | 0 | 10 | 10 |
| 50 | 10.03 (3.25) | 0 | 18 | 17 |
| 100 | 11.9 (4.44) | 0 | 15 | 14 |
| 250 | 11.81 (4.08) | 0 | 24 | 23 |
| 500 | 12.76 (3.95) | 0 | 31 | 28 |
| 1000 | 13.34 (4.34) | 0 | 36 | 34 |
| 2500 | 13.33 (3.83) | 0 | 35 | 35 |
| 5000 | 14.08 (4.38) | 0 | 25 | 24 |
| 10000 | 13.65 (3.88) | 0 | 35 | 33 |

Table 17: m = 6, n = 5, γ = 0.05.


| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 7.36 (1.84) | 0 | 8 | 8 |
| 5 | 8.86 (2.78) | 0 | 8 | 8 |
| 10 | 9.67 (3.6) | 0 | 15 | 11 |
| 50 | 10.7 (3.34) | 0 | 39 | 28 |
| 100 | 12.44 (4.53) | 0 | 35 | 24 |
| 250 | 13.47 (4.39) | 0 | 43 | 33 |
| 500 | 13.82 (4.56) | 0 | 43 | 33 |
| 1000 | 14.72 (7.26) | 0 | 36 | 29 |
| 2500 | 14.52 (5.25) | 0 | 47 | 35 |
| 5000 | 14.48 (5.23) | 0 | 41 | 35 |
| 10000 | 14.94 (4.93) | 0 | 42 | 31 |

Table 18: m = 6, n = 5, γ = 0.1.

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 8.01 (2.63) | 0 | 10 | 7 |
| 5 | 9.24 (2.79) | 0 | 26 | 10 |
| 10 | 10.11 (3.2) | 0 | 46 | 19 |
| 50 | 12.18 (4.93) | 1 | 60 | 26 |
| 100 | 13.21 (3.81) | 0 | 58 | 33 |
| 250 | 14.91 (7.41) | 0 | 71 | 36 |
| 500 | 13.16 (5.52) | 0 | 67 | 29 |
| 1000 | 12.92 (3.52) | 0 | 62 | 27 |
| 2500 | 14.73 (5.48) | 0 | 58 | 32 |
| 5000 | 15.05 (8.87) | 0 | 67 | 38 |
| 10000 | 14.47 (5.44) | 0 | 61 | 32 |

Table 19: m = 6, n = 5, γ = 0.2.

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 7.35 (1.45) | 0 | 2 | 2 |
| 5 | 8.21 (1.47) | 0 | 4 | 0 |
| 10 | 8.92 (1.79) | 0 | 14 | 3 |
| 50 | 11.57 (3.21) | 0 | 42 | 9 |
| 100 | 12.47 (3.25) | 0 | 53 | 7 |
| 250 | 14.38 (4.49) | 0 | 50 | 11 |
| 500 | 13.81 (3.06) | 0 | 40 | 4 |
| 1000 | 15.37 (4.99) | 0 | 39 | 8 |
| 2500 | 14.77 (3.54) | 0 | 38 | 4 |
| 5000 | 14.79 (3.87) | 0 | 49 | 9 |
| 10000 | 15.18 (4.99) | 0 | 45 | 9 |

Table 20: m = 10, n = 4, γ = 0.05.


| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 7.55 (1.36) | 0 | 19 | 2 |
| 5 | 8.96 (5.19) | 0 | 29 | 1 |
| 10 | 9.54 (3.19) | 0 | 46 | 6 |
| 50 | 11.49 (2.49) | 0 | 62 | 2 |
| 100 | 13.09 (3.87) | 0 | 55 | 3 |
| 250 | 13.72 (5.38) | 0 | 68 | 4 |
| 500 | 14.2 (3.53) | 0 | 59 | 6 |
| 1000 | 14.15 (3.66) | 0 | 69 | 6 |
| 2500 | 14.6 (4.3) | 0 | 74 | 9 |
| 5000 | 14.38 (4.37) | 0 | 59 | 4 |
| 10000 | 14.78 (3.49) | 0 | 74 | 7 |

Table 21: m = 10, n = 4, γ = 0.1.

| κ(F) | Iterations | Fails | New min | Not global |
|---|---|---|---|---|
| 2 | 8.63 (3.83) | 0 | 60 | 2 |
| 5 | 9.39 (3.44) | 0 | 70 | 2 |
| 10 | 10.29 (3.26) | 0 | 85 | 5 |
| 50 | 12.42 (3.53) | 0 | 79 | 4 |
| 100 | 13.61 (4.69) | 0 | 79 | 0 |
| 250 | 14.38 (7.06) | 0 | 88 | 3 |
| 500 | 13.64 (3.12) | 0 | 81 | 3 |
| 1000 | 15.04 (4.18) | 1 | 85 | 2 |
| 2500 | 14.79 (6.26) | 0 | 85 | 3 |
| 5000 | 15.51 (9.29) | 0 | 81 | 2 |
| 10000 | 13.62 (3.09) | 1 | 78 | 2 |

Table 22: m = 10, n = 4, γ = 0.2.

References

[1] M. T. Chu and N. T. Trendafilov. On a Differential Equation Approach to the Weighted Orthogonal Procrustes Problem. Statistics and Computing, 8(2):125–133, 1998.

[2] M. T. Chu and N. T. Trendafilov. The Orthogonally Constrained Regression Revisited. J. Comput. Graph. Stat., 10:746–771, 2001.

[3] A. Edelman, T. A. Arias, and S. T. Smith. The Geometry of Algorithms with Orthogonality Constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.

[4] L. Eldén and H. Park. A Procrustes problem on the Stiefel manifold. Numer. Math., 82(4):599–619, 1999.

[5] W. Gander. Least Squares with a Quadratic Constraint. Numer. Math., 36:291–307, 1981.


[6] P. R. Halmos. Finite-dimensional vector spaces. Van Nostrand, 1958.

[7] M. A. Koschat and D. F. Swayne. A Weighted Procrustes Criterion. Psychometrika, 56(2):229–239, 1991.

[8] A. Mooijaart and J. J. F. Commandeur. A General Solution of the Weighted Orthonormal Procrustes Problem. Psychometrika, 55(4):657–663, 1990.

[9] T. Rapcsak. On Minimization on Stiefel manifolds. European J. Oper. Res., 143(2):365–376, 2002.

[10] I. Söderkvist. Some Numerical Methods for Kinematical Analysis. ISSN-0348-0542, UMINF-186.90, Department of Computing Science, Umeå University, 1990.

[11] I. Söderkvist and Per-Åke Wedin. On Condition Numbers and Algorithms for Determining a Rigid Body Movement. BIT, 34:424–436, 1994.

[12] E. Stiefel. Richtungsfelder und Fernparallelismus in n-dimensionalen Mannigfaltigkeiten. Commentarii Math. Helvetici, 8:305–353, 1935-1936.

[13] P. Å. Wedin and T. Viklands. Algorithms for 3-dimensional Weighted Orthogonal Procrustes Problems. Technical Report UMINF-06.06, Department of Computing Science, Umeå University, Umeå, Sweden, 2006.

