Lecture notes on matrices and least squares


mesoscopic.mines.edu

For example,

\[
AB = \begin{pmatrix} 0 & 1 \\ 2 & 3 \end{pmatrix}
     \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
   = \begin{pmatrix} 3 & 4 \\ 11 & 16 \end{pmatrix}. \qquad (3.2.21)
\]

On the other hand, note well that

\[
BA = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
     \begin{pmatrix} 0 & 1 \\ 2 & 3 \end{pmatrix}
   = \begin{pmatrix} 4 & 7 \\ 8 & 15 \end{pmatrix} \neq AB. \qquad (3.2.22)
\]

This definition of matrix-matrix product even extends to the case in which both matrices are vectors. If x ∈ R^m and y ∈ R^n, then xy (called the "outer" product and usually written as xy^T) is

\[
(xy)_{ij} = x_i y_j. \qquad (3.2.23)
\]

So if

\[
x = \begin{pmatrix} -1 \\ 1 \end{pmatrix} \qquad (3.2.24)
\]

and

\[
y = \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix} \qquad (3.2.25)
\]

then

\[
xy^T = \begin{pmatrix} -1 & -3 & 0 \\ 1 & 3 & 0 \end{pmatrix}. \qquad (3.2.26)
\]

Here is a brief summary of the notation for inner products:

\[
x \cdot y = x^T y = (x, y) = \sum_i x_i y_i = x_i y_i \quad \text{(summation convention)}
\]

3.3 Some Special Matrices

The identity element in the space of square n × n matrices is a matrix with ones on the main diagonal and zeros everywhere else:

\[
I_n = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & \cdots & 0 & 0 & 1
\end{pmatrix}. \qquad (3.3.1)
\]
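The notes themselves don't rely on software, but these products are easy to check numerically. A minimal NumPy sketch, using the matrices from equations (3.2.21)-(3.2.26):

```python
import numpy as np

# The matrices from equations (3.2.21)-(3.2.22)
A = np.array([[0, 1], [2, 3]])
B = np.array([[1, 2], [3, 4]])
print(A @ B)            # [[ 3  4] [11 16]]
print(B @ A)            # [[ 4  7] [ 8 15]], not equal to A @ B

# The outer product of x in R^2 with y in R^3, equation (3.2.26)
x = np.array([-1, 1])
y = np.array([1, 3, 0])
print(np.outer(x, y))   # [[-1 -3  0] [ 1  3  0]]
```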


...that it is never negative and it is zero if and only if the scalar itself is zero. For vectors and matrices both we can define a generalization of this concept of length called a norm. A norm is a function from the space of vectors onto the scalars, denoted by ‖·‖, satisfying the following properties for any two vectors v and u and any scalar α:

Definition 2 (Norms)

N1: ‖v‖ > 0 for any v ≠ 0, and ‖v‖ = 0 ⇔ v = 0
N2: ‖αv‖ = |α| ‖v‖
N3: ‖v + u‖ ≤ ‖v‖ + ‖u‖

Property N3 is called the triangle inequality.

The most useful class of norms for vectors in R^n is the p-norm, defined for p ≥ 1 by

\[
\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}. \qquad (3.4.1)
\]

For p = 2 this is just the ordinary Euclidean norm: ‖x‖_2 = √(x^T x). A finite limit of the p-norm exists as p → ∞, called the ∞-norm:

\[
\|x\|_\infty = \max_{1 \le i \le n} |x_i|. \qquad (3.4.2)
\]

We won't need matrix norms in this class, but in case you're interested, any norm on vectors in R^n induces a norm on matrices via

\[
\|A\| = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|}. \qquad (3.4.3)
\]

E.g., let x = (1, 1)^T; then ‖x‖_2 = √(1·1 + 1·1) = √2.

3.5 Projecting Vectors Onto Other Vectors

Figure 3.1 illustrates the basic idea of projecting one vector onto another. We can always represent one, say b, in terms of its components parallel and perpendicular to the other. The length of the component of b along a is ‖b‖ cos θ, which is also b^T a/‖a‖.

Now suppose we want to construct a vector in the direction of a but whose length is the component of b along a. We did this, in effect, when we computed the tangential force


[Figure 3.1: Let a and b be any two vectors. We can always represent one, say b, in terms of its components parallel and perpendicular to the other. The length of the component of b along a is ‖b‖ cos θ, which is also b^T a/‖a‖.]

of gravity on a simple pendulum. What we need to do is multiply ‖b‖ cos θ by a unit vector in the a direction. Obviously a convenient unit vector in the a direction is a/‖a‖, which equals

\[
\frac{a}{\sqrt{a^T a}}.
\]

So a vector in the a direction with length ‖b‖ cos θ is given by

\[
\|b\| \cos\theta \, \hat{a} = \frac{a^T b}{\|a\|} \frac{a}{\|a\|} \qquad (3.5.1)
\]
\[
= \frac{a \,(a^T b)}{\|a\|^2} = \frac{a a^T b}{a^T a} = \frac{a a^T}{a^T a}\, b. \qquad (3.5.2)
\]

As an exercise verify that in general a(a^T b) = (aa^T)b. This is not completely obvious since in one expression there is an inner product in the parentheses and in the other there is an outer product.

What we've managed to show is that the projection of the vector b onto the direction of a can be achieved with the following matrix (operator):

\[
\frac{a a^T}{a^T a}.
\]

This is our first example of a projection operator.
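Here is a small numerical illustration of the projection operator aa^T/(a^T a), a NumPy sketch with arbitrarily chosen vectors a and b:

```python
import numpy as np

a = np.array([2.0, 0.0, 0.0])        # arbitrary example vectors
b = np.array([1.0, 1.0, 3.0])

P = np.outer(a, a) / (a @ a)          # the projection operator a a^T / (a^T a)
print(P @ b)                          # component of b along a: [1. 0. 0.]
print(np.allclose(P @ P, P))          # True: projecting twice changes nothing
print(np.allclose(a * (a @ b), np.outer(a, a) @ b))   # a(a^T b) == (a a^T) b
```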


3.7.2 A Geometrical Picture

Any vector in the null space of a matrix must be orthogonal to all the rows (since each row of the matrix dotted into the vector is zero). Therefore all the elements in the null space are orthogonal to all the elements in the row space. In mathematical terminology, the null space and the row space are orthogonal complements of one another. Or, to say the same thing, they are orthogonal subspaces of R^m. Similarly, vectors in the left null space of a matrix are orthogonal to all the columns of this matrix. This means that the left null space of a matrix is the orthogonal complement of the column space; they are orthogonal subspaces of R^n.

3.8 Matrix Inverses

A left inverse of a matrix A ∈ R^{n×m} is defined to be a matrix B such that

\[
BA = I. \qquad (3.8.1)
\]

A right inverse C therefore must satisfy

\[
AC = I. \qquad (3.8.2)
\]

If there exists a left and a right inverse of A then they must be equal since matrix multiplication is associative:

\[
AC = I \;\Rightarrow\; B(AC) = B \;\Rightarrow\; (BA)C = B \;\Rightarrow\; C = B. \qquad (3.8.3)
\]

Now if we have more equations than unknowns then the columns cannot possibly span all of R^n. Certainly the rank r must be less than or equal to n, but it can only equal n if we have at least as many unknowns as equations. The basic existence result is then:

Theorem 2 (Existence of solutions to Ax = y) The system Ax = y has at least one solution x for every y (there might be infinitely many solutions) if and only if the columns span R^n (r = n), in which case there exists an m × n right inverse C such that AC = I_n. This is only possible if n ≤ m.

Don't be misled by the picture above into neglecting the important special case when m = n. The point is that the basic issues of existence and, next, uniqueness depend on whether there are more or fewer rows than columns. The statement of uniqueness is:

Theorem 3 (Uniqueness of solutions to Ax = y) There is at most one solution to Ax = y (there might be none) if and only if the columns of A are linearly independent (r = m), in which case there exists an m × n left inverse B such that BA = I_m. This is only possible if n ≥ m.


Clearly then, in order to have both existence and uniqueness, we must have r = m = n. This precludes having existence and uniqueness for rectangular matrices. For square matrices m = n, so existence implies uniqueness and uniqueness implies existence.

Using the left and right inverses we can find solutions to Ax = y, if they exist. For example, given a right inverse C of A, then since AC = I, we have ACy = y. But since Ax = y it follows that x = Cy. But C is not necessarily unique. On the other hand, if there exists a left inverse BA = I, then BAx = By, which implies that x = By.

Some examples. Consider first the case of more equations than unknowns. Let

\[
A = \begin{pmatrix} -1 & 0 \\ 0 & 3 \\ 0 & 0 \end{pmatrix}. \qquad (3.8.4)
\]

Since the columns are linearly independent and there are more rows than columns, there can be at most one solution. You can readily verify that any matrix of the form

\[
\begin{pmatrix} -1 & 0 & \gamma \\ 0 & 1/3 & \iota \end{pmatrix} \qquad (3.8.5)
\]

is a left inverse. The particular left inverse given by the formula (A^T A)^{-1} A^T (cf. the exercise at the end of this chapter) is the one for which γ and ι are zero. But there are infinitely many other left inverses. As for solutions of Ax = y, if we multiply A by the vector (x_1, x_2)^T we get

\[
\begin{pmatrix} -x_1 \\ 3 x_2 \\ 0 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}. \qquad (3.8.6)
\]

So, clearly, we must have x_1 = −y_1 and x_2 = y_2/3. But there will not be any solution unless y_3 = 0.

Next, let's consider the case of more columns (unknowns) than rows (equations). Let

\[
A = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 3 & 0 \end{pmatrix}. \qquad (3.8.7)
\]

Here you can readily verify that any matrix of the form

\[
\begin{pmatrix} -1 & 0 \\ 0 & 1/3 \\ \gamma & \iota \end{pmatrix} \qquad (3.8.8)
\]

is a right inverse. The particular right inverse (shown in the exercise at the end of this chapter) A^T (AA^T)^{-1} corresponds to γ = ι = 0.

Now if we look at solutions of the linear system Ax = y with x ∈ R^3 and y ∈ R^2, we find that x_1 = −y_1, x_2 = y_2/3, and that x_3 is completely undetermined. So there is an infinite set of solutions corresponding to the different values of x_3.
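The particular left and right inverses (A^T A)^{-1}A^T and A^T(AA^T)^{-1} used in these examples are easy to verify numerically. A NumPy sketch:

```python
import numpy as np

# Tall matrix (more equations than unknowns), equation (3.8.4)
A = np.array([[-1.0, 0.0], [0.0, 3.0], [0.0, 0.0]])
B = np.linalg.inv(A.T @ A) @ A.T        # a left inverse: B A = I_2
print(np.allclose(B @ A, np.eye(2)))    # True

# Wide matrix (more unknowns than equations), equation (3.8.7)
A = np.array([[-1.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
C = A.T @ np.linalg.inv(A @ A.T)        # a right inverse: A C = I_2
print(np.allclose(A @ C, np.eye(2)))    # True
```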


3.9 Elementary operations and Gaussian Elimination

I am assuming that you've seen this before, so this is a very terse review. If not, see the book by Strang in the bibliography.

Elementary matrix operations consist of:

• Interchanging two rows (or columns)
• Multiplying a row (or column) by a nonzero constant
• Adding a multiple of one row (or column) to another row (or column)

If you have a matrix that can be derived from another matrix by a sequence of elementary operations, then the two matrices are said to be row or column equivalent. For example

\[
A = \begin{pmatrix} 1 & 2 & 4 & 3 \\ 2 & 1 & 3 & 2 \\ 1 & -1 & 2 & 3 \end{pmatrix}
\]

is row equivalent to

\[
B = \begin{pmatrix} 2 & 4 & 8 & 6 \\ 1 & -1 & 2 & 3 \\ 4 & -1 & 7 & 8 \end{pmatrix}
\]

because we can add 2 times row 3 of A to row 2 of A; then interchange rows 2 and 3; finally multiply row 1 by 2.

Gaussian elimination consists of two phases. The first is the application of elementary operations to try to put the matrix in row-reduced form; i.e., making zero all the elements below the main diagonal (and normalizing the diagonal elements to 1). The second phase is back-substitution. Unless the matrix is very simple, calculating any of the four fundamental subspaces is probably easiest if you put the matrix in row-reduced form first.

3.9.1 Examples

1. Find the row-reduced form and the null space of

\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}.
\]


Answer: A row-reduced form of the matrix is

\[
\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \end{pmatrix}.
\]

Now, some people reserve the term row-reduced (or row-reduced echelon) form for the matrix that also has zeros above the ones. We can get this form in one more step:

\[
\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \end{pmatrix}.
\]

The null space of A can be obtained by solving the system

\[
\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} =
\begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]

So we must have x_1 = x_3 and x_2 = −2x_3. So the null space is the line spanned by (1, −2, 1).

2. Solve the linear system Ax = y with y = (1, 1):

Answer: Any vector of the form (z − 1, 1 − 2z, z) will do. For instance, (−1, 1, 0).

3. Solve the linear system Ax = y with y = (0, −1):

Answer: One example is (−2/3, 1/3, 0).

4. Find the row-reduced form and the null space of the matrix

\[
B = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}.
\]

Answer: The row-reduced matrix is

\[
\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}.
\]

The null space is spanned by (1, −2, 1).
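A numerical way to recover the null space is via the singular value decomposition (introduced in Section 3.12): the right singular vectors belonging to zero singular values span it. A NumPy sketch for the matrix A of Example 1:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
U, s, Vt = np.linalg.svd(A)

# rank = number of nonzero singular values; remaining rows of Vt span the null space
r = np.sum(s > 1e-12)
null_basis = Vt[r:]                       # here a single vector
print(null_basis)                         # proportional to (1, -2, 1)/sqrt(6)
print(np.allclose(A @ null_basis.T, 0))   # True
```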


5. Find the row-reduced form and the null space of the matrix

\[
C = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 1 & 0 & 1 \end{pmatrix}.
\]

Answer: The row-reduced matrix is

\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]

The only element in the null space is the zero vector.

6. Find the null space of the matrix

\[
D = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 2 \end{pmatrix}.
\]

Answer: You can solve the linear system Dx = y with y = (0, 0) and discover that x_1 = −2x_3 = −2x_2. This means that the null space is spanned by (−2, 1, 1). The row-reduced form of the matrix is

\[
\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \end{pmatrix}.
\]

7. Are the following vectors in R^3 linearly independent or dependent? If they are dependent, express one as a linear combination of the others.

\[
\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 \\ 2 \\ 3 \end{pmatrix}, \quad
\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad
\begin{pmatrix} 3 \\ 6 \\ 6 \end{pmatrix}
\]

Answer: The vectors are obviously dependent since you cannot have four linearly independent vectors in a three dimensional space. If you put the matrix in row-reduced form you will get

\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
\]

The first three vectors are indeed linearly independent. Note that the determinant of

\[
\begin{pmatrix} 1 & 0 & 1 \\ 1 & 2 & 2 \\ 0 & 3 & 3 \end{pmatrix}
\]


is equal to 3.

To find the desired linear combination we need to solve

\[
x \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} +
y \begin{pmatrix} 0 \\ 2 \\ 3 \end{pmatrix} +
z \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} =
\begin{pmatrix} 3 \\ 6 \\ 6 \end{pmatrix}
\]

or

\[
\begin{pmatrix} 1 & 0 & 1 \\ 1 & 2 & 2 \\ 0 & 3 & 3 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix} =
\begin{pmatrix} 3 \\ 6 \\ 6 \end{pmatrix}.
\]

Gaussian elimination on the augmented matrix could proceed as follows (the sequence of steps is not unique of course): first divide the third row by 3,

\[
\begin{pmatrix} 1 & 0 & 1 & 3 \\ 1 & 2 & 2 & 6 \\ 0 & 1 & 1 & 2 \end{pmatrix} \rightarrow
\begin{pmatrix} 1 & 0 & 1 & 3 \\ 0 & 2 & 1 & 3 \\ 0 & 1 & 1 & 2 \end{pmatrix} \rightarrow
\begin{pmatrix} 1 & 0 & 1 & 3 \\ 0 & 0 & -1 & -1 \\ 0 & 1 & 1 & 2 \end{pmatrix} \rightarrow
\begin{pmatrix} 1 & 0 & 1 & 3 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix} \rightarrow
\begin{pmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}.
\]

Thus we have z = y = 1 and x + z = 3, which implies that x = 2. So the solution is (2, 1, 1), and you can verify that

\[
2 \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} +
1 \begin{pmatrix} 0 \\ 2 \\ 3 \end{pmatrix} +
1 \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} =
\begin{pmatrix} 3 \\ 6 \\ 6 \end{pmatrix}.
\]

3.10 Least Squares

In this section we will consider the problem of solving Ax = y when no solution exists, i.e., we consider what happens when there is no vector that satisfies the equations exactly. This sort of situation occurs all the time in science and engineering. Often we


make repeated measurements which, because of noise, for example, are not exactly consistent. Suppose we make n measurements of some quantity x. Let x_i denote the i-th measurement. You can think of this as n equations with 1 unknown:

\[
\begin{pmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} x =
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}.
\]

Obviously unless all the x_i are the same, there cannot be a value of x which satisfies all the equations simultaneously. Being practical people we could, at least for this simple problem, ignore all the linear algebra and simply assert that we want to find the value of x which minimizes the sum of squared errors:

\[
\min_x \sum_{i=1}^{n} (x - x_i)^2.
\]

Differentiating this equation with respect to x and setting the result equal to zero gives

\[
x_{ls} = \frac{1}{n} \sum_{i=1}^{n} x_i,
\]

where we have used x_ls to denote the least squares value of x. In other words the value of x that minimizes the sum of squares of the errors is just the mean of the data.

In more complicated situations (with n equations and m unknowns) it's not quite so obvious how to proceed. Let's return to the basic problem of solving Ax = y. If y were in the column space of A, then there would exist a vector x such that Ax = y. On the other hand, if y is not in the column space of A, a reasonable strategy is to try to find an approximate solution from within the column space. In other words, find a linear combination of the columns of A that is as close as possible in a least squares sense to the data. Let's call this approximate solution x_ls. Since Ax_ls is, by definition, confined to the column space of A, then Ax_ls − y (the error in fitting the data) must be in the orthogonal complement of the column space. The orthogonal complement of the column space is the left null space, so Ax_ls − y must get mapped into zero by A^T:

\[
A^T (A x_{ls} - y) = 0
\]

or

\[
A^T A x_{ls} = A^T y.
\]

These are called the normal equations. Now we saw in the last chapter that the outer product of a vector or matrix with itself defined a projection operator onto the subspace spanned by the vector (or columns of the matrix). If we look again at the normal


equations and assume for the moment that the matrix A^T A is invertible, then the least squares solution is:

\[
x_{ls} = (A^T A)^{-1} A^T y.
\]

The matrix (A^T A)^{-1} A^T is an example of what is called a generalized inverse of A. In the event that A is not invertible in the usual sense, this provides a reasonable generalization (not the only one) of the ordinary inverse.

Now A applied to the least squares solution is the approximation to the data from within the column space. So Ax_ls is precisely the projection of the data y onto the column space:

\[
A x_{ls} = A (A^T A)^{-1} A^T y.
\]

Before, when we did orthogonal projections, the projecting vectors/matrices were orthogonal, so the A^T A term would have been the identity, but the outer product structure in Ax_ls is evident. The generalized inverse projects the data onto the column space of A.

A few observations:

• When A is invertible (square, full rank), A(A^T A)^{-1} A^T = A A^{-1} (A^T)^{-1} A^T = I, so every vector projects to itself.

• A^T A has the same null space as A. Proof: clearly if Ax = 0, then A^T Ax = 0. Going the other way, suppose A^T Ax = 0. Then x^T A^T Ax = 0. But this can also be written as (Ax, Ax) = ‖Ax‖² = 0. By the properties of the norm, ‖Ax‖² = 0 ⇒ Ax = 0.

• As a corollary of this, if A has linearly independent columns (i.e., the rank r = m), then A^T A is invertible.

Finally, it's not too difficult to show that the normal equations can also be derived by directly minimizing the following function:

\[
\|Ax - y\|^2 = (Ax - y, Ax - y).
\]

This is just the sum of the squared errors, but for n simultaneous equations in m unknowns. You can either write this vector function out explicitly in terms of its components and use ordinary calculus, or you can actually differentiate the expression with respect to the vector x and set the result equal to zero. So for instance, since

\[
(Ax, Ax) = (A^T Ax, x) = (x, A^T Ax),
\]

differentiating (Ax, Ax) with respect to x yields 2A^T Ax, one factor coming from each factor of x. The details will be left as an exercise.
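Numerically, solving the normal equations, applying the generalized inverse, and calling a library least squares routine all agree, and the matrix A(A^T A)^{-1}A^T behaves as a projection. A NumPy sketch with an arbitrarily chosen overdetermined system:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))             # arbitrary 6x3 matrix with independent columns
y = rng.normal(size=6)

x_normal = np.linalg.solve(A.T @ A, A.T @ y)      # the normal equations
x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]    # library least squares
print(np.allclose(x_normal, x_lstsq))             # True

P = A @ np.linalg.inv(A.T @ A) @ A.T              # projector onto the column space
print(np.allclose(P @ P, P))                      # True: P is a projection
print(np.allclose(P @ y, A @ x_normal))           # True: A x_ls is the projection of y
```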


3.10.1 Examples of Least Squares

Let us return to the problem we started above:

\[
\begin{pmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} x =
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}.
\]

Ignoring linear algebra and just going for a least squares value of the parameter x we came up with:

\[
x_{ls} = \frac{1}{n} \sum_{i=1}^{n} x_i.
\]

Let's make sure we get the same thing using the generalized inverse approach. Now, A^T A is just

\[
(1\ 1\ 1\ \cdots\ 1) \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} = n.
\]

So the generalized inverse of A is

\[
(A^T A)^{-1} A^T = \frac{1}{n} (1\ 1\ 1\ \cdots\ 1).
\]

Hence the generalized inverse solution is

\[
\frac{1}{n} (1\ 1\ 1\ \cdots\ 1)
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}
= \frac{1}{n} \sum_{i=1}^{n} x_i,
\]

as we knew already.

Consider a more interesting example:

\[
\begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 0 & 2 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} =
\begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix}.
\]

Thus x + y = α, y = β and 2y = γ. So, for example, if α = 1, and β = γ = 0, then x = 1, y = 0 is a solution. In that case the right hand side is in the column space of A.
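A quick numerical check of this consistent case (a NumPy sketch); the inconsistent case is treated next:

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0], [0.0, 2.0]])

# Consistent right-hand side: alpha = 1, beta = gamma = 0
rhs = np.array([1.0, 0.0, 0.0])
sol, residual, rank, s = np.linalg.lstsq(A, rhs, rcond=None)
print(sol)        # [1. 0.]: the exact solution x = 1, y = 0
print(A @ sol)    # [1. 0. 0.]: reproduces the right-hand side exactly
```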


But now suppose the right hand side is α = β = 0 and γ = 1. It is not hard to see that the column vector (0, 0, 1)^T is not in the column space of A. (Show this as an exercise.) So what do we do? We solve the normal equations. Here are the steps. We want to solve (in the least squares sense) the following system:

\[
\begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 0 & 2 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} =
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
\]

So first compute

\[
A^T A = \begin{pmatrix} 1 & 1 \\ 1 & 6 \end{pmatrix}.
\]

The inverse of this matrix is

\[
(A^T A)^{-1} = \frac{1}{5} \begin{pmatrix} 6 & -1 \\ -1 & 1 \end{pmatrix}.
\]

So the generalized inverse solution (i.e., the least squares solution) is

\[
x_{ls} = \begin{pmatrix} 1 & -1/5 & -2/5 \\ 0 & 1/5 & 2/5 \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} =
\begin{pmatrix} -2/5 \\ 2/5 \end{pmatrix}.
\]

The interpretation of this solution is that it satisfies the first equation exactly (since x + y = 0) and it does an average job of satisfying the second and third equations. Least squares tends to average inconsistent information.

3.11 Eigenvalues and Eigenvectors

Recall that in Chapter 1 we showed that the equations of motion for two coupled masses are

\[
m_1 \ddot{x}_1 = -k_1 x_1 - k_2 (x_1 - x_2)
\]
\[
m_2 \ddot{x}_2 = -k_3 x_2 - k_2 (x_2 - x_1)
\]

or, restricting ourselves to the case in which m_1 = m_2 = m and k_1 = k_2 = k_3 = k,

\[
\ddot{x}_1 = -\frac{k}{m} x_1 - \frac{k}{m} (x_1 - x_2)
= -\omega_0^2 x_1 - \omega_0^2 (x_1 - x_2)
= -2\omega_0^2 x_1 + \omega_0^2 x_2. \qquad (3.11.1)
\]


and

\[
\ddot{x}_2 = -\frac{k}{m} x_2 - \frac{k}{m} (x_2 - x_1)
= -\omega_0^2 x_2 - \omega_0^2 (x_2 - x_1)
= -2\omega_0^2 x_2 + \omega_0^2 x_1. \qquad (3.11.2)
\]

If we look for the usual suspect solutions

\[
x_1 = A e^{i\omega t} \qquad (3.11.3)
\]
\[
x_2 = B e^{i\omega t} \qquad (3.11.4)
\]

we see that the relationship between the displacement amplitudes A and B and ω can be written as the following matrix equation:

\[
\begin{pmatrix} 2\omega_0^2 & -\omega_0^2 \\ -\omega_0^2 & 2\omega_0^2 \end{pmatrix}
\begin{pmatrix} A \\ B \end{pmatrix} =
\omega^2 \begin{pmatrix} A \\ B \end{pmatrix}. \qquad (3.11.5)
\]

This equation has the form of a matrix times a vector being equal to a scalar times the same vector:

\[
K u = \omega^2 u. \qquad (3.11.6)
\]

In other words, the action of the matrix is to map the vector (A, B)^T into a scalar multiple of itself. This is a very special thing for a matrix to do.

Without using any linear algebra we showed way back on page 22 that the solutions of the equations of motion had two characteristic frequencies (ω = ω_0 and ω = √3 ω_0), while the vector (A, B)^T was either (1, 1)^T for the slow mode (ω = ω_0) or (1, −1)^T for the fast mode (ω = √3 ω_0). You can quickly verify that these two sets of vectors/frequencies do indeed satisfy the matrix equation (3.11.5).

Now we will look at equations of the general form of (3.11.6) more systematically. We will see that finding the eigenvectors of a matrix gives us fundamental information about the system which the matrix models. Usually when a matrix operates on a vector, it changes the direction of the vector as well as its length. But for a special class of vectors, eigenvectors, the action of the matrix is to simply scale the vector:

\[
Ax = \lambda x. \qquad (3.11.7)
\]

If this is true, then x is an eigenvector of the matrix A associated with the eigenvalue λ. Now, λx equals λIx, so we can rearrange this equation and write

\[
(A - \lambda I)x = 0. \qquad (3.11.8)
\]

Clearly in order that x be an eigenvector we must choose λ so that (A − λI) has a nullspace, and we must choose x so that it lies in that nullspace. That means we must choose λ so that det(A − λI) = 0.
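A quick numerical check of these modes, a NumPy sketch with ω_0 set to 1 for illustration:

```python
import numpy as np

w0 = 1.0                               # take omega_0 = 1 for illustration
K = w0**2 * np.array([[2.0, -1.0], [-1.0, 2.0]])

vals, vecs = np.linalg.eigh(K)         # K is symmetric, so eigh is appropriate
print(vals)                            # [1. 3.]: omega^2 = omega_0^2 and 3*omega_0^2
print(vecs)                            # columns proportional to (1, 1) and (1, -1)
```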


• A matrix can be invertible without being diagonalizable. For example,

\[
\begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix}. \qquad (3.11.21)
\]

Its two eigenvalues are both equal to 3 and its eigenvectors cannot be linearly independent. However the inverse of this matrix is straightforward:

\[
\begin{pmatrix} 1/3 & -1/9 \\ 0 & 1/3 \end{pmatrix}. \qquad (3.11.22)
\]

We can summarize these ideas with a theorem whose proof can be found in linear algebra books.

Theorem 5 (Linear independence of eigenvectors) If n eigenvectors of an n × n matrix correspond to n different eigenvalues, then the eigenvectors are linearly independent.

An important class of matrices for inverse theory are the real symmetric matrices. The reason is that since we have to deal with rectangular matrices, we often end up treating the matrices A^T A and AA^T instead. And these two matrices are manifestly symmetric. In the case of real symmetric matrices, the eigenvector/eigenvalue decomposition is especially nice, since in this case the diagonalizing matrix S can be chosen to be an orthogonal matrix Q.

Theorem 6 (Orthogonal decomposition of a real symmetric matrix) A real symmetric matrix A can be factored into

\[
A = Q \Lambda Q^T \qquad (3.11.23)
\]

with orthonormal eigenvectors in Q and real eigenvalues in Λ.

3.12 Orthogonal decomposition of rectangular matrices

(This section can be skipped on first reading.)

For dimensional reasons there is clearly no hope of the kind of eigenvector decomposition discussed above being applied to rectangular matrices. However, there is an amazingly useful generalization that pertains if we allow a different orthogonal matrix on each side of A. It is called the Singular Value Decomposition and works for any matrix whatsoever. Essentially the singular value decomposition generates orthogonal bases of R^m and R^n simultaneously.
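Before turning to the SVD, here is a numerical check of Theorem 6, a NumPy sketch with an arbitrarily chosen symmetric matrix:

```python
import numpy as np

# An arbitrary real symmetric matrix for illustration
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lam, Q = np.linalg.eigh(A)              # real eigenvalues, orthonormal eigenvectors
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))   # True: A = Q Lambda Q^T
print(np.allclose(Q.T @ Q, np.eye(3)))          # True: Q is orthogonal
```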


Theorem 7 (Singular value decomposition) Any matrix A ∈ R^{n×m} can be factored as

\[
A = U \Lambda V^T \qquad (3.12.1)
\]

where the columns of U ∈ R^{n×n} are eigenvectors of AA^T and the columns of V ∈ R^{m×m} are the eigenvectors of A^T A. Λ ∈ R^{n×m} is a rectangular matrix with the singular values on its main diagonal and zeros elsewhere. The singular values are the square roots of the eigenvalues of A^T A, which are the same as the nonzero eigenvalues of AA^T. Further, there are exactly r nonzero singular values, where r is the rank of A.

The columns of U and V span the four fundamental subspaces. The column space of A is spanned by the first r columns of U. The row space is spanned by the first r columns of V. The left nullspace of A is spanned by the last n − r columns of U. And the nullspace of A is spanned by the last m − r columns of V.

A direct approach to the SVD, attributed to the physicist Lanczos, is to make a symmetric matrix out of the rectangular matrix A as follows. Let

\[
S = \begin{pmatrix} 0 & A \\ A^T & 0 \end{pmatrix}. \qquad (3.12.2)
\]

Since A is in R^{n×m}, S must be in R^{(n+m)×(n+m)}. And since S is symmetric it has orthogonal eigenvectors w_i with real eigenvalues λ_i:

\[
S w_i = \lambda_i w_i. \qquad (3.12.3)
\]

If we split up the eigenvector w_i, which is in R^{n+m}, into an n-dimensional data part and an m-dimensional model part,

\[
w_i = \begin{pmatrix} u_i \\ v_i \end{pmatrix}, \qquad (3.12.4)
\]

then the eigenvalue problem for S reduces to two coupled eigenvalue problems, one for A and one for A^T:

\[
A^T u_i = \lambda_i v_i \qquad (3.12.5)
\]
\[
A v_i = \lambda_i u_i. \qquad (3.12.6)
\]

We can multiply the first of these equations by A and the second by A^T to get

\[
A^T A v_i = \lambda_i^2 v_i \qquad (3.12.7)
\]
\[
A A^T u_i = \lambda_i^2 u_i. \qquad (3.12.8)
\]

So we see, once again, that the data eigenvectors u_i are eigenvectors of AA^T and the model eigenvectors v_i are eigenvectors of A^T A. Also note that if we change the sign of the eigenvalue we see that (−u_i, v_i) is an eigenvector too. So if there are r pairs of nonzero


eigenvalues ±λ_i, then there are r eigenvectors of the form (u_i, v_i) for the positive λ_i and r of the form (−u_i, v_i) for the negative λ_i.

Keep in mind that the matrices U and V whose columns are the data and model eigenvectors are square (respectively n × n and m × m) and orthogonal. Therefore we have U^T U = UU^T = I_n and V^T V = V V^T = I_m. But it is important to distinguish between the eigenvectors associated with zero and nonzero eigenvalues. Let U_r and V_r be the matrices whose columns are the r data and model eigenvectors associated with the r nonzero eigenvalues, and U_0 and V_0 be the matrices whose columns are the eigenvectors associated with the zero eigenvalues, and let Λ_r be the diagonal matrix containing the r nonzero eigenvalues. Then we have the following eigenvalue problem:

\[
A V_r = U_r \Lambda_r \qquad (3.12.9)
\]
\[
A^T U_r = V_r \Lambda_r \qquad (3.12.10)
\]
\[
A V_0 = 0 \qquad (3.12.11)
\]
\[
A^T U_0 = 0. \qquad (3.12.12)
\]

Since the full matrices U and V satisfy U^T U = UU^T = I_n and V^T V = V V^T = I_m, it can be readily seen that AV = UΛ implies A = UΛV^T and therefore

\[
A = [U_r \; U_0]
\begin{pmatrix} \Lambda_r & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} V_r^T \\ V_0^T \end{pmatrix}
= U_r \Lambda_r V_r^T. \qquad (3.12.13)
\]

This is the singular value decomposition. Notice that the 0's represent rectangular matrices of zeros. Since Λ_r is r × r and Λ is n × m, the lower left block of zeros must be (n − r) × r, the upper right must be r × (m − r), and the lower right must be (n − r) × (m − r).

It is important to keep the subscript r in mind since the fact that A can be reconstructed from the eigenvectors associated with the nonzero eigenvalues means that the experiment is unable to see the contribution due to the eigenvectors associated with zero eigenvalues.

3.13 Eigenvectors and Orthogonal Projections

Above we said that the matrices V and U were orthogonal, so that V^T V = V V^T = I_m and U^T U = UU^T = I_n. There is a nice geometrical picture we can draw for these equations having to do with projections onto lines or subspaces. Let v_i denote the i-th column of the matrix V. (The same argument applies to U of course.) The outer product v_i v_i^T is an m × m matrix. It is easy to see that the action of this matrix on a vector is to project that vector onto the one-dimensional subspace spanned by v_i:

\[
v_i v_i^T x = (v_i^T x)\, v_i.
\]
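A numerical illustration of the reduced decomposition and of these projections, a NumPy sketch with an arbitrarily chosen 2 × 3 matrix:

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],         # an arbitrary 2x3 example, rank 2
              [0.0, 1.0, 1.0]])
U, s, Vt = np.linalg.svd(A)             # A = U diag(s) Vt, with Vt = V^T
r = np.sum(s > 1e-12)                   # number of nonzero singular values (the rank)

Ur, Vr = U[:, :r], Vt[:r].T             # eigenvectors belonging to nonzero singular values
print(np.allclose(Ur @ np.diag(s[:r]) @ Vr.T, A))   # True: A = U_r Lambda_r V_r^T

P_row = Vr @ Vr.T                       # projector onto the row space of A
print(np.allclose(P_row @ P_row, P_row))             # True: it is a projection
```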


A "projection" operator is defined by the property that once you've applied it to a vector, applying it again doesn't change the result: P(Px) = Px, in other words. For the operator v_i v_i^T this is obviously true since v_i^T v_i = 1.

Now suppose we consider the sum of two of these projection operators: v_i v_i^T + v_j v_j^T. This will project any vector in R^m onto the plane spanned by v_i and v_j. We can continue this procedure and define a projection operator onto the subspace spanned by any number p of the model eigenvectors:

\[
\sum_{i=1}^{p} v_i v_i^T.
\]

If we let p = m then we get a projection onto all of R^m. But this must be the identity operator. In effect we've just proved the following identity:

\[
\sum_{i=1}^{m} v_i v_i^T = V V^T = I.
\]

On the other hand, if we only include the terms in the sum associated with the r nonzero singular values, then we get a projection operator onto the non-null space (which is the row space). So

\[
\sum_{i=1}^{r} v_i v_i^T = V_r V_r^T
\]

is a projection operator onto the row space. By the same reasoning,

\[
\sum_{i=r+1}^{m} v_i v_i^T = V_0 V_0^T
\]

is a projection operator onto the null space. Putting this all together we can say that

\[
V_r V_r^T + V_0 V_0^T = I.
\]

This says that any vector in R^m can be written in terms of its component in the null space and its component in the row space of A. Let x ∈ R^m; then

\[
x = I x = \left( V_r V_r^T + V_0 V_0^T \right) x = (x)_{\text{row}} + (x)_{\text{null}}. \qquad (3.13.1)
\]

3.14 A few examples

This example shows that often matrices with repeated eigenvalues cannot be diagonalized. But symmetric matrices can always be diagonalized.

\[
A = \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} \qquad (3.14.1)
\]


The eigenvalues of this matrix are obviously 3 and 3. This matrix has a one-dimensional family of eigenvectors; any vector of the form (x, 0)^T will do. So it cannot be diagonalized; it doesn't have enough eigenvectors.

Now consider

\[
A = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}. \qquad (3.14.2)
\]

The eigenvalues of this matrix are still 3 and 3. But it will be diagonalized by any invertible matrix. So, of course, to make our lives simple we will choose an orthogonal matrix. How about

\[
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}? \qquad (3.14.3)
\]

That will do. But so will

\[
\frac{1}{\sqrt{2}} \begin{pmatrix} -1 & 1 \\ 1 & 1 \end{pmatrix}. \qquad (3.14.4)
\]

So, as you can see, repeated eigenvalues give us choice. And for symmetric matrices we nearly always choose to diagonalize with orthogonal matrices.

Exercises

3.1 Solve the following linear system for a, b and c:

\[
\begin{pmatrix} -2 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & -2 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix} =
\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
\]

3.2 Consider the linear system

\[
\begin{pmatrix} a & b \\ b & d \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} =
\begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]

Assume x and y are nonzero. Try to solve this system for x and y and thereby show what conditions must be put on the elements of the matrix such that there is a nonzero solution of these equations.


3.3 Here is a box generated by two unit vectors, one in the x direction and one in the y direction.

[Figure: the unit square with corners at the unit vectors (1, 0) and (0, 1).]

If we take a two by two matrix

\[
A = \begin{pmatrix} a & b \\ b & d \end{pmatrix}
\]

and apply it to the two unit vectors, we get two new vectors that form a different box. (I.e., take the dot product of A with the two column vectors (1, 0)^T and (0, 1)^T.) Draw the resulting boxes for the following matrices and say in words what the transformation is doing.

(a)
\[
\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}
\]

(b)
\[
\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}
\]

(c)
\[
\begin{pmatrix} 2 & 0 \\ 0 & 1/2 \end{pmatrix}
\]

(d)
\[
\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}
\]

(e)
\[
\begin{pmatrix} -2 & 1 \\ 1 & -2 \end{pmatrix}
\]

3.4 For the matrices

\[
A = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}
\]


and

\[
B = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}
\]

compute A^{-1}, B^{-1}, (BA)^{-1}, and (AB)^{-1}.

3.5 The next 5 questions concern a particular linear system. Let

\[
A = \begin{pmatrix} 0 & 2 & 4 & -6 \\ 1 & -2 & 4 & 3 \\ 2 & 2 & -4 & 0 \end{pmatrix}.
\]

Compute the row-reduced form of A and A^T. Clearly label the pivots for each case.

3.6 Write down basis vectors for the row and column spaces of A. What is the rank of the matrix?

3.7 Write down basis vectors for the left and right null spaces of A.

3.8 What are the free variable(s) of the linear system Ar = b where

\[
r = \begin{pmatrix} w \\ x \\ y \\ z \end{pmatrix}
\quad \text{and} \quad
b = \begin{pmatrix} 0 \\ 6 \\ 0 \end{pmatrix}?
\]

Compute the particular solution of this system by setting the free variable(s) equal to zero. Show for this system that the general solution is equal to this particular solution plus an element of the null space.

3.9 How many of the columns are linearly independent? How many of the rows are linearly independent?

3.10 Let

\[
A = \begin{pmatrix} 3/2 & -5/2 \\ -5/2 & 3/2 \end{pmatrix}.
\]

Compute the eigenvalues and eigenvectors of this matrix. Are the eigenvectors orthogonal?

3.11 Let Q be the matrix of eigenvectors from the previous question and L be the diagonal matrix of eigenvalues. Show by direct calculation that Q diagonalizes A, i.e., QAQ^T = L.

3.12 Give an example of a real, nondiagonal 2 × 2 matrix whose eigenvalues are complex.


3.13 In terms of its eigenvalues, what does it mean for a matrix to be invertible? Are diagonalizable matrices always invertible?

3.14 Give specific (nonzero) examples of 2 by 2 matrices satisfying the following properties:

\[
A^2 = 0, \qquad A^2 = -I_2, \qquad \text{and} \qquad AB = -BA. \qquad (3.14.5)
\]

3.15 Let A be an upper triangular matrix. Suppose that all the diagonal elements are nonzero. Show that the columns must be linearly independent and that the null space contains only the zero vector.

3.16 Figure out the column space and null space of the following two matrices:

\[
\begin{pmatrix} 1 & -1 \\ 0 & 0 \end{pmatrix}
\qquad \text{and} \qquad
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \qquad (3.14.6)
\]

3.17 Which of the following two are subspaces of R^n: the plane of all vectors whose first component is zero; the plane of all vectors whose first component is 1?

3.18 Let P be a plane in R^3 defined by x_1 − 6x_2 + 13x_3 = −3. What is the equation of the plane P_0 parallel to P but passing through the origin? Is either P or P_0 a subspace of R^3?

3.19 Let

\[
x = \begin{pmatrix} 9 \\ -12 \end{pmatrix}. \qquad (3.14.7)
\]

Compute ‖x‖_1, ‖x‖_2, and ‖x‖_∞.

3.20 Show that B = (A^T A)^{-1} A^T is a left inverse and C = A^T (AA^T)^{-1} is a right inverse of a matrix A, provided that AA^T and A^T A are invertible. It turns out that A^T A is invertible if the rank of A is equal to the number of columns, and AA^T is invertible if the rank is equal to the number of rows.

3.21 Consider the matrix

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}. \qquad (3.14.8)
\]

The trace of this matrix is a + d and the determinant is ad − cb. Show by direct calculation that the product of the eigenvalues is equal to the determinant and the sum of the eigenvalues is equal to the trace.

3.22 As we have seen, an orthogonal matrix corresponds to a rotation. Consider the eigenvalue problem for a simple orthogonal matrix such as

\[
Q = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. \qquad (3.14.9)
\]

How can a rotation map a vector into a multiple of itself?


3.23 Show that the eigenvalues of A^j are the j-th powers of the eigenvalues of A.

3.24 Compute the SVD of the matrix

\[
A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & -1 \end{pmatrix} \qquad (3.14.10)
\]

directly by computing the eigenvectors of A^T A and AA^T.
