26.10.2014 Views

Benjamin McKay Linear Algebra

Benjamin McKay Linear Algebra

Benjamin McKay Linear Algebra

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>Benjamin</strong> <strong>McKay</strong><br />

<strong>Linear</strong> <strong>Algebra</strong><br />

January 2, 2008


Preface<br />

Up close, smooth things look flat—the picture behind differential calculus. In<br />

mathematical language, we can approximate smoothly varying functions by<br />

linear functions. In calculus of several variables, the resulting linear functions<br />

can be complicated: you need to study linear algebra.<br />

Problems appear throughout the text, which you must learn to solve. They<br />

often provide vital results used in the course. Most of these problems have<br />

hints, particularly the more important ones. There are also review problems at<br />

the end of each section, and you should try to solve a few from each section.<br />

Try to solve each problem first before looking up the hint. Never use decimal<br />

approximations (for instance, from a calculator) on any problem, except to<br />

check your work; many problems are very sensitive to small errors and must<br />

be worked out precisely. Whenever ridiculously large numbers appear in the<br />

statement of a problem, this is a hint that they must play little or no role in<br />

the solution.<br />

The prerequisites for this course are basic arithmetic and elementary algebra,<br />

typically learned in high school, and some comfort and facility with proofs,<br />

particularly using mathematical induction. You can’t prove that all men are<br />

wearing hats just by pointing out one example of a man in a hat; most proofs<br />

require an argument, and not just examples. Polya [9] and Solow [12] explain<br />

induction and provide help with proofs. Bretscher [2] and Strang [15] are<br />

excellent introductory textbooks of linear algebra.<br />

iii


Contents<br />

Matrix Calculations<br />

1 Solving <strong>Linear</strong> Equations 3<br />

2 Matrices 17<br />

3 Important Types of Matrices 25<br />

4 Elimination Via Matrix Arithmetic 35<br />

5 Finding the Inverse of a Matrix 43<br />

6 The Determinant 51<br />

7 The Determinant via Elimination 61<br />

Bases and Subspaces<br />

8 Span 71<br />

9 Bases 81<br />

10 Kernel and Image 91<br />

Eigenvectors<br />

11 Eigenvalues and Eigenvectors 101<br />

12 Bases of Eigenvectors 109<br />

Orthogonal <strong>Linear</strong> <strong>Algebra</strong><br />

13 Inner Product 119<br />

14 The Spectral Theorem 135<br />

15 Complex Vectors 149<br />

Abstraction<br />

16 Vector Spaces 161<br />

17 Fields 173<br />

Geometry and Combinatorics<br />

18 Permutations and Determinants 181<br />

19 Volume and Determinants 189<br />

20 Geometry and Orthogonal Matrices 193<br />

21 Orthogonal Projections 201<br />

Jordan Normal Form<br />

22 Direct Sums of Subspaces 207<br />

iv


Contents<br />

v<br />

23 Jordan Normal Form 213<br />

24 Decomposition and Minimal Polynomial 225<br />

25 Matrix Functions of a Matrix Variable 235<br />

26 Symmetric Functions of Eigenvalues 245<br />

27 The Pfaffian 251<br />

Factorizations<br />

28 Dual Spaces and Quotient Spaces 265<br />

29 Singular Value Factorization 273<br />

30 Factorizations 279<br />

Tensors<br />

31 Quadratic Forms 285<br />

32 Tensors and Indices 295<br />

33 Tensors 303<br />

34 Exterior Forms 313<br />

A Hints 317<br />

Bibliography 477<br />

List of Notation 479<br />

Index 481


1<br />

Matrix Calculations


1 Solving <strong>Linear</strong> Equations<br />

In this chapter, we learn how to solve systems of linear equations by a simple<br />

recipe, suitable for a computer.<br />

1.1 Elimination<br />

Consider equations<br />

−6 − x 3 + x 2 = 0<br />

3 x 1 + 7 x 2 + 4 x 3 = 9<br />

3 x 1 + 5 x 2 + 8 x 3 = 3.<br />

They are called linear because they are sums of constants and constant multiples<br />

of variables. How can we solve them (or teach a computer to solve them)? To<br />

solve means to find values for each of the variables x 1 , x 2 and x 3 satisfying all<br />

three of the equations.<br />

Preliminaries<br />

a. Line up the variables:<br />

x 2 − x 3 = 6<br />

3x 1 + 7x 2 + 4x 3 = 9<br />

3x 1 + 5x 2 + 8x 3 = 3<br />

All of the x 1 ’s are in the same column, etc. and all constants on the right<br />

hand side.<br />

b. Drop the variables and equals signs, just writing the numbers.<br />

⎛<br />

⎞<br />

⎜<br />

0 1 −1 6 ⎟<br />

⎝3 7 4 9⎠ .<br />

3 5 8 3<br />

This saves rewriting the variables at each step. We put brackets around<br />

for decoration.<br />

3


4 Solving <strong>Linear</strong> Equations<br />

c. Draw a box around the entry in the top left corner, and call that entry<br />

the pivot.<br />

⎛<br />

⎞<br />

0 1 −1 6<br />

⎜<br />

⎟<br />

⎝ 3 7 4 9⎠ .<br />

3 5 8 3<br />

Forward elimination<br />

(1) If the pivot is zero, then swap rows with a lower row to get the pivot to be<br />

nonzero. This gives<br />

⎛<br />

⎞<br />

3 7 4 9<br />

⎜<br />

⎟<br />

⎝ 0 1 −1 6⎠ .<br />

3 5 8 3<br />

(Going back to the linear equations we started with, we are swapping the<br />

order in which we write them down.) If you can’t find any row to swap<br />

with (because every lower row also has a zero in the pivot column), then<br />

move pivot one step to the right → and repeat step (1).<br />

(2) Add whatever multiples of the pivot row you need to each lower row, in<br />

order to kill off every entry under the pivot. (“Kill off” means “make into<br />

0”). This requires us to add − (row 1) to row 3 to kill off the 3 under the<br />

pivot, giving<br />

⎛<br />

⎜<br />

⎝<br />

⎞<br />

3 7 4 9<br />

⎟<br />

0 1 −1 6 ⎠ .<br />

0 −2 4 −6<br />

(Going back to the linear equations, we are adding equations together which<br />

doesn’t change the answers—we could reverse this step by subtracting<br />

again.)<br />

(3) Make a new pivot one step down and to the right: ↘.<br />

and start again at step (1).<br />

⎛<br />

⎞<br />

3 7 4 9<br />

⎜<br />

⎟<br />

⎝ 0 1 −1 6 ⎠ .<br />

0 −2 4 −6<br />

In our example, our next pivot, 1, must kill everything beneath it: −2. So we<br />

add 2(row 2) to (row 3), giving<br />

⎛<br />

⎞<br />

3 7 4 9<br />

⎜<br />

⎟<br />

⎝ 0 1 −1 6⎠ .<br />

0 0 2 6


1.1. Elimination 5<br />

Figure 1.1: Forward elimination on a large matrix. The shaded boxes are<br />

nonzero entries, while zero entries are left blank. The pivots are outlined. You<br />

can see the first few steps, and then a step somewhere in the middle of the<br />

calculation, and then the final result.<br />

We are done with that pivot. Move ↘.<br />

⎛<br />

⎞<br />

3 7 4 9<br />

⎜<br />

⎟<br />

⎝ 0 1 −1 6⎠ .<br />

0 0 2 6<br />

Forward elimination is done. Lets turn the numbers back into equations, to see<br />

what we have:<br />

3x 1 +7x 2 +4x 3 = 9<br />

x 2 −x 3 = 6<br />

2x 3 = 6<br />

Each pivot solves for one variable in terms of later variables.<br />

Problem 1.1. Apply forward elimination to<br />

⎛<br />

⎞<br />

0 0 1 1<br />

0 0 1 1<br />

⎜<br />

⎟<br />

⎝1 0 3 0⎠<br />

1 1 1 1<br />

Problem 1.2. Apply forward elimination to<br />

⎛<br />

⎞<br />

1 1 0 1<br />

0 1 1 0<br />

⎜<br />

⎟<br />

⎝0 0 0 0 ⎠<br />

0 0 0 −1<br />

Back Substitution<br />

Starting at the last pivot, and working up:<br />

a. Rescale the entire row to turn the pivot into a 1.


6 Solving <strong>Linear</strong> Equations<br />

Figure 1.2: Back substitution on the large matrix from figure 1.1. You can see<br />

the first few steps, and then a step somewhere in the middle of the calculation,<br />

and then the final result. You can see the pivots turn into 1’s.<br />

b. Add whatever multiples of the pivot row you need to each higher row, in<br />

order to kill off every entry above the pivot.<br />

Applied to our example:<br />

⎛<br />

⎞<br />

Scale row 3 by 1 2 :<br />

⎜<br />

⎝<br />

3 7 4 9<br />

⎟<br />

0 1 −1 6⎠ ,<br />

0 0 2 6<br />

⎛<br />

⎞<br />

3 7 4 9<br />

⎜<br />

⎟<br />

⎝ 0 1 −1 6⎠<br />

0 0 1 3<br />

Add row 3 to row 2, −4 (row 3) to row 1.<br />

⎛<br />

⎞<br />

3 7 0 −3<br />

⎜<br />

⎟<br />

⎝ 0 1 0 9 ⎠<br />

0 0 1 3<br />

Add −7 (row 2) to row 1.<br />

⎛<br />

Scale row 1 by 1 3 .<br />

⎜<br />

⎝<br />

Done. Turn back into equations:<br />

⎞<br />

3 0 0 −66<br />

⎟<br />

⎠<br />

0 1 0 9<br />

0 0 1 3<br />

⎛<br />

⎞<br />

1 0 0 −22<br />

⎜<br />

⎟<br />

⎝ 0 1 0 9 ⎠<br />

0 0 1 3<br />

x 1 = −22<br />

x 2 = 9<br />

x 3 = 3.


1.2. Examples 7<br />

Definition 1.1. Forward elimination and back substitution together are called<br />

Gauss–Jordan elimination or just elimination. (Forward elimination is often<br />

called Gaussian elimination.)<br />

Remark 1.2. Forward elimination already shows us what is going to happen:<br />

which variables are solved for in terms of which other variables. So for answering<br />

most questions, we usually only need to carry out forward elimination, without<br />

back substitution.<br />

1.2 Examples<br />

Example 1.3 (More than one solution).<br />

Write down the numbers:<br />

⎛<br />

x 1 + x 2 + x 3 + x 4 = 7<br />

x 1 + 2x 3 = 1<br />

x 2 + x 3 = 0<br />

⎞<br />

1 1 1 1 7<br />

⎜<br />

⎝ 1 0 2 0<br />

⎟<br />

1⎠ .<br />

0 1 1 0 0<br />

Kill everything under the pivot: add − (row 1) to row 2.<br />

⎛<br />

⎞<br />

1 1 1 1 7<br />

⎜<br />

⎟<br />

⎝ 0 −1 1 −1 −6⎠ .<br />

0 1 1 0 0<br />

Done with that pivot; move ↘.<br />

⎛<br />

⎞<br />

1 1 1 1 7<br />

⎜<br />

⎟<br />

⎝ 0 −1 1 −1 −6⎠ .<br />

0 1 1 0 0<br />

Kill: add row 2 to row 3:<br />

⎛<br />

⎞<br />

1 1 1 1 7<br />

⎜<br />

⎟<br />

⎝ 0 −1 1 −1 −6⎠ .<br />

0 0 2 −1 −6<br />

Move ↘. Forward elimination is done. Lets look at where the pivots lie:<br />

⎛<br />

⎞<br />

1 1 1 1 7<br />

⎜<br />

⎟<br />

⎝ 0 −1 1 −1 −6⎠ .<br />

0 0 2 −1 −6


8 Solving <strong>Linear</strong> Equations<br />

Lets turn back into equations:<br />

x 1 +x 2 +x 3 +x 4 = 7<br />

−x 2 +x 3 −x 4 =−6<br />

2x 3 −x 4 =−6<br />

Look: each pivot solves for one variable, in terms of later variables. There was<br />

never any pivot in the x 4 column, so x 4 is a free variable: x 4 can take on any<br />

value, and then we just use each pivot to solve for the other variables, bottom<br />

up.<br />

Problem 1.3. Back substitute to find the values of x 1 , x 2 , x 3 in terms of x 4 .<br />

Example 1.4 (No solutions). Consider the equations<br />

x 1 + x 2 + x 3 = 1<br />

2x 1 + x 2 + x 3 = 0<br />

4x 1 + 3x 2 + 3x 3 = 1.<br />

Forward eliminate:<br />

⎛<br />

⎞<br />

1 1 2 1<br />

⎜<br />

⎟<br />

⎝ 2 1 1 0⎠<br />

4 3 5 1<br />

Add −2(row 1) to row 2, −4(row 1) to row 3.<br />

⎛<br />

⎞<br />

1 1 2 1<br />

⎜<br />

⎟<br />

⎝ 0 −1 −3 −2⎠<br />

0 −1 −3 −3<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

1 1 2 1<br />

⎜<br />

⎟<br />

⎝ 0 −1 −3 −2⎠<br />

0 −1 −3 −3<br />

Add −(row 2) to row 3.<br />

⎛<br />

⎞<br />

1 1 2 1<br />

⎜<br />

⎟<br />

⎝ 0 −1 −3 −2⎠<br />

0 0 0 −1


1.3. Summary 9<br />

Move the pivot ↘ .<br />

Move the pivot →.<br />

⎛<br />

⎞<br />

1 1 2 1<br />

⎜<br />

⎟<br />

⎝ 0 −1 −3 −2⎠<br />

0 0 0 −1<br />

⎛<br />

⎜<br />

⎝<br />

1 1 2 1<br />

0 −1 −3 −2<br />

0 0 0 −1<br />

⎞<br />

⎟<br />

⎠<br />

Turn back into equations:<br />

x 1 + x 2 + 2 x 3 = 1<br />

−x 2 − 3 x 3 = −2<br />

0 = −1.<br />

You can’t solve these equations: 0 can’t equal −1. So you can’t solve the<br />

original equations either: there are no solutions. Two lessons that save you time<br />

and effort:<br />

a. If a pivot appears in the constants’ column, then there are no solutions.<br />

b. You don’t need to back substitute for this problem; forward elimination<br />

already tells you if there are any solutions.<br />

1.3 Summary<br />

We can turn linear equations into a box of numbers. Start a pivot at the top left<br />

corner, swap rows if needed, move → if swapping won’t work, kill off everything<br />

under the pivot, and then make a new pivot ↘ from the last one. After forward<br />

elimination, we will say that the resulting equations are in echelon form (often<br />

called row echelon form).<br />

The echelon form equations have the same solutions as the original equations.<br />

Each column except the last (the column of constants) represents a variable.<br />

Each pivot solves for one variable in terms of later variables (each pivot “binds”<br />

a variable, so that the variable is not free). The original equations have no<br />

solutions just when the echelon equations have a pivot in the column of constants.<br />

Otherwise there are solutions, and any pivotless column (besides the column<br />

of constants) gives a free variable (a variable whose value is not fixed by the<br />

equations). The value of any free variable can be picked as we like. So if<br />

there are solutions, there is either only one solution (no free variables), or<br />

there are infinitely many solutions (free variables). Setting free variables to<br />

different values gives different solutions. The number of pivots is called the<br />

rank. Forward elimination makes the pattern of pivots clear; often we don’t<br />

need to back substitute.


10 Solving <strong>Linear</strong> Equations<br />

Remark 1.5. We often encounter systems of linear equations for which all of<br />

the constants are zero (the “right hand sides”). When this happens, to save<br />

time we won’t write out a column of constants, since the constants would just<br />

remain zero all the way through forward elimination and back substitution.<br />

Problem 1.4. Use elimination to solve the linear equations<br />

2 x 2 + x 3 = 1<br />

4 x 1 − x 2 + x 3 = 2<br />

4 x 1 + 3 x 2 + 3 x 3 = 4<br />

1.3 Review Problems<br />

Problem 1.5. Apply forward elimination to<br />

⎛<br />

⎞<br />

2 0 2<br />

⎜<br />

⎟<br />

⎝1 0 0⎠<br />

0 2 2<br />

Problem 1.6. Apply forward elimination to<br />

⎛<br />

⎞<br />

⎜<br />

−1 −1 1 ⎟<br />

⎝ 1 1 1⎠<br />

−1 1 0<br />

Problem 1.7. Apply forward elimination to<br />

⎛<br />

⎞<br />

−1 2 −2 −1<br />

⎜<br />

⎟<br />

⎝ 1 2 −2 2 ⎠<br />

−2 0 0 −1<br />

Problem 1.8. Apply forward elimination to<br />

⎛<br />

⎞<br />

0 0 1<br />

⎜<br />

⎟<br />

⎝0 1 1⎠<br />

1 1 1<br />

Problem 1.9. Apply forward elimination to<br />

⎛<br />

⎞<br />

0 1 1 0 0<br />

0 1 0 1 0<br />

⎜<br />

⎟<br />

⎝0 0 0 0 0⎠<br />

1 1 0 1 1


1.3. Summary 11<br />

Problem 1.10. Apply forward elimination to<br />

⎛<br />

⎜<br />

1 3 2 6<br />

⎞<br />

⎟<br />

⎝2 5 4 1⎠<br />

3 8 6 7<br />

Problem 1.11. Apply back substitution to the result of problem 1.2 on page 5.<br />

Problem 1.12. Apply back substitution to<br />

⎛<br />

⎞<br />

−1 1 0<br />

⎜<br />

⎟<br />

⎝ 0 −2 0⎠<br />

0 0 1<br />

Problem 1.13. Apply back substitution to<br />

⎛<br />

⎞<br />

1 0 −1<br />

⎜<br />

⎟<br />

⎝0 −1 −1⎠<br />

0 0 0<br />

Problem 1.14. Apply back substitution to<br />

⎛<br />

⎞<br />

⎜<br />

2 1 −1 ⎟<br />

⎝0 3 −1⎠<br />

0 0 0<br />

Problem 1.15. Apply back substitution to<br />

⎛<br />

⎞<br />

3 0 2 2<br />

0 2 0 −1<br />

⎜<br />

⎟<br />

⎝0 0 3 2 ⎠<br />

0 0 0 2<br />

Problem 1.16. Use elimination to solve the linear equations<br />

−x 1 + 2 x 2 + x 3 + x 4 = 1<br />

−x 1 + 2 x 2 + 2 x 3 + x 4 = 0<br />

x 3 + 2 x 4 = 0<br />

x 4 = 2


12 Solving <strong>Linear</strong> Equations<br />

Problem 1.17. Use elimination to solve the linear equations<br />

x 1 + 2 x 2 + 3 x 3 + 4 x 4 = 5<br />

2 x 1 + 5 x 2 + 7 x 3 + 11 x 4 = 12<br />

x 2 + x 3 + 4 x 4 = 3<br />

Problem 1.18. Use elimination to solve the linear equations<br />

−2 x 1 + x 2 + x 3 + x 4 = 0<br />

x 1 − 2 x 2 + x 3 + x 4 = 0<br />

x 1 + x 2 − 2 x 3 + x 4 = 0<br />

x 1 + x 2 + x 3 − 2 x 4 = 0<br />

Problem 1.19. Write down the simplest example you can to show that adding<br />

one to each entry in a row can change the answers to the linear equations. So<br />

adding numbers to rows is not allowed.<br />

Problem 1.20. Write down the simplest systems of linear equations you can<br />

come up with that have<br />

a. One solution.<br />

b. No solutions.<br />

c. Infinitely many solutions.<br />

Problem 1.21. If all of the constants in some linear equations are zeros, must<br />

the equations have a solution?<br />

Problem 1.22. Draw the two lines 1 2 x 1 − x 2 = − 1 2 and 2 x 1 + x 2 = 3 in R 2 .<br />

In your drawing indicate the points which satisfy both equations.


1.3. Summary 13<br />

Problem 1.23. Which pair of equations cuts out which pair of lines? How<br />

many solutions does each pair of equations have?<br />

x 1 − x 2 = 0 (1)<br />

x 1 + x 2 = 1<br />

x 1 − x 2 = 4 (2)<br />

−2 x 1 + 2 x 2 = 1<br />

x 1 − x 2 = 1 (3)<br />

−3 x 1 + 3 x 2 = −3<br />

(a) (b) (c)<br />

Problem 1.24. Draw the two lines 2 x 1 + x 2 = 1 and x 1 − 2 x 2 = 1 in the<br />

x 1 x 2 -plane. Explain geometrically where the solution of this pair of equations<br />

lies. Carry out forward elimination on the pair, to obtain a new pair of equations.<br />

Draw the lines corresponding to each new equation. Explain why one of these<br />

lines is parallel to one of the axes.<br />

Problem 1.25. Find the quadratic function y = ax 2 + bx + c which passes<br />

through the points (x, y) = (0, 2), (1, 1), (2, 6).<br />

Problem 1.26. Give a simple example of a system of linear equations which<br />

has a solution, but for which, if you alter one of the coefficients by a tiny amount<br />

(as tiny as you like), then there is no solution.<br />

Problem 1.27. If you write down just one linear equation in three variables,<br />

like 2x 1 + x 2 − x 3 = −1, the solutions draw out a plane. So a system of three<br />

linear equations draws out three different planes. The solutions of two of the<br />

equations lie on the intersections of the two corresponding planes. The solutions<br />

of the whole system are the points where all three planes intersect. Which<br />

system of equations in table 1.1 on the next page draws out which picture of<br />

planes from figure 1.3 on page 15?


14 Solving <strong>Linear</strong> Equations<br />

x 1 − x 2 + 2 x 3 = 2 (1)<br />

−2 x 1 + 2 x 2 + x 3 = −2<br />

−3 x 1 + 3 x 2 − x 3 = 0<br />

−x 1 − x 3 = 0 (2)<br />

x 1 − 2 x 2 − x 3 = 0<br />

2 x 1 − 2 x 2 − 2 x 3 = −1<br />

x 1 + x 2 + x 3 = 1 (3)<br />

x 1 + x 2 + x 3 = 0<br />

x 1 + x 2 + x 3 = −1<br />

−2 x 1 + x 2 + x 3 = −2 (4)<br />

−2 x 1 − x 2 + 2 x 3 = 0<br />

−4 x 2 + 2 x 3 = 4<br />

−2 x 2 − x 3 = 0 (5)<br />

−x 1 − x 2 − x 3 = −1<br />

−3x 1 − 3x 2 − 3x 3 = 0<br />

Table 1.1: Five systems of linear equations


1.3. Summary 15<br />

(a) (b) (c)<br />

(d)<br />

(e)<br />

Figure 1.3: When you have three equations in three variables, each one draws a<br />

plane. Solutions of a pair of equations lie where their planes intersect. Solutions<br />

of all three equations lie where all three planes intersect.


2 Matrices<br />

The boxes of numbers we have been writing are called matrices. Lets learn the<br />

arithmetic of matrices.<br />

2.1 Definitions<br />

Definition 2.1. A matrix is a finite box A of numbers, arranged in rows and<br />

columns. We write it as<br />

⎛<br />

⎞<br />

A 11 A 12 . . . A 1q<br />

A<br />

A =<br />

21 A 22 . . . A 2q<br />

⎜<br />

⎟<br />

⎝ . . . . ⎠<br />

A p1 A p2 . . . A pq<br />

and say that A is p × q if it has p rows and q columns. If there are as many<br />

rows as columns, we will say that the matrix is square.<br />

Remark 2.2. A 31 is in row 3, column 1. If we have 10 or more rows or columns<br />

(which won’t happen in this book), we might write A 1,1 instead of A 11 . For<br />

example, we can distinguish A 11,1 from A 1,11 .<br />

Definition 2.3. A matrix x with only one column is called a vector and written<br />

⎛ ⎞<br />

x 1<br />

x<br />

x =<br />

2<br />

⎜ ⎟<br />

⎝ . ⎠ .<br />

x n<br />

The collection of all vectors with n real number entries is called R n .<br />

Think of R 2 as the xy-plane, writing each point as<br />

( )<br />

x<br />

y<br />

instead of (x, y). We draw a vector, for example the vector<br />

( )<br />

2<br />

,<br />

3<br />

17


18 Matrices<br />

⎛<br />

⎜<br />

⎝<br />

⎞<br />

⎟<br />

⎠<br />

Figure 2.1: Echelon form: a staircase, each step down by only 1, but across to<br />

the right by maybe more than one. The pivots are the steps down, and below<br />

them, in the unshaded part, are zeros.<br />

as an arrow, pointing out of the origin, with the arrow head at the point<br />

x = 2, y = 3 :<br />

Similarly, think of R 3 as 3-dimensional space.<br />

Problem 2.1. Draw the vectors:<br />

( ) ( ) ( )<br />

1 0 1<br />

, , ,<br />

0 1 1<br />

( )<br />

2<br />

,<br />

1<br />

( )<br />

3<br />

,<br />

1<br />

( )<br />

−4<br />

,<br />

2<br />

( )<br />

4<br />

.<br />

−2<br />

We draw vectors either as dots or more often as arrows. If there are too<br />

many vectors, pictures of arrows can get cluttered, so we prefer to draw dots.<br />

Sometimes we distinguish between points (drawn as dots) and vectors (drawn<br />

as arrows), but algebraically they are the same objects: columns of numbers.<br />

2.1 Review Problems<br />

Problem 2.2. Find points in R 3 which form the vertices of a regular<br />

(a) cube<br />

(b) octahedron<br />

(c) tetrahedron.<br />

2.2 Echelon Form<br />

Definition 2.4. A matrix is in echelon form if (as in figure 2.1) each row is either<br />

all zeros or starts with more zeros than any earlier row. The first nonzero entry<br />

of each row is called a pivot.


2.3. Matrices in Blocks 19<br />

Problem 2.3. Draw dots where the pivots are in figure 2.1 on the facing page.<br />

Problem 2.4. Give the simplest examples you can of two matrices which are<br />

not in echelon form, each for a different reason.<br />

Problem 2.5. The entries A 11 , A 22 , . . . of a square matrix A are called the<br />

diagonal. Prove that every square matrix in echelon form has all pivots lying<br />

on or above the diagonal.<br />

Problem 2.6. Prove that a square matrix in echelon form has a zero row just<br />

when it is either all zeroes or it has a pivot above the diagonal.<br />

Problem 2.7. Prove that a square matrix in echelon form has a column with<br />

no pivot just when it has a zero row. Thus all diagonal entries are pivots or<br />

else there is a zero row.<br />

Theorem 2.5. Forward elimination brings any matrix to echelon form, without<br />

altering the solutions of the associated linear equations.<br />

Obviously proof is by induction, but the result is clear enough, so we won’t<br />

give a proof.<br />

2.2 Review Problems<br />

Problem 2.8. In one colour, draw the locations of the pivots, and in another<br />

draw the “staircase” (as in figure 2.1 on the preceding page) for the matrices<br />

⎛<br />

(<br />

)<br />

0 1 0 1 0 ⎜<br />

1 2<br />

⎞<br />

( ) ( )<br />

⎟<br />

A =<br />

, B = ⎝0 3⎠ , C = 0 0 , D = 1 0 .<br />

0 0 0 0 1<br />

0 0<br />

2.3 Matrices in Blocks<br />

If<br />

then write<br />

(<br />

A<br />

B<br />

A =<br />

(<br />

)<br />

=<br />

(<br />

1 2<br />

3 4<br />

)<br />

and B =<br />

1 2 5 6<br />

3 4 7 8<br />

)<br />

(<br />

and<br />

5 6<br />

7 8<br />

(<br />

A<br />

B<br />

)<br />

,<br />

⎛<br />

)<br />

= ⎜<br />

⎝<br />

1 2<br />

3 4<br />

5 6<br />

7 8<br />

(We will often colour various rows and columns of matrices, just to make the<br />

discussion easier to follow. The colours have no mathematical meaning.)<br />

⎞<br />

⎟<br />

⎠ .


20 Matrices<br />

Definition 2.6. Any matrix which has only zero entries will be written 0.<br />

Problem 2.9. What could (0<br />

0) mean?<br />

Problem 2.10. What could (A 0) mean if<br />

( )<br />

1 2<br />

A = ?<br />

3 4<br />

2.4 Matrix Arithmetic<br />

Example 2.7. Add matrices like:<br />

( ) ( ) (<br />

)<br />

1 2 5 6<br />

1 + 5 2 + 6<br />

A =<br />

, B =<br />

, A + B =<br />

.<br />

3 4 7 8<br />

3 + 7 4 + 8.<br />

Definition 2.8. If two matrices have matching numbers of rows and columns,<br />

we add them by adding their components: (A + B) ij = A ij + B ij . Similarly for<br />

subtracting.<br />

Problem 2.11. Let<br />

( )<br />

1 2<br />

A =<br />

, B =<br />

3 4<br />

(<br />

−1<br />

−1<br />

)<br />

−2<br />

.<br />

−2<br />

Find A + B.<br />

When we add matrices in blocks,<br />

( ) ( )<br />

A B + C D =<br />

(<br />

A + C<br />

)<br />

B + D<br />

(as long as A and C have the same numbers of rows and columns and B and D<br />

do as well).<br />

Problem 2.12. Draw the vectors<br />

( ) ( )<br />

u =<br />

2 3<br />

, v =<br />

−1 1<br />

,<br />

and the vectors 0 and u + v. In your picture, you should see that they form the<br />

vertices of a parallelogram (a quadrilateral whose opposite sides are parallel).<br />

Multiply by numbers like:<br />

( ) (<br />

)<br />

7<br />

1 2 7 · 1 7 · 2<br />

=<br />

3 4 7 · 3 7 · 4<br />

.


2.5. Matrix Multiplication 21<br />

Definition 2.9. If A is a matrix and c is a number, cA is the matrix with<br />

(cA) ij<br />

= cA ij .<br />

Example 2.10. Let<br />

( )<br />

−1<br />

x = .<br />

2<br />

The multiples x, 2x, 3x, . . . and −x, −2x, −3x, . . . live on a straight line through<br />

0:<br />

3x<br />

2x<br />

x<br />

0<br />

−x<br />

−2x<br />

−3x<br />

2.5 Matrix Multiplication<br />

Surprisingly, matrix multiplication is more difficult.<br />

Example 2.11. To multiply a single row by a single column, just multiply entries<br />

in order, and add up:<br />

( ) ( )<br />

3<br />

1 2 = 1 · 3 + 2 · 4 = 3 + 8 = 11.<br />

4<br />

Put your left hand index finger on the row, and your right hand index finger on<br />

the column, and as you run your left hand along, run your right hand down:<br />

( ) ( )<br />

↓<br />

→ → .<br />

↓<br />

As your fingers travel, you multiply the entries you hit, and add up all of the<br />

products.<br />

Problem 2.13. Multiply<br />

⎛ ⎞<br />

(<br />

)<br />

⎜<br />

8 ⎟<br />

1 −1 2 ⎝1⎠<br />

3


22 Matrices<br />

Example 2.12. To multiply the matrices<br />

( ) ( )<br />

1 2 5 6<br />

A =<br />

, B =<br />

,<br />

3 4 7 8<br />

multiply any row of A by any column of B:<br />

(<br />

1<br />

) (<br />

2<br />

)<br />

5<br />

7<br />

=<br />

(<br />

1 · 5 + 2 · 7<br />

)<br />

.<br />

As your left hand finger travels along a row, and your right hand down a column,<br />

you produce the entry in that row and column; the second row of A times the<br />

first column of B gives the entry of AB in second row, first column.<br />

Problem 2.14. Multiply<br />

(<br />

) ⎛ ⎞<br />

1 2<br />

1 2 3 ⎜ ⎟ ⎝ 2 3⎠<br />

2 3 4<br />

3 4<br />

Definition 2.13. We write ∑ k<br />

in front of an expression to mean the sum for k<br />

taking on all possible values for which the expression makes sense. For example,<br />

if x is a vector with 3 entries,<br />

⎛ ⎞<br />

⎜<br />

x 1<br />

⎟<br />

x = ⎝x 2 ⎠ ,<br />

x 3<br />

then ∑ k x k = x 1 + x 2 + x 3 .<br />

Definition 2.14. If A is p × q and B is q × r, then AB is the p × r matrix whose<br />

entries are (AB) ij<br />

= ∑ k A ikB kj .<br />

2.5 Review Problems<br />

Problem 2.15. If A is a matrix and x a vector, what constraints on dimensions<br />

need to be satisfied to multiply Ax? What about xA?<br />

Problem 2.16.<br />

⎛ ⎞<br />

⎜<br />

2 0 ( )<br />

⎟ 0 1<br />

A = ⎝2 0⎠ , B =<br />

, C =<br />

0 1<br />

2 0<br />

Compute all of the following which are defined:<br />

AB, AC, AD, BC, CA, CD.<br />

(<br />

) ( )<br />

2 1 2<br />

0 0<br />

, D =<br />

0 1 −1 2 −1


2.5. Matrix Multiplication 23<br />

Problem 2.17. Find some 2 × 2 matrices A and B with no zero entries for<br />

which AB = 0.<br />

Problem 2.18. Find a 2 × 2 matrix A with no zero entries for which A 2 = 0.<br />

Problem 2.19. Suppose that we have a matrix A, so that whenever x is a<br />

vector with integer entries, then Ax is also a vector with integer entries. Prove<br />

that A has integer entries.<br />

Problem 2.20. A matrix is called upper triangular if all entries below the<br />

diagonal are zero. Prove that the product of upper triangular square matrices<br />

is upper triangular, and if, for example<br />

⎛<br />

⎞<br />

A 11 A 12 A 13 A 14 . . . A 1n<br />

A 22 A 23 A 24 . . . A 2n<br />

A 33 A 34 . . . A 3n<br />

A =<br />

. .. . .. ,<br />

.<br />

⎜<br />

.<br />

⎝<br />

..<br />

⎟<br />

. ⎠<br />

A nn<br />

(with zeroes under the diagonal) and<br />

⎛<br />

⎞<br />

B 11 B 12 B 13 B 14 . . . B 1n<br />

B 22 B 23 B 24 . . . B 2n<br />

B 33 B 34 . . . B 3n<br />

B =<br />

. .. . .. ,<br />

.<br />

⎜<br />

.<br />

⎝<br />

..<br />

⎟<br />

. ⎠<br />

B nn<br />

then<br />

⎛<br />

⎞<br />

A 11 B 11 ∗ ∗ ∗ . . . ∗<br />

A 22 B 22 ∗ ∗ . . . ∗<br />

A 33 B 33 ∗ . . . ∗<br />

AB =<br />

. .. . .. .<br />

.<br />

⎜<br />

.<br />

⎝<br />

..<br />

⎟<br />

∗ ⎠<br />

A nn B nn<br />

Problem 2.21. Prove the analogous result for lower triangular matrices.


24 Matrices<br />

<strong>Algebra</strong>ic Properties of Matrix Multiplication<br />

Problem 2.22. If A and B matrices, and AB is defined, and c is any number,<br />

prove that c(AB) = (cA)B = A(cB).<br />

Problem 2.23. Prove that matrix multiplication is associative: (AB)C =<br />

A(BC) (and that if either side is defined, then the other is, and they are equal).<br />

Problem 2.24. Prove that matrix multiplication is distributive: A(B + C) =<br />

AB + AC and (P + Q)R = P R + QR for any matrices A, B, C, P, Q, R (again<br />

if one side is defined, then both are and they are equal).<br />

like:<br />

etc.<br />

Running your finger along rows and columns, you see that blocks multiply<br />

( ) ( )<br />

C<br />

A B = AC + BD<br />

D<br />

Problem 2.25. To make sense of this last statement, what do we need to know<br />

about the numbers of rows and columns of A, B, C and D?


3 Important Types of Matrices<br />

Some matrices, or types of matrices, are especially important.<br />

3.1 The Identity Matrix<br />

Define matrices<br />

⎛<br />

⎞<br />

( )<br />

1 0 ⎜<br />

1 0 0 ⎟<br />

I 1 = (1) , I 2 =<br />

, I 3 = ⎝0 1 0⎠ , . . .<br />

0 1<br />

0 0 1<br />

Definition 3.1. The n × n matrix with 1’s on the diagonal and zeros everywhere<br />

else is called the identity matrix, and written I n . We often write it as I to be<br />

deliberately ambiguous about what size it is.<br />

An equivalent definition:<br />

⎧<br />

⎨<br />

1 if i = j<br />

I ij =<br />

⎩0 if i ≠ j.<br />

Problem 3.1. What could I 13 mean? (Careful: it has two meanings.) What<br />

does I 2 mean?<br />

Problem 3.2. Prove that IA = AI = A for any matrix A.<br />

Problem 3.3. Suppose that B is an n × n matrix, and that AB = A for any<br />

n × n matrix A. Prove that B = I n .<br />

Problem 3.4. If A and B are two matrices and Ax = Bx for any vector x,<br />

prove that A = B.<br />

Definition 3.2. The columns of I n are vectors called e 1 , e 2 , . . . , e n .<br />

Problem 3.5. Consider the identity matrix I 3 . What are the vectors e 1 , e 2 , e 3 ?<br />

25


26 Important Types of Matrices<br />

Problem 3.6. The vector e j has a one in which row? And zeroes in which<br />

rows?<br />

Problem 3.7. If A is a matrix, prove that Ae 1 is the first column of A.<br />

Problem 3.8. If A is any matrix, prove that Ae j is the j-th column of A.<br />

If A is a p × q matrix, by the previous exercise,<br />

A = (Ae 1 Ae 2 . . . Ae q ) .<br />

In particular, when we multiply matrices<br />

AB = (ABe 1 ABe 2 . . . ABe q )<br />

(and if either side of this equation is defined, then both sides are and they are<br />

equal). In other words, the columns of AB are A times the columns of B.<br />

This next exercise is particularly vital:<br />

Problem 3.9. If A is a matrix, and x a vector, prove that Ax is a sum of the<br />

columns of A, each weighted by entries of x:<br />

Ax = x 1 (Ae 1 ) + x 2 (Ae 2 ) + · · · + x n (Ae n ) .<br />

3.1 Review Problems<br />

Problem 3.10. True or false (if false, give a counterexample):<br />

a. If the second column of B is 3 times the first column, then the same is<br />

true of AB.<br />

b. Same question for rows instead of columns.<br />

Problem 3.11. Can you find matrices A and B so that A is 3 × 5 and B is<br />

5 × 3, and AB = I?<br />

Problem 3.12. Prove that the rows of AB are the rows of A multiplied by B.


3.1. The Identity Matrix 27<br />

(a) The<br />

original<br />

(b) (c) (d) (e)<br />

Figure 3.1: Faces formed by y = Ax. Each face is centered at the origin.<br />

Problem 3.13. The Fibonacci numbers are the numbers x 0 = 1, x 1 = 1, x n+1 =<br />

x n + x n−1 . Write down x 0 , x 1 , x 2 , x 3 and x 4 . Let<br />

( )<br />

1 1<br />

A =<br />

.<br />

1 0<br />

Prove that ( )<br />

x n+1<br />

x n<br />

( )<br />

= A n 1<br />

.<br />

1<br />

Problem 3.14. Let<br />

( ) ( )<br />

1 0<br />

x = , y = .<br />

0 2<br />

Draw these vectors in the plane. Let<br />

( ) ( ) ( )<br />

1 0 2 0 1 1<br />

A =<br />

, B =<br />

, C =<br />

,<br />

0 1 0 3 0 0<br />

( ) ( ) ( )<br />

1 0 0 0 0 −1<br />

D =<br />

, E =<br />

, F =<br />

.<br />

1 0 0 0 1 0<br />

For each matrix M = A, B, C, D, E, F draw Mx and My (in a different colour<br />

for each matrix), and explain in words what each matrix is “doing” (for example,<br />

rotating, flattening onto a line, expanding, contracting, etc.).<br />

Problem 3.15. The first picture in figure 3.1 is the original in the x 1 , x 2 plane,<br />

and the center of the circular face is at the origin. If we pick a matrix A and<br />

set y = Ax, and draw the image in the y 1 , y 2 plane, which matrix below draws<br />

which picture?


28 Important Types of Matrices<br />

( )<br />

2 0<br />

,<br />

1 0<br />

( )<br />

0 −1<br />

,<br />

1 0<br />

( )<br />

2 0<br />

,<br />

1 1<br />

( )<br />

1 0<br />

−1 1<br />

Problem 3.16. Can you figure out which matrices give rise to the pictures in<br />

the last problem, just by looking at the pictures? Assume that you known that<br />

all the entries of each matrix are integers between -2 and 2.<br />

Problem 3.17. What are the simplest examples you can find of 2 × 2 matrices<br />

A for which taking vectors x to Ax<br />

(a) contracts the plane,<br />

(b) dilates the plane,<br />

(c) dilates one direction, while contracting another,<br />

(d) rotates the plane by a right angle,<br />

(e) reflects the plane in a line,<br />

(f) moves the vertical axis, but leaves every point of the horizontal axis where<br />

it is (a “shear”)?<br />

3.2 Inverses<br />

Definition 3.3. A matrix is called square if it has the same number of rows as<br />

columns.<br />

Definition 3.4. If A is a square matrix, a square matrix B of the same size as<br />

A is called the inverse of A and written A −1 if AB = BA = I.<br />

Problem 3.18. If<br />

check that<br />

( )<br />

2 1<br />

A =<br />

,<br />

1 1<br />

( )<br />

A −1 1 −1<br />

=<br />

.<br />

−1 2<br />

Problem 3.19. If A, B and C are square matrices, and AB = I and CA = I,<br />

prove that B = C. In particular, there is only one inverse (if there is one at all).<br />

Problem 3.20. Which 1×1 matrices have inverses, and what are their inverses?<br />

Problem 3.21. By multiplying out the matrices, prove that any 2 × 2 matrix<br />

( )<br />

a b<br />

A =<br />

c d


3.2. Inverses 29<br />

has inverse<br />

as long as ad − bc ≠ 0.<br />

A −1 =<br />

(<br />

1 d<br />

ad − bc −c<br />

)<br />

−b<br />

a<br />

Problem 3.22. If A and B are invertible matrices, prove that AB is invertible,<br />

and (AB) −1 = B −1 A −1 .<br />

Problem 3.23. Prove that ( A −1) −1<br />

= A, for any invertible square matrix A.<br />

Problem 3.24. If A is invertible, prove that Ax = 0 only for x = 0.<br />

Problem 3.25. If A is invertible, and AB = I, prove that A = B −1 and<br />

B = A −1 .<br />

3.2 Review Problems<br />

Problem 3.26. Write down a pair of nonzero 2 × 2 matrices A and B for which<br />

AB = 0.<br />

Problem 3.27. If A is an invertible matrix, prove that Ax = Ay just when<br />

x = y.<br />

Problem 3.28. If a matrix M splits up into square blocks like<br />

( )<br />

A B<br />

M =<br />

0 D<br />

explain how to find M −1 in terms of A −1 and D −1 . (Warning: for a matrix<br />

which splits into blocks like<br />

( )<br />

A B<br />

M =<br />

C D<br />

the inverse of M cannot be expressed in any elementary way in terms of the<br />

blocks and their inverses.)


30 Important Types of Matrices<br />

Original<br />

Matrices<br />

Inverse matrices<br />

Figure 3.2: Images coming from some matrices, and from their inverses.<br />

Problem 3.29. Figure 3.2 shows how various matrices (on the left hand side)<br />

and their inverses (on the right hand side) affect vectors. But the two columns<br />

are scrambled up. Which right hand side picture is produced by the inverse<br />

matrix of each left hand side picture?<br />

3.3 Permutation Matrices<br />

Permutations<br />

Definition 3.5. Write down the numbers 1, 2, . . . , n in any order. Call this a<br />

permutation of the numbers 1, 2, . . . , n.


3.3. Permutation Matrices 31<br />

Problem 3.30. Suppose that I write down a list of numbers, and<br />

(a) they are all different and<br />

(b) there are 25 of them and<br />

(c) all of them are integers between 1 and 25.<br />

Prove that this list is a permutation of the numbers 1, 2, . . . , 25.<br />

We can draw a picture of a permutation: the permutation 2,4,1,3 is<br />

1 2<br />

2 4<br />

3 1<br />

4 3<br />

In general, write the numbers 1, 2, . . . , n down the left side, and the permutation<br />

down the right side, and then connect left to right, 1 to 1, 2 to 2, etc. We don’t<br />

need to write out the labels, so from now on lets draw 2,4,1,3 as<br />

Problem 3.31. Which permutations are , , ?<br />

Inverting a Permutation<br />

Definition 3.6. Flip a picture of a permutation left to right to give the inverse<br />

permutation: to . So the inverse of 2, 4, 1, 3 is 3, 1, 4, 2.<br />

Problem 3.32. Find the inverses of<br />

a. 1,2,4,3<br />

b. 4,3,2,1<br />

c. 3,1,2<br />

Multiplying<br />

We need to write down names for permutations. Write down the numbers in a<br />

permutation as p(1), p(2), . . . , p(n), and call the permutation p. If p and q are<br />

permutations, write pq for the permutation p(q(1)), p(q(2)), . . . , p(q(n)) (the<br />

product of p and q). So pq scrambles up numbers by first getting q to scramble<br />

them, and then getting p to scramble them. Multiply by drawing the pictures<br />

beside each other:<br />

p = , q = , pq = .<br />

Write the inverse permutation of p as p −1 . So pp −1 = p −1 p = 1, where 1 (the<br />

identity permutation) means the permutation that just leaves all of the numbers<br />

where they were: 1, 2, . . . , n.<br />

Problem 3.33. Let p be 2, 3, 1, 4 and q be 1, 4, 2, 3. What is pq?<br />

Problem 3.34. Write down two permutations p and q for which pq ≠ qp.


32 Important Types of Matrices<br />

Transpositions<br />

Definition 3.7. A transposition is a permutation which swaps precisely two<br />

numbers, leaving all of the others alone.<br />

For example,<br />

is a transposition.<br />

Problem 3.35. Prove that every permutation is a product of transpositions.<br />

Problem 3.36. Draw each of the following permutations as a product of<br />

transpositions:<br />

(a) 3,1,2<br />

(b) 4,3,2,1<br />

(c) 4,1,2,3<br />

Flipping the picture of a transposition left to right gives the same picture:<br />

a transposition is its own inverse. When a permutation is written as a product<br />

of transpositions, its inverse is written as the same transpositions, taken in<br />

reverse order.<br />

Problem 3.37. Prove that every permutation is a product of transpositions<br />

swapping successive numbers, i.e. swapping 1 with 2, or 2 with 3, etc.<br />

Permutation Matrices<br />

Problem 3.38. Let<br />

( )<br />

0 1<br />

P =<br />

.<br />

1 0<br />

Prove that, for any vector x in R 2 , P x is x with rows 1 and 2 swapped.<br />

Definition 3.8. The permutation matrix associated to a permutation p of the<br />

numbers 1, 2, . . . , n is<br />

)<br />

P =<br />

(e p(1) e p(2) . . . e p(n) .<br />

Example 3.9. Let p be the permutation 3, 1, 2. The permutation matrix of p is<br />

⎛<br />

⎞<br />

) 0 1 0<br />

⎜<br />

⎟<br />

P =<br />

(e 3 e 1 e 2 = ⎝0 0 1⎠ .<br />

1 0 0<br />

Problem 3.39. What is the permutation matrix P associated to the permutation<br />

4, 2, 5, 3, 1?


3.3. Permutation Matrices 33<br />

Problem 3.40. What permutation p has permutation matrix<br />

⎛<br />

⎞<br />

0 0 1 0<br />

1 0 0 0<br />

⎜<br />

⎟<br />

⎝0 1 0 0⎠ ?<br />

0 0 0 1<br />

Problem 3.41. What is the permutation matrix of the identity permutation?<br />

Problem 3.42. Prove that a matrix is the permutation matrix of some permutation<br />

just when<br />

a. its entries are all 0’s or 1’s and<br />

b. it has exactly one 1 in each column and<br />

c. it has exactly one 1 in each row.<br />

Lemma 3.10. Let P be the permutation matrix associated to a permutation p.<br />

Then for any vector x, P x is just x with the x 1 entry moved to row p(1), the<br />

x 2 entry moved to row p(2), etc.<br />

Remark 3.11. All we need to remember is that P x is x with rows permuted<br />

somehow, and that we could permute the rows of x any way we want by<br />

choosing a suitable permutation matrix P . We will never have to actually find<br />

the permutation.<br />

Proof.<br />

P x = P ∑ j<br />

x j e j<br />

= ∑ j<br />

= ∑ j<br />

x j P e j<br />

x j e p(j) .<br />

Proposition 3.12. Let P be the permutation matrix associated to a permutation<br />

p. Then for any matrix A, P A is just A with the row 1 moved to row p(1), row<br />

2 moved to row p(2), etc.<br />

Proof. The columns of P A are just P multiplied by the columns of A.<br />

Remark 3.13. It is faster and easier to work directly with permutations than<br />

with permutation matrices. Avoid writing down permutation matrices if you<br />

can; otherwise you end up wasting time juggling huge numbers of 0’s. Replace<br />

permutation matrices by permutations.


34 Important Types of Matrices<br />

Problem 3.43. If p and q are two permutations with permutation matrices P<br />

and Q, prove that pq has permutation matrix P Q.<br />

3.3 Review Problems<br />

Problem 3.44. Let<br />

⎛<br />

⎞<br />

0 1 2<br />

⎜<br />

⎟<br />

A = ⎝0 0 0⎠ .<br />

3 0 0<br />

Which permutation p do we need to make sure that P A is in echelon form, with<br />

P the permutation matrix of p?<br />

Problem 3.45. Confusing: Prove that the permutation matrix P of a permutation<br />

p is the result of permuting the columns of the identity matrix by p, or<br />

the rows of the identity matrix by p −1 .<br />

Problem 3.46. What happens to a permutation matrix when you carry out<br />

forward elimination? back substitution?<br />

Problem 3.47. Let p be a permutation with permutation matrix P . Prove<br />

that P −1 is the permutation matrix of p −1 .


4 Elimination Via Matrix<br />

Arithmetic<br />

In this chapter, we will describe linear equations and elimination using only<br />

matrix multiplication.<br />

4.1 Strictly Lower Triangular Matrices<br />

Definition 4.1. A square matrix is strictly lower triangular if it has the form<br />

⎛<br />

⎞<br />

1<br />

1<br />

S =<br />

⎜<br />

.<br />

⎝<br />

.. ⎟<br />

⎠<br />

1<br />

with 1’s on the diagonal, 0’s above the diagonal, and anything below.<br />

Problem 4.1. Let S be strictly lower triangular. Must it be true that S ij = 0<br />

for i > j? What about for j > i?<br />

Lemma 4.2. If S is a strictly lower triangular matrix, and A any matrix, then<br />

SA is A with S ij (row j) added to row i. In particular, S adds multiples of rows<br />

to lower rows.<br />

Proof. For x a vector,<br />

⎛<br />

1<br />

S<br />

Sx =<br />

21 1<br />

⎜<br />

⎝<br />

S 31 S 32 1<br />

. . .<br />

. . .<br />

⎛<br />

x 1<br />

⎞<br />

x<br />

=<br />

2 + S 21 x 1<br />

⎜<br />

⎝<br />

x 3 + S 31 x 1 + S 32 x 2<br />

⎟<br />

⎠<br />

.<br />

⎞ ⎛<br />

⎟ ⎜<br />

⎠ ⎝<br />

. ..<br />

x 1<br />

x 2<br />

x 3<br />

.<br />

⎞<br />

⎟<br />

⎠<br />

35


36 Elimination Via Matrix Arithmetic<br />

adds S 21 x 1 to x 2 , etc.<br />

If A is any matrix then the columns of SA are S times columns of A.<br />

Problem 4.2. Let S be a matrix so that for any matrix A (of appropriate<br />

size), SA is A with multiples of some rows added to later rows. Prove that S is<br />

strictly lower triangular.<br />

Problem 4.3. Which 3 × 3 matrix S adds −5 row 2 to row 3, and −7 row 1 to<br />

row 2?<br />

Problem 4.4. Prove that if R and S are strictly lower triangular, then RS is<br />

too.<br />

Problem 4.5. Say that a strictly lower triangular matrix is elementary if it<br />

has only one nonzero entry below the diagonal. Prove that every strictly lower<br />

triangular matrix is a product of elementary strictly lower triangular matrices.<br />

Lemma 4.3. Every strictly lower triangular matrix is invertible, and its inverse<br />

is also strictly lower triangular.<br />

Proof. Clearly true for 1 × 1 matrices. Lets consider an n × n strictly lower<br />

triangular matrix S, and assume that we have already proven the result for all<br />

matrices of smaller size. Write<br />

S =<br />

(<br />

1 0<br />

c A<br />

where c is a column and A is a smaller strictly lower triangular matrix. Then<br />

(<br />

)<br />

S −1 =<br />

1 0<br />

−A −1 c A −1 ,<br />

which is strictly lower triangular.<br />

Definition 4.4. A matrix M is strictly upper triangular if it has ones down the<br />

diagonal zeroes everywhere below the diagonal.<br />

Problem 4.6. For each fact proven above about strictly lower triangular<br />

matrices, prove an analogue for strictly upper triangular matrices.<br />

)


4.2. Diagonal Matrices 37<br />

Problem 4.7. Draw a picture indicating where some vectors lie in the x 1 x 2<br />

plane, and where they get mapped to in the y 1 y 2 plane by y = Ax with<br />

( )<br />

1 0<br />

A =<br />

.<br />

2 1<br />

4.2 Diagonal Matrices<br />

A diagonal matrix is one like<br />

⎛<br />

⎞<br />

t 1 t<br />

D =<br />

2 ⎜<br />

.<br />

⎝<br />

.. ⎟<br />

⎠ ,<br />

t n<br />

(with blanks representing 0 entries).<br />

Problem 4.8. Show by calculation that<br />

⎛<br />

1<br />

⎜<br />

⎝<br />

1<br />

⎞ ⎛<br />

⎞ ⎛<br />

⎞<br />

1 2 3 1 2 3<br />

⎟ ⎜<br />

⎟ ⎜<br />

⎟<br />

⎠ ⎝4 5 6⎠ = ⎝4 5 6⎠ .<br />

7 8 9<br />

1<br />

7<br />

1<br />

8<br />

7<br />

9<br />

7<br />

Problem 4.9. Prove that a diagonal matrix D is invertible just when none of<br />

its diagonal entries are zero. Find its inverse.<br />

Lemma 4.5. If<br />

⎛<br />

⎞<br />

t 1 t<br />

D =<br />

2 ⎜<br />

.<br />

⎝<br />

.. ⎟<br />

⎠ ,<br />

t n<br />

then DA is A with row 1 scaled by t 1 , etc.


38 Elimination Via Matrix Arithmetic<br />

Proof. For a vector x,<br />

⎛<br />

⎞ ⎛ ⎞<br />

t 1 x 1<br />

t<br />

Dx =<br />

2 x 2<br />

⎜<br />

.<br />

⎝<br />

.. ⎟ ⎜ ⎟<br />

⎠ ⎝ . ⎠<br />

t n x n<br />

⎛ ⎞<br />

t 1 x 1<br />

t<br />

=<br />

2 x 2<br />

⎜ ⎟<br />

⎝ . ⎠<br />

t n x n<br />

(just running your fingers along rows and down columns). So D scales row i by<br />

t i . For any matrix A, the columns of DA are D times columns of A.<br />

4.2 Review Problems<br />

Problem 4.10. Which diagonal matrix D takes the matrix<br />

( )<br />

3 4<br />

A =<br />

5 6<br />

to the matrix<br />

( )<br />

DA =<br />

4<br />

1<br />

3<br />

5 6<br />

?<br />

Problem 4.11. Multiply<br />

⎛<br />

⎞ ⎛<br />

⎞<br />

⎜<br />

a 0 0 d 0 0<br />

⎟ ⎜<br />

⎟<br />

⎝0 b 0⎠<br />

⎝0 e 0⎠<br />

0 0 c 0 0 f<br />

Problem 4.12. Draw a picture indicating where some vectors lie in the x 1 x 2<br />

plane, and where they get mapped to in the y 1 y 2 plane by y = Ax with each of<br />

the following matrices playing the part of A:<br />

( )<br />

2 0<br />

,<br />

0 3<br />

( )<br />

1 0<br />

,<br />

0 −1<br />

( )<br />

2 0<br />

.<br />

0 −3


4.3. Encoding <strong>Linear</strong> Equations in Matrices 39<br />

4.3 Encoding <strong>Linear</strong> Equations in Matrices<br />

<strong>Linear</strong> equations<br />

x 2 − x 3 = 6<br />

3x 1 + 7x 2 + 4x 3 = 9<br />

3x 1 + 5x 2 + 8x 3 = 3<br />

can be written in matrix form as<br />

⎛<br />

⎞ ⎛ ⎞ ⎛ ⎞<br />

0 1 −1<br />

⎜<br />

⎟ ⎜<br />

x 1 6<br />

⎟ ⎜ ⎟<br />

⎝3 7 4 ⎠ ⎝x 2 ⎠ = ⎝9⎠ .<br />

3 5 8 x 3 3<br />

Any linear equations<br />

A 11 x 1 + A 12 x 2 + · · · + A 1q x q = b 1<br />

A 21 x 1 + a 22 x 2 + · · · + A 2q x q = b 2<br />

. = . .<br />

A p1 x 1 + A p2 x 2 + · · · + A pq x q = b p<br />

become ⎛<br />

⎞ ⎛ ⎞ ⎛ ⎞<br />

A 11 A 12 . . . A 1q x 1 b 1<br />

A 21 A 22 . . . A 2q<br />

x 2<br />

⎜<br />

.<br />

⎝ . . .. ⎟ ⎜ ⎟ . ⎠ ⎝ . ⎠ = b 2 ⎜ ⎟<br />

⎝ . ⎠<br />

A p1 A p2 . . . A pq x q b p<br />

which we write as Ax = b.<br />

Problem 4.13. Write the linear equations<br />

x 1 + 2 x 2 = 7<br />

3 x 1 + 4 x 2 = 8<br />

in matrices.<br />

4.4 Forward Elimination Encoded in Matrix Multiplication<br />

Forward elimination on a matrix A is carried out by multiplying on the left of<br />

A by a sequence of permutation matrices and strictly lower triangular matrices.<br />

Example 4.6.<br />

⎛<br />

⎞<br />

⎜<br />

0 1 −1 ⎟<br />

A = ⎝3 7 4 ⎠<br />

3 5 8


40 Elimination Via Matrix Arithmetic<br />

Swap rows 1 and 2 (and lets write out the permutation matrix):<br />

⎛<br />

⎞ ⎛<br />

⎞<br />

0 1 0 3 7 4<br />

⎜<br />

⎟ ⎜<br />

⎟<br />

⎝1 0 0⎠ A = ⎝ 0 1 −1⎠<br />

0 0 1 3 5 8<br />

Add − (row 1) to (row 3):<br />

⎛<br />

⎞ ⎛<br />

⎞ ⎛<br />

⎞<br />

1 0 0<br />

⎜<br />

⎟ ⎜<br />

0 1 0 3 7 4<br />

⎟ ⎜<br />

⎟<br />

⎝ 0 1 0⎠<br />

⎝1 0 0⎠ A = ⎝ 0 1 −1⎠<br />

−1 0 1 0 0 1 0 −2 4<br />

The string of matrices in front of A just gets longer at each step. Add 2 (row 2)<br />

to (row 3).<br />

⎛<br />

⎞ ⎛<br />

⎞ ⎛<br />

⎞ ⎛<br />

⎞<br />

⎜<br />

1 0 0 1 0 0 0 1 0 3 7 4<br />

⎟ ⎜<br />

⎟ ⎜<br />

⎟ ⎜<br />

⎟<br />

⎝0 1 0⎠<br />

⎝ 0 1 0⎠<br />

⎝1 0 0⎠ A = ⎝ 0 1 −1⎠<br />

0 2 1 −1 0 1 0 0 1 0 0 2<br />

Call this U. This is the echelon form:<br />

⎛<br />

⎞ ⎛<br />

⎞ ⎛<br />

⎞<br />

1 0 0 1 0 0<br />

⎜<br />

⎟ ⎜<br />

⎟ ⎜<br />

0 1 0 ⎟<br />

U = ⎝0 1 0⎠<br />

⎝ 0 1 0⎠<br />

⎝1 0 0⎠ A.<br />

0 2 1 −1 0 1 0 0 1<br />

Remark 4.7. We won’t write out these tedious matrices on the left side of A<br />

ever again, but it is important to see it done once.<br />

Remark 4.8. Back substitution is similarly carried out by multiplying by strictly<br />

upper triangular and invertible diagonal matrices.<br />

4.4 Review Problems<br />

Problem 4.14. Let P be the 3 × 3 permutation matrix which swaps rows 1<br />

and 2. What does the matrix P 99 do? Write it down.<br />

Problem 4.15. Let S be the 3 × 3 strictly lower triangular matrix which adds<br />

2 (row 1) to row 3. What does the 3 × 3 matrix S 101 do? Write it down.<br />

Problem 4.16. Which 3 × 3 matrix adds twice the first row to the second row<br />

when you multiply by it?<br />

Problem 4.17. Which 4 × 4 matrix swaps the second and fourth rows when<br />

you multiply by it?


4.4. Forward Elimination Encoded in Matrix Multiplication 41<br />

Problem 4.18. Which 4 × 4 matrix doubles the second and quadruples the<br />

third rows when you multiply by it?<br />

Problem 4.19. If P is the permutation matrix of a permutation p, what is<br />

AP ?<br />

Problem 4.20. If we start with<br />

⎛<br />

⎞<br />

⎜<br />

0 0 1 ⎟<br />

A = ⎝2 3 4⎠<br />

0 5 6<br />

and end up with<br />

what permutation matrix is P ?<br />

⎛<br />

⎞<br />

⎜<br />

2 3 4 ⎟<br />

P A = ⎝0 5 6⎠<br />

0 0 1<br />

Problem 4.21. If A is a 2×2 matrix, and AP = P A for every 2×2 permutation<br />

matrix P or strictly lower triangular matrix, then prove that A = c I for some<br />

number c.<br />

Problem 4.22. If the third and fourth columns of a matrix A are equal, are<br />

they still equal after we carry out forward elimination? After back substitution?<br />

Problem 4.23. How many pivots can there be in a 3 × 5 matrix in echelon<br />

form?<br />

Problem 4.24. Write down the simplest 3 × 5 matrices you can come up with<br />

in echelon form and for which<br />

a. The second and third variables are the only free variables.<br />

b. There are no free variables.<br />

c. There are pivots in precisely the columns 3 and 4.<br />

Problem 4.25. Write down the simplest matrices A you can for which the<br />

number of solutions to Ax = b is<br />

a. 1 for any b;<br />

b. 0 for some b, and ∞ for other b;<br />

c. 0 for some b, and 1 for other b;<br />

d. ∞ for any b.<br />

Problem 4.26. Suppose that A is a square matrix. Prove that all entries of A<br />

are positive just when, for any nonzero vector x which has no negative entries,<br />

the vector Ax has only positive entries.


42 Elimination Via Matrix Arithmetic<br />

Problem 4.27. Prove that short matrices kill. A matrix is called short if it<br />

is wider than it is tall. We say that a matrix A kills a vector x if x ≠ 0 but<br />

Ax = 0.<br />

4.5 Summary<br />

The many steps of elimination can each be encoded into a matrix multiplication.<br />

The resulting matrices can all be multiplied together to give the single equation<br />

U = V A, where A is the matrix we started with, U is the echelon matrix we<br />

end up with and V is the product of the various matrices that carry out all<br />

of our elimination steps. There is a big idea at work here: encode a possibly<br />

huge number of steps into a single algebraic equation (in this case the equation<br />

U = V A), turning a large computation into a simple piece of algebra. We will<br />

return to this idea periodically.


5 Finding the Inverse of a Matrix<br />

Lets use elimination to calculate the inverse of a matrix.<br />

5.1 Finding the Inverse of a Matrix By Elimination<br />

Example 5.1. If Ax = y then multiplying both sides by A −1 gives x = A −1 y,<br />

solving for x. We can write out Ax = y as linear equations, and solve these<br />

equations for x. For example, if<br />

( )<br />

1 −2<br />

A =<br />

,<br />

2 −3<br />

then writing out Ax = y:<br />

x 1 −2 x 2 = y 1<br />

2 x 1 −3 x 2 = y 2 .<br />

Let apply Gauss–Jordan elimination, but watch the equations instead of the<br />

matrices. Add -2(equation 1) to equation 2.<br />

Add 2(equation 2) to equation 1.<br />

x 1 −2 x 2 = y 1<br />

x 2 = −2 y 1 + y 2 .<br />

x 1 = −3 y 1 + 2 y 2<br />

x 2 = −2 y 1 + y 2 .<br />

So<br />

( )<br />

A −1 −3 2<br />

=<br />

.<br />

−2 1<br />

Theorem 5.2. Let A be a square matrix. Suppose that Gauss–Jordan elimination<br />

applied to the matrix (A I) ends up with (U V ) with U and V square<br />

matrices. A is invertible just when U = I, in which case V = A −1 .<br />

43


44 Finding the Inverse of a Matrix<br />

Example 5.3. Before the proof, lets have an example. Lets invert<br />

( )<br />

1 −2<br />

A =<br />

.<br />

2 −3<br />

(<br />

A<br />

I<br />

(<br />

)<br />

=<br />

1 −2 1 0<br />

2 −3 0 1<br />

)<br />

Add -2(row 1) to row 2.<br />

(<br />

1 −2 1 0<br />

0 1 −2 1<br />

)<br />

Make a new pivot ↘.<br />

(<br />

1 −2 1 0<br />

0 1 −2 1<br />

)<br />

Add 2(row 2) to row 1.<br />

(<br />

1 0 −3 2<br />

0<br />

(<br />

1 −2<br />

)<br />

1<br />

= U V .<br />

)<br />

Obviously these are the same steps we used in the example above; the shaded<br />

part represents coefficients in front of the y vector above. Since U = I, A is<br />

invertible and<br />

A −1 = V<br />

( )<br />

−3 2<br />

=<br />

.<br />

−2 1<br />

Proof. Gauss–Jordan elimination on (A I) is carried out by multiplying by various invertible matrices (strictly lower triangular, permutation, invertible diagonal and strictly upper triangular), say like

\begin{pmatrix} U & V \end{pmatrix} = M_N M_{N-1} \cdots M_2 M_1 \begin{pmatrix} A & I \end{pmatrix}.

So

U = M_N M_{N-1} \cdots M_2 M_1 A
V = M_N M_{N-1} \cdots M_2 M_1,

which we summarize as U = V A. Clearly V is a product of invertible matrices, so invertible. Thus U is invertible just when A is.

First suppose that U has pivots all down the diagonal. Every pivot is a 1. Entries above and below each pivot are 0, so U = I. Since U = V A, we find I = V A. Multiply both sides on the left by V^{-1}, to see that V^{-1} = A. But then multiply on the right by V to see that I = AV. So A and V are inverses of one another.

Next suppose that U doesn't have pivots all down the diagonal. We always start Gauss–Jordan elimination on the diagonal, so we fail to place a pivot somewhere along the diagonal just because we move right (→) during forward elimination. That move makes a pivotless column, hence a free variable for the equation Ax = 0. Setting the free variable to a nonzero value produces a nonzero x with Ax = 0. By problem 3.24 on page 29, A is not invertible.
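The following is an illustrative sketch (my own code, not the author's) of the recipe of theorem 5.2: append the identity to A, run Gauss–Jordan elimination on the combined matrix, and read the inverse off the right-hand block. Floating point is used here only as a check on exact hand computation.

```python
import numpy as np

def inverse_by_elimination(A):
    """Gauss-Jordan on (A | I); returns the inverse of A if A is invertible."""
    A = A.astype(float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])            # the matrix (A I)
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))  # pick a usable pivot row
        if np.isclose(M[p, j], 0.0):
            raise ValueError("A is not invertible")
        M[[j, p]] = M[[p, j]]                # swap the pivot row into place
        M[j] /= M[j, j]                      # scale the pivot to 1
        for i in range(n):                   # clear the rest of the column
            if i != j:
                M[i] -= M[i, j] * M[j]
    return M[:, n:]                          # the block V, which equals the inverse

A = np.array([[1, -2],
              [2, -3]])
print(inverse_by_elimination(A))             # [[-3.  2.]  [-2.  1.]]
```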

Problem 5.1. Find the inverse of

A = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 2 & 1 \end{pmatrix}.

5.1 Review Problems

Problem 5.2. Find the inverse of

A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 0 \\ 1 & 0 & 0 \end{pmatrix}.

Problem 5.3. Find the inverse of

A = \begin{pmatrix} 1 & -1 & 1 \\ -1 & 1 & -1 \\ 1 & -1 & 1 \end{pmatrix}.

Problem 5.4. Find the inverse of

A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 3 \\ 1 & 1 & 3 \end{pmatrix}.

Problem 5.5. Is there a faster method than Gauss–Jordan elimination to find the inverse of a permutation matrix?



5.2 Invertibility and Forward Elimination

Proposition 5.4. A square matrix U in echelon form is invertible just when U has pivots all the way down the diagonal, which occurs just when U has no zero rows.

Proof. Applying back substitution to a matrix U which is already in echelon form preserves the locations of the pivots, and just rescales them to be 1, killing everything above them. So back substitution takes U to I just when U has pivots all the way down the diagonal.

Example 5.5.

\begin{pmatrix} 1 & 2 \\ 0 & 7 \end{pmatrix}

is invertible, while

\begin{pmatrix} 0 & 1 & 2 \\ 0 & 0 & 3 \\ 0 & 0 & 0 \end{pmatrix}

is not invertible.

Theorem 5.6. A square matrix A is invertible just when its echelon form U is invertible.

Remark 5.7. So we can quickly decide if a matrix is invertible by forward elimination. We only need back substitution if we actually need to compute out the inverse.

Proof. U = V A, and V is invertible, so U is invertible just when A is.
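A hedged sketch of remark 5.7 in code (mine, not from the text): forward elimination alone decides invertibility, by checking whether a pivot turns up in every column. The helper name and the tolerance are my own choices.

```python
import numpy as np

def is_invertible(A, tol=1e-12):
    """Forward eliminate and check for a pivot in every column."""
    U = A.astype(float).copy()
    n = U.shape[0]
    for col in range(n):
        p = col + np.argmax(np.abs(U[col:, col]))
        if abs(U[p, col]) < tol:
            return False                  # pivotless column: not invertible
        U[[col, p]] = U[[p, col]]         # swap a usable pivot row into place
        U[col+1:] -= np.outer(U[col+1:, col] / U[col, col], U[col])
    return True

print(is_invertible(np.array([[0, 1], [1, 1]])))   # True
print(is_invertible(np.array([[1, 2], [2, 4]])))   # False
```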

Example 5.8.

A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}

has echelon form

U = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}

so A is invertible.

Problem 5.6. Is

\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}

invertible?



Problem 5.7. Is

\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}

invertible?

Problem 5.8. Prove that a square matrix A is invertible just when the only solution x to the equation Ax = 0 is x = 0.

Inversion and Solvability of Linear Equations

Theorem 5.9. Take a square matrix A. The equation Ax = b has a solution x for every vector b just when A is invertible. For each b, the solution x is then unique. On the other hand, if A is not invertible, then Ax = b has either no solution or infinitely many, and both of these possibilities occur for different choices of b.

Proof. If A is invertible, then multiplying both sides of Ax = b by A^{-1}, we see that we have to have x = A^{-1}b.

On the other hand, suppose that A is not invertible. There is a free variable for Ax = b, so no solutions or infinitely many. Let's see that for different choices of b both possibilities occur. Carry out forward elimination, say U = V A. Then U has a zero row, say row n. We can't solve Ux = e_n (look at row n). So set b = V^{-1} e_n and we can't solve Ax = b. But now instead set b = 0 and we can solve Ax = 0 (for example with x = 0) and therefore solve Ax = 0 with infinitely many solutions x, since there is a free variable.

Example 5.10. The equations

x_1 + 2x_2 = 9845039843453455938453
x_1 - 2x_2 = 90853809458394034464578

have a unique solution, because they are Ax = b with

A = \begin{pmatrix} 1 & 2 \\ 1 & -2 \end{pmatrix}

which has echelon form

U = \begin{pmatrix} 1 & 2 \\ 0 & -4 \end{pmatrix}.
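As an illustrative aside (not the author's method), the system of example 5.10 can be solved exactly with rational arithmetic, in keeping with the text's rule against decimal approximations. The helper below is my own and assumes the (1,1) entry is nonzero, so no row swap is needed.

```python
from fractions import Fraction

def solve_2x2(A, b):
    """Solve Ax = b exactly for an invertible 2x2 matrix, by elimination."""
    (a11, a12), (a21, a22) = A
    b1, b2 = b
    m = Fraction(a21, a11)          # add -m(row 1) to row 2
    a22p = a22 - m * a12
    b2p = b2 - m * b1
    x2 = b2p / a22p                 # back substitution
    x1 = (b1 - a12 * x2) / a11
    return x1, x2

A = [(1, 2), (1, -2)]
b = (9845039843453455938453, 90853809458394034464578)
x1, x2 = solve_2x2(A, b)
print(x1, x2)
print(x1 + 2 * x2 == b[0] and x1 - 2 * x2 == b[1])   # True
```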



Problem 5.9. Suppose that A and B are n × n matrices and AB = I. Prove<br />

that A and B are both invertible, and that B = A −1 and that A = B −1 .<br />

Problem 5.10. Prove that for square matrices A and B of the same size<br />

(AB) −1 = B −1 A −1<br />

(and if either side is defined, then the other is and they are equal).<br />

5.2 Review Problems

Problem 5.11. Is

A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}

invertible?

Problem 5.12. How many solutions are there to the following equations?

x_1 + 2x_2 + 3x_3 = 284905309485083
x_1 + 2x_2 + x_3 = 92850234853408
x_2 + 15x_3 = 4250348503489085.

Problem 5.13. Let A be the n × n matrix which has 1 in every entry on or under the diagonal, and 0 in every entry above the diagonal. Find A^{-1}.

Problem 5.14. Let A be the n × n matrix which has 1 in every entry on or above the diagonal, and 0 in every entry below the diagonal. Find A^{-1}.

Problem 5.15. Give an example of a 3 × 3 invertible matrix A for which A and A^t have different values for their pivots.

Problem 5.16. Imagine that you start with a matrix A which might not be invertible, and carry out forward elimination on (A I). Suppose you arrive at

(U V) = \begin{pmatrix} * & * & * & * & * & * & * & * \\ * & * & * & * & * & * & * & * \\ 0 & 0 & 0 & 0 & 2 & 8 & 3 & 9 \\ 0 & 0 & 0 & 0 & 0 & 1 & 5 & 2 \end{pmatrix},

with some pivots somewhere in the first two rows of U. Fact: you can solve Ax = b just for those vectors b which solve the equations

2b_1 + 8b_2 + 3b_3 + 9b_4 = 0
b_2 + 5b_3 + 2b_4 = 0.

Explain why.


6 The Determinant

We can see whether a matrix is invertible by computing a single number, the determinant.

Problem 6.1. Use forward elimination to prove that a 2 × 2 matrix

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}

is invertible just when ad - bc ≠ 0.

For any 2 × 2 matrix \begin{pmatrix} a & b \\ c & d \end{pmatrix}, the determinant is ad - bc. For larger matrices, the determinant is complicated.

6.1 Definition

Determinants are computed as in figure 6.1 on the next page. To compute a determinant, run your finger down the first column, writing down plus and minus signs in the pattern +, −, +, −, . . . in front of the entry your finger points at, and then writing down the determinant of the matrix you get by deleting the row and column where your finger lies (always the first column), and add up.
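The recipe is recursive: an n × n determinant is an alternating sum of n smaller determinants. Here is an illustrative sketch of that recursion in Python (my own code, exact for integer entries), applied to the matrix of figure 6.1.

```python
def det(A):
    """Determinant by expansion down the first column."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total, sign = 0, 1
    for i in range(n):                                              # run down the first column
        minor = [row[1:] for k, row in enumerate(A) if k != i]      # delete row i and column 1
        total += sign * A[i][0] * det(minor)
        sign = -sign                                                # the pattern +, -, +, -, ...
    return total

print(det([[3, 2, 1],
           [1, 4, 5],
           [6, 7, 2]]))    # -42
```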

Problem 6.2. Prove that

det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc.

6.1 Review Problems

Problem 6.3. Find the determinant of

\begin{pmatrix} 3 & 1 \\ 1 & -3 \end{pmatrix}



det \begin{pmatrix} 3 & 2 & 1 \\ 1 & 4 & 5 \\ 6 & 7 & 2 \end{pmatrix} = + (3) det \begin{pmatrix} 4 & 5 \\ 7 & 2 \end{pmatrix} - (1) det \begin{pmatrix} 2 & 1 \\ 7 & 2 \end{pmatrix} + (6) det \begin{pmatrix} 2 & 1 \\ 4 & 5 \end{pmatrix}

(each 2 × 2 determinant is what remains after deleting the row and column of the entry chosen from the first column)

Figure 6.1: Computing a 3 × 3 determinant.

Problem 6.4. Find the determinant of

\begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & -1 \end{pmatrix}

Problem 6.5. Does A_{11}^2 appear in the expression for det A, when you expand out all of the determinants in the expression completely?

Problem 6.6. Prove that the determinant of

A = \begin{pmatrix} * & * & * & * & * & * & * & * \\ 0 & 0 & 0 & * & * & * & * & * \\ 0 & 0 & 0 & * & * & * & * & * \\ 0 & 0 & 0 & * & * & * & * & * \\ 0 & 0 & 0 & * & * & * & * & * \\ 0 & 0 & 0 & * & * & * & * & * \\ 0 & 0 & 0 & * & * & * & * & * \\ * & * & * & * & * & * & * & * \end{pmatrix}

is zero, no matter what number we put in place of the *'s, even if the numbers are all different.



Problem 6.7. Give an example of a matrix all of whose entries are positive, even though its determinant is zero.

Problem 6.8. What is det I? Justify your answer.

Problem 6.9. Prove that

det \begin{pmatrix} A & B \\ 0 & C \end{pmatrix} = det A det C,

for A and C any square matrices, and B any matrix of appropriate size to fit in here.

6.2 Easy Determinants

Example 6.1. Let's find the determinant of

A = \begin{pmatrix} 7 & & \\ & 4 & \\ & & 2 \end{pmatrix}.

(There are zeros wherever entries are not written.) Running down the first column, we only hit the 7. So

det A = 7 det \begin{pmatrix} 4 & \\ & 2 \end{pmatrix}.

By the same trick:

det A = (7)(4) det(2) = (7)(4)(2).

Summing up:

Lemma 6.2. The determinant of a diagonal matrix

A = \begin{pmatrix} p_1 & & & \\ & p_2 & & \\ & & \ddots & \\ & & & p_n \end{pmatrix}

is det A = p_1 p_2 \cdots p_n.

We can easily do better: recall that a matrix A is upper triangular if all entries below the diagonal are 0. By the same trick again:



Lemma 6.3. The determinant of an upper triangular square matrix

U = \begin{pmatrix} U_{11} & U_{12} & U_{13} & U_{14} & \dots & U_{1n} \\ & U_{22} & U_{23} & U_{24} & \dots & U_{2n} \\ & & U_{33} & U_{34} & \dots & U_{3n} \\ & & & \ddots & & \vdots \\ & & & & \ddots & \vdots \\ & & & & & U_{nn} \end{pmatrix}

is the product of the diagonal terms: det U = U_{11} U_{22} \cdots U_{nn}.

Corollary 6.4. A square matrix A is invertible just when det U ≠ 0, with U obtained from A by forward elimination.

Proof. The matrix U is upper triangular. The fact that det U ≠ 0 says precisely that all diagonal entries of U are not zero, so are pivots—a pivot in every column. Apply theorem 5.6 on page 46.

6.2 Review Problems

Problem 6.10. Find

det \begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{pmatrix}.

Problem 6.11. Suppose that U is an invertible upper triangular matrix.

a. Prove that U^{-1} is upper triangular.

b. Prove that the diagonal entries of U^{-1} are the reciprocals of the diagonal entries of U.

c. How can you calculate by induction the entries of U^{-1} in terms of the entries of U?

Problem 6.12. Let U be any upper triangular matrix with integer entries. Prove that U^{-1} has integer entries just when det U = ±1.

6.3 Tricks to Find Determinants

Lemma 6.5. Swapping any two neighbouring rows of a square matrix changes the sign of the determinant. For example,

det \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = - det \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix}.



Proof. It is obvious for 1 × 1 (you can’t swap anything). It is easy to check for<br />

a 2 × 2. Picture a 3 × 3 matrix A (like example 6.1 on page 52). For simplicity,<br />

lets swap rows 1 and 2. Then the plus sign of row 1 and the minus sign of row<br />

2 are clearly switched in the 1st and 2nd terms in the determinant. In the 3rd<br />

term, the leading plus sign is not switched. Look at the determinant in the<br />

3rd term: rows 1 and 2 don’t get crossed out, and have been switched, so the<br />

determinant factor changes sign. So all terms in the determinant formula have<br />

changed sign, and therefore the determinant has changed sign. The argument<br />

goes through identically with any size of matrix (by induction) and any two<br />

neighboring rows, instead of just rows 1 and 2.<br />

Lemma 6.6. Swapping any two rows of a square matrix changes the sign<br />

of the determinant, so det P A = − det A for P the permutation matrix of a<br />

transposition.<br />

Proof. Suppose that we want to swap two rows, not neighboring. For concreteness,<br />

imagine rows 1 and 4. Swapping the first with the second, then second<br />

with third, etc., a total of 3 swaps will drive row 1 into place in row 4, and<br />

drives the old row 4 into row 3. Two more swaps (of row 3 with row 2, row 2<br />

with row 1) puts everything where we want it.<br />

More generally, to swap two rows, start by swapping the higher of the two<br />

with the row immediately under it, repeatedly until it fits into place. Some<br />

number s of swaps will do the trick. Now the row which was the lower of the<br />

two has become the higher of the two, and we have to swap it s − 1 swaps into<br />

place. So 2s − 1 swaps in all, an odd number.<br />

Problem 6.13. If a square matrix has two rows the same, prove that it has<br />

determinant 0.<br />

Problem 6.14. Find 2 × 2 matrices A and B for which<br />

det(A + B) ≠ det A + det B.<br />

So det doesn’t behave well under adding matrices. But it does behave well<br />

under adding rows of matrices.<br />

Example 6.7. Watch each row:

det \begin{pmatrix} 1+5 & 2+6 \\ 3 & 4 \end{pmatrix} = det \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} + det \begin{pmatrix} 5 & 6 \\ 3 & 4 \end{pmatrix}.

Theorem 6.8. The determinant of any square matrix scales when you scale across any row, like

det \begin{pmatrix} 7 \cdot 1 & 7 \cdot 2 \\ 3 & 4 \end{pmatrix} = 7 \cdot det \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix},

or when you scale down any column, like

det \begin{pmatrix} 7 \cdot 1 & 2 \\ 7 \cdot 3 & 4 \end{pmatrix} = 7 \cdot det \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.

It adds when you add across any row, like

det \begin{pmatrix} 1+5 & 2+6 \\ 3 & 4 \end{pmatrix} = det \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} + det \begin{pmatrix} 5 & 6 \\ 3 & 4 \end{pmatrix},

or when you add down any column, like

det \begin{pmatrix} 1 & 2+5 \\ 3 & 4+6 \end{pmatrix} = det \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} + det \begin{pmatrix} 1 & 5 \\ 3 & 6 \end{pmatrix}.

Proof. To compute a determinant, you pick an entry from the first column, and<br />

then delete its row and column. You then multiply it by the determinant of<br />

what is left over, which is computed by picking out an entry from the second<br />

column, not from the same row, etc. If we ignore for a moment the plus and<br />

minus signs, we can see the pattern emerging: you just pick something from<br />

the first column, and cross out its row and column,<br />


and then something from the second column, and cross out its row and column,<br />


and so on. Finally, you have picked one entry from each column, all from<br />

different rows. In our example, we picked A 31 , A 52 , A 23 , A 14 , A 45 . Multiply these<br />

together, and you get just one term from the determinant: A 31 A 52 A 23 A 14 A 45 .<br />

Your term has exactly one entry from the first column, and then you crossed out<br />

the first column and moved on. Suppose that you double all of the entries in<br />

the first column. Your term contains exactly one entry from that column, A 31<br />

in our example, so your term doubles. Adding up the terms, the determinant<br />

doubles.<br />

The determinant is the sum over all choices you could make of rows to pick<br />

at each step; and of course, there are some plus and minus signs which we are


still ignoring. For example, with this kind of picture, a 2 × 2 determinant looks like

A_{11} A_{22} - A_{21} A_{12}.

In the same way, scaling any column, you scale your entry from that column,<br />

so you scale your term. You scale all of the terms, so you scale the determinant.<br />

When you cobbled together your term, you picked out an entry from some<br />

row, and then crossed out that row. So you didn’t use the same row twice.<br />

There are as many rows as columns, and you picked an entry in each column,<br />

so you have picked as many entries as there are rows, never using the same row<br />

twice. So you must have picked out exactly one entry from each row. In our<br />

example term above, we see this clearly: the rows used were 3, 5, 2, 1, 4. By the<br />

same argument as for columns, if you scale row 2, you must scale the entry A 23 ,<br />

and only that entry, so you scale the term. Adding up all possible terms, you scale the determinant.

Let's see why we can add across rows. If I try to add entries across the first row, a single term looks like

\begin{pmatrix} 1+6 & 2+7 & 3+8 & 4+9 & 5+10 \\ & & \vdots & & \end{pmatrix} = (4 + 9)(\dots)

where the (. . . ) indicates all of the other factors from the lower rows, which we will leave unspecified,

= 4(\dots) + 9(\dots)

= \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ & & \vdots & & \end{pmatrix} + \begin{pmatrix} 6 & 7 & 8 & 9 & 10 \\ & & \vdots & & \end{pmatrix}

since we keep all of the entries in the lower rows exactly the same in each matrix.<br />

This shows that each term adds when you add across a single row, so the sum<br />

of the terms, the determinant, must add. This reasoning works for any size of<br />

matrix in the same way. Moreover, it works for columns just in the same way<br />

as for rows.<br />

Problem 6.15. What happens to the determinant if I double the first row and<br />

then triple the second row?



Proposition 6.9. Suppose that S is the strictly upper or strictly lower triangular matrix which adds a multiple of one row to another row. Then det SA = det A; i.e. we can add a multiple of any row to any other row without affecting the determinant.

Proof. We can always swap rows as needed, to get the rows involved to be the first and second rows. Then swap back again. This just changes signs somehow, and then changes them back again. So we need only work with the first and second rows. For simplicity, picture a 3 × 3 matrix as 3 rows:

A = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}.

Adding s(row 1) to (row 2) gives

\begin{pmatrix} a_1 \\ a_2 + s\,a_1 \\ a_3 \end{pmatrix},

which has determinant

det \begin{pmatrix} a_1 \\ a_2 + s\,a_1 \\ a_3 \end{pmatrix} = det \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} + s\, det \begin{pmatrix} a_1 \\ a_1 \\ a_3 \end{pmatrix},

by the last lemma. The second determinant vanishes because it has two identical rows. The general case is just the same with more notation: we stuff more rows around the three rows we had above.

Problem 6.16. Which property of the determinant is illustrated in each of these examples?

(a) det \begin{pmatrix} 10 & -5 & -5 \\ -1 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix} = 5 det \begin{pmatrix} 2 & -1 & -1 \\ -1 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}

(b) det \begin{pmatrix} 1 & -2 & -3 \\ -1 & 0 & 2 \\ -3 & 0 & 2 \end{pmatrix} = - det \begin{pmatrix} 1 & -2 & -3 \\ -3 & 0 & 2 \\ -1 & 0 & 2 \end{pmatrix}

(c) det \begin{pmatrix} 1 & -1 & -2 \\ 4 & 0 & 2 \\ 2 & -2 & -1 \end{pmatrix} = det \begin{pmatrix} 1 & -1 & -2 \\ 4 & 0 & 2 \\ 0 & 0 & 3 \end{pmatrix}


7 The Determinant via Elimination

The fast way to compute the determinant of a large matrix is via elimination. The fast formula for the determinant:

Theorem 7.1. Via forward elimination,

det A = \begin{cases} \pm\,(\text{product of the pivots}) & \text{if there is a pivot in each column,} \\ 0 & \text{otherwise,} \end{cases}

where

\pm = \begin{cases} + & \text{if we make an even number of row swaps during forward elimination,} \\ - & \text{otherwise.} \end{cases}

Example 7.2. Forward elimination takes

A = \begin{pmatrix} 0 & 7 \\ 2 & 3 \end{pmatrix} to U = \begin{pmatrix} 2 & 3 \\ 0 & 7 \end{pmatrix}

with one row swap, so det A = -(2)(7) = -14.

Remark 7.3. The fast formula isn't actually any faster for small matrices, so for a 2 × 2 or 3 × 3 you wouldn't use it. But we need the fast formula anyway; each of the two formulas gives different insight.

Proof. We can see how the determinant changes during elimination: adding multiples of rows to other rows does nothing, swapping rows changes sign.
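An illustrative sketch of the fast formula (my own, not from the text): forward eliminate, count the row swaps, and multiply the pivots; a pivotless column means the determinant is 0. Floating point here is only a check on hand work.

```python
import numpy as np

def det_by_elimination(A, tol=1e-12):
    """(-1)^(number of row swaps) times the product of the pivots, or 0."""
    U = A.astype(float).copy()
    n = U.shape[0]
    sign = 1.0
    for col in range(n):
        p = col + np.argmax(np.abs(U[col:, col]))
        if abs(U[p, col]) < tol:
            return 0.0                       # pivotless column
        if p != col:
            U[[col, p]] = U[[p, col]]        # a row swap flips the sign
            sign = -sign
        U[col+1:] -= np.outer(U[col+1:, col] / U[col, col], U[col])
    return sign * np.prod(np.diag(U))

A = np.array([[0, 7],
              [2, 3]])
print(det_by_elimination(A))     # -14.0, as in example 7.2
```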

Problem 7.1. Use the fast formula to find the determinant of

A = \begin{pmatrix} 2 & 5 & 5 \\ 2 & 5 & 7 \\ 2 & 6 & 11 \end{pmatrix}


Problem 7.2. Just by looking, find

det \begin{pmatrix} 1001 & 1002 & 1003 & 1004 \\ 2002 & 2004 & 2006 & 2008 \\ 2343 & 6787 & 1938 & 4509 \\ 9873 & 7435 & 2938 & 9038 \end{pmatrix}.

Problem 7.3. Prove that a square matrix is invertible just when its determinant is not zero.

7.0 Review Problems

Problem 7.4. Find the determinant of

\begin{pmatrix} 1 & 0 & 1 & -1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & -1 \\ 1 & 0 & -1 & 0 \end{pmatrix}

Problem 7.5. Find the determinant of

\begin{pmatrix} 0 & 2 & 2 & 0 \\ -1 & 1 & 0 & 0 \\ -1 & -1 & 0 & 1 \\ 2 & 0 & 1 & 1 \end{pmatrix}

Problem 7.6. Find the determinant of

\begin{pmatrix} 2 & -1 & -1 \\ -1 & -1 & 0 \\ 2 & -1 & -1 \end{pmatrix}

Problem 7.7. Find the determinant of

\begin{pmatrix} 0 & 2 & 0 \\ 0 & 0 & -1 \\ 2 & 2 & -1 \end{pmatrix}

Problem 7.8. Find the determinant of

\begin{pmatrix} 2 & 1 & -1 \\ 2 & 0 & 2 \\ 0 & 2 & 1 \end{pmatrix}

Problem 7.9. Find the determinant of

\begin{pmatrix} 0 & -1 & -1 \\ 0 & -1 & 2 \\ 0 & 1 & 0 \end{pmatrix}

Problem 7.10. Prove that a square matrix with a zero row has determinant 0.

Problem 7.11. Prove that det P A = (-1)^N det A if P is the permutation matrix of a product of N transpositions.

Problem 7.12. Use the fast formula to find the determinant of

A = \begin{pmatrix} 0 & 2 & 1 \\ 3 & 1 & 2 \\ 3 & 5 & 2 \end{pmatrix}

Problem 7.13. Prove that the determinant of any lower triangular square matrix

L = \begin{pmatrix} L_{11} \\ L_{21} & L_{22} \\ L_{31} & L_{32} & L_{33} \\ L_{41} & L_{42} & L_{43} & \ddots \\ \vdots & \vdots & \vdots & & \ddots \\ L_{n1} & L_{n2} & L_{n3} & \dots & L_{n(n-1)} & L_{nn} \end{pmatrix}

(with zeroes above the diagonal) is the product of the diagonal terms: det L = L_{11} L_{22} \cdots L_{nn}.



7.1 Determinants Multiply

Theorem 7.4.

det(AB) = det A det B. (7.1)

Proof. Suppose that det A = 0. By the fast formula, A is not invertible. Problem 5.10 on page 48 tells us that therefore AB is not invertible, and both sides of equation 7.1 are 0. So we can safely suppose that det A ≠ 0. Via Gauss–Jordan elimination, any invertible matrix is a product of matrices each of which adds a multiple of one row to another, or scales a row, or swaps two rows. Write A as a product of such matrices, and peel off one factor at a time, applying lemma 6.5 on page 54 and proposition 6.9 on page 58.

Example 7.5. If

A = \begin{pmatrix} 1 & 4 & 6 \\ 0 & 2 & 5 \\ 0 & 0 & 3 \end{pmatrix}, B = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 2 & 0 \\ 7 & 5 & 4 \end{pmatrix},

then it is hard to compute out AB, and then compute out det AB. But

det AB = det A det B = (1)(2)(3)(1)(2)(4) = 48.
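A quick numerical check of example 7.5 (an aside, not part of the text):

```python
import numpy as np

A = np.array([[1., 4., 6.],
              [0., 2., 5.],
              [0., 0., 3.]])
B = np.array([[1., 0., 0.],
              [2., 2., 0.],
              [7., 5., 4.]])

# the determinant of a triangular matrix is the product of its diagonal entries
print(np.linalg.det(A @ B))                     # approximately 48.0
print(np.linalg.det(A) * np.linalg.det(B))      # approximately 48.0
```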

7.2 Transpose

Definition 7.6. The transpose of a matrix A is the matrix A^t whose entries are A^t_{ij} = A_{ji} (switching rows with columns).

Example 7.7. Flip over the diagonal:

A = \begin{pmatrix} 10 & 2 \\ 3 & 40 \\ 5 & 6 \end{pmatrix}, A^t = \begin{pmatrix} 10 & 3 & 5 \\ 2 & 40 & 6 \end{pmatrix}.

Problem 7.14. Find the transpose of

A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 0 & 0 & 0 \end{pmatrix}.

Problem 7.15. Prove that

(AB)^t = B^t A^t.

(The transpose of the product is the product of the transposes, in the reverse order.)



Problem 7.16. Prove that the transpose of any permutation matrix is a permutation matrix. How is the permutation of the transpose related to the original permutation?

Corollary 7.8.

det A = det A^t

Proof. Forward elimination gives U = V A, with U upper triangular and V a product of permutation and strictly lower triangular matrices. Transpose: U^t = A^t V^t. But V^t is a product of permutation and strictly upper triangular matrices, with the same number of row swaps as V, so det V^t = det V = ±1. The matrix U^t is lower triangular, so det U^t is the product of the diagonal entries of U^t (by problem 7.13 on page 63), which are the diagonal entries of U, so det U^t = det U.

7.3 Expanding Down Any Column or Across Any Row

Consider the "checkerboard pattern"

\begin{pmatrix} + & - & + & - & \dots \\ - & + & - & + & \dots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}

Theorem 7.9. We can compute the determinant of any square matrix A by picking any column (or any row) of A, writing down plus and minus signs from the same column (or row) of the checkerboard matrix, writing down the entries of A from that column (or row), multiplying each of these entries by the determinant obtained from deleting the row and column of that entry, and adding all of these up.

Example 7.10. For

A = \begin{pmatrix} 3 & 2 & 1 \\ 1 & 4 & 5 \\ 6 & 7 & 2 \end{pmatrix},



if we expand along the second row, we get

det A = - (1) det \begin{pmatrix} 2 & 1 \\ 7 & 2 \end{pmatrix} + (4) det \begin{pmatrix} 3 & 1 \\ 6 & 2 \end{pmatrix} - (5) det \begin{pmatrix} 3 & 2 \\ 6 & 7 \end{pmatrix}

(each 2 × 2 determinant is what remains after deleting the row and column of the entry being used).

Proof. By swapping columns (or rows), we change signs of the determinant. Swap columns (or rows) to get the required column (or row) to slide over to become the first column (or row). Take the sign changes into account with the checkerboard pattern: changing all plus and minus signs for each swap.

Problem 7.17. Use this to calculate the determinant of

A = \begin{pmatrix} 1 & 2 & 0 & 1 \\ 3 & 4 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 839 & -1702 & 1 & 493 \end{pmatrix}.

7.4 Summary<br />

Determinants<br />

(a) scale when you scale across a row (or down a column),<br />

(b) add when you add across a row (or down a column),<br />

(c) switch sign when you swap two rows, (or when you swap two columns),<br />

(d) don’t change when you add a multiple of one row to another row (or a<br />

multiple of one column to another column),<br />

(e) don’t change when you transpose,<br />

(f) multiply when you multiply matrices.<br />

The determinant of<br />

(a) an upper (or lower) triangular matrix is the product of the diagonal entries.<br />

(b) a permutation matrix is (−1) # of transpositions .<br />

(c) a matrix is not zero just when the matrix is invertible.<br />

(d) any matrix is det A = (−1) N det U, if A is taken by forward elimination<br />

with N row swaps to a matrix U.



Problem 7.18. If A is a square matrix, prove that

det(A^k) = (det A)^k

for k = 1, 2, 3, . . . .

Problem 7.19. Use this last exercise to find

det(A^{2222444466668888})

where

A = \begin{pmatrix} 0 & 1 \\ 1 & 1234567890 \end{pmatrix}.

Problem 7.20. If A is invertible, prove that

det(A^{-1}) = 1 / det A.

7.4 Review Problems<br />

Problem 7.21. What are all of the different ways you know to calculate<br />

determinants?<br />

Problem 7.22. How many solutions are there to the following equations?<br />

x 1 + 1010x 2 + 130923x 3 = 2839040283<br />

2x 2 + 23932x 3 = 2390843248<br />

3x 3 = 98234092384<br />

Problem 7.23. Prove that no matter which entry of an n × n matrix you pick<br />

(n > 1), you can find some invertible n × n matrix for which that entry is zero.


Bases and Subspaces<br />



8 Span<br />

We want to think not only about vectors, but also about lines and planes. We<br />

will find a convenient language in which to describe lines and planes and similar<br />

objects.<br />

8.1 The Problem<br />

Look at a very simple linear equation:<br />

x 1 + 2x 2 + x 3 = 0. (8.1)<br />

There are many solutions. Each is a point in R 3 , and together they draw out a<br />

plane. But how do we write down this plane? The picture is useless—we can’t<br />

see for sure which vectors live on it. We need a clear method to write down<br />

planes, lines, and similar things, so that we can communicate about them (e.g.<br />

over the telephone or to a computer).<br />

One method to describe a plane is to write down an equation, like x 1 +<br />

2x 2 + x 3 = 0, cutting out the plane. But there is another method, which we<br />

will often prefer, building up the plane out of vectors.<br />

The solutions of an<br />

equation forming a<br />

plane.<br />

8.2 Span

Example 8.1. Consider the equations

x_1 + 2x_2 - 7x_4 = 0
x_3 + x_4 = 0

Solutions have

x_1 = -2x_2 + 7x_4
x_3 = -x_4,

giving

x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} -2x_2 + 7x_4 \\ x_2 \\ -x_4 \\ x_4 \end{pmatrix} = x_2 \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix} + x_4 \begin{pmatrix} 7 \\ 0 \\ -1 \\ 1 \end{pmatrix}.

But x_2 and x_4 are free—they can be anything. The solutions are just arbitrary "combinations" of

\begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix} and \begin{pmatrix} 7 \\ 0 \\ -1 \\ 1 \end{pmatrix}.

We can just remember these two vectors, to describe all of the solutions.

Definition 8.2. A multiple of a vector v is a vector cv where c is a number. A linear combination of some vectors v_1, v_2, . . . , v_p in R^n is a vector

v = c_1 v_1 + c_2 v_2 + · · · + c_p v_p,

for some numbers c_1, c_2, . . . , c_p (a sum of multiples). The span of some vectors is the collection of all of their linear combinations.

8.3 The Solution

We can describe the plane of solutions of equation 8.1 on the previous page: it is the span of the vectors

\begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ 2 \end{pmatrix}.

That isn't obvious. (You can apply forward elimination to check that it is correct.) But immediately we see our next problem: you might describe it as this span, and I might describe it as the span of

\begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}.



How do we see that these are the same thing?

8.4 How to Tell if a Vector Lies in a Span

If we have some vectors, let's say

x_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, x_2 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix},

how do we tell if another vector lies in their span? Let's ask if

y = \begin{pmatrix} 1 \\ 4 \\ 7 \end{pmatrix}

lies in the span of x_1 and x_2. So we are asking if y is a linear combination c_1 x_1 + c_2 x_2.

Example 8.3. Solving the linear equations

c_1 + c_2 = 1
2c_1 = 4
3c_1 - c_2 = 7

just means finding numbers c_1 and c_2 for which

c_1 \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 \\ 4 \\ 7 \end{pmatrix},

writing y as a linear combination of x_1 and x_2. Solving linear equations is exactly the same problem as asking whether one vector is a linear combination of some other vectors.

Problem 8.1. Write down some linear equations, so that solving them is the same problem as asking whether

\begin{pmatrix} 1 \\ 0 \\ -2 \end{pmatrix}

is a linear combination of

\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}.



Definition 8.4. A pivot column of a matrix A is a column in which a pivot appears when we forward eliminate A.

Example 8.5. The matrix

A = \begin{pmatrix} 0 & 0 & 1 \\ -1 & 1 & 1 \end{pmatrix}

has echelon form

U = \begin{pmatrix} -1 & 1 & 1 \\ 0 & 0 & -1 \end{pmatrix}

so columns 1 and 3 of A are pivot columns.

Lemma 8.6. Write some vectors into the columns of a matrix, say

A = \begin{pmatrix} x_1 & x_2 & \dots & x_p & y \end{pmatrix}

and apply forward elimination. Then y lies in the span of x_1, x_2, . . . , x_p just when y is not a pivot column.

Proof. As in example 8.3 on the preceding page, the problem is precisely whether we can solve the linear equations whose matrix is A, with y the column of constants. We already know that linear equations have solutions just when the column of constants is not a pivot column.

Applied to our example, this gives

A = \begin{pmatrix} x_1 & x_2 & y \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 0 & 4 \\ 3 & -1 & 7 \end{pmatrix}

to which we apply forward elimination:

\begin{pmatrix} 1 & 1 & 1 \\ 2 & 0 & 4 \\ 3 & -1 & 7 \end{pmatrix}

Add -2(row 1) to row 2, and -3(row 1) to row 3.

\begin{pmatrix} 1 & 1 & 1 \\ 0 & -2 & 2 \\ 0 & -4 & 4 \end{pmatrix}

Add -2(row 2) to row 3.

\begin{pmatrix} 1 & 1 & 1 \\ 0 & -2 & 2 \\ 0 & 0 & 0 \end{pmatrix}

There is no pivot in the last column, so y is a linear combination of x_1 and x_2, i.e. lies in their span. (In fact, in the echelon form, we see that the last column is twice the first column minus the second column. So this must hold in the original matrix too: y = 2x_1 - x_2.)
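An illustrative sketch of lemma 8.6 in code (my own, not the author's): append the candidate vector as a last column and compare pivot counts with and without it; numerically, the rank plays the role of the pivot count.

```python
import numpy as np

def in_span(vectors, y):
    """y lies in the span of the given vectors just when appending y adds no new pivot."""
    X = np.column_stack(vectors)
    Xy = np.column_stack(vectors + [y])
    return np.linalg.matrix_rank(Xy) == np.linalg.matrix_rank(X)

x1 = np.array([1., 2., 3.])
x2 = np.array([1., 0., -1.])
y  = np.array([1., 4., 7.])
print(in_span([x1, x2], y))    # True: y = 2 x1 - x2
```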

Problem 8.2. What if we have a lot of vectors y to test? Prove that vectors<br />

y 1 , y 2 , . . . , y q all lie in the span of vectors x 1 , x 2 , . . . , x p just when the matrix<br />

(x 1 x 2 . . . x p y 1 y 2 . . . y q<br />

)<br />

has no pivots in the last q columns.<br />

8.4 Review Problems<br />

Problem 8.3. Is the span of the vectors<br />

⎛ ⎞ ⎛ ⎞<br />

1<br />

⎜ ⎟ ⎜<br />

1 ⎟<br />

⎝−1⎠ , ⎝1⎠<br />

0 1<br />

the same as the span of the vectors<br />

⎛ ⎞<br />

⎜<br />

0 ⎟<br />

⎝4⎠ ,<br />

2<br />

⎛ ⎞<br />

4<br />

⎜ ⎟<br />

⎝2⎠?<br />

3<br />

Problem 8.4. Describe the span of the vectors<br />

⎛ ⎞ ⎛ ⎞ ⎛ ⎞<br />

1 1<br />

⎜ ⎟ ⎜ ⎟ ⎜<br />

1 ⎟<br />

⎝0⎠ , ⎝0⎠ , ⎝1⎠ .<br />

1 0 0


76 Span<br />

Problem 8.5. Does the vector<br />

⎛ ⎞<br />

−1<br />

⎜ ⎟<br />

⎝ 0 ⎠<br />

1<br />

lie in the span of the vectors<br />

⎛ ⎞ ⎛ ⎞ ⎛ ⎞<br />

0<br />

⎜ ⎟ ⎜<br />

⎟ ⎜<br />

⎟<br />

⎝−1⎠ , ⎝1⎠ , ⎝−1⎠?<br />

0 2 1<br />

Problem 8.6. Does the vector<br />

⎛<br />

⎜<br />

0<br />

⎞<br />

⎟<br />

⎝2⎠<br />

0<br />

lie in the span of the vectors<br />

⎛ ⎞ ⎛ ⎞ ⎛ ⎞<br />

2 2 4<br />

⎜ ⎟ ⎜ ⎟ ⎜ ⎟<br />

⎝0⎠ , ⎝−1⎠ , ⎝0⎠?<br />

0 1 0<br />

Problem 8.7. Does the vector<br />

lie in the span of the vectors<br />

⎛<br />

⎜<br />

⎝<br />

0<br />

1<br />

−1<br />

Problem 8.8. Does the vector<br />

lie in the span of the vectors<br />

⎛<br />

⎛ ⎞<br />

−1<br />

⎜ ⎟<br />

⎝ 0 ⎠<br />

1<br />

⎞ ⎛ ⎞ ⎛ ⎞<br />

⎟ ⎜<br />

2 ⎟ ⎜<br />

−1 ⎟<br />

⎠ , ⎝0⎠ , ⎝ 1 ⎠?<br />

1 0<br />

⎞ ⎛<br />

1<br />

⎜ ⎟ ⎜<br />

⎝−1⎠ , ⎝<br />

0<br />

⎛ ⎞<br />

0<br />

⎜ ⎟<br />

⎝−3⎠<br />

6<br />

1<br />

2<br />

−6<br />

⎞ ⎛ ⎞<br />

−3<br />

⎟ ⎜ ⎟<br />

⎠ , ⎝ 0 ⎠?<br />

6


8.5. Subspaces 77<br />

Problem 8.9. Find a linear equation satisfied on the span of the vectors<br />

⎛ ⎞ ⎛ ⎞<br />

1 2<br />

⎜ ⎟ ⎜ ⎟<br />

⎝ 1 ⎠ , ⎝ 0 ⎠<br />

−1 −1<br />

8.5 Subspaces<br />

Picture a straight line through the origin, or a plane through the origin. We<br />

generalize this picture:<br />

Definition 8.7. A subspace P of R n is a collection of vectors in R n so that<br />

a. P is not empty (i.e. some vector belongs to the collection P )<br />

b. If x belongs to P , then ax does too, for any number a.<br />

c. If x and y belong to P , then x + y does too.<br />

We can see in pictures that a plane through the origin is a subspace:<br />

My plane is not<br />

empty: the origin<br />

lies in my plane<br />

Scale a vector<br />

from my plane:<br />

it stays in that<br />

plane<br />

Problem 8.10. Prove that 0 belongs to every subspace.<br />

Add vectors from<br />

my plane: the<br />

sum also lies in<br />

my plane<br />

Problem 8.11. Prove that if a subspace contains some vectors, then it contains<br />

their span.<br />

Intuitively, a subspace is a flat object, like a line or a plane, passing through<br />

the origin 0 of R n .<br />

Example 8.8. The set P of vectors<br />

x =<br />

( )<br />

x 1<br />

x 2<br />

for which x 1 + 2x 2 = 0 is a subspace of R 2 , because<br />

a. x = 0 satisfies x 1 + 2x 2 = 0 (so P is not empty).<br />

b. If x satisfies x 1 + 2x 2 = 0, then ax satisfies<br />

(ax) 1 + 2(ax) 2 = a x 1 + 2a x 2<br />

= a (x 1 + 2 x 2 )<br />

= 0.


78 Span<br />

c. If x and y are points of P , satisfying<br />

then x + y satisfies<br />

x 1 + 2x 2 = 0<br />

y 1 + 2y 2 = 0<br />

(x 1 + y 1 ) + 2 (x 2 + y 2 ) = (x 1 + 2x 2 ) + (y 1 + 2y 2 )<br />

= 0.<br />

Problem 8.12. Is the set S of all points<br />

( )<br />

x<br />

x = 1<br />

x 2<br />

of the plane with x 2 = 1 a subspace?<br />

Problem 8.13. Is the set P of all points<br />

⎛ ⎞<br />

⎜<br />

x 1<br />

⎟<br />

x = ⎝x 2 ⎠<br />

x 3<br />

with x 1 + x 2 + x 3 = 0 a subspace?<br />

The word “subspace” really means just the same as “the span of some<br />

vectors,” as we will eventually see.<br />

Proposition 8.9. The span of a set of vectors is a subspace; in fact, it is the<br />

smallest subspace containing those vectors. Conversely, every subspace is a<br />

span: the span of all of the vectors inside it.<br />

Remark 8.10. In order to make this proposition true, we have to change our<br />

definitions just a little: if we have an empty collection of vectors (i.e. we don’t<br />

have any vectors at all), then we will declare that the span of that empty<br />

collection is the origin.<br />

Remark 8.11. If we have an infinite collection of vectors, then their span just<br />

means the collection of all linear combinations we can build up from all possible<br />

choices we can make of any finite number of vectors from our collection. We<br />

don’t allow infinite sums. We would really like to avoid using spans of infinite<br />

sets of vectors; we will address this problem in chapter 9.<br />

Proof. Given any set of vectors X in R n , let U be their span. So any vector in<br />

U is a linear combination of vectors from X. Scaling any linear combination<br />

yields another linear combination, and adding two linear combinations yields<br />

a further linear combination, so U is a subspace. If W is any other subspace


8.5. Subspaces 79<br />

containing X, then we can add and scale vectors from W , yielding more vectors<br />

from W , so we can make linear combinations of any vectors from W making<br />

more vectors from W . Therefore W contains the span of X, i.e. contains U.<br />

Finally, if V is any subspace, then we can add and scale vectors from V to<br />

make more vectors from V , so V is the span of all vectors in V .<br />

Problem 8.14. Prove that every subspace is the span of the vectors that it<br />

contains. (Warning: this fact isn’t very helpful, because any subspace will<br />

either contain only the origin, or contain infinitely many vectors. We would<br />

really rather only think about spans of finitely many vectors. So we will have<br />

to reconsider this problem later.)<br />

Problem 8.15. What are the subspaces of R?<br />

Problem 8.16. If U and V are subspaces of R n :<br />

a. Let W be the set of vectors which either belong to U or belong to V . Is<br />

W a subspace?<br />

b. Let Z be the set of vectors which belong to U and to V . Is Z a subspace?<br />

8.5 Review Problems<br />

Problem 8.17. Is the set X of all points<br />

( )<br />

x<br />

x = 1<br />

x 2<br />

of the plane with x 2 = x 2 1 a subspace?<br />

Problem 8.18. Which of the following are subspaces of R 4 ?<br />

a. The set of points x for which x 1 x 4 = x 2 x 3 .<br />

b. The set of points x for which 2x 1 = 3x 2 .<br />

c. The set of points x for which x 1 + x 2 + x 3 + x 4 = 0.<br />

d. The set of points x for which x 1 , x 2 , x 3 and x 4 are all ≥ 0.<br />

Problem 8.19. Is a circle in the plane a subspace? Prove your answer. Draw<br />

pictures to explain your answer.<br />

Problem 8.20. Which lines in the plane are subspaces? Draw pictures to<br />

explain your answer.


80 Span<br />

8.6 Summary<br />

We have solved the problem of this chapter: to describe a subspace. You write<br />

down a set of vectors spanning it. If I write down a different set of vectors,<br />

you can check to see if mine are linear combinations of yours, and if yours are<br />

linear combinations of mine, so you know when yours and mine span the same<br />

subspace.


9 Bases<br />

Our goal in this book is to greatly simplify equations in many variables by<br />

changing to new variables. In linear algebra, the concept of changing variables<br />

is replaced with the more concrete concept of a basis.<br />

9.1 Definition<br />

A basis is a list of “just enough” vectors to span a subspace. For example, we<br />

should be able to span a line by writing down just one vector lying in it, a plane<br />

with just two vectors, etc.<br />

Definition 9.1. A linear relation among some vectors x 1 , x 2 , . . . , x p in R n is an<br />

equation<br />

c 1 x 1 + c 2 x 2 + · · · + c p x p = 0,<br />

where c 1 , c 2 , . . . , c p are not all zero. A set of vectors is linearly independent if<br />

the vectors admit no linear relation. A set of vectors is a basis of R n if (1) the<br />

vectors are linearly independent and (2) adding any other vector into the set<br />

would render them no longer linearly independent.<br />

Example 9.2. The vectors

x_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, x_2 = \begin{pmatrix} 2 \\ 4 \end{pmatrix}

satisfy the linear relation 2x_1 - x_2 = 0.

9.2 Properties<br />

Lemma 9.3. The columns of a matrix are linearly independent just when each one is a pivot column.

(Margin notes: Only this plane contains 0 and these two vectors. Three-legged tables don't wobble, unless all of the feet of the table legs lie on the same straight line. The vector sticking up is linearly independent of the other two vectors.)

Proof. Obvious from lemma 8.6 on page 74.
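A hedged computational aside (mine, not from the text): by lemma 9.3, the columns are linearly independent exactly when the number of pivots equals the number of columns, and numerically the pivot count is the rank.

```python
import numpy as np

def columns_independent(A):
    """Columns are linearly independent just when every column is a pivot column."""
    return np.linalg.matrix_rank(A) == A.shape[1]

# the vectors of example 9.2 satisfy 2 x1 - x2 = 0, so they are dependent
print(columns_independent(np.array([[1., 2.],
                                    [2., 4.]])))        # False

# the columns (1,2,3) and (1,0,-1) admit no linear relation
print(columns_independent(np.array([[1., 1.],
                                    [2., 0.],
                                    [3., -1.]])))       # True
```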

Problem 9.1. Is

\begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix}

a basis of R^2?



Problem 9.2. The standard basis of R n is the basis e 1 , e 2 , . . . , e n (where e 1 is<br />

the first column of I n , etc.). Prove that the standard basis of R n is a basis.<br />

Problem 9.3. Prove that there is a linear relation between some vectors<br />

w 1 , w 2 , . . . , w q just when one of those vectors, say w k , is a linear combination<br />

of earlier vectors w 1 , w 2 , . . . , w k−1 .<br />

Theorem 9.4. Every linearly independent set of vectors in R^n consists of at most n vectors, and consists of exactly n vectors just when it is a basis.

Proof. Suppose that x_1, x_2, . . . , x_p are linearly independent. Let

A = \begin{pmatrix} x_1 & x_2 & \dots & x_p \end{pmatrix}.

There is either one pivot or no pivot in each row. So the number of rows is at<br />

least as large as the number of pivots. There are n rows. There is one pivot in<br />

each column, so p pivots. So p ≤ n.<br />

If p = n, we have one pivot in each row, so adding another vector (another<br />

column) can’t add another pivot. Therefore adding any other vector to the<br />

vectors x 1 , x 2 , . . . , x p would break linear independence.<br />

If p < n, then we have zero rows after forward elimination. Suppose that<br />

forward elimination yields U = V A. Then (U e p+1 ) has more pivot columns<br />

than U has, so ( A V −1 e p+1<br />

)<br />

has more pivot columns than A has. Thus adding<br />

a new vector x p+1 = V −1 e p+1 to the collection of vectors x 1 , x 2 , . . . , x p , we<br />

have a larger linearly independent collection.<br />

Problem 9.4. Prove that every linearly independent set of vectors in R n<br />

belongs to a basis.<br />

Lemma 9.5. A set of vectors u_1, u_2, . . . , u_n is a basis of R^n just when every vector b in R^n can be written as a linear combination

b = a_1 u_1 + a_2 u_2 + · · · + a_n u_n,

for a unique choice of numbers a_1, a_2, . . . , a_n.

Proof. Let

A = \begin{pmatrix} u_1 & u_2 & \dots & u_n \end{pmatrix}, a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix},

and apply theorem 5.9 on page 47 to the equation Aa = b.
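As an illustrative aside (not from the text), finding the coefficients a_1, . . . , a_n of lemma 9.5 is exactly solving Aa = b with the basis vectors as the columns of A; the basis below is an arbitrary one of my own choosing.

```python
import numpy as np

u1 = np.array([1., 1.])
u2 = np.array([1., -1.])
b  = np.array([3., 5.])

A = np.column_stack([u1, u2])     # basis vectors as columns
a = np.linalg.solve(A, b)         # coefficients with b = a[0]*u1 + a[1]*u2
print(a)                          # [ 4. -1.]
print(np.allclose(a[0]*u1 + a[1]*u2, b))   # True
```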



9.2 Review Problems<br />

Problem 9.5. Are the vectors<br />

⎛ ⎞ ⎛<br />

1<br />

⎜ ⎟ ⎜<br />

⎝−1⎠ , ⎝<br />

0<br />

linearly independent?<br />

0<br />

1<br />

−1<br />

⎞<br />

⎟<br />

⎠ ,<br />

⎛<br />

⎜<br />

⎝<br />

1<br />

0<br />

−1<br />

⎞<br />

⎟<br />

⎠<br />

Problem 9.6. Are the vectors<br />

a basis?<br />

( )<br />

2<br />

,<br />

1<br />

( )<br />

1<br />

2<br />

Problem 9.7. Can you find matrices A and B so that A is 3 × 5 and B is<br />

5 × 3, and AB = 1?<br />

Problem 9.8. Suppose that A is 3 × 5 and B is 5 × 3, and that AB is invertible.<br />

Must the columns of B be linearly independent? the rows of B? the columns<br />

of A? the rows of A?<br />

Problem 9.9. Give an example of a 3×3 matrix for which any two columns are<br />

linearly independent, but the three columns together are not linearly independent.<br />

Can such a matrix be invertible?<br />

9.3 The Change of Basis Matrix

Definition 9.6. The change of basis matrix F associated to a basis u_1, u_2, . . . , u_n of R^n is the matrix

F = \begin{pmatrix} u_1 & u_2 & \dots & u_n \end{pmatrix}.

Note that F e_1 = u_1, F e_2 = u_2, . . . , F e_n = u_n. So taking x to F x is a "change of basis", taking the standard basis to the new basis.

Problem 9.10. Prove that an n × n matrix A is the change of basis matrix of<br />

a basis just when the equation Ax = 0 has x = 0 as its only solution, which<br />

occurs just when A is invertible.<br />

Suppose that you and I look at the sky and watch a falling star. You measure its position against the fixed choice of basis e_1, e_2, e_3, while I measure against some funny choice of basis u_1, u_2, u_3. The actual position is some vector p in R^3. Let's say

p = x_1 e_1 + x_2 e_2 + x_3 e_3 as you measure it,
  = y_1 u_1 + y_2 u_2 + y_3 u_3 as I measure it.

Let

F = \begin{pmatrix} u_1 & u_2 & u_3 \end{pmatrix}

be the change of basis matrix, so that F e_1 = u_1, etc. So F takes your basis to mine. If we let

x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} and y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}

then in your basis:

p = x_1 e_1 + x_2 e_2 + x_3 e_3 = x,

but in mine:

p = y_1 u_1 + y_2 u_2 + y_3 u_3
  = y_1 F e_1 + y_2 F e_2 + y_3 F e_3
  = F (y_1 e_1 + y_2 e_2 + y_3 e_3)
  = F y.

So x = F y converts my measurements to yours.

Remark 9.7. Suppose that we change variables by x = F y and so y = F^{-1} x, with F some invertible matrix. Then any matrix A acting on the x variables by taking x to Ax is represented in the y variables as the matrix F^{-1} A F (reading right to left: F turns y's into x's, A acts on the x's, and F^{-1} turns x's back into y's).
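An illustrative numerical sketch of remark 9.7 (my own, with an arbitrary invertible F that is not one from the text): the same linear map is described by A in the x variables and by F^{-1}AF in the y variables.

```python
import numpy as np

F = np.array([[1., 2., 0.],
              [0., 1., 2.],
              [0., 0., 1.]])        # an arbitrary invertible change of basis matrix
A = np.array([[1., 0., 0.],
              [0., 2., 0.],
              [0., 0., 3.]])        # a map written in the x variables

B = np.linalg.inv(F) @ A @ F        # the same map written in the y variables

x = np.array([3., -1., 2.])         # a point in x coordinates
y = np.linalg.solve(F, x)           # the same point in y coordinates (y = F^{-1} x)
print(np.allclose(np.linalg.solve(F, A @ x), B @ y))   # True
```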

Problem 9.11. Take

F = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.

Compute F^{-1} A F.



Problem 9.12. A shower of falling stars fall to Earth. Each star falls from a<br />

position<br />

⎛ ⎞<br />

⎜<br />

x 1<br />

⎟<br />

x = ⎝x 2 ⎠<br />

x 3<br />

to a position on the ground<br />

⎛<br />

x 1<br />

⎞<br />

⎜ ⎟<br />

Ax = ⎝x 2 ⎠ .<br />

0<br />

What is the matrix A? Suppose that I measure the positions of the stars against<br />

the basis<br />

⎛ ⎞ ⎛ ⎞ ⎛ ⎞<br />

⎜<br />

1 ⎟ ⎜<br />

2 ⎟ ⎜<br />

0 ⎟<br />

u 1 = ⎝0⎠ , u 2 = ⎝1⎠ , u 3 = ⎝2⎠ .<br />

0<br />

0<br />

1<br />

Find the change of basis matrix F , and find F −1 AF , the matrix that describes<br />

how each star falls from the sky as measured against my basis.<br />

9.3 Review Problems<br />

Problem 9.13. Is<br />

⎛ ⎞<br />

⎜<br />

1 ⎟<br />

⎝2⎠ ,<br />

0<br />

⎛ ⎞<br />

⎜<br />

0 ⎟<br />

⎝2⎠ ,<br />

0<br />

⎛ ⎞<br />

⎜<br />

1 ⎟<br />

⎝1⎠<br />

1<br />

a basis of R 3 ?<br />

Problem 9.14. If A is a matrix, show how each vector which A kills determines<br />

a linear relation between the columns of A, and vice versa.<br />

Problem 9.15. Are ( )<br />

1<br />

,<br />

0<br />

( )<br />

0<br />

0<br />

linearly independent?<br />

Problem 9.16. Write down a basis of R 2 other than the standard basis, and<br />

prove that your basis really is a basis.


86 Bases<br />

Problem 9.17. Is ( )<br />

1<br />

,<br />

1<br />

a basis of R 2 ?<br />

( )<br />

2<br />

1<br />

Problem 9.18. Is ( )<br />

0 1<br />

1 1<br />

a change of basis matrix? If so, for what basis?<br />

Problem 9.19. If x 1 , x 2 , . . . , x n and y 1 , y 2 , . . . , y n are two bases of R n , prove<br />

that there is a unique invertible matrix A so that A x 1 = y 1 , A x 2 = y 2 , etc.<br />

9.4 Bases of Subspaces<br />

We can write down a subspace, by writing down a spanning set of vectors. But<br />

you might write down more vectors than you need to. We want to squeeze<br />

the description down to the bare simplest minimum, throwing out redundant<br />

information.<br />

Definition 9.8. If V is a subspace of R n , a basis of V is a set of linearly<br />

independent vectors from V , so that adding any other vector from V into the<br />

set would render them no longer linearly independent.<br />

Example 9.9. The vectors<br />

⎛ ⎞ ⎛ ⎞<br />

⎜<br />

1 ⎟ ⎜<br />

0 ⎟<br />

⎝0⎠ , ⎝1⎠<br />

0 0<br />

are a basis for the subspace V in R 3 of vectors of the form<br />

⎛ ⎞<br />

Obviously:<br />

x 1<br />

⎜ ⎟<br />

⎝x 2 ⎠ .<br />

0<br />

Lemma 9.10. If some vectors span a subspace, then putting them into the<br />

columns of a matrix, the pivot columns form a basis of the subspace.<br />

Example 9.11. Lets find a basis for the span of the vectors<br />

⎛ ⎞ ⎛ ⎞ ⎛ ⎞<br />

⎜<br />

1 ⎟ ⎜<br />

1 ⎟ ⎜<br />

0 ⎟<br />

⎝1⎠ , ⎝1⎠ , ⎝0⎠ .<br />

1 0 1


9.5. Dimension 87<br />

Put them into a matrix<br />

Forward eliminate:<br />

⎛<br />

⎞<br />

1 1 0<br />

⎜<br />

⎟<br />

⎝1 1 0⎠ .<br />

1 0 1<br />

⎛<br />

⎞<br />

1 1 0<br />

⎜<br />

⎟<br />

⎝ 0 −1 1⎠ ,<br />

0 0 0<br />

so the first and second columns are pivot columns. Therefore<br />

are a basis for the span.<br />

Are there any bases?<br />

⎛ ⎞ ⎛ ⎞<br />

1 1<br />

⎜ ⎟ ⎜ ⎟<br />

⎝1⎠ , ⎝1⎠<br />

1 0<br />

Proposition 9.12. Every subspace of R n has a basis. Moreover, any basis<br />

v 1 , v 2 , . . . , v p of a subspace V of R n lives in a basis v 1 , v 2 , . . . , v p , w 1 , w 2 , . . . , w q<br />

of R n .<br />

Proof. If V only contains the 0 vector, then we can take no vectors as a basis<br />

for V , and let w 1 , w 2 , . . . , w n be any basis for R n . On the other hand, if V<br />

contains a nonzero vector, then pick as many linearly independent vectors from<br />

V as possible. By theorem 9.4 on page 82, we could only pick at most n vectors.<br />

They must span V , because otherwise we could pick another one. If V = R n ,<br />

then we are finished. Otherwise, pick as many vectors from R n as possible<br />

which are linearly independent of v 1 , v 2 , . . . , v p . Clearly we stop just when we<br />

hit a total of n vectors.<br />

9.5 Dimension<br />

Do all bases look pretty much the same?<br />

Theorem 9.13. Any two bases of a subspace have the same number of vectors.<br />

Proof. Imagine two bases, say x 1 , x 2 , . . . , x p and y 1 , y 2 , . . . , y q , for the same<br />

subspace. Forward eliminate<br />

(<br />

x 1 x 2 . . . x p y 1 y 2 . . . y q<br />

)


88 Bases<br />

yielding<br />

⎛<br />

⎜<br />

⎝<br />

⎞<br />

.<br />

⎟<br />

⎠<br />

Each x vector generates a pivot, p pivots in all, straight down the diagonal.<br />

Forward eliminate the right hand portion of the matrix, yielding<br />

⎛<br />

⎜<br />

⎝<br />

⎞<br />

,<br />

⎟<br />

⎠<br />

giving at most p pivots because of the zero rows. Each y vector generates a<br />

pivot. So there aren’t more than p of these y vectors. Thus no more y vectors<br />

than x vectors. Reversing the roles of x and y vectors, we find that there can’t<br />

be more x vectors than y vectors.<br />

Problem 9.20. Prove that every subspace of R n has a basis with at most n<br />

vectors.<br />

Definition 9.14. The dimension of a subspace is the number of vectors in any<br />

basis. Write the dimension of a subspace U as dim U.<br />

9.5 Review Problems<br />

Problem 9.21. Consider the vectors in R n of the form e i − e j (for all possible<br />

values of i and j from 1 to n). Find a basis for the subspace they span.<br />

9.6 Summary<br />

• A subspace is a flat thing passing through 0.<br />

• A basis for a subspace is a collection of just enough vectors to span the<br />

subspace.<br />

• A change of basis matrix is a basis organized into the columns of a matrix.


9.7. Uniqueness of Reduced Echelon Form 89<br />

9.7 Uniqueness of Reduced Echelon Form<br />

When we carry out elimination, we choose rows to swap.<br />

Problem 9.22. Find the simplest matrix A you can with two different ways of<br />

carrying out forward elimination, with different results.<br />

Recall that Gauss–Jordan elimination means forward elimination followed<br />

by back substitution. The matrix resulting from Gauss–Jordan elimination is<br />

said to be in reduced echelon form.<br />

Theorem 9.15. The result of Gauss–Jordan elimination does not depend on<br />

the choices made of which rows to swap.<br />

Proof. Suppose that U and W are two different eliminations of the same matrix<br />

A, obtained using different choices of rows to swap. The first pivot column is<br />

just the first nonzero column of A. The second pivot column is the earliest<br />

column which is linearly independent of the first pivot column, etc. This is<br />

true for A, and doesn’t change under forward elimination or back substitution.<br />

Therefore A and U and W have the same pivot columns. After elimination, the<br />

first pivot column becomes e 1 , the second becomes e 2 , etc. So all of the pivot<br />

columns of U and W must be identical.<br />

Every pivotless column is a linear combination of earlier pivot columns, and<br />

the coefficients in this linear combination are not affected by Gauss–Jordan<br />

elimination. Therefore the pivotless columns of U and W are the same linear<br />

combinations of the pivot columns. The pivot columns are the same, so all<br />

columns are the same.<br />

Problem 9.23. The rank is the number of pivots in the forward elimination.<br />

Prove that the rank of a matrix does not depend on which rows you choose<br />

when forward eliminating.


10 Kernel and Image<br />

Each matrix A has two important subspaces associated to it: its kernel (the<br />

vectors it kills), and its image (the vectors b for which you can solve Ax = b).<br />

10.1 Kernel<br />

Definition 10.1. If A is any matrix, say p × q, then the vectors x in R q for which<br />

Ax = 0 (vectors “killed” by A) form a subspace of R q called the kernel of A,<br />

and written ker A.<br />

The kernel is a subspace, because<br />

a. 0 belongs to the kernel of any matrix A, since A0 = 0 (everything kills 0).<br />

b. If Ax = 0 and Ay = 0, then A(x + y) = Ax + Ay = 0 (when you kill two<br />

vectors, you kill their sum).<br />

c. If Ax = 0, then A (ax) = aAx = 0 (when you kill a vector, you kill its<br />

multiples).<br />

Problem 10.1. If a matrix is wider than it is tall (a “short” matrix), then its<br />

kernel contains nonzero vectors.<br />

Problem 10.2. Prove that the kernel of AB contains the kernel of B. Does it<br />

have to contain the kernel of A?<br />

We will often need to find kernels of matrices. To rapidly calculate the kernel<br />

of a matrix, for example<br />

\[
A =
\begin{pmatrix}
2 & 0 & 1 & 1 \\
1 & 1 & 2 & 1 \\
3 & -1 & 0 & 1
\end{pmatrix}
\]

a. Carry out forward elimination and back substitution.<br />

\[
\begin{pmatrix}
1 & 0 & \tfrac12 & \tfrac12 \\
0 & 1 & \tfrac32 & \tfrac12 \\
0 & 0 & 0 & 0
\end{pmatrix}
\]

b. Cut out all zero rows.<br />

\[
\begin{pmatrix}
1 & 0 & \tfrac12 & \tfrac12 \\
0 & 1 & \tfrac32 & \tfrac12
\end{pmatrix}
\]



c. Change the signs of all entries after each pivot.<br />

\[
\begin{pmatrix}
1 & 0 & -\tfrac12 & -\tfrac12 \\
0 & 1 & -\tfrac32 & -\tfrac12
\end{pmatrix}
\]

(This corresponds to changing equations like \(x_1 + \tfrac12 x_3 + \tfrac12 x_4 = 0\) to
\(x_1 = -\tfrac12 x_3 - \tfrac12 x_4\). Think of it as moving everything after the pivot over
to the right hand side, although we won’t actually move anything.)

d. Stuff in whatever rows from the identity matrix you need into your matrix,<br />

so that it ends up with nonzero entries all down the diagonal.<br />

\[
\begin{pmatrix}
1 & 0 & -\tfrac12 & -\tfrac12 \\
0 & 1 & -\tfrac32 & -\tfrac12 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}.
\]

We won’t mark the new rows with pivots. Each new row corresponds to<br />

setting one of the free variables to 1 and the others to 0.<br />

e. Cut out all of the pivot columns. The remaining columns are a basis for<br />

the kernel.<br />

\[
\begin{pmatrix} -\tfrac12 \\ -\tfrac32 \\ 1 \\ 0 \end{pmatrix}, \quad
\begin{pmatrix} -\tfrac12 \\ -\tfrac12 \\ 0 \\ 1 \end{pmatrix}.
\]
Problem 10.3. Apply this algorithm to the matrix
\[
A =
\begin{pmatrix}
0 & 1 & 2 \\
2 & 2 & 2 \\
-2 & 0 & 2
\end{pmatrix}
\]
and check that Ax = 0 for each vector x in your resulting basis for the kernel.
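If you want to check a kernel computation by machine, here is a minimal sketch assuming the third-party sympy library (an assumption; the text itself never uses software). Its rref and nullspace follow the same recipe as the algorithm above, so the basis it prints should match the one we just found.

```python
from sympy import Matrix

A = Matrix([[2, 0, 1, 1],
            [1, 1, 2, 1],
            [3, -1, 0, 1]])

# Step (a): forward elimination and back substitution (reduced echelon form).
R, pivot_columns = A.rref()
print(R)               # rows (1, 0, 1/2, 1/2), (0, 1, 3/2, 1/2) and a zero row
print(pivot_columns)   # (0, 1): the first two columns carry pivots

# One kernel basis vector for each pivotless column, as in steps (c)-(e).
for v in A.nullspace():
    print(v.T)
    assert A * v == Matrix([0, 0, 0])   # every basis vector is killed by A
```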

Lemma 10.2. The algorithm works, giving a basis for the kernel of any matrix.<br />

Proof. Each vector in the kernel is obtained by setting arbitrary values for<br />

the free variables, and letting the pivots solve for the other variables. Let v 1<br />

be the vector in the kernel which has value 1 for the 1 st free variable, and 0<br />

for all other free variables. Similarly, make a vector v 2 , v 3 , . . . , v s for each free<br />

variable–suppose that there are s free variables. The kernel is a subspace, so<br />

each linear combination<br />

c 1 v 1 + c 2 v 2 + · · · + c s v s<br />

lies in the kernel. This linear combination has value c 1 for the first free variable,<br />

c 2 for the second, etc. (just looking at the rows of the free variables). Each



vector in the kernel has some values c 1 , c 2 , . . . , c s for the free variables. So each<br />

vector in the kernel is a unique linear combination of v 1 , v 2 , . . . , v s . Suppose

we find a linear relation among v 1 , v 2 , . . . , v s , say c 1 v 1 + c 2 v 2 + · · · + c s v s = 0.<br />

Look at the row in which v 1 has a 1 and all of the other vectors have 0’s: the<br />

linear relation gives c 1 = 0 in that row. Similarly all of c 1 , c 2 , . . . , c s must<br />

vanish, so there is no linear relation among these vectors. Therefore the vectors<br />

v 1 , v 2 , . . . , v s form a basis for the kernel.<br />

Finally, we need to see why these vectors v 1 , v 2 , . . . , v s are precisely the<br />

vectors which come out of our process above. First, look at our example. The<br />

reduced echelon form turns back into equations as
\[
x_1 + \tfrac12 x_3 + \tfrac12 x_4 = 0, \qquad
x_2 + \tfrac32 x_3 + \tfrac12 x_4 = 0.
\]
Solving for pivots means subtracting off:
\[
x_1 = -\tfrac12 x_3 - \tfrac12 x_4, \qquad
x_2 = -\tfrac32 x_3 - \tfrac12 x_4.
\]
All free variables line up on the right hand side, and we have changed the signs
of their coefficients. Setting x 3 = 1 and x 4 = 0, go down the right hand side,
killing the x 4 entries, and putting x 3 = 1 in each x 3 entry, i.e. writing down
just the entries from the x 3 column:
\[
v_1 = \begin{pmatrix} -\tfrac12 \\ -\tfrac32 \\ 1 \\ 0 \end{pmatrix}.
\]
The general algorithm works in the same way: if we put all free variables on to
the right hand side, and then set one free variable to 1 (“turn it on”) and the
others to 0’s (“turn them off”), we can picture this as “turning on” the column
associated to that free variable. Each pivot solves for a pivot variable—the
value of that pivot variable is the entry in the corresponding row of the “turned
on” column.

Problem 10.4. Give an example of a square matrix whose kernel is not the<br />

kernel of its transpose.<br />

Problem 10.5. Draw a picture of the kernel for each of<br />

\[
A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 2 & 0 \end{pmatrix}, \quad
C = \begin{pmatrix} 1 \end{pmatrix}, \quad
D = \begin{pmatrix} 0 & 0 \end{pmatrix}.
\]

Corollary 10.3. The dimension of the kernel of a matrix is the number of<br />

pivotless columns after forward elimination.<br />

Remark 10.4. Another way to say it: the dimension of the kernel of a matrix A<br />

is the number of free variables in the equation Ax = 0.



10.1 Review Problems<br />

Problem 10.6. Find a basis for the kernel of<br />

\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 2 & 2 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]

Problem 10.7. Find a basis for the kernel of<br />

\[
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]

Problem 10.8. Find a basis for the kernel of<br />

\[
\begin{pmatrix}
1 & -1 & -1 & -1 & 1 \\
2 & 2 & 0 & 0 & -1 \\
2 & 2 & 0 & 1 & 0 \\
-1 & 0 & 2 & -1 & 2
\end{pmatrix}
\]

Problem 10.9. Find a basis for the kernel of<br />

\[
\begin{pmatrix}
1 & 1 & 2 \\
-1 & -1 & 0 \\
1 & 1 & 1
\end{pmatrix}
\]

Problem 10.10. Find a basis for the kernel of<br />

\[
\begin{pmatrix}
1 & 1 & 2 \\
-1 & -1 & 0 \\
1 & 1 & 1
\end{pmatrix}
\]

Problem 10.11. Find a basis for the kernel of<br />

\[
\begin{pmatrix}
1 & 0 & 1 \\
2 & 1 & -1 \\
-1 & 1 & 0
\end{pmatrix}
\]



10.2 Image<br />

Definition 10.5. The image of a matrix is the set of vectors y of the form y = Ax<br />

for some vector x, written im A.<br />

Problem 10.12. Prove that the image of a matrix is the span of its columns.<br />

Example 10.6. The image of<br />

\[
A =
\begin{pmatrix}
1 & 1 & 2 & 3 \\
0 & -1 & -2 & -3 \\
-1 & 0 & 0 & 0
\end{pmatrix}
\]
is the span of
\[
\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix},
\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix},
\begin{pmatrix} 2 \\ -2 \\ 0 \end{pmatrix},
\begin{pmatrix} 3 \\ -3 \\ 0 \end{pmatrix}.
\]
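A quick way to see such a span by machine, as a sketch assuming sympy (not part of the text): columnspace picks out the pivot columns, which already span the image.

```python
from sympy import Matrix

A = Matrix([[1, 1, 2, 3],
            [0, -1, -2, -3],
            [-1, 0, 0, 0]])

# A basis for the image, chosen from among the columns of A.
for b in A.columnspace():
    print(b.T)

# Any Ax is a combination of the columns, so it lies in the span printed above.
```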

Problem 10.13. Prove that the equation Ax = b has a solution x just when b<br />

lies in the image of A.<br />

10.2 Review Problems<br />

Problem 10.14. Describe the image of the matrix<br />

\[
A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \\ 0 & 0 \end{pmatrix}.
\]

Problem 10.15. (a) Suppose that A is a 2 × 2 matrix which, when taking a<br />

vector x to the vector Ax, takes multiples of e 1 to multiples of e 1 . Show<br />

that A is upper triangular.<br />

(b) Similarly, if A is a 3×3 matrix which takes multiples of e 1 to multiples of e 1 ,<br />

and takes linear combinations of e 1 and e 2 to other such linear combinations,<br />

then A is upper triangular.<br />

(c) Generalize this to R n , and use it to show that the inverse of an invertible<br />

upper triangular matrix is upper triangular.<br />

Problem 10.16. Prove that if a matrix is taller than it is wide (a “tall” matrix),<br />

then some vector does not belong to its image.



Problem 10.17. Find the dimension of the kernel of<br />

\[
A = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad
C = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},
\]
\[
D = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad
E = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}, \quad
F = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]

Problem 10.18. Prove that the kernel of A is the kernel of<br />

\[
\begin{pmatrix} A \\ A \end{pmatrix}.
\]

Problem 10.19. If you know the kernel of a p × q matrix A, how do you find<br />

the dimension of the kernel of<br />

\[
B = \begin{pmatrix} A & A & A \\ A & A & A \\ A & A & A \end{pmatrix}?
\]

10.3 Kernel and Image<br />

Problem 10.20. If A and B are matrices, and there is an invertible matrix C<br />

for which B = CA, prove that A and B have the same kernel.<br />

Problem 10.21. If A and B are two matrices, and B = CA for an invertible<br />

matrix C, prove that A and B have images of the same dimension.<br />

Theorem 10.7. For any matrix A,<br />

dim ker A + dim im A = number of columns.<br />

Proof. The image of A is the span of the columns. By lemma 8.6 on page 74,<br />

each pivotless column is a linear combination of earlier pivot columns. So<br />

the pivot columns span the image. Pivot columns are linearly independent: a<br />

basis. Each pivotless column contributes (in our algorithm) to our basis for the<br />

kernel.<br />

Example 10.8. The matrix<br />

\[
A =
\begin{pmatrix}
-1 & 1 & -1 & 1 \\
-1 & 1 & 1 & 1 \\
-2 & 2 & -1 & 1
\end{pmatrix}
\]



has echelon form<br />

\[
U =
\begin{pmatrix}
-1 & 1 & -1 & 1 \\
0 & 0 & 2 & 0 \\
0 & 0 & 0 & -1
\end{pmatrix}
\]

So columns 1, 3 and 4 of A (not of U) are a basis for the image of A:<br />

\[
\begin{pmatrix} -1 \\ -1 \\ -2 \end{pmatrix},
\begin{pmatrix} -1 \\ 1 \\ -1 \end{pmatrix},
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
\]

The image has dimension 3 because there are 3 pivot columns. The kernel has<br />

dimension 1, because there is one pivotless column. The pivotless column is not<br />

a basis for the kernel. It just shows you the dimension of the kernel. (In this<br />

example, the pivotless column isn’t even in the kernel.)<br />
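A sketch of the bookkeeping in this example, assuming sympy: the rank counts pivot columns, the kernel basis counts pivotless columns, and the two add up to the number of columns, as in theorem 10.7.

```python
from sympy import Matrix

A = Matrix([[-1, 1, -1, 1],
            [-1, 1,  1, 1],
            [-2, 2, -1, 1]])

dim_image = A.rank()               # number of pivot columns: 3
dim_kernel = len(A.nullspace())    # number of pivotless columns: 1
assert dim_image + dim_kernel == A.cols
print(dim_image, dim_kernel)
```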

Problem 10.22. Find the rank of<br />

\[
A =
\begin{pmatrix}
0 & 2 & 2 & 2 \\
1 & 2 & 0 & 0 \\
0 & 1 & 2 & 2 \\
0 & 2 & 2 & 2
\end{pmatrix}
\]

and explain what this tells you about image and kernel.<br />

10.3 Review Problems<br />

Problem 10.23. Find two matrices A and B, which have different images, but<br />

for which B = CA for an invertible matrix C. Prove that the images really are<br />

different, but of the same dimension.<br />

Problem 10.24. Prove that rank A = rank A t for any matrix A.<br />

10.4 Summary<br />

a. The kernel of a matrix is the set of vectors it kills. It is large just when<br />

linear equations Ax = b with one solution have lots of solutions (measures<br />

plurality of solutions when they exist).<br />

b. Our algorithm makes a basis for the kernel out of the pivotless columns.<br />

c. The image of a matrix is the stuff that comes out of it—the vectors b for<br />

which you can solve Ax = b (measures existence of solutions).<br />

d. The pivot columns are a basis of the image.



10.4 Review Problems<br />

Problem 10.25. Suppose that Ax = b. Prove that A(x + y) = b too, just<br />

when y lies in the kernel. So the kernel measures the plurality of solutions of<br />

equations, while the image measures existence of solutions.<br />

Problem 10.26. What is the maximum possible rank of a 4 × 3 matrix? A<br />

3 × 5 matrix?<br />

Problem 10.27. If a 3 × 5 matrix A has rank 3 must the equation Ax = b have<br />

a solution x? Can it have more than one solution? If it has one solution, must<br />

it have infinitely many?<br />

Problem 10.28. As for the previous question, but with a 5 × 3 matrix A.<br />

Problem 10.29. If A = BC and B is 5 × 4 and C is 4 × 5, prove that det A = 0.<br />

Problem 10.30. Write down a 2 × 2 matrix A so that if I choose any vector<br />

x with positive entries, then the vector Ax also has positive entries, and lies<br />

between (but not on) the horizontal axis and a diagonal line.<br />

Problem 10.31. The Fredholm Alternative: for any matrix A and vector b,<br />

prove that just one of the following two problems has a solution: (1) Ax = b or<br />

(2) A t y = 0 with b t y ≠ 0.<br />

Problem 10.32. Prove that the image of AB is contained in the image of A.<br />

Problem 10.33. Prove that the rank of AB is never more than the rank of A<br />

or of B.<br />

Problem 10.34. Prove that the rank of a sum of matrices is never more than<br />

the sum of the ranks.<br />

Problem 10.35. Which of the following can change when you carry out forward<br />

elimination?<br />

a. image,<br />

b. kernel,<br />

c. dimension of image,<br />

d. dimension of kernel?<br />

Problem 10.36. Prove that the rank of AB is no larger than the ranks of A<br />

and B.



Eigenvectors


11 Eigenvalues and Eigenvectors<br />

In this chapter, we study certain special vectors, called eigenvectors, associated<br />

to a square matrix. When a vector x is struck by a matrix A, it becomes a new<br />

vector Ax. Usually the new vector is unrelated to the old one. Rarely, the new<br />

vector might just be the old one stretched or squished; we will then call x an<br />

eigenvector of A. If we have a basis worth of eigenvectors, then the matrix A<br />

just squishes or stretches each one, and we can completely recover the matrix if<br />

we know the basis of eigenvectors and their eigenvalues.<br />

An eigenvector just<br />

gets stretched<br />

11.1 Eigenvalues and Characteristic Polynomial<br />

Definition 11.1. An eigenvector x of a square matrix A is a nonzero vector for<br />

which Ax = λx for some number λ, called the eigenvalue of x.<br />

The eigenvalue is the factor that the eigenvector gets stretched by.<br />

Problem 11.1. For each of the pictures in problem 3.15 on page 27, calculate<br />

the eigenvalues of the associated matrix and draw on each face the directions<br />

that the eigenvectors point in.<br />

Lemma 11.2. A number λ is an eigenvalue of a square matrix A (which is to<br />

say that there is an eigenvector x with that eigenvalue) just when<br />

det (A − λ I) = 0.<br />

Proof. Rewrite the equation Ax = λx as (A − λ I) x = 0. Recall that the<br />

equation Bx = 0 (with B a square matrix) has a nonzero solution x just when<br />

B is not invertible, so just when det B = 0. Let's pick B to be A − λ I; there is

an eigenvector x with eigenvalue λ just when det (A − λ I) = 0.<br />

Definition 11.3. The expression det (A − λ I) is called the characteristic polynomial<br />

of the matrix A.<br />

We can restate the lemma:<br />

Lemma 11.4. The eigenvalues of a square matrix A are precisely the roots of<br />

its characteristic polynomial.<br />




Example 11.5. The matrix<br />

\[
A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}
\]
has characteristic polynomial
\[
\det\left( \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right)
= \det \begin{pmatrix} 2-\lambda & 0 \\ 0 & 3-\lambda \end{pmatrix}
= (2-\lambda)(3-\lambda).
\]
So the eigenvalues are λ = 2 and λ = 3.
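To check a characteristic polynomial by machine, here is a sketch assuming sympy; the symbol name lamda simply avoids Python's keyword lambda.

```python
from sympy import Matrix, symbols, eye, det, solve

lam = symbols('lamda')
A = Matrix([[2, 0],
            [0, 3]])

p = det(A - lam * eye(2))   # the characteristic polynomial (2 - lamda)(3 - lamda)
print(p.factor())
print(solve(p, lam))        # [2, 3]: the eigenvalues are its roots
```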

Problem 11.2. Prove that the eigenvalues of an upper (or lower) triangular<br />

matrix are the diagonal entries. For example:<br />

\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{pmatrix}
\]

has eigenvalues λ = 1, λ = 4, λ = 6.<br />

Problem 11.3. Find a 2 × 2 matrix A whose eigenvalues are not the same as<br />

its diagonal entries.<br />

Problem 11.4. Find 2 × 2 matrices A and B for which A + B has an eigenvalue<br />

which is not a sum of some eigenvalue of A with some eigenvalue of B.<br />

Definition 11.6. The set of eigenvalues of a matrix is called its spectrum.<br />

Example 11.7.<br />

\[
A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
\]

has det (A − λ I) = λ 2 + 1, which has no roots, so there are no eigenvalues<br />

(among real numbers λ).<br />

Problem 11.5. Why is the characteristic polynomial a polynomial in λ?<br />

Problem 11.6. What is the highest order term of the variable λ in the characteristic<br />

polynomial of a matrix A?



Appendix: Why the Fast Formula is So Slow<br />

We use the slow formula to calculate determinants when we compute
characteristic polynomials. Why not the fast formula? Let's try it on an
example. Take

\[
A = \begin{pmatrix} 1 & 1 & 2 \\ 2 & 3 & 0 \\ 1 & 1 & 4 \end{pmatrix}.
\]

Let's hunt down the eigenvalues of A, by computing the characteristic polynomial
as before. But this time, let's try the fast formula for the determinant, to find

det (A − λ I). We apply forward elimination to<br />

\[
A - \lambda I = \begin{pmatrix} 1-\lambda & 1 & 2 \\ 2 & 3-\lambda & 0 \\ 1 & 1 & 4-\lambda \end{pmatrix}.
\]

to find the determinant as a product of pivots. Start from
\[
\begin{pmatrix} 1-\lambda & 1 & 2 \\ 2 & 3-\lambda & 0 \\ 1 & 1 & 4-\lambda \end{pmatrix}.
\]
Add \(-\tfrac{2}{1-\lambda}\,(\text{row 1})\) to row 2, and \(-\tfrac{1}{1-\lambda}\,(\text{row 1})\) to row 3:
\[
\begin{pmatrix}
1-\lambda & 1 & 2 \\[2pt]
0 & -\dfrac{1-4\lambda+\lambda^2}{-1+\lambda} & \dfrac{4}{-1+\lambda} \\[8pt]
0 & \dfrac{\lambda}{-1+\lambda} & -\dfrac{2-5\lambda+\lambda^2}{-1+\lambda}
\end{pmatrix}
\]
Move the pivot ↘. Add \(\dfrac{\lambda}{1-4\lambda+\lambda^2}\,(\text{row 2})\) to row 3:
\[
\begin{pmatrix}
1-\lambda & 1 & 2 \\[2pt]
0 & -\dfrac{1-4\lambda+\lambda^2}{-1+\lambda} & \dfrac{4}{-1+\lambda} \\[8pt]
0 & 0 & -\dfrac{\lambda^3-8\lambda^2+15\lambda-2}{1-4\lambda+\lambda^2}
\end{pmatrix}
\]
Move the pivot ↘.

The point: at each step, the expressions are rational functions of λ, accumulating<br />

to become more complicated at each step. This is not any faster than the slow<br />

process, which gives:<br />

\[
\det(A - \lambda I)
= (1-\lambda)\det \begin{pmatrix} 3-\lambda & 0 \\ 1 & 4-\lambda \end{pmatrix}
- 2\det \begin{pmatrix} 1 & 2 \\ 1 & 4-\lambda \end{pmatrix}
+ 1\det \begin{pmatrix} 1 & 2 \\ 3-\lambda & 0 \end{pmatrix}
= 2 - 15\lambda + 8\lambda^2 - \lambda^3.
\]

Let's always use the slow process. There is actually a faster method to find

eigenvalues (see section 24.2 on page 230) of large matrices, but it is slower on<br />

small matrices, and we won’t ever want to work with large matrices.<br />

11.1 Review Problems<br />

Problem 11.7. Find the eigenvalues of the matrices<br />

\[
A = \begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} 2 & 3 \\ 1 & 0 \end{pmatrix}, \quad
C = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}, \quad
D = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 0 \end{pmatrix}
\]

Problem 11.8. Prove that a square matrix A and its transpose A t have the<br />

same eigenvalues.<br />

Problem 11.9. Prove that<br />

det ( F −1 AF − λ I ) = det (A − λ I)<br />

for any square matrix A, and any invertible matrix F . So the characteristic<br />

polynomial is unchanged by change of basis.



Problem 11.10. If all of the entries of a square matrix are positive, are its<br />

eigenvalues positive?<br />

Problem 11.11. Are the eigenvalues of AB equal to those of BA?<br />

Problem 11.12. Give an example of 2 × 2 matrices A and B for which the<br />

eigenvalues of AB are not products of eigenvalues of A with those of B.<br />

Problem 11.13. What are the eigenvalues of<br />

\[
A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}?
\]

Problem 11.14. The multiplicity of an eigenvalue λ j is the number of factors of<br />

λ−λ j appearing in the characteristic polynomial. Suppose that the characteristic<br />

polynomial of some n × n matrix A splits into a product of linear factors. Prove<br />

that the determinant of A is the product of its eigenvalues (each taken with<br />

multiplicity), by setting λ = 0 in the characteristic polynomial.<br />

Problem 11.15. From the previous exercise, if a 2 × 2 matrix A has eigenvalues<br />

0 and 1, what is its rank?<br />

Problem 11.16. Write out the characteristic polynomial of an n × n matrix A<br />

as<br />

det (A − λ I) = s 0 (A) − s 1 (A)λ + s 2 (A)λ 2 + · · · + (−1) n s n (A)λ n .<br />

a. Find s n (A).<br />

b. Prove that s 0 (A) = det A.<br />

c. Prove that s j (A) is a sum of products of precisely n − j entries of A. In<br />

particular, s n−1 (A) is a polynomial of degree 1 as a function of each entry<br />

of A.<br />

d. Use this to prove that s n−1 (A) = A 11 + A 22 + · · · + A nn . (This quantity<br />

A 11 + A 22 + · · · + A nn is called the trace of A).
e. Prove that s j (F −1 AF ) = s j (A) for any invertible matrix F , so the

coefficients of the characteristic polynomial are unchanged by change of<br />

basis.<br />

f. Take a basis u 1 , u 2 , . . . , u n for which the vectors u r+1 , u r+2 , . . . , u n form<br />

a basis of the kernel, let F be the associated change of basis matrix, and<br />

look at F −1 AF . Prove that<br />

\[
F^{-1} A F = \begin{pmatrix} P & 0 \\ Q & 0 \end{pmatrix}
\]
for some invertible r × r matrix P , and some matrix Q.



g. If A has rank r, prove that s k (A) = 0 for k ≤ n − r.<br />

h. Write down two 2 × 2 matrices of different ranks with the same characteristic<br />

polynomial.<br />

11.2 Eigenvectors<br />

To find the eigenvectors of a matrix A: once you have the eigenvalues, pick each<br />

eigenvalue λ, and find the kernel of A − λ I.<br />

Example 11.8. The matrix<br />

\[
A = \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix}
\]
has eigenvalues λ = 2 and λ = 3.
Let's start with λ = 2:
\[
A - \lambda I = \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix} - 2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}.
\]
Our algorithm (from section 10.1) for finding the kernel yields a basis
\[
\begin{pmatrix} -1 \\ 1 \end{pmatrix}
\]
for the λ = 2-eigenvectors.

Problem 11.17. Do the same for λ = 3.<br />

Problem 11.18. Find the eigenvectors and eigenvalues of<br />

\[
A = \begin{pmatrix} 1 & 3 & 0 \\ -2 & 6 & 0 \\ 0 & 0 & 4 \end{pmatrix}
\]

Example 11.9. Let's put it all together. How do we calculate the eigenvectors of
\[
A = \begin{pmatrix} 3 & 2 \\ 0 & 1 \end{pmatrix}?
\]



a. Find the eigenvalues:
\[
0 = \det(A - \lambda I)
= \det \begin{pmatrix} 3-\lambda & 2 \\ 0 & 1-\lambda \end{pmatrix}
= (3-\lambda)(1-\lambda).
\]
So the eigenvalues are λ = 3 and λ = 1.
b. Find the eigenvectors: for each eigenvalue λ, compute a basis for the
kernel of A − λ I.
\[
A - 3I = \begin{pmatrix} 3-3 & 2 \\ 0 & 1-3 \end{pmatrix}
= \begin{pmatrix} 0 & 2 \\ 0 & -2 \end{pmatrix}
\]
The kernel of A − 3 I has basis
\[
\begin{pmatrix} 1 \\ 0 \end{pmatrix}.
\]
The nonzero linear combinations of this basis are the eigenvectors with
eigenvalue λ = 3.
For λ = 1,
\[
A - I = \begin{pmatrix} 3-1 & 2 \\ 0 & 1-1 \end{pmatrix}
= \begin{pmatrix} 2 & 2 \\ 0 & 0 \end{pmatrix}
\]
The kernel of A − I has basis
\[
\begin{pmatrix} -1 \\ 1 \end{pmatrix}.
\]
The nonzero linear combinations of this basis are the λ = 1-eigenvectors.
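The same two steps by machine, as a sketch assuming sympy: eigenvects returns each eigenvalue together with a basis of its eigenspace, which is exactly the kernel of A − λ I.

```python
from sympy import Matrix

A = Matrix([[3, 2],
            [0, 1]])

for eigenvalue, multiplicity, basis in A.eigenvects():
    for v in basis:
        print(eigenvalue, v.T)          # 3 with (1, 0), and 1 with (-1, 1)
        assert A * v == eigenvalue * v  # each basis vector really is an eigenvector
```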

11.2 Review Problems<br />

Problem 11.19. Without any calculation, what are the eigenvalues and eigenvectors<br />

of<br />

\[
A = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 7 \end{pmatrix}?
\]



Problem 11.20. Find the eigenvalues and eigenvectors of<br />

\[
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
\]

Problem 11.21. Find the eigenvalues and eigenvectors of<br />

\[
A = \begin{pmatrix} 0 & 3 \\ 2 & 1 \end{pmatrix}
\]

Problem 11.22. Find the eigenvalues and eigenvectors of<br />

\[
A = \begin{pmatrix} 2 & 0 & 0 \\ 2 & 1 & 3 \\ 2 & 0 & 3 \end{pmatrix}
\]

Problem 11.23. Prove that every eigenvector of any square matrix A is an<br />

eigenvector of A −1 , of A 2 , of 3A and of A − 7 I. How are the eigenvalues related?

Problem 11.24. Forward elimination messes up eigenvalues and eigenvectors.<br />

Back substitution messes them up further. Give the simplest examples you can.<br />

Problem 11.25. What are the eigenvalues and eigenvectors of the permutation<br />

matrix of a transposition?<br />

Problem 11.26. What are the eigenvalues and eigenvectors of a 2 × 2 strictly<br />

lower triangular matrix?


12 Bases of Eigenvectors<br />

In this chapter, we try (and don’t always succeed) to organize eigenvectors into<br />

bases.<br />

12.1 Eigenspaces<br />

Definition 12.1. The λ-eigenspace of a square matrix A is the set of vectors x<br />

for which (A − λ I)x = 0 (i.e. the kernel of A − λ I).<br />

The eigenvectors are precisely the nonzero vectors in the eigenspace. In<br />

particular, if λ is not an eigenvalue, then the λ-eigenspace is just the 0 vector.<br />

Problem 12.1. Prove that for any value λ, the λ-eigenspace of any square<br />

matrix is a subspace.<br />

12.1 Review Problems<br />

Problem 12.2. Suppose that A and B are n × n matrices, and AB = BA.<br />

Prove that if x is in the λ-eigenspace of A, then so is Bx.<br />

Figure 12.1: An eigenspace with eigenvalue λ = 2: anything you draw in that<br />

subspace gets doubled.




Figure 12.2: A basis of eigenvectors of a matrix. Each vector starts off as<br />

a thickly drawn vector, and gets stretched into the thinly drawn vector. A<br />

negative stretching factor reverses the direction of the vector. We can recover<br />

the entire matrix A if we know the directions of the basis vectors that are<br />

stretched and how much the matrix stretches vectors in each of those directions.<br />

12.2 Bases of Eigenvectors<br />

Diagonal matrices are very easy to work with:<br />

\[
\begin{pmatrix} 2 & \\ & 3 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \begin{pmatrix} 2x_1 \\ 3x_2 \end{pmatrix},
\]

each variable simply getting scaled by a factor. The next easiest are matrices<br />

that become diagonal when we change variables.<br />

Theorem 12.2 (Decoupling Theorem). If u 1 , u 2 , . . . , u n is a basis of R n , and<br />

each of u 1 , u 2 , . . . , u n is an eigenvector of a square matrix A, say Au 1 =<br />

λ 1 u 1 , Au 2 = λ 2 u 2 , . . . , Au n = λ n u n , then<br />

\[
F^{-1} A F =
\begin{pmatrix}
\lambda_1 & & & \\
& \lambda_2 & & \\
& & \ddots & \\
& & & \lambda_n
\end{pmatrix},
\]
where
\[
F = \begin{pmatrix} u_1 & u_2 & \dots & u_n \end{pmatrix}
\]

is the change of basis matrix of the basis u 1 , u 2 , . . . , u n .<br />

Definition 12.3. We say that the matrix F (or the basis u 1 , u 2 , . . . , u n ) diagonalizes<br />

the matrix A.<br />

Remark 12.4. We call this the decoupling theorem, because the transformation<br />

taking x to Ax is usually very complicated, mixing up the variables x 1 , x 2 , . . . , x n<br />

in a tangled mess. But if we can somehow change the variables and make A<br />

into a diagonal matrix, then each of the new variables is just being stretched<br />

or squished by a factor λ i , independently of any of the other variables, so the<br />

variables appear “decoupled” from one another.



Proof. F takes e’s to u’s, A scales the u’s, and then F −1 turns the scaled u’s<br />

back into e’s. So F −1 AF e j = λ j e j , giving the j-th column of F −1 AF . So<br />

\[
F^{-1} A F =
\begin{pmatrix}
\lambda_1 & & & \\
& \lambda_2 & & \\
& & \ddots & \\
& & & \lambda_n
\end{pmatrix}.
\]
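Here is the theorem checked on the matrix of example 11.8, as a sketch assuming sympy: the columns of F are the eigenvectors found there, and F⁻¹AF comes out diagonal with the eigenvalues down the diagonal.

```python
from sympy import Matrix, diag

A = Matrix([[2, 0],
            [1, 3]])          # eigenvalues 2 and 3 (example 11.8)

F = Matrix([[-1, 0],
            [ 1, 1]])         # eigenvectors (-1, 1) and (0, 1) as columns

assert F.inv() * A * F == diag(2, 3)
print(F.inv() * A * F)
```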

Problem 12.3. Diagonalize<br />

\[
A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 2 & 0 & 3 \end{pmatrix}.
\]

We save a lot of time if we notice that:<br />

Theorem 12.5. Eigenvectors with different eigenvalues are linearly independent.<br />

Remark 12.6. This saves us time because we don’t have to check to see if the<br />

eigenvectors we come up with are linearly independent, since we generate a<br />

basis for each eigenspace, and there are no relations between eigenspaces.<br />

Proof. Take a square matrix A. Pick some eigenvectors, say x 1 with eigenvalue<br />

λ 1 , x 2 with eigenvalue λ 2 , etc., up to some x p . Suppose that all of these<br />

eigenvalues λ 1 , λ 2 , . . . , λ p are different from one another. If we found a linear<br />

relation c 1 x 1 = 0 involving just one vector x 1 , we would divide by c 1 to see<br />

that x 1 = 0. But x 1 ≠ 0 (being an eigenvector), so there is no linear relation<br />

involving just one eigenvector. Let's suppose we found a linear relation involving

just two eigenvectors, x 1 and x 2 , like c 1 x 1 + c 2 x 2 = 0. We could just replace x 1<br />

by c 1 x 1 and x 2 by c 2 x 2 to arrange a linear relation x 1 + x 2 = 0. Since x 1 is an<br />

eigenvector with eigenvalue λ 1 , we know that (A − λ 1 I) x 1 = 0. Apply A − λ 1 I<br />

to both sides of our relation to get (λ 2 − λ 1 ) x 2 = 0. Since the eigenvalues are<br />

distinct, we can divide by λ 2 − λ 1 to get x 2 = 0, again a contradiction. So there<br />

are no linear relations involving just two eigenvectors.<br />

Let's imagine a linear relation

c 1 x 1 + c 2 x 2 + · · · + c p x p = 0,<br />

involving any number of eigenvectors, and see why that leads us into a contradiction.<br />

If any of the terms are 0, just drop them, so we can assume that there<br />

are no 0 terms, i.e. that all coefficients c 1 , c 2 , . . . , c p are nonzero. So we can<br />

rescale, replacing x 1 by c 1 x 1 , etc., to arrange that our relation is now<br />

x 1 + x 2 + · · · + x p = 0.



Applying A − λ 1 I to our linear relation:<br />

0 = (A − λ 1 I) (x 1 + x 2 + · · · + x p )<br />

= (λ 2 − λ 1 ) x 2 + · · · + (λ p − λ 1 ) x p<br />

a linear relation with fewer terms. Since λ 1 ≠ λ 2 , the coefficient of x 2 won’t<br />

become 0 in the new linear relation unless it was already 0, so this new linear<br />

relation still has nonzero terms. In this way, each linear relation leads to a<br />

linear relation with fewer terms, until we get down to one or two terms, which<br />

we already saw can’t happen. Therefore there are no linear relations among<br />

x 1 , x 2 , . . . , x p .

Problem 12.4. Diagonalize<br />

\[
A = \begin{pmatrix} -1 & 2 & 2 \\ 2 & 2 & 2 \\ -3 & -6 & -6 \end{pmatrix}.
\]

Problem 12.5. Diagonalize<br />

\[
A = \begin{pmatrix} -1 & 3 & -3 \\ -3 & 5 & -3 \\ -6 & 6 & -4 \end{pmatrix}.
\]

Problem 12.6. Prove that a square matrix is diagonalizable (i.e. diagonalized<br />

by some matrix) just when it has a basis of eigenvectors.<br />

Remark 12.7. Following remark 9.7 on page 84, F diagonalizes A just when the<br />

change of coordinates y = F −1 x changes the matrix A into a diagonal matrix.<br />

Problem 12.7. Give an example of a matrix which is not diagonalizable.<br />

Problem 12.8. If A is diagonalized by F , say F −1 AF = Λ diagonal, then<br />

prove that A 2 is also diagonalized by F . Apply induction to prove that all<br />

powers of A are diagonalized by F .<br />

Problem 12.9. Use the result of the previous exercise to compute A 100000<br />

where<br />

\[
A = \begin{pmatrix} -3 & -2 \\ 4 & 3 \end{pmatrix}.
\]



12.2 Review Problems<br />

Problem 12.10. Find the eigenvalues and eigenvectors of<br />

\[
A = \begin{pmatrix} 1 & & \\ & 2 & \\ & & 3 \end{pmatrix}.
\]

Problem 12.11. Find the matrix A which has eigenvalues 1 and 3 and corresponding<br />

eigenvectors
\[
\begin{pmatrix} 2 \\ 4 \end{pmatrix}, \quad \begin{pmatrix} -4 \\ 2 \end{pmatrix}.
\]

Problem 12.12. Imagine that a quarter of all people who are healthy become<br />

sick each month, and a quarter die. Imagine that a quarter of all people who<br />

are sick die each month, and a quarter become healthy. What happens to the<br />

dead people? Write a matrix A to show how the numbers h n , s n , d n of healthy,<br />

sick and dead people change from month n to month n + 1. Diagonalize A.<br />

What happens to the population in the long run? Prove your answer. (Keep in<br />

mind that no one is being born in this story.)<br />

Problem 12.13. Let A be the 5 × 5 matrix all of whose entries are 1.<br />

a. Without any calculation, what is the kernel of A?<br />

b. Use this to diagonalize A.<br />

Problem 12.14. Let's investigate which 2 × 2 matrices are diagonalizable.

a. Prove that every 2 × 2 matrix A can be written uniquely as<br />

\[
A = \begin{pmatrix} p + q & r + s \\ r - s & p - q \end{pmatrix}
\]

for some numbers p, q, r, s.<br />

b. Prove that the characteristic polynomial of A is (p − λ) 2 + s 2 − q 2 − r 2 .<br />

c. Prove that A has two different eigenvalues just when q 2 + r 2 > s 2 .<br />

d. Prove that any 2×2 matrix with two different eigenvalues is diagonalizable.<br />

e. Prove that any 2 × 2 matrix with only one eigenvalue is diagonalizable<br />

just when it is diagonal.<br />

f. Prove that any 2 × 2 matrix with no eigenvalues is not diagonalizable.<br />

12.3 Summary<br />

<strong>Linear</strong> algebra has two problems:<br />

a. Solving linear equations Ax = b for the unknown x. This problem is truly<br />

linear. It has a solution x whenever b lies in the image, and the solution<br />

x is unique up to adding on vectors from the kernel.



b. Find eigenvectors and eigenvalues Ax = λx. This problem is nonlinear, in<br />

fact quadratic, since λ and x are multiplied by one another. The nonlinear<br />

part is finding the eigenvalues λ, which are the roots of the characteristic<br />

polynomial det (A − λ I). There is an eigenspace of solutions x for each<br />

λ, and finding a basis of each eigenspace is a linear problem. If we get<br />

lucky (which doesn’t always happen), then the eigenvectors might form a<br />

basis of R n , diagonalizing A.<br />

Table 12.1: Invertibility criteria (Strang’s nutshell [15]). A is n × n.<br />

U is any matrix obtained from A by forward elimination.<br />

§ Invertible Just When . . .<br />

5.1 Gauss–Jordan on A yields 1.<br />

5.2 U is invertible.<br />

5.2 Pivots lie all the way down the diagonal.<br />

5.2 U has no zero rows<br />

5.2 U has n pivots.<br />

5.2 Ax = b has a solution x for each b.<br />

5.2 Ax = b has exactly one solution x for each b.<br />

5.2 Ax = b has exactly one solution x for some b.<br />

5.2 Ax = 0 only for x = 0.<br />

5.2 A has rank n.<br />

7.4 A t is invertible.<br />

7.4 det A ≠ 0.<br />

9.2 The columns are linearly independent.<br />

9.2 The columns form a basis.<br />

9.2 The rows form a basis.<br />

10.2 The kernel of A is just the 0 vector.<br />

10.3 The image of A is all of R n .<br />

11.1 0 is not an eigenvalue of A.<br />

Problem 12.15. Take each of the criteria in table 12.1, and describe an<br />

analogous criterion for showing that A is not invertible. For example, instead<br />

of det A ≠ 0, you would write det A = 0. Make sure that as many as possible<br />

of your criteria express the failure of invertibility in terms of the rank r of the<br />

matrix A. For example, instead of turning<br />

U has no zero rows<br />

into<br />

U has a zero row,



you should turn it into<br />

U has n − r zero rows.


Orthogonal <strong>Linear</strong> <strong>Algebra</strong><br />



13 Inner Product<br />

So far, we haven’t thought about distances or angles. The elegant algebraic<br />

way to describe these geometric notions is in terms of the inner product, which<br />

measures something like how strongly in agreement two vectors are.<br />

13.1 Definition and Simplest Properties<br />

Definition 13.1. The inner product (also called the dot product or scalar product)<br />

of two vectors x and y in R n is the number<br />

〈x, y〉 = x 1 y 1 + x 2 y 2 + · · · + x n y n .<br />

Example 13.2.
\[
x = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \quad
y = \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix}
\]
have inner product
\[
\langle x, y \rangle = (1)(4) + (0)(5) + (2)(6) = 16.
\]

Example 13.3.
\[
\langle e_i, e_j \rangle =
\begin{cases}
1 & \text{if } i = j, \\
0 & \text{if } i \neq j.
\end{cases}
\]

Problem 13.1. Prove that 〈Ae j , e i 〉 = A ij .<br />

Recall that the transpose A t of a matrix A is the matrix with entries<br />

A t ij = A ji, i.e. with rows and columns switched.<br />

Problem 13.2. Prove that 〈x, y〉 = x t y.<br />

Problem 13.3. Let P be a permutation matrix. Use the result of problem 13.1<br />

to prove that P −1 = P t .<br />




Figure 13.1: The Pythagorean theorem: a right triangle with legs a and b and hypotenuse \(\sqrt{a^2+b^2}\). Rearrange the 4 triangles into 2 rectangles to find the area of all 4 triangles. Add the area of the small white square.

Definition 13.4. Vectors u and v are perpendicular if 〈u, v〉 = 0.<br />

Definition 13.5. The length of a vector x in R n is ‖x‖ = √ 〈x, x〉.<br />

This agrees in the plane with the Pythagorean theorem: if \(x = \begin{pmatrix} a \\ b \end{pmatrix}\) then
we can draw x as a point of the plane, and the length along x is \(\sqrt{a^2 + b^2}\).
Problem 13.4. Prove that for any vectors u and v, with u ≠ 0, the vector
\[
v - \frac{\langle v, u \rangle}{\langle u, u \rangle}\, u
\]
is perpendicular to u.

13.1 Review Problems<br />

Problem 13.5. How many vectors x in R n have integer coordinates x 1 , x 2 , . . . , x n<br />

and have<br />

(a) ‖x‖ = 0?<br />

(b) ‖x‖ = 1?<br />

(c) ‖x‖ = 2?<br />

(d) ‖x‖ = 3?<br />

Problem 13.6. (Due to Ian Christie)<br />

a. What is wrong with the clock?



b. At what times of day are the minute and hour hands of a properly<br />

functioning clock<br />

a) perpendicular?<br />

b) parallel?<br />

(The answer isn’t very pretty.)<br />

Problem 13.7. Prove that 〈Ax, y〉 = 〈x, A t y〉 for vectors x in R q , y in R p and<br />

A any p × q matrix.<br />

13.2 Symmetric Matrices<br />

Definition 13.6. A matrix A is symmetric if A t = A.<br />

Problem 13.8. Which of the following are symmetric?<br />

\[
\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 1 & 2 \\ 1 & 3 & 4 \\ 2 & 4 & 5 \end{pmatrix}
\]

Problem 13.9. Prove that an n × n matrix A is symmetric just when
\[
\langle Ax, y \rangle = \langle x, Ay \rangle
\]
for x and y any vectors in R n .

Clearly a symmetric matrix is square.<br />

Problem 13.10. Prove that<br />

a. The sum and difference of symmetric matrices is symmetric.<br />

b. If A is a symmetric matrix, then 3A is also a symmetric matrix.<br />

Problem 13.11. Give an example of a pair of symmetric 2 × 2 matrices A and<br />

B for which AB is not symmetric.



13.2 Review Problems<br />

Problem 13.12. For which matrices A is the matrix<br />

\[
B = \begin{pmatrix} 1 & A \\ A & 1 \end{pmatrix}
\]

symmetric?<br />

Problem 13.13. If A is symmetric, and F an invertible matrix, is F AF −1<br />

symmetric? If not, can you give a 2 × 2 example?<br />

Problem 13.14. If A and B are symmetric, is AB + BA symmetric? Is
AB − BA symmetric?

13.3 Orthogonal Matrices<br />

Definition 13.7. A matrix F is orthogonal if F t F = 1.<br />

Example 13.8. In problem 13.3, you proved that permutation matrices are<br />

orthogonal.<br />

Problem 13.15. Which of the following are orthogonal?<br />

( ) ( ) ( ) (<br />

1 0 1 0 2 0<br />

,<br />

,<br />

,<br />

0 1 0 −1 0 1<br />

( ) ( ) ( ) (<br />

1√2 1<br />

1 1<br />

√2<br />

1 1<br />

,<br />

0 1 − √ 1 ,<br />

,<br />

√2 1<br />

2<br />

1 1<br />

2 0<br />

0 2<br />

)<br />

0 1<br />

−1 0<br />

Orthogonal matrices are important because they preserve inner products:<br />

Problem 13.16. Prove that a matrix is orthogonal just when<br />

for x and y any vectors.<br />

〈F x, F y〉 = 〈x, y〉<br />

Clearly any orthogonal matrix is square.<br />

Problem 13.17. Prove that<br />

a. The product of orthogonal matrices is orthogonal.<br />

b. The inverse of an orthogonal matrix is orthogonal.<br />

Problem 13.18. If F is orthogonal, and c is a real number, prove that cF is<br />

also orthogonal only when c = ±1.<br />

Problem 13.19. Which diagonal matrices are orthogonal?<br />

,<br />

)



Problem 13.20. Prove that the matrices<br />

\[
P = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\]

are orthogonal. Give an example of an orthogonal 2 × 2 matrix not of this form.<br />

Problem 13.21. Give an example of a pair of orthogonal 2 × 2 matrices A and<br />

B for which A + B is not orthogonal.<br />

Problem 13.22. By expanding out the expressions 〈x + y, x + y〉 using the<br />

properties of inner products, express the inner product 〈x, y〉 of two vectors in<br />

terms of their lengths. Use this to prove that a matrix A is orthogonal just<br />

when<br />

for any vector x.<br />

‖Ax‖ = ‖x‖ ,<br />

13.4 Orthonormal Bases<br />

Some bases are much easier to use than others.<br />

Definition 13.9. A basis u 1 , u 2 , . . . , u n is orthonormal if<br />

\[
\langle u_i, u_j \rangle =
\begin{cases}
1 & \text{if } i = j, \\
0 & \text{if } i \neq j.
\end{cases}
\]

Example 13.10. The standard basis is orthonormal.<br />

Example 13.11. The basis
\[
u_1 = \begin{pmatrix} \tfrac{\sqrt{3}}{2} \\[2pt] \tfrac{1}{2} \end{pmatrix}, \quad
u_2 = \begin{pmatrix} -\tfrac{1}{2} \\[2pt] \tfrac{\sqrt{3}}{2} \end{pmatrix}
\]
is orthonormal.



Why Are Orthonormal Bases Better Than Other Bases?<br />

Take any basis u 1 , u 2 , . . . , u n for R n . Every vector x in R n can be written as a<br />

linear combination<br />

x = c 1 u 1 + c 2 u 2 + · · · + c n u n .<br />

How do you find the coefficients c 1 , c 2 , . . . , c n ? You apply elimination to the<br />

matrix
\[
\begin{pmatrix} u_1 & u_2 & \dots & u_n & x \end{pmatrix}.
\]

This is a big job. But if the basis is orthonormal then you can just read off the<br />

coefficients as<br />

c 1 = 〈x, u 1 〉 , c 2 = 〈x, u 2 〉 , . . . , c n = 〈x, u n 〉 .<br />
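A sketch of this "reading off the coefficients", assuming sympy and using the orthonormal basis of example 13.11: the inner products ⟨x, u_i⟩ rebuild x with no elimination needed.

```python
from sympy import Matrix, Rational, sqrt

u1 = Matrix([sqrt(3) / 2, Rational(1, 2)])
u2 = Matrix([-Rational(1, 2), sqrt(3) / 2])
x = Matrix([5, 7])                        # any vector we like

c1 = x.dot(u1)                            # <x, u1>
c2 = x.dot(u2)                            # <x, u2>
assert (c1 * u1 + c2 * u2 - x).expand() == Matrix([0, 0])
print(c1, c2)
```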

Problem 13.23. Prove that if u 1 , u 2 , . . . , u n is an orthonormal basis for R n<br />

and x is any vector in R n then<br />

x = 〈x, u 1 〉 u 1 + 〈x, u 2 〉 u 2 + · · · + 〈x, u n 〉 u n .<br />

How Do We Tell If a Basis is Orthonormal?<br />

Proposition 13.12. A square matrix is orthogonal just when its columns are<br />

orthonormal.<br />

Proof. Write the matrix as
\[
F = \begin{pmatrix} u_1 & u_2 & \dots & u_n \end{pmatrix}.
\]
Calculate
\[
F^t F =
\begin{pmatrix} u_1^t \\ u_2^t \\ \vdots \\ u_n^t \end{pmatrix}
\begin{pmatrix} u_1 & u_2 & \dots & u_n \end{pmatrix}
=
\begin{pmatrix}
u_1^t u_1 & u_1^t u_2 & \dots & u_1^t u_n \\
u_2^t u_1 & u_2^t u_2 & \dots & u_2^t u_n \\
\vdots & \vdots & & \vdots \\
u_n^t u_1 & u_n^t u_2 & \dots & u_n^t u_n
\end{pmatrix}
=
\begin{pmatrix}
\langle u_1, u_1 \rangle & \langle u_1, u_2 \rangle & \dots & \langle u_1, u_n \rangle \\
\langle u_2, u_1 \rangle & \langle u_2, u_2 \rangle & \dots & \langle u_2, u_n \rangle \\
\vdots & \vdots & & \vdots \\
\langle u_n, u_1 \rangle & \langle u_n, u_2 \rangle & \dots & \langle u_n, u_n \rangle
\end{pmatrix}.
\]
So F^t F = 1 just when these inner products are 1 down the diagonal and 0 elsewhere, which says exactly that the columns are orthonormal.
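The proposition in action, as a sketch assuming sympy: stack the orthonormal basis of example 13.11 into the columns of F and the product FᵗF collapses to the identity.

```python
from sympy import Matrix, Rational, sqrt, simplify, eye

F = Matrix([[sqrt(3) / 2, -Rational(1, 2)],
            [Rational(1, 2), sqrt(3) / 2]])   # orthonormal columns

assert simplify(F.T * F - eye(2)) == Matrix.zeros(2, 2)
print(F.T * F)
```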



Figure 13.2: The Gram–Schmidt process. The panels show: the original vectors; project the second vector perpendicular to the first; projected; shrink/stretch all vectors to length 1; done: orthonormal.

Problem 13.24. Is the basis<br />

\[
\begin{pmatrix} 1 \\ -2 \end{pmatrix}, \quad \begin{pmatrix} 2 \\ 1 \end{pmatrix}
\]

orthonormal? Draw a picture of these two vectors.<br />

Problem 13.25. Prove that a square matrix is orthogonal just when its rows<br />

are orthonormal.<br />

13.5 Gram–Schmidt Orthogonalization<br />

The idea: if I start with a basis v 1 , v 2 of R 2 which is not orthonormal, I can fix<br />

it up (as in figure 13.2).



The formal definition: given any linearly independent vectors v 1 , v 2 , . . . , v p<br />

(as input), the output are orthonormal vectors u 1 , u 2 , . . . , u p :<br />

\[
\begin{aligned}
w_1 &= v_1, \\
w_2 &= v_2 - \frac{\langle v_2, w_1 \rangle}{\langle w_1, w_1 \rangle}\, w_1, \\
w_3 &= v_3 - \frac{\langle v_3, w_1 \rangle}{\langle w_1, w_1 \rangle}\, w_1 - \frac{\langle v_3, w_2 \rangle}{\langle w_2, w_2 \rangle}\, w_2, \\
w_j &= v_j - \sum_{i<j} \frac{\langle v_j, w_i \rangle}{\langle w_i, w_i \rangle}\, w_i,
\end{aligned}
\]
and finally each \(w_j\) is rescaled to unit length: \(u_j = w_j / \lVert w_j \rVert\).
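Here is the process written out as a short program, a sketch in Python using sympy for exact arithmetic (both are assumptions, not part of the text): subtract the projections onto the earlier w's, then rescale each w to unit length.

```python
from sympy import Matrix, sqrt

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent column vectors."""
    ws = []
    for v in vectors:
        w = v
        for u in ws:                              # remove the component along each earlier w
            w = w - (v.dot(u) / u.dot(u)) * u
        ws.append(w)
    return [w / sqrt(w.dot(w)) for w in ws]       # rescale to unit length

v1, v2, v3 = Matrix([1, 1, 0]), Matrix([1, 0, 1]), Matrix([0, 1, 1])
u1, u2, u3 = gram_schmidt([v1, v2, v3])
print(u1.dot(u2), u1.dot(u3), u2.dot(u3))   # 0 0 0
print(u1.dot(u1), u2.dot(u2), u3.dot(u3))   # 1 1 1
```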



Problem 13.30. Prove that if v 1 , v 2 , . . . , v p are linearly independent vectors,<br />

then each step of Gram–Schmidt makes sense (no dividing by zero), and the<br />

resulting u 1 , u 2 , . . . , u p are an orthonormal basis for the span of v 1 , v 2 , . . . , v p .<br />

Problem 13.31. Prove that any set of vectors, all of unit length, and perpendicular<br />

to one another, is contained in an orthonormal basis.<br />

Problem 13.32. If u and v are two vectors in R n , and every vector w which is<br />

perpendicular to u is perpendicular to v, then v = au for some number a.<br />

13.5 Review Problems<br />

Problem 13.33. Orthogonalize<br />

( ) ( )<br />

1 5<br />

, .<br />

−1 3<br />

Problem 13.34. Orthogonalize<br />

( ) ( )<br />

−1 0<br />

, .<br />

−1 2<br />

Problem 13.35. Orthogonalize<br />

( ) ( )<br />

−1 −1<br />

,<br />

1 2<br />

.<br />

Problem 13.36. Orthogonalize<br />

( ) ( )<br />

2 −1<br />

,<br />

−1 1<br />

.<br />

Problem 13.37. Orthogonalize<br />

( ) ( )<br />

1 0<br />

, .<br />

2 2


128 Inner Product<br />

Problem 13.38. Orthogonalize<br />

( ) ( )<br />

1 2<br />

, .<br />

1 0<br />

Problem 13.39. Orthogonalize<br />

( ) ( )<br />

1 0<br />

, .<br />

1 2<br />

Problem 13.40. Orthogonalize<br />

( ) ( )<br />

1 −1<br />

, .<br />

1 2<br />

Problem 13.41. Orthogonalize<br />

( ) ( )<br />

0 −1<br />

, .<br />

1 1<br />

Problem 13.42. Orthogonalize<br />

( ) ( )<br />

2 −1<br />

, .<br />

1 2<br />

Problem 13.43. Orthogonalize<br />

( ) ( )<br />

−1 1<br />

,<br />

0 −1<br />

.<br />

Problem 13.44. Orthogonalize<br />

( ) ( )<br />

2 0<br />

,<br />

−1 −1<br />

.<br />

Problem 13.45. Orthogonalize<br />

( ) ( )<br />

1 −1<br />

, .<br />

1 0


13.5. Gram–Schmidt Orthogonalization 129<br />

Problem 13.46. Orthogonalize<br />

( ) ( )<br />

1 −1<br />

, .<br />

2 0<br />

Problem 13.47. Orthogonalize<br />

( ) ( )<br />

1 0<br />

, .<br />

1 −1<br />

Problem 13.48. Orthogonalize<br />

( ) ( )<br />

−1 1<br />

,<br />

2 −1<br />

.<br />

Problem 13.49. Orthogonalize<br />

( ) ( )<br />

0 −1<br />

, .<br />

1 2<br />

Problem 13.50. Orthogonalize<br />

( ) ( )<br />

1 0<br />

, .<br />

1 2<br />

Problem 13.51. Orthogonalize<br />

( ) ( )<br />

1 2<br />

,<br />

−1 −1<br />

.<br />

Problem 13.52. Orthogonalize<br />

( ) ( )<br />

1 −1<br />

, .<br />

1 0<br />

Problem 13.53. Orthogonalize<br />

( ) ( )<br />

1 2<br />

, .<br />

2 2


130 Inner Product<br />

Problem 13.54. Orthogonalize<br />

( ) ( )<br />

2 1<br />

, .<br />

2 2<br />

Problem 13.55. Orthogonalize<br />

( ) ( )<br />

1 0<br />

, .<br />

0 −1<br />

Problem 13.56. Orthogonalize<br />

( ) ( )<br />

1 1<br />

, .<br />

−1 0<br />

Problem 13.57. Orthogonalize<br />

( ) ( )<br />

1 2<br />

, .<br />

−1 0<br />

Problem 13.58. Orthogonalize<br />

( ) ( )<br />

1 2<br />

, .<br />

1 1<br />

Problem 13.59. Orthogonalize<br />

( ) ( )<br />

0 1<br />

, .<br />

2 1<br />

Problem 13.60. Orthogonalize<br />

( ) ( )<br />

1 2<br />

, .<br />

1 1<br />

Problem 13.61. Orthogonalize<br />

( ) ( )<br />

2 1<br />

, .<br />

1 2


13.5. Gram–Schmidt Orthogonalization 131<br />

Problem 13.62. Orthogonalize<br />

( ) ( )<br />

−1 −1<br />

,<br />

2 −1<br />

.<br />

Problem 13.63. Orthogonalize<br />

( ) ( )<br />

2 2<br />

, .<br />

1 0<br />

Problem 13.64. Orthogonalize<br />

( ) ( )<br />

−1 2<br />

,<br />

−1 −1<br />

.<br />

Problem 13.65. Orthogonalize<br />

( ) ( )<br />

1 0<br />

, .<br />

1 −1<br />

Problem 13.66. Orthogonalize<br />

( ) ( )<br />

−1 2<br />

, .<br />

−1 0<br />

Problem 13.67. Orthogonalize<br />

( ) ( )<br />

2 0<br />

, .<br />

0 2<br />

Problem 13.68. Orthogonalize<br />

( ) ( )<br />

0 1<br />

, .<br />

−1 1<br />

Problem 13.69. Orthogonalize<br />

( ) ( )<br />

1 2<br />

, .<br />

0 2


132 Inner Product<br />

Problem 13.70. Orthogonalize<br />

( ) ( )<br />

0 2<br />

, .<br />

2 −1<br />

Problem 13.71. Orthogonalize<br />

( ) ( )<br />

0 2<br />

, .<br />

2 0<br />

Problem 13.72. Orthogonalize<br />

( ) ( )<br />

2 1<br />

, .<br />

2 −1<br />

Problem 13.73. Orthogonalize<br />

( ) ( )<br />

0 1<br />

, .<br />

2 1<br />

Problem 13.74. Orthogonalize<br />

( ) ( )<br />

2 −1<br />

, .<br />

0 −1<br />

Problem 13.75. Orthogonalize<br />

( ) ( )<br />

−1 0<br />

, .<br />

0 1<br />

Problem 13.76. Orthogonalize<br />

( ) ( )<br />

2 0<br />

,<br />

−1 −1<br />

.<br />

Problem 13.77. Orthogonalize<br />

( ) ( )<br />

2 2<br />

, .<br />

0 1


13.5. Gram–Schmidt Orthogonalization 133<br />

Problem 13.78. Orthogonalize<br />

( ) ( )<br />

2 −1<br />

, .<br />

2 1<br />

Problem 13.79. Orthogonalize<br />

( ) ( )<br />

2 2<br />

, .<br />

0 1<br />

Problem 13.80. Orthogonalize<br />

( ) ( )<br />

1 0<br />

, .<br />

2 −1<br />

Problem 13.81. Orthogonalize<br />

( ) ( )<br />

1 1<br />

, .<br />

2 0<br />

Problem 13.82. Orthogonalize<br />

( ) ( )<br />

2 0<br />

, .<br />

2 −1<br />

Problem 13.83. Orthogonalize<br />

( ) ( )<br />

1 1<br />

, .<br />

−1 1<br />

Problem 13.84. What happens to a basis when you carry out Gram–Schmidt,<br />

if it was already orthonormal to begin with?


14 The Spectral Theorem<br />

Finally, our goal: we want to prove that symmetric matrices can be made into<br />

diagonal matrices by orthogonal changes of variable.<br />

14.1 Statement and Proof<br />

Proposition 14.1 (The Minimum Principle). Let A be a symmetric n × n<br />

matrix. The function<br />

Q(x) = 〈Ax, x〉<br />

is a quadratic polynomial function. Restrict x to lie on the sphere of unit length<br />

vectors. Then Q(x) reaches a minimum among all vectors on that sphere at<br />

some vector x = u. This vector u is an eigenvector of A.<br />

Proof. In the appendix to this chapter, we prove that the minimum occurs. So<br />

there is some vector x = u so that 〈Ax, x〉 ≥ 〈Au, u〉 for any x of unit length.<br />

Fixing u, consider the quadratic function
\[
H(x) = \langle Ax, x \rangle - \langle Au, u \rangle \langle x, x \rangle.
\]
For x of unit length, \(\langle x, x \rangle = 1\), so
\[
H(x) = \langle Ax, x \rangle - \langle Au, u \rangle \geq 0.
\]

But if we scale x, say to ax, clearly H(x) is quadratic in x, so<br />

H (ax) = a 2 H(x).<br />

So rescaling, we find that H(x) ≥ 0 for any vector x, of any length. Pick w any<br />

vector perpendicular to u. For any number t:<br />

0 ≤ H(u + tw)<br />

= 〈A(u + tw), u + tw〉 − 〈Au, u〉 〈u + tw, u + tw〉<br />

= 〈Au, u〉 + 2t 〈Au, w〉 + t 2 〈Aw, w〉 − 〈Au, u〉 ( 1 + t 2 〈w, w〉 )<br />

= 2t 〈Au, w〉 + t 2 H(w)<br />

= t (2 〈Au, w〉 + tH(w)) .<br />




First let's try t positive, so we can divide by t, and find 0 ≤ 2 〈Au, w〉 + tH(w).
Let t go to zero, to see that 0 ≤ 〈Au, w〉. Next try t negative, and divide by t

and then let t go to zero, and see that 0 ≥ 〈Au, w〉. Therefore 〈Au, w〉 = 0. So<br />

every vector w perpendicular to u is also perpendicular to Au. By problem 13.32<br />

on page 127, Au is a multiple of u, so u is an eigenvector.<br />

Problem 14.1. If two eigenvectors of a symmetric matrix have different eigenvalues,<br />

prove that they are perpendicular.<br />

Theorem 14.2 (Spectral Theorem). Each symmetric matrix A is diagonalized<br />

by an orthogonal matrix F . The columns of F form an orthonormal basis of<br />

eigenvectors. We say that F orthogonally diagonalizes A.<br />

Proof. Start with a unit eigenvector u 1 , given by the minimum principle. Take<br />

any orthonormal basis u 1 , u 2 , . . . , u n that starts with this vector, and let F<br />

be the matrix with these vectors as columns. Replace A with F t AF . After<br />

replacement, A has e 1 as eigenvector: Ae 1 = λe 1 , so the first column of A is<br />

λe 1 . Because A is symmetric, we see that<br />

\[
A = \begin{pmatrix} \lambda & 0 \\ 0 & B \end{pmatrix}
\]

with B a smaller symmetric matrix. By induction on the size of matrix, we can<br />

orthogonally diagonalize B.<br />

The previous exercise is vital for calculations: to orthogonally diagonalize,<br />

find all eigenvalues, and for each eigenvalue λ find an orthonormal basis<br />

u 1 , u 2 , . . . of eigenvectors of that eigenvalue λ. All of the eigenvectors of all of<br />

the other eigenvalues will automatically be perpendicular to u 1 , u 2 , . . . , so put<br />

together they make an orthonormal basis of R n .<br />
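A sketch of that recipe on a small symmetric matrix (sympy assumed; the matrix here is not one of the exercises): unit eigenvectors go into the columns of F, and F turns out orthogonal with FᵗAF diagonal.

```python
from sympy import Matrix, sqrt, simplify, diag, eye

A = Matrix([[2, 1],
            [1, 2]])                     # symmetric; eigenvalues 1 and 3

u1 = Matrix([1, -1]) / sqrt(2)           # unit eigenvector for eigenvalue 1
u2 = Matrix([1,  1]) / sqrt(2)           # unit eigenvector for eigenvalue 3
F = Matrix.hstack(u1, u2)

assert simplify(F.T * F - eye(2)) == Matrix.zeros(2, 2)   # F is orthogonal
assert simplify(F.T * A * F - diag(1, 3)) == Matrix.zeros(2, 2)
print(F.T * A * F)
```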

Problem 14.2. Find a matrix F which orthogonally diagonalizes the matrix<br />

\[
A = \begin{pmatrix} 7 & -6 \\ -6 & 12 \end{pmatrix},
\]

by finding the eigenvectors u 1 , u 2 and eigenvalues λ 1 , λ 2 .<br />

Problem 14.3. Prove that a square matrix is orthogonally diagonalizable just<br />

when it is symmetric.<br />

Remark 14.3. An elegant (but longer) proof of the spectral theorem can be made<br />

along the following lines. Once we have used the minimum principle to find one<br />

eigenvector u 1 , we can then look among all unit length vectors x perpendicular


14.1. Statement and Proof 137<br />

to u 1 , and see which of these vectors has smallest value for 〈Ax, x〉. Call that<br />

vector u 2 . Look among all unit vectors x which are perpendicular to both u 1<br />

and u 2 for one which has the smallest value of 〈Ax, x〉, and call it u 3 , etc. This<br />

recipe will actually generate the eigenvectors for us, although it isn’t easy to<br />

use either by hand or by computer.<br />

14.1 Review Problems<br />

Problem 14.4. Find a matrix F which orthogonally diagonalizes the matrix<br />

\[
A = \begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix}
\]

Problem 14.5. Find a matrix F which orthogonally diagonalizes the matrix<br />

\[
A = \begin{pmatrix} 4 & -2 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix}.
\]

Problem 14.6. Find a matrix F which orthogonally diagonalizes the matrix<br />

⎛<br />

⎞<br />

7<br />

6<br />

− 1 6<br />

− 1 3<br />

⎜<br />

A =<br />

7 1 ⎟<br />

⎝<br />

⎠<br />

− 1 6<br />

− 1 3<br />

Problem 14.7. Find a matrix F which orthogonally diagonalizes<br />

⎛<br />

⎞<br />

− 7 24<br />

25<br />

0<br />

25<br />

⎜<br />

⎟<br />

A = ⎝ 0 1 0 ⎠<br />

Problem 14.8. Let<br />

⎛<br />

1<br />

⎜<br />

A = ⎝<br />

6<br />

1<br />

3<br />

3<br />

5<br />

3<br />

24<br />

7<br />

25<br />

0<br />

25<br />

2<br />

⎞<br />

⎟<br />

⎠<br />

3<br />

What are all of the orthogonal matrices F for which F t AF is diagonal with<br />

entries increasing as we move down the diagonal?<br />

Problem 14.9. If A is symmetric, prove that A 2 has the same rank as A.<br />

Problem 14.10. If A is n × n, prove that A t A has no negative eigenvalues,<br />

and its eigenvalues are all positive just when A is invertible.


138 The Spectral Theorem<br />

14.2 Quadratic Forms<br />

Definition 14.4. A quadratic form is a polynomial in several variables, with all<br />

terms being quadratic (like x 2 or xy).<br />

In particular, no linear or constant terms can appear in a quadratic form.<br />

Symmetrizing<br />

If we have a quadratic form in variables x 1 and x 2 , we can write it more<br />

symmetrically; for example:<br />

x 1 x 2 = 1 2 x 1x 2 + 1 2 x 2x 1 .<br />

We just leave alone a term like x 2 1, so for example<br />

Problem 14.11. Symmetrize:<br />

a. x 2 2<br />

b. x 2 1 + x 2 2<br />

c. x 2 1 + 3 x 1 x 2<br />

d. x 1 (x 1 + x 2 )<br />

x 2 1 + 8 x 1 x 2 = x 2 1 + 4 x 1 x 2 + 4 x 2 x 1 .<br />

Making a matrix

Pluck out the quadratic terms in the polynomial to make a matrix. For example:

a x_1^2 + b x_1 x_2 + c x_2^2 = a x_1^2 + (b/2) x_1 x_2 + (b/2) x_2 x_1 + c x_2^2

becomes

A = \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}.

More generally, \sum_{ij} A_{ij} x_i x_j becomes A = (A_{ij}). Because we symmetrized, the
matrix is symmetric.

Problem 14.12. Make matrices for
a. x_2^2
b. x_1^2 + x_2^2
c. x_1^2 + (3/2) x_1 x_2 + (3/2) x_2 x_1
d. x_1^2 + (1/2) x_1 x_2 + (1/2) x_2 x_1
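The passage from a quadratic form to its symmetric matrix is mechanical, so it is easy to automate. The following sketch (not from the text; Python with NumPy assumed, and the function name is my own) splits each mixed coefficient in half, exactly as in the symmetrization above.

    # A small sketch: build the symmetric matrix of a quadratic form
    # sum over i <= j of c[i, j] x_i x_j by splitting each mixed term in half.
    import numpy as np

    def quadratic_form_matrix(coeffs, n):
        """coeffs maps pairs (i, j) with i <= j to the coefficient of x_i x_j."""
        A = np.zeros((n, n))
        for (i, j), c in coeffs.items():
            if i == j:
                A[i, i] = c          # diagonal terms are left alone
            else:
                A[i, j] = c / 2.0    # symmetrize: half the coefficient on each side
                A[j, i] = c / 2.0
        return A

    # Example: 23 x_1^2 + 72 x_1 x_2 + 2 x_2^2 (indices shifted to start at 0).
    A = quadratic_form_matrix({(0, 0): 23.0, (0, 1): 72.0, (1, 1): 2.0}, 2)
    # A is [[23, 36], [36, 2]], matching the worked example in the next subsection.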



Diagonalizing

Diagonalize our matrix, by orthogonal change of variables. Then the same
orthogonal change of variables will simplify our quadratic form, turning it into
a sum of quadratic forms in one variable each. For example, take the quadratic
form

23 x_1^2 + 72 x_1 x_2 + 2 x_2^2.

Symmetrize:

23 x_1^2 + 36 x_1 x_2 + 36 x_2 x_1 + 2 x_2^2.

The associated matrix is

A = \begin{pmatrix} 23 & 36 \\ 36 & 2 \end{pmatrix}.

We let the reader check that A is orthogonally diagonalized by

F = \begin{pmatrix} 3/5 & 4/5 \\ -4/5 & 3/5 \end{pmatrix},

so that

F^t A F = \begin{pmatrix} -25 & 0 \\ 0 & 50 \end{pmatrix}.

We also let the reader check that if we take new variables y, defined by y = F^t x,
i.e. by x = F y, then the same quadratic form is

-25 y_1^2 + 50 y_2^2.

Theorem 14.5 (Decoupling Theorem). Any quadratic form in any number of
variables becomes a sum of quadratic forms in one variable each, after a change
of variables x = F y given by an orthogonal matrix F.

Remark 14.6. We will say that the quadratic form is diagonalized by the
orthogonal matrix.

Proof. The problem comes from the mixed terms, like x_1 x_2. Symmetrize and
write a symmetric matrix A out of the coefficients. Then the quadratic form
is \sum_{ij} A_{ij} x_i x_j = 〈Ax, x〉. Diagonalize A to Λ = F^t A F. Let y = F^t x. Then
x = F y, so

〈Ax, x〉 = 〈A F y, F y〉
        = 〈F^t A F y, y〉
        = 〈Λ y, y〉
        = λ_1 y_1^2 + · · · + λ_n y_n^2.
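A quick numerical check of the decoupling (not from the text, NumPy assumed): diagonalize the matrix of the worked example, pick any y, set x = F y, and compare the two sides.

    # Check that the change of variables x = F y decouples a quadratic form.
    import numpy as np

    A = np.array([[23.0, 36.0],
                  [36.0,  2.0]])
    lam, F = np.linalg.eigh(A)              # orthogonal F with F^t A F diagonal

    rng = np.random.default_rng(0)
    y = rng.standard_normal(2)              # any choice of the new variables y
    x = F @ y                               # the corresponding old variables

    q_old = x @ A @ x                                  # <Ax, x> in the x variables
    q_new = lam[0] * y[0]**2 + lam[1] * y[1]**2        # decoupled form in y
    print(np.isclose(q_old, q_new))                    # True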



14.2 Review Problems

Problem 14.13. Diagonalize
(a) 3 x_1^2 - 2 x_1 x_2 + 3 x_2^2
(b) 9 x_1^2 + 18 x_1 x_2 + 9 x_2^2
(c) 11 x_1^2 + 6 x_1 x_2 + 3 x_2^2
(d) 3 x_1^2 + 4 x_1 x_2 + 6 x_2^2
(e) 8 x_1^2 - 12 x_1 x_2 + 3 x_2^2
(f) 10 x_1^2 + 6 x_1 x_2 + 2 x_2^2

Problem 14.14. Diagonalize
(a) 4 x_1 x_2 + 3 x_2^2
(b) 6 x_1^2 - 12 x_1 x_2 + 11 x_2^2
(c) 10 x_1^2 - 12 x_1 x_2 + 5 x_2^2
(d) -2 x_1^2 - 4 x_1 x_2 + x_2^2
(e) 2 x_1^2 - 6 x_1 x_2 + 10 x_2^2
(f) -2 x_1^2 + 6 x_1 x_2 + 6 x_2^2

Problem 14.15. Diagonalize
(a) 10 x_1^2 + 12 x_1 x_2 + 5 x_2^2
(b) 10 x_1^2 - 18 x_1 x_2 + 10 x_2^2
(c) 11 x_1^2 + 12 x_1 x_2 + 6 x_2^2
(d) -2 x_1^2 + 4 x_1 x_2 + x_2^2
(e) x_1^2 - 4 x_1 x_2 + 4 x_2^2
(f) 8 x_1^2 + 18 x_1 x_2 + 8 x_2^2

Problem 14.16. Diagonalize
(a) 4 x_1^2 - 4 x_1 x_2 + x_2^2
(b) 11 x_1^2 + 18 x_1 x_2 + 11 x_2^2
(c) 9 x_1^2 + 12 x_1 x_2 + 4 x_2^2
(d) x_1^2 - 2 x_1 x_2 + x_2^2
(e) 3 x_1^2 + 4 x_1 x_2
(f) 4 x_1^2 + 8 x_1 x_2 + 4 x_2^2

Problem 14.17. Diagonalize
(a) x_1^2 + 4 x_1 x_2 - 2 x_2^2
(b) x_1^2 - 12 x_1 x_2 + 6 x_2^2
(c) -x_1^2 - 6 x_1 x_2 + 7 x_2^2
(d) -x_1^2 + 2 x_1 x_2 - x_2^2
(e) 6 x_1^2 - 18 x_1 x_2 + 6 x_2^2
(f) 8 x_1^2 + 12 x_1 x_2 + 3 x_2^2

Problem 14.18. Diagonalize
(a) x_1^2 - 8 x_1 x_2 + x_2^2
(b) 6 x_1^2 - 8 x_1 x_2 + 6 x_2^2
(c) 2 x_1^2 + 4 x_1 x_2 + 5 x_2^2
(d) 3 x_1^2 - 12 x_1 x_2 + 8 x_2^2
(e) x_1^2 - 4 x_1 x_2 - 2 x_2^2
(f) -2 x_1^2 - 6 x_1 x_2 + 6 x_2^2



Problem 14.19. Diagonalize
(a) 7 x_1^2 + 12 x_1 x_2 + 2 x_2^2
(b) 2 x_1^2 - 4 x_1 x_2 - x_2^2
(c) 6 x_1^2 + 18 x_1 x_2 + 6 x_2^2
(d) 3 x_1^2 + 6 x_1 x_2 + 11 x_2^2
(e) 9 x_1^2 + 6 x_1 x_2 + x_2^2
(f) 8 x_1^2 - 6 x_1 x_2

Problem 14.20. Diagonalize
(a) 8 x_1^2 + 18 x_1 x_2 + 8 x_2^2
(b) 7 x_1^2 - 6 x_1 x_2 - x_2^2
(c) 11 x_1^2 - 18 x_1 x_2 + 11 x_2^2
(d) 9 x_1^2 - 6 x_1 x_2 + x_2^2
(e) 11 x_1^2 - 12 x_1 x_2 + 6 x_2^2
(f) 5 x_1^2 + 12 x_1 x_2 + 10 x_2^2

Problem 14.21. Diagonalize
(a) 4 x_1^2 + 12 x_1 x_2 + 9 x_2^2
(b) 6 x_1^2 - 6 x_1 x_2 - 2 x_2^2
(c) -2 x_1 x_2
(d) 2 x_1^2 + 8 x_1 x_2 + 2 x_2^2
(e) 3 x_1^2 - 6 x_1 x_2 + 11 x_2^2
(f) 7 x_1^2 + 18 x_1 x_2 + 7 x_2^2

Problem 14.22. Diagonalize
(a) 3 x_1^2 - 4 x_1 x_2
(b) -x_1^2 + 6 x_1 x_2 + 7 x_2^2
(c) 3 x_1^2 + 12 x_1 x_2 + 8 x_2^2
(d) -2 x_1^2 - 2 x_1 x_2 - 2 x_2^2
(e) -6 x_1 x_2 + 8 x_2^2
(f) 3 x_1^2 + 8 x_1 x_2 + 3 x_2^2

Problem 14.23. Diagonalize
(a) x_1^2 + 2 x_1 x_2 + x_2^2
(b) 7 x_1^2 - 18 x_1 x_2 + 7 x_2^2
(c) 6 x_1^2 + 12 x_1 x_2 + 11 x_2^2
(d) 2 x_1^2 - 8 x_1 x_2 + 2 x_2^2
(e) 5 x_1^2 + 4 x_1 x_2 + 2 x_2^2
(f) 2 x_1^2 + 4 x_1 x_2 - x_2^2

Problem 14.24. Diagonalize
(a) 4 x_1^2 - 8 x_1 x_2 + 4 x_2^2
(b) 11 x_1^2 - 6 x_1 x_2 + 3 x_2^2
(c) x_1^2 - 6 x_1 x_2 + 9 x_2^2
(d) 2 x_1^2 + 6 x_1 x_2 + 10 x_2^2
(e) 4 x_1^2 + 4 x_1 x_2 + x_2^2
(f) x_1^2 + 12 x_1 x_2 + 6 x_2^2

Problem 14.25. Diagonalize



(a) -4 x_1 x_2 + 3 x_2^2
(b) 6 x_1^2 - 4 x_1 x_2 + 3 x_2^2
(c) 6 x_1^2 + 6 x_1 x_2 - 2 x_2^2
(d) 10 x_1^2 - 6 x_1 x_2 + 2 x_2^2
(e) x_1^2 + 6 x_1 x_2 + 9 x_2^2
(f) -x_1^2 - 4 x_1 x_2 + 2 x_2^2

Problem 14.26. Diagonalize
(a) -x_1^2 + 4 x_1 x_2 + 2 x_2^2
(b) 2 x_1^2 - 4 x_1 x_2 + 5 x_2^2
(c) 7 x_1^2 + 6 x_1 x_2 - x_2^2
(d) 8 x_1^2 - 18 x_1 x_2 + 8 x_2^2
(e) 3 x_1^2 + 2 x_1 x_2 + 3 x_2^2
(f) 9 x_1^2 - 12 x_1 x_2 + 4 x_2^2

Problem 14.27. Diagonalize
(a) 5 x_1^2 + 8 x_1 x_2 + 5 x_2^2
(b) 6 x_1^2 + 12 x_1 x_2 + x_2^2
(c) 3 x_1^2 - 8 x_1 x_2 + 3 x_2^2
(d) 2 x_1 x_2
(e) x_1^2 + 4 x_1 x_2 + 4 x_2^2
(f) 9 x_1^2 - 18 x_1 x_2 + 9 x_2^2

Problem 14.28. Diagonalize
(a) 7 x_1^2 + 6 x_1 x_2 - x_2^2
(b) 2 x_1^2 - 2 x_1 x_2 + 2 x_2^2
(c) -2 x_1^2 + 2 x_1 x_2 - 2 x_2^2
(d) 8 x_1^2 + 6 x_1 x_2
(e) 5 x_1^2 - 12 x_1 x_2 + 10 x_2^2
(f) 6 x_1^2 - 12 x_1 x_2 + x_2^2

Problem 14.29. Diagonalize
(a) 2 x_1^2 + 2 x_1 x_2 + 2 x_2^2
(b) 6 x_1 x_2 + 8 x_2^2
(c) x_1^2 + 8 x_1 x_2 + x_2^2
(d) 5 x_1^2 - 8 x_1 x_2 + 5 x_2^2
(e) 6 x_1^2 + 4 x_1 x_2 + 3 x_2^2
(f) 6 x_1^2 + 8 x_1 x_2 + 6 x_2^2

Problem 14.30. Diagonalize
(a) 4 x_1^2 - 12 x_1 x_2 + 9 x_2^2
(b) 3 x_1^2 - 4 x_1 x_2 + 6 x_2^2
(c) 2 x_1^2 - 12 x_1 x_2 + 7 x_2^2
(d) 5 x_1^2 - 4 x_1 x_2 + 2 x_2^2
(e) 10 x_1^2 + 18 x_1 x_2 + 10 x_2^2
(f) 2 x_1^2 + 12 x_1 x_2 + 7 x_2^2



Figure 14.1: Some examples of solutions of quadratic equations: (a) circle, (b) ellipse, (c) pair of lines, (d) hyperbola.

14.3 Application to Quadratic Equations

In the plane, with two variables x_1 and x_2, a quadratic equation Q(x) = c
(with Q(x) a quadratic form and c a constant number) cuts out a circle, ellipse,
hyperbola, pair of lines, single line, point, or empty set.

Example 14.7. The quadratic equation

4 x_1 x_2 + 3 x_2^2 = 0

involves the quadratic form with matrix

\begin{pmatrix} 0 & 2 \\ 2 & 3 \end{pmatrix}.

Its eigenvalues are λ = -1 and λ = 4. So we can change variables (somehow)
to get to

-x_1^2 + 4 x_2^2 = 0.

This is just

x_1 = ±2 x_2,

a pair of lines intersecting at a point. Since the change of variables is linear, the
original quadratic equation also cuts out a pair of lines intersecting at a point.

Example 14.8. The equation

x_1^2 + 4 x_1 x_2 + x_2^2 = 1

contains the quadratic form with matrix

\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.

The eigenvalues are λ = -1 and λ = 3. So after a linear change of variables, we
get

-x_1^2 + 3 x_2^2 = 1.

(The right hand side is a constant, so doesn't change.)



It is well known that an equation of the form

a x_1^2 + b x_2^2 = 1

with a and b of different signs is a hyperbola, while if a and b are both positive
then it is an ellipse. So our last example must be a hyperbola. Warning:
until you diagonalize the associated matrix, and look at the eigenvalues, you
can't easily see what shape a quadratic equation cuts out. You can't just look
at whether the coefficients are positive, or anything obvious like that.
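As a rough illustration of this warning (not from the text; Python with NumPy assumed, and the function is my own), one can classify a plane quadratic equation purely from the signs of the eigenvalues of its symmetric matrix:

    # Sketch: classify a x1^2 + 2b x1 x2 + c x2^2 = 1 by eigenvalue signs.
    import numpy as np

    def classify_quadratic_curve(a, b, c):
        A = np.array([[a, b], [b, c]], dtype=float)
        lam = np.linalg.eigvalsh(A)          # both eigenvalues, ascending
        if np.all(lam > 0):
            return "ellipse (a circle if the eigenvalues are equal)"
        if np.all(lam < 0):
            return "empty set"               # a negative quantity cannot equal 1
        if lam[0] < 0 < lam[1]:
            return "hyperbola"
        # one eigenvalue is zero: the equation reads lam y^2 = 1 in one variable
        return "pair of parallel lines" if lam.max() > 0 else "empty set"

    print(classify_quadratic_curve(1.0, 2.0, 1.0))   # the last example: hyperbola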

14.3 Review Problems

Problem 14.31. Just by finding eigenvalues (without finding eigenvectors),
determine what geometric shape (ellipse, hyperbola, pair of lines, line, empty
set) the following are:
a. 5 x_1^2 + 2 x_2^2 - 4 x_1 x_2 = 1
b. 3 x_1^2 + 8 x_1 x_2 + 3 x_2^2 = 1
c. x_1^2 - 4 x_1 x_2 + 4 x_2^2 = -1
d. 6 x_1 x_2 + 8 x_2^2 = 1
e. x_1^2 - 6 x_1 x_2 + 9 x_2^2 = 0
f. 4 x_1 x_2 - 3 x_1^2 - 6 x_2^2 = 1

Problem 14.32. What more can you do to normalize a quadratic form if you
allow arbitrary invertible matrices instead of orthogonal ones?

14.4 Positivity

Definition 14.9. A quadratic form Q(x) is positive definite if Q(x) > 0 except if
x = 0.

(Clearly if x = 0 then Q(x) = 0.) For example, Q(x) = x^2 is a positive
definite quadratic form on R, while Q(x) = x_1^2 + x_2^2 is positive definite on R^2.
But it is not at all clear whether Q(x) = 6 x_1^2 - 12 x_1 x_2 + 11 x_2^2 is positive
definite, because it has positive terms and negative ones. As in figure 14.2
on the facing page, we can also define positive semidefinite forms (Q(x) ≥ 0),
negative definite forms (Q(x) < 0 for x ≠ 0), and indefinite forms (not positive
semidefinite or negative semidefinite), but they are less important.

Lemma 14.10. A quadratic form Q(x) = 〈x, Ax〉 (with a symmetric matrix
A) is positive definite just if all of the eigenvalues of A are positive.

Proof. Let λ_1, λ_2, . . . , λ_n be the eigenvalues, and change variables to y = F^{-1} x,
so x = F y, to diagonalize the quadratic form:

Q = λ_1 y_1^2 + λ_2 y_2^2 + · · · + λ_n y_n^2.



Figure 14.2: Various behaviours of quadratic forms in two variables: (a) positive definite, (b) positive semidefinite, (c) indefinite, (d) negative definite.

Suppose that all of the eigenvalues are positive. Clearly this quantity is then
positive for nonzero vectors y, because each term is positive or zero, and at
least one of y_1, y_2, . . . , y_n is not zero, so gives a positive term.

On the other hand, if one of these λ_j is zero or negative, then take y = e_j, and you
get Q ≤ 0 but x = F y = F e_j ≠ 0, so Q is not positive definite.
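Lemma 14.10 gives a practical test. A minimal sketch (not from the text, NumPy assumed):

    # Test positive definiteness of Q(x) = <x, Ax> via the eigenvalues of A.
    import numpy as np

    def is_positive_definite(A, tol=1e-12):
        """A is assumed symmetric; Q is positive definite iff every eigenvalue > 0."""
        return bool(np.all(np.linalg.eigvalsh(A) > tol))

    # Q(x) = 6 x1^2 - 12 x1 x2 + 11 x2^2 has symmetric matrix [[6, -6], [-6, 11]].
    print(is_positive_definite(np.array([[6.0, -6.0], [-6.0, 11.0]])))   # True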

14.4 Review Problems

Problem 14.33. Which of the quadratic forms in problem 14.13 on page 140
are positive definite?

14.5 Appendix: Continuous Functions and Maxima

There is one gap in our proof of the spectral theorem for symmetric matrices:
we need to know that a quadratic function on the sphere in R^n has a maximum.
This appendix gives the proof. Warning: students are not required or expected
to work through this appendix, which is advanced and is included only for
completeness.

Definition 14.11. A sequence of numbers x_1, x_2, . . . converges to a number x if,
in order to make x_j stay as close as we like to x, we have only to ensure that j
is kept large enough. A sequence of points

\begin{pmatrix} x_1 \\ y_1 \end{pmatrix}, \begin{pmatrix} x_2 \\ y_2 \end{pmatrix}, . . .

in R^2 converges to a point

\begin{pmatrix} x \\ y \end{pmatrix}

if x_1, x_2, . . . converges to x and y_1, y_2, . . . converges to y. Similarly, a sequence
of points in R^n converges if all of the coordinates of those points converge.



Any sequence of increasing real numbers, all of which are bounded from
above by some large enough number, must converge to something. This fact is
a property of real numbers which we cannot prove without giving an explicit
and precise definition of the real numbers; see Spivak [14] for the complete story.
We will just assume that this fact is true.

Definition 14.12. A function f(x) of any number of variables x_1, x_2, . . . , x_n
(writing x for

x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}

as a point of R^n) is continuous if, in order to get f(y) to stay as close to f(x)
as you like, you have only to ensure that y is kept close enough to x.

If two numbers are close, their sums, products and differences are clearly
close. The reader should try to prove:

Lemma 14.13. The function f(x) = x_1 is continuous. Constant functions
are continuous. The sum, difference and product of continuous functions is
continuous.

Corollary 14.14. Any polynomial function in any finite number of variables
is continuous.

Proof. Induction on the degree and number of terms of the polynomial.

Definition 14.15. A ball in R^n is a set B consisting of all points closer than
some distance to some chosen point (called the center of the ball). The distance
is called the radius. A closed ball includes also the points of distance equal to
the radius (an apple with the skin), while an open ball does not include any
such points, only including points of distance less than the radius (an apple
without the skin).

Definition 14.16. A set S ⊂ R^n is bounded if it lies in a ball (open or closed).

Definition 14.17. A set S ⊂ R^n is called closed if every point x of R^n not
belonging to S can be surrounded by an open ball containing no points of S.

Definition 14.18. A closed box is the set of points x = (x_1, . . . , x_n) for which
each x_j lies in some chosen interval, a_j ≤ x_j ≤ b_j. An open box is the same
but with a_j < x_j < b_j.

Lemma 14.19. A closed ball is a closed set, as is a closed box.

Proof. Given a closed ball, say of radius r, take any point p not belonging to it,
say of distance R from the center, and draw an open ball of radius R - r about
p. By the triangle inequality, no point in the open ball lies in the closed ball.

Given a closed box, and a point p not belonging to it, there must be some
coordinate of p which does not satisfy the inequalities defining the closed
coordinate of p which does not satisfy the inequalities defining the closed



box. For example, suppose that the box is cut out by inequalities including
a_1 ≤ x_1 ≤ b_1, and p fails to satisfy these bounds because p_1 > b_1. Then every
point q closer to p than p_1 - b_1 will still fail: q_1 > b_1. So then a ball of radius
p_1 - b_1 around p will not overlap the closed box.

Theorem 14.20. Every infinite sequence of points in a closed, bounded set has
a convergent subsequence.

Proof. Suppose that the set is a box. Cover the box with a finite number of
small closed boxes (perhaps overlapping). There are infinitely many points
x_1, x_2, . . . , and only finitely many of the small boxes, so there must be infinitely
many x_j lying in the same small box.

Similarly, subdivide that small box into much smaller closed boxes. Repeating,
we find a sequence of closed boxes, like Russian dolls, each contained
entirely in the previous one, with infinitely many x_j in each. We get to choose
how small the boxes are going to be at each step, so let's make them get much
smaller at each step, with side lengths decreasing as rapidly as we like. Pick out
one of these x_i points, call it y_j, from the j-th nested box, as the point with
the smallest possible coordinates among all points of that box. The sequence of
points y_1, y_2, . . . must converge, since all of the coordinates of the point y_j are
constrained by the box y_j lies in, and each coordinate only increases with j.

If we face a closed, bounded set S, which is not a closed box, then find a
closed box B containing it, and repeat the argument above. The problem is
to ensure that the limit x of the sequence constructed belongs to the set S.
Even if not, it certainly belongs to B. Since S is closed, if x does not belong
to S, then there must be an open ball around x not containing any points of S.
But that open ball cannot contain any of the points in the nested boxes, and
therefore x cannot be their limit.

Theorem 14.21. Every continuous function f on a closed, bounded set attains
a maximum and a minimum.

Proof. For the moment, let's suppose that our closed, bounded set is just a
closed box. Suppose that f has no maximum. So the values of f can get larger
and larger, but never peak. Let M be the smallest positive number so that f
never exceeds M; if there is no such number let M = ∞. By definition, f gets
as close to M as we like (which, if M = ∞, means simply that f gets as large
as we like), but never reaches M. Let x_1, x_2, x_3, . . . be any points of the closed
bounded set on which f(x_j) approaches M. Taking a subsequence, we find x_j
approaching a limit point x, and by continuity f(x_j) must approach f(x), so
f(x) = M. But f never reaches M, a contradiction.

So every continuous function on a closed bounded set has a maximum. If
f is a continuous function on a closed bounded set, then -f is too, and has a
maximum, so f has a minimum.


15 Complex Vectors<br />

The entire story so far can be retold with a cast of complex numbers instead
of real numbers. Most of this is straightforward. But there turns out to be an
important twist in the complex theory of the inner product. The minimum
principle doesn't make any sense in the setting of complex numbers, and the
spectral theorem as it was stated just isn't true any more for complex matrices.
Moreover, the natural notion of inner product itself is quite different for complex
vectors: this new notion leads directly to the complex spectral theorem.

15.1 Complex Numbers

Definition 15.1. A complex number is a pair (x, y) of real numbers. Write 1 to
mean the pair (1, 0), and i to mean (0, 1). Addition is defined by the rule

(x, y) + (X, Y) = (x + X, y + Y),

subtraction by

(x, y) - (X, Y) = (x - X, y - Y),

and multiplication by

(x, y)(X, Y) = (xX - yY, xY + yX).

We henceforth write any pair as x + iy. We call x the real part and y
the imaginary part. When working with complex numbers, we draw them as
points of the xy-plane, which we call the complex plane. Complex numbers are
associative, commutative and distributive, and every nonzero complex number
z = x + iy has a reciprocal:

1/z = (x - iy)/(x^2 + y^2).

You can easily check all of this, but you may assume it if you prefer.
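To see that the rules really are just arithmetic on pairs, here is a small sketch (not from the text) implementing them directly in Python; Python's built-in complex type already behaves the same way, so this is only an illustration.

    # Complex arithmetic on pairs (x, y), following Definition 15.1.

    def add(z, w):
        (x, y), (X, Y) = z, w
        return (x + X, y + Y)

    def multiply(z, w):
        (x, y), (X, Y) = z, w
        return (x * X - y * Y, x * Y + y * X)

    def reciprocal(z):
        x, y = z
        d = x * x + y * y            # assumes z is nonzero
        return (x / d, -y / d)

    i = (0.0, 1.0)
    print(multiply(i, i))                                     # (-1.0, 0.0), i.e. i^2 = -1
    print(multiply((1.0, 1.0), reciprocal((1.0, 1.0))))       # (1.0, 0.0)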

15.2 Polar Coordinates

Trigonometry tells us that any point (x, y) of the plane can be written as
x = r cos θ, y = r sin θ in polar coordinates. Therefore

x + iy = r cos θ + i r sin θ.



The number r is called the modulus of the complex number (written |z| if z
is the complex number). The angle θ is called the argument of the complex
number (written arg z if z is the complex number). The modulus is the distance
from 0, or the length if we think of (x, y) as a vector.

Theorem 15.2 (de Moivre). Under multiplication of complex numbers, moduli
multiply, while arguments add. Under division of complex numbers, moduli
divide, while arguments subtract.

Proof. Let z and w be two complex numbers. Write them as z = r cos α +
i r sin α and w = ρ cos β + i ρ sin β. Then calculate

zw = rρ [(cos α cos β - sin α sin β) + i (sin α cos β + cos α sin β)].

A trigonometric identity:

cos α cos β - sin α sin β = cos(α + β),
sin α cos β + cos α sin β = sin(α + β).

Division is similar.
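A quick numerical check of de Moivre's theorem (not from the text), using Python's standard cmath module:

    import cmath

    z = 1 + 2j
    w = 3 - 1j

    # Moduli multiply:
    print(abs(z * w), abs(z) * abs(w))          # equal (up to rounding)

    # Arguments add (compared modulo 2*pi by comparing points on the unit circle):
    lhs = cmath.phase(z * w)
    rhs = cmath.phase(z) + cmath.phase(w)
    print(cmath.isclose(cmath.exp(1j * lhs), cmath.exp(1j * rhs)))   # True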

Problem 15.1. Explain why every complex number has a square root.

Definition 15.3. The conjugate z̄ of a complex number z = x + iy is the number
z̄ = x - iy.

Problem 15.2. If z is a complex number, prove that |z|^2 = z z̄.

We write C for the set of complex numbers, and C^n for the set of vectors

z = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix}

with each of z_1, z_2, . . . , z_n a complex number. We won't go through the effort
of translating the theorems above into complex linear algebra, except to say
that all of the results before chapter 13 (on inner products) are still true for
complex matrices, with identical proofs.

15.2 Review Problems

Problem 15.3. Draw the point z = -1/2 + (1/3) i on the plane. Draw z̄, 2z, z^2, 1/z.

Problem 15.4. Draw w = 1 + 2i and z = 2 + i, and draw z + w, zw, z/w.



Problem 15.5. The unit disk in the complex plane is the set of complex
numbers of modulus less than 1. The unit circle is the set of complex numbers
of modulus 1. Draw the unit circle. Pick z a nonzero complex number, and
consider its integer powers z^n (n ranging over the integers). Prove that either
infinitely many of these powers lie inside the unit disk, or all of them lie on the
unit circle.

Problem 15.6. Let

z_k = cos(2πk/n) + i sin(2πk/n).

Use de Moivre's theorem to show that z_k^n = 1. These are the so-called nth roots
of 1. Why are they all different for k = 0, 1, . . . , n - 1? Draw the 3rd roots of 1
and (in another colour) the 4th roots of 1.

Problem 15.7. With pictures and words, explain what you know about
a. z + z̄,
b. z^2 if |z| > 1,
c. z̄ if |z| = 1,
d. zw if |z| = |w| = 1.

15.3 Complex Linear Algebra

The main differences between real and complex linear algebra are (1) eigenvalues
and (2) inner products. We will consider inner products soon, but first let's
consider eigenvalues.

Theorem 15.4. Every square complex matrix has a complex eigenvalue.

Proof. The eigenvalues are the roots of the characteristic polynomial det(A - λI),
a polynomial with complex number coefficients in a complex variable λ. The
existence of a root of any complex polynomial is proven in the appendix.

Example 15.5. It may be that there are not very many eigenvectors.

A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}

as a complex matrix still only has one eigenvalue, and (up to rescaling) one
eigenvector. Two linearly independent eigenvectors are just what we need to
diagonalize a 2 × 2 matrix. Clearly we cannot diagonalize A. So complex
numbers don't resolve all of the subtleties.

Problem 15.8. Find the (complex) eigenvalues and eigenvectors of

A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}



and write down a matrix F which diagonalizes A. Moral of the story: even a
matrix like A, which has only real number entries, can have complex number
eigenvalues, and complex eigenvectors.

15.4 Hermitian Inner Product

The spectral theorem for symmetric matrices breaks down:

Problem 15.9. Prove that the symmetric complex matrix

\begin{pmatrix} 1 & i \\ i & -1 \end{pmatrix}

is not diagonalizable.

We will find a complex spectral theorem, but with a different concept
replacing symmetric matrices.

The equation |z|^2 = z z̄ is very important. Think of a complex number as if
it were a vector in the plane:

z = x + iy = \begin{pmatrix} x \\ y \end{pmatrix}.

Then |z|^2 = z z̄ is the squared length x^2 + y^2. This motivates:

Definition 15.6. The Hermitian inner product of two vectors z and w in C^n is
the complex number

〈z, w〉 = z_1 w̄_1 + z_2 w̄_2 + · · · + z_n w̄_n.

The curious bars on top of the w terms allow us to write ‖z‖^2 = 〈z, z〉, just
as we would for real vectors. Warning: the Hermitian inner product 〈z, w〉 is a
complex number, not the real number we had in inner products before.
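A small computational sketch (not from the text, NumPy assumed); note the conjugation on the entries of the second vector, exactly as in the definition:

    # The Hermitian inner product <z, w> = sum of z_j * conj(w_j).
    import numpy as np

    def hermitian_inner_product(z, w):
        return np.sum(z * np.conj(w))

    z = np.array([2.0, 1j])
    w = np.array([1.0, 1 - 1j])

    print(hermitian_inner_product(z, w))            # <z, w>, a complex number
    print(hermitian_inner_product(z, z).real)       # ||z||^2 = 5, real and nonnegative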

Problem 15.10. Compute 〈z, w〉, ‖z‖ and ‖w‖ for

z = \begin{pmatrix} 1 \\ i \end{pmatrix}, \quad w = \begin{pmatrix} i \\ 2 + 2i \end{pmatrix}.

Problem 15.11. Prove that
a. 〈w, z〉 = \overline{〈z, w〉}
b. 〈cz, w〉 = c 〈z, w〉
c. 〈z + w, u〉 = 〈z, u〉 + 〈w, u〉
d. 〈z, z〉 ≥ 0
e. 〈z, z〉 = 0 just when z = 0
for z, w and u any complex vectors in C^n and c any complex number.



15.5 Adjoint of a Matrix

Definition 15.7. The adjoint A* of a matrix A is the matrix whose entries are
A*_{ij} = Ā_{ji} (the conjugate of the transpose).

Note that (A*)* = A.

Problem 15.12. Prove that

〈Az, w〉 = 〈z, A* w〉

for any vectors z and w (if one side is defined, then they both are and they are
equal).

Problem 15.13. Prove that if some matrices A and B satisfy 〈Az, w〉 = 〈z, Bw〉
for any vectors z and w for which this is defined, then B = A*.
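The identity of problem 15.12 is easy to check numerically. A sketch (not from the text, NumPy assumed), with the adjoint computed as the conjugate transpose:

    # The adjoint as conjugate transpose, and a check of <Az, w> = <z, A* w>.
    import numpy as np

    def adjoint(A):
        return A.conj().T

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    z = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

    inner = lambda u, v: np.sum(u * np.conj(v))   # the Hermitian inner product above
    print(np.isclose(inner(A @ z, w), inner(z, adjoint(A) @ w)))   # True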

15.6 Self-adjoint Matrices

Definition 15.8. A complex matrix A is self-adjoint if A = A*.

This is the complex analogue of a symmetric matrix. Clearly self-adjoint
matrices are square.

Problem 15.14. Prove that sums and differences of self-adjoint matrices are
self-adjoint, and that any real multiple of a self-adjoint matrix is self-adjoint.

Example 15.9. The matrices

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}

are self-adjoint.

Problem 15.15. Which diagonal matrices are self-adjoint?

Problem 15.16. Prove that the eigenvalues of a self-adjoint matrix are real
numbers.

15.6 Review Problems

Problem 15.17. A matrix A is called skew-adjoint if A* = -A. Prove that a
matrix A is skew-adjoint just when iA is self-adjoint, and vice versa.
matrix A is skew-adjoint just when iA is self-adjoint, and vice versa.



15.7 Unitary Matrices

Definition 15.10. A complex matrix A is unitary if A* = A^{-1}.

This is the complex analogue of an orthogonal matrix. Clearly every unitary
matrix is square.

Problem 15.18. Prove that a matrix is unitary just when

〈Az, Aw〉 = 〈z, w〉

for any vectors z and w.

Problem 15.19. Which diagonal matrices are unitary?

Problem 15.20. If A is a real orthogonal matrix, prove that A is also unitary.

15.7 Review Problems

Problem 15.21. Prove that the eigenvalues of a unitary matrix are complex
numbers of modulus 1.

15.8 Unitary Bases

Definition 15.11. A unitary basis of C^n is a complex basis u_1, u_2, . . . , u_n for
which

〈u_p, u_q〉 = 1 if p = q, and 0 if p ≠ q.

It may be helpful to use letters like p and q as subscripts, rather than i, j,
to avoid confusion with the complex number i.

Problem 15.22. Any complex basis v_1, v_2, . . . , v_n determines a unitary basis
u_1, u_2, . . . , u_n by the complex Gram-Schmidt process:

w_p = v_p - \sum_{q < p} 〈v_p, u_q〉 u_q,   u_p = w_p / ‖w_p‖.

Prove that u_1, u_2, . . . , u_n really is a unitary basis.



15.9 Normal Matrices

Definition 15.12. A complex matrix A is normal if AA* = A*A.

Problem 15.24. Prove that self-adjoint, skew-adjoint and unitary matrices are
normal.

Problem 15.25. Which diagonal matrices are normal?

Problem 15.26. If A is normal, and c is a constant, prove that A + c1 is also
normal.

Lemma 15.13. If A is normal, and Az = 0 for some vector z, then A*z = 0.

Proof. Suppose that Az = 0. Then

0 = ‖Az‖^2
  = 〈Az, Az〉
  = 〈z, A*Az〉
  = 〈z, AA*z〉
  = 〈A*z, A*z〉.

So A*z = 0.

Lemma 15.14. If A is normal, then every eigenvector z of A with eigenvalue
λ is also an eigenvector of A*, but with eigenvalue λ̄.

Proof. Let B = A - λ. Then Bz = 0. Moreover, B is normal since A is. By
the previous lemma, B*z = 0, so (A* - λ̄) z = 0.

15.10 The Spectral Theorem for Normal Matrices

At last, the complex version of our main theorem: normal matrices are diagonal
after a unitary change of variables.

Theorem 15.15. A square matrix is normal just when it is unitarily diagonalizable.

Proof. Let A be unitarily diagonalizable. So F*AF = Λ is diagonal. Then
A = F Λ F*, and one easily checks that AA* = A*A.

Let A be a normal matrix. Pick any eigenvector u_1 of A, say with eigenvalue
λ. Scale u_1 to be a unit vector. Pick unit vectors u_2, u_3, . . . , u_n so that
u_1, u_2, u_3, . . . , u_n is a unitary basis. Let F be the associated unitary change of
basis matrix,

F = (u_1  u_2  . . .  u_n).



Replace A by F*AF. After replacement A is still normal, and Ae_1 = λe_1; the
first column of A is λe_1. So

A = \begin{pmatrix} λ & B \\ 0 & C \end{pmatrix}

for some smaller matrices B and C. By lemma 15.14, A*e_1 = λ̄e_1, so the first
column of A* is λ̄e_1, and so B = 0. Moreover, C is also normal, so by induction
we can unitarily diagonalize C, and therefore A.

Corollary 15.16. Self-adjoint, skew-adjoint and unitary matrices are unitarily
diagonalizable.

Problem 15.27. Let

A = \begin{pmatrix} 7/2 & i/2 \\ -i/2 & 7/2 \end{pmatrix}.

a. Is A self-adjoint or skew-adjoint?
b. Find the eigenvalues and eigenvectors of A.
c. Find a unitary matrix F which diagonalizes A, and unitarily diagonalize A.
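Self-adjoint matrices can be unitarily diagonalized numerically. A sketch (not from the text, NumPy assumed); numpy.linalg.eigh accepts a self-adjoint complex matrix and returns real eigenvalues and a unitary matrix of eigenvectors. The matrix below is my own example, chosen so as not to give away problem 15.27.

    # Unitary diagonalization of a self-adjoint matrix.
    import numpy as np

    A = np.array([[2.0, 1j],
                  [-1j, 2.0]])               # self-adjoint: equal to its conjugate transpose

    eigenvalues, F = np.linalg.eigh(A)       # real eigenvalues, unitary F

    print(np.allclose(F.conj().T @ F, np.eye(2)))                    # F is unitary
    print(np.allclose(F.conj().T @ A @ F, np.diag(eigenvalues)))     # F* A F is diagonal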

15.11 Appendix: The Fundamental Theorem of Algebra

There was one missing ingredient in the proof of the complex spectral theorem:
we need to know that complex polynomials have zeros. This gap is filled in here.
Warning: students are not required or expected to work through this appendix,
which is advanced and is included only for completeness.

Lemma 15.17. Take

p(z) = a_0 + a_1 z + · · · + a_n z^n

any nonconstant polynomial. In order to keep p(z) at as large a modulus as we
like, we only have to keep z at a large enough modulus.

Proof. We can assume that a_n ≠ 0. Write

p(z)/z^n = a_0/z^n + a_1/z^{n-1} + · · · + a_n.

All of the terms get as small as we like, for z of large modulus, except the last
one. So for large enough z, p(z)/z^n is close to a_n. Since z^n has large modulus,
p(z) must as well.

Corollary 15.18. For any polynomial p(z), there must be a point z = z_0 at
which p(z) has smallest modulus.



Proof. By lemma 15.17, if we choose a large enough disk containing z_0 then
p(z) has large modulus at each point z around the edge of that disk. Making
the disk even larger if need be, we can ensure that the modulus all around the
edge is larger than at some chosen point inside the disk. By theorem 14.21 on
page 147, there is a point of the disk where p(z) has minimum modulus among
all points of that disk. The minimum can't be on the edge. Moreover, the
modulus stays large as we move past the edge. Thus any minimum modulus
point in that large disk is a minimum modulus point among all points of the
plane.

Lemma 15.19. The modulus |p(z)| of any nonconstant polynomial function
reaches a minimum just where p(z) reaches zero.

Proof. Take any point z_0. Suppose that p(z_0) ≠ 0, and let's find a reason why
z_0 is not a minimum modulus point. Replace p(z) by p(z - z_0) if needed, to
arrange that z_0 = 0. Write out

p(z) = a_0 + a_1 z + a_2 z^2 + · · · + a_n z^n.

It might happen that a_1 = 0, and maybe a_2 too. So write

p(z) = a_0 + a_k z^k + · · · + a_n z^n,

writing down only the nonzero terms, in increasing order of their power of z.
Clearly a_0 ≠ 0 because p(0) ≠ 0. We can divide by a_0 if we wish, which
alters modulus only by a positive factor, so let's assume that a_0 = 1. We can
rotate the z variable, and rescale it, which rotates and scales each coefficient.
Thereby arrange a_k = -1, so

p(z) = 1 - z^k + · · · + a_n z^n.

Calculate

|p(z)|^2 = p(z) \overline{p(z)}
         = 1 - z^k - z̄^k + . . . ,

where the dots indicate terms involving more z and z̄ factors. Write z =
r cos θ + i r sin θ. De Moivre's theorem gives

|p(z)|^2 = 1 - 2 r^k cos kθ + r^{k+1} (. . . ).

The error term (. . . ) is some (probably very complicated) polynomial in r with
(complicated) coefficients involving cos θ and sin θ. We don't need to work it
out. We only need to know that it is bounded for z near enough to 0, which is
clear whatever the terms involved are. For r > 0 sufficiently small,

2 - r (. . . ) > 0.

Multiplying by -r^k,

-2 r^k + r^{k+1} (. . . ) < 0.

Therefore |p(z)|^2 gets even smaller at the point z = r than at z = 0.



Corollary 15.20. Every nonconstant complex polynomial has a root.

Theorem 15.21 (Fundamental Theorem of Algebra). Every nonconstant complex
polynomial p(z) can be factored into linear factors. More specifically,

p(z) = c (z - z_1)^{d_1} (z - z_2)^{d_2} · · · (z - z_k)^{d_k}

where c is a constant, z_1, z_2, . . . , z_k are the roots of p(z), and d_1, d_2, . . . , d_k are
positive integers, with sum d_1 + d_2 + · · · + d_k equal to the degree of p(z).

Proof. We have a root, say z_1. Therefore p(z)/(z - z_1) is a polynomial, and
we apply induction.


Abstraction


16 Vector Spaces

The ideas of linear algebra apply more widely, in more abstract spaces than R^n.

16.1 Definition

Definition 16.1. A vector space V is a set (whose elements are called vectors)
equipped with two operations, addition (written +) and scaling (written ·), so
that
a. Addition laws:
   a) u + v is in V
   b) (u + v) + w = u + (v + w)
   c) u + v = v + u
   for any vectors u, v, w in V,
b. Zero laws:
   a) There is a vector 0 in V so that 0 + v = v for any vector v in V.
   b) For each vector v in V, there is a vector w in V, for which v + w = 0.
c. Scaling laws:
   a) av is in V
   b) 1 · v = v
   c) a(bv) = (ab)v
   d) (a + b)v = av + bv
   e) a(u + v) = au + av
   for any real numbers a and b, and any vectors u and v in V.

Because (u + v) + w = u + (v + w), we never need parentheses in adding up
vectors.

Example 16.2. R^n is a vector space, with the usual addition and scaling.

Example 16.3. The set V of all real-valued functions of a real variable is a vector
space: we can add functions (f + g)(x) = f(x) + g(x), and scale functions:
(c f)(x) = c f(x). This example is the main motivation for developing an
abstract theory of vector spaces.

Example 16.4. Take some region inside R^n, like a box, or a ball, or several
boxes and balls glued together. Let V be the set of all real-valued functions of
that region. Unlike R^n, which comes equipped with the standard basis, there
is no "standard basis" of V. By this, we mean that there is no collection of


functions f_i we know how to write down so that every function f is a unique
linear combination of the f_i. Even still, we can generalize a lot of ideas about
linear algebra to various spaces like V instead of just R^n. Practically speaking,
there are only two types of vector spaces that we ever encounter: R^n (and its
subspaces) and the space V of real-valued functions defined on some region in
R^n (and its subspaces).

Example 16.5. The set R^{p×q} of p × q matrices is a vector space, with usual
matrix addition and scaling.

Problem 16.1. If V is a vector space, prove that
a. 0 v = 0 for any vector v, and
b. a 0 = 0 for any scalar a.

Problem 16.2. Let V be the set of real-valued polynomial functions of a real
variable. Prove that V is a vector space, with the usual addition and scaling.

Problem 16.3. Prove that there is a unique vector w for which v + w = 0.
(Let's always call that vector -v.) Prove also that -v = (-1)v.

We will write u - v for u + (-v) from now on. We define linear relations,
linear independence, bases, subspaces, bases of subspaces, and dimension using
exactly the same definitions as for R^n.

Remark 16.6. Thinking as much as possible in terms of abstract vector spaces
saves a lot of hard work. We will see many reasons why, but the first is that
every subspace of any vector space is itself a vector space.

16.1 Review Problems

Problem 16.4. Prove that if u + v = u + w then v = w.

Problem 16.5. Imagine that the population p_j at year j is governed (at least
roughly) by some equation

p_{j+1} = a p_j + b p_{j-1} + c p_{j-2}.

Prove that for fixed a, b, c, the set of all sequences . . . , p_1, p_2, . . . which satisfy
this law is a vector space.

Problem 16.6. Give examples of subsets of the plane
a. invariant under scaling of vectors (sending u to au for any number a),
but not under addition of vectors. (In other words, if you scale vectors
from your subset, they have to stay inside the subset, but if you add some
vectors from your subset, you don't always get a vector from your subset.)
b. invariant under addition but not under scaling or subtraction.
c. invariant under addition and subtraction but not scaling.

Problem 16.7. Take positive real numbers and "add" by the law u ⊕ v = uv
and "scale" by a ⊙ u = u^a. Prove that the positive numbers form a vector space
with these funny laws for addition and multiplication.



16.2 Linear Maps

Definition 16.7. A linear map between vector spaces U and V is a rule which
associates to each vector x from U a vector y from V (which we write as
y = T x), for which
a. T(x_0 + x_1) = T x_0 + T x_1
b. T(ax) = a T x
for any vectors x_0, x_1 and x in U and real number a. We will write T : U → V
to mean that T is a linear map from U to V.

Example 16.8. Let U be the vector space of all real-valued functions of a real
variable. Imagine 16 scientists standing one at each kilometer along a riverbank,
each measuring the height of the river at the same time. The height at that
time is a function h of how far you are along the bank. The 16 measurements
of the function, say h(1), h(2), . . . , h(16), sit as the entries of a vector in R^16.
So we have a map T : U → R^16, given by sampling values of functions h(x) at
various points x = 1, x = 2, . . . , x = 16:

T h = \begin{pmatrix} h(1) \\ h(2) \\ \vdots \\ h(16) \end{pmatrix}.

This T is a linear map, possibly the most important type of map in all of
science.

Example 16.9. Any p × q matrix A determines a linear map T : R^q → R^p, by
the equation T x = Ax. Conversely, given a linear map T : R^q → R^p, define a
p × q matrix A by letting the j-th column of A be T e_j. Then T x = Ax. We
say that A is the matrix associated to T. In this way we can identify the space
of linear maps T : R^q → R^p with the space of p × q matrices.
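Here is a small sketch of this correspondence (not from the text; Python with NumPy assumed, and the particular map T is my own invented example): the matrix associated to T is assembled column by column from the values T e_j.

    # Recover the matrix of a linear map T : R^q -> R^p from its values on e_1, ..., e_q.
    import numpy as np

    def matrix_of(T, q):
        """Return the p x q matrix whose j-th column is T(e_j)."""
        columns = [T(np.eye(q)[:, j]) for j in range(q)]
        return np.column_stack(columns)

    # A hypothetical linear map from R^3 to R^2, written without a matrix:
    T = lambda x: np.array([x[0] + 2 * x[1], 3 * x[2] - x[0]])

    A = matrix_of(T, 3)
    x = np.array([1.0, 2.0, 3.0])
    print(np.allclose(A @ x, T(x)))    # True: T x = A x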

Example 16.10. There is an obvious linear map 1 : V → V given by 1v = v
for any vector v in V, called the identity map. We could easily confuse the
number 1 with the identity map 1, once again a deliberate ambiguity. Similarly,
we will write 2 : V → V for the map which associates to each vector v the
vector 2v, etc.

Definition 16.11. If S : U → V and T : V → W are linear maps, then
T S : U → W is their composition.

Problem 16.8. Prove that if A is the matrix associated to a linear map
S : R^p → R^q and B the matrix associated to T : R^q → R^r, then BA is the
matrix associated to their composition.

Remark 16.12. From now on, we won't distinguish a linear map T : R^q → R^p
from its associated matrix, which we will also write as T. Once again, deliberate
ambiguity has many advantages.
ambiguity has many advantages.



Remark 16.13. A linear map between abstract vector spaces doesn't have an
associated matrix; this idea only makes sense for maps T : R^q → R^p.

Example 16.14. Let U and V be two vector spaces. The set W of all linear
maps T : U → V is a vector space: we add linear maps by (T_1 + T_2)(u) =
T_1(u) + T_2(u), and scale by (cT)u = c T u.

Definition 16.15. The kernel of a linear map T : U → V is the set of vectors
u in U for which T u = 0. The image is the set of vectors v in V of the form
v = T u for some u in U.

Definition 16.16. A linear map T : U → V is an isomorphism if
a. T x = T y just when x = y (one-to-one) for any x and y in U, and
b. for any z in V, there is some x in U for which T x = z (onto).

Two vector spaces U and V are called isomorphic if there is an isomorphism
between them.

Being isomorphic means effectively being the same for purposes of linear
algebra.

Problem 16.9. Prove that a linear map T : U → V is an isomorphism just
when its kernel is 0, and its image is V.

Problem 16.10. Let V be a vector space. Prove that 1 : V → V is an
isomorphism.

Problem 16.11. Prove that an isomorphism T : U → V has a unique inverse
map T^{-1} : V → U so that T^{-1} T = 1 and T T^{-1} = 1, and that T^{-1} is linear.

Problem 16.12. Let V be the set of polynomials of degree at most 2, and map
T : V → R^3 by, for any polynomial p,

T p = \begin{pmatrix} p(0) \\ p(1) \\ p(2) \end{pmatrix}.

Prove that T is an isomorphism.

16.2 Review Problems

Problem 16.13. Prove that the space of all p × q matrices is isomorphic to
R^{pq}.



16.3 Subspaces

The definition of a subspace is identical to that for R^n.

Example 16.17. Let V be the set of real-valued functions of a real variable. The
set P of continuous real-valued functions of a real variable is a subspace of V.

Example 16.18. Let V be the set of all infinite sequences of real numbers. We
add a sequence x_1, x_2, x_3, . . . to a sequence y_1, y_2, y_3, . . . to make the sequence
x_1 + y_1, x_2 + y_2, x_3 + y_3, . . . . We scale a sequence by scaling each entry. The
set of convergent infinite sequences of real numbers is a subspace of V.

In these last two examples, we see that a large part of analysis is encoded
into subspaces of infinite dimensional vector spaces. (We will define dimension
shortly.)

Problem 16.14. Describe some subspaces of the space of all real-valued functions
of a real variable.

16.3 Review Problems

Problem 16.15. Which of the following are subspaces of the space of real-valued
functions of a real variable?
a. The set of everywhere positive functions.
b. The set of nowhere positive functions.
c. The set of functions which are positive somewhere.
d. The set of polynomials which vanish at the origin.
e. The set of increasing functions.
f. The set of functions f(x) for which f(-x) = f(x).
g. The set of functions f(x) each of which is bounded from above and below
by some constant functions.

Problem 16.16. Which of the following are subspaces of the vector space of all
3 × 3 matrices?
a. The invertible matrices.
b. The noninvertible matrices.
c. The matrices with positive entries.
d. The upper triangular matrices.
e. The symmetric matrices.
f. The orthogonal matrices.

Problem 16.17.
a. Let H be an n × n matrix. Let P be the set of all matrices A for which
AH = HA. Prove that P is a subspace of the space V of all n × n
matrices.
matrices.



b. Describe this subspace P for

H = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.

16.4 Bases

We define linear combinations, linear relations, linear independence, bases, the
span of a set of vectors, eigenvalues, and eigenvectors identically.

Problem 16.18. Find bases for the following vector spaces:
a. The set of polynomial functions of degree 3 or less.
b. The set of 3 × 2 matrices.
c. The set of n × n upper triangular matrices.
d. The set of polynomial functions p(x) of degree 3 or less which vanish at
the origin x = 0.

Remark 16.19. When working with an abstract vector space V, the role that
has up to now been played by a change of basis matrix will henceforth be played
by an isomorphism F : R^n → V. Equivalently, F e_1, F e_2, . . . , F e_n is a basis of
V.

Example 16.20. Let V be the vector space of polynomials p(x) = a + bx + cx^2
of degree at most 2. Let F : R^3 → V be the map

F \begin{pmatrix} a \\ b \\ c \end{pmatrix} = a + bx + cx^2.

Clearly F is an isomorphism.

Definition 16.21. The dimension of a vector space V is n if there is an isomorphism
F : R^n → V. If there is no such value of n, then we say that V has
infinite dimension.

Remark 16.22. We can include the possibility that n = 0 by defining R^0 to
consist in just a single vector 0, a zero dimensional vector space.

Problem 16.19. Prove that the definition of dimension is well-defined, i.e. that
there is either only one such value of n, or no such value of n.

Problem 16.20. Let V be the set of polynomials of degree at most p in n
variables. Find the dimension of V.

Problem 16.21. Prove that if linear maps satisfy P S = T and P is an isomorphism,
then S and T have the same kernel, and isomorphic images.



Problem 16.22. Prove that if linear maps satisfy SP = T, and P is an
isomorphism, then S and T have the same image and isomorphic kernels.

Problem 16.23. Prove that dimension is invariant under isomorphism.

Problem 16.24. Prove that for any subspace Z of a finite dimensional vector
space U, there is a basis for U

z_1, z_2, . . . , z_p, u_1, u_2, . . . , u_q

so that

z_1, z_2, . . . , z_p

form a basis for Z.

Theorem 16.23. If v_1, v_2, . . . , v_n is a basis for a vector space V, and w_1, w_2, . . . , w_n
are any vectors in a vector space W, then there is a unique linear map
T : V → W so that T v_i = w_i.

Proof. If there were two such maps, say S and T, then S - T would vanish on
v_1, v_2, . . . , v_n, and therefore by linearity would vanish on any linear combination
of v_1, v_2, . . . , v_n, therefore on any vector, so S = T.

To see that there is such a map, we know that each vector x in V can be
written uniquely as

x = x_1 v_1 + x_2 v_2 + · · · + x_n v_n.

So let's define

T x = x_1 w_1 + x_2 w_2 + · · · + x_n w_n.

If we take two vectors, say x and y, and write them as linear combinations of
basis vectors, say with

x = x_1 v_1 + x_2 v_2 + · · · + x_n v_n,
y = y_1 v_1 + y_2 v_2 + · · · + y_n v_n,

then

T(x + y) = (x_1 + y_1) w_1 + (x_2 + y_2) w_2 + · · · + (x_n + y_n) w_n
         = T x + T y.

Similarly, if we scale a vector x by a number a, then

ax = a x_1 v_1 + a x_2 v_2 + · · · + a x_n v_n,



so that

T(ax) = a x_1 w_1 + a x_2 w_2 + · · · + a x_n w_n
      = a T x.

Therefore T is linear.

Definition 16.24. If T : U → V is a linear map, and W is a subspace of U, the
restriction, written T|_W : W → V, is the linear map defined by T|_W (w) = T w
for w in W, only allowing vectors from W to map through T.

Theorem 16.25. Let T : U → V be a linear transformation of finite dimensional
vector spaces. Then

dim ker T + dim im T = dim U.

Proof. Problem 16.24 on the previous page shows that we can pick a basis
z_1, z_2, . . . , z_p, u_1, u_2, . . . , u_q for U so that z_1, z_2, . . . , z_p is a basis for ker T. Let
w_1 = T u_1, w_2 = T u_2, . . . , w_q = T u_q. Every vector in im T can be written as
y = T x for some vector x in U. But then x can be written in terms of the basis,
say as

x = a_1 z_1 + a_2 z_2 + · · · + a_p z_p + b_1 u_1 + b_2 u_2 + · · · + b_q u_q.

So

y = T x
  = a_1 T z_1 + a_2 T z_2 + · · · + a_p T z_p + b_1 T u_1 + b_2 T u_2 + · · · + b_q T u_q
  = b_1 T u_1 + b_2 T u_2 + · · · + b_q T u_q
  = b_1 w_1 + b_2 w_2 + · · · + b_q w_q.

Therefore w_1, w_2, . . . , w_q span im T. If there is a linear relation between
w_1, w_2, . . . , w_q, say

0 = c_1 w_1 + c_2 w_2 + · · · + c_q w_q,

then

0 = T(c_1 u_1 + c_2 u_2 + · · · + c_q u_q).

Therefore c_1 u_1 + c_2 u_2 + · · · + c_q u_q lies in ker T, and so can be written in terms
of the basis z_1, z_2, . . . , z_p of ker T, say as

c_1 u_1 + c_2 u_2 + · · · + c_q u_q = d_1 z_1 + d_2 z_2 + · · · + d_p z_p,

a linear relation among the vectors of our basis of U, impossible. So w_1, w_2, . . . , w_q
are linearly independent, and so form a basis of im T. Finally, we have a basis
of U with p + q vectors in it, a basis of ker T with p vectors in it, and a basis of
im T with q vectors in it, so dim ker T + dim im T = dim U.
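The theorem is easy to check numerically for maps given by matrices. A sketch (not from the text) assuming NumPy and SciPy; scipy.linalg.null_space returns an orthonormal basis of the kernel.

    # Numerical check of dim ker T + dim im T = dim U for T x = A x.
    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])              # a 2 x 3 matrix, so dim U = 3

    dim_image = np.linalg.matrix_rank(A)          # dimension of the image of T
    dim_kernel = null_space(A).shape[1]           # dimension of the kernel of T

    print(dim_kernel + dim_image == A.shape[1])   # True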

Remark 16.26. In this theorem, we could also allow U, V, or both to have infinite
dimension, as well as allowing kernel and image to have infinite dimension, with
the understanding that ∞ + ∞ = ∞. However, in this book we will content
ourselves with finite dimensional vector spaces.



16.5 Determinants

Definition 16.27. If T : V → V is a linear map taking a finite dimensional
vector space to itself, define det T to be

det T = det A,

where F : R^n → V is an isomorphism, and A is the matrix associated to
F^{-1} T F : R^n → R^n.

Remark 16.28. There is no definition of determinant for a linear map of an
infinite dimensional vector space, and there is no general theory to handle such
things, although there are many important examples.

Remark 16.29. A map T : U → V between different vector spaces doesn't have
a determinant.

Problem 16.25. Prove that the value of the determinant is independent of the
choice of isomorphism F.
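To illustrate the definition (not from the text; NumPy assumed, and the map used here is my own example, not one of the exercises): take V to be the polynomials of degree at most 2, the isomorphism F(a, b, c) = a + b x + c x^2, and the shift T p(x) = p(x + 1); then det T is the determinant of the matrix of F^{-1} T F.

    # det T computed via a choice of isomorphism F : R^3 -> V.
    import numpy as np

    # Columns: coordinates of T(1), T(x), T(x^2) in the basis 1, x, x^2.
    # T(1) = 1, T(x) = 1 + x, T(x^2) = (x + 1)^2 = 1 + 2x + x^2.
    A = np.array([[1.0, 1.0, 1.0],
                  [0.0, 1.0, 2.0],
                  [0.0, 0.0, 1.0]])

    print(np.linalg.det(A))    # det T = 1, independent of the choice of basis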

Problem 16.26. Let V be the vector space of polynomials of degree at most 2,
and let T : V → V be the linear map T p(x) = 2p(x - 1) (shifting a polynomial
p(x) to 2p(x - 1)). For example, T 1 = 2, T x = 2(x - 1), T x^2 = 2(x - 1)^2.
a. Prove that T is a linear map.
b. Prove that T is an isomorphism.
c. Find det T.

16.5 Review Problems

Problem 16.27. Let T : V → V be the linear map T x = 2x. Suppose that V
has dimension n. What is det T?

Problem 16.28. Let V be the vector space of all 2 × 2 matrices. Let A be
a 2 × 2 matrix with two different eigenvalues, λ_1 and λ_2, and eigenvectors x_1
and x_2 corresponding to these eigenvalues. Consider the linear map T : V → V
given by T B = AB (multiplying B on the left by A). What are the eigenvalues
of T and what are the eigenvectors? (Warning: the eigenvectors are vectors
from V, so they are matrices.) What is det T?

Problem 16.29. The same but let T B = BA.



Problem 16.30. Let V be the vector space of polynomials of degree at most 2,
and let T : V → V be defined by T q(x) = q(-x). What is the characteristic
polynomial of T? What are the eigenspaces of T? Is T diagonalizable?

Problem 16.31. (Due to Peter Lax [6].) Consider the problem of finding a
polynomial p(x) with specified average values on each of a dozen intervals on
the x-axis. (Suppose that the intervals don't overlap.) Does this problem have
a solution? Does it have many solutions? (All you need is a naive notion of
average value, but you can consult a calculus book, for example [14], for a
precise definition.)
(a) For each polynomial p of degree n, let T p be the vector whose entries are
the averages. Suppose that the number of intervals is at least n. Show that
T p = 0 only if p = 0.
(b) Suppose that the number of intervals is no more than n. Show that we can
solve T p = b for any given vector b.

Problem 16.32. How much of the "nutshell" (table 12.1 on page 114) can you
translate into criteria for invertibility of a linear map T : U → V? How much
more if we assume that U and V are finite dimensional? How much more if we
assume that U = V?

16.6 Complex Vector Spaces

If we change the definition of a vector space, a linear map, etc. to use complex
numbers instead of real numbers, we have a complex vector space, complex
linear map, etc. All of the examples so far in this chapter work just as well with
complex numbers replacing real numbers. We will refer to a real vector space
or a complex vector space to distinguish the sorts of numbers we are using to
scale the vectors. Some examples of complex vector spaces:
a. C^n
b. The space of p × q matrices with complex entries.
c. The space of complex-valued functions of a real variable.
d. The space of infinite sequences of complex numbers.

16.7 Inner Product Spaces

Definition 16.30. An inner product on a real vector space V is a choice of a real number ⟨x, y⟩ for each pair of vectors x and y so that

a. ⟨x, y⟩ is a real-valued linear map in x for each fixed y

b. ⟨x, y⟩ = ⟨y, x⟩

c. ⟨x, x⟩ ≥ 0 and equal to 0 just when x = 0.

A real vector space equipped with an inner product is called an inner product space. A linear map between inner product spaces is called orthogonal if it preserves inner products.

Theorem 16.31. Every inner product space of dimension n is carried by some orthogonal isomorphism to R^n with its usual inner product.

Proof. Use the Gram–Schmidt process to construct an orthonormal basis, using the same formulas we have used before, say u_1, u_2, ..., u_n. Define a linear map F x = x_1 u_1 + ··· + x_n u_n, for x in R^n. Clearly F is an orthogonal isomorphism.

Example 16.32. Take A any symmetric n × n matrix with positive eigenvalues, and let ⟨x, y⟩_A = ⟨Ax, y⟩ (with the usual inner product on R^n appearing on the right hand side). Then the expression ⟨x, y⟩_A is an inner product. Therefore, by the theorem, we can find a change of variables taking it to the usual inner product.
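Here is a small numerical sketch (not from the text) of example 16.32 and theorem 16.31: running Gram–Schmidt with respect to ⟨x, y⟩_A = ⟨Ax, y⟩ produces a basis u_1, u_2 that is orthonormal for that inner product, so the map F sending the standard basis to u_1, u_2 carries the usual inner product to ⟨·,·⟩_A. The particular matrix A and the helper names are arbitrary choices for this illustration.

```python
import numpy as np

# A symmetric matrix with positive eigenvalues, defining <x, y>_A = <Ax, y>.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

def inner(x, y):
    return (A @ x) @ y

def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        w = v.copy()
        for u in basis:
            w = w - inner(w, u) * u             # subtract the projection onto u
        basis.append(w / np.sqrt(inner(w, w)))  # normalize in the <.,.>_A norm
    return basis

u1, u2 = gram_schmidt([np.array([1.0, 0.0]), np.array([0.0, 1.0])])
# u1, u2 are orthonormal for <.,.>_A, so F(e1) = u1, F(e2) = u2 is an
# orthogonal isomorphism from R^2 with its usual inner product, as in
# theorem 16.31.
print(inner(u1, u1), inner(u2, u2), inner(u1, u2))   # approximately 1, 1, 0
```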

Definition 16.33. A linear map T : V → V from an inner product space to itself is symmetric if ⟨T v, w⟩ = ⟨v, T w⟩ for any vectors v and w.

Theorem 16.34 (Spectral Theorem). Given a symmetric linear map T on a finite dimensional inner product space V, there is an orthogonal isomorphism F : R^n → V for which F^{-1}TF is the linear map of a diagonal matrix.

16.8 Hermitian Inner Product Spaces

Definition 16.35. A Hermitian inner product on a complex vector space V is a choice of a complex number ⟨z, w⟩ for each pair of vectors z and w from V so that

a. ⟨z, w⟩ is a complex-valued linear map in z for each fixed w

b. ⟨z, w⟩ = \overline{⟨w, z⟩}

c. ⟨z, z⟩ ≥ 0 and equal to 0 just when z = 0.

16.8 Review Problems

Problem 16.33. Let V be the vector space of complex-valued polynomials of a complex variable of degree at most 3. Prove that for any four distinct points z_0, z_1, z_2, z_3, the expression

\[ \langle p(z), q(z)\rangle = p(z_0)\,\overline{q(z_0)} + p(z_1)\,\overline{q(z_1)} + p(z_2)\,\overline{q(z_2)} + p(z_3)\,\overline{q(z_3)} \]

is a Hermitian inner product.

Problem 16.34. Continuing the previous question, if the points z_0, z_1, z_2, z_3 are z_0 = 1, z_1 = −1, z_2 = i, z_3 = −i, prove that the map T : V → V given by T p(z) = p(−z) is unitary.

Problem 16.35. Continuing the previous two questions, unitarily diagonalize T.

Problem 16.36. State and prove a spectral theorem for normal complex linear maps T : V → V on a Hermitian inner product space, and define the terms adjoint, normal and unitary for complex linear maps V → V.


17 Fields

Instead of real or complex numbers, we can dream up wilder notions of numbers.

Definition 17.1. A field is a set F equipped with operations + and · so that

a. Addition laws
   a) x + y is in F
   b) (x + y) + z = x + (y + z)
   c) x + y = y + x
   for any x, y and z from F.

b. Zero laws
   a) There is an element 0 of F for which x + 0 = x for any x from F
   b) For each x from F there is a y from F so that x + y = 0.

c. Multiplication laws
   a) xy is in F
   b) x(yz) = (xy)z
   c) xy = yx
   for any x, y and z in F.

d. Identity laws
   a) There is an element 1 in F for which x1 = 1x = x for any x in F.
   b) For each x ≠ 0 there is a y ≠ 0 for which xy = 1. (This y is called the reciprocal or inverse of x.)
   c) 1 ≠ 0

e. Distributive law
   a) x(y + z) = xy + xz
   for any x, y and z in F.

We will not ask the reader to check all of these laws in any of our examples, because there are just too many of them. We will only give some examples; for a proper introduction to fields, see Artin [1].

Example 17.2. Of course, the set of real numbers R is a field (with the usual addition and multiplication), as is the set C of complex numbers and the set Q of rational numbers. The set Z of integers is not a field, because the integer 2 has no integer reciprocal.

Example 17.3. Let F be the set of all rational functions p(x)/q(x), with p(x) and q(x) polynomials, and q(x) not the 0 polynomial. Clearly for any pair of rational functions, the sum

\[ \frac{p_1(x)}{q_1(x)} + \frac{p_2(x)}{q_2(x)} = \frac{p_1(x)\,q_2(x) + q_1(x)\,p_2(x)}{q_1(x)\,q_2(x)} \]

is also rational, as is the product, and the reciprocal.

Problem 17.1. Suppose that F is a field. Prove the uniqueness of 0, i.e. that there is only one element z in F (namely z = 0) which satisfies x + z = x for any element x.

Problem 17.2. Prove the uniqueness of 1.

Problem 17.3. Let x be an element of a field F. Prove the uniqueness of the element y for which x + y = 0.

Henceforth, we write this y as −x.

Problem 17.4. Let x be an element of a field F. If x ≠ 0, prove the uniqueness of the reciprocal.

Henceforth, we write the reciprocal of x as 1/x, and write x + (−y) as x − y.

17.1 Some Finite Fields

Example 17.4. Let F be the set of numbers F = {0, 1}. Carry out multiplication by the usual rule, but when you add, x + y won't mean the usual addition, but instead will mean the usual addition except when x = y = 1, and then we set 1 + 1 = 0. F is a field called the field of Boolean numbers.

Problem 17.5. Prove that for Boolean numbers, −x = x and 1/x = x.

Example 17.5. Suppose that p is a positive integer. Let F_p be the set of numbers F_p = {0, 1, 2, ..., p − 1}. Define addition and multiplication as usual for integers, but if the result is bigger than p − 1, then subtract multiples of p from the result until it lands in F_p, and let that be the definition of addition and multiplication. F_2 is the field of Boolean numbers. We usually write x = y (mod p) to mean that x and y differ by a multiple of p.

For example, if p = 7, we find

    5 · 6 = 30 (mod 7)
          = 30 − 28 (mod 7)
          = 2 (mod 7).

This is arithmetic in F_7.

It turns out that F_p is a field for any prime number p.
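A minimal computational sketch (not part of the text): arithmetic in F_p amounts to ordinary integer arithmetic followed by subtracting multiples of p, which Python's % operator does for us. The function names here are made up for illustration.

```python
p = 7

def add(x, y):
    return (x + y) % p   # add as integers, then subtract multiples of p

def mul(x, y):
    return (x * y) % p

print(mul(5, 6))   # 2, matching the computation 5 * 6 = 30 = 2 (mod 7)
print(add(6, 6))   # 5
```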


Problem 17.6. Prove that F_p is not a field if p is not prime.

The only trick in seeing that F_p is a field is to see why there is a reciprocal. It can't be the usual reciprocal as a number. For example, if p = 7,

    6 · 6 = 36 (mod 7)
          = 36 − 35 (mod 7)
          = 1 (mod 7)

(because 35 is a multiple of 7). So 6 has reciprocal 6 in F_7.

The Euclidean Algorithm

To compute reciprocals, we first need to find greatest common divisors, using the Euclidean algorithm. The basic idea: given two numbers, for example 12132 and 2304, divide the smaller into the larger, writing a quotient and remainder:

    12132 − 5 · 2304 = 612.

Take the two last numbers in the equation (2304 and 612 in this example), and repeat the process on them, and so on:

    2304 − 3 · 612 = 468
    612 − 1 · 468 = 144
    468 − 3 · 144 = 36
    144 − 4 · 36 = 0.

Stop when you hit a remainder of 0. The greatest common divisor of the numbers you started with is the last nonzero remainder (36 in our example).

Now that we can find the greatest common divisor, we will need to write the greatest common divisor as an integer linear combination of the original numbers. If we write the two numbers we started with as a and b, then our goal is to compute integers u and v for which ua + bv = gcd(a, b). To do this, let's go backwards. Start with the second to last equation, giving the greatest common divisor:

    36 = 468 − 3 · 144.

Plug the previous equation into it:

       = 468 − 3 · (612 − 1 · 468).

Simplify:

       = −3 · 612 + 4 · 468.

Plug in the equation before that:

       = −3 · 612 + 4 · (2304 − 3 · 612)
       = 4 · 2304 − 15 · 612
       = 4 · 2304 − 15 · (12132 − 5 · 2304)
       = −15 · 12132 + 79 · 2304.

We have it: gcd(a, b) = ua + bv, in our case 36 = −15 · 12132 + 79 · 2304.

What does this algorithm do? At each step downward, we are facing an equation like a − bq = r, so any number which divides into a and b must divide into r and b (the next a and b) and vice versa. The remainders r get smaller at each step, always smaller than either a or b. On the last line, b divides into a. Therefore b is the greatest common divisor of a and b on the last line, and so is the greatest common divisor of the original numbers. We express each remainder in terms of previous a and b numbers, so we can plug them in, cascading backwards until we express the greatest common divisor in terms of the original a and b. In the example, that gives (−15)(12132) + (79)(2304) = 36.

Let's compute a reciprocal modulo an integer, say 17^{-1} modulo 1001. Take a = 1001, and b = 17:

    1001 − 58 · 17 = 15
    17 − 1 · 15 = 2
    15 − 7 · 2 = 1
    2 − 2 · 1 = 0.

Going backwards,

    1 = 15 − 7 · 2
      = 15 − 7 · (17 − 1 · 15)
      = −7 · 17 + 8 · 15
      = −7 · 17 + 8 · (1001 − 58 · 17)
      = −471 · 17 + 8 · 1001.

So finally, (−471)(17) + (8)(1001) = 1. Modulo 1001, (−471)(17) = 1. So 17^{-1} = −471 = 1001 − 471 = 530 (mod 1001).

This is how we can compute reciprocals in F_p: we take a = p, and b the number to reciprocate, and apply the process. If p is prime, the resulting greatest common divisor is 1, and so we get up + vb = 1, and so vb = 1 (mod p), so v is the reciprocal of b.
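The backwards substitution above is exactly the extended Euclidean algorithm. Here is a minimal sketch (not from the text) that reproduces the two computations; the recursive formulation is one of several equivalent ways to organize the same bookkeeping.

```python
def extended_gcd(a, b):
    """Return (g, u, v) with u*a + v*b = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    # g = u*b + v*(a % b) = u*b + v*(a - (a//b)*b) = v*a + (u - (a//b)*v)*b
    return g, v, u - (a // b) * v

def reciprocal_mod(b, p):
    """Reciprocal of b in F_p (assumes gcd(b, p) = 1)."""
    g, u, v = extended_gcd(p, b)
    return v % p

print(extended_gcd(12132, 2304))   # (36, -15, 79), as in the text
print(reciprocal_mod(17, 1001))    # 530
```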

Problem 17.7. Compute 15^{-1} in F_79.

Problem 17.8. Solve the linear equation 3x + 1 = 0 in F_5.

Problem 17.9. Prove that F_p is a field whenever p is a prime number.

17.2 Matrices

Matrices with entries from any field F are added, subtracted, and multiplied by the same rules. We can still carry out forward elimination, back substitution, calculate inverses, determinants, characteristic polynomials, eigenvectors and eigenvalues, using the same steps.

Problem 17.10. Let F be the Boolean numbers, and A the matrix

\[ A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \]

thought of as having entries from F. Is A invertible? If so, find A^{-1}.

All of the ideas of linear algebra worked out for the real and complex numbers have obvious analogues over any field, except for the concept of inner product, which is much more sophisticated. From now on, we will only state and prove results for real vector spaces, but those results which do not require inner products (or orthogonal or unitary matrices) continue to hold with identical proofs over any field.

Problem 17.11. If A is a matrix whose entries are rational functions of a variable t, prove that the rank of A is constant in t, except for finitely many values of t.


Geometry and Combinatorics


18 Permutations and Determinants

In this chapter, we present explicit, theoretically important, but computationally infeasible formulas for determinants, matrix inversion and solving linear equations.

18.1 Determinants Via Permutations

Recall the definition of determinant:

Definition 18.1. Let A^{(ij)} be the matrix obtained by cutting out row i and column j from a matrix A. The determinant of an n × n matrix A is

a. If A is 1 × 1, say A = (a), then det A = a.

b. Otherwise

\[ \det A = A_{11} \det A^{(11)} - A_{21} \det A^{(21)} + A_{31} \det A^{(31)} - A_{41} \det A^{(41)} + \dots = \sum_i (-1)^{i+1} A_{i1} \det A^{(i1)}. \]

Remark 18.2. There is no standard notation in the mathematical literature for A^{(ij)}. We won't refer to A^{(ij)} again after this chapter.

We can also expand down any column, or across any row:

Theorem 18.3.

\[ \det A = \sum_i (-1)^{i+j} A_{ij} \det A^{(ij)}, \quad \text{for any column } j, \]
\[ \det A = \sum_j (-1)^{i+j} A_{ij} \det A^{(ij)}, \quad \text{for any row } i, \]

where A is any square matrix.

Finally, we can expand into a sum of permutations:

Theorem 18.4.

\[ \det A = \sum (-1)^N A_{i_1 1} A_{i_2 2} \dots A_{i_n n}, \tag{18.1} \]

for any square matrix A, where the sum is over all permutations i_1, i_2, ..., i_n of 1, 2, ..., n, and N is the number of transpositions in some sequence of transpositions taking the permutation i_1, i_2, ..., i_n back to 1, 2, ..., n.

Remark 18.5. This permutation formula for the determinant is important for the theory. However, it is too slow as a method for computing determinants. For a 20 × 20 matrix, Gauss–Jordan elimination takes approximately 2489 multiplications and divisions, while the permutation formula takes approximately 46225138155356160000 multiplications, so about 10^16 times as long, far too long for any supercomputer that will ever be built.
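For small matrices the permutation formula is still easy to try out. The following sketch (not from the text) computes a determinant directly from theorem 18.4, producing the sign (−1)^N from the count of inversions (which has the same parity as the number of transpositions); the helper names are made up for illustration.

```python
from itertools import permutations

def sign(p):
    # Count inversions; the sign of the permutation is (-1)^(number of inversions).
    n = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if n % 2 else 1

def det_by_permutations(A):
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        # One term sign * A[i_1][1] * A[i_2][2] * ... of equation 18.1,
        # with rows indexed by the permutation p.
        term = sign(p)
        for col in range(n):
            term *= A[p[col]][col]
        total += term
    return total

A = [[1, 2, 0],
     [3, 4, 1],
     [0, 1, 5]]
print(det_by_permutations(A))   # -11, agreeing with expansion down the first column
```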

Remark 18.6. The number (−1)^N is called the sign of the permutation i_1, i_2, ..., i_n. For example, the sign of 1, 3, 2 is (−1)^1 = −1. Keep in mind that a permutation might factor into transpositions in different ways, using different numbers of transpositions; the original text illustrates this with a figure showing two different factorizations of the same permutation (figure omitted here). It is not obvious whether there could be two different ways to carry out a permutation in terms of transpositions, with two different values for (−1)^N (i.e. one with an odd number of transpositions, and the other an even number of transpositions). However, this will follow from the proof below.

Proof. First, let's forget about the minus signs. Run your finger down the first column, and you pick up A_{i_1 1}, and multiply it by det A^{(i_1 1)}. This det A^{(i_1 1)} is calculated similarly, except that row i_1 and column 1 have been deleted, so none of the terms come from that row or column. Therefore (inductively) all of the terms look just right, except for the (very confusing) minus signs. In each term, the numbers i_1, i_2, ..., i_n label deleted rows, in the order in which we delete them, and 1, 2, ..., n label deleted columns. There is only one way to generate each term, given by the permutation i_1, i_2, ..., i_n, since i_1 labels which row we delete first, etc. So each term we have written above shows up just once, with a plus or minus sign.

Let's fix the minus signs. The term A_{11} A_{22} ... A_{nn} occurs with a plus sign, because we start with a plus sign when we run our finger down the first column, so it is clear by induction. When we swap rows, we switch the sign of the whole determinant. Each term is different from any other (coming from a different permutation), and has just a plus or minus sign in front of it. Swapping rows can alter signs of terms, and changes the order in which the terms appear, but the same terms are still sitting there. So every term must switch sign when we swap any two rows. The sign in front of A_{12} A_{21} A_{33} A_{44} ... A_{nn}, for example, must be −, since it comes about from one swapping (of rows 1 and 2) applied to A_{11} A_{22} ... A_{nn}.

The number N of transpositions needed to reorder the permutation i_1, i_2, ..., i_n of rows into 1, 2, ..., n is the number of minus signs in front of that term. (The minus signs cancel in pairs.)


Corollary 18.7.

\[ \det A = \sum (-1)^N A_{1 j_1} A_{2 j_2} \dots A_{n j_n}, \]

for any square matrix A, where the sum is over all permutations j_1, j_2, ..., j_n of 1, 2, ..., n, and N is the number of transpositions in some sequence of transpositions taking the permutation j_1, j_2, ..., j_n back to 1, 2, ..., n.

Proof. Take a term in the row permutation formula 18.1, say

\[ (-1)^N A_{i_1 1} A_{i_2 2} \dots A_{i_n n}. \]

Scramble the factors back into order by first index, say

\[ (-1)^N A_{1 j_1} A_{2 j_2} \dots A_{n j_n}. \]

So j_1, j_2, ..., j_n is the inverse permutation of i_1, i_2, ..., i_n. The inverse permutation can be brought about by reversing the transpositions that bring about the original permutation i_1, i_2, ..., i_n. So it has the same number of transpositions N, and therefore the same sign.

18.2 Determinants Multiply: Permutation Proof

Let's see a different proof that det AB = det A det B, using the permutation formulas above instead of forward elimination. Write the columns of AB as

\[ AB e_j = A \left( \sum_i B_{ij} e_i \right) = \sum_i B_{ij} A e_i. \]

Calculate the determinant:

\[ \det AB = \sum_p (-1)^N (AB)_{i_1 1} (AB)_{i_2 2} \dots (AB)_{i_n n}. \]

Use (AB)_{ij} = \sum_k A_{ik} B_{kj} to expand each AB:

\[ = \sum_p (-1)^N \sum_{k_1} A_{i_1 k_1} B_{k_1 1} \sum_{k_2} A_{i_2 k_2} B_{k_2 2} \cdots \sum_{k_n} A_{i_n k_n} B_{k_n n} \]
\[ = \sum_{k_1, k_2, \dots, k_n} B_{k_1 1} B_{k_2 2} \dots B_{k_n n} \sum_p (-1)^N A_{i_1 k_1} A_{i_2 k_2} \dots A_{i_n k_n} \]
\[ = \sum_{k_1, k_2, \dots, k_n} B_{k_1 1} B_{k_2 2} \dots B_{k_n n} \det \begin{pmatrix} A e_{k_1} & A e_{k_2} & \dots & A e_{k_n} \end{pmatrix}. \]

If k_1 = k_2, then two columns in here are equal, so the resulting determinant is zero. Therefore this is really a sum over permutations k_1, k_2, ..., k_n. Reorder the columns:

\[ = \sum B_{k_1 1} \dots B_{k_n n} (-1)^N \det \begin{pmatrix} A e_1 & \dots & A e_n \end{pmatrix} \]
\[ = \sum B_{k_1 1} \dots B_{k_n n} (-1)^N \det A \]
\[ = \det A \sum (-1)^N B_{k_1 1} \dots B_{k_n n} \]
\[ = \det A \det B. \]

18.2 Review Problems

Problem 18.1. Prove that the sign of a permutation p is the determinant of its permutation matrix.

Problem 18.2. Prove that the sign of qp is the product of the signs of q and p, and that the sign of the inverse permutation p^{-1} is the sign of p.

Problem 18.3. Let Q(x_1, x_2, ..., x_n) be the product of all possible terms of the form x_i − x_j where i < j. For example, if n = 2, then Q(x_1, x_2) = x_1 − x_2, while if n = 3, then

\[ Q(x_1, x_2, x_3) = (x_1 - x_2)(x_1 - x_3)(x_2 - x_3). \]

(a) If n = 4, what is Q(x_1, x_2, x_3, x_4)?

(b) Prove that if we permute the variables by a permutation p, then

\[ Q(x_{p(1)}, x_{p(2)}, \dots, x_{p(n)}) = \pm Q(x_1, x_2, \dots, x_n), \]

where ± is the sign of p. (We could use this equation as the definition of the sign of a permutation.)

Problem 18.4. Prove that a matrix is a permutation matrix just when it is a product of permutation matrices of transpositions.

Problem 18.5. For the reader who has read chapter 13: why are permutation matrices orthogonal? For which permutations is the associated permutation matrix symmetric?

Problem 18.6. Let's write n! (read as n factorial) for how many permutations there are of the numbers 1, 2, ..., n. Prove that n! = n(n − 1)(n − 2) ... (2)(1).

Problem 18.7. If all entries of an n × n matrix A satisfy |A_{ij}| ≤ R:

a. Prove that |det A| ≤ n! R^n.

b. Using caution with how you pick your pivots, by forward elimination and induction, prove that |det A| ≤ 2^{n(n−1)/2} R^n.

c. Give some evidence as to which is the better bound.

18.3 Cramer's Rule

The permutation formula for the determinant is no help in calculation, but it has some theoretical power. There are similarly impractical formulas for the inverse of a matrix, and for the solutions of linear equations. The formulas give insight into how complicated inverses and solutions can get as functions of matrix entries.

Recall the notation: A^{(ij)} means the matrix A with row i and column j deleted. Recall from theorem 7.9 on page 65:

\[ \det A = \sum_i (-1)^{i+j} A_{ij} \det A^{(ij)}, \quad \text{for any column } j, \]
\[ \det A = \sum_j (-1)^{i+j} A_{ij} \det A^{(ij)}, \quad \text{for any row } i, \]

where A is any square matrix.

Definition 18.8. The adjugate matrix adj A of a square matrix A is the matrix whose entries are

\[ (\operatorname{adj} A)_{ij} = (-1)^{j+i} \det A^{(ji)}. \]

Note the placement of i and j here: reversed from the formula for determinants. So clearly

\[ \det A = \sum_j A_{ij} (\operatorname{adj} A)_{ji} \]

for any fixed index i. Some more notation: if A is a matrix, and b a vector, write A_{b,i} for the matrix obtained by replacing column i of A by b.

Theorem 18.9 (Cramer's Rule). If A is invertible, then the solution to Ax = b has entries

\[ x_j = \frac{\det A_{b,j}}{\det A}. \]

Proof. Write A as columns

\[ A = \begin{pmatrix} a_1 & a_2 & \dots & a_n \end{pmatrix}. \]

Expand out:

\[ \det A_{b,j} = \det \begin{pmatrix} a_1 & a_2 & \dots & b & \dots & a_n \end{pmatrix} \]
\[ = \det \begin{pmatrix} a_1 & a_2 & \dots & Ax & \dots & a_n \end{pmatrix} \]
\[ = \det \begin{pmatrix} a_1 & a_2 & \dots & (x_1 a_1 + x_2 a_2 + \dots + x_n a_n) & \dots & a_n \end{pmatrix}. \]

But a_1 already appears in the first column, so x_1 a_1 has no effect on the result. Similarly for x_2 a_2, etc., so

\[ = \det \begin{pmatrix} a_1 & a_2 & \dots & x_j a_j & \dots & a_n \end{pmatrix} = x_j \det A. \]
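A short sketch (not part of the text) of Cramer's rule in code: replace column j of A by b, take determinants, and divide. The example system and the use of numpy are arbitrary choices made here; for large matrices this is far more expensive than elimination.

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b by Cramer's rule; assumes det A != 0."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        A_bj = A.copy()
        A_bj[:, j] = b            # the matrix A_{b,j}: column j replaced by b
        x[j] = np.linalg.det(A_bj) / d
    return x

A = [[1.0, 2.0],
     [3.0, 4.0]]
b = [5.0, 6.0]
print(cramer_solve(A, b))         # [-4.   4.5]
print(np.linalg.solve(A, b))      # the same answer, computed by elimination
```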

Problem 18.8. If A and b have integer entries, and det A = ±1, prove that the solution x of Ax = b has integer entries.

Corollary 18.10. If a matrix A is invertible, then

\[ A^{-1} = \frac{1}{\det A} \operatorname{adj} A. \]

Proof. We need to invert A, i.e. to solve Ax = e_j for each vector e_j, and then put each solution x in column j of a matrix A^{-1}. By Cramer's rule, the solution of Ax = e_j has entries

\[ x_i = \frac{\det A_{e_j,i}}{\det A}. \]

Putting these together into the columns of a matrix, we find components

\[ A^{-1}_{ij} = \frac{\det A_{e_j,i}}{\det A}. \]

Calculate the numerator:

\[ \det A_{e_j,i} = \det \begin{pmatrix}
A_{11} & A_{12} & \dots & 0 & \dots & A_{1n} \\
A_{21} & A_{22} & \dots & 0 & \dots & A_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
A_{j1} & A_{j2} & \dots & 1 & \dots & A_{jn} \\
\vdots & \vdots & & \vdots & & \vdots \\
A_{n1} & A_{n2} & \dots & 0 & \dots & A_{nn}
\end{pmatrix}. \]

Expand along column i (the column holding the single 1, in row j):

\[ = (-1)^{i+j} \det A^{(ji)}. \]


Remark 18.11. Cramer's rule shows us that the solution x of Ax = b is a smooth function of A and b, as long as A is invertible. (By smooth, we mean differentiable as many times as you like, with respect to all variables. In this case, the variables are the entries of A and b.) In particular, the entries of A^{-1} are smooth functions of the entries of A.

Definition 18.12. We will say that two square matrices A and B commute if AB = BA. Similarly, we will say that two linear maps S : V → V and T : V → V of a vector space V commute if ST = TS.

Lemma 18.13 (Lax [6]). Suppose that P(x) and Q(x) are polynomials with n × n matrix coefficients:

\[ P(x) = P_0 + P_1 x + P_2 x^2 + \dots + P_p x^p, \]
\[ Q(x) = Q_0 + Q_1 x + Q_2 x^2 + \dots + Q_q x^q. \]

Their product PQ(x) = P(x)Q(x) is also a polynomial with matrix coefficients. If A is an n × n matrix, then write P(A) to mean

\[ P(A) = P_0 + P_1 A + P_2 A^2 + \dots + P_p A^p. \]

If A is an n × n matrix commuting with all of the coefficient matrices Q_0, Q_1, ..., Q_q of Q(x), then PQ(A) = P(A)Q(A).

Proof. Calculate

\[ PQ(x) = P_0 Q_0 + (P_1 Q_0 + P_0 Q_1) x + \dots + P_p Q_q x^{p+q} = \sum_{j,k} P_j Q_k x^{j+k}. \]

So

\[ PQ(A) = \sum_{j,k} P_j Q_k A^{j+k} = \sum_{j,k} P_j A^j Q_k A^k = P(A) Q(A). \]

Theorem 18.14 (Cayley–Hamilton). Every square matrix A satisfies p(A) = 0 where p(λ) = det(A − λI) is the characteristic polynomial of A.

Remark 18.15. The proof works equally well over any field, not just the real or complex numbers.

Proof. Let P(λ) = adj(A − λI), Q(λ) = A − λI. Then P(λ)Q(λ) = p(λ). Clearly A commutes with the coefficient matrices of Q(λ) (i.e. A commutes with A), so P(A)Q(A) = PQ(A) = p(A). But Q(A) = A − A = 0.
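A quick numerical check (not in the text) of the Cayley–Hamilton theorem for one particular matrix. As an assumption of this sketch, numpy's poly routine is used: it returns the coefficients of det(λI − A), which differs from p(λ) = det(A − λI) only by a sign (−1)^n and so also vanishes at A.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
coeffs = np.poly(A)          # [1, -5, 6]: lambda^2 - 5 lambda + 6

# Plug A into the polynomial: 6*I - 5*A + A^2.
p_of_A = sum(c * np.linalg.matrix_power(A, k)
             for k, c in enumerate(reversed(coeffs)))
print(p_of_A)                # the zero matrix, up to rounding
```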


18.3 Review Problems

Problem 18.9. Use Cramer's rule to solve

\[ 2x_1 + x_2 = -1 \]
\[ x_1 - x_2 = 3. \]

Problem 18.10. By finding the adjugate, compute the inverse of

\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}. \]

Problem 18.11. If an invertible matrix A has integer entries, prove that A^{-1} also has integer entries just when det A = ±1.

Problem 18.12. Without any calculation, what can you say about the inverse of the matrix

\[ A = \begin{pmatrix}
1 & 1 + x + 5x^3 & 1 + 2x^3 & 2 + 7x + x^3 & 6 + 3x^3 \\
0 & 1 & 5 + 7x + 7x^3 & 7 + 7x + 8x^3 & 1 + 5x^3 \\
0 & 0 & 1 & 2 + 4x + 3x^3 & 6x + 4x^3 \\
0 & 0 & 0 & 1 & 3 + 3x + 2x^3 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}? \]


19 Volume and Determinants

We will see that the determinant of a matrix measures volume growth.

19.1 Shears

Consider the matrix

\[ A = \begin{pmatrix} 1 & c \\ 0 & 1 \end{pmatrix}. \]

To each vector x, associate the vector y = Ax,

\[ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} 1 & c \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 + c x_2 \\ x_2 \end{pmatrix}. \]

(Marginal figure: a shear.)

We call the transformation taking x to y = Ax a shear. Consider the x_1, x_2-plane. Draw the x_1 axis horizontally, and x_2 vertically. Take a rectangle in the x_1, x_2-plane, with sides along the two axes. The rectangle gets mapped into a parallelogram in the y_1, y_2-plane.

Problem 19.1. Prove that the bottom of the rectangle stays put, but the top is shifted over by the amount c, while the height stays the same.

The parallelogram has the same area as the rectangle. So the shear preserves the area of the rectangle. If we slide the rectangle away from the origin, the parallelogram slides too in the same way. Consequently all rectangles with sides parallel to the axes have area preserved by any shear.

19.2 Reflections

The map y = Ax with

\[ A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \]

swaps the x_1, x_2 coordinates, so clearly takes rectangles to congruent rectangles, and so preserves their area.

(Marginal figures: “Cut the parallelogram.” “Slide the triangle over to the right side to recover the rectangle.”)

19.3 Hypotheses About Volume

(Marginal figures: “Complicated shapes can be approximated by small rectangular boxes.” “Area of any shape is preserved as long as the areas of rectangular boxes are preserved.”)

We will make the following assumptions about volumes of subsets of R^n:

a. A single point in R has volume 0.

b. If we scale the real line by a factor, say c ≠ 0, then we scale lengths of line segments by a factor of |c|.

c. If A is a subset of R^p and B a subset of R^q, with volumes Vol(A) = a and Vol(B) = b, then A × B in R^{p+q} has Vol(A × B) = ab.

d. If a map preserves the volumes of all rectangular boxes with sides parallel to the coordinate axes, then it preserves volumes of any subset that has a finite volume. (Because it should be possible to approximate complicated objects with little boxes.)

e. If a set X has a finite volume, then so does AX for any linear map A.

These hypotheses are justified in any book of multivariable calculus or basic analysis (for example, Wheeden & Zygmund [16]).

19.4 Determinant and Volume

Theorem 19.1. Consider a transformation y = Ax with A an n × n matrix. Taking any set X in R^n, let AX be its image under the transformation. If X has some volume Vol(X), then AX has volume Vol(AX) = |det A| Vol(X).

Proof. The strictly lower and strictly upper triangular matrices with only a single nonzero entry off the diagonal are shears of two coordinates. The permutation matrices of transpositions are swaps of two coordinates. We have seen that these preserve areas in the plane of those two coordinates. They have no effect on any other coordinates. Therefore, by our first hypothesis, they preserve volumes of boxes. By our second hypothesis, they preserve volumes. The scalings of rows we encountered in reduced echelon form are just rescalings of variables, so they rescale volume by their determinant. Any matrix can be brought to reduced echelon form by finitely many products with these matrices. If det A ≠ 0, then the reduced echelon form is 1, so the result is true.

If det A = 0, we carry out more simplifications after Gauss–Jordan elimination: multiplying A by strictly lower and strictly upper triangular matrices on the right side of A, we can add any column of A to any other column, thereby killing all entries after each pivot. Each step preserves volumes. We arrange

\[ A = \begin{pmatrix} 1_p & 0 \\ 0 & 0 \end{pmatrix}, \]

for some size p × p of identity matrix. So the image of A lies inside R^p. Take any set X. Suppose that X has volume Vol(X). The set AX is AX = Y × Z for Y some set in R^p, and Z the single point 0 in R^{n−p}. Hence Vol(Z) = 0 and Vol(AX) = Vol(Y) Vol(Z) = 0.
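As a concrete check (not in the text) of theorem 19.1 in the plane: a 2 × 2 matrix maps the unit square to a parallelogram, whose area we can compute with the shoelace formula and compare to |det A| · 1. The particular matrix and the shoelace helper are choices made for this sketch.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.5, 3.0]])

# Image of the unit square: the parallelogram with vertices
# 0, Ae1, Ae1 + Ae2, Ae2, taken in order around the boundary.
v = [np.zeros(2), A @ [1, 0], A @ [1, 1], A @ [0, 1]]

def shoelace_area(pts):
    # Standard shoelace formula for the area of a polygon.
    s = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

print(shoelace_area(v), abs(np.linalg.det(A)))   # both 5.5
```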


19.4 Review Problems

Problem 19.2. (a) What is the volume of a cube in n dimensions in terms of the length of each side?

(b) How far apart are the opposite corners of the cube from one another?

(c) How many “faces” are there of one lower dimension?

(d) How many “vertices” are there?

(e) How many “faces” are there of all possible lower dimensions?

Problem 19.3. Use a determinant to find the area of the ellipse

\[ \left(\frac{x}{a}\right)^2 + \left(\frac{y}{b}\right)^2 = 1, \]

from the well-known fact that a circle of unit radius, x^2 + y^2 = 1, has area π.


20 Geometry and Orthogonal Matrices

What does it mean geometrically for a matrix to be orthogonal?

20.1 Distances and Orthogonal Matrices

Inequalities

Lemma 20.1 (The Schwarz Inequality). For x and y in R^n,

\[ |\langle x, y\rangle| \le \|x\|\,\|y\|. \]

Equality holds just when one of the vectors is a multiple of the other.

Proof. If y = 0 then the result is obvious. If y ≠ 0, then

\[ 0 \le \left\| \|y\|^2 x - \langle x, y\rangle\, y \right\|^2 \]
\[ = \left\langle \|y\|^2 x - \langle x, y\rangle\, y,\ \|y\|^2 x - \langle x, y\rangle\, y \right\rangle \]
\[ = \|y\|^4 \|x\|^2 - \|y\|^2 \langle x, y\rangle \langle x, y\rangle - \langle x, y\rangle \|y\|^2 \langle x, y\rangle + \langle x, y\rangle^2 \|y\|^2 \]
\[ = \|y\|^4 \|x\|^2 - \|y\|^2 \langle x, y\rangle^2 \]
\[ = \|y\|^2 \left( \|x\|^2 \|y\|^2 - \langle x, y\rangle^2 \right). \]

Therefore

\[ \|x\|^2 \|y\|^2 \ge \langle x, y\rangle^2. \]

On the first line, you see that equality holds just when x is a multiple of y.

Lemma 20.2 (The Triangle Inequality). If x and y are vectors in R^n, then

\[ \|x + y\| \le \|x\| + \|y\|. \]

Equality holds just when one of the vectors is a nonnegative multiple of the other.

Proof.

\[ \|x + y\|^2 = \langle x + y, x + y\rangle = \|x\|^2 + 2\langle x, y\rangle + \|y\|^2 \le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2. \]

Equality requires ⟨x, y⟩ = ‖x‖‖y‖, so one is a multiple of the other, say y = ax, and then ⟨x, y⟩ = ‖x‖‖y‖ implies that a ≥ 0, and similarly if x is a multiple of y.

20.1 Review Problems

Problem 20.1. Suppose that A is a matrix.

(a) Among all vectors x of length 1, how large can the 1st entry of Ax (the entry (Ax)_1) be?

(b) Show that if x is a vector of length 1, then y = Ax sits inside a rectangular box whose sides are parallel to the axes, with the side along the y_j-axis being of length 2‖A^* e_j‖.

(c) Apply this to the matrix

\[ A = \begin{pmatrix} 4 & 3 \\ 0 & 2 \end{pmatrix}, \]

and draw a picture.

Isometries and Orthogonal Matrices

Problem 20.2. The distance between two points x and y of R^n is ‖x − y‖. Prove that for any three points x, y, z in R^n,

\[ \|x - z\| + \|z - y\| \ge \|x - y\|. \]

Definition 20.3. Define the line connecting two distinct points x and y in R^n to be the set of points of the form tx + (1 − t)y, and the line segment between x and y to be the set of such points with 0 ≤ t ≤ 1.

Problem 20.3. Recall that for any three points x, y, z in R^n,

\[ \|x - z\| + \|z - y\| \ge \|x - y\|. \]

Prove that when x and y are distinct, equality holds just when z lies on the line segment between x and y.

Problem 20.4. If x and y are distinct, and z = tx + (1 − t)y is on the line segment between x and y, prove that no other point of R^n has the same distances to x and y.

Definition 20.4. A map T : R^n → R^p is a rule associating to each point x of R^n a point T(x) of R^p. An isometry T of R^n is a map T : R^n → R^n, so that the distance between any two points x and y of R^n is the same as the distance between T(x) and T(y).

Theorem 20.5. The isometries of R^n are precisely the maps

\[ T(x) = Ax + b \]

where A is any orthogonal matrix, and b is any vector in R^n.

Proof. From the exercises,

\[ T(tx + (1 - t)y) = tT(x) + (1 - t)T(y), \]

for x and y distinct, and t between 0 and 1, because T preserves distances, so preserves the triangle inequality, the lines, line segments, etc. Replace the map T(x) by T(x) − T(0) if needed (just shifting T(x) over) to ensure that T(0) = 0. Taking y = 0, and x ≠ 0, we get

\[ T(tx) = tT(x), \]

for 0 ≤ t ≤ 1. For t > 1,

\[ t\,T(x) = t\,T\!\left(\tfrac{1}{t}\,tx\right) = t\,\tfrac{1}{t}\,T(tx) = T(tx). \]

To handle minus signs, set y = −x and t = 1/2:

\[ 0 = T(0) = T\!\left(\tfrac{1}{2}x + \tfrac{1}{2}(-x)\right) = \tfrac{1}{2}T(x) + \tfrac{1}{2}T(-x). \]

So T(−x) = −T(x), and this ensures that T(tx) = tT(x) for all real values of t. Take any nonzero vector y:

\[ T(x + y) = T\!\left(t\,\tfrac{x}{t} + (1 - t)\,\tfrac{y}{1 - t}\right) = t\,T\!\left(\tfrac{x}{t}\right) + (1 - t)\,T\!\left(\tfrac{y}{1 - t}\right) = T(x) + T(y). \]

Set A to be the matrix whose columns are T(e_1), T(e_2), ..., T(e_n). Then

\[ T(x) = T\!\left(\sum x_j e_j\right) = \sum x_j T(e_j) = \sum x_j A e_j = Ax. \]

So T(x) = Ax for a matrix A, and ‖Ax‖ = ‖x‖ so A is orthogonal.

Problem 20.5. Which maps alter distance by a constant factor?

20.2 Rotations and Orthogonal Matrices

How can we picture what an orthogonal matrix does, in any number of dimensions?

Problem 20.6. Prove that a 2 × 2 orthogonal matrix has the form

\[ A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \]

just when it has det A = 1.

Consider the matrix

\[ \begin{pmatrix}
1 & & & & \\
 & \cos\theta & & -\sin\theta & \\
 & & 1 & & \\
 & \sin\theta & & \cos\theta & \\
 & & & & 1
\end{pmatrix} \]

(where each 1 could be an identity matrix of any size). Call this a rotation in the x_k x_l-plane if the sines and cosines appear in rows k and l.

More generally, picking any two perpendicular vectors of unit length, say u and v, and an angle θ, we can define an orthogonal matrix R by asking that Rx = x for x perpendicular to u and v, and that R rotate the plane spanned by u and v by the angle θ:

\[ Ru = \cos\theta\, u + \sin\theta\, v \]
\[ Rv = -\sin\theta\, u + \cos\theta\, v. \]

Problem 20.7. The minus signs look funny. Check that this gives the expected matrix R if u = e_1 and v = e_2.

Problem 20.8. How can we be sure that such a matrix R exists?


Careful: if we swap the choice of which is u and which is v, we will rotate in the wrong direction.

Another kind of orthogonal map is the map taking x to −x, which (for want of a better word) we can call a reversal.

Theorem 20.6. Every orthogonal matrix is a product of rotations in mutually perpendicular planes together with a reversal in some subspace perpendicular to all of those planes.

Remark 20.7. The reversal occurs precisely in the λ = −1 eigenspace of the orthogonal matrix.

Remark 20.8. There might be some vectors which are perpendicular to all of the planes and to the reversing subspace. Such vectors must be fixed: Ax = x, and so form the λ = 1 eigenspace of the orthogonal matrix.

Proof. Call the matrix A. This matrix A is a real matrix, but we can think of it as a complex matrix, which just happens to have only real number entries. Since A is an orthogonal matrix, it is normal. By the complex spectral theorem we can find a unitary basis u_1, u_2, ..., u_n in C^n of complex eigenvectors of A, say

\[ Au_1 = \lambda_1 u_1, \quad Au_2 = \lambda_2 u_2, \quad \dots, \quad Au_n = \lambda_n u_n. \]

These eigenvalues λ_1, λ_2, ..., λ_n are complex numbers. What sort of complex numbers are they? As in problem 15.21 on page 154, since A is unitary, the eigenvalues of A are complex numbers of modulus 1:

\[ 1 = |\lambda_1| = |\lambda_2| = \dots = |\lambda_n|. \]

Moreover, since A is a matrix of real numbers, taking complex conjugates of the equation Au_1 = λ_1 u_1 gives

\[ A\bar u_1 = \bar\lambda_1 \bar u_1. \]

Because A is orthogonal, and therefore unitary, ⟨Au_1, Aū_1⟩ = ⟨u_1, ū_1⟩. Therefore

\[ \langle u_1, \bar u_1 \rangle = \langle A u_1, A \bar u_1 \rangle = \langle \lambda_1 u_1, \bar\lambda_1 \bar u_1 \rangle. \]

Pulling the scalars through the Hermitian inner product,

\[ = \lambda_1 \lambda_1 \langle u_1, \bar u_1 \rangle = \lambda_1^2 \langle u_1, \bar u_1 \rangle. \]

Therefore either (1) λ_1^2 = 1 or else (2) u_1 is perpendicular to ū_1. If (1), then λ_1 = ±1, so we are looking at an eigenvector in the reversing subspace, or a fixed vector. The case (2) is trickier: write u_1 = x_1 + iy_1, with x_1 and y_1 real vectors. Then calculate

\[ \langle u_1, \bar u_1 \rangle = \|x_1\|^2 - \|y_1\|^2 + 2i \langle x_1, y_1 \rangle. \]

Therefore in case (2), we find that x_1 and y_1 have the same length, and are perpendicular. We can scale x_1 and y_1 to both have length 1. Since |λ_1| = 1, we can write

\[ \lambda_1 = \cos\theta_1 + i \sin\theta_1. \]

Expand out the equation Au_1 = λ_1 u_1 into real and imaginary parts, and you find

\[ Ax_1 = \cos\theta_1\, x_1 - \sin\theta_1\, y_1 \]
\[ Ay_1 = \sin\theta_1\, x_1 + \cos\theta_1\, y_1, \]

a rotation by an angle of −θ_1. The same results hold for u_2, u_3, ..., u_n in place of u_1.

Problem 20.9. Prove that the reversal of an even dimensional space R^{2n} is a product of rotations, each rotating some plane by an angle of π. We can choose any planes we like, as long as they are mutually perpendicular and together span R^{2n}.

Corollary 20.9. An orthogonal matrix is a product of rotations in mutually perpendicular planes just when it has determinant 1.

Definition 20.10. A rotation is an orthogonal matrix of unit determinant.

20.3 Reflections and Orthogonal Matrices

Definition 20.11. A reflection by a nonzero vector u in R^n is an isometry R of R^n for which R(u) = −u and R(x) = x for any vector x perpendicular to u.

Lemma 20.12. There is a unique reflection R by any nonzero vector u, and it is

\[ Rx = x - 2\,\frac{\langle x, u\rangle}{\langle u, u\rangle}\,u. \tag{20.1} \]

Proof. If R is a reflection, then R is an isometry and R(0) = 0, so R(x) = Rx for a unique matrix R (which we deliberately call by the same name). Clearly R^2 = 1 so R^{-1} = R. If R and S are two reflections in the same vector u, then RS(u) = u and RS(x) = x for x perpendicular to u, so RS(au + bx) = au + bx fixes every vector, and therefore RS = 1 so S = R^{-1} = R. So there is a unique reflection, if one exists.

Let u_1 = u/‖u‖. Pick unit vectors u_2, u_3, ..., u_n so that u_1, u_2, ..., u_n is an orthonormal basis. Write a vector x as x = Σ_i a_i u_i. Then equation 20.1 gives

\[ Rx = -a_1 u_1 + a_2 u_2 + a_3 u_3 + \dots + a_n u_n. \]

Therefore ‖Rx‖ = ‖x‖ so R is an isometry, and clearly a reflection.
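Equation 20.1 translates directly into a matrix: R = I − 2uu^t/⟨u, u⟩. The following sketch (not from the text) builds that matrix and checks the defining properties; numpy and the example vector are arbitrary choices made here.

```python
import numpy as np

def reflection_matrix(u):
    """Matrix of the reflection in the nonzero vector u (equation 20.1)."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    return np.eye(n) - 2.0 * np.outer(u, u) / (u @ u)

u = np.array([1.0, 2.0, 2.0])
R = reflection_matrix(u)
print(R @ u)                    # -u, up to rounding
print(np.round(R @ R, 10))      # the identity: R^2 = 1
print(np.round(R.T @ R, 10))    # the identity: R is orthogonal
```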


Problem 20.10. If A is orthogonal, prove that det A = ±1.

Theorem 20.13. Every orthogonal matrix is either a rotation (if it has determinant 1) or the product of a rotation with a reflection (if it has determinant −1).

Proof. By the same procedure as in corollary 20.9, we can arrange by induction without loss of generality that the columns of our orthogonal matrix A are e_1, e_2, e_3, ..., e_{n−1}. But the proof breaks down at the last column, because to get each column fixed up, the proof needs to have at least two rows. So the last column is ±e_n. Hence det A = ±1 and A = 1 just when det A = 1, and is reflection in e_n just when det A = −1.

Problem 20.11. Prove that every unitary matrix is a rotation.

Problem 20.12. Use the spectral theorem to show that every n × n unitary matrix is a product of rotations in n mutually perpendicular planes.

Theorem 20.14 (Cartan–Dieudonné). Every n × n orthogonal matrix is a product of at most n reflections. The number of reflections is an even number if it is a rotation, and odd otherwise.

Proof. For a 1 × 1 matrix, the result is obvious.

Let A be our n × n orthogonal matrix. Our matrix A is a product of reflections, say in vectors u_1, u_2, ..., u_n, just when FAF^t is a product of reflections in the vectors Fu_1, Fu_2, ..., Fu_n. So we may change orthonormal basis as we please.

If u is a fixed vector, i.e. Au = u, then we can rescale u to have unit length, and change basis to get u = e_1. Then by induction,

\[ A = \begin{pmatrix} 1 & 0 \\ 0 & B \end{pmatrix} \]

is a product of at most n − 1 reflections.

If A doesn't fix any vector, then take any nonzero vector v. Consider the reflection R in the vector u = Av − v. A simple calculation: Rv = Av and RAv = v. Therefore ARAv = Av, so that AR has a fixed vector Av. By induction AR is a product of n − 1 reflections, so A is a product of n reflections.


21 Orthogonal Projections

In chapter 20, we studied the geometry of the inner product in R^n. In this chapter, we continue the study in abstract inner product spaces.

21.1 Orthonormal Bases

An orthonormal basis in an inner product space has the same definition as in R^n (see definition 13.9 on page 123).

Lemma 21.1. Suppose that W is a subspace of a finite dimensional inner product space V, say of dimension p. Then there is an orthonormal basis

v_1, v_2, ..., v_n

for V so that v_1, v_2, ..., v_p is an orthonormal basis for W.

Proof. Follow the process given in the hint for problem 16.24 on page 167, to produce a basis for V whose first p vectors are a basis for W. Apply the Gram–Schmidt process to these vectors to obtain an orthonormal basis.

21.2 Orthogonal Projections

Recall that a vector space which is equipped with an inner product is called an inner product space.

Definition 21.2. If W is a subspace of an inner product space V, let W^⊥ (called the orthogonal complement of W) be the set of vectors of V perpendicular to all vectors of W.

Problem 21.1. Prove that W^⊥ is a subspace of V.

Theorem 21.3. Let W be a subspace of a finite dimensional inner product space V. Then every vector v in V can be written in precisely one way as v = x + y with x from W and y from W^⊥. The vector x is called the orthogonal projection of v to W. The map P : V → W taking each vector to its orthogonal projection is linear.

Proof. First, let's show that there is at most one way to break up a vector v into a sum x + y. Suppose that v = x_0 + y_0 = x_1 + y_1, with x_0 and x_1 from W and y_0 and y_1 from W^⊥. Then x_0 − x_1 = y_1 − y_0. The left hand side lies in W while the right hand side lies in W^⊥. But the left hand side must be perpendicular to the right hand side, so perpendicular to itself, so 0. Therefore the right hand side must be 0 too, so x_0 = x_1 and y_0 = y_1: there is at most one way to break up each vector v.

Next, let's show that there is a way. Take an orthonormal basis for V, say v_1, v_2, ..., v_n, for which v_1, v_2, ..., v_p form an orthonormal basis for W. Therefore v_{p+1}, v_{p+2}, ..., v_n lie in W^⊥. Write each vector v in V as

\[ v = \underbrace{a_1 v_1 + a_2 v_2 + \dots + a_p v_p}_{\text{in } W} + \underbrace{a_{p+1} v_{p+1} + a_{p+2} v_{p+2} + \dots + a_n v_n}_{\text{in } W^\perp}. \]

Problem 21.2. Finish the proof by proving that orthogonal projection is a linear map.

Problem 21.3. Suppose that W is a subspace of a finite dimensional inner product space V. Prove that W^⊥⊥ = W.

Problem 21.4. Prove the Pythagorean theorem: if x and y are perpendicular vectors in an inner product space, then ‖x + y‖² = ‖x‖² + ‖y‖².

Problem 21.5. Prove that v = Pv + Qv, where P is the orthogonal projection to a subspace W and Q is the orthogonal projection to W^⊥.

Lemma 21.4. The orthogonal projection Pv of a vector v to a subspace W is the closest point of W to the vector v.

Proof. Write v = x + y with x in W and y in W^⊥. Let's see how close a vector w from W can get to v:

\[ \|v - w\|^2 = \|x - w + y\|^2 = \|x - w\|^2 + \|y\|^2 \]

by the Pythagorean theorem. As we vary w through W, this expression is clearly smallest when w = x.
clearly smallest when w = x.


Problem 21.6. Let P be the orthogonal projection to a subspace W of a finite dimensional inner product space V. Prove Bessel's inequality

\[ \|Pv\| \le \|v\|, \]

and equality holds just when Pv = v, which holds just when v lies in W.

Problem 21.7. Prove that a linear map P : V → V of a finite dimensional inner product space V is the projection to some subspace if and only if P = P^2 = P^t.

Problem 21.8. Find analogues of all of the results of this chapter for complex vectors in a finite dimensional Hermitian inner product space.


Jordan Normal Form


22 Direct Sums of Subspaces

Subspaces have a kind of arithmetic.

Definition 22.1. The intersection of two subspaces is the collection of vectors which belong to both of the subspaces. We will write the intersection of subspaces U and V as U ∩ V.

Example 22.2. The subspace U of R^3 given by the vectors of the form

\[ \begin{pmatrix} x_1 \\ x_2 \\ x_1 - x_2 \end{pmatrix} \]

intersects the subspace V consisting in the vectors of the form

\[ \begin{pmatrix} x_1 \\ 0 \\ x_3 \end{pmatrix} \]

in the subspace written U ∩ V, which consists in the vectors of the form

\[ \begin{pmatrix} x_1 \\ 0 \\ x_1 \end{pmatrix}. \]

Definition 22.3. If U and V are subspaces of a vector space W, write U + V for the set of vectors w of the form w = u + v for some u in U and v in V; call U + V the sum.

Problem 22.1. Prove that U + V is a subspace of W.

Definition 22.4. If U and V are two subspaces of a vector space W, we will write U + V as U ⊕ V (and say that U ⊕ V is a direct sum) to mean that every vector x of U + V can be written uniquely as a sum x = y + z with y in U and z in V. We will also say that U and V are complementary, or complements of one another.

Example 22.5. R^3 = U ⊕ V for U the subspace consisting of the vectors

\[ x = \begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix} \]

and V the subspace consisting of the vectors

\[ x = \begin{pmatrix} 0 \\ 0 \\ x_3 \end{pmatrix}, \]

since we can write any vector x uniquely as

\[ x = \begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ x_3 \end{pmatrix}. \]

Problem 22.2. Give an example of two subspaces of R^3 which are not complementary.

Theorem 22.6. U + V is a direct sum U ⊕ V just when U ∩ V consists of just the 0 vector.

Proof. If U + V is a direct sum, then we need to see that U ∩ V only contains the zero vector. If it contains some vector x, then we can write x uniquely as a sum x = y + z, but we can also write x = (1/2)x + (1/2)x or as x = (1/3)x + (2/3)x, as a sum of vectors from U and V. Therefore x = 0.

On the other hand, if there is more than one way to write x = y + z = Y + Z for some vectors y and Y from U and z and Z from V, then 0 = (y − Y) + (z − Z), so Y − y = z − Z, a nonzero vector from U ∩ V.

Lemma 22.7. If U ⊕ V is a direct sum of subspaces of a vector space W, then the dimension of U ⊕ V is the sum of the dimensions of U and V. Moreover, putting together any basis of U with any basis of V gives a basis of U ⊕ V.

Proof. Pick a basis for U, say u_1, u_2, ..., u_p, and a basis for V, say v_1, v_2, ..., v_q. Then consider the set of vectors given by throwing all of the u's and v's together. The u's and v's are linearly independent of one another, because any linear relation

\[ 0 = a_1 u_1 + a_2 u_2 + \dots + a_p u_p + b_1 v_1 + b_2 v_2 + \dots + b_q v_q \]

would allow us to write

\[ a_1 u_1 + a_2 u_2 + \dots + a_p u_p = -(b_1 v_1 + b_2 v_2 + \dots + b_q v_q), \]

so that a vector from U (the left hand side) belongs to V (the right hand side), which is impossible unless that vector is zero, because U and V intersect only at 0. But that forces 0 = a_1 u_1 + a_2 u_2 + ... + a_p u_p. Since the u's are a basis, this forces all a's to be zero. The same for the b's, so it isn't a linear relation. Therefore the u's and v's put together give a basis for U ⊕ V.


We can easily extend these ideas to direct sums with many summands U_1 ⊕ U_2 ⊕ ··· ⊕ U_k.

Problem 22.3. Prove that if U ⊕ V = W, then any linear maps S : U → R^p and T : V → R^p determine a unique linear map Q : W → R^p, written Q = S ⊕ T, so that Q|_U = S and Q|_V = T.

22.1 Application: Simultaneously Diagonalizing Several Matrices

Theorem 22.8. Suppose that T_1, T_2, ..., T_N are linear maps taking a vector space V to itself, each diagonalizable, and each commuting with the others (which means T_1 T_2 = T_2 T_1, etc.). Then there is a single invertible linear map F : R^n → V diagonalizing all of them.

Proof. Since T_1 and T_2 commute, if x is an eigenvector of T_1 with eigenvalue λ, then T_2 x is too:

\[ T_1(T_2 x) = T_2 T_1 x = T_2 \lambda x = \lambda (T_2 x). \]

So each eigenspace of T_1 is invariant under T_2. The same is true for any two of the linear maps T_1, T_2, ..., T_N. Because T_1 is diagonalizable, V is a direct sum of the eigenspaces of T_1. So it suffices to find a basis for each eigenspace of T_1 which diagonalizes all of the linear maps. It suffices to prove this on each eigenspace separately. So let's restrict to an eigenspace of T_1, where T_1 = λ_1. So T_1 = λ_1 is diagonal in any basis. By the same reasoning applied to T_2, etc., we can work on a common eigenspace of all of the T_j, arranging that T_2 = λ_2, etc., diagonal in any basis.

22.2 Transversality

Lemma 22.9. If V is a finite dimensional vector space, containing two subspaces U and W, then

\[ \dim U + \dim W = \dim(U + W) + \dim(U \cap W). \]

Proof. Take any basis for U ∩ W. Then while you pick some more vectors from U to extend it to a basis of U, I will simultaneously pick some more vectors from W to extend it to a basis of W. Clearly we can throw our vectors together to get a basis of U + W. Count them up.

This lemma makes certain inequalities on dimensions obvious.

Lemma 22.10. If U and W are subspaces of an n dimensional vector space V, say of dimensions p and q, then

\[ \max\{0, p + q - n\} \le \dim(U \cap W) \le \min\{p, q\}, \]
\[ \max\{p, q\} \le \dim(U + W) \le \min\{n, p + q\}. \]

Proof. All inequalities but the first are obvious. The first follows from the last by applying lemma 22.9 on the previous page.

Problem 22.4. How few dimensions can the intersection of subspaces of dimensions 5 and 3 in R^7 have? How many?

Definition 22.11. Two subspaces U and W of a finite dimensional vector space V are transverse if U + W = V.

Problem 22.5. How few dimensions can the intersection of transverse subspaces of dimensions 5 and 3 in R^7 have? How many?

Problem 22.6. Must subspaces in direct sums be transverse?

22.3 Computations

In R^n, all abstract concepts of linear algebra become calculations.

Problem 22.7. Suppose that U and W are subspaces of R^n. Take a basis for U, and put it into the columns of a matrix, and call that matrix A. Take a basis for W, and put it into the columns of a matrix, and call that matrix B. How do you find a basis for U + W? How do you see if U + W is a direct sum?

Proposition 22.12. Suppose that U and W are subspaces of R^n and that A and B are matrices whose columns give bases for U and W respectively. Apply the algorithm of chapter 10 to find a basis for the kernel of (A B), say

\[ \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}, \begin{pmatrix} x_2 \\ y_2 \end{pmatrix}, \dots, \begin{pmatrix} x_s \\ y_s \end{pmatrix}. \]

Then the vectors Ax_1, Ax_2, ..., Ax_s form a basis for the intersection of U and W.

Proof. For example, Ax_1 + By_1 = 0, so Ax_1 = −By_1 = B(−y_1) lies in the image of A and of B. Therefore the vectors Ax_1, Ax_2, ..., Ax_s lie in U ∩ W.

Suppose that some vector v also lies in U ∩ W. Then v = Ax = B(−y) for some vectors x and y. But then Ax + By = 0, so

\[ \begin{pmatrix} x \\ y \end{pmatrix} \]

is a linear combination of the vectors

\[ \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}, \begin{pmatrix} x_2 \\ y_2 \end{pmatrix}, \dots, \begin{pmatrix} x_s \\ y_s \end{pmatrix}, \]

so x = Σ_j a_j x_j for some numbers a_j, and therefore v = Ax = Σ_j a_j Ax_j. Therefore these Ax_j span the intersection.

Suppose they suffer some linear relation: 0 = Σ_j c_j Ax_j. So 0 = A Σ_j c_j x_j. But the columns of A are linearly independent, so A is 1-1. Therefore 0 = Σ_j c_j x_j. At the same time,

\[ 0 = \sum_j c_j A x_j = \sum_j c_j B(-y_j) = B\Bigl(-\sum_j c_j y_j\Bigr). \]

But B is also 1-1, so 0 = Σ_j c_j y_j. So

\[ 0 = \sum_j c_j \begin{pmatrix} x_j \\ y_j \end{pmatrix}. \]

But these vectors are linearly independent by construction, so all of the c_j are zero.
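Proposition 22.12 is easy to try on example 22.2. In this sketch (not from the text), scipy's null_space stands in for the kernel algorithm of chapter 10; the choice of library and of bases is an assumption made for illustration.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0],       # columns: a basis of U = {(x1, x2, x1 - x2)}
              [0.0, 1.0],
              [1.0, -1.0]])
B = np.array([[1.0, 0.0],       # columns: a basis of W = {(x1, 0, x3)}
              [0.0, 0.0],
              [0.0, 1.0]])

K = null_space(np.hstack([A, B]))   # basis of the kernel of (A B), as columns
xs = K[:A.shape[1], :]              # the x-parts of those kernel vectors
intersection_basis = A @ xs         # columns span U intersect W
print(intersection_basis)           # each column is a multiple of (1, 0, 1)
```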


23 Jordan Normal Form

We can't quite diagonalize every matrix, but we will see how close we can come: the Jordan normal form. We will generalize this form to linear maps of an abstract vector space.

23.1 Jordan Normal Form and Strings

Diagonalizing is powerful. By diagonalizing a matrix, we can see what it does completely, and we can easily carry out immense computations; for example, finding large powers of a matrix. The trouble is that some matrices can't be diagonalized. How close can we come?

Example 23.1.

\[ A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \]

is the simplest possible example. Its only eigenvalue is λ = 0. As a real or complex matrix, it has only one eigenvector,

\[ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \]

up to rescaling. Not enough eigenvectors to form a basis of R^2, so not enough to diagonalize. We will build this entire chapter from this simple example.

Problem 23.1. What does the map taking x to Ax look like for this matrix A?<br />

Let's write

\[
\Delta_1 = (0), \quad
\Delta_2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad
\Delta_3 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},
\]

and in general write ∆_n or just ∆ for the square matrix with 1's just above the diagonal, and 0's everywhere else.

Problem 23.2. Prove that ∆e_j = e_{j−1}, except for ∆e_1 = 0. So we can think of ∆ as shifting the standard basis, like the proverbial lemmings stepping forward until e_1 falls off the cliff.
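A quick numerical spot check of this shifting behaviour (not a proof), assuming NumPy is available:

```python
# Delta_n has 1's just above the diagonal; multiplying by it shifts the
# standard basis: Delta e_j = e_{j-1} and Delta e_1 = 0.
import numpy as np

n = 4
Delta = np.eye(n, k=1)     # the shift matrix Delta_4
e = np.eye(n)              # columns are e_1, ..., e_n
print(Delta @ e[:, 2])     # e_2 (the 0-indexed column 2 is e_3)
print(Delta @ e[:, 0])     # the zero vector, since Delta e_1 = 0
```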




A matrix of the form λ + ∆ is called a Jordan block. Our goal in this chapter is to prove:

Theorem 23.2. Every square complex matrix A can be brought, by a change of basis matrix F, to Jordan normal form

\[
F^{-1} A F =
\begin{pmatrix}
\lambda_1 + \Delta & & & \\
& \lambda_2 + \Delta & & \\
& & \ddots & \\
& & & \lambda_N + \Delta
\end{pmatrix},
\]

broken into Jordan blocks.

We will not give the simplest possible proof (which is probably the one given by Hartl [4]), but instead give an explicit algorithm for computing F, and then prove that the algorithm works.

Definition 23.3. If λ is an eigenvalue of a matrix A, a vector x is a generalized eigenvector of A with eigenvalue λ if (A − λ)^k x = 0, for some positive integer k. If k = 1 then x is an eigenvector in the usual sense.

Example 23.4.

\[
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\]

satisfies A^2 = 0, so every vector x in R^2 is a generalized eigenvector of A, with eigenvalue 0. In the generalized sense, we have lots of eigenvectors.

Problem 23.3. Prove that every vector in C^n is a generalized eigenvector of ∆ with eigenvalue 0. Then prove that for any number λ, every vector in C^n is a generalized eigenvector of λ + ∆ with eigenvalue λ.

Problem 23.4. Prove that no nonzero vector can be a generalized eigenvector with two different eigenvalues.

Problem 23.5. Prove that nonzero generalized eigenvectors of a square matrix, with different eigenvalues, are linearly independent.

Definition 23.5. A string is a collection of linearly independent vectors of the form

\[
x,\ (A - \lambda)x,\ (A - \lambda)^2 x,\ \dots,\ (A - \lambda)^k x,
\]

each a generalized eigenvector with eigenvalue λ.

We want to make our strings as long as possible, not contained in any longer string.



Example 23.6. For A = ∆, e_n, e_{n−1}, ..., e_1 is a string with eigenvalue λ = 0.

Problem 23.6. For A = λ + ∆, show that e_n, e_{n−1}, ..., e_1 is a string with eigenvalue λ.

Problem 23.7. Find strings of

\[
A = \begin{pmatrix}
2 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 1 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 3 & 1 & 0 \\
0 & 0 & 0 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 3
\end{pmatrix}.
\]

Problem 23.8. Prove that every nonzero generalized eigenvector x belongs to a string, and the string can be lengthened until the last entry of the string is an eigenvector (not generalized).

23.2 Algorithm

First we give the algorithm, and then an example, and then a proof that it works. To compute the strings that put a matrix A into Jordan normal form:

a. If A is already in Jordan normal form, then just look beside the diagonal of A to find the blocks, and read off the strings. So we can assume that A is not in Jordan normal form.

b. Find an eigenvalue λ of A. Replace A with A − λ. (So from now on A is not invertible.)

c. Apply forward elimination to (A 1). Call the result (U V).

d. Put the pivot columns of A into a matrix A', and the pivot columns of U into a matrix U'.

e. Solve the system of equations

\[
U' X = U A'
\]

for an unknown square matrix X, by back substitution. This matrix X has size r × r, where r is the rank of A (i.e. the number of pivot columns of A). The matrix X is smaller than A, because not all columns of A are pivot columns.

f. Apply the algorithm to the matrix X, to find its strings. (It may save you time to notice that X has the same eigenvalues as A, except perhaps 0.)

g. For each string of X, applying A' to the string gives a string of A. For example, a string y_1, y_2, ... becomes A'y_1, A'y_2, ....

h. From among the strings we have just constructed in our last step, take each vector z which starts a string with eigenvalue 0. Solve the equations Ux = Vz by back substitution, and add x to the start of this string.

i. We are still missing the strings of length 1 and zero eigenvalue, i.e. vectors from the kernel. Solve Ux = 0 by back substitution, and take a basis x_1, x_2, ..., x_k of solutions.

j. Each of these vectors x_1, x_2, ..., x_k constitutes a string of length 1 and zero eigenvalue. Add into our list of strings enough of these vectors to produce a basis.

k. Put all of the strings into the columns of a single matrix F, each string listed in reverse order. Then F^{-1}AF is in Jordan normal form.

A computer can carry out the algorithm, using symbolic algebra software. Let's see an example, and then prove that this algorithm always works.

Example 23.7. Let

\[
A = \begin{pmatrix}
4 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 3 & 0 \\
0 & 1 & 1 & 3
\end{pmatrix}.
\]

a. A is not already in Jordan normal form. However, we can see that A is built out of blocks: a 1 × 1 block, and a 3 × 3 block. Each block can be separately brought to Jordan normal form, so the strings will divide up into strings for each block. The 1 × 1 block has e_1 as eigenvector (a string of length 1) with eigenvalue 4. So it suffices to find the Jordan normal form for

\[
A = \begin{pmatrix}
1 & 0 & 0 \\
0 & 3 & 0 \\
1 & 1 & 3
\end{pmatrix}.
\]

(We will still call this matrix A, even after we make various changes to it, to avoid a mess of notation.)

b. The eigenvalues of A are 1 and 3. Replace A by A − 3, so

\[
A = \begin{pmatrix}
-2 & 0 & 0 \\
0 & 0 & 0 \\
1 & 1 & 0
\end{pmatrix}.
\]

c. Forward elimination applied to (A 1) yields

\[
(U\ V) = \begin{pmatrix}
-2 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & \tfrac{1}{2} & 0 & 1 \\
0 & 0 & 0 & 0 & -1 & 0
\end{pmatrix},
\]

so

\[
U = \begin{pmatrix}
-2 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 0
\end{pmatrix}, \quad
V = \begin{pmatrix}
1 & 0 & 0 \\
\tfrac{1}{2} & 0 & 1 \\
0 & -1 & 0
\end{pmatrix}.
\]


d. The first two columns of A are pivot columns. Cutting out nonpivot columns,

\[
A' = \begin{pmatrix}
-2 & 0 \\
0 & 0 \\
1 & 1
\end{pmatrix}, \quad
U' = \begin{pmatrix}
-2 & 0 \\
0 & 1 \\
0 & 0
\end{pmatrix}.
\]

e. Solving U'X = UA', we find

\[
U'X - UA' = \begin{pmatrix}
-2X_{11} - 4 & -2X_{12} \\
X_{21} & X_{22} \\
0 & 0
\end{pmatrix}.
\]

Therefore

\[
X = \begin{pmatrix}
-2 & 0 \\
0 & 0
\end{pmatrix}.
\]

f. This matrix X is already in Jordan normal form. The strings of X are

\[
\lambda = -2:\ e_1, \qquad \lambda = 0:\ e_2.
\]

g. These give strings in A:

\[
\lambda = -2:\quad A'e_1 = \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}, \qquad
\lambda = 0:\quad A'e_2 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
\]

h. One of the strings has zero eigenvalue, and starts with

\[
z = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
\]

Solve Ux = Vz for an unknown vector

\[
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix},
\]

finding

\[
Ux - Vz = \begin{pmatrix} -2x_1 \\ x_2 - 1 \\ 0 \end{pmatrix}.
\]

Back substitution gives

\[
x = \begin{pmatrix} 0 \\ 1 \\ x_3 \end{pmatrix}
\]

for any number x_3. We will pick x_3 = 0 for simplicity. Add this vector x to the front of the string: e_2, Ae_2 = e_3. So the two strings for A are

\[
\lambda = -2:\quad \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}, \qquad
\lambda = 0:\quad e_2,\ e_3.
\]

i. We can see a basis already, so skip this step.

j. And skip this step too, for the same reason.

Returning to the original problem, we first have to shift back the eigenvalues by 3 to restore the matrix back to

\[
A = \begin{pmatrix}
1 & 0 & 0 \\
0 & 3 & 0 \\
1 & 1 & 3
\end{pmatrix}.
\]

This shifts eigenvalues, but preserves strings:

\[
\lambda = 1:\quad \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}, \qquad
\lambda = 3:\quad e_2,\ e_3.
\]

Next we add back in the original block structure of A, so return to

\[
A = \begin{pmatrix}
4 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 3 & 0 \\
0 & 1 & 1 & 3
\end{pmatrix}.
\]

This requires us to relabel our vectors for the second block, giving strings:

\[
\lambda = 4:\quad e_1, \qquad
\lambda = 1:\quad \begin{pmatrix} 0 \\ -2 \\ 0 \\ 1 \end{pmatrix}, \qquad
\lambda = 3:\quad e_3,\ e_4.
\]


Writing each string down in reverse order, they become the columns of the matrix

\[
F = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & -2 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 1 & 1 & 0
\end{pmatrix}.
\]

This matrix F brings A to Jordan normal form:

\[
F^{-1} A F = \begin{pmatrix}
4 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 3 & 1 \\
0 & 0 & 0 & 3
\end{pmatrix}.
\]
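As a cross-check of this worked example, here is a short sketch assuming SymPy is available. Its jordan_form method returns matrices P and J with A = P J P^{-1}, so J should agree with the form computed by hand, up to the order of the blocks (and P need not equal the F above, since many choices of F work).

```python
from sympy import Matrix

A = Matrix([[4, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 3, 0],
            [0, 1, 1, 3]])
P, J = A.jordan_form()
print(J)                       # 1x1 blocks for 4 and 1, a 2x2 block for 3 (in some order)
print(P.inv() * A * P == J)    # True
```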

Proposition 23.8. The algorithm takes any complex n × n matrix, and yields strings whose entries are a basis of C^n.

Proof. Clearly this is true for a 1 × 1 matrix. Let's imagine that A is n × n, and that we have proven this proposition for any smaller complex matrix.

U = VA, so dropping pivotless columns gives U' = VA'. Therefore U'X = UA' just when A'X = AA'. This implies A'(X − λ) = (A − λ)A', and so A'(X − λ)^k = (A − λ)^k A' for any integer k. Since A' has linearly independent columns, only 0 belongs to its kernel. Since the columns of A and of A' both span the image of A, the images of A and A' are the same. So mapping a vector y in C^r to the vector A'y in C^n makes a correspondence between vectors in C^r and vectors in the image of A. A vector y starts a string for X just when A'y starts a corresponding string for A (of the same length with the same eigenvalue). By induction, we can assume that the strings for X produce a basis of C^r. We write down the corresponding strings for A in step g. So we have a basis of strings for the image of A, not enough to make a basis of C^n. Let's make some more.

First, let's think about strings of A of eigenvalue λ = 0. They look like x, Ax, A^2 x, .... Every string of A that we have constructed so far lies in the image of A. We can always lengthen the λ = 0 strings because if the first vector in the string, say z, lies in the image, say z = Ax, then we can add x to the front of the string. After that first vector, which is not in the image, the rest of the string is in the image. This explains step h.

But there could still be λ = 0 strings which have no vectors in the image. Such a λ = 0 string must have length 1, since any λ = 0 string x, Ax, ... has Ax in the image. This explains step i. We can't build any more linearly independent λ = 0 strings, or lengthen any of the λ = 0 strings we have.

We need to see that our strings form a basis of C^n. First we will check that all of the vectors in all of the strings are linearly independent, and then we will count them.

Take any vectors x_1, x_2, ..., x_p, so that any two are either from different strings, or from different positions on the same string. Suppose that they satisfy a linear relation. If none of these x_j head up a λ = 0 string, then they live in the image, and are linearly independent by construction. There cannot be a linear relation between generalized eigenvectors with different eigenvalues, so we can assume that all of these x_j live in λ = 0 strings. Any linear relation

\[
0 = c_1 x_1 + c_2 x_2 + \dots + c_p x_p
\]

entails a linear relation

\[
0 = c_1 Ax_1 + c_2 Ax_2 + \dots + c_p Ax_p,
\]

pushing each vector down the string. None of these vectors can occur at the same position, so there is no such relation unless all of the Ax_j vanish. Therefore we can assume that all of the x_j are in λ = 0 strings of length 1. But these are linearly independent by construction.

We have to count how many vectors we have written down in all of the strings put together. A is n × n, and has rank r. The image of A has dimension r and the kernel has dimension n − r. In step f, we picked up strings containing r vectors. In step h, we added one vector to the front of each λ = 0 string. Such a string ends in the intersection of kernel and image. Suppose that the kernel and image have m dimensional intersection—there are m such strings. Therefore we have taken r vectors, and added m more. In step i, we added one more vector, for each dimension of the kernel outside the image. This adds n − r − m more vectors, giving a total of r + m + (n − r − m) = n vectors.

Theorem 23.9. Every complex square matrix A can be brought to Jordan normal form

\[
F^{-1} A F =
\begin{pmatrix}
\lambda_1 + \Delta & & & \\
& \lambda_2 + \Delta & & \\
& & \ddots & \\
& & & \lambda_N + \Delta
\end{pmatrix}
\]

by the invertible complex matrix F obtained from the algorithm.

Proof. Take the strings and write each one down in reverse order into the columns of a matrix F. Each string now becomes e_i, e_{i−1}, e_{i−2}, ..., just as in a Jordan block. Replace A by F^{-1}AF, so we can assume that Ae_i = λ_i e_i + e_{i−1}. This gives us the i-th column of A, the same as for the Jordan normal form.

Corollary 23.10. The same theorem is true for real square matrices, as long as all of the complex eigenvalues are real numbers.

Proof. Use the same proof.
Proof. Use the same proof.



Problem 23.9. For an n × n matrix A with entries in a field F, show that we can put A into Jordan normal form as P A P^{-1} using a square matrix P with entries from F just when the characteristic polynomial det(A − λ I) splits into n linear factors with coefficients from F.

Cracking of Jordan Blocks

Jordan normal form is very sensitive to small changes in matrix entries. For this reason, we cannot compute Jordan normal form unless we know the matrix entries precisely, to infinitely many decimals.

Problem 23.10. If A is n × n and has n different eigenvalues, show that A is diagonalizable.

Example 23.11. The matrix

\[
\Delta_2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\]

is not diagonalizable, but the nearby matrix

\[
\begin{pmatrix} \varepsilon_1 & 1 \\ 0 & \varepsilon_2 \end{pmatrix}
\]

is, as long as ε_1 ≠ ε_2, since these ε_1 and ε_2 are the eigenvalues of this matrix. It doesn't matter how small ε_1 and ε_2 are. The same idea clearly works for ∆ of any size, and for λ + ∆ of any size, and so for any matrix in Jordan normal form.

Theorem 23.12. Every complex square matrix can be approximated as closely as we like by a diagonalizable square matrix.

Remark 23.13. By “approximated”, we mean that we can make a new matrix with entries all as close as we wish to the entries of the original matrix.

Proof. Put your matrix into Jordan normal form, say F^{-1}AF is in Jordan normal form, and then use the trick from the last example, to make diagonalizable matrices B close to F^{-1}AF. Then F B F^{-1} is diagonalizable too, and is close to A.

Remark 23.14. Using the same proof, we can also approximate any real matrix arbitrarily closely by diagonalizable real matrices (i.e. diagonalized by real change of basis matrices), just when its eigenvalues are real.
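A small numerical illustration of this sensitivity, assuming NumPy is available: ∆_2 has the double eigenvalue 0 and cannot be diagonalized, while an arbitrarily small perturbation of it has two distinct eigenvalues and therefore can.

```python
import numpy as np

Delta = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
eps = 1e-8
B = Delta + np.diag([eps, -eps])    # a tiny perturbation of Delta

print(np.linalg.eigvals(Delta))     # [0, 0]: one repeated eigenvalue
print(np.linalg.eigvals(B))         # two distinct (tiny) eigenvalues
```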



23.3 Uniqueness of Jordan Normal Form

Proposition 23.15. A square matrix has only one Jordan normal form, up to reordering the Jordan blocks.

Proof. Clearly the eigenvalues are independent of change of basis. The problem is to figure out how to measure, for each eigenvalue, the number of blocks of each size. Fix an eigenvalue λ. Let

\[
d_m(\lambda, A) = \dim \ker (A - \lambda)^m.
\]

(We will only use this notation in this proof.) Clearly d_m(λ, A) is independent of choice of basis. For example, d_1(λ, A) is the number of blocks with eigenvalue λ, while d_2(λ, A) counts vectors at or next to the end of strings with eigenvalue λ. All 1 × 1 blocks contribute to both d_1(λ, A) and d_2(λ, A). The difference d_2(λ, A) − d_1(λ, A) is the number of blocks of size at least 2 × 2. Similarly the number d_3(λ, A) − d_2(λ, A) measures the number of blocks of size at least 3 × 3, etc. But then the difference

\[
\bigl(d_2(\lambda, A) - d_1(\lambda, A)\bigr) - \bigl(d_3(\lambda, A) - d_2(\lambda, A)\bigr)
= 2\, d_2(\lambda, A) - d_1(\lambda, A) - d_3(\lambda, A)
\]

is the number of blocks of size at least 2 × 2, but not 3 × 3 or more, i.e. exactly 2 × 2.

The number of m × m blocks is the difference between the number of blocks at least m × m and the number of blocks at least (m + 1) × (m + 1), so

\[
\text{number of } m \times m \text{ blocks}
= \bigl(d_m(\lambda, A) - d_{m-1}(\lambda, A)\bigr) - \bigl(d_{m+1}(\lambda, A) - d_m(\lambda, A)\bigr)
= 2\, d_m(\lambda, A) - d_{m-1}(\lambda, A) - d_{m+1}(\lambda, A).
\]

(We can just define d_0(λ, A) = 0 to allow this equation to hold even for m = 1.) Therefore the number of blocks of any size is independent of the choice of basis.
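The block-counting formula in this proof can be turned into a small computation. Here is a sketch assuming NumPy is available; it relies on numerical ranks, so it is only trustworthy when the matrix entries are known exactly, in line with the warning of the previous section.

```python
import numpy as np

def jordan_block_counts(A, lam, tol=1e-9):
    """Number of m x m Jordan blocks of A with eigenvalue lam, for each m."""
    n = A.shape[0]
    M = A - lam * np.eye(n)
    # d[m] = dim ker (A - lam)^m = n - rank((A - lam)^m), with d[0] = 0.
    d = [0] + [n - np.linalg.matrix_rank(np.linalg.matrix_power(M, m), tol=tol)
               for m in range(1, n + 2)]
    return {m: 2*d[m] - d[m-1] - d[m+1] for m in range(1, n + 1)}

# One 1x1 block and one 2x2 block with eigenvalue 5:
A = np.array([[5.0, 0.0, 0.0],
              [0.0, 5.0, 1.0],
              [0.0, 0.0, 5.0]])
print(jordan_block_counts(A, 5.0))   # {1: 1, 2: 1, 3: 0}
```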

23.3 Review Problems

Compute the matrix F for which F^{-1}AF is in Jordan normal form, and the Jordan normal form itself, for

Problem 23.11.

\[
A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
\]

Problem 23.12.

\[
A = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]



Problem 23.13. Thinking about the fact that ∆ has string e_n, e_{n−1}, ..., e_1, what is the Jordan normal form of A = ∆_n^2? (Don't try to find the matrix F bringing A to that form.)

Problem 23.14. Use the algorithm to compute the Jordan normal form of

\[
A = \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
-1 & 0 & -2 & 0
\end{pmatrix}
\]

Problem 23.15. Use the algorithm to compute the Jordan normal form of

\[
A = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]

Problem 23.16. Without computation (and without finding the matrix F taking A to Jordan normal form), explain how you can see the Jordan normal form of

\[
\begin{pmatrix}
1 & 10 & 100 \\
0 & 20 & 200 \\
0 & 0 & 300
\end{pmatrix}
\]

Problem 23.17. If a square complex matrix A satisfies a complex polynomial equation f(A) = 0, show that each eigenvalue of A must satisfy the same equation.

Problem 23.18. Prove that any reordering of Jordan blocks can occur by changing the choice of the matrix F we use to bring a matrix A to Jordan normal form.

Problem 23.19. Prove that every matrix is a product of two diagonalizable matrices.

Problem 23.20. Suppose that to each n × n matrix A we assign a number D(A), and that D(AB) = D(A)D(B).
D(A), and that D(AB) = D(A)D(B).


(a) Prove that

\[
D\bigl(P^{-1} A P\bigr) = D(A)
\]

for any invertible matrix P.

(b) Define a function f(x) by

\[
f(x) = D
\begin{pmatrix}
x & & & \\
& 1 & & \\
& & \ddots & \\
& & & 1
\end{pmatrix}.
\]

Prove that f(x) is multiplicative; i.e. f(ab) = f(a)f(b) for any numbers a and b.

(c) Prove that on any diagonal matrix

\[
A = \begin{pmatrix}
a_1 & & & \\
& a_2 & & \\
& & \ddots & \\
& & & a_n
\end{pmatrix}
\]

we have

\[
D(A) = f(a_1)\, f(a_2) \dots f(a_n).
\]

(d) Prove that on any diagonalizable matrix A,

\[
D(A) = f(\lambda_1)\, f(\lambda_2) \dots f(\lambda_n),
\]

where λ_1, λ_2, ..., λ_n are the eigenvalues of A (counted with multiplicity).

(e) Use the previous exercise to show that D(A) = f(det(A)) for any matrix A.

(f) In this sense, det is the unique multiplicative quantity associated to a matrix, up to composing with a multiplicative function f. What are all continuous multiplicative functions f? (Warning: it is a deep result that there are many discontinuous multiplicative functions.)


24 Decomposition and Minimal Polynomial

The Jordan normal form has an abstract version for linear maps, called the decomposition of a linear map. Most results in this chapter are obvious consequences of the Jordan normal form from the previous chapter, but because Jordan normal form is so complicated we won't use it in this chapter. We will assume that the reader has read the last chapter up to, but perhaps not including, the algorithm.

24.1 The Spectral Theorem for Complex Linear Maps

Polynomial Division

Problem 24.1. Divide x^2 + 1 into x^5 + 3x^2 + 4x + 1, giving quotient and remainder.

Problem 24.2. Use the Euclidean algorithm (subsection 17.5 on page 175) applied to polynomials instead of integers, to compute the greatest common divisor r(x) of a(x) = x^4 + 2x^3 + 4x^2 + 4x + 4 and b(x) = x^5 + 2x^3 + x^2 + 2. Find polynomials u(x) and v(x) so that u(x)a(x) + b(x)v(x) = r(x).

Given any pair of polynomials a(x) and b(x), the Euclidean algorithm writes their greatest common divisor r(x) as

\[
r(x) = u(x)\, a(x) + v(x)\, b(x),
\]

a “linear combination” of a(x) and b(x). Similarly, if we have any number of polynomials, we can write the greatest common divisor of any pair of them as a linear combination. Pick two pairs of polynomials, and write the greatest common divisor of the greatest common divisors as a linear combination, etc. Keep going until you hit the greatest common divisor of the entire collection. We can unwind this process, to write the greatest common divisor of the entire collection of polynomials as a linear combination of the polynomials themselves.
225



Problem 24.3. For integers 2310, 990 and 1386 (instead of polynomials) express their greatest common divisor as an “integer linear combination” of them.

Generalized Eigenvectors

Lemma 24.1. Let T : V → V be a complex linear map on a finite dimensional vector space V of dimension n. For each vector v in V there is a polynomial q(x) of degree at most n for which q(T)v = 0, and for which the roots of q(x) are eigenvalues of T.

Remark 24.2. We could just let q(x) be the characteristic polynomial of T, and employ the Cayley–Hamilton theorem (theorem 18.14 on page 187). But we will opt for a more elementary proof.

Proof. Take a vector v in V, and consider the vectors v, Tv, T^2 v, ..., T^n v. There are n + 1 vectors in this list, so there must be a linear relation

\[
a_0 v + a_1 T v + \dots + a_n T^n v = 0.
\]

Let

\[
q(x) = a_0 + a_1 x + \dots + a_n x^n.
\]

Clearly q(T)v = 0. Rescale to get the leading coefficient (the highest nonzero a_j) to be 1. Suppose that q(λ) = 0. If λ is not an eigenvalue of T, then T − λ is invertible, and we can drop the x − λ factor from q(x) and still satisfy q(T)v = 0.

Theorem 24.3. Let T : V → V be a complex linear map on a finite dimensional vector space. Every vector in V can be written as a sum of generalized eigenvectors, in a unique way. In other words, V is the direct sum of the generalized eigenspaces of T.

Proof. We have seen in the previous chapter that generalized eigenvectors with different eigenvalues are linearly independent. We need to show that every vector v can be written as a sum of generalized eigenvectors.

Pick any vector v in V. Suppose that T has eigenvalues λ_1, λ_2, ..., λ_s, all distinct from one another. Pick q(x) as in lemma 24.1. By the fundamental theorem of algebra, we can write q(x) as a product of linear factors, each of the form x − λ_j. Let q_j(x) be the result of dividing out as many factors of x − λ_j as possible from q(x), say:

\[
q_j(x) = \frac{q(x)}{(x - \lambda_j)^{d_j}}.
\]

The various q_j(x) have no common divisors by construction. Therefore we can find polynomials u_j(x) for which ∑ u_j(x) q_j(x) = 1. Let v_j = u_j(T) q_j(T) v. Clearly

\[
\sum v_j = \sum q_j(T)\, u_j(T)\, v = v.
\]

These v_j are generalized eigenvectors:

\[
(T - \lambda_j)^{d_j} v_j
= (T - \lambda_j)^{d_j} q_j(T)\, u_j(T)\, v
= q(T)\, u_j(T)\, v
= u_j(T)\, q(T)\, v
= 0.
\]

24.2 The Minimal Polynomial

We are interested in equations satisfied by a matrix.

Lemma 24.4. Every linear map T : V → V on a finite dimensional vector space V satisfies a polynomial equation p(T) = 0 with p(x) a nonzero polynomial.

Remark 24.5. The Cayley–Hamilton theorem (theorem 18.14 on page 187) proves this lemma easily, but again we prefer an elementary proof.

Proof. The set of all linear maps V → V is a finite dimensional vector space, of dimension n^2 where n = dim V. Therefore the elements 1, T, T^2, ..., T^{n^2} cannot be linearly independent.

Problem 24.4. Prove that ∆^k has zeros down the diagonal, for any integer k ≥ 1.

Problem 24.5. Prove that, for any number λ, the diagonal entries of (λ + ∆)^k are all λ^k.

Definition 24.6. The minimal polynomial of a square matrix A (or a linear map T : V → V) is the smallest degree polynomial

\[
m(x) = x^d + a_{d-1} x^{d-1} + \dots + a_0
\]

(with complex coefficients) for which m(A) = 0 (or m(T) = 0).

Lemma 24.7. There is a unique minimal polynomial for any linear map T : V → V on a finite dimensional vector space V. The minimal polynomial divides every other polynomial s(x) for which s(T) = 0.


Remark 24.8. The Cayley–Hamilton theorem (theorem 18.14 on page 187) coupled with this lemma ensures that the minimal polynomial divides the characteristic polynomial.

Proof. For example, if T satisfies two polynomials, say 0 = T^3 + 3T + 1 and 0 = 2T^3 + 1, then we can rescale the second equation by 1/2 to get 0 = T^3 + 1/2, and then we have two equations which both start with T^3, so just take the difference: 0 = (T^3 + 3T + 1) − (T^3 + 1/2). The point is that the T^3 terms wipe each other out, giving a new equation of lower degree. Keep going until you get the lowest degree possible nonzero polynomial. Rescale to get the leading coefficient to be 1.

If s(x) is some other polynomial, and s(T) = 0, then divide m(x) into s(x), say s(x) = q(x)m(x) + r(x), with remainder r(x) of smaller degree than m(x). But then 0 = s(T) = q(T)m(T) + r(T) = r(T), so r(x) has smaller degree than m(x), and r(T) = 0. But m(x) is already the smallest degree possible without being 0. So r(x) = 0, and m(x) divides s(x).

Problem 24.6. Prove that the minimal polynomial of ∆_n is m(x) = x^n.

Problem 24.7. Prove that the minimal polynomial of a Jordan block λ + ∆_n is m(x) = (x − λ)^n.

Lemma 24.9. If A and B are square matrices with minimal polynomials m_A(x) and m_B(x), then the matrix

\[
C = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}
\]

has minimal polynomial m_C(x) the least common multiple of the polynomials m_A(x) and m_B(x).

Proof. Calculate that

\[
C^2 = \begin{pmatrix} A^2 & 0 \\ 0 & B^2 \end{pmatrix},
\]

etc., so for any polynomial q(x),

\[
q(C) = \begin{pmatrix} q(A) & 0 \\ 0 & q(B) \end{pmatrix}.
\]

Let g(x) be the least common multiple of the polynomials m_A(x) and m_B(x). Since m_A(x) divides g(x), we get g(A) = 0, and similarly g(B) = 0, so clearly g(C) = 0. So m_C(x) divides g(x). But m_C(C) = 0, so m_C(A) = 0. Therefore m_A(x) divides m_C(x). Similarly, m_B(x) divides m_C(x). So g(x) divides m_C(x), and m_C(x) is the least common multiple.


Using the same proof:

Lemma 24.10. If a linear map T : V → V has invariant subspaces U and W so that V = U ⊕ W, then the minimal polynomial of T is the least common multiple of the minimal polynomials of T|_U and T|_W.

Lemma 24.11. The minimal polynomial m(x) of a complex linear map T : V → V on a finite dimensional vector space V is

\[
m(x) = (x - \lambda_1)^{d_1} (x - \lambda_2)^{d_2} \dots (x - \lambda_s)^{d_s}
\]

where λ_1, λ_2, ..., λ_s are the eigenvalues of T and d_j is no larger than the dimension of the generalized eigenspace of λ_j.

Remark 24.12. In fact d_j is the size of the largest Jordan block with eigenvalue λ_j in the Jordan normal form. Let's first prove the result using Jordan normal form.

Proof. We can assume that T is already in Jordan normal form: the minimal polynomial is the least common multiple of the minimal polynomials of the blocks.

Remark 24.13. Next a proof which doesn't use Jordan normal form.

Proof. We need only prove the result on each generalized eigenspace since they form a direct sum. We can assume that V is a single generalized eigenspace, say with eigenvalue λ. The result holds for T just if it holds for T − λ, so we can assume that λ = 0 is our only eigenvalue. By lemma 24.1 on page 226, every vector v satisfies T^n v = 0. So the minimal polynomial must divide x^n.

Corollary 24.14. A square matrix A (or linear map T : V → V) is diagonalizable just when it satisfies a polynomial equation s(A) = 0 (or s(T) = 0) with

\[
s(x) = (x - \lambda_1)(x - \lambda_2) \dots (x - \lambda_s),
\]

for some distinct numbers λ_1, λ_2, ..., λ_s, which happens just when its minimal polynomial is a product of distinct linear factors.

24.2 Review Problems

Problem 24.8. Find the minimal polynomial of

\[
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.
\]


Problem 24.9. Prove that the minimal polynomial of any 2 × 2 matrix A is

\[
m(\lambda) = \lambda^2 - (\mathrm{tr}\, A)\,\lambda + \det A
\]

(where tr A is the trace of A), unless A is a multiple of the identity matrix, say A = c for some number c, in which case m(λ) = λ − c.

Problem 24.10. Use Jordan normal form to prove the Cayley–Hamilton theorem: every complex square matrix A satisfies p(A) = 0, where p(λ) = det(A − λ I) is the characteristic polynomial of A.

Problem 24.11. Prove that if A is a square matrix with real entries, then the minimal polynomial of A has real coefficients.

Problem 24.12. If A is a square matrix, and A^n = 1, prove that A is diagonalizable over the complex numbers. Give an example to show that A need not be diagonalizable over the real numbers.

Appendix: How to Find the Minimal Polynomial

Given a square matrix A, to find its minimal polynomial requires that we find linear relations among powers of A. If we find a relation like A^3 = I + 5A, then we can multiply both sides by A to obtain a relation A^4 = A + 5A^2. In particular, once some power of A is a linear combination of lower powers of A, then every higher power is also a linear combination of lower powers.

For each n × n matrix A, just for this appendix let's write \underline{A} to mean the vector you get by chopping out the columns of A and stacking them on top of one another. For example, if

\[
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix},
\quad\text{then}\quad
\underline{A} = \begin{pmatrix} 1 \\ 3 \\ 2 \\ 4 \end{pmatrix}.
\]

Clearly a linear relation like A^3 = I + 5A will hold just when it holds underlined: \underline{A^3} = \underline{I} + 5\underline{A}.

Now let's suppose that A is n × n, and let's form the matrix

\[
B = \begin{pmatrix} \underline{I} & \underline{A} & \underline{A^2} & \dots & \underline{A^n} \end{pmatrix}.
\]


Clearly B has n^2 rows and n + 1 columns. Apply forward elimination to B, and call the resulting matrix U. If one of the columns, let's say \underline{A^3}, is not a pivot column, then A^3 is a linear combination of lower powers of A, so therefore A^4 is too, etc. So as soon as you hit a pivotless column of B, all subsequent columns are pivotless. Therefore U has pivots straight down the diagonal, until you hit rows of zeros. Cut out all of the pivotless columns of U except the first pivotless column. Also cut out the zero rows. Then apply back substitution, turning U into

\[
\begin{pmatrix}
1 & & & & a_0 \\
& 1 & & & a_1 \\
& & \ddots & & \vdots \\
& & & 1 & a_p
\end{pmatrix}.
\]

Then the minimal polynomial is m(x) = x^{p+1} − a_0 − a_1 x − ⋯ − a_p x^p. To see that this works, notice that we have cut out all but the column of \underline{A^{p+1}}, the smallest power of A that is a linear combination of lower powers. So the minimal polynomial has to express A^{p+1} as a linear combination of lower powers, i.e. we are solving the linear equations

\[
a_0 \underline{I} + a_1 \underline{A} + \dots + a_p \underline{A^p} = \underline{A^{p+1}}.
\]

These equations have the columns of B as their coefficients and a_0, a_1, ..., a_p as the unknowns, and we just apply elimination. On large matrices, this process is faster than finding the determinant. But it has the danger that small perturbations of the matrix entries alter the minimal polynomial drastically, so we can only apply this process when we know the matrix entries precisely.
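Here is a rough sketch of this recipe, assuming SymPy is available, with the elimination steps handled by SymPy's built-in reduced row echelon form and nullspace; the example matrix is an illustrative choice, not the one from problem 24.13.

```python
from sympy import Matrix, eye, symbols

def minimal_polynomial_sketch(A):
    x = symbols('x')
    n = A.shape[0]
    powers = [eye(n)]
    for _ in range(n):
        powers.append(powers[-1] * A)
    # Each power, stacked column by column into one tall vector (the
    # underlined matrices of the text).
    cols = [P.T.reshape(n * n, 1) for P in powers]
    B = Matrix.hstack(*cols)
    _, pivots = B.rref()
    p1 = next(j for j in range(n + 1) if j not in pivots)   # degree of m(x)
    # The kernel of the first p1 + 1 columns is one dimensional; its entries
    # are the coefficients of the relation among I, A, ..., A^p1.
    k = Matrix.hstack(*cols[:p1 + 1]).nullspace()[0]
    k = k / k[p1]                                           # leading coefficient 1
    return sum(k[j] * x**j for j in range(p1 + 1))

A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])
print(minimal_polynomial_sketch(A).expand())   # x**3 - 7*x**2 + 16*x - 12
```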

Problem 24.13. Find the minimal polynomial of

\[
\begin{pmatrix}
0 & 1 & 1 \\
2 & 1 & 2 \\
0 & 0 & -1
\end{pmatrix}.
\]

What are the eigenvalues?

24.3 Decomposition of a Linear Map

There is a more abstract version of the Jordan normal form, applicable to an abstract finite dimensional vector space.


Definition 24.15. If T : V → W is a linear map, and U is a subspace of V, recall that the restriction T|_U : U → W is defined by T|_U u = T u for u in U.

Definition 24.16. Suppose that T : V → V is a linear map from a vector space back to itself, and U is a subspace of V. We say that U is invariant under T if whenever u is in U, T u is also in U.

A difficult result to prove by any other means:

Corollary 24.17. If a linear map T : V → V on a finite dimensional vector space is diagonalizable, then its restriction to any invariant subspace is diagonalizable.

Proof. The linear map satisfies the same polynomial equation, even after restricting to the subspace.

Problem 24.14. Prove that a linear map T : V → V on a finite dimensional vector space is diagonalizable just when every subspace invariant under T has a complementary subspace invariant under T.

Definition 24.18. A linear map N : V → V from a vector space back to itself is called nilpotent if there is some positive integer k for which N^k = 0.

Clearly a linear map on a finite dimensional vector space is nilpotent just when its minimal polynomial is p(x) = x^k for some positive integer k.

Corollary 24.19. A linear map N : V → V is nilpotent just when the restriction of N to any N invariant subspace is nilpotent.

Problem 24.15. Prove that the only nilpotent which is diagonalizable is 0.

Problem 24.16. Give an example to show that the sum of two nilpotents might not be nilpotent.

Definition 24.20. Two linear maps S : V → V and T : V → V commute when ST = TS.

Lemma 24.21. The sum and difference of commuting nilpotents is nilpotent.

Proof. Suppose S and T are nilpotent linear maps V → V, say S^p = 0 and T^q = 0. Take any number r ≥ p + q and expand out the sum (S + T)^r. Because S and T commute, every term can be written with all its S factors on the left, and all its T factors on the right, and there are r factors in all, so in each term either at least p of the factors are S or at least q are T; hence each term vanishes.

Lemma 24.22. If two linear maps S, T : V → V commute, then S preserves each generalized eigenspace of T, and vice versa.
each generalized eigenspace of T , and vice versa.


Proof. If ST = TS, then clearly ST^2 = T^2S, etc., so that S p(T) = p(T) S for any polynomial p(T). Suppose that T has an eigenvalue λ. If x is a generalized eigenvector, i.e. (T − λ)^p x = 0 for some p, then

\[
(T - \lambda)^p S x = S (T - \lambda)^p x = 0,
\]

so that Sx is also a generalized eigenvector with the same eigenvalue.

Theorem 24.23. Take T : V → V any linear map from a finite dimensional vector space back to itself. T can be written in just one way as a sum T = D + N with D diagonalizable, N nilpotent, and all three of T, D and N commuting.

Example 24.24. For T = λ + ∆, set D = λ, and N = ∆.

Remark 24.25. If any two of T, D and N commute, and T = D + N, then it is easy to check that all three commute.

Proof. First, let's prove that D and N exist, and then prove they are unique. One proof that D and N exist, which doesn't require Jordan normal form: split up V into the direct sum of the generalized eigenspaces of T. It is enough to find some D and N on each one of these spaces. But on each generalized eigenspace, say with eigenvalue λ, we can let D = λ and let N = T − λ. So existence of D and N is obvious.

Another proof that D and N exist, which uses Jordan normal form: pick a basis in which the matrix of T is in Jordan normal form. Let's also call this matrix T, say

\[
T = \begin{pmatrix}
\lambda_1 + \Delta & & & \\
& \lambda_2 + \Delta & & \\
& & \ddots & \\
& & & \lambda_N + \Delta
\end{pmatrix}.
\]

Let

\[
D = \begin{pmatrix}
\lambda_1 & & & \\
& \lambda_2 & & \\
& & \ddots & \\
& & & \lambda_N
\end{pmatrix},
\qquad
N = \begin{pmatrix}
\Delta & & & \\
& \Delta & & \\
& & \ddots & \\
& & & \Delta
\end{pmatrix}.
\]

This proves that D and N exist.

Why are D and N uniquely determined? All generalized eigenspaces of T are D and N invariant. So we can restrict to a single generalized eigenspace of T, and need only show that D and N are uniquely determined there. If λ is the eigenvalue, then D − λ = (T − λ) − N is a difference of commuting nilpotents, so nilpotent by lemma 24.21 on page 232. Therefore D − λ is both nilpotent and diagonalizable, and so vanishes: D = λ and N = T − λ, uniquely determined.
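A small sketch of this decomposition in coordinates, assuming SymPy is available: jordan_form gives P and J with A = P J P^{-1}, and splitting J into its diagonal part and the rest, then conjugating back, produces D and N. The matrix below is an illustrative choice.

```python
from sympy import Matrix, diag, zeros

A = Matrix([[4, 1, 0],
            [0, 4, 0],
            [0, 0, 7]])

P, J = A.jordan_form()
D = P * diag(*[J[i, i] for i in range(J.shape[0])]) * P.inv()   # diagonalizable part
N = A - D                                                       # nilpotent part

print(N**3 == zeros(3, 3))    # True: N is nilpotent
print(D*N == N*D)             # True: D and N commute
```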

Theorem 24.26. If T_0 : V → V and T_1 : V → V are commuting complex linear maps (i.e. T_0 T_1 = T_1 T_0) on a finite dimensional complex vector space V, then splitting each into its diagonalizable and nilpotent parts, T_0 = D_0 + N_0 and T_1 = D_1 + N_1, any two of the maps T_0, D_0, N_0, T_1, D_1, N_1 commute.

Proof. If x is a generalized eigenvector of T_0 (so that (T_0 − λ)^k x = 0 for some λ and integer k > 0), then T_1 x is also (because (T_0 − λ)^k T_1 x = 0 too, by pulling the T_1 across to the left). Since V is a direct sum of generalized eigenspaces of T_0, we can restrict to a generalized eigenspace of T_0 and prove the result there. So we can assume that T_0 = λ_0 + N_0, for some complex number λ_0. Switching the roles of T_0 and T_1, we can assume that T_1 = λ_1 + N_1. Clearly D_0 = λ_0 and D_1 = λ_1 commute with one another and with anything else. The commuting of T_0 and T_1 is equivalent to the commuting of N_0 and N_1.

Remark 24.27. All of the results of this chapter apply equally well to any linear map T : V → V on any finite dimensional vector space over any field, as long as the characteristic polynomial of T splits into a product of linear factors.

24.3 Review Problems

Problem 24.17. Find the decomposition T = D + N of

\[
T = \begin{pmatrix} \varepsilon_1 & 1 \\ 0 & \varepsilon_2 \end{pmatrix}
\]

(using the same letter T for the linear map and its associated matrix).

Problem 24.18. If two linear maps S : V → V and T : V → V on a finite dimensional complex vector space commute, show that the eigenvalues of ST are products of eigenvalues of S with eigenvalues of T.
are products of eigenvalues of S with eigenvalues of T .


25 Matrix Functions of a Matrix Variable

We will make sense out of expressions like e^T, sin T, cos T for square matrices T, and for linear maps T : V → V. We expect that the reader is familiar with calculus and infinite series.

Definition 25.1. A function f(x) is analytic if near each point x = x_0, f(x) is the sum of a convergent Taylor series, say

\[
f(x) = a_0 + a_1 (x - x_0) + a_2 (x - x_0)^2 + \dots
\]

We will henceforth allow the variable x (and the point x = x_0 around which we take the Taylor series) to take on real or complex values.

Definition 25.2. If f(x) is an analytic function and T : V → V is a linear map on a finite dimensional vector space, define f(T) to mean the infinite series

\[
f(T) = a_0 + a_1 (T - x_0) + a_2 (T - x_0)^2 + \dots,
\]

just plugging the linear map T into the expansion.

Lemma 25.3. Under an isomorphism F : U → V of finite dimensional vector spaces,

\[
f\bigl(F^{-1} T F\bigr) = F^{-1} f(T)\, F,
\]

with each side defined when the other is.

Proof. Expanding out, we see that (F^{-1}TF)^2 = F^{-1}T^2F. By induction, (F^{-1}TF)^k = F^{-1}T^kF for k = 1, 2, 3, .... So for any polynomial function p(x), we see that p(F^{-1}TF) = F^{-1}p(T)F. Therefore the partial sums of the Taylor expansion converge on the left hand side just when they converge on the right, approaching the same value.

Remark 25.4. If a square matrix is in square blocks,

\[
A = \begin{pmatrix} B & 0 \\ 0 & C \end{pmatrix},
\]

then clearly

\[
f(A) = \begin{pmatrix} f(B) & 0 \\ 0 & f(C) \end{pmatrix}.
\]

So we only need to work one block at a time.

Theorem 25.5. Let f(x) be an analytic function. If A is a single Jordan block, A = λ + ∆_n, and the series for f(x) converges near x = λ, then

\[
f(A) = f(\lambda) + f'(\lambda)\, \Delta + \frac{f''(\lambda)}{2}\, \Delta^2 + \frac{f'''(\lambda)}{3!}\, \Delta^3 + \dots + \frac{f^{(n-1)}(\lambda)}{(n-1)!}\, \Delta^{n-1}.
\]

Proof. Expand out the Taylor series, keeping in mind that ∆^n = 0.

Corollary 25.6. The value of f(A) does not depend on which Taylor series we use for f(x): we can expand f(x) about any point as long as the series converges on the spectrum of A. If we change the choice of the point to expand around, the resulting expression for f(A) determines the same matrix.

Proof. Entries will be given by the formulas above, which don't depend on the particular choice of Taylor series, only on the values of the function f(x) for x in the spectrum of A.

Corollary 25.7. Suppose that T : V → V is a linear map of a finite dimensional vector space. Split T into T = D + N, diagonalizable and nilpotent parts. Take f(x) an analytic function given by a Taylor series converging on the spectrum of T (which is the spectrum of D). Then

\[
f(T) = f(D + N)
= f(D) + f'(D)\, N + \frac{f''(D)\, N^2}{2!} + \frac{f'''(D)\, N^3}{3!} + \dots + \frac{f^{(n-1)}(D)\, N^{n-1}}{(n-1)!},
\]

where n is the dimension of V.

Remark 25.8. In particular, the result is independent of the choice of point about which we expand f(x) into a Taylor series, as long as the series converges on the spectrum of T.
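A quick numerical check of theorem 25.5 for f(x) = e^x on a single 3 × 3 Jordan block, assuming NumPy and SciPy are available (every derivative of e^x at λ is e^λ):

```python
import numpy as np
from scipy.linalg import expm

lam, n = 2.0, 3
Delta = np.eye(n, k=1)               # 1's just above the diagonal
A = lam * np.eye(n) + Delta          # the Jordan block lambda + Delta

formula = np.exp(lam) * (np.eye(n) + Delta + Delta @ Delta / 2)
print(np.allclose(expm(A), formula))   # True
```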

Example 25.9. Consider the matrix

\[
A = \begin{pmatrix} \pi & 1 \\ 0 & \pi \end{pmatrix} = \pi + \Delta.
\]

Then

\[
\sin A = \sin(\pi + \Delta)
= \sin(\pi) + \sin'(\pi)\, \Delta
= 0 + \cos(\pi)\, \Delta
= -\Delta
= \begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix}.
\]

Problem 25.1. Find √(1 + ∆), log(1 + ∆), e^∆.

Remark 25.10. If f(x) is analytic on the spectrum of a linear map T : V → V on a finite dimensional vector space, then we could actually define f(T) by using a different Taylor series for f(x) around each eigenvalue, but that would require a more sophisticated theory. (We could, for example, split up V into generalized eigenspaces for T, and compute out f(T) on each generalized eigenspace separately; this proves convergence.) However, we will never use such a complicated theory—we will only define f(T) when f(x) has a single Taylor series converging on the entire spectrum of T.

25.1 The Exponential and Logarithm

Example 25.11. The series

\[
e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots
\]

converges for all values of x. Therefore e^T is defined for any linear map T : V → V on any finite dimensional vector space.

Problem 25.2. Recall that the trace of an n × n matrix A is tr A = A_{11} + A_{22} + ⋯ + A_{nn}. Prove that det e^A = e^{tr A}.

Example 25.12. For any positive number r,

\[
\log(x + r) = \log r - \sum_{k=1}^{\infty} \frac{1}{k} \left( -\frac{x}{r} \right)^k.
\]

For |x| < r, this series converges by the ratio test. (Clearly x can actually be real or complex.) If T : V → V is a linear map on a finite dimensional vector space, and if every eigenvalue of T has positive real part, then we can pick r larger than the largest eigenvalue of T. Then

\[
\log T = \log r - \sum_{k=1}^{\infty} \frac{1}{k} \left( -\frac{T - r}{r} \right)^k
\]

converges. The value of this sum does not depend on the value of r, since the value of log x = log(x − r + r) doesn't.



Remark 25.13. The same tricks work for complex linear maps. We won't ever be tempted to consider f(T) for f(x) anything other than a real-valued function of a real variable; the reader may be aware that there is a sophisticated theory of complex functions of a complex variable.

Problem 25.3. Find the Taylor series of f(x) = 1/x around the point x = r (as long as r ≠ 0). Prove that f(A) = A^{-1}, if all eigenvalues of A have positive real part.

Problem 25.4. If f(x) is the sum of a Taylor series converging on the spectrum of a matrix A, why are the entries of f(A) smooth functions of the entries of A? (A function is called “smooth” to mean that we can differentiate the function any number of times with respect to any of its variables, in any order.)

Problem 25.5. For any complex number z = x + iy, prove that e^z converges to e^x (cos y + i sin y).

Problem 25.6. Use the result of the previous exercise to prove that e^{log A} = A if all eigenvalues of A have positive real part.

Problem 25.7. Use the results of the previous two exercises to prove that log e^A = A if all eigenvalues of A have imaginary part strictly between −π/2 and π/2.

Lemma 25.14. If A and B are two n × n matrices, and AB = BA, then

\[
e^{A+B} = e^A e^B.
\]

Proof. We expand out the product e^A e^B and collect terms. (The process proceeds term by term exactly as it would if A and B were real numbers, because AB = BA.)

Corollary 25.15. e^A is invertible for all square matrices A, and (e^A)^{-1} = e^{-A}.

Proof. A commutes with −A, so e^A e^{-A} = e^0 = 1.
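A numerical illustration of why the hypothesis AB = BA matters, assuming NumPy and SciPy are available: e^{A+B} = e^A e^B holds for the commuting pair below, but fails for a non-commuting pair.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0], [0.0, 1.0]])
B = np.array([[3.0, 4.0], [0.0, 3.0]])        # A and B commute
print(np.allclose(expm(A + B), expm(A) @ expm(B)))   # True

C = np.array([[0.0, 1.0], [0.0, 0.0]])
D = np.array([[0.0, 0.0], [1.0, 0.0]])        # C and D do not commute
print(np.allclose(expm(C + D), expm(C) @ expm(D)))   # False
```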

Definition 25.16. A real matrix A is skew-symmetric if A^t = −A. A complex matrix A is skew-adjoint if A^* = −A.

Corollary 25.17. If A is skew-symmetric/complex skew-adjoint then e^A is orthogonal/unitary.

Proof. Term by term in the Taylor series, (e^A)^t = e^{A^t} = e^{-A}. Similarly for the other cases.
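A numerical spot check of corollary 25.17, assuming NumPy and SciPy are available: exponentiating a skew-symmetric matrix gives an orthogonal matrix.

```python
import numpy as np
from scipy.linalg import expm

M = np.random.randn(4, 4)
A = M - M.T                    # skew-symmetric: A^t = -A
Q = expm(A)
print(np.allclose(Q.T @ Q, np.eye(4)))   # True: Q is orthogonal
```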

Lemma 25.18. If two n × n matrices A and B commute (AB = BA) and the eigenvalues of A and B have positive real part and the products of their eigenvalues also have positive real part, then log(AB) = log A + log B.

Proof. The eigenvalues of AB will be products of eigenvalues of A and of B, as seen in section 24.3 on page 231. Again the result proceeds as it would for A and B numbers, term by term in the Taylor series.

Corollary 25.19. If A is orthogonal/unitary and all eigenvalues of A have positive real part, then log A is skew-symmetric/complex skew-adjoint.

Problem 25.8. What do corollaries 25.17 and 25.19 say about the matrix

\[
A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}?
\]

Problem 25.9. What can you say about e^A for A symmetric? For A self-adjoint?

If we slightly alter a matrix, we only slightly alter its spectrum.

Theorem 25.20 (Continuity of the spectrum). Suppose that A is an n × n complex matrix, and pick some disks in the complex plane which together contain exactly k eigenvalues of A (counting each eigenvalue by its multiplicity). In order to ensure that a complex matrix B also has exactly k eigenvalues (also counted by multiplicity) in those same disks, it is sufficient to ensure that each entry of B is close enough to the corresponding entry of A.

Proof. Eigenvalues are the roots of the characteristic polynomial det(A − λ I). If each entry of B is close enough to the corresponding entry of A, then each coefficient of the characteristic polynomial of B is close to the corresponding coefficient of the characteristic polynomial of A. The result follows by the argument principle (theorem 25.32 on page 243 in the appendix to this chapter).

Remark 25.21. The eigenvalues vary as differentiable functions of the matrix entries as well, except where eigenvalues “collide” (i.e. at matrices for which two eigenvalues are equal), when there might not be any way to write the eigenvalues in terms of differentiable functions of matrix entries. In a suitable sense, the eigenvectors can also be made to depend differentiably on the matrix entries away from eigenvalue “collisions.” See Kato [5] for more information.

Problem 25.10. Find the eigenvalues of

\[
A = \begin{pmatrix} 0 & 1 \\ t & 0 \end{pmatrix}
\]

as a function of t. What happens at t = 0?

Problem 25.11. Prove that if an n × n complex matrix A has n distinct eigenvalues, then so does every complex matrix whose entries are close enough to the entries of A.



Corollary 25.22. If f(x) is an analytic function given by a Taylor series converging on the spectrum of a matrix A, then f(B) is defined by the same Taylor expansion as long as each entry of B is close enough to the corresponding entry of A.

Problem 25.12. Prove that a complex square matrix A is invertible just when it has the form A = e^L for some square matrix L.

25.2 Appendix: Analysis of Infinite Series

Proposition 25.23. Suppose that f(x) is defined by a convergent Taylor series

\[
f(x) = \sum_k a_k (x - x_0)^k,
\]

converging for x near x_0. Then both of the series

\[
\sum_k |a_k|\, |x - x_0|^k, \qquad \sum_k k\, a_k (x - x_0)^{k-1}
\]

converge for x near x_0.

Proof. We can assume, just by replacing x by x − x_0, that x_0 = 0. Let's suppose that our Taylor series converges for −b < x < b. Then it must converge for x = b/r for any r > 1. So the terms must get small eventually, i.e.

\[
\left| a_k \left( \frac{b}{r} \right)^k \right| \to 0.
\]

For large enough k,

\[
|a_k| < \left( \frac{r}{b} \right)^k.
\]

Pick any R > r. Then

\[
\left( \frac{b}{R} \right)^k |a_k| < \left( \frac{b}{R} \right)^k \left( \frac{r}{b} \right)^k = \left( \frac{r}{R} \right)^k.
\]

Therefore if |x| ≤ b/R, we have

\[
\sum |a_k|\, |x|^k \le \sum \left( \frac{r}{R} \right)^k,
\]

a geometric series of diminishing terms, so convergent. Similarly,

\[
\sum \bigl| k\, a_k x^k \bigr| \le \sum k \left( \frac{r}{R} \right)^k,
\]

which converges by the comparison test.

Corollary 25.24. Under the same conditions, the series

\[
\sum_k \binom{k}{l} a_k (x - x_0)^{k-l}
\]

converges in the same domain.

Proof. The same trick works.

Lemma 25.25. If f(x) is the sum of a convergent Taylor series, then f'(x) is too. More specifically, if

\[
f(x) = \sum a_k (x - x_0)^k,
\]

then

\[
f'(x) = \sum k\, a_k (x - x_0)^{k-1}.
\]

Proof. Let

\[
f_1(x) = \sum k\, a_k (x - x_0)^{k-1},
\]

which we know converges in the same open interval where the Taylor series for f(x) converges. We have to show that f_1(x) = f'(x). Pick any points x + h and x where f(x) converges. Expand out:

\[
\begin{aligned}
\frac{f(x + h) - f(x)}{h} - f_1(x)
&= \frac{\sum_k a_k (x + h)^k - \sum_k a_k x^k}{h} - \sum_k k\, a_k x^{k-1} \\
&= \sum_k a_k \left( \frac{(x + h)^k - x^k}{h} - k x^{k-1} \right) \\
&= \sum_k a_k \left( \sum_{l=0}^{k} \binom{k}{l} x^{k-l} h^{l-1} - x^k h^{-1} - k x^{k-1} \right) \\
&= h \sum_k a_k \sum_{l=2}^{k} \binom{k}{l} x^{k-l} h^{l-2} \\
&= h \sum_k a_k\, k(k-1) \sum_{l=0}^{k-2} \frac{1}{(l+2)(l+1)} \binom{k-2}{l} x^{k-2-l} h^l.
\end{aligned}
\]

Each term has absolute value no more than

\[
\bigl| a_k\, k(k-1) \bigr| \binom{k-2}{l} \bigl( |x| + |h| \bigr)^{k-2},
\]

which are the terms of a convergent series. The expression

\[
\frac{f(x + h) - f(x)}{h} - f_1(x)
\]

is governed by a convergent series multiplied by h. In particular the limit as h → 0 is 0.

Corollary 25.26. Any function f(x) which is the sum of a convergent Taylor series in a disk has derivatives of all orders everywhere in the interior of that disk, given by formally differentiating the Taylor series of f(x).

[Marginal figures: a point z travelling around a circle winds around each point p inside, and doesn't wind around any point p outside. As z travels around the circle, the angle from p to z increases by 2π if p is inside, but is periodic if p is outside.]

All of these tricks work equally well for complex functions of a complex variable, as long as they are sums of Taylor series.

25.3 Appendix: Perturbing Roots of Polynomials

Lemma 25.27. As a point z travels counterclockwise around a circle, it travels once around every point inside the circle, and does not travel around any point outside the circle.

Remark 25.28. Let's state the result more precisely. The angle between any points z and p is defined only up to multiples of 2π. As we will see, if z travels around a circle counterclockwise, and p doesn't lie on that circle, we can select this angle to be a continuously varying function φ. If p is inside the circle, then this function increases by 2π as z travels around the circle. If p is outside the circle, then this function is periodic as z travels around the circle.

Proof. Rotate the picture to get p to lie on the positive x-axis, say p = (p_0, 0). Scale to get the circle to be the unit circle, so z = (cos θ, sin θ). The vector from z to p is
$$ p - z = (p_0 - \cos\theta,\ -\sin\theta). $$
This vector has angle φ from the horizontal, where
$$ (\cos\varphi, \sin\varphi) = \left( \frac{p_0 - \cos\theta}{r},\ \frac{-\sin\theta}{r} \right) $$
with
$$ r = \sqrt{(p_0 - \cos\theta)^2 + \sin^2\theta}. $$
If p_0 > 1, then cos φ > 0, so that after adding multiples of 2π, we must have φ contained inside the domain of the arcsin function:
$$ \varphi = \arcsin\left( \frac{-\sin\theta}{r} \right), $$


a continuous function of θ, with φ(θ + 2π) = φ(θ). This continuous function φ is uniquely determined up to adding integer multiples of 2π.

On the other hand, suppose that p_0 < 1. Consider the angle Φ between Z = (p_0 cos θ, p_0 sin θ) and P = (1, 0). By the argument above,
$$ \Phi = \arcsin\left( \frac{-p_0 \sin\theta}{r} \right) $$
where
$$ r = \sqrt{(1 - p_0 \cos\theta)^2 + p_0^2 \sin^2\theta}. $$
Rotating by θ takes P to z and p to Z. Therefore the angle of the ray from z to p is φ = Φ(θ) + θ, a continuous function increasing by 2π every time θ increases by 2π. Since Φ is uniquely determined up to adding integer multiples of 2π, so is φ.

Corollary 25.29. Consider the complex polynomial function
$$ P(z) = (z - p_1)(z - p_2)\cdots(z - p_n). $$
Suppose that p_1 lies inside some disk, and all other roots p_2, p_3, ..., p_n lie outside that disk. Then as z travels once around the boundary of that disk, the argument of the complex number w = P(z) increases by 2π.

Proof. The argument of a product is the sum of the arguments of the factors, so the argument of P(z) is the sum of the arguments of z − p_1, z − p_2, etc.

Corollary 25.30. Consider the complex polynomial function
$$ P(z) = a(z - p_1)(z - p_2)\cdots(z - p_n). $$
Suppose that some roots p_1, p_2, ..., p_k all lie inside some disk, and all other roots p_{k+1}, p_{k+2}, ..., p_n lie outside that disk. Then as z travels once around the boundary of that disk, the argument of the complex number w = P(z) increases by 2πk.

Corollary 25.31. Consider two complex polynomial functions P(z) and Q(z). Suppose that P(z) has k roots lying inside some disk, and Q(z) has l roots lying inside that same disk, and all other roots of P(z) and Q(z) lie outside that disk. (So no roots of P(z) or Q(z) lie on the boundary of the disk.) Then as z travels once around the boundary of that disk, the argument of the complex number w = P(z)/Q(z) increases by 2π(k − l).

Theorem 25.32 (The argument principle). If P(z) is a polynomial with k roots inside a particular disk, and no roots on the boundary of that disk, then every polynomial Q(z) of the same degree as P(z) whose coefficients are sufficiently close to the coefficients of P(z) has exactly k roots inside the same disk, and no roots on the boundary.

Proof. To apply corollary 25.31, we have only to ensure that Q(z)/P(z) does not wind around 0 (or vanish) as we travel around the boundary of that disk. So we have only to ensure that while z stays on the boundary of the disk, Q(z)/P(z) lies in a particular half-plane, for example that Q(z)/P(z) is never a negative real number (or 0). So it is enough to ensure that |P(z) − Q(z)| < |P(z)| for z on the boundary of the disk. Let m be the minimum value of |P(z)| for z on the boundary of the disk. Suppose that the furthest point of our disk from the origin is some point z with |z| = R. Then if we write Q(z) = P(z) + ∑ c_j z^j, we only need to ensure that the coefficients c_0, c_1, ..., c_n satisfy ∑ |c_j| R^j < m, to be sure that Q(z) will have the same number of roots as P(z) in that disk.
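To make the argument principle concrete, here is a small numerical sketch (an illustration, not part of the text), assuming Python with numpy: count the roots of a polynomial inside a disk by tracking how much the argument of P(z) increases as z travels once around the boundary circle. The function name is chosen here for illustration.

```python
import numpy as np

def roots_inside(coeffs, center=0.0, radius=1.0, samples=20000):
    """Count roots of the polynomial (highest degree first, numpy convention)
    inside the disk |z - center| < radius, by the argument principle."""
    theta = np.linspace(0.0, 2.0 * np.pi, samples)
    z = center + radius * np.exp(1j * theta)      # the boundary circle
    w = np.polyval(coeffs, z)                     # P(z) along the circle
    arg = np.unwrap(np.angle(w))                  # continuous choice of argument
    return int(round((arg[-1] - arg[0]) / (2.0 * np.pi)))

# P(z) = (z - 0.5)(z + 0.25)(z - 3) has two roots in the unit disk.
P = np.poly([0.5, -0.25, 3.0])
print(roots_inside(P))   # prints 2
```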


26 Symmetric Functions of Eigenvalues

26.1 Symmetric Functions

Definition 26.1. A function f(x_1, x_2, ..., x_n) is symmetric if its value is unchanged by permuting the variables x_1, x_2, ..., x_n.

For example, x_1 + x_2 + ... + x_n is clearly symmetric.

Definition 26.2. The elementary symmetric functions are the functions
$$
\begin{aligned}
s_1(x) &= x_1 + x_2 + \dots + x_n \\
s_2(x) &= x_1 x_2 + x_1 x_3 + \dots + x_1 x_n + x_2 x_3 + \dots + x_{n-1} x_n \\
&\ \ \vdots \\
s_k(x) &= \sum_{i_1 < i_2 < \dots < i_k} x_{i_1} x_{i_2} \cdots x_{i_k}.
\end{aligned}
$$


Lemma 26.5. The entries of two vectors z and w are permutations of one another just when s(z) = s(w).

Proof. The roots of P_z(t) and P_w(t) are the same numbers.

Corollary 26.6. A function is symmetric just when it is a function of the elementary symmetric functions.

Remark 26.7. This means that every symmetric function f : C^n → C has the form f(z) = h(s(z)), for a unique function h : C^n → C, and conversely if h is any function at all, then f(z) = h(s(z)) determines a symmetric function.

Theorem 26.8. A symmetric function of some complex variables is continuous just when it is expressible as a continuous function of the elementary symmetric functions, and this expression is uniquely determined.

Proof. If h(z) is continuous, clearly f(z) = h(s(z)) is. If f(z) is continuous and symmetric, then given any sequence w_1, w_2, ... in C^n converging to a point w, we let z_1, z_2, ... be a sequence in C^n for which s(z_j) = w_j, and z a point for which s(z) = w. The entries of z_j are the roots of the polynomial
$$ P_{z_j}(t) = t^n - w_{j1} t^{n-1} + w_{j2} t^{n-2} + \dots + (-1)^n w_{jn}. $$
By the argument principle (theorem 25.32), we can rearrange the entries of each of the various vectors z_1, z_2, ... so that they converge to z. Therefore h(w_j) = f(z_j) converges to f(z) = h(w). If there are two expressions, f(z) = h_1(s(z)) and f(z) = h_2(s(z)), then because s is onto, h_1 = h_2.

If a = (a_1, a_2, ..., a_n), write z^a to mean z_1^{a_1} z_2^{a_2} \cdots z_n^{a_n}. Call a the weight of the monomial z^a. We will order weights by "alphabetical" order, for example so that (2, 1) > (1, 2). Define the weight of a polynomial to be the highest weight of any of its monomials. (The zero polynomial will not be assigned any weight.) The weight of a product of nonzero polynomials is the sum of the weights. The weight of a sum is at most the highest weight of any term. The weight of s_j(z) is
$$ (\underbrace{1, 1, \dots, 1}_{j}, 0, 0, \dots, 0). $$

Theorem 26.9. Every symmetric polynomial f has exactly one expression as a polynomial in the elementary symmetric polynomials. If f has real/rational/integer coefficients, then f is a real/rational/integer coefficient polynomial of the elementary symmetric polynomials.

Proof. For any monomial z^a, let
$$ z^{\bar a} = \sum_p z^{p(a)}, $$
a sum over all permutations p. Every symmetric polynomial, if it contains a monomial z^a, must also contain z^{p(a)} for any permutation p. Hence every symmetric polynomial is a sum of z^{\bar a} polynomials. Consequently the weight a of a symmetric polynomial f must satisfy a_1 ≥ a_2 ≥ ... ≥ a_n. We have only to write the z^{\bar a} in terms of the elementary symmetric functions, with integer coefficients. Let b_n = a_n, b_{n−1} = a_{n−1} − a_n, b_{n−2} = a_{n−2} − a_{n−1}, ..., b_1 = a_1 − a_2. Then s(z)^b = s_1(z)^{b_1} s_2(z)^{b_2} \cdots s_n(z)^{b_n} has leading monomial z^a, so z^{\bar a} − s(z)^b has lower weight. Apply induction on the weight.

Example 26.10. z_1^2 + z_2^2 = (z_1 + z_2)^2 − 2 z_1 z_2. To compute out these expressions: f(z) = z_1^2 + z_2^2 has weight (2, 0). The polynomials s_1(z) and s_2(z) have weights (1, 0) and (1, 1). So we subtract off the appropriate power of s_1(z) from f(z), and find f(z) − s_1(z)^2 = −2 z_1 z_2 = −2 s_2(z).
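A quick symbolic check of example 26.10 (an illustration only; it assumes the sympy library is installed):

```python
import sympy as sp

z1, z2 = sp.symbols('z1 z2')
s1 = z1 + z2          # elementary symmetric functions in two variables
s2 = z1 * z2

f = z1**2 + z2**2
print(sp.expand(s1**2 - 2*s2 - f))   # prints 0, so f = s1^2 - 2 s2
```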

Sums of powers

Define p_j(z) = z_1^j + z_2^j + \dots + z_n^j, the sums of powers.

Lemma 26.11. The sums of powers are related to the elementary symmetric functions by
$$ 0 = k\, s_k - p_1 s_{k-1} + p_2 s_{k-2} - \dots + (-1)^{k-1} p_{k-1} s_1 + (-1)^k p_k. $$

Proof. Let's write z^{(l)} for z with the l-th entry removed, so if z is a vector in C^n, then z^{(l)} is a vector in C^{n−1}. Splitting the monomials of s_{k−j}(z) into those that contain z_l and those that don't,
$$ p_j s_{k-j} = \sum_l z_l^j\, s_{k-j}(z) = \sum_l z_l^j\, s_{k-j}\!\left(z^{(l)}\right) + \sum_l z_l^{j+1}\, s_{k-j-1}\!\left(z^{(l)}\right). $$
Hence the alternating sum telescopes:
$$ p_1 s_{k-1} - p_2 s_{k-2} + \dots + (-1)^{k-2} p_{k-1} s_1 = \sum_l z_l\, s_{k-1}\!\left(z^{(l)}\right) + (-1)^{k} \sum_l z_l^k\, s_0\!\left(z^{(l)}\right) = k\, s_k + (-1)^{k} p_k, $$
which is the identity of the lemma, rearranged. (The first sum on the right is k s_k because each monomial of s_k contains k different variables, so arises k times.)

Proposition 26.12. Every symmetric polynomial is a polynomial in the sums of powers. If the coefficients of the symmetric polynomial are real (or rational), then it is a real (or rational) polynomial function of the sums of powers. Every continuous symmetric function of complex variables is a continuous function of the sums of powers.

Proof. We can solve recursively for the sums of powers in terms of the elementary symmetric functions and conversely.
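The recursive solution is easy to carry out numerically. Here is a minimal sketch (not from the text), assuming Python with numpy; the function name is chosen for illustration:

```python
import numpy as np

def elementary_from_power_sums(p):
    """Given power sums p[1..n] (p[0] unused), recover the elementary symmetric
    functions s[1..n] via Newton's identities:
        k s_k = p_1 s_{k-1} - p_2 s_{k-2} + ... + (-1)^(k-1) p_k."""
    n = len(p) - 1
    s = [1.0] + [0.0] * n          # s[0] = 1 by convention
    for k in range(1, n + 1):
        acc = 0.0
        for j in range(1, k + 1):
            acc += (-1) ** (j - 1) * p[j] * s[k - j]
        s[k] = acc / k
    return s

z = np.array([2.0, -1.0, 3.0])
p = [None] + [np.sum(z**j) for j in range(1, 4)]   # power sums p_1, p_2, p_3
print(elementary_from_power_sums(p))               # [1.0, 4.0, 1.0, -6.0]
```

The printed values are 1 (the convention s_0 = 1) followed by s_1 = 4, s_2 = 1, s_3 = −6, which are indeed the elementary symmetric functions of 2, −1, 3.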

Remark 26.13. The standard reference on symmetric functions is [7].<br />

26.2 The Invariants of a Square Matrix

Definition 26.14. A complex-valued or real-valued function f(A), depending on the entries of a square matrix A, is an invariant if f(F A F^{-1}) = f(A) for any invertible matrix F.

So an invariant is independent of change of basis. If T : V → V is a linear map on an n-dimensional vector space, we can define the value f(T) of any invariant f of n × n matrices by letting f(T) = f(A), where A is the matrix of F^{-1} T F for any isomorphism F : R^n → V.

Example 26.15. For any n × n matrix A, write
$$ \det(A - \lambda\, I) = s_n(A) - s_{n-1}(A)\,\lambda + s_{n-2}(A)\,\lambda^2 + \dots + (-1)^n \lambda^n. $$
The functions s_1(A), s_2(A), ..., s_n(A) are invariants.

Example 26.16. The functions
$$ p_k(A) = \operatorname{tr}\left(A^k\right) $$
are invariants.
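A small numerical illustration (not from the text), assuming Python with numpy: the coefficients of the characteristic polynomial and the traces of powers are unchanged when A is replaced by F A F^{-1}.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
F = rng.standard_normal((4, 4))          # almost surely invertible
B = F @ A @ np.linalg.inv(F)

print(np.round(np.poly(A), 6))           # characteristic polynomial coefficients of A
print(np.round(np.poly(B), 6))           # same coefficients for F A F^{-1}

for k in range(1, 5):
    print(k, np.trace(np.linalg.matrix_power(A, k)),
             np.trace(np.linalg.matrix_power(B, k)))   # p_k agrees for both
```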

Problem 26.2. If A is diagonal, say
$$ A = \begin{pmatrix} z_1 & & & \\ & z_2 & & \\ & & \ddots & \\ & & & z_n \end{pmatrix}, $$
then prove that s_j(A) = s_j(z_1, z_2, ..., z_n), the elementary symmetric functions of the eigenvalues.


Problem 26.3. Generalize the previous exercise to A diagonalizable.

Theorem 26.17. Each continuous (or polynomial) invariant function of a complex matrix has exactly one expression as a continuous (or polynomial) function of the elementary symmetric functions of the eigenvalues. Each polynomial invariant function of a real matrix has exactly one expression as a polynomial function of the elementary symmetric functions of the eigenvalues.

Remark 26.18. We can replace the elementary symmetric functions of the eigenvalues by the sums of powers of the eigenvalues.

Proof. Every continuous invariant function f(A) determines a continuous function f(z) by setting
$$ A = \begin{pmatrix} z_1 & & & \\ & z_2 & & \\ & & \ddots & \\ & & & z_n \end{pmatrix}. $$
Taking F any permutation matrix, invariance tells us that f(F A F^{-1}) = f(A). But f(F A F^{-1}) is given by applying the associated permutation to the entries of z. Therefore f(z) is a symmetric function. If f(A) is continuous (or polynomial) then f(z) is too. Therefore f(z) = h(s(z)) for some continuous (or polynomial) function h; so f(A) = h(s(A)) for diagonal matrices. By invariance, the same is true for diagonalizable matrices. If we work with complex matrices, then every matrix can be approximated arbitrarily closely by diagonalizable matrices (by theorem 23.12). Therefore by continuity of h, the equation f(A) = h(s(A)) holds for all matrices A.

For real matrices, the equation only holds for those matrices whose eigenvalues are real. However, for polynomials this is enough, since two polynomial functions equal on an open set must be equal everywhere.

Remark 26.19. Consider the function f(A) = s_j(|λ_1|, |λ_2|, ..., |λ_n|), where A has eigenvalues λ_1, λ_2, ..., λ_n. This function is a continuous invariant of a real matrix A, but is not a polynomial in λ_1, λ_2, ..., λ_n.
matrix A, and is not a polynomial in λ 1 , λ 2 , . . . , λ n .


27 The Pfaffian

Skew-symmetric matrices have a surprising additional polynomial invariant, called the Pfaffian, but it is only invariant under rotations, and it only exists for skew-symmetric matrices with an even number of rows and columns.

27.1 Skew-Symmetric Normal Form

Theorem 27.1. For any skew-symmetric matrix A with an even number of rows and columns, there is a rotation matrix F so that
$$ F A F^{-1} = \begin{pmatrix}
0 & a_1 \\
-a_1 & 0 \\
& & 0 & a_2 \\
& & -a_2 & 0 \\
& & & & \ddots \\
& & & & & 0 & a_n \\
& & & & & -a_n & 0
\end{pmatrix}. $$
(We can say that F brings A to skew-symmetric normal form.) If A is any skew-symmetric matrix with an odd number of rows and columns, we can arrange the same equation, again via a rotation matrix F, but the normal form has an extra row of zeroes and an extra column of zeroes:
$$ F A F^{-1} = \begin{pmatrix}
0 & a_1 \\
-a_1 & 0 \\
& & 0 & a_2 \\
& & -a_2 & 0 \\
& & & & \ddots \\
& & & & & 0 & a_n \\
& & & & & -a_n & 0 \\
& & & & & & & 0
\end{pmatrix}. $$

Proof. Because A is skew-symmetric, A is skew-adjoint, so normal when thought of as a complex matrix. So there is a unitary basis of C^{2n} of complex eigenvectors of A. If λ is a complex eigenvalue of A, with complex eigenvector z, scale z to have unit length, and then
$$
\begin{aligned}
\lambda &= \lambda \langle z, z \rangle \\
&= \langle \lambda z, z \rangle \\
&= \langle A z, z \rangle \\
&= -\langle z, A z \rangle \\
&= -\langle z, \lambda z \rangle \\
&= -\bar\lambda \langle z, z \rangle \\
&= -\bar\lambda.
\end{aligned}
$$
Therefore λ = −λ̄, i.e. λ has the form i a for some real number a. So there are two different possibilities: λ = 0 or λ = i a with a ≠ 0.

If λ = 0, then z lies in the kernel, so if we write z = x + i y then both x and y lie in the kernel. In particular, we can write down a real orthonormal basis for the kernel, and then x and y will be real linear combinations of those basis vectors, and therefore z will be a complex linear combination of those basis vectors. Let's take u_1, u_2, ..., u_s to be a real orthonormal basis for the kernel of A; clearly the same vectors u_1, u_2, ..., u_s form a complex unitary basis for the complex kernel of A.

Next let's take care of the nonzero eigenvalues. If λ = i a is a nonzero eigenvalue, with unit length eigenvector z, then taking complex conjugates in the equation A z = λ z = i a z, we find A z̄ = −i a z̄, so z̄ is another eigenvector, with eigenvalue −i a. So they come in pairs. Since the eigenvalues i a and −i a are distinct, the eigenvectors z and z̄ must be perpendicular. So we can always make a new unitary basis of eigenvectors, throwing out any λ = −i a eigenvector and replacing it with z̄ if needed, to ensure that for each eigenvector z in our unitary basis of eigenvectors, z̄ also belongs to our unitary basis. Moreover, we have three equations: A z = i a z, ⟨z, z⟩ = 1, and ⟨z, z̄⟩ = 0. Write z = x + i y with x and y real vectors, and expand out all three equations in terms of x and y to find A x = −a y, A y = a x, ⟨x, x⟩ + ⟨y, y⟩ = 1, ⟨x, x⟩ − ⟨y, y⟩ = 0 and ⟨x, y⟩ = 0. So if we let X = √2 x and Y = √2 y, then X and Y are unit vectors, and A X = −a Y and A Y = a X.

Now if we carry out this process for each eigenvalue λ = i a with a > 0, then we can write down vectors X_1, Y_1, X_2, Y_2, ..., X_t, Y_t, one pair for each eigenvector from our unitary basis with a nonzero eigenvalue. These vectors are each unit length, and each X_i is perpendicular to each Y_i. We also have A X_i = −a_i Y_i and A Y_i = a_i X_i.

If z_i and z_j are two different eigenvectors from our original unitary basis of eigenvectors, and their eigenvalues are λ_i = i a_i and λ_j = i a_j with a_i, a_j > 0, then we want to see why X_i, Y_i, X_j and Y_j must be perpendicular. This follows immediately from z_i, z̄_i, z_j and z̄_j being perpendicular, by just expanding into real and imaginary parts. Similarly, we can see that u_1, u_2, ..., u_s are perpendicular to each X_i and Y_i. So finally, we can let
$$ F = \begin{pmatrix} X_1 & Y_1 & X_2 & Y_2 & \dots & X_t & Y_t & u_1 & u_2 & \dots & u_s \end{pmatrix}. $$
Clearly these vectors form a real orthonormal basis, so F is an orthogonal matrix. We want to arrange that F be a rotation matrix. Suppose that F is not a rotation. We can either change the sign of one of the vectors u_1, u_2, ..., u_s (if there are any), or replace X_1 by −X_1, which switches the sign of a_1, to make F a rotation.
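In practice such a normal form can be found numerically. A minimal sketch, assuming Python with scipy is available: for a (normal) real skew-symmetric matrix, the real Schur decomposition is block diagonal with 2 × 2 blocks of the form in theorem 27.1, though the orthogonal factor it returns need not have determinant +1.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = M - M.T                          # a random skew-symmetric matrix

T, Z = schur(A, output='real')       # A = Z T Z^T with Z orthogonal
print(np.round(T, 3))                # 2x2 blocks [[0, a], [-a, 0]] down the diagonal
print(np.allclose(Z @ T @ Z.T, A))   # True
print(np.round(np.linalg.det(Z), 3)) # +1 or -1; flip one column's sign if a rotation is required
```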

27.2 Partitions

A partition of the numbers 1, 2, ..., 2n is a choice of division of these numbers into pairs. For example, we could choose to partition 1, 2, 3, 4, 5, 6 into
$$ \{4, 1\}, \{2, 5\}, \{6, 3\}. $$
This is the same partition if we write the pairs down in a different order, like
$$ \{2, 5\}, \{6, 3\}, \{4, 1\}, $$
or if we write the numbers inside each pair down in a different order, like
$$ \{1, 4\}, \{5, 2\}, \{6, 3\}. $$
It isn't really important that the objects partitioned be numbers. Of course, you can't partition an odd number of objects into pairs. Each permutation p of the numbers 1, 2, 3, ..., 2n has an associated partition
$$ \{p(1), p(2)\}, \{p(3), p(4)\}, \dots, \{p(2n-1), p(2n)\}. $$
For example, the permutation 3, 1, 4, 6, 5, 2 has associated partition
$$ \{3, 1\}, \{4, 6\}, \{5, 2\}. $$
Clearly two different permutations p and q can have the same associated partition: we could first transpose various of the pairs of the partition of p, keeping each pair in order, and then transpose entries within each pair, but not across different pairs. Consequently, there are n! 2^n different permutations with the same associated partition: n! ways to permute the pairs, and 2^n ways to swap the order within each pair. When you permute a pair of pairs, like changing 3, 1, 4, 6, 5, 2 to 4, 6, 3, 1, 5, 2, this is the effect of a pair of transpositions (one exchanging the entries in positions 1 and 3, and another exchanging the entries in positions 2 and 4), so it has no effect on signs. Therefore if two permutations have the same partition, the root cause of any difference in sign must be transpositions inside each pair. For example, while it is complicated to find the signs of the permutations 3, 1, 4, 6, 5, 2 and 4, 6, 1, 3, 5, 2, it is easy to see that these signs must be different.


On the other hand, we can write each partition in "alphabetical order", for example rewriting
$$ \{4, 1\}, \{2, 5\}, \{6, 3\} $$
as
$$ \{1, 4\}, \{2, 5\}, \{3, 6\}, $$
so that we put each pair in order, and then order the pairs among one another by their lowest elements. This in turn determines a permutation, called the natural permutation of the partition, given by putting the elements in that order; in our example this is the permutation
$$ 1, 4, 2, 5, 3, 6. $$
We write the sign of a permutation p as sgn(p), and define the sign of a partition P to be the sign of its natural permutation. Watch out: if we start with a permutation p, like 6, 2, 4, 1, 3, 5, then the associated partition P is
$$ \{6, 2\}, \{4, 1\}, \{3, 5\}. $$
This is the same partition as
$$ \{1, 4\}, \{2, 6\}, \{3, 5\} $$
(just written in alphabetical order). The natural permutation q of P is therefore
$$ 1, 4, 2, 6, 3, 5, $$
so the original permutation p is not the natural permutation of its associated partition.

Problem 27.1. How many partitions are there of the numbers 1, 2, . . . , 2n?<br />

Problem 27.2. Write down all of the partitions of<br />

(a) 1, 2;<br />

(b) 1, 2, 3, 4;<br />

(c) 1, 2, 3, 4, 5, 6.<br />
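The following sketch (an illustration, assuming Python is available; the function name is chosen here) enumerates all partitions of a list into pairs, which you can use to check your answers to the two problems above.

```python
def pair_partitions(items):
    """Yield all partitions of the list `items` (even length) into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i+1:]
        for tail in pair_partitions(remaining):
            yield [(first, partner)] + tail

for P in pair_partitions([1, 2, 3, 4]):
    print(P)                                                  # the partitions of 1, 2, 3, 4
print(sum(1 for _ in pair_partitions(list(range(1, 7)))))     # count for 1, ..., 6
```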

27.3 The Pfaffian

We want to write down a square root of the determinant.

Example 27.2. If A is a 2 × 2 skew-symmetric matrix,
$$ A = \begin{pmatrix} 0 & a \\ -a & 0 \end{pmatrix}, $$
then det A = a^2, so the entry a = A_{12} is a polynomial function of the entries of A which squares to det A.

Example 27.3. A huge calculation shows that if A is a 4 × 4 skew-symmetric matrix, then
$$ \det A = \left(A_{12} A_{34} - A_{13} A_{24} + A_{14} A_{23}\right)^2. $$
So A_{12} A_{34} − A_{13} A_{24} + A_{14} A_{23} is a polynomial function of the entries of A which squares to det A.

Definition 27.4. For any 2n × 2n skew-symmetric matrix A, let
$$ \operatorname{Pf} A = \frac{1}{n!\,2^n} \sum_p \operatorname{sgn}(p)\, A_{p(1)p(2)} A_{p(3)p(4)} \cdots A_{p(2n-1)p(2n)}, $$
where the sum is over all permutations p of 1, 2, ..., 2n and sgn(p) is the sign of the permutation p. Pf is called the Pfaffian.

Remark 27.5. Don't ever try to use this horrible formula to compute a Pfaffian. We will find a better way soon.

Lemma 27.6. For any 2n × 2n skew-symmetric matrix A,
$$ \operatorname{Pf} A = \sum_P \operatorname{sgn}(P)\, A_{p(1)p(2)} A_{p(3)p(4)} \cdots A_{p(2n-1)p(2n)}, $$
where the sum is over partitions P and the permutation p is the natural permutation of the partition P. In particular, Pf A is an integer coefficient polynomial of the entries of the matrix A.

Proof. Each permutation p has an associated partition P. So we can write the Pfaffian as a sum
$$ \operatorname{Pf} A = \frac{1}{n!\,2^n} \sum_P \sum_p \operatorname{sgn}(p)\, A_{p(1)p(2)} A_{p(3)p(4)} \cdots A_{p(2n-1)p(2n)}, $$
where the first sum is over all partitions P, and the second over all permutations p which have P as their associated partition. But if two permutations p and q both have the same associated partition
$$ \{p(1), p(2)\}, \{p(3), p(4)\}, \dots, \{p(2n-1), p(2n)\}, $$
then p and q give the same pairs of indices in the expression A_{p(1)p(2)} A_{p(3)p(4)} ⋯ A_{p(2n−1)p(2n)}. Perhaps some of the indices in these pairs might be reversed. For example, we might have partition P being
$$ \{1, 5\}, \{2, 6\}, \{3, 4\}, $$
and permutations p being 1, 5, 2, 6, 3, 4 and q being 5, 1, 2, 6, 3, 4. The contribution to the sum coming from p is
$$ \operatorname{sgn}(p)\, A_{15} A_{26} A_{34}, $$
while that from q is
$$ \operatorname{sgn}(q)\, A_{51} A_{26} A_{34}. $$
But A_{51} = −A_{15}, a sign change which is perfectly offset by the sign sgn(q): each transposition within a pair changes the sign of one factor and also changes the sign of the permutation.

So, put together, we find that for any two permutations p and q with the same partition, their contributions are the same:
$$ \operatorname{sgn}(q)\, A_{q(1)q(2)} A_{q(3)q(4)} \cdots A_{q(2n-1)q(2n)} = \operatorname{sgn}(p)\, A_{p(1)p(2)} A_{p(3)p(4)} \cdots A_{p(2n-1)p(2n)}. $$
Therefore the n! 2^n permutations with associated partition P all contribute the same amount as the natural permutation of P.
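Here is a small sketch (an illustration, not from the text), assuming Python with numpy: it evaluates the partition formula of lemma 27.6 directly and checks that Pf A squares to det A on a random skew-symmetric matrix. The helper names are chosen for illustration.

```python
import numpy as np

def pair_partitions(items):
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i in range(len(rest)):
        for tail in pair_partitions(rest[:i] + rest[i+1:]):
            yield [(first, rest[i])] + tail

def sign(perm):
    """Sign of a permutation of 0, 1, ..., m-1, computed by counting inversions."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def pfaffian(A):
    total = 0.0
    for P in pair_partitions(list(range(A.shape[0]))):
        perm = [k for pair in P for k in pair]   # the natural permutation of P
        term = 1.0
        for i, j in P:
            term *= A[i, j]
        total += sign(perm) * term
    return total

M = np.random.default_rng(2).standard_normal((6, 6))
A = M - M.T
print(pfaffian(A)**2, np.linalg.det(A))          # the two numbers agree
```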

27.4 Rotation Invariants of Skew-Symmetric Matrices

Theorem 27.7.
$$ \operatorname{Pf}^2 A = \det A. $$
Moreover, for any 2n × 2n matrix B,
$$ \operatorname{Pf}\left(B A B^t\right) = \operatorname{Pf}(A) \det(B). $$
If A is in skew-symmetric normal form, say
$$ A = \begin{pmatrix}
0 & a_1 \\
-a_1 & 0 \\
& & 0 & a_2 \\
& & -a_2 & 0 \\
& & & & \ddots \\
& & & & & 0 & a_n \\
& & & & & -a_n & 0
\end{pmatrix}, $$
then
$$ \operatorname{Pf} A = a_1 a_2 \cdots a_n. $$

Proof. Let's start by proving that Pf A = a_1 a_2 ⋯ a_n for A in skew-symmetric normal form. Looking at the terms that appear in the Pfaffian, we find that at least one of the factors A_{p(2j−1)p(2j)} in each term will vanish unless these factors come from among the entries A_{1,2}, A_{3,4}, ..., A_{2n−1,2n}. So all terms vanish, except when the partition is (in alphabetical order)
$$ \{1, 2\}, \{3, 4\}, \dots, \{2n-1, 2n\}, $$
yielding
$$ \operatorname{Pf} A = a_1 a_2 \cdots a_n. $$
In particular, Pf^2 A = det A for A in normal form.

For any 2n × 2n matrix B,
$$
\begin{aligned}
n!\,2^n \operatorname{Pf}\left(B A B^t\right)
&= \sum_p \operatorname{sgn}(p) \left(B A B^t\right)_{p(1)p(2)} \left(B A B^t\right)_{p(3)p(4)} \cdots \left(B A B^t\right)_{p(2n-1)p(2n)} \\
&= \sum_p \operatorname{sgn}(p) \sum_{i_1 i_2} B_{p(1)i_1} A_{i_1 i_2} B_{p(2)i_2} \sum_{i_3 i_4} B_{p(3)i_3} A_{i_3 i_4} B_{p(4)i_4} \cdots \sum_{i_{2n-1} i_{2n}} B_{p(2n-1)i_{2n-1}} A_{i_{2n-1} i_{2n}} B_{p(2n)i_{2n}} \\
&= \sum_{i_1 i_2 \dots i_{2n}} \left( \sum_p \operatorname{sgn}(p)\, B_{p(1)i_1} B_{p(2)i_2} \cdots B_{p(2n)i_{2n}} \right) A_{i_1 i_2} A_{i_3 i_4} \cdots A_{i_{2n-1} i_{2n}} \\
&= \sum_{i_1 i_2 \dots i_{2n}} \det \begin{pmatrix} B e_{i_1} & B e_{i_2} & \dots & B e_{i_{2n}} \end{pmatrix} A_{i_1 i_2} A_{i_3 i_4} \cdots A_{i_{2n-1} i_{2n}}.
\end{aligned}
$$
If any two of the indices i_1, i_2, ..., i_{2n} are equal, then two columns are equal inside the determinant, and that term vanishes; so we can write this as a sum over permutations q, taking i_k = q(k):
$$
\begin{aligned}
&= \sum_q \det \begin{pmatrix} B e_{q(1)} & B e_{q(2)} & \dots & B e_{q(2n)} \end{pmatrix} A_{q(1)q(2)} A_{q(3)q(4)} \cdots A_{q(2n-1)q(2n)} \\
&= \sum_q \operatorname{sgn}(q) \det \begin{pmatrix} B e_1 & B e_2 & \dots & B e_{2n} \end{pmatrix} A_{q(1)q(2)} A_{q(3)q(4)} \cdots A_{q(2n-1)q(2n)} \\
&= n!\,2^n \det B \operatorname{Pf} A.
\end{aligned}
$$
Finally, to prove that Pf^2 A = det A in general, we just need to bring A into skew-symmetric normal form via a rotation matrix B, and then Pf^2 A = Pf^2(B A B^t) = det(B A B^t) = det A.

How do you calculate the Pfaffian in practice? It is like expanding a determinant along the first column: you run your finger down the first column under the diagonal, write the signs −, +, −, +, ... in front of each entry from the first column, and multiply each signed entry by the Pfaffian of the matrix obtained by crossing out that row and column, and symmetrically the corresponding column and row. So
$$
\operatorname{Pf} \begin{pmatrix} 0 & -2 & 1 & 3 \\ 2 & 0 & 8 & -4 \\ -1 & -8 & 0 & 5 \\ -3 & 4 & -5 & 0 \end{pmatrix}
= -(2) \operatorname{Pf} \begin{pmatrix} 0 & 5 \\ -5 & 0 \end{pmatrix}
+ (-1) \operatorname{Pf} \begin{pmatrix} 0 & -4 \\ 4 & 0 \end{pmatrix}
- (-3) \operatorname{Pf} \begin{pmatrix} 0 & 8 \\ -8 & 0 \end{pmatrix}
= -(2)\cdot 5 + (-1)\cdot(-4) - (-3)\cdot 8.
$$
(In each term, the 2 × 2 matrix shown is what remains after crossing out row and column 1 and the row and column of the chosen first-column entry.)

Let's prove that this works.

Lemma 27.8. If A is a skew-symmetric matrix with an even number of rows and columns, larger than 2 × 2, then
$$ \operatorname{Pf} A = -A_{21} \operatorname{Pf} A_{[21]} + A_{31} \operatorname{Pf} A_{[31]} - \dots = \sum_{i > 1} (-1)^{i+1} A_{i1} \operatorname{Pf} A_{[i1]}, $$
where A_{[ij]} is the matrix A with rows i and j and columns i and j removed.

Proof. Let's define a polynomial P(A) in the entries of a skew-symmetric matrix A (with an even number of rows and columns) by setting P(A) = Pf A if A is 2 × 2, and setting
$$ P(A) = \sum_{i > 1} (-1)^{i+1} A_{i1}\, P\!\left(A_{[i1]}\right) $$
for larger A. We need to show that P(A) = Pf A. Clearly P(A) = Pf A if A is in skew-symmetric normal form. Each term in Pf A corresponds to a partition, and each partition must put 1 into one of its pairs, say in a pair {1, i}. It then can't use 1 or i in any other pair. Clearly P(A) also has exactly one factor like A_{i1} in each term, and then no other factors get to have i or 1 as subscripts. Moreover, all terms in P(A) and in Pf A have a coefficient of 1 or −1. So it is clear that the terms of P(A) and of Pf A are the same, up to sign. We have to fix the signs.

Suppose that we swap rows 2 and 3 of A and columns 2 and 3. Let's show that this changes the signs of P(A) and of Pf A. For Pf A, this is immediate from theorem 27.7. Let Q be the permutation matrix of the transposition of 2 and 3. (To be more precise, let Q_n be the n × n permutation matrix of the transposition of 2 and 3, for any value of n ≥ 3. But let's write all such matrices Q_n as Q.) Let B = Q A Q^t, i.e. A with rows 2 and 3 and columns 2 and 3 swapped. So B_{ij} is just A_{ij} unless i or j is either 2 or 3. So
$$
\begin{aligned}
P(B) &= -B_{21}\, P\!\left(B_{[21]}\right) + B_{31}\, P\!\left(B_{[31]}\right) - B_{41}\, P\!\left(B_{[41]}\right) + \dots \\
&= -A_{31}\, P\!\left(A_{[31]}\right) + A_{21}\, P\!\left(A_{[21]}\right) - A_{41}\, P\!\left(Q A_{[41]} Q^t\right) + \dots
\end{aligned}
$$
By induction, the sign changes in the last term (and in all the later terms), so
$$ P(B) = +A_{21}\, P\!\left(A_{[21]}\right) - A_{31}\, P\!\left(A_{[31]}\right) + A_{41}\, P\!\left(A_{[41]}\right) - \dots = -P(A). $$
So swapping rows 2 and 3 changes a sign. In the same way,
$$ P\!\left(Q A Q^t\right) = \operatorname{sgn}(q)\, P(A), $$
for Q the permutation matrix of any permutation q of the numbers 2, 3, ..., 2n.

If we start with A in skew-symmetric normal form, letting the numbers a_1, a_2, ..., a_n in the skew-symmetric normal form be some abstract variables, then Pf A is just a single term of Pf and of P, and these terms have the same sign. All of the terms of Pf are obtained by permuting indices in this term, i.e. as Pf(Q A Q^t) for suitable permutation matrices Q. Indeed you just need to take Q the permutation matrix of the natural permutation of each partition. Therefore the signs of Pf and of P are the same for each term, so P = Pf.
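A small sketch (not from the text), assuming Python with numpy: the recursive expansion of lemma 27.8, applied to the 4 × 4 matrix of the worked example above.

```python
import numpy as np

def pfaffian_expand(A):
    """Pf A = sum over i > 1 of (-1)^(i+1) A_{i1} Pf A_[i1]  (1-based indices)."""
    m = A.shape[0]
    if m == 0:
        return 1.0
    if m == 2:
        return A[0, 1]
    total = 0.0
    for i in range(1, m):                      # 0-based row index i corresponds to row i+1
        keep = [k for k in range(m) if k not in (0, i)]
        minor = A[np.ix_(keep, keep)]          # A with rows/columns 1 and i+1 removed
        total += (-1) ** i * A[i, 0] * pfaffian_expand(minor)
    return total

A = np.array([[0., -2., 1., 3.],
              [2., 0., 8., -4.],
              [-1., -8., 0., 5.],
              [-3., 4., -5., 0.]])
print(pfaffian_expand(A))                      # 18.0, matching the worked example
print(np.sqrt(np.linalg.det(A)))               # also 18.0 (up to rounding)
```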

Problem 27.3. Prove that the odd degree elementary symmetric functions of the eigenvalues vanish on any skew-symmetric matrix.

Let s_1(a), ..., s_n(a) be the usual symmetric functions of some numbers a_1, a_2, ..., a_n. For any vector a let t(a) be the vector
$$ t(a) = \begin{pmatrix} s_1\!\left(a_1^2, a_2^2, \dots, a_n^2\right) \\ s_2\!\left(a_1^2, a_2^2, \dots, a_n^2\right) \\ \vdots \\ s_{n-1}\!\left(a_1^2, a_2^2, \dots, a_n^2\right) \\ a_1 a_2 \cdots a_n \end{pmatrix}. $$

Lemma 27.9. Two complex vectors a and b in C^n satisfy t(a) = t(b) just when b can be obtained from a by permutation of entries and changing signs of an even number of entries. A function f(a) is invariant under permutations and even numbers of sign changes just when f(a) = h(t(a)) for some function h.

Proof. Clearly s_n(a_1^2, a_2^2, ..., a_n^2) = (a_1 a_2 ⋯ a_n)^2 = t_n(a)^2. In particular, the symmetric functions of a_1^2, a_2^2, ..., a_n^2 are all functions of t_1, t_2, ..., t_n. Therefore if we have two vectors a and b with t(a) = t(b), then a_1^2, a_2^2, ..., a_n^2 are a permutation of b_1^2, b_2^2, ..., b_n^2. So after permutation, a_1 = ±b_1, a_2 = ±b_2, ..., a_n = ±b_n, equality up to some sign changes. Since we also know that t_n(a) = t_n(b), we must have a_1 a_2 ⋯ a_n = b_1 b_2 ⋯ b_n. If none of the a_i vanish, then a_1 a_2 ⋯ a_n = b_1 b_2 ⋯ b_n ensures that none of the b_i vanish either, and that the number of sign changes is even. It is possible that one of the a_i vanishes, in which case we can change its sign as we like to arrange that the number of sign changes is even.

Lemma 27.10. Two skew-symmetric matrices A and B with the same even numbers of rows and columns can be brought one to another, say B = F A F^t, by some rotation matrix F, just when they have skew-symmetric normal forms
$$ \begin{pmatrix}
0 & a_1 \\
-a_1 & 0 \\
& & 0 & a_2 \\
& & -a_2 & 0 \\
& & & & \ddots \\
& & & & & 0 & a_n \\
& & & & & -a_n & 0
\end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix}
0 & b_1 \\
-b_1 & 0 \\
& & 0 & b_2 \\
& & -b_2 & 0 \\
& & & & \ddots \\
& & & & & 0 & b_n \\
& & & & & -b_n & 0
\end{pmatrix}, $$
respectively, with t(a) = t(b).

Proof. If we have a skew-symmetric normal form for a matrix A, with numbers a_1, a_2, ..., a_n as above, then t_1(a), t_2(a), ..., t_{n−1}(a) are the elementary symmetric functions of a_1^2, a_2^2, ..., a_n^2 (which are, up to sign, symmetric functions of the eigenvalues of A), while t_n(a) = Pf A, so clearly t(a) depends only on the invariants of A under rotation. In particular, suppose that I find two different skew-symmetric normal forms, one with numbers a_1, a_2, ..., a_n and one with numbers b_1, b_2, ..., b_n. Then the numbers b_1, b_2, ..., b_n must be given from the numbers a_1, a_2, ..., a_n by permutation and switching of an even number of signs. In fact we can attain these changes by actual rotations, as follows.

For example, think about 4 × 4 matrices. The permutation matrix F of 3, 4, 1, 2 permutes the first two and second two basis vectors, and is a rotation because the number of transpositions is even. When we replace A by F A F^t, we swap a_1 with a_2. Similarly, we can take the matrix F which reflects e_1 and e_3, changing the sign of a_1 and of a_2. So we can clearly carry out any permutations, and any even number of sign changes, on the numbers a_1, a_2, ..., a_n.

Lemma 27.11. Any polynomial in a_1, a_2, ..., a_n can be written in only one way as h(t(a)).

Proof. Recall that every complex number has a square root (a complex number with half the argument and the square root of the modulus). Clearly 0 has only 0 as square root, while all other complex numbers z have two square roots, which we write as ±√z.

Given any complex vector b, I can solve t(a) = b by first constructing a solution c to
$$ s(c) = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_{n-1} \\ b_n^2 \end{pmatrix}, $$
and then letting a_j = ±√(c_j). Clearly t(a) = b unless t_n(a) has the wrong sign. If we change the sign of one of the a_j then we can fix this. So t : C^n → C^n is onto.

Theorem 27.12. Each polynomial invariant of a skew-symmetric matrix with an even number of rows and columns can be expressed in exactly one way as a polynomial function of the even degree symmetric functions of the eigenvalues and the Pfaffian. Two skew-symmetric matrices A and B with the same even numbers of rows and columns can be brought one to another, say B = F A F^t, by some rotation matrix F, just when their even degree symmetric functions and their Pfaffians agree.

Proof. If f(A) is a polynomial invariant under rotations, i.e. f(F A F^t) = f(A), then we can write f(A) = h(t(a)), with a_1, a_2, ..., a_n the numbers in the skew-symmetric normal form of A, and h some function. Let's write the restriction of f to the normal form matrices as a polynomial f(a). We can split f into a sum of homogeneous polynomials of various degrees, so let's assume that f is already homogeneous of some degree. We can pick any monomial in f and sum it over permutations and over changes of signs of any even number of variables, and f will be a sum over such quantities. So we only have to consider each such quantity, i.e. assume that
$$ f = \sum (\pm a_1)^{d_{p(1)}} (\pm a_2)^{d_{p(2)}} \cdots (\pm a_n)^{d_{p(n)}} \qquad (27.1) $$
where the sum is over all choices of any even number of minus signs and all permutations p of the degrees d_1, d_2, ..., d_n. If all the degrees d_1, d_2, ..., d_n are even, then f is an elementary symmetric function of a_1^2, a_2^2, ..., a_n^2, so a polynomial in t_1(a), t_2(a), ..., t_{n−1}(a). If all degrees are odd, then they are all positive, and we can divide out a factor of a_1 a_2 ⋯ a_n = t_n(a). So let's assume that at least one degree is even, say d_1, and that at least one degree is odd, say d_2. All terms in equation 27.1 that put a plus sign in front of a_1 and a_2 cancel those terms which put a minus sign in front of both a_1 and a_2. Similarly, terms putting a minus sign in front of a_1 and a plus sign in front of a_2 cancel those which do the opposite. So f = 0. Consequently, invariant polynomial functions f(A) are polynomials in the Pfaffian and the symmetric functions of a_1^2, a_2^2, ..., a_n^2.

The characteristic polynomial of a 2n × 2n skew-symmetric matrix A is clearly
$$ \det(A - \lambda\, I) = \left(\lambda^2 + a_1^2\right)\left(\lambda^2 + a_2^2\right)\cdots\left(\lambda^2 + a_n^2\right), $$
so that
$$ s_{2j}(A) = s_j\!\left(a_1^2, a_2^2, \dots, a_n^2\right), \qquad s_{2j-1}(A) = 0, $$
for any j = 1, ..., n. Consequently, invariant polynomial functions f(A) are polynomials in the Pfaffian and the even degree symmetric functions of the eigenvalues.

27.5 The Fast Formula for the Pfaffian

Since Pf(B A B^t) = det B · Pf A, we can find the Pfaffian of a large matrix by a sort of Gaussian elimination process, picking B to be a permutation matrix, or a strictly lower triangular matrix, to move A one step towards skew-symmetric normal form. Careful:

Problem 27.4. Prove that replacing A by B A B^t, with B a permutation matrix, permutes the rows and columns of A.

Problem 27.5. Prove that if B is a strictly lower triangular matrix which adds a multiple of, say, row 2 to row 3, then B A B^t is A with that row addition carried out, and with the same multiple of column 2 added to column 3.

We leave the reader to formulate the obvious notion of Gaussian elimination of skew-symmetric matrices to find the Pfaffian; one possible sketch in code follows.
of skew-symmetric matrices to find the Pfaffian.


Factorizations


28 Dual Spaces and Quotient Spaces

In this chapter, we learn how to manipulate whole vector spaces, rather than just individual vectors. Out of any abstract vector space, we will construct some new vector spaces, giving algebraic operations on vector spaces rather than on vectors.

28.1 The Vector Space of Linear Maps Between Two Vector Spaces

If V and W are two vector spaces, then a linear map T : V → W is also often called a homomorphism of vector spaces, or a homomorphism for short, or a morphism to be even shorter. We won't use this terminology, but we will nevertheless write Hom(V, W) for the set of all linear maps T : V → W.

Definition 28.1. A linear map is onto if every output w in W comes from some input: w = T v for some input v in V. A linear map is 1-to-1 if any two distinct vectors v_1 ≠ v_2 get mapped to distinct vectors T v_1 ≠ T v_2.

Problem 28.1. Turn Hom(V, W) into a vector space.

Problem 28.2. Prove that a linear map is an isomorphism just when it is 1-to-1 and onto.

Problem 28.3. Give the simplest example you can of a 1-to-1 linear map which is not onto.

Problem 28.4. Give the simplest example you can of an onto linear map which is not 1-to-1.

Problem 28.5. Prove that a linear map is 1-to-1 just when its kernel consists precisely in the zero vector.

Remark 28.2. If V and W are complex vector spaces, we will write Hom(V, W) to mean the set of linear maps of complex vector spaces, etc. We will only prove results for real vector spaces, and the reader can imagine how to generalize them appropriately.


28.2 The Dual Space

The simplest possible real vector space is R.

Definition 28.3. If V is a vector space, let V* = Hom(V, R), i.e. V* is the set of linear maps T : V → R, i.e. the set of real-valued linear functions on V. We call V* the dual space of V.

We will usually write vectors in V with Roman letters, and vectors in V* with Greek letters. The vectors in V* are often called covectors.

Example 28.4. If V = R^n, every linear function looks like
$$ \alpha(x) = a_1 x_1 + a_2 x_2 + \dots + a_n x_n. $$
We can write this as
$$ \alpha(x) = \begin{pmatrix} a_1 & a_2 & \dots & a_n \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}. $$
So we will identify R^{n*} with the set of row matrices. We will write e^1, e^2, ..., e^n for the obvious basis: e^i is the i-th row of the identity matrix.

Problem 28.6. Why is V* a vector space?

Problem 28.7. What is dim V*?

Remark 28.5. V and V* have the same dimension, but we should think of them as quite different vector spaces.

Lemma 28.6. Suppose that V is a vector space with basis v_1, v_2, ..., v_n. There is a unique basis for V*, called the basis dual to v_1, v_2, ..., v_n, which we will write as v^1, v^2, ..., v^n, so that
$$ v^i(v_j) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases} $$

Remark 28.7. The hard part is getting used to the notation: v^1, v^2, ..., v^n are each a linear function taking vectors from V to numbers: v^1, v^2, ..., v^n : V → R.

Proof. For each fixed i, the equations above uniquely determine a linear function v^i, by theorem 16.23, since we have defined the linear function on a basis. The functions v^1, v^2, ..., v^n are linearly independent, because if they satisfy ∑ a_i v^i = 0, then applying this linear function ∑ a_i v^i to the basis vector v_j we find a_j = 0, and this holds for each j, so all the numbers a_1, a_2, ..., a_n vanish. Finally, if we have any linear function f on V, then we can set a_1 = f(v_1), a_2 = f(v_2), ..., a_n = f(v_n), and find f(v) = ∑ a_j v^j(v) for v = v_1 or v = v_2, etc., and therefore for v any linear combination of v_1, v_2, etc. Therefore f = ∑ a_j v^j, and we see that these functions v^1, v^2, ..., v^n span V*.

Problem 28.8. Find the dual basis v^1, v^2, v^3 to the basis
$$ v_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}. $$

Problem 28.9. Prove that the dual basis v^1, v^2, ..., v^n to any basis v_1, v_2, ..., v_n of R^n satisfies
$$ \begin{pmatrix} v^1 \\ v^2 \\ \vdots \\ v^n \end{pmatrix} = F^{-1}, \qquad \text{where} \quad F = \begin{pmatrix} v_1 & v_2 & \dots & v_n \end{pmatrix}. $$
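A quick numerical illustration of problem 28.9 (not from the text), assuming Python with numpy: the rows of F^{-1} are the dual basis covectors.

```python
import numpy as np

F = np.array([[1., 1., 1.],
              [0., 2., 2.],
              [0., 0., 3.]])          # columns are the basis v_1, v_2, v_3 of problem 28.8
dual = np.linalg.inv(F)               # row i is the covector v^i, written as a row matrix

print(dual)
print(np.round(dual @ F, 10))         # the identity matrix: v^i(v_j) = 1 if i = j, else 0
```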

Lemma 28.8. Let V be a finite dimensional vector space. V** and V are isomorphic, by associating to each vector x from V the linear function f_x from V** defined by
$$ f_x(\alpha) = \alpha(x). $$

Remark 28.9. This lemma is very confusing, but very simple, and therefore very important.

Proof. First, let's ask what V** means. Its vectors are linear functions on V*, by definition. Next, let's pick a vector x in V and construct a linear function on V*. How? Take any covector α in V*, and let's assign to it some number f(α). Since α is (by definition again) a linear function on V, α(x) is a number. Let's take the number f(α) = α(x). Let's call this function f = f_x. The rest of the proof is a series of exercises.

Problem 28.10. Check that f_x is a linear function.

Problem 28.11. Check that the map T(x) = f_x is a linear map T : V → V**.

Problem 28.12. Check that T : V → V** is one-to-one: i.e. if we pick two different vectors x and y in V, then f_x ≠ f_y.


Remark 28.10. Although V and V** are identified as above, V and V* cannot be identified in any "natural manner," and should be thought of as different.

Definition 28.11. If T : V → W is a linear map, write T* : W* → V* for the linear map given by
$$ T^*(\alpha)(v) = \alpha(T v). $$
Call T* the transpose of T.

Problem 28.13. Prove that T* : W* → V* is a linear map.

Problem 28.14. What does this notion of transpose have to do with the notion of transpose of matrices?

28.3 Quotient Spaces

A subspace W of a vector space V doesn't usually have a natural choice of complementary subspace. For example, if V = R^2, and W is the vertical axis, then we might like to choose the horizontal axis as a complement to W. But this choice is not natural, because we could carry out a linear change of variables fixing the vertical axis but not the horizontal axis (for example, a shear along the vertical direction). There is a natural choice of vector space which plays the role of a complement, called the quotient space.

Definition 28.12. If V is a vector space, W a subspace of V, and v a vector in V, the translate v + W of W is the set of vectors in V of the form v + w where w is in W.

Example 28.13. The translates of the horizontal plane through 0 in R^3 are just the horizontal planes.

Problem 28.15. Prove that any subspace W will have
$$ w + W = W $$
for any w from W.

Remark 28.14. If we take W to be the horizontal plane (x_3 = 0) in R^3, then the translates
$$ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + W \quad\text{and}\quad \begin{pmatrix} 7 \\ 2 \\ 1 \end{pmatrix} + W $$
are the same, because we can write
$$ \begin{pmatrix} 7 \\ 2 \\ 1 \end{pmatrix} + W = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 7 \\ 2 \\ 0 \end{pmatrix} + W = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + W. $$
This is the main idea behind translates: two vectors make the same translate just when their difference lies in the subspace.


Definition 28.15. If x + W and y + W are translates, we add them by (x + W) + (y + W) = (x + y) + W. If s is a number, let s(x + W) = s x + W.

Problem 28.16. Prove that addition and scaling of translates are well-defined, independent of the choice of x and y in a given translate.

Definition 28.16. The quotient space V/W of a vector space V by a subspace W is the set of all translates v + W of all vectors v in V.

Example 28.17. Take V the plane, V = R^2, and W the vertical axis. The translates of W are the vertical lines in the plane. The quotient space V/W has the various vertical lines as its points. Each vertical line passes through the horizontal axis at a single point, uniquely determining the vertical line. So the translates are the points
$$ \begin{pmatrix} x \\ 0 \end{pmatrix} + W. $$
The quotient space V/W is just identified with the horizontal axis, by taking
$$ \begin{pmatrix} x \\ 0 \end{pmatrix} + W \ \text{to}\ x. $$

Lemma 28.18. The quotient space V/W of a vector space by a subspace is a vector space. The map T : V → V/W given by the rule T x = x + W is an onto linear map.

Remark 28.19. The concept of quotient space can be circumvented by using some complicated matrices, as can everything in linear algebra, so that one never really needs to use abstract vector spaces. But that approach is far more complicated and confusing, because it involves a choice of basis, and there is usually no natural choice to make. It is always easiest to carry out linear algebra as abstractly as possible, descending into choices of basis at the latest possible stage.

Proof. One has to check that (x + W) + (y + W) = (y + W) + (x + W), but this follows clearly from x + y = y + x. Similarly all of the laws of vector spaces hold. The 0 element of V/W is the translate 0 + W, i.e. W itself. To check that T is linear, consider scaling: T(s x) = s x + W = s(x + W), and addition: T(x + y) = x + y + W = (x + W) + (y + W).

Lemma 28.20. If U and W are subspaces of a vector space V, and V = U ⊕ W is a direct sum of subspaces, then the map T : V → V/W taking vectors v to v + W restricts to an isomorphism T|_U : U → V/W.

Remark 28.21. So, while there is no natural complement to W, every choice of complement is naturally identified with the quotient space.

Proof. The kernel of T|_U is clearly U ∩ W = 0. To see that T|_U is onto, take a vector v + W in V/W. Because V = U ⊕ W, we can somehow write v as a sum v = u + w with u from U and w from W. Therefore v + W = u + W = T|_U u lies in the image of T|_U.

Theorem 28.22. If V is a finite dimensional vector space and W a subspace of V, then
$$ \dim V/W = \dim V - \dim W. $$

28.4 The Three Isomorphism Theorems

Theorem 28.23 (The First Isomorphism Theorem). Any linear map T : V → W of vector spaces yields an isomorphism $\bar T : V/\ker T \to \operatorname{im} T$ by the rule
$$ \bar T(v + \ker T) = T v. $$

Remark 28.24. For simplicity, mathematicians often write $V/\ker T = \operatorname{im} T$, with the isomorphism $\bar T$ above implicitly understood.

Proof. Clearly if a vector k belongs to ker T, then T(v + k) = T v for any vector v in V. Therefore T is constant on each translate v + ker T, and we can define $\bar T(v + \ker T)$ to be T v. We need to see that $\bar T$ is a linear map. If we pick v_0 and v_1 any vectors in V, then
$$
\begin{aligned}
\bar T\left((v_0 + \ker T) + (v_1 + \ker T)\right) &= \bar T\left(v_0 + v_1 + \ker T\right) \\
&= T(v_0 + v_1) \\
&= T v_0 + T v_1 \\
&= \bar T(v_0 + \ker T) + \bar T(v_1 + \ker T).
\end{aligned}
$$
A similar argument ensures that $\bar T(a v_0 + \ker T) = a \bar T(v_0 + \ker T)$, so that $\bar T$ is linear, $\bar T : V/\ker T \to \operatorname{im} T$.

Next, we need to ensure that $\bar T$ is an isomorphism. Clearly $\bar T$ has the same image as T, because $\bar T(v + \ker T) = T v$. Therefore $\bar T : V/\ker T \to \operatorname{im} T$ is onto.

The kernel of $\bar T$ consists in the translates v + ker T so that $\bar T(v + \ker T) = 0$, i.e. so that T v = 0. But then v lies in ker T, so v + ker T = ker T is the 0 vector in V/ker T. Therefore $\bar T$ has kernel 0, i.e. is 1-to-1, and so is an isomorphism.

Theorem 28.25 (The Second Isomorphism Theorem). Suppose that U and V are two subspaces of a vector space W. Then there is an isomorphism
$$ \bar\iota : V/(U \cap V) \to (U + V)/U, $$
given by the rule $\bar\iota(v + (U \cap V)) = v + U$.

Remark 28.26. For simplicity, mathematicians often write
$$ V/(U \cap V) = (U + V)/U, $$
with the isomorphism above implicitly understood.

Proof. Define a linear map ι : V → (U + V)/U by ι v = v + U. This map clearly has kernel U ∩ V, so it defines a monomorphism $\bar\iota : V/(U \cap V) \to (U + V)/U$, as in the first isomorphism theorem, by $\bar\iota(v + (U \cap V)) = \iota v$. Take any vector in (U + V)/U, say u + v + U. Clearly u + v + U = v + U (as a translate), so every vector in (U + V)/U has the form v + U for some vector v from V. Therefore
$$ v + U = \iota v = \bar\iota(v + (U \cap V)), $$
so $\bar\iota$ is an isomorphism.

Theorem 28.27 (The Third Isomorphism Theorem). If U is a subspace of V which is itself a subspace of a vector space W, then there is an isomorphism
$$ \bar\iota : (W/U)/(V/U) \to W/V, $$
given by the rule $\bar\iota((w + U) + V/U) = w + V$.

Remark 28.28. For simplicity, mathematicians often write
$$ (W/U)/(V/U) = W/V, $$
with the isomorphism above implicitly understood.

Proof. Define a map ι : W/U → W/V by the rule ι(w + U) = w + V. Clearly ι is an epimorphism, with ker ι = V/U. Therefore by the first isomorphism theorem, ι induces an isomorphism $\bar\iota : (W/U)/(V/U) \to W/V$.


29 Singular Value Factorization

We will analyse statistical data, using the spectral theorem.

29.1 Principal Components

Consider a large collection of data coming in from some kind of measuring equipment. Let's suppose that the data consists in a large number of vectors, say vectors v_1, v_2, ..., v_N in R^n. How can we get a good rough description of what these vectors look like?

We can take the mean of the vectors,
$$ \mu = \frac{1}{N}\left(v_1 + v_2 + \dots + v_N\right), $$
as a good description of where they lie. How do they arrange themselves around the mean? To keep things simple, let's subtract the mean from each of the vectors. So assume that the mean is μ = 0, and we are asking how the vectors arrange themselves around the origin.

Imagine that these vectors v_1, v_2, ..., v_N tend to lie along a particular line through the origin. Let's try to take an orthonormal basis of R^n, say u_1, u_2, ..., u_n, so that u_1 points along that line. How can we find the direction of that line? We look at the quantity ⟨v_k, x⟩. If the vectors lie nearly on a line through 0, then for x on that line, ⟨v_k, x⟩ should be large positive or negative, while for x perpendicular to that line, ⟨v_k, x⟩ should be nearly 0. If we square, we can make sure the large positive or negative becomes large positive, so we take the quantity
$$ Q(x) = \langle v_1, x\rangle^2 + \langle v_2, x\rangle^2 + \dots + \langle v_N, x\rangle^2. $$
The spectral theorem guarantees that we can pick an orthonormal basis u_1, u_2, ..., u_n of eigenvectors of the symmetric matrix A associated to Q. We will arrange the eigenvalues λ_1, λ_2, ..., λ_n from largest to smallest. Because Q(x) ≥ 0, we see that none of the eigenvalues are negative. Clearly Q(x) grows fastest in the direction x = u_1.

Problem 29.1. The symmetric matrix A associated to Q(x) (for which ⟨A x, x⟩ = Q(x) for every vector x) is
$$ A_{ij} = \langle v_1, e_i\rangle \langle v_1, e_j\rangle + \langle v_2, e_i\rangle \langle v_2, e_j\rangle + \dots + \langle v_N, e_i\rangle \langle v_N, e_j\rangle. $$

If we rescale all of the vectors v_1, v_2, ..., v_N by the same nonzero scalar, then the resulting vectors tend to lie along the same lines or planes as the original vectors did. So it is convenient to replace Q(x) by the quadratic polynomial function
$$ Q(x) = \frac{\sum_k \langle v_k, x\rangle^2}{\sum_l \|v_l\|^2}. $$
This has associated symmetric matrix
$$ A_{ij} = \frac{\sum_k \langle v_k, e_i\rangle \langle v_k, e_j\rangle}{\sum_l \|v_l\|^2}, $$
which we will call the covariance matrix associated to the data.

Lemma 29.1. Given any set of nonzero vectors v_1, v_2, ..., v_N in R^n, write them as the columns of a matrix V. Their covariance matrix
$$ A = \frac{V V^t}{\sum_k \|v_k\|^2} $$
has an orthonormal basis of eigenvectors u_1, u_2, ..., u_n with eigenvalues 1 ≥ λ_1 ≥ λ_2 ≥ ... ≥ λ_n ≥ 0.

Remark 29.2. The square roots of the eigenvalues are the correlation coefficients, each indicating how much the data tends to lie in the direction of the associated eigenvector.

Proof. We have only to check that ⟨x, V V^t x⟩ = ∑_k ⟨v_k, x⟩^2, an exercise for the reader, to see that A is the covariance matrix. Eigenvalues of A can't be negative, as mentioned already. For any vector x of length 1, the Schwarz inequality (lemma 20.1) says that
$$ \sum_k \langle v_k, x\rangle^2 \le \sum_k \|v_k\|^2. $$
Therefore, by the minimum principle, eigenvalues of A can't exceed 1.
Therefore, by the minimum principle, eigenvalues of A can’t exceed 1.<br />

Our data lies mostly along a line through 0 just when λ 1 is large, and the<br />

remaining eigenvalues λ 2 , λ 3 , . . . , λ n are much smaller. More generally, if we<br />

find that the first dozen or so eigenvalues are relatively large, and the rest<br />

are relatively much smaller, then our data must lie very close to a subspace<br />

of dimension a dozen or so. The data tends most strong to lie along the u 1<br />

direction; fluctations about that direction are mostly in the u 2 direction, etc.


Figure 29.1: Applying principal components analysis to some data points. (a) Data points; the mean is marked as a cross. (b) The same data; vectors are drawn from the mean in the directions of the eigenvectors, and the lengths of the vectors give the correlation coefficients.

Every vector x can be written as x = a 1 u 1 + a 2 u 2 + · · · + a n u n , and<br />

the numbers a 1 , a 2 , . . . , a n are recovered from the formula a i = 〈x, u i 〉. If the<br />

eigenvalues λ 1 , λ 2 , . . . , λ d are relatively much larger than the rest, we can say<br />

that our data live near the subspace spanned by u 1 , u 2 , . . . , u d , and say that<br />

our data has d effective dimensions. The numbers a 1 , a 2 , . . . , a d are called the<br />

principal components of a vector x.<br />

To store the data, instead of remembering all of the vectors v 1 , v 2 , . . . , v N , we<br />

just keep track of the eigenvectors u 1 , u 2 , . . . , u d , and of the principal components<br />

of the vectors v 1 , v 2 , . . . , v N . In matrices, this means that instead of storing V ,<br />

we store F = (u_1 u_2 . . . u_n), and store the first d rows of F^t V; let W be the matrix of these rows. Coming out of storage, we can approximately recover the vectors v_1, v_2, . . . , v_N as the columns of F_d W, where F_d = (u_1 u_2 . . . u_d) is the matrix of the first d columns of F. The matrix F represents an orthogonal transformation putting the vectors v_1, v_2, . . . , v_N nearly into the subspace spanned by e_1, e_2, . . . , e_d, and mostly along the e_1 direction, with fluctuations mostly along the e_2 direction, etc. So it is often useful to take a look at the columns of W themselves, as a convenient picture of the data.
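Here is a minimal sketch of this storage scheme, not from the text; it assumes numpy, and the sample data and the choice d = 1 are made up for illustration.

import numpy as np

np.random.seed(1)
# columns of V are data vectors v_1, ..., v_N in R^3, close to a line through 0
V = np.outer([1.0, 2.0, 2.0], np.random.randn(100)) + 0.05 * np.random.randn(3, 100)

A = V @ V.T / np.sum(V * V)            # covariance matrix of the data
eigenvalues, F = np.linalg.eigh(A)     # columns of F are eigenvectors u_1, ..., u_n ...
F = F[:, ::-1]                         # ... reordered so the eigenvalues decrease

d = 1                                  # effective dimension, chosen by hand here
F_d = F[:, :d]                         # first d eigenvectors
W = F_d.T @ V                          # principal components of v_1, ..., v_N
V_recovered = F_d @ W                  # approximate recovery of the data

print(np.max(np.abs(V - V_recovered)))  # small: the data is nearly one dimensional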

29.2 Singular Value Factorization<br />

Theorem 29.3. Every real matrix A can be written as A = U Σ V^t, with U and V orthogonal, and Σ of the same dimensions as A, with the block form

Σ = ( D 0 ; 0 0 ),

where D is diagonal with nonnegative diagonal entries:

D = diag(σ_1, σ_2, . . . , σ_r).

Proof. Suppose that A is p × q. Just like when we worked out principal<br />

components, we order the eigenvalues of A t A from largest to smallest. For<br />

each eigenvalue λ j , let σ j = √ λ j . (Since we saw that the eigenvalues λ j of<br />

A t A aren’t negative, the square root makes sense.) Let V be the matrix whose<br />

columns are an orthonormal basis of eigenvectors of A t A, ordered by eigenvalue.<br />

Write

V = (V_1 V_2),

with V_1 the eigenvectors with positive eigenvalues, and V_2 those with 0 eigenvalue. For each nonzero eigenvalue, define a vector

u_j = (1/σ_j) A v_j.

Suppose that there are r nonzero eigenvalues. Let's check that these vectors u_1, u_2, . . . , u_r are orthonormal:

〈u_i, u_j〉 = 〈(1/σ_i) A v_i, (1/σ_j) A v_j〉
           = (1/(σ_i σ_j)) 〈A v_i, A v_j〉
           = (1/(σ_i σ_j)) 〈v_i, A^t A v_j〉
           = (1/(σ_i σ_j)) 〈v_i, λ_j v_j〉
           = (λ_j / √(λ_i λ_j)) 〈v_i, v_j〉
           = 1 if i = j, and 0 otherwise.

If there aren’t enough vectors u 1 , u 2 , . . . , u r to make up a basis (i.e. if r < p),<br />

then just write down some more vectors to make up an orthonormal basis, say


29.2. Singular Value Factorization 277<br />

vectors u r+1 , u 2 , . . . , u p , and let<br />

)<br />

U 1 =<br />

(u 1 u 2 . . . u r ,<br />

)<br />

U 2 =<br />

(u r+1 u 2 . . . u p ,<br />

)<br />

U =<br />

(U 1 U 2 .<br />

By definition of these u_j, A v_j = σ_j u_j, so A V_1 = U_1 D. Calculate

U Σ V^t = (U_1 U_2) ( D 0 ; 0 0 ) ( V_1^t ; V_2^t )
        = U_1 D V_1^t
        = A V_1 V_1^t
        = A,

since A V_2 = 0 (each column v of V_2 satisfies ‖Av‖² = 〈v, A^t A v〉 = 0) and V_1 V_1^t + V_2 V_2^t = V V^t = I.

Corollary 29.4. Any square matrix A can be written as A = KP (the Cartan<br />

decomposition, also called the polar decomposition), where K is orthogonal and<br />

P is symmetric and positive semidefinite.<br />

Proof. Write A = UΣV t and set K = UV t and P = V ΣV t .
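A quick numerical check of both factorizations, not part of the text; it assumes numpy, whose svd routine returns V^t directly, and the sample matrices are arbitrary.

import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 0.0]])

U, sigma, Vt = np.linalg.svd(A)        # A = U Sigma V^t, singular values in sigma
Sigma = np.zeros(A.shape)
Sigma[:len(sigma), :len(sigma)] = np.diag(sigma)
assert np.allclose(A, U @ Sigma @ Vt)

# Cartan (polar) decomposition of a square matrix: A = K P
B = np.array([[0.0, -2.0],
              [1.0,  1.0]])
U, sigma, Vt = np.linalg.svd(B)
K = U @ Vt                             # orthogonal
P = Vt.T @ np.diag(sigma) @ Vt         # symmetric, positive semidefinite
assert np.allclose(B, K @ P)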


30 Factorizations<br />

Most theorems in linear algebra are obvious consequences of simple factorizations.

30.1 LU Factorization<br />

Forward elimination is messy: we swap rows and add rows to lower rows. We<br />

want to put together all of the row swaps into one permutation matrix, and all<br />

of the row additions into one strictly lower triangular matrix.<br />

Algorithm<br />

To forward eliminate a matrix A (let's say with n rows), start by setting p to

be the permutation 1, 2, 3, . . . , n (the identity permutation), L = 1, U = A. To<br />

start with, no entries of L are painted. Carry out forward elimination on U.<br />

a. Each time you find a nonzero pivot in U, you paint a larger square box in<br />

the upper left corner of L. (The number of rows in this painted box is

always the number of pivots in U.)<br />

b. When you swap rows k and l of U,<br />

a) swap entries k and l of the permutation p and<br />

b) swap rows k and l of L, but only swap unpainted entries which lie<br />

beneath painted ones.<br />

c. If you add s (row k) to row l in U, then put −s into column k, row l in<br />

L.<br />

The painted box in L is always square, with number of rows and columns equal<br />

to the number of pivots drawn in U.<br />

Remark 30.1. As each step begins, the pivot rows of U with nonzero pivots in<br />

them are finished, and the entries inside the painted box and the entries on and<br />

above the diagonal of L are finished.<br />
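Below is a minimal Python sketch of this algorithm, not the book's code: the function name lu_factor and the sample matrix are made up, the "painting" bookkeeping is skipped, and the sign convention of step c above is followed (record −s in L when s·(row k) is added to row l). Row swaps also swap the multipliers already stored in L below the diagonal, as in step b.

import numpy as np

def lu_factor(A):
    # returns p, L, U with P A = L U, where P is the permutation matrix of p
    U = np.array(A, dtype=float)
    n, m = U.shape
    L = np.eye(n)
    p = list(range(n))
    k = 0                                 # current pivot row
    for col in range(m):
        if k == n:
            break
        pivots = [r for r in range(k, n) if U[r, col] != 0]
        if not pivots:
            continue                      # no pivot in this column
        l = pivots[0]
        if l != k:
            U[[k, l]] = U[[l, k]]         # swap rows k and l of U,
            p[k], p[l] = p[l], p[k]       # entries k and l of p, and the
            L[[k, l], :k] = L[[l, k], :k] # multipliers already stored in L
        for r in range(k + 1, n):
            s = -U[r, col] / U[k, col]    # add s (row k) to row r of U ...
            U[r] += s * U[k]
            L[r, k] = -s                  # ... and put -s into L
        k += 1
    return p, L, U

A = np.array([[1.0, 0, 0], [2, 0, 1], [3, 3, 3]])
p, L, U = lu_factor(A)
P = np.eye(3)[p]                          # permutation matrix of p
assert np.allclose(P @ A, L @ U)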

Theorem 30.2. By following the algorithm above, every matrix A can be<br />

written as A = P −1 LU where P is a permutation matrix of a permutation p, L<br />

a strictly lower triangular matrix, and U an upper triangular matrix.<br />

Figure 30.1: Computing the LU factorization: the successive values of p, L, and U during forward elimination of a 3 × 3 example.

Proof. Let's show that after each step, we always have P A = LU and always have L strictly lower triangular. For the first forward elimination step, we might have to swap rows. There is no painted box yet, so the algorithm says that the row swap leaves all entries of L alone. Let Q be the permutation matrix of

the required row swap, and q the permutation. Our algorithm will pass from<br />

p = 1, L = I, U = A to p = q, L = I, U = QA, and so P A = LU.<br />

Next, we might have to add some multiples of the first row of U to lower rows. We carry this out by a strictly lower triangular matrix, say

S = ( 1 0 ; s I ),

with s a vector. Notice that

S^{-1} = ( 1 0 ; −s I )

subtracts the corresponding multiples of row 1 from lower rows. So U becomes<br />

U new = SU, while the permutation p (and hence the matrix P ) stays the same.<br />

The matrix L becomes L new = S −1 L = S −1 , strictly lower triangular, and<br />

P new A = L new U new .<br />

Suppose that after some number of steps, we have reduced the upper left corner of U to echelon form, say

U = ( U_0 U_1 ; 0 U_2 ),

with U_0 in echelon form. Suppose that

L = ( L_0 0 ; L_1 I )

is strictly lower triangular, and that P is some permutation matrix. Finally, suppose that P A = LU.

Our next step in forward elimination could be to swap rows k and l in U, and these we can assume are rows in the bottom of U, i.e. rows of U_2. Suppose that Q is the permutation matrix of a transposition so that Q U_2 is U_2 with the appropriate rows swapped. In particular, Q² = I since Q is the permutation matrix of a transposition. Let

P_new = ( I 0 ; 0 Q ) P,
L_new = ( I 0 ; 0 Q ) L ( I 0 ; 0 Q ),
U_new = ( I 0 ; 0 Q ) U.

Check that P_new A = L_new U_new. Multiplying out:

L_new = ( L_0 0 ; Q L_1 I ),

strictly lower triangular. The upper left corner L_0 is the painted box. So L_new is just L with rows k and l swapped under the painted box.

If we add s (row k) of U to row l, this means multiplying by a strictly lower<br />

triangular matrix, say S. Then P A = LU implies that P A = LS −1 SU. But<br />

LS −1 is just L with s (column l) subtracted from column k.<br />

Problem 30.1. Find the LU-factorization of each of

A = ( 0 1 ; 1 0 ),   B = ( 1 ),   C = ( 1 ).

Problem 30.2. Suppose that A is an invertible matrix. Prove that any two<br />

LU-factorizations of A which have the same permutation matrix P must be the<br />

same.



Tensors


31 Quadratic Forms<br />

Quadratic forms generalize the concept of inner product, and play a crucial role<br />

in modern physics.<br />

31.1 Bilinear Forms<br />

Definition 31.1. A bilinear form on a vector space V is a rule associating to each<br />

pair of vectors x and y from V a number b(x, y) which is linear as a function of<br />

x for each fixed y and also linear as a function of y for each fixed x.<br />

Example 31.2. If V = R, then every bilinear form on V is b(x, y) = cxy where<br />

c could be any constant.<br />

Example 31.3. Every inner product on a vector space is a bilinear form. Moreover,<br />

a bilinear form b on V is an inner product just when it is symmetric<br />

(b(v, w) = b(w, v)) and positive definite (b(v, v) > 0 unless v = 0).<br />

Example 31.4. If V = R p , each p × p matrix A determines a bilinear map b by<br />

the rule b(v, w) = 〈v, Aw〉. Conversely, given a bilinear form b on R p , we can<br />

define a matrix A by setting A ij = b (e i , e j ), and then clearly if we expand out,<br />

b(x, y) = Σ_ij x_i y_j b(e_i, e_j) = Σ_ij x_i y_j A_ij = 〈x, Ay〉.

So every bilinear form on R p has the form b(x, y) = 〈x, Ay〉 for a uniquely<br />

determined matrix A.<br />
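A small numerical sketch of this correspondence, not from the text; it assumes numpy, and the 2 × 2 matrix A is arbitrary. It checks that b(x, y) = 〈x, Ay〉 and that A_ij = b(e_i, e_j).

import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, -1.0]])

def b(x, y):
    # the bilinear form determined by A
    return x @ (A @ y)

e = np.eye(2)                               # standard basis e_1, e_2
A_recovered = np.array([[b(e[i], e[j]) for j in range(2)] for i in range(2)])
assert np.allclose(A, A_recovered)          # A_ij = b(e_i, e_j)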

Of course, we can add and scale bilinear forms in the obvious way to make<br />

more bilinear forms, and the bilinear forms on a fixed vector space V form a<br />

vector space.<br />

Lemma 31.5. Fix a basis v 1 , v 2 , . . . , v p of V . Given any collection of numbers<br />

b ij (with i, j = 1, 2, . . . , p), there is precisely one bilinear form b with b ij =<br />

b (v i , v j ). Thus the vector space of bilinear forms on V is isomorphic to the<br />

vector space of p × p matrices (b ij ).<br />


Proof. Given any bilinear form b we can calculate out the numbers b ij = b (v i , v j ).<br />

Conversely, given any numbers b_ij, and vectors x = Σ_i x_i v_i and y = Σ_j y_j v_j, we can let

b(x, y) = Σ_{i,j} b_ij x_i y_j.

Clearly adding bilinear forms adds the associated numbers b ij , and scaling<br />

bilinear forms scales those numbers.<br />

Lemma 31.6. Let B be the set of all bilinear forms on V . If V has finite<br />

dimension, then dim B = (dim V ) 2 .<br />

Proof. There are (dim V ) 2 numbers b ij .<br />

31.1 Review Problems<br />

Problem 31.1. Let V be any vector space. Prove that b(x 0 + y 0 , x 1 + y 1 ) =<br />

y 0 (x 1 ) is a bilinear form on V ⊕ V ∗ .<br />

Problem 31.2. What are the numbers b ij if b(x, y) = 〈x, y〉 (the usual inner<br />

product on R n )?<br />

Problem 31.3. Suppose that V is a finite dimensional vector space, and let<br />

B be the vector space of all bilinear forms on V . Prove that B is isomorphic<br />

to the vector space Hom (V, V ∗ ), by the isomorphism F : Hom (V, V ∗ ) → B<br />

given by taking each linear map T : V → V ∗ , to the bilinear form b given by<br />

b(v, w) = (T w)(v). (This gives another proof that dim B = (dim V ) 2 .)<br />

Problem 31.4. A bilinear form b on V is degenerate if b(x, y) = 0 for all x for<br />

some fixed y, or for all y for some fixed x.<br />

(a) Give the simplest example you can of a nonzero bilinear form which is<br />

degenerate.<br />

(b) Give the simplest example you can of a bilinear form which is nondegenerate.<br />

Problem 31.5. Let V be the vector space of all 2 × 2 matrices. Let b(A, B) =<br />

tr AB. Prove that b is a nondegenerate bilinear form.<br />

Problem 31.6. Let V be the vector space of polynomial functions of degree at<br />

most 2. For each of the expressions<br />

(a) b(p(x), q(x)) = ∫_{−1}^{1} p(x) q(x) dx,
(b) b(p(x), q(x)) = ∫_{−∞}^{∞} p(x) q(x) e^{−x²} dx,
(c) b(p(x), q(x)) = p(0) q′(0),
(d) b(p(x), q(x)) = p(1) + q(1),
(e) b(p(x), q(x)) = p(0) q(0) + p(1) q(1) + p(2) q(2),
is b bilinear? Is b degenerate?

31.2 Quadratic Forms<br />

Definition 31.7. A bilinear form b on a vector space V is symmetric if b(x, y) =<br />

b(y, x) for all x and y in V . A bilinear form b on a vector space V is positive<br />

definite if b(x, x) > 0 for all x ≠ 0.<br />

Problem 31.7. Which of the bilinear forms in problem 31.6 on the facing page<br />

are symmetric?<br />

Definition 31.8. The quadratic form Q of a bilinear form b on a vector space V<br />

is the real-valued function Q(x) = b(x, x).<br />

Example 31.9. The squared length of a vector in R n ,<br />

Q(x) = ‖x‖² = 〈x, x〉 = Σ_i x_i²

is the quadratic form of the inner product on R^n.

Example 31.10. Every quadratic form on R n has the form<br />

Q(x) = Σ_ij A_ij x_i x_j,

for some numbers A ij = A ji . We could make a symmetric matrix A with those<br />

numbers as entries, so that Q(x) = 〈x, Ax〉. Of course, as in section 14.2 on<br />

page 138, the symmetric matrix A is uniquely determined by the quadratic<br />

form Q and uniquely determines Q.<br />

Problem 31.8. What are the quadratic forms of the bilinear forms in problem<br />

31.6 on the facing page?<br />

Lemma 31.11. Every quadratic form Q determines a symmetric bilinear form<br />

b by

b(x, y) = (1/2) (Q(x + y) − Q(x) − Q(y)).

Moreover, Q is the quadratic form of b.<br />

Proof. There are various identities we have to check on b to ensure that b is a<br />

bilinear form. Each identity involves a finite number of vectors. Therefore it<br />

suffices to prove the result over a finite dimensional vector space V (replacing V<br />

by the span of the vectors involved in each identity). Be careful: the identities



have to hold for all vectors from V , but we can first pick vectors from V , and<br />

then replace V by their span and then check the identity.<br />

Since we can assume that V is finite dimensional, we can take a basis for V<br />

and therefore assume that V = R n . Therefore we can write<br />

Q(x) = 〈x, Ax〉 ,<br />

for a symmetric matrix A. Expanding out,

b(x, y) = (1/2) (〈x + y, A(x + y)〉 − 〈x, Ax〉 − 〈y, Ay〉) = 〈x, Ay〉,

which is clearly bilinear.
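A quick numerical check of this polarization formula, not from the text; it assumes numpy, and the random symmetric matrix and test vectors are made up for illustration.

import numpy as np

np.random.seed(2)
M = np.random.randn(4, 4)
A = (M + M.T) / 2                       # a symmetric matrix

Q = lambda x: x @ A @ x                 # the quadratic form Q(x) = <x, Ax>
b = lambda x, y: (Q(x + y) - Q(x) - Q(y)) / 2

x, y = np.random.randn(4), np.random.randn(4)
assert np.isclose(b(x, y), x @ A @ y)   # b(x, y) = <x, Ay>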

Problem 31.9. The results of this chapter are still true over any field (although<br />

we won’t try to make sense of being positive definite if our field is not R), except<br />

for lemma 31.11. Find a counterexample to lemma 31.11 on the previous page<br />

over the field of Boolean numbers.<br />

Theorem 31.12. The equation<br />

b(x, y) = (1/2) (Q(x + y) − Q(x) − Q(y))

gives an isomorphism between the vector space of symmetric bilinear forms b<br />

and the vector space of quadratic forms Q.<br />

The proof is obvious just looking at the equation: if you scale the left side,<br />

then you scale the right side, and vice versa, and similarly if you add bilinear<br />

forms on the left side, you add quadratic forms on the right side.<br />

31.3 Sylvester’s Law of Inertia<br />

Theorem 31.13 (Sylvester's Law of Inertia). Given a quadratic form Q on a finite dimensional vector space V, there is an isomorphism F : V → R^n for which

Q(x) = x_1² + x_2² + · · · + x_p² − x_{p+1}² − x_{p+2}² − · · · − x_{p+q}²

(p positive terms and q negative terms), where x_1, x_2, . . . , x_n are the entries of F x. We cannot by any linear change of variables alter the value of p (the number of positive terms) or the value of q (the number of negative terms).


31.3. Sylvester’s Law of Inertia 289<br />

Remark 31.14. Sylvester’s Law of Inertia tells us what all quadratic forms look<br />

like, if we allow ourselves to change variables. The numbers p and q are the only<br />

invariants. The reader should keep in mind that in our study of the spectral<br />

theorem, we only allowed orthogonal changes of variable, so we got eigenvalues<br />

as invariants. But here we allow any linear change of variable; in particular we<br />

can rescale, so only the signs of the eigenvalues are invariant.<br />

Proof. We could apply the spectral theorem (theorem 14.2 on page 136), but<br />

we will instead use elementary algebra. Take any basis for V , so that we can<br />

assume that V = R n , and that Q(x) = 〈x, Ax〉 for some symmetric matrix A.<br />

In other words,

Q(x) = A_11 x_1² + A_12 x_1 x_2 + . . . .

Suppose that A_11 ≠ 0. Let's collect together all terms containing x_1 and complete the square:

A_11 x_1² + Σ_{j>1} A_1j x_1 x_j + Σ_{i>1} A_i1 x_i x_1
   = A_11 ( x_1 + (1/A_11) Σ_{j>1} A_1j x_j )² − (1/A_11) ( Σ_{j>1} A_1j x_j )².

Let

y_1 = x_1 + (1/A_11) Σ_{j>1} A_1j x_j.

Then

Q(x) = A_11 y_1² + . . .

where the . . . involve only x_2, x_3, . . . , x_n. Changing variables to use y_1 in place of x_1 is an invertible linear change of variables, and gets rid of nondiagonal terms involving x_1.

We can continue this process using x_2 instead of x_1, until we have used up all variables x_i with A_ii ≠ 0. So let's suppose that all diagonal terms of A vanish. If there is some nondiagonal term of A which doesn't vanish, say A_12, then make new variables y_1 = x_1 + x_2 and y_2 = x_1 − x_2, so x_1 = (1/2)(y_1 + y_2) and x_2 = (1/2)(y_1 − y_2). Then x_1 x_2 = (1/4)(y_1² − y_2²), turning the A_12 x_1 x_2 term into two diagonal terms.

So now we have killed off all nondiagonal terms, so we can assume that

Q(x) = A_11 x_1² + A_22 x_2² + · · · + A_nn x_n².

We can rescale x_1 by any nonzero constant c, which scales A_11 by 1/c². Let's choose c so that c² = ±A_11.

Problem 31.10. Apply this method to the quadratic form<br />

Q(x) = x_1² + x_1 x_2 + x_2 x_1 + x_2 x_3 + x_3 x_2.



Next we have to show that the numbers p of positive terms and q of negative<br />

terms cannot be altered by any linear change of variables. We can assume (by<br />

using the linear change of variables we have just constructed) that V = R n and<br />

that<br />

Q(x) = x_1² + x_2² + · · · + x_p² − ( x_{p+1}² + x_{p+2}² + · · · + x_{p+q}² ).

We want to show that p is the largest dimension of any subspace on which Q is<br />

positive definite, and similarly that q is the largest dimension of any subspace<br />

on which Q is negative definite. Consider the subspace V + of vectors of the<br />

form

x = (x_1, x_2, . . . , x_p, 0, . . . , 0).

Clearly Q is positive definite on V + . Similarly, Q is negative definite on the<br />

subspace V_− of vectors of the form

x = (0, . . . , 0, x_{p+1}, x_{p+2}, . . . , x_{p+q}, 0, . . . , 0).

Suppose that we can find some subspace W of V of greater dimension than p,<br />

so that Q is positive definite on W. Let P_+ be the orthogonal projection to V_+.


31.3. Sylvester’s Law of Inertia 291<br />

In other words, for any vector x in R n , let<br />

⎛<br />

x 1<br />

⎞<br />

x 2<br />

.<br />

P + x =<br />

x p<br />

0<br />

.<br />

0<br />

⎜ ⎟<br />

⎝ . ⎠<br />

0<br />

Then P_+|_W : W → V_+ is a linear map, and dim W > dim V_+, so

dim W = dim ker P_+|_W + dim im P_+|_W
      ≤ dim ker P_+|_W + dim V_+
      < dim ker P_+|_W + dim W,

so, subtracting dim W from both sides,

0 < dim ker P_+|_W.

Therefore there is a nonzero vector x in W for which P + x = 0, i.e.<br />

x = (0, . . . , 0, x_{p+1}, x_{p+2}, . . . , x_{p+q}, x_{p+q+1}, . . . , x_n).

Clearly Q(x) > 0 since x lies in W. But clearly

Q(x) = −x_{p+1}² − x_{p+2}² − · · · − x_{p+q}² ≤ 0,

a contradiction.
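A numerical way to find p and q for a quadratic form Q(x) = 〈x, Ax〉 on R^n, not from the text: by the spectral theorem, diagonalizing A orthogonally and then rescaling shows that p and q are the numbers of positive and negative eigenvalues of A. A sketch assuming numpy; the matrix and the tolerance 1e-12 are arbitrary choices.

import numpy as np

A = np.array([[0.0, 1.0, 0.0],          # symmetric matrix of the form
              [1.0, 0.0, 1.0],          # Q(x) = 2 x_1 x_2 + 2 x_2 x_3
              [0.0, 1.0, 0.0]])

eigenvalues = np.linalg.eigvalsh(A)
p = int(np.sum(eigenvalues > 1e-12))    # number of positive terms
q = int(np.sum(eigenvalues < -1e-12))   # number of negative terms
print(p, q)                             # here p = 1, q = 1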

Remark 31.15. Much of the proof works over any field, as long as we can divide<br />

by 2, i.e. as long as 2 ≠ 0. However, there could be a problem when we try to<br />

rescale: even if 2 ≠ 0, we can only arrange<br />

Q(x) = ε 1 x 2 1 + ε 2 x 2 2 + · · · + ε n x 2 n,<br />

where each ε i can be rescaled by any nonzero number of the form c 2 . (There is<br />

no reasonable analogue of the numbers p and q in a general field). In particular,<br />

since every complex number has a square root, the same theorem is true for<br />

complex quadratic forms, but in the stronger form that we can arrange q = 0,<br />

i.e. we can arrange<br />

Q(x) = x 2 1 + x 2 2 + · · · + x 2 p.<br />

Problem 31.11. Prove that a quadratic form on any real n-dimensional vector<br />

space is nondegenerate just when p + q = n, with p and q as in Sylvester’s law<br />

of inertia.<br />

Problem 31.12. For a complex quadratic form, prove that if we arrange our<br />

quadratic form to be<br />

Q(x) = x 2 1 + x 2 2 + · · · + x 2 p.<br />

then we are stuck with the resulting value of p, no matter what linear change<br />

of variables we employ.<br />

31.3 Review Problems<br />

Problem 31.13. Find a function Q(x) for x in R n which satisfies Q(ax) =<br />

a 2 Q(x), but which is not a quadratic form.<br />

31.4 Kernel and Null Cone<br />

Definition 31.16. Take a vector space V . The kernel of a symmetric bilinear<br />

form b on V is the set of all vectors x in V for which b(x, y) = 0 for any vector<br />

y in V .<br />

Problem 31.14. Find the kernel of the symmetric bilinear form b(x, y) =<br />

x 1 y 1 − x 2 y 2 for x and y in R 3 .<br />

Definition 31.17. The null cone of a symmetric bilinear form b(x, y) is the set<br />

of all vectors x in V for which b(x, x) = 0.<br />

Example 31.18. The vector x = (1, 1) lies in the null cone of the symmetric bilinear form b(x, y) = x_1 y_1 − x_2 y_2 for x and y in R². Indeed the null cone of that symmetric bilinear form is given by 0 = b(x, x) = x_1² − x_2², so it's the pair of lines x_1 = x_2 and x_1 = −x_2 in R².



Problem 31.15. Prove that the kernel of a symmetric bilinear form lies in its<br />

null cone.<br />

Problem 31.16. Find the null cone of the symmetric bilinear form b(x, y) =<br />

x 1 y 1 − x 2 y 2 for x and y in R 3 . What part of the null cone is the kernel?<br />

Problem 31.17. Prove that the kernel is a subspace. For which symmetric<br />

bilinear forms is the null cone a subspace? For which symmetric bilinear forms<br />

is the kernel equal to the null cone?<br />

31.4 Review Problems<br />

Problem 31.18. Prove that the kernel of a symmetric bilinear form consists<br />

precisely in the vectors x for which Q(x + y) = Q(y) for all vectors y, with Q<br />

the quadratic form of that bilinear form. (In other words, we can translate in<br />

the x direction without altering the value of the function Q(y).)<br />

31.5 Orthonormal Bases<br />

Definition 31.19. If b is a nondegenerate symmetric bilinear form on a vector space V, a basis v_1, v_2, . . . , v_n for V is called orthonormal for b if

b(v_i, v_j) = ±1 if i = j, and 0 if i ≠ j.

Corollary 31.20. Any nondegenerate symmetric bilinear form on any finite dimensional vector space has an orthonormal basis.

Proof. Take the linear change of variables guaranteed by Sylvester's law of inertia, and then the standard basis will be orthonormal.

Problem 31.19. Find an orthonormal basis for the quadratic form b(x, y) =<br />

x 1 y 2 + x 2 y 1 on R 2 .<br />

Problem 31.20. Suppose that b is a symmetric bilinear form on finite dimensional<br />

vector space V .<br />

(a) For each vector x in V , define a linear map ξ : V → R by ξ(y) = b(x, y).<br />

Write this covector ξ as ξ = T x. Prove that the map T : V → V ∗ given by<br />

ξ = T x is linear.<br />

(b) Prove that the kernel of T is the kernel of b.<br />

(c) Prove that T is an isomorphism just when b is nondegenerate. (The moral<br />

of the story is that a nondegenerate symmetric bilinear form b identifies<br />

the vector space V with its dual space V ∗ via the map T .)



(d) If b is nondegenerate, prove that for each covector ξ, there is a unique vector<br />

x in V so that b(x, y) = ξ(y) for every vector y in V .<br />

31.5 Review Problems<br />

Problem 31.21. What is the linear map T of problem 31.20 on the previous<br />

page (i.e. write down the associated matrix) for each of the following symmetric<br />

bilinear forms?<br />

(a) b(x, y) = x 1 y 2 + x 2 y 1 on R 2<br />

(b) b(x, y) = x 1 y 1 + x 2 y 2 on R 2<br />

(c) b(x, y) = x 1 y 1 − x 2 y 2 on R 2<br />

(d) b(x, y) = x 1 y 1 − x 2 y 2 − x 3 y 3 − x 4 y 4 on R 4<br />

(e) b(x, y) = x 1 (y 1 + y 2 + y 3 ) + (x 1 + x 2 + x 3 ) y 1


32 Tensors and Indices<br />

In this chapter, we define the concept of a tensor in R n , following an approach<br />

common in physics.<br />

32.1 What is a Tensor?<br />

Vectors x have entries x_i:

x = (x_1, x_2, . . . , x_n).

A matrix A has entries A ij . To describe the entries of a vector x, we use a single<br />

index, while for a matrix we use two indices. A tensor is just an object whose<br />

entries have any number of indices. (Entries will also be called components.)<br />

Example 32.1. In a physics course, one learns that stress applied to a crystal<br />

causes an electric field (see Feynman et. al. [3] II-31-12). The electric field is<br />

a vector E (at each point in space) with components E i , while the stress is<br />

a symmetric matrix S with components S ij = S ji . These are related by the<br />

piezoelectric tensor P , which has components P ijk :<br />

E_i = Σ_{jk} P_{ijk} S_{jk}.

Just as a matrix is a rectangle of numbers, a tensor with three indices is a box<br />

of numbers.<br />

For this chapter, all of our tensors will be “tensors in R n ,” which means<br />

that the indices all run from 1 to n. For example, our vectors are literally in<br />

R n , while our matrices are n × n, etc.<br />

The subject of tensors is almost trivial, since there is really nothing much<br />

we can say in any generality about them. There are two subtle points: upper<br />

versus lower indices and summation notation.<br />


Upper versus lower indices<br />

It is traditional (following Einstein) to write the components of a vector x not as x_i but as x^i, so not as

x = (x_1, x_2, . . . , x_n),

but instead as

x = (x^1, x^2, . . . , x^n).

In particular, x^2 doesn't mean x · x, but means the second entry of x. Next we write elements of R^{n*} as

y = (y_1 y_2 . . . y_n),

with indices down. We will call the elements of R^{n*} covectors. Finally, we write matrices with the row index up and the column index down,

A = ( A^1_1 A^1_2 . . . A^1_q ; A^2_1 A^2_2 . . . A^2_q ; . . . ; A^p_1 A^p_2 . . . A^p_q ),

so A^{row}_{column}. In general, a tensor can have as many upper and lower indices as we need, and we will treat upper and lower indices as being different. For example, the components of a matrix look like A^i_j, never like A_{ij} or A^{ij}, which would represent tensors of a different type.

Summation Notation<br />

Following Einstein further, whenever we write an expression with some letter appearing once as an upper index and once as a lower index, like A^i_j x^j, this means Σ_j A^i_j x^j, i.e. a sum is implicitly understood over the repeated j index.

We will often refer to a vector x as x^i. This isn't really fair, since it confuses a single component x^i of a vector with the entire vector, but it is standard. Similarly, we write a matrix A as A^i_j and a tensor with 2 upper and 3 lower indices as t^{ij}_{klm}.

The names of the indices have no significance and will usually change during<br />

the course of calculations.



32.2 Operations<br />

What can we do with tensors? Very little. At first sight, they look complicated.<br />

But there are very few operations on tensors. We can<br />

(1) Add tensors that have the same numbers of upper and of lower indices; for example add s^{ijk}_{lm} to t^{ijk}_{lm} to get s^{ijk}_{lm} + t^{ijk}_{lm}. If the tensors are vectors or matrices, this is just adding in the usual way.
(2) Scale; for example 3 t^{ij}_{klm} means the obvious thing: triple each component of t^{ij}_{klm}.
(3) Swap indices of the same type; for example, take a tensor t^i_{jk} and make the tensor t^i_{kj}. There is no nice notation for doing this.
(4) Take tensor product: just write down two tensors beside one another, with distinct indices; for example, the tensor product of s^i_j and t^{ij} is s^i_j t^{kl}. Note that we have to change the names on the indices of t before we write it down, so that we don't use the same index names twice.
(5) Finally, contract: take any one upper index, and any one lower index, and set them equal and sum. For example, we can contract the i and k indices of a tensor t^i_{jk} to produce t^i_{ji}. Note that t^i_{ji} has only one index j, since the summation convention tells us to sum over all possibilities for the i index. So t^i_{ji} is a covector.

In tensor calculus there are some additional operations on tensor quantities<br />

(various methods for differentiating and integrating), and these additional<br />

operations are essential to physical applications, but tensor calculus is not in<br />

the algebraic spirit of this book, so we will never consider any other operations<br />

than those listed above.<br />

Example 32.2. If x^i is a vector and y_i a covector, then we can't add them, because the indices don't match. But we can take their tensor product x^i y_j, and then we can contract to get x^i y_i. This is of course just y(x), thinking of every covector y as a linear function on R^n.

Example 32.3. If x^i is a vector and A^i_j is a matrix, then their tensor product is A^i_j x^k, and contracting gives A^i_j x^j, which is the vector Ax. So matrix multiplication is tensor product followed by contraction.

Example 32.4. If A^i_j and B^i_j are two matrices, then A^i_k B^k_j is the matrix AB. Similarly, A^k_j B^i_k is the matrix BA. These are the two possible contractions of the tensor product A^i_j B^k_l.

Example 32.5. A matrix A^i_j has only one contraction: A^i_i, the trace, which is a number (because it has no free indices).
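These contractions are exactly what numpy's einsum routine computes; the following sketch is not from the text, and the matrices A, B and the vector x are arbitrary.

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
x = np.array([5.0, 6.0])

Ax    = np.einsum('ij,j->i', A, x)      # A^i_j x^j, the vector Ax
AB    = np.einsum('ik,kj->ij', A, B)    # A^i_k B^k_j, the matrix AB
BA    = np.einsum('kj,ik->ij', A, B)    # A^k_j B^i_k, the matrix BA
trace = np.einsum('ii->', A)            # A^i_i, the trace

assert np.allclose(Ax, A @ x) and np.allclose(AB, A @ B) and np.allclose(BA, B @ A)
assert np.isclose(trace, np.trace(A))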

Example 32.6. It is standard in working with tensors to write the entries of the identity matrix not as I^i_j but as δ^i_j:

δ^i_j = 1 if i = j, and 0 otherwise.

The trace of the identity matrix is δ^i_i = n, since for each value of i we add one. Tensor products of the identity matrix give many other tensors, like δ^i_j δ^k_l.

Problem 32.1. Take the tensor product of the identity matrix and a covector<br />

ξ, and simplify all possible contractions.<br />

Example 32.7. If a tensor has various lower indices, we can average over all permutations of them. This process is called symmetrizing over indices. For example, a tensor t^i_{jk} can be symmetrized to a tensor (1/2) t^i_{jk} + (1/2) t^i_{kj}. Obviously we can also symmetrize over any two upper indices. But we can't symmetrize over an upper and a lower index.

If we fix our attention on a pair of indices, we can also antisymmetrize over them, say taking a tensor t_{jk} and producing

(1/2) t_{jk} − (1/2) t_{kj}.

A tensor is symmetric in some indices if it doesn't change when they are permuted, and is antisymmetric in those indices if it changes by the sign of the permutation when the indices are permuted.

Again focusing on just two lower indices, we can split any tensor into a sum

t_{jk} = ( (1/2) t_{jk} + (1/2) t_{kj} ) + ( (1/2) t_{jk} − (1/2) t_{kj} )

of a symmetric part (the first bracket) and an antisymmetric part (the second).

Problem 32.2. Suppose that a tensor t ijk is symmetric in i and j, and antisymmetric<br />

in j and k. Prove that t ijk = 0.<br />

Of course, we write a tensor as 0 to mean that all of its components are 0.<br />

Example 32.8. Let's look at some tensors with lots of indices. Working in R³, define a tensor ε by setting

ε_{ijk} = 1 if i, j, k is an even permutation of 1, 2, 3,
        = −1 if i, j, k is an odd permutation of 1, 2, 3,
        = 0 if i, j, k is not a permutation of 1, 2, 3.

Of course, i, j, k fails to be a permutation just when two or three of i, j or k are equal. For example,

ε_123 = 1,  ε_221 = 0,  ε_321 = −1,  ε_222 = 0.

Problem 32.3. Take three vectors x, y and z in R 3 , and calculate the contraction<br />

ε ijk x i y j z k .<br />

Problem 32.4. Prove that every tensor t ijk which is antisymmetric in all lower<br />

indices is a constant multiple t ijk = c ε ijk .<br />

Example 32.9. More generally, working in R^n, we can define a tensor ε by

ε_{i_1 i_2 . . . i_n} = 1 if i_1, i_2, . . . , i_n is an even permutation of 1, 2, . . . , n,
                     = −1 if i_1, i_2, . . . , i_n is an odd permutation of 1, 2, . . . , n,
                     = 0 if i_1, i_2, . . . , i_n is not a permutation of 1, 2, . . . , n.

We can restate equation 18.1 on page 181 as

ε_{i_1 i_2 . . . i_n} A^{i_1}_1 A^{i_2}_2 . . . A^{i_n}_n = det A,

for a matrix A.
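A sketch of this determinant formula in Python, not from the text: it builds ε for R³ from permutation signs (the helper sign is made up for illustration) and contracts it against the columns of an arbitrary 3 × 3 matrix using numpy's einsum.

import numpy as np
from itertools import permutations

def sign(p):
    # sign of a permutation given as a tuple, by counting inversions
    s = 1
    p = list(p)
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

eps = np.zeros((3, 3, 3))
for p in permutations(range(3)):
    eps[p] = sign(p)                    # epsilon_{ijk}

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])

det = np.einsum('ijk,i,j,k->', eps, A[:, 0], A[:, 1], A[:, 2])
assert np.isclose(det, np.linalg.det(A))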

32.3 Changing Variables<br />

Let's change variables from x to y = F x, with F an invertible matrix. We want to make a corresponding tensor F_* t out of any tensor t, so that sums, scalings, tensor products, and contractions will correspond, and so that on vectors F_* x will be F x. These requirements will determine what F_* t has to be.

Let's start with a covector ξ_i. We need to preserve the contraction ξ(x) on any vector x, so we need to have F_* ξ (F_* x) = ξ(x). Replacing the vector x by some vector y = F x, we get F_* ξ(y) = ξ(F^{-1} y) for any vector y, which identifies F_* ξ as

F_* ξ = ξ F^{-1}.

In indices:

(F_* ξ)_i = ξ_j (F^{-1})^j_i.

In other words, F_* ξ is ξ contracted against F^{-1}. So vectors transform as F_* x = F x (contract with F), and covectors as F_* ξ = ξ F^{-1} (contract with F^{-1}). We can contract any tensor with as many vectors and covectors as needed to form a number; in order to preserve these contractions, the tensor's upper indices must transform like vectors, and its lower indices like covectors, when we carry out F_*. For example,

(F_* t)^{ij}_k = F^i_p F^j_q t^{pq}_r (F^{-1})^r_k.


In other words, we contract one copy of F with each upper index and contract one copy of F^{-1} with each lower index.

For example, let's see the invariance of contraction under F_* in the simplest case of a matrix:

(F_* A)^i_i = F^i_j A^j_k (F^{-1})^k_i
            = A^j_k (F^{-1})^k_i F^i_j
            = A^j_k (F^{-1} F)^k_j
            = A^j_k δ^k_j
            = A^j_j,

since A^j_k δ^k_j has a sum over j and k, but each term vanishes unless j = k, in which case we find A^j_j being added.
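The same invariance can be checked numerically; this sketch is not from the text, assumes numpy, and uses an arbitrary matrix and an (almost surely invertible) random change of variables.

import numpy as np

np.random.seed(3)
A = np.random.randn(3, 3)               # a tensor A^i_j
F = np.random.randn(3, 3)               # an invertible change of variables
Finv = np.linalg.inv(F)

FA = np.einsum('ij,jk,kl->il', F, A, Finv)    # (F_* A)^i_l = F^i_j A^j_k (F^-1)^k_l
assert np.isclose(np.trace(FA), np.trace(A))  # the contraction A^i_i is preserved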

Problem 32.5. Prove that both of the contractions of a tensor t i jk<br />

are preserved<br />

by linear change of variables.<br />

Problem 32.6. Find F_* ε. (Recall the tensor ε from example 32.9 on the previous page.)

Remark 32.10. Note how abstract the subject is: it is rare that we would write<br />

down examples of tensors, with actual numbers in their entries. It is more<br />

common that we think about tensors as abstract algebraic gadgets for storing<br />

multivariate data from physics. Writing down examples, in this rarified air,<br />

would only make the subject more confusing.<br />

32.4 Two Problems<br />

Two problems:<br />

a. How can you tell if two tensors can be made equal by a linear change of<br />

variable?<br />

b. How can a tensor be split into a sum of tensors, in a manner that is<br />

unaltered by linear change of variables?<br />

The first problem has no solution. For a matrix, the answer is Jordan normal<br />

form (theorem 23.2 on page 214). For a quadratic form, the answer is Sylvester’s<br />

law of inertia (theorem 31.13 on page 288). But a general tensor can store an<br />

enormous amount of information, and no one knows how to describe it. We can<br />

give some invariants of a tensor, to help to distinguish tensors. These invariants<br />

are similar to the symmetric functions of the eigenvalues of a matrix.<br />

The second problem is much easier, and we will provide a complete solution.



Tensor Invariants<br />

The contraction of indices is invariant under F ∗ , as is tensor product. So given<br />

a tensor t, we can try to write down some invariant numbers out it by taking<br />

any number of tensor products of that tensor with itself, and any number of<br />

contractions until there are no indices left.<br />

For example, a matrix A i j has among its invariants the numbers<br />

A i i, A i jA j i , Ai jA j k Ak i , . . .<br />

In the notation of earlier chapters, these numbers are<br />

tr A, tr ( A 2) , tr ( A 3) , . . .<br />

i.e. the functions we called p k (A) in example 26.16 on page 248. We already<br />

know from that chapter that every real-valued polynomial invariant of a matrix is<br />

a function of p 1 (A), p 2 (A), . . . , p n (A). More generally, all real-valued polynomial<br />

invariants of a tensor are polynomial functions of those obtained by taking<br />

some number of tensor products followed by some number of contractions. (The<br />

proof is very difficult; see Olver [8] and Procesi [10]).<br />

Problem 32.7. Describe as many invariants of a tensor t ij<br />

kl<br />

as you can.<br />

It is difficult to decide how many of these invariants you need to write down<br />

before you can be sure that you have a complete set, in the sense that every<br />

invariant is a polynomial function of the ones you have written down. General<br />

theorems (again see Olver [8] and Procesi [10]) ensure that eventually you will<br />

produce a finite complete set of invariants.<br />

If a tensor has more lower indices than upper indices, then so does every<br />

tensor product of it with itself any number of times, and so does every contraction.<br />

Therefore there are no polynomial invariants of such tensors. Similarly if<br />

there are more upper indices than lower indices then there are no polynomial<br />

invariants. For example, a vector has no polynomial invariants. For example, a<br />

quadratic form Q ij = Q ji has no upper indices, so has no polynomial invariants.<br />

(This agrees with Sylvester’s law of inertia (theorem 31.13 on page 288), which<br />

tells us that the only invariants of a quadratic form (over the real numbers) are<br />

the integers p and q, which are not polynomials in the Q_ij entries.)

Breaking into Irreducible Components<br />

32.5 Cartesian Tensors<br />

Tensors in engineering and applied physics never have any upper indices. However,<br />

the engineers and applied physicists never carry out any linear changes<br />

of variable except for those described by orthogonal matrices. This practice is<br />

(quite misleadingly) referred to as working with “Cartesian tensors.” The point<br />

to keep in mind is that they are working in R n in a physical context in which<br />

lengths (and distances) are physically measurable. Theorem 20.5 on page 195



tells us that in order to preserve lengths, the only linear changes of variable we<br />

can employ are those given by orthogonal matrices.<br />

If we had a tensor with both upper and lower indices, like t^i_{jk}, we can see that it transforms under a linear change of variables y = F x as

(F_* t)^i_{jk} = F^i_p t^p_{qr} (F^{-1})^q_j (F^{-1})^r_k.

Let's define a new tensor by letting s_{ijk} = t^i_{jk}, just dropping the upper index. If F is orthogonal, i.e. F^{-1} = F^t, then F = (F^{-1})^t, so F^i_p = (F^{-1})^p_i. Therefore

(F_* s)_{ijk} = (F^{-1})^p_i s_{pqr} (F^{-1})^q_j (F^{-1})^r_k
             = F^i_p t^p_{qr} (F^{-1})^q_j (F^{-1})^r_k
             = (F_* t)^i_{jk}.

We summarize this calculation:

Theorem 32.11. Dropping upper indices to become lower indices is an operation<br />

on tensors which is invariant under any orthogonal linear change of<br />

variable.<br />

Note that this trick only works for orthogonal matrices F , i.e. orthogonal<br />

changes of variable.<br />

Problem 32.8. Prove that doubling (i.e. the linear map y = F x = 2x) acts<br />

on a vector x by doubling it, on a covector by scaling by 1 2<br />

. (In general, this<br />

linear map F x = 2x acts on any tensor t with p upper and q lower indices by<br />

scaling by 2 p−q , i.e. F ∗ t = 2 p−q t.) What happens if you first lower the index of<br />

a vector and then apply F ∗ ? What happens if you apply F ∗ and then lower the<br />

index?<br />

Problem 32.9. Prove that contracting two lower indices with one another is<br />

an operation on tensors which is invariant under orthogonal linear change of<br />

variable, but not under rescaling of variables.<br />

Engineers often prefer their approach to tensors: only lower indices, and all<br />

of the usual operations. However, their approach makes rescalings, and other<br />

nonorthogonal transformations (like shears, for example) more difficult. There<br />

is a similar approach in relativistic physics to lower indices: by contracting with<br />

a quadratic form.<br />



33 Tensors<br />

We give a mathematical definition of tensor, and show that it agrees with<br />

the more concrete definition of chapter 32. We will continue to use Einstein’s<br />

summation convention throughout this chapter.<br />

33.1 Multilinear Functions and Multilinear Maps<br />

Definition 33.1. If V 1 , V 2 , . . . , V p are some vector spaces, then a function<br />

t(x 1 , x 2 , . . . , x p ) is multilinear if t(x 1 , x 2 , . . . , x p ) is a linear function of each<br />

vector when all of the other vectors are held constant. (The vector x 1 comes<br />

from the vector space V 1 , etc.)<br />

Similarly a map t taking vectors x 1 in V 1 , x 2 in V 2 , . . . , x p in V p , to a<br />

vector w = t(x 1 , x 2 , . . . , x p ) in a vector space W is called a multilinear map if<br />

t(x 1 , x 2 , . . . , x p ) is a linear map of each vector x i , when all of the other vectors<br />

are held constant.<br />

Example 33.2. The function t(x, y) = xy is multilinear for x and y real numbers,<br />

being linear in x when y is held fixed and linear in y when x is held fixed. Note<br />

that t is not linear as a function of the vector (x, y).

Example 33.3. Any two linear functions ξ : V → R and η : W → R on two<br />

vector spaces determine a multilinear map t(x, y) = ξ(x)η(y) for x from V and<br />

y from W .<br />

33.2 What is a Tensor?<br />


We already have a definition of tensor, but only for tensors “in R n ”. We want<br />

to define some kind of object which we will call a tensor in an abstract vector<br />

space, so that when we pick a basis (identifying the vector space with R n ), the<br />

object becomes a tensor in R n . The clue is that a tensor in R n has lower and<br />

upper indices, which can be contracted against vectors and covectors. For now,<br />

lets only think about tensors with just upper indices. Upper indices contract<br />

against covectors, so we should be able to “plug in covectors,” motivating the<br />

definition:<br />


Definition 33.4. For any finite dimensional vector spaces V 1 , V 2 , . . . , V p , let<br />

V 1 ⊗ V 2 ⊗ · · · ⊗ V p (called the tensor product of the vector spaces V 1 , V 2 , . . . , V p )<br />

be the set of all multilinear maps<br />

t (ξ 1 , ξ 2 , . . . , ξ p ) ,<br />

where ξ 1 is a covector from V 1 ∗ , ξ 2 is a covector from V 2 ∗ , etc. Each such<br />

multilinear map t is called a tensor.<br />

Example 33.5. A tensor t^{ij} in R^n following our old definition (from chapter 32) yields a tensor t(ξ, η) = t^{ij} ξ_i η_j following this new definition. On the other hand, if t(ξ, η) is a tensor in R^n ⊗ R^n, then we can define a tensor following our old definition by letting t^{ij} = t(e^i, e^j), where e^1, e^2, . . . , e^n is the usual dual basis to the standard basis of R^n.

Example 33.6. Let V be a finite dimensional vector space. Recall that there is<br />

a natural isomorphism V → V ∗∗ , given by sending any vector v to the linear<br />

function f v on V ∗ given by f v (ξ) = ξ(v). We will henceforth identify any vector<br />

v with the function f v ; in other words we will from now on use the symbol v<br />

itself instead of writing f v , so that we think of a covector ξ as a linear function<br />

ξ(v) on vectors, and also think of a vector v as a linear function on covectors ξ,<br />

by the bizarre definition v(ξ) = ξ(v). In this way, a vector is the simplest type<br />

of tensor.<br />

Definition 33.7. Let V and W be finite dimensional vector spaces, and take v a<br />

vector in V and w a vector in W . Then write v ⊗ w for the multilinear map<br />

v ⊗ w(ξ, η) = ξ(v)η(w).<br />

So v ⊗ w is a tensor in V ⊗ W , called the tensor product of v and w.<br />

Definition 33.8. If s is a tensor in V 1 ⊗ V 2 ⊗ · · · ⊗ V p , and t is a tensor in<br />

W 1 ⊗ W 2 ⊗ · · · ⊗ W q , then let s ⊗ t, called the tensor product of s and t, be the<br />

tensor in V 1 ⊗ V 2 ⊗ · · · ⊗ V p ⊗ W 1 ⊗ W 2 ⊗ · · · ⊗ W q given by<br />

s ⊗ t (ξ 1 , ξ 2 , . . . , ξ p , η 1 , η 2 , . . . , η q ) = s (ξ 1 , ξ 2 , . . . , ξ p ) t (η 1 , η 2 , . . . , η q ) .<br />

Definition 33.9. Similarly, we can define the tensor product of several tensors.<br />

For example, given finite dimensional vector spaces U, V and W , and vectors<br />

u from U, v from V and w from W , let u ⊗ v ⊗ w mean the multilinear map<br />

u ⊗ v ⊗ w(ξ, η, ζ) = ξ(u)η(v)ζ(w), etc.<br />

Problem 33.1. Prove that<br />

(av) ⊗ w = a (v ⊗ w) = v ⊗ (aw)<br />

(v 1 + v 2 ) ⊗ w = v 1 ⊗ w + v 2 ⊗ w<br />

v ⊗ (w 1 + w 2 ) = v ⊗ w 1 + v ⊗ w 2<br />

for any vectors v, v 1 , v 2 from V and w, w 1 , w 2 from W and any number a.



Problem 33.2. Take U, V and W any finite dimensional vector spaces. Prove<br />

that (u ⊗ v) ⊗ w = u ⊗ (v ⊗ w) = u ⊗ v ⊗ w for any three vectors u from U, v<br />

from V and w from W .<br />

Theorem 33.10. If V and W are two finite dimensional vector spaces, with<br />

bases v 1 , v 2 , . . . , v p and w 1 , w 2 , . . . , w q , then V ⊗ W has as a basis the vectors<br />

v i ⊗ w J for i running over 1, 2, . . . , p and J running over 1, 2, . . . , q.<br />

Proof. Take the dual bases v^1, v^2, . . . , v^p and w^1, w^2, . . . , w^q. Every tensor t from V ⊗ W has the form

t(ξ, η) = t(ξ_i v^i, η_J w^J) = ξ_i η_J t(v^i, w^J),

so let t^{iJ} = t(v^i, w^J) to find

t(ξ, η) = t^{iJ} ξ_i η_J = t^{iJ} v_i ⊗ w_J (ξ, η).

So the v_i ⊗ w_J span. Any linear relation between the v_i ⊗ w_J, just reading these lines from bottom to top, would yield a vanishing multilinear map, so would have to satisfy 0 = t(v^k, w^L) = t^{kL}, forcing all coefficients in the linear relation to vanish.

Remark 33.11. A similar theorem, with a similar proof, holds for any tensor<br />

products: take any finite dimensional vector spaces V 1 , V 2 , . . . , V p , and pick<br />

any basis for V 1 and any basis for V 2 , etc. Then taking one vector from each<br />

basis, and taking the tensor product of these vectors, we obtain a tensor in<br />

V 1 ⊗ V 2 ⊗ · · · ⊗ V p . These tensors, when we throw in all possible choices of basis<br />

vectors for all of those bases, yield a basis for V 1 ⊗ V 2 ⊗ · · · ⊗ V p , called the<br />

tensor product basis.<br />
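In coordinates, the components of a pure tensor with respect to the tensor product basis are just the outer product of the coefficient vectors. A small sketch, not from the text; it assumes numpy, and the vectors x in R³ and y in R² are arbitrary choices. The (i, J) entry of the array is the coefficient of e_i ⊗ e_J.

import numpy as np

x = np.array([1.0, -1.0, 2.0])          # a vector in R^3 (arbitrary choice)
y = np.array([3.0, 5.0])                # a vector in R^2 (arbitrary choice)

components = np.outer(x, y)             # entry (i, J) is the coefficient of e_i ⊗ e_J
print(components)
# x ⊗ y = 3 e_1⊗e_1 + 5 e_1⊗e_2 - 3 e_2⊗e_1 - 5 e_2⊗e_2 + 6 e_3⊗e_1 + 10 e_3⊗e_2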

Problem 33.3. Let V = R³, W = R², and let x = (1, 2, 3), y = (4, 5). What is x ⊗ y in terms of the standard basis vectors e_i ⊗ e_J?

Definition 33.12. Tensors of the form v ⊗ w are called pure tensors.<br />

Problem 33.4. Prove that every tensor in V ⊗ W can be written as a sum of<br />

pure tensors.



Example 33.13. Consider in R 3 the tensor<br />

e 1 ⊗ e 1 + e 2 ⊗ e 2 + e 3 ⊗ e 3 .<br />

This tensor is not pure (which is certainly not obvious just looking at it). Let's see why. Any pure tensor x ⊗ y must be

x ⊗ y = (x^1 e_1 + x^2 e_2 + x^3 e_3) ⊗ (y^1 e_1 + y^2 e_2 + y^3 e_3)
      = x^1 y^1 e_1 ⊗ e_1 + x^2 y^1 e_2 ⊗ e_1 + x^3 y^1 e_3 ⊗ e_1
      + x^1 y^2 e_1 ⊗ e_2 + x^2 y^2 e_2 ⊗ e_2 + x^3 y^2 e_3 ⊗ e_2
      + x^1 y^3 e_1 ⊗ e_3 + x^2 y^3 e_2 ⊗ e_3 + x^3 y^3 e_3 ⊗ e_3.

If we were going to have x ⊗ y = e_1 ⊗ e_1 + e_2 ⊗ e_2 + e_3 ⊗ e_3, we would need x^1 y^1 = 1, x^2 y^2 = 1, x^3 y^3 = 1, but also x^1 y^2 = 0, so x^1 = 0 or y^2 = 0, contradicting x^1 y^1 = x^2 y^2 = 1.

Definition 33.14. The rank of a tensor is the minimum number of pure tensors<br />

that can appear when it is written as a sum of pure tensors.<br />

Definition 33.15. If U, V and W are finite dimensional vector spaces and<br />

b : U ⊕ V → W is a map for which b(u, v) is linear in u for any fixed v and<br />

linear in v for any fixed u, say that b is a bilinear map.<br />

Theorem 33.16 (Universal Mapping Theorem). Every bilinear map b : U ⊕<br />

V → W induces a unique linear map B : U ⊗ V → W , by the rule B (u ⊗ v) =<br />

b(u, v). Sending b to B = T b gives an isomorphism T between the vector space<br />

Z of all bilinear maps f : U ⊕ V → W and the vector space Hom (U ⊗ V, W ).<br />

Problem 33.5. Prove the universal mapping theorem:<br />

33.2 Review Problems<br />

Problem 33.6. Write down an isomorphism V ∗ ⊗ W ∗ = (V ⊗ W ) ∗ , and prove<br />

that it is an isomorphism.<br />

Problem 33.7. If S : V 0 → V 1 and T : W 0 → W 1 are linear maps of finite<br />

dimensional vector spaces, prove that there is a unique linear map, which we<br />

will write as S ⊗ T : V 0 ⊗ W 0 → V 1 ⊗ W 1 , so that<br />

S ⊗ T (v 0 ⊗ w 0 ) = (Sv 0 ) ⊗ (T w 0 ) .<br />

Problem 33.8. What is the rank of e 1 ⊗ e 1 + e 2 ⊗ e 2 + e 3 ⊗ e 3 as a tensor in<br />

R 3 ⊗ R 3 ?<br />

Problem 33.9. Let any tensor v ⊗ w in V ⊗ W eat a covector ξ from V ∗ by the<br />

rule (v ⊗ w) ξ = ξ(v)w. Prove that this makes v ⊗ w into a linear map V ∗ → W .<br />

Prove that this definition extends to a linear map V ⊗ W → Hom (V ∗ , W ).<br />

Prove that the rank of a tensor as defined above is the rank of the associated<br />

linear map V ∗ → W . Use this to find the rank of ∑ i e i ⊗ e i in R n ⊗ R n .



Problem 33.10. Take two vector spaces V and W and define a vector space<br />

V ∗ W to be the collection of all real-valued functions on V ⊕ W which are zero<br />

except at finitely many points. Careful: these functions don’t have to be linear.<br />

Picking any vectors v from V and w from W, let's write the function

f(x, y) = 1 if x = v and y = w, and 0 otherwise,

as v ∗ w. So clearly V ∗ W is a vector space, whose elements are linear<br />

combinations of elements of the form v ∗ w. Let Z be the subspace of V ∗ W<br />

spanned by the vectors<br />

(av) ∗ w − a(v ∗ w),<br />

v ∗ (aw) − a(v ∗ w),<br />

(v 1 + v 2 ) ∗ w − v 1 ∗ w − v 2 ∗ w,<br />

v ∗ (w 1 + w 2 ) − v ∗ w 1 − v ∗ w 2 ,<br />

for any vectors v, v 1 , v 2 from V and w, w 1 , w 2 from W and any number a.<br />

a. Prove that if V and W both have positive and finite dimension, then<br />

V ∗ W and Z are infinite dimensional.<br />

b. Write down a linear map V ∗ W → V ⊗ W .<br />

c. Prove that your linear map has kernel containing Z.<br />

It turns out that (V ∗ W ) /Z is isomorphic to V ⊗ W . We could have defined<br />

V ⊗ W to be (V ∗ W ) /Z, and this definition has many advantages for various<br />

generalizations of tensor products.<br />

Remark 33.17. In the end, what we really care about is that tensors using our<br />

abstract definitions should turn out to have just the properties they had with the<br />

more concrete definition in terms of indices. So even if the abstract definition<br />

is hard to swallow, we will really only need to know that tensors have tensor<br />

products, contractions, sums and scaling, change according to the usual rules<br />

when we linearly change variables, and that when we tensor together bases, we<br />

obtain a basis for the tensor product. This is the spirit behind problem 33.10.<br />

33.3 “Lower Indices”<br />

Fix a finite dimensional vector space V . Consider the tensor product V ⊗ V ∗ .<br />

A basis v 1 , v 2 , . . . , v n for V has a dual basis v 1 , v 2 , . . . , v n for V ∗ , and a tensor<br />

product basis v i ⊗ v j . Every tensor t in V ⊗ V ∗ has the form<br />

t = t i jv i ⊗ v j .<br />

So it is clear where the lower indices come from when we pick a basis: they<br />

come from V ∗ .



Problem 33.11. We saw in chapter 32 that a matrix A is written in indices as<br />

A i j . Describe an isomorphism between Hom (V, V ) and V ⊗ V ∗ .<br />

Lets define contractions.<br />

Theorem 33.18. Let V and W be finite dimensional vector spaces. There is<br />

a unique linear map<br />

V ⊗ V ∗ ⊗ W → W,<br />

called the contraction map, that on pure tensors takes v ⊗ ξ ⊗ w to ξ(v)w.<br />

Remark 33.19. We can generalize this idea in the obvious way to any tensor<br />

product of any finite number of finite dimensional vector spaces: if one of the<br />

vector spaces is the dual of another one, then we can contract. For example, we<br />

can contract V ⊗ W ⊗ V ∗ by a linear map which on pure tensors takes v ⊗ w ⊗ ξ<br />

to ξ(v)w.<br />

Proof. Pick a basis v 1 , v 2 , . . . , v p of V and a basis w 1 , w 2 , . . . , w q of W . Define T on the basis v i ⊗ v j ⊗ w K by
$$T\left(v_i \otimes v^j \otimes w_K\right) = \begin{cases} w_K & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$

By theorem 16.23 on page 167, there is a unique linear map<br />

T : V ⊗ V ∗ ⊗ W → W,<br />

which has these values on these basis vectors. Writing any vector v in V as v = a i v i , any covector ξ in V ∗ as ξ = b i v i , and any vector w in W as w = c J w J , we find
T (v ⊗ ξ ⊗ w) = a i b i c J w J = ξ(v)w.

Therefore there is a linear map T : V ⊗ V ∗ ⊗ W → W , that on pure tensors<br />

takes v ⊗ ξ ⊗ w to ξ(v)w. Any other such map, say S, which agrees with T on<br />

pure tensors, must agree on all linear combinations of pure tensors, so on all<br />

tensors.<br />
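In indices, the contraction map is just a trace over the paired indices: with the notation of the proof,
$$t = t^i{}_j{}^K\, v_i \otimes v^j \otimes w_K \quad \text{contracts to} \quad t^i{}_i{}^K\, w_K.$$
For instance, in $\mathbb{R}^2 \otimes \mathbb{R}^{2*} \otimes W$, writing $e^1, e^2$ for the dual basis to $e_1, e_2$, the tensor $e_1 \otimes e^1 \otimes w + 5\, e_1 \otimes e^2 \otimes w'$ contracts to $e^1(e_1)\, w + 5\, e^2(e_1)\, w' = w$.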

33.4 “Swapping Indices”<br />

We have one more tensor operation to generalize to abstract vector spaces:<br />

when working in indices we can associate to a tensor t i jk the tensor t i kj , i.e.

swap indices. This generalizes in the obvious manner.<br />

Theorem 33.20. Take V and W finite dimensional vector spaces. There is<br />

a unique linear isomorphism V ⊗ W → W ⊗ V , which on pure tensors takes<br />

v ⊗ w to w ⊗ v.



Problem 33.12. Prove theorem 33.20, by imitating the proof of theorem 33.18.<br />

Remark 33.21. In the same fashion, we can make a unique linear isomorphism<br />

reordering the factors in any tensor product of vector spaces. To be more<br />

specific, take any permutation q of the numbers 1, 2, . . . , p. Then (with basically<br />

the same proof) there is a unique linear isomorphism<br />

V 1 ⊗ V 2 ⊗ · · · ⊗ V p → V q(1) ⊗ V q(2) ⊗ · · · ⊗ V q(p)<br />

which takes each pure tensor v 1 ⊗ v 2 ⊗ · · · ⊗ v p to the pure tensor v q(1) ⊗ v q(2) ⊗<br />

· · · ⊗ v q(p) .<br />

33.5 Summary<br />

We have now achieved our goal: we have defined tensors on an abstract finite

dimensional vector space, and defined the operations of addition, scaling, tensor<br />

product, contraction and “index swapping” for tensors on an abstract vector<br />

space.<br />

All there is to know about tensors is that<br />

(1) they are sums of pure tensors v ⊗ w,<br />

(2) the pure tensor v ⊗ w depends linearly on v and linearly on w, and<br />

(3) the universal mapping property.<br />

Another way to think about the universal mapping property is that there are<br />

no identities satisfied by tensors other than those which are forced by (1) and<br />

(2); if there were, then we couldn’t turn a bilinear map which didn’t satisfy that<br />

identity into a linear map on tensors, i.e. we would contradict the universal<br />

mapping property. Roughly speaking, there is nothing else that you could know<br />

about tensors besides (1) and (2) and the fact that there is nothing else to<br />

know.<br />

33.6 Cartesian Tensors<br />

If V is a finite dimensional inner product space, then we can ask how to “lower<br />

indices” in this abstract setting. Given a single vector v from V , we can “lower<br />

its index” by turning v into a covector. We do this by constructing the covector

ξ(x) = 〈v, x〉. One often sees this covector ξ written as v ∗ or some such notation,<br />

and usually called the dual to v.<br />

Problem 33.13. Prove that the map ∗ : V → V ∗ given by taking v to v ∗ is an<br />

isomorphism of finite dimensional vector spaces.<br />

Careful: the covector v ∗ depends on the choice not only of the vector v, but<br />

also of the inner product.<br />

Problem 33.14. In the usual inner product in R n , what is the map ∗ ?<br />

Problem 33.15. Let 〈x, y〉 0 be the usual inner product on R n , and define a new inner product by the rule 〈x, y〉 1 = 2 〈x, y〉 0 . Calculate the map which gives the dual covector in the new inner product.



The inverse to ∗ is usually also written ∗ , and we write the vector dual<br />

to a covector ξ as ξ ∗ . Naturally, we can define an inner product on V ∗ by<br />

〈ξ, η〉 = 〈ξ ∗ , η ∗ 〉.<br />

Problem 33.16. Prove that this defines an inner product on V ∗ .<br />

If V and W are finite dimensional inner product spaces, we then define an<br />

inner product on V ⊗ W by setting<br />

〈v 1 ⊗ w 1 , v 2 ⊗ w 2 〉 = 〈v 1 , v 2 〉 〈w 1 , w 2 〉 .<br />

This expression only determines the inner product on pure tensors, but since<br />

the inner product is required to be bilinear and every tensor is a sum of pure<br />

tensors, we only need to know the inner product on pure tensors.<br />
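For example, in R 2 ⊗ R 2 with the usual inner product on each factor, 〈e 1 ⊗ e 2 , e 1 ⊗ e 2 〉 = 〈e 1 , e 1 〉 〈e 2 , e 2 〉 = 1, while 〈e 1 ⊗ e 2 , e 2 ⊗ e 1 〉 = 〈e 1 , e 2 〉 〈e 2 , e 1 〉 = 0: the four tensors e i ⊗ e j form an orthonormal basis.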

Problem 33.17. Prove that this defines an inner product on V ⊗ W .<br />

Let's write V ⊗2 to mean V ⊗ V , etc. We refer to tensors in a vector space V

to mean elements of V ⊗p ⊗ V ∗⊗q for some positive integers p and q, i.e. tensor<br />

products of vectors and covectors. The elements of V ⊗p are called covariant<br />

tensors: they are sums of tensor products of vectors. The elements of V ∗⊗p are<br />

called contravariant tensors: they are sums of tensor products of covectors.<br />

Problem 33.18. Prove that an inner product yields a unique linear isomorphism
∗ : V ⊗p ⊗ V ∗⊗q → V ⊗(p+q)
so that
(v 1 ⊗ v 2 ⊗ · · · ⊗ v p ⊗ ξ 1 ⊗ ξ 2 ⊗ · · · ⊗ ξ q ) ∗ = v 1 ⊗ v 2 ⊗ · · · ⊗ v p ⊗ ξ 1 ∗ ⊗ ξ 2 ∗ ⊗ · · · ⊗ ξ q ∗ .

This isomorphism “raises indices.” Similarly, we can define a map to “lower<br />

indices.”<br />

33.7 Polarization<br />

We will generalize the isomorphism between symmetric bilinear forms and<br />

quadratic forms to an isomorphism between symmetric tensors and polynomials.<br />

Definition 33.22. Let V be a finite dimensional vector space. If t is a tensor<br />

in V ∗⊗p , i.e. a multilinear function t (v 1 , v 2 , . . . , v p ) depending on p vectors<br />

v 1 , v 2 , . . . , v p from V , then we can define the polarization of t to be the function<br />

(also written traditionally with the same letter t) t(v) = t (v, v, . . . , v).<br />

Example 33.23. If t is a covector, so a linear function t(v) of a single vector v,<br />

then the polarization is the same linear function.<br />

Example 33.24. In R 2 , if t = e 1 ⊗ e 2 , then the polarization is t (x) = x 1 x 2 for
$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$
in R 2 .



Example 33.25. The antisymmetric tensor t = e 1 ⊗ e 2 − e 2 ⊗ e 1 in R 2 has<br />

polarization t(x) = x 1 x 2 − x 2 x 1 = 0, vanishing.<br />

Definition 33.26. A function f : V → R on a finite dimensional vector space<br />

is called a polynomial if there is a linear isomorphism F : R n → V for which<br />

f(F (x)) is a polynomial in the usual sense.<br />

Clearly the choice of linear isomorphism F is irrelevant. A different choice<br />

would only alter the linear functions by linear combinations of one another, and<br />

therefore would alter the polynomial functions by substituting linear combinations<br />

of new variables in place of old variables. In particular, the degree of a<br />

polynomial function is well defined. A polynomial function f : V → R is called<br />

homogeneous of degree d if f(λx) = λ d f(x) for any vector x in V and number<br />

λ. Clearly every polynomial function splits into a unique sum of homogeneous<br />

polynomial functions.<br />
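For example, on $V = \mathbb{R}^2$ the polynomial function
$$f(x) = 3 + x_1 + x_1 x_2 + x_2^2$$
splits into the homogeneous pieces $3$ (degree 0), $x_1$ (degree 1), and $x_1 x_2 + x_2^2$ (degree 2), and each piece $f_d$ satisfies $f_d(\lambda x) = \lambda^d f_d(x)$.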

There are two natural notions of multiplying symmetric tensors, which simply differ by a factor. The first is
$$(s \circ t)\left(x_1, x_2, \ldots, x_{a+b}\right) = \sum_p s\left(x_{p(1)}, x_{p(2)}, \ldots, x_{p(a)}\right)\, t\left(x_{p(a+1)}, x_{p(a+2)}, \ldots, x_{p(a+b)}\right),$$
when s has a lower indices and t has b, and the sum is over all permutations p of the numbers 1, 2, . . . , a + b. The second is
$$(st)\left(x_1, x_2, \ldots, x_{a+b}\right) = \frac{1}{(a+b)!}\, (s \circ t)\left(x_1, x_2, \ldots, x_{a+b}\right).$$
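For example, take two covectors ξ and η (so a = b = 1). Then
$$(\xi \circ \eta)(x_1, x_2) = \xi(x_1)\,\eta(x_2) + \xi(x_2)\,\eta(x_1), \qquad (\xi\eta)(x_1, x_2) = \tfrac{1}{2}\left(\xi(x_1)\,\eta(x_2) + \xi(x_2)\,\eta(x_1)\right),$$
and the polarization of $\xi\eta$ is $(\xi\eta)(x, x) = \xi(x)\,\eta(x)$, the product of the polarizations of ξ and η.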

Theorem 33.27. Polarization is a linear isomorphism taking symmetric

contravariant tensors to polynomials, preserving degree, and taking products to<br />

products (using the second multiplication above).<br />

Proof. ???


34 Exterior Forms<br />

This chapter develops the definition and basic properties of exterior forms.<br />

34.1 Why Forms?<br />

This section is just for motivation; the ideas of this section will not be used<br />

subsequently.<br />

Antisymmetric contravariant tensors are also called exterior forms. In terms<br />

of indices, they have no upper indices, and they are skew-symmetric in their<br />

lower indices. The reason for the importance of exterior forms is quite deep,<br />

coming from physics. Consider fluid passing into and out of a region in space.<br />

We would like to measure how rapidly some quantity (for instance heat) flows<br />

into that region. We do this by integrating the flux of the quantity across the<br />

boundary: counting how much is going in, and subtracting off how much is<br />

going out. This flux is an integral along the boundary surface. But if we change<br />

our mind about which part is the inside and which is the outside, the sign of<br />

the integral has to change. So this type of integral (called a surface integral)<br />

has to be sensitive to the choice of inside and outside, called the orientation of<br />

the surface. This sign sensitivity has to be built into the integral. We are used<br />

to integrating functions, but they don’t change sign when we change orientation<br />

the way a flux integral should. For example, the area of a surface is not a<br />

flux integral, because it doesn’t depend on orientation. Lets play around with<br />

some rough ideas to get a sense of how exterior forms provide just the right<br />

sign changes for flux integrals: we can integrate exterior forms. For precise<br />

definitions and proofs, Spivak [13] is an excellent introduction.<br />

Let's imagine some kind of object α that we can integrate over any surface S (as long as S is reasonably smooth, except maybe for a few sharp edges: let's not make that precise since we are only playing around). Suppose that the integral ∫ S α is a number. Of course, S must be a surface with a choice of orientation. Moreover, if we write −S for the same surface with the opposite orientation, then ∫ −S α = − ∫ S α. Suppose that ∫ S α varies smoothly as we smoothly deform S. Moreover, suppose that if we cut a surface S into two surfaces S 1 and S 2 (so S = S 1 ∪ S 2 ), then ∫ S α = ∫ S 1 α + ∫ S 2 α: the integral is a sum of locally measurable quantities.
Fix a point, which we can translate to become the origin, and scale up the picture so that eventually, for S a little piece of surface, ∫ S α is very nearly invariant under small translations of S. (We can do this because the integral



varies smoothly, so after rescaling the picture the integral hardly varies at all.)<br />

Just for simplicity, let's assume that ∫ S α is unchanged when we translate the surface S. Any two opposite sides of a box are translations of one another, but with opposite orientations. So they must have opposite signs for ∫ α. Therefore any small box has as much of our quantity entering as leaving. Approximating any region with small boxes, we must get total flux ∫ S α = 0 when S is the boundary of the region.

Pick two linearly independent vectors u and v, and let P be the parallelogram<br />

at the origin with sides u and v. Pick any vector w perpendicular to u and v<br />

and with
$$\det \begin{pmatrix} u & v & w \end{pmatrix} > 0.$$
Orient P so that the outside of P is the side in the direction of w. If we swap u and v, then we change the orientation of the parallelogram P . Let's write α(u, v) for ∫ P α. Slicing the parallelogram into 3 equal pieces, say into 3 parallelograms with sides u/3, v, we see that α(u/3, v) = α(u, v)/3. In the

same way, we can see that α(λu, v) = λ α(u, v) for λ any positive rational<br />

number (dilate by the numerator, and cut into a number of pieces given by the<br />

denominator). Because α(u, v) is a smooth function of u and v, we see that<br />

α(λu, v) = λ α(u, v) for λ > 0. Similarly, α(0, v) = 0 since the parallelogram is<br />

flattened into a line. Moreover, α(−u, v) = −α(u, v), since the parallelogram of<br />

−u, v is the parallelogram of u, v reflected, reversing its orientation. So α(u, v)<br />

scales in u and v. By reversing orientation, α(v, u) = −α(u, v). A shear applied<br />

to the parallelogram will preserve the area, and after the shear we can cut and<br />

paste the parallelogram as in chapter 19. The integral must be preserved, by<br />

translation invariance, so α(u + v, v) = α(u, v).<br />

The hard part is to see why α is linear as a function of u. This comes from<br />

the drawing
[figure: a solid region built on three vectors u, v, w, with edges labelled u, v, w, u + v, and v + w]

The integral over the boundary of this region must vanish. If we pick three<br />

vectors u, v and w, which we draw as the standard basis vectors, then the region<br />

has boundary given by various parallelograms and triangles (each triangle being<br />

half a parallelogram), and the vanishing of the integral gives<br />

$$0 = \alpha(u, v) + \alpha(v, w) + \tfrac{1}{2}\,\alpha(w, u) - \tfrac{1}{2}\,\alpha(w, u) - \alpha(u + w, v).$$

Therefore α(u, v) + α(v, w) = α(u + w, v), so that finally α is a tensor. If you<br />

like indices, you can write α as α ij with α ji = −α ij .<br />

Our argument is only slightly altered if we keep in mind that the integral ∫ S α should not really be exactly translation invariant, but only vary slightly



with small translations, and that the integral around the boundary of a small<br />

region should be small. We can still carry out the same argument, but throwing<br />

in error terms proportional to the area of surface and extent of translation, or<br />

to the volume of a region. We end up with α being an exterior form whose<br />

coefficients are functions.<br />

If we imagine a flow contained inside a surface, we can similarly measure<br />

flux across a curve. We also need to be sensitive to orientation: which side of<br />

the boundary of a surface is the inside of the surface. Again the correct object<br />

to work with in order to have the correct sign sensitivity is an exterior form<br />

(whose coefficients are functions, not just numbers). Similar remarks hold in<br />

any number of dimensions. So exterior forms play a vital role because they are<br />

the objects we integrate. We can easily change variables when we integrate<br />

exterior forms.<br />

34.2 Definition<br />

Definition 34.1. A tensor t in V ∗⊗p is called a p-form if it is antisymmetric, i.e. t (v 1 , v 2 , . . . , v p ) is antisymmetric as a function of the vectors v 1 , v 2 , . . . , v p : for any permutation q,
$$t\left(v_{q(1)}, v_{q(2)}, \ldots, v_{q(p)}\right) = (-1)^N\, t\left(v_1, v_2, \ldots, v_p\right),$$
where (−1) N is the sign of the permutation q.

Example 34.2. The form in R n
$$\varepsilon\left(v_1, v_2, \ldots, v_n\right) = \det \begin{pmatrix} v_1 & v_2 & \ldots & v_n \end{pmatrix}$$
is called the volume form of R n (because of its interpretation as an integrand: ∫ R ε is the volume of any region R).

Example 34.3. A covector ξ in V ∗ is a 1-form, because there are no permutations<br />

you can carry out on ξ(v).<br />

Example 34.4. In R n we traditionally write points as
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},$$
and write dx 1 for the covector given by the rule dx 1 (y) = y 1 for any vector y in R n . Then α = dx 1 ⊗ dx 2 − dx 2 ⊗ dx 1 is a 2-form:
α(u, v) = u 1 v 2 − u 2 v 1 .
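When n = 2, this α is just the volume form ε of example 34.2:
$$\alpha(u, v) = u_1 v_2 - u_2 v_1 = \det \begin{pmatrix} u_1 & v_1 \\ u_2 & v_2 \end{pmatrix} = \det \begin{pmatrix} u & v \end{pmatrix}.$$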



Problem 34.1. If t is a 3-form, prove that
a. t(x, y, y) = 0 and
b. t(x, y + 3x, z) = t(x, y, z)
for any vectors x, y, z.


A Hints

1 Solving Linear Equations

1.1.<br />

⎛<br />

⎞<br />

0 0 1 1<br />

0 0 1 1<br />

⎜<br />

⎟<br />

⎝ 1 0 3 0⎠<br />

1 1 1 1<br />

Swap rows 1 and 3.<br />

⎛<br />

⎞<br />

1 0 3 0<br />

0 0 1 1<br />

⎜<br />

⎟<br />

⎝ 0 0 1 1⎠<br />

1 1 1 1<br />

Add −(row 1) to row 4.<br />

⎛<br />

⎞<br />

1 0 3 0<br />

0 0 1 1<br />

⎜<br />

⎟<br />

⎝ 0 0 1 1⎠<br />

0 1 −2 1<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

1 0 3 0<br />

0 0 1 1<br />

⎜<br />

⎟<br />

⎝ 0 0 1 1⎠<br />

0 1 −2 1<br />




Swap rows 2 and 4.<br />

Move the pivot ↘ .<br />

Add −(row 3) to row 4.<br />

Move the pivot ↘ .<br />

Move the pivot →.<br />

⎛<br />

⎞<br />

1 0 3 0<br />

0 1 −2 1<br />

⎜<br />

⎟<br />

⎝ 0 0 1 1⎠<br />

0 0 1 1<br />

⎛<br />

⎞<br />

1 0 3 0<br />

0 1 −2 1<br />

⎜<br />

⎟<br />

⎝ 0 0 1 1⎠<br />

0 0 1 1<br />

⎛<br />

⎞<br />

1 0 3 0<br />

0 1 −2 1<br />

⎜<br />

⎟<br />

⎝ 0 0 1 1⎠<br />

0 0 0 0<br />

⎛<br />

⎜<br />

⎝<br />

1 0 3 0<br />

0 1 −2 1<br />

0 0 1 1<br />

0 0 0 0<br />

⎛<br />

⎞<br />

1 0 3 0<br />

0 1 −2 1<br />

⎜<br />

⎟<br />

⎝ 0 0 1 1⎠<br />

0 0 0 0<br />

⎞<br />

⎟<br />

⎠<br />

1.2.<br />

⎛<br />

⎞<br />

1 1 0 1<br />

0 1 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 0 ⎠<br />

0 0 0 −1



Move the pivot ↘ .<br />

Move the pivot ↘ .<br />

Move the pivot →.<br />

Swap rows 3 and 4.<br />

⎛<br />

⎞<br />

1 1 0 1<br />

0 1 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 0 ⎠<br />

0 0 0 −1<br />

⎛<br />

⎞<br />

1 1 0 1<br />

0 1 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 0 ⎠<br />

0 0 0 −1<br />

⎛<br />

⎞<br />

1 1 0 1<br />

0 1 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 0 ⎠<br />

0 0 0 −1<br />

⎛<br />

⎞<br />

1 1 0 1<br />

0 1 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 −1 ⎠<br />

0 0 0 0<br />

1.3.
x 1 = 7 − x 4
x 2 = 3 − x 4 /2
x 3 = −3 + x 4 /2.

1.4. Forward eliminate:
$$\begin{pmatrix} 0 & 2 & 1 & 1 \\ 4 & -1 & 1 & 2 \\ 4 & 3 & 3 & 4 \end{pmatrix}$$



Swap rows 1 and 2.<br />

Add −(row 1) to row 3.<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

4 −1 1 2<br />

⎜<br />

⎟<br />

⎝ 0 2 1 1⎠<br />

4 3 3 4<br />

⎛<br />

⎞<br />

4 −1 1 2<br />

⎜<br />

⎟<br />

⎝ 0 2 1 1⎠<br />

0 4 2 2<br />

⎛<br />

⎞<br />

4 −1 1 2<br />

⎜<br />

⎟<br />

⎝ 0 2 1 1⎠<br />

0 4 2 2<br />

Add −2(row 2) to row 3.<br />

⎛<br />

⎞<br />

4 −1 1 2<br />

⎜<br />

⎟<br />

⎝ 0 2 1 1⎠<br />

0 0 0 0<br />

Move the pivot ↘ .<br />

Move the pivot →.<br />

Move the pivot →.<br />

⎛<br />

⎞<br />

4 −1 1 2<br />

⎜<br />

⎟<br />

⎝ 0 2 1 1⎠<br />

0 0 0 0<br />

⎛<br />

⎜<br />

⎝<br />

4 −1 1 2<br />

0 2 1 1<br />

0 0 0 0<br />

⎞<br />

⎟<br />

⎠<br />

⎛<br />

⎞<br />

4 −1 1 2<br />

⎜<br />

⎟<br />

⎝ 0 2 1 1⎠<br />

0 0 0 0



Back substitute:<br />

Scale row 2 by 1 2 .<br />

⎛<br />

⎞<br />

4 −1 1 2<br />

⎜<br />

1 1⎟<br />

⎝ 0 1<br />

2 2⎠<br />

0 0 0 0<br />

Add row 2 to row 1.<br />

⎛<br />

⎜<br />

⎝<br />

4 0<br />

3<br />

2<br />

5<br />

2<br />

1<br />

2<br />

1<br />

0 1<br />

2<br />

0 0 0 0<br />

⎞<br />

⎟<br />

⎠<br />

Scale row 1 by 1 4 .<br />

⎛<br />

⎜<br />

⎝<br />

1 0<br />

3<br />

8<br />

5<br />

8<br />

1<br />

2<br />

1<br />

0 1<br />

2<br />

0 0 0 0<br />

⎞<br />

⎟<br />

⎠<br />

x 1 = −3/8 x 3 + 5/8<br />

x 2 = −1/2 x 3 + 1/2<br />

1.5. ⎛<br />

⎞<br />

⎜<br />

2 0 2 ⎟<br />

⎝0 2 2 ⎠<br />

0 0 −1<br />

1.6. ⎛<br />

⎞<br />

−1 −1 1<br />

⎜<br />

⎟<br />

⎝ 0 2 −1⎠<br />

0 0 2<br />

1.9.<br />

⎛<br />

⎞<br />

0 1 1 0 0<br />

0 1 0 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 0 0⎠<br />

1 1 0 1 1



Swap rows 1 and 4.<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

1 1 0 1 1<br />

0 1 0 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 0 0⎠<br />

0 1 1 0 0<br />

⎛<br />

⎞<br />

1 1 0 1 1<br />

0 1 0 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 0 0⎠<br />

0 1 1 0 0<br />

Add −(row 2) to row 4.<br />

⎛<br />

⎞<br />

1 1 0 1 1<br />

0 1 0 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 0 0⎠<br />

0 0 1 −1 0<br />

Move the pivot ↘ .<br />

Swap rows 3 and 4.<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

1 1 0 1 1<br />

0 1 0 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 0 0⎠<br />

0 0 1 −1 0<br />

⎛<br />

⎞<br />

1 1 0 1 1<br />

0 1 0 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 1 −1 0⎠<br />

0 0 0 0 0<br />

⎛<br />

⎞<br />

1 1 0 1 1<br />

0 1 0 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 1 −1 0⎠<br />

0 0 0 0 0



Move the pivot →.<br />

Move the pivot →.<br />

⎛<br />

⎞<br />

1 1 0 1 1<br />

0 1 0 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 1 −1 0 ⎠<br />

0 0 0 0 0<br />

⎛<br />

⎞<br />

1 1 0 1 1<br />

0 1 0 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 1 −1 0⎠<br />

0 0 0 0 0<br />

1.10.<br />

⎛<br />

⎞<br />

1 3 2 6<br />

⎜<br />

⎟<br />

⎝ 2 5 4 1⎠<br />

3 8 6 7<br />

Add −2(row 1) to row 2, −3(row 1) to row 3.<br />

⎛<br />

⎞<br />

1 3 2 6<br />

⎜<br />

⎟<br />

⎝ 0 −1 0 −11⎠<br />

0 −1 0 −11<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

1 3 2 6<br />

⎜<br />

⎟<br />

⎝ 0 −1 0 −11⎠<br />

0 −1 0 −11<br />

Add −(row 2) to row 3.<br />

⎛<br />

⎞<br />

1 3 2 6<br />

⎜<br />

⎟<br />

⎝ 0 −1 0 −11⎠<br />

0 0 0 0<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

1 3 2 6<br />

⎜<br />

⎟<br />

⎝ 0 −1 0 −11⎠<br />

0 0 0 0



Move the pivot →.<br />

Move the pivot →.<br />

⎛<br />

⎞<br />

1 3 2 6<br />

⎜<br />

⎟<br />

⎝ 0 −1 0 −11⎠<br />

0 0 0 0<br />

⎛<br />

⎞<br />

1 3 2 6<br />

⎜<br />

⎟<br />

⎝ 0 −1 0 −11⎠<br />

0 0 0 0<br />

1.11.<br />

Scale row 3 by −1.<br />

Add −(row 3) to row 1.<br />

Add −(row 2) to row 1.<br />

⎛<br />

⎞<br />

1 1 0 1<br />

0 1 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 1 ⎠<br />

0 0 0 0<br />

⎛<br />

⎞<br />

1 1 0 0<br />

0 1 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 1 ⎠<br />

0 0 0 0<br />

⎛<br />

⎞<br />

1 0 −1 0<br />

0 1 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 1 ⎠<br />

0 0 0 0<br />

1.12.<br />

Scale row 2 by − 1 2 .<br />

⎛<br />

⎞<br />

−1 1 0<br />

⎜<br />

⎟<br />

⎝ 0 1 0 ⎠<br />

0 0 1



Add −(row 2) to row 1.<br />

⎛<br />

⎞<br />

−1 0 0<br />

⎜<br />

⎟<br />

⎝ 0 1 0 ⎠<br />

0 0 1<br />

Scale row 1 by −1.<br />

I<br />

1.13.<br />

Scale row 2 by −1.<br />

⎛<br />

⎞<br />

1 0 −1<br />

⎜<br />

⎟<br />

⎝ 0 1 1 ⎠<br />

0 0 0<br />

1.14. ⎛<br />

⎞<br />

1 0 − 1 3<br />

⎜<br />

⎝0 1 − 1 ⎟<br />

3⎠<br />

0 0 0<br />

1.15. I<br />

1.16. Forward eliminate:<br />

⎛<br />

⎞<br />

−1 2 1 1 1<br />

−1 2 2 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 1 2 0⎠<br />

0 0 0 1 2<br />

Add −(row 1) to row 2.<br />

⎛<br />

⎞<br />

−1 2 1 1 1<br />

0 0 1 0 −1<br />

⎜<br />

⎟<br />

⎝ 0 0 1 2 0 ⎠<br />

0 0 0 1 2<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

−1 2 1 1 1<br />

0 0 1 0 −1<br />

⎜<br />

⎟<br />

⎝ 0 0 1 2 0 ⎠<br />

0 0 0 1 2



Move the pivot →.<br />

⎛<br />

⎞<br />

−1 2 1 1 1<br />

0 0 1 0 −1<br />

⎜<br />

⎟<br />

⎝ 0 0 1 2 0 ⎠<br />

0 0 0 1 2<br />

Add −(row 2) to row 3.<br />

⎛<br />

⎞<br />

−1 2 1 1 1<br />

0 0 1 0 −1<br />

⎜<br />

⎟<br />

⎝ 0 0 0 2 1 ⎠<br />

0 0 0 1 2<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

−1 2 1 1 1<br />

0 0 1 0 −1<br />

⎜<br />

⎟<br />

⎝ 0 0 0 2 1 ⎠<br />

0 0 0 1 2<br />

Add − 1 2<br />

(row 3) to row 4.<br />

⎛<br />

⎞<br />

−1 2 1 1 1<br />

0 0 1 0 −1<br />

⎜<br />

⎟<br />

⎝ 0 0 0 2 1 ⎠<br />

3<br />

0 0 0 0<br />

2<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

−1 2 1 1 1<br />

0 0 1 0 −1<br />

⎜<br />

0 0 0 2 1<br />

⎟<br />

⎝<br />

3 ⎠<br />

0 0 0 0<br />

2<br />

Back substitute:<br />

Scale row 4 by 2 3 .<br />

⎛<br />

⎞<br />

−1 2 1 1 1<br />

0 0 1 0 −1<br />

⎜<br />

⎟<br />

⎝ 0 0 0 2 1 ⎠<br />

0 0 0 0 1



Add −(row 4) to row 1, row 4 to row 2, −(row 4) to row 3.<br />

⎛<br />

⎞<br />

−1 2 1 1 0<br />

0 0 1 0 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 2 0 ⎠<br />

0 0 0 0 1<br />

Scale row 3 by 1 2 .<br />

⎛<br />

⎞<br />

−1 2 1 1 0<br />

0 0 1 0 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 1 0 ⎠<br />

0 0 0 0 1<br />

Add −(row 3) to row 1.<br />

Add −(row 2) to row 1.<br />

⎛<br />

⎞<br />

−1 2 1 0 0<br />

0 0 1 0 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 1 0 ⎠<br />

0 0 0 0 1<br />

⎛<br />

⎞<br />

−1 2 0 0 0<br />

0 0 1 0 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 1 0 ⎠<br />

0 0 0 0 1<br />

Scale row 1 by −1.<br />

⎛<br />

⎞<br />

1 −2 0 0 0<br />

0 0 1 0 0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 1 0 ⎠<br />

0 0 0 0 1<br />

There are no solutions.<br />

1.17. Forward eliminate:<br />

⎛<br />

⎜<br />

⎝<br />

⎞<br />

1 2 3 4 5<br />

⎟<br />

2 5 7 11 12⎠<br />

0 1 1 4 3



Add −2(row 1) to row 2.<br />

⎛<br />

⎜<br />

⎝<br />

⎞<br />

1 2 3 4 5<br />

⎟<br />

0 1 1 3 2⎠<br />

0 1 1 4 3<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

1 2 3 4 5<br />

⎜<br />

⎟<br />

⎝ 0 1 1 3 2⎠<br />

0 1 1 4 3<br />

Add −(row 2) to row 3.<br />

⎛<br />

⎞<br />

1 2 3 4 5<br />

⎜<br />

⎝ 0 1 1 3<br />

⎟<br />

2⎠<br />

0 0 0 1 1<br />

Move the pivot ↘ .<br />

Move the pivot →.<br />

⎛<br />

⎞<br />

1 2 3 4 5<br />

⎜<br />

⎟<br />

⎝ 0 1 1 3 2⎠<br />

0 0 0 1 1<br />

⎛<br />

⎞<br />

1 2 3 4 5<br />

⎜<br />

⎟<br />

⎝ 0 1 1 3 2⎠<br />

0 0 0 1 1<br />

Back substitute:<br />

Add −4(row 3) to row 1, −3(row 3) to row 2.<br />

⎛<br />

⎞<br />

1 2 3 0 1<br />

⎜<br />

⎟<br />

⎝ 0 1 1 0 −1⎠<br />

0 0 0 1 1<br />

Add −2(row 2) to row 1.<br />

⎛<br />

⎞<br />

1 0 1 0 3<br />

⎜<br />

⎟<br />

⎝ 0 1 1 0 −1⎠<br />

0 0 0 1 1



x 1 = −x 3 + 3<br />

x 2 = −x 3 − 1<br />

x 4 = 1<br />

1.18. Forward eliminate:<br />

⎛<br />

⎞<br />

−2 1 1 1 0<br />

1 −2 1 1 0<br />

⎜<br />

⎟<br />

⎝ 1 1 −2 1 0⎠<br />

1 1 1 −2 0<br />

Add 1 2 (row 1) to row 2, 1 2 (row 1) to row 3, 1 2<br />

(row 1) to row 4.<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

−2 1 1 1 0<br />

0 − 3 3 3<br />

⎜<br />

2 2 2<br />

0<br />

⎟<br />

⎝<br />

3 ⎠<br />

3<br />

0<br />

2<br />

− 3 2<br />

3<br />

0<br />

2<br />

2<br />

0<br />

3<br />

2<br />

− 3 2<br />

0<br />

⎛<br />

⎞<br />

−2 1 1 1 0<br />

0 − 3 3 3<br />

2<br />

2 2<br />

0<br />

⎜<br />

⎟<br />

⎝<br />

3 ⎠<br />

3<br />

0<br />

2<br />

− 3 2<br />

3 3<br />

0<br />

2<br />

Add row 2 to row 3, row 2 to row 4.<br />

Move the pivot ↘ .<br />

2<br />

0<br />

2<br />

− 3 2<br />

0<br />

⎛<br />

⎞<br />

−2 1 1 1 0<br />

0 − 3 3 3<br />

2<br />

2 2<br />

0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 3 0⎠<br />

0 0 3 0 0<br />

⎛<br />

⎞<br />

−2 1 1 1 0<br />

0 − 3 3 3<br />

2<br />

2 2<br />

0<br />

⎜<br />

⎟<br />

⎝ 0 0 0 3 0⎠<br />

0 0 3 0 0



Swap rows 3 and 4.<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

−2 1 1 1 0<br />

0 − 3 3 3<br />

2<br />

2 2<br />

0<br />

⎜<br />

⎟<br />

⎝ 0 0 3 0 0⎠<br />

0 0 0 3 0<br />

⎛<br />

⎞<br />

−2 1 1 1 0<br />

0 − 3 3 3<br />

2<br />

2 2<br />

0<br />

⎜<br />

⎟<br />

⎝ 0 0 3 0 0⎠<br />

0 0 0 3 0<br />

Back substitute:<br />

Scale row 4 by 1 3 .<br />

⎛<br />

⎞<br />

−2 1 1 1 0<br />

0 − 3 3 3<br />

2<br />

2 2<br />

0<br />

⎜<br />

⎟<br />

⎝ 0 0 3 0 0⎠<br />

0 0 0 1 0<br />

Add −(row 4) to row 1, − 3 2<br />

(row 4) to row 2.<br />

⎛<br />

⎞<br />

−2 1 1 0 0<br />

0 − 3 3<br />

2<br />

2<br />

0 0<br />

⎜<br />

⎟<br />

⎝ 0 0 3 0 0⎠<br />

0 0 0 1 0<br />

Scale row 3 by 1 3 .<br />

⎛<br />

⎞<br />

−2 1 1 0 0<br />

0 − 3 3<br />

2<br />

2<br />

0 0<br />

⎜<br />

⎟<br />

⎝ 0 0 1 0 0⎠<br />

0 0 0 1 0



Add −(row 3) to row 1, − 3 2<br />

(row 3) to row 2.<br />

⎛<br />

⎞<br />

−2 1 0 0 0<br />

0 − 3 0 0 0<br />

2<br />

⎜<br />

⎟<br />

⎝ 0 0 1 0 0⎠<br />

0 0 0 1 0<br />

Scale row 2 by − 2 3 . ⎛<br />

⎞<br />

−2 1 0 0 0<br />

0 1 0 0 0<br />

⎜<br />

⎟<br />

⎝ 0 0 1 0 0⎠<br />

0 0 0 1 0<br />

Add −(row 2) to row 1.<br />

⎛<br />

⎞<br />

−2 0 0 0 0<br />

0 1 0 0 0<br />

⎜<br />

⎟<br />

⎝ 0 0 1 0 0⎠<br />

0 0 0 1 0<br />

Scale row 1 by − 1 2 . ⎛<br />

⎞<br />

1 0 0 0 0<br />

0 1 0 0 0<br />

⎜<br />

⎟<br />

⎝ 0 0 1 0 0⎠<br />

0 0 0 1 0<br />

x 1 = 0<br />

x 2 = 0<br />

x 3 = 0<br />

x 4 = 0<br />

1.20. You could try:<br />

a. One solution: x 1 = 0.<br />

b. No solutions: x 1 = 1, x 2 = 1, x 1 + x 2 = 0.<br />

c. Infinitely many solutions: x 1 + x 2 = 0.



1.22.<br />

1.27. (a)=(2), (b)=(4), (c)=(1), (d)=(5), (e)=(3)<br />

2 Matrices<br />

2.2.<br />

(a) All coordinates of each vertex are ±1.<br />

(b) The vertices of a regular octahedron lie in the centers of the faces of a cube.<br />

(c) Try an equilateral triangle in the plane first. This should lead you to the<br />

points<br />

⎛ ⎞<br />

±1<br />

⎜ ⎟<br />

⎝±1⎠<br />

±1<br />

with an even number of minus signs.<br />

2.3.<br />

⎛<br />

∗<br />

∗<br />

⎜<br />

⎝<br />

∗<br />

∗<br />

⎞<br />

⎟<br />

⎠<br />

2.4.<br />

( ) ( )<br />

1 0 0<br />

A =<br />

, B = .<br />

1 0 1<br />

2.9. Any matrix full of zeros with at least two columns.<br />

2.13. 1 · 8 + (−1) · 1 + 2 · 3 = 13<br />

2.14.<br />

(<br />

14 20<br />

20 29<br />

)



2.16.<br />

2.18. You could try<br />

⎛ ⎞<br />

0 2<br />

⎜ ⎟<br />

AB = ⎝0 2⎠<br />

0 2<br />

(<br />

)<br />

4 2 4<br />

AC =<br />

4 2 4<br />

⎛ ⎞<br />

0 0<br />

⎜ ⎟<br />

AD = ⎝0 0⎠<br />

0 0<br />

(<br />

)<br />

0 1 −1<br />

BC =<br />

0 1 −1<br />

( )<br />

10 0<br />

CA =<br />

.<br />

0 0<br />

( )<br />

1 1<br />

A =<br />

.<br />

−1 −1<br />

2.20. In an upper triangular matrix, A ij = 0 if i > j. So nonzero terms have i ≤ j. The product: (AB) ij = ∑ k A ik B kj . The A ik vanishes unless i ≤ k, and the second factor vanishes unless k ≤ j, so the whole sum consists of terms with i ≤ k ≤ j. The sum vanishes unless i ≤ j, hence upper triangular. Moreover, the terms with i = j must have i ≤ k ≤ j, so just the one A ii B ii term.

2.22.
$$(c(AB))_{ij} = c\,(AB)_{ij} = c \sum_k A_{ik} B_{kj} = \sum_k c A_{ik} B_{kj} = \sum_k (cA)_{ik} B_{kj} = ((cA)B)_{ij}.$$

2.23.
$$((AB)C)_{ij} = \sum_k (AB)_{ik} C_{kj} = \sum_k \sum_l A_{il} B_{lk} C_{kj}.$$
On the other hand,
$$(A(BC))_{ij} = \sum_k A_{ik} (BC)_{kj} = \sum_k \sum_l A_{ik} B_{kl} C_{lj}.$$

Since k and l are just used to add up, we can change their names to anything<br />

we like. In particular, the resulting sums won’t change if we rename k to l and<br />

l to k. Moreover, we can carry out the sums in any order. (You still have to<br />

show that each side is defined just when the other is.)<br />

2.24.
$$(A(B + C))_{ij} = \sum_k A_{ik} (B + C)_{kj} = \sum_k A_{ik} \left(B_{kj} + C_{kj}\right) = \sum_k A_{ik} B_{kj} + \sum_k A_{ik} C_{kj} = (AB)_{ij} + (AC)_{ij}.$$

3 Important Types of Matrices<br />

3.2. One proof:
$$(IA)_{ij} = \sum_k I_{ik} A_{kj} = A_{ij},$$
because I ik = 1 just when k = i.

A hint for a different proof (without using ∑ ): if A is 1 × 1, then the result<br />

is clear. Now suppose that we have proven the result already for all matrices of<br />

some size smaller than p × q, but that we face a matrix A which is p × q. Then<br />

split A into blocks, in any way you like, say as<br />

( )<br />

P Q<br />

A =<br />

,<br />

R S<br />

and write out<br />

( ) (<br />

I 0 P<br />

IA =<br />

0 I R<br />

)<br />

Q<br />

,<br />

S<br />

and calculate out the result, using the fact that since P, Q, R and S are smaller<br />

matrices, we can pretend that we have already checked the result for them.



3.3. IB = BI = I but IB = B.<br />

3.5.<br />

⎛<br />

⎜<br />

1<br />

⎞<br />

⎟<br />

e 1 = ⎝0⎠ .<br />

0<br />

3.6. Row j. All rows except row j.<br />

3.7. One proof:<br />

⎛<br />

⎞ ⎛ ⎞<br />

A 11 A 12 . . . A 1n 1<br />

A<br />

Ae 1 =<br />

21 A 22 . . . A 2n<br />

0<br />

⎜<br />

⎟ ⎜ ⎟<br />

⎝ . . . . . . ⎠ ⎝.<br />

⎠<br />

A n1 A n2 . . . A nn 0<br />

so running your fingers along the rows of A and column of e 1 :<br />

⎛<br />

⎞<br />

A 11 · 1 + A 12 · 0 + · · · + A 1n · 0<br />

A<br />

=<br />

21 · 1 + A 22 · 0 + · · · + A 2n · 0<br />

⎜<br />

⎟<br />

⎝<br />

.<br />

⎠<br />

A n1 · 1 + A n2 · 0 + · · · + A nn · 0<br />

⎛ ⎞<br />

A 11<br />

A<br />

=<br />

21<br />

⎜ ⎟<br />

⎝ . ⎠<br />

A n1<br />

Another proof: e 1 has entries (e 1 ) i<br />

= 1 if i = 1 and (e 1 ) i<br />

= 0 if i ≠ 1. So Ae 1<br />

has entries (Ae 1 ) i<br />

= ∑ k A ik (e 1 ) k<br />

. Each term is zero except if k = 1, in which<br />

case it is A i1 . But the entries of the first column of A are A 11 , A 21 , . . . , A n1 .<br />

3.9. Write x = ∑ x j e j , and multiply both sides by A.<br />

3.10. Use the fact that the columns of AB are A times columns of B.<br />

3.11. You could try<br />

( )<br />

( )<br />

I<br />

A = I 3 0 , B = 3<br />

.<br />

0<br />

3.15.<br />

( ) ( ) ( ) ( )<br />

2 0 0 −1 2 0 1 0<br />

1 0 1 0 1 1 −1 1



3.16. Mostly yes. You only have to try to figure out where e 1 goes to, and<br />

where e 2 goes to. The vector e 1 is close to her left eye. The vector e 2 is close to<br />

the top of her head, which is not really marked by anything, so harder to follow.<br />

But you can’t figure out what matrix gives the straight line segment. Why?<br />

3.19. C = CI = C(AB) = (CA)B = IB = B.<br />

3.22.<br />

B −1 A −1 (AB) = B −1 ( A −1 A ) B<br />

= B −1 IB<br />

= B −1 B<br />

= I.<br />

and similarly multiplying out (AB) B −1 A −1 .<br />

3.23. By definition, AA −1 = A −1 A = I. But these equations say exactly<br />

that A is the inverse of A −1 .<br />

3.24. Multiply both sides of the equation Ax = 0 by A −1 .<br />

3.25. Multiply both sides of AB = I by A −1 to find B = A −1 . Therefore<br />

BA = I, and so A = B −1 .<br />

3.27. If x = y then clearly Ax = Ay. If Ax = Ay, then multiply both sides<br />

by A −1 .<br />

3.28.<br />

(<br />

)<br />

M −1 A<br />

=<br />

−1 −A −1 BD −1<br />

.<br />

0 D −1<br />

3.29. See figure A.1 on the next page.<br />

3.31. 2, 3, 4, 1; 3, 4, 1, 2; and 4, 2, 3, 1.<br />

3.33. pq is 2, 4, 3, 1.<br />

3.35. Put the number 1 into the spot where the permutation wants it to go,<br />

by swapping 1 with whatever number sits in that spot. Then put the number 2<br />

into its spot, etc., swapping two numbers at each step.<br />
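For example, to build the permutation 2, 3, 1 starting from 1, 2, 3: the number 1 belongs in the third spot, which currently holds 3, so swap 1 and 3 to get 3, 2, 1; then 2 belongs in the first spot, which holds 3, so swap 2 and 3 to get 2, 3, 1. So 2, 3, 1 is a product of two transpositions.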

3.36.<br />

(a) 3,1,2 is<br />

(b) 4,3,2,1 is<br />

(c) 4,1,2,3 is<br />

3.37. These transpositions allow us to shift any number we want over to<br />

the left or to the right, one step. Keep doing this until it lands in its place. So<br />

we can put whatever number we want to in the last place, and then proceed by<br />

induction.



Figure A.1: Images coming from some matrices, and from their inverses<br />

3.39.<br />

)<br />

P =<br />

(e 4 e 2 e 5 e 3 e 1<br />

⎛<br />

⎞<br />

0 0 0 0 1<br />

0 1 0 0 0<br />

=<br />

0 0 0 1 0<br />

.<br />

⎜<br />

⎟<br />

⎝1 0 0 0 0⎠<br />

0 0 1 0 0<br />

3.40. 2,3,1,4<br />

3.42. Each column has to be a column of the identity matrix, since it has<br />

all 0’s except for a single 1. But all of the columns have to be different, in order<br />

that no two columns have 1’s in the same row. So they are different columns<br />

of the identity matrix. If some column of the identity matrix doesn’t show up<br />

anywhere in our matrix, say the third column for example, then the third row<br />

has only 0’s. So every column of the identity matrix shows up, precisely once,<br />

scrambled in some permutation.



3.43.<br />

P Qe j = P e q(j)<br />

= e p(q(j)) .<br />

3.45. By definition, column j of P is e p(j) . So we permuted by p. For the<br />

rows, P A is A with row 1 moved to row p(1), etc. Take A = I : then P is 1<br />

with row 1 moved to row p(1), etc. So row p(1) of P is e 1 , etc. So row 1 of P is<br />

e p −1 (1), etc.<br />

3.47. The rows of P are those of 1 swapped by p −1 .<br />

4 Elimination Via Matrix Arithmetic<br />

4.2. Se j must be e j with multiples of rows added only to later rows, so<br />

column j of S has zeros above row j and 1 on row j. Hence S 1j = S 2j = · · · =<br />

S j−1 j = 0 and S jj = 1.<br />

4.3.<br />

⎛<br />

⎞<br />

1 0 0<br />

⎜<br />

⎟<br />

S = ⎝−7 1 0⎠ .<br />

0 −5 1<br />

4.4. Proof (1): Multiplying by R adds entries only to lower entries, preserving<br />

the 1’s on and 0’s above the diagonal.<br />

Proof (2): A matrix R is strictly lower triangular just when R ij = 0 unless<br />

i ≥ j and R ij = 1 for i = j.<br />

(RS) ij<br />

= ∑ k<br />

R ik S kj .<br />

But this sum is all 0’s unless we find i ≥ k and k ≥ j, so we need i ≥ j. If i = j,<br />

then only the term k = i = j makes a contribution, which is R ii S ii = 1.<br />

Proof (3): Obvious for 1 × 1 matrices. Suppose that we have proven the<br />

result for all matrices of size smaller than n × n. If R and S are n × n, write<br />

R =<br />

(<br />

1 0<br />

a B<br />

)<br />

, S =<br />

(<br />

1 0<br />

p Q<br />

where a and p are columns, and B and Q are strictly lower triangular. Then<br />

(<br />

)<br />

1 0<br />

RS =<br />

a + Bp BQ<br />

which is strictly lower triangular because B and Q are strictly lower triangular<br />

of smaller size.<br />

4.5. Suppose that we want to make a matrix S which is p × q. Start with<br />

the identity matrix. Multiply it by the elementary matrix with S 1q in row 1,<br />

)



column q. We get one element into place. Keep going, first getting things set<br />

up properly along the bottom row.<br />

4.7.<br />

4.9. If<br />

⎛<br />

⎞<br />

t 1 t<br />

D =<br />

2 ⎜<br />

.<br />

⎝<br />

.. ⎟<br />

⎠ ,<br />

t n<br />

then<br />

⎛<br />

D −1 =<br />

⎜<br />

⎝<br />

t −1<br />

1<br />

t −1<br />

2<br />

. ..<br />

t −1<br />

n<br />

If one of these t i is 0, then there can’t be an inverse, because De i = 0, so if D<br />

has an inverse, then D −1 De i = e i , but also D −1 De i = D −1 0 = 0.<br />

4.11. ⎛<br />

⎞<br />

ad 0 0<br />

⎜<br />

⎟<br />

⎝ 0 be 0 ⎠<br />

0 0 cf<br />

4.12. The original picture is<br />

⎞<br />

⎟<br />

⎠ .<br />

The three resulting pictures are:<br />

4.13.<br />

( ) ( ) ( )<br />

1 2 x 1 7<br />

=<br />

3 4 x 2 8



4.14. P 99 = P swaps rows 1 and 2.<br />

⎛<br />

⎞<br />

P 99 ⎜<br />

0 1 0 ⎟<br />

= P = ⎝1 0 0⎠ .<br />

0 0 1<br />

4.15. S 101 adds 202(row 1) to row 3.<br />

⎛<br />

⎞<br />

1 0 0<br />

S 101 ⎜<br />

⎟<br />

= ⎝ 0 1 0⎠ .<br />

202 0 1<br />

4.23. Any number between 0 and 3.<br />

4.24.<br />

a. ⎛<br />

⎞<br />

1 2 3 4 5<br />

⎜<br />

⎟<br />

⎝0 0 0 6 7⎠<br />

0 0 0 0 8<br />

b. Impossible: can only use 1 pivot in each row, so at most 3 pivots binding<br />

up variables, so must have at least 2 left over free variables (or at least<br />

one free variable, if the last column is a column of constants).<br />

c. ⎛<br />

⎞<br />

⎜<br />

0 0 1 1 1 ⎟<br />

⎝0 0 0 1 1⎠<br />

0 0 0 0 0<br />

4.27. There is at most one pivot in each row. There are more columns than<br />

rows. So there must be a pivotless column: a free variable for the equation<br />

Ax = 0.<br />

5 Finding the Inverse of a Matrix<br />

5.1. Forward eliminate:<br />

⎛<br />

⎜<br />

⎝<br />

0 0 1 1 0 0<br />

1 1 0 0 1 0<br />

1 2 1 0 0 1<br />

⎞<br />

⎟<br />

⎠<br />

Swap rows 1 and 2.<br />

⎛<br />

⎜<br />

⎝<br />

1 1 0 0 1 0<br />

0 0 1 1 0 0<br />

1 2 1 0 0 1<br />

⎞<br />

⎟<br />



Add −(row 1) to row 3.<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

1 1 0 0 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 1 1 0 0⎠<br />

0 1 1 0 −1 1<br />

⎛<br />

⎞<br />

1 1 0 0 1 0<br />

⎜<br />

⎟<br />

⎝ 0 0 1 1 0 0⎠<br />

0 1 1 0 −1 1<br />

Swap rows 2 and 3.<br />

⎛<br />

⎜<br />

⎝<br />

1 1 0 0 1 0<br />

0 1 1 0 −1 1<br />

0 0 1 1 0 0<br />

⎞<br />

⎟<br />

⎠<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

1 1 0 0 1 0<br />

⎜<br />

⎟<br />

⎝ 0 1 1 0 −1 1⎠<br />

0 0 1 1 0 0<br />

Back substitute:<br />

Add −(row 3) to row 2.<br />

Add −(row 2) to row 1.<br />

⎛<br />

⎞<br />

1 1 0 0 1 0<br />

⎜<br />

⎟<br />

⎝ 0 1 0 −1 −1 1⎠<br />

0 0 1 1 0 0<br />

⎛<br />

⎞<br />

1 0 0 1 2 −1<br />

⎜<br />

⎟<br />

⎝ 0 1 0 −1 −1 1 ⎠<br />

0 0 1 1 0 0<br />

⎛<br />

⎞<br />

1 2 −1<br />

A −1 ⎜<br />

⎟<br />

= ⎝−1 −1 1 ⎠ .<br />

1 0 0



5.2. Forward eliminate:<br />

⎛<br />

⎜<br />

⎝<br />

1 2 3 1 0 0<br />

1 2 0 0 1 0<br />

1 0 0 0 0 1<br />

⎞<br />

⎟<br />

⎠<br />

Add −(row 1) to row 2, −(row 1) to row 3.<br />

⎛<br />

⎞<br />

1 2 3 1 0 0<br />

⎜<br />

⎟<br />

⎝ 0 0 −3 −1 1 0⎠<br />

0 −2 −3 −1 0 1<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

1 2 3 1 0 0<br />

⎜<br />

⎟<br />

⎝ 0 0 −3 −1 1 0⎠<br />

0 −2 −3 −1 0 1<br />

Swap rows 2 and 3.<br />

⎛<br />

⎜<br />

⎝<br />

1 2 3 1 0 0<br />

0 −2 −3 −1 0 1<br />

0 0 −3 −1 1 0<br />

⎞<br />

⎟<br />

⎠<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

1 2 3 1 0 0<br />

⎜<br />

⎟<br />

⎝ 0 −2 −3 −1 0 1⎠<br />

0 0 −3 −1 1 0<br />

Back substitute:<br />

Scale row 3 by − 1 3 .<br />

⎛<br />

⎜<br />

⎝<br />

1 2 3 1 0 0<br />

0 −2 −3 −1 0 1<br />

1<br />

0 0 1<br />

3<br />

− 1 3<br />

0<br />

⎞<br />

⎟<br />

⎠<br />

Add −3(row 3) to row 1, 3(row 3) to row 2.<br />

⎛<br />

⎞<br />

1 2 0 0 1 0<br />

⎜<br />

⎟<br />

⎝ 0 −2 0 0 −1 1⎠<br />

1<br />

0 0 1<br />

3<br />

− 1 3<br />

0



Scale row 2 by − 1 2 .<br />

⎛<br />

⎜<br />

⎝<br />

1 2 0 0 1 0<br />

0 1 0 0<br />

1<br />

2<br />

− 1 2<br />

0 0 1<br />

1<br />

3<br />

− 1 3<br />

0<br />

⎞<br />

⎟<br />

⎠<br />

Add −2(row 2) to row 1.<br />

⎛<br />

1 0 0 0 0 1<br />

⎜<br />

1<br />

⎝ 0 1 0 0<br />

1<br />

0 0 1<br />

3<br />

− 1 3<br />

0<br />

2<br />

− 1 2<br />

⎞<br />

⎟<br />

⎠<br />

5.3. Not invertible.<br />

5.4.<br />

⎛<br />

⎞<br />

0 0 1<br />

A −1 ⎜<br />

=<br />

1<br />

⎝0 2<br />

− 1 ⎟<br />

2⎠ .<br />

1<br />

3<br />

− 1 3<br />

0<br />

A −1 =<br />

⎛<br />

⎜<br />

⎝<br />

0 −1 1<br />

3<br />

2<br />

1 − 3 2<br />

− 1 2<br />

0<br />

1<br />

2<br />

5.6. Yes, invertible.<br />

5.7. No, not invertible.<br />

5.8. If A is invertible, A −1 Ax = x = 0. On the other hand, if Ax = 0 holds<br />

only for x = 0, then the same is true of Ux = 0, after Gauss–Jordan elimination.<br />

Any column of U with no pivot gives a free variable, so there must be a pivot<br />

in each column, going straight down the diagonal, so U is invertible.<br />

5.9. Lets use theorem 5.9. Is there a solution x to the equation Ax = b for<br />

every choice of b? Yes: try x = Bb. So A is invertible. Multiply AB = I on<br />

both sides by A −1 to get B = A −1 .<br />

5.10. We have already seen that if A and B are both invertible, then (AB) −1 = B −1 A −1 . Suppose that AB is invertible. Then (AB) (AB) −1 = I, so A ( B (AB) −1 ) = I. Therefore A is invertible. Multiply on the left by A −1 : B (AB) −1 = A −1 . Multiply by A on the right: B (AB) −1 A = I. So B is invertible.

5.11. Yes<br />

5.12. Forward elimination:<br />

⎛<br />

⎞<br />

⎜<br />

⎝<br />

1 2 3<br />

⎟<br />

1 2 1 ⎠<br />

0 1 15<br />

⎞<br />

⎟<br />

⎠ .



Add −1(row 1) to row 2.<br />

Swap rows 2 and 3<br />

⎛<br />

⎞<br />

1 2 3<br />

⎜<br />

⎟<br />

⎝ 0 0 −2⎠<br />

0 1 15<br />

⎛<br />

⎞<br />

1 2 3<br />

⎜<br />

⎟<br />

⎝ 0 1 15 ⎠<br />

0 0 −2<br />

⎛<br />

⎞<br />

1 2 3<br />

⎜<br />

⎟<br />

⎝ 0 1 15 ⎠<br />

0 0 −2<br />

The matrix is invertible, so the equations have a unique solution.<br />

5.15. You could try<br />

⎛<br />

⎞<br />

⎜<br />

0 1 0 ⎟<br />

A = ⎝0 2 1⎠<br />

1 0 0<br />

which has forward elimination<br />

⎛<br />

1 0 0<br />

0 2 1<br />

⎜<br />

⎝<br />

0 0 − 1 2<br />

⎞<br />

⎟<br />

⎠ ,<br />

while its transpose A t has forward elimination<br />

⎛<br />

⎞<br />

1 2 0<br />

⎜<br />

⎟<br />

⎝ 0 1 0 ⎠ .<br />

0 0 1<br />

5.16. Ux = V b just when V Ax = V b just when Ax = b. So to solve Ax = b<br />

we need b to solve Ux = V b. The last two rows of Ux = V b give the conditions<br />

above. If those are satisfied, we can then drop those rows, and start solving the<br />

first two rows with the pivots.



6 The Determinant<br />

6.1. If a ≠ 0, use it as a pivot, and forward elimination yields<br />

$$\begin{pmatrix} a & b \\ 0 & d - \frac{bc}{a} \end{pmatrix}.$$
Therefore if a ≠ 0, then A is invertible just when d − bc/a ≠ 0. Multiplying by a,

we see that A is invertible just when ad − bc ≠ 0.<br />

What if a = 0? We try to swap. Forward elimination yields<br />

$$\begin{pmatrix} c & d \\ 0 & b \end{pmatrix}.$$

Invertibility (when a = 0) is just precisely both b and c not vanishing. But<br />

(when a = 0) ad − bc = −bc vanishes just when b or c does, so just when<br />

invertibility fails.<br />

6.2.<br />

( ) (<br />

) (<br />

)<br />

a b<br />

a b<br />

a b<br />

det<br />

= a det<br />

− c det<br />

c d<br />

c d<br />

c d<br />

6.3.<br />

6.4.<br />

+1 det<br />

= ad − bc.<br />

+3 (−3) − 1 (1) = −10<br />

( ) ( ) ( )<br />

1 1<br />

−1 0<br />

−1 0<br />

− 0 det<br />

+ 1 det<br />

0 −1<br />

0 −1<br />

1 1<br />

6.10. 1 · 4 · 6 = 24<br />

6.11. For 1 × 1 matrices U, the results are obvious:<br />

( 1<br />

U = (u) , U −1 = .<br />

u)<br />

= −2<br />

Suppose that U is n × n and assume that we have already checked that all<br />

smaller invertible upper triangular matrices. Split into blocks, in any manner<br />

at all<br />

( )<br />

A B<br />

U =<br />

0 C<br />

with A and C square and upper triangular. You will have to find a way to see<br />

that A and C are invertible. Once you do that, check that<br />

(<br />

)<br />

U −1 A<br />

=<br />

−1 −A −1 BC −1<br />

.<br />

0 C −1



We see by induction that (1) U −1 is upper triangular, (2) the diagonal entries<br />

of U −1 are the reciprocals of the diagonal entries of U, and (3) we can compute<br />

the entries U −1 inductively in terms of the entries of U.<br />

6.13. Swapping changes the sign of the determinant. But swapping doesn’t change the matrix, so it doesn’t change the determinant. The determinant equals minus itself, so the determinant is 0.

6.14. You can use<br />

A =<br />

(<br />

1 0<br />

0 0<br />

)<br />

, B =<br />

for example.<br />

6.15. The determinant goes up by 6 times.<br />

(<br />

0 0<br />

0 1<br />

)<br />

7 The Determinant via Elimination<br />

7.1.<br />

⎛<br />

⎜<br />

⎝<br />

2 5 5<br />

2 5 7<br />

2 6 11<br />

⎞<br />

⎟<br />

⎠<br />

Add −(row 1) to row 2, −(row 1) to row 3.<br />

⎛<br />

⎞<br />

2 5 5<br />

⎜<br />

⎟<br />

⎝ 0 0 2⎠<br />

0 1 6<br />

Make a new pivot ↘.<br />

Swap rows 2 and 3<br />

Make a new pivot ↘.<br />

⎛<br />

⎞<br />

2 5 5<br />

⎜<br />

⎟<br />

⎝ 0 0 2⎠<br />

0 1 6<br />

⎛<br />

⎞<br />

2 5 5<br />

⎜<br />

⎟<br />

⎝ 0 1 6⎠<br />

0 0 2<br />

⎛<br />

⎜<br />

⎝<br />

2 5 5<br />

0 1 6<br />

0 0 2<br />

⎞<br />

⎟<br />



So det A = −(2)(1)(2): the minus sign because of one row swap.<br />

7.2. 0 because the second row is a multiple of the first.<br />

7.3. From the fast formula, det A ≠ 0 just when there is a pivot in each<br />

column.<br />

7.4. Gauss–Jordan elimination with one row swap yields<br />

⎛<br />

⎞<br />

1 0 1 −1<br />

0 1 −1 −1<br />

⎜<br />

⎟<br />

⎝ 0 0 −1 1 ⎠<br />

so<br />

so<br />

so<br />

0 0 0 −1<br />

⎛<br />

⎞<br />

1 0 1 −1<br />

1 0 0 0<br />

det ⎜<br />

⎟<br />

⎝0 1 −1 −1⎠ = −1<br />

1 0 −1 0<br />

7.5. Gauss–Jordan elimination with one row swap yields<br />

⎛<br />

⎞<br />

−1 1 0 0<br />

0 2 2 0<br />

⎜<br />

0 0 2 1<br />

⎟<br />

⎝<br />

3 ⎠<br />

0 0 0<br />

2<br />

⎛<br />

⎞<br />

0 2 2 0<br />

−1 1 0 0<br />

det ⎜<br />

⎟<br />

⎝−1 −1 0 1⎠ = 6<br />

2 0 1 1<br />

7.6. Gauss–Jordan elimination with no row swaps yields<br />

⎛<br />

⎞<br />

2 −1 −1<br />

⎜ 0 − 3 − 1 ⎟<br />

⎝ 2<br />

2⎠<br />

0 0 0<br />

⎛<br />

⎞<br />

2 −1 −1<br />

⎜<br />

⎟<br />

det ⎝−1 −1 0 ⎠ = 0<br />

2 −1 −1<br />

7.7.<br />

( ) ( ) ( )<br />

0 −1<br />

2 0<br />

2 0<br />

+0 det<br />

− 0 det<br />

+ 2 det<br />

= −4<br />

2 −1<br />

2 −1<br />

0 −1



7.8.<br />

( ) ( ) ( )<br />

0 2<br />

1 −1<br />

1 −1<br />

+2 det − 2 det<br />

+ 0 det<br />

= −14<br />

2 1<br />

2 1<br />

0 2<br />

7.9.<br />

( ) ( ) ( )<br />

−1 2<br />

−1 −1<br />

−1 −1<br />

+0 det<br />

− 0 det<br />

+ 0 det<br />

= 0<br />

1 0<br />

1 0<br />

−1 2<br />

7.10. Rescaling that row rescales the determinant. But rescaling that<br />

row doesn’t change anything, so it must not change the determinant. The<br />

determinant must not change when scaled, so must be 0.<br />

7.12. To see that det A = 12, we compute<br />

⎛<br />

⎞<br />

⎜<br />

⎝<br />

0 2 1<br />

⎟<br />

3 1 2⎠<br />

3 5 2<br />

Swap rows 1 and 2<br />

Add −(row 1) to row 3.<br />

Make a new pivot ↘.<br />

Add −2(row 2) to row 3.<br />

⎛<br />

⎞<br />

3 1 2<br />

⎜<br />

⎟<br />

⎝ 0 2 1⎠<br />

3 5 2<br />

⎛<br />

⎞<br />

3 1 2<br />

⎜<br />

⎟<br />

⎝ 0 2 1⎠<br />

0 4 0<br />

⎛<br />

⎞<br />

3 1 2<br />

⎜<br />

⎟<br />

⎝ 0 2 1⎠<br />

0 4 0<br />

⎛<br />

⎞<br />

3 1 2<br />

⎜<br />

⎟<br />

⎝ 0 2 1 ⎠<br />

0 0 −2



Make a new pivot ↘.<br />

⎛<br />

⎜<br />

⎝<br />

3 1 2<br />

0 2 1<br />

0 0 −2<br />

7.13. If L 11 = 0, then we have a zero row, so det = 0, and the result is<br />

obviously true. If L 11 ≠ 0, then use it as a pivot to kill everything underneath<br />

it:<br />

⎛<br />

⎞<br />

L 11<br />

0 L 22<br />

0 L 32 L 33<br />

. 0 L 42 L ..<br />

43 ⎜<br />

⎝<br />

.<br />

. . . . ..<br />

⎟<br />

⎠<br />

0 L n2 L n3 . . . L n(n−1) L nn<br />

Proceed by induction.<br />

7.15. Here is one proof: the i-th row of A t is obviously the transpose of

the i-th column of A: e i t A t = (Ae i ) t . Writing out any vector x as x =<br />

x 1 e 1 + x 2 e 2 + · · · + x n e n , and adding up, we see that x t A t = (Ax) t for any<br />

vector x. Apply this to a column of B, say x = Be i .<br />

$$e_i^{\,t} B^t A^t = (B e_i)^t A^t = (A B e_i)^t = e_i^{\,t} (AB)^t.$$
Here is another proof, using lots of indices:
$$(AB)^t_{\,ij} = (AB)_{ji} = \sum_k A_{jk} B_{ki} = \sum_k B_{ki} A_{jk} = \sum_k B^t_{\,ik} A^t_{\,kj} = \left(B^t A^t\right)_{ij}.$$

7.16. It is the inverse permutation; see the exercises in subsection 3.3.<br />

7.17. 4: expand down the third column.



7.18. For k = 1, A 1 = A, so obvious. By induction,<br />

det ( A k+1) = det ( A · A k)<br />

= (det A) ( det A k)<br />

= (det A) (det A) k<br />

= (det A) k+1 .<br />

7.19. det ( A 2222444466668888) = (det A) 2222444466668888 = (−1) 2222444466668888 =<br />

1 (an even number of minus signs).<br />

7.20. AA −1 = I so det A det ( A −1) = 1.<br />

7.21.<br />

a. By expanding down any column.<br />

b. By expanding across any row.<br />

c. By forward elimination, and then taking the product of the diagonal<br />

entries. (The fastest way for a big matrix.)<br />

7.22. One, because the determinant of the coefficients is 1 · 2 · 3 = 6.<br />

8 Span<br />

8.1.<br />

x 1 + 2x 2 + x 3 = 1<br />

x 2 + x 3 = 0<br />

−x 1 + x 2 = −2<br />

8.3. Yes.<br />

8.4. Lets call these vectors x 1 , x 2 , x 3 . Make the matrix<br />

Apply forward elimination:<br />

Add −1(row 1) to row 3.<br />

A =<br />

(x 1 x 2 x 3<br />

)<br />

.<br />

⎛<br />

⎞<br />

1 1 1<br />

⎜<br />

⎟<br />

A = ⎝ 0 0 1⎠<br />

1 0 0<br />

⎛<br />

⎞<br />

1 1 1<br />

⎜<br />

⎟<br />

⎝ 0 0 1 ⎠<br />

0 −1 −1



Swap rows 2 and 3<br />

⎛<br />

⎞<br />

1 1 1<br />

⎜<br />

⎟<br />

⎝ 0 −1 −1⎠<br />

0 0 1<br />

⎛<br />

⎞<br />

1 1 1<br />

⎜<br />

⎟<br />

⎝ 0 −1 −1⎠<br />

0 0 1<br />

If we add any vector y, we can’t add another pivot, so every vector y is a linear<br />

combination of x 1 , x 2 , x 3 . Therefore the span is all of R 3 .<br />

8.5. Yes. Forward eliminate:<br />

⎛<br />

⎞<br />

0 2 −1 −1<br />

⎜<br />

⎝−1 1 −1 0<br />

0 2 1 1<br />

⎟<br />

⎠<br />

Swap rows 1 and 2.<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

−1 1 −1 0<br />

⎜<br />

⎟<br />

⎝ 0 2 −1 −1⎠<br />

0 2 1 1<br />

⎛<br />

⎞<br />

−1 1 −1 0<br />

⎜<br />

⎟<br />

⎝ 0 2 −1 −1⎠<br />

0 2 1 1<br />

Add −(row 2) to row 3.<br />

⎛<br />

⎞<br />

−1 1 −1 0<br />

⎜<br />

⎟<br />

⎝ 0 2 −1 −1⎠<br />

0 0 2 2<br />

Move the pivot ↘ .<br />

⎛<br />

⎞<br />

−1 1 −1 0<br />

⎜<br />

⎟<br />

⎝ 0 2 −1 −1⎠<br />

0 0 2 2



There is no pivot in the final column, so the final column is a linear combination<br />

of earlier columns.<br />

8.6. No. Forward eliminate:<br />

⎛<br />

⎞<br />

2 2 4 0<br />

⎜<br />

⎝ 0 −1 0<br />

⎟<br />

2⎠<br />

0 1 0 0<br />

Move the pivot ↘ .<br />

Add row 2 to row 3.<br />

Move the pivot ↘ .<br />

Move the pivot →.<br />

⎛<br />

⎞<br />

2 2 4 0<br />

⎜<br />

⎟<br />

⎝ 0 −1 0 2⎠<br />

0 1 0 0<br />

⎛<br />

⎞<br />

2 2 4 0<br />

⎜<br />

⎟<br />

⎝ 0 −1 0 2⎠<br />

0 0 0 2<br />

⎛<br />

⎞<br />

2 2 4 0<br />

⎜<br />

⎟<br />

⎝ 0 −1 0 2⎠<br />

0 0 0 2<br />

⎛<br />

⎜<br />

⎝<br />

2 2 4 0<br />

0 −1 0 2<br />

0 0 0 2<br />

⎞<br />

⎟<br />

⎠<br />

There is a pivot in the final column, so the final column is linearly independent<br />

of earlier columns.<br />

8.9. x 1 + x 2 + 2x 3 = 0<br />

8.11. You can rescale and add as many times as you need to, forming any<br />

linear combination.<br />

8.12. No: it doesn’t contain 0.<br />

8.13. Yes<br />

8.14. Obviously every vector in a subspace is a linear combination of<br />

vectors in the subspace: x = 1 · x. So the subspace lies inside the span of its<br />

vectors. Conversely, every linear combination of vectors in a subspace belongs



to the subspace, so the span of the vectors in the subspace lies in the subspace.<br />

Therefore a subspace is its own span.<br />

8.15. {0} and R.<br />

8.16.<br />

a. Not always. Take the x and y axes in the (x, y) plane.<br />

b. Yes.<br />

8.17. No: it contains<br />

( )<br />

1<br />

x =<br />

1<br />

but doesn’t contain<br />

8.18.<br />

a. no<br />

b. yes<br />

c. yes<br />

d. no<br />

8.20. The lines through 0.<br />

( )<br />

2x =<br />

2<br />

2<br />

.<br />

9 Bases<br />

9.2. Put the standard basis into the columns of a matrix, and you have the<br />

identity matrix. Look: there is a pivot in each column.<br />

9.4. Try adding a vector to the set. If you can’t then you are done: a basis.<br />

If you can, then keep going. If you end up with more than n vectors, then use<br />

theorem 9.4.<br />

9.5. Put them into the columns of matrix A. You find det A = 0, so these<br />

vectors are linearly dependent.<br />

9.6. Put them into the columns of a matrix, and apply forward elimination<br />

to find pivots:<br />

⎛ ⎞<br />

⎜<br />

⎝<br />

2 1<br />

3<br />

0<br />

2<br />

⎟<br />

⎠ .<br />

A square matrix, a pivot in every column, so a basis.<br />

9.10. Write A as columns<br />

)<br />

A =<br />

(u 1 u 2 . . . u n .<br />

The equation Ax = 0 is the equation x 1 u 1 + x 2 u 2 + · · · + x n u n = 0, which<br />

imposes a linear relation. Therefore u 1 , u 2 , . . . , u n are linearly independent just<br />

when Ax = 0 has x = 0 as its only solution.



9.11.<br />

⎛<br />

⎞<br />

⎛<br />

⎞<br />

1 0 −1<br />

F −1 ⎜<br />

⎟<br />

= ⎝0 1 0 ⎠ , F −1 ⎜<br />

1 0 −1 ⎟<br />

AF = ⎝0 2 0 ⎠ .<br />

0 0 1<br />

0 0 2<br />

9.12.<br />

⎛<br />

⎞ ⎛<br />

⎞<br />

⎜<br />

1 0 0 1 2 0<br />

⎟ ⎜<br />

⎟<br />

A = ⎝0 1 0⎠ , F = ⎝0 1 2⎠ ,<br />

0 0 0 0 0 1<br />

⎛<br />

⎞<br />

⎛<br />

⎞<br />

F −1 ⎜<br />

1 −2 4 ⎟<br />

= ⎝0 1 −2⎠ , F −1 ⎜<br />

1 0 −4 ⎟<br />

AF = ⎝0 1 2 ⎠ .<br />

0 0 1<br />

0 0 0<br />

9.14. Expand out x = x 1 e 1 + x 2 e 2 + · · · + x q e q to give<br />

Ax = x 1 (Ae 1 ) + x 2 (Ae 2 ) + · · · + x q (Ae q ) .<br />

So Ax = 0 just when x 1 , x 2 , . . . , x q give a linear relation among the columns of<br />

A.<br />

9.15. No<br />

9.17. Yes<br />

9.19. Let<br />

)<br />

(x 1 x 2 . . . x n<br />

F =<br />

G =<br />

(y 1 y 2 . . . y n<br />

)<br />

,<br />

and let A = G F −1 . But why is there only one such matrix?<br />

9.21. The idea is that $e_2 - e_3 = (e_1 - e_3) - (e_1 - e_2)$, etc. So consider the vectors $e_1 - e_2, e_1 - e_3, \dots, e_1 - e_n$. Clearly if $i = 1$, then $e_i - e_j$ is one of these vectors. But if $i \neq 1$, then $e_i - e_j = (-1)(e_1 - e_i) + (e_1 - e_j)$. So the vectors $e_1 - e_2, e_1 - e_3, \dots, e_1 - e_n$ span the subspace. Clearly these vectors are linearly independent, because each one has a nonzero entry just at a spot where all of the others have a zero entry. Alternatively, to see linear independence, any linear relation among them,
$$0 = c_2(e_1 - e_2) + c_3(e_1 - e_3) + \dots + c_n(e_1 - e_n) = (c_2 + c_3 + \dots + c_n)e_1 - c_2 e_2 - c_3 e_3 - \dots - c_n e_n,$$
determines a linear relation among the standard basis vectors, forcing $0 = c_2 = c_3 = \dots = c_n$. The subspace is actually the set of vectors $x$ for which $x_1 + x_2 + \dots + x_n = 0$.


9.22. You could try
$$A = \begin{pmatrix}0 & 1\\ 1 & 1\\ 1 & 2\end{pmatrix}.$$
Depending on whether you swap row 1 with row 2 or with row 3, forward elimination yields
$$\begin{pmatrix}1 & 1\\ 0 & 1\\ 0 & 0\end{pmatrix} \quad \text{or} \quad \begin{pmatrix}1 & 2\\ 0 & 1\\ 0 & 0\end{pmatrix}.$$

10 Kernel and Image
10.3. The reduced echelon form is
$$\begin{pmatrix}1 & 0 & -1\\ 0 & 1 & 2\\ 0 & 0 & 0\end{pmatrix}.$$
The basis for the kernel is
$$\begin{pmatrix}1\\ -2\\ 1\end{pmatrix}.$$
10.6. Remove zero rows.
$$\begin{pmatrix}1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 1\\ 0 & 0 & 1 & 2 & 2\end{pmatrix}$$
Change signs of the entries after each pivot.
$$\begin{pmatrix}1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & -1\\ 0 & 0 & 1 & -2 & -2\end{pmatrix}$$
Pad with rows from the identity matrix, to get 1's down the diagonal.
$$\begin{pmatrix}1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & -1\\ 0 & 0 & 1 & -2 & -2\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 1\end{pmatrix}$$
Keep the pivotless columns.
$$\begin{pmatrix}0\\0\\-2\\1\\0\end{pmatrix}, \quad \begin{pmatrix}0\\-1\\-2\\0\\1\end{pmatrix}$$

10.7. Remove zero rows.
$$\begin{pmatrix}0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 1\end{pmatrix}$$
Change signs of the entries after each pivot.
$$\begin{pmatrix}0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 1\end{pmatrix}$$
Pad with rows from the identity matrix, to get 1's down the diagonal.
$$\begin{pmatrix}1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 1\end{pmatrix}$$
Keep the pivotless columns.
$$\begin{pmatrix}1\\0\\0\\0\\0\end{pmatrix}$$

10.8. The reduced echelon form is
$$\begin{pmatrix}1 & 0 & 0 & 0 & 2\\ 0 & 1 & 0 & 0 & -\tfrac{5}{2}\\ 0 & 0 & 1 & 0 & \tfrac{5}{2}\\ 0 & 0 & 0 & 1 & 1\end{pmatrix}$$
Remove zero rows (there are none). Change signs of the entries after each pivot.
$$\begin{pmatrix}1 & 0 & 0 & 0 & -2\\ 0 & 1 & 0 & 0 & \tfrac{5}{2}\\ 0 & 0 & 1 & 0 & -\tfrac{5}{2}\\ 0 & 0 & 0 & 1 & -1\end{pmatrix}$$
Pad with rows from the identity matrix, to get 1's down the diagonal.
$$\begin{pmatrix}1 & 0 & 0 & 0 & -2\\ 0 & 1 & 0 & 0 & \tfrac{5}{2}\\ 0 & 0 & 1 & 0 & -\tfrac{5}{2}\\ 0 & 0 & 0 & 1 & -1\\ 0 & 0 & 0 & 0 & 1\end{pmatrix}$$
Keep the pivotless columns.
$$\begin{pmatrix}-2\\ \tfrac{5}{2}\\ -\tfrac{5}{2}\\ -1\\ 1\end{pmatrix}$$
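The recipe in hints 10.6 through 10.8 can be automated. The following sketch (mine, not the book's code) reads the kernel basis off the reduced echelon form: one basis vector per pivotless column, with a 1 in that column's slot and the negated pivot-row entries in the pivot slots.

    from sympy import Matrix, Rational

    def kernel_basis(A):
        R, pivots = Matrix(A).rref()
        free = [j for j in range(R.cols) if j not in pivots]
        basis = []
        for j in free:
            v = [Rational(0)] * R.cols
            v[j] = Rational(1)                 # the 1 down the diagonal from the padding step
            for row, p in enumerate(pivots):
                v[p] = -R[row, j]              # change the sign of the entry after the pivot
            basis.append(Matrix(v))
        return basis

    # The matrix of hint 10.8: the single kernel vector is (-2, 5/2, -5/2, -1, 1).
    A = [[1, 0, 0, 0, 2],
         [0, 1, 0, 0, Rational(-5, 2)],
         [0, 0, 1, 0, Rational(5, 2)],
         [0, 0, 0, 1, 1]]
    for v in kernel_basis(A):
        print(v.T)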

10.9. $\begin{pmatrix}-1\\ 1\\ 0\end{pmatrix}$
10.10. $\begin{pmatrix}-1\\ 1\\ 0\end{pmatrix}$
10.11. The only vector in the kernel is 0.
10.12. $y = Ax = A\sum_j x_j e_j = \sum_j x_j Ae_j$, so a linear combination (with coefficients $x_1, x_2, \dots, x_q$) of the columns $Ae_j$ of $A$. Therefore the vectors of the form $y = Ax$ are precisely the linear combinations of the columns of $A$.
10.14. It is the horizontal plane in $\mathbb{R}^3$, the $xy$-plane in $xyz$ coordinates.
10.16. If $A$ is tall, then $A^t$ is short, so there are nonzero vectors $c$ so that $A^t c = 0$, i.e. $c^t A = 0$. Since $c \neq 0$, there must be some entry $c_i \neq 0$. If $Ax = e_i$, we find $0 = c^t Ax = c^t e_i = c_i$, a contradiction. So $e_i$ is not in the image.
10.17. 1, 1, 2, 1, 1, 0
10.19. The kernel of $B$ consists in the vectors $v = \begin{pmatrix}x\\ y\\ z\end{pmatrix}$ for which $Bv = 0$. This is just asking for $Ax + Ay + Az = 0$, i.e. for $A(x + y + z) = 0$. We can pick $x$ and $y$ arbitrarily, and pick an arbitrary vector $w$ in the kernel of $A$, and set $z = w - (x + y)$. In particular, $B$ has kernel of dimension $2q + k$ where $k$ is the dimension of the kernel of $A$.
10.20. $Ax = 0$ implies that $Bx = CAx = 0$. Conversely, $Bx = 0$ implies that $Ax = C^{-1}Bx = 0$.
10.21. Linear relations among vectors pass through $C$ and through $C^{-1}$.

10.22. Compute echelon form:
$$\begin{pmatrix}0 & 2 & 2 & 2\\ 1 & 2 & 0 & 0\\ 0 & 1 & 2 & 2\\ 0 & 2 & 2 & 2\end{pmatrix}$$
Swap rows 1 and 2.
$$\begin{pmatrix}1 & 2 & 0 & 0\\ 0 & 2 & 2 & 2\\ 0 & 1 & 2 & 2\\ 0 & 2 & 2 & 2\end{pmatrix}$$
Add $-\tfrac12(\text{row }2)$ to row 3 and $-1(\text{row }2)$ to row 4.
$$\begin{pmatrix}1 & 2 & 0 & 0\\ 0 & 2 & 2 & 2\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0\end{pmatrix}$$
So the rank is 3, the number of pivots. There is 1 pivotless column. The image is 3-dimensional, while the kernel is 1-dimensional.
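A quick numerical cross-check of this count (my own, not from the text): the rank gives the dimension of the image, and the number of columns minus the rank gives the dimension of the kernel.

    import numpy as np

    A = np.array([[0, 2, 2, 2],
                  [1, 2, 0, 0],
                  [0, 1, 2, 2],
                  [0, 2, 2, 2]], dtype=float)
    rank = np.linalg.matrix_rank(A)
    print(rank, A.shape[1] - rank)   # 3 1: a 3-dimensional image and a 1-dimensional kernel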

10.23. You could try
$$A = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}, \quad B = \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix}, \quad C = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}.$$
The image of $A$ is the span of the columns, so the span of $\begin{pmatrix}1\\0\end{pmatrix}$, while the image of $B$ is the span of its columns, so the span of $\begin{pmatrix}0\\1\end{pmatrix}$, which are clearly not the same subspaces, but both one dimensional.
10.24. The rank of $A^t$ is the number of linearly independent columns of $A^t$ (rows of $A$). Let $U$ be the forward elimination of $A$. When we compute forward elimination, we add rows to other rows, and swap rows, so the rows of $U$ are linear combinations of the rows of $A$, and vice versa. Thus the rank of $A^t$ is the number of linearly independent rows of $U$. None of the zero rows of $U$ count toward this number, while pivot rows are clearly linearly independent. Therefore the rank of $A^t$ is the number of pivots, so equal to the rank of $A$.
10.31. If they both have solutions, then $A^t y = 0$ so $y^t A = 0$, so $y^t Ax = 0$. But $y^t Ax = y^t b$, so $b^t y = 0$, a contradiction.
On the other hand, if neither has a solution, then $b$ is not in the image of $A$. So $b$ is not a linear combination of the columns of $A$, and the matrix $M = \begin{pmatrix}A & b\end{pmatrix}$ has rank 1 higher than the matrix $A$. Therefore the matrix $M^t = \begin{pmatrix}A^t\\ b^t\end{pmatrix}$ also has rank one higher than $A^t$. So the dimension of the kernel of $M^t$ is one lower than the dimension of the kernel of $A^t$, and therefore there is a vector $y$ in the kernel of $A^t$ but not in the kernel of $M^t$.
10.34. $A + B = \begin{pmatrix}1 & 1\end{pmatrix}\begin{pmatrix}A\\ B\end{pmatrix}$, so you can use the previous exercise.
10.35. (a) only
10.36. The rank of $AB$ is the dimension of the image. But $(AB)x = A(Bx)$, so every vector in the image of $AB$ lies in the image of $A$. The clever bit: on the other hand, the rank of $AB$ is the same as the rank of $(AB)^t$, as shown in problem 10.24 on page 97. But $(AB)^t = B^t A^t$, so the rank is at most the rank of $B^t$, which is the rank of $B$.

11 Eigenvalues and Eigenvectors
11.2. The determinant $\det(A - \lambda I)$ is the determinant of an upper (lower) triangular matrix if $A$ is upper (lower) triangular, with diagonal entries $A_{jj} - \lambda$.
11.3. You could try $A = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}$, which has eigenvalues $\lambda = 1$ and $\lambda = -1$.
11.4. You could try
$$A = \begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}, \quad B = \begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix}, \quad A + B = \begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix}.$$
Their eigenvalues are: for $A$, $\lambda = 1$; for $B$, $\lambda = 1$; for $A + B$, $\lambda = 1$ or $\lambda = 3$.
11.5. Suppose that $A$ is an $n \times n$ matrix. The determinant of a matrix $A$ is a sum of terms, each one linear in any of the entries of $A$ which appear in it. The characteristic polynomial $\det(A - \lambda I)$ therefore involves at most $n$ terms with a $\lambda$ in them, coming from the $n$ diagonal entries of $A - \lambda I$, so a polynomial in $\lambda$ of degree at most $n$.
11.6. $(-1)^n \lambda^n$
11.7. $A$: 0, 5; $B$: $-1$, 3; $C$: 0, 1; $D$: 0, 0, 2.
11.8. $\det A = \det A^t$ for any square matrix $A$, so $I^t = I$ and $\det(A - \lambda I) = \det(A^t - \lambda I)$.
11.9.
$$\det\left(F^{-1}AF - \lambda I\right) = \det\left(F^{-1}(A - \lambda I)F\right) = \det\left(F^{-1}\right)\det(A - \lambda I)\det(F) = \det(A - \lambda I).$$
11.11. If $A$ is invertible, then
$$\det(AB - \lambda I) = \det\left(A^{-1}(AB - \lambda I)A\right) = \det(BA - \lambda I).$$
If $B$ is invertible, the same trick works. But if neither $A$ nor $B$ is invertible, we have to work harder. Pick $\alpha$ any number which is not an eigenvalue of $A$. Then $A - \alpha I$ is invertible, so $(A - \alpha I)B$ and $B(A - \alpha I)$ have the same eigenvalues. Therefore
$$\det((A - \alpha I)B - \lambda I) = \det(B(A - \alpha I) - \lambda I)$$
as polynomials in $\alpha$. Now we can plug in $\alpha = 0$.
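A random numerical test of this fact (my own sketch, not part of the text): $AB$ and $BA$ have the same eigenvalues even when $A$ is singular.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))
    A[:, 0] = 0                          # make A singular on purpose
    print(np.allclose(np.sort_complex(np.linalg.eigvals(A @ B)),
                      np.sort_complex(np.linalg.eigvals(B @ A))))   # True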

11.13. 0, 3
11.16.
a. $s_n(A) = 1$
b. $s_0(A) = \det(A - 0) = \det A$
c. The expression $\det A$ is a sum of terms, each a product of precisely $n$ entries of $A$, as we have seen. So $\det(A - \lambda I)$ is a sum of terms, each involving some $A$ entries and some $\lambda$'s, with a total of $n$ factors in each term.
d. If $A$ is upper triangular, or lower triangular, then the result is obvious. But each term of $s_{n-1}(A)$ involves precisely one entry of $A$, hence linear in those entries. So $s_{n-1}(A)$ is a linear function of $A$, i.e. $s_{n-1}(A + B) = s_{n-1}(A) + s_{n-1}(B)$. We can write any matrix as a sum of a lower triangular and an upper triangular.
e. Follows immediately from problem 11.9.
f. Clearly $F^{-1}AFe_j = F^{-1}Au_j = 0$ for $j > r$. So we get zeros just where we need them, to be able to write
$$F^{-1}AF = \begin{pmatrix}P & 0\\ Q & 0\end{pmatrix}.$$
g. $P$ must have rank $r$, since there is nowhere else to put the $r$ pivots, so $P$ is invertible. Since $A$ has rank $r$, so must $F^{-1}AF$.
$$\det(A - \lambda I) = \det\begin{pmatrix}P - \lambda I & 0\\ Q & -\lambda I\end{pmatrix} = \det(P - \lambda I)\,(-\lambda)^{n-r}$$
has no terms of degree less than $n - r$ in $\lambda$.
h. You could try $A = \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}$, $B = \begin{pmatrix}0 & 0\\ 0 & 0\end{pmatrix}$.
11.17. The $\lambda = 3$-eigenvectors are multiples of $x = \begin{pmatrix}0\\ 1\end{pmatrix}$.
11.20. $\lambda = 1$: $\begin{pmatrix}0\\ 1\end{pmatrix}$
11.21. $\lambda = -2$: $\begin{pmatrix}-\tfrac32\\ 1\end{pmatrix}$; $\lambda = 3$: $\begin{pmatrix}1\\ 1\end{pmatrix}$
11.22. $\lambda = 1$: $\begin{pmatrix}0\\ 1\\ 0\end{pmatrix}$; $\lambda = 2$: $\begin{pmatrix}-\tfrac12\\ 2\\ 1\end{pmatrix}$; $\lambda = 3$: $\begin{pmatrix}0\\ \tfrac32\\ 1\end{pmatrix}$

12 Bases of Eigenvectors
12.1. The kernel of any matrix is a subspace.
12.2. $Ax = \lambda x$ so $ABx = BAx = B\lambda x = \lambda Bx$.
12.3. Depending on the order in which you write down your eigenvalues, you could get:
$$F = \begin{pmatrix}-1 & 0 & 0\\ 0 & 1 & 0\\ 1 & 0 & 1\end{pmatrix}, \quad F^{-1} = \begin{pmatrix}-1 & 0 & 0\\ 0 & 1 & 0\\ 1 & 0 & 1\end{pmatrix}, \quad F^{-1}AF = \begin{pmatrix}1 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 3\end{pmatrix}.$$
12.4. Again, it depends on the order you choose to write the eigenvalues, and the order in which you write down basis vectors for each eigenspace. You could get:
$$\lambda = -3: \begin{pmatrix}-1\\0\\1\end{pmatrix}, \quad \lambda = -2: \begin{pmatrix}-2\\1\\0\end{pmatrix}, \quad \lambda = 0: \begin{pmatrix}0\\-1\\1\end{pmatrix},$$
$$F = \begin{pmatrix}-1 & -2 & 0\\ 0 & 1 & -1\\ 1 & 0 & 1\end{pmatrix}, \quad F^{-1} = \begin{pmatrix}1 & 2 & 2\\ -1 & -1 & -1\\ -1 & -2 & -1\end{pmatrix}, \quad F^{-1}AF = \begin{pmatrix}-3 & 0 & 0\\ 0 & -2 & 0\\ 0 & 0 & 0\end{pmatrix}.$$
12.5. You could get
$$\lambda = 2: \begin{pmatrix}-1\\0\\1\end{pmatrix}, \begin{pmatrix}1\\1\\0\end{pmatrix}, \quad \lambda = -4: \begin{pmatrix}\tfrac12\\ \tfrac12\\ 1\end{pmatrix},$$
$$F = \begin{pmatrix}-1 & 1 & \tfrac12\\ 0 & 1 & \tfrac12\\ 1 & 0 & 1\end{pmatrix}, \quad F^{-1} = \begin{pmatrix}-1 & 1 & 0\\ -\tfrac12 & \tfrac32 & -\tfrac12\\ 1 & -1 & 1\end{pmatrix}, \quad F^{-1}AF = \begin{pmatrix}2 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & -4\end{pmatrix}.$$
12.6. Suppose we have a matrix $F$ for which
$$F^{-1}AF = \begin{pmatrix}\lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n\end{pmatrix}.$$
Call this diagonal matrix $\Lambda$. Therefore $AF = F\Lambda$. Let's check that the columns of $F$ are eigenvectors. We need only see that $F\Lambda$ is just $F$ with columns scaled by the diagonal entries of $\Lambda$.
12.7. You could try $A = \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}$ because it has only one eigenvector, $x = \begin{pmatrix}1\\ 0\end{pmatrix}$ (up to rescaling), with eigenvalue $\lambda = 0$. Therefore there is no basis of eigenvectors.
12.9. You could get
$$\lambda = -1: \begin{pmatrix}-1\\1\end{pmatrix}, \quad \lambda = 1: \begin{pmatrix}-\tfrac12\\1\end{pmatrix}, \quad F = \begin{pmatrix}-1 & -\tfrac12\\ 1 & 1\end{pmatrix}, \quad F^{-1} = \begin{pmatrix}-2 & -1\\ 2 & 2\end{pmatrix}, \quad F^{-1}AF = \begin{pmatrix}-1 & 0\\ 0 & 1\end{pmatrix}.$$
Let $\Lambda = F^{-1}AF$. So $A = F\Lambda F^{-1}$. Clearly $\Lambda^{100000} = 1$, so $A^{100000} = F\Lambda^{100000}F^{-1} = 1$.
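A direct numerical check of this conclusion (my own sketch, not from the text): build $A = F\Lambda F^{-1}$ from the $F$ and $\Lambda$ above; every even power of $\Lambda$ is the identity, so the 100000th power of $A$ is the identity.

    import numpy as np

    F = np.array([[-1.0, -0.5],
                  [ 1.0,  1.0]])
    Lam = np.diag([-1.0, 1.0])
    A = F @ Lam @ np.linalg.inv(F)
    print(np.allclose(np.linalg.matrix_power(A, 100000), np.eye(2)))   # True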

12.11.
$$A = \frac15\begin{pmatrix}13 & -4\\ -4 & 7\end{pmatrix}$$
12.12.
$$A = \begin{pmatrix}\tfrac12 & \tfrac14 & 0\\ \tfrac14 & \tfrac12 & 0\\ \tfrac14 & \tfrac14 & 1\end{pmatrix},$$
$$\lambda = \tfrac34: \; v_1 = \begin{pmatrix}-\tfrac12\\ -\tfrac12\\ 1\end{pmatrix}, \quad \lambda = 1: \; v_2 = \begin{pmatrix}0\\0\\1\end{pmatrix}, \quad \lambda = \tfrac14: \; v_3 = \begin{pmatrix}-1\\1\\0\end{pmatrix},$$
$$F = \begin{pmatrix}-\tfrac12 & 0 & -1\\ -\tfrac12 & 0 & 1\\ 1 & 1 & 0\end{pmatrix}, \quad F^{-1} = \begin{pmatrix}-1 & -1 & 0\\ 1 & 1 & 1\\ -\tfrac12 & \tfrac12 & 0\end{pmatrix}, \quad F^{-1}AF = \begin{pmatrix}\tfrac34 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & \tfrac14\end{pmatrix}.$$
Consider the coordinates of the vector $\begin{pmatrix}h_0\\ s_0\\ d_0\end{pmatrix}$. Numbers of people can't be negative. Clearly the vector must be a linear combination of eigenvectors, with a positive coefficient for the $\lambda = 1$ eigenvector:
$$\begin{pmatrix}h_0\\ s_0\\ d_0\end{pmatrix} = a_1 v_1 + a_2 v_2 + a_3 v_3,$$
with $a_2 > 0$. Over time, the numbers develop according to
$$\begin{pmatrix}h_n\\ s_n\\ d_n\end{pmatrix} = A\begin{pmatrix}h_{n-1}\\ s_{n-1}\\ d_{n-1}\end{pmatrix} = A^n\begin{pmatrix}h_0\\ s_0\\ d_0\end{pmatrix} = \left(\tfrac34\right)^n a_1 v_1 + a_2 v_2 + \left(\tfrac14\right)^n a_3 v_3.$$
Since the other eigenvalues are smaller than 1, their powers become very small, and their components in the resulting vector gradually decay away. Therefore the result becomes ever closer to
$$a_2 v_2 = \begin{pmatrix}0\\0\\a_2\end{pmatrix},$$
everybody dead. So everyone dies, in an exponential decay of population. (It should be obvious, because we didn't allow any births in our model.)
12.15. See table A.1.<br />

Table A.1: Invertibility criteria. A is n × n of rank r. U is any<br />

matrix obtained from A by forward elimination.<br />

§ Invertible Not invertible<br />

5.1 Gauss–Jordan on A yields 1. Gauss–Jordan on A yields a<br />

matrix with n − r zero rows.<br />

5.2 U is invertible. U has n − r zero rows.


Hints 367<br />

§ Invertible Not invertible<br />

5.2 Pivots lie on the diagonal. Some pivot lies above the diagonal,<br />

and all pivots after it.<br />

5.2 U has no zero rows U has n − r zero rows.<br />

5.2 U has n pivots. U has r < n pivots.<br />

5.2 Ax = b has a solution x for<br />

each b.<br />

5.2 Ax = b has exactly one solution<br />

x for each b.<br />

5.2 Ax = b has exactly one solution<br />

x for some b.<br />

Ax = b has no solution for<br />

some b, n−r dimensions worth<br />

for other b.<br />

Ax = b has no solution for<br />

some b, many for other b.<br />

Ax = b has no solution for<br />

some b, many for other b.<br />

5.2 Ax = 0 only for x = 0. Ax = 0 for many x.<br />

5.2 A has rank n. A has rank r < n.<br />

7.4 A t is invertible. A t is not invertible.<br />

7.4 det A ≠ 0. Every square block larger<br />

than r × r has det = 0.<br />

9.2 The columns are linearly independent.<br />

The n − r pivotless columns<br />

are linear combinations of the<br />

r pivot columns.<br />

9.2 The columns form a basis. Each of the n − r pivotless<br />

columns is a linear combination<br />

of earlier pivot columns.<br />

9.2 The rows form a basis. One row is a linear combination<br />

of earlier rows.<br />

10.2 The kernel of A is just the 0<br />

vector.<br />

The kernel has positive dimension<br />

n − r.<br />

10.3 The image of A is all of R n . The image has positive dimension<br />

r.<br />

11.1 0 is not an eigenvalue of A. The λ = 0 eigenspace has positive<br />

dimension n − r.<br />

13 Inner Product
13.1. $\langle Ae_j, e_i\rangle = \langle A_{kj}e_k, e_i\rangle = A_{ij}$.
13.2. $x^t y = \sum_k x^t_k y_k = \sum_k x_k y_k$.
13.3.
$$P_{ij} = \langle Pe_i, e_j\rangle = \langle e_{p(i)}, e_j\rangle = \begin{cases}1, & \text{if } p(i) = j,\\ 0, & \text{otherwise,}\end{cases} = \begin{cases}1, & \text{if } i = p^{-1}(j),\\ 0, & \text{otherwise,}\end{cases} = \langle e_i, e_{p^{-1}(j)}\rangle = \langle e_i, P^{-1}e_j\rangle = P^{-1}_{ji}.$$
13.4.
$$\left\langle v - \frac{\langle v, u\rangle}{\|u\|^2}u,\; u\right\rangle = \langle v, u\rangle - \frac{\langle v, u\rangle}{\|u\|^2}\langle u, u\rangle = \langle v, u\rangle - \langle v, u\rangle = 0.$$
13.5.<br />

(a) One: x = 0.<br />

(b) 2n: one x i is ±1, all other x j are 0.<br />

(c) 2n + 16 ( n<br />

4)<br />

: one xi is ±2, all other x j are 0, or 4 x i ’s are ±1 and all other<br />

are 0.<br />

(d) 2n + 8(n − 2) ( )<br />

n<br />

2 + 2 6 (n − 5) ( )<br />

n<br />

( 5 + 2<br />

9 n<br />

9)<br />

:<br />

a. one ±3 or<br />

b. two ±2’s and one ±1 or<br />

c. one ±2 and five ±1’s or<br />

d. nine ±1’s.<br />

13.6. The hour hand starts at angle π/2, and completes a revolution every<br />

12 hours. So the hour hand is at an angle of<br />

θ = π 2 − 2π<br />

12 t,<br />

after t hours. The minute hand, if we measure time in hours, revolves every<br />

hour, so has angle<br />

θ = π 2 − 2πt.


Hints 369<br />

a. t = 3<br />

11<br />

(2k + 1), any integer k<br />

b. t = 6k<br />

11<br />

, any integer k.<br />

13.11. You could try:<br />

( ) ( ) ( )<br />

0 1 0 1<br />

1 2<br />

A =<br />

, B =<br />

, AB =<br />

.<br />

1 1 1 2<br />

1 3<br />

13.12. For A symmetric.<br />

13.19. Those which have ±1 in each diagonal entry.<br />

13.20.<br />

P =<br />

( )<br />

1 0<br />

0 −1<br />

is not of that form.<br />

13.21. You could try:<br />

( )<br />

0 1<br />

A =<br />

, B = A.<br />

−1 0<br />

13.22.<br />

〈x, y〉 = 1 2<br />

(<br />

‖x + y‖ 2 − ‖x‖ 2 − ‖y‖ 2) .<br />

Preserve the right hand side, and you must preserve the left hand side.<br />

13.25. The rows of A are orthonormal just when A t is orthogonal, which<br />

occurs just when A = (A t ) −1 , which occurs just when A t = A −1 , which occurs<br />

just when A is orthogonal.<br />

13.26. See table A.2 on the following page and table A.3 on the next page.<br />

13.27.
$$u_1 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\\ 0\end{pmatrix}, \quad u_2 = \begin{pmatrix}\tfrac{\sqrt6}{6}\\ -\tfrac{\sqrt6}{6}\\ \tfrac{\sqrt6}{3}\end{pmatrix}, \quad u_3 = \begin{pmatrix}-\tfrac{\sqrt3}{3}\\ \tfrac{\sqrt3}{3}\\ \tfrac{\sqrt3}{3}\end{pmatrix}.$$

13.28. The pictures should look like the five panels shown in the figures below: the original vectors; project the second vector perpendicular to the first; projected; shrink/stretch all vectors to length 1; done: orthonormal. The computations are in tables A.2 and A.3.

Table A.2: Orthogonalizing vectors: the projections.
$$w_1 = v_1 = \begin{pmatrix}1\\-1\\0\end{pmatrix}, \qquad w_2 = v_2 - \frac{\langle v_2, w_1\rangle}{\langle w_1, w_1\rangle}w_1 = \begin{pmatrix}2\\0\\-2\end{pmatrix} - \frac{(2)(1) + (0)(-1) + (-2)(0)}{(1)(1) + (-1)(-1) + (0)(0)}\begin{pmatrix}1\\-1\\0\end{pmatrix} = \begin{pmatrix}1\\1\\-2\end{pmatrix}.$$

Table A.3: Orthogonalizing vectors: rescaling.
$$u_1 = \frac{w_1}{\sqrt{\langle w_1, w_1\rangle}} = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ -\tfrac{\sqrt2}{2}\\ 0\end{pmatrix}, \qquad u_2 = \frac{w_2}{\sqrt{\langle w_2, w_2\rangle}} = \begin{pmatrix}\tfrac{\sqrt6}{6}\\ \tfrac{\sqrt6}{6}\\ -\tfrac{\sqrt6}{3}\end{pmatrix}.$$

13.29.
$$u_1 = \begin{pmatrix}\tfrac{1}{\sqrt6}\\ \tfrac{1}{\sqrt6}\\ \tfrac{2}{\sqrt6}\end{pmatrix}, \quad u_2 = \begin{pmatrix}\tfrac{1}{\sqrt3}\\ \tfrac{1}{\sqrt3}\\ -\tfrac{1}{\sqrt3}\end{pmatrix}.$$
Notice that $v_1, v_2, v_3$ did not give a basis, so when we try to compute $u_3$, we run into trouble.
13.30. The only problem that can come up is division by zero. But that happens only when we divide by a length $\|w_j\|$. If this length is 0, then $w_j$ is zero, so $v_j = \sum_i \langle v_j, u_i\rangle u_i$. But each $u_i$ is a linear combination of vectors $v_1, v_2, \dots, v_i$, so this is a linear dependence.
13.32. If $v$ is perpendicular to $u$, then set $w = v$, see that $v$ is perpendicular to $v$, so $v = 0$. Otherwise, if $v$ is not perpendicular to $u$, then $w = v - \frac{\langle v, u\rangle}{\|u\|^2}u$ is perpendicular to $u$ and $v$, and therefore perpendicular to any linear combination of $u$ and $v$. In particular, since $w$ is a linear combination of $u$ and $v$, $w$ must be perpendicular to $w$, so $w = 0$. So $v = \frac{\langle v, u\rangle}{\|u\|^2}u$.
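The worked examples in hints 13.33 through 13.70 below all follow the same two steps: project, then rescale. A compact sketch of the procedure (mine, not the book's code) that reproduces them:

    import numpy as np

    def gram_schmidt(vectors):
        basis = []
        for v in vectors:
            w = np.asarray(v, dtype=float)
            for u in basis:
                w = w - (w @ u) * u            # subtract the projection onto u
            if np.linalg.norm(w) > 1e-12:      # a zero w signals a linear dependence
                basis.append(w / np.linalg.norm(w))
        return basis

    # Hint 13.33: v1 = (1, -1), v2 = (5, 3).
    for u in gram_schmidt([(1, -1), (5, 3)]):
        print(u)    # roughly (0.707, -0.707) then (0.707, 0.707)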


13.33. $w_1 = v_1 = \begin{pmatrix}1\\-1\end{pmatrix}$, $w_2 = v_2 - \frac{\langle v_2, w_1\rangle}{\langle w_1, w_1\rangle}w_1 = \begin{pmatrix}5\\3\end{pmatrix} - \frac{2}{2}\begin{pmatrix}1\\-1\end{pmatrix} = \begin{pmatrix}4\\4\end{pmatrix}$; $u_1 = \frac{w_1}{\|w_1\|} = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ -\tfrac{\sqrt2}{2}\end{pmatrix}$, $u_2 = \frac{w_2}{\|w_2\|} = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$. See figure A.2: orthogonalizing vectors in the plane.
13.34. $w_1 = v_1 = \begin{pmatrix}-1\\-1\end{pmatrix}$, $w_2 = \begin{pmatrix}0\\2\end{pmatrix} - \frac{-2}{2}\begin{pmatrix}-1\\-1\end{pmatrix} = \begin{pmatrix}-1\\1\end{pmatrix}$; $u_1 = \begin{pmatrix}-\tfrac{\sqrt2}{2}\\ -\tfrac{\sqrt2}{2}\end{pmatrix}$, $u_2 = \begin{pmatrix}-\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$. See figure A.3: orthogonalizing vectors in the plane.
13.35. $w_1 = v_1 = \begin{pmatrix}-1\\1\end{pmatrix}$, $w_2 = \begin{pmatrix}-1\\2\end{pmatrix} - \frac{3}{2}\begin{pmatrix}-1\\1\end{pmatrix} = \begin{pmatrix}\tfrac12\\ \tfrac12\end{pmatrix}$; $u_1 = \begin{pmatrix}-\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$, $u_2 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$. See figure A.4: orthogonalizing vectors in the plane.

13.36. $w_1 = v_1 = \begin{pmatrix}2\\-1\end{pmatrix}$, $w_2 = \begin{pmatrix}-1\\1\end{pmatrix} - \frac{-3}{5}\begin{pmatrix}2\\-1\end{pmatrix} = \begin{pmatrix}\tfrac15\\ \tfrac25\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{2\sqrt5}{5}\\ -\tfrac{\sqrt5}{5}\end{pmatrix}$, $u_2 = \begin{pmatrix}\tfrac{\sqrt5}{5}\\ \tfrac{2\sqrt5}{5}\end{pmatrix}$. See figure A.5: orthogonalizing vectors in the plane.
13.37. $w_1 = v_1 = \begin{pmatrix}1\\2\end{pmatrix}$, $w_2 = \begin{pmatrix}0\\2\end{pmatrix} - \frac{4}{5}\begin{pmatrix}1\\2\end{pmatrix} = \begin{pmatrix}-\tfrac45\\ \tfrac25\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{\sqrt5}{5}\\ \tfrac{2\sqrt5}{5}\end{pmatrix}$, $u_2 = \begin{pmatrix}-\tfrac{2\sqrt5}{5}\\ \tfrac{\sqrt5}{5}\end{pmatrix}$. See figure A.6: orthogonalizing vectors in the plane.
13.38. $w_1 = v_1 = \begin{pmatrix}1\\1\end{pmatrix}$, $w_2 = \begin{pmatrix}2\\0\end{pmatrix} - \frac{2}{2}\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}1\\-1\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$, $u_2 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ -\tfrac{\sqrt2}{2}\end{pmatrix}$. See figure A.7: orthogonalizing vectors in the plane.
See figure A.7.


13.39. $w_1 = v_1 = \begin{pmatrix}1\\1\end{pmatrix}$, $w_2 = \begin{pmatrix}0\\2\end{pmatrix} - \frac{2}{2}\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}-1\\1\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$, $u_2 = \begin{pmatrix}-\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$. See figure A.8: orthogonalizing vectors in the plane.
13.40. $w_1 = v_1 = \begin{pmatrix}1\\1\end{pmatrix}$, $w_2 = \begin{pmatrix}-1\\2\end{pmatrix} - \frac{1}{2}\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}-\tfrac32\\ \tfrac32\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$, $u_2 = \begin{pmatrix}-\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$. See figure A.9: orthogonalizing vectors in the plane.
13.41. $w_1 = v_1 = \begin{pmatrix}0\\1\end{pmatrix}$, $w_2 = \begin{pmatrix}-1\\1\end{pmatrix} - \frac{1}{1}\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}-1\\0\end{pmatrix}$; $u_1 = \begin{pmatrix}0\\1\end{pmatrix}$, $u_2 = \begin{pmatrix}-1\\0\end{pmatrix}$. See figure A.10: orthogonalizing vectors in the plane.

13.42. $w_1 = v_1 = \begin{pmatrix}2\\1\end{pmatrix}$, $w_2 = \begin{pmatrix}-1\\2\end{pmatrix} - \frac{0}{5}\begin{pmatrix}2\\1\end{pmatrix} = \begin{pmatrix}-1\\2\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{2\sqrt5}{5}\\ \tfrac{\sqrt5}{5}\end{pmatrix}$, $u_2 = \begin{pmatrix}-\tfrac{\sqrt5}{5}\\ \tfrac{2\sqrt5}{5}\end{pmatrix}$. See figure A.11: orthogonalizing vectors in the plane.
13.43. $w_1 = v_1 = \begin{pmatrix}-1\\0\end{pmatrix}$, $w_2 = \begin{pmatrix}1\\-1\end{pmatrix} - \frac{-1}{1}\begin{pmatrix}-1\\0\end{pmatrix} = \begin{pmatrix}0\\-1\end{pmatrix}$; $u_1 = \begin{pmatrix}-1\\0\end{pmatrix}$, $u_2 = \begin{pmatrix}0\\-1\end{pmatrix}$. See figure A.12: orthogonalizing vectors in the plane.
13.44. $w_1 = v_1 = \begin{pmatrix}2\\-1\end{pmatrix}$, $w_2 = \begin{pmatrix}0\\-1\end{pmatrix} - \frac{1}{5}\begin{pmatrix}2\\-1\end{pmatrix} = \begin{pmatrix}-\tfrac25\\ -\tfrac45\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{2\sqrt5}{5}\\ -\tfrac{\sqrt5}{5}\end{pmatrix}$, $u_2 = \begin{pmatrix}-\tfrac{\sqrt5}{5}\\ -\tfrac{2\sqrt5}{5}\end{pmatrix}$. See figure A.13: orthogonalizing vectors in the plane.


13.45. $w_1 = v_1 = \begin{pmatrix}1\\1\end{pmatrix}$, $w_2 = \begin{pmatrix}-1\\0\end{pmatrix} - \frac{-1}{2}\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}-\tfrac12\\ \tfrac12\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$, $u_2 = \begin{pmatrix}-\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$. See figure A.14: orthogonalizing vectors in the plane.
13.46. $w_1 = v_1 = \begin{pmatrix}1\\2\end{pmatrix}$, $w_2 = \begin{pmatrix}-1\\0\end{pmatrix} - \frac{-1}{5}\begin{pmatrix}1\\2\end{pmatrix} = \begin{pmatrix}-\tfrac45\\ \tfrac25\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{\sqrt5}{5}\\ \tfrac{2\sqrt5}{5}\end{pmatrix}$, $u_2 = \begin{pmatrix}-\tfrac{2\sqrt5}{5}\\ \tfrac{\sqrt5}{5}\end{pmatrix}$. See figure A.15: orthogonalizing vectors in the plane.
13.47. $w_1 = v_1 = \begin{pmatrix}1\\1\end{pmatrix}$, $w_2 = \begin{pmatrix}0\\-1\end{pmatrix} - \frac{-1}{2}\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}\tfrac12\\ -\tfrac12\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$, $u_2 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ -\tfrac{\sqrt2}{2}\end{pmatrix}$. See figure A.16: orthogonalizing vectors in the plane.
Figure A.16: Orthogonalizing vectors in the plane<br />

13.48.<br />

w 1 = v 1<br />

( )<br />

−1<br />

=<br />

2<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

( )<br />

1 (1)(−1) + (−1)(2) −1<br />

= −<br />

−1 (−1)(−1) + (2)(2) 2<br />

( )<br />

=<br />

2<br />

5<br />

1<br />

5


Hints 395<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.17: Orthogonalizing vectors in the plane<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1<br />

−1<br />

= √<br />

(−1)(−1) + (2)(2) 2<br />

( √ )<br />

− 1<br />

= 5 5<br />

√<br />

5<br />

2<br />

5<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1<br />

2<br />

= √<br />

5<br />

( 2 5 )( 2 5 ) + ( 1 5 )( 1 5 ) 1<br />

5<br />

( √ )<br />

2<br />

= 5 5<br />

√<br />

5<br />

1<br />

5<br />

See figure A.17.


396 Hints<br />

13.49.<br />

w 1 = v 1<br />

( )<br />

0<br />

=<br />

1<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

−1 (−1)(0) + (2)(1)<br />

= −<br />

2 (0)(0) + (1)(1)<br />

( )<br />

−1<br />

=<br />

0<br />

( )<br />

0<br />

1<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 0<br />

= √<br />

(0)(0) + (1)(1) 1<br />

( )<br />

0<br />

=<br />

1<br />

=<br />

=<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1<br />

−1<br />

√<br />

(−1)(−1) + (0)(0) 0<br />

(<br />

−1<br />

0<br />

)<br />

See figure A.18 on the facing page.


Hints 397<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.18: Orthogonalizing vectors in the plane<br />

13.50.<br />

w 1 = v 1<br />

( )<br />

1<br />

=<br />

1<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

( )<br />

0 (0)(1) + (2)(1) 1<br />

= −<br />

2 (1)(1) + (1)(1) 1<br />

( )<br />

−1<br />

=<br />

1


398 Hints<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.19: Orthogonalizing vectors in the plane<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 1<br />

= √<br />

(1)(1) + (1)(1) 1<br />

( √ )<br />

1<br />

= 2 2<br />

√<br />

2<br />

=<br />

=<br />

1<br />

2<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1<br />

−1<br />

√<br />

(−1)(−1) + (1)(1) 1<br />

( √<br />

− 1 2 2<br />

1<br />

2<br />

√<br />

2<br />

)<br />

See figure A.19.


13.51. $w_1 = v_1 = \begin{pmatrix}1\\-1\end{pmatrix}$, $w_2 = \begin{pmatrix}2\\-1\end{pmatrix} - \frac{3}{2}\begin{pmatrix}1\\-1\end{pmatrix} = \begin{pmatrix}\tfrac12\\ \tfrac12\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ -\tfrac{\sqrt2}{2}\end{pmatrix}$, $u_2 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$. See figure A.20: orthogonalizing vectors in the plane.
13.52. $w_1 = v_1 = \begin{pmatrix}1\\1\end{pmatrix}$, $w_2 = \begin{pmatrix}-1\\0\end{pmatrix} - \frac{-1}{2}\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}-\tfrac12\\ \tfrac12\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$, $u_2 = \begin{pmatrix}-\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$. See figure A.21: orthogonalizing vectors in the plane.
13.53. $w_1 = v_1 = \begin{pmatrix}1\\2\end{pmatrix}$, $w_2 = \begin{pmatrix}2\\2\end{pmatrix} - \frac{6}{5}\begin{pmatrix}1\\2\end{pmatrix} = \begin{pmatrix}\tfrac45\\ -\tfrac25\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{\sqrt5}{5}\\ \tfrac{2\sqrt5}{5}\end{pmatrix}$, $u_2 = \begin{pmatrix}\tfrac{2\sqrt5}{5}\\ -\tfrac{\sqrt5}{5}\end{pmatrix}$. See figure A.22: orthogonalizing vectors in the plane.

13.54. $w_1 = v_1 = \begin{pmatrix}2\\2\end{pmatrix}$, $w_2 = \begin{pmatrix}1\\2\end{pmatrix} - \frac{6}{8}\begin{pmatrix}2\\2\end{pmatrix} = \begin{pmatrix}-\tfrac12\\ \tfrac12\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$, $u_2 = \begin{pmatrix}-\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$. See figure A.23: orthogonalizing vectors in the plane.
13.55. $w_1 = v_1 = \begin{pmatrix}1\\0\end{pmatrix}$, $w_2 = \begin{pmatrix}0\\-1\end{pmatrix} - \frac{0}{1}\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}0\\-1\end{pmatrix}$; $u_1 = \begin{pmatrix}1\\0\end{pmatrix}$, $u_2 = \begin{pmatrix}0\\-1\end{pmatrix}$. See figure A.24: orthogonalizing vectors in the plane.
13.56. $w_1 = v_1 = \begin{pmatrix}1\\-1\end{pmatrix}$, $w_2 = \begin{pmatrix}1\\0\end{pmatrix} - \frac{1}{2}\begin{pmatrix}1\\-1\end{pmatrix} = \begin{pmatrix}\tfrac12\\ \tfrac12\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ -\tfrac{\sqrt2}{2}\end{pmatrix}$, $u_2 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$. See figure A.25: orthogonalizing vectors in the plane.


13.57. $w_1 = v_1 = \begin{pmatrix}1\\-1\end{pmatrix}$, $w_2 = \begin{pmatrix}2\\0\end{pmatrix} - \frac{2}{2}\begin{pmatrix}1\\-1\end{pmatrix} = \begin{pmatrix}1\\1\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ -\tfrac{\sqrt2}{2}\end{pmatrix}$, $u_2 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$. See figure A.26: orthogonalizing vectors in the plane.
13.58. $w_1 = v_1 = \begin{pmatrix}1\\1\end{pmatrix}$, $w_2 = \begin{pmatrix}2\\1\end{pmatrix} - \frac{3}{2}\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}\tfrac12\\ -\tfrac12\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$, $u_2 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ -\tfrac{\sqrt2}{2}\end{pmatrix}$. See figure A.27: orthogonalizing vectors in the plane.
13.59. $w_1 = v_1 = \begin{pmatrix}0\\2\end{pmatrix}$, $w_2 = \begin{pmatrix}1\\1\end{pmatrix} - \frac{2}{4}\begin{pmatrix}0\\2\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix}$; $u_1 = \begin{pmatrix}0\\1\end{pmatrix}$, $u_2 = \begin{pmatrix}1\\0\end{pmatrix}$. See figure A.28: orthogonalizing vectors in the plane.
13.60. $w_1 = v_1 = \begin{pmatrix}1\\1\end{pmatrix}$, $w_2 = \begin{pmatrix}2\\1\end{pmatrix} - \frac{3}{2}\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}\tfrac12\\ -\tfrac12\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ \tfrac{\sqrt2}{2}\end{pmatrix}$, $u_2 = \begin{pmatrix}\tfrac{\sqrt2}{2}\\ -\tfrac{\sqrt2}{2}\end{pmatrix}$. See figure A.29: orthogonalizing vectors in the plane.


13.61. $w_1 = v_1 = \begin{pmatrix}2\\1\end{pmatrix}$, $w_2 = \begin{pmatrix}1\\2\end{pmatrix} - \frac{4}{5}\begin{pmatrix}2\\1\end{pmatrix} = \begin{pmatrix}-\tfrac35\\ \tfrac65\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{2\sqrt5}{5}\\ \tfrac{\sqrt5}{5}\end{pmatrix}$, $u_2 = \begin{pmatrix}-\tfrac{\sqrt5}{5}\\ \tfrac{2\sqrt5}{5}\end{pmatrix}$. See figure A.30: orthogonalizing vectors in the plane.
13.62. $w_1 = v_1 = \begin{pmatrix}-1\\2\end{pmatrix}$, $w_2 = \begin{pmatrix}-1\\-1\end{pmatrix} - \frac{-1}{5}\begin{pmatrix}-1\\2\end{pmatrix} = \begin{pmatrix}-\tfrac65\\ -\tfrac35\end{pmatrix}$; $u_1 = \begin{pmatrix}-\tfrac{\sqrt5}{5}\\ \tfrac{2\sqrt5}{5}\end{pmatrix}$, $u_2 = \begin{pmatrix}-\tfrac{2\sqrt5}{5}\\ -\tfrac{\sqrt5}{5}\end{pmatrix}$. See figure A.31: orthogonalizing vectors in the plane.
13.63. $w_1 = v_1 = \begin{pmatrix}2\\1\end{pmatrix}$, $w_2 = \begin{pmatrix}2\\0\end{pmatrix} - \frac{4}{5}\begin{pmatrix}2\\1\end{pmatrix} = \begin{pmatrix}\tfrac25\\ -\tfrac45\end{pmatrix}$; $u_1 = \begin{pmatrix}\tfrac{2\sqrt5}{5}\\ \tfrac{\sqrt5}{5}\end{pmatrix}$, $u_2 = \begin{pmatrix}\tfrac{\sqrt5}{5}\\ -\tfrac{2\sqrt5}{5}\end{pmatrix}$. See figure A.32: orthogonalizing vectors in the plane.
Figure A.32: Orthogonalizing vectors in the plane<br />

13.64.<br />

w 1 = v 1<br />

( )<br />

−1<br />

=<br />

−1<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

( )<br />

2 (2)(−1) + (−1)(−1) −1<br />

= −<br />

−1 (−1)(−1) + (−1)(−1) −1<br />

( )<br />

=<br />

3<br />

2<br />

− 3 2


Hints 419<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.33: Orthogonalizing vectors in the plane<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1<br />

−1<br />

= √<br />

(−1)(−1) + (−1)(−1) −1<br />

( √ )<br />

− 1<br />

= 2 2<br />

√<br />

2<br />

− 1 2<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

(<br />

1<br />

= √<br />

( 3 2 )( 3 2 ) + (− 3 2 )(− 3 2 ) ( √ )<br />

1<br />

= 2 2<br />

√<br />

− 1 2 2<br />

3<br />

2<br />

− 3 2<br />

)<br />

See figure A.33.


420 Hints<br />

13.65.<br />

w 1 = v 1<br />

( )<br />

1<br />

=<br />

1<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

0 (0)(1) + (−1)(1)<br />

= −<br />

−1 (1)(1) + (1)(1)<br />

( )<br />

=<br />

1<br />

2<br />

− 1 2<br />

( )<br />

1<br />

1<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 1<br />

= √<br />

(1)(1) + (1)(1) 1<br />

( √ )<br />

1<br />

= 2 2<br />

√<br />

2<br />

1<br />

2<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

(<br />

1<br />

= √<br />

( 1 2 )( 1 2 ) + (− 1 2 )(− 1 2 ) ( √ )<br />

1<br />

= 2 2<br />

√<br />

− 1 2 2<br />

1<br />

2<br />

− 1 2<br />

)<br />

See figure A.34 on the facing page.


Hints 421<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.34: Orthogonalizing vectors in the plane<br />

13.66.<br />

w 1 = v 1<br />

( )<br />

−1<br />

=<br />

−1<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

( )<br />

2 (2)(−1) + (0)(−1) −1<br />

= −<br />

0 (−1)(−1) + (−1)(−1) −1<br />

( )<br />

1<br />

=<br />

−1


422 Hints<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.35: Orthogonalizing vectors in the plane<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1<br />

−1<br />

= √<br />

(−1)(−1) + (−1)(−1) −1<br />

( √ )<br />

− 1<br />

= 2 2<br />

√<br />

2<br />

=<br />

=<br />

− 1 2<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1<br />

1<br />

√<br />

(1)(1) + (−1)(−1) −1<br />

(<br />

1<br />

2<br />

√<br />

2<br />

− 1 2<br />

√<br />

2<br />

)<br />

See figure A.35.


Hints 423<br />

13.67.<br />

w 1 = v 1<br />

( )<br />

2<br />

=<br />

0<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

( )<br />

0 (0)(2) + (2)(0) 2<br />

= −<br />

2 (2)(2) + (0)(0) 0<br />

( )<br />

0<br />

=<br />

2<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 2<br />

= √<br />

(2)(2) + (0)(0) 0<br />

( )<br />

1<br />

=<br />

0<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1 0<br />

= √<br />

(0)(0) + (2)(2) 2<br />

( )<br />

0<br />

=<br />

1<br />

See figure A.36 on the following page.


424 Hints<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.36: Orthogonalizing vectors in the plane<br />

13.68.<br />

w 1 = v 1<br />

( )<br />

0<br />

=<br />

−1<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

( )<br />

1 (1)(0) + (1)(−1) 0<br />

= −<br />

1 (0)(0) + (−1)(−1) −1<br />

( )<br />

1<br />

=<br />

0


Hints 425<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.37: Orthogonalizing vectors in the plane<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1<br />

0<br />

= √<br />

(0)(0) + (−1)(−1) −1<br />

( )<br />

0<br />

=<br />

−1<br />

=<br />

=<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1 1<br />

√<br />

(1)(1) + (0)(0) 0<br />

(<br />

1<br />

0<br />

)<br />

See figure A.37.


426 Hints<br />

13.69.<br />

w 1 = v 1<br />

( )<br />

1<br />

=<br />

0<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

( )<br />

2 (2)(1) + (2)(0) 1<br />

= −<br />

2 (1)(1) + (0)(0) 0<br />

( )<br />

0<br />

=<br />

2<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 1<br />

= √<br />

(1)(1) + (0)(0) 0<br />

( )<br />

1<br />

=<br />

0<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1 0<br />

= √<br />

(0)(0) + (2)(2) 2<br />

( )<br />

0<br />

=<br />

1<br />

See figure A.38 on the facing page.


Hints 427<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.38: Orthogonalizing vectors in the plane<br />

13.70.<br />

w 1 = v 1<br />

( )<br />

0<br />

=<br />

2<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

2 (2)(0) + (−1)(2)<br />

= −<br />

−1 (0)(0) + (2)(2)<br />

( )<br />

2<br />

=<br />

0<br />

( )<br />

0<br />

2


428 Hints<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.39: Orthogonalizing vectors in the plane<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 0<br />

= √<br />

(0)(0) + (2)(2) 2<br />

( )<br />

0<br />

=<br />

1<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1 2<br />

= √<br />

(2)(2) + (0)(0) 0<br />

( )<br />

1<br />

=<br />

0<br />

See figure A.39.


Hints 429<br />

13.71.<br />

w 1 = v 1<br />

( )<br />

0<br />

=<br />

2<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

( )<br />

2 (2)(0) + (0)(2) 0<br />

= −<br />

0 (0)(0) + (2)(2) 2<br />

( )<br />

2<br />

=<br />

0<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 0<br />

= √<br />

(0)(0) + (2)(2) 2<br />

( )<br />

0<br />

=<br />

1<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1 2<br />

= √<br />

(2)(2) + (0)(0) 0<br />

( )<br />

1<br />

=<br />

0<br />

See figure A.40 on the following page.


430 Hints<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.40: Orthogonalizing vectors in the plane<br />

13.72.<br />

w 1 = v 1<br />

( )<br />

2<br />

=<br />

2<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

1 (1)(2) + (−1)(2)<br />

= −<br />

−1 (2)(2) + (2)(2)<br />

( )<br />

1<br />

=<br />

−1<br />

( )<br />

2<br />

2


Hints 431<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.41: Orthogonalizing vectors in the plane<br />

u 1 =<br />

=<br />

=<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 2<br />

√<br />

(2)(2) + (2)(2)<br />

(<br />

1<br />

√ )<br />

2 2<br />

√<br />

2<br />

1<br />

2<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1<br />

1<br />

= √<br />

(1)(1) + (−1)(−1) −1<br />

( √ )<br />

1<br />

= 2 2<br />

√<br />

− 1 2 2<br />

2<br />

See figure A.41.


432 Hints<br />

13.73.<br />

w 1 = v 1<br />

( )<br />

0<br />

=<br />

2<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

( )<br />

1 (1)(0) + (1)(2) 0<br />

= −<br />

1 (0)(0) + (2)(2) 2<br />

( )<br />

1<br />

=<br />

0<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 0<br />

= √<br />

(0)(0) + (2)(2) 2<br />

( )<br />

0<br />

=<br />

1<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1 1<br />

= √<br />

(1)(1) + (0)(0) 0<br />

( )<br />

1<br />

=<br />

0<br />

See figure A.42 on the facing page.


Hints 433<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.42: Orthogonalizing vectors in the plane<br />

13.74.<br />

w 1 = v 1<br />

( )<br />

2<br />

=<br />

0<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

−1 (−1)(2) + (−1)(0)<br />

= −<br />

−1 (2)(2) + (0)(0)<br />

( )<br />

0<br />

=<br />

−1<br />

( )<br />

2<br />

0


434 Hints<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.43: Orthogonalizing vectors in the plane<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 2<br />

= √<br />

(2)(2) + (0)(0) 0<br />

( )<br />

1<br />

=<br />

0<br />

=<br />

=<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1<br />

0<br />

√<br />

(0)(0) + (−1)(−1) −1<br />

(<br />

0<br />

−1<br />

)<br />

See figure A.43.


Hints 435<br />

13.75.<br />

w 1 = v 1<br />

( )<br />

−1<br />

=<br />

0<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

( )<br />

0 (0)(−1) + (1)(0) −1<br />

= −<br />

1 (−1)(−1) + (0)(0) 0<br />

( )<br />

0<br />

=<br />

1<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1<br />

−1<br />

= √<br />

(−1)(−1) + (0)(0) 0<br />

( )<br />

−1<br />

=<br />

0<br />

=<br />

=<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1 0<br />

√<br />

(0)(0) + (1)(1) 1<br />

(<br />

0<br />

1<br />

)<br />

See figure A.44 on the following page.


436 Hints<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.44: Orthogonalizing vectors in the plane<br />

13.76.<br />

w 1 = v 1<br />

( )<br />

2<br />

=<br />

−1<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

( )<br />

0 (0)(2) + (−1)(−1) 2<br />

= −<br />

−1 (2)(2) + (−1)(−1) −1<br />

( )<br />

=<br />

− 2 5<br />

− 4 5


Hints 437<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.45: Orthogonalizing vectors in the plane<br />

u 1 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1<br />

2<br />

= √<br />

(2)(2) + (−1)(−1) −1<br />

( √ )<br />

2<br />

= 5 5<br />

√<br />

− 1 5 5<br />

1<br />

u 2 = √<br />

〈w2 , w 2 〉 w 2<br />

(<br />

1<br />

= √<br />

(− 2 5 )(− 2 5 ) + (− 4 5 )(− 4 5 ) ( √ )<br />

− 1<br />

= 5 5<br />

√<br />

5<br />

− 2 5<br />

− 2 5<br />

− 4 5<br />

)<br />

See figure A.45.


438 Hints<br />

13.77.<br />

w 1 = v 1<br />

( )<br />

2<br />

=<br />

0<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

( )<br />

2 (2)(2) + (1)(0) 2<br />

= −<br />

1 (2)(2) + (0)(0) 0<br />

( )<br />

0<br />

=<br />

1<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 2<br />

= √<br />

(2)(2) + (0)(0) 0<br />

( )<br />

1<br />

=<br />

0<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1 0<br />

= √<br />

(0)(0) + (1)(1) 1<br />

( )<br />

0<br />

=<br />

1<br />

See figure A.46 on the facing page.


Hints 439<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.46: Orthogonalizing vectors in the plane<br />

13.78.<br />

w 1 = v 1<br />

( )<br />

2<br />

=<br />

2<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

−1 (−1)(2) + (1)(2)<br />

= −<br />

1 (2)(2) + (2)(2)<br />

( )<br />

−1<br />

=<br />

1<br />

( )<br />

2<br />

2


440 Hints<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.47: Orthogonalizing vectors in the plane<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 2<br />

= √<br />

(2)(2) + (2)(2) 2<br />

( √ )<br />

1<br />

= 2 2<br />

√<br />

2<br />

=<br />

=<br />

1<br />

2<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1<br />

−1<br />

√<br />

(−1)(−1) + (1)(1) 1<br />

( √<br />

− 1 2 2<br />

1<br />

2<br />

√<br />

2<br />

)<br />

See figure A.47.


Hints 441<br />

13.79.<br />

w 1 = v 1<br />

( )<br />

2<br />

=<br />

0<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

( )<br />

2 (2)(2) + (1)(0) 2<br />

= −<br />

1 (2)(2) + (0)(0) 0<br />

( )<br />

0<br />

=<br />

1<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 2<br />

= √<br />

(2)(2) + (0)(0) 0<br />

( )<br />

1<br />

=<br />

0<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

( )<br />

1 0<br />

= √<br />

(0)(0) + (1)(1) 1<br />

( )<br />

0<br />

=<br />

1<br />

See figure A.48 on the following page.


442 Hints<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.48: Orthogonalizing vectors in the plane<br />

13.80.<br />

w 1 = v 1<br />

( )<br />

1<br />

=<br />

2<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

0 (0)(1) + (−1)(2)<br />

= −<br />

−1 (1)(1) + (2)(2)<br />

( )<br />

=<br />

2<br />

5<br />

− 1 5<br />

( )<br />

1<br />

2


Hints 443<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.49: Orthogonalizing vectors in the plane<br />

u 1 =<br />

=<br />

=<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 1<br />

√<br />

(1)(1) + (2)(2)<br />

(<br />

1<br />

√ )<br />

5 5<br />

√<br />

5<br />

2<br />

5<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

(<br />

1<br />

= √<br />

( 2 5 )( 2 5 ) + (− 1 5 )(− 1 5 ) ( √ )<br />

2<br />

= 5 5<br />

√<br />

− 1 5 5<br />

2<br />

2<br />

5<br />

− 1 5<br />

)<br />

See figure A.49.


444 Hints<br />

13.81.<br />

w 1 = v 1<br />

( )<br />

1<br />

=<br />

2<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

( )<br />

1 (1)(1) + (0)(2) 1<br />

= −<br />

0 (1)(1) + (2)(2) 2<br />

( )<br />

=<br />

4<br />

5<br />

− 2 5<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 1<br />

= √<br />

(1)(1) + (2)(2) 2<br />

( √ )<br />

1<br />

= 5 5<br />

√<br />

5<br />

2<br />

5<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

(<br />

1<br />

= √<br />

( 4 5 )( 4 5 ) + (− 2 5 )(− 2 5 ) ( √ )<br />

2<br />

= 5 5<br />

√<br />

− 1 5 5<br />

4<br />

5<br />

− 2 5<br />

)<br />

See figure A.50 on the facing page.


Hints 445<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.50: Orthogonalizing vectors in the plane<br />

13.82.<br />

w 1 = v 1<br />

( )<br />

2<br />

=<br />

2<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

0 (0)(2) + (−1)(2)<br />

= −<br />

−1 (2)(2) + (2)(2)<br />

( )<br />

=<br />

1<br />

2<br />

− 1 2<br />

( )<br />

2<br />

2


446 Hints<br />

The original vectors.<br />

Project the second<br />

vector perpendicular to<br />

the first.<br />

Projected.<br />

Shrink/stretch all<br />

vectors to length 1.<br />

Done: orthonormal.<br />

Figure A.51: Orthogonalizing vectors in the plane<br />

u 1 =<br />

u 2 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1 2<br />

= √<br />

(2)(2) + (2)(2) 2<br />

( √ )<br />

1<br />

= 2 2<br />

√<br />

2<br />

1<br />

2<br />

1<br />

√<br />

〈w2 , w 2 〉 w 2<br />

(<br />

1<br />

= √<br />

( 1 2 )( 1 2 ) + (− 1 2 )(− 1 2 ) ( √ )<br />

1<br />

= 2 2<br />

√<br />

− 1 2 2<br />

1<br />

2<br />

− 1 2<br />

)<br />

See figure A.51.


Hints 447<br />

13.83.<br />

w 1 = v 1<br />

( )<br />

1<br />

=<br />

−1<br />

w 2 = v 2 − 〈v 2, w 1 〉<br />

〈w 1 , w 1 〉 w 1<br />

( )<br />

( )<br />

1 (1)(1) + (1)(−1) 1<br />

= −<br />

1 (1)(1) + (−1)(−1) −1<br />

( )<br />

1<br />

=<br />

1<br />

u 1 =<br />

1<br />

√<br />

〈w1 , w 1 〉 w 1<br />

( )<br />

1<br />

1<br />

= √<br />

(1)(1) + (−1)(−1) −1<br />

( √ )<br />

1<br />

= 2 2<br />

√<br />

− 1 2 2<br />

1<br />

u 2 = √<br />

〈w2 , w 2 〉 w 2<br />

=<br />

=<br />

( )<br />

1 1<br />

√<br />

(1)(1) + (1)(1)<br />

(<br />

1<br />

√ )<br />

2 2<br />

√<br />

2<br />

See figure A.52 on the following page.<br />

13.84. Nothing<br />

1<br />

2<br />

1<br />

14 The Spectral Theorem

14.1. If \(Au = \lambda u\) and \(Av = \mu v\), then
\[
\langle Au, v\rangle = \lambda \langle u, v\rangle = \langle u, Av\rangle = \mu \langle u, v\rangle .
\]
So \(\langle u, v\rangle = 0\).

Figure A.52: Orthogonalizing vectors in the plane.

14.2.
\[
\lambda = 16: \ \begin{pmatrix} -\frac{2}{3} \\ 1 \end{pmatrix}, \qquad
\lambda = 3: \ \begin{pmatrix} \frac{3}{2} \\ 1 \end{pmatrix}.
\]
Gram–Schmidt the eigenvectors, and
\[
F = \begin{pmatrix} -\frac{2}{\sqrt{13}} & \frac{3}{\sqrt{13}} \\ \frac{3}{\sqrt{13}} & \frac{2}{\sqrt{13}} \end{pmatrix},
\qquad
F^{-1} = \begin{pmatrix} -\frac{2}{\sqrt{13}} & \frac{3}{\sqrt{13}} \\ \frac{3}{\sqrt{13}} & \frac{2}{\sqrt{13}} \end{pmatrix},
\qquad
F^{-1} A F = \begin{pmatrix} 16 & 0 \\ 0 & 3 \end{pmatrix}.
\]
Each eigenvector comes from a different eigenvalue, so they are already perpendicular; you only have to rescale them to have unit length.
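A numerical cross-check (added here, not part of the original hint): the matrix A can be recovered from the data above as A = F diag(16, 3) F⁻¹ = [[7, −6], [−6, 12]]; treat that reconstruction as an assumption rather than a quote of the problem. numpy's symmetric eigensolver then reproduces the orthogonal diagonalization.

```python
import numpy as np

# A reconstructed from the hint via A = F diag(16, 3) F^{-1} (an assumption).
A = np.array([[7.0, -6.0],
              [-6.0, 12.0]])

eigenvalues, F = np.linalg.eigh(A)   # eigh returns orthonormal eigenvectors for a symmetric matrix
print(eigenvalues)                   # [ 3. 16.]  (eigh sorts eigenvalues in increasing order)
print(F.T @ A @ F)                   # diag(3, 16), up to rounding
print(F.T @ F)                       # identity: the columns of F are orthonormal
```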

14.4.
\[
\det(A - \lambda I) = \lambda^2 - 5\lambda = \lambda(\lambda - 5),
\qquad
\lambda = 5: \ \begin{pmatrix} -2 \\ 1 \end{pmatrix},
\qquad
\lambda = 0: \ \begin{pmatrix} \frac{1}{2} \\ 1 \end{pmatrix}.
\]
After orthogonalizing the eigenvectors,
\[
F = \begin{pmatrix} -\frac{2}{\sqrt5} & \frac{1}{\sqrt5} \\ \frac{1}{\sqrt5} & \frac{2}{\sqrt5} \end{pmatrix},
\qquad
F^{-1} = \begin{pmatrix} -\frac{2}{\sqrt5} & \frac{1}{\sqrt5} \\ \frac{1}{\sqrt5} & \frac{2}{\sqrt5} \end{pmatrix},
\qquad
F^{-1} A F = \begin{pmatrix} 5 & 0 \\ 0 & 0 \end{pmatrix}.
\]

14.5.
\[
\det(A - \lambda I) = \lambda^3 - 8\lambda^2 + 15\lambda = \lambda(\lambda - 3)(\lambda - 5),
\]
\[
\lambda = 3: \ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},
\qquad
\lambda = 0: \ \begin{pmatrix} \frac{1}{2} \\ 1 \\ 0 \end{pmatrix},
\qquad
\lambda = 5: \ \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix}.
\]
After orthogonalizing the eigenvectors,
\[
F = \begin{pmatrix} 0 & \frac{1}{\sqrt5} & -\frac{2}{\sqrt5} \\ 0 & \frac{2}{\sqrt5} & \frac{1}{\sqrt5} \\ 1 & 0 & 0 \end{pmatrix},
\qquad
F^{-1} = \begin{pmatrix} 0 & 0 & 1 \\ \frac{1}{\sqrt5} & \frac{2}{\sqrt5} & 0 \\ -\frac{2}{\sqrt5} & \frac{1}{\sqrt5} & 0 \end{pmatrix},
\qquad
F^{-1} A F = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 5 \end{pmatrix}.
\]

14.6.
\[
\det(A - \lambda I) = \lambda^3 - 4\lambda^2 + 5\lambda - 2 = (\lambda - 2)(\lambda - 1)^2,
\]
\[
\lambda = 1: \ \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix},
\qquad
\lambda = 2: \ \begin{pmatrix} -\frac{1}{2} \\ \frac{1}{2} \\ 1 \end{pmatrix}.
\]
After orthogonalizing the eigenvectors,
\[
F = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt3} & -\frac{1}{\sqrt6} \\ \frac{1}{\sqrt2} & -\frac{1}{\sqrt3} & \frac{1}{\sqrt6} \\ 0 & \frac{1}{\sqrt3} & \frac{2}{\sqrt6} \end{pmatrix},
\qquad
F^{-1} = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 \\ \frac{1}{\sqrt3} & -\frac{1}{\sqrt3} & \frac{1}{\sqrt3} \\ -\frac{1}{\sqrt6} & \frac{1}{\sqrt6} & \frac{2}{\sqrt6} \end{pmatrix},
\qquad
F^{-1} A F = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}.
\]

14.7.
\[
\lambda = -1: \ \begin{pmatrix} -\frac{4}{3} \\ 0 \\ 1 \end{pmatrix},
\qquad
\lambda = 1: \ \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} \frac{3}{4} \\ 0 \\ 1 \end{pmatrix}.
\]
After orthogonalizing the eigenvectors,
\[
F = \begin{pmatrix} -\frac{4}{5} & 0 & \frac{3}{5} \\ 0 & 1 & 0 \\ \frac{3}{5} & 0 & \frac{4}{5} \end{pmatrix},
\qquad
F^{-1} = \begin{pmatrix} -\frac{4}{5} & 0 & \frac{3}{5} \\ 0 & 1 & 0 \\ \frac{3}{5} & 0 & \frac{4}{5} \end{pmatrix},
\qquad
F^{-1} A F = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]

14.8. Such an F must preserve the eigenspaces, so preserve the span of \(e_1\), the span of \(e_2\), and the span of \(e_3\). Therefore
\[
F = \begin{pmatrix} \pm 1 & & \\ & \pm 1 & \\ & & \pm 1 \end{pmatrix}.
\]

14.11. You only change \(x_1 x_2\) terms:
a. \(x_2^2\)
b. \(x_1^2 + x_2^2\)
c. \(x_1^2 + \frac{3}{2} x_1 x_2 + \frac{3}{2} x_2 x_1\)
d. \(x_1^2 + \frac{1}{2} x_1 x_2 + \frac{1}{2} x_2 x_1\)

14.12.
a. \(A = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}\)
b. \(A = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}\)
c. \(A = \begin{pmatrix} 1 & \frac{3}{2} \\ \frac{3}{2} & 0 \end{pmatrix}\)
d. \(A = \begin{pmatrix} 1 & \frac{1}{2} \\ \frac{1}{2} & 0 \end{pmatrix}\)
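A quick numerical sanity check (an addition, not from the text): taking the quadratic form in part (c) to be \(x_1^2 + 3x_1x_2\), as written symmetrically in hint 14.11(c), the matrix above reproduces it through \(Q(x) = x^t A x\).

```python
import numpy as np

# Symmetric matrix from hint 14.12(c); the form x1^2 + 3 x1 x2 comes from hint 14.11(c).
A = np.array([[1.0, 1.5],
              [1.5, 0.0]])

rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.standard_normal(2)
    q_direct = x[0]**2 + 3 * x[0] * x[1]
    q_matrix = x @ A @ x
    assert np.isclose(q_direct, q_matrix)
print("x^T A x matches the quadratic form at random sample points")
```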

14.31. Look at the eigenvalues of the symmetric matrix, to get started.
a. ellipse
b. hyperbola
c. pair of lines
d. hyperbola
e. line
f. empty set

14.32. First, by orthogonal transformations, you can get your quadratic form to look like
\[
\lambda_1 x_1^2 + \lambda_2 x_2^2 + \dots + \lambda_n x_n^2 .
\]
Then you can get every eigenvalue \(\lambda_j\) to be 0, 1 or −1, by rescaling the associated variable \(x_j\). Then you can permute the order of the variables. So you can get
\[
\pm x_1^2 \pm x_2^2 \pm \dots \pm x_s^2 ,
\]
for some s between 1 and n.

15 Complex Vectors

15.1. Take the complex number with half the argument and the square root of the modulus.

15.10. \(\langle z, w\rangle = 1(-i) + i(2 - 2i) = 2 + i\)

15.16. Since A is self-adjoint,
\[
\langle Az, z\rangle = \langle z, Az\rangle .
\]
If we pick z an eigenvector, with eigenvalue λ, then the left side becomes
\[
\langle Az, z\rangle = \lambda \langle z, z\rangle ,
\]
and the right side becomes
\[
\langle z, Az\rangle = \bar\lambda \langle z, z\rangle .
\]
Since \(\langle z, z\rangle = \|z\|^2 \ne 0\), we find \(\lambda = \bar\lambda\).

15.21. For an eigenvector z, with eigenvalue λ,
\[
\langle z, z\rangle = \langle Az, Az\rangle
= \langle \lambda z, \lambda z\rangle
= \lambda\bar\lambda \langle z, z\rangle
= |\lambda|^2 \langle z, z\rangle
\]
(the first equality because A is unitary). Since \(z \ne 0\), we can divide \(\langle z, z\rangle\) off of both sides.
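A small numerical illustration (added here, not part of the original hint): the eigenvalues of a unitary matrix all have modulus 1.

```python
import numpy as np

# A concrete unitary matrix: a plane rotation times a unit-modulus phase (illustrative choice).
theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]]) * np.exp(1j * 0.3)

assert np.allclose(U.conj().T @ U, np.eye(2))   # U is unitary: U* U = I
eigenvalues = np.linalg.eigvals(U)
print(np.abs(eigenvalues))                      # [1. 1.]: each eigenvalue has |lambda| = 1
```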

15.23.
\[
u_1 = \begin{pmatrix} \frac{1}{\sqrt2} \\ \frac{i}{\sqrt2} \end{pmatrix},
\qquad
u_2 = \begin{pmatrix} \frac{1+2i}{\sqrt{10}} \\ \frac{2-i}{\sqrt{10}} \end{pmatrix}.
\]

15.27.
a. A is self-adjoint.
b.
\[
\lambda = 3: \ \begin{pmatrix} \frac{1}{\sqrt2} \\ \frac{i}{\sqrt2} \end{pmatrix},
\qquad
\lambda = 4: \ \begin{pmatrix} \frac{1}{2}(1+i) \\ \frac{1}{2}(1-i) \end{pmatrix}.
\]
c.
\[
F = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{2}(1+i) \\ \frac{i}{\sqrt2} & \frac{1}{2}(1-i) \end{pmatrix},
\qquad
F^* A F = \begin{pmatrix} 3 & 0 \\ 0 & 4 \end{pmatrix}.
\]

16 Vector Spaces

16.1.
a. \(0\,v = (0 + 0)v = 0\,v + 0\,v\). Add the vector w for which \(0\,v + w = 0\) to both sides.
b. \(a\,0 = a(0 + 0) = a\,0 + a\,0\). Similar.

16.3. If there are two such vectors, \(w_1\) and \(w_2\), then \(v + w_1 = v + w_2 = 0\). Therefore
\[
w_1 = 0 + w_1
= (v + w_2) + w_1
= v + (w_2 + w_1)
= v + (w_1 + w_2)
= (v + w_1) + w_2
= 0 + w_2
= w_2 .
\]
Next,
\[
0 = (1 + (-1))v = 1\,v + (-1)v = v + (-1)v,
\]
so \((-1)v = -v\).

16.8. \(Ax = Sx\) and \(By = Ty\) for any x in \(\mathbb{R}^p\) and y in \(\mathbb{R}^q\). Therefore \(TSx = BSx = BAx\) for any x in \(\mathbb{R}^p\). So \((BA)e_j = TSe_j\), and therefore BA has the required columns to be the associated matrix.

16.10.
a. \(1x = 1y\) just when \(x = y\), since \(1x = x\).
b. Given any z in V, we find \(z = 1x\) just for \(x = z\).

16.11. Given z in V, we can find some x in U so that \(Tx = z\). If there are two choices for this x, say \(z = Tx = Ty\), then we know that \(x = y\). Therefore x is uniquely determined. So let \(T^{-1}z = x\). Clearly \(T^{-1}\) is uniquely determined, by the equation \(T^{-1}T = 1\), and satisfies \(TT^{-1} = 1\) too. Let's prove that \(T^{-1}\) is linear. Pick z and w in V. Then let \(Tx = z\) and \(Ty = w\). We know that x and y are uniquely determined by these equations. Since T is linear, \(Tx + Ty = T(x + y)\). This gives \(z + w = T(x + y)\). So
\[
T^{-1}(z + w) = T^{-1}T(x + y) = x + y = T^{-1}z + T^{-1}w.
\]
Similarly, \(T(ax) = aTx\), and taking \(T^{-1}\) of both sides gives \(aT^{-1}z = T^{-1}az\). So \(T^{-1}\) is linear.

16.12. Write \(p(x) = a + bx + cx^2\). Then
\[
Tp = \begin{pmatrix} a \\ a + b + c \\ a + 2b + 4c \end{pmatrix}.
\]
To solve for a, b, c in terms of p(0), p(1), p(2), we solve
\[
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
= \begin{pmatrix} p(0) \\ p(1) \\ p(2) \end{pmatrix}.
\]
Apply forward elimination:
\[
\left(\begin{array}{ccc|c} 1 & 0 & 0 & p(0) \\ 1 & 1 & 1 & p(1) \\ 1 & 2 & 4 & p(2) \end{array}\right),
\quad
\left(\begin{array}{ccc|c} 1 & 0 & 0 & p(0) \\ 0 & 1 & 1 & p(1) - p(0) \\ 0 & 2 & 4 & p(2) - p(0) \end{array}\right),
\quad
\left(\begin{array}{ccc|c} 1 & 0 & 0 & p(0) \\ 0 & 1 & 1 & p(1) - p(0) \\ 0 & 0 & 2 & p(2) - 2p(1) + p(0) \end{array}\right).
\]
Back substitute to find:
\[
c = \tfrac{1}{2} p(0) - p(1) + \tfrac{1}{2} p(2), \qquad
b = p(1) - p(0) - c = -\tfrac{3}{2} p(0) + 2 p(1) - \tfrac{1}{2} p(2), \qquad
a = p(0).
\]
Therefore we can recover \(p = a + bx + cx^2\) completely from knowing p(0), p(1), p(2), so T is one-to-one and onto.
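The same linear system can be solved numerically. The sketch below is an added check, not from the text; the sample values p(0) = 1, p(1) = 2, p(2) = 5 are an illustrative assumption.

```python
import numpy as np

# Recover the coefficients of p(x) = a + b x + c x^2 from its values at x = 0, 1, 2.
V = np.array([[1, 0, 0],
              [1, 1, 1],
              [1, 2, 4]], dtype=float)

p_values = np.array([1.0, 2.0, 5.0])      # illustrative values of p(0), p(1), p(2)
a, b, c = np.linalg.solve(V, p_values)
print(a, b, c)                            # 1.0 0.0 1.0, i.e. p(x) = 1 + x^2

# agrees with the back-substitution formulas of the hint
c_formula = 0.5 * p_values[0] - p_values[1] + 0.5 * p_values[2]
b_formula = -1.5 * p_values[0] + 2 * p_values[1] - 0.5 * p_values[2]
print(np.isclose(c, c_formula), np.isclose(b, b_formula))   # True True
```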

16.14. Some examples you might think of:
• The set of constant functions.
• The set of linear functions.
• The set of polynomial functions.
• The set of polynomial functions of degree at most d (for any fixed d).
• The set of functions f(x) which vanish when x < 0.
• The set of functions f(x) for which there is some interval outside of which f(x) vanishes.
• The set of functions f(x) for which f(x)p(x) goes to zero as x gets large, for every polynomial p(x).
• The set of functions which vanish at the origin.

16.15.
a. no
b. no
c. no
d. yes
e. no
f. yes
g. yes

16.16.
a. no
b. no
c. no
d. yes
e. yes
f. no

16.17.
a. If AH = HA and BH = HB, then (A + B)H = H(A + B), clearly. Similarly 0H = H0 = 0, and (cA)H = H(cA).
b. P is the set of diagonal 2 × 2 matrices.

16.18. You might take:
a. \(1, x, x^2, x^3\)
b. The six matrices with a single 1 and all other entries zero:
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
c. The set of matrices e(i, j) for i ≤ j, where e(i, j) has zeroes everywhere, except for a 1 at row i and column j.
d. A polynomial \(p(x) = a + bx + cx^2 + dx^3\) vanishes at the origin just when a = 0. A basis: \(x, x^2, x^3\).

16.19. If \(F : \mathbb{R}^p \to V\) and \(G : \mathbb{R}^q \to V\) are isomorphisms, then \(A = G^{-1}F : \mathbb{R}^p \to \mathbb{R}^q\) is an isomorphism. Hence A is a matrix with 0 kernel and image \(\mathbb{R}^q\). So A must have rank q. By theorem 10.7 on page 96, the dimension of the kernel plus the dimension of the image must be p. Therefore p = q.

16.21. The kernel of T is the set of x for which Tx = 0. But Tx = 0 implies \(Sx = P^{-1}Tx = 0\), and Sx = 0 implies \(Tx = PSx = 0\). So the same kernel. The image of S is the set of vectors of the form Sx, and each is carried by P to a vector of the form PSx = Tx. Conversely \(P^{-1}\) carries the image of T to the image of S. Check that this is an isomorphism.

16.23. If \(T : U \to V\) is an isomorphism, and \(F : \mathbb{R}^n \to U\) is an isomorphism, prove that \(TF : \mathbb{R}^n \to V\) is also an isomorphism.

16.24. The same proof as for \(\mathbb{R}^n\); see proposition 9.12 on page 87.

16.25. Take F and G two isomorphisms. Determinants of matrices multiply. Let A be the matrix associated to \(F^{-1}TF : \mathbb{R}^n \to \mathbb{R}^n\) and B the matrix associated to \(G^{-1}TG : \mathbb{R}^n \to \mathbb{R}^n\). Let C be the matrix associated to \(G^{-1}F\). Therefore \(CAC^{-1} = B\), and
\[
\det B = \det\left(CAC^{-1}\right) = \det C \,\det A \,(\det C)^{-1} = \det A.
\]

16.26.
a. \(T(p(x) + q(x)) = 2\,p(x-1) + 2\,q(x-1) = Tp(x) + Tq(x)\), and \(T(a\,p(x)) = 2a\,p(x-1) = a\,Tp(x)\).
b. If \(Tp(x) = 0\) then \(2\,p(x-1) = 0\), so \(p(x-1) = 0\) for any x, so \(p(x) = 0\) for any x. Therefore T has kernel {0}. As for the image, if q(x) is any polynomial of degree at most 2, then let \(p(x) = \frac{1}{2} q(x+1)\). Clearly \(Tp(x) = q(x)\). So T is onto.
c. To find the determinant, we need an isomorphism. Let \(F : \mathbb{R}^3 \to V\),
\[
F\begin{pmatrix} a \\ b \\ c \end{pmatrix} = a + bx + cx^2 .
\]
Calculate the matrix A of \(F^{-1}TF\) by
\[
F^{-1}TF\begin{pmatrix} a \\ b \\ c \end{pmatrix}
= F^{-1}T(a + bx + cx^2)
= F^{-1}\,2\left(a + b(x-1) + c(x-1)^2\right)
= F^{-1}\left((2a - 2b + 2c) + (2b - 4c)x + 2cx^2\right)
= \begin{pmatrix} 2a - 2b + 2c \\ 2b - 4c \\ 2c \end{pmatrix}.
\]
So the associated matrix is
\[
A = \begin{pmatrix} 2 & -2 & 2 \\ 0 & 2 & -4 \\ 0 & 0 & 2 \end{pmatrix},
\]
giving
\[
\det T = \det A = 8.
\]
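As a quick cross-check (added here, not in the text), the determinant of the associated matrix can be confirmed numerically.

```python
import numpy as np

# The matrix of F^{-1} T F found in hint 16.26(c).
A = np.array([[2.0, -2.0,  2.0],
              [0.0,  2.0, -4.0],
              [0.0,  0.0,  2.0]])

print(np.linalg.det(A))          # 8.0
print(np.prod(np.diag(A)))       # 8.0: product of the diagonal entries of a triangular matrix
```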

16.27. \(\det T = 2^n\)

16.28. \(\det T = \det A^2\). The eigenvalues of T are the eigenvalues of A. The eigenvectors with eigenvalue \(\lambda_j\) are spanned by
\[
\begin{pmatrix} x_j & 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 & x_j \end{pmatrix}.
\]

16.29. Let \(y_1\) and \(y_2\) be the eigenvectors of \(A^t\) with eigenvalues \(\lambda_1\) and \(\lambda_2\) respectively. Then the eigenvalues of T are those of A, with multiplicity 2, and the \(\lambda_i\)-eigenspace is spanned by
\[
\begin{pmatrix} y_i^t \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ y_i^t \end{pmatrix}.
\]

16.30. The characteristic polynomial is \(p(\lambda) = (\lambda + 1)(\lambda - 1)^2\). The 1-eigenspace is the space of polynomials q(x) for which \(q(-x) = q(x)\), so \(q(x) = a + cx^2\). This eigenspace is spanned by \(1, x^2\). The (−1)-eigenspace is the space of polynomials q(x) for which \(q(-x) = -q(x)\), so \(q(x) = bx\). The eigenspace is spanned by x. Indeed T is diagonalizable, and diagonalized by the isomorphism \(Fe_1 = 1\), \(Fe_2 = x\), \(Fe_3 = x^2\), for which
\[
F^{-1}TF = \begin{pmatrix} 1 & & \\ & -1 & \\ & & 1 \end{pmatrix}.
\]

16.31.
(a) A polynomial with average value 0 on some interval must take on the value 0 somewhere on that interval, being either 0 throughout the interval or positive somewhere and negative somewhere else. A polynomial in one variable can't have more zeros than its degree.
(b) It is enough to assume that the number of intervals is n, since if it is smaller, we can just add some more intervals and specify some more choices for average values on those intervals. But then Tp = 0 only for p = 0, so T is an isomorphism.

16.33. Clearly the expression is linear in p(z) and "conjugate linear" in q(z). Moreover, if \(\langle p(z), p(z)\rangle = 0\), then p(z) has roots at \(z_0, z_1, z_2\) and \(z_3\). But p(z) has degree at most 3, so has at most 3 roots or else is everywhere 0.

17 Fields

17.1. If there were two, say \(z_1\) and \(z_2\), then \(z_1 + z_2 = z_1\), but \(z_1 + z_2 = z_2 + z_1 = z_2\).

17.2. Same proof, but with · instead of +.

17.6. If p = ab, then in \(F_p\) arithmetic ab = p (mod p) = 0 (mod p). If a has a reciprocal, say c, then ab = 0 (mod p) implies that b = cab (mod p) = c0 (mod p) = 0 (mod p). So b is a multiple of p, and p is a multiple of b, so p = b and a = 1.

17.7. You find −21 as the answer from the Euclidean algorithm. But you can add 79 any number of times to the answer, to get it to be between 0 and 78, since we are working modulo 79, so the final answer is −21 + 79 = 58.

17.8. x = 3

17.9. It is easy to check that \(F_p\) satisfies addition laws, zero laws, multiplication laws, and the distributive law: each one holds in the integers, and to see that it holds in \(F_p\), we just keep track of multiples of p. For example, x + y in \(F_p\) is just addition up to a multiple of p, say x + y + ap, usual integer addition, some integer a. So (x + y) + z in \(F_p\) is (x + y) + z in the integers, up to a multiple of p, and so equals x + (y + z) up to a multiple of p, etc. The tricky bit is the reciprocal law. Since p is prime, nothing divides into p except p and 1. Therefore for any integer a between 1 and p − 1, the greatest common divisor gcd(a, p) is 1. The Euclidean algorithm computes integers u and v so that ua + vp = 1, so that ua = 1 (mod p). Adding or subtracting enough multiples of p to u, we find a reciprocal for a.
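The reciprocal law is exactly the extended Euclidean algorithm. Here is a small Python sketch, an addition to the text; the values a = 15 and p = 79 are an assumed example, chosen so the output matches the −21 → 58 answer quoted in hint 17.7.

```python
def extended_gcd(a, b):
    """Return (g, u, v) with u*a + v*b = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    return g, v, u - (a // b) * v

p = 79
a = 15                       # assumed example; any a with gcd(a, p) = 1 works
g, u, v = extended_gcd(a, p)
print(g, u, v)               # 1 -21 4, so (-21)*15 + 4*79 = 1
print(u % p)                 # 58, the reciprocal of 15 modulo 79
assert (a * (u % p)) % p == 1
```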


17.10. Gauss–Jordan elimination applied to (A  I):
\[
\left(\begin{array}{ccc|ccc}
0 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 1
\end{array}\right),
\quad
\left(\begin{array}{ccc|ccc}
1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1
\end{array}\right),
\quad
\left(\begin{array}{ccc|ccc}
1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 1
\end{array}\right),
\]
\[
\left(\begin{array}{ccc|ccc}
1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1
\end{array}\right),
\quad
\left(\begin{array}{ccc|ccc}
1 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1
\end{array}\right).
\]
So
\[
A^{-1} = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}.
\]
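A short verification over \(F_2\) (added here, not part of the original hint): multiplying A by the claimed inverse and reducing modulo 2 gives the identity.

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [1, 1, 0]])

A_inv = np.array([[1, 0, 1],
                  [1, 0, 0],
                  [1, 1, 1]])

# All arithmetic is in F_2, so reduce the product modulo 2.
print((A @ A_inv) % 2)
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]]
```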

17.11. Carry out Gauss–Jordan elimination thinking of the entries as living in the field of rational functions. The result has rational functions as entries. For any value of t for which none of the denominators of the entries are zero, and none of the pivot entries are zero, the rank is just the number of pivots. There are finitely many entries, and the denominator of each entry is a polynomial, so has finitely many roots.

18 Permutations and Determinants

18.3.
(a)
\[
Q(x_1, x_2, x_3, x_4) = (x_1 - x_2)(x_1 - x_3)(x_1 - x_4)(x_2 - x_3)(x_2 - x_4)(x_3 - x_4).
\]
(b) It is enough to show that the sign changes under a transposition swapping successive numbers, like swapping 1 and 2 or 2 and 3, etc. So let's swap j and j + 1. The terms that are affected: \(x_j - x_{j+1}\) has its sign changed, and \(x_i - x_j\) is swapped with \(x_i - x_{j+1}\) for i < j, and \(x_j - x_k\) is swapped with \(x_{j+1} - x_k\) for j + 1 < k.
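The sign of a permutation can be read off this product of differences directly; a small Python sketch, added here as an illustration rather than taken from the text:

```python
from itertools import permutations

def sign_from_product(p):
    """Sign of a permutation p of 0..n-1: each inverted pair (i < j with p[i] > p[j])
    flips the sign of one factor (x_i - x_j) in the product Q."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return (-1) ** inversions

# A transposition of adjacent entries always has sign -1, as argued in (b).
print(sign_from_product((1, 0, 2, 3)))     # -1
# Half of the 4! = 24 permutations are even, half are odd.
signs = [sign_from_product(p) for p in permutations(range(4))]
print(signs.count(1), signs.count(-1))     # 12 12
```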


18.6. Each permutation chooses somewhere for the number 1 to go to, for which there are n choices, and then there are n − 1 choices left for which it can put the number 2, etc.

18.12. \(A^{-1}\) is upper triangular, with 1's down the diagonal, and its entries above the diagonal are polynomials with integer coefficients, of degree at most 12.

20 Geometry and Orthogonal Matrices

20.2. Set X = x − z and Y = z − y.

20.4. The distances tell us that z lies on the line segment. But then
\[
\|z - y\| = t\,\|x - y\| ,
\]
so we recover t, and we know which point of the line segment we are at.

20.6. Suppose that A is a 2 × 2 orthogonal matrix. Then the columns of A are orthonormal. In particular, each column is a unit vector, so the first column can be written as
\[
\begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}
\]
for a unique angle θ (unique up to adding integer multiples of 2π). The second column must be perpendicular to the first, and of unit length, so at an angle of ±π/2 from the first, so
\[
\pm\begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}.
\]

20.8. Any vector x can be written as
\[
x = \langle x, u\rangle u + \langle x, v\rangle v + \left(x - \langle x, u\rangle u - \langle x, v\rangle v\right).
\]
Apply R to both sides:
\[
Rx = \langle x, u\rangle(\cos\theta\, u + \sin\theta\, v) + \langle x, v\rangle(-\sin\theta\, u + \cos\theta\, v) + \left(x - \langle x, u\rangle u - \langle x, v\rangle v\right).
\]

20.9. \(\cos\pi = -1\), \(\sin\pi = 0\)

21 Orthogonal Projections

21.1. Each element x of W yields a linear equation \(\langle x, y\rangle = 0\). The vectors y that satisfy all of these linear equations form precisely \(W^\perp\). Clearly if we add vectors perpendicular to such a vector x, or scale them, they remain perpendicular to x.

21.2. If we have two vectors, say \(v_1\) and \(v_2\) in W, then we can write each one uniquely in the form \(v_1 = x_1 + y_1\) and \(v_2 = x_2 + y_2\). But then \(v_1 + v_2 = (x_1 + x_2) + (y_1 + y_2)\) writes \(v_1 + v_2\) as a sum of a vector in W and a vector in \(W^\perp\). Uniqueness of such a representation tells us that
\[
P(v_1 + v_2) = Pv_1 + Pv_2 .
\]
A similar argument works for rescaling vectors.

21.3. All vectors in W and in \(W^\perp\) are perpendicular. So W lies in the set of vectors perpendicular to \(W^\perp\), i.e. in \(W^{\perp\perp}\). Conversely, take v any vector in \(W^{\perp\perp}\). Write v = x + y with x in W and y in \(W^\perp\). Then
\[
0 = \langle v, y\rangle = \langle x + y, y\rangle = \langle y, y\rangle
\]
(the first equality because v lies in \(W^{\perp\perp}\) while y lies in \(W^\perp\)). So y = 0.

21.4. To say that x and y are perpendicular is to say that \(\langle x, y\rangle = 0\). Then
\[
\|x + y\|^2 = \langle x + y, x + y\rangle = \langle x, x\rangle + \langle x, y\rangle + \langle y, x\rangle + \langle y, y\rangle = \langle x, x\rangle + \langle y, y\rangle = \|x\|^2 + \|y\|^2 .
\]

21.7. It is easy to show that the projection P to any subspace satisfies \(P = P^2 = P^t\). On the other hand, any linear map P with \(P = P^2 = P^t\) has some image, say W, a subspace of V. Take x + y any vector in V, with x in W and y in \(W^\perp\). We need to see that Px = x and Py = 0. Since W is the image of P, and x lies in W, we must have x = Pz for some vector z in V. Therefore \(Px = P^2 z = Pz = x\), so Px = x. Next, take any vector z in V. Then
\[
\langle Py, z\rangle = \langle y, P^t z\rangle = \langle y, Pz\rangle = 0,
\]
since y is perpendicular to the image of P, so to Pz. Therefore Py = 0.
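A numerical illustration (added here, not from the text): build the orthogonal projection onto the span of two vectors in R³ from the standard least-squares formula P = W(WᵗW)⁻¹Wᵗ and check the two identities.

```python
import numpy as np

# Orthogonal projection onto the span of two independent vectors in R^3 (illustrative choice).
w1 = np.array([1.0, 2.0, 0.0])
w2 = np.array([0.0, 1.0, 1.0])
W = np.column_stack([w1, w2])

P = W @ np.linalg.inv(W.T @ W) @ W.T   # projection onto the column space of W

print(np.allclose(P, P @ P))               # True: P^2 = P
print(np.allclose(P, P.T))                 # True: P^t = P
x = np.array([3.0, -1.0, 2.0])
print(np.allclose(W.T @ (x - P @ x), 0))   # True: x - Px is perpendicular to W
```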

22 Direct Sums of Subspaces

22.2. You could try letting U be the horizontal plane, consisting in the vectors
\[
x = \begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix},
\]
and V any other plane, for instance the plane consisting in the vectors
\[
x = \begin{pmatrix} x_1 \\ 0 \\ x_3 \end{pmatrix}.
\]

22.3. Any vector w in \(\mathbb{R}^n\) splits uniquely into w = u + v for some u in U and v in V. So we can unambiguously define a map Q by the equation Qw = Su + Tv. It is easy to check that this map Q is linear. Conversely, given any map \(Q : \mathbb{R}^n \to \mathbb{R}^p\), define \(S = Q|_U\) and \(T = Q|_V\).

22.7. The pivot columns of (A  B) form a basis for U + W. All columns are pivot columns just when U + W is a direct sum.

23 Jordan Normal Form

23.1. Take any point of the plane, project it horizontally onto the vertical axis, and then rotate the vertical axis clockwise by a right angle to become the horizontal axis. If you do it twice, clearly you end up at the origin.

23.4. Take a generalized eigenvector x ≠ 0 with two different eigenvalues: \((A - \lambda)^k x = 0\) and \((A - \mu)^l x = 0\). Pick x so that the powers k and l are as small as possible. Then let \(y = (A - \lambda)x\). Check that y is a generalized eigenvector of A with two different eigenvalues, but with smaller powers than x had. This is a contradiction unless y = 0. So \((A - \lambda)x = 0\), giving power k = 1. Using the same reasoning, switching the roles of λ and µ, we find that \((A - \mu)x = 0\). So \(Ax = \lambda x = \mu x\), forcing λ = µ.

23.5. Take the shortest possible linear relation between nonzero generalized eigenvectors. Suppose that it involves vectors \(x_i\) with different eigenvalues \(\lambda_i\): \((A - \lambda_i)^{k_i} x_i = 0\). If you can find a shorter relation, or one of the same length with lower values for all powers \(k_i\), use it instead. Let \(y_i = (A - \lambda_1)x_i\). These \(y_i\) are still generalized eigenvectors with the same eigenvalue, and same powers \(k_i\), but a smaller power for \(k_1\). So these \(y_i\) must all vanish. But then all of the vectors \(x_i\) each have two different eigenvalues.

23.7.
λ = 2: \(e_1\) and \(e_3, e_2\)
λ = 3: \(e_6, e_5, e_4\)

23.8. Clearly x by itself is a string of length 1. Let k be the length of the longest string starting from x. So the string is
\[
x,\ (A - \lambda)x,\ \dots,\ (A - \lambda)^{k-1} x = y.
\]
Let's try to take another step. The next step must be (A − λ)y. If that vanishes, we can't step. If it doesn't vanish, then we can step, as long as this next step is linearly independent of all of the vectors earlier in the string. If the next step isn't linearly independent, then we must have a relation:
\[
c_0 x + c_1 (A - \lambda)x + \dots + c_k (A - \lambda)^k x = 0.
\]
Multiply both sides with a big enough power of A − λ to kill off all but the first term. This power exists, because all of the vectors in the string are generalized λ eigenvectors, so killed by some power of A − λ, and the further down the list, the smaller a power of A − λ you need to do the job, since you already have some A − λ sitting around. So this forces \(c_0 = 0\). Hit with the next smallest power of A − λ to force \(c_1 = 0\), etc. There is no linear relation, and we can take this next step. The last step y must be an eigenvector, because (A − λ)y = 0.

23.10. Two ways: (1) There is an eigenvector for each eigenvalue. Pick one for each. Eigenvectors with different eigenvalues are linearly independent, so they form a basis. (2) Each Jordan block of size k has an eigenvalue λ of multiplicity k. So all blocks must be 1 × 1, and hence A is diagonal in Jordan normal form.

23.11.
\[
F = \begin{pmatrix} \frac{1}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix},
\qquad
F^{-1}AF = \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}.
\]

23.12.
\[
F = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \end{pmatrix},
\qquad
F^{-1}AF = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]

23.13. Two blocks, each at most n/2 long, and a zero block if needed to pad it out.

23.14. For a suitable invertible complex 4 × 4 matrix F,
\[
F^{-1}AF = \begin{pmatrix} i & 1 & 0 & 0 \\ 0 & i & 0 & 0 \\ 0 & 0 & -i & 1 \\ 0 & 0 & 0 & -i \end{pmatrix}.
\]

23.15. A is in echelon form, so U = A, and
\[
U' = A' = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]


We can save time writing \(U = A = \Delta^2\), and
\[
U' = A' = \begin{pmatrix} I \\ 0 \end{pmatrix}
\]
(where I here is actually \(I_3\)). Solving \(U'X = UA'\) is easy:
\[
U'X - UA' = \begin{pmatrix} I \\ 0 \end{pmatrix} X - \Delta^2 \begin{pmatrix} I \\ 0 \end{pmatrix}
= \begin{pmatrix} X \\ 0 \end{pmatrix} - \begin{pmatrix} \Delta^2 \\ 0 \end{pmatrix}.
\]
(Note how we can allow ourselves to just play with I and ∆ as if they were numbers. Don't write \(I_3\) or \(\Delta_5\), i.e. forget the subscripts.) So \(X = \Delta^2\), and X is 3 × 3. We leave the reader to check that X has strings:

λ = 0: \(e_2\) and \(e_3, e_1\)

giving A the strings:

λ = 0: \(A'e_2\) and \(A'e_3, A'e_1\)

But
\[
A' = \begin{pmatrix} I \\ 0 \end{pmatrix},
\]
so \(A'e_1 = e_1\), \(A'e_2 = e_2\) and \(A'e_3 = e_3\). Therefore the corresponding strings of A are

λ = 0: \(e_2\) and \(e_3, e_1\)

Take the start z of each 0-string, here \(z = e_2\) and \(z = e_3\), and try to solve \(Ux = Vz\). But \(U = \Delta^2\) shifts labels back by 2. Therefore the strings of A are

λ = 0: \(e_4, e_2\) and \(e_5, e_3, e_1\)

This is a basis, so we have our strings.
\[
F = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix},
\qquad
F^{-1}AF = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]
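A numerical check of the final answer (added here, not from the text), taking A to be the 5 × 5 matrix Δ² described in the hint:

```python
import numpy as np

n = 5
Delta = np.zeros((n, n))
for k in range(1, n):
    Delta[k - 1, k] = 1          # Delta e_k = e_{k-1}, and Delta e_1 = 0
A = Delta @ Delta                # A = Delta^2, as in the hint

# Columns of F are the string vectors e2, e4, e1, e3, e5 (1-based), as found above.
order = [2, 4, 1, 3, 5]
F = np.eye(n)[:, [k - 1 for k in order]]

print(np.linalg.inv(F) @ A @ F)
# block diagonal: a 2x2 and a 3x3 nilpotent Jordan block, matching F^{-1}AF above
```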

23.19. Clearly it is enough to prove the result for a single Jordan block. Given an n × n Jordan block λ + ∆, let A be any diagonal matrix,
\[
A = \begin{pmatrix} a_1 & & & \\ & a_2 & & \\ & & \ddots & \\ & & & a_n \end{pmatrix},
\]
with all of the \(a_j\) distinct and nonzero. Compute out the matrix B = A(λ + ∆) to see that all diagonal entries of B are distinct and that B is upper triangular. Why is B diagonalizable? Then \(\lambda + \Delta = A^{-1}B\), a product of diagonalizable matrices.

24 Decomposition and Minimal Polynomial

24.1. Using long division of polynomials,
\[
x^5 + 3x^2 + 4x + 1 = (x^2 + 1)(x^3 - x + 3) + (5x - 2):
\]
subtracting \(x^3(x^2 + 1) = x^5 + x^3\) leaves \(-x^3 + 3x^2 + 4x + 1\); subtracting \(-x(x^2 + 1) = -x^3 - x\) leaves \(3x^2 + 5x + 1\); subtracting \(3(x^2 + 1) = 3x^2 + 3\) leaves \(5x - 2\). So the quotient is \(x^3 - x + 3\) and the remainder is \(5x - 2\).
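This division can be checked with numpy's polynomial division; the snippet below is an added check, not part of the text.

```python
import numpy as np

# Coefficients are listed from the highest power down.
dividend = [1, 0, 0, 3, 4, 1]      # x^5 + 3x^2 + 4x + 1
divisor = [1, 0, 1]                # x^2 + 1

quotient, remainder = np.polydiv(dividend, divisor)
print(quotient)     # [ 1.  0. -1.  3.]  -> x^3 - x + 3
print(remainder)    # [ 5. -2.]          -> 5x - 2
```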

24.2. Dividing \(a(x) = x^5 + 2x^3 + x^2 + 2\) by \(b(x) = x^4 + 2x^3 + 4x^2 + 4x + 4\) gives quotient \(x - 2\) and remainder \(2x^3 + 5x^2 + 4x + 10\) (the intermediate remainder is \(-2x^4 - 2x^3 - 3x^2 - 4x + 2\)). Dividing \(x^4 + 2x^3 + 4x^2 + 4x + 4\) by \(2x^3 + 5x^2 + 4x + 10\) gives quotient \(\frac{1}{2}x - \frac{1}{4}\) and remainder \(\frac{13}{4}x^2 + \frac{13}{2}\) (the intermediate remainder is \(-\frac{1}{2}x^3 + 2x^2 - x + 4\)). Dividing \(2x^3 + 5x^2 + 4x + 10\) by \(x^2 + 2\) gives quotient \(2x + 5\) and remainder 0. Clearly \(r(x) = x^2 + 2\) (up to scaling; we will always scale to get the leading term to be 1). Solving backwards for r(x):
\[
r(x) = u(x)a(x) + v(x)b(x)
\]
with
\[
u(x) = -\frac{2}{13}\left(x - \frac{1}{2}\right),
\qquad
v(x) = \frac{2}{13}\left(x^2 - \frac{5}{2}x + 3\right).
\]

24.3. The Euclidean algorithm yields:
\[
2310 - 2\cdot 990 = 330, \qquad 990 - 3\cdot 330 = 0,
\]
\[
1386 - 1\cdot 990 = 396, \qquad 990 - 2\cdot 396 = 198, \qquad 396 - 2\cdot 198 = 0.
\]
Therefore the greatest common divisors are 330 and 198 respectively. Apply the Euclidean algorithm to these:
\[
330 - 1\cdot 198 = 132, \qquad 198 - 1\cdot 132 = 66, \qquad 132 - 2\cdot 66 = 0.
\]
Therefore the greatest common divisor of 2310, 990 and 1386 is 66. Turning these equations around, we can write
\[
66 = 198 - 1\cdot 132
= 198 - 1\cdot(330 - 1\cdot 198)
= 2\cdot 198 - 1\cdot 330
= 2(990 - 2\cdot 396) - 1\cdot(2310 - 2\cdot 990)
= 4\cdot 990 - 4\cdot 396 - 1\cdot 2310
= 4\cdot 990 - 4\cdot(1386 - 1\cdot 990) - 1\cdot 2310
= 8\cdot 990 - 4\cdot 1386 - 1\cdot 2310.
\]
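A quick Python check of both the greatest common divisor and the final combination (added here, not from the text):

```python
from math import gcd

print(gcd(2310, 990), gcd(1386, 990))       # 330 198
print(gcd(gcd(2310, 990), gcd(1386, 990)))  # 66
print(gcd(gcd(2310, 990), 1386))            # 66, the same answer

# the combination found by back-substitution
print(8 * 990 - 4 * 1386 - 1 * 2310)        # 66
```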

24.6. Clearly \(\Delta_n^n = 0\), since ∆ shifts each vector of the string \(e_n, e_{n-1}, \dots, e_1\) one step (and sends \(e_1\) to 0). Therefore the minimal polynomial of \(\Delta_n\) must divide \(x^n\). But for k < n, \(\Delta_n^k e_n = e_{n-k}\) is not 0, so \(x^k\) is not the minimal polynomial.

24.7. If the minimal polynomial of λ + ∆ is m(x), then the minimal polynomial of ∆ must be m(x − λ).

24.8. \(m(\lambda) = \lambda^2 - 5\lambda - 2\)

24.10. Take A to have Jordan normal form, and you find characteristic polynomial
\[
\det(A - \lambda I) = (\lambda_1 - \lambda)^{n_1}(\lambda_2 - \lambda)^{n_2}\cdots(\lambda_N - \lambda)^{n_N},
\]
with \(n_1\) the sum of the sizes of all Jordan blocks with eigenvalue \(\lambda_1\), etc. The characteristic polynomial is clearly divisible by the minimal polynomial.

24.11. Split the minimal polynomial m(x) into real and imaginary parts. Check that s(A) = 0 for s(x) either of these parts, a polynomial equation of the same or lower degree. The imaginary part has lower degree, so vanishes.

24.12. Let
\[
z_k = \cos\left(\frac{2\pi k}{n}\right) + i\sin\left(\frac{2\pi k}{n}\right).
\]
By deMoivre's theorem, \(z_k^n = 1\). If we take k = 1, 2, \dots, n, these are the so-called n-th roots of 1, so that
\[
z^n - 1 = (z - z_1)(z - z_2)\cdots(z - z_n).
\]
Clearly each root of 1 is at a different angle. \(A^n = 1\) implies that
\[
(A - z_1)(A - z_2)\cdots(A - z_n) = 0,
\]
so by corollary 24.14, A is diagonalizable. Over the real numbers, we can take
\[
A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},
\]
which satisfies \(A^4 = 1\), but has complex eigenvalues, so is not diagonalizable over the real numbers.
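A one-line numerical illustration of the last example (added here, not from the text):

```python
import numpy as np

A = np.array([[0, -1],
              [1,  0]])

print(np.linalg.matrix_power(A, 4))   # the identity matrix: A^4 = 1
print(np.linalg.eigvals(A))           # [0.+1.j 0.-1.j]: the eigenvalues are +-i, not real
```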


24.13. Applying forward elimination to the matrix
\[
\begin{pmatrix}
1 & 0 & 2 & 2 \\
0 & 2 & 2 & 6 \\
0 & 0 & 0 & 0 \\
0 & 1 & 1 & 3 \\
1 & 1 & 3 & 5 \\
0 & 0 & 0 & 0 \\
0 & 1 & 1 & 3 \\
0 & 2 & 2 & 6 \\
1 & -1 & 1 & -1
\end{pmatrix}:
\]
add −(row 1) to row 5 and −(row 1) to row 9, giving
\[
\begin{pmatrix}
1 & 0 & 2 & 2 \\
0 & 2 & 2 & 6 \\
0 & 0 & 0 & 0 \\
0 & 1 & 1 & 3 \\
0 & 1 & 1 & 3 \\
0 & 0 & 0 & 0 \\
0 & 1 & 1 & 3 \\
0 & 2 & 2 & 6 \\
0 & -1 & -1 & -3
\end{pmatrix}.
\]
Move the pivot ↘. Add −½(row 2) to row 4, −½(row 2) to row 5, −½(row 2) to row 7, −(row 2) to row 8, and ½(row 2) to row 9, giving
\[
\begin{pmatrix}
1 & 0 & 2 & 2 \\
0 & 2 & 2 & 6 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}.
\]
Move the pivot ↘, then →, then →. Cutting out all of the pivotless columns after the first one, and all of the zero rows, yields
\[
\begin{pmatrix} 1 & 0 & 2 \\ 0 & 2 & 2 \end{pmatrix}.
\]
Scale row 2 by ½:
\[
\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \end{pmatrix}.
\]
The minimal polynomial is therefore
\[
-2 - \lambda + \lambda^2 = (\lambda + 1)(\lambda - 2).
\]
The eigenvalues are λ = −1 and λ = 2. We can't see which eigenvalue has multiplicity 2 and which has multiplicity 1.


24.17. If \(\varepsilon_1 \ne \varepsilon_2\), then D = T and N = 0. If \(\varepsilon_1 = \varepsilon_2\), then
\[
D = \begin{pmatrix} \varepsilon_1 & 0 \\ 0 & \varepsilon_2 \end{pmatrix},
\qquad
N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.
\]
If we imagine \(\varepsilon_1\) and \(\varepsilon_2\) as variables, then D "jumps" (its top right corner changes from 0 to 1 "suddenly") as \(\varepsilon_1\) and \(\varepsilon_2\) collide.

24.18. We have seen previously that S and T will preserve each other's generalized eigenspaces: if \((S - \lambda)^k x = 0\), then \((S - \lambda)^k Tx = 0\). Therefore we can restrict to a generalized eigenspace of S, and then further restrict to a generalized eigenspace of T. So we can assume that \(S = \lambda_0 + N_0\) and \(T = \lambda_1 + N_1\), with \(\lambda_0\) and \(\lambda_1\) complex numbers and \(N_0\) and \(N_1\) commuting nilpotent linear maps. But then \(ST = \lambda_0\lambda_1 + N\) where \(N = \lambda_0 N_1 + \lambda_1 N_0 + N_0 N_1\). Clearly large enough powers of N will vanish, because they will be sums of terms like
\[
\lambda_0^j \lambda_1^k N_0^{k+l} N_1^{j+l}.
\]
So N is nilpotent.

25 Matrix Functions of a Matrix Variable

25.11. If we have a matrix A with n distinct eigenvalues, then every nearby matrix has nearby eigenvalues, so still distinct.

26 Symmetric Functions of Eigenvalues

26.2.
\[
\det(A - \lambda I) = (z_1 - \lambda)(z_2 - \lambda)\cdots(z_n - \lambda)
= (-1)^n P_z(\lambda)
= s_n(z) - s_{n-1}(z)\lambda + s_{n-2}(z)\lambda^2 + \dots + (-1)^n \lambda^n .
\]

27 The Pfaffian

27.1. There are (2n)! permutations of 1, 2, \dots, 2n, and each partition is associated to \(n!\,2^n\) different permutations, so there are \((2n)!/(n!\,2^n)\) different partitions of 1, 2, \dots, 2n.

27.2. In alphabetical order, the first pair must always start with 1. Then we can choose any number to sit beside the 1, and the smallest number not chosen starts the second pair, etc.
(a) {1, 2}
(b) {1, 2}, {3, 4}
    {1, 3}, {2, 4}
    {1, 4}, {2, 3}
(c) {1, 2}, {3, 4}, {5, 6}
    {1, 2}, {3, 5}, {4, 6}
    {1, 2}, {3, 6}, {4, 5}
    {1, 3}, {2, 4}, {5, 6}
    {1, 3}, {2, 5}, {4, 6}
    {1, 3}, {2, 6}, {4, 5}
    {1, 4}, {2, 3}, {5, 6}
    {1, 4}, {2, 5}, {3, 6}
    {1, 4}, {2, 6}, {3, 5}
    {1, 5}, {2, 3}, {4, 6}
    {1, 5}, {2, 4}, {3, 6}
    {1, 5}, {2, 6}, {3, 4}
    {1, 6}, {2, 3}, {4, 5}
    {1, 6}, {2, 4}, {3, 5}
    {1, 6}, {2, 5}, {3, 4}
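These partitions into pairs can also be generated programmatically; the small recursive sketch below is an addition to the text, and it confirms the count (2n)!/(n! 2ⁿ) from hint 27.1.

```python
from math import factorial

def pairings(items):
    """All partitions of an even-sized list into unordered pairs, first elements increasing."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail

for n in (1, 2, 3):
    count = sum(1 for _ in pairings(list(range(1, 2 * n + 1))))
    print(n, count, factorial(2 * n) // (factorial(n) * 2 ** n))
# 1 1 1
# 2 3 3
# 3 15 15
```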

28 Dual Spaces and Quotient Spaces

28.1. See example 16.14 on page 164.

28.5. If a linear map \(T : V \to W\) is 1-to-1 then \(T0 \ne Tv\) unless v = 0, so there are no vectors v in the kernel of T except v = 0. On the other hand, suppose that \(T : V \to W\) has only the zero vector in its kernel. If \(v_1 \ne v_2\) but \(Tv_1 = Tv_2\), then \(v_1 - v_2 \ne 0\) but \(T(v_1 - v_2) = Tv_1 - Tv_2 = 0\). So the vector \(v_1 - v_2\) is a nonzero vector in the kernel of T.

28.6. Hom(V, W) is always a vector space, for any vector space W.

28.7. If dim V = n, then V is isomorphic to \(\mathbb{R}^n\), so without loss of generality we can just assume \(V = \mathbb{R}^n\). But then \(V^*\) is identified with row matrices, so has dimension n as well: \(\dim V^* = \dim V\).

28.10.
\[
f_x(\alpha + \beta) = (\alpha + \beta)(x) = \alpha(x) + \beta(x) = f_x(\alpha) + f_x(\beta).
\]
Similarly for \(f_x(s\,\alpha)\).

28.11. \(f_{sx}(\alpha) = \alpha(sx) = s\,\alpha(x) = s f_x(\alpha)\). Similarly for any two vectors x and y in V, expand out \(f_{x+y}\).

28.12. We need only find a linear function α for which \(\alpha(x) \ne \alpha(y)\). Identifying V with \(\mathbb{R}^n\) using some choice of basis, we can assume that \(V = \mathbb{R}^n\), and thus x and y are different vectors in \(\mathbb{R}^n\). So some entry of x is not equal to some entry of y, say \(x_j \ne y_j\). Let α be the function \(\alpha(z) = z_j\), i.e. \(\alpha = e_j\).

28.16. If x + W is a translate, we might find that we can write this translate two different ways, say as x + W but also as z + W. So x and z are equal up to adding a vector from W, i.e. x − z lies in W. Then after scaling, clearly sx − sz = s(x − z) also lies in W. So sx + W = sz + W, and therefore scaling is defined independent of any choices. A similar argument works for addition of translates.

29 Singular Value Factorization

29.1. Calculate out ⟨Ax, x⟩ for \(x = x_1 e_1 + x_2 e_2 + \dots + x_n e_n\) to see that this gives Q(x). We know that the equation ⟨Ax, x⟩ = Q(x) determines the symmetric matrix A uniquely.

30 Factorizations

30.1.
\[
\begin{array}{c|c|c}
p & L & U \\
\hline
1,\ 2 & \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} & \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \\[2ex]
2,\ 1 & \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} & \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \\[2ex]
2,\ 1 & \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} & \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\end{array}
\]

31 Quadratic Forms

31.13. Try
\[
Q(x) = \begin{cases} 0 & \text{if } x = 0, \\[1ex] \dfrac{x_1^4 + x_2^4 + \dots + x_n^4}{x_1^2 + x_2^2 + \dots + x_n^2} & \text{otherwise.} \end{cases}
\]

31.15. If b(x, y) = 0 for all y, then plug in y = x to find b(x, x) = 0.

31.19.
\[
\frac{1}{\sqrt2}\begin{pmatrix} 1 \\ 1 \end{pmatrix},
\qquad
\frac{1}{\sqrt2}\begin{pmatrix} 1 \\ -1 \end{pmatrix}.
\]

31.20.
(a) Suppose that ξ = Tx and η = Ty, i.e. ξ(z) = b(x, z) and η(z) = b(y, z) for any vector z. Then ξ(z) + η(z) = b(x, z) + b(y, z) = b(x + y, z), so Tx + Ty = ξ + η = T(x + y). Similarly for scaling: aTx = T(ax).
(b) If Tx = 0 then 0 = b(x, y) for all vectors y from V, so x lies in the kernel of b, i.e. the kernel of T is the kernel of b. Since b is nondegenerate, the kernel of b consists precisely in the 0 vector. Therefore T is 1-to-1.
(c) \(T : V \to V^*\) is a 1-to-1 linear map, and V and \(V^*\) have the same dimensions, say n. \(\dim\operatorname{im} T = \dim\ker T + \dim\operatorname{im} T = \dim V = n\), so T is onto, hence T is an isomorphism.
(d) You have to take \(x = T^{-1}\xi\).

32 Tensors and Indices<br />

32.1. The tensor product is δj iξ k. There are two contractions:<br />

a. δi iξ k = n ξ k (in other words, nξ), and<br />

b. δj iξ i = ξ j (in other words, ξ).<br />

32.3.<br />

(<br />

)<br />

ε ijk x i y j z k = det x y z .<br />

32.5. One contraction is s k = t i ik .<br />

(F ∗ t) i ik = F l i t l (<br />

pq F<br />

−1 ) p (<br />

i F<br />

−1 ) q<br />

k<br />

= t l (<br />

pq F F<br />

−1 ) p (<br />

l F<br />

−1 ) q<br />

k<br />

= t l (<br />

lq F<br />

−1 ) q<br />

k<br />

= (F ∗ s) k<br />

.<br />

33 Tensors<br />

33.3.<br />

x ⊗ y = 4 e 1 ⊗ e 1 + 5 e 1 ⊗ e 2 + 8 e 2 ⊗ e 1 + 10 e 2 ⊗ e 2 + 12 e 3 ⊗ e 1 + 15 e 3 ⊗ e 2<br />

33.4. Picking a basis v 1 , v 2 , . . . , v p for V and w 1 , w 2 , . . . , w q for W , we have<br />

already seen that every tensor in V ⊗ W has the form<br />

t iJ v i ⊗ w J ,<br />

so is a sum of pure tensors.<br />

33.11. Take any linear map T : V → V and define a tensor t in V ⊗ V^*, which is thus a bilinear map t(ξ, v) for ξ in V^* and v in V^{**} = V, by the rule
t(ξ, v) = ξ(T v).    (A.1)
Clearly if we scale T, then we scale t by the same amount. Similarly, if we add linear maps on the right side of equation A.1, then we add tensors on the left side. Therefore the mapping taking T to t is linear. If t = 0 then ξ(T v) = 0 for any vector v and covector ξ. Thus T v = 0 for any vector v, and so T = 0. Therefore the map taking T to t is 1-to-1. Finally, we need to see that the map taking T to t is onto, the tricky part. But we can count dimensions for that: if dim V = n then dim Hom(V, V) = n^2 and dim V ⊗ V^* = n^2, so a 1-to-1 linear map between spaces of the same finite dimension must be onto.
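In coordinates (a sketch of ours): the tensor t has components t(e^i, e_j) = e^i(T e_j), which are exactly the matrix entries of T, so T can be read off from t; together with the dimension count this makes the correspondence an isomorphism.

```python
# Recover the matrix of T from the bilinear map t(xi, v) = xi(T v).
import numpy as np

rng = np.random.default_rng(4)
n = 3
T = rng.standard_normal((n, n))

def t(xi, v):
    return xi @ (T @ v)              # the tensor associated to T

E = np.eye(n)
recovered = np.array([[t(E[i], E[j]) for j in range(n)] for i in range(n)])
assert np.allclose(recovered, T)     # t determines T; dim Hom(V, V) = dim V (x) V* = n^2
```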


Bibliography

[1] Emil Artin, Galois theory, 2nd ed., Dover Publications Inc., Mineola, NY, 1998.
[2] Otto Bretscher, Linear algebra, Prentice Hall Inc., Upper Saddle River, NJ, 1997.
[3] Richard P. Feynman, Robert B. Leighton, and Matthew Sands, The Feynman lectures on physics. Vol. 2: Mainly electromagnetism and matter, Addison-Wesley Publishing Co., Inc., Reading, Mass.-London, 1964.
[4] Johann Hartl, Zur Jordan-Normalform durch den Faktorraum, Archiv der Mathematik 69 (1997), no. 3, 192–195.
[5] Tosio Kato, Perturbation theory for linear operators, Springer-Verlag, Berlin, 1995.
[6] Peter D. Lax, Linear algebra, John Wiley & Sons Inc., New York, 1997.
[7] I. G. Macdonald, Symmetric functions and Hall polynomials, 2nd ed., Oxford University Press, New York, 1995.
[8] Peter J. Olver, Classical invariant theory, Cambridge University Press, Cambridge, 1999.
[9] G. Polya, How to solve it, Princeton Science Library, Princeton University Press, Princeton, NJ, 2004.
[10] Claudio Procesi, Lie groups, Springer, New York, 2007.
[11] Igor R. Shafarevich, Basic notions of algebra, Encyclopaedia of Mathematical Sciences, vol. 11, Springer-Verlag, Berlin, 2005.
[12] Daniel Solow, How to read and do proofs: An introduction to mathematical thought processes, Wiley, New Jersey, 2004.
[13] Michael Spivak, Calculus on manifolds, Addison-Wesley, Reading, Mass., 1965.
[14] Michael Spivak, Calculus, 3rd ed., Publish or Perish, 1994.
[15] Gilbert Strang, Linear algebra and its applications, 2nd ed., Academic Press, New York, 1980.
[16] Richard L. Wheeden and Antoni Zygmund, Measure and integral, Marcel Dekker Inc., New York, 1977.


List of Notation

||x|| : The length of a vector, 116
〈x, y〉 : Inner product of two vectors, 115
〈z, w〉 : Hermitian inner product, 142
0 : Any matrix whose entries are all zeroes, 20
1 : identity map, 153
1 : identity permutation, 31
A^{-1} : The inverse of a square matrix A, 28
A_[ij] : A with rows i and j and columns i and j removed, 248
A^* : Adjoint of a complex matrix or complex linear map A, 143
A_{b,i} : The matrix obtained by replacing column i of A by b, 175
adj A : adjugate, 175
A_(ij) : The matrix obtained from cutting out row i and column j from A, 51
arg z : Argument (angle) of a complex number, 140
A^t : The transpose of a matrix A, 61
C : The set of all complex numbers, 140
C^n : The set of all complex vectors with n entries, 140
det : The determinant of a square matrix, 51
dim : Dimension of a subspace, 84
e^i : i-th row of the identity matrix, 256
e_i : The i-th standard basis vector (also the i-th column of the identity matrix), 25
Hom(V, W) : The set of linear maps T : V → W, 255
I : The identity matrix, 25
im A : The image of a matrix A, 91
I_n : The n × n identity matrix, 25
ker A : The kernel of a matrix A, 87
p × q : Matrix with p rows and q columns, 17
Pf : Pfaffian of a skew-symmetric matrix, 245
R^n : The space of all vectors with n real number entries, 17
Σ : Sum, 22
T^* : Transpose of a linear map, 258
T|_W : Restriction of a linear map to a subspace, 158
U + V : Sum of two subspaces, 197
U ⊕ V : Direct sum of two subspaces, 197
V^* : The dual space of a vector space V, 256
v + W : Translate of a subspace, 258
V/W : Quotient space, 259
v^* : Covector dual to the vector v, 299
W^⊥ : orthogonal complement, 191
|z| : Modulus (length) of a complex number, 140


Index<br />

adjoint, 153<br />

adjugate, 185<br />

alphabetical order<br />

see partition, alphabetical order,<br />

254<br />

analytic function, 235<br />

antisymmetrize, 298<br />

argument, 150<br />

associated<br />

matrix, see matrix, associated<br />

back substitution, 5<br />

ball, 146<br />

center, 146<br />

closed, 146<br />

open, 146<br />

radius, 146<br />

basis, 81, 86<br />

dual, 266<br />

orthonormal, 123<br />

standard, 82<br />

unitary, 154<br />

Bessel, see inequality, Bessel<br />

bilinear form, 285<br />

degenerate, 286<br />

nondegenerate, 286<br />

positive definite, 287<br />

symmetric, 287<br />

bilinear map, 306<br />

block<br />

Jordan, see Jordan block<br />

Boolean numbers, 174<br />

bounded, 146<br />

box<br />

closed, 146<br />

Cayley–Hamilton, see theorem, Cayley–<br />

Hamilton<br />

theorem, see theorem, Cayley–Hamilton<br />

center<br />

of ball, see ball, center<br />

change<br />

of variables, 84<br />

change of basis, see matrix, change of<br />

basis<br />

characteristic polynomial, see polynomial,<br />

characteristic<br />

circle<br />

unit, 151<br />

closed<br />

ball, see ball, closed<br />

box, see box, closed<br />

set, 146<br />

column<br />

permutation formula, see determinant,<br />

permutation formula,<br />

column<br />

combination<br />

linear, see linear combination<br />

commuting, 187, 232<br />

complement, see subspace, complementary<br />

complementary subspace, see subspace,<br />

complementary<br />

complex<br />

linear map, see linear map, complex<br />

vector space, see vector space, complex<br />

complex number, 149<br />

imaginary part, 149<br />

real part, 149<br />

complex plane, 149<br />

component, 295<br />

components<br />

see principal components, 273<br />

composition, 163<br />

conjugate, 150<br />


continuous, 146<br />

convergence, 145<br />

of points, 145<br />

covector, 266, 296<br />

dual, 309<br />

de Moivre, see theorem, de Moivre<br />

decoupling<br />

theorem, see theorem, decoupling<br />

degenerate<br />

bilinear form, see bilinear form,<br />

degenerate<br />

determinant, 51<br />

permutation formula<br />

column, 183<br />

row, 182<br />

diagonal<br />

of a matrix, 19<br />

diagonalize, 110, 139<br />

orthogonally, 136<br />

dimension<br />

effective, 275<br />

of subspace, 88<br />

direct sum, see subspace, direct sum<br />

distance, 194<br />

dot product, see inner product<br />

echelon form, 18<br />

equation, 9<br />

reduced, 89<br />

effective dimension, see dimension, effective<br />

eigenspace, 109<br />

eigenvalue, 101<br />

complex, 151<br />

multiplicity, 105<br />

eigenvector, 101<br />

complex, 151<br />

generalized, 214<br />

elimination, 3<br />

forward, 4<br />

Gauss–Jordan, 7<br />

fast formula<br />

for the determinant, 61<br />

Fibonacci, 27<br />

field, 173<br />

form<br />

bilinear, see bilinear form<br />

volume, 315<br />

form, exterior, 315<br />

forward, see elimination, forward<br />

free variable, 8<br />

fundamental theorem of algebra, 158<br />

Gauss–Jordan, see elimination, Gauss–<br />

Jordan<br />

generalized<br />

eigenvector, see eigenvector, generalized<br />

Hermitian
inner product, see inner product, Hermitian

homomorphism, 265<br />

identity<br />

permutation, see permutation, identity<br />

identity map, see linear, map, identity<br />

identity matrix, see matrix, identity<br />

image, 95<br />

independence<br />

linear, see linear independence<br />

inequality<br />

Bessel, 203<br />

Schwarz, 193<br />

triangle, 193<br />

inertia<br />

law of, 288<br />

inner product, 119, 170<br />

Hermitian, 152, 171<br />

space, 171, 201<br />

intersection, 207<br />

invariant<br />

subspace, see subspace, invariant<br />

inverse, 173<br />

of a matrix, see matrix, inverse<br />

of permutation, see permutation,<br />

inverse<br />

isometry, 195<br />

isomorphism, 164<br />

Jordan<br />

block, 214<br />

normal form, 214<br />

kernel, 91<br />

kill, 42, 91



length, 120<br />

line, 194<br />

segment, 194<br />

linear<br />

combination, 72<br />

equation, 3<br />

independence, 81<br />

map, 163<br />

complex, 170<br />

identity, 163<br />

relation, 81<br />

map, 195<br />

linear, see linear map<br />

matrix, 17<br />

addition, 20<br />

associated, 163<br />

change of basis, 83<br />

diagonal entries, 19<br />

identity, 25<br />

inverse, 28, 43<br />

multiplication, 22<br />

normal, 155<br />

permutation, 32<br />

self-adjoint, 153<br />

short, 42<br />

skew-adjoint, 153, 238<br />

skew-symmetric, 238<br />

square, 17, 28<br />

strictly lower triangular, 35<br />

strictly upper triangular, 36<br />

subtraction, 20<br />

symmetric, 121<br />

unitary, 154<br />

upper triangular, 53<br />

minimal polynomial, see polynomial,<br />

minimal<br />

minimum<br />

principle, 135<br />

modulus, 150<br />

multilinear, 303<br />

multiplicative<br />

function, 224<br />

multiplicity<br />

eigenvalue, see eigenvalue, multiplicity<br />

negative definite, see quadratic form,<br />

negative definite<br />

nilpotent, 232<br />

nondegenerate<br />

bilinear form, see bilinear form,<br />

nondegenerate<br />

normal form<br />

Jordan, see Jordan normal form<br />

normal matrix, see matrix, normal<br />

open<br />

ball, see ball, open<br />

orthogonal, 122<br />

orthogonal complement, 201<br />

orthogonally diagonalize, see diagonalize,<br />

orthogonally<br />

orthonormal basis, see basis, orthonormal<br />

partition, 253<br />

alphabetical order, 254<br />

associated to a permutation, 253<br />

permutation, 30<br />

identity, 31, 279<br />

inverse, 31<br />

matrix, see matrix, permutation<br />

natural, of partition, 254<br />

product, 31<br />

sign, 182<br />

permutation formula<br />

column, see determinant, permutation<br />

formula, column<br />

row, see determinant, permutation<br />

formula, row<br />

perpendicular, 120<br />

Pfaffian, 255<br />

pivot, 4, 18<br />

column, 74<br />

plane<br />

complex, see complex plane<br />

polar coordinates, 149<br />

polarization, 310<br />

polynomial, 311<br />

characteristic, 101<br />

homogeneous, 311<br />

minimal, 227<br />

positive definite, see quadratic form,<br />

positive definite<br />

positive semidefinite, see quadratic form,<br />

positive semidefinite<br />

principal component, 275<br />

principal components, 273<br />

product



dot, see inner product<br />

of permutations, see permutation, product, 31

scalar, see inner product<br />

tensor, see tensor, product<br />

product, inner, see inner product<br />

see inner product, 119<br />

projection, 201<br />

pure<br />

tensor, see tensor, pure<br />

Pythagorean theorem, see theorem, Pythagorean<br />

quadratic form, 138<br />

kernel, 292<br />

negative definite, 144<br />

positive definite, 144<br />

positive semidefinite, 144<br />

quotient space, 269<br />

radius<br />

of ball, see ball, radius<br />

rank, 9<br />

tensor, see tensor, rank<br />

real<br />

vector space, see vector space, real<br />

reciprocal, 173<br />

reduced echelon form, see echelon form,<br />

reduced<br />

reflection, 198<br />

relation<br />

linear, see linear relation<br />

restriction, 168, 232<br />

reversal, 197<br />

rotation, 196, 198<br />

row<br />

permutation formula, see determinant,<br />

permutation formula,<br />

row<br />

row echelon form, 9<br />

scalar product, see inner product<br />

Schwarz inequality, see inequality, Schwarz<br />

self-adjoint, see matrix, self-adjoint<br />

shear, 189<br />

short matrix, see matrix, short<br />

sign<br />

of a permutation, see permutation,<br />

sign<br />

skew-adjoint matrix, see matrix, skew-adjoint

skew-symmetric<br />

normal form, 251<br />

spectral<br />

theorem, see theorem, spectral<br />

spectrum, 102<br />

square, see matrix, square<br />

matrix, see matrix, square<br />

standard basis, see basis, standard<br />

string, 214<br />

subspace, 77<br />

complement, see subspace, complementary<br />

complementary, 207<br />

direct sum, 207<br />

invariant, 232<br />

sum, 207<br />

substitution<br />

back, see back substitution<br />

sum<br />

of subspaces, see subspace, sum<br />

sum, direct, see subspace, direct sum<br />

Sylvester, 288<br />

symmetric<br />

matrix, see matrix, symmetric<br />

symmetrize, 298<br />

tensor, 295, 304<br />

antisymmetric, 298<br />

contraction, 297<br />

contravariant, 310<br />

covariant, 310<br />

product, 297<br />

pure, 305<br />

rank, 306<br />

symmetric, 298<br />

tensor product<br />

basis, 305<br />

of tensors, 304<br />

of vector spaces, 304<br />

of vectors, 304<br />

theorem<br />

Cayley–Hamilton, 187, 230<br />

de Moivre, 150, 157<br />

decoupling, 110<br />

Pythagorean, 120, 202<br />

spectral, 136<br />

trace, 105, 237, 248, 298<br />

translate, 268<br />

transpose, 64, 268<br />

transposition, 32



transverse, 210<br />

triangle inequality, see inequality, triangle<br />

unit circle<br />

see circle, unit, 151<br />

unitary<br />

basis, see basis, unitary<br />

matrix, see matrix, unitary<br />

variable<br />

free, see free variable<br />

vector, 17, 161<br />

vector space, 161<br />

complex, 170<br />

real, 170<br />

weight<br />

of polynomial, 246
