13.07.2015 Views

SELECTED CHAPTERS FROM ALGEBRA I. R. Shafarevich Preface

SELECTED CHAPTERS FROM ALGEBRA I. R. Shafarevich Preface

SELECTED CHAPTERS FROM ALGEBRA I. R. Shafarevich Preface

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Selected chapters from algebra 3Chapter 10. Number (continued).(Arithmetic of Gauss numbers, number-theoretical applications.)Chapter 11. Polynomial (continued).(Constructions by means of compasses and ruler and the solution of equationsby square radicals.)Chapter 12. Polynomial (continued).(Symmetric functions.)Chapter 13. Polynomial (continued).(Solutions of cubic and biquadratic equations. Nonsolvability by radicals ofequations of degree n > 5.)Appendix II.(Equations of degree 5, icosahedron, problem of resolvents.)Chapter 14. Number (ended).(Finite elds and nite geometries.)Appendix III.(Construction of regular 17-gon.)Chapter 15. Polynomial (ended).(Formal power series and innite products. Applications to number theory.)CHAPTER I. NUMBER1. Irrational numbersNatural numbers arose as a result of counting. The cognition of the fact thattwo eyes, two men walking side by side and two oars of a boat have somethingin common, expressed by the abstract concept \two", was an important step inthe logical development of mankind. The step to follow was not made so easily.Consonance, in many languages, of the word \three" and the words \many" (orig.\mnogo") or \much" (orig. \slixkom") bears a record to it. And only step by step,the idea of innite series of natural numbers arose.Gradually, the concept of number was related not only to counting, but also tomeasuring of length, area, weight etc. To be more concrete, we will only considerthe length of line segments in the following. Firstofall,wehave tochoose a unitof length: cm, mm, km, light-year, . . . Thus, a segment E is xed and may beused for measuring of another segment A. If E is contained in A exactly n times,then we say that the length of segment A is equal to n (Fig. 1,a). But, as a rule,it will not happen (Fig. 1,b).


4 I. R. <strong>Shafarevich</strong>a) b)Fig. 1Then, the unit lessens, breaking up E into m equal segments E 0 . If E 0 is containedin A exactly n times, then we say that the length of A is equal to n (relative tomthe unit E). Thousands of years, men, in dierent parts of the world, had appliedthis procedure in a variety of situations until the question: is this breaking reallypossible?, arose. This completely new setting of question already belongs to thehistorical epoch|Pythagoras' School, in the period of VI or V century B.C. Thesegments A and E are called commensurable if there exists a segment E 0 exactlym times contained in E and n times in A. Thus the above question modies tothe following: are each two segments commensurable? Or further: is the length ofeach segment (a unit being xed) equal to a rational numbern ? The answer ismnegative and an example of a pair of incommensurable segments is simple. Considera square, its side being E and its diagonal A.THEOREM 1. The side of the square isincommensurable with its diagonal.Before we proceed with the proof, we give this theorem another form. Accordingto the famous Pythagorean theorem, the area of the square over the hypotenuseof a right triangle is equal to the sum of areas of the squares over the other twosides. Or, in other words, the square of the length of the hypotenuse is equal tothe sum of squares of the lengths of the other two sides. However, the diagonalA of our square is the hypotenuse of the isosceles triangle whose other two sidescoincide with the sides E of the square (Fig. 2) and hence in our case A 2 =2E 2 , n 2 nand if A = nE 0 , E = mE 0 , then =2orm m = p 2. Therefore, Theorem 1 canbe reformulated asTHEOREM 2. p 2 is not a rational number.Fig. 2 Fig. 3We shall give a proof of this form of the theorem, but rst we make thefollowing remark. Although we have leaned on the Pythagorean theorem, we have


Selected chapters from algebra 5actually used it only for the case of an isosceles right triangle, when the conclusionis evident. Namely, it is enough to complete the Fig. 2 by constructing the squareover the side A (Fig. 3). From known criterions of congruency it follows that all vesmall right isosceles triangles in Fig. 3 are congruent. Hence they have the samearea S. But the square whose side is E consists of two such triangles and its areais E 2 . Thus, E 2 =2S. Similarly, A 2 =4S. Hence, A 2 =2E 2 , i.e. (A=E) 2 =2,which isallwe need.We cannow proceed with the proof of Theorem 2. Since our task is to provethe impossibility of representing p 2intheform p 2= n , it is natural to start withmthe converse, i.e. to suppose that p 2= n , where n and m are positive integers.mWe also suppose that they are relatively prime, for if they have a common factor,it can be cancelled without changing the ratio n . By denition of the square root,m2 n 2=the equality p 2= n nm means that 2 = mwe obtain the equality(1) 2m 2 = n 2 . Multiplying both sides by m2m2 where m and n are relatively prime positive integers, and it remains to prove thatit is impossible.Since there is a factor 2 on the left-hand side of (1), the question is naturallyrelated to the possibility of dividing positive integers by 2. Numbers divisible by2 are said to be even, and those indivisible by 2tobeodd. Therefore, every evennumber k can be written in the form k = 2l, where l is a positive integer, i.e.we have an explicit expression for even numbers, whereas odd numbers are denedonly by a negative statement|that such an expression does not hold for them. Butit is easy to obtain an explicit expression for odd numbers.LEMMA 1. Every odd number r can be written in the form r =2s +1,wheres is a natural number or 0. Conversely, all such numbers are odd.The last statement is evident: if r =2s +1were even, it would be of the formr =2l, which implies 2l =2s + 1, i.e. 2(l ; s) = 1, which is a contradiction.In order to prove the rst statement, notice that if the odd number r 6 2, thenr =1andtherepresentation is true with s =0. If the odd number r > 1, thenr > 3. Subtracting 2 from it, we obtain the number r 1 = r ; 2 > 1, and r 1 is againodd. If it is greater then 1, we again subtract 2 and put r 2 = r 1 ; 2. In this waywe obtain a decreasing sequence of numbers r, r 1 , r 2 , . . . , where each member isless than its predecessor by 2. We continue this procedure as long as r i > 1, andsince positive integers cannot decrease indenitely, we shall arrive at the situationwhen we cannot further subtract the number 2, i.e. when r i =1. We obtain thatr i = r i;1 ; 2=r i;2 ; 2 ; 2== r ; 2 ; 2 ;2=r ; 2i =1. Hence r =2i +1,as stated.We can now prove the basic property ofeven and odd numbers.LEMMA 2. The product of two even numbers is even, the product of an evenandanodd number is even and the product of two odd numbers is odd.


6 I. R. <strong>Shafarevich</strong>The rst two statements follow directly from the denition of even numbers: ifk =2l, then no matter whether m is even or odd, we always have km =2lm whichis an even number. However, the proof of the last statement requires Lemma 1.Let k 1 and k 2 be two oddnumbers. By Lemma 1 we can write them in the formk 1 = 2s 1 +1, k 2 = 2s 2 +1, where s 1 and s 2 are natural numbers or 0. Thenk 1 k 2 =(2s 1 +1)(2s 2 +1) = 4s 1 s 2 +2s 1 +2s 2 +1=2s+1 where s =2s 1 s 2 +s 1 +s 2 .As we know, any number of the form 2s + 1 is odd and so k 1 k 2 is odd.Notice the following particular case of Lemma 2: the square of an odd numberis odd.Now we can easily nish the proof of Theorem 2. Suppose that the equality(1) is true where m and n are positive integers, relatively prime. If n is odd, thenby Lemma 2, n 2 is also odd, whereas from (1) follows that n 2 is even. Hence, n iseven and can be written in the form n =2s. But m and n are relatively prime, andso m must be odd (otherwise they would have common factor 2). Substituting theexpression for n into (1) and cancelling by 2,we obtainm 2 =2s 2 i.e. the square of the odd number m is even, whichisincontradiction with Lemma 2.Theorem 2, and hence Theorem 1, are proved.Under the supposition that the result of measuring the length of a segment(with respect to a given unit segment) is a number and that the square root of apositive number is a number, we can look at Theorems 1 and 2 from a dierentpoint of view. These theorems assert that in the case of the diagonal of a square orin the case of p 2, these numbers are not rational, that is to say they are irrational.This is the simplest example of an irrational number. All the numbers, rational andirrational, comprise real numbers. In one of the following chapters we shall give amore precise logical approach to the concept of a real number, and we shall use itin accordance with the school teaching of mathematics, that is to say we shall notinsist too much on the logical foundations.Why didsuch a simple and at the same time important fact, the existence ofirrational numbers, have towait so long to be discovered? The answer is simple|because for all practical purposes, we can take, for instance, p 2tobea rationalnumber. In fact, we haveTHEOREM 3. No matter how small is a given number ", itispossible to ndarational number a = m n , such that a


Selected chapters from algebra 7Then wemay take a =k10 , for p k 2; n 10 < 1 . The inequalities (2) are equivalentn n10tok2 (k +1)26 2 < or k 2 6 2 10 2n < (k +1) 2 . Since the number n, and so2n10 10 2n210 n ,isgiven, there exists the largest positive integer k whose square is not greaterthan 2 10 n . This is the number we want.Obviously, the conclusion of Theorem 3 holds not only for the number p 2,but also for any positive (for simplicity'ssake we conne ourselves to them) realnumber x. This becomes evident if werepresent x by a point ofthenumber axis,if we divide the unit length E into small1segments E and cover the entire linen10by these segments (Fig. 4). Fig. 4Then the last of those points which is not right from x gives the requiredrational number: if it is the k-th point, then a =k16 x and x ; a


8 I. R. <strong>Shafarevich</strong>4. Determine p 2 with accuracy not less than 1100 .5. Prove that every positive integer can be written as a sum of terms ofthe form 2 k , where all the terms are dierent. Prove that for each number thisrepresentation is unique.2. Irrationality of other square rootsIt would be interesting to generalize the results of the preceding section. Forexample, is it possible to prove in the same way that p 3 is irrational? Clearly, wehave to adapt the reasoning from the previous section to the new situation. n 2,orWe want toprove the impossibility of the equality 3=m(3) 3m 2 = n 2 where, as in section 1, wemay take that the fraction n cannot be further cancelled,mi.e. that the positive integers m, n are relatively prime. Since in (3) we have thenumber 3, it is natural to examine the properties of division by 3. We adaptLemmas 1 and 2 to the new case.LEMMA 3. Every positive integer r is either divisible by 3 or it can be representedin one of the following forms: r =3s +1 or r =3s +2,where s is a naturalnumbers or 0. The numbers 3s +1 and 3s +2 are not divisible by 3.The last statement is evident. If, for example, n = 3s +1 were divisibleby 3,wewould have 3s +1=3m, i.e.3(m ; s) =1,which is a contradiction. Ifn =3s +2were divisible by 3wewould have 3s +2=m. i.e. 3(m ; s) =2,againacontradiction. We prove the rst statement of Lemma 3 by the same procedureused to prove Lemma 1. If r is not divisible by 3 and is less than 3, then r =1orr = 2 and the given representation holds with s =0. If r>3, then subtracting 3from it we get r 1 = r ; 3 > 0andr 1 is again not divisible by 3. We continue tosubtract the number 3 and we obtain the sequence r, r 1 = r ; 3, r 2 = r ; 3 ; 3,... , r s = r ; 3 ; 3 ;;3, where we cannot subtract 3 any more, since as noticedabove, r s = 1 or r s = 2. As a result we have two possibilities: r ; 3s = 1, i.e.r =3s +1orr ; 3s =2,i.e.r =3s + 2, as stated.In the formulation of the following lemma we take from the formulation ofLemma 2 only that part which we shall use later.LEMMA 4. The product of two positive integers not divisible by 3 is itself notdivisible by 3.Let r 1 and r 2 be two positiveintegers not divisible by3. According to Lemma 3for each one there are two possibilities: the number can be written in the form 3s+1or in the form 3s +2. Hence, there are altogether four possibilities:1) r 1 =3s 1 +1 r 2 =3s 2 +1 2) r 1 =3s 1 +1 r 2 =3s 2 +23) r 1 =3s 1 +2 r 2 =3s 2 +1 4) r 1 =3s 1 +2 r 2 =3s 2 +2


Selected chapters from algebra 9The cases 2) and 3) dier only in the numerations r 1 and r 2 and it is enoughto consider only one of them (for instance, 2). For the remaining three cases wemultiply out:1) r 1 r 2 =9s 1 s 2 +3s 1 +3s 2 +1=3t 1 +1 t 1 =3s 1 s 2 + s 1 + s 2 2) r 1 r 2 =9s 1 s 2 +6s 1 +3s 2 +2=3t 2 +2 t 2 =3s 1 s 2 +2s 1 + s 2 3) r 1 r 2 =9s 1 s 2 +6s 1 +6s 2 +4=3t 3 +1 t 3 =3s 1 s 2 +2s 1 +2s 2 +1(in the last formula we put 4 = 3+1, and group 3 with the numbers divisible by 3).Asaresultwe obtain numbers of the form 3t + 1 and 3t + 2 which are not divisibleby 3 (Lemma 3).Now we can easily carry over Theorem 2 to our case.THEOREM 3. p 3 is not a rational number.The proof follows closely the lines of the proof of Theorem 2. We have toestablish a contradiction starting with the equality (3):3m 2 = n 2 , where m and nare relatively prime. If the number n is not divisible by 3, according to Lemma 4its square is also not divisible by 3. But it is equal to 3m 2 , which means that nis divisible by 3: n = 3s. Substituting this into (3) and cancelling by 3 we getm 2 =3s 2 . But since n and m are relatively prime and n is divisible by 3,m cannotbe divisible by 3.In view of Lemma 4 its square is also not divisible by 3,butitisequal to 3s 2 . This contradiction proves the theorem.The close parallel between the reasonings used in the two cases we provedleads us to think that we can carry on. Of course, we donotconsider p p p 4, since4 = 2, but we may apply the same line of reasoning to 5. Clearly, we shallhave toprove a lemma analogous to Lemmas 2 and 4, but the number of productswhich have tobeevaluated will increase. We canverify all of them and concludethat p 5 is irrational. We cancontinue and consider p 6, p 7,etc.In each new casethe number of checkings in the proof of the lemma which corresponds to Lemmas2 and 4 will increase. Considering all natural numbers n, for instance up to 20, wecan conclude that p n is irrational, except in those cases when n is the square of aninteger (n = 4, 9 and 16). In this way, having to do more and more calculations,we can infer that p 2, p 3, p 5, p 6, p 7, p 8, p 10, p 11, p 12, p 13, p 14, p 15, p p p 17,p 18 and 19 are irrational numbers. This leads to the following conjecture: n isirrational for all positive integers n which are not squares of positive integers. Butwe cannot prove this general conjecture by the reasoning applied up to now, sincein one step of the proof we have to analyse all possible cases.It is interesting that the road covered by our reasoning was actually covered bymankind. As we have said, the irrationalityof p 2was proved by the Pythagoreans.Later on irrationality of p n was proved for some relatively small numbers n, untilthe general problem was formulated. About its solution we read in Plato's dialogue\Theaetetus", written in about 400 B.C. The author narrates how the famousphilosopher Socrates met with Theodor, mathematician from Cyrene and his young,very talented pupil by the name of Theaetetus. Theaetetus had the age of today'sschoolboy, between 14 and 15 years. Theodor's comment on his abilities reads: \he


10 I. R. <strong>Shafarevich</strong>approaches studying and research with such ease, uency, eagerness and peace as oilows form a pot and I wonder how canoneachieve somuch at that age". Furtheron, Theaetetus himself informs Socrates about the work he did with a friend, alsocalled Socrates, a namesake of the philosopher. He says that Theodor informedthem about the incommensurability (to use contemporary terms) of the side of asquare with the unit segment ifthe area of the square is an integer, but not thesquare of an integer. If this area is n, this means that p n is irrational. Theodorproved this for n = 2, 3, 5, and \solving one case after the other, he came up to 17".Theaetetus became intersted in the problem and, together with his friend Socrates,solved it, as it is recorded at the end of the dialogue. We shall not go into thereasoning of Theodor (there are several hypotheses), but we shall give the proof ofthe general statement, following the exposition of Euclid which is,very probablyanalogous to the proof of Theaetetus (with a simplication given by Gauss 2000years later).We rst prove an analog of Lemmas 1 and 3.THEOREM 4. For any two positive integers n and m there exist integers tand r, positive or 0, suchthatr n, the equality (4)holds with t = 0, r = n. If n > m, then put n 1 = n ; m. Clearly, n 1 > 0. Ifn 1 > m, putn 2 = n 1 ;m. We keep on subtracting m until we arrive atthenumbern t = n ; m ;;m = r, where r > 0 but t 2 . Subtracting the second equality from the rst we get:m(t 1 ; t 2 )+r 1 ; r 2 =0,i.e.m(t 1 ; t 2 )=r 2 ; r 1 . Since r 1


Selected chapters from algebra 11be prime (and hence a prime divisor of n) or it has two factors: a = a 1 b 1 . Thenn = a 1 (b 1 b)wherea 1


12 I. R. <strong>Shafarevich</strong>You have noticed that the above reasoning is similar to the proofs of Lemmas 2and 4: the statement reduces to the case when in (5), n 1 and n 2 (or better tosay r 1 and r 2 ) are less than p. But here, consideration of all possible cases anddirect checking are replaced by an elegant reasoning, by which, the theorem can betaken to be true for smaller values than p. (Euclid proved Theorem 5 somewhatdierently. Most probably, the here presented reasoning belongs to Gauss.)Now, to prove irrationality in general, no new ideas are needed.THEOREM 6. If c is a positive integer which is not the square of any positiveinteger, then c is not the square of any rational number, i.e. p c is irrational.We may again verify our statement, going from a positive integer to the biggerone, and thus, we can suppose the theorem has been proved for all smaller c's.In that case, we can assume that c is not divisible by the square of any positiveinteger greater than 1. Indeed, if c = d 2 f, d > 1, then f < c and f is not thesquare of positive integer, since f = g 2 would imply c =(dg) 2 which contradictsthe assumption of the theorem. Thus we can assume the theorem has alreadybeen proved for f, and accordingly, take that p f is irrational. But then p c isnot rational, either. In fact, the equality p c = n m , in view of p c = d p f, yieldsnm = dp f, p f =ndm ,whichwould mean p f is rational.Now we proceed to the main part of the proof. Suppose p c is rational andp n c = , where n and m are taken, as we already did it before, to be relativelymprime. Then m 2 c = n 2 . Let p be a prime factor of c. Put c = pd and d is notdivisible by p, otherwise c would be divisible by p 2 and now we consider the casewhen c is not divisible by a square. From m 2 c = n 2 , it follows that n 2 is divisibleby p and, according to Theorem 5, n is divisible by p. Let n = pn 1 . Using n = pn 1and c = pd and substituting in the relation m 2 c = n 2 ,wegetm 2 d = pn 2 1. Being mand n relatively prime and n divisible by p, m is not divisible by p. Then, accordingto Theorem 5, m 2 is not divisible by p, either. And, as we have seen it, d is notdivisible by p because it would imply that c is divisible by p 2 . Now, the equationm 2 d = pn 2 1is contradictory to Theorem 5.Notice that, in this section, we have more than once derived this or thatproperty of positive integers taking them one after the other and, rst, checkingthe property for n = 1 and, then, after supposing its validity for numbers lessthan n, weproved it for n.Here we lean upon a statement which has to be considered as an axiom ofarithmetic.If a property of positive integers is valid for n = 1 (or n = 2) and if fromits validity for all positive integers less than n, thevalidity forn follows, then theproperty isvalid for all positive integers. This statement is called the Principle ofMathematical Induction or of Total Induction. Sometimes, instead of supposingthe validity for all numbers less than n, onlythevalidity forn ; 1 is supposed. Astatement which corresponds to the case n =1orn =2,withwhich the reasoningstarts (sometimes n = 0 is more convinient) is called the basis of induction and


Selected chapters from algebra 13the statement corresponding to n ; 1, the inductive hypothesis. The Principleof Mathematical Induction is also used to produce a type of denitions, when aconcept involving a positive integer n is dened by supposing that it has alreadybeen dened for n ; 1. For example, when we dene an arithmetic progressionusing the property that each term is obtained from the preceding one by adding aconstant d, called the common dierence, then we have thetype of a denition byinduction. To express the denition symbolically, we writea n = a n;1 + d:And to determine the entire progression we only need to know the initial term: a 1or a 0 .In one of his treatise, French mathematician and physicist H. Poincare considersthe question: how is it possible that mathematics, which is founded on proofscontaining syllogisms, that is statements expressed in a nite number of words, doeslead to theorems related to innite collections (for example, Theorem 6 holds forinnite set of numbers c Theorem 2 asserts that 2n 2 6= m 2 for all positive integersn and m, thenumber of which is also innite). Possibilities for it, Poincare sees inthe Principle of Mathematical Induction, which, in his words, \contains an innitenumber of syllogisms condensed in a single formula".Problems1. Prove the irrationality of p 5 by the same method used in the proofs ofTheorems 2 and 3.2. Prove that the numberofpositiveintegers divisible by m and less than nis equal to the integer part of the quotient, when n is divided by m.3. Prove that if a positive integer c is not the cube of any positive integer,then 3p c is irrational.4. Replace the reasoning, connected with succesive subtracting of the numberm in the proof of Theorem 4, by a reference to the Principle of MathematicalInduction.5. Using the Principle of Mathematical Induction, prove the formula1+2++ n =n(n +1):26. Using the Principle of Mathematical Induction, prove the inequality n 6 2 n .3. The prime factorizationIn the previous section we have seen that every natural numberhasaprimedivisor. Starting with this, we cangetmuch more:THEOREM 7. Every natural number greater than 1 can be expressed as aproduct of prime numbers.


14 I. R. <strong>Shafarevich</strong>If the number p is itself prime, then the equality p = p is considered as a primefactorization containing only one factor. If the number n>1 is not prime, it hasa prime divisor, dierent from itself: n = p 1 n 1 and (since by denition p 1 6= 1),n 1 n 1 >n 2 > . Since our process must come to an end, we obtainn r = 1 for some r and the factorization is n = p 1 ... p r . Of course, the readercan easily formulate this proof in a more \scientic" way|using the method ofmathematical induction.The process used in the proof of Theorem 7 is not uniquely determined: ifthe number n has several prime factors, then any ofthemcould be the rst one.For example, 30 can be expressed rst as 2 15 and then as 2 3 5 and it isalso possible to express it rst as 3 10 and then as 3 2 5. The fact that tworesulting factorizations dier only by the order of their factors, was unpredictable.If for number 30 we could easily foresee all possibilities, is it so simple to convinceoneself that the number740037721 = 23623 31327has no other prime factorizations?In school curricula it is usually assumed, as a self-evident fact, that every givennatural number has only one prime factorization. However, this claim has to beproved, as the following example shows. Suppose that we know onlyeven numbersand do not know how to use odd numbers. (It is possible that this is a reection ofthe real historical situation, since in English the term \odd" has also the meaning\strange"). Repeating literally the denition of the prime number, we should call\prime" all even numbers which do not factorize into product of two even factors.For instance, the \prime" numbers would be 2, 6, 10, 14, 18, 22, 26, 30, . . . Thenagiven number may have two dierent \prime" factorizations, for example60 = 2 30 = 6 10:It is also possible to nd numbers with more dierent factorizations, such as420 = 2 210 = 6 70 = 10 42 = 14 30:Therefore, if the prime factorization is indeed unique, then in the proof of thisstatement wemust use some properties which express that we are dealing with allnatural numbers and not, say, with even numbers.Since we areconvinced that the uniqueness of the prime factorization is notself-evident, let us prove it.THEOREM 8. Any two prime factorizations of a given natural number dieronly by the order of their factors.The proof of the theorem is not really slef-evident, but all the diculties havebeen already overcome in the proof of Theorem 5. From this theorem everythingfollows easily.


Selected chapters from algebra 15First, let us note an obvious generalization of Theorem 5.Ifaproduct of any number of factors is divisible by a prime number p, thenatleast one of them is divisible by p.Letn 1 n 2 ... n r = p a:We shall prove our statement by induction on the number of factors r. When r =2,it coincides with Theorem 5. If r>2, write the equality in the formn 1 (n 2 ... n r )=p a:According to Theorem 5, either n 1 is divisible by p|and then the statement isproved|or n 2 ... n r is divisible by p|and then the statement is again true bythe induction hypothesis.We now prove Theorem 8. Suppose that a number n has two prime factorizations:(8) n = p 1 ... p r = q 1 ... q s :We see that p 1 didvides the product q 1 ... q s . By the generalization of Theorem5, proved earlier, p 1 divides one of the numbers q 1 ,...,q s . But q i is a primenumber and its only prime divisor is itself. Hence, p 1 coincides with one of the q i 's.Changing their numeration we can take p 1 = q 1 . Cancelling the equality (8)by p 1we get(9) n 0 = n p 1= p 2 ... p r = q 2 ... q s :This is a statement concerning a smaller number n 0 and using mathematicalinduction we can take it to be true. Hence the number of factors in the twofactorizations is the same, i.e. r ; 1=s ; 1, implying r = s. Besides, the factors q 2 ,... , q s can be written in such an order that p 2 = q 2 , p 3 = q 3 , ... , p r = q r . Sincewe have already established that p 1 = q 1 , the theorem is proved.The theorem we justproved can be found in Euclid. Although simple, it wasalways considered to be an abstract mathematical theorem. However, in the lasttwo decades it found an unexpected practical application which we shall shortlycomment. The application is connected with coding, i.e. writing an information insuch a form that it cannot be understood by a person who does not know someadditional information (the key of the code). Namely, it turns out that the problemof prime factorization of large numbers requests an enormous amount of operationsthis problem is much more involved than the \inverse" problem|multiplication ofprime numbers. For example, it is possible, though tedious, to multiply two primenumbers, each one having tens of digits (30 or 40 digits, say) in a day and to writedown the result (which willhave about 70 digits) in the evening. But factorizationof this number into two primes would take more time, even if we use the bestcontemporary computer, than the time which elapsed since the formation of theEarth. Hence, a pair of large numbers p and q on one hand, and their product


16 I. R. <strong>Shafarevich</strong>n = pq on the other, give, in view of Theorem 8, the same information, written intwo dierent ways, but the transition from the pair p, q to the number n = pq iseasy, whereas the transition from n to the pair p, q is practically impossible. Thisis the underlying idea of coding we omit technical descripiton.In the prime factorization of a number, certain primes may appear severaltimes, for insatnce, 90 = 2 3 3 5. We can group together the equal factors andwrite 90 = 2 3 2 5. Hence, for each positive integer n we have the factorization(10) n = p 11 p2 2 ... pr rwhere all the primes p 1 , ... , p r dier from one another and the exponents i > 1.This factorization is said to be canonical. Of course, such a factorization is uniquefor every n.Knowing the canonical factorization of a number n, we can nd out whateverwe want about its divisors. First, if the canonical factorization has the form (10),then it is obvious that the numbers(11) m = p 11 ... prrwhere 1 6 1 , 2 6 2 ,..., r 6 r , are divisors of n, where i may have thevalue 0 (i.e. some of the p i 's which are divisors of n need not be divisors of m).Conversely, any divisor of n has the form (11). Indeed, if n = mk, then k is adivisor of n, i.e. it has the form (11): k = p 11 ... prr. Multiplying the canonicalfactorizations of m and k and grouping together the powers of equal primes, wehave to arrive at the factorization (10), since this factorization is, by Theorem 8,unique. When two powers of a prime number are multiplied, their exponents addup which implies that 1 + 1 = 1 , i.e. 1 6 1 and similarly 2 6 2 , ... , r 6 r .For example, we can nd the sum of the divisors of n. We also take the numberitself, n and also 1 to be its divisors. For instance, n = 30 has the divisors 1, 2,3, 5, 6, 10, 15, 30 and their sum is 72. Consider rst the simplest case when n isapower of a prime number: n = p . Then its divisors are the numbers p where0 6 6 , i.e. the numbers 1, p, p 2 , ... , p . We therefore have tondthesum1+p + p 2 + + p . There is a general formula (which you may already know)which gives the sum of consecutive powers of a number:s =1+a + + a r :The derivation of the formula is quite simple: we multiply both sides of the aboveequality by a:sa = a + a 2 + + a r+1 :We see that the expressions for s and sa consist of almost the same terms, but ins we have 1 which doesnotgure in sa, whereas in sa wehave a r+1 which doesnot gure in s. Hence, after subtracting s from sa, all the terms cancel out, exceptthose two:sa ; s = a r+1 ; 1


Selected chapters from algebra 17i.e. s(a ; 1) = a r+1 ; 1, and(12) s =1+a + a 2 + + a r = ar+1 ; 1a ; 1 :Since we have divided by a ; 1, we must suppose that a 6= 1.Therefore, if n = p the sum of its divisors is 1+p + + p = p+1 ; 1p ; 1 .Consider now the next case when n has two prime divisors p 1 and p 2 . Its canonicalfactorization has the form n = p 11 p2 2. In view of formula (11), the divisors of nare p 11 p2,where 0 6 2 1 6 1 , 0 6 2 6 2 . Split them up into groups, onegroup for each value of 2 . So, for 2 = 0 we obtain the divisors 1, p 1 , p 2, ... , 1p 11whose sum is p1+1 1; 1p 1 ; 1 . For 2 = 1 we obtain the group p 2 , p 1 p 2 , p 2 1p 2 ,... , p 1 p 1 2. In order to nd the sum of those divisors, we notice that it is equalto (1 + p 1 + p 2 1+ + p1 )p 1 2 = p1+1 1; 1p 2 . Similarly, for any value of 2 wep 1 ; 1obtain the sum (1 + p 1 + p 2 1of divisors isp 1+11; 1p 1 ; 1+ p1+1+ + p11 )p21; 1p 2 + p1+1 1; 1p 1 ; 1 p 1 ; 11; 1p 1 ; 12 = p1+1p 2 2p1+11; 1+ + p 22p 1 ; 11; 1p 1 ; 1(1 + p 2 + p 2 2= p1+1p 22. Hence, the total sum+ + p22 ):We evaluate the sum in the parentheses by another application of the formula(12). As a result we conclude that the sum of all divisors of n = p 11 p2 2isp 1+11; 1 p2+1 2; 1p 1 ; 1 p 2 ; 1 .We now pass on to the general case. Consider the productS 0 =(1+p 1 + p 2 1+ + p1 )(1 + p 1 2 + p 2 2+ + p2 ) 2 ... (1 + p r + p 2 r+ + pr ) rand remove the parentheses. Howdowe do that? If wehave one pair of parentheses,i.e. an expression of the form (a + b + )k, we multiply each ofthe summandsa, b, etc. by k and the result is the sum of ak, bk, etc. If we have two pairsof parentheses (a 1 + b 1 + c 1 + )(a 2 + b 2 + c 2 + ) we multiply each termfrom one parentheses by each term from the other and the result is the sum ofall terms a 1 a 2 , a 1 b 2 , a 1 c 2 , b 1 a 2 , b 1 b 2 , etc. Finally, foranynumber of parentheses(a 1 + b 1 + c 1 + )(a 2 + b 2 + c 2 + ) ... (a r + b r + c r + )wetake one termfrom each, multiply them and then evaluate the sum of all such products. Applythis rule to our sum S 0 .The terms in the parentheses have the form p 1, 1 p2,2, which is,in view... , p rr(0 6 i 6 i ). Multiplying them we getp 11 p22 pr rof (11), a divisor of n, and according to Theorem 8, each one appears only once.Hence the sum S 0 is equal to the sum of the divisors of n. On the other hand, the


18 I. R. <strong>Shafarevich</strong>i-th parentheses, according to (12), is equal to pi+1 ; 1i, and the product of allp i ; 1parentheses isS = p1+1 1; 1 p2+1 2; 1 ... pr+1 r; 1p 1 ; 1 p 2 ; 1 p r ; 1 :This is the formula for the sum of all divisors. But we have also found thenumber of divisors. Indeed, in order to determine the number of divisors we haveto replace each summand in the sum of divisors by 1. Returning to the previousproof, we see that it is enough to replace each summand in each parentheses of theproduct S 0 by1. The rst parentheses is then equal to 1 + 1, the second to 2 +1,. . . , the r-th to r +1. Hence, the number of divisors is ( 1 +1)( 2 +1)( r +1).For example, the number of divisors of the number whose canonical factorizationis p q is equal to ( +1)( +1).In the same way we can derive the formula for the sum of squares or cubes orgenerally k-th powers of the divisors of n. The reasoning is the same as the oneapplied for nding the sum of the divisors. Verify that the formula for the sum ofk-th powers of all divisors of the number n with the canonical factorization (10) is(13) S = pk(1+1) 1; 1p k 1 ; 1 pk(2+1) 2; 1p k 2 ; 1 ... pk(r+1)r ; 1p k ; 1 :rWe can also investigate common divisors of two positive integers m and n. Lettheir canonical factorizations be(14) n = p 11 ... prr m = p1 1 ... pr rwhere in any pair of numbers ( i i ) one of them may have thevalue 0|this isthe case when a prime number divides one of the numbers m, n, but not the other.Then on the basis of what we know about divisors we can say that the number kis a common factor of m and n if and only if it has the formk = p 11 ... prrwhere 1 6 1 , 1 6 1 , 2 6 2 , 2 6 2 , ... , r 6 r , r 6 r . In other wordsif i denotes the smaller of the numbers i , i , these conditions become 1 6 1 , 2 6 2 , ... , r 6 r . Put(15) d = p 11 ... prr:The above reasoning proves that the following theorem is true.THEOREM 9. For any two numbers with canonical factorization (14), thenumber d, denedby(15), divides both n and m, and any common factor of n andm divides d.The number d is called the greatest common divisor of n and m and is denotedby g: c: d:(n m). It is clear that among all divisors of n and m, d is the greatest,but it is not obvious that all other common divisors divide it. This follows from


20 I. R. <strong>Shafarevich</strong>greatest degree of all monomials which comprise a polynomial is the degree of thepolynomial. For example, the degree of the polynomial 2x 3 ; 3x + 7 is 3 and itscoecients are a 0 =7,a 1 = ;3, a 2 =0,a 3 =2. Thus, a polynomial of degree ncan be written in the formf(x) =a 0 + a 1 x + a 2 x 2 + + a n x n where some of a k 's can be zero, but a n 6=0,because otherwise the degree of thepolynomial would be less than n. The term a 0 is called the constant term of thepolynomial, a n is its leading coecient. The equation f(x) =0iscalledanalgebraicequation with one unknown. Anumber is called its root if f() =0. Arootofthe equation f(x) =0isalsocalledaroot of the polynomial f(x). Degree of thepolynomial f(x) is the degree of the equation. Obviuosly, the equations f(x) =0and cf(x) = 0, where c is a number, distinct from 0, are equivalent.Now we shall treat such equations f(x) = 0 whose coecients a 0 , a 1 , ... ,a n are rational numbers, some of which may be equal to 0 or negative. If c isthe common denominator of all, distinct from 0, coecients, we can pass from theequation f(x) = 0 to the equation cf(x) = 0 having integer coecients. In thesequel we shall treat only such equations. In this connection we shall have to dealwith the divisibility of (not only nonnegative) integers. Recall that an integer a is,by denition, divisible by aninteger b if a = bc for some integer c.THEOREM 10. Let f(x) be a polynomial with integer coecients and withleading coecient equal to 1. If the equation f(x) =0has arational root , then is an integer and it is a divisor of the constant term of the polynomial f(x).Let us represent in the form = a b , where the fraction a is irreducible,bi.e. positive integers a and b are relatively prime. By the condition, the polynomialf(x) =a 0 + a 1 x + a 2 x 2 + + a n;1 x n;1 + x n has integer coecients a i . Let ussubstitute into the equation f(x) =0. By the assumption,(16) a 0 + a a 1 + + a a n;1 a nn;1 + =0:bbbMultiply the equation by b n and transfer (a) n to the right-hand side. All theterms remaining on the left-hand side will be divisible by b:(a 0 b n;1 + a 1 (a)b n;2 + + a n;1 (a) n;1 b n;2 )b =(1) n;1 a n :We see that b divides a n . If were not an integer, b would be > 1. Let p be someof its prime divisors. Then it has to divide a n ,andby Theorem 5, p divides a, too.However, by the assumption, a and b are relatively prime and we have obtained acontradiction. Hence, b =1and = a.In order to obtain the second assertion of the theorem, let us leave justa 0 onthe right-hand side, and transfer all the other terms to the right-hand side (recallthat b = 1). All the terms on the right-hand side are divisible by a:a 0 = a(a 1 ; a 2 (a) ;;a n;1 (a) n;2 ; (a) n;1 ):Obviously, it follows that a devides a 0 .


Selected chapters from algebra 21Theorem 10 allows us to nd rational roots of equations of the given form:in order to do that we have to list all the divisors of the constant term (withsigns + and ;) and check whether they are roots. For example, for the equationx 5 ; 13x +6=0wehave tocheck thenumbers 1, 2, 3, 6. Only x = ;2 isaroot.In such a way, all the roots of a polynomial f(x) with integer coecientsand the leading coecinet equal to 1 are irrational, except integer roots which areincluded as divisors of the constant term.That is just what we have proved in thebeginning of this Chapter: rstly for f(x) =x 2 ; 2 (Theorem 2), then for f(x) =x 2 ;3 (Theorem 3) and nally for f(x) =x 2 ;c, where c is an integer (Theorem 6).Now wehave obtained the widest generalization of all these assertions. It has a lotof other geometrical applications, besides Theorems 1, 2, 3, 6.Consider, e.g., the equation(17) x 3 ; 7x 2 +14x ; 7=0:By Theorem 10, its integer roots can be just divisors of the number ;7, i.e. one ofthe numbers 1, ;1, 7, ;7. Substitutions show that neither of these numbers satisesthe equation. We can conclude that the roots of the equation are irrational numbers.As a matter of fact, we do not know whether the equation (17) has roots at all.But, we shall show later that it has roots and even very interesting ones. One of itsroots appears to be the square of the side of theregular heptagon, inscribed in the circle of radius 1.Moreover, the equation (17) has three roots, lying:between 0 and 1, between 2 and 3 and between3and4. They are the squares of the diagonals ofthe regular heptagon, inscribed into the unit circle.Here by a diagonal we meananarbitrary segment,joining two vertices of a polygon, so that sides areincluded as diagonals. The regular heptagon hasthree diagonals of dierent length|AB, AC andAD (g. 5). In such a way, all these lengths areirrational numbers. Fig. 5Problems1. Show that Theorem 5 is not valid if concepts of a number and a \prime" areunderstood as applied just to even numbers as has been discussed in the beginningof this section. Which part of the proof of Theorem 5 appears to be wrong in thatcase?2. Prove that if integers m and n are relatively prime, then divisors of mnare obtained multiplying divisors of m by divisors of n, and each divisor of mncan be obtained in this way exactly once. Deduce that if S(N) denote the sum ofk-th powers of all divisors of N, m and n are relatively prime and N = mn, thenS(N) =S(m)S(n). Derive the formula (14) in that way.


22 I. R. <strong>Shafarevich</strong>3. A positive integer n is called perfect if it is equal to the sum of its properdivisors (i.e. the number itself is excluded from the set of its divisors). E.g. numbers6 and 28 are perfect. Prove that if for some r the number p =2 r ; 1 is prime, then2 r;1 p is a perfect number (but recall that the formulaonthesumofdivisorsS wehave deduced, includes the number n itself). This proposition was already knownto Euclid. Nearly 2000 years later Euler proved the inverse assertion: each evenperfect number is of the form 2 r;1 p, where p =2 r ;1 is a prime. Proof does not useany facts other than the ones presented previously, but is by no means easy. Tryto rediscover this proof! By now, it is not known whether there exist odd perfectnumbers.4. If for two positive integers m and n there exist such integers a and b so thatma + nb =1,thenobviously m and n are relatively prime: each of their commondivisors is divisible by 1. Prove the converse: for relatively prime numbers m andn there always exist integers a and b such that ma + nb = 1. Use the divisionalgorithm and mathematical induction.5. Using the result of problem 4 prove Lemmas 6 and 7 without using thetheorem on uniqueness of prime factorization. Show that in such a way a newproof of this theorem (Theorem 8) can be obtained. This was just the way Euclidproved it.6. Find integer values of a such that the polynomial x n + ax + 1 has rationalroots.7. Let f(x) be a polynomial with integer coecients. Prove that if a reducedfraction = a is a root of the equation f(x) = 0, then b divides the leadingbcoecient anda divides the constant term. This is a generalization of Theorem 10to the case of a polynomial with integer coecients where the leading coecientneed not be equal to 1.I. R. <strong>Shafarevich</strong>,Russian Academy of Sciences,Moscow, Russia


THE TEACHING OF MATHEMATICS1999, Vol. II, 1, pp. 1{30<strong>SELECTED</strong> <strong>CHAPTERS</strong> <strong>FROM</strong> <strong>ALGEBRA</strong>I. R. <strong>Shafarevich</strong>Abstract. This paper is the second part of the publication \Selected chaptersof algebra", the rst part being published in the previous volume of the Teaching ofMathematics, Vol. I (1998), 1-22.AMS Subject Classication: 00 A 35Key words and phrases: Polynomial, multiple roots and derivatives, binomialformula, Bernoulli's numbers.CHAPTER II. POLYNOMIAL1. Roots and divisibility of polynomialsIn this chapter we shall be concerned with equations of the type f(x) = 0,where f is a polynomial. We have already met with them at the end of the previouschapter. The equation f(x) = 0 should be understood as the problem: nd allthe roots of the polynomial (or the equation).But it may happen that all thecoecients of the polynomial f(x) are 0 and the equation f(x) = 0 turn into anidentity. We then write f = 0 and in that case we agree that the degree of thepolynomial f is not dened.In order to add up two polynomials we simply add the corresponding members.Polynomial are multiplied using the bracket rules. If f(x) =a 0 + a 1 x + + a n x n ,g(x) =b 0 +b 1 x++b m x m , then f(x)g(x) =(a 0 +a 1 x++a n x n )(b 0 +b 1 x++b m x m ). Eliminating the brackets we obtain members a k b l x k+l , where 0 6 k 6 n,0 6 l 6 m. After that we group together similar members. As a result we obtainthe polynomial c 0 + c 1 x + c 2 x 2 + with coecients(1) c 0 = a 0 b 0 c 1 = a 0 b 1 + a 1 b 0 c 2 = a 0 b 2 + a 1 b 1 + a 2 b 0 ...The coecient c m is equal to the sum of all products a k b l , where k + l = m.Polynomials share many properties with integers. The representation of apolynomial in the form f(x) =a 0 + a 1 x + + a n x n can be considered to be ananalog of the representation of a positive integer in the decimal (or some other)system. The degree of a polynomial has the role analogous to the absolute valueThis paper is an English translation of: I. R. Xafareviq, Izbranye glavy algebry,Matematiqeskoe obrazovanie, 1, 2, il~{sent. 1997,Moskva, str. 3{33. In the opinion ofthe editors, the paper merits wider circulation and we are thankful to the author for his kindpermission to let us make this version.


2 I. R. <strong>Shafarevich</strong>of an integer. For example, if we prove a property ofintegers by induction on theabsolute value, then in the proof of the analogous property of polynomials we useinduction on the degree. Notice the following important property: the degree ofthe product of two polynomials is equal to the sum of their degrees. Indeed, letf(x) =a 0 + a 1 x + + a n x n and g(x) =b 0 + b 1 x + + b m x m be polynomials ofdegree n and m, that is to say a n 6=0,b m 6=0. If we calculate the coecients off(x)g(x) using (1), we obtain members of the form a k b l x k+l where k + l 6 n + m.Clearly, the greatest degree we getis m + n and there is only one such member:a n b m x n+m . It diers from zero, since a n b m 6= 0, and it cannot be cancelled withsome other member, since it has the greatest degree. This property is analogous tothe property jxyj = jxjjyj of the absolute value jxj of a number x.The theorem on division with a remainder for polynomials is formulated andproved almost in the same way as for positive integers (Theorem 4, Chapter I).THEOREM 1. For any polynomials f(x) and g(x), where g 6= 0, there existpolynomials h(x) and r(x) such that(2) f(x) =g(x)h(x)+r(x)where either r =0,orthedegree ofr is less than the degree ofg. For given f andg, the polynomials h and r are uniquely determined.If f =0,then the representation (2) is obvious: f =0 g +0. Suppose thatf 6= 0 and apply the method of mathematical induction on the degree of f(x).Suppose that the degree of f(x) isn and that the degree of g(x) ism:f(x) =a 0 + a 1 x + + a n x n g(x) =b 0 + b 1 x + + b m x m :If m>n, the representation (2) has the form f =0 g + f, with h =0,r = f.If m 6 n, put f 1 = f ; a nx n;m g (remember that b m 6= 0, since the degree ofb mg(x) ism). Clearly, in the polynomial f 1 the members having x n cancel out (thatis how wechose the coecient ; a n), which means that its degree is less than n.b mHence, we can take that the theorem is true for that polynomial and that it hasthe representation of the form (2): f 1 = gh 1 + r, where r = 0 or its degree is lessh 1 + a nx n;m g + r, andwehaveb mthan m. This implies f = f 1 + a nb mx n;m g =obtained the representation (2) with h = h 1 + a nb mx n;m . Let us prove that therepresentation (2) is unique. If f = gk + s is another such representation (whichmeans that s = 0 or its degree is less than m), then subtracting one from the otherwe getg(h ; k)+r ; s =0 g(h ; k) =s ; r:If the polynomial s ; r is 0, then s = r and h = k. If s ; r 6= 0, then its degree isless than m and we arrive at a contradiction, since s ; r is equal to the polynomialg(h;k), obtained by multiplying g by h;k, and hence its degree cannot be smallerthan the degree of g which ism.


Selected chapters from algebra 3Now please read the proof of Theorem 4 of Chapter I in order to see thatthe above proof is perfectly analogous to it. On the other hand, if we doalltheoperations which should be done in the application of the mathematical induction(i.e. transition from f 1 to the polynomial f 2 with even smaller degree, etc. untilwe arrive at the remainder r whose degree is less than m), we obtain the rule forthe so called \corner" division of polynomials used in schools. For example, iff(x) =x 3 +3x 2 ; 2x +5,g(x) =x 2 +2x ; 1, then the division is done accordingto the scheme:x 3 +3x 2 ; 2x +5 j x 2 +2x ; 1x 3 +2x 2 ; x x +1x 2 ; x +5x 2 +2x ; 1; 3x +6This means that we choose the leading term of the polynomial h(x) so thatwhen it is multiplied by the leading term of g(x) (that is x 2 ) the result is the leadingterm of f(x) (that is x 3 ). Therefore the leading term of h(x) isx. In the rst rowof the above table we have f(x) and in the second g(x)x (the product of g(x) andthe leading term of h(x)). Their dierence is in the third row. We nowchoose thenext term of the polynomial h(x) so that when multiplied by the leading term ofg(x) (that is x 2 ) it becomes equal to the leading term of the polynomial in the thirdrow (that is x 2 ). Hence the next term of h(x) is1.We now repeat the procedure.In the fth row we obtain a polynomial of degree 1 (which is less than 2, the degreeof g(x)) and so the procedure stops. We see thatx 3 +3x 2 ; 2x +5=(x 2 +2x ; 1)(x +1); 3x +6:As in the case of numbers, the representation (2) is called division with remainderof the polynomial f(x) by the polynomial g(x). Polynomial h(x) is thequotient and polynomial r(x) istheremainder in this division.The division of polynomials is analogous to the division of numbers and it iseven simpler, since when we add two terms of a certain degree we obtain a term ofthe same degree, and we do not transform into tens, hundreds, etc., as in the caseof number division.Repeating the reasoning given in Chapter I for numbers, we canapply Theorem1 to nd the greatest common divisor of two polynomials. In fact, usingthe notation of Theorem 1 we have the following analog of Lemma 5 from ChapterI: g: c: d:(f g) = g: c: d:(g r) more precisely, the pairs (f g) and (g r) havethe same common divisors. We can now use Euclid's algorithm as in Chapter I:divide with reaminder g by r: g = rh 1 + r 1 , then r by r 1 , and so on, to obtain thefollowing sequence of polynomials: r, r 1 , r 2 , ... , r n whose degrees decrease. Westop at the moment when we obtain the polynomial r k+1 = 0, i.e. when r k;1 =r k h k . From the equalities g: c: d:(f g) =g: c: d:(rr 1 )= =g: c: d:(r k;1 r k )wesee that g: c: d:(f g) = g: c: d:(r k;1 r k ). But since r k is a divisor of r k;1 , then


4 I. R. <strong>Shafarevich</strong>g: c: d:(r k;1 r k )=r k and so g: c: d:(f g) =r k | the last nonzero remainder in theEuclid's algorithm. It should be remarked that g: c: d:(f g) is not uniquely dened,which was not the case when we dealt with positive integers. Namely, ifd(x) isacommon divisor of f(x) and g(x), then so is cd(x), where c 6= 0isanumber. Hence,g: c: d:(f g) is dened up to a multiplicative constant.Theorem 1 becomes particularly simple and useful in the case when g(x) isarst degree polynomial. We can then write g(x) =ax + b, witha 6= 0. Since theproperties of division by g are unaltered if g is multiplied by anumber, we multiplyg(x) by a ;1 , so that the coecient ofx is 1. Write g(x) in the form g(x) =x ; (it will soon become evident why it is more convenient to write with a minus).According to Theorem 1, for any polynomial f(x) wehave(3) f(x) =(x ; )h(x)+r:But in our case the degree of r is less than 1, i.e. it is 0: r is a number. Can wend this number without carrying out the division? It is very simple|it is enoughto put x = into (3). We get r = f(), and so we can write (3) in the form(4) f(x) =(x ; )h(x)+f():Polynomial f(x) is divisible by x; if and only if the remainder in the divisionis 0. But in view of (4) it is equal to f(). We therefore obtain the followingconclusion which is called Bezout's theorem.THEOREM 2. Polynomial f(x) is divisible by x ; if and only if is its root.For example, the polynomial x n ; 1 has a root x = 1. Therefore, x n ; 1 isdivisible by x;1. We came across this division earlier: see formula (12) of Chapter I(where a is replaced by x and r +1by n).In spite of its simple proof, Bezout's theorem connects two completely dierentnotions: divisibility and roots, and hence it has important applications. Forinstance, what can be said about the common roots of polynomials f and g, i.e.about the solutions of the system of equations f(x) =0,g(x) =0? By Bezout'stheorem the number is their common root if f and g are divisible by x ; .But then x ; divides g: c: d:(f g) which can be found by Euclid's algorithm. Ifd(x) =g: c: d:(f g), then x ; divides d(x), i.e. d() =0. Therefore, the questionof common roots of f and g reduces to the question of the roots of d, which is,in general, a polynomial of much smaller degree. As an illustration, we shall determinethe greatest common divisor of two second degree polynomials, which wewrite in the form f(x) =x 2 + ax + b and g(x) =x 2 + px + q (we can always reducethem to this form after multiplication by anumber). We divide f by g accordingto the general rule:x 2 + ax + bx 2 + px + q 1(a ; p)x +(b ; q)j x 2 + px + qThe remainder is r(x) = (a ; p)x +(b ; q) and we know that g: c: d:(f g) =g: c: d:(g r). Consider the case a = p. If we alsohave b = q, then f(x) =g(x) and


Selected chapters from algebra 5the system of equations f(x) =0,g(x) = 0 reduces to one equation f(x) =0. Ifb 6= q, thenr(x) is a nonzero number and f and g have no common factor. Finally,if a 6= p, then r(x) has unique root = b ; q . We know that g: c: d:(f g) =p ; ag: c: d:(g r) and it is enough to substitute this value of into g(x), in order to ndout whether g(x) is divisible by x ; . We obtain the relation 2 b ; q b ; q+ p + q =0p ; a p ; aor, multiplying it by nonzero number (p ; a) 2 , the equivalent relation(4') (b ; q) 2 + p(b ; q)(p ; a)+q(p ; a) 2 =0:The second and the third member of this equalityhave common factor p;a. Takingit out we can rewrite the relation (4') in the form(q ; b) 2 +(p ; a)(pb ; aq) =0:The expression D = (q ; b) 2 +(p ; a)(pb ; aq) is called the resultant of thepolynomials f and g. We have seen that the condition D = 0 is necessary andsucient for the existence of a common factor of f(x) and g(x), provided thatp 6= a. But for p = a the condition D = 0 becomes q = b, and that is, as we haveseen, equivalent to the existence of a common non-constant factor of f(x) andg(x).In general, it is possible to nd for any two polynomials f(x) andg(x) of arbitrarydegrees an expression made up from their coecients, which equated to zero givesnecessary and sucient condition for the existence of their common nonconstantfactor, but of course, the technicalities will be more dicult.Another important application of Bezout's theorem is considered with the numberof roots of a polynomial. Suppose that the polynomial f(x) is not identicallyzero, i.e. f 6= 0, and that f(x) has, besides 1 , another root 2 such that 2 6= 1 .By Bezout's theorem, f(x) is divisible by x ; 1 :(5) f(x) =(x ; 1 )f 1 (x):Put x = 2 into this equality. Since 2 is also a root of f(x), we have f( 2 )=0.This means that ( 2 ; 1 )f 1 ( 2 ) = 0, and hence (since 2 6= 1 ) that f 1 ( 2 )=0,i.e. that 2 is a root of f 1 (x). Applying Bezout's theorem to the polynomial f 1 (x)we obtain the equality f 1 (x) = (x ; 2 )f 2 (x) and substituting this into (5) weobtainf(x) =(x ; 1 )(x ; 2 )f 2 (x):Suppose that the polynomial f(x) hask dierent roots 1 , 2 , ... , k . Repeatingour reasoning k times we see that f(x) is divisible by (x ; 1 ) (x ; k ):(6) f(x) =(x ; 1 ) (x ; k )f k (x):Let n be the degree of f(x). On the right-hand side of (6) we have a polynomialwhose degree is not less than k, and on the left-hand side a polynomial of degree n.Hence n > k. We formulate this as follows.


6 I. R. <strong>Shafarevich</strong>THEOREM 3. The number of dierent roots of a polynomial which is notidentically zero is not greater than its degree.Of course, if a polynomial is identically equal to 0, all the numbers are itsroots. Theorem 3 was proved in the 17th century by philosopher and mathematicianDescartes.Using Theorem 3 we can answer the question we have avoided up to now:what is the meaning of the phrase equality of polynomials? One way is to write thepolynomials in the formf(x) =a 0 + a 1 x + + a n x n g(x) =b 0 + b 1 x + + b m x mand to say that they are equal if all their coecients are equal: a 0 = b 0 , a 1 = b 1 ,etc. This is how we think of the equality f = 0|all the coecients of f are zero.Another way to understand the term \equality" is as follows: polynomials f(x) andg(x) are equal if they take the same values when x is substituted by an arbitrarynumber, i.e. if f(c) = g(c) for all c. We shall prove that these two meanings ofthe notion \equality" coincide. But, at rst we have tomake a distinction betweenthem and in the rst case we saythat\f(x) andg(x) have equal coecients" andin the second that \f(x) andg(x) have equal values for all values of x".Evidently, iff(x) and g(x) have equal coecients, then they have equal valuesfor all x. The converse will be proved in a stronger form: we do not havetosupposethat f(x) andg(x) coincide for all values x|it is enough to suppose that they havethe same values for any n +1values of x, where n is not less than the degrees ofboth polynomials.THEOREM 4. Suppose that the degrees of the polynomials f(x) and g(x) arenot greater than n and that they have same values for some n +1 dierent valuesof x. Then the coecients of f(x) and g(x) are equal.Proof. Suppose that the polynomials f(x) and g(x) have equal values for n +1values of x: x = 1 , 2 ,..., n+1 , i.e. thatf( 1 )=g( 1 ) f( 2 )=g( 2 ) ... f( n+1 )=g( n+1 ):Consider the polynomial h(x) = f(x) ; g(x) (here \=" denotes the equality ofcoecients). We have seen that this implies that h() = f() ; g() for any ,and, in particular, that h( 1 )=0,h( 2 )=0,...,h( n+1 )=0. But the degreesof f and g are not greater than n, and so the degree of h is not greater than n.This is a contradiction with Theorem 3, unless h = 0, i.e. unless all the coecientsof h are 0. This implies that the coecients of f and g are equal.From now on we can apply the term \equality" to polynomials without emphasizingin which one of the two senses.Theorem 4 shows an interesting property of polynomials. Namely, ifweknowthe values of a polynomial f(x) of degree not greater than n for some n +1valuesof the variable x, then its coecients are uniquely determined, and so are its valuesfor all other values of x. Notice that in the abovesentence \coecients are uniquelydetermined" means only that there cannot be two dierent polynomials with the


8 I. R. <strong>Shafarevich</strong>(x;x 1 ) (x;x k;1 )(x;x k+1 ) (x;x n+1 ) is equal to F (x) . Putting F (x) =x ; x k x ; xF k k (x), we get(9) c k = y kF k (x k ) f k(x) = y kF k (x k ) F k(x):Passing on to the general interpolation problem with the table (7) weonlyhaveto notice that its solution is the sum of all polynomials f k (x) which correspond toall the simplest interpolation problems:f(x) =f 1 (x)+f 2 (x)++ f n+1 (x):Indeed, if we putx = x k then all the members on the right-hand side become 0,except f k (x k ), and since f k (x) is the solution of the k-th simplest interpolationproblem, we have f k (x k ) = y k . Finally, the degrees of f 1 (x), ... , f n+1 (x) arenot greater than n and the same holds for their sum. We can write the obtainedformula in the form(10) f(x) = y 1F 1 (x 1 ) F 1(x)+ y 2F 2 (x 2 ) F y n+12(x)++F n+1 (x n+1 ) F n+1(x)where F k (x) = F (x)x ; x k, F (x) =(x ; x 1 )(x ; x 2 ) (x ; x n+1 ).There is an unexpected identity which follows from the formula for the interpolationpolynomial. Consider the interpolation problem corresponding to thetablex j x 1 x 2 ... x n+1f(x) j x k 1x k 2... x k n+1where k is a positive integer not greater than n or k =0. On one hand it is evidentthat the polynomial f(x) =x k is the solution of this interpolation problem. Onthe other hand, we can write it down using formula (10) and we obtain thatx k = xk 1F 1 (x 1 ) F 1(x)+ xk 2F 2 (x 2 ) F x k n+12(x)++F n+1 (x n+1 ) F n+1(x)where F (x) =(x ; x 1 )(x ; x 2 ) (x ; x n+1 ) and F i (x) = F (x)x ; x i. The polynomialsF i (x) have degree n and the coecient ofx n is 1. If k


10 I. R. <strong>Shafarevich</strong>Such considerations are based on Bezout's theorem (Theorem 2). Let x = be a rooot of f(x). By Bezout's theorem f(x) is divisible by x ; and we havef(x) =(x ;)g(x), where g(x) is a polynomial whose degree is less than the degreeof f(x) by1. If the polynomial g(x) again has a root x = , we shall say thatf(x)has two roots equal to . By Bezout's theorem, g(x) can be written in the formg(x) =(x ; )h(x) and hence(11) f(x) =(x ; ) 2 h(x):We can say that in the representation (6) there are two factors x ; . This is inaccordance with the intuitive notion of what are two equal roots.If in (11) h(x) again has a root , weshallsay that f(x) hasthree roots equalto . In general, if f(x) can be written in the form f(x) =(x ; ) r u(x), whereu(x) is a polynomial whose root is not , we shall say thatf(x) hasr equal roots .If r > 2, then is said to be a multiple root. Hence, is a multiple root if f(x)is divisible by (x ; ) 2 . If the polynomial f(x) has exactly k roots equal to , wesay that k is the multiplicity of the root . Then f(x) can be written in the formf(x) =(x ; ) k g(x), where is not a root of g(x), i.e. g() 6= 0.For example, suppose that x = is a root of the quadratic equation x 2 + px +q =0. Dividing x 2 + px + q by x ; we getx 2 + px + qx 2 ; x(p + )x + qj x ; (p + )x ; (p + )x + p + q + p + 2i.e. x 2 +px+q =(x ;)(x+p+)+( 2 +p+q). Since is a root of the equationx 2 + px + q =0,wehave 2 + p + q = 0, and so x 2 + px + q =(x ; )(x + p + ).By our denition, this equation has two roots equal to if is a root of x + p + ,i.e. if 2 + p =0. Hence, = ;p=2. Since 2 + p + q = 0, then putting = ;p=2we obtain that ;p 2 =4+q =0. This is the known condition which ensures that theequation x 2 + px + q = 0 has equal roots.For the third order equation x 3 + ax 2 + bx + c = 0 the calculation is only alittle more involved. Divide x 3 + ax 2 + bx + c by x ; :x 3 + ax 2 + bx + cj x ; x 3 ; x 2 x 2 +(a + )x + b + a + 2(a + )x 2 + bx + c(a + )x 2 ; (a + )x(b + a + 2 )x + c(b + a + 2 )x ; (b + a + 2 )c + b + a 2 + 3


Selected chapters from algebra 11Since by supposition 3 + a 2 + b + c = 0, then x 3 + ax 2 + bx + c =(x ; )(x 2 +(a + )x + b + a + 2 ). According to our denition the equationx 3 + ax 2 + bx + c =0hastwo roots equal to if is a root of the equation andalso if is a root of the polynomial x 2 +(a + )x + b + a + 2 . In other words, 2 +(a + ) + b + a + 2 =0,i.e.3 2 +2a + b =0. We see that a multipleroot of the equation x 3 + ax 2 + bx + c = 0 is the common root of the polynomialsx 3 + ax 2 + bx + c and 3x 2 +2ax + b. As we saw in Section 1, they are the roots ofthe polynomial g: c: d:(x 3 + ax 2 + bx + c 3x 2 +2ax + b) and the greatest commondivisor can be found by Euclid's algorithm.Wenow apply the same reasoning to the polynomial f(x) =a 0 +a 1 x++a n x nof arbitrary degree. When we divide f(x) by x ; we obtain as the quotient apolynomial g(x) of degree n ; 1 whose coecients depend on and so we shalldenote it by g(x ). We know (formula (3)) that the remainder is f():(12) f(x) =(x ; )g(x )+f():Putting x = into the polynomial g(x ) we obtain the polynomial in which iscalled the derivative of f(x) and is denoted by f 0 (). Hence, by denition,f(x)(13) f 0 ; f()() = ():x ; The above formula may cause some doubt, since after the substitution x = bothf(x) ; f()the numerator and the denominator in the expression become 0 and wex ; get 0 . This formula therefore needs to be explained: we rst (before substituting0x = ) divide the numerator by the denominator and we substitute x = intotheir quotient whichisapolynomial. For example, the meaning of the expressionx 2 ; 1x ; 1 (1) is: we rst get x2 ; 1= x + 1, and then (x + 1)(1) = 2.x ; 1Those of you who will continue to study mathematics will meet the derivativefor other functions, such asf(x) = sin x or f(x) =2 x . In essence they are denedby the same formula (13), but in general case it is more dicult to give the exactsense to the expression on right-hand side. In the case of polynomials everythingis cleared by applying Bezout's theorem to the polynomial f(x) ; f().If is a root of the polynomial f(x) in (12), i.e. if f() = 0, then we getf(x) =(x ; )g(x ) and by our denition is a multiple root of f(x) if is aroot of g(x ), i.e. if g( ) =0. But this means that f 0 () =0. We have provedthe assertion:THEOREM 5. Aroot of a polynomial f(x) is multiple if and only if it is alsoaroot of the derivative f 0 (x).We see that a multiple root is the common root of the polynomials f(x)and f 0 (x). In other words, isarootofg: c: d:(f(x)f 0 (x)) the greatest commondivisor can be found by Euclid's algorithm and it is, as a rule, a polynomial ofmuch smaller degree.


12 I. R. <strong>Shafarevich</strong>We shall now carry out the division of f(x) by x ; , we shall nd the polynomialg(x ) in (12) and we shall nd the explicite formula for the derivative ofa polynomial.We could make the usual division of f(x) by x; and nd the quotient g(x )and the remainder f(). But it is better to do it another way. Recall that f(x) isthe sum of the terms a k x k and hence f(x);f() is the sum of the terms a k (x k ; k ).The polynomial (x k ; k ) has a root x = and by Bezout's theorem it is divisibleby x ; . We have noticed (after the formulation of Bezout's theorem) that wehave already done this division earlier. True, only for = 1, but the general case iseasily reduced to it. We shall use formula (12) of Chapter I (where r + 1 is replacedby k):Replace x by x=:(x k ; 1) = (x ; 1)(x k;1 + x k;2 + + x +1): xk x k ; 1 = ; 1 x k;1 xk;2+k;1 + + x k;2 +1 :Multiplying both sides of this equality by k weget(14) x k ; k =(x ; )(x k;1 + x k;2 + + k;2 x + k;1 ):This formula was obtained for 6= 0 (since we had x=) but it is clearly true for = 0 also.Consider the polynomial f(x) = a 0 + a 1 x + + a n x n and the dierencef(x) ; f(). It is equal, as we saw, to the sum of the following terms a k (x k ; k ).Divide each such term by x ; , using formula (14). We geta k (x k ; k )= a k (x k;1 + x k;2 + + k;2 x + k;1 ):x ; If we putx = (into the right-hand side!) we obtain the term ka k k;1 . Hencefor the polynomial g(x ) in (12) for x = we get that g(x )() = g( )is the sum of terms ka k k;1 , i.e. a 1 +2a 2 +3a 3 2 + + na n n;1 . In otherwords, we have deduced the formula for the derivative f 0 (x) of the polynomialf(x) =a 0 + a 1 x + + a n x n :(15) f 0 (x) =a 1 +2a 2 x + + na n x n;1 :Compare this with what we obtained for polynomials of degree 2 and 3 and convinceyourself that those were special cases of (15) for n = 2 and n =3.The derivative of a polynomial is important not only in connection with multipleroots it has many other applications. We shall therefore prove the basicproperties of the derivative. All the proofs follow from the denition, i.e. from (12).a) The derivative of a constant polynomial. If f(x) =a 0 then, by denition,f(x) =f() andg(x ) =0. Hence f 0 () = 0, i.e. f 0 (x) =0.


Selected chapters from algebra 13b) The derivative of a sum. Let f 1 and f 2 be two polynomials and let f =f 1 + f 2 . We have(16)f 1 (x) =f 1 ()+(x ; )g 1 (x )f 2 (x) =f 2 ()+(x ; )g 2 (x )and therefore f 0 1() = g 1 ( ), f 0 2() = g 2 ( ). Adding the formulas (16) weget f(x) =f()+(x ; )g(x ), where g(x ) =g 1 (x ) +g 2 (x ). Therefore,f 0 () =g( ) =g 1 ( )+g 2 ( ) =f 0 1()+f 0 2(), i.e.(f 1 + f 2 ) 0 = f 0 1+ f 0 2 :Using induction on the numberofsummandswe easily obtain(f 1 + f 2 + + f r ) 0 = f 0 1+ f 0 2+ + f 0 r :c) Multiplication by a number. Let f 1 (x) =af(x). Then from the equalitiesf(x) =f()+(x ; )g(x ) andg( ) =f 0 (), multiplying by a we getf 1 (x) =af(x) =af()+(x ; )ag(x )i.e. f 1 (x) =f 1 ()+(x ; )ag(x ) andf 0 1() =af 0 ():(af) 0 = af 0 :d) The derivative of a product. Let f = f 1 f 2 . Multiplying the equalities (16)we getf 1 (x)f 2 (x) =f 1 ()f 2 ()+(x ; )g(x )where g(x ) =g 1 (x )f 2 ()+g 2 (x )f 1 ()+(x ; )g 1 (x )g 2 (x ). Thereforef(x) =f()+(x;)g(x ) where g(x ) isgiven above. Hence, f 0 () =g( ) =g 1 ( )f 2 ()+g 2 ( )f 1 () =f 0 1()f 2 ()+f 0 2()f 1 (), i.e.(17) (f 1 f 2 ) 0 = f 0 1 f 2 + f 0 2 f 1:If f 1 is a constant (polynomial of degree 0) then in view of a) from (17) we againget c).By induction on the number of factors we obtain(18) (f 1 f 2 f r ) 0 = f 0 1 f 2 f r + f 1 f 0 2 f r + f 1 f 2 f 0 r(on the right-hand side in the product f 1 f r each factor is succesfully replacedby its derivative).Indeed, according to (17):(f 1 f 2 f r ) 0 =((f 1 f r;1 )f r ) 0 =(f 1 f r;1 ) 0 f r + f 1 f r;1 f 0 r :Applying to (f 1 f r;1 ) 0 the expression (18) which can be taken to be alreadyproved, we obtain the required formula.An important special case occurs when all the factors in (18) are equal:(19) (f r ) 0 = rf r;1 f 0 :


14 I. R. <strong>Shafarevich</strong>From the dnition of the derivative it is easily veried that x 0 =1. Hence, (x r ) 0 =rx r;1 . Combining the above rules, we can give a dierent proof of the expliciteformula (15) for the derivative.Return now to the question of multiple roots of polynomials. Suppose that isarootofmultiplicity k of f(x). This means that f(x) =(x;) k g(x) where in notarootofg(x). According to (17) we have f 0 (x) =((x ; ) k ) 0 g(x)+(x ; ) k g 0 (x),and according to (19) we have ((x ; ) k ) 0 = k(x ; ) k;1 (since (x ; ) 0 =1,by(15)). Therefore, f 0 (x) =k(x;) k;1 g(x)+(x;) k g 0 (x) =(x;) k;1 p(;x), wherep(x) =kg(x)+(x;)g 0 (x). But is not a root of p(x): p() =kg() 6= 0. Considerthe polynomials d(x) =g: c: d:(f(x)f 0 (x)) and '(x) = f(x) (since d(x) is a divisord(x)of f(x), '(x) is a polynomial). The polynomial d(x) is divisible by (x ; ) k;1 sincef(x) andf 0 (x) are divisible by (x ; ) k;1 . But d(x) is not divisible by (x ; ) k ,since p() 6= 0 which means that p(x) is not divisible by x ; . We conclude that'(x) is divisible only by x ; (and no higher power, e.g. (x ; ) 2 , etc). Since '(x)is dened independently from the root (namely '(x) =f(x)g: c: d:(f(x)f 0 (x)) ) theabove conclusion is true for all the roots of f(x), and we see that '(x) has the sameroots as f(x),butnoneofthemismultiple. In view of this, we can always reduce aquestion regarding the roots of a polynomial to the case when the polynomial hasno multiple roots.Notice that we have implicitly met with the derivative in connection with theformula for the interpolation polynomial. Indeed, let F (x) =(x;x 1 )...(x;x n+1 ).From (14) we see that (x ; x i ) 0 =1. Therefore, formula(18)gives:F 0 (x) =(x ; x 2 ) (x ; x n+1 )+(x ; x 1 )(x ; x 3 ) (x ; x n+1 )++ +(x ; x 1 )(x ; x 2 ) (x ; x n ):If we use the notation F k (x) = F (x)x ; x kfrom Section I, then F 0 (x) =F 1 (x)++F n+1 (x). Substituting now for x one of the values x = x k , since all F i (x) fori 6= kcontain the factor x ; x k ,weseethatF i (x k )=0. Therefore F 0 (x k )=F k (x k )andthe formula (10) can be written in the formf(x) = y 1F 0 (x 1 ) F 1(x)+ y 2F 0 (x 2 ) F 2(x)+ y n+1F 0 (x n+1 ) F n+1(x):Problems1. Polynomial x 2n ;2x n +1 clearly has a root x =1,andbyBezout's theoremit is divisible by x ; 1. Find the quotient.2. For which values of a, b does the polynomial x n + ax n;1 + b have amultipleroot? Find this root.3. For which values of a, b does the polynomial x 3 + ax + b have amultipleroot?


Selected chapters from algebra 154. Prove that the polynomial x n + ax m + b cannot have nonzero root ofmultiplicity 3 or more.5. The derivative of the polynomial f 0 (x) is called the second derivative of thepolynomial f(x) and is denoted by f 00 (x). Find the formula for (f 1 f 2 ) 00 , analogousto (17), but which will be, of course, somewhat more complicated.6. Prove thatthe derivative of a polynomial is identically equal to 0 if andonly if the polynomial is constant (i.e. when its degree is 0).7. Prove that for a polynomial f(x) there exists a polynomial g(x) suchthatg 0 (x) =f(x) and that all such polynomials g(x) (for a given f(x)) can dier fromeach other only in the constant term.8. Prove that the number of roots of a polynomial cannot be greater than itsdegree, each root being counted k times if k is its multiplicity.3. The binomial formulaIn this section we shall be concerned with an important formula which expressesthe polynomial (1 + x) n in the usual form a 0 + a 1 x + + a n x n . In order tond the formula we have tomultiply out all the factors (1 + x)(1 + x) (1 + x).Working out these brackets we shall obtain terms of the form x k ,butsuch termswill appear several times, and by grouping them together we shall arrive at therequired formula. For instance, if n =2,itiswell known that(1 + x) 2 =(1+x)(1 + x) =1(1+x)+x(1 + x) =1+x + x + x 2 =1+2x + x 2 :For n = 3 the formula is also probably known. If not, it is easily obtained whenthe formula for (1 + x) 2 is multiplied by 1+x:(1 + x) 3 =(1+x) 2 (1 + x) =(1+2x + x 2 )(1 + x)=(1+2x + x 2 )+(1+2x + x 2 )x =1+3x +3x 2 + x 3 :The coecient a k of x k in the polynomial (1+x) n depends on the index k, butalso on the degree n. In order to indicate this dependence on n and k, we denotethis coecien by C k n. Therefore, C k n are by denition the coecients in the formula(20) (1 + x) n = C 0 n + C 1 n x + C2 n x2 + + C n n xn :For example, C2 0 =1,C2 1 = 2, C2 2 =1C3 0 =1,C3 1 =3,C3 2 =3,C3 3 = 1. Thecoecients Cn k are called binomial coecients. Our aim is to write them in anexplicit form. Notice that some of them are easy to nd. It is clear that when wemultiply all the x's by one another in the product (1+x) n ,we get x n , which meansthat the leading term of the polynomial (1 + x) n is x n , i.e.(21) C n n =1:Similarly, multiplying the constant terms (values for x = 0) in the product (1 + x) nwe see that the constant term of the polunomial (1 + x) n is 1, i.e.(22) C 0 n =1:


16 I. R. <strong>Shafarevich</strong>In the general case consider the derivatives of both sides of (20). On the left,according to (19), we getn(1 + x) n;1 , since (1 + x) 0 =1,by (15). We evaluate thederivative on the right using (15). We obtainn(1 + x) n;1 = C 1 n +2C 2 n x + + kCk n xk;1 + + nC n n xn;1 :But we can apply (20) for n ; 1 to the left-hand side of the above equality. Thecoecient of x k;1 will be nC k;1n;1 on the left and kCk n on the right. Therefore,kC k n = nC k;1n;1 ,or C k n = n k Ck;1 n;1 i.e. the coecient Cn k can be expressed in terms of the coecient C k;1n;1n(n ; 1)indices. Applying this formula to C k;1n;1 we get Ck n =the process r times we obtain the formulaC k n =n(n ; 1) (n ; r +1)k(k ; 1) (k ; r +1) Ck;r n;rk(k ; 1) Ck;2 n;2with smaller, and repeating(we takeawayfromn in the numerator and from k in the denominator r consecutivevalues: 0 1 ...r; 1). Finally, letr = k. Since we knowthatC 0 m = 1 for any m,we obtain the formula for C k n:(23) C k n =n(n ; 1) (n ; k +1):k(k ; 1) 1This is the formula we looked for.Formula (20) with the explicit expression (23) for the binomial coecients C k nis called the binomial formula (or \Newton's binomial").The binomial formula has a large number of applications and it is useful tohave the coecients (23) written in various forms. In the denominator we have theproduct of all positive integers from 1 to k. The product of the form 1 2 ... mis called m factorial and denoted by m!. In the numerator we have the product ofall positive integers from n to n ; k +1. If we multiply it by theproductofthenumbers from n ; k to 1 (i.e. by (n ; k)!) we obtainn!. Therefore, multiplying thenumerator and the denominator in (23) by (n ; k)!, we get(24) C k n =and this implies thatn!k!(n ; k)! (25) C k n = C n;kn :Notice that in formulas (23) and (24) it is not immediately clear that the denominatordivides the numerator, although we know that this is so having in mind themeaning of the coecients C k n in the formula (20). We can express the fact thatthe expression on the right-handsideof(23)isaninteger, by simply saying that


Selected chapters from algebra 17the product of any k consecutive integers is divisible by k!. We shall see later thatthe fact that right-hand sides of (23) and (24) are integers implies some interestingproperties of prime numbers.We now establish some important properties of the coecients C k n. The rstone follows from the obvious equality (1+x) n =(1+x) n;1 (1 + x) after expanding(1 + x) n and (1 + x) n;1 on the basis of (20). We obtainCn 0 + Cn 1 x + C2 n x2 + + Cn n xn =(C 0 n;1 + C 1 n;1x + + Cn;1n;1 xn;1 )(1 + x):The coecient ofx k on the left is Cn k and on the right is obtained from the sum ofthe terms C k n;1 xk 1andC k;1n;1 xk;1 x, i.e.itisC k n;1+ C k;1 . Thereforen;1(26) C k n = C k n;1 + C k;1n;1 :This is a very useful formula for evaluating coecients C k n by means of the coecientsof index n ; 1. In order to get a better visual representation, we write thecoecients C k n in the form of a triangle, where C k n are in the n-th row. Using theformulas (21) and (22), which say that at the beginning and at the end of each rowis 1, the triangle has the form11 11 2 1... ... ... ...1 C 2 n;1... C n;2n;111 Cn 1 Cn 2 ... Cn n;1 1... ... ... ... ... ... ...Formula (26) shows that each binomial coecient C k n is equal to the sum of thecoecients which are situated above onthe left and right of it. Taking the rsttwo rows as given, we easily obtain for the subsequent coecients:11 11 2 11 3 3 11 4 6 4 11 5 10 10 5 1... ... ... ... ... ... ...This triangle is called \Pascal's triangle".The second property is obtained by putting x =1into the formula (20) whichdenes the binomial coecients. On the left we get 2 n and on the right the sum of


18 I. R. <strong>Shafarevich</strong>all binomial coecients Cn k for k =0 1 ...n. Therefore, the sum of all numbersfrom the n-th row of Pascal's triangle is equal to 2 n .Finally, consider two neighbouring members from one row: Cn k;1 and Cn.kAccording to (24) we have Cn k n!=k!(n ; k)! , n!Ck;1 n =(k ; 1)! (n ; k +1)! . Sincek! =(k ; 1)! k, (n ; k +1)!=(n ; k)! (n ; k + 1), we getC k n = n ; k +1kC k;1n :It is evident that n ; k +1 > 1whenn ; k +1>k, i.e. k < n +1 and in thatk2case Cn k > Ck;1 n . Conversely, if k > n +1 , we obtain Cn k 2< Ck;1 n . Therefore,the numbersinonerow of Pascal's triangle increase up to the middle of the row,and after that they decrease. If n is even, then in the middle of the row we havethe greatest number Cnn=2 , and if n is odd, then there are two neighbouring equalgreatest numbers: C n(n;1)=2and C n (n+1)=2. In that case, for k = n +1 we have2Cn k = Cn k;1 .The formula (20), where the binomial coecients are dened by (23) can bewritten in a somewhat more general form. In order to do that, put x = b=a andmultiply both sides of (20) by a n . We obtain the formula(27) (a + b) n = C 0 n an + C 1 n an;1 b + C 2 n an;2 b 2 + + C n n bn :This formula was proved for a 6= 0 (since we divided by a), but it is obviously truealso for a =0. It is also called the binomial formula.We shall now consider some consequences of the binomial formula and theirapplications. As a rule, the simpler a result is, the more applications it has. So, inthe binomial formula we often use the values of the rst coecients. We alreadyknow that the rst coecient Cn 0 is 1. The next one Cn, 1 according to (23) is n.Notice that in view of (25) it follows that Cn n =1(which we already know) andthat Cnn;1 = n. Hence,(a + b) n = a n + na n;1 b + + nab n;1 + b n :This can be applied to equations. We write an equation of order n in the forma 0 + a 1 x + + a n;1 x n;1 + a n x n =0. The fact that its degree is n means thata n 6=0andwe can divide the equation by a n to obtain an equivalent equation inwhich a n = 1. In further text we shall suppose that this has been done and wewrite the equation in the form f(x) =a 0 + a 1 x++ a n;1 x n;1 + x n =0. We shallnow make another transformation of this equation into an equivalent equation. Inorder to do so, put x = y + c, where y is the new variable and c is a number.Substituting into our equation this value of x, from each term a m x m we obtain theterm a m (y + c) m which canbe,by the binomial formula, written as a polynomialin y, and then we collect together corresponding terms. As a result we obtain anew polynomial in y which we denote by g(y) =f(y + c). Since y is expressed in


Selected chapters from algebra 19terms of x: y = x ; c, the equations f(x) =0andg(y) = 0 are equivalent: to theroot x = of f(x) = 0 corresponds the root y = ; c of g(y) = 0, and to the rooty = of this equation corresponds the root x = + c of the equation f(x) =0.Let us examine how the coecients change in this transformation. First of all, thedegree of the equation g(y) =0isn and the coecient of its leading term is 1. Thisfollows from the fact that when a m (y + c) m is expanded by the binomial formula, itgives rise to terms in y with degrees 6 m. Therefore, the term of degree n can onlybe obtained from (y + c) n and (again by the binomial formula) it is equal to y n .Let us look at the term of degree n ; 1. It can be obtained from the term (y + c) nand the term a n;1 (y + c) n;1 . From the last one we get a n;1 y n;1 , and in (y + c) nwe have totake the second term in the binomial expansion. As we know it is equalto ny n;1 c. Hence, the term of degree n ; 1 in the polynomial g(y) =f(y + c) hasthe form (a n;1 + nc)y n;1 .This can be used to simplify the equation by chosing c so that the term ofdegree n ; 1 vanishes: we put a n;1 + nc =0,i.e. c = ;a n;1 =n. We proved thefollowingTHEOREM 6. The substitution x = y;a n;1 =n transforms the equation f(x) =a 0 + a 1 x + + a n;1 x n;1 + x n =0into equivalent equation g(y) =0of degree nwhose coecient of the leading term is 1 and which has no term of degree n ; 1.Notice that Theorem 6 gives the formula for the solutions of second degreeequations. Indeed, the polynomial g(y) has the form y 2 + b 2 and its roots aretherefore y = p ;b 2 . Make the substitution indicated in Theorem 6, evaluate b 2and the roots of f(x) andverify that in this way we obtain the standard formulafor the solutions of a quadratic equation. In the case of polynomials of arbitrarydegree we only get a certain simplication, which is sometimes useful. For example,we see that any third degree equation is equivalent to an equation of the formx 3 + ax + b =0.At the end we shall apply the binomial formula to the evaluation of the sumsof powers of integers. We shall be concerned with the sums(28) S m (n) =0 m +1 m +2 m + + n mof m-th powers of all nonnegative integers not greater than n. You probably known(n +1)the formula S 1 (n) = (see Problem 5 in Section 1 of Chapter I). We start2with a few remarks on the evaluationofsumsingeneral. Let a 0 , a 1 , a 2 ,... , a n ,...be an arbitrary innite sequence of numbers, and consider the following sequenceof their sums: a 0 , a 0 + a 1 , a 0 + a 1 + a 2 , ... , a 0 + a 1 + a 2 + + a n , ... Denotethe rst sequence by the letter a its (n + 1)-st term is a n (itismoreconvenientto write the (n + 1)-st term and not the n-th, and to start the sequence with a 0 ).The above sequence of sums will be denoted by Sa, and its (n + 1)-st term is(Sa) n = a 0 + a 1 + a 2 + + a n n =0 1 2 ...For example, if a n = n m , n =0 1 2 ..., then Sa is the sequence of sums S m (n).Clearly, if we know the sequence Sa we can nd the sequence a. Namely, by


20 I. R. <strong>Shafarevich</strong>substracting the n-th from the (n + 1)-st term of Sa, we obtain a n . Indeed, let(29)(30)b n =(Sa) n = a 0 + a 1 + + a n b n;1 =(Sa) n;1 = a 0 + a 1 + + a n;1 and subtracting (30) from (29) we get b n ; b n;1 = a n . We introduce anotherimportant construction. Together with an arbitrary sequence b 0 , b 1 , b 2 , ... , b n ,... consider the sequence b 0 , b 1 ; b 0 , b 2 ; b 1 , ... , b n+1 ; b n , ... If the rstsequence is denoted by b, then the second is denoted by b. Its (n + 1)-st term is(b) 0 = b 0 (b) n = b n ; b n;1 n =1 2 ...The established connection between the sequences a and Sa can be expressed bythe formula Sa = a. It turns out that there is a formula completely symmetricalto this one, namely both equalities(31) Sa = a Sb = bare true. It can be said that the operations S and applied to sequences areinverse to each other. We have already established the rst formula. In order toprove the second, write the equalities which dene the numbers a k = (b) k fork =0 1 ...n; 1:a 0 = b 0a 1 = b 1 ; b 0a 2 = b 2 ; b 1...a n = b n ; b n;1and add them up. On the left we obtain a 0 ++a n , i.e. (Sa) n , and on the rightallthe numbers cancel out, except b n in the last formula, so that we get (Sa) n = b n ,that is to say the second formula (31).The above relations are useful, since it is often simpler not to evaluate a sumdirectly, i.e. not to nd the sequence Sa directly, but instead to nd a sequnce bsuch that b = a, and from the second relation (31) to obtain Sa = b.This idea will now be applied to the sums (28). We have seen that S m (n) =(Sa) n , where a n = n m . How can we writethe sequence a, a n = n m in the forma =b? This follows from the following assertion.THEOREM 7. For any polynomial f(x) of degree m there exists a uniquepolynomial g(x) of degree m +1 such that(32) g(x) ; g(x ; 1) = f(x)and the constant term of g(x) is 0.The uniqueness of the polynomial g(x) with the given property is easily shown.Let g 1 (x) be another polynomial such that g 1 (x) ; g 1 (x ; 1) = f(x) and whose


Selected chapters from algebra 21constant term is 0. Subtract (32) from the last equality. Putting g 1 (x) ; g(x) =g 2 (x), we see that g 2 (x) ; g 2 (x ; 1) = 0 and the constant termofg 2 (x) is0, i.e.g 2 (0) = 0. For x =1,the above equality gives g 2 (1) = 0. Putting x =2wegetg 2 (2) = g 2 (1)=0,... and by induction that g 2 (n) = 0 for all positive integers n.In other words, all positive integers are roots of g 2 (x). According to Theorem 3this is possible only if g 2 = 0, which means that g = g 1 .The existence of the polynomial g will be proved by induction on m, the degreeof f(x). For m = 0, the polynomial f is a constant a and we see that g(x) =axsatises (32). Suppose that the assertion is true for polynomials f of degree lessthan m. Let a m x m be the leading term of f. Choose the number a so that theleading term of the polynomial ax m+1 ; a(x ; 1) m+1 is equal to the leading terma m x m of f. In order to do this, apply the binomial formula(x ; 1) m+1 = x m+1 ; (m +1)x m + where the dots stand for terms of degree less than m. This impliesClearly we havex m+1 ; (x ; 1) m+1 =(m +1)x m + :(33) a = a mm +1 :a mThen, in the dierence f(x);m +1 (xm+1 ;(x;1) m+1 ) the terms of degree m cancelout and this dierence will have degree less than m. Denoting this polynomialby h(x), by the induction hypothesis we can take that there is a polynomial g 1 of degreeless than m+1 and with zero constant term such thath(x) =g 1 (x);g 1 (x;1),i.e.f(x) ;a mm +1 (xm+1 ; (x ; 1) m+1 )=g 1 (x) ; g 1 (x ; 1):The above equality can be written in the form f(x) =g(x) ; g(x ; 1), whereg(x) =a mm +1 xm+1 + g 1 (x)and the theorem is proved. Of course, in practical construction we do not applyinduction, but we repeat the same procedure of subtraction to the polynomial h(x),and so on until we arrive at a polynomial of degree 0.Return now totheevaluation of the sum S m (n). We have seen that this sumis equal to b n , where b is such that b = a, a n = n m . Apply Theorem 7 to thepolynomial x m . We obtain the polynomial g(x) of degree m +1suchthatg(x) ; g(x ; 1) = x mand the constant term of g(x) is0. Putting x = n into the above equality, weseethat the sequence b n = g(n) for n > 1andb 0 = g(0) = 0 satises the conditionb = a, i.e.Sa = b. Therefore we have proved the following


22 I. R. <strong>Shafarevich</strong>THEOREM 8. The sums S m (n) can be expressed in the form g m (n) where g mis the polynomial of degree m+1 such that g m (x);g m (x;1) = x m and its constantterm is 0.Notice that the proof of Theorem 7 provides us with a method of constructingthe polynomial g m (x) for any m. For instance, let m = 2. By analogywith sequences, we denote the polynomial g(x) ; g(x ; 1) by g, i.e. we put(g)(x) = g(x) ; g(x ; 1). We rst have to nd the monomial ax 3 so that theleading term of (ax 3 ) is equal to x 2 . In view of (33), a = 1 (in this case m =2, 13a 2 =1). By the binomial formula, 3 x2 = 1 1 3 x3 ;3 (x ; 1)3 = x 2 ; x + 1 3 and 1x 2 ; 3 x3 = x 1 ;3 . Now wehave to nd the monomial bx2 so that the leadingcoecient of (bx 2 ) is equal to x. In view of (33), b = 1 (in this case m = 1, 21a 1 = 1) and by the binomial formula 2 x2 = 1 1 2 x2 ;2 (x ; 1)2 = x 1 ;2 ,and 1 1x 2 ; 3 x3 ; 2 x2 = 1 ;3 + 1 2 = 1 6 . Finally, 1 16 = 6 x= 1 6 x 1 ; (x ; 1).61At the end we get that x 2 =3 x3 + 1 2 x2 + 1 6 x and so g(x) = 1 3 x3 + 1 2 x2 + 1 6 x =(2x 2 +3x +1)x (2x + 1)(x +1)x(2n + 1)(n +1)n= . Therefore S 2 (n) = .666We conclude with two more remarks.Remark 1. The obtained formula for the sum S m (n) can be summarizedas follows. For each m there exists the unique polynomial g m (x) with constantterm 0 such that g m (x) ; g m (x ; 1) = x m . The method of its construction iscontained in the proof of Theorem 7. Its degree is m+1. The formula for S m (n) is:S m (n) =g m (n). Hence the question reduces to the investigation of the importantpolynomials g m (x). They are called Bernoulli's polynomials. In the Appendix weshall giveamuch more explixit expression for these polynomials, using an importantsequence of rational numbers, called Bernoulli's numbers.Remark 2. (Historical) The introduced operations S and which transformthe sequences a and b into Sa and b are very similar to the fundamental operationsof Analysis which dene for a function f(x) (but not for every function!) theindenite integral R fdx, and for a function g its derivative g 0 . Our operations Sand are elementary analogs of the operations R fdxand g 0 . Sums and dierencesare also present in the denitions of the integral and the derivative, but in a morecomplicated way (in our denition of the derivative of a polynomial dierenceswere also present|see formula (13)). As in the case of S and , the operations offorming the derivative and the integral are inverse to each other. As in our case,the evaluation of the derivative is simpler than the evaluation of the integral, andthe integral of a function f(x) is mainly evaluated by nding a function whosederivative is equal to f(x).However, the operations for sequences and functions are not only analogous


Selected chapters from algebra 23Fig. 1their connection is deeper. Evaluating the integral of a function f(x) is equivalentto evaluating the area of the surface bounded by thegraph of that function, thex-axis and by two vertical lines starting at x = a and x = b (Fig. 1).Of course, we shall not prove this, as we have not dened the integral, butwe shall show, on a simple example, how such anarea can be evaluated, and itsconnection with the problems we considered earlier.We shall try to determine the area bounded by the parabola which is the graphof the function y = x 2 ,bythex-axis and by the line x = 1 (Fig. 2).Fig. 2In order to do that, divide the segmentbetween0and1into a large number n ofequal parts with coordinates 0, 1 n , 2 n ,... , n ; 1 ,1andevaluate the corresponding n2 2 21 2 n ; 1values 0, , , ... , , 1 of the function y = x 2 . Construct then nnrectangles whose bases are segments from i 2i +1iton nand whose heights are .nThe polygon made up from these rectangles is contained in that part \under theparabola" whose area we wish to determine and by \looking at the picture" we see


24 I. R. <strong>Shafarevich</strong>that if n is very great, then the area s n of this polygon diers very little from thearea of the part under the parabola (we cannot be more precise, since we have notprecisely dened what area is). The area of the polygon is the sum of the areasof the rectangles which make itup. The area of the i-th rectangle is equal to theproduct of its basis 1 2in and its height ,i.e.itis i2nn . Therefore, the area s 3 n ofthe polygon iss n = 02n + 123 n + 22 (n ; 1)2+ + = S 2(n ; 1):3 n3 n 3 n 3We have already found that S 2 (n) = 1 3 n3 + 1 2 n2 + 1 n, and so (replacing n by n ;1in the formula for6S 2 (n)) we gets n = 1 3 ; 1 2 1 n + 1 6 1n 2 :It is clear that as n becomes greater and greater, then the terms ; 1 2 1 n and 1 6 1 n 2become smaller and smaller, and the area of the polygon approaches 1 3 . Hence,this is the area of the gure bounded by the parabola.We have presented here the lines of thought followed, in principle, by Archimedes(3rd century B.C.) who was the rst to solve this problem. (Archimedesdevised a rather articial method whichallowed him to use the sum of a geometricalprogression, instead of the sum S 2 (n). But he knew the formula for S 2 (n) and usedit for the evaluation of other areas and volumes).Mathematicians of the new period were obsessed by the dream to \surpassthe ancients" (that is to say, the Ancient Greek mathematicians) and Archimedeswas considered to be the most important of them. They were therefore very muchinterested in solving the problem considered above for the function y = x n ,wheren is greater than 2. It seems that the rst to obtain the solution was French mathematicianFermat (17th century) who used practically the same method we outlinedabove (itwas later somewhat simplied). At that time the mentioned connectionbetween the integral and the derivative was not known and the integral (i.e. thearea) was calculated directly from the denition. It was later discovered that (to usecontemporary terminology) the operations of forming derivatives and integrals areinverse to each other. This was established by Newton's teacher Barrow. (Newtonworked together with Barrow when he studied at the university, and later on tookover Barrow's chair when the latter decided to take orders). Systematic evaluationof the integral of a function f by nding a function g such that the derivative ofgis f was initiated by Newton. After that the calculation of integrals and areas bythe method we exposed became unnecessary. Nowadays students of higher classescan easily nd the integral of x m for any m without caluclating the sum S m (n).In this way, if in Chapter I we moved in the circle of ideas of Ancient Greekmathematicians (Pythagoras, Theaetetus, Euclid), in this chapter we have encounteredthe ideas of the mathematicians from the new period (17th century).


Selected chapters from algebra 25Problems1. Notice that the area s n of the polygon which we calculated at the end of thesection is less than the area of the given gure, bounded by the parabola y = x 2 ,since the polygon is situated inside that gure. Construct the polygon made upfrom rectangles whose bases are segments from i i +1to and whose heights are 2 n ni +1which contains the given gure. Its area s 0 n will therefore be greater thannthe area of the gure. Calculate the area s 0 n and prove that as n increases, itapproaches 1=3. This gives a more convincing (i.e. more \strict") proof of the factthat the required area is 1=3.2. Try to solve the analogous problem for the \m-th degree parabola", givenby the equation y = x m . Verify that in order to obtain the result it is not necessaryto know the Bernoulli's polynomials g m (x) completely, but that is enough to knowthe coecient of the leading term a m+1 x m+1 . Prove that a m+1 = 1 and hencem +1nd the area of the gure bounded by the parabola whose equation is y = x m ,bythe x-axis and by the line x =1.3. Prove that the area of the gure bounded by the parabola y = x m ,x-axis and the line x = a is equal tothe polynomial1m +1 am+1 . Notice that the derivative of1m +1 xm+1 is x m . This is indeed an instance of Barrow's theoremthat integration and nding derivatives are operations inverse to each other.4. Prove that the sum of the binomial coecients with even upper indicesC 0 n + C 2 n + and with odd indices C 1 n + C 3 n + are equal and nd their mutualvalue.5. Find the relation between binomial coecients which expresses that(1 + x) n (1 + x) m = (1 + x) n+m . For n = m deduce the formula for the sumof the squares of binomial coecients.6. If p is a prime number, prove that all binomial coecients C k p for k 6= 0p,are divisible by p. Deduce from this that 2 p ; 2 is divisible by p. Prove thatforany integer n, thenumber n p ; n is divisible by p. This theorem was rst provedby Fermat.7. What can be said about the sequence a if all the terms of the sequence aare equal? What does formula (31) give in this case?8. Find the sum S 3 (n) and verify that S 3 (n) =(S 1 (n)) 2 .9. Let a be any sequence a 0 , a 1 , a 2 , ... Apply the operation once more tothe sequence a. The obtained sequence (a) will be denoted by 2 a. Dene k a by induction as ( k;1 a). When can we solve the so-called \innite interpolationproblem", that is to say when is there a polynomial f(x) of degree notgreater than m such thatf(n) =a n for n =0 1 2 ...? Prove that a necessary andsucient condition is given by( m+1 a) n = 0 for n > m. This condition shows thatif we write the sequence a, and under it the sequence whose terms are dierences


26 I. R. <strong>Shafarevich</strong>of the two terms above,andsoon:a 0 a 1 a 2 ... a n a n+1a 1 ; a 0 a 2 ; a 1 ... ... a n+1 ; a n ...... ... ... ... ...then in the (m + 1)-st row weonlyhave zeros.Is there a polynomial f(x) suchthatf(n) =2 n for all positive integers n?10. Prove that if a n = q n ,then(a) n = a n;1 (q ; 1). Usethistogive anotherproof of the formula (12) from Chapter I.11. Let m 1 < m 2 < < m n+1 be positive integers and let f(x) be apolynomial of degree n, where the coecient ofx n is 1. Prove that at least one ofthe numbers f(m k ) is not less than n!=2 n .Hint. Use the result of Problem 8 of Section 1. Notice (with the notation ofSection 1) that F k (m k ) > k!(n ; k)! and use some known relations for binomialcoecients.12. Apply the formula (12) from Chapter I to the sum 1 + (1 + x) + (1 + x) 2 ++(1+x) n . Equating the coecients of terms with equal degrees on the left andright, nd the formula for the sumC k k + C k k+1+ + C k n :APPENDIX 1Bernoulli's polynomials and numbersIn Section 3 we showed that the values of the sums of powers of consecutivepositive integers, i.e. the sums S m (n) coincide with the values g m (n) of Bernoulli'spolynomials g m (x), which have the properties(1)1) g m (x) ; g m (x ; 1) = x m 2) the constant term of g m (x) is0.For any m there is only one polynomial of degree m + 1 with these properties.We have given a method for constructing the polynomials g m (x). However,we would like to have a more explicit formula for these polynomials. In orderto achieve this, we shall follow the same path we took in deducing the binomialformula. Namely, we shall rst nd the derivative of both sides of (1). But we rsthave to see how to nd the derivative f(x ; 1) 0 of the polynomial f(x ; 1).LEMMA 1. f(x ; 1) 0 = f 0 (x ; 1).1 Starting with Chapter II, each chapter will have an Appendix. In these appendices we shallonly use those facts which have been exposed earlier, but the text will be somewhat harder thanthe basic text. This means that we shall apply the same arguments as before, but in provinga theorem we shall have to keep in mind a larger number of them. The level of these textsapproaches the level of a simple professional mathematical book.


Selected chapters from algebra 27At rst sight this equality may seem obvious, but it is not really so. Theequality means that when in the polynomial f(x) we rst substitute x ; 1forx,then write it as the sum of powers of x, and then nd its derivative, the obtainedresult is the same as when in the derivative f 0 (x) we substitute x ; 1forx.The proof follows directly from the denition of the derivative of a polynomial,i.e. from (11). Letf(x) ; f() =(x ; )g(x ):Substituting x ; 1forx and ; 1for in this equality, wegetf(x ; 1) ; f( ; 1) = (x ; )g(x ; 1; 1):By denition we have f( ; 1) 0 = g( ; 1 ; 1) and f 0 () = g( ). Hence,f( ; 1) 0 = f 0 ( ; 1), which was to be proved.Lemma 1 could be proved by the use of formulas (16){(19) and reduction tomonomials (verify this).We cannow nd the derivatives of both sides of (1). Having in mind Lemma1 and the rule (13) for derivatives, we obtaing 0 m(x) ; g 0 m(x ; 1) = mx m;1 :On the other hand, replacing m by m ; 1 in (1) we getg m;1 (x) ; g m;1 (x ; 1) = x m;1 :Multiply the second equality by m and subtract it from the rst. Putting h m =mg m;1 ; g 0 m we nd thath m (x) =h m (x ; 1):But this implies that the polynomial h m is constant (of degree 0). Indeed, puttingin this equality x = 1, 2, etc. we obtain h m (0) = h m (1) = h m (2) = . Inother words the polynomial h m (x) and the constant h m (0) have equal values for allpositive integers x, and in view of Theorem 4 they must be equal: h m (x) =h m (0).(We have already met with this kind of reasoning at the beginning of the proof ofTheorem 7.) Hence, the polynomial h m (x) is equal to a constant whichwe shalldenote by ; m . Having in mind the denition of h m (x) we obtain the relation(2) g 0 m = mg m;1 + m :As in the derivation of the binomial formula, we now write the polynomial g m (x)as the sum of powers of x. As before, the lower index indicates the polynomialin question, and upper index corresponds to the degree of x. The coecients aredenoted by A k m and g m (x) has the formg m (x) =A 1 m x + A2 m x2 + + A k m xk + + A m+1m xm+1 :(Remember that the constant termofg m is 0.) Write down the derivative ofg m (x),using the formula (13):g m (x) 0 = A 1 m +2A 2 m x + + kAk m xk;1 + +(m +1)A m+1m xm :


28 I. R. <strong>Shafarevich</strong>On the other hand, write down the analogous formula for g m;1 (x) (replacing m bym ; 1):g m;1 (x)+A 1 m;1 x + A2 m;1 x2 + + A k m;1 xk + + A m m;1 xm and substitute these two formulas into (2). Equating the coecients of x k;1 , wend:(3)(4)kA k m = mA k;1m;1for k > 2A 1 m = m for k =1:(Notice that in the above formulas there is no 0 we only have k where k > 1.)We have obtained the formula similar to the formula for the binomial coecientsC k m, the dierence being that formula (3) holds only for k > 2, and for k =1itisreplaced by (4).Again, we continue to follow the case of binomial coecients.A k m = m k Ak;1 m;1. Applying this formula to Ak;1 we get: m;1 Ak m =Continuing this procedure, after k ; 1 steps we ndA k m =m(m ; 1) (m ; k +2)k(k ; 1) 2A 1 m;k+1 =m(m ; 1) (m ; k +2)k(k ; 1) 2We have:m(m ; 1)k(k ; 1) Ak;2 . m;2 m;k+1(for A 1 m;k+1we use formula (4)).The coecient of m;k+1 very much resembles the binomial coecient. Itdiers from Cm k (see formula (21)) in so much thatthe numerator does not havethe last factor m ; k + 1 (and the denominator does not have the last factor 1, butthis has no eect on the product). However, in the formula for C k m+1 the productin the numerator ends with m;k +1, but it begins with m+1, which is not presenthere. Hence, we can write the coecient of m;k+1 in the form 1m +1 Ck m+1, andthe formula for A k m becomes:A k m = 1m +1 Ck m+1 m+1;k:(We write m;k+1 as m+1;k so that the factors look more similar.)In this way wehave obtained the following formula for the polynomials g m (x):(5) g m (x) = 1m +1 (C1 m+1 mx + C 2 m+1 m;1x 2 + ++ C k m+1 m+1;kx k + + C m+1m+1 0x m+1 ):The obtained formula resembles the binomial formula. Suppose that we have anew variable a and expand the binomial (x + a) m+1 . We then obtain the sameterms as in the brackets in the above formula (5), except that a k is replaced by kand it has no term corresponding to C 0 m+1 m+1. We can compensate for this byconsidering the dierence (x + a) m+1 ; a m+1 in which case the terms with a m+1


Selected chapters from algebra 29cancel. In order to emphasize this analogy, introduce the following notation. Let abe the sequence 1 , 2 , ... and let f(t) be the polynomial a 0 + a 1 t + + a k t k .Then f(a) denotes the number a 0 + a 1 1 + + a k k , i.e. t k is replaced by k .In particular, a m = m , since replacing t m by m we obtain m . Analogously,(x + a) m = x m + C 1 m xm;1 1 + + C m m m : we expand (x + a) m in powers of t andreplace t k by k . The relation (5) with this notation can be written in the form(6) g m (x) = 1 ;(a + x) m+1 ; a m+1 :m +1Remark that a m+1 = m+1 . Notice that we cannot establish that the polynomialgiven by (6) satises the relation (1). In fact, we have found the general formof the polynomials which satisfy the relations (2), but those relations are onlyconsequences of the relation (1). Indeed, the result depends upon the sequence m ,which can in (6) be arbitrary, whereas Theorem 7 states that the polynomial g m (x)is unique for each m. Therefore, we have notyet solved the problem.Among the polynomials g m (x) given by (6) we have to choose those whichsatisfy the relation (1). Since we already know that such polynomials exist andthey are unique (for each m) we only have to nd the unique sequence a whichdenes them. This is quite simple: it is enough to put x = 1 into (1). Sinceg m (0) = 0 (the constant term of g m is 0), we getg m (1) = 1. The notation of theformula (6) yields (a +1) m+1 ; m+1 = m + 1 for m =0 1 2 ... or(a +1) m ; m = m m =1 2 3 ...Definition. The numbers B 1 , B 2 , B 3 ,... are called Bernoulli's numbers ifthe sequence B formed by them satises the relations(B +1) m ; B m = m for m =1 2 3 ...The above relations uniquely dene the sequence of Bernoulli's numbers. Indeed,expanding the above formula, by denition we get(10) 1+mB 1 + C 2 m B 2 + + mB m;1 = m m =1 2 ...(B m cancels). From this relation for m = 1 we get that B 1 = 1=2, and everyrelation that follows allows us to nd B m;1 provided that we knowallB r 's withindices r


30 I. R. <strong>Shafarevich</strong>Bernoulli's polynomials and numbers were discovered by Jacob Bernoulli (therewas a large family of mathematicians of that name). His main results belong to thesecond half of the 17th century, but this particular discovery appeared in a bookpublished after his death at the beginning of the 18th century. The numbers B nwere named Bernoulli's numbers by Euler (18th century) who found many otherapplications of those numbers.Putting k = 2 3 4 5 6 7 8 9 10 11 12 13 into (10) we obtain the followingvalues for the numbers B n (verify this yourself!)B 1 = 1 2 B 2 = 1 6 B 3 =0 B 4 = ; 1 30 B 5 =0 B 6 = 1 42 B 7 =0 B 8 = ; 1 30 B 9 =0 B 10 = 566 B 11 =0 B 12 = ; 6912730etc. Then we easily establish:etc.S 1 (n) =n(n +1) S 2 (n) =2S 4 (n) = n(n + 1)(2n +1)(3n2 +3n ; 1)30n(n +1)(2n +1)6 S 3 (n) = n2 (n +1) 24 S 5 (n) = n2 (n +1) 2 (2n 2 +2n ; 1)12Problems1. Find B m (;1).2. Prove the formula B m =(B ; 1) m for m > 2.3. Derive a relation, analogous to (10), which holds for Bernoulli's numbersB m with odd indices m > 3. Prove that all Bernoulli's numbers B m with oddindices, except B 1 , are equal to 0.4. Find S 6 (n).5. Prove the formula for the derivative of a polynomial of a polynomial: iff(x) andg(x) are polynomials, thenf(g(x)) 0 = f 0 (g(x))g 0 (x):6. Find (a + x) n if the sequence a has the form a n = q n for some number n.I. R. <strong>Shafarevich</strong>,Russian Academy of Sciences,Moscow, Russia


THE TEACHING OF MATHEMATICS1999, Vol. II, 2, pp. 65{80<strong>SELECTED</strong> <strong>CHAPTERS</strong> <strong>FROM</strong> <strong>ALGEBRA</strong>I. R. <strong>Shafarevich</strong>Abstract. This paper is the third part of the publication \Selected chaptersof algebra", the rst two being published in the previous issues of the Teaching ofMathematics, Vol. I (1998), 1{22, and Vol. II, 1 (1999), 1{30.AMS Subject Classication: 00 A 35Key words and phrases: Set, subset, one-to-one correspondence, combinatorics.CHAPTER III. SET1. Sets and subsetsThe notion of a set has a somewhat dierent meaning in mathematics thanin everyday language. The ordinary word \set" usually means a large number ofcertain objects 1 . In mathematics a set is an arbitrary collection of objects denedby a certain property which they all have. The objects which comprise a set arecalled its elements. So, for instance, we may talk about sets of one or two elements.A set is usually denoted by a capital letter (for example, M) anditselements bysmall letters (for example, a, b, ... , , , ... ). The fact that a is an element ofthe set M is written in the form a 2 M and we also say thata belongs to M. If Mconsists of elements a 1 , ... , a n ,we write M = fa 1 ...a n g.A set containing a nite number of elements is called a nite set, while a setcontaining an innite number of elements is called an innite set. The number ofelements of a nite set M is denoted by n(M).In this chapter we shall mainly be concerned with nite sets. The nite setsM and M 0 are said to be equivalent (equipotent) if they have the same numberof elements, i.e. if n(M) = n(M 0 ). We shall now describe the method which isusually used to establish the equivalence of two sets. One-to-one correspondencebetween two sets M and M 0 is coupling, or pairing o, their elements into pairs(a a 0 ), where a 2 M, a 0 2 M 0 , so that each element a of M is coupled with oneand only one element a 0 of M 0 , and each element a 0 of M 0 is coupled with one andThis paper is an English translation of: I. R. Xafareviq, Izbranye glavy algebry,Matematiqeskoe obrazovanie, 3, okt.{dek. 1997, Moskva, str. 2{45. In the opinion of the editors,the paper merits wider circulation and we are thankful to the author for his kind permission tolet us make this version.1 This is not so much true for the English language as it is for Russian. The Russian wordfor the set mnoestvo has the same root as the word mnogo (many).


66 I. R. <strong>Shafarevich</strong>Fig. 1only one element a of M. If we represent the sets M and M 0 graphically and drawlines connecting those elements which belong to one pair, we see that each elementof M is connected to one and only one element ofM 0 and vice versa (Fig. 1).For instance, if n(M) =n and if we numerate the elements of M as follows:M = fa 1 ...a n g, we have established a one-to-one correspondence between theset M and the set N of numbers 1, 2, . . . , n.If we choose two points O and E on a straight line, then to each point A whichlies on that line we can correspond the real number jOAj with the + sign if A isjOEjon the same side of O as E, and with ; sign in the opposite case. This establishesa one-to-one correspondence between the set of the points of the straight lineandthe set of all real numbers which is usually denoted by R. We shall consider thisin more detail in one of the subsequent chapters.If we have a one-to-one correspondence between the sets M and M 0 , and ifthe elements a 2 M and a 0 2 M 0 are coupled in the pair (a a 0 ) we say that theelement a corresponds to the element a 0 , and that the element a 0 corresponds tothe element a.Two nite sets are equivalent if and only if it is possible to establish a one-toonecorrespondence between them.This statement issoovious, that it can hardly be called a theorem. If n(M) =n(M 0 )=n, we can write our sets as follows: M = fa 1 ...a n g, M 0 = fa 0 1 ...a0 g, nand by forming pairs (a i a 0 i) of elements with the same index we establish a oneto-onecorrespondence between M and M 0 . Conversely, if there exists a one-toonecorrespondence between M and M 0 , and if we write M in the form M =fa 1 ...a n g, then each a i belongs to a pair with one and only one element a 0 2 M 0 ,and we can give itthe same index, i.e. we can put a 0 = a i . By the denition ofa one-to-one correspondence, we cannumerate in this way all the elements of M 0 ,and we obtain that M 0 = fa 0 1 ...a0 g. nR. Dedekind (the second part of the 19th century) who did a lot to clear the rolewhich sets have in mathematics, thought that the above statementgives, in a hiddenform, the denition of a positive integer. According to him, it is rst necessary todene the notion of one-to-one correspondence, and then a positive integer is thegeneral property possessed by all nite sets among which it is possible to establish


Selected chapters from algebra 67one-to-one correspondence. This is probably how the notion of a positive integerwas formed historically (of course, without the present terminology). For example,the notion \two" was formed, as we said in Section 1 of Chapter I, by abstracting,i.e. by considering the general property shared by the sets consisting of: two eyes,two oars in a boat, two travellers walking along the road, and more generally by allthe sets which can be put into a one-to-one correspondence with one of the above.This means that the notion of a set is the most fundamental notion of mathematics,since the notion of a positive integer is founded upon the notion of aset.In further text we shall often construct new sets, starting with two given sets.The product of sets M 1 and M 2 is the set whose elements are all the pairs(a b), where a is an arbitrary element ofM 1 and b is an arbitrary element ofM 2 .The product of M 1 and M 2 is denoted byM 1 M 2 .For example, if M 1 = f1 2g, M 2 =f3 4g, thenM 1 M 2 consists of the pairs(1 3), (1 4), (2 3), (2 4).If M 1 = M 2 is the set R of all realnumbers, M 1 M 2 is the set of all pairs(a b), where a and b are real numbers.The coordinate method in the plane establishesa one-to-one correspondence betweenthe set M 1 M 2 and the set of allpoints of the plane (Fig. 2). Fig. 2As another example, suppose that M 1 consists of the numbers 1, 2, . . . , n, andM 2 of the numbers 1, 2, . . . , m. Introduce two new variables x and y and correspondto a number k 2 M 1 the monomial x k and to the number l 2 M 2 the monomial y l .An element of the set M 1 M 2 has the form (k l) and we can correspond to itthe monomial x k y l . In this way we obtain a one-to-one correspondence betweenthe set M 1 M 2 and the set of monomials of the form x k y l , where k =1 ...nl = 1 ...m. In other words this is the set of monomials which stand on theright-hand side of the equality(1) (x + x 2 + + x n )(y + y 2 + + y m )=xy + x 2 y + xy 2 + + x n y m :Hence, the set of these monomials is equivalent to the set M 1 M 2 .Analogously, let M 1 , M 2 , ... , M r be arbitrary sets. Their product is theset consisting of all sequences (a 1 ...a r ) where the i-th place is taken by anarbitrary element of the set M i . The product of the sets M 1 , ... , M r is denotedby M 1 M r .For example, if M 1 = M 2 = M 3 is the set of all real numbers R, the coordinatemethod in the space establishes a one-to-one correspondence between the points ofthe space and the set M 1 M 2 M 3 .But in this chapter we are considered with nite sets.


68 I. R. <strong>Shafarevich</strong>THEOREM 1. If the sets M 1 , ... , M r are nite, then the set M 1 M ris also nite, and n(M 1 M r )=n(M 1 ) n(M r ).We shall rst prove the theorem for the case of two sets, i.e. when r = 2 thiswill be the induction basis. If M 1 = fa 1 ...a n g, M 2 = fb 1 ...b m g, then all thepairs (a i b j ) can be written in the form of a rectangle(2)(a 1 b 1 ) ... (a n b 1 )(a 1 b 2 ) ... (a n b 2 )... ... ...(a 1 b m ) ... (a n b m )The j-th row above contains pairs whose last element isalways b j . In each rowthenumber of pairs is equal to the number of all a i 's, i.e. it is n. The number of therows is equal to the number of all b j 's, i.e. it is equal to m. Hence, the number ofpairs is nm. Notice that the rectangle (2) resembles, in a way, Fig. 2. (A dierentline of reasoning would be to say that the set M 1 is equivalent to the set f1 ...ngor to the set of monomials fx x 2 ...x n g and that M 2 is equivalent to the setfy y 2 ...y m g. Then, n(M 1 M 2 ) is, as we have seen, the number of terms inthe right-hand side of (1). Putting x =1,y =1,we conclude that this number ofterms is nm.)The proof of the general case of r sets M 1 , ... , M r will be carried out byinduction on r. In each sequence (a 1 ...a r )weintroduce two morebrackets, andwrite it in the form ((a 1 ...a r;1 )a r ). Clearly, this does not alter the numberof the sequences. But the sequence ((a 1 ...a r;1 )a r ) is the pair (x a r ), wherex =(a 1 ...a r;1 ) can be considered to be an element ofthesetM 1 M r;1 .Hence, the set M 1 M r is equivalent to the set P M r , where P = M 1 M r;1 . We have proved that n(P M r )=n(P )n(M r ), and by the inductionhypothesis we have n(P ) = n(M 1 ) n(M r;1 ). Therefore, n(M 1 M r ) =n(M 1 ) n(M r;1 )n(M r ) and the proof is complete.Using Theorem 1 we can once more form the expression for the number ofdivisors of a positive integer n. Suppose that n has the cannonical representationn = p 11 pr r :In Section 3 of Chapter I we saw that the divisors of n can be written in the formm = p 11 pr rwhere i can take anyintegral value between 0 and i (formula (11) of Chapter I).In other words, the set of divisors is equivalent to the set of sequences ( 1 ... r )where i takes the above mentioned values. But this is exactly the product M 1 M r of the sets M i where M i is the set f0 1 ... i g. Since n(M i )= i +1,according to Theorem 1 the number of divisors is ( 1 +1)( 2 +1)( r +1). Thisformula was derived in a dierent way in Section 3, Chapter I.


Selected chapters from algebra 69If the sets M 1 , ... , M r coincide, i.e. if M 1 = M 2 = = M r = M theirproduct M 1 M r is denoted by M r . Consider the case when M 1 = =M r = I, and the set I has two elements a and b. An element ofI r is a sequenceof r symbols, each one being a or b, e.g.aababbba (for shortness sake weomitthecommas). This can be considered as a word of r letters written in the alphabet oftwo letters, a being a dot and b adash. Therefore, n(I r ) is equal to the numberof words of length r, written in Morse's alphabet. As we see, it is equal to 2 r (alln i = 2).In further text we consider sets contained in a given set M. They are called itssubsets. This means that a subset N of a set M contains only elements of M, butnot necessarily all of them. ThefactthatN isasubsetofM is written as N M.We also take thatM is a subset of itself. As we shall see later, it is very convenientto consider the subset of M containing no elements|this simplies greatly manydenitions and theorems. This subset is callled the empty subset and is denotedby ?. By denition we take n(?) =0.If N M, the set of all elements of M which do not belong to N is called thecomplement of N and is denoted by N. For instance, if M is the set of all positiveintegers, and if N is the set of all even positive integers, then N is the set of allodd positive integers. If N = M, thenN = ?.If N 1 and N 2 are two subsets of M (i.e. N 1 M and N 2 M) then the set ofall elements which belong to N 1 and N 2 is called their intersection and is denotedby N 1 \ N 2 . For example, if M is the set of all positive integers, if N 1 is the subsetof all those divisible by 2, and N 2 the subset of all those divisible by 3,thenN 1 \N 2is the set of all positive integers divisible by 6.If N 1 and N 2 do not have common elements, then by denition N 1 \ N 2 = ?,the empty set. So, if M and N 1 are the same as in the previous example, and N 2is the set of odd positive integers, then N 1 \ N 2 = ?.The set containing elements which belong to the subset N 1 or the subset N 2 iscalled their union and is denoted by N 1 [ N 2 . For example, if M is again the set ofall positive integers, and N 1 and N 2 are the subsets of all even and odd numbers,respectively, thenN 1 [ N 2 = M.a) b)Fig. 3


70 I. R. <strong>Shafarevich</strong>Intersections and unions of sets can be represented graphically as in Figure 3.In Fig. 3a) M 1 [ M 2 is hatched by horizontal and M 1 \ M 2 by vertical lines. InFig. 3b) the set (M 1 [ M 2 ) \ M 3 is hatched.In this chapter we shall consider subsets of a nite set M, which satisfy certainconditions and we shall derive formulas for the number of all such subsets. Thebranch of mathematics concerned with such questions is called combinatorics.Therefore, combinatorics is the theory of arbitrary nite sets. We donotusenotions such as distance or the magnitude of an angle, equation or its roots, butonly the notion of a subset and the number of its elements. Hence, it is verysurprising that, using only such miserly material, we can nd many regularitiesand connections with other branches of mathematics which are not at all obvious.Problems1. Let M = M 0 be the set of all positive integers. Couple into pairs thenumber a 2 M with b 2 M 0 such thatb =2a. Is this a one-to-one correspondencebetween M and M 0 ?2. Let N be the set of all positive integers, let M = N N and let M 0 bethe set of positive rational numbers. Couple into pairs (n 1 n 2 ) 2 M with a 2 M 0if a = n 1 =n 2 . Is this a one-to-one correspondence?3. How many dierent one-to-one correspondences exist between two sets Mand M 0 if n(M) =n(M 0 )=3? Draw them analogously as in Fig. 1.4. Every one-to-one correspondence between the sets M and M 0 denes theset of those pairs (a a 0 ), where a 2 M and a 0 2 M 0 correspond to each other, i.e.it denes a subset ; M M 0 which iscalledthegraph of correspondence. Let; 1 and ; 2 be graphs of two one-to-one correspondences. Prove that; 1 \ ; 2 is agraph of a one-to-one correspondence if and only if ; 1 = ; 2 and the two givencorrespondences coincide.5. Let n(M) =n(M 0 )=n and let ; be the graph of a one-to-one correspondencebetween M and M 0 (see Problem 4). Evaluate n(;).6. Let M be the set of all positive integers, let N 1 M be the subset ofall numbers divisible by a given number a 1 and let N 2 M be the subset of allnumbers divisible by agiven number a 2 . Describe the sets N 1 [ N 2 and N 1 \ N 2 .7. Prove that (N) = N, i.e. that the complement of the complement of asubset N is exactly N.2. CombinatoricsWe start with the simplest question: determine the number of all subsets of anite set.We rst solve the problem for small values of n(M). We will write down thesubsets N of M writing in one row all the subsets with the same number of elements(i.e. with the same value of n(N)). The rows are arranged in the ascending order


Selected chapters from algebra 71of n(N).1: n(M) =1 M = fagn(N) =0 N = ?n(N) =1N = M = fag2: n(M) =2 M = fa bgn(N) =0 N = ?n(N) =1 N = fag N = fbgn(N) =2N = M = fa bg3: n(M) =3 M = fa b cgn(N) =0 N = ?n(N) =1 N = fag N = fbg N = fcgn(N) =2 N = fa bg N = fa cg N = fb cgn(N) =3N = M = fa b cgTable 1We see that if n(M) =1,thenumber of subsets is 2, if n(M) =2itis4andif n(M) = 3 it is 8. This suggests the general statement.THEOREM 2. The number of all subsets of a nite set M is 2 n(M) .There is a general method which reduces the investigation of an arbitrary niteset to the investigation of sets with smaller number of elements. The set M is calledthe sum of its two subsets M 1 M and M 2 M if M 1 [ M 2 = M, M 1 \ M 2 = ?.Clearly, this is equivalent toM 2 = M 1 and M 1 = M 2 . In this case each element ofM belongs to one of the subsets M 1 or M 2 (since M 1 [ M 2 = M) and only to oneof them (since M 1 \ M 2 = ?). Hence, n(M) =n(M 1 )+n(M 2 ). The fact that Mis the sum of M 1 and M 2 is written as M = M 1 + M 2 . Such arepresentation isalso called a partition of M into M 1 and M 2 .Let M = M 1 + M 2 and let N M be an arbitrary subset. Then any elementa 2 N belongs either to M 1 (in this case a 2 N \ M 1 ) or to M 2 (in this casea 2 N \ M 2 ), and only one of these cases can take place (since M 1 \ M 2 = ?).Hence N =(N \M 1 )+(N \M 2 ). Conversely,ifN 1 M 1 and N 2 M 2 are arbitrarysubsets, then N 1 M, N 2 M and N = N 1 [ N 2 M, whereas N \ M 1 = N 1and N \ M 2 = N 2 . In this way we establish a one-to-one correspondence betweenthe subsets N of M and the pairs (N 1 N 2 ) where N 1 and N 2 are arbitrary subsetsof M 1 and M 2 , respectively.We nowformulate this result in terms of sets. Denote by U(M) thesetofallsubsets of a set M. One should not be alarmed because we consider here subsets aselements of a new set. So, for example, associations of civil or electrical engineersare elements of the general association of engineers. In Table 1 we decribed the setsU(M) when n(M) =1,2 or 3. The result obtained above canbeformulated as


72 I. R. <strong>Shafarevich</strong>follows: if M = M 1 + M 2 is a partition of M, thenthesetU(M) isinaone-to-onecorrespondence with the set U(M 1 ) U(M 2 ). Denote the number n(U(M)) byv(M)|this is the required number of all subsets. Applying Theorem 1 we deducethat(3) v(M 1 + M 2 )=v(M 1 )v(M 2 ):The equality (3) reduces the evaluation of v(M) to the evaluation of v(M 1 )and v(M 2 ) for the sets M 1 and M 2 with smaller number of elements. In orderto obtain the nal result, consider the partition of M not into two, but into anarbitrary number of subsets. We can dene this concept inductively, saying thatM = M 1 + + M r if M = (M 1 + + M r;1 )+M r , where the expressionM 1 + + M r;1 is taken to be already dened. In fact, when we say that M =M 1 ++M r , this means that M 1 ,...,M r are subsets of M and that every elementof M belongs to one and only one of the subsets M 1 , ... , M r . For example, if Mis the set of all positive integers, then M = M 1 + M 2 + M 3 , where M 1 is the subsetof all numbers divisible by 3,M 2 is the subset of all numbers of the form 3r +1and M 3 is the subset of all numbers of the form 3r +2.From (3), for nite sets M i we obtain, by induction(4) v(M 1 + + M r )=v(M 1 ) v(M r ):If n(M) =n, then there exists the \tiniest" partition of M into n subsets M i ,each having only one element, i.e. M = M 1 + + M n . If M = fa 1 ...a n g,thenM i = fa i g. The one element set M i has two subsets: the empty set ? and M iitself (see Table 1, rst row). Hence, v(M i )=2and applying formula (4) to thepartition M = M 1 + + M n weobtainthatv(M) =2 n , as stated in Theorem 2.The question of the number of all subsets of a given set appears in connectionwith certain problems regarding numbers. For example, consider the followingquestion: in how many ways can a positive integer n be written as a product oftwo relatively prime factors? Let n = ab, wherea and b are relatively prime and letn = p 11 pr rbe the canonical prime factorization. Then a and b are divisors of nand, as we saw in Section 3 of Chapter I, each one of them has the form p 11 pr rwhere 0 6 i 6 i . But since a and b are relatively prime, then if some p i divides a,then it cannot divide b and hence appears in a with degree i . Therefore, in orderto obtain the required factorization n = ab, it is necessary to choose an arbitrarysubset N of the set M = fp 1 ...p r g and to equate a to the product of p iiforp i 2 N. Then a divides n and n = ab is the required factorization. According toTheorem 2, the number of all factorizations of n into products of two relativelyprimer factors is 2 r ,wherer is the number of dierent prime factors of n.It should be noted that in the above evaluation we considered the factorizationn = ab and n = ba to be dierent. In fact, if a, and hence the factorizationn = ab, corresponds to the subset N fp 1 ...p r g, then b corresponds to thesubset consisting of those p i 2 M which do not belong to N, i.e. which belongto the complement N of N. Therefore, in our evaluation we corresponded thefactorizations n = ab and n = ba to two dierent subsets N and N. Hence, if we


Selected chapters from algebra 73do not want tomakeadierence between the factorizations n = ab and n = ba,then the two subsets N and N should be treated as one, and then the number ofall factorizations in this sense would be 2 r;1 .We now pass on to a more subtle problem: nd the number of subsets of agiven nite set M which contain m elements and m is a given number. In order todo this, we again collect all the subsets N M such that n(N) =m into one setdenoted by U(Mm). If we put n(U(Mm)) = v(Mm), then this is the numberwhich wewishtond. In Table 1 we wrote the sets which belong to U(Mm) onone row. Hence, we obtain the values of v(Mm) for small values of n(M):n(M) =1 : v(M0) = 1 v(M1) = 1n(M) =2 : v(M0) = 1 v(M1) = 2 v(M2) = 1n(M) =3 : v(M0) = 1 v(M1) = 3 v(M2) = 3 v(M3)=1Table 2THEOREM 3. If n(M) =n, the number of subsets N M of the set M whichcontain m elements (i.e. such that n(N) =m) isequal to the binomial coecientC m n . In other words, v(Mm) =Cm n .The proof is based upon the same idea as the proof of Theorem 2. Namely,suppose that the set M is the sum of two subsets: M = M 1 + M 2 and we shallexpress the number v(Mm) in terms of the numbers v(M 1 m)andv(M 2 m). IfM = M 1 + M 2 , then each subset N M can be written in the form N = N 1 + N 2 ,where N 1 = N \M 1 , N 2 = N \M 2 . If wetakeinto account the condition n(N) =m,then we must have n(N 1 )+n(N 2 )=m. Let k and l be two nonnegative integerssuch thatk + l = m. Consider all subsets N M such that n(N \ M 1 )=k, andn(N \ M 2 )=l, denote the set of all these subsets by U(k l) andputn(U(k l)) =v(Mkl). Then in the same way as in the proof of Theorem 2 we see that(5) v(Mkl) =v(M 1 k)v(M 2 l):The set U(Mm) can clearly be partitioned into sets U(Mkl) for variouspairs of numbers k, l, such that k + l = m. Therefore the number of its elementsv(Mm) is equal to the sum of all numbers v(Mkl) for all k and l such thatk + l = m, i.e. for all the values: k =0,l = m k =1,l = m ; 1 ... k = m, l =0.From the relation (5) we obtain(6) v(Mm) =v(M 1 m)v(M 2 0)+v(M 1 m;1)v(M 2 1)++v(M 1 0)v(M 2 m):Of course, if in the product v(M 1 k)v(M 2 l)itturnsoutthatk>n(M 1 ), we haveto take v(M 1 k) = 0 and the same holds for v(M 2 l).We have obtained a relation analogous to the relation (3), although it is morecomplicated.Wehave met the relation (6) in connection with a completely dierent problem.This is, in fact, the coecient ofx m in the product of two polynomials f(x) andg(x) if the coecient of x k in f(x) is v(M 1 k) and the coecient of x l in g(x)


74 I. R. <strong>Shafarevich</strong>is v(M 2 l) see formula (1) od Chapter II. In order to establish the connectionbetween these two statements, dene for an arbitrary nite set M the polynomialf M (x) whose coecients are v(Ms):(7) f M (x) =v(M0) + v(M1)x + + v(Mn)x n where n = n(M).For instance, according to Table 2, if n(M) = 1, then f M (x) = 1+x, ifn(M) = 2 then f M (x) =1+2x + x 2 ,ifn(M) = 3, then f M (x) =1+3x +3x 2 + x 3 .Now comparing the relations (6) and (7) we can write(8) f M (x) =f M1 (x) f M2 (x) if M = M 1 + M 2 :Hence, if we introduce polynomials f M (x) instead of the numbers v(M) we obtaina complete similarity with the formula (3). We see that the polynomial f M (x)turns out to be a just replacement for the number v(M) in our more complicatedproblem. This is not a rare thing to happen. If we have to deal not with onenumber, but with a nite sequence of numbers (a 0 ...a n ), then its properties areoften well expressed by means of the polynomial a 0 + a 1 x + + a n x n . We shallsee that later, in other examples.It remains literally to repeat the end of the proof of Theorem 2. If M =M 1 + + M r , then from (8) we obtain, by induction,f M (x) =f M1 (x) f Mr (x):Now put n(M) = n and partition the set M into n subsets each containing oneelement: M = M 1 + + M n , n(M i )=1. The one element set M i has two subsets:the empty set ? with n(?) =0andM i with n(M i )=1. Therefore, v(M i 0) = 1,v(M i 1) = 1, v(M i k)=0fork > 1, f Mi =1+x and we conclude that for anynite set M we havef M (x) = (1 + x) n(M) :The expression (1 + x) n(M) can be written in the form of a polynomial in x bymeans of the binomial formula. We have seen (formulas (20) and (24) of Chapter II)that for n = n(M):(1 + x) n = C 0 n + C1 n x + C2 n x2 + + C n n xn where C m n = n!m!(n ; m)! .Therefore, recalling the denition of the polynomial f M (x) (formula (7)), we obtain(9) v(Mm) =C m n =n!m!(n ; m)!for n = n(M):This is the answer to our question.By counting the subsets of M containing 0, 1, 2, . . . , n elements where n =n(M), we have counted all the subsets of M. Therefore, v(M0) + v(M1) + +v(Mn) = v(M), or using (9) and Theorem 2, C 0 + n C1 + + n Cn = n2n . Thisrelation for the binomial coecients is easily obtained from the binomial formula,as we have done in Section 3 of Chapter II.


Selected chapters from algebra 75A subset of m elements of the set fa 1 ...a n g is sometimes called a combinationof n elements, taken m at a time. Hence, the binomial coecient Cnm is thenumber of all such combinations.The above question of the number of subsets N M, ifn(M) =n, n(N) =m,is connected with some questions regarding positive integers. For example, considerthe question: in how manyways can we write a positive integer n in the form ofr summands, where r is a given number? In other words, what is the number ofsolutions of the equation x 1 ++x r = n in positive integers x 1 ,... ,x r ? Solutionswith dierent order of the unknowns are considered to be dierent. For example,if n = 4, r = 2, we have 4= 1+3 = 2+2 = 3+1, and hence there are threesolutions: (1 3), (2 2), (3 1).Consider the segment AB of length n. Its points whose distance from the initialpoint A are integers will be called integral. Clearly, to each solution of the equationx 1 + + x r = n corresponds a partition of the segment AB into r segments withintegral end points of length x 1 , x 2 ,...,x r (Fig. 4).Fig. 4In its turn, such a partition is dened by the end points of the rst r ; 1segments (the end point of the last one is B). These end points dene a subsetN of the set M of integral points of the segment AB which are dierent from B.Clearly, n(N) =r ;1, and in this way wehave dened a one-to-one correspondencebetween the integer solutions of the equation x 1 + + x r = n and the subsetsN M, where n(N) =r;1, n(M) =n;1. Therefore, the number of such solutionsis equal to the number of such subsets. Applying formula (9) we conclude that thenumber of these subsets is C r;1n;1. If we do not x the number of summands intowhich thenumber n is decomposed, then the number of all partitions is evidentlyequal to the sum of partitions into r summands for r =1 2 ...n. Therefore, thenumber of partitions is equal to the sum of all binomial coecients C r;1n;1 wherer =1 2 ...n. We know that this sum is 2 n;1 . In other words, a positive integern can be partitioned into integer summands in 2 n;1 ways (if we allow arbitrarynumber of summands, and take into account their order).Return now to the derivation of formula (9). The method used|the introductionof the polynomials f M (x)|turns out to be very useful in other cases, and weshall come back toitlater.But formula (9) which connects the numbers v(Mm)with binomial coecients can be derived in another way. Consider the expression(1 + x) n as the product of n equal factors(10) (1 + x) n =(1+x)(1 + x) (1 + x)and let us expand the product on the right-hand side of (10). We numerate itsfactors, i.e. we give them numbers 1, 2, . . . , n which formthesetM = f1 2 ...ng.


76 I. R. <strong>Shafarevich</strong>In order to expand the product (10) we have tomultiply each timen terms 1 or x,taking them from one of the brackets. Hence, each term of the expanded expression(10) is dened by indicating from which brackets with m numbers i 1 , i 2 ,...,i m .Then 1 is taken from the remaining n ; m brackets, and as a result we obtain theterm x m . We see that each term of the expanded expression (10) is dened by thesubset N = fi 1 ...i m g of M which gives the number of brackets from which xis taken. From the remaining brackets we take 1. The remaining brackets havethose numbers which belong to the complement N of N. Therefore, the number ofappearences of the term x m is equal to the number of subsets N M containingm elements, and this is v(Mm). Hence, the expression (10) in the expanded formis the sum of the terms of the form v(Mm)x m :(1 + x) n = v(M0) + v(M1)x + + v(Mn)x n :Comparing this with the denition of binomial coecients (formula (20) of ChapterII) we obtain a new proof of the equality v(Mm) =Cn m.The same reasoning can be applied to a more general case. Consider theproduct of rst degree polynomials x + a i , where the coecient ofx is 1. Let ustry to write the product(11) (x + a 1 )(x + a 2 ) (x + a n )in the form of a polynomial in x. As before we numerate the n factors. Then eachterm in the expanded product (11) is obtained by taking a i1 , a i2 ,...,a im from thefactors numerated i 1 , i 2 ,...,i m and taking x from the remaining n ; m factors.The obtained term has the form a i1 a i2 a im x n;m and if all the terms of degreen ; m are collected together we get m (a 1 ...a n )x n;m where m (a 1 ...a n ) isthe sum of all products of the form a i1 a im where fi 1 ...i m g runs over allsets of indices formed from 1, ... , n. Hence, the polynomial m (a 1 ...a n ) hasC m n terms. For example, 1 (a 1 ...a n ) = a 1 + + a n , and 2 (a 1 ...a n ) =a 1 a 2 + a 1 a 3 + + a 2 a 3 + ...a n;1 a n |it is the sum of all products a i a j withi < j. The last polynomial n has the form n (a 1 ...a n ) = a 1 a n . This isthe rst time that we encounter polynomials in an arbitrary number n of variables.Polynomials 1 ,..., n have avery important role in algebra. In particular, wehave proved the formula(12) (x + a 1 ) (x + a n )=x m + 1 (a 1 ...a n )x n;1 + 2 (a 1 ...a n )x n;2 + + n (a 1 ...a n ):It is called Viete's formula.Viete's formula expresses an important property of polynomials. Suppose thatthe polynomial f(x) of degree n has n roots 1 ,..., n . Then, as we have seenmore than once, it is divisible by the product (x ; 1 ) (x ; n ), and since thisproduct is also of degree n, thenf(x) =c(x ; 1 ) (x ; n ), where c is a number.Suppose that the coecient of the leading term of f(x) is1. Then the number cmust also be 1, and we havef(x) =(x ; 1 ) (x ; n ):


Selected chapters from algebra 77We can apply Viete's formula(12)toitby putting a i = ; i . Since all the termsof the polynomial k are products of k variables taken from a 1 , ... , a n , thenreplacing a i by ; i gives rise to a factor (;1) k , namely: k (; 1 ... ; n ) =(;1) k k ( 1 ...a n ). Hence, from (12) we obtain(13) (x ; 1 ) (x ; n )=x n ; 1 ( 1 ... n )x n;1 + +(;1) n n ( 1 ... n ):This formula expresses the coecients of the polynomial f(x) =(x; 1 ) (x; n )in terms of its roots and it is also called Viete's formula. You know its special casefor the quadratic equation: in that case there are only two polynomials 1 and 2 , 1 = 1 + 2 , 2 = 1 2 .In conclusion, consider again formula (9) for the number of subsets (or thenumber of combinations). We deduced it from the binomial formula, which was,in turn, proved in Section 3 of Chapter II using the properties of the derivative.That is a rather involved method. It would be desirable to have another proof ofthis formula based only upon combinatorial reasoning. We shall give such a proofof an even more general formula. Notice that each subset N of the set M denesa partition M = N + N where N is the complement of N. We consider a moregeneral case: an arbitrary partition M = M 1 ++M r into subsets with prescribednumber of elements: n(M 1 )=n 1 , ... , n(M r )=n r . The sequence (n 1 ...n r ) willbe called the type of the partition M = M 1 + + M r . We suppose that none ofthe sets M i is empty, i.e. that all n i > 0.Since we are dealing all the time with one and only set M where n(M) =n, itshall not always be present in our notations. Denote the number of all possible partitionsof our set M which have the prescribed type (n 1 ...n r )by C(n 1 ...n r ).Of course, we must have n 1 + + n r = n. Notice also that we are taking intoaccount the order of the sets M 1 , ... , M r . For instance, for r = 2 and given n 1and n 2 , n 1 + n 2 = n, we take the partitions M = M 1 + M 2 and M = M 2 + M 1 ,with n(M 1 ) = n 1 and n(M 2 ) = n 2 , to be dierent. Indeed, if n 1 6= n 2 thesepartitions are of dierent types. Owing to this, each partition M = M 1 + M 2 de-nes one subset M 1 (the rst one) and we have a connection with the previouslyconsidered problem: C(n 1 n 2 ) = v(Mn 1 ). In other words, for any m < n, wehave v(Mm) = C(m n ; m). We shall now derive an explicit formula for thenumber C(n 1 ...n r ). Consider an arbitrary partition M = M 1 + + M r oftype (n 1 ...n r ). Suppose that at least one of the numbers n 1 , ... , n r is dierentfrom 1. For instance, suppose that n 1 > 1 and choose an arbitrary elementa 2 M 1 . Denote by M1 0 the set of all elements of M 1 dierent from a (this isthe complement oftheset fag taken as a subset of M 1 ). Then we have the partitionM 1 = M1 0 + fag and to our partition M = M 1 + + M r correspondsa new partition M = M1 0 + fag + M 2 + + M r of type (n 1 ; 1 1n 2 ...n r ).In this way fromall partitions of type (n 1 n 2 ...n r ) we obtain all partitions oftype (n 1 ; 1 1n 2 ...n r ): the partition M = M1 0 + fag + M 2 + + M r is obtainedfrom the partition M = M 1 + + M r , where M 1 = M1 0 + fag. Moreover,one partition of type (n 1 n 2 ...n r ) gives rise to n 1 dierent partitions of type(n 1 ; 1 1n 2 ...n r ), depending on the choice of a 2 M 1 . Hence, we have(14) n 1 C(n 1 n 2 ...n r )=C(n 1 ; 1 1n 2 ...n r ):


78 I. R. <strong>Shafarevich</strong>Applying the same method to partitions of type (n 1 ; 1 1n 2 ...n r ) we obtainthat (n 1 ; 1)C(n 1 ; 1 1n 2 ...n r ) = C(n 1 ; 2 1 1n 2 ...n r ), i.e. thatn 1 (n 1 ; 1)C(n 1 n 2 ...n r )=C(n 1 ; 1 1 1n 2 ...n r ) andn 1 ! C(n 1 n 2 ...n r )=C(1 ... 1n 2 ...n r ):| {z }n 1 timesWenow apply the same reasoning to the parameter n 2 in C(1 ... 1n 2 ...n r ).In the same way as before we obtain the relation n 2 ! C(1 ... 1n 2 n 3 ...n r )=C(1 ... 1n 3 ...n r ) where 1 appears in the rst n 1 + n 2 places, that is to sayn 1 ! n 2 ! C(n 1 n 2 ...n r )=C( 1 ... 1 n 3 ...n r ):| {z }n 1+n 2 timesFinally, if we apply the procedure to all the parameters n 1 , n 2 , ... , n robtain the formulawe(15) n 1 ! n 2 ! n r ! C(n 1 n 2 ...n r )=C(1 1 ... 1)| {z }n timessince n 1 +n 2 ++n r = n. It remains to nd the value of the expression C(1 ... 1).In order to do so notice that the aboveformula was proved for partitions of all types(n 1 ...n r ). Apply it to the simplest type (n). There is only one partition of thistype, namely M = M, and so C(n) =1. On the other hand, formula (15) givesn! C(n) =C(1 1 ... 1):| {z }n timesTherefore C(1 ... 1) = n! and substituting this into (15) we obtain the nal expression(16) C(n 1 ...n r )=n!n 1 ! n 2 ! n r ! where n = n 1 + + n r :For n = 2 instead of (n 1 n 2 ), n 1 + n 2 = n it is more usual to write (n m ; n).Since C(m n ; m) =v(Mm), formula (16) reduces to the relation (9).Remark 1. Consider again the expression C(1 ... 1) which appeared at theend of the above proof. What is a partition of type (1 ... 1)? It is a partition of Minto one element sets. But recall that wemust takeinto account the order of the setsin the partition M = M 1 ++M r . Hence, a partition M = fa 1 g++fa n g givesanumertaion of the elements of M. The number C(1 ... 1) shows in how manyways we can numerate the elements of M. It can be said that C(1 ... 1) gives thenumber of dierent arrangements of the elements of M. As we know, the numberof such arrangements is n!. Various arrangements are also called permutations. Forexample, if M = fa b cg, which means that n =3,wehave 6permutations(a b c) (a c b) (b a c) (b c a) (c a b) (c b a):


Selected chapters from algebra 79Remark 2. In the case r = 2, the expression C(n 1 n 2 ) coincides with thebinomial coecient|we have already given two proofs of this fact. An analogousinterpretation has the expression C(n 1 ...n r ) for any r. It can be proved thatif x 1 , ... , x r are variables, then in the expansion of (x 1 + + x r ) n we obtainterms of the form x n11 xnr rwith n 1 + + n r = n, n i nonnegative integers, andthat the coecient ofx n11 xnr r is C(n 1 ...n r ). We have onlytoreturntoourrst denition of a partition, allowing the empty set to appear among M i 's andhence allowing zero to be among the numbers n i . It is easily seen that (16) remainsvalid in this case also, provided we take 0!=1. The proof of this generalizationof the binomial formula to the case of r variables is perfectly analogous to thesecond (combinatorial) proof of the relation v(Mm) = Cnm (where n = n(M))given above.For instance, this formula gives that (x 1 + x 2 + x 3 ) 3 is equal to the sum ofterms C(n 1 n 2 n 3 )x n11 xn2 2 xn3 3 where (n 1n 2 n 3 )runsover all triplets of nonnegativeintegers such thatn 1 + n 2 + n 3 = 3, and C(n 1 n 2 n 3 )isevaluated by formula (16)(with the condition 0! = 1). Substitution gives(x 1 + x 2 + x 3 ) 3 =x 3 1 + x 3 2 + x 3 3 +3x 2 1x 2 +3x 1 x 2 2 +3x 2 1x 3 +3x 1 x 2 3 +3x 2 2x 3 +3x 2 x 2 3 +6x 1 x 2 x 3 :Problems1. Let I = fp qg be a set containing two elements and let M = fa 1 ...a n gbeasetofn elements. To each subset N M correspond the following element:aword from I n where on the i-th place stands p if a i 2 N, and q if a i does notbelong to N. Prove that this establishes a one-to-one correspondence between thesets U(M) and I n . Use this to derive Theorem 2 from Theorem 1.2. How can the intersection and the union of subsets N 1 and N 2 of the set Mbe expressed in terms of their corresponding words from I n (see Problem 1)?3. Find the number of all partitions M 1 + + M r of all types, but for a xednumber r. Verify that for r = 2 the answer is given by Theorem 2.4. Find the sum of all numbers C(n 1 ...n r ) for all n i > 0, n 1 + + n r = n,for given r and n. Give two solutions: one based upon Problem 3 and the otherbased upon the statement given in Remark 2.5. Find the number of factorizations of a given positiveinteger n into a productof r factors: n = a 1 a r , which aremutually relatively prime.6. What is the number of solutions of the equation x 1 + + x r = n, for givenn and r, inintegers x i > 0? Use the following graphic interpretation of solutions,which is a modication of the interpretation given in Fig. 4. Let AB be a segmentof length n + r. Correspond to a solution (x 1 ...x r ) the partition of this segmentconsisting of the segment of length x 1 starting at A, the segment of length x 2 ,starting at the rst integral point after the end of the rst segment, etc see thegure in which wehave x 3 =0.


80 I. R. <strong>Shafarevich</strong>7. Find the number of dierent partitions M = M 1 + M 2 of type (m m)if n(M) = 2m and the partitions M = M 1 + M 2 and M = M 2 + M 1 are nottaken to be dierent. The same question for the partitions M = M 1 + M 2 + M 3of type (m m m) ifn(M) =3m and if partitions with dierent order of M 1 , M 2 ,M 3 are not taken to be dierent. Finally, the same question for the partitions oftype (k k l l l), n(M) =2k +3l and partitions in which equivalent subsets havedierent order are not taken to be dierent.8. What is the form of the term of the polynomial (x 1 + + x n ) 2 ? The samequestion for the polynomial (x 1 + + x n ) 3 .9. How many terms are there in the polynomial (x 1 + + x r ) n , supposingthat similar terms are grouped together?10. Express the polynomial a 2 1 + a 2 2 + + a 2 n in terms of polynomials 1and 2 . Suppose that the polynomial x n + ax n;1 + bx n;2 + has n real roots.Provethata 2 > 2b. When does the equality a 2 =2b take place? Hint: use Bezout'stheorem from Section 1 of Chapter II and the fact that a sum of squares of realnumbers cannot be negative.11. Give a combinatorial proof of the relation Cn k = Ck n;1 +Ck;1 n;1for binomialcoecients (formula (26) of Chapter II), interpreting Cn k as v(Mk) where n(M) =n. Generalize this relation to the numbers C(n 1 ...n r ).12. Give a combinatorial proof of the relation Cn m = Cn;m nfor binomialcoecients.(to be continued in the next issue)I. R. <strong>Shafarevich</strong>,Russian Academy of Sciences,Moscow, Russia


THE TEACHING OF MATHEMATICS2000, Vol. III, 1, pp. 15{40<strong>SELECTED</strong> <strong>CHAPTERS</strong> <strong>FROM</strong> <strong>ALGEBRA</strong>I. R. <strong>Shafarevich</strong>Abstract. This paper is the continuation of the third part of the publication\Selected chapters of algebra", the rst two being published in previous issues of theTeaching of Mathematics, Vol. I (1998), 1{22, and Vol. II, 1 (1999), 1{30, and thebeginning of this part in Vol. II, 2 (1999), 65{80.AMS Subject Classication: 00 A 35Key words and phrases: Algebra of sets, probability theory, Bernoulli's scheme,Chebyshev inequalities.CHAPTER III. SET (continued)3. Algebra of setsIf the intersection of two subsets M 1 M and M 2 M is empty (i.e. M 1 \M 2 = ?), then the union M 1 [ M 2 consists of elements which belong either to M 1or to M 2 , and any element ofM 1 [M 2 can belong only to one of the sets M 1 or M 2 .Hence, M 1 [ M 2 = M 1 + M 2 , and so n(M 1 [ M 2 )=n(M 1 )+n(M 2 ).The case when M 1 \ M 2is not empty can be reduced to the previous one.Denote by M 0 1 the complement of M 1 \ M 2 with respect to M 1 , that is to saythe set of those elements of M 1 which do not belong to M 1 \ M 2 . Then M 1 =(M 1 \ M 2 )+M 0 1 and(17) n(M 1 )=n(M 1 \ M 2 )+n(M 0 1 ):Analogously,(18) n(M 2 )=n(M 1 \ M 2 )+n(M 0 2 )where M 0 2 is the complement ofM 1 \ M 2 with respect to M 2 . Adding up (17) and(18) we obtain(19) n(M 1 )+n(M 2 )=2n(M 1 \ M 2 )+n(M 0 1)+n(M 0 2):But the sets M 1 \ M 2 , M 0 and M 0 1 2do not have common elements and their unionis M 1 [ M 2 . Therefore M 1 [ M 2 = M 1 \ M 2 + M 0 1 + M 0 2 and so n(M 1 [ M 2 )=This paper is an English translation of the second part of: I. R. Xafareviq, Izbranyeglavy algebry, Matematiqeskoe obrazovanie, 3, okt.{dek. 1997, Moskva, str. 2{45. In theopinion of the editors, the paper merits wider circulation and we are thankful to the author forhis kind permission to let us make this version.


16 I. R. <strong>Shafarevich</strong>n(M 1 \ M 2 )+n(M 0 1)+n(M 0 2). Using this, we can rewrite the equality (19) as:n(M 1 )+n(M 2 )=n(M 1 \ M 2 )+n(M 1 [ M 2 ), that is to say(20) n(M 1 [ M 2 )=n(M 1 )+n(M 2 ) ; n(M 1 \ M 2 ):This is the relation wewanted. Our further aim is to generalize it and to obtainthe expression for the number of elements of the union of an arbitrary number ofsets n(M 1 [[M r ), and not only the union of two sets. We shall have to establishsome more or less evident propertiesofintersections and unions of several sets.First of all notice that the union M 1 [ M 2 [[M r of several subsets M 1 ,M 2 ,...,M r can be dened by means of unions of only two subsets. For instance,and also for arbitrary kM 1 [ M 2 [ M 3 =(M 1 [ M 2 ) [ M 3 M 1 [ M 2 [[M k =(M 1 [ M 2 [[M k;1 ) [ M k :The second formula we need has the form(M 1 [ M 2 [[M k ) \ N =(M 1 \ N) [ (M 2 \ N) [[(M k \ N):Both formulas are obvious it is enough to ask oneself: what does it mean thatan element a 2 M belongs to the left or to the right-hand side? For example, inthe last formula a 2 (M 1 [ M 2 [[M k )\N means that a 2 M 1 [ M 2 [[M kand a 2 N. The second statement is merely that a 2 N and the rst that a 2 M ifor some i =1 ...k. But then a 2 M i \ N for the same i, and this means thata 2 (M 1 \ N)[(M 2 \ N)[[(M k \ N). Notice that this property resembles thedistributivity ofnumbers. Indeed, if we replace the sets M 1 , M 2 , ... , M k and Nby thenumbers a 1 , a 2 ,...,a k and b, ifwe replace the sign [ by +and\ by , weobtain the equality (a 1 ++ a k )b = a 1 b ++ a k b, i.e. the distributivity lawfornumbers. There are other properties which show an analogy between the operationsunion and intersection of subsets on one side, and addition and multiplication ofnumbers on the other (see Problem 1). Investigation of system of subsets of a givenset M with respect to the operations [ and \ is called the algebra ofsets.We now derive the formula for n(M 1 [ M 2 [ M 3 ). Since M 1 [ M 2 [ M 3 =(M 1 [ M 2 ) [ M 3 ,we can apply formula (20) to obtainn(M 1 [M 2 [M 3 )=n((M 1 [M 2 )[M 3 )=n(M 1 [M 2 )+n(M 3 );n((M 1 [M 2 )\M 3 ):We can apply formula (20) to the term n(M 1 [ M 2 ) and since (M 1 [ M 2 ) \ M 3 =(M 1 \ M 3 )[(M 2 \ M 3 ), we can also apply formula (20) to the last term above. Wegetn(M 1 [ M 2 [ M 3 )=n(M 1 )+n(M 2 )+n(M 3 ); n(M 1 \ M 2 ) ; n(M 1 \ M 3 ) ; n(M 2 \ M 3 )+ n((M 1 \ M 3 ) \ (M 2 \ M 3 )):


Selected chapters from algebra 17Clearly, (M 1 \ M 3 ) \ (M 2 \ M 3 ) = M 1 \ M 2 \ M 3 and so the last term can bewritten as n(M 1 \ M 2 \ M 3 ). We obtain the formulan(M 1 [ M 2 [ M 3 )=n(M 1 )+n(M 2 )+n(M 3 ); n(M 1 \ M 2 ) ; n(M 1 \ M 3 ) ; n(M 2 \ M 3 )+ n(M 1 \ M 2 \ M 3 ):Now we can guess what should be the form of the formula for n(M 1 [[M r ).It must contain the terms n(M i1 \\M ik ) where M i1 , ... , M ik are any k setstaken among the sets M 1 , ... , M r for all k =1 2 ...r and if k is even we takethe sign ;, while if k is odd we take +. In other words, the sign of the termn(M i1 \\M ik )is(;1) k;1 .We shall nowprove this formula by induction on r in the same wayasweprovedit for r =3. The induction basis will be formula (20). Write M 1 [ M 2 [[M rin the form (M 1 [ M 2 [[M r;1 ) [ M r , and use formula (20):n(M 1 [ M 2 [[M r )=n(M 1 [ M 2 [[M r;1 )+n(M r ); n((M 1 [ M 2 [[M r;1 ) \ M r ):By the induction hypothesis, the formula is true for n(M 1 [[M r;1 )andgivesthose terms of n(M 1 [[M r ) which donotcontain M r . Now wehave(M 1 [ M 2 [[M r;1 ) \ M r =(M 1 \ M r ) [ (M 2 \ M r ) [[(M r;1 \ M r )and by the induction hypothesis we can also apply the formula to the expressionn((M 1 \ M r ) [[(M r;1 \ M r )). The intersection(M i1 \ M r ) \\(M ik \ M r )is obviously M i1 \\M ik \M r and so we obtain all the terms of the formula whichcontain M r . Moreover, if the term of the formula for n((M 1 \M r )[[(M r;1 \M r ))had the sign (;1) r;1 , it has the sign (;1) r in the formula for n(M 1 [[M r )and it will depend on k +1setsM i1 \\M ik \ M r .The formula for n(M 1 [[M r ) can be written down more conveniently ifwe consider the number of elements of the complement M 1 [[M r of the setM 1 [[M r , i.e. the number of elements of the set M which do not belong toany of the subsets M i . Since for any subset N M we always have M = N + N,then n(N) =n(M) ; n(N). In our case n(M 1 [[M r ) will be the sum of theterms (;1) k n(M i1 \\M ik ), where M i1 ,...,M ik are any k subsets taken fromM 1 ,...,M r . For k =0we take the term n(M). In other words,(21)where n = n(M).n(M 1 [[M r )=n ; n(M 1 ) ;;n(M r )+ n(M 1 \ M 2 )++(;1) r n(M 1 \\M r )


18 I. R. <strong>Shafarevich</strong>This formula consists of expressions n(M i1 \\M ik ) where i 1 , ... , i k areany k elements of the set f1 2 ...ng. We have met such expressions in connectionwith Viete's formula (formula (12)). It is worthwhile to compare these twoformulas.Formula (21) follows from (12) if we putx =1,a i = ;x i in (12) and then replaceeverywhere x i1 , ... , x ik by n(M i1 \\M ik ). Indeed, it is sometimes written inthis \symbolic" way(22) n(M 1 [[M r )=n(1 ; M 1 )(1 ; M r )where we suppose that the product on the right-hand side is expanded by Viete's(which isformulaasifM i were variables, and then the expression n M i1 M ikmeaningless) is replaced by the expression n(M i1 \\M ik ), and n1 is replacedby n = n(M).Formula (22) can serve asameans for remembering formula (21), but in algebra,whenever two relations, concerned with dierent questions, have the sameform, it is always possible to devise such a denition so that one formula coincideswith the other. We shall show this on the example of formulas (21), (22) and Viete'sformula (12).In order to do this we shall have to consider functions on a set M. Undoubtedly,you must have already met with the concept of a function|in one way or another.By a function we shall mean any way of corresponding to each element a 2 M acertain number. The actual process of corresponding will be denoted by f, andthenumber corresponded to the element a by this process will be denoted by f(a). Itis also called the value of the function f at the element a. Although the conceptof a function is dened for arbitrary sets, we shall at the moment be interestedonly in the case when the set M is nite. Then a function can be represented bywriting with each element a its corresponding number f(a). For example, here aretwo functions f and g, dened on the set of three elements M = fa b cg (Fig. 5).Fig. 5Therefore, if M = fa 1 ...a n g, then a function on M is the sequence(f(a 1 ) ...f(a n )). Functions can be added up and multiplied, these operationsbeing dened by the values of the functions. In other words, the functions f + gand fg are dened by (f + g)(a) =f(a) +g(a) and(fg)(a) =f(a)g(a) for arbitrarya. For example, if f and g are represented by Fig. 5, then f + g and fg arerepresented by Fig. 6.


Selected chapters from algebra 19Fig. 6Since the operations with functions are dened by their values, they havethe same properties as the operations with numbers: commutativity, associativity,distributivity, etc. We can apply any identity, proved for numbers, if we replacenumbers by functions on a given set M. The function f M (a) whichtoanyelementa 2 M corresponds the number 1 is denoted by 1.Clearly, 1 f = f for anyfunction f.We now connect the notion of a function with the notion of a subset. For anysubset N M there exists the function dened as follows: the values of elementswhich belong to N are 1, and the values of those which do not belong to N are 0.This function is called the characteristic function of the subset N and is denotedby f N . Thus, f N (a) =1ifa 2 N and f N (a) =0ifa 2 N. Conversely, itisclearthat the function f N (a) denes the set N|it consists of all elements a 2 M suchthat f N (a) =1. (In this way we obtain a one-to-one correspondence between thesubsets N M and those functions which take onlytwo values 0, 1. This is thesame relation which enables us to deduce Theorem 1 from Theorem 2. See Problem1 from Section 2, where p and q should be replaced by 0and1.)Some properties of subsets are simply expressed in terms of their characteristicfunctions. For example, the characteristic function of the whole set M has all valuesequal to 1, and hence f M = 1. If N is the complement ofN, then f N= 1 ; f N :indeed, if a 2 N, i.e.f N (a) = 1, then (1;f N )(a) = 0, as it should be. Analogouslyfor a 2 N. If N 1 and N 2 are arbitrary subsets then f N1\N2= f N1 f N2 , since ifa 2 N 1 and a 2 N 2 then f N1 f N2 (a) =1 1=1. If a does not belong to one of thesets N 1 or N 2 , then one of the factors f N1 or f N2 is 0 and so f N1 f N2 (a) = 0, andalso f N1\N 2(a) =0. Clearly, this is also true for several subsets:(23) if N 0 = N 1 \\N r then f N0 = f N1 f Nr :We cannow rewrite formula (21) in the language of characteristic functions.First of all, notice that the considered set M 1 [[M r is equal to M 1 \\M r .This is evident: an element a does not belong to the set M 1 [[M r if it doesnot belong to any ofM i , i.e. if it belongs to all M i . Now using formula (23) we canwrite the characteristic function of the set M 1 [[M r in the formf M1[[M r= f M1\\M r= f M1f Mr:Besides, we know that f Mi= 1 ; f Mi and we obtainf M1[[M r=(1 ; f M1 )(1 ; f M2 )(1 ; f Mr ):


20 I. R. <strong>Shafarevich</strong>We can now apply Viete's formula (13), by setting into it x = 1, a i = f Mi . Wehave already explained why this is possible. We obtainf M1[[M r= 1 ; 1 (f M1 ...f Mr )+ 2 (f M1 ...f Mr );+(;1) r r (f M1 ...f Mr ):Moreover, k (f M1 ...f Mr ) is the sum of all products f Mi1 f Mikfor all differentindices (i 1 ...i k ) taken from (1 ...n). We know that f Mi1 f Mik =f Mi1 \\M ik and we obtain that(24) f M1[[M r= 1 ; f M1 ;;f Mr + f M1\M 2+ +(;1) r f M1\\M r i.e. the sum of all functions f Mi1 \\M ikwhich are taken with the + sign if k iseven and with the ; sign if k is odd.Notice that wehave obtained something essentially more than the formula (21):we found the expression not for the number of elements n(M 1 [[M r ) of thesubset M 1 [[M r , but for its characteristic function which does not determineonly the number of elements of the subset, but the subset itself. In particular,formula (21) has sense only when the set M is nite, while the relation (24) is truefor a nite number of subsets of an arbitrary set M.In order to deduce the relation (21) from it, we have to return from functionsback tonumbers. It is essential here that M be nite. For any function we denethe number Sf as the sum of all values f(a) of the function f at all elementsa 2 M: if M = fa 1 ...a n g, then Sf = f(a 1 )++ f(a n ). For example, forthe functions f and g from Fig. 5 we have Sf =2,Sg =1,Clearly, for any twofunctions f, g we have S(f + g) =Sf + Sg. Indeed, the value of f + g at a i isf(a i )+g(a i ). Therefore, S(f + g) = (f(a 1 )+g(a 1 )) + +(f(a n )+g(a n )) =(f(a 1 )++ f(a n )) + (g(a 1 )++ g(a n )) = Sf + Sg. If f N is the characteristicfunction of the subset N, then f N (a) = 1 is true for the elements a 2 N, and forthe other elements a it is 0. Hence, Sf N = n(N).If we now nd the number Sf for the functions on the left and right-hand sideof (24), using the established properties, we obtain exactly the relation (21).Consider now two applications of formula (21). The rst is a question studiedlong time ago by Euler, and it concerns the permutations of a set. We said at theend of the last Section (Remark 1) that this is the name for the arrangements ofelements of a set M in a given order. The number of permutations is n!ifn(M) =n.At the end of Section 2 we wrotedown, as an example, all the six permutations ofthe three element setM = fa b cg. In the general case we also write down all then! permutations of the set M and denote by (a 1 ...a n ) the rst one. The questionis: how many permutations do we have inwhich no element takes the same placeas in the rst one? This is precisely Euler's question. Solve it for the case n =3and the six permutations written at the end of Section 2. Verify that only twopermutations satisfy the given condition, namely: (c a b) and (b c a).In the general case we shall apply formula (21). Denote by P the set of allpermutations of the elements of the set M = fa 1 ...a n g. We have


Selected chapters from algebra 21n(P) =n!. Consider those permutations in which a i stands at the same place as inthe rst permutation, i.e. at the i-th place. Denote the set of all such permutationsby P i . Then our question becomes: nd n(P 1 [[P n ). Hence we have the samesituation as we had before for the set P and its subsets P 1 ,...,P n (in formula (21)the set was denoted by M and its subsets by M i ). In order to apply the formula,we have tondthenumbers n(P i1 \\P ik ). But the set P i1 \\P ik containsexactly those permutations in which a i1 , ... , a ik take the same place as in therst permutation, namely they are the places i 1 , ... , i k , respectively. Such a permutationdiers from the rst permutation only in the arrangement of elements inother places. In other words, the number of such permutations is equal to the totalnumber of permutations of the set fa i1 ...a ik g. Since n(fa i1 ...a ik g)=n ; k,applying the general formula we get n(P i1 \\P ik ) = (n ; k)!. All the setsP i1 \\P ik for a xed k give one term in the formula (21), and the numberof such terms is equal to the number of subsets fa i1 ...a ik gfa 1 ...a n g for agiven k, that is to say, according to Theorem 3 it is Cn. k Hence, the contribution ofthe terms which correspond to a given value of k is Cn k (n;k)! and substituting thevalue of the binomial coecient wegetin our case becomesn(P 1 [[P n )=n! ; n!n!n!(nk!(n ;; k)! =k)! k!1! + n!2!= n!1 ; 1 1! + 1 2!;+(;1)nn!n!+ +(;1)nn!and formula (21)This is the formula founded by Euler. He was actually interested in the ratioof the founded number with the number of all permutations n!. This ratio is1 ; 1 1! + 1 (;1)n+ + , which, as n increases, can be shown to approach a2! n!xed number, namely 1=e, where e is the basis of natural logarithms (for those whoalready know what that is). The number 1=e is irrational and approximately equalto 0,36787 . . . .The second application of formula (21) is related to the properties of positiveintegers. Let n be a positive integer, and let p 1 , ... , p r be its prime divisors,dierent from each other. How many positive integers exist which are not greaterthan n and which are not divisible by any of the numbers p i ? This is again anapplication of formula (21). Denote by M the set of positive integers 1, 2, . . . , nand by M i its subset whose elements are divisible by p i . Clearly, our problem isequivalent to the evaluation of n(M 1 [[M r ). Let us nd the values of theconsists of allterms n(M i1 \\M ik ) in formula (21). The set M i1 \\M ikpositive integers t 6 n which are divisible by prime numbers p i1 , p i2 ,... ,p ik . Thisis equivalent to the fact that t is divisible by their product p i1 p i2 ...p ik . Let mbe a divisor of n. How many are there positive integers t 6 n which are divisibleby m? Such numbers have the form t = mu, where u is a positive integer andthe condition t 6 n is equivalent to u 6 n=m. Hence, u may take the values 1,2, ... , n=m and the number of such numbers is n=m. If m = p i1 p ik this gives:


22 I. R. <strong>Shafarevich</strong>that n(M i1 \\Mi k )=p i1 p iknand formula (21) becomesn(M 1 [[M r )=n ; n p 1;; n + + +(;1) r :p r p 1 p 2 p 1 p rThe right-hand side can be written in the formn1 ; 1 p 1; 1 p 2;+ 1p 1 p 2+ +(;1) r 1p 1 p r:The expression in brackets can be transformed by Viete's formula (applied simplyto numbers), if we set x =1, i = ;1=p i . By (13) this expression will be1 ; 1 p 11 ; 1 p 21 ; 1 p r:Therefore, for the numberofpositiveintegers not greater than n and not divisibleby p 1 , p 2 , ... , p r we obtain(25) n1 ; 1 p 11 ; 1 p 21 ; 1 p r:We often meet the case when p 1 ,... , p r are all the prime divisors of n. In this caset is not divisible by anyofp i 's if and only if it is relatively prime to n: if it hada common factor d with n, then this factor would have aprime divisor p i whichwould divide both t and n. Therefore, formula (25) gives the number of all positiveintegers not greater than n and relatively prime to n, ifwetake p 1 , ... , p r to beall prime divisors of n. The expression (25) was found in this form by Euler, it isdenoted by '(n), and is called Euler's function. For example, for n =675=3 3 5 2we have n(1; 1 3 )(1; 1 5 )=32 5(3;1)(5;1) = 360 numbers which are not greaterthan 675 and which are relatively prime to 675.Suppose nowthatp 1 ,...,p r need not necessarily divide n. What is the numberof positive integers t 6 n which are not divisible by p 1 , ... , p r ? We can repeatthe previous reasoning, but with one alternation. We have to nd the numberof positive integers t 6 n divisible by p i1 p ir . Let m be an arbitrary positiveinteger. How many are there positive integers t 6 n which are divisible by m?Again put t = mu with the condition mu 6 n. Hence, we have to take allnumbersu =1 2 ..., such that mu 6 n. Let u be the last of them. Then r = n;mu < m,for in the opposite case such a number would also be mu + m = n(u +1). Butthen n = mu + r where 0 6 r < m|which is the formula for the division withremainder of n by m (see Theorem 4 of Chapter I). Hence the number u is equalto the quotient in the above division and it shall be denoted by [n=m]. Therefore,the number of positive integers not greater than n and divisible by m is [n=m]. Wecan now literally repeat the preceding argument and apply the formula (21). Forthe number of positive integers not greater than n and not divisible by p 1 , ... , p rwe obtain the expression(26) n ; np 1;n n;+ + +(;1) r n:p 2 p 1 p 2 p 1 p rnn


Selected chapters from algebra 23It is not as explicit as the expression (25) but we can write it in the form of (25), asan approximation. Recall the formula for the division with a remainder: n = mu+r,where 0 6 r


24 I. R. <strong>Shafarevich</strong>6. Let M be a nite set and let h be an arbitrary function on M. For a subsetN M dene the number S h (N) as the sum of all values h(a), for all a 2 N.Prove the formula analogous to (21) where n(M) is replaced everywhere by h(N).Hint: multiply the relation (24) by the function h.7. Find the sum of all positive integers not exceeding n and relatively primeto n. Hint: apply the result of Problem 6 with h(k) =k.8. The same question for the sum of squares of these numbers.9. Prove that in the right-hand side of inequality (28)wemay replace 2 r by2 r;1 .4. The language of probabilityThe theory of probability, likeany other branch of mathematics, has its basicconcepts which are not dened|like points or numbers. The rst such a conceptis the event. In this Section we shall consider the case when the number of eventsis nite. Usually, anevent is the result of the occurrences of some simpler eventswhich are said to be elementary. For instance, when we throw dice there are 6possible elementary events: the appearance of number 1, number 2, number 3,number 4, number 5, number 6 on the top face. The event thatweobtainanevennumber consists of three elementary events: either we obtain 2, or 4, or 6. The setof elementary events is simply a set (in this Section a nite set) whose elements havespecial names (elementary events). An event isasubset of the set of elementaryevents. The second basic concept is probability: it is a real number assigned toeach elementary event. Therefore, if M = fa 1 ...a n g is the set of elementaryevents, then to dene a probability means assigning to each element a i 2 M a realnumber p i , which is called the probability of the event a i . Probabilities shouldsatisfy two conditions: they should be nonnegative and the sum of probabilities ofall elementary events should be equal to 1:(29) p i > 0 p 1 + + p n =1:In other words, probability isafunction p(a) on the set of elementary events Mwith real values, satisfying the conditions p(a) > 0 for all a 2 M and the sumof all the numbers p(a) fora 2 M is 1. These conditions play the role of axiomsof probability. If N is an arbitrary event (recall that an event isa subset of theset M) then its probability is the sum of the numbers p(a) for all a 2 N. Thisprobability is denoted by p(N). In the special case when N = M, the correspondingevent issaid to be certain. The condition (29) shows that the probability ofthecertain event is 1. The condition p(M) = 1 is not as essential as the conditionp(M) > 0. The arbitrary case can be reduced to the case p(M) =1,by dividing allthe probabilities by p(M). We simply choose the probability of the certain event tobe the unit of measuring other probabilities. We emphasize that the object studiedby the theory of probability is the set (in our case nite) of elementary events withprescribed probabilities. This set and the probabilities are chosen according to thespecic conditions of the considered problem. Afterwards, when they are dened,we can evaluate probabilities of other events. That is why the specialists in the


Selected chapters from algebra 25theory of probability say that their task is to nd probabilities of certain eventsusing probabilities of other events.If the two events are given|and we recall that they are simply two subsetsN 1 and N 2 of the set M|then their union N 1 [ N 2 and intersection N 1 \ N 2 arealso events. From the denition it follows that p(N 1 [ N 2 ) 6 p(N 1 )+p(N 2 ). Thestrict inequality maytake place since in the sum p(N 1 )+p(N 2 ) the term p(a) willappear twice if a 2 N 1 \ N 2 . In fact we havep(N 1 [ N 2 )=p(N 1 )+p(N 2 ) ; p(N 1 \ N 2 ):We came across this relation earlier (see Problem 6 of Section 3). In particular, ifN 1 \N 2 = ?, i.e.ifN 1 and N 2 do not intersect, then the events N 1 and N 2 are saidto be mutually exclusive. In that case p(N 1 [ N 2 )=p(N 1 )+p(N 2 ). A particularcase is when N 1 = N is an arbitrary subset and N 2 = N is its complement. Weobtain that p(N) +p(N) = 1 or p(N) = 1 ; p(N). The event N is said to beopposite to N.The basic object: the set M and the given function on M satisfying the axiomsof probability (29) is called a probability scheme. It is denoted by (M p).An important case of dening probability schemes is when all the elements ofthe set M have the same probability, i.e. when all the numbers p i are equal. Fromthe condition (29) it follows that all p i 's are equal to 1=n. If N M is an arbitraryevent then p(N) = n(N)=n. For example, this is the case when we throw dice,if the dice is considered to be homogeneous. In this case, all 6 elementary eventswhich correspond to the possible appearances of the numbers 1, 2, ... , 6onthetop face have the same probability 1=6, and the event thataneven number appearson the top face has probability 3 1=6 =1=2.If the dice is not homogeneous, we have no reason to give all elementary eventsequal probabilities. In this case we may dene the probabilities experimentally,by throwing dice many times and noting the result. If after a large number nof throwing the number i appears k i times, then the probability of the elementaryevent|the appearance of the number i|is taken to be k i =n. Clearly, the conditions(29) will be satised. The number n depends on the accuracy we wish to attain.This gives another probability scheme (M p).Analogous to the case of dice throwing is the popular problem of drawing ballsfrom a bag. Suppose that the bag contains n identical balls and that we drawoutone of them without looking. The drawing out of a ball is an elementary event.The phrase \identical balls" mathematically means that the probabilities of theseevents are equal. Hence, they are equal to 1=n. Suppose now thatinthebagwehave balls of dierent colours: a black, and b white balls, where a + b = n. Thenthe event \a white ball is drawn from the bag" is a subset N M. Since n(N) =b,we have p(N) =b=n|this is the probability that a white ball is drawn out.Somewhat more involved is the dice problem, if dice is thrown twice. In thiscase an elementary event will be given by two numbers (a b), where 1 6 a 6 6,1 6 b 6 6 which show that in the rst throwing we get a, and in the second b.The number of elementary events is 36. This can be represented by theTable 3,


26 I. R. <strong>Shafarevich</strong>where on the horizontal we write all the possible outcomes of the rst throwing,and on the vertical the outcomes of the second. For example, to the elementaryevent that the rst throwing gives 5 and the second 4 corresponds the cell markedwith an asterisk. The event that the rst throwing gives 5 again has the probability1=6. But it is no longer elementary: it is comprised of six elementary events whichcorrespond to the cells of the vertical column abovethenumber 5. They correspondto the appearance of any number i, 1 6 i 6 6 on the top face of the dice in thesecond throwing, if 5 appeared in the rst. Since the rst throwing has no eecton the second, and the dice is supposed to be homogeneous, we conclude that allthe six elementary events have equal probabilities, and since the probability oftheevent which they make is1=6, then the probability of each oneof them must be1=36. Hence, we see that the probability ofany elementary event is1=36.Table 3 Table 4Consider the event N k : \the sum of the numbers obtained at the rst and thesecond throwing is equal to k" (\the score is k"). For each pair(a b) write in thecorresponding cell the sum a + b (Table 4). We see that 12 appears in one cell andso n(N 12 ) = 1 and also n(N 11 )=2,n(N 10 )=3,n(N 9 )=4,n(N 8 )=5,n(N 7 )=6,n(N 6 ) = 5, n(N 5 ) = 4, n(N 4 ) = 3, n(N 3 ) = 2, n(N 2 ) = 1. The greatest valuehas n(N 7 ), and since p(N k )=n(N k )=36, we see that p(N 7 ) has the greatest valueamong all p(N k )'s. In other words, the event that the score 7 will be obtained intwo throwing is the most probable.And what is the answer in the case of n throwing? Here the elementary eventsare given by sequences of n numbers (a 1 ...a n ) where each one can take the values1, ... , 6. The same reasoning as before shows that their probabilities are 1=6 n .The event N k : \the total score after n throwing is k" consists of those sequenceswhich satisfy a 1 + + a n = k. Hence, we have to nd which number k has thegreatest number of representations of the form(30) k = a 1 + + a n 1 6 a i 6 6:In order to do this, consider the polynomial F (x) = (x + x 2 + + x 6 ) n .Expanding it, we take from the i-th bracket the term x ai and as the result weobtain the term x a1++an . There are several terms of this form, and we collect


Selected chapters from algebra 27them together. Therefore, the numberofvarious representations (30) is equal tothe coecient of x k in the polynomial F (x), and our problem reduces to ndingwhich term has the greatest coecient. Since F (x) = x n G(x), where G(x) =(1 + x + + x 5 ) n , the coecient ofx k in F (x) is equal to the coecient ofx k;nin G(x), and it is enough to nd the term with the greatest coecient inG(x).Polynomial G(x) hastwo properties from which the answer to the above questionfollows.An arbitrary polynomial f(x) =c 0 + c 1 x ++ c n x n is said to be reciprocal ifits terms, equidistant from its ends, have equal coecients, i.e. if c k = c n;k . If thecoecients c i are represented by points with coordinates (i c i ) in the plane, thisproperty means that these points will be arranged symmetrically with respect tothe middle: the line x = n=2. On Fig. 7a) we represent the case when n is even,and on Fig. 7b) the case when n is odd.a) b)Fig. 7 1The polynomial x n f has the same coecients as the polynomial f(x),xbut in the reversed order. Indeed, if f(x) = a 0 + a 1 x + + a n x n , then 1 1f = ax0 +a 1x ++a 11 n fx n and xn = a 0 x n +a 1 x n;1 ++a n . Therefore,x 1the fact that f(x) is a reciprocal polynomial, means that x n f = f(x). Thisximplies that the product of two reciprocal polynomials is also reciprocal. Indeed, if 1f(x) andg(x) are reciprocal polynomials of degree n and m, thenx n f = f(x), x1 1 1x m g = g(x). Multiplying these equalities we getxxn f xxm g = f(x)g(x), x1 1i.e. x n+m f g = f(x)g(x), which means that the polynomial f(x)g(x) is re-xxciprocal. By induction we conclude that the product of any number of reciprocalpolynomials is also reciprocal. Finally, since the polynomial 1+x + + x 5 isreciprocal, so is the polynomial G(x) = (1 + x + + x 5 ) n .The polynomial f(x) = c 0 + c 1 x + + c n x n is called unimodal if for somem 6 n the following inequalities hold: c 0 6 c 1 6 ...6 c m > c m+1 > ...> c n . Thatis to say, the coecients c i at rst do not decrease, and from a certain moment


28 I. R. <strong>Shafarevich</strong>they do not increase. If they are again represented by the points (i c i ) then theywill have \one hump" (Fig. 8).Fig. 8For example, polynomial (1 + x) n is reciprocal: this follows from the propertyCnm = Cn;m n of binomial coecients (see Section 3 of Chapter II). It is also unimodal:this follows from the property of binomial coecients proved in Section 3of Chapter II.It can be proved that if the polynomials f(x) andg(x) have nonnegative coecients,if they are reciprocal and unimodal, then f(x)g(x) is unimodal. Theproof is quite elementary, but a little involved. From this theorem it follows thatthe polynomial G(x) is unimodal. However, you can easily prove yourself this specialcase (Problem 3). Now, it is easy to determine the term with the greatestcoecient in a reciprocal unimodal polynomial. Namely, if the term c k x k has thegreatest coecient, since the polynomial is reciprocal we have c n;k = c k and thereis the symmetric term c k x n;k . We can take that k 6 n=2 and n ; k > n=2.Since the polynomial is unimodal, none of the terms c i x i where k 6 i 6 n ; k canhave smaller coecient, for otherwise there would be two \humps" on the graph.Hence, the greatest coecient must be the middle coecient c n=2 if n is even ortwo \equally middle" coecients c n;1 = c n+1 if n is odd (though there may be22other coecients equal to them). In particular, we see that if n is even, then inG(x) the term xterms of G(x), x5n2 has the greatest coecient, and if n is odd then there are two5n;125n+12and xwith equal greatest coecients.In the polynomial F (x) this term is multiplied by x n and has degree 5n7n2 + n =if n is even. If n is odd, there are two terms with equal coecients with degree25n ; 1+ n = 7n ; 1 5n +1 7n +1and + n = . Therefore, if dice is thrown n22 22times the most probable score is 7n if n is even, and if n is odd there are two scores2which are both most probable: 7n ; 1 7n +1and .2 2Consider one more problem of the same type. A certain quantity ofm physicalparticles are registered by n instruments, so that each particle can be registeredby any instrument, and the registration of a particle by all instruments are taken


Selected chapters from algebra 29to be equally probable. What is the probability that all instruments register atleast one particle? An elementary event here is the registration of a particle by aninstrument. Let the instruments be denoted by the elements a of the set M. Wehave n(M) =n. Numerate the particles by1,2,... ,m. Then an elementary eventis the sequence (a 1 ...a m ) where a i 2 M and this sequence indicates that the i-thparticle is registered by theinstrument a i . In other words, the set of elementaryevents is M m in the sense of the denition given in Section 1. The condition ofthe problem states that all elementary events have equal probabilities. Since byTheorem 1, n(M m )=n m , the probability of each elementary event is1=n m . Weare interested in the subset N M m which contains the sequences (a 1 ...a m )inwhich all the elements of M appear. For example, if M = fa b cg, m =4,then(a b c a) 2 N, but the sequence (a b a b) does not belong to N, since it does notcontain c. Our problem is to nd n(N).Denote by M a the subset of M m which consists of the sequences (a 1 ...a m )in which none of the a i 's is equal to a. Then clearly N = S M a , i.e. N is thecomplement of the union of all sets M a for all a 2 M. Therefore, n(N) =n(M m );n( S M a ) and the values of the numbers n( S M a ) are given by formula (21). Let usnd the number n(M a1 \ M a2 \\M ar ), where a 1 , ... , a r are dierent elementsof the set M. Hence, we are dealing with the sequences (c 1 ...c m )inwhichnoneof the c i 's equals any ofa 1 ,...,a r . In other words, c i are arbitrary elements of theset fa 1 ...a r g, where fa 1 ...a r g is the complement offa 1 ...a r g with respectto M. The set of all such sequences is the set (fa 1 ...a r g) m and the number ofelements of this set is, by Theorem 1, (n(fa 1 ...a r g)) m . Since n(fa 1 ...a r g)=r,n(M) =n, wehave n(fa 1 ...a r g)=n;r and n(M a1 \M a2 \\M ar )=(n;r) m .Hence, each term n(M i1 \ M i2 \\M ir )informula (21) in our case is (n;r) m .The number of terms for a given r is equal to C r n,asweknow. Therefore, formula(21) givesn( [ M a )=C 1 n(n ; 1) m ; C 2 n(n ; 2) m + +(;1) n C n;1n 1 m :For N = S M a we obtainn(N) =n(M m ) ; n( [ M a )=n m ; C 1 n (n ; 1)m + +(;1) n;1 C n;1n 1 m :The requested probability isn(N)(31)=1; n ; 1n m C1 nn m m 1+ +(;1) n;1 Cnn;1 :nIn all the previous examples elementary events had equal probabilities 1=n,where n is the number of elementary events. As a result, the evaluation of probabilitiesof other events reduced to the counting of the number of subsets|i.e.to a problem of combinatorics. We shall now consider examples which are morecharacteristic for the theory of probability.Let (Mp) and (Nq) betwo probability schemes. Suppose that they are de-ned by many times repeated experiments|dierent experiment for each scheme.


30 I. R. <strong>Shafarevich</strong>The experiment used to dene the probability scheme (Mp) will be called experimentA, and the one used for the scheme (Nq) will be called experiment B. Considernow the experiment consisting of consecutive experiments A and B, and letus try to use it to dene a new probability scheme. A similar situation was encounteredin connection with consecutive throwing dice (see Table 3). Let n(M) =m,M = fa 1 ...a m g, p(a i )=p i , n(N) =n, N = fb 1 ...b n g, p(b i )=q i . Then thenew experiment denes the following elementary events: in the rst experiment wehave the event a 2 M and in the second b 2 N. Hence, new elementary eventscorrespond to the pairs (a b), where a 2 M, b 2 N, or to elements of the setX = M N. What probabilities can be assigned to these elements? They canbe reasonably dened if we introduce one more supposition. We shall take thatthe experiments A and B, used to dene probability schemes (Mp) and(Nq) areindependent. This means that the result of the second experiment (i.e. B) doesnot depend on the outcome of the rst experiment (i.e. A). Using this conditionit is possible to dene the probabilities p(a b) oftheevents (a b). Our reasoningwill closely follow the one applied in connection with throwing dice two times (seeTable 3).As in that case (and as we did in Section 1), we represent the elements of theset in the form of the rectangular tableTable. 5The event that the event a i takes place in the rst experiment has, by condition,probability p i . It is not an elementary event, since it consists of elementary events(a i b 1 ), (a i b 2 ), . . . , (a i b n ), displayed in the i-thcolumnofTable 5. As we agreedthe probabilities of these events should not depend on the experiment A, but shouldbe like the probabilities of b 1 ,...,b n in the scheme (Nq). But here we arriveatacontradiction: the sum of probabilities of the events (a i b 1 ), (a i b 2 ), ... , (a i b n )isequal to p i , and the sum of the probabilities of the events b 1 , ... , b n is 1. In otherwords, the i-th column is itself a probability scheme, which must be \the same"as the scheme (Nq). But in this scheme the condition (29) is not fullled. Wetherefore have to make the following \correction": we divide the probabilities ofall elementary events by the probability oftheevents p i . We therefore obtain theprobability scheme with probabilities p((a ib j )). Since it should coincide with thep i


Selected chapters from algebra 31probabilityscheme (Nq), wearrive at the equality p((a ib j ))p i= q j , i.e. p((a i b j )) =p i q j . Hence, we have, by denition(32) p(a i b j )=p i q j :In this way we obtain the new probability scheme: the sum of probabilities ofelementary events which appear in the i-th column of Table 5 is equal to p i q 1 ++p i q n = p i (q 1 ++q n )=p i , and the sum of the probabilities of all elementaryevents is p 1 + + p m =1. Hence, the condition (29) is fullled.This new probability scheme (X r) is called the product of probability schemes(Mp) and(Nq). We can write it as follows: if the given schemes are (Mp) and(Nq), then X = M N and p((a b)) = p(a)q(b). The product of probabilityschemes corresponds to the intuitive idea of the probability scheme dened by twoconsecutive experiments, independent from each other. The above reasoning wasnecessary to explain the motivation for the given denition. Formally, thedenitionis given by the simple equality (32).Now for several probabilityschemes (M 1 p 1 ), . . . , (M r p r )we dene the productby induction(33) M 1 M r =(M 1 M r;1 ) M r where M 1 M r;1 is taken to be known, and the product of two schemesM 1 M r;1 and M r is dened above. Let us decipher this denition. As aset, M 1 M r is the product of sets M 1 , M 2 ,...,M r , dened in Section 1.Hence, it consists of arbitrary sequences (a 1 ...a r )wherea i can be any elementof M i . The probability of the elementary event (a 1 ...a r )is(34) p((a 1 ...a r )) = p 1 (a 1 )p 2 (a 2 )p r (a r ):This can also be veried by induction on r. Indeed, according to denitions (33)and (32), we have p((a 1 ...a r )) = p(((a 1 ...a r;1 )a r )) = p((a 1 ...a r;1 ))p(a r )and by induction hypothesis p((a 1 ...a r;1 )) = p 1 (a 1 )p 2 (a 2 )p r;1 (a r;1 ) andthis implies (34). This equality can be described as follows: in the sequence(a 1 ...a r ) replace each element by its probability and multiply the obtained numbers.This is the probability of the sequence.We now apply the general construction to the special case of the probabilityscheme I n where i = fa bg is the probability scheme consisting of two elementaryevents with probabilities p(a) = p, p(b) = q, with necessary conditions p > 0,q > 0, p + q = 1. We have already dened I n as a set in Section 1. It consistsof all possible \words" of the type (a a b b b a b ...) in the \alphabet" of twoletters: a and b. Hence, they will be the elementary events. Their probabilities aredened, according to the above reasoning, as follows: if the letter a appears in the\word" k times and the letter b appears n;k times, then its probability isp k q n;k .Such a probability scheme is called Bernoulli's scheme. As we saw, it gives theprobabilities of the events a and b in n times repeated experiment, when in eachone the event a has probability p and the event b has probability q. Besides, wesuppose that the outcome of an experiment doesnotaectthe outcomes of laterexperiments.


32 I. R. <strong>Shafarevich</strong>For example, for n =3wehave 8 elementary events (a a a), (a a b), (a b a),(a b b), (b a a), (b a b), (b b a), (b b b). Their respective probabilities are p 3 ,p 2 q, p 2 q, pq 2 , p 2 q, pq 2 , pq 2 , q 3 . Notice that here the letter p does not denotethe probability, but a xed number, where 0 < p < 1. The probability of theelementary event which corresponds to the sequence having k letters a and n ; kletters b is p k q n;k . These notations are too standard to be changed, but we haveto pay attention to what the letter p denotes.Let us nd the probability of the event A k which consists of a series of nexperiments in which the event a occurs k times. These events consist of elementaryevents given by \words" (b a b b b a a ...)inwhich a appears in exactly k places.The remaining n ; k places are occupied by b. By the general formula, such anelementary event has probability p k q n;k . Now how many elementary events makeup the event A k ? This is the number of ways in which k elements can be chosenamong n indices 1, 2, . . . , n, i.e. the number of subsets with k elements of a set ofn elements. According to Theorem 3, this is the binomial coecient Cn. k Thereforefor the probability oftheevent A k we obtain(35) p(A k )=C k n pk q n;k =n!k!(n ; k)! pk q n;k :Using this we can nd the most probable number of occurrences of the event a.It is the value of k for which the expression in (35) has the greatest value. Writedown the expressions (35):1 q n n(n ; 1) npq n;1 p 2 q n;2 ... 1 p n2and consider the ratio of two neighbouring terms:p(A k+1 )p(A k )=n!n!(k +1)!(n ; k ; 1)! pk+1 q n;k;1 k!(n ; k)! pk q n;k (n ; k)p=(k +1)q(after the cancellations, which you can easily check).If this ratio is greater than 1, then the (k+1)-st number is greater than the k-thif it is 1, then the two numbers are equal, and if it is less than 1, then the (k + 1)-st(n ; k)pnumber is less than the k-th. The ratio will be greater than 1 if > 1, i.e.(k +1)q(n;k)p >(k+1)q or np > k(p+q)+q. Having in mind that p+q =1,we can writethis inequality in the form np > k +1; p, i.e.(n +1)p;1 >k. If k>(n +1)p;1,then the ratio p(A k+1 )=p(A k ) is smaller than 1. Finally, ifk =(n +1)p ; 1, thenp(A k+1 )=p(A k ). Therefore, as k takes values less than (n +1)p ; 1, as we movefrom the k-th number to the (k + 1)-st, we obtain greater numbers. We distinguishbetween two cases.a) The number (n+1)p;1 isnotaninteger. Then the greatest number p(A m )is obtained for the greatest integer m which does not exceed (n +1)p. Moreover,m 6= (n +1)p;1 and for greater values of k such number p(A k ) is smaller than thepreceding one. Therefore, there is one most probable number of occurrences of theevent a|that is the greatest integer m which does not exceed (n +1)p ; 1.


Selected chapters from algebra 33b) The number (n +1)p;1 isaninteger. Then the number p(A k ) increases ifkm+ 1 the numbersp(A k ) decrease. Hence, the numbers p(A k ) increase until they reach a maximum,then we have one or two numbers equal to this maximum, and then they decrease.In other words, they have \onehump" as in Fig. 8. This means that the polynomialgenerated by them, namely q n n(n ; 1)+np n;1 qt+ p n;2 q 2 t 2 ++p n t n is unimodal.2Using the binomial formula, we can write this polynomial in the form (q + pt) n .How can one detect that it is unimodal when it is written in such a simple form?I do not know that such a method exists.In the simplest case, when p = q = 1 2 , we obtain that if (n +1)1 2 ; 1isnotan integer, i.e. if n is even, then (n +1) 1 2 ; 1=n 2 ; 1 2 and m = n 2 . Therefore,there is one most probable number of occurrences of the event a|this is m = n 2 .This means that it is most likely that both events a and b occur n times. It isnot surprising, since such ananswer is suggested by symmetry. If n is odd, thenm =(n+1) 1 2 ;1= n ; 1 is an integer, and there are two most probable occurrencesn ; 1of the event a:2case b occurs n ; 12(in which case b occurs n +12times) and n +12(in whichtimes) and this is also quite natural. But for all other values2of p we obtain the answer which would be dicult to predict. Here is a problemfrom a textbook on probability.After many years of observations it was concluded that the probability that itwill rain on the July 1st is 4=17. Find the most probable number of rainy July 1st'sin the next 50 years. We have n =50,p =4=17, m =(n +1)p;1 =11. Hence, themost probable numbers of rainy July 1st's are 11 and 12 (with equal probabilities).The values of probabilities C k np k (1 ; p) n;k , k =0 1 ...n have many importantproperties. On Fig. 9, taken from a course of the theory of probability, thesevalues are represented for the cases p =1=3, n = 4, 9, 16, 36 and 100.You see that as n increases, they are not arranged chaotically, but rather theyapproach a smooth curve. In order to see this better, modify each gure as follows:move the greatest number to the y-axis, decrease the distance between the pointson the x-axis (this change of scale was done in Fig. 9) and nally decrease all thenumbers proportionally with respect to the greatest number. After this, it turnsout that as n increases our points more and more closely approach a certain curve|namely, the graph of the function y = 1 p2c x2 , where is the ordinary ratio ofthe perimeter and the diameter of a circle and c (for those who know that e is thebasis of natural logarithms) is equal to 1= p e.This assertion, called Laplace's theorem, gives, in essence, a more subtle propertyof binomial coecients. But in order to prove this theorem we would haveto explain the phrase \approaches more and more closely", i.e. we would have to


34 I. R. <strong>Shafarevich</strong>Fig. 9introduce the concept of the limit, and we shall not go into this.Problems1. In an arbitrary probability scheme (Mp) consider k events: M 1 M,... ,M k M. Express the probability p(M 1 [ M 2 [[M k )oftheevent M 1 [ M 2 [[M k in terms of the probabilities p(M i1 \\M ir )oftheevents M i1 \\M ir .2. Prove that if the polynomial f(x) is reciprocal and unimodal, then thepolynomial f(x)(1 + x) also has these two properties.3. Prove that if the polynomial f(x) is reciprocal and unimodal, then so isthe polynomial f(x)(1 + x + x 2 + x 3 + x 4 + x 5 ). Deduce then that the polynomial(1 + x + x 2 + x 3 + x 4 + x 5 ) n is unimodal.4. Verify that the answer to the problem of m particles and n instruments isn!if n = m. What relation between binomial coecients is obtained if the formulann (32) is applied in this case?5. There are n identical balls in abag,m white and n ; m black balls. Wedraw out at random r balls. What is the probability thatwe draw k white andr ; k black balls? Hint: \at random" means that the probabilities of any draws ofr balls are equal.6. Prove that if the probability p in Bernoulli's scheme is an irrational number,then there exists exactly one most probable number of occurrences of the event a.7. The ratio of the most probable number of occurrences of an event a inBernoulli's scheme and the number n is called the most probable section. Prove that,


Selected chapters from algebra 35as the number n increases indenitely, then the most probable section approachesmore and more closely the probability p of the event a.APPENDIXInequalities of ChebyshevWe shall again consider a question regarding Bernoulli's scheme which wastreated at the end of Section 4. As we saidthere, Bernoulli's scheme practicallyarises in the situation when we have several times repeated experiment whichcanhave onlytwo outcomes. For example, suppose that we have an asymmetric (nonhomogeneous)coin. The question we pose is: if this coin is spun onto the groundwill the top face be \head" or \tail"? In order to arrive atan answer, we makea large series of spins|say, 1000|and if \head" appears k times, we say that theprobability of its appearance is p = k=1000. After that we can apply our denitionof Bernoulli's scheme (I n p) and we can nd other probabilities within that scheme,e.g. formula (35). But is our abstraction satisfactory? Does it represent sucientlyaccurately the reality withwhich we started: a long series of independent spins?In our abstraction|Bernoulli's scheme|we cannot ask: how many times will theevent a occur in the scheme I n ? Since we only operate with the language of probability,we can only pose questions regarding certain probabilities. But the conceptof probability is connected with reality by our conviction that an event which hasavery small probability practically does not occur. In other words, if the probabilityof a certain event is suciently small, we can in practice proceed as if we knewthat it will not occur. Of course, the sense of the words \suciently small" has tobe made precise in each concrete situation. According to this, we can x a certainnumber ">0 and consider the following event A " : in our Bernoulli's scheme (I n p)the event a occurred k times where k ; n p >". That is to say, the occurrence ofthe event A " means that the \frequency" k=n of occurrences of the event a diersfrom the supposed probability by more than ". It is natural to expect that for axed " the probability p(A " )oftheevent A " will become smaller and smaller as nincreases indenitely. It would mean that the dierence between the \frequency"k=n and the probability p for large n can be ignored. Jacob Bernoulli consideredthis problem already at the beginning of the 18th century, and he realized thatnding the probability p(A " ) is a purely mathematical problem connected with theproperties of binomial coecients. He proved that the probability p(A " ) indeedbecomes suciently small as n increases. In the 19th century Chebyshev provednot only this particular Bernoulli's statement, but he also found a simple explicitinequality for the probability p(A " ). We shall expose here his theorem. This is therst time in our text that we come across the work of a Russian mathematician.P. L. Chebyshev lived in the period from 1821 till 1894, and was the founder of thePetersburg mathematical school.Let us write now in the form of an algebraic formula the expression we areinvestigating. In Section 4 we considered Bernoulli's scheme (I n p)andwe found


36 I. R. <strong>Shafarevich</strong>the probability of the event A k which takes place if in the series of experiments theevent a, for which p(a) =p occurs k times (formula (35)):(1) p(A k )=C k n pk q n;k :We nowhave a given number " and we areinterested in the event A " which takesplace if an event A k with index k occurs where k satises the inequality k ; n p >".We want to nd the probability p(A " ) of the event A " . Recall that an event (inparticular, A k or A " ) is a subset of the set I n . It is clear that the subsets A kwith dierent indices do not intersect and that A " is the union of all subsets A kfor those k's for which k ; n p >". Therefore, the probability p(A " ) is the sumof probabilities p(A k )withsuch indices k. Since p(A k )isgiven by (1), this meansthat we have obtained an explicit, although a bit complicated, expression for theprobability p(A " ) of the event A " . It is more convenient to write the condition k ; n p >", which denes our indices k in the equivalent form(2) jk ; npj > "n:In this way we arrive at the sum(3)S " ; the sum of all expressions C k n pk q n;k for all k, 16 k 6 n, satisfying (2):We see that the probability p(A " )oftheevent A " is equal to S " .Now we can formulate Chebyshev's theorem.CHEBYSHEV'S THEOREM. For the probability p(A " ) of the event A " that thenumber of occurrences of k events a in Bernoulli's scheme (I n p), satisfying thecondition k ; n p >", the following inequality holds(4) p(A " ) < pq" 2 n :The inequality (4) is sometimes written in the form kp ; n p >" < pq" 2 n :It is clear that for given p (q =1;p)and", the right-hand side of the inequality(4) decreases as n increases, which is what we wanted to prove. This particularresult is called Bernoulli's theorem.As we saw, the probability p(A " ) is equal to the S " , dened by (3), and soinequality (4) is equivalent to the inequalityS " < pq" 2 n :The proof of Chebyshev's theorem is based upon explicit evaluation of certainsums which we formulate in the form of a lemma.


Selected chapters from algebra 37(5)(6)(7)LEMMA For the probabilities p(A k ), dened by the relation (1), we havep(A 0 )+p(A 1 )+p(A 2 )++ p(A n )=1p(A 1 )+2p(A 2 )+3p(A 3 )++ np(A n )=npp(A 1 )+2 2 p(A 2 )+3 2 p(A 3 )++ n 2 p(A n )=n 2 p 2 + npq:Proof. Denote the left-hand sides of the equalities (5), (6) and (7) by 0 , 1 and 2 , respectively. We have already seen in Section 4 that, according to the binomialformula, the probabilities p(A k ) are the coecients of the polynomial (pt + q) n .That is to say, ifwe put(8) p(A 0 )+p(A 1 )t + + p(A n )t n = f(t)then(9) f(t) =(pt + q) n :Setting t =1into (8) and (9), and using the fact that p + q =1,we obtain 0 =1,i.e. equality (5).Consider the derivative f 0 (t) of the polynomial f(t). From the formula (9),using the rule (19) of Section 2 of Chapter II we obtainthat(10) f 0 (t) =np(pt + q) n;1since (pt + q) 0 = p, by formula (15) of Chapter II. On the other hand, applyingformula (15) of Chapter II for f 0 (t) to the polynomial f(t) given by (8),we obtain(11) f 0 (t) =p(A 1 )+2p(A 2 )t +3p(A 3 )t 2 + + np(A n )t n;1 :Formulas (10) and (11) together lead to:(12) p(A 1 )+2p(A 2 )t +3p(A 3 )t 2 + + np(A n )t n;1 = np(pt + q) n;1 :Set t =1into both sides of (12). Since p + q =1,we obtain the equality (6).Now multiply both sides of (12) by t. We nd(13) p(A 1 )t +2p(A 2 )t 2 +3p(A 3 )t 3 + + np(A n )t n = np(pt + q) n;1 t:Let us nd the derivatives of both sides of (13). The derivative of the left-handside is found by means of formula (15) of Chapter II. We obtain the polynomialp(A 1 )+2 2 p(A 2 )t + + n 2 p(A n )t n;1 :The derivative of the right-hand side can be evaluated by the rule d) for the derivativeof a product from Section 2, Chapter II. Write the right-hand side of (13) in theform of a product: (np(tp + q) n;1 )t. By rule d) the derivative of this expression is(np(tp+q) n;1 ) 0 t+(np(tp+q) n;1 )t 0 . By formula (15) of Chapter II, wehave t 0 =1by rule c) of Section 2 of Chapter II we have (np(tp + q) n;1 ) 0 = np((tp + q) n;1 ) 0and by formula (19) of Chapter II we have ((tp+q) n;1 ) 0 =(n;1)(tp+q) n;2 p since(tp + q) 0 = p, byformula (15) of Chapter II. Therefore, equating the derivatives ofthe left and right-hand sides of (13) we obtain(14) p(A 1 )+2 2 p(A 2 )t++n 2 p(A n )t n;1 = np(pt+q) n;1 +n(n;1)p 2 (tp+q) n;2 :


38 I. R. <strong>Shafarevich</strong>Set t =1into (14). On the left we get 2 . On the right (in view of p + q =1)weget np + n(n ; 1)p 2 = n 2 p 2 + np(1 ; p) =n 2 p 2 + npq (since 1 ; p = q).We can now turn to the proof of Chebyshev's theorem. Chebyshev's devicewas to write inequality (2),which denes the necessary indices, in the form k ; np"n > 1i.e. k ;2np> 1"n kand then to multiply each termp(A k ) in the sum S "; pn 2,whichbyis greater"nthan 1, and therefore increases the sum. After that he considered the total sum kS "; pn 2p(Akof all terms), k =0 1 ...n, and not only those for indices k"nwhich satisfy (2). It is clear that the sum S " diers from the sum S " by a certainnumber of positive terms, and so it must be greater than S " .Hence, S " < S " . Now, by quite elementary transformations (using the Lemma)we can evaluate the sum S " exactly, andthus we obtainthewanted inequality forthe sum S " . kTherefore, we have tond the sum S "; pn 2p(Akof all terms) for k ="n0 1 ...n. Their common denominator ("n) 2 can be taken out and the expressions(k ; pn) 2 can be expanded: (k ; pn) 2 = k 2 ; 2npk + p 2 n 2 . Every term in the sumS " (after ("n) 2 has been taken out) gives three terms. The sum of the rst termsis 2 on the left of (7). The sum of the second terms, after the common factor;2pn has been taken out, is the sum 1 dened by (6). Finally, the sum of thethird terms, after p 2 n 2 has been taken out is 0 , dened by (5).Adding up all theobtained equalities, we nd the expression for the sum S " :S " = 1" 2 n 2 ( 2 ; 2pn 1 + p 2 n 2 0 ):Substituting the values obtained for 2 , 1 and 0 in the Lemma, we nd(15) S " = 1" 2 n 2 (n2 p 2 + npq ; 2p 2 n 2 + p 2 n 2 )= pq" 2 n :As wesaw, S " < S " and therefore S " < pq and the proof of Chebyshev's inequality" 2 nis nished.Let us briey analyse the method which lies in the essence of this proof. Thesum S " which wewant to estimate has a perfectly simple form. The diculty liesin the fact that the sum is formed by terms which arechosen according to a ratherstrange criterion (the indices k have to satisfy (2)). The rst thing that comes tomind is to ignore these conditions and to take the sum of all terms. This sum iseasily evaluated: according to the Lemma, it is equal to 1. But it is too large anddoes not lead to the equality we want. Chebyshev's device was to introduce the


Selected chapters from algebra 39 k ; np 2additional factorand only after that to consider the sum of all terms,"nignoring the restriction (2). In this process the terms which appear in the sum S "are increased, but those which do not are decreased so much that the total sumS " becomes suciently small (namely, for the terms which do not appear in S " we k ; np 2have< 1)."nWehave met here with a phenomenon whichisvery often present in mathematics.Namely, important and interesting inequalities usually follow from an identityafter an obvious estimate. This obvious estimate in our case is the inequalityS " 6 S " and the identity is the relation (15) which gives the explicit expression forthe sum S " . This is how inequalities of fundamental importance in mathematicsare proved. But sometimes they are proved in a dierent way|this might indicatethat there is an underlying identity whichwe do not yet know.Return once more to the formulation of Chebyshev's theorem. As we alreadyexplained, we are considering the event that in Bernoulli's scheme I n the event aoccurs k times, where either k > np + n", or k < np ; n" in other words, wedo not consider the event that in Bernoulli's scheme the event a occurs k timeswhere np ; n" 6 k 6 np + n". We found that the rst event has small probability(for large n), not exceeding pq=" 2 n. This means that the second event has greaterprobability, not less than 1 ; (pq=" 2 n). For example, consider a series of largenumber of repetitions of one experiment under constant conditions. Suppose thatone experiment canhave onlytwo outcomes|a and b, where the probability ofais p. This situation (if the number of experiments is n) is described, as we saw, byBernoulli's scheme (I n p). The experiment may be, for instance testing a large setof objects (animals, technical details, etc) for a given property, knowing that p-thpart of the set has this property. The scheme I n describes the possible results ofthe testing. According to Chebyshev's theorem, in the series of n experiments thenumber of occurrences of the outcome a will be between np;n" and np + n" withp(1a probability greater than 1 ; ; p). Here, " can be any number which wecan" 2 nchooseaswelike. For example, let p = 3 4 . Choosing " = 1 ,we see that in the100series of n experiments the number k of occurrences of the event a will satisfy theinequality 3 4 n ; n100 6 k 6 3 4 n + n with the probability not less than1001 ; 3 4 1 4; 1100 2n :Since 3 4 < 2 , this probability is not less than2 101 ;210; 1100 2n=1;2000n :For n = 200 000, this probability will be not less than 0,99. The number of occurrencesof the event a after 200 000 experiments which have this large proba-


40 I. R. <strong>Shafarevich</strong>bility will be between 148 000 and 152 000 (since 3 4 n = 150 000, n 1100 = 2000,np ; n" = 148 000, np + n" = 152 000).Conversely, using Chebyshev's theorem we can estimate the number of experimentsto be made in order to obtain the probability p accurately enough. Supposethat we want to determine it with accuracy up to 1=10 and that the probability itis equal to the obtained number is not less than 0,99. According to Chebyshev'stheorem we have toput" =1=10 and to use the inequality; 110pq 2 n < 001:Notice that q =1;p, and for any p such that 0 6 p 6 1, wehave pq = p(1;p) 6 1=4.This follows from the fact that the geometric mean is not greater than the arithmeticmean of the numbers p, q, whichis1=2. Therefore, it is enough that n should satisfythe inequality14; 110 2 n < 001which implies n>2500.Problems1. In the set of some objects, 95% of them have a certain property. Prove thatamong 200 000 objects, the number of those which have thisproperty isbetween189 000 and 191 000 with probability not less than 0,99.2. Modify Problem 1 so that the portion of objects which have a certainproperty isnotknown. What is the probability that after testing 100 objects wecan determine it with accuracy up to 0,1?3. For any positive integer r 6 n nd the sum of all termsk(k ; 1)(k ; r +1)p(A k )for k =1 ...n.4. For r 6 4 evaluate the sums r consisting of terms k r p(A k ) for all k =0 1 2 3 4. Do this in two dierent ways: a) by the reasoning of the proof of theLemma, and b) by expressing the sums r in terms of sums evaluated in Problem 3for r =1 2 3 4.5. Try to improve the inequality (4) in Chebyshev's theorem, applying the k ; np 4 kfactorinstead of; np 2.The improvement will be that n2 willn"n"appear in the denominator of the right-hand side of the inequality instead of n.I. R. <strong>Shafarevich</strong>,Russian Academy of Sciences,Moscow, Russia


THE TEACHING OF MATHEMATICS2000, Vol. III, 2, pp. 63{82<strong>SELECTED</strong> <strong>CHAPTERS</strong> <strong>FROM</strong> <strong>ALGEBRA</strong>I. R. <strong>Shafarevich</strong>Abstract. This paper is the fourth part of the publication \Selected chaptersfrom algebra", the rst three having been published in previous issues of the Teachingof Mathematics, Vol. I (1998), 1{22, Vol. II, 1 (1999), 1{30, Vol. II, 2 (1999), 65{80,and Vol. III, 1 (2000), 15{40.AMS Subject Classication: 00 A 35Key words and phrases: Primes, function (n), Chebyshev inequality, asymptoticallaw of distribution of primes.CHAPTER IV. PRIMES1. Innity of the number of primesIn this chapter we return to the question which wehave already dealt with inChapter I. It was proved there that each natural number can be uniquely representedas a product of primes. Therefore, when the multiplication is concerned, theprimes are the simplest elements and all the natural numbers can be obtained bymultiplying primes, similarly to the fact that they can be obtained by the operationof addition starting from the number 1. From this point of view, the interest forthe set of all primes can be easily understood. There are four primes among therst ten natural numbers: 2, 3, 5, 7. Further primes can be found by dividing eachof the consequent numbers by previously found primes, in order to decide whetherit is a prime itself. In this way we nd the following 25 primes among the rst onehundred natural numbers:2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97:How far does this sequence continue? The question arose already in the antiquetimes. The answer was given by Euclid:THEOREM 1. The number of primes is innite.We give several proofs of this theorem.First proof |the one contained in Euclid's \Elements". Suppose wehave foundn primes: p 1 , p 2 , ... , p n . Consider the number N = p 1 p 2 p n +1. As we sawThis paper is an English translation of: I. R. Xafareviq, Izbranye glavy algebry. GlavaIV. Prostye qisla, Matematiqeskoe obrazovanie, N1(4), nv.{mar. 1998, Moskva, str. 2{21.In the opinion of the editors, the paper merits wider circulation and we are thankful to the authorfor his kind permission to let us make this version.


64 I. R. <strong>Shafarevich</strong>in x2 of Chapter I, each number has at least one prime divisor. In particular, Nhas a prime divisor. But none of the numbers p 1 , p 2 , ... , p n can divide N. Tosee this, let p i be a divisor of N. Then N ; p 1 p n must be divisible by p i ,butsince N ; p 1 p n = 1, this is impossible. It follows that this prime divisor mustbe dierent fromeach p i , i =1 ...n,which means that after each n primes theremust be at least one additional prime. This proves the theorem.Second proof. According to the theorem of the section \Set algebra" of ChapterIII, the number of numbers which are smaller than the given number N andrelatively prime with it, is given by the formula(1) N1 1 ;1 1 ; 1 1; p 1 p 2 p nwhere p 1 , ... , p n are all prime divisors of N. We shall prove the theorem bycontradiction. Suppose that the number of primes is nite and that p 1 , ... , p nare all of them. Set N = p 1 p n . Substituting in formula (1) we obtain for eachfactor p i (1 ; 1 p i) the expression p i ; 1, and for the whole product (1) the expression(p 1 ; 1)(p 2 ; 1) (p n ; 1). As we know that there exist primes greater than 2 (forexample, 3), the number obtained is greater than 1. Hence, there exists a numbera, smaller than N, relatively prime with N and dierent from1. But a has at leastone prime divisor which must be contained among the numbers p 1 , ... , p n , andso a cannot be relatively prime with N. We obtained a contradiction which provesthe theorem.The innite sequence of primes is, on the other hand, very sparsely distributedamong natural numbers. For example, there are arbitrary big \gaps" in thissequence, i.e., one can nd (successively further away) any given number of consecutivenumbers which are not prime. For example, n numbers (n+1)!+2, (n+1)!+3,... , (n + 1)! + n +1 areobviously not primes|the rst one is divisible by 2,thesecond by 3, the last by n +1.For some time mathematicians have searched for a formula expressing primes.For example, Euler found an interesting polynomial x 2 + x +41, which, for 40values of x|from 0 to 39|obtains prime values. However, it is obvious that forx =40itsvalue is a nonprime number 41 2 . It is not hard to conclude that therecannot exist a polynomial f(x) which takes prime values for all natural valuesx = 0 1 2 ... (not even speaking about the possibility that its values are all ofthe primes). We shall show thison an example of a polynomial of second degreeax 2 + bx + c with integer coecients a, b, c. Suppose that for x = 0 the polynomialhas a prime value c. Then for each x = kc its value ak 2 c 2 + bkc + c is divisibleby c. This value can be equal to c for at most one additional value of k (besidesk = 0), which can be easily checked. Moreover, there does not exist a polynomialf(x) =ax 2 +bx+c having prime values for each integer x, starting from some limit.Indeed, suppose that the values of the polynomial f(x) are prime for each x > m.Set x = y+m, f(y+m) =g(y) then all the values of the polynomial g(y) are primefor all integers y > 0, by the assumption, and its coecientsarealsointegers, sinceg(y) =a(y + m) 2 + b(y + m)+c. The same reasoning also applies to a polynomialof an arbitrary degree n: f(x) =a 0 +a 1 x++a n x n . If all of its values for integer


Selected chapters from algebra 65x > 0 are prime, it means that f(0) = a 0 = p is prime, too. Then for each integerk the values f(kp) =p + a 1 kp+ + a n (kp) n are divisible by p. They can be equalto p only if p + a 1 kp + + a n (kp) n = p, i.e., a 1 + a 2 kp + + a n (kp) n;1 =0,and the last equation in k is of the degree n ; 1, and according to Theorem 3 ofChapter II it has at most n ; 1 roots. For all other values of k the number f(kp)is divisible by p and dierent fromp, i.e., it is not a prime.If we suppose that the values of the polynomial f(x) are prime only for integervalues of x > m, for a certain number m, thenwe can set x = y +m and f(y +m) =g(y). The polynomial g(y) = a 0 + a 1 (y + m) ++ a n (y + m) n is obtained byexpanding all the parentheses by the binomial formula and reducing similar terms.Therefore its coecients are again integers, but it obtains prime values for allintegers y > 0, which againisacontradiction.It can be also proved that for an arbitrary number k no polynomial in kvariables with integer coecients exists such that all of its values for all naturalvalues of its variables are primes. Nevertheless, it appears that there is a polynomialof degree 25 with 26 variables, having the following property: if we select thosevalues of that polynomial which are obtained for nonnegative integer values of itsvariables and which are positive themselves, then the set of such values coincideswith the set of primes. Since 26 is equal to the number of letters of the Latinalphabet, it is possible to denote the variables by the letters: a, b, ... , x, y, z.Then the polynomial is of the form:F (a b c d e f g h i j k l m n o p q rstuvwxyz)==(k +2)f1 ; [wz + h + j ; q] 2 ; [(gk +2g + k +1)(h + j)+hz] 2 ;;[2n + p + q + z ; e] 2 ; [16(k +1) 3 (k +2)(n +1) 2 +1; f 2 ] 2 ;;[e 3 (t + 2)(a +1) 2 +1; o 2 ] 2 ; [(a 2 ; 1)y 2 +1; x 2 ] 2 ;;[16r 2 y 4 (a 2 ; 1) + 1 ; u 2 ] 2 ; [(a + u 2 (u 2 ; a 2 ) ; 1)(n +4dy) 2 +1; (x + cu) 2 ] 2 ;;[n + l + v ; y] 2 ; [(a ; 1)l 2 +1; m 2 ] 2 ; [ai + k +1; l ; i] 2 ;;[p + l(a ; n ; 1) + b(2an +2a ; n 2 ; 2n ; 2) ; m] 2 ;;[q + y(a ; p ; 1) + s(2ap +2a ; p 2 ; 2p ; 2) ; x] 2 ;;[z + pl(a ; p)+t(2ap ; p 2 ; 1) ; pm] 2 g:This polynomial has been written here just to impress the reader. Its number ofvariables is too big. It can be proved that it takes also negative values ;m, wherem is not prime. Hence, it does not give us information about the sequence of primeseither.Long trials convinced the majority of mathematicians that there is no easyformula describing the sequence of primes. There exist \explicit formulae" describingthe primes, but they use objects which are even less known than the primesthemselves. That is why mathematicians concentrated on the characteristics of thesequence of primes \in total" and not \in parts". We will deal with this kind ofquestions in the next sections.


66 I. R. <strong>Shafarevich</strong>Problems1. Prove that there are innitely many primes of the form 3s +2.2. The same for the primes of the form 4s +3.3. Prove that each two numbers 2 2n +1 and 2 2m +1 are relatively prime.Deduce once more the innity ofthenumber of primes. [Hint. Assuming that p isa common divisor of two such numbers, nd the remainders of division of 2 2m and2 2n by p.]4. Let f(x) be a polynomial with integer coecients. Prove that there existinnitely many distinct prime divisors of its values f(1), f(2), . . . . (If you do notsucceed immediately, solve the problem for the polynomials of the rst and of thesecond degree.)5. Denote by p n the n-th prime in the natural order. Prove that p n+1


Selected chapters from algebra 674, . . . , where 1 stands on odd places, and natural numbers stand on even places insuccession, is unbounded, but not unboundedly increasing, since one can nd thenumber 1 arbitrarily far in it.If a sequence a = a 1 a 2 ...a n ... of positive numbers is given, and s = Sa,then s n+1 >s n (since s n+1 = s n +a n+1 , a n+1 > 0), and, generally, s m >s n for m>n. Therefore, such a sequence will be unboundedly increasing if it is unbounded.For example, if all a i =1,thens n = n and the sequence s 1 s 2 ... is unbounded.But in other cases it may be bounded.An example can be visualised on Fig. 1,where we rst divide the segmentbetween0 and 1 in half and set a 1 = 1 , then divide2again the segmentbetween0and 1 in half2and set a 2 = 1, etc. In this way, a 4 n =12 n . The result of adding such numbersis represented on Fig. 1 and it is obviousthat the sums S n stay inside our initialsegment, i.e., S n < 1. Fig. 1It is easy to check the last assertion by calculation. If a n = 12 n ,then(Sa) n = 1 2 + 1 4 + + 12 n = 1 1+ 1 2 2 + + 12 n;1and by formula (12) of Chapter I(Sa) n = 1 212 n ; 1 112 ; 1 =1; 2 n so that (Sa) n < 1 for each n.We shall show now that in the case of the sequence 1, 1, 1, ... the rst case2 3appears: although the terms of the sequence decrease, they do not decrease fastenough, and their sums (i.e., S ;1 (n)) increase unboundedly.LEMMA 1. The sum S ;1 (n) is, for n suciently large, greater than an arbitrarygiven number.Let the number k be given. We assert that for some n (and so also for allgreater integers) S ;1 (n) >k. Take n such thatn ; 1=2 m for some m. Divide thesum 1 1S ;1 (n) =1+ +2 3 + 1 1+4 5 + 1 6 + 1 7 + 1 1++8 2 m;1 +1 + + 12 min parts as it is shown: in groups contained between two consecutive powers of two.Each parenthesis has the form 1 + + 12 k;1 +1 2 k , and the number of parenthesesis equal to m. In each parenthesis we replace each summand by the smallest oneentering that parenthesis, that is by the last one. Since the numberofsummandsinsuch a parenthesis is equal to 2 k ;2 k;1 =2 k;1 ,we obtain that the k-th parenthesis


68 I. R. <strong>Shafarevich</strong>is greater than 2k;12 k = 1. As a result, we obtain that S 2 ;1(n) > 1+ m. This2inequality is valid for each n if n ; 1 = 2 m . It remains to put 1+ m = k, i.e.,2m =2k ; 1 and n =2 2k;1 +1. Then S ;1 (n) >k.Now we come to Euler's proof. His idea is connected with the method ofcomputing the sums of powers of the divisors of a natural number, which wasdescribed in section 3 of Chapter I (cf. formula (13) in Chapter I). Denote thesum of k-th powers of all divisors (including 1 and n) of a natural number n by k (n). According to formula (13) of Chapter I, for the number n having canonicalfactorisation n = p 11 pr r ,(2) k (n) = pk(1+1) 1; 1p k 1 ; 1p k(2+1)2; 1p k 2 ; 1 pk(r+1) r ; 1p k r ; 1 :Formula (2) had been known since the antique times, but it was implicitlyassumed that the number k in it was positive. Finally, Euler got interested in itand he posed the question|what would happen if k was integer, but negative?The answer is, of course, that there is no dierence, the derivation of formula (2)is completely formal and the same for negative aswell as for positive values of k.In particular, it is valid for k = ;1. The sum of (;1)-st powers (i.e., the inverses)of the divisors of a given number n will be denoted, as before, by ;1 (n). Formula(2) gives1 1 ;p r+1 r ;1 (n) =1 ; 1p 1+111 ; 1 p 1 ...1 ; 1 p r(we interchanged the order of summands in numerators and denominators in eachof the fraction). From here (since all the expressions in numerators are less than 1),(3) ;1 (n) k1 ; 1 p 11 ; 1p 21 ; 1p rwhere k is an arbitrary number. This is, of course, a contradiction.The value of the above proof is not that the assumption of niteness of thenumber of primes has led to a contradiction, but that it, when the innity ofthat


Selected chapters from algebra 69number has already been proved, gives some quantitative characteristics of thesequence of primes. Namely, reformulating the result obtained, we can now saythat if p 1 , p 2 , ... , p n , ... is the innite sequence of primes, then the expression1 becomes greater than any arbitrary number for n1 ; 1 p 11 ; 1 p 21 ; 1p nsuciently large. This is, of course, equivalent to the fact that the denominatorof the last fraction becomes smaller than an arbitrary positive number for nsuciently large. We have provedTHEOREM 2. If p 1 , p 2 , ... , p n , ... is the sequence ofall primes, then theproduct1 1 ;1 1 ; 1 1 ; , for n suciently large, becomes smallerp 1 p 2than any given positive number.p nThis is a rst approximation to our goal. Let us try now togive a more usefulform of the characteristic obtained.THEOREM 3. If p , p 2 , ... , p n , ... is the sequence ofall primes, then thesequence ofsums 1 + 1 + + 1 increases unboundedly.p 1 p 2 p nDerivation of Theorem 3 from Theorem 2 is purely formal: it does not use thefact that p 1 , p 2 , ... , p n , ... is the sequence of primes|it could be an arbitrarysequence of natural numbers which satises the conditions of Theorem 2.LEMMA 2. For each natural number n>1 the inequality(4) 1 ; 1 n > 14 1=nis valid.Since both sides of inequality (4) are positive, rasing them to the power of n,we obtain an equivalent inequality(5)1 ; 1 n n>14 which we are going to prove. Expanding the left-hand side by the binomial formulawe obtain(6) 1 1 n; =1; n 1 n(n ; 1) 1 n(n ; 1)(n ; 2) 1+ ;nn 2 n2 3! n + 13 +(;1)n n n :Absolute values of the summands on the right-hand side of formula (6) form thesequence Cn k 1n k . We examined such a sequence in connection with the Bernoullischeme in the section \Language of probability" in Chapter III (formula (35)).More precisely, if in that formula we putp = 1n +1 , q 1 =1; n +1 = nn +1 ,thenwe obtain that p + q =1,p k q n;k =(n +1) ;n n n;k and the numbers obtained dier n.from the ones examined in formula (6) just by the common factor Thenn+1


70 I. R. <strong>Shafarevich</strong>expression (n +1)p ; 1 is in our case equal to zero. In the section \Language ofprobability" of Chapter III we proved that if k>(n +1)p ; 1 (in our case k>0),then the (k + 1)-st term is smaller than the k-th one. This means that the numbersof the sequence Cn k 1n k , k =1 2 ...n decrease monotonously. (We referred here toChapter III just to stress the connection between dierent problems that we aredealing with. It would, of course, be easy to write down the ratio of the (k + 1)-stterm of the sequence to the k-th one and conclude that it is less than 1). We can seethat in formula (6), the rst two terms on the right-hand side cancel. The next twoterms (after cancellation which can be done easily) give 1 3 ; 13n. This number is2not less than 1 for n > 2(check ityourself!). The rest of the terms can be grouped4in pairs, where in each pair the rst term is positive and the next negative, but, aswe have seen, by absolute value less than the rst one. Therefore each pair givesa positive contribution to the sum (6). If n is odd, then the number of summandson the right-hand side of formula (6) is even (it is equal to n +1) and the sumis partitioned into n+1 pairs. If n is even, then after grouping into pairs, there2remains the summand 1n n . In such away, inany case the right-hand side consistsof a summand which is not less than 1 , and some additional positive summands.4This proves inequality (5), and so the lemma itself.Theorem 3 is now evident. For each p i wehave, according to the Lemma:1 ; 1 p i> 14 1=pi :Multiplying these inequalities for i =1 ...n we obtain1 1 ;1 1 ; 1 11; > :p 1 p 2 p n 1p4 1+ 1 p 2+ + 1p nIf the sums 1 p 1+ 1p 2+ + 1p nwere for each n less than a certain value k, itwouldfollow that 1 1 ;1 1 ; 1 1; > 1p 1 p 2 p n 4 k :This contradicts Theorem 2.We run here into a problem of a new kind. If N is a subset of a nite set M,then we can tell how much \smaller" N is than M, comparing the number of theirelements, e.g., computing the ratio n(N)=n(M). But now wehave two innite sets:the set of all natural numbers and the set of all primes contained in it. How canwe compare them? Theorem 3 oers one way of comparing, not very easy at rstsight. It can be applied to each sequence of natural numbers a: a 1 , a 2 , ... , a n ,... . According to Lemma 1, for the sequence of all natural numbers, the sums oftheir inverses (i.e., the sums S ;1 (n)) increase unboundedly. We can think of thesequence a to be \tightly" distributed among natural numbers if it has the sameproperty, i.e., if the sums 1 1a 1,a 1+ 1 1a 2,...,a 1+ 1 a 2+ + 1a n,... unboundedlyincrease. This means that in the sequence a enough natural numbers remainedso that the sums of their inverses are not too much less than the sums S ;1 (n) of


Selected chapters from algebra 71the inverses of all natural numbers. If, on the other hand, the sums of inverses ofthe sequence a remain bounded, we can think of it as \loosely" distributed in thenatural row. Theorem 3 states that the sequence of primes is \tight". The most\loose" case is the case of a sequence a having only a nite number of terms.But there are intermediate cases. For instance, the sequence of squares: 1, 4,9, . . . , n 2 , ... . It is natural to denote the corresponding sums 1 + 1 + 1 + + 1 4 9 n 2by S ;2 (n). We shall prove that they are bounded by anumber not depending on n.We use the same idea as in the proof of Lemma 1. Let m be such that2 m > n.Then S ;2 (n) 6 S ;2 (2 m ). We divide the sum S ;2 (2 m )=1+ 1 + 1 + + 12 2 3 2 2 2minto parts:Each part(1) +12 2 +13 2 + 1 4 2 + +1(2 m;1 +1) + + 12 2 2m :1(2 k;1 +1) 2 + + 12 2k again contains 2k;1 terms and the rst termis the greatest. Therefore this part cannot be greater than 2 k;1 1(2 k;1 +1) 2


72 I. R. <strong>Shafarevich</strong>elaborate method of comparing \tight" and \loose" sequences from the previoussection by a more naive one, which can be understood more easily. Namely, wewill try to answer the naive question|\which portion of the sequence of naturalnumbers is covered by the primes"| by nding how many primes there are smallerthan 10, how many smaller than 100, how many smaller than 1000, etc. For eachnatural number n, denoteby (n) thenumber of primes not greater than n, sothat(1) = 0, (2) = 1, (4) = 2, . . . . What can be said about the ratio (n)nwhen nincreases?First of all, consider what can be learned from tables. Each assertion or questionconcerning natural numberscanbechecked for all natural numbers not exceedinga certain limit N. This fact playsaroleinthenumber theory, whichinvestigatesproperties of natural numbers, similar to that of the possibility of experimentingin theoretical physics. in particular, one can compute the values (n) for n =10 k ,k =1 2 ... 10. The following table is obtained.n(n)n(n)10 4 2.5100 25 4.01000 168 6.010000 1229 8.1100000 9592 10.41000000 78498 12.710000000 664579 15.0100000000 5761455 17.41000000000 50847534 19.710000000000 455059512 22.0n(n)Table 1.(n)We see that the ratio is constantly increasing, which means thatnisdecreasing all the time. In other words, the portion of primes among the rst nnumbers becomes close to zero when n increases. According to the tables, it couldbe said that \the primes constitute a zero portion among all natural numbers".That was the way Euler formulated this fact, although his reasoning did not containa full proof. We will now give the precise formulation and then the proof.THEOREM 4. The ratio (n)nbecomes smaller than any given positive numberfor n suciently large.In order to prove the theorem we have to estimate somehow the function (n).For actual calculation of its values we start with the prime 2, then we cancel allthe numbers which are multiples of 2 and not exceeding n. Then we take therst remaining number|this will be 3|and repeat the process. We continue tillwe have exhausted all the numbers not exceeding n. The numbers which are not


Selected chapters from algebra 73cancelled (2, 3, etc.) are all primes not exceeding n. This method was used alreadyin the antique times it is called \the sieve of Eratosthenes".We will apply the same process in our reasoning. Suppose we have alreadyfound the rst r primes: p 1 , p 2 , ... , p r . Then the remaining primes, not exceedingn, are contained among \noncancelled" numbers, not exceeding n, i.e., amongthose numbers m 6 n which are not divisible by anyofthenumbers p 1 , p 2 , ... ,p r . But the number of numbers not exceeding n and not divisible by any oftheprimes p 1 , p 2 , ... , p r was explored in Chapter III|it is given by the formula inthe section Set algebra of Chapter III. As weshowedthere, the expressionin theformula can be replaced with the easier one n 1 ; 1p 11 ; 1 p r, where the erroris less than 2 r (formula (28) of Chapter III). Hence, the number s of numbersm 6 n not divisible by any of the primes p 1 , p 2 , ... , p r satises the inequality(7) s 6 n1 1 ; 1 1 ; +2 r :p 1 p rAll (n) primes not exceeding n are contained either among r primes p 1 , p 2 ,... ,p r ,or among s numbers accounted for by inequality (7).In such away, (n) 6 s + r,and(8) (n) 6 n1 1 ; 1 1 ; +2 r + r:p 1 p r Inequality (8) is remarkable because it contains the product1 ; 1 p 11 ; 1p rwhich can be estimated using Theorem 2.Now we can pass to the proof of Theorem 4. Let an arbitrary small positivenumber " be given. Wehavetondanumber N, depending on ", such that (n)n


74 I. R. <strong>Shafarevich</strong>Note that if we choose an arithmetic progression am + b even with a very bigdierence a, i.e., being very \sparse", then the number of terms of this progressionnot exceeding n is the same as the number of integers m satisfying am 6 n ; b,i.e., n;ba . We saw in section 3 of Chapter III that n;ba diers fromn;baby notmore than 1. Hence, the number of terms of the progression not exceeding n is notless than n;ba; 1. Its quotient withn is not less than 1 n ( n;ba; 1) = 1 a ; 1 bn a ; 1 n .When n increases, this number approaches 1 aand is not becoming arbitrarily small.Thus, Theorem 4 would become false if we replaced the sequence of primes in it byan arbitrary arithmetic progression. This shows that primes are distributed moresparsely than any arithmetic progression.Problems1. Let p n denote the n-th prime. Prove that for an arbitrarily large positivenumber C the inequality p n >Cnis valid for n suciently large. [Hint. Use thefact that (p n )=n.]2. Consider natural numbers having the property that, when written in decimalform, they do not contain a certain digit (e.g., 0). Let q 1 , q 2 , ... , q n , ... bethose numbers, written in ascending order, and let 1 (n) denotes the number ofsuch numbers not exceeding n. Prove thatthe ratio 1(n)nbecomes smaller thanany arbitrary given positive number, for n suciently large. Prove that the sums1 1q 1,q 1+ 11q 2,...,q 1+ 1 q 2+ + 1q n,... are bounded. [Hint. Do not try to copythe proof of Theorem 4. Split the sum to parts, where in each part denominatorsare contained between 10 k and 10 k+1 . Then nd the number of numbers q i in suchintervals. The answer depends on the digit which is excluded: r =0orr 6= 0.]APPENDIXInequalities of Chebyshev for (n)We have put this text in the Appendix mainly for formal reasons, because wehave to use logarithms, the knowledge of which isnot assumed in the rest of thetext. Recall that the logarithm with basis a of the number x is a number y suchthata y = x:This is written as y =log a x. In the sequel it will always be assumed that a>1and that x is a positive number. Basic properties of logarithms, following directlyfrom the denition, are:log a (xy) =log a x +log a y log a c n = n log a c log a a =1:log a x>0 if and only if x>1. Logarithm is a monotonous function, i.e., log a x 6log a y if and only if x 6 y.In this text, if the basis of a logarithm is not indicated, it is supposed that itis equal to 2 log x means log 2 x.


Selected chapters from algebra 75The second reason for putting this text in the Appendix is the following. Inthe rest of the book, the logic of reasoning was clear, namely, why do we followa certain road (at least I hope it was so). Here we encounter the case, not rarein mathematical investigations, when it looks as if some new consideration has\jumped in from nowhere", when even the author cannot explain how hecametothe conclusion. About such situations Euler used to say: \Sometimes it seems tome that my pencil is smarter than myself". Of course, these are results of longtrials and unknown work of psyche.We are going now to continue our investigations concerning the ratio (n)nwhenn is increasing unboundedly. Take another look at Table 1, showing the values of(n) for n =10 k , k =1 2 ... 10. The last column of the table contains the valuesof the ration for some values of n. We see that when we pass from n (n) =10k ton =10 k+1 n,thatiswhenwegodown by one row, the value of increases always(n)approximately by the same value. Namely, the rst number is equal to 2.5 thesecond diers from it for 1.5, and the further dierences are: 2 2.1 2.3 2.3 2.32.4 2.3 2.3. We see that all of these numbers are close to the one: 2.3. Not tryingat the moment to explain the meaning of this particular value, let us suppose thatnalso beyond the range of our table the numbers , when passing from n (n) =10k ton =10 k+1 , increase by amounts which are closer and closer to a certain constant .nThis would mean that for n (n) =10k would be very close to k. But, if n =10 k ,then by the denition k =log 10 n. It is natural to assume that also for other valuesof n the ration is very close to log n(n)10 n. Thus, (n) isvery close to clog 10 n ,where c = ;1 .A lot of mathematicians where attracted by the secret of distribution of primesand tried to solve it using tables. In particular, Gauss got interested in this questionalmost as a child. His interest in mathematics started, it seems, from the child'sinterest in numbers and forming tables. In general, a lot of great mathematiciansshowed virtuosity in calculations and were capable of doing immense ones, often byheart (Euler was even ghting insomnia in that way!). When Gauss was only 14,he constructed a table of primes (in fact, a smaller one than our Table 1) and cameto the conclusion we have formulated. Later on this conclusion was considered bymany mathematicians. But the rst result in that direction was proved only half acentury later, in 1850, by Chebyshev.Chebyshev proved the following assertion.THEOREM. There exist constants c and C such that for all n>1(10) cnlog n 6 (n) 6 Cnlog n :Before we proceed with the proof, we give some remarks concerning the formulationof the theorem. What is the basis of the logarithms we are using? Theanswer is: arbitrary. From the denition of logarithms it immediately followsthat log b x = log b a log a x: it is enough to substitute a by b log b a in the relation


76 I. R. <strong>Shafarevich</strong>a log a x = x, to obtain b log b a log a x = x, which shows that log b x = log b a log a x.Hence, if the inequality (10) is proved for log a n, then it is true also for log b n,withcthe substitution of c and C bylog b a and Clog b a .Inequalities (10) express the idea inspired by tables that (n) is \close" tonclog nfor some c. The question why in our hypothetical reasoning there appearedone constant c, and in the theorem there appear two of them|c and C|andwhether it is possible to use only one constant, will be discussed after the proof ofthe theorem.The key to the proof of Chebyshev's theorem are properties of binomial coef-cients Cn k : mostly the fact that they are integers and some properties about theirdivisibility by primes. We shall recall these properties before we pass to the proof.First of all, there is a proposition proved in section 3 of Chapter II which saysthat the sum of all binomial coecients Cn k for k =0 1 ...n is equal to 2 n . Sincethe sum of positive summands is greater than any of them, we deduce that(11) C k n 6 2 n :We shall particularly need large binomial coecients. We saw in Chapter II thatfor even n = 2m the coecient C2m m is greater than all the others, and for oddn =2m + 1 there exist two equal coecients C m 2m+1 and C m+12m+1which are greaterthan the others. We draw our attention to them, particularly to(12) C n 2n2n(2n ; 1) (n +1)= :1 2 ... nIf we group the factors of the numerator with the factors of the denominator takenin reverse order, we obtainC n 2n = 2n n 2n ; 1n ; 1n +1 ... :1Obviously, nofactorinthelastformula is less than 2, so(13) C n 2n > 2 n :Consider now properties of divisibility of binomial coecients by primes. Factorsin the numerator in the expression (12) are obviously divisible by all primesgreater than n and not exceeding 2n. These primes cannot divide any factor in thedenominator, and so they do not cancel and they are divisors of C2n n . The numberof primes between n and 2n is equal to (2n) ; (n) and all of them are greaterthan n, hence(14) C n 2n > n (2n);(n) :An analogous assertion is valid for the \middle" coecients C n 2n+1 = Cn+1 2n+1 withan odd lower index. If we write them asC n 2n+1(2n +1)(n +2)= 1 2 ... n


Selected chapters from algebra 77we see that (2n +1); (n + 1) of primes, greater than n + 1 and not exceeding2n +1, enters the numerator and cannot be cancelled with the denominator. Sincethey are greater than n +1,wehave(15) C n 2n+1 > (n +1)(2n+1);(n+1) :The inequalities (14) and (15) already reveal important connections between binomialcoecients and prime numbers.Finally, we state the last of the properties of binomial coecients which weneed for the proof although it is quite easy, it is not as obvious as the previousones.LEMMA. For an arbitrary binomial coecient Cn k , any power of a prime dividingit does not exceed n.We draw the attention to the fact that we are not speaking about the exponentof the power but about the power itself. In other words, we assert that if p r dividesCn k,wherep is a prime, then pr 6 n. For example, C9 2 =9 4 is divisible by 9andby 4, and both of these numbers do not exceed 9.Write the binomial coecient in the form(16) C k nn(n ; 1)...(n ; k +1)= :1 2 ... kThe prime p we are dealing with has to divide the numerator of this fraction.Denote by m the factor in numerator which contains the maximal power of p (orone of those having such a property), and by p r this maximal power. Obviously,n > m > n ; k +1. Set n ; m = a, m ; (n ; k +1)=b, then a + b = k ; 1andC k n can be written in the form(17) C k n =(m + a)(m + a ; 1) (m +1)m(m ; 1) (m ; b):k!The factor m is now the most important for us and we write down the product in thenumerator as having a factors to the left and b factors to the right of it. Rearrangethe denominator analogously: k! =(1 2 ... a)(a +1)(a + b)(a + b +1). Since(a +1)(a +2)(a + b) is divisible by b!, this product (denominator) has the forma! b! l, wherel is an integer. Now we can rewrite Cn k in the following form(18) C k n = m + aa m + a ; 1a ; 1 ... m +11where we transferred the factor m lNote that in each ofthe factors m+iito the end.or m;jj m ; 11 ... m ; bb m l (i =1 ...a, j =1 ...b)thepower of p entering the numerator completely cancels with the denominator, andhence after cancellation only the denominator could be divisible by p (though itcan also be relatively prime with p). Really, consider, for instance, fractions m+ii(factors of the type m;jjcan be treated in the same way). Let i be divisible exactlyby p s , i.e., i = p s u, where u is relatively prime with p. If s < r, then m + i is


78 I. R. <strong>Shafarevich</strong>also divisible exactly by p s : setting m = p r v (recall that m is divisible by p r ), weobtain that m + i = p s (u + p r;s v). If s > r, then for the same reasons m + i isdivisible by p r and taking into account the way m was chosen (it is divisible bythe greatest power of p of all the numbers between n ; k + 1 and n and this poweris p r ), we conclude that the number m + i cannot be divisible by a greater powerof p than the r-th. Thus, p r cancels and in the numerator there remains a numbernot divisible by p. As a result we see that among all the factors in the expression(18), only the last one can contain p as a factor. But the power of p which dividesm is p r , and this means the product (18) cannot be divisible by a greater power ofp than p r . Since p r divides m, andm 6 n, itisp r 6 n. The Lemma is proved.Let us see what it says about the canonical factorisation C k n = p1 1 pm m .First of all, the primes p 1 , ... , p m can appear just from the numerator of theexpression (16), therefore all p i 6 n and so m 6 (n). According to the Lemma,p ii6 n for i =1 ...m. As a result we obtainthat(19) C k n 6 n (n) :Now we can proceed with the proof of the Chebyshev's theorem itself, i.e.,with the proof of the inequalities (10). Note that it is enough to prove theseinequalities for all values of n starting with a certain xed limit n 0 . For all nnlog 2n = 1 2n2 log 2n i.e., the left of the two inequalities (10) with the constant c = 1 . But for the time2beingitisproved only for even values of n. For odd values of the form 2n +1weuse the monotonicity of the logarithm and of the function (n). It follows that(2n +1)log(2n +1)> (2n)log2n:Substituting here the obtained inequality for(2n), we see that(2n +1)>n log 2nlog(2n) log(2n +1) = nlog(2n +1) :Since it is always n > 1 (2n + 1), it follows that3(2n +1)> 1 2n +13 log(2n +1) :


Selected chapters from algebra 79Thus, the left inequality (10)isproved for odd n with the constant c = 1 . So, the3left inequality (10) is valid for all n and c = 1.3We proceed to the proof of the right inequality in (10). We shall prove itbyinduction on n. Let, rst of all, n be even. We will write 2n instead of it. Takingthe inequality (11) for the coecient C2n n (i.e., substitute in Cn k n by 2n and k by n)together with the inequality (14), as a consequence we obtainand, passing to logarithms,n (2n);(n) 6 2 2n(21) (2n) ; (n) 6 2n2n (2n) 6 (n)+log n log n :In accordance with the inductive hypothesis, suppose that our inequality has beenproved: (n) 6 Cnlog nwith a constant C whose value we shall make more preciselater. Substituting in the formula (21), we obtain:(2n) 6 Cnlog n + 2n (C +2)n=log n log n :We would like toprove the inequality (2n) 6 C2nlog 2nand for that we have tochoosethe constant C in such away that the inequality(22)(C +2)nlog n6 2Cnlog 2nis valid for all n, starting from some limit.This is just a simple school exercise. Cancel in the inequality both sides by n,remark that log 2n = log 2 + log n = log n +1 and denote log n by x. Then theinequality (22)takes the formC +2x6 2Cx +1 :Multiplying both sides by x(x + 1) (as x>0) and transforming, we write it in theform (C ; 2)x > C +2. Obviously, C has to be chosen so that C ; 2 > 0. Setting,e.g., C =3,we obtain that it is valid for C =3andallx > 5. Since x denotes log n,this means that the necessary inequality would be valid if n > 2 5 = 32, 2n > 64.It remains to consider the case of odd values of the form 2n +1. Compare theinequality (11) (substituting in it n by 2n +1and k by n) with the inequality (15).We obtain the inequalityand, taking logarithms, the inequality2 2n+1 > (n +1) (2n+1);(n+1)2n +1> ((2n +1); (n + 1)) log(n +1):


80 I. R. <strong>Shafarevich</strong>From here, using the inductive hypothesis about (n +1),we obtain, as before(2n +1)6 C n +1 2n +1+log(n +1) log(n +1) :The inequality that we need: (2n +1)6 C 2n+1 will be proved if we canchecklog(2n+1)that(23) C n +1 2n +1 2n +1+ 6 Clog(n +1) log(n +1) log(2n +1)for a suitable choice of the constant C and for all n starting from some limit. Thisis again an exercise of purely school type, though a bit harder than the previousone. In order to compare various terms in the inequality more easily, replace onthe left-hand side 2n +1by a greater value 2(n +1):(24) C n +1 2n +1+log(n +1) log(n +1)(C + 2)(n +1)6 :log(n +1)In order to transform the right-hand side, note that 2n +1> 3 (n +1)forn > 1,2log(2n +1)6 log(2n +2)=log(n +1)+1. Hence,(25)2n +1 (3=2)(n +1)>log(2n +1) log(n +1)+1 :Comparing inequalities (24) and (25) we see that the inequality (23) will be provedif we prove that(C + 2)(n +1) (3=2)C(n +1)6log(n +1) log(n +1)+1 :Cancelling both sides by n+1 and putting log(n+1) = x, we arrive at the inequalityC +2x6 (3=2)Cx +1 which can be solved completely in the same way as in the previous case. It isenough to multiply both sides by x(x + 1) and reduce similar terms. We obtain theinequality (C +2)x + C +26 3 Cx, i.e., ( 1 C ; 2)x > C +2. Setting C =6,we2 2see that the inequality isvalid for x > 8, i.e., for n +1> 2 8 ,2n +1> 511. Thus,the right inequality (10) is proved with the constant C = 6 and for all values of nstarting with 511. The Theorem is proved.Note that Theorem 4 appears as an easy consequence of the Theorem justproved. Really, since (n) < Cn(n)log n, we haven6 Clog n. And as a logarithmchanges monotonously and increases unboundedly (log 2 k = k), (n)nbecomes lessthan any arbitrary positive number. But, the proof of the theorem of Chebyshevwas based on completely dierent considerations than the proof of Theorem 4.At the end, we return once more to assertions which can be made by consideringnTable I. Starting from it, we cametothe claim that isclosetolog (n)10 n witha certain value of the constant C: the rst decimal gures of the number C ;1 are


Selected chapters from algebra 812.3. Hence it can be concluded that (n) is close to C ;1 nlog 10 n. This expressioncanbegiven a simpler formnlog e n,ifwe use a new basis of logarithms e such thatC log 10 n = log e n. But, as it was said earlier, it is always log b x =log b a log a x,and so our relation will be fullled if C = log e 10. Substituting the value x = binto the relation log b x =log b a log a x, we obtain that log b a log a b = 1 and therelation C = log e 10 which we are interested in can be rewritten as C ;1 =log 10 e.14-year-old Gauss turned his attention to these relations and tried to guesswhich number e could be, so that log 10 e is close to (2:3) ;1 . Such anumber at thattime was well known, thanks to the fact that the logarithm with such abasishasa lot of useful properties. This number is commonly denoted by e. The logarithmwith the basis e is called natural and is denoted by ln: log e x =lnx. Here, to theend of this page, we have to consider that the reader is familiar with the conceptof the natural logarithm.In suchaway, a natural assertion which can be deduced from tables is that (n)becomes close tonln nwhen n increases unboundedly. The theorem of Chebyshevwhich hasbeen just proved states (if we use natural logarithms) that there existtwo such constants c and C, that c nnln nx 2 for real x, starting from some limit. Letn 6 x 6 n+1, where n is an integer. Reduce to proving the inequality 2 n > (n+1) 2and use the induction.]3. Prove thatp n


82 I. R. <strong>Shafarevich</strong>6. Using the result of Problem 5 give a new proof of the Lemma in theAppendix.7. Prove that if p 1 , ... , p r are all the primes between m and 2m +1, thentheir product does not exceed 2 2m .8. Determine the constants c and C such that the inequality (10)isvalid forall n.9. Try to nd as large as possible a constant c and as small as possible aconstant C, for which the inequality (10) is valid for all n, starting from somelimit.I. R. <strong>Shafarevich</strong>,Russian Academy of Sciences,Moscow, Russia


THE TEACHING OF MATHEMATICS2001, Vol. IV, 1, pp. 1{34<strong>SELECTED</strong> <strong>CHAPTERS</strong> <strong>FROM</strong> <strong>ALGEBRA</strong>I. R. <strong>Shafarevich</strong>Abstract. This paper is the fth part of the publication \Selected chapters fromalgebra", the rst four having been published in previous issues of the Teaching ofMathematics, Vol. I (1998), 1{22, Vol. II, 1 (1999), 1{30, Vol. II, 2 (1999), 65{80, Vol.III, 1 (2000), 15{40 and Vol. III, 2 (2000), 63{82.AMS Subject Classication: 00 A 35Key words and phrases: Real numbers, limits, innite sums, decimal representations,real roots of polynomials, Sturm's theorem.CHAPTER V. REAL NUMBERS AND POLYNOMIALS1. Axioms of real numbersIn the present chapter we shall try to make our idea of real numbers moreprecise. Our tendency will not be towards very rigorous reasoning, but we shallonly try to give enough accuracy to our notions and reasoning in this eld, so thatwe are able to prove statements about real numbers.If we choose an origin and a unit on a line, we can represent real numbers aspoints on the line. Thus, if we make our idea of real numbers more precise, we giveat the same time a more precise description of a line and points lying on it. In thesequel we shall often, as an illustration, use this bijective correspondence betweenreal numbers and points on a line.Let us try to take geometry as an example and bring the precision of denitionsand arguing to the level which already exists in the school geometry courses. There,some axioms appear as the basis of all the construction, and starting from theseaxioms all other statements are proved. Axioms themselves are not proved: wetake them on the basis of experiment orintuition.In order to be more concrete, let us look at the construction of plane geometrybased on axioms. We can distinguish three types of logical notions. First ofall, there are basic geometrical notions|points and lines. Then, there are basicrelations: a point lies on a line a point lies on a line between other two points.Neither of these are dened. We think as if a \list" of all points and all lines existsThis paper is an English translation of: I. R. Xafareviq, Izbranye glavy algebry. GlavaV. Destvitel~nye qisla i mnogoqleny, Matematiqeskoe obrazovanie, N2(5), apr.{in~1998, Moskva, str. 31{66. In the opinion of the editors, the paper merits wider circulation andwe are thankful to the author for his kind permission to let us make thisversion.


2 I. R. <strong>Shafarevich</strong>somewhere, and we knowwhich points lie on which lines or which triples of pointsA, B, C on the line l are such that B lies between A and C. And only in thirdplace there are axioms, i.e., statements about basic notions and relations amongthem. For instance: each two distinct points belong to exactly one line. Or: amongthree distinct points on a line, there is exactly one lying between other two.There is a complete analogy with real numbers. The basic notions here arereal numbers themselves. This means that, for the moment, we do not assumeanything more about real numbers, but only that they constitute a certain set.Basic relations between real numbers are of two dierent types: operations andinequalities. Let us describe them in more detail.1) Operations with real numbersFor every two realnumbers a and b we dene a third number c, called the sumof a and b. We write this as: a + b = c.For every two real numbers a and b we dene a third number d, called theproduct of a and b. We write this as: ab = d.2) Inequalities between real numbersFor some pairs of real numbers a and b we have thata is less than b. We writethis as: aa. If we want tosaythata a).Before we pass to the formulation of axioms connecting basic notions withbasic relations among them, let us emphasize once more the analogy with geometry.Write analogous notions in the table:AlgebraGeometryBasic notionsReal numbers Point, line, . . .Basic relationssum: a + b = cA point lies on a line.product: ab = dPoint C lies betweeninequality: a


Selected chapters from algebra 3(Remark. There exists exactly one such number. If 0 0 were another numberwith the same property, wewould have 0 0 +0=0 0 ,by the denition of 0, 0 0 +0=0+0 0 by the commutative lawand0+0 0 =0,by the denition of 0 0 . Finally, weobtain 0 0 =0 0 +0=0+0 0 = 0, i.e., 0 0 =0.)I 4 . For each real number a there exists a number called opposite, denoted by ;a,such that a +(;a) =0.(Remark. For the given number a there exists exactly one such number.If a 0 were another number with the same property: a + a 0 = 0, we would have(a +(;a)) + a 0 =0+a 0 = a 0 . Also, (a +(;a)) + a 0 =((;a)+a)+a 0 ,andbytheassociative law, ((;a)+a)+a 0 =(;a)+(a + a 0 ). By the property ofnumber a 0 ,a + a 0 =0and(;a) +0=;a. Taking these equalities together, we obtain thata 0 = ;a.)II (axioms of multiplication)II 1 . Commutative law: ab = ba for arbitrary real numbers a and b.II 2 . Associative law: a(bc) =(ab)c for arbitrary real numbers a, b and c.II 3 . There exists a number called unit, denoted by 1,such that a 1=a for anarbitrary real number a.(Remark. There exists only one such number. It can be proved in thesame way as the remark following axiom I 3 |we onlyhave to replace addition bymultiplication, and 0 by 1.)II 4 . For each realnumber a, dierent from 0, there exists a number called inverse,denoted by a ;1 ,suchthata a ;1 =1.(Remark. For each realnumber a dierent from 0, there exists only one suchnumber. The proof is exactly the same as in the remark following axiom I 4 .)III (axiom of addition and multiplication)III 1 . Distributive law: (a + b)c = ac + bc for arbitrary real numbers a, b and c.IV (axioms of order)IV 1 . For any two real numbers a and b exactly one of the following three relationsholds: a = b or a


4 I. R. <strong>Shafarevich</strong>VII(axiomofembedded segments)Let a 0 , a 1 , a 2 , ... and b 0 , b 1 , b 2 , ... be two sequences of real numbers,satisfying a 0 6 a 1 6 a 2 6 , b 0 > b 1 > b 2 > and b n > a n for each n.Then there exists a real number c, such that b m > c and c > a n for all mand n.If we use representation of real numbers on a line, then numbers x satisfyingthe condition a 6 x and x 6 b (a 6 x 6 b for short) are represented by the set whichis called a segment and denoted by [a b]. So, the premises of the last axiom statethat the segments I n =[a n b n ] are embedded one into another: I 0 I 1 I 2 .The axiom states that there exists a point (i.e., a number) which iscommonforallthese embedded segments (hence the name of the axiom).All the usual properties of real numbers easily follow from the listed axioms. Itwould be too boring to devote several pages to these completely obvious arguments.Hence, we shall only formulate some assertions which we shall need later|and givejust some remarks in connection with their proofs (see also problems 2, 3, 4).It follows from the axioms of group II that for each number a dierent from0 and each number b, the number c = a ;1 b is the unique solution of the equationax = b. It is called the quotient of b and a and denoted by b a. All the usual rulesabout dealing with parentheses and fractions follow from the axioms.Since for a natural number n the equality n =1++1 (n summands) isvalid, it follows from the axioms of group III that for each number a, the numberna (product of n and a) is equal to the sum a + + a (n summands).Axiom IV 3 implies that if a0, 1 > 0. 2We meet exactly this situation when we intend to measure the given realnumber approximately, with deciency or excess, using rational numbers. In thatcase a n and b n are rational numbers. An example is the construction of p 2wespoke


Selected chapters from algebra 5about in Section 1 of Chapter I. Thus, axiom VII formulates what we intuitivelyhave in mind when we speak about \better and better measuring". Together withthe preceding Lemma it gives us the possibility ofconstructing real numbers withthe prescribed properties. We shall often use this observation later.Concerning axioms V and VI we just remark that we assume here natural and,more generally, rational numbers to be known. We shall not analyse these notionsin detail.Let us remark at the end that the given axioms are not independent. Thismeans that some of them could be proven as theorems, relying on other axioms (see,e.g., problem 6). We have just gathered those properties of real numbers which weare used to and which are intuitively convincing. Taking greater numberofaxiomswe obtained the right to skip not very interesting proofs of some intuitively obviousfacts.Problems1. Which of the axioms I{VII are also valid in the set of rational numbers,and which are specic for real numbers?2. Prove, using axioms I{III, that for each realnumber a, 0a =0.3. Prove that for arbitrary real numbers a and b the equation a + x = b has asolution and that it is unique.4. Prove that for arbitrary real numbers a 6= 0 and b the equation ax = b hasa solution and that it is unique.5. Consider the set of rational numbers as a subset of the set of real numbers|on the basis of axiom V. Prove that rational number 0 coincides with the realnumber 0 whose existence is based on axiom I 3 . Do the same for rational number1 and the real number 1 whose existence is based on axiom II 3 .6. Not using axiom V, prove thatnumbers 0, 1, 1+1, ... , 1+1++1(n summands) are dierent for all natural n. Here 1 denotes the number whose existenceis guaranteed by axiom II 3 . Hence, prove that natural numbers are containedamongst the reals, and that operations and inequalities, dened for real numbers,when applied to natural ones, give usual operations and inequalities. Prove afterthat the assertion of axiom V. In that way, this axiom is in fact superuous in ourlist, since it could be proven on the basis of other axioms.7. Instead of the operation of multiplication, given by denition for real numbers,dene a new operation given by the formula a b = a + b + ab. Does itobey the axioms of group II?2. Limits and innite sumsIn order to illustrate the role of axiom of embedded segments as a method ofconstruction of new real numbers, we shall introduce several notions which will alsobe useful later.


6 I. R. <strong>Shafarevich</strong>We met in Chapter IV sequences which were bounded as well as sequenceswhich increased unboundedly. Consider now sequences which are decreasing. Forthe sake of simplicity, consider rst sequences of positive numbers and call such asequence unboundedly decreasing if its terms unboundedly approach zero. The exactdenition can be made analogously to the denition of unboundedly increasingsequences, given in Section 2, Chapter IV.A sequence a n of nonnegativerealnumbers is said to approach zero unboundedlyif for each arbitrary small positive number " there exists a natural number N suchthat a n 1 and it can be written in the form A =1+xwith x>0. Using binomial formula, A n =(1+x) n =1+nx + y, where y is asum of positive numbers, so y>0. Thus, A n > 1+nx and so for each ">0thereexists such N that A n > 1=" for all n > N (this N can be found explicitly). Hence,a n N. But among a n 's with n>N there are both 0 and 1. Therefore wewould have jj < 1 and j1 ; j < 1 . Clearly, suchanumber does not exist.2 2But if a sequence has a limit, this limit is unique. Namely, suppose that asequence (a 1 a 2 ...a n ...)hastwo limits: and , 6= . Then for each " thereexist numbers N and N 0 , such thatforn>N it is ja n ; j Nand n>N 0 then ja n ; j


Selected chapters from algebra 7As not every bounded sequence has a limit, considering just such sequenceswould not lead us to the construction of new real numbers. Our main result willbe that there is a simple special type of sequences which always have limits andtherefore they will give us a method of constructing new real numbers.A sequence (a 1 a 2 ...a n ...) is called increasing, ifa n 6 a n+1 for all n, i.e.,a 1 6 a 2 6 a 3 6 a 4 6 .THEOREM 1. Each bounded and increasing sequence ofpositive numbers hasa limit.The proof will follow the logic of an anecdote which was popular when I wasa student (i.e., before the war). The story was about dierent ways to catch alion in a desert. There was a French method, method of NKVD-investigators,mathematician's method, . . . Mathematician's method went like this. He dividesthe desert into two parts. The lion is situated in one of these parts. He divides thispart again in two parts|and he continues like this till the lion appears in a part ofthe desert whose dimensions are less than the dimensions of the cage. It remains toput the cage around it. This was a parody to a way ofproving existence theorems,one of which we are going to demonstrate now.Let a =(a 1 a 2 ...a n . . . ) be an increasing sequence of positive numbers. Bythe assumption it is bounded, so there exists a constant C such that all a n C=2, and thenall a n with n > m are contained in the segment [C=2C] (since the sequence isincreasing) or a n 6 C=2 forall n, and then all terms of the sequence belong tothe segment [0C=2]. Denote by I 2 one the segments, [0C=2] or [C=2C], namelythe one which contains all the terms of sequence a, starting from some place.After that, divide the new segment into two parts. Obviously, we can continuethe process unboundedly and we will obtain a sequence of embedded segmentsI 1 I 2 I 3 I m , where segment I k has the length C=2 k , and whichpossesses the property that each segment I k contains all the terms of sequence a,starting from some place. By the axiom of embedded segments (axiom VII) thereexists a real number , belonging to all the segments I k . It is indeed the limitof sequence a. Really, aswehave seen, all the terms of sequence a, starting fromsome place, belong to segment I k , This means that for each natural number k thereexists an N such thata n 2 I k for all n>N. But also a 2 I k . Since the length ofsegment I k is equal to C=2 k ,itfollows that ja n ; j N. This givesus the property which appears in the denition of the limit, if we choose k so thatC=2; k


8 I. R. <strong>Shafarevich</strong>c n =1,thena n = n and the sequence a is unbounded. We considered a less trivialexample in Section 2 of Chapter IV: in the sequence c all c n =1=n. We sawthatinthat case the sequence a is also unbounded. But if we cancheck that the sequencea of sums is bounded, then according to Theorem 1 it has a unique limit . Thislimit is called the sum of the sequence (c 1 c 2 ...c n ...), which is denoted byc 1 + c 2 + + c n + = :Sometimes the innite sum c is called a series and its sum|the sum of the series.If the sequence of sums a n is bounded, then, as we have seen, the sum of theseries c 1 + c 2 + + c n + exists. If it is unbounded, then we say that the sumof the series does not exist. Hence, Lemma 1, Section 2, Chapter IV, states thatthe sum of the series 1 + 1 2 + 1 + does not exist.3Consider an example. Let a nonnegative number a, lessthan1,begiven, andlet c =(1aa 2 ...a n ...). Then a n =1+a + a 2 + + a n;1 (in the n-th placein the sequence c there appears a n;1 ). The sum 1+a + a 2 + + a n;1 can beevaluated using the formula for the sum of a geometric progression|formula (12)of Chapter I:(1) a n =1+a + a 2 + + a n;1 = 1 ; an1 ; a = 11 ; a ; an1 ; a :We have seenthata n ! 0whenn !1, wherefrom it follows immediately thata n1 ; a ! 0 when n !1. Thus, formula (1) gives that a n ! 1 . We can write1 ; athis as:(2) 1+a + a 2 + + a n + = 1 for a


Selected chapters from algebra 9We know nearly nothing about analogous sums with odd k. It was proved onlyrecently (in 1978) that the sum 1+ 1 2 + 1 3 3 ++ 1 + is an irrational number.3 n3 This remains probably the only known fact about these sums for odd values of k.Let us remark that just knowing the fact that a series c 1 + c 2 + + c n + has a sum, one can deduce useful corollaries even if the value of the sum is notknown.LEMMA 3. If the sum of the series c 1 + c 2 + + c n + exists, then thesequence of numbers d n = c n + c n+1 + unboundedly approaches 0.We shall use an easy property ofthe limit. Suppose that a sequence a 1 , a 2 ,... , a n ,... has a limit , i.e., a n ! when n !1. Then for each number thesequence ; a 1 , ; a 2 ,..., ; a n ,... has the limit ; . Really, the dierence ;;( ;a n )=a n ;, and the dierence a n ; ! 0, hence ;;( ;a n ) ! 0when n !1. Denote the sum of the series c 1 + c 2 + + c n + by and thenumber c 1 + c 2 + + c m by a m . By the denition of the sum of an innite series,the sum of the series c 1 + c 2 + + c n + is equal to the limit of the sequencea 1 , a 2 ,... ,a m ,... . In the same way the sum d n of the sequence c n+1 +c n+2 +is equal to the limit of the sequence a n+1 ; a n , a n+2 ; a n , ... a n+k ; a n , ... . Bythe remark from the beginning of the proof, the last limit is equal to 0 ; a n ,where 0 is the limit of the sequence a n+1 , a n+2 , ... , a n+k , ... (for xed n). But thelimit of the sequence a n+1 , a n+2 , ... is the same as the limit of the sequence a 1 ,a 2 , . . . , i.e., 0 = . We obtain that d n = ; a n . But, by the denition of limit, ; a n ! 0, i.e., d n ! 0whenn !1.As an example, put d n = 1 n + 12 (n +1) + . We see that d 2 n ! 0 whenn !1.Considering limits of innite sums leads us away from algebra, which is mainlyconcerned with nite expressions. These questions are closely related with anotherbranch of mathematics, called analysis. That is why we are not going to considerthem in more detail. Let us remark only that the most interesting results|such asformulas (3) and (4)|appear on borders of these areas.Problems1. Prove that if the sum of the series c 1 +c 2 ++c n + exists, then c n ! 0when n !1.2. Prove that if a n


10 I. R. <strong>Shafarevich</strong>4. Does there exist a limit of the sequence a 1 , a 2 , ... wherea n = 1 2 ; 1 (;1)n+ + ?3 nHint. Group consecutive terms in pairs.5. Let f(x) be a polynomial of degree d. Prove thata n ! 0whenn !1,where a n = f(n)=n d+1 .6. Find the sum of the series b + ba + ba 2 + ba n + , where jaj < 1 and b isarbitrary. Usually, the sequence (b ba ba 2 . . . ) is also called an innite geometricprogression.7. In a square with side b, centres of the sides are joined by segments. In thenew square which is obtained in that way the same procedure is done, etc. Findthe sum of areas of all squares that can be obtained in this way.18. Find the sum of the series1 2 + 12 3 + + 1n (n +1) + .9. Construct a sequence of positive rational numbers smaller than 1, such thata n has the denominator n and which does not have a limit.10. Prove that if the sequence a 1 , a 2 , ... has a limit , and the sequenceb 1 , b 2 , ... has a limit , then the sequence of sums a 1 + b 1 , a 2 + b 2 , ... has thelimit + .3. Decimal representation of real numbersIn Section 1 we described real numbers using a system of axioms. Now wearegoing to show how real numbers can be given concretely. Here we shall not sayanything new|we shall speak about justication of the well known representationof real numbers by innite decimal fractions. But now we shall show how theexistence of such a representation can be deduced from axioms listed in Section 1.We shall use the usual representation in which integer part can be either positiveor negative, while fractional part (sometimes called the mantissa) is alwaysnonnegative.Let A be an arbitrary integer (of either sign) and a 1 , a 2 ,... ,a n ,... an innitesequence of numbers, each of which can take one of 10 values: 0, 1, 2, 3, 4, 5, 6,7, 8, 9. All this together will be denoted by A a 1 a 2 a 3 ... and called an innitedecimal fraction. For the time being it is just an innite sequence, written in adierent way. Now we are going to show how a real number can be correspondedto it. We dene, for each index n, anumber(5) n = A + a 110 + + a n10 n :Obviously, the sequence 1 , 2 , ... , n , ... is increasing. Let us prove thatitisbounded. Really, since all a i 6 9,wehavea 110 + a 210 + + a n2 10 6 9 n 1+ 110 10 + + 1 :10 n;1


Selected chapters from algebra 11We apply the formula about the sum of geometric progression:1+ 110 + + 110 = 1 ; 110 n< 10 n;1 1 ; 1 910and as a result we obtain thata 1(6)10 + a 210 + + a n2 10 n < 1so that n


12 I. R. <strong>Shafarevich</strong>THEOREM 2. Two distinct innite decimal fractions, neither of which has 9asaperiod, are corresponded to distinct real numbers.The proof can be obtained easily if we connect our construction of a realnumber, dened by a decimal fraction, with the usual measuring of numbers withaccuracy of 1=10 m , with deciency and excess. One has to divide the line intosegments of the length 1=10 m , whose endpoints are rational numbers with denominator10 m . Then each point from the line, that is, each real number, falls in oneof the segments. The endpoints of the segment give a measure of the number, withdeciency and excess and accuracy of 1=10 m . However, violating of one-to-one correspondenceappears because of the endpoints of segments themselves. To whichofthe segments, left or right, is each ofthesepoints corresponded? This is the sameproblem which appears in connection with number 9in the period. We are goingto show that our choice (without 9in periods) corresponds to the case when theendpoints of segments are always attached to segments on the right-hand side. Inother words, the constructed numbers m and the number which they dene areconnected by the relation(8) m 6 < m + 110 m :(The fact that numbers m are rational with the denominators of the form 10 mfollows from their form (5).)Remember that number was dened as the limit of the sequence 1 , 2 ,... , n , ... . All numbers n with n > m, obviously satisfy the condition n > m .Hence, such an inequality is valid for their limit . Really, from the assumption< m we could deduce that n ; =( n ; m )+( m ;) > m ; for all n > m.But, by the denition of limit, the absolute value of the number n ; is smallerthan an arbitrary given positive number for n large enough. This contradicts thefact that it is not smaller than the xed positive number m ; (see Problem 2 inSection 2).In this way the left-hand inequality in (8) is proved. The right-hand one canbe proved similarly, if the sign < is replaced by 6. Namely, for each n > m wehave(9) n = m + a m+110 m+1 + + a n10 n 6 m + 110 m am+110 + + a n10 n;m and applying inequality (6) we conclude that n < m + 110 m . Repeating theprevious reasoning we obtain that 6 m + 110 m .But, if we want to obtain the right-hand inequality in (8) with the sign kwe can write n = m +(a m+1 =10 m+1 + + a k =10 k )+(a k+1 =10 k+1 + + a n =10 n ):


Selected chapters from algebra 13As before, we see thatand soa k+1 =10 k+1 + + a n =10 n 6 1=10 k n 6 m +(a m+1 =10 m+1 + +(a k +1)=10 k ):Since a k 6= 9, the digit a k + 1 is one of the digits 1, 2, . . . , 9. Putc = a m+1 =10 + +(a k +1)=10 k;m :We can repeat our reasoning once more and obtain that c m. From these relationsit follows that 0 m < m + 110 m , i.e., 0 m ; m < 110 m . But this contradicts thefact that m and 0 m are distinct rational numbers having the same denominator10 m . Hence, 0 m = m for all m. But numbers a m are uniquely determined by thenumbers m ,since m ; m;1 = a m =10 m . Thus, they coincide in both fractions,too.We pass now to the second question: does every real number correspond tosome innite decimal fraction? As well as the answer, the method of proof is alreadyknown to us. We justwant toconvince ourselves that the reasoning can be basedon the axioms we formulated.First of all, let us remark that each real number is situated between twoconsecutive integers, i.e., there exists an integer A, suchthatA 6


14 I. R. <strong>Shafarevich</strong>must be satised: either a 1 6


Selected chapters from algebra 15corresponding to the number a=b. Prove that a b ; n =r n10 n b , where 0 6 r n


16 I. R. <strong>Shafarevich</strong>number. From school courses it is known that(11)(12)(13)jA + Bj 6 jAj + jBjjA + Bj > jAj;jBjjABj = jAjjBj:Theorem 3 gives a quantitative estimate of how much f(x) diers from f(a)if x slightly diers from a. In order to prove the theorem, put y = x ; a, i.e.,x = a + y and substitute this value into the polynomial f(x). Each terma k x k ofthe polynomial f(x), after the substitution, gives the expression a k (a + y) k , whichcan be written as a sum of powers of y and then similar terms in f(a + y) canbereduced. As a result we obtain that f(a + y) is a polynomial in y, whichwe denoteby g(y) =c 0 + c 1 y + + c n y n . Then f(x) =f(a + y) =g(y), f(a) =f(a +0)=g(0) = c 0 , x ; a = y and inequality (10)whichweintend to prove becomes(14) jg(y) ; g(0)j 6 Mjyjfor all y satisfying jyj 6 1.In the transformed form, the expression g(y) ; g(0) acquires a simple formc 1 y ++c n y n (since g(0) = c 0 ). Inequality (11) can be applied also to a sum withan arbitrary number of summands (which canbeproved directly by induction) and,in particular, to our sum c 1 y + + c n y n . We obtain thatjg(y) ; g(0)j = jc 1 y + + c n y n j 6 jc 1 yj + + jc n y n j:Using equality (13) (also applied to an arbitrary number of factors), jc k y k j = jc k jjyj k ,sothatjg(y) ; g(0)j 6 jc 1 jjyj + + jc n jjyj n :Since, by the assumption, jyj 6 1, we have jyj k 6 jyj andfor jyj 6 1.jg(y) ; g(0)j 6 (jc 1 j + + jc n j)jyjIt is enough to put M = jc 1 j + + jc n j to obtain inequality (14), which alsomeans inequality (10).Now we are able to prove an important property of polynomials.THEOREM 4. (Bolzano's theorem) If a polynomial for x = a and x = b takesvalues with opposite signs, then it takes the value 0 somewhere between a and b.In other words, if for a polynomial f(x) values f(a) and f(b) are numbers ofopposite signs and a


Selected chapters from algebra 17Fig. 2 Fig. 3which is called continuity. In the case of polynomials it is enough to use the easyinequality (10), proved in Theorem 3.The proof is based on the same principle of \catching a lion in a desert", wehave already used for proving Theorem 1.Suppose, for example, that f(a) > 0, f(b) < 0. Consider the segment [a b](i.e., the set of real numbers x satisfying a 6 x and x 6 b). Denote this segment byI 1 and divide it into two segments of equal length by thepoint r = a+b . If f(r) =0,2then the theorem is proved (c = r). If f(r) 6= 0 and, for example, f(r) > 0, thenthe polynomial f(x) takes values of opposite signs for x = r and x = b. Denotethen by I 2 the segment [rb]. If that f(r) < 0, then the segment [a r] will bedenoted by I 2 . In any case we obtain a segment I 2 contained in I 1 , having twotimes smaller length, and having again the property that the polynomial f(x) hasvalues of opposite signs at its endpoints|namely, positive at the left-hand end andnegative at the right-hand one.This process can be continued. Either we shall at some moment reach a root ofthe polynomial f(x) (and the theorem will be proved), or the process shall continueunboundedly. It remains to consider the latter case. We obtain an innite sequenceof embedded segments I 1 I 2 I n , I n =[a n b n ], such thateachofthem is of half-a-length of the previous one, and the polynomial f(x) takes valuesof opposite signs at the endpoints a n and b n of each segment I n , more precisely,f(a n ) > 0, f(b n ) < 0. Now we are going to use the more precise denition of realnumbers we gave in Section 1. Segments I n satisfy the prepositions of Axiom VII(axiom of embedded segments) and Lemma 1 of Section 1. Really, segments I n areembedded one into another, by their construction, and since I n is half-of-length ofsegment I n;1, its length is equal to b ; a , and so this length becomes unboundedly2n;1 small when n increases. Hence, according to Axiom VII and Lemma 1, there existsa unique number c, belonging to all segments I n , i.e., such that(15) a n 6 c 6 b n :In this way wehave constructed the number c which we searched for. Namely, wenow prove that f(c) =0.Consider the values f(a n ) of the polynomial f(x) at the left-hand endpointsof segments I n . By the assumption, all f(a n ) > 0. Inequality (11) implies that thesequence a 1 , a 2 ,... approaches the number c unboundedly: really, a n 6 c 6 b and


18 I. R. <strong>Shafarevich</strong>0 6 c ; a n 6 b n ; a n , where, by the assumption, b n ; a n = b ; a. Therefore the2n;1 inequality ja n ; cj


Selected chapters from algebra 19\roots of a of degree n" (denoted as np a). Consider rst the case when a>0. Thenthe polynomial f(x) =x n ;a takes for x = 0 negativevalue ;a. On the other hand,it is easy to nd a value x = c such thatf(c) > 0 Really, by Archimedes' axiom(Axiom VI) there exists a natural number m such that m>a. Then m n >mandm n ; a>m; a>0. Using Bolzano's theorem, we can state that there is a root ofthe polynomial in the segment [0m]. If, on the other hand, a 0asan even power ofarealnumber, and x n ; a>0. If n is odd, then putting x = ;y we obtain thatx n ; a = ;y n ; a = ;(y n + a). The polynomial y n + a (for a0|not more than two roots which dier only in the sign (whichmeans it has exactly two roots).But in the case of other polynomials, it can happen that Bolzano's theoremdoes not give anything. Take as an example the polynomial x 2 ; x +2. Using theformula for solutions of quadratic equation we can conclude that this polynomialhas no real roots. But if we tried to give values 0, 1, 2, ... to the argument x,we would obtain only positive values, and Bolzano's theorem wouldn't give usanything. Therefore, we will try now to explore polynomials more thoroughly.Theorem 3 estimates values of a polynomial for values of x being close toa certain value a. We shall prove now a similar assertion about values of thepolynomial for large (by absolute value) values of x.THEOREM 5. For the polynomial f(x) =a 0 + a 1 x + + a n x n there existsaconstant N>0 such that(16) ja 0 + a 1 x + + a n;1x n;1 j < ja n x n jfor all values of x such that jxj >N.The theorem states that for suciently large values of x, the absolute valueof the leading term exceeds the absolute value of the sum of all other terms. Inorder to prove this, we use inequality (11) (for an arbitrary number of summands)and equality (13). It follows from them that ja 0 + a 1 x + + a n;1x n;1 j 6 ja 0 j +ja 1 jjxj + + ja n;1jjxj n;1 ,andja n x n j = ja n jjxj n . In order to prove inequality (16)it is enough to convince oneself that ja 0 j + ja 1 jjxj + + ja n;1jjxj n;1 6 ja n jjxj n ,and this will be proved if we show that(17) ja k jjxj k < 1 n ja njjxj nfor each k =0 1 ...n; 1 and jxj >N for N large enough. Then, summing up allthe inequalities (17) for k =0 1 ...n; 1we obtain the inequality we needed.Inequality (17)canbesolved in the usual way. It is equivalent to


20 I. R. <strong>Shafarevich</strong>jxj n;k > nja kjja n j , i.e.,(18) jxj > n;k sn ja kjja n j :Therefore, it sis enough to choose for N an arbitrary number larger than all thenumbers n;k n ja kj, k =0 1 ...n; 1, and it will satisfy the assertion of Theorem5.ja n jTheorem 5 has a lot of useful corollaries. Note rst that under the assumptionsof the theorem (i.e., for jxj >N) we always have jf(x)j > 0, which followsimmediately from inequality (12)jf(x)j = ja 0 + a 1 x + + a n x n j 6 ja n x n j;ja 1 + a 1 x + + a n;1x n;1 j:But this means that the polynomial f(x) does not have roots x with jxj > N.In other words, roots of a polynomial (if they exist) have to be contained in thesegment jxj 6 N, where, as weshave shown (inequality (18)) N can be chosen asthe greatest of the numbers n;k n ja kj. One calls such anumber N the bound ofja n jroots of the polynomial. So, for the polynomial x 3 ; 7x + 5 one can take forN anarbitrary number greater than 3p 3 5 and p 3 7. For example, N =46 satisesthe conditions. This means that all roots of the polynomial are distributed between;46 and46. We have convinced ourselves earlier that they are in fact containedbetween ;3 and +3 (Table 1).Theorem 5 implies more that just the assertion that f(x) 6= 0ifjxj >N,forthe found value of N. To evaluate the value of a 0 + a 1 x + + a n;1x n;1 + a n x nmeans to sum up two real numbers a 0 + a 1 x + + a n;1x n;1 and a n x n , rst ofwhich is smaller (by absolute value) than the other (for jxj >N). But then thesign is determined by the sign of the second summand. We come to the followingconclusion:COROLLARY 1. For jxj >N,where N is the bound of roots dened inTheorem5, values of the polynomial f(x) have the same sign as the leading term a n x n .Suppose that the degree n of the polynomial is odd. Then the sign of theleading term a n x n for x > 0 agrees with the sign of the coecient a n , and forxN and x


Selected chapters from algebra 21situation appears more complicated it depends not on how large the degree of thepolynomial is, but on its parity.Finally, consider one more property of polynomials, which can make investigationsin some cases much easier. Theorem 3 gave us information about theabsolute value of the dierence f(x) ; f(a) when the dierence x ; a is small. Weshall investigate now thesign of the dierence f(x) ; f(a). Here we shall excludethe cases when the value x = a appears to be a root of the derivative f 0 (x) ofthepolynomial f(x). These special values of a could be investigated easily in the samemanner, but we will not need this at the moment.THEOREM 6. Let a polynomial f(x) be given and take a value x = a whichis not a root of its derivative f 0 (x) (i.e., f 0 (a) 6= 0). If f 0 (a) > 0, then the valuesf(x) for x close, but to the left of a, are smaller than f(a), and for x close, butright of a, aregreater than f(a). If f 0 (a) < 0, then the situation is opposite.f 0 (a) > 0 f 0 (a) < 0Fig. 4 Fig. 5This means that there exists suciently small ">0 (depending on f(x) andon a), such that when f 0 (a) > 0, for a ; "


22 I. R. <strong>Shafarevich</strong>somewhere inside the segment, which would contradict the choice of the number ".This contains in fact the assertion of Theorem 6. Let, for example, f 0 (a) > 0.Then g(a a) = f 0 (a) > 0, too, and according to what was said, g(x a) > 0 fora ; "


Selected chapters from algebra 23for a polynomial of the third degree. In Section 3 of Chapter II we saw that eachequation of the third degree can be replaced by an equivalent equation of the formx 3 + ax + b =0. We shall investigate such a form in the sequel.First of all let us solve the question about multiple roots. We proved in section2ofChapter II that multiple roots of a polynomial are in fact joint roots of thepolynomial and its derivative. According to formula (15) of Section II, for thepolynomial f(x) =x 3 + ax + b the derivative is equal to f 0 (x) =3x 2 + a. If a>0,then the derivative has no roots and this means that the polynomial f(x) hasnomultiple roots. If a < 0, then denote by the positive root of the polynomial3x 2 + a (i.e., = p ;a=3). Then the polynomial f(x) can have asamultiple rootonly one of the numbers or ;. Since the polynomial f(x) can be written in theform f(x) =(x 2 + a)x + b and for x = , x 2 = ;a=3 andx 2 + a =2a=3, then thecondition that f(x) hasamultiple root takes the form 2a = ;b, i.e., 3 2 4a = 9 b2 ,and since 2 = ;a=3, the condition becomes ; 4a327 = b2 , i.e., 4a 3 +27b 2 = 0. Ifthis condition is satised, then the polynomial has a multiple root and may berepresented in the form f(x) =(x ; ) 2 g(x). Here the polynomial g(x) hastobeof the rst degree which means that it has a single root . Thus, the polynomialf(x) has two rootsequalto, and one root equal to .Consider now the remaining case when the polynomial f(x) does not havemultiple roots, i.e., 4a 3 +27b 2 6=0. According to Corollary 2 of Theorem 5, thepolynomial f(x) has at least one root . If it has another root , thenitmust bedivisible by (x ; )(x ; ), i.e., it has the form f(x) =(x ; )(x ; )g(x), whereg(x) is a polynomial of the rst degree and therefore it has a root . In such away,the polynomial f(x) has three roots: , and . It cannot have more than threeroots. We conclude that only two things can happen: either the polynomial f(x)has 1 root or the polynomial f(x) has3roots.Our problem is to nd out which ofthe cases takes place (for given coecients a and b).Fig. 8Suppose that the polynomial f(x) has three roots: , and , where


24 I. R. <strong>Shafarevich</strong>that for x large enough (more precisely, forx > N), the values of the polynomialhave the same sign as the values of the leading term x 3 |i.e., they are positive, andfor x 6 ;N they are negative, for the same reason. Hence, for xit is always f(x) > 0 (Fig. 8).Since we have f(x) < 0for ; "


Selected chapters from algebra 25Problems1. We proved at the end of Chapter I that the polynomial x 3 ; 7x 2 +14x ; 7has no rational roots, so its roots|if they exist|are irrational numbers. Determinethe number of roots of this polynomial, their signs and also, for each of the roots,two consecutive integers such thatthisrootislyingbetween them.2. Prove that the polynomial x 4 + ax + b either has no roots, or it has tworoots and nd conditions (on coecients a and b) such that the rst or the lattercase takes place.3. Prove that the number of roots of a polynomial of even degree is even andof odd degree is odd.4. Prove that the polynomial x n + ax + b, for n even, has 0 or 2 roots, and forn odd|1or3. Determine conditions (on coecients a and b) such that the rst orthe latter case takes place.5. Determine the number of roots of the polynomial x n +ax n;1 +b (dependingon n, a and b).6. Prove that each polynomial f(x) takes arbitrarily large values (by absolutevalue), for suciently large values of x (by absolute value).M7. Prove that as a bound of roots N the number +1 can be taken,where M is the largest of the numbers ja 0 j, ... , ja n;1j. Hint. Use the inequalityja 0 + + a n;1z n;1 j 6 M(1 + jzj + + jzj n ).8. Prove that the polynomial f(x) = a 0 + a 1 x + + a n;1x n;1 + a n x n ,where a n > 0, a i 6 0 for i =1 ...n; 1, a 0 < 0, has exactly one positive root.Hint. Write f(x) in the form a n x n 1+ a n;1a n x + + a 0a n x n and nd whether theexpressions a n;kincrease or decrease when x increases, remaining positive.a n xn 9. Let a polynomial f(x) have all the coecients at even powers of x equalto 0, and all the coecients at odd powers positive. Prove that it has a uniqueroot.ja n jAPPENDIXSturm's TheoremWe shall present now a method allowing to determine for each polynomial f(x)the number of its roots lying in a given segment [a b].The idea of the method is based on the fact that, although for a single polynomialf(x) there is no simple method which could connect its properties with someproperties of polynomials with smaller degree, for a pair of polynomials f(x), g(x)such a method is well known: it consists of divison with remainder of the polynomialf(x) by g(x): f(x) =g(x)q(x)+r(x), and passing from the pair of polynomials(fg) to the pair of polynomials (g r). Repeating this process leads us to the algorithmof Euclid for nding the greatest common divisor of polynomials f and g.


26 I. R. <strong>Shafarevich</strong>For example, the question of the existence of common roots of polynomials f andg can be reduced to the question of the existence of common roots of the polynomialsof smaller degree g and r and, as a result, to the question of the existence ofroots of the polynomial of smaller degree g: c: d:(fg). The method can be appliedto the case of the pair of a polynomial and its derivative and then we obtaintheanswer to the question of the existence of multiple roots of the polynomial. That ishow we proceeded in Chapter II, and we shall also proceed like thatnow: we shallrst consider a certain property ofroots of the pair of polynomials (fg), whichcan be treated using division with remainder. Applying then this property tothepair consisting of a polynomial and its derivative, we shall nd the answer to ourquestion.Let us start with a simple observation, related to a single polynomial F (x).Let x = be its root and let this root have themultiplicity k. Then we can writedown (by the denition of the multiplicity ofroots,given in Section 2 of Chapter II)(1) F (x) =(x ; ) k G(x)where G() 6= 0. Thus, if anumber " is smaller than the distance from to thenearest root of the polynomial G(x), then G(x) takes the values of the same signin the segment [ ; " + "]. Really, if for any two numbers x and y lying in thissegment the polynomial G had values G(x) and G(y) of opposite signs, then, byBolzano's theorem, there would exist a root of the polynomial between x and y.But this would contradict the way how " had been chosen|that there had been noroot of the polynomial G lying in the segment [ ; " + "]. In particular, all thevalues of the polynomial G(x) forx in the segment [ ; " + "] have the same signas G(). Formula (1) implies now thatifmultiplicity k is even, then the values ofthe polynomial F (x) forx lying in the segment [ ; " + "] have the same sign asG(). The graph could be situated as in Fig. 1.G() > 0 G() < 0Fig. 1If, on the other hand, the multiplicity k is odd, then for G() > 0 we haveF (x) < 0 for ;" 6 x 0 for


Selected chapters from algebra 27former case (i.e., for G() > 0) is a root with increasing, andinthelatter (forG() < 0)|root with decreasing. Possible graphs of the polynomial F (x) inbothcases are displayed in Fig. 2.G() > 0 G() < 0Fig. 2DEFINITION. LetF (x) be a polynomial having as roots neither a nor b. Characteristicsof the polynomial F (x) on the segment [a b] is the dierence betweenthe number of its roots with increasing and the roots with decreasing, lying in thatsegment. Here, roots having even multiplicity are not counted. The characteristicsis denoted by [F (x)] b a. For example, the polynomial represented in Fig. 3 has 3roots with increasing and 2 roots with decreasing, so we have [F ] b a =1.Fig. 3Since after each root with increasing there must follow a root with decreasing(roots with even multiplicity do not count), the characteristics is determined bythe signs of numbers F (a) andF (b), namely:[F (x)] b a =0 if F (a) and F (b) are of the same sign[F (x)] b a =1 if F (a) < 0, F (b) > 0[F (x)] b a = ;1 if F (a) > 0, F (b) < 0Table 1


28 I. R. <strong>Shafarevich</strong>Thus, the characteristics of the polynomial F (x) on the given segment isdeterminedby its signs at the endpoints of the segment and so it can be evaluatedeasily, although by the denition it is connected with its roots which are ususallyhard to nd.Our situation can be visually demonstrated as if a passenger is travelling,crossing several times the border between two states, say France and Germany.What is the dierence between the number of crossings the border from France toGermany and from Germany toFrance? Obviously, it is equal to 0 if the passengerstarted and nished his travel in the same state it is equal to 1 if he started inFrance and nished in Germany and to ;1 if he started in Germany and nished inFrance. His itinerary can be demonstrated as a line similar to the graph in Fig. 3,where France is the area below thex-axis and Germany isabove.Consider now two polynomials, f and g, and assume that, rst of all, theyhave no common roots, and, secondly, that the former (i.e., f) doesnotvanish atx = a, nor at x = b. The characteristics of the polynomial f with respect to thepolynomial g on the segment [a b] is the dierence between the number of roots ofthe polynomial f contained in the segment [a b] and being roots with increasing ofthe polynomial fg and the number of its roots being roots with decreasing for fg.The characteristics is denoted by (fg) b a .The main example, which was the reason to introduce this notion is given bythe following proposition.THEOREM 1. If a polynomial f(x) has no multiple roots and neither it norits derivative vanishes at the endpoints a and b of the segment [a b], then thecharacteristics (ff 0 ) b a is equal to the number of roots of the polynomial f containedin the segment [a b].The theorem is an easy consequence of Corollary of Theorem 4, Section 3. Wesimply state that all the roots of the polynomial f(x) are roots with increasing of thepolynomial ff 0 . Really, according to Theorem 5 of Chapter II, the polynomials fand f 0 have nocommonroots. If is a root of the polynomial f(x) with f 0 () > 0,then according to Theorem 4 of Section 3 is a root with increasing for f(x), andso also for f(x)f 0 (x), since f 0 (x) > 0 in a neighbourhood of . If, on the otherhand, f 0 () < 0, then is a root with decreasing for f(x), and so again a root withincreasing for f(x)f 0 (x), since f 0 (x) < 0inaneighbourhood of .The characteristics (fg) b a is in fact an expression which canbeevaluated usingdivision with remainder. Note rst the following simple properties:a) (f;g) =;(fg).This is obvious since when multiplying the polynomial g by ;1, the roots withincreasing and the roots with decreasing of the polynomial fg interchange.b) If g(a) 6= 0 and g(b) 6= 0, then (fg) b a +(g f) b a =[fg] b a.This is also obvious since, by the assumption, the polynomials f and g haveno common roots. Hence, the roots of the polynomial fg split into the roots of thepolynomial f and those of the polynomial g. The number of roots with increasing(and similarly for roots with decreasing) of the polynomial fg is equal to the sum


Selected chapters from algebra 29of the numbers of such roots of the polynomial f and of the polynomial g, whichgives us the equality b).c) If polynomials g and h take the same values at the roots of a polynomial f(i.e., if g() =h() whenever f() = 0), then(fg) b a =(fh) b a:Really, if g() = h(), then a root of the polynomial f(x) is at the sametime a root with increasing (decreasing) for the polynomials fg and fh.d) If a polynomial f is divisible by a polynomial g, then(fg) b a =[fg]b a :Really, the polynomial g has no roots, since its roots would be common rootsfor the polynomials f and g. Therefore, (g f) b a = 0 and from the property b) itfollows that (fg) b a =[fg] b a.We shall describe now the process of evaluating the characteristics (fg) b a .Divide f by g with remainder:(2) f = gq + r:According to property b), we have (fg) b a = ;(g f) b a +[fg] b a. On the other hand, itfollows from relation (2) that f() =r() whenever g() =0. Hence, by propertyc) we obtain that (fg) b a =(g r)b a . The obtained equalities together show that(3) (fg) b a = ;(g r) b a +[fg] b a:As a matter of fact, relation (3) solves our problem, since it reduces the evaluationof the characteristics (fg) b a to the evaluation of the characteristics (g r)b a for thepolynomials g and r of smaller degree, because the expression [fg] b a is determinedby the values of the polynomials f and g at the endpoints a and b of the segment[a b] (see Table 1).Our process of passing from the pair (fg) to a pair of polynomials with smallerdegree is the same as in the process of determining the greatest common divisorof the polynomials f and g. In such a case the characteristics is determined byproperty d).We intend to improve our result in two directions. Firstly, we shall presentin a unied form the nal answer which can be obtained after passing from thepair (fg) to (g r) and then executing all the divisions in the consecutive stepsof the Euclid's algorithm. Secondly, our inductive reasoning needs that conditionsimposed on the polynomials f and g (f(a) 6= 0, f(b) 6= 0) are then imposed tothe polynomials g, r etc. We shall show how one can get rid of these additionalrestrictions.First of all, we shall transform a bit the answer we have obtained (formula (3)).We start with changing the notation. The polynomial f will be denoted by f 1 , g byf 2 and ;r by f 3 . Taking into account condition a) of the characteristics, formula(3) obtains the form(4) (f 1 f 2 ) b a =(f 2 f 3 ) b a +[f 1 f 2 ] b a


30 I. R. <strong>Shafarevich</strong>and the formula of division with remainder (formula (2)) the formf 1 = f 2 q 1 ; f 3(we have denoted here q by q 1 ). Now itisclearhow to apply formula (4), reducingdegrees of polynomials considered. Starting from f 1 and f 2 dene polynomials f iby induction:(5) f i;1 = f i q i;1 ; f i+1 where the degree of f i+1 is smaller than the degree of f i (assuming that f i;1 andf i are already dened). Clearly, f i;1 are just those plynomials which appear as remaindersin the Euclid's algorithm, only with the changed signs. After several stepswe come to a polynomial f k ,dieringeventually only by sign with the gcd(f 1 f 2 ).Applying formula (4) to f 2 and f 3 instead to f 1 and f 2 , we obtain that(f 2 f 3 ) b a = (f 3f 4 ) b a +[f 2f 3 ] b a . Substituting this value for (f 2f 3 ) b a into formula(4), we get(f 1 f 2 ) b a =(f 3 f 4 ) b a +[f 1 f 2 ] b a +[f 2 f 3 ] b a:Repeating this process k times and noting that [f k f k+1 ] b a = 0 as a result we obtain:(6) (f 1 f 2 ) b a =[f 1f 2 ] b a +[f 2f 3 ] b a + +[f k;1f k ] b a :However, in order that we have the right to apply formula (4), we have to assumethat f i (a) 6= 0,f i (b) 6= 0 for all i =1 2 ...k.Consider carefully the expression [fg] b a which canbeevaluated using Table 1for F = fg. In our case it can be rewritten as8>< 0 if f(a)g(a) > 0andf(b)g(b) > 0, or f(a)g(a) < 0andf(b)g(b) < 0,[fg] b a = 1 if f(a)g(a) < 0andf(b)g(b) > 0,>:;1 if f(a)g(a) > 0andf(b)g(b) < 0.Table 2If two numbers A and B, distinct from 0, are given, then one says that in thepair (A B) there exists one change of sign if A and B are of opposite signs, andthat there is no change of sign if they are of the same sign. Using this terminology,one can reformulate information of Table 2, denoting by n the number of changesof sign in the pair (f(a)f(b)) and by m the number of changes of sign in the pair(f(b)g(b)). Table 2 obtains the form:[fg] b a m n0 0 00 1 11 1 0;1 0 1We see that in all the cases we have [fg] b a = m ; n.


Selected chapters from algebra 31We shall apply now the last remark to formula (6). Denote by m i the numberof changes of sign in the pair (f i (a)f i+1 (a)), and by n i the number of changesof sign in the pair (f i (b)f i+1 (b)). As a consequence of the remark, formula (6)obtains the form(7) (f 1 f 2 ) b a = m 1 ; n 1 + m 2 ; n 2 + + m k ; n k :What is the meaning of the number m 1 + m 2 + + m k ? One has just to writedown the numbers f 1 (a), f 2 (a), . . . , f k (a) and nd out how many changes of signare there in this sequence|the number of these changes will be m 1 +m 2 ++m k .In general, if a sequnce of numbers A 1 , ... , A k , distinct from 0, is given, then bythe number of changes of sign in this sequence, we shall mean the number of placeswhere numbers of opposite signs stay. For example, in the sequence 1, ;1, 2, 1, 3,;2 there are 3 changes of sign. We cansaythatm 1 +m 2 ++m k is the number ofchanges of sign in the sequence f 1 (a), f 2 (a), ... , f k (a), and that n 1 + n 2 + + n kis the number of changes of sign in the sequence f 1 (b), f 2 (b), ... , f k (b). Formula(7) can be interpreted now in the following way:THEOREM 2. If none of the terms f 1 , ... , f k of Sturm's sequence of polynomialsf 1 , f 2 vanishes, either in a, or in b, and the polynomials f 1 , f 2 have nocommon roots, then the characteristics (fg) b a is equal to the dierence between thenumbers of changes of sign in the sequences of values of polynomials in Sturm'ssequence atthepoints a and b.We have now to get rid of the restrictions f i (a) 6= 0,f i (b) 6= 0 for i =1 ...k,which can be uncomfortable in applications: we shall assume just that f 1 (a) 6= 0and f 1 (b) 6= 0. In order to do that we have togeneralize a bit the notion of thenumber of changes of sign. If some of the terms in the sequence A 1 , ... , A k areequal to 0, then the number of changes of sign in it is dened as the number ofchanges of sign in the sequence which is obtained by deleting all the zeros in thegiven sequence. For example, deleting zeros in the sequence 1, 0, 2, ;1, 0, 3, 1, thesequence 1, 2, ;1, 3, 1 is obtained, and the latter has two changes of sign. Hence,the given sequence has two changes of sign, by denition.Denote now by " the distance from a to the nearest root (distinct from a) ofany of the polynomials f i (x). Thus, f i (x) 6= 0 for a


32 I. R. <strong>Shafarevich</strong>the sequence f 1 (a 0 ), ... , f k (a 0 ), as well as in the sequence f 1 (b 0 ), ... , f k (b 0 ), isdetermined by the Lemma. Thus, we obtain the wanted result:THEOREM 3. If polynomials f 1 and f 2 have no common roots, f 1 (a) 6= 0andf 1 (b) 6= 0, then the characteristics (f 1 f 2 ) b a is equal to the dierence between thenumbers of changes of sign in the sequence f 1 (a), ... , f k (a) and in the sequencef 1 (b), ... , f k (b), where f 1 (x), ... , f k (x) is Sturm's sequence corresponding to thepair of polynomials f 1 , f 2 .We shall show now that the Lemma is valid. Consider, for example, the valuex = a. Suppose that f i (a) =0for some i =1 ...k. By the assumption, i 6= 1since f 1 (a) 6= 0. Also, i 6= k since the polynomial f k (x) can dier from gcd(f 1 f 2 )only by sign and so it is a number distinct from 0. Note that then f i;1(a) 6= 0and f i+1 (a) 6= 0. Really, if we had,for example, f i (a) =0,f i+1 (a) =0, then itwould follow from formula (5) that f i;1(a) = 0. In exactly the same way, thiswould imply that f i;2(a) = 0 etc., and nally f 1 (a) = 0, which would contradictthe original assumption. But we can say even more|not only that the numbersf i;1(a) and f i+1 (a) are distinct from 0, but they have opposite signs|it followsimmediately by substituting x = a into equality (5) and taking into account theassumption that f i (a) =0.Compare now the sequences f 1 (a), ... , f k (a) and f 1 (a 0 ), ... , f k (a 0 ). Letf i (a) = 0. Then, as we have seen, f i;1(a) 6= 0 and f i+1 (a) 6= 0, and f i;1(a)and f i+1 (a) have opposite signs. But then f i;1(a 0 ) 6= 0 and f i+1 (a 0 ) 6= 0, andf i;1(a 0 ) has the same sign as f i;1(a), while f i+1 (a 0 ) has the same sign as f i+1 (a).This follows from the fact that the polynomials f i;1 and f i+1 have no roots in thesegment [a a 0 ], and so (by Bolzano's theorem) they can have novalues of oppositesigns. Write down the respective parts of our sequences. Suppose that f i;1(a) > 0.Then we obtain the following table:f i;1(x) f i (x) f i+1 (x)x = a + 0 ;x = a 0 + ? ;The characteristics (f 1 f 2 ) b0 a0 depends on the number of changes of sign in the lowestrow. But we see that it coincides with the number of changes of sign in the rowabove it|whatever the unkown sign, denoted by ?, is, there will be exactly onechange of sign in each of the rows. The case when f i;1(a) < 0 can be treatedexactly in the same way. The Lemma is proved.Combining Theorem 3 with Theorem 1 we obtain the basic result:THEOREM 4. (Sturm's Theorem) If a polynomial f(x) has no multiple rootsand does not vanish for x = a and x = b, then the number of its roots in the segment[a b] is equal to the dierence between the number of changes of sign of the valuesof polynomials in the Sturm's sequence, formed for the polynomials f(x) and f 0 (x)at x = a and x = b.


Selected chapters from algebra 33One has only to note that the lack ofmultiple roots of the polynomial f(x) isequivalent to the lack of common roots of the polynomials f(x) andf 0 (x)|this isjust the assertion of Theorem 5 of Chapter II. Therefore we can apply Theorem 1to the polynomial f(x) and then Theorem 3 to the pair of polynomials f(x) andf 0 (x).Sturm's theorem gives a possibility to answer the basic questions about distributionof roots of a polynomial. First of all, using the theorem, the number ofroots can be determined. In order to do that, it is enough to remember Theorem 3of Section 3, which indicates a number N such that all the roots of the polynomiallie between ;N and N. After that it is sucient to apply Sturm's theorem to thesegment [;NN]. However, it is remarkable that in order to determine the numberof roots it is neither necessary to evaluate the number N (using Theorem 3), norto evaluate the values of polynomials in Sturm's sequence for x = ;N and x = N.Really, for applying Sturm's theorem it is not necessary to know the values f i (N)themselves, but only their signs. That is why it is sucient tochoose a number Nlarge enough, such that the segment [;NN] contains not only all the roots of thepolynomial f 1 (x), but also all the roots of all the polynomials f i (x) of the Sturm'ssequence (i.e., we can choose a respective number N i for each polynomial f i (x) andtake for N the largest of them). According to Corollary 1 of Theorem 3, Section 3,the sign of the value f i (N), resp. f i (;N), coincides with the sign of the leadingterm of the polynomial f i (x) for x = N, resp. x = ;N. They are determined bythe sign of the leading coecient ofthepolynomialf i (x) andby the parity ofitsdegree. Therefore, there is no need to evaluate N and the values f i (N) andf i (;N).When the number of roots is determined, it is possible to indicate segments,each of which contains exactly one root. In order to do that it is already necessaryto evaluate the number N, indicated in Theorem 3 of Section 3. After that thesegment [;NN] is divided into two equal parts and using Sturm's theorem thenumber of roots in each part is found. Then the same is done with the segments[;N0] and [0N] and the process is continued till each of the segments containsonly one root.If it is known that a segment [a b] contains exactly one root of the polynomialf(x) and the polynomial has no multiple roots, then the values f(a) andf(b) mustbe of opposite signs. Really, if the root is equal to , then, according to Theorem4 of Section 3, for " small enough, the values f( ; ") andf( + ") have the samesign. But f( ; ") andf(a) have to be of the same sign|otherwise the polynomialwould have one more root in the segment [ ; " ]. The same is true for the valuesf( + ") and f(b). Thus, f(a) has the same sign as f( ; "), f(b) the same asf( + "), and f( ; ") and f( + ") have opposite signs. Hence, f(a) andf(b) haveopposite signs. Knowing that, it is possible to evaluate the root with arbitrarylevel of accuracy. It is sucient to divide the segment [a b] into two parts by apoint c and evaluate f(c). Either f(a) and f(c), or f(c) and f(b) have oppositesigns. In the former case is contained in the segment [a c], and in the latter|inthe segment[c b]. After that wecontinue the process with the segmentcontaining until we include in a segment of arbitrary small length. This means that we haveevaluated it with arbitrary level of accuracy.


34 I. R. <strong>Shafarevich</strong>Consider, for example, the polynomial f(x) = x 3 +3x ; 1. Applying thecriterion from Section 3, we have toevaluate the expression 4a 3 +27b 2 =427+27.Since it is positive, the polynomial has one root. Applying Theorem 3 of Section 3,we nd the value N = 3. Therefore, the root is contained between ;3 and 3,where f(;3) < 0, f(3) > 0. Since f(0) < 0, the root is contained between 0and 3. Since f(1) = 3 and f(2) = 13, the root is contained between 0 and 1. Inorder to nd its rst decimal, we have to determine in which of the 10 segments(between 0 and 1/10, 1/10 and 2/10, . . . , 9/10 and 1) it lies. Put rst x =1=2,then f(x) =5=8. Since f(0) and f(1=2) are of opposite signs, the root is containedbetween 0 and 1/2. Put now x =3=10. Since f( 3 27)= + 9 27;1= ; 1 < 0,10 1000 10 1000 10the root is contained between 3/10 and 5/10. Finally, f( 4 64)= + 12 ; 1 > 0.10 1000 10Hence, the root lies between 3/10 and 4/10 and it has the form =03... .Since Sturm's theorem has an elegant formulation and a lot of applications,it became widely known immediately after it had been proved. Jacques Sturm, aFrench mathematician who had proved it, when teaching about the theorem in hislectures, used to say: \Now I will prove a theorem, the name of which Ihave thehonor to bare".Problems1. Construct Sturm's sequence for the polynomials f(x) andf 0 (x) iff(x) =x 2 + ax + b or f(x) =x 3 + ax + b. Using Sturm's theorem deduce again the resultsabout the numbers of roots of these polynomials, obtained already at the end ofSection 3. Hint. In the case of f(x) = x 3 + ax + b consider separately dierentcases of possible signs for a and D =4a 3 +27b 2 .2. Determine, using Sturm's theorem, the number of roots of the polynomialx n + ax + b, depending on n (more precisely, on its parity), a and b.3. Find the number of roots of the polynomial x 5 ; 5ax 3 +5a 2 x +2b. Hint.The answer depends on the sign of the expression a 5 ; b 9 .4. Let a be a root of the derivative f 0 (x) of a polynomial f(x). Put f 1 (x) =f(x), f 2 (x) =f 0 (x)=(x;a). Let f(x) has no multiple roots, and f 1 (x),... ,f k (x) isSturm's sequence for the polynomials f 1 (x) andf 2 (x). Express the number of rootsof the polynomial f(x) in terms of the number of changes of sign in the sequencesf i (N), f i (a) andf i (;N), i =1 ...k where N is a suciently large number.5. Let two polynomials f 1 and f 2 be given, with degrees n and n ; 1, respectively,and suppose that in their Sturm's sequence the degree of the polynomialf i (x) isn ; i + 1 and its leading coecient is positive. Prove that the polynomialf 1 (x) hasn roots. Moreover, each of the polynomials f i (x) hasn ; i + 1 roots, andbetween each two adjacent roots of the polynomial f i (x) there lies a root of thepolynomial f i+1 (x).6. Let a polynomial f(x) of degree n has n roots. Prove that in the Sturm'ssequence (for the polynomials f and f 0 ) each polynomial has the degree whichis smaller exactly by 1 than the degree of the previous one, and all the leadingcoecients are positive. Prove that these conditions are sucient in order that apolynomial of degree n has n roots.I. R. <strong>Shafarevich</strong>, Russian Academy of Sciences, Moscow, Russia

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!