Nonlinear Mechanics - Physics at Oregon State University
Nonlinear Mechanics

A. W. Stetz

January 8, 2012
Contents

1 Lagrangian Dynamics                                           5
  1.1 Introduction                                              5
  1.2 Generalized Coordinates and the Lagrangian                6
  1.3 Virtual Work and Generalized Force                        8
  1.4 Conservative Forces and the Lagrangian                   10
      1.4.1 The Central Force Problem in a Plane               11
  1.5 The Hamiltonian Formulation                              13
      1.5.1 The Spherical Pendulum                              15

2 Canonical Transformations                                    17
  2.1 Contact Transformations                                  17
      2.1.1 The Harmonic Oscillator: Cracking a Peanut with a Sledgehammer  20
  2.2 The Second Generating Function                           21
  2.3 Hamilton's Principle Function                            22
      2.3.1 The Harmonic Oscillator: Again                     24
  2.4 Hamilton's Characteristic Function                       25
      2.4.1 Examples                                           26
  2.5 Action-Angle Variables                                   27
      2.5.1 The harmonic oscillator (for the last time)        29

3 Abstract Transformation Theory                               33
  3.1 Notation                                                 33
      3.1.1 Poisson Brackets                                   35
  3.2 Geometry in n Dimensions: The Hairy Ball                 38
      3.2.1 Example: Uncoupled Oscillators                     41
      3.2.2 Example: A Particle in a Box                       43

4 Canonical Perturbation Theory                                45
  4.1 One-Dimensional Systems                                  45
      4.1.1 Summary                                            49
      4.1.2 The simple pendulum                                49
  4.2 Many Degrees of Freedom                                  51

5 Introduction to Chaos                                        55
  5.1 The total failure of perturbation theory                 56
  5.2 Fixed points and linearization                           58
  5.3 The Hénon oscillator                                     62
  5.4 Discrete Maps                                            68
  5.5 Linearized Maps                                          70
  5.6 Lyapunov Exponents                                       72
  5.7 The Poincaré-Birkhoff Theorem                            74
  5.8 All in a tangle                                          77
  5.9 The KAM theorem and its consequences                     80
      5.9.1 Two Conditions                                     81
  5.10 Conclusion                                              83
Chapter 1

Lagrangian Dynamics

1.1 Introduction
The possibility that deterministic mechanical systems could exhibit the behavior we now call chaos was first realized by the French mathematician Henri Poincaré sometime toward the end of the nineteenth century. His discovery emerged from analytic or classical mechanics, which is still part of the foundation of physics. To put it a bit facetiously, classical mechanics deals with those problems that can be "solved," in the sense that it is possible to derive equations of motion that describe the positions of the various parts of a system as functions of time using standard analytic functions. Nonlinear dynamics treats problems that cannot be so solved, and it is only in these problems that chaos can appear. The simple pendulum makes a good example. The differential equation of motion is

    \ddot{\theta} + \omega^2 \sin\theta = 0.    (1.1)

The sine is a nonlinear function of θ. If we linearize by setting sin θ ≈ θ, the solutions are elementary functions, sin ωt and cos ωt. If we keep the sine, the solutions can only be expressed in terms of elliptic integrals. This is not a chaotic system, because there is only one degree of freedom, but if we hang one pendulum from the end of another, the equations of motion are hopeless to solve (even with elliptic integrals) and the resulting motion can be chaotic.^1

^1 I should emphasize the distinction between the differential equations of motion, which are usually simple (though nonlinear), and the equations that describe the positions of the elements of the system as functions of time, which are usually non-existent.
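The gap between the linearized and the full pendulum is easy to see numerically. The following sketch is mine, not the text's: it integrates both versions of (1.1) with an assumed ω = 1 and a large initial amplitude, and compares the exact nonlinear period, which involves the complete elliptic integral K (SciPy's `ellipk` takes the parameter m = k²).

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.special import ellipk

omega = 1.0    # assumed frequency (omega^2 = g/L for a real pendulum)
theta0 = 2.0   # large initial amplitude in radians, released from rest

# Equation (1.1) and its linearization, written as first-order systems
full = lambda t, y: [y[1], -omega**2 * np.sin(y[0])]
lin  = lambda t, y: [y[1], -omega**2 * y[0]]

t = np.linspace(0.0, 10.0, 500)
sf = solve_ivp(full, (0.0, 10.0), [theta0, 0.0], t_eval=t, rtol=1e-9)
sl = solve_ivp(lin,  (0.0, 10.0), [theta0, 0.0], t_eval=t, rtol=1e-9)

# Exact period of the full pendulum via the complete elliptic integral
T_full = 4.0 / omega * ellipk(np.sin(theta0 / 2.0) ** 2)
T_lin = 2.0 * np.pi / omega

print(T_full, T_lin)                       # the nonlinear period is longer
print(np.max(np.abs(sf.y[0] - sl.y[0])))   # the two solutions drift apart
```

The amplitude-dependent period is exactly what "solutions expressible only in elliptic integrals" means in practice: the linearized solution is a fine approximation for small swings and a poor one here.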
5
6 CHAPTER 1. LAGRANGIAN DYNAMICS<br />
In order to arrive at Poincaré's moment of discovery, we will have to review the development of classical mechanics through the nineteenth century. This material is found in many standard texts, but I will cover it here in some detail. This is partly to ensure uniform notation throughout these lectures and partly to focus on those things that lead directly to chaos in nonlinear systems. We will begin by formulating mechanics in terms of generalized coordinates and Lagrange's equations of motion. We then study Legendre transformations and use them to derive Hamilton's equations of motion. These equations are particularly suited to conservative systems in which the Hamiltonian is constant in time, and it is such systems that will be our primary concern. It turns out that canonical transformations can be used to transform Hamiltonians in a myriad of ways. One particularly elegant form uses action-angle variables to transform a certain class of problems into a set of uncoupled harmonic oscillators. Systems that can be so transformed are said to be integrable, which is to say that they can be "solved," at least in principle. What happens, Poincaré asked, to a system that is almost but not quite integrable? The answer entails perturbation theory and leads to the disastrous problem of small divisors. This is the path that led originally to the discovery of chaos, and it is the one we will pursue here.
1.2 Generalized Coordin<strong>at</strong>es and the Lagrangian<br />
Vector equ<strong>at</strong>ions, like F = ma, seem to imply a coordin<strong>at</strong>e system. Beginning<br />
students learn to use cartesian coordin<strong>at</strong>es and then learn th<strong>at</strong> this<br />
is not always the best choice. If the system has cylindrical symmetry, for<br />
example, it is best to use cylindrical coordin<strong>at</strong>es: it makes the problem easier.<br />
By “symmetry” we mean th<strong>at</strong> the number of degrees of freedom of the<br />
system is less th<strong>at</strong> the dimensionality of the space in which it is imbedded.<br />
The familiar example of the block sliding down the incline plane will make<br />
this clear. Let’s say th<strong>at</strong> it’s a two dimensional problem with an x-y coordin<strong>at</strong>e<br />
system. The block is constrained to move in a straight line, however,<br />
so th<strong>at</strong> its position can be completely specified by one variable, i.e. it has<br />
one degree of freedom. The clever student chooses the x axis so th<strong>at</strong> it lies<br />
along the p<strong>at</strong>h of the block. This reduces the problem to one dimension,<br />
since y = 0 and the x coordin<strong>at</strong>e is given by one simple equ<strong>at</strong>ion. In the<br />
pendulum example from the previous section, it was most convenient to use<br />
a polar coordin<strong>at</strong>e system centered <strong>at</strong> the pivot. Since r is constant, the<br />
motion can be described completely in terms of θ.
These coordinate systems conceal a subtle point: the pendulum moves in a circular arc and the block moves in a straight line because they are acted on by forces of constraint. In most cases we are not interested in these forces. Our choice of coordinates simply makes them disappear from the problem. Most problems don't have obvious symmetries, however. Consider a bead sliding along a wire that follows some complicated snaky path in 3-d space. There's only one degree of freedom, since the particle's position is determined entirely by its distance, measured along the wire, from some reference point. The forces are so complicated, however, that it is out of the question to solve the problem by using F = ma in any straightforward way. This is the problem that Lagrangian mechanics is designed to handle. The basic (and quite profound) idea is that even though there may be no coordinate system (in the usual sense) that will reduce the dimensionality of the problem, there is usually a system of coordinates that will do this. Such coordinates are called generalized coordinates.
To be more specific, suppose that a system consists of N point masses with positions specified by ordinary three-dimensional cartesian vectors, r_i, i = 1 ··· N, subject to some constraints. The easiest constraints to deal with are those that can be expressed as a set of l equations of the form

    f_j(r_1, r_2, \ldots, r_N, t) = 0,    (1.2)

where j = 1 ··· l. Such constraints are said to be holonomic. If, in addition, the equations of constraint do not involve time explicitly, they are said to be scleronomous; otherwise they are called rheonomous. These constraints can be used to reduce the 3N cartesian components to a set of 3N − l variables q_1, q_2, …, q_{3N−l}. The relationship between the two is given by a set of N equations of the form

    r_i = r_i(q_1, q_2, \ldots, q_{3N-l}, t).    (1.3)

The q's used in this way are the generalized coordinates. In the example of the bead on a curved wire, the equations would reduce to r = r(q), where q is a distance measured along the wire. This simply specifies the curvature of the wire.

It should be noted that the q's need not all have the same units. Also note that we can use the same notation even if there are no constraints. For example, the position of an unconstrained particle could be written r = r(q_1, q_2, q_3), and the q's might represent cartesian, spherical, or cylindrical coordinates. In order to simplify the notation, we will often pack the q's
into an array and use vector notation,

    \mathbf{q} = \begin{pmatrix} q_1 \\ q_2 \\ q_3 \\ \vdots \end{pmatrix}.    (1.4)
This is not meant to imply that q is a vector in the usual sense. For one thing, it does not necessarily possess "a magnitude and a direction" as good vectors are supposed to have. By the same token, we cannot use the notion of orthogonal unit vectors.

Along with the notion of generalized coordinates comes that of generalized velocities,

    \dot{q}_k \equiv \frac{dq_k}{dt}.    (1.5)

Since q_k depends only on t, this is a total derivative, but when we differentiate r_i, we must remember that it depends both explicitly on time as well as implicitly through the q's:

    \dot{r}_i = \sum_k \frac{\partial r_i}{\partial q_k} \dot{q}_k + \frac{\partial r_i}{\partial t}    (1.6)

(In this chapter I will consistently use the index i to sum over the N point masses and k to sum over the 3N − l degrees of freedom.) Differentiating both sides with respect to \dot{q}_k yields

    \frac{\partial \dot{r}_i}{\partial \dot{q}_k} = \frac{\partial r_i}{\partial q_k},    (1.7)

which will be useful in the following derivations.

1.3 Virtual Work and Generalized Force
There are several routes for deriving Lagrange's equations of motion. The most elegant and general makes use of the principle of least action and the calculus of variations. I will use a much more pedestrian approach based on Newton's second law of motion. First note that F = ma can be written in the rather arcane form

    \frac{d}{dt}\left(\frac{\partial T}{\partial v_i}\right) = F_i,    (1.8)

where F_i is the i-th component of the total force acting on a particle with kinetic energy T. The point of writing this in terms of energy rather than acceleration is that we can separate out the forces of constraint, which are always perpendicular to the direction of motion and hence do no work. The trick is to write this in terms of generalized coordinates and velocities. This is rather technical, but the underlying idea is simple, and the result looks much like (1.8).
The q_k's are all independent, so we can vary one by a small amount δq_k while holding all the others constant:

    \delta r_i = \sum_k \frac{\partial r_i}{\partial q_k}\, \delta q_k    (1.9)

This is sometimes called a virtual displacement. The corresponding virtual work is

    \delta W_k = \sum_i \left( F_i \cdot \frac{\partial r_i}{\partial q_k} \right) \delta q_k.    (1.10)

We define a generalized force

    \Im_k = \sum_i F_i \cdot \frac{\partial r_i}{\partial q_k}.    (1.11)

The forces of constraint can be excluded from the sum for the reason explained above. We are left with

    \Im_k = \frac{\delta W_k}{\delta q_k}.    (1.12)

The kinetic energy is calculated using ordinary velocities:

    T = \frac{1}{2} \sum_i m_i\, \dot{r}_i \cdot \dot{r}_i    (1.13)

    \frac{\partial T}{\partial q_k} = \sum_i m_i\, \dot{r}_i \cdot \frac{\partial \dot{r}_i}{\partial q_k} = \sum_i p_i \cdot \frac{\partial \dot{r}_i}{\partial q_k}    (1.14)

    \frac{\partial T}{\partial \dot{q}_k} = \sum_i m_i\, \dot{r}_i \cdot \frac{\partial \dot{r}_i}{\partial \dot{q}_k} = \sum_i p_i \cdot \frac{\partial r_i}{\partial q_k}    (1.15)

Equation (1.7) was used to obtain the last term. A straightforward calculation now leads to

    \Im_k = \frac{d}{dt}\left( \frac{\partial T}{\partial \dot{q}_k} \right) - \frac{\partial T}{\partial q_k},    (1.16)

which is the generalized form of (1.8).
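As a quick check of (1.16), a computer algebra system can grind out the right-hand side for a free particle in plane polar coordinates. This sketch in SymPy is my own example; the symbol names are not from the text.

```python
import sympy as sp

t, m = sp.symbols("t m", positive=True)
r = sp.Function("r")(t)
phi = sp.Function("phi")(t)

# Kinetic energy of a particle in a plane, in polar coordinates
T = sp.Rational(1, 2) * m * (r.diff(t)**2 + r**2 * phi.diff(t)**2)

def gen_force(T, qk):
    # Right-hand side of (1.16): d/dt(dT/d(qk_dot)) - dT/d(qk)
    qk_dot = qk.diff(t)
    return sp.simplify(sp.diff(sp.diff(T, qk_dot), t) - sp.diff(T, qk))

print(gen_force(T, r))    # radial component: m*r'' - m*r*phi'^2
print(gen_force(T, phi))  # angular component: d/dt(m*r^2*phi')
```

The radial result reproduces mass times radial acceleration including the centripetal term, and the angular result is the time derivative of m r² φ̇, foreshadowing the angular momentum conservation of Section 1.4.1.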
1.4 Conservative Forces and the Lagrangian

So far we have made no assumptions about the nature of the forces included in ℑ except that they are not forces of constraint. Equation (1.16) is therefore quite general, although seldom used in this form. In these notes we are primarily concerned with conservative forces, i.e. forces that can be derived from a potential,

    F_i = -\nabla_i V(r_1 \cdots r_N).    (1.17)

Notice that V doesn't depend on velocity. (Electromagnetic forces are velocity dependent, of course, but they can easily be accommodated into the Lagrangian framework. I will return to this issue later on.) Now calculate the work done by changing some of the q's:
    W = \sum_i \int F_i \cdot dr_i = -\sum_i \int \nabla_i V \cdot dr_i
      = -\sum_i \int \nabla_i V \cdot \sum_k \frac{\partial r_i}{\partial q_k}\, dq_k
      = -\sum_k \int \left( \sum_i \nabla_i V \cdot \frac{\partial r_i}{\partial q_k} \right) dq_k
      = -\sum_k \int \frac{\partial V}{\partial q_k}\, dq_k    (1.18)

The integral is a multidimensional definite integral over the various q's that have changed. Summing over (1.12) then gives

    \delta W = \sum_k \delta W_k = \sum_k \Im_k\, \delta q_k    (1.19)

    W = \sum_k \int \Im_k\, dq_k    (1.20)

Comparison with (1.18) yields

    \Im_k = -\frac{\partial V}{\partial q_k}.    (1.21)

Finally define the Lagrangian

    L = T - V.    (1.22)
Equation (1.16) becomes

    \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}_k} \right) - \frac{\partial L}{\partial q_k} = 0.    (1.23)
Equation (1.23) represents a set of 3N − l second-order ordinary differential equations called Lagrange's equations of motion. I can summarize this long development by giving you a "cookbook" procedure for using (1.23) to solve mechanics problems: First select a convenient set of generalized coordinates. Then calculate T and V in the usual way using the r_i's. Use equation (1.3) to eliminate the r_i's in favor of the q_k's. Finally, substitute L into (1.23) and solve the resulting equations.

Classical mechanics texts are full of examples in which this program is carried to a successful conclusion. In fact, most of these problems are contrived and of little interest except to illustrate the method. The vast majority of systems lead to differential equations that cannot be solved in closed form. The modern emphasis is to understand the solutions qualitatively and then obtain numerical solutions using the computer. The Hamiltonian formalism described in the next section is better suited to both these ends.
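The cookbook procedure itself can be automated with a computer algebra system. The following SymPy sketch carries it out for the simple pendulum of Section 1.1; the identification ω² = g/R is the standard one, though it was not spelled out above.

```python
import sympy as sp

t, m, g, R = sp.symbols("t m g R", positive=True)
theta = sp.Function("theta")(t)

# Generalized coordinate: theta. For a pendulum of length R,
# T = (1/2) m R^2 theta_dot^2 and (dropping a constant) V = -m g R cos(theta).
T = sp.Rational(1, 2) * m * R**2 * sp.diff(theta, t)**2
V = -m * g * R * sp.cos(theta)
L = T - V

# Lagrange's equation (1.23): d/dt(dL/d(theta_dot)) - dL/d(theta) = 0
thetadot = sp.diff(theta, t)
eq = sp.diff(sp.diff(L, thetadot), t) - sp.diff(L, theta)
eq = sp.simplify(eq / (m * R**2))
print(eq)  # theta'' + (g/R) sin(theta), i.e. (1.1) with omega^2 = g/R
```

Solving the resulting equation in closed form is, of course, where the program stalls; this is exactly the point at which one turns to qualitative analysis and numerics.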
1.4.1 The Central Force Problem in a Plane

Consider the central force problem as an example of this technique.

    V = V(r), \qquad F = -\nabla V    (1.24)

    L = T - V = \frac{1}{2} m \left( \dot{r}^2 + r^2 \dot{\varphi}^2 \right) - V(r)    (1.25)

Let's choose our generalized coordinates to be q_1 = r and q_2 = φ. Equation (1.23) becomes

    m\ddot{r} - mr\dot{\varphi}^2 + \frac{dV}{dr} = 0    (1.26)

    \frac{d}{dt}\left( mr^2 \dot{\varphi} \right) = 0    (1.27)

This last equation tells us that there is a quantity mr²φ̇ that does not change with time. Such a quantity is said to be conserved. In this case we have rediscovered the conservation of angular momentum. This reduces the problem to one dimension.

    mr^2 \dot{\varphi} \equiv l_z = \text{constant}    (1.28)

    m\ddot{r} = \frac{l_z^2}{mr^3} - \frac{dV}{dr}    (1.29)
Since there are no constraints, the generalized forces are identical with the ordinary forces,

    \Im_\varphi = -\frac{dV}{d\varphi} = 0, \qquad \Im_r = -\frac{dV}{dr}.    (1.30)

This equation has an elegant closed-form solution in the special case of gravitational attraction.

    V = -\frac{GmM}{r} \equiv -\frac{k}{r}    (1.31)

    m\ddot{r} = \frac{l_z^2}{mr^3} - \frac{k}{r^2}    (1.32)

This apparently nonlinear equation yields to a simple trick: let u = 1/r and use φ rather than t as the independent variable.

    \frac{d^2 u}{d\varphi^2} + u = \frac{mk}{l_z^2}    (1.33)

If the motion is circular, u is constant. Otherwise it oscillates around the value mk/l_z² with simple harmonic motion.^2 The period of oscillation is identical with the period of rotation, so the corresponding orbit is an ellipse.
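Equation (1.33) is just a harmonic oscillator with a constant driving term, which SymPy solves directly. In this sketch (my own, not from the text) the symbol c stands for mk/l_z².

```python
import sympy as sp

phi = sp.symbols("phi")
c = sp.symbols("c", positive=True)   # plays the role of m*k/l_z**2
u = sp.Function("u")

# Equation (1.33): u'' + u = c, with phi as the independent variable
sol = sp.dsolve(sp.Eq(u(phi).diff(phi, 2) + u(phi), c), u(phi))
print(sol.rhs)  # the constant c plus simple harmonic terms of period 2*pi
```

Since u = 1/r returns to the same value every time φ increases by 2π, the orbit closes after one revolution; this is the sense in which the period of oscillation matches the period of rotation.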
This problem was easy to solve because we were able to discover a nontrivial quantity that was constant, in this case the angular momentum. The constant enabled us to reduce the number of independent variables from two to one. Such a conserved quantity is called an integral of the motion or a constant of the motion. Obviously, the more such quantities one can find, the easier the problem. This raises two practical problems. First, how can we tell, perhaps from looking at the physics of a problem, how many independent conserved quantities there are? Second, how are we to find them?

In the central force problem, both of these questions answered themselves. We know that angular momentum is conserved. This fact manifests itself in the Lagrangian in that L depends on φ̇ but not on φ. Such a coordinate is said to be cyclic or ignorable. Let q be such a coordinate. Then

    \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}} \right) = 0.    (1.34)

The quantity in brackets has a special significance. It is called the canonically conjugate momentum.^3

    \frac{\partial L}{\partial \dot{q}_k} \equiv p_k    (1.35)

^2 This illustrates a general principle in physics: When correctly viewed, everything is a harmonic oscillator.
^3 This notation is universally used, hence the old aphorism that mechanics is a matter of minding your p's and q's.
To summarize: if q is cyclic, p is conserved.

Suppose we had tried to do the central force problem in cartesian coordinates. Both x and y would appear in the Lagrangian, and neither p_x nor p_y would be constant. If we insisted on this, central forces would remain an intractable problem in two dimensions. We need to choose our generalized coordinates so that there are as many cyclic variables as possible. The two questions reemerge: how many are we entitled to, and how do we find the corresponding p's and q's?

A partial answer to the first is given by a well-known result called Noether's theorem: For every transformation that leaves the Lagrangian invariant there is a constant of the motion.^4 This theorem (which underlies all of modern particle physics) says that there is a fundamental connection between symmetries and invariance principles on one hand and conservation laws on the other. Momentum is conserved because the laws of physics are invariant under translation. Angular momentum is conserved because the laws of physics are invariant under rotation. Despite its fundamental significance, Noether's theorem is not much help in practical calculations. Granted, it gives a procedure for finding the conserved quantity after the corresponding symmetry transformation has been found, but how is one to find the transformation? The physicist must rely on his traditional tools: inspiration, the Ouija board, and simply pounding one's head against a wall. The fact remains that there are simple systems, e.g. the Hénon-Heiles problem to be discussed later, that have fascinated physicists for decades and for which the existence of these transformations is still controversial.

I will have much more to say about the second question. As you will see, there is a more or less "cookbook" procedure for finding the right set of variables and some fundamental results about the sorts of problems for which these procedures are possible.
1.5 The Hamiltonian Formulation

I will explain the Hamiltonian assuming that there is only one degree of freedom. It's easy to generalize once the basic ideas are clear. Lagrangians are functions of q and q̇. We define a new function of q and p (with p given by (1.35)),

    H(p, q) = p\dot{q} - L(q, \dot{q}).    (1.36)

The new function is called the Hamiltonian, and the transformation L → H is called a Legendre transformation. The equation is much more subtle than it looks. In fact, it's worth several pages of explanation.

^4 See Finch and Hand for a simple proof and further discussion.
It's clear from elementary mechanics that q, q̇, and p can't all be independent variables, since p = mq̇. You might say that there are two ways of formulating Newton's second law: a (q, q̇) formulation, F = mq̈, and a (q, p) formulation, F = ṗ. The connection between q and its canonically conjugate momentum is usually more complicated than this, but there is still a (q, q̇) formulation, the Lagrangian, and a (q, p) formulation, the Hamiltonian. The Legendre transformation is a procedure for transforming the one formulation into the other. The key point is that it is invertible.^5 To see what this means, let's first assume that q, q̇, and p are all independent.

    H(q, \dot{q}, p) = p\dot{q} - L(q, \dot{q})    (1.37)

    dH = \left( p - \frac{\partial L}{\partial \dot{q}} \right) d\dot{q} + \dot{q}\, dp - \frac{\partial L}{\partial q}\, dq    (1.38)

What is the condition that H not depend on q̇?

    p(q, \dot{q}) = \frac{\partial L(q, \dot{q})}{\partial \dot{q}}    (1.39)

OK. This is the definition of p anyhow, so we're on the right track.

    dH = \dot{q}\, dp - \frac{\partial L}{\partial q}\, dq

    dH = \frac{\partial H}{\partial p}\, dp + \frac{\partial H}{\partial q}\, dq

Comparing these two expressions for dH term by term gives

    \dot{q}(q, p) = \frac{\partial H}{\partial p}    (1.40)

    -\frac{\partial L}{\partial q} = \frac{\partial H}{\partial q}    (1.41)

Combining (1.23), (1.39), and (1.41) gives the fourth major result,

    \dot{p}(q, p) = -\frac{\partial H}{\partial q}.    (1.42)

^5 The following argument is taken from Finch & Hand.
Now here's what I mean when I say that Legendre transformations are invertible. First follow the steps from L → H. We start with L = L(q, q̇). Equation (1.39) gives p = p(q, q̇). Invert this to find q̇ = q̇(q, p). The Hamiltonian is now

    H(q, p) = \dot{q}(q, p)\, p - L[q, \dot{q}(q, p)].    (1.43)

Now suppose that we start from H = H(q, p). Use (1.40) to find q̇ = q̇(q, p). Invert to find p = p(q, q̇). Finally,

    L(q, \dot{q}) = \dot{q}\, p(q, \dot{q}) - H[q, p(q, \dot{q})].    (1.44)

In both cases we were able to complete the transformation without knowing ahead of time the functional relationship among q, q̇, and p. To summarize: equations (1.37), (1.39), and (1.40) enable us to transform between the (q, q̇) (Lagrangian) prescription and the (q, p) (Hamiltonian) prescription, while (1.40) and (1.42) are Hamilton's equations of motion.
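The two-way procedure can be traced concretely for a harmonic oscillator. The example below is my own (the oscillator is not the text's), sketched in SymPy: compute p from (1.39), invert for q̇(q, p), and assemble H via (1.43).

```python
import sympy as sp

q, qdot, p, m, k = sp.symbols("q qdot p m k", positive=True)

# An assumed example Lagrangian: the harmonic oscillator
L = sp.Rational(1, 2) * m * qdot**2 - sp.Rational(1, 2) * k * q**2

# (1.39): p = dL/d(qdot); then invert to get qdot as a function of (q, p)
p_of_qdot = sp.diff(L, qdot)
qdot_of_p = sp.solve(sp.Eq(p, p_of_qdot), qdot)[0]

# (1.43): H(q, p) = qdot(q, p) * p - L[q, qdot(q, p)]
H = sp.simplify(qdot_of_p * p - L.subs(qdot, qdot_of_p))
print(H)  # p**2/(2*m) + k*q**2/2
```

Running the same steps in reverse, starting from this H and using (1.40), recovers the original L, which is the invertibility claimed above.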
1.5.1 The Spherical Pendulum

A mass m hangs from a string of length R. The string makes an angle θ with the vertical and can rotate about the vertical with an angle φ.

    T = \frac{1}{2} mR^2 \left( \dot{\theta}^2 + \sin^2\theta\, \dot{\varphi}^2 \right)    (1.45)

    V = mgR(1 - \cos\theta)    (1.46)

The mgR constant doesn't appear in the equations of motion, so we can forget about it. The Lagrangian is L = T − V as usual.

    p_\theta = \frac{\partial L}{\partial \dot{\theta}} = mR^2 \dot{\theta}    (1.47)

    p_\varphi = \frac{\partial L}{\partial \dot{\varphi}} = mR^2 \sin^2\theta\, \dot{\varphi} \equiv l_\varphi    (1.48)

The angle φ is cyclic, so p_φ = l_φ is constant. At this point we are still in the (q, q̇) prescription. Invert (1.47) and (1.48) to obtain θ̇ and φ̇ as functions of p_θ and l_φ.

    \dot{\theta} = p_\theta / mR^2    (1.49)

    \dot{\varphi} = l_\varphi / mR^2 \sin^2\theta    (1.50)

    H = \frac{p_\theta^2}{2mR^2} + \frac{l_\varphi^2}{2mR^2 \sin^2\theta} - mgR\cos\theta    (1.51)
The equations of motion follow from this.

    \dot{\theta} = \frac{\partial H}{\partial p_\theta} = \frac{p_\theta}{mR^2}    (1.52)

    \dot{p}_\theta = -\frac{\partial H}{\partial \theta} = \frac{l_\varphi^2 \cos\theta}{mR^2 \sin^3\theta} - mgR\sin\theta    (1.53)

    \dot{\varphi} = \frac{\partial H}{\partial p_\varphi} = \frac{l_\varphi}{mR^2 \sin^2\theta}    (1.54)

    \dot{p}_\varphi = 0    (1.55)
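Equations (1.52)-(1.55) are exactly what a numerical integrator wants. In the sketch below (parameter values are my own, chosen for illustration), p_φ is constant by (1.55), so only (θ, p_θ) need to be integrated, and conservation of H provides a consistency check.

```python
import numpy as np
from scipy.integrate import solve_ivp

m, g, R = 1.0, 9.8, 1.0   # assumed values
l_phi = 0.4               # conserved by (1.55)

def rhs(t, y):
    # y = [theta, p_theta]; phi decouples and follows from (1.54)
    theta, p_theta = y
    theta_dot = p_theta / (m * R**2)                       # (1.52)
    p_theta_dot = (l_phi**2 * np.cos(theta) / (m * R**2 * np.sin(theta)**3)
                   - m * g * R * np.sin(theta))            # (1.53)
    return [theta_dot, p_theta_dot]

def H(theta, p_theta):
    # The Hamiltonian (1.51), which should stay constant along the motion
    return (p_theta**2 / (2 * m * R**2)
            + l_phi**2 / (2 * m * R**2 * np.sin(theta)**2)
            - m * g * R * np.cos(theta))

y0 = [0.5, 0.0]
sol = solve_ivp(rhs, (0.0, 20.0), y0, rtol=1e-10, atol=1e-12)
E0, E1 = H(*y0), H(sol.y[0, -1], sol.y[1, -1])
print(abs(E1 - E0))  # energy drift of the integrator, very small
```

The centrifugal term in (1.53) keeps θ away from 0, so the integration is well behaved; the tiny energy drift is purely numerical.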
Suppose we were to try to find an analytic solution to this system of equations. First note that there are two constants of the motion, the angular momentum l_φ and the total energy E = H.

1. Invert (1.51) to obtain p_θ = p_θ(θ, E, l_φ).

2. Substitute p_θ into (1.49) and integrate:

       \int \frac{mR^2}{p_\theta}\, d\theta = t \equiv N(\theta)

   The integral is hopeless anyhow, so we label its output N(θ) (short for an exceedingly nasty function).

3. Invert the nasty function to find θ as a function of t.

4. Take the sine of this even nastier function and substitute it into (1.54) to find φ̇.

5. Integrate and invert to find φ as a function of t.

This makes sense in principle, but is wildly impossible in practice. Now suppose we could change the problem so that both θ and φ were cyclic, so that the two constants of motion were p_θ and p_φ (rather than E and p_φ). Then

    \dot{\theta} = \frac{\partial H}{\partial p_\theta} = \omega_\theta, \qquad \theta = \omega_\theta t + \theta_0

    \dot{\varphi} = \frac{\partial H}{\partial p_\varphi} = \omega_\varphi, \qquad \varphi = \omega_\varphi t + \varphi_0

Here ω_θ and ω_φ are two constant "frequencies" that we could easily extract from the Hamiltonian. This apparently small change makes the problem trivial! In both cases there are two constants of motion: it makes all the difference which two constants. This is the basis of the idea we will be pursuing in the next chapter.
Chapter 2

Canonical Transformations

We saw at the end of the last chapter that a problem in which all the generalized coordinates are cyclic is trivial to solve. We also saw that there is great flexibility allowed in the choice of coordinates for any particular problem. It turns out that there is an important class of problems for which it is possible to choose the coordinates so that they are in fact all cyclic. The choice is usually far from obvious, but there is a formal procedure for finding the "magic" variables. One formulates the problem in terms of the natural p's and q's and then transforms to a new set of variables, usually called Q_k and P_k, that have the right properties.
2.1 Contact Transformations

The most general transformation is called a contact transformation.

$$Q_k = Q_k(q, p, t) \qquad P_k = P_k(q, p, t) \qquad (2.1)$$

(In this formula and what follows, the symbols p and q, when used as arguments, stand for the complete sets q_1, q_2, q_3, ..., etc.) There is a certain privileged class of transformations called canonical transformations that preserve the structure of Hamilton's equations of motion for all dynamical systems. This means that there is a new Hamiltonian function K(Q, P) for which the new equations of motion are
$$\dot Q_k = \frac{\partial K}{\partial P_k} \qquad \dot P_k = -\frac{\partial K}{\partial Q_k} \qquad (2.2)$$

In a footnote in Classical Mechanics, Goldstein suggested that K be called the Kamiltonian. The idea has caught on with several authors, and I will use it without further apology. The trick is to find it.
17
18 CHAPTER 2. CANONICAL TRANSFORMATIONS<br />
Theorem: Let F be any function of q_k and Q_k (and possibly p_k and P_k), as well as time. Then the new Lagrangian defined by

$$\bar L = L - \frac{dF}{dt} \qquad (2.3)$$

is equivalent to L in the sense that it yields the same equations of motion.

Proof:

$$\dot F = \sum_k \frac{\partial F}{\partial q_k}\dot q_k + \sum_k \frac{\partial F}{\partial Q_k}\dot Q_k + \frac{\partial F}{\partial t} \qquad (2.4)$$
$$\frac{d}{dt}\left(\frac{\partial \dot F}{\partial \dot q_k}\right) = \frac{d}{dt}\left(\frac{\partial F}{\partial q_k}\right) = \frac{\partial \dot F}{\partial q_k} \qquad \frac{d}{dt}\left(\frac{\partial \dot F}{\partial \dot Q_k}\right) = \frac{d}{dt}\left(\frac{\partial F}{\partial Q_k}\right) = \frac{\partial \dot F}{\partial Q_k}$$

These last two can be rewritten

$$\frac{d}{dt}\left(\frac{\partial \dot F}{\partial \dot q_k}\right) - \frac{\partial \dot F}{\partial q_k} = 0 \qquad \frac{d}{dt}\left(\frac{\partial \dot F}{\partial \dot Q_k}\right) - \frac{\partial \dot F}{\partial Q_k} = 0$$

So Ḟ satisfies Lagrange's equation whether we regard it as a function of q_k or Q_k. Obviously, if L satisfies Lagrange's equation, then so does L − Ḟ. (The conclusion is unchanged if F contains p_k and/or P_k.) The function F is called the generating function of the transformation.

K is obtained by a Legendre transformation just as H was.
$$K(Q, P) = \sum_k P_k \dot Q_k - \bar L(Q, \dot Q, t) \qquad (2.5)$$

This has the same form as (1.36), so the derivation of the equations of motion (1.39) through (1.42) is unchanged as well.

$$P_k = \frac{\partial \bar L}{\partial \dot Q_k} \qquad \dot Q_k = \frac{\partial K}{\partial P_k} \qquad \dot P_k = -\frac{\partial K}{\partial Q_k} \qquad (2.6)$$
These simple results provide the framework for canonical transformations. In order to use them we will need to know two more things: (1) how to find F, and, given F, (2) how to find the transformation (q, p) → (Q, P). We deal with (2) now and postpone (1) to later sections.

Consider the variables q, Q, p, and P. Any two of these constitute a complete set, so there are four kinds of generating functions, usually called F_1(q, Q, t), F_2(q, P, t), F_3(p, Q, t), and F_4(p, P, t). All four are discussed in Goldstein. F_1 provides a good introduction. Most of our work will make use of F_2.
Starting with F_1(q, Q), (2.3) becomes

$$\bar L(Q, \dot Q, t) = L(q, \dot q, t) - \frac{d}{dt}F_1(q, Q, t) \qquad (2.7)$$

Since

$$\frac{\partial \bar L}{\partial \dot q_k} = \frac{\partial L}{\partial \dot q_k} - \frac{\partial \dot F_1}{\partial \dot q_k} = 0,$$

we get, with the help of (2.4),

$$\frac{\partial \bar L}{\partial \dot q_k} = p_k - \frac{\partial F_1}{\partial q_k} = 0 \qquad \frac{\partial \bar L}{\partial \dot Q_k} = P_k = \frac{\partial L}{\partial \dot Q_k} - \frac{\partial \dot F_1}{\partial \dot Q_k} = -\frac{\partial F_1}{\partial Q_k}.$$

This yields the two transformation equations

$$p_k = \frac{\partial F_1}{\partial q_k} \qquad (2.8)$$

$$P_k = -\frac{\partial F_1}{\partial Q_k} \qquad (2.9)$$
A straightforward set of substitutions gives our final formula for the Kamiltonian.

$$K = \sum_k\left[-\frac{\partial F}{\partial Q_k}\dot Q_k\right] - L + \sum_k\left[\frac{\partial F}{\partial q_k}\dot q_k + \frac{\partial F}{\partial Q_k}\dot Q_k\right] + \frac{\partial F}{\partial t} = -L + \sum_k p_k \dot q_k + \frac{\partial F}{\partial t}$$

To be more explicit,

$$K(Q, P) = H(q(Q, P), p(Q, P), t) + \frac{\partial}{\partial t}F_1(q(Q, P), Q, t) \qquad (2.10)$$

Summary:
1. Here is the typical problem: we are given the Hamiltonian H = H(q, p) for some conservative system. H = E is constant, but the q's and p's change with time in a complicated way. Our goal is to find the functions q = q(t) and p = p(t) using the technique of canonical transformations.

2. We need to know the generating function F = F_1(q, Q). This is the hard part, and I'm postponing it as long as possible.

3. Substitute F into (2.8) and (2.9). This gives a set of coupled algebraic equations for q, Q, p, and P. They must be combined in such a way as to give q_k = q_k(Q, P) and p_k = p_k(Q, P).

4. Use (2.10) to find K. If we had the right generating function to start with, Q will be cyclic, i.e. K = K(P). The equations of motion are obtained from (2.6): Ṗ_k = 0 and Q̇_k = ω_k. The ω's are a set of constants, as are the P's, so Q_k(t) = ω_k t + α_k. The α's are constants obtained from the initial conditions.

5. Finally, q_k(t) = q_k(Q(t), P) and p_k(t) = p_k(Q(t), P).
2.1.1 The Harmonic Oscillator: Cracking a Peanut with a Sledgehammer

It's useful to try a new technique on an old problem.

$$H = \frac{p^2}{2m} + \frac{kq^2}{2} = \frac{1}{2m}(p^2 + m^2\omega^2 q^2) \qquad (2.11)$$

As it turns out, the generating function is

$$F = \frac{m\omega q^2}{2}\cot Q \qquad (2.12)$$

The transformation is found from (2.8) and (2.9).

$$p = \frac{\partial F}{\partial q} = m\omega q \cot Q \qquad P = -\frac{\partial F}{\partial Q} = \frac{m\omega q^2}{2\sin^2 Q}$$

Solve for p and q in terms of P and Q and then substitute into (2.10) to find K.

$$q = \sqrt{\frac{2P}{m\omega}}\sin Q \qquad p = \sqrt{2Pm\omega}\cos Q$$
$$K = \omega P \qquad P = E/\omega$$

We have achieved our goal. Q is cyclic, and the equations of motion are trivial.

$$\dot Q = \frac{\partial K}{\partial P} = \omega \qquad Q = \omega t + Q_0 \qquad (2.13)$$

$$q = \sqrt{\frac{2E}{m\omega^2}}\sin(\omega t + Q_0) \qquad p = \sqrt{2mE}\cos(\omega t + Q_0) \qquad (2.14)$$
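These claims are easy to verify with a computer algebra system. A sketch of such a check (not in the original text; the symbols are as above): substituting q(Q, P) and p(Q, P) into H should collapse to K = ωP, and the transformation should have unit Jacobian determinant ∂(q, p)/∂(Q, P) = 1, a hallmark of a canonical transformation in one degree of freedom.

```python
import sympy as sp

m, w, Q, P = sp.symbols('m omega Q P', positive=True)

# the transformation generated by F = (m*w*q**2/2)*cot(Q), eq. (2.12)
q = sp.sqrt(2*P/(m*w)) * sp.sin(Q)
p = sp.sqrt(2*P*m*w) * sp.cos(Q)

# the Kamiltonian: H expressed in the new variables
H = (p**2 + m**2 * w**2 * q**2) / (2*m)
K = sp.simplify(H)                      # collapses to omega*P

# Jacobian determinant of (q, p) with respect to (Q, P)
jac = sp.simplify(sp.Matrix([[sp.diff(q, Q), sp.diff(q, P)],
                             [sp.diff(p, Q), sp.diff(p, P)]]).det())
```

Both results are exact: K − ωP and jac − 1 simplify to zero identically in m, ω, Q, and P.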
2.2 The Second Generating Function

There's an old recipe for tiger stew that begins, "First catch the tiger." In our quest for the tiger, we now turn our attention to the second generating function, F_2 = F_2(q, P, t). F_2 is obtained from F_1 by means of a Legendre transformation.¹

$$F_2(q, P) = F_1(q, Q) + \sum_k Q_k P_k \qquad (2.15)$$
We are looking for transformation equations analogous to (2.8) and (2.9). Since L = L̄ + Ḟ_1,

$$\sum_k p_k\dot q_k - H = \sum_k P_k\dot Q_k - K + \frac{d}{dt}\Big(F_2 - \sum_k Q_k P_k\Big) = -\sum_k Q_k\dot P_k - K + \dot F_2$$

Substitute

$$\dot F_2 = \sum_k\left[\frac{\partial F_2}{\partial q_k}\dot q_k + \frac{\partial F_2}{\partial P_k}\dot P_k\right] + \frac{\partial F_2}{\partial t}$$

$$-H = -K + \sum_k\left[\left(\frac{\partial F_2}{\partial q_k} - p_k\right)\dot q_k + \left(\frac{\partial F_2}{\partial P_k} - Q_k\right)\dot P_k\right] + \frac{\partial F_2}{\partial t}$$

We are working on the assumption that q̇ and Ṗ are now independent variables. We enforce this by requiring that

$$\frac{\partial F_2}{\partial q_k} = p_k \qquad (2.16)$$

$$\frac{\partial F_2}{\partial P_k} = Q_k \qquad (2.17)$$

$$K(Q, P) = H(q(Q, P), p(Q, P), t) + \frac{\partial}{\partial t}F_2(q(Q, P), P, t) \qquad (2.18)$$

¹ When in doubt, do a Legendre transformation.
2.3 Hamilton's Principal Function

The F_1 style generating functions were used to transform to a new set of variables (q, p) → (Q, P) such that all the Q's were cyclic. As a consequence, the P's were constants of the motion, and the Q's were linear functions of time. The generating function itself was hard to find, however. The F_2 generating function goes one step further; it can transform to a set of variables in which both the Q's and P's are constant and simple functions of the initial values of the phase space variables. In essence, our transformation is

$$(q(t), p(t)) \leftrightarrow (q_0, p_0)$$

This is a time-dependent transformation, of course. The fact that we can find such transformations shows that the time evolution of a system is itself a canonical transformation.
We look for an F_2 so that K in (2.18) is identically zero! Then from (2.6), Q̇_k = 0 and Ṗ_k = 0. The appropriate generating function will be a solution to

$$H(q, p, t) + \frac{\partial F_2}{\partial t} = 0 \qquad (2.19)$$

We eliminate p_k using (2.16):

$$H\left(q_1, \ldots, q_n; \frac{\partial F_2}{\partial q_1}, \ldots, \frac{\partial F_2}{\partial q_n}; t\right) + \frac{\partial F_2}{\partial t} = 0 \qquad (2.20)$$
The solution to this equation is usually called S, Hamilton's principal function. The equation itself is the Hamilton-Jacobi equation.²

There are two serious issues here: does it have a solution, and if it does, can we find it? We will take a less serious approach: if we can find a solution, then it most surely exists. Furthermore, if we can find it, it will have the form

$$S = \sum_k W_k(q_k) - \alpha t \qquad (2.21)$$

Partial differential equations that have solutions of the form (2.21) are said to be separable.³ Most of the familiar textbook problems in classical mechanics and atomic physics can be separated in this form. The question of separability does depend on the system of generalized coordinates used. For example, the Kepler problem is separable in spherical coordinates but not in Cartesian coordinates. It would be nice to know whether a particular

² See Goldstein, Classical Mechanics, Chapter 10.
³ Or, to be meticulous, completely separable.
Hamiltonian could be separated with some system of coordinates, but no completely general criterion is known.⁴ As a rule of thumb, Hamiltonians with explicit time dependence are not separable.
If our Hamiltonian is separable, then when (2.21) is substituted into (2.20), the result will look like

$$f_1\left(q_1, \frac{dW_1}{dq_1}\right) + f_2\left(q_2, \frac{dW_2}{dq_2}\right) + \cdots = \alpha \qquad (2.22)$$

Each function f_k is a function only of q_k and dW_k/dq_k. Since all the q's are independent, each function must be separately constant. This gives us a system of n independent, first-order, ordinary differential equations for the W_k's.

$$f_k\left(q_k, \frac{dW_k}{dq_k}\right) = \alpha_k \qquad (2.23)$$
The W's so obtained are then substituted into (2.21). The resulting function for S is

$$F_2 \equiv S = S(q_1, \ldots, q_n; \alpha_1, \ldots, \alpha_n; \alpha, t)$$

The final constant α is redundant for two reasons: first, Σ_k α_k = α, and second, the transformation equations (2.16) and (2.17) involve derivatives with respect to q_k and P_k. When S is so differentiated, the −αt piece will disappear. In order to make this apparent, we will write S as follows:

$$F_2 \equiv S = S(q_1, \ldots, q_n; \alpha_1, \ldots, \alpha_n; t) \qquad (2.24)$$

Since the F_2 generating functions have the form F_2(q, P), we are entitled to think of the α's as "momenta," i.e. α_k in (2.24) corresponds to P_k in (2.17).
In a way this makes sense. Our goal was to transform the time-dependent q's and p's into a new set of constant Q's and P's, and the α's are most certainly constant. On the other hand, they are not the initial momenta p_0 that evolve into p(t). The relationship between α and p_0 will be determined later.

If we have done our job correctly, the Q's given by (2.17) are also constant. They are traditionally called β, so

$$Q_k = \beta_k = \frac{\partial S(q, \alpha, t)}{\partial \alpha_k} \qquad (2.25)$$

Again, the β's are constant, but they are not equal to q_0.

We can turn this into a cookbook algorithm.

⁴ There is a very technical result, the so-called Staeckel conditions, which gives necessary and sufficient conditions for separability in orthogonal coordinate systems.
1. Substitute (2.21) into (2.20) and separate variables.

2. Integrate the resulting first-order ODE's. The result will be n independent functions W_k = W_k(q, α). Put the W_k's back into (2.21) to construct S = S(q, α, t).

3. Find the constant β coordinates using

$$\beta_k = \frac{\partial S}{\partial \alpha_k}$$

4. Invert these equations to find q_k = q_k(β, α, t).

5. Find the momenta with

$$p_k = \frac{\partial S}{\partial q_k}$$
2.3.1 The Harmonic Oscillator: Again

The harmonic oscillator provides an easy example of this procedure.

$$H = \frac{1}{2m}(p^2 + m^2\omega^2 q^2)$$

$$\frac{1}{2m}\left[\left(\frac{\partial S}{\partial q}\right)^2 + m^2\omega^2 q^2\right] + \frac{\partial S}{\partial t} = 0 \qquad (2.26)$$

$$\frac{1}{2m}\left[\left(\frac{\partial W}{\partial q}\right)^2 + m^2\omega^2 q^2\right] = \alpha \qquad (2.27)$$

Since there is only one q, the entire quantity on the left of the equal sign is a constant.

$$W(q, \alpha) = \sqrt{2m\alpha}\int dq\,\sqrt{1 - \frac{m\omega^2 q^2}{2\alpha}}$$

The new transformed constant "momentum" is P = α.

$$\beta = \frac{\partial S(q, \alpha, t)}{\partial \alpha} = \frac{\partial W(q, \alpha)}{\partial \alpha} - t = \frac{1}{\omega}\sin^{-1}\left[q\sqrt{\frac{m\omega^2}{2\alpha}}\right] - t \qquad (2.28)$$

Invert this equation to find q as a function of t and β.

$$q = \sqrt{\frac{2\alpha}{m\omega^2}}\sin(\omega t + \beta\omega)$$

Evidently, β has something to do with initial conditions: ωβ = ϕ_0, the initial phase angle.

$$p = \frac{\partial S}{\partial q} = \sqrt{2m\alpha - m^2\omega^2 q^2} = \sqrt{2m\alpha}\cos(\omega t + \phi_0)$$

The maximum value of p is √(2mE), so that makes sense too.
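A quick symbolic check (not in the original text) confirms that the q(t) and p(t) just found really do solve the problem: they satisfy Hamilton's equations for the oscillator and keep H pinned at the constant value α.

```python
import sympy as sp

t = sp.symbols('t')
m, w, alpha, phi0 = sp.symbols('m omega alpha phi_0', positive=True)

# the trajectory produced by the Hamilton-Jacobi procedure (omega*beta = phi_0)
q = sp.sqrt(2*alpha/(m*w**2)) * sp.sin(w*t + phi0)
p = sp.sqrt(2*m*alpha) * sp.cos(w*t + phi0)

H = (p**2 + m**2 * w**2 * q**2) / (2*m)

energy_check = sp.simplify(H - alpha)          # 0: H = alpha on the trajectory
eq1 = sp.simplify(sp.diff(q, t) - p/m)         # 0: qdot =  dH/dp
eq2 = sp.simplify(sp.diff(p, t) + m*w**2*q)    # 0: pdot = -dH/dq
```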
2.4 Hamilton's Characteristic Function

There is another way to use the F_2 generating function to turn a difficult problem into an easy one. In the previous section we chose F_2 = S = W − αt, so that K = 0. It is also possible to take F_2 = W(q) so that

$$K = H\left(q_k, \frac{\partial W}{\partial q_k}\right) = E = \alpha_1 \qquad (2.29)$$

The W obtained in this way is called Hamilton's characteristic function.

$$W = \sum_k W_k(q_k, \alpha_1, \ldots, \alpha_n) = W(q_1, \ldots, q_n, E, \alpha_2, \ldots, \alpha_n) = W(q_1, \ldots, q_n, \alpha_1, \ldots, \alpha_n) \qquad (2.30)$$
It generates a contact transformation with properties quite different from that generated by S. The equations of motion are

$$\dot P_k = -\frac{\partial K}{\partial Q_k} = 0 \qquad (2.31)$$

$$\dot Q_k = \frac{\partial K}{\partial P_k} = \frac{\partial K}{\partial \alpha_k} = \delta_{k1} \qquad (2.32)$$

The new feature is that Q̇_1 = 1, so Q_1 = t − t_0. In general

$$Q_k = \frac{\partial W}{\partial \alpha_k} = \beta_k \qquad (2.33)$$

but now β_1 = t − t_0.

$$p_k = \frac{\partial W}{\partial q_k} \qquad (2.34)$$

as before.

The algorithm now works like this:
1. Substitute (2.30) into (2.29) and separate variables.

2. Integrate the resulting first-order ODE's. The result will be n independent functions W_k = W_k(q, α). Put the W_k's back into (2.30) to construct W = W(q, α).

3. Find the constant β coordinates using

$$\beta_k = \frac{\partial W}{\partial \alpha_k} \qquad (2.35)$$

Remember that β_1 = t − t_0.

4. Invert these equations to find q_k = q_k(β, α, t).

5. Find the momenta with

$$p_k = \frac{\partial W}{\partial q_k} \qquad (2.36)$$

2.4.1 Examples
Problems with one degree of freedom are virtually identical whether they are formulated in terms of the characteristic function or the principal function. Take, for example, the harmonic oscillator from the previous section. Equation (2.28) becomes

$$\beta = \frac{\partial W(q, \alpha)}{\partial \alpha} = \frac{1}{\omega}\sin^{-1}\left[q\sqrt{\frac{m\omega^2}{2\alpha}}\right] = t - t_0$$

$$q = \sqrt{\frac{2\alpha}{m\omega^2}}\sin[\omega(t - t_0)] \qquad (2.37)$$
The following problem raises some new issues.

Consider a particle in a stable orbit in a central potential. The motion will lie in a plane, so we can do the problem in two dimensions.

$$H = \frac{1}{2m}\left(p_r^2 + \frac{p_\psi^2}{r^2}\right) + V(r) \qquad (2.38)$$

p_ψ = mr²ψ̇ is the angular momentum. It is conserved since ψ is cyclic.

$$\frac{1}{2m}\left[\left(\frac{\partial W}{\partial r}\right)^2 + \frac{1}{r^2}\left(\frac{\partial W}{\partial \psi}\right)^2\right] + V(r) = \alpha_1 \qquad (2.39)$$
$$\left[r^2\left(\frac{dW_r}{dr}\right)^2 + 2mr^2 V(r) - 2m\alpha_1 r^2\right] + \left(\frac{dW_\psi}{d\psi}\right)^2 = 0 \qquad (2.40)$$

At this point we notice that ∂W/∂ψ = p_ψ, which we know is constant. Why not call it something like α_ψ? Then W_ψ = α_ψ ψ. This is worth stating as a general principle: if q is cyclic, W_q = α_q q, where α_q is one of the n constant α's appearing in (2.30).

$$W = \int dr\,\sqrt{2m(\alpha_1 - V) - \alpha_\psi^2/r^2} + \alpha_\psi \psi \qquad (2.41)$$
We can find r as a function of time by inverting the equation for β_1, just as we did in (2.37), but more to the point,

$$\beta_\psi = \frac{\partial W}{\partial \alpha_\psi} = -\int \frac{\alpha_\psi\,dr}{r^2\sqrt{2m(\alpha_1 - V) - \alpha_\psi^2/r^2}} + \psi \qquad (2.42)$$

Make the usual substitution, u = 1/r.

$$\psi - \beta_\psi = -\int \frac{du}{\sqrt{2m(\alpha_1 - V(r))/\alpha_\psi^2 - u^2}} \qquad (2.43)$$

This is a new kind of equation of motion, which gives ψ = ψ(r) or r = r(ψ) (assuming we can do the integral), i.e. there is no explicit time dependence. Such equations are called orbit equations. Often it will be more useful to have the equations in this form, when we are concerned with the geometric properties of the trajectories.
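For the Kepler problem, V(r) = −k/r, the integral in (2.43) is elementary; here is a sketch (not in the original text; k denotes the strength of the attractive potential). With u = 1/r the radicand becomes a quadratic in u, and completing the square turns the integral into an inverse cosine:

```latex
\frac{1}{r} \;=\; \frac{mk}{\alpha_\psi^2}\Bigl(1 + e\cos(\psi - \psi_0)\Bigr),
\qquad
e \;=\; \sqrt{1 + \frac{2\alpha_1\,\alpha_\psi^2}{mk^2}}
```

This is a conic section: an ellipse for α_1 < 0 (e < 1), a parabola for α_1 = 0, and a hyperbola for α_1 > 0, which is the standard result for the Kepler orbit with energy α_1 and angular momentum α_ψ.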
2.5 Action-Angle Variables

We are pursuing a route to chaos that begins with periodic or quasi-periodic systems. A particularly elegant approach to these systems makes use of a variant of Hamilton's characteristic function. In this technique, the integration constants α_k appearing directly in the solution of the Hamilton-Jacobi equation are not themselves chosen to be the new momenta. Instead, we define a set of constants I_k, which form a set of n independent functions of the α's known as action variables. The coordinates conjugate to the I's are angles that increase linearly with time. You are familiar with a system that behaves just like this, the harmonic oscillator!

$$q = \sqrt{\frac{2E}{k}}\sin\psi \qquad p = \sqrt{2mE}\cos\psi$$

where ψ = ωt + ψ_0. In the language of action-angle variables I = E/ω, so

$$q = \sqrt{\frac{2I}{m\omega}}\sin\psi \qquad p = \sqrt{2mI\omega}\cos\psi$$

I is the "momentum" conjugate to the "coordinate" ψ.
Action-angle variables are only appropriate to periodic motion, and there are other restrictions we will learn as we go along, but within these limitations, all systems can be transformed into a set of uncoupled harmonic oscillators.⁵ To see what "periodic motion" implies, have a look at the simple pendulum.

$$H = \frac{p_\theta^2}{2ml^2} - mgl\cos\theta = E = \alpha \qquad (2.44)$$

$$p_\theta = \pm\sqrt{2ml^2(E + mgl\cos\theta)} \qquad (2.45)$$

There are two kinds of motion possible. If E is small, the pendulum will reverse at the points where p_θ = 0. The motion is called libration, i.e. bounded and periodic. If E is large enough, however, the pendulum will swing around a complete circle. Such motion is called rotation (obviously). There is a critical value E = mgl for which, in principle, the pendulum could stand straight up motionless at θ = π. An orbit in p_θ-θ phase space corresponding to this energy forms the dividing line between the two kinds of motion. Such a trajectory is called a separatrix.
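The libration action integral defined in (2.46) can be evaluated numerically for the pendulum. A sketch (the parameter values are illustrative, not from the text): for energies just above the bottom of the well the pendulum is effectively a harmonic oscillator with ω_0 = √(g/l), so I(E) should approach (E + mgl)/ω_0 in that limit.

```python
import math

m, g, l = 1.0, 9.81, 1.0   # illustrative values

def action(E, n=200_000):
    """I(E) = (1/2pi) * loop integral of p_theta over one libration cycle.

    The cycle runs -theta_max..theta_max and back, so the loop integral is
    4 times the integral from 0 to theta_max (midpoint rule below)."""
    theta_max = math.acos(-E / (m*g*l))        # turning point: p_theta = 0
    h = theta_max / n
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * h
        total += math.sqrt(2*m*l*l*(E + m*g*l*math.cos(th))) * h
    return 4.0 * total / (2.0 * math.pi)

eps = 0.1                       # energy measured from the bottom of the well
I = action(-m*g*l + eps)
w0 = math.sqrt(g / l)
# small-oscillation limit: I -> eps / w0, with a tiny anharmonic correction
```

As eps grows toward 2mgl the orbit approaches the separatrix and I(E) deviates strongly from the harmonic value.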
For either type of periodic motion, we can introduce a new variable I designed to replace α as the new constant momentum.

$$I(\alpha) = \frac{1}{2\pi}\oint p(q, \alpha)\,dq \qquad (2.46)$$

This is a definite integral taken over a complete period of libration or rotation.⁶ I will prove (1) that the angle ψ conjugate to I is cyclic, and (2) that Δψ = 2π corresponds to one complete cycle of the periodic motion.

1. Since I = I(α) and H = α, it follows that H is a function of I only: H = H(I).

$$\dot I = -\frac{\partial H}{\partial \psi} = 0 \qquad \dot\psi = \frac{\partial H}{\partial I} = \omega(I) \qquad (2.47)$$

⁵ When correctly viewed, everything is a harmonic oscillator.
⁶ Textbooks are about equally divided on whether to call the action I or J and whether or not to include the factor 1/2π.
2. We are using an F_2 type generating function, which is a function of the old coordinate and new momentum. Hamilton's characteristic function can be written as

$$W = W(q, I) \qquad (2.48)$$

The transformation equations are

$$\psi = \frac{\partial W}{\partial I} \qquad p = \frac{\partial W}{\partial q} \qquad (2.49)$$

Note that

$$\frac{\partial \psi}{\partial q} = \frac{\partial}{\partial I}\left(\frac{\partial W}{\partial q}\right)$$

so

$$\oint d\psi = \oint \frac{\partial \psi}{\partial q}\,dq = \oint \frac{\partial}{\partial I}\left(\frac{\partial W}{\partial q}\right)dq = \frac{\partial}{\partial I}\oint p\,dq = \frac{\partial}{\partial I}(2\pi I) = 2\pi.$$

2.5.1 The harmonic oscillator (for the last time)

$$H = \frac{1}{2m}(p^2 + m^2\omega^2 q^2)$$

$$p = \pm\sqrt{2mE - m^2\omega^2 q^2}$$

$$I = \frac{1}{2\pi}\oint\sqrt{2mE - m^2\omega^2 q^2}\,dq$$
The integral is tricky in this form because p changes sign at the turning points. We won't have to worry about this if we make the substitution

$$q = \sqrt{\frac{2E}{m\omega^2}}\sin\psi \qquad (2.50)$$

This substitution not only makes the integral easy and takes care of the sign change, it also makes clear the meaning of an integral over a complete cycle, i.e. ψ goes from 0 to 2π.

$$I = \frac{E}{\pi\omega}\oint\cos^2\psi\,d\psi = E/\omega$$

From this point of view the introduction of ψ at (2.50) seems nothing more than a mathematical trick. We would have stumbled on it eventually,
however, as the following argument shows. The Hamilton-Jacobi equation is

$$\frac{1}{2m}\left[\left(\frac{dW}{dq}\right)^2 + m^2\omega^2 q^2\right] = E$$

$$W = \int\sqrt{2mI\omega - m^2\omega^2 q^2}\,dq$$

$$\frac{\partial W}{\partial I} = m\omega\int\frac{dq}{\sqrt{2mI\omega - m^2\omega^2 q^2}} = \sin^{-1}\left(q\sqrt{\frac{m\omega}{2I}}\right) = \psi + \psi_0$$

$$q = \sqrt{\frac{2E}{m\omega^2}}\sin(\psi + \psi_0)$$

In the last equation ψ_0 appears as an integration constant (from the unspecified lower limit in W). Evidently, ψ is the angle variable conjugate to I.
In summary, to use action-angle variables for problems with one degree of freedom:

1. Find p as a function of E = α and q.
2. Calculate I(E) using (2.46).
3. Solve the Hamilton-Jacobi equation to find W = W(q, I).
4. Find ψ = ψ(q, I) using (2.49).
5. Invert this equation to get q = q(I, ψ).
6. Use (2.47) to get ω(I).
7. Calculate p = p(I, q) from (2.49).

One attractive feature of this scheme is that you can find the frequency without using the characteristic function and without finding the equations of motion. The phase space plot is particularly important. Use polar coordinates (what else) for (I, ψ). Every trajectory, whatever the system, is a circle!
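The claim that the frequency comes out without solving the equations of motion can be illustrated numerically. A sketch (the quartic oscillator H = p²/2 + q⁴/4 is my example, not the text's): compute I(E) by quadrature, get ω(I) = dE/dI by a finite difference, and check it against the period T = ∮ dq/q̇ computed directly. The exact identity dI/dE = T/2π means ωT should come out 2π.

```python
import math

def cycle_integrals(E, n=20_000):
    """For H = p**2/2 + q**4/4 (m = 1), return (I, T) over one full cycle.

    Substituting q = A*sin(phi) with A = (4E)**(1/4) removes the
    turning-point singularity: p = sqrt(2E)*cos(phi)*sqrt(1 + sin(phi)**2)."""
    A = (4.0 * E) ** 0.25
    h = (math.pi / 2) / n
    I = T = 0.0
    for i in range(n):
        phi = (i + 0.5) * h
        c, s = math.cos(phi), math.sin(phi)
        p = math.sqrt(2.0 * E) * c * math.sqrt(1.0 + s * s)
        dq = A * c * h
        I += p * dq              # a quarter of the loop integral of p dq
        T += dq / p              # a quarter of T = loop integral of dq/qdot
    return 2.0 * I / math.pi, 4.0 * T   # I = (1/2pi)*4*(quarter), T = 4*(quarter)

E, dE = 1.0, 1e-4
I_plus, _ = cycle_integrals(E + dE)
I_minus, _ = cycle_integrals(E - dE)
omega = 2 * dE / (I_plus - I_minus)     # omega(I) = dE/dI, step 6 of the recipe
_, T = cycle_integrals(E)
```

No generating function and no q(t) were needed: the frequency of a strongly anharmonic oscillator drops out of a single definite integral.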
Our derivation was based on the following assumptions: (1) the system had one degree of freedom; (2) energy was conserved and the Hamiltonian had no explicit time dependence; (3) the motion was periodic. Every such system is, at heart, a harmonic oscillator. Phase space trajectories are circles. The frequency can be found with a few deft moves. From a philosophical point of view (and we will be getting deeper and deeper into philosophy as these lectures proceed), problems in this category are "as good as solved"; nothing more needs to be said about them. The same is definitely not true with more than one degree of freedom. I will take a paragraph to generalize before going on to some more abstract developments.
We must assume that the system is separable, so

$$W(q_1, \ldots, q_n, \alpha_1, \ldots, \alpha_n) = \sum_k W_k(q_k, \alpha_1, \ldots, \alpha_n) \qquad (2.51)$$

$$p_k = \frac{\partial}{\partial q_k}W_k(q_k, \alpha_1, \ldots, \alpha_n) \qquad (2.52)$$

$$I_k = \frac{1}{2\pi}\oint p_k(q_k, \alpha_1, \ldots, \alpha_n)\,dq_k \qquad (2.53)$$

Next find all the α's as functions of the I's and substitute into W.

$$W = W(q_1, \ldots, q_n; I_1, \ldots, I_n)$$

Finally

$$\psi_k = \frac{\partial W}{\partial I_k} \qquad \dot I_k = 0 \qquad \dot\psi_k = \frac{\partial H}{\partial I_k} = \omega_k \qquad (2.54)$$
Chapter 3

Abstract Transformation Theory

So, one-dimensional problems are simple. Given the restrictions listed in the previous section, their phase space trajectories are circles. How does this generalize to problems with two or more degrees of freedom? A brief answer is that, given a number of conditions that we must discuss carefully, the phase space trajectories of a system with n degrees of freedom move on the surface of an n-dimensional torus embedded in 2n-dimensional space. The final answer is a donut! In order to prove this remarkable assertion and understand the conditions that must be satisfied, we must slog through a lot of technical material about transformations in general.
3.1 Notation

Our first job is to devise some compact notation for dealing with higher dimensional spaces. I will show you the notation in one dimension. It will then be easy to generalize. Recall Hamilton's equations of motion.

$$\dot p = -\frac{\partial H}{\partial q} \qquad \dot q = \frac{\partial H}{\partial p}$$

We will turn this into a vector equation.

$$\eta = \begin{pmatrix} q \\ p \end{pmatrix} \qquad J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \qquad \nabla = \begin{pmatrix} \partial/\partial q \\ \partial/\partial p \end{pmatrix} \qquad (3.1)$$

The equations of motion in vector form are

$$\dot\eta = J\cdot\nabla H \qquad (3.2)$$
J is not a vector, of course. Sometimes an array used in this way is called a dyadic. At any rate, this is just shorthand for matrix multiplication, i.e.

$$\begin{pmatrix} \dot q \\ \dot p \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} \partial H/\partial q \\ \partial H/\partial p \end{pmatrix}$$

The structure of J is important. Notice that it does two things: it exchanges p and q, and it changes one sign. This is called a symplectic transformation. I want to explore the connection between canonical transformations and symplectic transformations.
I'll start with the generic canonical transformation, (q, p) → (Q, P). How do the velocities transform? Define

$$\begin{pmatrix} \dot Q \\ \dot P \end{pmatrix} = \begin{pmatrix} \partial Q/\partial q & \partial Q/\partial p \\ \partial P/\partial q & \partial P/\partial p \end{pmatrix}\begin{pmatrix} \dot q \\ \dot p \end{pmatrix} \qquad (3.3)$$

Using the notation

$$M = \begin{pmatrix} \partial Q/\partial q & \partial Q/\partial p \\ \partial P/\partial q & \partial P/\partial p \end{pmatrix} \qquad (3.4)$$

$$\zeta = \begin{pmatrix} Q \\ P \end{pmatrix} \qquad (3.5)$$

this can be written

$$\dot\zeta = M\cdot\dot\eta = M\cdot J\cdot\nabla H \qquad (3.6)$$
The gradient operator differentiates H with respect to q and p. These derivatives transform, e.g.

$$\frac{\partial H}{\partial q} = \frac{\partial H}{\partial Q}\frac{\partial Q}{\partial q} + \frac{\partial H}{\partial P}\frac{\partial P}{\partial q}$$

consequently

$$\nabla_{(q,p)}H = M^T\cdot\nabla_{(Q,P)}H \qquad (3.7)$$

The T stands for transpose, of course. Combining (3.6) and (3.7),

$$\dot\zeta = M\cdot J\cdot M^T\cdot\nabla_{(Q,P)}H \qquad (3.8)$$

but if the transformation is canonical,

$$\dot\zeta = J\cdot\nabla_{(Q,P)}H \qquad (3.9)$$

Combining (3.8) and (3.9):

$$J = M\cdot J\cdot M^T \qquad (3.10)$$
Those of you who have studied special rel<strong>at</strong>ivity should find (??) congenial.<br />
Remember the definition of a Lorentz transform<strong>at</strong>ion: any 4×4 m<strong>at</strong>rix<br />
Λ th<strong>at</strong> s<strong>at</strong>isfies<br />
g = Λ · g · Λ T<br />
(3.11)<br />
is a Lorentz transform<strong>at</strong>ion. 1 The m<strong>at</strong>rix<br />
⎛<br />
1 0 0<br />
⎞<br />
0<br />
⎜<br />
g = ⎜ 0<br />
⎝ 0<br />
−1<br />
0<br />
0<br />
−1<br />
0 ⎟<br />
0 ⎠<br />
0 0 0 −1<br />
(3.12)<br />
is called the metric or metric tensor. Forgive me for exagger<strong>at</strong>ing slightly:<br />
everything there is to know about special rel<strong>at</strong>ivity flows out of (3.11). We<br />
say th<strong>at</strong> Lorentz transform<strong>at</strong>ions “preserve the metric,” i.e. leave the metric<br />
invariant. The geometry of space and time is encapsulated in (3.12). By the
same token, canonical transform<strong>at</strong>ions preserve the metric J. The geometry<br />
of phase space is encapsul<strong>at</strong>ed in the definition of J. Since J is symplectic,<br />
canonical transformations are symplectic transformations: they preserve the
symplectic metric.<br />
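To make (3.10) concrete, here is a numerical sketch (the sample transformation, the test point, and the tolerances are my own choices, not from the text). It builds the Jacobian M of a canonical transformation by central differences and checks that M · J · Mᵀ = J:

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    """Jacobian of f: R^2 -> R^2 by central differences."""
    J = np.zeros((2, 2))
    for j in range(2):
        dx = np.zeros(2)
        dx[j] = h
        J[:, j] = (f(x + dx) - f(x - dx)) / (2.0 * h)
    return J

# Symplectic metric for one degree of freedom
Jmetric = np.array([[0.0, 1.0], [-1.0, 0.0]])

# A sample canonical transformation: Q = q^2, P = p/(2q)
# (generated by F2 = q^2 P; my choice, not from the text)
def transform(x):
    q, p = x
    return np.array([q**2, p / (2.0 * q)])

M = jacobian(transform, np.array([1.3, 0.7]))
# Canonical <=> symplectic, eq. (3.10): M . J . M^T = J
print(np.allclose(M @ Jmetric @ M.T, Jmetric, atol=1e-8))  # True
```

A map that is not canonical, for example Q = q², P = p, fails the same check, so this makes a handy numerical diagnostic.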
Equation (3.10) is the starting point for the modern approach to mechanics
th<strong>at</strong> uses the tools of Lie group theory. I will only mention in passing<br />
some points of contact with group theory. Both Goldstein’s and Schenk’s<br />
texts have much more on the subject.<br />
3.1.1 Poisson Brackets<br />
Equ<strong>at</strong>ion (3.10) is really shorthand for four equ<strong>at</strong>ions, e.g.<br />
$$\frac{\partial Q}{\partial q}\frac{\partial P}{\partial p} - \frac{\partial P}{\partial q}\frac{\partial Q}{\partial p} = 1 \tag{3.13}$$
This combination of derivatives is called a Poisson bracket. The usual notation is
$$\frac{\partial X}{\partial q}\frac{\partial Y}{\partial p} - \frac{\partial X}{\partial p}\frac{\partial Y}{\partial q} \equiv [X, Y]_{q,p} \tag{3.14}$$
Then (3.13) becomes
[Q, P ]q,p = 1 (3.15)<br />
1 It is not a good idea to use m<strong>at</strong>rix not<strong>at</strong>ion in rel<strong>at</strong>ivity because of the ambiguity<br />
inherent in covariant and contravariant indices. Normally one would write (3.11) using
tensor not<strong>at</strong>ion.
This together with the trivially true<br />
[q, p]q,p = 1 (3.16)<br />
are called the fundamental Poisson brackets. We conclude th<strong>at</strong> canonical<br />
transform<strong>at</strong>ions leave the fundamental Poisson brackets invariant. It turns<br />
out th<strong>at</strong> all Poisson brackets have the same value when evalu<strong>at</strong>ed with respect<br />
to any canonical set of variables. This assertion requires some proof,<br />
however. I will start by generalizing to n dimensions.<br />
$$\eta = \begin{pmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \\ p_1 \\ p_2 \\ \vdots \\ p_n \end{pmatrix} \qquad J = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix} \qquad \nabla = \begin{pmatrix} \partial/\partial q_1 \\ \partial/\partial q_2 \\ \vdots \\ \partial/\partial q_n \\ \partial/\partial p_1 \\ \partial/\partial p_2 \\ \vdots \\ \partial/\partial p_n \end{pmatrix} \tag{3.17}$$
The symbol $I_n$ is the n × n unit matrix. With this notation, (3.14) becomes
$$[X, Y]_\eta \equiv \sum_k \left( \frac{\partial X}{\partial q_k}\frac{\partial Y}{\partial p_k} - \frac{\partial X}{\partial p_k}\frac{\partial Y}{\partial q_k} \right), \tag{3.18}$$
or in matrix notation
$$[X, Y]_\eta = (\nabla_\eta X)^T \cdot J \cdot \nabla_\eta Y. \tag{3.19}$$
The following should look familiar:
[qi, qk] = [pi, pk] = 0 [qi, pk] = δik<br />
These are, of course, the commut<strong>at</strong>ion rel<strong>at</strong>ions for position and momentum<br />
oper<strong>at</strong>ors in quantum mechanics. The resemblance is not accidental. The<br />
operator formulation of quantum mechanics grew out of the Poisson bracket
formulation of classical mechanics. This development is reviewed in all the
standard texts. In m<strong>at</strong>rix not<strong>at</strong>ion<br />
[η, η]η = [ζ, ζ]η = J (3.20)<br />
The Poisson bracket of two vectors is itself a matrix, i.e.
[X, Y ]ij ≡ [Xi, Yj] (3.21)
The proof of the above assertion is straightforward.<br />
$$\nabla_\eta Y = M^T \cdot \nabla_\zeta Y$$
$$(\nabla_\eta X)^T = (M^T \cdot \nabla_\zeta X)^T = (\nabla_\zeta X)^T \cdot M$$
$$[X, Y]_\eta = (\nabla_\zeta X)^T \cdot M \cdot J \cdot M^T \cdot \nabla_\zeta Y = (\nabla_\zeta X)^T \cdot J \cdot \nabla_\zeta Y = [X, Y]_\zeta$$
The last step makes use of (3.10). The invariance of the Poisson brackets is<br />
a non-trivial consequence of the symplectic n<strong>at</strong>ure of canonical transform<strong>at</strong>ions.<br />
From now on we will not bother with the subscripts on the Poisson
brackets.<br />
Here is another similarity with quantum mechanics. Let f be any function of canonical variables.
$$\dot f = \frac{df}{dt} = \sum_k\left(\frac{\partial f}{\partial q_k}\dot q_k + \frac{\partial f}{\partial p_k}\dot p_k\right) + \frac{\partial f}{\partial t} = \sum_k\left(\frac{\partial f}{\partial q_k}\frac{\partial H}{\partial p_k} - \frac{\partial f}{\partial p_k}\frac{\partial H}{\partial q_k}\right) + \frac{\partial f}{\partial t} = [f, H] + \frac{\partial f}{\partial t} \tag{3.22}$$
This looks like Heisenberg’s equ<strong>at</strong>ion of motion. For our purposes it means<br />
th<strong>at</strong> if f doesn’t depend on time explicitly and if [f, H] = 0, then f is a<br />
constant of the motion. We can use (3.22) to test if our favorite function is<br />
in fact constant, and we can also use it to construct new constants as the<br />
following argument shows.<br />
Let f, g, and h be arbitrary functions of canonical variables. The following<br />
Jacobi identity is just a m<strong>at</strong>ter of algebra.<br />
[f, [g, h]] + [g, [h, f]] + [h, [f, g]] = 0 (3.23)<br />
Now suppose h = H, the Hamiltonian, and f and g are constants of the<br />
motion. Then<br />
[H, [f, g]] = 0<br />
Consequence: If f and g are constants of the motion, then so is [f, g].<br />
This should make us uneasy. Take any two constants. Well, maybe they<br />
commute, but if not, then we have three constants. Commute the new<br />
constant with f and g and get two more constants, etc. How many constants
are we entitled to, anyway? This is a deep question, which has something
to do with the notion of involution. I’ll get to th<strong>at</strong> l<strong>at</strong>er.
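The algebra behind (3.23) can be spot-checked numerically. This is a sketch using central-difference Poisson brackets; the three polynomial functions (the third is named k to avoid clashing with the step size) and the test point are arbitrary choices of mine:

```python
def pb(X, Y, q, p, h=1e-4):
    """Poisson bracket [X, Y] at (q, p) by central differences, eq. (3.14)."""
    dXdq = (X(q + h, p) - X(q - h, p)) / (2.0 * h)
    dXdp = (X(q, p + h) - X(q, p - h)) / (2.0 * h)
    dYdq = (Y(q + h, p) - Y(q - h, p)) / (2.0 * h)
    dYdp = (Y(q, p + h) - Y(q, p - h)) / (2.0 * h)
    return dXdq * dYdp - dXdp * dYdq

# Three arbitrary polynomial functions of my own choosing
f = lambda q, p: q**2 * p
g = lambda q, p: q + p**2
k = lambda q, p: q * p        # plays the role of "h" in (3.23)

q0, p0 = 0.7, 1.2
# The cyclic sum [f,[g,k]] + [g,[k,f]] + [k,[f,g]] should vanish
total = (pb(f, lambda q, p: pb(g, k, q, p), q0, p0)
         + pb(g, lambda q, p: pb(k, f, q, p), q0, p0)
         + pb(k, lambda q, p: pb(f, g, q, p), q0, p0))
print(abs(total) < 1e-5)  # True
```

Nested finite differences are noisy, which is why the comparison tolerance is deliberately loose.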
3.2 Geometry in n Dimensions: The Hairy Ball<br />
Have another look <strong>at</strong> equ<strong>at</strong>ion (3.2). Let’s call ˙η a velocity field. By this<br />
I mean th<strong>at</strong> it associ<strong>at</strong>es a complete set of ˙q’s and ˙p’s with each point in<br />
phase space. In wh<strong>at</strong> direction does ˙η point? This is an easy question in<br />
one dimension; ˙η evalu<strong>at</strong>ed <strong>at</strong> the point P points in the direction tangent<br />
to the trajectory through P . Since trajectories can’t cross in phase space,<br />
there is only one trajectory through P , and the direction is unambiguous.<br />
If we use action-angle variables, the trajectory is a circle, and ˙η is what,
in Ph211, we would call a tangential velocity. The same is true, no doubt, for
n > 1, but how do these circles fit together? How does one visualize this in<br />
higher dimensions?<br />
The answer, as I have mentioned before, is th<strong>at</strong> the trajectories all lie<br />
on the surface of an n dimensional torus imbedded in 2n dimensional space.<br />
Your ordinary breakfast donut is a two dimensional torus imbedded in three<br />
dimensional space. 2 This is easy to visualize, so let’s limit the discussion<br />
to two degrees of freedom for the time being. The step from one degree of<br />
freedom to two involves some profound new ideas. The step from two to<br />
higher dimension is mostly a m<strong>at</strong>ter of m<strong>at</strong>hem<strong>at</strong>ical generaliz<strong>at</strong>ion.<br />
Since we are dealing with conserv<strong>at</strong>ive systems, the trajectories are limited<br />
by the conserv<strong>at</strong>ion of energy, i.e. H(q1, q2; p1, p2) = E is an equ<strong>at</strong>ion<br />
of constraint. The trajectories move on a manifold with three independent<br />
variables. Now the gradient of a function has a well defined geometrical<br />
significance: <strong>at</strong> the point P , ∇f points in a direction perpendicular to the<br />
surface or contour of constant f through P . In this case ∇H is a four component<br />
vector perpendicular to the surface of constant energy. Unfortun<strong>at</strong>ely,<br />
˙η points in the direction of J · ∇H. Wh<strong>at</strong> direction is th<strong>at</strong>? Well,<br />
(∇H) T · J · ∇H = [H, H] = 0<br />
Consequently, J · ∇H points in a direction perpendicular to ∇H, which is<br />
perpendicular to the plane of constant H, i.e. ˙η lies somewhere on the three<br />
dimensional surface of constant H.<br />
We could have guessed th<strong>at</strong> ahead of time, of course, but we can take the<br />
argument further. H is probably not the only constant of motion. Suppose<br />
there are others; call them F , G, etc. For each of these constants we can<br />
2 The word “dimension” gets used in two different ways. When we talk about physical<br />
systems, Lagrangians, Hamiltonians, etc., the dimension is equal to the number of degrees<br />
of freedom. Here I am using dimension to mean the number of independent variables<br />
required to describe the system.
construct a vector field using (3.2).
˙ηF = J · ∇F<br />
˙ηG = J · ∇G<br />
· · · etc. · · ·<br />
How many such fields can we construct th<strong>at</strong> are independent of one another?<br />
To put it another way, how many independent constants of motion are there?<br />
Th<strong>at</strong>’s a good question – wh<strong>at</strong> do you mean by “independent”? The answer<br />
comes from differential geometry. I’m afraid I can only give a hand-waving<br />
introduction to it. There are two rel<strong>at</strong>ed requirements:<br />
1. Suppose F and G are independent constants of motion. Take any<br />
trajectory from the manifold of constant F and another from constant<br />
G. There is no continuous canonical transform<strong>at</strong>ion th<strong>at</strong> maps the one<br />
trajectory into another.<br />
2. For each point P in space there must be one unique trajectory th<strong>at</strong> lies<br />
in the plane of constant F and simultaneously in the plane of constant<br />
G.<br />
Think about this last requirement in the case where there are two degrees<br />
of freedom and two independent constants of motion. The trajectories must<br />
lie on a two-dimensional surface. If we use action-angle variables, the
trajectories are circles. This sounds like a globe of the earth. Trajectories<br />
with constant ϕ are called longitudes, lines of constant θ are l<strong>at</strong>itudes. But<br />
wait! We have a serious problem <strong>at</strong> the poles. The north and south poles<br />
have all possible longitudes. Requirement 2 is viol<strong>at</strong>ed. Could you rearrange<br />
the lines so th<strong>at</strong> this problem doesn’t occur? It turns out th<strong>at</strong> this is not<br />
possible. This deep result is known in mathematical circles as the Poincaré-
Hopf theorem. In the sort of less exalted company we keep, it's the Hairy
Ball Theorem. The idea is this: try to comb the hair on a hairy ball so th<strong>at</strong><br />
there is no bald spot. It can’t be done. So long as you really use a comb, i.e.<br />
so long as the trajectories don’t cross, you will always be left with one hair<br />
standing straight up! This is not a proof, of course, but it is a vivid way<br />
of visualizing the content of the theorem. It is easy to see, however, th<strong>at</strong><br />
wh<strong>at</strong> is impossible on a sphere is trivially easy on the surface of a donut. It<br />
can be done in an infinite variety of ways. The simplest is to choose your<br />
“longitudes” so they go around the donut the long way. L<strong>at</strong>itudes go around<br />
the short way. This also s<strong>at</strong>isfies requirement 1. You can’t deform a l<strong>at</strong>itude<br />
into a longitude without cutting through the donut.
OK. Suppose you have two constants of motion F and G. How can you<br />
tell if they are independent? The answer is surprisingly simple: [F, G] = 0.
Proof: Take a point P on the surface of the donut. We should be able<br />
to set up a local coordinate system with its origin at P to describe the
trajectories on the surface. We need two unit vectors, $\hat\xi_F$ and $\hat\xi_G$, such that
every trajectory in the $\hat\xi_F$-$\hat\xi_G$ plane has constant F and G. Choose
$$\hat\xi_F = \epsilon\, J \cdot \nabla F$$
This is guaranteed to lie in the surface of constant F; however, G should
remain constant along $\hat\xi_F$. This means that
$$0 = (\hat\xi_F)^T \cdot \nabla G = \epsilon\,(J \cdot \nabla F)^T \cdot \nabla G = -\epsilon\,(\nabla F)^T \cdot J \cdot \nabla G = -\epsilon\,[F, G]$$
This proves the assertion. It’s worth reflecting on the fact th<strong>at</strong> this construction<br />
would be impossible on the surface of a sphere. The sphere, unlike<br />
the donut, has only one independent constant, its radius. This theorem also<br />
relieves our anxiety about extra constants. If F and G are independent, we<br />
don’t get a “free” constant K = [F, G], because K = 0.<br />
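Here is a numerical sketch of the involution condition (the sample frequencies and phase-space point are my own choices; the system anticipates the uncoupled oscillators of the next example). The two energies E1 and E2 satisfy [E1, E2] = (∇E1)ᵀ · J · ∇E2 = 0:

```python
import numpy as np

# Ordering follows eq. (3.17): z = (q1, q2, p1, p2).
w1, w2 = 1.0, np.sqrt(2.0)
E1 = lambda z: 0.5 * (z[2]**2 + w1**2 * z[0]**2)
E2 = lambda z: 0.5 * (z[3]**2 + w2**2 * z[1]**2)

def grad(f, z, h=1e-5):
    """Phase-space gradient by central differences."""
    g = np.zeros(4)
    for i in range(4):
        dz = np.zeros(4)
        dz[i] = h
        g[i] = (f(z + dz) - f(z - dz)) / (2.0 * h)
    return g

# The 4x4 symplectic metric J of eq. (3.17) with n = 2
J = np.block([[np.zeros((2, 2)), np.eye(2)],
              [-np.eye(2), np.zeros((2, 2))]])

z0 = np.array([0.4, -1.1, 0.9, 0.3])
bracket = grad(E1, z0) @ J @ grad(E2, z0)   # [E1, E2]
print(abs(bracket) < 1e-8)  # True
```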
Summary and generaliz<strong>at</strong>ion:<br />
1. A system with n degrees of freedom has <strong>at</strong> most n independent constants<br />
of motion. Otherwise we could use the additional constants to<br />
elimin<strong>at</strong>e one or more of these degrees. For example, we could use the<br />
Hamilton-Jacobi procedure to make all the momenta constant. The<br />
Hamiltonian would then only be a function of the n coordin<strong>at</strong>es, but<br />
these would not be independent because of the additional constraints.<br />
2. Let’s say there are k constants, Fi, i = 1, . . . , k. If they are independent<br />
we must have [Fi, Fj] = 0.<br />
3. In the best case there are exactly n independent constants. Such<br />
constants are said to be in involution. Such a system is said to be<br />
integrable.<br />
4. All trajectories of integrable systems are confined to the surfaces of<br />
n-dimensional tori imbedded in 2n-dimensional space.
5. If k < n there are no general st<strong>at</strong>ements we can make about the<br />
behavior of the trajectories. We will be very much concerned in the<br />
next chapter with systems th<strong>at</strong> are “almost” integrable.
6. There are no general criteria known for deciding whether or not a<br />
system is integrable; however, if the Hamiltonian is separable, the<br />
system is integrable.<br />
3.2.1 Example: Uncoupled Oscill<strong>at</strong>ors<br />
The Hamiltonian for two uncoupled harmonic oscill<strong>at</strong>ors (with m = 1) is<br />
$$H = \frac{1}{2}\left(p_1^2 + p_2^2 + \omega_1^2 q_1^2 + \omega_2^2 q_2^2\right)$$
This is an important problem because every linear oscill<strong>at</strong>ing system can<br />
be put in this form by a suitable choice of coordin<strong>at</strong>es. 3 There are two<br />
constants of motion<br />
$$E_1 = \frac{1}{2}\left(p_1^2 + \omega_1^2 q_1^2\right) \qquad E_2 = \frac{1}{2}\left(p_2^2 + \omega_2^2 q_2^2\right)$$
In terms of action-angle variables, the constants are I1 and I2.<br />
H = I1ω1 + I2ω2 = E1 + E2 = E<br />
Every integrable system can be put in this form, although in general the<br />
ω’s will be functions of the I’s. Here they are just parameters from the<br />
Hamiltonian.<br />
This is a simple problem, but the phase space is four dimensional. Let’s<br />
think about all possible ways we might visualize it. In the q1-p1 (or q2-p2)
plane the trajectories are ellipses with
$$q_k(\mathrm{max}) = \sqrt{2E_k}/\omega_k \qquad p_k(\mathrm{max}) = \sqrt{2E_k},$$
where k = 1, 2. The area enclosed by each ellipse is significant, because<br />
$$\mathrm{area} = \int_S dq\, dp = \oint p\, dq = 2\pi I \tag{3.24}$$
The first integral is a surface integral over the area of the ellipse. The second<br />
is a line integral around the ellipse. This identity is a variant of Stokes’s<br />
theorem. It’s useful to rescale the variables so th<strong>at</strong> they both have the same<br />
units and the trajectory is a circle. A natural choice would be
$$q_k' = q_k\sqrt{\omega_k} = \sqrt{2I_k}\,\sin\psi_k \qquad p_k' = p_k/\sqrt{\omega_k} = \sqrt{2I_k}\,\cos\psi_k$$
3 This comes under the heading of “theory of small oscill<strong>at</strong>ions.” Most mechanics texts<br />
devote a chapter to it.
The trajectories are now circles with radius √2Ik. The area enclosed is 2πIk,
as required by (3.24).<br />
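A quick numerical check of (3.24) on the rescaled circle (the value of the action is an arbitrary choice of mine): the discrete line integral ∮p′ dq′ reproduces 2πI.

```python
import numpy as np

I = 1.7                                    # arbitrary action (my choice)
psi = np.linspace(0.0, 2.0 * np.pi, 100001)
q = np.sqrt(2.0 * I) * np.sin(psi)         # q' = sqrt(2I) sin(psi)
p = np.sqrt(2.0 * I) * np.cos(psi)         # p' = sqrt(2I) cos(psi)

# Discrete line integral  oint p dq  around the circle (trapezoid rule)
area = float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(q)))
print(abs(area - 2.0 * np.pi * I) < 1e-6)  # True
```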
The motion in the q1-q2 plane is more complicated. It depends on
the ratio ω1/ω2, called the winding number. If this is a rational number,
say N1/N2, then after N1 cycles of q1 and N2 cycles of q2, the trajectory
will come back to its starting point. This is called a Lissajous figure. If the
winding number is irr<strong>at</strong>ional, the trajectory will be confined to a limited<br />
area but will never return to its starting point. It will eventually “color<br />
in” all available space. In the next chapter we will be concerned with systems<br />
th<strong>at</strong> are “almost” integrable. For such systems the winding number is<br />
all-important. Systems with irr<strong>at</strong>ional winding numbers tend to be stable<br />
under perturb<strong>at</strong>ion. Those with r<strong>at</strong>ional winding numbers disintegr<strong>at</strong>e <strong>at</strong><br />
the slightest push!<br />
The centerpiece of this chapter is the torus. The trajectories spiral<br />
around the donut. If the winding number is r<strong>at</strong>ional they “wear a p<strong>at</strong>h”<br />
around the donut. If it’s irr<strong>at</strong>ional they cover the donut evenly. A useful<br />
way of visualizing this was invented by Poincaré. Imagine a fl<strong>at</strong> plane cutting<br />
through the donut in such a way th<strong>at</strong> every point on the plane has the<br />
angle ψ1 = 0. Place a dot on the plane at the point where each trajectory
passes through it. If the winding number is a r<strong>at</strong>ional fraction, there will<br />
be a finite number of points. Each time a trajectory passes through ψ1 = 0<br />
it will pass through one of the dots. If the winding number is irr<strong>at</strong>ional the<br />
crossings will mark out a continuous circle. The Poincaré section as it is<br />
called (some books call it the surface of section) is a useful diagnostic tool.<br />
Suppose you have a system of equ<strong>at</strong>ions th<strong>at</strong> are not integrable (so far as<br />
you know) but is amenable to computer calcul<strong>at</strong>ion. Take various Poincaré<br />
sections. If they are circles then the system is <strong>at</strong> least approxim<strong>at</strong>ely integrable<br />
and can be described with action-angle variables. As we will see,<br />
there are often regions of phase space, “islands” as it were, where motion is<br />
simply periodic and other regions th<strong>at</strong> are wildly chaotic.<br />
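Here is a sketch of the Poincaré section for the two uncoupled oscillators (the frequencies are my own choices). It records ψ2 mod 2π at successive crossings of the plane ψ1 = 0 and counts the distinct dots:

```python
import numpy as np

def section_angles(w1, w2, n_cross=600):
    """psi2 (mod 2pi) each time the trajectory crosses the plane psi1 = 0."""
    n = np.arange(1, n_cross + 1)          # crossing times t_n = 2 pi n / w1
    return np.mod(2.0 * np.pi * n * (w2 / w1), 2.0 * np.pi)

def distinct_dots(angles, tol=1e-6):
    """Count well-separated clusters of points on the circle."""
    pts = np.sort(angles)
    gaps = np.diff(pts)
    wrap_gap = pts[0] + 2.0 * np.pi - pts[-1]
    return int(np.sum(gaps > tol)) + int(wrap_gap > tol)

# Rational winding number omega1/omega2 = 3/2: a finite set of dots
print(distinct_dots(section_angles(3.0, 2.0)))           # 3
# Irrational winding number: every crossing lands on a new point
print(distinct_dots(section_angles(1.0, np.sqrt(2.0))))  # 600
```

Plotted in the q′1-p′1 plane, the first case is a few dots on a circle of radius √2I1; the second gradually fills the whole circle in.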
Pictures of this motion appear in all the standard texts. I have yet to<br />
see a clear explan<strong>at</strong>ion of the coordin<strong>at</strong>es involved, however. Wh<strong>at</strong> does it<br />
mean really to say th<strong>at</strong> the donut is a 2-d surface in a 4-d space? Your<br />
breakfast donut, after all, is imbedded in 3-d space. If we take a Poincaré<br />
section through the donut at the plane ψ2 = 0 and plot q′1 versus p′1, we
will get either a circle of dots or a continuous circle with a radius equal to
√2I1, or we can take a slice through ψ1 = 0 and get a circle with radius
√2I2. Put it this way: any point on the torus has four (polar) coordinates,
(√2I1, ψ1, √2I2, ψ2), but in 3-d space only three of them are independent.
When the torus is in 4-d space, all four of them are independent. If we really
lived in 4-d space, we would label the axes of the donut plot (q′1, p′1, q′2, p′2).
This is impossible for us to imagine. The donut is easy; just remember th<strong>at</strong><br />
there is no equ<strong>at</strong>ion of constraint among the four variables. 4<br />
3.2.2 Example: A Particle in a Box<br />
Consider a particle in a two-dimensional box with elastic walls.<br />
0 ≤ x ≤ a 0 ≤ y ≤ b<br />
$$H = \frac{1}{2m}\left(p_x^2 + p_y^2\right) = \frac{\pi^2}{2m}\left(\frac{I_1^2}{a^2} + \frac{I_2^2}{b^2}\right)$$
$$I_1 = \frac{1}{2\pi}\oint p_x\, dx = \frac{a}{\pi}|p_x| \qquad I_2 = \frac{b}{\pi}|p_y|$$
$$\omega_1 = \frac{\partial H}{\partial I_1} = \frac{\pi^2 I_1}{m a^2} \qquad \omega_2 = \frac{\pi^2 I_2}{m b^2}$$
There are several interesting points about this apparently trivial problem.
The Hamiltonian looks linear, but in fact it contains an invisible nonlinear<br />
potential th<strong>at</strong> reverses the particle’s momentum when it hits the wall. One<br />
symptom of this is th<strong>at</strong> the frequencies depend on I. This looks odd, but<br />
it’s just the action-angle way of saying th<strong>at</strong> the particle makes a round trip<br />
(in the x direction) in a time T = 2am/px. The loop integral in this context<br />
is an integral over one “round trip” of the particle.<br />
$$\oint p_x\, dx = \int_0^a |p_x|\, dx + \int_a^0 (-|p_x|)\, dx = 2a|p_x|$$
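As a sanity check (the numbers are arbitrary choices of mine), the action-angle frequency ω1 = π²I1/(ma²) is exactly 2π divided by the round-trip time T = 2am/|px|:

```python
import math

m, a, px = 1.0, 2.0, 3.0      # arbitrary illustrative values (my choices)

I1 = a * abs(px) / math.pi                # I1 = (1/2pi) * oint px dx = a|px|/pi
omega1 = math.pi**2 * I1 / (m * a**2)     # omega1 = dH/dI1

T = 2.0 * a * m / abs(px)                 # round trip: a/(px/m) out, the same back
print(math.isclose(omega1, 2.0 * math.pi / T))  # True
```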
My real point in showing this example is to call your <strong>at</strong>tention to the<br />
angle variable. I will work through the calcul<strong>at</strong>ion for the x variable. This<br />
same thing holds for y of course.<br />
$$\frac{1}{2m}\left(\frac{dW_x}{dx}\right)^2 = E_1$$
$$W_x = \pm\sqrt{2mE_1}\int dx = \pm\frac{\pi I_1}{a}\int dx$$
$$\psi_1 = \frac{\partial W_x}{\partial I_1} = \pm\frac{\pi}{a}\int dx = \pm\frac{\pi}{a}\,x + \psi_{10}$$
4 Of course, I1 and I2 are constant for any given set of initial conditions. It is this sense<br />
in which the torus is a 2-d surface.
The term ψ10 is an integr<strong>at</strong>ion constant. There is no reason why it must be<br />
the same for both legs of the journey. We are free to choose it as follows:<br />
0 → x → a: ψ1 = πx/a<br />
0 ← x ← a: ψ1 = 2π − πx/a<br />
While the particle is bouncing violently between the walls, the angle variables<br />
are increasing smoothly with time, ψ1 = ω1t and ψ2 = ω2t. Even this<br />
strange problem is equivalent to a donut! 5<br />
5 When correctly viewed, everything is a harmonic oscill<strong>at</strong>or – in this case two harmonic<br />
oscill<strong>at</strong>ors.
Chapter 4
Canonical Perturbation Theory
So far we have assumed th<strong>at</strong> our systems had exact analytic solutions. One<br />
way of st<strong>at</strong>ing this is th<strong>at</strong> we can find a canonical transform<strong>at</strong>ion to action<br />
angle variables such th<strong>at</strong> the new Hamiltonian is a function of the action<br />
variables only, H = H(I). Such problems are the exception r<strong>at</strong>her than the<br />
rule. For our purposes they are also uninteresting. All periodic integrable<br />
systems are equivalent to a set of uncoupled harmonic oscill<strong>at</strong>ors. Once you<br />
get over the thrill of this discovery, the oscill<strong>at</strong>ors are boring! The existence<br />
of chaos depends on the system not being equivalent to a set of oscill<strong>at</strong>ors.<br />
In order to deal with systems th<strong>at</strong> are non-trivial in this sense, we need some<br />
way of doing perturb<strong>at</strong>ion theory. 1<br />
4.1 One-Dimensional Systems<br />
I will present the theory first for systems with one degree of freedom. This<br />
will simplify the notation; however, the interesting complications only appear
in higher dimensions. Here is the basic situ<strong>at</strong>ion: A bounded conserv<strong>at</strong>ive<br />
system with one degree of freedom is described by a constant Hamiltonian<br />
H(q, p) = E. We need to obtain the equ<strong>at</strong>ions of motion in the form q = q(t)<br />
1 I will follow the tre<strong>at</strong>ment in Chaos and Integrability in <strong>Nonlinear</strong> Dynamics, Michael<br />
Tabor, Wiley-Interscience, 1989. Another good reference is Classical <strong>Mechanics</strong> by R. A.<br />
Metzner and L.C. Shepley, Prentice Hall, 1991. The subject is also discussed in Classical<br />
<strong>Mechanics</strong>, Goldstein, Poole and Safko, third edition, Addison-Wesley, 2002. Goldstein<br />
discusses time-dependent and time-independent perturb<strong>at</strong>ion theory. We are doing the<br />
time-independent variety.<br />
and p = p(t), but this is impossible due to the non-linear n<strong>at</strong>ure of the<br />
problem. Suppose, however, that we are able to split up the Hamiltonian
H = H0 + ϵH1<br />
in such a way th<strong>at</strong> H0 is amenable to exact solution, and H1 is in some<br />
sense small. We indic<strong>at</strong>e the smallness by multiplying it by ϵ. This is a<br />
bookkeeping device; it will be set to one after the approxim<strong>at</strong>ions have been<br />
derived.<br />
The first step is to find the canonical transform<strong>at</strong>ion th<strong>at</strong> makes H0<br />
cyclic, i.e. q = q(I, ψ), p = p(I, ψ), and H0 = H0(I) where I and ˙ ψ = ω0<br />
are both constant. Unfortun<strong>at</strong>ely, this transform<strong>at</strong>ion does not render the<br />
complete Hamiltonian cyclic, so we write<br />
H(I, ψ) = H0(I) + ϵH1(I, ψ). (4.1)<br />
H is still constant, and consequently I now depends on ψ. H0 is not an<br />
explicit function of ψ, but it does depend on ψ implicitly through I.<br />
Despite this inconvenience, I and ψ are still a perfectly good set of<br />
canonical variables, so th<strong>at</strong> the equ<strong>at</strong>ions of motion<br />
$$\dot I = -\frac{\partial}{\partial\psi}H(I, \psi) \qquad \dot\psi = \frac{\partial}{\partial I}H(I, \psi) \tag{4.2}$$
are valid without approxim<strong>at</strong>ion, even though we are unable to solve them in<br />
this form. The so-called time-dependent perturbation theory proceeds from here by
expanding the solutions of (4.2) as power series in ϵ. Our approach is to find<br />
a second canonical transform<strong>at</strong>ion, i.e. (q, p) → (I, ψ) → (J, φ) such th<strong>at</strong><br />
H(I, ψ) → K(J). This last step must be done as a series of approxim<strong>at</strong>ions,<br />
of course, otherwise the problem would be exactly solvable.<br />
In order to make the transform<strong>at</strong>ion (I, ψ) → (J, φ) we will use a gener<strong>at</strong>ing<br />
function of the F2 genus, i.e. F = F (ψ, J). We need to expand<br />
F = F0(ψ, J) + ϵF1(ψ, J) + · · · (4.3)<br />
where F0 = Jψ. This is the identity transformation, as can be seen as follows:
$$I = \frac{\partial F_0}{\partial\psi} = J \qquad \varphi = \frac{\partial F_0}{\partial J} = \psi$$
In terms of (4.3) the transformation equations are
$$I = \frac{\partial F}{\partial\psi} = J + \epsilon\,\frac{\partial F_1(\psi, J)}{\partial\psi} + \cdots \tag{4.4}$$
$$\varphi = \frac{\partial F}{\partial J} = \psi + \epsilon\,\frac{\partial F_1(\psi, J)}{\partial J} + \cdots \tag{4.5}$$
Before going on there are some technical points about ψ and J th<strong>at</strong> need<br />
to be discussed. When ϵ = 0, ψ is the exact angle variable for the system.<br />
This means th<strong>at</strong> we can find p and q as functions of ψ such th<strong>at</strong> p and q<br />
return to their original values when ∆ψ = 2π. We can in principle invert<br />
this transform<strong>at</strong>ion to find ψ as a function of p and q.<br />
ψ = ψ(q, p) (4.6)<br />
When p and q run through a complete cycle, ψ advances by 2π. When ϵ ≠ 0
the orbit will be different from the unperturbed case, but the functional<br />
rel<strong>at</strong>ionship doesn’t change, so when p and q run through a complete cycle,<br />
we must still have ∆ψ = 2π. Of course, the exact angle variable will also<br />
advance 2π. In summary<br />
∆ψ = ∆φ = 2π (4.7)<br />
for one complete cycle.<br />
The following integrals are all equal because canonical transformations
preserve phase space volume.
$$J = \frac{1}{2\pi}\oint p\, dq = \frac{1}{2\pi}\oint J\, d\varphi = \frac{1}{2\pi}\oint I\, d\psi \tag{4.8}$$
Now integrate (4.4) around one orbit:
$$\frac{1}{2\pi}\oint I\, d\psi = \frac{1}{2\pi}\oint J\, d\psi + \frac{\epsilon}{2\pi}\oint\frac{\partial F_1}{\partial\psi}\, d\psi + \cdots;$$
that is,
$$J = J + \frac{\epsilon}{2\pi}\oint\frac{\partial F_1}{\partial\psi}\, d\psi + \cdots$$
We have just seen that ∆ψ = 2π around one cycle. Consequently
$$\oint\frac{\partial F_1}{\partial\psi}\, d\psi = 0 \tag{4.9}$$
This implies that the derivative of F1 is purely oscillatory with a
fundamental period of 2π in ψ. (The same is true of the higher order terms as well.)
The Hamiltonian is transformed in the usual way for an F2 generating function with the new variables.
$$K(\varphi, J) = H(\psi(\varphi, J), I(\varphi, J)) + \frac{\partial}{\partial t}F_2(\psi(\varphi, J), J, t) \tag{4.10}$$
As explained above, we seek a transform<strong>at</strong>ion th<strong>at</strong> makes φ cyclic so th<strong>at</strong><br />
K = K(J). The appropri<strong>at</strong>e gener<strong>at</strong>ing function does not depend on time,<br />
so (4.10) becomes<br />
K(J) = H(ψ(φ, J), I(φ, J)) (4.11)<br />
The approxim<strong>at</strong>ion procedure consists in expanding the left and right sides<br />
of this equ<strong>at</strong>ion in powers of ϵ and then equ<strong>at</strong>ing terms of zeroth and first<br />
order. This procedure could be carried out to higher order. I’m interested<br />
in first order corrections only.<br />
The so-called Kamiltonian is expanded as follows:<br />
K(J) = K0(J) + ϵK1(J) + · · ·<br />
At first sight, this agenda looks hopeless. We need to know the exact value
of J to make use of any of these terms, even the zeroth order approximation.
The exquisite point is th<strong>at</strong> we can use (4.8) to calcul<strong>at</strong>e J exactly without<br />
knowing the complete transform<strong>at</strong>ion.<br />
The zeroth order Hamiltonian is expanded with the help of (4.4).
$$H_0(I) = H_0\!\left(\frac{\partial F}{\partial\psi}\right) = H_0\!\left(J + \epsilon\frac{\partial F_1}{\partial\psi} + \cdots\right) = H_0(J) + \epsilon\,\frac{\partial F_1}{\partial\psi}\left.\frac{\partial H_0(J)}{\partial J}\right|_{\epsilon=0} + \cdots$$
where
$$\left.\frac{\partial H_0}{\partial J}\right|_{\epsilon=0} = \frac{\partial H_0(I)}{\partial I} = \omega_0 \tag{4.12}$$
The first order term is already multiplied by ϵ:
$$\epsilon H_1(\psi, I) = \epsilon H_1(\varphi, J) + \cdots$$
Substituting all this into (4.11) gives
$$K_0(J) = H_0(J) \tag{4.13}$$
$$K_1(J) = \frac{\partial F_1}{\partial\psi}\,\omega_0 + H_1(\varphi, J) \tag{4.14}$$
The not<strong>at</strong>ion H0(J) means th<strong>at</strong> you take your formula for H0(I) and replace<br />
the symbol I with the symbol J without making any change in the functional<br />
form of H0.<br />
Integrate (4.14) around one cycle and use (4.9); (4.14) becomes
$$K_1(J) = \overline{H}_1(J) \equiv \frac{1}{2\pi}\int_0^{2\pi} H_1(\psi, J)\, d\psi, \tag{4.15}$$
and
$$\frac{\partial}{\partial\psi}F_1(\psi, J) = \frac{1}{\omega_0(J)}\left[\overline{H}_1 - H_1(\psi, J)\right] \equiv -\frac{\tilde H_1(\psi, J)}{\omega_0} \tag{4.16}$$
˜H1 is the periodic part of H1. We are left with a differential equ<strong>at</strong>ion th<strong>at</strong><br />
is easy to integr<strong>at</strong>e.<br />
$$F_1(\psi, J) = -\frac{1}{\omega_0(J)}\int \tilde H_1(\psi, J)\, d\psi \tag{4.17}$$
4.1.1 Summary<br />
I will summarize all these technical details in the form of an algorithm for doing first order perturbation theory. Remember that the object is to find equations of motion in the form q = q(t) and p = p(t). We do this in three steps: (1) Find q = q(I, ψ) and p = p(I, ψ). (2) Find I = I(J, φ) and ψ = ψ(J, φ). (3) J and φ̇ are constant, so φ = φ̇t + φ0.
1. Identify the H0 part of the Hamiltonian. Find the transformation equations q = q(I, ψ) and p = p(I, ψ) using the Hamilton-Jacobi equation as described in the previous section. Use (4.12) to get ω0.
2. Equation (4.8) can be used to find J in terms of the total energy E. The integral presents no difficulties in principle, especially if the Hamiltonian is separable. In fact, textbooks never bother to do this. It seems sufficient to display the results in terms of J, the assumption being that we could find J = J(E) if we really had to.
3. The first order correction to the energy is obtained from the integral in (4.15). Get the first order correction to the frequency by differentiating it with respect to J.
4. The generating function F1 is calculated from (4.17). It is then substituted into (4.4) and (4.5). These give implicit equations for ψ = ψ(J, φ) and I = I(J, φ). Unfortunately, it is usually impossible to invert them to obtain these formulas explicitly.
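Step 3 of this algorithm lends itself to a quick numerical sketch. The routine below is my own illustration, not from the text (the function names are mine): it averages a given H1(ψ, J) over one cycle as in (4.15) and estimates the frequency shift ∂K1/∂J by a central difference. It is checked against the pendulum perturbation H1 = −(J²/6mR²) sin⁴ψ, treated below with m = R = 1, whose average is −J²/16.

```python
import math

def K1(H1, J, n=4096):
    # first order energy shift, eq. (4.15): average H1 over one cycle of psi
    # using the midpoint rule, which is essentially exact for periodic integrands
    return sum(H1(2 * math.pi * (k + 0.5) / n, J) for k in range(n)) / n

def omega1(H1, J, h=1e-6):
    # first order frequency shift dK1/dJ, estimated by a central difference
    return (K1(H1, J + h) - K1(H1, J - h)) / (2 * h)

# pendulum perturbation with m = R = 1: H1 = -(J^2/6) sin^4(psi)
H1 = lambda psi, J: -(J**2 / 6.0) * math.sin(psi)**4
```

With this H1, K1 comes out −J²/16 and the frequency shift −J/8, matching the pendulum results derived in the next section.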
4.1.2 The simple pendulum<br />
The pendulum makes a nice example.

$$H = \frac{l^2}{2mR^2} + mgR(1 - \cos\theta)$$
The angular momentum $l = mR^2\dot\theta$ is canonically conjugate to the angle θ. Expanding the cosine to fourth order in θ gives
$$H = \frac{1}{2mR^2}\left[l^2 + m^2R^4\omega_0^2\,\theta^2\left(1 - \frac{\theta^2}{12}\right)\right] + \cdots$$
The first two terms reduce to the familiar harmonic oscillator. This is the zeroth order problem.

$$H_0 = E_0 = \frac{l^2}{2mR^2} + \frac{mgR\,\theta^2}{2}$$

$$l^2 = \left(\frac{dW}{d\theta}\right)^2 = 2mR^2 E_0 - m^2R^4\omega_0^2\,\theta^2$$

Make the natural substitution

$$\sin^2\psi = \frac{mR^2\omega_0^2}{2E_0}\,\theta^2 \tag{4.18}$$

so that

$$l = \frac{dW}{d\theta} = \sqrt{2mR^2E_0}\,\cos\psi \tag{4.19}$$
We can look on this as a convenient change of variable, but ψ is also the<br />
angle variable. This can be seen as follows:<br />
$$I = \frac{1}{2\pi}\oint l\, d\theta = \frac{\sqrt{2mR^2E_0}}{2\pi}\oint\left[1 - \frac{mR^2\omega_0^2}{2E_0}\,\theta^2\right]^{1/2} d\theta$$
Use (4.19) to get the familiar result, I = E0/ω0. The generating function is obtained from the indefinite integral

$$W = \int \left(\frac{dW}{d\theta}\right) d\theta = \int \left[2mR^2\omega_0 I - m^2R^4\omega_0^2\,\theta^2\right]^{1/2} d\theta \tag{4.20}$$
According to the basic transformation formula we should have

$$\psi = \frac{\partial W}{\partial I}$$

One can show, by differentiating (4.20) and using (4.19) to complete the integration, that this is indeed so.
Equations (4.18) and (4.19) can be rearranged to give

$$l = \sqrt{2mR^2 I\omega_0}\,\cos\psi \tag{4.21}$$

$$\theta = \sqrt{\frac{2I}{mR^2\omega_0}}\,\sin\psi \tag{4.22}$$
The goal of the action-angle program is to express the original coordinates and momenta in terms of the action-angle variables. This has now been completed to zeroth order.
The first order correction is

$$H_1(I, \psi) = -\frac{mR^2\omega_0^2\,\theta^4}{24} = -\frac{I^2}{6mR^2}\sin^4\psi.$$
We are now in a position to recast our Hamiltonian à la (4.1).

$$H(I, \psi) = I\omega_0 + \epsilon\left(-\frac{I^2}{6mR^2}\sin^4\psi\right)$$

We have also obtained ω0 = √(g/R) “for free.” The ϵ is there for bookkeeping purposes only. We have no further need for it.
$$K_0(J) = H_0(J) = J\omega_0$$

$$K_1(J) = \bar H_1(J) = \frac{1}{2\pi}\int_0^{2\pi} H_1\, d\psi = -\frac{J^2}{16mR^2}$$

$$\tilde H_1 = H_1 - \bar H_1 = \frac{J^2}{48mR^2}\left(3 - 8\sin^4\psi\right)$$

$$F_1(J, \psi) = -\frac{1}{\omega_0}\int d\psi\, \tilde H_1 = \frac{J^2}{192\,mR^2\,\omega_0}\left(\sin 4\psi - 8\sin 2\psi\right)$$

$$\omega = \omega_0 + \frac{\partial K_1}{\partial J} = \omega_0 - \frac{J}{8mR^2}$$
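These results are easy to check numerically. With J = E0/ω0 = ½mR²ω0θ0², the first order frequency ω = ω0 − J/(8mR²) is the textbook result ω ≈ ω0(1 − θ0²/16). The sketch below is my own check, not from the text: it compares this against the exact pendulum frequency, written via the arithmetic-geometric-mean form of the complete elliptic integral, ω_exact = ω0 · agm(1, cos(θ0/2)).

```python
import math

def agm(a, g, tol=1e-15):
    # arithmetic-geometric mean; converges quadratically
    while abs(a - g) > tol:
        a, g = 0.5 * (a + g), math.sqrt(a * g)
    return a

def omega_exact(theta0, omega0=1.0):
    # exact pendulum frequency omega0 * pi / (2 K(sin(theta0/2))),
    # using the identity K(k) = pi / (2 agm(1, sqrt(1 - k^2)))
    return omega0 * agm(1.0, math.cos(theta0 / 2.0))

def omega_first_order(theta0, omega0=1.0, m=1.0, R=1.0):
    # omega = omega0 - J/(8 m R^2) with J = (1/2) m R^2 omega0 theta0^2
    J = 0.5 * m * R ** 2 * omega0 * theta0 ** 2
    return omega0 - J / (8.0 * m * R ** 2)
```

For an amplitude of 0.5 radians the two frequencies agree to better than one part in ten thousand; the residual difference is O(θ0⁴), as it should be for a first order result.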
4.2 Many Degrees of Freedom<br />
For systems of two or more degrees of freedom, canonical perturbation theory is formulated in exactly the same way as before – but now profound difficulties arise, even to first order in ϵ. The problem centers around equation (4.16), repeated here for reference:

$$\omega_0(J)\,\frac{\partial F_1(\psi, J)}{\partial \psi} = -\tilde H_1(\psi, J)$$
We were able to solve this with a simple integration (4.17). This is not possible for more than one degree of freedom, so we must resort to Fourier series. Before doing this, however, we will need to generalize our notation.
Let's use the vectors

$$\mathbf{J} = (J_1, \cdots, J_n) \qquad \boldsymbol{\omega}_0 = (\omega_{01}, \cdots, \omega_{0n}) \qquad \nabla_\psi = \left(\frac{\partial}{\partial \psi_1}, \cdots, \frac{\partial}{\partial \psi_n}\right)$$

where n is the number of degrees of freedom. In this notation (4.16) becomes

$$\boldsymbol{\omega}_0(\mathbf{J}) \cdot \nabla_\psi F_1(\boldsymbol{\psi}, \mathbf{J}) = -\tilde H_1(\mathbf{J}, \boldsymbol{\psi}) \tag{4.23}$$

where

$$\bar H_1(\mathbf{J}) = \frac{1}{(2\pi)^n}\int_0^{2\pi}\!\!\cdots\int_0^{2\pi} d\psi_1 \cdots d\psi_n\, H_1(\mathbf{J}, \boldsymbol{\psi}) \tag{4.24}$$

and $\tilde H_1 = H_1 - \bar H_1$. Since both sides of (4.23) are periodic, we can solve them with Fourier series.
$$\tilde H_1(\mathbf{J}, \boldsymbol{\psi}) = \sum_{\mathbf{k}} A_{\mathbf{k}}(\mathbf{J})\, e^{i\mathbf{k}\cdot\boldsymbol{\psi}} \tag{4.25}$$

$$F_1(\mathbf{J}, \boldsymbol{\psi}) = \sum_{\mathbf{k}} B_{\mathbf{k}}(\mathbf{J})\, e^{i\mathbf{k}\cdot\boldsymbol{\psi}} \tag{4.26}$$

where $\mathbf{k}$ is a vector of integers, $\mathbf{k} = (k_1, \cdots, k_n)$.
It seems as if we could proceed as follows: $\tilde H_1$ is known at this point, so we can find the $A_{\mathbf{k}}$. Substituting these expansions into (4.23) gives

$$B_{\mathbf{k}} = i\,\frac{A_{\mathbf{k}}}{\boldsymbol{\omega}_0 \cdot \mathbf{k}} \tag{4.27}$$
Now here's the infamous problem. Suppose, for example, there were only two degrees of freedom. In this case the denominator of (4.27) would be

$$\boldsymbol{\omega}_0 \cdot \mathbf{k} = \omega_{01}k_1 + \omega_{02}k_2 \tag{4.28}$$

You can see that if the winding number ω01/ω02 is a rational number, then for some $\mathbf{k}$, $B_{\mathbf{k}}$ will be infinite. It seems that the slightest perturbation will blow this system into outer space! Even if the winding number is not rational, there will always be values of $\mathbf{k}$ that will make $\boldsymbol{\omega}_0 \cdot \mathbf{k}$ arbitrarily small.
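A brute-force scan makes the small-denominator problem concrete (this sketch is my own, not from the text): for a rational frequency ratio the denominator ω0·k vanishes exactly at finite k, while for an irrational ratio such as the golden mean it never vanishes but creeps toward zero as the allowed |k| grows.

```python
import math

def smallest_denominator(w1, w2, kmax):
    # smallest |k1*w1 + k2*w2| over integer vectors k != 0 with |k1|, |k2| <= kmax
    best = float("inf")
    for k1 in range(-kmax, kmax + 1):
        for k2 in range(-kmax, kmax + 1):
            if (k1, k2) != (0, 0):
                best = min(best, abs(k1 * w1 + k2 * w2))
    return best
```

For ω0 = (3/2, 1) the minimum is exactly zero (take k = (2, −3)); for ω0 = (φ, 1) with φ the golden mean the minimum is small but nonzero, and it shrinks along the Fibonacci sequence of k values as kmax increases.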
This problem was discovered in the early twentieth century, and all the effort of the most eminent mathematicians of the day failed to solve it. One opinion held that the slightest perturbation would cause the system to become “ergodic,” that is to say, the trajectories would fill up all of phase space. Numerical calculations later showed that this was often not the case. Trajectories will often “lock in” to stable patterns. This has been the subject of much contemporary research. When and why do trajectories lock in, and what happens when they do not? The question of what trajectories remain stable under small perturbations is at least partly answered by the so-called KAM (Kolmogorov, Arnold, Moser) theorem. In the general case there is, if not a complete theory, at least a well-developed taxonomy. We will turn to these matters in the next chapter.
Chapter 5<br />
Introduction to Chaos<br />
The canonical perturbation theory of the previous chapter is a lot of work, and in two or more degrees of freedom it summons up the ogre of small denominators. Many people have tried to solve this problem by pounding their heads on it. This turns out not to be a fruitful approach. I will illustrate the limitations of perturbation theory by considering the van der Pol oscillator. This is a simple nonlinear, one-dimensional, second-order differential equation closely resembling a damped harmonic oscillator. It has stable solutions which can easily be found numerically, yet it has no known analytic solutions, and perturbation theory, on general principles, just can't work!1 We then go on to discuss linear stability theory. With these simple techniques you can analyze most nonlinear systems (the van der Pol oscillator is an exception) and get a qualitative picture of the phase space dynamics. In one degree of freedom (two-dimensional phase space) it will be immediately apparent where perturbation theory is possible, and you will get a qualitative idea of the motion of the system where it is not.
Higher dimensional spaces are not so easy to analyze, in part because they are hard to visualize and in part because they are often not integrable. It is this non-integrability that leads to chaos. Here we resort to the Poincaré section and the notion of discrete maps. The Poincaré-Birkhoff and KAM theorems can then tell us something about the onset and structure of chaos.
1It should be remembered that all the major developments in elementary particle theory over the last few decades, starting with the standard model in the 1970s, are based on the notion of spontaneous symmetry breaking. Spontaneous symmetry breaking, almost by definition, cannot be described with perturbation theory. When perturbation theory fails we always expect new physics. The same is true (to a lesser extent) in classical mechanics as well.
5.1 The total failure of perturbation theory

To get some feeling for how perturbation theory might be useless, look at the following “toy” example.

$$\ddot x = -x + \epsilon\left(x^2 + \dot x^2 - 1\right)\sin(\sqrt{2}\,t) \tag{5.1}$$
This looks like a harmonic oscillator with a resonant frequency ω = 1 and a “small” driving term with a frequency ω = √2. Obvious solutions are x(t) = sin t and x(t) = cos t, which hold for all values of ϵ. If we set ϵ = 0 then the solutions more generally are x(t) = x0 sin(t + t0). This solution plotted on a phase space plot of x(t) versus ẋ(t) will be a circle with radius r = x0. What would you expect for finite ϵ? There presumably are other solutions, but don't waste your time looking for them! You should convince yourself, however, that there are no solutions of the form
yourself however, th<strong>at</strong> there are no solutions of the form<br />
x(t) = sin t +<br />
∞∑<br />
ϵ n fn(t) (5.2)<br />
Also convince yourself that the trouble comes from the nonlinear terms. The point is that, because of the nonlinearity, it is not possible to start with unperturbed solutions and get new solutions by adding to them.
A more interesting and oft-studied example is the van der Pol equation. It was first introduced by van der Pol in 1926 in a study of the nonlinear vacuum tube circuits of early radios.

$$\ddot x + \epsilon\left(x^2 - 1\right)\dot x + x = 0 \tag{5.3}$$
Again the ϵ = 0 solutions are x(t) = x0 sin(t + t0). In phase space this is a circle of radius x0. If we make ϵ ever so slightly larger than zero, however, something remarkable happens, as shown in the first of the plots in Fig. 5.1. Yes, the orbit eventually becomes a circle, but regardless of the initial conditions, the radius r ≈ 2. The same sort of behavior is shown in Fig. 5.1 for larger values of ϵ. The shape of the final orbit is determined entirely by ϵ and is completely unaffected by the initial conditions. A curve of this sort is called a limit cycle. It's easy to see in a vague way why the limit cycle exists. The term proportional to ϵ in (5.3) looks like an oscillator damping term, but its sign depends on whether x² is greater or less than 1. If it is greater, the oscillation is damped; if it is smaller the oscillation is “undamped.” Indeed, if ϵ is made negative, the orbits either collapse to zero or diverge to infinity depending on the initial conditions. For obvious reasons the solutions with
Figure 5.1: Phase plots (ẋ versus x) of the van der Pol oscillator for four values of ϵ (0.1, 0.5, 1.5, and 3) and two starting values (indicated by asterisks)
positive ϵ are said to be stable and those with negative ϵ are said to be unstable.
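The limit cycle is easy to verify numerically. The sketch below is my own, not from the text (the step size and integration time are arbitrary choices): it integrates (5.3) with a classical fourth-order Runge-Kutta scheme and reports the final orbit radius √(x² + ẋ²). For small ϵ this comes out near 2 whether the orbit starts inside or outside the limit cycle.

```python
import math

def van_der_pol_radius(eps, x0, v0, dt=0.001, t_end=200.0):
    # integrate x'' + eps (x^2 - 1) x' + x = 0 with a classical RK4 step
    # and return the orbit radius sqrt(x^2 + v^2) at the final time
    def f(x, v):
        return v, -x - eps * (x * x - 1.0) * v
    x, v = x0, v0
    for _ in range(int(t_end / dt)):
        k1x, k1v = f(x, v)
        k2x, k2v = f(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = f(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = f(x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
    return math.hypot(x, v)
```

Starting from (0.1, 0) the orbit spirals outward to the limit cycle; starting from (3, 0) it spirals inward to the same cycle.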
This simple model makes an important point. Conventional perturbation theory starts with unperturbed, i.e. ϵ = 0, solutions and then looks for series solutions in powers of ϵ. This is obviously hopeless here, since even a smidgeon of ϵ is enough to completely alter the nature of the orbits. It would be better to start with some simple function that approximated the limit cycle and then expand in powers of some parameter that characterized the deviation of the actual orbit from the simple function. Alas, I don't know how to do this. The trouble is that the limit cycle is so weird, at least for large ϵ, that it's hard to come up with a “lowest-order” solution. For many systems, however, this is a practical approach. The trick is to look for the fixed points.
5.2 Fixed points and linearization

Equations of motion can always be cast in the form

$$\dot\xi = f(\xi, t). \tag{5.4}$$
With n degrees of freedom ξ and f are 2n-dimensional vectors. For example,<br />
Hamilton's equations with one degree of freedom are

$$\dot q = \frac{\partial H}{\partial p} \qquad \dot p = -\frac{\partial H}{\partial q} \qquad \xi = \begin{bmatrix} p \\ q \end{bmatrix} \tag{5.5}$$
To keep the notation simple and general (and to save typing) I will keep the notation in the form (5.5) for the time being and not type out the p's and q's. I will also restrict the discussion to autonomous systems, i.e. those in which the Hamiltonian does not depend explicitly on time.2
A fixed point (also called a stationary point, equilibrium point, or critical point) is simply a point ξf where all the time derivatives vanish, $f(\xi_f) = \dot\xi_f = 0$. It's the place where nothing happens. Detailed information about
2The notation in this section is taken from Classical Dynamics by J. V. José and E. J. Saletan.
the motion of a system close to a fixed point can be obtained by linearizing the equations of motion. This is done as follows: First the origin is moved to the fixed point by writing

$$\zeta(t) = \xi(t; \xi_0) - \xi_f \tag{5.6}$$

Second, (5.4) is written for ζ rather than for ξ.

$$\dot\zeta = f(\zeta + \xi_f) \equiv g(\zeta) \tag{5.7}$$
Third, g is expanded in a Taylor series about ζ = 0.

$$\dot\zeta^j = \left.\frac{\partial g^j}{\partial \zeta^k}\right|_{\xi_f}\,\zeta^k + O(\zeta^2) \equiv A^j_{\ k}\,\zeta^k + O(\zeta^2) \tag{5.8}$$
I am using the Einstein summation convention in which one sums over repeated indices. Dropping the ζ² terms gives the matrix equation

$$\dot z = A \cdot z \tag{5.9}$$

A is a constant matrix called (among other things) the stability matrix. It's easy to solve (5.9) using the matrix exponential
$$z(t) = e^{At} z_0 \tag{5.10}$$

where

$$e^{At} \equiv \sum_{n=0}^{\infty} \frac{A^n t^n}{n!} \tag{5.11}$$
For our purposes it will be enough to take the case of one degree of freedom, in which case A is a 2 × 2 real, constant matrix. If A is diagonal,

$$e^{At} = \begin{bmatrix} e^{\lambda_1 t} & 0 \\ 0 & e^{\lambda_2 t} \end{bmatrix} \tag{5.12}$$

where λ1 and λ2 are eigenvalues, which might be real or complex. If they are complex they come in complex-conjugate pairs, λ1* = λ2.
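The series (5.11) converges quickly for small matrices and provides a direct numerical check of (5.12). The sketch below is my own illustration (the truncation at 40 terms is an arbitrary choice): it builds e^{At} for a 2 × 2 matrix by summing the series term by term.

```python
import math

def mat_exp(A, t, terms=40):
    # truncated series e^{At} = sum_{n=0}^{terms} (A t)^n / n!  for a 2x2 matrix A
    result = [[1.0, 0.0], [0.0, 1.0]]  # identity: the n = 0 term
    term = [[1.0, 0.0], [0.0, 1.0]]    # running value of (A t)^n / n!
    for n in range(1, terms + 1):
        # term <- term * (A t) / n
        term = [[sum(term[i][k] * A[k][j] * t / n for k in range(2))
                 for j in range(2)] for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result
```

For a diagonal A = diag(λ1, λ2) the result reproduces (5.12) to machine precision.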
Various cases can be identified. If both eigenvalues are real and positive, all trajectories flow away from the fixed point, which is then called unstable. If they are both negative, all trajectories flow toward it, and the fixed point is said to be stable. If the eigenvalues have opposite signs then the trajectories are repelled from one axis and attracted to the other. This is called a hyperbolic fixed point or a saddle point.
Figure 5.2: Unstable fixed point for real λ2 > λ1 > 0.<br />
Figure 5.3: Unstable fixed point for real λ1 = λ2 > 0.
Figure 5.4: Hyperbolic fixed point for real λ1 < 0, λ2 > 0.<br />
It is possible that A cannot be diagonalized. In that case it can at least be put in triangular form, i.e.

$$A = \begin{bmatrix} \lambda & 0 \\ \mu & \lambda \end{bmatrix} \tag{5.13}$$

then

$$z(t) = e^{\lambda t}\begin{bmatrix} 1 & 0 \\ \mu t & 1 \end{bmatrix} z_0 \tag{5.14}$$
Complex eigenvalues require a bit more discussion. Let λ = α + iβ and z = u + iv, where α and β are real numbers, and u and v are real vectors orthogonal to one another. Separating real and imaginary parts,

$$A \cdot u = \alpha u - \beta v \qquad A \cdot v = \beta u + \alpha v \tag{5.15}$$

Evidently A · z* = λ* z*, so z* is an eigenvector with eigenvalue λ*. Eigenvectors belonging to different eigenvalues are independent. We can construct the independent real vectors u and v as follows:

$$u = \frac{z + z^*}{2} \qquad v = \frac{z - z^*}{2i} \tag{5.16}$$
Figure 5.5: Unstable fixed point for nondiagonalizable A matrix. All of the integral curves are tangent to z2 at the fixed point.
Substituting these definitions into (5.10) gives

$$e^{At} u = e^{\alpha t}\left(u\cos\beta t - v\sin\beta t\right), \tag{5.17}$$

$$e^{At} v = e^{\alpha t}\left(u\sin\beta t + v\cos\beta t\right). \tag{5.18}$$
There are two important cases: α > 0, in which case the fixed point is unstable and the orbits are spirals, and α = 0, when the phase portrait consists of circles. In this case the fixed point is called a center or an elliptic point.
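The whole taxonomy of this section (nodes, saddles, spirals, centers) follows from the trace and determinant of A, since the eigenvalues satisfy λ² − λ tr A + det A = 0. A minimal classification sketch of my own, not from the text (border cases are lumped under "degenerate"):

```python
def classify_fixed_point(a, b, c, d):
    # classify the fixed point of zdot = A z for A = [[a, b], [c, d]]
    # via the eigenvalues, computed from the trace and determinant
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4.0 * det
    if disc >= 0:  # real eigenvalues
        l1 = 0.5 * (tr + disc ** 0.5)
        l2 = 0.5 * (tr - disc ** 0.5)
        if l1 > 0 and l2 > 0:
            return "unstable node"
        if l1 < 0 and l2 < 0:
            return "stable node"
        if l1 * l2 < 0:
            return "hyperbolic (saddle) point"
        return "degenerate"
    # complex pair alpha +/- i beta; stability is decided by alpha = tr/2
    if tr > 0:
        return "unstable spiral"
    if tr < 0:
        return "stable spiral"
    return "center (elliptic point)"
```

For example, A = [[0, 1], [−1, 0]] (the harmonic oscillator) gives a center, while A = [[1, 0], [0, −1]] gives a saddle.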
It should be remembered that (5.9) is a linearized equation. It holds in some small region around the fixed point, and of course, as is so often the case, the theory gives us no way to tell how small that region might be. The damped oscillator makes a good example of the method, and there are many other examples in the textbooks. On the other hand, the theory fails completely for the van der Pol oscillator of the previous section.
Figure 5.6: Unstable fixed point for complex λ with ℜ(λ) > 0.

Figure 5.7: Stable fixed point for λ pure imaginary.

5.3 The Henon oscillator

Although the theory from the previous section is perfectly general in the sense that it can be applied to systems with any number of degrees of freedom, it is almost impossible to visualize in four or more dimensions, and the
number of cases that must be considered increases rapidly. The best tool for visualizing higher dimensional spaces is the Poincaré section. This was described briefly in Chapter 3, and we will make more use of it shortly. Before doing so it will be useful to have a good example of motion with two degrees of freedom. A fascinating and oft-studied case is the Hénon-Heiles Hamiltonian. It was originally used to model the motion of stars in the galaxy.3 Written in terms of dimensionless variables the Hamiltonian is
$$H = \frac{1}{2}\left(\dot x^2 + \dot y^2 + k_1 x^2 + k_2 y^2\right) + \lambda\left(x^2 y - \frac{y^3}{3}\right) \tag{5.19}$$
This is the Hamiltonian of two uncoupled harmonic oscillators with a perturbation proportional to λ. The oscillators have frequencies ω1 = √k1 and ω2 = √k2. The phase space is the four-dimensional space spanned by x, ẋ, y, and ẏ. We can think of the unperturbed orbit as lying on two tori. In this case their cross sections are circular with radii determined by the initial conditions. If the winding number is w = r/s, x will complete r cycles while y completes s. Let us make a Poincaré section through the y torus at x = 0. Each time the orbit passes from x < 0 to x > 0 we mark a point at y and ẏ on the x = 0 plane. An example is shown in Figure (5.8) for w = 7/2. Because the winding number is rational there are seven discrete dots on the Poincaré section. The case of an irrational winding number is shown in Figure (5.9). The x vs. y plot is completely filled in, and the Poincaré plot is a continuous loop. Continuous loops like this on the Poincaré plot are a sign that the system is circulating around an invariant torus and hence is integrable.
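A Poincaré section like those in the figures can be generated with a few lines of code. The sketch below is my own, not from the text: it integrates the equations of motion that follow from (5.19), taking k1 = k2 = 1 for simplicity, with a fourth-order Runge-Kutta step, and records (y, ẏ) at each upward crossing of x = 0. The conserved energy provides a consistency check on the integration.

```python
import math

def henon_heiles_section(E, lam=1.0, y0=0.1, dt=0.001, t_end=100.0):
    # Poincare section of the Henon-Heiles system (k1 = k2 = 1) at x = 0.
    # Returns the section points (y, vy) at upward crossings and the final energy.
    def deriv(s):
        x, y, vx, vy = s
        return (vx, vy, -x - 2.0 * lam * x * y, -y - lam * (x * x - y * y))

    def rk4_step(s):
        k1 = deriv(s)
        k2 = deriv(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k1)))
        k3 = deriv(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k2)))
        k4 = deriv(tuple(si + dt * ki for si, ki in zip(s, k3)))
        return tuple(si + dt * (a + 2 * b + 2 * c + d) / 6.0
                     for si, a, b, c, d in zip(s, k1, k2, k3, k4))

    def energy(s):
        x, y, vx, vy = s
        return 0.5 * (vx * vx + vy * vy + x * x + y * y) + lam * (x * x * y - y ** 3 / 3.0)

    # start on the section: x = 0, vy = 0, vx fixed by the total energy E
    vx = math.sqrt(2.0 * E - y0 * y0 + (2.0 / 3.0) * lam * y0 ** 3)
    s = (0.0, y0, vx, 0.0)
    points = []
    for _ in range(int(t_end / dt)):
        x_old = s[0]
        s = rk4_step(s)
        if x_old < 0.0 <= s[0]:          # upward crossing of x = 0
            points.append((s[1], s[3]))  # record (y, dy/dt)
    return points, energy(s)
```

Scatter-plotting the returned points for increasing E (with λ = 1, as in the figures) reproduces the progression from smooth loops to scattered dots described below.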
When we turn on the perturbation by making λ ≠ 0, something remarkable happens.4 Figures (5.10) through (5.13) show a progression from the orderly motion of the uncoupled oscillators in Figure (5.9), through the loop in (5.10) suggesting motion around a single distorted torus. As the interaction strength is increased, this one torus breaks up into five separate tori. In the next plot, Figure (5.12), the points are beginning to disperse in a random way with some structure remaining. Because of the poor resolution of the plots one cannot see the fine details that remain. Finally, in the last plot, the points are arranged in a completely random pattern. This is paradigmatic. As the strength of the perturbation increases, orderly motion disintegrates

3See Goldstein's Classical Mechanics for a review of the physics.

4I am following standard practice by varying the perturbation strength by changing the total energy with λ = 1.
Figure 5.8: Harmonic oscillator coordinates for w = 7/2: the x vs. y orbit and the Poincaré section through x = 0.
Figure 5.9: Harmonic oscillator coordinates for an irrational winding number: the Poincaré section through x = 0.
Figure 5.10: Henon-Heiles Hamiltonian. Orbit circul<strong>at</strong>es a distorted torus.<br />
Figure 5.11: The orbit breaks up into smaller tori.
Figure 5.12: Chaos begins to set in.<br />
Figure 5.13: Complete Chaos
into chaos. One of the goals of chaos theory is to explain and predict this phenomenon. This will require some new formalism.
5.4 Discrete Maps<br />
Suppose we were to number the points on the Poincaré plot in the order they appeared as the orbit repeatedly cut through the x = 0 plane. This would give us a series of coordinates (x1, y1), (x2, y2), · · · , (xn, yn). Think of this in terms of a mapping operator T that maps the n-th point into the (n + 1)-th point.
T(xn, yn) ≡ (xn+1, yn+1) (5.20)<br />
In principle we could derive the exact mathematical form for this operator. (I doubt that anyone has actually done this.) Certainly we could write a computer program to do the mapping, and certainly we could derive a linearized version of T that would be OK for small displacements. For my purposes it will be enough to consider the general properties such operators must have. The first of these (from which all others flow) is that they must be area preserving.
Canonical transformations preserve the volume of phase space. This is called Liouville's theorem; it's proved in most mechanics texts. For a one-degree-of-freedom system, this is just preservation of area in the (p, q) phase plane. Thus for some area A, enclosed by a closed curve C, we can use Stokes' theorem to write

$$\oint_C p\, dq = \oint_{C'} p\, dq \tag{5.21}$$

where C′ is the shape of the curve after it has been changed by some canonical transformation, including the passage of time, which is itself a canonical transformation. Another way to say the same thing is that if the (q, p) point in phase space is transformed to (q′, p′), then the Jacobian

$$\left|\frac{\partial(q', p')}{\partial(q, p)}\right| = 1 \tag{5.22}$$

These results can be extended to higher dimensions in a completely straightforward manner.
So far, so good. There is a corollary to Liouville's theorem that is not so easy to prove: transformations of the form (5.20) on the Poincaré
Figure 5.14: The standard map with ϵ = 0.<br />
section also preserve area in the sense of (5.22). 5 Discrete maps are area<br />
preserving.<br />
Let's take time out for an example. The following transformation is called the standard map, presumably because it appears in so many different contexts. Thanks to the J_{n+1} (rather than J_n) in the first of equations (5.23) it is trivially area preserving.

$$\phi_{n+1} = (\phi_n + J_{n+1}) \bmod 2\pi \tag{5.23}$$
$$J_{n+1} = J_n + \epsilon\sin\phi_n$$
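In code the map is two lines per iteration (a sketch of my own; note that J must be updated first, since the φ update uses J_{n+1}):

```python
import math

def standard_map(phi, J, eps, n):
    # iterate the standard map (5.23) n times from the initial point (phi, J)
    for _ in range(n):
        J = J + eps * math.sin(phi)        # J_{n+1}
        phi = (phi + J) % (2.0 * math.pi)  # phi_{n+1}, wrapped onto [0, 2 pi)
    return phi, J
```

For ϵ = 0 the action J is an exact invariant and φ simply advances by J each step, reproducing the parallel circles of Figure 5.14.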
This is a map with one degree of freedom, written in terms of action-angle variables J and ϕ. Not only ϕ, but also J is periodic with period 2π. We can imagine all the orbits wrapped around a cylinder. In the case ϵ = 0, the (ϕn, Jn)'s lie along parallel circles as shown in Figure 5.14. When ϵ is increased to 0.050 a new feature appears: a loop in the center of the plot. This is unusual in the sense that it can be contracted to a point; it is topologically distinct from all the ϵ = 0 circles. As ϵ is increased, an assortment of smaller loops appears together with a smattering of completely random points. Because of limitations on plot resolution, computer time, and my patience you cannot see the really significant thing about this plot: this pattern of islands of loopy order interspersed with random dots persists at ever smaller and smaller scales. The islands have a property called self-similarity. In this sense they are
Figure 5.15: The standard map with ϵ = 0.050.<br />
similar to fractal patterns. Finally, as ϵ is increased further, all appearance of order disappears and the dots become completely random. This is the state of complete chaos.
5.5 Linearized Maps<br />
Like the continuous transformations we studied in Section 5.2, discrete maps have fixed points about which one can analyze the local topology. Consider a generic mapping of the form6

$$\begin{bmatrix} x_{i+1} \\ y_{i+1} \end{bmatrix} = T \begin{bmatrix} x_i \\ y_i \end{bmatrix} \tag{5.24}$$
A fixed point of the mapping would be a point where x_{i+1} = x_i and y_{i+1} = y_i. I will argue later on that in a plot like Figure 5.16 there are an infinite number of fixed points, but to keep the algebra simple here I will assume that the fixed point is at the origin (0, 0). Linearizing T about this point gives

$$\begin{bmatrix} \delta x_{i+1} \\ \delta y_{i+1} \end{bmatrix} = \begin{bmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{bmatrix} \begin{bmatrix} \delta x_i \\ \delta y_i \end{bmatrix} \tag{5.25}$$

where of course

$$T_{ij} = \left.\frac{\partial T_i}{\partial x_j}\right|_{(0,0)} \tag{5.26}$$

6I am using Tabor's notation from section 4.3.4.
Figure 5.16: The standard map with ϵ = 0.750.<br />
The eigenvalues λi of the Tij matrix must satisfy

$$\lambda^2 - \lambda\,\mathrm{trace}(T) + \det(T) = 0 \tag{5.27}$$
The all-important point here is that because of the area-preserving property of T, det(T) = 1. This greatly restricts the allowed types of fixed points. There are only three cases to consider.

If |trace(T)| < 2, λ1 and λ2 are a complex conjugate pair lying on the unit circle, that is,

$$\lambda_1 = e^{+i\alpha}, \qquad \lambda_2 = e^{-i\alpha} \tag{5.28}$$

This is simply a rotation in the vicinity of the fixed point (0, 0). This corresponds to a stable or elliptic point. Thus in the immediate neighborhood of (0, 0) we expect to find invariant curves like Figure 5.7.
If |trace(T )| > 2, λ1 λ2 are real numbers s<strong>at</strong>isfying<br />
λ1 = 1/λ2<br />
(5.29)<br />
There are two subcases to consider here depending on whether λ is positive<br />
or neg<strong>at</strong>ive. If it is positive we have a regular hyperbolic fixed point in<br />
which successive iter<strong>at</strong>e stay on the same branch of the hyperbola as in<br />
Figure 5.17 (a). If λ < 0 we have a hyperbolic-with-reflection fixed point<br />
in which successive iter<strong>at</strong>es jump backwards and forwards between opposite<br />
branches of the hyperbola. (See Figure 5.17 (b).)
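These three cases can be checked numerically. The sketch below builds T_{ij} by finite differences, verifies det T = 1, and classifies by the trace as in (5.27)–(5.29). It assumes one common convention for the standard map, which has fixed points at (ϕ, J) = (0, 0) and (π, 0), where sin ϕ vanishes.

```python
import math

def T(phi, J, eps):
    # One assumed convention for the standard map; signs may differ from (5.23).
    Jn = J + eps * math.sin(phi)
    return phi + Jn, Jn

def jacobian(phi, J, eps, h=1e-6):
    """Finite-difference linearization T_ij of Eqs. (5.25)-(5.26)."""
    f0 = T(phi, J, eps)
    cols = []
    for dphi, dJ in ((h, 0.0), (0.0, h)):
        f1 = T(phi + dphi, J + dJ, eps)
        cols.append(((f1[0] - f0[0]) / h, (f1[1] - f0[1]) / h))
    # cols[j] is the derivative with respect to the j-th variable
    return [[cols[0][0], cols[1][0]],
            [cols[0][1], cols[1][1]]]

def classify(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    tr = M[0][0] + M[1][1]
    assert abs(det - 1.0) < 1e-4          # area preservation: det T = 1
    if abs(tr) < 2.0:
        return "elliptic"
    return "hyperbolic" if tr > 0 else "hyperbolic-with-reflection"

eps = 0.75
print(classify(jacobian(0.0, 0.0, eps)))       # trace = 2 + eps > 2
print(classify(jacobian(math.pi, 0.0, eps)))   # |trace| = |2 - eps| < 2
```

For this convention the point at ϕ = 0 comes out hyperbolic and the one at ϕ = π elliptic, matching the island-around-center structure seen in Figure 5.16.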
72 CHAPTER 5. INTRODUCTION TO CHAOS
Figure 5.17: (a) Hyperbolic fixed point. (b) Hyperbolic-with-reflection fixed point.
5.6 Lyapunov Exponents
Loosely speaking, systems are chaotic because adjacent trajectories diverge exponentially from one another. If this were literally true we could parameterize this divergence with the function e^{λx}, where λ is some constant and x is the independent variable, which might be continuous or discrete depending on the application. This is the basic idea behind Lyapunov exponents, a formalism with many alternate definitions (and spellings).
Let's apply this idea first to a one-dimensional iterative map of the form
\[
x_{i+1} = f(x_i) \tag{5.30}
\]
We can characterize the divergence of two trajectories separated by ϵ upon the n-th iteration as
\[
\lim_{\epsilon \to 0} \frac{|f(x_n + \epsilon) - f(x_n)|}{\epsilon} = \left| \frac{df(x_n)}{dx_n} \right| \tag{5.31}
\]
A small but finite deviation at the n-th iteration, say δx_n, should grow to
\[
\delta x_{n+1} \approx \left| \frac{df(x_n)}{dx_n} \right| \delta x_n \tag{5.32}
\]
Continuing this reasoning,
\[
\left| \frac{\delta x_{n+1}}{\delta x_0} \right|
= \left| \frac{df(x_n)}{dx_n} \, \frac{df(x_{n-1})}{dx_{n-1}} \times \cdots \times \frac{df(x_0)}{dx_0} \right| \tag{5.33}
\]
\[
= \prod_{i=0}^{n} |f'(x_i)| = e^{\lambda n}
\]
The last equality is just a hypothesis. λ will certainly depend on the point n where we stop iterating. We should write instead
\[
\lambda(n) = \frac{1}{n} \ln \prod_{i=0}^{n} |f'(x_i)| \tag{5.34}
\]
with the understanding that the definition only makes sense if there is some range of n over which λ(n) is more or less constant. λ defined in this way is a Lyapunov exponent.
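As a concrete illustration of (5.34), here is a sketch using the logistic map x_{n+1} = r x_n (1 − x_n) — an example of my own choosing rather than one from the text. At r = 4 the exponent is known analytically to be ln 2 ≈ 0.693.

```python
import math

def lyapunov(f, fprime, x0, n, n_transient=100):
    """Estimate lambda(n) = (1/n) * sum_i ln|f'(x_i)|, Eq. (5.34)."""
    x = x0
    for _ in range(n_transient):      # discard the transient approach
        x = f(x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(fprime(x)))
        x = f(x)
    return total / n

r = 4.0
f = lambda x: r * x * (1.0 - x)
fprime = lambda x: r * (1.0 - 2.0 * x)
lam = lyapunov(f, fprime, 0.3, 100_000)
print(lam)   # settles near ln 2 ~ 0.693 for the fully chaotic case r = 4
```

Plotting λ(n) against n shows exactly the behavior demanded above: after the transient there is a long range of n over which λ(n) is more or less constant.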
In the case of multidimensional mappings
\[
x_{i+1} = F(x_i) \tag{5.35}
\]
where x and F are n-dimensional vectors, there will be a set of n characteristic exponents corresponding to the n eigenvalues of the linearized map (5.25). Introducing the eigenvalues λ_i(N), i = 1, . . . , n, of the matrix
\[
(LM)_N = \bigl( T(x_N)\, T(x_{N-1}) \cdots T(x_1) \bigr)^{1/N} \tag{5.36}
\]
where T(x_i) is the linearization of F at the point x_i, the exponents are defined as
\[
\sigma_i(N) = \ln |\lambda_i(N)| \tag{5.37}
\]
Since the T's have unit determinant for area-preserving maps, it is clear that the sum of the exponents must be zero.
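This last claim is easy to verify numerically: multiply the linearizations T(x_i) along an orbit as in (5.36) and take the eigenvalues of the product. The sketch below again assumes one common convention for the standard map; the values ϵ = 5.0 and N = 8 are arbitrary choices (strong kicking, short orbit) that keep the floating-point products well conditioned.

```python
import math, cmath

def step(phi, J, eps):
    # One assumed convention for the standard map (cf. Eq. 5.23).
    Jn = J + eps * math.sin(phi)
    return (phi + Jn) % (2.0 * math.pi), Jn

def tangent(phi, eps):
    # Analytic linearization T(x_i) of the map above; note det = 1.
    c = eps * math.cos(phi)
    return [[1.0 + c, 1.0], [c, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Accumulate the product T(x_N) ... T(x_1) along an orbit, as in Eq. (5.36).
eps, N = 5.0, 8
phi, J = 1.0, 0.5
P = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(N):
    P = matmul(tangent(phi, eps), P)
    phi, J = step(phi, J, eps)

tr = P[0][0] + P[1][1]
det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
disc = cmath.sqrt(tr * tr - 4.0 * det)      # complex-safe quadratic formula
l1, l2 = (tr + disc) / 2.0, (tr - disc) / 2.0
s1, s2 = math.log(abs(l1)) / N, math.log(abs(l2)) / N   # Eq. (5.37)
print(s1 + s2)   # ~0: for an area-preserving map the exponents sum to zero
```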
For the final example, suppose the equation of motion is
\[
\dot{x} = f(x) \tag{5.38}
\]
Let s(t) = x(t) − x_0(t) be the difference between two nearby trajectories. If this does indeed diverge exponentially with time, then ṡ = λs. Then we can argue that
\[
\dot{s} = \dot{x} - \dot{x}_0 = f(x) - f(x_0) = \lambda s = \lambda (x - x_0) \tag{5.39}
\]
\[
\lambda = \frac{f(x) - f(x_0)}{x - x_0} \approx \left. \frac{df}{dx} \right|_{x_0} \tag{5.40}
\]
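A quick numerical check of (5.38)–(5.40): integrate two nearby trajectories of ẋ = f(x) and compare the growth of their separation with df/dx. The choice f(x) = sin x is mine, picked because near the unstable fixed point x = 0 the prediction is λ = cos 0 = 1.

```python
import math

def flow(x, t, dt=1e-4):
    """Integrate xdot = sin(x) with fixed-step RK4 (illustrative choice of f)."""
    for _ in range(int(round(t / dt))):
        k1 = math.sin(x)
        k2 = math.sin(x + 0.5 * dt * k1)
        k3 = math.sin(x + 0.5 * dt * k2)
        k4 = math.sin(x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x

# Two nearby trajectories close to the unstable fixed point x = 0,
# where df/dx = cos(0) = 1, so s(t) should grow roughly like e^t.
t = 1.0
x0, x1 = 1e-4, 1.2e-4
s_initial = x1 - x0
s_final = flow(x1, t) - flow(x0, t)
lam_est = math.log(s_final / s_initial) / t
print(lam_est)   # close to df/dx at x = 0, i.e. 1
```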
5.7 The Poincaré-Birkhoff Theorem
The phase-space trajectories of integrable systems move on smooth tori. The appearance of the Poincaré section depends on whether the winding number is rational or irrational. If it is rational the section shows discrete points. If irrational, the points are 'ergodic' and form a continuous loop. Under the influence of nonlinear perturbations the tori become distorted, then break up into smaller tori, and finally disintegrate into chaos. It turns out that the way this happens depends on whether the winding number is rational or irrational. If it is irrational the tori are preserved, distorted but preserved, under small perturbations. This is a gross oversimplification of the KAM theorem, which I will discuss in Section 5.9. If the winding number is rational, the tori break up in a way that is governed by the so-called Poincaré-Birkhoff theorem, the subject of this section. This may seem like a swindle, since every irrational number can be approximated to arbitrary accuracy by a rational number. But, as it turns out, some numbers are more irrational than others!
I will prove the PB theorem for the standard map equation (5.23), but it is true under quite general assumptions. I will use the symbol T_ϵ for (5.23), i.e.
T_ϵ(ϕ_n, J_n) = (ϕ_{n+1}, J_{n+1}).
Now imagine the points in Figure 5.14 (ϵ = 0) plotted in polar coordinates (for positive J) with ϕ the angular and J the radial coordinate. The points now lie on concentric circles of constant J. Choose J ≡ J_r = 2πj/k, with k and j integers, i.e. J_r has a rational winding number. If we iterate T_0 k times, J remains unchanged and ϕ is incremented by j factors of 2π, which is to say, ϕ is not changed at all. Symbolically
T_0^k(ϕ, J_r) = (ϕ, J_r)
Now take a J_+ slightly larger than J_r. T_0^k will increment ϕ by slightly more than 2πj, so ϕ will increase. In the same way, if J_− < J_r, T_0^k will cause ϕ to decrease. We can imagine the values of ϕ lying on three circles J_+, J_r, and J_− as shown in Figure 5.18(a).
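The behavior of the three circles is easy to verify for the unperturbed map. The sketch below again assumes a common convention in which, for ϵ = 0, J is conserved and ϕ advances by J each step.

```python
import math

TWO_PI = 2.0 * math.pi

def T0(phi, J):
    # Unperturbed (eps = 0) standard map: J is conserved, phi advances by J.
    return (phi + J) % TWO_PI, J

def iterate(phi, J, k):
    for _ in range(k):
        phi, J = T0(phi, J)
    return phi, J

j, k = 1, 3
Jr = TWO_PI * j / k          # rational winding number j/k
phi0 = 0.4

# Every point on the J = Jr circle is a fixed point of T_0^k ...
phi_k, _ = iterate(phi0, Jr, k)
print(abs(phi_k - phi0))     # ~0

# ... while circles just above/below rotate forward/backward under T_0^k.
phi_plus, _ = iterate(phi0, Jr + 0.01, k)
phi_minus, _ = iterate(phi0, Jr - 0.01, k)
print(phi_plus - phi0, phi_minus - phi0)   # +0.03 and -0.03
```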
Now turn on a small perturbation ϵ > 0. T_ϵ^k will map some ϕ's to larger values and some to smaller, but there will be some locus of points, called C in Figure 5.19, which are not changed at all. In other words, the curve C is mapped purely radially.
T_ϵ^k(J_r, ϕ) = (J_c, ϕ)
Figure 5.18: (a) Three orbits of the unperturbed standard map T_0^k. (b) The ϕ coordinate is left invariant on C by the perturbed map T_ϵ^k.
Curve C is mapped into a new curve called D in Figure 5.18(b).
T_ϵ^k(J_c, ϕ) = (J_d, ϕ)
The curves C and D must have the same area (remember these are area-preserving transformations) so they must cross one another an even number of times. This situation is shown in Figure 5.19. The crossings represent points that are invariant under T_ϵ^k – they are fixed points.
This is our first result. A torus with rational winding number j/k is invariant under T_0^k, i.e. every point on the torus is a fixed point of T_0^k. When ϵ is even slightly larger than zero, only a discrete (even) number of fixed points of T_ϵ^k survive. You can ascertain the type of fixed points by seeing how other points in their immediate vicinity are mapped. Compare this flow, as it's called, with the arrows in Figures 5.4 and 5.17. You should be able to convince yourself that the points along the curve C are alternately hyperbolic and elliptic. Figure 5.20 should help you visualize this. Since there are an even number of fixed points, half of them will be elliptic and half hyperbolic. How many are there? Suppose (ϕ_0, J_0) is a fixed point of T_ϵ^k. We can create more fixed points by multiplying by T_ϵ as the following
Figure 5.19: The curves C and D. Crossings, like a and b, are fixed points.
Figure 5.20: A closer look at the fixed points a and b.
simple argument shows.
T_ϵ^k[T_ϵ(ϕ_0, J_0)] = T_ϵ T_ϵ^k(ϕ_0, J_0) = T_ϵ(ϕ_0, J_0)
Starting with (ϕ_0, J_0) we can create k − 1 additional fixed points by multiplying repeatedly with T_ϵ. To put it another way, every fixed point of T_ϵ^k is a member of a family of k fixed points obtained by multiplying by various powers of T_ϵ. Because each mapping is a continuous function of ϕ and J, all the members of an elliptic family are elliptic and all the members of a hyperbolic family are hyperbolic. I claim that all the members of a family are distinct. Proof: Let (ϕ_s, J_s) be the fixed point obtained by T_ϵ^s(ϕ_0, J_0) = (ϕ_s, J_s) with s < k. Then of course all such points are fixed points of T_ϵ^k. The claim is that there is no m < k such that T_ϵ^m(ϕ_s, J_s) = (ϕ_s, J_s). Multiply both sides of this equation with T_ϵ^{−s}. The result is T_ϵ^m(ϕ_0, J_0) = (ϕ_0, J_0). It is just this equation with m replaced by k that defines (ϕ_0, J_0). Hence m = k. Finally, note that none of these newly created fixed points can lie along the original curve C. If they did, there would be instances in which two hyperbolic or two elliptic points appeared side by side. This we know to be impossible. Consequently each torus breaks up into k fixed points for every fixed point on the curve C. This is the Poincaré-Birkhoff theorem.
5.8 All in a tangle
Have another look at the hyperbolic fixed points in Figures 5.4 and 5.17. There are always two loci of points leading directly toward the fixed point and two loci leading away from it. These are called the stable and unstable manifolds respectively. Following the notation of Hand and Finch I will call them H_+ and H_−. Call the fixed point p_f. Any point along H_+ will be mapped asymptotically back to p_f under repeated applications of T_ϵ, and any point on H_− will be mapped asymptotically back to p_f under repeated applications of T_ϵ^{−1}. Can these manifolds cross one another? I claim the following.
• H_+ and H_− cannot intersect themselves, but they can and do intersect one another.
• Stable manifolds of different fixed points cannot intersect one another.
• Unstable manifolds of different fixed points cannot intersect one another.
• Stable manifolds can intersect with unstable ones. The stable and unstable manifolds of a single fixed point intersect in what are called homoclinic points and those of two different fixed points, in heteroclinic points.
• Neither H_+ nor H_− can cross the tori surrounding elliptic fixed points.
• There are, depending on the size of ϵ, narrow bands surrounding tori with irrational winding number that are not broken up into isolated fixed points. This is the content of the KAM theorem to be discussed in the next section. Neither H_+ nor H_− can cross these bands.
The proofs of these assertions are easy and are given in Hand and Finch. Referring to Figure 5.21(a), x_0 is a heteroclinic point that lies on the unstable manifold H_− of p_f1 and the stable manifold H_+ of p_f2. Since both manifolds are invariant under T_ϵ, the T_ϵ^k x_0 are a set of discrete points that lie on both manifolds, so the two manifolds must therefore intersect again. For instance, because x_1 = T_ϵ x_0 is on both manifolds, H_− must loop around to meet H_+. Similarly the x_k = T_ϵ^k x_0 must lie on both manifolds, so H_− must loop around over and over again as illustrated in Figure 5.21(b). The inverse map also leaves H_+ and H_− invariant, and hence the x_{−k} = T_ϵ^{−k} x_0 are intersections that force H_+ to loop around to meet H_−. As k increases and x_k approaches one of the fixed points, the spacing between the intersections gets smaller, so the loops they create get narrower. But because T_ϵ is area-preserving, the loop areas are the same, so the loops get longer, which leads to many intersections among them, as shown in Figure 5.21(c) and (d).
Try explaining all this to an intimate friend on a date. The more you explain the more you will see that this mechanism produces a tangle of fathomless complexity. 7 Nonetheless the mess is contained, at least for small ϵ. Since stable manifolds cannot cross, the stable manifold emanating from p_f1 acts as a barrier to the stable manifold emanating from p_f2. The same is true of the unstable manifolds. The tangle also cannot cross the stable tori surrounding the elliptic fixed points, nor can it cross the KAM tori. As a consequence we expect to see islands of chaos developing between stable ellipses. This is clear in Figures 5.15 and 5.16. As ϵ increases, the KAM tori also break down and chaos engulfs the entire plot.
7 Don't try to explain this for higher-dimensional spaces. That way lies madness.
Figure 5.21: A heteroclinic intersection. (a) Two hyperbolic fixed points p_f1 and p_f2, and an intersection x_0 of the unstable manifold of p_f1 with the stable manifold of p_f2. (b) Adding the forward maps T^k x_0 of the intersection. (c) Adding the backward maps T^{−k} x_0 of the intersection. (d) Adding another intersection x′ and some of its backward maps. U – unstable manifold; S – stable manifold.
5.9 The KAM theorem and its consequences
For a system with n independent degrees of freedom to be integrable, it is a necessary and sufficient condition that n independent constants of the motion exist. In this case the system can be transformed into a set of action-angle variables
\[
\omega_0 \equiv (\omega_{01}, \omega_{02}, \cdots, \omega_{0n}) \qquad I_0 \equiv (I_{01}, I_{02}, \cdots, I_{0n}) \tag{5.41}
\]
In this notation
\[
\omega_0 = \frac{\partial H_0}{\partial I_0} \tag{5.42}
\]
Now suppose the system is perturbed slightly,
\[
H(\omega_0, I_0, \epsilon) = H_0(I_0) + \epsilon H_1(\omega_0, I_0) \tag{5.43}
\]
where I_0 and ω_0 are the AA variables of H_0. According to our perturbation formalism from Chapter 4, there are two series that must converge, (4.25) and (4.26), repeated here for convenience:
\[
\tilde{H}_1(I, \psi) = \sum_{k} A_k(I)\, e^{i k \cdot \psi} \tag{5.44}
\]
\[
F_1(I, \psi) = \sum_{k} B_k(I)\, e^{i k \cdot \psi} \tag{5.45}
\]
where k = (k_1, \cdots, k_n) is a vector of integers 8 and
\[
B_k = i\, \frac{A_k}{\omega_0 \cdot k} \tag{5.46}
\]
The rate of decrease of the |B_k| depends both on the |A_k| and the denominators |ω_0 · k|, so even if the |A_k| decrease fast enough for (5.44) to converge, (5.45) will not converge if the |ω_0 · k| decrease too rapidly.
The situation seems hopeless. If any of the ω_0's yields a rational winding number, the series will blow up immediately, and if one is working with finite precision – on a computer, for example – every number is a rational number. And yet we have seen from our computer models that some stable periodic
8 The sum over k means the sum over all possible combinations of the n integers k_1, · · · , k_n.
trajectories persist even under the influence of small perturbations. The circumstances under which this happens are spelled out in a remarkable theorem first outlined by Kolmogorov and later proved independently by Arnold and Moser. The theorem is extremely difficult and sophisticated, although Tabor has a nice explanation of the basic ideas, and an understandable outline of the proof is given in Classical Dynamics by José and Saletan. I will explain the theorem as carefully as I can and let it go at that.
5.9.1 Two Conditions
The KAM theorem claims that in regions of phase space where certain conditions hold, the perturbation series converges to all orders in ϵ. The first condition involves the Hessian matrix:
\[
\det \left| \frac{\partial \omega_{0\alpha}}{\partial I_{0\beta}} \right|
\equiv \det \left| \frac{\partial^2 H_0}{\partial I_{0\alpha}\, \partial I_{0\beta}} \right| \neq 0. \tag{5.47}
\]
The content of this statement is as follows: We assume that each torus has a unique frequency associated with it. Thus if we knew all the ω_0's we could calculate all the I_0's and vice versa. Equation (5.47) ensures that this is true. A simple (albeit artificial) example is provided by José and Saletan. Consider the one degree of freedom Hamiltonian
\[
H = a I^3 / 3 \tag{5.48}
\]
in which I takes on values in the interval −1 < I < +1. The above condition requires that
\[
\frac{d^2 H}{dI^2} = 2 a I \neq 0 \tag{5.49}
\]
Why is this significant? Note that ω(I) = dH/dI = aI². Inverting this gives I(ω) = ±√(ω/a). The ± is a sign that the inversion is not unique. There are two regions separated by I = 0. In the region 0 < I ≤ 1, I = +√(ω/a). In the region −1 ≤ I < 0, I = −√(ω/a). Thus there are two "good" regions separated by a barrier.
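The two-branched inversion can be seen in a few lines of code. In the sketch below the Hamiltonian is H = aI³/3 with an arbitrary positive constant a = 2.0 (the particular value is my choice).

```python
# Jose-Saletan-style example: H = a*I**3/3 on -1 < I < 1, with a an
# arbitrary positive constant (a = 2.0 here, purely for illustration).
a = 2.0
H = lambda I: a * I**3 / 3.0
omega = lambda I: a * I**2            # omega = dH/dI
hessian = lambda I: 2.0 * a * I       # d^2 H / dI^2, vanishes at I = 0

def invert(w, branch):
    """I(omega) is double-valued: branch = +1 for 0 < I, -1 for I < 0."""
    return branch * (w / a) ** 0.5

I = 0.7
w = omega(I)
print(invert(w, +1), invert(w, -1))   # ~ +0.7 and -0.7: inversion not unique
print(hessian(0.0))                   # 0: nondegeneracy condition fails at I = 0
```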
There is a second condition restricting the frequencies. Of course we are only considering frequencies with irrational winding numbers. Even if the frequencies are incommensurate, |ω_0 · k| could be arbitrarily small. The KAM theorem requires that it be bounded from below by the so-called "weak diophantine condition"
\[
|\omega_0 \cdot k| \ge \gamma |k|^{-\kappa} \quad \text{for all integer } k \tag{5.50}
\]
where |k| = √(k · k) and γ and κ > n are positive constants.
What is the significance of this strange inequality? The best way to understand it, I think, is to face up to the paradox I mentioned earlier that the series can only converge for irrational winding numbers, and yet it seems that every irrational number is "arbitrarily close" to a rational number. It is this last statement that needs to be examined more carefully. This requires a brief excursion into number theory. Consider the unit interval [0, 1]. The rationals have measure zero in the interval. That means roughly that they don't take up any space. This can be proved as follows. First put the rationals in a one-to-one correspondence with the integers. Construct a small open interval of length ϵ < 1 about the first rational, one of length ϵ² about the second, and so forth. The sum of all these little intervals (this is a geometric series) is σ = ϵ/(1 − ϵ), which can be made arbitrarily small by choosing ϵ small enough. Thus the space occupied by the rationals is less than any positive number. This requires taking the limit ϵ → 0. The paradoxical thing is that it is possible to remove a finite interval around each rational without deleting all of [0, 1]. This can be seen as follows. Write each rational in [0, 1] in its lowest form as p/q, and about each one construct an interval of length 1/q³. For each q there are at most q − 1 rationals. Thus for a given q no more than (q − 1)/q³ is covered by the intervals, and the total length Q that is covered is less (because of overlaps) than the sum of these intervals over all q.
\[
Q < \sum_{q=2}^{\infty} \frac{q-1}{q^3} < \sum_{q=2}^{\infty} \frac{1}{q^2} \tag{5.51}
\]
This sum is related to the Riemann zeta function. At any rate Q < 0.645. We can make this number as small as we like by replacing 1/q³ with Γ/q³, where Γ < 1. Even if we leave Γ = 1, the fraction of [0, 1] covered by the finite intervals is less than 1.
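The numbers quoted here are easy to reproduce. The partial sums below check that the covered length is bounded by Σ_{q≥2} 1/q² = π²/6 − 1 ≈ 0.645.

```python
import math

# Partial sums of Eq. (5.51): the total covered length Q is bounded by
# sum_{q>=2} (q-1)/q^3 < sum_{q>=2} 1/q^2 = pi^2/6 - 1.
Q_MAX = 1_000_000
s_cover = sum((q - 1) / q**3 for q in range(2, Q_MAX))
s_bound = sum(1.0 / q**2 for q in range(2, Q_MAX))

print(s_cover)                  # ~ 0.4429
print(s_bound)                  # ~ 0.6449
print(math.pi**2 / 6.0 - 1.0)   # the closed form of the bounding sum
```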
Now we can divide the irrationals into two sets, those covered by the intervals around the rationals and those outside the intervals. Those uncovered satisfy the condition
\[
\left| \omega - \frac{p}{q} \right| \ge \frac{\Gamma}{q^3}. \tag{5.52}
\]
Several comments are in order regarding this inequality.
q3 • Equ<strong>at</strong>ion (5.52) makes irr<strong>at</strong>ionality a quantit<strong>at</strong>ive concept. 9 Those<br />
irr<strong>at</strong>ionals th<strong>at</strong> s<strong>at</strong>isfy (5.52) are “more irr<strong>at</strong>ional” than those th<strong>at</strong><br />
9 This can also be quantified in terms of continued fraction expansions.
5.10. CONCLUSION 83<br />
don’t, and the extent of their irr<strong>at</strong>ionality can be quantified by th<strong>at</strong><br />
value of Γ for which they just do or do not s<strong>at</strong>isfy the inequality.<br />
• Equ<strong>at</strong>ion (5.50) is just an n-dimensional version of (5.52). The constants<br />
γ and κ characterize the degree of irr<strong>at</strong>ionality of ω in the same<br />
way th<strong>at</strong> Γ and the exponent 3 characterize the irr<strong>at</strong>ionality of ω in<br />
(5.52).<br />
• The uncovered irr<strong>at</strong>ionals occupy isol<strong>at</strong>ed “islands” between the covered<br />
intervals. We expect th<strong>at</strong> as the perturb<strong>at</strong>ion parameter ϵ is<br />
increased, those tori with less irr<strong>at</strong>ional winding numbers will be destroyed<br />
first, but islands of stability will remain between them. Eventually<br />
as the perturb<strong>at</strong>ion is increased, all will be swept away in chaos.<br />
• The KAM theorem gives us no clue how to calcul<strong>at</strong>e the appropri<strong>at</strong>e<br />
values of γ and κ or the values of ϵ for which chaos will set in. Some<br />
estim<strong>at</strong>es placed the critical value of ϵ to be something around 10 −50 !<br />
If this were true, of course, the theorem would be quite pointless.<br />
Numerical test with specific models have found critical values of ϵ as<br />
large as ϵc ≈ 1. I will close with a quote from José and Saletan, “To<br />
our knowledge a rigorous formal estim<strong>at</strong>e of a realistic critical value<br />
for ϵ remains an open question.”<br />
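"More irrational than others" can also be made concrete numerically. Since q²|ω − p/q| = q · dist(qω, Z) when p is the nearest integer to qω, the minimum of this quantity over q measures how well ω is approximated by rationals. The comparison of the golden mean (√5 − 1)/2 with π below is my own illustration; π is famously well approximated by 355/113.

```python
import math

def irrationality(w, q_max):
    """min over q <= q_max of q^2 * |w - p/q|, with p the nearest integer
    to q*w. Small values mean w is unusually well approximated by rationals."""
    best = float("inf")
    for q in range(1, q_max + 1):
        dist = abs(q * w - round(q * w))   # q^2|w - p/q| = q * dist(q*w, Z)
        best = min(best, q * dist)
    return best

golden = (math.sqrt(5.0) - 1.0) / 2.0
print(irrationality(golden, 3000))   # bounded away from zero (~ 0.382)
print(irrationality(math.pi, 3000))  # tiny: 355/113 approximates pi very well
```

In the sense of (5.52), the golden mean is among the "most irrational" numbers, which is why tori with golden-mean winding numbers are the last to be destroyed.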
5.10 Conclusion
This is the end of our story about chaos. Remember that we have only dealt with bounded, conservative systems with time-independent Hamiltonians. (Classical mechanics is a big subject.) Systems with one degree of freedom are trivial (in principle) to solve using the method of quadratures. Systems with n degrees of freedom are trivial (again in principle) if they have n constants of motion. Such a system can be reduced by using action-angle variables to an ensemble of uncoupled oscillators. These systems are said to be integrable and they do not display chaos. The trouble comes when we introduce some non-integrability as a perturbation. Perturbation theory is straightforward with one degree of freedom, but with two or more degrees of freedom comes the notorious problem of small denominators. 10 Perturbation theory fails immediately for all periodic trajectories with rational
10 There are other ways of doing perturbation theory in addition to the one described here. They all suffer the same problem.
winding number. According to the Poincaré-Birkhoff theorem, these trajectories on the Poincaré section break up into complicated whorls and tangles surrounded by regions of stability corresponding to irrational winding numbers. According to the KAM theorem these regions break down as well, with those with "more irrational" winding numbers surviving those with less. At last "Universal darkness covers all," and the trajectories, though deterministic, show no order or pattern.