



Introduction to Probability Models
Ninth Edition

Sheldon M. Ross
University of California
Berkeley, California

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Academic Press is an imprint of Elsevier


Acquisitions Editor: Tom Singer
Project Manager: Sarah M. Hajduk
Marketing Managers: Linda Beattie, Leah Ackerson
Cover Design: Eric DeCicco
Composition: VTEX
Cover Printer: Phoenix Color
Interior Printer: The Maple-Vail Book Manufacturing Group

Academic Press is an imprint of Elsevier
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA
525 B Street, Suite 1900, San Diego, California 92101-4495, USA
84 Theobald’s Road, London WC1X 8RR, UK

This book is printed on acid-free paper.

Copyright © 2007, Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, E-mail: permissions@elsevier.com. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting “Support & Contact” then “Copyright and Permission” and then “Obtaining Permissions.”

Library of Congress Cataloging-in-Publication Data
Application Submitted

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN-13: 978-0-12-598062-3
ISBN-10: 0-12-598062-0

For information on all Academic Press publications visit our Web site at www.books.elsevier.com

Printed in the United States of America
06 07 08 09 10    9 8 7 6 5 4 3 2 1


Contents

Preface

1. Introduction to Probability Theory
  1.1. Introduction
  1.2. Sample Space and Events
  1.3. Probabilities Defined on Events
  1.4. Conditional Probabilities
  1.5. Independent Events
  1.6. Bayes’ Formula
  Exercises
  References

2. Random Variables
  2.1. Random Variables
  2.2. Discrete Random Variables
    2.2.1. The Bernoulli Random Variable
    2.2.2. The Binomial Random Variable
    2.2.3. The Geometric Random Variable
    2.2.4. The Poisson Random Variable
  2.3. Continuous Random Variables
    2.3.1. The Uniform Random Variable
    2.3.2. Exponential Random Variables
    2.3.3. Gamma Random Variables
    2.3.4. Normal Random Variables


  2.4. Expectation of a Random Variable
    2.4.1. The Discrete Case
    2.4.2. The Continuous Case
    2.4.3. Expectation of a Function of a Random Variable
  2.5. Jointly Distributed Random Variables
    2.5.1. Joint Distribution Functions
    2.5.2. Independent Random Variables
    2.5.3. Covariance and Variance of Sums of Random Variables
    2.5.4. Joint Probability Distribution of Functions of Random Variables
  2.6. Moment Generating Functions
    2.6.1. The Joint Distribution of the Sample Mean and Sample Variance from a Normal Population
  2.7. Limit Theorems
  2.8. Stochastic Processes
  Exercises
  References

3. Conditional Probability and Conditional Expectation
  3.1. Introduction
  3.2. The Discrete Case
  3.3. The Continuous Case
  3.4. Computing Expectations by Conditioning
    3.4.1. Computing Variances by Conditioning
  3.5. Computing Probabilities by Conditioning
  3.6. Some Applications
    3.6.1. A List Model
    3.6.2. A Random Graph
    3.6.3. Uniform Priors, Polya’s Urn Model, and Bose–Einstein Statistics
    3.6.4. Mean Time for Patterns
    3.6.5. The k-Record Values of Discrete Random Variables
  3.7. An Identity for Compound Random Variables
    3.7.1. Poisson Compounding Distribution
    3.7.2. Binomial Compounding Distribution
    3.7.3. A Compounding Distribution Related to the Negative Binomial
  Exercises


4. Markov Chains
  4.1. Introduction
  4.2. Chapman–Kolmogorov Equations
  4.3. Classification of States
  4.4. Limiting Probabilities
  4.5. Some Applications
    4.5.1. The Gambler’s Ruin Problem
    4.5.2. A Model for Algorithmic Efficiency
    4.5.3. Using a Random Walk to Analyze a Probabilistic Algorithm for the Satisfiability Problem
  4.6. Mean Time Spent in Transient States
  4.7. Branching Processes
  4.8. Time Reversible Markov Chains
  4.9. Markov Chain Monte Carlo Methods
  4.10. Markov Decision Processes
  4.11. Hidden Markov Chains
    4.11.1. Predicting the States
  Exercises
  References

5. The Exponential Distribution and the Poisson Process
  5.1. Introduction
  5.2. The Exponential Distribution
    5.2.1. Definition
    5.2.2. Properties of the Exponential Distribution
    5.2.3. Further Properties of the Exponential Distribution
    5.2.4. Convolutions of Exponential Random Variables
  5.3. The Poisson Process
    5.3.1. Counting Processes
    5.3.2. Definition of the Poisson Process
    5.3.3. Interarrival and Waiting Time Distributions
    5.3.4. Further Properties of Poisson Processes
    5.3.5. Conditional Distribution of the Arrival Times
    5.3.6. Estimating Software Reliability
  5.4. Generalizations of the Poisson Process
    5.4.1. Nonhomogeneous Poisson Process
    5.4.2. Compound Poisson Process
    5.4.3. Conditional or Mixed Poisson Processes


  Exercises
  References

6. Continuous-Time Markov Chains
  6.1. Introduction
  6.2. Continuous-Time Markov Chains
  6.3. Birth and Death Processes
  6.4. The Transition Probability Function $P_{ij}(t)$
  6.5. Limiting Probabilities
  6.6. Time Reversibility
  6.7. Uniformization
  6.8. Computing the Transition Probabilities
  Exercises
  References

7. Renewal Theory and Its Applications
  7.1. Introduction
  7.2. Distribution of N(t)
  7.3. Limit Theorems and Their Applications
  7.4. Renewal Reward Processes
  7.5. Regenerative Processes
    7.5.1. Alternating Renewal Processes
  7.6. Semi-Markov Processes
  7.7. The Inspection Paradox
  7.8. Computing the Renewal Function
  7.9. Applications to Patterns
    7.9.1. Patterns of Discrete Random Variables
    7.9.2. The Expected Time to a Maximal Run of Distinct Values
    7.9.3. Increasing Runs of Continuous Random Variables
  7.10. The Insurance Ruin Problem
  Exercises
  References

8. Queueing Theory
  8.1. Introduction
  8.2. Preliminaries
    8.2.1. Cost Equations
    8.2.2. Steady-State Probabilities


  8.3. Exponential Models
    8.3.1. A Single-Server Exponential Queueing System
    8.3.2. A Single-Server Exponential Queueing System Having Finite Capacity
    8.3.3. A Shoeshine Shop
    8.3.4. A Queueing System with Bulk Service
  8.4. Network of Queues
    8.4.1. Open Systems
    8.4.2. Closed Systems
  8.5. The System M/G/1
    8.5.1. Preliminaries: Work and Another Cost Identity
    8.5.2. Application of Work to M/G/1
    8.5.3. Busy Periods
  8.6. Variations on the M/G/1
    8.6.1. The M/G/1 with Random-Sized Batch Arrivals
    8.6.2. Priority Queues
    8.6.3. An M/G/1 Optimization Example
    8.6.4. The M/G/1 Queue with Server Breakdown
  8.7. The Model G/M/1
    8.7.1. The G/M/1 Busy and Idle Periods
  8.8. A Finite Source Model
  8.9. Multiserver Queues
    8.9.1. Erlang’s Loss System
    8.9.2. The M/M/k Queue
    8.9.3. The G/M/k Queue
    8.9.4. The M/G/k Queue
  Exercises
  References

9. Reliability Theory
  9.1. Introduction
  9.2. Structure Functions
    9.2.1. Minimal Path and Minimal Cut Sets
  9.3. Reliability of Systems of Independent Components
  9.4. Bounds on the Reliability Function
    9.4.1. Method of Inclusion and Exclusion
    9.4.2. Second Method for Obtaining Bounds on r(p)
  9.5. System Life as a Function of Component Lives
  9.6. Expected System Lifetime
    9.6.1. An Upper Bound on the Expected Life of a Parallel System


  9.7. Systems with Repair
    9.7.1. A Series Model with Suspended Animation
  Exercises
  References

10. Brownian Motion and Stationary Processes
  10.1. Brownian Motion
  10.2. Hitting Times, Maximum Variable, and the Gambler’s Ruin Problem
  10.3. Variations on Brownian Motion
    10.3.1. Brownian Motion with Drift
    10.3.2. Geometric Brownian Motion
  10.4. Pricing Stock Options
    10.4.1. An Example in Options Pricing
    10.4.2. The Arbitrage Theorem
    10.4.3. The Black–Scholes Option Pricing Formula
  10.5. White Noise
  10.6. Gaussian Processes
  10.7. Stationary and Weakly Stationary Processes
  10.8. Harmonic Analysis of Weakly Stationary Processes
  Exercises
  References

11. Simulation
  11.1. Introduction
  11.2. General Techniques for Simulating Continuous Random Variables
    11.2.1. The Inverse Transformation Method
    11.2.2. The Rejection Method
    11.2.3. The Hazard Rate Method
  11.3. Special Techniques for Simulating Continuous Random Variables
    11.3.1. The Normal Distribution
    11.3.2. The Gamma Distribution
    11.3.3. The Chi-Squared Distribution
    11.3.4. The Beta (n, m) Distribution
    11.3.5. The Exponential Distribution—The Von Neumann Algorithm
  11.4. Simulating from Discrete Distributions
    11.4.1. The Alias Method


  11.5. Stochastic Processes
    11.5.1. Simulating a Nonhomogeneous Poisson Process
    11.5.2. Simulating a Two-Dimensional Poisson Process
  11.6. Variance Reduction Techniques
    11.6.1. Use of Antithetic Variables
    11.6.2. Variance Reduction by Conditioning
    11.6.3. Control Variates
    11.6.4. Importance Sampling
  11.7. Determining the Number of Runs
  11.8. Coupling from the Past
  Exercises
  References

Appendix: Solutions to Starred Exercises

Index




Preface

This text is intended as an introduction to elementary probability theory and stochastic processes. It is particularly well suited for those wanting to see how probability theory can be applied to the study of phenomena in fields such as engineering, computer science, management science, the physical and social sciences, and operations research.

It is generally felt that there are two approaches to the study of probability theory. One approach is heuristic and nonrigorous and attempts to develop in the student an intuitive feel for the subject which enables him or her to “think probabilistically.” The other approach attempts a rigorous development of probability by using the tools of measure theory. It is the first approach that is employed in this text. However, because it is extremely important in both understanding and applying probability theory to be able to “think probabilistically,” this text should also be useful to students interested primarily in the second approach.

New to This Edition

The ninth edition contains the following new sections.

• Section 3.7 is concerned with compound random variables of the form $S_N = \sum_{i=1}^{N} X_i$, where N is independent of the sequence of independent and identically distributed random variables $X_i$, $i \geq 1$. It starts by deriving a general identity concerning compound random variables, as well as a corollary of that identity in the case where the $X_i$ are positive and integer valued. The corollary is then used in subsequent subsections to obtain recursive formulas for the probability mass function of $S_N$, when N is a Poisson distribution (Subsection 3.7.1), a binomial distribution (Subsection 3.7.2), or a negative binomial distribution (Subsection 3.7.3).


• Section 4.11 deals with hidden Markov chains. These models suppose that a random signal is emitted each time a Markov chain enters a state, with the distribution of the signal depending on the state entered. The Markov chain is hidden in the sense that it is supposed that only the signals, and not the underlying states of the chain, are observable. As part of our analysis of these models we present, in Subsection 4.11.1, the Viterbi algorithm for determining the most probable sequence of the first n states, given the first n signals.

• Section 8.6.4 analyzes the Poisson arrival single server queue under the assumption that the working server will randomly break down and need repair.

There is also new material in almost all chapters. Some of the more significant additions are the following.

• Example 5.9, which is concerned with the expected number of normal cells that survive until all cancer cells have been killed. The example supposes that each cell has a weight, and the probability that a given surviving cell is the next cell killed is proportional to its weight.

• A new approach—based on time sampling of a Poisson process—is presented in Subsection 5.4.1 for deriving the probability mass function of the number of events of a nonhomogeneous Poisson process that occur in any specified time interval.

• There is additional material in Section 8.3 concerning the M/M/1 queue. Among other things, we derive the conditional distribution of the number of customers originally found in the system by a customer who spends a time t in the system before departing. (The conditional distribution is Poisson.) In Example 8.3, we illustrate the inspection paradox, by obtaining the probability distribution of the number in the system as seen by the first arrival after some specified time.

Course

Ideally, this text would be used in a one-year course in probability models. Other possible courses would be a one-semester course in introductory probability theory (involving Chapters 1–3 and parts of others) or a course in elementary stochastic processes. The textbook is designed to be flexible enough to be used in a variety of possible courses. For example, I have used Chapters 5 and 8, with smatterings from Chapters 4 and 6, as the basis of an introductory course in queueing theory.


Examples and Exercises

Many examples are worked out throughout the text, and there are also a large number of exercises to be solved by students. More than 100 of these exercises have been starred and their solutions provided at the end of the text. These starred problems can be used for independent study and test preparation. An Instructor’s Manual, containing solutions to all exercises, is available free to instructors who adopt the book for class.

Organization

Chapters 1 and 2 deal with basic ideas of probability theory. In Chapter 1 an axiomatic framework is presented, while in Chapter 2 the important concept of a random variable is introduced. Subsection 2.6.1 gives a simple derivation of the joint distribution of the sample mean and sample variance of a normal data sample.

Chapter 3 is concerned with the subject matter of conditional probability and conditional expectation. “Conditioning” is one of the key tools of probability theory, and it is stressed throughout the book. When properly used, conditioning often enables us to easily solve problems that at first glance seem quite difficult. The final section of this chapter presents applications to (1) a computer list problem, (2) a random graph, and (3) the Polya urn model and its relation to the Bose–Einstein distribution. Subsection 3.6.5 presents k-record values and the surprising Ignatov’s theorem.

In Chapter 4 we come into contact with our first random, or stochastic, process, known as a Markov chain, which is widely applicable to the study of many real-world phenomena. Applications to genetics and production processes are presented. The concept of time reversibility is introduced and its usefulness illustrated. Subsection 4.5.3 presents an analysis, based on random walk theory, of a probabilistic algorithm for the satisfiability problem. Section 4.6 deals with the mean times spent in transient states by a Markov chain. Section 4.9 introduces Markov chain Monte Carlo methods. In the final section we consider a model for optimally making decisions known as a Markovian decision process.

In Chapter 5 we are concerned with a type of stochastic process known as a counting process. In particular, we study a kind of counting process known as a Poisson process. The intimate relationship between this process and the exponential distribution is discussed. New derivations for the Poisson and nonhomogeneous Poisson processes are discussed. Examples relating to analyzing greedy algorithms, minimizing highway encounters, collecting coupons, and tracking the AIDS virus, as well as material on compound Poisson processes, are included in this chapter. Subsection 5.2.4 gives a simple derivation of the convolution of exponential random variables.

Chapter 6 considers Markov chains in continuous time with an emphasis on birth and death models. Time reversibility is shown to be a useful concept, as it is in the study of discrete-time Markov chains. Section 6.7 presents the computationally important technique of uniformization.

Chapter 7, the renewal theory chapter, is concerned with a type of counting process more general than the Poisson. By making use of renewal reward processes, limiting results are obtained and applied to various fields. Section 7.9 presents new results concerning the distribution of time until a certain pattern occurs when a sequence of independent and identically distributed random variables is observed. In Subsection 7.9.1, we show how renewal theory can be used to derive both the mean and the variance of the length of time until a specified pattern appears, as well as the mean time until one of a finite number of specified patterns appears. In Subsection 7.9.2, we suppose that the random variables are equally likely to take on any of m possible values, and compute an expression for the mean time until a run of m distinct values occurs. In Subsection 7.9.3, we suppose the random variables are continuous and derive an expression for the mean time until a run of m consecutive increasing values occurs.

Chapter 8 deals with queueing, or waiting line, theory. After some preliminaries dealing with basic cost identities and types of limiting probabilities, we consider exponential queueing models and show how such models can be analyzed. Included in the models we study is the important class known as a network of queues. We then study models in which some of the distributions are allowed to be arbitrary. Included are Subsection 8.6.3, dealing with an optimization problem concerning a single server, general service time queue, and Section 8.8, concerned with a single server, general service time queue in which the arrival source is a finite number of potential users.

Chapter 9 is concerned with reliability theory. This chapter will probably be of greatest interest to the engineer and operations researcher. Subsection 9.6.1 illustrates a method for determining an upper bound for the expected life of a parallel system of not necessarily independent components, and Subsection 9.7.1 analyzes a series structure reliability model in which components enter a state of suspended animation when one of their cohorts fails.

Chapter 10 is concerned with Brownian motion and its applications. The theory of options pricing is discussed. Also, the arbitrage theorem is presented and its relationship to the duality theorem of linear programming is indicated. We show how the arbitrage theorem leads to the Black–Scholes option pricing formula.

Chapter 11 deals with simulation, a powerful tool for analyzing stochastic models that are analytically intractable. Methods for generating the values of arbitrarily distributed random variables are discussed, as are variance reduction methods for increasing the efficiency of the simulation. Subsection 11.6.4 introduces the important simulation technique of importance sampling, and indicates the usefulness of tilted distributions when applying this method.

Acknowledgments

We would like to acknowledge with thanks the helpful suggestions made by the many reviewers of the text. These comments have been essential in our attempt to continue to improve the book and we owe these reviewers, and others who wish to remain anonymous, many thanks:

Mark Brown, City University of New York
Tapas Das, University of South Florida
Israel David, Ben-Gurion University
Jay Devore, California Polytechnic Institute
Eugene Feinberg, State University of New York, Stonybrook
Ramesh Gupta, University of Maine
Marianne Huebner, Michigan State University
Garth Isaak, Lehigh University
Jonathan Kane, University of Wisconsin Whitewater
Amarjot Kaur, Pennsylvania State University
Zohel Khalil, Concordia University
Eric Kolaczyk, Boston University
Melvin Lax, California State University, Long Beach
Jean Lemaire, University of Pennsylvania
George Michailidis, University of Michigan
Krzysztof Osfaszewski, University of Illinois
Erol Pekoz, Boston University


Evgeny Poletsky, Syracuse University
Anthony Quas, University of Victoria
David Scollnik, University of Calgary
Mary Shepherd, Northwest Missouri State University
Galen Shorack, University of Washington, Seattle
Osnat Stramer, University of Iowa
Gabor Szekeley, Bowling Green State University
Marlin Thomas, Purdue University
Zhenyuan Wang, University of Binghamton
Julie Zhou, University of Victoria

Special thanks to Donald Minassian of Butler University, for his extremely careful reading of the text.


1. Introduction to Probability Theory

1.1. Introduction

Any realistic model of a real-world phenomenon must take into account the possibility of randomness. That is, more often than not, the quantities we are interested in will not be predictable in advance but, rather, will exhibit an inherent variation that should be taken into account by the model. This is usually accomplished by allowing the model to be probabilistic in nature. Such a model is, naturally enough, referred to as a probability model.

The majority of the chapters of this book will be concerned with different probability models of natural phenomena. Clearly, in order to master both the “model building” and the subsequent analysis of these models, we must have a certain knowledge of basic probability theory. The remainder of this chapter, as well as the next two chapters, will be concerned with a study of this subject.

1.2. Sample Space and Events

Suppose that we are about to perform an experiment whose outcome is not predictable in advance. However, while the outcome of the experiment will not be known in advance, let us suppose that the set of all possible outcomes is known. This set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.

Some examples are the following.

1. If the experiment consists of the flipping of a coin, then
$$S = \{H, T\}$$
where H means that the outcome of the toss is a head and T that it is a tail.


2. If the experiment consists of rolling a die, then the sample space is
$$S = \{1, 2, 3, 4, 5, 6\}$$
where the outcome i means that i appeared on the die, i = 1, 2, 3, 4, 5, 6.

3. If the experiment consists of flipping two coins, then the sample space consists of the following four points:
$$S = \{(H, H), (H, T), (T, H), (T, T)\}$$
The outcome will be (H, H) if both coins come up heads; it will be (H, T) if the first coin comes up heads and the second comes up tails; it will be (T, H) if the first comes up tails and the second heads; and it will be (T, T) if both coins come up tails.

4. If the experiment consists of rolling two dice, then the sample space consists of the following 36 points:
$$S = \left\{\begin{array}{llllll}
(1,1), & (1,2), & (1,3), & (1,4), & (1,5), & (1,6)\\
(2,1), & (2,2), & (2,3), & (2,4), & (2,5), & (2,6)\\
(3,1), & (3,2), & (3,3), & (3,4), & (3,5), & (3,6)\\
(4,1), & (4,2), & (4,3), & (4,4), & (4,5), & (4,6)\\
(5,1), & (5,2), & (5,3), & (5,4), & (5,5), & (5,6)\\
(6,1), & (6,2), & (6,3), & (6,4), & (6,5), & (6,6)
\end{array}\right\}$$
where the outcome (i, j) is said to occur if i appears on the first die and j on the second die.

5. If the experiment consists of measuring the lifetime of a car, then the sample space consists of all nonnegative real numbers. That is,
$$S = [0, \infty)^*$$

Any subset E of the sample space S is known as an event. Some examples of events are the following.

1′. In Example (1), if E = {H}, then E is the event that a head appears on the flip of the coin. Similarly, if E = {T}, then E would be the event that a tail appears.

2′. In Example (2), if E = {1}, then E is the event that one appears on the roll of the die. If E = {2, 4, 6}, then E would be the event that an even number appears on the roll.

*The set (a, b) is defined to consist of all points x such that a < x < b.


3′. In Example (3), if E = {(H, H), (H, T)}, then E is the event that a head appears on the first coin.

4′. In Example (4), if E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, then E is the event that the sum of the dice equals seven.

5′. In Example (5), if E = (2, 6), then E is the event that the car lasts between two and six years.

For any two events E and F of a sample space S we define the new event E ∪ F to consist of all outcomes that are either in E or in F or in both E and F. That is, the event E ∪ F will occur if either E or F occurs. For example, in (1) if E = {H} and F = {T}, then
$$E \cup F = \{H, T\}$$
That is, E ∪ F would be the whole sample space S. In (2) if E = {1, 3, 5} and F = {1, 2, 3}, then
$$E \cup F = \{1, 2, 3, 5\}$$
and thus E ∪ F would occur if the outcome of the die is 1 or 2 or 3 or 5. The event E ∪ F is often referred to as the union of the event E and the event F.

For any two events E and F, we may also define the new event EF, sometimes written E ∩ F, and referred to as the intersection of E and F, as follows. EF consists of all outcomes which are both in E and in F. That is, the event EF will occur only if both E and F occur. For example, in (2) if E = {1, 3, 5} and F = {1, 2, 3}, then
$$EF = \{1, 3\}$$
and thus EF would occur if the outcome of the die is either 1 or 3. In Example (1) if E = {H} and F = {T}, then the event EF would not consist of any outcomes and hence could not occur. To give such an event a name, we shall refer to it as the null event and denote it by Ø. (That is, Ø refers to the event consisting of no outcomes.) If EF = Ø, then E and F are said to be mutually exclusive.

We also define unions and intersections of more than two events in a similar manner. If $E_1, E_2, \ldots$ are events, then the union of these events, denoted by $\bigcup_{n=1}^{\infty} E_n$, is defined to be that event which consists of all outcomes that are in $E_n$ for at least one value of $n = 1, 2, \ldots$. Similarly, the intersection of the events $E_n$, denoted by $\bigcap_{n=1}^{\infty} E_n$, is defined to be the event consisting of those outcomes that are in all of the events $E_n$, $n = 1, 2, \ldots$.

Finally, for any event E we define the new event $E^c$, referred to as the complement of E, to consist of all outcomes in the sample space S that are not in E. That is, $E^c$ will occur if and only if E does not occur. In Example (4)


if E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, then $E^c$ will occur if the sum of the dice does not equal seven. Also note that since the experiment must result in some outcome, it follows that $S^c = Ø$.

1.3. Probabilities Defined on Events

Consider an experiment whose sample space is S. For each event E of the sample space S, we assume that a number P(E) is defined and satisfies the following three conditions:

(i) $0 \leq P(E) \leq 1$.

(ii) $P(S) = 1$.

(iii) For any sequence of events $E_1, E_2, \ldots$ that are mutually exclusive, that is, events for which $E_n E_m = Ø$ when $n \neq m$, then
$$P\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} P(E_n)$$

We refer to P(E) as the probability of the event E.

Example 1.1 In the coin tossing example, if we assume that a head is equally likely to appear as a tail, then we would have
$$P(\{H\}) = P(\{T\}) = \tfrac{1}{2}$$
On the other hand, if we had a biased coin and felt that a head was twice as likely to appear as a tail, then we would have
$$P(\{H\}) = \tfrac{2}{3}, \qquad P(\{T\}) = \tfrac{1}{3}$$

Example 1.2 In the die tossing example, if we supposed that all six numbers were equally likely to appear, then we would have
$$P(\{1\}) = P(\{2\}) = P(\{3\}) = P(\{4\}) = P(\{5\}) = P(\{6\}) = \tfrac{1}{6}$$
From (iii) it would follow that the probability of getting an even number would equal
$$P(\{2, 4, 6\}) = P(\{2\}) + P(\{4\}) + P(\{6\}) = \tfrac{1}{2}$$
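The event operations of Section 1.2 and the additivity condition (iii) can be checked mechanically for a finite sample space. Here is a minimal Python sketch for the die-roll setting of Example 1.2 and the two-dice space of Example (4); the variable names are illustrative, not from the text:

```python
from fractions import Fraction
from itertools import product

S = {1, 2, 3, 4, 5, 6}            # sample space for rolling one die
E = {1, 3, 5}                     # an odd number appears
F = {1, 2, 3}

# Union E ∪ F, intersection EF, and complement E^c as Python set operations
print(sorted(E | F), sorted(E & F), sorted(S - E))
# [1, 2, 3, 5] [1, 3] [2, 4, 6]

# Example (4): the two-dice sample space has 36 ordered pairs (i, j)
S2 = set(product(range(1, 7), repeat=2))
assert len(S2) == 36

# Equally likely outcomes: P({i}) = 1/6, so P(event) = |event| / |S|
def P(event):
    return Fraction(len(event), len(S))

# Condition (iii) applied to the mutually exclusive events {2}, {4}, {6}
assert P({2, 4, 6}) == P({2}) + P({4}) + P({6}) == Fraction(1, 2)
```

Using `Fraction` keeps the probabilities exact, so the check against the value 1/2 computed in Example 1.2 holds with no rounding.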


Remark We have chosen to give a rather formal definition of probabilities as being functions defined on the events of a sample space. However, it turns out that these probabilities have a nice intuitive property. Namely, if our experiment is repeated over and over again then (with probability 1) the proportion of time that event E occurs will just be P(E).

Since the events E and $E^c$ are always mutually exclusive and since $E \cup E^c = S$ we have by (ii) and (iii) that
$$1 = P(S) = P(E \cup E^c) = P(E) + P(E^c)$$
or
$$P(E^c) = 1 - P(E) \tag{1.1}$$
In words, Equation (1.1) states that the probability that an event does not occur is one minus the probability that it does occur.

We shall now derive a formula for $P(E \cup F)$, the probability of all outcomes either in E or in F. To do so, consider $P(E) + P(F)$, which is the probability of all outcomes in E plus the probability of all points in F. Since any outcome that is in both E and F will be counted twice in $P(E) + P(F)$ and only once in $P(E \cup F)$, we must have
$$P(E) + P(F) = P(E \cup F) + P(EF)$$
or equivalently
$$P(E \cup F) = P(E) + P(F) - P(EF) \tag{1.2}$$

Note that when E and F are mutually exclusive (that is, when EF = Ø), then Equation (1.2) states that
$$P(E \cup F) = P(E) + P(F) - P(Ø) = P(E) + P(F)$$
a result which also follows from condition (iii). [Why is P(Ø) = 0?]

Example 1.3 Suppose that we toss two coins, and suppose that we assume that each of the four outcomes in the sample space
$$S = \{(H, H), (H, T), (T, H), (T, T)\}$$
is equally likely and hence has probability $\frac{1}{4}$. Let
$$E = \{(H, H), (H, T)\} \quad \text{and} \quad F = \{(H, H), (T, H)\}$$
That is, E is the event that the first coin falls heads, and F is the event that the second coin falls heads.


By Equation (1.2) we have that $P(E \cup F)$, the probability that either the first or the second coin falls heads, is given by
$$P(E \cup F) = P(E) + P(F) - P(EF) = \tfrac{1}{2} + \tfrac{1}{2} - P(\{(H, H)\}) = 1 - \tfrac{1}{4} = \tfrac{3}{4}$$
This probability could, of course, have been computed directly since
$$P(E \cup F) = P(\{(H, H), (H, T), (T, H)\}) = \tfrac{3}{4}$$

We may also calculate the probability that any one of the three events E or F or G occurs. This is done as follows:
$$P(E \cup F \cup G) = P((E \cup F) \cup G)$$
which by Equation (1.2) equals
$$P(E \cup F) + P(G) - P((E \cup F)G)$$
Now we leave it for you to show that the events $(E \cup F)G$ and $EG \cup FG$ are equivalent, and hence the preceding equals
$$\begin{aligned}
P(E \cup F \cup G) &= P(E) + P(F) - P(EF) + P(G) - P(EG \cup FG)\\
&= P(E) + P(F) - P(EF) + P(G) - P(EG) - P(FG) + P(EGFG)\\
&= P(E) + P(F) + P(G) - P(EF) - P(EG) - P(FG) + P(EFG)
\end{aligned} \tag{1.3}$$

In fact, it can be shown by induction that, for any n events $E_1, E_2, E_3, \ldots, E_n$,
$$P(E_1 \cup E_2 \cup \cdots \cup E_n) = \sum_i P(E_i) - \sum_{i<j} P(E_i E_j) + \sum_{i<j<k} P(E_i E_j E_k) - \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n) \tag{1.4}$$
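The three-event identity (1.3) can be verified by direct enumeration on a finite sample space. A minimal Python sketch using the two-dice space of Example (4), with three illustrative events chosen for this check (they are not from the text):

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely ordered pairs (i, j)
S = set(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(len(event), len(S))

E = {(i, j) for (i, j) in S if i + j == 7}   # the sum is seven
F = {(i, j) for (i, j) in S if i == 1}       # the first die is a one
G = {(i, j) for (i, j) in S if j == 6}       # the second die is a six

lhs = P(E | F | G)
rhs = (P(E) + P(F) + P(G)
       - P(E & F) - P(E & G) - P(F & G)
       + P(E & F & G))
assert lhs == rhs                # Equation (1.3) holds exactly
```

Here each event has probability 6/36 and every pairwise or triple intersection is the single point (1, 6), so both sides equal 16/36.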


1.4. Conditional Probabilities

Suppose that we toss two dice and that each of the 36 possible outcomes is equally likely to occur and hence has probability $\frac{1}{36}$. Suppose that we observe that the first die is a four. Then, given this information, what is the probability that the sum of the two dice equals six? To calculate this probability we reason as follows: Given that the initial die is a four, it follows that there can be at most six possible outcomes of our experiment, namely, (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), and (4, 6). Since each of these outcomes originally had the same probability of occurring, they should still have equal probabilities. That is, given that the first die is a four, then the (conditional) probability of each of the outcomes (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6) is $\frac{1}{6}$ while the (conditional) probability of the other 30 points in the sample space is 0. Hence, the desired probability will be $\frac{1}{6}$.

If we let E and F denote, respectively, the event that the sum of the dice is six and the event that the first die is a four, then the probability just obtained is called the conditional probability that E occurs given that F has occurred and is denoted by
$$P(E|F)$$

A general formula for $P(E|F)$ which is valid for all events E and F is derived in the same manner as the preceding. Namely, if the event F occurs, then in order for E to occur it is necessary for the actual occurrence to be a point in both E and in F, that is, it must be in EF. Now, because we know that F has occurred, it follows that F becomes our new sample space and hence the probability that the event EF occurs will equal the probability of EF relative to the probability of F. That is,
$$P(E|F) = \frac{P(EF)}{P(F)} \tag{1.5}$$
Note that Equation (1.5) is only well defined when P(F) > 0 and hence P(E|F) is only defined when P(F) > 0.

Example 1.4 Suppose cards numbered one through ten are placed in a hat, mixed up, and then one of the cards is drawn. If we are told that the number on the drawn card is at least five, then what is the conditional probability that it is ten?

Solution: Let E denote the event that the number of the drawn card is ten, and let F be the event that it is at least five. The desired probability is P(E|F).


Now, from Equation (1.5),

    P(E|F) = P(EF) / P(F)

However, EF = E since the number of the card will be both ten and at least five if and only if it is number ten. Hence,

    P(E|F) = (1/10) / (6/10) = 1/6

Example 1.5 A family has two children. What is the conditional probability that both are boys given that at least one of them is a boy? Assume that the sample space S is given by S = {(b, b), (b, g), (g, b), (g, g)}, and all outcomes are equally likely. [(b, g) means, for instance, that the older child is a boy and the younger child a girl.]

Solution: Letting B denote the event that both children are boys, and A the event that at least one of them is a boy, then the desired probability is given by

    P(B|A) = P(BA) / P(A) = P({(b, b)}) / P({(b, b), (b, g), (g, b)}) = (1/4) / (3/4) = 1/3

Example 1.6 Bev can either take a course in computers or in chemistry. If Bev takes the computer course, then she will receive an A grade with probability 1/2; if she takes the chemistry course then she will receive an A grade with probability 1/3. Bev decides to base her decision on the flip of a fair coin. What is the probability that Bev will get an A in chemistry?

Solution: If we let C be the event that Bev takes chemistry and A denote the event that she receives an A in whatever course she takes, then the desired probability is P(AC). This is calculated by using Equation (1.5) as follows:

    P(AC) = P(C)P(A|C) = (1/2)(1/3) = 1/6

Example 1.7 Suppose an urn contains seven black balls and five white balls. We draw two balls from the urn without replacement. Assuming that each ball in the urn is equally likely to be drawn, what is the probability that both drawn balls are black?


Solution: Let F and E denote, respectively, the events that the first and second balls drawn are black. Now, given that the first ball selected is black, there are six remaining black balls and five white balls, and so P(E|F) = 6/11. As P(F) is clearly 7/12, our desired probability is

    P(EF) = P(F)P(E|F) = (7/12)(6/11) = 42/132

Example 1.8 Suppose that each of three men at a party throws his hat into the center of the room. The hats are first mixed up and then each man randomly selects a hat. What is the probability that none of the three men selects his own hat?

Solution: We shall solve this by first calculating the complementary probability that at least one man selects his own hat. Let us denote by Ei, i = 1, 2, 3, the event that the ith man selects his own hat. To calculate the probability P(E1 ∪ E2 ∪ E3), we first note that

    P(Ei) = 1/3,   i = 1, 2, 3
    P(Ei Ej) = 1/6,   i ≠ j                    (1.6)
    P(E1 E2 E3) = 1/6

To see why Equation (1.6) is correct, consider first

    P(Ei Ej) = P(Ei)P(Ej|Ei)

Now P(Ei), the probability that the ith man selects his own hat, is clearly 1/3 since he is equally likely to select any of the three hats. On the other hand, given that the ith man has selected his own hat, then there remain two hats that the jth man may select, and as one of these two is his own hat, it follows that with probability 1/2 he will select it. That is, P(Ej|Ei) = 1/2 and so

    P(Ei Ej) = P(Ei)P(Ej|Ei) = (1/3)(1/2) = 1/6

To calculate P(E1 E2 E3) we write

    P(E1 E2 E3) = P(E1 E2)P(E3|E1 E2) = (1/6)P(E3|E1 E2)

However, given that the first two men get their own hats it follows that the third man must also get his own hat (since there are no other hats left). That is,


P(E3|E1 E2) = 1 and so

    P(E1 E2 E3) = 1/6

Now, from Equation (1.4) we have that

    P(E1 ∪ E2 ∪ E3) = P(E1) + P(E2) + P(E3) − P(E1 E2) − P(E1 E3) − P(E2 E3) + P(E1 E2 E3)
                    = 1 − 1/2 + 1/6
                    = 2/3

Hence, the probability that none of the men selects his own hat is 1 − 2/3 = 1/3.

1.5. Independent Events

Two events E and F are said to be independent if

    P(EF) = P(E)P(F)

By Equation (1.5) this implies that E and F are independent if

    P(E|F) = P(E)

[which also implies that P(F|E) = P(F)]. That is, E and F are independent if knowledge that F has occurred does not affect the probability that E occurs. That is, the occurrence of E is independent of whether or not F occurs.

Two events E and F that are not independent are said to be dependent.

Example 1.9 Suppose we toss two fair dice. Let E1 denote the event that the sum of the dice is six and F denote the event that the first die equals four. Then

    P(E1 F) = P({(4, 2)}) = 1/36

while

    P(E1)P(F) = (5/36)(1/6) = 5/216

and hence E1 and F are not independent. Intuitively, the reason for this is clear for if we are interested in the possibility of throwing a six (with two dice), then we


will be quite happy if the first die lands four (or any of the numbers 1, 2, 3, 4, 5) because then we still have a possibility of getting a total of six. On the other hand, if the first die landed six, then we would be unhappy as we would no longer have a chance of getting a total of six. In other words, our chance of getting a total of six depends on the outcome of the first die and hence E1 and F cannot be independent.

Let E2 be the event that the sum of the dice equals seven. Is E2 independent of F? The answer is yes since

    P(E2 F) = P({(4, 3)}) = 1/36

while

    P(E2)P(F) = (1/6)(1/6) = 1/36

We leave it for you to present the intuitive argument why the event that the sum of the dice equals seven is independent of the outcome on the first die.

The definition of independence can be extended to more than two events. The events E1, E2, ..., En are said to be independent if for every subset E1', E2', ..., Er', r ≤ n, of these events

    P(E1' E2' ··· Er') = P(E1')P(E2') ··· P(Er')

Intuitively, the events E1, E2, ..., En are independent if knowledge of the occurrence of any of these events has no effect on the probability of any other event.

Example 1.10 (Pairwise Independent Events That Are Not Independent) Let a ball be drawn from an urn containing four balls, numbered 1, 2, 3, 4. Let E = {1, 2}, F = {1, 3}, G = {1, 4}. If all four outcomes are assumed equally likely, then

    P(EF) = P(E)P(F) = 1/4,
    P(EG) = P(E)P(G) = 1/4,
    P(FG) = P(F)P(G) = 1/4

However,

    1/4 = P(EFG) ≠ P(E)P(F)P(G)

Hence, even though the events E, F, G are pairwise independent, they are not jointly independent.
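The distinction drawn in Example 1.10 can be checked mechanically. A short sketch, using exact rational arithmetic on exactly the urn of Example 1.10:

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4}                       # one ball drawn, all equally likely
prob = lambda ev: Fraction(len(ev), len(outcomes))

E, F, G = {1, 2}, {1, 3}, {1, 4}

# Pairwise independence: P(XY) = P(X)P(Y) for every pair of events.
for X, Y in [(E, F), (E, G), (F, G)]:
    assert prob(X & Y) == prob(X) * prob(Y) == Fraction(1, 4)

# But not jointly independent: P(EFG) = 1/4 while P(E)P(F)P(G) = 1/8.
assert prob(E & F & G) == Fraction(1, 4)
assert prob(E) * prob(F) * prob(G) == Fraction(1, 8)
print("pairwise independent but not jointly independent")
```

The intersection of any two (or all three) of the events is the single outcome {1}, which is what makes the pairwise products agree while the triple product does not.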


Suppose that a sequence of experiments, each of which results in either a "success" or a "failure," is to be performed. Let Ei, i ≥ 1, denote the event that the ith experiment results in a success. If, for all i1, i2, ..., in,

    P(E_{i1} E_{i2} ··· E_{in}) = ∏_{j=1}^{n} P(E_{ij})

we say that the sequence of experiments consists of independent trials.

Example 1.11 The successive flips of a coin consist of independent trials if we assume (as is usually done) that the outcome on any flip is not influenced by the outcomes on earlier flips. A "success" might consist of the outcome heads and a "failure" tails, or possibly the reverse.

1.6. Bayes’ Formula

Let E and F be events. We may express E as

    E = EF ∪ EF^c

because in order for a point to be in E, it must either be in both E and F, or it must be in E and not in F. Since EF and EF^c are mutually exclusive, we have that

    P(E) = P(EF) + P(EF^c)
         = P(E|F)P(F) + P(E|F^c)P(F^c)
         = P(E|F)P(F) + P(E|F^c)(1 − P(F))   (1.7)

Equation (1.7) states that the probability of the event E is a weighted average of the conditional probability of E given that F has occurred and the conditional probability of E given that F has not occurred, each conditional probability being given as much weight as the event on which it is conditioned has of occurring.

Example 1.12 Consider two urns. The first contains two white and seven black balls, and the second contains five white and six black balls. We flip a fair coin and then draw a ball from the first urn or the second urn depending on whether the outcome was heads or tails. What is the conditional probability that the outcome of the toss was heads given that a white ball was selected?


Solution: Let W be the event that a white ball is drawn, and let H be the event that the coin comes up heads. The desired probability P(H|W) may be calculated as follows:

    P(H|W) = P(HW) / P(W)
           = P(W|H)P(H) / P(W)
           = P(W|H)P(H) / [P(W|H)P(H) + P(W|H^c)P(H^c)]
           = (2/9)(1/2) / [(2/9)(1/2) + (5/11)(1/2)]
           = 22/67

Example 1.13 In answering a question on a multiple-choice test a student either knows the answer or guesses. Let p be the probability that she knows the answer and 1 − p the probability that she guesses. Assume that a student who guesses at the answer will be correct with probability 1/m, where m is the number of multiple-choice alternatives. What is the conditional probability that a student knew the answer to a question given that she answered it correctly?

Solution: Let C and K denote respectively the event that the student answers the question correctly and the event that she actually knows the answer. Now

    P(K|C) = P(KC) / P(C)
           = P(C|K)P(K) / [P(C|K)P(K) + P(C|K^c)P(K^c)]
           = p / [p + (1/m)(1 − p)]
           = mp / [1 + (m − 1)p]

Thus, for example, if m = 5, p = 1/2, then the probability that a student knew the answer to a question she correctly answered is 5/6.

Example 1.14 A laboratory blood test is 95 percent effective in detecting a certain disease when it is, in fact, present. However, the test also yields a "false positive" result for 1 percent of the healthy persons tested. (That is, if a healthy person is tested, then, with probability 0.01, the test result will imply he has the disease.) If 0.5 percent of the population actually has the disease, what is the probability a person has the disease given that his test result is positive?

Solution: Let D be the event that the tested person has the disease, and E the event that his test result is positive. The desired probability P(D|E) is


obtained by

    P(D|E) = P(DE) / P(E)
           = P(E|D)P(D) / [P(E|D)P(D) + P(E|D^c)P(D^c)]
           = (0.95)(0.005) / [(0.95)(0.005) + (0.01)(0.995)]
           = 95/294 ≈ 0.323

Thus, only 32 percent of those persons whose test results are positive actually have the disease.

Equation (1.7) may be generalized in the following manner. Suppose that F1, F2, ..., Fn are mutually exclusive events such that ⋃_{i=1}^{n} Fi = S. In other words, exactly one of the events F1, F2, ..., Fn will occur. By writing

    E = ⋃_{i=1}^{n} E Fi

and using the fact that the events E Fi, i = 1, ..., n, are mutually exclusive, we obtain that

    P(E) = Σ_{i=1}^{n} P(E Fi)
         = Σ_{i=1}^{n} P(E|Fi)P(Fi)   (1.8)

Thus, Equation (1.8) shows how, for given events F1, F2, ..., Fn of which one and only one must occur, we can compute P(E) by first "conditioning" upon which one of the Fi occurs. That is, it states that P(E) is equal to a weighted average of P(E|Fi), each term being weighted by the probability of the event on which it is conditioned.

Suppose now that E has occurred and we are interested in determining which one of the Fj also occurred. By Equation (1.8) we have that

    P(Fj|E) = P(E Fj) / P(E)
            = P(E|Fj)P(Fj) / Σ_{i=1}^{n} P(E|Fi)P(Fi)   (1.9)

Equation (1.9) is known as Bayes’ formula.
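Bayes’ formula (1.9) translates directly into code. The sketch below (the function name and argument layout are our own) applies it to the blood-test numbers of Example 1.14, taking F1 = "has the disease" and F2 = "is healthy":

```python
from fractions import Fraction

def bayes(prior, likelihood):
    """Posterior probabilities P(F_j | E) via Equation (1.9).

    prior[j]      = P(F_j), the F_j mutually exclusive and exhaustive
    likelihood[j] = P(E | F_j)
    """
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)                 # P(E), computed as in Equation (1.8)
    return [j / total for j in joint]

# Example 1.14: 0.5% of the population is diseased; the test detects the
# disease with probability 0.95 and false-positives with probability 0.01.
prior = [Fraction(5, 1000), Fraction(995, 1000)]
likelihood = [Fraction(95, 100), Fraction(1, 100)]
posterior = bayes(prior, likelihood)
print(posterior[0])  # 95/294, about 0.323
```

Note that the normalizing constant `total` is exactly Equation (1.8): the unconditional probability of a positive test, obtained by conditioning on disease status.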


Example 1.15 You know that a certain letter is equally likely to be in any one of three different folders. Let αi be the probability that you will find your letter upon making a quick examination of folder i if the letter is, in fact, in folder i, i = 1, 2, 3. (We may have αi < 1.) Suppose you look in folder 1 and do not find the letter. What is the probability that the letter is in folder 1?

Solution: Let Fi, i = 1, 2, 3 be the event that the letter is in folder i; and let E be the event that a search of folder 1 does not come up with the letter. We desire P(F1|E). From Bayes’ formula we obtain

    P(F1|E) = P(E|F1)P(F1) / Σ_{i=1}^{3} P(E|Fi)P(Fi)
            = (1 − α1)(1/3) / [(1 − α1)(1/3) + 1/3 + 1/3]
            = (1 − α1) / (3 − α1)

Exercises

1. A box contains three marbles: one red, one green, and one blue. Consider an experiment that consists of taking one marble from the box then replacing it in the box and drawing a second marble from the box. What is the sample space? If, at all times, each marble in the box is equally likely to be selected, what is the probability of each point in the sample space?

*2. Repeat Exercise 1 when the second marble is drawn without replacing the first marble.

3. A coin is to be tossed until a head appears twice in a row. What is the sample space for this experiment? If the coin is fair, what is the probability that it will be tossed exactly four times?

4. Let E, F, G be three events. Find expressions for the events that of E, F, G
(a) only F occurs,
(b) both E and F but not G occur,
(c) at least one event occurs,
(d) at least two events occur,
(e) all three events occur,
(f) none occurs,
(g) at most one occurs,
(h) at most two occur.

*5. An individual uses the following gambling system at Las Vegas. He bets $1 that the roulette wheel will come up red. If he wins, he quits. If he loses then he


makes the same bet a second time only this time he bets $2; and then regardless of the outcome, quits. Assuming that he has a probability of 1/2 of winning each bet, what is the probability that he goes home a winner? Why is this system not used by everyone?

6. Show that E(F ∪ G) = EF ∪ EG.

7. Show that (E ∪ F)^c = E^c F^c.

8. If P(E) = 0.9 and P(F) = 0.8, show that P(EF) ≥ 0.7. In general, show that

    P(EF) ≥ P(E) + P(F) − 1

This is known as Bonferroni’s inequality.

*9. We say that E ⊂ F if every point in E is also in F. Show that if E ⊂ F, then

    P(F) = P(E) + P(FE^c) ≥ P(E)

10. Show that

    P(⋃_{i=1}^{n} Ei) ≤ Σ_{i=1}^{n} P(Ei)

This is known as Boole’s inequality.

Hint: Either use Equation (1.2) and mathematical induction, or else show that ⋃_{i=1}^{n} Ei = ⋃_{i=1}^{n} Fi, where F1 = E1, Fi = Ei ⋂_{j=1}^{i−1} E^c_j, and use property (iii) of a probability.

11. If two fair dice are tossed, what is the probability that the sum is i, i = 2, 3, ..., 12?

12. Let E and F be mutually exclusive events in the sample space of an experiment. Suppose that the experiment is repeated until either event E or event F occurs. What does the sample space of this new super experiment look like? Show that the probability that event E occurs before event F is P(E)/[P(E) + P(F)].

Hint: Argue that the probability that the original experiment is performed n times and E appears on the nth time is P(E) × (1 − p)^{n−1}, n = 1, 2, ..., where p = P(E) + P(F). Add these probabilities to get the desired answer.

13. The dice game craps is played as follows. The player throws two dice, and if the sum is seven or eleven, then she wins. If the sum is two, three, or twelve, then she loses. If the sum is anything else, then she continues throwing until she either throws that number again (in which case she wins) or she throws a seven (in which case she loses). Calculate the probability that the player wins.


14. The probability of winning on a single toss of the dice is p. A starts, and if he fails, he passes the dice to B, who then attempts to win on her toss. They continue tossing the dice back and forth until one of them wins. What are their respective probabilities of winning?

15. Argue that E = EF ∪ EF^c, E ∪ F = E ∪ FE^c.

16. Use Exercise 15 to show that P(E ∪ F) = P(E) + P(F) − P(EF).

*17. Suppose each of three persons tosses a coin. If the outcome of one of the tosses differs from the other outcomes, then the game ends. If not, then the persons start over and retoss their coins. Assuming fair coins, what is the probability that the game will end with the first round of tosses? If all three coins are biased and have probability 1/4 of landing heads, what is the probability that the game will end at the first round?

18. Assume that each child who is born is equally likely to be a boy or a girl. If a family has two children, what is the probability that both are girls given that (a) the eldest is a girl, (b) at least one is a girl?

*19. Two dice are rolled. What is the probability that at least one is a six? If the two faces are different, what is the probability that at least one is a six?

20. Three dice are thrown. What is the probability the same number appears on exactly two of the three dice?

21. Suppose that 5 percent of men and 0.25 percent of women are color-blind. A color-blind person is chosen at random. What is the probability of this person being male? Assume that there are an equal number of males and females.

22. A and B play until one has 2 more points than the other. Assuming that each point is independently won by A with probability p, what is the probability they will play a total of 2n points? What is the probability that A will win?

23. For events E1, E2, ..., En show that

    P(E1 E2 ··· En) = P(E1)P(E2|E1)P(E3|E1 E2) ··· P(En|E1 ··· En−1)

24. In an election, candidate A receives n votes and candidate B receives m votes, where n > m. Assume that in the count of the votes all possible orderings of the n + m votes are equally likely. Let P_{n,m} denote the probability that from the first vote on A is always in the lead. Find
(a) P_{2,1} (b) P_{3,1} (c) P_{n,1} (d) P_{3,2} (e) P_{4,2}
(f) P_{n,2} (g) P_{4,3} (h) P_{5,3} (i) P_{5,4}
(j) Make a conjecture as to the value of P_{n,m}.


*25. Two cards are randomly selected from a deck of 52 playing cards.
(a) What is the probability they constitute a pair (that is, that they are of the same denomination)?
(b) What is the conditional probability they constitute a pair given that they are of different suits?

26. A deck of 52 playing cards, containing all 4 aces, is randomly divided into 4 piles of 13 cards each. Define events E1, E2, E3, and E4 as follows:

    E1 = {the first pile has exactly 1 ace},
    E2 = {the second pile has exactly 1 ace},
    E3 = {the third pile has exactly 1 ace},
    E4 = {the fourth pile has exactly 1 ace}

Use Exercise 23 to find P(E1 E2 E3 E4), the probability that each pile has an ace.

*27. Suppose in Exercise 26 we had defined the events Ei, i = 1, 2, 3, 4, by

    E1 = {one of the piles contains the ace of spades},
    E2 = {the ace of spades and the ace of hearts are in different piles},
    E3 = {the ace of spades, the ace of hearts, and the ace of diamonds are in different piles},
    E4 = {all 4 aces are in different piles}

Now use Exercise 23 to find P(E1 E2 E3 E4), the probability that each pile has an ace. Compare your answer with the one you obtained in Exercise 26.

28. If the occurrence of B makes A more likely, does the occurrence of A make B more likely?

29. Suppose that P(E) = 0.6. What can you say about P(E|F) when
(a) E and F are mutually exclusive?
(b) E ⊂ F?
(c) F ⊂ E?

*30. Bill and George go target shooting together. Both shoot at a target at the same time. Suppose Bill hits the target with probability 0.7, whereas George, independently, hits the target with probability 0.4.
(a) Given that exactly one shot hit the target, what is the probability that it was George’s shot?
(b) Given that the target is hit, what is the probability that George hit it?

31. What is the conditional probability that the first die is six given that the sum of the dice is seven?


*32. Suppose all n men at a party throw their hats in the center of the room. Each man then randomly selects a hat. Show that the probability that none of the n men selects his own hat is

    1/2! − 1/3! + 1/4! − ··· + (−1)^n/n!

Note that as n → ∞ this converges to e^{−1}. Is this surprising?

33. In a class there are four freshman boys, six freshman girls, and six sophomore boys. How many sophomore girls must be present if sex and class are to be independent when a student is selected at random?

34. Mr. Jones has devised a gambling system for winning at roulette. When he bets, he bets on red, and places a bet only when the ten previous spins of the roulette have landed on a black number. He reasons that his chance of winning is quite large since the probability of eleven consecutive spins resulting in black is quite small. What do you think of this system?

35. A fair coin is continually flipped. What is the probability that the first four flips are
(a) H, H, H, H?
(b) T, H, H, H?
(c) What is the probability that the pattern T, H, H, H occurs before the pattern H, H, H, H?

36. Consider two boxes, one containing one black and one white marble, the other, two black and one white marble. A box is selected at random and a marble is drawn at random from the selected box. What is the probability that the marble is black?

37. In Exercise 36, what is the probability that the first box was the one selected given that the marble is white?

38. Urn 1 contains two white balls and one black ball, while urn 2 contains one white ball and five black balls. One ball is drawn at random from urn 1 and placed in urn 2. A ball is then drawn from urn 2. It happens to be white. What is the probability that the transferred ball was white?

39. Stores A, B, and C have 50, 75, and 100 employees, and, respectively, 50, 60, and 70 percent of these are women. Resignations are equally likely among all employees, regardless of sex. One employee resigns and this is a woman. What is the probability that she works in store C?

*40. (a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random, and when he flips it, it shows heads. What is the probability that it is the fair coin? (b) Suppose that he flips the same coin a


second time and again it shows heads. Now what is the probability that it is the fair coin? (c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

41. In a certain species of rats, black dominates over brown. Suppose that a black rat with two black parents has a brown sibling.
(a) What is the probability that this rat is a pure black rat (as opposed to being a hybrid with one black and one brown gene)?
(b) Suppose that when the black rat is mated with a brown rat, all five of their offspring are black. Now, what is the probability that the rat is a pure black rat?

42. There are three coins in a box. One is a two-headed coin, another is a fair coin, and the third is a biased coin that comes up heads 75 percent of the time. When one of the three coins is selected at random and flipped, it shows heads. What is the probability that it was the two-headed coin?

43. Suppose we have ten coins which are such that if the ith one is flipped then heads will appear with probability i/10, i = 1, 2, ..., 10. When one of the coins is randomly selected and flipped, it shows heads. What is the conditional probability that it was the fifth coin?

44. Urn 1 has five white and seven black balls. Urn 2 has three white and twelve black balls. We flip a fair coin. If the outcome is heads, then a ball from urn 1 is selected, while if the outcome is tails, then a ball from urn 2 is selected. Suppose that a white ball is selected. What is the probability that the coin landed tails?

*45. An urn contains b black balls and r red balls. One of the balls is drawn at random, but when it is put back in the urn c additional balls of the same color are put in with it. Now suppose that we draw another ball. Show that the probability that the first ball drawn was black given that the second ball drawn was red is b/(b + r + c).

46. Three prisoners are informed by their jailer that one of them has been chosen at random to be executed, and the other two are to be freed. Prisoner A asks the jailer to tell him privately which of his fellow prisoners will be set free, claiming that there would be no harm in divulging this information, since he already knows that at least one will go free. The jailer refuses to answer this question, pointing out that if A knew which of his fellows were to be set free, then his own probability of being executed would rise from 1/3 to 1/2, since he would then be one of two prisoners. What do you think of the jailer’s reasoning?

47. For a fixed event B, show that the collection P(A|B), defined for all events A, satisfies the three conditions for a probability. Conclude from this that

    P(A|B) = P(A|BC)P(C|B) + P(A|BC^c)P(C^c|B)

Then directly verify the preceding equation.


*48. Sixty percent of the families in a certain community own their own car, thirty percent own their own home, and twenty percent own both their own car and their own home. If a family is randomly chosen, what is the probability that this family owns a car or a house but not both?

References

Reference [2] provides a colorful introduction to some of the earliest developments in probability theory. References [3], [4], and [7] are all excellent introductory texts in modern probability theory. Reference [5] is the definitive work which established the axiomatic foundation of modern mathematical probability theory. Reference [6] is a nonmathematical introduction to probability theory and its applications, written by one of the greatest mathematicians of the eighteenth century.

1. L. Breiman, “Probability,” Addison-Wesley, Reading, Massachusetts, 1968.
2. F. N. David, “Games, Gods, and Gambling,” Hafner, New York, 1962.
3. W. Feller, “An Introduction to Probability Theory and Its Applications,” Vol. I, John Wiley, New York, 1957.
4. B. V. Gnedenko, “Theory of Probability,” Chelsea, New York, 1962.
5. A. N. Kolmogorov, “Foundations of the Theory of Probability,” Chelsea, New York, 1956.
6. Marquis de Laplace, “A Philosophical Essay on Probabilities,” 1825 (English Translation), Dover, New York, 1951.
7. S. Ross, “A First Course in Probability,” Sixth Edition, Prentice Hall, New Jersey, 2002.




2 Random Variables

2.1. Random Variables

It frequently occurs that in performing an experiment we are mainly interested in some functions of the outcome as opposed to the outcome itself. For instance, in tossing dice we are often interested in the sum of the two dice and are not really concerned about the actual outcome. That is, we may be interested in knowing that the sum is seven and not be concerned over whether the actual outcome was (1, 6) or (2, 5) or (3, 4) or (4, 3) or (5, 2) or (6, 1). These quantities of interest, or more formally, these real-valued functions defined on the sample space, are known as random variables.

Since the value of a random variable is determined by the outcome of the experiment, we may assign probabilities to the possible values of the random variable.

Example 2.1 Letting X denote the random variable that is defined as the sum of two fair dice; then

    P{X = 2} = P{(1, 1)} = 1/36,
    P{X = 3} = P{(1, 2), (2, 1)} = 2/36,
    P{X = 4} = P{(1, 3), (2, 2), (3, 1)} = 3/36,
    P{X = 5} = P{(1, 4), (2, 3), (3, 2), (4, 1)} = 4/36,
    P{X = 6} = P{(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} = 5/36,
    P{X = 7} = P{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)} = 6/36,
    P{X = 8} = P{(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)} = 5/36,
    P{X = 9} = P{(3, 6), (4, 5), (5, 4), (6, 3)} = 4/36,


    P{X = 10} = P{(4, 6), (5, 5), (6, 4)} = 3/36,
    P{X = 11} = P{(5, 6), (6, 5)} = 2/36,
    P{X = 12} = P{(6, 6)} = 1/36   (2.1)

In other words, the random variable X can take on any integral value between two and twelve, and the probability that it takes on each value is given by Equation (2.1). Since X must take on one of the values two through twelve, we must have that

    1 = P{⋃_{n=2}^{12} {X = n}} = Σ_{n=2}^{12} P{X = n}

which may be checked from Equation (2.1).

Example 2.2 For a second example, suppose that our experiment consists of tossing two fair coins. Letting Y denote the number of heads appearing, then Y is a random variable taking on one of the values 0, 1, 2 with respective probabilities

    P{Y = 0} = P{(T, T)} = 1/4,
    P{Y = 1} = P{(T, H), (H, T)} = 2/4,
    P{Y = 2} = P{(H, H)} = 1/4

Of course, P{Y = 0} + P{Y = 1} + P{Y = 2} = 1.

Example 2.3 Suppose that we toss a coin having a probability p of coming up heads, until the first head appears. Letting N denote the number of flips required, then assuming that the outcome of successive flips are independent, N is a random variable taking on one of the values 1, 2, 3, ..., with respective probabilities

    P{N = 1} = P{H} = p,
    P{N = 2} = P{(T, H)} = (1 − p)p,
    P{N = 3} = P{(T, T, H)} = (1 − p)^2 p,
    ...
    P{N = n} = P{(T, T, ..., T, H)} = (1 − p)^{n−1} p,   n ≥ 1


As a check, note that

    P{⋃_{n=1}^{∞} {N = n}} = Σ_{n=1}^{∞} P{N = n}
                           = p Σ_{n=1}^{∞} (1 − p)^{n−1}
                           = p / [1 − (1 − p)]
                           = 1

Example 2.4 Suppose that our experiment consists of seeing how long a battery can operate before wearing down. Suppose also that we are not primarily interested in the actual lifetime of the battery but are concerned only about whether or not the battery lasts at least two years. In this case, we may define the random variable I by

    I = { 1,  if the lifetime of the battery is two or more years
        { 0,  otherwise

If E denotes the event that the battery lasts two or more years, then the random variable I is known as the indicator random variable for event E. (Note that I equals 1 or 0 depending on whether or not E occurs.)

Example 2.5 Suppose that independent trials, each of which results in any of m possible outcomes with respective probabilities p1, ..., pm, Σ_{i=1}^{m} pi = 1, are continually performed. Let X denote the number of trials needed until each outcome has occurred at least once.

Rather than directly considering P{X = n} we will first determine P{X > n}, the probability that at least one of the outcomes has not yet occurred after n trials. Letting Ai denote the event that outcome i has not yet occurred after the first n trials, i = 1, ..., m, then

    P{X > n} = P(⋃_{i=1}^{m} Ai)
             = Σ_{i=1}^{m} P(Ai) − ΣΣ_{i<j} P(Ai Aj) + ··· + (−1)^{m+1} P(A1 ··· Am)


Now, P(Ai) is the probability that each of the first n trials results in a non-i outcome, and so by independence

    P(Ai) = (1 − pi)^n

Similarly, P(Ai Aj) is the probability that the first n trials all result in a non-i and non-j outcome, and so

    P(Ai Aj) = (1 − pi − pj)^n

As all of the other probabilities are similar, we see that

    P{X > n} = Σ_{i=1}^{m} (1 − pi)^n − ΣΣ_{i<j} (1 − pi − pj)^n + ···


2.2. Discrete Random Variables

(ii) lim_{b→∞} F(b) = F(∞) = 1,
(iii) lim_{b→−∞} F(b) = F(−∞) = 0.

Property (i) follows since for a < b the event {X ≤ a} is contained in the event {X ≤ b}, and so it cannot have a larger probability.


[Figure 2.1. Graph of F(x).]

The cumulative distribution function F can be expressed in terms of p(a) by

    F(a) = Σ_{all xi ≤ a} p(xi)

For instance, suppose X has a probability mass function given by

    p(1) = 1/2,   p(2) = 1/3,   p(3) = 1/6

then, the cumulative distribution function F of X is given by

    F(a) = { 0,    a < 1
           { 1/2,  1 ≤ a < 2
           { 5/6,  2 ≤ a < 3
           { 1,    3 ≤ a
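The relation F(a) = Σ_{xi ≤ a} p(xi) is a one-liner to implement. A small sketch, assuming the pmf of this example, p(1) = 1/2, p(2) = 1/3, p(3) = 1/6:

```python
from fractions import Fraction

# Probability mass function from the example.
pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

def F(a):
    """Cumulative distribution function: sum of p(x_i) over all x_i <= a."""
    return sum((p for x, p in pmf.items() if x <= a), Fraction(0))

print(F(0.5))  # 0
print(F(1))    # 1/2
print(F(2.7))  # 5/6
print(F(3))    # 1
```

Evaluating F at points between, at, and beyond the mass points reproduces the four steps of the step function: F jumps by p(xi) at each xi and is flat in between.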


A random variable X is said to be a Bernoulli random variable if its probability mass function is given by Equation (2.2) for some p ∈ (0, 1).

2.2.2. The Binomial Random Variable

Suppose that n independent trials, each of which results in a "success" with probability p and in a "failure" with probability 1 − p, are to be performed. If X represents the number of successes that occur in the n trials, then X is said to be a binomial random variable with parameters (n, p).

The probability mass function of a binomial random variable having parameters (n, p) is given by

    p(i) = (n choose i) p^i (1 − p)^{n−i},   i = 0, 1, ..., n   (2.3)

where

    (n choose i) = n! / [(n − i)! i!]

equals the number of different groups of i objects that can be chosen from a set of n objects. The validity of Equation (2.3) may be verified by first noting that the probability of any particular sequence of the n outcomes containing i successes and n − i failures is, by the assumed independence of trials, p^i (1 − p)^{n−i}. Equation (2.3) then follows since there are (n choose i) different sequences of the n outcomes leading to i successes and n − i failures. For instance, if n = 3, i = 2, then there are (3 choose 2) = 3 ways in which the three trials can result in two successes. Namely, any one of the three outcomes (s, s, f), (s, f, s), (f, s, s), where the outcome (s, s, f) means that the first two trials are successes and the third a failure. Since each of the three outcomes (s, s, f), (s, f, s), (f, s, s) has a probability p^2(1 − p) of occurring, the desired probability is thus (3 choose 2) p^2 (1 − p).

Note that, by the binomial theorem, the probabilities sum to one; that is,

    Σ_{i=0}^{∞} p(i) = Σ_{i=0}^{n} (n choose i) p^i (1 − p)^{n−i} = (p + (1 − p))^n = 1

Example 2.6 Four fair coins are flipped. If the outcomes are assumed independent, what is the probability that two heads and two tails are obtained?

Solution: Letting X equal the number of heads ("successes") that appear, then X is a binomial random variable with parameters (n = 4, p = 1/2).


Hence, by Equation (2.3),

    P{X = 2} = \binom{4}{2} (1/2)^2 (1/2)^2 = 3/8

Example 2.7  It is known that any item produced by a certain machine will
be defective with probability 0.1, independently of any other item. What is the
probability that in a sample of three items, at most one will be defective?

Solution:  If X is the number of defective items in the sample, then X is a
binomial random variable with parameters (3, 0.1). Hence, the desired probability
is given by

    P{X = 0} + P{X = 1} = \binom{3}{0} (0.1)^0 (0.9)^3 + \binom{3}{1} (0.1)^1 (0.9)^2 = 0.972

Example 2.8  Suppose that an airplane engine will fail, when in flight, with
probability 1 − p independently from engine to engine; suppose that the airplane
will make a successful flight if at least 50 percent of its engines remain
operative. For what values of p is a four-engine plane preferable to a two-engine
plane?

Solution:  Because each engine is assumed to fail or function independently
of what happens with the other engines, it follows that the number of engines
remaining operative is a binomial random variable. Hence, the probability that
a four-engine plane makes a successful flight is

    \binom{4}{2} p^2 (1 − p)^2 + \binom{4}{3} p^3 (1 − p) + \binom{4}{4} p^4 (1 − p)^0
        = 6p^2 (1 − p)^2 + 4p^3 (1 − p) + p^4

whereas the corresponding probability for a two-engine plane is

    \binom{2}{1} p(1 − p) + \binom{2}{2} p^2 = 2p(1 − p) + p^2

Hence the four-engine plane is safer if

    6p^2 (1 − p)^2 + 4p^3 (1 − p) + p^4 ≥ 2p(1 − p) + p^2

or equivalently, dividing through by p, if

    6p(1 − p)^2 + 4p^2 (1 − p) + p^3 ≥ 2 − p


which simplifies to

    3p^3 − 8p^2 + 7p − 2 ≥ 0   or   (p − 1)^2 (3p − 2) ≥ 0

which is equivalent to

    3p − 2 ≥ 0   or   p ≥ 2/3

Hence, the four-engine plane is safer when the engine success probability is at
least as large as 2/3, whereas the two-engine plane is safer when this
probability falls below 2/3.

Example 2.9  Suppose that a particular trait of a person (such as eye color or
left-handedness) is classified on the basis of one pair of genes and suppose
that d represents a dominant gene and r a recessive gene. Thus a person with dd
genes is pure dominance, one with rr is pure recessive, and one with rd is
hybrid. The pure dominance and the hybrid are alike in appearance. Children
receive one gene from each parent. If, with respect to a particular trait, two
hybrid parents have a total of four children, what is the probability that
exactly three of the four children have the outward appearance of the dominant
gene?

Solution:  If we assume that each child is equally likely to inherit either
of two genes from each parent, the probabilities that the child of two hybrid
parents will have dd, rr, or rd pairs of genes are, respectively, 1/4, 1/4, 1/2.
Hence, because an offspring will have the outward appearance of the dominant
gene if its gene pair is either dd or rd, it follows that the number of such
children is binomially distributed with parameters (4, 3/4). Thus the desired
probability is

    \binom{4}{3} (3/4)^3 (1/4)^1 = 27/64

Remark on Terminology  If X is a binomial random variable with parameters
(n, p), then we say that X has a binomial distribution with parameters (n, p).

2.2.3. The Geometric Random Variable

Suppose that independent trials, each having probability p of being a success,
are performed until a success occurs. If we let X be the number of trials
required until the first success, then X is said to be a geometric random
variable with


parameter p. Its probability mass function is given by

    p(n) = P{X = n} = (1 − p)^{n−1} p,   n = 1, 2, ...     (2.4)

Equation (2.4) follows since, in order for X to equal n, it is necessary and
sufficient that the first n − 1 trials be failures and the nth trial a success;
because the outcomes of the successive trials are assumed to be independent,
this event has probability (1 − p)^{n−1} p.

To check that p(n) is a probability mass function, we note that

    ∑_{n=1}^{∞} p(n) = p ∑_{n=1}^{∞} (1 − p)^{n−1} = 1

2.2.4. The Poisson Random Variable

A random variable X, taking on one of the values 0, 1, 2, ..., is said to be a
Poisson random variable with parameter λ, if for some λ > 0,

    p(i) = P{X = i} = e^{−λ} λ^i / i!,   i = 0, 1, ...     (2.5)

Equation (2.5) defines a probability mass function since

    ∑_{i=0}^{∞} p(i) = e^{−λ} ∑_{i=0}^{∞} λ^i / i! = e^{−λ} e^{λ} = 1

The Poisson random variable has a wide range of applications in a diverse number
of areas, as will be seen in Chapter 5.

An important property of the Poisson random variable is that it may be used to
approximate a binomial random variable when the binomial parameter n is large
and p is small. To see this, suppose that X is a binomial random variable with
parameters (n, p), and let λ = np. Then

    P{X = i} = \frac{n!}{(n − i)! i!} p^i (1 − p)^{n−i}
             = \frac{n!}{(n − i)! i!} (λ/n)^i (1 − λ/n)^{n−i}
             = \frac{n(n − 1) ··· (n − i + 1)}{n^i} \frac{λ^i}{i!} \frac{(1 − λ/n)^n}{(1 − λ/n)^i}


Now, for n large and p small,

    (1 − λ/n)^n ≈ e^{−λ},    \frac{n(n − 1) ··· (n − i + 1)}{n^i} ≈ 1,    (1 − λ/n)^i ≈ 1

Hence, for n large and p small,

    P{X = i} ≈ e^{−λ} λ^i / i!

Example 2.10  Suppose that the number of typographical errors on a single
page of this book has a Poisson distribution with parameter λ = 1. Calculate
the probability that there is at least one error on this page.

Solution:

    P{X ≥ 1} = 1 − P{X = 0} = 1 − e^{−1} ≈ 0.633

Example 2.11  If the number of accidents occurring on a highway each day is
a Poisson random variable with parameter λ = 3, what is the probability that no
accidents occur today?

Solution:

    P{X = 0} = e^{−3} ≈ 0.05

Example 2.12  Consider an experiment that consists of counting the number of
α-particles given off in a one-second interval by one gram of radioactive
material. If we know from past experience that, on the average, 3.2 such
α-particles are given off, what is a good approximation to the probability that
no more than two α-particles will appear?

Solution:  If we think of the gram of radioactive material as consisting of a
large number n of atoms, each of which has probability 3.2/n of disintegrating
and sending off an α-particle during the second considered, then we see that, to
a very close approximation, the number of α-particles given off will be a
Poisson random variable with parameter λ = 3.2. Hence the desired probability is

    P{X ≤ 2} = e^{−3.2} + 3.2 e^{−3.2} + \frac{(3.2)^2}{2} e^{−3.2} ≈ 0.382
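The Poisson calculations in Examples 2.10–2.12, and the binomial approximation just derived, can be checked with a short script. This is a sketch, not part of the text; the function names are mine:

```python
from math import comb, exp, factorial

def poisson_pmf(i, lam):
    """Equation (2.5): P{X = i} = e^{-lam} * lam**i / i!"""
    return exp(-lam) * lam**i / factorial(i)

def binomial_pmf(i, n, p):
    """Equation (2.3): P{X = i} for a binomial (n, p) random variable."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

# Examples 2.10-2.12
p_at_least_one_error = 1 - poisson_pmf(0, 1.0)               # 1 - e^{-1} ~ 0.633
p_no_accidents = poisson_pmf(0, 3.0)                         # e^{-3} ~ 0.05
p_at_most_two = sum(poisson_pmf(i, 3.2) for i in range(3))   # ~ 0.38

# Poisson approximation to the binomial: n large, p small, lam = n*p.
n, p = 1000, 0.003
max_gap = max(abs(binomial_pmf(i, n, p) - poisson_pmf(i, n * p))
              for i in range(20))
```

For n = 1000 and p = 0.003 the two probability mass functions agree term by term to within a few parts in ten thousand, which is the point of the approximation.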


2.3. Continuous Random Variables

In this section, we shall concern ourselves with random variables whose set of
possible values is uncountable. Let X be such a random variable. We say that X
is a continuous random variable if there exists a nonnegative function f(x),
defined for all real x ∈ (−∞, ∞), having the property that for any set B of
real numbers

    P{X ∈ B} = ∫_B f(x) dx     (2.6)

The function f(x) is called the probability density function of the random
variable X.

In words, Equation (2.6) states that the probability that X will be in B may
be obtained by integrating the probability density function over the set B.
Since X must assume some value, f(x) must satisfy

    1 = P{X ∈ (−∞, ∞)} = ∫_{−∞}^{∞} f(x) dx

All probability statements about X can be answered in terms of f(x). For
instance, letting B = [a, b], we obtain from Equation (2.6) that

    P{a ≤ X ≤ b} = ∫_a^b f(x) dx     (2.7)

If we let a = b in the preceding, then

    P{X = a} = ∫_a^a f(x) dx = 0

In words, this equation states that the probability that a continuous random
variable will assume any particular value is zero.

The relationship between the cumulative distribution F(·) and the probability
density f(·) is expressed by

    F(a) = P{X ∈ (−∞, a]} = ∫_{−∞}^{a} f(x) dx

Differentiating both sides of the preceding yields

    (d/da) F(a) = f(a)

That is, the density is the derivative of the cumulative distribution function.
A somewhat more intuitive interpretation of the density function may be obtained


from Equation (2.7) as follows:

    P{a − ε/2 ≤ X ≤ a + ε/2} = ∫_{a−ε/2}^{a+ε/2} f(x) dx ≈ ε f(a)

when ε is small. In other words, the probability that X will be contained in an
interval of length ε around the point a is approximately ε f(a). From this, we
see that f(a) is a measure of how likely it is that the random variable will be
near a.

There are several important continuous random variables that appear frequently
in probability theory. The remainder of this section is devoted to a study of
certain of these random variables.

2.3.1. The Uniform Random Variable

A random variable is said to be uniformly distributed over the interval (0, 1)
if its probability density function is given by

    f(x) =
      1,   0 < x < 1
      0,   otherwise

More generally, we say that X is a uniform random variable on the interval
(α, β) if its probability density function is given by

    f(x) =
      1/(β − α),   α < x < β
      0,           otherwise     (2.8)

Example 2.13  Calculate the cumulative distribution function of a random
variable uniformly distributed over (α, β).


Solution:  Since F(a) = ∫_{−∞}^{a} f(x) dx, we obtain from Equation (2.8) that

    F(a) =
      0,                 a ≤ α
      (a − α)/(β − α),   α < a < β
      1,                 a ≥ β

2.3.2. Exponential Random Variables

A continuous random variable whose probability density function is given, for
some λ > 0, by

    f(x) =
      λ e^{−λx},   if x ≥ 0
      0,           if x < 0

is said to be an exponential random variable with parameter λ. These random
variables will be extensively studied in Chapter 5, so here we content ourselves
with calculating the cumulative distribution function F:

    F(a) = ∫_0^a λ e^{−λx} dx = 1 − e^{−λa},   a ≥ 0

Note that F(∞) = ∫_0^{∞} λ e^{−λx} dx = 1, as, of course, it must be.


2.3.3. Gamma Random Variables

A continuous random variable whose density is given by

    f(x) =
      λ e^{−λx} (λx)^{α−1} / Γ(α),   if x ≥ 0
      0,                             if x < 0

for some λ > 0, α > 0 is said to be a gamma random variable with parameters
α, λ. The quantity Γ(α) is called the gamma function and is defined by

    Γ(α) = ∫_0^{∞} e^{−x} x^{α−1} dx

It is easy to show by induction that for integral α, say, α = n,

    Γ(n) = (n − 1)!

2.3.4. Normal Random Variables

We say that X is a normal random variable (or simply that X is normally
distributed) with parameters μ and σ^2 if the density of X is given by

    f(x) = \frac{1}{\sqrt{2π} σ} e^{−(x−μ)^2 / 2σ^2},   −∞ < x < ∞

This density is a bell-shaped curve that is symmetric about μ.

An important fact about normal random variables is that if X is normally
distributed with parameters μ and σ^2, then Y = αX + β is normally distributed
with parameters αμ + β and α^2 σ^2. To prove this, suppose first that α > 0 and
note that


F_Y(·),* the cumulative distribution function of the random variable Y, is
given by

    F_Y(a) = P{Y ≤ a}
           = P{αX + β ≤ a}
           = P{X ≤ (a − β)/α}
           = F_X((a − β)/α)
           = ∫_{−∞}^{(a−β)/α} \frac{1}{\sqrt{2π} σ} e^{−(x−μ)^2 / 2σ^2} dx
           = ∫_{−∞}^{a} \frac{1}{\sqrt{2π} ασ} \exp\left\{ \frac{−(v − (αμ + β))^2}{2α^2 σ^2} \right\} dv     (2.9)

where the last equality is obtained by the change of variables v = αx + β.
However, since F_Y(a) = ∫_{−∞}^{a} f_Y(v) dv, it follows from Equation (2.9)
that the probability density function f_Y(·) is given by

    f_Y(v) = \frac{1}{\sqrt{2π} ασ} \exp\left\{ \frac{−(v − (αμ + β))^2}{2(ασ)^2} \right\},   −∞ < v < ∞

Hence, Y = αX + β is normally distributed with parameters αμ + β and (ασ)^2.
A similar result also applies when α < 0.

One implication of the preceding result is that if X is normally distributed
with parameters μ and σ^2, then (X − μ)/σ is normally distributed with
parameters 0 and 1. Such a random variable is said to have a standard, or unit,
normal distribution.

* When there is more than one random variable under consideration, we denote
the cumulative distribution function of a random variable Z by F_Z(·).

2.4. Expectation of a Random Variable

2.4.1. The Discrete Case

If X is a discrete random variable having a probability mass function p(x),
then the expected value of X, denoted by E[X], is defined by

    E[X] = ∑_{x: p(x) > 0} x p(x)


In other words, the expected value of X is a weighted average of the possible
values that X can take on, each value being weighted by the probability that X
assumes that value. For example, if the probability mass function of X is given
by

    p(1) = 1/2 = p(2)

then

    E[X] = 1(1/2) + 2(1/2) = 3/2

is just an ordinary average of the two possible values 1 and 2 that X can
assume. On the other hand, if

    p(1) = 1/3,   p(2) = 2/3

then

    E[X] = 1(1/3) + 2(2/3) = 5/3

is a weighted average of the two possible values 1 and 2 where the value 2 is
given twice as much weight as the value 1, since p(2) = 2p(1).

Example 2.15  Find E[X] where X is the outcome when we roll a fair die.

Solution:  Since p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6, we obtain

    E[X] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 7/2

Example 2.16  (Expectation of a Bernoulli Random Variable)  Calculate E[X]
when X is a Bernoulli random variable with parameter p.

Solution:  Since p(0) = 1 − p, p(1) = p, we have

    E[X] = 0(1 − p) + 1(p) = p

Thus, the expected number of successes in a single trial is just the probability
that the trial will be a success.

Example 2.17  (Expectation of a Binomial Random Variable)  Calculate E[X]
when X is binomially distributed with parameters n and p.


Solution:

    E[X] = ∑_{i=0}^{n} i p(i)
         = ∑_{i=0}^{n} i \binom{n}{i} p^i (1 − p)^{n−i}
         = ∑_{i=1}^{n} \frac{i n!}{(n − i)! i!} p^i (1 − p)^{n−i}
         = ∑_{i=1}^{n} \frac{n!}{(n − i)! (i − 1)!} p^i (1 − p)^{n−i}
         = np ∑_{i=1}^{n} \frac{(n − 1)!}{(n − i)! (i − 1)!} p^{i−1} (1 − p)^{n−i}
         = np ∑_{k=0}^{n−1} \binom{n − 1}{k} p^k (1 − p)^{n−1−k}
         = np (p + (1 − p))^{n−1}
         = np

where the second from the last equality follows by letting k = i − 1. Thus, the
expected number of successes in n independent trials is n multiplied by the
probability that a trial results in a success.

Example 2.18  (Expectation of a Geometric Random Variable)  Calculate the
expectation of a geometric random variable having parameter p.

Solution:  By Equation (2.4), we have

    E[X] = ∑_{n=1}^{∞} n p (1 − p)^{n−1}
         = p ∑_{n=1}^{∞} n q^{n−1}

where q = 1 − p. Hence,

    E[X] = p ∑_{n=1}^{∞} \frac{d}{dq} (q^n)
         = p \frac{d}{dq} \left( ∑_{n=1}^{∞} q^n \right)


         = p \frac{d}{dq} \left( \frac{q}{1 − q} \right)
         = \frac{p}{(1 − q)^2}
         = \frac{1}{p}

In words, the expected number of independent trials we need to perform until we
attain our first success equals the reciprocal of the probability that any one
trial results in a success.

Example 2.19  (Expectation of a Poisson Random Variable)  Calculate E[X] if
X is a Poisson random variable with parameter λ.

Solution:  From Equation (2.5), we have

    E[X] = ∑_{i=0}^{∞} \frac{i e^{−λ} λ^i}{i!}
         = ∑_{i=1}^{∞} \frac{e^{−λ} λ^i}{(i − 1)!}
         = λ e^{−λ} ∑_{i=1}^{∞} \frac{λ^{i−1}}{(i − 1)!}
         = λ e^{−λ} ∑_{k=0}^{∞} \frac{λ^k}{k!}
         = λ e^{−λ} e^{λ}
         = λ

where we have used the identity ∑_{k=0}^{∞} λ^k / k! = e^{λ}.

2.4.2. The Continuous Case

We may also define the expected value of a continuous random variable. This is
done as follows. If X is a continuous random variable having a probability
density function f(x), then the expected value of X is defined by

    E[X] = ∫_{−∞}^{∞} x f(x) dx


Example 2.20  (Expectation of a Uniform Random Variable)  Calculate the
expectation of a random variable uniformly distributed over (α, β).

Solution:  From Equation (2.8) we have

    E[X] = ∫_α^β \frac{x}{β − α} dx
         = \frac{β^2 − α^2}{2(β − α)}
         = \frac{β + α}{2}

In other words, the expected value of a random variable uniformly distributed
over the interval (α, β) is just the midpoint of the interval.

Example 2.21  (Expectation of an Exponential Random Variable)  Let X be
exponentially distributed with parameter λ. Calculate E[X].

Solution:

    E[X] = ∫_0^{∞} x λ e^{−λx} dx

Integrating by parts (u = x, dv = λ e^{−λx} dx) yields

    E[X] = −x e^{−λx} \Big|_0^{∞} + ∫_0^{∞} e^{−λx} dx
         = 0 − \frac{e^{−λx}}{λ} \Big|_0^{∞}
         = \frac{1}{λ}

Example 2.22  (Expectation of a Normal Random Variable)  Calculate E[X]
when X is normally distributed with parameters μ and σ^2.

Solution:

    E[X] = \frac{1}{\sqrt{2π} σ} ∫_{−∞}^{∞} x e^{−(x−μ)^2 / 2σ^2} dx

Writing x as (x − μ) + μ yields

    E[X] = \frac{1}{\sqrt{2π} σ} ∫_{−∞}^{∞} (x − μ) e^{−(x−μ)^2 / 2σ^2} dx
           + μ \frac{1}{\sqrt{2π} σ} ∫_{−∞}^{∞} e^{−(x−μ)^2 / 2σ^2} dx


Letting y = x − μ leads to

    E[X] = \frac{1}{\sqrt{2π} σ} ∫_{−∞}^{∞} y e^{−y^2 / 2σ^2} dy + μ ∫_{−∞}^{∞} f(x) dx

where f(x) is the normal density. By symmetry, the first integral must be 0,
and so

    E[X] = μ ∫_{−∞}^{∞} f(x) dx = μ

2.4.3. Expectation of a Function of a Random Variable

Suppose now that we are given a random variable X and its probability
distribution (that is, its probability mass function in the discrete case or
its probability density function in the continuous case). Suppose also that we
are interested in calculating, not the expected value of X, but the expected
value of some function of X, say, g(X). How do we go about doing this? One way
is as follows. Since g(X) is itself a random variable, it must have a
probability distribution, which should be computable from a knowledge of the
distribution of X. Once we have obtained the distribution of g(X), we can then
compute E[g(X)] by the definition of the expectation.

Example 2.23  Suppose X has the following probability mass function:

    p(0) = 0.2,   p(1) = 0.5,   p(2) = 0.3

Calculate E[X^2].

Solution:  Letting Y = X^2, we have that Y is a random variable that can take
on one of the values 0^2, 1^2, 2^2 with respective probabilities

    p_Y(0) = P{Y = 0^2} = 0.2,
    p_Y(1) = P{Y = 1^2} = 0.5,
    p_Y(4) = P{Y = 2^2} = 0.3

Hence,

    E[X^2] = E[Y] = 0(0.2) + 1(0.5) + 4(0.3) = 1.7

Note that

    1.7 = E[X^2] ≠ (E[X])^2 = 1.21
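The two-step procedure of Example 2.23, first deriving the probability mass function of Y = g(X) and then averaging, translates directly into code. A sketch (the variable names are mine):

```python
# Example 2.23 redone in code: build the pmf of Y = X^2, then take E[Y].
p_X = {0: 0.2, 1: 0.5, 2: 0.3}

p_Y = {}
for x, p in p_X.items():
    y = x * x
    # Values of X that map to the same y pool their probability.
    p_Y[y] = p_Y.get(y, 0.0) + p

second_moment = sum(y * p for y, p in p_Y.items())   # E[X^2] = 1.7
mean = sum(x * p for x, p in p_X.items())            # E[X]   = 1.1
```

As the example notes, E[X^2] = 1.7 is not (E[X])^2 = 1.21; squaring and averaging do not commute.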


Example 2.24  Let X be uniformly distributed over (0, 1). Calculate E[X^3].

Solution:  Letting Y = X^3, we calculate the distribution of Y as follows.
For 0 ≤ a ≤ 1,

    F_Y(a) = P{Y ≤ a}
           = P{X^3 ≤ a}
           = P{X ≤ a^{1/3}}
           = a^{1/3}

where the last equality follows since X is uniformly distributed over (0, 1).
By differentiating F_Y(a), we obtain the density of Y, namely,

    f_Y(a) = (1/3) a^{−2/3},   0 ≤ a ≤ 1

Hence,

    E[X^3] = E[Y] = ∫_{−∞}^{∞} a f_Y(a) da
           = ∫_0^1 a (1/3) a^{−2/3} da
           = (1/3) ∫_0^1 a^{1/3} da
           = (1/3) (3/4) a^{4/3} \Big|_0^1
           = 1/4

While the foregoing procedure will, in theory, always enable us to compute the
expectation of any function of X from a knowledge of the distribution of X,
there is, fortunately, an easier way to do this. The following proposition
shows how we can calculate the expectation of g(X) without first determining
its distribution.

Proposition 2.1  (a) If X is a discrete random variable with probability mass
function p(x), then for any real-valued function g,

    E[g(X)] = ∑_{x: p(x) > 0} g(x) p(x)


(b) If X is a continuous random variable with probability density function
f(x), then for any real-valued function g,

    E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx

Example 2.25  Applying the proposition to Example 2.23 yields

    E[X^2] = 0^2 (0.2) + 1^2 (0.5) + 2^2 (0.3) = 1.7

which, of course, checks with the result derived in Example 2.23.

Example 2.26  Applying the proposition to Example 2.24 yields

    E[X^3] = ∫_0^1 x^3 dx   (since f(x) = 1, 0 < x < 1)
           = 1/4

A simple corollary of Proposition 2.1 is the following.

Corollary 2.2  If a and b are constants, then

    E[aX + b] = a E[X] + b

Proof  In the discrete case,

    E[aX + b] = ∑_{x: p(x) > 0} (ax + b) p(x)
              = a ∑_{x: p(x) > 0} x p(x) + b ∑_{x: p(x) > 0} p(x)
              = a E[X] + b

In the continuous case,

    E[aX + b] = ∫_{−∞}^{∞} (ax + b) f(x) dx
              = a ∫_{−∞}^{∞} x f(x) dx + b ∫_{−∞}^{∞} f(x) dx
              = a E[X] + b
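Proposition 2.1(a) and the linearity result E[aX + b] = aE[X] + b can both be exercised on the pmf of Example 2.23. A sketch (the helper name `expect_g` is mine):

```python
def expect_g(pmf, g):
    """Proposition 2.1(a): E[g(X)] = sum of g(x) * p(x) over the support,
    with no need to first derive the distribution of g(X)."""
    return sum(g(x) * p for x, p in pmf.items())

pmf = {0: 0.2, 1: 0.5, 2: 0.3}

mean = expect_g(pmf, lambda x: x)                # E[X] = 1.1
second_moment = expect_g(pmf, lambda x: x * x)   # E[X^2] = 1.7, as in Example 2.25
shifted = expect_g(pmf, lambda x: 2 * x + 3)     # should equal 2 E[X] + 3
```

The last line checks linearity: computing E[2X + 3] directly from the proposition gives the same number as 2E[X] + 3.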


The expected value of a random variable X, E[X], is also referred to as the
mean or the first moment of X. The quantity E[X^n], n ≥ 1, is called the nth
moment of X. By Proposition 2.1, we note that

    E[X^n] =
      ∑_{x: p(x) > 0} x^n p(x),      if X is discrete
      ∫_{−∞}^{∞} x^n f(x) dx,        if X is continuous

Another quantity of interest is the variance of a random variable X, denoted by
Var(X), which is defined by

    Var(X) = E[(X − E[X])^2]

Thus, the variance of X measures the expected square of the deviation of X from
its expected value.

Example 2.27  (Variance of the Normal Random Variable)  Let X be normally
distributed with parameters μ and σ^2. Find Var(X).

Solution:  Recalling (see Example 2.22) that E[X] = μ, we have that

    Var(X) = E[(X − μ)^2]
           = \frac{1}{\sqrt{2π} σ} ∫_{−∞}^{∞} (x − μ)^2 e^{−(x−μ)^2 / 2σ^2} dx

Substituting y = (x − μ)/σ yields

    Var(X) = \frac{σ^2}{\sqrt{2π}} ∫_{−∞}^{∞} y^2 e^{−y^2 / 2} dy

Integrating by parts (u = y, dv = y e^{−y^2 / 2} dy) gives

    Var(X) = \frac{σ^2}{\sqrt{2π}} \left( −y e^{−y^2 / 2} \Big|_{−∞}^{∞} + ∫_{−∞}^{∞} e^{−y^2 / 2} dy \right)
           = \frac{σ^2}{\sqrt{2π}} ∫_{−∞}^{∞} e^{−y^2 / 2} dy
           = σ^2

Another derivation of Var(X) will be given in Example 2.42.
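For a discrete random variable the defining formula Var(X) = E[(X − E[X])^2] can be evaluated directly from the pmf. A sketch (function name mine); the fair-die value it produces, 35/12, is the one computed in Example 2.28:

```python
def variance(pmf):
    """Var(X) = E[(X - E[X])^2], computed straight from the definition
    for a discrete pmf given as {x: p(x)}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean)**2 * p for x, p in pmf.items())

die = {k: 1/6 for k in range(1, 7)}
die_var = variance(die)                     # 35/12

coin = {0: 0.5, 1: 0.5}
coin_var = variance(coin)                   # 1/4, a Bernoulli(1/2) variable
```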


Suppose that X is continuous with density f, and let E[X] = μ. Then,

    Var(X) = E[(X − μ)^2]
           = E[X^2 − 2μX + μ^2]
           = ∫_{−∞}^{∞} (x^2 − 2μx + μ^2) f(x) dx
           = ∫_{−∞}^{∞} x^2 f(x) dx − 2μ ∫_{−∞}^{∞} x f(x) dx + μ^2 ∫_{−∞}^{∞} f(x) dx
           = E[X^2] − 2μμ + μ^2
           = E[X^2] − μ^2

A similar proof holds in the discrete case, and so we obtain the useful
identity

    Var(X) = E[X^2] − (E[X])^2

Example 2.28  Calculate Var(X) when X represents the outcome when a fair die
is rolled.

Solution:  As previously noted in Example 2.15, E[X] = 7/2. Also,

    E[X^2] = 1^2 (1/6) + 2^2 (1/6) + 3^2 (1/6) + 4^2 (1/6) + 5^2 (1/6) + 6^2 (1/6) = (1/6)(91)

Hence,

    Var(X) = \frac{91}{6} − \left(\frac{7}{2}\right)^2 = \frac{35}{12}

2.5. Jointly Distributed Random Variables

2.5.1. Joint Distribution Functions

Thus far, we have concerned ourselves with the probability distribution of a
single random variable. However, we are often interested in probability
statements concerning two or more random variables. To deal with such
probabilities, we define, for any two random variables X and Y, the joint
cumulative probability distribution function of X and Y by

    F(a, b) = P{X ≤ a, Y ≤ b},   −∞ < a, b < ∞


The distribution of X can be obtained from the joint distribution of X and Y
as follows:

    F_X(a) = P{X ≤ a}
           = P{X ≤ a, Y < ∞}
           = F(a, ∞)

Similarly, the cumulative distribution function of Y is given by

    F_Y(b) = P{Y ≤ b} = F(∞, b)

In the case where X and Y are both discrete random variables, it is convenient
to define the joint probability mass function of X and Y by

    p(x, y) = P{X = x, Y = y}

The probability mass function of X may be obtained from p(x, y) by

    p_X(x) = ∑_{y: p(x,y) > 0} p(x, y)

Similarly,

    p_Y(y) = ∑_{x: p(x,y) > 0} p(x, y)

We say that X and Y are jointly continuous if there exists a function f(x, y),
defined for all real x and y, having the property that for all sets A and B of
real numbers

    P{X ∈ A, Y ∈ B} = ∫_B ∫_A f(x, y) dx dy

The function f(x, y) is called the joint probability density function of X
and Y. The probability density of X can be obtained from a knowledge of
f(x, y) by the following reasoning:

    P{X ∈ A} = P{X ∈ A, Y ∈ (−∞, ∞)}
             = ∫_{−∞}^{∞} ∫_A f(x, y) dx dy
             = ∫_A f_X(x) dx


where

    f_X(x) = ∫_{−∞}^{∞} f(x, y) dy

is thus the probability density function of X. Similarly, the probability
density function of Y is given by

    f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx

A variation of Proposition 2.1 states that if X and Y are random variables and
g is a function of two variables, then

    E[g(X, Y)] = ∑_y ∑_x g(x, y) p(x, y)                        in the discrete case
               = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy   in the continuous case

For example, if g(X, Y) = X + Y, then, in the continuous case,

    E[X + Y] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x + y) f(x, y) dx dy
             = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy + ∫_{−∞}^{∞} ∫_{−∞}^{∞} y f(x, y) dx dy
             = E[X] + E[Y]

where the first integral is evaluated by using the variation of Proposition 2.1
with g(x, y) = x, and the second with g(x, y) = y.

The same result holds in the discrete case and, combined with the corollary in
Section 2.4.3, yields that for any constants a, b,

    E[aX + bY] = a E[X] + b E[Y]     (2.10)

Joint probability distributions may also be defined for n random variables.
The details are exactly the same as when n = 2 and are left as an exercise.
The corresponding result to Equation (2.10) states that if X_1, X_2, ..., X_n
are n random variables, then for any n constants a_1, a_2, ..., a_n,

    E[a_1 X_1 + a_2 X_2 + ··· + a_n X_n] = a_1 E[X_1] + a_2 E[X_2] + ··· + a_n E[X_n]     (2.11)

Example 2.29  Calculate the expected sum obtained when three fair dice are
rolled.


Solution:  Let X denote the sum obtained. Then X = X_1 + X_2 + X_3 where
X_i represents the value of the ith die. Thus,

    E[X] = E[X_1] + E[X_2] + E[X_3] = 3(7/2) = 21/2

Example 2.30  As another example of the usefulness of Equation (2.11), let
us use it to obtain the expectation of a binomial random variable having
parameters n and p. Recalling that such a random variable X represents the
number of successes in n trials when each trial has probability p of being a
success, we have that

    X = X_1 + X_2 + ··· + X_n

where

    X_i =
      1,   if the ith trial is a success
      0,   if the ith trial is a failure

Hence, X_i is a Bernoulli random variable having expectation
E[X_i] = 1(p) + 0(1 − p) = p. Thus,

    E[X] = E[X_1] + E[X_2] + ··· + E[X_n] = np

This derivation should be compared with the one presented in Example 2.17.

Example 2.31  At a party N men throw their hats into the center of a room.
The hats are mixed up and each man randomly selects one. Find the expected
number of men who select their own hats.

Solution:  Letting X denote the number of men that select their own hats, we
can best compute E[X] by noting that

    X = X_1 + X_2 + ··· + X_N

where

    X_i =
      1,   if the ith man selects his own hat
      0,   otherwise

Now, because the ith man is equally likely to select any of the N hats, it
follows that

    P{X_i = 1} = P{ith man selects his own hat} = 1/N

and so

    E[X_i] = 1 · P{X_i = 1} + 0 · P{X_i = 0} = 1/N


Hence, from Equation (2.11) we obtain that

    E[X] = E[X_1] + ··· + E[X_N] = (1/N) N = 1

Hence, no matter how many people are at the party, on the average exactly one
of the men will select his own hat.

Example 2.32  Suppose there are 25 different types of coupons and suppose
that each time one obtains a coupon, it is equally likely to be any one of the
25 types. Compute the expected number of different types that are contained in
a set of 10 coupons.

Solution:  Let X denote the number of different types in the set of 10
coupons. We compute E[X] by using the representation

    X = X_1 + ··· + X_25

where

    X_i =
      1,   if at least one type i coupon is in the set of 10
      0,   otherwise

Now,

    E[X_i] = P{X_i = 1}
           = P{at least one type i coupon is in the set of 10}
           = 1 − P{no type i coupons are in the set of 10}
           = 1 − (24/25)^{10}

where the last equality follows since each of the 10 coupons will
(independently) not be of type i with probability 24/25. Hence,

    E[X] = E[X_1] + ··· + E[X_25] = 25 [1 − (24/25)^{10}]

2.5.2. Independent Random Variables

The random variables X and Y are said to be independent if, for all a, b,

    P{X ≤ a, Y ≤ b} = P{X ≤ a} P{Y ≤ b}     (2.12)

In other words, X and Y are independent if, for all a and b, the events
E_a = {X ≤ a} and F_b = {Y ≤ b} are independent.
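Examples 2.31 and 2.32 above can be checked numerically. This simulation is mine, not the text's; the seed makes the run reproducible:

```python
import random

random.seed(7)

def average_own_hat_matches(n_men, trials=20_000):
    """Simulate Example 2.31: a random permutation of hats; a 'match' is a
    man who draws his own hat. The expectation is 1 for every n_men."""
    total = 0
    for _ in range(trials):
        hats = list(range(n_men))
        random.shuffle(hats)
        total += sum(man == hat for man, hat in enumerate(hats))
    return total / trials

avg_matches = average_own_hat_matches(10)   # close to 1

# Example 2.32, computed exactly: 25 [1 - (24/25)^10], about 8.38 types.
expected_types = 25 * (1 - (24 / 25)**10)
```

Changing `n_men` leaves the simulated average near 1, which is the striking point of the hat-matching example.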


In terms of the joint distribution function F of X and Y, we have that X and Y
are independent if

    F(a, b) = F_X(a) F_Y(b)   for all a, b

When X and Y are discrete, the condition of independence reduces to

    p(x, y) = p_X(x) p_Y(y)     (2.13)

while if X and Y are jointly continuous, independence reduces to

    f(x, y) = f_X(x) f_Y(y)     (2.14)

To prove this statement, consider first the discrete version, and suppose that
the joint probability mass function p(x, y) satisfies Equation (2.13). Then

    P{X ≤ a, Y ≤ b} = ∑_{y ≤ b} ∑_{x ≤ a} p(x, y)
                    = ∑_{y ≤ b} ∑_{x ≤ a} p_X(x) p_Y(y)
                    = ∑_{y ≤ b} p_Y(y) ∑_{x ≤ a} p_X(x)
                    = P{Y ≤ b} P{X ≤ a}

and so X and Y are independent. That Equation (2.14) implies independence in
the continuous case is proven in the same manner and is left as an exercise.

An important result concerning independence is the following.

Proposition 2.3  If X and Y are independent, then for any functions h and g,

    E[g(X) h(Y)] = E[g(X)] E[h(Y)]

Proof  Suppose that X and Y are jointly continuous. Then

    E[g(X) h(Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x) h(y) f(x, y) dx dy
                 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x) h(y) f_X(x) f_Y(y) dx dy
                 = ∫_{−∞}^{∞} h(y) f_Y(y) dy ∫_{−∞}^{∞} g(x) f_X(x) dx
                 = E[h(Y)] E[g(X)]

The proof in the discrete case is similar.
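Proposition 2.3 can be verified exactly on a small discrete example, where independence means the joint pmf factors as in Equation (2.13). A sketch with marginals and test functions of my choosing:

```python
# Independent discrete X and Y: the joint pmf is the product of the marginals.
p_X = {0: 0.3, 1: 0.7}
p_Y = {1: 0.4, 2: 0.6}

g = lambda x: x + 1
h = lambda y: y * y

# Left side: E[g(X) h(Y)] computed against the (product-form) joint pmf.
lhs = sum(g(x) * h(y) * px * py
          for x, px in p_X.items() for y, py in p_Y.items())

# Right side: E[g(X)] E[h(Y)] from the marginals separately.
rhs = (sum(g(x) * px for x, px in p_X.items())
       * sum(h(y) * py for y, py in p_Y.items()))
```

For a joint pmf that does not factor, the two sides generally differ; the factorization is what lets the double sum split.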


2.5.3. Covariance and Variance of Sums of Random Variables

The covariance of any two random variables X and Y, denoted by Cov(X, Y), is
defined by

    Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
              = E[XY − Y E[X] − X E[Y] + E[X] E[Y]]
              = E[XY] − E[Y] E[X] − E[X] E[Y] + E[X] E[Y]
              = E[XY] − E[X] E[Y]

Note that if X and Y are independent, then by Proposition 2.3 it follows that
Cov(X, Y) = 0.

Let us consider now the special case where X and Y are indicator variables for
whether or not the events A and B occur. That is, for events A and B, define

    X =
      1,   if A occurs
      0,   otherwise

    Y =
      1,   if B occurs
      0,   otherwise

Then,

    Cov(X, Y) = E[XY] − E[X] E[Y]

and, because XY will equal 1 or 0 depending on whether or not both X and Y
equal 1, we see that

    Cov(X, Y) = P{X = 1, Y = 1} − P{X = 1} P{Y = 1}

From this we see that

    Cov(X, Y) > 0  ⟺  P{X = 1, Y = 1} > P{X = 1} P{Y = 1}
               ⟺  P{X = 1, Y = 1} / P{X = 1} > P{Y = 1}
               ⟺  P{Y = 1 | X = 1} > P{Y = 1}

That is, the covariance of X and Y is positive if the outcome X = 1 makes it
more likely that Y = 1 (which, as is easily seen by symmetry, also implies the
reverse).

In general it can be shown that a positive value of Cov(X, Y) is an indication
that Y tends to increase as X does, whereas a negative value indicates that Y
tends to decrease as X increases.

The following are important properties of covariance.


Properties of Covariance

For any random variables X, Y, Z and constant c,

1. Cov(X, X) = Var(X),
2. Cov(X, Y) = Cov(Y, X),
3. Cov(cX, Y) = c Cov(X, Y),
4. Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z).

Whereas the first three properties are immediate, the final one is easily
proven as follows:

    Cov(X, Y + Z) = E[X(Y + Z)] − E[X] E[Y + Z]
                  = E[XY] − E[X] E[Y] + E[XZ] − E[X] E[Z]
                  = Cov(X, Y) + Cov(X, Z)

The fourth property listed easily generalizes to give the following result:

    Cov( ∑_{i=1}^{n} X_i, ∑_{j=1}^{m} Y_j ) = ∑_{i=1}^{n} ∑_{j=1}^{m} Cov(X_i, Y_j)     (2.15)

A useful expression for the variance of the sum of random variables can be
obtained from Equation (2.15) as follows:

    Var( ∑_{i=1}^{n} X_i ) = Cov( ∑_{i=1}^{n} X_i, ∑_{j=1}^{n} X_j )
        = ∑_{i=1}^{n} ∑_{j=1}^{n} Cov(X_i, X_j)
        = ∑_{i=1}^{n} Cov(X_i, X_i) + ∑_{i=1}^{n} ∑_{j ≠ i} Cov(X_i, X_j)
        = ∑_{i=1}^{n} Var(X_i) + 2 ∑_{i=1}^{n} ∑_{j < i} Cov(X_i, X_j)     (2.16)

In particular, if X_1, ..., X_n are independent random variables, then
Equation (2.16) reduces to

    Var( ∑_{i=1}^{n} X_i ) = ∑_{i=1}^{n} Var(X_i)
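Equation (2.16) can be checked exactly on a pair of correlated indicator variables, using the identities Var(X) = E[X^2] − (E[X])^2 and Cov(X, Y) = E[XY] − E[X]E[Y]. The joint pmf below is my own illustrative choice:

```python
# Two indicator variables that are positively correlated.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def E(f):
    """Expectation of f(X, Y) against the joint pmf."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

# Left side of Equation (2.16) with n = 2: Var(X + Y) computed directly.
var_sum = E(lambda x, y: (x + y)**2) - E(lambda x, y: x + y)**2

# Right side: Var(X) + Var(Y) + 2 Cov(X, Y).
var_x = E(lambda x, y: x * x) - E(lambda x, y: x)**2
var_y = E(lambda x, y: y * y) - E(lambda x, y: y)**2
cov_xy = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
rhs = var_x + var_y + 2 * cov_xy
```

Here Cov(X, Y) = 0.4 − 0.25 = 0.15 > 0, matching the earlier observation that a positive covariance means the outcome X = 1 makes Y = 1 more likely.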


Definition 2.1  If X_1, ..., X_n are independent and identically distributed,
then the random variable X̄ = ∑_{i=1}^{n} X_i / n is called the sample mean.

The following proposition shows that the covariance between the sample mean and
a deviation from that sample mean is zero. It will be needed in Section 2.6.1.

Proposition 2.4  Suppose that X_1, ..., X_n are independent and identically
distributed with expected value μ and variance σ^2. Then,

(a) E[X̄] = μ.
(b) Var(X̄) = σ^2 / n.
(c) Cov(X̄, X_i − X̄) = 0,   i = 1, ..., n.

Proof  Parts (a) and (b) are easily established as follows:

    E[X̄] = \frac{1}{n} ∑_{i=1}^{n} E[X_i] = μ,

    Var(X̄) = \left(\frac{1}{n}\right)^2 Var( ∑_{i=1}^{n} X_i )
            = \left(\frac{1}{n}\right)^2 ∑_{i=1}^{n} Var(X_i) = \frac{σ^2}{n}

To establish part (c) we reason as follows:

    Cov(X̄, X_i − X̄) = Cov(X̄, X_i) − Cov(X̄, X̄)
        = \frac{1}{n} Cov( X_i + ∑_{j ≠ i} X_j, X_i ) − Var(X̄)
        = \frac{1}{n} Cov(X_i, X_i) + \frac{1}{n} Cov( ∑_{j ≠ i} X_j, X_i ) − \frac{σ^2}{n}
        = \frac{σ^2}{n} − \frac{σ^2}{n} = 0

where the final equality used the fact that X_i and ∑_{j ≠ i} X_j are
independent and thus have covariance 0.

Equation (2.16) is often useful when computing variances.

Example 2.33  (Variance of a Binomial Random Variable)  Compute the variance
of a binomial random variable X with parameters n and p.


Solution:  Since such a random variable represents the number of successes in
n independent trials when each trial has a common probability p of being a
success, we may write

    X = X_1 + ··· + X_n

where the X_i are independent Bernoulli random variables such that

    X_i =
      1,   if the ith trial is a success
      0,   otherwise

Hence, from Equation (2.16) we obtain

    Var(X) = Var(X_1) + ··· + Var(X_n)

But

    Var(X_i) = E[X_i^2] − (E[X_i])^2
             = E[X_i] − (E[X_i])^2   since X_i^2 = X_i
             = p − p^2

and thus

    Var(X) = np(1 − p)

Example 2.34  (Sampling from a Finite Population: The Hypergeometric)
Consider a population of N individuals, some of whom are in favor of a certain
proposition. In particular suppose that Np of them are in favor and N − Np are
opposed, where p is assumed to be unknown. We are interested in estimating p,
the fraction of the population that is for the proposition, by randomly
choosing and then determining the positions of n members of the population.

In such situations as described in the preceding, it is common to use the
fraction of the sampled population that is in favor of the proposition as an
estimator of p. Hence, if we let

    X_i =
      1,   if the ith person chosen is in favor
      0,   otherwise

then the usual estimator of p is ∑_{i=1}^{n} X_i / n. Let us now compute its
mean and variance. Now

    E[ ∑_{i=1}^{n} X_i ] = ∑_{i=1}^{n} E[X_i]


    = np

where the final equality follows since the ith person chosen is equally likely
to be any of the N individuals in the population and so has probability Np/N
of being in favor. Also, by Equation (2.16),

    Var( ∑_{i=1}^{n} X_i ) = ∑_{i=1}^{n} Var(X_i) + 2 ∑∑_{i < j} Cov(X_i, X_j)

Now, since X_i is a Bernoulli random variable with mean p, it follows that

    Var(X_i) = p(1 − p)

Also, for i ≠ j,

    Cov(X_i, X_j) = E[X_i X_j] − E[X_i] E[X_j]

where X_i X_j equals 1 or 0 depending on whether or not both the ith and jth
persons chosen are in favor, and so

    E[X_i X_j] = P{X_i = 1, X_j = 1}
               = P{X_i = 1} P{X_j = 1 | X_i = 1}
               = \frac{Np}{N} \cdot \frac{Np − 1}{N − 1}

Hence,

    Cov(X_i, X_j) = \frac{p(Np − 1)}{N − 1} − p^2 = −\frac{p(1 − p)}{N − 1}

and so

    Var( ∑_{i=1}^{n} X_i ) = np(1 − p) + 2 \binom{n}{2} \left( −\frac{p(1 − p)}{N − 1} \right)
        = np(1 − p) \left[ 1 − \frac{n − 1}{N − 1} \right]

Note that, as N → ∞, this variance approaches np(1 − p),


which is not surprising since for N large each of the X_i will be
(approximately) independent random variables, and thus ∑_{i=1}^{n} X_i will
have an (approximately) binomial distribution with parameters n and p.

The random variable ∑_{i=1}^{n} X_i can be thought of as representing the
number of white balls obtained when n balls are randomly selected from a
population consisting of Np white and N − Np black balls. (Identify a person
who favors the proposition with a white ball and one against with a black
ball.) Such a random variable is called hypergeometric and has a probability
mass function given by

    P{ ∑_{i=1}^{n} X_i = k } = \frac{\binom{Np}{k} \binom{N − Np}{n − k}}{\binom{N}{n}}

It is often important to be able to calculate the distribution of X + Y from
the distributions of X and Y when X and Y are independent. Suppose first that
X and Y are continuous, X having probability density f and Y having probability
density g. Then, letting F_{X+Y}(a) be the cumulative distribution function of
X + Y, we have

    F_{X+Y}(a) = P{X + Y ≤ a}
               = ∫∫_{x+y ≤ a} f(x) g(y) dx dy
               = ∫_{−∞}^{∞} ∫_{−∞}^{a−y} f(x) g(y) dx dy
               = ∫_{−∞}^{∞} \left( ∫_{−∞}^{a−y} f(x) dx \right) g(y) dy
               = ∫_{−∞}^{∞} F_X(a − y) g(y) dy     (2.17)

The cumulative distribution function F_{X+Y} is called the convolution of the
distributions F_X and F_Y (the cumulative distribution functions of X and Y,
respectively).

By differentiating Equation (2.17), we obtain that the probability density
function f_{X+Y}(a) of X + Y is given by

    f_{X+Y}(a) = \frac{d}{da} ∫_{−∞}^{∞} F_X(a − y) g(y) dy
               = ∫_{−∞}^{∞} \frac{d}{da} F_X(a − y) g(y) dy
               = ∫_{−∞}^{∞} f(a − y) g(y) dy     (2.18)
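The hypergeometric probability mass function can be computed directly from binomial coefficients, and the approximation noted above, that for N large relative to n it is close to the binomial (n, p) pmf, can be checked numerically. A sketch (function names mine):

```python
from math import comb

def hypergeometric_pmf(k, N, Np, n):
    """P{sum X_i = k}: k white balls when n balls are drawn without
    replacement from Np white and N - Np black balls."""
    return comb(Np, k) * comb(N - Np, n - k) / comb(N, n)

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Sanity check: the probabilities sum to one (Vandermonde's identity).
total = sum(hypergeometric_pmf(k, 50, 20, 5) for k in range(6))

# For N large relative to n, sampling without replacement barely differs
# from sampling with replacement, so the pmf is close to binomial(n, p).
N, n, p = 10_000, 10, 0.3
max_gap = max(abs(hypergeometric_pmf(k, N, int(N * p), n) - binomial_pmf(k, n, p))
              for k in range(n + 1))
```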


Example 2.35  (Sum of Two Independent Uniform Random Variables)  If X and Y
are independent random variables both uniformly distributed on (0, 1),
calculate the probability density of X + Y.

Solution:  From Equation (2.18), since

    f(a) = g(a) =
      1,   0 < a < 1
      0,   otherwise

we obtain

    f_{X+Y}(a) = ∫_0^1 f(a − y) dy

For 0 ≤ a ≤ 1, this yields

    f_{X+Y}(a) = ∫_0^a dy = a

For 1 < a < 2, we get

    f_{X+Y}(a) = ∫_{a−1}^1 dy = 2 − a

Hence,

    f_{X+Y}(a) =
      a,       0 ≤ a ≤ 1
      2 − a,   1 < a < 2
      0,       otherwise

Because of the shape of its density function, X + Y is said to have a
triangular distribution.

Example 2.36  (Sum of Two Independent Poisson Random Variables)  Let X_1 and
X_2 be independent Poisson random variables with respective means λ_1 and λ_2.
Calculate the distribution of X_1 + X_2.

Solution:  Since the event {X_1 + X_2 = n} is the union of the disjoint
events {X_1 = k, X_2 = n − k}, 0 ≤ k ≤ n, we have

    P{X_1 + X_2 = n} = ∑_{k=0}^{n} P{X_1 = k, X_2 = n − k}
                     = ∑_{k=0}^{n} P{X_1 = k} P{X_2 = n − k}


    = ∑_{k=0}^{n} e^{−λ_1} \frac{λ_1^k}{k!} e^{−λ_2} \frac{λ_2^{n−k}}{(n − k)!}
    = e^{−(λ_1 + λ_2)} ∑_{k=0}^{n} \frac{λ_1^k λ_2^{n−k}}{k! (n − k)!}
    = \frac{e^{−(λ_1 + λ_2)}}{n!} ∑_{k=0}^{n} \frac{n!}{k! (n − k)!} λ_1^k λ_2^{n−k}
    = \frac{e^{−(λ_1 + λ_2)} (λ_1 + λ_2)^n}{n!}

In words, X_1 + X_2 has a Poisson distribution with mean λ_1 + λ_2.

The concept of independence may, of course, be defined for more than two random
variables. In general, the n random variables X_1, X_2, ..., X_n are said to be
independent if, for all values a_1, a_2, ..., a_n,

    P{X_1 ≤ a_1, X_2 ≤ a_2, ..., X_n ≤ a_n} = P{X_1 ≤ a_1} P{X_2 ≤ a_2} ··· P{X_n ≤ a_n}

Example 2.37  Let X_1, ..., X_n be independent and identically distributed
continuous random variables with probability distribution F and density
function F′ = f. If we let X_{(i)} denote the ith smallest of these random
variables, then X_{(1)}, ..., X_{(n)} are called the order statistics. To
obtain the distribution of X_{(i)}, note that X_{(i)} will be less than or
equal to x if and only if at least i of the n random variables X_1, ..., X_n
are less than or equal to x. Hence,

    P{X_{(i)} ≤ x} = ∑_{k=i}^{n} \binom{n}{k} (F(x))^k (1 − F(x))^{n−k}

Differentiation yields that the density function of X_{(i)} is as follows:

    f_{X_{(i)}}(x) = f(x) ∑_{k=i}^{n} \binom{n}{k} k (F(x))^{k−1} (1 − F(x))^{n−k}
                     − f(x) ∑_{k=i}^{n} \binom{n}{k} (n − k) (F(x))^k (1 − F(x))^{n−k−1}
                   = f(x) ∑_{k=i}^{n} \frac{n!}{(n − k)! (k − 1)!} (F(x))^{k−1} (1 − F(x))^{n−k}
                     − f(x) ∑_{k=i}^{n−1} \frac{n!}{(n − k − 1)! k!} (F(x))^k (1 − F(x))^{n−k−1}


= f(x)=n∑k=i− f(x)2.5. Jointly Distributed Random Variables 61n!(n − k)!(k − 1)! (F (x))k−1 (1 − F(x)) n−kn∑j=i+1n!(n − j)!(j − 1)! (F (x))j−1 (1 − F(x)) n−jn!(n − i)!(i − 1)! f (x)(F (x))i−1 (1 − F(x)) n−iThe preceding density is quite intuitive, since in order for X (i) to equal x, i − 1ofthe n values X 1 ,...,X n must be less than x; n − i of them must be greater than x;and one must be equal to x. Now, the probability density that every member of aspecified set of i −1oftheX j is less than x, every member of another specified setof n − i is greater than x, and the remaining value is equal to x is (F (x)) i−1 (1 −F(x)) n−i f(x). Therefore, since there are n!/[(i − 1)!(n − i)!] different partitionsof the n random variables into the three groups, we obtain the preceding densityfunction. 2.5.4. Joint Probability Distribution of Functions of RandomVariablesLet X 1 and X 2 be jointly continuous random variables with joint probability densityfunction f(x 1 ,x 2 ). It is sometimes necessary to obtain the joint distributionof the random variables Y 1 and Y 2 which arise as functions of X 1 and X 2 . Specifically,suppose that Y 1 = g 1 (X 1 ,X 2 ) and Y 2 = g 2 (X 1 ,X 2 ) for some functions g 1and g 2 .Assume that the functions g 1 and g 2 satisfy the following conditions:1. The equations y 1 = g 1 (x 1 ,x 2 ) and y 2 = g 2 (x 1 ,x 2 ) can be uniquely solvedfor x 1 and x 2 in terms of y 1 and y 2 with solutions given by, say, x 1 =h 1 (y 1 ,y 2 ), x 2 = h 2 (y 1 ,y 2 ).2. The functions g 1 and g 2 have continuous partial derivatives at all points(x 1 ,x 2 ) and are such that the following 2 × 2 determinantat all points (x 1 ,x 2 ).∣ ∂g 1 ∂g 1 ∣∣∣∣∣∣∣∣ ∂x 1 ∂x 2J(x 1 ,x 2 ) =≡ ∂g 1 ∂g 2− ∂g 1 ∂g 2≠ 0∂g 2 ∂g 2 ∂x 1 ∂x 2 ∂x 2 ∂x 1∣∂x 1 ∂x 2


62 2 Random VariablesUnder these two conditions it can be shown that the random variables Y 1 and Y 2are jointly continuous with joint density function given byf Y1 ,Y 2(y 1 ,y 2 ) = f X1 ,X 2(x 1 ,x 2 )|J(x 1 ,x 2 )| −1 (2.19)where x 1 = h 1 (y 1 ,y 2 ), x 2 = h 2 (y 1 ,y 2 ).A proof of Equation (2.19) would proceed along the following lines:P {Y 1 y 1 ,Y 2 y 2 }=∫∫f X1 ,X 2(x 1 ,x 2 )dx 1 dx 2 (2.20)(x 1 ,x 2 ):g 1 (x 1 ,x 2 )y 1g 2 (x 1 ,x 2 )y 2The joint density function can now be obtained by differentiating Equation (2.20)with respect to y 1 and y 2 . That the result of this differentiation will be equal tothe right-hand side of Equation (2.19) is an exercise in advanced calculus whoseproof will not be presented in the present text.Example 2.38 If X and Y are independent gamma random variables withparameters (α, λ) and (β,λ), respectively, compute the joint density of U = X +Yand V = X/(X + Y).Solution:The joint density of X and Y is given byf X,Y (x, y) = λe−λx (λx) α−1Ɣ(α)λe −λy (λy) β−1Ɣ(β)= λα+βƔ(α)Ɣ(β) e−λ(x+y) x α−1 y β−1Now, if g 1 (x, y) = x + y, g 2 (x, y) = x/(x + y), then∂g 1∂x = ∂g 1∂y = 1, ∂g 2∂x = y(x + y) 2 , ∂g 2∂y =− x(x + y) 2and so∣ 1 1 ∣∣∣∣∣J(x, y) =y −x=− 1∣(x + y) 2 (x + y) 2 x + y


2.5. Jointly Distributed Random Variables 63Finally, because the equations u = x + y, v = x/(x + y) have as their solutionsx = uv, y = u(1 − v), we see thatf U,V (u, v) = f X,Y [uv, u(1 − v)]u= λe−λu (λu) α+β−1Ɣ(α + β)v α−1 (1 − v) β−1 Ɣ(α + β)Ɣ(α)Ɣ(β)Hence X + Y and X/(X + Y) are independent, with X + Y having a gammadistribution with parameters (α + β,λ) and X/(X + Y)having density functionf V (v) =Ɣ(α + β)Ɣ(α)Ɣ(β) vα−1 (1 − v) β−1 ,0


Furthermore, we suppose that the equations y_1 = g_1(x_1,...,x_n), y_2 = g_2(x_1,...,x_n), ..., y_n = g_n(x_1,...,x_n) have a unique solution, say, x_1 = h_1(y_1,...,y_n), ..., x_n = h_n(y_1,...,y_n). Under these assumptions the joint density function of the random variables Y_i is given by

f_{Y_1,...,Y_n}(y_1,...,y_n) = f_{X_1,...,X_n}(x_1,...,x_n) |J(x_1,...,x_n)|^{−1}

where x_i = h_i(y_1,...,y_n), i = 1, 2, ..., n.

2.6. Moment Generating Functions

The moment generating function φ(t) of the random variable X is defined for all values t by

φ(t) = E[e^{tX}]
     = ∑_x e^{tx} p(x),             if X is discrete
     = ∫_{−∞}^{∞} e^{tx} f(x) dx,   if X is continuous

We call φ(t) the moment generating function because all of the moments of X can be obtained by successively differentiating φ(t). For example,

φ′(t) = (d/dt) E[e^{tX}] = E[(d/dt) e^{tX}] = E[X e^{tX}]

Hence,

φ′(0) = E[X]

Similarly,

φ″(t) = (d/dt) φ′(t) = (d/dt) E[X e^{tX}] = E[(d/dt)(X e^{tX})] = E[X² e^{tX}]


2.6. Moment Generating Functions 65and soφ ′′ (0) = E[X 2 ]In general, the nth derivative of φ(t) evaluated at t = 0 equals E[X n ], that is,φ n (0) = E[X n ], n 1We now compute φ(t) for some common distributions.Example 2.39 (The Binomial Distribution with Parameters n and p)Hence,and soφ(t) = E[e tX ]n∑( ) n= e tk p k (1 − p) n−kk=k=0n∑k=0( nk)(pe t ) k (1 − p) n−k= (pe t + 1 − p) nφ ′ (t) = n(pe t + 1 − p) n−1 pe tE[X]=φ ′ (0) = npwhich checks with the result obtained in Example 2.17. Differentiating a secondtime yieldsand soφ ′′ (t) = n(n − 1)(pe t + 1 − p) n−2 (pe t ) 2 + n(pe t + 1 − p) n−1 pe tThus, the variance of X is givenE[X 2 ]=φ ′′ (0) = n(n − 1)p 2 + npVar(X) = E[X 2 ]−(E[X]) 2= n(n − 1)p 2 + np − n 2 p 2= np(1 − p)
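The moment computations in Example 2.39 can be mirrored numerically. This sketch (names and parameters are ours) differentiates the binomial moment generating function at t = 0 by finite differences and recovers E[X] = np and Var(X) = np(1 − p) for n = 10, p = 0.3.

```python
import math

def phi(t, n=10, p=0.3):
    # MGF of a binomial(n, p) random variable: (p e^t + 1 - p)^n
    return (p * math.exp(t) + 1.0 - p) ** n

def derivative(f, t=0.0, h=1e-5):
    # central difference approximation to f'(t)
    return (f(t + h) - f(t - h)) / (2 * h)

def second_derivative(f, t=0.0, h=1e-4):
    return (f(t + h) - 2 * f(t) + f(t - h)) / h ** 2

mean = derivative(phi)             # phi'(0) = np = 3.0
second = second_derivative(phi)    # phi''(0) = n(n-1)p^2 + np = 11.1
var = second - mean ** 2           # np(1 - p) = 2.1
```

The step sizes are small enough that the finite-difference error is far below the tolerances used here; any comparably smooth MGF can be plugged into the same helpers.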


66 2 Random VariablesExample 2.40 (The Poisson Distribution with Mean λ)φ(t) = E[e tX ]∞∑ e tn e −λ λ n=n!n=0∑∞= e −λ (λe t ) nn!n=0= e −λ e λet= exp{λ(e t − 1)}Differentiation yieldsand soφ ′ (t) = λe t exp{λ(e t − 1)},φ ′′ (t) = (λe t ) 2 exp{λ(e t − 1)}+λe t exp{λ(e t − 1)}E[X]=φ ′ (0) = λ,E[X 2 ]=φ ′′ (0) = λ 2 + λ,Var(X) = E[X 2 ]−(E[X]) 2= λThus, both the mean and the variance of the Poisson equal λ.Example 2.41 (The Exponential Distribution with Parameter λ)φ(t) = E[e tX ]== λ∫ ∞0∫ ∞0= λλ − te tx λe −λx dxe −(λ−t)x dxfor t


2.6. Moment Generating Functions 67We note by the preceding derivation that, for the exponential distribution, φ(t) isonly defined for values of t less than λ. Differentiation of φ(t) yieldsHence,φ ′ (t) =λ(λ − t) 2 , φ′′ (t) = 2λ(λ − t) 3The variance of X is thus given byE[X]=φ ′ (0) = 1 λ , E[X2 ]=φ ′′ (0) = 2 λ 2Var(X) = E[X 2 ]−(E[X]) 2 = 1 λ 2Example 2.42 (The Normal Distribution with Parameters μ and σ 2 )The moment generating function of a standard normal random variable Z isobtained as follows.E[e tZ ]= √ 1 ∫ ∞e tx e −x2 /2 dx2π−∞= 1 √2π∫ ∞−∞= e t2 /2 1√2π∫ ∞= e t2 /2e −(x2 −2tx)/2 dx−∞e −(x−t)2 /2 dxIf Z is a standard normal, then X = σZ+ μ is normal with parameters μ and σ 2 ;therefore,{ σφ(t)= E[e tX ]=E[e t(σZ+μ) ]=e tμ E[e tσZ 2 t 2 }]=exp + μt2By differentiating we obtain{ σφ ′ (t) = (μ + tσ 2 2 t 2) exp2}+ μt ,{ σφ ′′ (t) = (μ + tσ 2 ) 2 2 t 2exp + μt2} { σ+ σ 2 2 t 2 }exp + μt2


68 2 Random Variablesand soimplying thatE[X]=φ ′ (0) = μ,E[X 2 ]=φ ′′ (0) = μ 2 + σ 2Var(X) = E[X 2 ]−E([X]) 2= σ 2 Tables 2.1 and 2.2 give the moment generating function for some commondistributions.An important property of moment generating functions is that the moment generatingfunction of the sum of independent random variables is just the productof the individual moment generating functions. To see this, suppose that X andY are independent and have moment generating functions φ X (t) and φ Y (t), respectively.Then φ X+Y (t), the moment generating function of X + Y ,isgivenbyφ X+Y (t) = E[e t(X+Y) ]= E[e tX e tY ]= E[e tX ]E[e tY ]= φ X (t)φ Y (t)where the next to the last equality follows from Proposition 2.3 since X and Y areindependent.Table 2.1Discrete Probability Momentprobability mass generatingdistribution function, p(x) function, φ(t) Mean VarianceBinomial withparameters n, p0 p 1Poisson withparameterλ>0Geometric withparameter0 p 1( nx )p x (1 − p) n−x ,x = 0, 1,...,ne−λ λxx! ,x = 0, 1, 2,...p(1 − p) x−1 ,x = 1, 2,...(pe t + (1 − p)) n np np(1 − p)exp{λ(e t − 1)} λ λpe t1 − (1 − p)e t 1p1 − pp 2
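A quick numerical illustration of the product property φ_{X+Y}(t) = φ_X(t) φ_Y(t): the example below (our choice of distribution) takes X and Y to be independent rolls of a fair die, estimates E[e^{t(X+Y)}] by Monte Carlo, and compares it with the square of the single-die moment generating function.

```python
import math
import random

def die_mgf(t):
    # MGF of one fair die: E[e^{tX}] = (1/6) * sum of e^{tk}, k = 1..6
    return sum(math.exp(t * k) for k in range(1, 7)) / 6.0

random.seed(42)
t = 0.3
n_samples = 200_000

# Monte Carlo estimate of E[e^{t(X+Y)}] for two independent dice
mc = sum(math.exp(t * (random.randint(1, 6) + random.randint(1, 6)))
         for _ in range(n_samples)) / n_samples

product = die_mgf(t) ** 2   # phi_{X+Y}(t) = phi_X(t) * phi_Y(t)
```

With independent throws the two numbers agree to within Monte Carlo noise; the agreement fails if X and Y are made dependent, which is exactly the role independence plays in the derivation above.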


2.6. Moment Generating Functions 69Table 2.2ContinuousMomentprobability Probability density generatingdistribution function, f(x) function, φ(t) Mean Variance⎧⎨ 1Uniformf(x)= b − a ,a


70 2 Random VariablesBut (pe t + (1 − p)) m+n is just the moment generating function of a binomialrandom variable having parameters m + n and p. Thus, this must be the distributionof X + Y . Example 2.45 (Sums of Independent Poisson Random Variables) Calculatethe distribution of X+Y when X and Y are independent Poisson random variableswith means λ 1 and λ 2 , respectively.Solution:φ X+Y (t) = φ X (t) φ Y (t)= e λ 1(e t −1) e λ 2(e t −1)= e (λ 1+λ 2 )(e t −1)Hence, X + Y is Poisson distributed with mean λ 1 + λ 2 , verifying the resultgiven in Example 2.36. Example 2.46 (Sums of Independent Normal Random Variables) Show thatif X and Y are independent normal random variables with parameters (μ 1 ,σ1 2)and (μ 2 ,σ2 2), respectively, then X + Y is normal with mean μ 1 + μ 2 and varianceσ1 2 + σ 2 2.Solution:φ X+Y (t) = φ X (t)φ Y (t){ } { }σ2= exp 1t 2σ2+ μ 1 t exp 2t 2+ μ 2 t22{ }(σ2= exp 1+ σ2 2)t2+ (μ 1 + μ 2 )t2which is the moment generating function of a normal random variable withmean μ 1 + μ 2 and variance σ1 2 + σ 2 2 . Hence, the result follows since the momentgenerating function uniquely determines the distribution. Example 2.47 (The Poisson Paradigm) We showed in Section 2.2.4 that thenumber of successes that occur in n independent trails, each of which results ina success with probability p is, when n is large and p small, approximately aPoisson random variable with parameter λ = np. This result, however, can besubstantially strengthened. First it is not necessary that the trials have the same


2.6. Moment Generating Functions 71success probability, only that all the success probabilities are small. To see thatthis is the case, suppose that the trials are independent, with trial i resulting ina success with probability p i , where all the p i , i = 1,...,n are small. LettingX i equal 1 if trial i is a success, and 0 otherwise, it follows that the number ofsuccesses, call it X, can be expressed asX =n∑i=1Using that X i is a Bernoulli (or binary) random variable, its moment generatingfunction isX iE[e tX i]=p i e t + 1 − p i = 1 + p i (e t − 1)Now, using the result that, for |x| small,e x ≈ 1 + xit follows, because p i (e t − 1) is small when p i is small, thatE[e tX i]=1 + p i (e t − 1) ≈ exp{p i (e t − 1)}Because the moment generating function of a sum of independent random variablesis the product of their moment generating functions, the preceding impliesthatE[e tX ]≈n∏i=1{ ∑exp{p i (e t − 1)}=expi}p i (e t − 1)But the right side of the preceding is the moment generating function of a Poissonrandom variable with mean ∑ i p i, thus arguing that this is approximately thedistribution of X.Not only is it not necessary for the trials to have the same success probabilityfor the number of successes to approximately have a Poisson distribution, theyneed not even be independent, provided that their dependence is weak. Forinstance,recall the matching problem (Example 2.31) where n people randomlyselect hats from a set consisting of one hat from each person. By regarding therandom selections of hats as constituting n trials, where we say that trial i is asuccess if person i chooses his or her own hat, it follows that, with A i being theevent that trial i is a success,P(A i ) = 1 and P(A i |A j ) = 1nn − 1 , j ≠ iHence, whereas the trials are not independent, their dependence appears, for largen, to be weak. Because of this weak dependence, and the small trial success probabilities,it would seem that the number of matches should approximately have a


72 2 Random VariablesPoisson distribution with mean 1 when n is large, and this is shown to be the casein Example 3.23.The statement that “the number of successes in n trials that are either independentor at most weakly dependent is, when the trial success probabilities are allsmall, approximately a Poisson random variable” is known as the Poisson paradigm.Remark For a nonnegative random variable X, it is often convenient to defineits Laplace transform g(t), t 0, byg(t) = φ(−t)= E[e −tX ]That is, the Laplace transform evaluated at t is just the moment generating functionevaluated at −t. The advantage of dealing with the Laplace transform, ratherthan the moment generating function, when the random variable is nonnegative isthat if X 0 and t 0, then0 e −tX 1That is, the Laplace transform is always between 0 and 1. As in the case of momentgenerating functions, it remains true that nonnegative random variables thathave the same Laplace transform must also have the same distribution. It is also possible to define the joint moment generating function of two ormore random variables. This is done as follows. For any n random variablesX 1 ,...,X n , the joint moment generating function, φ(t 1 ,...,t n ), is defined forall real values of t 1 ,...,t n byφ(t 1 ,...,t n ) = E[e (t 1X 1 + ··· + t n X n ) ]It can be shown that φ(t 1 ,...,t n ) uniquely determines the joint distribution ofX 1 ,...,X n .Example 2.48 (The Multivariate Normal Distribution) Let Z 1 ,...,Z n be aset of n independent standard normal random variables. If, for some constantsa ij , 1 i m, 1 j n, and μ i , 1 i m,X 1 = a 11 Z 1 +···+a 1n Z n + μ 1 ,X 2 = a 21 Z 1 +···+a 2n Z n + μ 2 ,..X i = a i1 Z 1 +···+a in Z n + μ i ,..X m = a m1 Z 1 +···+a mn Z n + μ m


2.6. Moment Generating Functions 73then the random variables X 1 ,...,X m are said to have a multivariate normal distribution.It follows from the fact that the sum of independent normal random variablesis itself a normal random variable that each X i is a normal random variable withmean and variance given byLet us now determineE[X i ]=μ i ,n∑Var(X i ) =j=1a 2 ijφ(t 1 ,...,t m ) = E[exp{t 1 X 1 + ··· +t m X m }]the joint moment generating function of X 1 ,...,X m . The first thing to note is thatsince ∑ mi=1 t i X i is itself a linear combination of the independent normal randomvariables Z 1 ,...,Z n , it is also normally distributed. Its mean and variance arerespectively[ m]∑m∑E t i X i = t i μ iandi=1i=1( m) ⎛∑m∑Var t i X i = Cov ⎝ t i X i ,i=1=m∑i=1 j=1i=1m∑j=1t j X j⎞⎠m∑t i t j Cov(X i ,X j )Now, if Y is a normal random variable with mean μ and variance σ 2 , thenE[e Y ]=φ Y (t)| t=1 = e μ+σ 2 /2Thus, we see that⎧⎨ m∑φ(t 1 ,...,t m ) = exp t i μ i + 1 ⎩ 2i=1m∑i=1 j=1⎫m∑⎬t i t j Cov(X i ,X j )⎭which shows that the joint distribution of X 1 ,...,X m is completely determinedfrom a knowledge of the values of E[X i ] and Cov(X i ,X j ), i, j = 1,...,m.
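Since the joint distribution in Example 2.48 is determined by the means and covariances, a simulation can verify the covariance formula Cov(X_i, X_j) = ∑_k a_ik a_jk. The sketch below uses coefficients of our own choosing: X_1 = Z_1 + 2Z_2 + 1 and X_2 = 3Z_2 − 1, so theory predicts Cov(X_1, X_2) = 1·0 + 2·3 = 6.

```python
import random

random.seed(7)
A = [[1.0, 2.0],
     [0.0, 3.0]]
mu = [1.0, -1.0]

def sample():
    # X = A Z + mu for independent standard normals Z_1, Z_2
    z = [random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
    return [A[i][0] * z[0] + A[i][1] * z[1] + mu[i] for i in range(2)]

n = 200_000
xs = [sample() for _ in range(n)]
m0 = sum(x[0] for x in xs) / n                      # should be near mu_1 = 1
m1 = sum(x[1] for x in xs) / n                      # should be near mu_2 = -1
cov01 = sum(x[0] * x[1] for x in xs) / n - m0 * m1  # should be near 6
```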


74 2 Random Variables2.6.1. The Joint Distribution of the Sample Mean and SampleVariance from a Normal PopulationLet X 1 ,...,X n be independent and identically distributed random variables, eachwith mean μ and variance σ 2 . The random variable S 2 defined byn∑S 2 (X i − ¯X) 2=n − 1i=1is called the sample variance of these data. To compute E[S 2 ] we use the identityn∑(X i − ¯X) 2 =i=1n∑(X i − μ) 2 − n( ¯X − μ) 2 (2.21)i=1which is proven as follows:n∑n∑(X i − ¯X) = (X i − μ + μ − ¯X) 2i=1===i=1n∑n∑(X i − μ) 2 + n(μ − ¯X) 2 + 2(μ − ¯X) (X i − μ)i=1i=1n∑(X i − μ) 2 + n(μ − ¯X) 2 + 2(μ − ¯X)(n ¯X − nμ)i=1n∑(X i − μ) 2 + n(μ − ¯X) 2 − 2n(μ − ¯X) 2i=1and Identity (2.21) follows.Using Identity (2.21) givesE[(n − 1)S 2 ]=n∑E[(X i − μ) 2 ]−nE[( ¯X − μ) 2 ]i=1Thus, we obtain from the preceding thatE[S 2 ]=σ 2= nσ 2 − n Var( ¯X)= (n − 1)σ 2 from Proposition 2.4(b)We will now determine the joint distribution of the sample mean ¯X =∑ ni=1X i /n and the sample variance S 2 when the X i have a normal distribution.To begin we need the concept of a chi-squared random variable.
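Identity (2.21) is purely algebraic, so it can be spot-checked on any data set and any reference point μ. A minimal sketch (the data values are ours):

```python
import statistics

# Check Identity (2.21):
#   sum (X_i - Xbar)^2 = sum (X_i - mu)^2 - n (Xbar - mu)^2
data = [2.0, 3.5, 1.0, 4.0, 2.5]
mu = 2.0          # any fixed number works; the identity is algebraic
n = len(data)
xbar = sum(data) / n

lhs = sum((x - xbar) ** 2 for x in data)
rhs = sum((x - mu) ** 2 for x in data) - n * (xbar - mu) ** 2

# lhs / (n - 1) is the sample variance S^2 defined above
s2 = lhs / (n - 1)
```

Here `lhs` and `rhs` agree exactly (up to floating-point rounding), and `s2` matches the library definition of the sample variance.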


2.6. Moment Generating Functions 75Definition 2.2 If Z 1 ,...,Z n are independent standard normal random variables,then the random variable ∑ ni=1 Zi 2 is said to be a chi-squared random variablewith n degrees of freedom.We shall now compute the moment generating function of ∑ ni=1 Zi 2.Tobegin,note thatE[exp{tZ 2 i }] = 1 √2π∫ ∞−∞= 1 √2π∫ ∞−∞= σ= (1 − 2t) −1/2e tx2 e −x2 /2 dxe −x2 /2σ 2 dxwhere σ 2 = (1 − 2t) −1Hence,[ {E exp tn∑Zi2i=1}]n∏=i=1E[exp{tZi 2 }] = (1 − 2t)−n/2Now, let X 1 ,...,X n be independent normal random variables, each with meanμ and variance σ 2 , and let ¯X = ∑ ni=1 X i /n and S 2 denote their sample meanand sample variance. Since the sum of independent normal random variables isalso a normal random variable, it follows that ¯X is a normal random variable withexpected value μ and variance σ 2 /n. In addition, from Proposition 2.4,Cov( ¯X,X i − ¯X) = 0, i = 1,...,n (2.22)Also, since ¯X,X 1 − ¯X,X 2 − ¯X,...,X n − ¯X are all linear combinations of the independentstandard normal random variables (X i − μ)/σ, i = 1,...,n, it followsthat the random variables ¯X,X 1 − ¯X,X 2 − ¯X,...,X n − ¯X have a joint distributionthat is multivariate normal. However, if we let Y be a normal random variable withmean μ and variance σ 2 /n that is independent of X 1 ,...,X n , then the randomvariables Y,X 1 − ¯X,X 2 − ¯X,...,X n − ¯X also have a multivariate normal distribution,and by Equation (2.22), they have the same expected values and covariancesas the random variables ¯X,X i − ¯X, i = 1,...,n. Thus, since a multivariate normaldistribution is completely determined by its expected values and covariances,we can conclude that the random vectors Y,X 1 − ¯X,X 2 − ¯X,...,X n − ¯X and¯X,X 1 − ¯X,X 2 − ¯X,...,X n − ¯X have the same joint distribution; thus showingthat ¯X is independent of the sequence of deviations X i − ¯X, i = 1,...,n.
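The key moment generating function computed above, E[exp(tZ²)] = (1 − 2t)^{−1/2} for t < 1/2, is easy to confirm by Monte Carlo. A sketch (parameters ours):

```python
import math
import random

random.seed(1)
t = 0.2          # must satisfy t < 1/2 for the expectation to exist
n = 400_000

# Monte Carlo estimate of E[exp(t Z^2)] for a standard normal Z
mc = sum(math.exp(t * random.gauss(0.0, 1.0) ** 2) for _ in range(n)) / n

exact = (1.0 - 2.0 * t) ** -0.5   # ≈ 1.2910
```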


76 2 Random VariablesSince ¯X is independent of the sequence of deviations X i − ¯X,i = 1,...,n,itfollows that it is also independent of the sample variancen∑S 2 (X i − ¯X) 2≡n − 1To determine the distribution of S 2 , use Identity (2.21) to obtain(n − 1)S 2 =i=1n∑(X i − μ) 2 − n( ¯X − μ) 2i=1Dividing both sides of this equation by σ 2 yields(n − 1)S 2 ( ) 2 ¯X − μn∑σ 2 +σ/ √ =ni=1(X i − μ) 2σ 2 (2.23)Now, ∑ ni=1 (X i − μ) 2 /σ 2 is the sum of the squares of n independent standardnormal random variables, and so is a chi-squared random variable with n degreesof freedom; it thus has moment generating function (1 − 2t) −n/2 .Also[( ¯X − μ)/(σ/ √ n)] 2 is the square of a standard normal random variable and so isa chi-squared random variable with one degree of freedom; and thus has momentgenerating function (1 − 2t) −1/2 . In addition, we have previously seen that thetwo random variables on the left side of Equation (2.23) are independent. Therefore,because the moment generating function of the sum of independent randomvariables is equal to the product of their individual moment generating functions,we obtain thatE[e t(n−1)S2 /σ 2 ](1 − 2t) −1/2 = (1 − 2t) −n/2orE[e t(n−1)S2 /σ 2 ]=(1 − 2t) −(n−1)/2But because (1 − 2t) −(n−1)/2 is the moment generating function of a chi-squaredrandom variable with n − 1 degrees of freedom, we can conclude, since the momentgenerating function uniquely determines the distribution of the random variable,that this is the distribution of (n − 1)S 2 /σ 2 .Summing up, we have shown the following.Proposition 2.5 If X 1 ,...,X n are independent and identically distributednormal random variables with mean μ and variance σ 2 , then the sample mean ¯Xand the sample variance S 2 are independent. ¯X is a normal random variable withmean μ and variance σ 2 /n; (n − 1)S 2 /σ 2 is a chi-squared random variable withn − 1 degrees of freedom.
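Proposition 2.5 can be illustrated by simulation: for repeated normal samples, (n − 1)S²/σ² should have the chi-squared mean n − 1 and variance 2(n − 1), and X̄ should be uncorrelated with S². The sketch below uses sample size and parameters of our own choosing.

```python
import random
import statistics

random.seed(3)
n, mu, sigma = 5, 10.0, 2.0
reps = 100_000

pairs = []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    pairs.append((xbar, (n - 1) * s2 / sigma ** 2))

qs = [q for _, q in pairs]
q_mean = sum(qs) / reps             # chi-squared mean: n - 1 = 4
q_var = statistics.variance(qs)     # chi-squared variance: 2(n - 1) = 8
xbar_mean = sum(x for x, _ in pairs) / reps
cov_xq = sum((x - xbar_mean) * (q - q_mean) for x, q in pairs) / reps
```

The near-zero sample covariance `cov_xq` reflects the full independence of X̄ and S² proved above (for jointly normal quantities, zero covariance and independence coincide).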


2.7. Limit Theorems

We start this section by proving a result known as Markov's inequality.

Proposition 2.6 (Markov's Inequality) If X is a random variable that takes only nonnegative values, then for any value a > 0

P{X ≥ a} ≤ E[X]/a

Proof  We give a proof for the case where X is continuous with density f:

E[X] = ∫_0^∞ x f(x) dx
     = ∫_0^a x f(x) dx + ∫_a^∞ x f(x) dx
     ≥ ∫_a^∞ x f(x) dx
     ≥ ∫_a^∞ a f(x) dx
     = a ∫_a^∞ f(x) dx
     = a P{X ≥ a}

and the result is proven.

As a corollary, we obtain the following.

Proposition 2.7 (Chebyshev's Inequality) If X is a random variable with mean μ and variance σ², then, for any value k > 0,

P{|X − μ| ≥ k} ≤ σ²/k²

Proof  Since (X − μ)² is a nonnegative random variable, we can apply Markov's inequality (with a = k²) to obtain

P{(X − μ)² ≥ k²} ≤ E[(X − μ)²]/k²


But since (X − μ)² ≥ k² if and only if |X − μ| ≥ k, the preceding is equivalent to

P{|X − μ| ≥ k} ≤ E[(X − μ)²]/k² = σ²/k²

and the proof is complete.

The importance of Markov's and Chebyshev's inequalities is that they enable us to derive bounds on probabilities when only the mean, or both the mean and the variance, of the probability distribution are known. Of course, if the actual distribution were known, then the desired probabilities could be exactly computed, and we would not need to resort to bounds.

Example 2.49 Suppose we know that the number of items produced in a factory during a week is a random variable with mean 500.

(a) What can be said about the probability that this week's production will be at least 1000?
(b) If the variance of a week's production is known to equal 100, then what can be said about the probability that this week's production will be between 400 and 600?

Solution: Let X be the number of items that will be produced in a week.

(a) By Markov's inequality,

P{X ≥ 1000} ≤ E[X]/1000 = 500/1000 = 1/2

(b) By Chebyshev's inequality,

P{|X − 500| ≥ 100} ≤ σ²/(100)² = 1/100

Hence,

P{|X − 500| < 100} ≥ 1 − 1/100 = 99/100

and so the probability that this week's production will be between 400 and 600 is at least 0.99.

The following theorem, known as the strong law of large numbers, is probably the most well-known result in probability theory. It states that the average of a sequence of independent random variables having the same distribution will, with probability 1, converge to the mean of that distribution.
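To see how conservative Markov's bound from Example 2.49(a) can be, the sketch below compares it with the exact tail probability of one particular distribution having mean 500. The exponential choice is ours; the example itself assumes only the mean.

```python
import math

mean = 500.0
lam = 1.0 / mean   # exponential rate giving E[X] = 500

def tail(a):
    # P{X >= a} for an exponential(lam) random variable
    return math.exp(-lam * a)

markov_bound = mean / 1000.0   # Markov: P{X >= 1000} <= E[X]/1000 = 1/2
exact = tail(1000.0)           # e^{-2} ≈ 0.1353, well under the bound
```

The bound 1/2 holds for every nonnegative distribution with mean 500; for this particular one the true probability is about 0.135.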


Theorem 2.1 (Strong Law of Large Numbers) Let X_1, X_2, ... be a sequence of independent random variables having a common distribution, and let E[X_i] = μ. Then, with probability 1,

(X_1 + X_2 + ··· + X_n)/n → μ   as n → ∞

As an example of the preceding, suppose that a sequence of independent trials is performed. Let E be a fixed event and denote by P(E) the probability that E occurs on any particular trial. Letting

X_i = 1 if E occurs on the ith trial, and 0 if E does not occur on the ith trial,

we have by the strong law of large numbers that, with probability 1,

(X_1 + ··· + X_n)/n → E[X] = P(E)     (2.24)

Since X_1 + ··· + X_n represents the number of times that the event E occurs in the first n trials, we may interpret Equation (2.24) as stating that, with probability 1, the limiting proportion of time that the event E occurs is just P(E).

Running neck and neck with the strong law of large numbers for the honor of being probability theory's number one result is the central limit theorem. Besides its theoretical interest and importance, this theorem provides a simple method for computing approximate probabilities for sums of independent random variables. It also explains the remarkable fact that the empirical frequencies of so many natural "populations" exhibit a bell-shaped (that is, normal) curve.

Theorem 2.2 (Central Limit Theorem) Let X_1, X_2, ... be a sequence of independent, identically distributed random variables, each with mean μ and variance σ². Then the distribution of

(X_1 + X_2 + ··· + X_n − nμ)/(σ√n)

tends to the standard normal as n → ∞. That is,

P{(X_1 + X_2 + ··· + X_n − nμ)/(σ√n) ≤ a} → (1/√(2π)) ∫_{−∞}^a e^{−x²/2} dx

as n → ∞.

Note that like the other results of this section, this theorem holds for any distribution of the X_i's; herein lies its power.
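Both theorems are easy to see in simulation. The sketch below (sample sizes ours) draws uniform(0, 1) variables: the running average settles near 1/2, as Theorem 2.1 predicts, and the standardized sum of n = 30 draws lands below 1 about Φ(1) ≈ 0.8413 of the time, as Theorem 2.2 predicts.

```python
import math
import random

random.seed(5)

# Strong law: average of 200,000 uniform(0, 1) draws should be near 1/2.
m = sum(random.random() for _ in range(200_000)) / 200_000

# Central limit theorem: fraction of standardized sums of n uniforms
# falling at or below 1 should be near Phi(1).
n, reps = 30, 50_000
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)   # mean and sd of uniform(0, 1)
hits = 0
for _ in range(reps):
    s = sum(random.random() for _ in range(n))
    if (s - n * mu) / (sigma * math.sqrt(n)) <= 1.0:
        hits += 1
frac = hits / reps
```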


80 2 Random VariablesIf X is binomially distributed with parameters n and p, then X has the samedistribution as the sum of n independent Bernoulli random variables, each withparameter p. (Recall that the Bernoulli random variable is just a binomial randomvariable whose parameter n equals 1.) Hence, the distribution ofX − E[X]√ = √ X − npVar(X) np(1 − p)approaches the standard normal distribution as n approaches ∞. The normalapproximation will, in general, be quite good for values of n satisfyingnp(1 − p) 10.Example 2.50 (Normal Approximation to the Binomial) Let X be the numberof times that a fair coin, flipped 40 times, lands heads. Find the probability thatX = 20. Use the normal approximation and then compare it to the exact solution.Solution: Since the binomial is a discrete random variable, and the normala continuous random variable, it leads to a better approximation to write thedesired probability asP {X = 20}=P {19.5


2.7. Limit Theorems 81Table 2.3Area (x) under the Standard Normal Curve to the Left of xx 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.53590.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5597 0.5636 0.5675 0.5714 0.57530.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.61410.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.65170.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.68790.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.72240.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.75490.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.78520.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.81330.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.83891.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8557 0.8599 0.86211.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.88301.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.90151.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.91771.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.93191.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.94411.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.95451.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.96331.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.97061.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.97672.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.98172.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.98572.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.98902.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.99162.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.99362.5 0.9938 0.9940 0.9941 0.9943 
0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998

The exact result is

P{X = 20} = C(40, 20) (1/2)^{40}

which can be shown to equal 0.1254.
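Example 2.50 can be recomputed directly. The sketch below (helper names ours) evaluates the exact binomial probability and the continuity-corrected normal approximation; the two agree to about two decimal places, and the exact value is approximately 0.1254.

```python
import math

def binom_pmf(n, k, p):
    # exact binomial probability P{X = k}
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, p = 40, 0.5
mean, sd = n * p, math.sqrt(n * p * (1 - p))   # 20 and sqrt(10)

exact = binom_pmf(n, 20, p)
# continuity correction: P{X = 20} ≈ P{19.5 < X < 20.5}
approx = norm_cdf((20.5 - mean) / sd) - norm_cdf((19.5 - mean) / sd)
```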


82 2 Random VariablesExample 2.51 Let X i ,i = 1, 2,...,10 be independent random variables,each being uniformly distributed over (0, 1). Estimate P { ∑ 101 X i > 7}.Solution: Since E[X i ]= 1 2 ,Var(X i) =12 1 we have by the central limit theoremthat⎧⎫{ 10∑} ⎪⎨ ∑ 10P X i > 7 = P 1 X i − 5√ ( )1⎪⎩> 7 − 5 ⎪⎬√ ( )10 11210 112 ⎪⎭≈ 1 − (2.2)= 0.0139 Example 2.52 The lifetime of a special type of battery is a random variablewith mean 40 hours and standard deviation 20 hours. A battery is used until itfails, at which point it is replaced by a new one. Assuming a stockpile of 25 suchbatteries, the lifetimes of which are independent, approximate the probability thatover 1100 hours of use can be obtained.Solution: If we let X i denote the lifetime of the ith battery to be put inuse, then we desire p = P {X 1 +···+X 25 > 1100}, which is approximated asfollows:{ }X1 +···+X 25 − 1000p = P20 √ 1100 − 1000>2520 √ 25≈ P {N(0, 1)>1}= 1 − (1)≈ 0.1587 We now present a heuristic proof of the Central Limit theorem. Suppose firstthat the X i have mean 0 and variance 1, and let E[e tX ] denote their commonmoment generating function. Then, the moment generating function of X 1 + √n··· + X nis[ { ( )}]X1 +···+X nE exp t √ = E [ e tX 1/ √n e tX 2/ √n ···e tX n/ √ n ]n= ( E [ e tX/√ n ]) nby independenceNow, for n large, we obtain from the Taylor series expansion of e y thate tX/√n ≈ 1 + tX √ n+ t2 X 22n


2.8. Stochastic Processes 83Taking expectations shows that when n is largeE [ e tX/√ n ] ≈ 1 + tE[X] √ n+ t2 E[X 2 ]2n= 1 + t2because E[X]=0, E[X 2 ]=12nTherefore, we obtain that when n is large[ { ( )}] ) nX1 +···+X nE exp t √ ≈(1 + t2n 2nWhen n goes to ∞ the approximation can be shown to become exact and we havethat[ { ( )}]lim E X1 +···+X nexp t √ = e t2 /2n→∞ nThus, the moment generating function of X 1+ √n··· +X nconverges to the momentgenerating function of a (standard) normal random variable with mean 0 and variance1. Using this, it can be proven that the distribution function of the randomvariable X 1+ √n··· +X nconverges to the standard normal distribution function .When the X i have mean μ and variance σ 2 , the random variables X i−μσhavemean 0 and variance 1. Thus, the preceding shows that{ }X1 − μ + X 2 − μ +···+X n − μPσ √ a → (a)nwhich proves the central limit theorem.2.8. Stochastic ProcessesA stochastic process {X(t),t ∈ T } is a collection of random variables. That is,for each t ∈ T,X(t)is a random variable. The index t is often interpreted as timeand, as a result, we refer to X(t) as the state of the process at time t. For example,X(t) might equal the total number of customers that have entered a supermarketby time t; or the number of customers in the supermarket at time t; or the totalamount of sales that have been recorded in the market by time t; etc.The set T is called the index set of the process. When T is a countable setthe stochastic process is said to be a discrete-time process. If T is an interval ofthe real line, the stochastic process is said to be a continuous-time process. Forinstance, {X n ,n= 0, 1,...} is a discrete-time stochastic process indexed by the


84 2 Random Variablesnonnegative integers; while {X(t),t 0} is a continuous-time stochastic processindexed by the nonnegative real numbers.The state space of a stochastic process is defined as the set of all possible valuesthat the random variables X(t) can assume.Thus, a stochastic process is a family of random variables that describes theevolution through time of some (physical) process. We shall see much of stochasticprocesses in the following chapters of this text.Example 2.53 Consider a particle that moves along a set of m + 1 nodes,labeled 0, 1,...,m, that are arranged around a circle (see Figure 2.3). At eachstep the particle is equally likely to move one position in either the clockwise orcounterclockwise direction. That is, if X n is the position of the particle after itsnth step thenP {X n+1 = i + 1|X n = i}=P {X n+1 = i − 1|X n = i}= 1 2where i + 1 ≡ 0 when i = m, and i − 1 ≡ m when i = 0. Suppose now that theparticle starts at 0 and continues to move around according to the preceding rulesuntil all the nodes 1, 2,...,mhave been visited. What is the probability that nodei, i = 1,...,m, is the last one visited?Solution: Surprisingly enough, the probability that node i is the last nodevisited can be determined without any computations. To do so, consider thefirst time that the particle is at one of the two neighbors of node i, that is, thefirst time that the particle is at one of the nodes i − 1ori + 1 (with m + 1 ≡ 0).Suppose it is at node i −1 (the argument in the alternative situation is identical).Since neither node i nor i + 1 has yet been visited, it follows that i will be thelast node visited if and only if i + 1 is visited before i. This is so because inFigure 2.3. Particle moving around a circle.


Exercises 85order to visit i + 1 before i the particle will have to visit all the nodes on thecounterclockwise path from i − 1toi + 1 before it visits i. But the probabilitythat a particle at node i − 1 will visit i + 1 before i is just the probability that aparticle will progress m−1 steps in a specified direction before progressing onestep in the other direction. That is, it is equal to the probability that a gamblerwho starts with one unit, and wins one when a fair coin turns up heads and losesone when it turns up tails, will have his fortune go up by m − 1 before he goesbroke. Hence, because the preceding implies that the probability that node i isthe last node visited is the same for all i, and because these probabilities mustsum to 1, we obtainP {i is the last node visited}=1/m, i = 1,...,m Remark The argument used in Example 2.53 also shows that a gambler who isequally likely to either win or lose one unit on each gamble will be down n beforebeing up 1 with probability 1/(n + 1); or equivalentlyP {gambler is up 1 before being down n}=nn + 1Suppose now we want the probability that the gambler is up 2 before beingdown n. Upon conditioning on whether he reaches up 1 before down n, we obtainthatP {gambler is up 2 before being down n}n= P {up 2 before down n|up 1 before down n}n + 1n= P {up 1 before down n + 1}n + 1= n + 1 nn + 2 n + 1 = nn + 2Repeating this argument yields thatP {gambler is up k before being down n}=nn + kExercises1. An urn contains five red, three orange, and two blue balls. Two balls arerandomly selected. What is the sample space of this experiment? Let X representthe number of orange balls selected. What are the possible values of X? CalculateP {X = 0}.


86 2 Random Variables2. Let X represent the difference between the number of heads and the numberof tails obtained when a coin is tossed n times. What are the possible values of X?3. In Exercise 2, if the coin is assumed fair, then, for n = 2, what are the probabilitiesassociated with the values that X can take on?*4. Suppose a die is rolled twice. What are the possible values that the followingrandom variables can take on?(i) The maximum value to appear in the two rolls.(ii) The minimum value to appear in the two rolls.(iii) The sum of the two rolls.(iv) The value of the first roll minus the value of the second roll.5. If the die in Exercise 4 is assumed fair, calculate the probabilities associatedwith the random variables in (i)–(iv).6. Suppose five fair coins are tossed. Let E be the event that all coins landheads. Define the random variable I EI E ={ 1, if E occurs0, if E c occursFor what outcomes in the original sample space does I E equal 1? What isP {I E = 1}?7. Suppose a coin having probability 0.7 of coming up heads is tossed threetimes. Let X denote the number of heads that appear in the three tosses. Determinethe probability mass function of X.8. Suppose the distribution function of X is given by⎧⎨0, b


Exercises 8710. Suppose three fair dice are rolled. What is the probability at most one sixappears?*11. A ball is drawn from an urn containing three white and three black balls.After the ball is drawn, it is then replaced and another ball is drawn. This goes onindefinitely. What is the probability that of the first four balls drawn, exactly twoare white?12. On a multiple-choice exam with three possible answers for each of the fivequestions, what is the probability that a student would get four or more correctanswers just by guessing?13. An individual claims to have extrasensory perception (ESP). As a test, a faircoin is flipped ten times, and he is asked to predict in advance the outcome. Ourindividual gets seven out of ten correct. What is the probability he would havedone at least this well if he had no ESP? (Explain why the relevant probability isP {X 7} and not P {X = 7}.)14. Suppose X has a binomial distribution with parameters 6 and 1 2. Show thatX = 3 is the most likely outcome.15. Let X be binomially distributed with parameters n and p. Show that as kgoes from 0 to n, P(X= k) increases monotonically, then decreases monotonicallyreaching its largest value(a) in the case that (n + 1)p is an integer, when k equals either (n + 1)p − 1or (n + 1)p,(b) in the case that (n + 1)p is not an integer, when k satisfies (n + 1)p − 1


88 2 Random Variables

18. Show that when r = 2 the multinomial reduces to the binomial.

19. In Exercise 17, let X_i denote the number of times the ith outcome appears, i = 1, ..., r. What is the probability mass function of X_1 + X_2 + ... + X_k?

20. A television store owner figures that 50 percent of the customers entering his store will purchase an ordinary television set, 20 percent will purchase a color television set, and 30 percent will just be browsing. If five customers enter his store on a certain day, what is the probability that two customers purchase color sets, one customer purchases an ordinary set, and two customers purchase nothing?

21. In Exercise 20, what is the probability that our store owner sells three or more televisions on that day?

22. If a fair coin is successively flipped, find the probability that a head first appears on the fifth trial.

*23. A coin having probability p of coming up heads is successively flipped until the rth head appears. Argue that X, the number of flips required, will be n, n ≥ r, with probability

P{X = n} = C(n − 1, r − 1) p^r (1 − p)^{n−r},   n ≥ r

This is known as the negative binomial distribution.

Hint: How many successes must there be in the first n − 1 trials?

24. The probability mass function of X is given by

p(k) = C(r + k − 1, r − 1) p^r (1 − p)^k,   k = 0, 1, ...

Give a possible interpretation of the random variable X.

Hint: See Exercise 23.

In Exercises 25 and 26, suppose that two teams are playing a series of games, each of which is independently won by team A with probability p and by team B with probability 1 − p. The winner of the series is the first team to win i games.

25. If i = 4, find the probability that a total of 7 games are played. Also show that this probability is maximized when p = 1/2.

26. Find the expected number of games that are played when

(a) i = 2;
(b) i = 3.

In both cases, show that this number is maximized when p = 1/2.
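Exercise 25 can be checked numerically: a best-of-seven series (i = 4) goes 7 games exactly when the first six games split 3–3, so P{7 games} = C(6, 3) p^3 (1 − p)^3. The sketch below (not from the text) evaluates this on a grid of p values and confirms the maximum is at p = 1/2:

```python
from math import comb

def prob_seven_games(p):
    """P{series lasts 7 games} = P{first 6 games split 3-3}
    = C(6,3) p^3 (1-p)^3   (Exercise 25 with i = 4)."""
    return comb(6, 3) * p**3 * (1 - p)**3

# scan p over a grid; the maximizer should be p = 1/2,
# where the probability equals 20/64 = 0.3125
grid = [k / 100 for k in range(1, 100)]
best = max(grid, key=prob_seven_games)
```

The symmetry of p^3 (1 − p)^3 about p = 1/2 is what makes 1/2 the maximizer, mirroring the calculus argument the exercise asks for.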


Exercises 89

*27. A fair coin is independently flipped n times, k times by A and n − k times by B. Show that the probability that A and B flip the same number of heads is equal to the probability that there are a total of k heads.

28. Suppose that we want to generate a random variable X that is equally likely to be either 0 or 1, and that all we have at our disposal is a biased coin that, when flipped, lands on heads with some (unknown) probability p. Consider the following procedure:

1. Flip the coin, and let O_1, either heads or tails, be the result.
2. Flip the coin again, and let O_2 be the result.
3. If O_1 and O_2 are the same, return to step 1.
4. If O_2 is heads, set X = 0, otherwise set X = 1.

(a) Show that the random variable X generated by this procedure is equally likely to be either 0 or 1.
(b) Could we use a simpler procedure that continues to flip the coin until the last two flips are different, and then sets X = 0 if the final flip is a head, and sets X = 1 if it is a tail?

29. Consider n independent flips of a coin having probability p of landing heads. Say a changeover occurs whenever an outcome differs from the one preceding it. For instance, if the results of the flips are HHTHTHHT, then there are a total of five changeovers. If p = 1/2, what is the probability there are k changeovers?

30. Let X be a Poisson random variable with parameter λ. Show that P{X = i} increases monotonically and then decreases monotonically as i increases, reaching its maximum when i is the largest integer not exceeding λ.

Hint: Consider P{X = i}/P{X = i − 1}.

31. Compare the Poisson approximation with the correct binomial probability for the following cases:

(i) P{X = 2} when n = 8, p = 0.1.
(ii) P{X = 9} when n = 10, p = 0.95.
(iii) P{X = 0} when n = 10, p = 0.1.
(iv) P{X = 4} when n = 9, p = 0.2.

32.
If you buy a lottery ticket in 50 lotteries, in each of which your chance of winning a prize is 1/100, what is the (approximate) probability that you will win a prize (a) at least once, (b) exactly once, (c) at least twice?
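For Exercise 32 the number of prizes won is binomial with n = 50 and p = 1/100, which is well approximated by a Poisson random variable with λ = np = 0.5. A sketch (not from the text) comparing the exact binomial answers with the Poisson approximations:

```python
from math import comb, exp, factorial

n, p = 50, 1 / 100
lam = n * p  # Poisson parameter λ = 0.5

binom = lambda k: comb(n, k) * p**k * (1 - p)**(n - k)
poisson = lambda k: exp(-lam) * lam**k / factorial(k)

# (a) at least one prize, (b) exactly one, (c) at least two;
# each value is a pair (exact binomial, Poisson approximation)
answers = {
    'at_least_once': (1 - binom(0), 1 - poisson(0)),
    'exactly_once': (binom(1), poisson(1)),
    'at_least_twice': (1 - binom(0) - binom(1), 1 - poisson(0) - poisson(1)),
}
```

The two columns agree to about two decimal places, e.g. part (a) is approximately 1 − e^{−0.5} ≈ 0.39.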


90 2 Random Variables33. Let X be a random variable with probability density{c(1 − xf(x)=2 ), −1


Exercises 91

40. Suppose that two teams are playing a series of games, each of which is independently won by team A with probability p and by team B with probability 1 − p. The winner of the series is the first team to win four games. Find the expected number of games that are played, and evaluate this quantity when p = 1/2.

41. Consider the case of arbitrary p in Exercise 29. Compute the expected number of changeovers.

42. Suppose that each coupon obtained is, independent of what has been previously obtained, equally likely to be any of m different types. Find the expected number of coupons one needs to obtain in order to have at least one of each type.

Hint: Let X be the number needed. It is useful to represent X by

X = Σ_{i=1}^{m} X_i

where each X_i is a geometric random variable.

43. An urn contains n + m balls, of which n are red and m are black. They are withdrawn from the urn, one at a time and without replacement. Let X be the number of red balls removed before the first black ball is chosen. We are interested in determining E[X]. To obtain this quantity, number the red balls from 1 to n. Now define the random variables X_i, i = 1, ..., n, by

X_i = 1, if red ball i is taken before any black ball is chosen;
      0, otherwise

(a) Express X in terms of the X_i.
(b) Find E[X].

44. In Exercise 43, let Y denote the number of red balls chosen after the first but before the second black ball has been chosen.

(a) Express Y as the sum of n random variables, each of which is equal to either 0 or 1.
(b) Find E[Y].
(c) Compare E[Y] to E[X] obtained in Exercise 43.
(d) Can you explain the result obtained in part (c)?

45. A total of r keys are to be put, one at a time, in k boxes, with each key independently being put in box i with probability p_i, Σ_{i=1}^{k} p_i = 1. Each time a key is put in a nonempty box, we say that a collision occurs. Find the expected number of collisions.
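The hint in Exercise 42 leads to E[X] = m(1 + 1/2 + ··· + 1/m): after i − 1 distinct types have been seen, the next new type takes a geometric number of draws with mean m/(m − i + 1). A simulation sketch (not from the text; m = 10, the seed, and the trial count are illustrative):

```python
import random

def coupons_needed(m, rng):
    """Draw coupons uniformly at random from m types
    until all m types have been seen; return the number of draws."""
    seen, draws = set(), 0
    while len(seen) < m:
        seen.add(rng.randrange(m))
        draws += 1
    return draws

m = 10
# E[X] = m * (1 + 1/2 + ... + 1/m) = m * H_m, about 29.29 for m = 10
expected = m * sum(1 / i for i in range(1, m + 1))

rng = random.Random(1)
trials = 20000
average = sum(coupons_needed(m, rng) for _ in range(trials)) / trials
```

The simulated average settles near m·H_m, the harmonic-sum formula the hint's geometric decomposition produces.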


92 2 Random Variables46. If X is a nonnegative integer valued random variable, show thatHint:∞∑∞∑E[X]= P {X n}= P {X>n}n=1n=0Define the sequence of random variables I n , n 1, by{ 1, if n XI n =0, if n>XNow express X in terms of the I n .*47. Consider three trials, each of which is either a success or not. Let X denotethe number of successes. Suppose that E[X]=1.8.(a) What is the largest possible value of P {X = 3}?(b) What is the smallest possible value of P {X = 3}?In both cases, construct a probability scenario that results in P {X = 3} having thedesired value.48. If X is uniformly distributed over (0,1), calculate E[X 2 ].*49. Prove that E[X 2 ] (E[X]) 2 . When do we have equality?50. Let c be a constant. Show that(i) Var(cX) = c 2 Var(X);(ii) Var(c + X) = Var(X).51. A coin, having probability p of landing heads, is flipped until head appearsfor the rth time. Let N denote the number of flips required. Calculate E[N].Hint: There is an easy way of doing this. It involves writing N as the sum ofr geometric random variables.52. (a) Calculate E[X] for the maximum random variable of Exercise 37.(b) Calculate E[X] for X as in Exercise 33.(c) Calculate E[X] for X as in Exercise 34.53. If X is uniform over (0, 1), calculate E[X n ] and Var(X n ).54. Let X and Y each take on either the value 1 or −1. Letp(1, 1) = P {X = 1, Y= 1},p(1, −1) = P {X = 1, Y=−1},p(−1, 1) = P {X =−1, Y= 1},p(−1, −1) = P {X =−1, Y=−1}


Exercises 93Suppose that E[X]=E[Y ]=0. Show that(a) p(1, 1) = p(−1, −1);(b) p(1, −1) = p(−1, 1).Let p = 2p(1, 1).Find(c) Var(X);(d) Var(Y );(e) Cov(X, Y ).55. Let X be a positive random variable having density function f(x).Iff(x) c for all x, show that, for a>0,P {X>a} 1 − ac56. There are n types of coupons. Each newly obtained coupon is, independently,type i with probability p i , i = 1,...,n. Find the expected number and thevariance of the number of distinct types obtained in a collection of k coupons.57. Suppose that X and Y are independent binomial random variables with parameters(n, p) and (m, p). Argue probabilistically (no computations necessary)that X + Y is binomial with parameters (n + m, p).58. An urn contains 2n balls, of which r are red. The balls are randomly removedin n successive pairs. Let X denote the number of pairs in which bothballs are red.(a) Find E[X].(b) Find Var(X).59. Let X 1 ,X 2 ,X 3 , and X 4 be independent continuous random variables witha common distribution function F and letp = P {X 1 X 3


94 2 Random Variablesentiating the moment generating function and then compare the obtained resultwith a direct derivation of these moments.62. In deciding upon the appropriate premium to charge, insurance companiessometimes use the exponential principle, defined as follows. With X as the randomamount that it will have to pay in claims, the premium charged by the insurancecompany isP = 1 a ln( E[e aX ] )where a is some specified positive constant. Find P when X is an exponentialrandom variable with parameter λ, and a = αλ, where 0 ε → 0 asn →∞67. Suppose that X is a random variable with mean 10 and variance 15. Whatcan we say about P {5


Exercises 95(i) Compute P {X = i}.(ii) Let, for i = 1, 2,...,k; j = 1, 2,...,n,{ 1, if the ith ball selected is whiteX i =0, otherwise{ 1, if white ball j is selectedY j =0, otherwiseCompute E[X] in two ways by expressing X first as a function of the X i s andthen of the Y j s.*72. Show that Var(X) = 1 when X is the number of men who select their ownhats in Example 2.31.73. For the multinomial distribution (Exercise 17), let N i denote the number oftimes outcome i occurs. Find(i) E[N i ];(ii) Var(N i );(iii) Cov(N i ,N j );(iv) Compute the expected number of outcomes that do not occur.74. Let X 1 ,X 2 ,...be a sequence of independent identically distributed continuousrandom variables. We say that a record occurs at time n if X n > max(X 1 ,...,X n−1 ). That is, X n is a record if it is larger than each of X 1 ,...,X n−1 . Show(i) P {a record occurs at time n}=1/n;(ii) E[number of records by time n]= ∑ ni=1 1/i;(iii) Var(number of records by time n) = ∑ ni=1 (i − 1)/i 2 ;(iv) Let N = min{n: n>1 and a record occurs at time n}. Show E[N]=∞.Hint: For (ii) and (iii) represent the number of records as the sum of indicator(that is, Bernoulli) random variables.75. Let a 1


96 2 Random Variables(i) Show that N 1 ,...,N n are independent random variables.(ii) What is the distribution of N i ?(iii) Compute E[N] and Var(N).76. Let X and Y be independent random variables with means μ x and μ y andvariances σx 2 and σ y 2 . Show thatVar(XY ) = σ 2 x σ 2 y + μ2 y σ 2 x + μ2 x σ 2 y77. Let X and Y be independent normal random variables, each having parametersμ and σ 2 . Show that X + Y is independent of X − Y.Hint:Find their joint moment generating function.78. Let φ(t 1 ,...,t n ) denote the joint moment generating function of X 1 ,...,X n .(a) Explain how the moment generating function of X i ,φ Xi (t i ), can be obtainedfrom φ(t 1 ,...,t n ).(b) Show that X 1 ,...,X n are independent if and only ifφ(t 1 ,...,t n ) = φ x1 (t 1 ) ···φ Xn (t n )References1. W. Feller, “An Introduction to Probability Theory and Its Applications,” Vol. I,John Wiley, New York, 1957.2. M. Fisz, “Probability Theory and Mathematical Statistics,” John Wiley, New York,1963.3. E. Parzen, “Modern Probability Theory and Its Applications,” John Wiley, New York,1960.4. S. Ross, “A First Course in Probability,” Seventh Edition, Prentice Hall, New Jersey,2006.


Conditional Probability and Conditional Expectation

3

3.1. Introduction

One of the most useful concepts in probability theory is that of conditional probability and conditional expectation. The reason is twofold. First, in practice, we are often interested in calculating probabilities and expectations when some partial information is available; hence, the desired probabilities and expectations are conditional ones. Secondly, in calculating a desired probability or expectation it is often extremely useful to first “condition” on some appropriate random variable.

3.2. The Discrete Case

Recall that for any two events E and F, the conditional probability of E given F is defined, as long as P(F) > 0, by

P(E|F) = P(EF)/P(F)

Hence, if X and Y are discrete random variables, then it is natural to define the conditional probability mass function of X given that Y = y, by

p_{X|Y}(x|y) = P{X = x | Y = y}
= P{X = x, Y = y}/P{Y = y}
= p(x, y)/p_Y(y)

97


98 3 Conditional Probability and Conditional Expectationfor all values of y such that P {Y = y} > 0. Similarly, the conditional probabilitydistribution function of X given that Y = y is defined, for all y such thatP {Y = y} > 0, byF X|Y (x|y) = P {X x|Y = y}= ∑ axp X|Y (a|y)Finally, the conditional expectation of X given that Y = y is defined byE[X|Y = y]= ∑ xxP{X = x|Y = y}= ∑ xxp X|Y (x|y)In other words, the definitions are exactly as before with the exception thateverything is now conditional on the event that Y = y. IfX is independent of Y ,then the conditional mass function, distribution, and expectation are the same asthe unconditional ones. This follows, since if X is independent of Y , thenp X|Y (x|y) = P {X = x|Y = y}= P {X = x}Example 3.1 Suppose that p(x,y), the joint probability mass function of Xand Y , is given byp(1, 1) = 0.5, p(1, 2) = 0.1, p(2, 1) = 0.1, p(2, 2) = 0.3Calculate the conditional probability mass function of X given that Y = 1.Solution:Hence,We first note thatp Y (1) = ∑ xp(x,1) = p(1, 1) + p(2, 1) = 0.6p X|Y (1|1) = P {X = 1|Y = 1}==P {X = 1,Y = 1}P {Y = 1}p(1, 1)p Y (1)= 5 6


3.2. The Discrete Case 99

Similarly,

p_{X|Y}(2|1) = p(2, 1)/p_Y(1) = 1/6

Example 3.2 If X_1 and X_2 are independent binomial random variables with respective parameters (n_1, p) and (n_2, p), calculate the conditional probability mass function of X_1 given that X_1 + X_2 = m.

Solution: With q = 1 − p,

P{X_1 = k | X_1 + X_2 = m} = P{X_1 = k, X_1 + X_2 = m} / P{X_1 + X_2 = m}
= P{X_1 = k, X_2 = m − k} / P{X_1 + X_2 = m}
= P{X_1 = k} P{X_2 = m − k} / P{X_1 + X_2 = m}
= [C(n_1, k) p^k q^{n_1−k}] [C(n_2, m − k) p^{m−k} q^{n_2−m+k}] / [C(n_1 + n_2, m) p^m q^{n_1+n_2−m}]

where we have used that X_1 + X_2 is a binomial random variable with parameters (n_1 + n_2, p) (see Example 2.44). Thus, the conditional probability mass function of X_1, given that X_1 + X_2 = m, is

P{X_1 = k | X_1 + X_2 = m} = C(n_1, k) C(n_2, m − k) / C(n_1 + n_2, m)   (3.1)

The distribution given by Equation (3.1), first seen in Example 2.34, is known as the hypergeometric distribution. It is the distribution of the number of blue balls that are chosen when a sample of m balls is randomly chosen from an urn that contains n_1 blue and n_2 red balls. (To intuitively see why the conditional distribution is hypergeometric, consider n_1 + n_2 independent trials that each result in a success with probability p; let X_1 represent the number of successes in the first n_1 trials and let X_2 represent the number of successes in the final n_2 trials. Because all trials have the same probability of being a success, each of the C(n_1 + n_2, m) subsets of m trials is equally likely to be the success trials; thus, the number of the m success trials that are among the first n_1 trials is a hypergeometric random variable.)
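Equation (3.1) can be verified numerically: the conditional pmf computed directly from the two binomial pmfs matches the hypergeometric pmf term by term, and the success probability p cancels out entirely. A sketch with illustrative parameter values (not from the text):

```python
from math import comb

def binom_pmf(n, p, k):
    """P{Bin(n, p) = k}."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n1, n2, p, m = 6, 8, 0.3, 5  # illustrative values

# left side: P{X1 = k | X1 + X2 = m} from the binomial pmfs
denom = binom_pmf(n1 + n2, p, m)
conditional = [binom_pmf(n1, p, k) * binom_pmf(n2, p, m - k) / denom
               for k in range(m + 1)]

# right side: the hypergeometric pmf of Equation (3.1)
hypergeom = [comb(n1, k) * comb(n2, m - k) / comb(n1 + n2, m)
             for k in range(m + 1)]
```

That the result is free of p reflects the intuition given in the parenthetical remark: given the total number of successes, every subset of m trials is equally likely to be the success trials, whatever p is.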


100 3 Conditional Probability and Conditional Expectation

Example 3.3 If X and Y are independent Poisson random variables with respective means λ_1 and λ_2, calculate the conditional expected value of X given that X + Y = n.

Solution: Let us first calculate the conditional probability mass function of X given that X + Y = n. We obtain

P{X = k | X + Y = n} = P{X = k, X + Y = n}/P{X + Y = n}
= P{X = k, Y = n − k}/P{X + Y = n}
= P{X = k} P{Y = n − k}/P{X + Y = n}

where the last equality follows from the assumed independence of X and Y. Recalling (see Example 2.36) that X + Y has a Poisson distribution with mean λ_1 + λ_2, the preceding equation equals

P{X = k | X + Y = n} = [e^{−λ_1} λ_1^k / k!] [e^{−λ_2} λ_2^{n−k} / (n − k)!] [e^{−(λ_1+λ_2)} (λ_1 + λ_2)^n / n!]^{−1}
= [n! / ((n − k)! k!)] λ_1^k λ_2^{n−k} / (λ_1 + λ_2)^n
= C(n, k) (λ_1/(λ_1 + λ_2))^k (λ_2/(λ_1 + λ_2))^{n−k}

In other words, the conditional distribution of X given that X + Y = n is the binomial distribution with parameters n and λ_1/(λ_1 + λ_2). Hence,

E[X | X + Y = n] = n λ_1/(λ_1 + λ_2)

Example 3.4 Consider an experiment which results in one of three possible outcomes with outcome i occurring with probability p_i, i = 1, 2, 3, Σ_{i=1}^{3} p_i = 1. Suppose that n independent replications of this experiment are performed and let X_i, i = 1, 2, 3, denote the number of times outcome i appears. Determine the conditional expectation of X_1 given that X_2 = m.


3.2. The Discrete Case 101Solution: For k n − m,P {X 1 = k|X 2 = m}= P {X 1 = k,X 2 = m}P {X 2 = m}Now if X 1 = k and X 2 = m, then it follows that X 3 = n − k − m.However,P {X 1 = k, X 2 = m, X 3 = n − k − m}=n!k!m!(n − k − m)! pk 1 pm 2 p(n−k−m) 3(3.2)This follows since any particular sequence of the n experiments having outcome1 appearing k times, outcome 2 m times, and outcome 3 (n −k − m) times has probability p1 kpm 2 p(n−k−m) 3of occurring. Since there aren!/[k!m!(n − k − m)!] such sequences, Equation (3.2) follows.Therefore, we haveP {X 1 = k|X 2 = m}=n!k!m!(n − k − m)! pk 1 pm 2 p(n−k−m) 3n!m!(n − m)! pm 2 (1 − p 2) n−mwhere we have used the fact that X 2 has a binomial distribution with parametersn and p 2 . Hence,P {X 1 = k|X 2 = m}=or equivalently, writing p 3 = 1 − p 1 − p 2 ,( )(n − m)!k ( ) n−m−kp1 p3k!(n − m − k)! 1 − p 2 1 − p 2( )( ) n − mk (p1P {X 1 = k|X 2 = m}=1 − p ) n−m−k1k 1 − p 2 1 − p 2In other words, the conditional distribution of X 1 , given that X 2 = m, is binomialwith parameters n − m and p 1 /(1 − p 2 ). Consequently,p 1E[X 1 |X 2 = m]=(n − m) 1 − p 2Remarks (i) The desired conditional probability in Example 3.4 could alsohave been computed in the following manner. Consider the n − m experiments


102 3 Conditional Probability and Conditional Expectationthat did not result in outcome 2. For each of these experiments, the probabilitythat outcome 1 was obtained is given byP {outcome 1, not outcome 2}P {outcome 1|not outcome 2}=P {not outcome 2}= p 11 − p 2It therefore follows that, given X 2 = m, the number of times outcome 1 occurs isbinomially distributed with parameters n − m and p 1 /(1 − p 2 ).(ii) Conditional expectations possess all of the properties of ordinary expectations.For instance, such identities as[ n∑]n∑E X i |Y = y = E[X i |Y = y]i=1i=1remain valid.Example 3.5 There are n components. On a rainy day, component i will functionwith probability p i ; on a nonrainy day, component i will function with probabilityq i ,fori = 1,...,n. It will rain tomorrow with probability α. Calculate theconditional expected number of components that function tomorrow, given that itrains.Solution:Let{ 1, if component i functions tomorrowX i =0, otherwiseThen, with Y defined to equal 1 if it rains tomorrow, and 0 otherwise, the desiredconditional expectation is obtained as follows.[ n∑]n∑E X i |Y = 1 = E[X i |Y = 1]t=1i=1n∑= i=1p i3.3. The Continuous CaseIf X and Y have a joint probability density function f(x,y), then the conditionalprobability density function of X, given that Y = y, is defined for all values of y


3.3. The Continuous Case 103such that f Y (y) > 0, byf X|Y (x|y) = f(x,y)f Y (y)To motivate this definition, multiply the left side by dx and the right side by(dx dy)/dy to getf X|Y (x|y)dx =f(x,y)dx dyf Y (y) dyP {x X x + dx, y Y y + dy}≈P {y Y y + dy}= P {x X x + dx|y Y y + dy}In other words, for small values dx and dy, f X|Y (x|y)dx is approximately theconditional probability that X is between x and x + dx given that Y is between yand y + dy.The conditional expectation of X, given that Y = y, is defined for all values ofy such that f Y (y) > 0, by∫ ∞E[X|Y = y]= xf X|Y (x|y) dx−∞Example 3.6Suppose the joint density of X and Y is given by{ 6xy(2 − x − y), 0


104 3 Conditional Probability and Conditional ExpectationHence,E[X|Y = y]=∫ 106x 2 (2 − x − y)dx4 − 3y= (2 − y)2 − 6 44 − 3y= 5 − 4y 8 − 6yExample 3.7Suppose the joint density of X and Y is given by{ 4y(x − y)e −(x+y) , 0 y (by letting w = x − y)= (x − y)e −(x−y) , x >ywhere the final equality used that ∫ ∞0we −w dw is the expected value of anexponential random variable with mean 1. Therefore, with W being exponentialwith mean 1,E[X|Y = y]=∫ ∞yx(x − y)e −(x−y) dx


3.4. Computing Expectations by Conditioning 105=∫ ∞0(w + y)we −w dw= E[W 2 ]+yE[W ]= 2 + y Example 3.8The joint density of X and Y is given byf(x,y) ={ 12ye −xy , 0


106 3 Conditional Probability and Conditional Expectation

and Y,

E[X] = E[E[X|Y]]   (3.3)

If Y is a discrete random variable, then Equation (3.3) states that

E[X] = Σ_y E[X|Y = y] P{Y = y}   (3.3a)

while if Y is continuous with density f_Y(y), then Equation (3.3) says that

E[X] = ∫_{−∞}^{∞} E[X|Y = y] f_Y(y) dy   (3.3b)

We now give a proof of Equation (3.3) in the case where X and Y are both discrete random variables.

Proof of Equation (3.3) When X and Y Are Discrete We must show that

E[X] = Σ_y E[X|Y = y] P{Y = y}   (3.4)

Now, the right side of the preceding can be written

Σ_y E[X|Y = y] P{Y = y} = Σ_y Σ_x x P{X = x|Y = y} P{Y = y}
= Σ_y Σ_x x [P{X = x, Y = y}/P{Y = y}] P{Y = y}
= Σ_y Σ_x x P{X = x, Y = y}
= Σ_x x Σ_y P{X = x, Y = y}
= Σ_x x P{X = x}
= E[X]

and the result is obtained.

One way to understand Equation (3.4) is to interpret it as follows. It states that to calculate E[X] we may take a weighted average of the conditional expected value of X given that Y = y, each of the terms E[X|Y = y] being weighted by the probability of the event on which it is conditioned.

The following examples will indicate the usefulness of Equation (3.3).
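Before the worked examples, Equation (3.3a) can be checked numerically on the joint mass function of Example 3.1. A minimal sketch computing E[X] both directly and by the weighted average of conditional expectations:

```python
# joint pmf p(x, y) from Example 3.1
p = {(1, 1): 0.5, (1, 2): 0.1, (2, 1): 0.1, (2, 2): 0.3}

# marginal pmf of Y
p_Y = {y: sum(v for (_, y2), v in p.items() if y2 == y) for y in (1, 2)}

def cond_exp_X(y):
    """E[X | Y = y] = sum_x x * p(x, y) / p_Y(y)."""
    return sum(x * v for (x, y2), v in p.items() if y2 == y) / p_Y[y]

# direct computation of E[X] versus Equation (3.3a)
EX_direct = sum(x * v for (x, _), v in p.items())
EX_tower = sum(cond_exp_X(y) * p_Y[y] for y in (1, 2))
```

Here E[X|Y = 1] = 7/6 (in line with the conditional pmf 5/6, 1/6 computed in Example 3.1), and the two ways of computing E[X] agree.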


3.4. Computing Expectations by Conditioning 107Example 3.9 Sam will read either one chapter of his probability book or onechapter of his history book. If the number of misprints in a chapter of his probabilitybook is Poisson distributed with mean 2 and if the number of misprints in hishistory chapter is Poisson distributed with mean 5, then assuming Sam is equallylikely to choose either book, what is the expected number of misprints that Samwill come across?Solution:Letting X denote the number of misprints and letting{ 1, if Sam chooses his history bookY =2, if Sam chooses his probability bookthenE[X]=E[X|Y = 1]P {Y = 1}+E[X|Y = 2]P {Y = 2}= 5 ( 12)+ 2( 12)= 7 2Example 3.10 (The Expectation of the Sum of a Random Number of RandomVariables) Suppose that the expected number of accidents per week at an industrialplant is four. Suppose also that the numbers of workers injured in each accidentare independent random variables with a common mean of 2. Assume also thatthe number of workers injured in each accident is independent of the number ofaccidents that occur. What is the expected number of injuries during a week?Solution: Letting N denote the number of accidents and X i the numberinjured in the ith accident, i = 1, 2,..., then the total number of injuries canbe expressed as ∑ Ni=1 X i .Now[ N] [ [∑N]]∑E X i = E E X i |N11But[ N] [∑n∑E X i |N = n = E11[ n∑= E1]X i |N = n]X i by the independence of X i and N= nE[X]


108 3 Conditional Probability and Conditional Expectationwhich yields that[ N]∑E X i |N = NE[X]i=1and thus[ N]∑E X i = E [ NE[X] ] = E[N]E[X]i=1Therefore, in our example, the expected number of injuries during a weekequals 4 × 2 = 8. The random variable ∑ Ni=1 X i , equal to the sum of a random number N ofindependent and identically distributed random variables that are also independentof N, is called a compound random variable. As just shown in Example 3.10, theexpected value of a compound random variable is E[X]E[N]. Its variance willbe derived in Example 3.17.Example 3.11 (The Mean of a Geometric Distribution) A coin, having probabilityp of coming up heads, is to be successively flipped until the first headappears. What is the expected number of flips required?Solution:Let N be the number of flips required, and let{ 1, if the first flip results in a headY =0, if the first flip results in a tailNowE[N]=E[N|Y = 1]P {Y = 1}+E[N|Y = 0]P {Y = 0}= pE[N|Y = 1]+(1 − p)E[N|Y = 0] (3.5)However,E[N|Y = 1]=1, E[N|Y = 0]=1 + E[N] (3.6)To see why Equation (3.6) is true, consider E[N|Y = 1]. Since Y = 1, we knowthat the first flip resulted in heads and so, clearly, the expected number of flipsrequired is 1. On the other hand if Y = 0, then the first flip resulted in tails.However, since the successive flips are assumed independent, it follows that,after the first tail, the expected additional number of flips until the first head is


3.4. Computing Expectations by Conditioning 109

just E[N]. Hence E[N|Y = 0] = 1 + E[N]. Substituting Equation (3.6) into Equation (3.5) yields

E[N] = p + (1 − p)(1 + E[N])

or

E[N] = 1/p

Because the random variable N is a geometric random variable with probability mass function p(n) = p(1 − p)^{n−1}, its expectation could easily have been computed from E[N] = Σ_{n=1}^{∞} n p(n) without recourse to conditional expectation. However, if you attempt to obtain the solution to our next example without using conditional expectation, you will quickly learn what a useful technique “conditioning” can be.

Example 3.12 A miner is trapped in a mine containing three doors. The first door leads to a tunnel that takes him to safety after two hours of travel. The second door leads to a tunnel that returns him to the mine after three hours of travel. The third door leads to a tunnel that returns him to his mine after five hours. Assuming that the miner is at all times equally likely to choose any one of the doors, what is the expected length of time until the miner reaches safety?

Solution: Let X denote the time until the miner reaches safety, and let Y denote the door he initially chooses. Now

E[X] = E[X|Y = 1]P{Y = 1} + E[X|Y = 2]P{Y = 2} + E[X|Y = 3]P{Y = 3}
= (1/3)(E[X|Y = 1] + E[X|Y = 2] + E[X|Y = 3])

However,

E[X|Y = 1] = 2,
E[X|Y = 2] = 3 + E[X],
E[X|Y = 3] = 5 + E[X]   (3.7)

To understand why this is correct consider, for instance, E[X|Y = 2], and reason as follows. If the miner chooses the second door, then he spends three hours in the tunnel and then returns to the mine. But once he returns to the mine the problem is as before, and hence his expected additional time until safety is just E[X]. Hence E[X|Y = 2] = 3 + E[X]. The argument behind the other equalities in Equation (3.7) is similar. Hence

E[X] = (1/3)(2 + 3 + E[X] + 5 + E[X])   or   E[X] = 10
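The answer E[X] = 10 in Example 3.12 can also be confirmed by simulation. A sketch (door travel times of 2, 3, and 5 hours as in the example; the seed and trial count are arbitrary choices):

```python
import random

def time_to_safety(rng):
    """One trapped-miner episode: each door is chosen with probability 1/3;
    door 1 leads out after 2 hours, doors 2 and 3 return to the start
    after 3 and 5 hours respectively."""
    total = 0.0
    while True:
        door = rng.randrange(3)
        if door == 0:
            return total + 2  # reaches safety
        total += 3 if door == 1 else 5  # back to the mine, try again

rng = random.Random(3)
trials = 20000
average = sum(time_to_safety(rng) for _ in range(trials)) / trials
```

The sample mean hovers around 10 hours, matching the conditioning argument (which a brute-force expectation sum over all door sequences would reach far less pleasantly).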


110 3 Conditional Probability and Conditional ExpectationExample 3.13 (The Matching Rounds Problem) Suppose in Example 2.31that those choosing their own hats depart, while the others (those without a match)put their selected hats in the center of the room, mix them up, and then reselect.Also, suppose that this process continues until each individual has his own hat.(a) Find E[R n ] where R n is the number of rounds that are necessary when nindividuals are initially present.(b) Find E[S n ] where S n is the total number of selections made by the n individuals,n 2.(c) Find the expected number of false selections made by one of the n people,n 2.Solution: (a) It follows from the results of Example 2.31 that no matter howmany people remain there will, on average, be one match per round. Hence,one might suggest that E[R n ]=n. This turns out to be true, and an inductionproof will now be given. Because it is obvious that E[R 1 ]=1, assume thatE[R k ]=k for k = 1,...,n− 1. To compute E[R n ], start by conditioning onX n , the number of matches that occur in the first round. This givesE[R n ]=n∑E[R n |X n = i]P {X n = i}i=0Now, given a total of i matches in the initial round, the number of roundsneeded will equal 1 plus the number of rounds that are required when n − ipersons are to be matched with their hats. Therefore,E[R n ]=n∑(1 + E[R n−i ])P {X n = i}i=0= 1 + E[R n ]P {X n = 0}+= 1 + E[R n ]P {X n = 0}+n∑E[R n−i ]P {X n = i}i=1n∑(n − i)P{X n = i}i=1by the induction hypothesis= 1 + E[R n ]P {X n = 0}+n(1 − P {X n = 0}) − E[X n ]= E[R n ]P {X n = 0}+n(1 − P {X n = 0})where the final equality used the result, established in Example 2.31, thatE[X n ]=1. Since the preceding equation implies that E[R n ]=n, theresultis proven.


3.4. Computing Expectations by Conditioning 111(b) For n 2, conditioning on X n , the number of matches in round 1, givesE[S n ]==n∑E[S n |X n = i]P {X n = i}i=0n∑(n + E[S n−i ])P {X n = i}i=0= n +n∑E[S n−i ]P {X n = i}where E[S 0 ]=0. To solve the preceding equation, rewrite it asi=0E[S n ]=n + E[S n−Xn ]Now, if there were exactly one match in each round, then it would take a totalof 1 + 2 +···+n = n(n + 1)/2 selections. Thus, let us try a solution of theform E[S n ]=an+bn 2 . For the preceding equation to be satisfied by a solutionof this type, for n 2, we needor, equivalently,an + bn 2 = n + E[a(n − X n ) + b(n − X n ) 2 ]an + bn 2 = n + a(n − E[X n ]) + b(n 2 − 2nE[X n ]+E[X 2 n ])Now, using the results of Example 2.31 and Exercise 72 of Chapter 2 thatE[X n ]=Var(X n ) = 1, the preceding will be satisfied ifan + bn 2 = n + an − a + bn 2 − 2nb + 2band this will be valid provided that b = 1/2,a= 1. That is,E[S n ]=n + n 2 /2satisfies the recursive equation for E[S n ].The formal proof that E[S n ]=n + n 2 /2, n 2, is obtained by inductionon n. It is true when n = 2 (since, in this case, the number of selections is twicethe number of rounds and the number of rounds is a geometric random variable


112 3 Conditional Probability and Conditional Expectationwith parameter p = 1/2). Now, the recursion gives thatE[S n ]=n + E[S n ]P {X n = 0}+n∑E[S n−i ]P {X n = i}Hence, upon assuming that E[S 0 ]=E[S 1 ]=0, E[S k ]=k + k 2 /2, for k =2,...,n− 1 and using that P {X n = n − 1}=0, we see thatE[S n ]=n + E[S n ]P {X n = 0}+i=1n∑[n − i + (n − i) 2 /2]P {X n = i}i=1= n + E[S n ]P {X n = 0}+(n + n 2 /2)(1 − P {X n = 0})− (n + 1)E[X n ]+E[X 2 n ]/2Substituting the identities E[X n ]=1, E[Xn 2 ]=2 in the preceding shows thatE[S n ]=n + n 2 /2and the induction proof is complete.(c) If we let C j denote the number of hats chosen by person j, j = 1,...,nthenn∑C j = S nj=1Taking expectations, and using the fact that each C j has the same mean, yieldsthe resultE[C j ]=E[S n ]/n = 1 + n/2Hence, the expected number of false selections by person j isE[C j − 1]=n/2.Example 3.14 Independent trials, each of which is a success with probabilityp, are performed until there are k consecutive successes. What is the mean numberof necessary trials?Solution: Let N k denote the number of necessary trials to obtain k consecutivesuccesses, and let M k denote its mean. We will obtain a recursive equation


for M_k by conditioning on N_{k−1}, the number of trials needed for k − 1 consecutive successes. This yields

    M_k = E[N_k] = E[ E[N_k | N_{k−1}] ]

Now,

    E[N_k | N_{k−1}] = N_{k−1} + 1 + (1 − p) E[N_k]

where the preceding follows since if it takes N_{k−1} trials to obtain k − 1 consecutive successes, then either the next trial is a success, and we have our k in a row, or it is a failure, and we must begin anew. Taking expectations of both sides of the preceding yields

    M_k = M_{k−1} + 1 + (1 − p) M_k

or

    M_k = 1/p + M_{k−1}/p

Since N_1, the time of the first success, is geometric with parameter p, we see that

    M_1 = 1/p

and, recursively,

    M_2 = 1/p + 1/p²,    M_3 = 1/p + 1/p² + 1/p³

and, in general,

    M_k = 1/p + 1/p² + ··· + 1/p^k

Example 3.15 (Analyzing the Quick-Sort Algorithm) Suppose we are given a set of n distinct values—x_1, ..., x_n—and we desire to put these values in increasing order or, as it is commonly called, to sort them. An efficient procedure for accomplishing this is the quick-sort algorithm, which is defined recursively as follows: When n = 2 the algorithm compares the two values and puts them in the appropriate order. When n > 2 it starts by choosing at random one of the n values—say, x_i—and then compares each of the other n − 1 values with x_i, noting which are smaller and which are larger than x_i. Letting S_i denote the set of


elements smaller than x_i, and S̄_i the set of elements greater than x_i, the algorithm now sorts the set S_i and the set S̄_i. The final ordering, therefore, consists of the ordered set of the elements in S_i, then x_i, and then the ordered set of the elements in S̄_i. For instance, suppose that the set of elements is 10, 5, 8, 2, 1, 4, 7. We start by choosing one of these values at random (that is, each of the 7 values has probability 1/7 of being chosen). Suppose, for instance, that the value 4 is chosen. We then compare 4 with each of the other six values to obtain

    {2, 1}, 4, {10, 5, 8, 7}

We now sort the set {2, 1} to obtain

    1, 2, 4, {10, 5, 8, 7}

Next we choose a value at random from {10, 5, 8, 7}—say 7 is chosen—and compare each of the other three values with 7 to obtain

    1, 2, 4, 5, 7, {10, 8}

Finally, we sort {10, 8} to end up with

    1, 2, 4, 5, 7, 8, 10

One measure of the effectiveness of this algorithm is the expected number of comparisons that it makes. Let us denote by M_n the expected number of comparisons needed by the quick-sort algorithm to sort a set of n distinct values. To obtain a recursion for M_n we condition on the rank of the initial value selected to obtain

    M_n = Σ_{j=1}^n E[number of comparisons | value selected is jth smallest] · (1/n)

Now if the initial value selected is the jth smallest, then the set of values smaller than it is of size j − 1, and the set of values greater than it is of size n − j. Hence, as n − 1 comparisons with the initial value chosen must be made, we see that

    M_n = Σ_{j=1}^n (n − 1 + M_{j−1} + M_{n−j}) · (1/n)
        = n − 1 + (2/n) Σ_{k=1}^{n−1} M_k    (since M_0 = 0)


or, equivalently,

    n M_n = n(n − 1) + 2 Σ_{k=1}^{n−1} M_k

To solve the preceding, note that upon replacing n by n + 1 we obtain

    (n + 1) M_{n+1} = (n + 1)n + 2 Σ_{k=1}^{n} M_k

Hence, upon subtraction,

    (n + 1) M_{n+1} − n M_n = 2n + 2 M_n

or

    (n + 1) M_{n+1} = (n + 2) M_n + 2n

Therefore,

    M_{n+1}/(n + 2) = 2n/[(n + 1)(n + 2)] + M_n/(n + 1)

Iterating this gives

    M_{n+1}/(n + 2) = 2n/[(n + 1)(n + 2)] + 2(n − 1)/[n(n + 1)] + M_{n−1}/n
                    = ···
                    = 2 Σ_{k=0}^{n−1} (n − k)/[(n + 1 − k)(n + 2 − k)]    (since M_1 = 0)

Hence,

    M_{n+1} = 2(n + 2) Σ_{k=0}^{n−1} (n − k)/[(n + 1 − k)(n + 2 − k)]
            = 2(n + 2) Σ_{i=1}^{n} i/[(i + 1)(i + 2)],    n ≥ 1
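The recursion n·M_n = n(n − 1) + 2 Σ_{k<n} M_k and the closed form just derived are easy to cross-check numerically; a minimal sketch (the function names are mine, and exact rational arithmetic avoids rounding questions):

```python
from fractions import Fraction

def mean_comparisons_recursive(n_max):
    """M_n via the recursion n*M_n = n*(n-1) + 2*sum_{k<n} M_k, with M_0 = M_1 = 0."""
    M = [Fraction(0), Fraction(0)]
    for n in range(2, n_max + 1):
        M.append(Fraction(n * (n - 1) + 2 * sum(M), n))
    return M

def mean_comparisons_closed(n):
    """Closed form for M_{n+1} = 2*(n+2) * sum_{i=1}^{n} i/((i+1)*(i+2)), valid for n >= 1."""
    return 2 * (n + 2) * sum(Fraction(i, (i + 1) * (i + 2)) for i in range(1, n + 1))

M = mean_comparisons_recursive(20)
# the closed form gives M_{n+1}; the two computations agree exactly
assert all(mean_comparisons_closed(n) == M[n + 1] for n in range(1, 20))
print(float(M[10]))  # expected comparisons to quick-sort 10 distinct values
```

For instance, M_2 = 1 (two values, one comparison) and M_3 = 8/3, both of which the recursion and the closed form reproduce.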


Using the identity i/[(i + 1)(i + 2)] = 2/(i + 2) − 1/(i + 1), we can approximate M_{n+1} for large n as follows:

    M_{n+1} = 2(n + 2) [ Σ_{i=1}^n 2/(i + 2) − Σ_{i=1}^n 1/(i + 1) ]
            ~ 2(n + 2) [ ∫_3^{n+2} (2/x) dx − ∫_2^{n+1} (1/x) dx ]
            = 2(n + 2) [ 2 log(n + 2) − log(n + 1) + log 2 − 2 log 3 ]
            = 2(n + 2) [ log(n + 2) + log((n + 2)/(n + 1)) + log 2 − 2 log 3 ]
            ~ 2(n + 2) log(n + 2)

Although we usually employ the conditional expectation identity to more easily enable us to compute an unconditional expectation, in our next example we show how it can sometimes be used to obtain the conditional expectation.

Example 3.16 In the match problem of Example 2.31 involving n, n > 1, individuals, find the conditional expected number of matches given that the first person did not have a match.

Solution: Let X denote the number of matches, and let X_1 equal 1 if the first person has a match and 0 otherwise. Then

    E[X] = E[X | X_1 = 0] P{X_1 = 0} + E[X | X_1 = 1] P{X_1 = 1}
         = E[X | X_1 = 0] (n − 1)/n + E[X | X_1 = 1] (1/n)

But, from Example 2.31,

    E[X] = 1

Moreover, given that the first person has a match, the expected number of matches is equal to 1 plus the expected number of matches when n − 1 people select among their own n − 1 hats, showing that

    E[X | X_1 = 1] = 2


Therefore, we obtain the result

    E[X | X_1 = 0] = (n − 2)/(n − 1)

3.4.1. Computing Variances by Conditioning

Conditional expectations can also be used to compute the variance of a random variable. Specifically, we can use that

    Var(X) = E[X²] − (E[X])²

and then use conditioning to obtain both E[X] and E[X²]. We illustrate this technique by determining the variance of a geometric random variable.

Example 3.17 (Variance of the Geometric Random Variable) Independent trials, each resulting in a success with probability p, are performed in sequence. Let N be the trial number of the first success. Find Var(N).

Solution: Let Y = 1 if the first trial results in a success, and Y = 0 otherwise.

    Var(N) = E[N²] − (E[N])²

To calculate E[N²] and E[N] we condition on Y. For instance,

    E[N²] = E[ E[N² | Y] ]

However,

    E[N² | Y = 1] = 1,    E[N² | Y = 0] = E[(1 + N)²]

These two equations are true since if the first trial results in a success, then clearly N = 1 and so N² = 1. On the other hand, if the first trial results in a failure, then the total number of trials necessary for the first success will equal one (the first trial that results in failure) plus the necessary number of additional trials. Since this latter quantity has the same distribution as N, we


get that E[N² | Y = 0] = E[(1 + N)²]. Hence, we see that

    E[N²] = E[N² | Y = 1] P{Y = 1} + E[N² | Y = 0] P{Y = 0}
          = p + E[(1 + N)²](1 − p)
          = 1 + (1 − p) E[2N + N²]

Since, as was shown in Example 3.11, E[N] = 1/p, this yields

    E[N²] = 1 + 2(1 − p)/p + (1 − p) E[N²]

or

    E[N²] = (2 − p)/p²

Therefore,

    Var(N) = E[N²] − (E[N])²
           = (2 − p)/p² − (1/p)²
           = (1 − p)/p²

Another way to use conditioning to obtain the variance of a random variable is to apply the conditional variance formula. The conditional variance of X given that Y = y is defined by

    Var(X | Y = y) = E[ (X − E[X | Y = y])² | Y = y ]

That is, the conditional variance is defined in exactly the same manner as the ordinary variance, with the exception that all probabilities are determined conditional on the event that Y = y. Expanding the right side of the preceding and taking expectations term by term yield

    Var(X | Y = y) = E[X² | Y = y] − (E[X | Y = y])²

Letting Var(X | Y) denote that function of Y whose value when Y = y is Var(X | Y = y), we have the following result.


Proposition 3.1 (The Conditional Variance Formula)

    Var(X) = E[ Var(X | Y) ] + Var( E[X | Y] )    (3.8)

Proof

    E[ Var(X | Y) ] = E[ E[X² | Y] − (E[X | Y])² ]
                    = E[ E[X² | Y] ] − E[ (E[X | Y])² ]
                    = E[X²] − E[ (E[X | Y])² ]

and

    Var(E[X | Y]) = E[ (E[X | Y])² ] − ( E[ E[X | Y] ] )²
                  = E[ (E[X | Y])² ] − (E[X])²

Therefore,

    E[ Var(X | Y) ] + Var( E[X | Y] ) = E[X²] − (E[X])²

which completes the proof.

Example 3.18 (The Variance of a Compound Random Variable) Let X_1, X_2, ... be independent and identically distributed random variables with distribution F having mean μ and variance σ², and assume that they are independent of the nonnegative integer valued random variable N. As noted in Example 3.10, where its expected value was determined, the random variable S = Σ_{i=1}^N X_i is called a compound random variable. Find its variance.

Solution: Whereas we could obtain E[S²] by conditioning on N, let us instead use the conditional variance formula. Now,

    Var(S | N = n) = Var( Σ_{i=1}^N X_i | N = n )
                   = Var( Σ_{i=1}^n X_i | N = n )
                   = Var( Σ_{i=1}^n X_i )
                   = n σ²


By the same reasoning,

    E[S | N = n] = nμ

Therefore,

    Var(S | N) = Nσ²,    E[S | N] = Nμ

and the conditional variance formula gives that

    Var(S) = E[Nσ²] + Var(Nμ) = σ² E[N] + μ² Var(N)

If N is a Poisson random variable, then S = Σ_{i=1}^N X_i is called a compound Poisson random variable. Because the variance of a Poisson random variable is equal to its mean, it follows that for a compound Poisson random variable having E[N] = λ,

    Var(S) = λσ² + λμ² = λ E[X²]

where X has the distribution F.

3.5. Computing Probabilities by Conditioning

Not only can we obtain expectations by first conditioning on an appropriate random variable, but we may also use this approach to compute probabilities. To see this, let E denote an arbitrary event and define the indicator random variable X by

    X = 1, if E occurs;    X = 0, if E does not occur

It follows from the definition of X that

    E[X] = P(E)
    E[X | Y = y] = P(E | Y = y),    for any random variable Y

Therefore, from Equations (3.3a) and (3.3b) we obtain

    P(E) = Σ_y P(E | Y = y) P(Y = y),    if Y is discrete
         = ∫_{−∞}^{∞} P(E | Y = y) f_Y(y) dy,    if Y is continuous
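As a quick discrete illustration of the identity above, here is a sketch (the two-dice setup is illustrative, not from the text): conditioning on the first die reproduces the directly enumerated probability that two fair dice sum to 7.

```python
from fractions import Fraction
from itertools import product

# Event E: two fair dice sum to 7. Condition on the first die, Y.
# P(E) = sum_y P(E | Y = y) * P(Y = y)
p_cond = {y: Fraction(sum(1 for x in range(1, 7) if x + y == 7), 6) for y in range(1, 7)}
p_event = sum(p_cond[y] * Fraction(1, 6) for y in range(1, 7))

# Direct enumeration over all 36 equally likely outcomes agrees.
p_direct = Fraction(sum(1 for x, y in product(range(1, 7), repeat=2) if x + y == 7), 36)
assert p_event == p_direct == Fraction(1, 6)
```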


Example 3.19 Suppose that X and Y are independent continuous random variables having densities f_X and f_Y, respectively. Compute P{X < Y}.

Solution: Conditioning on the value of Y yields

    P{X < Y} = ∫_{−∞}^{∞} P{X < Y | Y = y} f_Y(y) dy
             = ∫_{−∞}^{∞} P{X < y | Y = y} f_Y(y) dy
             = ∫_{−∞}^{∞} P{X < y} f_Y(y) dy    (by independence)
             = ∫_{−∞}^{∞} F_X(y) f_Y(y) dy

where F_X(y) = ∫_{−∞}^{y} f_X(x) dx.

Example 3.20 An insurance company supposes that the number of accidents that each of its policyholders will have in a year is Poisson distributed, with the mean of the Poisson depending on the policyholder. If the Poisson mean of a randomly chosen policyholder has a gamma distribution with density g(λ) = λe^{−λ}, λ ≥ 0, what is the probability that a randomly chosen policyholder has exactly n accidents next year?

Solution: Let X denote the number of accidents that a randomly chosen policyholder has next year, and let Λ denote that policyholder's Poisson mean. Conditioning on Λ gives

    P{X = n} = ∫_0^∞ P{X = n | Λ = λ} g(λ) dλ
             = ∫_0^∞ (e^{−λ} λ^n/n!) λe^{−λ} dλ
             = (1/n!) ∫_0^∞ λ^{n+1} e^{−2λ} dλ


However, because

    h(λ) = 2e^{−2λ}(2λ)^{n+1}/(n + 1)!,    λ > 0

is the density function of a gamma (n + 2, 2) random variable, its integral is 1. Therefore,

    1 = ∫_0^∞ 2e^{−2λ}(2λ)^{n+1}/(n + 1)! dλ = [2^{n+2}/(n + 1)!] ∫_0^∞ λ^{n+1} e^{−2λ} dλ

showing that

    P{X = n} = (1/n!) · (n + 1)!/2^{n+2} = (n + 1)/2^{n+2}

Example 3.21 Suppose that the number of people who visit a yoga studio each day is a Poisson random variable with mean λ. Suppose further that each person who visits is, independently, female with probability p or male with probability 1 − p. Find the joint probability that exactly n women and m men visit the studio today.

Solution: Let N_1 denote the number of women, and N_2 the number of men, who visit the studio today. Also, let N = N_1 + N_2 be the total number of people who visit. Conditioning on N gives

    P{N_1 = n, N_2 = m} = Σ_{i=0}^∞ P{N_1 = n, N_2 = m | N = i} P{N = i}

Because P{N_1 = n, N_2 = m | N = i} = 0 when i ≠ n + m, the preceding equation yields

    P{N_1 = n, N_2 = m} = P{N_1 = n, N_2 = m | N = n + m} e^{−λ} λ^{n+m}/(n + m)!

Given that n + m people visit, it follows, because each of these n + m is independently a woman with probability p, that the conditional probability that n of them are women (and m are men) is just the binomial probability of n


successes in n + m trials. Therefore,

    P{N_1 = n, N_2 = m} = C(n + m, n) p^n (1 − p)^m e^{−λ} λ^{n+m}/(n + m)!
                        = [(n + m)!/(n! m!)] p^n (1 − p)^m e^{−λp} e^{−λ(1−p)} λ^n λ^m/(n + m)!
                        = e^{−λp} (λp)^n/n! · e^{−λ(1−p)} (λ(1 − p))^m/m!

Because the preceding joint probability mass function factors into two products, one of which depends only on n and the other only on m, it follows that N_1 and N_2 are independent. Moreover, because

    P{N_1 = n} = Σ_{m=0}^∞ P{N_1 = n, N_2 = m}
               = e^{−λp} (λp)^n/n! Σ_{m=0}^∞ e^{−λ(1−p)} (λ(1 − p))^m/m! = e^{−λp} (λp)^n/n!

and, similarly,

    P{N_2 = m} = e^{−λ(1−p)} (λ(1 − p))^m/m!

we can conclude that N_1 and N_2 are independent Poisson random variables with respective means λp and λ(1 − p). Therefore, this example establishes the important result that when each of a Poisson number of events is independently classified either as being type 1 with probability p or type 2 with probability 1 − p, the numbers of type 1 and type 2 events are independent Poisson random variables.

Example 3.22 Let X_1, ..., X_n be independent Bernoulli random variables, with X_i having parameter p_i, i = 1, ..., n. That is, P{X_i = 1} = p_i, P{X_i = 0} = q_i = 1 − p_i. Suppose we want to compute the probability mass function of their sum, X_1 + ··· + X_n. To do so, we will recursively obtain the probability mass function of X_1 + ··· + X_k, first for k = 1, then k = 2, and on up to k = n. To begin, let

    P_k(j) = P{X_1 + ··· + X_k = j}


and note that

    P_k(k) = Π_{i=1}^k p_i,    P_k(0) = Π_{i=1}^k q_i

For 0 < j < k, conditioning on the value of X_k yields the recursion

    P_k(j) = P{X_1 + ··· + X_k = j | X_k = 1} p_k + P{X_1 + ··· + X_k = j | X_k = 0} q_k
           = P_{k−1}(j − 1) p_k + P_{k−1}(j) q_k

Starting with P_1(1) = p_1 and P_1(0) = q_1, these equations can be recursively solved to obtain the mass functions P_2, P_3, ..., P_n.

Example 3.23 (The Best Prize Problem) Suppose that we are to be presented with n distinct prizes, in sequence. After being presented with a prize we must immediately decide whether to accept it or to reject it and consider the next prize. The only information we are given when deciding is the relative rank of the present prize compared with the ones already seen; once a prize is rejected, it is lost. Assuming that all n! orderings of the prizes are equally likely, our objective is to maximize the probability of obtaining the best prize.

Consider the strategy that, for a fixed k, 0 ≤ k < n, rejects the first k prizes and then accepts the first subsequent prize that is better than all of those first k. Let P_k(best) denote the probability that the best prize is selected under this strategy. To compute it, condition on X, the position of the best prize, which is equally likely to be any of the values 1, ..., n:

    P_k(best) = Σ_{i=1}^n P_k(best | X = i) P{X = i} = (1/n) Σ_{i=1}^n P_k(best | X = i)

Now, if the best prize is in position i, where i > k, then it will be selected if and only if the best of the first


k prizes is also the best of the first i − 1 prizes (for then none of the prizes in positions k + 1, k + 2, ..., i − 1 would be selected). Hence, we see that

    P_k(best | X = i) = 0,    if i ≤ k
    P_k(best | X = i) = P{best of first i − 1 is among the first k} = k/(i − 1),    if i > k

From the preceding, we obtain

    P_k(best) = (k/n) Σ_{i=k+1}^n 1/(i − 1)
              ≈ (k/n) ∫_k^{n−1} (1/x) dx
              = (k/n) log((n − 1)/k)
              ≈ (k/n) log(n/k)

Now, if we consider the function

    g(x) = (x/n) log(n/x)

then

    g′(x) = (1/n) log(n/x) − 1/n

and so

    g′(x) = 0  ⇒  log(n/x) = 1  ⇒  x = n/e

Thus, since P_k(best) ≈ g(k), we see that the best strategy of the type considered is to let the first n/e prizes go by and then accept the first one to appear that is better than all of those. In addition, since g(n/e) = 1/e, the probability that this strategy selects the best prize is approximately 1/e ≈ 0.36788.
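The exact probability P_k(best) = (k/n) Σ_{i=k+1}^n 1/(i − 1) can also be maximized numerically; a sketch with the illustrative choice n = 100 (the function names are mine):

```python
import math

def p_best(n, k):
    """Exact P_k(best) = (k/n) * sum_{i=k+1}^{n} 1/(i-1); k = 0 means accept the first prize."""
    if k == 0:
        return 1.0 / n  # the first prize wins only if it happens to be the best
    return (k / n) * sum(1.0 / (i - 1) for i in range(k + 1, n + 1))

n = 100
best_k = max(range(n), key=lambda k: p_best(n, k))
# the optimal cutoff is near n/e and the exact success probability exceeds 1/e
assert abs(best_k - n / math.e) < 2
assert p_best(n, best_k) > 1 / math.e
```

For n = 100 the maximizing cutoff is k = 37 ≈ 100/e, with success probability about 0.371, slightly above the limiting value 1/e.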


Remark Most students are quite surprised by the size of the probability of obtaining the best prize, thinking that this probability would be close to 0 when n is large. However, even without going through the calculations, a little thought reveals that the probability of obtaining the best prize can be made reasonably large. Consider the strategy of letting half of the prizes go by, and then selecting the first one to appear that is better than all of those. The probability that a prize is actually selected is the probability that the overall best is among the second half, and this is 1/2. In addition, given that a prize is selected, at the time of selection that prize would have been the best of more than n/2 prizes to have appeared, and would thus have probability of at least 1/2 of being the overall best. Hence, the strategy of letting the first half of all prizes go by and then accepting the first one that is better than all of those prizes results in a probability greater than 1/4 of obtaining the best prize.

Example 3.24 At a party n men take off their hats. The hats are then mixed up and each man randomly selects one. We say that a match occurs if a man selects his own hat. What is the probability of no matches? What is the probability of exactly k matches?

Solution: Let E denote the event that no matches occur, and to make explicit the dependence on n, write P_n = P(E). We start by conditioning on whether or not the first man selects his own hat—call these events M and M^c. Then

    P_n = P(E) = P(E | M)P(M) + P(E | M^c)P(M^c)

Clearly, P(E | M) = 0, and so

    P_n = P(E | M^c) (n − 1)/n    (3.9)

Now, P(E | M^c) is the probability of no matches when n − 1 men select from a set of n − 1 hats that does not contain the hat of one of these men. This can happen in either of two mutually exclusive ways.
Either there are no matches and the extra man does not select the extra hat (this being the hat of the man that chose first), or there are no matches and the extra man does select the extra hat. The probability of the first of these events is just P_{n−1}, which is seen by regarding the extra hat as "belonging" to the extra man. Because the second event has probability [1/(n − 1)] P_{n−2}, we have

    P(E | M^c) = P_{n−1} + [1/(n − 1)] P_{n−2}

and thus, from Equation (3.9),

    P_n = [(n − 1)/n] P_{n−1} + (1/n) P_{n−2}
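This recursion can be iterated directly; since P_n is the classical derangement probability, it converges rapidly to e^{−1} (a sketch):

```python
import math

def no_match_probs(n_max):
    """Iterate P_n = ((n-1)/n) P_{n-1} + (1/n) P_{n-2}, with P_1 = 0 and P_2 = 1/2."""
    P = {1: 0.0, 2: 0.5}
    for n in range(3, n_max + 1):
        P[n] = ((n - 1) / n) * P[n - 1] + (1 / n) * P[n - 2]
    return P

P = no_match_probs(15)
# P_3 = (2/3)(1/2) = 1/3, and P_n approaches 1/e very quickly
assert abs(P[3] - 1 / 3) < 1e-12
assert abs(P[15] - math.exp(-1)) < 1e-12
```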


or, equivalently,

    P_n − P_{n−1} = −(1/n)(P_{n−1} − P_{n−2})    (3.10)

However, because P_n is the probability of no matches when n men select among their own hats, we have

    P_1 = 0,    P_2 = 1/2

and so, from Equation (3.10),

    P_3 − P_2 = −(P_2 − P_1)/3 = −1/3!,    or P_3 = 1/2! − 1/3!
    P_4 − P_3 = −(P_3 − P_2)/4 = 1/4!,     or P_4 = 1/2! − 1/3! + 1/4!

and, in general, we see that

    P_n = 1/2! − 1/3! + 1/4! − ··· + (−1)^n/n!

To obtain the probability of exactly k matches, we consider any fixed group of k men. The probability that they, and only they, select their own hats is

    (1/n) · [1/(n − 1)] ··· [1/(n − (k − 1))] · P_{n−k} = [(n − k)!/n!] P_{n−k}

where P_{n−k} is the conditional probability that the other n − k men, selecting among their own hats, have no matches. Because there are C(n, k) choices of a set of k men, the desired probability of exactly k matches is

    C(n, k) [(n − k)!/n!] P_{n−k} = P_{n−k}/k! = [1/2! − 1/3! + ··· + (−1)^{n−k}/(n − k)!]/k!

which, for n large, is approximately equal to e^{−1}/k!.

Remark The recursive equation, Equation (3.10), could also have been obtained by using the concept of a cycle, where we say that the sequence of distinct individuals i_1, i_2, ..., i_k constitutes a cycle if i_1 chooses i_2's hat, i_2 chooses i_3's hat, ..., i_{k−1} chooses i_k's hat, and i_k chooses i_1's hat. Note that every individual is part of a cycle, and that a cycle of size k = 1 occurs when someone chooses his


or her own hat. With E being, as before, the event that no matches occur, it follows, upon conditioning on the size of the cycle containing a specified person, say person 1, that

    P_n = P(E) = Σ_{k=1}^n P(E | C = k) P(C = k)    (3.11)

where C is the size of the cycle that contains person 1. Now call person 1 the first person, and note that C = k if the first person does not choose 1's hat; the person whose hat was chosen by the first person—call this person the second person—does not choose 1's hat; the person whose hat was chosen by the second person—call this person the third person—does not choose 1's hat; ...; and the person whose hat was chosen by the (k − 1)st person does choose 1's hat. Consequently,

    P(C = k) = [(n − 1)/n][(n − 2)/(n − 1)] ··· [(n − k + 1)/(n − k + 2)][1/(n − k + 1)] = 1/n    (3.12)

That is, the size of the cycle that contains a specified person is equally likely to be any of the values 1, 2, ..., n. Moreover, since C = 1 means that 1 chooses his or her own hat, it follows that

    P(E | C = 1) = 0    (3.13)

On the other hand, if C = k, then the set of hats chosen by the k individuals in this cycle is exactly the set of hats of these individuals. Hence, conditional on C = k, the problem reduces to determining the probability of no matches when n − k people randomly choose among their own n − k hats. Therefore, for k > 1,

    P(E | C = k) = P_{n−k}    (3.14)

Substituting (3.12), (3.13), and (3.14) back into Equation (3.11) gives

    P_n = (1/n) Σ_{k=2}^n P_{n−k}

which is easily shown to be equivalent to Equation (3.10).

Example 3.25 (The Ballot Problem) In an election, candidate A receives n votes, and candidate B receives m votes, where n > m. Assuming that all orderings are equally likely, show that the probability that A is always ahead in the count of votes is (n − m)/(n + m).


Solution: Let P_{n,m} denote the desired probability. By conditioning on which candidate receives the last vote counted we have

    P_{n,m} = P{A always ahead | A receives last vote} · n/(n + m)
            + P{A always ahead | B receives last vote} · m/(n + m)

Now, given that A receives the last vote, we can see that the probability that A is always ahead is the same as if A had received a total of n − 1 and B a total of m votes. Because a similar result is true when we are given that B receives the last vote, we see from the preceding that

    P_{n,m} = [n/(n + m)] P_{n−1,m} + [m/(m + n)] P_{n,m−1}    (3.15)

We can now prove that P_{n,m} = (n − m)/(n + m) by induction on n + m. As it is true when n + m = 1, that is, P_{1,0} = 1, assume it whenever n + m = k. Then when n + m = k + 1, we have by Equation (3.15) and the induction hypothesis that

    P_{n,m} = [n/(n + m)] · (n − 1 − m)/(n − 1 + m) + [m/(m + n)] · (n − m + 1)/(n + m − 1)
            = (n − m)/(n + m)

and the result is proven.

The ballot problem has some interesting applications. For example, consider successive flips of a coin that always land on "heads" with probability p, and let us determine the probability distribution of the first time, after beginning, that the total number of heads is equal to the total number of tails. The probability that the first time this occurs is at time 2n can be obtained by first conditioning on the total number of heads in the first 2n trials. This yields

    P{first time equal = 2n} = P{first time equal = 2n | n heads in first 2n} C(2n, n) p^n (1 − p)^n

Now given a total of n heads in the first 2n flips we can see that all possible orderings of the n heads and n tails are equally likely, and thus the preceding conditional probability is equivalent to the probability that in an election, in which each candidate receives n votes, one of the candidates is always ahead in the counting until the last vote (which ties them). But by conditioning on whomever


receives the last vote, we see that this is just the probability in the ballot problem when m = n − 1. Hence,

    P{first time equal = 2n} = P_{n,n−1} C(2n, n) p^n (1 − p)^n
                             = [C(2n, n)/(2n − 1)] p^n (1 − p)^n

Suppose now that we wanted to determine the probability that the first time there are i more heads than tails occurs after the (2n + i)th flip. Now, in order for this to be the case, the following two events must occur:

(a) The first 2n + i tosses result in n + i heads and n tails; and
(b) The order in which the n + i heads and n tails occur is such that the number of heads is never i more than the number of tails until after the final flip.

Now, it is easy to see that event (b) will occur if and only if the order of appearance of the n + i heads and n tails is such that, starting from the final flip and working backwards, heads is always in the lead. For instance, if there are 4 heads and 2 tails (n = 2, i = 2), then the outcome _ _ _ _ T H would not suffice because there would have been 2 more heads than tails sometime before the sixth flip (since the first 4 flips resulted in 2 more heads than tails).

Now, the probability of the event specified in (a) is just the binomial probability of getting n + i heads and n tails in 2n + i flips of the coin.

We must now determine the conditional probability of the event specified in (b) given that there are n + i heads and n tails in the first 2n + i flips. To do so, note first that given that there are a total of n + i heads and n tails in the first 2n + i flips, all possible orderings of these flips are equally likely. As a result, the conditional probability of (b) given (a) is just the probability that a random ordering of n + i heads and n tails will, when counted in reverse order, always have more heads than tails.
Since all reverse orderings are also equally likely, it follows from the ballot problem that this conditional probability is i/(2n + i). That is, we have shown that

    P{a} = C(2n + i, n) p^{n+i} (1 − p)^n
    P{b | a} = i/(2n + i)

and so

    P{first time heads leads by i is after flip 2n + i} = C(2n + i, n) p^{n+i} (1 − p)^n · i/(2n + i)
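For small vote counts the ballot result itself can be confirmed by brute-force enumeration (a sketch; the runtime grows combinatorially, so keep n + m small):

```python
from itertools import permutations

def always_ahead_probability(n, m):
    """Fraction of orderings of n A-votes and m B-votes (n > m) in which A is always strictly ahead."""
    orderings = set(permutations('A' * n + 'B' * m))  # the C(n+m, n) distinct, equally likely sequences
    good = 0
    for seq in orderings:
        lead, ahead = 0, True
        for v in seq:
            lead += 1 if v == 'A' else -1
            if lead <= 0:  # A tied or behind at some point in the count
                ahead = False
                break
        good += ahead
    return good / len(orderings)

# agrees with (n - m)/(n + m) in every small case
for n, m in [(2, 1), (3, 1), (3, 2), (4, 2)]:
    assert abs(always_ahead_probability(n, m) - (n - m) / (n + m)) < 1e-12
```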


Example 3.26 Let U_1, U_2, ... be a sequence of independent uniform (0, 1) random variables, and let

    N = min{n ≥ 2: U_n > U_{n−1}}

and

    M = min{n ≥ 1: U_1 + ··· + U_n > 1}

That is, N is the index of the first uniform random variable that is larger than its immediate predecessor, and M is the number of uniform random variables we need sum to exceed 1. Surprisingly, N and M have the same probability distribution, and their common mean is e!

Solution: It is easy to find the distribution of N. Since all n! possible orderings of U_1, ..., U_n are equally likely, we have

    P{N > n} = P{U_1 > U_2 > ··· > U_n} = 1/n!

To show that P{M > n} = 1/n!, we will use mathematical induction. However, to give ourselves a stronger result to use as the induction hypothesis, we will prove the stronger result that for 0 < x ≤ 1, P{M(x) > n} = x^n/n!, n ≥ 1, where

    M(x) = min{n ≥ 1: U_1 + ··· + U_n > x}

is the minimum number of uniforms that need be summed to exceed x. To prove that P{M(x) > n} = x^n/n!, note first that it is true for n = 1 since

    P{M(x) > 1} = P{U_1 ≤ x} = x

So assume that for all 0 < x ≤ 1, P{M(x) > n} = x^n/n!. To determine P{M(x) > n + 1}, condition on U_1 to obtain

    P{M(x) > n + 1} = ∫_0^1 P{M(x) > n + 1 | U_1 = y} dy
                    = ∫_0^x P{M(x) > n + 1 | U_1 = y} dy
                    = ∫_0^x P{M(x − y) > n} dy
                    = ∫_0^x (x − y)^n/n! dy    (by the induction hypothesis)


    = ∫_0^x u^n/n! du
    = x^{n+1}/(n + 1)!

where the third equality of the preceding follows from the fact that given U_1 = y, M(x) is distributed as 1 plus the number of uniforms that need be summed to exceed x − y. Thus, the induction is complete, and we have shown that for 0 < x ≤ 1,

    P{M(x) > n} = x^n/n!

Letting x = 1 shows that N and M have the same distribution. Finally, we have that

    E[M] = E[N] = Σ_{n=0}^∞ P{N > n} = Σ_{n=0}^∞ 1/n! = e

Example 3.27 Let X_1, X_2, ... be independent continuous random variables with a common distribution function F and density f = F′, and suppose that they are to be observed one at a time in sequence. Let

    N = min{n ≥ 2: X_n = second largest of X_1, ..., X_n}

and let

    M = min{n ≥ 2: X_n = second smallest of X_1, ..., X_n}

Which random variable—X_N, the first random variable which when observed is the second largest of those that have been seen, or X_M, the first one that on observation is the second smallest to have been seen—tends to be larger?

Solution: To calculate the probability density function of X_N, it is natural to condition on the value of N; so let us start by determining its probability mass function. Now, if we let

    A_i = {X_i ≠ second largest of X_1, ..., X_i},    i ≥ 2

then, for n ≥ 2,

    P{N = n} = P(A_2 A_3 ··· A_{n−1} A_n^c)

Since the X_i are independent and identically distributed, it follows that, for any m ≥ 1, knowing the rank ordering of the variables X_1, ..., X_m yields no


information about the set of m values {X_1, ..., X_m}. That is, for instance, knowing that X_1


For instance, for k ≥ 1, let

    N_k = min{n ≥ k: X_n = kth largest of X_1, ..., X_n}

Therefore, N_2 is what we previously called N, and X_{N_k} is the first random variable that upon observation is the kth largest of all those observed up to this point. It can then be shown by a similar argument as used in the preceding that X_{N_k} has distribution function F for all k (see Exercise 82 at the end of this chapter). In addition, it can be shown that the random variables X_{N_k}, k ≥ 1, are independent. (A statement and proof of Ignatov's theorem in the case of discrete random variables are given in Section 3.6.6.)

The use of conditioning can also result in a more computationally efficient solution than a direct calculation. This is illustrated by our next example.

Example 3.28 Consider n independent trials in which each trial results in one of the outcomes 1, ..., k with respective probabilities p_1, ..., p_k, Σ_{i=1}^k p_i = 1. Suppose further that n > k, and that we are interested in determining the probability that each outcome occurs at least once. If we let A_i denote the event that outcome i does not occur in any of the n trials, then the desired probability is 1 − P(∪_{i=1}^k A_i), and it can be obtained by using the inclusion–exclusion theorem as follows:

    P(∪_{i=1}^k A_i) = Σ_{i=1}^k P(A_i) − Σ_i Σ_{j>i} P(A_i A_j)
                     + Σ_i Σ_{j>i} Σ_{k>j} P(A_i A_j A_k) − ··· + (−1)^{k+1} P(A_1 ··· A_k)

where

    P(A_i) = (1 − p_i)^n
    P(A_i A_j) = (1 − p_i − p_j)^n,    i < j

and so on. The difficulty with this solution is that it requires the computation of 2^k − 1 terms, which is prohibitive when k is large. A more efficient computation is obtained by conditioning. Let N_k denote the number of trials that result in outcome k. Then, given the value of N_k, each of the outcomes 1, ..., k − 1 will occur at least once if and only if each occurs at least once


when n − N_k trials are performed, and each results in outcome i with probability p_i/Σ_{j=1}^{k−1} p_j, i = 1, ..., k − 1. We could then use a similar conditioning step on these terms.

To follow through on the preceding idea, let A_{m,r}, for m ≤ n, r ≤ k, denote the event that each of the outcomes 1, ..., r occurs at least once when m independent trials are performed, where each trial results in one of the outcomes 1, ..., r with respective probabilities p_1/P_r, ..., p_r/P_r, where P_r = Σ_{j=1}^r p_j. Let P(m, r) = P(A_{m,r}) and note that P(n, k) is the desired probability. To obtain an expression for P(m, r), condition on the number of times that outcome r occurs. This gives

    P(m, r) = Σ_{j=0}^m P{A_{m,r} | r occurs j times} C(m, j) (p_r/P_r)^j (1 − p_r/P_r)^{m−j}
            = Σ_{j=1}^{m−r+1} P(m − j, r − 1) C(m, j) (p_r/P_r)^j (1 − p_r/P_r)^{m−j}

Starting with

    P(m, 1) = 1, if m ≥ 1;    P(m, 1) = 0, if m = 0

we can use the preceding recursion to obtain the quantities P(m, 2), m = 2, ..., n − (k − 2), and then the quantities P(m, 3), m = 3, ..., n − (k − 3), and so on, up to P(m, k − 1), m = k − 1, ..., n − 1. At this point we can then use the recursion to compute P(n, k). It is not difficult to check that the amount of computation needed is a polynomial function of k, which will be much smaller than 2^k when k is large.

As noted previously, conditional expectations given that Y = y are exactly the same as ordinary expectations except that all probabilities are computed conditional on the event that Y = y. As such, conditional expectations satisfy all the properties of ordinary expectations. For instance, the analog of

    E[X] = Σ_w E[X | W = w] P{W = w},    if W is discrete
         = ∫_w E[X | W = w] f_W(w) dw,    if W is continuous

is that

    E[X | Y = y] = Σ_w E[X | W = w, Y = y] P{W = w | Y = y},    if W is discrete
                 = ∫_w E[X | W = w, Y = y] f_{W|Y}(w | y) dw,    if W is continuous
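The P(m, r) recursion translates directly into code; a sketch using exact rational arithmetic so that small cases can be checked against direct calculation (the function names are mine):

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

def prob_all_outcomes(n, p):
    """P(n, k): probability that n trials with outcome probabilities p[0..k-1] hit every outcome."""
    k = len(p)
    prefix = [sum(p[:r], Fraction(0)) for r in range(k + 1)]  # P_r = p_1 + ... + p_r

    @lru_cache(maxsize=None)
    def P(m, r):
        if r == 1:
            return Fraction(int(m >= 1))  # one outcome occurs iff at least one trial is made
        q = p[r - 1] / prefix[r]  # conditional probability of outcome r among outcomes 1..r
        return sum(P(m - j, r - 1) * comb(m, j) * q**j * (1 - q)**(m - j)
                   for j in range(1, m - r + 2))

    return P(n, k)

# tiny check: 4 trials, 2 equally likely outcomes; P(both occur) = 1 - 2*(1/2)^4 = 7/8
assert prob_all_outcomes(4, [Fraction(1, 2), Fraction(1, 2)]) == Fraction(7, 8)
```

The memoized inner function makes the total work polynomial in k, in line with the remark above.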


If E[X | Y, W] is defined to be that function of Y and W that, when Y = y and W = w, is equal to E[X | Y = y, W = w], then the preceding can be written as

    E[X | Y] = E[ E[X | Y, W] | Y ]

Example 3.29 An automobile insurance company classifies each of its policyholders as being of one of the types i = 1, ..., k. It supposes that the numbers of accidents that a type i policyholder has in successive years are independent Poisson random variables with mean λ_i, i = 1, ..., k. The probability that a newly insured policyholder is type i is p_i, Σ_{i=1}^k p_i = 1. Given that a policyholder had n accidents in her first year, what is the expected number that she has in her second year? What is the conditional probability that she has m accidents in her second year?

Solution: Let N_i denote the number of accidents the policyholder has in year i, i = 1, 2. To obtain E[N_2 | N_1 = n], condition on her risk type T.

    E[N_2 | N_1 = n] = Σ_{j=1}^k E[N_2 | T = j, N_1 = n] P{T = j | N_1 = n}
                     = Σ_{j=1}^k E[N_2 | T = j] P{T = j | N_1 = n}
                     = Σ_{j=1}^k λ_j P{T = j | N_1 = n}
                     = Σ_{j=1}^k e^{−λ_j} λ_j^{n+1} p_j / Σ_{j=1}^k e^{−λ_j} λ_j^n p_j

where the final equality used that

    P{T = j | N_1 = n} = P{T = j, N_1 = n}/P{N_1 = n}
                       = P{N_1 = n | T = j} P{T = j} / Σ_{j=1}^k P{N_1 = n | T = j} P{T = j}
                       = (p_j e^{−λ_j} λ_j^n/n!) / Σ_{j=1}^k p_j e^{−λ_j} λ_j^n/n!
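A numeric rendering of this posterior computation, as a sketch (the two-type portfolio below is made up for illustration):

```python
import math

def expected_second_year(n, lams, probs):
    """E[N_2 | N_1 = n] = sum_j lam_j * P{T = j | N_1 = n}, with a Poisson likelihood."""
    likelihood = [math.exp(-l) * l**n for l in lams]  # the common n! factor cancels
    weights = [lik * p for lik, p in zip(likelihood, probs)]
    total = sum(weights)
    posterior = [w / total for w in weights]  # P{T = j | N_1 = n}
    return sum(l, q = 0) if False else sum(l * q for l, q in zip(lams, posterior))

# hypothetical portfolio: low-risk (mean 0.2) and high-risk (mean 2.0) types
lams, probs = [0.2, 2.0], [0.8, 0.2]
e0 = expected_second_year(0, lams, probs)
e3 = expected_second_year(3, lams, probs)
# more first-year accidents shift the expectation from below the prior mean toward the high-risk mean
assert e0 < e3 < 2.0
```

Observing n = 3 accidents moves the second-year expectation close to the high-risk mean of 2, while a claim-free first year pulls it well below the prior mean.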


The conditional probability that the policyholder has m accidents in year 2, given that she had n in year 1, can also be obtained by conditioning on her type:

    P{N_2 = m | N_1 = n} = Σ_{j=1}^k P{N_2 = m | T = j, N_1 = n} P{T = j | N_1 = n}
                         = Σ_{j=1}^k (e^{−λ_j} λ_j^m/m!) P{T = j | N_1 = n}
                         = Σ_{j=1}^k e^{−2λ_j} λ_j^{m+n} p_j / [m! Σ_{j=1}^k e^{−λ_j} λ_j^n p_j]

Another way to calculate P{N_2 = m | N_1 = n} is first to write

    P{N_2 = m | N_1 = n} = P{N_2 = m, N_1 = n}/P{N_1 = n}

and then determine both the numerator and denominator by conditioning on T. This yields

    P{N_2 = m | N_1 = n} = Σ_{j=1}^k P{N_2 = m, N_1 = n | T = j} p_j / Σ_{j=1}^k P{N_1 = n | T = j} p_j
                         = Σ_{j=1}^k (e^{−λ_j} λ_j^m/m!)(e^{−λ_j} λ_j^n/n!) p_j / Σ_{j=1}^k (e^{−λ_j} λ_j^n/n!) p_j
                         = Σ_{j=1}^k e^{−2λ_j} λ_j^{m+n} p_j / [m! Σ_{j=1}^k e^{−λ_j} λ_j^n p_j]

3.6. Some Applications

3.6.1. A List Model

Consider n elements—e_1, e_2, ..., e_n—that are initially arranged in some ordered list. At each unit of time a request is made for one of these elements—e_i being requested, independently of the past, with probability P_i. After being requested, the element is then moved to the front of the list. That is, for instance, if the present ordering is e_1, e_2, e_3, e_4 and e_3 is requested, then the next ordering is e_3, e_1, e_2, e_4.


We are interested in determining the expected position of the element requested after this process has been in operation for a long time. However, before computing this expectation, let us note two possible applications of this model. In the first we have a stack of reference books. At each unit of time a book is randomly selected and is then returned to the top of the stack. In the second application we have a computer receiving requests for elements stored in its memory. The request probabilities for the elements may not be known, so to reduce the average time it takes the computer to locate the element requested (which is proportional to the position of the requested element if the computer locates the element by starting at the beginning and then going down the list), the computer is programmed to replace the requested element at the beginning of the list.

To compute the expected position of the element requested, we start by conditioning on which element is selected. This yields

    E[position of element requested] = Σ_{i=1}^n E[position | e_i is selected] P_i
                                     = Σ_{i=1}^n E[position of e_i | e_i is selected] P_i
                                     = Σ_{i=1}^n E[position of e_i] P_i    (3.16)

where the final equality used that the position of e_i and the event that e_i is selected are independent because, regardless of its position, e_i is selected with probability P_i. Now,

    position of e_i = 1 + Σ_{j≠i} I_j

where

    I_j = 1, if e_j precedes e_i;    I_j = 0, otherwise

and so,

    E[position of e_i] = 1 + Σ_{j≠i} E[I_j] = 1 + Σ_{j≠i} P{e_j precedes e_i}    (3.17)
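The expected position can also be estimated by simulating the move-to-front rule and compared against the closed form 1 + Σ_i P_i Σ_{j≠i} P_j/(P_i + P_j) that the text derives from Equation (3.17); a sketch with a made-up three-element request distribution:

```python
import random

def simulate_expected_position(probs, steps=200_000, seed=1):
    """Long-run average position (1-indexed) of the requested element under move-to-front."""
    rng = random.Random(seed)
    order = list(range(len(probs)))
    total, count = 0, 0
    for t in range(steps):
        i = rng.choices(range(len(probs)), weights=probs)[0]
        pos = order.index(i) + 1
        if t >= steps // 10:  # discard a burn-in period before averaging
            total += pos
            count += 1
        order.remove(i)      # move the requested element to the front
        order.insert(0, i)
    return total / count

probs = [0.5, 0.3, 0.2]
est = simulate_expected_position(probs)
exact = 1 + sum(pi * sum(pj / (pi + pj) for j, pj in enumerate(probs) if j != i)
                for i, pi in enumerate(probs))
assert abs(est - exact) < 0.02  # loose statistical tolerance for the simulation
```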


3.6. Some Applications 139To compute P {e j precedes e i }, note that e j will precede e i if the most recentrequest for either of them was for e j . But given that a request is for either e i ore j , the probability that it is for e j isand, thus,P jP {e j |e i or e j }=P i + P jP jP {e j precedes e i }=P i + P jHence from Equations (3.16) and (3.17) we see thatn∑ ∑E{position of element requested}=1 + P ii=1 j≠iP jP i + P jThis list model will be further analyzed in Section 4.8, where we will assume adifferent reordering rule—namely, that the element requested is moved one closerto the front of the list as opposed to being moved to the front of the list as assumedhere. We will show there that the average position of the requested element is lessunder the one-closer rule than it is under the front-of-the-line rule.3.6.2. A Random GraphA graph consists of a set V of elements called nodes and a set A of pairs ofelements of V called arcs. A graph can be represented graphically by drawingcircles for nodes and drawing lines between nodes i and j whenever (i, j) is anarc. For instance if V ={1, 2, 3, 4} and A ={(1, 2), (1, 4), (2, 3), (1, 2), (3, 3)},then we can represent this graph as shown in Figure 3.1. Note that the arcs haveno direction (a graph in which the arcs are ordered pairs of nodes is called adirected graph); and that in the figure there are multiple arcs connecting nodes 1and 2, and a self-arc (called a self-loop) from node 3 to itself.Figure 3.1. A graph.
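Returning briefly to the list model of Section 3.6.1: the formula E[position] = 1 + Σ_i P_i Σ_{j≠i} P_j/(P_i + P_j) lends itself to a quick numerical check. The sketch below (Python is assumed; the text itself gives no code, and the function names are mine) evaluates the formula and compares it with a direct simulation of the front-of-the-line rule.

```python
import random

def expected_position(p):
    # E[position] = 1 + sum_i P_i * sum_{j != i} P_j / (P_i + P_j)
    # (combining Equations (3.16) and (3.17))
    n = len(p)
    return 1 + sum(p[i] * sum(p[j] / (p[i] + p[j]) for j in range(n) if j != i)
                   for i in range(n))

def simulate(p, steps=100_000, seed=1):
    # long-run average (1-based) position of the requested element
    # under the move-to-front rule
    rng = random.Random(seed)
    order = list(range(len(p)))
    total = 0
    for _ in range(steps):
        e = rng.choices(range(len(p)), weights=p)[0]
        total += order.index(e) + 1
        order.remove(e)
        order.insert(0, e)          # move the requested element to the front
    return total / steps

if __name__ == "__main__":
    p = [0.4, 0.3, 0.2, 0.1]        # illustrative request probabilities
    print(expected_position(p), simulate(p))
```

For equal request probabilities P_i = 1/n the inner ratios are all 1/2, so the formula reduces to (n + 1)/2, the expected position of a uniformly random element, as it should.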


140 3 Conditional Probability and Conditional ExpectationFigure 3.2. A disconnected graph.Figure 3.3.We say that there exists a path from node i to node j, i ≠ j, if there exists asequence of nodes i, i 1 ,...,i k ,j such that (i, i 1 ), (i 1 ,i 2 ),...,(i k ,j) are all arcs.If there is a path between each of the ( n2)distinct pair of nodes we say that thegraph is connected. The graph in Figure 3.1 is connected but the graph in Figure3.2 is not. Consider now the following graph where V ={1, 2,...,n} andA ={(i, X(i)), i = 1,...,n} where the X(i) are independent random variablessuch thatP {X(i) = j}= 1 n ,j = 1, 2,...,nIn other words from each node i we select at random one of the n nodes (includingpossibly the node i itself) and then join node i and the selected node with an arc.Such a graph is commonly referred to as a random graph.We are interested in determining the probability that the random graph so obtainedis connected. As a prelude, starting at some node—say, node 1—let usfollow the sequence of nodes, 1, X(1), X 2 (1),..., where X n (1) = X(X n−1 (1));and define N to equal the first k such that X k (1) is not a new node. In other words,N = 1st k such that X k (1) ∈{1,X(1),...,X k−1 (1)}We can represent this as shown in Figure 3.3 where the arc from X N−1 (1) goesback to a node previously visited.To obtain the probability that the graph is connected we first condition on N toobtainP {graph is connected}=n∑P {connected|N = k}P {N = k} (3.18)k=1


Now given that N = k, the k nodes 1, X(1), ..., X^{k-1}(1) are connected to each other, and there are no other arcs emanating out of these nodes. In other words, if we regard these k nodes as being one supernode, the situation is the same as if we had one supernode and n - k ordinary nodes, with arcs emanating from the ordinary nodes, each arc going into the supernode with probability k/n. The solution in this situation is obtained from Lemma 3.2 by taking r = n - k.

Lemma 3.2 Given a random graph consisting of nodes 0, 1, ..., r and r arcs, namely, (i, Y_i), i = 1, ..., r, where

Y_i = j with probability 1/(r + k), j = 1, ..., r
    = 0 with probability k/(r + k)

then

P{graph is connected} = k/(r + k)

[In other words, for the preceding graph there are r + 1 nodes: r ordinary nodes and one supernode. Out of each ordinary node an arc is chosen. The arc goes to the supernode with probability k/(r + k) and to each of the ordinary ones with probability 1/(r + k). There is no arc emanating out of the supernode.]

Proof The proof is by induction on r. As it is true when r = 1 for any k, assume it true for all values less than r. Now in the case under consideration, let us first condition on the number of arcs (j, Y_j) for which Y_j = 0. This yields

P{connected} = Σ_{i=0}^{r} P{connected | i of the Y_j = 0} C(r, i) (k/(r + k))^i (r/(r + k))^{r-i}   (3.19)

Now given that exactly i of the arcs are into the supernode (see Figure 3.4), the situation for the remaining r - i arcs which do not go into the supernode is the same as if we had r - i ordinary nodes and one supernode, with an arc going out of each of the ordinary nodes, into the supernode with probability i/r and into each ordinary node with probability 1/r. But by the induction hypothesis the probability that this would lead to a connected graph is i/r. Hence,

P{connected | i of the Y_j = 0} = i/r


Figure 3.4. The situation given that i of the r arcs are into the supernode.

and from Equation (3.19)

P{connected} = Σ_{i=0}^{r} (i/r) C(r, i) (k/(r + k))^i (r/(r + k))^{r-i}
             = (1/r) E[binomial(r, k/(r + k))]
             = k/(r + k)

which completes the proof of the lemma.

Hence, as the situation given N = k is exactly as described by Lemma 3.2 when r = n - k, we see that, for the original graph,

P{graph is connected | N = k} = k/n

and, from Equation (3.18),

P{graph is connected} = E(N)/n   (3.20)

To compute E(N) we use the identity

E(N) = Σ_{i=1}^{∞} P{N ≥ i}

which can be proved by defining indicator variables I_i, i ≥ 1, by

I_i = 1, if i ≤ N
    = 0, if i > N


Hence,

N = Σ_{i=1}^{∞} I_i

and so

E(N) = E[Σ_{i=1}^{∞} I_i] = Σ_{i=1}^{∞} E[I_i] = Σ_{i=1}^{∞} P{N ≥ i}   (3.21)

Now the event {N ≥ i} occurs if the nodes 1, X(1), ..., X^{i-1}(1) are all distinct. Hence,

P{N ≥ i} = ((n - 1)/n)((n - 2)/n) ··· ((n - i + 1)/n) = (n - 1)!/((n - i)! n^{i-1})

and so, from Equations (3.20) and (3.21),

P{graph is connected} = (n - 1)! Σ_{i=1}^{n} 1/((n - i)! n^i)
                      = ((n - 1)!/n^n) Σ_{j=0}^{n-1} n^j/j!   (by j = n - i)   (3.22)

We can also use Equation (3.22) to obtain a simple approximate expression for the probability that the graph is connected when n is large. To do so, we first note that if X is a Poisson random variable with mean n, then

P{X < n} = Σ_{j=0}^{n-1} e^{-n} n^j/j!

Since a Poisson random variable with mean n can be regarded as the sum of n independent Poisson random variables each having mean 1, it follows from the


central limit theorem that for n large such a random variable has approximately a normal distribution, and as such has probability 1/2 of being less than its mean. That is, for n large,

P{X < n} ≈ 1/2

and so for n large,

P{graph is connected} ≈ (n - 1)! e^n/(2 n^n)

Now let C denote the number of connected components of the random graph, and let

P_n(i) = P{C = i}
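The exact connectivity probability of Equation (3.22) can be checked against a Monte Carlo experiment. Below is a sketch (Python assumed; the union-find bookkeeping is an implementation choice of mine, not from the text) that compares the closed form with the empirical fraction of connected random graphs.

```python
import math
import random

def p_connected_exact(n):
    # Equation (3.22): P{connected} = (n-1)!/n^n * sum_{j=0}^{n-1} n^j/j!
    return math.factorial(n - 1) / n**n * sum(n**j / math.factorial(j)
                                              for j in range(n))

def p_connected_mc(n, trials=20_000, seed=3):
    # draw X(i) uniform on the n nodes, join i and X(i) by an
    # undirected arc, and test connectivity with union-find
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        parent = list(range(n))
        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]   # path halving
                a = parent[a]
            return a
        for i in range(n):
            parent[find(i)] = find(rng.randrange(n))
        hits += (len({find(i) for i in range(n)}) == 1)
    return hits / trials

if __name__ == "__main__":
    print(p_connected_exact(5), p_connected_mc(5))
```

A hand check for n = 2: the graph is disconnected only if X(1) = 1 and X(2) = 2, which has probability 1/4, and indeed the formula gives (1!/4)(1 + 2) = 3/4.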


3.6. Some Applications 145Figure 3.5. A graph having three connected components.where we use the notation P n (i) to make explicit the dependence on n, the numberof nodes. Since a connected graph is by definition a graph consisting of exactlyone component, from Equation (3.22) we haveP n (1) = P {C = 1}=(n − 1)!n nn−1∑j=0n jj!(3.23)To obtain P n (2), the probability of exactly two components, let us first fix attentionon some particular node—say, node 1. In order that a given set of k − 1other nodes—say, nodes 2,...,k—will along with node 1 constitute one connectedcomponent, and the remaining n − k a second connected component, wemust have(i) X(i) ∈{1, 2,...,k}, for all i = 1,...,k.(ii) X(i) ∈{k + 1,...,n}, for all i = k + 1,...,n.(iii) The nodes 1, 2,...,k form a connected subgraph.(iv) The nodes k + 1,...,nform a connected subgraph.The probability of the preceding occurring is clearly( kn) k ( ) n − k n−kP k (1)P n−k (1)nand because there are ( n−1k−1)ways of choosing a set of k − 1 nodes from the nodes2 through n,wehave∑n−1( )( ) n − 1 k k ( ) n − k n−kP n (2) =P k (1)P n−k (1)k − 1 n nk=1


146 3 Conditional Probability and Conditional ExpectationFigure 3.6. Acycle.and so P n (2) can be computed from Equation (3.23). In general, the recursiveformula for P n (i) is given byP n (i) =n−i+1∑k=1( n − 1k − 1)( kn) k ( ) n − k n−kP k (1)P n−k (i − 1)To compute E[C], the expected number of connected components, first notethat every connected component of our random graph must contain exactly onecycle [a cycle is a set of arcs of the form (i, i 1 ), (i 1 ,i 2 ),...,(i k−1 ,i k ), (i k ,i)fordistinct nodes i, i 1 ,...,i k ]. For example, Figure 3.6 depicts a cycle.The fact that every connected component of our random graph must containexactly one cycle is most easily proved by noting that if the connected componentconsists of r nodes, then it must also have r arcs and, hence, must contain exactlyone cycle (why?). Thus, we see thatE[C]=E[number of cycles][ ∑]= E I(S)Sn= ∑ SE[I(S)]where the sum is over all subsets S ⊂{1, 2,...,n} and{1, if the nodes in S are all the nodes of a cycleI(S)=0, otherwiseNow, if S consists of k nodes, say 1,...,k, thenE[I(S)]=P {1,X(1),...,X k−1 (1) are all distinct and contained in1,...,kand X k (1) = 1}= k − 1 k − 2n n ··· 1 1n n=(k − 1)!n k


Hence, because there are ( nk)subsets of size k we see thatE[C]=n∑( ) n (k − 1)!kk=13.6.3. Uniform Priors, Polya’s Urn Model, andBose–Einstein Statisticsn k3.6. Some Applications 147Suppose that n independent trials, each of which is a success with probability p,are performed. If we let X denote the total number of successes, then X is abinomial random variable such that( ) nP {X = k|p}= p k (1 − p) n−k , k= 0, 1,...,nkHowever, let us now suppose that whereas the trials all have the same successprobability p, its value is not predetermined but is chosen according to a uniformdistribution on (0, 1). (For instance, a coin may be chosen at random from a hugebin of coins representing a uniform spread over all possible values of p, the coin’sprobability of coming up heads. The chosen coin is then flipped n times.) In thiscase, by conditioning on the actual value of p,wehavethatNow, it can be shown that∫ 1P {X = k}= P {X = k|p}f(p)dp0∫ 1( ) n= p k (1 − p) n−k dpk0∫ 10p k (1 − p) n−k dp =k!(n − k)!(n + 1)!(3.24)and thus( ) n k!(n − k)!P {X = k}=k (n + 1)!= 1 , k= 0, 1,...,n (3.25)n + 1In other words, each of the n + 1 possible values of X is equally likely.
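The uniform-prior result just derived, that each of the n + 1 values of X is equally likely, can be verified both from the Beta integral of Equation (3.24) and by simulation. The sketch below (Python assumed; names are mine) does both.

```python
import random
from math import comb, factorial

def p_x_equals_k(n, k):
    # Equation (3.24): integral of p^k (1-p)^{n-k} over (0,1) equals
    # k!(n-k)!/(n+1)!, so P{X = k} = C(n,k) k!(n-k)!/(n+1)! = 1/(n+1)
    return comb(n, k) * factorial(k) * factorial(n - k) / factorial(n + 1)

def simulate_counts(n, trials=100_000, seed=5):
    # draw p ~ Uniform(0,1), then X ~ Binomial(n, p); return the
    # empirical distribution of X, which should be flat on {0,...,n}
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        p = rng.random()
        x = sum(rng.random() < p for _ in range(n))
        counts[x] += 1
    return [c / trials for c in counts]

if __name__ == "__main__":
    print([round(f, 3) for f in simulate_counts(5)])   # all near 1/6
```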


148 3 Conditional Probability and Conditional ExpectationAs an alternate way of describing the preceding experiment, let us compute theconditional probability that the (r + 1)st trial will result in a success given a totalof k successes (and r − k failures) in the first r trials.P {(r + 1)st trial is a success|k successes in first r}P {(r + 1)st is a success, ksuccesses in first r trials}=P {k successes in first r trials}∫ 10P {(r + 1)st is a success, kin first r|p} dp=1/(r + 1)∫ 1( ) r= (r + 1) p k+1 (1 − p) r−k dpk0( ) r (k + 1)!(r − k)!= (r + 1)k (r + 2)!by Equation (3.24)= k + 1(3.26)r + 2That is, if the first r trials result in k successes, then the next trial will be a successwith probability (k + 1)/(r + 2).It follows from Equation (3.26) that an alternative description of the stochasticprocess of the successive outcomes of the trials can be described as follows: Thereis an urn which initially contains one white and one black ball. At each stage a ballis randomly drawn and is then replaced along with another ball of the same color.Thus, for instance, if of the first r balls drawn, k were white, then the urn at thetime of the (r + 1)th draw would consist of k + 1 white and r − k + 1 black, andthus the next ball would be white with probability (k + 1)/(r + 2). If we identifythe drawing of a white ball with a successful trial, then we see that this yields analternate description of the original model. This latter urn model is called Polya’surn model.Remarks (i) In the special case when k = r, Equation (3.26) is sometimescalled Laplace’s rule of succession, after the French mathematician Pierre deLaplace. In Laplace’s era, this “rule” provoked much controversy, for people attemptedto employ it in diverse situations where its validity was questionable. 
For instance, it was used to justify such propositions as "If you have dined twice at a restaurant and both meals were good, then the next meal also will be good with probability 3/4," and "Since the sun has risen the past 1,826,213 days, so will it rise tomorrow with probability 1,826,214/1,826,215." The trouble with such claims resides in the fact that it is not at all clear that the situations they describe can be modeled as consisting of independent trials having a common probability of success which is itself uniformly chosen.
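The urn description of the process is also easy to test empirically. Since drawing a white ball corresponds to a success, Equation (3.25) says the number of white balls drawn in r stages of Polya's urn should be uniform on {0, 1, ..., r}. A sketch (Python assumed; the helper names are mine):

```python
import random

def polya_white_count(r, rng):
    # urn starts with one white and one black ball; each drawn ball is
    # replaced together with another ball of the same color
    white, black, drawn_white = 1, 1, 0
    for _ in range(r):
        if rng.random() < white / (white + black):
            white += 1
            drawn_white += 1
        else:
            black += 1
    return drawn_white

def white_count_freqs(r, trials=60_000, seed=6):
    rng = random.Random(seed)
    counts = [0] * (r + 1)
    for _ in range(trials):
        counts[polya_white_count(r, rng)] += 1
    return [c / trials for c in counts]

if __name__ == "__main__":
    print([round(f, 3) for f in white_count_freqs(4)])   # all near 1/5
```

Note that the update probability white/(white + black) is exactly the (k + 1)/(r + 2) rule of Equation (3.26) when k of the first r draws were white.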


3.6. Some Applications 149(ii) In the original description of the experiment, we referred to the successivetrials as being independent, and in fact they are independent when the successprobability is known. However, when p is regarded as a random variable, thesuccessive outcomes are no longer independent because knowing whether an outcomeis a success or not gives us some information about p, which in turn yieldsinformation about the other outcomes.The preceding can be generalized to situations in which each trial has morethan two possible outcomes. Suppose that n independent trials, each resulting inone of m possible outcomes 1,...,m, with respective probabilities p 1 ,...,p mare performed. If we let X i denote the number of type i outcomes that result inthe n trials, i = 1,...,m, then the vector X 1 ,...,X m will have the multinomialdistribution given byP {X 1 = x 1 ,X 2 = x 2 ,...,X m = x m |p}=n!x 1 !···x m ! px 11 px 22 ···px m mwhere x 1 ,...,x m is any vector of nonnegative integers that sum to n. Nowletussuppose that the vector p = (p 1 ,...,p m ) is not specified, but instead is chosen bya “uniform” distribution. Such a distribution would be of the form{ c, 0 pi 1,i = 1,...,m, ∑ mf(p 1 ,...,p m ) =1 p i = 10, otherwiseThe preceding multivariate distribution is a special case of what is known as theDirichlet distribution, and it is not difficult to show, using the fact that the distributionmust integrate to 1, that c = (m − 1)!.The unconditional distribution of the vector X is given by∫∫P {X 1 = x 1 ,...,X m = x m }=× f(p 1 ,...,p m )dp 1 ···dp m =∫···P {X 1 = x 1 ,...,X m = x m |p 1 ,...,p m }∫∫(m − 1)!n!x 1 !···x m !∫···0p i 1∑ m1 p i =1p x 11 ···px m m dp 1 ···dp mNow it can be shown that∫∫ ∫···0p i 1∑ m1 p i =1p x 11 ···px m m dp 1 ···dp m =x 1 !···x m !( ∑ m1x i + m − 1 ) !(3.27)


150 3 Conditional Probability and Conditional Expectationand thus, using the fact that ∑ m1 x i = n, we see thatn!(m − 1)!P {X 1 = x 1 ,...,X m = x m }=(n + m − 1)!( ) n + m − 1 −1=(3.28)m − 1Hence, all of the ( n + m − 1) (m − 1 possible outcomes [there are n + m − 1)m − 1 possible nonnegativeinteger valued solutions of x 1 +···+x m = n] of the vector (X 1 ,...,X m )are equally likely. The probability distribution given by Equation (3.28) is sometimescalled the Bose–Einstein distribution.To obtain an alternative description of the foregoing, let us compute the conditionalprobability that the (n + 1)st outcome is of type j if the first n trials haveresulted in x i type i outcomes, i = 1,...,m, ∑ m1 x i = n. This is given byP {(n + 1)st is j|x i type i in first n, i = 1,...,m}= P {(n + 1)st is j, x i type i in first n, i = 1,...,m}P {x i type i in first n, i = 1,...,m}∫∫ ∫n!(m − 1)!··· p x 1x= 1 !···x m !1 ···px j +1j···p x m m dp 1 ···dp m( ) n + m − 1 −1m − 1where the numerator is obtained by conditioning on the p vector and the denominatoris obtained by using Equation (3.28). By Equation (3.27), we have thatP {(n + 1)st is j|x i type i in first n, i = 1,...,m}(x j + 1)n!(m − 1)!(n + m)!=(m − 1)!n!(n + m − 1)!= x j + 1n + m(3.29)Using Equation (3.29), we can now present an urn model description of thestochastic process of successive outcomes. Namely, consider an urn that initiallycontains one of each of m types of balls. Balls are then randomly drawn and arereplaced along with another of the same type. Hence, if in the first n drawingsthere have been a total of x j type j balls drawn, then the urn immediately before


3.6. Some Applications 151the (n+1)st draw will contain x j +1 type j balls out of a total of m+n, and so theprobability of a type j on the (n + 1)st draw will be given by Equation (3.29).Remark Consider a situation where n particles are to be distributed at randomamong m possible regions; and suppose that the regions appear, at least before theexperiment, to have the same physical characteristics. It would thus seem that themost likely distribution for the number of particles that fall into each of the regionsis the multinomial distribution with p i ≡ 1/m. (This, of course, would correspondto each particle, independent of the others, being equally likely to fall in any ofthe m regions.) Physicists studying how particles distribute themselves observedthe behavior of such particles as photons and atoms containing an even numberof elementary particles. However, when they studied the resulting data, they wereamazed to discover that the observed frequencies did not follow the multinomialdistribution but rather seemed to follow the Bose–Einstein distribution. They wereamazed because they could not imagine a physical model for the distribution ofparticles which would result in all possible outcomes being equally likely. (Forinstance, if 10 particles are to distribute themselves between two regions, it hardlyseems reasonable that it is just as likely that both regions will contain 5 particlesas it is that all 10 will fall in region 1 or that all 10 will fall in region 2.)However, from the results of this section we now have a better understandingof the cause of the physicists’ dilemma. In fact, two possible hypotheses presentthemselves. First, it may be that the data gathered by the physicists were actuallyobtained under a variety of different situations, each having its own characteristicp vector which gave rise to a uniform spread over all possible p vectors. 
A secondpossibility (suggested by the urn model interpretation) is that the particles selecttheir regions sequentially and a given particle’s probability of falling in a region isroughly proportional to the fraction of the landed particles that are in that region.(In other words, the particles presently in a region provide an “attractive” force onelements that have not yet landed.)3.6.4. Mean Time for PatternsLet X = (X 1 ,X 2 ,...) be a sequence of independent and identically distributeddiscrete random variables such thatp i = P {X j = i}For a given subsequence, or pattern, i 1 ,...,i n let T = T(i 1 ,...,i n ) denote thenumber of random variables that we need to observe until the pattern appears.For instance, if the subsequence of interest is 3, 5, 1 and the sequence is X =(5, 3, 1, 3, 5, 3, 5, 1, 6, 2,...)then T = 8. We want to determine E[T ].To begin, let us consider whether the pattern has an overlap, where we say thatthe pattern i 1 ,i 2 ,...,i n has an overlap if for some k,1 k


152 3 Conditional Probability and Conditional Expectationits final k elements is the same as that of its first k elements. That is, it has anoverlap if for some 1 kj,(X j+1 ,...,X j+n ) = (i 1 ,...,i n )} (3.30)To verify (3.30), note first that T = j + n clearly implies both that T>jand that(X j+1 ,...,X j+n ) = (i 1 ,...,i n ). On the other hand, suppose thatT>j and (X j+1 ,...,X j+n ) = (i 1 ,...,i n ) (3.31)Let kj,(X j+1 ,...,X j+n ) = (i 1 ,...,i n )}However, whether T>jis determined by the values X 1 ,...,X j , and is thusindependent of X j+1 ,...,X j+n . Consequently,whereP {T = j + n}=P {T >j}P {(X j+1 ,...,X j+n ) = (i 1 ,...,i n )}= P {T >j}pp = p i1 p i2 ···p inSumming both sides of the preceding over all j yieldsor∞∑∞∑1 = P {T = j + n}=p P {T >j}=pE[T ]j=0j=0E[T ]= 1 pCase 2: The pattern has overlaps.For patterns having overlaps there is a simple trick that will enable us to obtainE[T ] by making use of the result for nonoverlapping patterns. To make the analy-


3.6. Some Applications 153sis more transparent, consider a specific pattern, say P = (3, 5, 1, 3, 5). Let x bea value that does not appear in the pattern, and let T x denote the time until thepattern P x = (3, 5, 1, 3, 5,x) appears. That is, T x is the time of occurrence of thenew pattern that puts x at the end of the original pattern. Because x did not appearin the original pattern it follows that the new pattern has no overlaps; thus,E[T x ]= 1p x pwhere p = ∏ nj=1 p ij = p 2 3 p2 5 p 1. Because the new pattern can occur only after theoriginal one, writeT x = T + Awhere T is the time at which the pattern P = (3, 5, 1, 3, 5) occurs, and A is theadditional time after the occurrence of the pattern P until P x occurs. Also, letE[T x |i 1 ,...i r ] denote the expected additional time after time r until the patternP x appears given that the first r data values are i 1 ,...,i r . Conditioning on X,thenext data value after the occurrence of the pattern (3, 5, 1, 3, 5), gives that⎧1 + E[T x |3, 5, 1], if i = 1⎪⎨1 + E[TE[A|X = i]=x |3], if i = 31, if i = x⎪⎩1 + E[T x ], if i ≠ 1, 3,xTherefore,E[T x ]=E[T ]+E[A]= E[T ]+1 + E[T x |3, 5, 1]p 1 + E[T x |3]p 3 + E[T x ](1 − p 1 − p 3 − p x )(3.32)Butgiving thatSimilarly,E[T x ]=E[T(3, 5, 1)]+E[T x |3, 5, 1]E[T x |3, 5, 1]=E[T x ]−E[T(3, 5, 1)]E[T x |3]=E[T x ]−E[T(3)]Substituting back into Equation (3.32) givesp x E[T x ]=E[T ]+1 − p 1 E[T(3, 5, 1)]−p 3 E[T(3)]


154 3 Conditional Probability and Conditional ExpectationBut, by the result in the nonoverlapping case,1E[T(3, 5, 1)]= , E[T(3)]= 1 p 3 p 5 p 1 p 3yielding the resultE[T ]=p x E[T x ]+ 1p 3 p 5= 1 p + 1p 3 p 5For another illustration of the technique, let us reconsider Example 3.14, whichis concerned with finding the expected time until n consecutive successes occurin independent Bernoulli trials. That is, we want E[T ], when the patternis P = (1, 1,...,1). Then, with x ≠ 1 we consider the nonoverlapping patternP x = (1,...,1,x),and let T x be its occurrence time. With A and X as previouslydefined, we have that⎧⎨1 + E[A], if i = 1E[A|X = i]= 1, if i = x⎩1 + E[T x ], if i ≠ 1,xTherefore,E[A]=1 + E[A]p 1 + E[T x ](1 − p 1 − p x )orE[A]= 11 − p 1+ E[T x ] 1 − p 1 − p x1 − p 1Consequently,E[T ]=E[T x ]−E[A]= p xE[T x ]−11 − p 1= (1/p 1) n − 11 − p 1where the final equality used that E[T x ]= 1p n 1 p x .The mean occurrence time of any overlapping pattern P = (i 1 ,...,i n ) can beobtained by the preceding method. Namely, let T x be the time until the nonoverlappingpattern P x = (i 1 ,...,i n ,x)occurs; then use the identityE[T x ]=E[T ]+E[A]


3.6. Some Applications 155to relate E[T ] and E[T x ]= 1pp x; then condition on the next data value after Poccurs to obtain an expression for E[A] in terms of quantities of the formE[T x |i 1 ,...,i r ]=E[T x ]−E[T(i 1 ,...,i r )]If (i 1 ,...,i r ) is nonoverlapping, use the nonoverlapping result to obtainE[T(i 1 ,...,i r )]; otherwise, repeat the process on the subpattern (i 1 ,...,i r ).Remark We can utilize the preceding technique even when the patterni 1 ,...,i n includes all the distinct data values. For instance, in coin tossing the patternof interest might be h, t, h. Even in such cases, we should let x be a data valuethat is not in the pattern and use the preceding technique (even though p x = 0).Because p x will appear only in the final answer in the expression p x E[T x ]= p xp x p ,by interpreting this fraction as 1/p we obtain the correct answer. (A rigorous approach,yielding the same result, would be to reduce one of the positive p i by ɛ,take p x = ɛ, solveforE[T ], and then let ɛ go to 0.) 3.6.5. The k-Record Values of Discrete Random VariablesLet X 1 ,X 2 ,... be independent and identically distributed random variableswhose set of possible values is the positive integers, and let P {X = j}, j 1,denote their common probability mass function. Suppose that these random variablesare observed in sequence, and say that X n is a k-record value ifX i X nfor exactly k of the values i, i = 1,...,nThat is, the nth value in the sequence is a k-record value if exactly k of the first nvalues (including X n ) are at least as large as it. Let R k denote the ordered set ofk-record values.It is a rather surprising result that not only do the sequences of k-record valueshave the same probability distributions for all k, these sequences are also independentof each other. This result is known as Ignatov’s theorem.Ignatov’s Theoremrandom vectors.R k ,k 1, are independent and identically distributedProof Define a series of subsequences of the data sequence X 1 ,X 2 ,... 
by letting the ith subsequence consist of all data values that are at least as large as i, i ≥ 1. For instance, if the data sequence is

2, 5, 1, 6, 9, 8, 3, 4, 1, 5, 7, 8, 2, 1, 3, 4, 2, 5, 6, 1, ...


156 3 Conditional Probability and Conditional Expectationthen the subsequences are as follows: 1: 2, 5, 1, 6, 9, 8, 3, 4, 1, 5, 7, 8, 2, 1, 3, 4, 2, 5, 6, 1,... 2: 2, 5, 6, 9, 8, 3, 4, 5, 7, 8, 2, 3, 4, 2, 5, 6,... 3: 5, 6, 9, 8, 3, 4, 5, 7, 8, 3, 4, 5, 6,...and so on.Let Xj i be the jth element of subsequence i. That is, Xi jis the jth data valuethat is at least as large as i. An important observation is that i is a k-record valueif and only if Xk i = i. That is, i will be a k-record value if and only if the kthvalue to be at least as large as i is equal to i. (For instance, for the preceding data,since the fifth value to be at least as large as 3 is equal to 3 it follows that 3 is afive-record value.) Now, it is not difficult to see that, independent of which valuesin the first subsequence are equal to 1, the values in the second subsequence areindependent and identically distributed according to the mass functionP {value in second subsequence = j}=P {X = j|X 2}, j 2Similarly, independent of which values in the first subsequence are equal to 1 andwhich values in the second subsequence are equal to 2, the values in the thirdsubsequence are independent and identically distributed according to the massfunctionP {value in third subsequence = j}=P {X = j|X 3}, j 3and so on. It therefore follows that the events {Xj i = i}, i 1, j 1, are independentandP {i is a k-record value}=P {Xk i = i}=P {X = i|X i}It now follows from the independence of the events {Xk i = i}, i 1, and thefact that P {i is a k-record value} does not depend on k, that R k has the samedistribution for all k 1. In addition, it follows from the independence of theevents {Xk i = 1}, that the random vectors R k,k 1, are also independent. Suppose now that the X i ,i 1 are independent finite-valued random variableswith probability mass functionp i = P {X = i},i = 1,...,mand letT = min{n: X i X n for exactly k of the values i, i = 1,...,n}denote the first k-record index. We will now determine its mean.


3.6. Some Applications 157Proposition 3.3Let λ i = p i / ∑ mj=i p j ,i= 1,...,m. Thenm−1∑E[T ]=k + (k − 1)Proof To begin, suppose that the observed random variables X 1 X 2 ,... takeon one of the values i, i + 1,...,mwith respective probabilitiesp jP {X = j}=, j = i,...,mp i +···+p mLet T i denote the first k-record index when the observed data have the precedingmass function, and note that since the each data value is at least i it follows thatthe k-record value will equal i, and T i will equal k, ifX k = i. As a result,E[T i |X k = i]=kOn the other hand, if X k >ithen the k-record value will exceed i, and so all datavalues equal to i can be disregarded when searching for the k-record value. Inaddition, since each data value greater than i will have probability mass functionp jP {X = j|X>i}=, j = i + 1,...,mp i+1 +···+p mit follows that the total number of data values greater than i that need be observeduntil a k-record value appears has the same distribution as T i+1 . Hence,i=1λ iE[T i |X k >i]=E[T i+1 + N i |X k >i]where T i+1 is the total number of variables greater than i that we need observeto obtain a k-record, and N i is the number of values equal to i that are observedin that time. Now, given that X k >iand that T i+1 = n(n k) it follows that thetime to observe T i+1 values greater than i has the same distribution as the numberof trials to obtain n successes given that trial k is a success and that each trial isindependently a success with probability 1 − p i / ∑ ji p j = 1 − λ i . Thus, sincethe number of trials needed to obtain a success is a geometric random variablewith mean 1/(1 − λ i ), we see thatE[T i |T i+1 ,X k >i]=1 + T i+1 − 1= T i+1 − λ i1 − λ i 1 − λ iTaking expectations gives thatE[T i |X k >i]=E[Ti+1 − λ i1 − λ i∣ ∣∣Xk>i]= E[T i+1]−λ i1 − λ i


158 3 Conditional Probability and Conditional ExpectationThus, upon conditioning on whether X k = i, we obtainE[T i ]=E[T i |X k = i]λ i + E[T i |X k >i](1 − λ i )= (k − 1)λ i + E[T i+1 ]Starting with E[T m ]=k, the preceding gives thatIn general,E[T m−1 ]=(k − 1)λ m−1 + kE[T m−2 ]=(k − 1)λ m−2 + (k − 1)λ m−1 + k= (k − 1)m−1∑j=m−2λ j + kE[T m−3 ]=(k − 1)λ m−3 + (k − 1)= (k − 1)m−1∑j=m−3λ j + km−1∑E[T i ]=(k − 1) λ j + kj=im−1∑j=m−2λ j + kand the result follows since T = T 1 .3.7. An Identity for Compound Random VariablesLet X 1 ,X 2 ,... be a sequence of independent and identically distributed randomvariables, and let S n = ∑ ni=1 X i be the sum of the first n of them, n 0, whereS 0 = 0. Recall that if N is a nonnegative integer valued random variable that isindependent of the sequence X 1 ,X 2 ,...thenS N =N∑i=1is said to be a compound random variable, with the distribution of N called thecompounding distribution. In this subsection we will first derive an identity involvingsuch random variables. We will then specialize to where the X i are positiveinteger valued random variables, prove a corollary of the identity, and thenX i


3.7. An Identity for Compound Random Variables 159use this corollary to develop a recursive formula for the probability mass functionof S N , for a variety of common compounding distributions.To begin, let M be a random variable that is independent of the sequenceX 1 ,X 2 ,..., and which is such thatP {M = n}=nP {N = n}, n= 1, 2,...E[N]Proposition 3.4 The Compound Random Variable IdentityFor any function hProofE[S N h(S N )]=E[N]E[X 1 h(S M )]N∑E[S N h(S N )]=E[ X i h(S N )]=i=1∞∑ N∑E[ X i h(S N )|N = n]P {N = n}n=0i=1(by conditioning on N)∞∑ n∑= E[ X i h(S n )|N = n]P {N = n}n=0n=0i=1∞∑ n∑= E[ X i h(S n )]P {N = n}i=1(by independence of N and X 1 ,...,X n )∞∑ n∑= E[X i h(S n )]P {N = n}n=0 i=1Now, because X 1 ,...,X n are independent and identically distributed, andh(S n ) = h(X 1 +···+X n ) is a symmetric function of X 1 ,...,X n , it follows thatthe distribution of X i h(S n ) is the same for all i = 1,...,n.Therefore, continuingthe preceding string of equalities yieldsE[S N h(S N )]=∞∑nE[X 1 h(S n )]P {N = n}n=0= E[N]∞∑E[X 1 h(S n )]P {M = n} (definition of M)n=0


160 3 Conditional Probability and Conditional Expectation∞∑= E[N] E[X 1 h(S n )|M = n]P {M = n}n=0(independence of M and X 1 ,...,X n )∞∑= E[N] E[X 1 h(S M )|M = n]P {M = n}n=0= E[N]E[X 1 h(S M )]which proves the proposition.Suppose now that the X i are positive integer valued random variables, and letα j = P {X 1 = j}, j >0The successive values of P {S N = k} can often be obtained from the followingcorollary to Proposition 3.4.Corollary 3.5P {S N = 0}=P {N = 0}P {S N = k}= 1 k∑k E[N] jα j P {S M−1 = k − j}, k>0j=1ProofFor k fixed, let{ 1, if x = kh(x) =0, if x ≠ kand note that S N h(S N ) is either equal to k if S N = k or is equal to 0 otherwise.Therefore,and the compound identity yieldsE[S N h(S N )]=kP{S N = k}kP{S N = k}=E[N]E[X 1 h(S M )]∞∑= E[N] E[X 1 h(S M ))|X 1 = j]α jj=1


3.7. An Identity for Compound Random Variables 161∞∑= E[N] jE[h(S M )|X 1 = j]α jj=1∞∑= E[N] jP{S M = k|X 1 = j}α j (3.33)j=1Now,{ M}∑∣P {S M = k|X 1 = j}=P X i = k∣X 1 = ji=1{}M∑∣= P j + X i = k∣X 1 = ji=2{}M∑= P j + X i = k{= P j +i=2M−1∑i=1X i = k= P {S M−1 = k − j}The next to last equality followed because X 2 ,...,X M and X 1 ,...,X M−1 havethe same joint distribution; namely that of M − 1 independent random variablesthat all have the distribution of X 1 , where M − 1 is independent of these randomvariables. Thus the proof follows from Equation (3.33). When the distributions of M − 1 and N are related, the preceding corollarycan be a useful recursion for computing the probability mass function of S N ,asisillustrated in the following subsections.}3.7.1. Poisson Compounding DistributionIf N is the Poisson distribution with mean λ, thenP {M − 1 = n}=P {M = n + 1}=(n + 1)P {N = n + 1}E[N]


= (1/λ)(n + 1) e^{-λ} λ^{n+1}/(n + 1)!
= e^{-λ} λ^n/n!

Consequently, M - 1 is also Poisson with mean λ. Thus, with

P_n = P{S_N = n}

the recursion given by Corollary 3.5 can be written

P_0 = e^{-λ}
P_k = (λ/k) Σ_{j=1}^{k} j α_j P_{k-j}, k > 0

Remark When the X_i are identically 1, the preceding recursion reduces to the well-known identity for a Poisson random variable having mean λ:

P{N = 0} = e^{-λ}
P{N = n} = (λ/n) P{N = n - 1}, n ≥ 1

Example 3.30 Let S be a compound Poisson random variable with λ = 4 and

P{X_i = i} = 1/4, i = 1, 2, 3, 4

Let us use the recursion given by Corollary 3.5 to determine P{S = 5}. It gives

P_0 = e^{-4}
P_1 = λ α_1 P_0 = e^{-4}
P_2 = (λ/2)(α_1 P_1 + 2 α_2 P_0) = (3/2) e^{-4}
P_3 = (λ/3)(α_1 P_2 + 2 α_2 P_1 + 3 α_3 P_0) = (13/6) e^{-4}
P_4 = (λ/4)(α_1 P_3 + 2 α_2 P_2 + 3 α_3 P_1 + 4 α_4 P_0) = (73/24) e^{-4}
P_5 = (λ/5)(α_1 P_4 + 2 α_2 P_3 + 3 α_3 P_2 + 4 α_4 P_1 + 5 α_5 P_0) = (381/120) e^{-4}
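The Poisson form of the Corollary 3.5 recursion is a few lines of code. The sketch below (Python assumed; the function name is mine) reproduces the computation of Example 3.30.

```python
from math import exp

def compound_poisson_pmf(lam, alpha, kmax):
    """P{S_N = k}, k = 0..kmax, for N ~ Poisson(lam) and
    alpha[j] = P{X_i = j}, via the recursion of Corollary 3.5:
        P_0 = e^{-lam}
        P_k = (lam/k) * sum_{j=1}^{k} j * alpha_j * P_{k-j}
    """
    P = [exp(-lam)]
    for k in range(1, kmax + 1):
        P.append(lam / k * sum(j * alpha.get(j, 0.0) * P[k - j]
                               for j in range(1, k + 1)))
    return P

# the distribution of Example 3.30
P = compound_poisson_pmf(4.0, {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}, 5)
```

Running the recursion far enough out (say kmax = 80 here, well past the mean λE[X] = 10) and summing gives total mass 1, a convenient consistency check.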


3.7.2. Binomial Compounding Distribution

Suppose that $N$ is a binomial random variable with parameters $r$ and $p$. Then,
$$
P\{M - 1 = n\} = \frac{(n+1)P\{N = n+1\}}{E[N]}
= \frac{n+1}{rp}\binom{r}{n+1}p^{n+1}(1-p)^{r-n-1}
$$
$$
= \frac{n+1}{rp}\,\frac{r!}{(r-1-n)!\,(n+1)!}\,p^{n+1}(1-p)^{r-1-n}
= \frac{(r-1)!}{(r-1-n)!\,n!}\,p^{n}(1-p)^{r-1-n}
$$

Thus, $M-1$ is a binomial random variable with parameters $r-1$, $p$.

Fixing $p$, let $N(r)$ be a binomial random variable with parameters $r$ and $p$, and let
$$
P_r(k) = P\{S_{N(r)} = k\}
$$
Then, Corollary 3.5 yields that
$$
P_r(0) = (1-p)^r
$$
$$
P_r(k) = \frac{rp}{k}\sum_{j=1}^{k} j\,\alpha_j\,P_{r-1}(k-j),\qquad k > 0
$$

For instance, letting $k$ equal 1, then 2, and then 3 gives
$$
P_r(1) = rp\,\alpha_1(1-p)^{r-1}
$$
$$
P_r(2) = \frac{rp}{2}\big[\alpha_1 P_{r-1}(1) + 2\alpha_2 P_{r-1}(0)\big]
= \frac{rp}{2}\big[(r-1)p\,\alpha_1^2(1-p)^{r-2} + 2\alpha_2(1-p)^{r-1}\big]
$$
$$
P_r(3) = \frac{rp}{3}\big[\alpha_1 P_{r-1}(2) + 2\alpha_2 P_{r-1}(1) + 3\alpha_3 P_{r-1}(0)\big]
$$
$$
= \alpha_1\,\frac{rp}{3}\,\frac{(r-1)p}{2}\big[(r-2)p\,\alpha_1^2(1-p)^{r-3} + 2\alpha_2(1-p)^{r-2}\big]
+ 2\alpha_2\,\frac{rp}{3}\,(r-1)p\,\alpha_1(1-p)^{r-2}
+ \alpha_3\,rp\,(1-p)^{r-1}
$$
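The binomial recursion can be implemented the same way, recursing on both $r$ and $k$; a sketch (names mine), with $P_0(k) = \mathbf{1}\{k = 0\}$ as the base case:

```python
from functools import lru_cache

def binomial_compound_pmf(alpha, p):
    """Return P(r, k) = P{S_N(r) = k} for N(r) ~ Binomial(r, p),
    via P_r(k) = (rp/k) * sum_j j*alpha[j]*P_{r-1}(k-j)."""
    @lru_cache(maxsize=None)
    def P(r, k):
        if k == 0:
            return (1 - p) ** r
        if r == 0:
            return 0.0          # S_N(0) = 0, so P{S_N(0) = k} = 0 for k > 0
        return r * p / k * sum(j * alpha[j] * P(r - 1, k - j)
                               for j in range(1, min(k, len(alpha) - 1) + 1))
    return P

# e.g. r = 5, p = 0.3, X equally likely to be 1 or 2
P = binomial_compound_pmf((0, 0.5, 0.5), 0.3)
```

The recursion reproduces the closed forms displayed above for $P_r(1)$ and $P_r(2)$.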


3.7.3. A Compounding Distribution Related to the Negative Binomial

Suppose, for a fixed value of $p$, $0 < p < 1$,


$$
P_r(2) = \frac{r(1-p)}{2p}\big[\alpha_1 P_{r+1}(1) + 2\alpha_2 P_{r+1}(0)\big]
= \frac{r(1-p)}{2p}\big[\alpha_1^2(r+1)p^{r+1}(1-p) + 2\alpha_2\,p^{r+1}\big]
$$
$$
P_r(3) = \frac{r(1-p)}{3p}\big[\alpha_1 P_{r+1}(2) + 2\alpha_2 P_{r+1}(1) + 3\alpha_3 P_{r+1}(0)\big]
$$
and so on.

Exercises

1. If $X$ and $Y$ are both discrete, show that $\sum_x p_{X|Y}(x|y) = 1$ for all $y$ such that $p_Y(y) > 0$.

*2. Let $X_1$ and $X_2$ be independent geometric random variables having the same parameter $p$. Guess the value of
$$
P\{X_1 = i \mid X_1 + X_2 = n\}
$$
Hint: Suppose a coin having probability $p$ of coming up heads is continually flipped. If the second head occurs on flip number $n$, what is the conditional probability that the first head was on flip number $i$, $i = 1,\ldots,n-1$?
Verify your guess analytically.

3. The joint probability mass function of $X$ and $Y$, $p(x,y)$, is given by
$$
p(1,1) = \tfrac{1}{9},\quad p(2,1) = \tfrac{1}{3},\quad p(3,1) = \tfrac{1}{9},
$$
$$
p(1,2) = \tfrac{1}{9},\quad p(2,2) = 0,\quad p(3,2) = \tfrac{1}{18},
$$
$$
p(1,3) = 0,\quad p(2,3) = \tfrac{1}{6},\quad p(3,3) = \tfrac{1}{9}
$$
Compute $E[X\mid Y = i]$ for $i = 1, 2, 3$.

4. In Exercise 3, are the random variables $X$ and $Y$ independent?

5. An urn contains three white, six red, and five black balls. Six of these balls are randomly selected from the urn. Let $X$ and $Y$ denote respectively the number of white and black balls selected. Compute the conditional probability mass function of $X$ given that $Y = 3$. Also compute $E[X\mid Y = 1]$.

*6. Repeat Exercise 5 but under the assumption that when a ball is selected its color is noted, and it is then replaced in the urn before the next selection is made.


7. Suppose $p(x,y,z)$, the joint probability mass function of the random variables $X$, $Y$, and $Z$, is given by
$$
p(1,1,1) = \tfrac{1}{8},\quad p(2,1,1) = \tfrac{1}{4},
$$
$$
p(1,1,2) = \tfrac{1}{8},\quad p(2,1,2) = \tfrac{3}{16},
$$
$$
p(1,2,1) = \tfrac{1}{16},\quad p(2,2,1) = 0,
$$
$$
p(1,2,2) = 0,\quad p(2,2,2) = \tfrac{1}{4}
$$
What is $E[X\mid Y = 2]$? What is $E[X\mid Y = 2, Z = 1]$?

8. An unbiased die is successively rolled. Let $X$ and $Y$ denote, respectively, the number of rolls necessary to obtain a six and a five. Find (a) $E[X]$, (b) $E[X\mid Y = 1]$, (c) $E[X\mid Y = 5]$.

9. Show in the discrete case that if $X$ and $Y$ are independent, then
$$
E[X\mid Y = y] = E[X]\qquad\text{for all } y
$$

10. Suppose $X$ and $Y$ are independent continuous random variables. Show that
$$
E[X\mid Y = y] = E[X]\qquad\text{for all } y
$$

11. The joint density of $X$ and $Y$ is
$$
f(x,y) = \frac{(y^2 - x^2)e^{-y}}{8},\qquad 0 <
$$


16. The random variables $X$ and $Y$ are said to have a bivariate normal distribution if their joint density function is given by
$$
f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}
\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2
- \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}
+ \left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]\right\}
$$
for $-\infty < x < \infty$, $-\infty < y < \infty$.


168 3 Conditional Probability and Conditional Expectation(a) The X i are normal with mean θ and variance 1.(b) The density of X i is f(x)= θe −θx ,x>0.(c) The mass function of X i is p(x) = θ x (1 − θ) 1−x , x = 0, 1, 0


value occurs is
$$
(10^9 - 1)/9 = 111{,}111{,}111
$$
Thus, the actual value of approximately 25 million is roughly 22 percent of the theoretical mean. However, it can be shown that under the uniformity assumption the standard deviation of $N$ will be approximately equal to the mean. As a result, the observed value is approximately 0.78 standard deviations less than its theoretical mean and is thus quite consistent with the uniformity assumption.

*23. A coin having probability $p$ of coming up heads is successively flipped until two of the most recent three flips are heads. Let $N$ denote the number of flips. (Note that if the first two flips are heads, then $N = 2$.) Find $E[N]$.

24. A coin, having probability $p$ of landing heads, is continually flipped until at least one head and one tail have been flipped.
(a) Find the expected number of flips needed.
(b) Find the expected number of flips that land on heads.
(c) Find the expected number of flips that land on tails.
(d) Repeat part (a) in the case where flipping is continued until a total of at least two heads and one tail have been flipped.

25. A gambler wins each game with probability $p$. In each of the following cases, determine the expected total number of wins.
(a) The gambler will play $n$ games; if he wins $X$ of these games, then he will play an additional $X$ games before stopping.
(b) The gambler will play until he wins; if it takes him $Y$ games to get this win, then he will play an additional $Y$ games.

26. You have two opponents with whom you alternate play. Whenever you play $A$, you win with probability $p_A$; whenever you play $B$, you win with probability $p_B$, where $p_B > p_A$. If your objective is to minimize the number of games you need to play to win two in a row, should you start with $A$ or with $B$?
Hint: Let $E[N_i]$ denote the mean number of games needed if you initially play $i$. Derive an expression for $E[N_A]$ that involves $E[N_B]$; write down the equivalent expression for $E[N_B]$ and then subtract.

27. A coin that comes up heads with probability $p$ is continually flipped until the pattern T, T, H appears. (That is, you stop flipping when the most recent flip lands heads, and the two immediately preceding it land tails.) Let $X$ denote the number of flips made, and find $E[X]$.

28. Polya's urn model supposes that an urn initially contains $r$ red and $b$ blue balls. At each stage a ball is randomly selected from the urn and is then returned along with $m$ other balls of the same color. Let $X_k$ be the number of red balls drawn in the first $k$ selections.


(a) Find $E[X_1]$.
(b) Find $E[X_2]$.
(c) Find $E[X_3]$.
(d) Conjecture the value of $E[X_k]$, and then verify your conjecture by a conditioning argument.
(e) Give an intuitive proof for your conjecture.
Hint: Number the initial $r$ red and $b$ blue balls, so the urn contains one type $i$ red ball, for each $i = 1,\ldots,r$; as well as one type $j$ blue ball, for each $j = 1,\ldots,b$. Now suppose that whenever a red ball is chosen it is returned along with $m$ others of the same type, and similarly whenever a blue ball is chosen it is returned along with $m$ others of the same type. Now, use a symmetry argument to determine the probability that any given selection is red.

29. Two players take turns shooting at a target, with each shot by player $i$ hitting the target with probability $p_i$, $i = 1, 2$. Shooting ends when two consecutive shots hit the target. Let $\mu_i$ denote the mean number of shots taken when player $i$ shoots first, $i = 1, 2$.
(a) Find $\mu_1$ and $\mu_2$.
(b) Let $h_i$ denote the mean number of times that the target is hit when player $i$ shoots first, $i = 1, 2$. Find $h_1$ and $h_2$.

30. Let $X_i$, $i \geq 0$, be independent and identically distributed random variables with probability mass function
$$
p(j) = P\{X_i = j\},\quad j = 1,\ldots,m,\qquad \sum_{j=1}^{m} p(j) = 1
$$
Find $E[N]$, where $N = \min\{n > 0: X_n = X_0\}$.

31. Each element in a sequence of binary data is either 1 with probability $p$ or 0 with probability $1-p$. A maximal subsequence of consecutive values having identical outcomes is called a run. For instance, if the outcome sequence is 1, 1, 0, 1, 1, 1, 0, the first run is of length 2, the second is of length 1, and the third is of length 3.
(a) Find the expected length of the first run.
(b) Find the expected length of the second run.

32. Independent trials, each resulting in success with probability $p$, are performed.
(a) Find the expected number of trials needed for there to have been both at least $n$ successes and at least $m$ failures.


Exercises 171Hint:Is it useful to know the result of the first n + m trials?(b) Find the expected number of trials needed for there to have been either atleast n successes and at least m failures.Hint:Make use of the result from part (a).33. If R i denotes the random amount that is earned in period i, then ∑ ∞i=1 β i−1 R i ,where 0


172 3 Conditional Probability and Conditional Expectationcommon success probability u. Compute the mean and variance of the number ofsuccesses that occur in these trials.39. A deck of n cards, numbered 1 through n, is randomly shuffled so that all n!possible permutations are equally likely. The cards are then turned over one at atime until card number 1 appears. These upturned cards constitute the first cycle.We now determine (by looking at the upturned cards) the lowest numbered cardthat has not yet appeared, and we continue to turn the cards face up until that cardappears. This new set of cards represents the second cycle. We again determinethe lowest numbered of the remaining cards and turn the cards until it appears,and so on until all cards have been turned over. Let m n denote the mean numberof cycles.(a) Derive a recursive formula for m n in terms of m k ,k= 1,...,n− 1.(b) Starting with m 0 = 0, use the recursion to find m 1 ,m 2 ,m 3 , and m 4 .(c) Conjecture a general formula for m n .(d) Prove your formula by induction on n. That is, show it is valid for n = 1,then assume it is true for any of the values 1,...,n − 1 and show that thisimplies it is true for n.(e) Let X i equal 1 if one of the cycles ends with card i, and let it equal 0otherwise, i =1,...,n. Express the number of cycles in terms of these X i .(f) Use the representation in part (e) to determine m n .(g) Are the random variables X 1 ,...,X n independent? Explain.(h) Find the variance of the number of cycles.40. A prisoner is trapped in a cell containing three doors. The first door leads toa tunnel that returns him to his cell after two days of travel. The second leads to atunnel that returns him to his cell after three days of travel. 
The third door leadsimmediately to freedom.(a) Assuming that the prisoner will always select doors 1, 2, and 3 with probabilities0.5, 0.3, 0.2, what is the expected number of days until he reachesfreedom?(b) Assuming that the prisoner is always equally likely to choose among thosedoors that he has not used, what is the expected number of days until he reachesfreedom? (In this version, for instance, if the prisoner initially tries door 1, thenwhen he returns to the cell, he will now select only from doors 2 and 3.)(c) For parts (a) and (b) find the variance of the number of days until the prisonerreaches freedom.41. A rat is trapped in a maze. Initially it has to choose one of two directions. Ifit goes to the right, then it will wander around in the maze for three minutes andwill then return to its initial position. If it goes to the left, then with probability 1 3it will depart the maze after two minutes of traveling, and with probability 2 3 it


will return to its initial position after five minutes of traveling. Assuming that the rat is at all times equally likely to go to the left or the right, what is the expected number of minutes that it will be trapped in the maze?

42. A total of 11 people, including you, are invited to a party. The times at which people arrive at the party are independent uniform (0, 1) random variables.
(a) Find the expected number of people who arrive before you.
(b) Find the variance of the number of people who arrive before you.

43. The number of claims received at an insurance company during a week is a random variable with mean $\mu_1$ and variance $\sigma_1^2$. The amount paid in each claim is a random variable with mean $\mu_2$ and variance $\sigma_2^2$. Find the mean and variance of the amount of money paid by the insurance company each week. What independence assumptions are you making? Are these assumptions reasonable?

44. The number of customers entering a store on a given day is Poisson distributed with mean $\lambda = 10$. The amount of money spent by a customer is uniformly distributed over (0, 100). Find the mean and variance of the amount of money that the store takes in on a given day.

45. An individual traveling on the real line is trying to reach the origin. However, the larger the desired step, the greater is the variance in the result of that step. Specifically, whenever the person is at location $x$, he next moves to a location having mean 0 and variance $\beta x^2$. Let $X_n$ denote the position of the individual after having taken $n$ steps. Supposing that $X_0 = x_0$, find
(a) $E[X_n]$;
(b) $\mathrm{Var}(X_n)$.

46. (a) Show that
$$
\mathrm{Cov}(X, Y) = \mathrm{Cov}(X, E[Y\mid X])
$$
(b) Suppose that, for constants $a$ and $b$,
$$
E[Y\mid X] = a + bX
$$
Show that
$$
b = \mathrm{Cov}(X, Y)/\mathrm{Var}(X)
$$

*47. If $E[Y\mid X] = 1$, show that
$$
\mathrm{Var}(XY) \geq \mathrm{Var}(X)
$$
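For Exercise 41, conditioning on the rat's first choice yields a single linear equation for the expected time in the maze; a sketch of that computation in exact arithmetic (the setup follows the exercise, the variable names are mine):

```python
from fractions import Fraction

# Condition on the rat's first choice:
#   right (prob 1/2): wander 3 minutes, then back to the start -> 3 + E
#   left  (prob 1/2): exit after 2 minutes with prob 1/3,
#                     or return after 5 minutes with prob 2/3  -> 5 + E
# This gives E = a + c*E with the constants below.
half, third = Fraction(1, 2), Fraction(1, 3)
a = half * 3 + half * (third * 2 + (2 * third) * 5)   # constant part
c = half + half * (2 * third)                         # coefficient of E
E = a / (1 - c)                                       # expected minutes in the maze
```

Solving gives $E = (7/2)/(1/6) = 21$ minutes.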


48. Give another proof of the result of Example 3.17 by computing the moment generating function of $\sum_{i=1}^{N} X_i$ and then differentiating to obtain its moments.
Hint: Let
$$
\phi(t) = E\Big[\exp\Big(t\sum_{i=1}^{N} X_i\Big)\Big]
= E\Big[E\Big[\exp\Big(t\sum_{i=1}^{N} X_i\Big)\,\Big|\,N\Big]\Big]
$$
Now,
$$
E\Big[\exp\Big(t\sum_{i=1}^{N} X_i\Big)\,\Big|\,N = n\Big]
= E\Big[\exp\Big(t\sum_{i=1}^{n} X_i\Big)\Big] = (\phi_X(t))^n
$$
since $N$ is independent of the $X$s, where $\phi_X(t) = E[e^{tX}]$ is the moment generating function for the $X$s. Therefore,
$$
\phi(t) = E\big[(\phi_X(t))^N\big]
$$
Differentiation yields
$$
\phi'(t) = E\big[N(\phi_X(t))^{N-1}\phi_X'(t)\big],
$$
$$
\phi''(t) = E\big[N(N-1)(\phi_X(t))^{N-2}(\phi_X'(t))^2 + N(\phi_X(t))^{N-1}\phi_X''(t)\big]
$$
Evaluate at $t = 0$ to get the desired result.

49. $A$ and $B$ play a series of games with $A$ winning each game with probability $p$. The overall winner is the first player to have won two more games than the other.
(a) Find the probability that $A$ is the overall winner.
(b) Find the expected number of games played.

50. There are three coins in a barrel. These coins, when flipped, will come up heads with respective probabilities 0.3, 0.5, 0.7. A coin is randomly selected from among these three and is then flipped ten times. Let $N$ be the number of heads obtained on the ten flips. Find
(a) $P\{N = 0\}$.
(b) $P\{N = n\}$, $n = 0, 1,\ldots,10$.
(c) Does $N$ have a binomial distribution?
(d) If you win \$1 each time a head appears and you lose \$1 each time a tail appears, is this a fair game? Explain.


Exercises 17551. Do Exercise 50 under the assumption that each time a coin is flipped, it isthen put back in the barrel and another coin is randomly selected. Does N have abinomial distribution now?52. Suppose that X and Y are independent random variables with probabilitydensity functions f X and f Y . Determine a one-dimensional integral expressionfor P {X + Y


176 3 Conditional Probability and Conditional Expectationthe probability that the first player to flip is the winner. Before determining thisprobability, which we will call f (p), answer the following questions.(a) Do you think that f (p) is a monotone function of p? If so, is it increasingor decreasing?(b) What do you think is the value of lim p→1 f(p)?(c) What do you think is the value of lim p→0 f(p)?(d) Find f(p).61. Suppose in Exercise 29 that the shooting ends when the target has been hittwice. Let m i denote the mean number of shots needed for the first hit when playeri shoots first, i = 1, 2. Also, let P i , i = 1, 2, denote the probability that the firsthit is by player 1, when player i shoots first.(a) Find m 1 and m 2 .(b) Find P 1 and P 2 .For the remainder of the problem, assume that player 1 shoots first.(c) Find the probability that the final hit was by 1.(d) Find the probability that both hits were by 1.(e) Find the probability that both hits were by 2.(f) Find the mean number of shots taken.62. A,B, and C are evenly matched tennis players. Initially A and B play a set,and the winner then plays C. This continues, with the winner always playing thewaiting player, until one of the players has won two sets in a row. That player isthen declared the overall winner. Find the probability that A is the overall winner.63. Suppose there are n types of coupons, and that the type of each new couponobtained is independent of past selections and is equally likely to be any of then types. Suppose one continues collecting until a complete set of at least one ofeach type is obtained.(a) Find the probability that there is exactly one type i coupon in the finalcollection.Hint: Condition on T , the number of types that are collected before thefirst type i appears.(b) Find the expected number of types that appear exactly once in the finalcollection.64. A and B roll a pair of dice in turn, with A rolling first. 
A’s objective is toobtain a sum of 6, and B’s is to obtain a sum of 7. The game ends when eitherplayer reaches his or her objective, and that player is declared the winner.(a) Find the probability that A is the winner.


Exercises 177(b) Find the expected number of rolls of the dice.(c) Find the variance of the number of rolls of the dice.65. The number of red balls in an urn that contains n balls is a random variablethat is equally likely to be any of the values 0, 1,...,n. That is,P {i red,n− i non-red}= 1n + 1 ,i = 0,...,nThe n balls are then randomly removed one at a time. Let Y k denote the numberof red balls in the first k selections, k = 1,...,n.(a) Find P {Y n = j},j = 0,...,n.(b) Find P {Y n−1 = j}, j= 0,...,n.(c) What do you think is the value of P {Y k = j},j = 0,...,n?(d) Verify your answer to part (c) by a backwards induction argument. That is,check that your answer is correct when k = n, and then show that whenever itis true for k it is also true for k − 1,k= 1,...,n.66. The opponents of soccer team A are of two types: either they are aclass 1 or a class 2 team. The number of goals team A scores against aclass i opponent is a Poisson random variable with mean λ i , where λ 1 = 2,λ 2 = 3. This weekend the team has two games against teams they are not veryfamiliar with. Assuming that the first team they play is a class 1 team with probability0.6 and the second is, independently of the class of the first team, a class 1team with probability 0.3, determine(a) the expected number of goals team A will score this weekend.(b) the probability that team A will score a total of five goals.*67. A coin having probability p of coming up heads is continually flipped. LetP j (n) denote the probability that a run of j successive heads occurs within thefirst n flips.(a) Argue thatP j (n) = P j (n − 1) + p j (1 − p)[1 − P j (n − j − 1)](b) By conditioning on the first non-head to appear, derive another equationrelating P j (n) to the quantities P j (n − k),k = 1,...,j.68. In a knockout tennis tournament of 2 n contestants, the players are paired andplay a match. The losers depart, the remaining 2 n−1 players are paired, and theyplay a match. 
This continues for n rounds, after which a single player remainsunbeaten and is declared the winner. Suppose that the contestants are numbered1 through 2 n , and that whenever two players contest a match, the lower numberedone wins with probability p. Also suppose that the pairings of the remainingplayers are always done at random so that all possible pairings for that roundare equally likely.


178 3 Conditional Probability and Conditional Expectation(a) What is the probability that player 1 wins the tournament?(b) What is the probability that player 2 wins the tournament?Hint: Imagine that the random pairings are done in advance of the tournament.That is, the first-round pairings are randomly determined; the 2 n−1 firstroundpairs are then themselves randomly paired, with the winners of each pairto play in round 2; these 2 n−2 groupings (of four players each) are then randomlypaired, with the winners of each grouping to play in round 3, and so on.Say that players i and j are scheduled to meet in round k if, provided they bothwin their first k − 1 matches, they will meet in round k. Now condition on theround in which players 1 and 2 are scheduled to meet.69. In the match problem, say that (i, j), i < j,isapairifi chooses j’s hat andj chooses i’s hat.(a) Find the expected number of pairs.(b) Let Q n denote the probability that there are no pairs, and derive a recursiveformula for Q n in terms of Q j ,j


Figure 3.7  (diagram of the five-component system for Exercise 74)

*73. Suppose that we continually roll a die until the sum of all throws exceeds 100. What is the most likely value of this total when you stop?

74. There are five components. The components act independently, with component $i$ working with probability $p_i$, $i = 1, 2, 3, 4, 5$. These components form a system as shown in Figure 3.7. The system is said to work if a signal originating at the left end of the diagram can reach the right end, where it can pass through a component only if that component is working. (For instance, if components 1 and 4 both work, then the system also works.) What is the probability that the system works?

75. This problem will present another proof of the ballot problem of Example 3.24.
(a) Argue that
$$
P_{n,m} = 1 - P\{A \text{ and } B \text{ are tied at some point}\}
$$
(b) Explain why
$$
P\{A \text{ receives first vote and they are eventually tied}\}
= P\{B \text{ receives first vote and they are eventually tied}\}
$$
Hint: Any outcome in which they are eventually tied with $A$ receiving the first vote corresponds to an outcome in which they are eventually tied with $B$ receiving the first vote. Explain this correspondence.
(c) Argue that $P\{\text{eventually tied}\} = 2m/(n + m)$, and conclude that $P_{n,m} = (n - m)/(n + m)$.

76. Consider a gambler who on each bet either wins 1 with probability 18/38 or loses 1 with probability 20/38. (These are the probabilities if the bet is that a roulette wheel will land on a specified color.) The gambler will quit either when he or she is winning a total of 5 or after 100 plays. What is the probability he or she plays exactly 15 times?
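Figure 3.7 itself is not reproduced in this text, so as an illustration of the enumerate-over-component-states approach to Exercise 74, the sketch below *assumes* the classic five-component bridge layout (minimal paths {1,4}, {2,5}, {1,3,5}, {2,3,4}); the exercise's parenthetical about components 1 and 4 is consistent with that assumption, but the paths are mine:

```python
from itertools import product

# Assumed layout for Figure 3.7 (a bridge network); each set is a
# left-to-right path that carries the signal iff all its components work.
PATHS = [{1, 4}, {2, 5}, {1, 3, 5}, {2, 3, 4}]

def system_reliability(p):
    """p[i-1] = P{component i works}; sum P{state} over all 2^5 states
    in which some path consists entirely of working components."""
    total = 0.0
    for states in product((0, 1), repeat=5):
        prob = 1.0
        for i, s in enumerate(states):
            prob *= p[i] if s else 1 - p[i]
        working = {i + 1 for i, s in enumerate(states) if s}
        if any(path <= working for path in PATHS):   # some path fully works
            total += prob
    return total
```

A handy sanity check: with every $p_i = 1/2$ the bridge network's self-duality makes the reliability exactly $1/2$.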


180 3 Conditional Probability and Conditional Expectation77. Show that(a) E[XY|Y = y]=yE[X|Y = y](b) E[g(X,Y)|Y = y]=E[g(X,y)|Y = y](c) E[XY]=E[YE[X|Y ]]78. In the ballot problem (Example 3.24), compute P {A is never behind}.79. An urn contains n white and m black balls which are removed oneat a time. If n>m, show that the probability that there are always morewhite than black balls in the urn (until, of course, the urn is empty) equals(n − m)/(n + m). Explain why this probability is equal to the probability thatthe set of withdrawn balls always contains more white than black balls. [Thislatter probability is (n − m)/(n + m) by the ballot problem.]80. A coin that comes up heads with probability p is flipped n consecutive times.What is the probability that starting with the first flip there are always more headsthan tails that have appeared?81. Let X i ,i 1, be independent uniform (0, 1) random variables, and defineN byN = min{n: X n


Exercises 181(c) Prove the following identity:a 1−k =∞∑( ) i + k − 2(1 − a) i , 0 ···>P n , show that 1, 2,...,nis the best ordering.86. Consider the random graph of Section 3.6.2 when n = 5. Compute the probabilitydistribution of the number of components and verify your solution by using


182 3 Conditional Probability and Conditional Expectationit to compute E[C]and then comparing your solution withE[C]=5∑( ) 5 (k − 1)!kk=187. ( (a) From the results of Section 3.6.3 we can conclude that there aren+m−1)m−1 nonnegative integer valued solutions of the equation x1 +···+x m = n.Prove this directly.(b) How many positive integer valued solutions of x 1 +···+x m = n are there?Hint: Let y i = x i − 1.(c) For the Bose–Einstein distribution, compute the probability that exactly kof the X i are equal to 0.88. In Section 3.6.3, we saw that if U is a random variable that is uniform on(0, 1) and if, conditional on U = p,X is binomial with parameters n and p, thenP {X = i}= 1n + 1 , i = 0, 1,...,nFor another way of showing this result, let U,X 1 ,X 2 ,...,X n be independentuniform (0, 1) random variables. Define X byX = #i: X i


(a) $N = \min(n: X_{n-2} = 2,\ X_{n-1} = 1,\ X_n = 0)$;
(b) $N = \min(n: X_{n-3} = 2,\ X_{n-2} = 1,\ X_{n-1} = 0,\ X_n = 2)$.

91. Find the expected number of flips of a coin, which comes up heads with probability $p$, that are necessary to obtain the pattern h, t, h, h, t, h, t, h.

92. The number of coins that Josh spots when walking to work is a Poisson random variable with mean 6. Each coin is equally likely to be a penny, a nickel, a dime, or a quarter. Josh ignores the pennies but picks up the other coins.
(a) Find the expected amount of money that Josh picks up on his way to work.
(b) Find the variance of the amount of money that Josh picks up on his way to work.
(c) Find the probability that Josh picks up exactly 25 cents on his way to work.

*93. Consider a sequence of independent trials, each of which is equally likely to result in any of the outcomes $0, 1,\ldots,m$. Say that a round begins with the first trial, and that a new round begins each time outcome 0 occurs. Let $N$ denote the number of trials that it takes until all of the outcomes $1,\ldots,m-1$ have occurred in the same round. Also, let $T_j$ denote the number of trials that it takes until $j$ distinct outcomes have occurred, and let $I_j$ denote the $j$th distinct outcome to occur. (Therefore, outcome $I_j$ first occurs at trial $T_j$.)
(a) Argue that the random vectors $(I_1,\ldots,I_m)$ and $(T_1,\ldots,T_m)$ are independent.
(b) Define $X$ by letting $X = j$ if outcome 0 is the $j$th distinct outcome to occur. (Thus, $I_X = 0$.) Derive an equation for $E[N]$ in terms of $E[T_j]$, $j = 1,\ldots,m-1$ by conditioning on $X$.
(c) Determine $E[T_j]$, $j = 1,\ldots,m-1$.
Hint: See Exercise 42 of Chapter 2.
(d) Find $E[N]$.

94. Let $N$ be a hypergeometric random variable having the distribution of the number of white balls in a random sample of size $r$ from a set of $w$ white and $b$ blue balls. That is,
$$
P\{N = n\} = \frac{\binom{w}{n}\binom{b}{r-n}}{\binom{w+b}{r}}
$$
where we use the convention that $\binom{m}{j} = 0$ if either $j < 0$ or $j > m$. Now, consider a compound random variable $S_N = \sum_{i=1}^{N} X_i$, where the $X_i$ are positive integer valued random variables with $\alpha_j = P\{X_i = j\}$.


(a) With $M$ defined as in Section 3.7, find the distribution of $M - 1$.
(b) Suppressing its dependence on $b$, let $P_{w,r}(k) = P\{S_N = k\}$, and derive a recursion equation for $P_{w,r}(k)$.
(c) Use the recursion of (b) to find $P_{w,r}(2)$.


4  Markov Chains

4.1. Introduction

In this chapter, we consider a stochastic process $\{X_n, n = 0, 1, 2, \ldots\}$ that takes on a finite or countable number of possible values. Unless otherwise mentioned, this set of possible values of the process will be denoted by the set of nonnegative integers $\{0, 1, 2, \ldots\}$. If $X_n = i$, then the process is said to be in state $i$ at time $n$. We suppose that whenever the process is in state $i$, there is a fixed probability $P_{ij}$ that it will next be in state $j$. That is, we suppose that
$$
P\{X_{n+1} = j\mid X_n = i, X_{n-1} = i_{n-1},\ldots,X_1 = i_1, X_0 = i_0\} = P_{ij} \tag{4.1}
$$
for all states $i_0, i_1,\ldots,i_{n-1}, i, j$ and all $n \geq 0$. Such a stochastic process is known as a Markov chain. Equation (4.1) may be interpreted as stating that, for a Markov chain, the conditional distribution of any future state $X_{n+1}$, given the past states $X_0, X_1,\ldots,X_{n-1}$ and the present state $X_n$, is independent of the past states and depends only on the present state.

The value $P_{ij}$ represents the probability that the process will, when in state $i$, next make a transition into state $j$. Since probabilities are nonnegative and since the process must make a transition into some state, we have that
$$
P_{ij} \geq 0,\quad i, j \geq 0;\qquad \sum_{j=0}^{\infty} P_{ij} = 1,\quad i = 0, 1, \ldots
$$
Let $\mathbf{P}$ denote the matrix of one-step transition probabilities $P_{ij}$, so that
$$
\mathbf{P} = \begin{Vmatrix}
P_{00} & P_{01} & P_{02} & \cdots\\
P_{10} & P_{11} & P_{12} & \cdots\\
\vdots & \vdots & \vdots & \\
P_{i0} & P_{i1} & P_{i2} & \cdots\\
\vdots & \vdots & \vdots &
\end{Vmatrix}
$$
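The one-step matrix is all that is needed to simulate such a chain; a minimal sketch (the function name is mine):

```python
import random

def simulate_chain(P, start, steps, rng=random):
    """Generate X_0, X_1, ..., X_steps for a Markov chain whose one-step
    transition matrix is the row-stochastic list-of-lists P."""
    path = [start]
    for _ in range(steps):
        u, i = rng.random(), path[-1]
        cum = 0.0
        for j, pij in enumerate(P[i]):   # next state is j with probability P[i][j]
            cum += pij
            if u < cum:
                path.append(j)
                break
        else:
            path.append(len(P[i]) - 1)   # guard against floating-point round-off
    return path
```

For instance, with `P = [[0, 1], [1, 0]]` the path deterministically alternates between the two states, whatever the random draws.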


Example 4.1 (Forecasting the Weather)  Suppose that the chance of rain tomorrow depends on previous weather conditions only through whether or not it is raining today and not on past weather conditions. Suppose also that if it rains today, then it will rain tomorrow with probability $\alpha$; and if it does not rain today, then it will rain tomorrow with probability $\beta$.

If we say that the process is in state 0 when it rains and state 1 when it does not rain, then the preceding is a two-state Markov chain whose transition probabilities are given by
$$
\mathbf{P} = \begin{Vmatrix} \alpha & 1-\alpha\\ \beta & 1-\beta \end{Vmatrix}
$$

Example 4.2 (A Communications System)  Consider a communications system which transmits the digits 0 and 1. Each digit transmitted must pass through several stages, at each of which there is a probability $p$ that the digit entered will be unchanged when it leaves. Letting $X_n$ denote the digit entering the $n$th stage, then $\{X_n, n = 0, 1, \ldots\}$ is a two-state Markov chain having a transition probability matrix
$$
\mathbf{P} = \begin{Vmatrix} p & 1-p\\ 1-p & p \end{Vmatrix}
$$

Example 4.3  On any given day Gary is either cheerful (C), so-so (S), or glum (G). If he is cheerful today, then he will be C, S, or G tomorrow with respective probabilities 0.5, 0.4, 0.1. If he is feeling so-so today, then he will be C, S, or G tomorrow with probabilities 0.3, 0.4, 0.3. If he is glum today, then he will be C, S, or G tomorrow with probabilities 0.2, 0.3, 0.5.

Letting $X_n$ denote Gary's mood on the $n$th day, then $\{X_n, n \geq 0\}$ is a three-state Markov chain (state 0 = C, state 1 = S, state 2 = G) with transition probability matrix
$$
\mathbf{P} = \begin{Vmatrix} 0.5 & 0.4 & 0.1\\ 0.3 & 0.4 & 0.3\\ 0.2 & 0.3 & 0.5 \end{Vmatrix}
$$

Example 4.4 (Transforming a Process into a Markov Chain)  Suppose that whether or not it rains today depends on previous weather conditions through the last two days. Specifically, suppose that if it has rained for the past two days, then it will rain tomorrow with probability 0.7; if it rained today but not yesterday, then it will rain tomorrow with probability 0.5; if it rained yesterday but not today, then it will rain tomorrow with probability 0.4; if it has not rained in the past two days, then it will rain tomorrow with probability 0.2.


If we let the state at time $n$ depend only on whether or not it is raining at time $n$, then the preceding model is not a Markov chain (why not?). However, we can transform this model into a Markov chain by saying that the state at any time is determined by the weather conditions during both that day and the previous day. In other words, we can say that the process is in

state 0 if it rained both today and yesterday,
state 1 if it rained today but not yesterday,
state 2 if it rained yesterday but not today,
state 3 if it did not rain either yesterday or today.

The preceding would then represent a four-state Markov chain having a transition probability matrix
$$
\mathbf{P} = \begin{Vmatrix}
0.7 & 0 & 0.3 & 0\\
0.5 & 0 & 0.5 & 0\\
0 & 0.4 & 0 & 0.6\\
0 & 0.2 & 0 & 0.8
\end{Vmatrix}
$$
You should carefully check the matrix $\mathbf{P}$, and make sure you understand how it was obtained.

Example 4.5 (A Random Walk Model)  A Markov chain whose state space is given by the integers $i = 0, \pm 1, \pm 2, \ldots$ is said to be a random walk if, for some number $0 < p < 1$,


Example 4.7  In most of Europe and Asia annual automobile insurance premiums are determined by use of a Bonus Malus (Latin for Good-Bad) system. Each policyholder is given a positive integer valued state and the annual premium is a function of this state (along, of course, with the type of car being insured and the level of insurance). A policyholder's state changes from year to year in response to the number of claims made by that policyholder. Because lower numbered states correspond to lower annual premiums, a policyholder's state will usually decrease if he or she had no claims in the preceding year, and will generally increase if he or she had at least one claim. (Thus, no claims is good and typically results in a decreased premium, while claims are bad and typically result in a higher premium.)

For a given Bonus Malus system, let $s_i(k)$ denote the next state of a policyholder who was in state $i$ in the previous year and who made a total of $k$ claims in that year. If we suppose that the number of yearly claims made by a particular policyholder is a Poisson random variable with parameter $\lambda$, then the successive states of this policyholder will constitute a Markov chain with transition probabilities
$$
P_{i,j} = \sum_{k:\,s_i(k) = j} e^{-\lambda}\frac{\lambda^k}{k!},\qquad j \geq 0
$$
Whereas there are usually many states (20 or so is not atypical), the following table specifies a hypothetical Bonus Malus system having four states.

                                    Next state if
  State   Annual Premium   0 claims   1 claim   2 claims   3 claims
    1          200             1          2         3          4
    2          250             1          3         4          4
    3          400             2          4         4          4
    4          600             3          4         4          4

Thus, for instance, the table indicates that $s_2(0) = 1$; $s_2(1) = 3$; $s_2(k) = 4$, $k \geq 2$.

Consider a policyholder whose annual number of claims is a Poisson random variable with parameter $\lambda$. If $a_k$ is the probability that such a policyholder makes $k$ claims in a year, then
$$
a_k = e^{-\lambda}\frac{\lambda^k}{k!},\qquad k \geq 0
$$
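A sketch of how this four-state transition matrix can be built numerically from the table and the Poisson claim probabilities (the `NEXT` dictionary transcribes the table; names are mine):

```python
import math

# NEXT[i][k] = next state for a state-i policyholder with k claims
# (by the table, k >= 3 claims acts the same as 3)
NEXT = {1: [1, 2, 3, 4], 2: [1, 3, 4, 4], 3: [2, 4, 4, 4], 4: [3, 4, 4, 4]}

def bonus_malus_matrix(lam, kmax=50):
    """P[i-1][j-1] = sum of e^{-lam} lam^k / k! over k with s_i(k) = j,
    truncating the Poisson sum at kmax."""
    P = [[0.0] * 4 for _ in range(4)]
    for i in range(1, 5):
        for k in range(kmax + 1):
            a_k = math.exp(-lam) * lam ** k / math.factorial(k)
            P[i - 1][NEXT[i][min(k, 3)] - 1] += a_k
    return P

P = bonus_malus_matrix(1.0)
```

Each row sums to 1 (up to the tiny truncation at `kmax`), and, for instance, the state-3 row puts mass $a_0 = e^{-\lambda}$ on state 2, matching $s_3(0) = 2$.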


For the Bonus Malus system specified in the preceding table, the transition probability matrix of the successive states of this policyholder is
\[
\mathbf{P} = \begin{Vmatrix}
a_0 & a_1 & a_2 & 1 - a_0 - a_1 - a_2 \\
a_0 & 0 & a_1 & 1 - a_0 - a_1 \\
0 & a_0 & 0 & 1 - a_0 \\
0 & 0 & a_0 & 1 - a_0
\end{Vmatrix}
\]

4.2. Chapman–Kolmogorov Equations

We have already defined the one-step transition probabilities $P_{ij}$. We now define the $n$-step transition probabilities $P_{ij}^n$ to be the probability that a process in state $i$ will be in state $j$ after $n$ additional transitions. That is,
\[
P_{ij}^n = P\{X_{n+k} = j \mid X_k = i\}, \qquad n \geq 0,\ i, j \geq 0
\]
Of course $P_{ij}^1 = P_{ij}$. The Chapman–Kolmogorov equations provide a method for computing these $n$-step transition probabilities. These equations are
\[
P_{ij}^{n+m} = \sum_{k=0}^{\infty} P_{ik}^n P_{kj}^m \qquad \text{for all } n, m \geq 0, \text{ all } i, j \tag{4.2}
\]
and are most easily understood by noting that $P_{ik}^n P_{kj}^m$ represents the probability that starting in $i$ the process will go to state $j$ in $n + m$ transitions through a path which takes it into state $k$ at the $n$th transition. Hence, summing over all intermediate states $k$ yields the probability that the process will be in state $j$ after $n + m$ transitions. Formally, we have
\begin{align*}
P_{ij}^{n+m} &= P\{X_{n+m} = j \mid X_0 = i\} \\
&= \sum_{k=0}^{\infty} P\{X_{n+m} = j, X_n = k \mid X_0 = i\} \\
&= \sum_{k=0}^{\infty} P\{X_{n+m} = j \mid X_n = k, X_0 = i\} P\{X_n = k \mid X_0 = i\} \\
&= \sum_{k=0}^{\infty} P_{kj}^m P_{ik}^n
\end{align*}


If we let $\mathbf{P}^{(n)}$ denote the matrix of $n$-step transition probabilities $P_{ij}^n$, then Equation (4.2) asserts that
\[
\mathbf{P}^{(n+m)} = \mathbf{P}^{(n)} \cdot \mathbf{P}^{(m)}
\]
where the dot represents matrix multiplication.* Hence, in particular,
\[
\mathbf{P}^{(2)} = \mathbf{P}^{(1+1)} = \mathbf{P} \cdot \mathbf{P} = \mathbf{P}^2
\]
and by induction
\[
\mathbf{P}^{(n)} = \mathbf{P}^{(n-1+1)} = \mathbf{P}^{n-1} \cdot \mathbf{P} = \mathbf{P}^n
\]
That is, the $n$-step transition matrix may be obtained by multiplying the matrix $\mathbf{P}$ by itself $n$ times.

Example 4.8 Consider Example 4.1 in which the weather is considered as a two-state Markov chain. If $\alpha = 0.7$ and $\beta = 0.4$, then calculate the probability that it will rain four days from today given that it is raining today.

Solution: The one-step transition probability matrix is given by
\[
\mathbf{P} = \begin{Vmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{Vmatrix}
\]
Hence,
\[
\mathbf{P}^{(2)} = \mathbf{P}^2 = \begin{Vmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{Vmatrix} \cdot \begin{Vmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{Vmatrix} = \begin{Vmatrix} 0.61 & 0.39 \\ 0.52 & 0.48 \end{Vmatrix},
\]
\[
\mathbf{P}^{(4)} = (\mathbf{P}^2)^2 = \begin{Vmatrix} 0.61 & 0.39 \\ 0.52 & 0.48 \end{Vmatrix} \cdot \begin{Vmatrix} 0.61 & 0.39 \\ 0.52 & 0.48 \end{Vmatrix} = \begin{Vmatrix} 0.5749 & 0.4251 \\ 0.5668 & 0.4332 \end{Vmatrix}
\]
and the desired probability $P_{00}^4$ equals 0.5749.

*If $\mathbf{A}$ is an $N \times M$ matrix whose element in the $i$th row and $j$th column is $a_{ij}$ and $\mathbf{B}$ is an $M \times K$ matrix whose element in the $i$th row and $j$th column is $b_{ij}$, then $\mathbf{A} \cdot \mathbf{B}$ is defined to be the $N \times K$ matrix whose element in the $i$th row and $j$th column is $\sum_{k=1}^{M} a_{ik} b_{kj}$.
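The computation in Example 4.8 is just repeated matrix multiplication; a minimal sketch (Python, plain lists, not from the text):

```python
def mat_mult(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# One-step transition matrix of the two-state weather chain.
P = [[0.7, 0.3],
     [0.4, 0.6]]

P2 = mat_mult(P, P)    # two-step transition probabilities
P4 = mat_mult(P2, P2)  # four-step transition probabilities
```

Reading off `P4[0][0]` recovers the answer 0.5749.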


Example 4.9 Consider Example 4.4. Given that it rained on Monday and Tuesday, what is the probability that it will rain on Thursday?

Solution: The two-step transition matrix is given by
\[
\mathbf{P}^{(2)} = \mathbf{P}^2 = \begin{Vmatrix} 0.7 & 0 & 0.3 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0 & 0.2 & 0 & 0.8 \end{Vmatrix} \cdot \begin{Vmatrix} 0.7 & 0 & 0.3 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0 & 0.2 & 0 & 0.8 \end{Vmatrix} = \begin{Vmatrix} 0.49 & 0.12 & 0.21 & 0.18 \\ 0.35 & 0.20 & 0.15 & 0.30 \\ 0.20 & 0.12 & 0.20 & 0.48 \\ 0.10 & 0.16 & 0.10 & 0.64 \end{Vmatrix}
\]
Since rain on Thursday is equivalent to the process being in either state 0 or state 1 on Thursday, the desired probability is given by $P_{00}^2 + P_{01}^2 = 0.49 + 0.12 = 0.61$.

So far, all of the probabilities we have considered are conditional probabilities. For instance, $P_{ij}^n$ is the probability that the state at time $n$ is $j$ given that the initial state at time 0 is $i$. If the unconditional distribution of the state at time $n$ is desired, it is necessary to specify the probability distribution of the initial state. Let us denote this by
\[
\alpha_i \equiv P\{X_0 = i\}, \qquad i \geq 0 \quad \Bigl(\sum_{i=0}^{\infty} \alpha_i = 1\Bigr)
\]
All unconditional probabilities may be computed by conditioning on the initial state. That is,
\[
P\{X_n = j\} = \sum_{i=0}^{\infty} P\{X_n = j \mid X_0 = i\} P\{X_0 = i\} = \sum_{i=0}^{\infty} P_{ij}^n \alpha_i
\]
For instance, if $\alpha_0 = 0.4$, $\alpha_1 = 0.6$, in Example 4.8, then the (unconditional) probability that it will rain four days after we begin keeping weather records is
\begin{align*}
P\{X_4 = 0\} &= 0.4 P_{00}^4 + 0.6 P_{10}^4 \\
&= (0.4)(0.5749) + (0.6)(0.5668) \\
&= 0.5700
\end{align*}
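The last computation, conditioning on the initial state, is a one-line dot product; a sketch (Python, with the four-step probabilities taken from Example 4.8):

```python
# Initial distribution over states 0 (rain) and 1 (no rain).
alpha = [0.4, 0.6]

# Four-step transition probabilities into state 0, from Example 4.8.
p4_into_0 = [0.5749, 0.5668]

# P{X_4 = 0} = sum_i alpha_i * P^4_{i0}
p_rain_day4 = sum(a * p for a, p in zip(alpha, p4_into_0))
```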


Suppose now that you want to determine the probability that a Markov chain enters any of a specified set of states $\mathscr{A}$ by time $n$. One way to accomplish this is to reset the transition probabilities out of states in $\mathscr{A}$ to
\[
P\{X_{m+1} = j \mid X_m = i\} = \begin{cases} 1, & \text{if } i \in \mathscr{A},\ j = i \\ 0, & \text{if } i \in \mathscr{A},\ j \neq i \end{cases}
\]
That is, transform all states in $\mathscr{A}$ into absorbing states which once entered can never be left. Because the original and transformed Markov chains follow identical probabilities until a state in $\mathscr{A}$ is entered, it follows that the probability the original Markov chain enters a state in $\mathscr{A}$ by time $n$ is equal to the probability that the transformed Markov chain is in one of the states of $\mathscr{A}$ at time $n$.

Example 4.10 A pensioner receives 2 (thousand dollars) at the beginning of each month. The amount of money he needs to spend during a month is independent of the amount he has and is equal to $i$ with probability $P_i$, $i = 1, 2, 3, 4$, $\sum_{i=1}^{4} P_i = 1$. If the pensioner has more than 3 at the end of a month, he gives the amount greater than 3 to his son. If, after receiving his payment at the beginning of a month, the pensioner has a capital of 5, what is the probability that his capital is ever 1 or less at any time within the following four months?

Solution: To find the desired probability, we consider a Markov chain with the state equal to the amount the pensioner has at the end of a month. Because we are interested in whether this amount ever falls as low as 1, we will let state 1 mean that the pensioner's end-of-month fortune has ever been less than or equal to 1. Because the pensioner will give any end-of-month amount greater than 3 to his son, we need only consider the Markov chain with states 1, 2, 3 and transition probability matrix $\mathbf{Q} = [Q_{i,j}]$ given by
\[
\mathbf{Q} = \begin{Vmatrix} 1 & 0 & 0 \\ P_3 + P_4 & P_2 & P_1 \\ P_4 & P_3 & P_1 + P_2 \end{Vmatrix}
\]
To understand the preceding, consider $Q_{2,1}$, the probability that a month that ends with the pensioner having the amount 2 will be followed by a month that ends with the pensioner having less than or equal to 1. Because the pensioner will begin the new month with the amount $2 + 2 = 4$, his ending capital will be less than or equal to 1 if his expenses are either 3 or 4. Thus, $Q_{2,1} = P_3 + P_4$. The other transition probabilities are similarly explained.
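Taking $P_i = 1/4$ for each $i$ (the numerical case treated next), exact arithmetic with Python's `fractions` module makes the four-month computation easy to verify; a sketch:

```python
from fractions import Fraction as F

def mat_mult(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Absorbing-state chain for the pensioner with P_i = 1/4, i = 1,2,3,4.
Q = [[F(1),    F(0),    F(0)],
     [F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 4), F(1, 4), F(1, 2)]]

Q2 = mat_mult(Q, Q)
Q4 = mat_mult(Q2, Q2)   # four-step transition probabilities

# Rows/columns correspond to states 1, 2, 3; the start state is 3.
prob_ruin = Q4[2][0]          # capital ever <= 1 within four months
prob_at_2_no_ruin = Q4[2][1]  # in state 2 at month four, never absorbed
```

The result `prob_ruin` reproduces the answer 201/256 obtained by squaring twice.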


Suppose now that $P_i = 1/4$, $i = 1, 2, 3, 4$. The transition probability matrix is
\[
\mathbf{Q} = \begin{Vmatrix} 1 & 0 & 0 \\ 1/2 & 1/4 & 1/4 \\ 1/4 & 1/4 & 1/2 \end{Vmatrix}
\]
Squaring this matrix and then squaring the result gives the matrix
\[
\mathbf{Q}^4 = \begin{Vmatrix} 1 & 0 & 0 \\ 222/256 & 13/256 & 21/256 \\ 201/256 & 21/256 & 34/256 \end{Vmatrix}
\]
Because the pensioner's initial end-of-month capital was 3, the desired answer is $Q_{3,1}^4 = 201/256$.

Let $\{X_n, n \geq 0\}$ be a Markov chain with transition probabilities $P_{i,j}$. If we let $Q_{i,j}$ denote the transition probabilities that transform all states in $\mathscr{A}$ into absorbing states, then
\[
Q_{i,j} = \begin{cases} 1, & \text{if } i \in \mathscr{A},\ j = i \\ 0, & \text{if } i \in \mathscr{A},\ j \neq i \\ P_{i,j}, & \text{otherwise} \end{cases}
\]
For $i, j \notin \mathscr{A}$, the $n$ stage transition probability $Q_{i,j}^n$ represents the probability that the original chain, starting in state $i$, will be in state $j$ at time $n$ without ever having entered any of the states in $\mathscr{A}$. For instance, in Example 4.10, starting with 5 at the beginning of January, the probability that the pensioner's capital is 4 at the beginning of May without ever having been less than or equal to 1 in that time is $Q_{3,2}^4 = 21/256$.

We can also compute the conditional probability of $X_n$ given that the chain starts in state $i$ and has not entered any state in $\mathscr{A}$ by time $n$, as follows. For $i, j \notin \mathscr{A}$,
\[
P\{X_n = j \mid X_0 = i,\ X_k \notin \mathscr{A},\ k = 1, \ldots, n\} = \frac{P\{X_n = j,\ X_k \notin \mathscr{A},\ k = 1, \ldots, n \mid X_0 = i\}}{P\{X_k \notin \mathscr{A},\ k = 1, \ldots, n \mid X_0 = i\}} = \frac{Q_{i,j}^n}{\sum_{r \notin \mathscr{A}} Q_{i,r}^n}
\]

4.3. Classification of States

State $j$ is said to be accessible from state $i$ if $P_{ij}^n > 0$ for some $n \geq 0$. Note that this implies that state $j$ is accessible from state $i$ if and only if, starting in $i$,


it is possible that the process will ever enter state $j$. This is true since if $j$ is not accessible from $i$, then
\[
P\{\text{ever enter } j \mid \text{start in } i\} = P\Bigl\{\bigcup_{n=0}^{\infty} \{X_n = j\} \Bigm| X_0 = i\Bigr\} \leq \sum_{n=0}^{\infty} P\{X_n = j \mid X_0 = i\} = \sum_{n=0}^{\infty} P_{ij}^n = 0
\]
Two states $i$ and $j$ that are accessible to each other are said to communicate, and we write $i \leftrightarrow j$.

Note that any state communicates with itself since, by definition,
\[
P_{ii}^0 = P\{X_0 = i \mid X_0 = i\} = 1
\]
The relation of communication satisfies the following three properties:

(i) State $i$ communicates with state $i$, all $i \geq 0$.
(ii) If state $i$ communicates with state $j$, then state $j$ communicates with state $i$.
(iii) If state $i$ communicates with state $j$, and state $j$ communicates with state $k$, then state $i$ communicates with state $k$.

Properties (i) and (ii) follow immediately from the definition of communication. To prove (iii) suppose that $i$ communicates with $j$, and $j$ communicates with $k$. Thus, there exist integers $n$ and $m$ such that $P_{ij}^n > 0$, $P_{jk}^m > 0$. Now by the Chapman–Kolmogorov equations, we have that
\[
P_{ik}^{n+m} = \sum_{r=0}^{\infty} P_{ir}^n P_{rk}^m \geq P_{ij}^n P_{jk}^m > 0
\]
Hence, state $k$ is accessible from state $i$. Similarly, we can show that state $i$ is accessible from state $k$. Hence, states $i$ and $k$ communicate.

Two states that communicate are said to be in the same class. It is an easy consequence of (i), (ii), and (iii) that any two classes of states are either identical or disjoint. In other words, the concept of communication divides the state space up into a number of separate classes. The Markov chain is said to be irreducible if there is only one class, that is, if all states communicate with each other.
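The class structure of a finite chain can be computed mechanically: build the reachability relation by transitive closure and group mutually reachable states. A sketch (Python), run here on the four-state chain analyzed in Example 4.12:

```python
def classes(P):
    """Communicating classes via reachability (Warshall transitive closure)."""
    n = len(P)
    reach = [[P[i][j] > 0 or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    seen, result = set(), []
    for i in range(n):
        if i not in seen:
            # i communicates with j iff each is accessible from the other.
            cls = {j for j in range(n) if reach[i][j] and reach[j][i]}
            seen |= cls
            result.append(cls)
    return result

# Four-state chain of Example 4.12.
P = [[0.5, 0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0, 0.0],
     [0.25, 0.25, 0.25, 0.25],
     [0.0, 0.0, 0.0, 1.0]]

cls = classes(P)
```

The output reproduces the classes {0, 1}, {2}, and {3} identified in that example.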


Example 4.11 Consider the Markov chain consisting of the three states 0, 1, 2 and having transition probability matrix
\[
\mathbf{P} = \begin{Vmatrix} \tfrac12 & \tfrac12 & 0 \\ \tfrac12 & \tfrac14 & \tfrac14 \\ 0 & \tfrac13 & \tfrac23 \end{Vmatrix}
\]
It is easy to verify that this Markov chain is irreducible. For example, it is possible to go from state 0 to state 2 since
\[
0 \to 1 \to 2
\]
That is, one way of getting from state 0 to state 2 is to go from state 0 to state 1 (with probability $\frac12$) and then go from state 1 to state 2 (with probability $\frac14$).

Example 4.12 Consider a Markov chain consisting of the four states 0, 1, 2, 3 and having transition probability matrix
\[
\mathbf{P} = \begin{Vmatrix} \tfrac12 & \tfrac12 & 0 & 0 \\ \tfrac12 & \tfrac12 & 0 & 0 \\ \tfrac14 & \tfrac14 & \tfrac14 & \tfrac14 \\ 0 & 0 & 0 & 1 \end{Vmatrix}
\]
The classes of this Markov chain are $\{0, 1\}$, $\{2\}$, and $\{3\}$. Note that while state 0 (or 1) is accessible from state 2, the reverse is not true. Since state 3 is an absorbing state, that is, $P_{33} = 1$, no other state is accessible from it.

For any state $i$ we let $f_i$ denote the probability that, starting in state $i$, the process will ever reenter state $i$. State $i$ is said to be recurrent if $f_i = 1$ and transient if $f_i < 1$.

Suppose that the process starts in state $i$ and $i$ is recurrent. Hence, with probability 1, the process will eventually reenter state $i$. However, by the definition of a Markov chain, it follows that the process will be starting over again when it reenters state $i$ and, therefore, state $i$ will eventually be visited again. Continual repetition of this argument leads to the conclusion that if state $i$ is recurrent then, starting in state $i$, the process will reenter state $i$ again and again and again; in fact, infinitely often.

On the other hand, suppose that state $i$ is transient. Hence, each time the process enters state $i$ there will be a positive probability, namely, $1 - f_i$, that it will never again enter that state. Therefore, starting in state $i$, the probability that the process will be in state $i$ for exactly $n$ time periods equals $f_i^{n-1}(1 - f_i)$, $n \geq 1$. In other words, if state $i$ is transient then, starting in state $i$, the number of time periods


that the process will be in state $i$ has a geometric distribution with finite mean $1/(1 - f_i)$.

From the preceding two paragraphs, it follows that state $i$ is recurrent if and only if, starting in state $i$, the expected number of time periods that the process is in state $i$ is infinite. But, letting
\[
I_n = \begin{cases} 1, & \text{if } X_n = i \\ 0, & \text{if } X_n \neq i \end{cases}
\]
we have that $\sum_{n=0}^{\infty} I_n$ represents the number of periods that the process is in state $i$. Also,
\[
E\Bigl[\sum_{n=0}^{\infty} I_n \Bigm| X_0 = i\Bigr] = \sum_{n=0}^{\infty} E[I_n \mid X_0 = i] = \sum_{n=0}^{\infty} P\{X_n = i \mid X_0 = i\} = \sum_{n=0}^{\infty} P_{ii}^n
\]
We have thus proven the following.

Proposition 4.1 State $i$ is
\[
\text{recurrent if } \sum_{n=1}^{\infty} P_{ii}^n = \infty, \qquad \text{transient if } \sum_{n=1}^{\infty} P_{ii}^n < \infty
\]

The argument leading to the preceding proposition is doubly important because it also shows that a transient state will only be visited a finite number of times (hence the name transient). This leads to the conclusion that in a finite-state Markov chain not all states can be transient. To see this, suppose the states are $0, 1, \ldots, M$ and suppose that they are all transient. Then after a finite amount of time (say, after time $T_0$) state 0 will never be visited, and after a time (say, $T_1$) state 1 will never be visited, and after a time (say, $T_2$) state 2 will never be visited, and so on. Thus, after a finite time $T = \max\{T_0, T_1, \ldots, T_M\}$ no states will be visited. But as the process must be in some state after time $T$ we arrive at a contradiction, which shows that at least one of the states must be recurrent.

Another use of Proposition 4.1 is that it enables us to show that recurrence is a class property.
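Proposition 4.1 can be illustrated numerically on the chain of Example 4.12: from state 2 the only way to return to state 2 is to stay there, so $P_{22}^n = (1/4)^n$ and the series converges (state 2 is transient), whereas $P_{00}^n = 1/2$ for every $n \geq 1$, so the series diverges (state 0 is recurrent). A sketch accumulating partial sums of $P_{ii}^n$ by matrix powers:

```python
def mat_mult(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Chain of Example 4.12: state 0 is recurrent, state 2 transient.
P = [[0.5, 0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0, 0.0],
     [0.25, 0.25, 0.25, 0.25],
     [0.0, 0.0, 0.0, 1.0]]

sum_00 = 0.0  # partial sum of P^n_{00}, n = 1..200
sum_22 = 0.0  # partial sum of P^n_{22}, n = 1..200
M = P
for n in range(200):
    sum_00 += M[0][0]
    sum_22 += M[2][2]
    M = mat_mult(M, P)
```

The transient sum settles at $\sum_{n \geq 1} (1/4)^n = 1/3$, while the recurrent sum grows linearly (here to 100 after 200 terms).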


Corollary 4.2 If state $i$ is recurrent, and state $i$ communicates with state $j$, then state $j$ is recurrent.

Proof To prove this we first note that, since state $i$ communicates with state $j$, there exist integers $k$ and $m$ such that $P_{ij}^k > 0$, $P_{ji}^m > 0$. Now, for any integer $n$,
\[
P_{jj}^{m+n+k} \geq P_{ji}^m P_{ii}^n P_{ij}^k
\]
This follows since the left side of the preceding is the probability of going from $j$ to $j$ in $m + n + k$ steps, while the right side is the probability of going from $j$ to $j$ in $m + n + k$ steps via a path that goes from $j$ to $i$ in $m$ steps, then from $i$ to $i$ in an additional $n$ steps, then from $i$ to $j$ in an additional $k$ steps.

From the preceding we obtain, by summing over $n$, that
\[
\sum_{n=1}^{\infty} P_{jj}^{m+n+k} \geq P_{ji}^m P_{ij}^k \sum_{n=1}^{\infty} P_{ii}^n = \infty
\]
since $P_{ji}^m P_{ij}^k > 0$ and $\sum_{n=1}^{\infty} P_{ii}^n$ is infinite since state $i$ is recurrent. Thus, by Proposition 4.1 it follows that state $j$ is also recurrent.

Remarks (i) Corollary 4.2 also implies that transience is a class property. For if state $i$ is transient and communicates with state $j$, then state $j$ must also be transient. For if $j$ were recurrent then, by Corollary 4.2, $i$ would also be recurrent and hence could not be transient.

(ii) Corollary 4.2 along with our previous result that not all states in a finite Markov chain can be transient leads to the conclusion that all states of a finite irreducible Markov chain are recurrent.

Example 4.13 Let the Markov chain consisting of the states 0, 1, 2, 3 have the transition probability matrix
\[
\mathbf{P} = \begin{Vmatrix} 0 & 0 & \tfrac12 & \tfrac12 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{Vmatrix}
\]
Determine which states are transient and which are recurrent.

Solution: It is a simple matter to check that all states communicate and, hence, since this is a finite chain, all states must be recurrent.


Example 4.14 Consider the Markov chain having states 0, 1, 2, 3, 4 and transition probability matrix
\[
\mathbf{P} = \begin{Vmatrix} \tfrac12 & \tfrac12 & 0 & 0 & 0 \\ \tfrac12 & \tfrac12 & 0 & 0 & 0 \\ 0 & 0 & \tfrac12 & \tfrac12 & 0 \\ 0 & 0 & \tfrac12 & \tfrac12 & 0 \\ \tfrac14 & \tfrac14 & 0 & 0 & \tfrac12 \end{Vmatrix}
\]
Determine the recurrent states.

Solution: This chain consists of the three classes $\{0, 1\}$, $\{2, 3\}$, and $\{4\}$. The first two classes are recurrent and the third transient.

Example 4.15 (A Random Walk) Consider a Markov chain whose state space consists of the integers $i = 0, \pm 1, \pm 2, \ldots$, and have transition probabilities given by
\[
P_{i,i+1} = p = 1 - P_{i,i-1}, \qquad i = 0, \pm 1, \pm 2, \ldots
\]
where $0 < p < 1$. In other words, on each transition the process either moves one step to the right (with probability $p$) or one step to the left (with probability $1 - p$). Since all states clearly communicate, it follows from Corollary 4.2 that they are either all transient or all recurrent. So let us consider state 0 and attempt to determine if $\sum_{n=1}^{\infty} P_{00}^n$ is finite or infinite.

Since it is impossible to return to the starting point after an odd number of steps, we must, of course, have that
\[
P_{00}^{2n-1} = 0, \qquad n = 1, 2, \ldots
\]
On the other hand, the process would be back at 0 after $2n$ steps if and only if $n$ of these steps were to the right and $n$ to the left. Because each step is to the right with probability $p$ and to the left with probability $1 - p$, the desired probability is the binomial probability
\[
P_{00}^{2n} = \binom{2n}{n} p^n (1 - p)^n = \frac{(2n)!}{n!\,n!} (p(1 - p))^n, \qquad n = 1, 2, 3, \ldots
\]
By using an approximation, due to Stirling, which asserts that
\[
n! \sim n^{n+1/2} e^{-n} \sqrt{2\pi}
\]


where we say that $a_n \sim b_n$ when $\lim_{n \to \infty} a_n / b_n = 1$, we obtain
\[
P_{00}^{2n} \sim \frac{(4p(1-p))^n}{\sqrt{\pi n}}
\]
Now it is easy to verify, for positive $a_n$, $b_n$, that if $a_n \sim b_n$, then $\sum_n a_n < \infty$ if and only if $\sum_n b_n < \infty$. Hence, $\sum_{n=1}^{\infty} P_{00}^n$ will converge if and only if
\[
\sum_{n=1}^{\infty} \frac{(4p(1-p))^n}{\sqrt{\pi n}}
\]
does. However, $4p(1-p) \leq 1$ with equality holding if and only if $p = \frac12$. Hence, $\sum_{n=1}^{\infty} P_{00}^n = \infty$ if and only if $p = \frac12$. Thus, the chain is recurrent when $p = \frac12$ and transient if $p \neq \frac12$.

When $p = \frac12$, the preceding process is called a symmetric random walk. We could also look at symmetric random walks in more than one dimension. For instance, in the two-dimensional symmetric random walk the process would, at each transition, either take one step to the left, right, up, or down, each having probability $\frac14$. That is, the state is the pair of integers $(i, j)$ and the transition probabilities are given by
\[
P_{(i,j),(i+1,j)} = P_{(i,j),(i-1,j)} = P_{(i,j),(i,j+1)} = P_{(i,j),(i,j-1)} = \tfrac14
\]
By using the same method as in the one-dimensional case, we now show that this Markov chain is also recurrent.

Since the preceding chain is irreducible, it follows that all states will be recurrent if state $\mathbf{0} = (0, 0)$ is recurrent. So consider $P_{00}^{2n}$. Now after $2n$ steps, the chain will be back in its original location if for some $i$, $0 \leq i \leq n$, the $2n$ steps consist of $i$ steps to the left, $i$ to the right, $n - i$ up, and $n - i$ down. Since each step will be either of these four types with probability $\frac14$, it follows that the desired probability is a multinomial probability. That is,
\begin{align}
P_{00}^{2n} &= \sum_{i=0}^{n} \frac{(2n)!}{i!\,i!\,(n-i)!\,(n-i)!} \Bigl(\frac14\Bigr)^{2n} \nonumber \\
&= \sum_{i=0}^{n} \frac{(2n)!}{n!\,n!} \frac{n!}{(n-i)!\,i!} \frac{n!}{(n-i)!\,i!} \Bigl(\frac14\Bigr)^{2n} \nonumber \\
&= \Bigl(\frac14\Bigr)^{2n} \binom{2n}{n} \sum_{i=0}^{n} \binom{n}{i} \binom{n}{n-i} \tag{4.4}
\end{align}


where the last equality uses the combinatorial identity
\[
\binom{2n}{n} = \sum_{i=0}^{n} \binom{n}{i} \binom{n}{n-i}
\]
which follows upon noting that both sides represent the number of subgroups of size $n$ one can select from a set of $n$ white and $n$ black objects. Now,
\[
\binom{2n}{n} = \frac{(2n)!}{n!\,n!} \sim \frac{(2n)^{2n+1/2} e^{-2n} \sqrt{2\pi}}{n^{2n+1} e^{-2n} (2\pi)} \quad \text{by Stirling's approximation} \quad = \frac{4^n}{\sqrt{\pi n}}
\]
Hence, from Equation (4.4) we see that
\[
P_{00}^{2n} \sim \frac{1}{\pi n}
\]
which shows that $\sum_n P_{00}^{2n} = \infty$, and thus all states are recurrent.

Interestingly enough, whereas the symmetric random walks in one and two dimensions are both recurrent, all higher-dimensional symmetric random walks turn out to be transient. (For instance, the three-dimensional symmetric random walk is at each transition equally likely to move in any of six ways: either to the left, right, up, down, in, or out.)

Remarks For the one-dimensional random walk of Example 4.15 here is a direct argument for establishing recurrence in the symmetric case, and for determining the probability that it ever returns to state 0 in the nonsymmetric case. Let
\[
\beta = P\{\text{ever return to } 0\}
\]
To determine $\beta$, start by conditioning on the initial transition to obtain
\[
\beta = P\{\text{ever return to } 0 \mid X_1 = 1\} p + P\{\text{ever return to } 0 \mid X_1 = -1\}(1 - p) \tag{4.5}
\]
Now, let $\alpha$ denote the probability that the Markov chain will ever return to state 0 given that it is currently in state 1. Because the Markov chain will always increase by 1 with probability $p$ or decrease by 1 with probability $1 - p$ no matter what its current state, note that $\alpha$ is also the probability that the Markov chain currently in state $i$ will ever enter state $i - 1$, for any $i$. To obtain an equation for $\alpha$, condition on the next transition to obtain


\begin{align*}
\alpha &= P\{\text{ever return} \mid X_1 = 1, X_2 = 0\}(1 - p) + P\{\text{ever return} \mid X_1 = 1, X_2 = 2\} p \\
&= 1 - p + P\{\text{ever return} \mid X_1 = 1, X_2 = 2\} p \\
&= 1 - p + p\alpha^2
\end{align*}
where the final equation follows by noting that in order for the chain to ever go from state 2 to state 0 it must first go to state 1 (and the probability of that ever happening is $\alpha$), and if it does eventually go to state 1 then it must still go to state 0 (and the conditional probability of that ever happening is also $\alpha$). Therefore,
\[
\alpha = 1 - p + p\alpha^2
\]
The two roots of this equation are $\alpha = 1$ and $\alpha = (1 - p)/p$. Consequently, in the case of the symmetric random walk where $p = 1/2$ we can conclude that $\alpha = 1$. By symmetry, the probability that the symmetric random walk will ever enter state 0 given that it is currently in state $-1$ is also 1, proving that the symmetric random walk is recurrent.

Suppose now that $p > 1/2$. In this case, it can be shown (see Exercise 17 at the end of this chapter) that $P\{\text{ever return to } 0 \mid X_1 = -1\} = 1$. Consequently, Equation (4.5) reduces to
\[
\beta = \alpha p + 1 - p
\]
Because the random walk is transient in this case we know that $\beta < 1$, showing that $\alpha = (1 - p)/p$. Therefore, when $p > 1/2$,
\[
\beta = 1 - p + 1 - p = 2(1 - p)
\]
Similarly, when $p < 1/2$ it can be shown that $\beta = 2p$. Thus, in general, the probability that the random walk ever returns to state 0 is $\beta = 2\min(p, 1 - p)$.

Example 4.16 (On the Ultimate Instability of the Aloha Protocol) Consider a communications facility in which the numbers of messages arriving during each of the time periods $n = 1, 2, \ldots$ are independent and identically distributed random variables. Let $a_i = P\{i \text{ arrivals}\}$, and suppose that $a_0 + a_1 < 1$. Each arriving message will transmit at the end of the period in which it arrives. If exactly one message is transmitted, then the transmission is successful and the message departs the system. However, if at any time two or more messages simultaneously transmit, then a collision is deemed to occur and these messages remain in the system. Once a message is involved in a collision it will, independently of all else, transmit at the end of each additional period with probability $p$ until it is successfully transmitted. This type of transmission scheme is known as the Aloha


protocol (because it was first instituted at the University of Hawaii). We will show that such a system is asymptotically unstable in the sense that the number of successful transmissions will, with probability 1, be finite.

To begin let $X_n$ denote the number of messages in the facility at the beginning of the $n$th period, and note that $\{X_n, n \geq 0\}$ is a Markov chain. Now for $k \geq 0$ define the indicator variables $I_k$ by
\[
I_k = \begin{cases} 1, & \text{if the first time that the chain departs state } k \text{ it directly goes to state } k - 1 \\ 0, & \text{otherwise} \end{cases}
\]
and let it be 0 if the system is never in state $k$, $k \geq 0$. (For instance, if the successive states are $0, 1, 3, 4, \ldots,$ then $I_3 = 0$ since when the chain first departs state 3 it goes to state 4; whereas, if they are $0, 3, 3, 2, \ldots,$ then $I_3 = 1$ since this time it goes to state 2.) Now,
\begin{align}
E\Bigl[\sum_{k=0}^{\infty} I_k\Bigr] &= \sum_{k=0}^{\infty} E[I_k] \nonumber \\
&= \sum_{k=0}^{\infty} P\{I_k = 1\} \nonumber \\
&\leq \sum_{k=0}^{\infty} P\{I_k = 1 \mid k \text{ is ever visited}\} \tag{4.6}
\end{align}
Now, $P\{I_k = 1 \mid k \text{ is ever visited}\}$ is the probability that when state $k$ is departed the next state is $k - 1$. That is, it is the conditional probability that a transition from $k$ is to $k - 1$ given that it is not back into $k$, and so
\[
P\{I_k = 1 \mid k \text{ is ever visited}\} = \frac{P_{k,k-1}}{1 - P_{k,k}}
\]
Because
\begin{align*}
P_{k,k-1} &= a_0 k p (1 - p)^{k-1}, \\
P_{k,k} &= a_0 \bigl[1 - kp(1 - p)^{k-1}\bigr] + a_1 (1 - p)^k
\end{align*}
which is seen by noting that if there are $k$ messages present at the beginning of a day, then (a) there will be $k - 1$ at the beginning of the next day if there are no new messages that day and exactly one of the $k$ messages transmits; and (b) there will be $k$ at the beginning of the next day if either


(i) there are no new messages and it is not the case that exactly one of the existing $k$ messages transmits, or
(ii) there is exactly one new message (which automatically transmits) and none of the other $k$ messages transmits.

Substitution of the preceding into Equation (4.6) yields
\[
E\Bigl[\sum_{k=0}^{\infty} I_k\Bigr] \leq \sum_{k=0}^{\infty} \frac{a_0 k p (1 - p)^{k-1}}{1 - a_0 [1 - kp(1 - p)^{k-1}] - a_1 (1 - p)^k} < \infty
\]
where the convergence follows by noting that when $k$ is large the denominator of the expression in the preceding sum converges to $1 - a_0$, and so the convergence or divergence of the sum is determined by whether or not the sum of the terms in the numerator converges, and $\sum_{k=0}^{\infty} k(1 - p)^{k-1} < \infty$.

Hence, $E[\sum_{k=0}^{\infty} I_k] < \infty$, which implies that $\sum_{k=0}^{\infty} I_k < \infty$ with probability 1 (for if there was a positive probability that $\sum_{k=0}^{\infty} I_k$ could be $\infty$, then its mean would be $\infty$). Hence, with probability 1, there will be only a finite number of states that are initially departed via a successful transmission; or equivalently, there will be some finite integer $N$ such that whenever there are $N$ or more messages in the system, there will never again be a successful transmission. From this (and the fact that such higher states will eventually be reached; why?) it follows that, with probability 1, there will only be a finite number of successful transmissions.

Remarks For a (slightly less than rigorous) probabilistic proof of Stirling's approximation, let $X_1, X_2, \ldots$ be independent Poisson random variables each having mean 1. Let $S_n = \sum_{i=1}^{n} X_i$, and note that both the mean and variance of $S_n$ are equal to $n$. Now,
\begin{align*}
P\{S_n = n\} &= P\{n - 1 < S_n \leq n\} \\
&= P\{-1/\sqrt{n} < (S_n - n)/\sqrt{n} \leq 0\} \\
&\approx \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{\sqrt{n}} \qquad \text{for } n \text{ large, by the central limit theorem}
\end{align*}
But $S_n$ is Poisson with mean $n$, and so $P\{S_n = n\} = e^{-n} n^n / n!$.


Hence, for $n$ large,
\[
\frac{e^{-n} n^n}{n!} \approx (2\pi n)^{-1/2}
\]
or, equivalently,
\[
n! \approx n^{n+1/2} e^{-n} \sqrt{2\pi}
\]
which is Stirling's approximation.

4.4. Limiting Probabilities

In Example 4.8, we calculated $\mathbf{P}^{(4)}$ for a two-state Markov chain; it turned out to be
\[
\mathbf{P}^{(4)} = \begin{Vmatrix} 0.5749 & 0.4251 \\ 0.5668 & 0.4332 \end{Vmatrix}
\]
From this it follows that $\mathbf{P}^{(8)} = \mathbf{P}^{(4)} \cdot \mathbf{P}^{(4)}$ is given (to three significant places) by
\[
\mathbf{P}^{(8)} = \begin{Vmatrix} 0.572 & 0.428 \\ 0.570 & 0.430 \end{Vmatrix}
\]
Note that the matrix $\mathbf{P}^{(8)}$ is almost identical to the matrix $\mathbf{P}^{(4)}$, and secondly, that each of the rows of $\mathbf{P}^{(8)}$ has almost identical entries. In fact it seems that $P_{ij}^n$ is converging to some value (as $n \to \infty$) which is the same for all $i$. In other words, there seems to exist a limiting probability that the process will be in state $j$ after a large number of transitions, and this value is independent of the initial state.

To make the preceding heuristics more precise, two additional properties of the states of a Markov chain need to be considered. State $i$ is said to have period $d$ if $P_{ii}^n = 0$ whenever $n$ is not divisible by $d$, and $d$ is the largest integer with this property. For instance, starting in $i$, it may be possible for the process to enter state $i$ only at the times $2, 4, 6, 8, \ldots,$ in which case state $i$ has period 2. A state with period 1 is said to be aperiodic. It can be shown that periodicity is a class property. That is, if state $i$ has period $d$, and states $i$ and $j$ communicate, then state $j$ also has period $d$.

If state $i$ is recurrent, then it is said to be positive recurrent if, starting in $i$, the expected time until the process returns to state $i$ is finite. It can be shown that positive recurrence is a class property. While there exist recurrent states that are not positive recurrent,* it can be shown that in a finite-state Markov chain all recurrent states are positive recurrent. Positive recurrent, aperiodic states are called ergodic.

*Such states are called null recurrent.
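The row convergence just observed can be pushed further by repeated squaring; a sketch (Python), showing both rows of the weather chain approaching the common value $4/7 \approx 0.5714$ (which will be identified as the limiting probability of rain in Example 4.17):

```python
def mat_mult(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[0.7, 0.3],
     [0.4, 0.6]]

# Square repeatedly: after k squarings M holds P^(2^k).
M = P
for _ in range(5):  # M becomes P^(32)
    M = mat_mult(M, M)
```

After five squarings the two rows of `M` agree to machine precision, which is the numerical face of the limit theorem stated next.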


We are now ready for the following important theorem which we state without proof.

Theorem 4.1 For an irreducible ergodic Markov chain $\lim_{n\to\infty} P_{ij}^n$ exists and is independent of $i$. Furthermore, letting
\[
\pi_j = \lim_{n\to\infty} P_{ij}^n, \qquad j \geq 0
\]
then $\pi_j$ is the unique nonnegative solution of
\[
\pi_j = \sum_{i=0}^{\infty} \pi_i P_{ij}, \qquad j \geq 0, \qquad \sum_{j=0}^{\infty} \pi_j = 1 \tag{4.7}
\]

Remarks (i) Given that $\pi_j = \lim_{n\to\infty} P_{ij}^n$ exists and is independent of the initial state $i$, it is not difficult to (heuristically) see that the $\pi$'s must satisfy Equation (4.7). Let us derive an expression for $P\{X_{n+1} = j\}$ by conditioning on the state at time $n$. That is,
\[
P\{X_{n+1} = j\} = \sum_{i=0}^{\infty} P\{X_{n+1} = j \mid X_n = i\} P\{X_n = i\} = \sum_{i=0}^{\infty} P_{ij} P\{X_n = i\}
\]
Letting $n \to \infty$, and assuming that we can bring the limit inside the summation, leads to
\[
\pi_j = \sum_{i=0}^{\infty} P_{ij} \pi_i
\]

(ii) It can be shown that $\pi_j$, the limiting probability that the process will be in state $j$ at time $n$, also equals the long-run proportion of time that the process will be in state $j$.

(iii) If the Markov chain is irreducible, then there will be a solution to
\[
\pi_j = \sum_i \pi_i P_{ij}, \qquad j \geq 0, \qquad \sum_j \pi_j = 1
\]
if and only if the Markov chain is positive recurrent. If a solution exists then it will be unique, and $\pi_j$ will equal the long-run proportion of time that the Markov


chain is in state $j$. If the chain is aperiodic, then $\pi_j$ is also the limiting probability that the chain is in state $j$.

Example 4.17 Consider Example 4.1, in which we assume that if it rains today, then it will rain tomorrow with probability $\alpha$; and if it does not rain today, then it will rain tomorrow with probability $\beta$. If we say that the state is 0 when it rains and 1 when it does not rain, then by Equation (4.7) the limiting probabilities $\pi_0$ and $\pi_1$ are given by
\begin{align*}
\pi_0 &= \alpha \pi_0 + \beta \pi_1, \\
\pi_1 &= (1 - \alpha)\pi_0 + (1 - \beta)\pi_1, \\
\pi_0 + \pi_1 &= 1
\end{align*}
which yields that
\[
\pi_0 = \frac{\beta}{1 + \beta - \alpha}, \qquad \pi_1 = \frac{1 - \alpha}{1 + \beta - \alpha}
\]
For example if $\alpha = 0.7$ and $\beta = 0.4$, then the limiting probability of rain is $\pi_0 = \frac{4}{7} \approx 0.571$.

Example 4.18 Consider Example 4.3 in which the mood of an individual is considered as a three-state Markov chain having a transition probability matrix
\[
\mathbf{P} = \begin{Vmatrix} 0.5 & 0.4 & 0.1 \\ 0.3 & 0.4 & 0.3 \\ 0.2 & 0.3 & 0.5 \end{Vmatrix}
\]
In the long run, what proportion of time is the process in each of the three states?

Solution: The limiting probabilities $\pi_i$, $i = 0, 1, 2$, are obtained by solving the set of equations in Equation (4.7). In this case these equations are
\begin{align*}
\pi_0 &= 0.5\pi_0 + 0.3\pi_1 + 0.2\pi_2, \\
\pi_1 &= 0.4\pi_0 + 0.4\pi_1 + 0.3\pi_2, \\
\pi_2 &= 0.1\pi_0 + 0.3\pi_1 + 0.5\pi_2, \\
\pi_0 + \pi_1 + \pi_2 &= 1
\end{align*}
Solving yields
\[
\pi_0 = \frac{21}{62}, \qquad \pi_1 = \frac{23}{62}, \qquad \pi_2 = \frac{18}{62}
\]
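The linear system in Example 4.18 can be cross-checked by power iteration, since for an ergodic chain the distribution of $X_n$ converges to $\pi$ regardless of the starting state; a sketch:

```python
# Mood chain of Example 4.18.
P = [[0.5, 0.4, 0.1],
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]

# Start in state 0 and push the distribution forward many steps.
v = [1.0, 0.0, 0.0]
for _ in range(200):
    v = [sum(v[i] * P[i][j] for i in range(3)) for j in range(3)]
```

After 200 steps `v` agrees with (21/62, 23/62, 18/62) to machine precision.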


Example 4.19 (A Model of Class Mobility) A problem of interest to sociologists is to determine the proportion of society that has an upper- or lower-class occupation. One possible mathematical model would be to assume that transitions between social classes of the successive generations in a family can be regarded as transitions of a Markov chain. That is, we assume that the occupation of a child depends only on his or her parent's occupation. Let us suppose that such a model is appropriate and that the transition probability matrix is given by
\[
\mathbf{P} = \begin{Vmatrix} 0.45 & 0.48 & 0.07 \\ 0.05 & 0.70 & 0.25 \\ 0.01 & 0.50 & 0.49 \end{Vmatrix} \tag{4.8}
\]
That is, for instance, we suppose that the child of a middle-class worker will attain an upper-, middle-, or lower-class occupation with respective probabilities 0.05, 0.70, 0.25.

The limiting probabilities $\pi_i$ thus satisfy
\begin{align*}
\pi_0 &= 0.45\pi_0 + 0.05\pi_1 + 0.01\pi_2, \\
\pi_1 &= 0.48\pi_0 + 0.70\pi_1 + 0.50\pi_2, \\
\pi_2 &= 0.07\pi_0 + 0.25\pi_1 + 0.49\pi_2, \\
\pi_0 + \pi_1 + \pi_2 &= 1
\end{align*}
Hence,
\[
\pi_0 = 0.07, \qquad \pi_1 = 0.62, \qquad \pi_2 = 0.31
\]
In other words, a society in which social mobility between classes can be described by a Markov chain with transition probability matrix given by Equation (4.8) has, in the long run, 7 percent of its people in upper-class jobs, 62 percent of its people in middle-class jobs, and 31 percent in lower-class jobs.

Example 4.20 (The Hardy–Weinberg Law and a Markov Chain in Genetics) Consider a large population of individuals, each of whom possesses a particular pair of genes, of which each individual gene is classified as being of type A or type a. Assume that the proportions of individuals whose gene pairs are AA, aa, or Aa are, respectively, $p_0$, $q_0$, and $r_0$ ($p_0 + q_0 + r_0 = 1$). When two individuals mate, each contributes one of his or her genes, chosen at random, to the resultant offspring. Assuming that the mating occurs at random, in that each individual is equally likely to mate with any other individual, we are interested in determining the proportions of individuals in the next generation whose genes are AA, aa, or Aa. Calling these proportions p, q, and r, they are easily obtained by focusing


attention on an individual of the next generation and then determining the probabilities for the gene pair of that individual.

To begin, note that randomly choosing a parent and then randomly choosing one of its genes is equivalent to just randomly choosing a gene from the total gene population. By conditioning on the gene pair of the parent, we see that a randomly chosen gene will be type A with probability
\[
P\{A\} = P\{A \mid AA\}p_0 + P\{A \mid aa\}q_0 + P\{A \mid Aa\}r_0 = p_0 + r_0/2
\]
Similarly, it will be type a with probability
\[
P\{a\} = q_0 + r_0/2
\]
Thus, under random mating a randomly chosen member of the next generation will be type AA with probability $p$, where
\[
p = P\{A\}P\{A\} = (p_0 + r_0/2)^2
\]
Similarly, the randomly chosen member will be type aa with probability
\[
q = P\{a\}P\{a\} = (q_0 + r_0/2)^2
\]
and will be type Aa with probability
\[
r = 2P\{A\}P\{a\} = 2(p_0 + r_0/2)(q_0 + r_0/2)
\]
Since each member of the next generation will independently be of each of the three gene types with probabilities p, q, r, it follows that the percentages of the members of the next generation that are of type AA, aa, or Aa are respectively p, q, and r.

If we now consider the total gene pool of this next generation, then $p + r/2$, the fraction of its genes that are A, will be unchanged from the previous generation. This follows either by arguing that the total gene pool has not changed from generation to generation or by the following simple algebra:
\begin{align}
p + r/2 &= (p_0 + r_0/2)^2 + (p_0 + r_0/2)(q_0 + r_0/2) \nonumber \\
&= (p_0 + r_0/2)[p_0 + r_0/2 + q_0 + r_0/2] \nonumber \\
&= p_0 + r_0/2 \qquad \text{since } p_0 + r_0 + q_0 = 1 \nonumber \\
&= P\{A\} \tag{4.9}
\end{align}
Thus, the fractions of the gene pool that are A and a are the same as in the initial generation. From this it follows that, under random mating, in all successive generations after the initial one the percentages of the population having gene pairs AA, aa, and Aa will remain fixed at the values p, q, and r. This is known as the Hardy–Weinberg law.
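The stability expressed by Equation (4.9) is easy to check numerically; the initial proportions below (0.2, 0.3, 0.5) are an arbitrary illustration, not values from the text:

```python
p0, q0, r0 = 0.2, 0.3, 0.5  # illustrative initial proportions of AA, aa, Aa

def next_generation(p, q, r):
    """One round of random mating: new (AA, aa, Aa) proportions."""
    prob_A = p + r / 2  # frequency of gene A in the total gene pool
    prob_a = q + r / 2
    return prob_A ** 2, prob_a ** 2, 2 * prob_A * prob_a

p1, q1, r1 = next_generation(p0, q0, r0)
p2, q2, r2 = next_generation(p1, q1, r1)
```

The gene frequency $p + r/2$ is preserved at each step, and the genotype proportions are fixed from the first offspring generation on, exactly as the Hardy–Weinberg law asserts.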


Suppose now that the gene pair population has stabilized in the percentages p, q, r, and let us follow the genetic history of a single individual and her descendants. (For simplicity, assume that each individual has exactly one offspring.) So, for a given individual, let $X_n$ denote the genetic state of her descendant in the $n$th generation. The transition probability matrix of this Markov chain, namely,
\[
\begin{array}{c|ccc}
 & AA & aa & Aa \\ \hline
AA & p + \dfrac{r}{2} & 0 & q + \dfrac{r}{2} \\[2mm]
aa & 0 & q + \dfrac{r}{2} & p + \dfrac{r}{2} \\[2mm]
Aa & \dfrac{p}{2} + \dfrac{r}{4} & \dfrac{q}{2} + \dfrac{r}{4} & \dfrac{p}{2} + \dfrac{q}{2} + \dfrac{r}{2}
\end{array}
\]
is easily verified by conditioning on the state of the randomly chosen mate. It is quite intuitive (why?) that the limiting probabilities for this Markov chain (which also equal the fractions of the individual's descendants that are in each of the three genetic states) should just be p, q, and r. To verify this we must show that they satisfy Equation (4.7). Because one of the equations in Equation (4.7) is redundant, it suffices to show that
\begin{align*}
p &= p\Bigl(p + \frac{r}{2}\Bigr) + r\Bigl(\frac{p}{2} + \frac{r}{4}\Bigr) = \Bigl(p + \frac{r}{2}\Bigr)^2, \\
q &= q\Bigl(q + \frac{r}{2}\Bigr) + r\Bigl(\frac{q}{2} + \frac{r}{4}\Bigr) = \Bigl(q + \frac{r}{2}\Bigr)^2, \\
p + q + r &= 1
\end{align*}
But this follows from Equation (4.9), and thus the result is established.

Example 4.21 Suppose that a production process changes states in accordance with an irreducible, positive recurrent Markov chain having transition probabilities $P_{ij}$, $i, j = 1, \ldots, n$, and suppose that certain of the states are considered acceptable and the remaining unacceptable. Let $A$ denote the acceptable states and $A^c$ the unacceptable ones. If the production process is said to be "up" when in an acceptable state and "down" when in an unacceptable state, determine

1. the rate at which the production process goes from up to down (that is, the rate of breakdowns);
2. the average length of time the process remains down when it goes down; and
3. the average length of time the process remains up when it goes up.


Solution: Let $\pi_k$, $k = 1, \ldots, n$, denote the long-run proportions. Now for $i \in A$ and $j \in A^c$ the rate at which the process enters state $j$ from state $i$ is

$$\text{rate enter } j \text{ from } i = \pi_i P_{ij}$$

and so the rate at which the production process enters state $j$ from an acceptable state is

$$\text{rate enter } j \text{ from } A = \sum_{i \in A} \pi_i P_{ij}$$

Hence, the rate at which it enters an unacceptable state from an acceptable one (which is the rate at which breakdowns occur) is

$$\text{rate breakdowns occur} = \sum_{j \in A^c} \sum_{i \in A} \pi_i P_{ij} \tag{4.10}$$

Now let $\bar{U}$ and $\bar{D}$ denote the average time the process remains up when it goes up and down when it goes down. Because there is a single breakdown every $\bar{U} + \bar{D}$ time units on the average, it follows heuristically that

$$\text{rate at which breakdowns occur} = \frac{1}{\bar{U} + \bar{D}}$$

and so, from Equation (4.10),

$$\frac{1}{\bar{U} + \bar{D}} = \sum_{j \in A^c} \sum_{i \in A} \pi_i P_{ij} \tag{4.11}$$

To obtain a second equation relating $\bar{U}$ and $\bar{D}$, consider the percentage of time the process is up, which, of course, is equal to $\sum_{i \in A} \pi_i$. However, since the process is up on the average $\bar{U}$ out of every $\bar{U} + \bar{D}$ time units, it follows (again somewhat heuristically) that

$$\text{proportion of up time} = \frac{\bar{U}}{\bar{U} + \bar{D}}$$

and so

$$\frac{\bar{U}}{\bar{U} + \bar{D}} = \sum_{i \in A} \pi_i \tag{4.12}$$


Hence, from Equations (4.11) and (4.12) we obtain

$$\bar{U} = \frac{\sum_{i \in A} \pi_i}{\sum_{j \in A^c} \sum_{i \in A} \pi_i P_{ij}},$$
$$\bar{D} = \frac{1 - \sum_{i \in A} \pi_i}{\sum_{j \in A^c} \sum_{i \in A} \pi_i P_{ij}} = \frac{\sum_{i \in A^c} \pi_i}{\sum_{j \in A^c} \sum_{i \in A} \pi_i P_{ij}}$$

For example, suppose the transition probability matrix is

$$P = \begin{pmatrix} \frac{1}{4} & \frac{1}{4} & \frac{1}{2} & 0 \\ 0 & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} & 0 & \frac{1}{2} \end{pmatrix}$$

where the acceptable (up) states are 1, 2 and the unacceptable (down) ones are 3, 4. The limiting probabilities satisfy

$$\pi_1 = \pi_1 \tfrac{1}{4} + \pi_3 \tfrac{1}{4} + \pi_4 \tfrac{1}{4},$$
$$\pi_2 = \pi_1 \tfrac{1}{4} + \pi_2 \tfrac{1}{4} + \pi_3 \tfrac{1}{4} + \pi_4 \tfrac{1}{4},$$
$$\pi_3 = \pi_1 \tfrac{1}{2} + \pi_2 \tfrac{1}{2} + \pi_3 \tfrac{1}{4},$$
$$\pi_1 + \pi_2 + \pi_3 + \pi_4 = 1$$

These solve to yield

$$\pi_1 = \frac{3}{16}, \quad \pi_2 = \frac{1}{4}, \quad \pi_3 = \frac{14}{48}, \quad \pi_4 = \frac{13}{48}$$

and thus

$$\text{rate of breakdowns} = \pi_1(P_{13} + P_{14}) + \pi_2(P_{23} + P_{24}) = \frac{9}{32},$$
$$\bar{U} = \frac{14}{9} \quad \text{and} \quad \bar{D} = 2$$

Hence, on the average, breakdowns occur about $\frac{9}{32}$ (or 28 percent) of the time. They last, on the average, 2 time units, and then there follows a stretch of (on the average) $\frac{14}{9}$ time units when the system is up.
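The numbers in this example can be reproduced with a few lines of linear algebra (a sketch, not from the text): solve the balance equations for $\pi$, then apply Equations (4.10)–(4.12).

```python
import numpy as np

# Stationary probabilities, breakdown rate, and mean up/down stretch lengths
# for the 4-state production-process example above.
P = np.array([[1/4, 1/4, 1/2, 0],
              [0,   1/4, 1/2, 1/4],
              [1/4, 1/4, 1/4, 1/4],
              [1/4, 1/4, 0,   1/2]])
n = 4
# pi P = pi plus normalization: replace one balance equation by sum(pi) = 1
A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
pi = np.linalg.solve(A, np.array([0, 0, 0, 1.0]))   # [3/16, 1/4, 14/48, 13/48]

up, down = [0, 1], [2, 3]                            # states 1,2 up; 3,4 down
rate = sum(pi[i] * P[i, j] for i in up for j in down)
U = pi[up].sum() / rate                              # mean up stretch
D = pi[down].sum() / rate                            # mean down stretch
assert abs(rate - 9/32) < 1e-12
assert abs(U - 14/9) < 1e-12 and abs(D - 2) < 1e-12
```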


Remarks (i) The long run proportions $\pi_j$, $j \geq 0$, are often called stationary probabilities. The reason being that if the initial state is chosen according to the probabilities $\pi_j$, $j \geq 0$, then the probability of being in state $j$ at any time $n$ is also equal to $\pi_j$. That is, if

$$P\{X_0 = j\} = \pi_j, \quad j \geq 0$$

then

$$P\{X_n = j\} = \pi_j \quad \text{for all } n, j \geq 0$$

The preceding is easily proven by induction, for if we suppose it true for $n - 1$, then writing

$$P\{X_n = j\} = \sum_i P\{X_n = j \mid X_{n-1} = i\} P\{X_{n-1} = i\}$$
$$= \sum_i P_{ij}\pi_i \quad \text{by the induction hypothesis}$$
$$= \pi_j \quad \text{by Equation (4.7)}$$

(ii) For state $j$, define $m_{jj}$ to be the expected number of transitions until a Markov chain, starting in state $j$, returns to that state. Since, on the average, the chain will spend 1 unit of time in state $j$ for every $m_{jj}$ units of time, it follows that

$$\pi_j = \frac{1}{m_{jj}}$$

In words, the proportion of time in state $j$ equals the inverse of the mean time between visits to $j$. (The preceding is a special case of a general result, sometimes called the strong law for renewal processes, which will be presented in Chapter 7.)

Example 4.22 (Mean Pattern Times in Markov Chain Generated Data) Consider an irreducible Markov chain $\{X_n, n \geq 0\}$ with transition probabilities $P_{i,j}$ and stationary probabilities $\pi_j$, $j \geq 0$. Starting in state $r$, we are interested in determining the expected number of transitions until the pattern $i_1, i_2, \ldots, i_k$ appears. That is, with

$$N(i_1, i_2, \ldots, i_k) = \min\{n \geq k:\ X_{n-k+1} = i_1, \ldots, X_n = i_k\}$$

we are interested in

$$E[N(i_1, i_2, \ldots, i_k) \mid X_0 = r]$$

Note that even if $i_1 = r$, the initial state $X_0$ is not considered part of the pattern sequence.

Let $\mu(i, i_1)$ be the mean number of transitions for the chain to enter state $i_1$, given that the initial state is $i$, $i \geq 0$. The quantities $\mu(i, i_1)$ can be determined as


the solution of the following set of equations, obtained by conditioning on the first transition out of state $i$:

$$\mu(i, i_1) = 1 + \sum_{j \neq i_1} P_{i,j}\,\mu(j, i_1), \quad i \geq 0$$

For the Markov chain $\{X_n, n \geq 0\}$ associate a corresponding Markov chain, which we will refer to as the $k$-chain, whose state at any time is the sequence of the most recent $k$ states of the original chain. (For instance, if $k = 3$ and $X_2 = 4$, $X_3 = 1$, $X_4 = 1$, then the state of the $k$-chain at time 4 is $(4, 1, 1)$.) Let $\pi(j_1, \ldots, j_k)$ be the stationary probabilities for the $k$-chain. Because $\pi(j_1, \ldots, j_k)$ is the proportion of time that the state of the original Markov chain $k$ units ago was $j_1$ and the following $k - 1$ states, in sequence, were $j_2, \ldots, j_k$, we can conclude that

$$\pi(j_1, \ldots, j_k) = \pi_{j_1} P_{j_1,j_2} \cdots P_{j_{k-1},j_k}$$

Moreover, because the mean number of transitions between successive visits of the $k$-chain to the state $i_1, i_2, \ldots, i_k$ is equal to the inverse of the stationary probability of that state, we have that

$$E[\text{number of transitions between visits to } i_1, i_2, \ldots, i_k] = \frac{1}{\pi(i_1, \ldots, i_k)} \tag{4.13}$$

Let $A(i_1, \ldots, i_m)$ be the additional number of transitions needed until the pattern appears, given that the first $m$ transitions have taken the chain into states $X_1 = i_1, \ldots, X_m = i_m$.

We will now consider whether the pattern has overlaps, where we say that the pattern $i_1, i_2, \ldots, i_k$ has an overlap of size $j$, $j < k$, if the sequence of its final $j$ elements is the same as that of its first $j$ elements. That is, it has an overlap of size $j$ if

$$(i_{k-j+1}, \ldots, i_k) = (i_1, \ldots, i_j), \quad j < k$$

Case 1: The pattern $i_1, i_2, \ldots, i_k$ has no overlaps.

Because there is no overlap, Equation (4.13) yields that

$$E[N(i_1, i_2, \ldots, i_k) \mid X_0 = i_k] = \frac{1}{\pi(i_1, \ldots, i_k)}$$

Because the time until the pattern occurs is equal to the time until the chain enters state $i_1$ plus the additional time, we may write

$$E[N(i_1, i_2, \ldots, i_k) \mid X_0 = i_k] = \mu(i_k, i_1) + E[A(i_1)]$$


The preceding two equations imply

$$E[A(i_1)] = \frac{1}{\pi(i_1, \ldots, i_k)} - \mu(i_k, i_1)$$

Using that

$$E[N(i_1, i_2, \ldots, i_k) \mid X_0 = r] = \mu(r, i_1) + E[A(i_1)]$$

gives the result

$$E[N(i_1, i_2, \ldots, i_k) \mid X_0 = r] = \mu(r, i_1) + \frac{1}{\pi(i_1, \ldots, i_k)} - \mu(i_k, i_1)$$

where

$$\pi(i_1, \ldots, i_k) = \pi_{i_1} P_{i_1,i_2} \cdots P_{i_{k-1},i_k}$$

Case 2: Now suppose that the pattern has overlaps and let its largest overlap be of size $s$. In this case the number of transitions between successive visits of the $k$-chain to the state $i_1, i_2, \ldots, i_k$ is equal to the additional number of transitions of the original chain until the pattern appears given that it has already made $s$ transitions with the results $X_1 = i_1, \ldots, X_s = i_s$. Therefore, from Equation (4.13)

$$E[A(i_1, \ldots, i_s)] = \frac{1}{\pi(i_1, \ldots, i_k)}$$

But because

$$N(i_1, i_2, \ldots, i_k) = N(i_1, \ldots, i_s) + A(i_1, \ldots, i_s)$$

we have

$$E[N(i_1, i_2, \ldots, i_k) \mid X_0 = r] = E[N(i_1, \ldots, i_s) \mid X_0 = r] + \frac{1}{\pi(i_1, \ldots, i_k)}$$

We can now repeat the same procedure on the pattern $i_1, \ldots, i_s$, continuing to do so until we reach one that has no overlap, and then apply the result from Case 1.
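For independent and identically distributed data the repeated decomposition above simplifies nicely: $\mu(i, j) = 1/P_j$ for every $i$, so all $\mu$ terms cancel and the answer is the sum of $1/\pi$ over the successive largest-overlap subpatterns. A sketch (not from the text; the probabilities below are arbitrary illustrations):

```python
def expected_pattern_time_iid(pattern, prob):
    """E[time to see `pattern`] for i.i.d. draws; prob maps value -> probability."""
    total = 0.0
    s = list(pattern)
    while s:
        # stationary probability of the current (sub)pattern: product of P's
        pi = 1.0
        for x in s:
            pi *= prob[x]
        total += 1.0 / pi
        # largest overlap: longest proper suffix of s equal to its prefix
        k = next((j for j in range(len(s) - 1, 0, -1) if s[-j:] == s[:j]), 0)
        s = s[:k]
    return total

prob = {1: 0.5, 2: 0.3, 3: 0.2}          # hypothetical value probabilities
pattern = [1, 2, 3, 1, 2, 3, 1, 2]
expected = expected_pattern_time_iid(pattern, prob)
# same as 1/(P1 P2) + 1/(P1^2 P2^2 P3) + 1/(P1^3 P2^3 P3^2)
p1, p2, p3 = 0.5, 0.3, 0.2
closed = 1/(p1*p2) + 1/(p1**2 * p2**2 * p3) + 1/(p1**3 * p2**3 * p3**2)
assert abs(expected - closed) < 1e-9
```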


For instance, suppose the desired pattern is $1, 2, 3, 1, 2, 3, 1, 2$. Then

$$E[N(1,2,3,1,2,3,1,2) \mid X_0 = r] = E[N(1,2,3,1,2) \mid X_0 = r] + \frac{1}{\pi(1,2,3,1,2,3,1,2)}$$

Because the largest overlap of the pattern $(1,2,3,1,2)$ is of size 2, the same argument as in the preceding gives

$$E[N(1,2,3,1,2) \mid X_0 = r] = E[N(1,2) \mid X_0 = r] + \frac{1}{\pi(1,2,3,1,2)}$$

Because the pattern $(1,2)$ has no overlap, we obtain from Case 1 that

$$E[N(1,2) \mid X_0 = r] = \mu(r,1) + \frac{1}{\pi(1,2)} - \mu(2,1)$$

Putting it together yields

$$E[N(1,2,3,1,2,3,1,2) \mid X_0 = r] = \mu(r,1) + \frac{1}{\pi_1 P_{1,2}} - \mu(2,1) + \frac{1}{\pi_1 P_{1,2}^2 P_{2,3} P_{3,1}} + \frac{1}{\pi_1 P_{1,2}^3 P_{2,3}^2 P_{3,1}^2}$$

If the generated data is a sequence of independent and identically distributed random variables, with each value equal to $j$ with probability $P_j$, then the Markov chain has $P_{i,j} = P_j$. In this case, $\pi_j = P_j$. Also, because the time to go from state $i$ to state $j$ is a geometric random variable with parameter $P_j$, we have $\mu(i, j) = 1/P_j$. Thus, the expected number of data values that need be generated before the pattern $1, 2, 3, 1, 2, 3, 1, 2$ appears would be

$$\frac{1}{P_1} + \frac{1}{P_1 P_2} - \frac{1}{P_1} + \frac{1}{P_1^2 P_2^2 P_3} + \frac{1}{P_1^3 P_2^3 P_3^2} = \frac{1}{P_1 P_2} + \frac{1}{P_1^2 P_2^2 P_3} + \frac{1}{P_1^3 P_2^3 P_3^2}$$

The following result is quite useful.

Proposition 4.3 Let $\{X_n, n \geq 1\}$ be an irreducible Markov chain with stationary probabilities $\pi_j$, $j \geq 0$, and let $r$ be a bounded function on the state space. Then, with probability 1,

$$\lim_{N \to \infty} \frac{\sum_{n=1}^{N} r(X_n)}{N} = \sum_{j=0}^{\infty} r(j)\pi_j$$


Proof If we let $a_j(N)$ be the amount of time the Markov chain spends in state $j$ during time periods $1, \ldots, N$, then

$$\sum_{n=1}^{N} r(X_n) = \sum_{j=0}^{\infty} a_j(N)r(j)$$

Since $a_j(N)/N \to \pi_j$ the result follows from the preceding upon dividing by $N$ and then letting $N \to \infty$.

If we suppose that we earn a reward $r(j)$ whenever the chain is in state $j$, then Proposition 4.3 states that our average reward per unit time is $\sum_j r(j)\pi_j$.

Example 4.23 For the four state Bonus Malus automobile insurance system specified in Example 4.7, find the average annual premium paid by a policyholder whose yearly number of claims is a Poisson random variable with mean 1/2.

Solution: With $a_k = e^{-1/2}\frac{(1/2)^k}{k!}$, we have

$$a_0 = 0.6065, \quad a_1 = 0.3033, \quad a_2 = 0.0758$$

Therefore, the Markov chain of successive states has the following transition probability matrix.

$$\begin{pmatrix} 0.6065 & 0.3033 & 0.0758 & 0.0144 \\ 0.6065 & 0.0000 & 0.3033 & 0.0902 \\ 0.0000 & 0.6065 & 0.0000 & 0.3935 \\ 0.0000 & 0.0000 & 0.6065 & 0.3935 \end{pmatrix}$$

The stationary probabilities are given as the solution of

$$\pi_1 = 0.6065\pi_1 + 0.6065\pi_2,$$
$$\pi_2 = 0.3033\pi_1 + 0.6065\pi_3,$$
$$\pi_3 = 0.0758\pi_1 + 0.3033\pi_2 + 0.6065\pi_4,$$
$$\pi_1 + \pi_2 + \pi_3 + \pi_4 = 1$$

Rewriting the first three of these equations gives

$$\pi_2 = \frac{1 - 0.6065}{0.6065}\pi_1,$$
$$\pi_3 = \frac{\pi_2 - 0.3033\pi_1}{0.6065},$$
$$\pi_4 = \frac{\pi_3 - 0.0758\pi_1 - 0.3033\pi_2}{0.6065}$$


or

$$\pi_2 = 0.6488\pi_1, \quad \pi_3 = 0.5697\pi_1, \quad \pi_4 = 0.4900\pi_1$$

Using that $\sum_{i=1}^{4}\pi_i = 1$ gives the solution (rounded to four decimal places)

$$\pi_1 = 0.3692, \quad \pi_2 = 0.2395, \quad \pi_3 = 0.2103, \quad \pi_4 = 0.1809$$

Therefore, the average annual premium paid is

$$200\pi_1 + 250\pi_2 + 400\pi_3 + 600\pi_4 = 326.375$$

4.5. Some Applications

4.5.1. The Gambler's Ruin Problem

Consider a gambler who at each play of the game has probability $p$ of winning one unit and probability $q = 1 - p$ of losing one unit. Assuming that successive plays of the game are independent, what is the probability that, starting with $i$ units, the gambler's fortune will reach $N$ before reaching 0?

If we let $X_n$ denote the player's fortune at time $n$, then the process $\{X_n, n = 0, 1, 2, \ldots\}$ is a Markov chain with transition probabilities

$$P_{00} = P_{NN} = 1,$$
$$P_{i,i+1} = p = 1 - P_{i,i-1}, \quad i = 1, 2, \ldots, N - 1$$

This Markov chain has three classes, namely, $\{0\}$, $\{1, 2, \ldots, N - 1\}$, and $\{N\}$; the first and third class being recurrent and the second transient. Since each transient state is visited only finitely often, it follows that, after some finite amount of time, the gambler will either attain his goal of $N$ or go broke.

Let $P_i$, $i = 0, 1, \ldots, N$, denote the probability that, starting with $i$, the gambler's fortune will eventually reach $N$. By conditioning on the outcome of the initial play of the game we obtain

$$P_i = pP_{i+1} + qP_{i-1}, \quad i = 1, 2, \ldots, N - 1$$

or equivalently, since $p + q = 1$,

$$pP_i + qP_i = pP_{i+1} + qP_{i-1}$$


or

$$P_{i+1} - P_i = \frac{q}{p}(P_i - P_{i-1}), \quad i = 1, 2, \ldots, N - 1$$

Hence, since $P_0 = 0$, we obtain from the preceding line that

$$P_2 - P_1 = \frac{q}{p}(P_1 - P_0) = \frac{q}{p}P_1,$$
$$P_3 - P_2 = \frac{q}{p}(P_2 - P_1) = \left(\frac{q}{p}\right)^2 P_1,$$
$$\vdots$$
$$P_i - P_{i-1} = \frac{q}{p}(P_{i-1} - P_{i-2}) = \left(\frac{q}{p}\right)^{i-1} P_1,$$
$$\vdots$$
$$P_N - P_{N-1} = \frac{q}{p}(P_{N-1} - P_{N-2}) = \left(\frac{q}{p}\right)^{N-1} P_1$$

Adding the first $i - 1$ of these equations yields

$$P_i - P_1 = P_1\left[\frac{q}{p} + \left(\frac{q}{p}\right)^2 + \cdots + \left(\frac{q}{p}\right)^{i-1}\right]$$

or

$$P_i = \begin{cases} \dfrac{1 - (q/p)^i}{1 - (q/p)}P_1, & \text{if } \dfrac{q}{p} \neq 1 \\[2ex] iP_1, & \text{if } \dfrac{q}{p} = 1 \end{cases}$$

Now, using the fact that $P_N = 1$, we obtain that

$$P_1 = \begin{cases} \dfrac{1 - (q/p)}{1 - (q/p)^N}, & \text{if } p \neq \frac{1}{2} \\[2ex] \dfrac{1}{N}, & \text{if } p = \frac{1}{2} \end{cases}$$


and hence

$$P_i = \begin{cases} \dfrac{1 - (q/p)^i}{1 - (q/p)^N}, & \text{if } p \neq \frac{1}{2} \\[2ex] \dfrac{i}{N}, & \text{if } p = \frac{1}{2} \end{cases} \tag{4.14}$$

Note that, as $N \to \infty$,

$$P_i \to \begin{cases} 1 - \left(\dfrac{q}{p}\right)^i, & \text{if } p > \frac{1}{2} \\[2ex] 0, & \text{if } p \leq \frac{1}{2} \end{cases}$$

Thus, if $p > \frac{1}{2}$, there is a positive probability that the gambler's fortune will increase indefinitely; while if $p \leq \frac{1}{2}$, the gambler will, with probability 1, go broke against an infinitely rich adversary.

Example 4.24 Suppose Max and Patty decide to flip pennies; the one coming closest to the wall wins. Patty, being the better player, has a probability 0.6 of winning on each flip. (a) If Patty starts with five pennies and Max with ten, what is the probability that Patty will wipe Max out? (b) What if Patty starts with 10 and Max with 20?

Solution: (a) The desired probability is obtained from Equation (4.14) by letting $i = 5$, $N = 15$, and $p = 0.6$. Hence, the desired probability is

$$\frac{1 - \left(\frac{2}{3}\right)^5}{1 - \left(\frac{2}{3}\right)^{15}} \approx 0.87$$

(b) The desired probability is

$$\frac{1 - \left(\frac{2}{3}\right)^{10}}{1 - \left(\frac{2}{3}\right)^{30}} \approx 0.98$$

For an application of the gambler's ruin problem to drug testing, suppose that two new drugs have been developed for treating a certain disease. Drug $i$ has a cure rate $P_i$, $i = 1, 2$, in the sense that each patient treated with drug $i$ will be cured with probability $P_i$. These cure rates, however, are not known, and suppose we are interested in a method for deciding whether $P_1 > P_2$ or $P_2 > P_1$. To decide upon one of these alternatives, consider the following test: Pairs of patients are treated sequentially with one member of the pair receiving drug 1 and the other


drug 2. The results for each pair are determined, and the testing stops when the cumulative number of cures using one of the drugs exceeds the cumulative number of cures when using the other by some fixed predetermined number. More formally, let

$$X_j = \begin{cases} 1, & \text{if the patient in the } j\text{th pair to receive drug number 1 is cured} \\ 0, & \text{otherwise} \end{cases}$$

$$Y_j = \begin{cases} 1, & \text{if the patient in the } j\text{th pair to receive drug number 2 is cured} \\ 0, & \text{otherwise} \end{cases}$$

For a predetermined positive integer $M$ the test stops after pair $N$ where $N$ is the first value of $n$ such that either

$$X_1 + \cdots + X_n - (Y_1 + \cdots + Y_n) = M$$

or

$$X_1 + \cdots + X_n - (Y_1 + \cdots + Y_n) = -M$$

In the former case we then assert that $P_1 > P_2$, and in the latter that $P_2 > P_1$.

In order to help ascertain whether the preceding is a good test, one thing we would like to know is the probability of it leading to an incorrect decision. That is, for given $P_1$ and $P_2$ where $P_1 > P_2$, what is the probability that the test will incorrectly assert that $P_2 > P_1$? To determine this probability, note that after each pair is checked the cumulative difference of cures using drug 1 versus drug 2 will either go up by 1 with probability $P_1(1 - P_2)$ (since this is the probability that drug 1 leads to a cure and drug 2 does not), or go down by 1 with probability $(1 - P_1)P_2$, or remain the same with probability $P_1 P_2 + (1 - P_1)(1 - P_2)$. Hence, if we only consider those pairs in which the cumulative difference changes, then the difference will go up 1 with probability

$$p = P\{\text{up } 1 \mid \text{up } 1 \text{ or down } 1\} = \frac{P_1(1 - P_2)}{P_1(1 - P_2) + (1 - P_1)P_2}$$

and down 1 with probability

$$q = 1 - p = \frac{P_2(1 - P_1)}{P_1(1 - P_2) + (1 - P_1)P_2}$$

Hence, the probability that the test will assert that $P_2 > P_1$ is equal to the probability that a gambler who wins each (one unit) bet with probability $p$ will go down


$M$ before going up $M$. But Equation (4.14) with $i = M$, $N = 2M$, shows that this probability is given by

$$P\{\text{test asserts that } P_2 > P_1\} = 1 - \frac{1 - (q/p)^M}{1 - (q/p)^{2M}} = \frac{1}{1 + (p/q)^M}$$

Thus, for instance, if $P_1 = 0.6$ and $P_2 = 0.4$ then the probability of an incorrect decision is 0.017 when $M = 5$ and reduces to 0.0003 when $M = 10$.

4.5.2. A Model for Algorithmic Efficiency

The following optimization problem is called a linear program:

$$\text{minimize } \mathbf{c}\mathbf{x},$$
$$\text{subject to } \mathbf{A}\mathbf{x} = \mathbf{b},\ \mathbf{x} \geq \mathbf{0}$$

where $\mathbf{A}$ is an $m \times n$ matrix of fixed constants; $\mathbf{c} = (c_1, \ldots, c_n)$ and $\mathbf{b} = (b_1, \ldots, b_m)$ are vectors of fixed constants; and $\mathbf{x} = (x_1, \ldots, x_n)$ is the $n$-vector of nonnegative values that is to be chosen to minimize $\mathbf{c}\mathbf{x} \equiv \sum_{i=1}^{n} c_i x_i$. Supposing that $n > m$, it can be shown that the optimal $\mathbf{x}$ can always be chosen to have at least $n - m$ components equal to 0 (that is, it can always be taken to be one of the so-called extreme points of the feasibility region).

The simplex algorithm solves this linear program by moving from an extreme point of the feasibility region to a better (in terms of the objective function $\mathbf{c}\mathbf{x}$) extreme point (via the pivot operation) until the optimal is reached. Because there can be as many as $N \equiv \binom{n}{m}$ such extreme points, it would seem that this method might take many iterations, but, surprisingly to some, this does not appear to be the case in practice.

To obtain a feel for whether or not the preceding statement is surprising, let us consider a simple probabilistic (Markov chain) model as to how the algorithm moves along the extreme points. Specifically, we will suppose that if at any time the algorithm is at the $j$th best extreme point then after the next pivot the resulting extreme point is equally likely to be any of the $j - 1$ best. Under this assumption, we show that the time to get from the $N$th best to the best extreme point has approximately, for large $N$, a normal distribution with mean and variance equal to the logarithm (base $e$) of $N$.
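The pivoting model just described is easy to simulate (a sketch, not from the text): from the $j$th best point, jump to a uniformly random one of the $j - 1$ better points, and count the steps until the best point is reached. The exact mean is the harmonic sum $\sum_{j=1}^{N-1} 1/j \approx \log N$, and the variance is also close to $\log N$.

```python
import math
import random

def simulate_T(N, rng):
    """One realization of T_N: pivots needed to go from the N-th best point to the best."""
    steps, i = 0, N
    while i > 1:
        i = rng.randint(1, i - 1)       # uniform over the i - 1 better points
        steps += 1
    return steps

rng = random.Random(42)
N, trials = 10_000, 4000
samples = [simulate_T(N, rng) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials

H = sum(1 / j for j in range(1, N))                  # exact mean, ~ log N (~9.8 here)
V = sum((1 / j) * (1 - 1 / j) for j in range(1, N))  # exact variance, also ~ log N
assert abs(mean - H) < 0.25
assert abs(var - V) < 1.0
```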


Consider a Markov chain for which $P_{11} = 1$ and

$$P_{ij} = \frac{1}{i - 1}, \quad j = 1, \ldots, i - 1,\ i > 1$$

and let $T_i$ denote the number of transitions needed to go from state $i$ to state 1. A recursive formula for $E[T_i]$ can be obtained by conditioning on the initial transition:

$$E[T_i] = 1 + \frac{1}{i - 1}\sum_{j=1}^{i-1} E[T_j]$$

Starting with $E[T_1] = 0$, we successively see that

$$E[T_2] = 1,$$
$$E[T_3] = 1 + \tfrac{1}{2},$$
$$E[T_4] = 1 + \tfrac{1}{3}\left(1 + 1 + \tfrac{1}{2}\right) = 1 + \tfrac{1}{2} + \tfrac{1}{3}$$

and it is not difficult to guess and then prove inductively that

$$E[T_i] = \sum_{j=1}^{i-1} 1/j$$

However, to obtain a more complete description of $T_N$, we will use the representation

$$T_N = \sum_{j=1}^{N-1} I_j$$

where

$$I_j = \begin{cases} 1, & \text{if the process ever enters } j \\ 0, & \text{otherwise} \end{cases}$$

The importance of the preceding representation stems from the following:

Proposition 4.4 $I_1, \ldots, I_{N-1}$ are independent and

$$P\{I_j = 1\} = 1/j, \quad 1 \leq j \leq N - 1$$

Proof Given $I_{j+1}, \ldots, I_N$, let $n = \min\{i:\ i > j,\ I_i = 1\}$ denote the lowest numbered state, greater than $j$, that is entered. Thus we know that the process enters


state $n$ and the next state entered is one of the states $1, 2, \ldots, j$. Hence, as the next state from state $n$ is equally likely to be any of the lower numbered states $1, 2, \ldots, n - 1$ we see that

$$P\{I_j = 1 \mid I_{j+1}, \ldots, I_N\} = \frac{1/(n - 1)}{j/(n - 1)} = 1/j$$

Hence, $P\{I_j = 1\} = 1/j$, and independence follows since the preceding conditional probability does not depend on $I_{j+1}, \ldots, I_N$.

Corollary 4.5
(i) $E[T_N] = \sum_{j=1}^{N-1} 1/j$.
(ii) $\text{Var}(T_N) = \sum_{j=1}^{N-1}(1/j)(1 - 1/j)$.
(iii) For $N$ large, $T_N$ has approximately a normal distribution with mean $\log N$ and variance $\log N$.

Proof Parts (i) and (ii) follow from Proposition 4.4 and the representation $T_N = \sum_{j=1}^{N-1} I_j$. Part (iii) follows from the central limit theorem since

$$\int_1^N \frac{dx}{x} < \sum_{j=1}^{N-1} 1/j < 1 + \int_1^{N-1} \frac{dx}{x}$$

or

$$\log N < \sum_{j=1}^{N-1} 1/j < 1 + \log(N - 1)$$

and so

$$\log N \approx \sum_{j=1}^{N-1} 1/j$$

Returning to the simplex algorithm, if we assume that $n$, $m$, and $n - m$ are all large, we have by Stirling's approximation that

$$N = \binom{n}{m} \sim \frac{n^{n+1/2}}{(n - m)^{n-m+1/2}\,m^{m+1/2}\,\sqrt{2\pi}}$$

and so, letting $c = n/m$,

$$\log N \sim \left(mc + \tfrac{1}{2}\right)\log(mc) - \left(m(c - 1) + \tfrac{1}{2}\right)\log(m(c - 1)) - \left(m + \tfrac{1}{2}\right)\log m - \tfrac{1}{2}\log(2\pi)$$


or

$$\log N \sim m\left[c\log\frac{c}{c - 1} + \log(c - 1)\right]$$

Now, as $\lim_{x \to \infty} x\log[x/(x - 1)] = 1$, it follows that, when $c$ is large,

$$\log N \sim m[1 + \log(c - 1)]$$

Thus, for instance, if $n = 8000$, $m = 1000$, then the number of necessary transitions is approximately normally distributed with mean and variance equal to $1000(1 + \log 7) \approx 3000$. Hence, the number of necessary transitions would be roughly between

$$3000 \pm 2\sqrt{3000} \quad \text{or roughly} \quad 3000 \pm 110$$

95 percent of the time.

4.5.3. Using a Random Walk to Analyze a Probabilistic Algorithm for the Satisfiability Problem

Consider a Markov chain with states $0, 1, \ldots, n$ having

$$P_{0,1} = 1, \quad P_{i,i+1} = p, \quad P_{i,i-1} = q = 1 - p, \quad 1 \leq i < n$$

and suppose that we are interested in studying the time that it takes for the chain to go from state 0 to state $n$. One approach to obtaining the mean time to reach state $n$ would be to let $m_i$ denote the mean time to go from state $i$ to state $n$, $i = 0, \ldots, n - 1$. If we then condition on the initial transition, we obtain the following set of equations:

$$m_0 = 1 + m_1,$$
$$m_i = E[\text{time to reach } n \mid \text{next state is } i + 1]p + E[\text{time to reach } n \mid \text{next state is } i - 1]q$$
$$= (1 + m_{i+1})p + (1 + m_{i-1})q$$
$$= 1 + pm_{i+1} + qm_{i-1}, \quad i = 1, \ldots, n - 1$$

Whereas the preceding equations can be solved for $m_i$, $i = 0, \ldots, n - 1$, we do not pursue their solution; we instead make use of the special structure of the Markov chain to obtain a simpler set of equations. To start, let $N_i$ denote the number of additional transitions that it takes the chain when it first enters state $i$ until it enters


state $i + 1$. By the Markovian property, it follows that these random variables $N_i$, $i = 0, \ldots, n - 1$ are independent. Also, we can express $N_{0,n}$, the number of transitions that it takes the chain to go from state 0 to state $n$, as

$$N_{0,n} = \sum_{i=0}^{n-1} N_i \tag{4.15}$$

Letting $\mu_i = E[N_i]$ we obtain, upon conditioning on the next transition after the chain enters state $i$, that for $i = 1, \ldots, n - 1$

$$\mu_i = 1 + E[\text{number of additional transitions to reach } i + 1 \mid \text{chain went to } i - 1]q$$

Now, if the chain next enters state $i - 1$, then in order for it to reach $i + 1$ it must first return to state $i$ and must then go from state $i$ to state $i + 1$. Hence, we have from the preceding that

$$\mu_i = 1 + E[N^*_{i-1} + N^*_i]q$$

where $N^*_{i-1}$ and $N^*_i$ are, respectively, the additional number of transitions to return to state $i$ from $i - 1$ and the number to then go from $i$ to $i + 1$. Now, it follows from the Markovian property that these random variables have, respectively, the same distributions as $N_{i-1}$ and $N_i$. In addition, they are independent (although we will only use this when we compute the variance of $N_{0,n}$). Hence, we see that

$$\mu_i = 1 + q(\mu_{i-1} + \mu_i)$$

or

$$\mu_i = \frac{1}{p} + \frac{q}{p}\mu_{i-1}, \quad i = 1, \ldots, n - 1$$

Starting with $\mu_0 = 1$, and letting $\alpha = q/p$, we obtain from the preceding recursion that

$$\mu_1 = 1/p + \alpha,$$
$$\mu_2 = 1/p + \alpha(1/p + \alpha) = 1/p + \alpha/p + \alpha^2,$$
$$\mu_3 = 1/p + \alpha(1/p + \alpha/p + \alpha^2) = 1/p + \alpha/p + \alpha^2/p + \alpha^3$$

In general, we see that

$$\mu_i = \frac{1}{p}\sum_{j=0}^{i-1}\alpha^j + \alpha^i, \quad i = 1, \ldots, n - 1 \tag{4.16}$$
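A quick numerical check (not from the text) that the recursion $\mu_i = 1/p + \alpha\mu_{i-1}$ agrees with the closed form (4.16), together with the simple arithmetic consequence that when $p = 1/2$, $\mu_i = 2i + 1$ and so, by (4.15), $E[N_{0,n}] = \sum_i \mu_i = n^2$:

```python
def mu_recursive(n, p):
    """mu_0, ..., mu_{n-1} from the recursion mu_i = 1/p + (q/p) mu_{i-1}."""
    alpha = (1 - p) / p
    mu = [1.0]                              # mu_0 = 1
    for _ in range(1, n):
        mu.append(1 / p + alpha * mu[-1])
    return mu

def mu_closed(i, p):
    """Closed form (4.16): mu_i = (1/p) sum_{j<i} alpha^j + alpha^i."""
    alpha = (1 - p) / p
    return sum(alpha ** j for j in range(i)) / p + alpha ** i

p, n = 0.4, 12
mus = mu_recursive(n, p)
assert all(abs(mus[i] - mu_closed(i, p)) < 1e-9 for i in range(1, n))

mus_half = mu_recursive(10, 0.5)            # p = 1/2 gives mu_i = 2i + 1
assert mus_half == [2 * i + 1 for i in range(10)]
assert sum(mus_half) == 100                 # E[N_{0,10}] = 10^2
```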


Using Equation (4.15), we now get

$$E[N_{0,n}] = 1 + \frac{1}{p}\sum_{i=1}^{n-1}\sum_{j=0}^{i-1}\alpha^j + \sum_{i=1}^{n-1}\alpha^i$$

When $p = \frac{1}{2}$, and so $\alpha = 1$, we see from the preceding that

$$E[N_{0,n}] = 1 + (n - 1)n + n - 1 = n^2$$

When $p \neq \frac{1}{2}$, we obtain that

$$E[N_{0,n}] = 1 + \frac{1}{p(1 - \alpha)}\sum_{i=1}^{n-1}(1 - \alpha^i) + \frac{\alpha - \alpha^n}{1 - \alpha}$$
$$= 1 + \frac{1 + \alpha}{1 - \alpha}\left[n - 1 - \frac{\alpha - \alpha^n}{1 - \alpha}\right] + \frac{\alpha - \alpha^n}{1 - \alpha}$$
$$= 1 + \frac{2\alpha^{n+1} - (n + 1)\alpha^2 + n - 1}{(1 - \alpha)^2}$$

where the second equality used the fact that $p = 1/(1 + \alpha)$. Therefore, we see that when $\alpha > 1$, or equivalently when $p < \frac{1}{2}$, the expected number of transitions to reach $n$ is an exponentially increasing function of $n$. On the other hand, when $p = \frac{1}{2}$, $E[N_{0,n}] = n^2$, and when $p > \frac{1}{2}$, $E[N_{0,n}]$ is, for large $n$, essentially linear in $n$.

Let us now compute $\text{Var}(N_{0,n})$. To do so, we will again make use of the representation given by Equation (4.15). Letting $v_i = \text{Var}(N_i)$, we start by determining the $v_i$ recursively by using the conditional variance formula. Let $S_i = 1$ if the first transition out of state $i$ is into state $i + 1$, and let $S_i = -1$ if the transition is into state $i - 1$, $i = 1, \ldots, n - 1$. Then,

given that $S_i = 1$: $N_i = 1$
given that $S_i = -1$: $N_i = 1 + N^*_{i-1} + N^*_i$

Hence,

$$E[N_i \mid S_i = 1] = 1,$$
$$E[N_i \mid S_i = -1] = 1 + \mu_{i-1} + \mu_i$$


implying that

$$\text{Var}(E[N_i \mid S_i]) = \text{Var}(E[N_i \mid S_i] - 1)$$
$$= (\mu_{i-1} + \mu_i)^2 q - (\mu_{i-1} + \mu_i)^2 q^2$$
$$= qp(\mu_{i-1} + \mu_i)^2$$

Also, since $N^*_{i-1}$ and $N^*_i$, the numbers of transitions to return from state $i - 1$ to $i$ and to then go from state $i$ to state $i + 1$, are, by the Markovian property, independent random variables having the same distributions as $N_{i-1}$ and $N_i$, respectively, we see that

$$\text{Var}(N_i \mid S_i = 1) = 0,$$
$$\text{Var}(N_i \mid S_i = -1) = v_{i-1} + v_i$$

Hence,

$$E[\text{Var}(N_i \mid S_i)] = q(v_{i-1} + v_i)$$

From the conditional variance formula, we thus obtain that

$$v_i = pq(\mu_{i-1} + \mu_i)^2 + q(v_{i-1} + v_i)$$

or, equivalently,

$$v_i = q(\mu_{i-1} + \mu_i)^2 + \alpha v_{i-1}, \quad i = 1, \ldots, n - 1$$

Starting with $v_0 = 0$, we obtain from the preceding recursion that

$$v_1 = q(\mu_0 + \mu_1)^2,$$
$$v_2 = q(\mu_1 + \mu_2)^2 + \alpha q(\mu_0 + \mu_1)^2,$$
$$v_3 = q(\mu_2 + \mu_3)^2 + \alpha q(\mu_1 + \mu_2)^2 + \alpha^2 q(\mu_0 + \mu_1)^2$$

In general, we have for $i > 0$,

$$v_i = q\sum_{j=1}^{i}\alpha^{i-j}(\mu_{j-1} + \mu_j)^2 \tag{4.17}$$

Therefore, we see that

$$\text{Var}(N_{0,n}) = \sum_{i=0}^{n-1} v_i = q\sum_{i=1}^{n-1}\sum_{j=1}^{i}\alpha^{i-j}(\mu_{j-1} + \mu_j)^2$$

where $\mu_j$ is given by Equation (4.16).
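The recursion and the closed form (4.17) can be cross-checked numerically (a sketch, not from the text), along with the $p = 1/2$ special case $v_i = 8\sum_{j \leq i} j^2$ that is used next:

```python
def moments(n, p):
    """mu_i and v_i, i = 0..n-1, from the recursions in the text."""
    q = 1 - p
    alpha = q / p
    mu, v = [1.0], [0.0]                    # mu_0 = 1, v_0 = 0
    for i in range(1, n):
        mu.append(1 / p + alpha * mu[-1])
        v.append(q * (mu[i - 1] + mu[i]) ** 2 + alpha * v[-1])
    return mu, v

p, n = 0.45, 10
q, alpha = 1 - p, (1 - p) / p
mu, v = moments(n, p)
closed = [q * sum(alpha ** (i - j) * (mu[j - 1] + mu[j]) ** 2
                  for j in range(1, i + 1)) for i in range(n)]   # Equation (4.17)
assert all(abs(v[i] - closed[i]) < 1e-9 for i in range(1, n))

mu_h, v_h = moments(8, 0.5)                 # p = 1/2: mu_i = 2i+1, v_i = 8 sum j^2
assert all(abs(v_h[i] - 8 * sum(j * j for j in range(1, i + 1))) < 1e-9
           for i in range(8))
```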


We see from Equations (4.16) and (4.17) that when $p \geq \frac{1}{2}$, and so $\alpha \leq 1$, $\mu_i$ and $v_i$, the mean and variance of the number of transitions to go from state $i$ to $i + 1$, do not increase too rapidly in $i$. For instance, when $p = \frac{1}{2}$ it follows from Equations (4.16) and (4.17) that

$$\mu_i = 2i + 1$$

and

$$v_i = \frac{1}{2}\sum_{j=1}^{i}(4j)^2 = 8\sum_{j=1}^{i} j^2$$

Hence, since $N_{0,n}$ is the sum of independent random variables, which are of roughly similar magnitudes when $p \geq \frac{1}{2}$, it follows in this case from the central limit theorem that $N_{0,n}$ is, for large $n$, approximately normally distributed. In particular, when $p = \frac{1}{2}$, $N_{0,n}$ is approximately normal with mean $n^2$ and variance

$$\text{Var}(N_{0,n}) = 8\sum_{i=1}^{n-1}\sum_{j=1}^{i} j^2 = 8\sum_{j=1}^{n-1} j^2\sum_{i=j}^{n-1} 1 = 8\sum_{j=1}^{n-1}(n - j)j^2 \approx 8\int_1^{n-1}(n - x)x^2\,dx \approx \frac{2}{3}n^4$$

Example 4.25 (The Satisfiability Problem) A Boolean variable $x$ is one that takes on either of two values: TRUE or FALSE. If $x_i$, $i \geq 1$ are Boolean variables, then a Boolean clause of the form

$$x_1 + \bar{x}_2 + x_3$$

is TRUE if $x_1$ is TRUE, or if $x_2$ is FALSE, or if $x_3$ is TRUE. That is, the symbol "+" means "or" and $\bar{x}$ is TRUE if $x$ is FALSE and vice versa. A Boolean formula is a combination of clauses such as

$$(x_1 + \bar{x}_2) * (x_1 + x_3) * (x_2 + \bar{x}_3) * (\bar{x}_1 + \bar{x}_2) * (x_1 + x_2)$$

In the preceding, the terms between the parentheses represent clauses, and the formula is TRUE if all the clauses are TRUE, and is FALSE otherwise. For a


given Boolean formula, the satisfiability problem is either to determine values for the variables that result in the formula being TRUE, or to determine that the formula is never true. For instance, one set of values that makes the preceding formula TRUE is to set $x_1 = \text{TRUE}$, $x_2 = \text{FALSE}$, and $x_3 = \text{FALSE}$.

Consider a formula of the $n$ Boolean variables $x_1, \ldots, x_n$ and suppose that each clause in this formula refers to exactly two variables. We will now present a probabilistic algorithm that will either find values that satisfy the formula or determine to a high probability that it is not possible to satisfy it. To begin, start with an arbitrary setting of values. Then, at each stage choose a clause whose value is FALSE, and randomly choose one of the Boolean variables in that clause and change its value. That is, if the variable has value TRUE then change its value to FALSE, and vice versa. If this new setting makes the formula TRUE then stop, otherwise continue in the same fashion. If you have not stopped after $n^2(1 + 4\sqrt{2/3})$ repetitions, then declare that the formula cannot be satisfied. We will now argue that if there is a satisfiable assignment then this algorithm will find such an assignment with a probability very close to 1.

Let us start by assuming that there is a satisfiable assignment of truth values and let $\mathscr{A}$ be such an assignment. At each stage of the algorithm there is a certain assignment of values. Let $Y_j$ denote the number of the $n$ variables whose values at the $j$th stage of the algorithm agree with their values in $\mathscr{A}$. For instance, suppose that $n = 3$ and $\mathscr{A}$ consists of the settings $x_1 = x_2 = x_3 = \text{TRUE}$. If the assignment of values at the $j$th step of the algorithm is $x_1 = \text{TRUE}$, $x_2 = x_3 = \text{FALSE}$, then $Y_j = 1$. Now, at each stage, the algorithm considers a clause that is not satisfied, thus implying that at least one of the values of the two variables in this clause does not agree with its value in $\mathscr{A}$.
As a result, when we randomly choose one of the variables in this clause then there is a probability of at least $\frac{1}{2}$ that $Y_{j+1} = Y_j + 1$ and at most $\frac{1}{2}$ that $Y_{j+1} = Y_j - 1$. That is, independent of what has previously transpired in the algorithm, at each stage the number of settings in agreement with those in $\mathscr{A}$ will either increase or decrease by 1 and the probability of an increase is at least $\frac{1}{2}$ (it is 1 if both variables have values different from their values in $\mathscr{A}$). Thus, even though the process $Y_j$, $j \geq 0$ is not itself a Markov chain (why not?) it is intuitively clear that both the expectation and the variance of the number of stages of the algorithm needed to obtain the values of $\mathscr{A}$ will be less than or equal to the expectation and variance of the number of transitions to go from state 0 to state $n$ in the random walk Markov chain analyzed at the start of this section. Hence, if the algorithm has not yet terminated because it found a set of satisfiable values different from that of $\mathscr{A}$, it will do so within an expected time of at most $n^2$ and with a standard deviation of at most $n^2\sqrt{2/3}$. In addition, since the time for the Markov chain to go from 0 to $n$ is approximately normal when $n$ is large, we can be quite certain that a satisfiable assignment will be reached by $n^2 + 4\left(n^2\sqrt{2/3}\right)$ stages, and thus if one has not been found by this number of stages of the algorithm we can be quite certain that there is no satisfiable assignment.
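The algorithm described above can be sketched directly (an illustration, not the text's own implementation; clauses are encoded as pairs of signed integers, with $+i$ for $x_i$ and $-i$ for $\bar{x}_i$):

```python
import random

def random_walk_2sat(clauses, n, seed=0):
    """Try to satisfy a 2-SAT formula by the random-flip algorithm above."""
    rng = random.Random(seed)
    x = {i: False for i in range(1, n + 1)}           # arbitrary initial setting
    limit = int(n * n * (1 + 4 * (2 / 3) ** 0.5))     # n^2 (1 + 4 sqrt(2/3)) stages
    for _ in range(limit):
        false_clauses = [c for c in clauses
                         if not any(x[abs(l)] == (l > 0) for l in c)]
        if not false_clauses:
            return x                                  # satisfying assignment found
        lit = rng.choice(rng.choice(false_clauses))   # random variable of a false clause
        x[abs(lit)] = not x[abs(lit)]                 # flip its value
    return None                                       # declare: probably unsatisfiable

# the formula (x1 + ~x2)(x1 + x3)(x2 + ~x3)(~x1 + ~x2)(x1 + x2) from the text
clauses = [(1, -2), (1, 3), (2, -3), (-1, -2), (1, 2)]
sol = None
for seed in range(10):                                # retry a few seeds
    sol = random_walk_2sat(clauses, 3, seed)
    if sol is not None:
        break
assert sol is not None
assert all(any(sol[abs(l)] == (l > 0) for l in c) for c in clauses)
```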


Our analysis also makes it clear why we assumed that there are only two variables in each clause. For if there were $k$, $k > 2$, variables in a clause then as any clause that is not presently satisfied may have only one incorrect setting, a randomly chosen variable whose value is changed might only increase the number of values in agreement with $\mathscr{A}$ with probability $1/k$ and so we could only conclude from our prior Markov chain results that the mean time to obtain the values in $\mathscr{A}$ is an exponential function of $n$, which is not an efficient algorithm when $n$ is large.

4.6. Mean Time Spent in Transient States

Consider now a finite state Markov chain and suppose that the states are numbered so that $T = \{1, 2, \ldots, t\}$ denotes the set of transient states. Let

$$\mathbf{P}_T = \begin{pmatrix} P_{11} & P_{12} & \cdots & P_{1t} \\ \vdots & \vdots & & \vdots \\ P_{t1} & P_{t2} & \cdots & P_{tt} \end{pmatrix}$$

and note that since $\mathbf{P}_T$ specifies only the transition probabilities from transient states into transient states, some of its row sums are less than 1 (otherwise, $T$ would be a closed class of states).

For transient states $i$ and $j$, let $s_{ij}$ denote the expected number of time periods that the Markov chain is in state $j$, given that it starts in state $i$. Let $\delta_{i,j} = 1$ when $i = j$ and let it be 0 otherwise. Condition on the initial transition to obtain

$$s_{ij} = \delta_{i,j} + \sum_k P_{ik}s_{kj} = \delta_{i,j} + \sum_{k=1}^{t} P_{ik}s_{kj} \tag{4.18}$$

where the final equality follows since it is impossible to go from a recurrent to a transient state, implying that $s_{kj} = 0$ when $k$ is a recurrent state.

Let $\mathbf{S}$ denote the matrix of values $s_{ij}$, $i, j = 1, \ldots, t$. That is,

$$\mathbf{S} = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1t} \\ \vdots & \vdots & & \vdots \\ s_{t1} & s_{t2} & \cdots & s_{tt} \end{pmatrix}$$

In matrix notation, Equation (4.18) can be written as

$$\mathbf{S} = \mathbf{I} + \mathbf{P}_T\mathbf{S}$$


where $\mathbf{I}$ is the identity matrix of size $t$. Because the preceding equation is equivalent to

$$(\mathbf{I} - \mathbf{P}_T)\mathbf{S} = \mathbf{I}$$

we obtain, upon multiplying both sides by $(\mathbf{I} - \mathbf{P}_T)^{-1}$,

$$\mathbf{S} = (\mathbf{I} - \mathbf{P}_T)^{-1}$$

That is, the quantities $s_{ij}$, $i \in T$, $j \in T$, can be obtained by inverting the matrix $\mathbf{I} - \mathbf{P}_T$. (The existence of the inverse is easily established.)

Example 4.26 Consider the gambler's ruin problem with $p = 0.4$ and $N = 7$. Starting with 3 units, determine

(a) the expected amount of time the gambler has 5 units,
(b) the expected amount of time the gambler has 2 units.

Solution: The matrix $\mathbf{P}_T$, which specifies $P_{ij}$, $i, j \in \{1, 2, 3, 4, 5, 6\}$, is as follows:

$$\mathbf{P}_T = \begin{pmatrix} 0 & 0.4 & 0 & 0 & 0 & 0 \\ 0.6 & 0 & 0.4 & 0 & 0 & 0 \\ 0 & 0.6 & 0 & 0.4 & 0 & 0 \\ 0 & 0 & 0.6 & 0 & 0.4 & 0 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.6 & 0 \end{pmatrix}$$

Inverting $\mathbf{I} - \mathbf{P}_T$ gives

$$\mathbf{S} = (\mathbf{I} - \mathbf{P}_T)^{-1} = \begin{pmatrix} 1.6149 & 1.0248 & 0.6314 & 0.3691 & 0.1943 & 0.0777 \\ 1.5372 & 2.5619 & 1.5784 & 0.9228 & 0.4857 & 0.1943 \\ 1.4206 & 2.3677 & 2.9990 & 1.7533 & 0.9228 & 0.3691 \\ 1.2458 & 2.0763 & 2.6299 & 2.9990 & 1.5784 & 0.6314 \\ 0.9835 & 1.6391 & 2.0763 & 2.3677 & 2.5619 & 1.0248 \\ 0.5901 & 0.9835 & 1.2458 & 1.4206 & 1.5372 & 1.6149 \end{pmatrix}$$

Hence,

$$s_{3,5} = 0.9228, \quad s_{3,2} = 2.3677$$
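This example takes a few lines to reproduce (a sketch, not from the text); the last two lines anticipate the check of Example 4.27 below, where $f_{3,1} = s_{3,1}/s_{1,1}$ equals a gambler's ruin probability from Equation (4.14).

```python
import numpy as np

# Example 4.26: gambler's ruin with p = 0.4, N = 7; transient states 1..6.
p, q, t = 0.4, 0.6, 6
PT = np.zeros((t, t))
for i in range(t - 1):
    PT[i, i + 1] = p                        # win one unit
    PT[i + 1, i] = q                        # lose one unit
S = np.linalg.inv(np.eye(t) - PT)           # S = (I - P_T)^{-1}
assert abs(S[2, 4] - 0.9228) < 5e-4         # s_{3,5} (indices are 0-based)
assert abs(S[2, 1] - 2.3677) < 5e-4         # s_{3,2}

f31 = S[2, 0] / S[0, 0]                     # f_{3,1}, since delta_{3,1} = 0
ruin = 1 - (1 - (q / p) ** 2) / (1 - (q / p) ** 6)   # start at 2, broke before 6
assert abs(f31 - ruin) < 1e-9               # about 0.8797
```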


For $i \in T$, $j \in T$, the quantity $f_{ij}$, equal to the probability that the Markov chain ever makes a transition into state $j$ given that it starts in state $i$, is easily determined from $\mathbf{P}_T$. To determine the relationship, let us start by deriving an expression for $s_{ij}$ by conditioning on whether state $j$ is ever entered. This yields

$$s_{ij} = E[\text{time in } j \mid \text{start in } i,\text{ ever transit to } j]f_{ij} + E[\text{time in } j \mid \text{start in } i,\text{ never transit to } j](1 - f_{ij})$$
$$= (\delta_{i,j} + s_{jj})f_{ij} + \delta_{i,j}(1 - f_{ij})$$
$$= \delta_{i,j} + f_{ij}s_{jj}$$

since $s_{jj}$ is the expected number of additional time periods spent in state $j$ given that it is eventually entered from state $i$. Solving the preceding equation yields

$$f_{ij} = \frac{s_{ij} - \delta_{i,j}}{s_{jj}}$$

Example 4.27 In Example 4.26, what is the probability that the gambler ever has a fortune of 1?

Solution: Since $s_{3,1} = 1.4206$ and $s_{1,1} = 1.6149$, then

$$f_{3,1} = \frac{s_{3,1}}{s_{1,1}} = 0.8797$$

As a check, note that $f_{3,1}$ is just the probability that a gambler starting with 3 reaches 1 before 7. That is, it is the probability that the gambler's fortune will go down 2 before going up 4; which is the probability that a gambler starting with 2 will go broke before reaching 6. Therefore,

$$f_{3,1} = 1 - \frac{1 - (0.6/0.4)^2}{1 - (0.6/0.4)^6} = 0.8797$$

which checks with our earlier answer.

Suppose we are interested in the expected time until the Markov chain enters some set of states $A$, which need not be the set of recurrent states. We can reduce this back to the previous situation by making all states in $A$ absorbing states. That is, reset the transition probabilities of states in $A$ to satisfy

$$P_{i,i} = 1, \quad i \in A$$

This transforms the states of $A$ into recurrent states, and transforms any state outside of $A$ from which an eventual transition into $A$ is possible into a transient state. Thus, our previous approach can be used.


4.7. Branching Processes

In this section we consider a class of Markov chains, known as branching processes, which have a wide variety of applications in the biological, sociological, and engineering sciences.

Consider a population consisting of individuals able to produce offspring of the same kind. Suppose that each individual will, by the end of its lifetime, have produced j new offspring with probability P_j, j ≥ 0, independently of the numbers produced by other individuals. We suppose that P_j < 1 for all j ≥ 0. The number of individuals initially present, denoted by X_0, is called the size of the zeroth generation. All offspring of the zeroth generation constitute the first generation and their number is denoted by X_1. In general, let X_n denote the size of the nth generation. It follows that {X_n, n = 0, 1, ...} is a Markov chain having as its state space the set of nonnegative integers.

Note that state 0 is a recurrent state, since clearly P_{00} = 1. Also, if P_0 > 0, all other states are transient. This follows since P_{i0} = P_0^i, which implies that starting with i individuals there is a positive probability of at least P_0^i that no later generation will ever consist of i individuals. Moreover, since any finite set of transient states {1, 2, ..., n} will be visited only finitely often, this leads to the important conclusion that, if P_0 > 0, then the population will either die out or its size will converge to infinity.

Let

  μ = Σ_{j=0}^∞ j P_j

denote the mean number of offspring of a single individual, and let

  σ² = Σ_{j=0}^∞ (j − μ)² P_j

be the variance of the number of offspring produced by a single individual.

Let us suppose that X_0 = 1, that is, initially there is a single individual present. We calculate E[X_n] and Var(X_n) by first noting that we may write

  X_n = Σ_{i=1}^{X_{n−1}} Z_i

where Z_i represents the number of offspring of the ith individual of the (n − 1)st generation. By conditioning on X_{n−1}, we obtain


  E[X_n] = E[E[X_n | X_{n−1}]]
         = E[E[Σ_{i=1}^{X_{n−1}} Z_i | X_{n−1}]]
         = E[X_{n−1} μ]
         = μ E[X_{n−1}]

where we have used the fact that E[Z_i] = μ. Since E[X_0] = 1, the preceding yields

  E[X_1] = μ,
  E[X_2] = μ E[X_1] = μ²,
  ⋮
  E[X_n] = μ E[X_{n−1}] = μ^n

Similarly, Var(X_n) may be obtained by using the conditional variance formula

  Var(X_n) = E[Var(X_n | X_{n−1})] + Var(E[X_n | X_{n−1}])

Now, given X_{n−1}, X_n is just the sum of X_{n−1} independent random variables each having the distribution {P_j, j ≥ 0}. Hence,

  E[X_n | X_{n−1}] = X_{n−1} μ,  Var(X_n | X_{n−1}) = X_{n−1} σ²

The conditional variance formula now yields

  Var(X_n) = E[X_{n−1} σ²] + Var(X_{n−1} μ)
           = σ² μ^{n−1} + μ² Var(X_{n−1})
           = σ² μ^{n−1} + μ² (σ² μ^{n−2} + μ² Var(X_{n−2}))
           = σ² (μ^{n−1} + μ^n) + μ⁴ Var(X_{n−2})
           = σ² (μ^{n−1} + μ^n) + μ⁴ (σ² μ^{n−3} + μ² Var(X_{n−3}))
           = σ² (μ^{n−1} + μ^n + μ^{n+1}) + μ⁶ Var(X_{n−3})
           = ···
           = σ² (μ^{n−1} + μ^n + ··· + μ^{2n−2}) + μ^{2n} Var(X_0)
           = σ² (μ^{n−1} + μ^n + ··· + μ^{2n−2})


Therefore,

  Var(X_n) = σ² μ^{n−1} (1 − μ^n)/(1 − μ),  if μ ≠ 1    (4.19)
  Var(X_n) = n σ²,                           if μ = 1

Let π_0 denote the probability that the population will eventually die out (under the assumption that X_0 = 1). More formally,

  π_0 = lim_{n→∞} P{X_n = 0 | X_0 = 1}

The problem of determining the value of π_0 was first raised in connection with the extinction of family surnames by Galton in 1889.

We first note that π_0 = 1 if μ < 1. This follows since

  μ^n = E[X_n] = Σ_{j=1}^∞ j P{X_n = j} ≥ Σ_{j=1}^∞ 1 · P{X_n = j} = P{X_n ≥ 1}

Since μ^n → 0 when μ < 1, it follows that P{X_n ≥ 1} → 0, and hence π_0 = 1. (In fact, it can be shown that π_0 = 1 even when μ = 1.) When μ > 1, an equation determining π_0 may be derived by conditioning on the number of offspring of the initial individual: given that the first generation consists of j individuals, the population dies out if and only if each of the j families started by these individuals dies out, and, by independence, this has probability π_0^j. Hence,

  π_0 = Σ_{j=0}^∞ π_0^j P_j    (4.20)


In fact when μ > 1, it can be shown that π_0 is the smallest positive number satisfying Equation (4.20).

Example 4.28 If P_0 = 1/2, P_1 = 1/4, P_2 = 1/4, then determine π_0.

Solution: Since μ = 3/4 ≤ 1, it follows that π_0 = 1.

Example 4.29 If P_0 = 1/4, P_1 = 1/4, P_2 = 1/2, then determine π_0.

Solution: π_0 satisfies

  π_0 = 1/4 + (1/4)π_0 + (1/2)π_0²

or

  2π_0² − 3π_0 + 1 = 0

The smallest positive solution of this quadratic equation is π_0 = 1/2.

Example 4.30 In Examples 4.28 and 4.29, what is the probability that the population will die out if it initially consists of n individuals?

Solution: Since the population will die out if and only if the families of each of the members of the initial generation die out, the desired probability is π_0^n. For Example 4.28 this yields π_0^n = 1, and for Example 4.29, π_0^n = (1/2)^n.

4.8. Time Reversible Markov Chains

Consider a stationary ergodic Markov chain (that is, an ergodic Markov chain that has been in operation for a long time) having transition probabilities P_{ij} and stationary probabilities π_i, and suppose that starting at some time we trace the sequence of states going backward in time. That is, starting at time n, consider the sequence of states X_n, X_{n−1}, X_{n−2}, .... It turns out that this sequence of states is itself a Markov chain with transition probabilities Q_{ij} defined by

  Q_{ij} = P{X_m = j | X_{m+1} = i}
         = P{X_m = j, X_{m+1} = i} / P{X_{m+1} = i}
         = P{X_m = j} P{X_{m+1} = i | X_m = j} / P{X_{m+1} = i}
         = π_j P_{ji} / π_i


To prove that the reversed process is indeed a Markov chain, we must verify that

  P{X_m = j | X_{m+1} = i, X_{m+2}, X_{m+3}, ...} = P{X_m = j | X_{m+1} = i}

To see that this is so, suppose that the present time is m + 1. Now, since X_0, X_1, X_2, ... is a Markov chain, it follows that the conditional distribution of the future X_{m+2}, X_{m+3}, ... given the present state X_{m+1} is independent of the past state X_m. However, independence is a symmetric relationship (that is, if A is independent of B, then B is independent of A), and so this means that given X_{m+1}, X_m is independent of X_{m+2}, X_{m+3}, .... But this is exactly what we had to verify.

Thus, the reversed process is also a Markov chain with transition probabilities given by

  Q_{ij} = π_j P_{ji} / π_i

If Q_{ij} = P_{ij} for all i, j, then the Markov chain is said to be time reversible. The condition for time reversibility, namely, Q_{ij} = P_{ij}, can also be expressed as

  π_i P_{ij} = π_j P_{ji}  for all i, j    (4.21)

The condition in Equation (4.21) can be stated as follows: for all states i and j, the rate at which the process goes from i to j (namely, π_i P_{ij}) is equal to the rate at which it goes from j to i (namely, π_j P_{ji}). It is worth noting that this is an obvious necessary condition for time reversibility since a transition from i to j going backward in time is equivalent to a transition from j to i going forward in time; that is, if X_m = i and X_{m−1} = j, then a transition from i to j is observed if we are looking backward, and one from j to i if we are looking forward in time.
Thus, the rate at which the forward process makes a transition from j to i is always equal to the rate at which the reverse process makes a transition from i to j; if time reversible, this must equal the rate at which the forward process makes a transition from i to j.

If we can find nonnegative numbers, summing to one, that satisfy Equation (4.21), then it follows that the Markov chain is time reversible and the numbers represent the limiting probabilities. This is so since if

  x_i P_{ij} = x_j P_{ji}  for all i, j,   Σ_i x_i = 1    (4.22)

then summing over i yields

  Σ_i x_i P_{ij} = x_j Σ_i P_{ji} = x_j,   Σ_i x_i = 1

and, because the limiting probabilities π_i are the unique solution of the preceding, it follows that x_i = π_i for all i.
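The argument above says that a normalized solution of the detailed-balance equations (4.22) must coincide with the limiting probabilities. A minimal sketch, for a hypothetical three-state birth-death chain, verifies this numerically by solving (4.22) directly and comparing against the distribution obtained by powering up the transition matrix:

```python
# A 3-state birth-death chain (illustrative numbers): transitions only
# between neighboring states, so (4.22) can be solved recursively.
P = [[0.5, 0.5, 0.0],
     [0.3, 0.2, 0.5],
     [0.0, 0.4, 0.6]]

# detailed balance: x_{i+1} = x_i * P[i][i+1] / P[i+1][i]
x = [1.0]
for i in range(2):
    x.append(x[i] * P[i][i + 1] / P[i + 1][i])
s = sum(x)
pi = [v / s for v in x]   # normalized solution of (4.22)

# limiting probabilities by repeated multiplication of P
row = [1.0, 0.0, 0.0]
for _ in range(200):
    row = [sum(row[i] * P[i][j] for i in range(3)) for j in range(3)]
```

The two vectors agree, as the uniqueness argument predicts.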


Example 4.31 Consider a random walk with states 0, 1, ..., M and transition probabilities

  P_{i,i+1} = α_i = 1 − P_{i,i−1},  i = 1, ..., M − 1,
  P_{0,1} = α_0 = 1 − P_{0,0},
  P_{M,M} = α_M = 1 − P_{M,M−1}

Without the need for any computations, it is possible to argue that this Markov chain, which can only make transitions from a state to one of its two nearest neighbors, is time reversible. This follows by noting that the number of transitions from i to i + 1 must at all times be within 1 of the number from i + 1 to i. This is so because between any two transitions from i to i + 1 there must be one from i + 1 to i (and conversely) since the only way to reenter i from a higher state is via state i + 1. Hence, it follows that the rate of transitions from i to i + 1 equals the rate from i + 1 to i, and so the process is time reversible.

We can easily obtain the limiting probabilities by equating for each state i = 0, 1, ..., M − 1 the rate at which the process goes from i to i + 1 with the rate at which it goes from i + 1 to i. This yields

  π_0 α_0 = π_1 (1 − α_1),
  π_1 α_1 = π_2 (1 − α_2),
  ⋮
  π_i α_i = π_{i+1} (1 − α_{i+1}),  i = 0, 1, ..., M − 1

Solving in terms of π_0 yields

  π_1 = (α_0/(1 − α_1)) π_0,
  π_2 = (α_1/(1 − α_2)) π_1 = (α_1 α_0/((1 − α_2)(1 − α_1))) π_0

and, in general,

  π_i = (α_{i−1} ··· α_0/((1 − α_i) ··· (1 − α_1))) π_0,  i = 1, 2, ..., M

Since Σ_{i=0}^M π_i = 1, we obtain

  π_0 [1 + Σ_{j=1}^M (α_{j−1} ··· α_0)/((1 − α_j) ··· (1 − α_1))] = 1


or

  π_0 = [1 + Σ_{j=1}^M (α_{j−1} ··· α_0)/((1 − α_j) ··· (1 − α_1))]^{−1}    (4.23)

and

  π_i = (α_{i−1} ··· α_0)/((1 − α_i) ··· (1 − α_1)) π_0,  i = 1, ..., M    (4.24)

For instance, if α_i ≡ α, then

  π_0 = [1 + Σ_{j=1}^M (α/(1 − α))^j]^{−1} = (1 − β)/(1 − β^{M+1})

and, in general,

  π_i = β^i (1 − β)/(1 − β^{M+1}),  i = 0, 1, ..., M

where

  β = α/(1 − α)

Another special case of Example 4.31 is the following urn model, proposed by the physicists P. and T. Ehrenfest to describe the movements of molecules. Suppose that M molecules are distributed among two urns; and at each time point one of the molecules is chosen at random, removed from its urn, and placed in the other one. The number of molecules in urn I is a special case of the Markov chain of Example 4.31 having

  α_i = (M − i)/M,  i = 0, 1, ..., M


Hence, using Equations (4.23) and (4.24) the limiting probabilities in this case are

  π_0 = [1 + Σ_{j=1}^M ((M − j + 1) ··· (M − 1)M)/(j(j − 1) ··· 1)]^{−1}
      = [Σ_{j=0}^M (M choose j)]^{−1}
      = (1/2)^M

where we have used the identity

  1 = (1/2 + 1/2)^M = Σ_{j=0}^M (M choose j)(1/2)^M

Hence, from Equation (4.24),

  π_i = (M choose i)(1/2)^M,  i = 0, 1, ..., M

Because the preceding are just the binomial probabilities, it follows that in the long run, the positions of each of the M balls are independent and each one is equally likely to be in either urn. This, however, is quite intuitive, for if we focus on any one ball, it becomes quite clear that its position will be independent of the positions of the other balls (since no matter where the other M − 1 balls are, the ball under consideration at each stage will be moved with probability 1/M) and, by symmetry, it is equally likely to be in either urn.

Example 4.32 Consider an arbitrary connected graph (see Section 3.6 for definitions) having a number w_{ij} associated with arc (i, j) for each arc. One instance of such a graph is given by Figure 4.1. Now consider a particle moving from node to node in this manner: If at any time the particle resides at node i, then it will next move to node j with probability P_{ij} where

  P_{ij} = w_{ij} / Σ_j w_{ij}

and where w_{ij} is 0 if (i, j) is not an arc. For instance, for the graph of Figure 4.1, P_{12} = 3/(3 + 1 + 2) = 1/2.


Figure 4.1. A connected graph with arc weights.

The time reversibility equations

  π_i P_{ij} = π_j P_{ji}

reduce to

  π_i w_{ij}/Σ_j w_{ij} = π_j w_{ji}/Σ_i w_{ji}

or, equivalently, since w_{ij} = w_{ji},

  π_i/Σ_j w_{ij} = π_j/Σ_i w_{ji}

which is equivalent to

  π_i/Σ_j w_{ij} = c

or

  π_i = c Σ_j w_{ij}

or, since 1 = Σ_i π_i,

  π_i = Σ_j w_{ij} / Σ_i Σ_j w_{ij}

Because the π_i's given by this equation satisfy the time reversibility equations, it follows that the process is time reversible with these limiting probabilities.

For the graph of Figure 4.1 we have that

  π_1 = 6/32,  π_2 = 3/32,  π_3 = 6/32,  π_4 = 5/32,  π_5 = 12/32
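The formula π_i = Σ_j w_{ij}/Σ_i Σ_j w_{ij} is easy to test in code. Since Figure 4.1's weights are not reproduced here, the sketch below uses a hypothetical symmetric weight matrix and checks that the claimed π satisfies the time reversibility equations:

```python
# Random walk on a weighted graph (weights are made up for
# illustration; they are not those of Figure 4.1).
W = [[0, 3, 1, 2],
     [3, 0, 0, 4],
     [1, 0, 0, 5],
     [2, 4, 5, 0]]   # symmetric: w_ij = w_ji
n = len(W)

deg = [sum(W[i]) for i in range(n)]           # sum_j w_ij
total = sum(deg)                              # sum_i sum_j w_ij
pi = [deg[i] / total for i in range(n)]       # claimed limiting probs

P = [[W[i][j] / deg[i] for j in range(n)] for i in range(n)]

# detailed balance pi_i P_ij = pi_j P_ji holds for every pair,
# because both sides reduce to w_ij / total
balanced = all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < 1e-12
               for i in range(n) for j in range(n))
```

Both sides of each balance equation equal w_{ij}/Σ w, which is why no computation of limiting probabilities is actually needed.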


If we try to solve Equation (4.22) for an arbitrary Markov chain with states 0, 1, ..., M, it will usually turn out that no solution exists. For example, from Equation (4.22),

  x_i P_{ij} = x_j P_{ji},
  x_k P_{kj} = x_j P_{jk}

implying (if P_{ij} P_{jk} > 0) that

  x_i/x_k = (P_{ji} P_{kj})/(P_{ij} P_{jk})

which in general need not equal P_{ki}/P_{ik}. Thus, we see that a necessary condition for time reversibility is that

  P_{ik} P_{kj} P_{ji} = P_{ij} P_{jk} P_{ki}  for all i, j, k    (4.25)

which is equivalent to the statement that, starting in state i, the path i → k → j → i has the same probability as the reversed path i → j → k → i. To understand the necessity of this, note that time reversibility implies that the rate at which a sequence of transitions from i to k to j to i occurs must equal the rate of ones from i to j to k to i (why?), and so we must have

  π_i P_{ik} P_{kj} P_{ji} = π_i P_{ij} P_{jk} P_{ki}

implying Equation (4.25) when π_i > 0.

In fact, we can show the following.

Theorem 4.2 An ergodic Markov chain for which P_{ij} = 0 whenever P_{ji} = 0 is time reversible if and only if starting in state i, any path back to i has the same probability as the reversed path. That is, if

  P_{i,i_1} P_{i_1,i_2} ··· P_{i_k,i} = P_{i,i_k} P_{i_k,i_{k−1}} ··· P_{i_1,i}    (4.26)

for all states i, i_1, ..., i_k.

Proof We have already proven necessity. To prove sufficiency, fix states i and j and rewrite (4.26) as

  P_{i,i_1} P_{i_1,i_2} ··· P_{i_k,j} P_{j,i} = P_{i,j} P_{j,i_k} ··· P_{i_1,i}

Summing the preceding over all states i_1, ..., i_k yields

  P_{ij}^{k+1} P_{ji} = P_{ij} P_{ji}^{k+1}

where P_{ij}^{k+1} denotes a (k + 1)-step transition probability. Letting k → ∞ yields

  π_j P_{ji} = P_{ij} π_i

which proves the theorem.
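The triple-product condition (4.25) is a quick computational test for reversibility. The sketch below (illustrative matrices, not from the text) checks it for a birth-death-style chain, where it holds, and for a chain that prefers one direction around a cycle, where it fails:

```python
from itertools import product

def kolmogorov_triples_ok(P):
    """Check P_ik P_kj P_ji == P_ij P_jk P_ki for all triples (4.25)."""
    n = len(P)
    return all(abs(P[i][k] * P[k][j] * P[j][i] -
                   P[i][j] * P[j][k] * P[k][i]) < 1e-12
               for i, j, k in product(range(n), repeat=3))

reversible = [[0.5, 0.5, 0.0],
              [0.3, 0.2, 0.5],
              [0.0, 0.4, 0.6]]   # transitions only between neighbors

cyclic = [[0.0, 0.8, 0.2],
          [0.2, 0.0, 0.8],
          [0.8, 0.2, 0.0]]       # prefers the cycle 0 -> 1 -> 2 -> 0
```

For the cyclic chain the forward loop 0 → 1 → 2 → 0 has probability 0.8³ while its reversal has probability 0.2³, so (4.25) fails and the chain cannot be time reversible.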


Example 4.33 Suppose we are given a set of n elements, numbered 1 through n, which are to be arranged in some ordered list. At each unit of time a request is made to retrieve one of these elements, element i being requested (independently of the past) with probability P_i. After being requested, the element is then put back, but not necessarily in the same position. In fact, let us suppose that the element requested is moved one closer to the front of the list; for instance, if the present list ordering is 1, 3, 4, 2, 5 and element 2 is requested, then the new ordering becomes 1, 3, 2, 4, 5. We are interested in the long-run average position of the element requested.

For any given probability vector P = (P_1, ..., P_n), the preceding can be modeled as a Markov chain with n! states, with the state at any time being the list order at that time. We shall show that this Markov chain is time reversible and then use this to show that the average position of the element requested when this one-closer rule is in effect is less than when the rule of always moving the requested element to the front of the line is used. The time reversibility of the resulting Markov chain when the one-closer reordering rule is in effect easily follows from Theorem 4.2. For instance, suppose n = 3 and consider the following path from state (1, 2, 3) to itself:

  (1, 2, 3) → (2, 1, 3) → (2, 3, 1) → (3, 2, 1) → (3, 1, 2) → (1, 3, 2) → (1, 2, 3)

The product of the transition probabilities in the forward direction is

  P_2 P_3 P_3 P_1 P_1 P_2 = P_1² P_2² P_3²

whereas in the reverse direction, it is

  P_3 P_3 P_2 P_2 P_1 P_1 = P_1² P_2² P_3²

Because the general result follows in much the same manner, the Markov chain is indeed time reversible.
[For a formal argument, note that if f_i denotes the number of times element i moves forward in the path, then as the path goes from a fixed state back to itself, it follows that element i will also move backward f_i times. Therefore, since the backward moves of element i are precisely the times that it moves forward in the reverse path, it follows that the product of the transition probabilities for both the path and its reversal will equal

  Π_i P_i^{f_i + r_i}

where r_i is equal to the number of times that element i is in the first position and the path (or the reverse path) does not change states.]
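The reversibility claim can also be checked numerically. The sketch below builds the full 3! = 6-state one-closer chain for n = 3 (with made-up request probabilities), computes its stationary distribution by powering the transition matrix, and verifies the time-reversibility relation that the next page states as Equation (4.27), namely P_{i_{j+1}} π(..., i_j, i_{j+1}, ...) = P_{i_j} π(..., i_{j+1}, i_j, ...):

```python
from itertools import permutations

Preq = {1: 0.5, 2: 0.3, 3: 0.2}   # hypothetical request probabilities
states = list(permutations((1, 2, 3)))
idx = {s: k for k, s in enumerate(states)}

def next_state(order, e):
    """Requested element e moves one position closer to the front."""
    order = list(order)
    j = order.index(e)
    if j > 0:
        order[j - 1], order[j] = order[j], order[j - 1]
    return tuple(order)

n = len(states)
P = [[0.0] * n for _ in range(n)]
for s in states:
    for e, pe in Preq.items():
        P[idx[s]][idx[next_state(s, e)]] += pe

pi = [1.0 / n] * n
for _ in range(5000):   # stationary distribution by iteration
    pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

# detailed balance for every adjacent transposition
ok = True
for s in states:
    for j in range(2):
        t = list(s)
        t[j], t[j + 1] = t[j + 1], t[j]
        t = tuple(t)
        ok &= abs(Preq[s[j + 1]] * pi[idx[s]] -
                  Preq[s[j]] * pi[idx[t]]) < 1e-8
```

Each transition of the chain is an adjacent transposition (or a self-loop when the front element is requested), so checking balance over transpositions covers all transitions.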


For any permutation i_1, i_2, ..., i_n of 1, 2, ..., n, let π(i_1, i_2, ..., i_n) denote the limiting probability under the one-closer rule. By time reversibility we have

  P_{i_{j+1}} π(i_1, ..., i_j, i_{j+1}, ..., i_n) = P_{i_j} π(i_1, ..., i_{j+1}, i_j, ..., i_n)    (4.27)

for all permutations.

Now the average position of the element requested can be expressed (as in Section 3.6.1) as

  Average position = Σ_i P_i E[Position of element i]
                   = Σ_i P_i [1 + Σ_{j≠i} P{element j precedes element i}]
                   = 1 + Σ_i Σ_{j≠i} P_i P{e_j precedes e_i}

Now consider any state where element i precedes element j, say, (..., i, i_1, ..., i_k, j, ...). By successive transpositions using Equation (4.27),


we have

  π(..., i, i_1, ..., i_k, j, ...) = (P_i/P_j)^{k+1} π(..., j, i_1, ..., i_k, i, ...)    (4.28)

For instance,

  π(1, 2, 3) = (P_2/P_3) π(1, 3, 2)
             = (P_2/P_3)(P_1/P_3) π(3, 1, 2)
             = (P_2/P_3)(P_1/P_3)(P_1/P_2) π(3, 2, 1)
             = (P_1/P_3)² π(3, 2, 1)

Now when P_j > P_i, Equation (4.28) implies that

  π(..., i, i_1, ..., i_k, j, ...) < (P_i/P_j) π(..., j, i_1, ..., i_k, i, ...)

Letting α(i, j) = P{e_i precedes e_j}, we see by summing over all states for which i precedes j and by using the preceding that

  α(i, j) < (P_i/P_j) α(j, i)

which, since α(i, j) = 1 − α(j, i), yields

  α(j, i) > P_j/(P_j + P_i)

Hence, the average position of the element requested is indeed smaller under the one-closer rule than under the front-of-the-line rule.

The concept of the reversed chain is useful even when the process is not time reversible. To illustrate this, we start with the following proposition whose proof is left as an exercise.

Proposition 4.6 Consider an irreducible Markov chain with transition probabilities P_{ij}. If we can find positive numbers π_i, i ≥ 0, summing to one, and a transition probability matrix Q = [Q_{ij}] such that

  π_i P_{ij} = π_j Q_{ji}    (4.29)

then the Q_{ij} are the transition probabilities of the reversed chain and the π_i are the stationary probabilities both for the original and reversed chain.

The importance of the preceding proposition is that, by thinking backward, we can sometimes guess at the nature of the reversed chain and then use the set of equations (4.29) to obtain both the stationary probabilities and the Q_{ij}.
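Relation (4.29) can be illustrated in the other direction as well: given a chain and its stationary probabilities, Q_{ji} = π_i P_{ij}/π_j is automatically a transition probability matrix with the same stationary distribution. A minimal sketch on an arbitrary (non-reversible) three-state chain:

```python
# An arbitrary irreducible 3-state chain (illustrative; not reversible,
# since e.g. P01*P12*P20 = 0.12 != 0.036 = P02*P21*P10).
P = [[0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4],
     [0.5, 0.3, 0.2]]
n = len(P)

pi = [1.0 / n] * n
for _ in range(1000):   # stationary probabilities by iteration
    pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

# reversed-chain transitions from (4.29): Q[j][i] = pi_i P_ij / pi_j
Q = [[pi[i] * P[i][j] / pi[j] for i in range(n)] for j in range(n)]

rows_ok = all(abs(sum(Q[j]) - 1) < 1e-9 for j in range(n))
stat_ok = all(abs(sum(pi[i] * Q[i][j] for i in range(n)) - pi[j]) < 1e-9
              for j in range(n))
```

The row sums equal one precisely because π is stationary for P, and π is stationary for Q by the same cancellation.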


Example 4.34 A single bulb is necessary to light a given room. When the bulb in use fails, it is replaced by a new one at the beginning of the next day. Let X_n equal i if the bulb in use at the beginning of day n is in its ith day of use (that is, if its present age is i). For instance, if a bulb fails on day n − 1, then a new bulb will be put in use at the beginning of day n and so X_n = 1. If we suppose that each bulb, independently, fails on its ith day of use with probability p_i, i ≥ 1, then it is easy to see that {X_n, n ≥ 1} is a Markov chain whose transition probabilities are as follows:

  P_{i,1} = P{bulb, on its ith day of use, fails}
          = P{life of bulb = i | life of bulb ≥ i}
          = P{L = i}/P{L ≥ i}

where L, a random variable representing the lifetime of a bulb, is such that P{L = i} = p_i. Also,

  P_{i,i+1} = 1 − P_{i,1}

Suppose now that this chain has been in operation for a long (in theory, an infinite) time and consider the sequence of states going backward in time. Since, in the forward direction, the state is always increasing by 1 until it reaches the age at which the item fails, it is easy to see that the reverse chain will always decrease by 1 until it reaches 1 and then it will jump to a random value representing the lifetime of the (in real time) previous bulb. Thus, it seems that the reverse chain should have transition probabilities given by

  Q_{i,i−1} = 1,  i > 1
  Q_{1,i} = p_i,  i ≥ 1

To check this, and at the same time determine the stationary probabilities, we must see if we can find, with the Q_{i,j} as previously given, positive numbers {π_i} such that

  π_i P_{i,j} = π_j Q_{j,i}

To begin, let j = 1 and consider the resulting equations:

  π_i P_{i,1} = π_1 Q_{1,i}

This is equivalent to

  π_i P{L = i}/P{L ≥ i} = π_1 P{L = i}


or

  π_i = π_1 P{L ≥ i}

Summing over all i yields

  1 = Σ_{i=1}^∞ π_i = π_1 Σ_{i=1}^∞ P{L ≥ i} = π_1 E[L]

and so, for the preceding Q_{ij} to represent the reverse transition probabilities, it is necessary for the stationary probabilities to be

  π_i = P{L ≥ i}/E[L],  i ≥ 1

To finish the proof that the reverse transition probabilities and stationary probabilities are as given, all that remains is to show that they satisfy

  π_i P_{i,i+1} = π_{i+1} Q_{i+1,i}

which is equivalent to

  (P{L ≥ i}/E[L])(1 − P{L = i}/P{L ≥ i}) = P{L ≥ i + 1}/E[L]

and which is true since P{L ≥ i} − P{L = i} = P{L ≥ i + 1}.

4.9. Markov Chain Monte Carlo Methods

Let X be a discrete random vector whose set of possible values is x_j, j ≥ 1. Let the probability mass function of X be given by P{X = x_j}, j ≥ 1, and suppose that we are interested in calculating

  θ = E[h(X)] = Σ_{j=1}^∞ h(x_j) P{X = x_j}

for some specified function h. In situations where it is computationally difficult to evaluate the function h(x_j), j ≥ 1, we often turn to simulation to approximate θ. The usual approach, called Monte Carlo simulation, is to use random numbers to generate a partial sequence of independent and identically distributed random vectors X_1, X_2, ..., X_n having the mass function P{X = x_j}, j ≥ 1 (see Chapter 11 for a discussion as to how this can be accomplished). Since the strong law of large numbers yields

  lim_{n→∞} Σ_{i=1}^n h(X_i)/n = θ    (4.30)

it follows that we can estimate θ by letting n be large and using the average of the values of h(X_i), i = 1, ..., n as the estimator.

It often, however, turns out that it is difficult to generate a random vector having the specified probability mass function, particularly if X is a vector of dependent random variables. In addition, its probability mass function is sometimes given in the form P{X = x_j} = C b_j, j ≥ 1, where the b_j are specified, but C must be computed, and in many applications it is not computationally feasible to sum the b_j so as to determine C. Fortunately, however, there is another way of using simulation to estimate θ in these situations. It works by generating a sequence, not of independent random vectors, but of the successive states of a vector-valued Markov chain X_1, X_2, ... whose stationary probabilities are P{X = x_j}, j ≥ 1. If this can be accomplished, then it would follow from Proposition 4.3 that Equation (4.30) remains valid, implying that we can then use Σ_{i=1}^n h(X_i)/n as an estimator of θ.

We now show how to generate a Markov chain with arbitrary stationary probabilities that may only be specified up to a multiplicative constant. Let b(j), j = 1, 2, ... be positive numbers whose sum B = Σ_{j=1}^∞ b(j) is finite. The following, known as the Hastings–Metropolis algorithm, can be used to generate a time reversible Markov chain whose stationary probabilities are

  π(j) = b(j)/B,  j = 1, 2, ...

To begin, let Q be any specified irreducible Markov transition probability matrix on the integers, with q(i, j) representing the row i column j element of Q. Now define a Markov chain {X_n, n ≥ 0} as follows.
When X_n = i, generate a random variable Y such that P{Y = j} = q(i, j), j = 1, 2, .... If Y = j, then set X_{n+1} equal to j with probability α(i, j), and set it equal to i with probability 1 − α(i, j). Under these conditions, it is easy to see that the sequence of states constitutes a Markov chain with transition probabilities P_{i,j} given by

  P_{i,j} = q(i, j) α(i, j),  if j ≠ i
  P_{i,i} = q(i, i) + Σ_{k≠i} q(i, k)(1 − α(i, k))

This Markov chain will be time reversible and have stationary probabilities π(j) if

  π(i) P_{i,j} = π(j) P_{j,i}  for j ≠ i


which is equivalent to

  π(i) q(i, j) α(i, j) = π(j) q(j, i) α(j, i)    (4.31)

But if we take π_j = b(j)/B and set

  α(i, j) = min(π(j) q(j, i)/(π(i) q(i, j)), 1)    (4.32)

then Equation (4.31) is easily seen to be satisfied. For if

  α(i, j) = π(j) q(j, i)/(π(i) q(i, j))

then α(j, i) = 1 and Equation (4.31) follows, and if α(i, j) = 1 then

  α(j, i) = π(i) q(i, j)/(π(j) q(j, i))

and again Equation (4.31) holds, thus showing that the Markov chain is time reversible with stationary probabilities π(j). Also, since π(j) = b(j)/B, we see from (4.32) that

  α(i, j) = min(b(j) q(j, i)/(b(i) q(i, j)), 1)

which shows that the value of B is not needed to define the Markov chain, because the values b(j) suffice. Also, it is almost always the case that π(j), j ≥ 1 will not only be stationary probabilities but will also be limiting probabilities. (Indeed, a sufficient condition is that P_{i,i} > 0 for some i.)

Example 4.35 Suppose that we want to generate a uniformly distributed element in S, the set of all permutations (x_1, ..., x_n) of the numbers (1, ..., n) for which Σ_{j=1}^n j x_j > a for a given constant a. To utilize the Hastings–Metropolis algorithm we need to define an irreducible Markov transition probability matrix on the state space S. To accomplish this, we first define a concept of "neighboring" elements of S, and then construct a graph whose vertex set is S. We start by putting an arc between each pair of neighboring elements in S, where any two permutations in S are said to be neighbors if one results from an interchange of two of the positions of the other. That is, (1, 2, 3, 4) and (1, 2, 4, 3) are neighbors whereas (1, 2, 3, 4) and (1, 3, 4, 2) are not. Now, define the q transition probability function as follows. With N(s) defined as the set of neighbors of s, and |N(s)|


equal to the number of elements in the set N(s), let

  q(s, t) = 1/|N(s)|  if t ∈ N(s)

That is, the candidate next state from s is equally likely to be any of its neighbors. Since the desired limiting probabilities of the Markov chain are π(s) = C, it follows that π(s) = π(t), and so

  α(s, t) = min(|N(s)|/|N(t)|, 1)

That is, if the present state of the Markov chain is s then one of its neighbors is randomly chosen, say, t. If t is a state with fewer neighbors than s (in graph theory language, if the degree of vertex t is less than that of vertex s), then the next state is t. If not, a uniform (0, 1) random number U is generated and the next state is t if U < |N(s)|/|N(t)|; otherwise the chain remains in s.

The most widely used version of the Hastings–Metropolis algorithm is the Gibbs sampler. Suppose we want to generate a random vector X = (X_1, ..., X_n) having a probability mass function p(x) that need be specified only up to a multiplicative constant, and suppose that we are able to generate a random variable whose mass function is the conditional mass function of X_i given the values X_j = x_j, j ≠ i. The Gibbs sampler operates as follows: when the present state is x = (x_1, ..., x_n), a coordinate i is chosen at random, a random variable X with mass function P{X = x} = P{X_i = x | X_j = x_j, j ≠ i} is generated, and y = (x_1, ..., x_{i−1}, X, x_{i+1}, ..., x_n) is considered as the candidate next state. In other words, the Gibbs sampler uses


the Hastings–Metropolis algorithm with

  q(x, y) = (1/n) P{X_i = x | X_j = x_j, j ≠ i} = p(y)/(n P{X_j = x_j, j ≠ i})

Because we want the limiting mass function to be p, we see from Equation (4.32) that the vector y is then accepted as the new state with probability

  α(x, y) = min(p(y) q(y, x)/(p(x) q(x, y)), 1)
          = min(p(y) p(x)/(p(x) p(y)), 1)
          = 1

Hence, when utilizing the Gibbs sampler, the candidate state is always accepted as the next state of the chain.

Example 4.36 Suppose that we want to generate n uniformly distributed points in the circle of radius 1 centered at the origin, conditional on the event that no two points are within a distance d of each other, when the probability of this conditioning event is small. This can be accomplished by using the Gibbs sampler as follows. Start with any n points x_1, ..., x_n in the circle that have the property that no two of them are within d of the other; then generate the value of I, equally likely to be any of the values 1, ..., n. Then continually generate a random point in the circle until you obtain one that is not within d of any of the other n − 1 points excluding x_I. At this point, replace x_I by the generated point and then repeat the operation. After a large number of iterations of this algorithm, the set of n points will approximately have the desired distribution.

Example 4.37 Let X_i, i = 1, ..., n, be independent exponential random variables with respective rates λ_i, i = 1, ..., n. Let S = Σ_{i=1}^n X_i, and suppose that we want to generate the random vector X = (X_1, ..., X_n), conditional on the event that S > c for some large positive constant c. That is, we want to generate the value of a random vector whose density function is

  f(x_1, ..., x_n) = (1/P{S > c}) Π_{i=1}^n λ_i e^{−λ_i x_i},  x_i ≥ 0,  Σ_{i=1}^n x_i > c

This is easily accomplished by starting with an initial vector x = (x_1, ..., x_n) satisfying x_i > 0, i = 1, ..., n, and Σ_{i=1}^n x_i > c. Then generate a random variable I that is equally likely to be any of 1, ..., n. Next, generate an exponential random


variable X with rate λ_I conditional on the event that X + Σ_{j≠I} x_j > c. This latter step, which calls for generating the value of an exponential random variable given that it exceeds c − Σ_{j≠I} x_j, is easily accomplished by using the fact that an exponential conditioned to be greater than a positive constant is distributed as the constant plus the exponential. Consequently, to obtain X, first generate an exponential random variable Y with rate λ_I, and then set

  X = Y + (c − Σ_{j≠I} x_j)^+

The value of x_I should then be reset as X and a new iteration of the algorithm begun.

Remark As can be seen by Examples 4.36 and 4.37, although the theory for the Gibbs sampler was presented under the assumption that the distribution to be generated was discrete, it also holds when this distribution is continuous.

4.10. Markov Decision Processes

Consider a process that is observed at discrete time points to be in any one of M possible states, which we number by 1, 2, ..., M. After observing the state of the process, an action must be chosen, and we let A, assumed finite, denote the set of all possible actions.

If the process is in state i at time n and action a is chosen, then the next state of the system is determined according to the transition probabilities P_{ij}(a). If we let X_n denote the state of the process at time n and a_n the action chosen at time n, then the preceding is equivalent to stating that

  P{X_{n+1} = j | X_0, a_0, X_1, a_1, ..., X_n = i, a_n = a} = P_{ij}(a)

Thus, the transition probabilities are functions only of the present state and the subsequent action.

By a policy, we mean a rule for choosing actions. We shall restrict ourselves to policies which are of the form that the action they prescribe at any time depends only on the state of the process at that time (and not on any information concerning prior states and actions). However, we shall allow the policy to be "randomized" in that its instructions may be to choose actions according to a probability distribution.
In other words, a policy β is a set of numbers β = {β_i(a), a ∈ A, i = 1, ..., M} with the interpretation that if the process is in


state i, then action a is to be chosen with probability β_i(a). Of course, we need have that

  0 ≤ β_i(a) ≤ 1,  for all i, a
  Σ_a β_i(a) = 1,  for all i

Under any given policy β, the sequence of states {X_n, n = 0, 1, ...} constitutes a Markov chain with transition probabilities P_{ij}(β) given by

  P_{ij}(β) = P_β{X_{n+1} = j | X_n = i}*
            = Σ_a P_{ij}(a) β_i(a)

where the last equality follows by conditioning on the action chosen when in state i. Let us suppose that for every choice of a policy β, the resultant Markov chain {X_n, n = 0, 1, ...} is ergodic.

For any policy β, let π_{ia} denote the limiting (or steady-state) probability that the process will be in state i and action a will be chosen if policy β is employed. That is,

  π_{ia} = lim_{n→∞} P_β{X_n = i, a_n = a}

The vector π = (π_{ia}) must satisfy

  (i)   π_{ia} ≥ 0 for all i, a,
  (ii)  Σ_i Σ_a π_{ia} = 1,
  (iii) Σ_a π_{ja} = Σ_i Σ_a π_{ia} P_{ij}(a) for all j    (4.33)

Equations (i) and (ii) are obvious, and Equation (iii), which is an analogue of Equation (4.7), follows as the left-hand side equals the steady-state probability of being in state j and the right-hand side is the same probability computed by conditioning on the state and action chosen one stage earlier.

Thus for any policy β, there is a vector π = (π_{ia}) which satisfies (i)–(iii) and with the interpretation that π_{ia} is equal to the steady-state probability of being in state i and choosing action a when policy β is employed. Moreover, it turns out that the reverse is also true. Namely, for any vector π = (π_{ia}) which satisfies (i)–(iii), there exists a policy β such that if β is used, then the steady-state probability of being in i and choosing action a equals π_{ia}. To verify this last statement,

*We use the notation P_β to signify that the probability is conditional on the fact that policy β is used.
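Conditions (i)–(iii) are easy to confirm numerically for a concrete policy. The sketch below uses a small hypothetical two-state, two-action process and a fixed randomized policy: it forms the induced chain P_{ij}(β) = Σ_a P_{ij}(a) β_i(a), finds its stationary distribution, sets π_{ia} = π_i β_i(a), and checks (4.33):

```python
# A tiny illustrative MDP (all numbers made up).
M, A = 2, 2

# P[a][i][j]: transition probabilities under action a
P = [[[0.9, 0.1], [0.2, 0.8]],    # action 0
     [[0.4, 0.6], [0.7, 0.3]]]    # action 1
beta = [[0.5, 0.5], [0.25, 0.75]]  # beta[i][a]: randomized policy

# induced chain P_ij(beta)
Pb = [[sum(P[a][i][j] * beta[i][a] for a in range(A))
       for j in range(M)] for i in range(M)]

pi = [1.0 / M] * M
for _ in range(1000):   # stationary distribution of the induced chain
    pi = [sum(pi[i] * Pb[i][j] for i in range(M)) for j in range(M)]

pia = [[pi[i] * beta[i][a] for a in range(A)] for i in range(M)]

cond_i = all(pia[i][a] >= 0 for i in range(M) for a in range(A))
cond_ii = abs(sum(map(sum, pia)) - 1) < 1e-9
cond_iii = all(abs(sum(pia[j]) -
                   sum(pia[i][a] * P[a][i][j]
                       for i in range(M) for a in range(A))) < 1e-9
               for j in range(M))
```

Condition (iii) reduces to stationarity of π for the induced chain, which is exactly how the text interprets it.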


suppose that π = (π_{ia}) is a vector which satisfies (i)–(iii). Then, let the policy β = (β_i(a)) be

  β_i(a) = P{β chooses a | state is i} = π_{ia}/Σ_a π_{ia}

Now let P_{ia} denote the limiting probability of being in i and choosing a when policy β is employed. We need to show that P_{ia} = π_{ia}. To do so, first note that {P_{ia}, i = 1, ..., M, a ∈ A} are the limiting probabilities of the two-dimensional Markov chain {(X_n, a_n), n ≥ 0}. Hence, by the fundamental Theorem 4.1, they are the unique solution of

  (i′)   P_{ia} ≥ 0,
  (ii′)  Σ_i Σ_a P_{ia} = 1,
  (iii′) P_{ja} = Σ_i Σ_{a′} P_{ia′} P_{ij}(a′) β_j(a)

where (iii′) follows since

  P{X_{n+1} = j, a_{n+1} = a | X_n = i, a_n = a′} = P_{ij}(a′) β_j(a)

Since

  β_j(a) = π_{ja}/Σ_a π_{ja}

we see that (P_{ia}) is the unique solution of

  P_{ia} ≥ 0,
  Σ_i Σ_a P_{ia} = 1,
  P_{ja} = Σ_i Σ_{a′} P_{ia′} P_{ij}(a′) π_{ja}/Σ_a π_{ja}

Hence, to show that P_{ia} = π_{ia}, we need show that

  π_{ia} ≥ 0,
  Σ_i Σ_a π_{ia} = 1,
  π_{ja} = Σ_i Σ_{a′} π_{ia′} P_{ij}(a′) π_{ja}/Σ_a π_{ja}


The top two equations follow from (i) and (ii) of Equation (4.33), and the third, which is equivalent to

  Σ_a π_{ja} = Σ_i Σ_{a′} π_{ia′} P_{ij}(a′)

follows from condition (iii) of Equation (4.33).

Thus we have shown that a vector π = (π_{ia}) will satisfy (i), (ii), and (iii) of Equation (4.33) if and only if there exists a policy β such that π_{ia} is equal to the steady-state probability of being in state i and choosing action a when β is used. In fact, the policy β is defined by β_i(a) = π_{ia}/Σ_a π_{ia}.

The preceding is quite important in the determination of "optimal" policies. For instance, suppose that a reward R(i, a) is earned whenever action a is chosen in state i. Since R(X_i, a_i) would then represent the reward earned at time i, the expected average reward per unit time under policy β can be expressed as

  expected average reward under β = lim_{n→∞} E_β[Σ_{i=1}^n R(X_i, a_i)/n]

Now, if π_{ia} denotes the steady-state probability of being in state i and choosing action a, it follows that the limiting expected reward at time n equals

  lim_{n→∞} E[R(X_n, a_n)] = Σ_i Σ_a π_{ia} R(i, a)

which implies that

  expected average reward under β = Σ_i Σ_a π_{ia} R(i, a)

Hence, the problem of determining the policy that maximizes the expected average reward is

  maximize over π = (π_{ia}):  Σ_i Σ_a π_{ia} R(i, a)
  subject to  π_{ia} ≥ 0, for all i, a,
              Σ_i Σ_a π_{ia} = 1,
              Σ_a π_{ja} = Σ_i Σ_a π_{ia} P_{ij}(a), for all j    (4.34)

However, the preceding maximization problem is a special case of what is known as a linear program and can be solved by a standard linear programming algorithm


known as the simplex algorithm.* If π* = (π*_ia) maximizes the preceding, then the optimal policy will be given by β*, where

β*_i(a) = π*_ia / Σ_a π*_ia

Remarks (i) It can be shown that there is a π* maximizing Equation (4.34) that has the property that for each i, π*_ia is zero for all but one value of a, which implies that the optimal policy is nonrandomized. That is, the action it prescribes when in state i is a deterministic function of i.

(ii) The linear programming formulation also often works when there are restrictions placed on the class of allowable policies. For instance, suppose there is a restriction on the fraction of time the process spends in some state, say, state 1. Specifically, suppose that we are allowed to consider only policies having the property that their use results in the process being in state 1 less than 100α percent of the time. To determine the optimal policy subject to this requirement, we add to the linear programming problem the additional constraint

Σ_a π_1a ≤ α

since Σ_a π_1a represents the proportion of time that the process is in state 1.

4.11. Hidden Markov Chains

Let {X_n, n = 1, 2, ...} be a Markov chain with transition probabilities P_{i,j} and initial state probabilities p_i = P{X_1 = i}, i ≥ 0. Suppose that there is a finite set S of signals, and that a signal from S is emitted each time the Markov chain enters a state. Further, suppose that when the Markov chain enters state j then, independently of previous Markov chain states and signals, the signal emitted is s with probability p(s|j), where Σ_{s∈S} p(s|j) = 1. That is, if S_n represents the nth signal emitted, then

P{S_1 = s | X_1 = j} = p(s|j),
P{S_n = s | X_1, S_1, ..., X_{n−1}, S_{n−1}, X_n = j} = p(s|j)

A model of the preceding type, in which the sequence of signals S_1, S_2, ... is observed while the sequence of underlying Markov chain states X_1, X_2, ...
is unobserved, is called a hidden Markov chain model.

*It is called a linear program since the objective function Σ_i Σ_a R(i, a)π_ia and the constraints are all linear functions of the π_ia. For a heuristic analysis of the simplex algorithm, see 4.5.2.
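Returning to Section 4.10, the correspondence just established between vectors (π_ia) and randomized stationary policies can be checked numerically. The sketch below is not from the text: it uses a made-up two-state, two-action chain, power-iterates the two-dimensional chain {(X_n, a_n)} under a randomized policy β, and confirms that the limiting probabilities P_ia recover β via β_i(a) = P_ia / Σ_a P_ia.

```python
# Two states (0, 1) and two actions (0, 1); P_trans[a][i][j] = P_ij(a).
# All numbers below are illustrative, not from the text.
P_trans = {0: [[0.7, 0.3], [0.4, 0.6]],
           1: [[0.2, 0.8], [0.9, 0.1]]}
beta = [[0.5, 0.5], [0.3, 0.7]]          # beta[i][a] = beta_i(a)

pairs = [(i, a) for i in range(2) for a in range(2)]

def step(dist):
    """One transition of the two-dimensional chain {(X_n, a_n)}."""
    new = dict.fromkeys(pairs, 0.0)
    for (i, a), pr in dist.items():
        for j in range(2):
            for a2 in range(2):
                new[(j, a2)] += pr * P_trans[a][i][j] * beta[j][a2]
    return new

dist = dict.fromkeys(pairs, 0.25)        # arbitrary starting distribution
for _ in range(200):                     # this joint chain is irreducible and aperiodic
    dist = step(dist)

# The limiting probabilities recover the policy: beta_i(a) = P_ia / sum_a P_ia
recovered = [[dist[(i, a)] / (dist[(i, 0)] + dist[(i, 1)]) for a in range(2)]
             for i in range(2)]
```

Here `recovered` agrees with `beta`, which is exactly the statement that a vector satisfying (i)–(iii) arises as the steady-state probabilities of the policy it defines.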


Example 4.38 Consider a production process that in each period is either in a good state (state 1) or in a poor state (state 2). If the process is in state 1 during a period then, independent of the past, with probability 0.9 it will be in state 1 during the next period and with probability 0.1 it will be in state 2. Once in state 2, it remains in that state forever. Suppose that a single item is produced each period and that each item produced when the process is in state 1 is of acceptable quality with probability 0.99, while each item produced when the process is in state 2 is of acceptable quality with probability 0.96.

If the status, either acceptable or unacceptable, of each successive item is observed, while the process states are unobservable, then the preceding is a hidden Markov chain model. The signal is the status of the item produced, and has value either a or u, depending on whether the item is acceptable or unacceptable. The signal probabilities are

p(u|1) = 0.01,  p(a|1) = 0.99
p(u|2) = 0.04,  p(a|2) = 0.96

while the transition probabilities of the underlying Markov chain are

P_{1,1} = 0.9 = 1 − P_{1,2},  P_{2,2} = 1

Although {S_n, n ≥ 1} is not a Markov chain, it should be noted that, conditional on the current state X_n, the sequence S_n, X_{n+1}, S_{n+1}, ... of future signals and states is independent of the sequence X_1, S_1, ..., X_{n−1}, S_{n−1} of past states and signals.

Let S_n = (S_1, ..., S_n) denote the random vector of the first n signals. For a fixed sequence of signals s_1, ..., s_n, let s_k = (s_1, ..., s_k), k ≤ n. To begin, let us determine the conditional probability of the Markov chain state at time n given that S_n = s_n. To obtain this probability, let

F_n(j) = P{S_n = s_n, X_n = j}

and note that

P{X_n = j | S_n = s_n} = P{S_n = s_n, X_n = j} / P{S_n = s_n} = F_n(j) / Σ_i F_n(i)


Now,

F_n(j) = P{S_{n−1} = s_{n−1}, S_n = s_n, X_n = j}
= Σ_i P{S_{n−1} = s_{n−1}, X_{n−1} = i, X_n = j, S_n = s_n}
= Σ_i F_{n−1}(i) P{X_n = j, S_n = s_n | S_{n−1} = s_{n−1}, X_{n−1} = i}
= Σ_i F_{n−1}(i) P{X_n = j, S_n = s_n | X_{n−1} = i}
= Σ_i F_{n−1}(i) P_{i,j} p(s_n|j)
= p(s_n|j) Σ_i F_{n−1}(i) P_{i,j}   (4.35)

where the preceding used that

P{X_n = j, S_n = s_n | X_{n−1} = i} = P{X_n = j | X_{n−1} = i} × P{S_n = s_n | X_n = j, X_{n−1} = i}
= P_{i,j} P{S_n = s_n | X_n = j}
= P_{i,j} p(s_n|j)

Starting with

F_1(i) = P{X_1 = i, S_1 = s_1} = p_i p(s_1|i)

we can use Equation (4.35) to recursively determine the functions F_2(i), F_3(i), ..., up to F_n(i).

Example 4.39 Suppose in Example 4.38 that P{X_1 = 1} = 0.8. Given that the successive conditions of the first 3 items produced are a, u, a,

(i) what is the probability that the process was in its good state when the third item was produced;
(ii) what is the probability that X_4 is 1;
(iii) what is the probability that the next item produced is acceptable?


Solution: With s_3 = (a, u, a), we have

F_1(1) = (0.8)(0.99) = 0.792,
F_1(2) = (0.2)(0.96) = 0.192
F_2(1) = 0.01[0.792(0.9) + 0.192(0)] = 0.007128,
F_2(2) = 0.04[0.792(0.1) + (0.192)(1)] = 0.010848
F_3(1) = 0.99[(0.007128)(0.9)] ≈ 0.006351,
F_3(2) = 0.96[(0.007128)(0.1) + 0.010848] ≈ 0.011098

Therefore, the answer to part (i) is

P{X_3 = 1 | s_3} ≈ 0.006351 / (0.006351 + 0.011098) ≈ 0.364

To compute P{X_4 = 1 | s_3}, condition on X_3 to obtain

P{X_4 = 1 | s_3} = P{X_4 = 1 | X_3 = 1, s_3} P{X_3 = 1 | s_3} + P{X_4 = 1 | X_3 = 2, s_3} P{X_3 = 2 | s_3}
= P{X_4 = 1 | X_3 = 1, s_3}(0.364) + P{X_4 = 1 | X_3 = 2, s_3}(0.636)
= 0.364 P_{1,1} + 0.636 P_{2,1}
= 0.3276

To compute P{S_4 = a | s_3}, condition on X_4:

P{S_4 = a | s_3} = P{S_4 = a | X_4 = 1, s_3} P{X_4 = 1 | s_3} + P{S_4 = a | X_4 = 2, s_3} P{X_4 = 2 | s_3}
= P{S_4 = a | X_4 = 1}(0.3276) + P{S_4 = a | X_4 = 2}(1 − 0.3276)
= (0.99)(0.3276) + (0.96)(0.6724) = 0.9698

To compute P{S_n = s_n}, use the identity P{S_n = s_n} = Σ_i F_n(i) along with the recursion (4.35). If there are N states of the Markov chain, this requires computing the nN quantities F_k(i), with each computation requiring a summation over N terms. This can be compared with a computation of P{S_n = s_n} based on conditioning on the first n states of the Markov chain to obtain

P{S_n = s_n} = Σ_{i_1,...,i_n} P{S_n = s_n | X_1 = i_1, ..., X_n = i_n} P{X_1 = i_1, ..., X_n = i_n}
= Σ_{i_1,...,i_n} p(s_1|i_1) ··· p(s_n|i_n) p_{i_1} P_{i_1,i_2} P_{i_2,i_3} ··· P_{i_{n−1},i_n}


The use of the preceding identity to compute P{S_n = s_n} would thus require a summation over N^n terms, with each term being a product of 2n values, indicating that it is not competitive with the previous approach.

The computation of P{S_n = s_n} by recursively determining the functions F_k(i) is known as the forward approach. There also is a backward approach, which is based on the quantities B_k(i), defined by

B_k(i) = P{S_{k+1} = s_{k+1}, ..., S_n = s_n | X_k = i}

A recursive formula for B_k(i) can be obtained by conditioning on X_{k+1}:

B_k(i) = Σ_j P{S_{k+1} = s_{k+1}, ..., S_n = s_n | X_k = i, X_{k+1} = j} P{X_{k+1} = j | X_k = i}
= Σ_j P{S_{k+1} = s_{k+1}, ..., S_n = s_n | X_{k+1} = j} P_{i,j}
= Σ_j P{S_{k+1} = s_{k+1} | X_{k+1} = j} × P{S_{k+2} = s_{k+2}, ..., S_n = s_n | S_{k+1} = s_{k+1}, X_{k+1} = j} P_{i,j}
= Σ_j p(s_{k+1}|j) P{S_{k+2} = s_{k+2}, ..., S_n = s_n | X_{k+1} = j} P_{i,j}
= Σ_j p(s_{k+1}|j) B_{k+1}(j) P_{i,j}   (4.36)

Starting with

B_{n−1}(i) = P{S_n = s_n | X_{n−1} = i} = Σ_j P_{i,j} p(s_n|j)

we would then use Equation (4.36) to determine the function B_{n−2}(i), then B_{n−3}(i), and so on, down to B_1(i). This would then yield P{S_n = s_n} via

P{S_n = s_n} = Σ_i P{S_1 = s_1, ..., S_n = s_n | X_1 = i} p_i
= Σ_i P{S_1 = s_1 | X_1 = i} P{S_2 = s_2, ..., S_n = s_n | S_1 = s_1, X_1 = i} p_i


= Σ_i p(s_1|i) P{S_2 = s_2, ..., S_n = s_n | X_1 = i} p_i
= Σ_i p(s_1|i) B_1(i) p_i

Another approach to obtaining P{S_n = s_n} is to combine both the forward and backward approaches. Suppose that for some k we have computed both functions F_k(j) and B_k(j). Because

P{S_n = s_n, X_k = j} = P{S_k = s_k, X_k = j} × P{S_{k+1} = s_{k+1}, ..., S_n = s_n | S_k = s_k, X_k = j}
= P{S_k = s_k, X_k = j} P{S_{k+1} = s_{k+1}, ..., S_n = s_n | X_k = j}
= F_k(j) B_k(j)

we see that

P{S_n = s_n} = Σ_j F_k(j) B_k(j)

The beauty of using the preceding identity to determine P{S_n = s_n} is that we may simultaneously compute the sequence of forward functions, starting with F_1, as well as the sequence of backward functions, starting at B_{n−1}. The parallel computations can then be stopped once we have computed both F_k and B_k for some k.

4.11.1. Predicting the States

Suppose the first n observed signals are s_n = (s_1, ..., s_n), and that given this data we want to predict the first n states of the Markov chain. The best predictor depends on what we are trying to accomplish. If our objective is to maximize the expected number of states that are correctly predicted, then for each k = 1, ..., n we need to compute P{X_k = j | S_n = s_n} and then let the value of j that maximizes this quantity be the predictor of X_k. (That is, we take the mode of the conditional probability mass function of X_k, given the sequence of signals, as the predictor of X_k.) To do so, we must first compute this conditional probability mass function, which is accomplished as follows. For k ≤ n,

P{X_k = j | S_n = s_n} = P{S_n = s_n, X_k = j} / P{S_n = s_n} = F_k(j) B_k(j) / Σ_j F_k(j) B_k(j)

Thus, given that S_n = s_n, the optimal predictor of X_k is the value of j that maximizes F_k(j) B_k(j).
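The forward recursion (4.35), the backward recursion (4.36), and the identity P{S_n = s_n} = Σ_j F_k(j)B_k(j) can all be checked numerically on the chain of Example 4.38 with the signals a, u, a of Example 4.39. A minimal sketch (not the book's code; zero-based indices, so the list entry F[k] holds the book's F_{k+1}):

```python
# Hidden Markov chain of Example 4.38; states 1, 2 become indices 0, 1 here.
P = [[0.9, 0.1],
     [0.0, 1.0]]                                  # transition probabilities
emit = [{'a': 0.99, 'u': 0.01},                   # p(s|1)
        {'a': 0.96, 'u': 0.04}]                   # p(s|2)
p0 = [0.8, 0.2]                                   # P{X_1 = i}, from Example 4.39
signals = ['a', 'u', 'a']
n = len(signals)

# Forward pass, recursion (4.35): F[k] is the book's F_{k+1}.
F = [[p0[j] * emit[j][signals[0]] for j in range(2)]]
for k in range(1, n):
    F.append([emit[j][signals[k]] * sum(F[k-1][i] * P[i][j] for i in range(2))
              for j in range(2)])

# Backward pass, recursion (4.36): B[k] is the book's B_{k+1}, with B_n = 1.
B = [[1.0, 1.0] for _ in range(n)]
for k in range(n - 2, -1, -1):
    B[k] = [sum(emit[j][signals[k+1]] * B[k+1][j] * P[i][j] for j in range(2))
            for i in range(2)]

# P{S_n = s_n} = sum_j F_k(j) B_k(j), the same value for every k.
likelihood = [sum(F[k][j] * B[k][j] for j in range(2)) for k in range(n)]

# Smoothed state probability of Section 4.11.1: P{X_3 = 1 | s_3}.
posterior3 = F[2][0] * B[2][0] / likelihood[2]    # about 0.364, as in Example 4.39
```

Here `F[1]` reproduces the values F_2(1) = 0.007128 and F_2(2) = 0.010848 computed in Example 4.39, and `likelihood` is constant in k, as the identity asserts.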


A different variant of the prediction problem arises when we regard the sequence of states as a single entity. In this situation, our objective is to choose that sequence of states whose conditional probability, given the sequence of signals, is maximal. For instance, in signal processing, while X_1, ..., X_n might be the actual message sent, S_1, ..., S_n would be what is received, and so the objective would be to predict the actual message in its entirety.

Letting X_k = (X_1, ..., X_k) be the vector of the first k states, the problem of interest is to find the sequence of states i_1, ..., i_n that maximizes P{X_n = (i_1, ..., i_n) | S_n = s_n}. Because

P{X_n = (i_1, ..., i_n) | S_n = s_n} = P{X_n = (i_1, ..., i_n), S_n = s_n} / P{S_n = s_n}

this is equivalent to finding the sequence of states i_1, ..., i_n that maximizes P{X_n = (i_1, ..., i_n), S_n = s_n}.

To solve the preceding problem, let, for k ≤ n,

V_k(j) = max_{i_1,...,i_{k−1}} P{X_{k−1} = (i_1, ..., i_{k−1}), X_k = j, S_k = s_k}

To recursively solve for V_k(j), use that

V_k(j) = max_i max_{i_1,...,i_{k−2}} P{X_{k−2} = (i_1, ..., i_{k−2}), X_{k−1} = i, X_k = j, S_k = s_k}
= max_i max_{i_1,...,i_{k−2}} P{X_{k−2} = (i_1, ..., i_{k−2}), X_{k−1} = i, S_{k−1} = s_{k−1}, X_k = j, S_k = s_k}
= max_i max_{i_1,...,i_{k−2}} P{X_{k−2} = (i_1, ..., i_{k−2}), X_{k−1} = i, S_{k−1} = s_{k−1}} × P{X_k = j, S_k = s_k | X_{k−2} = (i_1, ..., i_{k−2}), X_{k−1} = i, S_{k−1} = s_{k−1}}
= max_i max_{i_1,...,i_{k−2}} P{X_{k−2} = (i_1, ..., i_{k−2}), X_{k−1} = i, S_{k−1} = s_{k−1}} × P{X_k = j, S_k = s_k | X_{k−1} = i}
= max_i P{X_k = j, S_k = s_k | X_{k−1} = i} max_{i_1,...,i_{k−2}} P{X_{k−2} = (i_1, ..., i_{k−2}), X_{k−1} = i, S_{k−1} = s_{k−1}}
= max_i P_{i,j} p(s_k|j) V_{k−1}(i)
= p(s_k|j) max_i P_{i,j} V_{k−1}(i)   (4.37)


Starting with

V_1(j) = P{X_1 = j, S_1 = s_1} = p_j p(s_1|j)

we now use the recursive identity (4.37) to determine V_2(j) for each j; then V_3(j) for each j; and so on, up to V_n(j) for each j.

To obtain the maximizing sequence of states, we work in the reverse direction. Let j_n be the value (or any of the values if there are more than one) of j that maximizes V_n(j). Thus j_n is the final state of a maximizing state sequence. Also, for k < n, let j_k be a value of j that maximizes P_{j,j_{k+1}} V_k(j); the resulting sequence j_1, ..., j_n then maximizes P{X_n = (i_1, ..., i_n), S_n = s_n}.
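This maximize-then-backtrack scheme is what is commonly called the Viterbi algorithm. As a minimal sketch (not from the text), here is recursion (4.37) run on the chain of Example 4.38 with the signals a, u, a of Example 4.39:

```python
# Viterbi recursion (4.37) on the chain of Example 4.38 (states 1, 2 -> 0, 1).
P = [[0.9, 0.1], [0.0, 1.0]]
emit = [{'a': 0.99, 'u': 0.01}, {'a': 0.96, 'u': 0.04}]
p0 = [0.8, 0.2]
signals = ['a', 'u', 'a']

V = [[p0[j] * emit[j][signals[0]] for j in range(2)]]   # V_1(j) = p_j p(s_1|j)
back = []                                               # best predecessor of each state
for k in range(1, len(signals)):
    row, prev = [], []
    for j in range(2):
        i_star = max(range(2), key=lambda i: V[k-1][i] * P[i][j])
        prev.append(i_star)
        row.append(emit[j][signals[k]] * V[k-1][i_star] * P[i_star][j])
    V.append(row)
    back.append(prev)

# Backtrack: j_n maximizes V_n(j); earlier j_k are the stored maximizers.
path = [max(range(2), key=lambda j: V[-1][j])]
for prev in reversed(back):
    path.append(prev[path[-1]])
path.reverse()
best_states = [j + 1 for j in path]    # back to the labels 1, 2 of Example 4.38
```

For these three signals the most likely single sequence is 2, 2, 2, even though the marginal predictor of Section 4.11.1 picks state 1 at time 1 (P{X_1 = 1 | s_3} ≈ 0.577); the two prediction criteria can disagree.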


Exercises

3. In Exercise 2, suppose that if it has rained for the past three days, then it will rain today with probability 0.8; if it did not rain for any of the past three days, then it will rain today with probability 0.2; and in any other case the weather today will, with probability 0.6, be the same as the weather yesterday. Determine P for this Markov chain.

*4. Consider a process {X_n, n = 0, 1, ...} which takes on the values 0, 1, or 2. Suppose

P{X_{n+1} = j | X_n = i, X_{n−1} = i_{n−1}, ..., X_0 = i_0} = P^I_{ij} when n is even, and = P^II_{ij} when n is odd,

where Σ_{j=0}^{2} P^I_{ij} = Σ_{j=0}^{2} P^II_{ij} = 1, i = 0, 1, 2. Is {X_n, n ≥ 0} a Markov chain? If not, then show how, by enlarging the state space, we may transform it into a Markov chain.

5. A Markov chain {X_n, n ≥ 0} with states 0, 1, 2, has the transition probability matrix

1/2  1/3  1/6
0    1/3  2/3
1/2  0    1/2

If P{X_0 = 0} = P{X_0 = 1} = 1/4, find E[X_3].

6. Let the transition probability matrix of a two-state Markov chain be given, as in Example 4.2, by

P = | p      1 − p |
    | 1 − p  p     |

Show by mathematical induction that

P^(n) = | 1/2 + (1/2)(2p − 1)^n   1/2 − (1/2)(2p − 1)^n |
        | 1/2 − (1/2)(2p − 1)^n   1/2 + (1/2)(2p − 1)^n |

7. In Example 4.4 suppose that it has rained neither yesterday nor the day before yesterday. What is the probability that it will rain tomorrow?

8. Suppose that coin 1 has probability 0.7 of coming up heads, and coin 2 has probability 0.6 of coming up heads. If the coin flipped today comes up heads, then we select coin 1 to flip tomorrow, and if it comes up tails, then we select coin 2 to flip tomorrow. If the coin initially flipped is equally likely to be coin 1 or coin 2, then what is the probability that the coin flipped on the third day after the initial flip is coin 1?
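Before attempting the induction in Exercise 6, the claimed closed form for P^(n) is easy to check numerically. A quick sketch (the value p = 0.7 is arbitrary):

```python
# Numeric check (not a proof) of Exercise 6's closed form:
# the (0,0) entry of P^n should equal 1/2 + (1/2)(2p - 1)^n.
p = 0.7
P = [[p, 1 - p], [1 - p, p]]

def matmul(A, B):
    """Multiply two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Pn = [[1.0, 0.0], [0.0, 1.0]]          # identity = P^0
for n in range(1, 8):
    Pn = matmul(Pn, P)                 # Pn is now P^n
    expected = 0.5 + 0.5 * (2 * p - 1) ** n
    assert abs(Pn[0][0] - expected) < 1e-12
    assert abs(Pn[0][1] - (1 - expected)) < 1e-12
```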


9. Suppose in Exercise 8 that the coin flipped on Monday comes up heads. What is the probability that the coin flipped on Friday of the same week also comes up heads?

10. In Example 4.3, Gary is currently in a cheerful mood. What is the probability that he is not in a glum mood on any of the following three days?

11. In Example 4.3, Gary was in a glum mood four days ago. Given that he hasn't felt cheerful in a week, what is the probability he is feeling glum today?

12. For a Markov chain {X_n, n ≥ 0} with transition probabilities P_{i,j}, consider the conditional probability that X_n = m given that the chain started at time 0 in state i and has not yet entered state r by time n, where r is a specified state not equal to either i or m. We are interested in whether this conditional probability is equal to the n-stage transition probability of a Markov chain whose state space does not include state r and whose transition probabilities are

Q_{i,j} = P_{i,j} / (1 − P_{i,r}),  i, j ≠ r

Either prove the equality

P{X_n = m | X_0 = i, X_k ≠ r, k = 1, ..., n} = Q^n_{i,m}

or construct a counterexample.

13. Let P be the transition probability matrix of a Markov chain. Argue that if for some positive integer r, P^r has all positive entries, then so does P^n, for all integers n ≥ r.

14. Specify the classes of the following Markov chains, and determine whether they are transient or recurrent:

P_1 = | 0    1/2  1/2 |     P_2 = | 0    0    0    1 |
      | 1/2  0    1/2 |           | 0    0    0    1 |
      | 1/2  1/2  0   |           | 1/2  1/2  0    0 |
                                  | 0    0    1    0 |

P_3 = | 1/2  0    1/2  0    0   |     P_4 = | 1/4  3/4  0    0    0 |
      | 1/4  1/2  1/4  0    0   |           | 1/2  1/2  0    0    0 |
      | 1/2  0    1/2  0    0   |           | 0    0    1    0    0 |
      | 0    0    0    1/2  1/2 |           | 0    0    1/3  2/3  0 |
      | 0    0    0    1/2  1/2 |           | 1    0    0    0    0 |

15. Prove that if the number of states in a Markov chain is M, and if state j can be reached from state i, then it can be reached in M steps or less.


*16. Show that if state i is recurrent and state i does not communicate with state j, then P_{ij} = 0. This implies that once a process enters a recurrent class of states it can never leave that class. For this reason, a recurrent class is often referred to as a closed class.

17. For the random walk of Example 4.15 use the strong law of large numbers to give another proof that the Markov chain is transient when p ≠ 1/2.

Hint: Note that the state at time n can be written as Σ_{i=1}^{n} Y_i where the Y_i's are independent and P{Y_i = 1} = p = 1 − P{Y_i = −1}. Argue that if p > 1/2, then, by the strong law of large numbers, Σ_1^n Y_i → ∞ as n → ∞ and hence the initial state 0 can be visited only finitely often, and hence must be transient. A similar argument holds when p < 1/2.

18. Coin 1 comes up heads with probability 0.6 and coin 2 with probability 0.5. A coin is continually flipped until it comes up tails, at which time that coin is put aside and we start flipping the other one.

(a) What proportion of flips use coin 1?
(b) If we start the process with coin 1 what is the probability that coin 2 is used on the fifth flip?

19. For Example 4.4, calculate the proportion of days that it rains.

20. A transition probability matrix P is said to be doubly stochastic if the sum over each column equals one; that is,

Σ_i P_{ij} = 1, for all j

If such a chain is irreducible and aperiodic and consists of M + 1 states 0, 1, ..., M, show that the limiting probabilities are given by

π_j = 1/(M + 1),  j = 0, 1, ..., M

*21. A DNA nucleotide has any of 4 values. A standard model for a mutational change of the nucleotide at a specific location is a Markov chain model that supposes that in going from period to period the nucleotide does not change with probability 1 − 3α, and if it does change then it is equally likely to change to any of the other 3 values, for some 0 < α < 1/3.


Hint: Define an appropriate Markov chain and apply the results of Exercise 20.

23. Trials are performed in sequence. If the last two trials were successes, then the next trial is a success with probability 0.8; otherwise the next trial is a success with probability 0.5. In the long run, what proportion of trials are successes?

24. Consider three urns, one colored red, one white, and one blue. The red urn contains 1 red and 4 blue balls; the white urn contains 3 white balls, 2 red balls, and 2 blue balls; the blue urn contains 4 white balls, 3 red balls, and 2 blue balls. At the initial stage, a ball is randomly selected from the red urn and then returned to that urn. At every subsequent stage, a ball is randomly selected from the urn whose color is the same as that of the ball previously selected and is then returned to that urn. In the long run, what proportion of the selected balls are red? What proportion are white? What proportion are blue?

25. Each morning an individual leaves his house and goes for a run. He is equally likely to leave either from his front or back door. Upon leaving the house, he chooses a pair of running shoes (or goes running barefoot if there are no shoes at the door from which he departed). On his return he is equally likely to enter, and leave his running shoes, either by the front or back door. If he owns a total of k pairs of running shoes, what proportion of the time does he run barefooted?

26. Consider the following approach to shuffling a deck of n cards. Starting with any initial ordering of the cards, one of the numbers 1, 2, ..., n is randomly chosen in such a manner that each one is equally likely to be selected. If number i is chosen, then we take the card that is in position i and put it on top of the deck—that is, we put that card in position 1. We then repeatedly perform the same operation. Show that, in the limit, the deck is perfectly shuffled in the sense that the resultant ordering is equally likely to be any of the n!
possible orderings.

*27. Determine the limiting probabilities π_j for the model presented in Exercise 1. Give an intuitive explanation of your answer.

28. For a series of dependent trials the probability of success on any trial is (k + 1)/(k + 2) where k is equal to the number of successes on the previous two trials. Compute lim_{n→∞} P{success on the nth trial}.

29. An organization has N employees where N is a large number. Each employee has one of three possible job classifications and changes classifications (independently) according to a Markov chain with transition probabilities

0.7  0.2  0.1
0.2  0.6  0.2
0.1  0.4  0.5

What percentage of employees are in each classification?
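Limiting-probability questions like Exercise 29 come down to solving π = πP together with Σ_j π_j = 1. One numerical way to check an answer (a sketch, not part of the text) is plain power iteration, repeatedly pushing a distribution through the chain until it stops changing:

```python
# Transition matrix of Exercise 29.
P = [[0.7, 0.2, 0.1],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]

pi = [1/3, 1/3, 1/3]                  # any starting distribution works
for _ in range(500):                  # chain is irreducible and aperiodic
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

# pi now satisfies pi = pi P (up to rounding) and sums to 1.
```

The same loop, with the appropriate chain written down first, gives numerical checks for Exercises 23, 24, and 30 through 33 as well.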


30. Three out of every four trucks on the road are followed by a car, while only one out of every five cars is followed by a truck. What fraction of vehicles on the road are trucks?

31. A certain town never has two sunny days in a row. Each day is classified as being either sunny, cloudy (but dry), or rainy. If it is sunny one day, then it is equally likely to be either cloudy or rainy the next day. If it is rainy or cloudy one day, then there is one chance in two that it will be the same the next day, and if it changes then it is equally likely to be either of the other two possibilities. In the long run, what proportion of days are sunny? What proportion are cloudy?

*32. Each of two switches is either on or off during a day. On day n, each switch will independently be on with probability

[1 + number of on switches during day n − 1]/4

For instance, if both switches are on during day n − 1, then each will independently be on during day n with probability 3/4. What fraction of days are both switches on? What fraction are both off?

33. A professor continually gives exams to her students. She can give three possible types of exams, and her class is graded as either having done well or badly. Let p_i denote the probability that the class does well on a type i exam, and suppose that p_1 = 0.3, p_2 = 0.6, and p_3 = 0.9. If the class does well on an exam, then the next exam is equally likely to be any of the three types. If the class does badly, then the next exam is always type 1. What proportion of exams are type i, i = 1, 2, 3?

34. A flea moves around the vertices of a triangle in the following manner: Whenever it is at vertex i it moves to its clockwise neighbor vertex with probability p_i and to the counterclockwise neighbor with probability q_i = 1 − p_i, i = 1, 2, 3.

(a) Find the proportion of time that the flea is at each of the vertices.
(b) How often does the flea make a counterclockwise move which is then followed by five consecutive clockwise moves?

35.
Consider a Markov chain with states 0, 1, 2, 3, 4. Suppose P_{0,4} = 1; and suppose that when the chain is in state i, i > 0, the next state is equally likely to be any of the states 0, 1, ..., i − 1. Find the limiting probabilities of this Markov chain.

36. The state of a process changes daily according to a two-state Markov chain. If the process is in state i during one day, then it is in state j the following day with probability P_{i,j}, where

P_{0,0} = 0.4,  P_{0,1} = 0.6,  P_{1,0} = 0.2,  P_{1,1} = 0.8

Every day a message is sent. If the state of the Markov chain that day is i then the message sent is "good" with probability p_i and is "bad" with probability q_i = 1 − p_i, i = 0, 1.

(a) If the process is in state 0 on Monday, what is the probability that a good message is sent on Tuesday?
(b) If the process is in state 0 on Monday, what is the probability that a good message is sent on Friday?
(c) In the long run, what proportion of messages are good?
(d) Let Y_n equal 1 if a good message is sent on day n and let it equal 2 otherwise. Is {Y_n, n ≥ 1} a Markov chain? If so, give its transition probability matrix. If not, briefly explain why not.

37. Show that the stationary probabilities for the Markov chain having transition probabilities P_{i,j} are also the stationary probabilities for the Markov chain whose transition probabilities Q_{i,j} are given by

Q_{i,j} = P^k_{i,j}

for any specified positive integer k.

38. Recall that state i is said to be positive recurrent if m_{i,i} < ∞, where m_{i,i} is the expected number of transitions until the Markov chain, starting in state i, makes a transition back into that state. Because π_i, the long run proportion of time the Markov chain, starting in state i, spends in state i, satisfies

π_i = 1/m_{i,i}

it follows that state i is positive recurrent if and only if π_i > 0. Suppose that state i is positive recurrent and that state i communicates with state j. Show that state j is also positive recurrent by arguing that there is an integer n such that

π_j ≥ π_i P^n_{i,j} > 0

39. Recall that a recurrent state that is not positive recurrent is called null recurrent. Use the result of Exercise 38 to prove that null recurrence is a class property. That is, if state i is null recurrent and state i communicates with state j, show that state j is also null recurrent.


40. It follows from the argument made in Exercise 38 that state i is null recurrent if it is recurrent and π_i = 0. Consider the one-dimensional symmetric random walk of Example 4.15.

(a) Argue that π_i = π_0 for all i.
(b) Argue that all states are null recurrent.

*41. Let π_i denote the long-run proportion of time a given irreducible Markov chain is in state i.

(a) Explain why π_i is also the proportion of transitions that are into state i as well as being the proportion of transitions that are from state i.
(b) π_i P_{ij} represents the proportion of transitions that satisfy what property?
(c) Σ_i π_i P_{ij} represents the proportion of transitions that satisfy what property?
(d) Using the preceding, explain why

π_j = Σ_i π_i P_{ij}

42. Let A be a set of states, and let A^c be the remaining states.

(a) What is the interpretation of

Σ_{i∈A} Σ_{j∈A^c} π_i P_{ij}?

(b) What is the interpretation of

Σ_{i∈A^c} Σ_{j∈A} π_i P_{ij}?

(c) Explain the identity

Σ_{i∈A} Σ_{j∈A^c} π_i P_{ij} = Σ_{i∈A^c} Σ_{j∈A} π_i P_{ij}

43. Each day, one of n possible elements is requested, the ith one with probability P_i, i ≥ 1, Σ_1^n P_i = 1. These elements are at all times arranged in an ordered list which is revised as follows: The element selected is moved to the front of the list with the relative positions of all the other elements remaining unchanged. Define the state at any time to be the list ordering at that time and note that there are n! possible states.

(a) Argue that the preceding is a Markov chain.


(b) For any state i_1, ..., i_n (which is a permutation of 1, 2, ..., n), let π(i_1, ..., i_n) denote the limiting probability. In order for the state to be i_1, ..., i_n, it is necessary for the last request to be for i_1, the last non-i_1 request for i_2, the last non-i_1 or i_2 request for i_3, and so on. Hence, it appears intuitive that

π(i_1, ..., i_n) = P_{i_1} · P_{i_2}/(1 − P_{i_1}) · P_{i_3}/(1 − P_{i_1} − P_{i_2}) ··· P_{i_{n−1}}/(1 − P_{i_1} − ··· − P_{i_{n−2}})

Verify when n = 3 that the preceding are indeed the limiting probabilities.

44. Suppose that a population consists of a fixed number, say, m, of genes in any generation. Each gene is one of two possible genetic types. If any generation has exactly i (of its m) genes being type 1, then the next generation will have j type 1 (and m − j type 2) genes with probability

(m choose j) (i/m)^j ((m − i)/m)^{m−j},  j = 0, 1, ..., m

Let X_n denote the number of type 1 genes in the nth generation, and assume that X_0 = i.

(a) Find E[X_n].
(b) What is the probability that eventually all the genes will be type 1?

45. Consider an irreducible finite Markov chain with states 0, 1, ..., N.

(a) Starting in state i, what is the probability the process will ever visit state j? Explain!
(b) Let x_i = P{visit state N before state 0 | start in i}. Compute a set of linear equations which the x_i satisfy, i = 0, 1, ..., N.
(c) If Σ_j j P_{ij} = i for i = 1, ..., N − 1, show that x_i = i/N is a solution to the equations in part (b).

46. An individual possesses r umbrellas which he employs in going from his home to office, and vice versa. If he is at home (the office) at the beginning (end) of a day and it is raining, then he will take an umbrella with him to the office (home), provided there is one to be taken. If it is not raining, then he never takes an umbrella. Assume that, independent of the past, it rains at the beginning (end) of a day with probability p.

(i) Define a Markov chain with r + 1 states which will help us to determine the proportion of time that our man gets wet.
(Note: He gets wet if it is raining, and all umbrellas are at his other location.)


(ii) Show that the limiting probabilities are given by

π_i = q/(r + q) if i = 0, and π_i = 1/(r + q) if i = 1, ..., r, where q = 1 − p

(iii) What fraction of time does our man get wet?
(iv) When r = 3, what value of p maximizes the fraction of time he gets wet?

*47. Let {X_n, n ≥ 0} denote an ergodic Markov chain with limiting probabilities π_i. Define the process {Y_n, n ≥ 1} by Y_n = (X_{n−1}, X_n). That is, Y_n keeps track of the last two states of the original chain. Is {Y_n, n ≥ 1} a Markov chain? If so, determine its transition probabilities and find

lim_{n→∞} P{Y_n = (i, j)}

48. Verify the transition probability matrix given in Example 4.20.

49. Let P^(1) and P^(2) denote transition probability matrices for ergodic Markov chains having the same state space. Let π^1 and π^2 denote the stationary (limiting) probability vectors for the two chains. Consider a process defined as follows:

(i) X_0 = 1. A coin is then flipped and if it comes up heads, then the remaining states X_1, ... are obtained from the transition probability matrix P^(1) and if tails from the matrix P^(2). Is {X_n, n ≥ 0} a Markov chain? If p = P{coin comes up heads}, what is lim_{n→∞} P(X_n = i)?
(ii) X_0 = 1. At each stage the coin is flipped and if it comes up heads, then the next state is chosen according to P^(1) and if tails comes up, then it is chosen according to P^(2). In this case do the successive states constitute a Markov chain? If so, determine the transition probabilities. Show by a counterexample that the limiting probabilities are not the same as in part (i).

50. In Exercise 8, if today's flip lands heads, what is the expected number of additional flips needed until the pattern t, t, h, t, h, t, t occurs?

51. In Example 4.3, Gary is in a cheerful mood today. Find the expected number of days until he has been glum for three consecutive days.

52. A taxi driver provides service in two zones of a city. Fares picked up in zone A will have destinations in zone A with probability 0.6 or in zone B with probability 0.4.
Fares picked up in zone B will have destinations in zone A with probability 0.3 or in zone B with probability 0.7. The driver's expected profit for a trip entirely in zone A is 6; for a trip entirely in zone B is 8; and for a trip that involves both zones is 12. Find the taxi driver's average profit per trip.


53. Find the average premium received per policyholder of the insurance company of Example 4.23 if λ = 1/4 for one-third of its clients, and λ = 1/2 for two-thirds of its clients.

54. Consider the Ehrenfest urn model in which M molecules are distributed between two urns, and at each time point one of the molecules is chosen at random and is then removed from its urn and placed in the other one. Let X_n denote the number of molecules in urn 1 after the nth switch and let μ_n = E[X_n]. Show that

(i) μ_{n+1} = 1 + (1 − 2/M)μ_n.
(ii) Use (i) to prove that

μ_n = M/2 + ((M − 2)/M)^n (E[X_0] − M/2)

55. Consider a population of individuals each of whom possesses two genes which can be either type A or type a. Suppose that in outward appearance type A is dominant and type a is recessive. (That is, an individual will have only the outward characteristics of the recessive gene if its pair is aa.) Suppose that the population has stabilized, and the percentages of individuals having respective gene pairs AA, aa, and Aa are p, q, and r. Call an individual dominant or recessive depending on the outward characteristics it exhibits. Let S_11 denote the probability that an offspring of two dominant parents will be recessive; and let S_10 denote the probability that the offspring of one dominant and one recessive parent will be recessive. Compute S_11 and S_10 to show that S_11 = S_10². (The quantities S_10 and S_11 are known in the genetics literature as Snyder's ratios.)

56. Suppose that on each play of the game a gambler either wins 1 with probability p or loses 1 with probability 1 − p. The gambler continues betting until she or he is either winning n or losing m. What is the probability that the gambler quits a winner?

57. A particle moves among n + 1 vertices that are situated on a circle in the following manner. At each step it moves one step either in the clockwise direction with probability p or the counterclockwise direction with probability q = 1 − p.
Starting at a specified state, call it state 0, let T be the time of the first return to state 0. Find the probability that all states have been visited by time T.

Hint: Condition on the initial transition and then use results from the gambler's ruin problem.


58. In the gambler's ruin problem of Section 4.5.1, suppose the gambler's fortune is presently i, and suppose that we know that the gambler's fortune will eventually reach N (before it goes to 0). Given this information, show that the probability he wins the next gamble is

p[1 − (q/p)^{i+1}] / [1 − (q/p)^i] if p ≠ 1/2,  and  (i + 1)/(2i) if p = 1/2

Hint: The probability we want is

P{X_{n+1} = i + 1 | X_n = i, lim_{m→∞} X_m = N} = P{X_{n+1} = i + 1, lim_m X_m = N | X_n = i} / P{lim_m X_m = N | X_n = i}

59. For the gambler's ruin model of Section 4.5.1, let M_i denote the mean number of games that must be played until the gambler either goes broke or reaches a fortune of N, given that he starts with i, i = 0, 1, ..., N. Show that M_i satisfies

M_0 = M_N = 0;  M_i = 1 + pM_{i+1} + qM_{i−1},  i = 1, ..., N − 1

60. Solve the equations given in Exercise 59 to obtain

M_i = i(N − i) if p = 1/2, and
M_i = i/(q − p) − [N/(q − p)] · [1 − (q/p)^i]/[1 − (q/p)^N] if p ≠ 1/2

61. Suppose in the gambler's ruin problem that the probability of winning a bet depends on the gambler's present fortune. Specifically, suppose that α_i is the probability that the gambler wins a bet when his or her fortune is i. Given that the gambler's initial fortune is i, let P(i) denote the probability that the gambler's fortune reaches N before 0.

(a) Derive a formula that relates P(i) to P(i − 1) and P(i + 1).
(b) Using the same approach as in the gambler's ruin problem, solve the equation of part (a) for P(i).
(c) Suppose that i balls are initially in urn 1 and N − i are in urn 2, and suppose that at each stage one of the N balls is randomly chosen, taken from whichever urn it is in, and placed in the other urn. Find the probability that the first urn becomes empty before the second.


*62. In Exercise 21,
(a) what is the expected number of steps the particle takes to return to the starting position?
(b) what is the probability that all other positions are visited before the particle returns to its starting state?

63. For the Markov chain with states 1, 2, 3, 4 whose transition probability matrix P is as specified below, find f_{i3} and s_{i3} for i = 1, 2, 3.

        | 0.4  0.2  0.1  0.3 |
    P = | 0.1  0.5  0.2  0.2 |
        | 0.3  0.4  0.2  0.1 |
        | 0    0    0    1   |

64. Consider a branching process having μ < 1. Show that if X_0 = 1, then the expected number of individuals that ever exist in this population is 1/(1 − μ), and that if X_0 = n it is n/(1 − μ).

65. For a branching process having μ > 1, prove that π_0 is the smallest positive number satisfying Equation (4.16).

Hint: Let π be any solution of π = ∑_{j=0}^∞ π^j P_j. Show by mathematical induction that π ≥ P{X_n = 0} for all n, and let n → ∞. In using the induction argue that

    P{X_n = 0} = ∑_{j=0}^∞ (P{X_{n−1} = 0})^j P_j

66. For a branching process, calculate π_0 when
(a) P_0 = 1/4, P_2 = 3/4.
(b) P_0 = 1/4, P_1 = 1/2, P_2 = 1/4.
(c) P_0 = 1/6, P_1 = 1/2, P_3 = 1/3.

67. At all times, an urn contains N balls—some white balls and some black balls. At each stage, a coin having probability p, 0 < p < 1, of landing heads is flipped. If heads appears, then a ball is chosen at random from the urn and is replaced by a white ball; if tails appears, then a ball is chosen from the urn and is replaced by a black ball. Let X_n denote the number of white balls in the urn after the nth stage.
(a) Is {X_n, n ≥ 0} a Markov chain? If so, explain why.
(b) What are its classes? What are their periods? Are they transient or recurrent?


(c) Compute the transition probabilities P_ij.
(d) Let N = 2. Find the proportion of time in each state.
(e) Based on your answer in part (d) and your intuition, guess the answer for the limiting probability in the general case.
(f) Prove your guess in part (e) either by showing that Equation (4.7) is satisfied or by using the results of Example 4.31.
(g) If p = 1, what is the expected time until there are only white balls in the urn if initially there are i white and N − i black?

*68. (a) Show that the limiting probabilities of the reversed Markov chain are the same as for the forward chain by showing that they satisfy the equations

    π_j = ∑_i π_i Q_ij

(b) Give an intuitive explanation for the result of part (a).

69. M balls are initially distributed among m urns. At each stage one of the balls is selected at random, taken from whichever urn it is in, and then placed, at random, in one of the other m − 1 urns. Consider the Markov chain whose state at any time is the vector (n_1, ..., n_m), where n_i denotes the number of balls in urn i. Guess at the limiting probabilities for this Markov chain and then verify your guess and show at the same time that the Markov chain is time reversible.

70. A total of m white and m black balls are distributed among two urns, with each urn containing m balls. At each stage, a ball is randomly selected from each urn and the two selected balls are interchanged. Let X_n denote the number of black balls in urn 1 after the nth interchange.
(a) Give the transition probabilities of the Markov chain {X_n, n ≥ 0}.
(b) Without any computations, what do you think are the limiting probabilities of this chain?
(c) Find the limiting probabilities and show that the stationary chain is time reversible.

71.
It follows from Theorem 4.2 that for a time reversible Markov chain

    P_ij P_jk P_ki = P_ik P_kj P_ji,  for all i, j, k

It turns out that if the state space is finite and P_ij > 0 for all i, j, then the preceding is also a sufficient condition for time reversibility. [That is, in this case, we need only check Equation (4.26) for paths from i to i that have only two intermediate states.] Prove this.


Hint: Fix i and show that the equations

    π_j P_jk = π_k P_kj

are satisfied by π_j = cP_ij/P_ji, where c is chosen so that ∑_j π_j = 1.

72. For a time reversible Markov chain, argue that the rate at which transitions from i to j to k occur must equal the rate at which transitions from k to j to i occur.

73. Show that the Markov chain of Exercise 31 is time reversible.

74. A group of n processors is arranged in an ordered list. When a job arrives, the first processor in line attempts it; if it is unsuccessful, then the next in line tries it; if it too is unsuccessful, then the next in line tries it, and so on. When the job is successfully processed or after all processors have been unsuccessful, the job leaves the system. At this point we are allowed to reorder the processors, and a new job appears. Suppose that we use the one-closer reordering rule, which moves the processor that was successful one closer to the front of the line by interchanging its position with the one in front of it. If all processors were unsuccessful (or if the processor in the first position was successful), then the ordering remains the same. Suppose that each time processor i attempts a job then, independently of anything else, it is successful with probability p_i.
(a) Define an appropriate Markov chain to analyze this model.
(b) Show that this Markov chain is time reversible.
(c) Find the long-run probabilities.

75. A Markov chain is said to be a tree process if
(i) P_ij > 0 whenever P_ji > 0,
(ii) for every pair of states i and j, i ≠ j, there is a unique sequence of distinct states i = i_0, i_1, ..., i_{n−1}, i_n = j such that

    P_{i_k, i_{k+1}} > 0,  k = 0, 1, ..., n − 1

In other words, a Markov chain is a tree process if for every pair of distinct states i and j there is a unique way for the process to go from i to j without reentering a state (and this path is the reverse of the unique path from j to i). Argue that an ergodic tree process is time reversible.

76.
On a chessboard compute the expected number of plays it takes a knight,starting in one of the four corners of the chessboard, to return to its initial positionif we assume that at each play it is equally likely to choose any of its legal moves.(No other pieces are on the board.)
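The knight's random walk is a random walk on a graph, so its stationary probability at square v is proportional to the number of legal moves d(v) from v, and the expected return time to v is ∑_u d(u) / d(v). A Python sketch that computes the degrees directly (board size and the corner square are the exercise's; the code layout is ours):

```python
# Exercise 76: expected return time of a knight's random walk to a corner,
# using the random-walk-on-a-graph result: pi_v is proportional to d(v),
# so the expected return time to v is sum_u d(u) / d(v).

KNIGHT_MOVES = [(1, 2), (2, 1), (-1, 2), (-2, 1),
                (1, -2), (2, -1), (-1, -2), (-2, -1)]

def degree(row, col, n=8):
    """Number of legal knight moves from (row, col) on an n x n board."""
    return sum(0 <= row + dr < n and 0 <= col + dc < n
               for dr, dc in KNIGHT_MOVES)

total_degree = sum(degree(r, c) for r in range(8) for c in range(8))
corner_expected_return = total_degree / degree(0, 0)
```

The corner has only 2 legal moves while the total degree over the board is 336, giving an expected return time of 168 plays.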


Hint: Make use of Example 4.32.

77. In a Markov decision problem, another criterion often used, different than the expected average return per unit time, is that of the expected discounted return. In this criterion we choose a number α, 0 < α < 1,


(c) Let {y_ja} be a set of numbers satisfying

    ∑_j ∑_a y_ja = 1/(1 − α),
    ∑_a y_ja = b_j + α ∑_i ∑_a y_ia P_ij(a)          (4.38)

Argue that y_ja can be interpreted as the expected discounted time that the process is in state j and action a is chosen when the initial state is chosen according to the probabilities b_j and the policy β, given by

    β_i(a) = y_ia / ∑_a y_ia

is employed.

Hint: Derive a set of equations for the expected discounted times when policy β is used and show that they are equivalent to Equation (4.38).

(d) Argue that an optimal policy with respect to the expected discounted return criterion can be obtained by first solving the linear program

    maximize   ∑_j ∑_a y_ja R(j, a),
    such that  ∑_j ∑_a y_ja = 1/(1 − α),
               ∑_a y_ja = b_j + α ∑_i ∑_a y_ia P_ij(a),
               y_ja ≥ 0,  all j, a;

and then defining the policy β* by

    β*_i(a) = y*_ia / ∑_a y*_ia

where the y*_ja are the solutions of the linear program.

78. For the Markov chain of Exercise 5, suppose that p(s|j) is the probability that signal s is emitted when the underlying Markov chain state is j, j = 0, 1, 2.
(a) What proportion of emissions are signal s?
(b) What proportion of the times in which signal s is emitted is the underlying state 0?

79. In Example 4.39, what is the probability that the first 4 items produced are all acceptable?




5. The Exponential Distribution and the Poisson Process

5.1. Introduction

In making a mathematical model for a real-world phenomenon it is always necessary to make certain simplifying assumptions so as to render the mathematics tractable. On the other hand, however, we cannot make too many simplifying assumptions, for then our conclusions, obtained from the mathematical model, would not be applicable to the real-world situation. In short, we must make enough simplifying assumptions to enable us to handle the mathematics but not so many that the mathematical model no longer resembles the real-world phenomenon. One simplifying assumption that is often made is to assume that certain random variables are exponentially distributed. The reason is that the exponential distribution is both relatively easy to work with and often a good approximation to the actual distribution.

The property of the exponential distribution that makes it easy to analyze is that it does not deteriorate with time. By this we mean that if the lifetime of an item is exponentially distributed, then an item that has been in use for ten (or any number of) hours is as good as a new item with regard to the amount of time remaining until the item fails. This will be formally defined in Section 5.2, where it will be shown that the exponential is the only distribution which possesses this property.

In Section 5.3 we shall study counting processes with an emphasis on a kind of counting process known as the Poisson process. Among other things we shall discover about this process is its intimate connection with the exponential distribution.


5.2. The Exponential Distribution

5.2.1. Definition

A continuous random variable X is said to have an exponential distribution with parameter λ, λ > 0, if its probability density function is given by

    f(x) = λe^{−λx},  x ≥ 0
         = 0,          x < 0


Consequently,

    Var(X) = E[X²] − (E[X])²
           = 2/λ² − 1/λ²
           = 1/λ²

Example 5.1 (Exponential Random Variables and Expected Discounted Returns) Suppose that you are receiving rewards at randomly changing rates continuously throughout time. Let R(x) denote the random rate at which you are receiving rewards at time x. For a value α > 0, called the discount rate, the quantity

    R = ∫₀^∞ e^{−αx} R(x) dx

represents the total discounted reward. (In certain applications, α is a continuously compounded interest rate, and R is the present value of the infinite flow of rewards.) Whereas

    E[R] = E[∫₀^∞ e^{−αx} R(x) dx] = ∫₀^∞ e^{−αx} E[R(x)] dx

is the expected total discounted reward, we will show that it is also equal to the expected total reward earned up to an exponentially distributed random time with rate α.

Let T be an exponential random variable with rate α that is independent of all the random variables R(x). We want to argue that

    ∫₀^∞ e^{−αx} E[R(x)] dx = E[∫₀^T R(x) dx]

To show this, define for each x ≥ 0 a random variable I(x) by

    I(x) = 1, if x ≤ T
         = 0, if x > T

and note that

    ∫₀^T R(x) dx = ∫₀^∞ R(x) I(x) dx


Thus,

    E[∫₀^T R(x) dx] = E[∫₀^∞ R(x) I(x) dx]
                    = ∫₀^∞ E[R(x) I(x)] dx
                    = ∫₀^∞ E[R(x)] E[I(x)] dx       (by independence)
                    = ∫₀^∞ E[R(x)] P{T ≥ x} dx
                    = ∫₀^∞ e^{−αx} E[R(x)] dx

Therefore, the expected total discounted reward is equal to the expected total (undiscounted) reward earned by a random time that is exponentially distributed with a rate equal to the discount factor.

5.2.2. Properties of the Exponential Distribution

A random variable X is said to be without memory, or memoryless, if

    P{X > s + t | X > t} = P{X > s}  for all s, t ≥ 0     (5.2)

If we think of X as being the lifetime of some instrument, then Equation (5.2) states that the probability that the instrument lives for at least s + t hours, given that it has survived t hours, is the same as the initial probability that it lives for at least s hours. In other words, if the instrument is alive at time t, then the distribution of the remaining amount of time that it survives is the same as the original lifetime distribution; that is, the instrument does not remember that it has already been in use for a time t.

The condition in Equation (5.2) is equivalent to

    P{X > s + t, X > t} / P{X > t} = P{X > s}

or

    P{X > s + t} = P{X > s} P{X > t}     (5.3)

Since Equation (5.3) is satisfied when X is exponentially distributed (for e^{−λ(s+t)} = e^{−λs} e^{−λt}), it follows that exponentially distributed random variables are memoryless.
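The memoryless property (5.2) is easy to check by simulation. A quick Monte Carlo sketch (the rate, s, t, and sample size are arbitrary):

```python
# Check P{X > s + t | X > t} ~= P{X > s} for exponential X, property (5.2).
import random

def tail_given_survival(rate, s, t, n=200_000, seed=1):
    rng = random.Random(seed)
    xs = [rng.expovariate(rate) for _ in range(n)]
    survivors = [x for x in xs if x > t]
    cond = sum(x > s + t for x in survivors) / len(survivors)
    uncond = sum(x > s for x in xs) / n
    return cond, uncond
```

Both estimates should be near e^{−λs}; for a non-exponential lifetime (say, uniform) the two would differ.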


Example 5.2 Suppose that the amount of time one spends in a bank is exponentially distributed with mean ten minutes, that is, λ = 1/10. What is the probability that a customer will spend more than fifteen minutes in the bank? What is the probability that a customer will spend more than fifteen minutes in the bank given that she is still in the bank after ten minutes?

Solution: If X represents the amount of time that the customer spends in the bank, then the first probability is just

    P{X > 15} = e^{−15λ} = e^{−3/2} ≈ 0.223

The second question asks for the probability that a customer who has spent ten minutes in the bank will have to spend at least five more minutes. However, since the exponential distribution does not "remember" that the customer has already spent ten minutes in the bank, this must equal the probability that an entering customer spends at least five minutes in the bank. That is, the desired probability is just

    P{X > 5} = e^{−5λ} = e^{−1/2} ≈ 0.607

Example 5.3 Consider a post office that is run by two clerks. Suppose that when Mr. Smith enters the system he discovers that Mr. Jones is being served by one of the clerks and Mr. Brown by the other. Suppose also that Mr. Smith is told that his service will begin as soon as either Jones or Brown leaves. If the amount of time that a clerk spends with a customer is exponentially distributed with mean 1/λ, what is the probability that, of the three customers, Mr. Smith is the last to leave the post office?

Solution: The answer is obtained by this reasoning: Consider the time at which Mr. Smith first finds a free clerk. At this point either Mr. Jones or Mr. Brown would have just left and the other one would still be in service. However, by the lack of memory of the exponential, it follows that the amount of time that this other man (either Jones or Brown) would still have to spend in the post office is exponentially distributed with mean 1/λ.
That is, it is the same as if he were just starting his service at this point. Hence, by symmetry, the probability that he finishes before Smith must equal 1/2.

Example 5.4 The dollar amount of damage involved in an automobile accident is an exponential random variable with mean 1000. Of this, the insurance company only pays that amount exceeding (the deductible amount of) 400. Find the expected value and the standard deviation of the amount the insurance company pays per accident.


Solution: If X is the dollar amount of damage resulting from an accident, then the amount paid by the insurance company is (X − 400)^+, where a^+ is defined to equal a if a > 0 and to equal 0 if a ≤ 0. Whereas we could certainly determine the expected value and variance of (X − 400)^+ from first principles, it is easier to condition on whether X exceeds 400. So, let

    I = 1, if X > 400
      = 0, if X ≤ 400

Let Y = (X − 400)^+ be the amount paid. By the lack of memory property of the exponential, it follows that if a damage amount exceeds 400, then the amount by which it exceeds it is exponential with mean 1000. Therefore,

    E[Y | I = 1] = 1000,        E[Y | I = 0] = 0
    Var(Y | I = 1) = (1000)²,   Var(Y | I = 0) = 0

which can be conveniently written as

    E[Y | I] = 10³ I,    Var(Y | I) = 10⁶ I

Because I is a Bernoulli random variable that is equal to 1 with probability e^{−0.4}, we obtain

    E[Y] = E[E[Y | I]] = 10³ E[I] = 10³ e^{−0.4} ≈ 670.32

and, by the conditional variance formula,

    Var(Y) = E[Var(Y | I)] + Var(E[Y | I])
           = 10⁶ e^{−0.4} + 10⁶ e^{−0.4}(1 − e^{−0.4})

where the final equality used that the variance of a Bernoulli random variable with parameter p is p(1 − p). Consequently,

    √Var(Y) ≈ 944.09
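Both numbers in Example 5.4 can be reproduced exactly from the formulas above, and cross-checked by simulation (the sample size and seed are arbitrary):

```python
# Example 5.4: E[Y] = 1000*exp(-0.4) and
# Var(Y) = 10**6*exp(-0.4) + 10**6*exp(-0.4)*(1 - exp(-0.4)).
import math
import random

p = math.exp(-0.4)
mean_Y = 1000 * p
sd_Y = math.sqrt(1e6 * p + 1e6 * p * (1 - p))

# Monte Carlo cross-check: simulate damages with mean 1000, pay the excess
# over the 400 deductible.
rng = random.Random(42)
pays = [max(rng.expovariate(1 / 1000) - 400, 0) for _ in range(200_000)]
mc_mean = sum(pays) / len(pays)
```

The analytic values come out to about 670.32 and 944.09, matching the text.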


It turns out that not only is the exponential distribution "memoryless," but it is the unique distribution possessing this property. To see this, suppose that X is memoryless and let F̄(x) = P{X > x}. Then by Equation (5.3) it follows that

    F̄(s + t) = F̄(s) F̄(t)

That is, F̄(x) satisfies the functional equation

    g(s + t) = g(s) g(t)

However, it turns out that the only right continuous solution of this functional equation is

    g(x) = e^{−λx} *

and since a distribution function is always right continuous we must have

    F̄(x) = e^{−λx}

or

    F(x) = P{X ≤ x} = 1 − e^{−λx}

which shows that X is exponentially distributed.

Example 5.5 Suppose that the amount of time that a lightbulb works before burning itself out is exponentially distributed with mean ten hours. Suppose that a person enters a room in which a lightbulb is burning. If this person desires to work for five hours, then what is the probability that she will be able to complete her work without the bulb burning out? What can be said about this probability when the distribution is not exponential?

* This is proven as follows: If g(s + t) = g(s)g(t), then

    g(2/n) = g(1/n + 1/n) = g²(1/n)

and repeating this yields g(m/n) = g^m(1/n). Also,

    g(1) = g(1/n + ··· + 1/n) = g^n(1/n)   or   g(1/n) = (g(1))^{1/n}

Hence g(m/n) = (g(1))^{m/n}, which implies, since g is right continuous, that g(x) = (g(1))^x. Since g(1) = (g(1/2))² ≥ 0 we obtain g(x) = e^{−λx}, where λ = −log(g(1)).


Solution: Since the bulb is burning when the person enters the room it follows, by the memoryless property of the exponential, that its remaining lifetime is exponential with mean ten. Hence the desired probability is

    P{remaining lifetime > 5} = 1 − F(5) = e^{−5λ} = e^{−1/2}

However, if the lifetime distribution F is not exponential, then the relevant probability is

    P{lifetime > t + 5 | lifetime > t} = [1 − F(t + 5)] / [1 − F(t)]

where t is the amount of time that the bulb had been in use prior to the person entering the room. That is, if the distribution is not exponential then additional information is needed (namely, t) before the desired probability can be calculated. In fact, it is for this reason, namely, that the distribution of the remaining lifetime is independent of the amount of time that the object has already survived, that the assumption of an exponential distribution is so often made.

The memoryless property is further illustrated by the failure rate function (also called the hazard rate function) of the exponential distribution.

Consider a continuous positive random variable X having distribution function F and density f. The failure (or hazard) rate function r(t) is defined by

    r(t) = f(t) / [1 − F(t)]     (5.4)

To interpret r(t), suppose that an item, having lifetime X, has survived for t hours, and we desire the probability that it does not survive for an additional time dt. That is, consider P{X ∈ (t, t + dt) | X > t}. Now,

    P{X ∈ (t, t + dt) | X > t} = P{X ∈ (t, t + dt), X > t} / P{X > t}
                               = P{X ∈ (t, t + dt)} / P{X > t}
                               ≈ f(t) dt / [1 − F(t)] = r(t) dt

That is, r(t) represents the conditional probability density that a t-year-old item will fail.

Suppose now that the lifetime distribution is exponential. Then, by the memoryless property, it follows that the distribution of remaining life for a t-year-old


item is the same as for a new item. Hence r(t) should be constant. This checks out since

    r(t) = f(t) / [1 − F(t)] = λe^{−λt} / e^{−λt} = λ

Thus, the failure rate function for the exponential distribution is constant. The parameter λ is often referred to as the rate of the distribution. (Note that the rate is the reciprocal of the mean, and vice versa.)

It turns out that the failure rate function r(t) uniquely determines the distribution F. To prove this, we note by Equation (5.4) that

    r(t) = [(d/dt) F(t)] / [1 − F(t)]

Integrating both sides yields

    log(1 − F(t)) = −∫₀ᵗ r(s) ds + k

or

    1 − F(t) = e^k exp{−∫₀ᵗ r(s) ds}

Letting t = 0 shows that k = 0 and thus

    F(t) = 1 − exp{−∫₀ᵗ r(s) ds}

The preceding identity can also be used to show that exponential random variables are the only ones that are memoryless. Because if X is memoryless, then its failure rate function must be constant. But if r(t) = c, then by the preceding equation

    1 − F(t) = e^{−ct}

showing that the random variable is exponential.

Example 5.6 Let X_1, ..., X_n be independent exponential random variables with respective rates λ_1, ..., λ_n, where λ_i ≠ λ_j when i ≠ j. Let T be independent


of these random variables and suppose that

    ∑_{j=1}^n P_j = 1,  where P_j = P{T = j}

The random variable X_T is said to be a hyperexponential random variable. To see how such a random variable might originate, imagine that a bin contains n different types of batteries, with a type j battery lasting for an exponentially distributed time with rate λ_j, j = 1, ..., n. Suppose further that P_j is the proportion of batteries in the bin that are type j for each j = 1, ..., n. If a battery is randomly chosen, in the sense that it is equally likely to be any of the batteries in the bin, then the lifetime of the battery selected will have the hyperexponential distribution specified in the preceding.

To obtain the distribution function F of X = X_T, condition on T. This yields

    1 − F(t) = P{X > t}
             = ∑_{i=1}^n P{X > t | T = i} P{T = i}
             = ∑_{i=1}^n P_i e^{−λ_i t}

Differentiation of the preceding yields f, the density function of X:

    f(t) = ∑_{i=1}^n λ_i P_i e^{−λ_i t}

Consequently, the failure rate function of a hyperexponential random variable is

    r(t) = [∑_{j=1}^n P_j λ_j e^{−λ_j t}] / [∑_{i=1}^n P_i e^{−λ_i t}]

By noting that

    P{T = j | X > t} = P{X > t | T = j} P{T = j} / P{X > t}
                     = P_j e^{−λ_j t} / ∑_{i=1}^n P_i e^{−λ_i t}

we see that the failure rate function r(t) can also be written as

    r(t) = ∑_{j=1}^n λ_j P{T = j | X > t}
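The last expression is easy to evaluate numerically. A sketch showing that the hyperexponential failure rate decreases toward the smallest rate (the rates and mixing probabilities below are arbitrary):

```python
# Hyperexponential failure rate r(t) = f(t) / (1 - F(t)); it starts at the
# mixture-average rate and decreases toward min_j lam_j as t grows.
import math

def hyperexp_hazard(t, rates, probs):
    tail = sum(p * math.exp(-lam * t) for lam, p in zip(rates, probs))
    dens = sum(p * lam * math.exp(-lam * t) for lam, p in zip(rates, probs))
    return dens / tail

rates, probs = [1.0, 3.0], [0.5, 0.5]
```

With these values r(0) = 2 (the average rate) while r(t) → 1, the smaller rate, for large t.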


If λ_1 < λ_i for all i > 1, then

    P{T = 1 | X > t} = P_1 e^{−λ_1 t} / [P_1 e^{−λ_1 t} + ∑_{i=2}^n P_i e^{−λ_i t}]
                     = P_1 / [P_1 + ∑_{i=2}^n P_i e^{−(λ_i − λ_1) t}]
                     → 1  as t → ∞

Similarly, P{T = i | X > t} → 0 when i ≠ 1, thus showing that

    lim_{t→∞} r(t) = min_i λ_i

That is, as a randomly chosen battery ages its failure rate converges to the failure rate of the exponential type having the smallest failure rate, which is intuitive since the longer the battery lasts, the more likely it is a battery type with the smallest failure rate.

5.2.3. Further Properties of the Exponential Distribution

Let X_1, ..., X_n be independent and identically distributed exponential random variables having mean 1/λ. It follows from the results of Example 2.38 that X_1 + ··· + X_n has a gamma distribution with parameters n and λ. Let us now give a second verification of this result by using mathematical induction. Because there is nothing to prove when n = 1, let us start by assuming that X_1 + ··· + X_{n−1} has density given by

    f_{X_1 + ··· + X_{n−1}}(t) = λe^{−λt} (λt)^{n−2} / (n − 2)!

Hence,

    f_{X_1 + ··· + X_{n−1} + X_n}(t) = ∫₀^∞ f_{X_n}(t − s) f_{X_1 + ··· + X_{n−1}}(s) ds
                                     = ∫₀ᵗ λe^{−λ(t−s)} λe^{−λs} (λs)^{n−2} / (n − 2)! ds
                                     = λe^{−λt} (λt)^{n−1} / (n − 1)!

which proves the result.

Another useful calculation is to determine the probability that one exponential random variable is smaller than another. That is, suppose that X_1 and X_2 are independent exponential random variables with respective means 1/λ_1 and 1/λ_2;


what is P{X_1 < X_2}?
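A quick Monte Carlo check of this probability, which Equation (5.5) gives as λ_1/(λ_1 + λ_2) (the rates, sample size, and seed below are arbitrary):

```python
# Equation (5.5): for independent exponentials,
# P{X_1 < X_2} = lam1 / (lam1 + lam2).
import random

def prob_first_smaller(lam1, lam2, n=200_000, seed=3):
    rng = random.Random(seed)
    hits = sum(rng.expovariate(lam1) < rng.expovariate(lam2)
               for _ in range(n))
    return hits / n
```

For instance, with λ_1 = 1 and λ_2 = 2 the estimate is close to 1/3.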


The second algorithm, which we call Greedy Algorithm B, is a more "global" version of the first greedy algorithm. It considers all n² cost values and chooses the pair i_1, j_1 for which C(i, j) is minimal. It then assigns person i_1 to job j_1. It then eliminates all cost values involving either person i_1 or job j_1 [so that (n − 1)² values remain] and continues in the same fashion. That is, at each stage it chooses the person and job that have the smallest cost among all the unassigned people and jobs.

Under the assumption that the C_ij constitute a set of n² independent exponential random variables each having mean 1, which of the two algorithms results in a smaller expected total cost?

Solution: Suppose first that Greedy Algorithm A is employed. Let C_i denote the cost associated with person i, i = 1, ..., n. Now C_1 is the minimum of n independent exponentials each having rate 1; so by Equation (5.6) it will be exponential with rate n. Similarly, C_2 is the minimum of n − 1 independent exponentials with rate 1, and so is exponential with rate n − 1. Indeed, by the same reasoning C_i will be exponential with rate n − i + 1, i = 1, ..., n. Thus, the expected total cost under Greedy Algorithm A is

    E_A[total cost] = E[C_1 + ··· + C_n] = ∑_{i=1}^n 1/i

Let us now analyze Greedy Algorithm B. Let C_i be the cost of the ith person-job pair assigned by this algorithm. Since C_1 is the minimum of all the n² values C_ij, it follows from Equation (5.6) that C_1 is exponential with rate n². Now, it follows from the lack of memory property of the exponential that the amounts by which the other C_ij exceed C_1 will be independent exponentials with rate 1. As a result, C_2 is equal to C_1 plus the minimum of (n − 1)² independent exponentials with rate 1. Similarly, C_3 is equal to C_2 plus the minimum of (n − 2)² independent exponentials with rate 1, and so on.
Therefore, we see that

    E[C_1] = 1/n²,
    E[C_2] = E[C_1] + 1/(n − 1)²,
    E[C_3] = E[C_2] + 1/(n − 2)²,
    ...
    E[C_j] = E[C_{j−1}] + 1/(n − j + 1)²,
    ...
    E[C_n] = E[C_{n−1}] + 1


Therefore,

    E[C_1] = 1/n²,
    E[C_2] = 1/n² + 1/(n − 1)²,
    E[C_3] = 1/n² + 1/(n − 1)² + 1/(n − 2)²,
    ...
    E[C_n] = 1/n² + 1/(n − 1)² + 1/(n − 2)² + ··· + 1

Adding up all the E[C_i] yields that

    E_B[total cost] = n/n² + (n − 1)/(n − 1)² + (n − 2)/(n − 2)² + ··· + 1
                    = ∑_{i=1}^n 1/i

The expected cost is thus the same for both greedy algorithms.

Let X_1, ..., X_n be independent exponential random variables, with respective rates λ_1, ..., λ_n. A useful result, generalizing Equation (5.5), is that X_i is the smallest of these with probability λ_i / ∑_j λ_j. This is shown as follows:

    P{X_i = min_j X_j} = P{X_i < min_{j≠i} X_j} = λ_i / ∑_{j=1}^n λ_j

where the final equality uses Equation (5.5) along with the fact that min_{j≠i} X_j is exponential with rate ∑_{j≠i} λ_j.

Another important fact is that min_i X_i and the rank ordering of the X_i are independent. To see why this is true, consider the conditional probability that X_{i_1} < X_{i_2} < ··· < X_{i_n} given the value of min_i X_i.
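The equality of the two greedy algorithms' expected costs can be checked by simulation. A sketch, assuming (as in the solution above) that Greedy Algorithm A assigns each person in turn to the cheapest job still unassigned; the problem size, trial count, and seed are arbitrary:

```python
# Monte Carlo comparison of the two greedy assignment rules when the C_ij
# are i.i.d. exponential(1); both expected totals should be near H_n.
import random

def greedy_A(cost):
    # each person in turn takes the cheapest remaining job
    n, free, total = len(cost), set(range(len(cost))), 0.0
    for i in range(n):
        j = min(free, key=lambda j: cost[i][j])
        total += cost[i][j]
        free.remove(j)
    return total

def greedy_B(cost):
    # repeatedly pick the globally cheapest remaining (person, job) pair
    n = len(cost)
    people, jobs, total = set(range(n)), set(range(n)), 0.0
    while people:
        i, j = min(((i, j) for i in people for j in jobs),
                   key=lambda ij: cost[ij[0]][ij[1]])
        total += cost[i][j]
        people.remove(i)
        jobs.remove(j)
    return total

rng = random.Random(7)
n, trials = 5, 4000
avg_A = avg_B = 0.0
for _ in range(trials):
    cost = [[rng.expovariate(1.0) for _ in range(n)] for _ in range(n)]
    avg_A += greedy_A(cost) / trials
    avg_B += greedy_B(cost) / trials

H_n = sum(1 / i for i in range(1, n + 1))
```

For n = 5 both averages come out near H_5 ≈ 2.283, as the analysis predicts.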


Example 5.8 Suppose you arrive at a post office having two clerks at a moment when both are busy but there is no one else waiting in line. You will enter service when either clerk becomes free. If service times for clerk i are exponential with rate λ_i, i = 1, 2, find E[T], where T is the amount of time that you spend in the post office.

Solution: Let R_i denote the remaining service time of the customer with clerk i, i = 1, 2, and note, by the lack of memory property of exponentials, that R_1 and R_2 are independent exponential random variables with respective rates λ_1 and λ_2. Conditioning on which of R_1 or R_2 is the smallest yields

    E[T] = E[T | R_1 < R_2] P{R_1 < R_2} + E[T | R_2 < R_1] P{R_2 < R_1}


Another way to obtain E[T] is to write T as a sum, take expectations, and then condition where needed. This approach yields

    E[T] = E[min(R_1, R_2) + S]
         = E[min(R_1, R_2)] + E[S]
         = 1/(λ_1 + λ_2) + E[S]

To compute E[S], we condition on which of R_1 and R_2 is smallest:

    E[S] = E[S | R_1 < R_2] · λ_1/(λ_1 + λ_2) + E[S | R_2 < R_1] · λ_2/(λ_1 + λ_2)


If we let A_j equal 1 if cell j is still alive at the moment when all the cells 1, ..., k have been killed, and let it equal 0 otherwise, then

    A = ∑_{j=k+1}^n A_j

Because cell j will be alive at the moment when all the cells 1, ..., k have been killed if X_j is larger than all the values X_1, ..., X_k, we see that for j > k

    E[A_j] = P{A_j = 1} = P{X_j > max_{i=1,...,k} X_i}
           = ∫₀^∞ P{X_j > max_{i=1,...,k} X_i | X_j = x} w_j e^{−w_j x} dx
           = ∫₀^∞ ∏_{i=1}^k P{X_i < x} w_j e^{−w_j x} dx


Solution: Consider the n + 1 random variables consisting of the remaining service time of the person in service along with the n additional exponential departure times with rate θ of the first n in line.

(a) Given that the smallest of these n + 1 independent exponentials is the departure time of the nth person in line, the conditional probability that this person will be served is 0; on the other hand, given that this person's departure time is not the smallest, the conditional probability that this person will be served is the same as if it were initially in position n − 1. Since the probability that a given departure time is the smallest of the n + 1 exponentials is θ/(nθ + μ), we obtain that

    P_n = [(n − 1)θ + μ] / (nθ + μ) · P_{n−1}

Using the preceding with n − 1 replacing n gives

    P_n = [(n − 1)θ + μ] / (nθ + μ) · [(n − 2)θ + μ] / [(n − 1)θ + μ] · P_{n−2}
        = [(n − 2)θ + μ] / (nθ + μ) · P_{n−2}

Continuing in this fashion yields the result

    P_n = (θ + μ) / (nθ + μ) · P_1 = μ / (nθ + μ)

(b) To determine an expression for W_n, we use the fact that the minimum of independent exponentials is, independent of their rank ordering, exponential with a rate equal to the sum of the rates. Since the time until the nth person in line enters service is the minimum of these n + 1 random variables plus the additional time thereafter, we see, upon using the lack of memory property of exponential random variables, that

    W_n = 1/(nθ + μ) + W_{n−1}

Repeating the preceding argument with successively smaller values of n yields the solution

    W_n = ∑_{i=1}^n 1/(iθ + μ)

5.2.4. Convolutions of Exponential Random Variables

Let X_i, i = 1, ..., n, be independent exponential random variables with respective rates λ_i, i = 1, ..., n, and suppose that λ_i ≠ λ_j for i ≠ j. The random variable ∑_{i=1}^n X_i is said to be a hypoexponential random variable. To compute its probability density function, let us start with the case n = 2. Now,

    f_{X_1+X_2}(t) = ∫₀ᵗ f_{X_1}(s) f_{X_2}(t − s) ds
                   = ∫₀ᵗ λ_1 e^{−λ_1 s} λ_2 e^{−λ_2 (t−s)} ds
                   = λ_1 λ_2 e^{−λ_2 t} ∫₀ᵗ e^{−(λ_1 − λ_2)s} ds
                   = [λ_1/(λ_1 − λ_2)] λ_2 e^{−λ_2 t} (1 − e^{−(λ_1 − λ_2)t})
                   = [λ_1/(λ_1 − λ_2)] λ_2 e^{−λ_2 t} + [λ_2/(λ_2 − λ_1)] λ_1 e^{−λ_1 t}

Using the preceding, a similar computation yields, when n = 3,

    f_{X_1+X_2+X_3}(t) = ∑_{i=1}^3 λ_i e^{−λ_i t} ∏_{j≠i} λ_j/(λ_j − λ_i)

which suggests the general result:

    f_{X_1 + ··· + X_n}(t) = ∑_{i=1}^n C_{i,n} λ_i e^{−λ_i t}

where

    C_{i,n} = ∏_{j≠i} λ_j/(λ_j − λ_i)

We will now prove the preceding formula by induction on n. Since we have already established it for n = 2, assume it for n and consider n + 1 arbitrary independent exponentials X_i with distinct rates λ_i, i = 1, ..., n + 1. If necessary, renumber X_1 and X_{n+1} so that λ_{n+1} < λ_1.


    = ∑_{i=1}^n C_{i,n} ([λ_i/(λ_i − λ_{n+1})] λ_{n+1} e^{−λ_{n+1} t} + [λ_{n+1}/(λ_{n+1} − λ_i)] λ_i e^{−λ_i t})

    = K_{n+1} λ_{n+1} e^{−λ_{n+1} t} + ∑_{i=1}^n C_{i,n+1} λ_i e^{−λ_i t}     (5.7)

where K_{n+1} = ∑_{i=1}^n C_{i,n} λ_i/(λ_i − λ_{n+1}) is a constant that does not depend on t. But, we also have that

    f_{X_1 + ··· + X_{n+1}}(t) = ∫₀ᵗ f_{X_2 + ··· + X_{n+1}}(s) λ_1 e^{−λ_1(t−s)} ds

which implies, by the same argument that resulted in Equation (5.7), that for a constant K_1

    f_{X_1 + ··· + X_{n+1}}(t) = K_1 λ_1 e^{−λ_1 t} + ∑_{i=2}^{n+1} C_{i,n+1} λ_i e^{−λ_i t}

Equating these two expressions for f_{X_1 + ··· + X_{n+1}}(t) yields

    K_{n+1} λ_{n+1} e^{−λ_{n+1} t} + C_{1,n+1} λ_1 e^{−λ_1 t} = K_1 λ_1 e^{−λ_1 t} + C_{n+1,n+1} λ_{n+1} e^{−λ_{n+1} t}

Multiplying both sides of the preceding equation by e^{λ_{n+1} t} and then letting t → ∞ yields [since e^{−(λ_1 − λ_{n+1})t} → 0 as t → ∞]

    K_{n+1} = C_{n+1,n+1}

and this, using Equation (5.7), completes the induction proof. Thus, we have shown that if S = ∑_{i=1}^n X_i, then

    f_S(t) = ∑_{i=1}^n C_{i,n} λ_i e^{−λ_i t},  where  C_{i,n} = ∏_{j≠i} λ_j/(λ_j − λ_i)     (5.8)

Integrating both sides of the expression for f_S from t to ∞ yields that the tail distribution function of S is given by

    P{S > t} = ∑_{i=1}^n C_{i,n} e^{−λ_i t}     (5.9)
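Equations (5.8) and (5.9) are straightforward to evaluate numerically. A sketch (the rates below are arbitrary, distinct values); since P{S > 0} = 1 the coefficients must sum to 1, and the expected value ∫₀^∞ P{S > t} dt must equal ∑_i 1/λ_i:

```python
# Hypoexponential density and tail, Equations (5.8)-(5.9):
# f_S(t) = sum_i C_{i,n} lam_i exp(-lam_i t), with
# C_{i,n} = prod_{j != i} lam_j / (lam_j - lam_i).
import math

def coef(rates, i):
    return math.prod(lam / (lam - rates[i])
                     for j, lam in enumerate(rates) if j != i)

def hypo_density(rates, t):
    return sum(coef(rates, i) * lam * math.exp(-lam * t)
               for i, lam in enumerate(rates))

def hypo_tail(rates, t):
    return sum(coef(rates, i) * math.exp(-lam * t)
               for i, lam in enumerate(rates))

rates = [1.0, 2.0, 3.0]
```

For these rates the coefficients are 3, −3, and 1: they sum to 1 but are not probabilities, some being negative.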


Hence, we obtain from Equations (5.8) and (5.9) that r_S(t), the failure rate function of S, is as follows:

    r_S(t) = [∑_{i=1}^n C_{i,n} λ_i e^{−λ_i t}] / [∑_{i=1}^n C_{i,n} e^{−λ_i t}]

If we let λ_j = min(λ_1, ..., λ_n), then it follows, upon multiplying the numerator and denominator of r_S(t) by e^{λ_j t}, that

    lim_{t→∞} r_S(t) = λ_j

From the preceding, we can conclude that the remaining lifetime of a hypoexponentially distributed item that has survived to age t is, for t large, approximately that of an exponentially distributed random variable with a rate equal to the minimum of the rates of the random variables whose sum makes up the hypoexponential.

Remark Although

    1 = ∫₀^∞ f_S(t) dt = ∑_{i=1}^n C_{i,n} = ∑_{i=1}^n ∏_{j≠i} λ_j/(λ_j − λ_i)

it should not be thought that the C_{i,n}, i = 1, ..., n, are probabilities, because some of them will be negative. Thus, while the form of the hypoexponential density is similar to that of the hyperexponential density (see Example 5.6), these two random variables are very different.

Example 5.11 Let X_1, ..., X_m be independent exponential random variables with respective rates λ_1, ..., λ_m, where λ_i ≠ λ_j when i ≠ j. Let N be independent of these random variables and suppose that ∑_{n=1}^m P_n = 1, where P_n = P{N = n}. The random variable

    Y = ∑_{j=1}^N X_j

is said to be a Coxian random variable. Conditioning on N gives its density function:

    f_Y(t) = ∑_{n=1}^m f_Y(t | N = n) P_n
           = ∑_{n=1}^m f_{X_1 + ··· + X_n}(t | N = n) P_n


$$= \sum_{n=1}^m f_{X_1+\cdots+X_n}(t)P_n = \sum_{n=1}^m P_n \sum_{i=1}^n C_{i,n}\lambda_i e^{-\lambda_i t}$$

Let

$$r(n) = P\{N=n\,|\,N\geq n\}$$

If we interpret $N$ as a lifetime measured in discrete time periods, then $r(n)$ denotes the probability that an item will die in its $n$th period of use given that it has survived up to that time. Thus, $r(n)$ is the discrete time analog of the failure rate function $r(t)$, and is correspondingly referred to as the discrete time failure (or hazard) rate function.

Coxian random variables often arise in the following manner. Suppose that an item must go through $m$ stages of treatment to be cured. However, suppose that after each stage there is a probability that the item will quit the program. If we suppose that the amounts of time that it takes the item to pass through the successive stages are independent exponential random variables, and that the probability that an item that has just completed stage $n$ quits the program is (independent of how long it took to go through the $n$ stages) equal to $r(n)$, then the total time that an item spends in the program is a Coxian random variable.

5.3. The Poisson Process

5.3.1. Counting Processes

A stochastic process $\{N(t),\,t\geq 0\}$ is said to be a counting process if $N(t)$ represents the total number of "events" that occur by time $t$. Some examples of counting processes are the following:

(a) If we let $N(t)$ equal the number of persons who enter a particular store at or prior to time $t$, then $\{N(t),\,t\geq 0\}$ is a counting process in which an event corresponds to a person entering the store. Note that if we had let $N(t)$ equal the number of persons in the store at time $t$, then $\{N(t),\,t\geq 0\}$ would not be a counting process (why not?).

(b) If we say that an event occurs whenever a child is born, then $\{N(t),\,t\geq 0\}$ is a counting process when $N(t)$ equals the total number of people who were born by time $t$. [Does $N(t)$ include persons who have died by time $t$? Explain why it must.]


(c) If $N(t)$ equals the number of goals that a given soccer player scores by time $t$, then $\{N(t),\,t\geq 0\}$ is a counting process. An event of this process will occur whenever the soccer player scores a goal.

From its definition we see that for a counting process $N(t)$ must satisfy:

(i) $N(t)\geq 0$.
(ii) $N(t)$ is integer valued.
(iii) If $s<t$, then $N(s)\leq N(t)$.
(iv) For $s<t$, $N(t)-N(s)$ equals the number of events that occur in the interval $(s,t]$.

A counting process is said to possess independent increments if the numbers of events that occur in disjoint time intervals are independent. A counting process is said to possess stationary increments if the distribution of the number of events that occur in any interval of time depends only on the length of the interval.


5.3.2. Definition of the Poisson Process

One of the most important counting processes is the Poisson process, which is defined as follows:

Definition 5.1  The counting process $\{N(t),\,t\geq 0\}$ is said to be a Poisson process having rate $\lambda$, $\lambda>0$, if

(i) $N(0)=0$.
(ii) The process has independent increments.
(iii) The number of events in any interval of length $t$ is Poisson distributed with mean $\lambda t$. That is, for all $s,t\geq 0$

$$P\{N(t+s)-N(s)=n\} = e^{-\lambda t}\frac{(\lambda t)^n}{n!}, \qquad n=0,1,\ldots$$

Note that it follows from condition (iii) that a Poisson process has stationary increments and also that

$$E[N(t)] = \lambda t$$

which explains why $\lambda$ is called the rate of the process.

To determine if an arbitrary counting process is actually a Poisson process, we must show that conditions (i), (ii), and (iii) are satisfied. Condition (i), which simply states that the counting of events begins at time $t=0$, and condition (ii) can usually be directly verified from our knowledge of the process. However, it is not at all clear how we would determine that condition (iii) is satisfied, and for this reason an equivalent definition of a Poisson process would be useful.

As a prelude to giving a second definition of a Poisson process we shall define the concept of a function $f(\cdot)$ being $o(h)$.

Definition 5.2  The function $f(\cdot)$ is said to be $o(h)$ if

$$\lim_{h\to 0}\frac{f(h)}{h} = 0$$

Example 5.12

(i) The function $f(x)=x^2$ is $o(h)$ since

$$\lim_{h\to 0}\frac{f(h)}{h} = \lim_{h\to 0}\frac{h^2}{h} = \lim_{h\to 0} h = 0$$


(ii) The function $f(x)=x$ is not $o(h)$ since

$$\lim_{h\to 0}\frac{f(h)}{h} = \lim_{h\to 0}\frac{h}{h} = \lim_{h\to 0}1 = 1 \neq 0$$

(iii) If $f(\cdot)$ is $o(h)$ and $g(\cdot)$ is $o(h)$, then so is $f(\cdot)+g(\cdot)$. This follows since

$$\lim_{h\to 0}\frac{f(h)+g(h)}{h} = \lim_{h\to 0}\frac{f(h)}{h} + \lim_{h\to 0}\frac{g(h)}{h} = 0+0 = 0$$

(iv) If $f(\cdot)$ is $o(h)$, then so is $g(\cdot)=cf(\cdot)$. This follows since

$$\lim_{h\to 0}\frac{cf(h)}{h} = c\lim_{h\to 0}\frac{f(h)}{h} = c\cdot 0 = 0$$

(v) From (iii) and (iv) it follows that any finite linear combination of functions, each of which is $o(h)$, is $o(h)$.

In order for the function $f(\cdot)$ to be $o(h)$ it is necessary that $f(h)/h$ go to zero as $h$ goes to zero. But if $h$ goes to zero, the only way for $f(h)/h$ to go to zero is for $f(h)$ to go to zero faster than $h$ does. That is, for $h$ small, $f(h)$ must be small compared with $h$.

We are now in a position to give an alternate definition of a Poisson process.

Definition 5.3  The counting process $\{N(t),\,t\geq 0\}$ is said to be a Poisson process having rate $\lambda$, $\lambda>0$, if

(i) $N(0)=0$.
(ii) The process has stationary and independent increments.
(iii) $P\{N(h)=1\} = \lambda h + o(h)$.
(iv) $P\{N(h)\geq 2\} = o(h)$.

Theorem 5.1  Definitions 5.1 and 5.3 are equivalent.

Proof  We show that Definition 5.3 implies Definition 5.1, and leave it to you to prove the reverse. To start, fix $u\geq 0$ and let

$$g(t) = E[\exp\{-uN(t)\}]$$

We derive a differential equation for $g(t)$ as follows:

$$g(t+h) = E[\exp\{-uN(t+h)\}] = E[\exp\{-uN(t)\}\exp\{-u(N(t+h)-N(t))\}]$$


$$= E[\exp\{-uN(t)\}]\,E[\exp\{-u(N(t+h)-N(t))\}] \qquad \text{by independent increments}$$
$$= g(t)\,E[\exp\{-uN(h)\}] \qquad \text{by stationary increments} \tag{5.10}$$

Now, assumptions (iii) and (iv) imply that

$$P\{N(h)=0\} = 1-\lambda h + o(h)$$

Hence, conditioning on whether $N(h)=0$ or $N(h)=1$ or $N(h)\geq 2$ yields

$$E[\exp\{-uN(h)\}] = 1-\lambda h + o(h) + e^{-u}(\lambda h + o(h)) + o(h) = 1-\lambda h + e^{-u}\lambda h + o(h) \tag{5.11}$$

Therefore, from Equations (5.10) and (5.11) we obtain that

$$g(t+h) = g(t)(1-\lambda h + e^{-u}\lambda h) + o(h)$$

implying that

$$\frac{g(t+h)-g(t)}{h} = g(t)\lambda(e^{-u}-1) + \frac{o(h)}{h}$$

Letting $h\to 0$ gives

$$g'(t) = g(t)\lambda(e^{-u}-1)$$

or, equivalently,

$$\frac{g'(t)}{g(t)} = \lambda(e^{-u}-1)$$

Integrating, and using $g(0)=1$, shows that

$$\log(g(t)) = \lambda t(e^{-u}-1)$$

or

$$g(t) = \exp\{\lambda t(e^{-u}-1)\}$$

That is, the Laplace transform of $N(t)$ evaluated at $u$ is $e^{\lambda t(e^{-u}-1)}$. Since that is also the Laplace transform of a Poisson random variable with mean $\lambda t$, the result follows from the fact that the distribution of a nonnegative random variable is uniquely determined by its Laplace transform.


Figure 5.1.

Remarks  (i) The result that $N(t)$ has a Poisson distribution is a consequence of the Poisson approximation to the binomial distribution (see Section 2.2.4). To see this, subdivide the interval $[0,t]$ into $k$ equal parts where $k$ is very large (Figure 5.1). Now it can be shown using axiom (iv) of Definition 5.3 that as $k$ increases to $\infty$ the probability of having two or more events in any of the $k$ subintervals goes to 0. Hence, $N(t)$ will (with a probability going to 1) just equal the number of subintervals in which an event occurs. However, by stationary and independent increments this number will have a binomial distribution with parameters $k$ and $p=\lambda t/k + o(t/k)$. Hence, by the Poisson approximation to the binomial we see by letting $k$ approach $\infty$ that $N(t)$ will have a Poisson distribution with mean equal to

$$\lim_{k\to\infty} k\left[\lambda\frac{t}{k} + o\!\left(\frac{t}{k}\right)\right] = \lambda t + \lim_{k\to\infty}\frac{o(t/k)}{t/k}\,t = \lambda t$$

by using the definition of $o(h)$ and the fact that $t/k\to 0$ as $k\to\infty$.

(ii) The explicit assumption that the process has stationary increments can be eliminated from Definition 5.3 provided that we change assumptions (iii) and (iv) to require that for any $t$ the probability of one event in the interval $(t,t+h)$ is $\lambda h+o(h)$ and the probability of two or more events in that interval is $o(h)$. That is, assumptions (ii), (iii), and (iv) of Definition 5.3 can be replaced by

(ii) The process has independent increments.
(iii) $P\{N(t+h)-N(t)=1\} = \lambda h + o(h)$.
(iv) $P\{N(t+h)-N(t)\geq 2\} = o(h)$.

5.3.3. Interarrival and Waiting Time Distributions

Consider a Poisson process, and let us denote the time of the first event by $T_1$. Further, for $n>1$, let $T_n$ denote the elapsed time between the $(n-1)$st and the $n$th event. The sequence $\{T_n,\,n=1,2,\ldots\}$ is called the sequence of interarrival times. For instance, if $T_1=5$ and $T_2=10$, then the first event of the Poisson process would have occurred at time 5 and the second at time 15.

We shall now determine the distribution of the $T_n$. To do so, we first note that the event $\{T_1>t\}$ takes place if and only if no events of the Poisson process occur


in the interval $[0,t]$ and thus,

$$P\{T_1>t\} = P\{N(t)=0\} = e^{-\lambda t}$$

Hence, $T_1$ has an exponential distribution with mean $1/\lambda$. Now,

$$P\{T_2>t\} = E[P\{T_2>t\,|\,T_1\}]$$

However,

$$P\{T_2>t\,|\,T_1=s\} = P\{0 \text{ events in } (s,s+t]\,|\,T_1=s\} = P\{0 \text{ events in } (s,s+t]\} = e^{-\lambda t} \tag{5.12}$$

where the last two equations followed from independent and stationary increments. Therefore, from Equation (5.12) we conclude that $T_2$ is also an exponential random variable with mean $1/\lambda$ and, furthermore, that $T_2$ is independent of $T_1$. Repeating the same argument yields the following.

Proposition 5.1  $T_n$, $n=1,2,\ldots$, are independent identically distributed exponential random variables having mean $1/\lambda$.

Remark  The proposition should not surprise us. The assumption of stationary and independent increments is basically equivalent to asserting that, at any point in time, the process probabilistically restarts itself. That is, the process from any point on is independent of all that has previously occurred (by independent increments), and also has the same distribution as the original process (by stationary increments). In other words, the process has no memory, and hence exponential interarrival times are to be expected.

Another quantity of interest is $S_n$, the arrival time of the $n$th event, also called the waiting time until the $n$th event. It is easily seen that

$$S_n = \sum_{i=1}^n T_i, \qquad n\geq 1$$

and hence from Proposition 5.1 and the results of Section 2.2 it follows that $S_n$ has a gamma distribution with parameters $n$ and $\lambda$. That is, the probability density of $S_n$ is given by

$$f_{S_n}(t) = \lambda e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!}, \qquad t\geq 0 \tag{5.13}$$

Equation (5.13) may also be derived by noting that the $n$th event will occur prior to or at time $t$ if and only if the number of events occurring by time $t$ is at least $n$.


That is,

$$N(t)\geq n \;\Leftrightarrow\; S_n\leq t$$

Hence,

$$F_{S_n}(t) = P\{S_n\leq t\} = P\{N(t)\geq n\} = \sum_{j=n}^{\infty} e^{-\lambda t}\frac{(\lambda t)^j}{j!}$$

which, upon differentiation, yields

$$f_{S_n}(t) = -\sum_{j=n}^{\infty}\lambda e^{-\lambda t}\frac{(\lambda t)^j}{j!} + \sum_{j=n}^{\infty}\lambda e^{-\lambda t}\frac{(\lambda t)^{j-1}}{(j-1)!}$$
$$= \lambda e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!} + \sum_{j=n+1}^{\infty}\lambda e^{-\lambda t}\frac{(\lambda t)^{j-1}}{(j-1)!} - \sum_{j=n}^{\infty}\lambda e^{-\lambda t}\frac{(\lambda t)^j}{j!}$$
$$= \lambda e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!}$$

Example 5.13  Suppose that people immigrate into a territory at a Poisson rate $\lambda=1$ per day.

(a) What is the expected time until the tenth immigrant arrives?
(b) What is the probability that the elapsed time between the tenth and the eleventh arrival exceeds two days?

Solution:
(a) $E[S_{10}] = 10/\lambda = 10$ days.
(b) $P\{T_{11}>2\} = e^{-2\lambda} = e^{-2} \approx 0.135$.

Proposition 5.1 also gives us another way of defining a Poisson process. Suppose we start with a sequence $\{T_n,\,n\geq 1\}$ of independent identically distributed exponential random variables each having mean $1/\lambda$. Now let us define a counting process by saying that the $n$th event of this process occurs at time

$$S_n \equiv T_1 + T_2 + \cdots + T_n$$

The resultant counting process $\{N(t),\,t\geq 0\}^{*}$ will be Poisson with rate $\lambda$.

${}^{*}$A formal definition of $N(t)$ is given by $N(t)\equiv \max\{n\colon S_n\leq t\}$ where $S_0\equiv 0$.
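The alternative definition just given also yields a direct simulation recipe: generate i.i.d. exponential interarrival times and count how many partial sums fall in $[0,t]$. A small Python sketch (our own code, with an arbitrarily chosen rate and horizon) checks that the resulting counts have mean close to $\lambda t$ and hit a given value with roughly the Poisson probability, as condition (iii) of Definition 5.1 requires.

```python
import math
import random

def poisson_count(lam, t, rng):
    # N(t) = max{n : S_n <= t}, where S_n is a sum of i.i.d.
    # exponential(lam) interarrival times (Proposition 5.1).
    n, s = 0, rng.expovariate(lam)
    while s <= t:
        n += 1
        s += rng.expovariate(lam)
    return n

rng = random.Random(2023)
lam, t, reps = 2.0, 5.0, 20000
counts = [poisson_count(lam, t, rng) for _ in range(reps)]
mean_count = sum(counts) / reps      # should be near lam * t = 10
p10 = counts.count(10) / reps        # near e^{-10} 10^10 / 10! ~ 0.125
```

The tolerances in any such check have to be statistical, since the estimates fluctuate with the random seed.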


Remark  Another way of obtaining the density function of $S_n$ is to note that because $S_n$ is the time of the $n$th event,

$$P\{t<S_n<t+h\} = P\{N(t)=n-1,\ \text{one event in } (t,t+h)\} + o(h)$$
$$= P\{N(t)=n-1\}\,P\{\text{one event in } (t,t+h)\} + o(h)$$
$$= e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!}[\lambda h + o(h)] + o(h)$$
$$= \lambda e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!}h + o(h)$$

which yields, upon dividing by $h$ and then letting $h\to 0$,

$$f_{S_n}(t) = \lambda e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!}$$

5.3.4. Further Properties of Poisson Processes

Consider a Poisson process $\{N(t),\,t\geq 0\}$ having rate $\lambda$, and suppose that each time an event occurs it is classified as either a type I or a type II event. Suppose further that each event is classified as a type I event with probability $p$ or a type II event with probability $1-p$, independently of all other events. Let $N_1(t)$ and $N_2(t)$ denote respectively the number of type I and type II events occurring in $[0,t]$. Note that $N(t)=N_1(t)+N_2(t)$.

Proposition 5.2  $\{N_1(t),\,t\geq 0\}$ and $\{N_2(t),\,t\geq 0\}$ are both Poisson processes having respective rates $\lambda p$ and $\lambda(1-p)$. Furthermore, the two processes are independent.

Proof  We verify that $\{N_1(t),\,t\geq 0\}$ satisfies Definition 5.3.

• $N_1(0)=0$ follows from the fact that $N(0)=0$.
• $\{N_1(t),\,t\geq 0\}$ inherits the stationary and independent increment properties of the process $\{N(t),\,t\geq 0\}$. This is true because the distribution of the number of type I events in an interval can be obtained


by conditioning on the number of events in that interval, and the distribution of this latter quantity depends only on the length of the interval and is independent of what has occurred in any nonoverlapping interval.

• $P\{N_1(h)=1\} = P\{N_1(h)=1\,|\,N(h)=1\}P\{N(h)=1\} + P\{N_1(h)=1\,|\,N(h)\geq 2\}P\{N(h)\geq 2\}$
$= p(\lambda h + o(h)) + o(h) = \lambda ph + o(h)$

• $P\{N_1(h)\geq 2\} \leq P\{N(h)\geq 2\} = o(h)$

Thus we see that $\{N_1(t),\,t\geq 0\}$ is a Poisson process with rate $\lambda p$ and, by a similar argument, that $\{N_2(t),\,t\geq 0\}$ is a Poisson process with rate $\lambda(1-p)$. Because the probability of a type I event in the interval from $t$ to $t+h$ is independent of all that occurs in intervals that do not overlap $(t,t+h)$, it is independent of knowledge of when type II events occur, showing that the two Poisson processes are independent. (For another way of proving independence, see Example 3.20.)

Example 5.14  If immigrants to area A arrive at a Poisson rate of ten per week, and if each immigrant is of English descent with probability $\frac{1}{12}$, then what is the probability that no people of English descent will emigrate to area A during the month of February?

Solution: By the previous proposition it follows that the number of Englishmen emigrating to area A during the month of February is Poisson distributed with mean $4\cdot 10\cdot\frac{1}{12} = \frac{10}{3}$. Hence the desired probability is $e^{-10/3}$.

Example 5.15  Suppose nonnegative offers to buy an item that you want to sell arrive according to a Poisson process with rate $\lambda$. Assume that each offer is the value of a continuous random variable having density function $f(x)$. Once the offer is presented to you, you must either accept it or reject it and wait for the next offer. We suppose that you incur costs at a rate $c$ per unit time until the item is sold, and that your objective is to maximize your expected total return, where the total return is equal to the amount received minus the total cost incurred. Suppose you employ the policy of accepting the first offer that is greater than some specified value $y$. (Such a type of policy, which we call a $y$-policy, can be shown to be optimal.) What is the best value of $y$?

Solution: Let us compute the expected total return when you use the $y$-policy, and then choose the value of $y$ that maximizes this quantity. Let $X$ denote the value of a random offer, and let $\bar F(x) = P\{X>x\} = \int_x^\infty f(u)\,du$ be its tail distribution function. Because each offer will be greater than $y$ with probability $\bar F(y)$, it follows that such offers occur according to a Poisson process with rate $\lambda\bar F(y)$. Hence, the time until an offer is accepted is an exponential


312 5 The Exponential Distribution and the Poisson Processrandom variable with rate λ ¯F(y). Letting R(y) denote the total return from thepolicy that accepts the first offer that is greater than y,wehaveE[R(y)]=E[accepted offer]−cE[time to accept]= E[X|X>y]− cλ ¯F(y)==∫ ∞0∫ ∞yxf X|X>y (x) dx −cλ ¯F(y)x f(x)¯F(y) dx −cλ ¯F(y)∫ ∞yxf (x) dx − c/λ=¯F(y)Differentiation yields that(∫d∞E[R(y)]=0 ⇔−¯F(y)yf(y)+dy yTherefore, the optimal value of y satisfies∫ ∞y ¯F(y)= xf (x) dx − c λor∫ ∞y∫ ∞xf (x) dx − c λ)f(y)= 0y f(x)dx= xf (x) dx − cyyλor∫ ∞(x − y)f(x)dx = c (5.14)yλWe now argue that the left-hand side of the preceding is a nonincreasing functionof y. To do so, note that, with a + defined to equal a if a>0 or to equal 0otherwise, we have∫ ∞(x − y)f(x)dx = E[(X − y) + ]yBecause (X − y) + is a nonincreasing function of y, so is its expectation, thusshowing that the left hand side of Equation (5.14) is a nonincreasing functionof y. Consequently, if E[X]
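Proposition 5.2's thinning property, and the numerical answer in Example 5.14, can be checked by simulation. The sketch below (our own code) classifies each arrival of a rate-10 Poisson process as type I with probability 1/12 and estimates the probability of no type I events in 4 weeks; by the proposition the type I count is Poisson with mean $10/3$, so the estimate should approach $e^{-10/3}\approx 0.0357$.

```python
import math
import random

def no_type1_events(lam, p, t, rng):
    # One realization: arrivals at rate lam on [0, t], each independently
    # classified type I with probability p.  True if no type I event occurs.
    s = rng.expovariate(lam)
    while s <= t:
        if rng.random() < p:
            return False
        s += rng.expovariate(lam)
    return True

rng = random.Random(5)
lam, p, t, reps = 10.0, 1 / 12, 4.0, 20000   # Example 5.14: rate per week, 4 weeks
frac = sum(no_type1_events(lam, p, t, rng) for _ in range(reps)) / reps
```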


abilities $p$ and $1-p$, then the number of individuals in each of the two groups will be independent Poisson random variables. Because this result easily generalizes to the case where the classification is into any one of $r$ possible groups, we have the following application to a model of employees moving about in an organization.

Example 5.16  Consider a system in which individuals at any time are classified as being in one of $r$ possible states, and assume that an individual changes states in accordance with a Markov chain having transition probabilities $P_{ij}$, $i,j=1,\ldots,r$. That is, if an individual is in state $i$ during a time period then, independently of its previous states, it will be in state $j$ during the next time period with probability $P_{ij}$. The individuals are assumed to move through the system independently of each other. Suppose that the numbers of people initially in states $1,2,\ldots,r$ are independent Poisson random variables with respective means $\lambda_1,\lambda_2,\ldots,\lambda_r$. We are interested in determining the joint distribution of the numbers of individuals in states $1,2,\ldots,r$ at some time $n$.

Solution: For fixed $i$, let $N_j(i)$, $j=1,\ldots,r$ denote the number of those individuals, initially in state $i$, that are in state $j$ at time $n$. Now each of the (Poisson distributed) number of people initially in state $i$ will, independently of each other, be in state $j$ at time $n$ with probability $P^n_{ij}$, where $P^n_{ij}$ is the $n$-stage transition probability for the Markov chain having transition probabilities $P_{ij}$. Hence, the $N_j(i)$, $j=1,\ldots,r$ will be independent Poisson random variables with respective means $\lambda_i P^n_{ij}$, $j=1,\ldots,r$. Because the sum of independent Poisson random variables is itself a Poisson random variable, it follows that the numbers of individuals in state $j$ at time $n$, namely $\sum_{i=1}^r N_j(i)$, will be independent Poisson random variables with respective means $\sum_i \lambda_i P^n_{ij}$, for $j=1,\ldots,r$.

Example 5.17  (The Coupon Collecting Problem)  There are $m$ different types of coupons. Each time a person collects a coupon it is, independently of ones previously obtained, a type $j$ coupon with probability $p_j$, $\sum_{j=1}^m p_j = 1$. Let $N$ denote the number of coupons one needs to collect in order to have a complete collection of at least one of each type. Find $E[N]$.

Solution: If we let $N_j$ denote the number one must collect to obtain a type $j$ coupon, then we can express $N$ as

$$N = \max_{1\leq j\leq m} N_j$$

However, even though each $N_j$ is geometric with parameter $p_j$, the foregoing representation of $N$ is not that useful, because the random variables $N_j$ are not independent.


We can, however, transform the problem into one of determining the expected value of the maximum of independent random variables. To do so, suppose that coupons are collected at times chosen according to a Poisson process with rate $\lambda=1$. Say that an event of this Poisson process is of type $j$, $1\leq j\leq m$, if the coupon obtained at that time is a type $j$ coupon. If we now let $N_j(t)$ denote the number of type $j$ coupons collected by time $t$, then it follows from Proposition 5.2 that $\{N_j(t),\,t\geq 0\}$, $j=1,\ldots,m$ are independent Poisson processes with respective rates $\lambda p_j = p_j$. Let $X_j$ denote the time of the first event of the $j$th process, and let

$$X = \max_{1\leq j\leq m} X_j$$

denote the time at which a complete collection is amassed. Since the $X_j$ are independent exponential random variables with respective rates $p_j$, it follows that

$$P\{X<t\} = P\Big\{\max_{1\leq j\leq m} X_j < t\Big\} = \prod_{j=1}^m \big(1-e^{-p_j t}\big)$$

Therefore,

$$E[X] = \int_0^\infty P\{X>t\}\,dt = \int_0^\infty \left[1-\prod_{j=1}^m \big(1-e^{-p_j t}\big)\right] dt \tag{5.15}$$

It remains to relate $E[X]$, the expected time until a complete collection is amassed, to $E[N]$, the expected number of coupons it takes. Letting $T_i$ denote the $i$th interarrival time of the Poisson process, we have $X = \sum_{i=1}^N T_i$. Because the $T_i$, being the interarrival times of a rate 1 Poisson process, are independent exponential random variables with mean 1, and are independent of $N$, conditioning on $N$ gives $E[X\,|\,N] = N$.


Therefore, $E[X]=E[N]$, and so $E[N]$ is as given in Equation (5.15).

Let us now compute the expected number of types that appear only once in the complete collection. Letting $I_i$ equal 1 if there is only a single type $i$ coupon in the final set, and letting it equal 0 otherwise, we thus want

$$E\left[\sum_{i=1}^m I_i\right] = \sum_{i=1}^m E[I_i] = \sum_{i=1}^m P\{I_i=1\}$$

Now there will be a single type $i$ coupon in the final set if a coupon of each type has appeared before the second coupon of type $i$ is obtained. Thus, letting $S_i$ denote the time at which the second type $i$ coupon is obtained, we have

$$P\{I_i=1\} = P\{X_j < S_i \text{ for all } j\neq i\}$$
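The integral in Equation (5.15) can be evaluated exactly: expanding the product $1-\prod_j(1-e^{-p_j t})$ by inclusion-exclusion and integrating term by term gives $E[N]=\sum_{\emptyset\neq S}(-1)^{|S|+1}\big/\sum_{j\in S}p_j$, the sum running over nonempty subsets of types. A short Python sketch (the helper name is our own):

```python
from itertools import combinations

def expected_coupons(probs):
    # E[N] = E[X] = integral over [0, inf) of 1 - prod_j (1 - e^{-p_j t})
    # (Equation (5.15)), evaluated exactly by inclusion-exclusion over
    # nonempty subsets of coupon types.
    total = 0.0
    for k in range(1, len(probs) + 1):
        for subset in combinations(probs, k):
            total += (-1) ** (k + 1) / sum(subset)
    return total
```

For $m$ equally likely types ($p_j = 1/m$) this reproduces the classical answer $m\sum_{k=1}^m 1/k$; for instance, $m=3$ gives $3(1+\tfrac12+\tfrac13)=5.5$.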


316 5 The Exponential Distribution and the Poisson Processthe first process, and Sm 2 the time of the mth event of the second process. We seekP { Sn 1 }


pendent increments it seems reasonable that each interval in $[0,t]$ of equal length should have the same probability of containing the event. In other words, the time of the event should be uniformly distributed over $[0,t]$. This is easily checked since, for $s\leq t$,

$$P\{T_1<s\,|\,N(t)=1\} = \frac{P\{T_1<s,\ N(t)=1\}}{P\{N(t)=1\}} = \frac{P\{1 \text{ event in } [0,s),\ 0 \text{ events in } [s,t]\}}{P\{N(t)=1\}} = \frac{\lambda s e^{-\lambda s}\,e^{-\lambda(t-s)}}{\lambda t e^{-\lambda t}} = \frac{s}{t}$$

This result may be generalized, but before doing so we need to introduce the concept of order statistics. Let $Y_1,\ldots,Y_n$ be $n$ random variables. We say that $Y_{(1)},Y_{(2)},\ldots,Y_{(n)}$ are the order statistics corresponding to $Y_1,\ldots,Y_n$ if $Y_{(k)}$ is the $k$th smallest value among $Y_1,\ldots,Y_n$, $k=1,\ldots,n$. If the $Y_i$, $i=1,\ldots,n$, are independent identically distributed continuous random variables with probability density $f$, then the joint density of the order statistics is

$$f(y_1,y_2,\ldots,y_n) = n!\prod_{i=1}^n f(y_i), \qquad y_1<y_2<\cdots<y_n$$

We are now ready for the following useful theorem.


Theorem 5.2  Given that $N(t)=n$, the $n$ arrival times $S_1,\ldots,S_n$ have the same distribution as the order statistics corresponding to $n$ independent random variables uniformly distributed on the interval $(0,t)$.

Proof  To obtain the conditional density of $S_1,\ldots,S_n$ given that $N(t)=n$, note that for $0<t_1<\cdots<t_n<t$ the event that $S_1=t_1,\ldots,S_n=t_n,\ N(t)=n$ is equivalent to the event that the first $n+1$ interarrival times satisfy $T_1=t_1,\ T_2=t_2-t_1,\ \ldots,\ T_n=t_n-t_{n-1},\ T_{n+1}>t-t_n$. Hence, using Proposition 5.1, the conditional joint density of $S_1,\ldots,S_n$ given that $N(t)=n$ is

$$f(t_1,\ldots,t_n\,|\,n) = \frac{\lambda e^{-\lambda t_1}\,\lambda e^{-\lambda(t_2-t_1)}\cdots\lambda e^{-\lambda(t_n-t_{n-1})}\,e^{-\lambda(t-t_n)}}{e^{-\lambda t}(\lambda t)^n/n!} = \frac{n!}{t^n}, \qquad 0<t_1<\cdots<t_n<t$$

which proves the result, since the right-hand side is precisely the joint density of the order statistics of $n$ independent uniform $(0,t)$ random variables.
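Theorem 5.2 is easy to test empirically: simulate rate-$\lambda$ arrival sequences, keep only realizations with $N(t)=n$, and compare the arrival times with uniform order statistics. The Python sketch below (our own code; the parameters are arbitrary) checks that, conditional on $N(1)=3$, the mean of the first arrival time is near $t/(n+1)=1/4$, the mean of the smallest of three uniform $(0,1)$ order statistics.

```python
import random

rng = random.Random(11)
lam, t, n_target = 2.0, 1.0, 3
first_times = []
while len(first_times) < 5000:
    # one Poisson(lam) realization on [0, t] via exponential interarrivals
    arrivals, s = [], rng.expovariate(lam)
    while s <= t:
        arrivals.append(s)
        s += rng.expovariate(lam)
    if len(arrivals) == n_target:          # condition on N(t) = 3
        first_times.append(arrivals[0])    # record S_1 given N(t) = 3

mean_s1 = sum(first_times) / len(first_times)   # Theorem 5.2: near t/(n+1) = 0.25
```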


Example 5.18  (An Infinite Server Queue)  Suppose that customers arrive at a service station in accordance with a Poisson process with rate $\lambda$. Upon arrival the customer is immediately served by one of an infinite number of possible servers, and the service times are assumed to be independent with a common distribution $G$. What is the distribution of $X(t)$, the number of customers that have completed service by time $t$? What is the distribution of $Y(t)$, the number of customers that are being served at time $t$?

To answer the preceding questions let us agree to call an entering customer a type I customer if he completes his service by time $t$ and a type II customer if he does not complete his service by time $t$. Now, if the customer enters at time $s$, $s\leq t$, then he will be a type I customer if his service time is less than $t-s$. Since the service time distribution is $G$, the probability of this will be $G(t-s)$. Similarly, a customer entering at time $s$, $s\leq t$, will be a type II customer with probability $\bar G(t-s) = 1-G(t-s)$. Hence, from Proposition 5.3 we obtain that the distribution of $X(t)$, the number of customers that have completed service by time $t$, is Poisson distributed with mean

$$E[X(t)] = \lambda\int_0^t G(t-s)\,ds = \lambda\int_0^t G(y)\,dy \tag{5.17}$$

Similarly, the distribution of $Y(t)$, the number of customers being served at time $t$, is Poisson with mean

$$E[Y(t)] = \lambda\int_0^t \bar G(t-s)\,ds = \lambda\int_0^t \bar G(y)\,dy \tag{5.18}$$

Furthermore, $X(t)$ and $Y(t)$ are independent.

Suppose now that we are interested in computing the joint distribution of $Y(t)$ and $Y(t+s)$, that is, the joint distribution of the number in the system at time $t$ and at time $t+s$. To accomplish this, say that an arrival is

type 1: if he arrives before time $t$ and completes service between $t$ and $t+s$,
type 2: if he arrives before $t$ and completes service after $t+s$,
type 3: if he arrives between $t$ and $t+s$ and completes service after $t+s$,
type 4: otherwise.

Hence an arrival at time $y$ will be type $i$ with probability $P_i(y)$ given by

$$P_1(y) = \begin{cases} G(t+s-y)-G(t-y), & \text{if } y<t\\ 0, & \text{otherwise}\end{cases}$$


$$P_2(y) = \begin{cases} \bar G(t+s-y), & \text{if } y<t\\ 0, & \text{otherwise}\end{cases}$$

$$P_3(y) = \begin{cases} \bar G(t+s-y), & \text{if } t<y<t+s\\ 0, & \text{otherwise}\end{cases}$$

Hence, if $N_i$ denotes the number of type $i$ arrivals by time $t+s$, $i=1,2,3$, then it follows from Proposition 5.3 that the $N_i$ are independent Poisson random variables with respective means

$$E[N_i] = \lambda\int_0^{t+s} P_i(y)\,dy, \qquad i=1,2,3$$

Because an arrival is in the system at time $t$ if and only if it is of type 1 or type 2, and is in the system at time $t+s$ if and only if it is of type 2 or type 3, we have

$$Y(t) = N_1 + N_2, \qquad Y(t+s) = N_2 + N_3$$

and the joint distribution of $Y(t)$ and $Y(t+s)$ follows from this representation in terms of the independent Poisson random variables $N_1$, $N_2$, $N_3$.
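Equation (5.18) can be checked by simulation for a concrete service distribution. The sketch below (our own code and parameter choices) takes $G$ exponential with rate $\mu$, so that $E[Y(t)]=\lambda\int_0^t e^{-\mu y}\,dy=(\lambda/\mu)(1-e^{-\mu t})$; the empirical mean number in service at time $t$ should be close to that value.

```python
import math
import random

def in_service_at_t(lam, mu, t, rng):
    # One realization of the infinite-server queue: Poisson(lam) arrivals on
    # [0, t] with exponential(mu) service times; count those still in service.
    y, s = 0, rng.expovariate(lam)
    while s <= t:
        if s + rng.expovariate(mu) > t:   # service extends past time t
            y += 1
        s += rng.expovariate(lam)
    return y

rng = random.Random(9)
lam, mu, t, reps = 3.0, 1.0, 4.0, 20000
mean_y = sum(in_service_at_t(lam, mu, t, rng) for _ in range(reps)) / reps
theory = (lam / mu) * (1 - math.exp(-mu * t))   # Equation (5.18)
```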


5.3. The Poisson Process 321Figure 5.2. Cars enter at point a and depart at b.constant speed that is randomly determined, independently from car to car, fromthe distribution G. When a faster car encounters a slower one, it passes it withno time being lost. If your car enters the highway at time s and you are able tochoose your speed, what speed minimizes the expected number of encounters youwill have with other cars, where we say that an encounter occurs each time yourcar either passes or is passed by another car?Solution: We will show that for large s the speed that minimizes the expectednumber of encounters is the median of the speed distribution G. Toseethis, suppose that the speed x is chosen. Let d = b − a denote the length of theroad. Upon choosing the speed x, it follows that your car will enter the road attime s and will depart at time s + t 0 , where t 0 = d/x is the travel time.Now, the other cars enter the road according to a Poisson process with rate λ.Each of them chooses a speed X according to the distribution G, and this resultsin a travel time T = d/X.LetF denote the distribution of travel time T . That is,F(t)= P {T t}=P {d/X t}=P {X d/t}=Ḡ(d/t)Let us say that an event occurs at time t if a car enters the highway at time t.Also, say that the event is a type 1 event if it results in an encounter with yourcar. Now, your car will enter the road at time s and will exit at time s + t 0 .Hence, a car will encounter your car if it enters before s and exits after s + t 0(in which case your car will pass it on the road) or if it enters after s but exitsbefore s + t 0 (in which case it will pass your car). As a result, a car that entersthe road at time t will encounter your car if its travel time T is such thatt + T>s+ t 0 , if t


322 5 The Exponential Distribution and the Poisson ProcessSince events (that is, cars entering the road) are occurring according to a Poissonprocess it follows, upon applying Proposition 5.3, that the total number oftype 1 events that ever occurs is Poisson with meanλ∫ ∞0p(t)dt = λ= λ∫ s0∫ s+t0t 0¯F(s+ t 0 − t)dt + λ¯F(y)dy+ λ∫ t00∫ s+t0sF(y)dyF(s+ t 0 − t)dtTo choose the value of t 0 that minimizes the preceding quantity, we differentiate.This gives{ ∫d ∞}λ p(t)dt = λ{ ¯F(s+ t 0 ) − ¯F(t 0 ) + F(t 0 )}dt 0 0Setting this equal to 0, and using that ¯F(s+ t 0 ) ≈ 0 when s is large, we see thatthe optimal travel time t 0 is such thatF(t 0 ) − ¯F(t 0 ) = 0ororF(t 0 ) −[1 − F(t 0 )]=0F(t 0 ) = 1 2That is, the optimal travel time is the median of the travel time distribution.Since the speed X is equal to the distance d divided by the travel time T ,it follows that the optimal speed x 0 = d/t 0 is such thatF(d/x 0 ) = 1 2SinceF(d/x 0 ) = Ḡ(x 0 )we see that Ḡ(x 0 ) = 1 2, and so the optimal speed is the median of the distributionof speeds.Summing up, we have shown that for any speed x the number of encounterswith other cars will be a Poisson random variable, and the mean of thisPoisson will be smallest when the speed x is taken to be the median of thedistribution G. Example 5.20 (Tracking the Number of HIV Infections) There is a relativelylong incubation period from the time when an individual becomes infected withthe HIV virus, which causes AIDS, until the symptoms of the disease appear.


As a result, it is difficult for public health officials to be certain of the number of members of the population that are infected at any given time. We will now present a first approximation model for this phenomenon, which can be used to obtain a rough estimate of the number of infected individuals.

Let us suppose that individuals contract the HIV virus in accordance with a Poisson process whose rate $\lambda$ is unknown. Suppose that the time from when an individual becomes infected until symptoms of the disease appear is a random variable having a known distribution $G$. Suppose also that the incubation times of different infected individuals are independent.

Let $N_1(t)$ denote the number of individuals who have shown symptoms of the disease by time $t$. Also, let $N_2(t)$ denote the number who are HIV positive but have not yet shown any symptoms by time $t$. Now, since an individual who contracts the virus at time $s$ will have symptoms by time $t$ with probability $G(t-s)$ and will not with probability $\bar G(t-s)$, it follows from Proposition 5.3 that $N_1(t)$ and $N_2(t)$ are independent Poisson random variables with respective means

$$E[N_1(t)] = \lambda\int_0^t G(t-s)\,ds = \lambda\int_0^t G(y)\,dy$$

and

$$E[N_2(t)] = \lambda\int_0^t \bar G(t-s)\,ds = \lambda\int_0^t \bar G(y)\,dy$$

Now, if we knew $\lambda$, then we could use it to estimate $N_2(t)$, the number of individuals infected but without any outward symptoms at time $t$, by its mean value $E[N_2(t)]$. However, since $\lambda$ is unknown, we must first estimate it. Now, we will presumably know the value of $N_1(t)$, and so we can use its known value as an estimate of its mean $E[N_1(t)]$. That is, if the number of individuals who have exhibited symptoms by time $t$ is $n_1$, then we can estimate that

$$n_1 \approx E[N_1(t)] = \lambda\int_0^t G(y)\,dy$$

Therefore, we can estimate $\lambda$ by the quantity $\hat\lambda$ given by

$$\hat\lambda = n_1\Big/\int_0^t G(y)\,dy$$

Using this estimate of $\lambda$, we can estimate the number of infected but symptomless individuals at time $t$ by

$$\text{estimate of } N_2(t) = \hat\lambda\int_0^t \bar G(y)\,dy = \frac{n_1\int_0^t \bar G(y)\,dy}{\int_0^t G(y)\,dy}$$


For example, suppose that $G$ is exponential with mean $\mu$. Then $\bar G(y)=e^{-y/\mu}$, and a simple integration gives that

$$\text{estimate of } N_2(t) = \frac{n_1\mu(1-e^{-t/\mu})}{t-\mu(1-e^{-t/\mu})}$$

If we suppose that $t=16$ years, $\mu=10$ years, and $n_1=220$ thousand, then the estimate of the number of infected but symptomless individuals at time 16 is

$$\text{estimate} = \frac{2{,}200(1-e^{-1.6})}{16-10(1-e^{-1.6})} = 218.96$$

That is, if we suppose that the foregoing model is approximately correct (and we should be aware that the assumption of a constant infection rate $\lambda$ that is unchanging over time is almost certainly a weak point of the model), then if the incubation period is exponential with mean 10 years and if the total number of individuals who have exhibited AIDS symptoms during the first 16 years of the epidemic is 220 thousand, then we can expect that approximately 219 thousand individuals are HIV positive though symptomless at time 16.

Proof of Proposition 5.3  Let us compute the joint probability $P\{N_i(t)=n_i,\ i=1,\ldots,k\}$. To do so note first that in order for there to have been $n_i$ type $i$ events for $i=1,\ldots,k$ there must have been a total of $\sum_{i=1}^k n_i$ events. Hence, conditioning on $N(t)$ yields

$$P\{N_1(t)=n_1,\ldots,N_k(t)=n_k\}$$
$$= P\Big\{N_1(t)=n_1,\ldots,N_k(t)=n_k \,\Big|\, N(t)=\sum_{i=1}^k n_i\Big\}\times P\Big\{N(t)=\sum_{i=1}^k n_i\Big\}$$

Now consider an arbitrary event that occurred in the interval $[0,t]$. If it had occurred at time $s$, then the probability that it would be a type $i$ event would be $P_i(s)$. Hence, since by Theorem 5.2 this event will have occurred at some time uniformly distributed on $(0,t)$, it follows that the probability that this event will be a type $i$


event is

$$P_i = \frac{1}{t}\int_0^t P_i(s)\,ds$$

independently of the other events. Hence,

$$P\Big\{N_i(t)=n_i,\ i=1,\ldots,k \,\Big|\, N(t)=\sum_i n_i\Big\}$$

will just equal the multinomial probability of $n_i$ type $i$ outcomes for $i=1,\ldots,k$ when each of $\sum_{i=1}^k n_i$ independent trials results in outcome $i$ with probability $P_i$, $i=1,\ldots,k$. That is,

$$P\Big\{N_1(t)=n_1,\ldots,N_k(t)=n_k \,\Big|\, N(t)=\sum_{i=1}^k n_i\Big\} = \frac{(\sum_i n_i)!}{n_1!\cdots n_k!}\,P_1^{n_1}\cdots P_k^{n_k}$$

Consequently,

$$P\{N_1(t)=n_1,\ldots,N_k(t)=n_k\} = \frac{(\sum_{i=1}^k n_i)!}{n_1!\cdots n_k!}\,P_1^{n_1}\cdots P_k^{n_k}\,e^{-\lambda t}\frac{(\lambda t)^{\sum_i n_i}}{(\sum_i n_i)!} = \prod_{i=1}^k e^{-\lambda tP_i}(\lambda tP_i)^{n_i}/n_i!$$

and the proof is complete.

We now present some additional examples of the usefulness of Theorem 5.2.

Example 5.21  Insurance claims are made at times distributed according to a Poisson process with rate $\lambda$; the successive claim amounts are independent random variables having distribution $G$ with mean $\mu$, and are independent of the claim arrival times. Let $S_i$ and $C_i$ denote, respectively, the time and the amount of the $i$th claim. The total discounted cost of all claims made up to time $t$, call it $D(t)$, is defined by

$$D(t) = \sum_{i=1}^{N(t)} e^{-\alpha S_i}C_i$$


where $\alpha$ is the discount rate and $N(t)$ is the number of claims made by time $t$. To determine the expected value of $D(t)$, we condition on $N(t)$ to obtain

$$E[D(t)] = \sum_{n=0}^\infty E[D(t)\,|\,N(t)=n]\,e^{-\lambda t}\frac{(\lambda t)^n}{n!}$$

Now, conditional on $N(t)=n$, the claim arrival times $S_1,\ldots,S_n$ are distributed as the ordered values $U_{(1)},\ldots,U_{(n)}$ of $n$ independent uniform $(0,t)$ random variables $U_1,\ldots,U_n$. Therefore,

$$E[D(t)\,|\,N(t)=n] = E\left[\sum_{i=1}^n C_i e^{-\alpha U_{(i)}}\right] = \sum_{i=1}^n E[C_i e^{-\alpha U_{(i)}}] = \sum_{i=1}^n E[C_i]\,E[e^{-\alpha U_{(i)}}]$$

where the final equality used the independence of the claim amounts and their arrival times. Because $E[C_i]=\mu$, continuing the preceding gives

$$E[D(t)\,|\,N(t)=n] = \mu\sum_{i=1}^n E[e^{-\alpha U_{(i)}}] = \mu E\left[\sum_{i=1}^n e^{-\alpha U_{(i)}}\right] = \mu E\left[\sum_{i=1}^n e^{-\alpha U_i}\right]$$

The last equality follows because $U_{(1)},\ldots,U_{(n)}$ are the values $U_1,\ldots,U_n$ in increasing order, and so $\sum_{i=1}^n e^{-\alpha U_{(i)}} = \sum_{i=1}^n e^{-\alpha U_i}$. Continuing the string of equalities yields

$$E[D(t)\,|\,N(t)=n] = n\mu E[e^{-\alpha U}] = n\frac{\mu}{t}\int_0^t e^{-\alpha x}\,dx = n\frac{\mu}{\alpha t}(1-e^{-\alpha t})$$


Therefore,

$$E[D(t)\,|\,N(t)] = N(t)\frac{\mu}{\alpha t}(1-e^{-\alpha t})$$

Taking expectations yields the result

$$E[D(t)] = \frac{\lambda\mu}{\alpha}(1-e^{-\alpha t})$$

Example 5.22  (An Optimization Example)  Suppose that items arrive at a processing plant in accordance with a Poisson process with rate $\lambda$. At a fixed time $T$, all items are dispatched from the system. The problem is to choose an intermediate time, $t\in(0,T)$, at which all items in the system are dispatched, so as to minimize the total expected wait of all items.

If we dispatch at time $t$, $0<t<T$,


5.3.6. Estimating Software Reliability

When a new computer software package is developed, a testing procedure is often put into effect to eliminate the faults, or bugs, in the package. One common procedure is to try the package on a set of well-known problems to see if any errors result. This goes on for some fixed time, with all resulting errors being noted. Then the testing stops and the package is carefully checked to determine the specific bugs that were responsible for the observed errors. The package is then altered to remove these bugs. Because we cannot be certain that all the bugs in the package have been eliminated, however, a problem of great importance is the estimation of the error rate of the revised software package.

To model the preceding, let us suppose that initially the package contains an unknown number, $m$, of bugs, which we will refer to as bug 1, bug 2, ..., bug $m$. Suppose also that bug $i$ will cause errors to occur in accordance with a Poisson process having an unknown rate $\lambda_i$, $i=1,\ldots,m$. Then, for instance, the number of errors due to bug $i$ that occurs in any $s$ units of operating time is Poisson distributed with mean $\lambda_i s$. Also suppose that these Poisson processes caused by bugs $i$, $i=1,\ldots,m$ are independent. In addition, suppose that the package is to be run for $t$ time units with all resulting errors being noted. At the end of this time a careful check of the package is made to determine the specific bugs that caused the errors (that is, a debugging takes place). These bugs are removed, and the problem is then to determine the error rate for the revised package.

If we let

$$\psi_i(t) = \begin{cases} 1, & \text{if bug } i \text{ has not caused an error by } t\\ 0, & \text{otherwise}\end{cases}$$

then the quantity we wish to estimate is

$$\Lambda(t) = \sum_i \lambda_i\psi_i(t)$$

the error rate of the final package. To start, note that

$$E[\Lambda(t)] = \sum_i \lambda_i E[\psi_i(t)] = \sum_i \lambda_i e^{-\lambda_i t} \tag{5.19}$$

Now each of the bugs that is discovered would have been responsible for a certain number of errors. Let us denote by $M_j(t)$ the number of bugs that were responsible for $j$ errors, $j\geq 1$. That is, $M_1(t)$ is the number of bugs that caused exactly


one error, M_2(t) is the number that caused two errors, and so on, with ∑_j j M_j(t) equaling the total number of errors that resulted. To compute E[M_1(t)], let us define the indicator variables I_i(t), i ≥ 1, by

I_i(t) = 1, if bug i causes exactly 1 error; 0, otherwise

Then,

M_1(t) = ∑_i I_i(t)

and so

E[M_1(t)] = ∑_i E[I_i(t)] = ∑_i λ_i t e^{−λ_i t}    (5.20)

Thus, from (5.19) and (5.20) we obtain the intriguing result that

E[Λ(t) − M_1(t)/t] = 0    (5.21)

This suggests the possible use of M_1(t)/t as an estimator of Λ(t). To determine whether or not M_1(t)/t constitutes a "good" estimator of Λ(t) we shall look at how far apart these two quantities tend to be. That is, we will compute

E[(Λ(t) − M_1(t)/t)²] = Var(Λ(t) − M_1(t)/t)    from (5.21)
 = Var(Λ(t)) − (2/t) Cov(Λ(t), M_1(t)) + (1/t²) Var(M_1(t))

Now,

Var(Λ(t)) = ∑_i λ_i² Var(ψ_i(t)) = ∑_i λ_i² e^{−λ_i t}(1 − e^{−λ_i t}),

Var(M_1(t)) = ∑_i Var(I_i(t)) = ∑_i λ_i t e^{−λ_i t}(1 − λ_i t e^{−λ_i t}),

Cov(Λ(t), M_1(t)) = Cov(∑_i λ_i ψ_i(t), ∑_j I_j(t)) = ∑_i ∑_j Cov(λ_i ψ_i(t), I_j(t))


 = ∑_i λ_i Cov(ψ_i(t), I_i(t)) = −∑_i λ_i e^{−λ_i t} λ_i t e^{−λ_i t}

where the last two equalities follow since ψ_i(t) and I_j(t) are independent when i ≠ j because they refer to different Poisson processes, and ψ_i(t) I_i(t) = 0. Hence we obtain that

E[(Λ(t) − M_1(t)/t)²] = ∑_i λ_i² e^{−λ_i t} + (1/t) ∑_i λ_i e^{−λ_i t}
 = E[M_1(t) + 2M_2(t)]/t²

where the last equality follows from (5.20) and the identity (which we leave as an exercise)

E[M_2(t)] = (1/2) ∑_i (λ_i t)² e^{−λ_i t}    (5.22)

Thus, we can estimate the average square of the difference between Λ(t) and M_1(t)/t by the observed value of M_1(t) + 2M_2(t) divided by t².

Example 5.23: Suppose that in 100 units of operating time 20 bugs are discovered, of which two resulted in exactly one error and three resulted in exactly two errors. Then we would estimate that Λ(100) is something akin to the value of a random variable whose mean is equal to M_1(100)/100 = 2/100 = 1/50 and whose variance is equal to (M_1(100) + 2M_2(100))/100² = (2 + 6)/10,000 = 8/10,000. ■

5.4. Generalizations of the Poisson Process

5.4.1. Nonhomogeneous Poisson Process

In this section we consider two generalizations of the Poisson process. The first of these is the nonhomogeneous, also called the nonstationary, Poisson process, which is obtained by allowing the arrival rate at time t to be a function of t.

Definition 5.4: The counting process {N(t), t ≥ 0} is said to be a nonhomogeneous Poisson process with intensity function λ(t), t ≥ 0, if

(i) N(0) = 0.
(ii) {N(t), t ≥ 0} has independent increments.
(iii) P{N(t + h) − N(t) ≥ 2} = o(h).
(iv) P{N(t + h) − N(t) = 1} = λ(t)h + o(h).
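The estimator M_1(t)/t from Section 5.3.6 above is easy to sanity-check by simulation. The sketch below is not part of the text; the bug rates λ_i in it are made-up values, and it simply verifies that the sample averages of Λ(t) and M_1(t)/t both approach ∑_i λ_i e^{−λ_i t}, as Equations (5.19)–(5.21) assert.

```python
# Simulation sketch (not from the text): checks that M1(t)/t is an
# unbiased estimator of the residual error rate Lambda(t) in the
# software-reliability model. The bug rates below are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
rates = np.array([0.5, 1.0, 0.2, 2.0, 0.1])  # made-up lambda_i
t = 1.0
n_rep = 200_000

# Errors caused by each bug during (0, t]: independent Poisson(lambda_i * t)
counts = rng.poisson(rates * t, size=(n_rep, len(rates)))

# Lambda(t): total rate of the bugs that caused no error by time t
lam_t = (rates * (counts == 0)).sum(axis=1)
# M1(t): number of bugs that caused exactly one error by time t
m1 = (counts == 1).sum(axis=1)

analytic = (rates * np.exp(-rates * t)).sum()  # E[Lambda(t)] = E[M1(t)]/t
print(lam_t.mean(), m1.mean() / t, analytic)  # all three should be close
```

With 200,000 replications both sample means agree with the analytic value ∑_i λ_i e^{−λ_i t} to about two decimal places, illustrating the unbiasedness result (5.21).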


5.4. Generalizations of the Poisson Process 331Time sampling an ordinary Poisson process generates a nonhomogeneous Poissonprocess. That is, let {N(t),t 0} be a Poisson process with rate λ, and supposethat an event occurring at time t is, independently of what has occurred priorto t, counted with probability p(t). With N c (t) denoting the number of countedevents by time t, the counting process {N c (t), t 0} is a nonhomogeneous Poissonprocess with intensity function λ(t) = λp(t). This is verified by noting that{N c (t), t 0} satisfies the nonhomogeneous Poisson process axioms.1. N c (0) = 0.2. The number of counted events in (s, s + t) depends solely on the numberof events of the Poisson process in (s, s + t), which is independent of whathas occurred prior to time s. Consequently, the number of counted eventsin (s, s + t) is independent of the process of counted events prior to s, thusestablishing the independent increment property.3. Let N c (t, t +h) = N c (t +h)−N c (t), with a similar definition of N(t,t+h).P {N c (t, t + h) 2} P {N(t,t + h) 2}=o(h)4. To compute P {N c (t, t + h) = 1}, condition on N(t,t + h).P {N c (t, t + h) = 1}= P {N c (t, t + h) = 1|N(t,t + h) = 1}P {N(t,t + h) = 1}+ P {N c (t, t + h) = 1|N(t,t + h) 2}P {N(t,t + h) 2}= P {N c (t, t + h) = 1|N(t,t + h) = 1}λh + o(h)= p(t)λh + o(h)Not only does time sampling a Poisson process result in a nonhomogeneousPoisson process, but it also works the other way: every nonhomogeneous Poissonprocess with a bounded intensity function can be thought of as being a timesampling of a Poisson process. To show this, we start by showing that the superpositionof two independent nonhomogeneous Poisson processes remains anonhomogeneous Poisson process.Proposition 5.4 Let {N(t),t 0}, and {M(t),t 0}, be independent nonhomogeneousPoisson processes, with respective intensity functions λ(t) andμ(t), and let N ∗ (t) = N(t)+ M(t). 
Then, the following are true.(a) {N ∗ (t), t 0} is a nonhomogeneous Poisson process with intensity functionλ(t) + μ(t).(b) Given that an event of the {N ∗ (t)} process occurs at time t then, independentof what occurred prior to t, the event at t was from the {N(t)} process withprobabilityλ(t)λ(t)+μ(t) .


332 5 The Exponential Distribution and the Poisson ProcessProof To verify that {N ∗ (t), t 0}, is a nonhomogeneous Poisson process withintensity function λ(t) + μ(t), we will argue that it satisfies the nonhomogeneousPoisson process axioms.1. N ∗ (0) = N(0) + M(0) = 0.2. To verify independent increments, let I 1 ,...,I n be nonoverlapping intervals.Let N(I) and M(I) denote, respectively, the number of eventsfrom the {N(t)} process and from the {M(t)} process that are in theinterval I . Because each counting process has independent increments,and the two processes are independent of each other, it follows thatN(I 1 ),...,N(I n ), M(I 1 ), . . . , M(I n ) are all independent, and thus so areN(I 1 ) + M(I 1 ),...,N(I n ) + M(I n ), which shows that {N ∗ (t), t 0} alsopossesses independent increments.3. In order for there to be exactly one event of the N ∗ process between t andt + h, either there must be one event of the N process and 0 events of theM process or the reverse. The first of these mutually exclusive possibilitiesoccurs with probabilityP {N(t,t + h) = 1,M(t,t + h) = 0}= P {N(t,t + h) = 1} P {M(t,t + h) = 0}= (λ(t)h + o(h)) (1 − μ(t)h + o(h))= λ(t)h + o(h)Similarly, the second possibility occurs with probabilityyielding thatP {N(t,t + h) = 0,M(t,t + h) = 1}=μ(t)h + o(h)P {N ∗ (t + h) − N ∗ (t) = 1}=(λ(t) + μ(t))h + o(h)4. In order for there to be at least two events of the N ∗ process between tand t + h, one of the following three possibilities must occur: there is atleast two events of the N process between t and t + h; there is at leasttwo events of the M process between t and t + h; or both processes haveexactly one event between t and t + h. Each of the first two of these possibilitiesoccurs with probability o(h), while the third occurs with probability(λ(t)h + o(h))(μ(t)h + o(h)) = o(h). 
Thus,P {N ∗ (t + h) − N ∗ (t) 2} o(h)Thus, part (a) is proven.To prove (b), note first that it follows from independent increments that whichprocess caused the event at time t is independent of what occurred prior to t. To


find the conditional probability that the event at time t is from the N process, we use that

P{N(t, t + h) = 1 | N*(t, t + h) = 1}
 = P{N(t, t + h) = 1, M(t, t + h) = 0} / P{N*(t, t + h) = 1}
 = (λ(t)h + o(h)) / ((λ(t) + μ(t))h + o(h))
 = (λ(t) + o(h)/h) / (λ(t) + μ(t) + o(h)/h)

Letting h → 0 in the preceding proves (b). ■

Now, suppose that {N(t), t ≥ 0} is a nonhomogeneous Poisson process with a bounded intensity function λ(t), and suppose that λ is such that λ(t) ≤ λ for all t ≥ 0. Letting {M(t), t ≥ 0} be a nonhomogeneous Poisson process with intensity function μ(t) = λ − λ(t), t ≥ 0, that is independent of {N(t), t ≥ 0}, it follows from Proposition 5.4 that {N(t), t ≥ 0} can be regarded as the process of time-sampled events of the Poisson process {N(t) + M(t), t ≥ 0}, where an event of the Poisson process that occurs at time t is counted with probability p(t) = λ(t)/λ.

With this interpretation of a nonhomogeneous Poisson process as a time-sampled Poisson process, the number of events of the nonhomogeneous Poisson process by time t, namely N(t), is equal to the number of counted events of the Poisson process by time t. Consequently, from Proposition 5.3 it follows that N(t) is a Poisson random variable with mean

E[N(t)] = λ ∫₀ᵗ (λ(y)/λ) dy = ∫₀ᵗ λ(y) dy

Moreover, by regarding the nonhomogeneous Poisson process as starting at time s, the preceding yields that N(s + t) − N(s), the number of events in the t time units following s, is a Poisson random variable with mean ∫₀ᵗ λ(s + y) dy = ∫ₛ^{s+t} λ(y) dy. The function m(t) defined by

m(t) = ∫₀ᵗ λ(y) dy

is called the mean value function of the nonhomogeneous Poisson process.

Remark: That N(s + t) − N(s) has a Poisson distribution with mean ∫ₛ^{s+t} λ(y) dy is a consequence of the Poisson limit of the sum of independent


334 5 The Exponential Distribution and the Poisson ProcessBernoulli random variables (see Example 2.47). To see why, subdivide the interval[s,s + t] into n subintervals of length t n, where subinterval i goes froms + (i − 1)n t to s + i n t , i = 1,...,n.LetN i = N(s + in t ) − N(s + (i − 1) n t ) bethe number of events that occur in subinterval i, and note that( n)⋃P { 2 events in some subinterval}=P {N i 2}Becausei=1n∑P {N i 2}i=1= no(t/n) by Axiom (iii)lim no(t/n) = lim t o(t/n)n→∞ n→∞ t/n = 0it follows that, as n increases to ∞, the probability of having two or more eventsin any of the n subintervals goes to 0. Consequently, with a probability going to 1,N(t) will equal the number of subintervals in which an event occurs. Because theprobability of an event in subinterval i is λ(s + in t ) n t + o( n t ), it follows, becausethe number of events in different subintervals are independent, that when n islarge the number of subintervals that contain an event is approximately a Poissonrandom variable with meann∑(λ s + in) t tn + no(t/n)But,limn∑n→∞i=1and the result follows.i=1(λ s + i t ) ∫ t s+tn n + no(t/n) = λ(y) dysThe importance of the nonhomogeneous Poisson process resides in the fact thatwe no longer require the condition of stationary increments. Thus we now allowfor the possibility that events may be more likely to occur at certain times thanduring other times.Example 5.24 Siegbert runs a hot dog stand that opens at 8 A.M. From8until 11 A.M. customers seem to arrive, on the average, at a steadily increasingrate that starts with an initial rate of 5 customers per hour at 8 A.M. and reachesa maximum of 20 customers per hour at 11 A.M. From11A.M. until 1 P.M. the


(average) rate seems to remain constant at 20 customers per hour. However, the (average) arrival rate then drops steadily from 1 P.M. until closing time at 5 P.M., at which time it has the value of 12 customers per hour. If we assume that the numbers of customers arriving at Siegbert's stand during disjoint time periods are independent, then what is a good probability model for the preceding? What is the probability that no customers arrive between 8:30 A.M. and 9:30 A.M. on Monday morning? What is the expected number of arrivals in this period?

Solution: A good model for the preceding would be to assume that arrivals constitute a nonhomogeneous Poisson process with intensity function λ(t) given by

λ(t) = 5 + 5t,  0 ≤ t ≤ 3
λ(t) = 20,  3 ≤ t ≤ 5
λ(t) = 20 − 2(t − 5),  5 ≤ t ≤ 9

and

λ(t) = λ(t − 9) for t > 9

Note that N(t) represents the number of arrivals during the first t hours that the store is open. That is, we do not count the hours between 5 P.M. and 8 A.M. If for some reason we wanted N(t) to represent the number of arrivals during the first t hours regardless of whether the store was open or not, then, assuming that the process begins at midnight, we would let

λ(t) = 0,  0 ≤ t ≤ 8
λ(t) = 5 + 5(t − 8),  8 ≤ t ≤ 11
λ(t) = 20,  11 ≤ t ≤ 13
λ(t) = 20 − 2(t − 13),  13 ≤ t ≤ 17
λ(t) = 0,  17 < t ≤ 24

As the number of arrivals between 8:30 A.M. and 9:30 A.M. will be Poisson with mean m(3/2) − m(1/2) in the first representation [and m(19/2) − m(17/2) in the second representation], we have that the probability that this number is zero is

exp{−∫_{1/2}^{3/2} (5 + 5t) dt} = e^{−10}

and the mean number of arrivals is

∫_{1/2}^{3/2} (5 + 5t) dt = 10 ■
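The numbers in this solution can be checked in code. The sketch below is an addition, not part of the text: it integrates the first-representation intensity numerically to recover m(3/2) − m(1/2) = 10, and it also simulates the process by the time-sampling (thinning) idea described earlier, using the bound λ(t) ≤ 20 on this interval: generate candidate events at rate 20 and keep one occurring at time s with probability λ(s)/20.

```python
# Sketch (not from the text): verifies the Siegbert example numerically.
# Assumes the first-representation intensity; lambda_bar = 20 bounds
# lambda(t), so the process can be simulated by thinning.
import numpy as np

def lam(t):  # intensity while the store is open, 0 <= t <= 9
    if t <= 3:
        return 5.0 + 5.0 * t
    if t <= 5:
        return 20.0
    return 20.0 - 2.0 * (t - 5.0)

# m(3/2) - m(1/2) by trapezoidal integration (should equal 10)
step = 1e-4
xs = [0.5 + i * step for i in range(10001)]
vals = [lam(x) for x in xs]
mean_arrivals = sum((vals[i] + vals[i + 1]) / 2 for i in range(10000)) * step

# Thinning: candidates at rate lambda_bar on (1/2, 3/2], each kept
# with probability lam(s) / lambda_bar.
rng = np.random.default_rng(2)
lam_bar, n_rep = 20.0, 20_000
counts = np.empty(n_rep)
for r in range(n_rep):
    n = rng.poisson(lam_bar)            # candidate events in the one-hour window
    s = rng.uniform(0.5, 1.5, size=n)   # given n, the times are i.i.d. uniform
    keep = rng.random(n) < np.array([lam(x) for x in s]) / lam_bar
    counts[r] = keep.sum()
print(mean_arrivals, counts.mean())     # both should be near 10
```

The simulated mean count agrees with the mean value function increment, consistent with N(3/2) − N(1/2) being Poisson with mean 10. (The probability e^{−10} of zero arrivals is far too small to estimate reliably with this many replications, so it is not simulated here.)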


336 5 The Exponential Distribution and the Poisson ProcessSuppose that events occur according to a Poisson process with rate λ, andsuppose that, independent of what has previously occurred, an event at times is a type 1 event with probability P 1 (s) or a type 2 event with probabilityP 2 (s) = 1 − P 1 (s).IfN i (t), t 0, denotes the number of type i events by time t,then it easily follows from Definition 5.4 that {N 1 (t), t 0} and {N 2 (t), t 0} areindependent nonhomogeneous Poisson processes with respective intensity functionsλ i (t) = λP i (t), i = 1, 2. (The proof mimics that of Proposition 5.2.)Thisresultgives us another way of understanding (or of proving) the time sampling Poissonprocess result of Proposition 5.3 which states that N 1 (t) and N 2 (t) are independentPoisson random variables with means E[N i (t)]=λ ∫ t0 P i(s) ds, i = 1, 2.Example 5.25 (The Output Process of an Infinite Server Poisson Queue) Itturns out that the output process of the M/G/∞ queue—that is, of the infiniteserver queue having Poisson arrivals and general service distribution G—is a nonhomogeneousPoisson process having intensity function λ(t) = λG(t). Toverifythis claim, let us first argue that the departure process has independent increments.Towards this end, consider nonoverlapping intervals O 1 ,...,O k ; now say that anarrival is type i, i = 1,...,k, if that arrival departs in the interval O i . By Proposition5.3, it follows that the numbers of departures in these intervals are independent,thus establishing independent increments. Now, suppose that an arrivalis “counted” if that arrival departs between t and t + h. Because an arrival at times,s


5.4. Generalizations of the Poisson Process 337If we let S n denote the time of the nth event of the nonhomogeneous Poissonprocess, then we can obtain its density as follows:P {t


(iii) Suppose customers leave a supermarket in accordance with a Poisson process. If Y_i, the amount spent by the ith customer, i = 1, 2, ..., are independent and identically distributed, then {X(t), t ≥ 0} is a compound Poisson process when X(t) denotes the total amount of money spent by time t. ■

Because X(t) is a compound Poisson random variable with Poisson parameter λt, we have from Examples 3.10 and 3.17 that

E[X(t)] = λt E[Y_1]    (5.24)

and

Var(X(t)) = λt E[Y_1²]    (5.25)

Example 5.26: Suppose that families migrate to an area at a Poisson rate λ = 2 per week. If the number of people in each family is independent and takes on the values 1, 2, 3, 4 with respective probabilities 1/6, 1/3, 1/3, 1/6, then what is the expected value and variance of the number of individuals migrating to this area during a fixed five-week period?

Solution: Letting Y_i denote the number of people in the ith family, we have that

E[Y_i] = 1·(1/6) + 2·(1/3) + 3·(1/3) + 4·(1/6) = 5/2,
E[Y_i²] = 1²·(1/6) + 2²·(1/3) + 3²·(1/3) + 4²·(1/6) = 43/6

Hence, letting X(5) denote the number of immigrants during a five-week period, we obtain from Equations (5.24) and (5.25) that

E[X(5)] = 2 · 5 · 5/2 = 25

and

Var[X(5)] = 2 · 5 · 43/6 = 215/3 ■

Example 5.27 (Busy Periods in Single-Server Poisson Arrival Queues): Consider a single-server service station in which customers arrive according to a Poisson process having rate λ. An arriving customer is immediately served if the server is free; if not, the customer waits in line (that is, he or she joins the queue). The successive service times are independent with a common distribution.

Such a system will alternate between idle periods, when there are no customers in the system and so the server is idle, and busy periods, when there are customers in the system and so the server is busy. A busy period will begin when an arrival finds the system empty, and because of the memoryless property of the Poisson arrivals


5.4. Generalizations of the Poisson Process 339it follows that the distribution of the length of a busy period will be the same foreach such period. Let B denote the length of a busy period. We will compute itsmean and variance.To begin, let S denote the service time of the first customer in the busy periodand let N(S) denote the number of arrivals during that time. Now, if N(S) = 0then the busy period will end when the initial customer completes his service, andso B will equal S in this case. Now, suppose that one customer arrives duringthe service time of the initial customer. Then, at time S there will be a singlecustomer in the system who is just about to enter service. Because the arrivalstream from time S on will still be a Poisson process with rate λ, it thus followsthat the additional time from S until the system becomes empty will have the samedistribution as a busy period. That is, if N(S)= 1 thenB = S + B 1where B 1 is independent of S and has the same distribution as B.Now, consider the general case where N(S) = n, so there will be n customerswaiting when the server finishes his initial service. To determine the distributionof remaining time in the busy period note that the order in which customersare served will not affect the remaining time. Hence, let us suppose that the n arrivals,call them C 1 ,...,C n , during the initial service period are served as follows.Customer C 1 is served first, but C 2 is not served until the only customers in thesystem are C 2 ,...,C n . For instance, any customers arriving during C 1 ’s servicetime will be served before C 2 . Similarly, C 3 is not served until the system is freeof all customers but C 3 ,...,C n , and so on. 
A little thought reveals that the timesbetween the beginnings of service of customers C i and C i+1 , i = 1,...,n− 1,and the time from the beginning of service of C n until there are no customersin the system, are independent random variables, each distributed as a busy period.It follows from the preceding that if we let B 1 ,B 2 ,... be a sequence of independentrandom variables, each distributed as a busy period, then we can expressB asN(S)∑B = S +i=1B iHence,⎡ ⎤N(S)∑E[B|S]=S + E ⎣ B i |S⎦i=1


340 5 The Exponential Distribution and the Poisson Processand⎛ ⎞N(S)∑Var(B|S) = Var ⎝ B i |S⎠i=1However, given S, ∑ N(S)i=1 B i is a compound Poisson random variable, and thusfrom Equations (5.24) and (5.25) we obtainHence,E[B|S]=S + λSE[B]=(1 + λE[B])SVar(B|S) = λSE[B 2 ]implying, provided that λE[S] < 1, thatE[B]=E[E[B|S]] = (1 + λE[B])E[S]E[B]=E[S]1 − λE[S]Also, by the conditional variance formulayieldingVar(B) = Var(E[B|S]) + E[Var(B|S)]= (1 + λE[B]) 2 Var(S) + λE[S]E[B 2 ]= (1 + λE[B]) 2 Var(S) + λE[S](Var(B) + E 2 [B])Var(B) = Var(S)(1 + λE[B])2 + λE[S]E 2 [B]1 − λE[S]Using E[B]=E[S]/(1 − λE[S]), we obtainVar(B) = Var(S) + λE3 [S](1 − λE[S]) 3 There is a very nice representation of the compound Poisson process when theset of possible values of the Y i is finite or countably infinite. So let us supposethat there are numbers α j ,j 1, such that∑P {Y 1 = α j }=p j , p j = 1Now, a compound Poisson process arises when events occur according to aPoisson process and each event results in a random amount Y being added toj


5.4. Generalizations of the Poisson Process 341the cumulative sum. Let us say that the event is a type j event whenever it resultsin adding the amount α j ,j 1. That is, the ith event of the Poisson process isa type j event if Y i = α j .IfweletN j (t) denote the number of type j eventsby time t, then it follows from Proposition 5.2 that the random variables N j (t),j 1, are independent Poisson random variables with respective meansE[N j (t)]=λp j tSince, for each j, the amount α j is added to the cumulative sum a total of N j (t)times by time t, it follows that the cumulative sum at time t can be expressed asX(t) = ∑ jα j N j (t) (5.26)As a check of Equation (5.26), let us use it to compute the mean and varianceof X(t). This yields[ ∑]E[X(t)]=E α j N j (t)j= ∑ jα j E[N j (t)]= ∑ jα j λp j t= λt E[Y 1 ]Also,[ ∑Var[X(t)]=Varj]α j N j (t)= ∑ jα 2 j Var[N j (t)] by the independence of the N j (t), j 1= ∑ jα 2 j λp j t= λtE[Y 2 1 ]where the next to last equality follows since the variance of the Poisson randomvariable N j (t) is equal to its mean.Thus, we see that the representation (5.26) results in the same expressions forthe mean and variance of X(t) as were previously derived.
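The representation (5.26) also gives a convenient way to simulate a compound Poisson random variable when the Y_i take finitely many values. The sketch below is an addition, not part of the text: it draws the independent Poisson counts N_j(t), forms X(t) = ∑_j α_j N_j(t), and compares the sample moments with λt E[Y_1] and λt E[Y_1²], using the family-size distribution of Example 5.26 (λ = 2, t = 5).

```python
# Sketch: simulate X(t) = sum_j alpha_j * N_j(t), with the N_j(t)
# independent Poisson(lambda * p_j * t), as in representation (5.26).
# Parameters are taken from Example 5.26.
import numpy as np

rng = np.random.default_rng(3)
lam, t = 2.0, 5.0
alpha = np.array([1.0, 2.0, 3.0, 4.0])      # possible values of Y_1
p = np.array([1/6, 1/3, 1/3, 1/6])          # their probabilities

n_rep = 200_000
N = rng.poisson(lam * p * t, size=(n_rep, 4))  # independent N_j(t)
X = N @ alpha                                   # X(t) for each replication

EY, EY2 = (alpha * p).sum(), (alpha**2 * p).sum()
print(X.mean(), lam * t * EY)    # both near 25
print(X.var(), lam * t * EY2)    # both near 215/3
```

The sample mean and variance match λtE[Y_1] = 25 and λtE[Y_1²] = 215/3, the values derived in Example 5.26, confirming that the representation reproduces the moments (5.24) and (5.25).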


One of the uses of the representation (5.26) is that it enables us to conclude that as t grows large, the distribution of X(t) converges to the normal distribution. To see why, note first that it follows by the central limit theorem that the distribution of a Poisson random variable converges to a normal distribution as its mean increases. (Why is this?) Therefore, each of the random variables N_j(t) converges to a normal random variable as t increases. Because they are independent, and because the sum of independent normal random variables is also normal, it follows that X(t) also approaches a normal distribution as t increases.

Example 5.28: In Example 5.26, find the approximate probability that at least 240 people migrate to the area within the next 50 weeks.

Solution: Since λ = 2, E[Y_i] = 5/2, E[Y_i²] = 43/6, we see that

E[X(50)] = 250,  Var[X(50)] = 4300/6

Now, the desired probability is

P{X(50) ≥ 240} = P{X(50) ≥ 239.5}
 = P{(X(50) − 250)/√(4300/6) ≥ (239.5 − 250)/√(4300/6)}
 = 1 − Φ(−0.3922)
 = Φ(0.3922)
 = 0.6525

where Table 2.3 was used to determine Φ(0.3922), the probability that a standard normal is less than 0.3922. ■

Another useful result is that if {X(t), t ≥ 0} and {Y(t), t ≥ 0} are independent compound Poisson processes with respective Poisson parameters and distributions λ_1, F_1 and λ_2, F_2, then {X(t) + Y(t), t ≥ 0} is also a compound Poisson process. This is true because in this combined process events will occur according to a Poisson process with rate λ_1 + λ_2, and each event independently will be from the first compound Poisson process with probability λ_1/(λ_1 + λ_2). Consequently, the combined process will be a compound Poisson process with Poisson parameter λ_1 + λ_2, and with distribution function F given by

F(x) = (λ_1/(λ_1 + λ_2)) F_1(x) + (λ_2/(λ_1 + λ_2)) F_2(x)
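The table lookup of Φ(0.3922) in Example 5.28 can be reproduced with the standard error function, since Φ(z) = (1 + erf(z/√2))/2. This short check is an addition, not part of the text, and uses only the Python standard library.

```python
# Sketch: redo the normal approximation of Example 5.28 with erf
# in place of a table lookup: Phi(z) = (1 + erf(z / sqrt(2))) / 2.
from math import erf, sqrt

mean, var = 250.0, 4300 / 6
z = (239.5 - mean) / sqrt(var)          # standardized value, continuity-corrected
prob = 1 - (1 + erf(z / sqrt(2))) / 2   # P{X(50) >= 240} ~ 1 - Phi(z)
print(prob)                             # close to the table value 0.6525
```

The computed value agrees with the tabulated 0.6525 to the table's precision; small discrepancies in the fourth decimal reflect the rounding of Table 2.3.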


5.4.3. Conditional or Mixed Poisson Processes

Let {N(t), t ≥ 0} be a counting process whose probabilities are defined as follows. There is a positive random variable L such that, conditional on L = λ, the counting process is a Poisson process with rate λ. Such a counting process is called a conditional or a mixed Poisson process.

Suppose that L is continuous with density function g. Because

P{N(t + s) − N(s) = n} = ∫₀^∞ P{N(t + s) − N(s) = n | L = λ} g(λ) dλ
 = ∫₀^∞ e^{−λt} ((λt)ⁿ/n!) g(λ) dλ    (5.27)

we see that a conditional Poisson process has stationary increments. However, because knowing how many events occur in an interval gives information about the possible value of L, which affects the distribution of the number of events in any other interval, it follows that a conditional Poisson process does not generally have independent increments. Consequently, a conditional Poisson process is not generally a Poisson process.

Example 5.29: If g is the gamma density with parameters m and θ,

g(λ) = θ e^{−θλ} (θλ)^{m−1}/(m − 1)!,  λ > 0

then

P{N(t) = n} = ∫₀^∞ e^{−λt} ((λt)ⁿ/n!) θ e^{−θλ} ((θλ)^{m−1}/(m − 1)!) dλ
 = (tⁿ θᵐ/(n!(m − 1)!)) ∫₀^∞ e^{−(t+θ)λ} λ^{n+m−1} dλ

Multiplying and dividing by (n + m − 1)!/(t + θ)^{n+m} gives

P{N(t) = n} = (tⁿ θᵐ (n + m − 1)!/(n!(m − 1)!(t + θ)^{n+m})) ∫₀^∞ ((t + θ) e^{−(t+θ)λ} ((t + θ)λ)^{n+m−1}/(n + m − 1)!) dλ

Because (t + θ) e^{−(t+θ)λ} ((t + θ)λ)^{n+m−1}/(n + m − 1)! is the density function of a gamma (n + m, t + θ) random variable, its integral is 1, giving the result

P{N(t) = n} = (n + m − 1 choose n) (θ/(t + θ))ᵐ (t/(t + θ))ⁿ

Therefore, the number of events in an interval of length t has the same distribution as the number of failures that occur before a total of m successes are amassed, when each trial is a success with probability θ/(t + θ). ■
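The negative binomial formula of Example 5.29 can be verified numerically. The sketch below is an addition, not part of the text: it integrates the Poisson–gamma mixture (5.27) on a grid and compares the result with the closed form, for the arbitrarily chosen values m = 3, θ = 2, t = 1, n = 2.

```python
# Sketch: check P{N(t)=n} = C(n+m-1, n) (theta/(t+theta))^m (t/(t+theta))^n
# by numerically integrating e^{-lt}(lt)^n/n! against the gamma density g.
from math import comb, exp, factorial

m, theta, t, n = 3, 2.0, 1.0, 2

def g(lam):  # gamma density with parameters m and theta
    return theta * exp(-theta * lam) * (theta * lam) ** (m - 1) / factorial(m - 1)

# Riemann-sum integration of the mixture (5.27) over a truncated grid
dl, L = 1e-4, 40.0
mixture = sum(
    exp(-lam * t) * (lam * t) ** n / factorial(n) * g(lam)
    for lam in (i * dl for i in range(1, int(L / dl)))
) * dl

closed_form = comb(n + m - 1, n) * (theta / (t + theta)) ** m * (t / (t + theta)) ** n
print(mixture, closed_form)   # both ~ 48/243
```

Both numbers agree to the accuracy of the grid, confirming that the mixture integral collapses to the negative binomial probability.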


344 5 The Exponential Distribution and the Poisson ProcessTo compute the mean and variance of N(t), condition on L. Because, conditionalon L, N(t) is Poisson with mean Lt, we obtainE[N(t)|L]=LtVar(N(t)|L) = Ltwhere the final equality used that the variance of a Poisson random variable isequal to its mean. Consequently, the conditional variance formula yieldsVar(N(t)) = E[Lt]+Var(Lt)= tE[L]+t 2 Var(L)We can compute the conditional distribution function of L, given that N(t)= n,as follows.P {L x, N(t) = n}P {L x|N(t)= n}=P {N(t)= n}∫ ∞0P {L x,N(t) = n|L = λ}g(λ)dλ=P {N(t)= n}∫ x0P {N(t)= n|L = λ}g(λ)dλ=P {N(t)= n}∫ x0=e−λt (λt) n g(λ)dλ∫ ∞0e −λt (λt) n g(λ)dλwhere the final equality used Equation (5.27). In other words, the conditionaldensity function of L given that N(t)= n isf L|N(t) (λ | n) =e −λt λ n g(λ)∫ ∞0e −λt λ n , λ 0 (5.28)g(λ)dλExample 5.30 An insurance company feels that each of its policyholders hasa rating value and that a policyholder having rating value λ will make claims attimes distributed according to a Poisson process with rate λ, when time is measuredin years. The firm also believes that rating values vary from policyholder topolicyholder, with the probability distribution of the value of a new policyholderbeing uniformly distributed over (0, 1). Given that a policyholder has made nclaims in his or her first t years, what is the conditional distribution of the timeuntil the policyholder’s next claim?


Solution: If T is the time until the next claim, then we want to compute P{T > x | N(t) = n}. Conditioning on the policyholder's rating value gives, upon using Equation (5.28),

P{T > x | N(t) = n} = ∫₀^∞ P{T > x | L = λ, N(t) = n} f_{L|N(t)}(λ | n) dλ
 = ∫₀¹ e^{−λx} e^{−λt} λⁿ dλ / ∫₀¹ e^{−λt} λⁿ dλ ■

There is a nice formula for the probability that more than n events occur in an interval of length t. In deriving it we will use the identity

∑_{j=n+1}^∞ e^{−λt} (λt)ʲ/j! = ∫₀ᵗ λ e^{−λx} ((λx)ⁿ/n!) dx    (5.29)

which follows by noting that it equates the probability that the number of events by time t of a Poisson process with rate λ is greater than n with the probability that the time of the (n + 1)st event of this process (which has a gamma (n + 1, λ) distribution) is less than t. Interchanging λ and t in Equation (5.29) yields the equivalent identity

∑_{j=n+1}^∞ e^{−λt} (λt)ʲ/j! = ∫₀^λ t e^{−tx} ((tx)ⁿ/n!) dx    (5.30)

Using Equation (5.27) we now have

P{N(t) > n} = ∫₀^∞ ∑_{j=n+1}^∞ e^{−λt} ((λt)ʲ/j!) g(λ) dλ
 = ∫₀^∞ ∫₀^λ t e^{−tx} ((tx)ⁿ/n!) dx g(λ) dλ    (using (5.30))
 = ∫₀^∞ ∫ₓ^∞ g(λ) dλ t e^{−tx} ((tx)ⁿ/n!) dx    (by interchanging the order of integration)
 = ∫₀^∞ Ḡ(x) t e^{−tx} ((tx)ⁿ/n!) dx
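This final formula can also be checked numerically. The sketch below is an addition, not part of the text: it reuses the gamma mixing density of Example 5.29 (again with the arbitrarily chosen m = 3, θ = 2, t = 1, n = 2), for which Ḡ is an Erlang tail, and compares ∫₀^∞ Ḡ(x) t e^{−tx} (tx)ⁿ/n! dx with the negative binomial tail probability.

```python
# Sketch: check P{N(t) > n} = Int_0^inf Gbar(x) t e^{-tx} (tx)^n/n! dx
# for the gamma(m, theta) mixing density of Example 5.29.
from math import comb, exp, factorial

m, theta, t, n = 3, 2.0, 1.0, 2

def Gbar(x):  # P{L > x} for gamma(shape m, rate theta): Erlang tail sum
    return exp(-theta * x) * sum((theta * x) ** k / factorial(k) for k in range(m))

# Riemann-sum integration over a truncated grid
dx, X = 1e-4, 40.0
integral = sum(
    Gbar(x) * t * exp(-t * x) * (t * x) ** n / factorial(n)
    for x in (i * dx for i in range(1, int(X / dx)))
) * dx

# Negative binomial tail from Example 5.29: 1 - sum_{k<=n} P{N(t)=k}
def pmf(k):
    return comb(k + m - 1, k) * (theta / (t + theta)) ** m * (t / (t + theta)) ** k

tail = 1 - sum(pmf(k) for k in range(n + 1))
print(integral, tail)   # both ~ 51/243
```

The two values coincide to the accuracy of the grid, as the interchange-of-integration argument above guarantees.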


346 5 The Exponential Distribution and the Poisson ProcessExercises1. The time T required to repair a machine is an exponentially distributed randomvariable with mean 1 2 (hours).(a) What is the probability that a repair time exceeds 1 2 hour?(b) What is the probability that a repair takes at least 12 1 2hours given that itsduration exceeds 12 hours?2. Suppose that you arrive at a single-teller bank to find five other customers inthe bank, one being served and the other four waiting in line. You join the end ofthe line. If the service times are all exponential with rate μ, what is the expectedamount of time you will spend in the bank?3. Let X be an exponential random variable. Without any computations, tellwhich one of the following is correct. Explain your answer.(a) E[X 2 |X>1]=E[(X + 1) 2 ];(b) E[X 2 |X>1]=E[X 2 ]+1;(c) E[X 2 |X>1]=(1 + E[X]) 2 .4. Consider a post office with two clerks. Three people, A, B, and C, entersimultaneously. A and B go directly to the clerks, and C waits until either A or Bleaves before he begins service. What is the probability that A is still in the postoffice after the other two have left when(a) the service time for each clerk is exactly (nonrandom) ten minutes?(b) the service times are i with probability 1 3, i = 1, 2, 3?(c) the service times are exponential with mean 1/μ?5. The lifetime of a radio is exponentially distributed with a mean of ten years.If Jones buys a ten-year-old radio, what is the probability that it will be workingafter an additional ten years?6. In Example 5.3 if server i serves at an exponential rate λ i , i = 1, 2, show that( ) 2 ( ) 2λ1λ2P {Smith is not last}=+λ 1 + λ 2 λ 1 + λ 2*7. If X 1 and X 2 are independent nonnegative continuous random variables,show thatP {X 1


Exercises 347suppose that ∑ ni=1 P {T = i}=1. Show that the failure rate function r(t) of X Tis given byn∑r(t) = r i (t)P {T = i|X>t}i=19. Machine 1 is currently working. Machine 2 will be put in use at a time t fromnow. If the lifetime of machine i is exponential with rate λ i , i = 1, 2, what is theprobability that machine 1 is the first machine to fail?*10. Let X and Y be independent exponential random variables with respectiverates λ and μ.LetM = min(X, Y ). Find(a) E[MX|M = X],(b) E[MX|M = Y ],(c) Cov(X, M).11. Let X, Y 1 ,...,Y n be independent exponential random variables; X havingrate λ, and Y i having rate μ. LetA j be the event that the jth smallest of thesen + 1 random variables is one of the Y i .Findp = P {X >max i Y i },byusingtheidentityp = P(A 1 ···A n ) = P(A 1 )P (A 2 |A 1 ) ···P(A n |A 1 ···A n−1 )Verify your answer when n = 2 by conditioning on X to obtain p.12. If X i , i = 1, 2, 3, are independent exponential random variables with ratesλ i , i = 1, 2, 3, find(a) P {X 1


348 5 The Exponential Distribution and the Poisson Process17. A set of n cities is to be connected via communication links. The cost toconstruct a link between cities i and j is C ij , i ≠ j. Enough links should be constructedso that for each pair of cities there is a path of links that connects them.As a result, only n − 1 links need be constructed. A minimal cost algorithm forsolving this problem (known as the minimal spanning tree problem) first constructsthe cheapest of all the ( n2)links. Then, at each additional stage it choosesthe cheapest link that connects a city without any links to one with links. That is,if the first link is between cities 1 and 2, then the second link will either be between1 and one of the links 3,...,n or between 2 and one of the links 3,...,n.Suppose that all of the ( n2)costs Cij are independent exponential random variableswith mean 1. Find the expected cost of the preceding algorithm if(a) n = 3,(b) n = 4.*18. Let X 1 and X 2 be independent exponential random variables, each havingrate μ.LetX (1) = minimum(X 1 ,X 2 ) and X (2) = maximum(X 1 ,X 2 )Find(a) E[X (1) ],(b) Var[X (1) ],(c) E[X (2) ],(d) Var[X (2) ].19. Repeat Exercise 18, but this time suppose that the X i are independent exponentialswith respective rates μ i , i = 1, 2.20. Consider a two-server system in which a customer is served first by server 1,then by server 2, and then departs. The service times at server i are exponentialrandom variables with rates μ i , i = 1, 2. 
When you arrive, you find server 1 freeand two customers at server 2—customer A in service and customer B waiting inline.(a) Find P A , the probability that A is still in service when you move over toserver 2.(b) Find P B , the probability that B is still in the system when you move overto server 2.(c) Find E[T ], where T is the time that you spend in the system.Hint:WriteT = S 1 + S 2 + W A + W Bwhere S i is your service time at server i, W A is the amount of time you waitin queue while A is being served, and W B is the amount of time you wait inqueue while B is being served.


Exercises 34921. In a certain system, a customer must first be served by server 1 and thenby server 2. The service times at server i are exponential with rate μ i , i = 1, 2.An arrival finding server 1 busy waits in line for that server. Upon completion ofservice at server 1, a customer either enters service with server 2 if that serveris free or else remains with server 1 (blocking any other customer from enteringservice) until server 2 is free. Customers depart the system after being served byserver 2. Suppose that when you arrive there is one customer in the system andthat customer is being served by server 1. What is the expected total time youspend in the system?22. Suppose in Exercise 21 you arrive to find two others in the system, one beingserved by server 1 and one by server 2. What is the expected time you spend in thesystem? Recall that if server 1 finishes before server 2, then server 1’s customerwill remain with him (thus blocking your entrance) until server 2 becomes free.*23. A flashlight needs two batteries to be operational. Consider such a flashlightalong with a set of n functional batteries—battery 1, battery 2,..., battery n.Initially, battery 1 and 2 are installed. Whenever a battery fails, it is immediatelyreplaced by the lowest numbered functional battery that has not yet been put inuse. Suppose that the lifetimes of the different batteries are independent exponentialrandom variables each having rate μ. At a random time, call it T , a battery willfail and our stockpile will be empty. At that moment exactly one of the batteries—which we call battery X—will not yet have failed.(a) What is P {X = n}?(b) What is P {X = 1}?(c) What is P {X = i}?(d) Find E[T ].(e) What is the distribution of T ?24. There are 2 servers available to process n jobs. Initially, each server beginswork on a job. 
Whenever a server completes work on a job, that job leaves the system and the server begins processing a new job (provided there are still jobs waiting to be processed). Let T denote the time until all jobs have been processed. If the time that it takes server i to process a job is exponentially distributed with rate μ_i, i = 1, 2, find E[T] and Var(T).

25. Customers can be served by any of three servers, where the service times of server i are exponentially distributed with rate μ_i, i = 1, 2, 3. Whenever a server becomes free, the customer who has been waiting the longest begins service with that server.
(a) If you arrive to find all three servers busy and no one waiting, find the expected time until you depart the system.
(b) If you arrive to find all three servers busy and one person waiting, find the expected time until you depart the system.
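Exercise 25(a) yields to memorylessness: the time until the first server frees up is exponential with rate μ_1 + μ_2 + μ_3, and it is server i that frees up with probability μ_i/(μ_1 + μ_2 + μ_3), after which you are served by that server. A Monte Carlo sketch (rates chosen arbitrarily) can be used to check whatever closed form you derive:

```python
import random

def expected_departure_all_busy(mu=(1.0, 2.0, 3.0), n=200_000, seed=1):
    """Estimate your expected time in system for Exercise 25(a)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # By memorylessness, the remaining services are fresh exponentials.
        rem = [rng.expovariate(m) for m in mu]
        i = min(range(len(mu)), key=lambda k: rem[k])  # first server to free up
        total += rem[i] + rng.expovariate(mu[i])       # your wait + your service
    return total / n
```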


5 The Exponential Distribution and the Poisson Process

26. Each entering customer must be served first by server 1, then by server 2, and finally by server 3. The amount of time it takes to be served by server i is an exponential random variable with rate μ_i, i = 1, 2, 3. Suppose you enter the system when it contains a single customer who is being served by server 3.
(a) Find the probability that server 3 will still be busy when you move over to server 2.
(b) Find the probability that server 3 will still be busy when you move over to server 3.
(c) Find the expected amount of time that you spend in the system. (Whenever you encounter a busy server, you must wait for the service in progress to end before you can enter service.)
(d) Suppose that you enter the system when it contains a single customer who is being served by server 2. Find the expected amount of time that you spend in the system.

27. Show, in Example 5.7, that the distributions of the total cost are the same for the two algorithms.

28. Consider n components with independent lifetimes which are such that component i functions for an exponential time with rate λ_i. Suppose that all components are initially in use and remain so until they fail.
(a) Find the probability that component 1 is the second component to fail.
(b) Find the expected time of the second failure.

Hint: Do not make use of part (a).

29. Let X and Y be independent exponential random variables with respective rates λ and μ, where λ > μ. Let c > 0.
(a) Show that the conditional density function of X, given that X + Y = c, is

f_{X|X+Y}(x|c) = (λ − μ)e^{−(λ−μ)x} / (1 − e^{−(λ−μ)c}), 0 < x < c


32. There are three jobs and a single worker who works first on job 1, then on job 2, and finally on job 3. The amounts of time that he spends on each job are independent exponential random variables with mean 1. Let C_i be the time at which job i is completed, i = 1, 2, 3, and let X = Σ_{i=1}^{3} C_i be the sum of these completion times. Find (a) E[X], (b) Var(X).

33. Let X and Y be independent exponential random variables with respective rates λ and μ.
(a) Argue that, conditional on X > Y, the random variables min(X, Y) and X − Y are independent.
(b) Use part (a) to conclude that for any positive constant c

E[min(X, Y) | X > Y + c] = E[min(X, Y) | X > Y] = E[min(X, Y)] = 1/(λ + μ)

(c) Give a verbal explanation of why min(X, Y) and X − Y are (unconditionally) independent.

34. Two individuals, A and B, both require kidney transplants. If she does not receive a new kidney, then A will die after an exponential time with rate μ_A, and B after an exponential time with rate μ_B. New kidneys arrive in accordance with a Poisson process having rate λ. It has been decided that the first kidney will go to A (or to B if B is alive and A is not at that time) and the next one to B (if still living).
(a) What is the probability that A obtains a new kidney?
(b) What is the probability that B obtains a new kidney?

35. Show that Definition 5.1 of a Poisson process implies Definition 5.3.

36. Let S(t) denote the price of a security at time t. A popular model for the process {S(t), t ≥ 0} supposes that the price remains unchanged until a "shock" occurs, at which time the price is multiplied by a random factor. If we let N(t) denote the number of shocks by time t, and let X_i denote the ith multiplicative factor, then this model supposes that

S(t) = S(0) ∏_{i=1}^{N(t)} X_i

where ∏_{i=1}^{N(t)} X_i is equal to 1 when N(t) = 0. Suppose that the X_i are independent exponential random variables with rate μ; that {N(t), t ≥ 0} is a Poisson process with rate λ; that {N(t), t ≥ 0} is independent of the X_i; and that S(0) = s.
(a) Find E[S(t)].
(b) Find E[S^2(t)].
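For Exercise 36(a), conditioning on N(t) gives E[S(t)] = s·exp{λt(E[X_1] − 1)}, which is s·exp{λt(1/μ − 1)} for exponential factors. A simulation sketch (all parameter values below are arbitrary) can confirm this:

```python
import math
import random

def estimate_price_mean(lam=1.0, mu=2.0, t=1.0, s=1.0, n=200_000, seed=2):
    """Monte Carlo estimate of E[S(t)] for the multiplicative-shock model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        price = s
        clock = rng.expovariate(lam)      # time of the first shock
        while clock <= t:                 # apply each shock's random factor
            price *= rng.expovariate(mu)
            clock += rng.expovariate(lam)
        total += price
    return total / n
```

With λ = 1, μ = 2, t = 1, s = 1 the estimate should be near e^{−1/2} ≈ 0.607.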


37. Cars cross a certain point in the highway in accordance with a Poisson process with rate λ = 3 per minute. If Reb blindly runs across the highway, then what is the probability that she will be uninjured if the amount of time that it takes her to cross the road is s seconds? (Assume that if she is on the highway when a car passes by, then she will be injured.) Do it for s = 2, 5, 10, 20.

38. Let {M_i(t), t ≥ 0}, i = 1, 2, 3 be independent Poisson processes with respective rates λ_i, i = 1, 2, 3, and set

N_1(t) = M_1(t) + M_2(t), N_2(t) = M_2(t) + M_3(t)

The stochastic process {(N_1(t), N_2(t)), t ≥ 0} is called a bivariate Poisson process.
(a) Find P{N_1(t) = n, N_2(t) = m}.
(b) Find Cov(N_1(t), N_2(t)).

39. A certain scientific theory supposes that mistakes in cell division occur according to a Poisson process with rate 2.5 per year, and that an individual dies when 196 such mistakes have occurred. Assuming this theory, find
(a) the mean lifetime of an individual,
(b) the variance of the lifetime of an individual.
Also approximate
(c) the probability that an individual dies before age 67.2,
(d) the probability that an individual reaches age 90,
(e) the probability that an individual reaches age 100.

*40. Show that if {N_i(t), t ≥ 0}, i = 1, 2 are independent Poisson processes with rates λ_i, i = 1, 2, then {N(t), t ≥ 0} is a Poisson process with rate λ_1 + λ_2, where N(t) = N_1(t) + N_2(t).

41. In Exercise 40 what is the probability that the first event of the combined process is from the N_1 process?

42. Let {N(t), t ≥ 0} be a Poisson process with rate λ. Let S_n denote the time of the nth event. Find
(a) E[S_4],
(b) E[S_4 | N(1) = 2],
(c) E[N(4) − N(2) | N(1) = 3].

43. Customers arrive at a two-server service station according to a Poisson process with rate λ. Whenever a new customer arrives, any customer that is in the system immediately departs. A new arrival enters service first with server 1 and then with server 2. If the service times at the servers are independent exponentials with respective rates μ_1 and μ_2, what proportion of entering customers completes their service with server 2?
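In Exercise 43 an entering customer completes service with server 2 exactly when no new arrival occurs during his total service time S_1 + S_2; by memorylessness the time until the next arrival is a fresh Exp(λ). A quick simulation (rates chosen arbitrarily) illustrates:

```python
import random

def frac_complete(lam=1.0, mu1=2.0, mu2=3.0, n=200_000, seed=5):
    """Fraction of entering customers who finish both services before the
    next Poisson arrival bumps them out (Exercise 43)."""
    rng = random.Random(seed)
    done = 0
    for _ in range(n):
        next_arrival = rng.expovariate(lam)  # memoryless time to next arrival
        if rng.expovariate(mu1) + rng.expovariate(mu2) < next_arrival:
            done += 1
    return done / n
```

With λ = 1, μ_1 = 2, μ_2 = 3 this estimates (μ_1/(λ+μ_1))(μ_2/(λ+μ_2)) = (2/3)(3/4) = 1/2.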


44. Cars pass a certain street location according to a Poisson process with rate λ. A woman who wants to cross the street at that location waits until she can see that no cars will come by in the next T time units.
(a) Find the probability that her waiting time is 0.
(b) Find her expected waiting time.

Hint: Condition on the time of the first car.

45. Let {N(t), t ≥ 0} be a Poisson process with rate λ that is independent of the nonnegative random variable T with mean μ and variance σ^2. Find
(a) Cov(T, N(T)),
(b) Var(N(T)).

46. Let {N(t), t ≥ 0} be a Poisson process with rate λ that is independent of the sequence X_1, X_2, ... of independent and identically distributed random variables with mean μ and variance σ^2. Find

Cov(N(t), Σ_{i=1}^{N(t)} X_i)

47. Consider a two-server parallel queuing system where customers arrive according to a Poisson process with rate λ, and where the service times are exponential with rate μ. Moreover, suppose that arrivals finding both servers busy immediately depart without receiving any service (such a customer is said to be lost), whereas those finding at least one free server immediately enter service and then depart when their service is completed.
(a) If both servers are presently busy, find the expected time until the next customer enters the system.
(b) Starting empty, find the expected time until both servers are busy.
(c) Find the expected time between two successive lost customers.

48. Consider an n-server parallel queuing system where customers arrive according to a Poisson process with rate λ, where the service times are exponential random variables with rate μ, and where any arrival finding all servers busy immediately departs without receiving any service. If an arrival finds all servers busy, find
(a) the expected number of busy servers found by the next arrival,
(b) the probability that the next arrival finds all servers free,
(c) the probability that the next arrival finds exactly i of the servers free.
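For Exercise 48(a), each busy server is still busy when the next arrival comes with probability P{Exp(μ) > Exp(λ)} = λ/(λ + μ), so by linearity the expected number found busy is nλ/(λ + μ). A simulation sketch (n, λ, μ are arbitrary assumed values):

```python
import random

def mean_busy_at_next_arrival(n_servers=3, lam=2.0, mu=1.0, n=200_000, seed=6):
    """Given all servers busy, count how many remain busy at the next
    arrival epoch (Exercise 48(a))."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        a = rng.expovariate(lam)  # time until the next arrival
        # each residual service is a fresh Exp(mu) by memorylessness
        total += sum(1 for _ in range(n_servers) if rng.expovariate(mu) > a)
    return total / n
```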


49. Events occur according to a Poisson process with rate λ. Each time an event occurs, we must decide whether or not to stop, with our objective being to stop at the last event to occur prior to some specified time T, where T > 1/λ. That is, if an event occurs at time t, 0 ≤ t ≤ T, and we decide to stop, then we win if there are no additional events by time T, and we lose otherwise. If we do not stop when an event occurs and no additional events occur by time T, then we lose. Also, if no events occur by time T, then we lose. Consider the strategy that stops at the first event to occur after some fixed time s, 0 ≤ s ≤ T.
(a) Using this strategy, what is the probability of winning?
(b) What value of s maximizes the probability of winning?
(c) Show that one's probability of winning when using the preceding strategy with the value of s specified in part (b) is 1/e.

50. The number of hours between successive train arrivals at the station is uniformly distributed on (0, 1). Passengers arrive according to a Poisson process with rate 7 per hour. Suppose a train has just left the station. Let X denote the number of people who get on the next train. Find
(a) E[X],
(b) Var(X).

51. If an individual has never had a previous automobile accident, then the probability he or she has an accident in the next h time units is βh + o(h); on the other hand, if he or she has ever had a previous accident, then the probability is αh + o(h). Find the expected number of accidents an individual has by time t.

52. Teams 1 and 2 are playing a match. The teams score points according to independent Poisson processes with respective rates λ_1 and λ_2. If the match ends when one of the teams has scored k more points than the other, find the probability that team 1 wins.

Hint: Relate this to the gambler's ruin problem.

53. The water level of a certain reservoir is depleted at a constant rate of 1000 units daily. The reservoir is refilled by randomly occurring rainfalls. Rainfalls occur according to a Poisson process with rate 0.2 per day. The amount of water added to the reservoir by a rainfall is 5000 units with probability 0.8 or 8000 units with probability 0.2. The present water level is just slightly below 5000 units.
(a) What is the probability the reservoir will be empty after five days?
(b) What is the probability the reservoir will be empty sometime within the next ten days?
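Since the level is just below 5000 and depletion is 1000 per day, the reservoir runs dry just before day five unless at least one rainfall (each adding at least 5000 units) occurs first, so 53(a) reduces to P{no rain in five days} = e^{−0.2·5} = e^{−1}. Part (b) can be explored with an event-walk simulation (the starting level 4999 below stands in for "just slightly below 5000"):

```python
import math
import random

def reservoir_sim(n=200_000, seed=7):
    """Estimate (a) P{empty by day 5} and (b) P{empty within 10 days}."""
    rng = random.Random(seed)
    empty5 = empty10 = 0
    for _ in range(n):
        level, t = 4999.0, 0.0          # assumed stand-in starting level
        first_dry = None
        while t < 10.0 and first_dry is None:
            gap = rng.expovariate(0.2)  # time until the next rainfall
            dry_in = level / 1000.0     # time until the level would hit 0
            if dry_in < gap:            # runs dry before the next rain
                first_dry = t + dry_in
            else:
                level -= 1000.0 * gap
                level += 5000.0 if rng.random() < 0.8 else 8000.0
                t += gap
        if first_dry is not None:
            if first_dry <= 5.0:
                empty5 += 1
            if first_dry <= 10.0:
                empty10 += 1
    return empty5 / n, empty10 / n
```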


54. A viral linear DNA molecule of length, say, 1 is often known to contain a certain "marked position," with the exact location of this mark being unknown. One approach to locating the marked position is to cut the molecule by agents that break it at points chosen according to a Poisson process with rate λ. It is then possible to determine the fragment that contains the marked position. For instance, letting m denote the location on the line of the marked position, then if L_1 denotes the last Poisson event time before m (or 0 if there are no Poisson events in [0, m]), and R_1 denotes the first Poisson event time after m (or 1 if there are no Poisson events in [m, 1]), then it would be learned that the marked position lies between L_1 and R_1. Find
(a) P{L_1 = 0},
(b) P{L_1


*57. Events occur according to a Poisson process with rate λ = 2 per hour.
(a) What is the probability that no event occurs between 8 P.M. and 9 P.M.?
(b) Starting at noon, what is the expected time at which the fourth event occurs?
(c) What is the probability that two or more events occur between 6 P.M. and 8 P.M.?

58. Consider the coupon collecting problem where there are m distinct types of coupons, and each new coupon collected is type j with probability p_j, Σ_{j=1}^{m} p_j = 1. Suppose you stop collecting when you have a complete set of at least one of each type. Show that

P{i is the last type collected} = E[∏_{j≠i} (1 − U^{λ_j/λ_i})]

where U is a uniform (0, 1) random variable.

59. There are two types of claims that are made to an insurance company. Let N_i(t) denote the number of type i claims made by time t, and suppose that {N_1(t), t ≥ 0} and {N_2(t), t ≥ 0} are independent Poisson processes with rates λ_1 = 10 and λ_2 = 1. The amounts of successive type 1 claims are independent exponential random variables with mean $1000 whereas the amounts from type 2 claims are independent exponential random variables with mean $5000. A claim for $4000 has just been received; what is the probability it is a type 1 claim?

*60. Customers arrive at a bank at a Poisson rate λ. Suppose two customers arrived during the first hour. What is the probability that
(a) both arrived during the first 20 minutes?
(b) at least one arrived during the first 20 minutes?

61. A system has a random number of flaws that we will suppose is Poisson distributed with mean c. Each of these flaws will, independently, cause the system to fail at a random time having distribution G. When a system failure occurs, suppose that the flaw causing the failure is immediately located and fixed.
(a) What is the distribution of the number of failures by time t?
(b) What is the distribution of the number of flaws that remain in the system at time t?
(c) Are the random variables in parts (a) and (b) dependent or independent?

62. Suppose that the number of typographical errors in a new text is Poisson distributed with mean λ. Two proofreaders independently read the text. Suppose that each error is independently found by proofreader i with probability p_i, i = 1, 2.


Let X_1 denote the number of errors that are found by proofreader 1 but not by proofreader 2. Let X_2 denote the number of errors that are found by proofreader 2 but not by proofreader 1. Let X_3 denote the number of errors that are found by both proofreaders. Finally, let X_4 denote the number of errors found by neither proofreader.
(a) Describe the joint probability distribution of X_1, X_2, X_3, X_4.
(b) Show that

E[X_1]/E[X_3] = (1 − p_2)/p_2 and E[X_2]/E[X_3] = (1 − p_1)/p_1

Suppose now that λ, p_1, and p_2 are all unknown.
(c) By using X_i as an estimator of E[X_i], i = 1, 2, 3, present estimators of p_1, p_2, and λ.
(d) Give an estimator of X_4, the number of errors not found by either proofreader.

63. Consider an infinite server queuing system in which customers arrive in accordance with a Poisson process and where the service distribution is exponential with rate μ. Let X(t) denote the number of customers in the system at time t. Find
(a) E[X(t + s) | X(s) = n];
(b) Var[X(t + s) | X(s) = n];

Hint: Divide the customers in the system at time t + s into two groups, one consisting of "old" customers and the other of "new" customers.

(c) Consider an infinite server queuing system in which customers arrive according to a Poisson process with rate λ, and where the service times are all exponential random variables with rate μ. If there is currently a single customer in the system, find the probability that the system becomes empty when that customer departs.

*64. Suppose that people arrive at a bus stop in accordance with a Poisson process with rate λ. The bus departs at time t. Let X denote the total amount of waiting time of all those who get on the bus at time t. We want to determine Var(X). Let N(t) denote the number of arrivals by time t.
(a) What is E[X | N(t)]?
(b) Argue that Var[X | N(t)] = N(t)t^2/12.
(c) What is Var(X)?

65. An average of 500 people pass the California bar exam each year. A California lawyer practices law, on average, for 30 years. Assuming these numbers remain steady, how many lawyers would you expect California to have in 2050?
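In Exercise 64, given N(t) = n the arrival times are i.i.d. uniform on (0, t), which is where the t^2/12 in part (b) comes from; combining this with E[X | N(t)] = N(t)t/2 via the conditional variance formula gives Var(X) = λt^3/3. A Monte Carlo sketch (λ and t are arbitrary assumed values):

```python
import random

def waiting_time_variance(lam=2.0, t=1.0, n=200_000, seed=8):
    """Sample Var(X) for X = total waiting time of bus-stop arrivals,
    using the conditional-uniformity of arrival times given N(t)."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        # draw N(t) ~ Poisson(lam*t) via exponential interarrival gaps
        count, clock = 0, rng.expovariate(lam)
        while clock <= t:
            count += 1
            clock += rng.expovariate(lam)
        x = sum(t - rng.uniform(0.0, t) for _ in range(count))
        xs.append(x)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)
```

With λ = 2, t = 1 the sample variance should be near λt^3/3 = 2/3.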


66. Policyholders of a certain insurance company have accidents at times distributed according to a Poisson process with rate λ. The amount of time from when the accident occurs until a claim is made has distribution G.
(a) Find the probability there are exactly n incurred but as yet unreported claims at time t.
(b) Suppose that each claim amount has distribution F, and that the claim amount is independent of the time that it takes to report the claim. Find the expected value of the sum of all incurred but as yet unreported claims at time t.

67. Satellites are launched into space at times distributed according to a Poisson process with rate λ. Each satellite independently spends a random time (having distribution G) in space before falling to the ground. Find the probability that none of the satellites in the air at time t was launched before time s, where s


70. For the infinite server queue with Poisson arrivals and general service distribution G, find the probability that
(a) the first customer to arrive is also the first to depart.
Let S(t) equal the sum of the remaining service times of all customers in the system at time t.
(b) Argue that S(t) is a compound Poisson random variable.
(c) Find E[S(t)].
(d) Find Var(S(t)).

71. Let S_n denote the time of the nth event of the Poisson process {N(t), t ≥ 0} having rate λ. Show, for an arbitrary function g, that the random variable Σ_{i=1}^{N(t)} g(S_i) has the same distribution as the compound Poisson random variable Σ_{i=1}^{N(t)} g(U_i), where U_1, U_2, ... is a sequence of independent and identically distributed uniform (0, t) random variables that is independent of N, a Poisson random variable with mean λt. Consequently, conclude that

E[Σ_{i=1}^{N(t)} g(S_i)] = λ ∫_0^t g(x) dx,  Var(Σ_{i=1}^{N(t)} g(S_i)) = λ ∫_0^t g^2(x) dx

72. A cable car starts off with n riders. The times between successive stops of the car are independent exponential random variables with rate λ. At each stop one rider gets off. This takes no time, and no additional riders get on. After a rider gets off the car, he or she walks home. Independently of all else, the walk takes an exponential time with rate μ.
(a) What is the distribution of the time at which the last rider departs the car?
(b) Suppose the last rider departs the car at time t. What is the probability that all the other riders are home at that time?

73. Shocks occur according to a Poisson process with rate λ, and each shock independently causes a certain system to fail with probability p. Let T denote the time at which the system fails and let N denote the number of shocks that it takes.
(a) Find the conditional distribution of T given that N = n.
(b) Calculate the conditional distribution of N, given that T = t, and notice that it is distributed as 1 plus a Poisson random variable with mean λ(1 − p)t.
(c) Explain how the result in part (b) could have been obtained without any calculations.

74. The number of missing items in a certain location, call it X, is a Poisson random variable with mean λ. When searching the location, each item will independently be found after an exponentially distributed time with rate μ. A reward of R is received for each item found, and a searching cost of C per unit of search time is incurred. Suppose that you search for a fixed time t and then stop.


(a) Find your total expected return.
(b) Find the value of t that maximizes the total expected return.
(c) The policy of searching for a fixed time is a static policy. Would a dynamic policy, which allows the decision as to whether to stop at each time t to depend on the number already found by t, be beneficial?

Hint: How does the distribution of the number of items not yet found by time t depend on the number already found by that time?

75. Suppose that the times between successive arrivals of customers at a single-server station are independent random variables having a common distribution F. Suppose that when a customer arrives, he or she either immediately enters service if the server is free or else joins the end of the waiting line if the server is busy with another customer. When the server completes work on a customer, that customer leaves the system and the next waiting customer, if there are any, enters service. Let X_n denote the number of customers in the system immediately before the nth arrival, and let Y_n denote the number of customers that remain in the system when the nth customer departs. The successive service times of customers are independent random variables (which are also independent of the interarrival times) having a common distribution G.
(a) If F is the exponential distribution with rate λ, which, if any, of the processes {X_n}, {Y_n} is a Markov chain?
(b) If G is the exponential distribution with rate μ, which, if any, of the processes {X_n}, {Y_n} is a Markov chain?
(c) Give the transition probabilities of any Markov chains in parts (a) and (b).

76. For the model of Example 5.27, find the mean and variance of the number of customers served in a busy period.

77. Events occur according to a nonhomogeneous Poisson process whose mean value function is given by

m(t) = t^2 + 2t, t ≥ 0

What is the probability that n events occur between times t = 4 and t = 5?

78. A store opens at 8 A.M. From 8 until 10 customers arrive at a Poisson rate of four an hour. Between 10 and 12 they arrive at a Poisson rate of eight an hour. From 12 to 2 the arrival rate increases steadily from eight per hour at 12 to ten per hour at 2; and from 2 to 5 the arrival rate drops steadily from ten per hour at 2 to four per hour at 5. Determine the probability distribution of the number of customers that enter the store on a given day.

*79. Consider a nonhomogeneous Poisson process whose intensity function λ(t) is bounded and continuous. Show that such a process is equivalent to a process of


counted events from a (homogeneous) Poisson process having rate λ, where an event at time t is counted (independent of the past) with probability λ(t)/λ; and where λ is chosen so that λ(s) < λ for all s.

80. Let T_1, T_2, ... denote the interarrival times of events of a nonhomogeneous Poisson process having intensity function λ(t).
(a) Are the T_i independent?
(b) Are the T_i identically distributed?
(c) Find the distribution of T_1.

81. (a) Let {N(t), t ≥ 0} be a nonhomogeneous Poisson process with mean value function m(t). Given N(t) = n, show that the unordered set of arrival times has the same distribution as n independent and identically distributed random variables having distribution function

F(x) = m(x)/m(t) if x ≤ t, and F(x) = 1 if x ≥ t

(b) Suppose that workmen incur accidents in accordance with a nonhomogeneous Poisson process with mean value function m(t). Suppose further that each injured man is out of work for a random amount of time having distribution F. Let X(t) be the number of workers who are out of work at time t. By using part (a), find E[X(t)].

82. Let X_1, X_2, ... be independent positive continuous random variables with a common density function f, and suppose this sequence is independent of N, a Poisson random variable with mean λ. Define

N(t) = number of i ≤ N : X_i ≤ t

Show that {N(t), t ≥ 0} is a nonhomogeneous Poisson process with intensity function λ(t) = λf(t).

83. Suppose that {N_0(t), t ≥ 0} is a Poisson process with rate λ = 1. Let λ(t) denote a nonnegative function of t, and let

m(t) = ∫_0^t λ(s) ds

Define N(t) by

N(t) = N_0(m(t))

Argue that {N(t), t ≥ 0} is a nonhomogeneous Poisson process with intensity function λ(t), t ≥ 0.
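Exercise *79 is the basis of the standard "thinning" algorithm for simulating a nonhomogeneous Poisson process: generate a homogeneous process at the bounding rate and keep each event at time t with probability λ(t)/λ̄. A sketch (the intensity λ(t) = t on [0, 2] and the bound 2.5 are arbitrary choices for illustration):

```python
import random

def thinned_nhpp(lam, lam_bar, horizon, rng):
    """One sample path of a nonhomogeneous Poisson process on [0, horizon],
    obtained by thinning a rate-lam_bar homogeneous Poisson process."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam_bar)         # next candidate event
        if t > horizon:
            return times
        if rng.random() < lam(t) / lam_bar:   # keep event w.p. lam(t)/lam_bar
            times.append(t)
```

With λ(t) = t, the expected number of events on [0, 2] is m(2) = ∫_0^2 t dt = 2, which the average path count should reproduce.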


Hint: Make use of the identity

m(t + h) − m(t) = m′(t)h + o(h)

*84. Let X_1, X_2, ... be independent and identically distributed nonnegative continuous random variables having density function f(x). We say that a record occurs at time n if X_n is larger than each of the previous values X_1, ..., X_{n−1}. (A record automatically occurs at time 1.) If a record occurs at time n, then X_n is called a record value. In other words, a record occurs whenever a new high is reached, and that new high is called the record value. Let N(t) denote the number of record values that are less than or equal to t. Characterize the process {N(t), t ≥ 0} when
(a) f is an arbitrary continuous density function.
(b) f(x) = λe^{−λx}.

Hint: Finish the following sentence: There will be a record whose value is between t and t + dt if the first X_i that is greater than t lies between ...

85. An insurance company pays out claims on its life insurance policies in accordance with a Poisson process having rate λ = 5 per week. If the amount of money paid on each policy is exponentially distributed with mean $2000, what is the mean and variance of the amount of money paid by the insurance company in a four-week span?

86. In good years, storms occur according to a Poisson process with rate 3 per unit time, while in other years they occur according to a Poisson process with rate 5 per unit time. Suppose next year will be a good year with probability 0.3. Let N(t) denote the number of storms during the first t time units of next year.
(a) Find P{N(t) = n}.
(b) Is {N(t)} a Poisson process?
(c) Does {N(t)} have stationary increments? Why or why not?
(d) Does it have independent increments? Why or why not?
(e) If next year starts off with three storms by time t = 1, what is the conditional probability it is a good year?

87. Determine

Cov[X(t), X(t + s)]

when {X(t), t ≥ 0} is a compound Poisson process.

88. Customers arrive at the automatic teller machine in accordance with a Poisson process with rate 12 per hour. The amount of money withdrawn on each transaction is a random variable with mean $30 and standard deviation $50. (A negative withdrawal means that money was deposited.) The machine is in use for 15 hours daily. Approximate the probability that the total daily withdrawal is less than $6000.
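Exercise 88 calls for the normal approximation to a compound Poisson sum: with N ~ Poisson(λt) transactions and per-transaction amounts W_i, the total has mean λt·E[W] and variance λt·E[W^2] = λt(Var(W) + E[W]^2). A direct numeric sketch of that recipe:

```python
import math

def approx_prob_daily_withdrawal(rate=12.0, hours=15.0,
                                 mean_w=30.0, sd_w=50.0, x=6000.0):
    """Normal approximation for P{total daily withdrawal < x}."""
    lam_t = rate * hours                   # expected number of transactions
    mean = lam_t * mean_w                  # E[total] = lambda*t*E[W]
    var = lam_t * (mean_w**2 + sd_w**2)    # Var[total] = lambda*t*E[W^2]
    z = (x - mean) / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
```

Here λt = 180, the mean is $5400, the standard deviation is about $782, and the approximation gives roughly 0.78.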


89. Some components of a two-component system fail after receiving a shock. Shocks of three types arrive independently and in accordance with Poisson processes. Shocks of the first type arrive at a Poisson rate λ_1 and cause the first component to fail. Those of the second type arrive at a Poisson rate λ_2 and cause the second component to fail. The third type of shock arrives at a Poisson rate λ_3 and causes both components to fail. Let X_1 and X_2 denote the survival times for the two components. Show that the joint distribution of X_1 and X_2 is given by

P{X_1 > s, X_2 > t} = exp{−λ_1 s − λ_2 t − λ_3 max(s, t)}

This distribution is known as the bivariate exponential distribution.

90. In Exercise 89 show that X_1 and X_2 both have exponential distributions.

*91. Let X_1, X_2, ..., X_n be independent and identically distributed exponential random variables. Show that the probability that the largest of them is greater than the sum of the others is n/2^{n−1}. That is, if

M = max_j X_j

then show

P{M > Σ_{i=1}^{n} X_i − M} = n/2^{n−1}

Hint: What is P{X_1 > Σ_{i=2}^{n} X_i}?

92. Prove Equation (5.22).

93. Prove that
(a) max(X_1, X_2) = X_1 + X_2 − min(X_1, X_2) and, in general,
(b) max(X_1, ..., X_n) = Σ_{i=1}^{n} X_i − ΣΣ_{1≤i<j≤n} min(X_i, X_j)


94. A two-dimensional Poisson process is a process of randomly occurring events in the plane such that
(i) for any region of area A the number of events in that region has a Poisson distribution with mean λA, and
(ii) the number of events in nonoverlapping regions are independent.
For such a process, consider an arbitrary point in the plane and let X denote its distance from its nearest event (where distance is measured in the usual Euclidean manner). Show that
(a) P{X > t} = e^{−λπt^2},
(b) E[X] = 1/(2√λ).

95. Let {N(t), t ≥ 0} be a conditional Poisson process with a random rate L.
(a) Derive an expression for E[L | N(t) = n].
(b) Find, for s > t, E[N(s) | N(t) = n].
(c) Find, for s


Continuous-Time Markov Chains

6.1. Introduction

In this chapter we consider a class of probability models that has a wide variety of applications in the real world. The members of this class are the continuous-time analogs of the Markov chains of Chapter 4 and as such are characterized by the Markovian property that, given the present state, the future is independent of the past.

One example of a continuous-time Markov chain has already been met. This is the Poisson process of Chapter 5. For if we let the total number of arrivals by time t [that is, N(t)] be the state of the process at time t, then the Poisson process is a continuous-time Markov chain having states 0, 1, 2, ... that always proceeds from state n to state n + 1, where n ≥ 0. Such a process is known as a pure birth process since when a transition occurs the state of the system is always increased by one. More generally, an exponential model which can go (in one transition) only from state n to either state n − 1 or state n + 1 is called a birth and death model. For such a model, transitions from state n to state n + 1 are designated as births, and those from n to n − 1 as deaths. Birth and death models have wide applicability in the study of biological systems and in the study of waiting line systems in which the state represents the number of customers in the system. These models will be studied extensively in this chapter.

In Section 6.2 we define continuous-time Markov chains and then relate them to the discrete-time Markov chains of Chapter 4. In Section 6.3 we consider birth and death processes and in Section 6.4 we derive two sets of differential equations—the forward and backward equations—that describe the probability laws for the system. The material in Section 6.5 is concerned with determining the limiting (or long-run) probabilities connected with a continuous-time Markov chain. In Section 6.6 we consider the topic of time reversibility. We show that all birth and death processes are time reversible, and then illustrate the importance of this observation to queueing systems. In the final section we show how to "uniformize" Markov chains, a technique useful for numerical computations.

6.2. Continuous-Time Markov Chains

Suppose we have a continuous-time stochastic process {X(t), t ≥ 0} taking on values in the set of nonnegative integers. In analogy with the definition of a discrete-time Markov chain, given in Chapter 4, we say that the process {X(t), t ≥ 0} is a continuous-time Markov chain if for all s, t ≥ 0 and nonnegative integers i, j, x(u), 0 ≤ u < s,

P{X(t + s) = j | X(s) = i, X(u) = x(u), 0 ≤ u < s} = P{X(t + s) = j | X(s) = i}

In other words, a continuous-time Markov chain is a process having the property that the conditional distribution of the future X(t + s), given the present X(s) and the past X(u), 0 ≤ u < s, depends only on the present and is independent of the past.

Suppose that a continuous-time Markov chain enters state i at some time, say, time 0, and let T_i denote the amount of time the process stays in state i before making a transition into a different state. Then, by the Markovian property,

P{T_i > s + t | T_i > s} = P{T_i > t}

for all s, t ≥ 0. Hence, the random variable T_i is memoryless and must thus (see Section 5.2.2) be exponentially distributed.


6.2. Continuous-Time Markov Chains 367In fact, the preceding gives us another way of defining a continuous-timeMarkov chain. Namely, it is a stochastic process having the properties that eachtime it enters state i(i) the amount of time it spends in that state before making a transition into adifferent state is exponentially distributed with mean, say, 1/v i , and(ii) when the process leaves state i, it next enters state j with some probability,say, P ij . Of course, the P ij must satisfyP ii = 0,∑P ij = 1,jall iall iIn other words, a continuous-time Markov chain is a stochastic process that movesfrom state to state in accordance with a (discrete-time) Markov chain, but is suchthat the amount of time it spends in each state, before proceeding to the next state,is exponentially distributed. In addition, the amount of time the process spends instate i, and the next state visited, must be independent random variables. For ifthe next state visited were dependent on T i , then information as to how long theprocess has already been in state i would be relevant to the prediction of the nextstate—and this contradicts the Markovian assumption.Example 6.1 (A Shoeshine Shop) Consider a shoeshine establishment consistingof two chairs—chair 1 and chair 2. A customer upon arrival goes initiallyto chair 1 where his shoes are cleaned and polish is applied. After this is donethe customer moves on to chair 2 where the polish is buffed. The service timesat the two chairs are assumed to be independent random variables that are exponentiallydistributed with respective rates μ 1 and μ 2 . Suppose that potentialcustomers arrive in accordance with a Poisson process having rate λ, and that apotential customer will enter the system only if both chairs are empty.The preceding model can be analyzed as a continuous-time Markov chain, butfirst we must decide upon an appropriate state space. 
Since a potential customer will enter the system only if there are no other customers present, it follows that there will always be either 0 or 1 customers in the system. However, if there is 1 customer in the system, then we would also need to know which chair he is presently in. Hence, an appropriate state space might consist of the three states 0, 1, and 2, where the states have the following interpretation:

    State   Interpretation
      0     system is empty
      1     a customer is in chair 1
      2     a customer is in chair 2


We leave it as an exercise for you to verify that

    v_0 = λ,    v_1 = μ_1,    v_2 = μ_2,
    P_01 = P_12 = P_20 = 1

6.3. Birth and Death Processes

Consider a system whose state at any time is represented by the number of people in the system at that time. Suppose that whenever there are n people in the system, (i) new arrivals enter the system at an exponential rate λ_n, and (ii) people leave the system at an exponential rate μ_n. That is, whenever there are n persons in the system, the time until the next arrival is exponentially distributed with mean 1/λ_n and is independent of the time until the next departure, which is itself exponentially distributed with mean 1/μ_n. Such a system is called a birth and death process. The parameters {λ_n, n ≥ 0} and {μ_n, n ≥ 1} are called, respectively, the arrival (or birth) and departure (or death) rates.

Thus, a birth and death process is a continuous-time Markov chain with states {0, 1, ...} for which transitions from state n may go only to either state n − 1 or state n + 1. The relationships between the birth and death rates and the state transition rates and probabilities are

    v_0 = λ_0,
    v_i = λ_i + μ_i,                 i > 0
    P_01 = 1,
    P_{i,i+1} = λ_i/(λ_i + μ_i),     i > 0
    P_{i,i−1} = μ_i/(λ_i + μ_i),     i > 0

The preceding follows because if there are i in the system, then the next state will be i + 1 if a birth occurs before a death, and the probability that an exponential random variable with rate λ_i will occur earlier than an (independent) exponential with rate μ_i is λ_i/(λ_i + μ_i). Moreover, the time until either a birth or a death occurs is exponentially distributed with rate λ_i + μ_i (and so, v_i = λ_i + μ_i).

Example 6.2 (The Poisson Process) Consider a birth and death process for which

    μ_n = 0,    for all n ≥ 0
    λ_n = λ,    for all n ≥ 0

This is a process in which departures never occur, and the time between successive arrivals is exponential with mean 1/λ. Hence, this is just the Poisson process.
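The relationships above translate directly into code. The following sketch (the helper name `chain_params` is mine, not from the text) maps birth and death rates into the holding rates v_i and the upward transition probabilities P_{i,i+1}:

```python
def chain_params(lam, mu):
    """Given birth rates lam[i] and death rates mu[i] (with mu[0] = 0),
    return the holding rates v_i and the probabilities P_{i,i+1} for a
    birth and death process, per the relations in the text."""
    v, p_up = [], []
    for l, m in zip(lam, mu):
        v.append(l + m)            # time in state i is Exp(lambda_i + mu_i)
        p_up.append(l / (l + m))   # P(birth occurs before death)
    return v, p_up

# Example: an M/M/1-type chain with arrival rate 2 and service rate 3,
# restricted here to states 0..3 for illustration.
lam = [2.0, 2.0, 2.0, 2.0]
mu = [0.0, 3.0, 3.0, 3.0]
v, p_up = chain_params(lam, mu)
```

Note that state 0 has no death rate, so P_01 = 1 falls out of the same formula.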


A birth and death process for which μ_n = 0 for all n is called a pure birth process. Another pure birth process is given by the next example.

Example 6.3 (A Birth Process with Linear Birth Rate) Consider a population whose members can give birth to new members but cannot die. If each member acts independently of the others and takes an exponentially distributed amount of time, with mean 1/λ, to give birth, then if X(t) is the population size at time t, {X(t), t ≥ 0} is a pure birth process with λ_n = nλ, n ≥ 0. This follows since if the population consists of n persons and each gives birth at an exponential rate λ, then the total rate at which births occur is nλ. This pure birth process is known as a Yule process after G. Yule, who used it in his mathematical theory of evolution.

Example 6.4 (A Linear Growth Model with Immigration) A model in which

    μ_n = nμ,        n ≥ 1
    λ_n = nλ + θ,    n ≥ 0

is called a linear growth process with immigration. Such processes occur naturally in the study of biological reproduction and population growth. Each individual in the population is assumed to give birth at an exponential rate λ; in addition, there is an exponential rate of increase θ of the population due to an external source such as immigration. Hence, the total birth rate when there are n persons in the system is nλ + θ. Deaths are assumed to occur at an exponential rate μ for each member of the population, so μ_n = nμ.

Let X(t) denote the population size at time t. Suppose that X(0) = i and let

    M(t) = E[X(t)]

We will determine M(t) by deriving and then solving a differential equation that it satisfies.

We start by deriving an equation for M(t + h) by conditioning on X(t).
This yields

    M(t + h) = E[X(t + h)]
             = E[E[X(t + h) | X(t)]]

Now, given the size of the population at time t, then, ignoring events whose probability is o(h), the population at time t + h will either increase in size by 1 if a birth or an immigration occurs in (t, t + h), or decrease by 1 if a death occurs in this interval, or remain the same if neither of these two possibilities occurs. That


is, given X(t),

    X(t + h) = X(t) + 1,    with probability [θ + X(t)λ]h + o(h)
               X(t) − 1,    with probability X(t)μh + o(h)
               X(t),        with probability 1 − [θ + X(t)λ + X(t)μ]h + o(h)

Therefore,

    E[X(t + h) | X(t)] = X(t) + [θ + X(t)λ − X(t)μ]h + o(h)

Taking expectations yields

    M(t + h) = M(t) + (λ − μ)M(t)h + θh + o(h)

or, equivalently,

    [M(t + h) − M(t)]/h = (λ − μ)M(t) + θ + o(h)/h

Taking the limit as h → 0 yields the differential equation

    M′(t) = (λ − μ)M(t) + θ                                       (6.1)

If we now define the function h(t) by

    h(t) = (λ − μ)M(t) + θ

then

    h′(t) = (λ − μ)M′(t)

Therefore, the differential equation (6.1) can be rewritten as

    h′(t)/(λ − μ) = h(t)

or

    h′(t)/h(t) = λ − μ

Integration yields

    log[h(t)] = (λ − μ)t + c

or


    h(t) = Ke^{(λ−μ)t}

Putting this back in terms of M(t) gives

    θ + (λ − μ)M(t) = Ke^{(λ−μ)t}

To determine the value of the constant K, we use the fact that M(0) = i and evaluate the preceding at t = 0. This gives

    θ + (λ − μ)i = K

Substituting this back in the preceding equation for M(t) yields the following solution for M(t):

    M(t) = (θ/(λ − μ))[e^{(λ−μ)t} − 1] + ie^{(λ−μ)t}

Note that we have implicitly assumed that λ ≠ μ. If λ = μ, then the differential equation (6.1) reduces to

    M′(t) = θ                                                     (6.2)

Integrating (6.2) and using that M(0) = i gives the solution

    M(t) = θt + i

Example 6.5 (The Queueing System M/M/1) Suppose that customers arrive at a single-server service station in accordance with a Poisson process having rate λ. That is, the times between successive arrivals are independent exponential random variables having mean 1/λ. Upon arrival, each customer goes directly into service if the server is free; if not, then the customer joins the queue (that is, he waits in line). When the server finishes serving a customer, the customer leaves the system, and the next customer in line, if there are any waiting, enters service. The successive service times are assumed to be independent exponential random variables having mean 1/μ.

The preceding is known as the M/M/1 queueing system. The first M refers to the fact that the interarrival process is Markovian (since it is a Poisson process) and the second to the fact that the service distribution is exponential (and, hence, Markovian). The 1 refers to the fact that there is a single server.

If we let X(t) denote the number in the system at time t, then {X(t), t ≥ 0} is a birth and death process with

    μ_n = μ,    n ≥ 1
    λ_n = λ,    n ≥ 0
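As a numerical check on the solution for M(t) in the linear growth model with immigration, one can integrate the differential equation M′(t) = (λ − μ)M(t) + θ directly and compare it with the closed form. A minimal sketch (the function names are hypothetical, and the Euler scheme is my choice):

```python
import math

def M_closed(t, lam, mu, theta, i):
    """Closed-form mean population size M(t) for the linear growth
    model with immigration, X(0) = i, as derived in the text."""
    if lam == mu:
        return theta * t + i                       # the lambda == mu case, Eq. (6.2)
    g = lam - mu
    return (theta / g) * (math.exp(g * t) - 1.0) + i * math.exp(g * t)

def M_euler(t, lam, mu, theta, i, steps=200000):
    """Forward-Euler integration of M'(t) = (lam - mu) M(t) + theta."""
    m, h = float(i), t / steps
    for _ in range(steps):
        m += h * ((lam - mu) * m + theta)
    return m
```

With a fine step size the two should agree to several decimal places.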


Example 6.6 (A Multiserver Exponential Queueing System) Consider an exponential queueing system in which there are s servers available, each serving at rate μ. An entering customer first waits in line and then goes to the first free server. This is a birth and death process with parameters

    μ_n = nμ,    1 ≤ n ≤ s
    μ_n = sμ,    n > s
    λ_n = λ,     n ≥ 0

To see why this is true, reason as follows: If there are n customers in the system, where n ≤ s, then n servers will be busy. Since each of these servers works at rate μ, the total departure rate will be nμ. On the other hand, if there are n customers in the system, where n > s, then all s of the servers will be busy, and thus the total departure rate will be sμ. This is known as an M/M/s queueing model.

Consider now a general birth and death process with birth rates {λ_n} and death rates {μ_n}, where μ_0 = 0, and let T_i denote the time, starting from state i, it takes for the process to enter state i + 1, i ≥ 0. We will recursively compute E[T_i], i ≥ 0, by starting with i = 0. Since T_0 is exponential with rate λ_0, we have

    E[T_0] = 1/λ_0

For i > 0, we condition on whether the first transition takes the process into state i − 1 or i + 1. That is, let

    I_i = 1,    if the first transition from i is to i + 1
          0,    if the first transition from i is to i − 1

and note that

    E[T_i | I_i = 1] = 1/(λ_i + μ_i),
    E[T_i | I_i = 0] = 1/(λ_i + μ_i) + E[T_{i−1}] + E[T_i]        (6.3)

This follows since, independent of whether the first transition is a birth or a death, the time until it occurs is exponential with rate λ_i + μ_i; now if this first transition is a birth, then the population size is at i + 1, so no additional time is needed; whereas if it is a death, then the population size becomes i − 1 and the additional time needed to reach i + 1 is equal to the time it takes to return to state i (and this has mean E[T_{i−1}]) plus the additional time it then takes to reach i + 1


(and this has mean E[T_i]). Hence, since the probability that the first transition is a birth is λ_i/(λ_i + μ_i), we see that

    E[T_i] = 1/(λ_i + μ_i) + (μ_i/(λ_i + μ_i))(E[T_{i−1}] + E[T_i])

or, equivalently,

    E[T_i] = 1/λ_i + (μ_i/λ_i)E[T_{i−1}],    i ≥ 1

Starting with E[T_0] = 1/λ_0, the preceding yields an efficient method to successively compute E[T_1], E[T_2], and so on.

Suppose now that we wanted to determine the expected time to go from state i to state j where i < j. This quantity can be obtained by adding the expected times to successively go up one state; that is,

    E[time to go from i to j] = Σ_{k=i}^{j−1} E[T_k]

Example 6.7 For the birth and death process having parameters λ_i ≡ λ, μ_i ≡ μ, the recursion gives E[T_0] = 1/λ and E[T_i] = 1/λ + (μ/λ)E[T_{i−1}], whose solution is

    E[T_i] = (1/λ) Σ_{k=0}^{i} (μ/λ)^k = [1 − (μ/λ)^{i+1}]/(λ − μ)

Hence,

    E[time to go from k to j] = Σ_{i=k}^{j−1} E[T_i]
        = (j − k)/(λ − μ) − (μ/λ)^{k+1}[1 − (μ/λ)^{j−k}] / [(λ − μ)(1 − μ/λ)]


The foregoing assumes that λ ≠ μ. If λ = μ, then

    E[T_i] = (i + 1)/λ,
    E[time to go from k to j] = [j(j + 1) − k(k + 1)]/(2λ)

We can also compute the variance of the time to go from 0 to i + 1 by utilizing the conditional variance formula. First note that Equation (6.3) can be written as

    E[T_i | I_i] = 1/(λ_i + μ_i) + (1 − I_i)(E[T_{i−1}] + E[T_i])

Thus,

    Var(E[T_i | I_i]) = (E[T_{i−1}] + E[T_i])² Var(I_i)
                      = (E[T_{i−1}] + E[T_i])² μ_i λ_i/(μ_i + λ_i)²       (6.4)

where Var(I_i) is as shown since I_i is a Bernoulli random variable with parameter p = λ_i/(λ_i + μ_i). Also, note that if we let X_i denote the time until the transition from i occurs, then

    Var(T_i | I_i = 1) = Var(X_i | I_i = 1)
                       = Var(X_i)
                       = 1/(λ_i + μ_i)²                                    (6.5)

where the preceding uses the fact that the time until transition is independent of the next state visited. Also,

    Var(T_i | I_i = 0) = Var(X_i + time to get back to i + time to then reach i + 1)
                       = Var(X_i) + Var(T_{i−1}) + Var(T_i)                (6.6)

where the foregoing uses the fact that the three random variables are independent. We can rewrite Equations (6.5) and (6.6) as

    Var(T_i | I_i) = Var(X_i) + (1 − I_i)[Var(T_{i−1}) + Var(T_i)]

so

    E[Var(T_i | I_i)] = 1/(μ_i + λ_i)² + (μ_i/(μ_i + λ_i))[Var(T_{i−1}) + Var(T_i)]      (6.7)


Hence, using the conditional variance formula, which states that Var(T_i) is the sum of Equations (6.7) and (6.4), we obtain

    Var(T_i) = 1/(μ_i + λ_i)² + (μ_i/(μ_i + λ_i))[Var(T_{i−1}) + Var(T_i)]
               + (μ_i λ_i/(μ_i + λ_i)²)(E[T_{i−1}] + E[T_i])²

or, equivalently,

    Var(T_i) = 1/(λ_i(λ_i + μ_i)) + (μ_i/λ_i)Var(T_{i−1})
               + (μ_i/(μ_i + λ_i))(E[T_{i−1}] + E[T_i])²

Starting with Var(T_0) = 1/λ_0² and using the former recursion to obtain the expectations, we can recursively compute Var(T_i). In addition, if we want the variance of the time to reach state j, starting from state k, k < j, then this can be expressed as the time to go from k to k + 1 plus the additional time to go from k + 1 to k + 2, and so on. Since, by the Markovian property, these successive random variables are independent, it follows that

    Var(time to go from k to j) = Σ_{i=k}^{j−1} Var(T_i)

6.4. The Transition Probability Function P_ij(t)

Let

    P_ij(t) = P{X(t + s) = j | X(s) = i}

denote the probability that a process presently in state i will be in state j a time t later. These quantities are often called the transition probabilities of the continuous-time Markov chain.

We can explicitly determine P_ij(t) in the case of a pure birth process having distinct birth rates. For such a process, let X_k denote the time the process spends in state k before making a transition into state k + 1, k ≥ 1. Suppose that the process is presently in state i, and let j > i. Then, as X_i is the time it spends in state i before moving to state i + 1, and X_{i+1} is the time it then spends in state i + 1 before moving to state i + 2, and so on, it follows that Σ_{k=i}^{j−1} X_k is the time


it takes until the process enters state j. Now, if the process has not yet entered state j by time t, then its state at time t is smaller than j, and vice versa. That is,

    X(t) < j  ⇔  X_i + ··· + X_{j−1} > t

Therefore, for i < j,

    P{X(t) < j | X(0) = i} = P{X_i + ··· + X_{j−1} > t}

However, since X_i, ..., X_{j−1} are independent exponential random variables with respective rates λ_i, ..., λ_{j−1}, we obtain from the preceding and Equation (5.9), which gives the tail distribution function of Σ_{k=i}^{j−1} X_k, that

    P{X(t) < j | X(0) = i} = Σ_{k=i}^{j−1} e^{−λ_k t} Π_{r≠k, r=i}^{j−1} λ_r/(λ_r − λ_k)

Replacing j by j + 1 in the preceding gives

    P{X(t) < j + 1 | X(0) = i} = Σ_{k=i}^{j} e^{−λ_k t} Π_{r≠k, r=i}^{j} λ_r/(λ_r − λ_k)

Since

    P{X(t) = j | X(0) = i} = P{X(t) < j + 1 | X(0) = i} − P{X(t) < j | X(0) = i}

and since P_ii(t) = P{X_i > t} = e^{−λ_i t}, we have shown the following.

Proposition 6.1 For a pure birth process having λ_i ≠ λ_j when i ≠ j,

    P_ij(t) = Σ_{k=i}^{j} e^{−λ_k t} Π_{r≠k, r=i}^{j} λ_r/(λ_r − λ_k)
              − Σ_{k=i}^{j−1} e^{−λ_k t} Π_{r≠k, r=i}^{j−1} λ_r/(λ_r − λ_k),    i < j

    P_ii(t) = e^{−λ_i t}
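Stepping back to the hitting times T_i of the previous section, the recursions derived there for E[T_i] and Var(T_i) are easy to compute directly. A sketch (the helper names are hypothetical; the checks in the comments use the λ = μ formulas quoted in the text):

```python
def hitting_time_moments(lam, mu, n):
    """E[T_i] and Var(T_i) for i = 0..n-1, where T_i is the time for a
    birth and death process to go from state i to i+1, computed from
    E[T_0] = 1/lambda_0, Var(T_0) = 1/lambda_0**2 and the recursions
    in the text (mu[0] is taken to be 0)."""
    ET = [1.0 / lam[0]]
    VT = [1.0 / lam[0] ** 2]
    for i in range(1, n):
        l, m = lam[i], mu[i]
        ET.append(1.0 / l + (m / l) * ET[-1])
        VT.append(1.0 / (l * (l + m)) + (m / l) * VT[-1]
                  + (m / (m + l)) * (ET[-2] + ET[-1]) ** 2)
    return ET, VT

def expected_time(lam, mu, k, j):
    """Expected time to go from state k to state j > k: sum of E[T_i]."""
    ET, _ = hitting_time_moments(lam, mu, j)
    return sum(ET[k:j])
```

For constant rates with λ = μ this reproduces E[T_i] = (i + 1)/λ, and for a pure birth process (all μ_i = 0) it gives Var(T_i) = 1/λ_i², as it should since then T_i is exponential.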


Example 6.8 Consider the Yule process, which is a pure birth process in which each individual in the population independently gives birth at rate λ, and so λ_n = nλ, n ≥ 1. Letting i = 1, we obtain from Proposition 6.1

    P_1j(t) = Σ_{k=1}^{j} e^{−kλt} Π_{r≠k, r=1}^{j} r/(r − k) − Σ_{k=1}^{j−1} e^{−kλt} Π_{r≠k, r=1}^{j−1} r/(r − k)

            = e^{−jλt} Π_{r=1}^{j−1} r/(r − j) + Σ_{k=1}^{j−1} e^{−kλt} [j/(j − k) − 1] Π_{r≠k, r=1}^{j−1} r/(r − k)

            = e^{−jλt}(−1)^{j−1} + Σ_{k=1}^{j−1} e^{−kλt} (k/(j − k)) Π_{r≠k, r=1}^{j−1} r/(r − k)

Now,

    (k/(j − k)) Π_{r≠k, r=1}^{j−1} r/(r − k) = (j − 1)! / [(1 − k)(2 − k) ··· (k − 1 − k)(j − k)!]
                                            = (−1)^{k−1} (j−1 choose k−1)

so

    P_1j(t) = Σ_{k=1}^{j} e^{−kλt} (−1)^{k−1} (j−1 choose k−1)
            = e^{−λt} Σ_{i=0}^{j−1} (−e^{−λt})^{i} (j−1 choose i)
            = e^{−λt}(1 − e^{−λt})^{j−1}

Thus, starting with a single individual, the population size at time t has a geometric distribution with mean e^{λt}. If the population starts with i individuals, then we can regard each of these individuals as starting her own independent Yule process, and so the population at time t will be the sum of i independent and identically distributed geometric random variables with parameter e^{−λt}. But this means that the conditional distribution of X(t), given that X(0) = i, is the same as the distribution of the number of times that a coin that lands heads on each flip with probability e^{−λt} must be flipped to amass a total of i heads. Hence, the


population size at time t has a negative binomial distribution with parameters i and e^{−λt}, so

    P_ij(t) = (j−1 choose i−1) e^{−iλt}(1 − e^{−λt})^{j−i},    j ≥ i ≥ 1

[We could, of course, have used Proposition 6.1 to immediately obtain an equation for P_ij(t), rather than just using it for P_1j(t), but the algebra that would then have been needed to show the equivalence of the resulting expression to the preceding result is somewhat involved.]

We shall now derive a set of differential equations that the transition probabilities P_ij(t) satisfy in a general continuous-time Markov chain. However, first we need a definition and a pair of lemmas.

For any pair of states i and j, let

    q_ij = v_i P_ij

Since v_i is the rate at which the process makes a transition when in state i and P_ij is the probability that this transition is into state j, it follows that q_ij is the rate, when in state i, at which the process makes a transition into state j. The quantities q_ij are called the instantaneous transition rates. Since

    v_i = Σ_j v_i P_ij = Σ_j q_ij

and

    P_ij = q_ij/v_i = q_ij / Σ_j q_ij

it follows that specifying the instantaneous transition rates determines the parameters of the continuous-time Markov chain.

Lemma 6.2

(a) lim_{h→0} [1 − P_ii(h)]/h = v_i
(b) lim_{h→0} P_ij(h)/h = q_ij    when i ≠ j

Proof We first note that since the amount of time until a transition occurs is exponentially distributed, it follows that the probability of two or more transitions


in a time h is o(h). Thus, 1 − P_ii(h), the probability that a process in state i at time 0 will not be in state i at time h, equals the probability that a transition occurs within time h plus something small compared to h. Therefore,

    1 − P_ii(h) = v_i h + o(h)

and part (a) is proven. To prove part (b), we note that P_ij(h), the probability that the process goes from state i to state j in a time h, equals the probability that a transition occurs in this time multiplied by the probability that the transition is into state j, plus something small compared to h. That is,

    P_ij(h) = hv_i P_ij + o(h)

and part (b) is proven.

Lemma 6.3 For all s ≥ 0, t ≥ 0,

    P_ij(t + s) = Σ_{k=0}^{∞} P_ik(t)P_kj(s)                      (6.8)

Proof In order for the process to go from state i to state j in time t + s, it must be somewhere at time t, and thus

    P_ij(t + s) = P{X(t + s) = j | X(0) = i}
                = Σ_{k=0}^{∞} P{X(t + s) = j, X(t) = k | X(0) = i}
                = Σ_{k=0}^{∞} P{X(t + s) = j | X(t) = k, X(0) = i} · P{X(t) = k | X(0) = i}
                = Σ_{k=0}^{∞} P{X(t + s) = j | X(t) = k} · P{X(t) = k | X(0) = i}
                = Σ_{k=0}^{∞} P_kj(s)P_ik(t)

and the proof is completed.
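As a numerical cross-check on the preceding section, Proposition 6.1 evaluated for the Yule rates λ_n = nλ must agree with the negative binomial closed form of Example 6.8. A sketch (the helper name is mine, and the formula requires distinct birth rates):

```python
import math

def pure_birth_Pij(t, i, j, lam):
    """P_ij(t) for a pure birth process with distinct birth rates
    lam[k], following Proposition 6.1."""
    if j == i:
        return math.exp(-lam[i] * t)

    def below(upper):
        # P{X(t) < upper + 1 | X(0) = i}, the hypoexponential tail formula
        s = 0.0
        for k in range(i, upper + 1):
            prod = 1.0
            for r in range(i, upper + 1):
                if r != k:
                    prod *= lam[r] / (lam[r] - lam[k])
            s += math.exp(-lam[k] * t) * prod
        return s

    return below(j) - below(j - 1)

# Yule process: lam[n] = n * lam_rate; compare with the negative
# binomial expression P_ij(t) = C(j-1, i-1) e^{-i lam t}(1 - e^{-lam t})^{j-i}.
lam_rate, t, i, j = 0.7, 0.9, 2, 5
rates = [n * lam_rate for n in range(j + 1)]
p_prop = pure_birth_Pij(t, i, j, rates)
p_nb = (math.comb(j - 1, i - 1) * math.exp(-i * lam_rate * t)
        * (1 - math.exp(-lam_rate * t)) ** (j - i))
```

The two expressions are algebraically identical, so they should match to floating-point accuracy.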


The set of equations (6.8) is known as the Chapman–Kolmogorov equations. From Lemma 6.3, we obtain

    P_ij(h + t) − P_ij(t) = Σ_{k=0}^{∞} P_ik(h)P_kj(t) − P_ij(t)
                          = Σ_{k≠i} P_ik(h)P_kj(t) − [1 − P_ii(h)]P_ij(t)

and thus

    lim_{h→0} [P_ij(t + h) − P_ij(t)]/h
        = lim_{h→0} { Σ_{k≠i} [P_ik(h)/h] P_kj(t) − [(1 − P_ii(h))/h] P_ij(t) }

Now assuming that we can interchange the limit and the summation in the preceding and applying Lemma 6.2, we obtain

    P′_ij(t) = Σ_{k≠i} q_ik P_kj(t) − v_i P_ij(t)

It turns out that this interchange can indeed be justified and, hence, we have the following theorem.

Theorem 6.1 (Kolmogorov's Backward Equations) For all states i, j, and times t ≥ 0,

    P′_ij(t) = Σ_{k≠i} q_ik P_kj(t) − v_i P_ij(t)

Example 6.9 The backward equations for the pure birth process become

    P′_ij(t) = λ_i P_{i+1,j}(t) − λ_i P_ij(t)

Example 6.10 The backward equations for the birth and death process become

    P′_0j(t) = λ_0 P_1j(t) − λ_0 P_0j(t),
    P′_ij(t) = (λ_i + μ_i)[(λ_i/(λ_i + μ_i))P_{i+1,j}(t) + (μ_i/(λ_i + μ_i))P_{i−1,j}(t)] − (λ_i + μ_i)P_ij(t)

or, equivalently,

    P′_0j(t) = λ_0[P_1j(t) − P_0j(t)],
    P′_ij(t) = λ_i P_{i+1,j}(t) + μ_i P_{i−1,j}(t) − (λ_i + μ_i)P_ij(t),    i > 0      (6.9)
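The backward equations (6.9) lend themselves to direct numerical integration. The sketch below (a plain Euler scheme; the helper name and the truncation of the state space are my choices, not the text's) integrates them for a birth and death chain:

```python
def backward_euler(lams, mus, t, steps=20000):
    """Euler-integrate the backward equations (6.9) for a birth and
    death process truncated to states 0..N; the birth rate in the top
    state N is treated as 0, so the truncation should be generous."""
    N = len(lams) - 1
    # P[i][j] approximates P_ij(t), starting from P_ij(0) = 1{i == j}.
    P = [[1.0 if i == j else 0.0 for j in range(N + 1)] for i in range(N + 1)]
    h = t / steps
    for _ in range(steps):
        new = [row[:] for row in P]
        for i in range(N + 1):
            li = lams[i] if i < N else 0.0   # drop births out of the top state
            mi = mus[i]
            for j in range(N + 1):
                up = P[i + 1][j] if i < N else 0.0
                down = P[i - 1][j] if i > 0 else 0.0
                new[i][j] = P[i][j] + h * (li * (up - P[i][j]) + mi * (down - P[i][j]))
        P = new
    return P

# An M/M/1-type chain with lam = 1, mu = 1.5, truncated at state 7.
P = backward_euler([1.0] * 8, [0.0] + [1.5] * 7, t=1.0)
```

Each row of the result should remain a probability distribution, which gives a quick sanity check on the scheme.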


Example 6.11 (A Continuous-Time Markov Chain Consisting of Two States) Consider a machine that works for an exponential amount of time having mean 1/λ before breaking down; and suppose that it takes an exponential amount of time having mean 1/μ to repair the machine. If the machine is in working condition at time 0, then what is the probability that it will be working at time t = 10?

To answer this question, we note that the process is a birth and death process (with state 0 meaning that the machine is working and state 1 that it is being repaired) having parameters

    λ_0 = λ,    μ_1 = μ,
    λ_i = 0, i ≠ 0,    μ_i = 0, i ≠ 1

We shall derive the desired probability, namely, P_00(10), by solving the set of differential equations given in Example 6.10. From Equation (6.9), we obtain

    P′_00(t) = λ[P_10(t) − P_00(t)],                              (6.10)
    P′_10(t) = μP_00(t) − μP_10(t)                                (6.11)

Multiplying Equation (6.10) by μ and Equation (6.11) by λ and then adding the two equations yields

    μP′_00(t) + λP′_10(t) = 0

By integrating, we obtain

    μP_00(t) + λP_10(t) = c

However, since P_00(0) = 1 and P_10(0) = 0, we obtain c = μ and hence

    μP_00(t) + λP_10(t) = μ                                       (6.12)

or, equivalently,

    λP_10(t) = μ[1 − P_00(t)]

By substituting this result in Equation (6.10), we obtain

    P′_00(t) = μ[1 − P_00(t)] − λP_00(t)
             = μ − (μ + λ)P_00(t)

Letting

    h(t) = P_00(t) − μ/(μ + λ)


we have

    h′(t) = μ − (μ + λ)[h(t) + μ/(μ + λ)]
          = −(μ + λ)h(t)

or

    h′(t)/h(t) = −(μ + λ)

By integrating both sides, we obtain

    log h(t) = −(μ + λ)t + c

or

    h(t) = Ke^{−(μ+λ)t}

and thus

    P_00(t) = Ke^{−(μ+λ)t} + μ/(μ + λ)

which finally yields, by setting t = 0 and using the fact that P_00(0) = 1,

    P_00(t) = (λ/(μ + λ))e^{−(μ+λ)t} + μ/(μ + λ)

From Equation (6.12), this also implies that

    P_10(t) = μ/(μ + λ) − (μ/(μ + λ))e^{−(μ+λ)t}

Hence, our desired probability P_00(10) equals

    P_00(10) = (λ/(μ + λ))e^{−10(μ+λ)} + μ/(μ + λ)

Another set of differential equations, different from the backward equations, may also be derived. This set of equations, known as Kolmogorov's forward


equations, is derived as follows. From the Chapman–Kolmogorov equations (Lemma 6.3), we have

    P_ij(t + h) − P_ij(t) = Σ_{k=0}^{∞} P_ik(t)P_kj(h) − P_ij(t)
                          = Σ_{k≠j} P_ik(t)P_kj(h) − [1 − P_jj(h)]P_ij(t)

and thus

    lim_{h→0} [P_ij(t + h) − P_ij(t)]/h
        = lim_{h→0} { Σ_{k≠j} P_ik(t)[P_kj(h)/h] − [(1 − P_jj(h))/h] P_ij(t) }

and, assuming that we can interchange limit with summation, we obtain from Lemma 6.2

    P′_ij(t) = Σ_{k≠j} q_kj P_ik(t) − v_j P_ij(t)

Unfortunately, we cannot always justify the interchange of limit and summation, and thus the preceding is not always valid. However, it does hold in most models, including all birth and death processes and all finite state models. We thus have the following.

Theorem 6.2 (Kolmogorov's Forward Equations) Under suitable regularity conditions,

    P′_ij(t) = Σ_{k≠j} q_kj P_ik(t) − v_j P_ij(t)                  (6.13)

We shall now solve the forward equations for the pure birth process. For this process, Equation (6.13) reduces to

    P′_ij(t) = λ_{j−1} P_{i,j−1}(t) − λ_j P_ij(t)

However, by noting that P_ij(t) = 0 whenever j < i, we can rewrite the preceding equation to obtain

    P′_ii(t) = −λ_i P_ii(t),
    P′_ij(t) = λ_{j−1} P_{i,j−1}(t) − λ_j P_ij(t),    j ≥ i + 1    (6.14)


Proposition 6.4 For a pure birth process,

    P_ii(t) = e^{−λ_i t},    i ≥ 0
    P_ij(t) = λ_{j−1} e^{−λ_j t} ∫_0^t e^{λ_j s} P_{i,j−1}(s) ds,    j ≥ i + 1

Proof The fact that P_ii(t) = e^{−λ_i t} follows from Equation (6.14) by integrating and using the fact that P_ii(0) = 1. To prove the corresponding result for P_ij(t), we note by Equation (6.14) that

    e^{λ_j t}[P′_ij(t) + λ_j P_ij(t)] = e^{λ_j t} λ_{j−1} P_{i,j−1}(t)

or

    (d/dt)[e^{λ_j t} P_ij(t)] = λ_{j−1} e^{λ_j t} P_{i,j−1}(t)

Hence, since P_ij(0) = 0, we obtain the desired results.

Example 6.12 (Forward Equations for Birth and Death Process) The forward equations (Equation 6.13) for the general birth and death process become

    P′_i0(t) = Σ_{k≠0} q_k0 P_ik(t) − λ_0 P_i0(t)
             = μ_1 P_i1(t) − λ_0 P_i0(t)                           (6.15)

    P′_ij(t) = Σ_{k≠j} q_kj P_ik(t) − (λ_j + μ_j)P_ij(t)
             = λ_{j−1} P_{i,j−1}(t) + μ_{j+1} P_{i,j+1}(t) − (λ_j + μ_j)P_ij(t)      (6.16)

6.5. Limiting Probabilities

In analogy with a basic result in discrete-time Markov chains, the probability that a continuous-time Markov chain will be in state j at time t often converges to a limiting value that is independent of the initial state. That is, if we call this value P_j, then

    P_j ≡ lim_{t→∞} P_ij(t)

where we are assuming that the limit exists and is independent of the initial state i.


To derive a set of equations for the P_j, consider first the set of forward equations

    P′_ij(t) = Σ_{k≠j} q_kj P_ik(t) − v_j P_ij(t)                  (6.17)

Now, if we let t approach ∞, then, assuming that we can interchange limit and summation, we obtain

    lim_{t→∞} P′_ij(t) = lim_{t→∞} [ Σ_{k≠j} q_kj P_ik(t) − v_j P_ij(t) ]
                       = Σ_{k≠j} q_kj P_k − v_j P_j

However, as P_ij(t) is a bounded function (being a probability it is always between 0 and 1), it follows that if P′_ij(t) converges, then it must converge to 0 (why is this?). Hence, we must have

    0 = Σ_{k≠j} q_kj P_k − v_j P_j

or

    v_j P_j = Σ_{k≠j} q_kj P_k,    all states j                    (6.18)

The preceding set of equations, along with the equation

    Σ_j P_j = 1                                                    (6.19)

can be used to solve for the limiting probabilities.

Remarks (i) We have assumed that the limiting probabilities P_j exist. A sufficient condition for this is that

(a) all states of the Markov chain communicate in the sense that starting in state i there is a positive probability of ever being in state j, for all i, j, and
(b) the Markov chain is positive recurrent in the sense that, starting in any state, the mean time to return to that state is finite.

If conditions (a) and (b) hold, then the limiting probabilities will exist and satisfy Equations (6.18) and (6.19). In addition, P_j also will have the interpretation of being the long-run proportion of time that the process is in state j.

(ii) Equations (6.18) and (6.19) have a nice interpretation: In any interval (0, t) the number of transitions into state j must equal to within 1 the number of tran-


sitions out of state j (why?). Hence, in the long run, the rate at which transitions into state j occur must equal the rate at which transitions out of state j occur. Now when the process is in state j, it leaves at rate v_j, and, as P_j is the proportion of time it is in state j, it thus follows that

    v_j P_j = rate at which the process leaves state j

Similarly, when the process is in state k, it enters j at a rate q_kj. Hence, as P_k is the proportion of time in state k, we see that the rate at which transitions from k to j occur is just q_kj P_k; thus

    Σ_{k≠j} q_kj P_k = rate at which the process enters state j

So, Equation (6.18) is just a statement of the equality of the rates at which the process enters and leaves state j. Because it balances (that is, equates) these rates, the equations (6.18) are sometimes referred to as "balance equations."

(iii) When the limiting probabilities P_j exist, we say that the chain is ergodic. The P_j are sometimes called stationary probabilities since it can be shown that (as in the discrete-time case) if the initial state is chosen according to the distribution {P_j}, then the probability of being in state j at time t is P_j, for all t.

Let us now determine the limiting probabilities for a birth and death process. From Equation (6.18), or equivalently, by equating the rate at which the process leaves a state with the rate at which it enters that state, we obtain

    State       Rate at which leave = rate at which enter
    0           λ_0 P_0 = μ_1 P_1
    1           (λ_1 + μ_1)P_1 = μ_2 P_2 + λ_0 P_0
    2           (λ_2 + μ_2)P_2 = μ_3 P_3 + λ_1 P_1
    n, n ≥ 1    (λ_n + μ_n)P_n = μ_{n+1} P_{n+1} + λ_{n−1} P_{n−1}

By adding to each equation the equation preceding it, we obtain

    λ_0 P_0 = μ_1 P_1,
    λ_1 P_1 = μ_2 P_2,
    λ_2 P_2 = μ_3 P_3,
    ...
    λ_n P_n = μ_{n+1} P_{n+1},    n ≥ 0


Solving in terms of P_0 yields

    P_1 = (λ_0/μ_1)P_0,
    P_2 = (λ_1/μ_2)P_1 = (λ_1 λ_0/(μ_2 μ_1))P_0,
    P_3 = (λ_2/μ_3)P_2 = (λ_2 λ_1 λ_0/(μ_3 μ_2 μ_1))P_0,
    ...
    P_n = (λ_{n−1}/μ_n)P_{n−1} = (λ_{n−1} λ_{n−2} ··· λ_1 λ_0/(μ_n μ_{n−1} ··· μ_2 μ_1))P_0

And by using the fact that Σ_{n=0}^{∞} P_n = 1, we obtain

    1 = P_0 + P_0 Σ_{n=1}^{∞} (λ_{n−1} ··· λ_1 λ_0)/(μ_n ··· μ_2 μ_1)

or

    P_0 = 1 / [1 + Σ_{n=1}^{∞} (λ_0 λ_1 ··· λ_{n−1})/(μ_1 μ_2 ··· μ_n)]

and so

    P_n = [(λ_0 λ_1 ··· λ_{n−1})/(μ_1 μ_2 ··· μ_n)] / [1 + Σ_{n=1}^{∞} (λ_0 λ_1 ··· λ_{n−1})/(μ_1 μ_2 ··· μ_n)],    n ≥ 1      (6.20)

The foregoing equations also show us what condition is necessary for these limiting probabilities to exist. Namely, it is necessary that

    Σ_{n=1}^{∞} (λ_0 λ_1 ··· λ_{n−1})/(μ_1 μ_2 ··· μ_n) < ∞        (6.21)

This condition also may be shown to be sufficient.
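Equation (6.20) can be evaluated numerically by truncating the state space at a large N, provided the tail of the sum beyond N is negligible. A sketch (the helper name is mine; the test case is the constant-rate chain with λ/μ < 1, for which P_n is geometric):

```python
def limiting_probs(lam, mu, N):
    """P_0..P_N from Equation (6.20), truncating the state space at N.
    lam[n] is the birth rate in state n (n = 0..N-1) and mu[n] the
    death rate in state n (n = 1..N)."""
    weights = [1.0]                       # weight for state 0
    for n in range(1, N + 1):
        # weight_n = lam_0 ... lam_{n-1} / (mu_1 ... mu_n)
        weights.append(weights[-1] * lam[n - 1] / mu[n])
    total = sum(weights)
    return [w / total for w in weights]

# Constant rates with lam/mu = 1/2: P_n should approach (1/2)^n (1 - 1/2).
N = 60
P = limiting_probs([1.0] * N, [0.0] + [2.0] * N, N)
```

Truncating at N = 60 makes the neglected tail of order 2^{-61}, far below the comparison tolerance.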


In the multiserver exponential queueing system (Example 6.6), Condition (6.21) reduces to

    Σ_{n=s+1}^{∞} λ^n/(sμ)^n < ∞

which is equivalent to λ/sμ < 1.

For the linear growth model with immigration (Example 6.4), Condition (6.21) reduces to

    Σ_{n=1}^{∞} θ(θ + λ) ··· (θ + (n − 1)λ)/(n!μ^n) < ∞

Using the ratio test, the preceding will converge when

    lim_{n→∞} [θ(θ + λ) ··· (θ + nλ)/((n + 1)!μ^{n+1})] · [n!μ^n/(θ(θ + λ) ··· (θ + (n − 1)λ))]
        = lim_{n→∞} (θ + nλ)/((n + 1)μ) = λ/μ < 1

That is, the condition is satisfied when λ < μ.

Example 6.13 (A Machine Repair Model) Consider a job shop consisting of M machines and one serviceman. Suppose that the amount of time each machine runs before breaking down is exponentially distributed with mean 1/λ, and that the amount of time it takes the serviceman to fix a machine is exponentially distributed with mean 1/μ. If we say that the system is in state n whenever n machines are not in use, then the preceding is a birth and death process having parameters

    μ_n = μ,            n ≥ 1
    λ_n = (M − n)λ,     n ≤ M
    λ_n = 0,            n > M

This is so in the sense that a failing machine is regarded as an arrival and a fixed machine as a departure. If any machines are broken down, then, since the serviceman's rate is μ, μ_n = μ. On the other hand, if n machines are not in use, then, since the M − n machines in use each fail at a rate λ, it follows that


λ_n = (M − n)λ. From Equation (6.20) we have that P_n, the probability that n machines will not be in use, is given by

    P_0 = 1 / [1 + Σ_{n=1}^{M} Mλ(M − 1)λ ··· (M − n + 1)λ/μ^n]
        = 1 / [1 + Σ_{n=1}^{M} (λ/μ)^n M!/(M − n)!],

    P_n = [(λ/μ)^n M!/(M − n)!] / [1 + Σ_{n=1}^{M} (λ/μ)^n M!/(M − n)!],    n = 0, 1, ..., M

Hence, the average number of machines not in use is given by

    Σ_{n=0}^{M} nP_n = [Σ_{n=0}^{M} n(λ/μ)^n M!/(M − n)!] / [1 + Σ_{n=1}^{M} (λ/μ)^n M!/(M − n)!]      (6.22)

To obtain the long-run proportion of time that a given machine is working, we will compute the equivalent limiting probability of its working. To do so, we condition on the number of machines that are not working to obtain

    P{machine is working} = Σ_{n=0}^{M} P{machine is working | n not working}P_n
                          = Σ_{n=0}^{M} ((M − n)/M)P_n        (since if n are not working, then M − n are working!)
                          = 1 − Σ_{n=0}^{M} nP_n / M

where Σ_{n=0}^{M} nP_n is given by Equation (6.22).

Example 6.14 (The M/M/1 Queue) In the M/M/1 queue, λ_n = λ, μ_n = μ and thus, from Equation (6.20),

    P_n = (λ/μ)^n / [1 + Σ_{n=1}^{∞} (λ/μ)^n]
        = (λ/μ)^n (1 − λ/μ),    n ≥ 0

provided that λ/μ < 1. It is intuitive that λ must be less than μ for limiting probabilities to exist. Customers arrive at rate λ and are served at rate μ, and


thus if λ > μ, then they arrive at a faster rate than they can be served and the queue size will go to infinity. The case λ = μ behaves much like the symmetric random walk of Section 4.3, which is null recurrent and thus has no limiting probabilities.

Example 6.15 Let us reconsider the shoeshine shop of Example 6.1, and determine the proportion of time the process is in each of the states 0, 1, 2. Because this is not a birth and death process (since the process can go directly from state 2 to state 0), we start with the balance equations for the limiting probabilities.

    State   Rate that the process leaves = rate that the process enters
    0       λP_0 = μ_2 P_2
    1       μ_1 P_1 = λP_0
    2       μ_2 P_2 = μ_1 P_1

Solving in terms of P_0 yields

    P_2 = (λ/μ_2)P_0,    P_1 = (λ/μ_1)P_0

which implies, since P_0 + P_1 + P_2 = 1, that

    P_0[1 + λ/μ_2 + λ/μ_1] = 1

or

    P_0 = μ_1 μ_2/(μ_1 μ_2 + λ(μ_1 + μ_2))

and

    P_1 = λμ_2/(μ_1 μ_2 + λ(μ_1 + μ_2)),
    P_2 = λμ_1/(μ_1 μ_2 + λ(μ_1 + μ_2))

Example 6.16 Consider a set of n components along with a single repairman. Suppose that component i functions for an exponentially distributed time with rate λ_i and then fails. The time it then takes to repair component i is exponential with rate μ_i, i = 1, ..., n. Suppose that when there is more than one failed component the repairman always works on the most recent failure. For instance, if there are at


present two failed components, say, components 1 and 2, of which 1 has failed most recently, then the repairman will be working on component 1. However, if component 3 should fail before 1's repair is completed, then the repairman would stop working on component 1 and switch to component 3 (that is, a newly failed component preempts service).

To analyze the preceding as a continuous-time Markov chain, the state must represent the set of failed components in the order of failure. That is, the state will be i_1, i_2, ..., i_k if i_1, i_2, ..., i_k are the k failed components (all the other n − k being functional), with i_1 having been the most recent failure (and thus presently being repaired), i_2 the second most recent, and so on. Because there are k! possible orderings for a fixed set of k failed components and (n choose k) choices of that set, it follows that there are

    Σ_{k=0}^{n} (n choose k) k! = Σ_{k=0}^{n} n!/(n − k)! = n! Σ_{i=0}^{n} 1/i!

possible states.

The balance equations for the limiting probabilities are as follows:

    (μ_{i_1} + Σ_{i≠i_j, j=1,...,k} λ_i) P(i_1, ..., i_k)
        = Σ_{i≠i_j, j=1,...,k} P(i, i_1, ..., i_k)μ_i + P(i_2, ..., i_k)λ_{i_1},

    Σ_{i=1}^{n} λ_i P(φ) = Σ_{i=1}^{n} P(i)μ_i                     (6.23)

where φ is the state in which all components are working. The preceding equations follow because state i_1, ..., i_k can be left either by a failure of any of the additional components or by a repair completion of component i_1. Also, that state can be entered either by a repair completion of component i when the state is i, i_1, ..., i_k, or by a failure of component i_1 when the state is i_2, ..., i_k.

However, if we take

    P(i_1, ..., i_k) = [(λ_{i_1} λ_{i_2} ··· λ_{i_k})/(μ_{i_1} μ_{i_2} ··· μ_{i_k})] P(φ)      (6.24)

then it is easily seen that Equations (6.23) are satisfied. Hence, by uniqueness, these must be the limiting probabilities, with P(φ) determined to make their sum


equal 1. That is,

    P(φ) = [1 + Σ_{i_1,...,i_k} (λ_{i_1} ··· λ_{i_k})/(μ_{i_1} ··· μ_{i_k})]^{−1}

As an illustration, suppose n = 2, so that there are five states: φ, 1, 2, 12, 21. Then from the preceding we would have

    P(φ) = [1 + λ_1/μ_1 + λ_2/μ_2 + 2λ_1 λ_2/(μ_1 μ_2)]^{−1},
    P(1) = (λ_1/μ_1)P(φ),
    P(2) = (λ_2/μ_2)P(φ),
    P(1, 2) = P(2, 1) = (λ_1 λ_2/(μ_1 μ_2))P(φ)

It is interesting to note, using Equation (6.24), that given the set of failed components, each of the possible orderings of these components is equally likely.

6.6. Time Reversibility

Consider a continuous-time Markov chain that is ergodic, and let us consider the limiting probabilities P_i from a different point of view than previously. If we consider the sequence of states visited, ignoring the amount of time spent in each state during a visit, then this sequence constitutes a discrete-time Markov chain with transition probabilities P_ij. Let us assume that this discrete-time Markov chain, called the embedded chain, is ergodic and denote by π_i its limiting probabilities. That is, the π_i are the unique solution of

    π_i = Σ_j π_j P_ji,    all i
    Σ_i π_i = 1

Now since π_i represents the proportion of transitions that take the process into state i, and because 1/v_i is the mean time spent in state i during a visit, it seems


intuitive that $P_i$, the proportion of time in state $i$, should be a weighted average of the $\pi_i$, where $\pi_i$ is weighted proportionately to $1/v_i$. That is, it is intuitive that

$$P_i = \frac{\pi_i / v_i}{\sum_j \pi_j / v_j} \qquad (6.25)$$

To check the preceding, recall that the limiting probabilities $P_i$ must satisfy

$$v_i P_i = \sum_{j \neq i} P_j q_{ji}, \quad \text{all } i$$

or equivalently, since $P_{ii} = 0$,

$$v_i P_i = \sum_j P_j v_j P_{ji}, \quad \text{all } i$$

Hence, for the $P_i$s to be given by Equation (6.25), the following would be necessary:

$$\pi_i = \sum_j \pi_j P_{ji}, \quad \text{all } i$$

But this, of course, follows since it is in fact the defining equation for the $\pi_i$s.

Suppose now that the continuous-time Markov chain has been in operation for a long time, and suppose that starting at some (large) time $T$ we trace the process going backward in time. To determine the probability structure of this reversed process, we first note that, given we are in state $i$ at some time, say $t$, the probability that we have been in this state for an amount of time greater than $s$ is just $e^{-v_i s}$. This is so, since

$$P\{\text{process is in state } i \text{ throughout } [t-s, t] \mid X(t) = i\} = \frac{P\{\text{process is in state } i \text{ throughout } [t-s, t]\}}{P\{X(t) = i\}} = \frac{P\{X(t-s) = i\}\, e^{-v_i s}}{P\{X(t) = i\}} = e^{-v_i s}$$

since for $t$ large, $P\{X(t-s) = i\} = P\{X(t) = i\} = P_i$.

In other words, going backward in time, the amount of time the process spends in state $i$ is also exponentially distributed with rate $v_i$. In addition, as was shown


in Section 4.8, the sequence of states visited by the reversed process constitutes a discrete-time Markov chain with transition probabilities $Q_{ij}$ given by

$$Q_{ij} = \frac{\pi_j P_{ji}}{\pi_i}$$

Hence, we see from the preceding that the reversed process is a continuous-time Markov chain with the same transition rates as the forward-time process and with one-stage transition probabilities $Q_{ij}$. Therefore, the continuous-time Markov chain will be time reversible, in the sense that the process reversed in time has the same probabilistic structure as the original process, if the embedded chain is time reversible. That is, if

$$\pi_i P_{ij} = \pi_j P_{ji} \quad \text{for all } i, j$$

Now, using the fact that $P_i = (\pi_i/v_i)/\bigl(\sum_j \pi_j/v_j\bigr)$, we see that the preceding condition is equivalent to

$$P_i q_{ij} = P_j q_{ji} \quad \text{for all } i, j \qquad (6.26)$$

Since $P_i$ is the proportion of time in state $i$, and $q_{ij}$ is the rate, when in state $i$, at which the process goes to $j$, the condition for time reversibility is that the rate at which the process goes directly from state $i$ to state $j$ is equal to the rate at which it goes directly from $j$ to $i$. It should be noted that this is exactly the same condition needed for an ergodic discrete-time Markov chain to be time reversible (see Section 4.8).

An application of the preceding condition for time reversibility yields the following proposition concerning birth and death processes.

Proposition 6.5 An ergodic birth and death process is time reversible.

Proof We must show that the rate at which a birth and death process goes from state $i$ to state $i + 1$ is equal to the rate at which it goes from $i + 1$ to $i$. Now in any length of time $t$ the number of transitions from $i$ to $i + 1$ must equal to within 1 the number from $i + 1$ to $i$ (since between each pair of transitions from $i$ to $i + 1$ the process must return to $i$, and this can only occur through $i + 1$, and vice versa). Hence, as the number of such transitions goes to infinity as $t \to \infty$, it follows that the rate of transitions from $i$ to $i + 1$ equals the rate from $i + 1$ to $i$.
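As a small numeric sanity check of condition (6.26), the following Python sketch builds the product-form limiting probabilities of a finite birth and death process and verifies detailed balance on every adjacent pair of states. The function name and the rates below are ours, invented purely for illustration:

```python
def birth_death_limiting(lam, mu):
    """Limiting probabilities of an ergodic birth and death process on
    states 0..N.  lam[i] is the birth rate in state i (i -> i+1) and
    mu[i] the death rate in state i (i -> i-1; mu[0] is unused).
    Detailed balance P_i lam[i] = P_{i+1} mu[i+1] gives a product form."""
    w = [1.0]
    for i in range(len(lam)):
        w.append(w[-1] * lam[i] / mu[i + 1])
    total = sum(w)
    return [x / total for x in w]

# Equation (6.26): P_i q_{i,i+1} = P_{i+1} q_{i+1,i} for every adjacent pair
lam = [3.0, 2.0, 1.0]      # births from states 0, 1, 2 (illustrative rates)
mu = [0.0, 1.0, 2.0, 4.0]  # deaths from states 1, 2, 3
P = birth_death_limiting(lam, mu)
for i in range(len(lam)):
    assert abs(P[i] * lam[i] - P[i + 1] * mu[i + 1]) < 1e-12
```

Since $q_{ij} = 0$ whenever $|i - j| > 1$, checking the adjacent pairs checks Equation (6.26) in full, which is exactly why Proposition 6.5 holds.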
Proposition 6.5 can be used to prove the important result that the output process of an M/M/s queue is a Poisson process. We state this as a corollary.

Corollary 6.6 Consider an M/M/s queue in which customers arrive in accordance with a Poisson process having rate $\lambda$ and are served by any one of $s$ servers, each having an exponentially distributed service time with rate $\mu$.


If $\lambda < s\mu$, then the output process of departing customers is, after the process has been in operation for a long time, a Poisson process with rate $\lambda$.

[Figure 6.1. The number in the system.]


Analogous to the result for discrete-time Markov chains, if we can find a probability vector $P$ that satisfies the preceding, then the Markov chain is time reversible and the $P_i$s are the long-run probabilities. That is, we have the following proposition.

Proposition 6.7 If for some set $\{P_i\}$,

$$\sum_i P_i = 1, \qquad P_i \geq 0,$$

and

$$P_i q_{ij} = P_j q_{ji} \quad \text{for all } i \neq j \qquad (6.27)$$

then the continuous-time Markov chain is time reversible and $P_i$ represents the limiting probability of being in state $i$.

Proof For fixed $i$ we obtain, upon summing Equation (6.27) over all $j \neq i$,

$$\sum_{j \neq i} P_i q_{ij} = \sum_{j \neq i} P_j q_{ji}$$

or, since $\sum_{j \neq i} q_{ij} = v_i$,

$$v_i P_i = \sum_{j \neq i} P_j q_{ji}$$

Hence, the $P_i$s satisfy the balance equations and thus represent the limiting probabilities. Because Equation (6.27) holds, the chain is time reversible.

Example 6.18 Consider a set of $n$ machines and a single repair facility to service them. Suppose that when machine $i$, $i = 1,\ldots,n$, goes down, it requires an exponentially distributed amount of work with rate $\mu_i$ to get it back up. The repair facility divides its efforts equally among all down components, in the sense that whenever there are $k$ down machines, $1 \leq k \leq n$, each receives work at a rate of $1/k$ per unit time. Finally, suppose that each time machine $i$ goes back up it remains up for an exponentially distributed time with rate $\lambda_i$.

The preceding can be analyzed as a continuous-time Markov chain having $2^n$ states, where the state at any time corresponds to the set of machines that are down at that time. Thus, for instance, the state will be $(i_1, i_2, \ldots, i_k)$ when machines $i_1, \ldots, i_k$ are down and all the others are up. The instantaneous transition rates are


as follows:

$$q_{(i_1,\ldots,i_{k-1}),\,(i_1,\ldots,i_k)} = \lambda_{i_k},$$
$$q_{(i_1,\ldots,i_k),\,(i_1,\ldots,i_{k-1})} = \mu_{i_k}/k$$

where $i_1, \ldots, i_k$ are all distinct. This follows since the failure rate of machine $i_k$ is always $\lambda_{i_k}$, and the repair rate of machine $i_k$ when there are $k$ failed machines is $\mu_{i_k}/k$.

Hence the time reversible equations (6.27) are

$$P(i_1,\ldots,i_k)\,\mu_{i_k}/k = P(i_1,\ldots,i_{k-1})\,\lambda_{i_k}$$

or

$$P(i_1,\ldots,i_k) = \frac{k\lambda_{i_k}}{\mu_{i_k}}\, P(i_1,\ldots,i_{k-1})$$
$$= \frac{k\lambda_{i_k}}{\mu_{i_k}} \cdot \frac{(k-1)\lambda_{i_{k-1}}}{\mu_{i_{k-1}}}\, P(i_1,\ldots,i_{k-2}) \quad \text{upon iterating}$$
$$\vdots$$
$$= k! \prod_{j=1}^{k} (\lambda_{i_j}/\mu_{i_j})\, P(\phi)$$

where $\phi$ is the state in which all components are working. Because

$$P(\phi) + \sum P(i_1,\ldots,i_k) = 1$$

we see that

$$P(\phi) = \Bigl[1 + \sum_{i_1,\ldots,i_k} k! \prod_{j=1}^{k} (\lambda_{i_j}/\mu_{i_j})\Bigr]^{-1} \qquad (6.28)$$

where the preceding sum is over all the $2^n - 1$ nonempty subsets $\{i_1,\ldots,i_k\}$ of $\{1, 2, \ldots, n\}$. Hence, as the time reversible equations are satisfied for this choice of probability vector, it follows from Proposition 6.7 that the chain is time reversible and

$$P(i_1,\ldots,i_k) = k! \prod_{j=1}^{k} (\lambda_{i_j}/\mu_{i_j})\, P(\phi)$$

with $P(\phi)$ being given by (6.28).
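The product-form answer of Example 6.18 is straightforward to evaluate numerically. The following Python sketch (the function name and rates are ours, chosen for illustration) enumerates the $2^n - 1$ nonempty subsets, applies the $k!\prod_j \lambda_{i_j}/\mu_{i_j}$ weights, and normalizes as in Equation (6.28):

```python
from itertools import combinations
from math import factorial

def shared_repair_probs(lam, mu):
    """Limiting probabilities for Example 6.18: the repair facility splits
    its effort among the k down machines, and the (unordered) set
    {i1,...,ik} of down machines has probability
    k! * prod(lam[i]/mu[i]) * P(phi), with P(phi) from Equation (6.28)."""
    n = len(lam)
    weights = {frozenset(): 1.0}   # the all-up state phi has weight 1
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            w = float(factorial(k))
            for i in subset:
                w *= lam[i] / mu[i]
            weights[frozenset(subset)] = w
    total = sum(weights.values())  # = 1 + the sum appearing in (6.28)
    return {s: w / total for s, w in weights.items()}

# two-machine case, matching the closed forms in the text
P = shared_repair_probs([1.0, 2.0], [3.0, 5.0])
denom = 1 + 1/3 + 2/5 + 2 * (1/3) * (2/5)
assert abs(P[frozenset()] - 1 / denom) < 1e-12
assert abs(P[frozenset({0, 1})] - 2 * (1/3) * (2/5) / denom) < 1e-12
```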


For instance, suppose there are two machines. Then, from the preceding, we would have

$$P(\phi) = \frac{1}{1 + \lambda_1/\mu_1 + \lambda_2/\mu_2 + 2\lambda_1\lambda_2/\mu_1\mu_2},$$
$$P(1) = \frac{\lambda_1/\mu_1}{1 + \lambda_1/\mu_1 + \lambda_2/\mu_2 + 2\lambda_1\lambda_2/\mu_1\mu_2},$$
$$P(2) = \frac{\lambda_2/\mu_2}{1 + \lambda_1/\mu_1 + \lambda_2/\mu_2 + 2\lambda_1\lambda_2/\mu_1\mu_2},$$
$$P(1,2) = \frac{2\lambda_1\lambda_2}{\mu_1\mu_2\,[1 + \lambda_1/\mu_1 + \lambda_2/\mu_2 + 2\lambda_1\lambda_2/\mu_1\mu_2]}$$

Consider a continuous-time Markov chain whose state space is $S$. We say that the Markov chain is truncated to the set $A \subset S$ if $q_{ij}$ is changed to 0 for all $i \in A$, $j \notin A$. That is, transitions out of the class $A$ are no longer allowed, whereas those within $A$ continue at the same rates as before. A useful result is that if the chain is time reversible, then so is the truncated one.

Proposition 6.8 A time reversible chain with limiting probabilities $P_j$, $j \in S$, that is truncated to the set $A \subset S$ and remains irreducible is also time reversible and has limiting probabilities $P_j^A$ given by

$$P_j^A = \frac{P_j}{\sum_{i \in A} P_i}, \quad j \in A$$

Proof By Proposition 6.7 we need to show that, with $P_j^A$ as given,

$$P_i^A q_{ij} = P_j^A q_{ji} \quad \text{for } i \in A,\ j \in A$$

or, equivalently,

$$P_i q_{ij} = P_j q_{ji} \quad \text{for } i \in A,\ j \in A$$

But this follows since the original chain is, by assumption, time reversible.

Example 6.19 Consider an M/M/1 queue in which arrivals finding $N$ in the system do not enter. This finite capacity system can be regarded as a truncation of the M/M/1 queue to the set of states $A = \{0, 1, \ldots, N\}$. Since the number in the system in the M/M/1 queue is time reversible and has limiting probabilities $P_j = (\lambda/\mu)^j (1 - \lambda/\mu)$, it follows from Proposition 6.8 that the finite capacity model is also time reversible and has limiting probabilities given by

$$P_j = \frac{(\lambda/\mu)^j}{\sum_{i=0}^{N} (\lambda/\mu)^i}, \quad j = 0, 1, \ldots, N$$
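The truncated probabilities of Example 6.19 can be computed in a few lines. A minimal Python sketch (function name and parameter values are ours), which also confirms that detailed balance survives the truncation:

```python
def mm1_finite_capacity_probs(lam, mu, N):
    """Limiting probabilities of the M/M/1 queue with capacity N
    (Example 6.19), obtained by truncating the time reversible M/M/1
    queue to {0, 1, ..., N}: P_j is proportional to (lam/mu)**j."""
    rho = lam / mu
    w = [rho ** j for j in range(N + 1)]
    total = sum(w)
    return [x / total for x in w]

P = mm1_finite_capacity_probs(2.0, 3.0, 4)
assert abs(sum(P) - 1.0) < 1e-12
# detailed balance still holds: lam * P_j = mu * P_{j+1}
for j in range(4):
    assert abs(2.0 * P[j] - 3.0 * P[j + 1]) < 1e-12
```

Note that, unlike the infinite-capacity M/M/1 queue, this construction does not require $\lambda < \mu$; the geometric weights are normalized over a finite set either way.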


Another useful result is given by the following proposition, whose proof is left as an exercise.

Proposition 6.9 If $\{X_i(t), t \geq 0\}$ are, for $i = 1,\ldots,n$, independent time reversible continuous-time Markov chains, then the vector process $\{(X_1(t), \ldots, X_n(t)), t \geq 0\}$ is also a time reversible continuous-time Markov chain.

Example 6.20 Consider an $n$-component system where component $i$, $i = 1,\ldots,n$, functions for an exponential time with rate $\lambda_i$ and then fails; upon failure, repair begins on component $i$, with the repair taking an exponentially distributed time with rate $\mu_i$. Once repaired, a component is as good as new. The components act independently, except that when there is only one working component the system is temporarily shut down until a repair has been completed; it then starts up again with two working components.

(a) What proportion of time is the system shut down?
(b) What is the (limiting) average number of components that are being repaired?

Solution: Consider first the system without the restriction that it is shut down when a single component is working. Letting $X_i(t)$, $i = 1,\ldots,n$, equal 1 if component $i$ is working at time $t$ and 0 if it is failed, then $\{X_i(t), t \geq 0\}$, $i = 1,\ldots,n$, are independent birth and death processes. Because a birth and death process is time reversible, it follows from Proposition 6.9 that the process $\{(X_1(t), \ldots, X_n(t)), t \geq 0\}$ is also time reversible. Now, with

$$P_i(j) = \lim_{t\to\infty} P\{X_i(t) = j\}, \quad j = 0, 1$$

we have

$$P_i(1) = \frac{\mu_i}{\mu_i + \lambda_i}, \qquad P_i(0) = \frac{\lambda_i}{\mu_i + \lambda_i}$$

Also, with

$$P(j_1,\ldots,j_n) = \lim_{t\to\infty} P\{X_i(t) = j_i,\ i = 1,\ldots,n\}$$

it follows, by independence, that

$$P(j_1,\ldots,j_n) = \prod_{i=1}^{n} P_i(j_i), \quad j_i = 0, 1,\ i = 1,\ldots,n$$

Now, note that shutting down the system when only one component is working is equivalent to truncating the preceding unconstrained system to the set


consisting of all states except the one having all components down. Therefore, with $P_T$ denoting a probability for the truncated system, we have from Proposition 6.8 that

$$P_T(j_1,\ldots,j_n) = \frac{P(j_1,\ldots,j_n)}{1 - C}, \quad \sum_{i=1}^{n} j_i > 0$$

where

$$C = P(0,\ldots,0) = \prod_{j=1}^{n} \lambda_j/(\mu_j + \lambda_j)$$

Hence, letting $(0, 1_i) = (0,\ldots,0,1,0,\ldots,0)$ be the $n$-vector of zeroes and ones whose single 1 is in the $i$th place, we have

$$P_T(\text{system is shut down}) = \sum_{i=1}^{n} P_T(0, 1_i) = \frac{1}{1-C} \sum_{i=1}^{n} \Bigl(\frac{\mu_i}{\mu_i + \lambda_i}\Bigr) \prod_{j \neq i} \Bigl(\frac{\lambda_j}{\mu_j + \lambda_j}\Bigr) = \frac{C \sum_{i=1}^{n} \mu_i/\lambda_i}{1-C}$$

Let $R$ denote the number of components being repaired. Then, with $I_i$ equal to 1 if component $i$ is being repaired and 0 otherwise, we have for the unconstrained (nontruncated) system that

$$E[R] = E\Bigl[\sum_{i=1}^{n} I_i\Bigr] = \sum_{i=1}^{n} P_i(0) = \sum_{i=1}^{n} \lambda_i/(\mu_i + \lambda_i)$$

But, in addition,

$$E[R] = E[R \mid \text{all components are in repair}]\,C + E[R \mid \text{not all components are in repair}]\,(1-C) = nC + E_T[R](1-C)$$

implying that

$$E_T[R] = \frac{\sum_{i=1}^{n} \lambda_i/(\mu_i + \lambda_i) - nC}{1 - C}$$
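The two closed forms of Example 6.20 can be cross-checked against a brute-force enumeration of the truncated chain's states. A Python sketch (function names and rates are ours, invented for the check):

```python
from itertools import product

def shutdown_stats(lam, mu):
    """Proportion of time the truncated system of Example 6.20 is shut
    down, and the limiting mean number in repair E_T[R], using the
    closed-form expressions derived in the text."""
    n = len(lam)
    C = 1.0
    for j in range(n):
        C *= lam[j] / (mu[j] + lam[j])   # C = P(all components down)
    shut = C * sum(mu[i] / lam[i] for i in range(n)) / (1 - C)
    ER = sum(lam[i] / (mu[i] + lam[i]) for i in range(n))  # unconstrained E[R]
    return shut, (ER - n * C) / (1 - C)

def brute_force_stats(lam, mu):
    """Same quantities by enumerating all 2^n - 1 allowed states of the
    truncated chain (a state entry of 1 means the component is working)."""
    n = len(lam)
    weights = {}
    for state in product([0, 1], repeat=n):
        if sum(state) == 0:
            continue                      # the all-down state is truncated away
        w = 1.0
        for i, up in enumerate(state):
            w *= (mu[i] if up else lam[i]) / (mu[i] + lam[i])
        weights[state] = w
    total = sum(weights.values())         # equals 1 - C
    shut = sum(w for s, w in weights.items() if sum(s) == 1) / total
    mean_repair = sum(w * (n - sum(s)) for s, w in weights.items()) / total
    return shut, mean_repair

lam, mu = [1.0, 2.0, 0.5], [3.0, 1.0, 2.0]
a, b = shutdown_stats(lam, mu), brute_force_stats(lam, mu)
assert abs(a[0] - b[0]) < 1e-10 and abs(a[1] - b[1]) < 1e-10
```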


6.7. Uniformization

Consider a continuous-time Markov chain in which the mean time spent in a state is the same for all states. That is, suppose that $v_i = v$ for all states $i$. In this case, since the amount of time spent in each state during a visit is exponentially distributed with rate $v$, it follows that if we let $N(t)$ denote the number of state transitions by time $t$, then $\{N(t), t \geq 0\}$ will be a Poisson process with rate $v$.

To compute the transition probabilities $P_{ij}(t)$, we can condition on $N(t)$:

$$P_{ij}(t) = P\{X(t) = j \mid X(0) = i\} = \sum_{n=0}^{\infty} P\{X(t) = j \mid X(0) = i,\, N(t) = n\}\, P\{N(t) = n \mid X(0) = i\} = \sum_{n=0}^{\infty} P\{X(t) = j \mid X(0) = i,\, N(t) = n\}\, e^{-vt}\frac{(vt)^n}{n!}$$

Now, the fact that there have been $n$ transitions by time $t$ tells us something about the amount of time spent in each of the first $n$ states visited; but since the distribution of time spent in each state is the same for all states, it follows that knowing that $N(t) = n$ gives us no information about which states were visited. Hence,

$$P\{X(t) = j \mid X(0) = i,\, N(t) = n\} = P_{ij}^n$$

where $P_{ij}^n$ is just the $n$-stage transition probability associated with the discrete-time Markov chain with transition probabilities $P_{ij}$; and so, when $v_i \equiv v$,

$$P_{ij}(t) = \sum_{n=0}^{\infty} P_{ij}^n\, e^{-vt}\frac{(vt)^n}{n!} \qquad (6.29)$$

Equation (6.29) is often useful from a computational point of view, since it enables us to approximate $P_{ij}(t)$ by taking a partial sum and then computing (by matrix multiplication of the transition probability matrix) the relevant $n$-stage probabilities $P_{ij}^n$.

Whereas the applicability of Equation (6.29) would appear to be quite limited, since it supposes that $v_i \equiv v$, it turns out that most Markov chains can be put in that form by the trick of allowing fictitious transitions from a state to itself.
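A partial sum of Equation (6.29) is easy to evaluate by repeated matrix multiplication. Below is a Python sketch for the constant-rate case (the function name and the two-state example are ours). For the test chain, the two states simply alternate ($P_{01} = P_{10} = 1$), so $P^n_{00}$ is 1 for even $n$ and 0 for odd $n$, and the series sums to $e^{-vt}\cosh(vt) = (1 + e^{-2vt})/2$:

```python
import numpy as np
from math import exp

def transition_prob_constant_rate(P, v, t, terms=60):
    """Partial sum of Equation (6.29) for a chain whose rates all equal v:
    P_ij(t) = sum_n P^n_ij * e^{-vt} (vt)^n / n!."""
    n = P.shape[0]
    out = np.zeros((n, n))
    power = np.eye(n)          # P^0
    coeff = exp(-v * t)        # e^{-vt} (vt)^0 / 0!
    for k in range(terms):
        out += coeff * power
        power = power @ P      # advance to P^{k+1}
        coeff *= v * t / (k + 1)
    return out

P = np.array([[0.0, 1.0], [1.0, 0.0]])   # alternating two-state chain
v, t = 3.0, 0.4
Pt = transition_prob_constant_rate(P, v, t)
assert abs(Pt[0, 0] - (1 + np.exp(-2 * v * t)) / 2) < 1e-8
assert abs(Pt[0, 0] + Pt[0, 1] - 1.0) < 1e-8
```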
To see how this works, consider any Markov chain for which the $v_i$ are bounded, and let $v$ be any number such that

$$v_i \leq v \quad \text{for all } i \qquad (6.30)$$

Now, when in state $i$, the process actually leaves at rate $v_i$; but this is equivalent to supposing that transitions occur at rate $v$, but that only the fraction $v_i/v$ of transitions


are real ones (and thus real transitions occur at rate $v_i$), while the remaining fraction $1 - v_i/v$ are fictitious transitions that leave the process in state $i$. In other words, any Markov chain satisfying condition (6.30) can be thought of as a process that spends an exponential amount of time with rate $v$ in state $i$ and then makes a transition to $j$ with probability $P_{ij}^*$, where

$$P_{ij}^* = \begin{cases} 1 - \dfrac{v_i}{v}, & j = i \\[4pt] \dfrac{v_i}{v}\, P_{ij}, & j \neq i \end{cases} \qquad (6.31)$$

Hence, from Equation (6.29) we have that the transition probabilities can be computed by

$$P_{ij}(t) = \sum_{n=0}^{\infty} P_{ij}^{*n}\, e^{-vt}\frac{(vt)^n}{n!}$$

where the $P_{ij}^{*n}$ are the $n$-stage transition probabilities corresponding to Equation (6.31). This technique of uniformizing the rate at which a transition occurs from each state, by introducing transitions from a state to itself, is known as uniformization.

Example 6.21 Let us reconsider Example 6.11, which models the workings of a machine, either on or off, as a two-state continuous-time Markov chain with

$$P_{01} = P_{10} = 1, \qquad v_0 = \lambda, \quad v_1 = \mu$$

Letting $v = \lambda + \mu$, the uniformized version of the preceding is to consider it a continuous-time Markov chain with

$$P_{00} = \frac{\mu}{\lambda + \mu} = 1 - P_{01},$$
$$P_{10} = \frac{\mu}{\lambda + \mu} = 1 - P_{11},$$
$$v_i = \lambda + \mu, \quad i = 0, 1$$

As $P_{00} = P_{10}$, it follows that the probability of a transition into state 0 is equal to $\mu/(\lambda + \mu)$ no matter what the present state. Because a similar result is true for


state 1, it follows that the $n$-stage transition probabilities are given by

$$P_{i0}^n = \frac{\mu}{\lambda + \mu}, \quad n \geq 1,\ i = 0, 1,$$
$$P_{i1}^n = \frac{\lambda}{\lambda + \mu}, \quad n \geq 1,\ i = 0, 1$$

Hence,

$$P_{00}(t) = \sum_{n=0}^{\infty} P_{00}^n\, e^{-(\lambda+\mu)t}\frac{[(\lambda+\mu)t]^n}{n!} = e^{-(\lambda+\mu)t} + \sum_{n=1}^{\infty} \Bigl(\frac{\mu}{\lambda+\mu}\Bigr) e^{-(\lambda+\mu)t}\frac{[(\lambda+\mu)t]^n}{n!} = e^{-(\lambda+\mu)t} + \bigl[1 - e^{-(\lambda+\mu)t}\bigr]\frac{\mu}{\lambda+\mu} = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}\, e^{-(\lambda+\mu)t}$$

Similarly,

$$P_{11}(t) = \sum_{n=0}^{\infty} P_{11}^n\, e^{-(\lambda+\mu)t}\frac{[(\lambda+\mu)t]^n}{n!} = e^{-(\lambda+\mu)t} + \bigl[1 - e^{-(\lambda+\mu)t}\bigr]\frac{\lambda}{\lambda+\mu} = \frac{\lambda}{\lambda+\mu} + \frac{\mu}{\lambda+\mu}\, e^{-(\lambda+\mu)t}$$

The remaining probabilities are

$$P_{01}(t) = 1 - P_{00}(t) = \frac{\lambda}{\lambda+\mu}\bigl[1 - e^{-(\lambda+\mu)t}\bigr],$$
$$P_{10}(t) = 1 - P_{11}(t) = \frac{\mu}{\lambda+\mu}\bigl[1 - e^{-(\lambda+\mu)t}\bigr]$$

Example 6.22 Consider the two-state chain of Example 6.21, and suppose that the initial state is state 0. Let $O(t)$ denote the total amount of time that the process is in state 0 during the interval $(0, t)$. The random variable $O(t)$ is often called the occupation time. We will now compute its mean.
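The general uniformization recipe, build $P^*$ as in Equation (6.31) and take a partial sum of Equation (6.29), can be checked against the closed form just derived for the two-state chain. A Python sketch (the function name and the rates $\lambda = 2$, $\mu = 3$ are ours):

```python
import numpy as np
from math import exp

def uniformization(q, t, terms=60):
    """Approximate P(t) for a continuous-time Markov chain given its
    off-diagonal rate matrix q (diagonal entries 0): choose v = max v_i,
    build P* per Equation (6.31), and partially sum Equation (6.29)."""
    q = np.asarray(q, dtype=float)
    v_i = q.sum(axis=1)                   # v_i = sum_{j != i} q_ij
    v = v_i.max()
    n = q.shape[0]
    Pstar = q / v + np.diag(1.0 - v_i / v)
    out = np.zeros((n, n))
    power = np.eye(n)                     # P*^0
    coeff = exp(-v * t)
    for k in range(terms):
        out += coeff * power
        power = power @ Pstar
        coeff *= v * t / (k + 1)
    return out

# two-state machine of Example 6.21, where the closed form is known
lam, mu, t = 2.0, 3.0, 0.7
P = uniformization([[0.0, lam], [mu, 0.0]], t)
e = np.exp(-(lam + mu) * t)
assert abs(P[0, 0] - (mu/(lam+mu) + lam/(lam+mu) * e)) < 1e-8
assert abs(P[1, 1] - (lam/(lam+mu) + mu/(lam+mu) * e)) < 1e-8
```

Any $v \geq \max_i v_i$ works in Equation (6.31); taking the maximum keeps the mean number of (real plus fictitious) transitions, and hence the number of series terms needed, as small as possible.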


If we let

$$I(s) = \begin{cases} 1, & \text{if } X(s) = 0 \\ 0, & \text{if } X(s) = 1 \end{cases}$$

then we can represent the occupation time by

$$O(t) = \int_0^t I(s)\, ds$$

Taking expectations, and using the fact that we can take the expectation inside the integral sign (since an integral is basically a sum), we obtain

$$E[O(t)] = \int_0^t E[I(s)]\, ds = \int_0^t P\{X(s) = 0\}\, ds = \int_0^t P_{00}(s)\, ds = \frac{\mu}{\lambda+\mu}\, t + \frac{\lambda}{(\lambda+\mu)^2}\bigl\{1 - e^{-(\lambda+\mu)t}\bigr\}$$

where the final equality follows by integrating

$$P_{00}(s) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}\, e^{-(\lambda+\mu)s}$$

(For another derivation of $E[O(t)]$, see Exercise 38.)

6.8. Computing the Transition Probabilities

For any pair of states $i$ and $j$, let

$$r_{ij} = \begin{cases} q_{ij}, & \text{if } i \neq j \\ -v_i, & \text{if } i = j \end{cases}$$

Using this notation, we can rewrite the Kolmogorov backward equations

$$P'_{ij}(t) = \sum_{k \neq i} q_{ik} P_{kj}(t) - v_i P_{ij}(t)$$


and the forward equations

$$P'_{ij}(t) = \sum_{k \neq j} q_{kj} P_{ik}(t) - v_j P_{ij}(t)$$

as follows:

$$P'_{ij}(t) = \sum_k r_{ik} P_{kj}(t) \quad \text{(backward)}$$
$$P'_{ij}(t) = \sum_k r_{kj} P_{ik}(t) \quad \text{(forward)}$$

This representation is especially revealing when we introduce matrix notation. Define the matrices $\mathbf{R}$, $\mathbf{P}(t)$, and $\mathbf{P}'(t)$ by letting the element in row $i$, column $j$ of these matrices be, respectively, $r_{ij}$, $P_{ij}(t)$, and $P'_{ij}(t)$. Since the backward equations say that the element in row $i$, column $j$ of the matrix $\mathbf{P}'(t)$ can be obtained by multiplying the $i$th row of the matrix $\mathbf{R}$ by the $j$th column of the matrix $\mathbf{P}(t)$, they are equivalent to the matrix equation

$$\mathbf{P}'(t) = \mathbf{R}\,\mathbf{P}(t) \qquad (6.32)$$

Similarly, the forward equations can be written as

$$\mathbf{P}'(t) = \mathbf{P}(t)\,\mathbf{R} \qquad (6.33)$$

Now, just as the solution of the scalar differential equation

$$f'(t) = cf(t)$$

[or, equivalently, $f'(t) = f(t)c$] is

$$f(t) = f(0)e^{ct}$$

it can be shown that the solution of the matrix differential equations (6.32) and (6.33) is given by

$$\mathbf{P}(t) = \mathbf{P}(0)\,e^{\mathbf{R}t}$$

Since $\mathbf{P}(0) = \mathbf{I}$ (the identity matrix), this yields

$$\mathbf{P}(t) = e^{\mathbf{R}t} \qquad (6.34)$$

where the matrix $e^{\mathbf{R}t}$ is defined by

$$e^{\mathbf{R}t} = \sum_{n=0}^{\infty} \mathbf{R}^n \frac{t^n}{n!} \qquad (6.35)$$

with $\mathbf{R}^n$ being the (matrix) multiplication of $\mathbf{R}$ by itself $n$ times.
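For a small chain, Equation (6.35) can be evaluated directly by accumulating the series term by term. A Python sketch (function name and rates are ours; the two-state chain has a known closed form to compare against):

```python
import numpy as np

def P_t_series(R, t, terms=40):
    """Evaluate P(t) = e^{Rt} by summing the series of Equation (6.35)
    to a fixed number of terms.  Each new term is the previous one times
    R t / n, so no explicit factorials or matrix powers are needed."""
    d = R.shape[0]
    out = np.eye(d)
    term = np.eye(d)
    for n in range(1, terms):
        term = term @ R * (t / n)   # R^n t^n / n!
        out += term
    return out

# two-state chain: off-diagonals are the q_ij, diagonal is -v_i
lam, mu, t = 2.0, 3.0, 0.5
R = np.array([[-lam, lam], [mu, -mu]])
P = P_t_series(R, t)
assert np.allclose(P.sum(axis=1), 1.0)   # each row of P(t) is a distribution
e = np.exp(-(lam + mu) * t)
assert abs(P[0, 0] - (mu/(lam+mu) + lam/(lam+mu) * e)) < 1e-8
```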


The direct use of Equation (6.35) to compute $\mathbf{P}(t)$ turns out to be very inefficient, for two reasons. First, since the matrix $\mathbf{R}$ contains both positive and negative elements (remember, the off-diagonal elements are the $q_{ij}$ while the $i$th diagonal element is $-v_i$), there is the problem of computer round-off error when we compute the powers of $\mathbf{R}$. Second, we often have to compute many of the terms in the infinite sum (6.35) to arrive at a good approximation. However, there are certain indirect ways that we can utilize the relation (6.34) to efficiently approximate the matrix $\mathbf{P}(t)$. We now present two of these methods.

Approximation Method 1 Rather than using (6.35) to compute $e^{\mathbf{R}t}$, we can use the matrix equivalent of the identity

$$e^x = \lim_{n\to\infty} \Bigl(1 + \frac{x}{n}\Bigr)^n$$

which states that

$$e^{\mathbf{R}t} = \lim_{n\to\infty} \Bigl(\mathbf{I} + \mathbf{R}\frac{t}{n}\Bigr)^n$$

Thus, if we let $n$ be a power of 2, say $n = 2^k$, then we can approximate $\mathbf{P}(t)$ by raising the matrix $\mathbf{M} = \mathbf{I} + \mathbf{R}t/n$ to the $n$th power, which can be accomplished by $k$ matrix multiplications (by first multiplying $\mathbf{M}$ by itself to obtain $\mathbf{M}^2$, then multiplying that by itself to obtain $\mathbf{M}^4$, and so on). In addition, since only the diagonal elements of $\mathbf{R}$ are negative (and the diagonal elements of the identity matrix $\mathbf{I}$ are equal to 1), by choosing $n$ large enough we can guarantee that the matrix $\mathbf{I} + \mathbf{R}t/n$ has all nonnegative elements.

Approximation Method 2 A second approach to approximating $e^{\mathbf{R}t}$ uses the identity

$$e^{-\mathbf{R}t} = \lim_{n\to\infty} \Bigl(\mathbf{I} - \mathbf{R}\frac{t}{n}\Bigr)^n \approx \Bigl(\mathbf{I} - \mathbf{R}\frac{t}{n}\Bigr)^n \quad \text{for } n \text{ large}$$

and thus

$$\mathbf{P}(t) = e^{\mathbf{R}t} \approx \Bigl(\mathbf{I} - \mathbf{R}\frac{t}{n}\Bigr)^{-n} = \Bigl[\Bigl(\mathbf{I} - \mathbf{R}\frac{t}{n}\Bigr)^{-1}\Bigr]^n$$

Hence, if we again choose $n$ to be a large power of 2, say $n = 2^k$, we can approximate $\mathbf{P}(t)$ by first computing the inverse of the matrix $\mathbf{I} - \mathbf{R}t/n$ and then


raising that matrix to the $n$th power (by utilizing $k$ matrix multiplications). It can be shown that the matrix $(\mathbf{I} - \mathbf{R}t/n)^{-1}$ will have only nonnegative elements.

Remark Both of the preceding computational approaches for approximating $\mathbf{P}(t)$ have probabilistic interpretations (see Exercises 41 and 42).

Exercises

1. A population of organisms consists of both male and female members. In a small colony, any particular male is likely to mate with any particular female in any time interval of length $h$ with probability $\lambda h + o(h)$. Each mating immediately produces one offspring, equally likely to be male or female. Let $N_1(t)$ and $N_2(t)$ denote the number of males and females in the population at $t$. Derive the parameters of the continuous-time Markov chain $\{N_1(t), N_2(t)\}$, i.e., the $v_i$, $P_{ij}$ of Section 6.2.

*2. Suppose that a one-celled organism can be in one of two states, either $A$ or $B$. An individual in state $A$ will change to state $B$ at an exponential rate $\alpha$; an individual in state $B$ divides into two new individuals of type $A$ at an exponential rate $\beta$. Define an appropriate continuous-time Markov chain for a population of such organisms and determine the appropriate parameters for this model.

3. Consider two machines that are maintained by a single repairman. Machine $i$ functions for an exponential time with rate $\mu_i$ before breaking down, $i = 1, 2$. The repair times (for either machine) are exponential with rate $\mu$. Can we analyze this as a birth and death process? If so, what are the parameters? If not, how can we analyze it?

*4. Potential customers arrive at a single-server station in accordance with a Poisson process with rate $\lambda$. However, if an arrival finds $n$ customers already in the station, then he will enter the system with probability $\alpha_n$. Assuming an exponential service rate $\mu$, set this up as a birth and death process and determine the birth and death rates.

5. There are $N$ individuals in a population, some of whom have a certain infection that spreads as follows.
Contacts between two members of this population occur in accordance with a Poisson process having rate $\lambda$. When a contact occurs, it is equally likely to involve any of the $\binom{N}{2}$ pairs of individuals in the population. If a contact involves an infected and a noninfected individual, then with probability $p$ the noninfected individual becomes infected. Once infected, an individual remains infected throughout. Let $X(t)$ denote the number of infected members of the population at time $t$.


(a) Is $\{X(t), t \geq 0\}$ a continuous-time Markov chain?
(b) Specify its type.
(c) Starting with a single infected individual, what is the expected time until all members are infected?

6. Consider a birth and death process with birth rates $\lambda_i = (i+1)\lambda$, $i \geq 0$, and death rates $\mu_i = i\mu$, $i \geq 0$.
(a) Determine the expected time to go from state 0 to state 4.
(b) Determine the expected time to go from state 2 to state 5.
(c) Determine the variances in parts (a) and (b).

*7. Individuals join a club in accordance with a Poisson process with rate $\lambda$. Each new member must pass through $k$ consecutive stages to become a full member of the club. The time it takes to pass through each stage is exponentially distributed with rate $\mu$. Let $N_i(t)$ denote the number of club members at time $t$ who have passed through exactly $i$ stages, $i = 1,\ldots,k-1$. Also, let $\mathbf{N}(t) = (N_1(t), N_2(t), \ldots, N_{k-1}(t))$.
(a) Is $\{\mathbf{N}(t), t \geq 0\}$ a continuous-time Markov chain?
(b) If so, give the infinitesimal transition rates. That is, for any state $\mathbf{n} = (n_1,\ldots,n_{k-1})$, give the possible next states along with their infinitesimal rates.

8. Consider two machines, both of which have an exponential lifetime with mean $1/\lambda$. There is a single repairman who can service machines at an exponential rate $\mu$. Set up the Kolmogorov backward equations; you need not solve them.

9. The birth and death process with parameters $\lambda_n = 0$ and $\mu_n = \mu$, $n > 0$, is called a pure death process. Find $P_{ij}(t)$.

10. Consider two machines. Machine $i$ operates for an exponential time with rate $\lambda_i$ and then fails; its repair time is exponential with rate $\mu_i$, $i = 1, 2$. The machines act independently of each other. Define a four-state continuous-time Markov chain that jointly describes the condition of the two machines. Use the assumed independence to compute the transition probabilities for this chain, and then verify that these transition probabilities satisfy the forward and backward equations.

*11.
Consider a Yule process starting with a single individual; that is, suppose $X(0) = 1$. Let $T_i$ denote the time it takes the process to go from a population of size $i$ to one of size $i + 1$.
(a) Argue that $T_i$, $i = 1,\ldots,j$, are independent exponentials with respective rates $i\lambda$.
(b) Let $X_1,\ldots,X_j$ denote independent exponential random variables, each having rate $\lambda$, and interpret $X_i$ as the lifetime of component $i$. Argue that


$\max(X_1,\ldots,X_j)$ can be expressed as

$$\max(X_1,\ldots,X_j) = \varepsilon_1 + \varepsilon_2 + \cdots + \varepsilon_j$$

where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_j$ are independent exponentials with respective rates $j\lambda, (j-1)\lambda, \ldots, \lambda$.

Hint: Interpret $\varepsilon_i$ as the time between the $(i-1)$st and the $i$th failure.

(c) Using (a) and (b), argue that

$$P\{T_1 + \cdots + T_j \leq t\} = (1 - e^{-\lambda t})^j$$

(d) Use (c) to obtain

$$P_{1j}(t) = (1 - e^{-\lambda t})^{j-1} - (1 - e^{-\lambda t})^j = e^{-\lambda t}(1 - e^{-\lambda t})^{j-1}$$

and hence, given $X(0) = 1$, $X(t)$ has a geometric distribution with parameter $p = e^{-\lambda t}$.

(e) Now conclude that

$$P_{ij}(t) = \binom{j-1}{i-1} e^{-i\lambda t}(1 - e^{-\lambda t})^{j-i}$$

12. Each individual in a biological population is assumed to give birth at an exponential rate $\lambda$ and to die at an exponential rate $\mu$. In addition, there is an exponential rate of increase $\theta$ due to immigration. However, immigration is not allowed when the population size is $N$ or larger.
(a) Set this up as a birth and death model.
(b) If $N = 3$, $\theta = \lambda = 1$, $\mu = 2$, determine the proportion of time that immigration is restricted.

13. A small barbershop, operated by a single barber, has room for at most two customers. Potential customers arrive at a Poisson rate of three per hour, and the successive service times are independent exponential random variables with mean $\frac{1}{4}$ hour. What is
(a) the average number of customers in the shop?
(b) the proportion of potential customers that enter the shop?
(c) If the barber could work twice as fast, how much more business would he do?

14. Potential customers arrive at a full-service, one-pump gas station at a Poisson rate of 20 cars per hour. However, customers will only enter the station for gas if there are no more than two cars (including the one currently being attended to) at the pump. Suppose the amount of time required to service a car is exponentially distributed with a mean of five minutes.


(a) What fraction of the attendant's time will be spent servicing cars?
(b) What fraction of potential customers are lost?

15. A service center consists of two servers, each working at an exponential rate of two services per hour. If customers arrive at a Poisson rate of three per hour, then, assuming a system capacity of at most three customers,
(a) what fraction of potential customers enter the system?
(b) what would the value of part (a) be if there were only a single server, and his rate were twice as fast (that is, $\mu = 4$)?

*16. The following problem arises in molecular biology. The surface of a bacterium consists of several sites at which foreign molecules, some acceptable and some not, become attached. We consider a particular site and assume that molecules arrive at the site according to a Poisson process with parameter $\lambda$. Among these molecules a proportion $\alpha$ is acceptable. Unacceptable molecules stay at the site for a length of time that is exponentially distributed with parameter $\mu_1$, whereas an acceptable molecule remains at the site for an exponential time with rate $\mu_2$. An arriving molecule will become attached only if the site is free of other molecules. What percentage of time is the site occupied with an acceptable (unacceptable) molecule?

17. Each time a machine is repaired, it remains up for an exponentially distributed time with rate $\lambda$. It then fails, and its failure is one of two types. If it is a type 1 failure, then the time to repair the machine is exponential with rate $\mu_1$; if it is a type 2 failure, then the repair time is exponential with rate $\mu_2$. Each failure is, independently of the time it took the machine to fail, a type 1 failure with probability $p$ and a type 2 failure with probability $1 - p$. What proportion of time is the machine down due to a type 1 failure? What proportion of time is it down due to a type 2 failure? What proportion of time is it up?

18.
After being repaired, a machine functions for an exponential time with rate $\lambda$ and then fails. Upon failure, a repair process begins. The repair process proceeds sequentially through $k$ distinct phases. First a phase 1 repair must be performed, then a phase 2, and so on. The times to complete these phases are independent, with phase $i$ taking an exponential time with rate $\mu_i$, $i = 1,\ldots,k$.
(a) What proportion of time is the machine undergoing a phase $i$ repair?
(b) What proportion of time is the machine working?

*19. A single repairperson looks after both machines 1 and 2. Each time it is repaired, machine $i$ stays up for an exponential time with rate $\lambda_i$, $i = 1, 2$. When machine $i$ fails, it requires an exponentially distributed amount of work with rate $\mu_i$ to complete its repair. The repairperson will always service machine 1 when it is down. For instance, if machine 1 fails while 2 is being repaired, then the repairperson will immediately stop work on machine 2 and start on 1. What proportion of time is machine 2 down?


20. There are two machines, one of which is used as a spare. A working machine will function for an exponential time with rate $\lambda$ and will then fail. Upon failure, it is immediately replaced by the other machine if that one is in working order, and it goes to the repair facility. The repair facility consists of a single person who takes an exponential time with rate $\mu$ to repair a failed machine. At the repair facility, the newly failed machine enters service if the repairperson is free. If the repairperson is busy, it waits until the other machine is fixed; at that time, the newly repaired machine is put in service and repair begins on the other one. Starting with both machines in working condition, find
(a) the expected value and
(b) the variance
of the time until both are in the repair facility.
(c) In the long run, what proportion of time is there a working machine?

21. Suppose that when both machines are down in Exercise 20, a second repairperson is called in to work on the newly failed one. Suppose all repair times remain exponential with rate $\mu$. Now find the proportion of time at least one machine is working, and compare your answer with the one obtained in Exercise 20.

22. Customers arrive at a single-server queue in accordance with a Poisson process having rate $\lambda$. However, an arrival that finds $n$ customers already in the system will only join the system with probability $1/(n+1)$. That is, with probability $n/(n+1)$ such an arrival will not join the system. Show that the limiting distribution of the number of customers in the system is Poisson with mean $\lambda/\mu$.

23. A job shop consists of three machines and two repairmen. The amount of time a machine works before breaking down is exponentially distributed with mean 10. If the amount of time it takes a single repairman to fix a machine is exponentially distributed with mean 8, then
(a) what is the average number of machines not in use?
(b) what proportion of time are both repairmen busy?

*24.
Consider a taxi station where taxis and customers arrive in accordance with Poisson processes with respective rates of one and two per minute. A taxi will wait no matter how many other taxis are present. However, an arriving customer that does not find a taxi waiting leaves. Find
(a) the average number of taxis waiting, and
(b) the proportion of arriving customers that get taxis.

25. Customers arrive at a service station, manned by a single server who serves at an exponential rate $\mu_1$, at a Poisson rate $\lambda$. After completion of service the customer then joins a second system where the server serves at an exponential


rate $\mu_2$. Such a system is called a tandem or sequential queueing system. Assuming that $\lambda$ [...]

[...] $s\mu$?

*28. If $\{X(t)\}$ and $\{Y(t)\}$ are independent continuous-time Markov chains, both of which are time reversible, show that the process $\{X(t), Y(t)\}$ is also a time reversible Markov chain.

29. Consider a set of $n$ machines and a single repair facility to service these machines. Suppose that when machine $i$, $i = 1,\ldots,n$, fails, it requires an exponentially distributed amount of work with rate $\mu_i$ to repair it. The repair facility divides its efforts equally among all failed machines, in the sense that whenever there are $k$ failed machines each one receives work at a rate of $1/k$ per unit time. If there are a total of $r$ working machines, including machine $i$, then $i$ fails at an instantaneous rate $\lambda_i/r$.
(a) Define an appropriate state space so as to be able to analyze the preceding system as a continuous-time Markov chain.
(b) Give the instantaneous transition rates (that is, give the $q_{ij}$).
(c) Write the time reversibility equations.
(d) Find the limiting probabilities and show that the process is time reversible.

30. Consider a graph with nodes $1, 2, \ldots, n$ and the $\binom{n}{2}$ arcs $(i, j)$, $i \neq j$, $i, j = 1,\ldots,n$. (See Section 3.6.2 for appropriate definitions.) Suppose that a particle moves along this graph as follows: Events occur along the arcs $(i, j)$ according to independent Poisson processes with rates $\lambda_{ij}$. An event along arc $(i, j)$ causes that arc to become excited. If the particle is at node $i$ at the moment that $(i, j)$ becomes excited, it instantaneously moves to node $j$, $i, j = 1,\ldots,n$. Let $P_j$ denote the proportion of time that the particle is at node $j$. Show that

$$P_j = \frac{1}{n}$$

Hint: Use time reversibility.


31. A total of N customers move about among r servers in the following manner. When a customer is served by server i, he then goes over to server j, j ≠ i, with probability 1/(r − 1). If the server he goes to is free, then the customer enters service; otherwise he joins the queue. The service times are all independent, with the service times at server i being exponential with rate μ, i = 1, ..., r. Let the state at any time be the vector (n_1, ..., n_r), where n_i is the number of customers presently at server i, i = 1, ..., r, Σ_i n_i = N.
(a) Argue that if X(t) is the state at time t, then {X(t), t ≥ 0} is a continuous-time Markov chain.
(b) Give the infinitesimal rates of this chain.
(c) Show that this chain is time reversible, and find the limiting probabilities.

32. Customers arrive at a two-server station in accordance with a Poisson process having rate λ. Upon arriving, they join a single queue. Whenever a server completes a service, the person first in line enters service. The service times of server i are exponential with rate μ_i, i = 1, 2, where μ_1 + μ_2 > λ. An arrival finding both servers free is equally likely to go to either one. Define an appropriate continuous-time Markov chain for this model, show it is time reversible, and find the limiting probabilities.

*33. Consider two M/M/1 queues with respective parameters λ_i, μ_i, i = 1, 2. Suppose they share a common waiting room that can hold at most three customers. That is, whenever an arrival finds her server busy and three customers in the waiting room, she goes away. Find the limiting probability that there will be n queue 1 customers and m queue 2 customers in the system.

Hint: Use the results of Exercise 28 together with the concept of truncation.

34. Four workers share an office that contains four telephones.
At any time, each worker is either "working" or "on the phone." Each "working" period of worker i lasts for an exponentially distributed time with rate λ_i, and each "on the phone" period lasts for an exponentially distributed time with rate μ_i, i = 1, 2, 3, 4.
(a) What proportion of time are all workers "working"?
Let X_i(t) equal 1 if worker i is working at time t, and let it be 0 otherwise. Let X(t) = (X_1(t), X_2(t), X_3(t), X_4(t)).
(b) Argue that {X(t), t ≥ 0} is a continuous-time Markov chain and give its infinitesimal rates.
(c) Is {X(t)} time reversible? Why or why not?
Suppose now that one of the phones has broken down. Suppose that a worker who is about to use a phone but finds them all being used begins a new "working" period.
(d) What proportion of time are all workers "working"?


35. Consider a time reversible continuous-time Markov chain having infinitesimal transition rates q_ij and limiting probabilities {P_i}. Let A denote a set of states for this chain, and consider a new continuous-time Markov chain with transition rates q*_ij given by

    q*_ij = c q_ij,   if i ∈ A, j ∉ A
    q*_ij = q_ij,     otherwise

where c is an arbitrary positive number. Show that this chain remains time reversible, and find its limiting probabilities.

36. Consider a system of n components such that the working times of component i, i = 1, ..., n, are exponentially distributed with rate λ_i. When failed, however, the repair rate of component i depends on how many other components are down. Specifically, suppose that the instantaneous repair rate of component i, i = 1, ..., n, when there are a total of k failed components, is α_k μ_i.
(a) Explain how we can analyze the preceding as a continuous-time Markov chain. Define the states and give the parameters of the chain.
(b) Show that, in steady state, the chain is time reversible and compute the limiting probabilities.

37. For the continuous-time Markov chain of Exercise 3 present a uniformized version.

38. In Example 6.20, we computed m(t) = E[O(t)], the expected occupation time in state 0 by time t for the two-state continuous-time Markov chain starting in state 0. Another way of obtaining this quantity is by deriving a differential equation for it.
(a) Show that

    m(t + h) = m(t) + P_00(t)h + o(h)

(b) Show that

    m'(t) = μ/(λ + μ) + (λ/(λ + μ)) e^{−(λ+μ)t}

(c) Solve for m(t).

39. Let O(t) be the occupation time for state 0 in the two-state continuous-time Markov chain. Find E[O(t)|X(0) = 1].

40. Consider the two-state continuous-time Markov chain. Starting in state 0, find Cov[X(s), X(t)].


41. Let Y denote an exponential random variable with rate λ that is independent of the continuous-time Markov chain {X(t)}, and let

    P̄_ij = P{X(Y) = j | X(0) = i}

(a) Show that

    P̄_ij = (1/(v_i + λ)) Σ_k q_ik P̄_kj + (λ/(v_i + λ)) δ_ij

where δ_ij is 1 when i = j and 0 when i ≠ j.
(b) Show that the solution of the preceding set of equations is given by

    P̄ = (I − R/λ)^{−1}

where P̄ is the matrix of elements P̄_ij, I is the identity matrix, and R the matrix specified in Section 6.8.
(c) Suppose now that Y_1, ..., Y_n are independent exponentials with rate λ that are independent of {X(t)}. Show that

    P{X(Y_1 + ··· + Y_n) = j | X(0) = i}

is equal to the element in row i, column j of the matrix P̄^n.
(d) Explain the relationship of the preceding to Approximation 2 of Section 6.8.

*42. (a) Show that Approximation 1 of Section 6.8 is equivalent to uniformizing the continuous-time Markov chain with a value v such that vt = n and then approximating P_ij(t) by P*^n_ij.
(b) Explain why the preceding should make a good approximation.

Hint: What is the standard deviation of a Poisson random variable with mean n?




Renewal Theory and Its Applications

7.1. Introduction

We have seen that a Poisson process is a counting process for which the times between successive events are independent and identically distributed exponential random variables. One possible generalization is to consider a counting process for which the times between successive events are independent and identically distributed with an arbitrary distribution. Such a counting process is called a renewal process.

Let {N(t), t ≥ 0} be a counting process and let X_n denote the time between the (n − 1)st and the nth event of this process, n ≥ 1.

Definition 7.1 If the sequence of nonnegative random variables {X_1, X_2, ...} is independent and identically distributed, then the counting process {N(t), t ≥ 0} is said to be a renewal process.

Thus, a renewal process is a counting process such that the time until the first event occurs has some distribution F, the time between the first and second event has, independently of the time of the first event, the same distribution F, and so on. When an event occurs, we say that a renewal has taken place.

For an example of a renewal process, suppose that we have an infinite supply of lightbulbs whose lifetimes are independent and identically distributed. Suppose also that we use a single lightbulb at a time, and when it fails we immediately replace it with a new one. Under these conditions, {N(t), t ≥ 0} is a renewal process when N(t) represents the number of lightbulbs that have failed by time t.

For a renewal process having interarrival times X_1, X_2, ..., let

    S_0 = 0,   S_n = Σ_{i=1}^n X_i,   n ≥ 1


Figure 7.1. Renewal and interarrival times.

That is, S_1 = X_1 is the time of the first renewal; S_2 = X_1 + X_2 is the time until the first renewal plus the time between the first and second renewal, that is, S_2 is the time of the second renewal. In general, S_n denotes the time of the nth renewal (see Figure 7.1).

We shall let F denote the interarrival distribution and, to avoid trivialities, we assume that F(0) = P{X_n = 0} < 1. Furthermore, we let

    μ = E[X_n],   n ≥ 1

be the mean time between successive renewals. It follows from the nonnegativity of X_n and the fact that X_n is not identically 0 that μ > 0.

The first question we shall attempt to answer is whether an infinite number of renewals can occur in a finite amount of time. That is, can N(t) be infinite for some (finite) value of t? To show that this cannot occur, we first note that, as S_n is the time of the nth renewal, N(t) may be written as

    N(t) = max{n: S_n ≤ t}   (7.1)

To understand why Equation (7.1) is valid, suppose, for instance, that S_4 ≤ t but S_5 > t. Hence, the fourth renewal had occurred by time t but the fifth renewal occurred after time t; or in other words, N(t), the number of renewals that occurred by time t, must equal 4. Now by the strong law of large numbers it follows that, with probability 1,

    S_n/n → μ   as n → ∞

But since μ > 0 this means that S_n must be going to infinity as n goes to infinity. Thus, S_n can be less than or equal to t for at most a finite number of values of n, and hence by Equation (7.1), N(t) must be finite.

However, though N(t)


7.2. Distribution of N(t)

Therefore, P{N(∞)


Thus, from Equation (7.3) we have that

    P{N(t) = n} = Σ_{k=n}^{[t]} C(k − 1, n − 1) p^n (1 − p)^{k−n} − Σ_{k=n+1}^{[t]} C(k − 1, n) p^{n+1} (1 − p)^{k−n−1}

Equivalently, since an event independently occurs with probability p at each of the times 1, 2, ...,

    P{N(t) = n} = C([t], n) p^n (1 − p)^{[t]−n}

By using Equation (7.2) we can calculate m(t), the mean value of N(t), as

    m(t) = E[N(t)]
         = Σ_{n=1}^∞ P{N(t) ≥ n}
         = Σ_{n=1}^∞ P{S_n ≤ t}
         = Σ_{n=1}^∞ F_n(t)

where we have used the fact that if X is nonnegative and integer valued, then

    E[X] = Σ_{k=1}^∞ k P{X = k} = Σ_{k=1}^∞ Σ_{n=1}^k P{X = k}
         = Σ_{n=1}^∞ Σ_{k=n}^∞ P{X = k} = Σ_{n=1}^∞ P{X ≥ n}

The function m(t) is known as the mean-value or the renewal function.

It can be shown that the mean-value function m(t) uniquely determines the renewal process. Specifically, there is a one-to-one correspondence between the interarrival distributions F and the mean-value functions m(t).
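The series m(t) = Σ_{n≥1} F_n(t) can be evaluated directly when the interarrival distribution is exponential, since S_n is then Erlang and P{S_n ≤ t} = P{Poisson(λt) ≥ n}. The sketch below (not from the text; the function names and truncation point are our own choices) sums the series and checks that it recovers the Poisson-process renewal function m(t) = λt:

```python
import math

def renewal_function(lam, t, n_max=200):
    """m(t) = sum_{n>=1} F_n(t) for Exp(lam) interarrivals, using the
    identity F_n(t) = P{S_n <= t} = P{Poisson(lam*t) >= n}."""
    term = math.exp(-lam * t)      # P{Poisson(lam*t) = 0}
    tail = 1.0 - term              # P{Poisson(lam*t) >= 1} = F_1(t)
    total = 0.0
    for n in range(1, n_max + 1):
        total += tail              # add F_n(t)
        term *= lam * t / n        # P{Poisson(lam*t) = n}
        tail -= term               # P{Poisson(lam*t) >= n+1} = F_{n+1}(t)
    return total

# Exponential (rate 2) interarrivals: the series should recover m(t) = 2t.
m5 = renewal_function(2.0, 5.0)    # expect a value near 10
```

Summing P{Poisson(λt) ≥ n} over n ≥ 1 is just the mean of a Poisson(λt) random variable, which is why the truncated series lands on λt.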


Example 7.2 Suppose we have a renewal process whose mean-value function is given by

    m(t) = 2t,   t ≥ 0

What is the distribution of the number of renewals occurring by time 10?

Solution: Since m(t) = 2t is the mean-value function of a Poisson process with rate 2, it follows, by the one-to-one correspondence of interarrival distributions F and mean-value functions m(t), that F must be exponential with mean 1/2. Thus, the renewal process is a Poisson process with rate 2 and hence

    P{N(10) = n} = e^{−20} (20)^n / n!,   n ≥ 0

Another interesting result that we state without proof is that

    m(t) < ∞   for all t < ∞
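Example 7.2's conclusion is easy to probe by simulation: with exponential (rate 2) interarrivals, the empirical distribution of N(10) should match the Poisson mass function e^{−20} 20^n/n!. A minimal sketch (the sample sizes and names here are our own choices, not from the text):

```python
import math, random

def sample_N(t, lam, rng):
    """One draw of N(t) for a renewal process with Exp(lam) interarrivals."""
    s, n = rng.expovariate(lam), 0
    while s <= t:          # count renewal times S_n that land by t
        n += 1
        s += rng.expovariate(lam)
    return n

rng = random.Random(42)
draws = [sample_N(10.0, 2.0, rng) for _ in range(50_000)]
mean_N = sum(draws) / len(draws)                     # should be near 20
p20_hat = draws.count(20) / len(draws)               # empirical P{N(10) = 20}
p20 = math.exp(-20) * 20 ** 20 / math.factorial(20)  # Poisson(20) pmf at 20
```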


An integral equation satisfied by the renewal function can be obtained by conditioning on the time of the first renewal. Assuming that the interarrival distribution F is continuous with density function f, this yields

    m(t) = E[N(t)] = ∫_0^∞ E[N(t)|X_1 = x] f(x) dx   (7.4)

Now suppose that the first renewal occurs at a time x that is less than t. Then, using the fact that a renewal process probabilistically starts over when a renewal occurs, it follows that the number of renewals by time t would have the same distribution as 1 plus the number of renewals in the first t − x time units. Therefore,

    E[N(t)|X_1 = x] = 1 + E[N(t − x)]   if x < t

Since, clearly, E[N(t)|X_1 = x] = 0 when x > t, we obtain from Equation (7.4) that

    m(t) = ∫_0^t [1 + m(t − x)] f(x) dx
         = F(t) + ∫_0^t m(t − x) f(x) dx   (7.5)

Equation (7.5) is called the renewal equation and can sometimes be solved to obtain the renewal function.

Example 7.3 One instance in which the renewal equation can be solved is when the interarrival distribution is uniform—say, uniform on (0, 1). We will now present a solution in this case when t ≤ 1. For such values of t, the renewal function becomes

    m(t) = t + ∫_0^t m(t − x) dx
         = t + ∫_0^t m(y) dy   by the substitution y = t − x

Differentiating the preceding equation yields

    m'(t) = 1 + m(t)

Letting h(t) = 1 + m(t), we obtain

    h'(t) = h(t)


or

    log h(t) = t + C

or

    h(t) = Ke^t

or

    m(t) = Ke^t − 1

Since m(0) = 0, we see that K = 1, and so we obtain

    m(t) = e^t − 1,   0 ≤ t ≤ 1

7.3. Limit Theorems and Their Applications

We have shown previously that, with probability 1, N(t) goes to infinity as t goes to infinity. However, it would be nice to know the rate at which N(t) goes to infinity. That is, we would like to be able to say something about lim_{t→∞} N(t)/t.

As a prelude to determining the rate at which N(t) grows, let us first consider the random variable S_{N(t)}. In words, just what does this random variable represent? Proceeding inductively suppose, for instance, that N(t) = 3. Then S_{N(t)} = S_3 represents the time of the third event. Since there are only three events that have occurred by time t, S_3 also represents the time of the last event prior to (or at) time t. This is, in fact, what S_{N(t)} represents—namely, the time of the last renewal prior to or at time t. Similar reasoning leads to the conclusion that S_{N(t)+1} represents the time of the first renewal after time t (see Figure 7.2). We now are ready to prove the following.

Proposition 7.1 With probability 1,

    N(t)/t → 1/μ   as t → ∞

Figure 7.2.


Proof Since S_{N(t)} is the time of the last renewal prior to or at time t, and S_{N(t)+1} is the time of the first renewal after time t, we have

    S_{N(t)} ≤ t < S_{N(t)+1}


Elementary Renewal Theorem

    m(t)/t → 1/μ   as t → ∞

As before, 1/μ is interpreted as 0 when μ = ∞.

Remark At first glance it might seem that the elementary renewal theorem should be a simple consequence of Proposition 7.1. That is, since the average renewal rate will, with probability 1, converge to 1/μ, should this not imply that the expected average renewal rate also converges to 1/μ? We must, however, be careful; consider the next example.

Example 7.4 Let U be a random variable which is uniformly distributed on (0, 1); and define the random variables Y_n, n ≥ 1, by

    Y_n = 0,   if U > 1/n
    Y_n = n,   if U ≤ 1/n

Now, since, with probability 1, U will be greater than 0, it follows that Y_n will equal 0 for all sufficiently large n. That is, Y_n will equal 0 for all n large enough so that 1/n < U. Hence, with probability 1,

    Y_n → 0   as n → ∞

However,

    E[Y_n] = nP{U ≤ 1/n} = n(1/n) = 1

Therefore, even though the sequence of random variables Y_n converges to 0, the expected values of the Y_n are all identically 1.

Example 7.5 Beverly has a radio that works on a single battery. As soon as the battery in use fails, Beverly immediately replaces it with a new battery. If the lifetime of a battery (in hours) is distributed uniformly over the interval (30, 60), then at what rate does Beverly have to change batteries?

Solution: If we let N(t) denote the number of batteries that have failed by time t, we have by Proposition 7.1 that the rate at which Beverly replaces batteries is given by

    lim_{t→∞} N(t)/t = 1/μ = 1/45

That is, in the long run, Beverly will have to replace one battery every 45 hours.
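Proposition 7.1, as used in Example 7.5, can be illustrated by simulating one long path of battery replacements; in the sketch below (our own construction, with an arbitrary horizon and seed) the ratio N(t)/t should settle near 1/45 for Uniform(30, 60) lifetimes:

```python
import random

def long_run_replacement_rate(horizon, lifetime, seed=7):
    """Estimate lim N(t)/t by simulating a single path up to time `horizon`."""
    rng = random.Random(seed)
    clock, replacements = 0.0, 0
    while True:
        clock += lifetime(rng)       # current battery lasts a random lifetime
        if clock > horizon:
            return replacements / horizon
        replacements += 1            # battery failed within the horizon

# Uniform(30, 60) lifetimes: mean 45, so the rate should approach 1/45.
rate = long_run_replacement_rate(2_000_000, lambda rng: rng.uniform(30, 60))
```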


Example 7.6 Suppose in Example 7.5 that Beverly does not keep any surplus batteries on hand, and so each time a failure occurs she must go and buy a new battery. If the amount of time it takes for her to get a new battery is uniformly distributed over (0, 1), then what is the average rate that Beverly changes batteries?

Solution: In this case the mean time between renewals is given by

    μ = E[U_1] + E[U_2]

where U_1 is uniform over (30, 60) and U_2 is uniform over (0, 1). Hence,

    μ = 45 + 1/2 = 45.5

and so in the long run, Beverly will be putting in a new battery at the rate of 2/91. That is, she will put in two new batteries every 91 hours.

Example 7.7 Suppose that potential customers arrive at a single-server bank in accordance with a Poisson process having rate λ. However, suppose that the potential customer will enter the bank only if the server is free when he arrives. That is, if there is already a customer in the bank, then our arriver, rather than entering the bank, will go home. If we assume that the amount of time spent in the bank by an entering customer is a random variable having distribution G, then
(a) what is the rate at which customers enter the bank?
(b) what proportion of potential customers actually enter the bank?

Solution: In answering these questions, let us suppose that at time 0 a customer has just entered the bank. (That is, we define the process to start when the first customer enters the bank.) If we let μ_G denote the mean service time, then, by the memoryless property of the Poisson process, it follows that the mean time between entering customers is

    μ = μ_G + 1/λ

Hence, the rate at which customers enter the bank will be given by

    1/μ = λ/(1 + λμ_G)

On the other hand, since potential customers will be arriving at a rate λ, it follows that the proportion of them entering the bank will be given by

    [λ/(1 + λμ_G)]/λ = 1/(1 + λμ_G)

In particular if λ = 2 (in hours) and μ_G = 2, then only one customer out of five will actually enter the system.
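The answer to Example 7.7(b) holds for any service distribution G. The sketch below (our own; exponential service with mean 2 is just a concrete choice, and λ = 2 as in the example) estimates the entering proportion and can be compared with 1/(1 + λμ_G) = 1/5:

```python
import random

def entering_proportion(lam, service, n_arrivals=200_000, seed=3):
    """Fraction of Poisson(lam) arrivals who find the server free and enter."""
    rng = random.Random(seed)
    now, busy_until, entered = 0.0, 0.0, 0
    for _ in range(n_arrivals):
        now += rng.expovariate(lam)     # next potential customer arrives
        if now >= busy_until:           # server free: the customer enters
            entered += 1
            busy_until = now + service(rng)
    return entered / n_arrivals

# lam = 2, exponential service with mean mu_G = 2: expect 1/(1 + lam*mu_G) = 1/5.
p = entering_proportion(2.0, lambda rng: rng.expovariate(0.5))
```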


A somewhat unusual application of Proposition 7.1 is provided by our next example.

Example 7.8 A sequence of independent trials, each of which results in outcome number i with probability P_i, i = 1, ..., n, Σ_{i=1}^n P_i = 1, is observed until the same outcome occurs k times in a row; this outcome then is declared to be the winner of the game. For instance, if k = 2 and the sequence of outcomes is 1, 2, 4, 3, 5, 2, 1, 3, 3, then we stop after nine trials and declare outcome number 3 the winner. What is the probability that i wins, i = 1, ..., n, and what is the expected number of trials?

Solution: We begin by computing the expected number of coin tosses, call it E[T], until a run of k successive heads occurs when the tosses are independent and each lands on heads with probability p. By conditioning on the time of the first nonhead, we obtain

    E[T] = Σ_{j=1}^k (1 − p)p^{j−1}(j + E[T]) + kp^k

Solving this for E[T] yields

    E[T] = k + ((1 − p)/p^k) Σ_{j=1}^k jp^{j−1}

Upon simplifying, we obtain

    E[T] = (1 + p + ··· + p^{k−1})/p^k
         = (1 − p^k)/(p^k(1 − p))   (7.7)

Now, let us return to our example, and let us suppose that as soon as the winner of a game has been determined we immediately begin playing another game. For each i let us determine the rate at which outcome i wins. Now, every time i wins, everything starts over again and thus wins by i constitute renewals. Hence, from Proposition 7.1, the

    rate at which i wins = 1/E[N_i]


where N_i denotes the number of trials played between successive wins of outcome i. Hence, from Equation (7.7) we see that

    rate at which i wins = P_i^k(1 − P_i)/(1 − P_i^k)   (7.8)

Hence, the long-run proportion of games that are won by number i is given by

    proportion of games i wins = (rate at which i wins)/(Σ_{j=1}^n rate at which j wins)
                               = [P_i^k(1 − P_i)/(1 − P_i^k)] / [Σ_{j=1}^n P_j^k(1 − P_j)/(1 − P_j^k)]

However, it follows from the strong law of large numbers that the long-run proportion of games that i wins will, with probability 1, be equal to the probability that i wins any given game. Hence,

    P{i wins} = [P_i^k(1 − P_i)/(1 − P_i^k)] / [Σ_{j=1}^n P_j^k(1 − P_j)/(1 − P_j^k)]

To compute the expected time of a game, we first note that the

    rate at which games end = Σ_{i=1}^n rate at which i wins
                            = Σ_{i=1}^n P_i^k(1 − P_i)/(1 − P_i^k)   [from Equation (7.8)]

Now, as everything starts over when a game ends, it follows by Proposition 7.1 that the rate at which games end is equal to the reciprocal of the mean time of a game. Hence,

    E[time of a game] = 1/(rate at which games end)
                      = 1/[Σ_{i=1}^n P_i^k(1 − P_i)/(1 − P_i^k)]

A key element in the proof of the elementary renewal theorem, which is also of independent interest, is the establishment of a relationship between m(t), the mean number of renewals by time t, and E[S_{N(t)+1}], the expected time of


the first renewal after t. Letting

    g(t) = E[S_{N(t)+1}]

we will derive an integral equation, similar to the renewal equation, for g(t) by conditioning on the time of the first renewal. This yields

    g(t) = ∫_0^∞ E[S_{N(t)+1}|X_1 = x] f(x) dx

where we have supposed that the interarrival times are continuous with density f. Now if the first renewal occurs at time x and x > t, then clearly the time of the first renewal after t is x. On the other hand, if the first renewal occurs at a time x


or

    g_1(t) = F(t) + ∫_0^t g_1(t − x) f(x) dx

That is, g_1(t) = E[S_{N(t)+1}]/μ − 1 satisfies the renewal equation and thus, by uniqueness, must be equal to m(t). We have thus proven the following.

Proposition 7.2

    E[S_{N(t)+1}] = μ[m(t) + 1]

A second derivation of Proposition 7.2 is given in Exercises 13 and 14. To see how Proposition 7.2 can be used to establish the elementary renewal theorem, let Y(t) denote the time from t until the next renewal. Y(t) is called the excess, or residual life, at t. As the first renewal after t will occur at time t + Y(t), we see that

    S_{N(t)+1} = t + Y(t)

Taking expectations and utilizing Proposition 7.2 yields

    μ[m(t) + 1] = t + E[Y(t)]   (7.9)

which implies that

    m(t)/t = 1/μ − 1/t + E[Y(t)]/(tμ)

The elementary renewal theorem can now be proven by showing that

    lim_{t→∞} E[Y(t)]/t = 0

(see Exercise 14).

The relation (7.9) shows that if we can determine E[Y(t)], the mean excess at t, then we can compute m(t) and vice versa.

Example 7.9 Consider the renewal process whose interarrival distribution is the convolution of two exponentials; that is,

    F = F_1 ∗ F_2,   where F_i(t) = 1 − e^{−μ_i t}, i = 1, 2

We will determine the renewal function by first determining E[Y(t)]. To obtain the mean excess at t, imagine that each renewal corresponds to a new machine being put in use, and suppose that each machine has two components—initially component 1 is employed and this lasts an exponential time with rate μ_1, and then


component 2, which functions for an exponential time with rate μ_2, is employed. When component 2 fails, a new machine is put in use (that is, a renewal occurs). Now consider the process {X(t), t ≥ 0} where X(t) is i if a type i component is in use at time t. It is easy to see that {X(t), t ≥ 0} is a two-state continuous-time Markov chain, and so, using the results of Example 6.11, its transition probabilities are

    P_11(t) = (μ_1/(μ_1 + μ_2)) e^{−(μ_1+μ_2)t} + μ_2/(μ_1 + μ_2)

To compute the expected remaining life of the machine in use at time t, we condition on whether it is using its first or second component: for if it is still using its first component, then its remaining life is 1/μ_1 + 1/μ_2, whereas if it is already using its second component, then its remaining life is 1/μ_2. Hence, letting p(t) denote the probability that the machine in use at time t is using its first component, we have that

    E[Y(t)] = (1/μ_1 + 1/μ_2) p(t) + (1/μ_2)(1 − p(t))
            = 1/μ_2 + p(t)/μ_1

But, since at time 0 the first machine is utilizing its first component, it follows that p(t) = P_11(t), and so, upon using the preceding expression for P_11(t), we obtain

    E[Y(t)] = 1/μ_2 + (1/(μ_1 + μ_2)) e^{−(μ_1+μ_2)t} + μ_2/(μ_1(μ_1 + μ_2))   (7.10)

Now it follows from Equation (7.9) that

    m(t) + 1 = t/μ + E[Y(t)]/μ   (7.11)

where μ, the mean interarrival time, is given in this case by

    μ = 1/μ_1 + 1/μ_2 = (μ_1 + μ_2)/(μ_1 μ_2)

Substituting Equation (7.10) and the preceding equation into (7.11) yields, after simplifying,

    m(t) = (μ_1 μ_2/(μ_1 + μ_2)) t − (μ_1 μ_2/(μ_1 + μ_2)^2)[1 − e^{−(μ_1+μ_2)t}]
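Example 7.9's closed form can be cross-checked against a direct simulation of the renewal process whose interarrival times are the sum of two independent exponentials. A minimal sketch (parameter values, names, and sample size are our own choices):

```python
import math, random

def m_closed_form(t, mu1, mu2):
    """Example 7.9's renewal function for Exp(mu1)+Exp(mu2) interarrivals."""
    s = mu1 + mu2
    return (mu1 * mu2 / s) * t - (mu1 * mu2 / s ** 2) * (1 - math.exp(-s * t))

def m_simulated(t, mu1, mu2, runs=20_000, seed=11):
    """Monte Carlo estimate of m(t) = E[N(t)] for the same interarrival law."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        clock, n = 0.0, 0
        while True:
            # each interarrival is an Exp(mu1) phase followed by an Exp(mu2) phase
            clock += rng.expovariate(mu1) + rng.expovariate(mu2)
            if clock > t:
                break
            n += 1
        total += n
    return total / runs

t, mu1, mu2 = 3.0, 1.0, 2.0
sim, exact = m_simulated(t, mu1, mu2), m_closed_form(t, mu1, mu2)
```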


Remark Using the relationship of Equation (7.11) and results from the two-state continuous-time Markov chain, the renewal function can also be obtained in the same manner as in Example 7.9 for the interarrival distributions

    F(t) = pF_1(t) + (1 − p)F_2(t)

and

    F(t) = pF_1(t) + (1 − p)(F_1 ∗ F_2)(t)

when F_i(t) = 1 − e^{−μ_i t}, t > 0, i = 1, 2.

An important limit theorem is the central limit theorem for renewal processes. This states that, for large t, N(t) is approximately normally distributed with mean t/μ and variance tσ^2/μ^3, where μ and σ^2 are, respectively, the mean and variance of the interarrival distribution. That is, we have the following theorem which we state without proof.

Central Limit Theorem for Renewal Processes

    lim_{t→∞} P{(N(t) − t/μ)/√(tσ^2/μ^3) < x} = (1/√(2π)) ∫_{−∞}^x e^{−u^2/2} du
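As an informal check of the theorem (our own sketch, with an arbitrary interarrival law): for Uniform(0, 2) interarrivals, μ = 1 and σ^2 = 1/3, so N(200) should be approximately normal with mean about 200 and variance about 200/3. The code samples many independent copies of N(200) and compares the sample moments with these values:

```python
import random

def sample_count(t, interarrival, rng):
    """One draw of N(t) for a renewal process with the given interarrival law."""
    s, n = interarrival(rng), 0
    while s <= t:
        n += 1
        s += interarrival(rng)
    return n

# Uniform(0, 2) interarrivals: mu = 1, sigma^2 = 1/3, t = 200.
rng = random.Random(9)
counts = [sample_count(200.0, lambda r: r.uniform(0, 2), rng) for _ in range(4000)]
n = len(counts)
mean = sum(counts) / n
var = sum((c - mean) ** 2 for c in counts) / (n - 1)   # compare with 200/3
```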


Therefore, N_1(100) is approximately normal with mean 50 and variance 100/8; and N_2(100) is approximately normal with mean 50 and variance 100/6. Hence, N_1(100) + N_2(100) is approximately normal with mean 100 and variance 175/6. Thus, with Φ denoting the standard normal distribution function, we have

    P{N_1(100) + N_2(100) > 89.5}
        = P{(N_1(100) + N_2(100) − 100)/√(175/6) > (89.5 − 100)/√(175/6)}
        ≈ 1 − Φ(−10.5/√(175/6))
        ≈ Φ(10.5/√(175/6))
        ≈ Φ(1.944)
        ≈ 0.9741

7.4. Renewal Reward Processes

A large number of probability models are special cases of the following model. Consider a renewal process {N(t), t ≥ 0} having interarrival times X_n, n ≥ 1, and suppose that each time a renewal occurs we receive a reward. We denote by R_n the reward earned at the time of the nth renewal. We shall assume that the R_n, n ≥ 1, are independent and identically distributed. However, we do allow for the possibility that R_n may (and usually will) depend on X_n, the length of the nth renewal interval. If we let

    R(t) = Σ_{n=1}^{N(t)} R_n

then R(t) represents the total reward earned by time t. Let

    E[R] = E[R_n],   E[X] = E[X_n]

Proposition 7.3 If E[R] < ∞ and E[X] < ∞, then
(a) with probability 1,

    lim_{t→∞} R(t)/t = E[R]/E[X]

(b)

    lim_{t→∞} E[R(t)]/t = E[R]/E[X]


Proof We give the proof for (a) only. To prove this, write

    R(t)/t = (Σ_{n=1}^{N(t)} R_n)/t = (Σ_{n=1}^{N(t)} R_n / N(t)) (N(t)/t)

By the strong law of large numbers we obtain

    Σ_{n=1}^{N(t)} R_n / N(t) → E[R]   as t → ∞

and by Proposition 7.1

    N(t)/t → 1/E[X]   as t → ∞

The result thus follows.

Remark (i) If we say that a cycle is completed every time a renewal occurs, then Proposition 7.3 states that the long-run average reward per unit time is equal to the expected reward earned during a cycle divided by the expected length of a cycle.
(ii) Although we have supposed that the reward is earned at the time of a renewal, the result remains valid when the reward is earned gradually throughout the renewal cycle.

Example 7.11 In Example 7.7 if we suppose that the amounts that the successive customers deposit in the bank are independent random variables having a common distribution H, then the rate at which deposits accumulate—that is, lim_{t→∞} (total deposits by the time t)/t—is given by

    E[deposits during a cycle]/E[time of cycle] = μ_H/(μ_G + 1/λ)

where μ_G + 1/λ is the mean time of a cycle, and μ_H is the mean of the distribution H.

Example 7.12 (A Car Buying Model) The lifetime of a car is a continuous random variable having a distribution H and probability density h. Mr. Brown has a policy that he buys a new car as soon as his old one either breaks down or reaches the age of T years. Suppose that a new car costs C_1 dollars and also that an additional cost of C_2 dollars is incurred whenever Mr. Brown's car breaks down. Under the assumption that a used car has no resale value, what is Mr. Brown's long-run average cost?


If we say that a cycle is complete every time Mr. Brown gets a new car, then it follows from Proposition 7.3 (with costs replacing rewards) that his long-run average cost equals

    E[cost incurred during a cycle]/E[length of a cycle]

Now letting X be the lifetime of Mr. Brown's car during an arbitrary cycle, then the cost incurred during that cycle will be given by

    C_1,         if X > T
    C_1 + C_2,   if X ≤ T

so the expected cost incurred over a cycle is

    C_1 P{X > T} + (C_1 + C_2)P{X ≤ T} = C_1 + C_2 H(T)

Also, the length of the cycle is

    X,   if X ≤ T
    T,   if X > T

and so the expected length of a cycle is

    ∫_0^T x h(x) dx + ∫_T^∞ T h(x) dx = ∫_0^T x h(x) dx + T[1 − H(T)]

Therefore, Mr. Brown's long-run average cost will be

    (C_1 + C_2 H(T))/(∫_0^T x h(x) dx + T[1 − H(T)])   (7.13)

Now, suppose that the lifetime of a car (in years) is uniformly distributed over (0, 10), and suppose that C_1 is 3 (thousand) dollars and C_2 is 1/2 (thousand) dollars. What value of T minimizes Mr. Brown's long-run average cost?


If Mr. Brown uses the value T, T ≤ 10, then from Equation (7.13) his long-run average cost equals

    (3 + (1/2)(T/10))/(∫_0^T (x/10) dx + T(1 − T/10))
        = (3 + T/20)/(T^2/20 + (10T − T^2)/10)
        = (60 + T)/(20T − T^2)

We can now minimize this by using the calculus. Toward this end, let

    g(T) = (60 + T)/(20T − T^2)

then

    g'(T) = [(20T − T^2) − (60 + T)(20 − 2T)]/(20T − T^2)^2

Equating to 0 yields

    20T − T^2 = (60 + T)(20 − 2T)

or, equivalently,

    T^2 + 120T − 1200 = 0

which yields the solutions

    T ≈ 9.25   and   T ≈ −129.25

Since T ≤ 10, it follows that the optimal policy for Mr. Brown would be to purchase a new car whenever his old car reaches the age of 9.25 years.

Example 7.13 (Dispatching a Train) Suppose that customers arrive at a train depot in accordance with a renewal process having a mean interarrival time μ. Whenever there are N customers waiting in the depot, a train leaves. If the depot incurs a cost at the rate of nc dollars per unit time whenever there are n customers waiting, what is the average cost incurred by the depot?

If we say that a cycle is completed whenever a train leaves, then the preceding is a renewal reward process. The expected length of a cycle is the expected time


required for N customers to arrive and, since the mean interarrival time is μ, this equals

    E[length of cycle] = Nμ

If we let T_n denote the time between the nth and (n + 1)st arrival in a cycle, then the expected cost of a cycle may be expressed as

    E[cost of a cycle] = E[cT_1 + 2cT_2 + ··· + (N − 1)cT_{N−1}]

which, since E[T_n] = μ, equals

    cμN(N − 1)/2

Hence, the average cost incurred by the depot is

    cμN(N − 1)/(2Nμ) = c(N − 1)/2

Suppose now that each time a train leaves, the depot incurs a cost of six units. What value of N minimizes the depot's long-run average cost when c = 2, μ = 1? In this case, we have that the average cost per unit time when the depot uses N is

    (6 + cμN(N − 1)/2)/(Nμ) = N − 1 + 6/N

By treating this as a continuous function of N and using the calculus, we obtain that the minimal value of N is

    N = √6 ≈ 2.45

Hence, the optimal integral value of N is either 2, which yields a value 4, or 3, which also yields the value 4. Hence, either N = 2 or N = 3 minimizes the depot's average cost.

Example 7.14 Suppose that customers arrive at a single-server system in accordance with a Poisson process with rate λ. Upon arriving a customer must pass through a door that leads to the server. However, each time someone passes through, the door becomes locked for the next t units of time. An arrival finding a locked door is lost, and a cost c is incurred by the system. An arrival finding the door unlocked passes through to the server. If the server is free, the customer enters service; if the server is busy, the customer departs without service and a cost K is incurred. If the service time of a customer is exponential with rate μ, find the average cost per unit time incurred by the system.


Solution: The preceding can be considered to be a renewal reward process, with a new cycle beginning each time a customer arrives to find the door unlocked. This is so because whether or not the arrival finds the server free, the door will become locked for the next t time units and the server will be busy for a time X that is exponentially distributed with rate μ. (If the server is free, X is the service time of the entering customer; if the server is busy, X is the remaining service time of the customer in service.) Since the next cycle will begin at the first arrival epoch after a time t has passed, it follows that

    E[time of a cycle] = t + 1/λ

Let C_1 denote the cost incurred during a cycle due to arrivals finding the door locked. Then, since each arrival in the first t time units of a cycle will result in a cost c, we have

    E[C_1] = λtc

Also, let C_2 denote the cost incurred during a cycle due to an arrival finding the door unlocked but the server busy. Then because a cost K is incurred if the server is still busy a time t after the cycle began and, in addition, the next arrival after that time occurs before the service completion, we see that

    E[C_2] = Ke^{−μt} λ/(λ + μ)

Consequently,

    average cost per unit time = (λtc + λKe^{−μt}/(λ + μ))/(t + 1/λ)

Example 7.15 Consider a manufacturing process that sequentially produces items, each of which is either defective or acceptable. The following type of sampling scheme is often employed in an attempt to detect and eliminate most of the defective items. Initially, each item is inspected and this continues until there are k consecutive items that are acceptable. At this point 100% inspection ends and each successive item is independently inspected with probability α. This partial inspection continues until a defective item is encountered, at which time 100% inspection is reinstituted, and the process begins anew.
If each item is, independently, defective with probability q,

(a) what proportion of items are inspected?
(b) if defective items are removed when detected, what proportion of the remaining items are defective?


Remark Before starting our analysis, note that the preceding inspection scheme was devised for situations in which the probability of producing a defective item changed over time. It was hoped that 100% inspection would correlate with times at which the defect probability was large and partial inspection with times when it was small. However, it is still important to see how such a scheme would work in the extreme case where the defect probability remains constant throughout.

Solution: We begin our analysis by noting that we can treat the preceding as a renewal reward process with a new cycle starting each time 100% inspection is instituted. We then have

    proportion of items inspected = E[number inspected in a cycle] / E[number produced in a cycle]

Let N_k denote the number of items inspected until there are k consecutive acceptable items. Once partial inspection begins (that is, after N_k items have been produced), since each inspected item will be defective with probability q, it follows that the expected number that will have to be inspected to find a defective item is 1/q. Hence,

    E[number inspected in a cycle] = E[N_k] + 1/q

In addition, since at partial inspection each item produced will, independently, be inspected and found to be defective with probability αq, it follows that the number of items produced until one is inspected and found to be defective is 1/(αq), and so

    E[number produced in a cycle] = E[N_k] + 1/(αq)

Also, as E[N_k] is the expected number of trials needed to obtain k acceptable items in a row when each item is acceptable with probability p = 1 − q, it follows from Example 3.14 that

    E[N_k] = 1/p + 1/p^2 + ··· + 1/p^k = ((1/p)^k − 1)/q

Hence we obtain

    P_I ≡ proportion of items that are inspected = (1/p)^k / ((1/p)^k − 1 + 1/α)

To answer (b), note first that since each item produced is defective with probability q, it follows that the proportion of items that are both inspected and found to be defective is qP_I.
Hence, for N large, out of the first N items produced there will be (approximately) NqP_I that are discovered to be defective and


thus removed. As the first N items will contain (approximately) Nq defective items, it follows that there will be Nq − NqP_I defective items not discovered. Hence,

    proportion of the nonremoved items that are defective ≈ Nq(1 − P_I) / (N(1 − qP_I))

As the approximation becomes exact as N → ∞, we see that

    proportion of the nonremoved items that are defective = q(1 − P_I)/(1 − qP_I)

Example 7.16 (The Average Age of a Renewal Process) Consider a renewal process having interarrival distribution F and define A(t) to be the time at t since the last renewal. If renewals represent old items failing and being replaced by new ones, then A(t) represents the age of the item in use at time t. Since S_{N(t)} represents the time of the last event prior to or at time t, we have that

    A(t) = t − S_{N(t)}

We are interested in the average value of the age, that is, in

    lim_{s→∞} (1/s) ∫_0^s A(t) dt

To determine this quantity, we use renewal reward theory in the following way: Let us assume that at any time we are being paid money at a rate equal to the age of the renewal process at that time. That is, at time t, we are being paid at rate A(t), and so ∫_0^s A(t) dt represents our total earnings by time s. As everything starts over again when a renewal occurs, it follows that

    (1/s) ∫_0^s A(t) dt → E[reward during a renewal cycle] / E[time of a renewal cycle]

Now, since the age of the renewal process a time t into a renewal cycle is just t, we have

    reward during a renewal cycle = ∫_0^X t dt = X^2/2


where X is the time of the renewal cycle. Hence, we have that

    average value of age ≡ lim_{s→∞} (1/s) ∫_0^s A(t) dt = E[X^2] / (2E[X])    (7.14)

where X is an interarrival time having distribution function F.

Example 7.17 (The Average Excess of a Renewal Process) Another quantity associated with a renewal process is Y(t), the excess or residual time at time t. Y(t) is defined to equal the time from t until the next renewal and, as such, represents the remaining (or residual) life of the item in use at time t. The average value of the excess, namely,

    lim_{s→∞} (1/s) ∫_0^s Y(t) dt

also can be easily obtained by renewal reward theory. To do so, suppose that we are paid at time t at a rate equal to Y(t). Then our average reward per unit time will, by renewal reward theory, be given by

    average value of excess ≡ lim_{s→∞} (1/s) ∫_0^s Y(t) dt
        = E[reward during a cycle] / E[length of a cycle]

Now, letting X denote the length of a renewal cycle, we have that

    reward during a cycle = ∫_0^X (X − t) dt = X^2/2

and thus the average value of the excess is

    average value of excess = E[X^2] / (2E[X])

which is the same result obtained for the average value of the age of the renewal process.
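The equality of the long-run average age and average excess is easy to check by simulation. The following is a minimal sketch (not from the text), taking interarrival times uniform on (0, 1) as an arbitrary illustrative choice, so that E[X] = 1/2, E[X^2] = 1/3, and the predicted common value E[X^2]/(2E[X]) is 1/3:

```python
import random

# Sketch: Monte Carlo check of Examples 7.16 and 7.17 with uniform(0,1)
# interarrival times (an illustrative assumption), for which the
# long-run average age and excess should both equal E[X^2]/(2E[X]) = 1/3.
random.seed(1)

horizon = 200_000.0
t = 0.0
age_integral = 0.0     # integral of the age A(t)
excess_integral = 0.0  # integral of the excess Y(t)
while t < horizon:
    x = random.random()  # next interarrival time X
    # over a renewal interval of length x, the age rises 0 -> x and the
    # excess falls x -> 0, so each contributes x^2/2 to its integral
    age_integral += x * x / 2
    excess_integral += x * x / 2
    t += x

avg_age = age_integral / t
avg_excess = excess_integral / t
```

Note that the two time integrals accumulate the identical quantity X^2/2 per cycle, which is exactly the observation that drives both derivations.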


7.5. Regenerative Processes

Consider a stochastic process {X(t), t ≥ 0} with state space 0, 1, 2, ..., having the property that there exist time points at which the process (probabilistically) restarts itself. That is, suppose that with probability one, there exists a time T_1 such that the continuation of the process beyond T_1 is a probabilistic replica of the whole process starting at 0. Note that this property implies the existence of further times T_2, T_3, ..., having the same property as T_1. Such a stochastic process is known as a regenerative process.

From the preceding, it follows that T_1, T_2, ..., constitute the arrival times of a renewal process, and we shall say that a cycle is completed every time a renewal occurs.

Examples (1) A renewal process is regenerative, and T_1 represents the time of the first renewal.
(2) A recurrent Markov chain is regenerative, and T_1 represents the time of the first transition into the initial state.

We are interested in determining the long-run proportion of time that a regenerative process spends in state j. To obtain this quantity, let us imagine that we earn a reward at a rate 1 per unit time when the process is in state j and at rate 0 otherwise. That is, if I(s) represents the rate at which we earn at time s, then

    I(s) = 1, if X(s) = j
           0, if X(s) ≠ j

and

    total reward earned by t = ∫_0^t I(s) ds

As the preceding is clearly a renewal reward process that starts over again at the cycle time T_1, we see from Proposition 7.3 that

    average reward per unit time = E[reward by time T_1] / E[T_1]

However, the average reward per unit time is just equal to the proportion of time that the process is in state j. That is, we have the following.

Proposition 7.4 For a regenerative process, the long-run

    proportion of time in state j = E[amount of time in j during a cycle] / E[time of a cycle]


Remark If the cycle time T_1 is a continuous random variable, then it can be shown by using an advanced theorem called the "key renewal theorem" that the preceding is equal also to the limiting probability that the system is in state j at time t. That is, if T_1 is continuous, then

    lim_{t→∞} P{X(t) = j} = E[amount of time in j during a cycle] / E[time of a cycle]

Example 7.18 Consider a positive recurrent continuous-time Markov chain that is initially in state i. By the Markovian property, each time the process reenters state i it starts over again. Thus returns to state i are renewals and constitute the beginnings of new cycles. By Proposition 7.4, it follows that the

    long-run proportion of time in state j = E[amount of time in j during an i–i cycle] / μ_ii

where μ_ii represents the mean time to return to state i. If we take j to equal i, then we obtain

    proportion of time in state i = (1/v_i) / μ_ii

Example 7.19 (A Queueing System with Renewal Arrivals) Consider a waiting time system in which customers arrive in accordance with an arbitrary renewal process and are served one at a time by a single server having an arbitrary service distribution. If we suppose that at time 0 the initial customer has just arrived, then {X(t), t ≥ 0} is a regenerative process, where X(t) denotes the number of customers in the system at time t. The process regenerates each time a customer arrives and finds the server free.

Example 7.20 Although a system needs only a single machine to function, it maintains an additional machine as a backup. A machine in use functions for a random time with density function f and then fails. If a machine fails while the other one is in working condition, then the latter is put in use and, simultaneously, repair begins on the one that just failed.
If a machine fails while the other machine is in repair, then the newly failed machine waits until the repair is completed; at that time the repaired machine is put in use and, simultaneously, repair begins on the recently failed one. All repair times have density function g. Find P_0, P_1, P_2, where P_i is the long-run proportion of time that exactly i of the machines are in working condition.

Solution: Let us say that the system is in state i whenever i machines are in working condition, i = 0, 1, 2. It is then easy to see that every time the system enters state 1 it probabilistically starts over. That is, the system restarts every


time that a machine is put in use while, simultaneously, repair begins on the other one. Say that a cycle begins each time the system enters state 1. If we let X denote the working time of the machine put in use at the beginning of a cycle, and let R be the repair time of the other machine, then the length of the cycle, call it T_c, can be expressed as

    T_c = max(X, R)

The preceding follows when X ≤ R, because, in this case, the machine in use fails before the other one has been repaired, and so a new cycle begins when that repair is completed. Similarly, it follows when R < X, because then the repair occurs first, and so a new cycle begins when the machine in use fails.

Now, at the beginning of a cycle exactly one machine is working. If the repair is completed before the working machine fails (that is, if R < X), then two machines are working from time R until time X; if the working machine fails first (X < R), then no machine is working from time X until time R. Hence, the amounts of time during a cycle that the system spends in states 1, 2, and 0 are, respectively, min(X, R), (X − R)^+, and (R − X)^+, where a^+ = max(a, 0). Therefore, by Proposition 7.4,

    P_0 = E[(R − X)^+] / E[max(X, R)],
    P_1 = E[min(X, R)] / E[max(X, R)],
    P_2 = E[(X − R)^+] / E[max(X, R)]


The preceding expectations can be computed as follows:

    E[max(X, R)] = ∫_0^∞ ∫_0^∞ max(x, r) f(x) g(r) dx dr
                 = ∫_0^∞ ∫_0^r r f(x) g(r) dx dr + ∫_0^∞ ∫_r^∞ x f(x) g(r) dx dr

    E[(R − X)^+] = ∫_0^∞ ∫_0^∞ (r − x)^+ f(x) g(r) dx dr
                 = ∫_0^∞ ∫_0^r (r − x) f(x) g(r) dx dr

    E[min(X, R)] = ∫_0^∞ ∫_0^∞ min(x, r) f(x) g(r) dx dr
                 = ∫_0^∞ ∫_0^r x f(x) g(r) dx dr + ∫_0^∞ ∫_r^∞ r f(x) g(r) dx dr

    E[(X − R)^+] = ∫_0^∞ ∫_r^∞ (x − r) f(x) g(r) dx dr

7.5.1. Alternating Renewal Processes

Another example of a regenerative process is provided by what is known as an alternating renewal process, which considers a system that can be in one of two states: on or off. Initially it is on, and it remains on for a time Z_1; it then goes off and remains off for a time Y_1. It then goes on for a time Z_2; then off for a time Y_2; then on, and so on.

We suppose that the random vectors (Z_n, Y_n), n ≥ 1, are independent and identically distributed. That is, both the sequence of random variables {Z_n} and the sequence {Y_n} are independent and identically distributed, but we allow Z_n and Y_n to be dependent. In other words, each time the process goes on, everything starts over again, but when it then goes off, we allow the length of the off time to depend on the previous on time.

Let E[Z] = E[Z_n] and E[Y] = E[Y_n] denote, respectively, the mean lengths of an on and an off period.

We are concerned with P_on, the long-run proportion of time that the system is on. If we let

    X_n = Y_n + Z_n,  n ≥ 1


then at time X_1 the process starts over again. That is, the process starts over again after a complete cycle consisting of an on and an off interval. In other words, a renewal occurs whenever a cycle is completed. Therefore, we obtain from Proposition 7.4 that

    P_on = E[Z] / (E[Y] + E[Z]) = E[on] / (E[on] + E[off])    (7.15)

Also, if we let P_off denote the long-run proportion of time that the system is off, then

    P_off = 1 − P_on = E[off] / (E[on] + E[off])    (7.16)

Example 7.21 (A Production Process) One example of an alternating renewal process is a production process (or a machine) which works for a time Z_1, then breaks down and has to be repaired (which takes a time Y_1), then works for a time Z_2, then is down for a time Y_2, and so on. If we suppose that the process is as good as new after each repair, then this constitutes an alternating renewal process. It is worthwhile to note that in this example it makes sense to suppose that the repair time will depend on the amount of time the process had been working before breaking down.

Example 7.22 The rate a certain insurance company charges its policyholders alternates between r_1 and r_0. A new policyholder is initially charged at a rate of r_1 per unit time. When a policyholder paying at rate r_1 has made no claims for the most recent s time units, then the rate charged becomes r_0 per unit time. The rate charged remains at r_0 until a claim is made, at which time it reverts to r_1.
Suppose that a given policyholder lives forever and makes claims at times chosen according to a Poisson process with rate λ, and find

(a) P_i, the proportion of time that the policyholder pays at rate r_i, i = 0, 1;
(b) the long-run average amount paid per unit time.

Solution: If we say that the system is "on" when the policyholder pays at rate r_1 and "off" when she pays at rate r_0, then this on–off system is an alternating renewal process with a new cycle starting each time a claim is made. If X is the time between successive claims, then the on time in the cycle is the


smaller of s and X. (Note that if X < s, then the policyholder pays at rate r_1 throughout the entire cycle.) Hence, since the expected cycle length is E[X] = 1/λ, we have

    P_1 = E[min(s, X)] / E[X] = λ ∫_0^s e^{−λx} dx = 1 − e^{−λs}

and

    P_0 = 1 − P_1 = e^{−λs}

The long-run average amount paid per unit time is therefore

    r_1 P_1 + r_0 P_0 = r_1 − (r_1 − r_0) e^{−λs}

Example 7.23 (The Age of a Renewal Process) Suppose we are interested in the long-run proportion of time that the age of a renewal process is less than c. To determine this quantity, let a cycle correspond to a renewal interval, and say that the system is on at time t if the age at t is less than c and off otherwise. That is, the system turns on at the beginning of each renewal interval and stays on until either the interval is c time units old or the interval ends, whichever occurs first. Hence, if X, having distribution function F, is the length of a renewal interval, then the on time in the cycle is min(X, c), and so

    long-run proportion of time age is less than c
        = E[min(X, c)] / E[X]
        = ∫_0^∞ P{min(X, c) > x} dx / E[X]
        = ∫_0^c P{X > x} dx / E[X]
        = ∫_0^c (1 − F(x)) dx / E[X]    (7.17)

where we have used the identity that for a nonnegative random variable Y

    E[Y] = ∫_0^∞ P{Y > x} dx
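Equation (7.17) is easy to check by simulation. The sketch below (not part of the original text) assumes exponential rate-1 interarrivals, for which the right side of (7.17) evaluates to 1 − e^{−c}:

```python
import math
import random

# Sketch: check Equation (7.17) by simulation, assuming exponential
# rate-1 interarrival times, so the predicted long-run proportion of
# time the age is below c is 1 - e^{-c}.
random.seed(2)

c = 0.7
horizon = 200_000.0
t = 0.0
on_time = 0.0  # total time during which the age is less than c
while t < horizon:
    x = random.expovariate(1.0)  # renewal interval length
    on_time += min(x, c)         # the age is below c for min(x, c) units
    t += x

proportion = on_time / t
predicted = 1 - math.exp(-c)
```

The per-cycle bookkeeping is exactly the on-time argument of Example 7.23: each interval of length x contributes min(x, c) units with age below c.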


Example 7.24 (The Excess of a Renewal Process) Let us now consider the long-run proportion of time that the excess of a renewal process is less than c. To determine this quantity, let a cycle correspond to a renewal interval and say that the system is on whenever the excess of the renewal process is greater than or equal to c and that it is off otherwise. In other words, whenever a renewal occurs the process goes on and stays on until the last c time units of the renewal interval, when it goes off. Clearly this is an alternating renewal process, and so we obtain from Equation (7.16) that

    long-run proportion of time the excess is less than c = E[off time in cycle] / E[cycle time]

If X is the length of a renewal interval, then since the system is off the last c time units of this interval, it follows that the off time in the cycle will equal min(X, c). Thus,

    long-run proportion of time the excess is less than c = E[min(X, c)] / E[X]
        = ∫_0^c (1 − F(x)) dx / E[X]

where the final equality follows from Equation (7.17). Thus, we see from the result of Example 7.23 that the long-run proportion of time that the excess is less than c and the long-run proportion of time that the age is less than c are equal. One way to understand this equivalence is to consider a renewal process that has been in operation for a long time and then observe it going backwards in time. In doing so, we observe a counting process where the times between successive events are independent random variables having distribution F. That is, when we observe a renewal process going backwards in time we again observe a renewal process having the same probability structure as the original. Since the excess (age) at any time for the backwards process corresponds to the age (excess) at that time for the original renewal process (see Figure 7.3), it follows that all long-run properties of the age and the excess must be equal.

Figure 7.3. Arrowheads indicate direction of time.


Example 7.25 (The Busy Period of the M/G/∞ Queue) The infinite server queueing system in which customers arrive according to a Poisson process with rate λ, and have a general service distribution G, was analyzed in Section 5.3, where it was shown that the number of customers in the system at time t is Poisson distributed with mean λ ∫_0^t Ḡ(t) dt. If we say that the system is busy when there is at least one customer in the system and is idle when the system is empty, find E[B], the expected length of a busy period.

Solution: If we say that the system is on when there is at least one customer in the system, and off when the system is empty, then we have an alternating renewal process. Because ∫_0^∞ Ḡ(t) dt = E[S], where E[S] is the mean of the service distribution G, it follows from the result of Section 5.3 that

    lim_{t→∞} P{system off at t} = e^{−λE[S]}

Consequently, from alternating renewal process theory we obtain

    e^{−λE[S]} = E[off time in cycle] / E[cycle time]

But when the system goes off, it remains off only up to the time of the next arrival, giving that

    E[off time in cycle] = 1/λ

Because

    E[on time in cycle] = E[B]

we obtain that

    e^{−λE[S]} = (1/λ) / (1/λ + E[B])

or

    E[B] = (1/λ)(e^{λE[S]} − 1)

If μ is the mean interarrival time, then the distribution function F_e, defined by

    F_e(x) = ∫_0^x (1 − F(y))/μ dy

is called the equilibrium distribution of F. From the preceding, it follows that F_e(x) represents the long-run proportion of time that the age, and the excess, of the renewal process is less than or equal to x.
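The busy-period formula of Example 7.25 can be checked by direct simulation. The following sketch (not from the text) assumes λ = 1 and deterministic service times of length 0.5; any service distribution with that mean gives the same E[B], since the formula depends on G only through E[S]. With infinite servers, the system is busy exactly when the current time is covered by the union of the service intervals, so busy periods can be found by interval merging:

```python
import math
import random

# Sketch: simulate M/G/infinity busy periods (lambda = 1, deterministic
# service S = 0.5, both illustrative assumptions) and compare the mean
# busy-period length with E[B] = (e^{lambda E[S]} - 1)/lambda.
random.seed(3)

lam, service = 1.0, 0.5
n_arrivals = 200_000

busy_lengths = []
t = 0.0            # current arrival epoch
busy_start = None  # start of the busy period in progress
busy_end = 0.0     # time at which the system next empties
for _ in range(n_arrivals):
    t += random.expovariate(lam)
    if busy_start is None or t >= busy_end:
        # the system was idle, so this arrival starts a new busy period
        if busy_start is not None:
            busy_lengths.append(busy_end - busy_start)
        busy_start = t
    # with infinitely many servers, service begins at the arrival epoch
    busy_end = max(busy_end, t + service)

mean_busy = sum(busy_lengths) / len(busy_lengths)
predicted = (math.exp(lam * service) - 1) / lam
```

The merging step works because an M/G/∞ busy period is precisely a maximal union of overlapping service intervals.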


Example 7.26 (An Inventory Example) Suppose that customers arrive at a specified store in accordance with a renewal process having interarrival distribution F. Suppose that the store stocks a single type of item and that each arriving customer desires a random amount of this commodity, with the amounts desired by the different customers being independent random variables having the common distribution G. The store uses the following (s, S) ordering policy: If its inventory level falls below s then it orders enough to bring its inventory up to S. That is, if the inventory after serving a customer is x, then the amount ordered is

    S − x, if x < s
    0,     if x ≥ s

The orders are assumed to be instantaneously filled. For s ≤ y ≤ S, we wish to determine the long-run proportion of time that the inventory level is at least y. To do so, say that the system is "on" whenever the inventory level is at least y and "off" otherwise, and note that a new cycle begins each time an order is placed, at which point the inventory level is S. Hence, by alternating renewal process theory,

    long-run proportion of time inventory ≥ y = E[on time in a cycle] / E[cycle time]    (7.18)

Now, if we let D_i, i ≥ 1, denote the amounts desired by the successive customers of a cycle, and let

    N_x = min{n : D_1 + ··· + D_n > S − x}    (7.19)

then it is the N_y customer in the cycle that causes the inventory level to fall below y, and it is the N_s customer that ends the cycle. As a result, if we let X_i, i ≥ 1, denote the interarrival times of customers, then

    on time in a cycle = Σ_{i=1}^{N_y} X_i    (7.20)

    cycle time = Σ_{i=1}^{N_s} X_i    (7.21)


Assuming that the interarrival times are independent of the successive demands, we have that

    E[Σ_{i=1}^{N_y} X_i] = E[E[Σ_{i=1}^{N_y} X_i | N_y]] = E[N_y E[X]] = E[X]E[N_y]

Similarly,

    E[Σ_{i=1}^{N_s} X_i] = E[X]E[N_s]

Therefore, from Equations (7.18), (7.20), and (7.21) we see that

    long-run proportion of time inventory ≥ y = E[N_y] / E[N_s]    (7.22)

However, as the D_i, i ≥ 1, are independent and identically distributed nonnegative random variables with distribution G, it follows from Equation (7.19) that N_x has the same distribution as the index of the first event to occur after time S − x of a renewal process having interarrival distribution G. That is, N_x − 1 would be the number of renewals by time S − x of this process. Hence, we see that

    E[N_y] = m(S − y) + 1,
    E[N_s] = m(S − s) + 1

where

    m(t) = Σ_{n=1}^∞ G_n(t)

From Equation (7.22), we arrive at

    long-run proportion of time inventory ≥ y = (m(S − y) + 1) / (m(S − s) + 1),  s ≤ y ≤ S

For instance, if the customer demands are exponentially distributed with mean 1/μ, then

    long-run proportion of time inventory ≥ y = (μ(S − y) + 1) / (μ(S − s) + 1),  s ≤ y ≤ S
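The exponential-demand formula can be checked by simulating the (s, S) policy directly. In the sketch below (not from the text), the interarrival distribution F is taken uniform on (0, 1), an arbitrary choice, since the limiting proportion does not depend on F; demands are exponential with rate μ = 2, and s = 1, y = 3, S = 5 are illustrative parameters:

```python
import random

# Sketch: simulate the (s, S) inventory policy of Example 7.26 with
# exponential demands of rate mu (illustrative parameters below) and
# compare the observed proportion of time the inventory is >= y with
# (mu*(S - y) + 1) / (mu*(S - s) + 1).
random.seed(4)

mu = 2.0                  # demand rate; mean demand is 1/mu
s, S, y = 1.0, 5.0, 3.0
n_customers = 200_000

t = 0.0
last_event = 0.0
on_time = 0.0             # time with inventory >= y
inventory = S
for _ in range(n_customers):
    t += random.random()              # interarrival time (uniform(0,1))
    if inventory >= y:
        on_time += t - last_event     # level was constant since last arrival
    last_event = t
    inventory -= random.expovariate(mu)  # serve this customer's demand
    if inventory < s:
        inventory = S                 # instantaneous order up to S

proportion = on_time / t
predicted = (mu * (S - y) + 1) / (mu * (S - s) + 1)  # = 5/9 here
```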


7.6. Semi-Markov Processes

Consider a process that can be in state 1 or state 2 or state 3. It is initially in state 1, where it remains for a random amount of time having mean μ_1; then it goes to state 2, where it remains for a random amount of time having mean μ_2; then it goes to state 3, where it remains for a mean time μ_3; then back to state 1, and so on. What proportion of time is the process in state i, i = 1, 2, 3?

If we say that a cycle is completed each time the process returns to state 1, and if we let the reward be the amount of time we spend in state i during that cycle, then the preceding is a renewal reward process. Hence, from Proposition 7.3 we obtain that P_i, the proportion of time that the process is in state i, is given by

    P_i = μ_i / (μ_1 + μ_2 + μ_3),  i = 1, 2, 3

Similarly, if we had a process which could be in any of N states 1, 2, ..., N and which moved from state 1 → 2 → 3 → ··· → N − 1 → N → 1, then the long-run proportion of time that the process spends in state i is

    P_i = μ_i / (μ_1 + μ_2 + ··· + μ_N),  i = 1, 2, ..., N

where μ_i is the expected amount of time the process spends in state i during each visit.

Let us now generalize the preceding to the following situation. Suppose that a process can be in any one of N states 1, 2, ..., N, and that each time it enters state i it remains there for a random amount of time having mean μ_i and then makes a transition into state j with probability P_ij. Such a process is called a semi-Markov process. Note that if the amount of time that the process spends in each state before making a transition is identically 1, then the semi-Markov process is just a Markov chain.

Let us calculate P_i for a semi-Markov process. To do so, we first consider π_i, the proportion of transitions that take the process into state i. Now if we let X_n denote the state of the process after the nth transition, then {X_n, n ≥ 0} is a Markov chain with transition probabilities P_ij, i, j = 1, 2, ..., N.
Hence, π_i will just be the limiting (or stationary) probabilities for this Markov chain (Section 4.4). That is, the π_i will be the unique nonnegative solution* of

*We shall assume that there exists a solution of Equation (7.23). That is, we assume that all of the states in the Markov chain communicate.


    Σ_{i=1}^N π_i = 1,
    π_i = Σ_{j=1}^N π_j P_ji,  i = 1, 2, ..., N    (7.23)

Now since the process spends an expected time μ_i in state i whenever it visits that state, it seems intuitive that P_i should be a weighted average of the π_i, where π_i is weighted proportionately to μ_i. That is,

    P_i = π_i μ_i / Σ_{j=1}^N π_j μ_j,  i = 1, 2, ..., N    (7.24)

where the π_i are given as the solution to Equation (7.23).

Example 7.27 Consider a machine that can be in one of three states: good condition, fair condition, or broken down. Suppose that a machine in good condition will remain this way for a mean time μ_1 and then will go to either the fair condition or the broken condition with respective probabilities 3/4 and 1/4. A machine in fair condition will remain that way for a mean time μ_2 and then will break down. A broken machine will be repaired, which takes a mean time μ_3, and when repaired will be in good condition with probability 2/3 and fair condition with probability 1/3. What proportion of time is the machine in each state?

Solution: Letting the states be 1, 2, 3, we have by Equation (7.23) that the π_i satisfy

    π_1 + π_2 + π_3 = 1,
    π_1 = (2/3)π_3,
    π_2 = (3/4)π_1 + (1/3)π_3,
    π_3 = (1/4)π_1 + π_2

The solution is

    π_1 = 4/15,  π_2 = 1/3,  π_3 = 2/5


Hence, from Equation (7.24) we obtain that P_i, the proportion of time the machine is in state i, is given by

    P_1 = 4μ_1 / (4μ_1 + 5μ_2 + 6μ_3),
    P_2 = 5μ_2 / (4μ_1 + 5μ_2 + 6μ_3),
    P_3 = 6μ_3 / (4μ_1 + 5μ_2 + 6μ_3)

For instance, if μ_1 = 5, μ_2 = 2, μ_3 = 1, then the machine will be in good condition 5/9 of the time, in fair condition 5/18 of the time, and in broken condition 1/6 of the time.

Remark When the distributions of the amount of time spent in each state during a visit are continuous, then P_i also represents the limiting (as t → ∞) probability that the process will be in state i at time t.

Example 7.28 Consider a renewal process in which the interarrival distribution is discrete and is such that

    P{X = i} = p_i,  i ≥ 1

where X represents an interarrival random variable. Let L(t) denote the length of the renewal interval that contains the point t [that is, if N(t) is the number of renewals by time t and X_n the nth interarrival time, then L(t) = X_{N(t)+1}]. If we think of each renewal as corresponding to the failure of a lightbulb (which is then replaced at the beginning of the next period by a new bulb), then L(t) will equal i if the bulb in use at time t dies in its ith period of use.

It is easy to see that L(t) is a semi-Markov process. To determine the proportion of time that L(t) = j, note that each time a transition occurs, that is, each time a renewal occurs, the next state will be j with probability p_j. That is, the transition probabilities of the embedded Markov chain are P_ij = p_j. Hence, the limiting probabilities of this embedded chain are given by

    π_j = p_j

and, since the mean time the semi-Markov process spends in state j before a transition occurs is j, it follows that the long-run proportion of time the state is j is

    P_j = j p_j / Σ_i i p_i
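Returning to Example 7.27, the whole computation (stationary probabilities of the embedded chain, then the weighting of Equation (7.24)) is short in code. The sketch below finds π by power iteration rather than by solving the linear equations by hand, using μ = (5, 2, 1) from the example:

```python
# Sketch: Example 7.27 numerically.  The embedded chain's stationary
# distribution pi is found by power iteration (the chain is aperiodic,
# so pi P^n converges); then Equation (7.24) gives the time proportions.
P = [
    [0.0, 3 / 4, 1 / 4],  # good -> fair (3/4) or broken (1/4)
    [0.0, 0.0, 1.0],      # fair -> broken
    [2 / 3, 1 / 3, 0.0],  # broken -> good (2/3) or fair (1/3)
]
mu = [5.0, 2.0, 1.0]      # mean holding times from the example

pi = [1 / 3, 1 / 3, 1 / 3]
for _ in range(200):      # pi <- pi P, repeated until convergence
    pi = [sum(pi[j] * P[j][i] for j in range(3)) for i in range(3)]

# Equation (7.24): P_i = pi_i mu_i / sum_j pi_j mu_j
total = sum(p * m for p, m in zip(pi, mu))
proportions = [p * m / total for p, m in zip(pi, mu)]
```

Power iteration recovers π = (4/15, 1/3, 2/5), and the weighted proportions come out to (5/9, 5/18, 1/6), matching the example.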


7.7. The Inspection Paradox

Suppose that a piece of equipment, say, a battery, is installed and serves until it breaks down. Upon failure it is instantly replaced by a like battery, and this process continues without interruption. Letting N(t) denote the number of batteries that have failed by time t, we have that {N(t), t ≥ 0} is a renewal process.

Suppose further that the distribution F of the lifetime of a battery is not known and is to be estimated by the following sampling inspection scheme. We fix some time t and observe the total lifetime of the battery that is in use at time t. Since F is the distribution of the lifetime for all batteries, it seems reasonable that it should be the distribution for this battery. However, this is the inspection paradox, for it turns out that the battery in use at time t tends to have a larger lifetime than an ordinary battery.

To understand the preceding so-called paradox, we reason as follows. In renewal theoretic terms, what we are interested in is the length of the renewal interval containing the point t. That is, we are interested in X_{N(t)+1} = S_{N(t)+1} − S_{N(t)} (see Figure 7.2). To calculate the distribution of X_{N(t)+1} we condition on the time of the last renewal prior to (or at) time t. That is,

    P{X_{N(t)+1} > x} = E[P{X_{N(t)+1} > x | S_{N(t)} = t − s}]

where we recall (Figure 7.2) that S_{N(t)} is the time of the last renewal prior to (or at) t. Since there are no renewals between t − s and t, it follows that X_{N(t)+1} must be larger than x if s > x. That is,

    P{X_{N(t)+1} > x | S_{N(t)} = t − s} = 1  if s > x    (7.25)

On the other hand, suppose that s ≤ x. As before, we know that a renewal occurred at time t − s and no additional renewals occurred between t − s and t, and we ask for the probability that no renewals occur for an additional time x − s. That is, we are asking for the probability that an interarrival time will be greater than x given that it is greater than s.
Therefore, for s ≤ x,

    P{X_{N(t)+1} > x | S_{N(t)} = t − s}
        = P{interarrival time > x | interarrival time > s}
        = P{interarrival time > x} / P{interarrival time > s}
        = (1 − F(x)) / (1 − F(s))
        ≥ 1 − F(x)    (7.26)


Hence, from Equations (7.25) and (7.26) we see that, for all s,

    P{X_{N(t)+1} > x | S_{N(t)} = t − s} ≥ 1 − F(x)

Taking expectations on both sides yields that

    P{X_{N(t)+1} > x} ≥ 1 − F(x)    (7.27)

However, 1 − F(x) is the probability that an ordinary renewal interval is larger than x, that is, 1 − F(x) = P{X_n > x}, and thus Equation (7.27) is a statement of the inspection paradox: the length of the renewal interval containing the point t tends to be larger than an ordinary renewal interval.

Remark To obtain an intuitive feel for the so-called inspection paradox, reason as follows. We think of the whole line being covered by renewal intervals, one of which covers the point t. Is it not more likely that a larger interval, as opposed to a shorter interval, covers the point t?

We can explicitly calculate the distribution of X_{N(t)+1} when the renewal process is a Poisson process. [Note that, in the general case, we did not need to calculate explicitly P{X_{N(t)+1} > x} to show that it was at least as large as 1 − F(x).] To do so we write

    X_{N(t)+1} = A(t) + Y(t)

where A(t) denotes the time from t since the last renewal, and Y(t) denotes the time from t until the next renewal (see Figure 7.4).

Figure 7.4.

A(t) is the age of the process at time t (in our example it would be the age at time t of the battery in use at time t), and Y(t) is the excess life of the process at time t (it is the additional time from t until the battery fails). Of course, it is true that A(t) = t − S_{N(t)} and Y(t) = S_{N(t)+1} − t.

To calculate the distribution of X_{N(t)+1} we first note the important fact that, for a Poisson process, A(t) and Y(t) are independent. This follows since, by the memoryless property of the Poisson process, the time from t until the next renewal will be exponentially distributed and will be independent of all that has previously occurred [including, in particular, A(t)]. In fact, this shows that if {N(t), t ≥ 0}


is a Poisson process with rate λ, then

    P{Y(t) ≤ x} = 1 − e^{−λx}    (7.28)

The distribution of A(t) may be obtained as follows:

    P{A(t) > x} = P{0 renewals in [t − x, t]} = e^{−λx}, if x ≤ t
                  0,                                     if x > t

or, equivalently,

    P{A(t) ≤ x} = 1 − e^{−λx}, x ≤ t
                  1,           x > t    (7.29)

Hence, by the independence of Y(t) and A(t), the distribution of X_{N(t)+1} is just the convolution of the exponential distribution (7.28) and the distribution (7.29). It is interesting to note that for t large, A(t) approximately has an exponential distribution. Thus, for t large, X_{N(t)+1} has the distribution of the convolution of two identically distributed exponential random variables, which, by Section 5.2.3, is the gamma distribution with parameters (2, λ). In particular, for t large, the expected length of the renewal interval containing the point t is approximately twice the expected length of an ordinary renewal interval.

Using the results obtained in Examples 7.16 and 7.17 concerning the average values of the age and of the excess, it follows from the identity

    X_{N(t)+1} = A(t) + Y(t)

that the average length of the renewal interval containing a specified point is

    lim_{s→∞} (1/s) ∫_0^s X_{N(t)+1} dt = E[X^2] / E[X]

where X has the interarrival distribution. Because, except for when X is a constant, E[X^2] > (E[X])^2, this average value is, as expected from the inspection paradox, greater than the expected value of an ordinary renewal interval.
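The Poisson-process conclusion is easy to see numerically. The sketch below (not from the text) fixes an inspection time t = 10 for a rate-1 Poisson process and records, over many runs, the length of the renewal interval covering t; the mean should be near 2/λ = 2, twice the ordinary mean interarrival time 1/λ = 1:

```python
import random

# Sketch: inspection paradox for a rate-1 Poisson process.  For each
# run, generate renewal epochs past the fixed time t_inspect and record
# the length of the interval containing it; for t large the mean is
# approximately 2/lambda (the mean of a gamma(2, lambda)).
random.seed(5)

lam, t_inspect, runs = 1.0, 10.0, 100_000
total = 0.0
for _ in range(runs):
    s_prev, s_next = 0.0, random.expovariate(lam)
    while s_next <= t_inspect:          # advance until an epoch passes t
        s_prev = s_next
        s_next += random.expovariate(lam)
    total += s_next - s_prev            # interval covering t_inspect

mean_covering = total / runs
```

That the sampled mean is about 2 rather than 1 is exactly the size-biased sampling the Remark describes: longer intervals are more likely to cover the fixed point.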


We can use an alternating renewal process argument to determine the long-run proportion of time that X_{N(t)+1} is greater than c. To do so, let a cycle correspond to a renewal interval, and say that the system is on at time t if the renewal interval containing t is of length greater than c (that is, if X_{N(t)+1} > c), and say that the system is off at time t otherwise. In other words, the system is always on during a cycle if the cycle time exceeds c, or is always off during the cycle if the cycle time is less than c. Thus, if X is the cycle time, we have

    on time in cycle = X, if X > c
                       0, if X ≤ c

Therefore, we obtain from alternating renewal process theory that

    long-run proportion of time X_{N(t)+1} > c = E[on time in cycle] / E[cycle time]
        = ∫_c^∞ x f(x) dx / μ

where f is the density function of an interarrival.

7.8. Computing the Renewal Function

The difficulty with attempting to use the identity

    m(t) = Σ_{n=1}^∞ F_n(t)

to compute the renewal function is that the determination of F_n(t) = P{X_1 + ··· + X_n ≤ t} requires the computation of an n-dimensional integral. In what follows, we present an effective algorithm which requires as inputs only one-dimensional integrals.

Let Y be an exponential random variable having rate λ, and suppose that Y is independent of the renewal process {N(t), t ≥ 0}. We start by determining E[N(Y)], the expected number of renewals by the random time Y. To do so, we first condition on X_1, the time of the first renewal. This yields

    E[N(Y)] = ∫_0^∞ E[N(Y) | X_1 = x] f(x) dx    (7.30)

where f is the interarrival density. To determine E[N(Y) | X_1 = x], we now condition on whether or not Y exceeds x. Now, if Y < x, then since the first renewal occurs at time x, the number of renewals by time Y is 0.


On the other hand, if we are given that x < Y, then the number of renewals by time Y equals 1 (the renewal at time x) plus the number of additional renewals between x and Y. By the lack of memory property of the exponential, given that Y > x, the amount by which it exceeds x is also exponential with rate λ, and so given that Y > x the number of renewals between x and Y will have the same distribution as N(Y). Hence,

    E[N(Y) | X_1 = x, Y < x] = 0,
    E[N(Y) | X_1 = x, Y > x] = 1 + E[N(Y)]

and so,

    E[N(Y) | X_1 = x] = E[N(Y) | X_1 = x, Y < x] P{Y < x | X_1 = x}
                          + E[N(Y) | X_1 = x, Y > x] P{Y > x | X_1 = x}
                      = E[N(Y) | X_1 = x, Y > x] P{Y > x}  (since Y and X_1 are independent)
                      = (1 + E[N(Y)]) e^{−λx}

Substituting this back into Equation (7.30) gives

    E[N(Y)] = (1 + E[N(Y)]) ∫_0^∞ e^{−λx} f(x) dx

or

    E[N(Y)] = E[e^{−λX}] / (1 − E[e^{−λX}])    (7.31)

where X has the renewal interarrival distribution.

If we let λ = 1/t, then Equation (7.31) presents an expression for the expected number of renewals (not by time t, but) by a random exponentially distributed time with mean t. However, as such a random variable need not be close to its mean (its variance is t^2), Equation (7.31) need not be particularly close to m(t). To obtain an accurate approximation, suppose that Y_1, Y_2, ..., Y_n are independent exponentials with rate λ and suppose they are also independent of the renewal process. Let, for r = 1, ..., n,

    m_r = E[N(Y_1 + ··· + Y_r)]

To compute an expression for m_r, we again start by conditioning on X_1, the time of the first renewal:

    m_r = ∫_0^∞ E[N(Y_1 + ··· + Y_r) | X_1 = x] f(x) dx    (7.32)


To determine the foregoing conditional expectation, we now condition on the number of the partial sums $\sum_{i=1}^j Y_i$, $j = 1, \ldots, r$, that are less than $x$. Now, if all $r$ partial sums are less than $x$—that is, if $\sum_{i=1}^r Y_i < x$—then no renewals have occurred by time $\sum_{i=1}^r Y_i$, since the first renewal does not occur until time $x$. On the other hand, the $Y_i$ are the interarrival times of a Poisson process with rate $\lambda$, so the number of the partial sums that are less than $x$ is the number of Poisson events in $[0, x]$; and if $j < r$ of the partial sums are less than $x$, then, by the memoryless property, at time $x$ a renewal has just occurred and the remaining time of interest is distributed as $Y_1 + \cdots + Y_{r-j}$. Hence,

$$E\big[N(Y_1 + \cdots + Y_r) \mid X_1 = x,\ j \text{ partial sums are less than } x\big] = \begin{cases} 1 + m_{r-j}, & \text{if } j < r\\ 0, & \text{if } j = r\end{cases}$$

Because the number of Poisson events in $[0, x]$ is Poisson distributed with mean $\lambda x$, the preceding gives

$$E[N(Y_1 + \cdots + Y_r) \mid X_1 = x] = \sum_{j=0}^{r-1} (1 + m_{r-j})\, e^{-\lambda x}\frac{(\lambda x)^j}{j!}$$

and substituting this into Equation (7.32) yields the recursion

$$m_r = \sum_{j=0}^{r-1} (1 + m_{r-j}) \int_0^\infty e^{-\lambda x}\frac{(\lambda x)^j}{j!}\, f(x)\,dx$$

Starting with $m_1$, this recursion successively yields $m_2$, then $m_3$, and so on, with each step requiring only one-dimensional integrals. With $\lambda = n/t$, the sum $Y_1 + \cdots + Y_n$ has mean $t$ and variance $t^2/n$, and $m_n$ is used as an approximation to $m(t)$.
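The recursion above can be sketched in a few lines. This is an illustration under one assumption: for the density $f_1(x) = x e^{-x}$ of Example 7.29 the required integrals reduce, by a standard gamma integral, to $\int_0^\infty e^{-\lambda x}\frac{(\lambda x)^j}{j!}\,x e^{-x}\,dx = (j+1)\lambda^j/(1+\lambda)^{j+2}$.

```python
def approx_renewal_fn(t, n, a):
    """Approximate m(t) by m_n = E[N(Y_1 + ... + Y_n)], where the Y_i are
    independent exponentials with rate lam = n/t, via the recursion
        m_r = sum_{j=0}^{r-1} (1 + m_{r-j}) * a(j, lam),
    a(j, lam) being the integral of e^{-lam x} (lam x)^j / j! f(x) dx."""
    lam = n / t
    m = [0.0]                       # placeholder so that m[r] = m_r
    for r in range(1, n + 1):
        s = sum((1.0 + m[r - j]) * a(j, lam) for j in range(1, r))
        a0 = a(0, lam)
        m.append((a0 + s) / (1.0 - a0))   # the j = 0 term contains m_r
    return m[n]

# Closed-form integrals for f_1(x) = x e^{-x} (assumed, see lead-in):
a_f1 = lambda j, lam: (j + 1) * lam**j / (1 + lam) ** (j + 2)

m3 = approx_renewal_fn(t=1.0, n=3, a=a_f1)
```

The value `m3` reproduces the $n = 3$, $t = 1$ entry $0.3040$ of Table 7.1.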


Table 7.1    Approximating m(t)

                          Approximation
 i    t    Exact m(t)   n = 1     n = 3     n = 10    n = 25    n = 50
 1    1      0.2838     0.3333    0.3040    0.2903    0.2865    0.2852
 1    2      0.7546     0.8000    0.7697    0.7586    0.7561    0.7553
 1    5      2.250      2.273     2.253     2.250     2.250     2.250
 1   10      4.75       4.762     4.751     4.750     4.750     4.750
 2    0.1    0.1733     0.1681    0.1687    0.1689    0.1690    —
 2    0.3    0.5111     0.4964    0.4997    0.5010    0.5014    —
 2    0.5    0.8404     0.8182    0.8245    0.8273    0.8281    0.8283
 2    1      1.6400     1.6087    1.6205    1.6261    1.6277    1.6283
 2    3      4.7389     4.7143    4.7294    4.7350    4.7363    4.7367
 2   10     15.5089    15.5000   15.5081   15.5089   15.5089   15.5089
 3    0.1    0.2819     0.2692    0.2772    0.2804    0.2813    —
 3    0.3    0.7638     0.7105    0.7421    0.7567    0.7609    —
 3    1      2.0890     2.0000    2.0556    2.0789    2.0850    2.0870
 3    3      5.4444     5.4000    5.4375    5.4437    5.4442    5.4443

By choosing $n$ large, $\sum_{i=1}^n Y_i$ will be a random variable having most of its probability concentrated about $t$, and so $E[N(\sum_{i=1}^n Y_i)]$ should be quite close to $E[N(t)]$. [Indeed, if $m(t)$ is continuous at $t$, it can be shown that these approximations converge to $m(t)$ as $n$ goes to infinity.]

Example 7.29 Table 7.1 compares the approximation with the exact value for the distributions $F_i$ with densities $f_i$, $i = 1, 2, 3$, which are given by

$$f_1(x) = x e^{-x},$$
$$1 - F_2(x) = 0.3 e^{-x} + 0.7 e^{-2x},$$
$$1 - F_3(x) = 0.5 e^{-x} + 0.5 e^{-5x}$$

7.9. Applications to Patterns

A counting process with independent interarrival times $X_1, X_2, \ldots$ is said to be a delayed or general renewal process if $X_1$ has a different distribution from the identically distributed random variables $X_2, X_3, \ldots$. That is, a delayed renewal process is a renewal process in which the first interarrival time has a different distribution than the others. Delayed renewal processes often arise in


practice and it is important to note that all of the limiting theorems about $N(t)$, the number of events by time $t$, remain valid. For instance, it remains true that

$$\frac{E[N(t)]}{t} \to \frac{1}{\mu} \quad\text{and}\quad \frac{\operatorname{Var}(N(t))}{t} \to \sigma^2/\mu^3 \quad\text{as } t \to \infty$$

where $\mu$ and $\sigma^2$ are the expected value and variance of the interarrivals $X_i$, $i > 1$.

7.9.1. Patterns of Discrete Random Variables

Let $X_1, X_2, \ldots$ be independent with $P\{X_i = j\} = p(j)$, $j \ge 0$, and let $T$ denote the first time the pattern $x_1, \ldots, x_r$ occurs. If we say that a renewal occurs at time $n$, $n \ge r$, if $(X_{n-r+1}, \ldots, X_n) = (x_1, \ldots, x_r)$, then $N(n)$, $n \ge 1$, is a delayed renewal process, where $N(n)$ denotes the number of renewals by time $n$. It follows that

$$\frac{E[N(n)]}{n} \to \frac{1}{\mu} \quad\text{as } n \to \infty \tag{7.36}$$

$$\frac{\operatorname{Var}(N(n))}{n} \to \frac{\sigma^2}{\mu^3} \quad\text{as } n \to \infty \tag{7.37}$$

where $\mu$ and $\sigma$ are, respectively, the mean and standard deviation of the time between successive renewals. Whereas, in Section 3.6.4, we showed how to compute the expected value of $T$, we will now show how to use renewal theory results to compute both its mean and its variance.

To begin, let $I(i)$ equal 1 if there is a renewal at time $i$ and let it be 0 otherwise, $i \ge r$. Also, let $p = \prod_{i=1}^r p(x_i)$. Since

$$P\{I(i) = 1\} = P\{X_{i-r+1} = x_1, \ldots, X_i = x_r\} = p$$

it follows that $I(i)$, $i \ge r$, are Bernoulli random variables with parameter $p$. Now,

$$N(n) = \sum_{i=r}^n I(i)$$

so

$$E[N(n)] = \sum_{i=r}^n E[I(i)] = (n - r + 1)p$$

Dividing by $n$ and then letting $n \to \infty$ gives, from Equation (7.36),

$$\mu = 1/p \tag{7.38}$$


That is, the mean time between successive occurrences of the pattern is equal to $1/p$. Also,

$$\begin{aligned}
\frac{\operatorname{Var}(N(n))}{n} &= \frac{1}{n}\sum_{i=r}^n \operatorname{Var}(I(i)) + \frac{2}{n}\sum_{i=r}^{n-1}\sum_{j>i} \operatorname{Cov}(I(i), I(j))\\
&= \frac{n-r+1}{n}\, p(1-p) + \frac{2}{n}\sum_{i=r}^{n-1}\ \sum_{j:\, i<j<i+r} \operatorname{Cov}(I(i), I(j))
\end{aligned}$$

where the final equality used the fact that $I(i)$ and $I(j)$ are independent, and thus have zero covariance, when $|i - j| \ge r$. Letting $n \to \infty$, and using the fact that $\operatorname{Cov}(I(i), I(j))$ depends on $i$ and $j$ only through $|j - i|$, gives

$$\frac{\operatorname{Var}(N(n))}{n} \to p(1-p) + 2\sum_{j=1}^{r-1}\operatorname{Cov}(I(r), I(r+j))$$

Therefore, using Equation (7.37) and the fact that $\mu = 1/p$, we see that

$$\operatorname{Var}(T) = \sigma^2 = p^{-2}(1-p) + 2p^{-3}\sum_{j=1}^{r-1}\operatorname{Cov}(I(r), I(r+j)) \tag{7.39}$$

Let us now consider the amount of "overlap" in the pattern. The overlap, equal to the number of values at the end of the pattern that can be used as the beginning part of the next occurrence, is said to be of size $k$, $k > 0$, if

$$k = \max\{j < r : (x_{r-j+1}, \ldots, x_r) = (x_1, \ldots, x_j)\}$$

and is of size 0 if no such $j$ exists.

Case 1: The Overlap Is of Size 0

In this case, since a new occurrence of the pattern cannot make use of any part of the previous occurrence, the time until the first renewal has the same distribution as the time between successive renewals. Hence, we obtain


from Equation (7.38):

$$E[T] = \mu = \frac{1}{p} \tag{7.40}$$

Also, since two patterns cannot occur within a distance less than $r$ of each other, it follows that $I(r)I(r+j) = 0$ when $1 \le j \le r-1$. Hence,

$$\operatorname{Cov}(I(r), I(r+j)) = -E[I(r)]E[I(r+j)] = -p^2, \qquad \text{if } 1 \le j \le r-1$$

Hence, from Equation (7.39), we obtain

$$\operatorname{Var}(T) = \sigma^2 = p^{-2}(1-p) - 2p^{-3}(r-1)p^2 = p^{-2} - (2r-1)p^{-1} \tag{7.41}$$

Remark In cases of "rare" patterns, if the pattern hasn't yet occurred by some time $n$, then it would seem that we would have no reason to believe that the remaining time would be much less than if we were just beginning from scratch. That is, it would seem that the distribution is approximately memoryless and would thus be approximately exponentially distributed. Thus, since the variance of an exponential is equal to its mean squared, we would expect when $\mu$ is large that $\operatorname{Var}(T) \approx E^2[T]$, and this is borne out by the preceding, which states that $\operatorname{Var}(T) = E^2[T] - (2r-1)E[T]$.

Example 7.30 Suppose we are interested in the number of times that a fair coin needs to be flipped before the pattern $h, h, t, h, t$ occurs. For this pattern, $r = 5$, $p = \frac{1}{32}$, and the overlap is 0. Hence, from Equations (7.40) and (7.41),

$$E[T] = 32, \qquad \operatorname{Var}(T) = 32^2 - 9 \times 32 = 736$$

and

$$\operatorname{Var}(T)/E^2[T] = 0.71875$$

On the other hand, if $p(i) = i/10$, $i = 1, 2, 3, 4$, and the pattern is $1, 2, 1, 4, 1, 3, 2$, then $r = 7$, $p = 3/625{,}000$, and the overlap is 0. Thus, again from Equations (7.40) and (7.41), we see that in this case

$$E[T] = 208{,}333.33, \qquad \operatorname{Var}(T) = 4.34 \times 10^{10}, \qquad \operatorname{Var}(T)/E^2[T] = 0.99994$$
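For overlap-0 patterns, Equations (7.40) and (7.41) can be evaluated directly; the minimal sketch below (an illustration, not from the text) reproduces Example 7.30.

```python
from math import prod

def pattern_moments_no_overlap(pattern, prob):
    """Equations (7.40) and (7.41): mean and variance of the time T until
    a pattern with overlap 0 first occurs in i.i.d. draws, where
    prob[s] = P{X = s}."""
    r = len(pattern)
    p = prod(prob[s] for s in pattern)
    ET = 1.0 / p                       # (7.40)
    VarT = p**-2 - (2 * r - 1) / p     # (7.41)
    return ET, VarT

# Example 7.30: fair coin, pattern h, h, t, h, t (overlap 0).
ET, VarT = pattern_moments_no_overlap("hhtht", {"h": 0.5, "t": 0.5})
# E[T] = 32, Var(T) = 736, Var(T)/E[T]^2 = 0.71875
```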


Case 2: The Overlap Is of Size k

In this case,

$$T = T_{i_1, \ldots, i_k} + T^*$$

where $T_{i_1, \ldots, i_k}$ is the time until the pattern $i_1, \ldots, i_k$ appears and $T^*$, distributed as an interarrival time of the renewal process, is the additional time that it takes, starting with $i_1, \ldots, i_k$, to obtain the pattern $i_1, \ldots, i_r$. Because these random variables are independent, we have

$$E[T] = E[T_{i_1, \ldots, i_k}] + E[T^*] \tag{7.42}$$

$$\operatorname{Var}(T) = \operatorname{Var}(T_{i_1, \ldots, i_k}) + \operatorname{Var}(T^*) \tag{7.43}$$

Now, from Equation (7.38),

$$E[T^*] = \mu = p^{-1} \tag{7.44}$$

Also, since no two renewals can occur within a distance $r - k - 1$ of each other, it follows that $I(r)I(r+j) = 0$ if $1 \le j \le r - k - 1$. Therefore, from Equation (7.39), we see that

$$\begin{aligned}
\operatorname{Var}(T^*) = \sigma^2 &= p^{-2}(1-p) + 2p^{-3}\left(\sum_{j=r-k}^{r-1} E[I(r)I(r+j)] - (r-1)p^2\right)\\
&= p^{-2} - (2r-1)p^{-1} + 2p^{-3}\sum_{j=r-k}^{r-1} E[I(r)I(r+j)] \tag{7.45}
\end{aligned}$$

The quantities $E[I(r)I(r+j)]$ in Equation (7.45) can be calculated by considering the particular pattern. To complete the calculation of the first two moments of $T$, we then compute the mean and variance of $T_{i_1, \ldots, i_k}$ by repeating the same method.

Example 7.31 Suppose that we want to determine the number of flips of a fair coin until the pattern $h, h, t, h, h$ occurs. For this pattern, $r = 5$, $p = \frac{1}{32}$, and the overlap parameter is $k = 2$. Because

$$E[I(5)I(8)] = P\{h, h, t, h, h, t, h, h\} = \frac{1}{256}$$

$$E[I(5)I(9)] = P\{h, h, t, h, h, h, t, h, h\} = \frac{1}{512}$$

we see from Equations (7.44) and (7.45) that

$$E[T^*] = 32,$$
$$\operatorname{Var}(T^*) = (32)^2 - 9(32) + 2(32)^3\left(\frac{1}{256} + \frac{1}{512}\right) = 1120$$


Hence, from Equations (7.42) and (7.43) we obtain

$$E[T] = E[T_{h,h}] + 32, \qquad \operatorname{Var}(T) = \operatorname{Var}(T_{h,h}) + 1120$$

Now, consider the pattern $h, h$. It has $r = 2$, $p = \frac{1}{4}$, and overlap parameter 1. Since, for this pattern, $E[I(2)I(3)] = \frac{1}{8}$, we obtain, as in the preceding, that

$$E[T_{h,h}] = E[T_h] + 4,$$
$$\operatorname{Var}(T_{h,h}) = \operatorname{Var}(T_h) + 16 - 3(4) + 2(64)\left(\tfrac{1}{8}\right) = \operatorname{Var}(T_h) + 20$$

Finally, for the pattern $h$, which has $r = 1$, $p = \frac{1}{2}$, we see from Equations (7.40) and (7.41) that

$$E[T_h] = 2, \qquad \operatorname{Var}(T_h) = 2$$

Putting it all together gives

$$E[T] = 38, \qquad \operatorname{Var}(T) = 1142, \qquad \operatorname{Var}(T)/E^2[T] = 0.79086$$

Example 7.32 Suppose that $P\{X_n = i\} = p_i$, and consider the pattern $0, 1, 2, 0, 1, 3, 0, 1$. Then $p = p_0^3 p_1^3 p_2 p_3$, $r = 8$, and the overlap parameter is $k = 2$. Since

$$E[I(8)I(14)] = p_0^5 p_1^5 p_2^2 p_3^2,$$
$$E[I(8)I(15)] = 0$$

we see from Equations (7.42) and (7.44) that

$$E[T] = E[T_{0,1}] + p^{-1}$$

and from Equations (7.43) and (7.45) that

$$\operatorname{Var}(T) = \operatorname{Var}(T_{0,1}) + p^{-2} - 15p^{-1} + 2p^{-1}(p_0 p_1)^{-1}$$

Now, the $r$ and $p$ values of the pattern $0, 1$ are $r(0,1) = 2$, $p(0,1) = p_0 p_1$, and this pattern has overlap 0. Hence, from Equations (7.40) and (7.41),

$$E[T_{0,1}] = (p_0 p_1)^{-1}, \qquad \operatorname{Var}(T_{0,1}) = (p_0 p_1)^{-2} - 3(p_0 p_1)^{-1}$$

For instance, if $p_i = 0.2$, $i = 0, 1, 2, 3$, then

$$E[T] = 25 + 5^8 = 390{,}650$$
$$\operatorname{Var}(T) = 625 - 75 + 5^{16} + 35 \times 5^8 = 1.526 \times 10^{11}$$
$$\operatorname{Var}(T)/E^2[T] = 0.99996$$
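The Case 1 / Case 2 analysis above is naturally recursive: if the overlap is 0, use (7.40) and (7.41); otherwise add the moments of $T^*$ to those of the prefix $i_1, \ldots, i_k$. The sketch below (an illustration, not from the text) carries this out and reproduces Example 7.31.

```python
from math import prod

def overlap(pat):
    """Largest j < r with (last j values of pat) == (first j values)."""
    r = len(pat)
    return next((j for j in range(r - 1, 0, -1) if pat[r - j:] == pat[:j]), 0)

def pattern_moments(pat, prob):
    """Mean and variance of the time until `pat` first occurs in i.i.d.
    draws with P{X = s} = prob[s], via Equations (7.40)-(7.45)."""
    r = len(pat)
    p = prod(prob[s] for s in pat)
    mean_star = 1.0 / p                       # (7.44), or (7.40) if k = 0
    var_star = p**-2 - (2 * r - 1) / p        # first part of (7.45)/(7.41)
    k = overlap(pat)
    if k == 0:
        return mean_star, var_star
    # E[I(r)I(r+j)], j = r-k, ..., r-1: nonzero only when the two
    # occurrences are compatible, i.e., pat[j:] == pat[:r-j].
    cross = sum(
        p * prod(prob[s] for s in pat[r - j:])
        for j in range(r - k, r)
        if pat[j:] == pat[: r - j]
    )
    var_star += 2 * p**-3 * cross             # completes (7.45)
    m_pre, v_pre = pattern_moments(pat[:k], prob)
    return m_pre + mean_star, v_pre + var_star  # (7.42), (7.43)

# Example 7.31: fair coin, pattern h, h, t, h, h.
ET, VarT = pattern_moments("hhthh", {"h": 0.5, "t": 0.5})
# E[T] = 38, Var(T) = 1142
```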


Remark It can be shown that $T$ is a type of discrete random variable called new better than used (NBU), which loosely means that if the pattern has not yet occurred by some time $n$, then the additional time until it occurs tends to be less than the time it would take the pattern to occur if one started all over at that point. Such a random variable is known to satisfy (see Proposition 9.6.1 of Ref. [4])

$$\operatorname{Var}(T) \le E^2[T] - E[T] \le E^2[T]$$

Now, suppose that there are $s$ patterns, $A(1), \ldots, A(s)$, and that we are interested in the mean time until one of these patterns occurs, as well as the probability mass function of the one that occurs first. Let us assume, without any loss of generality, that none of the patterns is contained in any of the others. [That is, we rule out such trivial cases as $A(1) = h, h$ and $A(2) = h, h, t$.] To determine the quantities of interest, let $T(i)$ denote the time until pattern $A(i)$ occurs, $i = 1, \ldots, s$, and let $T(i,j)$ denote the additional time, starting with the occurrence of pattern $A(i)$, until pattern $A(j)$ occurs, $i \ne j$. Start by computing the expected values of these random variables. We have already shown how to compute $E[T(i)]$, $i = 1, \ldots, s$. To compute $E[T(i,j)]$, use the same approach, taking into account any "overlap" between the latter part of $A(i)$ and the beginning part of $A(j)$. For instance, suppose $A(1) = 0, 0, 1, 2, 0, 3$ and $A(2) = 2, 0, 3, 2, 0$. Then

$$T(2) = T_{2,0,3} + T(1, 2)$$

where $T_{2,0,3}$ is the time to obtain the pattern $2, 0, 3$. Hence,

$$E[T(1,2)] = E[T(2)] - E[T_{2,0,3}] = \left(p_2^2 p_0^2 p_3\right)^{-1} + (p_0 p_2)^{-1} - (p_2 p_0 p_3)^{-1}$$

So, suppose now that all of the quantities $E[T(i)]$ and $E[T(i,j)]$ have been computed. Let

$$M = \min_i T(i)$$

and let

$$P(i) = P\{M = T(i)\}, \qquad i = 1, \ldots, s$$

That is, $P(i)$ is the probability that pattern $A(i)$ is the first pattern to occur. Now, for each $j$ we will derive an equation that $E[T(j)]$ satisfies as follows:

$$E[T(j)] = E[M] + E[T(j) - M] = E[M] + \sum_{i: i \ne j} E[T(i,j)]P(i), \qquad j = 1, \ldots, s \tag{7.46}$$


where the final equality is obtained by conditioning on which pattern occurs first. But Equations (7.46), along with the equation

$$\sum_{i=1}^s P(i) = 1$$

constitute a set of $s + 1$ equations in the $s + 1$ unknowns $E[M]$, $P(i)$, $i = 1, \ldots, s$. Solving them yields the desired quantities.

Example 7.33 Suppose that we continually flip a fair coin. With $A(1) = h, t, t, h, h$ and $A(2) = h, h, t, h, t$, we have

$$E[T(1)] = 32 + E[T_h] = 34$$
$$E[T(2)] = 32$$
$$E[T(1,2)] = E[T(2)] - E[T_{h,h}] = 32 - (4 + E[T_h]) = 26$$
$$E[T(2,1)] = E[T(1)] - E[T_{h,t}] = 34 - 4 = 30$$

Hence, we need to solve the equations

$$34 = E[M] + 30P(2)$$
$$32 = E[M] + 26P(1)$$
$$1 = P(1) + P(2)$$

These equations are easily solved, and yield the values

$$P(1) = P(2) = \tfrac{1}{2}, \qquad E[M] = 19$$

Note that although the mean time for pattern $A(2)$ is less than that for $A(1)$, each has the same chance of occurring first.

Equations (7.46) are easy to solve when there are no overlaps in any of the patterns. In this case, for all $i \ne j$,

$$E[T(i,j)] = E[T(j)]$$

so Equations (7.46) reduce to

$$E[T(j)] = E[M] + (1 - P(j))E[T(j)]$$

or

$$P(j) = E[M]/E[T(j)]$$
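The $s + 1$ linear equations of (7.46) are easily solved numerically. A sketch (an illustration, not from the text; the simple Gaussian-elimination routine is mine) checked against Example 7.33:

```python
def first_pattern_probs(ET, ETij):
    """Solve Equations (7.46) together with sum_i P(i) = 1 for E[M] and
    the P(i), given ET[j] = E[T(j)] and ETij[i][j] = E[T(i,j)] (the
    diagonal is unused). Unknowns ordered [E[M], P(1), ..., P(s)]."""
    s = len(ET)
    rows = []
    for j in range(s):  # E[M] + sum_{i != j} E[T(i,j)] P(i) = E[T(j)]
        row = [1.0] + [ETij[i][j] if i != j else 0.0 for i in range(s)]
        rows.append(row + [ET[j]])
    rows.append([0.0] + [1.0] * s + [1.0])     # sum_i P(i) = 1
    n = s + 1
    for col in range(n):                        # Gauss-Jordan elimination
        piv = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(n):
            if r != col and rows[r][col] != 0.0:
                f = rows[r][col] / rows[col][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    sol = [rows[r][n] / rows[r][r] for r in range(n)]
    return sol[0], sol[1:]                      # E[M], [P(1), ..., P(s)]

# Example 7.33: A(1) = h,t,t,h,h and A(2) = h,h,t,h,t.
EM, P = first_pattern_probs(ET=[34.0, 32.0],
                            ETij=[[0.0, 26.0], [30.0, 0.0]])
# EM = 19, P = [0.5, 0.5]
```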


Summing $P(j) = E[M]/E[T(j)]$ over all $j$ yields

$$E[M] = \frac{1}{\sum_{j=1}^s 1/E[T(j)]} \tag{7.47}$$

$$P(j) = \frac{1/E[T(j)]}{\sum_{j=1}^s 1/E[T(j)]} \tag{7.48}$$

In our next example we use the preceding to reanalyze the model of Example 7.8.

Example 7.34 Suppose that each play of a game is, independently of the outcomes of previous plays, won by player $i$ with probability $p_i$, $i = 1, \ldots, s$. Suppose further that there are specified numbers $n(1), \ldots, n(s)$ such that the first player $i$ to win $n(i)$ consecutive plays is declared the winner of the match. Find the expected number of plays until there is a winner, and also the probability that the winner is $i$, $i = 1, \ldots, s$.

Solution: Letting $A(i)$, for $i = 1, \ldots, s$, denote the pattern of $n(i)$ consecutive values of $i$, this problem asks for $P(i)$, the probability that pattern $A(i)$ occurs first, and for $E[M]$. Because

$$E[T(i)] = (1/p_i)^{n(i)} + (1/p_i)^{n(i)-1} + \cdots + 1/p_i = \frac{1 - p_i^{n(i)}}{p_i^{n(i)}(1 - p_i)}$$

we obtain, from Equations (7.47) and (7.48), that

$$E[M] = \frac{1}{\sum_{j=1}^s \left[p_j^{n(j)}(1 - p_j)/\left(1 - p_j^{n(j)}\right)\right]}$$

$$P(i) = \frac{p_i^{n(i)}(1 - p_i)/\left(1 - p_i^{n(i)}\right)}{\sum_{j=1}^s \left[p_j^{n(j)}(1 - p_j)/\left(1 - p_j^{n(j)}\right)\right]}$$

7.9.2. The Expected Time to a Maximal Run of Distinct Values

Let $X_i$, $i \ge 1$, be independent and identically distributed random variables that are equally likely to take on any of the values $1, 2, \ldots, m$. Suppose that these random variables are observed sequentially, and let $T$ denote the first time that a run of $m$ consecutive values includes all the values $1, \ldots, m$. That is,

$$T = \min\{n : X_{n-m+1}, \ldots, X_n \text{ are all distinct}\}$$
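Returning to the match of Example 7.34, its closed-form answers are a one-liner to evaluate. A sketch (an illustration, not from the text), with a sanity check: for two evenly matched players each needing 2 consecutive wins, $E[T(i)] = 6$, so $E[M] = 3$ and each wins the match with probability $\tfrac12$.

```python
def consecutive_wins_match(p, n):
    """Example 7.34 via Equations (7.47)/(7.48): player i wins each play
    with probability p[i]; the first to win n[i] consecutive plays wins
    the match. Returns (E[M], [P(1), ..., P(s)])."""
    # 1/E[T(i)] = p_i^{n_i} (1 - p_i) / (1 - p_i^{n_i})
    inv_ET = [pi**ni * (1 - pi) / (1 - pi**ni) for pi, ni in zip(p, n)]
    EM = 1.0 / sum(inv_ET)
    return EM, [w * EM for w in inv_ET]

EM, probs = consecutive_wins_match([0.5, 0.5], [2, 2])
# EM = 3, probs = [0.5, 0.5]
```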


To compute $E[T]$, define a renewal process by letting the first renewal occur at time $T$. At this point start over and, without using any of the data values up to $T$, let the next renewal occur the next time a run of $m$ consecutive values are all distinct, and so on. For instance, if $m = 3$ and the data are

$$1, 3, 3, 2, 1, 2, 3, 2, 1, 3, \ldots \tag{7.49}$$

then there are two renewals by time 10, with the renewals occurring at times 5 and 9. We call the sequence of $m$ distinct values that constitutes a renewal a renewal run.

Let us now transform the renewal process into a delayed renewal reward process by supposing that a reward of 1 is earned at time $n$, $n \ge m$, if the values $X_{n-m+1}, \ldots, X_n$ are all distinct. That is, a reward is earned each time the previous $m$ data values are all distinct. For instance, if $m = 3$ and the data values are as in (7.49), then unit rewards are earned at times 5, 7, 9, and 10. If we let $R_i$ denote the reward earned at time $i$, then by Proposition 7.3,

$$\lim_n \frac{E\left[\sum_{i=1}^n R_i\right]}{n} = \frac{E[R]}{E[T]} \tag{7.50}$$

where $R$ is the reward earned between renewal epochs. Now, with $A_i$ equal to the set of the first $i$ data values of a renewal run, and $B_i$ to the set of the first $i$ values following this renewal run, we have the following:

$$\begin{aligned}
E[R] &= 1 + \sum_{i=1}^{m-1} E[\text{reward earned a time } i \text{ after a renewal}]\\
&= 1 + \sum_{i=1}^{m-1} P\{A_i = B_i\}\\
&= 1 + \sum_{i=1}^{m-1} \frac{i!}{m^i}\\
&= \sum_{i=0}^{m-1} \frac{i!}{m^i} \tag{7.51}
\end{aligned}$$

Hence, since for $i \ge m$

$$E[R_i] = P\{X_{i-m+1}, \ldots, X_i \text{ are all distinct}\} = \frac{m!}{m^m}$$


it follows from Equation (7.50) that

$$\frac{m!}{m^m} = \frac{E[R]}{E[T]}$$

Thus, from Equation (7.51) we obtain

$$E[T] = \frac{m^m}{m!}\sum_{i=0}^{m-1} i!/m^i$$

The preceding delayed renewal reward process approach also gives us another way of computing the expected time until a specified pattern appears. We illustrate by the following example.

Example 7.35 Compute $E[T]$, the expected time until the pattern $h, h, h, t, h, h, h$ appears, when a coin that comes up heads with probability $p$ and tails with probability $q = 1 - p$ is continually flipped.

Solution: Define a renewal process by letting the first renewal occur when the pattern first appears, and then start over. Also, say that a reward of 1 is earned whenever the pattern appears. If $R$ is the reward earned between renewal epochs, we have

$$\begin{aligned}
E[R] &= 1 + \sum_{i=1}^6 E[\text{reward earned } i \text{ units after a renewal}]\\
&= 1 + 0 + 0 + 0 + p^3 q + p^3 q p + p^3 q p^2
\end{aligned}$$

Hence, since the expected reward earned at time $i$ is $E[R_i] = p^6 q$, we obtain the following from the renewal reward theorem:

$$\frac{1 + qp^3 + qp^4 + qp^5}{E[T]} = qp^6$$

or

$$E[T] = q^{-1}p^{-6} + p^{-3} + p^{-2} + p^{-1}$$

7.9.3. Increasing Runs of Continuous Random Variables

Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed continuous random variables, and let $T$ denote the first time that there is a string of $r$


consecutive increasing values. That is,

$$T = \min\{n \ge r : X_{n-r+1} < X_{n-r+2} < \cdots < X_n\}$$
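Both of the run-type expectations in this section are trivial to evaluate numerically. The sketch below (an illustration, not from the text) uses the distinct-values formula $E[T] = (m^m/m!)\sum_{i=0}^{m-1} i!/m^i$ derived above, and the increasing-runs series $E[T] = 1/\sum_{i\ge 1} ir/(ir+1)!$ that is obtained in what follows; for $m = 3$ the first gives exactly 7, and for $r = 2$ the series sums to $1/e$, recovering the well-known answer $E[T] = e$ for the waiting time until the first increase.

```python
from math import factorial, e

def expected_time_all_distinct(m):
    """Section 7.9.2: E[T] = (m^m / m!) * sum_{i=0}^{m-1} i!/m^i, the
    expected time until m consecutive values of an i.i.d. uniform{1,...,m}
    sequence are all distinct."""
    return m**m / factorial(m) * sum(factorial(i) / m**i for i in range(m))

def expected_time_increasing_run(r, terms=20):
    """Section 7.9.3: E[T] = 1 / sum_{i>=1} i*r/(i*r + 1)!, the expected
    time until r consecutive increasing values appear; the series
    converges so fast that a handful of terms suffice."""
    return 1.0 / sum(i * r / factorial(i * r + 1) for i in range(1, terms + 1))

ET_distinct = expected_time_all_distinct(3)     # (27/6)(1 + 1/3 + 2/9) = 7
ET_increase = expected_time_increasing_run(2)   # = e
```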


From the preceding, we see that

$$\lim_{k\to\infty} P\{\text{a renewal occurs at time } k\} = \sum_{i=1}^\infty \frac{ir}{(ir+1)!}$$

However,

$$E[N(n)] = \sum_{k=1}^n P\{\text{a renewal occurs at time } k\}$$

Because we can show that for any numbers $a_k$, $k \ge 1$, for which $\lim_{k\to\infty} a_k$ exists,

$$\lim_{n\to\infty}\frac{\sum_{k=1}^n a_k}{n} = \lim_{k\to\infty} a_k$$

we obtain from the preceding, upon using the elementary renewal theorem, that

$$E[T] = \frac{1}{\sum_{i=1}^\infty ir/(ir+1)!}$$

7.10. The Insurance Ruin Problem

Suppose that claims are made to an insurance firm according to a Poisson process with rate $\lambda$, and that the successive claim amounts $Y_1, Y_2, \ldots$ are independent random variables having a common distribution function $F$ with density $f(x)$. Suppose also that the claim amounts are independent of the claim arrival times. Thus, if we let $M(t)$ be the number of claims made by time $t$, then $\sum_{i=1}^{M(t)} Y_i$ is the total amount paid out in claims by time $t$. Supposing that the firm starts with an initial capital $x$ and receives income at a constant rate $c$ per unit time, we are interested in the probability that the firm's net capital ever becomes negative; that is, we are interested in

$$R(x) = P\left\{\sum_{i=1}^{M(t)} Y_i > x + ct \text{ for some } t \ge 0\right\}$$

If the firm's capital ever becomes negative, we say that the firm is ruined; thus $R(x)$ is the probability of ruin given that the firm begins with an initial capital $x$.

Let $\mu = E[Y_i]$ be the mean claim amount, and let $\rho = \lambda\mu/c$. Because claims occur at rate $\lambda$, the long-run rate at which money is paid out is $\lambda\mu$. (A formal argument uses renewal reward processes. A new cycle begins when a claim occurs; the cost for the cycle is the claim amount, and so the long-run average cost


is $\mu$, the expected cost incurred in a cycle, divided by $1/\lambda$, the mean cycle time.) Because the rate at which money is received is $c$, it is clear that $R(x) = 1$ when $\rho > 1$. As $R(x)$ can be shown to also equal 1 when $\rho = 1$ (think of the recurrence of the symmetric random walk), we will suppose that $\rho < 1$.

By conditioning on the time and the amount of the first claim, it can be shown that $R$ satisfies the integro-differential equation

$$R'(x) = \frac{\lambda}{c}\left[R(x) - \int_0^x R(x-y)f(y)\,dy - \bar F(x)\right] \tag{7.52}$$

where $\bar F(x) = 1 - F(x)$. We will show that $R$ also satisfies

$$R(x) = \rho - \frac{\lambda}{c}\int_0^x \bar F(y)\,dy + \frac{\lambda}{c}\int_0^x R(x-y)\bar F(y)\,dy \tag{7.53}$$

To verify Equation (7.53), we will make use of the following lemma.


Lemma 7.5 For a function $k$, and a differentiable function $t$,

$$\frac{d}{dx}\int_0^x t(x-y)k(y)\,dy = t(0)k(x) + \int_0^x t'(x-y)k(y)\,dy$$

Differentiating both sides of Equation (7.53) gives, upon using the preceding lemma,

$$R'(x) = \frac{\lambda}{c}\left[R(0)\bar F(x) + \int_0^x R'(x-y)\bar F(y)\,dy - \bar F(x)\right] \tag{7.54}$$

Integration by parts [$u = \bar F(y)$, $dv = R'(x-y)\,dy$] shows that

$$\begin{aligned}
\int_0^x R'(x-y)\bar F(y)\,dy &= -\bar F(y)R(x-y)\Big|_0^x - \int_0^x R(x-y)f(y)\,dy\\
&= -\bar F(x)R(0) + R(x) - \int_0^x R(x-y)f(y)\,dy
\end{aligned}$$

Substituting this result back in Equation (7.54) gives Equation (7.52). Thus, we have established Equation (7.53).

To obtain a more usable expression for $R(x)$, consider a renewal process whose interarrival times $X_1, X_2, \ldots$ are distributed according to the equilibrium distribution of $F$. That is, the density function of the $X_i$ is

$$f_e(x) = F_e'(x) = \frac{\bar F(x)}{\mu}$$

Let $N(t)$ denote the number of renewals by time $t$, and let us derive an expression for

$$q(x) = E\left[\rho^{N(x)+1}\right]$$

Conditioning on $X_1$ gives

$$q(x) = \int_0^\infty E\left[\rho^{N(x)+1} \mid X_1 = y\right]\frac{\bar F(y)}{\mu}\,dy$$

Because, given that $X_1 = y$, the number of renewals by time $x$ is distributed as $1 + N(x-y)$ when $y \le x$, or is identically 0 when $y > x$, we see that

$$E\left[\rho^{N(x)+1} \mid X_1 = y\right] = \begin{cases} \rho E\left[\rho^{N(x-y)+1}\right], & \text{if } y \le x\\ \rho, & \text{if } y > x \end{cases}$$


Therefore, $q(x)$ satisfies

$$\begin{aligned}
q(x) &= \int_0^x \rho q(x-y)\frac{\bar F(y)}{\mu}\,dy + \rho\int_x^\infty \frac{\bar F(y)}{\mu}\,dy\\
&= \frac{\lambda}{c}\int_0^x q(x-y)\bar F(y)\,dy + \frac{\lambda}{c}\left[\int_0^\infty \bar F(y)\,dy - \int_0^x \bar F(y)\,dy\right]\\
&= \frac{\lambda}{c}\int_0^x q(x-y)\bar F(y)\,dy + \rho - \frac{\lambda}{c}\int_0^x \bar F(y)\,dy
\end{aligned}$$

where the second equality used that $\rho/\mu = \lambda/c$, and the third that $\int_0^\infty \bar F(y)\,dy = \mu$. Because $q(0) = \rho$, this is exactly the same equation that is satisfied by $R(x)$, namely Equation (7.53). Therefore, because the solution to (7.53) is unique, we obtain the following.

Proposition 7.6

$$R(x) = q(x) = E\left[\rho^{N(x)+1}\right]$$

Example 7.36 Suppose that the firm does not start with any initial capital. Then, because $N(0) = 0$, we see that the firm's probability of ruin is $R(0) = \rho$.

Example 7.37 If the claim distribution $F$ is exponential with mean $\mu$, then so is $F_e$. Hence, $N(x)$ is Poisson with mean $x/\mu$, giving the result

$$\begin{aligned}
R(x) = E\left[\rho^{N(x)+1}\right] &= \sum_{n=0}^\infty \rho^{n+1}e^{-x/\mu}(x/\mu)^n/n!\\
&= \rho e^{-x/\mu}\sum_{n=0}^\infty (\rho x/\mu)^n/n!\\
&= \rho e^{-x(1-\rho)/\mu}
\end{aligned}$$

To obtain some intuition about the ruin probability, let $T$ be independent of the interarrival times $X_i$ of the renewal process having interarrival distribution $F_e$, and let $T$ have probability mass function

$$P\{T = n\} = \rho^n(1-\rho), \qquad n = 0, 1, \ldots$$

Now consider $P\{\sum_{i=1}^T X_i > x\}$, the probability that the sum of the first $T$ of the $X_i$ exceeds $x$. Because $N(x) + 1$ is the first renewal that occurs after time $x$, we


have

$$N(x) + 1 = \min\left\{n : \sum_{i=1}^n X_i > x\right\}$$

Therefore, conditioning on the number of renewals by time $x$ gives

$$\begin{aligned}
P\left\{\sum_{i=1}^T X_i > x\right\} &= \sum_{j=0}^\infty P\left\{\sum_{i=1}^T X_i > x \,\Big|\, N(x) = j\right\}P\{N(x) = j\}\\
&= \sum_{j=0}^\infty P\{T \ge j+1 \mid N(x) = j\}P\{N(x) = j\}\\
&= \sum_{j=0}^\infty P\{T \ge j+1\}P\{N(x) = j\}\\
&= \sum_{j=0}^\infty \rho^{j+1}P\{N(x) = j\}\\
&= E\left[\rho^{N(x)+1}\right]
\end{aligned}$$

Consequently, $P\{\sum_{i=1}^T X_i > x\}$ is equal to the ruin probability. Now, as noted in Example 7.36, the ruin probability of a firm starting with 0 initial capital is $\rho$. Suppose that the firm starts with an initial capital $x$, and suppose for the moment that it is allowed to remain in business even if its capital becomes negative. Because the probability that the firm's capital ever falls below its initial starting amount $x$ is the same as the probability that its capital ever becomes negative when it starts with 0, this probability is also $\rho$. Thus, if we say that a low occurs whenever the firm's capital becomes lower than it has ever previously been, then the probability that a low ever occurs is $\rho$. Now, if a low does occur, then the probability that there will be another low is the probability that the firm's capital will ever fall below its previous low, and clearly this is also $\rho$. Therefore, each new low is the final one with probability $1 - \rho$. Consequently, the total number of lows that ever occur has the same distribution as $T$. In addition, if we let $W_i$ be the amount by which the $i$th low is less than the low preceding it, it is easy to see that $W_1, W_2, \ldots$ are independent and identically distributed, and are also independent of the number of lows. Because the minimal value over all time of the firm's capital (when it is allowed to remain in business even when its capital becomes negative) is $x - \sum_{i=1}^T W_i$, it follows that the ruin probability of a firm


that starts with an initial capital $x$ is

$$R(x) = P\left\{\sum_{i=1}^T W_i > x\right\}$$

Because

$$R(x) = E\left[\rho^{N(x)+1}\right] = P\left\{\sum_{i=1}^T X_i > x\right\}$$

we can identify $W_i$ with $X_i$. That is, we can conclude that each new low is lower than its predecessor by a random amount whose distribution is the equilibrium distribution of a claim amount.

Let us now give the proof of Lemma 7.5.

Proof of Lemma 7.5 Let $G(x) = \int_0^x t(x-y)k(y)\,dy$. Then

$$\begin{aligned}
G(x+h) - G(x) &= \int_0^{x+h} t(x+h-y)k(y)\,dy - G(x)\\
&= \int_x^{x+h} t(x+h-y)k(y)\,dy + \int_0^x t(x+h-y)k(y)\,dy - G(x)\\
&= \int_x^{x+h} t(x+h-y)k(y)\,dy + \int_0^x [t(x+h-y) - t(x-y)]k(y)\,dy
\end{aligned}$$

Dividing through by $h$ gives

$$\frac{G(x+h) - G(x)}{h} = \frac{1}{h}\int_x^{x+h} t(x+h-y)k(y)\,dy + \int_0^x \frac{t(x+h-y) - t(x-y)}{h}\,k(y)\,dy$$

Letting $h \to 0$ gives the result:

$$G'(x) = t(0)k(x) + \int_0^x t'(x-y)k(y)\,dy$$
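Proposition 7.6 makes the ruin probability directly computable whenever the distribution of $N(x)$ is known. A sketch (an illustration, not from the text) for the exponential-claims case of Example 7.37, where $N(x)$ is Poisson with mean $x/\mu$, comparing the direct summation with the closed form $R(x) = \rho e^{-x(1-\rho)/\mu}$:

```python
from math import exp

def ruin_probability_exponential(x, mu, rho, terms=100):
    """Proposition 7.6 with exponential claims: R(x) = E[rho^(N(x)+1)],
    N(x) Poisson with mean x/mu, evaluated by direct summation.
    Terms are built iteratively to avoid huge factorials."""
    lam = x / mu
    term = rho * exp(-lam)        # n = 0 term: rho^1 e^{-lam}
    total = term
    for n in range(1, terms):
        term *= rho * lam / n     # rho^(n+1) e^{-lam} lam^n / n!
        total += term
    return total

r_sum = ruin_probability_exponential(x=2.0, mu=1.0, rho=0.5)
r_closed = 0.5 * exp(-2.0 * (1 - 0.5) / 1.0)   # rho * e^{-x(1-rho)/mu}
```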


Exercises

1. Is it true that
(a) $N(t) < t$?
(b) $N(t) \ge n$ if and only if $S_n \le t$?
(c) $N(t) > n$ if and only if $S_n < t$?

$\ldots\ N = \min\{n : \cdots > 1\}$. What is $E[N]$?

*6. Consider a renewal process $\{N(t), t \ge 0\}$ having a gamma $(r, \lambda)$ interarrival distribution. That is, the interarrival density is

$$f(x) = \frac{\lambda e^{-\lambda x}(\lambda x)^{r-1}}{(r-1)!}, \qquad x > 0$$

(a) Show that

$$P\{N(t) \ge n\} = \sum_{i=nr}^\infty \frac{e^{-\lambda t}(\lambda t)^i}{i!}$$


(b) Show that

$$m(t) = \sum_{i=r}^\infty \left[\frac{i}{r}\right]\frac{e^{-\lambda t}(\lambda t)^i}{i!}$$

where $[i/r]$ is the largest integer less than or equal to $i/r$.

Hint: Use the relationship between the gamma $(r, \lambda)$ distribution and the sum of $r$ independent exponentials with rate $\lambda$, to define $N(t)$ in terms of a Poisson process with rate $\lambda$.

7. Mr. Smith works on a temporary basis. The mean length of each job he gets is three months. If the amount of time he spends between jobs is exponentially distributed with mean 2, then at what rate does Mr. Smith get new jobs?

*8. A machine in use is replaced by a new machine either when it fails or when it reaches the age of $T$ years. If the lifetimes of successive machines are independent with a common distribution $F$ having density $f$, show that
(a) the long-run rate at which machines are replaced equals

$$\left[\int_0^T x f(x)\,dx + T(1 - F(T))\right]^{-1}$$

(b) the long-run rate at which machines in use fail equals

$$\frac{F(T)}{\int_0^T x f(x)\,dx + T[1 - F(T)]}$$

9. A worker sequentially works on jobs. Each time a job is completed, a new one is begun. Each job, independently, takes a random amount of time having distribution $F$ to complete. However, independently of this, shocks occur according to a Poisson process with rate $\lambda$. Whenever a shock occurs, the worker discontinues working on the present job and starts a new one. In the long run, at what rate are jobs completed?

10. Consider a renewal process with mean interarrival time $\mu$. Suppose that each event of this process is independently "counted" with probability $p$. Let $N_C(t)$ denote the number of counted events by time $t$, $t > 0$.
(a) Is $N_C(t)$, $t \ge 0$ a renewal process?
(b) What is $\lim_{t\to\infty} N_C(t)/t$?


11. A renewal process for which the time until the initial renewal has a different distribution than the remaining interarrival times is called a delayed (or a general) renewal process. Prove that Proposition 7.1 remains valid for a delayed renewal process. (In general, it can be shown that all of the limit theorems for a renewal process remain valid for a delayed renewal process provided that the time until the first renewal has a finite mean.)

12. Events occur according to a Poisson process with rate $\lambda$. Any event that occurs within a time $d$ of the event that immediately preceded it is called a $d$-event. For instance, if $d = 1$ and events occur at times $2, 2.8, 4, 6, 6.6, \ldots$, then the events at times 2.8 and 6.6 would be $d$-events.
(a) At what rate do $d$-events occur?
(b) What proportion of all events are $d$-events?

13. Let $X_1, X_2, \ldots$ be a sequence of independent random variables. The nonnegative integer valued random variable $N$ is said to be a stopping time for the sequence if the event $\{N = n\}$ is independent of $X_{n+1}, X_{n+2}, \ldots$. The idea being that the $X_i$ are observed one at a time—first $X_1$, then $X_2$, and so on—and $N$ represents the number observed when we stop. Hence, the event $\{N = n\}$ corresponds to stopping after having observed $X_1, \ldots, X_n$ and thus must be independent of the values of random variables yet to come, namely, $X_{n+1}, X_{n+2}, \ldots$.
(a) Let $X_1, X_2, \ldots$ be independent with

$$P\{X_i = 1\} = p = 1 - P\{X_i = 0\}, \qquad i \ge 1$$

Define

$$N_1 = \min\{n : X_1 + \cdots + X_n = 5\}$$

$$N_2 = \begin{cases} 3, & \text{if } X_1 = 0\\ 5, & \text{if } X_1 = 1\end{cases}$$

$$N_3 = \begin{cases} 3, & \text{if } X_4 = 0\\ 2, & \text{if } X_4 = 1\end{cases}$$

Which of the $N_i$ are stopping times for the sequence $X_1, \ldots$? An important result, known as Wald's equation, states that if $X_1, X_2, \ldots$ are independent and identically distributed and have a finite mean $E[X]$, and if $N$ is a stopping time for this sequence having a finite mean, then

$$E\left[\sum_{i=1}^N X_i\right] = E[N]E[X]$$

To prove Wald's equation, let us define the indicator variables $I_i$, $i \ge 1$, by

$$I_i = \begin{cases} 1, & \text{if } i \le N\\ 0, & \text{if } i > N\end{cases}$$


(b) Show that

$$\sum_{i=1}^N X_i = \sum_{i=1}^\infty X_i I_i$$

From part (b) we see that

$$E\left[\sum_{i=1}^N X_i\right] = E\left[\sum_{i=1}^\infty X_i I_i\right] = \sum_{i=1}^\infty E[X_i I_i]$$

where the last equality assumes that the expectation can be brought inside the summation (as indeed can be rigorously proven in this case).
(c) Argue that $X_i$ and $I_i$ are independent.

Hint: $I_i$ equals 0 or 1 depending on whether or not we have yet stopped after observing which random variables?

(d) From part (c) we have

$$E\left[\sum_{i=1}^N X_i\right] = \sum_{i=1}^\infty E[X]E[I_i]$$

Complete the proof of Wald's equation.
(e) What does Wald's equation tell us about the stopping times in part (a)?

14. Wald's equation can be used as the basis of a proof of the elementary renewal theorem. Let $X_1, X_2, \ldots$ denote the interarrival times of a renewal process and let $N(t)$ be the number of renewals by time $t$.
(a) Show that whereas $N(t)$ is not a stopping time, $N(t) + 1$ is.

Hint: Note that

$$N(t) = n \Leftrightarrow X_1 + \cdots + X_n \le t \quad\text{and}\quad X_1 + \cdots + X_{n+1} > t$$

(b) Argue that

$$E\left[\sum_{i=1}^{N(t)+1} X_i\right] = \mu[m(t) + 1]$$


(c) Suppose that the $X_i$ are bounded random variables. That is, suppose there is a constant $M$ such that $P\{X_i < M\} = 1$.


(a) entering the bank?
(b) leaving the bank?
What if $F$ were exponential?

*18. Compute the renewal function when the interarrival distribution $F$ is such that

$$1 - F(t) = pe^{-\mu_1 t} + (1-p)e^{-\mu_2 t}$$

19. For the renewal process whose interarrival times are uniformly distributed over $(0, 1)$, determine the expected time from $t = 1$ until the next renewal.

20. For a renewal reward process consider

$$W_n = \frac{R_1 + R_2 + \cdots + R_n}{X_1 + X_2 + \cdots + X_n}$$

where $W_n$ represents the average reward earned during the first $n$ cycles. Show that $W_n \to E[R]/E[X]$ as $n \to \infty$.

21. Consider a single-server bank for which customers arrive in accordance with a Poisson process with rate $\lambda$. If a customer will enter the bank only if the server is free when he arrives, and if the service time of a customer has the distribution $G$, then what proportion of time is the server busy?

*22. The lifetime of a car has a distribution $H$ and probability density $h$. Ms. Jones buys a new car as soon as her old car either breaks down or reaches the age of $T$ years. A new car costs $C_1$ dollars and an additional cost of $C_2$ dollars is incurred whenever a car breaks down. Assuming that a $T$-year-old car in working order has an expected resale value $R(T)$, what is Ms. Jones' long-run average cost?

23. If $H$ is the uniform distribution over $(2, 8)$ and if $C_1 = 4$, $C_2 = 1$, and $R(T) = 4 - (T/2)$, then what value of $T$ minimizes Ms. Jones' long-run average cost in Exercise 22?

24. Wald's equation can also be proved by using renewal reward processes. Let $N$ be a stopping time for the sequence of independent and identically distributed random variables $X_i$, $i \ge 1$.
(a) Let $N_1 = N$. Argue that the sequence of random variables $X_{N_1+1}, X_{N_1+2}, \ldots$
is independent of $X_1, \ldots, X_N$ and has the same distribution as the original sequence $X_i$, $i \ge 1$.

Now treat $X_{N_1+1}, X_{N_1+2}, \ldots$ as a new sequence, and define a stopping time $N_2$ for this sequence that is defined exactly as $N_1$ is on the original sequence. (For instance, if $N_1 = \min\{n : X_n > 0\}$, then $N_2 = \min\{n : X_{N_1+n} > 0\}$.) Similarly, define a stopping time $N_3$ on the sequence $X_{N_1+N_2+1}, X_{N_1+N_2+2}, \ldots$ that is identically defined on this sequence as is $N_1$ on the original sequence, and so on.


(b) Is the reward process in which $X_i$ is the reward earned during period $i$ a renewal reward process? If so, what is the length of the successive cycles?
(c) Derive an expression for the average reward per unit time.
(d) Use the strong law of large numbers to derive a second expression for the average reward per unit time.
(e) Conclude Wald's equation.

25. Suppose in Example 7.13 that the arrival process is a Poisson process and suppose that the policy employed is to dispatch the train every $t$ time units.
(a) Determine the average cost per unit time.
(b) Show that the minimal average cost per unit time for such a policy is approximately $c/2$ plus the average cost per unit time for the best policy of the type considered in that example.

26. Consider a train station to which customers arrive in accordance with a Poisson process having rate $\lambda$. A train is summoned whenever there are $N$ customers waiting in the station, but it takes $K$ units of time for the train to arrive at the station. When it arrives, it picks up all waiting customers. Assuming that the train station incurs a cost at a rate of $nc$ per unit time whenever there are $n$ customers present, find the long-run average cost.

27. A machine consists of two independent components, the $i$th of which functions for an exponential time with rate $\lambda_i$. The machine functions as long as at least one of these components functions. (That is, it fails when both components have failed.) When a machine fails, a new machine having both its components working is put into use. A cost $K$ is incurred whenever a machine failure occurs; operating costs at rate $c_i$ per unit time are incurred whenever the machine in use has $i$ working components, $i = 1, 2$. Find the long-run average cost per unit time.

28. In Example 7.15, what proportion of the defective items produced is discovered?

29. Consider a single-server queueing system in which customers arrive in accordance with a renewal process.
Each customer brings in a random amount of work, chosen independently according to the distribution $G$. The server serves one customer at a time. However, the server processes work at rate $i$ per unit time whenever there are $i$ customers in the system. For instance, if a customer with workload 8 enters service when there are three other customers waiting in line, then if no one else arrives that customer will spend 2 units of time in service. If another customer arrives after 1 unit of time, then our customer will spend a total of 1.8 units of time in service provided no one else arrives.

Let $W_i$ denote the amount of time customer $i$ spends in the system. Also, define $E[W]$ by

$$E[W] = \lim_{n\to\infty}(W_1 + \cdots + W_n)/n$$

and so $E[W]$ is the average amount of time a customer spends in the system.


7 Renewal Theory and Its Applications

Let N denote the number of customers that arrive in a busy period.
(a) Argue that

E[W] = E[W_1 + ··· + W_N]/E[N]

Let L_i denote the amount of work customer i brings into the system; and so the L_i, i ≥ 1, are independent random variables having distribution G.
(b) Argue that at any time t, the sum of the times spent in the system by all arrivals prior to t is equal to the total amount of work processed by time t.

Hint: Consider the rate at which the server processes work.

(c) Argue that

∑_{i=1}^N W_i = ∑_{i=1}^N L_i

(d) Use Wald's equation (see Exercise 13) to conclude that

E[W] = μ

where μ is the mean of the distribution G. That is, the average time that customers spend in the system is equal to the average work they bring to the system.

*30. For a renewal process, let A(t) be the age at time t. Prove that if μ < ∞ then A(t)/t → 0 as t → ∞.

31. Let A(t) be the age and Y(t) the excess at time t of a renewal process. Find P{Y(t) > x | A(t) = s}.

32. Determine the long-run proportion of time that X_{N(t)+1} < c.


(a) Find the long-run average cost per unit time.
(b) Find the long-run proportion of time the system is being cleaned.

*35. Satellites are launched according to a Poisson process with rate λ. Each satellite will, independently, orbit the earth for a random time having distribution F. Let X(t) denote the number of satellites orbiting at time t.
(a) Determine P{X(t) = k}.

Hint: Relate this to the M/G/∞ queue.

(b) If at least one satellite is orbiting, then messages can be transmitted and we say that the system is functional. If the first satellite is orbited at time t = 0, determine the expected time that the system remains functional.

Hint: Make use of part (a) when k = 0.

36. Each of n skiers continually, and independently, climbs up and then skis down a particular slope. The time it takes skier i to climb up has distribution F_i, and it is independent of her time to ski down, which has distribution H_i, i = 1,...,n. Let N(t) denote the total number of times members of this group have skied down the slope by time t. Also, let U(t) denote the number of skiers climbing up the hill at time t.
(a) What is lim_{t→∞} N(t)/t?
(b) Find lim_{t→∞} E[U(t)].
(c) If all F_i are exponential with rate λ and all H_i are exponential with rate μ, what is P{U(t) = k}?

37. There are three machines, all of which are needed for a system to work. Machine i functions for an exponential time with rate λ_i before it fails, i = 1, 2, 3. When a machine fails, the system is shut down and repair begins on the failed machine. The time to fix machine 1 is exponential with rate 5; the time to fix machine 2 is uniform on (0, 4); and the time to fix machine 3 is a gamma random variable with parameters n = 3 and λ = 2.
Once a failed machine is repaired, it is as good as new and all machines are restarted.
(a) What proportion of time is the system working?
(b) What proportion of time is machine 1 being repaired?
(c) What proportion of time is machine 2 in a state of suspended animation (that is, neither working nor being repaired)?

38. A truck driver regularly drives round trips from A to B and then back to A. Each time he drives from A to B, he drives at a fixed speed that (in miles per hour) is uniformly distributed between 40 and 60; each time he drives from B to A, he drives at a fixed speed that is equally likely to be either 40 or 60.


(a) In the long run, what proportion of his driving time is spent going to B?
(b) In the long run, for what proportion of his driving time is he driving at a speed of 40 miles per hour?

39. A system consists of two independent machines, each of which functions for an exponential time with rate λ. There is a single repairperson. If the repairperson is idle when a machine fails, then repair immediately begins on that machine; if the repairperson is busy when a machine fails, then that machine must wait until the other machine has been repaired. All repair times are independent with distribution function G and, once repaired, a machine is as good as new. What proportion of time is the repairperson idle?

40. Three marksmen take turns shooting at a target. Marksman 1 shoots until he misses, then marksman 2 begins shooting until he misses, then marksman 3 until he misses, and then back to marksman 1, and so on. Each time marksman i fires he hits the target, independently of the past, with probability P_i, i = 1, 2, 3. Determine the proportion of time, in the long run, that each marksman shoots.

41. Each time a certain machine breaks down it is replaced by a new one of the same type. In the long run, what percentage of time is the machine in use less than one year old if the life distribution of a machine is
(a) uniformly distributed over (0, 2)?
(b) exponentially distributed with mean 1?

*42. For an interarrival distribution F having mean μ, we defined the equilibrium distribution of F, denoted F_e, by

F_e(x) = (1/μ) ∫_0^x [1 − F(y)] dy

(a) Show that if F is an exponential distribution, then F = F_e.
(b) If for some constant c, F is the distribution that equals 0 for x ≤ c and 1 for x > c, show that F_e is the uniform distribution on (0, c).


park your car in Berkeley and return after three hours, what is the probability you will have received a ticket?

43. Consider a renewal process having interarrival distribution F such that

F̄(x) = (1/2)e^{−x} + (1/2)e^{−x/2}, x > 0

That is, interarrivals are equally likely to be exponential with mean 1 or exponential with mean 2.
(a) Without any calculations, guess the equilibrium distribution F_e.
(b) Verify your guess in part (a).

44. An airport shuttle bus picks up all passengers waiting at a bus stop and drops them off at the airport terminal; it then returns to the stop and repeats the process. The times between returns to the stop are independent random variables with distribution F, mean μ, and variance σ². Passengers arrive at the bus stop in accordance with a Poisson process with rate λ. Suppose the bus has just left the stop, and let X denote the number of passengers it picks up when it returns.
(a) Find E[X].
(b) Find Var(X).
(c) At what rate does the shuttle bus arrive at the terminal without any passengers?
Suppose that each passenger that has to wait at the bus stop more than c time units writes an angry letter to the shuttle bus manager.
(d) What proportion of passengers write angry letters?
(e) How does your answer in part (d) relate to F_e(x)?

45. Consider a system that can be in either state 1 or 2 or 3. Each time the system enters state i it remains there for a random amount of time having mean μ_i and then makes a transition into state j with probability P_ij. Suppose

P_12 = 1, P_21 = P_23 = 1/2, P_31 = 1

(a) What proportion of transitions takes the system into state 1?
(b) If μ_1 = 1, μ_2 = 2, μ_3 = 3, then what proportion of time does the system spend in each state?

46. Consider a semi-Markov process in which the amount of time that the process spends in each state before making a transition into a different state is exponentially distributed. What kind of process is this?

47.
In a semi-Markov process, let t_ij denote the conditional expected time that the process spends in state i given that the next state is j.


(a) Present an equation relating μ_i to the t_ij.
(b) Show that the proportion of time the process is in i and will next enter j is equal to P_i P_ij t_ij / μ_i.

Hint: Say that a cycle begins each time state i is entered. Imagine that you receive a reward at a rate of 1 per unit time whenever the process is in i and heading for j. What is the average reward per unit time?

48. A taxi alternates between three different locations. Whenever it reaches location i, it stops and spends a random time having mean t_i before obtaining another passenger, i = 1, 2, 3. A passenger entering the cab at location i will want to go to location j with probability P_ij. The time to travel from i to j is a random variable with mean m_ij. Suppose that t_1 = 1, t_2 = 2, t_3 = 4, P_12 = 1, P_23 = 1, P_31 = 2/3 = 1 − P_32, m_12 = 10, m_23 = 20, m_31 = 15, m_32 = 25. Define an appropriate semi-Markov process and determine
(a) the proportion of time the taxi is waiting at location i, and
(b) the proportion of time the taxi is on the road from i to j, i, j = 1, 2, 3.

*49. Consider a renewal process having the gamma(n, λ) interarrival distribution, and let Y(t) denote the time from t until the next renewal. Use the theory of semi-Markov processes to show that

lim_{t→∞} P{Y(t) < x} = (1/n) ∑_{i=1}^n G_{i,λ}(x)

where G_{i,λ}(x) is the gamma(i, λ) distribution function.


51. In 1984 the country of Morocco, in an attempt to determine the average amount of time that tourists spend in that country on a visit, tried two different sampling procedures. In one, they questioned randomly chosen tourists as they were leaving the country; in the other, they questioned randomly chosen guests at hotels. (Each tourist stayed at a hotel.) The average visiting time of the 3000 tourists chosen from hotels was 17.8, whereas the average visiting time of the 12,321 tourists questioned at departure was 9.0. Can you explain this discrepancy? Does it necessarily imply a mistake?

52. Let X_i, i = 1, 2,..., be the interarrival times of the renewal process {N(t)}, and let Y, independent of the X_i, be exponential with rate λ.
(a) Use the lack of memory property of the exponential to argue that

P{X_1 + ··· + X_n < Y} = (P{X_1 < Y})^n

(b) Use part (a) to show that

E[N(Y)] = β/(1 − β)

where β = P{X_1 < Y} = E[e^{−λX_1}], λ > 0.


References

The results in Section 7.9.1 concerning the computation of the variance of the time until a specified pattern appears are new, as are the results of Section 7.9.2. The results of Section 7.9.3 are from Ref. [3].

1. D. R. Cox, "Renewal Theory," Methuen, London, 1962.
2. W. Feller, "An Introduction to Probability Theory and Its Applications," Vol. II, John Wiley, New York, 1966.
3. F. Hwang and D. Trietsch, "A Simple Relation Between the Pattern Probability and the Rate of False Signals in Control Charts," Probability in the Engineering and Informational Sciences, 10, 315–323, 1996.
4. S. Ross, "Stochastic Processes," Second Edition, John Wiley, New York, 1996.
5. H. C. Tijms, "Stochastic Models, An Algorithmic Approach," John Wiley, New York, 1994.


8. Queueing Theory

8.1. Introduction

In this chapter we will study a class of models in which customers arrive in some random manner at a service facility. Upon arrival they are made to wait in queue until it is their turn to be served. Once served they are generally assumed to leave the system. For such models we will be interested in determining, among other things, such quantities as the average number of customers in the system (or in the queue) and the average time a customer spends in the system (or spends waiting in the queue).

In Section 8.2 we derive a series of basic queueing identities which are of great use in analyzing queueing models. We also introduce three different sets of limiting probabilities which correspond to what an arrival sees, what a departure sees, and what an outside observer would see.

In Section 8.3 we deal with queueing systems in which all of the defining probability distributions are assumed to be exponential. For instance, the simplest such model is to assume that customers arrive in accordance with a Poisson process (and thus the interarrival times are exponentially distributed) and are served one at a time by a single server who takes an exponentially distributed length of time for each service. These exponential queueing models are special examples of continuous-time Markov chains and so can be analyzed as in Chapter 6. However, at the cost of a (very) slight amount of repetition we shall not assume that you are familiar with the material of Chapter 6, but rather we shall redevelop any needed material. Specifically we shall derive anew (by a heuristic argument) the formula for the limiting probabilities.

In Section 8.4 we consider models in which customers move randomly among a network of servers. The model of Section 8.4.1 is an open system in which customers are allowed to enter and depart the system, whereas the one studied


in Section 8.4.2 is closed in the sense that the set of customers in the system is constant over time.

In Section 8.5 we study the model M/G/1, which, while assuming Poisson arrivals, allows the service distribution to be arbitrary. To analyze this model we first introduce in Section 8.5.1 the concept of work, and then use this concept in Section 8.5.2 to help analyze this system. In Section 8.5.3 we derive the average amount of time that a server remains busy between idle periods.

In Section 8.6 we consider some variations of the model M/G/1. In particular, in Section 8.6.1 we suppose that bus loads of customers arrive according to a Poisson process and that each bus contains a random number of customers. In Section 8.6.2 we suppose that there are two different classes of customers, with type 1 customers receiving service priority over type 2.

In Section 8.6.3 we present an M/G/1 optimization example. We suppose that the server goes on break whenever she becomes idle, and then determine, under certain cost assumptions, the optimal time for her to return to service.

In Section 8.7 we consider a model with exponential service times but where the interarrival times between customers are allowed to have an arbitrary distribution. We analyze this model by use of an appropriately defined Markov chain. We also derive the mean length of a busy period and of an idle period for this model.

In Section 8.8 we consider a single-server system whose arrival process results from return visits of a finite number of possible sources. Assuming a general service distribution, we show how a Markov chain can be used to analyze this system.

In the final section of the chapter we talk about multiserver systems. We start with loss systems, in which arrivals finding all servers busy are assumed to depart and as such are lost to the system.
This leads to the famous result known as Erlang's loss formula, which presents a simple formula for the number of busy servers in such a model when the arrival process is Poisson and the service distribution is general. We then discuss multiserver systems in which queues are allowed. However, except in the case where exponential service times are assumed, there are very few explicit formulas for these models. We end by presenting an approximation for the average time a customer waits in queue in a k-server model which assumes Poisson arrivals but allows for a general service distribution.

8.2. Preliminaries

In this section we will derive certain identities which are valid in the great majority of queueing models.


8.2.1. Cost Equations

Some fundamental quantities of interest for queueing models are

L, the average number of customers in the system;
L_Q, the average number of customers waiting in queue;
W, the average amount of time a customer spends in the system;
W_Q, the average amount of time a customer spends waiting in queue.

A large number of interesting and useful relationships between the preceding and other quantities of interest can be obtained by making use of the following idea: Imagine that entering customers are forced to pay money (according to some rule) to the system. We would then have the following basic cost identity:

average rate at which the system earns
= λ_a × average amount an entering customer pays (8.1)

where λ_a is defined to be the average arrival rate of entering customers. That is, if N(t) denotes the number of customer arrivals by time t, then

λ_a = lim_{t→∞} N(t)/t

We now present a heuristic proof of Equation (8.1).

Heuristic Proof of Equation (8.1) Let T be a fixed large number. In two different ways, we will compute the average amount of money the system has earned by time T. On one hand, this quantity approximately can be obtained by multiplying the average rate at which the system earns by the length of time T. On the other hand, we can approximately compute it by multiplying the average amount paid by an entering customer by the average number of customers entering by time T (and this latter factor is approximately λ_a T). Hence, both sides of Equation (8.1) when multiplied by T are approximately equal to the average amount earned by T. The result then follows by letting T → ∞.∗

By choosing appropriate cost rules, many useful formulas can be obtained as special cases of Equation (8.1). For instance, by supposing that each customer pays $1 per unit time while in the system, Equation (8.1) yields the so-called

∗ This can be made into a rigorous proof provided we assume that the queueing process is regenerative in the sense of Section 7.5.
Most models, including all the ones in this chapter, satisfy this condition.


Little's formula,

L = λ_a W (8.2)

This follows since, under this cost rule, the rate at which the system earns is just the number in the system, and the amount a customer pays is just equal to its time in the system.

Similarly, if we suppose that each customer pays $1 per unit time while in queue, then Equation (8.1) yields

L_Q = λ_a W_Q (8.3)

By supposing the cost rule that each customer pays $1 per unit time while in service we obtain from Equation (8.1) that the

average number of customers in service = λ_a E[S] (8.4)

where E[S] is defined as the average amount of time a customer spends in service.

It should be emphasized that Equations (8.1) through (8.4) are valid for almost all queueing models regardless of the arrival process, the number of servers, or queue discipline.

8.2.2. Steady-State Probabilities

Let X(t) denote the number of customers in the system at time t and define P_n, n ≥ 0, by

P_n = lim_{t→∞} P{X(t) = n}

where we assume the preceding limit exists. In other words, P_n is the limiting or long-run probability that there will be exactly n customers in the system. It is sometimes referred to as the steady-state probability of exactly n customers in the system. It also usually turns out that P_n equals the (long-run) proportion of time that the system contains exactly n customers. For example, if P_0 = 0.3, then in the long run, the system will be empty of customers for 30 percent of the time. Similarly, P_1 = 0.2 would imply that for 20 percent of the time the system would contain exactly one customer.∗

∗ A sufficient condition for the validity of the dual interpretation of P_n is that the queueing process be regenerative.
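Little's formula can be illustrated with a small discrete-event simulation. This is a sketch, not from the text; the rates lam = 1, mu = 1.5 and the horizon T are arbitrary illustrative choices for a stable single-server exponential queue, and the time-average number in system L should come out close to λ_a W.

```python
import random

# Illustrative M/M/1 simulation checking Little's formula L = lambda_a * W.
random.seed(1)
lam, mu, T = 1.0, 1.5, 200_000.0

t = 0.0
in_system = []                     # arrival times of customers not yet departed
next_arrival = random.expovariate(lam)
next_departure = float('inf')
area = 0.0                         # integral of the number in system, for L
total_time = 0.0                   # summed sojourn times of departed customers
n_done = 0

while True:
    t_next = min(next_arrival, next_departure, T)
    area += len(in_system) * (t_next - t)
    t = t_next
    if t >= T:
        break
    if next_arrival <= next_departure:          # an arrival occurs
        in_system.append(t)
        if len(in_system) == 1:                 # server was idle: begin service
            next_departure = t + random.expovariate(mu)
        next_arrival = t + random.expovariate(lam)
    else:                                       # a departure occurs (FIFO)
        total_time += t - in_system.pop(0)
        n_done += 1
        next_departure = (t + random.expovariate(mu)) if in_system else float('inf')

L = area / T                       # time-average number in system
W = total_time / n_done            # average time a customer spends in the system
lambda_a = n_done / T              # departure rate, equal to the arrival rate here
print(L, lambda_a * W)             # the two sides of L = lambda_a * W, nearly equal
```

Note that the identity holds regardless of the exponential assumptions used here; they merely make the event scheduling simple.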


Two other sets of limiting probabilities are {a_n, n ≥ 0} and {d_n, n ≥ 0}, where

a_n = proportion of customers that find n in the system when they arrive, and
d_n = proportion of customers leaving behind n in the system when they depart

That is, P_n is the proportion of time during which there are n in the system; a_n is the proportion of arrivals that find n; and d_n is the proportion of departures that leave behind n. That these quantities need not always be equal is illustrated by the following example.

Example 8.1 Consider a queueing model in which all customers have service times equal to 1, and where the times between successive customers are always greater than 1 [for instance, the interarrival times could be uniformly distributed over (1, 2)]. Hence, as every arrival finds the system empty and every departure leaves it empty, we have

a_0 = d_0 = 1

However,

P_0 ≠ 1

as the system is not always empty of customers.

It was, however, no accident that a_n equaled d_n in the previous example. That arrivals and departures always see the same number of customers is always true, as is shown in the next proposition.

Proposition 8.1 In any system in which customers arrive and depart one at a time,

the rate at which arrivals find n = the rate at which departures leave n

and

a_n = d_n

Proof An arrival will see n in the system whenever the number in the system goes from n to n + 1; similarly, a departure will leave behind n whenever the number in the system goes from n + 1 to n. Now in any interval of time T the number of transitions from n to n + 1 must equal to within 1 the number from n + 1 to n. [Between any two transitions from n to n + 1, there must be one from n + 1 to n, and conversely.] Hence, the rate of transitions from n to n + 1 equals the rate from n + 1 to n; or, equivalently, the rate at which arrivals find n equals the rate at which departures leave n. Now a_n, the proportion of arrivals finding n,


can be expressed as

a_n = (rate at which arrivals find n) / (overall arrival rate)

Similarly,

d_n = (rate at which departures leave n) / (overall departure rate)

Thus, if the overall arrival rate is equal to the overall departure rate, then the preceding shows that a_n = d_n. On the other hand, if the overall arrival rate exceeds the overall departure rate, then the queue size will go to infinity, implying that a_n = d_n = 0.

Hence, on the average, arrivals and departures always see the same number of customers. However, as Example 8.1 illustrates, they do not, in general, see the time averages. One important exception where they do is in the case of Poisson arrivals.

Proposition 8.2 Poisson arrivals always see time averages. In particular, for Poisson arrivals,

P_n = a_n

To understand why Poisson arrivals always see time averages, consider an arbitrary Poisson arrival. If we knew that it arrived at time t, then the conditional distribution of what it sees upon arrival is the same as the unconditional distribution of the system state at time t. For knowing that an arrival occurs at time t gives us no information about what occurred prior to t. (Since the Poisson process has independent increments, knowing that an event occurred at time t does not affect the distribution of what occurred prior to t.) Hence, an arrival would just see the system according to the limiting probabilities.

Contrast the foregoing with the situation of Example 8.1 where knowing that an arrival occurred at time t tells us a great deal about the past; in particular it tells us that there have been no arrivals in (t − 1, t). Thus, in this case, we cannot conclude that the distribution of what an arrival at time t observes is the same as the distribution of the system state at time t.

For a second argument as to why Poisson arrivals see time averages, note that the total time the system is in state n by time T is (roughly) P_n T.
Hence, as Poisson arrivals always arrive at rate λ no matter what the system state, it follows that the number of arrivals in [0, T] that find the system in state n is (roughly) λP_n T. In the long run, therefore, the rate at which arrivals find the system in state n is λP_n and, as λ is the overall arrival rate, it follows that λP_n/λ = P_n is the proportion of arrivals that find the system in state n.

The result that Poisson arrivals see time averages is called the PASTA principle.
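The PASTA principle can be illustrated by simulation. The sketch below is not from the text; it uses illustrative rates lam = 1, mu = 2 for a single-server exponential queue and compares the long-run fraction of time the system is empty with the fraction of Poisson arrivals that find it empty. Both should be near 1 − λ/μ.

```python
import random

# Illustrative check of PASTA: arrivals' view vs. the time average of state 0.
random.seed(2)
lam, mu, T = 1.0, 2.0, 100_000.0

t, n = 0.0, 0                     # current time and number in system
next_arrival = random.expovariate(lam)
next_departure = float('inf')
empty_time = 0.0                  # total time with n = 0
arrivals_seen_empty = 0
arrivals_total = 0

while True:
    t_next = min(next_arrival, next_departure, T)
    if n == 0:
        empty_time += t_next - t
    t = t_next
    if t >= T:
        break
    if next_arrival <= next_departure:      # an arrival occurs
        arrivals_total += 1
        if n == 0:
            arrivals_seen_empty += 1
            next_departure = t + random.expovariate(mu)
        n += 1
        next_arrival = t + random.expovariate(lam)
    else:                                   # a departure occurs
        n -= 1
        next_departure = (t + random.expovariate(mu)) if n else float('inf')

# Time average of the empty state vs. the arrivals' observed proportion.
print(empty_time / T, arrivals_seen_empty / arrivals_total)  # both near 0.5
```

Repeating the experiment with the deterministic arrivals of Example 8.1 would show the two proportions disagreeing, which is the point of the contrast drawn above.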


8.3. Exponential Models

8.3.1. A Single-Server Exponential Queueing System

Suppose that customers arrive at a single-server service station in accordance with a Poisson process having rate λ. That is, the times between successive arrivals are independent exponential random variables having mean 1/λ. Each customer, upon arrival, goes directly into service if the server is free and, if not, the customer joins the queue. When the server finishes serving a customer, the customer leaves the system, and the next customer in line, if there is any, enters service. The successive service times are assumed to be independent exponential random variables having mean 1/μ.

The preceding is called the M/M/1 queue. The two Ms refer to the fact that both the interarrival and the service distributions are exponential (and thus memoryless, or Markovian), and the 1 to the fact that there is a single server. To analyze it, we shall begin by determining the limiting probabilities P_n, for n = 0, 1,.... To do so, think along the following lines. Suppose that we have an infinite number of rooms numbered 0, 1, 2,..., and suppose that we instruct an individual to enter room n whenever there are n customers in the system. That is, he would be in room 2 whenever there are two customers in the system; and if another were to arrive, then he would leave room 2 and enter room 3. Similarly, if a service would take place he would leave room 2 and enter room 1 (as there would now be only one customer in the system).

Now suppose that in the long run our individual is seen to have entered room 1 at the rate of ten times an hour. Then at what rate must he have left room 1? Clearly, at this same rate of ten times an hour. For the total number of times that he enters room 1 must be equal to (or one greater than) the total number of times he leaves room 1. This sort of argument thus yields the general principle which will enable us to determine the state probabilities.
Namely, for each n ≥ 0, the rate at which the process enters state n equals the rate at which it leaves state n. Let us now determine these rates. Consider first state 0. When in state 0 the process can leave only by an arrival, as clearly there cannot be a departure when the system is empty. Since the arrival rate is λ and the proportion of time that the process is in state 0 is P_0, it follows that the rate at which the process leaves state 0 is λP_0. On the other hand, state 0 can only be reached from state 1 via a departure. That is, if there is a single customer in the system and he completes service, then the system becomes empty. Since the service rate is μ and the proportion of time that the system has exactly one customer is P_1, it follows that the rate at which the process enters state 0 is μP_1.

Hence, from our rate-equality principle we get our first equation,

λP_0 = μP_1


Now consider state 1. The process can leave this state either by an arrival (which occurs at rate λ) or a departure (which occurs at rate μ). Hence, when in state 1, the process will leave this state at a rate of λ + μ.∗ Since the proportion of time the process is in state 1 is P_1, the rate at which the process leaves state 1 is (λ + μ)P_1. On the other hand, state 1 can be entered either from state 0 via an arrival or from state 2 via a departure. Hence, the rate at which the process enters state 1 is λP_0 + μP_2. Because the reasoning for other states is similar, we obtain the following set of equations:

State     Rate at which the process leaves = rate at which it enters
0         λP_0 = μP_1
n, n ≥ 1  (λ + μ)P_n = λP_{n−1} + μP_{n+1} (8.5)

The set of Equations (8.5), which balances the rate at which the process enters each state with the rate at which it leaves that state, is known as the balance equations.

In order to solve Equations (8.5), we rewrite them to obtain

P_1 = (λ/μ)P_0,
P_{n+1} = (λ/μ)P_n + (P_n − (λ/μ)P_{n−1}), n ≥ 1

Solving in terms of P_0 yields

P_0 = P_0,
P_1 = (λ/μ)P_0,
P_2 = (λ/μ)P_1 + (P_1 − (λ/μ)P_0) = (λ/μ)P_1 = (λ/μ)² P_0,
P_3 = (λ/μ)P_2 + (P_2 − (λ/μ)P_1) = (λ/μ)P_2 = (λ/μ)³ P_0,
P_4 = (λ/μ)P_3 + (P_3 − (λ/μ)P_2) = (λ/μ)P_3 = (λ/μ)⁴ P_0,
P_{n+1} = (λ/μ)P_n + (P_n − (λ/μ)P_{n−1}) = (λ/μ)P_n = (λ/μ)^{n+1} P_0

∗ If one event occurs at a rate λ and another occurs at rate μ, then the total rate at which either event occurs is λ + μ. Suppose one man earns $2 per hour and another earns $3 per hour; then together they clearly earn $5 per hour.
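The collapse of the recursion to P_n = (λ/μ)^n P_0 can be verified numerically. This is an illustrative sketch, not from the text; the rates lam, mu and the choice P_0 = 1 (normalization is handled later in the derivation) are arbitrary.

```python
# Iterate P_{n+1} = (lam/mu)P_n + (P_n - (lam/mu)P_{n-1}) and check that it
# reproduces P_n = (lam/mu)^n P_0, taking P_0 = 1 for illustration.
lam, mu = 1.0, 1.5
rho = lam / mu
P = [1.0, rho * 1.0]               # P_0 and P_1 = (lam/mu)P_0
for n in range(1, 20):
    P.append(rho * P[n] + (P[n] - rho * P[n - 1]))
print(all(abs(P[n] - rho ** n) < 1e-12 for n in range(21)))  # True
```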


To determine P_0 we use the fact that the P_n must sum to 1, and thus

1 = ∑_{n=0}^∞ P_n = P_0 ∑_{n=0}^∞ (λ/μ)^n = P_0 / (1 − λ/μ)

or

P_0 = 1 − λ/μ,
P_n = (λ/μ)^n (1 − λ/μ), n ≥ 1 (8.6)

Notice that for the preceding equations to make sense, it is necessary for λ/μ to be less than 1. For otherwise ∑_{n=0}^∞ (λ/μ)^n would be infinite and all the P_n would be 0. Hence, we shall assume that λ/μ < 1. Note that it is quite intuitive that there would be no limiting probabilities if λ > μ. For suppose that λ > μ. Since customers arrive at a Poisson rate λ, it follows that the expected total number of arrivals by time t is λt. On the other hand, what is the expected number of customers served by time t? If there were always customers present, then the number of customers served would be a Poisson process having rate μ, since the times between successive services would be independent exponentials having mean 1/μ. Hence, the expected number of customers served by time t is no greater than μt; and, therefore, the expected number in the system at time t is at least

λt − μt = (λ − μ)t

Now if λ > μ, then the preceding number goes to infinity as t becomes large. That is, when λ/μ > 1, the queue size increases without limit and there will be no limiting probabilities. Note also that the condition λ/μ < 1 is equivalent to the condition that the mean service time be less than the mean time between successive arrivals. This is the general condition that must be satisfied for limiting probabilities to exist in most single-server queueing systems.

Remarks (i) In solving the balance equations for the M/M/1 queue, we obtained as an intermediate step the set of equations

λP_n = μP_{n+1}, n ≥ 0

These equations could have been directly argued from the general queueing result (shown in Proposition 8.1) that the rate at which arrivals find n in the system, namely λP_n, is equal to the rate at which departures leave behind n, namely μP_{n+1}.


(ii) We can also prove that P_n = (λ/μ)^n (1 − λ/μ) by using a queueing cost identity. Suppose that, for a fixed n > 0, whenever there are at least n customers in the system the nth oldest customer (with age measured from when the customer arrived) pays 1 per unit time. Letting X be the steady-state number of customers in the system, because the system earns 1 per unit time whenever X is at least n, it follows that

average rate at which the system earns = P{X ≥ n}

Also, because a customer who finds fewer than n − 1 in the system when it arrives will pay 0, while an arrival who finds at least n − 1 in the system will pay 1 per unit time for an exponentially distributed time with rate μ,

average amount a customer pays = (1/μ) P{X ≥ n − 1}

Therefore, the queueing cost identity yields that

P{X ≥ n} = (λ/μ) P{X ≥ n − 1}, n > 0

Iterating this gives

P{X ≥ n} = (λ/μ) P{X ≥ n − 1}
= (λ/μ)² P{X ≥ n − 2}
= ···
= (λ/μ)^n P{X ≥ 0}
= (λ/μ)^n

Therefore,

P{X = n} = P{X ≥ n} − P{X ≥ n + 1} = (λ/μ)^n (1 − λ/μ)

Now let us attempt to express the quantities L, L_Q, W, and W_Q in terms of the limiting probabilities P_n. Since P_n is the long-run probability that the system contains exactly n customers, the average number of customers in the system clearly is given by

L = ∑_{n=0}^∞ n P_n = ∑_{n=0}^∞ n (λ/μ)^n (1 − λ/μ) = λ/(μ − λ) (8.7)


8.3. Exponential Models 503where the last equation followed upon application of the algebraic identity∞∑nx n x=(1 − x) 2n=0The quantities W,W Q , and L Q now can be obtained with the help of Equations(8.2) and (8.3). That is, since λ a = λ, we have from Equation (8.7) thatW = L λ= 1μ − λ ,W Q = W − E[S]= W − 1 μ=L Q = λW Qλμ(μ − λ) ,=λ 2μ(μ − λ)(8.8)Example 8.2 Suppose that customers arrive at a Poisson rate of one per every12 minutes, and that the service time is exponential at a rate of one service per8 minutes. What are L and W ?Solution:Since λ = 1 12 ,μ= 1 8 ,wehaveL = 2, W = 24Hence, the average number of customers in the system is two, and the averagetime a customer spends in the system is 24 minutes.Now suppose that the arrival rate increases 20 percent to λ =10 1 . What is thecorresponding change in L and W ? Again using Equations (8.7), we getL = 4, W = 40Hence, an increase of 20 percent in the arrival rate doubled the average numberof customers in the system.


To understand this better, write Equations (8.7) as

L = (λ/μ)/(1 − λ/μ),
W = (1/μ)/(1 − λ/μ)

From these equations we can see that when λ/μ is near 1, a slight increase in λ/μ will lead to a large increase in L and W.

A Technical Remark We have used the fact that if one event occurs at an exponential rate λ, and another independent event at an exponential rate μ, then together they occur at an exponential rate λ + μ. To check this formally, let T_1 be the time at which the first event occurs, and T_2 the time at which the second event occurs. Then

P{T_1 ≤ t} = 1 − e^{−λt},
P{T_2 ≤ t} = 1 − e^{−μt}

Now if we are interested in the time until either T_1 or T_2 occurs, then we are interested in T = min(T_1, T_2). Now

P{T ≤ t} = 1 − P{T > t} = 1 − P{min(T_1, T_2) > t}

However, min(T_1, T_2) > t if and only if both T_1 and T_2 are greater than t; hence,

P{T ≤ t} = 1 − P{T_1 > t, T_2 > t}
= 1 − P{T_1 > t} P{T_2 > t}
= 1 − e^{−λt} e^{−μt}
= 1 − e^{−(λ+μ)t}

Thus, T has an exponential distribution with rate λ + μ, and we are justified in adding the rates.

Given that an M/M/1 steady-state customer, that is, a customer who arrives after the system has been in operation a long time, spends a total of t time units in the system, let us determine the conditional distribution of N, the number of others that were present when that customer arrived. That is, letting W∗ be the


amount of time a customer spends in the system, we will find $P\{N = n \mid W^* = t\}$. Now,

$$P\{N = n \mid W^* = t\} = \frac{f_{N,W^*}(n, t)}{f_{W^*}(t)} = \frac{P\{N = n\}\, f_{W^*\mid N}(t \mid n)}{f_{W^*}(t)}$$

where $f_{W^*\mid N}(t\mid n)$ is the conditional density of $W^*$ given that $N = n$, and $f_{W^*}(t)$ is the unconditional density of $W^*$. Now, given that $N = n$, the time that the customer spends in the system is distributed as the sum of $n + 1$ independent exponential random variables with a common rate $\mu$, implying that the conditional distribution of $W^*$ given that $N = n$ is the gamma distribution with parameters $n + 1$ and $\mu$. Therefore, with $C = 1/f_{W^*}(t)$,

$$P\{N = n \mid W^* = t\} = C\,P\{N = n\}\,\mu e^{-\mu t}\frac{(\mu t)^n}{n!}$$
$$= C(\lambda/\mu)^n(1 - \lambda/\mu)\,\mu e^{-\mu t}\frac{(\mu t)^n}{n!} \qquad \text{(by PASTA)}$$
$$= K\frac{(\lambda t)^n}{n!}$$

where $K = C(1 - \lambda/\mu)\mu e^{-\mu t}$ does not depend on $n$. Summing over $n$ yields

$$1 = \sum_{n=0}^{\infty} P\{N = n \mid W^* = t\} = K\sum_{n=0}^{\infty}\frac{(\lambda t)^n}{n!} = Ke^{\lambda t}$$

Thus, $K = e^{-\lambda t}$, showing that

$$P\{N = n \mid W^* = t\} = e^{-\lambda t}\frac{(\lambda t)^n}{n!}$$

Therefore, the conditional distribution of the number seen by an arrival who spends a total of $t$ time units in the system is the Poisson distribution with mean $\lambda t$. In addition, as a by-product of our analysis, we have that

$$f_{W^*}(t) = \frac{1}{C} = \frac{1}{K}(1 - \lambda/\mu)\mu e^{-\mu t} = (\mu - \lambda)e^{-(\mu - \lambda)t}$$
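The preceding calculation is easy to confirm numerically. The following sketch (ours, not from the text) mixes the geometric distribution of $N$ with the gamma$(n+1, \mu)$ density at a fixed $t$, checks that the normalized weights form a Poisson mass function with mean $\lambda t$, and verifies that the normalizing constant recovers $f_{W^*}(t) = (\mu - \lambda)e^{-(\mu-\lambda)t}$:

```python
import math

lam, mu, t = 1.0, 2.0, 1.5   # arbitrary example rates with lam < mu

# unnormalized posterior weights: P{N = n} * f_{W*|N}(t | n)
w = [(lam / mu) ** n * (1 - lam / mu)
     * mu * math.exp(-mu * t) * (mu * t) ** n / math.factorial(n)
     for n in range(60)]
f_t = sum(w)                     # this sum is f_{W*}(t) (up to a negligible tail)
post = [x / f_t for x in w]      # conditional pmf of N given W* = t
poisson = [math.exp(-lam * t) * (lam * t) ** n / math.factorial(n) for n in range(60)]

max_err = max(abs(a - b) for a, b in zip(post, poisson))
print(max_err, f_t, (mu - lam) * math.exp(-(mu - lam) * t))
```

The posterior matches the Poisson$(\lambda t)$ mass function to floating-point accuracy.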


In other words, $W^*$, the amount of time a customer spends in the system, is an exponential random variable with rate $\mu - \lambda$. (As a check, we note that $E[W^*] = 1/(\mu - \lambda)$, which agrees with Equation (8.8) since $W = E[W^*]$.)

Remark Another argument as to why $W^*$ is exponential with rate $\mu - \lambda$ is as follows. If we let $N$ denote the number of customers in the system as seen by an arrival, then this arrival will spend $N + 1$ service times in the system before departing. Now,

$$P\{N + 1 = j\} = P\{N = j - 1\} = (\lambda/\mu)^{j-1}(1 - \lambda/\mu), \qquad j \ge 1$$

In words, the number of services that have to be completed before the arrival departs is a geometric random variable with parameter $1 - \lambda/\mu$. Therefore, after each service completion our customer will be the one departing with probability $1 - \lambda/\mu$. Thus, no matter how long the customer has already spent in the system, the probability he will depart in the next $h$ time units is $\mu h + o(h)$, the probability that a service ends in that time, multiplied by $1 - \lambda/\mu$. That is, the customer will depart in the next $h$ time units with probability $(\mu - \lambda)h + o(h)$, which says that the hazard rate function of $W^*$ is the constant $\mu - \lambda$. But only the exponential has a constant hazard rate, and so we can conclude that $W^*$ is exponential with rate $\mu - \lambda$.

Our next example illustrates the inspection paradox.

Example 8.3 For an M/M/1 queue in steady state, what is the probability that the next arrival finds $n$ in the system?

Solution: Although it might initially seem, by the PASTA principle, that this probability should just be $(\lambda/\mu)^n(1 - \lambda/\mu)$, we must be careful. For if $t$ is the current time, then the time from $t$ until the next arrival is exponentially distributed with rate $\lambda$, and is independent of the time from $t$ since the last arrival, which (in the limit, as $t$ goes to infinity) is also exponential with rate $\lambda$. Thus, although the times between successive arrivals of a Poisson process are exponential with rate $\lambda$, the time between the previous arrival before $t$ and the first arrival after $t$ is distributed as the sum of two independent exponentials. (This is an illustration of the inspection paradox, which results because the length of an interarrival interval that contains a specified time tends to be longer than an ordinary interarrival interval; see Section 7.7.)

Let $N_a$ denote the number found by the next arrival, and let $X$ be the number currently in the system. Conditioning on $X$ yields


$$P\{N_a = n\} = \sum_{k=0}^{\infty} P\{N_a = n \mid X = k\}P\{X = k\}$$
$$= \sum_{k=0}^{\infty} P\{N_a = n \mid X = k\}(\lambda/\mu)^k(1 - \lambda/\mu)$$
$$= \sum_{k=n}^{\infty} P\{N_a = n \mid X = k\}(\lambda/\mu)^k(1 - \lambda/\mu)$$
$$= \sum_{i=0}^{\infty} P\{N_a = n \mid X = n + i\}(\lambda/\mu)^{n+i}(1 - \lambda/\mu)$$

Now, for $n > 0$, given there are currently $n + i$ in the system, the next arrival will find $n$ if we have $i$ services before an arrival and then an arrival before the next service completion. By the lack of memory property of exponential interarrival random variables, this gives

$$P\{N_a = n \mid X = n + i\} = \left(\frac{\mu}{\lambda + \mu}\right)^i\frac{\lambda}{\lambda + \mu}, \qquad n > 0$$

Consequently, for $n > 0$,

$$P\{N_a = n\} = \sum_{i=0}^{\infty}\left(\frac{\mu}{\lambda + \mu}\right)^i\frac{\lambda}{\lambda + \mu}\left(\frac{\lambda}{\mu}\right)^{n+i}(1 - \lambda/\mu)$$
$$= (\lambda/\mu)^n\frac{\lambda}{\lambda + \mu}(1 - \lambda/\mu)\sum_{i=0}^{\infty}\left(\frac{\lambda}{\lambda + \mu}\right)^i$$
$$= (\lambda/\mu)^{n+1}(1 - \lambda/\mu)$$

On the other hand, the probability that the next arrival will find the system empty, when there are currently $i$ in the system, is the probability that there are $i$ services before the next arrival. Therefore, $P\{N_a = 0 \mid X = i\} = \left(\frac{\mu}{\lambda+\mu}\right)^i$, giving that


$$P\{N_a = 0\} = \sum_{i=0}^{\infty}\left(\frac{\mu}{\lambda + \mu}\right)^i\left(\frac{\lambda}{\mu}\right)^i(1 - \lambda/\mu)$$
$$= (1 - \lambda/\mu)\sum_{i=0}^{\infty}\left(\frac{\lambda}{\lambda + \mu}\right)^i$$
$$= (1 + \lambda/\mu)(1 - \lambda/\mu)$$

As a check, note that

$$\sum_{n=0}^{\infty} P\{N_a = n\} = (1 - \lambda/\mu)\left[1 + \lambda/\mu + \sum_{n=1}^{\infty}(\lambda/\mu)^{n+1}\right] = (1 - \lambda/\mu)\sum_{i=0}^{\infty}(\lambda/\mu)^i = 1$$

Note that $P\{N_a = 0\}$ is larger than $P_0 = 1 - \lambda/\mu$, showing that the next arrival is more likely to find an empty system than is an average arrival, and thus illustrating the inspection paradox that when the next customer arrives the elapsed time since the previous arrival is distributed as the sum of two independent exponentials with rate $\lambda$. Also, we might expect because of the inspection paradox that $E[N_a]$ is less than $L$, the average number of customers seen by an arrival. That this is indeed the case is seen from

$$E[N_a] = \sum_{n=1}^{\infty} n(\lambda/\mu)^{n+1}(1 - \lambda/\mu) = \frac{\lambda}{\mu}L$$
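These formulas for the distribution of $N_a$ are easy to sanity-check numerically. The following sketch (ours, not from the text) verifies for one choice of rates that the probabilities sum to 1 and that $E[N_a] = (\lambda/\mu)L$:

```python
lam, mu = 1.0, 2.0               # any rates with lam < mu will do
r = lam / mu

p0 = (1 + r) * (1 - r)                                   # P{N_a = 0}
tail = [r ** (n + 1) * (1 - r) for n in range(1, 400)]   # P{N_a = n}, n >= 1
total = p0 + sum(tail)

E_Na = sum(n * r ** (n + 1) * (1 - r) for n in range(1, 400))
L = lam / (mu - lam)             # average number in system for M/M/1
print(total, E_Na, r * L)        # total ~ 1, and E[N_a] equals (lam/mu) * L
```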


balance equations:

State | Rate at which the process leaves = rate at which it enters
$0$ | $\lambda P_0 = \mu P_1$
$1 \le n \le N - 1$ | $(\lambda + \mu)P_n = \lambda P_{n-1} + \mu P_{n+1}$
$N$ | $\mu P_N = \lambda P_{N-1}$

The argument for state 0 is exactly as before. Namely, when in state 0, the process will leave only via an arrival (which occurs at rate $\lambda$) and hence the rate at which the process leaves state 0 is $\lambda P_0$. On the other hand, the process can enter state 0 only from state 1 via a departure; hence, the rate at which the process enters state 0 is $\mu P_1$. The equation for states $n$, where $1 \le n \le N - 1$,


and hence from Equation (8.10) we obtain

$$P_n = \frac{(\lambda/\mu)^n(1 - \lambda/\mu)}{1 - (\lambda/\mu)^{N+1}}, \qquad n = 0, 1, \ldots, N \qquad (8.11)$$

Note that in this case, there is no need to impose the condition that $\lambda/\mu < 1$. The queue size is, by definition, bounded, so there is no possibility of its increasing indefinitely.

As before, $L$ may be expressed in terms of $P_n$ to yield

$$L = \sum_{n=0}^{N} nP_n = \frac{1 - \lambda/\mu}{1 - (\lambda/\mu)^{N+1}}\sum_{n=0}^{N} n\left(\frac{\lambda}{\mu}\right)^n$$

which after some algebra yields

$$L = \frac{\lambda\left[1 + N(\lambda/\mu)^{N+1} - (N + 1)(\lambda/\mu)^N\right]}{(\mu - \lambda)\left(1 - (\lambda/\mu)^{N+1}\right)} \qquad (8.12)$$

In deriving $W$, the expected amount of time a customer spends in the system, we must be a little careful about what we mean by a customer. Specifically, are we including those "customers" who arrive to find the system full and thus do not spend any time in the system? Or, do we just want the expected time spent in the system by a customer who actually entered the system? The two questions lead, of course, to different answers. In the first case, we have $\lambda_a = \lambda$; whereas in the second case, since the fraction of arrivals that actually enter the system is $1 - P_N$, it follows that $\lambda_a = \lambda(1 - P_N)$. Once it is clear what we mean by a customer, $W$ can be obtained from

$$W = \frac{L}{\lambda_a}$$

Example 8.4 Suppose that it costs $c\mu$ dollars per hour to provide service at a rate $\mu$. Suppose also that we incur a gross profit of $A$ dollars for each customer served. If the system has a capacity $N$, what service rate $\mu$ maximizes our total profit?

Solution: To solve this, suppose that we use rate $\mu$. Let us determine the amount of money coming in per hour and subtract from this the amount going out each hour. This will give us our profit per hour, and we can choose $\mu$ so as to maximize this.
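Before continuing with the example, Equations (8.11) and (8.12) can be checked numerically. This sketch (ours, not from the text; the parameter values are arbitrary) computes the $P_n$ directly and compares the closed-form $L$ of (8.12) with the direct sum $\sum_n nP_n$:

```python
lam, mu, N = 1.0, 1.25, 5    # an arbitrary finite-capacity example
r = lam / mu

P = [r ** n * (1 - r) / (1 - r ** (N + 1)) for n in range(N + 1)]   # Eq. (8.11)
L_direct = sum(n * p for n, p in enumerate(P))
L_closed = lam * (1 + N * r ** (N + 1) - (N + 1) * r ** N) / (
    (mu - lam) * (1 - r ** (N + 1)))                                 # Eq. (8.12)
print(sum(P), L_direct, L_closed)    # probabilities sum to 1; the two L's agree
```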


Now, potential customers arrive at a rate $\lambda$. However, a certain proportion of them do not join the system, namely, those who arrive when there are $N$ customers already in the system. Hence, since $P_N$ is the proportion of time that the system is full, it follows that entering customers arrive at a rate of $\lambda(1 - P_N)$. Since each customer pays $A$ dollars, it follows that money comes in at an hourly rate of $\lambda(1 - P_N)A$ and since it goes out at an hourly rate of $c\mu$, it follows that our total profit per hour is given by

$$\text{profit per hour} = \lambda(1 - P_N)A - c\mu$$
$$= \lambda A\left[1 - \frac{(\lambda/\mu)^N(1 - \lambda/\mu)}{1 - (\lambda/\mu)^{N+1}}\right] - c\mu$$
$$= \frac{\lambda A[1 - (\lambda/\mu)^N]}{1 - (\lambda/\mu)^{N+1}} - c\mu$$

For instance, if $N = 2$, $\lambda = 1$, $A = 10$, $c = 1$, then

$$\text{profit per hour} = \frac{10[1 - (1/\mu)^2]}{1 - (1/\mu)^3} - \mu = \frac{10(\mu^3 - \mu)}{\mu^3 - 1} - \mu$$

In order to maximize profit we differentiate to obtain

$$\frac{d}{d\mu}[\text{profit per hour}] = \frac{10(2\mu^3 - 3\mu^2 + 1)}{(\mu^3 - 1)^2} - 1$$

The value of $\mu$ that maximizes our profit now can be obtained by equating this derivative to zero and solving numerically.

In the previous two models, it has been quite easy to define the state of the system. Namely, it was defined as the number of people in the system. Now we shall consider some examples where a more detailed state space is necessary.

8.3.3. A Shoeshine Shop

Consider a shoeshine shop consisting of two chairs. Suppose that an entering customer first will go to chair 1. When his work is completed in chair 1, he will go either to chair 2 if that chair is empty or else wait in chair 1 until chair 2 becomes empty. Suppose that a potential customer will enter this shop as long as chair 1


is empty. (Thus, for instance, a potential customer might enter even if there is a customer in chair 2.)

If we suppose that potential customers arrive in accordance with a Poisson process at rate $\lambda$, and that the service times for the two chairs are independent and have respective exponential rates of $\mu_1$ and $\mu_2$, then

(a) what proportion of potential customers enters the system?
(b) what is the mean number of customers in the system?
(c) what is the average amount of time that an entering customer spends in the system?

To begin we must first decide upon an appropriate state space. It is clear that the state of the system must include more information than merely the number of customers in the system. For instance, it would not be enough to specify that there is one customer in the system as we would also have to know which chair he was in. Further, if we only know that there are two customers in the system, then we would not know if the man in chair 1 is still being served or if he is just waiting for the person in chair 2 to finish. To account for these points, the following state space, consisting of the five states $(0,0)$, $(1,0)$, $(0,1)$, $(1,1)$, and $(b,1)$, will be used. The states have the following interpretation:

State | Interpretation
$(0,0)$ | There are no customers in the system.
$(1,0)$ | There is one customer in the system, and he is in chair 1.
$(0,1)$ | There is one customer in the system, and he is in chair 2.
$(1,1)$ | There are two customers in the system, and both are presently being served.
$(b,1)$ | There are two customers in the system, but the customer in the first chair has completed his work in that chair and is waiting for the second chair to become free.

It should be noted that when the system is in state $(b,1)$, the person in chair 1, though not being served, is nevertheless "blocking" potential arrivals from entering the system.

As a prelude to writing down the balance equations, it is usually worthwhile to make a transition diagram. This is done by first drawing a circle for each state and then drawing an arrow labeled by the rate at which the process goes from one state to another. The transition diagram for this model is shown in Figure 8.1. The explanation for the diagram is as follows: The arrow from state $(0,0)$ to state $(1,0)$ which is labeled $\lambda$ means that when the process is in state $(0,0)$, that is, when the system is empty, then it goes to state $(1,0)$ at a rate $\lambda$, that is, via an arrival. The arrow from $(0,1)$ to $(1,1)$ is similarly explained.


Figure 8.1. A transition diagram.

When the process is in state $(1,0)$, it will go to state $(0,1)$ when the customer in chair 1 is finished, and this occurs at a rate $\mu_1$; hence the arrow from $(1,0)$ to $(0,1)$ labeled $\mu_1$. The arrow from $(1,1)$ to $(b,1)$ is similarly explained.

When in state $(b,1)$ the process will go to state $(0,1)$ when the customer in chair 2 completes his service (which occurs at rate $\mu_2$); hence the arrow from $(b,1)$ to $(0,1)$ labeled $\mu_2$. Also when in state $(1,1)$ the process will go to state $(1,0)$ when the man in chair 2 finishes, and hence the arrow from $(1,1)$ to $(1,0)$ labeled $\mu_2$. Finally, if the process is in state $(0,1)$, then it will go to state $(0,0)$ when the man in chair 2 completes his service, hence the arrow from $(0,1)$ to $(0,0)$ labeled $\mu_2$.

Because there are no other possible transitions, this completes the transition diagram.

To write the balance equations we equate the sum of the arrows (multiplied by the probability of the states where they originate) coming into a state with the sum of the arrows (multiplied by the probability of the state) going out of that state. This gives

State | Rate that the process leaves = rate that it enters
$(0,0)$ | $\lambda P_{00} = \mu_2 P_{01}$
$(1,0)$ | $\mu_1 P_{10} = \lambda P_{00} + \mu_2 P_{11}$
$(0,1)$ | $(\lambda + \mu_2)P_{01} = \mu_1 P_{10} + \mu_2 P_{b1}$
$(1,1)$ | $(\mu_1 + \mu_2)P_{11} = \lambda P_{01}$
$(b,1)$ | $\mu_2 P_{b1} = \mu_1 P_{11}$

These along with the equation

$$P_{00} + P_{10} + P_{01} + P_{11} + P_{b1} = 1$$

may be solved to determine the limiting probabilities. Though it is easy to solve the preceding equations, the resulting solutions are quite involved and hence will


not be explicitly presented. However, it is easy to answer our questions in terms of these limiting probabilities. First, since a potential customer will enter the system when the state is either $(0,0)$ or $(0,1)$, it follows that the proportion of customers entering the system is $P_{00} + P_{01}$. Secondly, since there is one customer in the system whenever the state is $(0,1)$ or $(1,0)$ and two customers in the system whenever the state is $(1,1)$ or $(b,1)$, it follows that $L$, the average number in the system, is given by

$$L = P_{01} + P_{10} + 2(P_{11} + P_{b1})$$

To derive the average amount of time that an entering customer spends in the system, we use the relationship $W = L/\lambda_a$. Since a potential customer will enter the system when in state $(0,0)$ or $(0,1)$, it follows that $\lambda_a = \lambda(P_{00} + P_{01})$ and hence

$$W = \frac{P_{01} + P_{10} + 2(P_{11} + P_{b1})}{\lambda(P_{00} + P_{01})}$$

Example 8.5 (a) If $\lambda = 1$, $\mu_1 = 1$, $\mu_2 = 2$, then calculate the preceding quantities of interest.
(b) If $\lambda = 1$, $\mu_1 = 2$, $\mu_2 = 1$, then calculate the preceding.

Solution:
(a) Solving the balance equations yields

$$P_{00} = \tfrac{12}{37}, \quad P_{10} = \tfrac{16}{37}, \quad P_{11} = \tfrac{2}{37}, \quad P_{01} = \tfrac{6}{37}, \quad P_{b1} = \tfrac{1}{37}$$

Hence,

$$L = \tfrac{28}{37}, \qquad W = \tfrac{28}{18}$$

(b) Solving the balance equations yields

$$P_{00} = \tfrac{3}{11}, \quad P_{10} = \tfrac{2}{11}, \quad P_{11} = \tfrac{1}{11}, \quad P_{b1} = \tfrac{2}{11}, \quad P_{01} = \tfrac{3}{11}$$

Hence,

$$L = 1, \qquad W = \tfrac{11}{6}$$

8.3.4. A Queueing System with Bulk Service

In this model, we consider a single-server exponential queueing system in which the server is able to serve two customers at the same time. Whenever the server completes a service, she then serves the next two customers at the same time. However, if there is only one customer in line, then she serves that customer by herself. We shall assume that her service time is exponential at rate $\mu$ whether


she is serving one or two customers. As usual, we suppose that customers arrive at an exponential rate $\lambda$. One example of such a system might be an elevator or a cable car which can take at most two passengers at any time.

It would seem that the state of the system would have to tell us not only how many customers there are in the system, but also whether one or two are presently being served. However, it turns out that we can more easily solve the problem not by concentrating on the number of customers in the system, but rather on the number in queue. So let us define the state as the number of customers waiting in queue, with two states when there is no one in queue. That is, let us have as a state space $0', 0, 1, 2, \ldots$, with the interpretation

State | Interpretation
$0'$ | No one in service
$0$ | Server busy; no one waiting
$n,\ n > 0$ | $n$ customers waiting

The transition diagram is shown in Figure 8.2 and the balance equations are

State | Rate at which the process leaves = rate at which it enters
$0'$ | $\lambda P_{0'} = \mu P_0$
$0$ | $(\lambda + \mu)P_0 = \lambda P_{0'} + \mu P_1 + \mu P_2$
$n,\ n \ge 1$ | $(\lambda + \mu)P_n = \lambda P_{n-1} + \mu P_{n+2}$

Now the set of equations

$$(\lambda + \mu)P_n = \lambda P_{n-1} + \mu P_{n+2}, \qquad n = 1, 2, \ldots \qquad (8.13)$$

has a solution of the form

$$P_n = \alpha^n P_0$$

To see this, substitute the preceding in Equation (8.13) to obtain

$$(\lambda + \mu)\alpha^n P_0 = \lambda\alpha^{n-1}P_0 + \mu\alpha^{n+2}P_0$$

or

$$(\lambda + \mu)\alpha = \lambda + \mu\alpha^3$$

Figure 8.2.


Solving this for $\alpha$ yields the three roots:

$$\alpha = 1, \qquad \alpha = \frac{-1 - \sqrt{1 + 4\lambda/\mu}}{2}, \qquad \alpha = \frac{-1 + \sqrt{1 + 4\lambda/\mu}}{2}$$

As the first two are clearly not possible, it follows that

$$\alpha = \frac{\sqrt{1 + 4\lambda/\mu} - 1}{2}$$

Hence,

$$P_n = \alpha^n P_0, \qquad P_{0'} = \frac{\mu}{\lambda}P_0$$

where the bottom equation follows from the first balance equation. (We can ignore the second balance equation as one of these equations is always redundant.) To obtain $P_0$, we use

$$P_0 + P_{0'} + \sum_{n=1}^{\infty} P_n = 1$$

or

$$P_0\left[1 + \frac{\mu}{\lambda} + \sum_{n=1}^{\infty}\alpha^n\right] = 1$$

or

$$P_0\left[\frac{1}{1 - \alpha} + \frac{\mu}{\lambda}\right] = 1$$

and thus

$$P_0 = \frac{\lambda(1 - \alpha)}{\lambda + \mu(1 - \alpha)}$$

and

$$P_n = \frac{\alpha^n\lambda(1 - \alpha)}{\lambda + \mu(1 - \alpha)}, \qquad n \ge 0$$
$$P_{0'} = \frac{\mu(1 - \alpha)}{\lambda + \mu(1 - \alpha)} \qquad (8.14)$$

where

$$\alpha = \frac{\sqrt{1 + 4\lambda/\mu} - 1}{2}$$
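The solution (8.14) is straightforward to evaluate and check. The sketch below (ours, not from the text; the rates are arbitrary) computes $\alpha$, verifies an interior balance equation (8.13), and confirms that the probabilities sum to 1:

```python
import math

lam, mu = 1.0, 1.5       # arbitrary example rates
alpha = (math.sqrt(1 + 4 * lam / mu) - 1) / 2

P0 = lam * (1 - alpha) / (lam + mu * (1 - alpha))
P0_prime = mu * (1 - alpha) / (lam + mu * (1 - alpha))
P = lambda n: alpha ** n * P0            # Eq. (8.14)

# interior balance equation (8.13), checked at n = 3:
lhs = (lam + mu) * P(3)
rhs = lam * P(2) + mu * P(5)
total = P0_prime + sum(P(n) for n in range(400))
print(alpha, lhs - rhs, total)           # alpha < 1, balance holds, total ~ 1
```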


Note that for the preceding to be valid we need $\alpha < 1$.

8.4. Network of Queues

8.4.1. Open Systems


Figure 8.3. A tandem queue.

serves one customer at a time with server $i$ taking an exponential time with rate $\mu_i$ for a service, $i = 1, 2$. Such a system is called a tandem or sequential system (see Figure 8.3).

To analyze this system we need to keep track of the number of customers at server 1 and the number at server 2. So let us define the state by the pair $(n, m)$, meaning that there are $n$ customers at server 1 and $m$ at server 2. The balance equations are

State | Rate that the process leaves = rate that it enters
$0, 0$ | $\lambda P_{0,0} = \mu_2 P_{0,1}$
$n, 0;\ n > 0$ | $(\lambda + \mu_1)P_{n,0} = \mu_2 P_{n,1} + \lambda P_{n-1,0}$
$0, m;\ m > 0$ | $(\lambda + \mu_2)P_{0,m} = \mu_2 P_{0,m+1} + \mu_1 P_{1,m-1}$
$n, m;\ n, m > 0$ | $(\lambda + \mu_1 + \mu_2)P_{n,m} = \mu_2 P_{n,m+1} + \mu_1 P_{n+1,m-1} + \lambda P_{n-1,m}$ (8.15)

Rather than directly attempting to solve these (along with the equation $\sum_{n,m} P_{n,m} = 1$) we shall guess at a solution and then verify that it indeed satisfies the preceding. We first note that the situation at server 1 is just as in an M/M/1 model. Similarly, as it was shown in Section 6.6 that the departure process of an M/M/1 queue is a Poisson process with rate $\lambda$, it follows that what server 2 faces is also an M/M/1 queue. Hence, the probability that there are $n$ customers at server 1 is

$$P\{n \text{ at server } 1\} = \left(\frac{\lambda}{\mu_1}\right)^n\left(1 - \frac{\lambda}{\mu_1}\right)$$

and, similarly,

$$P\{m \text{ at server } 2\} = \left(\frac{\lambda}{\mu_2}\right)^m\left(1 - \frac{\lambda}{\mu_2}\right)$$

Now if the numbers of customers at servers 1 and 2 were independent random variables, then it would follow that

$$P_{n,m} = \left(\frac{\lambda}{\mu_1}\right)^n\left(1 - \frac{\lambda}{\mu_1}\right)\left(\frac{\lambda}{\mu_2}\right)^m\left(1 - \frac{\lambda}{\mu_2}\right) \qquad (8.16)$$


To verify that $P_{n,m}$ is indeed equal to the preceding (and thus that the number of customers at server 1 is independent of the number at server 2), all we need do is verify that the preceding satisfies the set of Equations (8.15); this suffices since we know that the $P_{n,m}$ are the unique solution of Equations (8.15). Now, for instance, if we consider the first equation of (8.15), we need to show that

$$\lambda\left(1 - \frac{\lambda}{\mu_1}\right)\left(1 - \frac{\lambda}{\mu_2}\right) = \mu_2\left(1 - \frac{\lambda}{\mu_1}\right)\left(\frac{\lambda}{\mu_2}\right)\left(1 - \frac{\lambda}{\mu_2}\right)$$

which is easily verified. We leave it as an exercise to show that the $P_{n,m}$, as given by Equation (8.16), satisfy all of the Equations (8.15), and are thus the limiting probabilities.

From the preceding we see that $L$, the average number of customers in the system, is given by

$$L = \sum_{n,m}(n + m)P_{n,m} = \sum_n n\left(\frac{\lambda}{\mu_1}\right)^n\left(1 - \frac{\lambda}{\mu_1}\right) + \sum_m m\left(\frac{\lambda}{\mu_2}\right)^m\left(1 - \frac{\lambda}{\mu_2}\right) = \frac{\lambda}{\mu_1 - \lambda} + \frac{\lambda}{\mu_2 - \lambda}$$

and from this we see that the average time a customer spends in the system is

$$W = \frac{L}{\lambda} = \frac{1}{\mu_1 - \lambda} + \frac{1}{\mu_2 - \lambda}$$

Remarks (i) The result (Equation (8.16)) could have been obtained as a direct consequence of the time reversibility of an M/M/1 (see Section 6.6). For not only does time reversibility imply that the output from server 1 is a Poisson process, but it also implies (Exercise 26 of Chapter 6) that the number of customers at server 1 is independent of the past departure times from server 1. As these past departure times constitute the arrival process to server 2, the independence of the numbers of customers in the two systems follows.

(ii) Since a Poisson arrival sees time averages, it follows that in a tandem queue the numbers of customers an arrival (to server 1) sees at the two servers are independent random variables. However, it should be noted that this does not imply that the waiting times of a given customer at the two servers are independent. For a counterexample suppose that $\lambda$ is very small with respect to $\mu_1 = \mu_2$; and thus almost all customers have zero wait in queue at both servers. However, given that the wait in queue of a customer at server 1 is positive, his wait in queue at server 2 also will be positive with probability at least as large as $\frac{1}{2}$ (why?). Hence, the waiting times in queue are not independent. Remarkably enough, however, it turns out that the total times (that is, service time plus wait in queue) that an arrival spends at the two servers are indeed independent random variables.

The preceding result can be substantially generalized. To do so, consider a system of $k$ servers. Customers arrive from outside the system to server $i$, $i = 1,\ldots,k$, in accordance with independent Poisson processes at rate $r_i$; they then join the queue at $i$ until their turn at service comes. Once a customer is served by server $i$, he then joins the queue in front of server $j$, $j = 1,\ldots,k$, with probability $P_{ij}$. Hence, $\sum_{j=1}^k P_{ij} \le 1$, and $1 - \sum_{j=1}^k P_{ij}$ represents the probability that a customer departs the system after being served by server $i$.

If we let $\lambda_j$ denote the total arrival rate of customers to server $j$, then the $\lambda_j$ can be obtained as the solution of

$$\lambda_j = r_j + \sum_{i=1}^k \lambda_i P_{ij}, \qquad j = 1,\ldots,k \qquad (8.17)$$

Equation (8.17) follows since $r_j$ is the arrival rate of customers to $j$ coming from outside the system and, as $\lambda_i$ is the rate at which customers depart server $i$ (rate in must equal rate out), $\lambda_i P_{ij}$ is the arrival rate to $j$ of those coming from server $i$.

It turns out that the number of customers at each of the servers is independent and of the form

$$P\{n \text{ customers at server } j\} = \left(\frac{\lambda_j}{\mu_j}\right)^n\left(1 - \frac{\lambda_j}{\mu_j}\right), \qquad n \ge 1$$

where $\mu_j$ is the exponential service rate at server $j$ and the $\lambda_j$ are the solution to Equation (8.17). Of course, it is necessary that $\lambda_j/\mu_j < 1$ for all $j$. To prove this, we first note that it is equivalent to asserting that the limiting probabilities $P(n_1, n_2, \ldots, n_k) = P\{n_j \text{ at server } j,\ j = 1,\ldots,k\}$ are given by

$$P(n_1, n_2, \ldots, n_k) = \prod_{j=1}^k\left(\frac{\lambda_j}{\mu_j}\right)^{n_j}\left(1 - \frac{\lambda_j}{\mu_j}\right) \qquad (8.18)$$

which can be verified by showing that it satisfies the balance equations for this model.

The average number of customers in the system is

$$L = \sum_{j=1}^k \text{average number at server } j = \sum_{j=1}^k \frac{\lambda_j}{\mu_j - \lambda_j}$$


The average time a customer spends in the system can be obtained from $L = \lambda W$ with $\lambda = \sum_{j=1}^k r_j$. (Why not $\lambda = \sum_{j=1}^k \lambda_j$?) This yields

$$W = \frac{\sum_{j=1}^k \lambda_j/(\mu_j - \lambda_j)}{\sum_{j=1}^k r_j}$$

Remark The result embodied in Equation (8.18) is rather remarkable in that it says that the distribution of the number of customers at server $i$ is the same as in an M/M/1 system with rates $\lambda_i$ and $\mu_i$. What is remarkable is that in the network model the arrival process at node $i$ need not be a Poisson process. For if there is a possibility that a customer may visit a server more than once (a situation called feedback), then the arrival process will not be Poisson. An easy example illustrating this is to suppose that there is a single server whose service rate is very large with respect to the arrival rate from outside. Suppose also that with probability $p = 0.9$ a customer upon completion of service is fed back into the system. Hence, at an arrival time epoch there is a large probability of another arrival in a short time (namely, the feedback arrival); whereas at an arbitrary time point there will be only a very slight chance of an arrival occurring shortly (since $\lambda$ is so very small). Hence, the arrival process does not possess independent increments and so cannot be Poisson.

Thus, we see that when feedback is allowed the steady-state probabilities of the number of customers at any given station have the same distribution as in an M/M/1 model even though the model is not M/M/1. (Presumably such quantities as the joint distribution of the number at the station at two different time points will not be the same as for an M/M/1.)

Example 8.6 Consider a system of two servers where customers from outside the system arrive at server 1 at a Poisson rate 4 and at server 2 at a Poisson rate 5. The service rates of 1 and 2 are respectively 8 and 10. A customer upon completion of service at server 1 is equally likely to go to server 2 or to leave the system (i.e., $P_{11} = 0$, $P_{12} = \frac{1}{2}$); whereas a departure from server 2 will go 25 percent of the time to server 1 and will depart the system otherwise (i.e., $P_{21} = \frac{1}{4}$, $P_{22} = 0$). Determine the limiting probabilities, $L$, and $W$.

Solution: The total arrival rates to servers 1 and 2, call them $\lambda_1$ and $\lambda_2$, can be obtained from Equation (8.17). That is, we have

$$\lambda_1 = 4 + \tfrac{1}{4}\lambda_2,$$
$$\lambda_2 = 5 + \tfrac{1}{2}\lambda_1$$

implying that

$$\lambda_1 = 6, \qquad \lambda_2 = 8$$
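The traffic equations of Example 8.6 can also be solved mechanically, for instance by fixed-point iteration (a sketch of ours, not from the text; any linear-equation solver would do equally well):

```python
r = [4.0, 5.0]                      # external arrival rates r_1, r_2
P = [[0.0, 0.5], [0.25, 0.0]]       # routing probabilities P[i][j]

lam = r[:]                          # initial guess: external rates only
for _ in range(100):                # iterate lam_j = r_j + sum_i lam_i * P[i][j]
    lam = [r[j] + sum(lam[i] * P[i][j] for i in range(2)) for j in range(2)]
print(lam)                          # converges to [6, 8]
```

The iteration converges because customers eventually leave the system, so the routing matrix is substochastic.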


Hence,

$$P\{n \text{ at server } 1, m \text{ at server } 2\} = \left(\tfrac{3}{4}\right)^n\tfrac{1}{4}\left(\tfrac{4}{5}\right)^m\tfrac{1}{5} = \tfrac{1}{20}\left(\tfrac{3}{4}\right)^n\left(\tfrac{4}{5}\right)^m$$

and

$$L = \frac{6}{8 - 6} + \frac{8}{10 - 8} = 7, \qquad W = \frac{L}{9} = \frac{7}{9}$$

8.4.2. Closed Systems

The queueing systems described in Section 8.4.1 are called open systems since customers are able to enter and depart the system. A system in which new customers never enter and existing ones never depart is called a closed system.

Let us suppose that we have $m$ customers moving among a system of $k$ servers, where the service times at server $i$ are exponential with rate $\mu_i$, $i = 1,\ldots,k$. When a customer completes service at server $i$, she then joins the queue in front of server $j$, $j = 1,\ldots,k$, with probability $P_{ij}$, where we now suppose that $\sum_{j=1}^k P_{ij} = 1$ for all $i = 1,\ldots,k$. That is, $P = [P_{ij}]$ is a Markov transition probability matrix, which we shall assume is irreducible. Let $\pi = (\pi_1,\ldots,\pi_k)$ denote the stationary probabilities for this Markov chain; that is, $\pi$ is the unique positive solution of

$$\pi_j = \sum_{i=1}^k \pi_i P_{ij}, \qquad \sum_{j=1}^k \pi_j = 1 \qquad (8.19)$$

If we denote the average arrival rate (or equivalently the average service completion rate) at server $j$ by $\lambda_m(j)$, $j = 1,\ldots,k$, then, analogous to Equation (8.17), the $\lambda_m(j)$ satisfy

$$\lambda_m(j) = \sum_{i=1}^k \lambda_m(i)P_{ij}$$


Hence, from (8.19) we can conclude that

$$\lambda_m(j) = \lambda_m\pi_j, \qquad j = 1, 2, \ldots, k \qquad (8.20)$$

where

$$\lambda_m = \sum_{j=1}^k \lambda_m(j) \qquad (8.21)$$

From Equation (8.21), we see that $\lambda_m$ is the average service completion rate of the entire system, that is, it is the system throughput rate.*

If we let $P_m(n_1, n_2, \ldots, n_k)$ denote the limiting probabilities

$$P_m(n_1, n_2, \ldots, n_k) = P\{n_j \text{ customers at server } j,\ j = 1,\ldots,k\}$$

then, by verifying that they satisfy the balance equations, it can be shown that

$$P_m(n_1, n_2, \ldots, n_k) = \begin{cases} K_m\prod_{j=1}^k (\lambda_m(j)/\mu_j)^{n_j}, & \text{if } \sum_{j=1}^k n_j = m \\ 0, & \text{otherwise}\end{cases}$$

But from Equation (8.20) we thus obtain that

$$P_m(n_1, n_2, \ldots, n_k) = \begin{cases} C_m\prod_{j=1}^k (\pi_j/\mu_j)^{n_j}, & \text{if } \sum_{j=1}^k n_j = m \\ 0, & \text{otherwise}\end{cases} \qquad (8.22)$$

where

$$C_m = \left[\sum_{n_1,\ldots,n_k:\,\sum_j n_j = m}\ \prod_{j=1}^k (\pi_j/\mu_j)^{n_j}\right]^{-1} \qquad (8.23)$$

Equation (8.22) is not as useful as we might suppose, for in order to utilize it we must determine the normalizing constant $C_m$ given by Equation (8.23), which requires summing the products $\prod_{j=1}^k (\pi_j/\mu_j)^{n_j}$ over all the feasible vectors $(n_1,\ldots,n_k)$: $\sum_{j=1}^k n_j = m$. Hence, since there are $\binom{m+k-1}{m}$ such vectors this is only computationally feasible for relatively small values of $m$ and $k$.

We will now present an approach that will enable us to determine recursively many of the quantities of interest in this model without first computing the normalizing constants. To begin, consider a customer who has just left server $i$ and is headed to server $j$, and let us determine the probability of the system as seen

* We are just using the notation $\lambda_m(j)$ and $\lambda_m$ to indicate the dependence on the number of customers in the closed system. This will be used in recursive relations we will develop.


by this customer. In particular, let us determine the probability that this customer observes, at that moment, $n_l$ customers at server $l$, $l = 1,\ldots,k$, $\sum_{l=1}^k n_l = m - 1$. This is done as follows:

$$P\{\text{customer observes } n_l \text{ at server } l,\ l = 1,\ldots,k \mid \text{customer goes from } i \text{ to } j\}$$
$$= \frac{P\{\text{state is } (n_1,\ldots,n_i + 1,\ldots,n_j,\ldots,n_k),\ \text{customer goes from } i \text{ to } j\}}{P\{\text{customer goes from } i \text{ to } j\}}$$
$$= \frac{P_m(n_1,\ldots,n_i + 1,\ldots,n_j,\ldots,n_k)\,\mu_i P_{ij}}{\sum_{n:\,\sum n_j = m-1} P_m(n_1,\ldots,n_i + 1,\ldots,n_k)\,\mu_i P_{ij}}$$
$$= \frac{(\pi_i/\mu_i)\prod_{j=1}^k (\pi_j/\mu_j)^{n_j}}{K} \qquad \text{from (8.22)}$$
$$= C\prod_{j=1}^k (\pi_j/\mu_j)^{n_j}$$

where $C$ does not depend on $n_1,\ldots,n_k$. But because the preceding is a probability density on the set of vectors $(n_1,\ldots,n_k)$, $\sum_{j=1}^k n_j = m - 1$, it follows from (8.22) that it must equal $P_{m-1}(n_1,\ldots,n_k)$. Hence,

$$P\{\text{customer observes } n_l \text{ at server } l,\ l = 1,\ldots,k \mid \text{customer goes from } i \text{ to } j\} = P_{m-1}(n_1,\ldots,n_k), \qquad \sum_{i=1}^k n_i = m - 1 \qquad (8.24)$$

As (8.24) is true for all $i$, we thus have proven the following proposition, known as the arrival theorem.

Proposition 8.3 (The Arrival Theorem) In the closed network system with $m$ customers, the system as seen by arrivals to server $j$ is distributed as the stationary distribution in the same network system when there are only $m - 1$ customers.

Denote by $L_m(j)$ and $W_m(j)$ the average number of customers and the average time a customer spends at server $j$ when there are $m$ customers in the network. Upon conditioning on the number of customers found at server $j$ by an arrival to that server, it follows that

$$W_m(j) = \frac{1 + E_m[\text{number at server } j \text{ as seen by an arrival}]}{\mu_j} = \frac{1 + L_{m-1}(j)}{\mu_j} \qquad (8.25)$$


where the last equality follows from the arrival theorem. Now when there are $m - 1$ customers in the system, then, from Equation (8.20), $\lambda_{m-1}(j)$, the average arrival rate to server $j$, satisfies

$$\lambda_{m-1}(j) = \lambda_{m-1}\pi_j$$

Now, applying the basic cost identity Equation (8.1) with the cost rule being that each customer in the network system of $m - 1$ customers pays one per unit time while at server $j$, we obtain

$$L_{m-1}(j) = \lambda_{m-1}\pi_j W_{m-1}(j) \qquad (8.26)$$

Using Equation (8.25), this yields

$$W_m(j) = \frac{1 + \lambda_{m-1}\pi_j W_{m-1}(j)}{\mu_j} \qquad (8.27)$$

Also using the fact that $\sum_{j=1}^k L_{m-1}(j) = m - 1$ (why?) we obtain, from Equation (8.26):

$$m - 1 = \lambda_{m-1}\sum_{j=1}^k \pi_j W_{m-1}(j)$$

or

$$\lambda_{m-1} = \frac{m - 1}{\sum_{i=1}^k \pi_i W_{m-1}(i)} \qquad (8.28)$$

Hence, from Equation (8.27), we obtain the recursion

$$W_m(j) = \frac{1}{\mu_j} + \frac{(m - 1)\pi_j W_{m-1}(j)}{\mu_j\sum_{i=1}^k \pi_i W_{m-1}(i)} \qquad (8.29)$$

Starting with the stationary probabilities $\pi_j$, $j = 1,\ldots,k$, and $W_1(j) = 1/\mu_j$ we can now use Equation (8.29) to determine recursively $W_2(j), W_3(j), \ldots, W_m(j)$. We can then determine the throughput rate $\lambda_m$ by using Equation (8.28), and this will determine $L_m(j)$ by Equation (8.26). This recursive approach is called mean value analysis.

Example 8.7 Consider a $k$-server network in which the customers move in a cyclic permutation. That is,

$$P_{i,i+1} = 1, \quad i = 1, 2, \ldots, k - 1, \qquad P_{k,1} = 1$$

Let us determine the average number of customers at server $j$ when there are two customers in the system. Now, for this network

$$\pi_i = 1/k, \qquad i = 1,\ldots,k$$


and as

$$W_1(j) = \frac{1}{\mu_j}$$

we obtain from Equation (8.29) that

$$W_2(j) = \frac{1}{\mu_j} + \frac{(1/k)(1/\mu_j)}{\mu_j\sum_{i=1}^k (1/k)(1/\mu_i)} = \frac{1}{\mu_j} + \frac{1}{\mu_j^2\sum_{i=1}^k 1/\mu_i}$$

Hence, from Equation (8.28),

$$\lambda_2 = \frac{2}{\sum_{l=1}^k \frac{1}{k}W_2(l)} = \frac{2k}{\sum_{l=1}^k\left(\dfrac{1}{\mu_l} + \dfrac{1}{\mu_l^2\sum_{i=1}^k 1/\mu_i}\right)}$$

and finally, using Equation (8.26),

$$L_2(j) = \lambda_2\frac{1}{k}W_2(j) = \frac{2\left(\dfrac{1}{\mu_j} + \dfrac{1}{\mu_j^2\sum_{i=1}^k 1/\mu_i}\right)}{\sum_{l=1}^k\left(\dfrac{1}{\mu_l} + \dfrac{1}{\mu_l^2\sum_{i=1}^k 1/\mu_i}\right)}$$

Another approach to learning about the stationary probabilities specified by Equation (8.22), which finesses the computational difficulties of computing the constant $C_m$, is to use the Gibbs sampler of Section 4.9 to generate a Markov chain having these stationary probabilities. To begin, note that since there are always a total of $m$ customers in the system, Equation (8.22) may equivalently be written as a joint mass function of the numbers of customers at each of the servers $1,\ldots,k-1$, as follows:

$$P_m(n_1,\ldots,n_{k-1}) = C_m(\pi_k/\mu_k)^{m - \sum n_j}\prod_{j=1}^{k-1}(\pi_j/\mu_j)^{n_j}$$
$$= K\prod_{j=1}^{k-1}(a_j)^{n_j}, \qquad \sum_{j=1}^{k-1} n_j \le m$$


where $a_j = (\pi_j\mu_k)/(\pi_k\mu_j)$, $j = 1,\ldots,k-1$. Now, if $N = (N_1,\ldots,N_{k-1})$ has the preceding joint mass function, then

$$P\{N_i = n \mid N_1 = n_1,\ldots,N_{i-1} = n_{i-1}, N_{i+1} = n_{i+1},\ldots,N_{k-1} = n_{k-1}\}$$
$$= \frac{P_m(n_1,\ldots,n_{i-1},n,n_{i+1},\ldots,n_{k-1})}{\sum_r P_m(n_1,\ldots,n_{i-1},r,n_{i+1},\ldots,n_{k-1})}$$
$$= Ca_i^n, \qquad n \le m - \sum_{j\ne i} n_j$$

It follows from the preceding that we may use the Gibbs sampler to generate the values of a Markov chain whose limiting probability mass function is $P_m(n_1,\ldots,n_{k-1})$ as follows:

1. Let $(n_1,\ldots,n_{k-1})$ be arbitrary nonnegative integers satisfying $\sum_{j=1}^{k-1} n_j \le m$.
2. Generate a random variable $I$ that is equally likely to be any of $1,\ldots,k-1$.
3. If $I = i$, set $s = m - \sum_{j\ne i} n_j$, and generate the value of a random variable $X$ having probability mass function

$$P\{X = n\} = Ca_i^n, \qquad n = 0,\ldots,s$$

4. Let $n_I = X$ and go to step 2.

The successive values of the state vector $(n_1,\ldots,n_{k-1}, m - \sum_{j=1}^{k-1} n_j)$ constitute the sequence of states of a Markov chain with the limiting distribution $P_m$. All quantities of interest can be estimated from this sequence. For instance, the average of the values of the $j$th coordinate of these vectors will converge to the mean number of individuals at station $j$, the proportion of vectors whose $j$th coordinate is less than $r$ will converge to the limiting probability that the number of individuals at station $j$ is less than $r$, and so on.

Other quantities of interest can also be obtained from the simulation. For instance, suppose we want to estimate $W_j$, the average amount of time a customer spends at server $j$ on each visit. Then, as noted in the preceding, $L_j$, the average number of customers at server $j$, can be estimated. To estimate $W_j$, we use the identity

$$L_j = \lambda_j W_j$$

where $\lambda_j$ is the rate at which customers arrive at server $j$. Setting $\lambda_j$ equal to the service completion rate at server $j$ shows that

$$\lambda_j = P\{j \text{ is busy}\}\mu_j$$

Using the Gibbs sampler simulation to estimate $P\{j \text{ is busy}\}$ then leads to an estimator of $W_j$.
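The mean value analysis recursion (8.25)–(8.28) described earlier is also easy to implement directly. The sketch below (ours, not from the text; the function name `mva` is our own) runs the recursion for the cyclic network of Example 8.7 with $k = 2$, $\mu = (1, 2)$, and $m = 2$, where the exact product form (8.22) gives $L_2 = (10/7,\ 4/7)$:

```python
def mva(mu, pi, m):
    """Mean value analysis for a closed network; returns (W_m, L_m, lam_m)."""
    k = len(mu)
    W = [1.0 / mu[j] for j in range(k)]              # W_1(j) = 1/mu_j
    lam = 1.0 / sum(pi[j] * W[j] for j in range(k))  # Eq. (8.28) with one customer
    L = [lam * pi[j] * W[j] for j in range(k)]       # Eq. (8.26)
    for n in range(2, m + 1):
        W = [(1 + L[j]) / mu[j] for j in range(k)]       # Eq. (8.25)
        lam = n / sum(pi[j] * W[j] for j in range(k))    # Eq. (8.28)
        L = [lam * pi[j] * W[j] for j in range(k)]       # Eq. (8.26)
    return W, L, lam

# cyclic two-server network: pi = (1/2, 1/2), mu = (1, 2), two customers
W2, L2, lam2 = mva([1.0, 2.0], [0.5, 0.5], 2)
print(W2, L2, lam2)     # L2 = [10/7, 4/7]
```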


8.5. The System M/G/1

8.5.1. Preliminaries: Work and Another Cost Identity

For an arbitrary queueing system, let us define the work in the system at any time t to be the sum of the remaining service times of all customers in the system at time t. For instance, suppose there are three customers in the system: the one in service having been there for three of his required five units of service time, and both people in queue having service times of six units. Then the work at that time is 2 + 6 + 6 = 14. Let V denote the (time) average work in the system.

Now recall the fundamental cost equation (8.1), which states that the

    average rate at which the system earns = λ_a × average amount a customer pays

and consider the following cost rule: Each customer pays at a rate of y per unit time when his remaining service time is y, whether he is in queue or in service. Thus, the rate at which the system earns is just the work in the system; so the basic identity yields

    V = λ_a E[amount paid by a customer]

Now, let S and W*_Q denote, respectively, the service time and the time a given customer spends waiting in queue. Then, since the customer pays at a constant rate of S per unit time while he waits in queue and at a rate of S − x after spending an amount of time x in service, we have

    E[amount paid by a customer] = E[S W*_Q + ∫_0^S (S − x) dx]

and thus

    V = λ_a E[S W*_Q] + λ_a E[S²]/2    (8.30)

It should be noted that the preceding is a basic queueing identity [like Equations (8.2)–(8.4)] and as such is valid in almost all models. In addition, if a customer's service time is independent of his wait in queue (as is usually, but not always, the case),* then we have from Equation (8.30) that

    V = λ_a E[S] W_Q + λ_a E[S²]/2    (8.31)

* For an example where it is not true, see Section 8.6.2.
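Identity (8.31) can be checked numerically. The sketch below, an illustration rather than anything from the text, simulates the workload process of a single-server queue (the workload jumps by S at each Poisson arrival and declines at rate 1) and compares its time average against (8.31) in the M/M/1 special case, where W_Q is known.

```python
import random

def work_average(lam, service, T, seed=0):
    """Estimate V, the time-average work, for a single-server queue fed by a
    Poisson(lam) arrival stream; service(rng) draws one service time."""
    rng = random.Random(seed)
    t, v, area = 0.0, 0.0, 0.0           # clock, current workload, integral of V(t)
    while t < T:
        dt = min(rng.expovariate(lam), T - t)
        decline = min(v, dt)             # workload falls at rate 1 until it hits 0
        area += v * decline - decline * decline / 2
        v -= decline
        t += dt
        if t < T:                        # an arrival occurs: workload jumps by S
            v += service(rng)
    return area / T

# Check for M/M/1 with lam = 1, mu = 2 (so E[S] = 1/2, E[S^2] = 2/mu^2 = 1/2,
# and W_Q = 1/2 from Section 8.3): (8.31) predicts V = 1*(1/2)*(1/2) + 1*(1/2)/2 = 1/2.
V_hat = work_average(1.0, lambda rng: rng.expovariate(2.0), 100000.0)
```

The estimate `V_hat` should settle near 1/2, in agreement with the identity.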


8.5.2. Application of Work to M/G/1

The M/G/1 model assumes (i) Poisson arrivals at rate λ; (ii) a general service distribution; and (iii) a single server. In addition, we will suppose that customers are served in the order of their arrival.

Now, for an arbitrary customer in an M/G/1 system,

    customer's wait in queue = work in the system when he arrives    (8.32)

This follows since there is only a single server (think about it!). Taking expectations of both sides of Equation (8.32) yields

    W_Q = average work as seen by an arrival

But, due to Poisson arrivals, the average work as seen by an arrival will equal V, the time average work in the system. Hence, for the model M/G/1,

    W_Q = V

The preceding in conjunction with the identity

    V = λE[S]W_Q + λE[S²]/2

yields the so-called Pollaczek–Khintchine formula,

    W_Q = λE[S²] / (2(1 − λE[S]))    (8.33)

where E[S] and E[S²] are the first two moments of the service distribution.

The quantities L, L_Q, and W can be obtained from Equation (8.33) as

    L_Q = λW_Q = λ²E[S²] / (2(1 − λE[S])),
    W = W_Q + E[S] = λE[S²] / (2(1 − λE[S])) + E[S],    (8.34)
    L = λW = λ²E[S²] / (2(1 − λE[S])) + λE[S]

Remarks (i) For the preceding quantities to be finite, we need λE[S] < 1. This condition is intuitive since we know from renewal theory that if the server was always busy, then the departure rate would be 1/E[S] (see Section 7.3), which must be larger than the arrival rate λ to keep things finite.
(ii) Since E[S²] = Var(S) + (E[S])², we see from Equations (8.33) and (8.34) that, for fixed mean service time, L, L_Q, W, and W_Q all increase as the variance of the service distribution increases.
(iii) Another approach to obtaining W_Q is presented in Exercise 38.
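Equations (8.33) and (8.34) are simple to put into code. The helper below is an illustrative sketch (the name `mg1_metrics` is ours); it takes the arrival rate and the first two service-time moments and returns all four quantities.

```python
def mg1_metrics(lam, ES, ES2):
    """Pollaczek-Khintchine quantities (8.33)-(8.34) for an M/G/1 queue.

    lam: Poisson arrival rate; ES, ES2: first two moments of the service time.
    """
    rho = lam * ES
    if rho >= 1:
        raise ValueError("need lam * E[S] < 1 for finite answers")
    WQ = lam * ES2 / (2 * (1 - rho))     # Equation (8.33)
    W = WQ + ES                          # Equation (8.34)
    return {"WQ": WQ, "W": W, "LQ": lam * WQ, "L": lam * W}

# Example: exponential services with mu = 2 and lam = 1, so E[S] = 1/2 and
# E[S^2] = 2/mu^2 = 1/2; the familiar M/M/1 value L = rho/(1 - rho) = 1 is recovered.
metrics = mg1_metrics(1.0, 0.5, 0.5)
```

Remark (ii) can be seen directly here: increasing `ES2` with `ES` fixed increases every returned quantity.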


8.5.3. Busy Periods

The system alternates between idle periods (when there are no customers in the system, and so the server is idle) and busy periods (when there is at least one customer in the system, and so the server is busy).

Let us denote by I_n and B_n, respectively, the lengths of the nth idle and the nth busy period, n ≥ 1. Hence, in the first Σ_{j=1}^n (I_j + B_j) time units the server will be idle for a time Σ_{j=1}^n I_j, and so the proportion of time that the server will be idle, which of course is just P_0, can be expressed as

    P_0 = proportion of idle time
        = lim_{n→∞} (I_1 + ··· + I_n) / (I_1 + ··· + I_n + B_1 + ··· + B_n)

Now it is easy to see that I_1, I_2, ... are independent and identically distributed, as are B_1, B_2, .... Hence, by dividing the numerator and the denominator of the right side of the preceding by n, and then applying the strong law of large numbers, we obtain

    P_0 = lim_{n→∞} [(I_1 + ··· + I_n)/n] / [(I_1 + ··· + I_n)/n + (B_1 + ··· + B_n)/n]
        = E[I] / (E[I] + E[B])    (8.35)

where I and B represent idle and busy time random variables.

Now I represents the time from when a customer departs and leaves the system empty until the next arrival. Hence, from Poisson arrivals, it follows that I is exponential with rate λ, and so

    E[I] = 1/λ    (8.36)

To compute P_0, we note from Equation (8.4) (obtained from the fundamental cost equation by supposing that a customer pays at a rate of one per unit time while in service) that

    average number of busy servers = λE[S]


However, as the left-hand side of the preceding equals 1 − P_0 (why?), we have

    P_0 = 1 − λE[S]    (8.37)

and, from Equations (8.35)–(8.37),

    1 − λE[S] = (1/λ) / (1/λ + E[B])

or

    E[B] = E[S] / (1 − λE[S])

Another quantity of interest is C, the number of customers served in a busy period. The mean of C can be computed by noting that, on the average, for every E[C] arrivals exactly one will find the system empty (namely, the first customer in the busy period). Hence,

    a_0 = 1/E[C]

and, as a_0 = P_0 = 1 − λE[S] because of Poisson arrivals, we see that

    E[C] = 1 / (1 − λE[S])

8.6. Variations on the M/G/1

8.6.1. The M/G/1 with Random-Sized Batch Arrivals

Suppose that, as in the M/G/1, arrivals occur in accordance with a Poisson process having rate λ. But now suppose that each arrival consists not of a single customer but of a random number of customers. As before there is a single server whose service times have distribution G.

Let us denote by α_j, j ≥ 1, the probability that an arbitrary batch consists of j customers; and let N denote a random variable representing the size of a batch, so that P{N = j} = α_j. Since λ_a = λE(N), the basic formula for work [Equation (8.31)] becomes

    V = λE[N][E(S)W_Q + E(S²)/2]    (8.38)

To obtain a second equation relating V to W_Q, consider an average customer. We have that


    his wait in queue = work in system when he arrives
                      + his waiting time due to those in his batch

Taking expectations and using the fact that Poisson arrivals see time averages yields

    W_Q = V + E[waiting time due to those in his batch]
        = V + E[W_B]    (8.39)

Now, E(W_B) can be computed by conditioning on the number in the batch, but we must be careful because the probability that our average customer comes from a batch of size j is not α_j. For α_j is the proportion of batches that are of size j, and if we pick a customer at random, it is more likely that he comes from a larger rather than a smaller batch. (For instance, suppose α_1 = α_100 = 1/2; then half the batches are of size 1, but 100/101 of the customers will come from a batch of size 100!)

To determine the probability that our average customer came from a batch of size j we reason as follows: Let M be a large number. Then of the first M batches approximately Mα_j will be of size j, j ≥ 1, and thus there would have been approximately jMα_j customers that arrived in a batch of size j. Hence, the proportion of arrivals in the first M batches that were from batches of size j is approximately jMα_j / Σ_j jMα_j. This proportion becomes exact as M → ∞, and so we see that

    proportion of customers from batches of size j = jα_j / Σ_j jα_j = jα_j / E[N]

We are now ready to compute E(W_B), the expected wait in queue due to others in the batch:

    E[W_B] = Σ_j E[W_B | batch of size j] · jα_j / E[N]    (8.40)

Now if there are j customers in his batch, then our customer would have to wait for i − 1 of them to be served if he was ith in line among his batch members. As he is equally likely to be either 1st, 2nd, ..., or jth in line, we see that

    E[W_B | batch is of size j] = Σ_{i=1}^j (i − 1)E(S)(1/j) = [(j − 1)/2] E[S]


Substituting this in Equation (8.40) yields

    E[W_B] = [E[S]/(2E[N])] Σ_j (j − 1)jα_j = E[S](E[N²] − E[N]) / (2E[N])

and from Equations (8.38) and (8.39) we obtain

    W_Q = {E[S](E[N²] − E[N])/(2E[N]) + λE[N]E[S²]/2} / (1 − λE[N]E[S])

Remarks (i) Note that the condition for W_Q to be finite is that

    λE(N) < 1/E[S]

which again says that the arrival rate must be less than the service rate (when the server is busy).
(ii) For a fixed value of E[N], W_Q is increasing in Var(N), again indicating that "single-server queues do not like variation."
(iii) The other quantities L, L_Q, and W can be obtained by using

    W = W_Q + E[S],
    L = λ_a W = λE[N]W,
    L_Q = λE[N]W_Q

8.6.2. Priority Queues

Priority queueing systems are ones in which customers are classified into types and then given service priority according to their type. Consider the situation where there are two types of customers, which arrive according to independent Poisson processes with respective rates λ_1 and λ_2, and have service distributions G_1 and G_2. We suppose that type 1 customers are given service priority, in that service will never begin on a type 2 customer if a type 1 is waiting. However, if a type 2 is being served and a type 1 arrives, we assume that the service of the type 2 is continued until completion. That is, there is no preemption once service has begun.

Let W^i_Q denote the average wait in queue of a type i customer, i = 1, 2. Our objective is to compute the W^i_Q.

First, note that the total work in the system at any time would be exactly the same no matter what priority rule was employed (as long as the server is always


busy whenever there are customers in the system). This is so since the work will always decrease at a rate of one per unit time when the server is busy (no matter who is in service) and will always jump by the service time of an arrival. Hence, the work in the system is exactly as it would be if there was no priority rule but rather a first-come, first-served (called FIFO) ordering. However, under FIFO the preceding model is just M/G/1 with

    λ = λ_1 + λ_2,    G(x) = (λ_1/λ)G_1(x) + (λ_2/λ)G_2(x)    (8.41)

which follows since the combination of two independent Poisson processes is itself a Poisson process whose rate is the sum of the rates of the component processes. The service distribution G can be obtained by conditioning on which priority class the arrival is from, as is done in Equation (8.41).

Hence, from the results of Section 8.5, it follows that V, the average work in the priority queueing system, is given by

    V = λE[S²] / (2(1 − λE[S]))
      = λ[(λ_1/λ)E[S_1²] + (λ_2/λ)E[S_2²]] / (2{1 − λ[(λ_1/λ)E[S_1] + (λ_2/λ)E[S_2]]})
      = (λ_1E[S_1²] + λ_2E[S_2²]) / (2(1 − λ_1E[S_1] − λ_2E[S_2]))    (8.42)

where S_i has distribution G_i, i = 1, 2.

Continuing in our quest for W^i_Q, let us note that S and W*_Q, the service and wait in queue of an arbitrary customer, are not independent in the priority model, since knowledge about S gives us information as to the type of customer, which in turn gives us information about W*_Q. To get around this we will compute separately the average amount of type 1 and type 2 work in the system. Denoting by V^i the average amount of type i work, we have, exactly as in Section 8.5.1,

    V^i = λ_i E[S_i] W^i_Q + λ_i E[S_i²]/2,    i = 1, 2    (8.43)

If we define

    V^i_Q ≡ λ_i E[S_i] W^i_Q,    V^i_S ≡ λ_i E[S_i²]/2


then we may interpret V^i_Q as the average amount of type i work in queue, and V^i_S as the average amount of type i work in service (why?).

Now we are ready to compute W^1_Q. To do so, consider an arbitrary type 1 arrival. Then

    his delay = amount of type 1 work in the system when he arrives
              + amount of type 2 work in service when he arrives

Taking expectations and using the fact that Poisson arrivals see time averages yields

    W^1_Q = V^1 + V^2_S = λ_1E[S_1]W^1_Q + λ_1E[S_1²]/2 + λ_2E[S_2²]/2    (8.44)

or

    W^1_Q = (λ_1E[S_1²] + λ_2E[S_2²]) / (2(1 − λ_1E[S_1]))    (8.45)

To obtain W^2_Q we first note that since V = V^1 + V^2, we have from Equations (8.42) and (8.43) that

    (λ_1E[S_1²] + λ_2E[S_2²]) / (2(1 − λ_1E[S_1] − λ_2E[S_2]))
        = λ_1E[S_1]W^1_Q + λ_2E[S_2]W^2_Q + λ_1E[S_1²]/2 + λ_2E[S_2²]/2
        = W^1_Q + λ_2E[S_2]W^2_Q    [from Equation (8.44)]

Now, using Equation (8.45), we obtain

    λ_2E[S_2]W^2_Q = [(λ_1E[S_1²] + λ_2E[S_2²])/2] [1/(1 − λ_1E[S_1] − λ_2E[S_2]) − 1/(1 − λ_1E[S_1])]

or

    W^2_Q = (λ_1E[S_1²] + λ_2E[S_2²]) / (2(1 − λ_1E[S_1] − λ_2E[S_2])(1 − λ_1E[S_1]))    (8.46)

Remarks (i) Note that from Equation (8.45), the condition for W^1_Q to be finite is that λ_1E[S_1] < 1, which is independent of the type 2 parameters. (Is this intuitive?) For W^2_Q to be finite, we need, from Equation (8.46), that

    λ_1E[S_1] + λ_2E[S_2] < 1


Since the arrival rate of all customers is λ = λ_1 + λ_2, and the average service time of a customer is (λ_1/λ)E[S_1] + (λ_2/λ)E[S_2], the preceding condition is just that the average arrival rate be less than the average service rate.

(ii) If there are n types of customers, we can solve for V^j, j = 1,...,n, in a similar fashion. First, note that the total amount of work in the system of customers of types 1,...,j is independent of the internal priority rule concerning types 1,...,j and only depends on the fact that each of them is given priority over any customers of types j + 1,...,n. (Why is this? Reason it out!) Hence, V^1 + ··· + V^j is the same as it would be if types 1,...,j were considered as a single type I priority class and types j + 1,...,n as a single type II priority class. Now, from Equations (8.43) and (8.45),

    V^I = (λ_I E[S_I²] + λ_I λ_II E[S_I] E[S_II²]) / (2(1 − λ_I E[S_I]))

where

    λ_I = λ_1 + ··· + λ_j,
    λ_II = λ_{j+1} + ··· + λ_n,
    E[S_I] = Σ_{i=1}^j (λ_i/λ_I)E[S_i],
    E[S_I²] = Σ_{i=1}^j (λ_i/λ_I)E[S_i²],
    E[S_II²] = Σ_{i=j+1}^n (λ_i/λ_II)E[S_i²]

Hence, as V^I = V^1 + ··· + V^j, we have an expression for V^1 + ··· + V^j, for each j = 1,...,n, which then can be solved for the individual V^1, V^2,...,V^n. We now can obtain W^i_Q from Equation (8.43). The result of all this (which we leave for an exercise) is that

    W^i_Q = (λ_1E[S_1²] + ··· + λ_nE[S_n²]) / (2 Π_{j=i−1}^{i} (1 − λ_1E[S_1] − ··· − λ_jE[S_j])),    i = 1,...,n    (8.47)

8.6.3. An M/G/1 Optimization Example

Consider a single-server system where customers arrive according to a Poisson process with rate λ, and where the service times are independent and have distribution function G. Let ρ = λE[S], where S represents a service time random


variable, and suppose that ρ < 1. Suppose further that the server departs whenever she finds the system empty, and returns only when there are n customers waiting; upon her return she serves until the system is again empty, at which point a new cycle begins. Each time the server returns a cost K is incurred, and in addition each customer in the system incurs a holding cost at rate c per unit time. Letting C(n) and T(n) denote, respectively, the total cost incurred in, and the length of, a cycle under this policy, the average cost per unit time is

    A(n) = E[C(n)] / E[T(n)]    (8.48)

Since the idle part of a cycle lasts until n customers have arrived, and the busy part is distributed as the time needed for an M/G/1 busy server to clear n waiting customers, we have

    E[T(n)] = n/λ + nE[S]/(1 − ρ) = n/(λ(1 − ρ))

Also, for i = 1,...,n, let T_i denote the amount of time, during the busy part of the cycle, from the moment a service begins with i customers in the system until the first moment that the system drops to i − 1 customers, and let C_i denote the cost incurred during that interval; T_i thus begins with i customers present and ends


when the i − 1 in queue are the only customers in the system. Thus, K + Σ_{i=1}^n C_i represents the total cost incurred during the busy part of the cycle. In addition, during the idle part of the cycle there will be i customers in the system for an exponential time with rate λ, i = 1,...,n−1, resulting in an expected cost of c(1 + ··· + (n − 1))/λ. Consequently,

    E[C(n)] = K + Σ_{i=1}^n E[C_i] + n(n − 1)c/(2λ)    (8.49)

To find E[C_i], consider the moment when the interval of length T_i begins, and let W_i be the sum of the initial service time plus the sum of the times spent in the system by all the customers that arrive (and are served) until the moment when the interval ends and there are only i − 1 customers in the system. Then,

    C_i = (i − 1)cT_i + cW_i

where the first term refers to the cost incurred due to the i − 1 customers in queue during the interval of length T_i. As it is easy to see that W_i has the same distribution as W_b, the sum of the times spent in the system by all arrivals in an M/G/1 busy period, we obtain

    E[C_i] = (i − 1)cE[S]/(1 − ρ) + cE[W_b]    (8.50)

Using Equation (8.49), this yields

    E[C(n)] = K + n(n − 1)cE[S]/(2(1 − ρ)) + ncE[W_b] + n(n − 1)c/(2λ)
            = K + ncE[W_b] + [n(n − 1)c/(2λ)](ρ/(1 − ρ) + 1)
            = K + ncE[W_b] + n(n − 1)c/(2λ(1 − ρ))

Utilizing the preceding in conjunction with Equation (8.48) shows that

    A(n) = Kλ(1 − ρ)/n + λc(1 − ρ)E[W_b] + c(n − 1)/2    (8.51)

To determine E[W_b], we use the result that the average amount of time spent in the system by a customer in the M/G/1 system is

    W = W_Q + E[S] = λE[S²]/(2(1 − ρ)) + E[S]

However, if we imagine that on day j, j ≥ 1, we earn an amount equal to the total time spent in the system by the jth arrival at the M/G/1 system, then it follows


from renewal reward processes (since everything probabilistically restarts at the end of a busy period) that

    W = E[W_b]/E[N]

where N is the number of customers served in an M/G/1 busy period. Since E[N] = 1/(1 − ρ), we see that

    (1 − ρ)E[W_b] = W = λE[S²]/(2(1 − ρ)) + E[S]

Therefore, using Equation (8.51), we obtain

    A(n) = Kλ(1 − ρ)/n + cλ²E[S²]/(2(1 − ρ)) + cρ + c(n − 1)/2

To determine the optimal value of n, treat n as a continuous variable and differentiate the preceding to obtain

    A′(n) = −Kλ(1 − ρ)/n² + c/2

Setting this equal to 0 and solving yields that the optimal value of n is

    n* = sqrt(2Kλ(1 − ρ)/c)

and the minimal average cost per unit time is

    A(n*) = sqrt(2λK(1 − ρ)c) + cλ²E[S²]/(2(1 − ρ)) + cρ − c/2

It is interesting to see how close we can come to the minimal average cost when we use a simpler policy of the following form: Whenever the server finds the system empty of customers she departs and then returns after a fixed time t has elapsed. Let us say that a new cycle begins each time the server departs. Both the expected costs incurred during the idle and the busy parts of a cycle are obtained by conditioning on N(t), the number of arrivals in the time t that the server is gone. With C̄(t) being the cost incurred during a cycle, we obtain

    E[C̄(t) | N(t)] = K + Σ_{i=1}^{N(t)} E[C_i] + cN(t)·t/2
                   = K + N(t)(N(t) − 1)cE[S]/(2(1 − ρ)) + N(t)cE[W_b] + cN(t)·t/2


The final term of the first equality is the conditional expected cost during the idle time in the cycle and is obtained by using that, given the number of arrivals in the time t, the arrival times are independent and uniformly distributed on (0, t); the second equality used Equation (8.50). Since N(t) is Poisson with mean λt, it follows that E[N(t)(N(t) − 1)] = E[N²(t)] − E[N(t)] = λ²t². Thus, taking the expected value of the preceding gives

    E[C̄(t)] = K + λ²t²cE[S]/(2(1 − ρ)) + λtcE[W_b] + cλt²/2
             = K + cλt²/(2(1 − ρ)) + λtcE[W_b]

Similarly, if T̄(t) is the time of a cycle, then

    E[T̄(t)] = E[E[T̄(t) | N(t)]] = E[t + N(t)E[B]] = t + ρt/(1 − ρ) = t/(1 − ρ)

Hence, the average cost per unit time, call it Ā(t), is

    Ā(t) = E[C̄(t)]/E[T̄(t)] = K(1 − ρ)/t + cλt/2 + cλ(1 − ρ)E[W_b]

Thus, from Equation (8.51), we see that

    Ā(n/λ) − A(n) = c/2

which shows that allowing the return decision to depend on the number presently in the system can reduce the average cost only by the amount c/2.

8.6.4. The M/G/1 Queue with Server Breakdown

Consider a single server queue in which customers arrive according to a Poisson process with rate λ, and where the amount of service time required by each cus-


tomer has distribution G. Suppose, however, that when working the server breaks down at an exponential rate α. That is, the probability that a working server will be able to work for an additional time t without breaking down is e^{−αt}. When the server breaks down, it immediately goes to the repair facility. The repair time is a random variable with distribution H. Suppose that the customer in service when a breakdown occurs has its service continue, when the server returns, from the point it was at when the breakdown occurred. (Therefore, the total amount of time a customer is actually receiving service from a working server has distribution G.)

By letting a customer's "service time" include the time that the customer is waiting for the server to come back from being repaired, the preceding is an M/G/1 queue. If we let T denote the amount of time from when a customer first enters service until it departs the system, then T is a service time random variable of this M/G/1 queue. The average amount of time a customer spends waiting in queue before its service first commences is, thus,

    W_Q = λE[T²]/(2(1 − λE[T]))

To compute E[T] and E[T²], let S, having distribution G, be the service requirement of the customer; let N denote the number of times that the server breaks down while the customer is in service; and let R_1, R_2, ... be the amounts of time the server spends in the repair facility on its successive visits. Then,

    T = Σ_{i=1}^N R_i + S

Conditioning on S yields

    E[T | S = s] = E[Σ_{i=1}^N R_i | S = s] + s,
    Var(T | S = s) = Var(Σ_{i=1}^N R_i | S = s)

Now, a working server always breaks down at an exponential rate α. Therefore, given that a customer requires s units of service time, it follows that the number of server breakdowns while that customer is being served is a Poisson random variable with mean αs. Consequently, conditional on S = s, the random variable


Σ_{i=1}^N R_i is a compound Poisson random variable with Poisson mean αs. Using the results from Examples 3.10 and 3.17, we thus obtain

    E[Σ_{i=1}^N R_i | S = s] = αsE[R],    Var(Σ_{i=1}^N R_i | S = s) = αsE[R²]

where R has the repair distribution H. Therefore,

    E[T | S] = αSE[R] + S = S(1 + αE[R]),    Var(T | S) = αSE[R²]

Thus,

    E[T] = E[E[T | S]] = E[S](1 + αE[R])

and, by the conditional variance formula,

    Var(T) = E[Var(T | S)] + Var(E[T | S]) = αE[S]E[R²] + (1 + αE[R])²Var(S)

Therefore,

    E[T²] = Var(T) + (E[T])² = αE[S]E[R²] + (1 + αE[R])²E[S²]

Consequently, assuming that λE[T] = λE[S](1 + αE[R]) < 1, we obtain

    W_Q = λ{αE[S]E[R²] + (1 + αE[R])²E[S²]} / (2(1 − λE[S](1 + αE[R])))

Other quantities of interest are P_w, the proportion of time the server is working; P_r, the proportion of time it is in repair; and P_I, the proportion of time it is idle.


These quantities can all be obtained by using the queueing cost identity. For instance, if we suppose that customers pay 1 per unit time while actually being served, then

    average rate at which system earns = P_w,
    average amount a customer pays = E[S]

Therefore, the identity yields

    P_w = λE[S]

To determine P_r, suppose a customer whose service is interrupted pays 1 per unit time while the server is being repaired. Then,

    average rate at which system earns = P_r,
    average amount a customer pays = E[Σ_{i=1}^N R_i] = αE[S]E[R]

yielding

    P_r = λαE[S]E[R]

Of course, P_I can be obtained from

    P_I = 1 − P_w − P_r

Remark The quantities P_w and P_r could also have been obtained by first noting that 1 − P_0 = λE[T] is the proportion of time the server is either working or in repair. Thus,

    P_w = λE[T] · E[S]/E[T] = λE[S],
    P_r = λE[T] · (E[T] − E[S])/E[T] = λE[S]αE[R]

8.7. The Model G/M/1

The model G/M/1 assumes that the times between successive arrivals have an arbitrary distribution G. The service times are exponentially distributed with rate μ and there is a single server.

The immediate difficulty in analyzing this model stems from the fact that the number of customers in the system is not informative enough to serve as a


state space. For in summarizing what has occurred up to the present we would need to know not only the number in the system, but also the amount of time that has elapsed since the last arrival (since G is not memoryless). (Why need we not be concerned with the amount of time the person being served has already spent in service?) To get around this problem we shall only look at the system when a customer arrives; and so let us define X_n, n ≥ 1, by

    X_n ≡ the number in the system as seen by the nth arrival

It is easy to see that the process {X_n, n ≥ 1} is a Markov chain. To compute the transition probabilities P_ij for this Markov chain let us first note that, as long as there are customers to be served, the number of services in any length of time t is a Poisson random variable with mean μt. This is true since the time between successive services is exponential and, as we know, this implies that the number of services thus constitutes a Poisson process. Hence,

    P_{i,i+1−j} = ∫_0^∞ e^{−μt} [(μt)^j / j!] dG(t),    j = 0, 1,...,i

which follows since if an arrival finds i in the system, then the next arrival will find i + 1 minus the number served, and the probability that j will be served is easily seen to equal the right side of the preceding (by conditioning on the time between the successive arrivals).

The formula for P_{i0} is a little different (it is the probability that at least i + 1 Poisson events occur in a random length of time having distribution G) and can be obtained from

    P_{i0} = 1 − Σ_{j=0}^i P_{i,i+1−j}

The limiting probabilities π_k, k = 0, 1,..., can be obtained as the unique solution of

    π_k = Σ_i π_i P_{ik},    k ≥ 0,
    Σ_k π_k = 1

which, in this case, reduces to

    π_k = Σ_{i=k−1}^∞ π_i ∫_0^∞ e^{−μt} [(μt)^{i+1−k} / (i + 1 − k)!] dG(t),    k ≥ 1,    (8.52)
    Σ_{k=0}^∞ π_k = 1


(We have not included the equation π_0 = Σ_i π_i P_{i0} since one of the equations is always redundant.)

To solve the preceding, let us try a solution of the form π_k = cβ^k. Substitution into Equation (8.52) leads to

    cβ^k = c Σ_{i=k−1}^∞ β^i ∫_0^∞ e^{−μt} [(μt)^{i+1−k} / (i + 1 − k)!] dG(t)
         = c ∫_0^∞ e^{−μt} Σ_{i=k−1}^∞ β^i (μt)^{i+1−k} / (i + 1 − k)! dG(t)    (8.53)

However,

    Σ_{i=k−1}^∞ β^i (μt)^{i+1−k} / (i + 1 − k)! = β^{k−1} Σ_{j=0}^∞ (βμt)^j / j! = β^{k−1} e^{βμt}

and thus Equation (8.53) reduces to

    β^k = β^{k−1} ∫_0^∞ e^{−μt(1−β)} dG(t)

or

    β = ∫_0^∞ e^{−μt(1−β)} dG(t)    (8.54)

The constant c can be obtained from Σ_k π_k = 1, which implies that

    c Σ_{k=0}^∞ β^k = 1    or    c = 1 − β

As (π_k) is the unique solution to Equation (8.52), and π_k = (1 − β)β^k satisfies it, it follows that

    π_k = (1 − β)β^k,    k = 0, 1,...

where β is the solution of Equation (8.54). [It can be shown that if the mean of G is greater than the mean service time 1/μ, then there is a unique value of β


satisfying Equation (8.54) which is between 0 and 1.] The exact value of β usually can only be obtained by numerical methods.

As π_k is the limiting probability that an arrival sees k customers, it is just the a_k as defined in Section 8.2. Hence,

    a_k = (1 − β)β^k,    k ≥ 0    (8.55)

We can obtain W by conditioning on the number in the system when a customer arrives. This yields

    W = Σ_k E[time in system | arrival sees k](1 − β)β^k
      = Σ_k [(k + 1)/μ](1 − β)β^k    (since if an arrival sees k then he spends k + 1 service periods in the system)
      = 1/(μ(1 − β))    [by using Σ_{k=0}^∞ kx^k = x/(1 − x)²]

and

    W_Q = W − 1/μ = β/(μ(1 − β)),
    L = λW = λ/(μ(1 − β)),    (8.56)
    L_Q = λW_Q = λβ/(μ(1 − β))

where λ is the reciprocal of the mean interarrival time. That is,

    1/λ = ∫_0^∞ x dG(x)

In fact, in exactly the same manner as shown for the M/M/1 in Section 8.3.1 and Exercise 4 we can show that

    W* is exponential with rate μ(1 − β),
    W*_Q = 0 with probability 1 − β, and exponential with rate μ(1 − β) with probability β

where W* and W*_Q are the amounts of time that a customer spends in system and queue, respectively (their means are W and W_Q).

Whereas a_k = (1 − β)β^k is the probability that an arrival sees k in the system, it is not equal to the proportion of time during which there are k in the system (since the arrival process is not Poisson). To obtain the P_k we first note that the


rate at which the number in the system changes from k − 1 to k must equal the rate at which it changes from k to k − 1 (why?). Now the rate at which it changes from k − 1 to k is equal to the arrival rate λ multiplied by the proportion of arrivals finding k − 1 in the system. That is,

    rate number in system goes from k − 1 to k = λa_{k−1}

Similarly, the rate at which the number in the system changes from k to k − 1 is equal to the proportion of time during which there are k in the system multiplied by the (constant) service rate. That is,

    rate number in system goes from k to k − 1 = P_k μ

Equating these rates yields

    P_k = (λ/μ)a_{k−1},    k ≥ 1

and so, from Equation (8.55),

    P_k = (λ/μ)(1 − β)β^{k−1},    k ≥ 1

and, as P_0 = 1 − Σ_{k=1}^∞ P_k, we obtain

    P_0 = 1 − λ/μ

Remarks In the foregoing analysis we guessed at a solution of the stationary probabilities of the Markov chain of the form π_k = cβ^k, then verified such a solution by substituting into the stationary Equation (8.52). However, it could have been argued directly that the stationary probabilities of the Markov chain are of this form. To do so, define β_i to be the expected number of times that state i + 1 is visited in the Markov chain between two successive visits to state i, i ≥ 0. Now it is not difficult to see (and we will let you argue it out for yourself) that

    β_0 = β_1 = β_2 = ··· = β

Now it can be shown by using renewal reward processes that

    π_{i+1} = E[number of visits to state i + 1 in an i–i cycle] / E[number of transitions in an i–i cycle]
            = β_i / (1/π_i)


and so,

    π_{i+1} = β_i π_i = βπ_i,    i ≥ 0

implying, since Σ_{i=0}^∞ π_i = 1, that

    π_i = β^i(1 − β),    i ≥ 0

8.7.1. The G/M/1 Busy and Idle Periods

Suppose that an arrival has just found the system empty, and so initiates a busy period, and let N denote the number of customers served in that busy period. Since the Nth arrival (after the initiator of the busy period) will also find the system empty, it follows that N is the number of transitions for the Markov chain (of Section 8.7) to go from state 0 to state 0. Hence, 1/E[N] is the proportion of transitions that take the Markov chain into state 0; or equivalently, it is the proportion of arrivals that find the system empty. Therefore,

    E[N] = 1/a_0 = 1/(1 − β)

Also, as the next busy period begins after the Nth interarrival, it follows that the cycle time (that is, the sum of a busy and idle period) is equal to the time until the Nth interarrival. In other words, the sum of a busy and idle period can be expressed as the sum of N interarrival times. Thus, if T_i is the ith interarrival time after the busy period begins, then

    E[Busy] + E[Idle] = E[Σ_{i=1}^N T_i]
                      = E[N]E[T]    (by Wald's equation)
                      = 1/(λ(1 − β))    (8.57)

For a second relation between E[Busy] and E[Idle], we can use the same argument as in Section 8.5.3 to conclude that

    1 − P_0 = E[Busy]/(E[Idle] + E[Busy])

and since P_0 = 1 − λ/μ, we obtain, upon combining this with (8.57), that

    E[Busy] = 1/(μ(1 − β)),
    E[Idle] = (μ − λ)/(λμ(1 − β))
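The numerical solution of Equation (8.54) mentioned earlier can be done by fixed-point iteration, after which every G/M/1 quantity above follows in closed form. The sketch below is illustrative (the function name `gm1_measures` is ours); iterating β ← G̃(μ(1 − β)) upward from 0 converges to the root in (0, 1) in the cases tried, though the text does not prescribe a particular numerical method.

```python
def gm1_measures(laplace_G, lam, mu, tol=1e-12):
    """G/M/1 quantities from Equations (8.54)-(8.57).

    laplace_G(s) must return the integral of exp(-s t) dG(t) for the
    interarrival distribution G; lam is 1 / (mean interarrival time).
    """
    beta = 0.0                          # iterate beta <- G~(mu(1 - beta)) from below
    for _ in range(100000):
        new = laplace_G(mu * (1.0 - beta))
        if abs(new - beta) < tol:
            beta = new
            break
        beta = new
    else:
        raise RuntimeError("fixed-point iteration did not converge")
    return {
        "beta": beta,
        "W": 1.0 / (mu * (1.0 - beta)),
        "WQ": beta / (mu * (1.0 - beta)),
        "E[N]": 1.0 / (1.0 - beta),                    # customers per busy period
        "E[Busy]": 1.0 / (mu * (1.0 - beta)),
        "E[Idle]": (mu - lam) / (lam * mu * (1.0 - beta)),
    }

# Sanity check: exponential interarrivals of rate 1 make this an M/M/1; the
# transform is 1/(1 + s), and with mu = 4 Equation (8.54) gives beta = lam/mu = 1/4.
mm1 = gm1_measures(lambda s: 1.0 / (1.0 + s), 1.0, 4.0)
```

For a genuinely non-Poisson example one could pass the transform of an Erlang interarrival distribution, e.g. `lambda s: (2.0 / (2.0 + s)) ** 2`.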


8.8. A Finite Source Model

Consider a system of m machines, whose working times are independent exponential random variables with rate λ. Upon failure, a machine instantly goes to a repair facility that consists of a single repairperson. If the repairperson is free, repair begins on the machine; otherwise, the machine joins the queue of failed machines. When a machine is repaired it becomes a working machine, and repair begins on a new machine from the queue of failed machines (provided the queue is nonempty). The successive repair times are independent random variables having density function g, with mean

    μ_R = ∫_0^∞ xg(x) dx

To analyze this system, so as to determine such quantities as the average number of machines that are down and the average time that a machine is down, we will exploit the exponentially distributed working times to obtain a Markov chain. Specifically, let X_n denote the number of failed machines immediately after the nth repair occurs, n ≥ 1. Now, if X_n = i > 0, then the situation when the nth repair has just occurred is that repair is about to begin on a machine, there are i − 1 other machines waiting for repair, and there are m − i working machines, each of which will (independently) continue to work for an exponential time with rate λ. Similarly, if X_n = 0, then all m machines are working and will (independently) continue to do so for exponentially distributed times with rate λ. Consequently, any information about earlier states of the system will not affect the probability distribution of the number of down machines at the moment of the next repair completion; hence, {X_n, n ≥ 1} is a Markov chain. To determine its transition probabilities P_{i,j}, suppose first that i > 0.
Conditioning on R, the length of the next repair time, and making use of the independence of the m − i remaining working times, yields that for j ≤ m − i

    P_{i,i−1+j} = P{j failures during R}
                = ∫_0^∞ P{j failures during R | R = r} g(r) dr
                = ∫_0^∞ (m−i choose j)(1 − e^{−λr})^j (e^{−λr})^{m−i−j} g(r) dr

If i = 0, then, because the next repair will not begin until one of the machines fails,

    P_{0,j} = P_{1,j},    j ≤ m − 1


Let π_j, j = 0,...,m−1, denote the stationary probabilities of this Markov chain. That is, they are the unique solution of

    π_j = Σ_i π_i P_{i,j},
    Σ_{j=0}^{m−1} π_j = 1

Therefore, after explicitly determining the transition probabilities and solving the preceding equations, we would know the value of π_0, the proportion of repair completions that leaves all machines working. Let us say that the system is "on" when all machines are working and "off" otherwise. (Thus, the system is on when the repairperson is idle and off when he is busy.) As all machines are working when the system goes back on, it follows from the lack of memory property of the exponential that the system probabilistically starts over when it goes on. Hence, this on–off system is an alternating renewal process. Suppose that the system has just become on, thus starting a new cycle, and let R_i, i ≥ 1, be the time of the ith repair from that moment. Also, let N denote the number of repairs in the off (busy) time of the cycle. Then, it follows that B, the length of the off period, can be expressed as

    B = Σ_{i=1}^N R_i

Although N is not independent of the sequence R_1, R_2,..., it is easy to check that it is a stopping time for this sequence, and thus by Wald's equation (see Exercise 13 of Chapter 7) we have that

    E[B] = E[N]E[R] = E[N]μ_R

Also, since an on time will last until one of the machines fails, and since the minimum of independent exponential random variables is exponential with a rate equal to the sum of their rates, it follows that E[I], the mean on (idle) time in a cycle, is given by

    E[I] = 1/(mλ)

Hence, P_B, the proportion of time that the repairperson is busy, satisfies

    P_B = E[N]μ_R / (E[N]μ_R + 1/(mλ))


However, since, on average, one out of every $E[N]$ repair completions will leave all machines working, it follows that
\[
\pi_0 = \frac{1}{E[N]}
\]
Consequently,
\[
P_B = \frac{\mu_R}{\mu_R + \pi_0/(m\lambda)} \tag{8.58}
\]
Now focus attention on one of the machines, call it machine number 1, and let $P_{1,R}$ denote the proportion of time that machine 1 is being repaired. Since the proportion of time that the repairperson is busy is $P_B$, and since all machines fail at the same rate and have the same repair distribution, it follows that
\[
P_{1,R} = \frac{P_B}{m} = \frac{\mu_R}{m\mu_R + \pi_0/\lambda} \tag{8.59}
\]
However, machine 1 alternates between time periods when it is working, when it is waiting in queue, and when it is in repair. Let $W_i, Q_i, S_i$ denote, respectively, the $i$th working time, the $i$th queueing time, and the $i$th repair time of machine 1, $i \ge 1$. Then the proportion of time that machine 1 is being repaired during its first $n$ working-queue-repair cycles is
\[
\text{proportion of time in the first } n \text{ cycles that machine 1 is being repaired}
= \frac{\sum_{i=1}^n S_i}{\sum_{i=1}^n W_i + \sum_{i=1}^n Q_i + \sum_{i=1}^n S_i}
= \frac{\sum_{i=1}^n S_i/n}{\sum_{i=1}^n W_i/n + \sum_{i=1}^n Q_i/n + \sum_{i=1}^n S_i/n}
\]
Letting $n \to \infty$ and using the strong law of large numbers to conclude that the averages of the $W_i$ and of the $S_i$ converge, respectively, to $1/\lambda$ and $\mu_R$, yields
\[
P_{1,R} = \frac{\mu_R}{1/\lambda + \bar{Q} + \mu_R}
\]
where $\bar{Q}$ is the average amount of time that machine 1 spends in queue when it fails. Using Equation (8.59), the preceding gives
\[
\frac{\mu_R}{m\mu_R + \pi_0/\lambda} = \frac{\mu_R}{1/\lambda + \bar{Q} + \mu_R}
\]
or, equivalently,
\[
\bar{Q} = (m-1)\mu_R - (1 - \pi_0)/\lambda
\]
Moreover, since all machines are probabilistically equivalent, it follows that $\bar{Q}$ is equal to $W_Q$, the average amount of time that a failed machine spends in queue.
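The whole computation can be carried out numerically. The sketch below is an illustration under the assumption of exponential repair times with rate $\alpha$ (so $\mu_R = 1/\alpha$ and the transition integrals have a closed form); the function names are invented for the example. It builds the embedded chain observed at repair completions, finds $\pi$ by power iteration, and then evaluates $\pi_0$, $P_B$ from Equation (8.58), and $\bar{Q}$:

```python
from math import comb

def repair_chain_stationary(m, lam, alpha, iters=20000):
    """Embedded chain of the single-repairperson model, observed at
    repair completions; state i = number of failed machines left
    behind, i = 0, ..., m-1.  Repair times are taken (for
    illustration) to be exponential with rate alpha, so mu_R = 1/alpha
    and the conditioning integral has the closed form used in q()."""
    def q(n, j):
        # P{j of n working machines fail during one repair}
        return sum(comb(n, j) * comb(j, l) * (-1) ** l
                   * alpha / (alpha + lam * (n - j + l))
                   for l in range(j + 1))

    P = [[0.0] * m for _ in range(m)]
    for i in range(1, m):
        for j in range(m - i + 1):
            P[i][i - 1 + j] = q(m - i, j)
    P[0] = P[1][:]            # next repair starts at the first failure

    pi = [1.0 / m] * m        # power iteration for pi = pi P
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]

    mu_R = 1.0 / alpha
    P_B = mu_R / (mu_R + pi[0] / (m * lam))       # Equation (8.58)
    Q_bar = (m - 1) * mu_R - (1 - pi[0]) / lam    # mean queueing delay
    return pi, P_B, Q_bar

pi, P_B, Q_bar = repair_chain_stationary(m=5, lam=0.5, alpha=2.0)
print(round(sum(pi), 6), round(P_B, 4), round(Q_bar, 4))
```

With these particular numbers $(m-1)\mu_R = (1)/\lambda = 2$, so the formula for $\bar{Q}$ collapses to $2\pi_0$, which makes a convenient internal check.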


To determine the average number of machines in queue, we will make use of the basic queueing identity
\[
L_Q = \lambda_a W_Q = \lambda_a \bar{Q}
\]
where $\lambda_a$ is the average rate at which machines fail. To determine $\lambda_a$, again focus attention on machine 1 and suppose that we earn one per unit time whenever machine 1 is being repaired. It then follows from the basic cost identity of Equation (8.1) that
\[
P_{1,R} = r_1 \mu_R
\]
where $r_1$ is the average rate at which machine 1 fails. Thus, from Equation (8.59), we obtain
\[
r_1 = \frac{1}{m\mu_R + \pi_0/\lambda}
\]
Because all $m$ machines fail at the same rate, the preceding implies that
\[
\lambda_a = m r_1 = \frac{m}{m\mu_R + \pi_0/\lambda}
\]
which gives that the average number of machines in queue is
\[
L_Q = \frac{m(m-1)\mu_R - m(1-\pi_0)/\lambda}{m\mu_R + \pi_0/\lambda}
\]
Since the average number of machines being repaired is $P_B$, the preceding, along with Equation (8.58), shows that the average number of down machines is
\[
L = L_Q + P_B = \frac{m^2\mu_R - m(1-\pi_0)/\lambda}{m\mu_R + \pi_0/\lambda}
\]

8.9. Multiserver Queues

By and large, systems that have more than one server are much more difficult to analyze than those with a single server. In Section 8.9.1 we start with a Poisson arrival system in which no queue is allowed, and then consider in Section 8.9.2 the infinite capacity M/M/k system. For both of these models we are able to present the limiting probabilities. In Section 8.9.3 we consider the model G/M/k. The analysis here is similar to that of the G/M/1 (Section 8.7) except that in place of a single quantity $\beta$ given as the solution of an integral equation, we have $k$ such quantities. We end in Section 8.9.4 with the model M/G/k, for which unfortunately our previous technique (used in M/G/1) no longer enables us to derive $W_Q$, and we content ourselves with an approximation.


8.9.1. Erlang's Loss System

A loss system is a queueing system in which arrivals that find all servers busy do not enter but rather are lost to the system. The simplest such system is the M/M/k loss system in which customers arrive according to a Poisson process having rate $\lambda$, enter the system if at least one of the $k$ servers is free, and then spend an exponential amount of time with rate $\mu$ being served. The balance equations for this system are

State $0$: $\lambda P_0 = \mu P_1$
State $1$: $(\lambda + \mu)P_1 = 2\mu P_2 + \lambda P_0$
State $2$: $(\lambda + 2\mu)P_2 = 3\mu P_3 + \lambda P_1$
State $i$, $0 < i < k$: $(\lambda + i\mu)P_i = (i+1)\mu P_{i+1} + \lambda P_{i-1}$
State $k$: $k\mu P_k = \lambda P_{k-1}$

Rewriting gives
\[
\lambda P_i = (i+1)\mu P_{i+1}, \qquad i = 0, 1, \ldots, k-1
\]
or
\[
P_{i+1} = \frac{\lambda}{(i+1)\mu}\, P_i, \qquad i = 0, 1, \ldots, k-1
\]
Iterating yields $P_i = \frac{(\lambda/\mu)^i}{i!}\, P_0$, and using $\sum_{i=0}^{k} P_i = 1$ then gives
\[
P_i = \frac{(\lambda/\mu)^i/i!}{\sum_{j=0}^{k} (\lambda/\mu)^j/j!}, \qquad i = 0, 1, \ldots, k
\]


Since $E[S] = 1/\mu$, where $E[S]$ is the mean service time, the preceding can be written as
\[
P_i = \frac{(\lambda E[S])^i/i!}{\sum_{j=0}^{k} (\lambda E[S])^j/j!}, \qquad i = 0, 1, \ldots, k \tag{8.60}
\]
Consider now the same system except that the service distribution is general; that is, consider the M/G/k with no queue allowed. This model is sometimes called the Erlang loss system. It can be shown (though the proof is advanced) that Equation (8.60) (which is called Erlang's loss formula) remains valid for this more general system.

8.9.2. The M/M/k Queue

The M/M/k infinite capacity queue can be analyzed by the balance equation technique. We leave it for you to verify that
\[
P_i =
\begin{cases}
\dfrac{(\lambda/\mu)^i}{i!}\, P_0, & i \le k \\[2mm]
\dfrac{(\lambda/k\mu)^i k^k}{k!}\, P_0, & i > k
\end{cases}
\]
where
\[
P_0 = \left[ \sum_{i=0}^{k-1} \frac{(\lambda/\mu)^i}{i!} + \frac{(\lambda/\mu)^k}{k!\,(1 - \lambda/k\mu)} \right]^{-1}
\]
We see from the preceding that we need to impose the condition $\lambda < k\mu$.
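Equation (8.60) is easy to evaluate numerically. A small sketch (illustrative only; the function name is ours) computes the blocking probability $P_k$, which depends on the service distribution only through $\lambda E[S]$:

```python
from math import factorial

def erlang_loss(lam, ES, k):
    """Erlang's loss formula, Equation (8.60): the stationary
    probability that an arrival to the M/G/k loss system finds all
    k servers busy.  Only the offered load lam*E[S] matters, not the
    shape of the service distribution."""
    a = lam * ES  # offered load
    weights = [a ** i / factorial(i) for i in range(k + 1)]
    return weights[k] / sum(weights)

# e.g. rate-2 Poisson arrivals, mean service time 1, three servers
print(round(erlang_loss(2.0, 1.0, 3), 4))   # 4/19, about 0.2105
```

For $k = 1$ the formula reduces to $(\lambda E[S])/(1 + \lambda E[S])$, which is the verification asked for in Exercise 47.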


8.9.3. The G/M/k Queue

In this model the interarrival times have an arbitrary distribution $G$, while each of the $k$ servers is exponential with rate $\mu$. As in Section 8.7, let $X_n$ denote the number of customers in the system as seen by the $n$th arrival; $\{X_n,\, n \ge 0\}$ is a Markov chain. To derive the transition probabilities of the Markov chain, it helps to first note the relationship
\[
X_{n+1} = X_n + 1 - Y_n, \qquad n \ge 0
\]
where $Y_n$ denotes the number of departures during the interarrival time between the $n$th and $(n+1)$st arrival. The transition probabilities $P_{ij}$ can now be calculated as follows:

Case 1: $j > i + 1$. In this case it easily follows that $P_{ij} = 0$.

Case 2: $j \le i + 1 \le k$. In this case if an arrival finds $i$ in the system, then, as $i < k$, the arrival immediately enters service, as have all of the other $i$ customers present. Hence, during an interarrival time of length $t$, each of these $i + 1$ customers will, independently, have completed service with probability $1 - e^{-\mu t}$, giving
\[
P_{ij} = \int_0^\infty \binom{i+1}{j} (e^{-\mu t})^j (1 - e^{-\mu t})^{i+1-j} \, dG(t)
\]

Case 3: $i + 1 \ge k$, $j \ge k$. In this case all $k$ servers remain busy throughout the interarrival time, so departures occur according to a Poisson process with rate $k\mu$, giving
\[
P_{ij} = \int_0^\infty e^{-k\mu t}\, \frac{(k\mu t)^{i+1-j}}{(i+1-j)!} \, dG(t)
\]

Case 4: $i + 1 \ge k > j$. In this case since when all servers are busy the departure process is a Poisson process, it follows that the length of time until there will only be $k$ in the system will have a gamma distribution with parameters $i + 1 - k$, $k\mu$ (the time until


$i + 1 - k$ events of a Poisson process with rate $k\mu$ occur is gamma distributed with parameters $i + 1 - k$, $k\mu$). Conditioning first on the interarrival time and then on the time until there are only $k$ in the system (call this latter random variable $T_k$) yields
\[
\begin{aligned}
P_{ij} &= \int_0^\infty P\{i + 1 - j \text{ departures in time } t\} \, dG(t) \\
&= \int_0^\infty \int_0^t P\{i + 1 - j \text{ departures in } t \mid T_k = s\}\, k\mu e^{-k\mu s} \frac{(k\mu s)^{i-k}}{(i-k)!} \, ds \, dG(t) \\
&= \int_0^\infty \int_0^t \binom{k}{j} (1 - e^{-\mu(t-s)})^{k-j} (e^{-\mu(t-s)})^j\, k\mu e^{-k\mu s} \frac{(k\mu s)^{i-k}}{(i-k)!} \, ds \, dG(t)
\end{aligned}
\]
where the last equality follows since of the $k$ people in service at time $s$ the number whose service will end by time $t$ is binomial with parameters $k$ and $1 - e^{-\mu(t-s)}$.

We now can verify, either by a direct substitution into the equations $\pi_j = \sum_i \pi_i P_{ij}$ or by the same argument as presented in the remark at the end of Section 8.7, that the limiting probabilities of this Markov chain are of the form
\[
\pi_{k-1+j} = c\beta^j, \qquad j = 0, 1, \ldots
\]
Substitution into any of the equations $\pi_j = \sum_i \pi_i P_{ij}$ when $j > k$ yields that $\beta$ is given as the solution of
\[
\beta = \int_0^\infty e^{-k\mu t(1-\beta)} \, dG(t)
\]
The values $\pi_0, \pi_1, \ldots, \pi_{k-2}$ can be obtained by recursively solving the first $k - 1$ of the steady-state equations, and $c$ can then be computed by using $\sum_{i=0}^\infty \pi_i = 1$.

If we let $W_Q^*$ denote the amount of time that a customer spends in queue, then in exactly the same manner as in G/M/1 we can show that
\[
W_Q^* =
\begin{cases}
0, & \text{with probability } \sum_{i=0}^{k-1} \pi_i = 1 - \dfrac{c\beta}{1-\beta} \\[2mm]
\operatorname{Exp}(k\mu(1-\beta)), & \text{with probability } \sum_{i=k}^{\infty} \pi_i = \dfrac{c\beta}{1-\beta}
\end{cases}
\]
where $\operatorname{Exp}(k\mu(1-\beta))$ is an exponential random variable with rate $k\mu(1-\beta)$.

8.9.4. The M/G/k Queue

In this section we consider the M/G/k system in which customers arrive at a Poisson rate $\lambda$ and are served by any of $k$ servers, each of whom has the service


distribution $G$. If we attempt to mimic the analysis presented in Section 8.5 for the M/G/1 system, then we would start with the basic identity
\[
V = \lambda E[S] W_Q + \lambda E[S^2]/2 \tag{8.61}
\]
and then attempt to derive a second equation relating $V$ and $W_Q$.

Now if we consider an arbitrary arrival, then we have the following identity:
\[
\text{work in system when customer arrives} = k \times \text{time customer spends in queue} + R \tag{8.62}
\]
where $R$ is the sum of the remaining service times of all other customers in service at the moment when our arrival enters service.

The foregoing follows because while the arrival is waiting in queue, work is being processed at a rate $k$ per unit time (since all servers are busy). Thus, an amount of work $k \times$ time in queue is processed while he waits in queue. Now, all of this work was present when he arrived, and in addition the remaining work on those still being served when he enters service was also present when he arrived, so we obtain Equation (8.62). For an illustration, suppose that there are three servers, all of whom are busy when the customer arrives. Suppose, in addition, that there are no other customers in the system and also that the remaining service times of the three people in service are 3, 6, and 7. Hence, the work seen by the arrival is $3 + 6 + 7 = 16$. Now the arrival will spend 3 time units in queue, and at the moment he enters service, the remaining times of the other two customers are $6 - 3 = 3$ and $7 - 3 = 4$. Hence, $R = 3 + 4 = 7$, and as a check of Equation (8.62) we see that $16 = 3 \times 3 + 7$.

Taking expectations of Equation (8.62) and using the fact that Poisson arrivals see time averages, we obtain
\[
V = k W_Q + E[R]
\]
which, along with Equation (8.61), would enable us to solve for $W_Q$ if we could compute $E[R]$. However, there is no known method for computing $E[R]$, and in fact there is no known exact formula for $W_Q$.
The following approximation for $W_Q$ was obtained in Reference 6 by using the foregoing approach and then approximating $E[R]$:
\[
W_Q \approx \frac{\lambda^k E[S^2]\,(E[S])^{k-1}}{2(k-1)!\,(k - \lambda E[S])^2 \left[ \displaystyle\sum_{n=0}^{k-1} \frac{(\lambda E[S])^n}{n!} + \frac{(\lambda E[S])^k}{(k-1)!\,(k - \lambda E[S])} \right]} \tag{8.63}
\]
The preceding approximation has been shown to be quite close to $W_Q$ when the service distribution is gamma. It is also exact when $G$ is exponential.
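Approximation (8.63) is straightforward to compute. The sketch below (an illustration; the function names are ours) evaluates it and, as a sanity check on the claim that it is exact for exponential $G$, compares it with the exact M/M/k value of $W_Q$ obtained from the limiting probabilities of Section 8.9.2:

```python
from math import factorial

def wq_mgk_approx(lam, ES, ES2, k):
    """Approximation (8.63) for W_Q in the M/G/k queue.
    Requires lam * E[S] < k."""
    a = lam * ES
    bracket = (sum(a ** n / factorial(n) for n in range(k))
               + a ** k / (factorial(k - 1) * (k - a)))
    return (lam ** k * ES2 * ES ** (k - 1)) / (
        2 * factorial(k - 1) * (k - a) ** 2 * bracket)

def wq_mmk_exact(lam, mu, k):
    """Exact W_Q for the M/M/k queue (W_Q = L_Q / lam), using the
    limiting probabilities of Section 8.9.2."""
    a, rho = lam / mu, lam / (k * mu)
    P0 = 1.0 / (sum(a ** n / factorial(n) for n in range(k))
                + a ** k / (factorial(k) * (1 - rho)))
    LQ = P0 * a ** k * rho / (factorial(k) * (1 - rho) ** 2)
    return LQ / lam

# exponential service with rate mu: E[S] = 1/mu, E[S^2] = 2/mu^2
lam, mu, k = 3.0, 1.0, 4
print(wq_mgk_approx(lam, 1 / mu, 2 / mu ** 2, k))
print(wq_mmk_exact(lam, mu, k))   # the two values agree
```

For $k = 1$ the bracket collapses to $1/(1 - \lambda E[S])$ and (8.63) reduces to the Pollaczek-Khintchine formula $\lambda E[S^2] / (2(1 - \lambda E[S]))$ for the M/G/1 queue.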


Exercises

1. For the M/M/1 queue, compute
(a) the expected number of arrivals during a service period and
(b) the probability that no customers arrive during a service period.

Hint: "Condition."

*2. Machines in a factory break down at an exponential rate of six per hour. There is a single repairman who fixes machines at an exponential rate of eight per hour. The cost incurred in lost production when machines are out of service is $10 per hour per machine. What is the average cost rate incurred due to failed machines?

3. The manager of a market can hire either Mary or Alice. Mary, who gives service at an exponential rate of 20 customers per hour, can be hired at a rate of $3 per hour. Alice, who gives service at an exponential rate of 30 customers per hour, can be hired at a rate of $C per hour. The manager estimates that, on the average, each customer's time is worth $1 per hour and should be accounted for in the model. If customers arrive at a Poisson rate of 10 per hour, then
(a) what is the average cost per hour if Mary is hired? if Alice is hired?
(b) find C if the average cost per hour is the same for Mary and Alice.

4. Suppose that a customer of the M/M/1 system spends the amount of time $x > 0$ waiting in queue before entering service.
(a) Show that, conditional on the preceding, the number of other customers that were in the system when the customer arrived is distributed as $1 + P$, where $P$ is a Poisson random variable with mean $\lambda x$.
(b) Let $W_Q^*$ denote the amount of time that an M/M/1 customer spends in queue. As a byproduct of your analysis in part (a), show that
\[
P\{W_Q^* \le x\} =
\begin{cases}
1 - \dfrac{\lambda}{\mu}, & \text{if } x = 0 \\[2mm]
1 - \dfrac{\lambda}{\mu} + \dfrac{\lambda}{\mu}\,(1 - e^{-(\mu-\lambda)x}), & \text{if } x > 0
\end{cases}
\]

5. It follows from Exercise 4 that if, in the M/M/1 model, $W_Q^*$ is the amount of time that a customer spends waiting in queue, then
\[
W_Q^* =
\begin{cases}
0, & \text{with probability } 1 - \lambda/\mu \\
\operatorname{Exp}(\mu - \lambda), & \text{with probability } \lambda/\mu
\end{cases}
\]
where $\operatorname{Exp}(\mu - \lambda)$ is an exponential random variable with rate $\mu - \lambda$. Using this, find $\operatorname{Var}(W_Q^*)$.


6. Two customers move about among three servers. Upon completion of service at server $i$, the customer leaves that server and enters service at whichever of the other two servers is free. (Therefore, there are always two busy servers.) If the service times at server $i$ are exponential with rate $\mu_i$, $i = 1, 2, 3$, what proportion of time is server $i$ idle?

*7. Show that $W$ is smaller in an M/M/1 model having arrivals at rate $\lambda$ and service at rate $2\mu$ than it is in a two-server M/M/2 model with arrivals at rate $\lambda$ and with each server at rate $\mu$. Can you give an intuitive explanation for this result? Would it also be true for $W_Q$?

8. A group of $n$ customers moves around among two servers. Upon completion of service, the served customer then joins the queue (or enters service if the server is free) at the other server. All service times are exponential with rate $\mu$. Find the proportion of time that there are $j$ customers at server 1, $j = 0, \ldots, n$.

9. A facility produces items according to a Poisson process with rate $\lambda$. However, it has shelf space for only $k$ items and so it shuts down production whenever $k$ items are present. Customers arrive at the facility according to a Poisson process with rate $\mu$. Each customer wants one item and will immediately depart either with the item or empty handed if there is no item available.
(a) Find the proportion of customers that go away empty handed.
(b) Find the average time that an item is on the shelf.
(c) Find the average number of items on the shelf.
Suppose now that when a customer does not find any available items it joins the "customers' queue" as long as there are no more than $n - 1$ other customers waiting at that time. If there are $n$ waiting customers then the new arrival departs without an item.
(d) Set up the balance equations.
(e) In terms of the solution of the balance equations, what is the average number of customers in the system?

10. A group of $m$ customers frequents a single-server station in the following manner.
When a customer arrives, he or she either enters service if the server is free or joins the queue otherwise. Upon completing service the customer departs the system, but then returns after an exponential time with rate $\theta$. All service times are exponentially distributed with rate $\mu$.
(a) Define states and set up the balance equations.
In terms of the solution of the balance equations, find
(b) the average rate at which customers enter the station.
(c) the average time that a customer spends in the station per visit.


11. Consider a single-server queue with Poisson arrivals and exponential service times having the following variation: Whenever a service is completed, a departure occurs only with probability $\alpha$. With probability $1 - \alpha$ the customer, instead of leaving, joins the end of the queue. Note that a customer may be serviced more than once.
(a) Set up the balance equations and solve for the steady-state probabilities, stating conditions for them to exist.
(b) Find the expected waiting time of a customer from the time he arrives until he enters service for the first time.
(c) What is the probability that a customer enters service exactly $n$ times, $n = 1, 2, \ldots$?
(d) What is the expected amount of time that a customer spends in service (which does not include the time he spends waiting in line)?

Hint: Use part (c).

(e) What is the distribution of the total length of time a customer spends being served?

Hint: Is it memoryless?

12. Exponential queueing systems in which the state is the number of customers in the system are known as birth and death queueing systems. For such a system, let $\lambda_n$ denote the rate at which a new customer joins the system when it is in state $n$, and let $\mu_n$ denote the rate at which there is a departure from the system when in state $n$.
(a) Give the quantities $\lambda_n$ and $\mu_n$ for the M/M/1 queue with finite capacity $N$.
(b) Write down the balance equations.
(c) Show how the balance equations can be reduced to the set of equations
\[
\lambda_n P_n = \mu_{n+1} P_{n+1}, \qquad n \ge 0
\]
(d) Give a direct argument for the preceding equations.
(e) Solve the preceding equations, and in doing so, give the condition that is needed for there to be a solution.
(f) What is the average arrival rate $\lambda_a$?
(g) What is the average amount of time that a customer spends in the system?

*13. A supermarket has two exponential checkout counters, each operating at rate $\mu$. Arrivals are Poisson at rate $\lambda$. The counters operate in the following way:
(i) One queue feeds both counters.


(ii) One counter is operated by a permanent checker and the other by a stock clerk who instantaneously begins checking whenever there are two or more customers in the system. The clerk returns to stocking whenever he completes a service, and there are fewer than two customers in the system.
(a) Let $P_n$ = proportion of time there are $n$ in the system. Set up equations for $P_n$ and solve.
(b) At what rate does the number in the system go from 0 to 1? From 2 to 1?
(c) What proportion of time is the stock clerk checking?

Hint: Be a little careful when there is one in the system.

14. Customers arrive at a single-service facility at a Poisson rate of 40 per hour. When two or fewer customers are present, a single attendant operates the facility, and the service time for each customer is exponentially distributed with a mean value of two minutes. However, when there are three or more customers at the facility, the attendant is joined by an assistant and, working together, they reduce the mean service time to one minute. Assuming a system capacity of four customers,
(a) what proportion of time are both servers free?
(b) each man is to receive a salary proportional to the amount of time he is actually at work servicing customers, the rate being the same for both. If together they earn $100 per day, how should this money be split?

15. Consider a sequential-service system consisting of two servers, A and B. Arriving customers will enter this system only if server A is free. If a customer does enter, then he is immediately served by server A. When his service by A is completed, he then goes to B if B is free, or if B is busy, he leaves the system. Upon completion of service at server B, the customer departs.
Assuming that the (Poisson) arrival rate is two customers an hour, and that A and B serve at respective (exponential) rates of four and two customers an hour,
(a) what proportion of customers enter the system?
(b) what proportion of entering customers receive service from B?
(c) what is the average number of customers in the system?
(d) what is the average amount of time that an entering customer spends in the system?

16. Customers arrive at a two-server system according to a Poisson process having rate $\lambda = 5$. An arrival finding server 1 free will begin service with that server. An arrival finding server 1 busy and server 2 free will enter service with server 2. An arrival finding both servers busy goes away. Once a customer is served by either server, he departs the system. The service times at server $i$ are exponential with rates $\mu_i$, where $\mu_1 = 4$, $\mu_2 = 2$.
(a) What is the average time an entering customer spends in the system?
(b) What proportion of time is server 2 busy?

17. Customers arrive at a two-server station in accordance with a Poisson process with a rate of two per hour. Arrivals finding server 1 free begin service with that server. Arrivals finding server 1 busy and server 2 free begin service with server 2. Arrivals finding both servers busy are lost. When a customer is served by server 1, she then either enters service with server 2 if 2 is free or departs the system if 2 is busy. A customer completing service at server 2 departs the system. The service times at server 1 and server 2 are exponential random variables with respective rates of four and six per hour.
(a) What fraction of customers do not enter the system?
(b) What is the average amount of time that an entering customer spends in the system?
(c) What fraction of entering customers receives service from server 1?

18. Customers arrive at a two-server system at a Poisson rate $\lambda$. An arrival finding the system empty is equally likely to enter service with either server. An arrival finding one customer in the system will enter service with the idle server. An arrival finding two others in the system will wait in line for the first free server. An arrival finding three in the system will not enter. All service times are exponential with rate $\mu$, and once a customer is served (by either server), he departs the system.
(a) Define the states.
(b) Find the long-run probabilities.
(c) Suppose a customer arrives and finds two others in the system. What is the expected time he spends in the system?
(d) What proportion of customers enter the system?
(e) What is the average time an entering customer spends in the system?

19. The economy alternates between good and bad periods.
During good times customers arrive at a certain single-server queueing system in accordance with a Poisson process with rate $\lambda_1$, and during bad times they arrive in accordance with a Poisson process with rate $\lambda_2$. A good time period lasts for an exponentially distributed time with rate $\alpha_1$, and a bad time period lasts for an exponential time with rate $\alpha_2$. An arriving customer will only enter the queueing system if the server is free; an arrival finding the server busy goes away. All service times are exponential with rate $\mu$.
(a) Define states so as to be able to analyze this system.
(b) Give a set of linear equations whose solution will yield the long-run proportion of time the system is in each state.


In terms of the solutions of the equations in part (b),
(c) what proportion of time is the system empty?
(d) what is the average rate at which customers enter the system?

20. There are two types of customers. Type 1 and 2 customers arrive in accordance with independent Poisson processes with respective rates $\lambda_1$ and $\lambda_2$. There are two servers. A type 1 arrival will enter service with server 1 if that server is free; if server 1 is busy and server 2 is free, then the type 1 arrival will enter service with server 2. If both servers are busy, then the type 1 arrival will go away. A type 2 customer can only be served by server 2; if server 2 is free when a type 2 customer arrives, then the customer enters service with that server. If server 2 is busy when a type 2 arrives, then that customer goes away. Once a customer is served by either server, he departs the system. Service times at server $i$ are exponential with rate $\mu_i$, $i = 1, 2$.
Suppose we want to find the average number of customers in the system.
(a) Define states.
(b) Give the balance equations. Do not attempt to solve them.
In terms of the long-run probabilities, what is
(c) the average number of customers in the system?
(d) the average time a customer spends in the system?

*21. Suppose in Exercise 20 we want to find out the proportion of time there is a type 1 customer with server 2. In terms of the long-run probabilities given in Exercise 20, what is
(a) the rate at which a type 1 customer enters service with server 2?
(b) the rate at which a type 2 customer enters service with server 2?
(c) the fraction of server 2's customers that are type 1?
(d) the proportion of time that a type 1 customer is with server 2?

22. Customers arrive at a single-server station in accordance with a Poisson process with rate $\lambda$. All arrivals that find the server free immediately enter service. All service times are exponentially distributed with rate $\mu$.
An arrival that finds the server busy will leave the system and roam around "in orbit" for an exponential time with rate $\theta$, at which time it will then return. If the server is busy when an orbiting customer returns, then that customer returns to orbit for another exponential time with rate $\theta$ before returning again. An arrival that finds the server busy and $N$ other customers in orbit will depart and not return. That is, $N$ is the maximum number of customers in orbit.
(a) Define states.
(b) Give the balance equations.
In terms of the solution of the balance equations, find
(c) the proportion of all customers that are eventually served.


(d) the average time that a served customer spends waiting in orbit.

23. Consider the M/M/1 system in which customers arrive at rate $\lambda$ and the server serves at rate $\mu$. However, suppose that in any interval of length $h$ in which the server is busy there is a probability $\alpha h + o(h)$ that the server will experience a breakdown, which causes the system to shut down. All customers that are in the system depart, and no additional arrivals are allowed to enter until the breakdown is fixed. The time to fix a breakdown is exponentially distributed with rate $\beta$.
(a) Define appropriate states.
(b) Give the balance equations.
In terms of the long-run probabilities,
(c) what is the average amount of time that an entering customer spends in the system?
(d) what proportion of entering customers complete their service?
(e) what proportion of customers arrive during a breakdown?

*24. Reconsider Exercise 23, but this time suppose that a customer that is in the system when a breakdown occurs remains there while the server is being fixed. In addition, suppose that new arrivals during a breakdown period are allowed to enter the system. What is the average time a customer spends in the system?

25. Poisson ($\lambda$) arrivals join a queue in front of two parallel servers A and B, having exponential service rates $\mu_A$ and $\mu_B$ (see Figure 8.4). When the system is empty, arrivals go into server A with probability $\alpha$ and into B with probability $1 - \alpha$. Otherwise, the head of the queue takes the first free server.
(a) Define states and set up the balance equations. Do not solve.
(b) In terms of the probabilities in part (a), what is the average number in the system? Average number of servers idle?
(c) In terms of the probabilities in part (a), what is the probability that an arbitrary arrival will get serviced in A?

26. In a queue with unlimited waiting space, arrivals are Poisson (parameter $\lambda$) and service times are exponentially distributed (parameter $\mu$).
However, the server waits until $K$ people are present before beginning service on the first customer; thereafter, he services one at a time until all $K$ units, and all subsequent arrivals, are serviced. The server is then "idle" until $K$ new arrivals have occurred.
(a) Define an appropriate state space, draw the transition diagram, and set up the balance equations.

Figure 8.4.


(b) In terms of the limiting probabilities, what is the average time a customer spends in queue?
(c) What conditions on $\lambda$ and $\mu$ are necessary?

27. Consider a single-server exponential system in which ordinary customers arrive at a rate $\lambda$ and have service rate $\mu$. In addition, there is a special customer who has a service rate $\mu_1$. Whenever this special customer arrives, she goes directly into service (if anyone else is in service, then this person is bumped back into queue). When the special customer is not being serviced, she spends an exponential amount of time (with mean $1/\theta$) out of the system.
(a) What is the average arrival rate of the special customer?
(b) Define an appropriate state space and set up balance equations.
(c) Find the probability that an ordinary customer is bumped $n$ times.

*28. Let $D$ denote the time between successive departures in a stationary M/M/1 queue with $\lambda < \mu$.


In terms of the solution of the equations of (b), find
(c) the proportion of time the server is cutting hair;
(d) the average arrival rate of entering customers.

30. For the tandem queue model verify that
\[
P_{n,m} = (\lambda/\mu_1)^n (1 - \lambda/\mu_1)(\lambda/\mu_2)^m (1 - \lambda/\mu_2)
\]
satisfies the balance equation (8.15).

31. Consider a network of three stations. Customers arrive at stations 1, 2, 3 in accordance with Poisson processes having respective rates 5, 10, 15. The service times at the three stations are exponential with respective rates 10, 50, 100. A customer completing service at station 1 is equally likely to (i) go to station 2, (ii) go to station 3, or (iii) leave the system. A customer departing service at station 2 always goes to station 3. A departure from service at station 3 is equally likely to either go to station 2 or leave the system.
(a) What is the average number of customers in the system (consisting of all three stations)?
(b) What is the average time a customer spends in the system?

32. Consider a closed queueing network consisting of two customers moving among two servers, and suppose that after each service completion the customer is equally likely to go to either server; that is, $P_{1,2} = P_{2,1} = \frac{1}{2}$. Let $\mu_i$ denote the exponential service rate at server $i$, $i = 1, 2$.
(a) Determine the average number of customers at each server.
(b) Determine the service completion rate for each server.

33. Explain how a Markov chain Monte Carlo simulation using the Gibbs sampler can be utilized to estimate
(a) the distribution of the amount of time spent at server $j$ on a visit.

Hint: Use the arrival theorem.

(b) the proportion of time a customer is with server $j$ (i.e., either in server $j$'s queue or in service with $j$).

34. For open queueing networks
(a) state and prove the equivalent of the arrival theorem;
(b) derive an expression for the average amount of time a customer spends waiting in queues.

35.
Customers arrive at a single-server station in accordance with a Poisson process having rate $\lambda$. Each customer has a value. The successive values of customers are independent and come from a uniform distribution on (0, 1). The service time of a customer having value $x$ is a random variable with mean $3 + 4x$ and variance 5.


(a) What is the average time a customer spends in the system?
(b) What is the average time a customer having value $x$ spends in the system?

*36. Compare the M/G/1 system for the first-come, first-served queue discipline with one of last-come, first-served (for instance, in which units for service are taken from the top of a stack). Would you think that the queue size, waiting time, and busy-period distribution differ? What about their means? What if the queue discipline was always to choose at random among those waiting? Intuitively, which discipline would result in the smallest variance in the waiting time distribution?

37. In an M/G/1 queue,
(a) what proportion of departures leave behind 0 work?
(b) what is the average work in the system as seen by a departure?

38. For the M/G/1 queue, let $X_n$ denote the number in the system left behind by the $n$th departure.
(a) If
\[
X_{n+1} =
\begin{cases}
X_n - 1 + Y_n, & \text{if } X_n \ge 1 \\
Y_n, & \text{if } X_n = 0
\end{cases}
\]
what does $Y_n$ represent?
(b) Rewrite the preceding as
\[
X_{n+1} = X_n - 1 + Y_n + \delta_n \tag{8.64}
\]
where
\[
\delta_n =
\begin{cases}
1, & \text{if } X_n = 0 \\
0, & \text{if } X_n \ge 1
\end{cases}
\]
Take expectations and let $n \to \infty$ in Equation (8.64) to obtain
\[
E[\delta_\infty] = 1 - \lambda E[S]
\]
(c) Square both sides of Equation (8.64), take expectations, and then let $n \to \infty$ to obtain
\[
E[X_\infty] = \frac{\lambda^2 E[S^2]}{2(1 - \lambda E[S])} + \lambda E[S]
\]
(d) Argue that $E[X_\infty]$, the average number as seen by a departure, is equal to $L$.


*39. Consider an M/G/1 system in which the first customer in a busy period has the service distribution $G_1$ and all others have distribution $G_2$. Let $C$ denote the number of customers in a busy period, and let $S$ denote the service time of a customer chosen at random.
Argue that
(a) $a_0 = P_0 = 1 - \lambda E[S]$.
(b) $E[S] = a_0 E[S_1] + (1 - a_0)E[S_2]$, where $S_i$ has distribution $G_i$.
(c) Use (a) and (b) to show that $E[B]$, the expected length of a busy period, is given by
\[
E[B] = \frac{E[S_1]}{1 - \lambda E[S_2]}
\]
(d) Find $E[C]$.

40. Consider an M/G/1 system with $\lambda E[S] < 1$.
(a) Suppose that service is about to begin at a moment when there are $n$ customers in the system.
(i) Argue that the additional time until there are only $n - 1$ customers in the system has the same distribution as a busy period.
(ii) What is the expected additional time until the system is empty?
(b) Suppose that the work in the system at some moment is $A$. We are interested in the expected additional time until the system is empty; call it $E[T]$. Let $N$ denote the number of arrivals during the first $A$ units of time.
(i) Compute $E[T \mid N]$.
(ii) Compute $E[T]$.

41. Carloads of customers arrive at a single-server station in accordance with a Poisson process with rate 4 per hour. The service times are exponentially distributed with rate 20 per hour. If each carload contains either 1, 2, or 3 customers with respective probabilities $\frac{1}{4}$, $\frac{1}{2}$, $\frac{1}{4}$, compute the average customer delay in queue.

42. In the two-class priority queueing model of Section 8.6.2, what is $W_Q$? Show that $W_Q$ is less than it would be under FIFO if $E[S_1] \le E[S_2]$.

43. In a two-class priority queueing model suppose that a cost of $C_i$ per unit time is incurred for each type $i$ customer that waits in queue, $i = 1, 2$. Show that type 1 customers should be given priority over type 2 (as opposed to the reverse) if
\[
\frac{E[S_1]}{C_1} < \frac{E[S_2]}{C_2}
\]


44. Consider the priority queueing model of Section 8.6.2, but now suppose that if a type 2 customer is being served when a type 1 arrives then the type 2 customer is bumped out of service. This is called the preemptive case. Suppose that when a bumped type 2 customer goes back in service his service begins at the point where it left off when he was bumped.
(a) Argue that the work in the system at any time is the same as in the nonpreemptive case.
(b) Derive W_Q^1.
Hint: How do type 2 customers affect type 1s?
(c) Why is it not true that

    V_Q^2 = λ_2 E[S_2] W_Q^2

(d) Argue that the work seen by a type 2 arrival is the same as in the nonpreemptive case, and so

    W_Q^2 = W_Q^2(nonpreemptive) + E[extra time]

where the extra time is due to the fact that he may be bumped.
(e) Let N denote the number of times a type 2 customer is bumped. Why is

    E[extra time | N] = N E[S_1]/(1 − λ_1 E[S_1])

Hint: When a type 2 is bumped, relate the time until he gets back in service to a "busy period."
(f) Let S_2 denote the service time of a type 2. What is E[N | S_2]?
(g) Combine the preceding to obtain

    W_Q^2 = W_Q^2(nonpreemptive) + λ_1 E[S_1] E[S_2]/(1 − λ_1 E[S_1])

*45. Calculate explicitly (not in terms of limiting probabilities) the average time a customer spends in the system in Exercise 24.

46. In the G/M/1 model, if G is exponential with rate λ, show that β = λ/μ.

47. Verify Erlang's loss formula, Equation (8.60), when k = 1.

48. Verify the formula given for the P_i of the M/M/k.

49. In the Erlang loss system suppose the Poisson arrival rate is λ = 2, and suppose there are three servers, each of whom has a service distribution that is uniformly distributed over (0, 2). What proportion of potential customers is lost?


50. In the M/M/k system,
(a) what is the probability that a customer will have to wait in queue?
(b) determine L and W.

51. Verify the formula for the distribution of W*_Q given for the G/M/k model.

*52. Consider a system where the interarrival times have an arbitrary distribution F, and there is a single server whose service distribution is G. Let D_n denote the amount of time the nth customer spends waiting in queue. Interpret S_n, T_n so that

    D_{n+1} = D_n + S_n − T_n,  if D_n + S_n − T_n ≥ 0
    D_{n+1} = 0,               if D_n + S_n − T_n < 0

53. Consider a model in which the interarrival times have an arbitrary distribution F, and there are k servers each having service distribution G. What condition on F and G do you think would be necessary for there to exist limiting probabilities?

References

1. J. Cohen, "The Single Server Queue," North-Holland, Amsterdam, 1969.
2. R. B. Cooper, "Introduction to Queueing Theory," Second Edition, Macmillan, New York, 1984.
3. D. R. Cox and W. L. Smith, "Queues," Wiley, New York, 1961.
4. F. Kelly, "Reversibility and Stochastic Networks," Wiley, New York, 1979.
5. L. Kleinrock, "Queueing Systems," Vol. I, Wiley, New York, 1975.
6. S. Nozaki and S. Ross, "Approximations in Finite Capacity Multiserver Queues with Poisson Arrivals," J. Appl. Prob. 13, 826–834 (1978).
7. L. Takacs, "Introduction to the Theory of Queues," Oxford University Press, London and New York, 1962.
8. H. Tijms, "Stochastic Models: An Algorithmic Approach," Wiley, New York, 1994.
9. P. Whittle, "Systems in Stochastic Equilibrium," Wiley, New York, 1986.
10. R. W. Wolff, "Stochastic Modeling and the Theory of Queues," Prentice Hall, New Jersey, 1989.


9. Reliability Theory

9.1. Introduction

Reliability theory is concerned with determining the probability that a system, possibly consisting of many components, will function. We shall suppose that whether or not the system functions is determined solely from a knowledge of which components are functioning. For instance, a series system will function if and only if all of its components are functioning, while a parallel system will function if and only if at least one of its components is functioning. In Section 9.2, we explore the possible ways in which the functioning of the system may depend upon the functioning of its components. In Section 9.3, we suppose that each component will function with some known probability (independently of the others) and show how to obtain the probability that the system will function. As this probability often is difficult to compute explicitly, we also present useful upper and lower bounds in Section 9.4. In Section 9.5 we look at a system dynamically over time by supposing that each component initially functions and does so for a random length of time, at which point it fails. We then discuss the relationship between the distribution of the amount of time that a system functions and the distributions of the component lifetimes. In particular, it turns out that if the amount of time that a component functions has an increasing failure rate on the average (IFRA) distribution, then so does the distribution of system lifetime. In Section 9.6 we consider the problem of obtaining the mean lifetime of a system. In the final section we analyze the system when failed components are subjected to repair.

9.2. Structure Functions

Consider a system consisting of n components, and suppose that each component is either functioning or has failed. To indicate whether or not the ith component


is functioning, we define the indicator variable x_i by

    x_i = 1, if the ith component is functioning
    x_i = 0, if the ith component has failed

The vector x = (x_1, ..., x_n) is called the state vector. It indicates which of the components are functioning and which have failed.

We further suppose that whether or not the system as a whole is functioning is completely determined by the state vector x. Specifically, it is supposed that there exists a function φ(x) such that

    φ(x) = 1, if the system is functioning when the state vector is x
    φ(x) = 0, if the system has failed when the state vector is x

The function φ(x) is called the structure function of the system.

Example 9.1 (The Series Structure) A series system functions if and only if all of its components are functioning. Hence, its structure function is given by

    φ(x) = min(x_1, ..., x_n) = ∏_{i=1}^n x_i

We shall find it useful to represent the structure of a system in terms of a diagram. The relevant diagram for the series structure is shown in Figure 9.1. The idea is that if a signal is initiated at the left end of the diagram, then in order for it to successfully reach the right end, it must pass through all of the components; hence, they must all be functioning.

Figure 9.1. A series system.

Example 9.2 (The Parallel Structure) A parallel system functions if and only if at least one of its components is functioning. Hence its structure function is given by

    φ(x) = max(x_1, ..., x_n)

A parallel structure may be pictorially illustrated by Figure 9.2. This follows since a signal at the left end can successfully reach the right end as long as at least one component is functioning.


Figure 9.2. A parallel system.

Example 9.3 (The k-out-of-n Structure) The series and parallel systems are both special cases of a k-out-of-n system. Such a system functions if and only if at least k of the n components are functioning. As ∑_{i=1}^n x_i equals the number of functioning components, the structure function of a k-out-of-n system is given by

    φ(x) = 1, if ∑_{i=1}^n x_i ≥ k
    φ(x) = 0, if ∑_{i=1}^n x_i < k

Note that a series system is an n-out-of-n system, while a parallel system is a 1-out-of-n system. A two-out-of-three system is illustrated in Figure 9.3.

Example 9.4 Consider a four-component system that functions if and only if components 1 and 2 and at least one of components 3 and 4 function. Its structure function is given by

    φ(x) = x_1 x_2 max(x_3, x_4)
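The three basic structure functions above can be sketched directly in code (a minimal illustration, not from the text; the function names are mine):

```python
# Structure functions for the basic systems of Examples 9.1-9.3.
# Each takes a state vector x of 0/1 component indicators.

def phi_series(x):
    return min(x)                     # 1 iff every component functions

def phi_parallel(x):
    return max(x)                     # 1 iff at least one component functions

def phi_k_out_of_n(x, k):
    return 1 if sum(x) >= k else 0    # 1 iff at least k components function

x = (1, 0, 1)
assert phi_series(x) == 0
assert phi_parallel(x) == 1
assert phi_k_out_of_n(x, 2) == 1                       # two of three function
assert phi_series(x) == phi_k_out_of_n(x, len(x))      # series = n-out-of-n
assert phi_parallel(x) == phi_k_out_of_n(x, 1)         # parallel = 1-out-of-n
```

The last two assertions verify the remark that series and parallel systems are the extreme cases of the k-out-of-n family.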


Pictorially, the system is as shown in Figure 9.4. A useful identity, easily checked, is that for binary variables,* x_i, i = 1, ..., n,

    max(x_1, ..., x_n) = 1 − ∏_{i=1}^n (1 − x_i)

Figure 9.4.

When n = 2, this yields

    max(x_1, x_2) = 1 − (1 − x_1)(1 − x_2) = x_1 + x_2 − x_1x_2

Hence, the structure function in the example may be written as

    φ(x) = x_1 x_2 (x_3 + x_4 − x_3x_4)

It is natural to assume that replacing a failed component by a functioning one will never lead to a deterioration of the system. In other words, it is natural to assume that the structure function φ(x) is an increasing function of x, that is, if x_i ≤ y_i, i = 1, ..., n, then φ(x) ≤ φ(y). Such an assumption shall be made in this chapter, and the system will be called monotone.

9.2.1. Minimal Path and Minimal Cut Sets

In this section we show how any system can be represented both as a series arrangement of parallel structures and as a parallel arrangement of series structures. As a preliminary, we need the following concepts.

A state vector x is called a path vector if φ(x) = 1. If, in addition, φ(y) = 0 for all y < x, then x is said to be a minimal path vector.** If x is a minimal path vector, then the set A = {i : x_i = 1} is called a minimal path set. In other words, a minimal path set is a minimal set of components whose functioning ensures the functioning of the system.

* A binary variable is one that assumes either the value 0 or 1.
** We say that y < x if y_i ≤ x_i, i = 1, ..., n, with y_i < x_i for some i.


Example 9.5 Consider a five-component system whose structure is illustrated by Figure 9.5. Its structure function equals

    φ(x) = max(x_1, x_2) max(x_3x_4, x_5)
         = (x_1 + x_2 − x_1x_2)(x_3x_4 + x_5 − x_3x_4x_5)

There are four minimal path sets, namely, {1, 3, 4}, {2, 3, 4}, {1, 5}, {2, 5}.

Figure 9.5.

Example 9.6 In a k-out-of-n system, there are (n choose k) minimal path sets, namely, all of the sets consisting of exactly k components.

Let A_1, ..., A_s denote the minimal path sets of a given system. We define α_j(x), the indicator function of the jth minimal path set, by

    α_j(x) = 1, if all the components of A_j are functioning
    α_j(x) = 0, otherwise

that is,

    α_j(x) = ∏_{i∈A_j} x_i

By definition, it follows that the system will function if all the components of at least one minimal path set are functioning; that is, if α_j(x) = 1 for some j. On the other hand, if the system functions, then the set of functioning components must include a minimal path set. Therefore, a system will function if and only if all the components of at least one minimal path set are functioning. Hence,

    φ(x) = 1, if α_j(x) = 1 for some j
    φ(x) = 0, if α_j(x) = 0 for all j

or equivalently,

    φ(x) = max_j α_j(x) = max_j ∏_{i∈A_j} x_i    (9.1)


Since α_j(x) is a series structure function of the components of the jth minimal path set, Equation (9.1) expresses an arbitrary system as a parallel arrangement of series systems.

Example 9.7 Consider the system of Example 9.5. Because its minimal path sets are A_1 = {1, 3, 4}, A_2 = {2, 3, 4}, A_3 = {1, 5}, and A_4 = {2, 5}, we have by Equation (9.1) that

    φ(x) = max{x_1x_3x_4, x_2x_3x_4, x_1x_5, x_2x_5}
         = 1 − (1 − x_1x_3x_4)(1 − x_2x_3x_4)(1 − x_1x_5)(1 − x_2x_5)

You should verify that this equals the value of φ(x) given in Example 9.5. (Make use of the fact that, since x_i equals 0 or 1, x_i² = x_i.) This representation may be pictured as shown in Figure 9.6.

Figure 9.6.

Example 9.8 The system whose structure is as pictured in Figure 9.7 is called the bridge system. Its minimal path sets are {1, 4}, {1, 3, 5}, {2, 5}, and {2, 3, 4}. Hence, by Equation (9.1), its structure function may be expressed as

    φ(x) = max{x_1x_4, x_1x_3x_5, x_2x_5, x_2x_3x_4}
         = 1 − (1 − x_1x_4)(1 − x_1x_3x_5)(1 − x_2x_5)(1 − x_2x_3x_4)

Figure 9.7. The bridge system.

This representation of φ(x) is diagrammed as shown in Figure 9.8.


Figure 9.8.

A state vector x is called a cut vector if φ(x) = 0. If, in addition, φ(y) = 1 for all y > x, then x is said to be a minimal cut vector. If x is a minimal cut vector, then the set C = {i : x_i = 0} is called a minimal cut set. In other words, a minimal cut set is a minimal set of components whose failure ensures the failure of the system.

Let C_1, ..., C_k denote the minimal cut sets of a given system. We define β_j(x), the indicator function of the jth minimal cut set, by

    β_j(x) = 1, if at least one component of the jth minimal cut set is functioning
    β_j(x) = 0, if all of the components of the jth minimal cut set are not functioning

that is,

    β_j(x) = max_{i∈C_j} x_i

Since a system is not functioning if and only if all the components of at least one minimal cut set are not functioning, it follows that

    φ(x) = ∏_{j=1}^k β_j(x) = ∏_{j=1}^k max_{i∈C_j} x_i    (9.2)

Since β_j(x) is a parallel structure function of the components of the jth minimal cut set, Equation (9.2) represents an arbitrary system as a series arrangement of parallel systems.

Example 9.9 The minimal cut sets of the bridge structure shown in Figure 9.9 are {1, 2}, {1, 3, 5}, {2, 3, 4}, and {4, 5}. Hence, from Equation (9.2), we may express φ(x) by

    φ(x) = max(x_1, x_2) max(x_1, x_3, x_5) max(x_2, x_3, x_4) max(x_4, x_5)
         = [1 − (1 − x_1)(1 − x_2)][1 − (1 − x_1)(1 − x_3)(1 − x_5)]
           × [1 − (1 − x_2)(1 − x_3)(1 − x_4)][1 − (1 − x_4)(1 − x_5)]

This representation of φ(x) is pictorially expressed as Figure 9.10.
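Equations (9.1) and (9.2) give two representations of the same structure function, one from minimal path sets and one from minimal cut sets. A short check (not from the text; indices here are 0-based) confirms they agree on the bridge system of Examples 9.8 and 9.9 for all 32 state vectors:

```python
from itertools import product

# Bridge system: minimal path sets {1,4},{1,3,5},{2,5},{2,3,4} and minimal
# cut sets {1,2},{1,3,5},{2,3,4},{4,5}, shifted to 0-based indices.
path_sets = [{0, 3}, {0, 2, 4}, {1, 4}, {1, 2, 3}]
cut_sets = [{0, 1}, {0, 2, 4}, {1, 2, 3}, {3, 4}]

def phi_from_paths(x):
    # Equation (9.1): phi = max over j of prod_{i in A_j} x_i
    return max(min(x[i] for i in A) for A in path_sets)

def phi_from_cuts(x):
    # Equation (9.2): phi = prod over j of max_{i in C_j} x_i
    return min(max(x[i] for i in C) for C in cut_sets)

for x in product((0, 1), repeat=5):
    assert phi_from_paths(x) == phi_from_cuts(x)
```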


Figure 9.9.

Figure 9.10. Minimal cut representation of the bridge system.

9.3. Reliability of Systems of Independent Components

In this section, we suppose that X_i, the state of the ith component, is a random variable such that

    P{X_i = 1} = p_i = 1 − P{X_i = 0}

The value p_i, which equals the probability that the ith component is functioning, is called the reliability of the ith component. If we define r by

    r = P{φ(X) = 1},  where X = (X_1, ..., X_n)

then r is called the reliability of the system. When the components, that is, the random variables X_i, i = 1, ..., n, are independent, we may express r as a function of the component reliabilities. That is,

    r = r(p),  where p = (p_1, ..., p_n)

The function r(p) is called the reliability function. We shall assume throughout the remainder of this chapter that the components are independent.


Example 9.10 (The Series System) The reliability function of the series system of n independent components is given by

    r(p) = P{φ(X) = 1}
         = P{X_i = 1 for all i = 1, ..., n}
         = ∏_{i=1}^n p_i

Example 9.11 (The Parallel System) The reliability function of the parallel system of n independent components is given by

    r(p) = P{φ(X) = 1}
         = P{X_i = 1 for some i = 1, ..., n}
         = 1 − P{X_i = 0 for all i = 1, ..., n}
         = 1 − ∏_{i=1}^n (1 − p_i)

Example 9.12 (The k-out-of-n System with Equal Probabilities) Consider a k-out-of-n system. If p_i = p for all i = 1, ..., n, then the reliability function is given by

    r(p, ..., p) = P{φ(X) = 1}
                 = P{∑_{i=1}^n X_i ≥ k}
                 = ∑_{i=k}^n (n choose i) p^i (1 − p)^{n−i}

Example 9.13 (The Two-out-of-Three System) The reliability function of a two-out-of-three system is given by

    r(p) = P{φ(X) = 1}
         = P{X = (1, 1, 1)} + P{X = (1, 1, 0)} + P{X = (1, 0, 1)} + P{X = (0, 1, 1)}
         = p_1p_2p_3 + p_1p_2(1 − p_3) + p_1(1 − p_2)p_3 + (1 − p_1)p_2p_3
         = p_1p_2 + p_1p_3 + p_2p_3 − 2p_1p_2p_3
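For small n, r(p) = E[φ(X)] can always be computed by summing over all 2^n state vectors. A minimal sketch (not from the text; the helper name is mine), checked against the closed form of Example 9.13:

```python
from itertools import product

def reliability(phi, p):
    """r(p) = E[phi(X)] by enumerating all 2^n independent state vectors."""
    r = 0.0
    for x in product((0, 1), repeat=len(p)):
        weight = 1.0
        for xi, pi in zip(x, p):
            weight *= pi if xi == 1 else 1 - pi   # P{X = x} by independence
        r += weight * phi(x)
    return r

p = (0.9, 0.8, 0.7)
phi_2_of_3 = lambda x: 1 if sum(x) >= 2 else 0
closed_form = p[0]*p[1] + p[0]*p[2] + p[1]*p[2] - 2*p[0]*p[1]*p[2]

assert abs(reliability(phi_2_of_3, p) - closed_form) < 1e-12
```

This brute-force expectation is exponential in n, which is exactly why the bounds of Section 9.4 are useful.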


Example 9.14 (The Three-out-of-Four System) The reliability function of a three-out-of-four system is given by

    r(p) = P{X = (1, 1, 1, 1)} + P{X = (1, 1, 1, 0)} + P{X = (1, 1, 0, 1)}
           + P{X = (1, 0, 1, 1)} + P{X = (0, 1, 1, 1)}
         = p_1p_2p_3p_4 + p_1p_2p_3(1 − p_4) + p_1p_2(1 − p_3)p_4
           + p_1(1 − p_2)p_3p_4 + (1 − p_1)p_2p_3p_4
         = p_1p_2p_3 + p_1p_2p_4 + p_1p_3p_4 + p_2p_3p_4 − 3p_1p_2p_3p_4

Example 9.15 (A Five-Component System) Consider a five-component system that functions if and only if component 1, component 2, and at least one of the remaining components function. Its reliability function is given by

    r(p) = P{X_1 = 1, X_2 = 1, max(X_3, X_4, X_5) = 1}
         = P{X_1 = 1}P{X_2 = 1}P{max(X_3, X_4, X_5) = 1}
         = p_1p_2[1 − (1 − p_3)(1 − p_4)(1 − p_5)]

Since φ(X) is a 0–1 (that is, a Bernoulli) random variable, we may also compute r(p) by taking its expectation. That is,

    r(p) = P{φ(X) = 1} = E[φ(X)]

Example 9.16 (A Four-Component System) A four-component system that functions when components 1 and 4 and at least one of the other components function has its structure function given by

    φ(x) = x_1x_4 max(x_2, x_3)

Hence,

    r(p) = E[φ(X)]
         = E[X_1X_4(1 − (1 − X_2)(1 − X_3))]
         = p_1p_4[1 − (1 − p_2)(1 − p_3)]

An important and intuitive property of the reliability function r(p) is given by the following proposition.

Proposition 9.1 If r(p) is the reliability function of a system of independent components, then r(p) is an increasing function of p.


Proof By conditioning on X_i and using the independence of the components, we obtain

    r(p) = E[φ(X)]
         = p_i E[φ(X) | X_i = 1] + (1 − p_i)E[φ(X) | X_i = 0]
         = p_i E[φ(1_i, X)] + (1 − p_i)E[φ(0_i, X)]

where

    (1_i, X) = (X_1, ..., X_{i−1}, 1, X_{i+1}, ..., X_n),
    (0_i, X) = (X_1, ..., X_{i−1}, 0, X_{i+1}, ..., X_n)

Thus,

    r(p) = p_i E[φ(1_i, X) − φ(0_i, X)] + E[φ(0_i, X)]

However, since φ is an increasing function, it follows that

    E[φ(1_i, X) − φ(0_i, X)] ≥ 0

and so the preceding is increasing in p_i for all i. Hence the result is proven.

Let us now consider the following situation: A system consisting of n different components is to be built from a stockpile containing exactly two of each type of component. How should we use the stockpile so as to maximize our probability of attaining a functioning system? In particular, should we build two separate systems, in which case the probability of attaining a functioning one would be

    P{at least one of the two systems functions}
        = 1 − P{neither of the systems functions}
        = 1 − [1 − r(p)][1 − r(p′)]

where p_i (p′_i) is the probability that the first (second) number i component functions; or should we build a single system whose ith component functions if at least one of the number i components functions? In this latter case, the probability that the system will function equals

    r[1 − (1 − p)(1 − p′)]

since 1 − (1 − p_i)(1 − p′_i) equals the probability that the ith component in the single system will function.* We now show that replication at the component level is more effective than replication at the system level.

* Notation: If x = (x_1, ..., x_n), y = (y_1, ..., y_n), then xy = (x_1y_1, ..., x_ny_n). Also, max(x, y) = (max(x_1, y_1), ..., max(x_n, y_n)) and min(x, y) = (min(x_1, y_1), ..., min(x_n, y_n)).


Theorem 9.1 For any reliability function r and vectors p, p′,

    r[1 − (1 − p)(1 − p′)] ≥ 1 − [1 − r(p)][1 − r(p′)]

Proof Let X_1, ..., X_n, X′_1, ..., X′_n be mutually independent 0–1 random variables with

    p_i = P{X_i = 1},  p′_i = P{X′_i = 1}

Since P{max(X_i, X′_i) = 1} = 1 − (1 − p_i)(1 − p′_i), it follows that

    r[1 − (1 − p)(1 − p′)] = E(φ[max(X, X′)])

However, by the monotonicity of φ, we have that φ[max(X, X′)] is greater than or equal to both φ(X) and φ(X′) and hence is at least as large as max[φ(X), φ(X′)]. Hence, from the preceding we have that

    r[1 − (1 − p)(1 − p′)] ≥ E[max(φ(X), φ(X′))]
        = P{max[φ(X), φ(X′)] = 1}
        = 1 − P{φ(X) = 0, φ(X′) = 0}
        = 1 − [1 − r(p)][1 − r(p′)]

where the first equality follows from the fact that max[φ(X), φ(X′)] is a 0–1 random variable and hence its expectation equals the probability that it equals 1.

As an illustration of the preceding theorem, suppose that we want to build a series system of two different types of components from a stockpile consisting of two of each of the kinds of components. Suppose that the reliability of each component is 1/2. If we use the stockpile to build two separate systems, then the probability of attaining a working system is

    1 − (3/4)² = 7/16

while if we build a single system, replicating components, then the probability of attaining a working system is

    (3/4)² = 9/16

Hence, replicating components leads to a higher reliability than replicating systems (as, of course, it must by Theorem 9.1).
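The numerical illustration above can be checked directly (a sketch, not from the text; names are mine):

```python
# Two-component series system, component reliabilities p = p' = (1/2, 1/2).

def r_series(p):
    return p[0] * p[1]                               # r(p) = p1 * p2

p = (0.5, 0.5)
two_systems = 1 - (1 - r_series(p)) ** 2             # replicate whole systems
single = r_series([1 - (1 - pi) ** 2 for pi in p])   # replicate components

assert abs(two_systems - 7 / 16) < 1e-12
assert abs(single - 9 / 16) < 1e-12
assert single >= two_systems                         # Theorem 9.1
```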


9.4. Bounds on the Reliability Function

Consider the bridge system of Example 9.8, which is represented by Figure 9.11. Using the minimal path representation, we have that

    φ(x) = 1 − (1 − x_1x_4)(1 − x_1x_3x_5)(1 − x_2x_5)(1 − x_2x_3x_4)

Hence,

    r(p) = 1 − E[(1 − X_1X_4)(1 − X_1X_3X_5)(1 − X_2X_5)(1 − X_2X_3X_4)]

Figure 9.11.

However, since the minimal path sets overlap (that is, they have components in common), the random variables (1 − X_1X_4), (1 − X_1X_3X_5), (1 − X_2X_5), and (1 − X_2X_3X_4) are not independent, and thus the expected value of their product is not equal to the product of their expected values. Therefore, in order to compute r(p), we must first multiply the four random variables and then take the expected value. Doing so, using that X_i² = X_i, we obtain

    r(p) = E[X_1X_4 + X_2X_5 + X_1X_3X_5 + X_2X_3X_4 − X_1X_2X_3X_4
           − X_1X_2X_3X_5 − X_1X_2X_4X_5 − X_1X_3X_4X_5 − X_2X_3X_4X_5
           + 2X_1X_2X_3X_4X_5]
         = p_1p_4 + p_2p_5 + p_1p_3p_5 + p_2p_3p_4 − p_1p_2p_3p_4 − p_1p_2p_3p_5
           − p_1p_2p_4p_5 − p_1p_3p_4p_5 − p_2p_3p_4p_5 + 2p_1p_2p_3p_4p_5

As can be seen by the preceding example, it is often quite tedious to evaluate r(p), and thus it would be useful if we had a simple way of obtaining bounds. We now consider two methods for this.
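The bridge polynomial above can be verified against brute-force enumeration of all 2^5 state vectors (a check sketch, not from the text; 0-based indices):

```python
from itertools import product

def phi_bridge(x):
    # Minimal path sets {1,4},{1,3,5},{2,5},{2,3,4}, shifted to 0-based.
    paths = [{0, 3}, {0, 2, 4}, {1, 4}, {1, 2, 3}]
    return max(min(x[i] for i in A) for A in paths)

def r_enumerated(p):
    total = 0.0
    for x in product((0, 1), repeat=5):
        w = 1.0
        for xi, pi in zip(x, p):
            w *= pi if xi else 1 - pi
        total += w * phi_bridge(x)
    return total

def r_polynomial(p):
    p1, p2, p3, p4, p5 = p
    return (p1*p4 + p2*p5 + p1*p3*p5 + p2*p3*p4
            - p1*p2*p3*p4 - p1*p2*p3*p5 - p1*p2*p4*p5
            - p1*p3*p4*p5 - p2*p3*p4*p5 + 2*p1*p2*p3*p4*p5)

p = (0.9, 0.8, 0.7, 0.6, 0.5)
assert abs(r_enumerated(p) - r_polynomial(p)) < 1e-12
```

A pleasant special case: with all p_i = 1/2 the polynomial gives exactly 1/2, reflecting the self-dual symmetry of the bridge.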


9.4.1. Method of Inclusion and Exclusion

The following is a well-known formula for the probability of the union of the events E_1, E_2, ..., E_n:

    P(∪_{i=1}^n E_i) = ∑_{i=1}^n P(E_i) − ∑∑_{i<j} P(E_iE_j) + ∑∑∑_{i<j<k} P(E_iE_jE_k)
                       − ··· + (−1)^{n+1} P(E_1E_2 ··· E_n)    (9.3)

A result, not as well known, is the following set of inequalities:

    P(∪ E_i) ≤ ∑_i P(E_i),
    P(∪ E_i) ≥ ∑_i P(E_i) − ∑∑_{i<j} P(E_iE_j),
    P(∪ E_i) ≤ ∑_i P(E_i) − ∑∑_{i<j} P(E_iE_j) + ∑∑∑_{i<j<k} P(E_iE_jE_k),
    ...    (9.4)

where the inequality changes direction each time an additional term of the expansion of P(∪ E_i) is included.

To establish both (9.3) and (9.4), define the indicator variables I_j, j = 1, ..., n, by I_j = 1 if E_j occurs and I_j = 0 otherwise, and let N = ∑_{j=1}^n I_j denote the number of the events E_1, ..., E_n that occur. Also, let I = 1 if N > 0 and I = 0 if N = 0, so that I indicates whether at least one of the E_j occurs.


9.4. Bounds on the Reliability Function 585Then, as1 − I = (1 − 1) Nwe obtain, upon application of the binomial theorem, that1 − I =N∑i=0( Ni)(−1) ior( ) ( ) ( N N NI = N − + −···±2 3 N)(9.5)We now make use of the following combinatorial identity (which is easily establishedby induction on i):( ) ( ) ( n nn− +···± =i i + 1 n)( ) n − 1 0,i − 1i nThe preceding thus implies that( ) ( ) ( N NN− +···± 0 (9.6)i i + 1 N)From Equations (9.5) and (9.6) we obtainI N, by letting i = 2 in (9.6)( ) NI N − , by letting i = 3 in (9.6)2( ) ( ) N NI N − + ,2 3..(9.7)and so on. Now, since N n and ( m) i = 0 whenever i>m, we can rewrite Equation(9.5) asI =n∑i=1( Ni)(−1) i+1 (9.8)


The equality (9.3) and inequalities (9.4) now follow upon taking expectations of (9.7) and (9.8). This is the case since

    E[I] = P{N > 0} = P{at least one of the E_j occurs} = P(∪_{j=1}^n E_j),
    E[N] = E[∑_{j=1}^n I_j] = ∑_{j=1}^n P(E_j)

Also,

    E[(N choose 2)] = E[number of pairs of the E_j that occur]
                    = E[∑∑_{i<j} I_iI_j]
                    = ∑∑_{i<j} P(E_iE_j)

and, by the same reasoning, the expectation of (N choose k) equals the sum of P(E_{i_1} ··· E_{i_k}) over all i_1 < ··· < i_k.

To apply these bounds to the reliability function, let A_1, ..., A_s denote the minimal path sets of the system and define the events E_1, ..., E_s by

    E_i = {all components in A_i function}

Since the system functions if and only if at least one of the events E_i occurs, we have

    r(p) = P(∪_{i=1}^s E_i)


Applying (9.4) yields the desired bounds on r(p). The terms in the summation are computed thusly:

    P(E_i) = ∏_{l∈A_i} p_l,
    P(E_iE_j) = ∏_{l∈A_i∪A_j} p_l,
    P(E_iE_jE_k) = ∏_{l∈A_i∪A_j∪A_k} p_l

and so forth for intersections of more than three of the events. (The preceding follows since, for instance, in order for the event E_iE_j to occur, all of the components in A_i and all of them in A_j must function; or, in other words, all components in A_i ∪ A_j must function.)

When the p_i's are small, the probabilities of the intersections of many of the events E_i should be quite small and the convergence should be relatively rapid.

Example 9.17 Consider the bridge structure with identical component probabilities. That is, take p_i to equal p for all i. Letting A_1 = {1, 4}, A_2 = {1, 3, 5}, A_3 = {2, 5}, and A_4 = {2, 3, 4} denote the minimal path sets, we have that

    P(E_1) = P(E_3) = p²,  P(E_2) = P(E_4) = p³

Also, because exactly five of the (4 choose 2) = 6 unions of A_i and A_j contain four components (the exception being A_2 ∪ A_4, which contains all five components), we have

    P(E_1E_2) = P(E_1E_3) = P(E_1E_4) = P(E_2E_3) = P(E_3E_4) = p⁴,
    P(E_2E_4) = p⁵

Hence, the first two inclusion–exclusion bounds yield

    2(p² + p³) − 5p⁴ − p⁵ ≤ r(p) ≤ 2(p² + p³)

where r(p) = r(p, p, p, p, p). For instance, when p = 0.2, we have

    0.08768 ≤ r(0.2) ≤ 0.09600

and, when p = 0.1,

    0.02149 ≤ r(0.1) ≤ 0.02200
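The numerical bounds of Example 9.17 are a one-line computation (a check sketch, not from the text):

```python
# First two inclusion-exclusion bounds for the bridge with identical
# component probability p:
#   2(p^2 + p^3) - 5 p^4 - p^5  <=  r(p)  <=  2(p^2 + p^3)

def bridge_ie_bounds(p):
    upper = 2 * (p ** 2 + p ** 3)
    lower = upper - 5 * p ** 4 - p ** 5
    return lower, upper

lower, upper = bridge_ie_bounds(0.2)
assert abs(lower - 0.08768) < 1e-9
assert abs(upper - 0.09600) < 1e-9
```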


Just as we can define events in terms of the minimal path sets whose union is the event that the system functions, so can we define events in terms of the minimal cut sets whose union is the event that the system fails. Let C_1, C_2, ..., C_r denote the minimal cut sets and define the events F_1, ..., F_r by

    F_i = {all components in C_i are failed}

Now, because the system is failed if and only if all of the components of at least one minimal cut set are failed, we have that

    1 − r(p) = P(∪_{i=1}^r F_i)

and so, by (9.4),

    1 − r(p) ≤ ∑_i P(F_i),
    1 − r(p) ≥ ∑_i P(F_i) − ∑∑_{i<j} P(F_iF_j),
    1 − r(p) ≤ ∑_i P(F_i) − ∑∑_{i<j} P(F_iF_j) + ∑∑∑_{i<j<k} P(F_iF_jF_k)

and so on. As

    P(F_i) = ∏_{l∈C_i} (1 − p_l)

with analogous product formulas for the probabilities of intersections of the F_i (for instance, P(F_iF_j) = ∏_{l∈C_i∪C_j} (1 − p_l)), these bounds are readily computed.

Example 9.18 (A Random Graph) An application of these ideas arises in the study of random graphs. A graph consists of a set N of elements called nodes and a set A of pairs of nodes, called arcs. A graph is said to be connected if, for each pair of nodes, there exists a sequence of arcs joining them.


A graph can always be subdivided into nonoverlapping connected subgraphs called components. For instance, the graph in Figure 9.12 with nodes N = {1, 2, 3, 4, 5, 6} and arcs A = {(1, 2), (1, 3), (2, 3), (4, 5)} consists of three components (a graph consisting of a single node is considered to be connected).

Figure 9.12.

Consider now the random graph having nodes 1, 2, ..., n which is such that there is an arc from node i to node j with probability P_ij. Assume in addition that the occurrences of these arcs constitute independent events. That is, assume that the (n choose 2) random variables X_ij, i ≠ j, are independent, where

    X_ij = 1, if (i, j) is an arc
    X_ij = 0, otherwise

We are interested in the probability that this graph will be connected.

We can think of the preceding as being a reliability system of (n choose 2) components, each component corresponding to a potential arc. The component is said to work if the corresponding arc is indeed an arc of the network, and the system is said to work if the corresponding graph is connected. As the addition of an arc to a connected graph cannot disconnect the graph, it follows that the structure so defined is monotone.

Let us start by determining the minimal path and minimal cut sets. It is easy to see that a graph will not be connected if and only if the set of nodes can be partitioned into two nonempty subsets X and X^c in such a way that there is no arc connecting a node from X with one from X^c. For instance, if there are six nodes and if there are no arcs connecting any of the nodes 1, 2, 3, 4 with either 5 or 6, then clearly the graph will not be connected.
Thus, we see that any partition of the nodes into two nonempty subsets X and X^c corresponds to the minimal cut set defined by

    {(i, j): i ∈ X, j ∈ X^c}

As there are 2^{n−1} − 1 such partitions (there are 2^n − 2 ways of choosing a nonempty proper subset X and, as the partition X, X^c is the same as X^c, X, we must divide by 2), there are therefore this number of minimal cut sets.


To determine the minimal path sets, we must characterize a minimal set of arcs which results in a connected graph. Now the graph in Figure 9.13 is connected, but it would remain connected if any one of the arcs from the cycle shown in Figure 9.14 were removed. In fact it is not difficult to see that the minimal path sets are exactly those sets of arcs which result in a graph being connected but not having any cycles (a cycle being a path from a node to itself). Such sets of arcs are called spanning trees (Figure 9.15). It is easily verified that any spanning tree contains exactly n − 1 arcs, and it is a famous result in graph theory (due to Cayley) that there are exactly n^{n−2} of these minimal path sets.

Figure 9.13.  Figure 9.14.

Figure 9.15. Two spanning trees (minimal path sets) when n = 4.

Because of the large number of minimal path and minimal cut sets (n^{n−2} and 2^{n−1} − 1, respectively), it is difficult to obtain any useful bounds without making further restrictions. So, let us assume that all the P_ij equal the common value p. That is, we suppose that each of the possible arcs exists, independently, with the same probability p. We shall start by deriving a recursive formula for the probability that the graph is connected, which is computationally useful when n is not too large, and then we shall present an asymptotic formula for this probability when n is large.

Let us denote by P_n the probability that the random graph having n nodes is connected. To derive a recursive formula for P_n, we first concentrate attention on a single node, say node 1, and try to determine the probability that node 1 will be part of a component of size k in the resultant graph.
Now for a given set of k − 1 other nodes, these nodes along with node 1 will form a component if
(i) there are no arcs connecting any of these k nodes with any of the remaining n − k nodes;
(ii) the random graph, restricted to these k nodes [and (k choose 2) potential arcs, each independently appearing with probability p], is connected.


The probability that (i) and (ii) both occur is

    q^{k(n−k)} P_k

where q = 1 − p. As there are (n−1 choose k−1) ways of choosing k − 1 other nodes (to form along with node 1 a component of size k), we see that

    P{node 1 is part of a component of size k} = (n−1 choose k−1) q^{k(n−k)} P_k,  k = 1, 2, ..., n

Now since the sum of the foregoing probabilities as k ranges from 1 through n clearly must equal 1, and as the graph is connected if and only if node 1 is part of a component of size n, we see that

    P_n = 1 − ∑_{k=1}^{n−1} (n−1 choose k−1) q^{k(n−k)} P_k,  n = 2, 3, ...    (9.9)

Starting with P_1 = 1, P_2 = p, Equation (9.9) can be used to determine P_n recursively when n is not too large. It is particularly suited for numerical computation.

To determine an asymptotic formula for P_n when n is large, first note from Equation (9.9) that since P_k ≤ 1, we have

    1 − P_n ≤ ∑_{k=1}^{n−1} (n−1 choose k−1) q^{k(n−k)}

As it can be shown that, for q < 1, the dominant contribution to this sum when n is large comes from the k = 1 term, namely q^{n−1}, this yields an upper bound on 1 − P_n of order n q^{n−1}. A matching lower bound follows by considering the minimal cut sets that isolate a single node
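Recursion (9.9) is easy to program directly (a sketch, not from the text; the function name is mine). For n = 3 it can be checked by hand: the graph on three nodes is connected exactly when it has at least two of its three possible arcs, so P_3 = p³ + 3p²(1 − p).

```python
from math import comb

def connected_probabilities(n, p):
    """P[m] = probability the random graph on m nodes is connected,
    each possible arc present independently with probability p (Eq. 9.9)."""
    q = 1 - p
    P = [None, 1.0]                               # P_1 = 1
    for m in range(2, n + 1):
        s = sum(comb(m - 1, k - 1) * q ** (k * (m - k)) * P[k]
                for k in range(1, m))
        P.append(1 - s)
    return P

P = connected_probabilities(4, 0.5)
assert abs(P[2] - 0.5) < 1e-12                    # P_2 = p
assert abs(P[3] - (0.5**3 + 3 * 0.5**2 * 0.5)) < 1e-12
```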


in the graph. Specifically, define the minimal cut set C_i as

    C_i = {(i, j): j ≠ i}

and define F_i to be the event that all arcs in C_i are not working (and thus, node i is isolated from the other nodes). Now,

    1 − P_n = P(graph is not connected) ≥ P(∪_i F_i)

since, if any of the events F_i occurs, then the graph will be disconnected. By the inclusion–exclusion bounds, we have that

    P(∪_i F_i) ≥ ∑_i P(F_i) − ∑∑_{i<j} P(F_iF_j)

As P(F_i) = q^{n−1}, and as F_iF_j requires that the 2n − 3 distinct arcs touching either i or j all be absent, so that P(F_iF_j) = q^{2n−3}, this yields

    1 − P_n ≥ n q^{n−1} − (n choose 2) q^{2n−3}

Combining this lower bound with the upper bound of the preceding paragraph shows that, for n large,

    P_n ≈ 1 − n q^{n−1} = 1 − n(1 − p)^{n−1}


9.4.2. Second Method for Obtaining Bounds on r(p)

Our second approach to obtaining bounds on r(p) is based on expressing the desired probability as the probability of the intersection of events. To do so, let A_1, A_2, ..., A_s denote the minimal path sets as before, and define the events D_i, i = 1, ..., s, by

    D_i = {at least one component in A_i has failed}

Now since the system will have failed if and only if at least one component in each of the minimal path sets has failed, we have that

    1 − r(p) = P(D_1D_2 ··· D_s)
             = P(D_1)P(D_2 | D_1) ··· P(D_s | D_1D_2 ··· D_{s−1})    (9.11)

Now it is quite intuitive that the information that at least one component of A_1 is down can only increase the probability that at least one component of A_2 is down (or else leave the probability unchanged if A_1 and A_2 do not overlap). Hence, intuitively,

    P(D_2 | D_1) ≥ P(D_2)

To prove this inequality, we write

    P(D_2) = P(D_2 | D_1)P(D_1) + P(D_2 | D_1^c)(1 − P(D_1))    (9.12)

and note that

    P(D_2 | D_1^c) = P{at least one failed in A_2 | all functioning in A_1}
                   = 1 − ∏_{j∈A_2, j∉A_1} p_j
                   ≤ 1 − ∏_{j∈A_2} p_j
                   = P(D_2)

Hence, from Equation (9.12) we see that

    P(D_2) ≤ P(D_2 | D_1)P(D_1) + P(D_2)(1 − P(D_1))

or

    P(D_2 | D_1) ≥ P(D_2)


By the same reasoning, it also follows that

    P(D_i | D_1 ··· D_{i−1}) ≥ P(D_i)

and so from Equation (9.11) we have

    1 − r(p) ≥ ∏_i P(D_i)

or, equivalently,

    r(p) ≤ 1 − ∏_i (1 − ∏_{j∈A_i} p_j)

To obtain a bound in the other direction, let C_1, ..., C_r denote the minimal cut sets and define the events U_1, ..., U_r by

    U_i = {at least one component in C_i is functioning}

Then, since the system will function if and only if all of the events U_i occur, we have

    r(p) = P(U_1U_2 ··· U_r)
         = P(U_1)P(U_2 | U_1) ··· P(U_r | U_1 ··· U_{r−1})
         ≥ ∏_i P(U_i)

where the last inequality is established in exactly the same manner as for the D_i. Hence,

    r(p) ≥ ∏_i [1 − ∏_{j∈C_i} (1 − p_j)]

and we thus have the following bounds for the reliability function:

    ∏_i [1 − ∏_{j∈C_i} (1 − p_j)] ≤ r(p) ≤ 1 − ∏_i (1 − ∏_{j∈A_i} p_j)    (9.13)

It is to be expected that the upper bound should be close to the actual r(p) if there is not too much overlap in the minimal path sets, and the lower bound to be close if there is not too much overlap in the minimal cut sets.


Example 9.19 For the three-out-of-four system the minimal path sets are A_1 = {1, 2, 3}, A_2 = {1, 2, 4}, A_3 = {1, 3, 4}, and A_4 = {2, 3, 4}; and the minimal cut sets are C_1 = {1, 2}, C_2 = {1, 3}, C_3 = {1, 4}, C_4 = {2, 3}, C_5 = {2, 4}, and C_6 = {3, 4}. Hence, by Equation (9.13) we have

    (1 − q_1q_2)(1 − q_1q_3)(1 − q_1q_4)(1 − q_2q_3)(1 − q_2q_4)(1 − q_3q_4)
        ≤ r(p) ≤ 1 − (1 − p_1p_2p_3)(1 − p_1p_2p_4)(1 − p_1p_3p_4)(1 − p_2p_3p_4)

where q_i ≡ 1 − p_i. For instance, if p_i = 1/2 for all i, then the preceding yields that

    0.18 ≤ r(1/2, ..., 1/2) ≤ 0.41

The exact value for this structure is easily computed to be

    r(1/2, ..., 1/2) = 5/16 = 0.31

9.5. System Life as a Function of Component Lives

For a random variable having distribution function G, we define Ḡ(a) ≡ 1 − G(a) to be the probability that the random variable is greater than a.

Consider a system in which the ith component functions for a random length of time having distribution F_i and then fails. Once failed it remains in that state forever. Assuming that the individual component lifetimes are independent, how can we express the distribution of system lifetime as a function of the system reliability function r(p) and the individual component lifetime distributions F_i, i = 1, ..., n?

To answer this we first note that the system will function for a length of time t or greater if and only if it is still functioning at time t. That is, letting F denote the distribution of system lifetime, we have

    F̄(t) = P{system life > t} = P{system is functioning at time t}

But, by the definition of r(p), we have that

    P{system is functioning at time t} = r(P_1(t), ..., P_n(t))

where

    P_i(t) = P{component i is functioning at t} = P{lifetime of i > t} = F̄_i(t)
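The bounds of Example 9.19 can be evaluated mechanically from (9.13) (a check sketch, not from the text):

```python
from math import comb, prod

# Three-out-of-four system with p_i = 1/2.
paths = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}]
cuts = [{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}]
p = {i: 0.5 for i in range(1, 5)}

upper = 1 - prod(1 - prod(p[j] for j in A) for A in paths)   # = 1 - (7/8)^4
lower = prod(1 - prod(1 - p[j] for j in C) for C in cuts)    # = (3/4)^6
exact = sum(comb(4, i) * 0.5 ** 4 for i in (3, 4))           # = 5/16

assert lower <= exact <= upper
assert abs(lower - (3 / 4) ** 6) < 1e-12
assert abs(upper - (1 - (7 / 8) ** 4)) < 1e-12
```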


Hence we see that

F̄(t) = r(F̄_1(t), ..., F̄_n(t))    (9.14)

Example 9.20 In a series system, r(p) = ∏_1^n p_i and so from Equation (9.14)

F̄(t) = ∏_{i=1}^n F̄_i(t)

which is, of course, quite obvious since for a series system the system life is equal to the minimum of the component lives and so will be greater than t if and only if all component lives are greater than t. ◾

Example 9.21 In a parallel system r(p) = 1 − ∏_1^n (1 − p_i) and so

F̄(t) = 1 − ∏_{i=1}^n F_i(t)

The preceding is also easily derived by noting that, in the case of a parallel system, the system life is equal to the maximum of the component lives. ◾

For a continuous distribution G, we define λ(t), the failure rate function of G, by

λ(t) = g(t)/Ḡ(t)

where g(t) = (d/dt)G(t). In Section 5.2.2, it is shown that if G is the distribution of the lifetime of an item, then λ(t) represents the probability intensity that a t-year-old item will fail. We say that G is an increasing failure rate (IFR) distribution if λ(t) is an increasing function of t. Similarly, we say that G is a decreasing failure rate (DFR) distribution if λ(t) is a decreasing function of t.

Example 9.22 (The Weibull Distribution) A random variable is said to have the Weibull distribution if its distribution is given, for some λ > 0, α > 0, by

G(t) = 1 − e^{−(λt)^α},   t ≥ 0

The failure rate function for a Weibull distribution equals

λ(t) = [e^{−(λt)^α} α(λt)^{α−1} λ] / e^{−(λt)^α} = αλ(λt)^{α−1}

Thus, the Weibull distribution is IFR when α ≥ 1, and DFR when 0 < α ≤ 1. ◾
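The Weibull failure rate just computed can be checked numerically. The sketch below (function names are ours) evaluates g(t)/Ḡ(t) directly and confirms it is increasing for α = 2 and decreasing for α = 1/2:

```python
import math

# Failure rate of the Weibull distribution G(t) = 1 - exp(-(lam*t)**alpha),
# computed as density over survival; algebraically it simplifies to
# alpha * lam * (lam*t)**(alpha - 1), as in Example 9.22.

def weibull_failure_rate(t, lam, alpha):
    g = alpha * lam * (lam * t) ** (alpha - 1) * math.exp(-(lam * t) ** alpha)
    g_bar = math.exp(-(lam * t) ** alpha)
    return g / g_bar

ts = [0.5, 1.0, 2.0, 4.0]
ifr = [weibull_failure_rate(t, 1.0, 2.0) for t in ts]   # alpha = 2: IFR
dfr = [weibull_failure_rate(t, 1.0, 0.5) for t in ts]   # alpha = 1/2: DFR
print(all(a < b for a, b in zip(ifr, ifr[1:])))  # True (increasing)
print(all(a > b for a, b in zip(dfr, dfr[1:])))  # True (decreasing)
```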


Example 9.23 (The Gamma Distribution) A random variable is said to have a gamma distribution if its density is given, for some λ > 0, α > 0, by

g(t) = λ e^{−λt} (λt)^{α−1} / Γ(α)   for t ≥ 0

where

Γ(α) ≡ ∫_0^∞ e^{−t} t^{α−1} dt

For the gamma distribution,

1/λ(t) = Ḡ(t)/g(t) = ∫_t^∞ λ e^{−λx} (λx)^{α−1} dx / [λ e^{−λt} (λt)^{α−1}]
       = ∫_t^∞ e^{−λ(x−t)} (x/t)^{α−1} dx

With the change of variables u = x − t, we obtain

1/λ(t) = ∫_0^∞ e^{−λu} (1 + u/t)^{α−1} du

Hence, G is IFR when α ≥ 1 and is DFR when 0 < α ≤ 1. ◾


Suppose now that each component of a monotone system has the same lifetime distribution G, so that, writing r(p) for r(p, ..., p), the system survival function is F̄(t) = r(Ḡ(t)). Hence,

λ_F(t) = r′(Ḡ(t)) G′(t) / r(Ḡ(t))
       = [Ḡ(t) r′(Ḡ(t)) / r(Ḡ(t))] · [G′(t)/Ḡ(t)]
       = λ_G(t) · p r′(p)/r(p) |_{p=Ḡ(t)}    (9.15)

Since Ḡ(t) is a decreasing function of t, it follows from Equation (9.15) that if each component of a coherent system has the same IFR lifetime distribution, then the distribution of system lifetime will be IFR if p r′(p)/r(p) is a decreasing function of p.

Example 9.24 (The k-out-of-n System with Identical Components) Consider the k-out-of-n system which will function if and only if k or more components function. When each component has the same probability p of functioning, the number of functioning components will have a binomial distribution with parameters n and p. Hence,

r(p) = Σ_{i=k}^n (n choose i) p^i (1 − p)^{n−i}

which, by continual integration by parts, can be shown to be equal to

r(p) = [n!/((k−1)!(n−k)!)] ∫_0^p x^{k−1} (1 − x)^{n−k} dx

Upon differentiation, we obtain

r′(p) = [n!/((k−1)!(n−k)!)] p^{k−1} (1 − p)^{n−k}

Therefore,

p r′(p)/r(p) = [r(p)/(p r′(p))]^{−1}
             = [ (1/p) ∫_0^p (x/p)^{k−1} ((1−x)/(1−p))^{n−k} dx ]^{−1}


Letting y = x/p yields

p r′(p)/r(p) = [ ∫_0^1 y^{k−1} ((1 − yp)/(1 − p))^{n−k} dy ]^{−1}

Since (1 − yp)/(1 − p) is increasing in p, it follows that p r′(p)/r(p) is decreasing in p. Thus, if a k-out-of-n system is composed of independent, like components having an increasing failure rate, the system itself has an increasing failure rate.

It turns out, however, that for a k-out-of-n system, in which the independent components have different IFR lifetime distributions, the system lifetime need not be IFR. Consider the following example of a two-out-of-two (that is, a parallel) system.

Example 9.25 (A Parallel System That Is Not IFR) The life distribution of a parallel system of two independent components, the ith component having an exponential distribution with mean 1/i, i = 1, 2, is given by

F̄(t) = 1 − (1 − e^{−t})(1 − e^{−2t})
      = e^{−2t} + e^{−t} − e^{−3t}

Therefore,

λ(t) = f(t)/F̄(t) = (2e^{−2t} + e^{−t} − 3e^{−3t}) / (e^{−2t} + e^{−t} − e^{−3t})

It easily follows upon differentiation that the sign of λ′(t) is determined by e^{−5t} − e^{−3t} + 4e^{−4t}, which is positive for small values and negative for large values of t. Therefore, λ(t) is initially strictly increasing, and then strictly decreasing. Hence, F is not IFR. ◾

Remark The result of the preceding example is quite surprising at first glance. To obtain a better feel for it we need the concept of a mixture of distribution functions. The distribution function G is said to be a mixture of the distributions G_1 and G_2 if for some p, 0 < p < 1,

G(x) = p G_1(x) + (1 − p) G_2(x)    (9.16)


Mixtures arise, for instance, when a stockpile consists of items of two types, where the fraction p of the items are type 1 and the fraction 1 − p are type 2. Suppose that the lifetime distribution of type 1 items is G_1 and of type 2 items is G_2. If we choose an item at random from the stockpile, then its life distribution is as given by Equation (9.16).

Consider now a mixture of two exponential distributions having rates λ_1 and λ_2 where λ_1 < λ_2. Now,

P{type 1 | life > t} = P{type 1, life > t} / P{life > t}
                     = p e^{−λ_1 t} / (p e^{−λ_1 t} + (1 − p) e^{−λ_2 t})

As the preceding is increasing in t, it follows that the larger t is, the more likely it is that the item in use is a type 1 (the better one, since λ_1 < λ_2). Hence, the older the item is, the smaller should be its failure rate; that is, the mixture of exponentials is DFR.
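The DFR property of the exponential mixture argued above is easy to confirm numerically. In this sketch (parameter values and function names are ours), the failure rate of the mixture starts near the weighted average of the two rates and decreases toward the smaller rate λ_1:

```python
import math

# Failure rate of a mixture of two exponentials with rates lam1 < lam2
# and mixing weight p on the first; computed as density over survival.

def mixture_failure_rate(t, p=0.3, lam1=1.0, lam2=4.0):
    surv = p * math.exp(-lam1 * t) + (1 - p) * math.exp(-lam2 * t)
    dens = p * lam1 * math.exp(-lam1 * t) + (1 - p) * lam2 * math.exp(-lam2 * t)
    return dens / surv

rates = [mixture_failure_rate(t) for t in [0.0, 0.5, 1.0, 2.0, 5.0]]
print(all(a > b for a, b in zip(rates, rates[1:])))  # True: strictly decreasing
```

At t = 0 the rate equals pλ_1 + (1 − p)λ_2 = 3.1, and as t grows it approaches λ_1 = 1.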


Recall that the failure rate function of a distribution F(t) having density f(t) = F′(t) is defined by

λ(t) = f(t) / (1 − F(t))

By integrating both sides, we obtain

∫_0^t λ(s) ds = ∫_0^t f(s)/(1 − F(s)) ds = −log F̄(t)

Hence,

F̄(t) = e^{−Λ(t)}    (9.17)

where

Λ(t) = ∫_0^t λ(s) ds

The function Λ(t) is called the hazard function of the distribution F.

Definition 9.1 A distribution F is said to have increasing failure rate on the average (IFRA) if

Λ(t)/t = [∫_0^t λ(s) ds]/t    (9.18)

increases in t for t ≥ 0.

In other words, Equation (9.18) states that the average failure rate up to time t increases as t increases. It is not difficult to show that if F is IFR, then F is IFRA; but the reverse need not be true.

Note that F is IFRA if Λ(s)/s ≤ Λ(t)/t whenever 0 ≤ s ≤ t, which is equivalent to

Λ(αt)/(αt) ≤ Λ(t)/t   for 0 ≤ α ≤ 1, all t ≥ 0


But by Equation (9.17) we see that Λ(t) = −log F̄(t), and so the preceding is equivalent to

−log F̄(αt) ≤ −α log F̄(t)

or, equivalently,

log F̄(αt) ≥ log F̄^α(t)

which, since log x is a monotone function of x, shows that F is IFRA if and only if

F̄(αt) ≥ F̄^α(t)   for 0 ≤ α ≤ 1, all t ≥ 0    (9.19)

For a vector p = (p_1, ..., p_n) we define p^α = (p_1^α, ..., p_n^α). We shall need the following proposition.

Proposition 9.2 Any reliability function r(p) satisfies

r(p^α) ≥ [r(p)]^α,   0 ≤ α ≤ 1

Proof We prove this by induction on n, the number of components in the system. If n = 1, then either r(p) ≡ 0, r(p) ≡ 1, or r(p) ≡ p. Hence, the proposition follows in this case.

So assume that Proposition 9.2 is valid for all monotone systems of n − 1 components and consider a system of n components having structure function φ. By conditioning upon whether or not the nth component is functioning, we obtain

r(p^α) = p_n^α r(1_n, p^α) + (1 − p_n^α) r(0_n, p^α)    (9.20)

Now consider a system of components 1 through n − 1 having a structure function φ_1(x) = φ(1_n, x). The reliability function for this system is given by r_1(p) = r(1_n, p); hence, from the induction assumption (valid for all monotone systems of n − 1 components), we have

r(1_n, p^α) ≥ [r(1_n, p)]^α

Similarly, by considering the system of components 1 through n − 1 and structure function φ_0(x) = φ(0_n, x), we obtain

r(0_n, p^α) ≥ [r(0_n, p)]^α

Thus, from Equation (9.20), we obtain

r(p^α) ≥ p_n^α [r(1_n, p)]^α + (1 − p_n^α) [r(0_n, p)]^α


which, by using the lemma to follow [with λ = p_n, x = r(1_n, p), y = r(0_n, p)], implies that

r(p^α) ≥ [p_n r(1_n, p) + (1 − p_n) r(0_n, p)]^α
       = [r(p)]^α

which proves the result. ◾

Lemma 9.1 If 0 ≤ α ≤ 1, 0 ≤ λ ≤ 1, then

h(y) = λ^α x^α + (1 − λ^α) y^α − (λx + (1 − λ)y)^α ≥ 0

for all 0 ≤ y ≤ x.

Proof The proof is left as an exercise. ◾

We are now ready to prove the following important theorem.

Theorem 9.2 For a monotone system of independent components, if each component has an IFRA lifetime distribution, then the distribution of system lifetime is itself IFRA.

Proof The distribution of system lifetime F is given by

F̄(αt) = r(F̄_1(αt), ..., F̄_n(αt))

Hence, since r is a monotone function, and since each of the component distributions F̄_i is IFRA, we obtain from Equation (9.19)

F̄(αt) ≥ r(F̄_1^α(t), ..., F̄_n^α(t))
       ≥ [r(F̄_1(t), ..., F̄_n(t))]^α
       = F̄^α(t)

which by Equation (9.19) proves the theorem. The last inequality followed, of course, from Proposition 9.2. ◾
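Proposition 9.2 can be spot-checked numerically. The sketch below (our own test harness, not from the text) draws random component probabilities and exponents for the two-out-of-three reliability function and verifies r(p^α) ≥ [r(p)]^α in every trial:

```python
import random

# Spot-check of Proposition 9.2, r(p**alpha) >= r(p)**alpha, for the
# two-out-of-three reliability function
# r(p) = p1*p2 + p1*p3 + p2*p3 - 2*p1*p2*p3.

def r(p1, p2, p3):
    return p1 * p2 + p1 * p3 + p2 * p3 - 2 * p1 * p2 * p3

random.seed(42)
ok = True
for _ in range(1000):
    p = [random.random() for _ in range(3)]
    alpha = random.random()
    # Small slack guards against floating-point rounding only.
    ok &= r(*[x ** alpha for x in p]) >= r(*p) ** alpha - 1e-12
print(ok)  # True
```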


9.6. Expected System Lifetime

In this section, we show how the mean lifetime of a system can be determined, at least in theory, from a knowledge of the reliability function r(p) and the component lifetime distributions F_i, i = 1, ..., n.

Since the system's lifetime will be t or larger if and only if the system is still functioning at time t, we have

P{system life > t} = r(F̄(t))

where F̄(t) = (F̄_1(t), ..., F̄_n(t)). Hence, by a well-known formula that states that for any nonnegative random variable X,

E[X] = ∫_0^∞ P{X > x} dx,

we see that*

E[system life] = ∫_0^∞ r(F̄(t)) dt    (9.21)

Example 9.26 (A Series System of Uniformly Distributed Components) Consider a series system of three independent components each of which functions for an amount of time (in hours) uniformly distributed over (0, 10). Hence, r(p) = p_1 p_2 p_3 and

F_i(t) = { t/10, 0 ≤ t ≤ 10
         { 1,    t > 10        i = 1, 2, 3

Therefore,

r(F̄(t)) = { ((10 − t)/10)^3, 0 ≤ t ≤ 10
           { 0,               t > 10

* That E[X] = ∫_0^∞ P{X > x} dx can be shown as follows when X has density f:

∫_0^∞ P{X > x} dx = ∫_0^∞ ∫_x^∞ f(y) dy dx = ∫_0^∞ ∫_0^y f(y) dx dy = ∫_0^∞ y f(y) dy = E[X]


and so from Equation (9.21) we obtain

E[system life] = ∫_0^10 ((10 − t)/10)^3 dt
               = 10 ∫_0^1 y^3 dy
               = 5/2 ◾

Example 9.27 (A Two-out-of-Three System) Consider a two-out-of-three system of independent components, in which each component's lifetime is (in months) uniformly distributed over (0, 1). As was shown in Example 9.13, the reliability of such a system is given by

r(p) = p_1 p_2 + p_1 p_3 + p_2 p_3 − 2 p_1 p_2 p_3

Since

F_i(t) = { t, 0 ≤ t ≤ 1
         { 1, t > 1

we see from Equation (9.21) that

E[system life] = ∫_0^1 [3(1 − t)^2 − 2(1 − t)^3] dt
               = ∫_0^1 (3y^2 − 2y^3) dy
               = 1 − 1/2
               = 1/2 ◾

Example 9.28 (A Four-Component System) Consider the four-component system that functions when components 1 and 2 and at least one of components 3 and 4 functions. Its structure function is given by

φ(x) = x_1 x_2 (x_3 + x_4 − x_3 x_4)

and thus its reliability function equals

r(p) = p_1 p_2 (p_3 + p_4 − p_3 p_4)
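Equation (9.21) also lends itself to direct numerical integration. As a sketch (the midpoint-rule integrator is ours), the means computed in Examples 9.26 and 9.27 can be recovered to several decimal places:

```python
# Verify Examples 9.26 and 9.27 by numerically integrating Equation (9.21),
# E[system life] = integral of r(F_bar(t)) dt, with a simple midpoint rule.

def integrate(f, a, b, n=200000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Example 9.26: series system of three uniform(0,10) components; exact mean 5/2.
series = integrate(lambda t: ((10 - t) / 10) ** 3, 0, 10)

# Example 9.27: two-out-of-three uniform(0,1) components; exact mean 1/2.
def r_2of3(t):
    p = 1 - t  # common survival probability F_bar_i(t)
    return 3 * p * p - 2 * p ** 3

two_of_three = integrate(r_2of3, 0, 1)
print(round(series, 4), round(two_of_three, 4))  # 2.5 0.5
```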


Let us compute the mean system lifetime when the ith component is uniformly distributed over (0, i), i = 1, 2, 3, 4. Now,

F̄_1(t) = { 1 − t,   0 ≤ t ≤ 1        F̄_2(t) = { 1 − t/2, 0 ≤ t ≤ 2
          { 0,       t > 1                      { 0,       t > 2

F̄_3(t) = { 1 − t/3, 0 ≤ t ≤ 3        F̄_4(t) = { 1 − t/4, 0 ≤ t ≤ 4
          { 0,       t > 3                      { 0,       t > 4

Hence,

r(F̄(t)) = { (1 − t) ((2 − t)/2) [ (3 − t)/3 + (4 − t)/4 − (3 − t)(4 − t)/12 ], 0 ≤ t ≤ 1
           { 0,  t > 1

Therefore,

E[system life] = (1/24) ∫_0^1 (1 − t)(2 − t)(12 − t^2) dt
               = 593/[(24)(60)]
               ≈ 0.41 ◾

We end this section by obtaining the mean lifetime of a k-out-of-n system of independent identically distributed exponential components. If θ is the mean lifetime of each component, then

F̄_i(t) = e^{−t/θ}

Hence, since for a k-out-of-n system,

r(p, p, ..., p) = Σ_{i=k}^n (n choose i) p^i (1 − p)^{n−i}

we obtain from Equation (9.21)

E[system life] = ∫_0^∞ Σ_{i=k}^n (n choose i) (e^{−t/θ})^i (1 − e^{−t/θ})^{n−i} dt


Making the substitution

y = e^{−t/θ},   dy = −(1/θ) e^{−t/θ} dt = −(y/θ) dt

yields

E[system life] = θ Σ_{i=k}^n (n choose i) ∫_0^1 y^{i−1} (1 − y)^{n−i} dy

Now, it is not difficult to show that*

∫_0^1 y^n (1 − y)^m dy = m! n! / (m + n + 1)!    (9.22)

Thus, the foregoing equals

E[system life] = θ Σ_{i=k}^n [n!/(i!(n − i)!)] [(i − 1)!(n − i)!/n!]
               = θ Σ_{i=k}^n 1/i    (9.23)

Remark Equation (9.23) could have been proven directly by making use of special properties of the exponential distribution. First note that the lifetime of a k-out-of-n system can be written as T_1 + ··· + T_{n−k+1}, where T_i represents the time between the (i − 1)st and ith failure. This is true since T_1 + ··· + T_{n−k+1} equals the time at which the (n − k + 1)st component fails, which is also the first time that the number of functioning components is less than k. Now, when all n components are functioning, the rate at which failures occur is n/θ. That is, T_1 is exponentially distributed with mean θ/n. Similarly, since T_i represents the time until the next failure when there are n − (i − 1) functioning components,

* Let

C(n, m) = ∫_0^1 y^n (1 − y)^m dy

Integration by parts yields C(n, m) = [m/(n + 1)] C(n + 1, m − 1). Starting with C(n, 0) = 1/(n + 1), Equation (9.22) follows by mathematical induction.


it follows that T_i is exponentially distributed with mean θ/(n − i + 1). Hence, the mean system lifetime equals

E[T_1 + ··· + T_{n−k+1}] = θ [1/n + ··· + 1/k]

Note also that it follows, from the lack of memory of the exponential, that the T_i, i = 1, ..., n − k + 1, are independent random variables.

9.6.1. An Upper Bound on the Expected Life of a Parallel System

Consider a parallel system of n components, whose lifetimes are not necessarily independent. The system lifetime can be expressed as

system life = max_i X_i

where X_i is the lifetime of component i, i = 1, ..., n. We can bound the expected system lifetime by making use of the following inequality. Namely, for any constant c,

max_i X_i ≤ c + Σ_{i=1}^n (X_i − c)^+    (9.24)

where x^+, the positive part of x, is equal to x if x > 0 and is equal to 0 if x ≤ 0. The validity of inequality (9.24) is immediate because if max X_i ≤ c then the right side is at least c, whereas if max X_i = X_(n) > c then the right side is at least as large as c + (X_(n) − c) = X_(n).

It follows from inequality (9.24), upon taking expectations, that

E[max_i X_i] ≤ c + Σ_{i=1}^n E[(X_i − c)^+]    (9.25)

Now, (X_i − c)^+ is a nonnegative random variable and so

E[(X_i − c)^+] = ∫_0^∞ P{(X_i − c)^+ > x} dx
              = ∫_0^∞ P{X_i − c > x} dx
              = ∫_c^∞ P{X_i > y} dy


Thus, we obtain

E[max_i X_i] ≤ c + Σ_{i=1}^n ∫_c^∞ P{X_i > y} dy    (9.26)

Because the preceding is true for all c, it follows that we obtain the best bound by letting c equal the value that minimizes the right side of the preceding. To determine that value, differentiate the right side of the preceding and set the result equal to 0, to obtain

1 − Σ_{i=1}^n P{X_i > c} = 0

That is, the minimizing value of c is that value c* for which

Σ_{i=1}^n P{X_i > c*} = 1

Since Σ_{i=1}^n P{X_i > c} is a decreasing function of c, the value of c* can be easily approximated and then utilized in inequality (9.26). Also, it is interesting to note that c* is such that the expected number of the X_i that exceed c* is equal to 1 (see Exercise 32). That the optimal value of c has this property is interesting and somewhat intuitive inasmuch as inequality (9.24) is an equality when exactly one of the X_i exceeds c.

Example 9.29 Suppose the lifetime of component i is exponentially distributed with rate λ_i, i = 1, ..., n. Then the minimizing value of c is such that

1 = Σ_{i=1}^n P{X_i > c*} = Σ_{i=1}^n e^{−λ_i c*}

and the resulting bound of the mean system life is

E[max_i X_i] ≤ c* + Σ_{i=1}^n E[(X_i − c*)^+]
            = c* + Σ_{i=1}^n ( E[(X_i − c*)^+ | X_i > c*] P{X_i > c*}
                             + E[(X_i − c*)^+ | X_i ≤ c*] P{X_i ≤ c*} )
            = c* + Σ_{i=1}^n (1/λ_i) e^{−λ_i c*}
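Since Σ_i P{X_i > c} is decreasing in c, the equation for c* is easy to solve by bisection. As a sketch (the solver and its search interval are ours), here is the bound of Example 9.29, compared in the equal-rates case with the exact mean of the maximum of independent exponentials:

```python
import math

# Example 9.29: solve sum_i exp(-lam_i * c) = 1 for c* by bisection, then
# form the bound c* + sum_i exp(-lam_i * c*) / lam_i on E[max_i X_i].

def parallel_bound(lams):
    lo, hi = 0.0, 100.0  # assumes c* < 100, which holds for these rates
    for _ in range(200):
        mid = (lo + hi) / 2
        if sum(math.exp(-l * mid) for l in lams) > 1:
            lo = mid
        else:
            hi = mid
    c = (lo + hi) / 2
    return c + sum(math.exp(-l * c) / l for l in lams)

# Equal rates lam = 1, n = 10: c* = log(10) and the bound is log(10) + 1,
# versus the exact independent-case mean sum_{i=1}^{10} 1/i.
bound = parallel_bound([1.0] * 10)
exact = sum(1 / i for i in range(1, 11))
print(round(bound, 3), round(exact, 3))  # 3.303 2.929
```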


In the special case where all the rates are equal, say, λ_i = λ, i = 1, ..., n, then

1 = n e^{−λc*}   or   c* = (1/λ) log(n)

and the bound is

E[max_i X_i] ≤ (1/λ)(log(n) + 1)

That is, if X_1, ..., X_n are identically distributed exponential random variables with rate λ, then the preceding gives a bound on the expected value of their maximum. In the special case where these random variables are also independent, the following exact expression is not much less than the preceding upper bound:

E[max_i X_i] = (1/λ) Σ_{i=1}^n 1/i ≈ (1/λ) ∫_1^n (1/x) dx ≈ (1/λ) log(n) ◾

9.7. Systems with Repair

Consider an n component system having reliability function r(p). Suppose that component i functions for an exponentially distributed time with rate λ_i and then fails; once failed, it takes an exponential time with rate μ_i to be repaired, i = 1, ..., n. All components act independently.

Let us suppose that all components are initially working, and let

A(t) = P{system is working at t}

A(t) is called the availability at time t. Since the components act independently, A(t) can be expressed in terms of the reliability function as follows:

A(t) = r(A_1(t), ..., A_n(t))    (9.27)

where

A_i(t) = P{component i is functioning at t}

Now the state of component i—either on or off—changes in accordance with a two-state continuous time Markov chain. Hence, from the results of Example 6.12


we have

A_i(t) = P_00(t) = μ_i/(μ_i + λ_i) + [λ_i/(μ_i + λ_i)] e^{−(λ_i + μ_i)t}

Thus, we obtain

A(t) = r( μ/(μ + λ) + [λ/(μ + λ)] e^{−(λ+μ)t} )

where the vector notation is understood componentwise. If we let t approach ∞, then we obtain the limiting availability—call it A—which is given by

A = lim_{t→∞} A(t) = r( μ/(λ + μ) )

Remarks (i) If the on and off distributions for component i are arbitrary continuous distributions with respective means 1/λ_i and 1/μ_i, i = 1, ..., n, then it follows from the theory of alternating renewal processes (see Section 7.5.1) that

A_i(t) → (1/λ_i) / (1/λ_i + 1/μ_i) = μ_i/(μ_i + λ_i)

and thus, using the continuity of the reliability function, it follows from (9.27) that the limiting availability is

A = lim_{t→∞} A(t) = r( μ/(μ + λ) )

Hence, A depends only on the on and off distributions through their means.

(ii) It can be shown (using the theory of regenerative processes as presented in Section 7.5) that A will also equal the long-run proportion of time that the system will be functioning.

Example 9.30 For a series system, r(p) = ∏_{i=1}^n p_i and so

A(t) = ∏_{i=1}^n [ μ_i/(μ_i + λ_i) + [λ_i/(μ_i + λ_i)] e^{−(λ_i + μ_i)t} ]

and

A = ∏_{i=1}^n μ_i/(μ_i + λ_i) ◾
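A minimal sketch of Example 9.30 (rates and function names are ours): availability of a series system starts at 1 when all components are up and decays to the limiting availability ∏_i μ_i/(μ_i + λ_i):

```python
import math

# Availability of a series system (Example 9.30): each component alternates
# between exponential up times (rate lam_i) and repair times (rate mu_i).

def series_availability(t, lams, mus):
    a = 1.0
    for lam, mu in zip(lams, mus):
        a *= mu / (mu + lam) + lam / (mu + lam) * math.exp(-(lam + mu) * t)
    return a

lams, mus = [1.0, 0.5], [4.0, 2.0]
limit = 1.0
for lam, mu in zip(lams, mus):
    limit *= mu / (mu + lam)

print(round(series_availability(0.0, lams, mus), 6))  # 1.0 (all start up)
print(round(series_availability(50.0, lams, mus), 6), round(limit, 6))
```

For these rates the limiting availability is (4/5)(4/5) = 0.64, and A(50) has already converged to it.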


Example 9.31 For a parallel system, r(p) = 1 − ∏_{i=1}^n (1 − p_i) and thus

A(t) = 1 − ∏_{i=1}^n [ [λ_i/(μ_i + λ_i)] (1 − e^{−(λ_i + μ_i)t}) ]

and

A = 1 − ∏_{i=1}^n λ_i/(μ_i + λ_i) ◾

The preceding system will alternate between periods when it is up and periods when it is down. Let us denote by U_i and D_i, i ≥ 1, the lengths of the ith up and down period respectively. For instance, in a two-out-of-three system, U_1 will be the time until two components are down; D_1, the additional time until two are up; U_2 the additional time until two are down, and so on. Let

Ū = lim_{n→∞} (U_1 + ··· + U_n)/n,
D̄ = lim_{n→∞} (D_1 + ··· + D_n)/n

denote the average length of an up and down period respectively.*

To determine Ū and D̄, note first that in the first n up–down cycles—that is, in time Σ_{i=1}^n (U_i + D_i)—the system will be up for a time Σ_{i=1}^n U_i. Hence, the proportion of time the system will be up in the first n up–down cycles is

(U_1 + ··· + U_n) / (U_1 + ··· + U_n + D_1 + ··· + D_n)
    = (Σ_{i=1}^n U_i/n) / (Σ_{i=1}^n U_i/n + Σ_{i=1}^n D_i/n)

As n → ∞, this must converge to A, the long-run proportion of time the system is up. Hence,

Ū/(Ū + D̄) = A = r( μ/(λ + μ) )    (9.28)

However, to solve for Ū and D̄ we need a second equation. To obtain one, consider the rate at which the system fails. As there will be n failures in time

* It can be shown using the theory of regenerative processes that, with probability 1, the preceding limits will exist and will be constants.


Σ_{i=1}^n (U_i + D_i), it follows that the rate at which the system fails is

rate at which system fails = lim_{n→∞} n / (Σ_1^n U_i + Σ_1^n D_i)
                           = lim_{n→∞} 1 / (Σ_1^n U_i/n + Σ_1^n D_i/n) = 1/(Ū + D̄)    (9.29)

That is, the foregoing yields the intuitive result that, on average, there is one failure every Ū + D̄ time units. To utilize this, let us determine the rate at which a failure of component i causes the system to go from up to down. Now, the system will go from up to down when component i fails if the states of the other components x_1, ..., x_{i−1}, x_{i+1}, ..., x_n are such that φ(1_i, x) = 1, φ(0_i, x) = 0. That is, the states of the other components must be such that

φ(1_i, x) − φ(0_i, x) = 1    (9.30)

Since component i will, on average, have one failure every 1/λ_i + 1/μ_i time units, it follows that the rate at which component i fails is equal to (1/λ_i + 1/μ_i)^{−1} = λ_i μ_i/(λ_i + μ_i). In addition, the states of the other components will be such that (9.30) holds with probability

P{φ(1_i, X(∞)) − φ(0_i, X(∞)) = 1}
    = E[φ(1_i, X(∞)) − φ(0_i, X(∞))]   since φ(1_i, X(∞)) − φ(0_i, X(∞)) is a Bernoulli random variable
    = r(1_i, μ/(λ + μ)) − r(0_i, μ/(λ + μ))

Hence, putting the preceding together we see that

rate at which component i causes the system to fail
    = [λ_i μ_i/(λ_i + μ_i)] [r(1_i, μ/(λ + μ)) − r(0_i, μ/(λ + μ))]

Summing this over all components i thus gives

rate at which system fails = Σ_i [λ_i μ_i/(λ_i + μ_i)] [r(1_i, μ/(λ + μ)) − r(0_i, μ/(λ + μ))]

Finally, equating the preceding with (9.29) yields

1/(Ū + D̄) = Σ_i [λ_i μ_i/(λ_i + μ_i)] [r(1_i, μ/(λ + μ)) − r(0_i, μ/(λ + μ))]    (9.31)


Solving (9.28) and (9.31), we obtain

Ū = r( μ/(λ + μ) ) / Σ_{i=1}^n [λ_i μ_i/(λ_i + μ_i)] [r(1_i, μ/(λ + μ)) − r(0_i, μ/(λ + μ))]    (9.32)

D̄ = [1 − r( μ/(λ + μ) )] Ū / r( μ/(λ + μ) )    (9.33)

Also, (9.31) yields the rate at which the system fails.

Remark In establishing the formulas for Ū and D̄, we did not make use of the assumption of exponential on and off times; in fact, our derivation is valid and Equations (9.32) and (9.33) hold whenever Ū and D̄ are well defined (a sufficient condition is that all on and off distributions are continuous). The quantities λ_i, μ_i, i = 1, ..., n, will represent, respectively, the reciprocals of the mean lifetimes and mean repair times.

Example 9.32 For a series system,

Ū = ∏_i [μ_i/(μ_i + λ_i)] / Σ_i { [λ_i μ_i/(λ_i + μ_i)] ∏_{j≠i} [μ_j/(μ_j + λ_j)] } = 1/Σ_i λ_i,

D̄ = [1 − ∏_i μ_i/(μ_i + λ_i)] / [∏_i μ_i/(μ_i + λ_i) × Σ_i λ_i]

whereas for a parallel system,

Ū = [1 − ∏_i λ_i/(μ_i + λ_i)] / Σ_i { [λ_i μ_i/(λ_i + μ_i)] ∏_{j≠i} [λ_j/(μ_j + λ_j)] }
  = [1 − ∏_i λ_i/(μ_i + λ_i)] / [∏_i λ_i/(μ_i + λ_i) × Σ_i μ_i],

D̄ = 1/Σ_i μ_i
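The series-system simplification Ū = 1/Σ_i λ_i claimed in Example 9.32 can be checked by evaluating (9.32) and (9.33) directly. A sketch with arbitrary rates of our choosing:

```python
# Numeric check of Example 9.32: for a series system, Equation (9.32)
# reduces to U_bar = 1 / sum(lam_i), since r(1_i, .) - r(0_i, .) equals the
# product of the other components' limiting availabilities.

def prod(xs):
    out = 1.0
    for x in xs:
        out *= x
    return out

def series_up_down(lams, mus):
    avail = [mu / (mu + lam) for lam, mu in zip(lams, mus)]
    r_full = prod(avail)
    denom = sum(
        lam * mu / (lam + mu) * prod(a for j, a in enumerate(avail) if j != i)
        for i, (lam, mu) in enumerate(zip(lams, mus))
    )
    u_bar = r_full / denom                 # Equation (9.32)
    d_bar = (1 - r_full) * u_bar / r_full  # Equation (9.33)
    return u_bar, d_bar

lams, mus = [1.0, 2.0, 0.5], [3.0, 5.0, 4.0]
u_bar, d_bar = series_up_down(lams, mus)
print(round(u_bar, 6), round(1 / sum(lams), 6))  # both 0.285714
```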


The preceding formulas hold for arbitrary continuous up and down distributions with 1/λ_i and 1/μ_i denoting, respectively, the mean up and down times of component i, i = 1, ..., n. ◾

9.7.1. A Series Model with Suspended Animation

Consider a series system of n components, and suppose that whenever a component (and thus the system) goes down, repair begins on that component and each of the other components enters a state of suspended animation. That is, after the down component is repaired, the other components resume operation in exactly the same condition as when the failure occurred. If two or more components go down simultaneously, one of them is arbitrarily chosen as being the failed component and repair on that component begins; the others that went down at the same time are considered to be in a state of suspended animation, and they will instantaneously go down when the repair is completed. We suppose that (not counting any time in suspended animation) the distribution of time that component i functions is F_i with mean u_i, whereas its repair distribution is G_i with mean d_i, i = 1, ..., n.

To determine the long-run proportion of time this system is working, we reason as follows. To begin, consider the time, call it T, at which the system has been up for a time t. Now, when the system is up, the failure times of component i constitute a renewal process with mean interarrival time u_i. Therefore, it follows that

number of failures of i in time T ≈ t/u_i

As the average repair time of i is d_i, the preceding implies that

total repair time of i in time T ≈ t d_i/u_i

Therefore, in the period of time in which the system has been up for a time t, the total system downtime has approximately been

t Σ_{i=1}^n d_i/u_i

Hence, the proportion of time that the system has been up is approximately

t / (t + t Σ_{i=1}^n d_i/u_i)


Because this approximation should become exact as we let t become larger, it follows that

proportion of time the system is up = 1 / (1 + Σ_i d_i/u_i)    (9.34)

which also shows that

proportion of time the system is down = 1 − proportion of time the system is up
                                      = (Σ_i d_i/u_i) / (1 + Σ_i d_i/u_i)

Moreover, in the time interval from 0 to T, the proportion of the repair time that has been devoted to component i is approximately

(t d_i/u_i) / (Σ_i t d_i/u_i)

Thus, in the long run,

proportion of down time that is due to component i = (d_i/u_i) / (Σ_i d_i/u_i)

Multiplying the preceding by the proportion of time the system is down gives

proportion of time component i is being repaired = (d_i/u_i) / (1 + Σ_i d_i/u_i)

Also, since component j will be in suspended animation whenever any of the other components is in repair, we see that

proportion of time component j is in suspended animation = (Σ_{i≠j} d_i/u_i) / (1 + Σ_i d_i/u_i)

Another quantity of interest is the long-run rate at which the system fails. Since component i fails at rate 1/u_i when the system is up, and does not fail when the system is down, it follows that

rate at which i fails = (proportion of time system is up)/u_i = (1/u_i) / (1 + Σ_i d_i/u_i)
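The long-run quantities just derived are all simple functions of the ratios d_i/u_i. A sketch collecting them in one place (the function and the sample means are ours):

```python
# Long-run quantities for the series model with suspended animation:
# u[i] is the mean up time and d[i] the mean repair time of component i.

def suspended_animation_summary(u, d):
    load = sum(di / ui for ui, di in zip(u, d))
    up = 1 / (1 + load)                               # Equation (9.34)
    fail_rate = sum(1 / ui for ui in u) / (1 + load)  # system failure rate
    repair = [(di / ui) / (1 + load) for ui, di in zip(u, d)]
    return up, fail_rate, repair

up, fail_rate, repair = suspended_animation_summary([10.0, 20.0], [1.0, 2.0])
print(round(up, 4))         # 0.8333, since load = 0.1 + 0.1 = 0.2
print(round(fail_rate, 4))  # 0.125, i.e. (0.1 + 0.05)/1.2
print([round(x, 4) for x in repair])
```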


Since the system fails when any of its components fail, the preceding yields

rate at which the system fails = (Σ_i 1/u_i) / (1 + Σ_i d_i/u_i)    (9.35)

If we partition the time axis into periods when the system is up and those when it is down, we can determine the average length of an up period by noting that if U(t) is the total amount of time that the system is up in the interval [0, t], and if N(t) is the number of failures by time t, then

average length of an up period = lim_{t→∞} U(t)/N(t)
                               = lim_{t→∞} [U(t)/t] / [N(t)/t]
                               = 1 / Σ_i 1/u_i

where the final equality used Equations (9.34) and (9.35). Also, in a similar manner it can be shown that

average length of a down period = (Σ_i d_i/u_i) / (Σ_i 1/u_i)    (9.36)

Exercises

1. Prove that, for any structure function φ,

φ(x) = x_i φ(1_i, x) + (1 − x_i) φ(0_i, x)

where

(1_i, x) = (x_1, ..., x_{i−1}, 1, x_{i+1}, ..., x_n),
(0_i, x) = (x_1, ..., x_{i−1}, 0, x_{i+1}, ..., x_n)

2. Show that

(a) if φ(0, 0, ..., 0) = 0 and φ(1, 1, ..., 1) = 1, then

min x_i ≤ φ(x) ≤ max x_i


(b) φ(max(x, y)) ≥ max(φ(x), φ(y))
(c) φ(min(x, y)) ≤ min(φ(x), φ(y))

3. For any structure function φ, we define the dual structure φ^D by

φ^D(x) = 1 − φ(1 − x)

(a) Show that the dual of a parallel (series) system is a series (parallel) system.
(b) Show that the dual of a dual structure is the original structure.
(c) What is the dual of a k-out-of-n structure?
(d) Show that a minimal path (cut) set of the dual system is a minimal cut (path) set of the original structure.

*4. Write the structure function corresponding to the following:

(a) Figure 9.16.
(b) Figure 9.17.
(c) Figure 9.18.


5. Find the minimal path and minimal cut sets for:

(a) Figure 9.19.
(b) Figure 9.20.

*6. The minimal path sets are {1, 2, 4}, {1, 3, 5}, and {5, 6}. Give the minimal cut sets.

7. The minimal cut sets are {1, 2, 3}, {2, 3, 4}, and {3, 5}. What are the minimal path sets?

8. Give the minimal path sets and the minimal cut sets for the structure given by Figure 9.21.

9. Component i is said to be relevant to the system if for some state vector x,

φ(1_i, x) = 1,   φ(0_i, x) = 0

Otherwise, it is said to be irrelevant.


Figure 9.21.

(a) Explain in words what it means for a component to be irrelevant.
(b) Let A_1, ..., A_s be the minimal path sets of a system, and let S denote the set of components. Show that S = ⋃_{i=1}^s A_i if and only if all components are relevant.
(c) Let C_1, ..., C_k denote the minimal cut sets. Show that S = ⋃_{i=1}^k C_i if and only if all components are relevant.

10. Let t_i denote the time of failure of the ith component; let τ_φ(t) denote the time to failure of the system φ as a function of the vector t = (t_1, ..., t_n). Show that

max_{1≤j≤s} min_{i∈A_j} t_i = τ_φ(t) = min_{1≤j≤k} max_{i∈C_j} t_i

where C_1, ..., C_k are the minimal cut sets, and A_1, ..., A_s the minimal path sets.

11. Give the reliability function of the structure of Exercise 8.

*12. Give the minimal path sets and the reliability function for the structure in Figure 9.22.

Figure 9.22.


13. Let r(p) be the reliability function. Show that

r(p) = p_i r(1_i, p) + (1 − p_i) r(0_i, p)

14. Compute the reliability function of the bridge system (see Figure 9.11) by conditioning upon whether or not component 3 is working.

15. Compute upper and lower bounds of the reliability function (using Method 2) for the systems given in Exercise 4, and compare them with the exact values when p_i ≡ 1/2.

16. Compute the upper and lower bounds of r(p) using both methods for the
(a) two-out-of-three system and
(b) two-out-of-four system.
(c) Compare these bounds with the exact reliability when
(i) p_i ≡ 0.5  (ii) p_i ≡ 0.8  (iii) p_i ≡ 0.2

*17. Let N be a nonnegative, integer-valued random variable. Show that

P{N > 0} ≥ (E[N])^2 / E[N^2]

and explain how this inequality can be used to derive additional bounds on a reliability function.

Hint:

E[N^2] = E[N^2 | N > 0] P{N > 0}   (Why?)
       ≥ (E[N | N > 0])^2 P{N > 0}   (Why?)

Now multiply both sides by P{N > 0}.

18. Consider a structure in which the minimal path sets are {1, 2, 3} and {3, 4, 5}.
(a) What are the minimal cut sets?
(b) If the component lifetimes are independent uniform (0, 1) random variables, determine the probability that the system life will be less than 1/2.

19. Let X_1, X_2, ..., X_n denote independent and identically distributed random variables and define the order statistics X_(1), ..., X_(n) by

X_(i) ≡ ith smallest of X_1, ..., X_n

Show that if the distribution of X_j is IFR, then so is the distribution of X_(i).

Hint: Relate this to one of the examples of this chapter.


20. Let F be a continuous distribution function. For some positive α, define the distribution function G by

Ḡ(t) = (F̄(t))^α

Find the relationship between λ_G(t) and λ_F(t), the respective failure rate functions of G and F.

21. Consider the following four structures:

(i) Figure 9.23.  (ii) Figure 9.24.
(iii) Figure 9.25.  (iv) Figure 9.26.

Let F_1, F_2, and F_3 be the corresponding component failure distributions, each of which is assumed to be IFR (increasing failure rate). Let F be the system failure distribution. All components are independent.
(a) For which structures is F necessarily IFR if F_1 = F_2 = F_3? Give reasons.
(b) For which structures is F necessarily IFR if F_2 = F_3? Give reasons.
(c) For which structures is F necessarily IFR if F_1 ≠ F_2 ≠ F_3? Give reasons.

*22. Let X denote the lifetime of an item. Suppose the item has reached the age of t. Let X_t denote its remaining life and define

F̄_t(a) = P{X_t > a}

In words, F̄_t(a) is the probability that a t-year-old item survives an additional time a. Show that


(a) F̄_t(a) = F̄(t + a)/F̄(t), where F is the distribution function of X.
(b) Another definition of IFR is to say that F is IFR if F̄_t(a) decreases in t, for all a. Show that this definition is equivalent to the one given in the text when F has a density.

23. Show that if each (independent) component of a series system has an IFR distribution, then the system lifetime is itself IFR by

(a) showing that

λ_F(t) = Σ_i λ_i(t)

where λ_F(t) is the failure rate function of the system, and λ_i(t) the failure rate function of the lifetime of component i.
(b) using the definition of IFR given in Exercise 22.

24. Show that if F is IFR, then it is also IFRA, and show by counterexample that the reverse is not true.

*25. We say that ζ is a p-percentile of the distribution F if F(ζ) = p. Show that if ζ is a p-percentile of the IFRA distribution F, then

F̄(x) ≤ e^{−θx},   x ≥ ζ
F̄(x) ≥ e^{−θx},   x ≤ ζ

where

θ = −log(1 − p)/ζ

26. Prove Lemma 9.3.

Hint: Let x = y + δ. Note that f(t) = t^α is a concave function when 0 ≤ α ≤ 1, and use the fact that for a concave function, f(t + h) − f(t) is decreasing in t.

27. Let r(p) = r(p, p, ..., p). Show that if r(p_0) = p_0, then

r(p) ≥ p   for p ≥ p_0
r(p) ≤ p   for p ≤ p_0

Hint: Use Proposition 9.2.

28. Find the mean lifetime of a series system of two components when the component lifetimes are respectively uniform on (0, 1) and uniform on (0, 2). Repeat for a parallel system.


29. Show that the mean lifetime of a parallel system of two components is

1/(μ_1 + μ_2) + μ_1/[(μ_1 + μ_2)μ_2] + μ_2/[(μ_1 + μ_2)μ_1]

when the first component is exponentially distributed with mean 1/μ_1 and the second is exponential with mean 1/μ_2.

*30. Compute the expected system lifetime of a three-out-of-four system when the first two component lifetimes are uniform on (0, 1) and the second two are uniform on (0, 2).

31. Show that the variance of the lifetime of a k-out-of-n system of components, each of whose lifetimes is exponential with mean θ, is given by

θ^2 Σ_{i=k}^n 1/i^2

32. In Section 9.6.1 show that the expected number of X_i that exceed c* is equal to 1.

33. Let X_i be an exponential random variable with mean 8 + 2i, for i = 1, 2, 3. Use the results of Section 9.6.1 to obtain an upper bound on E[max X_i], and then compare this with the exact result when the X_i are independent.

34. For the model of Section 9.7, compute for a k-out-of-n structure (i) the average up time, (ii) the average down time, and (iii) the system failure rate.

35. Prove the combinatorial identity

(n − 1 choose i − 1) = (n choose i) − (n choose i + 1) + ··· ± (n choose n),   i ≤ n

(a) by induction on i
(b) by a backwards induction argument on i—that is, prove it first for i = n, then assume it for i = k and show that this implies that it is true for i = k − 1.

36. Verify Equation (9.36).

References

1. R. E. Barlow and F. Proschan, "Statistical Theory of Reliability and Life Testing," Holt, New York, 1975.
2. H. Frank and I. Frisch, "Communication, Transmission, and Transportation Network," Addison-Wesley, Reading, Massachusetts, 1971.
3. I. B. Gertsbakh, "Statistical Reliability Theory," Marcel Dekker, New York and Basel, 1989.


10 Brownian Motion and Stationary Processes

10.1. Brownian Motion

Let us start by considering the symmetric random walk, which in each time unit is equally likely to take a unit step either to the left or to the right. That is, it is a Markov chain with $P_{i,i+1} = \frac12 = P_{i,i-1}$, $i = 0, \pm 1, \ldots$. Now suppose that we speed up this process by taking smaller and smaller steps in smaller and smaller time intervals. If we go to the limit in the right manner, what we obtain is Brownian motion.

More precisely, suppose that each $\Delta t$ time unit we take a step of size $\Delta x$ either to the left or to the right with equal probabilities. If we let $X(t)$ denote the position at time $t$, then
$$X(t) = \Delta x\,(X_1 + \cdots + X_{[t/\Delta t]}) \tag{10.1}$$
where
$$X_i = \begin{cases} +1, & \text{if the } i\text{th step of length } \Delta x \text{ is to the right}\\ -1, & \text{if it is to the left}\end{cases}$$
and $[t/\Delta t]$ is the largest integer less than or equal to $t/\Delta t$, and where the $X_i$ are assumed independent with
$$P\{X_i = 1\} = P\{X_i = -1\} = \tfrac12$$
As $E[X_i] = 0$ and $\text{Var}(X_i) = E[X_i^2] = 1$, we see from Equation (10.1) that
$$E[X(t)] = 0, \qquad \text{Var}(X(t)) = (\Delta x)^2\left[\frac{t}{\Delta t}\right] \tag{10.2}$$
We shall now let $\Delta x$ and $\Delta t$ go to 0. However, we must do it in a way such that the resulting limiting process is nontrivial (for instance, if we let $\Delta x = \Delta t$ and


let $\Delta t \to 0$, then from the preceding we see that $E[X(t)]$ and $\text{Var}(X(t))$ would both converge to 0 and thus $X(t)$ would equal 0 with probability 1). If we let $\Delta x = \sigma\sqrt{\Delta t}$ for some positive constant $\sigma$, then from Equation (10.2) we see that as $\Delta t \to 0$,
$$E[X(t)] = 0, \qquad \text{Var}(X(t)) \to \sigma^2 t$$
We now list some intuitive properties of this limiting process obtained by taking $\Delta x = \sigma\sqrt{\Delta t}$ and then letting $\Delta t \to 0$. From Equation (10.1) and the central limit theorem the following seems reasonable:

(i) $X(t)$ is normal with mean 0 and variance $\sigma^2 t$. In addition, because the changes of value of the random walk in nonoverlapping time intervals are independent, we have
(ii) $\{X(t), t \ge 0\}$ has independent increments, in that for all $t_1 < t_2 < \cdots < t_n$, the quantities $X(t_1), X(t_2) - X(t_1), \ldots, X(t_n) - X(t_{n-1})$ are independent. Also, because the distribution of the change in position of the random walk over any time interval depends only on the length of that interval, it would appear that
(iii) $\{X(t), t \ge 0\}$ has stationary increments, in that the distribution of $X(t+s) - X(t)$ does not depend on $t$.

We are thus led to the following definition.

Definition 10.1 A stochastic process $\{X(t), t \ge 0\}$ is said to be a Brownian motion process if
(i) $X(0) = 0$;
(ii) $\{X(t), t \ge 0\}$ has stationary and independent increments;
(iii) for every $t > 0$, $X(t)$ is normally distributed with mean 0 and variance $\sigma^2 t$.

The Brownian motion process, sometimes called the Wiener process, is one of the most useful stochastic processes in applied probability theory. It originated in physics as a description of Brownian motion: the erratic movement, named after the English botanist Robert Brown who discovered it, exhibited by a small particle totally immersed in a liquid or gas. The first explanation of this phenomenon was given by Einstein in 1905, who showed that it could be accounted for by assuming that the immersed particle is continually subjected to bombardment by


the molecules of the surrounding medium. However, the preceding concise definition of this stochastic process underlying Brownian motion was given by Wiener in a series of papers originating in 1918.

When $\sigma = 1$, the process is called standard Brownian motion. Because any Brownian motion can be converted to the standard process by letting $B(t) = X(t)/\sigma$, we shall, unless otherwise stated, suppose throughout this chapter that $\sigma = 1$.

The interpretation of Brownian motion as the limit of the random walks [Equation (10.1)] suggests that $X(t)$ should be a continuous function of $t$. This turns out to be the case, and it may be proven that, with probability 1, $X(t)$ is indeed a continuous function of $t$. This fact is quite deep, and no proof shall be attempted.

As $X(t)$ is normal with mean 0 and variance $t$, its density function is given by
$$f_t(x) = \frac{1}{\sqrt{2\pi t}}e^{-x^2/2t}$$
To obtain the joint density function of $X(t_1), X(t_2), \ldots, X(t_n)$ for $t_1 < \cdots < t_n$, note first that the set of equalities
$$X(t_1) = x_1,\quad X(t_2) = x_2,\quad \ldots,\quad X(t_n) = x_n$$
is equivalent to
$$X(t_1) = x_1,\quad X(t_2) - X(t_1) = x_2 - x_1,\quad \ldots,\quad X(t_n) - X(t_{n-1}) = x_n - x_{n-1}$$
However, by the independent increment assumption it follows that $X(t_1), X(t_2) - X(t_1), \ldots, X(t_n) - X(t_{n-1})$ are independent and, by the stationary increment assumption, that $X(t_k) - X(t_{k-1})$ is normal with mean 0 and variance $t_k - t_{k-1}$. Hence, the joint density of $X(t_1), \ldots, X(t_n)$ is given by
$$f(x_1, x_2, \ldots, x_n) = f_{t_1}(x_1)f_{t_2 - t_1}(x_2 - x_1)\cdots f_{t_n - t_{n-1}}(x_n - x_{n-1})$$
$$= \frac{\exp\left\{-\dfrac{1}{2}\left[\dfrac{x_1^2}{t_1} + \dfrac{(x_2 - x_1)^2}{t_2 - t_1} + \cdots + \dfrac{(x_n - x_{n-1})^2}{t_n - t_{n-1}}\right]\right\}}{(2\pi)^{n/2}\,[t_1(t_2 - t_1)\cdots(t_n - t_{n-1})]^{1/2}} \tag{10.3}$$
From this equation, we can compute in principle any desired probabilities. For instance, suppose we require the conditional distribution of $X(s)$ given that $X(t) = B$, where $s < t$. The conditional density is
$$f_{s|t}(x \mid B) = \frac{f_s(x)f_{t-s}(B - x)}{f_t(B)}$$
$$= K_1\exp\left\{-\frac{x^2}{2s} - \frac{(B - x)^2}{2(t - s)}\right\}$$
$$= K_2\exp\left\{-\frac{t}{2s(t - s)}\left(x - \frac{s}{t}B\right)^2\right\}$$
where $K_1$ and $K_2$ do not depend on $x$. Hence, the conditional distribution of $X(s)$ given that $X(t) = B$ is, for $s < t$, normal with mean and variance given by
$$E[X(s) \mid X(t) = B] = \frac{s}{t}B, \qquad \text{Var}[X(s) \mid X(t) = B] = \frac{s}{t}(t - s) \tag{10.4}$$

Example 10.1 In a bicycle race between two competitors, let $Y(t)$ denote the amount of time (in seconds) by which the racer that started in the inside position is ahead when $100t$ percent of the race has been completed, and suppose that $\{Y(t), 0 \le t \le 1\}$ can be effectively modeled as a Brownian motion process with variance parameter $\sigma^2$.
(a) If the inside racer is leading by $\sigma$ seconds at the midpoint of the race, what is the probability that she is the winner?
(b) If the inside racer wins the race by a margin of $\sigma$ seconds, what is the probability that she was ahead at the midpoint?


Solution: (a)
$$P\{Y(1) > 0 \mid Y(1/2) = \sigma\} = P\{Y(1) - Y(1/2) > -\sigma \mid Y(1/2) = \sigma\}$$
$$= P\{Y(1) - Y(1/2) > -\sigma\} \quad \text{by independent increments}$$
$$= P\{Y(1/2) > -\sigma\} \quad \text{by stationary increments}$$
$$= P\left\{\frac{Y(1/2)}{\sigma/\sqrt 2} > -\sqrt 2\right\} = \Phi(\sqrt 2) \approx 0.9213$$
where $\Phi(x) = P\{N(0, 1) \le x\}$ is the standard normal distribution function.

(b) Because we must compute $P\{Y(1/2) > 0 \mid Y(1) = \sigma\}$, let us first determine the conditional distribution of $Y(s)$ given that $Y(t) = C$, when $s < t$. Because $X(t) = Y(t)/\sigma$ is standard Brownian motion, it follows from Equation (10.4) that the conditional distribution of $X(s)$, given that $X(t) = C/\sigma$, is normal with mean $sC/(t\sigma)$ and variance $s(t - s)/t$. Hence, the conditional distribution of $Y(s) = \sigma X(s)$ given that $Y(t) = C$ is normal with mean $sC/t$ and variance $\sigma^2 s(t - s)/t$. Consequently,
$$P\{Y(1/2) > 0 \mid Y(1) = \sigma\} = P\{N(\sigma/2, \sigma^2/4) > 0\} = \Phi(1) \approx 0.8413$$

10.2. Hitting Times, Maximum Variable, and the Gambler's Ruin Problem

Let $T_a$ denote the first time the Brownian motion process hits $a$. When $a > 0$ we will compute $P\{T_a \le t\}$ by considering $P\{X(t) \ge a\}$ and conditioning on whether or not $T_a \le t$. This gives
$$P\{X(t) \ge a\} = P\{X(t) \ge a \mid T_a \le t\}P\{T_a \le t\} + P\{X(t) \ge a \mid T_a > t\}P\{T_a > t\} \tag{10.5}$$
Now if $T_a \le t$, then the process hits $a$ at some point in $[0, t]$ and, by symmetry, it is just as likely to be above $a$ or below $a$ at time $t$. That is,
$$P\{X(t) \ge a \mid T_a \le t\} = \tfrac12$$


As the second right-hand term of Equation (10.5) is clearly equal to 0 (since, by continuity, the process value cannot be greater than $a$ without having yet hit $a$), we see that
$$P\{T_a \le t\} = 2P\{X(t) \ge a\} = \frac{2}{\sqrt{2\pi t}}\int_a^\infty e^{-x^2/2t}\,dx = \frac{2}{\sqrt{2\pi}}\int_{a/\sqrt t}^\infty e^{-y^2/2}\,dy, \quad a > 0 \tag{10.6}$$
For $a < 0$, the distribution of $T_a$ is, by symmetry, the same as that of $T_{-a}$.

Another random variable of interest is the maximum value the process attains in $[0, t]$. Its distribution is obtained as follows: for $a > 0$,
$$P\Big\{\max_{0 \le s \le t}X(s) \ge a\Big\} = P\{T_a \le t\} \quad \text{by continuity}$$
$$= 2P\{X(t) \ge a\} \quad \text{from (10.6)}$$
$$= \frac{2}{\sqrt{2\pi}}\int_{a/\sqrt t}^\infty e^{-y^2/2}\,dy$$

Let us now consider the probability that Brownian motion hits $A$ before $-B$, where $A > 0$, $B > 0$. To compute this we shall make use of the interpretation of Brownian motion as a limit of the symmetric random walk. To start, let us recall from the results of the gambler's ruin problem (see Section 4.5.1) that the probability that the symmetric random walk goes up $A$ before going down $B$, when each step is equally likely to be either up or down a distance $\Delta x$, is [by Equation (4.14) with $N = (A + B)/\Delta x$, $i = B/\Delta x$] equal to $B\Delta x/(A + B)\Delta x = B/(A + B)$.

Hence, upon letting $\Delta x \to 0$, we see that
$$P\{\text{up } A \text{ before down } B\} = \frac{B}{A + B}$$
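The random-walk limit just described can be checked numerically. The sketch below is an illustration under discretization choices of my own (step size, trial count), not part of the text: it simulates the symmetric walk with step size $\Delta x$ until it first rises by $A$ or falls by $B$, and compares the hit frequency with $B/(A+B)$.

```python
import random

def prob_up_before_down(A, B, dx=0.1, trials=10000, seed=1):
    """Estimate P{walk rises A before falling B} for the symmetric random
    walk with step size dx (the discrete approximation used in the text)."""
    up, down = round(A / dx), round(B / dx)
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        pos = 0
        while -down < pos < up:
            pos += 1 if rng.random() < 0.5 else -1
        wins += (pos == up)
    return wins / trials

est = prob_up_before_down(A=1.0, B=2.0)
# Theory: B / (A + B) = 2/3, independent of dx
```

Shrinking `dx` leaves the estimate unchanged (only the run time grows), which is exactly why the $\Delta x \to 0$ limit is well defined.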


10.3. Variations on Brownian Motion

10.3.1. Brownian Motion with Drift

We say that $\{X(t), t \ge 0\}$ is a Brownian motion process with drift coefficient $\mu$ and variance parameter $\sigma^2$ if
(i) $X(0) = 0$;
(ii) $\{X(t), t \ge 0\}$ has stationary and independent increments;
(iii) $X(t)$ is normally distributed with mean $\mu t$ and variance $t\sigma^2$.

An equivalent definition is to let $\{B(t), t \ge 0\}$ be standard Brownian motion and then define
$$X(t) = \sigma B(t) + \mu t$$

10.3.2. Geometric Brownian Motion

If $\{Y(t), t \ge 0\}$ is a Brownian motion process with drift coefficient $\mu$ and variance parameter $\sigma^2$, then the process $\{X(t), t \ge 0\}$ defined by
$$X(t) = e^{Y(t)}$$
is called geometric Brownian motion.

For a geometric Brownian motion process $\{X(t)\}$, let us compute the expected value of the process at time $t$ given the history of the process up to time $s$. That is, for $s < t$, consider $E[X(t) \mid X(u), 0 \le u \le s]$. Now,
$$E[X(t) \mid X(u), 0 \le u \le s] = E[e^{Y(t)} \mid Y(u), 0 \le u \le s]$$
$$= e^{Y(s)}E[e^{Y(t) - Y(s)} \mid Y(u), 0 \le u \le s]$$
$$= X(s)E[e^{Y(t) - Y(s)}]$$
where the last equality follows because $Y(t) - Y(s)$ is independent of $Y(u)$, $0 \le u \le s$. Now, the moment generating function of a normal random variable $W$ is given by
$$E[e^{aW}] = e^{aE[W] + a^2\text{Var}(W)/2}$$


Hence, since $Y(t) - Y(s)$ is normal with mean $\mu(t - s)$ and variance $(t - s)\sigma^2$, it follows by setting $a = 1$ that
$$E[e^{Y(t) - Y(s)}] = e^{\mu(t - s) + (t - s)\sigma^2/2}$$
Thus, we obtain
$$E[X(t) \mid X(u), 0 \le u \le s] = X(s)e^{(t - s)(\mu + \sigma^2/2)} \tag{10.8}$$

Geometric Brownian motion is useful in the modeling of stock prices over time when you feel that the percentage changes are independent and identically distributed. For instance, suppose that $X_n$ is the price of some stock at time $n$. Then, it might be reasonable to suppose that $X_n/X_{n-1}$, $n \ge 1$, are independent and identically distributed. Let
$$Y_n = X_n/X_{n-1}$$
and so
$$X_n = Y_nX_{n-1}$$
Iterating this equality gives
$$X_n = Y_nY_{n-1}X_{n-2} = Y_nY_{n-1}Y_{n-2}X_{n-3} = \cdots = Y_nY_{n-1}\cdots Y_1X_0$$
Thus,
$$\log(X_n) = \sum_{i=1}^n\log(Y_i) + \log(X_0)$$
Since $\log(Y_i)$, $i \ge 1$ are independent and identically distributed, $\{\log(X_n)\}$ will, when suitably normalized, approximately be Brownian motion with a drift, and so $\{X_n\}$ will be approximately geometric Brownian motion.

10.4. Pricing Stock Options

10.4.1. An Example in Options Pricing

In situations in which money is to be received or paid out in differing time periods, we must take into account the time value of money. That is, to be given the amount $v$ a time $t$ in the future is not worth as much as being given $v$ immediately. The reason for this is that if we were immediately given $v$, then it could be loaned out


with interest and so be worth more than $v$ at time $t$. To take this into account, we will suppose that the time 0 value, also called the present value, of the amount $v$ to be earned at time $t$ is $ve^{-\alpha t}$. The quantity $\alpha$ is often called the discount factor. In economic terms, the assumption of the discount function $e^{-\alpha t}$ is equivalent to the assumption that we can earn interest at a continuously compounded rate of $100\alpha$ percent per unit time.

We will now consider a simple model for pricing an option to purchase a stock at a future time at a fixed price.

Suppose the present price of a stock is \$100 per unit share, and suppose we know that after one time period it will be, in present value dollars, either \$200 or \$50 (see Figure 10.1).

[Figure 10.1: the two possible present-value prices of the stock at time 1.]

It should be noted that the prices at time 1 are the present value (or time 0) prices. That is, if the discount factor is $\alpha$, then the actual possible prices at time 1 are either $200e^\alpha$ or $50e^\alpha$. To keep the notation simple, we will suppose that all prices given are time 0 prices.

Suppose that for any $y$, at a cost of $cy$, you can purchase at time 0 the option to buy $y$ shares of the stock at time 1 at a (time 0) cost of \$150 per share. Thus, for instance, if you do purchase this option and the stock rises to \$200, then you would exercise the option at time 1 and realize a gain of \$200 $-$ 150 $=$ \$50 for each of the $y$ option units purchased. On the other hand, if the price at time 1 were \$50, then the option would be worthless at time 1. In addition, at a cost of $100x$ you can purchase $x$ units of the stock at time 0, and this will be worth either $200x$ or $50x$ at time 1.

We will suppose that both $x$ and $y$ can be either positive or negative (or zero). That is, you can either buy or sell both the stock and the option.
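The bookkeeping for this two-price market is easy to script. The sketch below (helper names are mine, not the text's) evaluates the time-1 value and the net gain of a holding of $x$ stock units and $y = -3x$ option units; as the analysis that follows shows, this choice of $y$ makes the time-1 value outcome-independent.

```python
def value_at_time1(x, y, price):
    """Time-1 value of x shares plus y options with strike 150."""
    return price * x + max(price - 150, 0) * y

def guaranteed_gain(x, c):
    """Gain of the hedged holding (x stock, y = -3x options), bought at
    time 0 for 100x + c*(-3x), under both possible time-1 prices."""
    y = -3 * x
    cost = 100 * x + c * y
    return {p: value_at_time1(x, y, p) - cost for p in (200, 50)}

print(guaranteed_gain(x=1, c=20))      # {200: 10, 50: 10}: riskless profit
print(guaranteed_gain(x=1, c=50 / 3))  # gains numerically 0 in both cases
```

At any option cost other than $50/3$ the gain is nonzero with the same sign in both outcomes, which is the arbitrage the text goes on to exhibit.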
For instance, if $x$ were negative then you would be selling $-x$ shares of the stock, yielding you a return of $-100x$, and you would then be responsible for buying $-x$ shares of the stock at time 1 at a cost of either \$200 or \$50 per share.

We are interested in determining the appropriate value of $c$, the unit cost of an option. Specifically, we will show that unless $c = 50/3$ there will be a combination of purchases that will always result in a positive gain.

To show this, suppose that at time 0 we

buy $x$ units of stock, and
buy $y$ units of options


where $x$ and $y$ (which can be either positive or negative) are to be determined. The value of our holding at time 1 depends on the price of the stock at that time, and is given by
$$\text{value} = \begin{cases} 200x + 50y, & \text{if the price is } 200\\ 50x, & \text{if the price is } 50\end{cases}$$
The preceding formula follows by noting that if the price is 200 then the $x$ units of the stock are worth $200x$, and the $y$ units of the option to buy the stock at a unit price of 150 are worth $(200 - 150)y$. On the other hand, if the stock price is 50, then the $x$ units are worth $50x$ and the $y$ units of the option are worthless. Now, suppose we choose $y$ so that the preceding value is the same no matter what the price at time 1. That is, we choose $y$ so that
$$200x + 50y = 50x$$
or
$$y = -3x$$
(Note that $y$ has the opposite sign of $x$, and so if $x$ is positive and as a result $x$ units of the stock are purchased at time 0, then $3x$ units of stock options are also sold at that time. Similarly, if $x$ is negative, then $-x$ units of stock are sold and $-3x$ units of stock options are purchased at time 0.)

Thus, with $y = -3x$, the value of our holding at time 1 is
$$\text{value} = 50x$$
Since the original cost of purchasing $x$ units of the stock and $-3x$ units of options is
$$\text{original cost} = 100x - 3xc,$$
we see that our gain on the transaction is
$$\text{gain} = 50x - (100x - 3xc) = x(3c - 50)$$
Thus, if $3c = 50$, then the gain is 0; on the other hand, if $3c \ne 50$, we can guarantee a positive gain (no matter what the price of the stock at time 1) by letting $x$ be positive when $3c > 50$ and letting it be negative when $3c < 50$.

For instance, if the unit cost per option is $c = 20$, then purchasing 1 unit of the stock ($x = 1$) and selling 3 units of the option ($y = -3$) costs $100 - 60 = 40$ initially, while the value of the holding at time 1 will be 50 whether


the stock goes up to 200 or down to 50. Thus, a guaranteed profit of 10 is attained. Similarly, if the unit cost per option is $c = 15$, then selling 1 unit of the stock ($x = -1$) and buying 3 units of the option ($y = 3$) leads to an initial gain of $100 - 45 = 55$. On the other hand, the value of this holding at time 1 is $-50$. Thus, a guaranteed profit of 5 is attained.

A sure-win betting scheme is called an arbitrage. Thus, as we have just seen, the only option cost $c$ that does not result in an arbitrage is $c = 50/3$.

10.4.2. The Arbitrage Theorem

Consider an experiment whose set of possible outcomes is $S = \{1, 2, \ldots, m\}$. Suppose that $n$ wagers are available. If the amount $x$ is bet on wager $i$, then the return $xr_i(j)$ is earned if the outcome of the experiment is $j$. In other words, $r_i(\cdot)$ is the return function for a unit bet on wager $i$. The amount bet on a wager is allowed to be either positive or negative or zero.

A betting scheme is a vector $\mathbf{x} = (x_1, \ldots, x_n)$ with the interpretation that $x_1$ is bet on wager 1, $x_2$ on wager 2, $\ldots$, and $x_n$ on wager $n$. If the outcome of the experiment is $j$, then the return from the betting scheme $\mathbf{x}$ is
$$\text{return from } \mathbf{x} = \sum_{i=1}^n x_ir_i(j)$$
The following theorem states that either there exists a probability vector $\mathbf{p} = (p_1, \ldots, p_m)$ on the set of possible outcomes of the experiment under which each of the wagers has expected return 0, or else there is a betting scheme that guarantees a positive win.

Theorem 10.1 (The Arbitrage Theorem) Exactly one of the following is true:
(i) there exists a probability vector $\mathbf{p} = (p_1, \ldots, p_m)$ for which
$$\sum_{j=1}^m p_jr_i(j) = 0, \quad \text{for all } i = 1, \ldots, n;$$
or
(ii) there exists a betting scheme $\mathbf{x} = (x_1, \ldots, x_n)$ for which
$$\sum_{i=1}^n x_ir_i(j) > 0, \quad \text{for all } j = 1, \ldots, m$$


In other words, if $X$ is the outcome of the experiment, then the arbitrage theorem states that either there is a probability vector $\mathbf{p}$ for $X$ such that
$$E_{\mathbf{p}}[r_i(X)] = 0, \quad \text{for all } i = 1, \ldots, n$$
or else there is a betting scheme that leads to a sure win.

Remark: This theorem is a consequence of the (linear algebra) theorem of the separating hyperplane, which is often used as a mechanism to prove the duality theorem of linear programming.

The theory of linear programming can be used to determine a betting strategy that guarantees the greatest return. Suppose that the absolute value of the amount bet on each wager must be less than or equal to 1. To determine the vector $\mathbf{x}$ that yields the greatest guaranteed win (call this win $v$) we need to choose $\mathbf{x}$ and $v$ so as to maximize $v$, subject to the constraints
$$\sum_{i=1}^n x_ir_i(j) \ge v, \quad \text{for } j = 1, \ldots, m,$$
$$-1 \le x_i \le 1, \quad i = 1, \ldots, n$$
This optimization problem is a linear program and can be solved by standard techniques (such as by using the simplex algorithm). The arbitrage theorem yields that the optimal $v$ will be positive unless there is a probability vector $\mathbf{p}$ for which $\sum_{j=1}^m p_jr_i(j) = 0$ for all $i = 1, \ldots, n$.

Example 10.2 In some situations, the only types of wagers allowed are to choose one of the outcomes $i$, $i = 1, \ldots, m$, and bet that $i$ is the outcome of the experiment. The return from such a bet is often quoted in terms of "odds." If the odds for outcome $i$ are $o_i$ (often written as "$o_i$ to 1") then a 1-unit bet will return $o_i$ if the outcome of the experiment is $i$ and will return $-1$ otherwise. That is,
$$r_i(j) = \begin{cases} o_i, & \text{if } j = i\\ -1, & \text{otherwise}\end{cases}$$
Suppose the odds $o_1, \ldots, o_m$ are posted. In order for there not to be a sure win there must be a probability vector $\mathbf{p} = (p_1, \ldots, p_m)$ such that
$$0 \equiv E_{\mathbf{p}}[r_i(X)] = o_ip_i - (1 - p_i)$$
That is, we must have
$$p_i = \frac{1}{1 + o_i}$$


Since the $p_i$ must sum to 1, this means that the condition for there not to be an arbitrage is that
$$\sum_{i=1}^m(1 + o_i)^{-1} = 1$$
Thus, if the posted odds are such that $\sum_i(1 + o_i)^{-1} \ne 1$, then a sure win is possible. For instance, suppose there are three possible outcomes and the odds are as follows:

Outcome    Odds
   1        1
   2        2
   3        3

That is, the odds for outcome 1 are 1 to 1, the odds for outcome 2 are 2 to 1, and those for outcome 3 are 3 to 1. Since
$$\tfrac12 + \tfrac13 + \tfrac14 > 1$$
a sure win is possible. One possibility is to bet $-1$ on outcome 1 (and so you either win 1 if the outcome is not 1 and lose 1 if the outcome is 1) and bet $-0.7$ on outcome 2, and $-0.5$ on outcome 3. If the experiment results in outcome 1, then we win $-1 + 0.7 + 0.5 = 0.2$; if it results in outcome 2, then we win $1 - 1.4 + 0.5 = 0.1$; if it results in outcome 3, then we win $1 + 0.7 - 1.5 = 0.2$. Hence, in all cases we win a positive amount.

Remark: If $\sum_i(1 + o_i)^{-1} \ne 1$, then the betting scheme
$$x_i = \frac{(1 + o_i)^{-1}}{1 - \sum_i(1 + o_i)^{-1}}, \quad i = 1, \ldots, n$$
will always yield a gain of exactly 1.

Example 10.3 Let us reconsider the option pricing example of the previous section, where the initial price of a stock is 100 and the present value of the price at time 1 is either 200 or 50. At a cost of $c$ per share we can purchase at time 0 the option to buy the stock at time 1 at a present value price of 150 per share. The problem is to set the value of $c$ so that no sure win is possible.

In the context of this section, the outcome of the experiment is the value of the stock at time 1. Thus, there are two possible outcomes. There are also two different wagers: to buy (or sell) the stock, and to buy (or sell) the option. By the arbitrage theorem, there will be no sure win if there is a probability vector $(p, 1 - p)$ that makes the expected return under both wagers equal to 0.
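The betting scheme in the preceding remark is easy to verify numerically before continuing with Example 10.3. The sketch below (helper names are mine, not the text's) applies it to the 1 to 1, 2 to 1, 3 to 1 odds of Example 10.2 and confirms a gain of (numerically) 1 on every outcome.

```python
def remark_scheme(odds):
    """x_i = (1+o_i)^{-1} / (1 - sum_i (1+o_i)^{-1}), from the remark."""
    inv = [1 / (1 + o) for o in odds]
    denom = 1 - sum(inv)
    return [v / denom for v in inv]

def outcome_returns(odds, bets):
    """Return sum_i x_i r_i(j) for each outcome j, where r_i(j) = o_i
    if j == i and -1 otherwise."""
    m = len(odds)
    return [sum(b * (odds[i] if i == j else -1) for i, b in enumerate(bets))
            for j in range(m)]

bets = remark_scheme([1, 2, 3])   # approximately [-6, -4, -3]
print(outcome_returns([1, 2, 3], bets))   # each entry is (numerically) 1
```

Note that the scheme bets against every outcome here, since the posted "probabilities" $(1+o_i)^{-1}$ sum to more than 1; if they summed to less than 1 the same formula would bet for every outcome.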


Now, the return from purchasing 1 unit of the stock is
$$\text{return} = \begin{cases} 200 - 100 = 100, & \text{if the price is 200 at time 1}\\ 50 - 100 = -50, & \text{if the price is 50 at time 1}\end{cases}$$
Hence, if $p$ is the probability that the price is 200 at time 1, then
$$E[\text{return}] = 100p - 50(1 - p)$$
Setting this equal to 0 yields
$$p = \tfrac13$$
That is, the only probability vector $(p, 1 - p)$ for which wager 1 yields an expected return of 0 is the vector $(\tfrac13, \tfrac23)$.

Now, the return from purchasing one share of the option is
$$\text{return} = \begin{cases} 50 - c, & \text{if the price is 200}\\ -c, & \text{if the price is 50}\end{cases}$$
Hence, the expected return when $p = \tfrac13$ is
$$E[\text{return}] = (50 - c)\tfrac13 - c\tfrac23 = \tfrac{50}{3} - c$$
Thus, it follows from the arbitrage theorem that the only value of $c$ for which there will not be a sure win is $c = \tfrac{50}{3}$, which verifies the result of Section 10.4.1.

10.4.3. The Black–Scholes Option Pricing Formula

Suppose the present price of a stock is $X(0) = x_0$, and let $X(t)$ denote its price at time $t$. Suppose we are interested in the stock over the time interval 0 to $T$. Assume that the discount factor is $\alpha$ (equivalently, the interest rate is $100\alpha$ percent compounded continuously), and so the present value of the stock price at time $t$ is $e^{-\alpha t}X(t)$.

We can regard the evolution of the price of the stock over time as our experiment, and thus the outcome of the experiment is the value of the function $X(t)$, $0 \le t \le T$. The types of wagers available are that for any $s < t$ we can observe the process for a time $s$ and then buy (or sell) shares of the stock at price $X(s)$, selling (or buying) these shares at time $t$ for the price $X(t)$. In addition, we will suppose that we may purchase any of $N$ different options at time 0. Option $i$, costing $c_i$ per share, gives us the option of purchasing shares of the stock at time $t_i$ for the fixed price of $K_i$ per share, $i = 1, \ldots, N$.

Since the arbitrage theorem is easily generalized (to handle the preceding situation, where the outcome of the experiment is a function), it follows that there will be no sure win if and only if there exists a probability measure over the set of outcomes under which all of the wagers have expected return 0. Let $P$ be a probability measure on the set of outcomes. Consider first the wager of observing the stock for a time $s$ and then purchasing (or selling) one share with the intention of selling (or purchasing) it at time $t$, $0 \le s < t \le T$. The present value of the amount paid for the stock is $e^{-\alpha s}X(s)$, whereas the present value of the amount received is $e^{-\alpha t}X(t)$. Hence, in order for the expected return of this wager to be 0 under $P$, we must have that
$$E_P[e^{-\alpha t}X(t) \mid X(u), 0 \le u \le s] = e^{-\alpha s}X(s) \tag{10.9}$$

Consider now the wager of purchasing an option. Suppose the option gives us the right to buy one share of the stock at time $t$ for a price $K$. At time $t$ the option will be worth $(X(t) - K)^+$, and so its present value is $e^{-\alpha t}(X(t) - K)^+$. If $c$ is the (time 0) cost of the option, then in order for purchasing the option to have expected (present value) return 0 we must have that
$$E_P[e^{-\alpha t}(X(t) - K)^+] = c \tag{10.10}$$

By the arbitrage theorem, if we can find a probability measure $P$ on the set of outcomes that satisfies Equation (10.9), then if $c$, the cost of an option to purchase one share at time $t$ at the fixed price $K$, is as given in Equation (10.10), no arbitrage is possible. On the other hand, if for given prices $c_i$, $i = 1, \ldots, N$, there is no probability measure $P$ satisfying both (10.9) and the equalities $c_i = E_P[e^{-\alpha t_i}(X(t_i) - K_i)^+]$, $i = 1, \ldots, N$, then a sure win is possible.

We will now present a probability measure $P$ on the outcome $X(t)$, $0 \le t \le T$, that satisfies Equation (10.9). Suppose that
$$X(t) = x_0e^{Y(t)}$$
where $\{Y(t), t \ge 0\}$ is a Brownian motion process with drift coefficient $\mu$ and variance parameter $\sigma^2$. That is, $\{X(t), t \ge 0\}$ is a geometric Brownian motion process (see Section 10.3.2). From Equation (10.8) we have that, for $s < t$,
$$E[X(t) \mid X(u), 0 \le u \le s] = X(s)e^{(t-s)(\mu + \sigma^2/2)}$$
Hence, if we choose $\mu$ and $\sigma^2$ so that
$$\mu + \sigma^2/2 = \alpha$$
then Equation (10.9) will be satisfied. It follows that if we then price the option by Equation (10.10), no arbitrage is possible. Since $Y(t)$ is normal with mean $\mu t$ and variance $t\sigma^2$, we see that
$$ce^{\alpha t} = E[(x_0e^{Y(t)} - K)^+] \tag{10.11}$$
A direct computation (integrating over the region $x_0e^{Y(t)} \ge K$) shows that, with
$$a = \frac{\log(K/x_0) - \mu t}{\sigma\sqrt t},$$
$$E[(x_0e^{N(m,v)} - K)^+] = x_0e^{m + v/2}\phi(\sqrt v - a) - K\phi(-a), \quad m = \mu t,\ v = t\sigma^2,$$


where $N(m, v)$ is a normal random variable with mean $m$ and variance $v$, and $\phi$ is the standard normal distribution function.

Thus, we see from Equation (10.11) that
$$ce^{\alpha t} = x_0e^{\mu t + \sigma^2t/2}\phi(\sigma\sqrt t - a) - K\phi(-a)$$
Using that
$$\mu + \sigma^2/2 = \alpha$$
and letting $b = -a$, we can write this as follows:
$$c = x_0\phi(\sigma\sqrt t + b) - Ke^{-\alpha t}\phi(b) \tag{10.12}$$
where
$$b = \frac{\alpha t - \sigma^2t/2 - \log(K/x_0)}{\sigma\sqrt t}$$
The option price formula given by Equation (10.12) depends on the initial price of the stock $x_0$, the option exercise time $t$, the option exercise price $K$, the discount (or interest rate) factor $\alpha$, and the value $\sigma^2$. Note that for any value of $\sigma^2$, if the options are priced according to the formula of Equation (10.12) then no arbitrage is possible. However, as many people believe that the price of a stock actually follows a geometric Brownian motion (that is, $X(t) = x_0e^{Y(t)}$ where $Y(t)$ is Brownian motion with parameters $\mu$ and $\sigma^2$), it has been suggested that it is natural to price the option according to the formula (10.12) with the parameter $\sigma^2$ taken equal to the estimated value (see the remark that follows) of the variance parameter under the assumption of a geometric Brownian motion model. When this is done, the formula (10.12) is known as the Black–Scholes option cost valuation. It is interesting that this valuation does not depend on the value of the drift parameter $\mu$ but only on the variance parameter $\sigma^2$.

If the option itself can be traded, then the formula of Equation (10.12) can be used to set its price in such a way that no arbitrage is possible. If at time $s$ the price of the stock is $X(s) = x_s$, then the price of a $(t, K)$ option, that is, an option to purchase one unit of the stock at time $t$ for a price $K$, should be set by replacing $t$ by $t - s$ and $x_0$ by $x_s$ in Equation (10.12).

Remark: If we observe a Brownian motion process with variance parameter $\sigma^2$ over any time interval, then we could theoretically obtain an arbitrarily precise


estimate of $\sigma^2$. For suppose we observe such a process $\{Y(s)\}$ for a time $t$. Then, for fixed $h$, let $N = [t/h]$ and set
$$W_1 = Y(h) - Y(0),\quad W_2 = Y(2h) - Y(h),\quad \ldots,\quad W_N = Y(Nh) - Y(Nh - h)$$
Then the random variables $W_1, \ldots, W_N$ are independent and identically distributed normal random variables having variance $h\sigma^2$. We now use the fact (see Section 3.6.4) that $(N - 1)S^2/(\sigma^2h)$ has a chi-squared distribution with $N - 1$ degrees of freedom, where $S^2$ is the sample variance defined by
$$S^2 = \sum_{i=1}^N(W_i - \bar W)^2/(N - 1)$$
Since the expected value and variance of a chi-squared with $k$ degrees of freedom are equal to $k$ and $2k$, respectively, we see that
$$E[(N - 1)S^2/(\sigma^2h)] = N - 1$$
and
$$\text{Var}[(N - 1)S^2/(\sigma^2h)] = 2(N - 1)$$
From this, we see that
$$E[S^2/h] = \sigma^2$$
and
$$\text{Var}[S^2/h] = 2\sigma^4/(N - 1)$$
Hence, as we let $h$ become smaller (and so $N = [t/h]$ becomes larger) the variance of the unbiased estimator of $\sigma^2$ becomes arbitrarily small.

Equation (10.12) is not the only way in which options can be priced so that no arbitrage is possible. Let $\{X(t), 0 \le t \le T\}$ be any stochastic process satisfying,


for $s < t$,
$$E[e^{-\alpha t}X(t) \mid X(u), 0 \le u \le s] = e^{-\alpha s}X(s) \tag{10.13}$$
By the argument of the preceding subsection, if the cost of a $(t, K)$ option is set at
$$c = E[e^{-\alpha t}(X(t) - K)^+] \tag{10.14}$$
then no arbitrage is possible.

For example, let $\{N(t), t \ge 0\}$ be a Poisson process with rate $\lambda$, let $Y_1, Y_2, \ldots$ be independent and identically distributed positive random variables that are independent of the Poisson process, and set
$$X(t) = x_0\prod_{i=1}^{N(t)}Y_i$$
Because, for $s < t$, $N(t) - N(s)$ is Poisson with mean $\lambda(t - s)$ and is independent of the history up to time $s$, we have, with $\mu = E[Y_1]$,
$$E[X(t) \mid X(u), 0 \le u \le s] = X(s)E\big[\mu^{N(t) - N(s)}\big] = X(s)e^{\lambda(t - s)(\mu - 1)}$$
Hence, if $\lambda$ and the distribution of the $Y_i$ are such that
$$\lambda(\mu - 1) = \alpha$$


then Equation (10.13) is satisfied. Therefore, if for any value of $\lambda$ we let the $Y_i$ have any distributions with a common mean equal to $\mu = 1 + \alpha/\lambda$ and then price the options according to Equation (10.14), then no arbitrage is possible.

Remark: If $\{X(t), t \ge 0\}$ satisfies Equation (10.13), then the process $\{e^{-\alpha t}X(t), t \ge 0\}$ is called a Martingale. Thus, any pricing of options for which the expected gain on the option is equal to 0 when $\{e^{-\alpha t}X(t)\}$ follows the probability law of some Martingale will result in no arbitrage possibilities. That is, if we choose any Martingale process $\{Z(t)\}$ and let the cost of a $(t, K)$ option be
$$c = E[e^{-\alpha t}(e^{\alpha t}Z(t) - K)^+] = E[(Z(t) - Ke^{-\alpha t})^+]$$
then there is no sure win.

In addition, while we did not consider the type of wager where a stock that is purchased at time $s$ is sold not at a fixed time $t$ but rather at some random time that depends on the movement of the stock, it can be shown using results about Martingales that the expected return of such wagers is also equal to 0.

Remark: A variation of the arbitrage theorem was first noted by de Finetti in 1937. A more general version of de Finetti's result, of which the arbitrage theorem is a special case, is given in Reference 3.

10.5. White Noise

Let $\{X(t), t \ge 0\}$ denote a standard Brownian motion process and let $f$ be a function having a continuous derivative in the region $[a, b]$. The stochastic integral $\int_a^bf(t)\,dX(t)$ is defined as follows:
$$\int_a^bf(t)\,dX(t) \equiv \lim_{\substack{n \to \infty\\ \max(t_i - t_{i-1}) \to 0}}\sum_{i=1}^nf(t_{i-1})[X(t_i) - X(t_{i-1})] \tag{10.15}$$
where $a = t_0 < t_1 < \cdots < t_n = b$ is a partition of the region $[a, b]$. Using the identity (the integration by parts formula applied to sums)
$$\sum_{i=1}^nf(t_{i-1})[X(t_i) - X(t_{i-1})] = f(b)X(b) - f(a)X(a) - \sum_{i=1}^nX(t_i)[f(t_i) - f(t_{i-1})]$$


we see that
$$\int_a^bf(t)\,dX(t) = f(b)X(b) - f(a)X(a) - \int_a^bX(t)\,df(t) \tag{10.16}$$
Equation (10.16) is usually taken as the definition of $\int_a^bf(t)\,dX(t)$.

By using the right side of Equation (10.16) we obtain, upon assuming the interchangeability of expectation and limit, that
$$E\left[\int_a^bf(t)\,dX(t)\right] = 0$$
Also,
$$\text{Var}\left(\sum_{i=1}^nf(t_{i-1})[X(t_i) - X(t_{i-1})]\right) = \sum_{i=1}^nf^2(t_{i-1})\,\text{Var}[X(t_i) - X(t_{i-1})] = \sum_{i=1}^nf^2(t_{i-1})(t_i - t_{i-1})$$
where the first equality follows from the independent increments of Brownian motion. Hence, we obtain from Equation (10.15), upon taking limits of the preceding, that
$$\text{Var}\left[\int_a^bf(t)\,dX(t)\right] = \int_a^bf^2(t)\,dt$$

Remark: The preceding gives operational meaning to the family of quantities $\{dX(t), 0 \le t < \infty\}$ by viewing it as an operator that carries functions $f$ into the values $\int_a^bf(t)\,dX(t)$; this family is called white noise.

An example of the use of white noise arises in modeling the velocity of a particle of unit mass suspended in a liquid. The particle is subject both to a frictional force proportional to its velocity and to an impulsive force arising from its collisions with the molecules of the liquid, the latter being modeled in terms


of white noise. That is, if $V(t)$ denotes the particle's velocity at $t$, suppose that
$$V'(t) = -\beta V(t) + \alpha X'(t)$$
where $\{X(t), t \ge 0\}$ is standard Brownian motion. This can be written as follows:
$$e^{\beta t}[V'(t) + \beta V(t)] = \alpha e^{\beta t}X'(t)$$
or
$$\frac{d}{dt}[e^{\beta t}V(t)] = \alpha e^{\beta t}X'(t)$$
Hence, upon integration, we obtain
$$e^{\beta t}V(t) = V(0) + \alpha\int_0^te^{\beta s}X'(s)\,ds$$
or
$$V(t) = V(0)e^{-\beta t} + \alpha\int_0^te^{-\beta(t - s)}\,dX(s)$$
Hence, from Equation (10.16),
$$V(t) = V(0)e^{-\beta t} + \alpha\left[X(t) - \int_0^tX(s)\beta e^{-\beta(t - s)}\,ds\right]$$

10.6. Gaussian Processes

We start with the following definition.

Definition 10.2 A stochastic process $X(t)$, $t \ge 0$ is called a Gaussian, or a normal, process if $X(t_1), \ldots, X(t_n)$ has a multivariate normal distribution for all $t_1, \ldots, t_n$.

If $\{X(t), t \ge 0\}$ is a Brownian motion process, then because each of $X(t_1), X(t_2), \ldots, X(t_n)$ can be expressed as a linear combination of the independent normal random variables $X(t_1), X(t_2) - X(t_1), X(t_3) - X(t_2), \ldots, X(t_n) - X(t_{n-1})$, it follows that Brownian motion is a Gaussian process.

Because a multivariate normal distribution is completely determined by the marginal mean values and the covariance values (see Section 2.6) it follows that


standard Brownian motion could also be defined as a Gaussian process having $E[X(t)] = 0$ and, for $s \le t$,
$$\text{Cov}(X(s), X(t)) = \text{Cov}(X(s), X(s) + X(t) - X(s))$$
$$= \text{Cov}(X(s), X(s)) + \text{Cov}(X(s), X(t) - X(s))$$
$$= \text{Cov}(X(s), X(s)) \quad \text{by independent increments}$$
$$= s \quad \text{since Var}(X(s)) = s \tag{10.17}$$

Let $\{X(t), t \ge 0\}$ be a standard Brownian motion process and consider the process values between 0 and 1 conditional on $X(1) = 0$. That is, consider the conditional stochastic process $\{X(t), 0 \le t \le 1 \mid X(1) = 0\}$. Since the conditional distribution of $X(t_1), \ldots, X(t_n)$ is multivariate normal it follows that this conditional process, known as the Brownian bridge (as it is tied down both at 0 and at 1), is a Gaussian process. Let us compute its covariance function. As, from Equation (10.4),
$$E[X(s) \mid X(1) = 0] = 0, \quad \text{for } s < 1$$
we have that, for $s < t < 1$,
$$\text{Cov}[(X(s), X(t)) \mid X(1) = 0] = E[X(s)X(t) \mid X(1) = 0]$$
$$= E\big[E[X(s)X(t) \mid X(t), X(1) = 0] \mid X(1) = 0\big]$$
$$= E[X(t)E[X(s) \mid X(t)] \mid X(1) = 0]$$
$$= E\Big[X(t)\frac{s}{t}X(t) \,\Big|\, X(1) = 0\Big] \quad \text{by (10.4)}$$
$$= \frac{s}{t}E[X^2(t) \mid X(1) = 0]$$
$$= \frac{s}{t}\,t(1 - t) \quad \text{again by (10.4)}$$
$$= s(1 - t)$$
Thus, the Brownian bridge can be defined as a Gaussian process with mean value 0 and covariance function $s(1 - t)$, $s \le t$. This leads to an alternative approach to obtaining such a process.

Proposition 10.1 If $\{X(t), t \ge 0\}$ is standard Brownian motion, then $\{Z(t), 0 \le t \le 1\}$ is a Brownian bridge process when $Z(t) = X(t) - tX(1)$.


Proof As it is immediate that $\{Z(t), t \ge 0\}$ is a Gaussian process, all we need verify is that $E[Z(t)] = 0$ and $\text{Cov}(Z(s), Z(t)) = s(1 - t)$, when $s \le t$. The former is immediate and the latter follows from
$$\text{Cov}(Z(s), Z(t)) = \text{Cov}(X(s) - sX(1), X(t) - tX(1))$$
$$= \text{Cov}(X(s), X(t)) - t\,\text{Cov}(X(s), X(1)) - s\,\text{Cov}(X(1), X(t)) + st\,\text{Cov}(X(1), X(1))$$
$$= s - st - st + st = s(1 - t)$$
and the proof is complete.

If $\{X(t), t \ge 0\}$ is Brownian motion, then the process $\{Z(t), t \ge 0\}$ defined by
$$Z(t) = \int_0^tX(s)\,ds \tag{10.18}$$
is called integrated Brownian motion. As an illustration of how such a process may arise in practice, suppose we are interested in modeling the price of a commodity throughout time. Letting $Z(t)$ denote the price at $t$ then, rather than assuming that $\{Z(t)\}$ is Brownian motion (or that $\log Z(t)$ is Brownian motion), we might want to assume that the rate of change of $Z(t)$ follows a Brownian motion. For instance, we might suppose that the rate of change of the commodity's price is the current inflation rate, which is imagined to vary as Brownian motion. Hence,
$$\frac{d}{dt}Z(t) = X(t), \qquad Z(t) = Z(0) + \int_0^tX(s)\,ds$$
It follows from the fact that Brownian motion is a Gaussian process that $\{Z(t), t \ge 0\}$ is also Gaussian. To prove this, first recall that $W_1, \ldots, W_n$ is said to have a multivariate normal distribution if they can be represented as
$$W_i = \sum_{j=1}^ma_{ij}U_j, \quad i = 1, \ldots, n$$
where $U_j$, $j = 1, \ldots, m$ are independent normal random variables. From this it follows that any set of partial sums of $W_1, \ldots, W_n$ are also jointly normal. The fact that $Z(t_1), \ldots, Z(t_n)$ is multivariate normal can now be shown by writing the integral in Equation (10.18) as a limit of approximating sums.
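Proposition 10.1's representation $Z(t) = X(t) - tX(1)$ also gives a direct way to simulate a Brownian bridge and to spot-check the covariance $s(1-t)$ by Monte Carlo. The sketch below is an illustration under grid and trial-count choices of my own, not part of the text.

```python
import math
import random

def bm_path(n, rng):
    """Standard Brownian motion on the grid k/n via iid normal increments."""
    x, path = 0.0, [0.0]
    for _ in range(n):
        x += rng.gauss(0.0, math.sqrt(1.0 / n))
        path.append(x)
    return path

def bridge_cov(s, t, n=100, trials=10000, seed=7):
    """Monte Carlo estimate of Cov(Z(s), Z(t)) for Z(t) = X(t) - t X(1)."""
    rng = random.Random(seed)
    i, j = round(s * n), round(t * n)
    total = 0.0
    for _ in range(trials):
        p = bm_path(n, rng)
        total += (p[i] - s * p[n]) * (p[j] - t * p[n])  # both factors have mean 0
    return total / trials

est = bridge_cov(0.3, 0.6)
# Theory: s(1 - t) = 0.3 * 0.4 = 0.12
```

The same path-plus-correction construction is how Brownian bridges are typically generated in practice, since it avoids conditioning a simulated path on hitting 0 at time 1.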


As $\{Z(t), t \ge 0\}$ is Gaussian it follows that its distribution is characterized by its mean value and covariance function. We now compute these when $\{X(t), t \ge 0\}$ is standard Brownian motion.
$$E[Z(t)] = E\left[\int_0^tX(s)\,ds\right] = \int_0^tE[X(s)]\,ds = 0$$
For $s \le t$,
$$\text{Cov}[Z(s), Z(t)] = E[Z(s)Z(t)]$$
$$= E\left[\int_0^tX(y)\,dy\int_0^sX(u)\,du\right]$$
$$= E\left[\int_0^s\int_0^tX(y)X(u)\,dy\,du\right]$$
$$= \int_0^s\int_0^tE[X(y)X(u)]\,dy\,du$$
$$= \int_0^s\int_0^t\min(y, u)\,dy\,du \quad \text{by (10.17)}$$
$$= \int_0^s\left(\int_0^uy\,dy + \int_u^tu\,dy\right)du = s^2\left(\frac{t}{2} - \frac{s}{6}\right)$$

10.7. Stationary and Weakly Stationary Processes

A stochastic process $\{X(t), t \ge 0\}$ is said to be a stationary process if for all $n, s, t_1, \ldots, t_n$ the random vectors $X(t_1), \ldots, X(t_n)$ and $X(t_1 + s), \ldots, X(t_n + s)$ have the same joint distribution. In other words, a process is stationary if, in choosing any fixed point $s$ as the origin, the ensuing process has the same probability law. Two examples of stationary processes are:

(i) An ergodic continuous-time Markov chain $\{X(t), t \ge 0\}$ when
$$P\{X(0) = j\} = P_j, \quad j \ge 0$$
where $\{P_j, j \ge 0\}$ are the limiting probabilities.
(ii) $\{X(t), t \ge 0\}$ when $X(t) = N(t + L) - N(t)$, $t \ge 0$, where $L > 0$ is a fixed constant and $\{N(t), t \ge 0\}$ is a Poisson process having rate $\lambda$.
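The double-integral covariance computed above for integrated Brownian motion can be spot-checked by simulation: approximate $Z(t) = \int_0^t X(u)\,du$ with Riemann sums over a fine grid and average $Z(s)Z(t)$ across runs. The sketch below uses grid and trial-count choices of my own.

```python
import math
import random

def integrated_bm_cov(s, t, n=200, trials=5000, seed=3):
    """Monte Carlo estimate of Cov(Z(s), Z(t)) for Z(t) = int_0^t X(u) du,
    using Riemann sums on an n-point grid over [0, t]."""
    rng = random.Random(seed)
    dt = t / n
    ks = round(s / dt)
    total = 0.0
    for _ in range(trials):
        x, zs, zt = 0.0, 0.0, 0.0
        for k in range(n):
            x += rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
            zt += x * dt                        # accumulate int_0^t X du
            if k < ks:
                zs += x * dt                    # accumulate int_0^s X du
        total += zs * zt                        # E[Z(s)] = E[Z(t)] = 0
    return total / trials

est = integrated_bm_cov(0.5, 1.0)
# Theory: s^2 (t/2 - s/6) = 0.25 * (0.5 - 0.5/6) which is about 0.104
```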


The first one of these processes is stationary for it is a Markov chain whose initial state is chosen according to the limiting probabilities, and it can thus be regarded as an ergodic Markov chain that we start observing at time $\infty$. Hence the continuation of this process at time $s$ after observation begins is just the continuation of the chain starting at time $\infty + s$, which clearly has the same probability law for all $s$. That the second example (where $X(t)$ represents the number of events of a Poisson process that occur between $t$ and $t + L$) is stationary follows from the stationary and independent increment assumption of the Poisson process, which implies that the continuation of a Poisson process at any time $s$ remains a Poisson process.

Example 10.5 (The Random Telegraph Signal Process) Let $\{N(t), t \ge 0\}$ denote a Poisson process, and let $X_0$ be independent of this process and be such that $P\{X_0 = 1\} = P\{X_0 = -1\} = \frac12$. Defining $X(t) = X_0(-1)^{N(t)}$, then $\{X(t), t \ge 0\}$ is called a random telegraph signal process. To see that it is stationary, note first that starting at any time $t$, no matter what the value of $N(t)$, as $X_0$ is equally likely to be either plus or minus 1, it follows that $X(t)$ is equally likely to be either plus or minus 1. Hence, because the continuation of a Poisson process beyond any time remains a Poisson process, it follows that $\{X(t), t \ge 0\}$ is a stationary process.

Let us compute the mean and covariance function of the random telegraph signal.
$$E[X(t)] = E[X_0(-1)^{N(t)}] = E[X_0]E[(-1)^{N(t)}] \quad \text{by independence}$$
$$= 0 \quad \text{since } E[X_0] = 0,$$
$$\text{Cov}[X(t), X(t + s)] = E[X(t)X(t + s)]$$
$$= E[X_0^2(-1)^{N(t) + N(t+s)}]$$
$$= E[(-1)^{2N(t)}(-1)^{N(t+s) - N(t)}]$$
$$= E[(-1)^{N(t+s) - N(t)}] = E[(-1)^{N(s)}]$$
$$= \sum_{i=0}^\infty(-1)^ie^{-\lambda s}\frac{(\lambda s)^i}{i!} = e^{-2\lambda s} \tag{10.19}$$

For an application of the random telegraph signal consider a particle moving at a constant unit velocity along a straight line and suppose that collisions involving this particle occur at a Poisson rate $\lambda$. Also suppose that each time the


particle suffers a collision it reverses direction. Therefore, if $X_0$ represents the initial velocity of the particle, then its velocity at time $t$, call it $X(t)$, is given by $X(t) = X_0(-1)^{N(t)}$, where $N(t)$ denotes the number of collisions involving the particle by time $t$. Hence, if $X_0$ is equally likely to be plus or minus 1, and is independent of $\{N(t), t \ge 0\}$, then $\{X(t), t \ge 0\}$ is a random telegraph signal process. If we now let
$$D(t) = \int_0^tX(s)\,ds$$
then $D(t)$ represents the displacement of the particle at time $t$ from its position at time 0. The mean and variance of $D(t)$ are obtained as follows:
$$E[D(t)] = \int_0^tE[X(s)]\,ds = 0,$$
$$\text{Var}[D(t)] = E[D^2(t)] = E\left[\int_0^tX(y)\,dy\int_0^tX(u)\,du\right]$$
$$= \int_0^t\int_0^tE[X(y)X(u)]\,dy\,du$$
$$= 2\iint\limits_{0 < y < u < t}E[X(y)X(u)]\,dy\,du$$
$$= 2\int_0^t\int_0^ue^{-2\lambda(u - y)}\,dy\,du \quad \text{by (10.19)}$$
$$= \frac{1}{\lambda}\left(t - \frac{1}{2\lambda} + \frac{1}{2\lambda}e^{-2\lambda t}\right)$$

The condition of stationarity is rather stringent, and so it is useful to single out processes possessing only its second-moment consequences. We say that a stochastic process $\{X(t), t \ge 0\}$ is a second-order stationary, or a weakly stationary, process if $E[X(t)] = c$ for all $t$ and $\text{Cov}[X(t), X(t + s)]$ does not depend on $t$. That is, a process is second-order stationary if its first two moments are the same for all $t$ and the covariance between $X(s)$ and $X(t)$ depends only on $|t - s|$. For a second-order stationary process, let
$$R(s) = \text{Cov}[X(t), X(t + s)]$$
Every stationary process having finite second moments is second-order stationary, but the converse need not hold.


Example 10.6 (The Ornstein–Uhlenbeck Process) Let $\{X(t), t \ge 0\}$ be a standard Brownian motion process, and define, for $\alpha > 0$,
$$V(t) = e^{-\alpha t/2}X(e^{\alpha t})$$
The process $\{V(t), t \ge 0\}$ is called the Ornstein–Uhlenbeck process. It has been proposed as a model for describing the velocity of a particle immersed in a liquid or gas, and as such is useful in statistical mechanics. Let us compute its mean and covariance function:
$$E[V(t)] = 0,$$
$$\text{Cov}[V(t), V(t + s)] = e^{-\alpha t/2}e^{-\alpha(t + s)/2}\,\text{Cov}[X(e^{\alpha t}), X(e^{\alpha(t + s)})]$$
$$= e^{-\alpha t}e^{-\alpha s/2}e^{\alpha t} \quad \text{by Equation (10.17)}$$
$$= e^{-\alpha s/2}$$
Hence, $\{V(t), t \ge 0\}$ is weakly stationary and as it is clearly a Gaussian process (since Brownian motion is Gaussian) we can conclude that it is stationary. It is interesting to note that (with $\alpha = 4\lambda$) it has the same mean and covariance function as the random telegraph signal process, thus illustrating that two quite different processes can have the same second-order properties. (Of course, if two Gaussian processes have the same mean and covariance functions then they are identically distributed.)

As the following examples show, there are many types of second-order stationary processes that are not stationary.

Example 10.7 (An Autoregressive Process) Let $Z_0, Z_1, Z_2, \ldots$ be uncorrelated random variables with $E[Z_n] = 0$, $n \ge 0$, and
$$\text{Var}(Z_n) = \begin{cases}\sigma^2/(1 - \lambda^2), & n = 0\\ \sigma^2, & n \ge 1\end{cases}$$
where $\lambda^2 < 1$. Define
$$X_0 = Z_0, \qquad X_n = \lambda X_{n-1} + Z_n, \quad n \ge 1 \tag{10.20}$$
The process $\{X_n, n \ge 0\}$ is called a first-order autoregressive process. It says that the state at time $n$ (that is, $X_n$) is a constant multiple of the state at time $n - 1$ plus a random error term $Z_n$.
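The weak stationarity asserted for this autoregressive process can be illustrated numerically. The sketch below uses normal $Z_n$ as a simplifying assumption of mine (the text only requires uncorrelated errors) and estimates $\text{Cov}(X_n, X_{n+m})$, which should equal $\sigma^2\lambda^m/(1 - \lambda^2)$ regardless of $n$.

```python
import math
import random

def ar1_path(lam, sigma, length, rng):
    """X_0 = Z_0 with Var sigma^2/(1 - lam^2); X_n = lam*X_{n-1} + Z_n."""
    x = rng.gauss(0.0, sigma / math.sqrt(1 - lam * lam))
    path = [x]
    for _ in range(length):
        x = lam * x + rng.gauss(0.0, sigma)
        path.append(x)
    return path

def ar1_cov(lam=0.5, sigma=1.0, n=50, m=2, trials=20000, seed=11):
    """Monte Carlo estimate of Cov(X_n, X_{n+m}); all means are 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        p = ar1_path(lam, sigma, n + m, rng)
        total += p[n] * p[n + m]
    return total / trials

est = ar1_cov()
# Theory: sigma^2 * lam^m / (1 - lam^2) = 0.25 / 0.75 = 1/3
```

The special variance given to $Z_0$ is what makes the covariance free of $n$; starting instead from $X_0 = 0$ would give a process that is only asymptotically stationary.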


10.7. Stationary and Weakly Stationary Processes 653Iterating Equation (10.20) yieldsX n = λ(λX n−2 + Z n−1 ) + Z n= λ 2 X n−2 + λZ n−1 + Z n.=n∑λ n−i Z ii=0and so( n∑)n+m∑Cov(X n ,X n+m ) = Cov λ n−i Z i , λ n+m−i Z ii=0 i=0n∑= λ n−i λ n+m−i Cov(Z i ,Z i )i=0()= σ 2 λ 2n+m 1n∑1 − λ 2 + λ −2i= σ 2 λ m1 − λ 2i=1where the preceding uses the fact that Z i and Z j are uncorrelated when i ≠ j.As E[X n ]=0, we see that {X n ,n 0} is weakly stationary (the definition fora discrete time process is the obvious analog of that given for continuous timeprocesses). Example 10.8 If, in the random telegraph signal process, we drop the requirementthat P {X 0 = 1}=P {X 0 =−1}= 1 2 and only require that E[X 0]=0,then the process {X(t), t 0} need no longer be stationary. (It will remain stationaryif X 0 has a symmetric distribution in the sense that −X 0 has the samedistribution as X 0 .) However, the process will be weakly stationary sinceE[X(t)]=E[X 0 ]E [ (−1) N(t)] = 0,Cov[X(t),X(t + s)]=E[X(t)X(t + s)]= E [ X02 ] [E (−1)N(t)+N(t+s) ]= E [ X02 ]e−2λsfrom (10.19)
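A short Python sketch (not part of the text) of the first-order autoregressive process above. For concreteness the Z_n are taken to be normal, which the example does not require; note that Var(Z_0) = σ²/(1 − λ²) is what makes the process weakly stationary from time 0.

```python
import random

def ar1_path(lam, sigma, n, rng=random):
    """Simulate X_0, ..., X_n of the first-order autoregressive process
    X_n = lam * X_{n-1} + Z_n, where Var(Z_0) = sigma^2 / (1 - lam^2)
    and Var(Z_n) = sigma^2 for n >= 1."""
    x = rng.gauss(0.0, sigma / (1.0 - lam ** 2) ** 0.5)  # X_0 = Z_0
    path = [x]
    for _ in range(n):
        x = lam * x + rng.gauss(0.0, sigma)              # X_n = lam X_{n-1} + Z_n
        path.append(x)
    return path

def ar1_cov(lam, sigma, m):
    """Theoretical Cov(X_n, X_{n+m}) = sigma^2 * lam^m / (1 - lam^2),
    as derived above; it depends on m only, hence weak stationarity."""
    return sigma ** 2 * lam ** m / (1.0 - lam ** 2)
```

The covariance function depends only on the lag m, in agreement with the derivation above.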


654 10 Brownian Motion and Stationary Processes

Example 10.9 Let W_0, W_1, W_2, ... be uncorrelated with E[W_n] = μ and Var(W_n) = σ², n ≥ 0, and for some positive integer k define

X_n = (W_n + W_{n−1} + ··· + W_{n−k})/(k + 1), n ≥ k

The process {X_n, n ≥ k}, which at each time keeps track of the arithmetic average of the most recent k + 1 values of the W's, is called a moving average process. Using the fact that the W_n, n ≥ 0 are uncorrelated, we see that

Cov(X_n, X_{n+m}) = (k + 1 − m)σ²/(k + 1)², if 0 ≤ m ≤ k
Cov(X_n, X_{n+m}) = 0, if m > k

Hence, {X_n, n ≥ k} is a second-order stationary process.

Let {X_n, n ≥ 1} be a second-order stationary process with E[X_n] = μ. An important question is when, if ever, does X̄_n ≡ Σ_{i=1}^n X_i/n converge to μ? The following proposition, which we state without proof, shows that E[(X̄_n − μ)²] → 0 if and only if Σ_{i=1}^n R(i)/n → 0. That is, the expected square of the difference between X̄_n and μ will converge to 0 if and only if the limiting average value of R(i) converges to 0.

Proposition 10.2 Let {X_n, n ≥ 1} be a second-order stationary process having mean μ and covariance function R(i) = Cov(X_n, X_{n+i}), and let X̄_n ≡ Σ_{i=1}^n X_i/n. Then lim_{n→∞} E[(X̄_n − μ)²] = 0 if and only if lim_{n→∞} Σ_{i=1}^n R(i)/n = 0.


10.8. Harmonic Analysis of Weakly Stationary Processes 655function. If h(s) = 0 whenever s


656 10 Brownian Motion and Stationary ProcessesThe preceding expression for R Y (t 2 − t 1 ) = Cov[Y(t 1 ), Y (t 2 )] is, however,more compactly and usefully expressed in terms of Fourier transforms of R X andR Y . Let, for i = √ −1,∫˜R X (w) = e −iws R X (s) dsand∫˜R Y (w) =e −iws R Y (s) dsdenote the Fourier transforms, respectively, of R X and R Y . The function ˜R X (w)is also called the power spectral density of the process {X(t)}. Also, let∫˜h(w) = e −iws h(s) dsdenote the Fourier transform of the function h. Then, from Equation (10.22),∫∫∫˜R Y (w) = e iws R X (s − s 2 + s 1 )h(s 1 )h(s 2 )ds 1 ds 2 ds∫∫∫= e iw(s−s 2+s 1 ) R X (s − s 2 + s 1 )dse −iws 2h(s 2 )ds 2 e iws 1h(s 1 )ds 1= ˜R X (w)˜h(w)˜h(−w) (10.23)Now, using the representationwe obtain[∫˜h(w)˜h(−w) =e ix = cos x + i sin x,e −ix = cos(−x) + i sin(−x) = cos x − i sin x[∫=[∫×∫h(s) cos(ws) ds − i∫h(s) cos(ws) ds + i2 [∫h(s) cos(ws) ds]+∫=∣ h(s)e −iws 2ds∣ =|˜h(w)| 2]h(s) sin(ws) ds]h(s) sin(ws) ds] 2h(s) sin(ws) ds


Exercises 657Hence, from Equation (10.23) we obtain˜R Y (w) = ˜R X (w)|˜h(w)| 2In words, the Fourier transform of the covariance function of the output processis equal to the square of the amplitude of the Fourier transform of the impulsefunction multiplied by the Fourier transform of the covariance function of theinput process.ExercisesIn the following exercises {B(t), t 0} is a standard Brownian motion processand T a denotes the time it takes this process to hit a.*1. What is the distribution of B(s) + B(t), s t?2. Compute the conditional distribution of B(s) given that B(t 1 ) = A, B(t 2 ) =B, where 0


658 10 Brownian Motion and Stationary Processes(b) Using part (a) and the results of the gambler’s ruin problem (Section 4.5.1),compute the probability that a Brownian motion process with drift rate μ goesup A before going down B, A>0, B>0.9. Let {X(t), t 0} be a Brownian motion with drift coefficient μ and varianceparameter σ 2 . What is the joint density function of X(s) and X(t), s


Exercises 659Assume that the continuously compounded interest rate is 5 percent.A stochastic process {Y(t), t 0} is said to be a Martingale process if, fors


660 10 Brownian Motion and Stationary Processes21. Let {X(t), t 0} be Brownian motion with drift coefficient μ and varianceparameter σ 2 . That is,X(t) = σB(t)+ μtLet μ>0, and for a positive constant x letT = Min{t : X(t) = x}{= Min t : B(t) = x − μt }σThat is, T is the first time the process {X(t), t 0} hits x. Use the Martingalestopping theorem to show thatE[T ]=x/μ22. Let X(t) = σB(t) + μt, and for given positive constants A and B, letpdenote the probability that {X(t), t 0} hits A before it hits −B.(a) Define the stopping time T to be the first time the process hits either A or−B. Use this stopping time and the Martingale defined in Exercise 19 to showthat(b) Let c =−2μ/σ , and show thatE[exp{c(X(T ) − μT )/σ − c 2 T/2}] = 1E[exp{−2μX(T )/σ }] = 1(c) Use part (b) and the definition of T to find p.Hint: What are the possible values of exp{−2μX(T )/σ 2 }?23. Let X(t) = σB(t) + μt, and define T to be the first time the process{X(t), t 0} hits either A or −B, where A and B are given positive numbers.Use the Martingale stopping theorem and part (c) of Exercise 22 to find E[T ].*24. Let {X(t), t 0} be Brownian motion with drift coefficient μ and varianceparameter σ 2 . Suppose that μ>0. Let x>0 and define the stopping time T (as inExercise 21) byT = Min{t : X(t) = x}Use the Martingale defined in Exercise 18, along with the result of Exercise 21,to show thatVar(T ) = xσ 2 /μ 3


Exercises 661

25. Compute the mean and variance of
(a) ∫_0^1 t dB(t).
(b) ∫_0^1 t² dB(t).

26. Let Y(t) = tB(1/t), t > 0 and Y(0) = 0.
(a) What is the distribution of Y(t)?
(b) Compute Cov(Y(s), Y(t)).
(c) Argue that {Y(t), t ≥ 0} is a standard Brownian motion process.

*27. Let Y(t) = B(a²t)/a for a > 0. Argue that {Y(t)} is a standard Brownian motion process.

28. For s


662 10 Brownian Motion and Stationary Processes34. Let {X(t), −∞


Simulation
11

11.1. Introduction

Let X = (X_1, ..., X_n) denote a random vector having a given density function f(x_1, ..., x_n) and suppose we are interested in computing

E[g(X)] = ∫∫···∫ g(x_1, ..., x_n) f(x_1, ..., x_n) dx_1 dx_2 ··· dx_n

for some n-dimensional function g. For instance, g could represent the total delay in queue of the first [n/2] customers when the X values represent the first [n/2] interarrival and service times.∗ In many situations, it is not analytically possible either to compute the preceding multiple integral exactly or even to numerically approximate it within a given accuracy. One possibility that remains is to approximate E[g(X)] by means of simulation.

To approximate E[g(X)], start by generating a random vector X^(1) = (X^(1)_1, ..., X^(1)_n) having the joint density f(x_1, ..., x_n) and then compute Y^(1) = g(X^(1)). Now generate a second random vector (independent of the first) X^(2) and compute Y^(2) = g(X^(2)). Keep on doing this until r, a fixed number, of independent and identically distributed random variables Y^(i) = g(X^(i)), i = 1, ..., r have been generated. Now by the strong law of large numbers, we know that

lim_{r→∞} (Y^(1) + ··· + Y^(r))/r = E[Y^(i)] = E[g(X)]

and so we can use the average of the generated Y's as an estimate of E[g(X)]. This approach to estimating E[g(X)] is called the Monte Carlo simulation approach.

Clearly there remains the problem of how to generate, or simulate, random vectors having a specified joint distribution. The first step in doing this is to be

∗ We are using the notation [a] to represent the largest integer less than or equal to a.

663


664 11 Simulation

able to generate random variables from a uniform distribution on (0, 1). One way to do this would be to take 10 identical slips of paper, numbered 0, 1, ..., 9, place them in a hat and then successively select n slips, with replacement, from the hat. The sequence of digits obtained (with a decimal point in front) can be regarded as the value of a uniform (0, 1) random variable rounded off to the nearest (1/10)^n. For instance, if the sequence of digits selected is 3, 8, 7, 2, 1, then the value of the uniform (0, 1) random variable is 0.38721 (to the nearest 0.00001). Tables of the values of uniform (0, 1) random variables, known as random number tables, have been extensively published [for instance, see The RAND Corporation, A Million Random Digits with 100,000 Normal Deviates (New York: The Free Press, 1955)]. Table 11.1 is such a table.

However, this is not the way in which digital computers simulate uniform (0, 1) random variables. In practice, they use pseudo random numbers instead of truly random ones. Most random number generators start with an initial value X_0, called the seed, and then recursively compute values by specifying positive integers a, c, and m, and then letting

X_{n+1} = (a X_n + c) modulo m, n ≥ 0

where the preceding means that a X_n + c is divided by m and the remainder is taken as the value of X_{n+1}. Thus each X_n is one of 0, 1, ..., m − 1 and the quantity X_n/m is taken as an approximation to a uniform (0, 1) random variable. It can be shown that, subject to suitable choices for a, c, m, the preceding gives rise to a sequence of numbers that looks as if it were generated from independent uniform (0, 1) random variables.

As our starting point in the simulation of random variables from an arbitrary distribution, we shall suppose that we can simulate from the uniform (0, 1) distribution, and we shall use the term “random numbers” to mean independent random variables from this distribution.
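The congruential recursion X_{n+1} = (aX_n + c) mod m can be sketched in a few lines of Python. The particular constants a, c, m below are illustrative (a well-known 32-bit choice), not values given in the text:

```python
def lcg(seed, a=1664525, c=1013904223, m=2 ** 32):
    """Linear congruential generator: X_{n+1} = (a*X_n + c) mod m,
    yielding U_n = X_n / m as approximations to uniform (0,1) values.
    The default a, c, m are an illustrative textbook-quality choice."""
    x = seed
    while True:
        x = (a * x + c) % m   # remainder of a*X_n + c divided by m
        yield x / m
```

For example, `next(lcg(12345))` produces the first pseudo-random number from seed 12345; the same seed always reproduces the same sequence.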
In Sections 11.2 and 11.3 we present both generaland special techniques for simulating continuous random variables; and inSection 11.4 we do the same for discrete random variables. In Section 11.5 wediscuss the simulation both of jointly distributed random variables and stochasticprocesses. Particular attention is given to the simulation of nonhomogeneousPoisson processes, and in fact three different approaches for this are discussed.Simulation of two-dimensional Poisson processes is discussed in Section 11.5.2.In Section 11.6 we discuss various methods for increasing the precision of thesimulation estimates by reducing their variance; and in Section 11.7 we considerthe problem of choosing the number of simulation runs needed to attain a desiredlevel of precision. Before beginning this program, however, let us consider twoapplications of simulation to combinatorial problems.Example 11.1 (Generating a Random Permutation) Suppose we are interestedin generating a permutation of the numbers 1, 2,...,n that is such that all


11.1. Introduction 665Table 11.1A Random Number Table04839 96423 24878 82651 66566 14778 76797 14780 13300 8707468086 26432 46901 20848 89768 81536 86645 12659 92259 5710239064 66432 84673 40027 32832 61362 98947 96067 64760 6458425669 26422 44407 44048 37937 63904 45766 66134 75470 6652064117 94305 26766 25940 39972 22209 71500 64568 91402 4241687917 77341 42206 35126 74087 99547 81817 42607 43808 7665562797 56170 86324 88072 76222 36086 84637 93161 76038 6585595876 55293 18988 27354 26575 08625 40801 59920 29841 8015029888 88604 67917 48708 18912 82271 65424 69774 33611 5426273577 12908 30883 18317 28290 35797 05998 41688 34952 3788827958 30134 04024 86385 29880 99730 55536 84855 29080 0925090999 49127 20044 59931 06115 20542 18059 02008 73708 8351718845 49618 02304 51038 20655 58727 28168 15475 56942 5338994824 78171 84610 82834 09922 25417 44137 48413 25555 2124635605 81263 39667 47358 56873 56307 61607 49518 89356 2010333362 64270 01638 92477 66969 98420 04880 45585 46565 0410288720 82765 34476 17032 87589 40836 32427 70002 70663 8886339475 46473 23219 53416 94970 25832 69975 94884 19661 7282806990 67245 68350 82948 11398 42878 80287 88267 47363 4663440980 07391 58745 25774 22987 80059 39911 96189 41151 1422283974 29992 65381 38857 50490 83765 55657 14361 31720 5737533339 31926 14883 24413 59744 92351 97473 89286 35931 0411031662 25388 61642 34072 81249 35648 56891 69352 48373 4557893526 70765 10592 04542 76463 54328 02349 17247 28865 1477720492 38391 91132 21999 59516 81652 27195 48223 46751 2292304153 53381 79401 21438 83035 92350 36693 31238 59649 9175405520 91962 04739 13092 97662 24822 94730 06496 35090 0482247498 87637 99016 71060 88824 71013 18735 20286 23153 7292423167 49323 45021 33132 12544 41035 80780 45393 44812 1251523792 14422 15059 45799 22716 19792 09983 74353 68668 3042985900 98275 32388 52390 16815 69298 82732 38480 73817 3252342559 78985 05300 22164 24369 54224 35083 19687 11062 9149114349 82674 66523 44133 00697 35552 35970 19124 63318 
2968617403 53363 44167 64486 64758 75366 76554 31601 12614 3307223632 27889 47914 02584 37680 20801 72152 39339 34806 08930n! possible orderings are equally likely. The following algorithm will accomplishthis by first choosing one of the numbers 1,...,nat random and then putting thatnumber in position n; it then chooses at random one of the remaining n − 1 numbersand puts that number in position n − 1; it then chooses at random one of theremaining n − 2 numbers and puts it in position n − 2, and so on (where choosinga number at random means that each of the remaining numbers is equally likely


666 11 Simulationto be chosen). However, so that we do not have to consider exactly which of thenumbers remain to be positioned, it is convenient and efficient to keep the numbersin an ordered list and then randomly choose the position of the number ratherthan the number itself. That is, starting with any initial ordering p 1 ,p 2 ,...,p n ,we pick one of the positions 1,...,n at random and then interchange the numberin that position with the one in position n. Now we randomly choose one of thepositions 1,...,n− 1 and interchange the number in this position with the one inposition n − 1, and so on.To implement the preceding, we need to be able to generate a random variablethat is equally likely to take on any of the values 1, 2,...,k. To accomplish this,let U denote a random number—that is, U is uniformly distributed over (0, 1)—and note that kU is uniform on (0,k)and soP {i − 1
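The position-swapping scheme of Example 11.1 can be sketched as follows (a minimal Python rendering, assuming `int(k*U)` plays the role of the uniform position choice):

```python
import random

def random_permutation(n, rng=random):
    """Generate an equally likely permutation of 1, ..., n by the scheme
    above: pick a uniformly random position among the first k and swap
    it with position k, for k = n, n-1, ..., 2."""
    p = list(range(1, n + 1))
    for k in range(n, 1, -1):
        # int(k*U) is equally likely to be any of 0, ..., k-1, so it
        # selects one of the first k positions uniformly at random
        i = int(k * rng.random())
        p[i], p[k - 1] = p[k - 1], p[i]
    return p
```

Each of the n! orderings is produced with probability 1/n!, and only n − 1 random numbers are used.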


11.1. Introduction 667in estimating d, the number of distinct elements in the list. If we let m i denotethenumber of times that the element in position i appears on the list, then we canexpress d byn∑ 1d =m ii=1To estimate d, suppose that we generate a random value X equally likely to beeither 1, 2,...,n (that is, we take X =[nU]+1) and then let m(X) denote thenumber of times the element in position X appears on the list. ThenE[ ] 1=m(X)n∑i=11 1m i n = d nHence, if we generate k such random variables X 1 ,...,X k we can estimate d byd ≈ n ∑ ki=1 1/m(X i )kSuppose now that each item in the list has a value attached to it—v(i) being thevalue of the ith element. The sum of the values of the distinct items—call it v—can be expressed asv =n∑i=1v(i)m(i)Now if X =[nU]+1, where U is a random number, then[ ] v(X)n∑ v(i) 1E =m(X) m(i) n = v nHence, we can estimate v by generating X 1 ,...,X k and then estimating v byv ≈ n ki=1k∑i=1v(X i )m(X i )For an important application of the preceding, let A i ={a i,1 ,...,a i,ni }, i =1,...,s denote events, and suppose we are interested in estimating P( ⋃ si=1 A i ).Since( s⋃)P A i = ∑P(a)=i=1 a∈∪A is∑ ∑n ii=1 j=1P(a i,j )m(a i,j )where m(a i,j ) is the number of events to which the point a i,j belongs, the precedingmethod can be used to estimate P( ⋃ s1 A i ).
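The estimator of the number of distinct elements d described above can be sketched in Python (a hypothetical helper, using a linear scan for the multiplicity m(X); the point is the identity E[1/m(X)] = d/n):

```python
import random

def estimate_distinct(items, k, rng=random):
    """Estimate d = sum_i 1/m_i, the number of distinct elements of
    `items`, by sampling k uniformly random positions: since
    E[1/m(X)] = d/n, the estimate is n * average of 1/m(X_j)."""
    n = len(items)
    total = 0.0
    for _ in range(k):
        x = items[int(n * rng.random())]   # element in position [nU] + 1
        total += 1.0 / items.count(x)      # 1/m(X): inverse multiplicity
    return n * total / k
```

When every element is distinct the estimate is exactly n, and when each element appears m times each term contributes 1/m, so the average recovers d.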


668 11 SimulationNote that the preceding procedure for estimating v can be effected withoutprior knowledge of the set of values {v 1 ,...,v n }. That is, it suffices that we candetermine the value of an element in a specific place and the number of times thatelement appears on the list. When the set of values is aprioriknown, there is amore efficient approach available as will be shown in Example 11.11. 11.2. General Techniques for SimulatingContinuous Random VariablesIn this section we present three methods for simulating continuous random variables.11.2.1. The Inverse Transformation MethodA general method for simulating a random variable having a continuous distribution—calledthe inverse transformation method—is based on the following proposition.Proposition 11.1 Let U be a uniform (0, 1) random variable. For any continuousdistribution function F if we define the random variable X byX = F −1 (U)then the random variable X has distribution function F .[F −1 (u) is defined toequal that value x for which F(x)= u.]ProofF X (a) = P {X a}= P {F −1 (U) a} (11.1)Now, since F(x) is a monotone function, it follows that F −1 (U) a if and onlyif U F(a). Hence, from Equation (11.1), we see thatF X (a) = P {U F(a)}= F(a) Hence we can simulate a random variable X from the continuous distributionF , when F −1 is computable, by simulating a random number U and thensetting X = F −1 (U).


11.2. General Techniques for Simulating Continuous Random Variables 669Example 11.3 (Simulating an Exponential Random Variable) If F(x)= 1 −e −x , then F −1 (u) is that value of x such that1 − e −x = uorx =−log(1 − u)Hence, if U is a uniform (0, 1) variable, thenF −1 (U) =−log(1 − U)is exponentially distributed with mean 1. Since 1−U is also uniformly distributedon (0, 1) it follows that − log U is exponential with mean 1. Since cX is exponentialwith mean c when X is exponential with mean 1, it follows that −c log Uis exponential with mean c. 11.2.2. The Rejection MethodSuppose that we have a method for simulating a random variable having densityfunction g(x). We can use this as the basis for simulating from the continuousdistribution having density f(x)by simulating Y from g and then accepting thissimulated value with a probability proportional to f(Y)/g(Y).Specifically let c be a constant such thatf(y)g(y) cfor all yWe then have the following technique for simulating a random variable havingdensity f .Rejection MethodStep 1: Simulate Y having density g and simulate a random number U.Step 2: If U f(Y)/cg(Y)set X = Y . Otherwise return to Step 1.Proposition 11.2 The random variable X generated by the rejection methodhas density function f .
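Example 11.3 translates directly into code. A minimal Python sketch of the inverse transformation method for the exponential (using −log(1 − U), which avoids the measure-zero possibility that a library uniform returns exactly 0):

```python
import math
import random

def exp_inverse_cdf(u):
    """F^{-1}(u) = -log(1 - u) for F(x) = 1 - e^{-x}, the mean-1
    exponential distribution function."""
    return -math.log(1.0 - u)

def exponential(mean, rng=random):
    """Inverse transform: since 1 - U is also uniform on (0,1),
    -mean * log(1 - U) is exponential with the given mean."""
    return -mean * math.log(1.0 - rng.random())
```

Applying F to F^{-1}(u) returns u, which is exactly the content of Proposition 11.1.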


670 11 SimulationProof Let X be the value obtained, and let N denote the number of necessaryiterations. ThenP {X x}=P {Y N x}= P {Y x|U f(Y)/cg(Y)}P {Y x,U f(Y)/cg(Y)}=K∫P {Y x,U f(Y)/cg(Y)|Y = y}g(y)dy=K∫ x−∞(f (y)/cg(y))g(y) dy=K∫ x−∞=f(y)dyKcwhere K = P {U f(Y)/cg(Y)}. Letting x →∞shows that K = 1/c and theproof is complete. Remarks (i) The preceding method was originally presented by Von Neumannin the special case where g was positive only in some finite interval (a, b), and Ywas chosen to be uniform over (a, b) [that is, Y = a + (b − a)U].(ii) Note that the way in which we “accept the value Y with probabilityf(Y)/cg(Y)” is by generating a uniform (0, 1) random variable U and then acceptingY if U f(Y)/cg(Y).(iii) Since each iteration of the method will, independently, result in an acceptedvalue with probability P {U f(Y)/cg(Y)}=1/c it follows that the number ofiterations is geometric with mean c.(iv) Actually, it is not necessary to generate a new uniform random numberwhen deciding whether or not to accept, since at a cost of some additional computation,a single random number, suitably modified at each iteration, can be usedthroughout. To see how, note that the actual value of U is not used—only whetheror not U f(Y)/cg(Y)—we can use the fact that, given Y ,U − f(Y)/cg(Y)1 − f(Y)/cg(Y)=cUg(Y) − f(Y)cg(Y ) − f(Y)is uniform on (0, 1). Hence, this may be used as a uniform random number inthe next iteration. As this saves the generation of a random number at the cost ofthe preceding computation, whether it is a net savings depends greatly upon themethod being used to generate random numbers.
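The two steps of the rejection method can be sketched generically in Python (a hypothetical helper taking the target density f, the proposal density g, a sampler for g, and the bounding constant c as arguments):

```python
import random

def rejection_sample(f, g, g_sampler, c, rng=random):
    """Rejection method: repeatedly simulate Y from density g and a
    random number U; accept X = Y when U <= f(Y) / (c * g(Y)).
    Requires f(y)/g(y) <= c for all y; the number of iterations is
    geometric with mean c."""
    while True:
        y = g_sampler(rng)                       # Step 1: Y from g
        if rng.random() <= f(y) / (c * g(y)):    # Step 2: accept test
            return y
```

For instance, for the density f(x) = 20x(1 − x)³ on (0, 1) with g uniform on (0, 1), one may take c = f(1/4) = 135/64, the maximum of f.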


11.2. General Techniques for Simulating Continuous Random Variables 671Example 11.4 Let us use the rejection method to generate a random variablehaving density functionf(x)= 20x(1 − x) 3 ,0


672 11 Simulationthat the absolute value of Z has density functionf(x)= 2 √2πe −x2 /2 , 0


11.2. General Techniques for Simulating Continuous Random Variables 673The random variables Z and Y generated by the preceding are independent withZ being normal with mean 0 and variance 1 and Y being exponential with rate 1.(If we want the normal random variable to have mean μ and variance σ 2 , just takeμ + σZ.) Remarks (i) Since c = √ 2e/π ≈ 1.32, the preceding requires a geometric distributednumber of iterations of step 2 with mean 1.32.(ii) The final random number of step 4 need not be separately simulated butrather can be obtained from the first digit of any random number used earlier.That is, suppose we generate a random number to simulate an exponential; thenwe can strip off the initial digit of this random number and just use the remainingdigits (with the decimal point moved one step to the right) as the randomnumber. If this initial digit is 0, 1, 2, 3, or 4 (or 0 if the computer is generatingbinary digits), then we take the sign of Z to be positive and take it to be negativeotherwise.(iii) If we are generating a sequence of standard normal random variables,then we can use the exponential obtained in step 4 as the initial exponentialneeded in step 1 for the next normal to be generated. Hence, on the average, wecan simulate a unit normal by generating 1.64 exponentials and computing 1.32squares.11.2.3. The Hazard Rate MethodLet F be a continuous distribution function with ¯F(0) = 1. Recall that λ(t), thehazard rate function of F , is defined byλ(t) = f(t)¯F(t) , t 0[where f(t)= F ′ (t) is the density function]. Recall also that λ(t) represents theinstantaneous probability intensity that an item having life distribution F will failat time t given it has survived to that time.Suppose now that we are given a bounded function λ(t), such that ∫ ∞0λ(t) dt =∞, and we desire to simulate a random variable S having λ(t) as its hazard ratefunction.To do so let λ be such thatλ(t) λ for all t 0To simulate from λ(t), t 0, we will


674 11 Simulation(a) simulate a Poisson process having rate λ. We will then only “accept” or“count” certain of these Poisson events. Specifically we will(b) count an event that occurs at time t, independently of all else, with probabilityλ(t)/λ.We now have the following proposition.Proposition 11.3 The time of the first counted event—call it S—is a randomvariable whose distribution has hazard rate function λ(t), t 0.ProofP {t


11.2. General Techniques for Simulating Continuous Random Variables 675that are observed in sequence up to some random time N then[ N]∑E X i = E[N]E[X]i=1More precisely let X 1 ,X 2 ,...denote a sequence of independent random variablesand consider the following definition.Definition 11.1 An integer-valued random variable N is said to be astopping time for the sequence X 1 ,X 2 ,... if the event {N = n} is independentof X n+1 ,X n+2 ,...for all n = 1, 2,....Intuitively, we observe the X n s in sequential order and N denotes the numberobserved before stopping. If N = n, then we have stopped after observingX 1 ,...,X n and before observing X n+1 ,X n+2 ,...for all n = 1, 2,....Example 11.6If we letLet X n ,n= 1, 2,...,be independent and such thatP {X n = 0}=P {X n = 1}= 1 2, n= 1, 2,...N = min{n: X 1 +···+X n = 10}then N is a stopping time. We may regard N as being the stopping time of anexperiment that successively flips a fair coin and then stops when the number ofheads reaches 10. Proposition 11.4 (Wald’s Equation) If X 1 ,X 2 ,... are independent andidentically distributed random variables having finite expectations, and if N isa stopping time for X 1 ,X 2 ,...such that E[N] < ∞, thenProofLetting[ N]∑E X n = E[N]E[X]1I n ={ 1, if N n0, if N


676 11 SimulationHence,[ N] [∑∞]∑∞∑E X n = E X n I n = E[X n I n ] (11.3)n=1n=1However, I n = 1 if and only if we have not stopped after successively observingX 1 ,...,X n−1 . Therefore, I n is determined by X 1 ,...,X n−1 and is thus independentof X n . From Equation (11.3) we thus obtainn=1n=1[ N]∑ ∞∑E X n = E[X n ]E[I n ]n=1= E[X]∞∑E[I n ]n=1[ ∞]∑= E[X]E I nn=1= E[X]E[N] Returning to the hazard rate method, we have thatS =N∑i=1As N = min{n: U n λ( ∑ n1 X i )/λ} it follows that the event that N = n is independentof X n+1 ,X n+2 ,.... Hence, by Wald’s equation,X iorE[S]=E[N]E[X i ]= E[N]λE[N]=λE[S]where E[S] is the mean of the desired random variable.
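The hazard rate method (a)–(b) amounts to thinning a rate-λ̄ Poisson process. A minimal Python sketch, assuming the caller supplies a bounded hazard function λ(t) and a bound λ̄ with λ(t) ≤ λ̄ for all t:

```python
import math
import random

def hazard_rate_sample(hazard, lam_bar, rng=random):
    """Simulate a random variable S whose hazard rate function is
    `hazard`: generate the events of a rate lam_bar Poisson process
    and count an event at time t with probability hazard(t)/lam_bar;
    S is the first counted event time.  Requires hazard(t) <= lam_bar."""
    t = 0.0
    while True:
        # interarrival times of a rate lam_bar Poisson process
        t += -math.log(1.0 - rng.random()) / lam_bar
        if rng.random() <= hazard(t) / lam_bar:
            return t
```

With a constant hazard λ(t) ≡ λ̄ every event is counted, and S reduces to an exponential with rate λ̄, consistent with E[N] = λ̄ E[S] from Wald's equation.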


11.3. Special Techniques for Simulating Continuous Random Variables 67711.3. Special Techniques for Simulating ContinuousRandom VariablesSpecial techniques have been devised to simulate from most of the common continuousdistributions. We now present certain of these.11.3.1. The Normal DistributionLet X and Y denote independent standard normal random variables and thus havethe joint density functionf(x,y) = 12π e−(x2 +y 2 )/2 ,−∞


678 11 SimulationThe Jacobian of this transformation is∂d ∂dJ =∂x ∂y2x( )∂θ ∂θ=1 −y∣ ∣ ∣∂x ∂y1 + y 2 /x 2 x 2 ∣ x y ∣∣∣∣∣∣ = 2∣−y x = 2x 2 + y 2 x 2 + y 22y( )1 1 1 + y 2 /x 2 x ∣Hence, from Section 2.5.3 the joint density of R 2 and is given by1f R 2 ,(d, θ) = 1 2π e−d/2 2= 1 12 e−d/2 2π ,0


11.3. Special Techniques for Simulating Continuous Random Variables 679Figure 11.2.Thus, if we generate random numbers U 1 and U 2 and setV 1 = 2U 1 − 1,V 2 = 2U 2 − 1then (V 1 ,V 2 ) is uniformly distributed in the square of area 4 centered at (0, 0) (seeFigure 11.2).Suppose now that we continually generate such pairs (V 1 ,V 2 ) until we obtainone that is contained in the circle of radius 1 centered at (0, 0)—that is, until(V 1 ,V 2 ) is such that V1 2 + V 2 2 1. It now follows that such a pair (V 1,V 2 ) isuniformly distributed in the circle. If we let ¯R, ¯ denote the polar coordinates ofthis pair, then it is easy to verify that ¯R and ¯ are independent, with ¯R 2 beinguniformly distributed on (0, 1), and ¯ uniformly distributed on (0, 2π).Sincesin ¯ = V 2 / ¯R =cos ¯ = V 1 / ¯R =V 2√V 2 1 + V 2 2V 1√V 2 1 + V 2 2it follows from Equation (11.4) that we can generate independent standard normalsX and Y by generating another random number U and settingX = (−2log U) 1/2 V 1 / ¯R,Y = (−2log U) 1/2 V 2 / ¯R,


680 11 SimulationIn fact, since (conditional on V 2 1 + V 2 2 1) ¯R 2 is uniform on (0, 1) and is independentof ¯, we can use it instead of generating a new random number U; thusshowing that√−2logSX = (−2log ¯R 2 ) 1/2 V 1 / ¯R = V 1 ,S√−2logSY = (−2log ¯R 2 ) 1/2 V 2 / ¯R = V 2Sare independent standard normals, whereS = ¯R 2 = V 2 1 + V 2 2Summing up, we thus have the following approach to generating a pair of independentstandard normals:Step 1: Generate random numbers U 1 and U 2 .Step 2: Set V 1 = 2U 1 − 1, V 2 = 2U 2 − 1, S= V1 2 + V 2 2.Step 3: If S>1, return to step 1.Step 4: Return the independent unit normals√ √−2logS−2logSX = V 1 , Y =SSThe preceding is called the polar method. Since the probability that a randompoint in the square will fall within the circle is equal to π/4 (the area of the circledivided by the area of the square), it follows that, on average, the polar method willrequire 4/π = 1.273 iterations of step 1. Hence, it will, on average, require 2.546random numbers, 1 logarithm, 1 square root, 1 division, and 4.546 multiplicationsto generate 2 independent standard normals.11.3.2. The Gamma DistributionTo simulate from a gamma distribution with parameters (n, λ), where n is an integer,we use the fact that the sum of n independent exponential random variableseach having rate λ has this distribution. Hence, if U 1 ,...,U n are independentuniform (0, 1) random variables,(X = 1 n∑n∏)log U i =− 1 λλ log U ii=1i=1has the desired distribution.V 2
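Steps 1–4 of the polar method translate directly into a short Python routine (a sketch, not from the text):

```python
import math
import random

def polar_normals(rng=random):
    """Polar method: return a pair of independent standard normals.
    Step 1-2: V1, V2 uniform on the square, S = V1^2 + V2^2.
    Step 3: reject if the point falls outside the unit circle.
    Step 4: return sqrt(-2 log S / S) * (V1, V2)."""
    while True:
        v1 = 2.0 * rng.random() - 1.0
        v2 = 2.0 * rng.random() - 1.0
        s = v1 * v1 + v2 * v2
        if 0.0 < s <= 1.0:                          # inside the circle
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return factor * v1, factor * v2
```

On average 4/π ≈ 1.273 passes through the loop are needed, as computed above.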


11.3. Special Techniques for Simulating Continuous Random Variables 681When n is large, there are other techniques available that do not require so manyrandom numbers. One possibility is to use the rejection procedure with g(x) beingtaken as the density of an exponential random variable with mean n/λ (as this isthe mean of the gamma). It can be shown that for large n the average numberof iterations needed by the rejection algorithm is e[(n − 1)/2π] 1/2 . In addition,if we wanted to generate a series of gammas, then, just as in Example 11.4, wecan arrange things so that upon acceptance we obtain not only a gamma randomvariable but also, for free, an exponential random variable that can then be usedin obtaining the next gamma (see Exercise 8).11.3.3. The Chi-Squared DistributionThe chi-squared distribution with n degrees of freedom is the distribution ofχn 2 = Z2 1 +···+Z2 n where Z i,i = 1,...,n are independent standard normals.Using the fact noted in the remark at the end of Section 3.1 we see that Z1 2 + Z2 2has an exponential distribution with rate 1 2. Hence, when n is even—say n = 2k—χ2k 2 has a gamma distribution with parameters (k, 1 2 ). Hence, −2log(∏ ki=1 U i )has a chi-squared distribution with 2k degrees of freedom. We can simulate a chisquaredrandom variable with 2k + 1 degrees of freedom by first simulating astandard normal random variable Z and then adding Z 2 to the preceding. That is,( k∏)χ2k+1 2 = Z2 − 2log U ii=1where Z,U 1 ,...,U n are independent with Z being a standard normal and theothers being uniform (0, 1) random variables.11.3.4. The Beta (n, m) DistributionThe random variable X is said to have a beta distribution with parameters n, m ifits density is given by(n + m − 1)!f(x)=(n − 1)!(m − 1)! xn−1 (1 − x) m−1 , 0
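The gamma recipe of Section 11.3.2, namely X = −(1/λ) log(U_1 ··· U_n), can be sketched as (summing logs rather than multiplying, to avoid floating-point underflow for large n):

```python
import math
import random

def gamma_n(n, lam, rng=random):
    """Gamma(n, lam) for integer n as the sum of n independent
    exponentials with rate lam: X = -(1/lam) * log(U_1 * ... * U_n)."""
    log_prod = sum(math.log(1.0 - rng.random()) for _ in range(n))
    return -log_prod / lam
```

With n = 1 this reduces to the exponential generator of Example 11.3, and the sample mean over many draws approaches n/λ.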


682 11 SimulationHence, if the n+m−1 uniform random variables are partitioned into three subsetsof sizes n − 1, 1, and m − 1 the probability (density) that each of the variablesin the first set is less than x, the variable in the second set equals x, and all thevariables in the third set are greater than x is given by(P {U x}) m−1 = x n−1 (1 − x) m−1Hence, as there are (n + m − 1)!/(n − 1)!(m − 1)! possible partitions, it followsthat U (n) is beta with parameters (n, m).Thus, one way to simulate from the beta distribution is to find the nth smallestof a set of n + m − 1 random numbers. However, when n and m are large, thisprocedure is not particularly efficient.For another approach consider a Poisson process with rate 1, and recall thatgiven S n+m , the time of the (n + m)th event, the set of the first n + m − 1event times is distributed independently and uniformly on (0,S n+m ). Hence, givenS n+m ,thenth smallest of the first n + m − 1 event times—that is S n —is distributedas the nth smallest of a set of n + m − 1 uniform (0,S n+m ) random variables.But from the preceding we can thus conclude that S n /S n+m has a beta distributionwith parameters (n, m). Therefore, if U 1 ,...,U n+m are random numbers,By writing the preceding as− log ∏ ni=1 U i− log ∏ m+ni=1U iis beta with parameters (n, m)− log ∏ ni=1 U i− log ∏ n1 U i − log ∏ n+mn+1 U iwe see that it has the same distribution as X/(X + Y) where X and Y are independentgamma random variables with respective parameters (n, 1) and (m, 1).Hence, when n and m are large, we can efficiently simulate a beta by first simulatingtwo gamma random variables.11.3.5. The Exponential Distribution—The Von Neumann AlgorithmAs we have seen, an exponential random variable with rate 1 can be simulatedby computing the negative of the logarithm of a random number. 
Most computerprograms for computing a logarithm, however, involve a power series expansion,and so it might be useful to have at hand a second method that is computationallyeasier. We now present such a method due to Von Neumann.


11.3. Special Techniques for Simulating Continuous Random Variables 683To begin let U 1 ,U 2 ,... be independent uniform (0, 1) random variables anddefine N,N 2, byN = min{n: U 1 U 2 ··· U n−1 n,U 1 y}==∫ 10∫ y0P {N >n,U 1 y|U 1 = x} dxP {N >n|U 1 = x} dxNow, given that U 1 = x,N will be greater than n if x U 2 ··· U n or, equivalently,if(a) U i x, i = 2,...,nand(b)U 2 ··· U nNow, (a) has probability x n−1 of occurring and given (a), since all of the (n − 1)!possible rankings of U 2 ,...,U n are equally likely, (b) has probability 1/(n − 1)!of occurring. Hence,and sowhich yields thatxn−1P {N >n|U 1 = x}=(n − 1)!∫ yx n−1 ynP {N >n,U 1 y}=dx =0 (n − 1)! n!P {N = n, U 1 y}=P {N >n− 1,U 1 y}−P {N >n,U 1 y}= yn−1(n − 1)! − ynn!Upon summing over all the even integers, we see thatP {N is even, U 1 y}=y − y22! + y33! − y44! −···= 1 − e −y (11.5)We are now ready for the following algorithm for generating an exponentialrandom variable with rate 1.


Step 1: Generate uniform random numbers U_1, U_2, ..., stopping at N = min{n: U_1 ≥ ··· ≥ U_{n−1} < U_n}.
Step 2: If N is even, accept that run and go to step 3. If N is odd, reject the run and return to step 1.
Step 3: Set X equal to the number of failed runs plus the first random number of the successful run.

To see that X is exponential with rate 1, note that by (11.5) with y = 1 each run succeeds with probability 1 − e^{−1}, so X will exceed x if either the first [x] + 1 runs all fail, or the first [x] runs fail and the first random number of the successful run exceeds x − [x] (where [x] is the largest integer not exceeding x). As

P{N even, U_1 > y} = P{N even} − P{N even, U_1 ≤ y} = 1 − e^{−1} − (1 − e^{−y}) = e^{−y} − e^{−1}

we see that

P{X > x} = e^{−[x]}[e^{−1} + e^{−(x−[x])} − e^{−1}] = e^{−x}

which yields the result.

Let T denote the number of trials needed to generate a successful run. As each trial is a success with probability 1 − e^{−1}, it follows that T is geometric with mean 1/(1 − e^{−1}). If we let N_i denote the number of uniform random variables used on the ith run, i ≥ 1, then T (being the first run i for which N_i is even) is a stopping time for this sequence. Hence, by Wald's equation, the mean number of uniform random variables needed by this algorithm is given by

E[Σ_{i=1}^{T} N_i] = E[N]E[T]

Now,

E[N] = Σ_{n=0}^{∞} P{N > n} = 1 + Σ_{n=1}^{∞} P{U_1 ≥ ··· ≥ U_n} = 1 + Σ_{n=1}^{∞} 1/n! = e
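The three-step procedure above can be sketched in Python as follows (an illustrative implementation, not from the text; it uses only uniform random numbers and comparisons, no logarithms):

```python
import random

def von_neumann_exponential(rng=random):
    """Generate an exponential rate-1 random variable by the Von Neumann
    algorithm: run decreasing sequences of uniforms, accept a run when its
    length N is even, and return (number of failed runs) + U_1."""
    failed_runs = 0
    while True:
        u1 = rng.random()          # first random number of the current run
        prev, n = u1, 1
        while True:                # extend the run while it keeps decreasing
            u = rng.random()
            n += 1
            if prev < u:           # sequence stops decreasing: N = n
                break
            prev = u
        if n % 2 == 0:             # N even: accept the run
            return failed_runs + u1
        failed_runs += 1           # N odd: reject and start a new run
```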


and so

E[Σ_{i=1}^{T} N_i] = e/(1 − e^{−1}) ≈ 4.3

Hence, this algorithm, which computationally speaking is quite easy to perform, requires on the average about 4.3 random numbers to execute.

11.4. Simulating from Discrete Distributions

All of the general methods for simulating from continuous distributions have analogs in the discrete case. For instance, if we want to simulate a random variable X having probability mass function

P{X = x_j} = P_j, j = 1, 2, ...,   Σ_j P_j = 1

we can use the following discrete time analog of the inverse transform technique:

To simulate X for which P{X = x_j} = P_j, let U be uniformly distributed over (0, 1), and set

X = x_1, if U < P_1
X = x_2, if P_1 < U < P_1 + P_2
...
X = x_j, if P_1 + ··· + P_{j−1} < U < P_1 + ··· + P_j
...

As

P{X = x_j} = P{P_1 + ··· + P_{j−1} < U < P_1 + ··· + P_j} = P_j

X has the desired distribution.
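The discrete inverse transform can be sketched as follows (illustrative Python, not from the text; the final return guards against floating-point round-off in the cumulative sums):

```python
def discrete_inverse_transform(u, values, probs):
    """Return x_j, where sum_{i<j} P_i <= u < sum_{i<=j} P_i, for a
    uniform (0, 1) value u and a pmf given by parallel lists."""
    cum = 0.0
    for x, p in zip(values, probs):
        cum += p
        if u < cum:
            return x
    return values[-1]   # round-off guard: cum may fall just short of 1
```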


As, for a geometric random variable with parameter p,

Σ_{i=1}^{j−1} P{X = i} = 1 − P{X > j − 1} = 1 − (1 − p)^{j−1}

we can simulate such a random variable by generating a random number U and then setting X equal to that value j for which

1 − (1 − p)^{j−1} < U < 1 − (1 − p)^j

or, equivalently, for which

(1 − p)^j < 1 − U < (1 − p)^{j−1}
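Taking logarithms in the last inequality, and using that 1 − U has the same uniform (0, 1) distribution as U, gives the standard closed form X = 1 + [log U / log(1 − p)]. A minimal Python sketch (illustrative, not from the text):

```python
import math
import random

def simulate_geometric(p, u=None, rng=random):
    """Simulate a geometric (p) random variable, i.e. the trial number of
    the first success.  X = min{j : (1-p)^j < 1-U}; replacing 1-U by U
    (same distribution) yields X = 1 + floor(log U / log(1-p))."""
    if u is None:
        u = rng.random()
    return 1 + int(math.log(u) / math.log(1.0 - p))
```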


Step 1: Let α = 1/p, β = 1/(1 − p).
Step 2: Set k = 0.
Step 3: Generate a uniform random number U.
Step 4: If k = n stop. Otherwise reset k to equal k + 1.
Step 5: If U ≤ p set X_k = 1 and reset U to equal αU. If U > p set X_k = 0 and reset U to equal β(U − p). Return to step 4.

This procedure generates X_1, ..., X_n, and X = Σ_{i=1}^{n} X_i is the desired random variable. It works by noting whether U_k ≤ p or U_k > p; in the former case it takes U_{k+1} to equal U_k/p, and in the latter case it takes U_{k+1} to equal (U_k − p)/(1 − p).†

Example 11.9 (Simulating a Poisson Random Variable) To simulate a Poisson random variable with mean λ, generate independent uniform (0, 1) random variables U_1, U_2, ..., stopping at

N + 1 = min{n: ∏_{i=1}^{n} U_i < e^{−λ}}
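The single-uniform procedure of Steps 1 through 5 can be sketched as follows (illustrative Python, not from the text; note that in floating-point arithmetic the repeated rescaling of U gradually loses precision, so this is only reasonable for moderate n):

```python
import random

def binomial_single_uniform(n, p, rng=random):
    """Generate a binomial (n, p) random variable from a single uniform,
    rescaling the unused part of U at each step: U/p given U <= p, and
    (U - p)/(1 - p) given U > p, are again uniform on (0, 1)."""
    alpha, beta = 1.0 / p, 1.0 / (1.0 - p)
    u = rng.random()
    x = 0
    for _ in range(n):
        if u <= p:
            x += 1
            u = alpha * u          # reuse the conditional uniform U/p
        else:
            u = beta * (u - p)     # reuse the conditional uniform (U-p)/(1-p)
    return x
```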


number of events in (s, λ). On the other hand, given that S_m = s, the set of times at which the first m − 1 events occur has the same distribution as a set of m − 1 uniform (0, s) random variables (see Section 5.3.5). Hence, when λ
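In code, the basic method of Example 11.9 reads as follows (illustrative Python, not from the text):

```python
import math
import random

def simulate_poisson(lam, rng=random):
    """Simulate a Poisson random variable with mean lam:
    N + 1 = min{n : U_1 U_2 ... U_n < exp(-lam)}; return N."""
    threshold = math.exp(-lam)
    prod, n = rng.random(), 1
    while prod >= threshold:       # keep multiplying uniforms in
        prod *= rng.random()
        n += 1
    return n - 1                   # n is the first index below the threshold
```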


Before presenting the general technique for obtaining the representation of Equation (11.6), let us illustrate it by an example.

Example 11.10 Consider the three-point distribution P with P_1 = 7/16, P_2 = 1/2, P_3 = 1/16. We start by choosing i and j such that they satisfy the conditions of Lemma 11.5. As P_3 < 1/2 and P_3 + P_2 > 1/2, we can work with i = 3 and j = 2. We will now define a 2-point mass function Q^(1) putting all of its weight on 3 and 2 and such that P will be expressible as an equally weighted mixture between Q^(1) and a second 2-point mass function Q^(2). Secondly, all of the mass of point 3 will be contained in Q^(1). As we will have

P_j = (1/2)(Q_j^(1) + Q_j^(2)), j = 1, 2, 3   (11.7)

and, by the preceding, Q_3^(2) is supposed to equal 0, we must therefore take

Q_3^(1) = 2P_3 = 1/8,   Q_2^(1) = 1 − Q_3^(1) = 7/8,   Q_1^(1) = 0

To satisfy Equation (11.7), we must then set

Q_3^(2) = 0,   Q_2^(2) = 2P_2 − 7/8 = 1/8,   Q_1^(2) = 2P_1 = 7/8

Hence, we have the desired representation in this case. Suppose now that the original distribution was the following 4-point mass function:

P_1 = 7/16,   P_2 = 1/4,   P_3 = 1/8,   P_4 = 3/16

Now, P_3 < 1/3 and P_3 + P_1 > 1/3. Hence our initial 2-point mass function—Q^(1)—will concentrate on points 3 and 1 (giving no weights to 2 and 4). As the final representation will give weight 1/3 to Q^(1) and in addition the other Q^(j), j = 2, 3, will not give any mass to the value 3, we must have that

(1/3)Q_3^(1) = P_3 = 1/8

Hence,

Q_3^(1) = 3/8,   Q_1^(1) = 1 − 3/8 = 5/8

Also, we can write

P = (1/3)Q^(1) + (2/3)P^(3)


where P^(3), to satisfy the preceding, must be the vector

P_1^(3) = (3/2)(P_1 − (1/3)Q_1^(1)) = 11/32,
P_2^(3) = (3/2)P_2 = 3/8,
P_3^(3) = 0,
P_4^(3) = (3/2)P_4 = 9/32

Note that P^(3) gives no mass to the value 3. We can now express the mass function P^(3) as an equally weighted mixture of 2-point mass functions Q^(2) and Q^(3), and we will end up with

P = (1/3)Q^(1) + (2/3)((1/2)Q^(2) + (1/2)Q^(3)) = (1/3)(Q^(1) + Q^(2) + Q^(3))

(We leave it as an exercise for you to fill in the details.)

The preceding example outlines the following general procedure for writing the n-point mass function P in the form of Equation (11.6) where each of the Q^(i) are mass functions giving all their mass to at most 2 points. To start, we choose i and j satisfying the conditions of Lemma 11.5. We now define the mass function Q^(1) concentrating on the points i and j and which will contain all of the mass for point i by noting that, in the representation of Equation (11.6), Q_i^(k) = 0 for k = 2, ..., n − 1, implying that

Q_i^(1) = (n − 1)P_i,   and so   Q_j^(1) = 1 − (n − 1)P_i

Writing

P = (1/(n − 1))Q^(1) + ((n − 2)/(n − 1))P^(n−1)   (11.8)

where P^(n−1) represents the remaining mass, we see that

P_i^(n−1) = 0,
P_j^(n−1) = ((n − 1)/(n − 2))(P_j − (1/(n − 1))Q_j^(1)) = ((n − 1)/(n − 2))(P_i + P_j − 1/(n − 1)),
P_k^(n−1) = ((n − 1)/(n − 2))P_k,   k ≠ i or j

That the foregoing is indeed a probability mass function is easily checked—for instance, the nonnegativity of P_j^(n−1) follows from the fact that j was chosen so that P_i + P_j ≥ 1/(n − 1).


We may now repeat the foregoing procedure on the (n − 1)-point probability mass function P^(n−1) to obtain

P^(n−1) = (1/(n − 2))Q^(2) + ((n − 3)/(n − 2))P^(n−2)

and thus from Equation (11.8) we have

P = (1/(n − 1))Q^(1) + (1/(n − 1))Q^(2) + ((n − 3)/(n − 1))P^(n−2)

We now repeat the procedure on P^(n−2) and so on until we finally obtain

P = (1/(n − 1))(Q^(1) + ··· + Q^(n−1))

In this way we are able to represent P as an equally weighted mixture of n − 1 2-point mass functions. We can now easily simulate from P by first generating a random integer N equally likely to be either 1, 2, ..., n − 1. If the resulting value N is such that Q^(N) puts positive weight only on the points i_N and j_N, then we can set X equal to i_N if a second random number is less than Q_{i_N}^(N) and equal to j_N otherwise. The random variable X will have probability mass function P. That is, we have the following procedure for simulating from P.

Step 1: Generate U_1 and set N = 1 + [(n − 1)U_1].
Step 2: Generate U_2 and set

X = i_N, if U_2 < Q_{i_N}^(N);   X = j_N, otherwise

Remarks (i) The preceding is called the alias method because, by a renumbering of the Q's, we can always arrange things so that Q_k^(k) > 0 for each k. (That is, we can arrange things so that the kth 2-point mass function gives positive weight to the value k.) Hence, the procedure calls for simulating N, equally likely to be 1, 2, ..., n − 1, and then if N = k it either accepts k as the value of X, or it accepts for the value of X the "alias" of k (namely, the other value that Q^(k) gives positive weight).

(ii) Actually, it is not necessary to generate a new random number in step 2. Because N − 1 is the integer part of (n − 1)U_1, it follows that the remainder (n − 1)U_1 − (N − 1) is independent of N and is uniformly distributed in (0, 1). Hence, rather than generating a new random number U_2 in step 2, we can use (n − 1)U_1 − (N − 1) = (n − 1)U_1 − [(n − 1)U_1].
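The decomposition around Equation (11.8) can be implemented directly. In the sketch below (illustrative Python, not from the text), i is taken as the point of smallest current mass and j as the point of largest mass, a choice that always satisfies the conditions of Lemma 11.5; sampling then picks one of the n − 1 two-point mass functions at random:

```python
import random

def build_two_point_mixture(p):
    """Decompose an n-point pmf (list of probabilities, n >= 3) into
    n - 1 equally weighted 2-point pmfs, following the recursion of
    Equation (11.8).  Returns a list of (i, q_i, j): the component gives
    mass q_i to point i and 1 - q_i to point j."""
    mass = dict(enumerate(p))          # point -> current (rescaled) mass
    comps = []
    while len(mass) > 2:
        k = len(mass)
        i = min(mass, key=mass.get)    # P_i < 1/(k-1) always holds
        j = max(mass, key=mass.get)    # and P_i + P_j > 1/(k-1)
        q_i = (k - 1) * mass[i]        # Q_i = (k-1)P_i, Q_j = 1 - Q_i
        comps.append((i, q_i, j))
        del mass[i]
        mass[j] -= (1.0 - q_i) / (k - 1)
        scale = (k - 1) / (k - 2)      # rescale remaining mass to sum to 1
        for key in mass:
            mass[key] *= scale
    (i, pi), (j, pj) = sorted(mass.items())
    comps.append((i, pi / (pi + pj), j))
    return comps

def sample_mixture(comps, rng=random):
    """Step 1: pick one of the n-1 components at random; Step 2: pick
    one of its two points."""
    i, q_i, j = comps[rng.randrange(len(comps))]
    return i if rng.random() < q_i else j
```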


Example 11.11 Let us return to the problem of Example 11.1, which considers a list of n, not necessarily distinct, items. Each item has a value—v(i) being the value of the item in position i—and we are interested in estimating

v = Σ_{i=1}^{n} v(i)/m(i)

where m(i) is the number of times the item in position i appears on the list. In words, v is the sum of the values of the (distinct) items on the list.

To estimate v, note that if X is a random variable such that

P{X = i} = v(i) / Σ_{j=1}^{n} v(j),   i = 1, ..., n

then

E[1/m(X)] = Σ_i (v(i)/m(i)) / Σ_j v(j) = v / Σ_j v(j)

Hence, we can estimate v by using the alias (or any other) method to generate independent random variables X_1, ..., X_k having the same distribution as X and then estimating v by

v ≈ (1/k) Σ_{j=1}^{n} v(j) Σ_{i=1}^{k} 1/m(X_i)

11.5. Stochastic Processes

We can easily simulate a stochastic process by simulating a sequence of random variables. For instance, to simulate the first t time units of a renewal process having interarrival distribution F we can simulate independent random variables X_1, X_2, ... having distribution F, stopping at

N = min{n: X_1 + ··· + X_n > t}

The X_i, i ≥ 1, represent the interarrival times of the renewal process, and so the preceding simulation yields N − 1 events by time t—the events occurring at times X_1, X_1 + X_2, ..., X_1 + ··· + X_{N−1}.

Actually there is another approach for simulating a Poisson process that is quite efficient. Suppose we want to simulate the first t time units of a Poisson process having rate λ. To do so, we can first simulate N(t), the number of events by t, and then use the result that given the value of N(t), the set of N(t) event times is


distributed as a set of n independent uniform (0, t) random variables. Hence, we start by simulating N(t), a Poisson random variable with mean λt (by one of the methods given in Example 11.9). Then, if N(t) = n, generate a new set of n random numbers—call them U_1, ..., U_n—and {tU_1, ..., tU_n} will represent the set of N(t) event times. If we could stop here this would be much more efficient than simulating the exponentially distributed interarrival times. However, we usually desire the event times in increasing order—for instance, for s
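The sort-the-uniforms approach just described can be sketched as follows (illustrative Python, not from the text; the Poisson count is generated by the product-of-uniforms method of Example 11.9):

```python
import math
import random

def poisson_process_times(lam, t, rng=random):
    """Ordered event times of a rate-lam Poisson process on (0, t):
    simulate N(t) ~ Poisson(lam * t), then sort N(t) uniform (0, t) points."""
    threshold = math.exp(-lam * t)
    prod, n = 1.0, 0
    while True:                    # N + 1 = min{n : U_1...U_n < e^{-lam t}}
        prod *= rng.random()
        if prod < threshold:
            break
        n += 1
    return sorted(t * rng.random() for _ in range(n))
```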


Method 1. Sampling a Poisson Process

To simulate the first T time units of a nonhomogeneous Poisson process with intensity function λ(t), let λ be such that

λ(t) ≤ λ   for all t ≤ T

Now, as shown in Chapter 5, such a nonhomogeneous Poisson process can be generated by a random selection of the event times of a Poisson process having rate λ. That is, if an event of a Poisson process with rate λ that occurs at time t is counted (independently of what has transpired previously) with probability λ(t)/λ, then the process of counted events is a nonhomogeneous Poisson process with intensity function λ(t), 0 ≤ t ≤ T. Hence, by simulating a Poisson process and then randomly counting its events, we can generate the desired nonhomogeneous Poisson process. We thus have the following procedure:

Generate independent random variables X_1, U_1, X_2, U_2, ... where the X_i are exponential with rate λ and the U_i are random numbers, stopping at

N = min{n: Σ_{i=1}^{n} X_i > T}

Now let, for j = 1, ..., N − 1,

I_j = 1, if U_j ≤ λ(Σ_{i=1}^{j} X_i)/λ;   I_j = 0, otherwise

and set

J = {j: I_j = 1}

Thus, the counting process having events at the set of times {Σ_{i=1}^{j} X_i: j ∈ J} constitutes the desired process.

The foregoing procedure, referred to as the thinning algorithm (because it "thins" the homogeneous Poisson points), will clearly be most efficient, in the sense of having the fewest number of rejected event times, when λ(t) is near λ throughout the interval. Thus, an obvious improvement is to break up the interval into subintervals and then use the procedure over each subinterval. That is, determine appropriate values k, 0
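The basic thinning procedure can be sketched as follows (illustrative Python, not from the text; `intensity` is a hypothetical callable for λ(t) and `lam_bar` stands for the constant bound λ):

```python
import math
import random

def thinning(intensity, lam_bar, T, rng=random):
    """Thinning algorithm for a nonhomogeneous Poisson process on (0, T):
    generate homogeneous rate-lam_bar event times, and keep the event at
    time s independently with probability intensity(s)/lam_bar."""
    times, t = [], 0.0
    while True:
        t += -math.log(rng.random()) / lam_bar   # exponential(lam_bar) gap
        if t > T:
            return times
        if rng.random() <= intensity(t) / lam_bar:
            times.append(t)                      # event accepted (counted)
```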


11.5. Stochastic Processes 695Now simulate the nonhomogeneous Poisson process over the interval (t i−1 ,t i ) bygenerating exponential random variables with rate λ i and accepting the generatedevent occurring at time s, s ∈ (t i−1 ,t i ), with probability λ(s)/λ i . Because of thememoryless property of the exponential and the fact that the rate of an exponentialcan be changed upon multiplication by a constant, it follows that there is no lossof efficiency in going from one subinterval to the next. In other words, if we areat t ∈[t i−1 ,t i ) and generate X, an exponential with rate λ i , which is such thatt + X>t i then we can use λ i [X − (t i − t)]/λ i+1 as the next exponential withrate λ i+1 . Thus, we have the following algorithm for generating the first t timeunits of a nonhomogeneous Poisson process with intensity function λ(s) whenthe relations (11.9) are satisfied. In the algorithm, t will represent the present timeand I the present interval (that is, I = i when t i−1 t


Figure 11.3. λ(s) = s.


11.5. Stochastic Processes 697whereλ 1 (t) ≡ 1,⎧⎨λ(t) − 1, 0


Example 11.12 If λ(s) = cs, then we can simulate the first T time units of the nonhomogeneous Poisson process by first simulating N(T), a Poisson random variable having mean m(T) = ∫_0^T cs ds = cT²/2, and then simulating N(T) random variables having distribution

F(s) = s²/T²,   0 < s < T


To start, note that if an event occurs at time x then, independent of what has occurred prior to x, the time until the next event has the distribution F_x given by

F̄_x(t) = P{0 events in (x, x + t) | event at x}
       = P{0 events in (x, x + t)}   (by independent increments)
       = exp{−∫_0^t λ(x + y) dy}

Differentiation yields that the density corresponding to F_x is

f_x(t) = λ(x + t) exp{−∫_0^t λ(x + y) dy}

implying that the hazard rate function of F_x is

r_x(t) = f_x(t)/F̄_x(t) = λ(x + t)

We can now simulate the event times X_1, X_2, ... by simulating X_1 from F_0; then if the simulated value of X_1 is x_1, simulate X_2 by adding x_1 to a value generated from F_{x_1}, and if this sum is x_2 simulate X_3 by adding x_2 to a value generated from F_{x_2}, and so on. The method used to simulate from these distributions should depend, of course, on the form of these distributions. However, it is interesting to note that if we let λ be such that λ(t) ≤ λ and use the hazard rate method to simulate, then we end up with the approach of Method 1 (we leave the verification of this fact as an exercise). Sometimes, however, the distributions F_x can be easily inverted and so the inverse transform method can be applied.

Example 11.13 Suppose that λ(x) = 1/(x + a), x ≥ 0. Then

∫_0^t λ(x + y) dy = log((x + a + t)/(x + a))

Hence,

F_x(t) = 1 − (x + a)/(x + a + t) = t/(x + a + t)

and so

F_x^{−1}(u) = (x + a)u/(1 − u)


We can, therefore, simulate the successive event times X_1, X_2, ... by generating U_1, U_2, ... and then setting

X_1 = aU_1/(1 − U_1),
X_2 = (X_1 + a)U_2/(1 − U_2) + X_1

and, in general,

X_j = (X_{j−1} + a)U_j/(1 − U_j) + X_{j−1},   j ≥ 2

11.5.2. Simulating a Two-Dimensional Poisson Process

A point process consisting of randomly occurring points in the plane is said to be a two-dimensional Poisson process having rate λ if

(a) the number of points in any given region of area A is Poisson distributed with mean λA; and
(b) the numbers of points in disjoint regions are independent.

For a given fixed point O in the plane, we now show how to simulate events occurring according to a two-dimensional Poisson process with rate λ in a circular region of radius r centered about O. Let R_i, i ≥ 1, denote the distance between O and its ith nearest Poisson point, and let C(a) denote the circle of radius a centered at O. Then

P{πR_1² > b} = P{R_1 > √(b/π)} = P{no points in C(√(b/π))} = e^{−λb}

Also, with C(a_2) − C(a_1) denoting the region between C(a_2) and C(a_1):

P{πR_2² − πR_1² > b | R_1 = r}
  = P{R_2 > √((b + πr²)/π) | R_1 = r}
  = P{no points in C(√((b + πr²)/π)) − C(r) | R_1 = r}
  = P{no points in C(√((b + πr²)/π)) − C(r)}   (by (b))
  = e^{−λb}

In fact, the same argument can be repeated to obtain the following.


Proposition 11.5 With R_0 = 0,

πR_i² − πR_{i−1}²,   i ≥ 1,

are independent exponentials with rate λ.

In other words, the amount of area that needs to be traversed to encompass a Poisson point is exponential with rate λ. Since, by symmetry, the respective angles of the Poisson points are independent and uniformly distributed over (0, 2π), we thus have the following algorithm for simulating the Poisson process over a circular region of radius r about O:

Step 1: Generate independent exponentials with rate 1, X_1, X_2, ..., stopping at

N = min{n: (X_1 + ··· + X_n)/(λπ) > r²}

Step 2: If N = 1, stop. There are no points in C(r). Otherwise, for i = 1, ..., N − 1, set

R_i = √((X_1 + ··· + X_i)/(λπ))

Step 3: Generate independent uniform (0, 1) random variables U_1, ..., U_{N−1}.
Step 4: Return the N − 1 Poisson points in C(r) whose polar coordinates are

(R_i, 2πU_i),   i = 1, ..., N − 1

The preceding algorithm requires, on average, 1 + λπr² exponentials and an equal number of uniform random numbers. Another approach to simulating points in C(r) is to first simulate N, the number of such points, and then use the fact that, given N, the points are uniformly distributed in C(r). This latter procedure requires the simulation of N, a Poisson random variable with mean λπr²; we must then simulate N uniform points on C(r), by simulating R from the distribution F_R(a) = a²/r² (see Exercise 25) and θ from uniform (0, 2π), and must then sort these N uniform values in increasing order of R. The main advantage of the first procedure is that it eliminates the need to sort.

The preceding algorithm can be thought of as the fanning out of a circle centered at O with a radius that expands continuously from 0 to r. The successive radii at which Poisson points are encountered are simulated by noting that the additional area necessary to encompass a Poisson point is always, independent of the past, exponential with rate λ. This technique can be used to simulate the process over noncircular regions.
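Steps 1 through 4 can be sketched as follows (illustrative Python, not from the text; the cumulative swept area grows by exponential(λ) increments, and each radius is recovered from the area):

```python
import math
import random

def poisson_in_circle(lam, r, rng=random):
    """Simulate a two-dimensional Poisson process with rate lam in the
    circle of radius r about O.  Returns polar coordinates (R_i, theta_i):
    successive annular areas are exponential(lam), angles uniform (0, 2*pi)."""
    points, area = [], 0.0
    while True:
        area += -math.log(rng.random()) / lam   # exponential with rate lam
        if area > math.pi * r * r:              # swept past the circle: stop
            return points
        radius = math.sqrt(area / math.pi)      # pi * R^2 = cumulative area
        points.append((radius, 2.0 * math.pi * rng.random()))
```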
For instance, consider a nonnegative function g(x), andsuppose we are interested in simulating the Poisson process in the region between


Figure 11.4.

the x-axis and g with x going from 0 to T (see Figure 11.4). To do so we can start at the left-hand end and fan vertically to the right by considering the successive areas ∫_0^a g(x) dx. Now if X_1 < X_2 < ··· denote the x-coordinates of the successive Poisson points, and ε_1, ε_2, ..., ε_{N−1} are independent exponentials with rate 1, then the X_i are determined by

λ∫_0^{X_1} g(x) dx = ε_1,
λ∫_{X_1}^{X_2} g(x) dx = ε_2,
...
λ∫_{X_{N−2}}^{X_{N−1}} g(x) dx = ε_{N−1}

If we now simulate U_1, ..., U_{N−1}—independent uniform (0, 1) random numbers—then as the projection on the y-axis of the Poisson point whose x-coordinate is X_i is uniform on (0, g(X_i)), it follows that the simulated Poisson points in the interval are (X_i, U_i g(X_i)), i = 1, ..., N − 1.

Of course, the preceding technique is most useful when g is regular enough so that the foregoing equations can be solved for the X_i. For instance, if g(x) = y


(and so the region of interest is a rectangle), then

X_i = (ε_1 + ··· + ε_i)/(λy),   i = 1, ..., N − 1

and the Poisson points are

(X_i, yU_i),   i = 1, ..., N − 1

11.6. Variance Reduction Techniques

Let X_1, ..., X_n have a given joint distribution, and suppose we are interested in computing

θ ≡ E[g(X_1, ..., X_n)]

where g is some specified function. It is often the case that it is not possible to analytically compute the preceding, and when such is the case we can attempt to use simulation to estimate θ. This is done as follows: Generate X_1^(1), ..., X_n^(1) having the same joint distribution as X_1, ..., X_n and set

Y_1 = g(X_1^(1), ..., X_n^(1))

Now, simulate a second set of random variables (independent of the first set) X_1^(2), ..., X_n^(2) having the distribution of X_1, ..., X_n and set

Y_2 = g(X_1^(2), ..., X_n^(2))

Continue this until you have generated k (some predetermined number) sets, and so have also computed Y_1, Y_2, ..., Y_k. Now, Y_1, ..., Y_k are independent and identically distributed random variables each having the same distribution as g(X_1, ..., X_n). Thus, if we let Ȳ denote the average of these k random variables—that is,

Ȳ = Σ_{i=1}^{k} Y_i/k

then

E[Ȳ] = θ,   E[(Ȳ − θ)²] = Var(Ȳ)


Hence, we can use Ȳ as an estimate of θ. As the expected square of the difference between Ȳ and θ is equal to the variance of Ȳ, we would like this quantity to be as small as possible. [In the preceding situation, Var(Ȳ) = Var(Y_i)/k, which is usually not known in advance but must be estimated from the generated values Y_1, ..., Y_k.] We now present three general techniques for reducing the variance of our estimator.

11.6.1. Use of Antithetic Variables

In the preceding situation, suppose that we have generated Y_1 and Y_2, identically distributed random variables having mean θ. Now,

Var((Y_1 + Y_2)/2) = (1/4)[Var(Y_1) + Var(Y_2) + 2 Cov(Y_1, Y_2)] = Var(Y_1)/2 + Cov(Y_1, Y_2)/2

Hence, it would be advantageous (in the sense that the variance would be reduced) if Y_1 and Y_2 rather than being independent were negatively correlated. To see how we could arrange this, let us suppose that the random variables X_1, ..., X_n are independent and, in addition, that each is simulated via the inverse transform technique. That is, X_i is simulated from F_i^{−1}(U_i) where U_i is a random number and F_i is the distribution of X_i. Hence, Y_1 can be expressed as

Y_1 = g(F_1^{−1}(U_1), ..., F_n^{−1}(U_n))

Now, since 1 − U is also uniform over (0, 1) whenever U is a random number (and is negatively correlated with U) it follows that Y_2 defined by

Y_2 = g(F_1^{−1}(1 − U_1), ..., F_n^{−1}(1 − U_n))

will have the same distribution as Y_1. Hence, if Y_1 and Y_2 were negatively correlated, then generating Y_2 by this means would lead to a smaller variance than if it were generated by a new set of random numbers. (In addition, there is a computational savings since rather than having to generate n additional random numbers, we need only subtract each of the previous n from 1.)
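As a small numerical illustration (not from the text), the sketch below estimates E[e^U] = e − 1 with antithetic pairs; since e^u is increasing in u, pairing U with 1 − U produces negatively correlated replicates and a smaller variance than independent sampling:

```python
import math
import random

def antithetic_estimate(g, k, rng=random):
    """Estimate E[g(U)], U uniform (0, 1), from k/2 antithetic pairs
    (g(U) + g(1 - U))/2.  Returns the estimate and the list of pairs."""
    pairs = []
    for _ in range(k // 2):
        u = rng.random()
        pairs.append(0.5 * (g(u) + g(1.0 - u)))   # one antithetic pair
    return sum(pairs) / len(pairs), pairs
```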
The following theorem will be the key to showing that this technique—known as the use of antithetic variables—will lead to a reduction in variance whenever g is a monotone function.

Theorem 11.1 If X_1, ..., X_n are independent, then, for any increasing functions f and g of n variables,

E[f(X)g(X)] ≥ E[f(X)]E[g(X)]   (11.11)

where X = (X_1, ..., X_n).


Proof The proof is by induction on n. To prove it when n = 1, let f and g be increasing functions of a single variable. Then, for any x and y,

(f(x) − f(y))(g(x) − g(y)) ≥ 0

since if x ≥ y (x ≤ y) then both factors are nonnegative (nonpositive). Hence, for any random variables X and Y,

(f(X) − f(Y))(g(X) − g(Y)) ≥ 0

implying that

E[(f(X) − f(Y))(g(X) − g(Y))] ≥ 0

or, equivalently,

E[f(X)g(X)] + E[f(Y)g(Y)] ≥ E[f(X)g(Y)] + E[f(Y)g(X)]

If we suppose that X and Y are independent and identically distributed then, as in this case,

E[f(X)g(X)] = E[f(Y)g(Y)],
E[f(X)g(Y)] = E[f(Y)g(X)] = E[f(X)]E[g(X)]

we obtain the result when n = 1.

So assume that (11.11) holds for n − 1 variables, and now suppose that X_1, ..., X_n are independent and f and g are increasing functions. Then

E[f(X)g(X)|X_n = x_n]
  = E[f(X_1, ..., X_{n−1}, x_n)g(X_1, ..., X_{n−1}, x_n)|X_n = x_n]
  = E[f(X_1, ..., X_{n−1}, x_n)g(X_1, ..., X_{n−1}, x_n)]   (by independence)
  ≥ E[f(X_1, ..., X_{n−1}, x_n)]E[g(X_1, ..., X_{n−1}, x_n)]   (by the induction hypothesis)
  = E[f(X)|X_n = x_n]E[g(X)|X_n = x_n]

Hence,

E[f(X)g(X)|X_n] ≥ E[f(X)|X_n]E[g(X)|X_n]

and, upon taking expectations of both sides,

E[f(X)g(X)] ≥ E[E[f(X)|X_n]E[g(X)|X_n]] ≥ E[f(X)]E[g(X)]


The last inequality follows because E[f(X)|X_n] and E[g(X)|X_n] are both increasing functions of X_n, and so, by the result for n = 1,

E[E[f(X)|X_n]E[g(X)|X_n]] ≥ E[E[f(X)|X_n]]E[E[g(X)|X_n]] = E[f(X)]E[g(X)]

Corollary 11.7 If U_1, ..., U_n are independent, and k is either an increasing or decreasing function, then

Cov(k(U_1, ..., U_n), k(1 − U_1, ..., 1 − U_n)) ≤ 0

Proof Suppose k is increasing. As −k(1 − U_1, ..., 1 − U_n) is increasing in U_1, ..., U_n, then, from Theorem 11.1,

Cov(k(U_1, ..., U_n), −k(1 − U_1, ..., 1 − U_n)) ≥ 0

When k is decreasing just replace k by its negative.

Since F_i^{−1}(U_i) is increasing in U_i (as F_i, being a distribution function, is increasing) it follows that g(F_1^{−1}(U_1), ..., F_n^{−1}(U_n)) is a monotone function of U_1, ..., U_n whenever g is monotone. Hence, if g is monotone the antithetic variable approach of twice using each set of random numbers U_1, ..., U_n by first computing g(F_1^{−1}(U_1), ..., F_n^{−1}(U_n)) and then g(F_1^{−1}(1 − U_1), ..., F_n^{−1}(1 − U_n)) will reduce the variance of the estimate of E[g(X_1, ..., X_n)]. That is, rather than generating k sets of n random numbers, we should generate k/2 sets and use each set twice.

Example 11.14 (Simulating the Reliability Function) Consider a system of n components in which component i, independently of other components, works with probability p_i, i = 1, ..., n. Letting

X_i = 1, if component i works;   X_i = 0, otherwise

suppose there is a monotone structure function φ such that

φ(X_1, ..., X_n) = 1, if the system works under X_1, ..., X_n;   φ(X_1, ..., X_n) = 0, otherwise

We are interested in using simulation to estimate

r(p_1, ..., p_n) ≡ E[φ(X_1, ..., X_n)] = P{φ(X_1, ..., X_n) = 1}


11.6. Variance Reduction Techniques 707Now, we can simulate the X i by generating uniform random numbers U 1 ,...,U nand then setting{ 1, if Ui


11.6.2. Variance Reduction by Conditioning

Let us start by recalling (see Proposition 3.1) the conditional variance formula

Var(Y) = E[Var(Y|Z)] + Var(E[Y|Z])   (11.12)

Now suppose we are interested in estimating E[g(X_1, ..., X_n)] by simulating X = (X_1, ..., X_n) and then computing Y = g(X_1, ..., X_n). Now, if for some random variable Z we can compute E[Y|Z] then, as Var(Y|Z) ≥ 0, it follows from the conditional variance formula that

Var(E[Y|Z]) ≤ Var(Y)

implying, since E[E[Y|Z]] = E[Y], that E[Y|Z] is a better estimator of E[Y] than is Y.

In many situations, there are a variety of Z_i that can be conditioned on to obtain an improved estimator. Each of these estimators E[Y|Z_i] will have mean E[Y] and smaller variance than does the raw estimator Y. We now show that for any choice of weights λ_i, λ_i ≥ 0, Σ_i λ_i = 1, Σ_i λ_i E[Y|Z_i] is also an improvement over Y.

Proposition 11.8 For any λ_i ≥ 0, Σ_{i=1}^{∞} λ_i = 1,

(a) E[Σ_i λ_i E[Y|Z_i]] = E[Y],
(b) Var(Σ_i λ_i E[Y|Z_i]) ≤ Var(Y).

Proof The proof of (a) is immediate. To prove (b), let N denote an integer valued random variable independent of all the other random variables under consideration and such that

P{N = i} = λ_i,   i ≥ 1

Applying the conditional variance formula twice yields

Var(Y) ≥ Var(E[Y|N, Z_N]) ≥ Var(E[E[Y|N, Z_N]|Z_1, ...]) = Var(Σ_i λ_i E[Y|Z_i])

Example 11.16 Consider a queueing system having Poisson arrivals and suppose that any customer arriving when there are already N others in the system is lost. Suppose that we are interested in using simulation to estimate the expected


11.6. Variance Reduction Techniques 709number of lost customers by time t. The raw simulation approach would be tosimulate the system up to time t and determine L, the number of lost customersfor that run. A better estimate, however, can be obtained by conditioning on thetotal time in [0,t] that the system is at capacity. Indeed, if we let T denote thetime in [0,t] that there are N in the system, thenE[L|T ]=λTwhere λ is the Poisson arrival rate. Hence, a better estimate for E[L] than theaverage value of L over all simulation runs can be obtained by multiplying theaverage value of T per simulation run by λ. If the arrival process were a nonhomogeneousPoisson process, then we could improve over the raw estimator L bykeeping track of those time periods for which the system is at capacity. If we letI 1 ,...,I C denote the time intervals in [0,t] in which there are N in the system,thenE[L|I 1 ,...,I C ]=C∑∫i=1I iλ(s) dswhere λ(s) is the intensity function of the nonhomogeneous Poisson arrivalprocess. The use of the right side of the preceding would thus lead to a betterestimate of E[L] than the raw estimator L. Example 11.17 Suppose that we wanted to estimate the expected sum of thetimes in the system of the first n customers in a queueing system. That is, if W iis the time that the ith customer spends in the system, then we are interested inestimatingθ = E[ n∑i=1W i]Let Y i denote the “state of the system” at the moment at which the ith customerarrives. It can be shown § that for a wide class of models the estimator∑ ni=1E[W i |Y i ] has (the same mean and) a smaller variance than the estimator∑ ni=1W i . (It should be noted that whereas it is immediate that E[W i |Y i ] hassmaller variance than W i , because of the covariance terms involved it is not immediatelyapparent that ∑ ni=1 E[W i |Y i ] has smaller variance than ∑ ni=1 W i .) For§ S. M. 
Ross, “Simulating Average Delay—Variance Reduction by Conditioning,” Probability in theEngineering and Informational Sciences 2(3), (1988), pp. 309–312.


710 11 Simulationinstance, in the model G/M/1E[W i |Y i ]=(N i + 1)/μwhere N i is the number in the system encountered by the ith arrival and 1/μ isthe mean service time; the result implies that ∑ ni=1 (N i + 1)/μ is a better estimateof the expected total time in the system of the first n customers than is the rawestimator ∑ ni=1 W i . Example 11.18 (Estimating the Renewal Function by Simulation) Considera queueing model in which customers arrive daily in accordance with a renewalprocess having interarrival distribution F . However, suppose that at some fixedtime T , for instance 5 P.M., no additional arrivals are permitted and those customersthat are still in the system are serviced. At the start of the next, and eachsucceeding, day customers again begin to arrive in accordance with the renewalprocess. Suppose we are interested in determining the average time that a customerspends in the system. Upon using the theory of renewal reward processes(with a cycle starting every T time units), it can be shown thataverage time that a customer spends in the system=E[sum of the times in the system of arrivals in (0,T)]m(T )where m(T ) is the expected number of renewals in (0, T ).If we were to use simulation to estimate the preceding quantity, a run wouldconsist of simulating a single day, and as part of a simulation run, we wouldobserve the quantity N(T), the number of arrivals by time T . Since E[N(T)]=m(T ), the natural simulation estimator of m(T ) would be the average (over allsimulated days) value of N(T) obtained. However, Var(N(T )) is, for large T ,proportional to T (its asymptotic form being Tσ 2 /μ 3 , where σ 2 is the varianceand μ the mean of the interarrival distribution F ), and so, for large T , the varianceof our estimator would be large. A considerable improvement can be obtained byusing the analytic formula (see Section 7.3)m(T ) = T μ − 1 + E[Y(T)]μ(11.13)where Y(T) denotes the time from T until the next renewal—that is, it is theexcess life at T . 
Since the variance of Y(T) does not grow with T (indeed, itconverges to a finite value provided the moments of F are finite), it follows thatfor T large, we would do much better by using the simulation to estimate E[Y(T)]and then use Equation (11.13) to estimate m(T ).


11.6. Variance Reduction Techniques 711Figure 11.5. A(T ) = x.However, by employing conditioning, we can improve further on our estimateof m(T ).Todoso,letA(T ) denote the age of the renewal process at time T —thatis, it is the time at T since the last renewal. Then, rather than using the value ofY(T), we can reduce the variance by considering E[Y(T)|A(T )]. Now knowingthat the age at T is equal to x is equivalent to knowing that there was a renewal attime T − x and the next interarrival time X is greater than x. Since the excess atT will equal X − x (see Figure 11.5), it follows thatE[Y(T)|A(T ) = x]=E[X − x|X>x]∫ ∞P {X − x>t}=dt0 P {X>x}∫ ∞[1 − F(t + x)]=dt1 − F(x)which can be numerically evaluated if necessary.As an illustration of the preceding note that if the renewal process is a Poissonprocess with rate λ, then the raw simulation estimator N(T) will have varianceλT ; since Y(T) will be exponential with rate λ, the estimator based on (11.13)will have variance λ 2 Var{Y(T)}=1. On the other hand, since Y(T)will be independentof A(T ) (and E[Y(T)|A(T )]=1/λ), it follows that the variance of theimproved estimator E[Y(T)|A(T )] is 0. That is, conditioning on the age at timeT yields, in this case, the exact answer. Example 11.19 Consider the M/G/1 queueing system where customers arrivein accordance with a Poisson process with rate λ to a single server havingservice distribution G with mean E[S]. Suppose that, for a specified time t 0 ,theserver will take a break at the first time t t 0 at which the system is empty. Thatis, if X(t) is the number of customers in the system at time t, then the server willtake a break at timeT = min{t t 0 : X(t) = 0}To efficiently use simulation to estimate E[T ], generate the system to time t 0 ; letR denote the remaining service time of the customer in service at time t 0 , andlet X Q equal the number of customers waiting in queue at time t 0 . (Note that Ris equal to 0 if X(t 0 ) = 0, and X Q = (X(t 0 ) − 1) + .) 
Now, with N equal to the number of customers that arrive in the remaining service time R, it follows that


if N = n and X_Q = n_Q, then the additional amount of time from t₀ + R until the server can take a break is equal to the amount of time that it takes until the system, starting with n + n_Q customers, becomes empty. Because this is equal to the sum of n + n_Q busy periods, it follows from Section 8.5.3 that

    E[T | R, N, X_Q] = t₀ + R + (N + X_Q) E[S] / (1 − λE[S])

Consequently,

    E[T | R, X_Q] = E[ E[T | R, N, X_Q] | R, X_Q ]
                  = t₀ + R + (E[N | R, X_Q] + X_Q) E[S] / (1 − λE[S])
                  = t₀ + R + (λR + X_Q) E[S] / (1 − λE[S])

Thus, rather than using the generated value of T as the estimator from a simulation run, it is better to stop the simulation at time t₀ and use the estimator t₀ + R + (λR + X_Q) E[S]/(1 − λE[S]).

11.6.3. Control Variates

Again suppose we want to use simulation to estimate E[g(X)] where X = (X₁,...,Xₙ). But now suppose that for some function f the expected value of f(X) is known, say, E[f(X)] = μ. Then for any constant a we can also use

    W = g(X) + a(f(X) − μ)

as an estimator of E[g(X)]. Now,

    Var(W) = Var(g(X)) + a²Var(f(X)) + 2a Cov(g(X), f(X))

Simple calculus shows that the preceding is minimized when

    a = −Cov(f(X), g(X)) / Var(f(X))

and, for this value of a,

    Var(W) = Var(g(X)) − [Cov(f(X), g(X))]² / Var(f(X))

Because Var(f(X)) and Cov(f(X), g(X)) are usually unknown, the simulated data should be used to estimate these quantities.
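A minimal numeric sketch of the control-variate recipe just described, using our own toy example: g(U) = e^U with control f(U) = U, so that μ = 1/2 and the target E[e^U] = e − 1 are known exactly, and the estimated coefficient a should come out near its theoretical value −Cov/Var ≈ −1.69:

```python
import random, math

random.seed(2)
n = 10_000
u = [random.random() for _ in range(n)]
g = [math.exp(x) for x in u]      # g(U) = e^U; we want E[e^U] = e - 1
f = u                             # control f(U) = U, with known mean 1/2

def mean(v):
    return sum(v) / len(v)

gbar, fbar = mean(g), mean(f)
cov  = sum((x - fbar) * (y - gbar) for x, y in zip(f, g)) / (n - 1)
varf = sum((x - fbar) ** 2 for x in f) / (n - 1)
a = -cov / varf                   # estimated optimal coefficient

w = [y + a * (x - 0.5) for x, y in zip(f, g)]
wbar = mean(w)
varw = sum((x - wbar) ** 2 for x in w) / (n - 1)
varg = sum((y - gbar) ** 2 for y in g) / (n - 1)
print(wbar, varw / varg)          # estimate near e - 1; variance ratio far below 1
```

Because U and e^U are very strongly correlated, the variance ratio comes out close to 1 − Corr², a reduction of roughly two orders of magnitude here.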


Dividing this minimized variance by Var(g(X)) shows that

    Var(W) / Var(g(X)) = 1 − Corr²(f(X), g(X))

where Corr(X, Y) is the correlation between X and Y. Consequently, the use of a control variate will greatly reduce the variance of the simulation estimator whenever f(X) and g(X) are strongly correlated.

Example 11.20 Consider a continuous time Markov chain which, upon entering state i, spends an exponential time with rate v_i in that state before making a transition into some other state, with the transition being into state j with probability P_{i,j}, i ≥ 0, j ≠ i. Suppose that costs are incurred at rate C(i) ≥ 0 per unit time whenever the chain is in state i, i ≥ 0. With X(t) equal to the state at time t, and α being a constant such that 0 < α


S_i is the service time of customer i, i ≥ 1, we may write

    D_{n+1} = g(X₁,...,Xₙ, S₁,...,Sₙ)

To take into account the possibility that the simulated variables X_i, S_i may by chance be quite different from what might be expected, we can let

    f(X₁,...,Xₙ, S₁,...,Sₙ) = Σ_{i=1}^n (S_i − X_i)

As E[f(X, S)] = n(μ_G − μ_F), we could use

    g(X, S) + a[f(X, S) − n(μ_G − μ_F)]

as an estimator of E[D_{n+1}]. Since D_{n+1} and f are both increasing functions of S_i, −X_i, i = 1,...,n, it follows from Theorem 11.1 that f(X, S) and D_{n+1} are positively correlated, and so the simulated estimate of a should turn out to be negative.

If we wanted to estimate the expected sum of the delays in queue of the first N(T) arrivals, then we could use Σ_{i=1}^{N(T)} S_i as our control variable. Indeed, as the arrival process is usually assumed independent of the service times, it follows that

    E[Σ_{i=1}^{N(T)} S_i] = E[S] E[N(T)]

where E[N(T)] can either be computed by the method suggested in Section 7.8 or it can be estimated from the simulation as in Example 11.18. This control variable could also be used if the arrival process were a nonhomogeneous Poisson process with rate λ(t); in this case,

    E[N(T)] = ∫₀^T λ(t) dt

11.6.4. Importance Sampling

Let X = (X₁,...,Xₙ) denote a vector of random variables having a joint density function (or joint mass function in the discrete case) f(x) = f(x₁,...,xₙ), and suppose that we are interested in estimating

    θ = E[h(X)] = ∫ h(x)f(x) dx

where the preceding is an n-dimensional integral. (If the X_i are discrete, then interpret the integral as an n-fold summation.)


Suppose that a direct simulation of the random vector X, so as to compute values of h(X), is inefficient, possibly because (a) it is difficult to simulate a random vector having density function f(x), or (b) the variance of h(X) is large, or (c) a combination of (a) and (b).

Another way in which we can use simulation to estimate θ is to note that if g(x) is another probability density such that f(x) = 0 whenever g(x) = 0, then we can express θ as

    θ = ∫ [h(x)f(x)/g(x)] g(x) dx
      = E_g[h(X)f(X)/g(X)]                                  (11.14)

where we have written E_g to emphasize that the random vector X has joint density g(x).

It follows from Equation (11.14) that θ can be estimated by successively generating values of a random vector X having density function g(x) and then using as the estimator the average of the values of h(X)f(X)/g(X). If a density function g(x) can be chosen so that the random variable h(X)f(X)/g(X) has a small variance, then this approach, referred to as importance sampling, can result in an efficient estimator of θ.

Let us now try to obtain a feel for why importance sampling can be useful. To begin, note that f(X) and g(X) represent the respective likelihoods of obtaining the vector X when X is a random vector with respective densities f and g. Hence, if X is distributed according to g, then it will usually be the case that f(X) will be small in relation to g(X), and thus when X is simulated according to g the likelihood ratio f(X)/g(X) will usually be small in comparison to 1. However, it is easy to check that its mean is 1:

    E_g[f(X)/g(X)] = ∫ [f(x)/g(x)] g(x) dx = ∫ f(x) dx = 1

Thus we see that even though f(X)/g(X) is usually smaller than 1, its mean is equal to 1, implying that it is occasionally large and so will tend to have a large variance. So how can h(X)f(X)/g(X) have a small variance?
The answer is that we can sometimes arrange to choose a density g such that those values of x for which f(x)/g(x) is large are precisely the values for which h(x) is exceedingly small, and thus the ratio h(X)f(X)/g(X) is always small. Since this will require that h(x) sometimes be small, importance sampling seems to work best when estimating a small probability; for in this case the function h(x) is equal to 1 when x lies in some set and is equal to 0 otherwise.
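As a toy illustration of this idea (our own example, not from the text): to estimate the small probability P{Z > 4} for a standard normal Z, sample instead from g = N(4, 1). The likelihood ratio is then f(x)/g(x) = e^{8−4x}, which is small exactly where the indicator h(x) = I{x > 4} is 1:

```python
import random, math

random.seed(3)
n = 200_000
est = 0.0
for _ in range(n):
    x = random.gauss(4.0, 1.0)          # sample from g: normal shifted to mean 4
    if x > 4.0:
        est += math.exp(8.0 - 4.0 * x)  # likelihood ratio f(x)/g(x) for N(0,1) vs N(4,1)
est /= n
print(est)   # close to P(Z > 4), which is about 3.17e-5
```

A raw simulation would see roughly 3 successes per 100,000 runs; here about half the samples land in the target set, each contributing a small, well-behaved likelihood-ratio weight.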


We will now consider how to select an appropriate density g. We will find that the so-called tilted densities are useful. Let

    M(t) = E_f[e^{tX}] = ∫ e^{tx} f(x) dx

be the moment generating function corresponding to a one-dimensional density f.

Definition 11.2 A density function

    f_t(x) = e^{tx} f(x) / M(t)

is called a tilted density of f, −∞ < t < ∞.
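Two standard instances, easy to verify from the definition (the Bernoulli case is also used later in this section): tilting an Exponential(λ) density gives an Exponential(λ − t) density for t < λ, and tilting a Bernoulli(p) mass function gives a Bernoulli with parameter pe^t/(pe^t + 1 − p). A quick numeric check of the exponential case:

```python
import math

# Tilted density of an Exponential(lam): f_t(x) = e^{tx} f(x) / M(t),
# with M(t) = lam / (lam - t) for t < lam.  The claim is that f_t is
# again exponential, with rate lam - t.
lam, t = 2.0, 0.5
M = lam / (lam - t)

def f(x):
    return lam * math.exp(-lam * x)

def f_tilted(x):
    return math.exp(t * x) * f(x) / M

for x in [0.1, 1.0, 3.0]:
    direct = (lam - t) * math.exp(-(lam - t) * x)   # Exponential(lam - t) density
    assert abs(f_tilted(x) - direct) < 1e-12
print("tilting Exponential(2.0) by t = 0.5 gives Exponential(1.5)")
```

So for t > 0 the tilted density shifts mass toward larger values of x, which is exactly what is wanted when estimating a right-tail probability.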


In certain situations the quantity of interest is the sum of the independent random variables X₁,...,Xₙ. In this case the joint density f is the product of one-dimensional densities. That is,

    f(x₁,...,xₙ) = f₁(x₁)···fₙ(xₙ)

where f_i is the density function of X_i. In this situation it is often useful to generate the X_i according to their tilted densities, with a common choice of t employed.

Example 11.23 Let X₁,...,Xₙ be independent random variables having respective probability density (or mass) functions f_i, for i = 1,...,n. Suppose we are interested in approximating the probability that their sum is at least as large as a, where a is much larger than the mean of the sum. That is, we are interested in

    θ = P{S ≥ a}

where S = Σ_{i=1}^n X_i, and where a > Σ_{i=1}^n E[X_i]. Letting I{S ≥ a} equal 1 if S ≥ a and letting it be 0 otherwise, we have that

    θ = E_f[I{S ≥ a}]

where f = (f₁,...,fₙ). Suppose now that we simulate X_i according to the tilted mass function f_{i,t}, i = 1,...,n, with the value of t, t > 0, left to be determined. The importance sampling estimator of θ would then be

    θ̂ = I{S ≥ a} ∏ f_i(X_i)/f_{i,t}(X_i)

Now,

    f_i(X_i)/f_{i,t}(X_i) = M_i(t) e^{−tX_i}

and so

    θ̂ = I{S ≥ a} M(t) e^{−tS}

where M(t) = ∏ M_i(t) is the moment generating function of S. Since t > 0 and I{S ≥ a} is equal to 0 when S < a, it follows that

    θ̂ ≤ M(t) e^{−ta}


To make the bound on the estimator as small as possible, we thus choose t, t > 0, to minimize M(t)e^{−ta}. In doing so, we will obtain an estimator whose value on each iteration is between 0 and min_t M(t)e^{−ta}. It can be shown that the minimizing t, call it t*, is such that

    E_{t*}[S] = E_{t*}[Σ_{i=1}^n X_i] = a

where, in the preceding, we mean that the expected value is to be taken under the assumption that the distribution of X_i is f_{i,t*} for i = 1,...,n.

For instance, suppose that X₁,...,Xₙ are independent Bernoulli random variables having respective parameters p_i, for i = 1,...,n. Then, if we generate the X_i according to their tilted mass functions p_{i,t}, i = 1,...,n, the importance sampling estimator of θ = P{S ≥ a} is

    θ̂ = I{S ≥ a} e^{−tS} ∏_{i=1}^n (p_i e^t + 1 − p_i)

Since p_{i,t} is the mass function of a Bernoulli random variable with parameter p_i e^t/(p_i e^t + 1 − p_i), it follows that

    E_t[Σ_{i=1}^n X_i] = Σ_{i=1}^n p_i e^t / (p_i e^t + 1 − p_i)

The value of t that makes the preceding equal to a can be numerically approximated and then utilized in the simulation.

As an illustration, suppose that n = 20, p_i = 0.4, and a = 16. Then

    E_t[S] = 20 · 0.4e^t / (0.4e^t + 0.6)

Setting this equal to 16 yields, after a little algebra,

    e^{t*} = 6

Thus, if we generate the Bernoullis using the parameter

    0.4e^{t*} / (0.4e^{t*} + 0.6) = 0.8

then because

    M(t*) = (0.4e^{t*} + 0.6)²⁰  and  e^{−t*S} = (1/6)^S


we see that the importance sampling estimator is

    θ̂ = I{S ≥ 16} (1/6)^S 3²⁰

It follows from the preceding that

    θ̂ ≤ (1/6)¹⁶ 3²⁰ = 81/2¹⁶ = 0.001236

That is, on each iteration the value of the estimator is between 0 and 0.001236. Since, in this case, θ is the probability that a binomial random variable with parameters 20, 0.4 is at least 16, it can be explicitly computed with the result θ = 0.000317. Hence, the raw simulation estimator I, which on each iteration takes the value 0 if the sum of the Bernoullis with parameter 0.4 is less than 16 and takes the value 1 otherwise, will have variance

    Var(I) = θ(1 − θ) = 3.169 × 10⁻⁴

On the other hand, it follows from the fact that 0 ≤ θ̂ ≤ 0.001236 that (see Exercise 33)

    Var(θ̂) ≤ 2.9131 × 10⁻⁷

Example 11.24 Consider a single-server queue in which the times between successive customer arrivals have density function f and the service times have density g. Let D_n denote the amount of time that the nth arrival spends waiting in queue, and suppose we are interested in estimating α = P{D_n ≥ a} when a is much larger than E[D_n]. Rather than generating the successive interarrival and service times according to f and g, respectively, they should be generated according to the densities f_{−t} and g_t, where t is a positive number to be determined. Note that using these distributions as opposed to f and g will result in smaller interarrival times (since −t < 0) and larger service times (since t > 0), and hence in a greater chance that D_n > a than if we had simulated using the densities f and g. The importance sampling estimator of α would then be

    α̂ = I{D_n > a} e^{t(S_n − Y_n)} [M_f(−t) M_g(t)]ⁿ

where S_n is the sum of the first n interarrival times, Y_n is the sum of the first n service times, and M_f and M_g are the moment generating functions of the densities f and g, respectively. The value of t used should be determined by experimenting with a variety of different choices.
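The earlier Bernoulli illustration (n = 20, p_i = 0.4, a = 16, e^{t*} = 6) can be checked numerically. The sketch below simulates S under the tilted parameter 0.8 and compares the importance sampling estimate with the exact binomial tail:

```python
import random, math
from math import comb

random.seed(4)
n, p, a = 20, 0.4, 16
t_star = math.log(6.0)               # solves E_t[S] = a, as derived above
p_tilt = p * math.exp(t_star) / (p * math.exp(t_star) + 1 - p)   # = 0.8
M = (p * math.exp(t_star) + 1 - p) ** n                          # = 3**20

reps, total = 100_000, 0.0
for _ in range(reps):
    s = sum(random.random() < p_tilt for _ in range(n))  # S under tilted masses
    if s >= a:
        total += M * math.exp(-t_star * s)               # = (1/6)**s * 3**20
theta_hat = total / reps

exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))
print(theta_hat, exact)    # both about 3.17e-4
```

Every nonzero contribution lies below the bound 81/2¹⁶ ≈ 0.001236, which is what keeps the estimator's variance so far below θ(1 − θ).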


11.7. Determining the Number of Runs

Suppose that we are going to use simulation to generate r independent and identically distributed random variables Y⁽¹⁾,...,Y⁽ʳ⁾ having mean μ and variance σ². We are then going to use

    Ȳ_r = (Y⁽¹⁾ + ··· + Y⁽ʳ⁾)/r

as an estimate of μ. The precision of this estimate can be measured by its variance

    Var(Ȳ_r) = E[(Ȳ_r − μ)²] = σ²/r

Hence we would want to choose r, the number of necessary runs, large enough so that σ²/r is acceptably small. However, the difficulty is that σ² is not known in advance. To get around this, you should initially simulate k runs (where k ≥ 30) and then use the simulated values Y⁽¹⁾,...,Y⁽ᵏ⁾ to estimate σ² by the sample variance

    Σ_{i=1}^k (Y⁽ⁱ⁾ − Ȳ_k)² / (k − 1)

Based on this estimate of σ², the value of r that attains the desired level of precision can now be determined and an additional r − k runs can be generated.

11.8. Coupling from the Past

Consider an irreducible Markov chain with states 1,...,m and transition probabilities P_{i,j}, and suppose we want to generate the value of a random variable whose distribution is that of the stationary distribution of this Markov chain. Whereas we could approximately generate such a random variable by arbitrarily choosing an initial state, simulating the resulting Markov chain for a large fixed number of time periods, and then choosing the final state as the value of the random variable, we will now present a procedure that generates a random variable whose distribution is exactly that of the stationary distribution.

If, in theory, we generated the Markov chain starting at time −∞ in any arbitrary state, then the state at time 0 would have the stationary distribution. So imagine that we do this, and suppose that a different person is to generate the next state at each of these times. Thus, if X(−n), the state at time −n, is i, then


person −n would generate a random variable that is equal to j with probability P_{i,j}, j = 1,...,m, and the value generated would be the state at time −(n − 1).

Now suppose that person −1 wants to do his random variable generation early. Because he does not know what the state at time −1 will be, he generates a sequence of random variables N_{−1}(i), i = 1,...,m, where N_{−1}(i), the next state if X(−1) = i, is equal to j with probability P_{i,j}, j = 1,...,m. If it results that X(−1) = i, then person −1 would report that the state at time 0 is

    S_{−1}(i) = N_{−1}(i),  i = 1,...,m

(That is, S_{−1}(i) is the simulated state at time 0 when the simulated state at time −1 is i.)

Now suppose that person −2, hearing that person −1 is doing his simulation early, decides to do the same thing. She generates a sequence of random variables N_{−2}(i), i = 1,...,m, where N_{−2}(i) is equal to j with probability P_{i,j}, j = 1,...,m. Consequently, if it is reported to her that X(−2) = i, then she will report that X(−1) = N_{−2}(i). Combining this with the early generation of person −1 shows that if X(−2) = i, then the simulated state at time 0 is

    S_{−2}(i) = S_{−1}(N_{−2}(i)),  i = 1,...,m

Continuing in the preceding manner, suppose that person −3 generates a sequence of random variables N_{−3}(i), i = 1,...,m, where N_{−3}(i) is to be the generated value of the next state when X(−3) = i. Consequently, if X(−3) = i, then the simulated state at time 0 would be

    S_{−3}(i) = S_{−2}(N_{−3}(i)),  i = 1,...,m

Now suppose we continue the preceding, and so obtain the simulated functions

    S_{−1}(i), S_{−2}(i), S_{−3}(i), ...,  i = 1,...,m

Going backward in time in this manner, we will at some time, say −r, have a simulated function S_{−r}(i) that is a constant function. That is, for some state j, S_{−r}(i) will equal j for all states i = 1,...,m. But this means that no matter what the simulated values from time −∞ to −r, we can be certain that the simulated value at time 0 is j.
Consequently, j can be taken as the value of a generated random variable whose distribution is exactly that of the stationary distribution of the Markov chain.


Example 11.25 Consider a Markov chain with states 1, 2, 3, and suppose that simulation yielded the values

    N_{−1}(1) = 3,   N_{−1}(2) = 2,   N_{−1}(3) = 2

and

    N_{−2}(1) = 1,   N_{−2}(2) = 3,   N_{−2}(3) = 1

Then

    S_{−2}(1) = 3,   S_{−2}(2) = 2,   S_{−2}(3) = 3

If

    N_{−3}(1) = 3,   N_{−3}(2) = 1,   N_{−3}(3) = 1

then

    S_{−3}(1) = 3,   S_{−3}(2) = 3,   S_{−3}(3) = 3

Therefore, no matter what the state is at time −3, the state at time 0 will be 3.

Remark The procedure developed in this section for generating a random variable whose distribution is the stationary distribution of the Markov chain is called coupling from the past.
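The backward composition S_{−n}(i) = S_{−(n−1)}(N_{−n}(i)) translates directly into code. A sketch for a small chain (the transition matrix is an arbitrary illustrative choice; practical implementations usually double the backward window rather than step one period at a time, but the one-step version matches the text):

```python
import random

def cftp(P, seed):
    """Coupling from the past for a finite chain with transition matrix P.

    Each round, 'person -n' pre-generates a next-state map N; composing it
    on the inside gives S_{-n}(i) = S_{-(n-1)}(N(i)), and we stop as soon
    as S_{-n} is a constant function.
    """
    rng = random.Random(seed)
    m = len(P)
    S = list(range(m))                    # S_0: the identity map
    while len(set(S)) > 1:
        N = [rng.choices(range(m), weights=P[i])[0] for i in range(m)]
        S = [S[N[i]] for i in range(m)]   # S_{-n}(i) = S_{-(n-1)}(N_{-n}(i))
    return S[0]                           # exact draw from the stationary distribution

P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
samples = [cftp(P, seed=s) for s in range(4000)]
freq = [samples.count(j) / len(samples) for j in range(3)]
print(freq)   # near the stationary distribution (1/4, 1/2, 1/4) of this chain
```

Note the composition order: the freshly generated map N is applied first (it is earlier in time), and the previously built map S is applied to its output.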


Exercises

*1. Suppose it is relatively easy to simulate from the distributions F_i, i = 1, 2,...,n. If n is small, how can we simulate from

    F(x) = Σ_{i=1}^n P_i F_i(x),  P_i ≥ 0,  Σ_i P_i = 1?

Give a method for simulating from

    F(x) = (1 − e^{−2x} + 2x)/3,  0 < x < 1
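Exercise 1 is the composition method: first generate an index I with P{I = i} = P_i, then draw from F_I. A sketch, using as illustration a two-component mixture of an Exponential(2) and a Uniform(0, 1) with weights 1/3 and 2/3 (one reading of the displayed F):

```python
import random, math

def simulate_mixture(samplers, probs, rng):
    """Composition method: pick component i with probability probs[i], then sample it."""
    u, cum = rng.random(), 0.0
    for prob, draw in zip(probs, samplers):
        cum += prob
        if u <= cum:
            return draw()
    return samplers[-1]()   # guard against floating-point round-off

rng = random.Random(6)
samplers = [lambda: -math.log(rng.random()) / 2.0,  # Exponential(2), inverse transform
            lambda: rng.random()]                   # Uniform(0, 1)
xs = [simulate_mixture(samplers, [1/3, 2/3], rng) for _ in range(50_000)]
print(sum(xs) / len(xs))   # (1/3)(1/2) + (2/3)(1/2) = 0.5
```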


Figure 11.6.

(c) Give two methods for simulating from the distribution F(x) = xⁿ, 0 < x < 1.
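Two natural methods for part (c): inverse transform, X = U^{1/n}; and X = max of n independent uniforms, since P{max ≤ x} = xⁿ. Both can be checked against E[X] = n/(n + 1):

```python
import random

random.seed(7)
n, reps = 5, 40_000

inverse = [random.random() ** (1.0 / n) for _ in range(reps)]       # F^{-1}(u) = u^{1/n}
maxima  = [max(random.random() for _ in range(n)) for _ in range(reps)]

# Both samples have distribution F(x) = x^n on (0, 1), with mean n/(n + 1).
print(sum(inverse) / reps, sum(maxima) / reps)   # both near 5/6
```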


10. Explain how we can number the Q⁽ᵏ⁾ in the alias method so that k is one of the two points to which Q⁽ᵏ⁾ gives weight.

Hint: Rather than name the initial Q, Q⁽¹⁾, what else could we call it?

11. Complete the details of Example 11.10.

12. Let X₁,...,X_k be independent with

    P{X_i = j} = 1/n,  j = 1,...,n,  i = 1,...,k

If D is the number of distinct values among X₁,...,X_k, show that

    E[D] = n[1 − ((n − 1)/n)ᵏ] ≈ k − k²/(2n)  when k²/(2n) is small

13. The Discrete Rejection Method: Suppose we want to simulate X having probability mass function P{X = i} = P_i, i = 1,...,n, and suppose we can easily simulate from the probability mass function Q_i, Σ_i Q_i = 1, Q_i ≥ 0. Let C be such that P_i ≤ CQ_i, i = 1,...,n. Show that the following algorithm generates the desired random variable:

Step 1: Generate Y having mass function Q and U an independent random number.
Step 2: If U ≤ P_Y/CQ_Y, set X = Y. Otherwise return to step 1.

14. The Discrete Hazard Rate Method: Let X denote a nonnegative integer valued random variable. The function λ(n) = P{X = n | X ≥ n}, n ≥ 0, is called the discrete hazard rate function.

(a) Show that P{X = n} = λ(n) ∏_{i=0}^{n−1} (1 − λ(i)).
(b) Show that we can simulate X by generating random numbers U₁, U₂,..., stopping at X = min{n: Uₙ ≤ λ(n)}.
(c) Apply this method to simulating a geometric random variable. Explain, intuitively, why it works.
(d) Suppose that λ(n) ≤ p


and let

    X = min{S_k : U_k ≤ λ(S_k)/p}

15. Suppose you have just simulated a normal random variable X with mean μ and variance σ². Give an easy way to generate a second normal variable with the same mean and variance that is negatively correlated with X.

*16. Suppose n balls having weights w₁, w₂,...,wₙ are in an urn. These balls are sequentially removed in the following manner: At each selection, a given ball in the urn is chosen with a probability equal to its weight divided by the sum of the weights of the other balls that are still in the urn. Let I₁, I₂,...,Iₙ denote the order in which the balls are removed; thus I₁,...,Iₙ is a random permutation with weights.

(a) Give a method for simulating I₁,...,Iₙ.
(b) Let X_i be independent exponentials with rates w_i, i = 1,...,n. Explain how the X_i can be utilized to simulate I₁,...,Iₙ.

17. Order Statistics: Let X₁,...,Xₙ be i.i.d. from a continuous distribution F, and let X₍ᵢ₎ denote the ith smallest of X₁,...,Xₙ, i = 1,...,n. Suppose we want to simulate X₍₁₎ < X₍₂₎ < ··· < X₍ₙ₎.


(e) Use part (d) to show that U₍₁₎,...,U₍ₙ₎ can be generated as follows:

Step 1: Generate random numbers U₁,...,Uₙ.
Step 2: Set

    U₍ₙ₎ = U₁^{1/n},
    U₍ₙ₋₁₎ = U₍ₙ₎ (U₂)^{1/(n−1)},
    U₍ⱼ₋₁₎ = U₍ⱼ₎ (U_{n−j+2})^{1/(j−1)},  j = 2,...,n − 1

18. Let X₁,...,Xₙ be independent exponential random variables each having rate 1. Set

    W₁ = X₁/n,
    W_i = W_{i−1} + X_i/(n − i + 1),  i = 2,...,n

Explain why W₁,...,Wₙ has the same joint distribution as the order statistics of a sample of n exponentials each having rate 1.

19. Suppose we want to simulate a large number n of independent exponentials with rate 1, call them X₁, X₂,...,Xₙ. If we were to employ the inverse transform technique we would require one logarithmic computation for each exponential generated. One way to avoid this is to first simulate Sₙ, a gamma random variable with parameters (n, 1) (say, by the method of Section 11.3.3). Now interpret Sₙ as the time of the nth event of a Poisson process with rate 1 and use the result that, given Sₙ, the set of the first n − 1 event times is distributed as the set of n − 1 independent uniform (0, Sₙ) random variables. Based on this, explain why the following algorithm simulates n independent exponentials:

Step 1: Generate Sₙ, a gamma random variable with parameters (n, 1).
Step 2: Generate n − 1 random numbers U₁, U₂,...,U_{n−1}.
Step 3: Order the U_i, i = 1,...,n − 1 to obtain U₍₁₎ < ··· < U₍ₙ₋₁₎.
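The final step of Exercise 19's algorithm is cut off by a page break; presumably the n gaps determined by the ordered uniforms on (0, Sₙ) are returned as the exponentials. A sketch under that assumption (here Sₙ is generated by summing logs just to keep the sketch self-contained; the exercise's point is that a direct gamma generator avoids those logs):

```python
import random, math

random.seed(8)
n = 1000

# Step 1: S_n ~ Gamma(n, 1).
S = sum(-math.log(random.random()) for _ in range(n))

# Steps 2-3: n - 1 ordered uniforms partition (0, S) into n gaps.
u = sorted(random.random() for _ in range(n - 1))
times = [0.0] + [S * v for v in u] + [S]
gaps = [b - a for a, b in zip(times, times[1:])]  # assumed final step: return the gaps

print(sum(gaps) / n)   # sample mean of the n "exponentials": close to 1
```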


P{interarrival time = k} = p(1 − p)^{k−1}, k = 1, 2,.... Suppose events occur at times i₁ <


(a) Show that ∫₀^{X₁} λ(t) dt is exponential with rate 1.
(b) Show that ∫_{X_{i−1}}^{X_i} λ(t) dt, i ≥ 1, are independent exponentials with rate 1, where X₀ = 0.

In words, independent of the past, the additional amount of hazard that must be experienced until an event occurs is exponential with rate 1.

24. Give an efficient method for simulating a nonhomogeneous Poisson process with intensity function

    λ(t) = b + 1/(t + a),  t ≥ 0

25. Let (X, Y) be uniformly distributed in a circle of radius r about the origin. That is, their joint density is given by

    f(x, y) = 1/(πr²),  0 ≤ x² + y² ≤ r²

Let R = √(X² + Y²) and θ = arctan(Y/X) denote their polar coordinates. Show that R and θ are independent, with θ being uniform on (0, 2π) and P{R < a} = a²/r², 0 < a < r.


Figure 11.7.

(a) Show that E[bI] = ∫₀¹ g(x) dx.
(b) Show that Var(bI) ≥ Var(g(U)), and so hit-miss has larger variance than simply computing g of a random number.

29. Stratified Sampling: Let U₁,...,Uₙ be independent random numbers and set Ū_i = (U_i + i − 1)/n, i = 1,...,n. Hence, Ū_i, i ≥ 1, is uniform on ((i − 1)/n, i/n). Σ_{i=1}^n g(Ū_i)/n is called the stratified sampling estimator of ∫₀¹ g(x) dx.

(a) Show that E[Σ_{i=1}^n g(Ū_i)/n] = ∫₀¹ g(x) dx.
(b) Show that Var[Σ_{i=1}^n g(Ū_i)/n] ≤ Var[Σ_{i=1}^n g(U_i)/n].

Hint: Let U be uniform (0, 1) and define N by N = i if (i − 1)/n < U < i/n.
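Exercise 29's claim is easy to see numerically. The sketch below compares the plain and stratified estimators of ∫₀¹ sin x dx = 1 − cos 1 (the integrand is an illustrative choice):

```python
import random, math

n, reps = 100, 400
g = math.sin                       # integrand on (0, 1); true integral = 1 - cos(1)

def plain(rng):
    return sum(g(rng.random()) for _ in range(n)) / n

def stratified(rng):               # bar-U_i = (U_i + i - 1)/n is uniform on its stratum
    return sum(g((rng.random() + i) / n) for i in range(n)) / n

rng = random.Random(9)
p = [plain(rng) for _ in range(reps)]
s = [stratified(rng) for _ in range(reps)]

def var(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

print(sum(s) / reps, var(s) < var(p))   # near 1 - cos(1); stratified variance is smaller
```

The reduction is dramatic: within each stratum the integrand barely varies, so the stratified estimator's variance shrinks like 1/n³ rather than 1/n.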


If we use simulation to estimate E[Dₙ], should we

(a) use the simulated data to determine Dₙ, which is then used as an estimate of E[Dₙ]; or
(b) use the simulated data to determine Wₙ and then use this quantity minus μ as an estimate of E[Dₙ]?

Repeat if we want to estimate E[Wₙ].

*32. Show that if X and Y have the same distribution then

    Var((X + Y)/2) ≤ Var(X)

Hence, conclude that the use of antithetic variables can never increase variance (though it need not be as efficient as generating an independent set of random numbers).

33. If 0 ≤ X ≤ a, show that

(a) E[X²] ≤ aE[X],
(b) Var(X) ≤ E[X](a − E[X]),
(c) Var(X) ≤ a²/4.

34. Suppose in Example 11.19 that no new customers are allowed in the system after time t₀. Give an efficient simulation estimator of the expected additional time after t₀ until the system becomes empty.

References

1. J. Banks and J. Carson, "Discrete Event System Simulation," Prentice Hall, Englewood Cliffs, New Jersey, 1984.
2. G. Fishman, "Principles of Discrete Event Simulation," John Wiley, New York, 1978.
3. D. Knuth, "Semi Numerical Algorithms," Vol. 2 of The Art of Computer Programming, Second Edition, Addison-Wesley, Reading, Massachusetts, 1981.
4. A. Law and W. Kelton, "Simulation Modelling and Analysis," Second Edition, McGraw-Hill, New York, 1992.
5. S. Ross, "Simulation," Fourth Edition, Academic Press, San Diego, 2006.
6. R. Rubenstein, "Simulation and the Monte Carlo Method," John Wiley, New York, 1981.




Appendix: Solutions to Starred Exercises

Chapter 1

2. S = {(r, g), (r, b), (g, r), (g, b), (b, r), (b, g)} where, for instance, (r, g) means that the first marble drawn was red and the second one green. The probability of each one of these outcomes is 1/6.

4. If he wins, he only wins $1; if he loses, he loses $3.

9. F = E ∪ FE^c, implying, since E and FE^c are disjoint, that P(F) = P(E) + P(FE^c).

17. P{end} = 1 − P{continue} = 1 − [Prob(H, H, H) + Prob(T, T, T)]

    Fair coin:   P{end} = 1 − [(1/2)(1/2)(1/2) + (1/2)(1/2)(1/2)] = 3/4
    Biased coin: P{end} = 1 − [(1/4)(1/4)(1/4) + (3/4)(3/4)(3/4)] = 9/16


19. E = event at least one six.

    P(E) = (number of ways to get E)/(number of sample points) = 11/36

D = event two faces are different.

    P(D) = 1 − P(two faces the same) = 1 − 6/36 = 5/6
    P(E|D) = P(ED)/P(D) = (10/36)/(5/6) = 1/3

25. (a) P{pair} = P{second card is same denomination as first} = 3/51

(b) P{pair | different suits} = P{pair, different suits}/P{different suits}
                              = P{pair}/P{different suits}
                              = (3/51)/(39/51) = 1/13

27. P(E₁) = 1

    P(E₂|E₁) = 39/51

since 12 cards are in the ace of spades pile and 39 are not.

    P(E₃|E₁E₂) = 26/50

since 24 cards are in the piles of the two aces and 26 are in the other two piles.

    P(E₄|E₁E₂E₃) = 13/49

So

    P{each pile has an ace} = (39/51)(26/50)(13/49)
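Solution 27's product can be checked by direct simulation of the four piles (a sanity check, not part of the original solution):

```python
import random

random.seed(10)
deck = ["A"] * 4 + ["x"] * 48          # four aces in a 52-card deck
reps, hits = 20_000, 0
for _ in range(reps):
    random.shuffle(deck)
    # deal four piles of 13; success if every pile contains exactly one ace
    if all(deck[13 * k : 13 * (k + 1)].count("A") == 1 for k in range(4)):
        hits += 1
print(hits / reps)   # near (39/51)(26/50)(13/49), about 0.1055
```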


30. (a) P{George | exactly 1 hit} = P{George, not Bill}/P{exactly 1}
    = P{G, not B}/[P{G, not B} + P{B, not G}]
    = (0.4)(0.3)/[(0.4)(0.3) + (0.7)(0.6)]
    = 2/9

(b) P{G | hit} = P{G, hit}/P{hit} = P{G}/P{hit} = 0.4/[1 − (0.3)(0.6)] = 20/41

32. Let E_i = event person i selects own hat.

    P(no one selects own hat) = 1 − P(E₁ ∪ E₂ ∪ ··· ∪ Eₙ)
    = 1 − [Σ_{i₁} P(E_{i₁}) − Σ_{i₁<i₂} P(E_{i₁}E_{i₂}) + ··· + (−1)^{n+1} P(E₁E₂···Eₙ)]


40. (a) F = event fair coin flipped; U = event two-headed coin flipped.

    P(F|H) = P(H|F)P(F) / [P(H|F)P(F) + P(H|U)P(U)]
           = (1/2 · 1/2) / (1/2 · 1/2 + 1 · 1/2) = (1/4)/(3/4) = 1/3

(b) P(F|HH) = P(HH|F)P(F) / [P(HH|F)P(F) + P(HH|U)P(U)]
            = (1/4 · 1/2) / (1/4 · 1/2 + 1 · 1/2) = (1/8)/(5/8) = 1/5

(c) P(F|HHT) = P(HHT|F)P(F) / [P(HHT|F)P(F) + P(HHT|U)P(U)]
             = P(HHT|F)P(F) / [P(HHT|F)P(F) + 0] = 1

since the fair coin is the only one that can show tails.

45. Let B_i = event ith ball is black; R_i = event ith ball is red.

    P(B₁|R₂) = P(R₂|B₁)P(B₁) / [P(R₂|B₁)P(B₁) + P(R₂|R₁)P(R₁)]
             = [r/(b+r+c)][b/(b+r)] / { [r/(b+r+c)][b/(b+r)] + [(r+c)/(b+r+c)][r/(b+r)] }
             = rb / [rb + (r + c)r]
             = b/(b + r + c)

48. Let C be the event that the randomly chosen family owns a car, and let H be the event that the randomly chosen family owns a house.

    P(CH^c) = P(C) − P(CH) = 0.6 − 0.2 = 0.4

and

    P(C^c H) = P(H) − P(CH) = 0.3 − 0.2 = 0.1

giving the result

    P(CH^c) + P(C^c H) = 0.5


Chapter 2

4. (i) 1, 2, 3, 4, 5, 6.
(ii) 1, 2, 3, 4, 5, 6.
(iii) 2, 3,...,11, 12.
(iv) −5, −4,...,4, 5.

11. (4 choose 2)(1/2)²(1/2)² = 3/8.

16. 1 − (0.95)⁵² − 52(0.95)⁵¹(0.05).

23. In order for X to equal n, the first n − 1 flips must have r − 1 heads, and then the nth flip must land heads. By independence the desired probability is thus

    (n−1 choose r−1) p^{r−1}(1 − p)^{n−r} × p

27. P{same number of heads} = Σ_i P{A = i, B = i}
    = Σ_i (k choose i)(1/2)ᵏ (n−k choose i)(1/2)^{n−k}
    = Σ_i (k choose i)(n−k choose i)(1/2)ⁿ
    = Σ_i (k choose k−i)(n−k choose i)(1/2)ⁿ
    = (n choose k)(1/2)ⁿ

Another argument is as follows:

    P{# heads of A = # heads of B}
    = P{# tails of A = # heads of B}   since coin is fair
    = P{k − # heads of A = # heads of B}
    = P{k = total # heads}

38. c = 2, P{X > 2} = ∫₂^∞ 2e^{−2x} dx = e^{−4}
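The combinatorial identity in Solution 27 (Vandermonde's identity, after replacing the heads of A by its tails) can be spot-checked:

```python
from math import comb

# If A flips k fair coins and B flips n - k, then
# P{A and B get the same number of heads} = C(n, k) / 2^n.
n, k = 10, 4
lhs = sum(comb(k, i) * comb(n - k, i) for i in range(k + 1)) / 2 ** n
rhs = comb(n, k) / 2 ** n
print(lhs, rhs)   # both 0.205078125
```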


47. Let X_i be 1 if trial i is a success and 0 otherwise.

(a) The largest value is 0.6. If X₁ = X₂ = X₃, then

    1.8 = E[X] = 3E[X₁] = 3P{X₁ = 1}

and so P{X = 3} = P{X₁ = 1} = 0.6. That this is the largest value is seen by Markov's inequality, which yields that

    P{X ≥ 3} ≤ E[X]/3 = 0.6

(b) The smallest value is 0. To construct a probability scenario for which P{X = 3} = 0, let U be a uniform random variable on (0, 1), and define

    X₁ = 1 if U ≤ 0.6, 0 otherwise
    X₂ = 1 if U ≥ 0.4, 0 otherwise
    X₃ = 1 if either U ≤ 0.3 or U ≥ 0.7, 0 otherwise

It is easy to see that P{X₁ = X₂ = X₃ = 1} = 0.

49. E[X²] − (E[X])² = Var(X) = E[(X − E[X])²] ≥ 0. Equality when Var(X) = 0, that is, when X is constant.

64. See Section 5.2.3 of Chapter 5. Another way is to use moment generating functions. The moment generating function of the sum of n independent exponentials with rate λ is equal to the product of their moment generating functions. That is, it is [λ/(λ − t)]ⁿ. But this is precisely the moment generating function of a gamma with parameters n and λ.

70. Let X_i be Poisson with mean 1. Then

    P{Σ_{i=1}^n X_i ≤ n} = e^{−n} Σ_{k=0}^n nᵏ/k!

But for n large, Σ_{i=1}^n X_i − n has approximately a normal distribution with mean 0, and so the result follows.
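Solution 70's sum can be evaluated directly; it approaches 1/2 from above as the normal approximation takes over:

```python
import math

# e^{-n} * sum_{k=0}^{n} n^k / k!  ->  1/2 as n grows, by the CLT applied
# to a sum of n independent Poisson(1) random variables.
def poisson_cdf_at_n(n):
    term, total = 1.0, 1.0          # k = 0 term of n^k / k!
    for k in range(1, n + 1):
        term *= n / k
        total += term
    return math.exp(-n) * total

for n in [10, 100, 500]:
    print(n, poisson_cdf_at_n(n))   # decreasing toward 1/2
```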


72. For the matching problem, letting X = X₁ + ··· + X_N, where

    X_i = 1 if ith man selects his own hat, 0 otherwise

we obtain

    Var(X) = Σ_{i=1}^N Var(X_i) + 2 ΣΣ_{i<j} Cov(X_i, X_j)

Since P{X_i = 1} = 1/N, we see

    Var(X_i) = (1/N)(1 − 1/N)

Also


Chapter 3

2. Intuitively it would seem that the first head would be equally likely to occur on any of trials 1,...,n − 1. That is, it is intuitive that

    P{X₁ = i | X₁ + X₂ = n} = 1/(n − 1),  i = 1,...,n − 1

Formally,

    P{X₁ = i | X₁ + X₂ = n} = P{X₁ = i, X₁ + X₂ = n}/P{X₁ + X₂ = n}
    = P{X₁ = i, X₂ = n − i}/P{X₁ + X₂ = n}
    = p(1 − p)^{i−1} p(1 − p)^{n−i−1} / [(n−1 choose 1) p²(1 − p)^{n−2}]
    = 1/(n − 1)

In the preceding, the next to last equality uses the independence of X₁ and X₂ to evaluate the numerator and the fact that X₁ + X₂ has a negative binomial distribution to evaluate the denominator.

6. p_{X|Y}(1 | 3) = P{X = 1, Y = 3}/P{Y = 3}
    = P{1 white, 3 black, 2 red}/P{3 black}
    = [6!/(1!3!2!)] (3/14)(5/14)³(6/14)² / { [6!/(3!3!)] (5/14)³(9/14)³ }
    = 4/9

    p_{X|Y}(0 | 3) = 8/27
    p_{X|Y}(2 | 3) = 2/9


    p_{X|Y}(3 | 3) = 1/27
    E[X | Y = 1] = 5/3

13. The conditional density of X given that X > 1 is

    f_{X|X>1}(x) = f(x)/P{X > 1} = λe^{−λx}/e^{−λ},  x > 1

    E[X | X > 1] = e^λ ∫₁^∞ xλe^{−λx} dx = 1 + 1/λ

by integration by parts. This latter result also follows immediately by the lack of memory property of the exponential.

19. ∫ E[X | Y = y] f_Y(y) dy = ∫∫ x f_{X|Y}(x | y) dx f_Y(y) dy
    = ∫∫ x [f(x, y)/f_Y(y)] f_Y(y) dx dy
    = ∫ x ∫ f(x, y) dy dx
    = ∫ x f_X(x) dx
    = E[X]

23. Let X denote the first time a head appears. Let us obtain an equation for E[N | X] by conditioning on the next two flips after X. This gives

    E[N | X] = E[N | X, h, h]p² + E[N | X, h, t]pq + E[N | X, t, h]pq + E[N | X, t, t]q²

where q = 1 − p. Now

    E[N | X, h, h] = X + 1,   E[N | X, h, t] = X + 1
    E[N | X, t, h] = X + 2,   E[N | X, t, t] = X + 2 + E[N]

Substituting back gives

    E[N | X] = (X + 1)(p² + pq) + (X + 2)pq + (X + 2 + E[N])q²

Taking expectations, and using the fact that X is geometric with mean 1/p, we obtain

    E[N] = 1 + p + q + 2pq + q²/p + 2q² + q²E[N]


Solving for E[N] yields

    E[N] = (2 + 2q + q²/p) / (1 − q²)

47. E[X²Y² | X] = X²E[Y² | X] ≥ X²(E[Y | X])² = X²

The inequality follows since for any random variable U, E[U²] ≥ (E[U])², and this remains true when conditioning on some other random variable X. Taking expectations of the preceding shows that

    E[(XY)²] ≥ E[X²]

As

    E[XY] = E[E[XY | X]] = E[XE[Y | X]] = E[X]

the results follow.

53. P{X = n} = ∫₀^∞ P{X = n | λ} e^{−λ} dλ
    = ∫₀^∞ [e^{−λ}λⁿ/n!] e^{−λ} dλ
    = ∫₀^∞ e^{−2λ} λⁿ/n! dλ
    = (1/2)^{n+1} ∫₀^∞ e^{−t} tⁿ/n! dt
    = (1/2)^{n+1}

The result follows since ∫₀^∞ e^{−t}tⁿ dt = Γ(n + 1) = n!.

60. (a) Intuitive that f(p) is increasing in p, since the larger p is, the greater is the advantage of going first.
(b) 1.
(c) 1/2, since the advantage of going first becomes nil.
(d) Condition on the outcome of the first flip:

    f(p) = P{I wins | h}p + P{I wins | t}(1 − p)
         = p + [1 − f(p)](1 − p)

Therefore,

    f(p) = 1/(2 − p)

67. Part (a) is proven by noting that a run of j successive heads can occur within the first n flips in two mutually exclusive ways. Either there is a run of j successive heads within the first n − 1 flips; or there is no run of j successive heads within the first n − j − 1 flips, flip n − j is not a head, and flips n − j + 1 through n are all heads.

Let A be the event that a run of j successive heads occurs within the first n, (n ≥ j), flips. Conditioning on X, the trial number of the first non-head, gives the following:

    P_j(n) = Σ_k P(A | X = k) p^{k−1}(1 − p)
    = Σ_{k=1}^j P(A | X = k) p^{k−1}(1 − p) + Σ_{k=j+1}^∞ P(A | X = k) p^{k−1}(1 − p)
    = Σ_{k=1}^j P_j(n − k) p^{k−1}(1 − p) + Σ_{k=j+1}^∞ p^{k−1}(1 − p)
    = Σ_{k=1}^j P_j(n − k) p^{k−1}(1 − p) + p^j

73. Condition on the value of the sum prior to going over 100. In all cases the most likely value is 101. (For instance, if this sum is 98 then the final sum is equally likely to be either 101, 102, 103, or 104. If the sum prior to going over is 95, then the final sum is 101 with certainty.)

93. (a) By symmetry, for any value of (T₁,...,T_m), the random vector (I₁,...,I_m) is equally likely to be any of the m! permutations.

(b) E[N] = Σ_{i=1}^m E[N | X = i] P{X = i}
    = (1/m) Σ_{i=1}^m E[N | X = i]
    = (1/m) [ Σ_{i=1}^{m−1} (E[T_i] + E[N]) + E[T_{m−1}] ]

where the final equality used the independence of X and the T_i. Therefore,

    E[N] = E[T_{m−1}] + Σ_{i=1}^{m−1} E[T_i]


(c) E[T_i] = Σ_{j=1}^i m/(m + 1 − j)

(d) E[N] = Σ_{j=1}^{m−1} m/(m + 1 − j) + Σ_{i=1}^{m−1} Σ_{j=1}^i m/(m + 1 − j)
    = Σ_{j=1}^{m−1} m/(m + 1 − j) + Σ_{j=1}^{m−1} Σ_{i=j}^{m−1} m/(m + 1 − j)
    = Σ_{j=1}^{m−1} m/(m + 1 − j) + Σ_{j=1}^{m−1} m(m − j)/(m + 1 − j)
    = Σ_{j=1}^{m−1} [ m/(m + 1 − j) + m(m − j)/(m + 1 − j) ]
    = m(m − 1)

Chapter 4

1. P₀₁ = 1, P₁₀ = 1/9, P₂₁ = 4/9, P₃₂ = 1
   P₁₁ = 4/9, P₂₂ = 4/9
   P₁₂ = 4/9, P₂₃ = 1/9

4. Let the state space be S = {0, 1, 2, 0̄, 1̄, 2̄}, where state i (ī) signifies that the present value is i, and the present day is even (odd).

16. If P_ij were (strictly) positive, then P^n_{ji} would be 0 for all n (otherwise, i and j would communicate). But then the process, starting in i, has a positive probability of at least P_ij of never returning to i. This contradicts the recurrence of i. Hence P_ij = 0.

21. The transition probabilities are

    P_{i,j} = 1 − 3α, if j = i;  α, if j ≠ i


By symmetry,

   P^n_{ij} = (1/3)(1 − P^n_{ii}), j ≠ i

So, let us prove by induction that

   P^n_{i,j} = 1/4 + (3/4)(1 − 4α)^n,  if j = i
   P^n_{i,j} = 1/4 − (1/4)(1 − 4α)^n,  if j ≠ i

As the preceding is true for n = 1, assume it for n. To complete the induction proof, we need to show that

   P^{n+1}_{i,j} = 1/4 + (3/4)(1 − 4α)^{n+1},  if j = i
   P^{n+1}_{i,j} = 1/4 − (1/4)(1 − 4α)^{n+1},  if j ≠ i

Now,

   P^{n+1}_{i,i} = P^n_{i,i} P_{i,i} + Σ_{j≠i} P^n_{i,j} P_{j,i}
                = (1/4 + (3/4)(1 − 4α)^n)(1 − 3α) + 3(1/4 − (1/4)(1 − 4α)^n)α
                = 1/4 + (3/4)(1 − 4α)^n (1 − 3α − α)
                = 1/4 + (3/4)(1 − 4α)^{n+1}

By symmetry, for j ≠ i

   P^{n+1}_{ij} = (1/3)(1 − P^{n+1}_{ii}) = 1/4 − (1/4)(1 − 4α)^{n+1}

and the induction is complete.

By letting n → ∞ in the preceding, or by using that the transition probability matrix is doubly stochastic, or by just using a symmetry argument, we obtain that π_i = 1/4, i = 1, 2, 3, 4.
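The closed form of the induction can be spot-checked by raising the 4x4 transition matrix to a power; α = 0.1 and the helper below are illustrative choices:

```python
def mat_mult(A, B):
    """Multiply two square matrices given as lists of lists."""
    k = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]

alpha = 0.1
# Exercise 21's chain: stay put w.p. 1 - 3*alpha, move to each other state w.p. alpha.
P = [[1 - 3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]

Pn = P
for _ in range(4):            # compute P^5
    Pn = mat_mult(Pn, P)

n = 5
diag = 0.25 + 0.75 * (1 - 4 * alpha) ** n
off = 0.25 - 0.25 * (1 - 4 * alpha) ** n
print(Pn[0][0], diag)   # matrix power agrees with the closed form
print(Pn[0][1], off)
```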


27. The limiting probabilities are obtained from

   π_0 = (1/9)π_1,
   π_1 = π_0 + (4/9)π_1 + (4/9)π_2,
   π_2 = (4/9)π_1 + (4/9)π_2 + π_3,
   π_0 + π_1 + π_2 + π_3 = 1

and the solution is π_0 = π_3 = 1/20, π_1 = π_2 = 9/20.

32. With the state being the number of on switches this is a three-state Markov chain. The equations for the long-run proportions are

   π_0 = (1/16)π_0 + (1/4)π_1 + (9/16)π_2,
   π_1 = (3/8)π_0 + (1/2)π_1 + (3/8)π_2,
   π_0 + π_1 + π_2 = 1

This gives the solution

   π_0 = 2/7, π_1 = 3/7, π_2 = 2/7

41. (a) The number of transitions into state i by time n, the number of transitions originating from state i by time n, and the number of time periods the chain is in state i by time n, all differ by at most 1. Thus, their long-run proportions must be equal.
(b) π_i P_ij is the long-run proportion of transitions that go from state i to state j.
(c) Σ_i π_i P_ij is the long-run proportion of transitions that are into state j.
(d) Since π_j is also the long-run proportion of transitions that are into state j, it follows that π_j = Σ_i π_i P_ij.

47. {Y_n, n ≥ 1} is a Markov chain with states (i, j), and

   P_{(i,j),(k,l)} = 0,      if j ≠ k
                     P_{jl}, if j = k

where P_{jl} is the transition probability for {X_n}.

   lim_{n→∞} P{Y_n = (i, j)} = lim_n P{X_n = i, X_{n+1} = j}
                             = lim_n [P{X_n = i} P_ij]
                             = π_i P_ij
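The stated solution of Exercise 27 can be verified against its balance equations with exact rational arithmetic; a small sketch:

```python
from fractions import Fraction as F

# Stated limiting probabilities for Exercise 27: pi_0 = pi_3 = 1/20, pi_1 = pi_2 = 9/20.
pi = [F(1, 20), F(9, 20), F(9, 20), F(1, 20)]

# Each balance equation from the solution, checked exactly.
assert pi[0] == F(1, 9) * pi[1]
assert pi[1] == pi[0] + F(4, 9) * pi[1] + F(4, 9) * pi[2]
assert pi[2] == F(4, 9) * pi[1] + F(4, 9) * pi[2] + pi[3]
assert sum(pi) == 1
print("balance equations satisfied")
```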


62. (a) Since π_i = 1/5 is equal to the inverse of the expected number of transitions to return to state i, it follows that the expected number of steps to return to the original position is 5.

(b) Condition on the first transition. Suppose it is to the right. In this case the probability is just the probability that a gambler who always bets 1 and wins each bet with probability p will, when starting with 1, reach 4 before going broke. By the gambler’s ruin problem this probability is equal to

   (1 − q/p) / (1 − (q/p)^4)

Similarly, if the first move is to the left then the problem is again the same gambler’s ruin problem but with p and q reversed. The desired probability is thus

   (p − q)/(1 − (q/p)^4) + (q − p)/(1 − (p/q)^4)

68. (a) Σ_i π_i Q_ij = Σ_i π_j P_ji = π_j Σ_i P_ji = π_j

(b) Whether perusing the sequence of states in the forward direction of time or in the reverse direction, the proportion of time the state is i will be the same.

Chapter 5

7. The desired probability is

   P{X_1 = t, X_2 > t} / [ P{X_1 = t, X_2 > t} + P{X_2 = t, X_1 > t} ]
   = f_1(t)[1 − F_2(t)] / { f_1(t)[1 − F_2(t)] + f_2(t)[1 − F_1(t)] }

Dividing through by [1 − F_1(t)][1 − F_2(t)] yields the result. (Of course, f_i and F_i are the density and distribution function of X_i, i = 1, 2.) To make the preceding derivation rigorous, we should replace “= t” by “∈ (t, t + ε)” throughout and then let ε → 0.
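The gambler's-ruin probability quoted in part (b) can be cross-checked by solving the first-step equations numerically; a sketch with an illustrative p = 0.6 (function names are mine):

```python
def reach_4_before_0(p):
    """Closed form from the solution: a gambler starting with 1, winning each
    unit bet w.p. p, reaches 4 before going broke."""
    q = 1 - p
    return (1 - q / p) / (1 - (q / p) ** 4)

def reach_4_before_0_iter(p, sweeps=20000):
    """Same probability from the first-step equations h(i) = p*h(i+1) + q*h(i-1),
    h(0) = 0, h(4) = 1, solved by fixed-point (Gauss-Seidel) iteration."""
    q = 1 - p
    h = [0.0, 0.0, 0.0, 0.0, 1.0]
    for _ in range(sweeps):
        for i in (1, 2, 3):
            h[i] = p * h[i + 1] + q * h[i - 1]
    return h[1]

p = 0.6
print(reach_4_before_0(p))        # 27/65 = 0.4153846...
print(reach_4_before_0_iter(p))   # same value from the linear system
```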


10. (a) E[MX | M = X] = E[M² | M = X] = E[M²] = 2/(λ + μ)²

(b) By the memoryless property of exponentials, given that M = Y, X is distributed as M + X′ where X′ is an exponential with rate λ that is independent of M. Therefore,

   E[MX | M = Y] = E[M(M + X′)]
                = E[M²] + E[M]E[X′]
                = 2/(λ + μ)² + 1/(λ(λ + μ))

(c) E[MX] = E[MX | M = X] λ/(λ + μ) + E[MX | M = Y] μ/(λ + μ)
          = (2λ + μ)/(λ(λ + μ)²)

Therefore,

   Cov(X, M) = (2λ + μ)/(λ(λ + μ)²) − 1/(λ(λ + μ)) = 1/(λ + μ)²

18. (a) 1/(2μ).
(b) 1/(4μ²), since the variance of an exponential is its mean squared.
(c) and (d). By the lack of memory property of the exponential it follows that A, the amount by which X_(2) exceeds X_(1), is exponentially distributed with rate μ and is independent of X_(1). Therefore,

   E[X_(2)] = E[X_(1) + A] = 1/(2μ) + 1/μ
   Var(X_(2)) = Var(X_(1) + A) = 1/(4μ²) + 1/μ² = 5/(4μ²)

23. (a) 1/2.
(b) (1/2)^{n−1}. Whenever battery 1 is in use and a failure occurs the probability is 1/2 that it is not battery 1 that has failed.
(c) (1/2)^{n−i+1}, i > 1.


(d) T is the sum of n − 1 independent exponentials with rate 2μ (since each time a failure occurs the time until the next failure is exponential with rate 2μ).
(e) Gamma with parameters n − 1 and 2μ.

36. E[S(t) | N(t) = n] = s E[ Π_{i=1}^{N(t)} X_i | N(t) = n ]
                      = s E[ Π_{i=1}^{n} X_i | N(t) = n ]
                      = s E[ Π_{i=1}^{n} X_i ]
                      = s (E[X])^n
                      = s (1/μ)^n

Thus,

   E[S(t)] = s Σ_n (1/μ)^n e^{−λt} (λt)^n / n!
           = s e^{−λt} Σ_n (λt/μ)^n / n!
           = s e^{−λt + λt/μ}

By the same reasoning

   E[S²(t) | N(t) = n] = s² (E[X²])^n = s² (2/μ²)^n

and

   E[S²(t)] = s² e^{−λt + 2λt/μ²}

40. The easiest way is to use Definition 3.1. It is easy to see that {N(t), t ≥ 0} will also possess stationary and independent increments. Since the sum of two independent Poisson random variables is also Poisson, it follows that N(t) is a Poisson random variable with mean (λ_1 + λ_2)t.

57. (a) e^{−2}.
(b) 2 P.M.
(c) 1 − 5e^{−4}.
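The formula E[S(t)] = s e^{−λt + λt/μ} of Exercise 36 can be checked by simulation; a sketch with arbitrary sample parameters:

```python
import random, math

# Monte Carlo check of E[S(t)] = s * exp(-lam*t + lam*t/mu) for
# S(t) = s * prod_{i <= N(t)} X_i, with N(t) Poisson(lam*t) and X_i exponential(mu).
random.seed(42)
s, lam, mu, t = 1.0, 2.0, 3.0, 1.5

def poisson(mean):
    """Generate a Poisson variate by counting exponential interarrival times."""
    total, count = 0.0, 0
    while True:
        total += random.expovariate(1.0)
        if total > mean:
            return count
        count += 1

runs = 200_000
acc = 0.0
for _ in range(runs):
    n = poisson(lam * t)
    prod = s
    for _ in range(n):
        prod *= random.expovariate(mu)   # exponential with rate mu, mean 1/mu
    acc += prod
estimate = acc / runs
exact = s * math.exp(-lam * t + lam * t / mu)
print(estimate, exact)   # should agree to within roughly 1%
```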


60. (a) 1/9.
(b) 5/9.

64. (a) Since, given N(t), each arrival is uniformly distributed on (0, t) it follows that

   E[X | N(t)] = N(t) ∫_0^t (t − s) (ds/t) = N(t) t/2

(b) Let U_1, U_2, ... be independent uniform (0, t) random variables. Then

   Var(X | N(t) = n) = Var( Σ_{i=1}^{n} (t − U_i) ) = n Var(U_i) = n t²/12

(c) By parts (a) and (b) and the conditional variance formula,

   Var(X) = Var( N(t) t/2 ) + E[ N(t) t²/12 ]
          = (t²/4) λt + λt · t²/12 = λt³/3

79. Consider a Poisson process with rate λ in which an event at time t is counted with probability λ(t)/λ independently of the past. Clearly such a process will have independent increments. In addition,

   P{2 or more counted events in (t, t + h)} ≤ P{2 or more events in (t, t + h)} = o(h)

and

   P{1 counted event in (t, t + h)}
     = P{1 counted | 1 event} P(1 event) + P{1 counted | ≥ 2 events} P{≥ 2 events}
     = ( ∫_t^{t+h} λ(s) ds / (λh) ) (λh + o(h)) + o(h)
     = (λ(t)/λ) λh + o(h)
     = λ(t)h + o(h)
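The result Var(X) = λt³/3 of Exercise 64 is easy to confirm by simulating the Poisson arrivals directly; a sketch:

```python
import random

# Monte Carlo check that X = sum over arrivals S_i in (0, t) of (t - S_i),
# for a rate-lam Poisson process, has mean lam*t^2/2 and variance lam*t^3/3.
random.seed(7)
lam, t = 2.0, 1.0
runs = 200_000
samples = []
for _ in range(runs):
    x, clock = 0.0, random.expovariate(lam)
    while clock < t:
        x += t - clock
        clock += random.expovariate(lam)
    samples.append(x)
mean = sum(samples) / runs
var = sum((x - mean) ** 2 for x in samples) / runs
print(mean, var)   # roughly lam*t^2/2 = 1.0 and lam*t^3/3 = 0.667
```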


84. There is a record whose value is between t and t + dt if the first X larger than t lies between t and t + dt. From this we see that, independent of all record values less than t, there will be one between t and t + dt with probability λ(t) dt where λ(t) is the failure rate function given by

   λ(t) = f(t)/(1 − F(t))

Since the counting process of record values has, by the preceding, independent increments we can conclude (since there cannot be multiple record values because the X_i are continuous) that it is a nonhomogeneous Poisson process with intensity function λ(t). When f is the exponential density, λ(t) = λ and so the counting process of record values becomes an ordinary Poisson process with rate λ.

91. To begin, note that

   P{X_1 > Σ_{i=2}^{n} X_i}
     = P{X_1 > X_2} P{X_1 − X_2 > X_3 | X_1 > X_2}
       × P{X_1 − X_2 − X_3 > X_4 | X_1 > X_2 + X_3} ···
       × P{X_1 − X_2 − ··· − X_{n−1} > X_n | X_1 > X_2 + ··· + X_{n−1}}
     = (1/2)^{n−1}   by lack of memory

Hence,

   P{ M > Σ_{i=1}^{n} X_i − M } = Σ_{i=1}^{n} P{ X_i > Σ_{j≠i} X_j } = n/2^{n−1}

Chapter 6

2. Let N_A(t) be the number of organisms in state A and let N_B(t) be the number of organisms in state B. Then {N_A(t), N_B(t)} is a continuous-time Markov chain with

   v_{(n,m)} = αn + βm
   P_{(n,m),(n−1,m+1)} = αn/(αn + βm)
   P_{(n,m),(n+2,m−1)} = βm/(αn + βm)


4. Let N(t) denote the number of customers in the station at time t. Then {N(t)} is a birth and death process with

   λ_n = λα_n, μ_n = μ

7. (a) Yes!
(b) For n = (n_1, ..., n_i, n_{i+1}, ..., n_{k−1}) let

   S_i(n) = (n_1, ..., n_i − 1, n_{i+1} + 1, ..., n_{k−1}), i = 1, ..., k − 2
   S_{k−1}(n) = (n_1, ..., n_i, n_{i+1}, ..., n_{k−1} − 1),
   S_0(n) = (n_1 + 1, ..., n_i, n_{i+1}, ..., n_{k−1}).

Then

   q_{n,S_i(n)} = n_i μ, i = 1, ..., k − 1
   q_{n,S_0(n)} = λ

11. (b) Follows from the hint about using the lack of memory property and the fact that ε_i, the minimum of j − (i − 1) independent exponentials with rate λ, is exponential with rate (j − i + 1)λ.

(c) From parts (a) and (b)

   P{T_1 + ··· + T_j ≤ t} = P{ max_{1≤i≤j} X_i ≤ t } = (1 − e^{−λt})^j

(d) With all probabilities conditional on X(0) = 1,

   P_1j(t) = P{X(t) = j}
           = P{X(t) ≥ j} − P{X(t) ≥ j + 1}
           = P{T_1 + ··· + T_j ≤ t} − P{T_1 + ··· + T_{j+1} ≤ t}

(e) The sum of i independent geometrics, each having parameter p = e^{−λt}, is a negative binomial with parameters i, p. The result follows since starting with an initial population of i is equivalent to having i independent Yule processes, each starting with a single individual.

16. Let the state be
   2: an acceptable molecule is attached
   0: no molecule attached
   1: an unacceptable molecule is attached.


Then this is a birth and death process with balance equations

   μ_1 P_1 = λ(1 − α)P_0
   μ_2 P_2 = λα P_0

Since Σ_{i=0}^{2} P_i = 1, we get

   P_2 = [ 1 + μ_2/(λα) + ((1 − α)/α)(μ_2/μ_1) ]^{−1}
       = λαμ_1 / (λαμ_1 + μ_1μ_2 + λ(1 − α)μ_2)

where P_2 is the percentage of time the site is occupied by an acceptable molecule. The percentage of time the site is occupied by an unacceptable molecule is

   P_1 = ((1 − α)/α)(μ_2/μ_1) P_2 = λ(1 − α)μ_2 / (λαμ_1 + μ_1μ_2 + λ(1 − α)μ_2)

19. There are four states. Let state 0 mean that no machines are down, state 1 that machine 1 is down and 2 is up, state 2 that machine 1 is up and 2 is down, and 3 that both machines are down. The balance equations are as follows:

   (λ_1 + λ_2)P_0 = μ_1P_1 + μ_2P_2
   (μ_1 + λ_2)P_1 = λ_1P_0
   (λ_1 + μ_2)P_2 = λ_2P_0 + μ_1P_3
   μ_1P_3 = λ_2P_1 + λ_1P_2
   P_0 + P_1 + P_2 + P_3 = 1

The equations are easily solved and the proportion of time machine 2 is down is P_2 + P_3.

24. We will let the state be the number of taxis waiting. Then we get a birth and death process with λ_n = 1, μ_n = 2. This is an M/M/1. Therefore:

(a) Average number of taxis waiting = 1/(μ − λ) = 1/(2 − 1) = 1.

(b) The proportion of arriving customers that gets taxis is the proportion of arriving customers that find at least one taxi waiting. The rate of arrival of such customers is 2(1 − P_0). The proportion of such arrivals is therefore

   2(1 − P_0)/2 = 1 − P_0 = 1 − (1 − λ/μ) = λ/μ = 1/2
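The closed forms for P_1 and P_2 in Exercise 16 can be checked against the balance equations for arbitrary rates; a sketch:

```python
# Check Exercise 16's closed forms against its balance equations,
# using arbitrary sample rates (the numbers are illustrative).
lam, alpha, mu1, mu2 = 1.7, 0.3, 0.9, 2.1

denom = lam * alpha * mu1 + mu1 * mu2 + lam * (1 - alpha) * mu2
P2 = lam * alpha * mu1 / denom
P1 = lam * (1 - alpha) * mu2 / denom
P0 = 1 - P1 - P2

# Balance equations from the solution.
assert abs(mu1 * P1 - lam * (1 - alpha) * P0) < 1e-12
assert abs(mu2 * P2 - lam * alpha * P0) < 1e-12
print(P0, P1, P2)
```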


28. Let P^x_{ij}, v^x_i denote the parameters of the X(t) and P^y_{ij}, v^y_i of the Y(t) process; and let the limiting probabilities be P^x_i, P^y_i, respectively. By independence we have that for the Markov chain {X(t), Y(t)} its parameters are

   v_{(i,l)} = v^x_i + v^y_l
   P_{(i,l),(j,l)} = ( v^x_i / (v^x_i + v^y_l) ) P^x_{ij}
   P_{(i,l),(i,k)} = ( v^y_l / (v^x_i + v^y_l) ) P^y_{lk}

and

   lim_{t→∞} P{(X(t), Y(t)) = (i, j)} = P^x_i P^y_j

Hence, we need to show that

   P^x_i P^y_l v^x_i P^x_{ij} = P^x_j P^y_l v^x_j P^x_{ji}

[That is, the rate from (i, l) to (j, l) equals the rate from (j, l) to (i, l).] But this follows from the fact that the rate from i to j in X(t) equals the rate from j to i; that is,

   P^x_i v^x_i P^x_{ij} = P^x_j v^x_j P^x_{ji}

The analysis is similar in looking at pairs (i, l) and (i, k).

33. Suppose first that the waiting room is of infinite size. Let X_i(t) denote the number of customers at server i, i = 1, 2. Then since each of the M/M/1 processes {X_i(t)} is time reversible, it follows from Exercise 28 that the vector process {(X_1(t), X_2(t)), t ≥ 0} is a time reversible Markov chain. Now the process of interest is just the truncation of this vector process to the set of states A where

   A = {(0, m): m ≤ 4} ∪ {(n, 0): n ≤ 4} ∪ {(n, m): nm > 0, n + m ≤ 5}

Hence, the probability that there are n with server 1 and m with server 2 is

   P_{n,m} = k (λ_1/μ_1)^n (1 − λ_1/μ_1) (λ_2/μ_2)^m (1 − λ_2/μ_2)
           = C (λ_1/μ_1)^n (λ_2/μ_2)^m,  (n, m) ∈ A

The constant C is determined from

   Σ P_{n,m} = 1

where the sum is over all (n, m) in A.


42. (a) The matrix P* can be written as

   P* = I + R/v

and so P*^n_{ij} can be obtained by taking the i, j element of (I + R/v)^n, which gives the result when v = n/t.

(b) Uniformization shows that P_ij(t) = E[P*^N_{ij}], where N is independent of the Markov chain with transition probabilities P*_{ij} and is Poisson distributed with mean vt. Since a Poisson random variable with mean vt has standard deviation (vt)^{1/2}, it follows that for large values of vt it should be near vt. (For instance, a Poisson random variable with mean 10^6 has standard deviation 10^3 and thus will, with high probability, be within 3000 of 10^6.) Hence, since for fixed i and j, P*^m_{ij} should not vary much for values of m about vt where vt is large, it follows that, for large vt,

   E[P*^N_{ij}] ≈ P*^n_{ij},  where n = vt

Chapter 7

3. By the one-to-one correspondence of m(t) and F, it follows that {N(t), t ≥ 0} is a Poisson process with rate 1/2. Hence,

   P{N(5) = 0} = e^{−5/2}

6. (a) Consider a Poisson process having rate λ and say that an event of the renewal process occurs whenever one of the events numbered r, 2r, 3r, ... of the Poisson process occurs. Then

   P{N(t) ≥ n} = P{nr or more Poisson events by t} = Σ_{i=nr}^{∞} e^{−λt} (λt)^i / i!

(b) E[N(t)] = Σ_{n=1}^{∞} P{N(t) ≥ n} = Σ_{n=1}^{∞} Σ_{i=nr}^{∞} e^{−λt} (λt)^i / i!
            = Σ_{i=r}^{∞} Σ_{n=1}^{[i/r]} e^{−λt} (λt)^i / i! = Σ_{i=r}^{∞} [i/r] e^{−λt} (λt)^i / i!


8. (a) The number of replaced machines by time t constitutes a renewal process. The time between replacements equals T, if the lifetime of the new machine is ≥ T; x, if the lifetime of the new machine is x, x < T. Hence,

   E[time between replacements] = ∫_0^T x f(x) dx + T[1 − F(T)]

and the result follows by Proposition 3.1.

(b) The number of machines that have failed in use by time t constitutes a renewal process. The mean time between in-use failures, E[F], can be calculated by conditioning on the lifetime of the initial machine as E[F] = E[E[F | lifetime of initial machine]]. Now

   E[F | lifetime of machine is x] = x,         if x ≤ T
                                    T + E[F],  if x > T

Hence,

   E[F] = ∫_0^T x f(x) dx + (T + E[F])[1 − F(T)]

or

   E[F] = ( ∫_0^T x f(x) dx + T[1 − F(T)] ) / F(T)

and the result follows from Proposition 3.1.

18. We can imagine that a renewal corresponds to a machine failure, and each time a new machine is put in use its life distribution will be exponential with rate μ_1 with probability p, and exponential with rate μ_2 otherwise. Hence, if our state is the index of the exponential life distribution of the machine presently in use, then this is a two-state continuous-time Markov chain with intensity rates

   q_{1,2} = μ_1(1 − p), q_{2,1} = μ_2 p

Hence,

   P_11(t) = ( μ_1(1 − p) / (μ_1(1 − p) + μ_2 p) ) exp{−[μ_1(1 − p) + μ_2 p]t}
             + μ_2 p / (μ_1(1 − p) + μ_2 p)

with similar expressions for the other transition probabilities [P_12(t) = 1 − P_11(t), and P_22(t) is the same with μ_2 p and μ_1(1 − p) switching places]. Conditioning on the initial machine now gives

   E[Y(t)] = pE[Y(t) | X(0) = 1] + (1 − p)E[Y(t) | X(0) = 2]
           = p[ P_11(t)/μ_1 + P_12(t)/μ_2 ] + (1 − p)[ P_21(t)/μ_1 + P_22(t)/μ_2 ]

Finally, we can obtain m(t) from

   μ[m(t) + 1] = t + E[Y(t)]

where

   μ = p/μ_1 + (1 − p)/μ_2

is the mean interarrival time.

22. Cost of a cycle = C_1 + C_2 I − R(T)(1 − I), where

   I = 1, if X < T
       0, if X ≥ T
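The mean time between replacements in Exercise 8(a) above can be checked numerically; for an exponential(λ) lifetime the expression reduces to E[min(X, T)] = (1 − e^{−λT})/λ. A sketch (parameter values are illustrative):

```python
import math

# Numerical check of E[time between replacements] = int_0^T x f(x) dx + T(1 - F(T))
# for an exponential(lam) lifetime, against the closed form (1 - exp(-lam*T))/lam.
lam, T = 0.8, 2.5
f = lambda x: lam * math.exp(-lam * x)
F = lambda x: 1 - math.exp(-lam * x)

n = 100_000                         # midpoint-rule integration of x*f(x) on (0, T)
h = T / n
integral = sum((i + 0.5) * h * f((i + 0.5) * h) * h for i in range(n))
mean_cycle = integral + T * (1 - F(T))
closed_form = (1 - math.exp(-lam * T)) / lam
print(mean_cycle, closed_form)      # agree to about 1e-8
```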


35. (a) We can view this as an M/G/∞ system where a satellite launching corresponds to an arrival and F is the service distribution. Hence,

   P{X(t) = k} = e^{−λ(t)} [λ(t)]^k / k!

where λ(t) = λ ∫_0^t (1 − F(s)) ds.

(b) By viewing the system as an alternating renewal process that is on when there is at least one satellite orbiting, we obtain

   lim P{X(t) = 0} = (1/λ) / (1/λ + E[T])

where T, the on time in a cycle, is the quantity of interest. From part (a)

   lim P{X(t) = 0} = e^{−λμ}

where μ = ∫_0^∞ (1 − F(s)) ds is the mean time that a satellite orbits. Hence,

   e^{−λμ} = (1/λ) / (1/λ + E[T])

so

   E[T] = (1 − e^{−λμ}) / (λ e^{−λμ})

42. (a) F_e(x) = (1/μ) ∫_0^x e^{−y/μ} dy = 1 − e^{−x/μ}.

(b) F_e(x) = (1/c) ∫_0^x dy = x/c, 0 ≤ x ≤ c.

(c) You will receive a ticket if, starting when you park, an official appears within one hour. From Example 7.23 the time until the official appears has the distribution F_e which, by part (b), is the uniform distribution on (0, 2). Thus, the probability is equal to 1/2.

49. Think of each interarrival time as consisting of n independent phases, each of which is exponentially distributed with rate λ, and consider the semi-Markov process whose state at any time is the phase of the present interarrival time. Hence, this semi-Markov process goes from state 1 to 2 to 3 ... to n to 1, and so on. Also the time spent in each state has the same distribution. Thus, clearly the limiting probability of this semi-Markov chain is P_i = 1/n, i = 1, ..., n. To compute lim P{Y(t) <


Chapter 8

2. This problem can be modeled by an M/M/1 queue in which λ = 6, μ = 8. The average cost rate will be

   $10 per hour per machine × average number of broken machines

The average number of broken machines is just L, which can be computed from Equation (3.2):

   L = λ/(μ − λ) = 6/2 = 3

Hence, the average cost rate = $30/hour.

7. To compute W for the M/M/2, set up balance equations as follows:

   λP_0 = μP_1   (each server has rate μ)
   (λ + μ)P_1 = λP_0 + 2μP_2
   (λ + 2μ)P_n = λP_{n−1} + 2μP_{n+1}, n ≥ 2

These have solutions P_n = (ρ^n/2^{n−1}) P_0 where ρ = λ/μ. The boundary condition Σ_{n=0}^{∞} P_n = 1 implies

   P_0 = (1 − ρ/2)/(1 + ρ/2) = (2 − ρ)/(2 + ρ)

Now we have P_n, so we can compute L, and hence W from L = λW:

   L = Σ_{n=0}^{∞} n P_n = ρ P_0 Σ_{n=0}^{∞} n (ρ/2)^{n−1}
     = 2P_0 Σ_{n=0}^{∞} n (ρ/2)^n
     = 2 ((2 − ρ)/(2 + ρ)) (ρ/2)/(1 − ρ/2)²   [See derivation of Eq. (3.2).]
     = 4ρ / ((2 + ρ)(2 − ρ))
     = 4μλ / ((2μ + λ)(2μ − λ))
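The M/M/2 quantities of Exercise 7 can be confirmed by summing the series for L directly; a sketch with illustrative rates:

```python
# Check the M/M/2 formulas: P_n = (rho^n / 2^(n-1)) P_0 and
# L = 4*mu*lam / ((2*mu + lam)*(2*mu - lam)), by summing the series directly.
lam, mu = 3.0, 2.0          # stable since lam < 2*mu
rho = lam / mu

P0 = (2 - rho) / (2 + rho)
P = [P0] + [rho ** n / 2 ** (n - 1) * P0 for n in range(1, 200)]
print(sum(P))               # probabilities sum to 1 (up to truncation error)

L_series = sum(n * p for n, p in enumerate(P))
L_closed = 4 * mu * lam / ((2 * mu + lam) * (2 * mu - lam))
print(L_series, L_closed)   # both 24/7 = 3.4285714...
```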


From L = λW we have

   W = W(M/M/2) = 4μ / ((2μ + λ)(2μ − λ))

The M/M/1 queue with service rate 2μ has

   W(M/M/1) = 1/(2μ − λ)

from Equation (3.3). We assume that in the M/M/1 queue, 2μ > λ so that the queue is stable. But then 4μ > 2μ + λ, or 4μ/(2μ + λ) > 1, which implies W(M/M/2) > W(M/M/1). The intuitive explanation is that if one finds the queue empty in the M/M/2 case, it would do no good to have two servers. One would be better off with one faster server. Now let W_Q^1 = W_Q(M/M/1) and W_Q^2 = W_Q(M/M/2). Then,

   W_Q^1 = W(M/M/1) − 1/(2μ)
   W_Q^2 = W(M/M/2) − 1/μ

So,

   W_Q^1 = λ / (2μ(2μ − λ))

and

   W_Q^2 = λ² / (μ(2μ − λ)(2μ + λ))

Then,

   W_Q^1 > W_Q^2 ⇔ 1/2 > λ/(2μ + λ)

which holds since λ < 2μ.


(b) The system goes from 0 to 1 at rate

   λP_0 = λ(2μ − λ)/(2μ + λ)

The system goes from 2 to 1 at rate

   2μP_2 = λ²(2μ − λ)/(μ(2μ + λ))

(c) Introduce a new state cl to indicate that the stock clerk is checking by himself. The balance equation for P_cl is

   (λ + μ)P_cl = μP_2

Hence

   P_cl = ( μ/(λ + μ) ) P_2 = λ²(2μ − λ) / (2μ(λ + μ)(2μ + λ))

Finally, the proportion of time the stock clerk is checking is

   P_cl + Σ_{n=2}^{∞} P_n = P_cl + λ²/(μ(2μ + λ))

21. (a) λ_1 P_10.
(b) λ_2 (P_0 + P_10).
(c) λ_1 P_10 / [λ_1 P_10 + λ_2 (P_0 + P_10)].
(d) This is equal to the fraction of server 2’s customers that are type 1 multiplied by the proportion of time server 2 is busy. (This is true since the amount of time server 2 spends with a customer does not depend on which type of customer it is.) By (c) the answer is thus

   (P_01 + P_11) λ_1 P_10 / [λ_1 P_10 + λ_2 (P_0 + P_10)]

24. The states are now n, n ≥ 0, and n′, n ≥ 1 where the state is n when there are n in the system and no breakdown, and it is n′ when there are n in the system and a breakdown is in progress. The balance equations are

   λP_0 = μP_1
   (λ + μ + α)P_n = λP_{n−1} + μP_{n+1} + βP_{n′}, n ≥ 1
   (β + λ)P_{1′} = αP_1


   (β + λ)P_{n′} = αP_n + λP_{(n−1)′}, n ≥ 2
   Σ_{n=0}^{∞} P_n + Σ_{n=1}^{∞} P_{n′} = 1

In terms of the solution to the preceding,

   L = Σ_{n=1}^{∞} n(P_n + P_{n′})

and so

   W = L/λ_a = L/λ

28. If a customer leaves the system busy, the time until the next departure is the time of a service. If a customer leaves the system empty, the time until the next departure is the time until an arrival plus the time of a service.

Using moment generating functions we get

   E[e^{sD}] = (λ/μ) E[e^{sD} | system left busy] + (1 − λ/μ) E[e^{sD} | system left empty]
            = (λ/μ)( μ/(μ − s) ) + (1 − λ/μ) E[e^{s(X+Y)}]

where X has the distribution of interarrival times, Y has the distribution of service times, and X and Y are independent. Then

   E[e^{s(X+Y)}] = E[e^{sX} e^{sY}] = E[e^{sX}] E[e^{sY}]   by independence
                = ( λ/(λ − s) )( μ/(μ − s) )

So,

   E[e^{sD}] = (λ/μ)( μ/(μ − s) ) + (1 − λ/μ)( λ/(λ − s) )( μ/(μ − s) )
            = λ/(λ − s)

By the uniqueness of generating functions, it follows that D has an exponential distribution with parameter λ.


36. The distributions of the queue size and busy period are the same for all three disciplines; that of the waiting time is different. However, the means are identical. This can be seen by using W = L/λ, since L is the same for all. The smallest variance in the waiting time occurs under first-come, first-served and the largest under last-come, first-served.

39. (a) a_0 = P_0 due to Poisson arrivals. Assuming that each customer pays 1 per unit time while in service, the cost identity of Equation (2.1) states that

   average number in service = λE[S]

or

   1 − P_0 = λE[S]

(b) Since a_0 is the proportion of arrivals that have service distribution G_1 and 1 − a_0 the proportion having service distribution G_2, the result follows.

(c) We have

   P_0 = E[I]/(E[I] + E[B])

and E[I] = 1/λ and thus,

   E[B] = (1 − P_0)/(λP_0) = E[S]/(1 − λE[S])

Now from parts (a) and (b) we have

   E[S] = (1 − λE[S])E[S_1] + λE[S]E[S_2]

or

   E[S] = E[S_1] / (1 + λE[S_1] − λE[S_2])

Substituting into E[B] = E[S]/(1 − λE[S]) now yields the result.

(d) a_0 = 1/E[C], implying that

   E[C] = (E[S_1] + 1/λ − E[S_2]) / (1/λ − E[S_2])


45. By regarding any breakdowns that occur during a service as being part of that service, we see that this is an M/G/1 model. We need to calculate the first two moments of a service time. Now the time of a service is the time T until something happens (either a service completion or a breakdown) plus any additional time A. Thus,

   E[S] = E[T + A] = E[T] + E[A]

To compute E[A], we condition upon whether the happening is a service or a breakdown. This gives

   E[A] = E[A | service] μ/(μ + α) + E[A | breakdown] α/(μ + α)
        = E[A | breakdown] α/(μ + α)
        = (1/β + E[S]) α/(μ + α)

Since E[T] = 1/(α + μ) we obtain that

   E[S] = 1/(α + μ) + (1/β + E[S]) α/(μ + α)

or

   E[S] = 1/μ + α/(μβ)

We also need E[S²], which is obtained as follows:

   E[S²] = E[(T + A)²] = E[T²] + 2E[AT] + E[A²]
        = E[T²] + 2E[A]E[T] + E[A²]

The independence of A and T follows because the time of the first happening is independent of whether the happening was a service or a breakdown. Now,

   E[A²] = E[A² | breakdown] α/(μ + α)
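The fixed point E[S] = 1/μ + α/(μβ) can be checked by simulating a service subject to breakdowns; a sketch (rates are illustrative, and the restart-after-repair convention is equivalent here since the service clock is exponential):

```python
import random

# Monte Carlo check of E[S] = 1/mu + alpha/(mu*beta) for a service that
# completes at rate mu, breaks down at rate alpha, and after an
# exponential(beta) repair continues afresh.
random.seed(11)
mu, alpha, beta = 1.0, 0.5, 2.0

def service_time():
    total = 0.0
    while True:
        total += random.expovariate(mu + alpha)    # time to next "happening"
        if random.random() < mu / (mu + alpha):    # happening was a completion
            return total
        total += random.expovariate(beta)          # else repair, then continue

runs = 200_000
est = sum(service_time() for _ in range(runs)) / runs
exact = 1 / mu + alpha / (mu * beta)
print(est, exact)   # both close to 1.25
```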


Hence,

   E[A²] = (α/(μ + α)) E[(downtime + S*)²]
        = (α/(μ + α)) { E[down²] + 2E[down]E[S] + E[S²] }
        = (α/(μ + α)) { 2/β² + (2/β)[1/μ + α/(μβ)] + E[S²] }

Hence,

   E[S²] = 2/(μ + α)² + (2/(μ + α)) [ α/(β(μ + α)) + (α/(μ + α))(1/μ + α/(μβ)) ]
           + (α/(μ + α)) { 2/β² + (2/β)[1/μ + α/(μβ)] + E[S²] }

Now solve for E[S²]. The desired answer is

   W_Q = λE[S²] / (2(1 − λE[S]))

In the preceding, S* is the additional service needed after the breakdown is over and S* has the same distribution as S. The preceding also uses the fact that the expected square of an exponential is twice the square of its mean.

Another way of calculating the moments of S is to use the representation

   S = Σ_{i=1}^{N} (T_i + B_i) + T_{N+1}

where N is the number of breakdowns while a customer is in service, T_i is the time starting when service commences for the ith time until a happening occurs, and B_i is the length of the ith breakdown. We now use the fact that, given N, all of the random variables in the representation are independent exponentials with the T_i having rate μ + α and the B_i having rate β. This yields

   E[S | N] = (N + 1)/(μ + α) + N/β
   Var(S | N) = (N + 1)/(μ + α)² + N/β²

Therefore, since 1 + N is geometric with mean (μ + α)/μ [and variance α(α + μ)/μ²] we obtain

   E[S] = 1/μ + α/(μβ)


and, using the conditional variance formula,

   Var(S) = [ 1/(μ + α) + 1/β ]² α(α + μ)/μ² + 1/(μ(μ + α)) + α/(μβ²)

52. S_n is the service time of the nth customer; T_n is the time between the arrival of the nth and (n + 1)st customer.

Chapter 9

4. (a) φ(x) = x_1 max(x_2, x_3, x_4) x_5.
(b) φ(x) = x_1 max(x_2x_4, x_3x_5) x_6.
(c) φ(x) = max(x_1, x_2x_3) x_4.

6. A minimal cut set has to contain at least one component of each minimal path set. There are six minimal cut sets: {1, 5}, {1, 6}, {2, 5}, {2, 3, 6}, {3, 4, 6}, {4, 5}.

12. The minimal path sets are {1, 4}, {1, 5}, {2, 4}, {2, 5}, {3, 4}, {3, 5}. With q_i = 1 − p_i, the reliability function is

   r(p) = P{either of 1, 2, or 3 works} P{either of 4 or 5 works}
        = (1 − q_1q_2q_3)(1 − q_4q_5)

17. E[N²] = E[N² | N > 0] P{N > 0}
         ≥ (E[N | N > 0])² P{N > 0},  since E[X²] ≥ (E[X])²

Thus,

   E[N²] P{N > 0} ≥ (E[N | N > 0] P{N > 0})² = (E[N])²

Let N denote the number of minimal path sets having all of its components functioning. Then r(p) = P{N > 0}. Similarly, if we define N as the number of minimal cut sets having all of its components failed, then 1 − r(p) = P{N > 0}. In both cases we can compute expressions for E[N] and E[N²] by writing N as the sum of indicator (i.e., Bernoulli) random variables. Then we can use the inequality to derive bounds on r(p).

22. (a) F̄_t(a) = P{X > t + a | X > t} = P{X > t + a}/P{X > t} = F̄(t + a)/F̄(t)


(b) Suppose λ(t) is increasing. Recall that

   F̄(t) = e^{−∫_0^t λ(s) ds}

Hence,

   F̄(t + a)/F̄(t) = exp{ −∫_t^{t+a} λ(s) ds }

which decreases in t since λ(t) is increasing. To go the other way, suppose F̄(t + a)/F̄(t) decreases in t. Now when a is small

   F̄(t + a)/F̄(t) ≈ e^{−aλ(t)}

Hence, e^{−aλ(t)} must decrease in t and thus λ(t) increases.

25. For x ≥ ξ,

   1 − p = F̄(ξ) = F̄(x(ξ/x)) ≥ [F̄(x)]^{ξ/x}

since IFRA. Hence, F̄(x) ≤ (1 − p)^{x/ξ} = e^{−θx}.

For x ≤ ξ,

   F̄(x) = F̄(ξ(x/ξ)) ≥ [F̄(ξ)]^{x/ξ}

since IFRA. Hence, F̄(x) ≥ (1 − p)^{x/ξ} = e^{−θx}.

30. r(p) = p_1p_2p_3 + p_1p_2p_4 + p_1p_3p_4 + p_2p_3p_4 − 3p_1p_2p_3p_4

   r(1 − F(t)) = 2(1 − t)²(1 − t/2) + 2(1 − t)(1 − t/2)² − 3(1 − t)²(1 − t/2)²,  0 ≤ t ≤ 1
               = 0,  1 ≤ t ≤ 2

   E[lifetime] = ∫_0^1 [ 2(1 − t)²(1 − t/2) + 2(1 − t)(1 − t/2)² − 3(1 − t)²(1 − t/2)² ] dt
              = 31/60
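The value 31/60 can be confirmed by numerically integrating r(1 − F(t)) over (0, 1); a sketch:

```python
# Numeric check that the expected-lifetime integral in Exercise 30 is 31/60.
def integrand(t):
    return (2 * (1 - t) ** 2 * (1 - t / 2)
            + 2 * (1 - t) * (1 - t / 2) ** 2
            - 3 * (1 - t) ** 2 * (1 - t / 2) ** 2)

n = 200_000
h = 1.0 / n
val = sum(integrand((i + 0.5) * h) for i in range(n)) * h   # midpoint rule
print(val, 31 / 60)   # both 0.516666...
```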


Chapter 10

1. B(s) + B(t) = 2B(s) + B(t) − B(s). Now 2B(s) is normal with mean 0 and variance 4s and B(t) − B(s) is normal with mean 0 and variance t − s. Because B(s) and B(t) − B(s) are independent, it follows that B(s) + B(t) is normal with mean 0 and variance 4s + t − s = 3s + t.

3. E[B(t_1)B(t_2)B(t_3)] = E[E[B(t_1)B(t_2)B(t_3) | B(t_1), B(t_2)]]
   = E[B(t_1)B(t_2)E[B(t_3) | B(t_1), B(t_2)]]
   = E[B(t_1)B(t_2)B(t_2)]
   = E[E[B(t_1)B²(t_2) | B(t_1)]]
   = E[B(t_1)E[B²(t_2) | B(t_1)]]
   = E[B(t_1){(t_2 − t_1) + B²(t_1)}]   (∗)
   = E[B³(t_1)] + (t_2 − t_1)E[B(t_1)]
   = 0

where the equality (∗) follows since given B(t_1), B(t_2) is normal with mean B(t_1) and variance t_2 − t_1. Also, E[B³(t)] = 0 since B(t) is normal with mean 0.

5. P{T_1 <


distribution of X(t) given that X(s) = c, s > t, is normal with mean

   σ ( t(c − μs)/(σs) ) + μt = (c − μs)t/s + μt

and variance

   σ² t(s − t)/s

19. Since knowing the value of Y(t) is equivalent to knowing B(t), we have

   E[Y(t) | Y(u), 0 ≤ u ≤ s] = e^{−c²t/2} E[e^{cB(t)} | B(u), 0 ≤ u ≤ s]
                            = e^{−c²t/2} E[e^{cB(t)} | B(s)]

Now, given B(s), the conditional distribution of B(t) is normal with mean B(s) and variance t − s. Using the formula for the moment generating function of a normal random variable we see that

   e^{−c²t/2} E[e^{cB(t)} | B(s)] = e^{−c²t/2} e^{cB(s) + (t−s)c²/2}
                                 = e^{−c²s/2} e^{cB(s)}
                                 = Y(s)

Thus {Y(t)} is a Martingale.

   E[Y(t)] = E[Y(0)] = 1

20. By the Martingale stopping theorem

   E[B(T)] = E[B(0)] = 0

However, B(T) = 2 − 4T and so 2 − 4E[T] = 0, or E[T] = 1/2.

24. It follows from the Martingale stopping theorem and the result of Exercise 18 that

   E[B²(T) − T] = 0

where T is the stopping time given in this problem and

   B(t) = (X(t) − μt)/σ
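The Martingale property of Exercise 19 implies E[Y(t)] = 1, which is easy to confirm by simulation since B(t) is simply a N(0, t) draw; a sketch:

```python
import random, math

# Monte Carlo check that Y(t) = exp(c*B(t) - c^2*t/2) has E[Y(t)] = 1.
random.seed(3)
c, t = 0.8, 2.0
runs = 400_000
acc = 0.0
for _ in range(runs):
    B_t = random.gauss(0.0, math.sqrt(t))        # B(t) ~ N(0, t)
    acc += math.exp(c * B_t - c * c * t / 2)
mean_Y = acc / runs
print(mean_Y)   # close to 1.0
```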


Therefore,

   E[ (X(T) − μT)²/σ² − T ] = 0

However, X(T) = x and so the preceding gives that

   E[(x − μT)²] = σ² E[T]

But, from Exercise 21, E[T] = x/μ and so the preceding is equivalent to

   Var(μT) = σ² x/μ   or   Var(T) = σ² x/μ³

27. E[X(a²t)/a] = (1/a)E[X(a²t)] = 0. For s < t,

   Cov(X(a²s)/a, X(a²t)/a) = (1/a²) Cov(X(a²s), X(a²t)) = (1/a²) a²s = s


Chapter 11

1. (a) Let U be a random number. If Σ_{j=1}^{i−1} P_j < U ≤ Σ_{j=1}^{i} P_j, then simulate from F_i.
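The selection step of part (a) is the usual discrete inverse-transform idea; a minimal sketch (the distribution P below is an arbitrary example):

```python
import random

# One uniform U picks index i with probability P_i via the partial sums of the P_j.
random.seed(5)
P = [0.2, 0.5, 0.3]

def pick_index(P):
    u, acc = random.random(), 0.0
    for i, p in enumerate(P):
        acc += p
        if u <= acc:
            return i
    return len(P) - 1          # guard against floating-point round-off

counts = [0, 0, 0]
for _ in range(100_000):
    counts[pick_index(P)] += 1
freqs = [c / 100_000 for c in counts]
print(freqs)   # close to [0.2, 0.5, 0.3]
```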


Another way is to let

   X_j = 1, if the jth acceptable item is in the sample
         0, otherwise

and then simulate X_1, ..., X_N by generating random numbers U_1, ..., U_N and then setting

   X_j = 1, if U_j < ( n − Σ_{i=1}^{j−1} X_i ) / ( N + M − (j − 1) )
         0, otherwise

and X = Σ_{j=1}^{N} X_j then has the desired distribution.

The former method is preferable when n ≫ N and the latter when N ≫ n.

6. Let

   c(λ) = max_x { f(x) / (λe^{−λx}) }
        = ( 2/(λ√(2π)) ) max_x exp{ −x²/2 + λx }
        = ( 2/(λ√(2π)) ) exp{ λ²/2 }

Hence,

   (d/dλ)c(λ) = √(2/π) exp{ λ²/2 } [ 1 − 1/λ² ]

Hence (d/dλ)c(λ) = 0 when λ = 1 and it is easy to check that this yields the minimal value of c(λ).

16. (a) They can be simulated in the same sequential fashion in which they are defined. That is, first generate the value of a random variable I_1 such that

   P{I_1 = i} = w_i / Σ_{j=1}^{n} w_j, i = 1, ..., n

Then, if I_1 = k, generate the value of I_2 where

   P{I_2 = i} = w_i / Σ_{j≠k} w_j, i ≠ k

and so on. However, the approach given in part (b) is more efficient.

(b) Let I_j denote the index of the jth smallest X_i.


23. Let m(t) = ∫_0^t λ(s) ds, and let m^{−1}(t) be the inverse function. That is, m(m^{−1}(t)) = t.

(a) P{m(X_1) > x} = P{X_1 > m^{−1}(x)}
                 = P{N(m^{−1}(x)) = 0}
                 = e^{−m(m^{−1}(x))}
                 = e^{−x}

(b) P{m(X_i) − m(X_{i−1}) > x | m(X_1), ..., m(X_{i−1}) − m(X_{i−2})}
     = P{m(X_i) − m(X_{i−1}) > x | X_1, ..., X_{i−1}}
     = P{m(X_i) − m(X_{i−1}) > x | X_{i−1}}
     = P{m(X_i) − m(X_{i−1}) > x | m(X_{i−1})}

Now,

   P{m(X_i) − m(X_{i−1}) > x | X_{i−1} = y}
     = P{ ∫_y^{X_i} λ(t) dt > x | X_{i−1} = y }
     = P{X_i > c | X_{i−1} = y}   where ∫_y^c λ(t) dt = x
     = P{N(c) − N(y) = 0 | X_{i−1} = y}
     = P{N(c) − N(y) = 0}
     = exp{ −∫_y^c λ(t) dt }
     = e^{−x}

32. Var[(X + Y)/2] = (1/4)[ Var(X) + Var(Y) + 2 Cov(X, Y) ]
                  = [ Var(X) + Cov(X, Y) ] / 2

Now it is always true that

   Cov(V, W) / √(Var(V) Var(W)) ≤ 1

and so when X and Y have the same distribution, Cov(X, Y) ≤ Var(X).
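The variance-reduction conclusion of Exercise 32 can be seen concretely with antithetic uniforms, here estimating E[e^U]; a sketch:

```python
import random, math

# Antithetic variates: pairing (U, 1-U) gives Var[(X+Y)/2] = [Var(X) + Cov(X,Y)]/2,
# which beats independent pairs because Cov(e^U, e^(1-U)) < 0.
random.seed(9)
pairs = 100_000

def variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

anti, indep = [], []
for _ in range(pairs):
    u, v = random.random(), random.random()
    anti.append((math.exp(u) + math.exp(1 - u)) / 2)
    indep.append((math.exp(u) + math.exp(v)) / 2)
print(variance(anti) < variance(indep))   # True
```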




Index

Absorbing state of a Markov chain, 187
Accessibility of states in a Markov chain, 193–194
Age of the renewal process, 447
Algorithms, analyzing probabilistic, 224–230
Alias method in simulation, 688–692
Aloha protocol, 201–204
Alternating renewal process, 445–446
Aperiodic state of a Markov chain, 204
Arbitrage defined, 635
Arbitrage theorem, 635–638
Balance equations, 386, 500
Ballot problem, 128–130, 175
Bayes’ formula, 12–15
Bernoulli random variables, 28–29, 39, 50, 56–57, 286, 716
  independent, 123–124
Best prize problem, 124–126
Beta distribution, 681–682
Beta random variable, 63
  relation to gamma, 62–64
  simulation of, 671, 681–682
Binomial random variables, 29–31, 39, 50
  simulating, 686
  sums of independent, 69–70
  variance of, 55–56
Binomials, negative, 164–165
Birth and death model, 365
Birth and death processes, 368–375, 384
  ergodic, 394
Bivariate exponential distribution, 363
Bivariate normal distribution, 167
Black–Scholes option pricing formula, 638–644
Bonferroni’s inequality, 16
Bonus Malus
  automobile insurance system, 216–217
  system, 188–189
Boole’s inequality, 16
Bose–Einstein statistics, 147–151
Box–Muller approach, 678
Branching processes, 233–236
Bridge
  Brownian, 647
  structure, 587
  system defined, 576
Brown, Robert, 626
Brownian bridge, 647
Brownian motion
  integrated, 648
  standard, 627
  variations on, 631–632
Brownian motion and stationary processes, 625–662
  Gaussian processes, 646–649
  harmonic analysis of weakly stationary processes, 654–657
  hitting times and gambler’s ruin problem, 629–630
  maximum variable and gambler’s ruin problem, 629–630
  pricing stock options, 632–644
  stationary and weakly stationary processes, 649–654
  variations on Brownian motion, 631–632
  white noise, 644–646
Busy period, 338, 530–531
Cayley’s theorem, 590
Central limit theorem, 79, 82
Central limit theorem for renewal processes, 432–433


Chapman–Kolmogorov equations, 189–193, 380
Chebyshev’s inequality, 77–78
Chi-squared distribution, 681
Chi-squared random variable with degree of freedom, 75
Class of Markov chain, 193
Closed class of a Markov chain, 266
Communications with Markov chain, 194
Complement of event, 3
Compound Poisson process, 337–342
Compound Poisson random variable, 120
Compound random variables
  defined, 108
  identities, 159
  variances of, 119–120
Compound random variables, identity for, 158–165
  binomial compounding distribution, 163
  compounding distribution related to negative binomials, 164–165
  Poisson compounding distribution, 161–162
Conditional expectation, 98, 102–103, 105–106
Conditional expectation, conditional probability and, 97–184
  computing expectations by conditioning, 105–120
  computing probabilities by conditioning, 120–137
  continuous case, 102–105
  discrete case, 97–102
  identity for compound random variables, 158–165
  miscellaneous applications, 137–158
Conditional or mixed Poisson processes, 343–345
Conditional probability, 7–10, 136–137
Conditional probability and conditional expectation
  computing expectations by conditioning, 105–120
  computing probabilities by conditioning, 120–137
  continuous case, 102–105
  discrete case, 97–102
  identity for compound random variables, 158–165
  miscellaneous applications, 137–158
Conditional probability density function, 102
Conditional variance, 118
Conditional variance formula, 119
Conditioning, computing probabilities by, 120–137
Connected graph defined, 145
Convolution, 58
Correlation, 167
Counting processes, 302–303
Coupling from past, 720–722
Coupon collecting problem, 313–316
Covariance, properties of, 54
Covariance and variance of sums of random variables, 53–61
Coxian random variable defined, 301
Craps, 16
Cumulative distribution function (cdf), 26, 28
Cut vector defined, 577
Decreasing failure rate (DFR), 596–597, 599–600
Delayed renewal process, 461–462, 481
Dependent events, 10
Dirichlet distributions defined, 149
Discrete distributions simulating from, 685–692
Distributions, simulating from discrete, 685–692
Doubly stochastic, 266
e, 131–132
Ehrenfest, P. and T., 239
Ehrenfest urn model, 239, 273
Einstein, A., 626
Elementary renewal theorem, 424–425
Ergodic birth and death process, 394
Ergodic defined, 204
Ergodic Markov chain, 242
Erlang’s loss system, 553–554
Event, 2
Excess life of renewal process, 441, 448
Expectation, conditional, 97–184, 103
Expectation of sum of random number of random variables, 107
Expectations, computing by conditioning, 105–120
  computing variances by conditioning, 117–120
Expected system lifetime, 604–610
Expected time to maximal run of distinct values, 469–471
Expected value, 38–39, 41, 92
  of Bernoulli random variables, 39
  of binomial random variables, 39–40, 50
  of compound random variables, 107–108
  of continuous random variables, 41
  of discrete random variables, 38
  of exponential random variables, 42, 282
  of functions of random variables, 43–45, 49


  of geometric random variables, 40–41, 108–109
  of hypergeometric random variables, 56–58
  of normal random variables, 42–43
  of Poisson random variables, 41
  of the sum of a random number of random variables. See of compound random variables
  of sums of random variables, 49–51
  tables of, 68–69
  of time until k consecutive successes, 112–113
  of total discounted reward, 171, 283–284
  of uniform random variables, 42
Exponential distribution, properties of, 284–291
Exponential distribution, 282–302, 682–685
  further properties of, 291–298
Exponential distribution and Poisson process
  convolutions of exponential random variables, 298–302
  definition, 282–283
  exponential distribution, 282–302
  further properties of exponential distribution, 291–298
  properties of exponential distribution, 284–291
Exponential models, 499–517
  queueing system with bulk service, 514–517
  shoeshine shop, 511–514
  single-server exponential queueing system, 499–511
Exponential queueing system, single-server, 499–511
Exponential random variables, 36, 42
  convolutions of, 298–302
  mixtures of, 599–600
  rate of, 289–291
  simulating, 669
  Von Neumann algorithm, 682–685
Failure rate. See Decreasing failure rate; Increasing failure rate
Failure rate function, 288, 302, 346
  discrete time, 301
Gambler’s ruin problem, 217–221
  application to drug testing, 219–220
  hitting times and, 629–630
  maximum variable and, 629–630
Gamma distribution, 597, 680–681
Gamma random variables, 37
  independent, 62
Gaussian processes, 646–649
General renewal process. See Delayed renewal process
Geometric Brownian motion, 631–632, 658
Geometric distribution, mean of, 108–109
Geometric random variable, 31–32, 40–41
  variance of, 117–119
Gibbs sampler, 250–251
Graph and components, 589
Graphs, random, 139–147, 588
Greedy algorithms, analyzing, 292–294
Hardy–Weinberg law, 207–209
Hastings–Metropolis algorithm, 248–250
Hazard function defined, 601
Hazard rate function, 288, 302
Hazard rate method, 673–676
  discrete, 725–726
Hit-miss method, 729–730
Hyperexponential random variable, 290
Hypergeometric defined, 56, 58
Hypergeometric distribution defined, 99
Hypoexponential random variable, 299
Idle period, 338, 530
IFR, parallel system that is not, 599
IFR lifetime distribution, 598
Ignatov’s theorem, 133–134, 155
Impulse response function, 654–655
Inclusion and exclusion, method of, 584–592
Inclusion–exclusion bounds, 586
Inclusion–exclusion theorem, 134–136
Increasing failure rate (IFR), 596
Increasing failure rate on the average (IFRA), 571, 601
Independent events, 10–12
Independent events, pairwise, 11
Independent increments, 303
Independent random variables, 51–52, 314
  sum of, 68
Indicator random variable for event, 25
Inspection paradox, 455–458
Instantaneous transition rates defined, 378
Insurance, 121–122, 136–137, 325–327, 344–345, 446–447
Insurance ruin problem, 473–478
Interarrival time, sequence of, 307
Intersection of event, 3
Inventory example, 450–451
Inverse transformation method, 668–669
  discrete analog, 685
Inversions, 95
Irreducible Markov chain, 194


Jacobian determinant, 63
Joint cumulative probability distribution function, 47
Joint moment generating function, 96
Joint probability
  computing, 324–325
  density function, 48
  distribution of functions of random variables, 61–64
  mass function, 48
  of sample mean and sample variance from normal population, 74–76
Jointly continuous random variables, 48
Jointly distributed random variables, 47–64
Kolmogorov’s backward equation, 380
Kolmogorov’s forward equation, 382–383
k-out-of-n structure, 573
k-out-of-n system, 575, 606–607
  with equal probability, 579
  with identical components, 598
k-record index, 156
k-record values of discrete random variables, 155–158
L = λaW, 496
LQ = λaWQ, 496
Laplace transform, 72, 306
Limiting probabilities, 384–392
Linear growth model with immigration, 369–371
Linear program defined, 255
List model, 137–139
Little o notation, 304
Little’s formula, 496
Long run proportion of time, 205
Markov chain generated data, mean pattern times in, 212–215
Markov chain in genetics, 207–209
Markov chains, 185–280
  branching processes, 233–236
  Chapman–Kolmogorov equations, 189–193
  classification of states, 193–204
  continuous-time, 366–368, 381
  ergodic, 242
  hidden Markov chains, 256–263
  independent time reversible continuous-time, 399
  irreducible, 245
  limiting probabilities, 204–217
  Markov decision processes, 252–256
  mean time spent in transient states, 230–232
  miscellaneous applications, 217–230
  Monte Carlo methods, 247–252
  time reversible, 236–247
Markov chains, continuous-time, 365–415
  birth and death processes, 368–375
  computing transition probabilities, 404–407
  limiting probabilities, 384–392
  time reversibility, 392–400
  transition probability function, 375–384
  uniformization, 401–404
Markov chains, hidden, 256–263
  predicting states, 261–263
Markov decision processes, 252–256
Markov processes. See Semi-Markov processes
Markovian property, 225, 227
Markov’s inequality, 77–78
Martingale process, 644, 659
Match problem, 126–128
Matching rounds problem, 110–112
Mean
  of geometric distribution, 108–109
  joint distribution of sample, 74–76
  pattern times in Markov chain generated data, 212–215
  Poisson distribution with, 66
  Poisson random variable with, 336
  sample, 55
  value function, 333
Mean time
  for patterns, 151–155
  spent in transient states, 230–232
Mean value analysis of queueing networks, 504
Memoryless random variable, 284, 366
Minimal cut set, 577
Minimal path and minimal cut sets, 574–578
Minimal path set, 574
Minimal path vector, 574
Mixed Poisson processes, conditional or, 343–345
Mixture of distributions, 493
Moment, 46
Moment generating function determines distribution, 69
Moment generating functions, 64, 68
  of binomial random variables, 65
  of compound random variables, 174
  of exponential random variables, 66–67, 282
  of normal random variables, 67–68
  of Poisson random variables, 66
  of the sum of independent random variables, 68


  tables of, 68–69
Monte Carlo methods, Markov chain, 247–252
Monte Carlo simulation approach defined, 663
Monte Carlo simulation, 247
Moving average process, 654
Multinomial distribution, 87
Multivariate normal distribution, 72–73
Mutually exclusive events, 3
Negative binomial distribution defined, 88
Negative binomials, compounding distribution related to, 164–165
New better than used, 467
Nonhomogeneous Poisson process
  conditional distribution of event times, 361
  mean value function of, 333
  simulating, 693–703, 708–709
Normal random variables, 37–38, 42–43, 96
  as approximations of the binomial, 80–81
  simulation of, 671–673, 677–680
Null event, 3
Null recurrent state of a Markov chain, 270
Occupation time, 404, 414
Odds, 636–637
Options, 632–635, 637–638
Order statistics, 60–61
  simulation of, 726–727
Ornstein–Uhlenbeck process, 652
Pairwise independent events, 11
Parallel structure, 572–573
Parallel system, 571, 579, 596, 612–615
  expected life of, 608–610
  that is not IFR, 599
Path vector, 574
Patterns
  of discrete random variables, 462–469
  mean time for, 151–155
Patterns, applications to, 461–473
  expected time to maximal run of distinct values, 469–471
  increasing runs of continuous random variables, 471–473
  patterns of discrete random variables, 462–469
Period of a state of a Markov chain, 204
Poisson arrival queues, single-server, 338–342
Poisson compounding distribution, 161–162
Poisson distribution, 60
Poisson distribution with mean, 66
Poisson paradigm, 70–72
Poisson process, 302–330
  compound, 337–342
  conditional distribution of arrival times, 316–327
  counting processes, 302–303
  definition of, 304–307
  estimating software reliability, 328–330
  interarrival and waiting time distributions, 307–310
  miscellaneous properties of Poisson processes, 310–316
  nonhomogeneous, 330–337
  sampling, 694–697
  sampling of, 318
  simulating two-dimensional, 700–703
  sums of, 352
Poisson process, exponential distribution and, 281–364
  convolutions of exponential random variables, 298–302
  definition, 282–283
  exponential distribution, 282–302
  further properties of exponential distribution, 291–298
  properties of exponential distribution, 284–291
Poisson process, generalization of, 330–345
  compound Poisson process, 337–342
  conditional or mixed Poisson processes, 343–345
Poisson process having rate, 304
Poisson processes
  conditional or mixed, 343–345
  miscellaneous properties of, 310–316
Poisson queue, output process of infinite server, 336–337
Poisson random variables, 32–33, 100, 306, 320
  approximation to the binomial, 32–33, 307, 333–334
  expectation of, 41
  maximum probability, 89
  with mean, 336
  random sampling, 122–123
  simulating, 687–688
  sums of independent, 70
Polar method defined, 680
Pollaczek–Khintchine formula, 529
Polya’s urn model, 147–151, 169–170
Positive recurrent state of a Markov chain, 204
Power spectral density, 656
Probabilistic algorithm, 229
Probabilistically process, 308


Probabilities
  computing, 120–137
  computing joint, 324–325
  conditional, 7–10, 97–184, 136–137
  defined on events, 4–6
  identical component, 587
  limiting, 204–217, 384–392
  of matches, 126–128
  stationary, 212
  steady-state, 496–498
  transition, 404–407
  of union of events, 6
Probability density function, 34, 45
  conditional, 102
  joint, 48
Probability distribution function, joint cumulative, 47
Probability function, transition, 375–384
Probability mass function, 27
  joint, 48
Probability theory, introduction to, 1–21
  Bayes’ formula, 12–15
  conditional probabilities, 7–10
  independent events, 10–12
  probabilities defined on events, 4–6
  sample space and events, 1–4
Pure birth process, 365
Queueing system, 371
  with bulk service, 514–517
  G/M/k, 554–555
  multiserver exponential, 372–375
  with renewal arrivals, 443
  simulating, 707
  single-server exponential, 508–511
    with finite capacity, 508–510
  waiting times, 504
Queueing theory, 493–570
  cost equations, 495–496
  departure process, 394–395
  exponential models, 499–517
  finite source model, 549–552
  M/G/k, 556–557
  model G/M/1, 543–548
  multiserver queues, 552–557
  network of queues, 517–527
    analysis of the Gibbs sampler, 526–527
    the arrival theorem, 524–525
    mean value analysis, 525
  preliminaries, 494–498
  steady-state probabilities, 496–498
  system M/G/1, 528–531
  optimization problem, 536–540
  tandem queues, 395
  transition diagram, 512–513
  variations on M/G/1, 531–543
  work in queues, 528
Queues
  infinite server, 319–320
  multiserver, 552–557
  output process, 336–337
  single-server Poisson arrival, 338–342
Queues, network of, 517–527
  closed systems, 522–527
  open systems, 517–522
Quick sort algorithm, 113–116
Random graph, 139–147, 588
Random numbers, 664
Random permutations, generating, 664–666
Random subset, 726–727
Random telegraph signal process, 653
Random variable, expectation
  continuous case, 41–43
  discrete case, 38–41
  expectation of function of random variable, 43–47
Random variable identity, compound, 159
Random variable with mean, Poisson, 336
Random variables, 23–96
  Bernoulli, 28–29, 39, 50, 56–57, 286, 716
  binomial, 29–31, 39, 50
  chi-squared, 75
  compound, 108
  continuous, 26, 34–38, 45, 471–473
  convolutions of exponential, 298–302
  covariance and variance of sums of, 53–61
  Coxian, 301
  discrete, 26, 155–158, 462–469
  discrete random variables, 27–33
  expectation of, 38–47
  expectation of sum of random number of, 107–108
  expectations of functions of, 43–47
  exponential, 36, 42
  gamma, 37
  geometric, 31–32, 40–41
  hyperexponential, 290
  hypoexponential, 299
  independent, 51–52, 314
  independent Bernoulli, 123–124
  independent gamma, 62
  joint density function of, 63
  joint probability distribution of functions of, 61–64
  jointly distributed, 47–64
  limit theorems, 77–83
  moment generating functions, 64–76
  normal, 37–38, 42–43


  Poisson, 32–33, 306, 320
  simulating binomial, 686
  simulating normal, 671–673
  simulating Poisson, 687–688
  stochastic processes, 83–85
  sum of independent, 68
  sum of two independent uniform, 59
  sums of independent binomial, 69–70
  sums of independent normal, 70
  sums of independent Poisson, 59, 70
  uniform, 35–36, 42
  variance of binomial, 55–56
  variance of compound, 119–120
  variance of geometric, 117–119
Random variables, identity for compound, 158–165
  binomial compounding distribution, 163
  compounding distribution related to negative binomials, 164–165
  Poisson compounding distribution, 161–162
Random variables, simulating continuous, 668–676, 677–685
  chi-squared distribution, 681
  data distribution, 681–682
  exponential distribution, 682–685
  gamma distribution, 680–681
  hazard rate method, 673–676
  inverse transformation method, 668–669
  normal distribution, 677–680
  rejection method, 669–673
  Von Neumann algorithm, 682–685
Random walk, 224–230
  symmetric, 199
Rate of distribution defined, 289
Rate of renewal process, 424
Rates, instantaneous transition, 378
Records, 95
Recurrent states, 195, 197
Recurrent states, positive, 204
Regenerative processes, 442–451
  alternating renewal processes, 445–451
Rejection method, 669–673
  discrete, 725
Reliability, estimating software, 328–330
Reliability function, 602
Reliability function, bounds on, 583–595
  method of inclusion and exclusion, 584–592
  second method for obtaining bounds, 593–595
Reliability theory
  bounds on reliability function, 583–595
  expected system lifetime, 604–610
  reliability of systems of independent components, 578–582
  structure functions, 571–578
  system life as function of component lives, 595–603
Renewal, cycle and, 434
Renewal arrivals, queueing system with, 443
Renewal function
  computing, 458–461
  estimating, 710–711
Renewal processes
  age of, 447
  alternating, 445–451
  average age of, 440
  average excess of, 441
  central limit theorem for, 432–433
  defined, 417
  excess of, 448
  and interarrival distribution, 454
Renewal reward processes, 433–441
Renewal theory and its applications, 417–492
Reversible chain, time, 398
Sample mean defined, 55
Sample space, 1
Sample variance, 74
Satisfiability problem, 228–230
Second-order stationary process, 651
Semi-Markov processes, 452–454
Series system, 571, 579, 596, 611, 614
  of uniformly distributed components, 604–605
Shuffling, 267
Signed rank test, 182
Simulation, 663–731
  coupling from past, 720–722
  determining number of runs, 720
  simulating continuous random variables, 668–676, 677–685
  simulating from discrete distributions, 685–692
  stochastic processes, 692–703
  variance reduction techniques, 703–719
Six, 281–364
Snyder’s ratio of genetics, 273
Software reliability, estimating, 328–330
Spanning trees, 590
Standard Brownian motion, 627
Standard normal distribution function, 80
Stationary and weakly stationary processes, 649–654
Stationary increments, 303
Stationary probabilities, 212
Stationary processes
  harmonic analysis of weakly, 654–657


  second-order, 651
  stationary and weakly, 649–654
  weakly, 651
Stationary processes, Brownian motion and, 625–662
Stationary transition probabilities, 366
Stirling’s approximation, 144, 203, 223
Stochastic processes, 83–85, 626, 646, 692–703
  simulating nonhomogeneous Poisson process, 693–700
  simulating two-dimensional Poisson process, 700–703
  state space of, 84
Stopping time, 481, 675
Stratified sampling, 730
Strong law of large numbers, 78–79
Strong law for renewal processes, 423–424
Structure functions, 571–578
Suspended animation, series model with, 615–617
Symmetric random walk, 199
  relation to Brownian motion, 625–626
Taylor series expansion, 82
Thinning algorithm in simulation, 694
Throughput rate, 523
Tilted density, 716
Time reversibility, 392–400
Time reversible chain, 398
Time reversible continuous-time Markov chains, independent, 399
Time reversible equations, 241
Time reversible Markov chains, 236–247
Times, conditional distribution of arrival, 316–327
Transient states, 195
  mean time spent in, 230–232
Transition probabilities, computing, 404–407
Transition probability function, 375–384
Transition probability matrix, 197
Transition rates, instantaneous, 378
Tree process, 277
Truncated chain, 398
Two-dimensional Poisson process, simulating, 700–703
Two-state continuous-time Markov chain, 381–383
Uniform priors, 147–151
Uniform random variables, 35–36, 42, 59
Uniformization, 401–404
Union of event, 3
Union of events, probability of, 6
Unit normal distribution. See Standard normal distribution
Variance reduction by conditioning, 708–712
Variance reduction techniques, 703–719
  control variates, 712–714
  importance sampling, 714–719
  using antithetic variables, 704–707
  variance reduction by conditioning, 708–712
Variances
  of binomial random variables, 55–56
  of compound random variables, 119–120
  computing, 117–120
  of exponential random variables, 282–284
  of geometric random variables, 117–118
  of hypergeometric random variables, 56–58
  joint distribution of sample, 74–76
  of normal random variables, 46
  of the number of renewals, 461
  of Poisson random variables, 66
  sample, 74
  of sums of random variables, 53–61
  tables of, 68–69
Viterbi algorithm defined, 263
Von Neumann algorithm, 682–685
Waiting time, 308
Waiting time distributions, interarrival and, 307–310
Wald’s equation, 482–485, 548, 675
  defined, 481
Weak dependence, 71
Weak law of large numbers, 94
Weakly stationary process, 651
Weakly stationary processes
  harmonic analysis of, 654–657
  stationary and, 649–654
Weibull distribution, 596
White noise, 644–646
Wiener, N., 626–627
Wiener process, 626
Work in queue, 528
Yule process, 377, 408–409
