Chapter 3 – Special Discrete Random Variables

Section 3.4 Binomial random variable
An experiment that has only two possible outcomes is called a Bernoulli trial; a single coin toss is one example. For the sake of argument, we will call one of the possible outcomes "success" and the other one "failure". The probability of a success is p, and the probability of a failure is 1 − p. We are interested in studying a sequence of independent and identical Bernoulli trials, and looking at the total number of successes that occur.

Definition. A binomial random variable is the number of successes in n independent and identical Bernoulli trials.

Examples.
A fair coin is tossed 100 times and Y, the number of heads, is recorded. Then Y is a binomial random variable with n = 100 and p = 1/2.

Two evenly matched teams play a series of 6 games. The number of wins Y is a binomial random variable with n = 6 and p = 1/2.

An inspector looks at five computers, where the chance that each computer is defective is 1/6. The number Y of defective computers that he sees is a binomial random variable with n = 5 and p = 1/6.
If Y is a binomial random variable, then the possible outcomes for Y are obviously 0, 1, . . . , n. In other words, the number of observed successes could be any number between 0 and n. The sample space consists of all strings of length n made up of S's and F's; for example,

    SSFSFSSSF · · · SF    (n trials).

Now let us choose a value 0 ≤ y ≤ n, and look at a few typical sample points belonging to the event (Y = y):

    SSS · · · S FFF · · · F        (y S's, then n − y F's),
    SSS · · · S FFF · · · F S      (y − 1 S's, then n − y F's, then an S),
    SSS · · · S FFF · · · F SS     (y − 2 S's, then n − y F's, then SS).

Every sample point in the event (Y = y) is an arrangement of y S's and n − y F's, and therefore has probability p^y (1 − p)^(n−y). How many such sample points are there? The number of sample points in (Y = y) is the number of distinct arrangements of y S's and n − y F's, that is, the binomial coefficient C(n, y). Putting it together gives the formula for binomial probabilities.
Binomial probabilities.

If Y is a binomial random variable with parameters n and p, then

    P(Y = y) = C(n, y) p^y (1 − p)^(n−y),    y = 0, 1, . . . , n.
Example. Best-of-seven series

In Section 1.6 we figured out that the probability of a best-of-seven series between two evenly matched teams going the full seven games was 20/64. This can also be calculated using binomial probabilities. If you play six games against an equally skilled opponent, and Y is the number of wins, then Y has a binomial distribution with n = 6 and p = 1/2. The series goes seven games if Y = 3, and the chance of that happening is

    P(Y = 3) = C(6, 3) (1/2)^3 (1/2)^3 = 20/64 = .3125.

So best-of-seven series ought to go seven games about 30% of the time. But, in fact, if you look at the Stanley Cup final series for the fifty years 1946–1995, there were seven-game series only 8 times (1950, 1954, 1955, 1964, 1965, 1971, 1987, 1994). This seems to show that a lot of these match-ups were not even, which tends to make the series end sooner.

If you are twice as good as your opponent, what is the chance of a full seven games? This time p = 2/3, and so

    P(Y = 3) = C(6, 3) (2/3)^3 (1/3)^3 = .2195.

This agrees more closely with the actual results, although it is still a bit high.
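The binomial formula is easy to check by machine. The notes do not include code, but as an illustrative sketch (Python, standard library only), here is the probability formula applied to the two series calculations above:

```python
from math import comb

def binom_pmf(y, n, p):
    """P(Y = y) for a binomial random variable: C(n, y) p^y (1-p)^(n-y)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

# Evenly matched teams: chance the series is tied 3-3 after six games.
print(binom_pmf(3, 6, 1/2))   # 20/64 = 0.3125

# One team twice as good as the other (p = 2/3).
print(binom_pmf(3, 6, 2/3))   # about 0.2195
```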
Example. An even split

If I toss a fair coin ten times, what is the chance that I get exactly 5 heads and 5 tails? The answer is

    P(Y = 5) = C(10, 5) (1/2)^5 (1/2)^5 = .2461.

If I toss a fair coin 100 times, what is the chance of exactly fifty heads? This time the answer is

    P(Y = 50) = C(100, 50) (1/2)^50 (1/2)^50 = .0796.

You may be a bit surprised that this is such an uncommon event. If you flip a coin 100 times the odds are pretty good that you will get about an equal number of heads and tails, but getting exactly one half heads and one half tails gets harder and harder as the sample size increases. Just for fun, here is an approximate formula for the chance of getting exactly n heads in 2n coin tosses: P(an even split) ≈ (πn)^(−1/2).
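As a quick sketch (not part of the original notes), we can compare the exact even-split probability with the (πn)^(−1/2) approximation for a few values of n:

```python
from math import comb, pi, sqrt

def even_split(n):
    """Exact chance of exactly n heads in 2n fair coin tosses: C(2n, n)/4^n."""
    return comb(2 * n, n) / 4**n

# The approximation (pi*n)^(-1/2) improves as n grows.
for n in (5, 50, 500):
    print(n, even_split(n), 1 / sqrt(pi * n))
```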
Example. Testing for ESP

In order to test for ESP you draw a card from an ordinary deck and ask the subject what color it is. You repeat this 20 times and the subject is correct 15 times. How likely is it that this is due to chance?

If the subject is guessing, then Y, the number of correct readings, follows a binomial distribution with n = 20 and p = 1/2. We want to know the probability that someone can do this well (or better) by guessing. Thus

    P(Y ≥ 15) = P(Y = 15) + P(Y = 16) + · · · + P(Y = 20)
              = C(20, 15) (1/2)^15 (1/2)^5 + C(20, 16) (1/2)^16 (1/2)^4 + · · · + C(20, 20) (1/2)^20 (1/2)^0
              = 21700 (1/2)^20
              = 0.0207.

This is a pretty unlikely event but certainly not impossible. What conclusion can we draw?
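Tail probabilities like this one are just sums of binomial terms. As a sketch (not from the original notes), the ESP calculation in Python:

```python
from math import comb

def binom_tail(k, n, p):
    """P(Y >= k) for a binomial random variable with parameters n and p."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k, n + 1))

# Chance of 15 or more correct guesses out of 20 when p = 1/2.
print(binom_tail(15, 20, 0.5))   # 21700/2^20, about 0.0207
```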
Example. Quality control

In mass production manufacturing there is a certain percentage of acceptable loss due to defective units. To check the level of defectives, you take a sample from the day's production. If the number of defectives is small you continue, but if there are too many defectives you shut down the production line for repairs.

Suppose that 5% defectives is considered acceptable, but 10% defectives is unacceptable. Our strategy is to take a sample of n = 40 units and shut down production if we find 4 or more defectives. Our inspection strategy has two conflicting goals: it is supposed to shut down when p ≥ .10, but continue if p ≤ .05. There are two possible wrong decisions: to continue when p ≥ .10, and to shut down even though p ≤ .05.

How often will we unnecessarily shut down? Suppose that there are acceptably many defectives, and to take the worst case, say there are 5% defectives, so that p = .05. Let Y be the number of observed defective units in the sample. The probability of shutting down production is

    P(shut down) = P(Y ≥ 4)
                 = 1 − P(Y ≤ 3)
                 = 1 − P(Y = 0) − P(Y = 1) − P(Y = 2) − P(Y = 3)
                 = 1 − C(40, 0)(.05)^0(.95)^40 − C(40, 1)(.05)^1(.95)^39 − C(40, 2)(.05)^2(.95)^38 − C(40, 3)(.05)^3(.95)^37
                 = 1 − .1285 − .2705 − .2777 − .1851
                 = .1382.
On the other hand, how often will we fail to spot an unacceptably high level of defectives? Let us now suppose that there are unacceptably many defectives, and again to take the worst case, let's say there are 10% defectives, so that p = .10. The chance that the day's production passes inspection anyway is

    P(passes inspection) = P(Y ≤ 3)
                         = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3)
                         = C(40, 0)(.10)^0(.90)^40 + C(40, 1)(.10)^1(.90)^39 + C(40, 2)(.10)^2(.90)^38 + C(40, 3)(.10)^3(.90)^37
                         = .0148 + .0657 + .1423 + .2003
                         = .4231.

We see that this scheme is fairly likely to make errors. If we wanted to be more certain about our decision, we would need to take a larger sample size.
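Both error rates of the inspection scheme come from the same binomial cumulative sum. A quick sketch (not part of the original notes):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for a binomial random variable with parameters n and p."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(0, k + 1))

# Rule: shut down on 4 or more defectives in a sample of n = 40.
print(1 - binom_cdf(3, 40, 0.05))  # needless shutdown rate at p = .05, about .1382
print(binom_cdf(3, 40, 0.10))      # pass rate at p = .10, about .4231
```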
Example. Multiple choice exams

If a multiple choice exam has 30 questions, each with 5 responses, what is the probability of passing the exam by guessing? If you guess on every question, then Y, the number of correct answers, will be a binomial random variable with n = 30 and p = 1/5. To pass you need 15 or more correct answers, so P(pass the exam) = P(Y ≥ 15) = 0.000231.
Binomial moments.

If Y is a binomial random variable with parameters n and p, then

    E(Y) = np    and    VAR(Y) = np(1 − p).
Example. The accuracy of empirical probabilities

If we simulate n random events, where the chance of a success is p, then the number of observed successes Y has a binomial distribution with parameters n and p. The empirical probability is p̂ = Y/n. Now the binomial moments given above show that E(p̂) = (np)/n = p, and VAR(p̂) = (np(1 − p))/n^2 = p(1 − p)/n. By computing the two standard deviation interval, we get some idea about how close p̂ is to p. Since the quantity p(1 − p) is maximized when p = 1/2, we find that regardless of the value of p,

    2 STD(p̂) = 2 √(p(1 − p)/n) ≤ 1/√n.

In most of our examples, the empirical probabilities have been based on n = 1000 repetitions. Thus, our empirical probabilities are typically within ±.03 of the true probabilities. For example, suppose we simulate 1000 throws of five dice, and find that on 71 occasions we get a sum of 14. Then we are fairly certain that the true probability of getting 14 lies somewhere between .041 and .101.
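As a sketch of the dice experiment above (the seed is an arbitrary choice of mine, made only so the run is reproducible):

```python
import random
from math import sqrt

random.seed(2024)  # arbitrary seed, for a reproducible run

# Simulate 1000 throws of five dice and estimate P(sum = 14).
n = 1000
hits = sum(1 for _ in range(n)
           if sum(random.randint(1, 6) for _ in range(5)) == 14)
p_hat = hits / n
half_width = 2 * sqrt(p_hat * (1 - p_hat) / n)   # at most 1/sqrt(n)
print(p_hat, "+/-", half_width)
```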
Section 3.5 Geometric and negative binomial random variables

Like the binomial, the geometric and negative binomial random variables are based on a sequence of independent and identical Bernoulli trials. Instead of fixing the number of trials n and counting up how many successes there are, we fix the number of successes k and count up how many trials it takes to get them. The geometric random variable is the number of trials until the first success. Given an integer k ≥ 1, the negative binomial random variable is the number of trials until the kth success. You see that a geometric random variable is a negative binomial random variable with k = 1. On the other hand, note that a negative binomial random variable Y is the sum of k independent geometric random variables. That is, Y = X1 + X2 + · · · + Xk, where X1 is the number of trials until the first success, X2 is the number of trials after the first success until the second success, etc. All of these X's have geometric distributions with parameter p.

If Y is negative binomial, then a typical sample point belonging to (Y = y) looks like FFS · · · FSS, where the first y − 1 symbols in the string contain exactly k − 1 successes and y − k failures, and the yth symbol is an S. Since there are C(y − 1, k − 1) such strings, and they all have probability p^k (1 − p)^(y−k), we get the following formula.
Negative binomial probabilities.

If Y is a negative binomial random variable with parameters k and p, then

    P(Y = y) = C(y − 1, k − 1) p^k (1 − p)^(y−k),    y = k, k + 1, . . . .

It follows that the geometric distribution is given by p(y) = p(1 − p)^(y−1), y = 1, 2, . . . .
Example. The chance of a packet arrival at a distribution hub is 1/10 during each time interval. Let Y be the arrival time of the first packet; it has a geometric distribution with p = .10. The probability that the first packet arrives during the third time interval is

    P(Y = 3) = (1/10)^1 (9/10)^2 = .081.

The probability that the first packet arrives on or after the third time interval is

    P(Y ≥ 3) = 1 − P(Y = 1) − P(Y = 2) = 1 − .10 − (.90)(.10) = .81.

If X is the arrival time of the tenth packet, the chance that it arrives in the 99th time interval is

    P(X = 99) = C(98, 9) (1/10)^10 (9/10)^89 = 0.01332.
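Both formulas above translate directly into code. A sketch (not part of the original notes):

```python
from math import comb

def geom_pmf(y, p):
    """P(Y = y): first success occurs on trial y."""
    return p * (1 - p)**(y - 1)

def negbin_pmf(y, k, p):
    """P(Y = y): k-th success occurs on trial y."""
    return comb(y - 1, k - 1) * p**k * (1 - p)**(y - k)

print(geom_pmf(3, 0.1))         # first packet in interval 3: 0.081
print(negbin_pmf(99, 10, 0.1))  # tenth packet in interval 99: about 0.01332
```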
Example. The 500 goal club

With only 30 games remaining in the NHL season, veteran winger Flash LaRue is starting to get worried. With a career total of 488 goals, it is not at all certain that he will be able to score his 500th career goal before the end of the season. He will get a big bonus from his team if he manages this feat, but unfortunately Flash only scores at a rate of about once every three games. Is there any hope that he will get his 500th goal before the end of the season?

Let's try to calculate the moments of a negative binomial random variable. Start with the geometric case, and add up the following tail probabilities row by row:

    p + p(1 − p) + p(1 − p)^2 + p(1 − p)^3 + · · · = 1
        p(1 − p) + p(1 − p)^2 + p(1 − p)^3 + · · · = (1 − p)
                   p(1 − p)^2 + p(1 − p)^3 + · · · = (1 − p)^2
                                p(1 − p)^3 + · · · = (1 − p)^3
                                                     ...

Adding the left-hand sides column by column, and the right-hand sides as a geometric series, gives

    p + 2p(1 − p) + 3p(1 − p)^2 + 4p(1 − p)^3 + · · · = 1 + (1 − p) + (1 − p)^2 + · · · = 1/p.

This sum ought to convince you that the mean of a geometric random variable is 1/p, and the result for the negative binomial follows from the equation Y = X1 + X2 + · · · + Xk. Confirming the variance formula is left as an exercise.
Negative binomial moments.

If Y is a negative binomial random variable with parameters k and p, then

    E(Y) = k/p    and    VAR(Y) = k(1 − p)/p^2.
We note that, as you would expect, the rarer an event is, the longer you will have to wait for it. Taking the geometric case (k = 1), we see that we will wait on average µ = 2 trials to see the first "heads" in a coin tossing experiment, we will wait on average µ = 36 trials to see the first pair of sixes in tossing a pair of dice, and we will buy on average µ = 13,983,816 tickets before we win Lotto 6-49.

We also note that σ decreases from infinity to zero as p ranges from 0 to 1. This says that predicting the first occurrence of an event is difficult for rare events, and easy for common events.
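The notes leave Flash LaRue's question open, but the moments give a hint: he needs 12 more goals, and the expected wait is k/p = 36 games, more than the 30 remaining. As a rough sketch of his actual chance, here is one possible model (my assumption, not the notes'): treat each game as a Bernoulli trial in which Flash scores exactly one goal with probability 1/3, ignoring multi-goal games.

```python
from math import comb

def negbin_pmf(y, k, p):
    """P(Y = y): probability the k-th success occurs on trial y."""
    return comb(y - 1, k - 1) * p**k * (1 - p)**(y - k)

k, p, games = 12, 1/3, 30   # 12 goals needed; assumed one-goal-per-game model
print(k / p)                # expected wait k/p = 36 games, more than remain

# Chance the 12th goal arrives within the 30 remaining games.
chance = sum(negbin_pmf(y, k, p) for y in range(k, games + 1))
print(chance)
```

So under this simplified model Flash still has a real, if modest, shot at the bonus.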
Section 3.7 Hypergeometric random variable

The hypergeometric distribution is the number of successes that arise in sampling without replacement. We suppose that there is a population of size N, of which r are "successes" and the rest "failures", and a sample of size n is drawn.

The probability formula below is simply the ratio of the number of samples containing y successes and n − y failures to the total number of possible samples of size n. The weird looking conditions on y just ensure that you don't try to find the probability of some impossible event.
Hypergeometric probabilities.

If Y is a hypergeometric random variable with parameters n, r, and N, then

    P(Y = y) = C(r, y) C(N − r, n − y) / C(N, n),    y = max(0, n − (N − r)), . . . , min(n, r).
Example. A box contains 12 poker chips of which 7 are green and 5 are blue. Eight chips are selected at random without replacement from this box. Let X denote the number of green chips selected. The probability mass function is

    p(x) = C(7, x) C(5, 8 − x) / C(12, 8),    x = 3, 4, . . . , 7.

Note that the range of possible x values is restricted by the make-up of the population.
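As a sketch (not part of the original notes), the hypergeometric formula and the restricted support of the poker-chip example:

```python
from math import comb

def hyper_pmf(y, n, r, N):
    """P(Y = y): y successes in a sample of n drawn without replacement
    from a population of N items containing r successes."""
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

# Green chips: sample 8 from a box of 7 green and 5 blue.
n, r, N = 8, 7, 12
support = range(max(0, n - (N - r)), min(n, r) + 1)    # x = 3, ..., 7
print([round(hyper_pmf(x, n, r, N), 4) for x in support])
```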
Example. Lotto 6-49

In Lotto 6-49 you buy a ticket with six numbers chosen from the set {1, 2, . . . , 49}. The draw consists of a random sample drawn without replacement from the same set, and your prize depends on how many "successes" were drawn. Here a "success" is any number that was on your ticket. So Y, the number of matches, follows a hypergeometric distribution with r = 6, n = 6, and N = 49. The probabilities for the different numbers of matches are obtained using the formula

    P(Y = y) = C(6, y) C(43, 6 − y) / C(49, 6),    y = 0, . . . , 6.

To four decimal places, we have

    y      0      1      2      3      4      5      6
    p(y)   .4360  .4130  .1324  .0176  .0010  .0000  .0000
Hypergeometric moments.

If Y is a hypergeometric random variable with parameters n, r, and N, then

    E(Y) = n(r/N)    and    VAR(Y) = n (r/N) ((N − r)/N) ((N − n)/(N − 1)).

For example, the average number of green chips drawn in the first problem is µ = (8)(7)/12 = 4.66666. Also, the average number of matches on your Lotto 6-49 ticket is µ = (6)(6)/49 = .73469.
Example. Capture-tag-recapture

A scientific expedition has captured, tagged, and released eight sea turtles in a particular region. The expedition assumes that the population size in this region is 35, which means that 8 are tagged and 27 are not tagged. The expedition will now capture 10 turtles and note how many of them are tagged. If the assumption about the population size is correct, what is the probability that the new sample will have 3 or fewer tagged turtles in it?

    P(Y ≤ 3) = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3)
             = C(8, 0)C(27, 10)/C(35, 10) + C(8, 1)C(27, 9)/C(35, 10) + C(8, 2)C(27, 8)/C(35, 10) + C(8, 3)C(27, 7)/C(35, 10)
             = .04595 + .20424 + .33861 + .27089
             = .85969.

We would certainly expect to get three or fewer tagged turtles in the new sample. If the expedition found five tagged turtles, is that evidence that they have over-estimated the population size?
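To weigh the closing question, it helps to also compute the chance of five or more tagged turtles under the assumed population size of 35. A sketch (not part of the original notes):

```python
from math import comb

def hyper_pmf(y, n, r, N):
    """Hypergeometric P(Y = y)."""
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

n, r, N = 10, 8, 35    # sample 10 turtles; 8 of an assumed 35 are tagged
p_low = sum(hyper_pmf(y, n, r, N) for y in range(0, 4))
p_high = sum(hyper_pmf(y, n, r, N) for y in range(5, min(n, r) + 1))
print(p_low)    # P(Y <= 3), about .85969
print(p_high)   # P(Y >= 5): how surprising five tagged turtles would be
```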
Example. A political poll

The population of Alberta is around 2,545,000, and let's suppose that about 70% of these are eligible to vote in the next provincial election. Then the population of eligible voters has N = 1,781,500 people in it. Suppose that n = 100 people are randomly selected from the eligible voters (without replacement) and asked whether or not they support Ralph Klein. Also suppose, for the sake of argument, that exactly 60%, or 1,068,900, eligible voters do support Ralph Klein. How accurately will the poll reflect that?

Let Y stand for the number of Klein supporters included in the random sample. Then Y has a hypergeometric distribution with n = 100, r = 1,068,900, and N = 1,781,500. The mean and variance of Y are given by

    µ = 100 (1068900/1781500) = 60

and

    σ^2 = 100 (1068900/1781500) (712600/1781500) (1781400/1781499) = 23.998666.

A two standard deviation interval says that probably between 50 and 70 people in the poll will be Klein supporters.

Note that if the sampling were done with replacement, then Y would follow a binomial distribution with n = 100 and p = .6. In this case, we would have

    µ = 100(.6) = 60    and    σ^2 = 100(.6)(.4) = 24.

Since n is small relative to N, the ratio

    (N − n)/(N − 1) = 1781400/1781499 ≈ 1,

and the mean and variance of the hypergeometric distribution coincide with the mean and variance of the binomial distribution. The distributions of these two random variables are also essentially the same whenever n is small relative to N.
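As a sketch (not from the original notes), the two variances side by side show how little the finite population correction matters here:

```python
N, r, n = 1_781_500, 1_068_900, 100
p = r / N                                          # exactly 0.6

var_hyper = n * p * (1 - p) * (N - n) / (N - 1)    # with finite population correction
var_binom = n * p * (1 - p)                        # sampling with replacement
print(var_hyper, var_binom)                        # 23.9986... versus 24
```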
Section 3.8 Poisson random variable

This probability distribution is named after the French mathematician Poisson, according to whom. . .

    Life is good for only two things, discovering mathematics and
    teaching mathematics. – Siméon Poisson

The Poisson distribution first appeared in Recherches sur la probabilité des jugements en matière criminelle et en matière civile, an important work on probability published in 1837. The Poisson distribution describes the probability that a random event will occur in a time or space interval under the conditions that the probability of the event occurring is very small, but the number of trials is very large, so that the event actually occurs a few times.
To illustrate this idea, suppose you are interested in the number of arrivals to a queue in a one day period. You could divide the time interval up into little subintervals, so that for all practical purposes, only one arrival can occur per subinterval. Therefore, for each subinterval of time, we have

    P(no arrival) = 1 − p,    P(one arrival) = p,    P(more than one arrival) = 0.

The total number of arrivals X is the number of subintervals that contain an arrival. This has a binomial distribution, where n is the number of subintervals. The probability of seeing x arrivals during the day is

    P(X = x) = C(n, x) p^x (1 − p)^(n−x).
Now let's suppose that you keep on dividing the time interval into smaller and smaller subintervals, increasing n but decreasing p so that the product µ = np remains constant. What happens to P(X = x)? Writing p = µ/n,

    C(n, x) p^x (1 − p)^(n−x)
        = C(n, x) (µ/n)^x (1 − µ/n)^(n−x)
        = [n(n − 1) · · · (n − x + 1)/x!] (µ/n)^x (1 − µ/n)^n (1 − µ/n)^(−x)
        = (µ^x/x!) (1 − µ/n)^n [(n/n)((n − 1)/n) · · · ((n − x + 1)/n)] (1 − µ/n)^(−x).

Now you take the limit as n → ∞, and obtain

    (1 − µ/n)^n → e^(−µ)

and

    (n/n)((n − 1)/n) · · · ((n − x + 1)/n)(1 − µ/n)^(−x) → 1.

This leads to the following formula.

Poisson probabilities.

If X is a Poisson random variable with parameter µ, then

    P(X = x) = e^(−µ) µ^x / x!,    x = 0, 1, . . . .
The derivation of the Poisson distribution explains why it is sometimes called the law of rare events. Let's look at an example involving the rarest event I can think of.

Example. More Lotto 6-49

The odds of winning the jackpot in Lotto 6-49 are one in 13,983,816, or p = 7.1511 × 10^(−8). Suppose you play twice a week, every week, for 10,000 years. The total number of plays is then n = 2 × 52 × 10000 = 1,040,000. Setting µ = np = .07437 and using the Poisson formula, we see that the chance of hitting zero jackpots during this time is

    P(X = 0) = (e^(−.07437))(.07437)^0/0! = .928327.

After all that time, we still have only about a 7% chance of getting a Lotto 6-49 jackpot. The probability of getting exactly two jackpots during this time is

    P(X = 2) = (e^(−.07437))(.07437)^2/2! = .002567.
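The derivation says the Poisson formula approximates the exact binomial when n is huge and p tiny. A sketch (not from the original notes) that prints the two side by side for the lottery numbers:

```python
from math import comb, exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) = e^(-mu) mu^x / x!."""
    return exp(-mu) * mu**x / factorial(x)

n, p = 1_040_000, 1 / 13_983_816
mu = n * p                              # about .07437
for x in (0, 1, 2):
    # Poisson approximation next to the exact binomial probability.
    print(x, poisson_pmf(x, mu), comb(n, x) * p**x * (1 - p)**(n - x))
```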
Example. Hashing

Hashing is a tool for organizing files, where a hashing function transforms a key into an address, which is then the basis for searching for and storing records. Hashing has two important features:

1. With hashing, the addresses generated appear to be random: there is no immediate connection between the key and the location of the record.

2. With hashing, two different keys may be transformed into the same address, in which case we say that a collision has occurred.

Given that it is nearly impossible to achieve a uniform distribution of records among the available addresses in a file, it is important to be able to predict how records are likely to be distributed. Suppose that there are N addresses available, and that the hashing function assigns them in a completely random fashion. This means that for any fixed address, the probability that it is selected is 1/N. If r keys are hashed, we can use the Poisson approximation to the binomial to obtain the probability that exactly x records are assigned to a given address. This is

    p(x) = e^(−r/N) (r/N)^x / x!,    x = 0, 1, . . . .

For instance, if we are trying to fit r = 10000 records in N = 10000 addresses, the proportion of addresses that will remain empty is p(0) = 1^0 e^(−1)/0! = .3679. We would expect a total of about 3679 empty addresses. Since p(1) = 1^1 e^(−1)/1! = .3679, we would also expect a total of about 3679 addresses with 1 record assigned, and about 10000 − 2(3679) = 2642 addresses with more than 1 record assigned. Because we have a packing density r/N of 1, we must expect a large number of collisions. In order to reduce the number of collisions we should increase the number N of available addresses.

For more about hashing, the reader is referred to Chapter 11 of the book File Structures: A Conceptual Toolkit by Michael J. Folk and Bill Zoellick.
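We can check the Poisson prediction for hashing by simulation; here is a sketch (not from the original notes; the seed is an arbitrary choice for reproducibility) that models a random hash function by assigning each key a uniformly random address:

```python
import random
from math import exp

random.seed(7)   # arbitrary seed, for a reproducible run

r = N = 10_000                         # packing density r/N = 1
counts = [0] * N
for _ in range(r):                     # "hash" each key to a random address
    counts[random.randrange(N)] += 1

empty = sum(1 for c in counts if c == 0) / N
single = sum(1 for c in counts if c == 1) / N
print(empty, single, exp(-1))          # both fractions should be near e^-1 = .3679
```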
Poisson moments.

If X is a Poisson random variable, then

    E(X) = µ    and    VAR(X) = µ.

Example. Particle emissions
In 1910, Hans Geiger and Ernest Rutherford conducted a famous experiment in which they counted the number of α-particle emissions during 2608 time intervals of equal length. Their data is as follows.

    x           0    1    2    3    4    5    6    7    8    9   10  > 10
    intervals  57  203  383  525  532  408  273  139   45   27   10     6

A total of 10097 particles were observed, giving a rate of µ = 10097/2608 = 3.8715 particles per time period. If these particles were following a Poisson distribution, then the number of intervals with no particles should be about

    2608 × e^(−3.8715) (3.8715)^0 / 0! = 54.31,

the number of intervals with exactly one particle should be about

    2608 × e^(−3.8715) (3.8715)^1 / 1! = 210.27,

and so on. In fact, the frequencies that we would expect to observe are

    x        0      1      2      3      4      5      6      7      8      9     10   > 10
          54.31 210.27 407.06 525.31 508.44 393.69 254.03 140.50  67.99  29.25  11.32   5.83

By comparing these two tables, you can see that the Poisson distribution seems to describe this phenomenon quite well.
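The second table is just 2608 times the Poisson probabilities at the observed rate. A sketch (not part of the original notes) that reproduces it next to Geiger and Rutherford's counts:

```python
from math import exp, factorial

mu = 10097 / 2608                      # observed rate, about 3.8715

def expected(x):
    """Expected number of intervals with exactly x particles under a Poisson model."""
    return 2608 * exp(-mu) * mu**x / factorial(x)

observed = [57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 10]
for x, obs in enumerate(observed):
    print(x, obs, round(expected(x), 2))   # observed versus expected frequency
```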