12.07.2015 Views

STAT 201 — Assignment 1 Computing Exercises ... - People.stat.sfu.ca

STAT 201 — Assignment 1
Partial Solutions

Computing Exercises

Ex 1
I actually meant that the original JMP histogram is displayed vertically. After clicking Stack, the histogram is flipped and displayed horizontally.

Ex 2
As we saw in class, you should see that the males are taller than the females, although not by much. You see this by comparing the centers of the two distributions. You may also say something about the dispersion or spread of the distributions (they are similar).

Ex 3
• Splitting the height data by gender produces 2 separate columns of data.
• Clicking Stack puts the two histograms side by side.
• Choosing Uniform Scaling leads to:
  1. common binning for both histograms;
  2. common x-range for both histograms.
You get the same conclusions as for Exercise 2, but with common x-ranges you see CLEARLY that the males are taller than the females (female heights are pushed against the left end of the x-axis, male heights against the right end), and with common binning you see that the two groups have similar spread.

Non-computing Exercises

1. NOTE: In constructing a histogram, we often put a tick mark on the x-axis at the left end of each interval (bin):

__|___|___|___|___|___|___|___|___|___|___|___|___|_______
  1   6   11  16  21  26  31  36  41  46  51  56  61

For example, a sentence containing 20 words goes in the 16-21 bin; a sentence with 21 words goes in the 21-26 bin; etc.
(b) Right skewed; the majority of sentences contain 16 to 30 words; large variability (particularly among sentences of more than 30 words).
(c) We need the actual observations (without grouping).

2. A bar chart should be more informative when the x-axis is numerical (it displays the groups in natural ascending order).
(a) No: an upper bound for the 35+ group is needed to display it properly on the x-axis of a histogram (note: this is a non-issue for a bar chart, which is technically used to display categorical data such as "car color").
(b) Tick marks should be:


__|___|___|___|___|_______|___
  15  20  25  30  35      45

Anyone not quite 25 years old (e.g. 24 years and 364 days) falls in the 20-25 bin. As soon as she hits her 25th birthday, she falls in the 25-30 bin.
If you use the given percentages as the heights of the blocks, then the tricky bit is that the 35-45 bin does NOT have a height of 4. The size of each block should represent the coverage of that bin. By size, we mean the area of the block. Since the 35-45 bin is twice as wide as all the other bins, its height should be reduced by 1/2, i.e. it should have height 2.

3. Note the unit! Any number you observe from the stem plot should be multiplied by 10.
   median: 5.5 × 10 = 55
   min: 0 × 10 = 0
   max: 45 × 10 = 450
   Q1: 2 × 10 = 20
   Q3: 17.5 × 10 = 175
Right skewed; the majority of students own 0-150 CDs, many of whom own fewer than 100; large variability (particularly among students owning more than 150 CDs).
(b) Another binning: 0's, 100's, 200's, 300's, and 400's (so your stems will be 0, 1, 2, 3, 4).

4. (a) Removing the outlier (1) shortens the left tail and (2) reduces variability: the mean moves to the right (i.e. increases) and the SD becomes smaller.

5. (b) x̄ ± 3 SD = 15.85 to 24.55: the rule holds.
(c) Based on the figures (without looking at the new mean and SD), removing the outliers leaves a relatively bell-shaped distribution. Since the Empirical Rule comes from the bell-shaped (normal) distribution, the rule should work quite well on the modified data.
(d) Hard to say: we need to know how those unusual values were obtained.

6. 5: You are above average (i.e. good). The further above you are, the better off you are. So the smaller the SD, the more distinguished you are compared to the rest of the class. (Something to ponder: what if your mark was 65?)

7. SD measures variability. 0 SD means 0 variability, i.e. everything is the same. As the mean must be 50, the only possibility is to have all 7 of them equal to 50.

8. Recall the discussion in class of why x̄ is affected by unusual observations (i.e. x̄ is not resistant): its formula includes all values, whether they are usual or unusual. Similarly for the SD.

9. Apply facts about symmetry and the Empirical Rule.
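
The area rule from item 2(b) can be checked with a few lines of Python. This is only a sketch: the function name bar_height and its base_width parameter are made up for illustration and do not come from the assignment or from JMP. It assumes, as the solution states, that the 35-45 bin carries a percentage of 4 and that the other bins are 5 years wide.

```python
def bar_height(percent, width, base_width=5.0):
    """Height of a histogram block whose AREA represents `percent`.

    Heights are expressed relative to a bin of `base_width` (hypothetical
    default: the 5-year bins in item 2(b)), so a bin of exactly that width
    keeps its percentage as its height.
    """
    if width <= 0 or base_width <= 0:
        raise ValueError("bin widths must be positive")
    return percent * base_width / width


# The 35-45 bin: 4 percent spread over a bin that is 10 years wide.
print(bar_height(4, 45 - 35))  # 2.0

# A regular 5-year bin with the same percentage keeps height 4.
print(bar_height(4, 25 - 20))  # 4.0
```

Doubling the width halves the height, so the block's area, and hence the share of people it represents, stays the same.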
