Using R for Introductory Statistics, John Verzani

Chapter 3

Bivariate data

This chapter looks at data contained in two variables (bivariate data). With univariate data, we summarized a data set with measures of center and spread and the shape of a distribution with words such as “symmetric” and “long-tailed.” With bivariate data we can ask additional questions about the relationship between the two variables.

Take, for instance, data on test scores. If two classes take the same test, the students’ scores will be two samples that should have similarly shaped distributions but will be otherwise unrelated as pairs of data. However, if we focus on two exams for the same group of students, the scores should be related. For example, a better student would be expected to do better on both exams. Consequently, in addition to the characterization of data as categorical or numeric, we will also need to know when the data is paired off in some way.

3.1 Pairs of categorical variables

Bivariate, categorical data is often presented in the form of a (two-way) contingency table. The table is found by counting the occurrences of each possible pair of levels and placing the frequencies in a rectangular grid. Such tables allow us to focus on the relationships by comparing the rows or columns. Later, statistical tests will be developed to determine whether the distribution for a given variable depends on the other variable.

Our data may come in a summarized or unsummarized format. The data entry is different for each.

3.1.1 Making two-way tables from summarized data

If the data already appears in tabular format and we wish to analyze it inside R, how is the data keyed in? Data vectors were created using the c() function. One simple way to make a table is to combine data vectors together as rows (with rbind()) or as columns (with cbind()).

To illustrate: an informal survey of seat-belt usage in California examined the relationship between a parent’s use of a seat belt and a child’s. The data appears in Table 3.1. A quick glance at the table shows a definite relationship between the two variables: whether the child is buckled depends strongly on whether the parent is.
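The data entry described above can be sketched in R roughly as follows. This is a minimal sketch, not the book's own listing. The vector names (`parent_buckled`, `parent_unbuckled`, `parent`, `child`, `tbl`, `tbl_cols`, `counts`) are hypothetical, and every count is made up for illustration: none of them are the values of Table 3.1. The sketch enters a summarized table with rbind() and cbind(). For contrast, it then builds the same kind of table from unsummarized pairs with table(), which counts each pair of levels.

```r
# Hypothetical counts for illustration only (NOT the values of Table 3.1).
# Each vector is one row of a summarized two-way table: how many children
# were buckled and unbuckled for a given parent category.
parent_buckled   <- c(40, 10)  # child buckled, child unbuckled
parent_unbuckled <- c(5, 25)

# rbind() stacks the vectors as rows; the argument names become row names.
tbl <- rbind(parent_buckled, parent_unbuckled)
colnames(tbl) <- c("child_buckled", "child_unbuckled")
print(tbl)

# cbind() combines the same vectors as columns instead (the transposed layout).
tbl_cols <- cbind(parent_buckled, parent_unbuckled)
print(tbl_cols)

# Row proportions make the rows easy to compare: each row sums to 1.
print(prop.table(tbl, 1))

# Unsummarized data: one entry per observed (parent, child) pair.
# table() counts the occurrences of each possible pair of levels.
parent <- c("buckled", "buckled", "unbuckled", "buckled", "unbuckled")
child  <- c("buckled", "unbuckled", "unbuckled", "buckled", "unbuckled")
counts <- table(parent, child)
print(counts)
```

With these made-up counts, 40 of 50 children (80%) are buckled when the parent is buckled. When the parent is not buckled, only 5 of 30 children (about 17%) are buckled. Comparing rows in this way is one means of seeing the kind of relationship the text describes.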