Notes

19.11.2014 Views
Hypothesis testing for population proportion • For a binary random variable X with possible values 0 and 1, we are typically interested in evaluating hypotheses regarding the population proportion of the outcome of interest, denoted as X = 1. • As discussed before, the population proportion is the same as the population mean for such binary variables. • So we follow the same procedure as described above. • More specifıcally, we use the z-test for hypothesis testing. • Note that we do not use t-test, because for binary random variable, population variance is σ 2 = µ(1 - µ). • Therefore, by setting µ=µ 0 according to the null hypothesis, we specify the population variance as σ 2 = µ 0 (1 - µ 0 ). • If we assume that the null hypothesis is true, we have • This means that • As a result, we obtain the z-score as follows: where p is the sample proportion (mean). • Consider the Melanoma dataset available from the MASS package. Suppose that we hypothesize that less than 50% of cases ulcerate: µ < 0.5. Then the null hypothesis can be expressed as H0 : µ = 0.5. Using the Melanoma data set, we can test the above null hypothesis. The number of observations in this data set is n = 205, of which 90 patients had ulceration. Therefore, p = 90/205 = 0.44. • Next, we can find the z-score for our test statistic as follows: • Because HA :µ

Statistical Inference for the Relationship between Two Variables • We now discuss hypothesis testing regarding possible relationships between two variables. • We focus on problems where we are investigating the relationship between one binary categorical variable (e.g., gender) and one numerical variable (e.g., body temperature). • In these situations, the binary variable typically represents two different groups or two different experimental conditions. • We treat the binary variable (a.k.a., factor) as the explanatory variable in our analysis. • The numerical variable, on the other hand, is regarded as the response (target) variable (e.g., body temperature). Relationship Between a Numerical Variable and a Binary Variable • In general, we can denote the means of the two groups as µ 1 and µ 2 . • The null hypothesis indicates that the population means are equal, H 0 : µ 1 = µ 2 . • In contrast, the alternative hypothesis is one the following: H A : µ 1 > µ 2 if we believe the mean for group 1 is greater than the mean for group 2. H A : µ 1 < µ 2 if we believe the mean for group 1 is less than the mean for group 2. H A : µ 1 ≠ µ 2 if we believe the means are different but we do not specify which one is greater. • We can also express these hypotheses in terms of the difference in the means: H A : µ 1 - µ 2 > 0, H A : µ 1 - µ 2 < 0, or H A : µ 1 - µ 2 ≠ 0. • Then the corresponding null hypothesis is that there is no difference in the population means, H 0 : µ 1 - µ 2 =0. • Previously, we used the sample mean to perform statistical inference regarding the population mean µ. • To evaluate our hypothesis regarding the difference between two means, µ 1 - µ 2 , it is reasonable to choose the difference between the sample means, 1 - 2 , as our statistic. • We use µ 12 to denote the difference between the population means µ 1 and µ 2 , and use 12 to denote the difference between the sample means 1 and 2 : µ 12 = µ 1 - µ 2 12 = 1 - 2 • By the Central Limit Theorem, • Therefore, where n 1 and n 2 are the number of observations • We can rewrite this as where • We want to test our hypothesis that H A : µ 12 ≠ 0 (i.e., the difference between the two means is not zero) against the null hypothesis that H 0 : µ 12 = 0. • To use 12 as a test statistic, we need to find its sampling distribution under the null hypothesis (i.e., its null distribution). • If the null hypothesis is true, then µ 12 = 0. Therefore, the null distribution of 12 is • As before, however, it is more common to standardize the test statistic by subtracting its mean (under the null) and dividing the result by its standard deviation.

Page 1 and 2: Hypothesis Testing • In general,

Page 3: One-sided vs. two-sided hypothesis

Page 7 and 8: Paired t-test • While we hope tha

Page 9: • Therefore, we are 95% confident

hypothesis

population

observed

reject

variable

probability

statistic

significance

denote

statistically

www.ce.yildiz.edu.tr

Notes

Notes ... View more Notes

Delete template?

Save as template ?

Notes Notes