Notes


ce.yildiz.edu.tr
19.11.2014

Two-sample z-test

• To test the null hypothesis $H_0: \mu_{12} = 0$, where $\mu_{12} = \mu_1 - \mu_2$ is the difference between the population means, we determine the z-score, $z = \bar{x}_{12} / SD_{12}$.
• Then, depending on the alternative hypothesis, we can calculate the p-value, which is the observed significance level, as:
  if $H_A: \mu_{12} > 0$, then $p_{obs} = P(Z \ge z)$;
  if $H_A: \mu_{12} < 0$, then $p_{obs} = P(Z \le z)$;
  if $H_A: \mu_{12} \ne 0$, then $p_{obs} = 2P(Z \ge |z|)$.
  The above tail probabilities are obtained from the standard normal distribution.

For example:

• Suppose that our sample includes $n_1 = 25$ women and $n_2 = 27$ men. The sample mean of body temperature is $\bar{x}_1 = 98.2$ for women and $\bar{x}_2 = 98.4$ for men. Then, our point estimate for the difference between population means is $\bar{x}_{12} = \bar{x}_1 - \bar{x}_2 = -0.2$.
• We assume that $\sigma_1^2 = 0.8$ and $\sigma_2^2 = 1$.
• The variance of the sampling distribution is $\sigma_1^2/n_1 + \sigma_2^2/n_2 = 0.8/25 + 1/27 = 0.07$, and the standard deviation is $SD_{12} = \sqrt{0.07} = 0.26$.
• The z-score is $z = -0.2 / 0.26 = -0.76$.
• $H_A: \mu_{12} \ne 0$ and $z = -0.76$. Therefore, $p_{obs} = 2P(Z \ge |-0.76|) = 2 \times 0.22 = 0.44$.
• For the body temperature example, $p_{obs} = 0.44$ is greater than the commonly used significance levels (e.g., 0.01, 0.05, and 0.1). Therefore, the test result is not statistically significant, and we cannot reject the null hypothesis (which states that the population means for the two groups are the same) at these levels.

Two-sample t-test

• In practice, $SD_{12}$ is not known since $\sigma_1$ and $\sigma_2$ are unknown.
• We can estimate it as follows: $SE_{12} = \sqrt{s_1^2/n_1 + s_2^2/n_2}$, where $s_1$ and $s_2$ are the sample standard deviations and $SE_{12}$ is the standard error of $\bar{x}_{12}$.
• Then, instead of the standard normal distribution, we need to use t-distributions to find p-values.
• For this, we can use R or R-Commander.
• The t-score is $t = \bar{x}_{12} / SE_{12}$.
• Because the software handles the computation, we rarely need to calculate the degrees of freedom manually. Alternatively, we could use a conservative approach and set df to $\min(n_1 - 1, n_2 - 1)$, i.e., the smaller of $n_1 - 1$ and $n_2 - 1$.
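The body temperature example above can be checked numerically. The following is a minimal Python sketch using only the standard library; the function names `normal_upper_tail` and `two_sample_z_test` are illustrative and not part of the notes, and the inputs are the values quoted in the example with the variances assumed known. Because the code does not round intermediate results, its p-value comes out slightly different from the hand calculation, which rounds the tail probability to 0.22.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_sample_z_test(mean1, mean2, var1, var2, n1, n2):
    """Two-sided two-sample z-test with known population variances."""
    diff = mean1 - mean2                    # point estimate of mu_1 - mu_2
    sd = math.sqrt(var1 / n1 + var2 / n2)   # SD of the sampling distribution
    z = diff / sd                           # z-score under H0: mu_12 = 0
    p_two_sided = 2 * normal_upper_tail(abs(z))
    return diff, sd, z, p_two_sided

# Values from the body temperature example (variances assumed known).
n1, n2 = 25, 27
diff, sd, z, p_obs = two_sample_z_test(98.2, 98.4, 0.8, 1.0, n1, n2)
print(f"difference = {diff:.2f}, SD = {sd:.2f}, z = {z:.2f}, p = {p_obs:.2f}")

# Conservative degrees of freedom for the two-sample t-test.
conservative_df = min(n1 - 1, n2 - 1)
print(f"conservative df = {conservative_df}")
```

The last two lines apply the conservative rule for the degrees of freedom to the same sample sizes. A full two-sample t-test would also need the sample standard deviations, which the notes do not give for this example.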

Paired t-test

• While we hope that the two samples taken from the population are comparable except for the characteristic that defines the grouping, this is not guaranteed in general.
• To mitigate the influence of other important factors (e.g., age) that are not the focus of our study, we sometimes pair (match) each individual in one group with an individual in the other group so that the paired individuals are very similar to each other except for the characteristic that defines the grouping.
• For example, we might recruit twins and assign one of them to the treatment group and the other one to the placebo group.
• Sometimes, the subjects in the two groups are the same individuals under two different conditions.
• When the individuals in the two groups are paired, we use the paired t-test to take the pairing of the observations between the two groups into account.
• Using the difference, $D$, between the paired observations, the hypothesis testing problem reduces to a single-sample t-test problem.
• To test the null hypothesis $H_0: \mu = 0$, we calculate the $T$ statistic, $T = \bar{D} / (S/\sqrt{n})$, where $S$ is the sample standard deviation of the differences and $n$ is the number of pairs.
• The test statistic $T$ has the t-distribution with $n - 1$ degrees of freedom.
• We calculate the corresponding t-score as follows: $t = \bar{d} / (s/\sqrt{n})$.
• Then the p-value is the probability of values as extreme as or more extreme than the observed t-score:
  if $H_A: \mu > 0$, then $p_{obs} = P(T \ge t)$;
  if $H_A: \mu < 0$, then $p_{obs} = P(T \le t)$;
  if $H_A: \mu \ne 0$, then $p_{obs} = 2P(T \ge |t|)$,
  where $T$ has the t-distribution with $n - 1$ degrees of freedom.

Example:

• For the Platelet data, we want to compare platelet aggregation measurements for the same person before and after smoking. $D$ represents the difference in platelet aggregation from before to after.
• The observed value of $\bar{D}$ is $\bar{d} = -10.27$. Using the sample standard deviation $s = 7.98$ and the sample size $n = 11$, the standard error (i.e., estimated standard deviation) of $\bar{D}$ is $SE = 7.98/\sqrt{11} = 2.41$.
• Suppose that we are interested in a 95% confidence interval estimate for $\mu$ (i.e., the population mean of the difference between before and after smoking). Using the t-distribution with $n - 1 = 10$ degrees of freedom, we obtain $t_{crit} = 2.23$. Therefore, the 95% confidence interval is $[-10.27 - 2.23 \times 2.41,\ -10.27 + 2.23 \times 2.41] = [-15.64,\ -4.90]$.
• At the 0.95 confidence level, we believe that the true mean of the difference is between −15.64 and −4.90. Note that this range includes negative values only and does not include the value 0 specified by the null hypothesis.
• To perform hypothesis testing, we find the t-score (here, $\mu_0 = 0$ according to the null hypothesis) as follows: $t = (\bar{d} - \mu_0)/SE = -10.27/2.41 = -4.26$.
• To find the p-value, we find the upper tail probability of $|-4.26| = 4.26$ from the t-distribution with 10 degrees of freedom, and multiply the result by 2 for two-sided hypothesis testing: $p_{obs} = 2 \times P(T \ge 4.26) = 2 \times 0.0008 = 0.0016$.
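The Platelet example can be reproduced from its summary statistics alone. The sketch below is one possible Python version using only the standard library; the helper names `t_pdf` and `t_upper_tail` are illustrative, and the tail probability is approximated with Simpson's rule rather than taken from a statistics package (this is a stand-in for the R or R-Commander lookup, not the method used in the notes). The critical value 2.23 is copied from the example rather than computed. Since the code keeps the unrounded standard error, the t-score may differ from the hand calculation in the last digit.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 with Simpson's rule on [0, t].

    steps must be even; by symmetry, the area to the right of 0 is 0.5.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics quoted in the Platelet example.
d_bar, s, n = -10.27, 7.98, 11
t_crit = 2.23                      # 95% critical value for df = 10, from the text

se = s / math.sqrt(n)              # standard error of the mean difference
ci = (d_bar - t_crit * se, d_bar + t_crit * se)
t_score = (d_bar - 0) / se         # mu_0 = 0 under the null hypothesis
p_value = 2 * t_upper_tail(abs(t_score), n - 1)

print(f"SE = {se:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
print(f"t = {t_score:.2f}, two-sided p = {p_value:.4f}")
```

Working from the raw before and after measurements instead would only change the first step: compute the pairwise differences, then take their mean and sample standard deviation.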

