Two-Sample Hypothesis Testing
Chapter 8

In two-sample testing we collect two samples of observations and use the data to infer whether the two populations from which the samples came are the same. You can in fact test hypotheses concerning any population parameter, but we will confine ourselves to tests that involve the mean and the variance.

Two-Sample t-test (chapter 8.1)

Suppose that you have two populations of raccoons and you want to study variation in body weight (kg). You collect 11 raccoons from population one and find Xbar1 = 4.2 and s1^2 = 21.87. From population two you collect 8 raccoons and find Xbar2 = 5.6 and s2^2 = 15.36. The subscripts 1 and 2 refer to populations one and two, respectively.

We would like to test the null hypothesis that the two populations have the same body weight; specifically, that the means of the populations are equal.

Ho: mu1 = mu2
HA: mu1 =/ mu2

Note that we do not know what the population means are. From the results of our test we are going to infer whether the means are equal, but in reality we will never know for sure. Note also that the sample means are not equal (4.2 =/ 5.6). We can see that directly, so a test of whether the Xbars themselves differ would not be meaningful; the question is whether the population means differ.

To test this null hypothesis we will use a two-sample t-test. There are several assumptions for this test.

1) The observations in each sample were drawn at random from their respective populations. This is an assumption for essentially every statistical test.
2) The samples come from populations whose distributions are normal. We covered how to test for normality earlier and for now will assume that the populations are normally distributed; however, you should test this assumption before further analyzing your data.
3) The samples come from populations whose variances are equal.

Variance Ratio Test (chapter 8.5)

The third assumption listed above can be checked with a statistical test called the variance ratio test. Before proceeding with the t-test, we will check this assumption by testing the following null hypothesis.

Ho: sigma1^2 = sigma2^2
HA: sigma1^2 =/ sigma2^2

To test Ho we will calculate an F value by dividing one sample variance by the other. This quantity, the ratio of two sample variances, follows an F distribution. The critical values for the F distribution are given in Table B4 in the appendix. You should note that there are two degrees of freedom associated with an F distribution, one for the numerator variance and another for the denominator variance. Thus, each page of Table B4 is for a different value of the numerator degrees of freedom. When using the tables for the F distribution it is very important to use the correct degrees of freedom for the numerator and denominator.

The values of the F distribution vary from 0 to + infinity; however, the tables only give you values for alpha less than or equal to 0.50. Thus, to ensure that your calculated F value will be within the range of values given in the table, it is important that the larger of the two sample variances be in the numerator. The calculated F will then be greater than or equal to 1.

F = s1^2/s2^2 or F = s2^2/s1^2 (whichever is larger, equation 8.28)

In our case, sample 1 has the larger sample variance (21.87) and thus it will be put in the numerator.
F = 21.87/15.36 = 1.42

To determine whether we reject our null hypothesis, we need to compare our calculated F to a critical F value from the table. The critical value is taken at alpha(2) = 0.05; the (2) indicates that we are doing a two-tailed test. The numerator degrees of freedom are n1 - 1 = 11 - 1 = 10 and the denominator degrees of freedom are n2 - 1 = 8 - 1 = 7.

F critical = F(alpha(2) = 0.05; 10, 7) = 4.76

This value is on page 29 of the appendix (the page for numerator degrees of freedom equal to 10). Use the column for alpha(2) = 0.05 and the row for 7 degrees of freedom. The intersection of this column and row yields the value 4.76.

As our calculated F value (1.42) is less than the critical F value (4.76), we fail to reject the null hypothesis that the population variances are equal. We can now pool our variances and proceed with the t-test.
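The variance ratio test above can be sketched in a few lines of Python. This is only an illustration: the function name `variance_ratio` is made up for this example, and the critical value is typed in from the table rather than computed.

```python
def variance_ratio(var_a, n_a, var_b, n_b):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


# Raccoon example from the text: s1^2 = 21.87 (n1 = 11), s2^2 = 15.36 (n2 = 8).
F, df_num, df_den = variance_ratio(21.87, 11, 15.36, 8)

# Critical value read from the F table for alpha(2) = 0.05 with 10 and 7 df.
F_CRITICAL = 4.76

reject_equal_variances = F >= F_CRITICAL
print(f"F = {F:.2f} with {df_num} and {df_den} df; reject Ho: {reject_equal_variances}")
```

Because the function always puts the larger variance in the numerator, the order in which the two samples are passed does not matter.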

Calculations for the two-sample t-test (chapter 8.1)

The t-value is the difference between the sample means divided by the standard error of the difference between the means (equation 8.1).

t = (Xbar1 - Xbar2) / s(Xbar1 - Xbar2)

Here s(Xbar1 - Xbar2) denotes the standard error of the difference between the means, and it is the first quantity to be determined.
To do this we first need the pooled sample variance, sp^2.

sp^2 = (SS1 + SS2)/(v1 + v2)    (equation 8.4)
where v1 = n1 - 1 and v2 = n2 - 1
Remember that SS (sum of squares) is equal to the variance times n - 1, so SS1 = 21.87 x 10 = 218.7 and SS2 = 15.36 x 7 = 107.52.

sp^2 = (218.7 + 107.52)/(10 + 7) = 19.189

The pooled variance is then used to calculate the standard error of the difference between the means.

s(Xbar1 - Xbar2) = square root of ((sp^2/n1) + (sp^2/n2))    (equation 8.6)

s(Xbar1 - Xbar2) = square root of ((19.189/11) + (19.189/8)) = 2.035

This quantity (2.035) is then substituted along with the Xbars into equation 8.1

t = (4.2 - 5.6)/2.035 = -0.687

We now need to compare our calculated t-value to a critical value. The critical value can be found in Table B3. The degrees of freedom in this case are n1 + n2 - 2 = 11 + 8 - 2 = 17. As we are doing a two-tailed test, our critical value is taken at alpha(2) = 0.05. The critical value is 2.110. As the absolute value of our calculated t-value (we are doing a two-tailed test) is less than 2.110, we fail to reject the null hypothesis that the means of the populations are equal.
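The whole calculation, from the summary statistics to the decision, can be sketched as follows. The function name `pooled_t` is hypothetical, and the critical value 2.110 is entered by hand from the table. Carrying full precision through every step gives a t-value of about -0.688, which agrees with the hand calculation to within rounding.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t from summary statistics, assuming equal population variances."""
    ss1 = var1 * (n1 - 1)                  # sum of squares = variance * (n - 1)
    ss2 = var2 * (n2 - 1)
    df = (n1 - 1) + (n2 - 1)
    pooled_var = (ss1 + ss2) / df          # equation 8.4
    se_diff = math.sqrt(pooled_var / n1 + pooled_var / n2)  # equation 8.6
    t = (mean1 - mean2) / se_diff          # equation 8.1
    return t, df, pooled_var, se_diff


# Raccoon example from the text.
t, df, pooled_var, se_diff = pooled_t(4.2, 21.87, 11, 5.6, 15.36, 8)

# Critical value read from the t table for alpha(2) = 0.05 with 17 df.
T_CRITICAL = 2.110

reject_equal_means = abs(t) >= T_CRITICAL
print(f"sp^2 = {pooled_var:.3f}, SE = {se_diff:.3f}, t = {t:.3f}, df = {df}")
print(f"reject Ho: {reject_equal_means}")
```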

Use SigmaStat to run a two-sample t-test.

Comments (chapter 8.1)

The t-test is quite robust to violations of the assumptions of normality and equality of variances. This means that even with some deviation from the assumptions, the t-test will retain much of its power. This is especially true if the sample sizes are nearly equal for the two groups, and the larger the sample size the more robust the test. Zar discusses how various deviations from the assumptions affect the results of the t-test. You should study pages 127 - 129 carefully, but you do not need to learn the formulas presented in this section.

There is a modified formula for the t-value, t' (equation 8.11), that does not assume the equality of variances. The formula for t' is more powerful than the formula for t when the variances are unequal, and the formula for t is more powerful when the variances are equal. Some computer packages will yield either t or t' depending upon the results of the variance ratio test. It is a good idea to pay close attention to the output so that you understand what the computer package is giving you.
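As a rough illustration of a t-value that does not pool the variances, here is a sketch of the commonly used Welch form with its approximate degrees of freedom. This is an assumption about what equation 8.11 looks like, so check it against the textbook before relying on it. The function name `welch_t` is made up for this example.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """t' and its approximate df (Welch form); the variances are not pooled."""
    v1 = var1 / n1                         # squared standard error of mean 1
    v2 = var2 / n2                         # squared standard error of mean 2
    t_prime = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Approximate degrees of freedom; usually not a whole number.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_prime, df


# Raccoon example from the text.
t_prime, df_prime = welch_t(4.2, 21.87, 11, 5.6, 15.36, 8)
print(f"t' = {t_prime:.3f} with about {df_prime:.1f} df")
```

For the raccoon data the two sample variances are similar, so t' and its degrees of freedom come out close to the pooled t and its 17 degrees of freedom.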

P values

In the above example we made the choice to reject or not reject based upon an alpha(2) = 0.05. We can also look at this decision to reject based on P values. If the P value associated with our test had been less than or equal to 0.05, we would have rejected. How do we determine the P value associated with our test?

The calculated t-value in the above example was -0.687. As we are doing a two-tailed test and the curve is symmetrical, we can use the value 0.687 when determining the P value. In Table B3, find the row for the degrees of freedom of our test (in this case 17). Go across the row, comparing the numbers in the table to our calculated value. You are looking for the first value that is larger than the calculated t-value. In our case the first entry in the row is 0.689, which is greater than our calculated t-value, so we do not need to look any farther. If you go to the top of the column containing 0.689, you see that the alpha(2) value associated with 0.689 is 0.50. Also note that as the t-values get larger, the associated alpha(2) values get smaller. Thus, as 0.687 is less than 0.689, our P value is greater than 0.50, and based on the table we can say that P > 0.50. This is not very close to the cut-off of 0.05.

If our calculated t-value had been say 1.897, this number would have fallen between the values 1.740 and 2.110, which are associated with alpha(2) values of 0.10 and 0.05, respectively. We could then say that 0.05 is less than P which is less than 0.10 (0.05 < P < 0.10). Still not a significant difference between the means.
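The table lookup described above can be sketched as a small search along one row of the table. The row below is for 17 degrees of freedom; the entries 0.689, 1.740 and 2.110 appear in the text, and the others are typed in from a standard two-tailed t table, so check them against Table B3. The function name `bracket_p` is hypothetical.

```python
# One row of a two-tailed t table (17 df) as (alpha(2), critical t) pairs,
# ordered as in the table: alpha decreasing, critical value increasing.
T_ROW_DF17 = [(0.50, 0.689), (0.20, 1.333), (0.10, 1.740),
              (0.05, 2.110), (0.02, 2.567), (0.01, 2.898)]


def bracket_p(t, row):
    """Return (lower, upper) such that lower < P <= upper for a two-tailed test."""
    t = abs(t)                    # the curve is symmetrical
    lower, upper = 0.0, 1.0
    for alpha, critical in row:
        if t < critical:          # first table entry larger than our t-value
            lower = alpha
            break
        upper = alpha             # t is at least this large, so P <= alpha
    return lower, upper


print(bracket_p(-0.687, T_ROW_DF17))   # the raccoon example
print(bracket_p(1.897, T_ROW_DF17))    # the second example in the text
```

An upper bound of 1.0 means the t-value is smaller than every entry in the row, and a lower bound of 0.0 means it is larger than every entry.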

Most computer programs calculate the exact value of P and include this in the output. For our example, the computer would report a P value of about 0.50 (P = 0.50), which is consistent with the table result that P is slightly greater than 0.50. This P value is usually reported when presenting the results of statistical tests in publications. It might look like this (t = -0.687, df = 17, P = 0.50). It will be important for you to be able to determine the range for P values from the statistical tables.
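To give an idea of where an exact P value comes from, here is a sketch that integrates the t distribution numerically using only the Python standard library. It is not how any particular statistics package does the calculation, and the function names are made up for this example. A real analysis would normally use a statistics package instead.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed P = 1 - 2 * (area under the density between 0 and |t|)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps             # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # Simpson's rule estimate of the area
    return 1.0 - 2.0 * area


p_value = two_tailed_p(-0.687, 17)
print(f"P = {p_value:.3f}")
```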

Study sections 8.2, 8.3, and 8.4 for the concept of power and how power can be determined. However, I will not expect you to remember these formulas or be able to do these calculations on a test.

Do problems 8.1, 8.2, 8.3 and 8.4 at the end of chapter 8 (page 159).

Use SigmaStat to do problems 8.1 and 8.2

Mann-Whitney Test (chapter 8.9 and 8.10)

While the t-test is quite robust, you can deviate so far from the assumptions that the power becomes too small or the test begins to perform poorly in other ways. Sometimes coding or transforming the data may produce a normal distribution, but we will not spend any time on the transformation of data. If your data do not conform to the assumptions, then you can do a non-parametric test, which is a test that does not make assumptions about the nature of the underlying distributions (normality and equality of variances). In general, the Mann-Whitney test is more powerful when the assumptions of a t-test are not met, and a t-test is more powerful when the assumptions are met. You would want to choose the test that has the highest power, that is, the greatest ability to reject a false null hypothesis.

Just as the two-sample t-test is the appropriate parametric test when you have two samples and you want to test whether the population means are equal, the Mann-Whitney test is the appropriate non-parametric test for two samples. The test is carried out on the ranks of the measurements and not on the original measurements. You will note that the null hypothesis for a Mann-Whitney test does not say anything about the population means. Remember, mu is a population parameter, and as this test is non-parametric, it does not address whether the means are equal. For example, if you were looking at the difference in body weight of raccoons from Kansas in comparison to those from North Dakota and you had determined that your data fit the assumptions of a t-test, your null hypothesis might be

Ho: mu(Kansas) = mu(North Dakota)
HA: mu(Kansas) =/ mu(North Dakota)

However, if you had determined that the data did not fit the assumptions of a t-test and you were to do a Mann-Whitney test your null hypothesis would be

Ho: Kansas and North Dakota raccoons have the same body weight
HA: Kansas and North Dakota raccoons do not have the same body weight
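To show what working with ranks instead of the original measurements means, here is a sketch of the ranking step and one common form of the Mann-Whitney U statistic. The body weights below are invented purely for illustration and the function names are hypothetical. Looking up the critical value is left to the table or to the statistics package.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) up, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1                       # extend over a run of tied values
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return (U, U') computed from the ranks of the pooled samples."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])                   # sum of the ranks of sample 1
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return u, n1 * n2 - u


# Hypothetical body weights (kg), invented for this illustration only.
kansas = [4.1, 5.3, 3.8, 6.0]
north_dakota = [5.9, 6.4, 7.1, 5.5, 6.8]

u, u_prime = mann_whitney_u(kansas, north_dakota)
print(f"U = {u}, U' = {u_prime}")
```

Notice that only the order of the measurements enters the calculation; changing 7.1 to 71 would leave U unchanged.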

I will not expect you to be able to do the calculations of the Mann-Whitney test on a test; instead, I will expect you to be able to interpret the output from SigmaStat. However, you should pay close attention to how you look up critical values to reject a null hypothesis, how to do a one-tailed test, and when the normal approximation to a Mann-Whitney test can be used. Some computer programs automatically calculate the normal approximation. Be sure to study examples 8.13 and 8.14 in your textbook.

Use SigmaStat to run a Mann-Whitney Test.
Do problems 8.12 and 8.13 at the end of chapter 8 (page 160).

Use SigmaStat to do problems 8.12 and 8.13

Last updated on 27 September 2000.
Provide comments to Dwight Moore at