When we reject the null hypothesis in ANOVA, we have shown that at least one of the populations has a mean that differs from the others. The question remains, though, which population or populations are different. There are in fact numerous alternate hypotheses to the single null hypothesis, and what we would like to do is choose one of them. For example, for the five raccoon populations there are 15 possible alternate hypotheses. We decide which alternate hypothesis to accept with a procedure called multiple comparisons (or range tests). There are many such tests; they perform slightly differently and have slightly different theoretical bases. The most commonly encountered are the Tukey test and the Student-Newman-Keuls (SNK) test, although others are often seen. Some tests are more conservative, in that they are less likely to identify two populations as being different, while others are less conservative and tend to find more differences among the populations. In general, it is probably best to stick with a Tukey test or a Student-Newman-Keuls test unless you can find some reasonable justification for using a different test. Of course, you should choose your multiple comparison test before performing the analysis, not afterwards as a search for significance in the data.
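One way to arrive at the count of 15 is sketched below. It rests on an assumption that the notes do not spell out: the five sample means are ranked, and each hypothesis places either "=" or "!=" between each adjacent pair of ranked means. That gives 2^4 = 16 patterns, one of which (all equal) is the null hypothesis. The function name and the mu1..mu5 labels are illustrative only.

```python
from itertools import product

def alternate_hypotheses(k):
    """List ordered-mean hypotheses for k ranked means, excluding the null.

    Assumption: each hypothesis is a choice of '=' or '!=' between each
    adjacent pair of ranked means, so there are 2**(k - 1) patterns in all.
    """
    labels = [f"mu{i}" for i in range(1, k + 1)]
    hypotheses = []
    for signs in product(["=", "!="], repeat=k - 1):
        if all(s == "=" for s in signs):
            continue  # the all-equal pattern is the null hypothesis
        parts = [labels[0]]
        for sign, label in zip(signs, labels[1:]):
            parts += [sign, label]
        hypotheses.append(" ".join(parts))
    return hypotheses

hyps = alternate_hypotheses(5)
print(len(hyps))  # 15
for h in hyps:
    print("HA:", h)
```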
Tukey Test (chapter 11.1)
The formulas for calculating a Tukey test are presented in chapter 11.1, but you will not be responsible for knowing them. You are, however, responsible for understanding the general method used in determining which means are significantly different and for interpreting the output from a computer program.
After an analysis of variance has determined that a significant difference exists among the means, you would run a Tukey test. The basic procedure is first to rank the sample means from highest to lowest. The mean with the highest value is then compared to the mean with the lowest value; this comparison should show these means to be different. Next, the highest mean is compared to the mean second from the lowest. If this comparison is also significant, the highest mean is compared to the third from the lowest. This continues until a comparison does not yield a significant difference. Once a non-significant comparison is found, no other comparisons with the highest mean are needed, as they will also be non-significant. Once we have determined which means are significantly different from the highest mean, we repeat the process with the second highest mean, again determining which means of lower value are significantly different from it.
Eventually this process produces sets of means that are not significantly different from each other. Below is part of the output from the SigmaStat run for chapter 10 on the data set where we obtained a significant difference among the means. Notice that if we rank the means from highest to lowest, they are:
Col 4 53.20
Col 2 44.20
Col 1 43.20
Col 3 42.20
Col 5 33.20
At the end of the output is a section called All Pairwise Multiple Comparison Procedures (Tukey Test). In this test Col 4 is tested against each of the other columns and found to be significantly different from all of them (P < 0.05). Next, Col 2 is tested against the three lower means. Note that since the comparison with Col 3 was not significant, there was no reason to test Col 2 against Col 1, as that comparison too must be non-significant. Next, Col 1 (the third highest mean) is compared with the two remaining lower means, and only one comparison (versus Col 5) is found to be significant. Finally, Col 3 is compared to the lowest mean and found to be significantly different from it.
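The ranked, extremes-inward sequence of comparisons can be sketched in a few lines. This is only an illustration under stated assumptions, not SigmaStat's implementation: the sample size n = 5 per group is inferred from SEM = SD / sqrt(n) in the summary table, the error mean square is taken as the common variance 1.924 squared, and the critical value 4.232 is the tabled studentized range for alpha = 0.05, 5 means, and 20 error degrees of freedom. The function name is hypothetical.

```python
import math

def tukey_step_down(means, mse, n, q_crit):
    """Compare ranked means from the extremes inward.

    means  : dict of group label -> sample mean
    mse    : error mean square from the ANOVA
    n      : observations per group (equal sample sizes assumed)
    q_crit : tabled critical value of the studentized range
    Returns a list of (higher, lower, q, significant); q is None when the
    pair was not tested because a wider comparison was non-significant.
    """
    se = math.sqrt(mse / n)
    ranked = sorted(means, key=means.get, reverse=True)  # highest mean first
    results = []
    for i, high in enumerate(ranked[:-1]):
        stop = False
        # Start with the lowest mean and work upward toward `high`.
        for low in reversed(ranked[i + 1:]):
            if stop:
                results.append((high, low, None, False))  # declared non-significant
                continue
            q = (means[high] - means[low]) / se
            significant = q > q_crit
            results.append((high, low, round(q, 3), significant))
            stop = not significant
    return results

# Means from the SigmaStat output; the remaining values are assumptions (see above).
means = {"Col 1": 43.2, "Col 2": 44.2, "Col 3": 42.2, "Col 4": 53.2, "Col 5": 33.2}
mse = 1.924 ** 2   # assumed: common variance, since every group has SD 1.924
n = 5              # assumed: inferred from SEM = SD / sqrt(n)
q_crit = 4.232     # assumed: tabled q for alpha 0.05, 5 means, 20 error df

for high, low, q, sig in tukey_step_down(means, mse, n, q_crit):
    shown = "not tested" if q is None else f"q = {q}"
    print(f"{high} vs. {low}: {shown}, {'Yes' if sig else 'No'}")
```

Note that SigmaStat still prints a q value for Col 2 vs. Col 1, whereas this sketch marks that pair as not tested; the conclusion (not significant) is the same either way.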
What we need to do next is summarize this in the form of non-significant subsets of means. Look again at our list of sorted means. Col 4 is different from all other groups and thus is in a subset by itself. Columns 2, 1, and 3 were never significantly different from each other and thus constitute a second non-significant subset of means. Finally, Col 5 was different from all of the other means and constitutes a subset by itself. The vertical bars below group those means that are not significantly different from each other; this is a common way of summarizing the results of range tests. The resulting alternate hypothesis can be written:
HA: μ4 ≠ μ2 = μ1 = μ3 ≠ μ5
Col 4 53.20 |
Col 2 44.20   |
Col 1 43.20   |
Col 3 42.20   |
Col 5 33.20     |
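The grouping shown by the bars can be derived mechanically from the Yes/No column of the Tukey output. The sketch below is one simple way to do it, assuming the means are already ranked and that a run of adjacent means forms a subset whenever its two extreme means are not significantly different. The function and variable names are illustrative.

```python
def nonsignificant_subsets(ranked, differs):
    """Return maximal runs of ranked means with no significant difference.

    ranked  : group labels ordered from highest to lowest mean
    differs : function (a, b) -> True when the pair is significantly different
    """
    subsets = []
    for start in range(len(ranked)):
        end = start
        # Extend the run while its two extreme means are not significantly different.
        while end + 1 < len(ranked) and not differs(ranked[start], ranked[end + 1]):
            end += 1
        run = ranked[start:end + 1]
        # Keep the run only if it is not already contained in the last kept run.
        if not subsets or not set(run) <= set(subsets[-1]):
            subsets.append(run)
    return subsets

ranked = ["Col 4", "Col 2", "Col 1", "Col 3", "Col 5"]

# Pairs marked "Yes" in the P<0.05 column of the Tukey output.
significant_pairs = {
    frozenset(pair) for pair in [
        ("Col 4", "Col 5"), ("Col 4", "Col 3"), ("Col 4", "Col 1"), ("Col 4", "Col 2"),
        ("Col 2", "Col 5"), ("Col 1", "Col 5"), ("Col 3", "Col 5"),
    ]
}

def differs(a, b):
    return frozenset((a, b)) in significant_pairs

subsets = nonsignificant_subsets(ranked, differs)
for subset in subsets:
    print(", ".join(subset))
```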
Be sure to study carefully example 11.1 in your textbook. Often the results of a multiple comparison test will yield "overlapping subsets"; that is, a given mean may be a member of two different subsets. Obviously, this cannot truly be the case, and its inclusion in one of the subsets represents a type II error; however, we will not know which inclusion is the error. The only option, if it is important to resolve this issue, is to collect more data and thus increase the power of the range test enough to resolve the matter. In many cases this may not be possible, and you will simply have to accept the fact that your subsets overlap.
Group    Mean     Std Dev   SEM
Col 1    43.200   1.924     0.860
Col 2    44.200   1.924     0.860
Col 3    42.200   1.924     0.860
Col 4    53.200   1.924     0.860
Col 5    33.200   1.924     0.860
All Pairwise Multiple Comparison Procedures (Tukey Test):
Comparisons for factor:
Comparison         Diff of Means   p   q        P<0.05
Col 4 vs. Col 5    20.000          5   23.250   Yes
Col 4 vs. Col 3    11.000          5   12.787   Yes
Col 4 vs. Col 1    10.000          5   11.625   Yes
Col 4 vs. Col 2     9.000          5   10.462   Yes
Col 2 vs. Col 5    11.000          5   12.787   Yes
Col 2 vs. Col 3     2.000          5    2.325   No
Col 2 vs. Col 1     1.000          5    1.162   No
Col 1 vs. Col 5    10.000          5   11.625   Yes
Col 1 vs. Col 3     1.000          5    1.162   No
Col 3 vs. Col 5     9.000          5   10.462   Yes
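Although you are not responsible for the formulas, it can help to see where the q column comes from. The sketch below assumes q is the difference of means divided by sqrt(MSE / n), with the error mean square taken as the common group variance (every group has the same standard deviation and, apparently, the same sample size). The sample size is not printed in the excerpt, so it is recovered here from SEM = SD / sqrt(n). Small discrepancies from the printed values are expected because the SD and SEM are rounded.

```python
import math

sd = 1.924   # standard deviation reported for every group
sem = 0.860  # standard error of the mean reported for every group

# SEM = SD / sqrt(n), so the per-group sample size can be recovered.
n = round((sd / sem) ** 2)

# Assumed: with equal n and equal SDs, the error mean square is the common variance.
mse = sd ** 2
se = math.sqrt(mse / n)

def q_statistic(diff_of_means):
    """Studentized range statistic for one pairwise comparison."""
    return diff_of_means / se

print("n per group:", n)
for diff in (20.0, 11.0, 10.0, 9.0, 2.0, 1.0):
    print(f"diff = {diff:6.3f}  q = {q_statistic(diff):7.3f}")
```

Each q is then compared with a single critical value from a studentized range table; the large gap between the q values near 10 or more and those near 1 or 2 is why the Yes/No column splits so cleanly in this data set.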
The Student-Newman-Keuls test (chapter 11.2) is very similar to the Tukey test, and its output is interpreted in essentially the same way. Dunnett's test (chapter 11.4) is a multiple comparison test that allows you to designate one group as a "control"; it then determines whether each of the other groups is significantly different from the control.
In addition to these tests for ANOVA, if you are performing a Kruskal-Wallis test, there is a non-parametric analog that uses ranks to determine non-significant subsets of the groups.
SigmaStat can do a variety of these multiple comparison tests. Go back and rerun the SigmaStat run in chapter 10, and instead of choosing a Tukey test, choose one of the others and compare the output. Pay particular attention to whether any of the non-significant subsets change with the different multiple comparison tests.