Chapter 10

Often, you will have an experimental design that includes comparisons among more than two samples. For example, suppose you wish to test whether the mean raccoon weight is the same at five locations (five samples from five different locations). The statistical tools you have learned to this point will not allow you to do this. You might be tempted to test whether the means of the five populations are equal by doing a two-sample t-test for every possible pair of locations. However, this would lead to a large increase in the chance of making a Type I error.

The problem is that each t-test has a 5% chance of falsely rejecting a true null hypothesis (a Type I error). You would need to do 10 such tests in order to compare all possible pairs of the five locations, with each test having a 5% chance of a Type I error. For a series of 10 t-tests with alpha set at 0.05, there is about a 40% chance that at least one of the tests results in a Type I error (see table 10.1). Thus you can see that it becomes very likely that you will conclude that there is a difference among these populations when, in fact, there is none. We clearly need a test that can make simultaneous comparisons among the five populations without increasing the chance of making a Type I error. This method is called analysis of variance (ANOVA).
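To see how quickly the risk grows with the number of tests, here is a short Python sketch. It assumes, for simplicity, that the tests are independent (pairwise t-tests on the same data are not strictly independent, so this is an approximation):

```python
# Familywise error rate: probability of at least one Type I error
# across m independent tests, each run at the same alpha level.
def familywise_error(m, alpha=0.05):
    return 1 - (1 - alpha) ** m

for m in (1, 3, 10, 20):
    print(m, round(familywise_error(m), 3))
# 1 0.05
# 3 0.143
# 10 0.401
# 20 0.642
```

With 10 pairwise tests the chance of at least one false rejection is already about 40%, far above the nominal 5%.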

ANOVA, as the name suggests, works by comparing two estimates of the population variance. One estimate is the "average variance" based upon the variances of each group. This term is essentially like the pooled variance term in a two-sample t-test. The other estimate is based upon the calculation of a standard error that measures how much each sample mean differs from the grand mean of the data. This standard error can then be used to estimate a population variance. If the five samples of raccoons above came from populations whose means are equal, then these two variances will estimate the same quantity and thus be very close to equal. However, if the means of the populations are different, then the variance estimated from the standard error will be much larger than the "pooled variance". The ratio of these two variances provides a value (the F-value) with which we can evaluate the null hypothesis.

We will now look at these two variances. Each will be calculated in a way that shows the source of the variation. Bear in mind, however, that the actual method of calculating these values is very different from the method I will use first; this first method will help to explain how ANOVA works. We will use the data in the table below.

Note that there are seven groups with five observations in each group. There are several quantities that we will use for the subsequent calculations, and these are defined below.

 | group 1 | group 2 | group 3 | group 4 | group 5 | group 6 | group 7 |
---|---|---|---|---|---|---|---|
 | 41 | 48 | 40 | 40 | 49 | 40 | 41 |
 | 44 | 49 | 50 | 39 | 41 | 48 | 46 |
 | 48 | 49 | 44 | 46 | 50 | 51 | 54 |
 | 43 | 49 | 48 | 46 | 39 | 47 | 44 |
 | 42 | 45 | 50 | 41 | 42 | 51 | 42 |
sum of the Xs | 218 | 240 | 232 | 212 | 221 | 237 | 227 |
Xbar | 43.6 | 48.0 | 46.4 | 42.4 | 44.2 | 47.4 | 45.4 |
Xbar^2 | 1900.96 | 2304.00 | 2152.96 | 1797.76 | 1953.64 | 2246.76 | 2061.16 |
sum X^2 | 9534 | 11532 | 10840 | 9034 | 9867 | 11315 | 10413 |
sum of squares | 29.2 | 12.0 | 75.2 | 45.2 | 98.8 | 81.2 | 107.2 |

Sum of the Xs is simply the sum of the observations for each group. Xbar is the sample mean for each group. Xbar^2 is the sample mean squared. Sum X^2 is the sum of the squared observations. Sum of squares is the sum of squares for each group, as previously defined when we discussed the variance.

The first variance, the "average variance" based upon the variances of each group, could be called the within groups variance; however, the more common term is the within groups mean square.
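The quantities in the table can be checked directly in Python. This is a sketch of the bookkeeping only (the variable names are my own); each group is one column of the data table:

```python
# Per-group summary quantities for the 7 groups of 5 observations.
groups = [
    [41, 44, 48, 43, 42],  # group 1
    [48, 49, 49, 49, 45],  # group 2
    [40, 50, 44, 48, 50],  # group 3
    [40, 39, 46, 46, 41],  # group 4
    [49, 41, 50, 39, 42],  # group 5
    [40, 48, 51, 47, 51],  # group 6
    [41, 46, 54, 44, 42],  # group 7
]

n = 5                                              # observations per group
sums = [sum(g) for g in groups]                    # sum of the Xs
means = [s / n for s in sums]                      # Xbar
sum_x2 = [sum(x * x for x in g) for g in groups]   # sum X^2
# sum of squares for each group: sum X^2 - (sum of the Xs)^2 / n
ss = [sx2 - s * s / n for sx2, s in zip(sum_x2, sums)]

print(sums)                        # [218, 240, 232, 212, 221, 237, 227]
print([round(m, 1) for m in means])  # [43.6, 48.0, 46.4, 42.4, 44.2, 47.4, 45.4]
print([round(v, 1) for v in ss])     # [29.2, 12.0, 75.2, 45.2, 98.8, 81.2, 107.2]
```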

To calculate this we first need the within groups sum of squares, which is simply the sum of the seven group sums of squares from the table above: 29.2 + 12.0 + 75.2 + 45.2 + 98.8 + 81.2 + 107.2 = 448.8.

Next we need the within groups degrees of freedom. Each group contributes n - 1 = 4 degrees of freedom, so for seven groups the within groups degrees of freedom is 7 x 4 = 28. Dividing gives the within groups mean square: 448.8 / 28 = 16.029.

The second variance is based upon the difference of each group mean from the grand mean. The grand mean is the mean of all 35 observations: 1587 / 35 = 45.343. We first calculate the sum of the squared deviations of the group means from the grand mean, Sum(Xbar - grand mean)^2 = 25.417. Because each mean is based on n = 5 observations, multiplying by 5 puts this quantity on the scale of single observations and gives the among groups sum of squares: 5 x 25.417 = 127.086. With 7 - 1 = 6 degrees of freedom among groups, the among groups mean square is 127.086 / 6 = 21.181.

We now have two estimates of the population variance, one based upon the variance within the groups and one based upon the variance among the means. If the means of the groups are equal, then these two variances should be approximately equal. If the means of the groups are not equal, then the among groups variance term should be much larger than the within groups variance.

To evaluate the relative magnitude of these two numbers, we will calculate a ratio with the among groups mean square in the numerator and the within groups mean square in the denominator. This ratio is called an F-value: F = 21.181 / 16.029 = 1.32. The observed F-value is compared with a critical value from the F distribution (here with 6 and 28 degrees of freedom) to decide whether to reject the null hypothesis that all of the group means are equal.
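Putting the pieces together, the two mean squares and the F-value for the data set above can be computed in a few lines of Python (a sketch of the logic just described; the variable names are my own):

```python
# Within groups and among groups mean squares, and the resulting F-value,
# for the seven-group data set above (one list per group/column).
groups = [
    [41, 44, 48, 43, 42], [48, 49, 49, 49, 45], [40, 50, 44, 48, 50],
    [40, 39, 46, 46, 41], [49, 41, 50, 39, 42], [40, 48, 51, 47, 51],
    [41, 46, 54, 44, 42],
]
k = len(groups)        # number of groups (7)
n = len(groups[0])     # observations per group (5)
N = k * n              # total number of observations (35)

# Within groups: pool the sum of squares from every group.
ss_within = sum(sum((x - sum(g) / n) ** 2 for x in g) for g in groups)
df_within = k * (n - 1)
ms_within = ss_within / df_within

# Among groups: squared deviations of the group means from the grand
# mean, scaled by n so it estimates the variance of single observations.
grand_mean = sum(sum(g) for g in groups) / N
ss_among = n * sum((sum(g) / n - grand_mean) ** 2 for g in groups)
df_among = k - 1
ms_among = ss_among / df_among

f_value = ms_among / ms_within
print(round(ms_within, 3), round(ms_among, 3), round(f_value, 2))
# → 16.029 21.181 1.32
```

These match the values in the ANOVA table at the end of the chapter.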

An important point to remember is that the sum of squares and degrees of freedom are additive. The within groups sum of squares plus the among groups sum of squares equals the total sum of squares (equation 10.8). In addition, the within groups degrees of freedom plus the among groups degrees of freedom equals the total degrees of freedom (equation 10.9). Using the formulas presented in table 10.2, we can summarize our results in what is called an ANOVA table.

source of variation | sum of squares | degrees of freedom | mean square | F-value |
---|---|---|---|---|
among groups | 127.086 | 6 | 21.181 | 1.32 |
within groups | 448.800 | 28 | 16.029 | |
total | 575.886 | 34 | | |

On tests, I will not expect you to be able to generate the sum of squares values from raw data; however, I will expect you to be able to do the calculations within the ANOVA table. For instance, if I gave you the sums of squares and the sample sizes, you could determine the correct degrees of freedom, ultimately calculate an F-value, and then determine whether to reject the null hypothesis.
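That exam-style calculation can be sketched as a small function (the function name and layout are my own, not part of the chapter): given the sums of squares, the number of groups, and the observations per group, the rest of the ANOVA table follows.

```python
def anova_from_ss(ss_among, ss_within, k, n_per_group):
    """Fill in the remaining ANOVA-table entries from the sums of
    squares, the number of groups k, and the observations per group."""
    df_among = k - 1                     # among groups degrees of freedom
    df_within = k * (n_per_group - 1)    # within groups degrees of freedom
    ms_among = ss_among / df_among       # among groups mean square
    ms_within = ss_within / df_within    # within groups mean square
    f_value = ms_among / ms_within
    return df_among, df_within, ms_among, ms_within, f_value

# The worked example from this chapter: 7 groups of 5 observations.
df_a, df_w, ms_a, ms_w, f = anova_from_ss(127.086, 448.800, 7, 5)
print(df_a, df_w, round(ms_a, 3), round(ms_w, 3), round(f, 2))
# → 6 28 21.181 16.029 1.32
```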

Use SigmaStat to do the following exercise in ANOVA.

Last updated on 21 July 1999.