ANOVA is an acronym for ANalysis Of VAriance. An ANOVA segregates the different sources of variation seen in experimental results. Some of the sources are "explained" (usually due to the treatments the experimenter applied), while the remainder are lumped together as "unexplained" variation (also called the "Error term"). An ANOVA then tests whether the variation associated with an explained source is large relative to the unexplained variation. If that ratio (the F statistic) is so large that a value at least that large would rarely arise by chance alone (for example, P<=0.05), we can conclude (at that level of probability) that the source of variation had a significant effect.
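In symbols, the test described above can be written as follows. The notation (SS for sum of squares, df for degrees of freedom, MS for mean square) is the conventional textbook one and matches the column headings of an ANOVA table; it is a sketch of the usual F test with the Error term as denominator, not a statement of how CoStat computes it internally.

```latex
\[
MS_{\text{source}} = \frac{SS_{\text{source}}}{df_{\text{source}}}, \qquad
MS_{\text{error}} = \frac{SS_{\text{error}}}{df_{\text{error}}}, \qquad
F = \frac{MS_{\text{source}}}{MS_{\text{error}}}
\]
\[
P = \Pr\left( F_{df_{\text{source}},\, df_{\text{error}}} \ge F \right)
\]
```

A small P means that an F ratio this large would be unusual if the source had no effect.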
For example, consider an experiment where three varieties of wheat were grown at four locations. At each of the locations, there were four blocks, within each of which were small plots for each of the varieties. The yield of each plot was measured. We wish to know if there is a significant difference in yield associated with the different varieties (one source of variation). We also wish to know if one location was superior to another. Finally, we wish to know if some varieties are superior at one location but inferior at another (that is, if there is an interaction of variety and location). The ANOVA procedure will answer these questions.
The layout of the various test plots and the method of assigning treatments to those plots constitute the "experimental design." The wheat experiment, for example, is a "randomized complete blocks" experiment; all of the treatments occur once, randomly arranged, in each block. Experimental designs can vary greatly. Each design requires a slightly different mathematical model and a slightly different procedure for analysis. Extensive discussions of different experimental designs and different ANOVA procedures can be found in statistics texts such as Gomez and Gomez (1984), Little and Hills (1978), Snedecor and Cochran (1980), and Sokal and Rohlf (1995).
CoStat can handle virtually any type of experimental design. It has a large number of pre-defined models that you can pick from a list (including 1, 2, 3 and 4 way completely randomized, 1 and 2 way randomized blocks, latin square, nested, split plot, split-split plot, split block, and some covariance designs). Alternatively, you can use a special language to describe other models.
Because the ANOVA procedure uses a General Linear Models (GLM) approach, it can analyze unbalanced designs and experiments with missing values. It can calculate Type I, II, or III Sums of Squares.
Before performing the ANOVA, CoStat performs Bartlett's test for homogeneity of variances, one of the assumptions of ANOVA.
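As a rough illustration of what Bartlett's test measures, the sketch below computes the standard textbook form of the statistic in Python. Whether CoStat uses exactly this form is an assumption, and the function name bartlett_statistic is hypothetical. The example groups are the Butte yields from the wheat data set, grouped by variety only and ignoring blocks, purely for illustration; this is not the grouping CoStat uses in the sample run.

```python
import math
from statistics import variance


def bartlett_statistic(groups):
    """Return (statistic, df) for Bartlett's test, textbook form.

    The statistic is compared against a chi-square distribution with
    k - 1 degrees of freedom, where k is the number of groups.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    if k < 2 or min(sizes) < 2:
        # A group with one value has no variance estimate.
        raise ValueError("need at least 2 groups with at least 2 values each")
    variances = [variance(g) for g in groups]  # sample variances (n - 1)
    n_total = sum(sizes)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction, k - 1


# Illustration only: Butte yields grouped by variety, blocks ignored.
butte = {
    "Dwarf": [58.77, 58.98, 53.73, 62.08],
    "Semi-dwarf": [39.8, 41.4, 53.35, 39.08],
    "Normal": [24.33, 20.66, 24.22, 20.68],
}
statistic, df = bartlett_statistic(list(butte.values()))
print(f"Bartlett statistic = {statistic:.4f} with {df} df")
```

The ValueError branch mirrors the situation in designs without replicates: when every group holds a single value, no within-group variance can be estimated and the test cannot be run.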
After performing the ANOVA, the procedure can automatically run a means comparison test (also called a multiple comparison of means), such as Duncan's, Student-Newman-Keuls (SNK), Tukey-Kramer, Tukey's HSD, or Least Significant Difference (LSD).
Contrasts are related to multiple comparisons of means, but the tests are done during the ANOVA procedure. Contrasts are comparisons of different subsets of means and are planned before the experiment is conducted. You can specify any contrasts that you want. For example, you might test the control against all other treatments. Contrasts are also called a priori comparisons, planned comparisons, and orthogonal contrasts (which indicates there is no overlap between the statistical questions asked by several contrasts).
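A minimal Python sketch of the idea, using the three variety means (n = 16 each) reported in Sample Run 4. The two contrasts are hypothetical choices made for illustration, and the sum-of-squares formula is the standard one for equally replicated means; none of this is CoStat's contrast syntax.

```python
# Variety means and replicates per mean, from Sample Run 4's output.
means = {"Dwarf": 37.446875, "Semi-dwarf": 31.34375, "Normal": 23.20625}
n_per_mean = 16

# Hypothetical planned contrasts (chosen for illustration only).
contrasts = {
    "Dwarf vs. the other two": {"Dwarf": 2, "Semi-dwarf": -1, "Normal": -1},
    "Semi-dwarf vs. Normal": {"Dwarf": 0, "Semi-dwarf": 1, "Normal": -1},
}


def contrast_ss(coefs):
    """Sum of squares (1 df) for a contrast among equally replicated means."""
    estimate = sum(c * means[name] for name, c in coefs.items())
    return n_per_mean * estimate ** 2 / sum(c ** 2 for c in coefs.values())


def is_orthogonal(a, b):
    """With equal replication, two contrasts are orthogonal when the
    products of their coefficients sum to zero."""
    return sum(a[name] * b[name] for name in a) == 0


ss = {label: contrast_ss(coefs) for label, coefs in contrasts.items()}
first, second = contrasts.values()

for label, value in ss.items():
    print(f"{label}: SS = {value:.4f}")
print("orthogonal:", is_orthogonal(first, second))
print(f"sum of contrast SS = {sum(ss.values()):.4f}")
```

Because the two contrasts are orthogonal and use up both degrees of freedom for Variety, their sums of squares add up to the Variety sum of squares in the sample run's ANOVA table (1633.3997).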
CoStat's manual has:
Sample Run 4 - 2 Way Randomized Blocks ANOVA
In a randomized blocks design, the experimental units are in groups called blocks. Usually, each block contains one replicate of each combination of treatments, in random order. Thus, there is one restriction on randomization. Such experiments are useful in fields with naturally high variability along one axis (for example, due to irrigation). The ANOVA segregates this variability so that differences between treatments are not hidden by differences among the blocks (presumably, the variability within blocks is much less). This is a randomized "complete" blocks design because each block contains one replicate of each of the treatment combinations. In CoStat, the experiments need not be complete; there can be missing data points (by design or by accident). Also, CoStat allows for more than one replicate per treatment combination per block.
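The single restriction on randomization can be sketched in Python: treatments are shuffled independently within each block, so every block still contains each treatment exactly once. The function name randomize_blocks and the seed are hypothetical; the varieties and the four blocks follow the wheat example at one location.

```python
import random

varieties = ["Dwarf", "Semi-dwarf", "Normal"]
blocks = [1, 2, 3, 4]


def randomize_blocks(treatments, blocks, seed=None):
    """Return {block: treatments in random plot order}.

    Each block gets its own independent shuffle, so every treatment
    appears exactly once per block (a complete block).
    """
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        layout[block] = order
    return layout


# The seed is fixed only to make the illustration repeatable.
layout = randomize_blocks(varieties, blocks, seed=1)
for block, order in layout.items():
    print(f"Block {block}: {', '.join(order)}")
```

In a completely randomized design, by contrast, all twelve plots would be shuffled together with no guarantee that each block receives every variety.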
The sample run demonstrates a 2 way (also known as "2 factor") randomized blocks design.
Here is the ANOVA model for a 2 Way Randomized Blocks ANOVA (2WRB.aov):
\\\CoStat.AOV 1.00
\\\2 Way Randomized Blocks
\\\"1st Factor" "2nd Factor" "Blocks"
\\\Type III
Blocks \M 3
Main Effects
@1 \M 1
@2 \M 2
Interaction
@1 x @2 \I 1 2
Error \E
Total \T
In the wheat experiment (modified from Allen, 1981), three varieties of wheat were grown at four locations. At each of the locations, there were four blocks, within each of which were small plots for each of the varieties. The Height and Yield of each plot were measured.
This data set is also important because it demonstrates the use of string indices (Butte, Shelby, ...) instead of the numeric indices (1, 2, 3, ...) that older versions of CoStat required.
PRINT DATA
2000-08-03 09:43:16
Using: C:\cohort6\wheat.dt
First Column: 1) Location
Last Column: 5) Yield
First Row: 1
Last Row: 48

Location  Variety        Block    Height     Yield
--------- ---------- --------- --------- ---------
Butte     Dwarf              1     91.75     58.77
Butte     Dwarf              2        93     58.98
Butte     Dwarf              3     91.75     53.73
Butte     Dwarf              4     92.75     62.08
Butte     Semi-dwarf         1     127.5      39.8
Butte     Semi-dwarf         2     132.5      41.4
Butte     Semi-dwarf         3    127.75     53.35
Butte     Semi-dwarf         4    131.75     39.08
Butte     Normal             1     146.5     24.33
Butte     Normal             2    154.75     20.66
Butte     Normal             3    150.75     24.22
Butte     Normal             4    157.75     20.68
Shelby    Dwarf              1     63.25     25.22
Shelby    Dwarf              2      61.5      26.3
Shelby    Dwarf              3     62.75     21.92
Shelby    Dwarf              4      63.5     27.54
Shelby    Semi-dwarf         1        80     25.97
Shelby    Semi-dwarf         2        80     22.73
Shelby    Semi-dwarf         3      82.5     28.44
Shelby    Semi-dwarf         4     83.75     25.09
Shelby    Normal             1        95     23.77
Shelby    Normal             2        94      18.7
Shelby    Normal             3     96.25      24.9
Shelby    Normal             4      91.5     11.29
Dillon    Dwarf              1        74     39.44
Dillon    Dwarf              2        80     39.37
Dillon    Dwarf              3     78.25     37.99
Dillon    Dwarf              4     78.25     40.69
Dillon    Semi-dwarf         1     106.5     28.42
Dillon    Semi-dwarf         2    110.75     35.13
Dillon    Semi-dwarf         3       110     36.14
Dillon    Semi-dwarf         4    110.75     32.93
Dillon    Normal             1     116.5     24.98
Dillon    Normal             2    116.75     28.62
Dillon    Normal             3    120.25     28.69
Dillon    Normal             4    120.25     26.37
Havre     Dwarf              1      67.5     26.47
Havre     Dwarf              2      72.5     26.22
Havre     Dwarf              3     68.75     26.15
Havre     Dwarf              4     73.75     28.28
Havre     Semi-dwarf         1      90.5     21.13
Havre     Semi-dwarf         2      90.5     24.25
Havre     Semi-dwarf         3      90.5     25.06
Havre     Semi-dwarf         4        96     22.58
Havre     Normal             1     97.75     24.16
Havre     Normal             2      96.5     21.98
Havre     Normal             3       103     25.86
Havre     Normal             4      98.5     22.09
For the sample run, use File : Open to open the file called wheat.dt in the cohort directory. Then:
HOMOGENEITY OF VARIANCES - RAW DATA
2000-07-25 10:16:29
Using: c:\cohort6\wheat.dt
Data Column: 5) Yield
Broken Down By:
2) Variety
1) Location
3) Block
Keep If:
Bartlett's Test tests the homogeneity of variances, an assumption of
ANOVA. Bartlett's Test is known to be overly sensitive to non-normal data.
A resulting probability of P<=0.05 indicates the variances may be not
homogeneous and you may wish to transform the data before doing an ANOVA.
For ANOVA designs without replicates (notably most Randomized Blocks
and Latin Square designs), there is not enough data to do this test.
There is not enough data to do the test.
ANOVA
2000-07-25 10:16:29
Using: c:\cohort6\wheat.dt
.AOV Filename: 2WRB.AOV - 2 Way Randomized Blocks
Y Column: 5) Yield
1st Factor: 2) Variety
2nd Factor: 1) Location
Blocks: 3) Block
Keep If:
Rows of data with missing values removed: 0
Rows which remain: 48
Source df Type III SS MS F P
------------------------- -------- ----------- --------- --------- ----- ---
Blocks 3 39.24825625 13.082752 1.1827612 .3313 ns
Main Effects
Variety 2 1633.399687 816.69984 73.834688 .0000 ***
Location 3 2539.06904 846.35635 76.515818 .0000 ***
Interaction
Variety x Location 6 1387.188179 231.19803 20.901724 .0000 ***
Error 33 365.0194188 11.061195<-
------------------------- -------- ----------- --------- --------- ----- ---
Total 47 5963.924581
Model 14 5598.905163 399.9218 36.15539 .0000 ***
R^2 = SSmodel/SStotal = 0.93879543348
Root MSerror = sqrt(MSerror) = 3.32583741448
Mean Y = 30.665625
Coefficient of Variation = (Root MSerror) / abs(Mean Y) * 100% = 10.84549%
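The ANOVA table and summary statistics above can be cross-checked with a short Python sketch. It uses the classical balanced-design formulas (group means versus the grand mean), not CoStat's GLM routine; the two agree here only because the wheat data are balanced with no missing values. The variable and function names are hypothetical, and the yields are copied from the wheat.dt listing.

```python
from itertools import product

locations = ["Butte", "Shelby", "Dillon", "Havre"]
varieties = ["Dwarf", "Semi-dwarf", "Normal"]
blocks = [1, 2, 3, 4]

# Yield values from wheat.dt in printed row order:
# location, then variety, then block.
yields = [
    58.77, 58.98, 53.73, 62.08,  39.80, 41.40, 53.35, 39.08,
    24.33, 20.66, 24.22, 20.68,  25.22, 26.30, 21.92, 27.54,
    25.97, 22.73, 28.44, 25.09,  23.77, 18.70, 24.90, 11.29,
    39.44, 39.37, 37.99, 40.69,  28.42, 35.13, 36.14, 32.93,
    24.98, 28.62, 28.69, 26.37,  26.47, 26.22, 26.15, 28.28,
    21.13, 24.25, 25.06, 22.58,  24.16, 21.98, 25.86, 22.09,
]
rows = [
    (loc, var, blk, y)
    for (loc, var, blk), y in zip(product(locations, varieties, blocks), yields)
]
n = len(rows)
grand_mean = sum(r[3] for r in rows) / n


def ss_between(key):
    """Sum of squares of group means around the grand mean."""
    groups = {}
    for r in rows:
        groups.setdefault(key(r), []).append(r[3])
    return sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
    )


ss_total = sum((r[3] - grand_mean) ** 2 for r in rows)
ss_blocks = ss_between(lambda r: r[2])
ss_variety = ss_between(lambda r: r[1])
ss_location = ss_between(lambda r: r[0])
# Interaction = variation among the 12 cell means not due to main effects.
ss_inter = ss_between(lambda r: (r[0], r[1])) - ss_variety - ss_location
ss_error = ss_total - ss_blocks - ss_variety - ss_location - ss_inter

df_error = (n - 1) - 3 - 2 - 3 - 6       # total minus all model df = 33
ms_error = ss_error / df_error
f_variety = (ss_variety / 2) / ms_error
r_squared = (ss_total - ss_error) / ss_total    # SSmodel / SStotal
cv = ms_error ** 0.5 / abs(grand_mean) * 100

print(f"SS blocks={ss_blocks:.4f} variety={ss_variety:.4f} "
      f"location={ss_location:.4f} interaction={ss_inter:.4f} "
      f"error={ss_error:.4f}")
print(f"F(variety)={f_variety:.4f}  R^2={r_squared:.6f}  CV={cv:.5f}%")
```

With unbalanced data or missing values these simple formulas no longer match Type III sums of squares, which is where a GLM approach is needed.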
COMPARE MEANS
Factor: 2) Variety
Test: Student-Newman-Keuls
Variance: 11.0611945076
Degrees of Freedom: 33
Significance Level: 0.05
Keep If:
n Means = 3
LSD 0.05 = 2.39230738434
Rank Mean Name Mean n Non-significant ranges
----- ---------- ------------- ------- ----------------------------------------
1 Dwarf 37.446875 16 a
2 Semi-dwarf 31.34375 16 b
3 Normal 23.20625 16 c
COMPARE MEANS
Factor: 1) Location
Test: Student-Newman-Keuls
Variance: 11.0611945076
Degrees of Freedom: 33
Significance Level: 0.05
Keep If:
n Means = 4
LSD 0.05 = 2.76239862466
Rank Mean Name Mean n Non-significant ranges
----- --------- ------------- ------- ----------------------------------------
1 Butte 41.4233333333 12 a
2 Dillon 33.2308333333 12 b
3 Havre 24.5191666667 12 c
4 Shelby 23.4891666667 12 c
COMPARE MEANS
Factor: 3) Block
Test: Student-Newman-Keuls
Variance: 11.0611945076
Degrees of Freedom: 33
Significance Level: 0.05
Keep If:
n Means = 4
LSD 0.05 = 2.76239862466
Rank Mean Name Mean n Non-significant ranges
----- --------- ------------- ------- ----------------------------------------
1 3 32.2041666667 12 a
2 2 30.3616666667 12 a
3 1 30.205 12 a
4 4 29.8916666667 12 a
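The "LSD 0.05" values printed in the COMPARE MEANS reports above appear consistent with the usual least significant difference formula, t * sqrt(2 * MSerror / n). The Python sketch below checks this under stated assumptions: the critical value 2.0345 is the two-tailed 5% Student's t for 33 degrees of freedom taken from a t table (it does not appear in the output), and the function name lsd is hypothetical. Note that the Student-Newman-Keuls test itself uses studentized-range critical values, which are not reproduced here.

```python
import math

ms_error = 11.0611945076  # "Variance" line in the COMPARE MEANS output
t_crit = 2.0345           # assumed: two-tailed 5% t for 33 df, from a t table


def lsd(n_per_mean):
    """Least significant difference between two means of n values each."""
    return t_crit * math.sqrt(2 * ms_error / n_per_mean)


lsd_variety = lsd(16)    # variety means are based on 16 plots each
lsd_location = lsd(12)   # location and block means are based on 12 plots each

print(f"LSD 0.05, n = 16: {lsd_variety:.4f}")
print(f"LSD 0.05, n = 12: {lsd_location:.4f}")

# Havre and Shelby differ by less than the LSD, in line with their
# sharing the letter "c" in the Location report.
difference = 24.5191666667 - 23.4891666667
print(f"Havre - Shelby = {difference:.2f}, below LSD: {difference < lsd_location}")
```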