ANOVA (Analysis of Variance) is a technique used to test the equality of the means of more than two populations. For testing the difference between the means of two populations, the t-test is easier to use, although the ANOVA technique could also be used. Analysis of variance tests the significance of differences among means by comparing variances using the F test. The technique was initially used in agricultural experiments, such as testing the efficacy of different types of fertilizers or different methods of feeding animals. It is now used in many other fields, for example to compare the sales achieved with different sales techniques or the effectiveness of drugs manufactured by different companies in arresting or curing diseases.

ANOVA is a method used to test hypotheses about the mean of a dependent variable across datasets. It compares the means of three or more groups by analyzing the variance within and between the datasets. ANOVA is also used in the Analyze phase of DMAIC to compare multiple sets of data simultaneously.

The underlying principle of Analysis of Variance is as follows:
  1. If the null hypothesis that the three population means (μ1, μ2, μ3) are equal is true, then both the variation among the sample means (X1, X2, X3) and the variation within groups are results of sampling error.
  2. The first type, the variation among the three means, is called "Between Sample Variation". It is the variation of the sample means X1, X2 and X3 around the grand mean XGM.
  3. The second type, known as "Within Sample Variation", is the variation observed within each sample around its respective mean X1, X2 or X3.
  4. When the population means are equal, the two types of variation, between sample and within sample, are not expected to differ significantly after allowing for sampling fluctuations, that is, after adjusting for degrees of freedom.
  5. On the other hand, when the null hypothesis is false, i.e. when the population means are different, the between sample variation should significantly exceed the within sample variation.

Thus a comparison of "between sample variation" and "within sample variation" is used to explain the differences among the sample means and leads to an inference about the equality of the population means.
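
The comparison rests on a standard identity that splits the total variation of the data around the grand mean into the two parts described above. As a sketch, write $X_{ij}$ for the j-th observation in group i, $n_i$ for the size of group i, $\bar{X}_i$ for the group mean and $\bar{X}_{GM}$ for the grand mean (the symbols $X_{ij}$ and $n_i$ are notation assumed here for illustration):

$$\sum_{i}\sum_{j}\left(X_{ij} - \bar{X}_{GM}\right)^2 = \sum_{i} n_i\left(\bar{X}_i - \bar{X}_{GM}\right)^2 + \sum_{i}\sum_{j}\left(X_{ij} - \bar{X}_i\right)^2$$

The first term on the right is the between sample variation and the second is the within sample variation, so a large first term relative to the second points towards unequal population means.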

ANOVA Assumptions

The assumptions made when using the ANOVA technique are:

  1. The samples are drawn independently from the populations.
  2. The populations are normally distributed.
  3. The variances of all the populations are equal.

There are two types of ANOVA in use:
  • One-Way Classification: The observations are classified according to one factor, whose levels are shown column-wise.
  • Two-Way Classification: The observations are classified according to two factors, one column-wise and the other row-wise, in a two-way table.

The following steps are followed in One-Way ANOVA for testing the equality of the means of, say, three populations.

Step 1:
State the null and alternative hypotheses and identify the claim.
H0 : μ1 = μ2 = μ3.
Ha : At least one mean is different from the others.

Step 2:
Find the critical value for the F test, that is, the value $F_{d.f.N,\,d.f.D}$ from the F table corresponding to the given significance level.
Degrees of freedom for the numerator, d.f.N = k - 1, where k is the number of groups (k = 3 if three populations are considered).
Degrees of freedom for the denominator, d.f.D = N - k, where N is the total number of data values in all the samples.
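
When three groups are compared, d.f.N = 2, and for that special case the upper tail of the F distribution has a closed form, P(F > x) = (1 + 2x / d.f.D)^(-d.f.D / 2). The sketch below uses it to compute the critical value without a table. The function names are illustrative, and the formula holds only for d.f.N = 2; for any other numerator degrees of freedom an F table or a statistics library is needed.

```python
def f_critical_dfn2(alpha: float, df_d: int) -> float:
    """Upper-tail critical value of F(2, df_d).

    Assumption: numerator degrees of freedom are exactly 2 (three groups).
    Obtained by solving (1 + 2x/df_d) ** (-df_d/2) = alpha for x.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if df_d <= 0:
        raise ValueError("df_d must be positive")
    return (df_d / 2) * (alpha ** (-2 / df_d) - 1)


def f_upper_tail_dfn2(f_value: float, df_d: int) -> float:
    """P(F > f_value) for F(2, df_d); again valid only for d.f.N = 2."""
    return (1 + 2 * f_value / df_d) ** (-df_d / 2)


# Three groups and 12 observations in total: d.f.N = 2, d.f.D = 9.
print(round(f_critical_dfn2(0.05, 9), 2))  # 4.26
```

The second function gives the tail probability of an observed F value, which is an alternative way to make the decision: reject the null hypothesis when that probability is below the significance level.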

Step 3: Procedure for computing the test value:
  1. Write the sample data of each group in separate columns, then find the mean and variance of each sample group and the grand mean $\bar{X}_{GM}$.
  2. Find the sum of squares between the groups, SSB, and the sum of squares within groups, SSW, using the formulas $SSB = \sum n_i(\bar{X}_i - \bar{X}_{GM})^2$ and $SSW = \sum (n_i - 1)S_i^2$, where $n_i$, $\bar{X}_i$ and $S_i^2$ are the size, mean and variance of group i.
  3. Find the between group variance $S_{B}^{2}$ using the formula $S_{B}^{2} = SSB/(k - 1)$. The between group variance is also denoted by MSB, read as "mean square between".
  4. Find the within group variance $S_{W}^{2}$ using the formula $S_{W}^{2} = SSW/(N - k)$. The within group variance is also denoted by MSW, read as "mean square within".
  5. Find the F test value $F = \frac{S_{B}^{2}}{S_{W}^{2}}$.
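
The five sub-steps above can be sketched as one small Python function. This is a minimal illustration using only the standard library; the function name and the sample data are hypothetical, and it assumes every group has at least two observations and that the within group variance is not zero.

```python
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return (SSB, SSW, MSB, MSW, F) for a list of samples, one list per group."""
    k = len(groups)
    if k < 2 or any(len(g) < 2 for g in groups):
        raise ValueError("need at least two groups with two or more values each")
    n_total = sum(len(g) for g in groups)

    # Sub-step 1: group means, group variances and the grand mean.
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Sub-step 2: sums of squares between and within groups.
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((len(g) - 1) * variance(g) for g in groups)  # sample variance, n - 1

    # Sub-steps 3 and 4: mean squares (between and within group variances).
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)

    # Sub-step 5: the F test value.
    return ssb, ssw, msb, msw, msb / msw


# Hypothetical data, chosen so the arithmetic is easy to follow by hand.
ssb, ssw, msb, msw, f_value = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(ssb, 2), round(ssw, 2), round(f_value, 2))  # 26.0 6.0 13.0
```

The returned F value is then compared with the critical value from Step 2.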

Step 4: Make the decision.
The null hypothesis is to be rejected if the F test value calculated is greater than the F critical value.

Step 5: Present the calculations in an ANOVA table and summarize the result verbally.

ANOVA Table


The format of the ANOVA Summary Table is given below:

| Source | Sum of Squares | Degrees of Freedom (d.f.) | Mean Squares | F |
|---|---|---|---|---|
| Between | SSB | k - 1 | MSB | MSB / MSW |
| Within | SSW | N - k | MSW | |
| Total | SSB + SSW | N - 1 | | |

A solved example is given below.

Solved Example

Question: The following table gives the yields (in units of 100 kg per hectare) of three varieties of wheat, each grown in four plots. Test the claim that there is no difference among the mean yields of the wheat varieties at the 5% significance level. Also make an ANOVA table for the given data.

| Plot # | Variety A | Variety B | Variety C |
|---|---|---|---|
| 1 | 7 | 6 | 6 |
| 2 | 8 | 6 | 5 |
| 3 | 4 | 4 | 4 |
| 4 | 8 | 7 | 5 |

Solution:
Step 1:

H0 : μ1 = μ2 = μ3 (claim: the mean yields are equal)
Ha : At least one mean is different from the others.

Step 2: Find the critical value
N = 12 and k = 3
d.f.N = k - 1 = 2
d.f.D = N - k = 12 - 3 = 9
Critical value $F_{2,9}$ at α = 0.05 from the F table = 4.26

Step 3: Computing the test value
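To calculate the mean and variance of each group, a table is made as follows:

| | $X_1$ | $X_2$ | $X_3$ | $X_1^2$ | $X_2^2$ | $X_3^2$ |
|---|---|---|---|---|---|---|
| | 7 | 6 | 6 | 49 | 36 | 36 |
| | 8 | 6 | 5 | 64 | 36 | 25 |
| | 4 | 4 | 4 | 16 | 16 | 16 |
| | 8 | 7 | 5 | 64 | 49 | 25 |
| Total | 27 | 23 | 20 | 193 | 137 | 102 |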

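Means for each category:
$\bar{X}_1 = \frac{27}{4} = 6.75$, $\bar{X}_2 = \frac{23}{4} = 5.75$, $\bar{X}_3 = \frac{20}{4} = 5$
Computed variances for each category, using $S_i^2 = \frac{\sum X_i^2 - (\sum X_i)^2 / n_i}{n_i - 1}$ with the column totals above (for example, $S_1^2 = \frac{193 - 27^2/4}{3}$):
$S_{1}^{2} = 3.58$, $S_{2}^{2} = 1.58$, $S_{3}^{2} = 0.67$
Grand mean $\bar{X}_{GM} = \frac{70}{12} = 5.83$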
Sum of Squares Between Groups $SSB = \sum n_i(\bar{X}_i - \bar{X}_{GM})^2 = 4(6.75 - 5.83)^2 + 4(5.75 - 5.83)^2 + 4(5 - 5.83)^2 = 6.17$
Sum of Squares Within Groups $SSW = \sum (n_i - 1)S_i^2 = 3(3.58) + 3(1.58) + 3(0.67) = 17.49$
Variance Between Groups $S_{B}^{2} = SSB/(k - 1) = \frac{6.17}{2} = 3.085$
Variance Within Groups $S_{W}^{2} = SSW/(N - k) = \frac{17.49}{9} = 1.943$
F test value $= \frac{S_{B}^{2}}{S_{W}^{2}} = \frac{3.085}{1.943} = 1.59$

Step 4:
The computed test value 1.59 < the critical value 4.26
There is not sufficient evidence to reject the claim that the mean yields are equal for the three varieties of wheat A, B and C.
The ANOVA Summary Table for the test is as follows:

| Source | Sum of Squares | Degrees of Freedom (d.f.) | Mean Squares | F |
|---|---|---|---|---|
| Between | 6.17 | 2 | 3.085 | 1.59 |
| Within | 17.49 | 9 | 1.943 | |
| Total | 23.66 | 11 | | |
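
The arithmetic of this example can be cross-checked with a short script. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and the critical value 4.26 is taken from the F table lookup in Step 2 rather than computed. Because the script works with unrounded means and variances, its output may differ slightly in the last digit from hand calculations based on rounded intermediate values.

```python
from statistics import mean, variance

# Yields from the question, one list per wheat variety.
yields = {"A": [7, 8, 4, 8], "B": [6, 6, 4, 7], "C": [6, 5, 4, 5]}

k = len(yields)                                       # number of groups
all_values = [x for v in yields.values() for x in v]
n_total = len(all_values)                             # N
grand_mean = sum(all_values) / n_total

# Sums of squares between groups, within groups, and in total.
ssb = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in yields.values())
ssw = sum((len(v) - 1) * variance(v) for v in yields.values())
sst = sum((x - grand_mean) ** 2 for x in all_values)

msb = ssb / (k - 1)        # between group variance
msw = ssw / (n_total - k)  # within group variance
f_value = msb / msw

critical_value = 4.26      # F(2, 9) at alpha = 0.05, from the F table
reject_h0 = f_value > critical_value

print(f"{'Source':<8}{'SS':>8}{'d.f.':>6}{'MS':>8}{'F':>8}")
print(f"{'Between':<8}{ssb:>8.2f}{k - 1:>6}{msb:>8.3f}{f_value:>8.2f}")
print(f"{'Within':<8}{ssw:>8.2f}{n_total - k:>6}{msw:>8.3f}")
print(f"{'Total':<8}{sst:>8.2f}{n_total - 1:>6}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Computing the total sum of squares directly from the raw data gives an independent check, since it should equal the sum of the between and within sums of squares.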