Analysis of variance (ANOVA) is one of the most commonly used statistical tests. Developed by R.A. Fisher, it compares the means of more than two groups on some variable to see whether any of them differ. Classical ANOVA is a parametric method; for data that do not meet its assumptions, nonparametric analogues such as the Kruskal-Wallis test are used.
ANOVA has two models:
1) The fixed effects model
2) The random effects model
The one way ANOVA test compares several groups of observations, all of which are independent but possibly with different group means. Two way ANOVA studies the effects of two factors separately (their main effect) and together (their interaction effect).

## Definition

ANOVA is an acronym for analysis of variance. It is viewed as an extension of the t test and is used to test for differences among more than two populations.
Other kinds of ANOVA allow means to be compared across more than one variable. The test works by comparing the differences among group means to a measure of dispersion for the sampling distribution.
Note that ANOVA does not compare variances; it compares means. It examines the relationship between a nominal level independent variable having 3 or more categories and a normally distributed dependent variable measured at the interval or ratio level.

## One Way Analysis of Variance

ANOVA studies the effect of k (>2) levels of a single factor.

The hypotheses below test whether different levels of the factor affect the measured observations differently.

$H_{0}: \mu_{i} = \mu$ for all i = 1, 2, ..., k

$H_{1}: \mu_{i} \neq \mu$ for some i = 1, 2, ..., k

where $\mu_{i}$ is the population mean for level i.
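As a minimal sketch of how this hypothesis is tested, the function below computes the F statistic, that is, the between-group mean square divided by the within-group mean square, in plain Python. The function name `one_way_f` and the toy data are illustrative assumptions, not part of the original text.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy data (hypothetical): three levels of a single factor.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df1, df2 = one_way_f(groups)
print(f_stat, df1, df2)
```

A large F relative to the critical value of the F distribution with (k - 1, n - k) degrees of freedom is evidence against $H_{0}$.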

## Two Way Analysis of Variance

The purpose of two way ANOVA is to understand whether there is an interaction between two independent variables in their effect on the dependent variable. It involves two factors, each with multiple levels; the two independent variables are called factors. The degrees of freedom for each factor are one less than its number of levels. The null hypothesis for the interaction is that there is no interaction between the two factors. The assumptions of normality, equal variances and independent errors apply.
Two possible means models for two way ANOVA are the additive model and the interaction model:
Additive model: the effect of one explanatory variable on the outcome does not depend on the level of the other explanatory variable.

Interaction model: the effect of one explanatory variable on the outcome does depend on the level of the other explanatory variable.
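One informal way to see the distinction is to look at a table of cell means: under the additive model, the difference between any two rows is the same in every column. The sketch below checks this on two hypothetical 2 x 3 tables of cell means; the helper `is_additive` and all the numbers are illustrative assumptions.

```python
def is_additive(cell_means, tol=1e-9):
    """True if every row differs from the first row by a constant across columns."""
    first = cell_means[0]
    for row in cell_means[1:]:
        diffs = [r - f for r, f in zip(row, first)]
        if max(diffs) - min(diffs) > tol:
            return False
    return True

# Hypothetical cell means: rows are levels of one factor, columns levels of the other.
additive = [[10.0, 12.0, 15.0],
            [13.0, 15.0, 18.0]]     # second row is always 3 higher
interaction = [[10.0, 12.0, 15.0],
               [13.0, 12.0, 11.0]]  # row difference changes with the column

print(is_additive(additive))
print(is_additive(interaction))
```

This exact check only makes sense for population or idealised cell means. With sample data the cell means never line up perfectly, so the interaction is assessed with an F test instead.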

## Multivariate Analysis of Variance

Multivariate analysis uses a variety of techniques and helps establish which variable has the greatest impact on the dependent variables. It can check for interactions between the effects of the independent variables. The purpose of multivariate analysis of variance (MANOVA) is to determine whether the response variables are altered by the observer’s manipulation of the independent variables.

The dependent variables should meet parametric requirements. MANOVA is also considered an alternative to repeated measures ANOVA when sphericity is violated. It uses the variances and covariances between variables in testing the statistical significance of the mean differences. We first find the linear combinations of the dependent variables that best separate the groups, then test whether these new variables differ significantly between the groups.

## One and Two Way ANOVA Models

The one way and two way ANOVA models are explained below:

One way ANOVA model:
The one way ANOVA model is given by:

$Y_{ij} = \mu + \tau_{i} + e_{ij}$

$Y_{ij}$: the jth observation on the ith treatment.

$\mu$: overall mean.

$\tau_{i}$: effect of the ith treatment.

$e_{ij}$: errors, normally and independently distributed with mean zero and variance $\sigma^{2}$.
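To make the terms of the one way model concrete, the sketch below splits each observation into an estimated overall mean, an estimated treatment effect and a residual. Using the grand mean to estimate $\mu$, and the group mean minus the grand mean to estimate the treatment effect, is one common convention and is assumed here; the data are toy values.

```python
from statistics import mean

# Toy data (hypothetical): observations grouped by treatment.
data = {"A": [4.0, 5.0, 6.0], "B": [7.0, 8.0, 9.0], "C": [1.0, 2.0, 3.0]}

all_obs = [y for ys in data.values() for y in ys]
mu_hat = mean(all_obs)                                       # estimate of mu
tau_hat = {i: mean(ys) - mu_hat for i, ys in data.items()}   # estimated treatment effects
resid = {i: [y - mu_hat - tau_hat[i] for y in ys]            # estimated errors e_ij
         for i, ys in data.items()}

for i, ys in data.items():
    for y, e in zip(ys, resid[i]):
        # Each observation is rebuilt as mean + treatment effect + residual.
        assert abs(y - (mu_hat + tau_hat[i] + e)) < 1e-12

print(mu_hat, tau_hat)
```

With equal group sizes, as here, the estimated treatment effects sum to zero, and the residuals within each group always sum to zero.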

Two way ANOVA Model:
The model is written as:

$y_{ijk} = \mu + \tau_{j} + \lambda_{k} + (\tau\lambda)_{jk} + e_{ijk}$

$\mu$: grand mean.

$\tau_{j}$: treatment effect for the jth category of the row variable.

$\lambda_{k}$: treatment effect for the kth category of the column variable.

$(\tau\lambda)_{jk}$: interaction effect for the combination of the jth row category and the kth column category.
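In a balanced design the terms of the two way model can be estimated from a table of cell means. The sketch below does this with plain averages, under the usual convention (an assumption here) that row effects, column effects and interaction effects each sum to zero. The function `two_way_effects` and the cell means are hypothetical.

```python
from statistics import mean

def two_way_effects(cell_means):
    """Estimate the grand mean, row effects, column effects and interaction effects."""
    rows = len(cell_means)
    cols = len(cell_means[0])
    mu = mean(m for row in cell_means for m in row)
    # Row effect: row mean minus grand mean.
    tau = [mean(row) - mu for row in cell_means]
    # Column effect: column mean minus grand mean.
    lam = [mean(cell_means[j][k] for j in range(rows)) - mu for k in range(cols)]
    # Interaction: what is left of each cell mean after removing mu, tau_j and lambda_k.
    inter = [[cell_means[j][k] - mu - tau[j] - lam[k] for k in range(cols)]
             for j in range(rows)]
    return mu, tau, lam, inter

# Hypothetical cell means: 2 row categories by 3 column categories.
cell_means = [[10.0, 12.0, 15.0],
              [13.0, 12.0, 11.0]]
mu, tau, lam, inter = two_way_effects(cell_means)
print(mu, tau, lam, inter)
```

If every entry of `inter` is zero, the cell means follow a purely additive pattern. Non-zero entries show how far each cell departs from it.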

## Limitations

Some important limitations of analysis of variance are discussed below:

1) Solving problems with ANOVA is more difficult than with some other techniques.

2) In real world applications it is rare for all the population means of the data groups to be equal, or for all their variances to be equal.

3) It requires the variance of the dependent variable to be roughly the same in each sample.

4) It relies on the normality assumption and tests only the overall null hypothesis. When the null hypothesis is rejected we know that at least one group differs from the others, but in a one way ANOVA with multiple groups it is difficult to tell which group is different.

## Assumptions

The assumptions for one way, two way and multivariate analysis of variance are given below:

1) The populations must have equal variances, that is, a common variance $\sigma^{2}$.

2) Groups should preferably have the same sample size; unequal sizes make the test more sensitive to unequal variances.

3) The populations from which the samples are drawn must be approximately normally distributed.

4) The samples must be independent.

5) In the additive two way model, the effect of one factor is the same at all levels of the other factor.

## Table

For a two way analysis of variance with a rows, b columns and one observation $y_{ij}$ per cell, the entries of the table are computed as follows:

Sum of the b observations in the ith row: $T_{Ai} = \sum_{j} y_{ij}$

Sum of the a observations in the jth column: $T_{Bj} = \sum_{i} y_{ij}$

Sum of all ab observations: $T = \sum_{i}\sum_{j} y_{ij} = \sum_{i} T_{Ai} = \sum_{j} T_{Bj}$

Total sum of squares: $SS_{T} = \sum_{i}\sum_{j} y_{ij}^{2} - \frac{T^{2}}{ab}$

Between rows sum of squares: $SS_{A} = \sum_{i} \frac{T_{Ai}^{2}}{b} - \frac{T^{2}}{ab}$

Between columns sum of squares: $SS_{B} = \sum_{j} \frac{T_{Bj}^{2}}{a} - \frac{T^{2}}{ab}$

Error sum of squares: $SS_{E} = SS_{T} - SS_{A} - SS_{B}$
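The formulas above translate directly into code. The sketch below assumes an a x b table stored as a list of rows with one observation per cell; the function name `two_way_ss` and the small table are illustrative assumptions.

```python
def two_way_ss(y):
    """Sums of squares for an a x b table y[i][j] with one observation per cell."""
    a, b = len(y), len(y[0])
    row_totals = [sum(row) for row in y]                              # T_Ai
    col_totals = [sum(y[i][j] for i in range(a)) for j in range(b)]   # T_Bj
    total = sum(row_totals)                                           # T
    cf = total ** 2 / (a * b)                                         # T^2 / ab
    ss_t = sum(v ** 2 for row in y for v in row) - cf
    ss_a = sum(t ** 2 for t in row_totals) / b - cf
    ss_b = sum(t ** 2 for t in col_totals) / a - cf
    ss_e = ss_t - ss_a - ss_b
    return {"SS_T": ss_t, "SS_A": ss_a, "SS_B": ss_b, "SS_E": ss_e}

# Toy table (hypothetical): a = 2 rows, b = 3 columns.
y = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0]]
ss = two_way_ss(y)
print(ss)
```

Dividing each sum of squares by its degrees of freedom gives the mean squares, and the ratio of a factor's mean square to the error mean square gives its F value.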

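The ANOVA table has the following layout:

| Source | Degrees of freedom | Sum of squares | Mean sum of squares | F |
|---|---|---|---|---|
| Between rows | $a - 1$ | $SS_{A}$ | $MS_{A} = SS_{A}/(a - 1)$ | $MS_{A}/MS_{E}$ |
| Between columns | $b - 1$ | $SS_{B}$ | $MS_{B} = SS_{B}/(b - 1)$ | $MS_{B}/MS_{E}$ |
| Error | $(a - 1)(b - 1)$ | $SS_{E}$ | $MS_{E} = SS_{E}/((a - 1)(b - 1))$ | |
| Total | $ab - 1$ | $SS_{T}$ | | |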

## Examples

Example 1: In an experimental study of the mineral metabolism of pullets, four white pullets of the same strain and hatching were used. During the period of investigation, two pullets were given ration C, which had a high CaO content, whereas the other two were given ration NC, which was comparable to C apart from its low CaO content. In all other respects the pullets were treated alike, and an attempt was made to regulate their daily food consumption. The following table gives the amount of CaO in grams found in the whole eggs laid by the pullets.

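| C1 | C2 | NC3 | NC4 |
|---|---|---|---|
| 2.435 | 2.155 | 2.156 | 2.274 |
| 2.395 | 1.877 | 2.376 | 2.352 |
| 2.235 | 2.163 | 2.147 | 1.690 |
| 2.545 | 2.088 | 1.821 | 1.685 |
| 2.842 | 2.136 | 1.805 | 1.254 |
| 2.749 | 2.071 | 1.858 | 0.833 |
| 2.723 | 1.895 | 1.736 | |
| 2.706 | 1.870 | 1.758 | |
| 2.586 | 1.724 | | |
| 2.479 | 1.353 | | |
| 2.673 | | | |
| 2.721 | | | |
| 2.255 | | | |
| 2.318 | | | |
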
State an appropriate linear model to analyze the above data.
Test for a significant difference among the 4 different diets given to the pullets.

Solution: The linear model is $y_{ij} = \mu + \alpha_{i} + e_{ij}$, where $\alpha_{i}$ is the effect of the ith diet.

$n_{1} = 14$, $n_{2} = 10$, $n_{3} = 8$, $n_{4} = 6$

$n = n_{1} + n_{2} + n_{3} + n_{4} = 38$

To test for a difference among the 4 diets, we set up the hypotheses.

$H_{0}$: There is no significant difference among the 4 diets.

$H_{1}$: At least one diet is different from the others.

$y_{1.} = \sum_{j = 1}^{n_{1}} y_{1j} = 35.662$

$y_{2.} = 19.332$

$y_{3.} = 15.657$

$y_{4.} = 10.088$

$\bar{y}_{1.} = 2.5472$

$\bar{y}_{2.} = 1.9332$

$\bar{y}_{3.} = 1.9571$

$\bar{y}_{4.} = 1.6813$

$\sum_{i = 1}^{4} y_{i.}^{2} = y_{1.}^{2} + y_{2.}^{2} + y_{3.}^{2} + y_{4.}^{2}$

$= (35.662)^{2} + (19.332)^{2} + (15.657)^{2} + (10.088)^{2}$

$= 1992.4138$

y$_{..}$ = y$_{1.}+ y_{2.}+y_{3.}+y_{4.}$
= 35.662 + 19.332 + 15.657 + 10.088
= 80.739

(y$_{..})^{2}$ = 6518.7861

The correction factor is C.F = $\frac{y_{..}^{2}}{n}$

= $\frac{6518.7861}{38}$

= 171.5470

Treatment sum of squares: TrSS = $\sum_{i = 1}^{t} \frac{y_{i.}^{2}}{n_{i}}$ - C.F

= $\frac{y_{1.}^{2}}{n_{1}} + \frac{y_{2.}^{2}}{n_{2}} + \frac{y_{3.}^{2}}{n_{3}} + \frac{y_{4.}^{2}}{n_{4}}$ - C.F

= 4.2709

Total sum of squares: TSS = $\sum_{i = 1}^{t}\sum_{j = 1}^{n_{i}} y_{ij}^{2}$ - C.F

= 178.9946 - 171.5470

= 7.4476

Error sum of squares: ESS = TSS - TrSS

= 7.4476 - 4.2709

= 3.1767

We present the above sums of squares in the following ANOVA table.

| Source | D.f. | Sum of squares | Mean sum of squares | $F_{cal}$ |
|---|---|---|---|---|
| Due to treatment | 4 - 1 = 3 | 4.2709 | 1.4236 | 15.2419 |
| Due to error | 37 - 3 = 34 | 3.1767 | 0.0934 | |
| Total | 38 - 1 = 37 | 7.4476 | | |

F$_{tab}$ = 2.92

As F$_{cal}$ > F$_{tab}$ we reject H$_{0}$ and conclude that at least one diet is different from the others.
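The hand calculation can be cross-checked with a short script. The sketch below uses the observations from the table in Example 1 and follows the same correction factor route as the solution; the variable names are illustrative choices.

```python
# Observations from the table in Example 1 (CaO in grams), one list per column.
data = {
    "C1":  [2.435, 2.395, 2.235, 2.545, 2.842, 2.749, 2.723,
            2.706, 2.586, 2.479, 2.673, 2.721, 2.255, 2.318],
    "C2":  [2.155, 1.877, 2.163, 2.088, 2.136, 2.071, 1.895, 1.870, 1.724, 1.353],
    "NC3": [2.156, 2.376, 2.147, 1.821, 1.805, 1.858, 1.736, 1.758],
    "NC4": [2.274, 2.352, 1.690, 1.685, 1.254, 0.833],
}

n = sum(len(ys) for ys in data.values())              # total number of observations
totals = {k: sum(ys) for k, ys in data.items()}       # treatment totals y_i.
grand_total = sum(totals.values())                    # y_..
cf = grand_total ** 2 / n                             # correction factor
tr_ss = sum(totals[k] ** 2 / len(data[k]) for k in data) - cf   # treatment SS
tss = sum(y ** 2 for ys in data.values() for y in ys) - cf      # total SS
ess = tss - tr_ss                                     # error SS
df_tr, df_err = len(data) - 1, n - len(data)          # 3 and 34
f_cal = (tr_ss / df_tr) / (ess / df_err)

print(round(tr_ss, 4), round(tss, 4), round(ess, 4), round(f_cal, 2))
```

Because the script divides unrounded mean squares, its F value can differ from the tabulated one in the later decimals; the conclusion is unaffected.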