Statistics is an important and very broad branch of mathematics. It is the study of the collection, organization, analysis and interpretation of numerical data and variables. Its results are used in research not only in traditional fields such as mathematics, the sciences (physics, chemistry), psychology and economics, but also in more modern, technology-oriented areas such as weather forecasting, business statistics, biostatistics, demography, population surveys, data mining, sports statistics, chemometrics, reliability engineering and many more.

In statistics, several concepts play an important role in these areas. Covariance is one of them: it describes the relationship between two statistical variables. Let us study covariance in detail.

## Definition of Covariance

Covariance is a measure of the degree to which two statistical variables change together. In finance, for example, it is described as the extent to which two assets move together.
In simpler words, covariance measures whether the larger values of one variable mostly correspond to the larger values of another variable, and the smaller values to the smaller values (even though some individual pairs may not follow the pattern). It may instead be the case that the larger values of one variable correspond to the smaller values of the other. Covariance thus tells us whether two variables tend to show similar behavior or opposite behavior. In the former case the covariance is positive; in the latter case it is negative. The direction of the relationship can be read directly from the sign of the covariance. Its magnitude, however, has to be calculated from the data, and because it depends on the units of the two variables it is harder to interpret.

## Symbol for Covariance

Covariance between two variables is denoted by any of the following symbols:
1) $cov (X, Y)$

2) $cov (x, y)$

3) $COV (X, Y)$

4) $COV(x, y)$

where $X$, $Y$ (or $x$, $y$) are the two variables whose covariance is to be measured.

## Covariance Coefficient

The value of a covariance is a number with either a positive or a negative sign. This number is known as the covariance coefficient.
The covariance coefficient has the following two attributes:
1) Magnitude
2) Sign

The magnitude of the covariance coefficient shows the degree of covariance between two variables. The sign shows whether the covariance between the two variables is positive or negative.
The covariance coefficient is written as shown below:

$cov (X, Y) = \pm k$

where $k$ is a non-negative constant numerical value.

## Formula For Covariance

Let us consider two variables $x$ and $y$.
To find the covariance between $x$ and $y$, the following formula is used:

$cov (x, y) = \sum_{i=1}^{n} \frac{(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1}$

where
$x$ and $y$ are the two variables between which the covariance is to be determined,

$\bar{x}$ = mean of the values of variable $x$

$\bar{y}$ = mean of the values of variable $y$

$x_{i}$ = the $i$-th value of variable $x$

$y_{i}$ = the $i$-th value of variable $y$

$n$ = sample size (the number of observations, which is the same for both variables)
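The formula can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation; the function name `sample_covariance` and the two small data sets are assumptions made for this example.

```python
def sample_covariance(xs, ys):
    """Sum of (x_i - mean_x) * (y_i - mean_y), divided by n - 1."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("both variables must have the same number of values")
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Illustrative data (assumed for this sketch, not taken from the article).
rising = sample_covariance([1, 2, 3, 4], [2, 4, 6, 8])   # y grows as x grows
falling = sample_covariance([1, 2, 3, 4], [8, 6, 4, 2])  # y shrinks as x grows
print(rising > 0, falling < 0)  # prints: True True
```

The first pair of data sets moves in the same direction and gives a positive covariance; the second moves in opposite directions and gives a negative one.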

## Covariance Structure

In statistics, there are two related concepts known as the covariance matrix and the covariance structure. A covariance matrix is a symmetric matrix whose diagonal elements are the variances of the variables and whose off-diagonal elements are the covariances between pairs of variables.
Covariance structures are the patterns of elements in covariance matrices. There are different types of covariance structures defined in statistics, such as:
1) Variance Component Structure
2) Compound Symmetry Structure

In the variance component structure, the variances may differ and all the covariances are zero. In the compound symmetry structure, all the variances are equal and all the covariances are equal.

A covariance matrix that has no structural pattern is known as unstructured.
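As an illustration, for three variables these patterns can be written out as follows. The symbols $\sigma_{i}^{2}$ (individual variances), $\sigma^{2}$ (a common variance), $c$ (a common covariance) and $\sigma_{ij}$ (individual covariances) are notation assumed here for the sketch.

Variance component structure:

$$\begin{pmatrix} \sigma_{1}^{2} & 0 & 0 \\ 0 & \sigma_{2}^{2} & 0 \\ 0 & 0 & \sigma_{3}^{2} \end{pmatrix}$$

Compound symmetry structure:

$$\begin{pmatrix} \sigma^{2} & c & c \\ c & \sigma^{2} & c \\ c & c & \sigma^{2} \end{pmatrix}$$

Unstructured:

$$\begin{pmatrix} \sigma_{1}^{2} & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_{2}^{2} & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_{3}^{2} \end{pmatrix}$$

In all three cases the matrix is symmetric, because the covariance between the first and second variable equals the covariance between the second and the first.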

## Covariance Properties

Covariance satisfies many properties, a few of which are especially useful and important. Let X, Y and Z be three real-valued random variables, and let a and b be two real constants. Then covariance satisfies the following important properties:

1) $cov (X, X)$ = $var (X)$ ....(covariance between a variable and itself is equal to the variance of that variable)

2) $cov (X, Y)$ = $cov (Y, X)$ ....(covariance is symmetric)

3) $cov (X, Y + Z)$ = $cov (X, Y)$ + $cov (X, Z)$ ...(covariance distributes over addition)

4) $cov (a X, Y)$ = $a cov (X, Y)$

5) $cov (a X + bY, Z)$ = $a$ $cov (X, Z)$ + $b$ $cov (Y, Z)$

6) $cov (X + a, Y + b)$ = $cov (X, Y)$
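The six properties can be checked numerically on sample data. The sketch below is only a spot check under assumed values (the data lists `X`, `Y`, `Z` and the constants `a`, `b` are made up for illustration); it is not a proof.

```python
import math
from statistics import variance  # sample variance, divides by n - 1


def cov(xs, ys):
    """Sample covariance of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def close(p, q):
    """Equality up to floating-point rounding."""
    return math.isclose(p, q, rel_tol=1e-9, abs_tol=1e-9)


# Assumed illustrative data and constants.
X = [1.0, 2.0, 4.0, 7.0]
Y = [3.0, 1.0, 5.0, 2.0]
Z = [2.0, 2.0, 6.0, 1.0]
a, b = 3.0, -2.0

checks = {
    "1) cov(X, X) = var(X)": close(cov(X, X), variance(X)),
    "2) cov(X, Y) = cov(Y, X)": close(cov(X, Y), cov(Y, X)),
    "3) cov(X, Y + Z)": close(cov(X, [y + z for y, z in zip(Y, Z)]),
                              cov(X, Y) + cov(X, Z)),
    "4) cov(aX, Y)": close(cov([a * x for x in X], Y), a * cov(X, Y)),
    "5) cov(aX + bY, Z)": close(cov([a * x + b * y for x, y in zip(X, Y)], Z),
                                a * cov(X, Z) + b * cov(Y, Z)),
    "6) cov(X + a, Y + b)": close(cov([x + a for x in X], [y + b for y in Y]),
                                  cov(X, Y)),
}
for name, holds in checks.items():
    print(name, holds)
```

Each line should report `True`, since the same identities hold for the sample covariance as for the covariance of random variables.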

## Analysis of Covariance

Analysis of covariance is abbreviated as ANCOVA. It is defined as a combination of linear regression and analysis of variance (ANOVA). ANCOVA includes tests that study the effect of a categorical variable on a dependent variable while controlling for the effect of other continuous variables that are expected to co-vary with the dependent variable.

These continuous variables are known as covariates. Covariates are not controlled during the process, but they still affect the dependent variable.

By adjusting for the covariates, the analysis of covariance technique increases the precision of comparisons between groups.

## Correlation Vs Covariance

Correlation and covariance are two closely related concepts in probability and statistics. Both are computed from the same kind of paired data, and both measure the relationship between two variates. Even though they look similar, there are important differences between the two.

Let us understand the basic differences between covariance and correlation.
1) Covariance measures the degree to which two random variables vary together, while correlation measures the strength of the association between them. More specifically, covariance measures how one variable varies as the other varies, and correlation measures how close the two variables are to a perfect linear relationship.

2) Covariance is based on the deviations of two variates from their expected values. Correlation is the same quantity divided by the product of the standard deviations of the two variates.

3) Correlation is therefore a scaled form of covariance, and covariance is the quantity from which correlation is computed.

4) Covariance involves the relationship between two variables or two sets of data, while correlation can be extended to more than two variables (for example, multiple correlation).

5) The value of covariance has no fixed range, while correlation always lies between -1 and +1.

6) A nonzero covariance is of one of two types: positive (similar relation) or negative (reverse relation). Correlation is of three types: positive, negative and zero, where zero correlation, which occurs when the covariance is zero, means that there is no linear relationship.
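The difference in scale can be seen in a short Python sketch. The data below (heights and weights of four people) are hypothetical, and the helper names `sample_covariance` and `correlation` are assumptions of this example. Correlation is computed here as the covariance divided by the product of the two sample standard deviations.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance with divisor n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the sample standard deviations."""
    sd_x = math.sqrt(sample_covariance(xs, xs))
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)


# Hypothetical measurements, made up for illustration.
heights_m = [1.50, 1.60, 1.70, 1.80]       # heights in metres
weights_kg = [52.0, 58.0, 69.0, 77.0]      # weights in kilograms
heights_cm = [100 * h for h in heights_m]  # the same heights in centimetres

# Changing the unit multiplies the covariance by 100 ...
print(sample_covariance(heights_m, weights_kg),
      sample_covariance(heights_cm, weights_kg))
# ... but leaves the correlation unchanged and inside [-1, +1].
print(correlation(heights_m, weights_kg),
      correlation(heights_cm, weights_kg))
```

This is why the magnitude of a covariance is hard to compare across data sets, while a correlation can be compared directly.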

## How to Compute Covariance

Let us consider two samples that are denoted by variables x and y. Covariance between x and y is computed by the following steps:

Step 1: Determine the total number of observations in each sample. It is known as the sample size and is denoted by n. Make sure the sample size is the same for both samples.

Step 2: Calculate the mean of the first sample. To do so (for ungrouped data), add up all the observations of the first sample and divide the sum by the total number of observations. This mean is denoted by $\bar{x}$.

Step 3: Repeat the same method with second sample and find its mean as well. It is represented by $\bar{y}$.

Step 4:  Construct a table in order to calculate $(x_{i}-\bar{x}) \times (y_{i}-\bar{y})$.

Step 5: Use the following formula for finding the covariance:

$cov (x, y) = \sum_{i=1}^{n} \frac{(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1}$
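The five steps can be followed one by one in Python. The sketch below is one possible way to organize them; the function name `covariance_steps` and the three-point data set are assumptions made for illustration.

```python
def covariance_steps(xs, ys):
    """Follow Steps 1 to 5 and return the table rows and the covariance."""
    # Step 1: sample size, which must be the same for both samples.
    n = len(xs)
    if n != len(ys):
        raise ValueError("both samples must have the same size")
    if n < 2:
        raise ValueError("at least two observations are needed")

    # Steps 2 and 3: mean of each sample.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Step 4: one table row per observation:
    # (x_i, y_i, x_i - mean_x, y_i - mean_y, product of the two deviations)
    rows = [
        (x, y, x - mean_x, y - mean_y, (x - mean_x) * (y - mean_y))
        for x, y in zip(xs, ys)
    ]

    # Step 5: add up the last column and divide by n - 1.
    covariance = sum(row[4] for row in rows) / (n - 1)
    return rows, covariance


# Illustrative data (assumed): both means are 4, the products sum to 14.
rows, covariance = covariance_steps([2, 4, 6], [1, 3, 8])
for row in rows:
    print(row)
print("covariance:", covariance)  # 14 / 2 = 7.0
```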

## Sample Covariance

You may have come across terms such as sample mean and sample standard deviation. Sample covariance is one more term of this kind.
Recall that a sample is a subset of a population. The population is the entire, usually very large, body of data covered by a survey, and it is very difficult to work with so much data. For convenience, a smaller set of observations is drawn from the population. This is known as a sample, and its observations are chosen to represent the population. We generally perform statistical computations on sample data. The sample covariance is the covariance of a sample that represents a population. It is an estimate of the population covariance, and it is the quantity usually computed in statistical estimation.
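The difference between the two can be sketched in Python. The convention assumed here is the standard one: the sample covariance divides the sum of products by $n-1$, while the covariance of data treated as a complete population divides by $n$. The function name `covariance` and its `sample` flag are hypothetical, and the data are those of Example 1 in this article.

```python
def covariance(xs, ys, sample=True):
    """Covariance with divisor n - 1 (sample) or n (whole population)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized data sets with n >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1) if sample else total / n


x = [8, 6, 10, 5, 3, 4]
y = [1, 3, 1, 5, 7, 7]

print(covariance(x, y, sample=True))   # -34 / 5
print(covariance(x, y, sample=False))  # -34 / 6
```

For large $n$ the two values are nearly the same; for small samples the $n-1$ version is noticeably larger in magnitude.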

## Covariance Example

Let us consider a few examples related to covariance.

Example 1:  Find the covariance between the following two data sets:

| $x$ | 8 | 6 | 10 | 5 | 3 | 4 |
|-----|---|---|----|---|---|---|
| $y$ | 1 | 3 | 1 | 5 | 7 | 7 |

Solution:
Sample size, $n$ = 6
Mean of first data set :

$\bar{x}$ =$\frac{8+6+10+5+3+4}{6}$

$\bar{x}$ = $\frac{36}{6}$ = 6

Mean of second data set:

$\bar{y}$ = $\frac{1+3+1+5+7+7}{6}$

$\bar{y}$ =$\frac{24}{6}$ = 4

Let us construct the following table:

| $x_{i}$ | $y_{i}$ | $x_{i}-\bar{x}$ | $y_{i}-\bar{y}$ | $(x_{i}-\bar{x}) \times (y_{i}-\bar{y})$ |
|---|---|---|---|---|
| 8 | 1 | 2 | -3 | -6 |
| 6 | 3 | 0 | -1 | 0 |
| 10 | 1 | 4 | -3 | -12 |
| 5 | 5 | -1 | 1 | -1 |
| 3 | 7 | -3 | 3 | -9 |
| 4 | 7 | -2 | 3 | -6 |

$\sum_{i=1}^{n}(x_{i}-\bar{x}) \times (y_{i}-\bar{y}) = -34$

The formula for covariance is:

$cov (x, y) = \sum_{i=1}^{n} \frac{(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1}$

$cov (x, y) = \frac{-34}{5} = -6.8$ (negative covariance)

Example 2: Calculate the covariance of the following given samples :

| $x$ | 2 | 3 | 5 | 2 | 8 | 7 | 12 | 4 | 1 |
|-----|---|---|---|---|---|---|----|---|---|
| $y$ | 9 | 6 | 7 | 8 | 7 | 2 | 3 | 3 | 4 |

Solution:  Sample size, $n$ = 9

Mean of first data set:

$\bar{x}$ = $\frac{2+3+5+2+8 +7+12+4+1}{9}$

$\bar{x} = \frac{44}{9} \approx 4.89$

Mean of second data set:

$\bar{y}$ = $\frac{9+6+7+8+7+2+3+3+4}{9}$

$\bar{y} = \frac{49}{9} \approx 5.4$

Let us construct the following table:

| $x_{i}$ | $y_{i}$ | $x_{i}-\bar{x}$ | $y_{i}-\bar{y}$ | $(x_{i}-\bar{x}) \times (y_{i}-\bar{y})$ |
|---|---|---|---|---|
| 2 | 9 | -2.89 | 3.6 | -10.404 |
| 3 | 6 | -1.89 | 0.6 | -1.134 |
| 5 | 7 | 0.11 | 1.6 | 0.176 |
| 2 | 8 | -2.89 | 2.6 | -7.514 |
| 8 | 7 | 3.11 | 1.6 | 4.976 |
| 7 | 2 | 2.11 | -3.4 | -7.174 |
| 12 | 3 | 7.11 | -2.4 | -17.064 |
| 4 | 3 | -0.89 | -2.4 | 2.136 |
| 1 | 4 | -3.89 | -1.4 | 5.446 |

$\sum_{i=1}^{n}(x_{i}-\bar{x}) \times (y_{i}-\bar{y}) = -30.556$

The formula for covariance is:

$cov (x, y) = \sum_{i=1}^{n} \frac{(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1}$

$cov (x, y) = \frac{-30.556}{8} \approx -3.82$ (negative covariance)
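Because the means in Example 2 are not whole numbers, hand calculations depend on how the deviations are rounded. As a cross-check, the sketch below computes the covariance of the Example 2 data with exact rational arithmetic using Python's standard `fractions` module; the function name `exact_sample_covariance` is an assumption of this example.

```python
from fractions import Fraction


def exact_sample_covariance(xs, ys):
    """Sample covariance (divisor n - 1) computed without any rounding."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with n >= 2")
    mean_x = Fraction(sum(xs), n)  # 44/9 for the data below
    mean_y = Fraction(sum(ys), n)  # 49/9 for the data below
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)


x = [2, 3, 5, 2, 8, 7, 12, 4, 1]
y = [9, 6, 7, 8, 7, 2, 3, 3, 4]

result = exact_sample_covariance(x, y)
print(result)         # -275/72
print(float(result))  # about -3.819
```

The sum of the products of deviations is exactly $-\frac{275}{9}$, so the covariance is $-\frac{275}{72} \approx -3.82$.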