Correlation is a statistical technique used to measure the degree to which two variables vary together. For example, height and weight are related: taller people tend to be heavier than shorter people. Correlation does not imply that one variable causes the other; a third factor may be involved.

Correlation can also be computed between two dependent variables, since there is no requirement that one variable be independent and the other dependent. Scatter plots play an important role in understanding the data. Correlation usually runs in one of two directions: positive or negative. When working with quantities, correlation provides precise measurements; when working with scales, it provides general indications. Correlation is one of the most common and most useful statistics, and it is denoted by 'r'.

## Correlation Formula

The formula for correlation is

r =  $\frac{N\sum xy-(\sum x)(\sum y)}{\sqrt{[N\sum x^{2}-(\sum x)^{2}] [N\sum y^{2}-(\sum y)^{2}]}}$
where r : Correlation coefficient, which lies between -1 and +1.
N : Number of pairs of scores.
$\sum$xy : Sum of the products of paired scores.
$\sum$x : Sum of x scores.
$\sum$y : Sum of y scores.
$\sum$x$^{2}$ : Sum of squared x scores.
$\sum$y$^{2}$ : Sum of squared y scores.
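
The formula above translates almost term by term into code. The following is a minimal sketch in plain Python; the function name `pearson_r` and the height and weight figures are made up for illustration and are not part of the original text.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed with the raw-score formula given above."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n = len(xs)                                   # N: number of pairs of scores
    sum_x = sum(xs)                               # sum of x scores
    sum_y = sum(ys)                               # sum of y scores
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of products of paired scores
    sum_x2 = sum(x * x for x in xs)               # sum of squared x scores
    sum_y2 = sum(y * y for y in ys)               # sum of squared y scores
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Hypothetical data: heights in cm and weights in kg for five people.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 77, 85]
print(round(pearson_r(heights_cm, weights_kg), 3))
```

The function raises an error when either variable is constant, because the denominator of the formula is then zero and r is undefined.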

## Coefficient of Correlation

The correlation coefficient is used in statistics to measure how strong the relationship between two variables is. It varies from -1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation).

Two different types of correlation coefficients are in use.
1. The Pearson product moment correlation measures the linear association between two variables.
2. Spearman's rank correlation coefficient is based on the rank relationship between variables.

1.  Pearson product moment correlation (r), or correlation coefficient: The Pearson product moment correlation is a measure of the degree of linear relationship between two variables. The emphasis is on the degree to which a linear model can describe the relationship between the two variables.

The correlation coefficient takes on any value between minus one and plus one: $-1.00 \leq r \leq 1.00$.

| Value of 'r' | Interpretation |
|---|---|
| Between 0 and 0.3 (0 and -0.3) | A weak positive (negative) linear relationship. |
| Between 0.3 and 0.7 (-0.3 and -0.7) | A moderate positive (negative) linear relationship. |
| Between 0.7 and 1.0 (-0.7 and -1.0) | A strong positive (negative) linear relationship. |

Pearson's correlation is a correlation coefficient commonly used in linear regression. r is a dimensionless quantity; it does not depend on the units employed.

2.  Spearman's rank correlation: Spearman's rank correlation identifies whether two variables are related by a monotonic function and measures the strength of association between two ranked variables. It is often used as a statistical method to help test a hypothesis. The formula used to calculate Spearman's rank correlation is

$\rho$ = 1 - $\frac{6\sum\ d_i^{2}}{n(n^{2}-1)}$
where n: Number of ranks (cases)
$d_{i}$ = $x_{i}$ - $y_{i}$, difference in paired ranks
i : Index of the paired score
It only identifies the strength of correlation where the data is consistently increasing or decreasing. The formula is based on the assumption that there are no ties. Spearman's rank correlation is the nonparametric version of the Pearson product-moment correlation.
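
As a rough illustration of the formula, the sketch below ranks two lists of raw values and then applies $\rho = 1 - \frac{6\sum d_i^{2}}{n(n^{2}-1)}$. The helper names `ranks` and `spearman_rho` are hypothetical, and the code assumes there are no ties, as the formula does.

```python
def ranks(values):
    """Return the rank of each value (1 = smallest). Assumes no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("this simple formula assumes there are no ties")
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho from the sum of squared rank differences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# y = x**2 is monotonic increasing for positive x, so the ranks agree exactly.
xs = [1, 2, 3, 4]
ys = [x ** 2 for x in xs]
print(spearman_rho(xs, ys))
```

Because only the ranks enter the calculation, any consistently increasing relationship gives the same result as a straight line would.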

## Types of Correlation

Different types of correlation are as follows:
1. Positive correlation: Correlation is positive when the values increase together. The value 1 indicates perfect positive correlation.
2. Negative correlation: Correlation is negative when one value decreases as the other increases. The value -1 indicates perfect negative correlation.
3. Zero or no correlation: The two variables show no linear relationship and vary separately. A correlation of zero indicates the absence of a linear relationship, while -1.00 or 1.00 indicates perfect linear dependence. Statistically independent variables have zero correlation, but zero correlation does not by itself imply independence, because the variables may still be related non-linearly.
4. Linear correlation: If a change in one variable is accompanied by a change in the other at a constant ratio, the correlation between the variables is said to be linear.
5. Non-linear correlation: If the ratio of change between the variables is not constant, the correlation between the variables is said to be non-linear.
6. Partial correlation: Measures the degree of association between two variables after removing the effects of the other variables. It isolates the relationship between one pair of variables out of all the variables and is helpful for revealing hidden correlations.
7. Multiple correlation: Multiple correlation is a statistical technique that predicts the value of one variable on the basis of two or more other variables.
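
To illustrate the difference between linear and non-linear correlation in the list above, the following sketch computes Pearson's r for a straight-line relationship and for a symmetric quadratic one. The data and the helper `pearson_r` are invented for illustration. In the quadratic case y is completely determined by x, yet r comes out as zero, because r only measures the linear part of a relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
linear = [3 * x + 1 for x in xs]    # y changes by a constant amount per unit of x
quadratic = [x ** 2 for x in xs]    # the change in y depends on where x is

r_linear = pearson_r(xs, linear)
r_quadratic = pearson_r(xs, quadratic)
print(r_linear, r_quadratic)
```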

## Positive Correlation

Positive correlation: A positive correlation occurs when two variables tend to move in the same direction. On a scatter plot the points cluster around a rising straight line with a positive slope. Positive correlation exists when one variable decreases as the other decreases, and increases as the other increases.

Strong positive correlation: A correlation close to +1 indicates a strong positive correlation, where the two variables move in the same direction; a value of exactly +1 is a perfect positive correlation. The points cluster tightly around the line.

Weak positive correlation: In a weak positive correlation the points are widely dispersed around the line.
Given below is an image for positive, strong positive and weak positive correlation.

## Negative Correlation

Negative correlation: A relationship between variables where one variable increases as the other decreases, so the slope in the corresponding graph is negative. Negative correlation is also called anti-correlation or inverse correlation.
Example: More time spent watching TV tends to go with lower grades, so TV viewing and grades are negatively correlated.

Strong negative correlation: A correlation close to -1 indicates a strong negative correlation: when one variable goes up, the other goes down. At exactly -1 the relationship between the two variables is negative 100% of the time.

Weak negative correlation: In a weak negative correlation there will be a lot of deviation from the line of best fit.
Given below is an image for negative, strong negative and weak negative correlation.

## Cross Correlation

Cross correlation is a standard method of estimating the degree to which two series are correlated. It is used to compare the information in two different time series. The normalized cross correlation ranges from -1 to +1; the closer the value is to 1, the more closely the two series match. If the series are nearly the same, the products of corresponding values are mostly positive and their sum, the cross correlation, is large. If the series are unlike, some products are positive and some negative, so their sum is small.
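
One common way to put this into practice is to shift one series against the other by a lag and compute the correlation of the overlapping parts at each shift. The sketch below assumes that approach; the function names, the lag convention and the sample series are illustrative choices, not something the text above prescribes.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def cross_correlation(xs, ys, lag):
    """Correlation between xs[t] and ys[t + lag] over the overlapping samples."""
    if len(xs) != len(ys):
        raise ValueError("series must have the same length")
    if abs(lag) > len(xs) - 2:
        raise ValueError("lag leaves fewer than 2 overlapping samples")
    if lag >= 0:
        a, b = xs[:len(xs) - lag], ys[lag:]
    else:
        a, b = xs[-lag:], ys[:len(ys) + lag]
    return pearson_r(a, b)

# Hypothetical series: ys repeats xs two time steps later.
xs = [1, 3, 2, 5, 4, 6, 3, 7]
ys = [0, 0] + xs[:-2]

best_lag = max(range(-3, 4), key=lambda k: cross_correlation(xs, ys, k))
print(best_lag, cross_correlation(xs, ys, best_lag))
```

Scanning a range of lags and taking the one with the largest value is a simple way to estimate the delay between two otherwise similar series.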

## Covariance Correlation

Covariance and correlation are closely related: both describe how two random variables vary together. The covariance of two random variables X and Y, for a sample of size N, is

Cov(X, Y) = $\sum_{i=1}^{N}$ $\frac{(x_{i}-\bar{x})(y_{i}-\bar{y})}{N}$ = E[(X - E(X)) (Y - E(Y))]
Correlation $\phi_{XY}$ = $\frac{Cov(X,Y)}{\sigma_{x}\sigma_{y}}$

where E is the expected value
$x_{i}$ : Set of x observations.
$y_{i}$ : Set of y observations.
N : Number of paired observations.
$\sigma_{x}$ : Standard deviation of X.
$\sigma_{y}$ : Standard deviation of Y.
Remember that correlation is dimensionless, while covariance has units, obtained by multiplying the units of the two variables. Covariance for uncorrelated variables will be zero.
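
The point about units can be checked numerically. The sketch below uses made-up height and weight data and hypothetical helper names; it divides by N, as in the covariance formula above, and shows that converting heights from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Covariance with division by N, matching the formula above."""
    mean_x, mean_y = mean(xs), mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

def std_dev(values):
    """Standard deviation as the square root of the variance, Cov(X, X)."""
    return math.sqrt(covariance(values, values))

def correlation(xs, ys):
    """Covariance scaled by both standard deviations."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

# Hypothetical data: the same heights expressed in two different units.
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]
heights_cm = [h * 100 for h in heights_m]
weights_kg = [52, 58, 69, 77, 85]

print(covariance(heights_m, weights_kg), covariance(heights_cm, weights_kg))
print(correlation(heights_m, weights_kg), correlation(heights_cm, weights_kg))
```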

## Correlation Analysis

Correlation analysis measures the relationship between two items and observes the interaction of variables. A strong or high correlation means that two or more variables have a strong relationship with each other while a weak or low correlation means the variables are hardly related.

The correlation coefficient ranges from -1.00 to +1.00. A value of -1.00 represents a perfect negative correlation, a value of +1.00 represents a perfect positive correlation, and 0.00 indicates that there is no linear relationship between the two variables tested.

## Correlation Examples

### Solved Examples

Question 1: While calculating the correlation coefficient between two variables X and Y from 25 pairs of observations, a computer obtained the following results:
n = 25, $\sum$ x = 125, $\sum$ y = 100, $\sum$ $x^{2}$ = 650, $\sum$ $y^{2}$ = 460, $\sum$ xy = 508. On checking, it was found that two pairs of observations had been copied incorrectly: they were taken as (6, 14) and (8, 6), while the correct values were (8, 12) and (6, 8). Prove that the correct value of the correlation coefficient should be $\frac{2}{3}$.
Solution:

Corrected $\sum$ x =  125 - 6 - 8 + 8 + 6 = 125
Corrected $\sum$ y =  100 - 14 - 6 + 12 + 8 = 100
Corrected $\sum$ $x^{2}$ = 650 - $6^{2}$  - $8^{2}$ +  $6^{2}$ +  $8^{2}$ = 650
Corrected $\sum$ $y^{2}$ = 460 - $14^{2}$ - $6^{2}$ + $12^{2}$ + $8^{2}$ = 436
Corrected $\sum$ xy = 508 - 6 $\times$ 14 - 8 $\times$ 6 + 8 $\times$ 12 + 6 $\times$ 8 = 520

The formula for Pearson product moment correlation is

r =  $\frac{N\sum xy-(\sum x)(\sum y)}{\sqrt{[N\sum x^{2}-(\sum x)^{2}] [N\sum y^{2}-(\sum y)^{2}]}}$

= $\frac{25 \times 520 - 125 \times 100 }{\sqrt{[25 \times 650 - (125)^{2}] [25 \times 436 - (100)^{2}]}}$

= $\frac{13000 - 12500 }{\sqrt{[16250 - 15625] [10900- 10000]}}$

= $\frac{500 }{\sqrt{[625] [900]}}$

= $\frac{500 }{[25] [30]}$

= $\frac{2}{3}$
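
The arithmetic above can be double-checked with a few lines of code. This is only a verification sketch: the dictionary layout and the function name `pearson_from_sums` are illustrative choices. It removes the contribution of the wrongly copied pairs from each sum, adds the contribution of the correct pairs, and then applies the formula.

```python
import math

def pearson_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r computed directly from the summary totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

n = 25
sums = {"x": 125, "y": 100, "x2": 650, "y2": 460, "xy": 508}  # totals as first computed
wrong_pairs = [(6, 14), (8, 6)]
right_pairs = [(8, 12), (6, 8)]

# Subtract what the wrong pairs contributed, then add what the right pairs contribute.
for sign, pairs in ((-1, wrong_pairs), (+1, right_pairs)):
    for x, y in pairs:
        sums["x"] += sign * x
        sums["y"] += sign * y
        sums["x2"] += sign * x * x
        sums["y2"] += sign * y * y
        sums["xy"] += sign * x * y

r = pearson_from_sums(n, sums["x"], sums["y"], sums["x2"], sums["y2"], sums["xy"])
print(sums)
print(r)  # 500 / 750, i.e. 2/3
```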

Question 2: The ranks of 15 students in two subjects A and B are given below. The two numbers within each pair of brackets denote the ranks of the same student in A and B respectively.
(1, 10), (2, 7), (3, 2), (4, 6), (5, 4), (6, 8), (7, 3), (8, 1), (9, 11), (10, 15), (11, 9), (12, 5), (13, 14), (14, 12), (15, 13).
Use Spearman's formula to find the rank correlation coefficient.
Solution:

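To find the rank differences, consider the table below.

| Ranks in A ($x_{i}$) | Ranks in B ($y_{i}$) | $d_{i}$ = $x_{i} - y_{i}$ | $d_{i}^{2}$ |
|---|---|---|---|
| 1 | 10 | -9 | 81 |
| 2 | 7 | -5 | 25 |
| 3 | 2 | 1 | 1 |
| 4 | 6 | -2 | 4 |
| 5 | 4 | 1 | 1 |
| 6 | 8 | -2 | 4 |
| 7 | 3 | 4 | 16 |
| 8 | 1 | 7 | 49 |
| 9 | 11 | -2 | 4 |
| 10 | 15 | -5 | 25 |
| 11 | 9 | 2 | 4 |
| 12 | 5 | 7 | 49 |
| 13 | 14 | -1 | 1 |
| 14 | 12 | 2 | 4 |
| 15 | 13 | 2 | 4 |
| | | $\sum$ $d_{i}$ = 0 | $\sum$ $d_{i}^{2}$ = 272 |
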
Spearman's rank correlation coefficient is

$\rho$ = 1 - $\frac{6\sum\ d^{2}}{n(n^{2}-1)}$

$\rho$ = 1 - $\frac{6\times 272}{15(225-1)}$

$\rho$ = 1 - $\frac{1632}{3360}$

$\rho$ = 1 - 0.4857 $\approx$ 0.51

Therefore the rank correlation coefficient is approximately 0.51.
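
The table and the final substitution can be reproduced with a short script. This is a verification sketch only; the variable names are illustrative. It computes the rank differences for the 15 pairs, checks that they sum to zero, and evaluates Spearman's formula.

```python
# Ranks of the same student in subject A and subject B, as given in the question.
pairs = [(1, 10), (2, 7), (3, 2), (4, 6), (5, 4), (6, 8), (7, 3), (8, 1),
         (9, 11), (10, 15), (11, 9), (12, 5), (13, 14), (14, 12), (15, 13)]

n = len(pairs)
differences = [a - b for a, b in pairs]      # d_i = x_i - y_i
sum_d = sum(differences)                     # should be 0 when both columns are full rankings
sum_d2 = sum(d * d for d in differences)     # sum of squared differences

rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))    # 1 - 1632/3360
print(sum_d, sum_d2, round(rho, 4))
```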