In statistics, correlation techniques are used to determine whether a relationship exists between two variables in a bivariate data distribution. A scatter plot represents the relationship between the two variables visually. It shows the direction of the correlation, that is, whether a positive, a negative or no correlation exists between the two variables. A scatter plot can also hint at the strength of the correlation, if one exists: strong, moderate or weak. The correlation coefficient, a numerical measure of the strength of the relationship, can be calculated using several different formulas.

## Correlation Coefficient Definition

The correlation coefficient is a measure of the strength and direction of the linear relationship between two variables.
The symbols r and ρ are used to represent the sample and population correlation coefficients, respectively.

This measure does not establish a causal relationship between the two variables. A strong correlation does not imply that variation in the variable Y is caused by variation in the variable X, or vice versa.

## Linear Correlation Coefficient

The correlation coefficient is also known as the linear correlation coefficient, because it measures the strength of the linear relationship between the variables only. Its value ranges from -1 to 1, both values inclusive.

A value of the correlation coefficient close to 1 indicates a strong positive correlation, and a value close to -1 indicates a strong negative correlation. If the value is close to 0, there is little or no linear correlation between the variables, although a nonlinear relationship may still exist.

The rule of thumb for describing the strength of correlation is as follows:

| Value of r | Interpretation |
|---|---|
| r = 1 | Perfect Positive Linear Correlation |
| 0.8 < r < 1 | Strong Positive Linear Correlation |
| 0.5 < r < 0.8 | Moderate Positive Linear Correlation |
| 0.2 < r < 0.5 | Weak Positive Linear Correlation |
| r = 0 | No Linear Correlation |
| -0.5 < r < -0.2 | Weak Negative Linear Correlation |
| -0.8 < r < -0.5 | Moderate Negative Linear Correlation |
| -1 < r < -0.8 | Strong Negative Linear Correlation |
| r = -1 | Perfect Negative Linear Correlation |
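The rule of thumb can be expressed as a small Python function. This is a minimal sketch, and two choices in it are assumptions rather than part of the table: boundary values such as 0.8, 0.5 or 0.2, which the strict inequalities leave unassigned, go to the stronger category, and 0 < |r| < 0.2, which the table does not label, is reported as no linear correlation, following the statement that a value close to 0 indicates no linear correlation. The function name `describe_correlation` is hypothetical.

```python
def describe_correlation(r: float) -> str:
    """Map a correlation coefficient to the rule-of-thumb label.

    Assumptions (not stated in the table): boundary values go to the
    stronger category, and 0 < |r| < 0.2 is treated as no linear correlation.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    sign = "Positive" if r > 0 else "Negative"
    size = abs(r)
    if size == 1.0:
        return f"Perfect {sign} Linear Correlation"
    if size >= 0.8:
        return f"Strong {sign} Linear Correlation"
    if size >= 0.5:
        return f"Moderate {sign} Linear Correlation"
    if size >= 0.2:
        return f"Weak {sign} Linear Correlation"
    # Covers r = 0 and the unlabeled band close to 0 (assumption).
    return "No Linear Correlation"


print(describe_correlation(0.85))  # Strong Positive Linear Correlation
```

A dedicated check for `size == 1.0` comes first so that perfect correlations are not swallowed by the "strong" band.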

## Correlation Coefficient Equation

Even though several types of correlation coefficients exist, Pearson's product-moment correlation coefficient is the one generally used to determine the strength of correlation between the variables, and it is commonly called simply the correlation coefficient.
The formula is given by
r = $\frac{cov(X,Y)}{\sqrt{var(X)}\sqrt{var(Y)}}$
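where the covariance between X and Y is cov(X,Y) = $\frac{1}{n}\sum (X-\overline{X})(Y-\overline{Y})$, and the variances are var(X) = $\frac{1}{n}\sum (X-\overline{X})^{2}$ and var(Y) = $\frac{1}{n}\sum (Y-\overline{Y})^{2}$.
Substituting these expressions, the common factor $\frac{1}{n}$ cancels (the same happens if n - 1 is used as the divisor throughout), and the formula can be stated as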
r = $\frac{\sum (X-\overline{X})(Y-\overline{Y})}{\sqrt{\sum (X-\overline{X})^{2}}\sqrt{\sum (Y-\overline{Y})^{2}}}$
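As an illustration, the deviation form of the formula translates directly into Python. This is a minimal sketch using only the standard library; the function name `pearson_r` is hypothetical, and it raises `ZeroDivisionError` if either variable is constant, since r is undefined when a variance is zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: sum of cross-deviations divided by the square root of
    the product of the summed squared deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    x_bar = sum(xs) / n  # mean of X
    y_bar = sum(ys) / n  # mean of Y
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    # sqrt(sxx) * sqrt(syy) equals sqrt(sxx * syy); zero variance raises ZeroDivisionError
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```

The two calls show the ends of the range: data lying exactly on an increasing line gives 1, and data on a decreasing line gives -1.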

When the values of the variables cannot be measured precisely but can be placed in order, Spearman's rank correlation coefficient is used to measure the relationship between the variables.
The formula is given below:
ρ = 1-$\frac{6\sum d^{2}}{n(n^{2}-1)}$
where d is the difference between the ranks of each corresponding pair of X and Y values, and n is the number of observations. This form of the formula assumes that there are no tied ranks.
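For raw measurements, the values must first be converted to ranks. The sketch below does that and then applies the formula. It is a minimal illustration that assumes there are no tied values (with ties, average ranks are normally used and this shortcut formula is no longer exact), and the helper names `to_ranks` and `spearman_rho` are hypothetical.

```python
def to_ranks(values):
    """Return the rank of each value, 1 for the smallest (assumes distinct values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)), valid only without ties."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    if len(set(xs)) != n or len(set(ys)) != n:
        raise ValueError("tied values: this shortcut formula assumes distinct values")
    # d is the difference between the ranks of each corresponding pair
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(to_ranks(xs), to_ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


print(spearman_rho([10, 20, 30, 40], [1, 4, 9, 16]))  # 1.0
```

Because the second series increases whenever the first does, the ranks agree exactly and ρ is 1, even though the relationship is not linear and Pearson's r for the same data would be below 1.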

## Sample Correlation Coefficient

To calculate the correlation coefficient of sample data, the formula for Pearson's product-moment coefficient can be rearranged into a form that is easy to compute from the columns of a table.

r = $\frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{(n(\sum x^{2})-(\sum x)^{2})(n(\sum y^{2})-(\sum y)^{2})}}$

This form is convenient because each sum in it, ∑x, ∑y, ∑xy, ∑x² and ∑y², can be found by adding the corresponding column in the table of data.
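The column-sum form can be sketched in Python in the same way: each sum below corresponds to one column total in a data table. The data values are made up for illustration, and `pearson_r_from_sums` is a hypothetical name.

```python
import math


def pearson_r_from_sums(xs, ys):
    """Pearson's r computed from column totals, as in the table method."""
    n = len(xs)
    sum_x = sum(xs)                               # column x
    sum_y = sum(ys)                               # column y
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # column xy
    sum_x2 = sum(x * x for x in xs)               # column x^2
    sum_y2 = sum(y * y for y in ys)               # column y^2
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r_from_sums(xs, ys), 3))  # 0.775
```

For this data the column totals are ∑x = 15, ∑y = 20, ∑xy = 66, ∑x² = 55 and ∑y² = 86, giving r = 30/√1500 ≈ 0.775. With integer data every total is exact, which is what makes this form convenient by hand. For floating-point data with large values and a small spread, the subtractions in the numerator and denominator can lose precision, so the deviation form is usually preferred in software.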

## Cross Correlation

Suppose $X_i$ and $Y_i$ are two time series covering the same time periods. Cross correlations are computed with the correlation formula while allowing a time delay, or lag, for one of the variables. Suppose there are 12 time periods in the two series. To compute the cross correlation for lag 2, we pair each value of the X series with the value of the Y series two time periods earlier and compute the correlation coefficient of these pairs.

For example, the pairs are $X_3$ and $Y_1$, $X_4$ and $Y_2$, and so on up to $X_{12}$ and $Y_{10}$. The correlation coefficient computed from these 10 pairs is the cross correlation between X and Y for a lag of 2. Since the lag can be positive or negative, the series formed by the cross correlations for all possible lags can contain up to 2n - 1 values for series of length n, roughly twice the length of the given series.
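A lagged correlation can be sketched in Python as follows. The convention here is an assumption: Pearson's r is recomputed on just the overlapping pairs for each lag. Statistical packages often use the overall series means and a common divisor instead, which gives slightly different values. The function names and the example series are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the deviation form (needs at least 2 non-constant pairs)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(x, y, lag):
    """Correlation of x[t + lag] with y[t] over every t where both values exist.

    With lag = 2 this pairs X3 with Y1, X4 with Y2, ... (1-based, as in the text).
    Assumed convention: r is recomputed on the overlapping pairs only.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("series must have the same length")
    if abs(lag) > n - 2:
        raise ValueError("need at least 2 overlapping pairs")
    if lag >= 0:
        xs, ys = x[lag:], y[:n - lag]
    else:
        xs, ys = x[:n + lag], y[-lag:]
    return pearson_r(xs, ys)


# Hypothetical series: x repeats y two periods later (first two values are filler).
y = [1, 3, 2, 5, 4, 6, 8, 7]
x = [9, 9, 1, 3, 2, 5, 4, 6]
print(cross_correlation(x, y, 2))  # 1.0: x[t + 2] == y[t] for every overlapping t
```

Because r needs at least two pairs, this sketch only accepts lags from -(n - 2) to n - 2; the extreme lags ±(n - 1) leave a single pair.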

## Correlation Coefficient Examples

1. The correlation coefficient calculated for a bivariate data distribution of age against blood pressure is 0.85. Comment on the relationship between the two variables.
The value of the correlation coefficient is greater than 0.8, so age and blood pressure exhibit a strong positive linear relationship. This means that older people in the data tend to have higher blood pressure than younger people.

2. The ranks of two athletes, A and B, in five different events are given below. Find Spearman's rank correlation coefficient for the data.

| Athlete | Event 1 | Event 2 | Event 3 | Event 4 | Event 5 |
|---|---|---|---|---|---|
| A | 3 | 2 | 4 | 1 | 5 |
| B | 1 | 3 | 4 | 5 | 2 |

Let us rewrite the table, adding a column for the difference in ranks, d, and another column for its square, d².

| R1 (A) | R2 (B) | d = R1 - R2 | d² |
|---|---|---|---|
| 3 | 1 | 2 | 4 |
| 2 | 3 | -1 | 1 |
| 4 | 4 | 0 | 0 |
| 1 | 5 | -4 | 16 |
| 5 | 2 | 3 | 9 |
| | | | ∑d² = 30 |

ρ = 1-$\frac{6\sum d^{2}}{n(n^{2}-1)}$

= 1-$\frac{6\times 30}{5(25-1)}$ = 1-$\frac{180}{120}$ = 1 - 1.5 = -0.5
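The arithmetic can be checked with a few lines of Python. The rank lists are copied from the athletes' table, and the variable names are illustrative only.

```python
# Ranks copied from the worked example (R1 = athlete A, R2 = athlete B).
r1 = [3, 2, 4, 1, 5]
r2 = [1, 3, 4, 5, 2]

d = [a - b for a, b in zip(r1, r2)]  # [2, -1, 0, -4, 3]
d_squared = [v * v for v in d]       # [4, 1, 0, 16, 9]
n = len(d)

rho = 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))
print(sum(d_squared), rho)  # 30 -0.5
```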