The correlation coefficient is a numerical measure of the direction and strength of the linear correlation between two variables. Several types of correlation coefficient are defined, each with its own formula. The Pearson correlation coefficient, also known as the Pearson product-moment correlation coefficient, is the one most commonly used to describe the linear relationship between two variables. Indeed, the term "correlation coefficient" is often used loosely to mean the Pearson correlation coefficient.
The formula used to compute the Pearson correlation coefficient 'r' is as follows:
r = $\frac{cov(x, y)}{\sqrt{var(x)}\sqrt{var(y)}}$
The covariance between two variables is the average of the products of their deviations from their respective means.
cov(x, y) = $\frac{1}{n}\sum(x-\overline{x})(y-\overline{y})$

Covariance is an easily understood measure of the relationship between the two variables x and y, but its value changes with the scale and units of measurement used. The Pearson correlation coefficient removes this deficiency. In the product-moment formula, the deviations from the means are expressed as fractions of the standard deviation of each variable. The coefficient calculated for the same data recorded in different scales or units therefore has the same value, which makes it a more dependable measure of the relationship.
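The effect of units can be seen in a small sketch. It implements cov(x, y) and r directly from the definitions given earlier; the function names and the height and weight figures are hypothetical and chosen only for illustration.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Average product of the deviations from the two means (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def pearson_r(xs, ys):
    """r = cov(x, y) / (sqrt(var(x)) * sqrt(var(y))), with var(x) = cov(x, x)."""
    return covariance(xs, ys) / (sqrt(covariance(xs, xs)) * sqrt(covariance(ys, ys)))

# Hypothetical data: the same heights recorded in metres and in centimetres.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
heights_cm = [h * 100 for h in heights_m]
weights_kg = [52.0, 58.0, 69.0, 74.0, 85.0]

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
r_m = pearson_r(heights_m, weights_kg)
r_cm = pearson_r(heights_cm, weights_kg)

print(cov_m, cov_cm)  # the covariance is 100 times larger in centimetres
print(r_m, r_cm)      # r is the same up to floating-point rounding
```

Changing the unit multiplies the covariance and the standard deviation of the heights by the same factor, so the factor cancels in r.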

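Expanding the sums in the above formula and multiplying the numerator and the denominator by $n^{2}$ leads to a version that is easier to work with when the sample data are given in a table.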
r = $\frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{(n(\sum x^{2})-(\sum x)^{2})(n(\sum y^{2})-(\sum y)^{2})}}$
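As a minimal sketch, the raw-sum formula can be written as a short function. The name `pearson_r_from_sums` and the five data pairs are hypothetical and serve only to illustrate the calculation.

```python
from math import sqrt

def pearson_r_from_sums(xs, ys):
    """Pearson r computed from the five sums used in the raw-sum formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data, not taken from the article.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r_from_sums(xs, ys)
print(round(r, 4))
```

The denominator is zero when either variable is constant, so r is undefined in that case and the function raises `ZeroDivisionError`.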

## Pearson Correlation Coefficient Interpretation

The computed value of the Pearson correlation coefficient describes the relationship between the two variables. The value lies between -1 and 1, both inclusive. The sign of r gives the direction of the correlation and the absolute value of r gives its strength.

When the absolute value of the Pearson correlation coefficient is close to 1, the correlation is described as strong. When the absolute value of r is between 0.5 and 0.8, the variables are said to be moderately correlated, and when it is between 0.2 and 0.5, the correlation is said to be weak. r = 1 indicates a perfect positive correlation and r = -1 a perfect negative correlation. When r is close to zero, it is considered that no linear correlation exists between the two variables.
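These rules of thumb can be sketched as a small lookup. The text does not give exact cut-offs for "close to 1" or "close to zero", so this sketch assumes 0.8 and 0.2 respectively; the function name is hypothetical, and the thresholds themselves are conventions rather than fixed rules.

```python
def describe_correlation(r):
    """Map r to a verbal description using the rule-of-thumb bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")

    strength = abs(r)
    if strength == 1.0:
        label = "perfect"
    elif strength >= 0.8:      # assumed cut-off for "close to 1"
        label = "strong"
    elif strength >= 0.5:
        label = "moderate"
    elif strength >= 0.2:
        label = "weak"
    else:                      # assumed cut-off for "close to zero"
        return "no appreciable linear correlation"

    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} correlation"

print(describe_correlation(0.6))
print(describe_correlation(-0.35))
```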

## Pearson Correlation Coefficient Significance

The Pearson correlation coefficient is usually computed from sample data. If the computed value of r is high, there are two possibilities:
1. A significant linear relationship exists between the two variables.
2. The high value of r is due to chance, because it was computed from a small sample.

Hence, before drawing a conclusion about the linear relationship between the two variables, the significance of the correlation is tested. Several tests are available; the simplest is the t-test. The test statistic is t = $r\sqrt{\frac{n-2}{1-r^{2}}}$, where r is the computed sample correlation coefficient and n is the sample size.
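The test statistic can be computed as in the sketch below. The function name and the values of r and n are hypothetical. The result is compared with a critical value of the t-distribution with n - 2 degrees of freedom; the sketch leaves that lookup to a statistical table.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)) for a sample correlation r of size n."""
    if n <= 2:
        raise ValueError("the test needs more than two observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * sqrt((n - 2) / (1 - r ** 2))

# The same hypothetical r gives a larger t when it comes from a larger sample.
t_small = correlation_t_statistic(0.6, 12)
t_large = correlation_t_statistic(0.6, 102)
print(t_small, t_large)
```

For a fixed r, the statistic grows with the sample size, which is why the same r is more convincing when it comes from a larger sample.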

## Pearson Correlation Coefficient Table

In order to make the computation of Pearson correlation coefficient easier for sample data, a tabular representation is used.
The formula used to calculate the correlation coefficient is
r = $\frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{(n(\sum x^{2})-(\sum x)^{2})(n(\sum y^{2})-(\sum y)^{2})}}$
The table used to calculate r looks like this.

| X | Y | XY | X² | Y² |
|---|---|---|---|---|
| ∑X = | ∑Y = | ∑XY = | ∑X² = | ∑Y² = |

The last row gives the sum of each column, and these sums are substituted into the formula for r. The table format makes the calculation much simpler than computing the Pearson correlation coefficient from the deviations from the means.
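A minimal sketch of building such a table is given here. The function names and the four data pairs are hypothetical and are not taken from the article.

```python
def correlation_table(xs, ys):
    """Return the rows (X, Y, XY, X^2, Y^2) and the column sums."""
    rows = [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]
    totals = tuple(sum(column) for column in zip(*rows))
    return rows, totals

def print_table(rows, totals):
    header = ("X", "Y", "XY", "X^2", "Y^2")
    print("".join(f"{name:>8}" for name in header))
    for row in rows:
        print("".join(f"{value:>8}" for value in row))
    print("".join(f"{value:>8}" for value in totals) + "   <- column sums")

# Hypothetical data for illustration.
rows, totals = correlation_table([1, 2, 3, 4], [3, 5, 4, 8])
print_table(rows, totals)
```

The five entries of `totals` are, in order, ∑X, ∑Y, ∑XY, ∑X² and ∑Y², which are the quantities the formula for r needs.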

## Pearson Correlation Coefficient Example

The following details show the number of absences and final grade in percent for seven students. Find the Pearson correlation coefficient for the data set and interpret the result.

| Student | Number of absences (X) | Final score in % (Y) |
|---|---|---|
| S1 | 6 | 83 |
| S2 | 4 | 87 |
| S3 | 16 | 45 |
| S4 | 10 | 75 |
| S5 | 12 | 60 |
| S6 | 5 | 91 |
| S7 | 8 | 79 |

The table is extended with the columns needed to compute the sums that are substituted into the formula for 'r'.

| Student | Number of absences (X) | Final score in % (Y) | XY | X² | Y² |
|---|---|---|---|---|---|
| S1 | 6 | 83 | 498 | 36 | 6889 |
| S2 | 4 | 87 | 348 | 16 | 7569 |
| S3 | 16 | 45 | 720 | 256 | 2025 |
| S4 | 10 | 75 | 750 | 100 | 5625 |
| S5 | 12 | 60 | 720 | 144 | 3600 |
| S6 | 5 | 91 | 455 | 25 | 8281 |
| S7 | 8 | 79 | 632 | 64 | 6241 |
| Total | ∑X = 61 | ∑Y = 520 | ∑XY = 4123 | ∑X² = 641 | ∑Y² = 40230 |

There are 7 observations. Hence n = 7

r = $\frac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{(n(\sum x^{2})-(\sum x)^{2})(n(\sum y^{2})-(\sum y)^{2})}}$

= $\frac{7(4123)-(61)(520)}{\sqrt{(7(641)-(61)^{2})(7(40230)-(520)^{2})}}$

= -0.9757
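
The arithmetic between the substitution and the final value can be written out step by step, using only the sums from the table and rounding the square root to two decimal places:

Numerator: $7(4123)-(61)(520) = 28861-31720 = -2859$

Denominator: $\sqrt{(4487-3721)(281610-270400)} = \sqrt{(766)(11210)} = \sqrt{8586860} \approx 2930.33$

r = $\frac{-2859}{2930.33} \approx -0.9757$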

The computed value of the Pearson correlation coefficient is negative and very close to -1.

Hence there appears to be a strong negative correlation between the number of absences a student recorded and the student's final score. This means that as the number of absences increases, the final score tends to drop.