Statisticians use correlation and regression analysis to study the relationship between variables in multivariate data. Correlation methods are used to determine whether a relationship exists between the variables, while regression analysis provides methods to measure and describe that relationship. The relationship found in bivariate data sets, that is, between one independent and one dependent variable, is called a simple relationship. The description of a correlation includes the direction, strength and shape of the relationship. In terms of direction, a correlation is categorized as:
1. Positive Correlation
2. Negative Correlation

When the dependent and independent variables increase or decrease together, the relationship between them is called a positive correlation.
The scatter plot will show points clustered in a corridor-like pattern running from bottom left to top right.
If a fitting line or curve is drawn to represent the pattern of the distribution, its slope is positive at every point.
The scatter plot is a powerful tool in correlation analysis because it visually describes the nature of the relationship. For a bivariate data distribution with positive correlation, the points rise across the coordinate plane in clusters from left to right.
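
The idea that both variables "increase or decrease together" can be checked numerically as well as visually. The sketch below is one possible way to do it, not a method taken from this article: it compares every pair of points and counts how often the two variables move in the same direction. The function name `pair_directions` and the data values are hypothetical, invented only for illustration.

```python
from itertools import combinations


def pair_directions(xs, ys):
    """Count pairs of points that move in the same or in opposite directions.

    Returns (same, opposite). Pairs tied in x or y are ignored.
    """
    same = opposite = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x2 - x1) * (y2 - y1)
        if product > 0:      # x and y both rise or both fall
            same += 1
        elif product < 0:    # one rises while the other falls
            opposite += 1
    return same, opposite


# Hypothetical data (not from the article): population in thousands, schools
population = [10, 20, 30, 40, 50]
schools = [3, 5, 4, 8, 9]

same, opposite = pair_directions(population, schools)
print(f"pairs moving together: {same}, pairs moving apart: {opposite}")
```

When far more pairs move together than apart, the scatter plot of the same data would be expected to rise from left to right.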


The following scatter plot shows the number of primary schools plotted against the population for a set of counties.

Positive Correlation Scatter Plot

The population of the county, in thousands, is the independent variable plotted along the horizontal axis, and the number of primary schools in the county is the dependent variable measured along the vertical axis. A rising flow of dots from left to right can be observed in the plot. This indicates a positive correlation between the two variables: a higher population goes with a larger number of primary schools in the county.

Scatter plots for data sets can be made easily using graphing calculators or software such as MS Excel. These utilities also have features to show the fitting line or curve as desired.
The correlation between two variables is categorized as linear or nonlinear depending on the shape of the scatter plot. If the points appear to form a straight line, that is, if the relationship can be approximated by a straight line, then we say a linear correlation exists between the two variables. When the slope of this straight line is positive, that is, when the line rises from left to right, the correlation is described as a linear positive correlation.

Linear Correlation

The above graph represents a positive linear correlation, as the plotted points form a linear pattern. The fitting line for the data is also shown.
The correlation coefficient is a measure of the direction and strength of the relationship between the two variables. The symbols r and ρ represent the correlation coefficient of sample data and population data, respectively. The value of the correlation coefficient ranges from -1 to 1, both inclusive. If the value of the correlation coefficient is positive, then the correlation is also positive.
When the value of the correlation coefficient is close to 1, the correlation is described as a strong positive correlation. Generally, values of r greater than 0.8 are considered to indicate a strong positive correlation. When the correlation coefficient is equal to 1, the relationship is considered to be a perfect positive correlation.
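
As an illustration of how a sample correlation coefficient r can be obtained, the sketch below assumes that the coefficient meant here is the Pearson product-moment coefficient, which the text does not state explicitly. The function name `pearson_r` and the data are hypothetical and serve only as an example.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of paired data.

    Assumption: 'correlation coefficient' means Pearson's r.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data (not from the article)
population = [12, 25, 31, 48, 60]   # in thousands
schools = [5, 9, 14, 18, 26]

r = pearson_r(population, schools)
print(f"r = {r:.3f}")  # a positive value indicates a positive correlation
```

Points that lie exactly on a rising straight line give r = 1, which corresponds to the perfect positive correlation described above.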

Strong Positive Correlation
Perfect Positive Correlation

In the first scatter plot, the points lie close to a line with a positive slope, so the graph represents a strong positive correlation.
In the second diagram, all the points lie on a single line with a positive slope, showing a perfect positive correlation.
A positive correlation between two variables is considered weak if the value of the correlation coefficient is between 0.2 and 0.5. When the correlation coefficient is less than 0.2 but greater than 0, the relationship is usually treated as negligible, that is, no meaningful linear relationship exists between the two variables.

Weak Correlation
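
The cut-off values quoted above can be collected into a small helper. This is only a sketch under stated assumptions: the thresholds 0.2, 0.5 and 0.8 come from the text, while the "moderate" label for values between 0.5 and 0.8 (a term borrowed from Example 1) and the handling of values that fall exactly on a boundary are assumptions made here. The function name `describe_positive_r` is hypothetical.

```python
import math


def describe_positive_r(r):
    """Describe the strength of a positive correlation coefficient r."""
    if math.isclose(r, 1.0):
        return "perfect positive"      # r equal to 1 (up to rounding)
    if not 0 < r < 1:
        raise ValueError("expected a positive coefficient no greater than 1")
    if r > 0.8:
        return "strong positive"       # threshold from the text
    if r > 0.5:
        return "moderate positive"     # assumed label for the 0.5 to 0.8 band
    if r >= 0.2:
        return "weak positive"         # between 0.2 and 0.5, from the text
    return "negligible"                # greater than 0 but less than 0.2


for value in (1.0, 0.93, 0.65, 0.3, 0.1):
    print(value, "->", describe_positive_r(value))
```

Different textbooks draw these boundaries at slightly different values, so the labels are best read as rough guides rather than strict rules.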

The fitting line used to estimate the linear relationship can be found using different methods.

1. Eye estimation

A rough estimate of the fitting line can be drawn by eye using two approaches. In the first approach, the line is placed so that the number of points above it equals the number of points below it. In the second approach, the total deviations (errors) on either side of the line are kept equal.
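
Both eye-estimation criteria can be checked for any candidate line once it has been drawn. The sketch below assumes the line is written as y = slope * x + intercept and that deviations are measured vertically; the function name `line_balance`, the candidate line and the data are all hypothetical.

```python
def line_balance(xs, ys, slope, intercept):
    """Report how evenly a candidate line splits the points.

    Returns (points_above, points_below, deviation_above, deviation_below),
    where deviations are vertical distances from the line.
    """
    above = below = 0
    dev_above = dev_below = 0.0
    for x, y in zip(xs, ys):
        residual = y - (slope * x + intercept)
        if residual > 0:
            above += 1
            dev_above += residual
        elif residual < 0:
            below += 1
            dev_below += -residual
    return above, below, dev_above, dev_below


# Hypothetical data and a hand-drawn candidate line y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 4, 8, 9]
above, below, dev_above, dev_below = line_balance(xs, ys, slope=2, intercept=1)
print(above, below, dev_above, dev_below)
```

A line that satisfies the first approach has equal counts above and below; one that satisfies the second has equal total deviations on the two sides.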

2. Median Method

In the median method, the entire plot is partitioned into three or four equal parts and the median point in each partition is identified. The line joining the median points is the straight-line fit for the data set.
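
A minimal sketch of the median method follows. It makes several assumptions not stated in the text: the points are split into groups by sorted x value, the "median point" of a group is taken as (median of x, median of y), and the line is drawn through the first and last median points, since three median points need not lie exactly on one line. This is a simplification; the median-median line found in many textbooks additionally shifts the line one third of the way toward the middle median point. The function names and data are hypothetical.

```python
from statistics import median


def median_points(xs, ys, parts=3):
    """Split the points into `parts` groups by x and return each group's median point."""
    points = sorted(zip(xs, ys))
    n = len(points)
    if n < parts:
        raise ValueError("need at least one point per partition")
    result = []
    for i in range(parts):
        group = points[i * n // parts:(i + 1) * n // parts]
        result.append((median(p[0] for p in group),
                       median(p[1] for p in group)))
    return result


def line_through(p, q):
    """Slope and intercept of the line through two points with different x."""
    slope = (q[1] - p[1]) / (q[0] - p[0])
    return slope, p[1] - slope * p[0]


# Hypothetical data (not from the article)
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2, 1, 3, 5, 4, 6, 8, 7, 9]

meds = median_points(xs, ys)
slope, intercept = line_through(meds[0], meds[-1])
print(meds, slope, intercept)
```

Because medians are used instead of means, a few distant points have little effect on the resulting line.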

3. Using Graphing Calculator

When technology is used to make the scatter plot, its built-in feature for finding the line of best fit can be used. A graphing calculator not only displays the line of best fit but also returns the regression equation. The line of best fit for the data on population and number of primary schools in a county, whose scatter plot was given earlier, is shown below.

Scatter Plot Fitting Line
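
The regression equation returned by such tools can also be computed by hand. The sketch below assumes that the line of best fit is the ordinary least-squares line, which is the usual choice but is not stated in the text. The function name `least_squares_line` and the data are hypothetical; the article's actual population and school figures are not reproduced here.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through the mean point
    return slope, intercept


# Hypothetical data (not from the article): population in thousands, schools
population = [10, 20, 30, 40, 50]
schools = [4, 7, 9, 14, 16]

slope, intercept = least_squares_line(population, schools)
print(f"schools = {slope:.2f} * population + {intercept:.2f}")
```

A positive slope in the regression equation agrees with the rising pattern seen in the scatter plot.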

Below are a few examples of positive correlation.

Example 1:

Comment on the relationship between the variables X and Y from the scatter plot given below:
Positive Correlation Examples
The points form a path rising from left to right, hence the correlation is positive. Most of the points lie close to a straight-line path, except for two distant points. Hence the relationship can be considered a moderate positive correlation. If the two distant points are removed, the distribution will fall under the classification of strong positive correlation.

Example 2:

Here is some practice for you. The table below gives the number of hours required for different home painting jobs and the corresponding cost of each job. Draw a scatter plot and comment on the relationship between the two variables.

Time taken (in hours): 4 6
Cost of painting (in dollars): 800 900 1500 1600 2300 2000 2800 2700 3000