Statisticians use correlation and regression analysis to study the relationship between variables in multivariate data. Correlation methods are used to determine whether a relationship exists between the variables, while regression analysis provides methods to measure and describe that relationship. The relationship found in bivariate data sets, that is, between an independent and a dependent variable, is called a simple relationship. The description of a correlation includes the direction, strength and shape of the relationship. In terms of direction, correlation is categorized as follows.
1. Positive Correlation
In a positive correlation, the scatter plot shows points clustered in a corridor-like pattern running from bottom left to top right.
If a fitting line or curve is drawn to represent the pattern of the distribution, its slope at every point is positive.
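The sign of the slope can also be checked numerically. The short Python sketch below is an illustration, not part of the original text: it assumes the data are given as two equal-length lists `xs` and `ys`, and the function name `correlation_direction` is hypothetical. It relies on the fact that the least-squares slope equals S_xy / S_xx, and since S_xx is positive whenever the x values are not all equal, the slope has the same sign as S_xy, the sum of products of deviations from the means.

```python
from statistics import fmean


def correlation_direction(xs, ys):
    """Classify the direction of a linear trend from the sign of S_xy.

    S_xy is the sum of (x - mean_x) * (y - mean_y). The least-squares
    slope is S_xy / S_xx, and S_xx > 0 unless all x values are equal,
    so the slope and S_xy always share the same sign.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xy > 0:
        return "positive"
    if s_xy < 0:
        return "negative"
    return "none"  # no linear trend in either direction


# Hypothetical data rising from left to right.
print(correlation_direction([1, 2, 3, 4], [2, 4, 5, 8]))  # positive
```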
The following scatter plot shows the number of primary schools against the population of counties.
The population of the county (in thousands) is the independent variable, plotted along the horizontal axis, and the number of primary schools in the county is the dependent variable, measured along the vertical axis. A rising flow of dots from left to right can be observed in the plot, which indicates a positive correlation between the two variables: counties with larger populations tend to have more primary schools.
Scatter plots for data sets can be made easily using graphing calculators or software such as MS Excel. These tools can also display a fitting line or curve as desired.
The above graph represents a positive linear correlation, as the points form a linear pattern. The fitting line for the data is also shown.
(Scatter plots: Strong Positive Correlation; Perfect Positive Correlation)
In the first scatter plot, the points appear to lie close to a line with a positive slope. This graph represents a strong positive correlation.
In the second diagram, all the points lie on the same line with a positive slope, showing a perfect positive correlation.
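A perfect positive correlation can be checked directly: every point must lie on one straight line, and that line must have a positive slope. The Python sketch below assumes the points are given as `(x, y)` tuples; the function name `is_perfect_positive` and the tolerance `tol` (which absorbs floating-point rounding) are hypothetical choices for illustration. It takes the leftmost and rightmost points as a reference line and uses a cross product to test whether every other point is collinear with them.

```python
def is_perfect_positive(points, tol=1e-9):
    """Return True if all (x, y) points lie on one line with positive slope."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Tuples compare by x first, so these are the leftmost and rightmost points.
    x0, y0 = min(points)
    x1, y1 = max(points)
    if x1 == x0:
        return False  # all x values equal: the slope is undefined
    if y1 <= y0:
        return False  # the reference line does not rise from left to right
    for x, y in points:
        # The cross product is zero exactly when (x, y) is collinear
        # with the two reference points.
        cross = (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)
        if abs(cross) > tol:
            return False
    return True


print(is_perfect_positive([(1, 3), (2, 5), (3, 7)]))    # True: all on y = 2x + 1
print(is_perfect_positive([(1, 3), (2, 5.5), (3, 7)]))  # False: close, but not on the line
```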
The line of best fit for a scatter plot can be drawn using any of the following methods.
1. Eye Estimation
Eye estimation, a rough visual estimate of the fitting line, can be done using two approaches. In the first, the line is drawn so that the number of points above it is the same as the number below it. In the second, the line is drawn so that the total deviations (errors) on either side of it are equal.
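Both eye-estimation criteria can be checked for a candidate line once it has been drawn. In the Python sketch below, the candidate line is assumed to be written as y = m*x + c, and each deviation is assumed to be measured vertically (observed y minus the y-value on the line); the function name `check_eye_line` is hypothetical. Points lying exactly on the line are counted on neither side.

```python
def check_eye_line(points, m, c):
    """Summarise how the candidate line y = m*x + c splits the (x, y) points.

    Returns the number of points above and below the line and the total
    vertical deviation on each side, so both eye-estimation criteria
    (equal counts, equal total deviations) can be compared.
    """
    above = below = 0
    dev_above = dev_below = 0.0
    for x, y in points:
        d = y - (m * x + c)  # vertical deviation (error) from the line
        if d > 0:
            above += 1
            dev_above += d
        elif d < 0:
            below += 1
            dev_below += -d  # store as a positive distance
    return {"above": above, "below": below,
            "dev_above": dev_above, "dev_below": dev_below}


print(check_eye_line([(1, 2), (2, 5), (3, 5), (4, 8)], m=2, c=0))
```

For these hypothetical points and the line y = 2x, one point lies above the line and one below, each with a deviation of 1, so the line satisfies both criteria.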
2. Median Method
In the median method, the entire plot is partitioned into three or four equal parts, and the median point of each part (the point whose coordinates are the median x-value and the median y-value of that part) is identified. A straight line drawn through these median points is the straight-line fit for the data set.
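The text allows three or four parts. The Python sketch below assumes three groups and follows one common variant, the median-median line: the points are sorted by x and split into three groups (the two outer groups of equal size), the median point of each group is found, the slope is taken from the two outer median points, and the intercept is the average of the intercepts of lines with that slope through all three median points, which keeps the line close to the middle median point as well. The function name `median_median_line` is hypothetical.

```python
from statistics import median


def median_median_line(points):
    """Fit y = m*x + b to (x, y) points with a three-group median method."""
    pts = sorted(points)  # sort by x
    n = len(pts)
    if n < 3:
        raise ValueError("need at least three points")
    k, r = divmod(n, 3)
    # Keep the two outer groups the same size: k,k,k / k,k+1,k / k+1,k,k+1.
    sizes = (k + 1, k, k + 1) if r == 2 else (k, k + r, k)
    first, second = sizes[0], sizes[0] + sizes[1]
    groups = [pts[:first], pts[first:second], pts[second:]]
    # Median point of each group: (median of x values, median of y values).
    (x1, y1), (x2, y2), (x3, y3) = [
        (median(x for x, _ in g), median(y for _, y in g)) for g in groups
    ]
    if x3 == x1:
        raise ValueError("outer median points share the same x value")
    m = (y3 - y1) / (x3 - x1)
    b = ((y1 - m * x1) + (y2 - m * x2) + (y3 - m * x3)) / 3
    return m, b


# Hypothetical points lying exactly on y = 2x + 1.
print(median_median_line([(x, 2 * x + 1) for x in range(1, 7)]))  # (2.0, 1.0)
```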
3. Using a Graphing Calculator
When technology is used to make the scatter plot, its built-in feature for finding the line of best fit can be used. The graphing calculator not only displays the line of best fit but also returns the regression equation. Shown below is the line of best fit for the data on population and number of primary schools in a county, whose scatter plot was given earlier.
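Graphing calculators typically compute the regression equation by the method of least squares. As a hedged sketch of that calculation (a particular calculator's exact procedure is not shown here, and the function name `least_squares_line` and list inputs are assumptions), the slope b and intercept a of y = a + b*x are b = S_xy / S_xx and a = mean(y) - b * mean(x).

```python
from statistics import fmean


def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Hypothetical data; the result is approximately a = 0, b = 1.9.
a, b = least_squares_line([1, 2, 3, 4], [2, 4, 5, 8])
print(f"y = {a:.2f} + {b:.2f}x")
```

On Python 3.10 or later, `statistics.linear_regression(xs, ys)` returns the same slope and intercept and can serve as a cross-check.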
Comment on the relationship between the variables X and Y from the scatter plot given below:
The points form a path rising from left to right, so the correlation is positive. Most of the points appear to lie close to a straight-line path, except for two distant points. Hence the relationship can be considered a moderate positive correlation. If the two distant points are removed, the distribution will fall under the classification of strong positive correlation.
Here is some practice for you. The table below gives the number of hours required for different home painting jobs and the corresponding cost of each job. Draw a scatter plot and comment on the relationship between the two variables.
(Table: Number of Hours and Cost of Painting for each job)