Regression is a versatile statistical technique. Correlation analysis determines whether two variables are related and measures the strength and direction of that relationship.

Once a correlation is found to be significant, the natural next step is regression. Regression analysis provides methods to describe the relationship and to use it in forecasting. It measures and uses the predictive power of one or more independent variables in predicting the values of the dependent variable.

What is Regression Analysis?

Regression analysis broadly involves three steps.
1. Finding the regression equation describing the relationship between the predictor and response variables.
2. Testing the goodness of fit of the regression equation.
3. Understanding the trend and making predictions and forecasts using the regression equation.
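The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the data are hypothetical toy values, the function names are invented here, and the coefficient of determination R² is assumed as the goodness-of-fit measure (the text does not prescribe a particular test).

```python
# Hypothetical toy data, invented purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def fit_line(xs, ys):
    """Step 1: least squares intercept a and slope b for y' = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx ** 2
    a = (sy * sxx - sx * sxy) / denom
    b = (n * sxy - sx * sy) / denom
    return a, b

def r_squared(xs, ys, a, b):
    """Step 2: R^2, one common goodness-of-fit measure (an assumption here)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def predict(a, b, x):
    """Step 3: use the fitted equation to estimate y at a new x."""
    return a + b * x

a, b = fit_line(xs, ys)
r2 = r_squared(xs, ys, a, b)
forecast = predict(a, b, 6)
print(f"y' = {a:.2f} + {b:.2f}x, R^2 = {r2:.2f}, forecast at x=6: {forecast:.2f}")
```

For this toy data the fitted line is y' = 2.2 + 0.6x with R² = 0.6, so the line explains only part of the variation in y.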

The predictor variable is commonly known as the independent variable, and the response variable is called the dependent variable. Often, changes in more than one predictor variable cause the change in the response variable.

Types of Regression Analysis

Regression analysis can be classified as Simple Regression and Multiple Regression.

Simple Regression:
In simple regression, the relationship between the dependent variable and one independent variable is found. Simple regression can be further divided into two types: linear and nonlinear.
In simple linear regression, the relationship is estimated as a linear function. There are several types of nonlinear regression, such as exponential and polynomial models. The type of approximation to use is determined by studying the scatter plot and the strength and significance of the linear correlation.
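As a rough illustration of how the strength of linear correlation can guide the choice of model, the sketch below computes the sample correlation coefficient r for two hypothetical data sets. The data and the function name `pearson_r` are invented for this example, and no significance test is performed.

```python
import math

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r, computed from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

# Hypothetical data that lie close to a straight line.
xs_linear = [1, 2, 3, 4, 5]
ys_linear = [2.1, 3.9, 6.2, 7.8, 10.1]

# Hypothetical data that follow a symmetric curve (y = x^2).
xs_curved = [-2, -1, 0, 1, 2]
ys_curved = [4, 1, 0, 1, 4]

r_linear = pearson_r(xs_linear, ys_linear)
r_curved = pearson_r(xs_curved, ys_curved)
print(f"near-linear data: r = {r_linear:.3f}")
print(f"curved data:      r = {r_curved:.3f}")
```

The first data set gives r close to 1, which supports a linear model. The second gives r = 0 even though y depends entirely on x, which is why the scatter plot should be examined alongside r before settling on a linear or nonlinear fit.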

Multiple Regression:
In multiple or multivariate regression, two or more predictor variables are involved in finding a fit for predicting the dependent variable. Because of the complexity involved in this type of analysis, the regression equation is generally formed on a linear model.

Simple Linear Regression Analysis

Simple linear regression analysis is based on the linear relation between the response variable Y and a single predictor variable X. The model investigated is Y = α + βX + ε, where ε, called the error, accounts for factors other than X that contribute to the value of Y. A linear regression line is used when the linear correlation coefficient calculated for the sample data is high enough and its significance is confirmed by a hypothesis test. The first step in the analysis is to find the equation of the line of best fit. There are many methods for finding a line of best fit, but the least squares regression line is the one generally accepted as a reliable tool for prediction and forecasting.

The least squares regression line is obtained by minimizing the sum of the squared vertical deviations of the data points from the fitted line.
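The minimization can be written out as a short derivation. The symbol S for the sum of squared deviations is introduced here only for illustration, with a and b denoting the intercept and slope of the fitted line:

$$S(a, b) = \sum_{i=1}^{n} \left(y_{i} - a - b x_{i}\right)^{2}$$

Setting the partial derivatives $\frac{\partial S}{\partial a}$ and $\frac{\partial S}{\partial b}$ to zero gives two linear equations in a and b, usually called the normal equations:

$$\sum y = n a + b \sum x, \qquad \sum xy = a \sum x + b \sum x^{2}$$

Solving this pair of equations for a and b leads to the closed-form expressions for the intercept and slope.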

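Statistically, the line of best fit is written in the form Y' = a + bX, which is equivalent to the familiar linear equation Y = mX + b. Note that the roles of the letters differ: in Y' = a + bX the intercept is a and the slope is b.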

The values of a, the Y' intercept, and b, the slope, are found using formulas similar to the one used for finding the correlation coefficient r of sample data.

a = $\frac{(\sum y)(\sum x^{2})-(\sum x)(\sum xy)}{n(\sum x^{2})-(\sum x)^{2}}$

b = $\frac{n(\sum xy)-(\sum x)(\sum y)}{n(\sum x^{2})-(\sum x)^{2}}$

The slope b can also be defined as b = $\frac{S_{x,y}}{S_{x}^{2}}$, where $S_{x,y}$ is the sample covariance between x and y and $S_{x}^{2}$ is the sample variance of x.
The Y intercept a can also be defined as a = $\bar{y} - b\bar{x}$, where $\bar{x}$ and $\bar{y}$ are the sample means of the x values and the y values respectively.
Here the response variable is denoted by Y' to mark it as an estimate and distinguish it from the actual value Y. The regression line is used for estimation purposes after testing the significance of its slope and intercept.
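The two ways of computing the slope and intercept, from raw sums and from the covariance, variance and means, should agree. The following sketch checks this on hypothetical toy data; the function names are invented for the example.

```python
def line_from_sums(xs, ys):
    """Intercept a and slope b from the raw-sum formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx ** 2
    a = (sy * sxx - sx * sxy) / denom
    b = (n * sxy - sx * sy) / denom
    return a, b

def line_from_moments(xs, ys):
    """Intercept a and slope b from sample covariance, variance and means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Both use the n - 1 divisor, which cancels in the ratio below.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b

# Hypothetical toy data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a1, b1 = line_from_sums(xs, ys)
a2, b2 = line_from_moments(xs, ys)
print(f"from sums:    a = {a1:.4f}, b = {b1:.4f}")
print(f"from moments: a = {a2:.4f}, b = {b2:.4f}")
```

Both routes give a = 2.2 and b = 0.6 for this data, up to floating-point rounding.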

Multivariate Regression Analysis

In multivariate regression analysis, the regression model defines the relationship between one response variable and two or more predictor variables. The general model considered is Y = β0 + β1X1 + β2X2 + ... + βpXp + ε. The task is to find the values of the constant β0, all the weights βi, and the residual ε.
The regression equation used for the purpose is of the form
Y' = b0 + b1X1 + b2X2 + ... + bpXp.
Often a multivariate regression situation can be reduced to a simple regression model with one predictor variable, by considering the variances caused by other predictor variables as the residual.
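One standard way to estimate the coefficients b0, b1, ..., bp is to solve the normal equations of least squares. The text does not name a method, so treat the following as a minimal sketch under that assumption. The data are hypothetical, generated exactly from y = 1 + 2·x1 + 3·x2 so that the expected coefficients are known, and the helper names are invented here.

```python
def solve_linear_system(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [value] for row, value in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - known) / M[i][i]
    return z

def fit_multiple(rows, ys):
    """Least squares coefficients [b0, b1, ..., bp] via the normal equations."""
    X = [[1.0] + list(row) for row in rows]  # leading 1 carries the constant b0
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(k)]
    return solve_linear_system(XtX, Xty)

# Hypothetical observations (x1, x2) and responses y = 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [9, 8, 19, 18, 29]

coeffs = fit_multiple(rows, ys)
print([round(c, 6) for c in coeffs])
```

Because the toy responses contain no noise, the fit recovers coefficients very close to 1, 2 and 3. With real data, a numerical library routine would normally be preferred over hand-written elimination.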

Regression Analysis Examples

Solved Example

Question: The selling price of an item and the corresponding sales volume, in thousands of items, are given in the table below.

| Selling price (dollars) | 60 | 80 | 100 | 120 | 140 | 160 | 180 | 200 | 220 | 240 |
|---|---|---|---|---|---|---|---|---|---|---|
| Sales (thousands of items) | 400 | 350 | 300 | 275 | 250 | 210 | 190 | 150 | 100 | 50 |

a) Find the equation of the line of best fit using least squares regression.

b) Also estimate the sales volume when the selling price is 175 dollars.

Solution:

We need to find the line of best fit of the model y' = a + bx. That is, the task is to find the values of the intercept 'a' and the slope 'b'.
Let us rewrite the table with the extra columns needed to obtain the sums that are plugged into the formulas for the slope 'b' and the Y' intercept 'a'.

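| Price x (dollars) | Sales y (thousands) | xy | $x^{2}$ | $y^{2}$ |
|---|---|---|---|---|
| 60 | 400 | 24,000 | 3,600 | 160,000 |
| 80 | 350 | 28,000 | 6,400 | 122,500 |
| 100 | 300 | 30,000 | 10,000 | 90,000 |
| 120 | 275 | 33,000 | 14,400 | 75,625 |
| 140 | 250 | 35,000 | 19,600 | 62,500 |
| 160 | 210 | 33,600 | 25,600 | 44,100 |
| 180 | 190 | 34,200 | 32,400 | 36,100 |
| 200 | 150 | 30,000 | 40,000 | 22,500 |
| 220 | 100 | 22,000 | 48,400 | 10,000 |
| 240 | 50 | 12,000 | 57,600 | 2,500 |
| ∑x = 1,500 | ∑y = 2,275 | ∑xy = 281,800 | ∑$x^{2}$ = 258,000 | ∑$y^{2}$ = 625,825 |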

We have the required sums to be substituted in the formula.

a = $\frac{(\sum y)(\sum x^{2})-(\sum x)(\sum xy)}{n(\sum x^{2})-(\sum x)^{2}}$

= $\frac{(2275)(258000)-(1500)(281800)}{10(258000)-(1500)^{2}}$  = 497.73

b = $\frac{n(\sum xy)-(\sum x)(\sum y)}{n(\sum x^{2})-(\sum x)^{2}}$

= $\frac{10(281800)-(1500)(2275)}{10(258000)-(1500)^{2}}$   = -1.80

Hence the required regression line is y' = 497.73 - 1.80x.

b) To estimate the sales when the price is x = 175 dollars, substitute x = 175 in the regression equation and calculate y'.
y' = 497.73 - 1.80(175) = 182.73
Since y is measured in thousands of items, this means about 182,730 items are expected to be sold at a price of 175 dollars.
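The arithmetic in this example can be checked with a short script. The sketch below uses the price and sales figures from the question; the variable names are chosen here for illustration.

```python
# Data from the question: price in dollars, sales in thousands of items.
prices = [60, 80, 100, 120, 140, 160, 180, 200, 220, 240]
sales = [400, 350, 300, 275, 250, 210, 190, 150, 100, 50]

n = len(prices)
sum_x = sum(prices)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(prices, sales))
sum_xx = sum(x * x for x in prices)

# Least squares intercept and slope from the raw-sum formulas.
denom = n * sum_xx - sum_x ** 2
a = (sum_y * sum_xx - sum_x * sum_xy) / denom
b = (n * sum_xy - sum_x * sum_y) / denom

# Estimate at a price of 175 dollars, using the unrounded coefficients.
estimate = a + b * 175

print(f"sums: {sum_x}, {sum_y}, {sum_xy}, {sum_xx}")
print(f"a = {a:.2f}, b = {b:.2f}")
print(f"estimated sales at 175 dollars: {estimate:.2f} thousand")
```

The script reproduces a = 497.73 and b = -1.80. With unrounded coefficients the estimate comes out near 182.46 thousand rather than 182.73. The small difference arises because the hand calculation rounds the slope to -1.80 before multiplying by 175.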