We come across a number of inter-related events in our day-to-day life. For instance, the yield of a crop depends on the rainfall, the cost or price of a product depends on its production and advertising expenditure, the demand for a particular product depends on its price, the expenditure of a person depends on his income, and so on.

Regression analysis confined to the study of only two variables at a time is called simple regression. Quite often, however, the values of a particular phenomenon are affected by a multiplicity of factors. Regression analysis that studies more than two variables at a time is known as multiple regression. In this section we shall discuss linear regression.

Line of regression of y on x is the line which gives the best estimate for the value of y for any specified value of x.
Similarly, the line of regression of x on y is the line which gives the best estimate for the value of x for any specified value of y.
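To see that the two lines are generally different, the Python sketch below fits both to a small made-up dataset (the values in `xs` and `ys` are purely illustrative, and the helper name `fit_line` is hypothetical). The line of x on y is obtained from the same least-squares formulas as the line of y on x, with the roles of x and y swapped. Fractions keep the arithmetic exact, which assumes integer data.

```python
from fractions import Fraction

def fit_line(u, v):
    """Least-squares line v = a + b*u from the usual sum formulas.

    Assumes integer (or Fraction) data so the results stay exact.
    """
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(p * q for p, q in zip(u, v))
    suu = sum(p * p for p in u)
    denom = n * suu - su ** 2
    a = Fraction(suu * sv - su * suv, denom)
    b = Fraction(n * suv - su * sv, denom)
    return a, b

# Illustrative data only (not taken from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a_yx, b_yx = fit_line(xs, ys)   # line of regression of y on x: y = a + b x
a_xy, b_xy = fit_line(ys, xs)   # line of regression of x on y: x = a' + b' y

print(f"y on x: y = {float(a_yx)} + {float(b_yx)} x")   # y = 2.2 + 0.6 x
print(f"x on y: x = {float(a_xy)} + {float(b_xy)} y")   # x = -1.0 + 1.0 y
```

Rewriting x = -1 + y as y = 1 + x shows a slope of 1 rather than 0.6, so the two lines differ; both, however, pass through the point of means (3, 4).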

Linear Regression Equation:
Line of regression of y on x: Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be n pairs of observations on the two variables x and y under study.
Then the linear regression of y on x is given by
y = a + b x

where a = $\frac{(\sum x^{2})(\sum y)-(\sum x)(\sum xy)}{n\sum x^{2}-(\sum x)^{2}}$ and

b = $\frac{n\sum xy-(\sum x)\sum y}{n\sum x^{2}-(\sum x)^{2}}$
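The two formulas translate directly into code. The Python sketch below (the function name `regression_coefficients` is our own, hypothetical choice) takes n and the column totals and returns a and b. It assumes the totals are integers or Fractions so that the result stays exact.

```python
from fractions import Fraction

def regression_coefficients(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for the line y = a + b x from the column totals.

    A direct transcription of the two formulas above; Fractions keep the
    arithmetic exact, so call float() only when displaying the result.
    """
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = Fraction(sum_x2 * sum_y - sum_x * sum_xy, denom)
    b = Fraction(n * sum_xy - sum_x * sum_y, denom)
    return a, b
```

For example, with the totals used in Question 1 (n = 5, $\sum x$ = 30, $\sum y$ = 40, $\sum xy$ = 214, $\sum x^{2}$ = 220), `regression_coefficients(5, 30, 40, 214, 220)` returns `(Fraction(119, 10), Fraction(-13, 20))`, that is a = 11.9 and b = -0.65.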

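Linear Regression Table:

| x | y | xy | $x^{2}$ | $y^{2}$ |
|---|---|---|---|---|
| $x_i$ | $y_i$ | $x_i y_i$ | $x_i^{2}$ | $y_i^{2}$ |
| $\sum x$ = | $\sum y$ = | $\sum xy$ = | $\sum x^{2}$ = | $\sum y^{2}$ = |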
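Filling in the table is mechanical, so it is easy to script. A possible Python sketch (the helper names `regression_table` and `print_table` are hypothetical, and the data are illustrative) computes each row's xy, x² and y², together with the column totals that are then substituted into the formulas for a and b.

```python
def regression_table(xs, ys):
    """Build the rows (x, y, xy, x^2, y^2) of the regression table and their column totals."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    rows = [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]
    # Column-wise sums: sum x, sum y, sum xy, sum x^2, sum y^2.
    totals = tuple(sum(col) for col in zip(*rows))
    return rows, totals


def print_table(rows, totals):
    """Print the table with right-aligned columns and a final totals row."""
    header = ("x", "y", "xy", "x^2", "y^2")
    print("".join(f"{h:>8}" for h in header))
    for row in rows:
        print("".join(f"{v:>8}" for v in row))
    print("".join(f"{v:>8}" for v in totals), " <- totals")


# Illustrative data only.
rows, totals = regression_table([1, 2, 3], [2, 4, 5])
print_table(rows, totals)
```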

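Substituting the column totals $\sum x$, $\sum y$, $\sum xy$ and $\sum x^{2}$ into the above formulas for a and b gives their numerical values. ($\sum y^{2}$ is not needed for the line of y on x, but it is used for the line of x on y.)

Substituting these values of a and b in the equation y = a + bx gives the equation of the regression line.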

Simple Linear Regression: In simple linear regression, y depends on a single variable x, so that as x varies, y varies linearly with it.
The equation is given by y = a + bx.

Multiple Linear Regression: In multiple linear regression, y depends on more than one variable $x_1, x_2, \ldots, x_k$, and the relationship can be expressed as
$y = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$
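The text gives only the form of the multiple regression equation. Assuming the coefficients are chosen by least squares, as in the simple case, they solve the normal equations $(X^{T}X)\beta = X^{T}y$. The standard-library Python sketch below builds and solves those equations; the helper names `solve` and `fit_multiple` are hypothetical, and the data were generated from y = 1 + 2x₁ + 3x₂ with no noise, so the fit should recover exactly those coefficients. For real work, a library routine such as NumPy's `numpy.linalg.lstsq` is numerically safer.

```python
def solve(matrix, rhs):
    """Solve a small square system A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    # Work on an augmented copy [A | rhs] so the caller's lists are untouched.
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest entry in this column (partial pivoting).
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from the rows below.
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def fit_multiple(predictors, ys):
    """Least-squares coefficients [a, b1, ..., bk] for y = a + b1*x1 + ... + bk*xk.

    Builds and solves the normal equations (X^T X) beta = X^T y, where each
    row of X is [1, x1, ..., xk] (the leading 1 carries the intercept a).
    """
    design = [[1.0] + [float(v) for v in row] for row in predictors]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)


# Illustrative data generated from y = 1 + 2*x1 + 3*x2 with no noise.
predictors = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, 4, 6, 8]
a, b1, b2 = fit_multiple(predictors, ys)
print(f"y = {a:.3f} + {b1:.3f} x1 + {b2:.3f} x2")
```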

Solved Examples

Question 1: Obtain the regression line from the following data.
| X | 6 | 2 | 10 | 4 | 8 |
|---|---|---|---|---|---|
| Y | 9 | 11 | 5 | 8 | 7 |

Solution:
 
Regression equation of Y on X is Y = a + b X

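| X | Y | XY | $X^{2}$ | $Y^{2}$ |
|---|---|---|---|---|
| 6 | 9 | 54 | 36 | 81 |
| 2 | 11 | 22 | 4 | 121 |
| 10 | 5 | 50 | 100 | 25 |
| 4 | 8 | 32 | 16 | 64 |
| 8 | 7 | 56 | 64 | 49 |
| $\sum X$ = 30 | $\sum Y$ = 40 | $\sum XY$ = 214 | $\sum X^{2}$ = 220 | $\sum Y^{2}$ = 340 |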
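A quick way to guard against arithmetic slips in the table is to recompute the column totals. A minimal Python check using the Question 1 data:

```python
# Data from Question 1.
X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]

totals = {
    "sum X": sum(X),
    "sum Y": sum(Y),
    "sum XY": sum(x * y for x, y in zip(X, Y)),
    "sum X^2": sum(x * x for x in X),
    "sum Y^2": sum(y * y for y in Y),
}
for name, value in totals.items():
    print(f"{name:>8} = {value}")
```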


a = $\frac{(\sum x^{2})(\sum y)-(\sum x)(\sum xy)}{n\sum x^{2}-(\sum x)^{2}}$

    = $\frac{(220)(40)-(30)(214)}{5(220)-30^{2}}$

a   = 11.9

b = $\frac{n\sum xy-(\sum x)\sum y}{n\sum x^{2}-(\sum x)^{2}}$

     = $\frac{5(214)-(30)(40)}{5(220)-30^{2}}$

b   = - 0.65
Substituting the values of a and b in the equation Y = a + b X, we get
     Y = 11.9 + (- 0.65) X
=> Y = 11.9 - 0.65 X

 

Question 2: The data corresponding to the heights of fathers and sons (in inches) are given below. Obtain the regression line of the sons' heights on the fathers' heights.
| Heights of fathers (X) | 65 | 66 | 67 | 68 | 69 | 70 | 72 | 67 |
|---|---|---|---|---|---|---|---|---|
| Heights of sons (Y) | 67 | 68 | 65 | 72 | 72 | 69 | 71 | 68 |

Solution:
 

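| X | Y | $x = X - \overline{X} = X - 68$ | $y = Y - \overline{Y} = Y - 69$ | $x^{2}$ | $y^{2}$ | xy |
|---|---|---|---|---|---|---|
| 65 | 67 | -3 | -2 | 9 | 4 | 6 |
| 66 | 68 | -2 | -1 | 4 | 1 | 2 |
| 67 | 65 | -1 | -4 | 1 | 16 | 4 |
| 68 | 72 | 0 | 3 | 0 | 9 | 0 |
| 69 | 72 | 1 | 3 | 1 | 9 | 3 |
| 70 | 69 | 2 | 0 | 4 | 0 | 0 |
| 72 | 71 | 4 | 2 | 16 | 4 | 8 |
| 67 | 68 | -1 | -1 | 1 | 1 | 1 |
| $\sum X$ = 544 | $\sum Y$ = 552 | $\sum x$ = 0 | $\sum y$ = 0 | $\sum x^{2}$ = 36 | $\sum y^{2}$ = 44 | $\sum xy$ = 24 |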

  $\overline{X}$ = $\frac{544}{8}$ = 68

  $\overline{Y}$ = $\frac{552}{8}$ = 69
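The deviation columns can be checked the same way. This Python sketch recomputes the two means and the sums of the deviation columns for the father and son data, using Fractions so that no rounding creeps in.

```python
from fractions import Fraction

fathers = [65, 66, 67, 68, 69, 70, 72, 67]   # X
sons    = [67, 68, 65, 72, 72, 69, 71, 68]   # Y

n = len(fathers)
x_bar = Fraction(sum(fathers), n)   # 544/8
y_bar = Fraction(sum(sons), n)      # 552/8

# Deviations from the means: x = X - X_bar, y = Y - Y_bar.
dx = [X - x_bar for X in fathers]
dy = [Y - y_bar for Y in sons]

sums = {
    "x": sum(dx),
    "y": sum(dy),
    "x^2": sum(d * d for d in dx),
    "y^2": sum(d * d for d in dy),
    "xy": sum(p * q for p, q in zip(dx, dy)),
}
print(f"X_bar = {x_bar}, Y_bar = {y_bar}")
for name, value in sums.items():
    print(f"sum {name} = {value}")
```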

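Since the regression coefficient is unaffected by a change of origin, it can be computed from the deviations x and y. Because $\sum x$ = 0 and $\sum y$ = 0, the formula reduces to $\frac{\sum xy}{\sum x^{2}}$:

 $b_{yx}$ = $\frac{n(\sum xy)-(\sum x)(\sum y)}{n\sum x^{2}-\left (\sum x \right)^{2}}$

          = $\frac{(8)(24)-0}{8(36)-0}$ = $\frac{192}{288}$

$b_{yx}$ = $\frac{2}{3}$ ≈ 0.667

The regression line of Y on X is given by the formula,
           Y - $\overline{Y}$ = $b_{yx}$ (X - $\overline{X}$)
Substituting for $b_{yx}$, $\overline{X}$ and $\overline{Y}$, we get

        Y - 69 = $\frac{2}{3}$ (X - 68)
              Y ≈ 0.667 X + 23.67, which is the required equation of regression.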
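Because the regression coefficient does not depend on the choice of origin, the slope computed from the raw heights should equal the one computed from the deviations. The Python sketch below (the helper `slope` is a hypothetical name) checks this exactly with Fractions, then recovers the intercept from the point of means using $Y - \overline{Y} = b_{yx}(X - \overline{X})$.

```python
from fractions import Fraction

fathers = [65, 66, 67, 68, 69, 70, 72, 67]   # X
sons    = [67, 68, 65, 72, 72, 69, 71, 68]   # Y

def slope(u, v):
    """b_yx = (n*sum(uv) - sum(u)*sum(v)) / (n*sum(u^2) - sum(u)^2), kept exact."""
    n = len(u)
    num = n * sum(p * q for p, q in zip(u, v)) - sum(u) * sum(v)
    den = n * sum(p * p for p in u) - sum(u) ** 2
    return Fraction(num, den)

# Slope from the raw heights and from deviations about 68 and 69.
b_raw = slope(fathers, sons)
b_dev = slope([X - 68 for X in fathers], [Y - 69 for Y in sons])
assert b_raw == b_dev   # a change of origin leaves b_yx unchanged

x_bar = Fraction(sum(fathers), len(fathers))
y_bar = Fraction(sum(sons), len(sons))
intercept = y_bar - b_raw * x_bar   # rearranged from Y - Y_bar = b_yx (X - X_bar)
print(f"b_yx = {b_raw}, line: Y = {float(b_raw):.3f} X + {float(intercept):.2f}")
```

Keeping $b_{yx}$ as the exact fraction 2/3 until the last step matters here: rounding it early (for example to 0.7) moves the intercept by more than two inches.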