A univariate data set consists of only one variable, such as the incomes of individual families, the heights of children in a given age group, test scores, or the ages of employees in an organization. In many situations, however, a study requires observing two variables, or we want to know whether the variable under study is related to another variable. The study of bivariate data provides tools, techniques and methods for analyzing such data and drawing inferences from a bivariate distribution.

Bivariate data consist of two variables whose relationship is to be analyzed.

A contingency table is used to display bivariate data when both variables are categorical.

The variables in a bivariate data distribution can both be numerical, both be categorical, or be one numerical and one categorical. If the analysis shows that one variable is influenced by the other, the influenced variable is called the dependent variable and the variable that influences it is called the independent variable.
The techniques applied in the analysis of bivariate data depend on the types of variables involved.

Scatter Plot and Regression Line

When both variables in a bivariate data set are quantitative (numerical), a scatter plot is used to study the relationship between them. Each pair of values is treated as an ordered pair and plotted as a point on a graph. The independent variable is measured along the horizontal X-axis and the dependent variable along the vertical Y-axis. From the pattern of the plotted points, we can analyze the correlation between the two variables.

Scatter Plot

The scatter plot above shows the relationship between the average number of hours studied per week and the final score.
A positive correlation can be recognized from the pattern: students who study more hours tend to have higher final scores. A regression line (trend line) can be fitted to the data set by various methods, and its equation is useful for forecasting future behavior.
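
The text does not say which fitting method it has in mind. One widely used choice is ordinary least squares, shown here as an assumption rather than as the method used for the plot above. For n data pairs (x_i, y_i) with means x-bar and y-bar, the fitted line y = a + bx and the correlation coefficient r can be written as:

```latex
\[
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x},
\]
\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \;\sum_{i=1}^{n} (y_i - \bar{y})^2}}.
\]
```

The slope b and the coefficient r share the same numerator, so they always have the same sign: a positive correlation goes with an upward-sloping line.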

Numerical Variable and a Categorical Variable

A back-to-back stem plot or a histogram is used to display bivariate data consisting of a numerical variable and a categorical variable with two categories.
The following table shows the weights of newborn babies in a hypothetical hospital during the course of a month.

Weights in kg
Boys   3.5  4.3  5.0  3.6  4.9  3.5  3.8  4.8  3.6  4.2
Girls  3.0  2.8  3.8  3.2  4.1  3.1  2.7  3.3  3.6  3.2

Stem Leaf

The back-to-back stem plot shown above can be used for further analysis, such as finding the median and the quartiles of each group.
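
As a rough sketch of how such a plot can be built from the table of weights, the Python code below uses the whole kilograms as stems and the tenths as leaves. The function names are illustrative only, and placing the boys on the left and the girls on the right is an assumption, not necessarily the layout of the plot above.

```python
from statistics import median

# Birth weights in kg, copied from the table of weights.
boys = [3.5, 4.3, 5.0, 3.6, 4.9, 3.5, 3.8, 4.8, 3.6, 4.2]
girls = [3.0, 2.8, 3.8, 3.2, 4.1, 3.1, 2.7, 3.3, 3.6, 3.2]


def split_stem_leaf(value):
    """Split a one-decimal value such as 3.6 into stem 3 and leaf 6."""
    tenths = round(value * 10)  # round() guards against float noise
    return divmod(tenths, 10)


def leaves_by_stem(values):
    """Map each stem to its leaves, in increasing order."""
    table = {}
    for value in sorted(values):
        stem, leaf = split_stem_leaf(value)
        table.setdefault(stem, []).append(leaf)
    return table


def back_to_back_stem_plot(left, right):
    """Return text lines of the form: left leaves | stem | right leaves."""
    left_table = leaves_by_stem(left)
    right_table = leaves_by_stem(right)
    lowest = min(min(left_table), min(right_table))
    highest = max(max(left_table), max(right_table))
    lines = []
    for stem in range(lowest, highest + 1):
        # Left-hand leaves are reversed so that they grow away from the stem.
        left_leaves = " ".join(str(x) for x in reversed(left_table.get(stem, [])))
        right_leaves = " ".join(str(x) for x in right_table.get(stem, []))
        lines.append(f"{left_leaves:>20} | {stem} | {right_leaves}")
    return lines


plot_lines = back_to_back_stem_plot(boys, girls)
for line in plot_lines:
    print(line)

boys_median = median(boys)
girls_median = median(girls)
print("Median weight, boys:", boys_median)
print("Median weight, girls:", girls_median)
```

Because the leaves on each side are ordered, the medians can also be read directly from the plot by counting in to the two middle leaves of each group.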

When the categorical variable has more than two categories, parallel box plots can be constructed to display the five-number summary of each category.

Example:

The following contingency table shows the ice cream flavor preferences of male and female students.

Flavor         Male   Female   Total
Vanilla           9        5      14
Chocolate        12       20      32
Strawberry       12       15      27
Caramel          15       12      27
Banana Split     12        8      20
Total            60       60     120

This contingency table can be analyzed using different techniques. The frequencies can be expressed as percentages and compared, or the table can be used to test a claim about population behavior using more advanced techniques such as hypothesis testing.
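
As a small illustration of both ideas, the Python sketch below converts the counts to column percentages and then computes a chi-square statistic for independence between gender and flavor. The choice of the chi-square test is an assumption made for illustration (the text only says "hypothesis testing"), and all variable names are hypothetical.

```python
# Observed counts copied from the contingency table: flavor -> (male, female).
observed = {
    "Vanilla": (9, 5),
    "Chocolate": (12, 20),
    "Strawberry": (12, 15),
    "Caramel": (15, 12),
    "Banana Split": (12, 8),
}

male_total = sum(m for m, _ in observed.values())
female_total = sum(f for _, f in observed.values())
grand_total = male_total + female_total

# Column percentages: the share of each gender that prefers each flavor.
column_percent = {
    flavor: (100 * m / male_total, 100 * f / female_total)
    for flavor, (m, f) in observed.items()
}
for flavor, (male_pct, female_pct) in column_percent.items():
    print(f"{flavor:<13} male {male_pct:5.1f}%   female {female_pct:5.1f}%")

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
chi_square = 0.0
for m, f in observed.values():
    row_total = m + f
    for count, col_total in ((m, male_total), (f, female_total)):
        expected = row_total * col_total / grand_total
        chi_square += (count - expected) ** 2 / expected

# Degrees of freedom = (number of rows - 1) * (number of columns - 1).
degrees_of_freedom = (len(observed) - 1) * (2 - 1)
print(f"chi-square = {chi_square:.2f} with {degrees_of_freedom} degrees of freedom")
```

The statistic comes out at about 4.61 with 4 degrees of freedom. This is below the usual 5% critical value of about 9.49, so under this test the sample would not give strong evidence that flavor preference depends on gender.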
Some solved examples are given below.

Solved Examples

Question 1: The table below shows the heights of players and the average number of points each player scored in a single basketball match.
    
x = height in cm, y = average points scored

  x     y        x     y        x     y
 184    12      200    20      199    18
 194    22      188    18      177     6
 185     6      184    14      184    16
 174     5      188    12      178     8
 186    14      182    14      190    20
 183    10      185    10      193    24
 175     8      183    18      204    24

Use technology to draw a scatter plot of the given data and discuss the correlation between height and average points scored. Also use technology to find the line of best fit for the plotted data.
Solution:
Scatter plot of height (x) against average points scored (y)

From the pattern of the scatter plot, a positive correlation between the height of a player and the points scored can be inferred. The equation of the regression line is y = 0.611x - 99.699. The correlation coefficient is r = 0.82, which indicates a fairly strong positive linear correlation between the two variables.
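
The figures in this solution can be checked without a graphing tool. The Python sketch below assumes the line of best fit is the ordinary least-squares line, which is consistent with the stated coefficients; the function name is illustrative only.

```python
from math import sqrt

# (height in cm, average points scored) pairs copied from the table in Question 1.
data = [
    (184, 12), (200, 20), (199, 18),
    (194, 22), (188, 18), (177, 6),
    (185, 6), (184, 14), (184, 16),
    (174, 5), (188, 12), (178, 8),
    (186, 14), (182, 14), (190, 20),
    (183, 10), (185, 10), (193, 24),
    (175, 8), (183, 18), (204, 24),
]


def least_squares_fit(pairs):
    """Return (slope, intercept, r) for the line y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of squared deviations and of cross products.
    s_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    s_yy = sum((y - mean_y) ** 2 for _, y in pairs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r = s_xy / sqrt(s_xx * s_yy)
    return slope, intercept, r


slope, intercept, r = least_squares_fit(data)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, r = {r:.2f}")

# Forecast for a hypothetical player who is 190 cm tall.
predicted_points = slope * 190 + intercept
print(f"Predicted average points at 190 cm: {predicted_points:.1f}")
```

The printed slope, intercept and r agree with the values quoted in the solution. Forecasts are only reasonable for heights inside the observed range of 174 cm to 204 cm.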

Question 2: The heights (in cm) of students in three grades of a high school are given below. Find the five-number summary for each group and display the summaries in parallel box plots.

Grade 10
 120, 126, 131, 138, 140, 143, 146, 147, 150, 156, 157, 158, 158, 160, 162, 164, 168, 170
Grade 11
 140, 143, 146, 147, 149, 151, 153, 156, 162, 164, 165, 167, 168, 170, 173, 177, 178, 180
Grade 12
 151, 153, 154, 158, 160, 163, 164, 166, 167, 169, 169, 172, 175, 180, 187, 189, 193, 195

Solution:
The five-number summary for each group is as follows:
    
Grade   Minimum   Maximum   Median   Q1    Q3
10      120       170       153      140   160
11      140       180       163      149   170
12      151       195       168      160   180

Parallel box plots of student heights for Grades 10, 11 and 12

The parallel box plots shown above can be used to compare the distributions of heights among the students of the three grades.
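
The summaries in the solution can be reproduced with a short Python sketch. It assumes the quartiles are found as the medians of the lower and upper halves of the sorted data, which matches the values in the table; statistical software often uses other quartile conventions and may give slightly different values of Q1 and Q3. The function name is illustrative only.

```python
from statistics import median

# Heights in cm, copied from Question 2.
heights_by_grade = {
    10: [120, 126, 131, 138, 140, 143, 146, 147, 150,
         156, 157, 158, 158, 160, 162, 164, 168, 170],
    11: [140, 143, 146, 147, 149, 151, 153, 156, 162,
         164, 165, 167, 168, 170, 173, 177, 178, 180],
    12: [151, 153, 154, 158, 160, 163, 164, 166, 167,
         169, 169, 172, 175, 180, 187, 189, 193, 195],
}


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values.

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; for an odd number of values the middle value is left out of both.
    """
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    upper_half = data[-half:]
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])


summaries = {grade: five_number_summary(values)
             for grade, values in heights_by_grade.items()}

for grade, (low, q1, med, q3, high) in summaries.items():
    print(f"Grade {grade}: min={low}, Q1={q1}, median={med}, Q3={q3}, max={high}")
```

Each tuple holds exactly the five values needed to draw one box: the box runs from Q1 to Q3, the line inside it marks the median, and the whiskers extend to the minimum and maximum.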