A histogram is a graphical display of data using bars of different heights. It is used to summarize discrete or continuous data that are measured on an interval scale. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson. It groups numbers into ranges and consists of tabular frequencies, shown as adjacent rectangles, each with an area equal to the frequency of the observations in its interval. It is one of the basic quality tools. When the variable is continuous, there are no gaps between the bars; in the discrete case, however, gaps should be left between the bars.

Histograms help us determine which causes dominate. A histogram gives a clear picture of the location and variation in a data set. However, histograms can be manipulated to show different pictures: if too many or too few bars are used, the data can be misleading. In a histogram, frequency is measured by the area of the column. In a vertical bar graph, frequency is measured by the height of the bar.
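The difference between area and height matters once the class widths are unequal. The following is a minimal sketch, assuming made-up class boundaries and frequencies; `bar_heights` is a hypothetical helper name, not part of any library. It computes the bar heights so that each bar's area equals its frequency.

```python
def bar_heights(boundaries, frequencies):
    """Return bar heights so that each bar's area equals its frequency."""
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need exactly one more boundary than frequencies")
    heights = []
    for lower, upper, freq in zip(boundaries, boundaries[1:], frequencies):
        width = upper - lower
        if width <= 0:
            raise ValueError("boundaries must be strictly increasing")
        # area = height * width, so height = frequency / width
        heights.append(freq / width)
    return heights


# Illustrative (assumed) data: classes 0-10, 10-20 and 20-40.
boundaries = [0, 10, 20, 40]
frequencies = [5, 10, 10]
heights = bar_heights(boundaries, frequencies)
print(heights)  # [0.5, 1.0, 0.5]
```

The last two classes have the same frequency, but the wider class gets a bar half as tall, so both bars cover the same area. When all classes have the same width, height and area are proportional and the distinction disappears.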

Given below are the different types of histograms:

  • Uniform: A uniform distribution gives very little information about the data set. For example, when a coin is tossed, heads may not come up regularly, so the pattern changes whenever a tail occurs. A uniform shape can also indicate that the number of classes is too small.
Uniform Distribution
  • Bimodal: A bimodal shape has two peaks, which suggests that the data come from two different systems. The two sources are then analyzed individually.
Bimodal Distribution
  • Symmetric: A histogram is said to be symmetric if its right half is the mirror image of its left half.
  • Skewed Right: The graph given below shows a distribution that is skewed to the right, which we call positively skewed. Here the values are all greater than zero.
Skewed Right Distribution
  • Skewed Left: The graph given below shows a distribution that is skewed to the left, which we call negatively skewed. Here all the collected data have values less than 100.
Skewed Left Distribution
  • Random: A random distribution does not follow any pattern and has several peaks. When several sources of variation are combined, we analyze each of them separately. In this type of distribution we can come across different classes. If there are no multiple sources of variation, we group the data and check whether the pattern holds any useful information.
Random Distribution
  • Bell shaped: A bell shaped curve mostly looks like a normal distribution, but statistical calculations must be used to verify whether the given data actually follow the normal distribution.
Bell Shaped Distribution

Histogram statistics:

For histograms, the following statistics are calculated:

Mean: The average of all the values.
Median: The middle value when the values are sorted. If the number of values is even, the median is the mean of the two middle values.
Mode: The value that occurs most often.
Minimum: The smallest value.
Maximum: The biggest value.
Std Deviation: An expression of how widely spread the values are around the mean.
Class Width: The difference between the two boundaries of a class.
Number of Classes: The number of bars (including zero height bars) in the histogram.
Skewness: The lack of symmetry. A data set is symmetric if it looks the same to the left and right of the center point.
Kurtosis: A measure of whether the data are peaked or flat relative to a normal distribution.
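As a rough illustration, most of these statistics can be computed with Python's standard `statistics` module. The sketch below is only one possible approach: `describe` is a hypothetical helper name, the sample values are made up, the standard deviation is the population form, and skewness and kurtosis use the moment-based definitions (with 3 subtracted from the kurtosis so that a normal distribution scores 0). Other conventions exist. Class width and number of classes are left out because they depend on how the bars are chosen.

```python
import statistics


def describe(values):
    """Summary statistics for a list of numbers (moment-based conventions)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    # Third and fourth central moments, used for skewness and kurtosis.
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "minimum": min(values),
        "maximum": max(values),
        "std_deviation": sd,
        "skewness": m3 / sd ** 3,      # 0 for a symmetric data set
        "kurtosis": m4 / sd ** 4 - 3,  # excess kurtosis, 0 for a normal curve
    }


# Illustrative (assumed) symmetric sample.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Because the sample is symmetric about 3, the mean, median and mode coincide and the skewness comes out as zero.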

How do we draw a histogram?

Given below are the steps to be followed.

Step 1: Mark class intervals on X-axis and frequencies on Y-axis.

Step 2: The two axes need not use the same scale.

Step 3: If the intervals are in inclusive form, convert them to the exclusive form.

Step 4: Draw rectangles with class intervals as bases and the corresponding frequencies as heights.

Step 5: To draw the histogram for an ungrouped frequency distribution of a variate, we assume that the frequency corresponding to the variate value $x$ is spread over the interval $x - \frac{h}{2}$ to $x + \frac{h}{2}$, where $h$ is the jump from one value to the next.
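Step 5 can be sketched in a few lines of Python. This is a minimal illustration, assuming equally spaced values; `spread_intervals` is a hypothetical helper name and the sample values are made up.

```python
def spread_intervals(values, h):
    """Map each distinct value x to the interval (x - h/2, x + h/2)."""
    if h <= 0:
        raise ValueError("h must be positive")
    return [(x - h / 2, x + h / 2) for x in sorted(set(values))]


# Illustrative (assumed) ungrouped values that jump by h = 1.
intervals = spread_intervals([1, 2, 2, 3], h=1)
print(intervals)  # [(0.5, 1.5), (1.5, 2.5), (2.5, 3.5)]
```

Each interval ends where the next one begins, so the rectangles drawn on these bases are adjacent, as a histogram requires.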

Solved Examples

Question 1: A die is tossed 11 times and the outcomes are recorded as {1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6}. Construct a histogram for this data.

Solution:
 
We see that the graph peaks at 3.
Mean = 3.27
Median = Mode = 3. We see that the numbers are distributed about the mean. The distribution is wide compared with the height of the peak, indicating that the values in the set are only loosely bunched around the mean.

 Outcome  Frequency Count
 1  1
 2  2
 3  4
 4  2
 5  1
 6  1

Simple Histogram
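The frequency count and the statistics quoted in this solution can be checked with a short Python sketch. The text-based bars are only a stand-in for the plotted histogram, and the variable names are illustrative.

```python
from collections import Counter
import statistics

# Outcomes of the 11 tosses from Question 1.
outcomes = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6]

# Frequency count for each face of the die.
counts = Counter(outcomes)
for face in range(1, 7):
    print(f"{face} | {'#' * counts[face]} ({counts[face]})")

mean = round(statistics.fmean(outcomes), 2)
median = statistics.median(outcomes)
mode = statistics.mode(outcomes)
print(mean, median, mode)  # 3.27 3 3
```

The longest row of `#` marks is at 3, matching the peak of the histogram.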


 

Question 2: Given below is the data of 50 employees working at Max. Wealth life Insurance company. Plot a histogram for the given data.
 Class Interval Frequency
 11-20  8
 21-30  12
 31-40  8
 41-50  12
 51-60  5
 61-70  5

Solution:
 
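As the class intervals are inclusive, we have to convert them into the exclusive form. The gap between the upper limit of one class and the lower limit of the next is 1, so we subtract half of it (0.5) from each lower limit and add 0.5 to each upper limit.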
 Class Interval Frequency
 10.5-20.5  8
 20.5-30.5  12
 30.5-40.5  8
 40.5-50.5  12
 50.5-60.5  5
 60.5-70.5  5

The histogram for the above data is shown below:
Histogram Picture
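The inclusive-to-exclusive conversion used in this solution can be sketched in Python. This is only an illustration: `to_exclusive` is a hypothetical helper name, and it assumes the classes are listed in ascending order with the same gap between every pair of consecutive classes.

```python
def to_exclusive(classes):
    """Convert inclusive class limits to exclusive class boundaries.

    classes: list of (lower, upper) inclusive limits in ascending order.
    """
    if len(classes) < 2:
        raise ValueError("need at least two classes to measure the gap")
    # Gap between the upper limit of one class and the lower limit of the next.
    gap = classes[1][0] - classes[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]


# Class intervals and frequencies from Question 2.
inclusive = [(11, 20), (21, 30), (31, 40), (41, 50), (51, 60), (61, 70)]
frequencies = [8, 12, 8, 12, 5, 5]

exclusive = to_exclusive(inclusive)
for (lower, upper), freq in zip(exclusive, frequencies):
    print(f"{lower}-{upper}  {freq}")
```

After the conversion, the upper boundary of each class equals the lower boundary of the next, so the rectangles can be drawn without gaps.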