Analyzing data is the process of interpreting the meaning of data that has been collected, organized, and displayed in tables or graphs. It involves looking for patterns, similarities, and relationships in the data. Analyzing data is not simple: it is tedious and can be time-consuming.

Data analysis is important to make predictions and inferences based on the data and it is a critical skill to develop. It helps in suggesting conclusions and decision making, and is crucial to the development of theories and new ideas.

## How to Analyze Data?

Analyzing data requires attention to detail and a relaxed frame of mind. The objective should be specific: you need a clear idea of which evaluation questions you want the data to answer, and you should know which statistical method is appropriate. When the data can be assumed to follow a normal distribution in each group, parametric methods are used. Nonparametric (distribution-free) methods are used when the data does not follow a normal distribution.

The choice of analysis is based on three decision criteria: the number of groups, the data type, and whether the data can be assumed to follow a normal distribution.
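The three criteria can be turned into a simple lookup. The sketch below is a simplified rule of thumb, not a complete decision procedure: the function name `suggest_test` and the test names it returns are illustrative assumptions, the groups are assumed to be independent (paired or repeated-measures designs need different tests), and `normal` is supplied by the analyst, for example after inspecting a histogram.

```python
def suggest_test(n_groups: int, data_type: str, normal: bool) -> str:
    """Suggest a test from the three decision criteria (simplified sketch).

    Assumes independent groups and data_type of "numeric" or "categorical".
    """
    if data_type == "categorical":
        # Counts in categories: the normality criterion does not apply
        return "chi-square test"
    if data_type != "numeric":
        raise ValueError("data_type must be 'numeric' or 'categorical'")
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups == 2:
        # Parametric test if normal, distribution-free alternative otherwise
        return "independent t-test" if normal else "Mann-Whitney U test"
    return "one-way ANOVA" if normal else "Kruskal-Wallis test"


print(suggest_test(2, "numeric", normal=False))  # Mann-Whitney U test
```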

## Analyzing Qualitative Data

Qualitative data consists of words and observations rather than numbers. Analyzing it involves identifying, interpreting, and examining patterns and themes in textual data, and determining how these patterns help answer the research questions. The aim of analyzing qualitative (narrative) data is to organize it into categories, because the raw data is often a collection of seemingly random, unconnected statements. Qualitative data depends more on people's opinions, assumptions, and knowledge (and therefore their biases) than quantitative data does, so it is considered more subjective. The researcher checks the accuracy of the observations, and the analyst can then relate the categorized responses and analyze them using statistical techniques.
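Categorizing (coding) responses is normally done by a researcher reading the text. As a rough, hypothetical illustration of the bookkeeping involved, the sketch below tags each response with themes from a made-up keyword codebook (`CODEBOOK` and `code_responses` are illustrative names) and tallies how often each theme appears. Plain substring matching is crude (for example, "rude" also matches "crude"), so treat this as a sketch of the idea rather than a coding method.

```python
from collections import Counter

# Hypothetical codebook: theme -> keywords that signal it
CODEBOOK = {
    "cost": ["price", "expensive", "cheap"],
    "service": ["staff", "friendly", "rude"],
    "quality": ["broken", "durable", "quality"],
}


def code_responses(responses, codebook):
    """Tag each response with every theme whose keywords it mentions, then tally the themes."""
    tally = Counter()
    for text in responses:
        lowered = text.lower()
        themes = {theme for theme, keywords in codebook.items()
                  if any(k in lowered for k in keywords)}
        tally.update(themes)  # each theme counts at most once per response
    return tally


responses = [
    "The staff were friendly but the price was high.",
    "Too expensive for what you get.",
    "It broke after a week, poor quality.",
]
print(code_responses(responses, CODEBOOK))
```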

## Analyzing Quantitative Data

Quantitative data is collected directly as numbers and is usually summarized with statistical procedures such as the mean, frequency distribution, and standard deviation. More advanced analyses, such as t-tests, factor analysis, analysis of variance (ANOVA), and regression, can also be applied. Quantitative data gives quantifiable, easy-to-understand results and can be analyzed in many ways. Data can be classified into four levels of measurement.

Nominal - Nominal data is categorical and has no natural order. Examples: the name of a book, the type of car you drive. "Nominal" sounds like "name", so it should be easy to remember.

Ordinal - A set of data is ordinal if its observations can be ranked. Ordinal data can be counted and ordered, but the differences between values cannot be measured. Example: T-shirt size (small, medium, large).

Interval - Interval data has a logical order and a fixed, meaningful difference between values, but no true zero point. Example: temperature in °C or °F (the difference between 10 °C and 20 °C is the same as between 20 °C and 30 °C, but 20 °C is not "twice as hot" as 10 °C).

Ratio - Ratio data has order and equal spacing plus a true zero, so multiplication and division also make sense (a weight of 10 kg is twice a weight of 5 kg). Examples: height, weight, money, education (in years).

Once levels of measurement have been identified based on the data, appropriate statistical methods can then be used.
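A common rule of thumb links each level of measurement to the measure of central tendency it supports: the mode for nominal data, the median for ordinal data, and the mean for interval and ratio data. The sketch below encodes that rule with Python's standard `statistics` module; the function name `central_tendency` and the sample data are hypothetical. Ordinal categories are first converted to rank codes so that they sort in their natural order rather than alphabetically.

```python
import statistics


def central_tendency(values, level):
    """Return a measure of central tendency suited to the level of measurement (rule of thumb)."""
    if level == "nominal":
        return statistics.mode(values)        # unordered categories: only the mode is meaningful
    if level == "ordinal":
        return statistics.median_low(values)  # ranked codes: median, kept as an actual rank
    if level in ("interval", "ratio"):
        return statistics.mean(values)        # equal spacing between values: the mean is meaningful
    raise ValueError(f"unknown level of measurement: {level}")


# Nominal: type of car driven
print(central_tendency(["sedan", "SUV", "sedan", "hatchback"], "nominal"))  # sedan

# Ordinal: T-shirt sizes, converted to rank codes 1..3 before taking the median
SIZES = ["small", "medium", "large"]
shirts = ["medium", "large", "small", "medium", "large"]
codes = [SIZES.index(s) + 1 for s in shirts]
print(SIZES[central_tendency(codes, "ordinal") - 1])  # medium

# Interval: temperatures in degrees Celsius
print(central_tendency([20.0, 22.5, 25.0], "interval"))  # 22.5
```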

## Analyzing Categorical Data

When data is collected in categories, we record counts. Categorical variables are of two types: nominal and ordinal. Categorical data is analyzed using data tables, typically a two-way table that records the number of observations falling into each combination of categories of two variables: the categories of one variable form the rows and those of the other form the columns. Another important tool for analyzing categorical data is the segmented bar graph.
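A two-way table is simply a count for every (row category, column category) pair. The sketch below builds one from hypothetical survey records (role and usual mode of transport) using `collections.Counter`; a real analysis would typically use a table or data-frame library, but the counting logic is the same.

```python
from collections import Counter

# Hypothetical survey records: (role, usual mode of transport)
records = [
    ("student", "bus"), ("staff", "car"), ("student", "car"),
    ("staff", "car"), ("student", "bus"), ("staff", "bike"),
]

counts = Counter(records)                   # (row category, column category) -> count
rows = sorted({row for row, _ in records})  # categories of the first variable
cols = sorted({col for _, col in records})  # categories of the second variable

# Print the two-way table: rows for role, columns for transport
print("".ljust(9) + "".join(c.ljust(6) for c in cols))
for r in rows:
    print(r.ljust(9) + "".join(str(counts[(r, c)]).ljust(6) for c in cols))
```

Combinations that never occur (such as a student cycling) show a count of 0, because `Counter` returns 0 for missing keys.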

## Analyzing Likert Scale Data

A Likert scale is a psychometric scale commonly used in questionnaires and is the most widely used scale in survey research. The decision on how to analyze Likert data is usually made at the questionnaire development stage. When Likert questions are unique and stand alone, they are treated as Likert-type items, and frequencies, modes, and medians are the appropriate statistics. When a series of questions is combined to measure a particular trait, they form a Likert scale, which is described using the mean and standard deviation. Once the decision between Likert-type items and a Likert scale has been made, the choice of appropriate statistics falls into place.
Below is an example of a Likert item from a survey. Respondents specify their level of agreement with a statement.

Statement: Ice-cream is good for breakfast
1. Strongly disagree
2. Disagree
3. Neither agree nor disagree
4. Agree
5. Strongly agree

Likert scaling is a bipolar scaling method, measuring either a positive or a negative response to a statement.
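The following sketch contrasts the two cases with made-up responses coded 1 (Strongly disagree) to 5 (Strongly agree). A stand-alone Likert-type item is summarized with frequencies, the mode, and the median, while a Likert scale combines several related items per respondent (summed here; averaging the items is an equally common choice) and is then described with the mean and standard deviation.

```python
import statistics
from collections import Counter

# Likert-type item: one stand-alone question, answers coded 1-5 (hypothetical data)
item = [4, 2, 5, 4, 3, 4, 1]
print(sorted(Counter(item).items()))                   # frequency of each response code
print(statistics.mode(item), statistics.median(item))  # 4 4

# Likert scale: each row is one respondent's answers to three related items
respondents = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
]
totals = [sum(answers) for answers in respondents]  # one scale score per respondent
print(statistics.mean(totals), round(statistics.stdev(totals), 2))
```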

## Box-And-Whisker Plot

A box-and-whisker plot (box plot) is a histogram-like way of displaying data that depicts groups of numerical data through their quartiles. The lines extending from the box (the whiskers) indicate variability outside the upper and lower quartiles, and outliers can be plotted as individual points. Box plots are nonparametric: they can display differences between populations without making any assumptions about the underlying statistical distribution. Box plots can be drawn either vertically or horizontally. A box-and-whisker plot displays the five-number summary: minimum, first quartile, median, third quartile, and maximum.
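The five-number summary can be computed with Python's `statistics.quantiles` (Python 3.8+). The sketch assumes the (n + 1)p position rule for quartiles (`method="exclusive"`, the module's default); other quartile conventions give slightly different values. Flagging outliers as points more than 1.5 × IQR beyond the box (Tukey's fences) is a common convention that the text does not specify, so `k=1.5` is an assumption, and `five_number_summary` and `outliers` are hypothetical helper names.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); quartiles use the (n + 1)p position rule."""
    q1, med, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return min(data), q1, med, q3, max(data)


def outliers(data, k=1.5):
    """Points more than k * IQR below Q1 or above Q3 (Tukey's fences), usually drawn individually."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


data = [2, 4, 4, 5, 6, 7, 8, 30]  # hypothetical sample
print(five_number_summary(data))  # (2, 4.0, 5.5, 7.75, 30)
print(outliers(data))             # [30]
```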

## Analyzing Survey Data

Analyzing survey data consists of a number of interrelated processes intended to summarize, arrange, and transform data into information, namely editing, analysis, and reporting. The approach depends mainly on the sample size, the survey's research design, and the quality of the data. Commonly used methods include descriptive statistics, correlation, linear regression, and logistic regression. Descriptive statistics can also be used for variance estimation.
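As a minimal sketch of two of the methods listed, the code below computes a Pearson correlation and a simple least-squares regression line from hypothetical survey answers. It ignores sampling weights and the survey design, which a real survey analysis usually has to account for. On Python 3.10+, `statistics.correlation` and `statistics.linear_regression` provide the same calculations; they are written out here to show the formulas.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_regression(x, y):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


# Hypothetical survey answers: hours of exercise per week vs. self-rated health (1-10)
hours = [0, 1, 2, 3, 4, 5]
health = [3, 4, 6, 6, 7, 9]

print(f"r = {pearson_r(hours, health):.3f}")
slope, intercept = simple_regression(hours, health)
print(f"health = {intercept:.2f} + {slope:.2f} * hours")
```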

## Quartile Deviation

Quartile deviation is half the difference between the upper and lower quartiles of a distribution and measures the spread of the middle half of the distribution. It ignores the observations in the tails and is not influenced by extremely high or extremely low scores. It is an ordinal statistic and is often used together with the median.

Quartile deviation is given by
Quartile deviation = $\frac{Q_{3} - Q_{1}}{2}$
where $Q_{1}$ is the first quartile, $Q_{3}$ is the third quartile, and $Q_{3} - Q_{1}$ is the interquartile range.

Quartile deviation is a slightly better measure of absolute dispersion than the range. When different samples are taken from a population and their quartile deviations are calculated, the values are likely to differ considerably; this is known as sampling fluctuation. For this reason, a quartile deviation calculated from a sample does not support reliable conclusions about the quartile deviation of the population. It can, however, be used to compare the dispersion of two or more sets of data.

Given below is an example of finding the quartile deviation.

### Solved Example

Question: Rice production (in kg) from 20 acres was recorded in 9 observations: 1230, 1150, 1040, 2310, 1453, 1755, 1752, 1900, 1885.
Find the quartile deviation for the given data.
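Solution:
Given n = 9
Quartile deviation (Q.D.) is given by the formula
Quartile deviation = $\frac{Q_{3} - Q_{1}}{2}$

First, arrange the observations in ascending order:
1040, 1150, 1230, 1453, 1752, 1755, 1885, 1900, 2310

To find the first quartile ($Q_{1}$):
$Q_{1}$ = value of the $\left(\frac{n+1}{4}\right)$th item = value of the $\left(\frac{9+1}{4}\right)$th item = value of the 2.5th item
$Q_{1}$ = 2nd item + 0.5 (3rd item - 2nd item)

$Q_{1}$ = 1150 + 0.5 (1230 - 1150) = 1190

To find the third quartile ($Q_{3}$):
$Q_{3}$ = value of the $\left(\frac{3(n+1)}{4}\right)$th item = value of the 7.5th item
$Q_{3}$ = 7th item + 0.5 (8th item - 7th item)

$Q_{3}$ = 1885 + 0.5 (1900 - 1885) = 1892.5

Now, quartile deviation = $\frac{1892.5 - 1190}{2}$
Therefore, quartile deviation = 351.25 kg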
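The hand calculation can be checked with a short sketch. `statistics.quantiles` with `method="exclusive"` uses the same (n + 1)p quartile positions as the formulas above and sorts the data internally, which is the step that must not be skipped when working by hand. `quartile_deviation` is a hypothetical helper name, and other quartile conventions (such as `method="inclusive"`) give a different result for the same data.

```python
import statistics


def quartile_deviation(data):
    """Half the interquartile range, using the (n + 1)p position rule for quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return (q3 - q1) / 2


# Rice production (kg) from the solved example, in the order given
rice = [1230, 1150, 1040, 2310, 1453, 1755, 1752, 1900, 1885]
print(quartile_deviation(rice))  # 351.25
```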