Aggregate data are statistics that relate to broad classes, groups, or categories, so that it is not possible to distinguish the properties of individuals within those classes, groups, or categories. Aggregation can be by socio-economic grouping (for instance, the size of the economically active population) or by time interval (for instance, the number of in-migrants to an area in a season). When considering aggregate data, the choice of spatial unit can reveal as well as conceal significant differences.

## Definition

Aggregate data are quantified attributes of collectivities that either relate to the body of interest as a whole or have been aggregated from the properties of the individual members of the collective.
Aggregate data are also defined negatively, that is, as data that are not individual data: they refer to larger entities than individual data do. Aggregation can be carried out by calculating sums or various means over the frequency distribution of individual cases. Aggregate data tend to be predominantly secondary data; that is, researchers do not typically collect such data themselves.
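As a minimal sketch of aggregation by sums and means, the following collapses individual records into per-group figures. The records, the region names, and the helper `aggregate_by_group` are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual-level records: (region, income).
individuals = [
    ("North", 30000),
    ("North", 50000),
    ("South", 20000),
    ("South", 40000),
    ("South", 60000),
]

def aggregate_by_group(records):
    """Collapse individual records into a per-group count, sum, and mean."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    return {
        group: {"count": len(values), "sum": sum(values), "mean": mean(values)}
        for group, values in groups.items()
    }

aggregates = aggregate_by_group(individuals)
print(aggregates)
```

Once the records are aggregated, the individual incomes can no longer be recovered from the output, which is the sense in which aggregate data are "not individual data".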

## Analysis

For analysis, here are several starting points, given as simple steps for analyzing aggregate data.

1) Describe one variable:

Histogram

Summary statistics (mean, range, standard deviation, min, max, etc.)

Are there outliers? (for example, values more than 1.5 times the interquartile range beyond the quartiles)

What type of distribution does it follow? (normal, etc.)
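The single-variable checks above can be sketched with Python's standard library. The sample values and the helper `describe` are hypothetical. The outlier rule used here is the common convention of fences at 1.5 times the interquartile range beyond the quartiles, and other quartile conventions give slightly different fences.

```python
from statistics import mean, stdev, quantiles

# Hypothetical sample; the last value is deliberately extreme.
values = [12, 15, 14, 10, 13, 16, 11, 14, 13, 45]

def describe(data):
    """Return summary statistics and IQR-based outliers for one variable."""
    q1, _median, q3 = quantiles(data, n=4)  # quartile cut points
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "mean": mean(data),
        "stdev": stdev(data),  # sample standard deviation
        "min": min(data),
        "max": max(data),
        "range": max(data) - min(data),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

summary = describe(values)
print(summary)
```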

2) Describe the relationship between variables:

Scatter plot

Correlation

Outliers? Check the Mahalanobis distance

Mosaic plot for categorical variables

Contingency table for categorical variables
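Two of the relationship checks above, correlation and a contingency table, can be sketched in plain Python. The helper names `pearson_r` and `contingency_table` and the sample data are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def contingency_table(a, b):
    """Count how often each pair of categories occurs together."""
    return Counter(zip(a, b))

# Hypothetical numeric data with a perfect linear relationship.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])

# Hypothetical categorical data.
sex = ["F", "F", "M", "M", "F"]
result = ["passed", "failed", "passed", "passed", "passed"]
table = contingency_table(sex, result)
print(r, dict(table))
```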

3) Predict a real number (like price): regression

Ordinary least squares regression or a machine learning regression approach.
When the technique used to predict is understandable by humans, this is called modeling. For example, a neural network can make predictions but is typically not understandable. Regression can also be used to find Key Performance Indicators.
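For a single predictor, ordinary least squares has a closed-form solution, sketched below. The function `ols_fit` and the data points are hypothetical, and a real analysis with several predictors would normally rely on a statistics library instead.

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on the line y = 2 + 3x.
intercept, slope = ols_fit([0, 1, 2, 3], [2, 5, 8, 11])
predicted = intercept + slope * 4  # prediction for a new x value
print(intercept, slope, predicted)
```

The fitted intercept and slope can be read directly, which is what makes this kind of model understandable in the sense described above.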

4) Predict class membership or probability of class membership (like passed/failed): classification

Logistic regression or machine learning techniques, such as support vector machines.

5) Put observations into "natural" groups: clustering

Generally one finds "similar" observations by calculating the distance between them.

6) Put attributes into "natural" groups: factoring

And other matrix operations, for example Principal Component Analysis and Non-Negative Matrix Factorization.

7) Quantify risk: standard deviation, or the proportion of times that "bad things" occur multiplied by how bad they are.

8) Estimate the likelihood of a successfully completed iteration given x story points: logistic regression.
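The clustering idea above, grouping observations by the distance between them, can be illustrated with a small k-means sketch. This is only one of many clustering methods. The function `kmeans`, the naive choice of the first k points as starting centroids, and the sample points are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8

def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning each point
    to its nearest centroid and recomputing the centroids."""
    centroids = list(points[:k])  # naive, deterministic start: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster;
        # keep the old centroid if a cluster ended up empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical observations forming two visibly separate groups.
points = [(1, 1), (8, 8), (1.2, 0.8), (0.9, 1.1), (8.2, 7.9), (7.9, 8.1)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

Results of k-means depend on the starting centroids, so practical implementations usually try several random starts.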

## Examples

The examples below illustrate the topic:

Example 1: For the female athlete strength study, x = number of 60-pound bench presses and y = maximum bench press had

x : Mean = 11.0, standard deviation = 7.1

y : Mean = 79.9, standard deviation = 13.3

Regression equation: $\hat{y}$ = 63.5 + 1.49x

Find the correlation r between these two variables.

Solution: The slope of the regression equation is b = 1.49. Since s$_{x}$ = 7.1 and s$_{y}$ = 13.3,

r = b($\frac{s_{x}}{s_{y}}$)

= 1.49 ($\frac{7.1}{13.3}$)

= 0.80

The variables have a strong positive association.
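The arithmetic in Example 1 can be checked with a few lines of Python. The variable names are chosen for illustration. The intercept check uses the standard least-squares identity that the fitted line passes through the point of means.

```python
# Values taken from Example 1.
slope_b = 1.49
mean_x, s_x = 11.0, 7.1   # number of 60-pound bench presses
mean_y, s_y = 79.9, 13.3  # maximum bench press

# Correlation from the slope and the two standard deviations.
r = slope_b * s_x / s_y

# Consistency check: the intercept should equal mean_y - b * mean_x.
intercept = mean_y - slope_b * mean_x

print(round(r, 2), round(intercept, 1))
```

The recomputed intercept agrees with the 63.5 in the regression equation, and the correlation rounds to 0.80.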

Example 2: For the 64 female college athletes, the ANOVA table for the multiple regression predicting y = weight using x$_{1}$ = height, x$_{2}$ = % body fat, and x$_{3}$ = age shows

| Source | Degrees of freedom | Sum of Squares | Mean sum of Squares | F | P |
|---|---|---|---|---|---|
| Regression | 3 | 12407.9 | 4136.0 | 40.48 | 0.0000 |
| Residual Error | 60 | 6131.0 | 102.2 | | |

1) State and interpret the null hypothesis tested in this table.

2) From the F table, which F value would have a P value of 0.05 for these data?

3) Report the observed test statistic and P value. Interpret the P value and make a decision for a 0.05 significance level.

Solution: 1) Since there are 3 explanatory variables, the null hypothesis is H$_{0}$ : $\beta_{1}$ = $\beta_{2}$ = $\beta_{3}$ = 0. It states that weight is independent of height, % body fat, and age.

2) In the degrees of freedom column, the ANOVA table shows df$_{1}$ = 3 and df$_{2}$ = 60. From the F table, the F value with a right-tail probability of 0.05 is 2.76.

3) From the ANOVA table, the observed F test statistic is 40.5. Since this is well above 2.76, the P value is less than 0.05. The ANOVA table reports P value = 0.0000. If H$_{0}$ were true, it would be extremely unusual to get such a large F test statistic. We can reject H$_{0}$ at the 0.05 significance level.
In summary, we conclude that at least one predictor has an effect on weight.
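The figures in the ANOVA table of Example 2 can be reproduced as follows. The variable names are chosen for illustration, and the critical value 2.76 is taken from the F table as quoted in the solution, not computed here.

```python
# Values taken from the ANOVA table in Example 2.
n_observations = 64
n_predictors = 3
ss_regression = 12407.9
ss_residual = 6131.0

# Degrees of freedom: df1 = number of predictors, df2 = n - predictors - 1.
df_regression = n_predictors
df_residual = n_observations - n_predictors - 1

# Mean squares and the F statistic.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

# Tabulated F value with right-tail probability 0.05 for (3, 60) df.
f_critical = 2.76
reject_null = f_statistic > f_critical

print(df_residual, round(ms_regression, 1), round(ms_residual, 1),
      round(f_statistic, 2), reject_null)
```

The recomputed mean squares and F statistic agree with the table entries, and the comparison with the critical value gives the same decision to reject H$_{0}$.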