A probability distribution gives the possible values of a random variable and their corresponding probabilities. It can be presented as a table, a graph or a mathematical formula. A random variable has a well-defined set of possible outcomes and a well-defined probability for each outcome; which outcome actually occurs is determined by chance.

A probability distribution can be either univariate or multivariate. A univariate distribution involves only one random variable, whereas a multivariate distribution gives the probabilities of a random vector, that is, of two or more random variables considered jointly. Probability distributions can be defined for both discrete and continuous random variables.

An event is a set of outcomes to which a probability is assigned. The different types of events in probability are explained below.

Independent Events

Two events are said to be independent if the occurrence of one does not affect the probability of the other. The probability that both events occur is found by multiplying the probability of the first event by the probability of the second event.

If A and B are independent events, P(A and B) = P(A) $\times$ P(B)
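As a quick check of the product rule, the following Python sketch enumerates the 36 equally likely outcomes of rolling two fair dice (an illustrative example, not from the text) and compares P(A and B) with P(A) $\times$ P(B) using exact fractions. The helper `prob` and the event names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair six-sided dice: 36 equally likely pairs.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))

first_even = lambda o: o[0] % 2 == 0      # A: first die shows an even number
second_high = lambda o: o[1] > 4          # B: second die shows 5 or 6

p_a = prob(first_even)
p_b = prob(second_high)
p_a_and_b = prob(lambda o: first_even(o) and second_high(o))

print(p_a, p_b, p_a_and_b)                # 1/2 1/3 1/6
print(p_a_and_b == p_a * p_b)             # True: A and B are independent
```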

Dependent Events
Two events are dependent if the outcome of the first affects the probability of the second. When two events A and B are dependent and A occurs first, the probability of both occurring is P(A and B) = P(A) $\times$ P($\frac{B}{A}$), where P($\frac{B}{A}$) is the probability of B given that A has occurred.

Whether successive draws from a collection of objects are dependent depends on how the sample is drawn:
  • With Replacement: After each draw, the object is placed back where it was taken from, so it could be selected again on a later draw. The probabilities do not change from draw to draw, so the draws are independent.
  • Without Replacement: After each draw, the object is not placed back, so the probabilities change on subsequent draws and the draws are dependent.
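A short sketch of the two sampling schemes, assuming a standard 52-card deck with 4 aces (an illustrative choice): the probability of drawing two aces in two draws, with and without replacement.

```python
from fractions import Fraction

ACES, DECK = 4, 52

# With replacement: the first card is returned, so the second draw is unaffected.
p_with = Fraction(ACES, DECK) * Fraction(ACES, DECK)

# Without replacement: after an ace is removed, 3 aces remain among 51 cards,
# so P(A and B) = P(A) x P(B/A).
p_without = Fraction(ACES, DECK) * Fraction(ACES - 1, DECK - 1)

print(p_with)      # 1/169
print(p_without)   # 1/221
```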
Elementary Event
An elementary event is any single element of a sample space. Elementary events are also known as atomic events. 
Example: When a die is thrown, obtaining a 5 is an elementary event.

Compound Event
An event that consists of two or more elementary events (outcomes) is called a compound event.
Example: The event of obtaining the same side twice (both heads or both tails) when a coin is tossed twice, A = {HH, TT}.

Impossible Event
An impossible event contains no elements of the sample space; it is the empty set, and its probability is 0.
Example: Obtaining three heads when two coins are tossed simultaneously.

Disjoint or Mutually Exclusive Events
Two events A and B are disjoint, or mutually exclusive, when they have no elements in common, so they cannot occur at the same time.
Example: Turning left and turning right are mutually exclusive, since both cannot be done at the same time.

Sure Event

The sure event S consists of all possible outcomes of the sample space, so its probability is 1.
Example: Rolling two dice and obtaining a total score of less than 13.

Exhaustive Event
Two or more events are said to be exhaustive if, taken together, they cover all the possible elementary events of the experiment, that is, their union is the whole sample space. Exhaustive events may or may not be equally likely.
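The event types above can be modelled as Python sets over a sample space. This sketch assumes two fair dice as the experiment; the variable names are illustrative only.

```python
from itertools import product

# Sample space for rolling two dice.
S = set(product(range(1, 7), repeat=2))

elementary = {(5, 2)}                                  # a single outcome
same_face = {o for o in S if o[0] == o[1]}             # compound event (6 outcomes)
impossible = {o for o in S if sum(o) > 12}             # empty set
sure = {o for o in S if sum(o) < 13}                   # every outcome
sum_even = {o for o in S if sum(o) % 2 == 0}
sum_odd = {o for o in S if sum(o) % 2 == 1}

print(len(elementary), len(same_face), len(impossible))  # 1 6 0
print(sure == S)                                         # True
print(sum_even.isdisjoint(sum_odd))                      # True: mutually exclusive
print((sum_even | sum_odd) == S)                         # True: exhaustive
```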
The most common distribution used in statistical practice is the normal distribution. A normally distributed random variable has a probability distribution that is symmetric and bell-shaped. The total area under the curve is 1, and areas under the curve give probabilities. The probability density function of a normal variate X with mean $\mu$ and variance $\sigma^{2}$ is

f(x) = $\frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$, $-\infty < x < \infty$.
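A minimal sketch of the density formula in Python using only the standard `math` module; `normal_pdf` is a hypothetical helper name, and the area check uses a simple midpoint rule rather than an exact integral.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Peak of the standard normal curve, reached at x = mu.
print(round(normal_pdf(0.0), 4))   # 0.3989

# Rough numerical check that the total area is 1 (midpoint rule on [-10, 10]).
step = 0.001
area = sum(normal_pdf(-10 + (i + 0.5) * step) * step for i in range(20000))
print(round(area, 6))              # 1.0
```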

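The normal distribution, also known as the Gaussian distribution, is used in many analytical models and can approximate the binomial distribution when the number of trials is large. The normal distribution is closely connected to Z-scores: data from any normal distribution can be standardized by converting each value to a Z-score, which places it on the standard normal distribution with mean 0 and standard deviation 1. The Z-score is given as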
Z = $\frac{x - \mu}{\sigma}$
$\mu$: Mean of the population
$\sigma$: Standard deviation of the population
x: Raw score

The quantity Z represents the distance between the raw score and the population mean in units of the standard deviation. Z is negative when the raw score is below the mean and positive when it is above the mean. In process monitoring, the Z value indicates how far off target a process is operating.
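Computing a Z-score is a single subtraction and division. The scores below (mean 70, standard deviation 10) are hypothetical values chosen for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    return (x - mu) / sigma

# Hypothetical example: test scores with population mean 70 and standard deviation 10.
print(z_score(85, 70, 10))   # 1.5  (above the mean)
print(z_score(55, 70, 10))   # -1.5 (below the mean)
```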
Properties of the normal distribution are given below:
  1. The normal distribution is continuous and bell-shaped.
  2. The mean, median and mode are equal.
  3. It is symmetrical with respect to the mean: 50% of the area (data) under the curve lies to the left of the mean and 50% lies to the right of the mean.
  4. Approximately 95% of the area under the curve lies within two standard deviations of the mean.
  5. Approximately 99.7% of the area under the curve lies within three standard deviations of the mean.
  6. The normal curve extends indefinitely in both directions and is unimodal.
  7. The curve becomes flatter as the variance increases and more peaked as the variance decreases.
  8. The highest point occurs at x = $\mu$, and the curve approaches the horizontal axis asymptotically at both extremes.
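Properties 4 and 5 are rounded figures. For any normal variable, the exact fraction of area within k standard deviations of the mean is erf(k/√2), which Python's `math.erf` can evaluate; this sketch also shows the one-standard-deviation figure for comparison. The helper name `within_k_sd` is hypothetical.

```python
import math

def within_k_sd(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal X, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```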
The mean of a discrete random variable is given by the formula below:

$\mu_{x}$ = $\sum$ [x . P(X = x)]
x: value of the random variable.
P(X = x): probability that the random variable X takes the value x.

The expected value of a discrete random variable X, denoted E(X), is its mean, obtained using the formula
Mean = E(X) = $\sum$ [x . P(X = x)]
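Applying the formula to a fair six-sided die (an illustrative example), with exact fractions to avoid rounding; `expected_value` is a hypothetical helper name:

```python
from fractions import Fraction

# Distribution of a fair six-sided die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(dist):
    """E(X) = sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in dist.items())

print(expected_value(die))   # 7/2, i.e. 3.5
```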
To find the mean of a continuous distribution with probability density function f(x), use the following formula:
E(X) = $\mu$ = $\int_{-\infty}^{\infty}$ x f(x) dx
Variance is the probability-weighted average of the squared deviations from the mean.
The variance of a discrete random variable is given as follows:
$\sigma^{2}_{x} = \sum [(x - \mu_{x})^{2} \cdot P(X = x)]$

The variance of a continuous distribution is given as follows:
V(X) = $E(X^{2}) - \mu^{2}$, where $E(X^{2}) = \int_{-\infty}^{\infty} x^{2} f(x)\,dx$
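For a continuous distribution the integrals can be approximated numerically. This sketch assumes the illustrative density f(x) = 2x on [0, 1] (exact mean 2/3, exact variance 1/18) and uses a hand-written midpoint rule, `integrate`, rather than any library routine.

```python
def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Example density (chosen for illustration): f(x) = 2x on [0, 1], zero elsewhere.
f = lambda x: 2 * x

mean = integrate(lambda x: x * f(x), 0, 1)          # E(X), exact value 2/3
e_x2 = integrate(lambda x: x ** 2 * f(x), 0, 1)     # E(X^2), exact value 1/2
variance = e_x2 - mean ** 2                         # V(X) = E(X^2) - mu^2, exact 1/18

print(round(mean, 4), round(variance, 4))   # 0.6667 0.0556
```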

The standard deviation of a random variable is the square root of its variance:
$\sigma_{x} = \sqrt{V(X)}$
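Continuing the fair-die illustration (an assumed example), the variance and standard deviation follow directly from the discrete formulas:

```python
import math
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}   # fair six-sided die

mu = sum(x * p for x, p in die.items())                     # 7/2
variance = sum((x - mu) ** 2 * p for x, p in die.items())   # weighted squared deviations
std_dev = math.sqrt(variance)                               # square root of the variance

print(variance)            # 35/12
print(round(std_dev, 4))   # 1.7078
```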
Theoretical distributions are based on mathematical formulas and logic rather than on raw scores. They are used in statistics to determine probabilities. A probability distribution can be either discrete or continuous. It is an assignment of probabilities to the specific values of a random variable, and each individual probability must be between 0 and 1 inclusive.
Example: Toss a fair coin and observe the uppermost side.
The theoretical probability distribution is P(H) = $\frac{1}{2}$, P(T) = $\frac{1}{2}$.
An experiment has a binomial probability distribution if the following conditions are satisfied:
  • There is a fixed number of observations, denoted by n. Each repetition of the experiment is called a trial.
  • The observations are independent: the outcome of one trial does not affect the outcomes of the other trials.
  • Each trial has two mutually exclusive outcomes, success or failure.
  • The probability of success is the same for every trial of the experiment.
The binomial probability distribution is a discrete probability distribution frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. It is used to compute the probabilities of particular outcomes. The probability mass function of the binomial distribution is
P(X = k) = $^{n}C_{k}\ p^{k}(1-p)^{n-k}$ ; k = 0, 1, 2, 3, ..., n
$^{n}C_{k} = \frac{n!}{k!\,(n-k)!}$: the number of outcomes that include exactly k successes and n - k failures.
n and p are the parameters, where p is the probability of success on each trial. This is denoted X $\sim$ B(n, p); 0 $\leq$ X $\leq$ n
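The binomial probability mass function translates directly into Python with `math.comb` (available from Python 3.8). The example, 4 tosses of a fair coin, is an illustrative choice, and `binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Example: number of heads in 4 tosses of a fair coin.
n, p = 4, 0.5
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(probs)        # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(probs))   # 1.0
```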
The cumulative distribution function (CDF) of a real-valued random variable X is given by

$F_{X}$(x) = P( X $\leq$ x)

The probability that X lies in the half-open interval (a, b] is
P(a < X $\leq$ b) = $F_{X}(b)$ - $F_{X}(a)$ where a < b
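For a discrete variable the CDF is a running sum of probabilities. This sketch, assuming a fair six-sided die, evaluates $F_{X}$ and uses it to get the probability of the interval (2, 5]; `cdf` is a hypothetical helper name.

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}   # fair six-sided die

def cdf(dist, x):
    """F_X(x) = P(X <= x) for a discrete distribution given as {value: probability}."""
    return sum(p for value, p in dist.items() if value <= x)

print(cdf(die, 2))                   # 1/3
print(cdf(die, 5) - cdf(die, 2))     # 1/2 = P(2 < X <= 5), i.e. X in {3, 4, 5}
```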

Suppose p(x) is the density function for a quantity. The cumulative distribution function for the quantity is defined as
P(x) = $\int_{-\infty}^{x} p(t)\,dt$
The above equation gives the probability that the quantity takes a value less than or equal to x.
A continuous random variable can take an infinite number of values between any two points. If a random variable is continuous, its probability distribution is called a continuous probability distribution, and the function describing it is called a probability density function.
 
If f(x) is a probability density function defined over a specified range of x, then
Total area under the curve = $\int_{-\infty}^{\infty} f(x)\,dx = 1$
The area under the curve from x = a to x = b gives P(a < X < b) = $\int_{a}^{b} f(x)\,dx$
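The two integral statements can be checked numerically. The density f(x) = 3x² on [0, 1] is an assumption made for illustration, and `integrate` is a simple hand-written midpoint rule:

```python
def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def f(x):
    """Example density (chosen for illustration): 3x^2 on [0, 1], zero elsewhere."""
    return 3 * x ** 2 if 0 <= x <= 1 else 0.0

print(round(integrate(f, 0, 1), 6))     # 1.0   (total area under the curve)
print(round(integrate(f, 0.5, 1), 6))   # 0.875 = P(0.5 < X < 1)
```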
A discrete random variable takes a countable number of values. If X is a discrete random variable, its discrete probability distribution gives the probability that X takes each of its possible values.

Notation: the probability that the random variable X equals x is written P(X = x), or simply P(x).
Upper case (X): denotes a random variable.
Lower case (x): denotes a particular value that this random variable can take.

The probability distribution P of a discrete random variable must satisfy:
  1. 0 $\leq$ P(x) $\leq$ 1, for all x
  2. $\sum_{x}$ P(X = x) = 1
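A small checker for the two conditions, using exact fractions so that the sum-to-one test is not affected by floating-point rounding. The helper name `is_valid_pmf` and the example distributions are illustrative.

```python
from fractions import Fraction

def is_valid_pmf(dist):
    """Check the two conditions: 0 <= P(x) <= 1 for all x, and the P(x) sum to 1."""
    return all(0 <= p <= 1 for p in dist.values()) and sum(dist.values()) == 1

# Number of heads in two tosses of a fair coin.
two_coins = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
bad = {0: Fraction(1, 2), 1: Fraction(1, 3)}      # sums to 5/6, not 1

print(is_valid_pmf(two_coins))   # True
print(is_valid_pmf(bad))         # False
```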
There are many discrete probability distributions, for example the binomial, Bernoulli, Poisson and geometric distributions.

A random variable X is said to follow a uniform distribution when all its outcomes are equally likely. The uniform distribution is sometimes called the distribution of little information because, for a continuous random variable, the probability over any interval is the same as for any other interval of the same width within its range. In other words, the uniform distribution has constant probability (constant density).
Example: The number of heads when a single coin is tossed (0 or 1, each with probability $\frac{1}{2}$).
A continuous random variable is said to have a uniform distribution on the interval [a, b] if its probability density function is given as

f(x ; a, b) = $\begin{cases}
\frac{1}{b-a}, & a \leq x \leq b\\
0, & \text{otherwise}
\end{cases}$
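A sketch of the continuous uniform density and interval probabilities, assuming X ~ U(0, 10) for illustration; it shows that intervals of the same width inside [a, b] receive the same probability. The helper names are hypothetical.

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_prob(c, d, a, b):
    """P(c < X < d): the overlap of (c, d) with [a, b], divided by the width b - a."""
    overlap = max(0.0, min(d, b) - max(c, a))
    return overlap / (b - a)

# X ~ U(0, 10): intervals of equal width have equal probability.
print(uniform_pdf(3, 0, 10))       # 0.1
print(uniform_prob(1, 3, 0, 10))   # 0.2
print(uniform_prob(6, 8, 0, 10))   # 0.2
```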

The geometric distribution is a discrete distribution with probability mass function
P(x) = pq$^{x}$ ; q = 1 - p, 0 < p < 1; x = 0, 1, 2, .....
The above probability mass function models the number of failures before the first success. The geometric distribution is the only memoryless discrete probability distribution.
The mean and variance of this form of the geometric distribution are
E(X) = $\frac{1 - p}{p}$

V(X) = $\frac{1-p}{p^{2}}$

Another form of the geometric distribution is as follows.
If the probability of success on each trial is p, the probability that the xth trial is the first success is
P(x) = pq$^{x-1}$; q = 1 - p, 0 < p < 1; x = 1, 2, 3, .....

The mean and variance of this form of the geometric distribution are given as follows:
E(X) = $\frac{1}{p}$

V(X) = $\frac{1-p}{p^{2}}$
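Both forms can be checked numerically. This sketch, assuming p = 0.25, approximates each mean with a long truncated sum and compares it with $\frac{1-p}{p}$ and $\frac{1}{p}$; the function names are hypothetical.

```python
def geom_failures_pmf(x, p):
    """P(X = x) = p q^x: x failures before the first success, x = 0, 1, 2, ..."""
    return p * (1 - p) ** x

def geom_trials_pmf(x, p):
    """P(X = x) = p q^(x-1): first success on trial x, x = 1, 2, 3, ..."""
    return p * (1 - p) ** (x - 1)

p = 0.25
# Truncated sums approximate the infinite series (the neglected tail is negligible).
mean_failures = sum(x * geom_failures_pmf(x, p) for x in range(0, 2000))
mean_trials = sum(x * geom_trials_pmf(x, p) for x in range(1, 2000))

print(round(mean_failures, 6), (1 - p) / p)   # 3.0 3.0
print(round(mean_trials, 6), 1 / p)           # 4.0 4.0
```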
The hypergeometric probability distribution is a discrete distribution for a finite population of size M consisting of N elements called successes and K elements called failures (M = N + K), from which a sample of n elements is selected at random without replacement.
The probability mass function of the hypergeometric distribution, giving the probability of x successes in the sample, is

P(x) = $\frac{^{N}C_{x}\ ^{K}C_{n-x}}{^{M}C_{n}}$

The hypergeometric distribution applies to situations where you sample without replacement from a finite population whose elements can be classified into two mutually exclusive categories (for example, Pass/Fail). The probability of success changes on each draw, and each draw falls into one of the two categories.
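A sketch of the hypergeometric probability mass function with exact fractions, using the text's symbols (N successes, K failures, M = N + K, sample size n). The inspection scenario with 5 defective and 15 good items is hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(x successes) when n items are drawn without replacement from
    N successes and K failures (population size M = N + K)."""
    M = N + K
    return Fraction(comb(N, x) * comb(K, n - x), comb(M, n))

# Hypothetical example: 5 defective and 15 good items, sample of 4.
N, K, n = 5, 15, 4
pmf = {x: hypergeom_pmf(x, N, K, n) for x in range(0, min(N, n) + 1)}

print(pmf[1])              # 455/969
print(sum(pmf.values()))   # 1
```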
The Poisson distribution was developed by the French mathematician Simeon Denis Poisson in 1837. It is a discrete distribution used to model the number of events that occur in a fixed interval of time, over which the number of occurrences (successes) is recorded.

The probability mass function of Poisson distribution is
p(x, $\lambda$) = $\frac{e^{-\lambda}\lambda^{x}}{x!}$ for x = 0, 1, 2, .....
where, $\lambda$: parameter which indicates the average number of events in the given time interval.
$e \approx 2.71828$ (the base of the natural logarithm)
In a Poisson distribution, only one parameter, $\lambda$, is needed to determine the probability of an event. The mean and variance of the Poisson distribution are both equal to $\lambda$.
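A sketch of the Poisson probability mass function, assuming λ = 3 for illustration; the mean and variance are approximated by summing over x = 0..99, which leaves out only a negligible tail. `poisson_pmf` is a hypothetical helper name.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) lam^x / x! for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 3.0   # e.g. an average of 3 events per interval (illustrative value)
probs = [poisson_pmf(x, lam) for x in range(100)]   # tail beyond 100 is negligible

mean = sum(x * p for x, p in enumerate(probs))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

print(round(poisson_pmf(0, lam), 4))        # 0.0498
print(round(mean, 6), round(variance, 6))   # 3.0 3.0
```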
The exponential distribution is a continuous distribution with probability density function f(x) = $\lambda e^{-\lambda x}$, x $\geq$ 0, where $\lambda > 0$.

The shape of the exponential distribution depends on the value of $\lambda$; smaller values of $\lambda$ flatten the curve. It is used to model the time until an event occurs in a process, and it is the only memoryless continuous probability distribution.

The mean and variance of an exponential distribution with parameter $\lambda$ are given by

E[X] = $\frac{1}{\lambda}$

Var[X] = $\frac{1}{\lambda^{2}}$
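A sketch illustrating the exponential density and its memoryless property, P(X > s + t | X > s) = P(X > t), assuming λ = 0.5 and the survival function P(X > x) = $e^{-\lambda x}$ obtained by integrating the density; it also prints the mean 1/λ and variance 1/λ². The function names are hypothetical.

```python
import math

def expon_pdf(x, lam):
    """f(x) = lam * e^(-lam x) for x >= 0, zero for negative x."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def expon_survival(x, lam):
    """P(X > x) = e^(-lam x), obtained by integrating the density from x to infinity."""
    return math.exp(-lam * x)

lam, s, t = 0.5, 2.0, 3.0
# Memorylessness: P(X > s + t | X > s) equals P(X > t).
conditional = expon_survival(s + t, lam) / expon_survival(s, lam)
print(math.isclose(conditional, expon_survival(t, lam)))   # True
print(expon_pdf(0, lam), 1 / lam, 1 / lam ** 2)            # 0.5 2.0 4.0
```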
Conditional probability is the probability of an event calculated given that another event has already occurred. If A and B are the events considered, the conditional probability is written as P($\frac{A}{B}$), which means the probability of occurrence of event A given that event B has occurred.

For the two given events A and B with P(B) > 0, the conditional probability of A given B is

P($\frac{A}{B}$) = $\frac{P(A\cap B)}{P(B)}$
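The definition can be applied by counting outcomes. This sketch assumes two fair dice and computes the probability that the total is 8 given that the first die shows 3; the helper `prob` and the event names are illustrative.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # two fair dice, 36 outcomes

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for o in S if event(o)), len(S))

A = lambda o: o[0] + o[1] == 8      # the total is 8
B = lambda o: o[0] == 3             # the first die shows 3

p_a_given_b = prob(lambda o: A(o) and B(o)) / prob(B)   # P(A and B) / P(B)
print(prob(A))        # 5/36
print(p_a_given_b)    # 1/6
```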
If X and Y are discrete random variables and f(x, y) is the value of their joint probability distribution at (x, y), then the functions
g(x) = $\sum_{y} f(x, y)$ and
h(y) = $\sum_{x} f(x, y)$
are the marginal distributions of X and Y, respectively.
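Computing marginals amounts to summing the joint probabilities over the other variable. The joint table below is a hypothetical example:

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint distribution f(x, y) of two discrete random variables X and Y.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(3, 8), (1, 1): Fraction(1, 4),
}

g = defaultdict(Fraction)   # marginal of X: g(x) = sum over y of f(x, y)
h = defaultdict(Fraction)   # marginal of Y: h(y) = sum over x of f(x, y)
for (x, y), p in joint.items():
    g[x] += p
    h[y] += p

print(dict(g))   # {0: Fraction(3, 8), 1: Fraction(5, 8)}
print(dict(h))   # {0: Fraction(1, 2), 1: Fraction(1, 2)}
```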