Here is a probability distribution cheat sheet that I like to keep around for reference. It focuses on the “big picture” properties of some well-known probability distributions. The goal is to collect some properties that can help me decide when it’s appropriate to use a particular distribution.

## Beta Distribution

- Used in task duration modeling, and for estimating an unknown probability from data (e.g. the “sunrise problem”: “What should the probability be after seeing a series of Bernoulli trials?”).
- It can serve as a prior on the success probability $p$ of a binomial distribution. After observing some successes and failures, the posterior is again a beta distribution, with its two parameters incremented by the success and failure counts.
- In a Bayesian setting, it is a conjugate prior for the Bernoulli and binomial distributions.
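Here’s a minimal Python sketch of that conjugate update, assuming the usual Beta(α, β) parameterization with mean α / (α + β). The helper names `beta_posterior` and `beta_mean` are my own, not from any library.

```python
# Beta-binomial conjugate update (hypothetical helper names).


def beta_posterior(alpha: float, beta: float, successes: int, failures: int) -> tuple[float, float]:
    """Update a Beta(alpha, beta) prior after observing Bernoulli outcomes."""
    return alpha + successes, beta + failures


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Sunrise problem: uniform prior Beta(1, 1), and the sun has risen n days in a row.
n = 10
a, b = beta_posterior(1, 1, successes=n, failures=0)
print(beta_mean(a, b))  # Laplace's rule of succession: (n + 1) / (n + 2)
```

With a uniform Beta(1, 1) prior, the posterior mean after $n$ successes and no failures is $(n+1)/(n+2)$, which is Laplace’s classic answer to the sunrise problem.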

## Cauchy Distribution

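- $X/Y$ is Cauchy where $X$ and $Y$ are independent standard normal.
- The sample mean of independent standard Cauchy random variables (RVs) is itself a standard Cauchy. Averaging more samples never narrows the spread, because the Cauchy distribution has no mean.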
- Is the same as a Student’s t-distribution with one degree of freedom.
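A quick simulation of the ratio-of-normals construction, as a sketch using only Python’s `random` module. It checks one easy property of the standard Cauchy: its CDF is $\tfrac12 + \arctan(x)/\pi$, so $P(|C| < 1) = 0.5$. The function names are hypothetical.

```python
import math
import random


def cauchy_from_normals(rng: random.Random) -> float:
    """One draw of X / Y with X, Y independent standard normals."""
    return rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0)


def fraction_inside_unit(n: int, seed: int = 0) -> float:
    """Empirical P(|X/Y| < 1); for a standard Cauchy this is 2 * atan(1) / pi = 0.5."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if abs(cauchy_from_normals(rng)) < 1.0)
    return hits / n


print(fraction_inside_unit(100_000), 2 * math.atan(1) / math.pi)
```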

## Chi-Squared Distribution

- $\sum_{i=1}^{k} Z_i^2$, where the $Z_i$ are independent standard normal, is Chi-Squared with $k$ degrees of freedom.
- Because it’s a sum of squares, it often comes up in variance and standard deviation tests.
- It’s used in the Chi-Squared Test to compare an observed distribution to a theoretical one.
- It’s used to calculate the similarity of two histograms. (This is used in computer vision a lot.)
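Both uses can be sketched in a few lines of Python. `pearson_chi2` is the standard goodness-of-fit statistic $\sum (O-E)^2/E$. For histogram comparison I’m assuming the common symmetric form $\tfrac12 \sum (a-b)^2/(a+b)$; other variants drop the ½ or divide by only one histogram, so check which one your library uses. Both function names are hypothetical.

```python
def pearson_chi2(observed: list[float], expected: list[float]) -> float:
    """Pearson's statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_histogram_distance(h1: list[float], h2: list[float]) -> float:
    """Symmetric chi-squared distance: 0.5 * sum((a - b)^2 / (a + b)).

    Bins that are empty in both histograms are skipped to avoid 0 / 0.
    """
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


print(pearson_chi2([10, 20], [15, 15]))
print(chi2_histogram_distance([0.2, 0.5, 0.3], [0.1, 0.6, 0.3]))
```

To turn the Pearson statistic into a p-value you compare it against a Chi-Squared distribution with (number of categories − 1) degrees of freedom, minus one more for each parameter estimated from the data.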

## Exponential Distribution

- Models the time between events in a Poisson process.
- It’s the continuous counterpart to the geometric distribution.
- Models the “time for a continuous process to change state.” E.g. Time to particle decay.
- Assumes that events occur at a constant average rate.
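The link between exponential gaps and Poisson counts can be sketched directly: draw exponential inter-arrival times by inverse-transform sampling ($-\ln(1-U)/\lambda$ for uniform $U$) and count how many arrivals land in a fixed window. This is an illustrative simulation with hypothetical function names, not a library API.

```python
import math
import random


def exp_inverse_transform(rate: float, rng: random.Random) -> float:
    """If U ~ Uniform[0, 1), then -ln(1 - U) / rate ~ Exponential(rate)."""
    u = rng.random()
    return -math.log(1.0 - u) / rate


def poisson_process_count(rate: float, horizon: float, rng: random.Random) -> int:
    """Count events in [0, horizon] by summing exponential inter-arrival times."""
    t, count = 0.0, 0
    while True:
        t += exp_inverse_transform(rate, rng)
        if t > horizon:
            return count
        count += 1


rng = random.Random(1)
gaps = [exp_inverse_transform(2.0, rng) for _ in range(100_000)]
counts = [poisson_process_count(2.0, 5.0, rng) for _ in range(20_000)]
# Theory: mean gap = 1 / rate = 0.5, mean count = rate * horizon = 10.
print(sum(gaps) / len(gaps), sum(counts) / len(counts))
```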

## F-Distribution

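- The ratio of two independent (scaled) Chi-Squared RVs is F-distributed. That is,

  $$\frac{U_1/d_1}{U_2/d_2} \sim F(d_1, d_2), \qquad U_1 \sim \chi^2_{d_1},\ U_2 \sim \chi^2_{d_2},$$

  where the $d$’s are the corresponding degrees of freedom.
- Since it’s a ratio of Chi-Squared RVs, it’s useful for comparing the variances of two distributions.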
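The classic use is the variance-ratio test. A minimal sketch, assuming two independent samples from normal populations: under equal variances, $s_a^2 / s_b^2$ follows $F(n_a - 1, n_b - 1)$. The function name `f_statistic` is my own; getting a p-value would need an F CDF from a stats library, which I’ve left out.

```python
import statistics


def f_statistic(a: list[float], b: list[float]) -> tuple[float, int, int]:
    """Return s_a^2 / s_b^2 and its degrees of freedom (len(a) - 1, len(b) - 1)."""
    # statistics.variance uses the n - 1 (sample) denominator.
    return statistics.variance(a) / statistics.variance(b), len(a) - 1, len(b) - 1


print(f_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```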

## Gamma Distribution

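- Commonly used to model waiting times and in lifetime testing. (Think: the waiting time until several rare events have occurred.)
- When its shape parameter $k$ is an integer, it’s the sum of $k$ independent exponential RVs with the same rate (the Erlang distribution).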
- Commonly seen as the continuous analog of the negative binomial distribution.
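A quick check of the sum-of-exponentials property, sketched with Python’s `random` module. Note that `random.gammavariate(alpha, beta)` takes a shape and a *scale*, so scale = 1 / rate here. `erlang_sample` is a hypothetical helper.

```python
import random
import statistics


def erlang_sample(k: int, rate: float, rng: random.Random) -> float:
    """Sum of k independent Exponential(rate) draws, i.e. Gamma(shape=k, rate=rate)."""
    return sum(rng.expovariate(rate) for _ in range(k))


rng = random.Random(42)
k, rate = 3, 2.0
sums = [erlang_sample(k, rate, rng) for _ in range(50_000)]
direct = [rng.gammavariate(k, 1.0 / rate) for _ in range(50_000)]

# Theoretical mean is k / rate and theoretical variance is k / rate**2.
print(statistics.fmean(sums), statistics.fmean(direct), k / rate)
print(statistics.pvariance(sums), statistics.pvariance(direct), k / rate**2)
```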

## Geometric Distribution

- Models the problem: “How many Bernoulli trials X are needed before I get a success? What is the distribution of this X?”
- It’s the discrete analog of the exponential distribution.
- It’s memoryless, meaning the probability of needing $n$ more trials does not depend on how many failures have already occurred.
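Memorylessness is easy to see from the survival function $P(X > n) = (1-p)^n$. This sketch assumes the convention where $X$ counts trials up to and including the first success ($k \ge 1$); the other common convention counts only the failures, which shifts everything by one. Function names are hypothetical.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(X = k) where X counts trials up to and including the first success (k >= 1)."""
    return (1 - p) ** (k - 1) * p


def geometric_survival(n: int, p: float) -> float:
    """P(X > n): the first n trials were all failures."""
    return (1 - p) ** n


p, m, n = 0.3, 4, 3
# Memorylessness: P(X > m + n | X > m) equals P(X > n).
print(geometric_survival(m + n, p) / geometric_survival(m, p), geometric_survival(n, p))
```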

## Hypergeometric Distribution

- Describes the Urn Problem: “Given N marbles (some white, the rest black), if I draw n of them without replacement, what’s the probability that k are white?”
- The multivariate hypergeometric distribution handles marbles of more than two colors.
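The urn probability is just counting: choose $k$ of the $K$ white marbles and $n-k$ of the $N-K$ black ones, out of all ways to choose $n$ from $N$. A minimal sketch using `math.comb` (the function name is my own):

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(exactly k white) when drawing n of N marbles, K of them white, without replacement."""
    if k < 0 or k > n or k > K or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# 10 marbles, 4 of them white; draw 3.
print([round(hypergeom_pmf(k, N=10, K=4, n=3), 4) for k in range(4)])
# prints [0.1667, 0.5, 0.3, 0.0333]
```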

## Negative Binomial Distribution

- Describes the number of successes in a sequence of independent Bernoulli trials before the $k$-th failure.
- It can be formulated to have a real value for the parameter k. This is useful in describing “contagious” discrete events like tornado outbreaks (according to Wikipedia).
- It’s a good “over-dispersed Poisson” for count data whose sample variance exceeds its sample mean.
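Here’s a sketch of the real-valued version and its over-dispersion, assuming the parameterization above: $p$ is the success probability and $r > 0$ (the $k$ in the bullets) is the number of failures, with $P(K = j) = \frac{\Gamma(j+r)}{j!\,\Gamma(r)} p^j (1-p)^r$. Libraries differ on which outcome counts as “success,” so double-check before comparing. Helper names are hypothetical.

```python
import math


def neg_binom_pmf(k: int, r: float, p: float) -> float:
    """P(k successes before the r-th failure); p = success probability, r > 0 may be real."""
    # lgamma generalizes the binomial coefficient C(k + r - 1, k) to real r.
    log_coef = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + k * math.log(p) + r * math.log(1 - p))


def numeric_mean_var(r: float, p: float, kmax: int = 1000) -> tuple[float, float]:
    """Mean and variance by summing the (truncated) pmf."""
    probs = [neg_binom_pmf(k, r, p) for k in range(kmax)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var


r, p = 2.5, 0.4
mean, var = numeric_mean_var(r, p)
# Closed forms: mean = p*r/(1-p), var = p*r/(1-p)^2, so var/mean = 1/(1-p) > 1.
print(mean, var, var / mean)
```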

## Poisson Distribution

- Describes the probability of a given number of events occurring in a fixed interval, assuming events occur independently at a constant average rate.
- The Poisson mean equals its variance. If your data does not fit this property, see the negative binomial distribution.
- It’s the limit of the binomial distribution when the number of trials $n$ is very large and the success probability $p$ is very small, with $\lambda = np$ held fixed.
- Cool factoid: The fact that the Poisson mean equals its variance is used to count minuscule discrete events. (See the Wikipedia entry for more details.)
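The binomial limit can be checked numerically. This sketch compares Binomial$(n, \lambda/n)$ with Poisson$(\lambda)$ point by point and shows the largest gap shrinking as $n$ grows; the function names are my own.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)


def binomial_pmf(k: int, n: int, p: float) -> float:
    # math.comb(n, k) is 0 when k > n, so those terms vanish.
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def max_gap(lam: float, n: int, kmax: int = 30) -> float:
    """Largest pointwise difference between Binomial(n, lam/n) and Poisson(lam)."""
    p = lam / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(kmax + 1))


for n in (10, 100, 10_000):
    print(n, max_gap(3.0, n))
```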

## Student’s t-distribution

- $T = \dfrac{Z}{\sqrt{V/\nu}}$ has a Student’s t-distribution with $\nu$ degrees of freedom, where $Z$ is standard normal and $V$ is Chi-Squared with $\nu$ degrees of freedom, independent of $Z$.
- It’s a fatter-tailed version of the normal distribution, and it approaches the normal as $\nu \to \infty$.
- Useful for characterizing RVs that contain a ratio where small values in the denominator can cause significant weight in the tails.
- Arises from the ratio of the sample mean’s deviation from the true mean to its estimated standard error: for $n$ normal samples, $(\bar{X} - \mu)/(S/\sqrt{n})$ has a t-distribution with $n-1$ degrees of freedom, where $S$ is the sample standard deviation. In the mathematics the dependency on the actual standard deviation drops away. This is fortunate since during an experiment the actual standard deviation is rarely known.
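The one-sample t statistic makes the last point concrete: it uses only the sample standard deviation, never the true $\sigma$. A minimal sketch (the function name is hypothetical; converting $t$ to a p-value needs a t CDF from a stats library, which I’ve left out):

```python
import math
import statistics


def one_sample_t(sample: list[float], mu0: float) -> tuple[float, int]:
    """t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1


print(one_sample_t([2.0, 4.0, 6.0], mu0=0.0))
```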