A **factorial distribution** arises when a set of variables are statistically independent. In other words, the variables don’t interact at all: given two events x and y, the probability of x doesn’t change when you factor in y. Therefore, the probability of x given that y has happened, P(x|y), is the same as P(x).

The factorial distribution can be written for two, three, or more variables (Hinton, 2013; Olshausen, 2004):

- p(x,y) = p(x)p(y)
- p(x,y,z) = p(x)p(y)p(z)
- p(x_{1}, x_{2}, x_{3}, x_{4}) = p(x_{1}) p(x_{2}) p(x_{3}) p(x_{4})

In the case of a probability vector, the meaning is exactly the same. That is, the joint probability of a vector drawn from a factorial distribution is the product of the probabilities of the vector’s individual components.
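To make the two-variable case concrete, here is a minimal sketch that checks whether a small joint probability table satisfies p(x,y) = p(x)p(y). The function names (`marginals`, `is_factorial`) and the coin-flip tables are hypothetical, chosen only for illustration.

```python
from itertools import product


def marginals(joint):
    """Return the marginals p(x) and p(y) of a joint table {(x, y): probability}."""
    p_x, p_y = {}, {}
    for (x, y), prob in joint.items():
        p_x[x] = p_x.get(x, 0.0) + prob
        p_y[y] = p_y.get(y, 0.0) + prob
    return p_x, p_y


def is_factorial(joint, tol=1e-12):
    """True if p(x, y) equals p(x) * p(y) for every pair of values (within tol)."""
    p_x, p_y = marginals(joint)
    return all(
        abs(joint.get((x, y), 0.0) - p_x[x] * p_y[y]) <= tol
        for x, y in product(p_x, p_y)
    )


# Hypothetical example: two independent coin flips, one fair and one biased.
first = {"H": 0.5, "T": 0.5}
second = {"H": 0.3, "T": 0.7}
independent = {(x, y): first[x] * second[y] for x in first for y in second}

# Hypothetical counterexample: the second flip always copies the first.
dependent = {("H", "H"): 0.5, ("T", "T"): 0.5}

print(is_factorial(independent))
print(is_factorial(dependent))
```

The second table fails the check because knowing x tells you y exactly, so P(x|y) is no longer the same as P(x).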

## Defining a Factorial Distribution

For a factorial distribution, P(x,y) = P(x)P(y). We can generalize this for more than two variables (Olshausen, 2004) and write:

P(x_{1}, x_{2}, …, x_{n}) = P(x_{1}) · P(x_{2}) · … · P(x_{n}).

This expression can also be written more concisely as:

P(x_{1}, x_{2}, …, x_{n}) = Π_{i=1}^{n} P(x_{i}).
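The product formula translates almost directly into code. The sketch below assumes each variable's distribution is stored as a dictionary from value to probability; that layout, the function names, and the three example marginals are assumptions made for illustration. A log-space version is included because products of many small probabilities can underflow.

```python
import math


def factorial_joint(marginal_list, outcome):
    """P(x_1, ..., x_n) as the product of the individual P(x_i)."""
    return math.prod(p_i[x_i] for p_i, x_i in zip(marginal_list, outcome))


def factorial_log_joint(marginal_list, outcome):
    """log P(x_1, ..., x_n) as the sum of the individual log P(x_i)."""
    return sum(math.log(p_i[x_i]) for p_i, x_i in zip(marginal_list, outcome))


# Hypothetical marginals for three independent binary variables.
marginal_list = [
    {0: 0.9, 1: 0.1},
    {0: 0.5, 1: 0.5},
    {0: 0.2, 1: 0.8},
]

outcome = (1, 0, 1)
joint = factorial_joint(marginal_list, outcome)          # 0.1 * 0.5 * 0.8
log_joint = factorial_log_joint(marginal_list, outcome)  # log of the same value
print(joint, math.exp(log_joint))
```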

## Examples of Factorial Distributions

We like to work with factorial distributions because their statistics are easy to compute. In some fields such as neurology, situations best represented by complicated, intractable probability distributions are approximated by factorial distributions in order to take advantage of this ease of manipulation.
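As a rough illustration of why the statistics are easy to compute, consider n independent binary variables (an example chosen here, not one taken from the references). The mean and variance of their sum can be read off one variable at a time, while a brute-force calculation has to visit all 2^n outcomes. The probabilities in `q` are hypothetical.

```python
import math
from itertools import product


def brute_force_mean_var(q):
    """Mean and variance of x_1 + ... + x_n by enumerating all 2^n outcomes."""
    mean = 0.0
    second_moment = 0.0
    for outcome in product((0, 1), repeat=len(q)):
        # Joint probability of this outcome under the factorial distribution.
        prob = math.prod(q_i if x_i else 1.0 - q_i for q_i, x_i in zip(q, outcome))
        total = sum(outcome)
        mean += prob * total
        second_moment += prob * total * total
    return mean, second_moment - mean ** 2


def factorial_mean_var(q):
    """Same statistics using independence: means add, and so do variances."""
    return sum(q), sum(q_i * (1.0 - q_i) for q_i in q)


# Hypothetical P(x_i = 1) for four independent binary variables.
q = [0.1, 0.5, 0.8, 0.3]

print(brute_force_mean_var(q))
print(factorial_mean_var(q))
```

Both functions agree, but the second one does n steps of work instead of 2^n.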

One example of an often-encountered factorial distribution is the p-generalized normal distribution (Sinz, 2008).

I won’t go into its formula here; if you’d like to go deeper, see Sinz (2008). But note that when p = 2, this is exactly the normal distribution with independent, equal-variance components. So a normal distribution whose components are independent is also factorial.
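The remark about the normal distribution can be checked numerically for the case where the components are independent and share one variance. The sketch below compares the density of a zero-mean normal with covariance σ²I against the product of one-dimensional normal densities. The function names, the test point, and the value of σ are assumptions for illustration; a normal distribution with correlated components would not pass this check.

```python
import math


def normal_pdf(x, sigma=1.0):
    """Density of a zero-mean one-dimensional normal with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def isotropic_normal_pdf(xs, sigma=1.0):
    """Density of a zero-mean n-dimensional normal with covariance sigma^2 * I."""
    n = len(xs)
    squared_norm = sum(x * x for x in xs)
    normalizer = (2.0 * math.pi * sigma ** 2) ** (n / 2)
    return math.exp(-squared_norm / (2.0 * sigma ** 2)) / normalizer


# Hypothetical evaluation point and standard deviation.
xs = (0.3, -1.2, 2.0)
sigma = 1.5

joint_density = isotropic_normal_pdf(xs, sigma)
product_of_marginals = math.prod(normal_pdf(x, sigma) for x in xs)
print(joint_density, product_of_marginals)
```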

## References

Grötschel, M. et al. (Eds.) (2013). Online Optimization of Large Scale Systems. Springer Science & Business Media.

Hinton, G. (2013). Lecture 1: Introduction to Machine Learning and Graphical Models. Retrieved December 28, 2017 from: https://www.cs.toronto.edu/~hinton/csc2535/notes/lec1new.pdf

Jordan, I. et al. (2001). Graphical Models: Foundations of Neural Computation. MIT Press.

Olshausen, B. (2004). A Probability Primer. Retrieved December 27, 2017 from: http://redwood.berkeley.edu/bruno/npb163/probability.pdf

Sinz, F. (2008). Characterization of the p-Generalized Normal Distribution. Retrieved December 27, 2017 from: http://www.orga.cvss.cc/media/publications/SinzGerwinnBethge_2008.pdf

