## Post-Hoc Tests

Post-hoc (Latin, meaning “after this”) tests are run after an experiment to analyze the results of your experimental data. They are often based on a familywise error rate: the probability of at least one Type I error in a set (family) of comparisons. The most common post-hoc tests are:

- Bonferroni Procedure
- Duncan’s new multiple range test (MRT)
- Dunn’s Multiple Comparison Test
- Fisher’s Least Significant Difference (LSD)
- Holm-Bonferroni Procedure
- Newman-Keuls
- Rodger’s Method
- Scheffé’s Method
- Tukey’s Test (see also: Studentized Range Distribution)
- Dunnett’s correction
- Benjamini-Hochberg (BH) procedure

Bonferroni Procedure (Bonferroni Correction)

This multiple-comparison post-hoc correction is used when you are performing many independent or dependent statistical tests at the same time. The problem with running many simultaneous tests is that the probability of at least one falsely significant result increases with each test run. This correction sets the significance cutoff at α/n, where n is the number of tests. For example, if you are running 20 simultaneous tests at α = 0.05, the corrected cutoff would be 0.05/20 = 0.0025. The Bonferroni correction does suffer from a loss of power. This is due to several reasons, including the fact that Type II error rates are high for each test. In other words, it overcorrects for Type I errors.
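As a rough illustration of the α/n cutoff, here is a minimal Python sketch. The function name `bonferroni_reject` and the p-values are hypothetical, made up purely for this example.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return the per-test cutoff and a reject/keep decision for each p-value."""
    n = len(p_values)
    cutoff = alpha / n  # Bonferroni: split alpha evenly across the n tests
    return cutoff, [p < cutoff for p in p_values]


# Hypothetical p-values from 20 simultaneous tests (invented for illustration).
p_values = [0.001, 0.004, 0.03, 0.2] + [0.5] * 16
cutoff, decisions = bonferroni_reject(p_values)

print(cutoff)          # the corrected cutoff, 0.05 / 20
print(decisions[:4])   # decisions for the four smallest p-values
```

With these made-up values, only the test with p = 0.001 falls below the corrected cutoff, even though three of the p-values are below the uncorrected 0.05.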

Holm-Bonferroni Method

The ordinary Bonferroni method is sometimes viewed as too conservative. Holm’s sequential Bonferroni post-hoc test is a less strict correction for multiple comparisons. See: Holm-Bonferroni method for a step-by-step example.
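To show why Holm’s method is less strict, here is a minimal sketch of the usual step-down formulation: sort the p-values, compare the k-th smallest with α/(m − k + 1), and stop at the first one that fails. The function name `holm_bonferroni` and the p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's method rejects the null."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # rank is 0-based, so the k-th smallest p-value (k = rank + 1)
        # is compared with alpha / (m - k + 1) = alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first failure; all larger p-values are kept
    return reject


# Hypothetical p-values (invented for illustration).
p_values = [0.01, 0.04, 0.015, 0.005]
holm = holm_bonferroni(p_values)
plain = [p < 0.05 / len(p_values) for p in p_values]  # ordinary Bonferroni

print(holm)
print(plain)
```

For these values, Holm’s method rejects all four null hypotheses, while the ordinary Bonferroni cutoff of 0.0125 rejects only two of them.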

Duncan’s new multiple range test (MRT)

When you run Analysis of Variance (ANOVA), the results will tell you if there is a difference in means. However, it won’t pinpoint the pairs of means that are different. Duncan’s Multiple Range Test will identify the pairs of means (from at least three) that differ. The MRT is similar to the LSD, but instead of a t-value, a Q Value is used.

Fisher’s Least Significant Difference (LSD)

A tool to identify which pairs of means are statistically different. Essentially the same as Duncan’s MRT, but with t-values instead of Q values. **See**: Fisher’s Least Significant Difference.

Newman-Keuls

Like Tukey’s, this post-hoc test identifies sample means that are different from each other. Unlike Tukey’s, Newman-Keuls uses different critical values for different pairs of means, depending on how far apart the means are in rank order. Therefore, it is more likely to find significant differences.

Rodger’s Method

Considered by some to be the most powerful post-hoc test for detecting differences among groups. This test protects against loss of statistical power as the degrees of freedom increase.

Scheffé’s Method

Used when you want to look at post-hoc comparisons in general (as opposed to just pairwise comparisons). Scheffé’s method controls the overall confidence level. It is customarily used with unequal sample sizes.

See: The Scheffe Test.

Tukey’s Test

The purpose of Tukey’s test is to figure out which groups in your sample differ. It uses the “Honest Significant Difference,” a number that represents the distance between groups, to compare every mean with every other mean.

Dunnett’s correction

Like Tukey’s, this post-hoc test is used to compare means. Unlike Tukey’s, it compares every mean to a control mean. For calculation steps, see: Dunnett’s Test.

Benjamini-Hochberg (BH) procedure

If you perform a very large number of tests, one or more of the tests is likely to have a significant result purely by chance. This post-hoc procedure controls the false discovery rate: the expected proportion of significant results that are false positives. For more details, including how to run the procedure, see: Benjamini-Hochberg Procedure.
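A minimal sketch of the standard step-up formulation of the procedure follows: rank the p-values, find the largest rank k whose p-value is at most (k/m)·q, and reject every hypothesis up to that rank. The function name `benjamini_hochberg`, the false discovery rate `q = 0.05`, and the p-values are assumptions made for this example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the BH procedure rejects the null."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    largest_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            largest_k = rank
    # Reject every hypothesis ranked at or below largest_k.
    reject = [False] * m
    for i in order[:largest_k]:
        reject[i] = True
    return reject


# Hypothetical p-values (invented for illustration).
p_values = [0.60, 0.001, 0.024, 0.021, 0.20]
print(benjamini_hochberg(p_values))
```

Note that 0.021 misses its own threshold (2/5 × 0.05 = 0.02) but is still rejected, because the next-ranked p-value, 0.024, passes its threshold of 0.03. That is the step-up behavior of the procedure.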

## More on the Bonferroni Correction

The Bonferroni correction is used to limit the possibility of getting a statistically significant result by chance when testing multiple hypotheses. It’s needed because the more tests you run, the more likely you are to get a significant result by chance alone. The correction shrinks the area where you can reject the null hypothesis. In other words, it makes the significance threshold (the alpha level each p-value is compared against) smaller.

Imagine looking for the Ace of Clubs in a deck of cards: if you pull one card from the deck, the odds are pretty low (1/52) that you’ll get the Ace of Clubs. Try again (perhaps 50 times) and you’ll probably end up getting the Ace. The same principle works with hypothesis testing: the more simultaneous tests you run, the more likely you’ll get a “significant” result. Let’s say you were running 50 tests simultaneously with an alpha level of 0.05. The probability of observing at least one significant event due to chance alone is:

P(at least one significant event) = 1 – P(no significant events)

= 1 – (1 – 0.05)^50 ≈ 0.92.

In other words, you are almost certain (92%) to get at least one significant result.
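The calculation above can be sketched in a few lines of Python to see how quickly the probability grows with the number of tests. The function name `familywise_error_rate` is hypothetical, and the formula assumes the tests are independent.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 5, 20, 50):
    print(n, round(familywise_error_rate(n), 2))
```

For 50 tests this gives the 0.92 from the worked example, compared with only 0.05 for a single test.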

## How to Calculate the Bonferroni Correction

The calculation is simple: divide the alpha level (α) by the number of tests you’re running.

**Sample question:** A researcher is testing 25 different hypotheses at the same time, using an alpha level of 0.05. What is the Bonferroni correction?

**Answer:**

Bonferroni correction is α/n = .05/25 = .002

For this set of 25 tests, you would reject the null only if your p-value was smaller than .002.
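The same answer can be checked in Python. The sketch below also shows an equivalent way of applying the correction that is not part of the worked example above: instead of shrinking alpha, multiply each p-value by the number of tests (capping at 1) and compare it with the original alpha. The helper name `bonferroni_adjust` and the p-values are hypothetical.

```python
alpha = 0.05
n_tests = 25
threshold = alpha / n_tests  # Bonferroni-corrected cutoff for each test
print(threshold)


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


# Hypothetical p-values from 25 tests (invented for illustration).
p_values = [0.0004, 0.003] + [0.5] * 23
adjusted = bonferroni_adjust(p_values)

# Both views give the same decisions for these values.
by_threshold = [p < threshold for p in p_values]
by_adjusted = [a < alpha for a in adjusted]
print(by_threshold == by_adjusted)
```

Here only the first hypothetical test (p = 0.0004) is significant after correction; p = 0.003 would pass an uncorrected 0.05 cutoff but not the corrected one.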

## The Bonferroni Correction and Medical Testing

Matthew A. Napierala, MD points out how multiple tests affect physicians (and patients) in an article for the American Academy of Orthopaedic Surgeons. “In contemporary orthopaedic research studies, numerous simultaneous tests are routinely performed.” This means that given enough tests, one of them is bound to come back as a false positive. Definitely *not* a good thing when we’re talking about health issues.
