Statistics How To

Dummy Variables / Indicator Variables: Simple Definition, Examples



What are Dummy Variables?

Dummy variables (sometimes called indicator variables) are used in regression analysis and latent class analysis. As the name implies, these variables are artificial: each one takes the value 0 or 1 to flag whether an observation belongs to a particular category. They are used when you want to work with categorical variables that have two or more categories (levels) with no quantifiable relationship to each other.

For example, race can be categorized as Caucasian, African American, Asian, Hispanic, or Other. If you assign the numbers 1-5 to these categories when performing regression analysis, the results would make no sense, because the model would treat the codes as real quantities (is the “Other” category in any way 5 times the “Caucasian” category?). However, if you create a variable called Caucasian and set it to 1 to mean “is Caucasian” and 0 to mean “is not Caucasian”, then you can start to see how dummy variables are useful.
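The 0/1 coding described above can be sketched in a few lines of Python. This is only an illustration: the sample data and the helper name `indicator` are made up for this example and are not from any particular statistics package.

```python
# Hypothetical sample data: one race label per respondent.
races = ["Caucasian", "Asian", "Hispanic", "Caucasian", "Other"]


def indicator(values, level):
    """Return 1 where a value equals `level` and 0 everywhere else."""
    return [1 if value == level else 0 for value in values]


# The dummy variable "Caucasian": 1 = is Caucasian, 0 = is not Caucasian.
caucasian = indicator(races, "Caucasian")
print(caucasian)  # [1, 0, 0, 1, 0]
```

Unlike the 1-5 coding, the resulting column makes no claim about order or distance between categories; it only records membership.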

In latent class analysis, the term indicator variable means something more specific, although it’s still an artificial variable. A set of observed variables can “indicate” the presence of one or more latent (hidden) variables — hence the term indicator variable.

Coding Categorical Variables with Multiple Levels

If you have a categorical variable with more than two levels (levels are the different groups within the same independent variable), multiple dummy variables need to be created. In the above example, the categorical variable “Race” has five levels (Caucasian, African American, Asian, Hispanic, Other). The formula k-1 is used to decide how many dummy variables to code, where “k” is the number of levels. In other words, only four of these five levels are coded with dummy variables. The fifth is not needed: an observation with 0 on all four dummy variables must belong to the level that was left out, and adding a fifth dummy variable to a model with an intercept would make the dummy variables perfectly collinear. Which level should you leave out? It’s usually the largest group, which then serves as the reference category to which all the others are compared. In this example, let’s assume the data are for Mexico City, Mexico. The largest group would be Hispanic, and that would be the level left out. Ultimately, which level is not coded with a dummy variable is up to you, the researcher, and depends on which level you want to compare the others to.
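As a rough sketch of the k-1 rule, the Python below codes a five-level variable as four dummy columns and leaves out the most frequent level as the reference. The function name `dummy_code` and the sample data are hypothetical, chosen only to mirror the Mexico City example; statistical software typically provides its own routine for this.

```python
from collections import Counter


def dummy_code(values, reference=None):
    """Code a categorical variable with k levels as k-1 dummy columns.

    If no reference level is given, the most frequent level is left out.
    Returns the reference level and a dict of {level: list of 0/1 values}.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = Counter(values).most_common(1)[0][0]
    columns = {
        level: [1 if value == level else 0 for value in values]
        for level in levels
        if level != reference
    }
    return reference, columns


# Hypothetical sample in which Hispanic is the largest group.
races = ["Hispanic", "Hispanic", "Caucasian", "Asian",
         "Hispanic", "African American", "Other"]

reference, dummies = dummy_code(races)
print("Reference level:", reference)
for level, column in dummies.items():
    print(level, column)
```

Respondents in the reference level show up as 0 in every column. Passing `reference="Other"` (or any other level) would change the comparison group without changing the number of dummy variables.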

