Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics and data analysis.

Indicator Variable

An indicator variable is a binary variable (values 0 and 1, or -1 and +1) which tells us whether an object exhibits a particular property or not. Indicator variables can be either collected directly (e.g. the gender of a person) or calculated from other variables. When indicator variables are calculated, we have to distinguish two scenarios:

Dichotomization of a continuous variable: In some experiments a continuous variable is measured first, but the subsequent analysis only needs to know whether the measured value has exceeded a threshold or fallen below a limit. This means that we are interested in only two states (limit exceeded or not). The resulting variable is a dichotomous, or binary, one. The threshold for the dichotomization is determined by the particular problem, so different questions may call for different thresholds; it is therefore possible to extract more than one indicator variable from a single measured continuous variable.

Example: The speed of a vehicle is measured in km/h (a continuous value), but it may only be of interest whether the vehicle exceeded the speed limit or not. The continuous speed is therefore transformed into an indicator variable which takes a value of 0 if the speed does not exceed the speed limit, and a value of 1 otherwise.
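
The speed example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name dichotomize, the speed values and the limit of 50 km/h are all made up, and "exceeded" is taken to mean strictly greater than the threshold.

```python
def dichotomize(values, threshold):
    """Return 1 for every value that exceeds the threshold, 0 otherwise."""
    return [1 if v > threshold else 0 for v in values]


# Hypothetical measurements in km/h (not taken from any real data set).
speeds_kmh = [42.0, 48.0, 57.5, 49.9, 63.2]
speed_limit_kmh = 50.0  # assumed speed limit

# Indicator variable: did the vehicle exceed the speed limit?
speeding = dichotomize(speeds_kmh, speed_limit_kmh)
print(speeding)  # [0, 0, 1, 0, 1]

# A second indicator from the same continuous variable,
# using a different (also assumed) threshold of limit + 10 km/h.
far_over_limit = dichotomize(speeds_kmh, speed_limit_kmh + 10.0)
print(far_over_limit)  # [0, 0, 0, 0, 1]
```

The second call shows how one measured variable can yield more than one indicator variable, simply by choosing another threshold.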

Breaking down nominal and ordinal variables: If a variable describes several states or qualitative properties, it cannot be used directly for establishing statistical models, because most models require the explanatory variables to be at the interval or ratio level of measurement. A remedy in such cases is to split the variable into an appropriate number of indicator variables which contain either 0 or 1, depending on the occurrence of a particular state. In general, a qualitative variable with n levels can be represented by n-1 binary indicator variables, since the remaining level is implied when all n-1 indicators are zero (however, in most practical circumstances n indicator variables are used).

Example: Let's have a look at a variable which describes different species of a plant. Assume that the plants to be investigated are divided into three sub-species (as, for example, in the well-known data set of R.A. Fisher, which contains the sepal and petal measurements of three different kinds of irises: I. setosa, I. virginica and I. versicolor). The qualitative variable "sub-species" contains the name of the corresponding sub-species. In order to create indicator variables, we add three additional variables which contain a non-zero value (most often 1) if the particular iris plant belongs to the corresponding sub-species; otherwise a zero value is entered.
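
The iris example can be sketched as follows. The helper indicator_columns and its drop_first parameter are hypothetical names chosen for this illustration, and the five labels are a made-up sample rather than rows of Fisher's data set.

```python
def indicator_columns(labels, drop_first=False):
    """Build one 0/1 indicator column per distinct label.

    With drop_first=True the first level (in sorted order) is omitted,
    which gives the n-1 representation of a variable with n levels.
    """
    levels = sorted(set(labels))
    if drop_first:
        levels = levels[1:]
    return {
        level: [1 if label == level else 0 for label in labels]
        for level in levels
    }


# Hypothetical sample of the qualitative variable "sub-species".
sub_species = ["setosa", "virginica", "versicolor", "setosa", "versicolor"]

# n = 3 levels encoded with n = 3 indicator variables.
full = indicator_columns(sub_species)
for name, column in full.items():
    print(name, column)
# setosa [1, 0, 0, 1, 0]
# versicolor [0, 0, 1, 0, 1]
# virginica [0, 1, 0, 0, 0]

# n - 1 = 2 indicator variables; "setosa" is implied when both are 0.
reduced = indicator_columns(sub_species, drop_first=True)
print(list(reduced))  # ['versicolor', 'virginica']
```

In the full encoding exactly one indicator is 1 for each plant, so any one column can be reconstructed from the other two. This is why n-1 indicators are sufficient.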