Contingency Coefficient

Fundamentals of Statistics contains material from various lectures and courses by H. Lohninger on statistics, data analysis and chemometrics.

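If two nominal variables are uncorrelated (statistically independent), the frequency h_ik expected in the cell of the contingency table for a particular combination of features i and k can be calculated from the marginal totals h_i (the sum of row i) and h_k (the sum of column k) and the total number of observations N as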

$$h_{ik} = \frac{h_i \, h_k}{N}$$
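
The formula can be sketched in Python as follows. The function name `expected_frequencies` and the small 2 × 3 table are hypothetical, chosen only for illustration; the table is stored as a list of rows, so the row sums give h_i and the column sums give h_k.

```python
def expected_frequencies(table):
    """Return h_ik = h_i * h_k / N for every cell of a contingency table.

    `table` is a list of rows of observed counts; h_i are the row totals,
    h_k the column totals and N the grand total.
    """
    row_totals = [sum(row) for row in table]        # h_i
    col_totals = [sum(col) for col in zip(*table)]  # h_k
    n = sum(row_totals)                             # N
    return [[h_i * h_k / n for h_k in col_totals] for h_i in row_totals]


# Hypothetical 2 x 3 table of observed counts (illustrative data only).
observed = [[20, 30, 10],
            [10, 10, 20]]
expected = expected_frequencies(observed)
print(expected)  # [[18.0, 24.0, 18.0], [12.0, 16.0, 12.0]]
```

The expected table keeps the marginal totals of the observed one: its rows sum to 60 and 40, its columns to 30, 40 and 30.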

If the two variables are correlated, the actual (observed) frequencies H_ik will deviate from the ideal uncorrelated frequencies h_ik. The difference D_ik between actual and ideal (uncorrelated) frequencies is therefore

$$D_{ik} = H_{ik} - h_{ik} = H_{ik} - \frac{h_i \, h_k}{N}$$
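
For a small hypothetical 2 × 3 table of counts, the differences D_ik can be computed cell by cell. This is a minimal sketch; the helper `differences` is an illustrative name, not part of any library.

```python
def differences(table):
    """Return D_ik = H_ik - h_i * h_k / N for every cell."""
    row_totals = [sum(row) for row in table]        # h_i
    col_totals = [sum(col) for col in zip(*table)]  # h_k
    n = sum(row_totals)                             # N
    return [[h_obs - h_i * h_k / n for h_obs, h_k in zip(row, col_totals)]
            for row, h_i in zip(table, row_totals)]


# Hypothetical observed counts H_ik (illustrative data only).
observed = [[20, 30, 10],
            [10, 10, 20]]
d = differences(observed)
print(d)                              # [[2.0, 6.0, -8.0], [-2.0, -6.0, 8.0]]
print([sum(row) for row in d])        # [0.0, 0.0]
print([sum(col) for col in zip(*d)])  # [0.0, 0.0, 0.0]
```

Every row and every column of D sums to zero, because observed and ideal frequencies share the same marginal totals. Positive and negative deviations therefore cancel, which is why the differences are squared before they are summed.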

For uncorrelated variables the differences D_ik will be close to zero in every cell of the table. Thus the strength of the correlation can be measured by squaring the differences, relating each square to the corresponding ideal frequency, and summing over all cells:

$$\chi^2 = \sum_{i} \sum_{k} \frac{D_{ik}^2}{h_{ik}} = \sum_{i} \sum_{k} \frac{\left(H_{ik} - h_{ik}\right)^2}{h_{ik}}$$
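
A minimal Python sketch of the χ² sum, assuming that every row and column total is positive (otherwise some h_ik would be zero). The function name and both example tables are hypothetical; the second table has proportional rows, so observed and ideal frequencies coincide.

```python
def chi_square(table):
    """Return chi^2 = sum over all cells of (H_ik - h_ik)^2 / h_ik.

    Assumes every row and column total is positive, so no h_ik is zero.
    """
    row_totals = [sum(row) for row in table]        # h_i
    col_totals = [sum(col) for col in zip(*table)]  # h_k
    n = sum(row_totals)                             # N
    chi2 = 0.0
    for row, h_i in zip(table, row_totals):
        for h_obs, h_k in zip(row, col_totals):
            h_exp = h_i * h_k / n                   # ideal frequency h_ik
            chi2 += (h_obs - h_exp) ** 2 / h_exp    # D_ik^2 / h_ik
    return chi2


observed = [[20, 30, 10],
            [10, 10, 20]]
independent = [[10, 20],
               [20, 40]]                            # proportional rows
print(round(chi_square(observed), 4))  # 13.1944
print(chi_square(independent))         # 0.0
```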

The resulting χ² coefficient, however, has the disadvantage that its value depends both on the dimensions of the contingency table and on the sample size. Eliminating the dependence on the sample size yields Pearson's contingency coefficient C:

$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$
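
The sample-size dependence can be checked with a short sketch: multiplying every count of a hypothetical table by two keeps its proportions but doubles N. The helper names `chi_square_and_n` and `pearson_c` are illustrative only.

```python
import math


def chi_square_and_n(table):
    """Return (chi^2, N) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]        # h_i
    col_totals = [sum(col) for col in zip(*table)]  # h_k
    n = sum(row_totals)                             # N
    chi2 = sum((h_obs - h_i * h_k / n) ** 2 / (h_i * h_k / n)
               for row, h_i in zip(table, row_totals)
               for h_obs, h_k in zip(row, col_totals))
    return chi2, n


def pearson_c(table):
    """Pearson's contingency coefficient C = sqrt(chi^2 / (chi^2 + N))."""
    chi2, n = chi_square_and_n(table)
    return math.sqrt(chi2 / (chi2 + n))


observed = [[20, 30, 10],
            [10, 10, 20]]
doubled = [[2 * h for h in row] for row in observed]  # same proportions, 2N

chi2_obs, _ = chi_square_and_n(observed)
chi2_dbl, _ = chi_square_and_n(doubled)
print(round(chi2_dbl / chi2_obs, 6))  # 2.0
print(round(pearson_c(observed), 4), round(pearson_c(doubled), 4))  # 0.3414 0.3414
```

Doubling every count doubles χ² but leaves C unchanged, since numerator and denominator inside the square root both scale with N.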

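As C still depends on the dimensions of the contingency table (its largest attainable value C_max is always less than 1.0), it is divided by C_max so that the corrected coefficient C_corr ranges from 0.0 to 1.0:

$$C_{corr} = \frac{C}{C_{max}} = \sqrt{\frac{m_{min}}{m_{min} - 1}} \, \sqrt{\frac{\chi^2}{\chi^2 + N}}, \qquad C_{max} = \sqrt{\frac{m_{min} - 1}{m_{min}}}$$

with m_min = min(q,p), where p and q are the numbers of rows and columns of the contingency table.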
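
Putting the steps together gives a minimal sketch of C_corr. The function name is hypothetical, and it raises an error for tables with fewer than two rows or columns, where m_min − 1 would be zero. The two example tables are made up: one with a perfect association (each row category occurs with exactly one column category) and one with proportional rows.

```python
import math


def corrected_contingency(table):
    """Corrected contingency coefficient C_corr = C / C_max (range 0.0 to 1.0)."""
    row_totals = [sum(row) for row in table]        # h_i
    col_totals = [sum(col) for col in zip(*table)]  # h_k
    n = sum(row_totals)                             # N
    chi2 = sum((h_obs - h_i * h_k / n) ** 2 / (h_i * h_k / n)
               for row, h_i in zip(table, row_totals)
               for h_obs, h_k in zip(row, col_totals))
    c = math.sqrt(chi2 / (chi2 + n))                # Pearson's C
    m_min = min(len(table), len(table[0]))          # min(p, q)
    if m_min < 2:
        raise ValueError("the table needs at least two rows and two columns")
    c_max = math.sqrt((m_min - 1) / m_min)          # largest attainable C
    return c / c_max


perfect = [[10, 0, 0],
           [0, 10, 0],
           [0, 0, 10]]    # perfect association
independent = [[10, 20],
               [20, 40]]  # proportional rows: no association
print(round(corrected_contingency(perfect), 6))  # 1.0
print(corrected_contingency(independent))        # 0.0
```

For the perfectly associated 3 × 3 table, the uncorrected C equals √(2/3) ≈ 0.8165, which is exactly C_max for m_min = 3; the correction lifts it to 1.0.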

Hint: In contrast to the correlation coefficient, the corrected contingency coefficient C_corr indicates only the strength of the correlation, not its direction (the categories of nominal variables have no natural order, so a direction is not defined).
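
To illustrate the hint, the sketch below compares a hypothetical 2 × 2 table whose counts lie on the main diagonal with the same table after swapping its columns. If the categories were coded as ordered numbers, a correlation coefficient would change sign between the two tables; C_corr is identical for both, because χ² does not depend on the order of rows or columns. The helper name and the counts are illustrative only.

```python
import math


def corrected_contingency(table):
    """C_corr = sqrt(chi^2 / (chi^2 + N)) / sqrt((m_min - 1) / m_min)."""
    row_totals = [sum(row) for row in table]        # h_i
    col_totals = [sum(col) for col in zip(*table)]  # h_k
    n = sum(row_totals)                             # N
    chi2 = sum((h_obs - h_i * h_k / n) ** 2 / (h_i * h_k / n)
               for row, h_i in zip(table, row_totals)
               for h_obs, h_k in zip(row, col_totals))
    m_min = min(len(table), len(table[0]))          # min(p, q), assumed >= 2
    return math.sqrt(chi2 / (chi2 + n)) / math.sqrt((m_min - 1) / m_min)


diagonal = [[30, 10],
            [10, 30]]
anti_diagonal = [row[::-1] for row in diagonal]     # [[10, 30], [30, 10]]

print(round(corrected_contingency(diagonal), 4),
      round(corrected_contingency(anti_diagonal), 4))  # 0.6325 0.6325
```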
