Multiple Correspondence Analysis (MCA) in Excel

Multiple Correspondence Analysis (MCA) is an exploratory multivariate technique designed to analyze and visualize relationships among several categorical variables. It can be seen as the extension of Correspondence Analysis (CA), which is limited to two categorical variables, to the case of more than two variables.

MCA is particularly useful:

  • to detect patterns of association among categories.
  • to reduce dimensionality of categorical datasets.
  • to create visual maps (plots) that display both observations (individuals) and variable categories in the same space.

Multiple correspondence analysis can also be seen as a simple correspondence analysis carried out on an indicator (or design) matrix with cases as rows and categories of variables as columns, which makes it possible to include more than two categorical variables at once. Correspondence Analysis and Multiple Correspondence Analysis are available in BESH stat starting from version 0.23.

Example

We will use the same data as in the simple correspondence analysis post. To analyse the data with BESH stat we need to reformat (un-tabulate) the contingency table into two columns. You can download the transformed data in CSV format here.
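The un-tabulation step can be sketched in a few lines of Python. This is only an illustration, not what BESH stat does internally. The counts and the labels "Senior Employee" and "Junior Employee" are assumptions: they are taken from the classic employee smoking dataset that this example appears to use (they total 193), so replace them with your own contingency table. The function name untabulate is hypothetical.

```python
# Assumed counts: rows are employee groups, columns are smoking levels.
# Replace with the values from your own contingency table.
smoking_levels = ["None", "Light", "Medium", "Heavy"]
contingency = {
    "Senior Manager": [4, 2, 3, 2],
    "Junior Manager": [4, 3, 7, 4],
    "Senior Employee": [25, 10, 12, 4],
    "Junior Employee": [18, 24, 33, 13],
    "Secretary": [10, 6, 7, 2],
}


def untabulate(table, column_labels):
    """Expand a contingency table into one (row label, column label) pair per case."""
    cases = []
    for row_label, counts in table.items():
        for col_label, count in zip(column_labels, counts):
            # A cell with count n contributes n identical cases.
            cases.extend([(row_label, col_label)] * count)
    return cases


cases = untabulate(contingency, smoking_levels)
print(len(cases))  # 193
print(cases[0])    # ('Senior Manager', 'None')
```

Each tuple corresponds to one row of the two-column file: the first element goes into the Employee column and the second into the Smoking column.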

The indicator matrix of the analysed data would look like this, where each of the 193 cases from the original contingency table is represented by one row. For each case, a 1 is entered in the column of each category to which the case belongs (one per variable), and a 0 in all other columns.

              Employee                                                                          Smoking
Case Number   Senior Manager   Junior Manager   Senior Employee   Junior Employee   Secretary   None   Light   Medium   Heavy
1             1                0                0                 0                 0           1      0       0        0
2             1                0                0                 0                 0           1      0       0        0
3             1                0                0                 0                 0           1      0       0        0
...
192           0                0                0                 0                 1           0      0       0        1
193           0                0                0                 0                 1           0      0       0        1
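The 0/1 coding described above can be sketched as follows. This is a minimal illustration on a hypothetical five-case dataset, not the full 193 cases, and the helper name indicator_matrix is an assumption made for this example.

```python
def indicator_matrix(cases):
    """Return (column labels, 0/1 rows) for a list of per-case category tuples."""
    n_vars = len(cases[0])
    # Collect the categories of each variable in order of first appearance.
    categories = [[] for _ in range(n_vars)]
    for case in cases:
        for v, value in enumerate(case):
            if value not in categories[v]:
                categories[v].append(value)
    # One column per (variable index, category) pair.
    labels = [(v, c) for v in range(n_vars) for c in categories[v]]
    rows = [[1 if case[v] == c else 0 for v, c in labels] for case in cases]
    return labels, rows


# Hypothetical mini dataset: (employee group, smoking level)
cases = [
    ("Senior Manager", "None"),
    ("Senior Manager", "Heavy"),
    ("Secretary", "None"),
    ("Secretary", "Heavy"),
    ("Secretary", "Light"),
]
labels, Z = indicator_matrix(cases)
print([c for _, c in labels])  # ['Senior Manager', 'Secretary', 'None', 'Heavy', 'Light']
print(Z[0])                    # [1, 0, 1, 0, 0]
```

Every row sums to the number of variables (here 2), because each case falls into exactly one category per variable.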

The approach to analyzing categorical data outlined above can easily be extended to more than two categorical variables. To analyse the data in BESH stat, open the CSV file in Excel and select Add-ins → BESH Stat → Multivariate → Multiple Correspondence Analysis. Then add the Smoking and Employee columns to the list of selected variables and check the "1st row contains variable names" option.

Multiple Correspondence Analysis – data input dialog.

Results

The results contain the Burt table, a symmetric matrix of all two-way cross-tabulations between the categorical variables; it is analogous to the covariance matrix of continuous variables.
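To make the structure of the Burt table concrete: it can be obtained as the product of the transposed indicator matrix with the indicator matrix itself. The sketch below uses a small hypothetical indicator matrix with two variables, A (2 categories) and B (3 categories). The function name burt_table is an assumption for this example and says nothing about how BESH stat computes it.

```python
def burt_table(Z):
    """Burt table B = Z^T Z for an indicator matrix Z given as a list of 0/1 rows."""
    n_cols = len(Z[0])
    return [
        [sum(row[i] * row[j] for row in Z) for j in range(n_cols)]
        for i in range(n_cols)
    ]


# Hypothetical indicator matrix: columns are A1, A2 (variable A) and B1, B2, B3 (variable B).
Z = [
    [1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1],
]
B = burt_table(Z)
for row in B:
    print(row)
# [2, 0, 1, 1, 0]
# [0, 3, 1, 1, 1]
# [1, 1, 2, 0, 0]
# [1, 1, 0, 2, 0]
# [0, 1, 0, 0, 1]
```

The diagonal blocks hold the category counts of each variable, and the off-diagonal blocks are the ordinary two-way contingency tables (here A by B and its transpose).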

Correspondence plot

The coordinate values and other statistics reported by a multiple correspondence analysis can be interpreted in the same manner as described for the simple correspondence analysis. Note that the correspondence plots from simple and multiple correspondence analysis use different metrics, but the relative positions of the points are very similar. This still allows you to relate the different categories to each other, based on the distances between the row points (that is, between the individual cases in the indicator matrix). For result interpretation, refer to the simple correspondence analysis post.
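As background on why the scales differ, here is a standard result for the special case of exactly two variables. It is stated as general theory: the post does not say whether BESH stat reports inertias from the indicator matrix or from the Burt table. The symbols are introduced here for illustration: \lambda_k is the k-th principal inertia of the simple correspondence analysis, \lambda_k^{Z} that of the indicator matrix, and \lambda_k^{B} that of the Burt table.

```latex
\lambda_k^{Z} = \frac{1 \pm \sqrt{\lambda_k}}{2},
\qquad
\lambda_k^{B} = \left(\lambda_k^{Z}\right)^{2}
```

The principal inertias therefore differ between the two analyses, which rescales the axes, while the relationships among categories along each dimension carry over.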

 
