Multiple Correspondence Analysis (MCA) in Educational Data

Published May 19, 2023

Introduction

Multiple Correspondence Analysis (MCA) is a multivariate statistical technique for analyzing the relationships among categorical variables. It generalizes correspondence analysis (CA), which analyzes the relationship between two categorical variables, so that the associations among many categorical variables can be explored simultaneously.

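MCA works by creating a map of the categories of the variables. Each respondent's answers are first coded in a complete disjunctive (indicator) table, with one 0/1 column per category, and correspondence analysis is applied to that table. Categories are compared through their profiles across respondents using the chi-square distance, and the map positions them so that these distances are reproduced as closely as possible in a few dimensions. Categories that lie close together tend to be chosen by the same respondents; categories that lie far apart are rarely chosen together.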
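To make the distance idea concrete, here is a small sketch (not part of the original analysis) of the chi-square distance between two category profiles. Because every row of the indicator table has mass 1/n, the distance between categories a and b depends only on how many respondents chose each (n_a, n_b) and how many chose both (n_ab): d² = n (1/n_a + 1/n_b - 2 n_ab / (n_a n_b)). The function names and the toy responses are hypothetical.

```python
import math


def category_sets(records):
    """Map each (variable index, category) to the set of respondents who chose it.

    This is a compact form of the indicator table: column (v, cat) has a 1 in
    row i exactly when respondent i is in the set.
    """
    sets = {}
    for i, row in enumerate(records):
        for v, cat in enumerate(row):
            sets.setdefault((v, cat), set()).add(i)
    return sets


def chi2_distance(sets, n, a, b):
    """Chi-square distance between the profiles of categories a and b.

    d^2 = n * (1/n_a + 1/n_b - 2 * n_ab / (n_a * n_b)), where n_ab is the
    number of respondents who chose both categories.
    """
    n_a, n_b = len(sets[a]), len(sets[b])
    n_ab = len(sets[a] & sets[b])
    d2 = n * (1 / n_a + 1 / n_b - 2 * n_ab / (n_a * n_b))
    return math.sqrt(max(d2, 0.0))  # guard against tiny negative rounding


# Hypothetical toy responses: (school type, years of experience)
records = [
    ("Rural", "0-5 years"),
    ("Rural", "0-5 years"),
    ("Urban/Suburban", "6-10 years"),
    ("Urban/Suburban", "6-10 years"),
    ("Urban/Suburban", "0-5 years"),
]
sets = category_sets(records)
n = len(records)
print(chi2_distance(sets, n, (0, "Rural"), (1, "0-5 years")))   # shared respondents: close
print(chi2_distance(sets, n, (0, "Rural"), (1, "6-10 years")))  # no shared respondents: far
```

Rural and 0-5 years share two respondents and come out much closer (about 0.91) than Rural and 6-10 years, which share none (about 2.24). Two categories chosen by exactly the same respondents would be at distance 0.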

MCA can be used to explore a variety of research questions. For example, MCA can be used to:

  • Explore the relationships between different demographic variables, such as age, gender, and education level.
  • Explore the relationships between different product features, such as price, color, and size.
  • Explore the relationships between different customer segments, such as brand loyalists, price-sensitive shoppers, and impulse buyers.

In short, MCA is a versatile exploratory tool for summarizing how the categories of many categorical variables relate to one another.

MCA for Analyzing Educational Data

MCA is particularly well suited to educational data, much of which is collected through surveys and questionnaires with categorical responses. It can be used to explore topics such as student achievement, teacher effectiveness, and school climate.

One key advantage of MCA is that it supports comparisons between groups: the categories of a grouping variable are placed on the same map as the other categories, so their positions can be compared directly. This is useful for examining differences in student achievement, teacher effectiveness, or school climate across groups defined by gender, race, or socioeconomic background. Using MCA, educational researchers and policymakers can gain insight into these variations and use it to inform decision-making.

MCA also helps identify underlying dimensions that may be associated with educational outcomes. These dimensions are not measured directly, but with careful interpretation they can sometimes be read as latent constructs that matter for educational success. For instance, MCA can reveal associations among variables such as student motivation, parental involvement, and academic achievement. By recognizing these underlying dimensions, educators can design targeted interventions to enhance student engagement and academic performance.

Here are some specific examples of how MCA can be used to analyze educational data:

  • MCA can be used to explore the relationship between student achievement and student background characteristics, such as gender, race, or socioeconomic status.
  • MCA can be used to identify factors that contribute to teacher effectiveness.
  • MCA can be used to compare school climates between different schools.
  • MCA can be used to assess the impact of educational interventions.

MCA is a valuable tool for educational researchers and policymakers who are interested in understanding how to improve student achievement. By using MCA, they can identify patterns and relationships in data, and make comparisons between groups. This information can be used to develop and implement effective educational policies and interventions.

A Worked Example

a. Raw Data, Variable Information and Summary

  • Question 4 [q4 - gender]: Response Codes Female, Male
  • Question 6 [q6 - experience]: Response Codes 0-5 years = 0, 6-10 years = 1, 11-years and more = 2
  • Question 7 [q7 - sch_type]: Response Codes Rural = 0, Urban/Suburban = 1
  • Question 8 [q8 - tchr_type]: Response Codes pre-service = 0, inservice = 1
  • Question 15 [q15 - rf_time]: Response Codes 6:00 a.m.-11:59 a.m., Noon-6:00 p.m., 6:00 p.m.-11:59 p.m., Midnight-5:59 a.m.
  • Question 16 [q16 - rf_length]: Response Codes 0 minutes = 0, 15 minutes = 1, 30 minutes = 2, 45 minutes = 3, 1 hour = 4, 1.5 hours = 5, 2 hours = 6, 3 hours or more = 7
  • Question 17 [q17_1 - rf_tv, q17_2 - rf_music, q17_3 - rf_pd, q17_4 - rf_write, q17_5 - rf_talk_phone, q17_6 - rf_onl_game, q17_7 - rf_soc_network]: Response Codes Never = 0, A little of the time = 1, Some of the time = 2, Most of the time = 3
  • Question 19 [q19 - rf_disp]: Response Codes No, not at all = 0, Yes, some = 1, Yes, a lot = 2
  • Question 20 [q20 - ra_text]: Response Codes Textbook Chapters-Online, Journal articles-Online, Reports-Online, Novels-Online, Textbook Chapters-In print, Reports-In print, Novels-In print, Other materials-Please specify
  • Question 21 [q21 - ra_time]: Response Codes 6:00 a.m.-11:59 a.m., Noon-6:00 p.m., 6:00 p.m.-11:59 p.m., Midnight-5:59 a.m.
  • Question 22 [q22 - ra_length]: Response Codes 0 minutes = 0, 15 minutes = 1, 30 minutes = 2, 45 minutes = 3, 1 hour = 4, 1.5 hours = 5, 2 hours = 6, 3 hours or more = 7
  • Question 23 [q23_1 - ra_tv, q23_2 - ra_music, q23_3 - ra_write, q23_4 - ra_talk_phone, q23_5 - ra_video_game, q23_6 - ra_soc_network, q23_7 - ra_other]: Response Codes Never = 0, A little of the time = 1, Some of the time = 2, Most of the time = 3
  • Question 25 [q25 - ra_disp]: Response Codes No, not at all = 0, Yes, some = 1, Yes, a lot = 2

Note: Please do not draw any conclusions from the plot. I created it just for fun; it does not show anything meaningful.

duration_in_seconds  ip_address           gender   
 Min.   :    74.0    Length:700         Female:182  
 1st Qu.:   418.8    Class :character   Male  :429  
 Median :   812.5    Mode  :character   NA's  : 89  
 Mean   :  2509.7                                   
 3rd Qu.:  1422.5                                   
 Max.   :210923.0                                   
                                                    
             experience            sch_type         tchr_type  
 0-5 years        :313   Rural         :281   pre-service:362  
 6-10 years       :229   Urban/Suburban:412   inservice  :317  
 11-years and more:158   NA's          :  7   NA's       : 21  
                                                               
                                                               
                                                               
                                                               
     q14                            rf_time          rf_length  
 Length:700         6:00 a.m.-11:59 a.m.:165   30 minutes :184  
 Class :character   6:00 p.m.-11:59 p.m.:226   45 minutes :156  
 Mode  :character   Midnight-5:59 a.m.  : 35   60 minutes :155  
                    Noon-6:00 p.m.      :274   15 minutes : 75  
                                               90 minutes : 53  
                                               120 minutes: 51  
                                               (Other)    : 26  
  rf_tv     rf_music    rf_pd     rf_write   rf_talk_phone rf_onl_game
 0   : 64   0   : 36   0   : 30   0   : 67   0   : 49      0   :118   
 1   :124   1   :236   1   :182   1   :235   1   :239      1   :209   
 2   :370   2   :314   2   :360   2   :298   2   :319      2   :296   
 3   :132   3   :103   3   :119   3   : 89   3   : 87      3   : 60   
 NA's: 10   NA's: 11   NA's:  9   NA's: 11   NA's:  6      NA's: 17   
                                                                      
                                                                      
 rf_soc_network                 q19          q20           
 0   : 44       Did not multi-task: 11   Length:700        
 1   :224       No, not at all    :197   Class :character  
 2   :311       Not sure          : 48   Mode  :character  
 3   :113       Yes, a lot        :118                     
 NA's:  8       Yes, some         :317                     
                NA's              :  9                     
                                                           
     q21                  ra_length    ra_tv     ra_music   ra_write  
 Length:700         30 minutes :169   0   : 77   0   : 53   0   : 63  
 Class :character   60 minutes :162   1   :119   1   :230   1   :226  
 Mode  :character   45 minutes :157   2   :342   2   :286   2   :300  
                    15 minutes : 88   3   :154   3   :120   3   :103  
                    90 minutes : 66   NA's:  8   NA's: 11   NA's:  8  
                    120 minutes: 33                                   
                    (Other)    : 25                                   
 ra_talk_phone ra_video_game ra_soc_network ra_other  
 0   : 68      0   :125      0   : 44       0   : 64  
 1   :224      1   :208      1   :255       1   :149  
 2   :300      2   :273      2   :286       2   :188  
 3   : 95      3   : 81      3   :103       3   : 70  
 NA's: 13      NA's: 13      NA's: 12       NA's:229  
                                                      
                                                      
                 q25                     rf_text              rf_disp   
 Did not multi-task:  9   News stories online:232   No, not at all:197  
 No, not at all    :185   Books online       :216   Yes, some     :317  
 Not sure          : 52   Magazines online   :143   Yes, a lot    :118  
 Yes, a lot        :136   Newspapers-In print: 15   NA's          : 68  
 Yes, some         :318   Magazines-In print :  6                       
                          (Other)            :  0                       
                          NA's               : 88                       
                           ra_text                    ra_time   
 Textbook Chapters-Online      :201   6:00 a.m.-11:59 a.m.:201  
 Journal articles-Online       :182   Noon-6:00 p.m.      :286  
 Reports-Online                :143   6:00 p.m.-11:59 p.m.:194  
 Textbook Chapters-In print    : 49   Midnight-5:59 a.m.  : 19  
 Other materials-Please specify: 11                             
 (Other)                       :  9                             
 NA's                          :105                             
           ra_disp   
 No, not at all:185  
 Yes, some     :318  
 Yes, a lot    :136  
 NA's          : 61  

A. Reading for Academic Purposes

i. Perform the MCA
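The original analysis was run in R (the code is on RPubs, linked at the end of this post), and the printed output in the next step appears to come from the MCA() function of the FactoMineR package. To show what this step computes, here is a minimal, standard-library Python sketch, not the author's code: it builds the indicator table, forms the matrix of standardized residuals S, and extracts the leading eigenvalues of S^T S by power iteration with deflation, a simple stand-in for the singular value decomposition that MCA software typically uses. The function names and the toy responses are hypothetical.

```python
import math
import random


def indicator_table(records):
    """Complete disjunctive table: one 0/1 column per (variable index, category)."""
    columns = [(v, cat) for v in range(len(records[0]))
               for cat in sorted({row[v] for row in records})]
    z = [[1.0 if row[v] == cat else 0.0 for v, cat in columns] for row in records]
    return z, columns


def mca(records, n_dims=2, iters=2000, seed=0):
    """Minimal MCA: correspondence analysis of the indicator table.

    Returns (eigenvalues, category coordinates, column labels, total inertia).
    """
    z, columns = indicator_table(records)
    n, q, j = len(z), len(records[0]), len(columns)
    grand_total = n * q                      # every row of Z sums to Q
    row_mass = 1.0 / n
    col_mass = [sum(row[k] for row in z) / grand_total for k in range(j)]
    # Standardized residuals S = D_r^(-1/2) (P - r c^T) D_c^(-1/2), with P = Z / (nQ)
    s = [[(row[k] / grand_total - row_mass * col_mass[k]) / math.sqrt(row_mass * col_mass[k])
          for k in range(j)] for row in z]
    # The eigenvalues of the J x J matrix S^T S are the MCA eigenvalues
    a = [[sum(row[p] * row[t] for row in s) for t in range(j)] for p in range(j)]
    total_inertia = sum(a[p][p] for p in range(j))   # equals J/Q - 1
    rng = random.Random(seed)
    eigenvalues, coords = [], [[] for _ in range(j)]
    for _ in range(n_dims):
        x = [rng.uniform(-1.0, 1.0) for _ in range(j)]
        for _ in range(iters):                       # power iteration
            y = [sum(a[p][t] * x[t] for t in range(j)) for p in range(j)]
            norm = math.sqrt(sum(v * v for v in y))
            if norm < 1e-12:                         # no inertia left to extract
                return eigenvalues, coords, columns, total_inertia
            x = [v / norm for v in y]
        ax = [sum(a[p][t] * x[t] for t in range(j)) for p in range(j)]
        lam = sum(xp * axp for xp, axp in zip(x, ax))   # Rayleigh quotient
        eigenvalues.append(lam)
        for p in range(j):
            # Principal coordinate of category p: u_p * sqrt(lambda) / sqrt(mass_p)
            coords[p].append(x[p] * math.sqrt(max(lam, 0.0)) / math.sqrt(col_mass[p]))
        for p in range(j):
            for t in range(j):
                a[p][t] -= lam * x[p] * x[t]         # deflate before the next dimension
    return eigenvalues, coords, columns, total_inertia


# Hypothetical toy responses: (gender, experience, school type)
responses = [
    ("Female", "0-5 years", "Rural"),
    ("Male", "6-10 years", "Urban/Suburban"),
    ("Male", "0-5 years", "Urban/Suburban"),
    ("Female", "11-years and more", "Rural"),
    ("Male", "6-10 years", "Rural"),
    ("Female", "0-5 years", "Urban/Suburban"),
]
eig, coords, cols, total = mca(responses, n_dims=2)
print(f"total inertia = {total:.4f}")            # J/Q - 1 = 7/3 - 1
for dim, lam in enumerate(eig, start=1):
    print(f"dim {dim}: eigenvalue {lam:.3f} ({100 * lam / total:.1f}% of inertia)")
```

Power iteration is used here only to keep the sketch dependency-free; it converges slowly when two eigenvalues are nearly equal, which is one reason dedicated software relies on a full SVD instead.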

ii. Print the results

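Create a dimension plot to visualize the positions of categories and variables in the MCA solution space. This plot helps in understanding the relationships between variables and in identifying patterns or clusters. Categories that lie close together on the plot tend to be associated, but proximity on the first two dimensions should be read alongside the cos2 values, since categories that are poorly represented on those dimensions can appear close by chance.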

**Results of the Multiple Correspondence Analysis (MCA)**
The analysis was performed on 606 individuals, described by 10 variables
*The results are available in the following objects:

   name              description                       
1  "$eig"            "eigenvalues"                     
2  "$var"            "results for the variables"       
3  "$var$coord"      "coord. of the categories"        
4  "$var$cos2"       "cos2 for the categories"         
5  "$var$contrib"    "contributions of the categories" 
6  "$var$v.test"     "v-test for the categories"       
7  "$ind"            "results for the individuals"     
8  "$ind$coord"      "coord. for the individuals"      
9  "$ind$cos2"       "cos2 for the individuals"        
10 "$ind$contrib"    "contributions of the individuals"
11 "$call"           "intermediate results"            
12 "$call$marge.col" "weights of columns"              
13 "$call$marge.li"  "weights of rows"  
           

iii. Extract the weights of the columns

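These column weights (the masses stored in $call$marge.col) belong to categories, not variables. Each weight is the number of respondents choosing the category divided by n × Q, where n is the number of individuals (606) and Q the number of active variables (10). The weights of the categories of any one variable therefore add up to 1/Q = 0.1 (for example, Rural 0.039 + Urban/Suburban 0.061 = 0.100), so every variable carries the same total weight. A large weight simply marks a frequent category; rare categories, such as Midnight-5:59 a.m. or 180 minutes or more, have small weights but tend to lie far from the origin and can still contribute strongly to a dimension.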

                                 variable      weight
Urban/Suburban             Urban/Suburban 0.060561056
ra_tv_2                           ra_tv_2 0.050825083
Yes, some                       Yes, some 0.050825083
0-5 years                       0-5 years 0.045379538
ra_talk_phone_2           ra_talk_phone_2 0.044389439
Noon-6:00 p.m.             Noon-6:00 p.m. 0.043564356
ra_soc_network_2         ra_soc_network_2 0.042409241
ra_music_2                     ra_music_2 0.042244224
ra_video_game_2           ra_video_game_2 0.040594059
Rural                               Rural 0.039438944
ra_soc_network_1         ra_soc_network_1 0.036633663
ra_music_1                     ra_music_1 0.033828383
ra_talk_phone_1           ra_talk_phone_1 0.033498350
6-10 years                     6-10 years 0.033168317
ra_video_game_1           ra_video_game_1 0.030033003
No, not at all             No, not at all 0.028547855
6:00 a.m.-11:59 a.m. 6:00 a.m.-11:59 a.m. 0.027722772
6:00 p.m.-11:59 p.m. 6:00 p.m.-11:59 p.m. 0.026897690
30 minutes                     30 minutes 0.026072607
45 minutes                     45 minutes 0.022277228
60 minutes                     60 minutes 0.022112211
ra_tv_3                           ra_tv_3 0.022112211
11-years and more       11-years and more 0.021452145
Yes, a lot                     Yes, a lot 0.020627063
ra_video_game_0           ra_video_game_0 0.017986799
ra_music_3                     ra_music_3 0.016996700
ra_tv_1                           ra_tv_1 0.016501650
ra_soc_network_3         ra_soc_network_3 0.014851485
15 minutes                     15 minutes 0.013531353
ra_talk_phone_3           ra_talk_phone_3 0.012706271
ra_video_game_3           ra_video_game_3 0.011386139
ra_tv_0                           ra_tv_0 0.010561056
ra_talk_phone_0           ra_talk_phone_0 0.009405941
90 minutes                     90 minutes 0.008085809
ra_music_0                     ra_music_0 0.006930693
ra_soc_network_0         ra_soc_network_0 0.006105611
120 minutes                   120 minutes 0.004950495
Midnight-5:59 a.m.     Midnight-5:59 a.m. 0.001815182
0 minutes                       0 minutes 0.001815182
180 minutes or more   180 minutes or more 0.001155116
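The weights above can be reproduced from category counts as count / (n × Q). The sketch below assumes n = 606 and Q = 10 from the printed results; the counts are back-calculated from the printed weights (weight × 606 × 10), not taken from the raw data, and the grouping of the "Yes/No" categories under ra_disp is inferred from the labels. Names are hypothetical.

```python
N_RESPONDENTS = 606
N_VARIABLES = 10

# Category counts back-calculated from the printed weights (weight * 606 * 10);
# grouping into variables inferred from the category labels
counts = {
    "experience": {"0-5 years": 275, "6-10 years": 201, "11-years and more": 130},
    "sch_type": {"Rural": 239, "Urban/Suburban": 367},
    "ra_disp": {"No, not at all": 173, "Yes, some": 308, "Yes, a lot": 125},
}


def column_weights(counts, n, q):
    """MCA column mass of each category: count / (n * Q)."""
    return {var: {cat: k / (n * q) for cat, k in cats.items()}
            for var, cats in counts.items()}


weights = column_weights(counts, N_RESPONDENTS, N_VARIABLES)
for var, cats in weights.items():
    print(f"{var:12s} total weight = {sum(cats.values()):.3f}")   # 0.100 for every variable
    for cat, w in cats.items():
        print(f"  {cat:20s} {w:.9f}")
```

Because each variable's categories always sum to 1/Q, no variable is weighted more heavily than another; only the split of that fixed weight among its categories varies.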

iv. Eigenvalues

       eigenvalue percentage of variance cumulative percentage of variance
dim 1       0.319                 10.625                            10.625
dim 2       0.262                  8.738                            19.363
dim 3       0.160                  5.323                            24.686
dim 4       0.153                  5.084                            29.771
dim 5       0.141                  4.703                            34.474
dim 6       0.126                  4.199                            38.672
dim 7       0.121                  4.023                            42.695
dim 8       0.114                  3.786                            46.481
dim 9       0.111                  3.688                            50.169
dim 10      0.107                  3.552                            53.721
dim 11      0.102                  3.390                            57.112
dim 12      0.100                  3.336                            60.448
dim 13      0.095                  3.169                            63.617
dim 14      0.094                  3.132                            66.749
dim 15      0.088                  2.929                            69.678
dim 16      0.086                  2.858                            72.536
dim 17      0.082                  2.739                            75.274
dim 18      0.079                  2.642                            77.916
dim 19      0.075                  2.511                            80.427
dim 20      0.071                  2.380                            82.807
dim 21      0.068                  2.252                            85.059
dim 22      0.062                  2.067                            87.126
dim 23      0.061                  2.047                            89.174
dim 24      0.057                  1.910                            91.084
dim 25      0.056                  1.868                            92.952
dim 26      0.053                  1.759                            94.711
dim 27      0.045                  1.494                            96.205
dim 28      0.045                  1.487                            97.692
dim 29      0.041                  1.372                            99.064
dim 30      0.028                  0.936                           100.000
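The eigenvalue table can be checked by hand. In MCA the total inertia depends only on the design: with J categories spread over Q active variables it equals J/Q - 1, and there are at most J - Q non-trivial dimensions. Counting the rows of the weights table in step iii gives J = 40 (3 + 2 + 4 + 8 + 5 × 4 + 3) for Q = 10, so the total inertia is 3 and there are 30 dimensions, matching the 30 rows above; each percentage of variance is the eigenvalue divided by 3, times 100. The helper name below is hypothetical, and because the printed eigenvalues are rounded to three decimals the recomputed percentages differ slightly (by less than 0.1 percentage points) from the table.

```python
def inertia_summary(n_categories, n_variables, eigenvalues):
    """Percent and cumulative percent of inertia for each MCA dimension."""
    total_inertia = n_categories / n_variables - 1    # J/Q - 1
    n_dims = n_categories - n_variables               # at most J - Q non-zero eigenvalues
    rows, cumulative = [], 0.0
    for dim, lam in enumerate(eigenvalues, start=1):
        pct = 100 * lam / total_inertia
        cumulative += pct
        rows.append((dim, lam, pct, cumulative))
    return total_inertia, n_dims, rows


# 10 active variables with 40 categories in total; first five printed eigenvalues
total, n_dims, rows = inertia_summary(40, 10, [0.319, 0.262, 0.160, 0.153, 0.141])
print(f"total inertia = {total}, dimensions = {n_dims}, mean eigenvalue = {total / n_dims:.3f}")
for dim, lam, pct, cum in rows:
    print(f"dim {dim}: {lam:.3f}  {pct:6.3f}%  cumulative {cum:6.3f}%")
```

The same arithmetic shows that the average eigenvalue is 3 / 30 = 0.1, which equals 1/Q.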

v. Contributions of categories to dimensions

Assess how much each category contributes to each dimension. The contribution of a category to a dimension is the percentage of that dimension's inertia (its eigenvalue) accounted for by the category, so each column below sums to 100. Categories with high contributions are the ones that define a dimension and should drive its interpretation. The squared cosines, stored in $var$cos2, answer a different question: how well a category is represented by a dimension.

                      Dim 1  Dim 2  Dim 3  Dim 4  Dim 5
0-5 years             0.360  0.144  0.180  1.551  6.684
6-10 years            0.144  0.390  0.003  0.285 12.151
11-years and more     1.807  0.050  0.299  1.316  0.330
Rural                 1.127  0.839  0.005  0.005  6.418
Urban/Suburban        0.734  0.546  0.003  0.004  4.179
6:00 a.m.-11:59 a.m.  0.010  0.838  4.134  0.018  5.508
Noon-6:00 p.m.        0.329  0.574  0.069  0.823  0.602
6:00 p.m.-11:59 p.m.  0.153  0.007  3.678  0.024  0.569
Midnight-5:59 a.m.    2.894  0.199  0.518 10.999  6.079
0 minutes             0.000  0.001  0.130  6.600  2.183
15 minutes            0.559  0.459  3.736  7.375  5.229
30 minutes            0.416  0.289  0.007  0.000  1.494
45 minutes            0.378  1.132  0.010  2.840  4.625
60 minutes            1.266  0.031  1.470  3.321  5.461
90 minutes            0.141  0.299  2.250  0.018  0.022
120 minutes           2.458  0.513  1.480  0.030  0.198
180 minutes or more   1.308  0.337  2.067  8.371  2.166
ra_tv_0              13.996  2.479  0.052  1.839  0.400
ra_tv_1               0.692  0.458  0.003 17.703  0.179
ra_tv_2               0.642  4.643  0.559  4.963  0.003
ra_tv_3               4.367  7.637  1.544  0.462  0.026
ra_music_0           12.763  3.897  0.242  1.691  0.246
ra_music_1            0.576  3.633 11.286  0.615  1.013
ra_music_2            0.716  1.093 14.938  0.040  0.048
ra_music_3            4.071  9.466  2.783  0.347  0.573
ra_talk_phone_0       8.663  1.311  1.024  1.884  5.228
ra_talk_phone_1       0.746  1.202  6.426  3.788  1.793
ra_talk_phone_2       1.083  2.304  6.921  1.481  0.006
ra_talk_phone_3       3.959 13.191  2.796  0.087  0.124
ra_video_game_0      15.227  3.177  0.004  0.726  0.264
ra_video_game_1       0.009  3.045  0.451  7.162  1.943
ra_video_game_2       1.805  2.320  0.006  2.943  1.456
ra_video_game_3       4.911 12.040  0.739  0.001  0.436
ra_soc_network_0      6.100  1.147  0.398  0.239  6.969
ra_soc_network_1      1.195  1.149  4.947  4.377  5.435
ra_soc_network_2      0.694  2.441  7.214  3.315  0.686
ra_soc_network_3      3.582 13.228  2.103  0.011  0.324
No, not at all        0.082  0.895  1.181  1.185  3.559
Yes, some             0.035  0.090  2.021  1.301  0.004
Yes, a lot            0.002  2.505 12.321  0.259  5.387
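Two checks make this table easier to read. Each column is a set of percentages, so it sums to 100 up to rounding, and summing the contributions of a variable's categories gives that variable's contribution to the dimension. A common rule of thumb flags categories whose contribution exceeds the average, 100/J = 2.5% here. The sketch applies both to the Dim 1 column; the grouping of categories into variables is inferred from the labels, and the dictionary name is hypothetical.

```python
# Dim 1 contributions (%) copied from the table above, grouped by the variable
# each category appears to belong to (grouping inferred from the labels)
DIM1_CONTRIB = {
    "experience": {"0-5 years": 0.360, "6-10 years": 0.144, "11-years and more": 1.807},
    "sch_type": {"Rural": 1.127, "Urban/Suburban": 0.734},
    "ra_time": {"6:00 a.m.-11:59 a.m.": 0.010, "Noon-6:00 p.m.": 0.329,
                "6:00 p.m.-11:59 p.m.": 0.153, "Midnight-5:59 a.m.": 2.894},
    "ra_length": {"0 minutes": 0.000, "15 minutes": 0.559, "30 minutes": 0.416,
                  "45 minutes": 0.378, "60 minutes": 1.266, "90 minutes": 0.141,
                  "120 minutes": 2.458, "180 minutes or more": 1.308},
    "ra_tv": {"ra_tv_0": 13.996, "ra_tv_1": 0.692, "ra_tv_2": 0.642, "ra_tv_3": 4.367},
    "ra_music": {"ra_music_0": 12.763, "ra_music_1": 0.576,
                 "ra_music_2": 0.716, "ra_music_3": 4.071},
    "ra_talk_phone": {"ra_talk_phone_0": 8.663, "ra_talk_phone_1": 0.746,
                      "ra_talk_phone_2": 1.083, "ra_talk_phone_3": 3.959},
    "ra_video_game": {"ra_video_game_0": 15.227, "ra_video_game_1": 0.009,
                      "ra_video_game_2": 1.805, "ra_video_game_3": 4.911},
    "ra_soc_network": {"ra_soc_network_0": 6.100, "ra_soc_network_1": 1.195,
                       "ra_soc_network_2": 0.694, "ra_soc_network_3": 3.582},
    "ra_disp": {"No, not at all": 0.082, "Yes, some": 0.035, "Yes, a lot": 0.002},
}

n_categories = sum(len(cats) for cats in DIM1_CONTRIB.values())
average = 100 / n_categories                     # expected share if all were equal
flat = {cat: v for cats in DIM1_CONTRIB.values() for cat, v in cats.items()}
column_total = sum(flat.values())
above_average = sorted((c for c, v in flat.items() if v > average),
                       key=flat.get, reverse=True)
by_variable = {var: sum(cats.values()) for var, cats in DIM1_CONTRIB.items()}

print(f"column total = {column_total:.3f}, average = {average:.2f}%")
print("above average:", ", ".join(above_average))
for var, share in sorted(by_variable.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{var:15s} {share:6.3f}")
```

For Dim 1 this flags 11 categories: Midnight-5:59 a.m. plus the "never" (0) and "most of the time" (3) categories of the five media items. The video game, TV, and music items together account for about 60% of the dimension.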

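The table reports the contribution (in percent) of each category to the inertia of each of the first five dimensions; it is MCA output, not the loadings of a factor analysis. With 40 categories the average contribution is 100/40 = 2.5%, and a common rule of thumb treats categories above that value as the ones that define a dimension. Read this way, Dim 1 is driven by the "never" (0) categories of the multitasking items: ra_video_game_0 (15.2%), ra_tv_0 (14.0%), ra_music_0 (12.8%), ra_talk_phone_0 (8.7%) and ra_soc_network_0 (6.1%). These have large positive coordinates and are opposed to the "most of the time" (3) categories, which contribute 3.6% to 4.9% each and have negative coordinates, so Dim 1 seems to separate respondents who never multitask while reading for academic purposes from frequent multitaskers. Dim 2 is dominated by the "most of the time" categories (ra_soc_network_3 13.2%, ra_talk_phone_3 13.2%, ra_video_game_3 12.0%, ra_music_3 9.5%, ra_tv_3 7.6%); because the "never" categories also have positive coordinates on this dimension while the middle categories are negative, it appears to contrast extreme with moderate multitasking, an arch-shaped pattern often seen when MCA is applied to ordinal items. Dim 3 draws mainly on the middle categories (ra_music_2 14.9%, ra_music_1 11.3%, ra_soc_network_2 7.2%, ra_talk_phone_2 6.9%, ra_talk_phone_1 6.4%), on the response "Yes, a lot" (12.3%), and on reading in the morning versus the evening (4.1% and 3.7%, with opposite signs). Dim 4 is shaped largely by ra_tv_1 (17.7%) and by rare reading-time categories such as Midnight-5:59 a.m. (11.0%), 180 minutes or more (8.4%) and 0 minutes (6.6%), together with 15 minutes (7.4%) and ra_video_game_1 (7.2%). The teacher background variables contribute little to the first four dimensions (every experience and school-type category stays below 2.5%); they matter only on Dim 5, through 6-10 years (12.2%), 0-5 years (6.7%), Rural (6.4%) and Urban/Suburban (4.2%). Since each dimension accounts for only 4.7% to 10.6% of the total inertia, these readings are tentative, and further analysis and contextual information are required for a comprehensive interpretation.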

vi. Coordinates of categories

Extract the coordinates of the categories on each dimension. The sign of a coordinate shows on which side of the dimension a category falls, and its size shows how far it lies from the origin, which represents the average profile. Categories with the same sign and similar values on a dimension tend to be chosen by the same respondents.

                      Dim 1  Dim 2  Dim 3  Dim 4  Dim 5
0-5 years            -0.159  0.091  0.080  0.228 -0.456
6-10 years           -0.118 -0.176 -0.012 -0.115  0.719
11-years and more     0.518  0.078 -0.149 -0.306 -0.147
Rural                -0.302 -0.236 -0.014 -0.015  0.479
Urban/Suburban        0.197  0.154  0.009  0.009 -0.312
6:00 a.m.-11:59 a.m. -0.034  0.282  0.488  0.032 -0.529
Noon-6:00 p.m.       -0.155 -0.186 -0.050 -0.170  0.140
6:00 p.m.-11:59 p.m.  0.134 -0.025 -0.467  0.037  0.173
Midnight-5:59 a.m.    2.254  0.536  0.675  3.040  2.174
0 minutes             0.004 -0.035 -0.338  2.355  1.303
15 minutes           -0.363  0.298  0.664  0.912 -0.738
30 minutes           -0.225  0.170 -0.021 -0.002  0.284
45 minutes           -0.233 -0.365 -0.027 -0.441  0.541
60 minutes            0.427 -0.061 -0.326 -0.479 -0.590
90 minutes           -0.236 -0.312 -0.667  0.058  0.062
120 minutes           1.258  0.521  0.691 -0.096 -0.237
180 minutes or more   1.900  0.874  1.691  3.325  1.627
ra_tv_0               2.055  0.784 -0.089 -0.515 -0.231
ra_tv_1               0.366 -0.270  0.018  1.279  0.124
ra_tv_2              -0.201 -0.489 -0.132 -0.386 -0.010
ra_tv_3              -0.793  0.951  0.334  0.179  0.040
ra_music_0            2.423  1.214  0.236 -0.610  0.224
ra_music_1            0.233 -0.531 -0.730  0.167 -0.206
ra_music_2           -0.232 -0.260  0.751  0.038  0.040
ra_music_3           -0.874  1.208 -0.511 -0.177  0.218
ra_talk_phone_0       1.713  0.604  0.417 -0.553  0.886
ra_talk_phone_1       0.266 -0.307 -0.553  0.415 -0.275
ra_talk_phone_2      -0.279 -0.369  0.499 -0.226 -0.014
ra_talk_phone_3      -0.997  1.650 -0.593  0.102  0.117
ra_video_game_0       1.643  0.680 -0.020 -0.248 -0.144
ra_video_game_1      -0.030 -0.516  0.155  0.603  0.302
ra_video_game_2      -0.376 -0.387 -0.016 -0.333 -0.225
ra_video_game_3      -1.173  1.665 -0.322 -0.013  0.232
ra_soc_network_0      1.785  0.702  0.323 -0.244  1.269
ra_soc_network_1      0.322 -0.287 -0.464  0.427 -0.458
ra_soc_network_2     -0.228 -0.388  0.521 -0.345  0.151
ra_soc_network_3     -0.877  1.528 -0.476  0.033  0.175
No, not at all        0.096 -0.287 -0.257  0.252  0.419
Yes, some            -0.047 -0.068 -0.252 -0.198  0.011
Yes, a lot           -0.017  0.564  0.977  0.139 -0.607
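Contributions and coordinates are linked through the category weights and the eigenvalue: the contribution of category k to dimension s is 100 × m_k × f_ks² / λ_s, where m_k is the weight from step iii and f_ks the coordinate in the table above. The sketch recomputes a few Dim 1 contributions. It takes λ_1 = 10.625% × 3 = 0.31875 from the percentage column (the total inertia of 3 is J/Q - 1 with J = 40 and Q = 10) rather than the rounded 0.319; the function and variable names are hypothetical.

```python
def contribution(mass, coord, eigenvalue):
    """Percent of a dimension's inertia due to one category: 100 * m * f^2 / lambda."""
    return 100 * mass * coord ** 2 / eigenvalue


# lambda_1 recovered from the percentage column: 10.625% of the total inertia J/Q - 1 = 3
LAMBDA_1 = 10.625 / 100 * 3.0

# category: (weight from step iii, Dim 1 coordinate from step vi, printed Dim 1 contribution)
examples = {
    "ra_tv_0": (0.010561056, 2.055, 13.996),
    "ra_video_game_0": (0.017986799, 1.643, 15.227),
    "Midnight-5:59 a.m.": (0.001815182, 2.254, 2.894),
    "11-years and more": (0.021452145, 0.518, 1.807),
}
for cat, (mass, coord, printed) in examples.items():
    computed = contribution(mass, coord, LAMBDA_1)
    print(f"{cat:20s} computed {computed:7.3f}   printed {printed:7.3f}")
```

The recomputed values match the printed ones to within about 0.01; the remaining gap comes from rounding the coordinates to three decimals. The formula also explains why rare categories can weigh heavily on a dimension: a small weight is offset by a large squared coordinate, as with Midnight-5:59 a.m. (weight 0.0018, coordinate 2.254).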

vii. Plotting the results as a biplot

A biplot combines the dimension plot with the variable plot, so that the relationships between categories and the relationships between variables can be seen in the same figure. It helps interpret the associations among categories, variables, and dimensions at once, which can guide further analysis or decision-making.

The R code for this analysis is available at: https://rpubs.com/nirmal/1043602
