Multiple Correspondence Analysis (MCA) in Educational Data
PUBLISHED
May 19, 2023
Introduction
Multiple Correspondence Analysis (MCA) is a multivariate statistical technique that is used to analyze the relationships between categorical variables. It is a generalization of correspondence analysis (CA), which is used to analyze the relationships between two categorical variables. MCA can be used to explore the associations between multiple categorical variables simultaneously.
MCA works by placing the categories of all variables on a low-dimensional map. The positions are derived from chi-square distances between categories: two categories lie close together when they tend to be chosen by the same respondents, and far apart when they are rarely chosen together. Rare categories tend to lie far from the origin of the map.
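As a rough illustration of what "distance between categories" means, here is a minimal Python sketch of the chi-square distance between two categories, computed from their indicator (0/1) columns. The six respondents and the labels "never" and "often" are hypothetical toy data, not the survey analyzed below, and the helper names are made up for this example.

```python
from math import sqrt

# Hypothetical toy responses (NOT the survey data): one dict per respondent.
respondents = [
    {"sch_type": "Rural", "ra_tv": "never"},
    {"sch_type": "Rural", "ra_tv": "never"},
    {"sch_type": "Urban/Suburban", "ra_tv": "often"},
    {"sch_type": "Urban/Suburban", "ra_tv": "often"},
    {"sch_type": "Urban/Suburban", "ra_tv": "never"},
    {"sch_type": "Rural", "ra_tv": "often"},
]

def indicator(variable, category):
    """Return the 0/1 column marking who chose `category` for `variable`."""
    return [1 if r[variable] == category else 0 for r in respondents]

def category_distance(col_j, col_k):
    """Chi-square distance between two categories.

    d^2(j, k) = n * sum_i (z_ij / n_j - z_ik / n_k)^2,
    where n_j and n_k are the category counts and n is the number of respondents.
    """
    n = len(col_j)
    n_j, n_k = sum(col_j), sum(col_k)
    return sqrt(n * sum((a / n_j - b / n_k) ** 2 for a, b in zip(col_j, col_k)))

rural = indicator("sch_type", "Rural")
never = indicator("ra_tv", "never")
often = indicator("ra_tv", "often")

# In this toy data, Rural respondents mostly answer "never",
# so Rural sits closer to "never" than to "often".
d_rural_never = category_distance(rural, never)
d_rural_often = category_distance(rural, often)
print(round(d_rural_never, 3), round(d_rural_often, 3))
```

In this toy example "Rural" ends up closer to "never" than to "often" because two of the three rural respondents gave that answer.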
MCA can be used to explore a variety of research questions. For example, MCA can be used to:
- Explore the relationships between different demographic variables, such as age, gender, and education level.
- Explore the relationships between different product features, such as price, color, and size.
- Explore the relationships between different customer segments, such as brand loyalists, price-sensitive shoppers, and impulse buyers.
MCA is a powerful tool that can be used to gain insights into the relationships between categorical variables. It is a versatile technique that can be used to explore a variety of research questions.
MCA for Analyzing Educational Data
MCA is particularly well-suited for analyzing educational data, as it can be used to explore a wide range of topics, such as student achievement, teacher effectiveness, and school climate.
One of the key advantages of MCA is its capacity for making comparisons between groups. This feature is beneficial in examining differences in student achievement, teacher effectiveness, or school climate across various groups, such as different genders, races, or socioeconomic backgrounds. By utilizing MCA, educational researchers and policymakers can gain valuable insights into these variations and use them to inform decision-making.
MCA also aids in the identification of underlying dimensions or constructs that contribute to educational outcomes. It uncovers latent variables that may not be directly measured but are critical in understanding educational success. For instance, MCA can reveal associations between variables such as student motivation, parental involvement, and academic achievement. By recognizing these underlying dimensions, educators can design targeted interventions to enhance student engagement and academic performance.
Here are some specific examples of how MCA can be used to analyze educational data:
- MCA can be used to explore the relationship between student achievement and student background characteristics, such as gender, race, or socioeconomic status.
- MCA can be used to identify factors that contribute to teacher effectiveness.
- MCA can be used to compare school climates between different schools.
- MCA can be used to assess the impact of educational interventions.
MCA is a valuable tool for educational researchers and policymakers who are interested in understanding how to improve student achievement. By using MCA, they can identify patterns and relationships in data, and make comparisons between groups. This information can be used to develop and implement effective educational policies and interventions.
A Worked Example
a. Raw Data, Variable Information and Summary
- Question 4 [q4 - gender]: Response Codes Female, Male
- Question 6 [q6 - experience]: Response Codes 0-5 years = 0, 6-10 years = 1, 11-years and more = 2
- Question 7 [q7 - sch_type]: Response Codes Rural = 0, Urban/Suburban = 1
- Question 8 [q8 - tchr_type]: Response Codes pre-service = 0, inservice = 1
- Question 15 [q15 - rf_time]: Response Codes 6:00 a.m.-11:59 a.m., Noon-6:00 p.m., 6:00 p.m.-11:59 p.m., Midnight-5:59 a.m.
- Question 16 [q16 - rf_length]: Response Codes 0 minutes = 0, 15 minutes = 1, 30 minutes = 2, 45 minutes = 3, 1 hour = 4, 1.5 hours = 5, 2 hours = 6, 3 hours or more = 7
- Question 17 [q17_1 - rf_tv, q17_2 - rf_music, q17_3 - rf_pd, q17_4 - rf_write, q17_5 - rf_talk_phone, q17_6 - rf_onl_game, q17_7 - rf_soc_network]: Response Codes Never = 0, A little of the time = 1, Some of the time = 2, Most of the time = 3
- Question 19 [q19 - rf_disp]: Response Codes No, not at all = 0, Yes, some = 1, Yes, a lot = 2
- Question 20 [q20 - ra_text]: Response Codes Textbook Chapters-Online, Journal articles-Online, Reports-Online, Novels-Online, Textbook Chapters-In print, Reports-In print, Novels-In print, Other materials-Please specify
- Question 21 [q21 - ra_time]: Response Codes 6:00 a.m.-11:59 a.m., Noon-6:00 p.m., 6:00 p.m.-11:59 p.m., Midnight-5:59 a.m.
- Question 22 [q22 - ra_length]: Response Codes 0 minutes = 0, 15 minutes = 1, 30 minutes = 2, 45 minutes = 3, 1 hour = 4, 1.5 hours = 5, 2 hours = 6, 3 hours or more = 7
- Question 23 [q23_1 - ra_tv, q23_2 - ra_music, q23_3 - ra_write, q23_4 - ra_talk_phone, q23_5 - ra_video_game, q23_6 - ra_soc_network, q23_7 - ra_other]: Response Codes Never = 0, A little of the time = 1, Some of the time = 2, Most of the time = 3
- Question 25 [q25 - ra_disp]: Response Codes No, not at all = 0, Yes, some = 1, Yes, a lot = 2
Note: Please do not draw any conclusions from the plot. I created it just for fun, and it does not show anything meaningful.
duration_in_seconds   ip_address          gender
Min.   :    74.0      Length:700          Female:182
1st Qu.:   418.8      Class :character    Male  :429
Median :   812.5      Mode  :character    NA's  : 89
Mean   :  2509.7
3rd Qu.:  1422.5
Max.   :210923.0

experience               sch_type             tchr_type
0-5 years        :313    Rural         :281   pre-service:362
6-10 years       :229    Urban/Suburban:412   inservice  :317
11-years and more:158    NA's          :  7   NA's       : 21

q14                 rf_time                     rf_length
Length:700          6:00 a.m.-11:59 a.m.:165    30 minutes :184
Class :character    6:00 p.m.-11:59 p.m.:226    45 minutes :156
Mode  :character    Midnight-5:59 a.m.  : 35    60 minutes :155
                    Noon-6:00 p.m.      :274    15 minutes : 75
                                                90 minutes : 53
                                                120 minutes: 51
                                                (Other)    : 26

rf_tv       rf_music    rf_pd       rf_write    rf_talk_phone   rf_onl_game
0   : 64    0   : 36    0   : 30    0   : 67    0   : 49        0   :118
1   :124    1   :236    1   :182    1   :235    1   :239        1   :209
2   :370    2   :314    2   :360    2   :298    2   :319        2   :296
3   :132    3   :103    3   :119    3   : 89    3   : 87        3   : 60
NA's: 10    NA's: 11    NA's:  9    NA's: 11    NA's:  6        NA's: 17

rf_soc_network   q19                       q20
0   : 44         Did not multi-task: 11    Length:700
1   :224         No, not at all    :197    Class :character
2   :311         Not sure          : 48    Mode  :character
3   :113         Yes, a lot        :118
NA's:  8         Yes, some         :317
                 NA's              :  9

q21                 ra_length          ra_tv       ra_music    ra_write
Length:700          30 minutes :169    0   : 77    0   : 53    0   : 63
Class :character    60 minutes :162    1   :119    1   :230    1   :226
Mode  :character    45 minutes :157    2   :342    2   :286    2   :300
                    15 minutes : 88    3   :154    3   :120    3   :103
                    90 minutes : 66    NA's:  8    NA's: 11    NA's:  8
                    120 minutes: 33
                    (Other)    : 25

ra_talk_phone   ra_video_game   ra_soc_network   ra_other
0   : 68        0   :125        0   : 44         0   : 64
1   :224        1   :208        1   :255         1   :149
2   :300        2   :273        2   :286         2   :188
3   : 95        3   : 81        3   :103         3   : 70
NA's: 13        NA's: 13        NA's: 12         NA's:229

q25                       rf_text                    rf_disp
Did not multi-task:  9    News stories online:232    No, not at all:197
No, not at all    :185    Books online       :216    Yes, some     :317
Not sure          : 52    Magazines online   :143    Yes, a lot    :118
Yes, a lot        :136    Newspapers-In print: 15    NA's          : 68
Yes, some         :318    Magazines-In print :  6
                          (Other)            :  0
                          NA's               : 88

ra_text                               ra_time
Textbook Chapters-Online      :201    6:00 a.m.-11:59 a.m.:201
Journal articles-Online       :182    Noon-6:00 p.m.      :286
Reports-Online                :143    6:00 p.m.-11:59 p.m.:194
Textbook Chapters-In print    : 49    Midnight-5:59 a.m.  : 19
Other materials-Please specify: 11
(Other)                       :  9
NA's                          :105

ra_disp
No, not at all:185
Yes, some     :318
Yes, a lot    :136
NA's          : 61
A. Reading for Academic Purposes
i. Perform the MCA
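The analysis in this post was run in R. Purely to illustrate what "performing the MCA" computes, here is a minimal pure-Python sketch, not the author's code. It one-hot encodes the answers into an indicator matrix, forms the standardized residuals used in correspondence analysis, and estimates the first eigenvalue by power iteration. The function names and the toy data are assumptions made for this example; a real analysis would use a dedicated MCA implementation.

```python
from math import sqrt

def indicator_matrix(rows):
    """One-hot encode rows of category labels; one column per (variable, category)."""
    n_vars = len(rows[0])
    columns = sorted({(v, row[v]) for row in rows for v in range(n_vars)})
    z = [[1.0 if row[v] == cat else 0.0 for v, cat in columns] for row in rows]
    return z, columns

def first_eigenvalue(rows, iterations=500):
    """Estimate the largest MCA eigenvalue (principal inertia) of the indicator matrix."""
    z, columns = indicator_matrix(rows)
    n, q, j = len(rows), len(rows[0]), len(columns)
    grand_total = n * q                     # every row of z sums to q
    row_mass = 1.0 / n                      # identical for all respondents
    col_mass = [sum(z[i][k] for i in range(n)) / grand_total for k in range(j)]
    # Standardized residuals: s_ik = (p_ik - r_i * c_k) / sqrt(r_i * c_k)
    s = [[(z[i][k] / grand_total - row_mass * col_mass[k]) / sqrt(row_mass * col_mass[k])
          for k in range(j)] for i in range(n)]
    # Power iteration on S^T S; its eigenvalues are the MCA eigenvalues.
    vec = [float(k + 1) for k in range(j)]  # arbitrary non-constant starting vector
    for _ in range(iterations):
        sv = [sum(s[i][k] * vec[k] for k in range(j)) for i in range(n)]
        new = [sum(s[i][k] * sv[i] for i in range(n)) for k in range(j)]
        norm = sqrt(sum(x * x for x in new))
        if norm == 0.0:
            return 0.0
        vec = [x / norm for x in new]
    sv = [sum(s[i][k] * vec[k] for k in range(j)) for i in range(n)]
    return sum(x * x for x in sv)           # Rayleigh quotient v^T S^T S v

# Hypothetical toy data: two variables that are perfectly associated.
perfect = [("Rural", "never")] * 3 + [("Urban/Suburban", "often")] * 3
# Hypothetical toy data: two variables that are independent.
independent = [("Rural", "never"), ("Rural", "often"),
               ("Urban/Suburban", "never"), ("Urban/Suburban", "often")]

print(first_eigenvalue(perfect), first_eigenvalue(independent))
```

With two perfectly associated variables the first eigenvalue reaches its maximum of 1, while with two independent variables it drops to 0.5. This is one reason MCA eigenvalues computed from an indicator matrix look small even when the structure is meaningful.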
ii. Print the results
Create a dimension plot to visualize the positions of categories and variables in the MCA solution space. This plot helps understand the relationships between variables and identify patterns or clusters. The categories and variables that are closer together on the plot are more strongly associated.
**Results of the Multiple Correspondence Analysis (MCA)**
The analysis was performed on 606 individuals, described by 10 variables
*The results are available in the following objects:
name description
1 "$eig" "eigenvalues"
2 "$var" "results for the variables"
3 "$var$coord" "coord. of the categories"
4 "$var$cos2" "cos2 for the categories"
5 "$var$contrib" "contributions of the categories"
6 "$var$v.test" "v-test for the categories"
7 "$ind" "results for the individuals"
8 "$ind$coord" "coord. for the individuals"
9 "$ind$cos2" "cos2 for the individuals"
10 "$ind$contrib" "contributions of the individuals"
11 "$call" "intermediate results"
12 "$call$marge.col" "weights of columns"
13 "$call$marge.li" "weights of rows"
iii. Extract the weights of the columns
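These weights are the masses of the categories, that is, of the columns of the indicator matrix. Each weight is the number of respondents in the category divided by n × Q, where n = 606 respondents and Q = 10 variables. Frequent categories therefore have larger weights, and the weights of the categories belonging to one variable sum to 1/Q = 0.1 (for example, Rural and Urban/Suburban: 0.0394 + 0.0606 = 0.1). A weight does not measure the importance of a variable. Rare categories have small weights but often lie far from the origin, so they can still influence a dimension strongly.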
variable weight
Urban/Suburban Urban/Suburban 0.060561056
ra_tv_2 ra_tv_2 0.050825083
Yes, some Yes, some 0.050825083
0-5 years 0-5 years 0.045379538
ra_talk_phone_2 ra_talk_phone_2 0.044389439
Noon-6:00 p.m. Noon-6:00 p.m. 0.043564356
ra_soc_network_2 ra_soc_network_2 0.042409241
ra_music_2 ra_music_2 0.042244224
ra_video_game_2 ra_video_game_2 0.040594059
Rural Rural 0.039438944
ra_soc_network_1 ra_soc_network_1 0.036633663
ra_music_1 ra_music_1 0.033828383
ra_talk_phone_1 ra_talk_phone_1 0.033498350
6-10 years 6-10 years 0.033168317
ra_video_game_1 ra_video_game_1 0.030033003
No, not at all No, not at all 0.028547855
6:00 a.m.-11:59 a.m. 6:00 a.m.-11:59 a.m. 0.027722772
6:00 p.m.-11:59 p.m. 6:00 p.m.-11:59 p.m. 0.026897690
30 minutes 30 minutes 0.026072607
45 minutes 45 minutes 0.022277228
60 minutes 60 minutes 0.022112211
ra_tv_3 ra_tv_3 0.022112211
11-years and more 11-years and more 0.021452145
Yes, a lot Yes, a lot 0.020627063
ra_video_game_0 ra_video_game_0 0.017986799
ra_music_3 ra_music_3 0.016996700
ra_tv_1 ra_tv_1 0.016501650
ra_soc_network_3 ra_soc_network_3 0.014851485
15 minutes 15 minutes 0.013531353
ra_talk_phone_3 ra_talk_phone_3 0.012706271
ra_video_game_3 ra_video_game_3 0.011386139
ra_tv_0 ra_tv_0 0.010561056
ra_talk_phone_0 ra_talk_phone_0 0.009405941
90 minutes 90 minutes 0.008085809
ra_music_0 ra_music_0 0.006930693
ra_soc_network_0 ra_soc_network_0 0.006105611
120 minutes 120 minutes 0.004950495
Midnight-5:59 a.m. Midnight-5:59 a.m. 0.001815182
0 minutes 0 minutes 0.001815182
180 minutes or more 180 minutes or more 0.001155116
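The weights in this table can be checked by hand. The sketch below assumes the usual definition of a category mass in an indicator-matrix MCA, count / (n × Q), and uses it to recover the category counts implied by a few of the reported weights. The function and variable names are illustrative.

```python
# n and Q are taken from the printed results: 606 individuals, 10 variables.
n_individuals = 606
n_variables = 10

def category_mass(count, n, q):
    """Mass (column weight) of a category: its count divided by n * q."""
    return count / (n * q)

# A few weights copied from the table above.
reported_weights = {
    "Urban/Suburban": 0.060561056,
    "Rural": 0.039438944,
    "Midnight-5:59 a.m.": 0.001815182,
    "180 minutes or more": 0.001155116,
}

# Invert the formula to recover the implied number of respondents per category.
implied_counts = {
    category: round(weight * n_individuals * n_variables)
    for category, weight in reported_weights.items()
}
print(implied_counts)

# The two school-type categories cover all 606 respondents,
# so their weights add up to 1 / Q.
school_type_weight = reported_weights["Urban/Suburban"] + reported_weights["Rural"]
print(round(school_type_weight, 6))
```

The implied counts show how rare some categories are: only 11 respondents fall in Midnight-5:59 a.m. and 7 in 180 minutes or more, which matters when reading their coordinates and contributions later.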
iv. Eigenvalues
eigenvalue percentage of variance cumulative percentage of variance
dim 1 0.319 10.625 10.625
dim 2 0.262 8.738 19.363
dim 3 0.160 5.323 24.686
dim 4 0.153 5.084 29.771
dim 5 0.141 4.703 34.474
dim 6 0.126 4.199 38.672
dim 7 0.121 4.023 42.695
dim 8 0.114 3.786 46.481
dim 9 0.111 3.688 50.169
dim 10 0.107 3.552 53.721
dim 11 0.102 3.390 57.112
dim 12 0.100 3.336 60.448
dim 13 0.095 3.169 63.617
dim 14 0.094 3.132 66.749
dim 15 0.088 2.929 69.678
dim 16 0.086 2.858 72.536
dim 17 0.082 2.739 75.274
dim 18 0.079 2.642 77.916
dim 19 0.075 2.511 80.427
dim 20 0.071 2.380 82.807
dim 21 0.068 2.252 85.059
dim 22 0.062 2.067 87.126
dim 23 0.061 2.047 89.174
dim 24 0.057 1.910 91.084
dim 25 0.056 1.868 92.952
dim 26 0.053 1.759 94.711
dim 27 0.045 1.494 96.205
dim 28 0.045 1.487 97.692
dim 29 0.041 1.372 99.064
dim 30 0.028 0.936 100.000
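The percentages in the eigenvalue table follow from a simple identity. In an MCA of an indicator matrix with J categories and Q variables, the total inertia is J/Q − 1, and there are at most J − Q non-trivial dimensions. With the 40 categories listed in the weights table and 10 variables, that gives a total inertia of 3 and 30 dimensions, which matches the table. The sketch below recomputes the percentages for the first five dimensions from the rounded eigenvalues, so small discrepancies in the last digits are expected.

```python
# Eigenvalues of the first five dimensions, copied from the table (3 decimals).
eigenvalues = [0.319, 0.262, 0.160, 0.153, 0.141]

n_categories = 40   # number of rows in the column-weights table
n_variables = 10    # "described by 10 variables"

# Total inertia of an indicator-matrix MCA.
total_inertia = n_categories / n_variables - 1

percentages = [100 * ev / total_inertia for ev in eigenvalues]

# Running total of explained inertia.
cumulative = []
running = 0.0
for pct in percentages:
    running += pct
    cumulative.append(running)

for dim, (pct, cum) in enumerate(zip(percentages, cumulative), start=1):
    print(f"dim {dim}: {pct:.2f}% of inertia, {cum:.2f}% cumulative")
```

Low percentages per dimension are typical for MCA on an indicator matrix, so the first two dimensions explaining about 19% of the inertia is not unusual in itself.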
v. Contributions of categories to dimensions
Assess how much each category contributes to the construction of each dimension. The values are percentages, and the contributions of all categories to a given dimension sum to 100. Categories with large contributions are the ones that define a dimension, so they are the natural starting point for interpreting it.
Dim 1 Dim 2 Dim 3 Dim 4 Dim 5
0-5 years 0.360 0.144 0.180 1.551 6.684
6-10 years 0.144 0.390 0.003 0.285 12.151
11-years and more 1.807 0.050 0.299 1.316 0.330
Rural 1.127 0.839 0.005 0.005 6.418
Urban/Suburban 0.734 0.546 0.003 0.004 4.179
6:00 a.m.-11:59 a.m. 0.010 0.838 4.134 0.018 5.508
Noon-6:00 p.m. 0.329 0.574 0.069 0.823 0.602
6:00 p.m.-11:59 p.m. 0.153 0.007 3.678 0.024 0.569
Midnight-5:59 a.m. 2.894 0.199 0.518 10.999 6.079
0 minutes 0.000 0.001 0.130 6.600 2.183
15 minutes 0.559 0.459 3.736 7.375 5.229
30 minutes 0.416 0.289 0.007 0.000 1.494
45 minutes 0.378 1.132 0.010 2.840 4.625
60 minutes 1.266 0.031 1.470 3.321 5.461
90 minutes 0.141 0.299 2.250 0.018 0.022
120 minutes 2.458 0.513 1.480 0.030 0.198
180 minutes or more 1.308 0.337 2.067 8.371 2.166
ra_tv_0 13.996 2.479 0.052 1.839 0.400
ra_tv_1 0.692 0.458 0.003 17.703 0.179
ra_tv_2 0.642 4.643 0.559 4.963 0.003
ra_tv_3 4.367 7.637 1.544 0.462 0.026
ra_music_0 12.763 3.897 0.242 1.691 0.246
ra_music_1 0.576 3.633 11.286 0.615 1.013
ra_music_2 0.716 1.093 14.938 0.040 0.048
ra_music_3 4.071 9.466 2.783 0.347 0.573
ra_talk_phone_0 8.663 1.311 1.024 1.884 5.228
ra_talk_phone_1 0.746 1.202 6.426 3.788 1.793
ra_talk_phone_2 1.083 2.304 6.921 1.481 0.006
ra_talk_phone_3 3.959 13.191 2.796 0.087 0.124
ra_video_game_0 15.227 3.177 0.004 0.726 0.264
ra_video_game_1 0.009 3.045 0.451 7.162 1.943
ra_video_game_2 1.805 2.320 0.006 2.943 1.456
ra_video_game_3 4.911 12.040 0.739 0.001 0.436
ra_soc_network_0 6.100 1.147 0.398 0.239 6.969
ra_soc_network_1 1.195 1.149 4.947 4.377 5.435
ra_soc_network_2 0.694 2.441 7.214 3.315 0.686
ra_soc_network_3 3.582 13.228 2.103 0.011 0.324
No, not at all 0.082 0.895 1.181 1.185 3.559
Yes, some 0.035 0.090 2.021 1.301 0.004
Yes, a lot 0.002 2.505 12.321 0.259 5.387
The table above shows the contribution (in percent) of each category to the first five MCA dimensions, which together account for about 34.5% of the total inertia. Dim 1 is dominated by the "Never" (0) categories of the multitasking items: ra_video_game_0 (15.2), ra_tv_0 (14.0), ra_music_0 (12.8), ra_talk_phone_0 (8.7), and ra_soc_network_0 (6.1). This suggests that it separates respondents who report never multitasking while reading for academic purposes from everyone else. Dim 2 is driven mainly by the "Most of the time" (3) categories: ra_soc_network_3 (13.2), ra_talk_phone_3 (13.2), ra_video_game_3 (12.0), ra_music_3 (9.5), and ra_tv_3 (7.6), so it appears to capture frequent multitasking. Dim 3 receives its largest contributions from ra_music_2 (14.9), "Yes, a lot" (12.3), and ra_music_1 (11.3). Dim 4 is shaped by ra_tv_1 (17.7), Midnight-5:59 a.m. (11.0), and 180 minutes or more (8.4). Dim 5 is shaped mostly by teaching experience and school type: 6-10 years (12.2), 0-5 years (6.7), and Rural (6.4), along with ra_soc_network_0 (7.0). Some of the strong contributors to Dim 4 are very rare categories (Midnight-5:59 a.m. and 180 minutes or more have weights below 0.002), so that dimension should be read with caution. Further analysis and contextual information are required for a comprehensive interpretation of these dimensions.
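One common way to read a contribution table is to compare each value with the contribution every category would have if all contributed equally, which is 100 / J percent (2.5% here, with J = 40 categories). The sketch below applies this rule of thumb to the Dim 1 column. The threshold is a heuristic and not part of the original analysis.

```python
# Dim 1 contributions (percent), copied from the table above.
dim1_contrib = {
    "0-5 years": 0.360, "6-10 years": 0.144, "11-years and more": 1.807,
    "Rural": 1.127, "Urban/Suburban": 0.734,
    "6:00 a.m.-11:59 a.m.": 0.010, "Noon-6:00 p.m.": 0.329,
    "6:00 p.m.-11:59 p.m.": 0.153, "Midnight-5:59 a.m.": 2.894,
    "0 minutes": 0.000, "15 minutes": 0.559, "30 minutes": 0.416,
    "45 minutes": 0.378, "60 minutes": 1.266, "90 minutes": 0.141,
    "120 minutes": 2.458, "180 minutes or more": 1.308,
    "ra_tv_0": 13.996, "ra_tv_1": 0.692, "ra_tv_2": 0.642, "ra_tv_3": 4.367,
    "ra_music_0": 12.763, "ra_music_1": 0.576, "ra_music_2": 0.716,
    "ra_music_3": 4.071,
    "ra_talk_phone_0": 8.663, "ra_talk_phone_1": 0.746,
    "ra_talk_phone_2": 1.083, "ra_talk_phone_3": 3.959,
    "ra_video_game_0": 15.227, "ra_video_game_1": 0.009,
    "ra_video_game_2": 1.805, "ra_video_game_3": 4.911,
    "ra_soc_network_0": 6.100, "ra_soc_network_1": 1.195,
    "ra_soc_network_2": 0.694, "ra_soc_network_3": 3.582,
    "No, not at all": 0.082, "Yes, some": 0.035, "Yes, a lot": 0.002,
}

# Contributions to one dimension should add up to (about) 100 percent.
total = sum(dim1_contrib.values())

# Heuristic threshold: the contribution expected under equal shares.
threshold = 100 / len(dim1_contrib)

# Categories contributing more than the threshold, largest first.
above_average = sorted(
    ((cat, value) for cat, value in dim1_contrib.items() if value > threshold),
    key=lambda item: item[1],
    reverse=True,
)

print(f"total = {total:.3f}, threshold = {threshold:.2f}")
for category, value in above_average:
    print(f"{category}: {value:.3f}")
```

For Dim 1, the five largest contributors are all "Never" (0) categories of the multitasking items, followed by the "Most of the time" (3) categories.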
vi. Coordinates of categories
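Extract the coordinates of the categories on each dimension. A coordinate gives the position of a category along a dimension: categories with large positive or negative coordinates lie far from the origin, and categories with similar coordinates tend to be chosen by the same respondents. Unlike contributions, coordinates do not account for how frequent a category is, so rare categories such as Midnight-5:59 a.m. or 180 minutes or more can have extreme coordinates.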
Dim 1 Dim 2 Dim 3 Dim 4 Dim 5
0-5 years -0.159 0.091 0.080 0.228 -0.456
6-10 years -0.118 -0.176 -0.012 -0.115 0.719
11-years and more 0.518 0.078 -0.149 -0.306 -0.147
Rural -0.302 -0.236 -0.014 -0.015 0.479
Urban/Suburban 0.197 0.154 0.009 0.009 -0.312
6:00 a.m.-11:59 a.m. -0.034 0.282 0.488 0.032 -0.529
Noon-6:00 p.m. -0.155 -0.186 -0.050 -0.170 0.140
6:00 p.m.-11:59 p.m. 0.134 -0.025 -0.467 0.037 0.173
Midnight-5:59 a.m. 2.254 0.536 0.675 3.040 2.174
0 minutes 0.004 -0.035 -0.338 2.355 1.303
15 minutes -0.363 0.298 0.664 0.912 -0.738
30 minutes -0.225 0.170 -0.021 -0.002 0.284
45 minutes -0.233 -0.365 -0.027 -0.441 0.541
60 minutes 0.427 -0.061 -0.326 -0.479 -0.590
90 minutes -0.236 -0.312 -0.667 0.058 0.062
120 minutes 1.258 0.521 0.691 -0.096 -0.237
180 minutes or more 1.900 0.874 1.691 3.325 1.627
ra_tv_0 2.055 0.784 -0.089 -0.515 -0.231
ra_tv_1 0.366 -0.270 0.018 1.279 0.124
ra_tv_2 -0.201 -0.489 -0.132 -0.386 -0.010
ra_tv_3 -0.793 0.951 0.334 0.179 0.040
ra_music_0 2.423 1.214 0.236 -0.610 0.224
ra_music_1 0.233 -0.531 -0.730 0.167 -0.206
ra_music_2 -0.232 -0.260 0.751 0.038 0.040
ra_music_3 -0.874 1.208 -0.511 -0.177 0.218
ra_talk_phone_0 1.713 0.604 0.417 -0.553 0.886
ra_talk_phone_1 0.266 -0.307 -0.553 0.415 -0.275
ra_talk_phone_2 -0.279 -0.369 0.499 -0.226 -0.014
ra_talk_phone_3 -0.997 1.650 -0.593 0.102 0.117
ra_video_game_0 1.643 0.680 -0.020 -0.248 -0.144
ra_video_game_1 -0.030 -0.516 0.155 0.603 0.302
ra_video_game_2 -0.376 -0.387 -0.016 -0.333 -0.225
ra_video_game_3 -1.173 1.665 -0.322 -0.013 0.232
ra_soc_network_0 1.785 0.702 0.323 -0.244 1.269
ra_soc_network_1 0.322 -0.287 -0.464 0.427 -0.458
ra_soc_network_2 -0.228 -0.388 0.521 -0.345 0.151
ra_soc_network_3 -0.877 1.528 -0.476 0.033 0.175
No, not at all 0.096 -0.287 -0.257 0.252 0.419
Yes, some -0.047 -0.068 -0.252 -0.198 0.011
Yes, a lot -0.017 0.564 0.977 0.139 -0.607
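The weights, eigenvalues, coordinates, and contributions reported in this post are linked by a standard identity: the contribution of a category to a dimension is 100 × mass × coordinate² / eigenvalue. The sketch below checks this for three categories on Dim 1, using the rounded values printed in the tables, so agreement is only expected to about one decimal place.

```python
def contribution(mass, coord, eigenvalue):
    """Percent contribution of a category to a dimension."""
    return 100.0 * mass * coord ** 2 / eigenvalue

# Eigenvalue of Dim 1, from the eigenvalue table (rounded to 3 decimals).
eig_dim1 = 0.319

# category: (weight from the weights table,
#            Dim 1 coordinate,
#            Dim 1 contribution as reported)
checks = {
    "ra_tv_0": (0.010561056, 2.055, 13.996),
    "ra_video_game_0": (0.017986799, 1.643, 15.227),
    "ra_music_0": (0.006930693, 2.423, 12.763),
}

recomputed = {
    category: contribution(mass, coord, eig_dim1)
    for category, (mass, coord, _) in checks.items()
}

for category, (_, _, reported) in checks.items():
    print(f"{category}: recomputed {recomputed[category]:.2f}, reported {reported:.3f}")
```

The formula also shows why a rare category needs a very large coordinate to contribute much: its small mass multiplies the squared coordinate.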
vii. Plotting the results as a biplot
A biplot combines the dimension plot with the variable plot, so the relationships between categories and the relationships between variables can be read from a single figure. It helps interpret the associations between categories, variables, and dimensions simultaneously, which can inform further analysis or decision-making.
The R code for this analysis is available at: https://rpubs.com/nirmal/1043602