Testing EdSurvey, WeMix, Dire, MICE and Other Important R packages with Sample Educational Research PART 1
### Loading Required Packages
library(tidyverse)
library(haven)
library(EdSurvey)
library(Dire)
library(WeMix)
library(ggplot2)
Reading PISA 2018 Data and Subsetting the US Data
eds_pisa <- EdSurvey::readPISA(
  path = "C:/Users/nghimire/OneDrive - The University of Texas at Tyler/Redirected Folders/Documents/edsurvey_PISA_USA/PISA/2018",
  database = "INT", countries = "usa", cognitive = "score"
)
Found cached data for country code "usa"
dim(eds_pisa)
[1] 4838 5045
# eds_pisa$w_fstuwt
It’s a massive data set, and I am surprised by the number of columns! Showing all 5045 column names doesn’t make sense, so I will print just the first 10 and the last 10. Let’s see what they are.
hd <- head(colnames(eds_pisa), 10)
ht <- tail(colnames(eds_pisa), 10)
cbind(hd, ht)
[,1] [,2]
[1,] "ROWID" "creactiv"
[2,] "cntryid" "edushort"
[3,] "cnt" "staffshort"
[4,] "cntschid" "stubeha"
[5,] "cntstuid" "teachbeha"
[6,] "cyc" "scmceg"
[7,] "natcen" "w_schgrnrabwt"
[8,] "stratum" "w_fstuwt_sch_sum"
[9,] "subnatio" "senwt.sch_qqq"
[10,] "oecd" "ver_dat.sch_qqq"
Amazing!! I recognize a few of them, but I have no idea what most of these variables are.
Understanding the Data
Let’s check the structure of some of the variables and labels. First of all, I want to check how many plausible values there are for each cognitive test area (e.g., math, science, and reading).
# showWeights(eds_pisa, verbose = TRUE)
showPlausibleValues(eds_pisa)
There are 9 subject scale(s) or subscale(s) in this
edsurvey.data.frame:
'math' subject scale or subscale with 10 plausible values.
'read' subject scale or subscale with 10 plausible values (the
default).
'scie' subject scale or subscale with 10 plausible values.
'glcm' subject scale or subscale with 10 plausible values.
'rcli' subject scale or subscale with 10 plausible values.
'rcun' subject scale or subscale with 10 plausible values.
'rcer' subject scale or subscale with 10 plausible values.
'rtsn' subject scale or subscale with 10 plausible values.
'rtml' subject scale or subscale with 10 plausible values.
Based on this output, the data set contains 10 plausible values for reading, and I want to learn a little more about them by comparing their summaries.
Looks like they are fairly homogeneous, although their values differ slightly. The OECD report shows that the average reading score for US test takers was 505. The median (not the mean) of pv2read shows the same value, but all of the means are smaller. Taking the average of all of the means would give us:
n_distinct(eds_pisa$ratcmp1)
[1] 93
levelsSDF(varnames = "ratcmp1", data = eds_pisa)
Levels for Variable 'ratcmp1' (Lowest level first):
995. VALID SKIP* (n = 0)
997. NOT APPLICABLE* (n = 0)
998. INVALID* (n = 0)
999. NO RESPONSE* (n = 0)
NOTE: * indicates an omitted level.
I was not sure what the ratcmp1 variable was or whether it had any levels. I requested the codebook using showCodebook(eds_data_full) and found that the index of availability of computers (RATCMP1) is the ratio of computers available to 15-year-olds for educational purposes to the total number of students in the modal grade for 15-year-olds.
As seen above, there were 93 distinct values of this index in the US sample, and the variable has no labels other than the omitted levels. It should be a fairly straightforward continuous variable with some missing values. However, I don’t need to know more because I am going to drop this variable, as it has nothing to do with my current analysis. My focus will be more on the teacher samples.
# getStratumVar(data = eds_pisa, weightVar = "origwt")
# summary2(eds_pisa, "composite")
# summary2(eds_pisa, "composite", weightVar = "NULL")
showCutPoints(data = eds_pisa)
Achievement Levels:
Mathematics: 357.77, 420.07, 482.38, 544.68, 606.99, 669.3
Reading: 189.33, 262.04, 334.75, 407.47, 480.18, 552.89, 625.61, 698.32
Science: 260.54, 334.94, 409.54, 484.14, 558.73, 633.33, 707.93
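To make the reading cut points above concrete, here is a minimal sketch of how a score could be mapped to a proficiency level with base R's findInterval(). The cut points are copied from the output above; the level labels and the example scores are my own illustration (they are not taken from the data), and with real PISA data this would be done for each plausible value rather than for a single score.

```r
# Reading cut points as printed by showCutPoints() above
read_cuts <- c(189.33, 262.04, 334.75, 407.47, 480.18, 552.89, 625.61, 698.32)

# Assumed labels, lowest level first (8 cut points define 9 levels)
read_levels <- c(
  "Below Level 1c", "Level 1c", "Level 1b", "Level 1a",
  "Level 2", "Level 3", "Level 4", "Level 5", "Level 6"
)

# Hypothetical scores, for illustration only
toy_scores <- c(150, 200, 505.35, 700)

# findInterval() counts how many cut points are <= each score,
# so adding 1 gives the position of the matching label
toy_levels <- read_levels[findInterval(toy_scores, read_cuts) + 1]

data.frame(score = toy_scores, level = toy_levels)
```

A score exactly equal to a cut point is assigned to the higher level in this sketch.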
During our hands-on training at the NAEP Winter Data Workshop 2023, I learned that NAEP data use “origwt” as the weight variable and “composite” as the composite of all plausible math values. I tried both names, but it looks like neither applies to the PISA data.
Coming back to the mismatched mean scores that I discussed earlier: I know that NAEP recommends using weighted rather than unweighted samples in analyses. Here is some further discussion about using weights:
Why Weighting the Samples?
- The weights account for the fraction of the population represented by each stratum and reflect the probability that an element of the stratum is selected to be in the sample. One can show that the weighted sample mean is a good estimator, in the statistical sense, of the population mean when the sampling is a stratified design (p. 300).
- The unweighted sample size is in fact the size of the only sample selected. The weighted sample size is nothing more than the size of the population represented by the sample, which is already known or can be calculated from the weights (p. 301).
- Stratification is often used when the population has groups that differ from each other on the variable of interest, such as students from different countries, states, and school districts in the PISA assessment. In such cases, we are usually interested in some inference (e.g., mean, proportion, total, ratio, etc.) about each stratum (e.g., students of different ethnicities). The weighting comes into play when combining the inferences from the strata into an inference about the entire population (e.g., we are interested in identifying how the 4,838 US 15-year-olds did in the 2018 PISA Reading Assessment and how they compare with each other based on their ethnicity, and we are going to make an inference about all US 15-year-olds in 2018, a population of approximately 12,506,174). [@Ciol et al., 2006, p. 301]
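The points above can be illustrated with a small toy example in base R. Everything in it (the two strata, their population sizes, and the scores) is made up for illustration and has nothing to do with the PISA file; it only shows why a weighted mean can differ from an unweighted one when strata are sampled at different rates.

```r
# Toy stratified sample: stratum A is large, stratum B is small,
# but both contribute the same number of sampled students
toy <- data.frame(
  stratum  = rep(c("A", "B"), each = 3),
  score    = c(500, 510, 490, 600, 620, 610),
  pop_size = rep(c(900, 100), each = 3)  # hypothetical stratum sizes
)

# Design weight = stratum population size / stratum sample size
n_h <- ave(toy$score, toy$stratum, FUN = length)
toy$weight <- toy$pop_size / n_h

unweighted_mean <- mean(toy$score)                      # treats every row equally
weighted_mean   <- weighted.mean(toy$score, toy$weight) # respects stratum sizes
weighted_n      <- sum(toy$weight)                      # population represented

c(unweighted = unweighted_mean, weighted = weighted_mean, weighted_n = weighted_n)
```

The unweighted mean is pulled toward the oversampled small stratum, while the weighted mean reflects the population shares, and the weights sum to the population size.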
Let’s come back to this point.
Subsetting the Reading Data
This file is a compiled version of all possible variables in the assessments (Cognitive- Reading, Math, Science, Digital Literacy; and Surveys- Student, Teacher, and School), thus, we can subset the data to suit our requirements for example school, student, or teacher etc. For now, I am going to subset student only data. The selected variables are the one useful for me in this analysis.
read_data_full <- EdSurvey::getData(eds_pisa,
  varnames = c(
    "ROWID", "cntschid", "cntstuid",
    "privatesch", "schltype", "stratio", "schsize",
    "totat", "proatce", "proat5ab", "proat5am", "proat6",
    "clsize", "teachbeha", "w_schgrnrabwt", "w_fstuwt_sch_sum",
    "read", "w_fstuwt"
  ),
  addAttributes = TRUE, omittedLevels = FALSE
)
# names(read_data_full)
options(scipen = 999)
summary(read_data_full[, -(27:106)])
ROWID cntschid cntstuid privatesch
Min. : 1 Min. :84000001 Min. :84000001 Length:4838
1st Qu.:1210 1st Qu.:84000047 1st Qu.:84002155 Class :character
Median :2420 Median :84000086 Median :84004338 Mode :character
Mean :2420 Mean :84000087 Mean :84004300
3rd Qu.:3629 3rd Qu.:84000129 3rd Qu.:84006418
Max. :4838 Max. :84000175 Max. :84008626
schltype stratio schsize
PRIVATE INDEPENDENT : 163 Min. : 1.667 Min. : 22
PRIVATE GOVERNMENT-DEPENDENT: 13 1st Qu.: 13.100 1st Qu.: 639
PUBLIC :4636 Median : 16.154 Median :1411
NO RESPONSE : 26 Mean : 17.523 Mean :1491
3rd Qu.: 19.000 3rd Qu.:2076
Max. :100.000 Max. :4507
NA's :761 NA's :559
totat proatce proat5ab proat5am
Min. : 1.00 Min. :0.0000 Min. :0.0116 Min. :0.0167
1st Qu.: 46.00 1st Qu.:0.9765 1st Qu.:0.6095 1st Qu.:0.2692
Median : 80.50 Median :1.0000 Median :1.0000 Median :0.4727
Mean : 85.83 Mean :0.9407 Mean :0.8006 Mean :0.4843
3rd Qu.:114.00 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.6593
Max. :280.00 Max. :1.0000 Max. :1.0000 Max. :1.0000
NA's :702 NA's :802 NA's :861 NA's :853
proat6 clsize teachbeha w_schgrnrabwt
Min. :0.0000 26-30 STUDENTS:1660 Min. :-2.0409 Min. : 20.69
1st Qu.:0.0000 21-25 STUDENTS:1271 1st Qu.:-0.1274 1st Qu.: 42.70
Median :0.0154 31-35 STUDENTS: 645 Median : 0.2266 Median : 66.49
Mean :0.0227 16-20 STUDENTS: 478 Mean : 0.2720 Mean : 120.02
3rd Qu.:0.0330 NO RESPONSE : 277 3rd Qu.: 0.8952 3rd Qu.: 157.37
Max. :0.2222 (Other) : 221 Max. : 1.9937 Max. :1294.02
NA's :886 NA's : 286 NA's :467
w_fstuwt_sch_sum pv1read pv2read pv3read
Min. : 820.3 Min. :161.3 Min. :176.5 Min. :132.4
1st Qu.:18102.3 1st Qu.:423.5 1st Qu.:424.6 1st Qu.:423.2
Median :21341.2 Median :503.6 Median :505.2 Median :503.8
Mean :22421.2 Mean :500.2 Mean :500.8 Mean :500.3
3rd Qu.:25895.6 3rd Qu.:578.7 3rd Qu.:578.4 3rd Qu.:577.9
Max. :49343.6 Max. :868.9 Max. :898.5 Max. :858.4
pv4read pv5read pv6read pv7read
Min. :140.3 Min. :137.7 Min. :128.1 Min. :148.7
1st Qu.:426.1 1st Qu.:423.2 1st Qu.:424.4 1st Qu.:424.4
Median :503.1 Median :504.5 Median :504.4 Median :502.6
Mean :501.1 Mean :500.5 Mean :501.0 Mean :499.9
3rd Qu.:579.3 3rd Qu.:579.0 3rd Qu.:579.3 3rd Qu.:578.6
Max. :834.1 Max. :853.5 Max. :844.8 Max. :815.3
pv8read pv9read pv10read w_fstuwt
Min. :170.9 Min. :173.6 Min. :167.8 Min. : 262.8
1st Qu.:424.9 1st Qu.:426.0 1st Qu.:426.1 1st Qu.: 563.0
Median :504.0 Median :503.9 Median :503.6 Median : 661.7
Mean :500.3 Mean :501.1 Mean :500.6 Mean : 735.6
3rd Qu.:579.4 3rd Qu.:577.1 3rd Qu.:579.0 3rd Qu.: 854.5
Max. :823.4 Max. :818.1 Max. :834.1 Max. :2946.1
# showCodebook(read_data_full)
# View(showCodebook(read_data_full))
The subsetted data file contains all of the teacher-related variables, and I don’t need all of them. For example, I am not going to use the teacher behavior and ‘w_fstuwt_sch_sum’ variables, so I got rid of them.
Creating A Composite Variable Using Ten Plausible Values
psych::alpha(read_data_full[, 17:26])
Reliability analysis
Call: psych::alpha(x = read_data_full[, 17:26])
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.99 0.99 0.99 0.94 152 0.00014 501 105 0.94
95% confidence boundaries
lower alpha upper
Feldt 0.99 0.99 0.99
Duhachek 0.99 0.99 0.99
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
pv1read 0.99 0.99 0.99 0.94 137 0.00016 0.0000016 0.94
pv2read 0.99 0.99 0.99 0.94 136 0.00016 0.0000015 0.94
pv3read 0.99 0.99 0.99 0.94 137 0.00016 0.0000016 0.94
pv4read 0.99 0.99 0.99 0.94 137 0.00016 0.0000017 0.94
pv5read 0.99 0.99 0.99 0.94 136 0.00016 0.0000017 0.94
pv6read 0.99 0.99 0.99 0.94 136 0.00016 0.0000018 0.94
pv7read 0.99 0.99 0.99 0.94 137 0.00016 0.0000014 0.94
pv8read 0.99 0.99 0.99 0.94 136 0.00016 0.0000012 0.94
pv9read 0.99 0.99 0.99 0.94 136 0.00016 0.0000014 0.94
pv10read 0.99 0.99 0.99 0.94 136 0.00016 0.0000014 0.94
Item statistics
n raw.r std.r r.cor r.drop mean sd
pv1read 4838 0.97 0.97 0.97 0.96 500 108
pv2read 4838 0.97 0.97 0.97 0.97 501 108
pv3read 4838 0.97 0.97 0.97 0.96 500 108
pv4read 4838 0.97 0.97 0.97 0.96 501 108
pv5read 4838 0.97 0.97 0.97 0.97 500 108
pv6read 4838 0.97 0.97 0.97 0.97 501 108
pv7read 4838 0.97 0.97 0.97 0.96 500 108
pv8read 4838 0.97 0.97 0.97 0.97 500 108
pv9read 4838 0.97 0.97 0.97 0.97 501 107
pv10read 4838 0.97 0.97 0.97 0.97 501 108
Qualitative Descriptors of Cronbach’s alpha
- .95 - 1.00: Excellent
- .90 - .94: Great
- .80 - .89: Good
- .70 - .79: Acceptable
- .60 - .69: Questionable, and
- .00 - .59: Unacceptable
Based on these statistics, the raw alpha of .99 shows excellent internal consistency among the plausible values. The reliability would not be affected even if we dropped any one of the plausible values. Given that alpha = .99 and all ten plausible values appear to measure the same thing, I decided to retain all ten plausible values for a composite variable.
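For readers curious about where the raw alpha comes from, here is a minimal base R sketch of the standard Cronbach's alpha formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score). The data are simulated (ten noisy copies of one underlying score), so the column names and numbers are purely illustrative and are not the PISA plausible values.

```r
set.seed(123)

n <- 200  # hypothetical number of students
k <- 10   # number of "items" (here: ten simulated score columns)

# One underlying score per student plus independent noise per column
true_score <- rnorm(n, mean = 500, sd = 100)
toy_items <- sapply(seq_len(k), function(i) true_score + rnorm(n, sd = 25))
colnames(toy_items) <- paste0("pv", seq_len(k))

# Cronbach's alpha from item variances and the variance of the row totals
item_vars <- apply(toy_items, 2, var)
total_var <- var(rowSums(toy_items))
alpha_raw <- (k / (k - 1)) * (1 - sum(item_vars) / total_var)

alpha_raw
```

Because all ten columns share the same underlying score, alpha is very high here, which mirrors what we see when the columns are repeated draws for the same student.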
read_data_full$composite <- rowMeans(read_data_full[, 17:26], na.rm = TRUE)
summary(read_data_full$composite)
Min. 1st Qu. Median Mean 3rd Qu. Max.
157.3 425.9 504.8 500.6 578.4 810.5
Nope, this is not what I wanted! I tried something new, but none of the means in the Item statistics table is close to the reported mean, and the results are no better than what I reported earlier. The mean of the composite, 500.6, is also well below 505. Sadly, I cannot proceed with this so-called ‘composite’ value as the dependent variable.
Let’s go back to square one. The summary above includes some variables that I need to check again. Here is a further summary in tabular form:
rbind(
summary(read_data_full$w_fstuwt),
summary(read_data_full$w_fstuwt_sch_sum),
summary(read_data_full$w_schgrnrabwt)
)
Min. 1st Qu. Median Mean 3rd Qu. Max.
[1,] 262.75044 563.04221 661.71657 735.6438 854.5203 2946.134
[2,] 820.25602 18102.33043 21341.22828 22421.2347 25895.5895 49343.581
[3,] 20.68506 42.69652 66.48856 120.0234 157.3688 1294.020
Based on these statistics, w_fstuwt may be of some use to my study, but not w_fstuwt_sch_sum or w_schgrnrabwt.
Now, I would like to go back to my original dataset and check if there is any variable that is listed as weights.
showWeights(eds_pisa, verbose = TRUE)
There is 1 full sample weight in this edsurvey.data.frame:
'w_fstuwt' with 80 JK replicate weights (the default).
Jackknife replicate weight variables associated with the full
sample weight 'w_fstuwt':
'w_fsturwt1', 'w_fsturwt2', 'w_fsturwt3', 'w_fsturwt4',
'w_fsturwt5', 'w_fsturwt6', 'w_fsturwt7', 'w_fsturwt8',
'w_fsturwt9', 'w_fsturwt10', 'w_fsturwt11', 'w_fsturwt12',
'w_fsturwt13', 'w_fsturwt14', 'w_fsturwt15', 'w_fsturwt16',
'w_fsturwt17', 'w_fsturwt18', 'w_fsturwt19', 'w_fsturwt20',
'w_fsturwt21', 'w_fsturwt22', 'w_fsturwt23', 'w_fsturwt24',
'w_fsturwt25', 'w_fsturwt26', 'w_fsturwt27', 'w_fsturwt28',
'w_fsturwt29', 'w_fsturwt30', 'w_fsturwt31', 'w_fsturwt32',
'w_fsturwt33', 'w_fsturwt34', 'w_fsturwt35', 'w_fsturwt36',
'w_fsturwt37', 'w_fsturwt38', 'w_fsturwt39', 'w_fsturwt40',
'w_fsturwt41', 'w_fsturwt42', 'w_fsturwt43', 'w_fsturwt44',
'w_fsturwt45', 'w_fsturwt46', 'w_fsturwt47', 'w_fsturwt48',
'w_fsturwt49', 'w_fsturwt50', 'w_fsturwt51', 'w_fsturwt52',
'w_fsturwt53', 'w_fsturwt54', 'w_fsturwt55', 'w_fsturwt56',
'w_fsturwt57', 'w_fsturwt58', 'w_fsturwt59', 'w_fsturwt60',
'w_fsturwt61', 'w_fsturwt62', 'w_fsturwt63', 'w_fsturwt64',
'w_fsturwt65', 'w_fsturwt66', 'w_fsturwt67', 'w_fsturwt68',
'w_fsturwt69', 'w_fsturwt70', 'w_fsturwt71', 'w_fsturwt72',
'w_fsturwt73', 'w_fsturwt74', 'w_fsturwt75', 'w_fsturwt76',
'w_fsturwt77', 'w_fsturwt78', 'w_fsturwt79', and 'w_fsturwt80'
Wow! It is indeed the w_fstuwt variable. There are 80 replicate weights, i.e., w_fsturwt1 to w_fsturwt80, for each of the 4838 students. As the output above shows, w_fstuwt is the full-sample weight, and the 80 jackknife replicate weights are associated with it. The following description of the jackknife replication method comes from the NCES website: “A replication method that estimates standard errors of percentages and other statistics. It is particularly suited to complex sample designs. In the jackknife, sample units are grouped into pairs (replicate groups). Portions of the sample (replicates) are formed by repeatedly omitting one half of the units in one of the replicate groups and calculating the desired statistic (replicate estimate). The number of replicate estimates is equal to the number of replicate groups. The variability among the replicate estimates is used to estimate the overall sampling variability” (https://nces.ed.gov/nationsreportcard/glossary.aspx#jackknife).
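To get a feel for the NCES description, here is a simplified paired-jackknife sketch in base R. The pairs, scores, and weights are invented for illustration, and the variance formula (the sum of squared deviations of the replicate estimates from the full-sample estimate) is the usual paired-jackknife convention that I am assuming here. With the real data, the replicate weights w_fsturwt1 to w_fsturwt80 are already supplied in the file, so nothing like this has to be built by hand.

```r
# Toy sample: 4 replicate groups (pairs) of 2 students each
toy <- data.frame(
  pair   = rep(1:4, each = 2),
  score  = c(480, 520, 450, 470, 530, 560, 500, 510),
  weight = c(100, 120, 90, 110, 130, 100, 95, 105)
)

# Full-sample weighted mean
full_est <- weighted.mean(toy$score, toy$weight)

# One replicate per pair: omit the first unit of that pair (weight 0)
# and double the weight of the second unit; other pairs are unchanged
rep_est <- sapply(unique(toy$pair), function(p) {
  w <- toy$weight
  idx <- which(toy$pair == p)
  w[idx[1]] <- 0
  w[idx[2]] <- 2 * w[idx[2]]
  weighted.mean(toy$score, w)
})

# Variability among replicate estimates gives the sampling variance
jk_var <- sum((rep_est - full_est)^2)
jk_se  <- sqrt(jk_var)

c(estimate = full_est, se = jk_se)
```

The number of replicate estimates equals the number of pairs, which matches the description quoted above.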
Now, I can move forward and conduct descriptive analyses. Before starting a series of descriptive statistics, I want to see whether there is a difference between the weighted and unweighted means of students’ reading scores. Here are the unweighted mean scores:
summary2(read_data_full, "read", weightVar = NULL)
Estimates are not weighted.
Variable N Min. 1st Qu. Median Mean 3rd Qu. Max. SD
1 pv1read 4838 161.343 423.4801 503.6375 500.1502 578.6824 868.870 108.4549
2 pv2read 4838 176.458 424.5367 505.2045 500.7907 578.4439 898.478 107.9547
3 pv3read 4838 132.423 423.1459 503.8005 500.3032 577.9054 858.393 107.8982
4 pv4read 4838 140.293 426.1104 503.0610 501.0978 579.3880 834.076 108.4841
5 pv5read 4838 137.737 423.1717 504.4565 500.4783 579.0315 853.488 108.0790
6 pv6read 4838 128.111 424.4202 504.4380 500.9541 579.3228 844.836 108.1850
7 pv7read 4838 148.739 424.3686 502.5630 499.8935 578.6955 815.275 107.7119
8 pv8read 4838 170.907 424.9227 503.9935 500.3028 579.3685 823.427 108.1128
9 pv9read 4838 173.639 425.9822 503.8790 501.0805 577.1258 818.066 107.4320
10 pv10read 4838 167.822 426.0661 503.5685 500.6259 579.0812 834.091 107.9336
NA's
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
The means, as before, are not close to the reported 505. These are the means for all ten plausible values in reading. Let’s check the weighted mean reading score among US 15-year-olds who took part in the PISA 2018 assessment. Either of the calls below would give us the same results.
# summary2(read_data_full, "read")
summary2(read_data_full, "read", weightVar = "w_fstuwt")
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max.
1 read 4838 3559045 153.7472 429.7936 509.7353 505.3528 583.5768 844.9
SD NA's Zero weights
1 107.9064 0 0
Amazing! That’s what I wanted. The weighted mean reading score for US students was 505.3528. Thus, w_fstuwt is the weight variable and read is the outcome/dependent variable. Let’s draw a histogram and look at the distribution.
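As a rough sketch of what a weighted histogram involves, the base R example below sums the weights (rather than counting rows) within each score bin. The scores and weights are simulated stand-ins, not the PISA values, and the 50-point bin width is an arbitrary choice of mine; with the real data this would be repeated for each plausible value.

```r
set.seed(2018)

# Simulated stand-ins for one plausible value and the student weight
toy_score  <- rnorm(500, mean = 500, sd = 100)
toy_weight <- runif(500, min = 250, max = 3000)

# 50-point bins covering the whole range of simulated scores
breaks <- seq(
  floor(min(toy_score) / 50) * 50,
  ceiling(max(toy_score) / 50) * 50,
  by = 50
)
bins <- cut(toy_score, breaks = breaks, include.lowest = TRUE)

# Sum of weights per bin; empty bins come back as NA, so set them to 0
weighted_counts <- tapply(toy_weight, bins, sum)
weighted_counts[is.na(weighted_counts)] <- 0

# Weighted share of students in each bin
barplot(
  weighted_counts / sum(weighted_counts),
  las = 2, cex.names = 0.6,
  ylab = "Weighted proportion", main = "Weighted distribution (toy data)"
)
```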
Descriptive Statistics
Average Scores Based on Class Size
clsize_reading <- edsurveyTable(formula = read ~ clsize, data = read_data_full)
clsize_reading
Formula: read ~ clsize
Plausible values: 10
jrrIMax: 1
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
full data n: 4838
n used: 4275
Summary Table:
clsize N WTD_N PCT SE(PCT) MEAN SE(MEAN)
15 STUDENTS OR FEWER 98 78303.96 2.521100 1.020527 490.9990 10.635406
16-20 STUDENTS 478 361655.05 11.643965 2.477297 505.8996 10.771112
21-25 STUDENTS 1271 911449.99 29.345344 3.874926 507.4223 8.233802
26-30 STUDENTS 1660 1134459.30 36.525425 3.979064 508.5477 6.052202
31-35 STUDENTS 645 501987.16 16.162144 2.511393 511.2569 9.204440
36-40 STUDENTS 123 118088.67 3.802022 1.777732 530.6957 16.336060
School Type and Reading Scores
schltype_reading <- edsurveyTable(
  formula = read ~ schltype,
  data = read_data_full
)
schltype_reading
Formula: read ~ schltype
Plausible values: 10
jrrIMax: 1
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
full data n: 4838
n used: 4812
Summary Table:
schltype N WTD_N PCT SE(PCT) MEAN
1 PRIVATE INDEPENDENT 163 203557.99 5.757412 1.2083048 525.3717
2 PRIVATE GOVERNMENT-DEPENDENT 13 22100.53 0.625089 0.3349908 499.0487
3 PUBLIC 4636 3309922.77 93.617499 1.2244331 503.6134
SE(MEAN)
1 21.037771
2 27.273720
3 3.421781
Student Teacher Ratio with Omitted Variables like NAs
summary2(read_data_full, "stratio", omittedLevels = FALSE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max. SD
1 stratio 4838 3559045 1.6667 12.5827 16 17.21917 18.9474 100 9.444379
NA's Zero weights
1 761 0
Student Teacher Ratio without Omitted Variables
summary2(read_data_full, "stratio", omittedLevels = TRUE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max. SD
1 stratio 4077 2932840 1.6667 12.5827 16 17.21917 18.9474 100 9.444379
NA's Zero weights
1 0 0
School Size (by Total Students)
summary2(read_data_full, "schsize", omittedLevels = FALSE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max. SD
1 schsize 4838 3559045 22 639 1411 1490.035 2061 4507 986.5392
NA's Zero weights
1 559 0
Total Teachers by School
summary2(read_data_full, "totat", omittedLevels = FALSE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max. SD
1 totat 4838 3559045 1 48 80 86.36344 115 280 51.42637
NA's Zero weights
1 702 0
Percentage of Teachers Fully Certified
summary2(read_data_full, "proatce", omittedLevels = FALSE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max. SD
1 proatce 4838 3559045 0 0.9685 1 0.9256084 1 1 0.2051878
NA's Zero weights
1 802 0
Percentage of Teachers with Bachelor Degrees or Above Degrees
summary2(read_data_full, "proat5ab", omittedLevels = FALSE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max.
1 proat5ab 4838 3559045 0.0116 0.6541 1 0.8113005 1 1
SD NA's Zero weights
1 0.2747722 861 0
Percentage of Teachers with Master Degree or Above (Per-School)
summary2(read_data_full, "proat5am", omittedLevels = FALSE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max. SD
1 proat5am 4838 3559045 0.0167 0.2974 0.5 0.50363 0.6818 1 0.250698
NA's Zero weights
1 853 0
Percentage of Teachers by School Having a Doctoral Degree or Other Higher Degrees
summary2(read_data_full, "proat6", omittedLevels = FALSE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max.
1 proat6 4838 3559045 0 0 0.0152 0.02227463 0.033 0.2222
SD NA's Zero weights
1 0.02850491 886 0
Linear Regressions
a. Linear Regression with Continuous Variables
lm_1 <- EdSurvey::lm.sdf(
  formula = read ~ stratio,
  data = read_data_full
)
summary(lm_1)
Formula: read ~ stratio
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 4077
Coefficients:
coef se t dof Pr(>|t|)
(Intercept) 496.60168 9.25180 53.6762 115.00 <0.0000000000000002 ***
stratio 0.67519 0.49094 1.3753 104.03 0.172
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0036
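The output above reports that 10 plausible values were used. As a rough illustration of what combining plausible values in a regression can look like, the sketch below fits one weighted lm() per plausible value on simulated data and pools the results with Rubin's rules. All data and names here are simulated assumptions, and the standard errors come from lm()'s model-based variances rather than from jackknife replicate weights, so this is not a reproduction of what lm.sdf does internally and its numbers are not comparable to the output above.

```r
set.seed(42)

n <- 300
toy <- data.frame(
  stratio = runif(n, 10, 25),    # simulated student-teacher ratio
  w       = runif(n, 250, 3000)  # simulated student weight
)

# Ten simulated plausible values around one latent score
latent   <- 490 + 1 * toy$stratio + rnorm(n, sd = 100)
pv_names <- paste0("pv", 1:10, "read")
for (pv in pv_names) toy[[pv]] <- latent + rnorm(n, sd = 30)

# One weighted regression per plausible value
fits <- lapply(pv_names, function(pv) {
  lm(reformulate("stratio", response = pv), data = toy, weights = w)
})

coefs    <- sapply(fits, coef)                        # 2 x 10 matrix
samp_var <- sapply(fits, function(f) diag(vcov(f)))   # 2 x 10 matrix

# Rubin's rules: average the estimates, then add within- and
# between-plausible-value variance
M           <- length(pv_names)
pooled_coef <- rowMeans(coefs)
within_var  <- rowMeans(samp_var)
between_var <- apply(coefs, 1, var)
pooled_se   <- sqrt(within_var + (1 + 1 / M) * between_var)

cbind(coef = pooled_coef, se = pooled_se)
```

The point of the design is that the regression is run once per plausible value and the results are averaged, instead of averaging the plausible values first and running a single regression.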
lm_2 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + schsize,
  data = read_data_full
)
summary(lm_2)
Formula: read ~ stratio + schsize
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 4077
Coefficients:
coef se t dof Pr(>|t|)
(Intercept) 491.9048769 7.9933326 61.53940 97.909 < 0.0000000000000002 ***
stratio 0.2737335 0.4646726 0.58909 68.285 0.55775
schsize 0.0079432 0.0041571 1.91078 94.134 0.05908 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0076
lm_ext <- EdSurvey::lm.sdf(
  formula = read ~ stratio + proatce,
  data = read_data_full
)
summary(lm_ext)
Formula: read ~ stratio + proatce
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3977
Coefficients:
coef se t dof Pr(>|t|)
(Intercept) 505.02654 23.25316 21.7186 107.055 < 0.00000000000000022 ***
stratio 1.47007 0.45072 3.2616 86.933 0.001583 **
proatce -24.12443 22.45719 -1.0742 116.982 0.284925
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0111
lm_3 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + schsize + proatce,
  data = read_data_full
)
summary(lm_3)
Formula: read ~ stratio + schsize + proatce
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3977
Coefficients:
coef se t dof Pr(>|t|)
(Intercept) 507.3025663 23.2178607 21.8497 105.349 < 0.0000000000000002 ***
stratio 1.0301117 0.5913361 1.7420 84.508 0.08515 .
schsize 0.0053863 0.0039685 1.3572 80.359 0.17850
proatce -27.0344119 22.1922893 -1.2182 115.883 0.22563
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0125
lm_4 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + schsize + proatce + proat5ab,
  data = read_data_full
)
summary(lm_4)
Formula: read ~ stratio + schsize + proatce + proat5ab
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3943
Coefficients:
coef se t dof Pr(>|t|)
(Intercept) 529.3286009 28.0319869 18.8830 76.467 < 0.0000000000000002 ***
stratio 1.1266959 0.6203051 1.8164 81.183 0.07301 .
schsize 0.0049045 0.0041054 1.1946 81.893 0.23568
proatce -29.8925431 25.3737597 -1.1781 115.449 0.24118
proat5ab -24.5453534 16.3549508 -1.5008 95.315 0.13672
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0169
lm_ext1 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + proatce + proat5ab,
  data = read_data_full
)
summary(lm_ext1)
Formula: read ~ stratio + proatce + proat5ab
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3943
Coefficients:
coef se t dof Pr(>|t|)
(Intercept) 528.59907 27.79512 19.0177 78.326 < 0.00000000000000022 ***
stratio 1.52870 0.46508 3.2869 85.945 0.001468 **
proatce -27.84878 25.65859 -1.0854 118.118 0.279973
proat5ab -25.52185 16.11597 -1.5836 93.058 0.116669
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0157
lm_5 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + proatce +
    proat5ab + proat5am,
  data = read_data_full
)
summary(lm_5)
Formula: read ~ stratio + proatce + proat5ab + proat5am
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3916
Coefficients:
coef se t dof Pr(>|t|)
(Intercept) 515.45068 24.06791 21.4165 60.629 < 0.00000000000000022 ***
stratio 1.57995 0.48489 3.2583 83.114 0.001624 **
proatce -34.74642 23.70757 -1.4656 111.272 0.145570
proat5ab -27.82020 15.34054 -1.8135 101.332 0.072713 .
proat5am 39.31266 16.80545 2.3393 78.328 0.021870 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0233
lm_ext3 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + proat5ab + proat5am,
  data = read_data_full
)
summary(lm_ext3)
Formula: read ~ stratio + proat5ab + proat5am
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3916
Coefficients:
coef se t dof Pr(>|t|)
(Intercept) 484.36472 15.41869 31.4141 57.759 < 0.00000000000000022 ***
stratio 1.48223 0.45928 3.2273 82.764 0.001792 **
proat5ab -25.24883 15.39741 -1.6398 106.996 0.103982
proat5am 36.08978 16.41087 2.1991 82.620 0.030663 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0193
lm_ext4 <- EdSurvey::lm.sdf(
  formula = read ~ proat5ab + proat5am,
  data = read_data_full
)
summary(lm_ext4)
Formula: read ~ proat5ab + proat5am
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3916
Coefficients:
coef se t dof Pr(>|t|)
(Intercept) 510.956 15.771 32.3981 99.276 < 0.0000000000000002 ***
proat5ab -24.414 15.208 -1.6053 104.398 0.11145
proat5am 31.099 17.986 1.7291 94.263 0.08707 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0093
lm_6 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + proatce +
    proat5ab + proat5am + proat6,
  data = read_data_full
)
summary(lm_6)
Formula: read ~ stratio + proatce + proat5ab + proat5am + proat6
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3820
Coefficients:
coef se t dof Pr(>|t|)
(Intercept) 511.82801 29.15910 17.5529 77.875 < 0.00000000000000022 ***
stratio 1.47362 0.52459 2.8091 88.937 0.006108 **
proatce -31.76099 29.96758 -1.0598 110.988 0.291516
proat5ab -25.37473 15.49420 -1.6377 97.020 0.104725
proat5am 36.75996 16.49687 2.2283 81.415 0.028618 *
proat6 111.83668 118.57165 0.9432 64.920 0.349076
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0224
lm_7 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + proat5ab +
    proat5am + proat6 + totat,
  data = read_data_full
)
summary(lm_7)
Formula: read ~ stratio + proat5ab + proat5am + proat6 + totat
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3820
Coefficients:
coef se t dof Pr(>|t|)
(Intercept) 480.993209 16.394572 29.33856 59.224 < 0.0000000000000002 ***
stratio 1.388043 0.514877 2.69587 86.273 0.00844 **
proat5ab -23.430735 15.550534 -1.50675 104.568 0.13489
proat5am 34.287480 16.093029 2.13058 85.190 0.03601 *
proat6 115.044123 122.509258 0.93906 65.370 0.35115
totat 0.025009 0.069862 0.35798 65.865 0.72150
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0197
lm_9 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + schsize + totat +
    proatce + proat5ab + proat5am + proat6,
  data = read_data_full
)
summary(lm_9)
Formula: read ~ stratio + schsize + totat + proatce + proat5ab + proat5am + proat6
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3820
Coefficients:
coef se t dof Pr(>|t|)
(Intercept) 525.595403 33.489521 15.694324 81.642 < 0.0000000000000002 ***
stratio 0.034922 1.653351 0.021122 86.963 0.98320
schsize 0.018719 0.018633 1.004630 84.438 0.31795
totat -0.227053 0.271520 -0.836228 99.273 0.40503
proatce -31.827247 29.302545 -1.086160 109.689 0.27979
proat5ab -23.140016 15.410040 -1.501619 97.277 0.13643
proat5am 38.834932 16.785055 2.313661 83.559 0.02314 *
proat6 97.021875 124.293114 0.780589 65.593 0.43785
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0243
b. Linear Regression with Categorical Variables
lm_clsize <- EdSurvey::lm.sdf(
  formula = read ~ clsize,
  data = read_data_full, relevels = list(clsize = "15 STUDENTS OR FEWER")
)
summary(lm_clsize)
Formula: read ~ clsize
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 4275
Coefficients:
coef se t dof Pr(>|t|)
(Intercept) 490.999 10.635 46.16646 104.077 < 0.0000000000000002 ***
clsize16-20 STUDENTS 14.901 15.178 0.98174 78.498 0.32925
clsize21-25 STUDENTS 16.423 13.293 1.23552 82.276 0.22015
clsize26-30 STUDENTS 17.549 12.138 1.44582 105.553 0.15119
clsize31-35 STUDENTS 20.258 13.668 1.48209 74.575 0.14253
clsize36-40 STUDENTS 39.697 19.742 2.01081 82.348 0.04762 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0026
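In the model above, the intercept (490.999) equals the mean reported earlier for the "15 STUDENTS OR FEWER" group, because that group was set as the reference level. The base R sketch below shows the same idea with relevel() and lm() on a tiny made-up data set; the scores are invented, and this is only an analogy for what the relevels argument does, not EdSurvey code.

```r
# Toy data: three class-size groups with made-up reading scores
toy <- data.frame(
  clsize = factor(rep(
    c("15 STUDENTS OR FEWER", "16-20 STUDENTS", "21-25 STUDENTS"),
    each = 4
  )),
  read = c(480, 490, 500, 494, 500, 510, 505, 509, 505, 515, 510, 500)
)

# Make the smallest class size the reference category
toy$clsize <- relevel(toy$clsize, ref = "15 STUDENTS OR FEWER")

fit <- lm(read ~ clsize, data = toy)

# The intercept is the mean of the reference group; the other
# coefficients are differences from that reference mean
ref_mean    <- mean(toy$read[toy$clsize == "15 STUDENTS OR FEWER"])
group_means <- tapply(toy$read, toy$clsize, mean)

coef(fit)
```

Choosing a different reference level changes the intercept and the contrasts, but not the fitted group means.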
lm_stype <- EdSurvey::lm.sdf(
  formula = read ~ schltype,
  data = read_data_full, relevels = list(schltype = "PUBLIC")
)
summary(lm_stype)
Formula: read ~ schltype
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 4812
Coefficients:
coef se t dof
(Intercept) 503.6134 3.4218 147.17874 79.84
schltypePRIVATE INDEPENDENT 21.7583 21.2573 1.02357 103.91
schltypePRIVATE GOVERNMENT-DEPENDENT -4.5647 27.0502 -0.16875 104.37
Pr(>|t|)
(Intercept) <0.0000000000000002 ***
schltypePRIVATE INDEPENDENT 0.3084
schltypePRIVATE GOVERNMENT-DEPENDENT 0.8663
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0023
lm_stype_clsize <- EdSurvey::lm.sdf(
  formula = read ~ schltype + clsize,
  data = read_data_full
)
summary(lm_stype_clsize)
Formula: read ~ schltype + clsize
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 4249
Coefficients:
                                        coef     se        t     dof              Pr(>|t|)
(Intercept)                          509.077 17.929 28.39327  87.985 < 0.0000000000000002 ***
schltypePRIVATE GOVERNMENT-DEPENDENT -21.147 34.804 -0.60762 105.118              0.54475
schltypePUBLIC                       -23.147 21.171 -1.09333 106.434              0.27672
clsize16-20 STUDENTS                  12.654 14.567  0.86867  94.517              0.38723
clsize21-25 STUDENTS                  18.036 12.037  1.49847  79.121              0.13799
clsize26-30 STUDENTS                  22.139 11.544  1.91773  99.557              0.05801 .
clsize31-35 STUDENTS                  25.327 13.392  1.89119  81.241              0.06216 .
clsize36-40 STUDENTS                  44.766 19.943  2.24467  82.052              0.02748 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0059
c. Final Model
lm_final <- EdSurvey::lm.sdf(
  formula = read ~ stratio + proat5am + clsize,
  data = read_data_full
)
summary(lm_final)
Formula: read ~ stratio + proat5am + clsize
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3960
Coefficients:
                          coef       se        t    dof              Pr(>|t|)
(Intercept)          452.50875 15.96763 28.33912 91.475 < 0.00000000000000022 ***
stratio                1.93429  0.69888  2.76768 72.015              0.007172 **
proat5am              39.52769 15.74313  2.51079 71.337              0.014314 *
clsize16-20 STUDENTS  15.26166 16.37564  0.93197 93.371              0.353754
clsize21-25 STUDENTS   6.43676 13.67295  0.47077 81.769              0.639061
clsize26-30 STUDENTS   1.29426 12.60189  0.10270 91.351              0.918423
clsize31-35 STUDENTS  -6.91784 15.12315 -0.45743 87.218              0.648497
clsize36-40 STUDENTS  -8.72539 20.13689 -0.43330 78.707              0.665979
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.0179
Achievement Levels
achievementLevels(
achievementVars = c("read", "schltype"),
aggregateBy = "schltype",
data = read_data_full
)
AchievementVars: read, schltype
aggregateBy: schltype
Achievement Level Cutpoints:
189.33 262.04 334.75 407.47 480.18 552.89 625.61 698.32
Plausible values: 10
jrrIMax: 1
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
full data n: 4838
n used: 4812
Discrete
read_Level                 schltype                          N        wtdN     Percent StandardError
Below Proficiency Level 1c PRIVATE INDEPENDENT             0.0      0.0000  0.00000000            NA
At Proficiency Level 1c    PRIVATE INDEPENDENT             0.6    492.1536  0.24177563    0.21824383
At Proficiency Level 1b    PRIVATE INDEPENDENT             3.2   3426.9881  1.68354390    1.05275936
At Proficiency Level 1a    PRIVATE INDEPENDENT            17.6  20501.6921 10.07167150    4.51614097
At Proficiency Level 2     PRIVATE INDEPENDENT            36.4  43147.0495 21.19644102    4.05335947
At Proficiency Level 3     PRIVATE INDEPENDENT            46.6  54158.1832 26.60577609    6.20357569
At Proficiency Level 4     PRIVATE INDEPENDENT            38.3  53412.2310 26.23931922    5.11338261
At Proficiency Level 5     PRIVATE INDEPENDENT            14.8  19025.9151  9.34668054    5.35363498
At Proficiency Level 6     PRIVATE INDEPENDENT             5.5   9393.7781  4.61479211    2.38453015
Below Proficiency Level 1c PRIVATE GOVERNMENT-DEPENDENT    0.0      0.0000  0.00000000            NA
At Proficiency Level 1c    PRIVATE GOVERNMENT-DEPENDENT    0.0      0.0000  0.00000000            NA
At Proficiency Level 1b    PRIVATE GOVERNMENT-DEPENDENT    0.8   1072.2823  4.85184027    8.09313952
At Proficiency Level 1a    PRIVATE GOVERNMENT-DEPENDENT    2.5   4110.2384 18.59791904    8.33685589
At Proficiency Level 2     PRIVATE GOVERNMENT-DEPENDENT    2.4   4321.7613 19.55501329    6.90576736
At Proficiency Level 3     PRIVATE GOVERNMENT-DEPENDENT    2.2   3597.2551 16.27678297    6.37389980
At Proficiency Level 4     PRIVATE GOVERNMENT-DEPENDENT    4.1   7308.0483 33.06730088   11.71438156
At Proficiency Level 5     PRIVATE GOVERNMENT-DEPENDENT    1.0   1690.9432  7.65114356    6.73699248
At Proficiency Level 6     PRIVATE GOVERNMENT-DEPENDENT    0.0      0.0000  0.00000000            NA
Below Proficiency Level 1c PUBLIC                          3.4   2526.1568  0.07632072    0.05034374
At Proficiency Level 1c    PUBLIC                         55.3  37708.2685  1.13924919    0.28895635
At Proficiency Level 1b    PUBLIC                        275.3 186750.4032  5.64213779    0.53020234
At Proficiency Level 1a    PUBLIC                        632.1 428236.1305 12.93794933    0.73095848
At Proficiency Level 2     PUBLIC                       1013.2 701405.7180 21.19099952    0.83321432
At Proficiency Level 3     PUBLIC                       1131.1 814456.2319 24.60650260    0.84448127
At Proficiency Level 4     PUBLIC                        939.3 694113.8394 20.97069594    0.80925255
At Proficiency Level 5     PUBLIC                        470.2 355365.4783 10.73636768    0.68619164
At Proficiency Level 6     PUBLIC                        116.1  89360.5412  2.69977723    0.35046562
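To see how the printed cutpoints translate into proficiency levels, here is a minimal base-R sketch that bins a few illustrative scores. The cutpoints and level names are copied from the output above; `classify_score` is a hypothetical helper of my own, and I am assuming a score that lands exactly on a cutpoint goes to the higher level. As far as I understand, EdSurvey applies the cutpoints to each plausible value separately and then averages, which would explain the fractional N values, but this sketch does not reproduce that step.

```r
# Cutpoints as printed by achievementLevels() for PISA 2018 reading
cutpoints <- c(189.33, 262.04, 334.75, 407.47, 480.18, 552.89, 625.61, 698.32)

level_names <- c(
  "Below Proficiency Level 1c", "At Proficiency Level 1c",
  "At Proficiency Level 1b", "At Proficiency Level 1a",
  "At Proficiency Level 2", "At Proficiency Level 3",
  "At Proficiency Level 4", "At Proficiency Level 5",
  "At Proficiency Level 6"
)

# classify_score() is a hypothetical helper, not an EdSurvey function.
# right = FALSE makes each interval [lower, upper), an assumption on my part.
classify_score <- function(score) {
  cut(score,
    breaks = c(-Inf, cutpoints, Inf),
    labels = level_names, right = FALSE
  )
}

# Illustrative scores only
example_scores <- c(157.3, 425.9, 504.8, 578.4, 810.5)
data.frame(score = example_scores, level = classify_score(example_scores))
```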
SEM Lavaan
cntstuid cntschid privatesch
84000001: 1 84000061: 39 private: 202
84000002: 1 84000063: 39 public :4636
84000003: 1 84000168: 39
84000004: 1 84000069: 38
84000006: 1 84000073: 38
84000007: 1 84000083: 38
(Other) :4832 (Other) :4607
schltype stratio schsize
PUBLIC:4636 Min. : 1.667 Min. : 22
PRIVATE INDEPENDENT: 163 1st Qu.: 13.100 1st Qu.: 639
PRIVATE GOVERNMENT-DEPENDENT: 13 Median : 16.154 Median :1411
NA's : 26 Mean : 17.523 Mean :1491
3rd Qu.: 19.000 3rd Qu.:2076
Max. :100.000 Max. :4507
NA's :761 NA's :559
totat proatce proat5ab proat5am
Min. : 1.00 Min. :0.0000 Min. :0.0116 Min. :0.0167
1st Qu.: 46.00 1st Qu.:0.9765 1st Qu.:0.6095 1st Qu.:0.2692
Median : 80.50 Median :1.0000 Median :1.0000 Median :0.4727
Mean : 85.83 Mean :0.9407 Mean :0.8006 Mean :0.4843
3rd Qu.:114.00 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.6593
Max. :280.00 Max. :1.0000 Max. :1.0000 Max. :1.0000
NA's :702 NA's :802 NA's :861 NA's :853
proat6 clsize reading_score
Min. :0.0000 16-20 STUDENTS: 478 Min. :157.3
1st Qu.:0.0000 21-25 STUDENTS:1271 1st Qu.:425.9
Median :0.0154 26-30 STUDENTS:1660 Median :504.8
Mean :0.0227 31-35 STUDENTS: 645 Mean :500.6
3rd Qu.:0.0330 36-40 STUDENTS: 123 3rd Qu.:578.4
Max. :0.2222 15 STUDENTS OR FEWER: 98 Max. :810.5
NA's :886 NA's : 563
The Models
A. Baseline Model
base_model <- "
  # Variances Only
  reading_score ~~ reading_score
  proatce ~~ proatce
  proat5ab ~~ proat5ab
  proat5am ~~ proat5am
  proat6 ~~ proat6
  stratio ~~ stratio
"
base_1 <- sem(base_model, data = stu_data)
summary(base_1, fit.measures = TRUE)
lavaan 0.6.14 ended normally after 31 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 6
Used Total
Number of observations 3820 4838
Model Test User Model:
Test statistic 367.163
Degrees of freedom 15
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 367.163
Degrees of freedom 15
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.000
Tucker-Lewis Index (TLI) -0.000
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -26868.397
Loglikelihood unrestricted model (H1) -26684.815
Akaike (AIC) 53748.794
Bayesian (BIC) 53786.282
Sample-size adjusted Bayesian (SABIC) 53767.216
Root Mean Square Error of Approximation:
RMSEA 0.078
90 Percent confidence interval - lower 0.072
90 Percent confidence interval - upper 0.085
P-value H_0: RMSEA <= 0.050 0.000
P-value H_0: RMSEA >= 0.080 0.362
Standardized Root Mean Square Residual:
SRMR 0.066
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Variances:
Estimate Std.Err z-value P(>|z|)
reading_score 10638.977 243.435 43.704 0.000
proatce 0.022 0.001 43.704 0.000
proat5ab 0.075 0.002 43.704 0.000
proat5am 0.062 0.001 43.704 0.000
proat6 0.001 0.000 43.704 0.000
stratio 50.308 1.151 43.704 0.000
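We compared the Test User Model statistics with the Baseline Model. Because the user model specifies only the variances of the six variables, it is the same as the independence model that lavaan uses as its default baseline. The output confirms this: the model chi-square is 367.163 with 15 degrees of freedom, which matches the baseline model exactly, and that is also why CFI and TLI come out at 0.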
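The fit statistics printed above can be reproduced by hand from the same output, which helped me understand where they come from. This is only a sketch using the rounded values from the summary: the degrees of freedom are the number of unique variances and covariances minus the free parameters, the chi-square is twice the gap between the two log-likelihoods, and for RMSEA I am assuming the common formula sqrt(max(chisq - df, 0) / (df * N)).

```r
# Values copied from summary(base_1, fit.measures = TRUE)
n_vars <- 6            # observed variables in the model
n_free <- 6            # "Number of model parameters"
n_obs <- 3820          # observations used
loglik_h0 <- -26868.397  # user model
loglik_h1 <- -26684.815  # unrestricted model

# Unique elements of a 6 x 6 covariance matrix, minus free parameters
n_moments <- n_vars * (n_vars + 1) / 2
df_model <- n_moments - n_free

# Likelihood-ratio chi-square (matches the printed value up to rounding)
chisq <- 2 * (loglik_h1 - loglik_h0)

# RMSEA under the assumed formula
rmsea <- sqrt(max(chisq - df_model, 0) / (df_model * n_obs))

c(df = df_model, chisq = round(chisq, 3), rmsea = round(rmsea, 3))
```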
B. Regression Models
# 1. Bachelor Degree Model
bach_model <- "
  reading_score ~ 1 + proat5ab
"
regression_1 <- sem(bach_model, data = stu_data)
coef(regression_1)
reading_score~1 reading_score~proat5ab
523.894 -27.237
reading_score~~reading_score
10713.953
# Bachelor and Masters Model
mas_model <- "
  reading_score ~ 1 + proat5ab + proat5am
"
regression_2 <- sem(mas_model, data = stu_data)
coef(regression_2)
reading_score~1 reading_score~proat5ab
511.137 -28.034
reading_score~proat5am reading_score~~reading_score
26.854 10576.327
# Doctoral Degree Model
doc_model <- "
  reading_score ~ 1 + proat5ab + proat5am + proat6
"
regression_3 <- sem(doc_model, data = stu_data)
coef(regression_3)
reading_score~1 reading_score~proat5ab
508.435 -25.789
reading_score~proat5am reading_score~proat6
25.000 99.423
reading_score~~reading_score
10533.188
# Academic Attainment and Teacher Certification
tc_model <- "
  reading_score ~ 1 + proat5ab + proat5am + proat6 + proatce + proat6:proatce
"
regression_4 <- sem(tc_model, data = stu_data)
coef(regression_4)
reading_score~1 reading_score~proat5ab
563.843 -23.578
reading_score~proat5am reading_score~proat6
28.621 -1576.028
reading_score~proatce reading_score~proat6:proatce
-61.228 1720.194
reading_score~~reading_score
10495.767
# ST-Ratio and Other Variables
stratio_model <- "
  reading_score ~ 1 + proat5ab + proat5am + proat6 + proatce + stratio + proatce:stratio
  proatce ~ proat5ab + proat5am + proat6 + stratio
  stratio ~ proat5ab + proat5am + proat6
"
regression_5 <- sem(stratio_model, data = stu_data)
summary(regression_5, standardized = TRUE, rsquare = TRUE, fit.measures = TRUE)
lavaan 0.6.14 ended normally after 32 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 21
Used Total
Number of observations 3820 4838
Model Test User Model:
Test statistic 16990.878
Degrees of freedom 5
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 23163.744
Degrees of freedom 18
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.266
Tucker-Lewis Index (TLI) -1.642
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -47118.223
Loglikelihood unrestricted model (H1) -38622.784
Akaike (AIC) 94278.446
Bayesian (BIC) 94409.655
Sample-size adjusted Bayesian (SABIC) 94342.926
Root Mean Square Error of Approximation:
RMSEA 0.943
90 Percent confidence interval - lower 0.931
90 Percent confidence interval - upper 0.955
P-value H_0: RMSEA <= 0.050 0.000
P-value H_0: RMSEA >= 0.080 1.000
Standardized Root Mean Square Residual:
SRMR 0.170
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
reading_score ~
proat5ab -28.472 6.067 -4.693 0.000 -28.472 -0.076
proat5am 31.964 6.797 4.702 0.000 31.964 0.077
proat6 50.699 54.999 0.922 0.357 50.699 0.015
proatce -39.099 11.185 -3.496 0.000 -39.099 -0.057
stratio 0.856 0.236 3.620 0.000 0.856 0.059
proatce:strati 0.644 0.223 2.889 0.004 0.644 0.046
proatce ~
proat5ab -0.014 0.009 -1.598 0.110 -0.014 -0.026
proat5am 0.097 0.010 9.973 0.000 0.097 0.161
proat6 0.005 0.080 0.063 0.950 0.005 0.001
stratio 0.000 0.000 1.327 0.184 0.000 0.022
stratio ~
proat5ab 1.447 0.415 3.490 0.000 1.447 0.056
proat5am -3.042 0.457 -6.659 0.000 -3.042 -0.107
proat6 32.440 3.729 8.699 0.000 32.440 0.140
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.reading_score 520.629 12.979 40.114 0.000 520.629 5.055
.proatce 0.905 0.011 85.779 0.000 0.905 6.051
.stratio 16.475 0.422 39.005 0.000 16.475 2.323
proatce:strati 16.030 0.120 133.689 0.000 16.030 2.163
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.reading_score 10409.928 238.194 43.704 0.000 10409.928 0.981
.proatce 0.022 0.000 43.704 0.000 0.022 0.974
.stratio 48.805 1.117 43.704 0.000 48.805 0.970
proatce:strati 54.921 1.257 43.704 0.000 54.921 1.000
R-Square:
Estimate
reading_score 0.019
proatce 0.026
stratio 0.030
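One thing I wanted to verify is how the R-Square block relates to the rest of the output. For each outcome, R-square should equal one minus the standardized residual variance (the Std.all column in the Variances block). The sketch below checks this using the rounded values printed above, so it agrees only to three decimals.

```r
# Std.all residual variances copied from the Variances block
std_resid_var <- c(reading_score = 0.981, proatce = 0.974, stratio = 0.970)

# R-square = 1 - standardized residual variance
r_square <- 1 - std_resid_var
round(r_square, 3)
```

All three values are tiny, so the teacher-qualification and student-teacher ratio variables explain only about 2 to 3 percent of the variance in each outcome.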
labels <- list(
  stratio = "Student-Teacher Ratio",
  proat5am = "% of Teachers with Masters Degree",
  proat5ab = "% of Teachers with Bachelors Degree",
  proat6 = "% of Teachers with Doctoral Degree",
  proatce = "% of Teachers Fully Certified",
  reading_score = "Reading Score"
)
lavaanPlot(
model = regression_5, labels = labels,
node_options = list(shape = "box", fontname = "Helvetica"),
coefs = TRUE, sig = .05, stand = TRUE, stars = TRUE
)
Further Analysis in Next File.
References
- Ciol, M. A., Hoffman, J. M., Dudgeon, B. J., Shumway-Cook, A., Yorkston, K. M., & Chan, L. (2006). Understanding the use of weights in the analysis of data from multistage surveys. Archives of Physical Medicine and Rehabilitation, 87(2), 299-303. https://doi.org/10.1016/j.apmr.2005.09.021
- Morris, T. P., White, I. R., & Royston, P. (2014). Tuning multiple imputation by predictive mean matching and local residual draws. BMC Medical Research Methodology, 14, Article 75. https://doi.org/10.1186/1471-2288-14-75
- Austin, P. C., White, I. R., Lee, D. S., & van Buuren, S. (2021). Missing data in clinical research: A tutorial on multiple imputation. Canadian Journal of Cardiology, 37(9), 1322-1331. https://doi.org/10.1016/j.cjca.2020.11.010