Part I Part II Part III

AUTHOR

AFFILIATION

K-16 Literacy Center at University of Texas at Tyler

PUBLISHED

February 28, 2023

### Loading Required Packages
library(tidyverse)
library(haven)
library(EdSurvey)
library(Dire)
library(WeMix)
library(ggplot2)

Reading PISA 2018 Data and Subsetting the US Data

eds_pisa <- EdSurvey::readPISA(
  path = "C:/Users/nghimire/OneDrive - The University of Texas at Tyler/Redirected Folders/Documents/edsurvey_PISA_USA/PISA/2018",
  database = "INT", countries = "usa", cognitive = "score"
)

Found cached data for country code "usa"

dim(eds_pisa)

[1] 4838 5045

# eds_pisa$w_fstuwt

It’s massive data set. I am surprised by the volume of the columns!!Showing all 5045 column names doesn’t make sense. I would like to printout just first 100 columns. Let’s see, what they are.

hd <- head(colnames(eds_pisa), 10)
ht <- tail(colnames(eds_pisa), 10)
cbind(hd, ht)

      [,1]       [,2]              
 [1,] "ROWID"    "creactiv"        
 [2,] "cntryid"  "edushort"        
 [3,] "cnt"      "staffshort"      
 [4,] "cntschid" "stubeha"         
 [5,] "cntstuid" "teachbeha"       
 [6,] "cyc"      "scmceg"          
 [7,] "natcen"   "w_schgrnrabwt"   
 [8,] "stratum"  "w_fstuwt_sch_sum"
 [9,] "subnatio" "senwt.sch_qqq"   
[10,] "oecd"     "ver_dat.sch_qqq"

Amazing!! I seem to know some of them but I have no idea about any of these variables.

Understanding the Data

Checking the structures of some of the variables and labels. First of all, I want to check how many plausible values are there for each cognitive test areas (e.g., math, science, and reading).

# showWeights(eds_pisa, verbose = TRUE)
showPlausibleValues(eds_pisa)

There are 9 subject scale(s) or subscale(s) in this
  edsurvey.data.frame:
'math' subject scale or subscale with 10 plausible values.

'read' subject scale or subscale with 10 plausible values (the
  default).

'scie' subject scale or subscale with 10 plausible values.

'glcm' subject scale or subscale with 10 plausible values.

'rcli' subject scale or subscale with 10 plausible values.

'rcun' subject scale or subscale with 10 plausible values.

'rcer' subject scale or subscale with 10 plausible values.

'rtsn' subject scale or subscale with 10 plausible values.

'rtml' subject scale or subscale with 10 plausible values.

Based on the information, the data set contains 10-plausible values for reading, and I want to learn a little more about them by comparing their summary.

Loos like they are fairly homogeneous, but they are different in terms of their values. The OECD report shows that the average Reading scores for US test takers was 505. The median not mean for pV2read shows the same value but all mean values are smaller. Taking average of all of the means would provide us:

$Average of Means: (500.2 + 500.8 + 503.8 + 503.1 + 500.5 + 501.0 + 499.9 + 500.3 + 501.1 + 500.6)/10 = 500.58$ $Average of Medians: (503.6 + 505.2 + 503.8 + 503.1 + 504.5 + 504.4 + 502.6 + 504.0 + 503.9 + 503.6)/10 = 503.87$ $Median of Median of: 502.6, 503.1, 503.6, 503.6, 503.8, 503.9, 504.0, 504.4, 504.5, 505.2 = (503.8 + 503.9)/2 = 503.85$ Based on these statistics, median-of-median for dependent variable would be much closer to reported scores, followed by median-of-median. Where, the mean-of-means is no where close to the reported average scores. Thus, any analyses conducted using either any single plausible value or an average of all plausible values would lead us to a wrong findings. Let’s come back to this point after I check some other variables.

n_distinct(eds_pisa$ratcmp1)

[1] 93

levelsSDF(varnames = "ratcmp1", data = eds_pisa)

Levels for Variable 'ratcmp1' (Lowest level first):
    995. VALID SKIP* (n = 0)
    997. NOT APPLICABLE* (n = 0)
    998. INVALID* (n = 0)
    999. NO RESPONSE* (n = 0)
    NOTE: * indicates an omitted level.

I was not sure what the ratcmp1 variable was and if it had any levels. I requested the codebook using showCodebook(eds_data_full) and figured out that this variable is defined as the index of availability of computers (RATCMP1) is the ratio of computers available to 15-year-olds for educational purposes to the total number of students in the modal grade for 15-year-olds.

As seen above, there were 93 distinct computer indicators among the US samples and the variable does not have any labels. It should be a fairly straight continuous variables with some missing values. However, I don’t need to know more because I am going to get rid of this variable as it has nothing to do with my current analysis. My focus will be more towards teacher samples.

# getStratumVar(data = eds_pisa, weightVar = "origwt")
# summary2(eds_pisa, "composite")
# summary2(eds_pisa, "composite", weightVar = "NULL")
showCutPoints(data = eds_pisa)

Achievement Levels:
  Mathematics:  357.77, 420.07, 482.38, 544.68, 606.99, 669.3
  Reading:  189.33, 262.04, 334.75, 407.47, 480.18, 552.89, 625.61, 698.32
  Science:  260.54, 334.94, 409.54, 484.14, 558.73, 633.33, 707.93

During our hands-on training at NAEP Winter Data Workshop-2023, I learned that they use “origwt” to weight the variable and “composite” as the composite value of all plausible math scores. I tried both expression but looks like they are not the cases, here.

Coming back to the false mean scores that I discussed earlier. I know that NAEP recommends using weighted samples instead of simple samples in analyses. Here’s the further discussion about using weighted values:

Why Weighting the Samples?

The weights account for the fraction of the population represented by each stratum and reflect the probability that an element of the stratum is selected to be in the sample. One can show that the weighted sample mean is a good estimator in the statistical sense of the population mean when the sampling is a stratified design (pp. 300).
The unweighted sample size is in fact the size of the only sample selected. The weighted sample size is nothing more than the size of the population represented by the sample which is already known, or can be calculated from the weights (pp. 301).
Stratification is often used when the population has groups that are different from each other regarding the variable of interest, such as students from different countries, states, and school districts in the PISA assessment. In such cases, we are usually interested in some inference (e.g, mean, proportion,total, ratio, etc.) about each startum (e.g., students from different ethnic status). The weighting comes into play when combining the inferences from the strata into an inference about the entire population (e.g., we are interested to identifying how the 4, 838 US 15-year-olds did in 2018 PISA Reading Assessment and how they compare with each other based on their ethnicity, and we are going to make an inference about the whole US 15-year-olds in the year 2018, which was close to ~12,506,174). [@Ciol et al., 2006, pp. 301]

Let’s come back to this point.

Subsetting the Reading Data

This file is a compiled version of all possible variables in the assessments (Cognitive- Reading, Math, Science, Digital Literacy; and Surveys- Student, Teacher, and School), thus, we can subset the data to suit our requirements for example school, student, or teacher etc. For now, I am going to subset student only data. The selected variables are the one useful for me in this analysis.

read_data_full <- EdSurvey::getData(eds_pisa,
  varnames = c(
    "ROWID", "cntschid", "cntstuid",
    "privatesch", "schltype", "stratio", "schsize",
    "totat", "proatce", "proat5ab", "proat5am", "proat6",
    "clsize", "teachbeha", "w_schgrnrabwt", "w_fstuwt_sch_sum",
    "read", "w_fstuwt"
  ), addAttributes = TRUE, omittedLevels = FALSE
)
# names(read_data_full)
options(scipen = 999)
summary(read_data_full[, -(27:106)])

     ROWID         cntschid           cntstuid         privatesch       
 Min.   :   1   Min.   :84000001   Min.   :84000001   Length:4838       
 1st Qu.:1210   1st Qu.:84000047   1st Qu.:84002155   Class :character  
 Median :2420   Median :84000086   Median :84004338   Mode  :character  
 Mean   :2420   Mean   :84000087   Mean   :84004300                     
 3rd Qu.:3629   3rd Qu.:84000129   3rd Qu.:84006418                     
 Max.   :4838   Max.   :84000175   Max.   :84008626                     
                                                                        
                         schltype       stratio           schsize    
 PRIVATE INDEPENDENT         : 163   Min.   :  1.667   Min.   :  22  
 PRIVATE GOVERNMENT-DEPENDENT:  13   1st Qu.: 13.100   1st Qu.: 639  
 PUBLIC                      :4636   Median : 16.154   Median :1411  
 NO RESPONSE                 :  26   Mean   : 17.523   Mean   :1491  
                                     3rd Qu.: 19.000   3rd Qu.:2076  
                                     Max.   :100.000   Max.   :4507  
                                     NA's   :761       NA's   :559   
     totat           proatce          proat5ab         proat5am     
 Min.   :  1.00   Min.   :0.0000   Min.   :0.0116   Min.   :0.0167  
 1st Qu.: 46.00   1st Qu.:0.9765   1st Qu.:0.6095   1st Qu.:0.2692  
 Median : 80.50   Median :1.0000   Median :1.0000   Median :0.4727  
 Mean   : 85.83   Mean   :0.9407   Mean   :0.8006   Mean   :0.4843  
 3rd Qu.:114.00   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.6593  
 Max.   :280.00   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
 NA's   :702      NA's   :802      NA's   :861      NA's   :853     
     proat6                  clsize       teachbeha       w_schgrnrabwt    
 Min.   :0.0000   26-30 STUDENTS:1660   Min.   :-2.0409   Min.   :  20.69  
 1st Qu.:0.0000   21-25 STUDENTS:1271   1st Qu.:-0.1274   1st Qu.:  42.70  
 Median :0.0154   31-35 STUDENTS: 645   Median : 0.2266   Median :  66.49  
 Mean   :0.0227   16-20 STUDENTS: 478   Mean   : 0.2720   Mean   : 120.02  
 3rd Qu.:0.0330   NO RESPONSE   : 277   3rd Qu.: 0.8952   3rd Qu.: 157.37  
 Max.   :0.2222   (Other)       : 221   Max.   : 1.9937   Max.   :1294.02  
 NA's   :886      NA's          : 286   NA's   :467                        
 w_fstuwt_sch_sum     pv1read         pv2read         pv3read     
 Min.   :  820.3   Min.   :161.3   Min.   :176.5   Min.   :132.4  
 1st Qu.:18102.3   1st Qu.:423.5   1st Qu.:424.6   1st Qu.:423.2  
 Median :21341.2   Median :503.6   Median :505.2   Median :503.8  
 Mean   :22421.2   Mean   :500.2   Mean   :500.8   Mean   :500.3  
 3rd Qu.:25895.6   3rd Qu.:578.7   3rd Qu.:578.4   3rd Qu.:577.9  
 Max.   :49343.6   Max.   :868.9   Max.   :898.5   Max.   :858.4  
                                                                  
    pv4read         pv5read         pv6read         pv7read     
 Min.   :140.3   Min.   :137.7   Min.   :128.1   Min.   :148.7  
 1st Qu.:426.1   1st Qu.:423.2   1st Qu.:424.4   1st Qu.:424.4  
 Median :503.1   Median :504.5   Median :504.4   Median :502.6  
 Mean   :501.1   Mean   :500.5   Mean   :501.0   Mean   :499.9  
 3rd Qu.:579.3   3rd Qu.:579.0   3rd Qu.:579.3   3rd Qu.:578.6  
 Max.   :834.1   Max.   :853.5   Max.   :844.8   Max.   :815.3  
                                                                
    pv8read         pv9read         pv10read        w_fstuwt     
 Min.   :170.9   Min.   :173.6   Min.   :167.8   Min.   : 262.8  
 1st Qu.:424.9   1st Qu.:426.0   1st Qu.:426.1   1st Qu.: 563.0  
 Median :504.0   Median :503.9   Median :503.6   Median : 661.7  
 Mean   :500.3   Mean   :501.1   Mean   :500.6   Mean   : 735.6  
 3rd Qu.:579.4   3rd Qu.:577.1   3rd Qu.:579.0   3rd Qu.: 854.5  
 Max.   :823.4   Max.   :818.1   Max.   :834.1   Max.   :2946.1

# showCodebook(read_data_full)
# View(showCodebook(read_data_full))

The truncated datafile contains all teacher related variables. I don’t need all of them. For example, I am not going to use the teacher behavior and ‘w_fstuwt_sch_sum’ variable. Thus, I got rid of them.

Creating A Composite Varialbe Using Ten Plausible Values

psych::alpha(read_data_full[, 17:26])


Reliability analysis   
Call: psych::alpha(x = read_data_full[, 17:26])

  raw_alpha std.alpha G6(smc) average_r S/N     ase mean  sd median_r
      0.99      0.99    0.99      0.94 152 0.00014  501 105     0.94

    95% confidence boundaries 
         lower alpha upper
Feldt     0.99  0.99  0.99
Duhachek  0.99  0.99  0.99

 Reliability if an item is dropped:
         raw_alpha std.alpha G6(smc) average_r S/N alpha se     var.r med.r
pv1read       0.99      0.99    0.99      0.94 137  0.00016 0.0000016  0.94
pv2read       0.99      0.99    0.99      0.94 136  0.00016 0.0000015  0.94
pv3read       0.99      0.99    0.99      0.94 137  0.00016 0.0000016  0.94
pv4read       0.99      0.99    0.99      0.94 137  0.00016 0.0000017  0.94
pv5read       0.99      0.99    0.99      0.94 136  0.00016 0.0000017  0.94
pv6read       0.99      0.99    0.99      0.94 136  0.00016 0.0000018  0.94
pv7read       0.99      0.99    0.99      0.94 137  0.00016 0.0000014  0.94
pv8read       0.99      0.99    0.99      0.94 136  0.00016 0.0000012  0.94
pv9read       0.99      0.99    0.99      0.94 136  0.00016 0.0000014  0.94
pv10read      0.99      0.99    0.99      0.94 136  0.00016 0.0000014  0.94

 Item statistics 
            n raw.r std.r r.cor r.drop mean  sd
pv1read  4838  0.97  0.97  0.97   0.96  500 108
pv2read  4838  0.97  0.97  0.97   0.97  501 108
pv3read  4838  0.97  0.97  0.97   0.96  500 108
pv4read  4838  0.97  0.97  0.97   0.96  501 108
pv5read  4838  0.97  0.97  0.97   0.97  500 108
pv6read  4838  0.97  0.97  0.97   0.97  501 108
pv7read  4838  0.97  0.97  0.97   0.96  500 108
pv8read  4838  0.97  0.97  0.97   0.97  500 108
pv9read  4838  0.97  0.97  0.97   0.97  501 107
pv10read 4838  0.97  0.97  0.97   0.97  501 108

Qualitative Descriptors of Cronbach’s alpha

.95 - 1.00: Excellent
.90 - .94: Great
.80 - .89: Good
.70 - .79: Acceptable
.60 - .69: Questionable, and
.00 - .59: Unacceptable

Based on the statistics, the raw alpha of .99 shows Excellent internal consistency among the plausible values. The reliability would not get affected even if we drop one or any of the plausible values. Given the alpha = .99, and all ten plausible values appear to measure same thing, we decided to retain all ten plausible values for a composite variable.

read_data_full$composite <- rowMeans(read_data_full[, 17:26], na.rm = TRUE)
summary(read_data_full$composite)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  157.3   425.9   504.8   500.6   578.4   810.5

Nope this is not what I wanted!! I got to try something new but none of the average means when a corresponding item is dropped from the instrument in the Item Statistics are close to the reported mean. The outcomes are worse than what I previously reported. The average of the composite value is also much lower,i.e., 500.6 than 505. Sadly, I cannot proceed with this so called ‘composite’ value as dependent variable.

Let’s get back to circle one. The summary of the variables above give include some variable that I need to check again. Here’s the further summary in a tabular form:

rbind(
  summary(read_data_full$w_fstuwt),
  summary(read_data_full$w_fstuwt_sch_sum),
  summary(read_data_full$w_schgrnrabwt)
)

          Min.     1st Qu.      Median       Mean    3rd Qu.      Max.
[1,] 262.75044   563.04221   661.71657   735.6438   854.5203  2946.134
[2,] 820.25602 18102.33043 21341.22828 22421.2347 25895.5895 49343.581
[3,]  20.68506    42.69652    66.48856   120.0234   157.3688  1294.020

Based on the statistics, w_fstuwt can be of some merit to my study but not w_fstuwt_sch_sum & w_schgrnrabwt.

Now, I would like to go back to my original dataset and check if there is any variable that is listed as weights.

showWeights(eds_pisa, verbose = TRUE)

There is 1 full sample weight in this edsurvey.data.frame:
  'w_fstuwt' with 80 JK replicate weights (the default).
    Jackknife replicate weight variables associated with the full
    sample weight 'w_fstuwt':
    'w_fsturwt1', 'w_fsturwt2', 'w_fsturwt3', 'w_fsturwt4',
    'w_fsturwt5', 'w_fsturwt6', 'w_fsturwt7', 'w_fsturwt8',
    'w_fsturwt9', 'w_fsturwt10', 'w_fsturwt11', 'w_fsturwt12',
    'w_fsturwt13', 'w_fsturwt14', 'w_fsturwt15', 'w_fsturwt16',
    'w_fsturwt17', 'w_fsturwt18', 'w_fsturwt19', 'w_fsturwt20',
    'w_fsturwt21', 'w_fsturwt22', 'w_fsturwt23', 'w_fsturwt24',
    'w_fsturwt25', 'w_fsturwt26', 'w_fsturwt27', 'w_fsturwt28',
    'w_fsturwt29', 'w_fsturwt30', 'w_fsturwt31', 'w_fsturwt32',
    'w_fsturwt33', 'w_fsturwt34', 'w_fsturwt35', 'w_fsturwt36',
    'w_fsturwt37', 'w_fsturwt38', 'w_fsturwt39', 'w_fsturwt40',
    'w_fsturwt41', 'w_fsturwt42', 'w_fsturwt43', 'w_fsturwt44',
    'w_fsturwt45', 'w_fsturwt46', 'w_fsturwt47', 'w_fsturwt48',
    'w_fsturwt49', 'w_fsturwt50', 'w_fsturwt51', 'w_fsturwt52',
    'w_fsturwt53', 'w_fsturwt54', 'w_fsturwt55', 'w_fsturwt56',
    'w_fsturwt57', 'w_fsturwt58', 'w_fsturwt59', 'w_fsturwt60',
    'w_fsturwt61', 'w_fsturwt62', 'w_fsturwt63', 'w_fsturwt64',
    'w_fsturwt65', 'w_fsturwt66', 'w_fsturwt67', 'w_fsturwt68',
    'w_fsturwt69', 'w_fsturwt70', 'w_fsturwt71', 'w_fsturwt72',
    'w_fsturwt73', 'w_fsturwt74', 'w_fsturwt75', 'w_fsturwt76',
    'w_fsturwt77', 'w_fsturwt78', 'w_fsturwt79', and 'w_fsturwt80'

Wow! It is indeed w_fstuwt variable. There are 80 different weights, i.e., w_fsturwt1: w_fsturwt80 for each of the 4838 students. We can use the w-fstuwt as the composite of all of these 80 numbers. The following information comes from the NCES websites, which describes the ~ Jackknife Replication Method~: “A replication method that estimates standard errors of percentages and other statistics. It is particularly suited to complex sample designs. In the jackknife, sample units are grouped into pairs (replicate groups). Portions of the sample (replicates) are formed by repeatedly omitting one half of the units in one of the replicate groups and calculating the desired statistic (replicate estimate). The number of replicate estimates is equal to the number of replicate groups. The variability among the replicate estimates is used to estimate the overall sampling variability” (https://nces.ed.gov/nationsreportcard/glossary.aspx#jackknife).

Now, I can move forward and conduct descriptive analyses. Before starting a series of descriptive statistics, I want to see whether there are difference in weighted and unweighted means of reading scores among students. Here’s the unweighted mean score:

summary2(read_data_full, "read", weightVar = NULL)

Estimates are not weighted.
   Variable    N    Min.  1st Qu.   Median     Mean  3rd Qu.    Max.       SD
1   pv1read 4838 161.343 423.4801 503.6375 500.1502 578.6824 868.870 108.4549
2   pv2read 4838 176.458 424.5367 505.2045 500.7907 578.4439 898.478 107.9547
3   pv3read 4838 132.423 423.1459 503.8005 500.3032 577.9054 858.393 107.8982
4   pv4read 4838 140.293 426.1104 503.0610 501.0978 579.3880 834.076 108.4841
5   pv5read 4838 137.737 423.1717 504.4565 500.4783 579.0315 853.488 108.0790
6   pv6read 4838 128.111 424.4202 504.4380 500.9541 579.3228 844.836 108.1850
7   pv7read 4838 148.739 424.3686 502.5630 499.8935 578.6955 815.275 107.7119
8   pv8read 4838 170.907 424.9227 503.9935 500.3028 579.3685 823.427 108.1128
9   pv9read 4838 173.639 425.9822 503.8790 501.0805 577.1258 818.066 107.4320
10 pv10read 4838 167.822 426.0661 503.5685 500.6259 579.0812 834.091 107.9336
   NA's
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     0
10    0

The means, as in the past, not close to reported 505. These are the means for all ten plausible scores in reading. Lets check the weighted mean for reading scores among US 15-year-olds who took part in the PISA 2018 assessment. Either of the codes below would give us the same results.

# summary2(read_data_full, "read")
summary2(read_data_full, "read", weightVar = "w_fstuwt")

Estimates are weighted using the weight variable 'w_fstuwt'
  Variable    N Weighted N     Min.  1st Qu.   Median     Mean  3rd Qu.  Max.
1     read 4838    3559045 153.7472 429.7936 509.7353 505.3528 583.5768 844.9
        SD NA's Zero weights
1 107.9064    0            0

Amazing! That’s what I want. The mean reading score for the US student was 505.3528. Thus, w_fstuwt would be the weighted sample and read the outcome/dependent variable. Let’s draw a histogram and look at the distribution.

Descriptive Statistics

Average Scores Based on Class Size

clsize_reading <- edsurveyTable(formula = read ~ clsize, data = read_data_full)
clsize_reading


Formula: read ~ clsize 

Plausible values: 10
jrrIMax: 1
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
full data n: 4838
n used: 4275


Summary Table:
               clsize    N      WTD_N       PCT  SE(PCT)     MEAN  SE(MEAN)
 15 STUDENTS OR FEWER   98   78303.96  2.521100 1.020527 490.9990 10.635406
       16-20 STUDENTS  478  361655.05 11.643965 2.477297 505.8996 10.771112
       21-25 STUDENTS 1271  911449.99 29.345344 3.874926 507.4223  8.233802
       26-30 STUDENTS 1660 1134459.30 36.525425 3.979064 508.5477  6.052202
       31-35 STUDENTS  645  501987.16 16.162144 2.511393 511.2569  9.204440
       36-40 STUDENTS  123  118088.67  3.802022 1.777732 530.6957 16.336060

School Type and Reading Scores

schltype_reading <- edsurveyTable(
  formula = read ~ schltype,
  data = read_data_full
)
schltype_reading


Formula: read ~ schltype 

Plausible values: 10
jrrIMax: 1
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
full data n: 4838
n used: 4812


Summary Table:
                      schltype    N      WTD_N       PCT   SE(PCT)     MEAN
1          PRIVATE INDEPENDENT  163  203557.99  5.757412 1.2083048 525.3717
2 PRIVATE GOVERNMENT-DEPENDENT   13   22100.53  0.625089 0.3349908 499.0487
3                       PUBLIC 4636 3309922.77 93.617499 1.2244331 503.6134
   SE(MEAN)
1 21.037771
2 27.273720
3  3.421781

Student Teacher Ratio with Omitted Variables like NAs

summary2(read_data_full, "stratio", omittedLevels = FALSE)

Estimates are weighted using the weight variable 'w_fstuwt'
  Variable    N Weighted N   Min. 1st Qu. Median     Mean 3rd Qu. Max.       SD
1  stratio 4838    3559045 1.6667 12.5827     16 17.21917 18.9474  100 9.444379
  NA's Zero weights
1  761            0

Student Teacher Ratio without Omitted Variables

summary2(read_data_full, "stratio", omittedLevels = TRUE)

Estimates are weighted using the weight variable 'w_fstuwt'
  Variable    N Weighted N   Min. 1st Qu. Median     Mean 3rd Qu. Max.       SD
1  stratio 4077    2932840 1.6667 12.5827     16 17.21917 18.9474  100 9.444379
  NA's Zero weights
1    0            0

School Size (by Total Students)

summary2(read_data_full, "schsize", omittedLevels = FALSE)

Estimates are weighted using the weight variable 'w_fstuwt'
  Variable    N Weighted N Min. 1st Qu. Median     Mean 3rd Qu. Max.       SD
1  schsize 4838    3559045   22     639   1411 1490.035    2061 4507 986.5392
  NA's Zero weights
1  559            0

Total Teachers by School

summary2(read_data_full, "totat", omittedLevels = FALSE)

Estimates are weighted using the weight variable 'w_fstuwt'
  Variable    N Weighted N Min. 1st Qu. Median     Mean 3rd Qu. Max.       SD
1    totat 4838    3559045    1      48     80 86.36344     115  280 51.42637
  NA's Zero weights
1  702            0

Percentage of Teachers Fully Certified

summary2(read_data_full, "proatce", omittedLevels = FALSE)

Estimates are weighted using the weight variable 'w_fstuwt'
  Variable    N Weighted N Min. 1st Qu. Median      Mean 3rd Qu. Max.        SD
1  proatce 4838    3559045    0  0.9685      1 0.9256084       1    1 0.2051878
  NA's Zero weights
1  802            0

Percentage of Teachers with Bachelor Degrees or Above Degrees

summary2(read_data_full, "proat5ab", omittedLevels = FALSE)

Estimates are weighted using the weight variable 'w_fstuwt'
  Variable    N Weighted N   Min. 1st Qu. Median      Mean 3rd Qu. Max.
1 proat5ab 4838    3559045 0.0116  0.6541      1 0.8113005       1    1
         SD NA's Zero weights
1 0.2747722  861            0

Percentage of Teachers with Master Degree or Above (Per-School)

summary2(read_data_full, "proat5am", omittedLevels = FALSE)

Estimates are weighted using the weight variable 'w_fstuwt'
  Variable    N Weighted N   Min. 1st Qu. Median    Mean 3rd Qu. Max.       SD
1 proat5am 4838    3559045 0.0167  0.2974    0.5 0.50363  0.6818    1 0.250698
  NA's Zero weights
1  853            0

Percentage of Teachers by School School Having Doctoral Degree or Other Higher Degrees

summary2(read_data_full, "proat6", omittedLevels = FALSE)

Estimates are weighted using the weight variable 'w_fstuwt'
  Variable    N Weighted N Min. 1st Qu. Median       Mean 3rd Qu.   Max.
1   proat6 4838    3559045    0       0 0.0152 0.02227463   0.033 0.2222
          SD NA's Zero weights
1 0.02850491  886            0

Linear Regressions

With Continuous Variables

lm_1 <- EdSurvey::lm.sdf(
  formula = read ~ stratio,
  data = read_data_full
)
summary(lm_1)


Formula: read ~ stratio

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 4077

Coefficients:
                 coef        se       t    dof            Pr(>|t|)    
(Intercept) 496.60168   9.25180 53.6762 115.00 <0.0000000000000002 ***
stratio       0.67519   0.49094  1.3753 104.03               0.172    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0036

lm_2 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + schsize,
  data = read_data_full
)
summary(lm_2)


Formula: read ~ stratio + schsize

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 4077

Coefficients:
                   coef          se        t    dof             Pr(>|t|)    
(Intercept) 491.9048769   7.9933326 61.53940 97.909 < 0.0000000000000002 ***
stratio       0.2737335   0.4646726  0.58909 68.285              0.55775    
schsize       0.0079432   0.0041571  1.91078 94.134              0.05908 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0076

lm_ext <- EdSurvey::lm.sdf(
  formula = read ~ stratio + proatce,
  data = read_data_full
)
summary(lm_ext)


Formula: read ~ stratio + proatce

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3977

Coefficients:
                 coef        se       t     dof              Pr(>|t|)    
(Intercept) 505.02654  23.25316 21.7186 107.055 < 0.00000000000000022 ***
stratio       1.47007   0.45072  3.2616  86.933              0.001583 ** 
proatce     -24.12443  22.45719 -1.0742 116.982              0.284925    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0111

lm_3 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + schsize + proatce,
  data = read_data_full
)
summary(lm_3)


Formula: read ~ stratio + schsize + proatce

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3977

Coefficients:
                   coef          se       t     dof             Pr(>|t|)    
(Intercept) 507.3025663  23.2178607 21.8497 105.349 < 0.0000000000000002 ***
stratio       1.0301117   0.5913361  1.7420  84.508              0.08515 .  
schsize       0.0053863   0.0039685  1.3572  80.359              0.17850    
proatce     -27.0344119  22.1922893 -1.2182 115.883              0.22563    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0125

lm_4 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + schsize + proatce + proat5ab,
  data = read_data_full
)
summary(lm_4)


Formula: read ~ stratio + schsize + proatce + proat5ab

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3943

Coefficients:
                   coef          se       t     dof             Pr(>|t|)    
(Intercept) 529.3286009  28.0319869 18.8830  76.467 < 0.0000000000000002 ***
stratio       1.1266959   0.6203051  1.8164  81.183              0.07301 .  
schsize       0.0049045   0.0041054  1.1946  81.893              0.23568    
proatce     -29.8925431  25.3737597 -1.1781 115.449              0.24118    
proat5ab    -24.5453534  16.3549508 -1.5008  95.315              0.13672    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0169

lm_ext1 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + proatce + proat5ab,
  data = read_data_full
)
summary(lm_ext1)


Formula: read ~ stratio + proatce + proat5ab

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3943

Coefficients:
                 coef        se       t     dof              Pr(>|t|)    
(Intercept) 528.59907  27.79512 19.0177  78.326 < 0.00000000000000022 ***
stratio       1.52870   0.46508  3.2869  85.945              0.001468 ** 
proatce     -27.84878  25.65859 -1.0854 118.118              0.279973    
proat5ab    -25.52185  16.11597 -1.5836  93.058              0.116669    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0157

lm_5 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + proatce +
    proat5ab + proat5am,
  data = read_data_full
)
summary(lm_5)


Formula: read ~ stratio + proatce + proat5ab + proat5am

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3916

Coefficients:
                 coef        se       t     dof              Pr(>|t|)    
(Intercept) 515.45068  24.06791 21.4165  60.629 < 0.00000000000000022 ***
stratio       1.57995   0.48489  3.2583  83.114              0.001624 ** 
proatce     -34.74642  23.70757 -1.4656 111.272              0.145570    
proat5ab    -27.82020  15.34054 -1.8135 101.332              0.072713 .  
proat5am     39.31266  16.80545  2.3393  78.328              0.021870 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0233

lm_ext3 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + proat5ab + proat5am,
  data = read_data_full
)
summary(lm_ext3)


Formula: read ~ stratio + proat5ab + proat5am

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3916

Coefficients:
                 coef        se       t     dof              Pr(>|t|)    
(Intercept) 484.36472  15.41869 31.4141  57.759 < 0.00000000000000022 ***
stratio       1.48223   0.45928  3.2273  82.764              0.001792 ** 
proat5ab    -25.24883  15.39741 -1.6398 106.996              0.103982    
proat5am     36.08978  16.41087  2.1991  82.620              0.030663 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0193

lm_ext4 <- EdSurvey::lm.sdf(
  formula = read ~ proat5ab + proat5am,
  data = read_data_full
)
summary(lm_ext4)


Formula: read ~ proat5ab + proat5am

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3916

Coefficients:
               coef      se       t     dof             Pr(>|t|)    
(Intercept) 510.956  15.771 32.3981  99.276 < 0.0000000000000002 ***
proat5ab    -24.414  15.208 -1.6053 104.398              0.11145    
proat5am     31.099  17.986  1.7291  94.263              0.08707 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0093

lm_6 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + proatce +
    proat5ab + proat5am + proat6,
  data = read_data_full
)
summary(lm_6)


Formula: read ~ stratio + proatce + proat5ab + proat5am + proat6

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3820

Coefficients:
                 coef        se       t     dof              Pr(>|t|)    
(Intercept) 511.82801  29.15910 17.5529  77.875 < 0.00000000000000022 ***
stratio       1.47362   0.52459  2.8091  88.937              0.006108 ** 
proatce     -31.76099  29.96758 -1.0598 110.988              0.291516    
proat5ab    -25.37473  15.49420 -1.6377  97.020              0.104725    
proat5am     36.75996  16.49687  2.2283  81.415              0.028618 *  
proat6      111.83668 118.57165  0.9432  64.920              0.349076    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0224

lm_7 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + proat5ab +
    proat5am + proat6 + totat,
  data = read_data_full
)
summary(lm_7)


Formula: read ~ stratio + proat5ab + proat5am + proat6 + totat

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3820

Coefficients:
                  coef         se        t     dof             Pr(>|t|)    
(Intercept) 480.993209  16.394572 29.33856  59.224 < 0.0000000000000002 ***
stratio       1.388043   0.514877  2.69587  86.273              0.00844 ** 
proat5ab    -23.430735  15.550534 -1.50675 104.568              0.13489    
proat5am     34.287480  16.093029  2.13058  85.190              0.03601 *  
proat6      115.044123 122.509258  0.93906  65.370              0.35115    
totat         0.025009   0.069862  0.35798  65.865              0.72150    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0197

lm_9 <- EdSurvey::lm.sdf(
  formula = read ~ stratio + schsize + totat +
    proatce + proat5ab + proat5am + proat6,
  data = read_data_full
)
summary(lm_9)


Formula: read ~ stratio + schsize + totat + proatce + proat5ab + proat5am +     proat6

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3820

Coefficients:
                  coef         se         t     dof             Pr(>|t|)    
(Intercept) 525.595403  33.489521 15.694324  81.642 < 0.0000000000000002 ***
stratio       0.034922   1.653351  0.021122  86.963              0.98320    
schsize       0.018719   0.018633  1.004630  84.438              0.31795    
totat        -0.227053   0.271520 -0.836228  99.273              0.40503    
proatce     -31.827247  29.302545 -1.086160 109.689              0.27979    
proat5ab    -23.140016  15.410040 -1.501619  97.277              0.13643    
proat5am     38.834932  16.785055  2.313661  83.559              0.02314 *  
proat6       97.021875 124.293114  0.780589  65.593              0.43785    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0243

b. Linear Regression with Categorial Variables

lm_clsize <- EdSurvey::lm.sdf(
  formula = read ~ clsize,
  data = read_data_full, relevels = list(clsize = "15 STUDENTS OR FEWER")
)
summary(lm_clsize)


Formula: read ~ clsize

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 4275

Coefficients:
                        coef      se        t     dof             Pr(>|t|)    
(Intercept)          490.999  10.635 46.16646 104.077 < 0.0000000000000002 ***
clsize16-20 STUDENTS  14.901  15.178  0.98174  78.498              0.32925    
clsize21-25 STUDENTS  16.423  13.293  1.23552  82.276              0.22015    
clsize26-30 STUDENTS  17.549  12.138  1.44582 105.553              0.15119    
clsize31-35 STUDENTS  20.258  13.668  1.48209  74.575              0.14253    
clsize36-40 STUDENTS  39.697  19.742  2.01081  82.348              0.04762 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0026

lm_stype <- EdSurvey::lm.sdf(
  formula = read ~ schltype,
  data = read_data_full, relevels = list(schltype = "PUBLIC")
)
summary(lm_stype)


Formula: read ~ schltype

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 4812

Coefficients:
                                         coef       se         t    dof
(Intercept)                          503.6134   3.4218 147.17874  79.84
schltypePRIVATE INDEPENDENT           21.7583  21.2573   1.02357 103.91
schltypePRIVATE GOVERNMENT-DEPENDENT  -4.5647  27.0502  -0.16875 104.37
                                                Pr(>|t|)    
(Intercept)                          <0.0000000000000002 ***
schltypePRIVATE INDEPENDENT                       0.3084    
schltypePRIVATE GOVERNMENT-DEPENDENT              0.8663    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0023

lm_stype_clsize <- EdSurvey::lm.sdf(
  formula = read ~ schltype + clsize,
  data = read_data_full
)
summary(lm_stype_clsize)


Formula: read ~ schltype + clsize

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 4249

Coefficients:
                                        coef      se        t     dof
(Intercept)                          509.077  17.929 28.39327  87.985
schltypePRIVATE GOVERNMENT-DEPENDENT -21.147  34.804 -0.60762 105.118
schltypePUBLIC                       -23.147  21.171 -1.09333 106.434
clsize16-20 STUDENTS                  12.654  14.567  0.86867  94.517
clsize21-25 STUDENTS                  18.036  12.037  1.49847  79.121
clsize26-30 STUDENTS                  22.139  11.544  1.91773  99.557
clsize31-35 STUDENTS                  25.327  13.392  1.89119  81.241
clsize36-40 STUDENTS                  44.766  19.943  2.24467  82.052
                                                 Pr(>|t|)    
(Intercept)                          < 0.0000000000000002 ***
schltypePRIVATE GOVERNMENT-DEPENDENT              0.54475    
schltypePUBLIC                                    0.27672    
clsize16-20 STUDENTS                              0.38723    
clsize21-25 STUDENTS                              0.13799    
clsize26-30 STUDENTS                              0.05801 .  
clsize31-35 STUDENTS                              0.06216 .  
clsize36-40 STUDENTS                              0.02748 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0059

c. Final Model

lm_final <- EdSurvey::lm.sdf(
  formula = read ~ stratio + proat5am + clsize,
  data = read_data_full
)
summary(lm_final)


Formula: read ~ stratio + proat5am + clsize

Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
Plausible values: 10
jrrIMax: 1
full data n: 4838
n used: 3960

Coefficients:
                          coef        se        t    dof              Pr(>|t|)
(Intercept)          452.50875  15.96763 28.33912 91.475 < 0.00000000000000022
stratio                1.93429   0.69888  2.76768 72.015              0.007172
proat5am              39.52769  15.74313  2.51079 71.337              0.014314
clsize16-20 STUDENTS  15.26166  16.37564  0.93197 93.371              0.353754
clsize21-25 STUDENTS   6.43676  13.67295  0.47077 81.769              0.639061
clsize26-30 STUDENTS   1.29426  12.60189  0.10270 91.351              0.918423
clsize31-35 STUDENTS  -6.91784  15.12315 -0.45743 87.218              0.648497
clsize36-40 STUDENTS  -8.72539  20.13689 -0.43330 78.707              0.665979
                        
(Intercept)          ***
stratio              ** 
proat5am             *  
clsize16-20 STUDENTS    
clsize21-25 STUDENTS    
clsize26-30 STUDENTS    
clsize31-35 STUDENTS    
clsize36-40 STUDENTS    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.0179

Achievement Levels

achievementLevels(
  achievementVars = c("read", "schltype"),
  aggregateBy = "schltype",
  data = read_data_full
)


AchievementVars: read, schltype
aggregateBy: schltype

Achievement Level Cutpoints:
189.33 262.04 334.75 407.47 480.18 552.89 625.61 698.32 

Plausible values: 10
jrrIMax: 1
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
full data n: 4838
n used: 4812


Discrete
                 read_Level                     schltype      N        wtdN
 Below Proficiency Level 1c          PRIVATE INDEPENDENT    0.0      0.0000
    At Proficiency Level 1c          PRIVATE INDEPENDENT    0.6    492.1536
    At Proficiency Level 1b          PRIVATE INDEPENDENT    3.2   3426.9881
    At Proficiency Level 1a          PRIVATE INDEPENDENT   17.6  20501.6921
     At Proficiency Level 2          PRIVATE INDEPENDENT   36.4  43147.0495
     At Proficiency Level 3          PRIVATE INDEPENDENT   46.6  54158.1832
     At Proficiency Level 4          PRIVATE INDEPENDENT   38.3  53412.2310
     At Proficiency Level 5          PRIVATE INDEPENDENT   14.8  19025.9151
     At Proficiency Level 6          PRIVATE INDEPENDENT    5.5   9393.7781
 Below Proficiency Level 1c PRIVATE GOVERNMENT-DEPENDENT    0.0      0.0000
    At Proficiency Level 1c PRIVATE GOVERNMENT-DEPENDENT    0.0      0.0000
    At Proficiency Level 1b PRIVATE GOVERNMENT-DEPENDENT    0.8   1072.2823
    At Proficiency Level 1a PRIVATE GOVERNMENT-DEPENDENT    2.5   4110.2384
     At Proficiency Level 2 PRIVATE GOVERNMENT-DEPENDENT    2.4   4321.7613
     At Proficiency Level 3 PRIVATE GOVERNMENT-DEPENDENT    2.2   3597.2551
     At Proficiency Level 4 PRIVATE GOVERNMENT-DEPENDENT    4.1   7308.0483
     At Proficiency Level 5 PRIVATE GOVERNMENT-DEPENDENT    1.0   1690.9432
     At Proficiency Level 6 PRIVATE GOVERNMENT-DEPENDENT    0.0      0.0000
 Below Proficiency Level 1c                       PUBLIC    3.4   2526.1568
    At Proficiency Level 1c                       PUBLIC   55.3  37708.2685
    At Proficiency Level 1b                       PUBLIC  275.3 186750.4032
    At Proficiency Level 1a                       PUBLIC  632.1 428236.1305
     At Proficiency Level 2                       PUBLIC 1013.2 701405.7180
     At Proficiency Level 3                       PUBLIC 1131.1 814456.2319
     At Proficiency Level 4                       PUBLIC  939.3 694113.8394
     At Proficiency Level 5                       PUBLIC  470.2 355365.4783
     At Proficiency Level 6                       PUBLIC  116.1  89360.5412
     Percent StandardError
  0.00000000            NA
  0.24177563    0.21824383
  1.68354390    1.05275936
 10.07167150    4.51614097
 21.19644102    4.05335947
 26.60577609    6.20357569
 26.23931922    5.11338261
  9.34668054    5.35363498
  4.61479211    2.38453015
  0.00000000            NA
  0.00000000            NA
  4.85184027    8.09313952
 18.59791904    8.33685589
 19.55501329    6.90576736
 16.27678297    6.37389980
 33.06730088   11.71438156
  7.65114356    6.73699248
  0.00000000            NA
  0.07632072    0.05034374
  1.13924919    0.28895635
  5.64213779    0.53020234
 12.93794933    0.73095848
 21.19099952    0.83321432
 24.60650260    0.84448127
 20.97069594    0.80925255
 10.73636768    0.68619164
  2.69977723    0.35046562

SEM Lavaan

     cntstuid        cntschid      privatesch  
 84000001:   1   84000061:  39   private: 202  
 84000002:   1   84000063:  39   public :4636  
 84000003:   1   84000168:  39                 
 84000004:   1   84000069:  38                 
 84000006:   1   84000073:  38                 
 84000007:   1   84000083:  38                 
 (Other) :4832   (Other) :4607                 
                          schltype       stratio           schsize    
                        PUBLIC:4636   Min.   :  1.667   Min.   :  22  
           PRIVATE INDEPENDENT: 163   1st Qu.: 13.100   1st Qu.: 639  
  PRIVATE GOVERNMENT-DEPENDENT:  13   Median : 16.154   Median :1411  
 NA's                         :  26   Mean   : 17.523   Mean   :1491  
                                      3rd Qu.: 19.000   3rd Qu.:2076  
                                      Max.   :100.000   Max.   :4507  
                                      NA's   :761       NA's   :559   
     totat           proatce          proat5ab         proat5am     
 Min.   :  1.00   Min.   :0.0000   Min.   :0.0116   Min.   :0.0167  
 1st Qu.: 46.00   1st Qu.:0.9765   1st Qu.:0.6095   1st Qu.:0.2692  
 Median : 80.50   Median :1.0000   Median :1.0000   Median :0.4727  
 Mean   : 85.83   Mean   :0.9407   Mean   :0.8006   Mean   :0.4843  
 3rd Qu.:114.00   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.6593  
 Max.   :280.00   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
 NA's   :702      NA's   :802      NA's   :861      NA's   :853     
     proat6                        clsize     reading_score  
 Min.   :0.0000         16-20 STUDENTS: 478   Min.   :157.3  
 1st Qu.:0.0000         21-25 STUDENTS:1271   1st Qu.:425.9  
 Median :0.0154         26-30 STUDENTS:1660   Median :504.8  
 Mean   :0.0227         31-35 STUDENTS: 645   Mean   :500.6  
 3rd Qu.:0.0330         36-40 STUDENTS: 123   3rd Qu.:578.4  
 Max.   :0.2222   15 STUDENTS OR FEWER:  98   Max.   :810.5  
 NA's   :886      NA's                : 563

The Models

A. Baseline Model

base_model <- "
# Variances Only
reading_score ~~ reading_score
proatce ~~ proatce
proat5ab ~~ proat5ab
proat5am  ~~ proat5am
proat6 ~~ proat6
stratio ~~ stratio
"
base_1 <- sem(base_model, data = stu_data)
summary(base_1, fit.measures = TRUE)

lavaan 0.6.14 ended normally after 31 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         6

                                                  Used       Total
  Number of observations                          3820        4838

Model Test User Model:
                                                      
  Test statistic                               367.163
  Degrees of freedom                                15
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                               367.163
  Degrees of freedom                                15
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.000
  Tucker-Lewis Index (TLI)                      -0.000

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -26868.397
  Loglikelihood unrestricted model (H1)     -26684.815
                                                      
  Akaike (AIC)                               53748.794
  Bayesian (BIC)                             53786.282
  Sample-size adjusted Bayesian (SABIC)      53767.216

Root Mean Square Error of Approximation:

  RMSEA                                          0.078
  90 Percent confidence interval - lower         0.072
  90 Percent confidence interval - upper         0.085
  P-value H_0: RMSEA <= 0.050                    0.000
  P-value H_0: RMSEA >= 0.080                    0.362

Standardized Root Mean Square Residual:

  SRMR                                           0.066

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Variances:
                   Estimate   Std.Err  z-value  P(>|z|)
    reading_score  10638.977  243.435   43.704    0.000
    proatce            0.022    0.001   43.704    0.000
    proat5ab           0.075    0.002   43.704    0.000
    proat5am           0.062    0.001   43.704    0.000
    proat6             0.001    0.000   43.704    0.000
    stratio           50.308    1.151   43.704    0.000

We compared the Test User Model statistics with the Baseline Model. Based on the output, Model Chi-Square is 597.474 with 21 degrees of freedom, and matches perfectly with the Baseline model.

B. Regression Models

# 1. Bachelor Degree Model
bach_model <- "
reading_score ~ 1 + proat5ab
"
regression_1 <- sem(bach_model, data = stu_data)
coef(regression_1)

             reading_score~1       reading_score~proat5ab 
                     523.894                      -27.237 
reading_score~~reading_score 
                   10713.953

# Bachelor and Masters Model
mas_model <- "
reading_score ~ 1 + proat5ab + proat5am
"
regression_2 <- sem(mas_model, data = stu_data)
coef(regression_2)

             reading_score~1       reading_score~proat5ab 
                     511.137                      -28.034 
      reading_score~proat5am reading_score~~reading_score 
                      26.854                    10576.327

# Doctoral Degree Model
doc_model <- "
reading_score ~ 1 + proat5ab + proat5am + proat6
"
regression_3 <- sem(doc_model, data = stu_data)
coef(regression_3)

             reading_score~1       reading_score~proat5ab 
                     508.435                      -25.789 
      reading_score~proat5am         reading_score~proat6 
                      25.000                       99.423 
reading_score~~reading_score 
                   10533.188

# Academic Attainment and Teacher Certification
tc_model <- "
reading_score ~ 1 + proat5ab + proat5am + proat6 + proatce + proat6:proatce
"
regression_4 <- sem(tc_model, data = stu_data)
coef(regression_4)

             reading_score~1       reading_score~proat5ab 
                     563.843                      -23.578 
      reading_score~proat5am         reading_score~proat6 
                      28.621                    -1576.028 
       reading_score~proatce reading_score~proat6:proatce 
                     -61.228                     1720.194 
reading_score~~reading_score 
                   10495.767

# ST-Ratio and Other Variables
stratio_model <- "
reading_score ~ 1 + proat5ab + proat5am + proat6 + proatce + stratio + proatce:stratio
proatce ~ proat5ab + proat5am + proat6 + stratio
stratio ~ proat5ab + proat5am + proat6
"
regression_5 <- sem(stratio_model, data = stu_data)
summary(regression_5, standardized = TRUE, rsquare = TRUE, fit.measures = TRUE)

lavaan 0.6.14 ended normally after 32 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        21

                                                  Used       Total
  Number of observations                          3820        4838

Model Test User Model:
                                                       
  Test statistic                              16990.878
  Degrees of freedom                                  5
  P-value (Chi-square)                            0.000

Model Test Baseline Model:

  Test statistic                             23163.744
  Degrees of freedom                                18
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.266
  Tucker-Lewis Index (TLI)                      -1.642

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -47118.223
  Loglikelihood unrestricted model (H1)     -38622.784
                                                      
  Akaike (AIC)                               94278.446
  Bayesian (BIC)                             94409.655
  Sample-size adjusted Bayesian (SABIC)      94342.926

Root Mean Square Error of Approximation:

  RMSEA                                          0.943
  90 Percent confidence interval - lower         0.931
  90 Percent confidence interval - upper         0.955
  P-value H_0: RMSEA <= 0.050                    0.000
  P-value H_0: RMSEA >= 0.080                    1.000

Standardized Root Mean Square Residual:

  SRMR                                           0.170

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate   Std.Err  z-value  P(>|z|)   Std.lv   Std.all
  reading_score ~                                                         
    proat5ab         -28.472    6.067   -4.693    0.000   -28.472   -0.076
    proat5am          31.964    6.797    4.702    0.000    31.964    0.077
    proat6            50.699   54.999    0.922    0.357    50.699    0.015
    proatce          -39.099   11.185   -3.496    0.000   -39.099   -0.057
    stratio            0.856    0.236    3.620    0.000     0.856    0.059
    proatce:strati     0.644    0.223    2.889    0.004     0.644    0.046
  proatce ~                                                               
    proat5ab          -0.014    0.009   -1.598    0.110    -0.014   -0.026
    proat5am           0.097    0.010    9.973    0.000     0.097    0.161
    proat6             0.005    0.080    0.063    0.950     0.005    0.001
    stratio            0.000    0.000    1.327    0.184     0.000    0.022
  stratio ~                                                               
    proat5ab           1.447    0.415    3.490    0.000     1.447    0.056
    proat5am          -3.042    0.457   -6.659    0.000    -3.042   -0.107
    proat6            32.440    3.729    8.699    0.000    32.440    0.140

Intercepts:
                   Estimate   Std.Err  z-value  P(>|z|)   Std.lv   Std.all
   .reading_score    520.629   12.979   40.114    0.000   520.629    5.055
   .proatce            0.905    0.011   85.779    0.000     0.905    6.051
   .stratio           16.475    0.422   39.005    0.000    16.475    2.323
    proatce:strati    16.030    0.120  133.689    0.000    16.030    2.163

Variances:
                   Estimate   Std.Err  z-value  P(>|z|)   Std.lv   Std.all
   .reading_score  10409.928  238.194   43.704    0.000 10409.928    0.981
   .proatce            0.022    0.000   43.704    0.000     0.022    0.974
   .stratio           48.805    1.117   43.704    0.000    48.805    0.970
    proatce:strati    54.921    1.257   43.704    0.000    54.921    1.000

R-Square:
                   Estimate 
    reading_score      0.019
    proatce            0.026
    stratio            0.030

labels <- list(
  stratio = "Student-Teacher Ratio", proat5am = "% of Teachers with
               Masters Degree", proat5ab = "% of Teachers with Bachelors Degree",
  proat6 = "% of Teachers with Doctoral Degree",
  proatce = "% of Teachers Fully Certified",
  reading_score = "Reading Score"
)

lavaanPlot(
  model = regression_5, labels = labels,
  node_options = list(shape = "box", fontname = "Helvetica"),
  coefs = TRUE, sig = .05, stand = TRUE, stars = TRUE
)

Further Analysis in Next File.

References

Ciol, M. A., Hoffman, J. M., Dudgeon, B. J., Shumway-Cook, A., Yorkston, K. M., & Chan, L. (2006). Understanding the use of weights in the analysis of data from multistage surveys. Archives of physical medicine and rehabilitation, 87(2), 299-303. https://doi.org/10.1016/j.apmr.2005.09.021
Morris, T.P., White, I. R., & Royston, P. (2014). Tuning multiple imputation by predictive mean matching and local residual draws. BMC Medical Research methodology, 14, 1:13. https://doi.org/10.1186/1471-2288-14-75
Austin, P. C.,White, I. R., Lee, D. S., & Buuren, S. (2021). Missing data in clinical research: A tutorial on multiple imputation. Canadian Journal of Cardiology, 37(9), 1322-1331. https://doi.org/10.1016/j.cjca.2020.11.010

Search This Blog

Data-Driven Education Insights

Testing EdSurvey, WeMix, Dire, MICE and Other Important R packages with Sample Educational Research PART 1