cbc_lm

library(OLStrajr)

Introduction

The OLStrajr package provides the cbc_lm function for obtaining estimates for each case. This vignette will demonstrate its use through an example that utilizes the Robins dataset. This dataset presents the yearly male-to-female ratio for Robins, as documented in Birds: incomplete counts—five-minute bird counts Version 1.0.

data(robins)

Preparing Data

While the OLStraj R function works with wide-form data to maintain compatibility with the original OLStraj SAS macro, the cbc_lm function aligns with common R statistical modeling practices by utilizing long-form data. In this example, we employ the tidyr package to transform the Robins dataset into a long format.

Moreover, we adjust the resulting Year variable into a numeric format with a range from 0 to 4 through the following steps: conversion to a factor, coercion to numeric, and subtraction of 1.

# Convert to long form
library(tidyr)

robinsL <- robins |> pivot_longer(cols = starts_with("aug"),
                                  names_to = "Year",
                                  values_to = "Ratio")

robinsL$Year = as.numeric(as.factor(robinsL$Year)) - 1

Running the case by case regression

robins_mod <- cbc_lm(robinsL, Ratio ~ Year, .case = "site")

# Show class
class(robins_mod)
#> [1] "cbc_lm"

The cbc_lm class includes print, summary, and plot methods. In the following section, we’ll take a closer look at the summary method.

Model Summary

According to Carrig, Wirth, and Curran (2004), the mean values of the case-by-case OLS intercepts and slopes can act as unbiased estimators of the mean population intercept and slope. In the implementation of case-by-case regression by Rogosa & Saner (1995), it is noted that the standard errors are the standard deviations across 4,000 bootstrap replications, and the 90% confidence intervals’ endpoints correspond to 5% and 95% values of the empirical distributions obtained from the resampling. They further suggested that more sophisticated and accurate confidence intervals can be developed using the methods in Efron and Tibshirani (1993).”

By default, the summary method for the cbc_lm class initially displays the mean coefficients, bootstrap standard errors (over 4,000 replications), and bootstrap 95% confidence intervals. In addition, it exhibits the broom::tidy and broom::glance results for each case.

summary(robins_mod)
#> Call:
#> lm(formula = Ratio ~ Year, data = mod_df)
#> 
#> Mean Coefficients:
#> $`(Intercept)`
#> [1] 1.68
#> 
#> $Year
#> [1] 0.114
#> 
#> 
#> Bootstrap Standard Errors:
#> $`(Intercept)`
#> [1] 0.004295
#> 
#> $Year
#> [1] 0.1138
#> 
#> 
#> Bootstrap Confidence Intervals:
#> $`(Intercept)`
#> [1] 1.674 1.686
#> 
#> $Year
#> [1] -0.048  0.114
#> 
#> 
#> Summaries for each model:
#> 
#> Model 1 tidy summary:
#> # A tibble: 2 × 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)   1.69      0.126     13.3   0.000909
#> 2 Year         -0.0480    0.0516    -0.931 0.421   
#> 
#> Model 1 glance summary:
#> # A tibble: 1 × 8
#>   r.squared adj.r.squared sigma statistic p.value    df df.residual  nobs
#>       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>       <int> <dbl>
#> 1     0.224       -0.0345 0.163     0.867   0.421     1           3     5
#> 
#> Model 2 tidy summary:
#> # A tibble: 2 × 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)    1.67      0.296      5.65  0.0110
#> 2 Year           0.276     0.121      2.28  0.107 
#> 
#> Model 2 glance summary:
#> # A tibble: 1 × 8
#>   r.squared adj.r.squared sigma statistic p.value    df df.residual  nobs
#>       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>       <int> <dbl>
#> 1     0.634         0.513 0.383      5.21   0.107     1           3     5