Introductory to ‘bayesmlogit’ Package

0: Introduction

We developed a Bayesian approach to estimate multi-state life tables, which can be useful for modelling a complex health process when numerous predisposing variables and coexisting health conditions are included. For details see SM Lynch & Emma Zang.

This package involves two steps:

Use bayesmlogit() to sample parameters from a multinomial logit model predicting transitions with chosen covariates (using Polya-Gamma latent variables to facilitate Gibbs sampling (Polson et al. 2013)).
Use mlifeTable() to generate multi-state life tables from the posterior samples and summarize quantities of interest from the tables.

In this article, we will take the ‘lifedata’ sample data set as an illustration and describe how to use our package to perform the Bayesian multi-state life table analysis. lifedata is extracted and processed from the data provided by The Health and Retirement Study (HRS), which include three simple states: “health”, “unhealthiness” and “death”. For more details use ?lifedata().

Note that we only present an example in this post. The results are not representative of the entire population. In addition, our results are produced using the MCMC approach. With the same setup, you may obtain varying outcomes.

0.1: Installation

To install this package. Try

devtools::install_github("Xuezhixing-Zhang/bayesmlogit")

0.2: Obtain the Transitions

Before employing the Markov chain Monte Carlo (MCMC) approach, we must determine the transitions of each subject. In lifedata, we have two states “health”, “unhealthiness” and an end state “death”, which can only happen when the subject is dead. Given a time interval, these states determine six kinds of possible transitions: 1:health to health; 2:health to unhealthiness; 3: health to death; 4: unhealthiness to health; 5: unhealthiness to unhealthiness; 6: unhealthiness to death.

The transitions can be obtained in two steps:

Eliminate subjects with only one observation.
Recode the transitions according to the table below:

	Health	Unhealthiness	Death
Health	1	2	3
Unhealthiness	4	5	6
Death	-	-	-

where the first column indicates the start state of subjects and the first row indicates the end state at this time interval. The numbers indicates the index of our transitions. Even though certain transitions in this table, such as “Death” to “Health,” are impossible, we must nonetheless label them in this sequence. If transitions are not established in this order, computation errors may exist in the final results.

In the package, we provide a function CreateTrans() to help you create these transitions. You need to provide subject ID, age, current state, indicator of death and total number of states. Here is another example of this function’s usage.

#In this example, we generate 250 observations and 6 states (including death). Based on these observations, we apply the `CreateTrans()` function and generate the transitions.

#Create subject IDs for each observation. In this example, we have 50 subjects and 250 observations in total.
ID <- rep(1:50, each = 5) 

#Create Age variable for each observation.
Age <- rep(31:35, times = 50) 

#Create the current state for each observation. Without considering the end state "Death", we assume there are five other possible states. 
State <- sample(1:5,size=250,replace=TRUE)

#Create the indicator of death. All subjects in this example are presumed to have died at the last observation.
Death <- rep(c(0,0,0,0,1),times=50)

Example <- data.frame(ID,Age,State,Death)

#Use `CreateTrans()` to create transitions of each observation. Here we have six states in total: death and the other five possible states.
Example$trans <- CreateTrans(Example$ID,Example$Age,
                             Example$State,Example$Death,states=6)

#The transition for the first observation of each subject is NA because we cannot observe their previous states. 
head(Example,10)

##    ID Age State Death trans
## 1   1  31     2     0    NA
## 2   1  32     1     0     7
## 3   1  33     2     0     2
## 4   1  34     1     0     7
## 5   1  35     4     1     6
## 6   2  31     1     0    NA
## 7   2  32     2     0     2
## 8   2  33     4     0    10
## 9   2  34     5     0    23
## 10  2  35     3     1    30

1: Bayesmlogit Method

After getting the transitions for each observation, we sample parameters from a multinomial logit model using the MCMC approach. The package provides the bayesmlogit() function for generating posterior samples. For further details use ?bayesmlogit()(Polson et al. 2013).

Here is an example for this function:

data <- lifedata
y <- data[,1]
X <- data[,-1]

# This example will take about 30 mins.
out <- bayesmlogit(y, X ,samp=1000, burn=500, step.width = 5, verbose=10)

Above codes generate 1000 posterior samples for lifedata after 500 burn-in samples. To reduce autocorrelation among each sample, we will select one sample from every five posterior samples.

The function has two outcomes: out and outwstepwidth. out contains all posterior samples generated after burn-in, while outwstepwidth includes only selected posterior samples. In this example, we will have 1000 posterior samples in out and 200 samples in outwstepwidth.

2: Multi-state Life Table Method

With posterior samples, we then compute the transition probability matrices and generate multi-state life tables.

2.1: Create Life Tables

The life tables can be created by utilizing mlifetable(). Note that the variable “age” must be included in the data, which is necessary for calculating transition probabilities.

trans <- out$outwstepwidth
mlifeTable(y,X,trans =trans,
           groupby = c("male","black","hispanic"),
           vars = "mar",
           startages=50,
           age.gap=1,
           states=3,
           nums =200,
           file_path=".")

This function provides several options to generate life tables. trans is the posterior samples obtained from bayesmlogit()(Polson et al. 2013). groupby specifies the covariates to create subgroups. vars indicates the mediators in our analysis. startages is the start age for life tables. age.gap specifies the age interval for generating life tables. states is the total number of states in our data. nums is the posterior samples used to generate life tables. file_path gives the path for our outputs. We also provide other options for generating life tables, which can be found using ?mlifeTable(). Each life table is a \(\text{num} \times \text{states}\) matrix. In this example, six \(200 \times 3\) life tables will be generated, corresponding to six subgroups.

2.2: Results

When generating life tables, the function will index each subgroup and report sample size of each group:

where “subgroup000” indicates “male = 0”, “black = 0”, “hispanic = 0” and “subgroup001” indicates “male = 0”, “black = 0”, “hispanic = 1”. Other subgroups are also named in this order. All indexed life tables will be saved as “.txt” files in the specified path.

This step will output six life table files, corresponding six subgroups. Each life table will contain 200 lines, which are generated from 200 posterior samples. Without giving names of each state, the life table will appear as follows (Only show the first five lines):

##         V1       V2       V3
## 1 12.26391 12.37167 35.37205
## 2 12.98282 13.26774 33.85567
## 3 11.97415 13.92443 34.11296
## 4 11.36613 13.62579 35.02107
## 5 10.32153 13.61575 36.06585

where “V1”, “V2” and “V3” correspond to the three states “health”, “unhealthiness” and “death”. The numbers below each state indicate the expected number of years that the population will remain in that state. When computing total life expectancy, we ignore the final column, “death,” and sum the remaining columns.

2.3: Plots

To summarize results of life tables and create simple plots, we can use the set mlifeTable_plot = TRUE in mlifeTable() or apply function mlifeTable_plot() to life tables. In the specified path, the function will create a file folder named “mplotResults” by default.

#An example for generating plots with mlifeTable().
mlifeTable(y,X,trans =trans,
           groupby = c("male","black","hispanic"),
           vars = "mar",
           states=3,
           startages=50,
           age.gap=1,
           nums = 200,
           file_path=".",
           mlifeTable_plot = T,
           cred = 0.84)

#An example for generating plots with mlifeTable_plot():
mlifeTable_plot(X=lifedata[,-1],state.include = 0,
      groupby = c("male","black","hispanic"), 
      cred = 0.84, 
      states = 3,
      file_path = ".")

where cred is the credible level for summarizing credible intervals. The function will generate tables that summarize credible intervals and means for each subgroup’s life expectancy. By default, we will get tables and figures for total life expectancy and state-specific life expectancies. Use ?mlifeTable() and ?mlifeTable_plot() for more details.

Without naming each state and subgroup, the total life expectancy tables and plots will appear as follows:

Total Life Expectancy Table:

##       mean left.bound right.bound subgroup
## 1 25.59417   23.89036    27.23789 group000
## 2 20.98345   19.70644    22.39969 group100
## 3 17.50835   15.76863    19.34860 group110
## 4 21.76997   19.96704    23.50505 group010
## 5 25.75218   21.74834    29.84160 group001
## 6 19.54481   16.22042    23.26247 group101

Total Life Expectancy Plot:

Reference

Lynch, S. M., & Zang, E. (2022). “Bayesian Multistate Life Table Methods for Large and Complex State Spaces: Development and Illustration of a New Method.” Sociological Methodology, 52(2), 254-286.
Polson, N. G., Scott, J. G., & Windle, J. (2013). “Bayesian inference for logistic models using Pólya–Gamma latent variables.” Journal of the American statistical Association, 108(504), 1339-1349.