
Bayesian two-component mixture survival regression model
Source:R/mixbayes_survreg.R
survregMixBayes.RdFits a Bayesian two-component parametric survival regression model using Stan. Each observation is assumed to arise from one of two latent components with component-specific survival regression parameters.
Usage
survregMixBayes(
X,
y,
dist = "weibull",
priors = NULL,
control = list(iterations = 10000, burnin.iterations = 1000, seed =
sample.int(.Machine$integer.max, 1), cores = getOption("mc.cores", 1L)),
...
)Arguments
- X
A numeric design matrix (\(N \times K\)), typically created by
model.matrix(). Each row corresponds to one observation and each column to one covariate in the survival model. Missing values are not allowed.- y
A survival response. This can be either a two-column numeric matrix with columns
timeandevent, whereevent = 1indicates an observed event andevent = 0indicates right censoring, or a list with elementstimeandevent. Missing values are not allowed.- dist
Character string specifying the parametric survival distribution used for both mixture components. Supported values are
"gamma"and"weibull".- priors
A named
listof prior specifications, orNULL. Since the Stan models are pre-compiled, prior specifications are converted into the corresponding numeric hyperparameters and passed to the model as data. Any missing entries are automatically filled in using symmetric default values viafill_defaults(priors, p_family = dist, model_type = "survival").- control
A named
listof control parameters for posterior sampling. Defaults are:iterations(default1e4): total number of iterations per chain;burnin.iterations(default1e3): number of warm-up iterations;seed(default: a random integer): random seed for reproducibility;cores(defaultgetOption("mc.cores", 1L)): number of CPU cores used.
Values supplied through
...override the corresponding entries incontrol.- ...
Optional overrides for elements of
control, such asiterations = 4000,burnin.iterations = 1000,seed = 123, orcores = 2.
Value
An object of class "survMixBayes" containing (at least):
m_samplesPosterior draws of aligned latent component labels (matrix of size draws × N), where component 1 corresponds to the correct-match component and component 2 to the incorrect-match component.
estimates$coefficientsPosterior draws of regression coefficients for the correct-match component (component 1; draws × K).
estimates$m.coefficientsPosterior draws of regression coefficients for the incorrect-match component (component 2; draws × K).
estimates$thetaPosterior draws of the mixing weight for the correct-match component (component 1; vector of length draws).
estimates$shapePosterior draws of the shape parameter for the correct-match component (component 1; family-specific).
estimates$m.shapePosterior draws of the shape parameter for the incorrect-match component (component 2; family-specific).
estimates$scalePosterior draws of the scale parameter for the correct-match component (component 1; Weibull only).
estimates$m.scalePosterior draws of the scale parameter for the incorrect-match component (component 2; Weibull only).
familyThe survival distribution used in the model.
callThe matched function call.
Details
The function supports "gamma" and "weibull" component
distributions, with both components sharing the same family. Right-censored
survival outcomes are supported.
Posterior draws are returned for the component-specific regression parameters and mixing weight. To improve interpretability of posterior summaries, the function applies a post-processing step that aligns component labels across posterior draws.
Label switching
Mixture models are invariant to permutations of component labels, which can lead to label switching in MCMC output. To ensure interpretable posterior summaries, this function applies a post-processing step that aligns component labels across posterior draws.
First, an optional global swap of labels (1 and 2) is performed
if component 2 is more frequent overall. Then, labels are
aligned across draws using the ECR-ITERATIVE-1
relabeling algorithm.
References
Gutman, R., Sammartino, C., Green, T., & Montague, B. (2016). Error adjustments for file linking methods using encrypted unique client identifier (eUCI) with application to recently released prisoners who are HIV+. Statistics in Medicine, 35(1), 115–129. doi:10.1002/sim.6586
Stephens, M. (2000). Dealing with label switching in mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(4), 795–809. doi:10.1111/1467-9868.00265
Papastamoulis, P. (2016). label.switching: An R package for dealing with the label switching problem in MCMC outputs. Journal of Statistical Software, 69(1), 1–24. doi:10.18637/jss.v069.c01
Examples
# \donttest{
# Example: Bayesian mixture survival model fit to linked survival data
# with induced linkage mismatch errors
# 1. Simulate a linked survival dataset
set.seed(301)
n <- 150
X <- matrix(rnorm(n * 2), ncol = 2)
colnames(X) <- c("x1", "x2")
# Generate survival times from a Weibull AFT model
true_time <- rweibull(
n,
shape = 1.5,
scale = exp(0.5 * X[, 1] - 0.5 * X[, 2])
)
# Apply right-censoring
cens_time <- rexp(n, rate = 0.1)
event <- as.integer(true_time <= cens_time)
obs_time <- pmin(true_time, cens_time)
# Induce linkage mismatch errors in approximately 20% of records
is_mismatch <- rbinom(n, 1, 0.2)
mismatch_idx <- which(is_mismatch == 1)
shuffled <- sample(mismatch_idx)
obs_time[mismatch_idx] <- obs_time[shuffled]
event[mismatch_idx] <- event[shuffled]
y <- cbind(time = obs_time, event = event)
# 2. Fit the Bayesian two-component mixture survival model
# Note: Iterations are set artificially low for run time
fit <- survregMixBayes(
X = X,
y = y,
dist = "weibull",
control = list(iterations = 200, burnin.iterations = 100, seed = 123)
)
#>
#> SAMPLING FOR MODEL 'survMixBayes_weibull' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 9.7e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.97 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: WARNING: There aren't enough warmup iterations to fit the
#> Chain 1: three stages of adaptation as currently configured.
#> Chain 1: Reducing each adaptation stage to 15%/75%/10% of
#> Chain 1: the given number of warmup iterations:
#> Chain 1: init_buffer = 15
#> Chain 1: adapt_window = 75
#> Chain 1: term_buffer = 10
#> Chain 1:
#> Chain 1: Iteration: 1 / 200 [ 0%] (Warmup)
#> Chain 1: Iteration: 20 / 200 [ 10%] (Warmup)
#> Chain 1: Iteration: 40 / 200 [ 20%] (Warmup)
#> Chain 1: Iteration: 60 / 200 [ 30%] (Warmup)
#> Chain 1: Iteration: 80 / 200 [ 40%] (Warmup)
#> Chain 1: Iteration: 100 / 200 [ 50%] (Warmup)
#> Chain 1: Iteration: 101 / 200 [ 50%] (Sampling)
#> Chain 1: Iteration: 120 / 200 [ 60%] (Sampling)
#> Chain 1: Iteration: 140 / 200 [ 70%] (Sampling)
#> Chain 1: Iteration: 160 / 200 [ 80%] (Sampling)
#> Chain 1: Iteration: 180 / 200 [ 90%] (Sampling)
#> Chain 1: Iteration: 200 / 200 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 0.193 seconds (Warm-up)
#> Chain 1: 0.186 seconds (Sampling)
#> Chain 1: 0.379 seconds (Total)
#> Chain 1:
#> Warning: The largest R-hat is 1.19, indicating chains have not mixed.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#r-hat
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess
#> Global label swap performed: label 2 dominates label 1.
#>
#> ......................................................................................
#> . Method Time (sec) Status .
#> ......................................................................................
#> . ECR-ITERATIVE-1 0.14 Converged (3 iterations) .
#> ......................................................................................
#>
#> Relabelling all methods according to method ECR-ITERATIVE-1 ... done!
#> Retrieve the 1 permutation arrays by typing:
#> [...]$permutations$"ECR-ITERATIVE-1"
#> Retrieve the 1 best clusterings: [...]$clusters
#> Retrieve the 1 CPU times: [...]$timings
#> Retrieve the 1 X 1 similarity matrix: [...]$similarity
#> Label switching finished. Total time: 0.1 seconds.
# 3. Inspect posterior summaries
# (Label switching is handled automatically)
cat("Component 1 (Correct Links):\n")
#> Component 1 (Correct Links):
print(colMeans(fit$estimates$coefficients))
#> [1] 0.3856700 -0.3266702
cat("Component 2 (Incorrect Links):\n")
#> Component 2 (Incorrect Links):
print(colMeans(fit$estimates$m.coefficients))
#> [1] 0.1048798 -0.1195878
cat("Estimated probability of correct linkage:\n")
#> Estimated probability of correct linkage:
print(mean(fit$estimates$theta))
#> [1] 0.8279494
# }