Fits a generalized linear model (GLM) accounting for mismatch errors using a mixture model framework in the secondary analysis setting. The variance-covariance matrix is estimated using the sandwich formula.
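To make the setting concrete, the following sketch simulates the kind of mismatch error the model is meant to address: the responses of a subset of records are permuted, so those (x, y) pairs no longer belong together. All object names (`mismatch`, `y_obs`, and so on) are illustrative and not part of the package. The naive fit ignores the mismatches, and its slope is attenuated toward zero.

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
y_true <- 1 + 2 * x + rnorm(n, sd = 0.5)

# Flag roughly 30% of records as mismatched and permute their responses
mismatch <- runif(n) < 0.3
idx <- which(mismatch)
y_obs <- y_true
y_obs[idx] <- y_true[sample(idx)]

# Gaussian GLM on correctly linked data vs. data containing mismatches
fit_clean <- glm(y_true ~ x, family = gaussian)
fit_naive <- glm(y_obs ~ x, family = gaussian)

slope_clean <- unname(coef(fit_clean)["x"])
slope_naive <- unname(coef(fit_naive)["x"])
print(c(clean = slope_clean, naive = slope_naive))
```

The gap between the two slopes is the bias that a mismatch-aware fit aims to remove.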
Arguments
- x
Design matrix for the primary outcome model (numeric matrix or data frame).
- y
Response vector for the primary outcome model.
- family
A family object (e.g., gaussian, binomial) specifying the error distribution and link function. Can be a character string or a function.
- z
Design matrix for the mismatch indicator model (mismatch covariates). If NULL, an intercept-only model is assumed.
- m.rate
The assumed overall mismatch rate (a proportion between 0 and 1). If provided, it imposes a constraint on the mismatch model intercept.
- safe.matches
Logical vector; TRUE indicates a "safe match" (treated as definitely correct), FALSE indicates a potential mismatch.
- control
An optional list of control parameters. Arguments passed via ... will override values in this list.
  - max.iter: Maximum number of EM iterations (default: 1000).
  - cmax.iter: Maximum number of iterations for the subroutine in the constrained logistic regression function (default: 1000).
  - tol: Convergence tolerance (default: 1e-4).
  - init.beta: Initial parameter estimates for the outcome model.
  - init.gamma: Initial parameter estimates for the mismatch indicator model.
  - fy: Estimated marginal density of the response. If NULL, it is estimated using kernel density estimation or a parametric assumption.
- ...
Additional arguments passed to control.
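The family and control conventions described above mirror common base-R idioms. The sketch below shows one way such arguments could be normalized. `resolve_family` and `merge_control` are hypothetical helpers written for illustration, not functions exported by the package, and the defaults are simply copied from the argument descriptions.

```r
# Hypothetical helper: accept a family given as a string, a function, or an object
resolve_family <- function(family) {
  if (is.character(family)) family <- get(family, mode = "function")
  if (is.function(family)) family <- family()
  if (!inherits(family, "family")) stop("'family' not recognized")
  family
}

# Hypothetical helper: values supplied via ... take precedence over `control`
merge_control <- function(control = list(), ...) {
  defaults <- list(max.iter = 1000, cmax.iter = 1000, tol = 1e-4)
  utils::modifyList(utils::modifyList(defaults, control), list(...))
}

fam <- resolve_family("binomial")
ctrl <- merge_control(control = list(max.iter = 500, tol = 1e-6), tol = 1e-8)
str(ctrl)
```

Here `tol` ends up at 1e-8 because the value passed through `...` wins over the one in `control`, while `cmax.iter` keeps its default.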
Value
A list of results:
- coefficients
A named vector of coefficients for the outcome model.
- m.coefficients
A named vector of coefficients for the mismatch indicator model (gamma).
- match.prob
The posterior correct match probabilities (weights) for each observation.
- residuals
The working residuals, defined as y - fitted.values.
- fitted.values
The fitted mean values of the outcome model, obtained by transforming the linear predictors by the inverse of the link function.
- linear.predictors
The linear fit on the link scale.
- deviance
The deviance of the weighted outcome model at convergence.
- null.deviance
The deviance of the weighted null outcome model.
- var
The estimated variance-covariance matrix of the parameters (sandwich estimator).
- dispersion
The estimated dispersion parameter (e.g., variance for Gaussian, 1/shape for Gamma).
- objective
A vector tracking the negative log pseudo-likelihood at each iteration of the EM algorithm.
- converged
Logical indicating if the EM algorithm converged within max.iter.
- rank
The numeric rank of the fitted linear model.
- df.residual
The residual degrees of freedom.
- df.null
The residual degrees of freedom for the null model.
- family
The family object used.
- call
The matched call.
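As a rough illustration of what match.prob represents, the sketch below computes posterior correct-match probabilities for a Gaussian outcome under a two-component mixture: a correctly matched record follows the regression density, while a mismatched record follows the marginal density of the response. This is a simplified sketch under stated assumptions (known coefficients and error standard deviation, a constant mismatch rate `alpha`, and a normal approximation to the marginal density), not the package's implementation. Every name in it is illustrative.

```r
set.seed(2)
n <- 500
x <- rnorm(n)
beta <- c(1, 2)   # assumed known outcome-model coefficients
sigma <- 0.5      # assumed known error standard deviation
alpha <- 0.3      # assumed constant mismatch rate

y <- beta[1] + beta[2] * x + rnorm(n, sd = sigma)

# Permute the responses of the first alpha * n records to create mismatches
mismatch <- seq_len(n) <= alpha * n
y[mismatch] <- sample(y[mismatch])

# Component densities: regression model vs. marginal density of y
mu <- beta[1] + beta[2] * x
f_match <- dnorm(y, mean = mu, sd = sigma)
f_marg <- dnorm(y, mean = mean(y), sd = sd(y))

# Posterior probability that each record is a correct match
match_prob <- (1 - alpha) * f_match /
  ((1 - alpha) * f_match + alpha * f_marg)

print(c(correct = mean(match_prob[!mismatch]),
        mismatched = mean(match_prob[mismatch])))
```

Records with large residuals under the outcome model receive low weights, which is why these probabilities can serve as observation weights when refitting the outcome model.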
References
Slawski, M.*, West, B. T., Bukke, P., Wang, Z., Diao, G., & Ben-David, E. (2025). A general framework for regression with mismatched data based on mixture modelling. Journal of the Royal Statistical Society Series A: Statistics in Society, 188(3), 896-919. doi:10.1093/jrsssa/qnae083
Slawski, M.*, Diao, G., & Ben-David, E. (2021). A pseudo-likelihood approach to linear regression with partially shuffled data. Journal of Computational and Graphical Statistics, 30(4), 991-1003. doi:10.1080/10618600.2020.1870482
