Specifies the linked data and information on the underlying record linkage process for the "General Framework for Regression with Mismatched Data" developed by Slawski et al. (2025). This framework uses a mixture model for pairs of linked records whose two components reflect distributions conditional on match status, i.e., correct match or mismatch. Inference is based on composite likelihood and the EM algorithm. If available, information about the underlying record linkage process can be incorporated into the method, for example the assumed overall mismatch rate, safe matches, predictors of match status, or predicted probabilities of correct matches.
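To make the mixture idea concrete, the base R sketch below fits a linear regression to linked pairs by EM under simplifying assumptions that are ours, not the package's: correct matches follow y | x ~ N(b0 + b1 x, sigma^2), mismatched responses are independent of x and follow the marginal distribution of y (approximated here by a normal with the sample mean and standard deviation), and the mismatch rate is constant, as with the default intercept-only m.formula. The function em_mismatch_lm() is hypothetical; it is not the estimator used by adjMixture() or plglm().

```r
# Hypothetical EM sketch for linear regression with mismatched pairs
# (illustrative only; not the package's implementation).
# Assumed model for pair i:
#   correct match: y_i | x_i ~ N(b0 + b1 * x_i, sigma^2)
#   mismatch:      y_i ~ marginal of y, approximated by N(mean(y), sd(y)^2)
#   P(mismatch) = alpha, constant across pairs (intercept-only mismatch model)
em_mismatch_lm <- function(x, y, alpha = 0.1, max_iter = 200, tol = 1e-8) {
  fit <- lm(y ~ x)                          # naive fit as starting values
  beta <- coef(fit)
  sigma <- summary(fit)$sigma
  f_mis <- dnorm(y, mean(y), sd(y))         # mismatch component (held fixed)
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each pair is a correct match
    f_cor <- dnorm(y, beta[1] + beta[2] * x, sigma)
    w <- (1 - alpha) * f_cor / ((1 - alpha) * f_cor + alpha * f_mis)
    # M-step: weighted least squares for the correct-match component
    fit <- lm(y ~ x, weights = w)
    beta <- coef(fit)
    sigma <- sqrt(sum(w * residuals(fit)^2) / sum(w))
    alpha <- 1 - mean(w)                    # update the constant mismatch rate
    # Mixture log-likelihood at the updated parameters
    f_cor <- dnorm(y, beta[1] + beta[2] * x, sigma)
    loglik <- sum(log((1 - alpha) * f_cor + alpha * f_mis))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(coef = unname(beta), sigma = sigma, alpha = alpha, post_correct = w)
}

# Usage on simulated data: shuffle the responses of 10% of the pairs
set.seed(1)
n <- 2000
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
mis <- sample(n, 200)
y[mis] <- y[sample(mis)]
coef(lm(y ~ x))                             # naive slope is attenuated toward 0
res <- em_mismatch_lm(x, y)
res$coef
res$alpha
```

Each E-step weight is the posterior probability that a pair is a correct match, so the M-step reduces to weighted least squares: likely mismatches are down-weighted rather than discarded. Covariates in m.formula would replace the constant alpha with a logistic model for each pair's mismatch probability.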
Arguments
- linked.data
A data.frame containing the linked dataset. If NULL, the function attempts to resolve variables specified in m.formula from the environment.
- m.formula
A one-sided formula object for the mismatch indicator model, with the covariates on the right of "~". The default is an intercept-only model corresponding to a constant mismatch rate.
- m.rate
Numeric; an optional estimate (a proportion between 0 and 1) to constrain the global mismatch rate estimate. Defaults to NULL.
- safe.matches
A logical vector or an unquoted variable name found in linked.data; an indicator variable for safe matches (TRUE: the record can be treated as a correct match; FALSE: the record may be mismatched). The default is FALSE for all records.
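The arguments above jointly determine a prior mismatch probability for each linked pair. The base R sketch below shows one way they could interact, under assumptions that are ours rather than the package's: m.formula defines a logistic model for the mismatch indicator with coefficients gamma (supplied by hand here rather than estimated), m.rate is imposed by shifting the intercept until the average mismatch probability over all records equals m.rate, and safe matches receive mismatch probability 0. The helper prior_mismatch_prob() and the toy data values are hypothetical.

```r
# Hypothetical helper (not part of the package): prior mismatch probabilities
# implied by a one-sided formula, an optional target rate, and safe matches.
prior_mismatch_prob <- function(m.formula, data, gamma,
                                m.rate = NULL, safe.matches = NULL) {
  Z <- model.matrix(m.formula, data)        # intercept + mismatch covariates
  eta <- drop(Z %*% gamma)                  # logistic linear predictor
  safe <- if (is.null(safe.matches)) rep(FALSE, nrow(Z)) else safe.matches
  probs <- function(shift) {
    p <- plogis(eta + shift)                # P(mismatch) for each pair
    p[safe] <- 0                            # safe matches cannot be mismatched
    p
  }
  shift <- 0
  if (!is.null(m.rate)) {
    # Assumed constraint: shift the intercept so mean(P(mismatch)) == m.rate
    shift <- uniroot(function(s) mean(probs(s)) - m.rate,
                     interval = c(-50, 50))$root
  }
  probs(shift)
}

# Toy data using the covariate names from the Examples section
d <- data.frame(commf = c(0.1, 0.5, 0.9, 0.3),
                comml = c(0.2, 0.8, 0.4, 0.6))
safe <- c(TRUE, FALSE, FALSE, FALSE)
p <- prior_mismatch_prob(~ commf + comml, d, gamma = c(-2, 1, 1),
                         m.rate = 0.05, safe.matches = safe)
p
mean(p)                                     # approximately 0.05
```

Calibrating through the intercept keeps the relative ordering of mismatch risk implied by the covariates while fixing the overall level; other ways of imposing a global rate are possible.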
Value
An object of class c("adjMixture", "adjustment"). To minimize
memory overhead, the underlying linked.data is stored by reference
within an environment inside this object.
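Storing the data in an environment saves memory because R environments have reference semantics: copying an object that holds an environment copies only the reference, not the data. The sketch below illustrates this with a hypothetical constructor make_adjustment(), field name data_ref, and class "adjustment_sketch"; the actual internal layout of adjMixture objects may differ.

```r
# Hypothetical constructor illustrating storage by reference
make_adjustment <- function(linked.data) {
  data_env <- new.env(parent = emptyenv())  # container for the data
  assign("linked.data", linked.data, envir = data_env)
  structure(list(data_ref = data_env), class = "adjustment_sketch")
}

obj <- make_adjustment(mtcars)
obj2 <- obj                                  # copies the list, not the data
identical(obj$data_ref, obj2$data_ref)       # TRUE: same environment
assign("flag", TRUE, envir = obj2$data_ref)  # modify through the copy ...
exists("flag", envir = obj$data_ref, inherits = FALSE)  # ... visible in obj
```

The flip side of reference semantics is that any change made through one copy is visible through every other copy of the object.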
Details
The constructor assumes that any variables referenced in m.formula and
safe.matches are found either in linked.data or in the environment from which adjMixture() is called.
Explicit provision of linked.data is strongly recommended for
reproducibility and to ensure the adjustment object fully encapsulates
the necessary data for downstream model fitting.
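The lookup rule above follows standard R scoping, which is why explicit linked.data matters: variables missing from the data are silently taken from the environment. The base R sketch below shows the usual mechanisms, assuming (this page does not confirm it) that the constructor behaves similarly: model.frame() searches the data first and then the formula's environment, and an unquoted safe.matches name can be captured with substitute() and evaluated inside the data. The helpers resolve_formula_vars() and resolve_safe() are hypothetical.

```r
# Hypothetical helpers illustrating variable resolution (not package code)
resolve_formula_vars <- function(m.formula, linked.data = NULL) {
  # model.frame() looks in `data` first, then in environment(m.formula)
  if (is.null(linked.data)) model.frame(m.formula)
  else model.frame(m.formula, data = linked.data)
}

resolve_safe <- function(safe.matches, linked.data = NULL) {
  expr <- substitute(safe.matches)          # capture the unquoted name
  data <- if (is.null(linked.data)) list() else linked.data
  eval(expr, envir = data, enclos = parent.frame())  # data first, then caller
}

commf <- runif(3); comml <- runif(3)        # stray variables in the workspace
hndlnk <- c(FALSE, FALSE, FALSE)
d <- data.frame(commf = c(0.1, 0.5, 0.9), comml = c(0.2, 0.8, 0.4),
                hndlnk = c(TRUE, FALSE, TRUE))

mf_env  <- resolve_formula_vars(~ commf + comml)     # uses workspace vectors
mf_data <- resolve_formula_vars(~ commf + comml, d)  # uses columns of d
safe_env  <- resolve_safe(hndlnk)                    # all FALSE
safe_data <- resolve_safe(hndlnk, d)                 # TRUE FALSE TRUE
```

The calls without d run without error but pick up unrelated workspace vectors, which is exactly the reproducibility risk that passing linked.data explicitly avoids.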
References
Slawski, M., West, B. T., Bukke, P., Wang, Z., Diao, G., & Ben-David, E. (2025). A general framework for regression with mismatched data based on mixture modelling. Journal of the Royal Statistical Society Series A: Statistics in Society, 188(3), 896-919. doi:10.1093/jrsssa/qnae083
Bukke, P., Ben-David, E., Diao, G., Slawski, M., & West, B. T. (2025). Cox Proportional Hazards Regression Using Linked Data: An Approach Based on Mixture Modeling. In Frontiers of Statistics and Data Science (pp. 181-200). Singapore: Springer Nature Singapore. doi:10.1007/978-981-96-0742-6_8
Slawski, M., Diao, G., & Ben-David, E. (2021). A pseudo-likelihood approach to linear regression with partially shuffled data. Journal of Computational and Graphical Statistics, 30(4), 991-1003. doi:10.1080/10618600.2020.1870482
See also
- plglm() for generalized linear regression modeling
- plcoxph() for Cox proportional hazards regression modeling
- plctable() for contingency table analysis
Examples
# Load the LIFE-M demo dataset
data(lifem)
# Phase 1: Adjustment Specification
# Model the mismatch indicator via logistic regression on the
# name commonness scores (commf, comml), constrain the overall
# mismatch rate to 5%, and treat records flagged by hndlnk as safe matches.
adj_object <- adjMixture(
  linked.data = lifem,
  m.formula = ~ commf + comml,
  m.rate = 0.05,
  safe.matches = hndlnk
)
