
The postlink package provides a unified suite of statistical tools designed to rigorously account for record linkage errors in downstream modeling.

Record linkage is often error-prone. When datasets are merged using noisy or non-unique identifiers, mismatches (false links) are inadvertently introduced. Left uncorrected, these mismatches contaminate regression analyses, typically attenuating estimates and biasing statistical inference. postlink equips researchers with methods to propagate linkage uncertainty into their models, and it specifically accommodates "secondary analysis" workflows in which direct access to the primary, unlinked files is restricted.
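The attenuation described above can be reproduced with a small simulation. This is a minimal sketch in base R, not postlink code; the sample size, the true slope of 2, and the 30% mismatch rate are arbitrary assumptions chosen for illustration.

```r
# Simulated illustration (not postlink code): mismatches attenuate a slope.
set.seed(1)
n <- 5000
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)      # true slope is 2

# Mimic linkage error: shuffle the responses of a random 30% of records.
mismatch_rate <- 0.3                      # assumed rate, for illustration only
idx <- sample(n, size = round(mismatch_rate * n))
y_linked <- y
y_linked[idx] <- y[sample(idx)]

# Compare the slope from correctly linked data with the slope after mismatches.
slope_correct <- unname(coef(lm(y ~ x))["x"])
slope_mismatched <- unname(coef(lm(y_linked ~ x))["x"])
c(correct = slope_correct, mismatched = slope_mismatched)
```

Because a shuffled response is unrelated to its covariate, the naive slope shrinks toward zero by roughly the share of correct links.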

Details

A Two-Phase Workflow

The package is built on a modular, object-oriented S3 architecture that decouples the specification of linkage error from the substantive statistical modeling, so the modeling step keeps a familiar, formula-based interface.

Phase 1: Adjustment Specification

The analyst encapsulates the linked data and the chosen error-adjustment methodology by using one of the adj* constructor functions. These functions validate the data and return a structured adjustment object:

  • adjELE(): Specifies the Exchangeable Linkage Error (ELE) model, which corrects for bias using known or audited mismatch rates.

  • adjMixture(): Specifies a frequentist mixture model approach that treats match status as a latent variable, estimating error rates directly from data (e.g., using match scores) via the EM algorithm.

  • adjMixBayes(): Specifies a Bayesian mixture model approach, enabling parameter estimation and multiple imputation of latent match statuses using Stan.
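To make the latent-match idea behind the mixture approaches concrete, here is a toy EM loop in base R. It is a sketch under strong assumptions and is not the algorithm used by adjMixture(): the outcome model is Gaussian, mismatched responses are modeled by a fixed normal approximation to the marginal distribution of y, and no match scores are used.

```r
# Toy EM for a two-component mixture (illustrative; not adjMixture() itself).
set.seed(2)
n <- 3000
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)       # true slope is 2
mism <- sample(n, size = round(0.3 * n))
y[mism] <- y[sample(mism)]                # 30% mismatched responses

# Mismatched pairs: y is unrelated to x, so use the marginal density of y.
dens_mismatch <- dnorm(y, mean = mean(y), sd = sd(y))

# Start from the naive fit that ignores mismatches.
fit <- lm(y ~ x)
naive_slope <- unname(coef(fit)["x"])
beta <- coef(fit)
sigma <- summary(fit)$sigma
p_match <- 0.5

for (iter in 1:100) {
  # E-step: posterior probability that each record is a correct match.
  mu <- beta[1] + beta[2] * x
  d_match <- p_match * dnorm(y, mean = mu, sd = sigma)
  w <- d_match / (d_match + (1 - p_match) * dens_mismatch)

  # M-step: weighted least squares, then update sigma and the match rate.
  beta <- coef(lm(y ~ x, weights = w))
  res <- y - (beta[1] + beta[2] * x)
  sigma <- sqrt(sum(w * res^2) / sum(w))
  p_match <- mean(w)
}
em_slope <- unname(beta[2])
c(naive = naive_slope, em = em_slope, p_match = p_match)
```

The weights w play the role of estimated match probabilities: records that fit the regression poorly are down-weighted rather than discarded.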

Phase 2: Estimation & Inference

The adjustment object is subsequently passed to a standard modeling wrapper, integrating the error correction into the familiar R modeling syntax:

  • plglm(): Generalized Linear Models (linear, logistic, Poisson, Gamma).

  • plcoxph(): Cox Proportional Hazards models.

  • plctable(): Contingency table analyses.

  • plsurvreg(): Parametric survival models.
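The two-phase pattern can be mimicked with a toy S3 example in base R. The names toy_adj_ele() and toy_lm() are hypothetical stand-ins, not postlink functions, and their arguments do not reflect the signatures of adjELE() or plglm(). The correction shown is a simple estimator for a linear model under exchangeable linkage errors with a known mismatch rate and a single block, which may differ from what the package implements.

```r
# Toy two-phase workflow. toy_adj_ele() and toy_lm() are hypothetical
# stand-ins for illustration only; they are NOT postlink functions.

# Phase 1: validate inputs and bundle the data with the error assumption.
toy_adj_ele <- function(data, m_rate) {
  stopifnot(is.data.frame(data), is.numeric(m_rate), m_rate >= 0, m_rate < 1)
  structure(list(data = data, m_rate = m_rate), class = "toy_adj_ele")
}

# Phase 2: a formula-based wrapper that dispatches on the adjustment class.
toy_lm <- function(formula, adjustment, ...) UseMethod("toy_lm", adjustment)

toy_lm.toy_adj_ele <- function(formula, adjustment, ...) {
  mf <- model.frame(formula, adjustment$data)
  X <- model.matrix(formula, mf)
  y <- model.response(mf)
  n <- nrow(X)
  a <- adjustment$m_rate
  # Under exchangeable errors, E[y_linked] = Q X beta: each record keeps its
  # own response with probability 1 - a, else gets one of the other n - 1.
  Q <- matrix(a / (n - 1), n, n)
  diag(Q) <- 1 - a
  W <- Q %*% X
  drop(solve(crossprod(W), crossprod(W, y)))   # least squares of y on Q X
}

# Usage on simulated data with a 30% mismatch rate (true slope is 2).
set.seed(3)
n <- 2000
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
idx <- sample(n, size = round(0.3 * n))
y[idx] <- y[sample(idx)]
dat <- data.frame(x = x, y = y)

adj <- toy_adj_ele(dat, m_rate = 0.3)          # Phase 1
est <- toy_lm(y ~ x, adj)                      # Phase 2
naive <- coef(lm(y ~ x, data = dat))
rbind(naive = naive, adjusted = est)
```

Keeping the error specification in its own object means the same fitted-model call can be reused with a different adjustment simply by swapping the object passed in.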

Note that the estimation and inference procedures supported vary by type of adjustment object. Refer to the adj* documentation for the models currently supported.

Post-estimation tools (e.g., summary(), predict(), vcov(), and confint()) function as expected in standard R workflows. These methods are derived specifically to account for the additional steps introduced by the linkage error adjustment.

Note: While the two-phase workflow is recommended for standard analyses, the package's architecture intentionally isolates the core logic of each method into independent internal routines. Advanced users, developers, or researchers running large-scale simulations can bypass the wrapper functions and their formula-parsing overhead entirely. Given pre-computed design matrices and response vectors, the underlying computational functions (e.g., coxphELE(), glmMixture(), survregMixBayes()) can be called directly.
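A design matrix and response vector of the kind mentioned above can be built once with base R, as sketched below. The data frame and formula are made up for illustration. The argument names and order expected by routines such as glmMixture() are not described here, so check their help pages before passing X and y.

```r
# Build a design matrix and response vector once, outside any formula wrapper.
dat <- data.frame(
  y = c(1.2, 0.7, 3.1, 2.4),
  x = c(0, 1, 2, 3),
  g = factor(c("a", "b", "a", "b"))
)

mf <- model.frame(y ~ x + g, data = dat)
X <- model.matrix(attr(mf, "terms"), mf)   # numeric matrix with dummy coding
y <- model.response(mf)                    # response vector aligned with X

colnames(X)
dim(X)
```

In a simulation loop, X and y can be reused or regenerated directly, which avoids re-parsing the formula on every replicate.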

References

Chambers, R. (2009). Regression analysis of probability-linked data. Official Statistics Research Series, 4, 1-15.

Chambers, R. L., Fabrizi, E., Ranalli, M. G., Salvati, N., & Wang, S. (2023). Robust regression using probabilistically linked data. Wiley Interdisciplinary Reviews: Computational Statistics, 15(2), e1596. doi:10.1002/wics.1596

Gutman, R., Sammartino, C., Green, T., & Montague, B. (2016). Error adjustments for file linking methods using encrypted unique client identifier (eUCI) with application to recently released prisoners who are HIV+. Statistics in Medicine, 35(1), 115-129. doi:10.1002/sim.6586

Slawski, M., West, B. T., Bukke, P., Wang, Z., Diao, G., & Ben-David, E. (2025). A general framework for regression with mismatched data based on mixture modelling. Journal of the Royal Statistical Society Series A: Statistics in Society, 188(3), 896-919. doi:10.1093/jrsssa/qnae083

Vo, T. H., Garès, V., Zhang, L. C., Happe, A., Oger, E., Paquelet, S., & Chauvet, G. (2024). Cox regression with linked data. Statistics in Medicine, 43(2), 296-314. doi:10.1002/sim.9960

Author

Maintainer: Priyanjali Bukke <postlink.group@gmail.com>

Authors:

  • Gauri Kamat

  • Jiahao Cui

  • Roee Gutman

  • Martin Slawski

Other contributors:

  • Zhenbang Wang [contributor]

  • Brady T. West [contributor]

  • Emanuel Ben-David [contributor]

  • Guoqing Diao [contributor]