Specifies the linked data and information on the underlying record linkage process for regression with linked data under the assumption of exchangeable linkage errors (ELE), following Chambers (2009) and Vo et al. (2024). These approaches correct for bias due to mismatch errors via weighting matrices estimated from known mismatch rates or from clerical reviews (audit samples).
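As a rough illustration of the exchangeable linkage error assumption, the base R sketch below builds the linkage probability matrix for a single block. It assumes the common formulation in which a record is linked correctly with probability 1 - m and is mismatched to each other record in the block with equal probability m / (n - 1). The helper name ele_matrix is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): linkage probability matrix
# for one block of size n under exchangeable linkage errors, given a
# mismatch rate m.
# Diagonal     = probability of a correct link.
# Off-diagonal = probability of a link to one specific wrong record.
ele_matrix <- function(n, m) {
  stopifnot(n >= 2, m >= 0, m <= 1)
  off <- m / (n - 1)
  A <- matrix(off, nrow = n, ncol = n)
  diag(A) <- 1 - m
  A
}

A <- ele_matrix(n = 4, m = 0.1)
rowSums(A)  # each row sums to 1
```

Each row of the matrix is a probability distribution over the records that a given record may have been linked to, which is why the rows sum to one.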
Usage
adjELE(
linked.data,
m.rate,
audit.size = NULL,
blocks = NULL,
weight.matrix = c("ratio", "LL", "BLUE", "all")
)
Arguments
- linked.data
A data.frame containing the data after record linkage.
- m.rate
Numeric vector; known or estimated probability of mismatch for each record or block. Values must be between 0 and 1. Can be a single global rate, a vector of length equal to the number of unique blocks, or a vector of length equal to the number of rows in linked.data.
- audit.size
Numeric vector; if m.rate is estimated, the sample sizes of the clerical review audit. Used for variance estimation. If provided, must align with blocks in the same way as m.rate. Defaults to NULL (m.rate is assumed known).
- blocks
A vector, or an unquoted variable name found in linked.data, identifying the blocking structure used during linkage. If NULL (default), all records are assumed to belong to a single block.
- weight.matrix
Character; the method for estimating the weight matrix. Must be one of "ratio" (default), "LL", "BLUE", or "all".
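The default c("ratio", "LL", "BLUE", "all") shown under Usage follows a standard R idiom in which the first element of the vector acts as the default. A minimal sketch of that idiom using base R's match.arg is given here. The function name pick_weight_matrix is hypothetical, and whether adjELE resolves the argument in exactly this way is an assumption.

```r
# Hypothetical stand-in (not part of the package) showing how a vector
# default is typically resolved to a single choice with match.arg().
pick_weight_matrix <- function(weight.matrix = c("ratio", "LL", "BLUE", "all")) {
  match.arg(weight.matrix)
}

pick_weight_matrix()        # falls back to the first element, "ratio"
pick_weight_matrix("BLUE")  # an explicit valid choice is returned as given
```

With this idiom, a value outside the listed choices raises an error instead of being silently accepted.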
Value
An object of class c("adjELE", "adjustment"). To minimize
memory overhead, the underlying linked.data is stored by reference
within an environment inside this object.
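To illustrate what storing data by reference inside an environment means in R, here is a minimal base R sketch. The constructor name new_adjustment_sketch, the field name data_ref, and the class names are all hypothetical; the package's actual internal layout is not described on this page.

```r
# Hypothetical constructor (not the package's): the data lives in an
# environment, so copies of the object share one binding instead of
# each carrying its own copy of the data.
new_adjustment_sketch <- function(linked.data) {
  ref <- new.env(parent = emptyenv())
  ref$linked.data <- linked.data
  structure(
    list(data_ref = ref),
    class = c("adjELE_sketch", "adjustment_sketch")
  )
}

df <- data.frame(y = c(1.2, 3.4), x = c(0, 1))
adj <- new_adjustment_sketch(df)
adj2 <- adj  # copying the list copies only the pointer to the environment

identical(adj$data_ref, adj2$data_ref)  # both objects share one environment
nrow(adj2$data_ref$linked.data)         # data is reachable through either copy
```

Because environments have reference semantics in R, passing such an object between functions does not by itself trigger a copy of the stored data.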
Details
The constructor validates consistency between the mismatch rates, audit sizes,
and block identifiers. If blocks are provided, m.rate must
be specified either per-block (length equals number of unique blocks) or
per-record (length equals number of rows).
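The length rules described above can be sketched as a small base R check that expands m.rate to one value per record. This is an illustration only: the helper expand_m_rate is hypothetical, it also accepts the single global rate mentioned under Arguments, and it assumes per-block rates are supplied in the order of sort(unique(blocks)), which this page does not specify.

```r
# Hypothetical helper (not part of the package): validate m.rate against
# the blocking structure and return one mismatch rate per record.
expand_m_rate <- function(m.rate, blocks, n) {
  stopifnot(is.numeric(m.rate), all(m.rate >= 0 & m.rate <= 1))
  if (is.null(blocks)) blocks <- rep(1L, n)  # NULL means a single block
  stopifnot(length(blocks) == n)
  ids <- sort(unique(blocks))                # assumed ordering of per-block rates
  if (length(m.rate) == 1L) {
    rep(m.rate, n)                           # single global rate
  } else if (length(m.rate) == n) {
    m.rate                                   # already per-record
  } else if (length(m.rate) == length(ids)) {
    m.rate[match(blocks, ids)]               # per-block, mapped to records
  } else {
    stop("m.rate must have length 1, the number of unique blocks, or the number of rows")
  }
}

blocks <- c("a", "a", "b", "b", "b")
expand_m_rate(c(0.05, 0.20), blocks, n = 5)
```

In this example the two per-block rates are mapped to the five records according to each record's block label; a vector of any other length is rejected.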
Explicit provision of linked.data is strongly recommended for
reproducibility and to ensure the adjustment object fully encapsulates
the necessary data for downstream model fitting.
Note
The internal structure of the estimating equations follows the foundational work of Chambers (2009) and Vo et al. (2024). The original code provided in the Appendix of Chambers (2009) was also used as a benchmark oracle during unit testing to check the numerical accuracy and validity of the implementation.
References
Chambers, R. (2009). Regression analysis of probability-linked data. Official Statistics Research Series, 4, 1-15.
Chambers, R. L., Fabrizi, E., Ranalli, M. G., Salvati, N., & Wang, S. (2023). Robust regression using probabilistically linked data. Wiley Interdisciplinary Reviews: Computational Statistics, 15(2), e1596. doi:10.1002/wics.1596
Vo, T. H., Garès, V., Zhang, L. C., Happe, A., Oger, E., Paquelet, S., & Chauvet, G. (2024). Cox regression with linked data. Statistics in Medicine, 43(2), 296-314. doi:10.1002/sim.9960
