Validation Data-Based Adjustments for Outcome Misclassification in Logistic Regression: An Illustration
Author: | Caroline C. King, Jack D. Sobel, David D. Celentano, Hillary M. Superak, Li Tang, Robert H. Lyles, Yungtai Lo |
---|---|
Language: | English |
Year of publication: | 2011 |
Subject: |
Generalized linear model; Likelihood Functions; Epidemiology; Computer science; Bayesian probability; Reproducibility of Results; Validation Studies as Topic; Logistic regression; Classification; Sensitivity and Specificity; Regression; Article; Logistic Models; Bias; Frequentist inference; Case-Control Studies; Data Interpretation, Statistical; Covariate; Econometrics; Odds Ratio; Parametric statistics |
Description: | The consequences of misclassified binary outcome or exposure variables when estimating a crude odds ratio (OR) are well understood [1–5]. Existing literature also covers the use of validation data to estimate crude ORs while adjusting for misclassification in case-control and cross-sectional studies [6–11], considering the relative merits of external versus internal validation study designs [1, 11–12]. In regression applications, many researchers advocate the use of validation data to adjust for measurement error in continuous predictors [13–17]. Regarding outcome misclassification for discrete responses, Magder and Hughes [18] outline the problem under logistic regression and advocate maximum likelihood via an expectation-maximization algorithm [19]. Their work primarily addresses the case of known misclassification probabilities (i.e., sensitivities and specificities) characterizing the observed outcome variable. While continuing to focus on the known sensitivities/specificities case, Neuhaus [20] provides further insight into asymptotic bias and efficiency in the broader realm of the generalized linear model, as well as a computationally more efficient maximum likelihood approach. Recent articles in the epidemiologic literature demonstrate Monte Carlo-based techniques that similarly facilitate sensitivity analyses with misclassified binary variables [21–22]. Other related research includes extensions to settings with count or discrete survival outcomes [23–25]. To incorporate validation data, some authors gravitate toward Bayesian approaches using prior assumptions about misclassification probabilities [26–28]. From the parametric frequentist perspective, Carroll et al. [11] provide general expressions for likelihood functions that accommodate internal validation data. 
Alternative developments include robust modeling of sensitivity and specificity via kernel smoothers [29], with comparisons of that approach versus parametrically modeling their dependence upon covariates [30]. Our aim is to provide guidance for epidemiologists seeking accessible and efficient methods for obtaining validation data-based estimates of logistic regression parameters when the outcome is misclassified. We keep to a likelihood-based approach, as it avoids explicit specification of prior distributions and is readily implemented for binary outcomes. In the general case, we model the dependence of sensitivity and specificity upon covariates via a second logistic regression model, promoting a flexible and intuitively appealing analytic approach. The methodology that we illustrate is a direct expansion of the known misclassification rate setting considered by Magder and Hughes [18] and Neuhaus [20], a covariate-adjusted extension of well-discussed methods for estimating crude ORs [6–9], and ultimately an application of the general main/validation study maximum likelihood approach outlined in Carroll et al. [11]. However, there have been few, if any, real-world applications of the latter approach making use of internal validation data, and such application presents computational challenges to the practicing epidemiologist. Thus, our goal is to bring this approach for addressing outcome misclassification in regression closer to the forefront of epidemiologic research. We pursue this aim by highlighting an instructive example involving misclassified outcome status in the HIV Epidemiology Research Study, by transparent exposition of appropriate likelihood functions, and by providing appendices with straightforward computer code that connects directly with that exposition. |
Database: | OpenAIRE |
External link: |
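
The known sensitivity/specificity setting mentioned in the description can be illustrated with a short sketch. It is not the code from the article's appendices, which this record does not reproduce. It assumes the standard mixture form for an error-prone binary outcome Y*, namely P(Y* = 1 | x) = Se · p(x) + (1 − Sp) · (1 − p(x)), where p(x) is the logistic model for the true outcome. The function names, the toy data, and the Se/Sp values below are all hypothetical and chosen only for illustration.

```python
import math


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def prob_observed_positive(beta, x, sensitivity, specificity):
    """P(Y* = 1 | x) for an error-prone outcome with known Se and Sp.

    beta and x are equal-length sequences; x should include a leading 1.0
    if the model has an intercept.
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    p_true = expit(eta)  # P(Y = 1 | x) for the true outcome
    # True positives correctly detected plus true negatives misread as positive.
    return sensitivity * p_true + (1.0 - specificity) * (1.0 - p_true)


def log_likelihood(beta, rows, y_obs, sensitivity, specificity):
    """Log-likelihood of the observed (possibly misclassified) outcomes."""
    total = 0.0
    for x, y in zip(rows, y_obs):
        p_star = prob_observed_positive(beta, x, sensitivity, specificity)
        total += math.log(p_star) if y == 1 else math.log(1.0 - p_star)
    return total


# Hypothetical toy data: an intercept column plus one binary covariate.
rows = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
y_obs = [0, 1, 1, 1]

# Assumed (not estimated) misclassification probabilities.
ll_adjusted = log_likelihood([0.2, 0.5], rows, y_obs, 0.9, 0.8)
# Setting Se = Sp = 1 recovers the ordinary logistic log-likelihood.
ll_naive = log_likelihood([0.2, 0.5], rows, y_obs, 1.0, 1.0)
print(ll_adjusted, ll_naive)
```

Passing `log_likelihood` to a general-purpose optimizer would give adjusted coefficient estimates under these assumed Se and Sp values. The validation-data approach that the description emphasizes instead estimates the misclassification probabilities from the data, which this sketch does not attempt.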