Analysis of Data with Complex Misclassification in Response or Predictor Variables by Incorporating Validation Subsampling

Tang, Li (2012)
Dissertation (142 pages)
Committee Chair / Thesis Adviser: Lyles, Robert
Committee Members: Flanders, W Dana; Haber, Michael J; Hanfelt, John
Research Fields: Statistics
Keywords: Differential; Misclassification; Internal Validation; Likelihood
Program: Laney Graduate School, Biostatistics
Permanent url:


Abstract

Misclassification is a common problem in epidemiological and clinical research. It may be present in an exposure variable, an outcome variable, or both. It is well known that the validity of analytic results (e.g., estimates of odds ratios of interest) may be questionable when no correction is attempted, so valid and accessible methods for dealing with misclassification remain in high demand.

In this dissertation, we first consider correlated binary response variables that are subject to misclassification. Building on prior work that extended McNemar's test to correct paired-data odds ratio estimation, we propose a nonlinear mixed model-based approach that adjusts for potentially complex differential misclassification in correlated binary responses via internal validation sampling.

In the second topic, we turn to predictor misclassification and develop likelihood-based approaches, based on generalized linear models in the univariate setting and generalized linear mixed models in the multivariate setting, that efficiently incorporate internal validation data. We discuss the approach both when a baseline predictor is misclassified and when a time-dependent predictor is misclassified.

In the final topic, we extend well-studied methods to adjust for misclassification when a binary outcome and a binary exposure are both subject to complex differential misclassification in the 2×2 table setting. We develop maximum likelihood approaches that accommodate a broad range of complexity in the joint misclassification process while incorporating various types of internal validation observations.
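The simplest of the well-studied 2×2 table corrections is the classical matrix method for a misclassified exposure. The sketch below is that textbook version, not the dissertation's generalized method: it assumes nondifferential exposure misclassification, a correctly classified outcome, and sensitivity (se) and specificity (sp) known from external information. The function names and counts are hypothetical, chosen only for illustration.

```python
def correct_exposed_count(observed_exposed, total, se, sp):
    """Back-calculate the expected true number of exposed subjects in one group.

    Assumes nondifferential misclassification with known se and sp, so that
    observed = se * true + (1 - sp) * (total - true); solving for `true`
    gives the classical matrix-method correction.
    """
    youden = se + sp - 1.0
    if youden <= 0:
        raise ValueError("correction requires se + sp > 1")
    return (observed_exposed - (1.0 - sp) * total) / youden


def odds_ratio(exposed_cases, n_cases, exposed_controls, n_controls):
    """Exposure odds ratio of a 2x2 table from exposed counts and group sizes."""
    a, b = exposed_cases, n_cases - exposed_cases
    c, d = exposed_controls, n_controls - exposed_controls
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Hypothetical data: 200 cases and 200 controls, se = 0.8, sp = 0.9.
    se, sp = 0.8, 0.9
    obs_cases, obs_controls = 90, 55  # observed (misclassified) exposed counts
    naive = odds_ratio(obs_cases, 200, obs_controls, 200)
    a = correct_exposed_count(obs_cases, 200, se, sp)
    c = correct_exposed_count(obs_controls, 200, se, sp)
    corrected = odds_ratio(a, 200, c, 200)
    print(f"naive OR = {naive:.2f}, corrected OR = {corrected:.2f}")
```

The hypothetical observed counts were generated from a true table with an odds ratio of 3, which the correction recovers, while the naive odds ratio is attenuated toward 1. With sampling variability the corrected counts can fall outside their feasible range, one reason likelihood-based approaches that use validation data directly are attractive.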
We then generalize the maximum likelihood approach to a more standard binary regression setting, allowing covariates to be incorporated both in the main health effects model of interest and in the misclassification models for the binary outcome and the binary exposure. Throughout, the methods are illustrated through detailed analyses of bacterial vaginosis and trichomoniasis data from the HIV Epidemiology Research Study (HERS).
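All of these analyses combine main-study records, where only the error-prone classification Y* is recorded, with an internal validation subsample, where the true value Y is also known. The sketch below shows that idea in its simplest form: the classical double-sampling maximum likelihood estimator for a single misclassified binary variable with no covariates. It is not the dissertation's regression or mixed-model method; it assumes the subsample is selected at random or based only on the observed Y*, and the function and variable names are hypothetical.

```python
from collections import Counter


def double_sampling_mle(validated, unvalidated_ystar):
    """ML estimates of prevalence, sensitivity and specificity of a binary Y.

    validated: list of (y, y_star) pairs from the internal validation subsample.
    unvalidated_ystar: list of error-prone y_star values for all other units.
    Assumes selection into validation depends at most on y_star, so the
    likelihood factorizes as P(Y*) [all units] x P(Y | Y*) [validated only].
    """
    n_total = len(validated) + len(unvalidated_ystar)
    n_ystar1 = sum(ys for _, ys in validated) + sum(unvalidated_ystar)
    p_ystar1 = n_ystar1 / n_total  # MLE of P(Y* = 1) uses every unit

    cells = Counter(validated)  # counts of each (y, y_star) combination

    def p_y1_given(ys):
        # MLE of P(Y = 1 | Y* = ys) from the validation subsample only
        n_ys = cells[(0, ys)] + cells[(1, ys)]
        if n_ys == 0:
            raise ValueError(f"no validated units with y_star = {ys}")
        return cells[(1, ys)] / n_ys

    p11 = p_ystar1 * p_y1_given(1)              # P(Y=1, Y*=1)
    p10 = (1 - p_ystar1) * p_y1_given(0)        # P(Y=1, Y*=0)
    p00 = (1 - p_ystar1) * (1 - p_y1_given(0))  # P(Y=0, Y*=0)
    prevalence = p11 + p10
    sensitivity = p11 / prevalence
    specificity = p00 / (1 - prevalence)
    return prevalence, sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical data: 100 validated units, 300 with only Y* observed.
    validated = [(1, 1)] * 20 + [(1, 0)] * 5 + [(0, 1)] * 5 + [(0, 0)] * 70
    unvalidated = [1] * 75 + [0] * 225
    prev, se, sp = double_sampling_mle(validated, unvalidated)
    print(f"prevalence={prev:.3f}, se={se:.3f}, sp={sp:.3f}")
```

Parameterizing the joint distribution as P(Y*) P(Y | Y*) is what yields closed-form estimates here. Once covariates, differential misclassification, or correlated responses enter, the likelihood typically no longer factorizes this simply and must be maximized numerically.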

Table of Contents

Chapter 1 Introduction...1
  1.1 Overview...1
  1.2 Misclassification in Correlated Binary Responses...2
  1.3 Misclassification in Predictors...4
  1.4 Misclassification in Response and Predictor Variables in 2×2 Tables...6
  1.5 Misclassification in Response and Predictor Variables in Regression...7
  1.6 Motivating Example...8
Chapter 2 Regression Analysis for Differentially Misclassified Correlated Binary Responses...10
  2.1 Methods...10
    2.1.1 Notation...10
    2.1.2 Validation Sampling Scheme...12
    2.1.3 Non-differential Misclassification with External Validation...12
    2.1.4 Differential Misclassification...13
    2.1.5 Main-study Only and Sensitivity Analysis...16
    2.1.6 Estimation...17
    2.1.7 Correlation in Misclassification Processes...17
  2.2 Simulation Studies...18
    2.2.1 Non-differential Misclassification...18
    2.2.2 Differential Misclassification...20
    2.2.3 Importance of Correctly Specifying SE/SP Model...22
    2.2.4 A Note About Correlated Misclassification...24
  2.3 Example...27
    2.3.1 HERS Example...27
    2.3.2 Example 1: Pairwise No-covariate Case...27
    2.3.3 Example 2: Pairwise Covariate-adjusted Case...30
    2.3.4 Example 3: Longitudinal Analysis with >2 Time Points...38
  2.4 Discussion...44
Chapter 3 Regression Analysis for Differentially Misclassified Binary Covariates...47
  3.1 Univariate Case...47
    3.1.1 Model Specification...47
    3.1.2 External Validation: Non-differential Misclassification...48
    3.1.3 Internal Validation: Differential Misclassification...50
    3.1.4 Note on Impact of Mis-specifying X|C Model...52
  3.2 Extension to Repeated Measures...52
    3.2.1 Model Specification...52
    3.2.2 External Validation: Non-Differential Misclassification...54
    3.2.3 Internal Validation: Differential...55
    3.2.4 Estimation...56
  3.3 Simulation Studies...57
    3.3.1 External Validation in Univariate Case: Non-Differential Misclassification...57
    3.3.2 Internal Validation in Univariate Case: Differential Misclassification...59
    3.3.3 External Validation in Longitudinal Case: Non-Differential Misclassification...61
    3.3.4 Internal Validation in Longitudinal Case: Differential Misclassification...63
  3.4 Example...65
    3.4.1 HERS Example...65
    3.4.2 Example 1: Univariate Analysis with Visit 4...65
    3.4.3 Example 2: Longitudinal Analysis...68
  3.5 Discussion...71
Chapter 4 Misclassification in Response and Predictor Variables in 2×2 Tables...73
  4.1 Methods...73
    4.1.1 Notations and Terminology...73
    4.1.2 Maximum Likelihood (ML) Approach...76
    4.1.3 Generalized Matrix Method...77
    4.1.4 Generalized Inverse Matrix Method...78
    4.1.5 Estimation of Misclassification Probabilities and Variance...79
    4.1.6 Notes on Case-Control Studies...82
    4.1.7 Model Selection...85
    4.1.8 Comments Regarding Null Testing...86
  4.2 Simulation Studies...90
    4.2.1 Study I: Mimicking Real-data Example...90
    4.2.2 Study II: Different Types of Misclassification...92
    4.2.3 Study III: Performance of Model Selection...97
    4.2.4 Study IV: Misclassification in Case-Control Studies...100
  4.3 Example...101
  4.4 Discussion...104
Chapter 5 Misclassification in Response and Predictor Variables in Logistic Regression...108
  5.1 Methods...108
    5.1.1 Notation...108
    5.1.2 Independent Nondifferential Misclassification...108
    5.1.3 Independent Differential Misclassification...110
    5.1.4 Dependent and Differential Misclassification...113
    5.1.5 Other Types of Misclassification...114
  5.2 Example...115
  5.3 Simulation Studies...122
  5.4 Discussion...124
References...125


Full text: Dissertation, 142 pages (PDF, 1.4 MB), access copy
Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.