Addressing bias from non-random missing attributes in health data
Author: | Nicholas J. Napoli, William F. Barnhardt, Madeline E. Kotoriy, Jeffrey S. Young, Laura E. Barnes |
---|---|
Year of publication: | 2017 |
Subject: | Selection bias; Data collection; Data management; Sample (statistics); Missing data; Data set; Health care; Medicine; Quality (business); Data mining; 03 medical and health sciences; 0301 basic medicine; 030104 developmental biology; 0302 clinical medicine; 030212 general & internal medicine |
Source: | BHI |
DOI: | 10.1109/bhi.2017.7897256 |
Description: | This paper aims to improve health outcomes research and data management practices. Typically, health care records are very large and cumbersome to manage, and data quality is often overlooked because the volume is thought to be large enough to overcome issues arising from missing data. However, simply removing observations with missing data is problematic: the missing information is not randomly distributed, so the sample used for analysis becomes biased. We propose a method for evaluating and addressing bias in the data cleaning process. Specifically, we identify where bias exists within the data and address it by sub-sampling or discarding data. We present a case study analyzing data from a level 1 trauma center to establish how bias arises in health registries and how it can have downstream implications for evaluating hospital performance. Our method uses a two-tailed z-test to compare subgroups in the data set, which demonstrates how missing data in these subgroups can lead to bias. We demonstrate how to localize the bias to particular subgroups and provide corrective actions to handle it. We also show how failure to account for bias can distort measured performance, illustrating the importance of the proposed method. |
Database: | OpenAIRE |
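
The description above mentions a two-tailed z-test for comparing subgroups and localizing bias caused by missing data. The record does not give the exact statistic, so the following is only a minimal sketch under an assumption: a pooled two-proportion z-test that compares the rate of missing records in one subgroup against the rate in all other subgroups. The function names `two_proportion_z_test` and `flag_subgroups`, the `alpha` threshold, and the input layout are hypothetical and are not taken from the paper.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed pooled z-test for the difference between two proportions.

    x1/n1 and x2/n2 are the counts of "events" (here: records with a
    missing attribute) over the group sizes. Returns (z, p_value).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("both groups must be non-empty")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both proportions are identical (all 0 or all 1): no difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-tailed p-value of a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def flag_subgroups(counts, alpha=0.05):
    """Flag subgroups whose missing-data rate differs from the rest.

    counts maps subgroup name -> (n_missing, n_total). Each subgroup is
    compared against all other subgroups combined (an assumed design;
    at least two subgroups are required).
    """
    total_missing = sum(m for m, _ in counts.values())
    total_n = sum(n for _, n in counts.values())
    flagged = {}
    for name, (m, n) in counts.items():
        z, p = two_proportion_z_test(m, n, total_missing - m, total_n - n)
        if p < alpha:
            flagged[name] = (z, p)
    return flagged
```

A subgroup that is flagged this way has a missing-data rate that differs significantly from the remainder of the data set, which suggests that dropping incomplete records would shift the sample away from that subgroup or toward it. With many subgroups, a multiple-comparison correction on `alpha` would likely be needed; that choice is left open here.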