Examination of the Impacts of Enhancing the ATra Black Box Matching Algorithm to Increase Case Integration in HIV Surveillance Systems: An Evaluation Study (Preprint)

Authors: Auntre Dojuan Hamp, Helen E Karn, Frances Y Kwon, James Carrier, Reshma Bhattacharjee, Colin Flynn, Trevor Hsu, Anne Rhodes, John McNeice, Bridget J Anderson, Joyce Chicoine, Jessica Fridge, Justice King, Garret R Lum, Tej Mishra, Alisa Kang, J C Smart
Year of publication: 2020
DOI: 10.2196/preprints.25282
Description: BACKGROUND HIV surveillance data are essential for monitoring disease trends and ending the HIV epidemic. Because of strict data security and confidentiality policies, identifiable HIV surveillance data are not routinely shared across United States (U.S.) public health jurisdictions, except through two processes: the Routine Interstate Duplicate Review (RIDR), a biannual case-by-case review, and the Comprehensive Interstate Duplicate Review (CIDR), a quinquennial review. In the U.S., migration and care-seeking across geographic and public health boundaries make accurate, timely, and complete HIV surveillance data harder to achieve. To address these issues, a number of public health jurisdictions use the ATra Black Box (a secure, electronic, privacy-assuring system developed by Georgetown University) to identify and confirm potential duplicate case records, exchange data, and perform other data analytics. The ATra Black Box aims to reduce the burden on jurisdictions of conducting the RIDR/CIDR processes and to improve the quality of data in each jurisdiction's Enhanced HIV/AIDS Reporting System (eHARS). OBJECTIVE This paper evaluates how well two software algorithms in the ATra Black Box identify potential pairs of duplicate case records across multiple jurisdictions for persons living with diagnosed HIV (PWDH). We hypothesized that the algorithm containing rules to examine all known first and last names of a PWDH case record would perform significantly better than the algorithm that examines only one first name and last name per case record. METHODS Two software algorithms for identifying potential duplicate pairs of case records (matching algorithms) were implemented in the ATra Black Box. For quality assurance precision testing, input files of test data were created for each jurisdiction. Each test file contained randomly generated values as well as a predetermined number of hand-crafted match pairs prepared by one of the authors (F. Kwon). Output reports were examined to verify that each matching algorithm identified the hand-crafted input match pairs (matches) correctly according to its rules. The two algorithms were then used by six public health jurisdictions to identify matches in their PWDH data files, and each jurisdiction compared the outputs to determine which algorithm yielded more duplicate case pairs. RESULTS The matching algorithm with rules to inspect all first and last names for a PWDH case record, including the legal name and all aliases, performed significantly better than the algorithm that inspected only one first and last name. The All Names matching algorithm identified 9,070 (4.5%) more duplicate matches than the Single Name matching algorithm. CONCLUSIONS HIV data deduplication across multiple public health jurisdictions is more effective when all known first and last names of PWDH are searched than when only one first and last name is searched.
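The difference between the Single Name and All Names approaches can be illustrated with a minimal Python sketch. The abstract does not publish the ATra Black Box matching rules, so everything here is a hypothetical assumption: the `CaseRecord` layout, exact matching on normalized names plus birth date, and the helper names (`single_name_match`, `all_names_match`, `find_duplicate_pairs`). It shows only why comparing every known name (legal name and aliases) can surface cross-jurisdiction duplicates that a single-name comparison misses.

```python
from dataclasses import dataclass, field


@dataclass
class CaseRecord:
    # Hypothetical record layout; the real eHARS/ATra fields are not described here.
    record_id: str
    jurisdiction: str
    birth_date: str  # assumed ISO date string, e.g. "1980-01-31"
    # (first, last) name pairs; index 0 is assumed to be the legal name, the rest aliases
    names: list[tuple[str, str]] = field(default_factory=list)


def normalize(name: str) -> str:
    # Case-fold and keep letters only, so "O'Neil" and "ONEIL" compare equal
    return "".join(ch for ch in name.casefold() if ch.isalpha())


def name_key(first: str, last: str) -> tuple[str, str]:
    return (normalize(first), normalize(last))


def single_name_match(a: CaseRecord, b: CaseRecord) -> bool:
    # Hypothetical "Single Name" rule: compare only the first listed name of each record
    return (a.birth_date == b.birth_date
            and name_key(*a.names[0]) == name_key(*b.names[0]))


def all_names_match(a: CaseRecord, b: CaseRecord) -> bool:
    # Hypothetical "All Names" rule: match if ANY known name of a equals ANY known name of b
    if a.birth_date != b.birth_date:
        return False
    keys_a = {name_key(f, l) for f, l in a.names}
    keys_b = {name_key(f, l) for f, l in b.names}
    return not keys_a.isdisjoint(keys_b)


def find_duplicate_pairs(records, matcher):
    # Naive all-pairs comparison for illustration; a production system would use blocking
    pairs = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            # Only pairs from different jurisdictions are treated as candidates here
            if a.jurisdiction != b.jurisdiction and matcher(a, b):
                pairs.append((a.record_id, b.record_id))
    return pairs


if __name__ == "__main__":
    a = CaseRecord("A1", "Jurisdiction A", "1980-01-31",
                   [("Jane", "Doe"), ("Janie", "Smith")])
    b = CaseRecord("B1", "Jurisdiction B", "1980-01-31", [("Janie", "Smith")])
    print(find_duplicate_pairs([a, b], single_name_match))  # []
    print(find_duplicate_pairs([a, b], all_names_match))    # [('A1', 'B1')]
```

In this toy example, record `B1` carries only the alias recorded for `A1`, so the single-name rule misses the pair while the all-names rule finds it. Exact matching is used purely for clarity; real record linkage typically adds fuzzy name comparison and blocking, which this sketch does not attempt to reproduce.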
Database: OpenAIRE