ADGN: An Algorithm for Record Linkage Using Address, Date of Birth, Gender, and Name

Author: Stephen Ansolabehere, Eitan D. Hersh
Language: English
Publication year: 2017
Source: Statistics and Public Policy, Vol 4, Iss 1, Pp 1-10 (2017)
Document type: article
ISSN: 2330-443X
DOI: 10.1080/2330443X.2017.1389620
Description: This article presents an algorithm for record linkage that uses multiple indicators derived from combinations of fields commonly found in databases. Specifically, the quadruplet of Address (A), Date of Birth (D), Gender (G), and Name (N), as well as any triplet of A-D-G-N (i.e., ADG, ADN, AGN, and DGN), links records with an extremely high likelihood. Matching on multiple identifiers avoids problems of missing data, inconsistent fields, and typographical errors. We show, using a very large database from the State of Texas, that exact matches using combinations of A, D, G, and N produce a match rate comparable to that of the 9-digit Social Security Number. Further examination of the linkage rates shows that reporting the data at a higher level of aggregation (for example, Birth Year instead of Date of Birth, with names omitted) makes correct matches between databases highly unlikely, protecting an individual’s records.
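The linkage rule summarized above (two records are linked when they agree exactly on the full ADGN quadruplet or on any of its four triplets) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the field names (`address`, `dob`, `gender`, `name`), the `id` key, the function names, and the sample records are all hypothetical, and the sketch assumes the fields have already been standardized so that exact string comparison is meaningful.

```python
from itertools import combinations

# Hypothetical field names standing in for A, D, G, and N.
FIELDS = ("address", "dob", "gender", "name")
# The four triplets (ADG, ADN, AGN, DGN). A full ADGN match satisfies all of them.
TRIPLETS = list(combinations(FIELDS, 3))


def triplet_keys(record):
    """Yield a (triplet, values) key for every triplet with no missing field."""
    for triplet in TRIPLETS:
        values = tuple(record.get(field) for field in triplet)
        if all(value not in (None, "") for value in values):
            yield triplet, values


def link(left, right):
    """Return sorted (left_id, right_id) pairs that agree exactly on at least one triplet."""
    index = {}
    for rec in right:
        for key in triplet_keys(rec):
            index.setdefault(key, set()).add(rec["id"])
    pairs = set()
    for rec in left:
        for key in triplet_keys(rec):
            for right_id in index.get(key, ()):
                pairs.add((rec["id"], right_id))
    return sorted(pairs)


# Made-up sample records, for illustration only.
left = [
    {"id": "L1", "address": "12 OAK ST", "dob": "1980-03-14", "gender": "F", "name": "ANA DIAZ"},
    {"id": "L2", "address": "9 ELM AVE", "dob": "1975-07-01", "gender": "M", "name": "BO LEE"},
]
right = [
    # Name differs by a typo: still links through the ADG triplet.
    {"id": "R1", "address": "12 OAK ST", "dob": "1980-03-14", "gender": "F", "name": "ANNA DIAZ"},
    # Address is missing: still links through the DGN triplet.
    {"id": "R2", "address": None, "dob": "1975-07-01", "gender": "M", "name": "BO LEE"},
    # Only address and name agree: no triplet matches, so no link.
    {"id": "R3", "address": "9 ELM AVE", "dob": "1975-07-02", "gender": "F", "name": "BO LEE"},
]

print(link(left, right))
```

Indexing one file by each triplet key avoids comparing every pair of records, and skipping triplets that contain a missing value reflects the abstract's point that matching on multiple identifiers tolerates missing or inconsistent fields.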
Database: Directory of Open Access Journals