Multi-source entity resolution for genealogical data
Author: | Efremova, J., Ranjbar-Sahraei, Bijan, Rahmani, H., Oliehoek, F., Calders, T., Tuyls, K., Weiss, G., Bloothooft, G., Christen, P., Mandemakers, K., Schraagen, M. |
---|---|
Contributors: | DKE Scientific staff, RS: FSE DACS, RS: FSE DACS RAI, Dept. of Advanced Computing Sciences |
Language: | English |
Publication year: | 2015 |
Subject: | |
Source: | Population Reconstruction, pp. 129-154. ISBN: 9783319198835 |
DOI: | 10.1007/978-3-319-19884-2_7 |
Description: | In this chapter, we study the application of existing entity resolution (ER) techniques to a real-world multi-source genealogical dataset. Our goal is to identify all persons involved in various notary acts and link them to their birth, marriage, and death certificates. We analyze the influence of additional ER features, such as name popularity, geographical distance, and co-reference information, on overall ER performance. We study two prediction models: regression trees and logistic regression. To evaluate the performance of the applied algorithms and to obtain a training set for learning the models, we developed an interactive interface for collecting feedback from human experts. We perform an empirical evaluation on the manually annotated dataset in terms of precision, recall, and F-score. We show that using name popularity and geographical distance together with co-reference information helps to significantly improve ER results. Keywords (machine-generated, not supplied by the authors): death certificate, entity resolution, candidate pair, natural language processing technique, genealogical data. |
Database: | OpenAIRE |
External link: |