Multi-Source Uncertain Entity Resolution at Yad Vashem
Authors: | Ruth Bergman, Tomer Sagi, Avigdor Gal, Alexander Avram, Omer Barkol |
---|---|
Year of publication: | 2016 |
Subject: | Information retrieval; Computer science; Decision tree; 02 engineering and technology; Resolution (logic); Set (abstract data type); The Holocaust; 020204 information systems; Scale (social sciences); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Data mining; Cluster analysis; Multi-source |
Source: | SIGMOD Conference |
DOI: | 10.1145/2882903.2903737 |
Description: | In this work we describe an entity resolution project performed at Yad Vashem, the central repository of Holocaust-era information. Compared with classic entity resolution, the Yad Vashem dataset is unusual in two respects: it is massively multi-source, and it requires multi-level entity resolution. Given today's abundance of information sources, this project sets an example for multi-source resolution at big-data scale. We discuss a set of requirements that led us to choose the MFIBlocks entity resolution algorithm to achieve the goals of the application. We also present a machine learning approach, based on decision trees, that transforms soft clusters into a ranked clustering of records representing possible entities. An extensive empirical evaluation demonstrates the unique properties of this dataset, highlights the shortcomings of current methods, and proposes avenues for future research in this realm. |
Database: | OpenAIRE |
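The description mentions using decision trees to turn soft clusters (where one record may belong to several candidate clusters) into a ranked clustering of records representing possible entities. The sketch below illustrates that general idea under stated assumptions; it is not the authors' model. The `SoftCluster` type, its features (`mean_similarity`, `num_sources`), and the thresholds in `tree_score` are all hypothetical, and the tree is hand-built with fixed thresholds rather than learned from labeled data.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class SoftCluster:
    """Hypothetical candidate entity: a record id may appear in several clusters."""
    cluster_id: str
    records: FrozenSet[str]
    mean_similarity: float  # hypothetical feature: mean pairwise similarity in [0, 1]
    num_sources: int        # hypothetical feature: distinct sources contributing records


def tree_score(c: SoftCluster) -> float:
    """Hand-built decision tree (illustrative thresholds, NOT learned) returning
    a confidence that the cluster corresponds to a real entity."""
    if c.mean_similarity >= 0.8:
        # Strong similarity; corroboration from multiple sources raises confidence.
        return 0.9 if c.num_sources >= 2 else 0.7
    if c.mean_similarity >= 0.5:
        # Moderate similarity; small clusters are trusted more than large ones.
        return 0.5 if len(c.records) <= 5 else 0.3
    return 0.1


def rank_entities(clusters: List[SoftCluster]) -> Dict[str, List[Tuple[str, float]]]:
    """For each record, list the candidate clusters that contain it, best first."""
    ranked: Dict[str, List[Tuple[str, float]]] = {}
    for c in clusters:
        score = tree_score(c)
        for r in c.records:
            ranked.setdefault(r, []).append((c.cluster_id, score))
    for r in ranked:
        # Descending score; tie-break on cluster id so the output is deterministic.
        ranked[r].sort(key=lambda t: (-t[1], t[0]))
    return ranked


# Usage with three overlapping (soft) clusters over five records.
clusters = [
    SoftCluster("A", frozenset({"r1", "r2", "r3"}), 0.85, 3),
    SoftCluster("B", frozenset({"r3", "r4"}), 0.6, 1),
    SoftCluster("C", frozenset({"r1", "r5"}), 0.4, 2),
]
result = rank_entities(clusters)
print(result["r1"])  # [('A', 0.9), ('C', 0.1)]
print(result["r3"])  # [('A', 0.9), ('B', 0.5)]
```

In a learned setting, a classifier trained on clusters labeled as true or false entities would take the place of `tree_score`, while the per-record ranking step would stay the same.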