JedAI: beyond batch, blocking-based Entity Resolution

Authors: George Papadakis, Leonidas Tsekouras, Manos Thanos, Nikiforos Pittaras, Giovanni Simonini, Dimitrios Skoutas, Paul Isaris, George Giannakopoulos, Themis Palpanas, Manolis Koubarakis
Year of publication: 2020
Description: JedAI is an open-source toolkit that allows for building and benchmarking thousands of schema-agnostic Entity Resolution (ER) pipelines through a non-learning, blocking-based end-to-end workflow. In this paper, we present its latest release, JedAI3, which introduces two new end-to-end workflows: one for budget-agnostic ER that is based on similarity joins, and one for budget-aware (i.e., progressive) ER. This version also adds support for pre-trained word or character embeddings and connects JedAI to the Python data analysis ecosystem. Overall, these enhancements provide JedAI with features offered by no other ER tool, especially in the schema- and domain-agnostic context.
Database: OpenAIRE