Abstract: |
Despite recent advances, data integration remains one of the main challenges for the consolidation of the Web of Data and a key aspect of semantic web data management. Most solutions rely on entity resolution, the task of identifying and linking different manifestations of the same real-world object in one or more datasets. However, data are usually incomplete and inconsistent and contain outliers; to overcome these limitations, it is necessary to exploit the existing patterns in the data as much as possible. One way to go beyond the commonly used technique of pairwise matching is to exploit the relationship structure between entities. Moreover, with billions of RDF triples being published on the Web, scale has become a problem that poses new challenges. Only recently have some works started to consider strategies that can handle entity resolution in large-scale datasets. In this paper we describe a Map-Reduce strategy for a relational learning approach that addresses the problem through a statistical approximation method based on a linear algebra technique. We parallelized all steps of the approach. Preliminary experiments show that our strategy scales well with real-world semantic datasets, maintaining the effectiveness of the results even as the amount of processed data increases. [ABSTRACT FROM AUTHOR] |