Hierarchical Multi-Label Classification Using Web Reasoning for Large Datasets

Authors: Peixoto, Rafael; Hassan, Thomas; Cruz, Christophe; Bertaux, Aurélie; Silva, Nuno
Contributors: Polytechnic Institute of Porto, Laboratoire Electronique, Informatique et Image (Le2i), Université de Bourgogne (UB)-AgroSup Dijon - Institut National Supérieur des Sciences Agronomiques, de l'Alimentation et de l'Environnement-Centre National de la Recherche Scientifique (CNRS), Laboratoire Electronique, Informatique et Image [UMR6306] (Le2i), Université de Bourgogne (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Arts et Métiers (ENSAM), Arts et Métiers Sciences et Technologies, HESAM Université (HESAM)-HESAM Université (HESAM)-Arts et Métiers Sciences et Technologies, HESAM Université (HESAM)-HESAM Université (HESAM)-AgroSup Dijon - Institut National Supérieur des Sciences Agronomiques, de l'Alimentation et de l'Environnement
Language: English
Year of publication: 2016
Subject:
Source: Open Journal Of Semantic Web
Open Journal Of Semantic Web, Research Online Publishing (RonPub), 2016, 〈10.19210/1006.3.1.1〉
ISSN: 2199-336X
Description: International audience; Extracting valuable data from large volumes of data is one of the main challenges in Big Data. In this paper, a Hierarchical Multi-Label Classification (HMC) process called Semantic HMC is presented. This process aims to extract valuable data from very large data sources by automatically learning a label hierarchy and classifying data items. The Semantic HMC process is composed of five scalable steps, namely Indexation, Vectorization, Hierarchization, Resolution and Realization. The first three steps automatically construct a label hierarchy from a statistical analysis of the data. This paper focuses on the last two steps, which classify items according to the label hierarchy. The process is implemented as a scalable, distributed application and deployed on a Big Data platform. A quality evaluation is described that compares the approach with state-of-the-art multi-label classification algorithms designed for the same goal. The Semantic HMC approach outperforms state-of-the-art approaches in some areas.
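This record does not describe how the Resolution and Realization steps assign labels. As a rough illustration of what a hierarchy-consistent multi-label assignment looks like, here is a minimal Python sketch. It is not the Semantic HMC algorithm, which relies on web reasoning. The label names, the per-label `scores` input, the `PARENT` map, and the 0.5 threshold are all hypothetical. The sketch assigns every label whose score passes the threshold, then adds all ancestors of those labels, so an item is never placed in a label without also being placed in that label's parents.

```python
from typing import Dict, Optional, Set

# Hypothetical label hierarchy (assumed acyclic): child label -> parent label,
# None for root labels. Not taken from the paper.
PARENT: Dict[str, Optional[str]] = {
    "sports": None,
    "football": "sports",
    "world_cup": "football",
    "economy": None,
    "markets": "economy",
}


def ancestors(label: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return all strict ancestors of `label` by following parent links upward."""
    result: Set[str] = set()
    current = parent.get(label)
    while current is not None:
        result.add(current)
        current = parent.get(current)
    return result


def classify(
    scores: Dict[str, float],
    parent: Dict[str, Optional[str]],
    threshold: float = 0.5,  # hypothetical cut-off, not a value from the paper
) -> Set[str]:
    """Assign each label whose score meets `threshold`, then add its ancestors
    so the result respects the label hierarchy."""
    direct = {label for label, s in scores.items() if s >= threshold}
    closed = set(direct)
    for label in direct:
        closed |= ancestors(label, parent)
    return closed


item_scores = {"world_cup": 0.8, "markets": 0.3}
print(sorted(classify(item_scores, PARENT)))  # ['football', 'sports', 'world_cup']
```

Closing the result under ancestors has the same effect a reasoner obtains from a transitive subclass relation. An item classified in `world_cup` is therefore also in `football` and `sports`.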
Database: OpenAIRE