Hierarchical classification of data with long-tailed distributions via global and local granulation
Authors: | Yaojin Lin, Hong Zhao, Shunxin Guo |
---|---|
Publication year: | 2021 |
Subject: | Information Systems and Management; Computer science; business.industry; Knowledge organization; Data classification; WordNet; Pattern recognition; Spectral clustering; Computer Science Applications; Theoretical Computer Science; Hierarchical classifier; ComputingMethodologies_PATTERNRECOGNITION; Hotspot (Wi-Fi); Artificial Intelligence; Control and Systems Engineering; Artificial intelligence; business; Global optimization; Classifier (UML); Software |
Source: | Information Sciences. 581:536-552 |
ISSN: | 0020-0255 |
DOI: | 10.1016/j.ins.2021.09.059 |
Description: | Automated learning from datasets with a long-tailed distribution has gradually become a research hotspot due to the increasing complexity of large-scale real-world datasets. Existing solutions to long-tailed data classification usually involve re-balancing strategies for global optimization, which can achieve satisfactory results. However, re-balancing strategies tend to alter the original data. In this paper, we propose a knowledge granulation method based on global and local granulation to assist the hierarchical classification of long-tailed data without altering the original data. Firstly, a global classifier is constructed based on the WordNet knowledge organization’s hierarchical structure, which is used to granulate the global data from coarse to fine. Secondly, a local hierarchical classifier adapted to tail data is constructed for tail classes that contain few samples. The hierarchical structure of this local classifier is obtained by granulating the data via spectral clustering rather than by using the semantic hierarchy of classes. Finally, the global classifier is used to preliminarily classify samples, then uncertain samples are further classified by the tail local classifier. |
Database: | OpenAIRE |
External link: |
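
The final step in the description (the global classifier makes a preliminary prediction, and uncertain samples are passed to the tail local classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say how uncertainty is measured, so the maximum-probability `threshold` below is an assumption, and `classify_with_fallback`, `toy_global`, and `toy_tail` are hypothetical names standing in for trained models.

```python
from typing import Callable, Dict, Hashable, Tuple

Probs = Dict[Hashable, float]


def classify_with_fallback(
    sample: object,
    global_predict: Callable[[object], Probs],
    tail_predict: Callable[[object], Probs],
    threshold: float = 0.6,  # assumed uncertainty rule, not from the paper
) -> Tuple[Hashable, str]:
    """Return (label, source), where source is 'global' or 'tail'."""
    # Preliminary classification by the global classifier.
    global_probs = global_predict(sample)
    label, confidence = max(global_probs.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return label, "global"
    # The sample is treated as uncertain: defer to the tail local classifier.
    tail_probs = tail_predict(sample)
    tail_label, _ = max(tail_probs.items(), key=lambda kv: kv[1])
    return tail_label, "tail"


# Hypothetical stand-ins for trained classifiers, for illustration only.
def toy_global(sample: object) -> Probs:
    if sample == "common":
        return {"dog": 0.9, "okapi": 0.1}
    return {"dog": 0.5, "okapi": 0.5}


def toy_tail(sample: object) -> Probs:
    return {"okapi": 0.8, "tapir": 0.2}


print(classify_with_fallback("common", toy_global, toy_tail))
print(classify_with_fallback("rare", toy_global, toy_tail))
```

In this toy setup the confident sample keeps its global label, while the ambiguous one is relabeled by the tail classifier. A real system would replace the threshold rule with whatever uncertainty criterion the paper defines.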