A new incremental growing neural gas algorithm based on clusters labeling maximization: application to clustering of heterogeneous textual data

Authors: Jean-Charles Lamirel, Zied Boulila, Maha Ghribi, Pascal Cuxac, Claire François
Contributors: Natural Language Processing: representation, inference and semantics (TALARIS), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Université Henri Poincaré - Nancy 1 (UHP), Université Nancy 2, Institut National Polytechnique de Lorraine (INPL), Centre National de la Recherche Scientifique (CNRS), Institut de l'information scientifique et technique (INIST), Jean-Charles Lamirel, Nicolás García-Pedrajas, Francisco Herrera, José Manuel Benítez, Colin Fyfe, Moonis Ali
Language: English
Year of publication: 2010
Subject:
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]
Neural gas
Computer science
Population-based incremental learning
[INFO.INFO-NE] Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]
Context (language use)
Cluster result
[STAT.OT] Statistics [stat]/Other Statistics [stat.OT]
[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
[STAT.ML] Statistics [stat]/Machine Learning [stat.ML]
[INFO.INFO-PF] Computer Science [cs]/Performance [cs.PF]
Homogeneous Dataset
Winning Neuron
Cluster analysis
Pattern recognition
Maximization
[SCCO.LING] Cognitive science/Linguistics
Cluster Label
Homogeneous
[INFO.INFO-IR] Computer Science [cs]/Information Retrieval [cs.IR]
Textual Dataset
Artificial intelligence
Data mining
Algorithm
Distance based
Source: 23rd International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA-AIE 2010), Jun 2010, Cordoba, Spain, pp. 139-148
Trends in Applied Intelligent Systems, ISBN: 9783642130328
IEA/AIE (3)
HAL
IEA/AIE'10: Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems
DOI: 10.1007/978-3-642-13033-5_15
Description: International audience; Neural clustering algorithms achieve high performance in the usual context of analyzing homogeneous textual datasets. This is especially true of recent adaptive versions of these algorithms, such as the incremental growing neural gas algorithm (IGNG). Nevertheless, this paper clearly highlights the drastic drop in performance of these algorithms, as well as that of more classical algorithms, when a heterogeneous textual dataset is used as input. A new incremental growing neural gas algorithm, which incrementally exploits knowledge derived from the current labeling of the clusters, is proposed as an alternative to the original distance-based algorithm. This solution yields a very significant increase in performance for the clustering of heterogeneous textual data. Moreover, it gives the proposed algorithm a truly incremental character.
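The abstract does not define the labeling criterion, so the following is only a minimal sketch of the general idea of replacing a distance-based winner (as in the original IGNG) with a label-based one. It assumes a recall/precision-style labeling measure: for cluster c and feature f, FR is the share of f's total weight that falls in c, FP is the share of c's weight carried by f, and FF is their harmonic mean. The scoring rule (mean FF of the incoming document's features after tentatively adding it to a cluster) and the `min_score` threshold for opening a new cluster are hypothetical illustrations, not the paper's exact method.

```python
from collections import defaultdict

# Assumption: a document is a sparse, non-negative feature vector (feature -> weight).
Doc = dict[str, float]


def feature_f_measures(clusters: list[list[Doc]]) -> list[dict[str, float]]:
    """Feature F-measure FF_c(f) = 2*FR*FP / (FR + FP) for every cluster c.

    FR_c(f): share of feature f's total weight that falls in cluster c.
    FP_c(f): share of cluster c's total weight carried by feature f.
    """
    total: dict[str, float] = defaultdict(float)  # weight of f over all clusters
    per_cluster: list[dict[str, float]] = []
    for docs in clusters:
        acc: dict[str, float] = defaultdict(float)
        for doc in docs:
            for f, w in doc.items():
                acc[f] += w
                total[f] += w
        per_cluster.append(acc)

    result = []
    for acc in per_cluster:
        mass = sum(acc.values())  # total feature weight inside the cluster
        ff = {}
        for f, w in acc.items():
            if w > 0:
                fr, fp = w / total[f], w / mass
                ff[f] = 2 * fr * fp / (fr + fp)
        result.append(ff)
    return result


def label_score(clusters: list[list[Doc]], k: int, doc: Doc) -> float:
    """Hypothetical criterion: mean FF of doc's features in cluster k after adding doc to k."""
    trial = [list(docs) for docs in clusters]  # copy so the real clusters are untouched
    trial[k].append(doc)
    ff = feature_f_measures(trial)[k]
    feats = [f for f, w in doc.items() if w > 0]
    return sum(ff.get(f, 0.0) for f in feats) / len(feats) if feats else 0.0


def add_document(clusters: list[list[Doc]], doc: Doc, min_score: float = 0.5) -> int:
    """Attach doc to the best-labelled cluster, or open a new one; return its index.

    min_score is a hypothetical threshold playing the role of IGNG's distance threshold.
    """
    scores = [label_score(clusters, k, doc) for k in range(len(clusters))]
    if not scores or max(scores) < min_score:
        clusters.append([doc])
        return len(clusters) - 1
    best = max(range(len(scores)), key=scores.__getitem__)
    clusters[best].append(doc)
    return best


# Usage: documents arrive one at a time, as in an incremental setting.
clusters: list[list[Doc]] = []
stream = [
    {"neural": 1.0, "gas": 1.0},
    {"text": 1.0, "corpus": 1.0},
    {"neural": 1.0, "network": 1.0},
]
print([add_document(clusters, d) for d in stream])  # → [0, 1, 0]
```

The third document shares the feature "neural" with the first cluster, so placing it there yields the better label quality (8/15 versus 11/30), whereas the second document shares nothing with the first cluster and falls below the threshold, opening a new cluster. Recomputing every cluster's measures per candidate keeps the sketch short; an incremental implementation would presumably update the per-feature sums in place.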
Database: OpenAIRE