A new incremental growing neural gas algorithm based on clusters labeling maximization: application to clustering of heterogeneous textual data

Author:	Jean-Charles Lamirel, Zied Boulila, Maha Ghribi, Pascal Cuxac, Claire François
Contributors:	Natural Language Processing: representation, inference and semantics (TALARIS), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Université Henri Poincaré - Nancy 1 (UHP), Université Nancy 2, Institut National Polytechnique de Lorraine (INPL), Centre National de la Recherche Scientifique (CNRS), Institut de l'information scientifique et technique (INIST), Lamirel, Jean-Charles, Nicolás García-Pedrajas, Francisco Herrera, José Manuel Benítez, Colin Fyfe, Moonis Ali
Language:	English
Publication year:	2010
Subject:	[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]; Neural gas; Computer science; Population-based incremental learning; [INFO.INFO-NE] Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]; Context (language use); Cluster result; [STAT.OT] Statistics [stat]/Other Statistics [stat.ML]; computer.software_genre; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG]; [STAT.ML] Statistics [stat]/Machine Learning [stat.ML]; [INFO.INFO-PF] Computer Science [cs]/Performance [cs.PF]; Homogeneous Dataset; Winning Neuron; Cluster analysis; ComputingMilieux_MISCELLANEOUS; business.industry; Pattern recognition; Maximization; [SCCO.LING] Cognitive science/Linguistics; Cluster Label; Homogeneous; [INFO.INFO-IR] Computer Science [cs]/Information Retrieval [cs.IR]; Textual Dataset; Artificial intelligence; Data mining; business; Algorithm; computer; Distance based
Source:	23rd International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA-AIE 2010), Jun 2010, Cordoba, Spain, pp.139-148, ⟨10.1007/978-3-642-13033-5_15⟩; Trends in Applied Intelligent Systems, ISBN: 9783642130328; IEA/AIE (3); IEA/AIE'10: Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems; HAL
DOI:	10.1007/978-3-642-13033-5_15
Description:	International audience; Neural clustering algorithms perform well in the usual setting of analyzing a homogeneous textual dataset. This is especially true of the recent adaptive versions of these algorithms, such as the incremental growing neural gas algorithm (IGNG). Nevertheless, this paper clearly highlights the drastic drop in performance of these algorithms, as well as of more classical algorithms, when a heterogeneous textual dataset is used as input. A new incremental growing neural gas algorithm that incrementally exploits knowledge derived from the current labeling of the clusters is proposed as an alternative to the original distance-based algorithm. This solution yields a very significant increase in performance for the clustering of heterogeneous textual data. Moreover, it gives the proposed algorithm a genuinely incremental character.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e94d8e06630f055cc47bc0b5b5606ad4 https://hal.inria.fr/inria-00535942