Fuzzy evolutionary optimization modeling and its applications to unsupervised categorization and extractive summarization

Author:	Soon Cheol Park, Lim Cheon Choi, Wei Song, Xiao Feng Ding
Contributors:	Song, Wei, Cheon, Choi Lim, Cheol, Park Soon, Ding, Xiao feng
Language:	English
Year of publication:	2011
Subject:	Fuzzy clustering; Computer science; General Engineering; Normalized Google distance; Fuzzy control system; Unsupervised categorization; Machine learning; Fuzzy logic; Automatic summarization; Computer Science Applications; Categorization; Artificial intelligence; Topics estimation; Data mining; Extractive summarization; Cluster analysis; Sentence; Fuzzy evolutionary optimization; Premature convergence
Description:	Modern information retrieval (IR) systems consist of many challenging components, e.g. clustering and summarization. Rather than requiring users to browse an entire data set, IR systems now present users with clusters of documents they are interested in and briefly summarize each document, which makes it easier to find the desired documents. This paper proposes fuzzy evolutionary optimization modeling (FEOM) and its applications to unsupervised categorization and extractive summarization. In view of the nature of biological evolution, we take advantage of several fuzzy control parameters to adaptively regulate the behavior of the evolutionary optimization, which can effectively prevent premature convergence to a locally optimal solution. As a portable, modular and extensively executable model, FEOM is first implemented for clustering text documents. The searching capability of FEOM is exploited to explore appropriate partitions of documents such that the similarity metric of the resulting clusters is optimized. To further investigate its effectiveness as a generic data clustering model, FEOM is then applied to extractive document summarization based on sentence clustering. It selects the most important sentence from each cluster to represent the overall meaning of the document. We demonstrate the improved performance through a series of experiments using standard test sets, e.g. the Reuters document collection, the 20-newsgroup corpus, DUC01 and DUC02, as evaluated by commonly used metrics, i.e. F-measure and ROUGE. The experimental results show that FEOM achieves performance as good as or better than state-of-the-art clustering and summarization systems. Refereed/Peer-reviewed
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::5e4dcff23af8c2c73550ae4fd67f6f1f https://hdl.handle.net/1959.8/122585
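
The abstract describes the summarization step only at a high level: sentences are clustered, and the most important sentence of each cluster is kept. The following is a minimal sketch of that selection step alone, not the authors' implementation. It assumes the sentence clusters are already given (FEOM itself is not reproduced here), and it assumes "most important" means "closest to the cluster centroid under cosine similarity of term-frequency vectors", a criterion the record does not state. All function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Term-frequency vector for one sentence (lowercased, whitespace tokens)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def representative(cluster):
    """Return the sentence most similar to the cluster centroid.

    ASSUMPTION: centroid similarity stands in for "importance";
    the source record does not say how importance is scored.
    """
    vectors = [bag_of_words(s) for s in cluster]
    centroid = Counter()
    for v in vectors:
        centroid.update(v)  # summed vector; cosine ignores the scale
    scores = [cosine(v, centroid) for v in vectors]
    return cluster[scores.index(max(scores))]


def summarize(clusters):
    """Pick one representative sentence per non-empty cluster."""
    return [representative(c) for c in clusters if c]


if __name__ == "__main__":
    clusters = [
        ["the cat sat", "the cat sat on the mat", "dogs bark"],
        ["stocks fell"],
    ]
    for sentence in summarize(clusters):
        print(sentence)
```

Using the summed vector as the centroid avoids a division, since cosine similarity does not depend on vector length. A different importance score (for example sentence position or a semantic distance such as normalized Google distance) could be substituted in `representative` without changing the rest of the sketch.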