Automated text content identification for document processing using a kernel-based support Vector Selection approach

Authors:	Monique P. Fargues, Steven M. Benveniste
Year of publication:	2009
Subject:	Computer science; Latent semantic analysis; Feature vector; Linear discriminant analysis; Document processing; Machine learning; Support vector machine; Statistical classification; Kernel (linear algebra); ComputingMethodologies_PATTERNRECOGNITION; Text mining; Categorization; Polynomial kernel; Artificial intelligence; Data mining; Classifier (UML)
Source:	2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers.
DOI:	10.1109/acssc.2009.5469831
Description:	Automated text analysis and mining tools designed to identify the main topics of texts, chat room discussions, and web postings are an increasingly active research area due to the rapid growth of Web information. This paper applies the nonlinear kernel-based Feature Vector Selection (FVS) approach, followed by a Linear Discriminant Analysis (LDA) step, to categorize unstructured text documents. Results are compared to those obtained using the Latent Semantic Analysis (LSA) approach commonly used in text categorization applications. Overall results, taking into account both classification performance and computational load, show that FVS-LDA implemented with a polynomial kernel of degree 1 and an added constant of 1 is the best classifier for the database considered.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::3d2b3bfb853eaf0806af0693c4925d58 https://doi.org/10.1109/acssc.2009.5469831
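
Note on the kernel named in the description: the record does not give the kernel formula, so the following is only a minimal sketch that assumes the common form k(x, y) = (x . y + c)^d, with d = 1 and c = 1 as stated in the abstract. The function name `polynomial_kernel` and the toy term-count vectors are hypothetical and are not taken from the paper; the FVS and LDA steps themselves are not shown.

```python
import numpy as np


def polynomial_kernel(X, Y, degree=1, constant=1.0):
    """Return the Gram matrix K[i, j] = (X[i] . Y[j] + constant) ** degree.

    Assumed standard polynomial kernel form; the defaults mirror the
    degree-1, constant-1 setting mentioned in the description.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return (X @ Y.T + constant) ** degree


# Hypothetical toy data: three documents described by four term counts.
docs = np.array([
    [2, 0, 1, 0],
    [0, 1, 0, 3],
    [1, 1, 0, 0],
])

# Pairwise kernel values between all documents (3 x 3, symmetric).
K = polynomial_kernel(docs, docs)
print(K)
```

With degree 1 the kernel reduces to an inner product shifted by the constant, so under this assumed form the feature space is an affine copy of the input space, which may help explain the low computational load the description mentions.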