A Concept Vector Space Model for Semantic Kernels
Author: | Sujeevan Aseervatham |
---|---|
Contributors: | Laboratoire d'Informatique de Paris-Nord (LIPN), Université Paris 13 (UP13)-Institut Galilée-Université Sorbonne Paris Cité (USPC)-Centre National de la Recherche Scientifique (CNRS), Aseervatham, Sujeevan |
Language: | English |
Year of publication: | 2008 |
Subject: | [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing; [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG]; Support Vector Machine; Computer science; 02 engineering and technology; computer.software_genre; Semantic Similarity; Artificial Intelligence; Polynomial kernel; String kernel; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Probabilistic latent semantic analysis; business.industry; Pattern recognition; Kernel; Kernel method; Kernel embedding of distributions; Kernel (statistics); Radial basis function kernel; 020201 artificial intelligence & image processing; Text categorization; Tree kernel; business; computer; Natural language processing |
Source: | International Journal on Artificial Intelligence Tools, World Scientific Publishing, 2008, 18 (2), to appear; HAL |
ISSN: | 0218-2130 |
Description: | Kernels are widely used in Natural Language Processing as similarity measures within inner-product-based learning methods such as the Support Vector Machine. The Vector Space Model (VSM) is extensively used to represent documents spatially, but it is a purely statistical representation. In this paper, we present a Concept Vector Space Model (CVSM) representation that uses linguistic prior knowledge to capture the meaning of documents. We also propose a linear kernel and a latent kernel for this space. The linear kernel takes advantage of the linguistic concepts, whereas the latent kernel combines statistical and linguistic concepts: it uses latent concepts extracted by Latent Semantic Analysis (LSA) in the CVSM. The kernels were evaluated on a text categorization task in the biomedical domain using the Ohsumed corpus, which is well known for being difficult to categorize. The results show that the CVSM improves performance compared to the VSM. |
Database: | OpenAIRE |
External link: |
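
The description above mentions a linear kernel defined over a concept space. As a rough illustration only, the sketch below maps tokens to concepts through a small hypothetical lexicon (`TERM_TO_CONCEPTS` and its concept identifiers are invented for this example and are not taken from the paper), builds bag-of-concepts vectors, and computes a cosine-normalized dot product. It does not reproduce the paper's weighting scheme or its LSA-based latent kernel.

```python
from collections import Counter
from math import sqrt

# Hypothetical term-to-concept lexicon, for illustration only. The paper
# relies on linguistic prior knowledge whose source is not given in this record.
TERM_TO_CONCEPTS = {
    "heart": ["C_HEART"],
    "cardiac": ["C_HEART"],
    "attack": ["C_INFARCTION"],
    "infarction": ["C_INFARCTION"],
    "kidney": ["C_KIDNEY"],
    "renal": ["C_KIDNEY"],
}


def concept_vector(text):
    """Map a document to a sparse bag of concepts (concept -> count)."""
    counts = Counter()
    for token in text.lower().split():
        # Tokens missing from the lexicon contribute nothing.
        for concept in TERM_TO_CONCEPTS.get(token, []):
            counts[concept] += 1
    return counts


def linear_kernel(u, v):
    """Plain dot product between two sparse concept vectors."""
    return sum(weight * v[c] for c, weight in u.items() if c in v)


def normalized_kernel(u, v):
    """Cosine-normalized linear kernel; 0.0 if either vector is empty."""
    denom = sqrt(linear_kernel(u, u) * linear_kernel(v, v))
    return linear_kernel(u, v) / denom if denom else 0.0


d1 = concept_vector("cardiac infarction")
d2 = concept_vector("heart attack")
d3 = concept_vector("renal failure")

print(normalized_kernel(d1, d2))  # no shared words, but identical concepts
print(normalized_kernel(d1, d3))  # no shared concepts
```

In a plain word-level VSM, "cardiac infarction" and "heart attack" share no terms and would have zero similarity. In this toy concept space they map to the same concepts, which is the kind of effect the description attributes to linguistic prior knowledge.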