An Approach for Text Mining Based on Noun Phrases
Authors: | Edilson Ferneda, Marcelo Ladeira, Hercules Antonio do Prado, Marcello Sandi Pinheiro |
---|---|
Year of publication: | 2015 |
Subject: | Computer science; business.industry; Process (engineering); InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; Term (logic); computer.software_genre; Noun phrase; ComputingMethodologies_PATTERNRECOGNITION; Text mining; Noun; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Preprocessor; Relevance (information retrieval); Artificial intelligence; business; Representation (mathematics); computer; Natural language processing |
Source: | Intelligent Decision Technologies ISBN: 9783319198569 KES-IDT |
Description: | Noun phrases have been proposed as descriptors for text mining vectors to overcome the poor semantics of the traditional bag-of-words (BOW) representation. However, the solutions found in the literature are unsatisfactory, mainly because they rely on static definitions of noun phrases and because noun phrases per se do not yield an adequate relevance representation, since they are expressions that rarely repeat. We present an approach that addresses these problems by (i) introducing a process that enables noun phrases to be defined interactively and (ii) treating similar noun phrases as a single term. A case study compares the approach proposed in this paper with one based on BOW. The main contribution of this paper is an improved preprocessing phase for text mining, which leads to better results in the overall process. |
Database: | OpenAIRE |
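
The description says that similar noun phrases are treated as a single term so that their frequencies accumulate instead of being spread over rarely repeated expressions. The record does not state how similarity is measured, so the following is only a minimal sketch under stated assumptions: noun phrases are taken as already extracted, similarity is a Jaccard overlap of crudely normalized tokens, and the names `normalize`, `group_phrases`, `term_counts` and the `threshold` value are hypothetical, not taken from the paper.

```python
from collections import Counter


def normalize(phrase):
    """Lowercase a phrase and return its token set.

    Assumption: a crude plural strip (trailing "s" on words longer than
    three characters) stands in for real linguistic normalization.
    """
    tokens = set()
    for word in phrase.lower().split():
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]
        tokens.add(word)
    return frozenset(tokens)


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_phrases(phrases, threshold=0.5):
    """Greedily map each phrase to the first representative it resembles.

    Assumption: the similarity measure and threshold are illustrative only.
    """
    representatives = []  # list of (label, token_set) pairs
    mapping = {}
    for phrase in phrases:
        tokens = normalize(phrase)
        for label, rep_tokens in representatives:
            if jaccard(tokens, rep_tokens) >= threshold:
                mapping[phrase] = label
                break
        else:
            # No sufficiently similar representative: start a new term.
            representatives.append((phrase, tokens))
            mapping[phrase] = phrase
    return mapping


def term_counts(doc_phrases, mapping):
    """Count occurrences per unified term for one document."""
    return Counter(mapping[p] for p in doc_phrases)


phrases = ["text mining", "text mining process", "noun phrase",
           "noun phrases", "decision support"]
mapping = group_phrases(phrases)
counts = term_counts(["noun phrase", "noun phrases", "text mining"], mapping)
print(counts)
```

In this sketch, "noun phrase" and "noun phrases" collapse into one term with a count of 2, whereas counting exact phrases would give each variant a count of 1.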