Adaptive Distribution of Vocabulary Frequencies: A Novel Estimation Suitable for Social Media Corpus
Authors: | Igawa, R. A., Kido, G. S., Seixas, J. L., J, R., Barbon Junior, S |
---|---|
Contributors: | Igawa, R. A., Kido, G. S., Seixas, J. L., J, R., Barbon Junior, S |
Year of publication: | 2014 |
Subject: | Vocabulary; Zipf's law; Computer science; Probabilistic logic; Set (abstract data type); Metric (mathematics); Social media; Noise (video); Artificial intelligence; Data mining; Natural language processing; Statistic |
Source: | BRACIS |
DOI: | 10.1109/bracis.2014.58 |
Description: | This paper proposes a mathematical model that evaluates the distribution of vocabulary term frequencies relative to a probabilistic ideal. The main objective is to use this evaluation to examine text denoising. The new metric is based on the classic Zipf's law statistical method. The experimental set used to test both classic Zipf's law and the proposed model consists of several books of classic literature and several sets of tweets from Twitter. The main result is that the proposed model is more sensitive to the presence of text noise than Zipf's law and is asymptotically faster, which makes it suitable for social media corpora. |
Database: | OpenAIRE |
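
The description compares observed vocabulary frequencies against a probabilistic ideal derived from Zipf's law. As a rough illustration of that baseline only, the Python sketch below ranks token counts and measures how far they sit from an ideal Zipf distribution. It is not the model proposed in the paper: the function names, the simple regex tokenizer, the exponent `s = 1.0`, and the mean absolute log-ratio score are all assumptions made for this example.

```python
from collections import Counter
import math
import re


def rank_frequencies(text):
    """Return token counts sorted from most to least frequent.

    Assumption: tokens are runs of lowercase letters and apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_expected(total, n_types, s=1.0):
    """Expected count per rank under an ideal Zipf distribution.

    The count at rank r is proportional to 1 / r**s, scaled so that
    the expected counts sum to `total`.
    """
    weights = [1.0 / (r ** s) for r in range(1, n_types + 1)]
    norm = sum(weights)
    return [total * w / norm for w in weights]


def zipf_deviation(text, s=1.0):
    """Mean absolute log-ratio between observed and ideal Zipf counts.

    Hypothetical score for illustration: 0.0 means the ranked counts
    match the ideal exactly; larger values mean a poorer fit.
    """
    observed = rank_frequencies(text)
    if not observed:
        return 0.0
    expected = zipf_expected(sum(observed), len(observed), s)
    ratios = (abs(math.log(o / e)) for o, e in zip(observed, expected))
    return sum(ratios) / len(observed)
```

Under these assumptions, a text whose ranked counts are exactly proportional to 1, 1/2, 1/3 scores close to zero, while a text in which every token occurs equally often scores higher. Comparing the score of a literary text with that of a tweet set would give a crude version of the contrast the description refers to.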