What Differentiates News Articles with Short and Long Shelf Lives? A Case Study on News Articles at Bloomberg.com
Author: | Yaohang Li, Rohit Parimi, Kumara Kallepalli, Ajay Gupta, Wessam Elhenfnawy, Kevin Racheal, Parth Shah, Jesse Wright |
---|---|
Year of publication: | 2016 |
Subject: | Root (linguistics); Thesaurus (information retrieval); Information retrieval; Computer science; Feature vector; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; 02 engineering and technology; Shelf life; Linear discriminant analysis; Term (time); 020204 information systems; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Precision and recall; tf–idf; Set (psychology) |
Source: | BDCloud-SocialCom-SustainCom |
DOI: | 10.1109/bdcloud-socialcom-sustaincom.2016.30 |
Description: | In this paper, we apply discriminant analysis to a large set of historical news articles published at www.bloomberg.com and investigate which features distinguish news articles with short shelf lives from those with long ones. We define the shelf life of an article as the time it takes to reach 60% of the total hits it receives over its lifetime. The "bag-of-words" model is used to represent the content of an article as a vector of features, which are uni-, bi-, or tri-gram keywords. A thesaurus approach is applied to group words with similar meanings under a set of root words, reducing the size of the feature space. A normalized TF-IDF (Term Frequency-Inverse Document Frequency) scheme is used to encode the feature vectors. By applying Linear Discriminant Analysis (LDA) to the articles with short and long shelf lives, we achieve precision and recall near or above 80% in both categories. Surprisingly, we also find that the sentiment of news articles has little correlation with their shelf lives. |
Database: | OpenAIRE |
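
The feature pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy thesaurus, the whitespace tokenizer, the `log(N / df)` form of IDF, and the use of L2 normalization for "normalized" TF-IDF are all assumptions made for illustration. The sketch covers the shelf-life definition (time to reach 60% of total hits), n-gram features with thesaurus root mapping, and TF-IDF encoding; the LDA classification step is left out.

```python
import math
from collections import Counter


def shelf_life(hit_times, publish_time=0.0, fraction=0.6):
    """Time from publication until `fraction` of all recorded hits have arrived.

    `hit_times` holds one timestamp per hit (hypothetical input format).
    """
    if not hit_times:
        raise ValueError("need at least one hit")
    ordered = sorted(hit_times)
    # Number of hits needed to reach the requested share of the total.
    needed = max(math.ceil(fraction * len(ordered)), 1)
    return ordered[needed - 1] - publish_time


def ngram_features(text, thesaurus, max_n=3):
    """Uni- to `max_n`-gram features after mapping words to thesaurus roots."""
    # Naive whitespace tokenizer; words missing from the thesaurus stay as-is.
    tokens = [thesaurus.get(tok, tok) for tok in text.lower().split()]
    features = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features


def tfidf_vectors(docs_features):
    """L2-normalized TF-IDF vectors (as dicts), one per document."""
    n_docs = len(docs_features)
    # Document frequency: in how many documents each feature occurs.
    df = Counter(f for feats in docs_features for f in set(feats))
    vectors = []
    for feats in docs_features:
        tf = Counter(feats)
        vec = {f: count * math.log(n_docs / df[f]) for f, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({f: v / norm for f, v in vec.items()} if norm else vec)
    return vectors


if __name__ == "__main__":
    toy_thesaurus = {"stocks": "stock", "shares": "stock"}  # hypothetical roots
    docs = [
        ngram_features("Stocks rise", toy_thesaurus, max_n=2),
        ngram_features("Shares fall", toy_thesaurus, max_n=2),
    ]
    print(shelf_life([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
    print(tfidf_vectors(docs)[0])
```

In this toy run, "stocks" and "shares" collapse to the same root, so the unigram "stock" appears in both documents and receives zero weight, while the features that differ between the documents carry all of it. The resulting vectors, labeled as short or long shelf life, could then be passed to any LDA implementation, for example `LinearDiscriminantAnalysis` from scikit-learn.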