What Differentiates News Articles with Short and Long Shelf Lives? A Case Study on News Articles at Bloomberg.com

Authors: Yaohang Li, Rohit Parimi, Kumara Kallepalli, Ajay Gupta, Wessam Elhenfnawy, Kevin Racheal, Parth Shah, Jesse Wright
Year of publication: 2016
Subject:
Source: BDCloud-SocialCom-SustainCom
DOI: 10.1109/bdcloud-socialcom-sustaincom.2016.30
Description: In this paper, we apply discriminant analysis to a large set of historical news articles published at www.bloomberg.com and investigate which features distinguish news articles with short shelf lives from those with long ones. We define the shelf life of an article as the time it takes to reach 60% of the total hits it receives over its lifetime. The "bag-of-words" model is used to represent the content of an article as a vector of features, which are uni-, bi-, or tri-gram keywords. A thesaurus approach is applied to map words with similar meanings to a set of root words, reducing the size of the feature space. A normalized TF-IDF (Term Frequency-Inverse Document Frequency) scheme is used to encode the feature vectors. By applying Linear Discriminant Analysis (LDA) to the articles with short and long shelf lives, we achieve near or over 80% precision and recall on both categories. Surprisingly, we also find that the sentiment of news articles has little correlation with their shelf lives.
Database: OpenAIRE
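
The shelf-life definition in the description (time to reach 60% of an article's total hits) can be illustrated with a minimal sketch. The function name `shelf_life`, the per-period hit counts, and the choice to count time in whole periods are assumptions made for illustration; the record does not say how the authors measured time or stored hit data.

```python
from typing import Sequence


def shelf_life(hits_per_period: Sequence[int], fraction: float = 0.6) -> int:
    """Return how many periods pass before `fraction` of all hits is reached.

    `hits_per_period[i]` is a hypothetical count of hits in period i + 1
    (for example, hour or day) after publication.
    """
    total = sum(hits_per_period)
    if total <= 0:
        raise ValueError("article has no recorded hits")

    threshold = fraction * total
    cumulative = 0
    # Walk forward in time until the running total crosses the threshold.
    for period, hits in enumerate(hits_per_period, start=1):
        cumulative += hits
        if cumulative >= threshold:
            return period
    # Only reachable if fraction > 1; fall back to the full lifetime.
    return len(hits_per_period)


# A front-loaded article crosses the threshold early,
# while a late-peaking one crosses it only at the end.
front_loaded = shelf_life([50, 30, 10, 10])
late_peaking = shelf_life([10, 10, 10, 10, 60])
```

Under these assumptions, the front-loaded series would count as a short shelf life and the late-peaking one as a long shelf life; where to draw the line between the two classes is not stated in the record.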