Comment Spam Classification in Blogs through Comment Analysis and Comment-Blog Post Relationships

Authors: Anand Mahendran, Ashwin Rajadesingan
Publication year: 2012
Subject:
Source: Computational Linguistics and Intelligent Text Processing ISBN: 9783642286001
CICLing (2)
DOI: 10.1007/978-3-642-28601-8_41
Description: Spamming refers to the process of providing unwanted and irrelevant information to the users. It is a widespread phenomenon that is often noticed in e-mails, instant messages, blogs and forums. In our paper, we consider the problem of spamming in blogs. In blogs, spammers usually target commenting systems which are provided by the authors to facilitate interaction with the readers. Unfortunately, spammers abuse these commenting systems by posting irrelevant and unsolicited content in the form of spam comments. Thus, we propose a novel methodology to classify comments into spam and non-spam using previously undescribed features including certain blog post-comment relationships. Experiments conducted using our methodology produced a spam detection accuracy of 94.82% with a precision of 96.50% and a recall of 95.80%.
Database: OpenAIRE