Comment Spam Classification in Blogs through Comment Analysis and Comment-Blog Post Relationships
Author: | Anand Mahendran, Ashwin Rajadesingan |
---|---|
Year of publication: | 2012 |
Source: | Computational Linguistics and Intelligent Text Processing, CICLing (2); ISBN: 9783642286001 |
DOI: | 10.1007/978-3-642-28601-8_41 |
Description: | Spamming is the practice of sending users unwanted and irrelevant information. It is widespread in e-mail, instant messaging, blogs and forums. In this paper, we consider the problem of spam in blogs. In blogs, spammers usually target the commenting systems that authors provide to facilitate interaction with their readers, abusing them by posting irrelevant and unsolicited content in the form of spam comments. We therefore propose a novel methodology that classifies comments as spam or non-spam using previously undescribed features, including certain blog post-comment relationships. Experiments conducted with our methodology achieved a spam detection accuracy of 94.82%, with a precision of 96.50% and a recall of 95.80%. |
Database: | OpenAIRE |
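
This record does not specify the paper's features, so the following Python sketch is only a hedged illustration under stated assumptions. It assumes one plausible kind of blog post-comment relationship feature, the bag-of-words cosine similarity between a comment and the post it was left on. It also shows how accuracy, precision and recall, the metrics reported above, are conventionally computed from a confusion matrix with spam as the positive class. The function names `post_comment_similarity` and `metrics`, the simple regex tokenizer, and the example texts are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lower-cased word counts from a deliberately simple regex tokenizer (assumption)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def post_comment_similarity(post: str, comment: str) -> float:
    """Cosine similarity between bag-of-words vectors of a post and a comment.

    Hypothetical post-comment relationship feature: off-topic spam comments
    tend to share few words with the post they are attached to.
    """
    a, b = tokens(post), tokens(comment)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    # Guard against empty texts, which have a zero-length vector.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard accuracy, precision and recall, with spam as the positive class."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    post = "Tuning garbage collection in the JVM for low latency"
    on_topic = "Which garbage collection flags helped latency most?"
    spam = "Cheap watches, best prices, click here"
    print(post_comment_similarity(post, on_topic))  # shares 3 words with the post
    print(post_comment_similarity(post, spam))      # shares no words: 0.0
    # Illustrative counts only, not the paper's data.
    print(metrics(tp=90, fp=10, tn=85, fn=15))
```

An on-topic comment scores higher than an off-topic one, which is the kind of signal such a feature would pass to a classifier; the paper's actual features, weighting and classifier may differ.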