Detecting Spam Tweets using Character N-gram Features

Authors:	Mokhtar Ashour, Cherif Salama, M. Watheq El-Kharashi
Publication year:	2018
Subject:	n-gram; Information retrieval; Computer science; Character (computing); 020204 information systems; Feature extraction; 0202 electrical engineering, electronic engineering, information engineering; Feature (machine learning); 020201 artificial intelligence & image processing; 02 engineering and technology; Latency (engineering); Popularity; Word (computer architecture)
Source:	2018 13th International Conference on Computer Engineering and Systems (ICCES).
DOI:	10.1109/icces.2018.8639297
Description:	Twitter's popularity has made it an important and instantaneous source of news and trending events around the world. It has also attracted the attention of spammers, who post malicious content embedded in tweets and in their profile pages. Spammers use different and evolving techniques to evade traditional security mechanisms, which creates the need for robust solutions that adapt to these techniques. In this paper, we propose using a low-level character n-gram feature that avoids the use of tokenizers or any language-dependent tools. Using a publicly available dataset, we evaluate the performance of multiple machine learning classifiers with different representations of the proposed feature. Our experiments show that our approach improves on approaches that use word n-grams from tweet tokens. We also show that our technique can detect spam tweets with low latency, which is crucial in a real-time environment like Twitter.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::587e1d93a43b80c96aa5c845dd9fc68b https://doi.org/10.1109/icces.2018.8639297
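
The description above mentions a character n-gram feature that needs no tokenizer. As a rough illustration of that general idea only, and not the authors' implementation, the sketch below counts overlapping character substrings directly from the raw tweet text. The function names, the n-gram range of 2 to 3, the count-based representation, and the two sample tweets are all assumptions made for this example; the record does not say which ranges, representations, or classifiers the paper used.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count every character n-gram of length n_min..n_max in text.

    Hypothetical helper: it slides a window over the raw string, so no
    tokenizer or language-specific preprocessing is involved.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def vectorize(text, vocabulary, n_min=1, n_max=3):
    """Map text to a count vector ordered by a fixed n-gram vocabulary."""
    counts = char_ngrams(text, n_min, n_max)
    return [counts.get(gram, 0) for gram in vocabulary]


# Made-up sample tweets, used only to show the shapes involved.
tweets = ["Win a FREE phone now!!!", "Lunch with friends today"]

# Build the vocabulary from every 2- and 3-gram seen in the samples.
vocabulary = sorted(set().union(*(char_ngrams(t, 2, 3) for t in tweets)))

# One count vector per tweet; these could be passed to any classifier.
vectors = [vectorize(t, vocabulary, 2, 3) for t in tweets]
```

Because the window runs over characters rather than words, obfuscated spellings and punctuation runs still yield features, which is one plausible reason such a feature could hold up against evolving spam techniques.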