Identifying Severity of Cyberbullying Using Scalable Labeled Multi-Platform Dataset

Author:	Vyawahare, Madhura; Govilkar, Sharvari
Language:	English
Publication year:	2022
Subject:	Machine Learning Social Media Cyberbullying Dataset Annotation Cybercrimes
Source:	International Journal of Intelligent Systems and Applications in Engineering; Vol. 10 No. 4 (2022); 201-210
ISSN:	2147-6799
Description:	The growing number of invective posts on online social media platforms is a serious concern for the wellbeing of society and the psychological health of young people. If not tackled at an early stage, such posts often take the form of cyberbullying. To maintain a psychologically healthy society, harmful posts that may become even more dangerous for netizens must be identified. Many machine learning and deep learning based systems have been designed for automated cyberbullying detection. Accurate and precise cyberbullying detection needs a large, correctly annotated dataset. This work addresses the unavailability of an appropriate dataset by designing an automated labeling system that creates and labels a dataset for detecting the severity of cyberbullying. Apart from the textual comments, meta-features such as semantic and syntactic features also contribute to machine learning. Principal component analysis is used for feature extraction and reduction. A rule-based methodology is designed, developed and implemented that considers textual, semantic and syntactic features and yields a feature-rich, multi-platform, multi-label dataset for detecting the severity of cyberbullying as well as predicting cyberbullying. Until now, only two approaches have been used for dataset annotation: manual labeling and the filtration method. A new rule-based automated approach is proposed and implemented in this work. Using this approach, a dataset of 17 lakh (1.7 million) entries with 5 labels is prepared and used for training the models. To make the dataset standardized and usable for future researchers, it is tested and verified with various methods. Evaluation of the proposed system in terms of accuracy, precision, recall and F-measure demonstrates that the performance of multiclass classification trained on the prepared dataset is substantially improved.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=issn21476799::7cb35a263676456067ad535a06896e47 https://www.ijisae.org/index.php/IJISAE/article/view/2217