Identifying Racist Social Media Comments in Sinhala Language Using Text Analytics Models with Machine Learning

Authors: Madhushi D. Welikala, D.S. Dias, N.G.J. Dias
Year of publication: 2018
Subject:
Source: 2018 18th International Conference on Advances in ICT for Emerging Regions (ICTer).
Description: Racism, or the act of discriminating against people based on their race, gender, skin colour and other such related factors, has been ruling the world since ancient times. With the development of technology and the introduction of platforms for communicating with each other, such as social media, these platforms soon turned into a means of spreading racist thoughts within communities. This became a serious issue when certain conversations and actions led to the outbreak and spread of violence within communities. One such incident in China caused Facebook and certain other social media to be banned in the country permanently, while in March 2018 another incident in Sri Lanka made the government ban social media for about one week to stop the spread of false information and racist thoughts that could make the situation worse. Later, the social media authorities released a statement specifying that they had failed to moderate and stop the spread of hatred and racist comments because language translators were unavailable within their organization. Therefore, automatic identification of racist comments on social media has become of utmost importance. However, simple keyword spotting techniques cannot accurately identify the exact intent of a comment. In this paper, we address this issue by building a text analytics model with machine learning that can be used to filter racist comments in the Sinhala language. A Two-Class Support Vector Machine was trained with a set of carefully chosen comments from Facebook that were labelled as racist or non-racist based on intent. In our experimental results, the trained model classified racist comments with 70.8% accuracy.
Database: OpenAIRE
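
The abstract describes training a two-class Support Vector Machine on comments labelled by intent, and notes that keyword spotting alone is not enough. The record does not state which features, kernel or toolkit the authors used, so the following is only a minimal sketch under stated assumptions: bag-of-words features, a linear SVM without a bias term trained by subgradient descent on the hinge loss, and a hypothetical toy dataset of English placeholder strings (not the paper's Sinhala data). All function names are illustrative.

```python
from collections import Counter

# Hypothetical toy data (English placeholders, not the paper's Sinhala corpus).
# +1 = flagged, -1 = not flagged. The token "grp_a" appears in both classes,
# so matching on that keyword alone cannot separate them.
TRAIN = [
    ("grp_a people are bad", 1),
    ("grp_a should leave", 1),
    ("grp_a festival was lovely", -1),
    ("nice weather today", -1),
]


def featurize(text):
    """Bag-of-words counts; real Sinhala text would need proper tokenization."""
    return Counter(text.lower().split())


def score(w, x):
    """Linear decision value w . x for a sparse feature dict x."""
    return sum(w.get(tok, 0.0) * cnt for tok, cnt in x.items())


def train_linear_svm(data, epochs=50, eta=0.1, lam=0.01):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss.

    Assumption: no bias term and a fixed learning rate, chosen for brevity.
    """
    w = {}
    for _ in range(epochs):
        for text, y in data:
            x = featurize(text)
            margin = y * score(w, x)
            # Gradient step on the L2 penalty: shrink every weight slightly.
            for tok in w:
                w[tok] *= 1.0 - eta * lam
            # Hinge-loss step only for samples inside the margin.
            if margin < 1.0:
                for tok, cnt in x.items():
                    w[tok] = w.get(tok, 0.0) + eta * y * cnt
    return w


def predict(w, text):
    """Return +1 (flagged) or -1 (not flagged)."""
    return 1 if score(w, featurize(text)) > 0 else -1


weights = train_linear_svm(TRAIN)
```

Calling `predict(weights, "grp_a people should leave")` flags the text, while `predict(weights, "lovely festival today")` does not, because the learned weights depend on the surrounding words and not only on the shared keyword. This toy setup says nothing about the 70.8% accuracy reported in the abstract.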