Code Mixing: A Challenge for Language Identification in the Language of Social Media

Authors:	Joachim Wagner, Amitava Das, Jennifer Foster, Utsab Barman
Language:	English
Year of publication:	2014
Subject:	Hindi; Bengali; Code-mixing; Code switching; Language identification; Conditional random field; Natural language processing; Computational linguistics; Machine learning; Artificial intelligence; Computer science; Computing Methodologies: Pattern Recognition; Social media; Social media language; Work in process; Task (project management)
Source:	Barman, Utsab, Das, Amitava ORCID: 0000-0003-3418-463X, Wagner, Joachim ORCID: 0000-0002-8290-3849 and Foster, Jennifer ORCID: 0000-0002-7789-4853 (2014) Code mixing: a challenge for language identification in the language of social media. In: First Workshop on Computational Approaches to Code Switching, 25 Oct 2014, Doha, Qatar. CodeSwitch@EMNLP
DOI:	10.13140/2.1.3385.6967
Description:	In social media communication, multilingual speakers often switch between languages, and, in such an environment, automatic language identification becomes both a necessary and challenging task. In this paper, we describe our work in progress on the problem of automatic language identification for the language of social media. We describe a new dataset that we are in the process of creating, which contains Facebook posts and comments that exhibit code mixing between Bengali, English and Hindi. We also present some preliminary word-level language identification experiments using this dataset. Different techniques are employed, including a simple unsupervised dictionary-based approach, supervised word-level classification with and without contextual clues, and sequence labelling using Conditional Random Fields. We find that the dictionary-based approach is surpassed by supervised classification and sequence labelling, and that it is important to take contextual clues into consideration.
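As a rough illustration of the simplest technique named in the abstract above, word-level labelling by dictionary lookup, the following minimal Python sketch may help. The lexicons, the language codes, the `unk` label and the function names are all hypothetical and are not taken from the paper; the authors' actual dictionaries and tie-breaking rules are not given in this record.

```python
# Hypothetical toy lexicons (romanised); NOT the dictionaries used in the paper.
LEXICONS = {
    "en": {"i", "am", "very", "happy", "today"},
    "bn": {"ami", "khub", "bhalo", "achi"},
    "hi": {"main", "bahut", "khush", "hoon"},
}


def label_word(word, lexicons=LEXICONS, unknown="unk"):
    """Label one word by dictionary lookup, ignoring all context.

    A word gets a language label only if exactly one lexicon contains it;
    words found in no lexicon, or in several, are labelled `unknown`.
    """
    token = word.lower()
    matches = [lang for lang, vocab in lexicons.items() if token in vocab]
    return matches[0] if len(matches) == 1 else unknown


def label_sentence(sentence):
    """Split on whitespace and label every token independently."""
    return [(word, label_word(word)) for word in sentence.split()]


if __name__ == "__main__":
    print(label_sentence("ami khub happy today"))
```

Because each token is labelled in isolation, a lookup of this kind cannot resolve words shared between languages, which is consistent with the abstract's finding that contextual clues matter.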
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::5f796bcc43a6a3fa05b2a5963b576234