Part-of-speech tagging of code-mixed social media content: pipeline, stacking and joint modelling

Authors: Jennifer Foster, Utsab Barman, Joachim Wagner
Language: English
Year of publication: 2016
Subject:
Source: Barman, Utsab, Wagner, Joachim ORCID: 0000-0002-8290-3849 and Foster, Jennifer ORCID: 0000-0002-7789-4853 (2016) Part-of-speech tagging of code-mixed social media content: pipeline, stacking and joint modelling. In: Second Workshop on Computational Approaches to Code Switching, 2 Nov 2016, Austin, Texas, USA.
CodeSwitch@EMNLP
Description: Multilingual users of social media sometimes use multiple languages during conversation. Mixing multiple languages in content is known as code-mixing. We annotate a subset of a trilingual code-mixed corpus (Barman et al., 2014) with part-of-speech (POS) tags. We investigate two state-of-the-art POS tagging techniques for code-mixed content and combine the features of the two systems to build a better POS tagger. Furthermore, we investigate the use of a joint model which performs language identification (LID) and part-of-speech (POS) tagging simultaneously.
Database: OpenAIRE
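Note: The description contrasts a pipeline (LID first, POS tagging second) with a joint model that predicts both labels at once. The sketch below is only an illustration of that contrast and is not taken from the paper: the tokens, language codes, tags, the `+` separator, and the function names `pipeline_features`, `to_joint` and `from_joint` are all hypothetical. It assumes the simplest possible joint formulation, fusing each token's language and POS label into one label, which may differ from the joint model the authors actually used.

```python
from typing import Dict, List, Tuple

# Hypothetical toy sentence; tokens and labels are invented for illustration.
tokens = ["ami", "khub", "happy"]
langs = ["bn", "bn", "en"]       # language identification (LID) labels
pos = ["PRON", "ADV", "ADJ"]     # part-of-speech (POS) tags

SEP = "+"  # assumed separator; must not occur inside a language code


def pipeline_features(tokens: List[str], langs: List[str]) -> List[Dict[str, str]]:
    """Pipeline view: the LID output becomes an input feature for the POS tagger."""
    return [{"token": t.lower(), "lang": l} for t, l in zip(tokens, langs)]


def to_joint(langs: List[str], pos: List[str]) -> List[str]:
    """Joint view: fuse LID and POS into one label per token."""
    if len(langs) != len(pos):
        raise ValueError("langs and pos must have the same length")
    return [f"{l}{SEP}{p}" for l, p in zip(langs, pos)]


def from_joint(joint: List[str]) -> Tuple[List[str], List[str]]:
    """Split fused labels back into separate LID and POS sequences."""
    pairs = [label.split(SEP, 1) for label in joint]
    return [p[0] for p in pairs], [p[1] for p in pairs]


if __name__ == "__main__":
    print(pipeline_features(tokens, langs))
    print(to_joint(langs, pos))
```

In the pipeline view, an LID error propagates into the POS tagger's features; in the fused-label view, a single sequence labeller predicts both at once, at the cost of a larger label set.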