Part of speech tagging for code switched data
Authors: | AlGhamdi, Fahad; Molina, Giovanni; Diab, Mona; Solorio, Thamar; Hawwari, Abdelati; Soto, Victor; Hirschberg, Julia |
---|---|
Year of publication: | 2019 |
Subject: | |
Document type: | Working Paper |
DOI: | 10.18653/v1/W16-5812 |
Description: | We address the problem of Part-of-Speech (POS) tagging in the context of linguistic code switching (CS). CS is the phenomenon where a speaker switches between two languages or variants of the same language within or across utterances, known as intra-sentential or inter-sentential CS, respectively. Intra-sentential CS data is especially challenging to process with state-of-the-art monolingual NLP technology, since such technology is geared toward processing one language at a time. In this paper we explore multiple strategies for applying state-of-the-art POS taggers to CS data. We investigate the landscape in two CS language pairs, Spanish-English and Modern Standard Arabic-Arabic dialects. We compare the use of two POS taggers vs. a unified tagger trained on CS data. Our results show that applying a machine learning framework using two state-of-the-art POS taggers achieves better performance than all other approaches that we investigate. Comment: Association for Computational Linguistics |
Database: | arXiv |
External link: |