Part-of-speech tagging of code-mixed social media content: pipeline, stacking and joint modelling
Author: | Jennifer Foster, Utsab Barman, Joachim Wagner |
---|---|
Language: | English |
Year of publication: | 2016 |
Subject: |
Language identification
Computer science; business.industry; Part-of-speech tagging; media_common.quotation_subject; 02 engineering and technology; computer.software_genre; 01 natural sciences; Pipeline (software); World Wide Web; 0103 physical sciences; Content (measure theory); 0202 electrical engineering, electronic engineering, information engineering; Code (cryptography); 020201 artificial intelligence & image processing; Social media; Conversation; Artificial intelligence; 010306 general physics; business; Joint (audio engineering); computer; Machine translating; Natural language processing; media_common |
Source: | Barman, Utsab, Wagner, Joachim ORCID: 0000-0002-8290-3849 CodeSwitch@EMNLP |
Description: | Multilingual users of social media sometimes use multiple languages during conversation. Mixing multiple languages in content is known as code-mixing. We annotate a subset of a trilingual code-mixed corpus (Barman et al., 2014) with part-of-speech (POS) tags. We investigate two state-of-the-art POS tagging techniques for code-mixed content and combine the features of the two systems to build a better POS tagger. Furthermore, we investigate the use of a joint model which performs language identification (LID) and POS tagging simultaneously. |
Database: | OpenAIRE |
External link: |