Author: |
Mads Toftrup, Manuel R. Ciosici, Ira Assent, Søren Asger Sørensen |
Year of publication: |
2021 |
Source: |
EACL (Student Research Workshop) |
DOI: |
10.18653/v1/2021.eacl-srw.6 |
Description: |
Language Identification is the task of identifying a document’s language. For applications like automatic spell checker selection, language identification must use very short strings such as text message fragments. In this work, we reproduce a language identification architecture that Apple briefly sketched in a blog post. We confirm the bi-LSTM model’s performance and find that it outperforms current open-source language identifiers. We further find that its language identification mistakes are due to confusion between related languages. |
Database: |
OpenAIRE |