Exploiting native language interference for native language identification

Author:	Ilia Markov, Carlo Strapparava, Vivi Nastase
Contributors:	Language
Language:	English
Year of publication:	2020
Subject:	Linguistics and Language; Language and Linguistics; Linguistics; Computer science; Artificial Intelligence; Software; Communication; Computer. Automation; First language; Native-language identification; Interference (wave propagation); business.industry; business; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 05 social sciences; 0503 education; 050301 education; SDG 4 - Quality Education; SDG 10 - Reduced Inequalities; SDG 16 - Peace, Justice and Strong Institutions
Source:	Markov, I, Nastase, V & Strapparava, C 2020, 'Exploiting native language interference for native language identification', Natural Language Engineering, vol. 28, no. 2, pp. 167-197. Cambridge University Press. https://doi.org/10.1017/S1351324920000595
ISSN:	1351-3249
Description:	Native language identification (NLI), the task of automatically identifying the native language (L1) of persons based on their writings in the second language (L2), is based on the hypothesis that characteristics of L1 will surface and interfere in the production of texts in L2 to the extent that L1 is identifiable. We present an in-depth investigation of features that model a variety of linguistic phenomena potentially involved in native language interference in the context of the NLI task: the languages’ structuring of information through punctuation usage, emotion expression in language, and similarities of form with the L1 vocabulary through the use of anglicized words, cognates, and other misspellings. The results of experiments with different combinations of features in a variety of settings allow us to quantify the native language interference value of these linguistic phenomena and show how robust they are in cross-corpus experiments and with respect to proficiency in L2. These experiments provide a deeper insight into the NLI task, showing how native language interference explains the gap between baseline, corpus-independent features, and the state of the art that relies on features/representations that cover (indiscriminately) a variety of linguistic phenomena.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::be7835c70c1f0f5dfe2db922a6ce26e1 https://hdl.handle.net/1871.1/f76a084a-d957-45f4-9098-64eb4bb56800