Showing 1 - 10 of 39 for search: '"Kirov, Christo"'
Published in:
Transactions of the Association for Computational Linguistics, Vol 7, Pp 327-342 (2019)
We quantify the linguistic complexity of different languages’ morphological systems. We verify that there is a statistically significant empirical trade-off between paradigm size and irregularity: A language’s inflectional paradigms may be either
External link:
https://doaj.org/article/0fb649718b164ce0bb10d522426035cb
Author:
Ruder, Sebastian, Clark, Jonathan H., Gutkin, Alexander, Kale, Mihir, Ma, Min, Nicosia, Massimo, Rijhwani, Shruti, Riley, Parker, Sarr, Jean-Michel A., Wang, Xinyi, Wieting, John, Gupta, Nitish, Katanova, Anna, Kirov, Christo, Dickinson, Dana L., Roark, Brian, Samanta, Bidisha, Tao, Connie, Adelani, David I., Axelrod, Vera, Caswell, Isaac, Cherry, Colin, Garrette, Dan, Ingle, Reeve, Johnson, Melvin, Panteleev, Dmitry, Talukdar, Partha
Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP research is particularly far behind in meeting user needs -- it is feasible to annotat
External link:
http://arxiv.org/abs/2305.11938
Published in:
EACL Findings 2023
We examine whether large neural language models, trained on very large collections of varied English text, learn the potentially long-distance dependency of British versus American spelling conventions, i.e., whether spelling is consistently one or t
External link:
http://arxiv.org/abs/2303.03457
Author:
Kirov, Christo (ckirov@google.com), Johny, Cibu (cibu@google.com), Katanova, Anna (akatanova@google.com), Gutkin, Alexander (agutkin@google.com), Roark, Brian (roark@google.com)
Published in:
Computational Linguistics, Vol. 50, Issue 2, pp. 475-534 (June 2024), 60 pages
Ad hoc abbreviations are commonly found in informal communication channels that favor shorter messages. We consider the task of reversing these abbreviations in context to recover normalized, expanded versions of abbreviated messages. The problem is
External link:
http://arxiv.org/abs/2110.01140
Author:
Roark, Brian, Wolf-Sonkin, Lawrence, Kirov, Christo, Mielke, Sabrina J., Johny, Cibu, Demirsahin, Isin, Hall, Keith
This paper describes the Dakshina dataset, a new resource consisting of text in both the Latin and native scripts for 12 South Asian languages. The dataset includes, for each language: 1) native script Wikipedia text; 2) a romanization lexicon; and 3
External link:
http://arxiv.org/abs/2007.01176
Author:
Vylomova, Ekaterina, White, Jennifer, Salesky, Elizabeth, Mielke, Sabrina J., Wu, Shijie, Ponti, Edoardo, Maudslay, Rowan Hall, Zmigrod, Ran, Valvoda, Josef, Toldova, Svetlana, Tyers, Francis, Klyachko, Elena, Yegorov, Ilya, Krizhanovsky, Natalia, Czarnowska, Paula, Nikkarinen, Irene, Krizhanovsky, Andrew, Pimentel, Tiago, Hennigen, Lucas Torroba, Kirov, Christo, Nicolai, Garrett, Williams, Adina, Anastasopoulos, Antonios, Cruz, Hilaria, Chodroff, Eleanor, Cotterell, Ryan, Silfverberg, Miikka, Hulden, Mans
A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language such as English. The SIGMORPHON 2020 shared task on
External link:
http://arxiv.org/abs/2006.11572
Author:
Schwartz, Lane, Tyers, Francis, Levin, Lori, Kirov, Christo, Littell, Patrick, Lo, Chi-kiu, Prud'hommeaux, Emily, Park, Hyunji Hayley, Steimel, Kenneth, Knowles, Rebecca, Micher, Jeffrey, Strunk, Lonny, Liu, Han, Haley, Coleman, Zhang, Katherine J., Jimmerson, Robbie, Andriyanets, Vasilisa, Muis, Aldrian Obaja, Otani, Naoki, Park, Jong Hyuk, Zhang, Zhisong
Research in natural language processing commonly assumes that approaches that work well for English and other widely-used languages are "language agnostic". In high-resource languages, especially those that are analytic, a common approach is to t
External link:
http://arxiv.org/abs/2005.05477
Author:
McCarthy, Arya D., Vylomova, Ekaterina, Wu, Shijie, Malaviya, Chaitanya, Wolf-Sonkin, Lawrence, Nicolai, Garrett, Kirov, Christo, Silfverberg, Miikka, Mielke, Sabrina J., Heinz, Jeffrey, Cotterell, Ryan, Hulden, Mans
Published in:
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology (2019) 229-244
The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages.
External link:
http://arxiv.org/abs/1910.11493
Author:
Kirov, Christo, Cotterell, Ryan, Sylak-Glassman, John, Walther, Géraldine, Vylomova, Ekaterina, Xia, Patrick, Faruqui, Manaal, Mielke, Sabrina J., McCarthy, Arya D., Kübler, Sandra, Yarowsky, David, Eisner, Jason, Hulden, Mans
The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology across the world's languages. The project releases annotated morphological data using a universal tagset, the UniMorph schema. Each infl
External link:
http://arxiv.org/abs/1810.11101