Showing 1 - 10 of 32 for search: '"Nicolai, Garrett"'
In this paper, we address the data scarcity problem in automatic data-driven glossing for low-resource languages by coordinating multiple sources of linguistic expertise. We supplement models with translations at both the token and sentence level as
External link:
http://arxiv.org/abs/2406.11085
We investigate automatic interlinear glossing in low-resource settings. We augment a hard-attentional neural model with embedded translation information extracted from interlinear glossed text. After encoding these translations using large language m
External link:
http://arxiv.org/abs/2403.08189
Author:
Yang, Wayne, Nicolai, Garrett
Neural models have revolutionized the field of machine translation, but creating parallel corpora is expensive and time-consuming. We investigate an alternative to manual parallel corpora - hallucinated parallel corpora created by generative language
External link:
http://arxiv.org/abs/2307.05779
With a growing focus on morphological inflection systems for languages where high-quality data is scarce, training data noise is a serious but so far largely ignored concern. We aim at closing this gap by investigating the types of noise encountered
External link:
http://arxiv.org/abs/2305.16581
Author:
Batsuren, Khuyagbaatar, Goldman, Omer, Khalifa, Salam, Habash, Nizar, Kieraś, Witold, Bella, Gábor, Leonard, Brian, Nicolai, Garrett, Gorman, Kyle, Ate, Yustinus Ghanggo, Ryskina, Maria, Mielke, Sabrina J., Budianskaya, Elena, El-Khaissi, Charbel, Pimentel, Tiago, Gasser, Michael, Lane, William, Raj, Mohit, Coler, Matt, Samame, Jaime Rafael Montoya, Camaiteri, Delio Siticonatzi, Sagot, Benoît, Rojas, Esaú Zumaeta, Francis, Didier López, Oncevay, Arturo, Bautista, Juan López, Villegas, Gema Celeste Silva, Hennigen, Lucas Torroba, Ek, Adam, Guriel, David, Dirix, Peter, Bernardy, Jean-Philippe, Scherbakov, Andrey, Bayyr-ool, Aziyana, Anastasopoulos, Antonios, Zariquiey, Roberto, Sheifer, Karina, Ganieva, Sofya, Cruz, Hilaria, Karahóǧa, Ritván, Markantonatou, Stella, Pavlidis, George, Plugaryov, Matvey, Klyachko, Elena, Salehi, Ali, Angulo, Candy, Baxi, Jatayu, Krizhanovsky, Andrew, Krizhanovskaya, Natalia, Salesky, Elizabeth, Vania, Clara, Ivanova, Sardana, White, Jennifer, Maudslay, Rowan Hall, Valvoda, Josef, Zmigrod, Ran, Czarnowska, Paula, Nikkarinen, Irene, Salchak, Aelita, Bhatt, Brijesh, Straughn, Christopher, Liu, Zoey, Washington, Jonathan North, Pinter, Yuval, Ataman, Duygu, Wolinski, Marcin, Suhardijanto, Totok, Yablonskaya, Anna, Stoehr, Niklas, Dolatian, Hossep, Nuriah, Zahroh, Ratan, Shyam, Tyers, Francis M., Ponti, Edoardo M., Aiton, Grant, Arora, Aryaman, Hatcher, Richard J., Kumar, Ritesh, Young, Jeremiah, Rodionova, Daria, Yemelina, Anastasia, Andrushko, Taras, Marchenko, Igor, Mashkovtseva, Polina, Serova, Alexandra, Prud'hommeaux, Emily, Nepomniashchaya, Maria, Giunchiglia, Fausto, Chodroff, Eleanor, Hulden, Mans, Silfverberg, Miikka, McCarthy, Arya D., Yarowsky, David, Cotterell, Ryan, Tsarfaty, Reut, Vylomova, Ekaterina
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-indepe
External link:
http://arxiv.org/abs/2205.03608
Author:
Forbes, Clarissa, Samir, Farhan, Oliver, Bruce Harold, Yang, Changbing, Coates, Edith, Nicolai, Garrett, Silfverberg, Miikka
Recent progress in NLP is driven by pretrained models leveraging massive datasets and has predominantly benefited the world's political and economic superpowers. Technologically underserved languages are left behind because they lack such resources.
External link:
http://arxiv.org/abs/2203.09632
Author:
Wiemerslage, Adam, Silfverberg, Miikka, Yang, Changbing, McCarthy, Arya D., Nicolai, Garrett, Colunga, Eliana, Kann, Katharina
Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages. Having long been multilingual, the field of com
External link:
http://arxiv.org/abs/2203.08909
Sequence-to-sequence models have delivered impressive results in word formation tasks such as morphological inflection, often learning to model subtle morphophonological details with limited training data. Despite the performance, the opacity of neur
External link:
http://arxiv.org/abs/2104.00789
Author:
Vylomova, Ekaterina, White, Jennifer, Salesky, Elizabeth, Mielke, Sabrina J., Wu, Shijie, Ponti, Edoardo, Maudslay, Rowan Hall, Zmigrod, Ran, Valvoda, Josef, Toldova, Svetlana, Tyers, Francis, Klyachko, Elena, Yegorov, Ilya, Krizhanovsky, Natalia, Czarnowska, Paula, Nikkarinen, Irene, Krizhanovsky, Andrew, Pimentel, Tiago, Hennigen, Lucas Torroba, Kirov, Christo, Nicolai, Garrett, Williams, Adina, Anastasopoulos, Antonios, Cruz, Hilaria, Chodroff, Eleanor, Cotterell, Ryan, Silfverberg, Miikka, Hulden, Mans
A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language such as English. The SIGMORPHON 2020 shared task on
External link:
http://arxiv.org/abs/2006.11572
In this paper, we describe the findings of the SIGMORPHON 2020 shared task on unsupervised morphological paradigm completion (SIGMORPHON 2020 Task 2), a novel task in the field of inflectional morphology. Participants were asked to submit systems whi
External link:
http://arxiv.org/abs/2005.13756