Showing 1 - 10 of 35 for search: '"BATSUREN, KHUYAGBAATAR"'
Author:
Batsuren, Khuyagbaatar, Vylomova, Ekaterina, Dankers, Verna, Delgerbaatar, Tsetsuukhei, Uzan, Omri, Pinter, Yuval, Bella, Gábor
The popular subword tokenizers of current language models, such as Byte-Pair Encoding (BPE), are known not to respect morpheme boundaries, which affects the downstream performance of the models. While many improved tokenization algorithms have been p
External link:
http://arxiv.org/abs/2404.13292
Author:
Hupkes, Dieuwke, Giulianelli, Mario, Dankers, Verna, Artetxe, Mikel, Elazar, Yanai, Pimentel, Tiago, Christodoulopoulos, Christos, Lasri, Karim, Saphra, Naomi, Sinclair, Arabella, Ulmer, Dennis, Schottmann, Florian, Batsuren, Khuyagbaatar, Sun, Kaiser, Sinha, Koustuv, Khalatbari, Leila, Ryskina, Maria, Frieske, Rita, Cotterell, Ryan, Jin, Zhijing
Published in:
Nat Mach Intell 5, 1161-1174 (2023)
The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is not well understood, nor are there any evaluation standards for generalisa
External link:
http://arxiv.org/abs/2210.03050
Author:
Simig, Daniel, Wang, Tianlu, Dankers, Verna, Henderson, Peter, Batsuren, Khuyagbaatar, Hupkes, Dieuwke, Diab, Mona
In NLP, models are usually evaluated by reporting single-number performance scores on a number of readily available benchmarks, without much deeper analysis. Here, we argue that - especially given the well-known fact that benchmarks often contain bia
External link:
http://arxiv.org/abs/2210.01734
Author:
Batsuren, Khuyagbaatar, Bella, Gábor, Arora, Aryaman, Martinović, Viktor, Gorman, Kyle, Žabokrtský, Zdeněk, Ganbold, Amarsanaa, Dohnalová, Šárka, Ševčíková, Magda, Pelegrinová, Kateřina, Giunchiglia, Fausto, Cotterell, Ryan, Vylomova, Ekaterina
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, c
External link:
http://arxiv.org/abs/2206.07615
Author:
Batsuren, Khuyagbaatar, Goldman, Omer, Khalifa, Salam, Habash, Nizar, Kieraś, Witold, Bella, Gábor, Leonard, Brian, Nicolai, Garrett, Gorman, Kyle, Ate, Yustinus Ghanggo, Ryskina, Maria, Mielke, Sabrina J., Budianskaya, Elena, El-Khaissi, Charbel, Pimentel, Tiago, Gasser, Michael, Lane, William, Raj, Mohit, Coler, Matt, Samame, Jaime Rafael Montoya, Camaiteri, Delio Siticonatzi, Sagot, Benoît, Rojas, Esaú Zumaeta, Francis, Didier López, Oncevay, Arturo, Bautista, Juan López, Villegas, Gema Celeste Silva, Hennigen, Lucas Torroba, Ek, Adam, Guriel, David, Dirix, Peter, Bernardy, Jean-Philippe, Scherbakov, Andrey, Bayyr-ool, Aziyana, Anastasopoulos, Antonios, Zariquiey, Roberto, Sheifer, Karina, Ganieva, Sofya, Cruz, Hilaria, Karahóǧa, Ritván, Markantonatou, Stella, Pavlidis, George, Plugaryov, Matvey, Klyachko, Elena, Salehi, Ali, Angulo, Candy, Baxi, Jatayu, Krizhanovsky, Andrew, Krizhanovskaya, Natalia, Salesky, Elizabeth, Vania, Clara, Ivanova, Sardana, White, Jennifer, Maudslay, Rowan Hall, Valvoda, Josef, Zmigrod, Ran, Czarnowska, Paula, Nikkarinen, Irene, Salchak, Aelita, Bhatt, Brijesh, Straughn, Christopher, Liu, Zoey, Washington, Jonathan North, Pinter, Yuval, Ataman, Duygu, Wolinski, Marcin, Suhardijanto, Totok, Yablonskaya, Anna, Stoehr, Niklas, Dolatian, Hossep, Nuriah, Zahroh, Ratan, Shyam, Tyers, Francis M., Ponti, Edoardo M., Aiton, Grant, Arora, Aryaman, Hatcher, Richard J., Kumar, Ritesh, Young, Jeremiah, Rodionova, Daria, Yemelina, Anastasia, Andrushko, Taras, Marchenko, Igor, Mashkovtseva, Polina, Serova, Alexandra, Prud'hommeaux, Emily, Nepomniashchaya, Maria, Giunchiglia, Fausto, Chodroff, Eleanor, Hulden, Mans, Silfverberg, Miikka, McCarthy, Arya D., Yarowsky, David, Cotterell, Ryan, Tsarfaty, Reut, Vylomova, Ekaterina
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-indepe
External link:
http://arxiv.org/abs/2205.03608
Author:
Khishigsuren, Temuulen, Bella, Gábor, Batsuren, Khuyagbaatar, Freihat, Abed Alhakim, Nair, Nandu Chandran, Ganbold, Amarsanaa, Khalilia, Hadi, Chandrashekar, Yamini, Giunchiglia, Fausto
This paper describes a method to enrich lexical resources with content relating to linguistic diversity, based on knowledge from the field of lexical typology. We capture the phenomenon of diversity through the notions of lexical gap and language-spe
External link:
http://arxiv.org/abs/2204.05049
Author:
Bella, Gábor, Byambadorj, Erdenebileg, Chandrashekar, Yamini, Batsuren, Khuyagbaatar, Cheema, Danish Ashgar, Giunchiglia, Fausto
The Universal Knowledge Core (UKC) is a large multilingual lexical database with a focus on language diversity and covering over a thousand languages. The aim of the database, as well as its tools and data catalogue, is to make the somewhat abstract
External link:
http://arxiv.org/abs/2203.04723
Author:
Giunchiglia, Fausto, Otterbacher, Jahna, Kleanthous, Styliani, Batsuren, Khuyagbaatar, Bogina, Veronika, Kuflik, Tsvi, Tal, Avital Shulner
As the role of algorithmic systems and processes increases in society, so does the risk of bias, which can result in discrimination against individuals and social groups. Research on algorithmic bias has exploded in recent years, highlighting both th
External link:
http://arxiv.org/abs/2104.05658
Author:
Orphanou, Kalia, Otterbacher, Jahna, Kleanthous, Styliani, Batsuren, Khuyagbaatar, Giunchiglia, Fausto, Bogina, Veronika, Tal, Avital Shulner, Hartman, Alan, Kuflik, Tsvi
Mitigating bias in algorithmic systems is a critical issue drawing attention across communities within the information and computer sciences. Given the complexity of the problem and the involvement of multiple stakeholders -- including developers, en
External link:
http://arxiv.org/abs/2103.16953
Author:
Batsuren, Khuyagbaatar
Languages are well known to be diverse on all structural levels, from the smallest (phonemic) to the broadest (pragmatic). We propose a set of formal, quantitative measures for the language diversity of linguistic phenomena, the resource incompletene
External link:
https://hdl.handle.net/11572/368635