Search results - "BATSUREN, KHUYAGBAATAR"

Report

Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge

Authors: Batsuren, Khuyagbaatar, Vylomova, Ekaterina, Dankers, Verna, Delgerbaatar, Tsetsuukhei, Uzan, Omri, Pinter, Yuval, Bella, Gábor

The popular subword tokenizers of current language models, such as Byte-Pair Encoding (BPE), are known not to respect morpheme boundaries, which affects the downstream performance of the models. While many improved tokenization algorithms have been p

External link: http://arxiv.org/abs/2404.13292

Report

State-of-the-art generalisation research in NLP: A taxonomy and review

Published in: Nat Mach Intell 5, 1161-1174 (2023)

The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is not well understood, nor are there any evaluation standards for generalisa

External link: http://arxiv.org/abs/2210.03050

Report

Text Characterization Toolkit

Authors: Simig, Daniel, Wang, Tianlu, Dankers, Verna, Henderson, Peter, Batsuren, Khuyagbaatar, Hupkes, Dieuwke, Diab, Mona

In NLP, models are usually evaluated by reporting single-number performance scores on a number of readily available benchmarks, without much deeper analysis. Here, we argue that - especially given the well-known fact that benchmarks often contain bia

External link: http://arxiv.org/abs/2210.01734

Report

The SIGMORPHON 2022 Shared Task on Morpheme Segmentation

Authors: Batsuren, Khuyagbaatar, Bella, Gábor, Arora, Aryaman, Martinović, Viktor, Gorman, Kyle, Žabokrtský, Zdeněk, Ganbold, Amarsanaa, Dohnalová, Šárka, Ševčíková, Magda, Pelegrinová, Kateřina, Giunchiglia, Fausto, Cotterell, Ryan, Vylomova, Ekaterina

The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, c

External link: http://arxiv.org/abs/2206.07615

Report

UniMorph 4.0: Universal Morphology

Authors: Batsuren, Khuyagbaatar, Goldman, Omer, Khalifa, Salam, Habash, Nizar, Kieraś, Witold, Bella, Gábor, Leonard, Brian, Nicolai, Garrett, Gorman, Kyle, Ate, Yustinus Ghanggo, Ryskina, Maria, Mielke, Sabrina J., Budianskaya, Elena, El-Khaissi, Charbel, Pimentel, Tiago, Gasser, Michael, Lane, William, Raj, Mohit, Coler, Matt, Samame, Jaime Rafael Montoya, Camaiteri, Delio Siticonatzi, Sagot, Benoît, Rojas, Esaú Zumaeta, Francis, Didier López, Oncevay, Arturo, Bautista, Juan López, Villegas, Gema Celeste Silva, Hennigen, Lucas Torroba, Ek, Adam, Guriel, David, Dirix, Peter, Bernardy, Jean-Philippe, Scherbakov, Andrey, Bayyr-ool, Aziyana, Anastasopoulos, Antonios, Zariquiey, Roberto, Sheifer, Karina, Ganieva, Sofya, Cruz, Hilaria, Karahóǧa, Ritván, Markantonatou, Stella, Pavlidis, George, Plugaryov, Matvey, Klyachko, Elena, Salehi, Ali, Angulo, Candy, Baxi, Jatayu, Krizhanovsky, Andrew, Krizhanovskaya, Natalia, Salesky, Elizabeth, Vania, Clara, Ivanova, Sardana, White, Jennifer, Maudslay, Rowan Hall, Valvoda, Josef, Zmigrod, Ran, Czarnowska, Paula, Nikkarinen, Irene, Salchak, Aelita, Bhatt, Brijesh, Straughn, Christopher, Liu, Zoey, Washington, Jonathan North, Pinter, Yuval, Ataman, Duygu, Wolinski, Marcin, Suhardijanto, Totok, Yablonskaya, Anna, Stoehr, Niklas, Dolatian, Hossep, Nuriah, Zahroh, Ratan, Shyam, Tyers, Francis M., Ponti, Edoardo M., Aiton, Grant, Arora, Aryaman, Hatcher, Richard J., Kumar, Ritesh, Young, Jeremiah, Rodionova, Daria, Yemelina, Anastasia, Andrushko, Taras, Marchenko, Igor, Mashkovtseva, Polina, Serova, Alexandra, Prud'hommeaux, Emily, Nepomniashchaya, Maria, Giunchiglia, Fausto, Chodroff, Eleanor, Hulden, Mans, Silfverberg, Miikka, McCarthy, Arya D., Yarowsky, David, Cotterell, Ryan, Tsarfaty, Reut, Vylomova, Ekaterina

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-indepe

External link: http://arxiv.org/abs/2205.03608

Report

Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of Lexical Gaps in Kinship

Authors: Khishigsuren, Temuulen, Bella, Gábor, Batsuren, Khuyagbaatar, Freihat, Abed Alhakim, Nair, Nandu Chandran, Ganbold, Amarsanaa, Khalilia, Hadi, Chandrashekar, Yamini, Giunchiglia, Fausto

This paper describes a method to enrich lexical resources with content relating to linguistic diversity, based on knowledge from the field of lexical typology. We capture the phenomenon of diversity through the notions of lexical gap and language-spe

External link: http://arxiv.org/abs/2204.05049

Report

Language Diversity: Visible to Humans, Exploitable by Machines

Authors: Bella, Gábor, Byambadorj, Erdenebileg, Chandrashekar, Yamini, Batsuren, Khuyagbaatar, Cheema, Danish Ashgar, Giunchiglia, Fausto

The Universal Knowledge Core (UKC) is a large multilingual lexical database with a focus on language diversity and covering over a thousand languages. The aim of the database, as well as its tools and data catalogue, is to make the somewhat abstract

External link: http://arxiv.org/abs/2203.04723

Report

Towards Algorithmic Transparency: A Diversity Perspective

Authors: Giunchiglia, Fausto, Otterbacher, Jahna, Kleanthous, Styliani, Batsuren, Khuyagbaatar, Bogin, Veronika, Kuflik, Tsvi, Tal, Avital Shulner

As the role of algorithmic systems and processes increases in society, so does the risk of bias, which can result in discrimination against individuals and social groups. Research on algorithmic bias has exploded in recent years, highlighting both th

External link: http://arxiv.org/abs/2104.05658

Report

Mitigating Bias in Algorithmic Systems -- A Fish-Eye View

Authors: Orphanou, Kalia, Otterbacher, Jahna, Kleanthous, Styliani, Batsuren, Khuyagbaatar, Giunchiglia, Fausto, Bogina, Veronika, Tal, Avital Shulner, Hartman, Alan, Kuflik, Tsvi

Mitigating bias in algorithmic systems is a critical issue drawing attention across communities within the information and computer sciences. Given the complexity of the problem and the involvement of multiple stakeholders -- including developers, en

External link: http://arxiv.org/abs/2103.16953

Dissertation/Thesis

Understanding and Exploiting Language Diversity

Author: Batsuren, Khuyagbaatar

Languages are well known to be diverse on all structural levels, from the smallest (phonemic) to the broadest (pragmatic). We propose a set of formal, quantitative measures for the language diversity of linguistic phenomena, the resource incompletene

External link: https://hdl.handle.net/11572/368635
