Showing 1 - 10 of 20 for search: '"Ruiter, Dana"'
Analyzing ethnic or religious bias is important for improving fairness, accountability, and transparency of natural language processing models. However, many techniques rely on human-compiled lists of bias terms, which are expensive to create and are
External link:
http://arxiv.org/abs/2205.14036
Authors:
Ruiter, Dana, Kleinbauer, Thomas, España-Bonet, Cristina, van Genabith, Josef, Klakow, Dietrich
Recent research on style transfer takes inspiration from unsupervised neural machine translation (UNMT), learning from large amounts of non-parallel data by exploiting cycle consistency loss, back-translation, and denoising autoencoders. By contrast,
External link:
http://arxiv.org/abs/2205.08814
Authors:
Adelani, David Ifeoluwa, Alabi, Jesujoba Oluwadara, Fan, Angela, Kreutzer, Julia, Shen, Xiaoyu, Reid, Machel, Ruiter, Dana, Klakow, Dietrich, Nabende, Peter, Chang, Ernie, Gwadabe, Tajuddeen, Sackey, Freshia, Dossou, Bonaventure F. P., Emezue, Chris Chinenye, Leong, Colin, Beukman, Michael, Muhammad, Shamsuddeen Hassan, Jarso, Guyo Dub, Yousuf, Oreen, Rubungo, Andre Niyongabo, Hacheme, Gilles, Wairagala, Eric Peter, Nasir, Muhammad Umair, Ajibade, Benjamin Ayoade, Ajayi, Tunde Oluwaseyi, Gitau, Yvonne Wambui, Abbott, Jade, Ahmed, Mohamed, Ochieng, Millicent, Aremu, Anuoluwapo, Ogayo, Perez, Mukiibi, Jonathan, Kabore, Fatoumata Ouoba, Kalipe, Godson Koffi, Mbaye, Derguene, Tapo, Allahsera Auguste, Koagne, Victoire Memdjokam, Munkoh-Buabeng, Edwin, Wagner, Valencia, Abdulmumin, Idris, Awokoya, Ayodele, Buzaaba, Happy, Sibanda, Blessing, Bukula, Andiswa, Manthalu, Sam
Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not
External link:
http://arxiv.org/abs/2205.02022
Authors:
Ruiter, Dana, Reiners, Liane, D'Sa, Ashwin Geet, Kleinbauer, Thomas, Fohr, Dominique, Illina, Irina, Klakow, Dietrich, Schemer, Christian, Monnier, Angeliki
Even though hate speech (HS) online has been an important object of research in the last decade, most HS-related corpora over-simplify the phenomenon of hate by attempting to label user comments as "hate" or "neutral". This ignores the complex and su
External link:
http://arxiv.org/abs/2204.13400
We describe the EdinSaar submission to the shared task of Multilingual Low-Resource Translation for North Germanic Languages at the Sixth Conference on Machine Translation (WMT2021). We submit multilingual translation models for translations to/from
External link:
http://arxiv.org/abs/2109.14368
For most language combinations, parallel data is either scarce or simply unavailable. To address this, unsupervised machine translation (UMT) exploits large amounts of monolingual data by using synthetic data generation techniques such as back-transl
External link:
http://arxiv.org/abs/2107.08772
Hate speech and profanity detection suffer from data sparsity, especially for languages other than English, due to the subjective nature of the tasks and the resulting annotation incompatibility of existing corpora. In this study, we identify profane
External link:
http://arxiv.org/abs/2106.07505
Authors:
Adelani, David I., Ruiter, Dana, Alabi, Jesujoba O., Adebonojo, Damilola, Ayeni, Adesina, Adeyemi, Mofe, Awokoya, Ayodele, España-Bonet, Cristina
Massively multilingual machine translation (MT) has shown impressive capabilities, including zero and few-shot translation between low-resource language pairs. However, these models are often evaluated on high-resource languages with the assumption t
External link:
http://arxiv.org/abs/2103.08647
Sentiment tasks such as hate speech detection and sentiment analysis, especially when performed on languages other than English, are often low-resource. In this study, we exploit the emotional information encoded in emojis to enhance the performance
External link:
http://arxiv.org/abs/2102.06423
Authors:
Wolf, Moritz, Ruiter, Dana, D'Sa, Ashwin Geet, Reiners, Liane, Alexandersson, Jan, Klakow, Dietrich
A lot of real-world phenomena are complex and cannot be captured by single task annotations. This causes a need for subsequent annotations, with interdependent questions and answers describing the nature of the subject at hand. Even in the case a phe
External link:
http://arxiv.org/abs/2010.01080