Showing 1 - 10 of 8,101 for the search: '"Low-resource languages"'
The expanding influence of social media platforms over the past decade has impacted the way people communicate. The level of obscurity provided by social media and the easy accessibility of the internet have facilitated the spread of hate speech. The term …
External link:
http://arxiv.org/abs/2411.19017
Low-resource languages face significant challenges due to the lack of sufficient linguistic data, resources, and tools for tasks such as supervised learning, annotation, and classification. This shortage hinders the development of accurate models and …
External link:
http://arxiv.org/abs/2411.17637
Large language models (LLMs) under-perform on low-resource languages due to limited training data. We present a method to efficiently collect text data for low-resource languages from the entire Common Crawl corpus. Our approach, UnifiedCrawl, filters …
External link:
http://arxiv.org/abs/2411.14343
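The entry above describes filtering a web crawl down to a target low-resource language. A minimal sketch of that kind of corpus filtering, using a character-set heuristic (the Ethiopic script range and the 0.5 threshold are illustrative assumptions, not UnifiedCrawl's actual method):

```python
# Minimal sketch: keep web-crawl documents mostly written in a target
# script. The script range (Ethiopic, for Amharic) and the threshold
# are illustrative assumptions, not the paper's pipeline.

def script_ratio(text: str, lo: int, hi: int) -> float:
    """Fraction of alphabetic characters whose code points fall in [lo, hi]."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    in_script = sum(1 for c in letters if lo <= ord(c) <= hi)
    return in_script / len(letters)

def filter_documents(docs, lo=0x1200, hi=0x137F, threshold=0.5):
    """Keep documents predominantly in the target script (here: Ethiopic)."""
    return [d for d in docs if script_ratio(d, lo, hi) >= threshold]

docs = [
    "ሰላም ዓለም! ይህ የአማርኛ ጽሑፍ ነው።",          # Amharic (Ethiopic script)
    "Hello world, this is English text.",     # English only
    "Mixed ጽሑፍ with mostly English words.",  # mostly English
]
kept = filter_documents(docs)
print(len(kept))  # only the predominantly-Ethiopic document survives
```

A production pipeline would apply such a filter per WET record rather than per in-memory string, but the decision rule is the same.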
This paper presents a novel multistage fine-tuning strategy designed to enhance automatic speech recognition (ASR) performance in low-resource languages using OpenAI's Whisper model. In this approach, we aim to build an ASR model for languages with limited …
External link:
http://arxiv.org/abs/2411.04573
Author:
Keita, Mamadou K., Homan, Christopher, Hamani, Sofiane Abdoulaye, Bremang, Adwoa, Zampieri, Marcos, Alfari, Habibatou Abdoulaye, Ibrahim, Elysabhete Amadou, Owusu, Dennis
Grammatical error correction (GEC) is important for improving written materials for low-resource languages like Zarma -- spoken by over 5 million people in West Africa. Yet it remains a challenging problem. This study compares rule-based methods, machine …
External link:
http://arxiv.org/abs/2410.15539
Author:
Joshi, Raviraj, Singla, Kanishk, Kamath, Anusha, Kalani, Raunak, Paul, Rakesh, Vaidya, Utkarsh, Chauhan, Sanjay Singh, Wartikar, Niranjan, Long, Eileen
Multilingual LLMs support a variety of languages; however, their performance is suboptimal for low-resource languages. In this work, we emphasize the importance of continued pre-training of multilingual LLMs and the use of translation-based synthetic …
External link:
http://arxiv.org/abs/2410.14815
Large Language Models (LLMs) have demonstrated remarkable multilingual capabilities, yet challenges persist in adapting these models for low-resource languages. In this study, we investigate the effects of Low-Rank Adaptation (LoRA) Parameter-Efficient …
External link:
http://arxiv.org/abs/2411.18571
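The entry above concerns LoRA-based parameter-efficient adaptation. A minimal NumPy sketch of the core LoRA idea (the matrix sizes and rank here are illustrative, not the study's configuration): instead of updating a full weight matrix W, train a rank-r update B·A with far fewer parameters.

```python
import numpy as np

# Minimal sketch of Low-Rank Adaptation (LoRA): the frozen weight W
# (d_out x d_in) is augmented with a trainable low-rank update B @ A,
# scaled by alpha / r. Sizes below are illustrative assumptions.

d_in, d_out, r, alpha = 512, 512, 8, 16
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable rank-r "down" projection
B = np.zeros((d_out, r))                 # trainable, zero-initialized

def lora_forward(x):
    # y = W x + (alpha / r) * B A x; since B = 0 at init, y == W x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W @ x)  # adapter is a no-op at init

full_params = W.size
lora_params = A.size + B.size
print(lora_params / full_params)  # 0.03125: ~3% of the full-matrix parameters
```

Zero-initializing B makes the adapted model exactly reproduce the pretrained model before training, which is why LoRA fine-tuning starts from the base model's behavior.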
Author:
Carta, Salvatore Mario, Chessa, Stefano, Contu, Giulia, Corriga, Andrea, Deidda, Andrea, Fenu, Gianni, Frigau, Luca, Giuliani, Alessandro, Grassi, Luca, Manca, Marco Manolo, Marras, Mirko, Mola, Francesco, Mossa, Bastianino, Mura, Piergiorgio, Ortu, Marco, Piano, Leonardo, Pisano, Simone, Pisu, Alessia, Podda, Alessandro Sebastian, Pompianu, Livio, Seu, Simone, Tiddia, Sandro Gabriele
Minority languages are vital to preserving cultural heritage, yet they face growing risks of extinction due to limited digital resources and the dominance of artificial intelligence models trained on high-resource languages. This white paper proposes …
External link:
http://arxiv.org/abs/2411.13453
Author:
Lankford, Séamus, Way, Andy
In an evolving landscape of crisis communication, the need for robust and adaptable Machine Translation (MT) systems is more pressing than ever, particularly for low-resource languages. This study presents a comprehensive exploration of leveraging Large …
External link:
http://arxiv.org/abs/2410.23890
Author:
Nigatu, Hellina Hailu, Tonja, Atnafu Lambebo, Rosman, Benjamin, Solorio, Thamar, Choudhury, Monojit
The disparity in the languages commonly studied in Natural Language Processing (NLP) is typically reflected by referring to languages as low- vs. high-resourced. However, there is limited consensus on what exactly qualifies as a 'low-resource language'. …
External link:
http://arxiv.org/abs/2410.20817