Showing 1 - 10 of 8,386 for search: '"low resource languages"'
Author:
Thakkar, Gaurish (AUTHOR) gthakkar@ffzg.hr; Preradović, Nives Mikelić (AUTHOR); Tadić, Marko (AUTHOR)
Published in:
Eng. Dec 2024, Vol. 5, Issue 4, p. 2920-2942. 23 p.
This study explores the effectiveness of layer pruning for developing more efficient BERT models tailored to specific downstream tasks in low-resource languages. Our primary objective is to evaluate whether pruned BERT models can maintain high performance …
External link:
http://arxiv.org/abs/2501.00733
Approaching Speech-to-Text and Automatic Speech Recognition problems in low-resource languages is notoriously challenging due to the scarcity of validated datasets and the diversity of dialects. Arabic, Russian, and Portuguese exemplify these difficulties …
External link:
http://arxiv.org/abs/2501.00425
Author:
Hettiarachchi, Hansi, Ranasinghe, Tharindu, Rayson, Paul, Mitkov, Ruslan, Gaber, Mohamed, Premasiri, Damith, Tan, Fiona Anting, Uyangodage, Lasitha
The first Workshop on Language Models for Low-Resource Languages (LoResLM 2025) was held in conjunction with the 31st International Conference on Computational Linguistics (COLING 2025) in Abu Dhabi, United Arab Emirates. This workshop mainly aimed to …
External link:
http://arxiv.org/abs/2412.16365
This paper presents a new approach to fine-tuning OpenAI's Whisper model for low-resource languages by introducing a novel data generation method that converts sentence-level data into a long-form corpus, using Swiss German as a case study. …
External link:
http://arxiv.org/abs/2412.15726
Author:
Bajpai, Ashutosh, Chakraborty, Tanmoy
The unwavering disparity in labeled resources between resource-rich languages and those considered low-resource remains a significant impediment for Large Language Models (LLMs). Recent strides in cross-lingual in-context learning (X-ICL), mainly through …
External link:
http://arxiv.org/abs/2412.08090
Author:
Zhong, Tianyang, Yang, Zhenyuan, Liu, Zhengliang, Zhang, Ruidong, Liu, Yiheng, Sun, Haiyang, Pan, Yi, Li, Yiwei, Zhou, Yifan, Jiang, Hanqi, Chen, Junhao, Liu, Tianming
Low-resource languages serve as invaluable repositories of human history, embodying cultural evolution and intellectual diversity. Despite their significance, these languages face critical challenges, including data scarcity and technological limitations …
External link:
http://arxiv.org/abs/2412.04497
The expanding influence of social media platforms over the past decade has impacted the way people communicate. The level of obscurity provided by social media and the easy accessibility of the internet have facilitated the spread of hate speech. …
External link:
http://arxiv.org/abs/2411.19017
Low-resource languages face significant challenges due to the lack of sufficient linguistic data, resources, and tools for tasks such as supervised learning, annotation, and classification. This shortage hinders the development of accurate models and …
External link:
http://arxiv.org/abs/2411.17637
Large language models (LLMs) under-perform on low-resource languages due to limited training data. We present a method to efficiently collect text data for low-resource languages from the entire Common Crawl corpus. Our approach, UnifiedCrawl, filters …
External link:
http://arxiv.org/abs/2411.14343