Search results - "Winata, Genta Indra"

Report

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Authors: Lovenia, Holy, Mahendra, Rahmad, Akbar, Salsabil Maulana, Miranda, Lester James V., Santoso, Jennifer, Aco, Elyanah, Fadhilah, Akhdan, Mansurov, Jonibek, Imperial, Joseph Marvin, Kampman, Onno P., Moniz, Joel Ruben Antony, Habibi, Muhammad Ravi Shulthan, Hudi, Frederikus, Montalan, Railey, Ignatius, Ryan, Lopo, Joanito Agili, Nixon, William, Karlsson, Börje F., Jaya, James, Diandaru, Ryandito, Gao, Yuze, Amadeus, Patrick, Wang, Bin, Cruz, Jan Christian Blaise, Whitehouse, Chenxi, Parmonangan, Ivan Halim, Khelli, Maria, Zhang, Wenyu, Susanto, Lucky, Ryanda, Reynard Adha, Hermawan, Sonny Lazuardi, Velasco, Dan John, Kautsar, Muhammad Dehan Al, Hendria, Willy Fitra, Moslem, Yasmin, Flynn, Noah, Adilazuarda, Muhammad Farid, Li, Haochen, Lee, Johanes, Damanhuri, R., Sun, Shuo, Qorib, Muhammad Reza, Djanibekov, Amirbek, Leong, Wei Qi, Do, Quyet V., Muennighoff, Niklas, Pansuwan, Tanrada, Putra, Ilham Firdausi, Xu, Yan, Tai, Ngee Chia, Purwarianti, Ayu, Ruder, Sebastian, Tjhi, William, Limkonchotiwat, Peerat, Aji, Alham Fikri, Keh, Sedrick, Winata, Genta Indra, Zhang, Ruochen, Koto, Fajri, Yong, Zheng-Xin, Cahyawijaya, Samuel

Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts,

External link: http://arxiv.org/abs/2406.10118


Report

ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models

Authors: Anugraha, David, Winata, Genta Indra, Li, Chenyue, Irawan, Patrick Amadeus, Lee, En-Shiun Annie

Performance prediction is a method to estimate the performance of Language Models (LMs) on various Natural Language Processing (NLP) tasks, mitigating computational costs associated with model capacity and data for fine-tuning. Our paper introduces P

External link: http://arxiv.org/abs/2406.09334


Report

MINERS: Multilingual Language Models as Semantic Retrievers

Authors: Winata, Genta Indra, Zhang, Ruochen, Adelani, David Ifeoluwa

Words have been represented in a high-dimensional vector space that encodes their semantic similarities, enabling downstream applications such as retrieving synonyms, antonyms, and relevant contexts. However, despite recent advances in multilingual l

External link: http://arxiv.org/abs/2406.07424


Report

Lessons from the Trenches on Reproducible Evaluation of Language Models

Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of rep

External link: http://arxiv.org/abs/2405.14782


Report

Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages

Authors: Cahyawijaya, Samuel, Lovenia, Holy, Koto, Fajri, Putri, Rifki Afina, Dave, Emmanuel, Lee, Jhonson, Shadieq, Nuur, Cenggoro, Wawan, Akbar, Salsabil Maulana, Mahendra, Muhammad Ihza, Putri, Dea Annisayanti, Wilie, Bryan, Winata, Genta Indra, Aji, Alham Fikri, Purwarianti, Ayu, Fung, Pascale

Large language models (LLMs) show remarkable human-like capability in various domains and languages. However, a notable quality gap arises in low-resource languages, e.g., Indonesian indigenous languages, rendering them ineffective and inefficient in

External link: http://arxiv.org/abs/2404.06138


Report

LinguAlchemy: Fusing Typological and Geographical Elements for Unseen Language Generalization

Authors: Adilazuarda, Muhammad Farid, Cahyawijaya, Samuel, Aji, Alham Fikri, Winata, Genta Indra, Purwarianti, Ayu

Pretrained language models (PLMs) have become remarkably adept at task and language generalization. Nonetheless, they often fail when faced with unseen languages. In this work, we present LinguAlchemy, a regularization method that incorporates variou

External link: http://arxiv.org/abs/2401.06034


Report

IndoRobusta: Towards Robustness Against Diverse Code-Mixed Indonesian Local Languages

Authors: Adilazuarda, Muhammad Farid, Cahyawijaya, Samuel, Winata, Genta Indra, Fung, Pascale, Purwarianti, Ayu

Significant progress has been made on Indonesian NLP. Nevertheless, exploration of the code-mixing phenomenon in Indonesian is limited, despite many languages being frequently mixed with Indonesian in daily conversation. In this work, we explore code

External link: http://arxiv.org/abs/2311.12405


Report

IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End Task-Oriented Dialogue Systems

Authors: Kautsar, Muhammad Dehan Al, Nurdini, Rahmah Khoirussyifa', Cahyawijaya, Samuel, Winata, Genta Indra, Purwarianti, Ayu

Task-oriented dialogue (ToD) systems have been mostly created for high-resource languages, such as English and Chinese. However, there is a need to develop ToD systems for other regional or local languages to broaden their ability to comprehend the d

External link: http://arxiv.org/abs/2311.00958


Report

NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages

Authors: Cahyawijaya, Samuel, Lovenia, Holy, Koto, Fajri, Adhista, Dea, Dave, Emmanuel, Oktavianti, Sarah, Akbar, Salsabil Maulana, Lee, Jhonson, Shadieq, Nuur, Cenggoro, Tjeng Wawan, Linuwih, Hanung Wahyuning, Wilie, Bryan, Muridan, Galih Pradipta, Winata, Genta Indra, Moeljadi, David, Aji, Alham Fikri, Purwarianti, Ayu, Fung, Pascale

Democratizing access to natural language processing (NLP) technology is crucial, especially for underrepresented and extremely low-resource languages. Previous research has focused on developing labeled and unlabeled corpora for these languages throu

External link: http://arxiv.org/abs/2309.10661


Report

Multilingual Few-Shot Learning via Language Model Retrieval

Authors: Winata, Genta Indra, Huang, Liang-Kang, Vadlamannati, Soumya, Chandarana, Yash

Transformer-based language models have achieved remarkable success in few-shot in-context learning and drawn a lot of research interest. However, these models' performance greatly depends on the choice of the example prompts and also has high variabi

External link: http://arxiv.org/abs/2306.10964

