Zobrazeno 1 - 10
of 328
pro vyhledávání: '"Dabre, A."'
Pre-trained language models (PLMs) are known to be susceptible to perturbations to the input text, but existing works do not explicitly focus on linguistically grounded attacks, which are subtle and more prevalent in nature. In this paper, we study w
Externí odkaz:
http://arxiv.org/abs/2412.10805
Autor:
Suryanarayanan, Sanjay, Song, Haiyue, Khan, Mohammed Safi Ur Rahman, Kunchukuttan, Anoop, Khapra, Mitesh M., Dabre, Raj
Mining parallel document pairs poses a significant challenge because existing sentence embedding models often have limited context windows, preventing them from effectively capturing document-level information. Another overlooked issue is the lack of
Externí odkaz:
http://arxiv.org/abs/2411.19096
Autor:
Jain, Sparsh, Sankar, Ashwin, Choudhary, Devilal, Suman, Dhairya, Narasimhan, Nikhil, Khan, Mohammed Safi Ur Rahman, Kunchukuttan, Anoop, Khapra, Mitesh M, Dabre, Raj
Automatic Speech Translation (AST) datasets for Indian languages remain critically scarce, with public resources covering fewer than 10 of the 22 official languages. This scarcity has resulted in AST systems for Indian languages lagging far behind th
Externí odkaz:
http://arxiv.org/abs/2411.04699
Autor:
Doddapaneni, Sumanth, Khan, Mohammed Safi Ur Rahman, Venkatesh, Dilip, Dabre, Raj, Kunchukuttan, Anoop, Khapra, Mitesh M.
Evaluating machine-generated text remains a significant challenge in NLP, especially for non-English languages. Current methodologies, including automated metrics, human assessments, and LLM-based evaluations, predominantly focus on English, revealin
Externí odkaz:
http://arxiv.org/abs/2410.13394
Autor:
Winata, Genta Indra, Hudi, Frederikus, Irawan, Patrick Amadeus, Anugraha, David, Putri, Rifki Afina, Wang, Yutong, Nohejl, Adam, Prathama, Ubaidillah Ariq, Ousidhoum, Nedjma, Amriani, Afifa, Rzayev, Anar, Das, Anirban, Pramodya, Ashmari, Adila, Aulia, Wilie, Bryan, Mawalim, Candy Olivia, Cheng, Ching Lam, Abolade, Daud, Chersoni, Emmanuele, Santus, Enrico, Ikhwantri, Fariz, Kuwanto, Garry, Zhao, Hanyang, Wibowo, Haryo Akbarianto, Lovenia, Holy, Cruz, Jan Christian Blaise, Putra, Jan Wira Gotama, Myung, Junho, Susanto, Lucky, Machin, Maria Angelica Riera, Zhukova, Marina, Anugraha, Michael, Adilazuarda, Muhammad Farid, Santosa, Natasha, Limkonchotiwat, Peerat, Dabre, Raj, Audino, Rio Alexander, Cahyawijaya, Samuel, Zhang, Shi-Xiong, Salim, Stephanie Yulia, Zhou, Yi, Gui, Yinxuan, Adelani, David Ifeoluwa, Lee, En-Shiun Annie, Okada, Shogo, Purwarianti, Ayu, Aji, Alham Fikri, Watanabe, Taro, Wijaya, Derry Tanti, Oh, Alice, Ngo, Chong-Wah
Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English and in underrepresented cultural contexts. To evaluate their understanding of such knowledge, we introduce WorldCuisines, a mas
Externí odkaz:
http://arxiv.org/abs/2410.12705
Autor:
Mundra, Nandini, Kishore, Aditya Nanda, Dabre, Raj, Puduppully, Ratish, Kunchukuttan, Anoop, Khapra, Mitesh M.
Language Models (LMs) excel in natural language processing tasks for English but show reduced performance in most other languages. This problem is commonly tackled by continually pre-training and fine-tuning these models for said languages. A signifi
Externí odkaz:
http://arxiv.org/abs/2407.05841
Machine Translation (MT) between linguistically dissimilar languages is challenging, especially due to the scarcity of parallel corpora. Prior works suggest that pivoting through a high-resource language can help translation into a related low-resour
Externí odkaz:
http://arxiv.org/abs/2406.13332
Autor:
Romero, David, Lyu, Chenyang, Wibowo, Haryo Akbarianto, Lynn, Teresa, Hamed, Injy, Kishore, Aditya Nanda, Mandal, Aishik, Dragonetti, Alina, Abzaliev, Artem, Tonja, Atnafu Lambebo, Balcha, Bontu Fufa, Whitehouse, Chenxi, Salamea, Christian, Velasco, Dan John, Adelani, David Ifeoluwa, Meur, David Le, Villa-Cueva, Emilio, Koto, Fajri, Farooqui, Fauzan, Belcavello, Frederico, Batnasan, Ganzorig, Vallejo, Gisela, Caulfield, Grainne, Ivetta, Guido, Song, Haiyue, Ademtew, Henok Biadglign, Maina, Hernán, Lovenia, Holy, Azime, Israel Abebe, Cruz, Jan Christian Blaise, Gala, Jay, Geng, Jiahui, Ortiz-Barajas, Jesus-German, Baek, Jinheon, Dunstan, Jocelyn, Alemany, Laura Alonso, Nagasinghe, Kumaranage Ravindu Yasas, Benotti, Luciana, D'Haro, Luis Fernando, Viridiano, Marcelo, Estecha-Garitagoitia, Marcos, Cabrera, Maria Camila Buitrago, Rodríguez-Cantelar, Mario, Jouitteau, Mélanie, Mihaylov, Mihail, Imam, Mohamed Fazli Mohamed, Adilazuarda, Muhammad Farid, Gochoo, Munkhjargal, Otgonbold, Munkh-Erdene, Etori, Naome, Niyomugisha, Olivier, Silva, Paula Mónica, Chitale, Pranjal, Dabre, Raj, Chevi, Rendi, Zhang, Ruochen, Diandaru, Ryandito, Cahyawijaya, Samuel, Góngora, Santiago, Jeong, Soyeong, Purkayastha, Sukannya, Kuribayashi, Tatsuki, Clifford, Teresa, Jayakumar, Thanmay, Torrent, Tiago Timponi, Ehsan, Toqeer, Araujo, Vladimir, Kementchedjhieva, Yova, Burzo, Zara, Lim, Zheng Wei, Yong, Zheng Xin, Ignat, Oana, Nwatu, Joan, Mihalcea, Rada, Solorio, Thamar, Aji, Alham Fikri
Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA
Externí odkaz:
http://arxiv.org/abs/2406.05967
Autor:
Singh, Anushka, Sai, Ananya B., Dabre, Raj, Puduppully, Ratish, Kunchukuttan, Anoop, Khapra, Mitesh M
While machine translation evaluation has been studied primarily for high-resource languages, there has been a recent interest in evaluation for low-resource languages due to the increasing availability of data and models. In this paper, we focus on a
Externí odkaz:
http://arxiv.org/abs/2406.03893
Autor:
Robinson, Nathaniel R., Dabre, Raj, Shurtz, Ammon, Dent, Rasul, Onesi, Onenamiyi, Monroc, Claire Bizon, Grobol, Loïc, Muhammad, Hasan, Garg, Ashi, Etori, Naome A., Tiyyala, Vijay Murari, Samuel, Olanrewaju, Stutzman, Matthew Dean, Odoom, Bismarck Bamfo, Khudanpur, Sanjeev, Richardson, Stephen D., Murray, Kenton
A majority of language technologies are tailored for a small number of high-resource languages, while relatively many low-resource languages are neglected. One such group, Creole languages, have long been marginalized in academic study, though their
Externí odkaz:
http://arxiv.org/abs/2405.05376