Showing 1 - 8 of 8 results for search: '"Miranda, Lester James"'
Authors:
Lambert, Nathan, Morrison, Jacob, Pyatkin, Valentina, Huang, Shengyi, Ivison, Hamish, Brahman, Faeze, Miranda, Lester James V., Liu, Alisa, Dziri, Nouha, Lyu, Shane, Gu, Yuling, Malik, Saumya, Graf, Victoria, Hwang, Jena D., Yang, Jiangjiang, Le Bras, Ronan, Tafjord, Oyvind, Wilhelm, Chris, Soldaini, Luca, Smith, Noah A., Wang, Yizhong, Dasigi, Pradeep, Hajishirzi, Hannaneh
Language model post-training is applied to refine behaviors and unlock new skills across a wide range of recent language models, but open recipes for applying these techniques lag behind proprietary ones. The underlying training data and recipes for …
External link:
http://arxiv.org/abs/2411.15124
Authors:
Miranda, Lester James V., Wang, Yizhong, Elazar, Yanai, Kumar, Sachin, Pyatkin, Valentina, Brahman, Faeze, Smith, Noah A., Hajishirzi, Hannaneh, Dasigi, Pradeep
Learning from human feedback has enabled the alignment of language models (LMs) with human preferences. However, directly collecting human preferences can be expensive, time-consuming, and can have high variance. An appealing alternative is to distil …
External link:
http://arxiv.org/abs/2410.19133
Authors:
Gureja, Srishti, Miranda, Lester James V., Islam, Shayekh Bin, Maheshwary, Rishabh, Sharma, Drishti, Winata, Gusti, Lambert, Nathan, Ruder, Sebastian, Hooker, Sara, Fadaee, Marzieh
Reward models (RMs) have driven the state-of-the-art performance of LLMs today by enabling the integration of human feedback into the language modeling process. However, RMs are primarily trained and evaluated in English, and their capabilities in mu…
External link:
http://arxiv.org/abs/2410.15522
Authors:
Lovenia, Holy, Mahendra, Rahmad, Akbar, Salsabil Maulana, Miranda, Lester James V., Santoso, Jennifer, Aco, Elyanah, Fadhilah, Akhdan, Mansurov, Jonibek, Imperial, Joseph Marvin, Kampman, Onno P., Moniz, Joel Ruben Antony, Habibi, Muhammad Ravi Shulthan, Hudi, Frederikus, Montalan, Railey, Ignatius, Ryan, Lopo, Joanito Agili, Nixon, William, Karlsson, Börje F., Jaya, James, Diandaru, Ryandito, Gao, Yuze, Amadeus, Patrick, Wang, Bin, Cruz, Jan Christian Blaise, Whitehouse, Chenxi, Parmonangan, Ivan Halim, Khelli, Maria, Zhang, Wenyu, Susanto, Lucky, Ryanda, Reynard Adha, Hermawan, Sonny Lazuardi, Velasco, Dan John, Kautsar, Muhammad Dehan Al, Hendria, Willy Fitra, Moslem, Yasmin, Flynn, Noah, Adilazuarda, Muhammad Farid, Li, Haochen, Lee, Johanes, Damanhuri, R., Sun, Shuo, Qorib, Muhammad Reza, Djanibekov, Amirbek, Leong, Wei Qi, Do, Quyet V., Muennighoff, Niklas, Pansuwan, Tanrada, Putra, Ilham Firdausi, Xu, Yan, Tai, Ngee Chia, Purwarianti, Ayu, Ruder, Sebastian, Tjhi, William, Limkonchotiwat, Peerat, Aji, Alham Fikri, Keh, Sedrick, Winata, Genta Indra, Zhang, Ruochen, Koto, Fajri, Yong, Zheng-Xin, Cahyawijaya, Samuel
Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, …
External link:
http://arxiv.org/abs/2406.10118
Author:
Miranda, Lester James V.
We introduce calamanCy, an open-source toolkit for constructing natural language processing (NLP) pipelines for Tagalog. It is built on top of spaCy, enabling easy experimentation and integration with other frameworks. calamanCy addresses the develop…
External link:
http://arxiv.org/abs/2311.07171
Author:
Miranda, Lester James V.
We present the development of a Named Entity Recognition (NER) dataset for Tagalog. This corpus helps fill the resource gap present in Philippine languages today, where NER resources are scarce. The texts were obtained from a pretraining corpora cont…
External link:
http://arxiv.org/abs/2311.07161
Authors:
Miranda, Lester James, Kádár, Ákos, Boyd, Adriane, Van Landeghem, Sofie, Søgaard, Anders, Honnibal, Matthew
The distributed representation of symbols is one of the key technologies in machine learning systems today, playing a pivotal role in modern natural language processing. Traditional word embeddings associate a separate vector with each word. While th…
External link:
http://arxiv.org/abs/2212.09255
Authors:
Miranda, Lester James V., Samson, Mark Steve, Orden II, Alfiero K., Silmaro, Bianca S., De Guzman III, Ram K., Sy, Stephanie S.
This paper presents Geomancer, an open-source framework for geospatial feature engineering. It simplifies the acquisition of geospatial attributes for downstream, large-scale machine learning tasks. Geomancer leverages any geospatial dataset stored i…
External link:
http://arxiv.org/abs/1910.05571