Showing 1 - 10 of 27,377 for search: '"Leshem, A."'
Author:
Girgus, Sam B.
Published in:
Cinéaste, 2022 Oct 01. 47(4), 52-54.
External link:
https://www.jstor.org/stable/27170434
Scaling laws for large language models (LLMs) predict model performance based on parameters like size and training data. However, differences in training configurations and data processing across model families lead to significant variations in benchmark… (A generic scaling-law form is sketched after this entry.)
External link:
http://arxiv.org/abs/2412.06540
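For background only, and not a formula taken from the linked paper, scaling-law work of this kind typically assumes a parametric form for the expected loss in terms of parameter count N and training tokens D; a commonly used Chinchilla-style form, with fitted constants E, A, B, alpha, beta, is:

```latex
% Commonly assumed parametric scaling law (background; not claimed to be this paper's formula):
% expected loss as a function of parameter count N and training tokens D,
% with fitted constants E (irreducible loss), A, B, alpha, beta.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Benchmark-level predictions then require mapping such a loss estimate to task scores, which is presumably where the cross-family variation mentioned in the snippet becomes an issue.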
Author:
Hu, Michael Y., Mueller, Aaron, Ross, Candace, Williams, Adina, Linzen, Tal, Zhuang, Chengxu, Cotterell, Ryan, Choshen, Leshem, Warstadt, Alex, Wilcox, Ethan Gotlieb
The BabyLM Challenge is a community effort to close the data-efficiency gap between human and computational language learners. Participants compete to optimize language model training on a fixed language data budget of 100 million words or less…
External link:
http://arxiv.org/abs/2412.05149
Author:
Singh, Shivalika, Romanou, Angelika, Fourrier, Clémentine, Adelani, David I., Ngui, Jian Gang, Vila-Suero, Daniel, Limkonchotiwat, Peerat, Marchisio, Kelly, Leong, Wei Qi, Susanto, Yosephine, Ng, Raymond, Longpre, Shayne, Ko, Wei-Yin, Smith, Madeline, Bosselut, Antoine, Oh, Alice, Martins, Andre F. T., Choshen, Leshem, Ippolito, Daphne, Ferrante, Enzo, Fadaee, Marzieh, Ermis, Beyza, Hooker, Sara
Cultural biases in multilingual datasets pose significant challenges for their effectiveness as global benchmarks. These biases stem not only from language but also from the cultural knowledge required to interpret questions, reducing the practical utility…
External link:
http://arxiv.org/abs/2412.03304
Author:
Hershcovitch, Moshik, Wood, Andrew, Choshen, Leshem, Girmonsky, Guy, Leibovitz, Roy, Ennmouri, Ilias, Malka, Michal, Chin, Peter, Sundararaman, Swaminathan, Harnik, Danny
As models grow in size and are deployed at greater scale, their sheer size burdens the infrastructure, requiring more network bandwidth and more storage to accommodate them. While there is a vast model compression literature on deleting parts of the model…
External link:
http://arxiv.org/abs/2411.05239
Recent model merging methods demonstrate that the parameters of fully-finetuned models specializing in distinct tasks can be combined into one model capable of solving all tasks without retraining. Yet, this success does not transfer well when merging… (A minimal merging sketch follows this entry.)
External link:
http://arxiv.org/abs/2410.19735
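As a point of reference for the merging idea described above, here is a minimal sketch of the simplest baseline: plain parameter averaging of models finetuned from the same base checkpoint. It is illustrative only and is not the method of the linked paper; `merge_state_dicts` and the toy `nn.Linear` models are hypothetical stand-ins.

```python
# Minimal sketch of the baseline idea behind model merging (not the linked
# paper's method): average the parameters of models that were fine-tuned
# from the same base checkpoint, producing a single merged model.
import torch
import torch.nn as nn

def merge_state_dicts(state_dicts, weights=None):
    """Weighted average of matching parameter tensors from several models."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Toy example with two identically shaped models standing in for finetuned checkpoints.
model_a, model_b = nn.Linear(4, 2), nn.Linear(4, 2)
merged_model = nn.Linear(4, 2)
merged_model.load_state_dict(merge_state_dicts([model_a.state_dict(), model_b.state_dict()]))
```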
Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets. This provides an efficient way for practitioners and researchers alike to compare pretraining… (A toy extrapolation sketch follows this entry.)
External link:
http://arxiv.org/abs/2410.11840
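A rough sketch of the extrapolation idea in the snippet above, assuming a simple single-variable power law and made-up (model size, loss) observations; this is not the estimation procedure studied in the linked paper.

```python
# Illustrative sketch: fit a power-law loss curve L(N) = E + A / N**alpha to a
# few (parameter count, loss) pairs from small models, then extrapolate to a
# larger target model. All numbers below are invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, E, A, alpha):
    """Irreducible loss E plus a power-law term that shrinks with model size."""
    return E + A / n_params**alpha

# Hypothetical (model size, validation loss) observations from smaller models.
sizes = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
losses = np.array([4.2, 3.8, 3.4, 3.1, 2.9])

params, _ = curve_fit(scaling_law, sizes, losses, p0=[2.0, 1e3, 0.3], maxfev=10000)
E, A, alpha = params
print(f"fit: E={E:.2f}, A={A:.2g}, alpha={alpha:.2f}")
print(f"predicted loss at 10B params: {scaling_law(1e10, *params):.2f}")
```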
Author:
Shabtay, Nimrod, Polo, Felipe Maia, Doveh, Sivan, Lin, Wei, Mirza, M. Jehanzeb, Choshen, Leshem, Yurochkin, Mikhail, Sun, Yuekai, Arbelle, Assaf, Karlinsky, Leonid, Giryes, Raja
The large-scale training of multi-modal models on data scraped from the web has shown outstanding utility in infusing these models with the required world knowledge to perform effectively on multiple downstream tasks. However, one downside of scraping…
External link:
http://arxiv.org/abs/2410.10783
In recent years, workplaces and educational institutions have widely adopted virtual meeting platforms. This has led to a growing interest in analyzing and extracting insights from these meetings, which requires effective detection and tracking of unique…
External link:
http://arxiv.org/abs/2409.09841
Published in:
First Conference on Language Modeling (2024)
When language models (LMs) are trained to forget (or "unlearn") a skill, how precisely does their behavior change? We study the behavior of transformer LMs in which tasks have been forgotten via fine-tuning on randomized labels. Such LMs learn to… (A toy forgetting sketch follows this entry.)
External link:
http://arxiv.org/abs/2409.02228
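A toy sketch of the forgetting setup described above, under the assumption that "fine-tuning on randomized labels" means continuing training on the same inputs with shuffled labels; the tiny classifier and synthetic task below are stand-ins, not the paper's transformer experiments.

```python
# Illustrative forgetting sketch: train a small model on a task, then
# fine-tune it on the same inputs with randomly permuted labels so that the
# task-specific behavior is degraded ("unlearned").
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 16)                      # toy inputs standing in for task data
y = (X[:, 0] > 0).long()                      # the "skill": predict the sign of feature 0

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()

def train(labels, steps, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(X), labels)
        loss.backward()
        opt.step()

train(y, steps=200)                           # learn the task
acc_before = (model(X).argmax(dim=1) == y).float().mean()

train(y[torch.randperm(len(y))], steps=200)   # "unlearn" via randomized labels
acc_after = (model(X).argmax(dim=1) == y).float().mean()
print(f"accuracy before forgetting: {acc_before:.2f}, after: {acc_after:.2f}")
```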