Showing 1 - 10 of 4,974 for search: '"Leshem, A."'
Author:
Hu, Michael Y., Mueller, Aaron, Ross, Candace, Williams, Adina, Linzen, Tal, Zhuang, Chengxu, Cotterell, Ryan, Choshen, Leshem, Warstadt, Alex, Wilcox, Ethan Gotlieb
The BabyLM Challenge is a community effort to close the data-efficiency gap between human and computational language learners. Participants compete to optimize language model training on a fixed language data budget of 100 million words or less. This…
External link:
http://arxiv.org/abs/2412.05149
Author:
Singh, Shivalika, Romanou, Angelika, Fourrier, Clémentine, Adelani, David I., Ngui, Jian Gang, Vila-Suero, Daniel, Limkonchotiwat, Peerat, Marchisio, Kelly, Leong, Wei Qi, Susanto, Yosephine, Ng, Raymond, Longpre, Shayne, Ko, Wei-Yin, Smith, Madeline, Bosselut, Antoine, Oh, Alice, Martins, Andre F. T., Choshen, Leshem, Ippolito, Daphne, Ferrante, Enzo, Fadaee, Marzieh, Ermis, Beyza, Hooker, Sara
Cultural biases in multilingual datasets pose significant challenges for their effectiveness as global benchmarks. These biases stem not only from language but also from the cultural knowledge required to interpret questions, reducing the practical utility…
External link:
http://arxiv.org/abs/2412.03304
Author:
Hershcovitch, Moshik, Wood, Andrew, Choshen, Leshem, Girmonsky, Guy, Leibovitz, Roy, Ennmouri, Ilias, Malka, Michal, Chin, Peter, Sundararaman, Swaminathan, Harnik, Danny
With the growth of model sizes and the scale of their deployment, their sheer size burdens the infrastructure, requiring more network and more storage to accommodate them. While there is a vast model compression literature on deleting parts of the model…
External link:
http://arxiv.org/abs/2411.05239
Recent model merging methods demonstrate that the parameters of fully-finetuned models specializing in distinct tasks can be combined into one model capable of solving all tasks without retraining. Yet, this success does not transfer well when merging…
External link:
http://arxiv.org/abs/2410.19735
Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets. This provides an efficient way for practitioners and researchers alike to compare pretraining…
External link:
http://arxiv.org/abs/2410.11840
Author:
Shabtay, Nimrod, Polo, Felipe Maia, Doveh, Sivan, Lin, Wei, Mirza, M. Jehanzeb, Choshen, Leshem, Yurochkin, Mikhail, Sun, Yuekai, Arbelle, Assaf, Karlinsky, Leonid, Giryes, Raja
The large-scale training of multi-modal models on data scraped from the web has shown outstanding utility in infusing these models with the required world knowledge to perform effectively on multiple downstream tasks. However, one downside of scraping…
External link:
http://arxiv.org/abs/2410.10783
In recent years, workplaces and educational institutes have widely adopted virtual meeting platforms. This has led to a growing interest in analyzing and extracting insights from these meetings, which requires effective detection and tracking of unique…
External link:
http://arxiv.org/abs/2409.09841
Published in:
First Conference on Language Modeling (2024)
When language models (LMs) are trained to forget (or "unlearn") a skill, how precisely does their behavior change? We study the behavior of transformer LMs in which tasks have been forgotten via fine-tuning on randomized labels. Such LMs learn to…
External link:
http://arxiv.org/abs/2409.02228
Consider a scenario where a harmfulness detection metric is employed by a system to filter unsafe responses generated by a Large Language Model. When analyzing individual harmful and unethical prompt-response pairs, the metric correctly classifies each…
External link:
http://arxiv.org/abs/2408.12259
Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs
The veracity of a factoid is largely independent of the language it is written in. However, language models are inconsistent in their ability to answer the same factual question across languages. This raises questions about how LLMs represent a given…
External link:
http://arxiv.org/abs/2408.10646