Showing 1 - 10
of 89
for search: '"Hooker, Sara"'
Efficiency, specialization, and adaptability to new data distributions are qualities that are hard to combine in current Large Language Models. The Mixture of Experts (MoE) architecture has been the focus of significant research because its inherent…
External link:
http://arxiv.org/abs/2408.15901
The use of synthetic data has played a critical role in recent state-of-the-art breakthroughs. However, overly relying on a single oracle teacher model to generate data has been shown to lead to model collapse and invite propagation of biases. These limi…
External link:
http://arxiv.org/abs/2408.14960
Author:
Aryabumi, Viraat, Su, Yixuan, Ma, Raymond, Morisot, Adrien, Zhang, Ivan, Locatelli, Acyr, Fadaee, Marzieh, Üstün, Ahmet, Hooker, Sara
Including code in the pre-training data mixture, even for models not specifically designed for code, has become a common practice in LLM pre-training. While there has been anecdotal consensus among practitioners that code data plays a vital role in…
External link:
http://arxiv.org/abs/2408.10914
Author:
Don-Yehiya, Shachar, Burtenshaw, Ben, Astudillo, Ramon Fernandez, Osborne, Cailean, Jaiswal, Mimansa, Kuo, Tzu-Sheng, Zhao, Wenting, Shenfeld, Idan, Peng, Andi, Yurochkin, Mikhail, Kasirzadeh, Atoosa, Huang, Yangsibo, Hashimoto, Tatsunori, Jernite, Yacine, Vila-Suero, Daniel, Abend, Omri, Ding, Jennifer, Hooker, Sara, Kirk, Hannah Rose, Choshen, Leshem
Human feedback on conversations with large language models (LLMs) is central to how these systems learn about the world, improve their capabilities, and are steered toward desirable and safe behaviors. However, this feedback is mostly collected by…
External link:
http://arxiv.org/abs/2408.16961
Author:
Reuel, Anka, Bucknall, Ben, Casper, Stephen, Fist, Tim, Soder, Lisa, Aarne, Onni, Hammond, Lewis, Ibrahim, Lujain, Chan, Alan, Wills, Peter, Anderljung, Markus, Garfinkel, Ben, Heim, Lennart, Trask, Andrew, Mukobi, Gabriel, Schaeffer, Rylan, Baker, Mauricio, Hooker, Sara, Solaiman, Irene, Luccioni, Alexandra Sasha, Rajkumar, Nitarshan, Moës, Nicolas, Ladish, Jeffrey, Guha, Neel, Newman, Jessica, Bengio, Yoshua, South, Tobin, Pentland, Alex, Koyejo, Sanmi, Kochenderfer, Mykel J., Trager, Robert
AI progress is creating a growing range of risks and opportunities, but it is often unclear how they should be navigated. In many cases, the barriers and uncertainties faced are at least partly technical. Technical AI governance, referring to technic…
External link:
http://arxiv.org/abs/2407.14981
Author:
Longpre, Shayne, Mahari, Robert, Lee, Ariel, Lund, Campbell, Oderinwale, Hamidah, Brannon, William, Saxena, Nayan, Obeng-Marnu, Naana, South, Tobin, Hunter, Cole, Klyman, Kevin, Klamm, Christopher, Schoelkopf, Hailey, Singh, Nikhil, Cherep, Manuel, Anis, Ahmad, Dinh, An, Chitongo, Caroline, Yin, Da, Sileo, Damien, Mataciunas, Deividas, Misra, Diganta, Alghamdi, Emad, Shippole, Enrico, Zhang, Jianguo, Materzynska, Joanna, Qian, Kun, Tiwary, Kush, Miranda, Lester, Dey, Manan, Liang, Minnie, Hamdy, Mohammed, Muennighoff, Niklas, Ye, Seonghyeon, Kim, Seungone, Mohanty, Shrestha, Gupta, Vipul, Sharma, Vivek, Chien, Vu Minh, Zhou, Xuhui, Li, Yizhi, Xiong, Caiming, Villa, Luis, Biderman, Stella, Li, Hanlin, Ippolito, Daphne, Hooker, Sara, Kabbara, Jad, Pentland, Sandy
General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first, large-scale, longitudinal audit of the consent pro…
External link:
http://arxiv.org/abs/2407.14933
Author:
Hooker, Sara
At face value, this essay is about understanding a fairly esoteric governance tool called compute thresholds. However, in order to grapple with whether these thresholds will achieve anything, we must first understand how they came to be. To do so, we…
External link:
http://arxiv.org/abs/2407.05694
Author:
Marchisio, Kelly, Dash, Saurabh, Chen, Hongyu, Aumiller, Dennis, Üstün, Ahmet, Hooker, Sara, Ruder, Sebastian
Quantization techniques are widely used to improve inference speed and deployment of large language models. While a wide body of work examines the impact of quantized LLMs on English tasks, none has examined the effect of quantization across languag…
External link:
http://arxiv.org/abs/2407.03211
Preference optimization techniques have become a standard final stage for training state-of-the-art large language models (LLMs). However, despite widespread adoption, the vast majority of work to date has focused on first-class citizen languages like En…
External link:
http://arxiv.org/abs/2407.02552
The widespread adoption of synthetic data raises new questions about how models generating the data can influence other large language models (LLMs) via distilled data. To start, our work exhaustively characterizes the impact of passive inheritance o…
External link:
http://arxiv.org/abs/2407.01490