Showing 1 - 10 of 663 for search: '"Biderman, P."'
Author:
Alam, Mohammad Mahmudul, Oberle, Alexander, Raff, Edward, Biderman, Stella, Oates, Tim, Holt, James
Vector Symbolic Architectures (VSAs) are one approach to developing Neuro-symbolic AI, where two vectors in $\mathbb{R}^d$ are 'bound' together to produce a new vector in the same space. VSAs support the commutativity and associativity of this binding…
External link:
http://arxiv.org/abs/2410.22669
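The snippet above describes binding only abstractly. Below is a minimal sketch of one common binding choice that satisfies the stated commutativity and associativity: the elementwise (Hadamard) product over bipolar vectors, as used in MAP-style VSAs. The paper itself may use a different operator; this is illustrative only.

import numpy as np

rng = np.random.default_rng(0)
d = 1024  # dimensionality of the symbol space R^d

# Random bipolar hypervectors serve as atomic symbols.
a = rng.choice([-1.0, 1.0], size=d)
b = rng.choice([-1.0, 1.0], size=d)
c = rng.choice([-1.0, 1.0], size=d)

def bind(x, y):
    # Elementwise (Hadamard) binding: commutative and associative.
    return x * y

assert np.allclose(bind(a, b), bind(b, a))                    # commutativity
assert np.allclose(bind(bind(a, b), c), bind(a, bind(b, c)))  # associativity

# With bipolar vectors each symbol is its own inverse under binding,
# so binding again with b recovers a from the bound pair.
assert np.allclose(bind(bind(a, b), b), a)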
Author:
Longpre, Shayne, Mahari, Robert, Lee, Ariel, Lund, Campbell, Oderinwale, Hamidah, Brannon, William, Saxena, Nayan, Obeng-Marnu, Naana, South, Tobin, Hunter, Cole, Klyman, Kevin, Klamm, Christopher, Schoelkopf, Hailey, Singh, Nikhil, Cherep, Manuel, Anis, Ahmad, Dinh, An, Chitongo, Caroline, Yin, Da, Sileo, Damien, Mataciunas, Deividas, Misra, Diganta, Alghamdi, Emad, Shippole, Enrico, Zhang, Jianguo, Materzynska, Joanna, Qian, Kun, Tiwary, Kush, Miranda, Lester, Dey, Manan, Liang, Minnie, Hamdy, Mohammed, Muennighoff, Niklas, Ye, Seonghyeon, Kim, Seungone, Mohanty, Shrestha, Gupta, Vipul, Sharma, Vivek, Chien, Vu Minh, Zhou, Xuhui, Li, Yizhi, Xiong, Caiming, Villa, Luis, Biderman, Stella, Li, Hanlin, Ippolito, Daphne, Hooker, Sara, Kabbara, Jad, Pentland, Sandy
General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first large-scale, longitudinal audit of the consent protocols…
External link:
http://arxiv.org/abs/2407.14933
Most currently deployed large language models (LLMs) undergo continuous training or additional finetuning. By contrast, most research into LLMs' internal mechanisms focuses on models at one snapshot in time (the end of pre-training), raising the question…
External link:
http://arxiv.org/abs/2407.10827
State space models (SSMs) have shown remarkable empirical performance on many long sequence modeling tasks, but a theoretical understanding of these models is still lacking. In this work, we study the learning dynamics of linear SSMs to understand how…
External link:
http://arxiv.org/abs/2407.07279
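For readers unfamiliar with the model class, the recurrence of a discrete linear SSM (the object whose learning dynamics the abstract studies) fits in a few lines. The dimensions and matrices below are illustrative assumptions, not the paper's setup.

import numpy as np

def linear_ssm(A, B, C, u):
    # x_{t+1} = A x_t + B u_t,  y_t = C x_t
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        ys.append(C @ x)
        x = A @ x + B @ u_t
    return np.array(ys)

rng = np.random.default_rng(0)
n, m, T = 4, 1, 16                   # state dim, input dim, sequence length
A = 0.9 * np.eye(n)                  # stable dynamics (spectral radius < 1)
B = rng.normal(size=(n, m))
C = rng.normal(size=(1, n))
u = rng.normal(size=(T, m))
print(linear_ssm(A, B, C, u).shape)  # (16, 1)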
Author:
Prashanth, USVSN Sai, Deng, Alvin, O'Brien, Kyle, S V, Jyothir, Khan, Mohammad Aflah, Borkar, Jaydeep, Choquette-Choo, Christopher A., Fuehne, Jacob Ray, Biderman, Stella, Ke, Tracy, Lee, Katherine, Saphra, Naomi
Memorization in language models is typically treated as a homogeneous phenomenon, neglecting the specifics of the memorized data. We instead model memorization as the effect of a set of complex factors that describe each sample and relate it to the model…
External link:
http://arxiv.org/abs/2406.17746
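For context, one standard operationalization of memorization in this literature is k-extractability: a training sample counts as memorized if greedy decoding from its first k tokens reproduces the next k tokens exactly. The sketch below assumes a Hugging Face causal LM and illustrates that measurement, not this paper's factor-based model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")

def is_memorized(text, k=32):
    # Greedy-decode k tokens from the first k tokens of the sample and
    # check for an exact match with the sample's true continuation.
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    if len(ids) < 2 * k:
        return False
    prompt, target = ids[:k], ids[k:2 * k]
    out = model.generate(prompt.unsqueeze(0), max_new_tokens=k, do_sample=False)
    return torch.equal(out[0, k:2 * k], target)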
Author:
Longpre, Shayne, Biderman, Stella, Albalak, Alon, Schoelkopf, Hailey, McDuff, Daniel, Kapoor, Sayash, Klyman, Kevin, Lo, Kyle, Ilharco, Gabriel, San, Nay, Rauh, Maribeth, Skowron, Aviya, Vidgen, Bertie, Weidinger, Laura, Narayanan, Arvind, Sanh, Victor, Adelani, David, Liang, Percy, Bommasani, Rishi, Henderson, Peter, Luccioni, Sasha, Jernite, Yacine, Soldaini, Luca
Foundation model development attracts a rapidly expanding body of contributors, scientists, and applications. To help shape responsible development practices, we introduce the Foundation Model Development Cheatsheet: a growing collection of 250+ tools…
External link:
http://arxiv.org/abs/2406.16746
Author:
Schaeffer, Rylan, Schoelkopf, Hailey, Miranda, Brando, Mukobi, Gabriel, Madan, Varun, Ibrahim, Adam, Bradley, Herbie, Biderman, Stella, Koyejo, Sanmi
Predictable behavior from scaling advanced AI systems is an extremely desirable property. Although a well-established literature exists on how pretraining performance scales, the literature on how particular downstream capabilities scale is significantly…
External link:
http://arxiv.org/abs/2406.04391
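As a toy illustration of what "how pretraining performance scales" means in practice, the sketch below fits a saturating power law loss(N) ~ a * N^(-b) + c to (model size, loss) pairs. The functional form and all numbers are assumptions for illustration, not the paper's data or analysis.

import numpy as np
from scipy.optimize import curve_fit

def power_law(N, a, b, c):
    return a * N ** (-b) + c

N = np.array([1e7, 1e8, 1e9, 1e10])    # model sizes (parameters), hypothetical
loss = np.array([4.2, 3.4, 2.9, 2.6])  # matching eval losses, hypothetical

(a, b, c), _ = curve_fit(power_law, N, loss, p0=(10.0, 0.1, 2.0))
print(f"loss ~= {a:.2f} * N^(-{b:.3f}) + {c:.2f}")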
Author:
Biderman, Stella, Schoelkopf, Hailey, Sutawika, Lintang, Gao, Leo, Tow, Jonathan, Abbasi, Baber, Aji, Alham Fikri, Ammanamanchi, Pawan Sasanka, Black, Sidney, Clive, Jordan, DiPofi, Anthony, Etxaniz, Julen, Fattori, Benjamin, Forde, Jessica Zosa, Foster, Charles, Hsu, Jeffrey, Jaiswal, Mimansa, Lee, Wilson Y., Li, Haonan, Lovering, Charles, Muennighoff, Niklas, Pavlick, Ellie, Phang, Jason, Skowron, Aviya, Tan, Samson, Tang, Xiangru, Wang, Kevin A., Winata, Genta Indra, Yvon, François, Zou, Andy
Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of reproducibility…
External link:
http://arxiv.org/abs/2405.14782
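This entry documents EleutherAI's lm-evaluation-harness. A minimal usage sketch of its Python entry point follows; the call reflects the v0.4.x simple_evaluate API and may drift across versions, and the model and task names here are arbitrary examples.

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",  # example model
    tasks=["lambada_openai"],                        # example task
    num_fewshot=0,
)
print(results["results"]["lambada_openai"])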
Author:
Biderman, Dan, Portes, Jacob, Ortiz, Jose Javier Gonzalez, Paul, Mansheej, Greengard, Philip, Jennings, Connor, King, Daniel, Havens, Sam, Chiley, Vitaliy, Frankle, Jonathan, Blakeney, Cody, Cunningham, John P.
Low-Rank Adaptation (LoRA) is a widely used parameter-efficient finetuning method for large language models. LoRA saves memory by training only low-rank perturbations to selected weight matrices. In this work, we compare the performance of LoRA and full finetuning…
External link:
http://arxiv.org/abs/2405.09673
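The abstract's one-line description of LoRA maps directly to code: freeze the pretrained weight W and train only a rank-r perturbation DeltaW = B @ A. Below is a minimal PyTorch sketch of such a layer, a hypothetical illustration rather than the paper's implementation.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # Frozen pretrained weight (stands in for a loaded checkpoint).
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)
        # Trainable low-rank factors; B starts at zero so DeltaW = 0 at init.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        delta = self.B @ self.A  # rank <= r perturbation to the weight
        return x @ (self.weight + self.scale * delta).T

layer = LoRALinear(512, 512, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8192 trainable params vs. 262144 in the frozen weight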
Author:
Peng, Bo, Goldstein, Daniel, Anthony, Quentin, Albalak, Alon, Alcaide, Eric, Biderman, Stella, Cheah, Eugene, Du, Xingjian, Ferdinan, Teddy, Hou, Haowen, Kazienko, Przemysław, GV, Kranthi Kiran, Kocoń, Jan, Koptyra, Bartłomiej, Krishna, Satyapriya, McClelland Jr., Ronald, Lin, Jiaju, Muennighoff, Niklas, Obeid, Fares, Saito, Atsushi, Song, Guangyu, Tu, Haoqin, Wirawan, Cahya, Woźniak, Stanisław, Zhang, Ruichong, Zhao, Bingchen, Zhao, Qihang, Zhou, Peng, Zhu, Jian, Zhu, Rui-Jie
We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity…
External link:
http://arxiv.org/abs/2404.05892
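A toy sketch of the two named ingredients, a matrix-valued recurrent state combined with a dynamic (input-dependent) per-channel decay, is below. It is a deliberate simplification for intuition, not the actual Eagle/Finch kernel.

import numpy as np

def matrix_state_recurrence(q, k, v, w):
    # S_t = diag(w_t) S_{t-1} + k_t v_t^T   (decay w_t depends on the input)
    # y_t = S_t^T q_t
    T, d = q.shape
    S = np.zeros((d, d))
    ys = np.empty((T, d))
    for t in range(T):
        S = w[t][:, None] * S + np.outer(k[t], v[t])
        ys[t] = S.T @ q[t]
    return ys

rng = np.random.default_rng(0)
T, d = 8, 16
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
w = 1.0 / (1.0 + np.exp(-rng.normal(size=(T, d))))  # decays in (0, 1)
print(matrix_state_recurrence(q, k, v, w).shape)  # (8, 16)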