Showing 1 - 10 of 316 for search: '"MISHRA, MAYANK"'
Author:
Gupta, Sonam, Nandwani, Yatin, Yehudai, Asaf, Mishra, Mayank, Pandey, Gaurav, Raghu, Dinesh, Joshi, Sachindra
Fine-tuning Large Language Models (LLMs) on specific datasets is a common practice to improve performance on target tasks. However, this performance gain often leads to overfitting, where the model becomes too specialized in either the task or the…
External link:
http://arxiv.org/abs/2409.04787
Author:
Shen, Yikang, Stallone, Matthew, Mishra, Mayank, Zhang, Gaoyuan, Tan, Shawn, Prasad, Aditya, Soria, Adriana Meza, Cox, David D., Panda, Rameswar
Finding the optimal learning rate for language model pretraining is a challenging task. This is not only because there is a complicated correlation between learning rate, batch size, number of training tokens, model size, and other hyperparameters…
External link:
http://arxiv.org/abs/2408.13359
Author:
Stallone, Matt, Saxena, Vaibhav, Karlinsky, Leonid, McGinn, Bridget, Bula, Tim, Mishra, Mayank, Soria, Adriana Meza, Zhang, Gaoyuan, Prasad, Aditya, Shen, Yikang, Surendran, Saptha, Guttula, Shanmukha, Patel, Hima, Selvam, Parameswaran, Dang, Xuan-Hong, Koyfman, Yan, Sood, Atin, Feris, Rogerio, Desai, Nirmit, Cox, David D., Puri, Ruchir, Panda, Rameswar
This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling context length of Granite 3B/8B code models from 2K/4K to 128K consists of a light-weight continual pretraining…
External link:
http://arxiv.org/abs/2407.13739
Padding is often used in tuning LLM models by adding special tokens to shorter training examples to match the length of the longest sequence in each batch. While this ensures uniformity for batch processing, it introduces inefficiencies by including… (a short illustrative padding sketch follows this entry's link).
External link:
http://arxiv.org/abs/2407.09105
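The padding overhead this abstract refers to is easy to see in code. The snippet below is only an illustrative sketch, not the paper's method; the pad id, helper names, and toy batch are invented here for demonstration.

```python
# Illustrative only: right-pad a batch of tokenized examples to the longest
# sequence and measure how many positions are spent on padding.
PAD_ID = 0  # assumed pad token id for this toy example

def pad_batch(batch):
    """Right-pad every example to the length of the longest one."""
    max_len = max(len(seq) for seq in batch)
    padded = [seq + [PAD_ID] * (max_len - len(seq)) for seq in batch]
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return padded, mask

batch = [[11, 12, 13], [21, 22, 23, 24, 25, 26, 27], [31, 32]]
padded, mask = pad_batch(batch)
total = sum(len(row) for row in padded)
real = sum(sum(row) for row in mask)
print(f"fraction of positions spent on padding: {1 - real / total:.0%}")
```

Packing several short examples into one sequence, with an attention mask that keeps them independent, is the usual way to recover the compute wasted on pad tokens.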
Author:
Gershon, Talia, Seelam, Seetharami, Belgodere, Brian, Bonilla, Milton, Hoang, Lan, Barnett, Danny, Chung, I-Hsin, Mohan, Apoorve, Chen, Ming-Hung, Luo, Lixiang, Walkup, Robert, Evangelinos, Constantinos, Salaria, Shweta, Dombrowa, Marc, Park, Yoonho, Kayi, Apo, Schour, Liran, Alim, Alim, Sydney, Ali, Maniotis, Pavlos, Schares, Laurent, Metzler, Bernard, Karacali-Akyamac, Bengi, Wen, Sophia, Chiba, Tatsuhiro, Choochotkaew, Sunyanan, Yoshimura, Takeshi, Misale, Claudia, Elengikal, Tonia, Connor, Kevin O, Liu, Zhuoran, Molina, Richard, Schneidenbach, Lars, Caden, James, Laibinis, Christopher, Fonseca, Carlos, Tarasov, Vasily, Sundararaman, Swaminathan, Schmuck, Frank, Guthridge, Scott, Cohn, Jeremy, Eshel, Marc, Muench, Paul, Liu, Runyu, Pointer, William, Wyskida, Drew, Krull, Bob, Rose, Ray, Wolfe, Brent, Cornejo, William, Walter, John, Malone, Colm, Perucci, Clifford, Franco, Frank, Hinds, Nigel, Calio, Bob, Druyan, Pavel, Kilduff, Robert, Kienle, John, McStay, Connor, Figueroa, Andrew, Connolly, Matthew, Fost, Edie, Roma, Gina, Fonseca, Jake, Levy, Ido, Payne, Michele, Schenkel, Ryan, Malki, Amir, Schneider, Lion, Narkhede, Aniruddha, Moshref, Shekeba, Kisin, Alexandra, Dodin, Olga, Rippon, Bill, Wrieth, Henry, Ganci, John, Colino, Johnny, Habeger-Rose, Donna, Pandey, Rakesh, Gidh, Aditya, Gaur, Aditya, Patterson, Dennis, Salmani, Samsuddin, Varma, Rambilas, Rumana, Rumana, Sharma, Shubham, Mishra, Mayank, Panda, Rameswar, Prasad, Aditya, Stallone, Matt, Zhang, Gaoyuan, Shen, Yikang, Cox, David, Puri, Ruchir, Agrawal, Dakshi, Thorstensen, Drew, Belog, Joel, Tang, Brent, Gupta, Saurabh Kumar, Biswas, Amitabha, Maheshwari, Anup, Gampel, Eran, Van Patten, Jason, Runion, Matthew, Kaki, Sai, Bogin, Yigal, Reitz, Brian, Pritko, Steve, Najam, Shahan, Nambala, Surya, Chirra, Radhika, Welp, Rick, DiMitri, Frank, Telles, Felipe, Arvelo, Amilcar, Chu, King, Seminaro, Ed, Schram, Andrew, Eickhoff, Felix, Hanson, William, Mckeever, Eric, Joseph, Dinakaran, Chaudhary, Piyush, Shivam, Piyush, Chaudhary, Puneet, Jones, Wesley, Guthrie, Robert, Bostic, Chris, Islam, Rezaul, Duersch, Steve, Sawdon, Wayne, Lewars, John, Klos, Matthew, Spriggs, Michael, McMillan, Bill, Gao, George, Kamra, Ashish, Singh, Gaurav, Curry, Marc, Katarki, Tushar, Talerico, Joe, Shi, Zenghui, Malleni, Sai Sindhur, Gallen, Erwan
AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational…
External link:
http://arxiv.org/abs/2407.05467
Author:
Brandon, William, Mishra, Mayank, Nrusimha, Aniruddha, Panda, Rameswar, Kelly, Jonathan Ragan
Key-value (KV) caching plays an essential role in accelerating decoding for transformer-based autoregressive large language models (LLMs). However, the amount of memory required to store the KV cache can become prohibitive at long sequence lengths… (a back-of-the-envelope cache-size estimate follows this entry's link).
External link:
http://arxiv.org/abs/2405.12981
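For a sense of scale, the memory the abstract refers to can be estimated from the model shape. The configuration below is a generic 7B-class shape chosen for illustration, not a figure taken from the paper.

```python
# Back-of-the-envelope KV-cache size: every layer stores a K and a V tensor
# of shape [batch, kv_heads, seq_len, head_dim].
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_value=2):
    per_layer = 2 * batch * kv_heads * seq_len * head_dim * bytes_per_value  # K and V
    return layers * per_layer

# Assumed shape: 32 layers, 32 KV heads, head_dim 128, fp16 values.
size = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=32_768, batch=1)
print(f"{size / 2**30:.1f} GiB for a single 32K-token sequence")  # ~16 GiB
```

The cache grows linearly with sequence length and batch size, which is why long-sequence decoding quickly becomes memory-bound.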
Author:
Mishra, Mayank, Stallone, Matt, Zhang, Gaoyuan, Shen, Yikang, Prasad, Aditya, Soria, Adriana Meza, Merler, Michele, Selvam, Parameswaran, Surendran, Saptha, Singh, Shivdeep, Sethi, Manish, Dang, Xuan-Hong, Li, Pengyuan, Wu, Kun-Lung, Zawad, Syed, Coleman, Andrew, White, Matthew, Lewis, Mark, Pavuluri, Raju, Koyfman, Yan, Lublinsky, Boris, de Bayser, Maximilien, Abdelaziz, Ibrahim, Basu, Kinjal, Agarwal, Mayank, Zhou, Yi, Johnson, Chris, Goyal, Aanchal, Patel, Hima, Shah, Yousaf, Zerfos, Petros, Ludwig, Heiko, Munawar, Asim, Crouse, Maxwell, Kapanipathi, Pavan, Salaria, Shweta, Calio, Bob, Wen, Sophia, Seelam, Seetharami, Belgodere, Brian, Fonseca, Carlos, Singhee, Amith, Desai, Nirmit, Cox, David D., Puri, Ruchir, Panda, Rameswar
Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents…
External link:
http://arxiv.org/abs/2405.04324
Author:
Pan, Bowen, Shen, Yikang, Liu, Haokun, Mishra, Mayank, Zhang, Gaoyuan, Oliva, Aude, Raffel, Colin, Panda, Rameswar
Mixture-of-Experts (MoE) language models can reduce computational costs by 2-4× compared to dense models without sacrificing performance, making them more efficient in computation-bounded scenarios. However, MoE models generally require 2-4×… (a toy routing sketch follows this entry's link).
External link:
http://arxiv.org/abs/2404.05567
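The compute/parameter trade-off the abstract describes comes from sparse routing: each token runs through only k of E experts, so FLOPs scale with k while the parameter count scales with E. Below is a toy top-k router in NumPy, purely illustrative and not the architecture studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, experts, router_w, k=2):
    """x: [d] token; experts: list of (W1, W2) MLPs; router_w: [d, E]."""
    logits = x @ router_w                       # router score per expert
    top = np.argsort(logits)[-k:]               # pick the k highest-scoring experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()
    out = np.zeros_like(x)
    for g, e in zip(gates, top):                # only k experts are evaluated
        w1, w2 = experts[e]
        out += g * (np.maximum(x @ w1, 0.0) @ w2)
    return out

d, hidden, E = 16, 64, 8
experts = [(rng.normal(size=(d, hidden)), rng.normal(size=(hidden, d))) for _ in range(E)]
router_w = rng.normal(size=(d, E))
y = moe_layer(rng.normal(size=d), experts, router_w, k=2)
print(y.shape, f"- ran 2 of {E} experts, but all {E} must stay in memory")
```

Only the selected experts contribute FLOPs per token, yet every expert's weights must be resident, which is the memory cost the abstract points to.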
Author:
Nrusimha, Aniruddha, Mishra, Mayank, Wang, Naigang, Alistarh, Dan, Panda, Rameswar, Kim, Yoon
We consider the problem of accurate quantization for language models, where both the weights and activations are uniformly quantized to 4 bits per parameter, the lowest bitwidth format natively supported by GPU hardware. In this context, the key challenge… (a minimal round-to-nearest sketch follows this entry's link).
External link:
http://arxiv.org/abs/2404.03605
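As a reference point for the 4-bit setting mentioned above, here is a minimal symmetric round-to-nearest quantizer. It is a generic baseline sketch under an assumed per-tensor scale, not the method proposed in the paper.

```python
import numpy as np

def quantize_int4(x):
    """Map floats to integers in [-8, 7] (signed 4-bit) with one per-tensor scale."""
    scale = np.abs(x).max() / 7.0
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int4(w)
print("max abs rounding error:", float(np.abs(w - dequantize(q, scale)).max()))
```

A single large outlier inflates the scale and crushes the resolution left for the remaining values, which is why activation outliers are the central difficulty at this bitwidth.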
Author:
Vas, Joseph Vimal, Medwal, Rohit, Manna, Sourabh, Mishra, Mayank, Muller, Aaron, Mohan, John Rex, Fukuma, Yasuhiro, Duchamp, Martial, Rawat, Rajdeep Singh
Exploring new strategies for controlling the magnetic domain propagation is the key to realize ultrafast, high-density domain wall-based memory and logic devices for next generation computing. These strategies include strain modulation in multiferroi…
External link:
http://arxiv.org/abs/2404.03177