Výsledky vyhledávání - "Cox, David A"

Report

Autor: Cox, David A.

These notes explore three amazing formulas proved by Abel in his 1826 Paris memoir on what we now call Abelian integrals. We discuss the first two formulas from the point of view of symbolic computation and explain their connection to residues and pa

Externí odkaz: http://arxiv.org/abs/2410.03745

Zobrazit plný text záznamu

Report

Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler

Autor: Shen, Yikang, Stallone, Matthew, Mishra, Mayank, Zhang, Gaoyuan, Tan, Shawn, Prasad, Aditya, Soria, Adriana Meza, Cox, David D., Panda, Rameswar

Finding the optimal learning rate for language model pretraining is a challenging task. This is not only because there is a complicated correlation between learning rate, batch size, number of training tokens, model size, and other hyperparameters bu

Externí odkaz: http://arxiv.org/abs/2408.13359

Zobrazit plný text záznamu

Report

Scaling Granite Code Models to 128K Context

This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling context length of Granite 3B/8B code models from 2K/4K to 128K consists of a light-weight continual pretraini

Externí odkaz: http://arxiv.org/abs/2407.13739

Zobrazit plný text záznamu

Report

The infrastructure powering IBM's Gen AI model development

Autor: Gershon, Talia, Seelam, Seetharami, Belgodere, Brian, Bonilla, Milton, Hoang, Lan, Barnett, Danny, Chung, I-Hsin, Mohan, Apoorve, Chen, Ming-Hung, Luo, Lixiang, Walkup, Robert, Evangelinos, Constantinos, Salaria, Shweta, Dombrowa, Marc, Park, Yoonho, Kayi, Apo, Schour, Liran, Alim, Alim, Sydney, Ali, Maniotis, Pavlos, Schares, Laurent, Metzler, Bernard, Karacali-Akyamac, Bengi, Wen, Sophia, Chiba, Tatsuhiro, Choochotkaew, Sunyanan, Yoshimura, Takeshi, Misale, Claudia, Elengikal, Tonia, Connor, Kevin O, Liu, Zhuoran, Molina, Richard, Schneidenbach, Lars, Caden, James, Laibinis, Christopher, Fonseca, Carlos, Tarasov, Vasily, Sundararaman, Swaminathan, Schmuck, Frank, Guthridge, Scott, Cohn, Jeremy, Eshel, Marc, Muench, Paul, Liu, Runyu, Pointer, William, Wyskida, Drew, Krull, Bob, Rose, Ray, Wolfe, Brent, Cornejo, William, Walter, John, Malone, Colm, Perucci, Clifford, Franco, Frank, Hinds, Nigel, Calio, Bob, Druyan, Pavel, Kilduff, Robert, Kienle, John, McStay, Connor, Figueroa, Andrew, Connolly, Matthew, Fost, Edie, Roma, Gina, Fonseca, Jake, Levy, Ido, Payne, Michele, Schenkel, Ryan, Malki, Amir, Schneider, Lion, Narkhede, Aniruddha, Moshref, Shekeba, Kisin, Alexandra, Dodin, Olga, Rippon, Bill, Wrieth, Henry, Ganci, John, Colino, Johnny, Habeger-Rose, Donna, Pandey, Rakesh, Gidh, Aditya, Gaur, Aditya, Patterson, Dennis, Salmani, Samsuddin, Varma, Rambilas, Rumana, Rumana, Sharma, Shubham, Mishra, Mayank, Panda, Rameswar, Prasad, Aditya, Stallone, Matt, Zhang, Gaoyuan, Shen, Yikang, Cox, David, Puri, Ruchir, Agrawal, Dakshi, Thorstensen, Drew, Belog, Joel, Tang, Brent, Gupta, Saurabh Kumar, Biswas, Amitabha, Maheshwari, Anup, Gampel, Eran, Van Patten, Jason, Runion, Matthew, Kaki, Sai, Bogin, Yigal, Reitz, Brian, Pritko, Steve, Najam, Shahan, Nambala, Surya, Chirra, Radhika, Welp, Rick, DiMitri, Frank, Telles, Felipe, Arvelo, Amilcar, Chu, King, Seminaro, Ed, Schram, Andrew, Eickhoff, Felix, Hanson, William, Mckeever, Eric, Joseph, Dinakaran, Chaudhary, Piyush, Shivam, Piyush, Chaudhary, Puneet, Jones, Wesley, Guthrie, Robert, Bostic, Chris, Islam, Rezaul, Duersch, Steve, Sawdon, Wayne, Lewars, John, Klos, Matthew, Spriggs, Michael, McMillan, Bill, Gao, George, Kamra, Ashish, Singh, Gaurav, Curry, Marc, Katarki, Tushar, Talerico, Joe, Shi, Zenghui, Malleni, Sai Sindhur, Gallen, Erwan

AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational

Externí odkaz: http://arxiv.org/abs/2407.05467

Zobrazit plný text záznamu

Report

Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks

Large language models (LLMs) have recently shown tremendous promise in serving as the backbone to agentic systems, as demonstrated by their performance in multi-faceted, challenging benchmarks like SWE-Bench and Agent-Bench. However, to realize the t

Externí odkaz: http://arxiv.org/abs/2407.00121

Zobrazit plný text záznamu

Report

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Autor: Kang, Junmo, Karlinsky, Leonid, Luo, Hongyin, Wang, Zhen, Hansen, Jacob, Glass, James, Cox, David, Panda, Rameswar, Feris, Rogerio, Ritter, Alan

We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts, named MiXSE (MiXture of Self-specialized Experts). Our approach leverages self-specialization, which constructs expert

Externí odkaz: http://arxiv.org/abs/2406.12034

Zobrazit plný text záznamu

Report

Wavefront Threading Enables Effective High-Level Synthesis

Autor: Pelton, Blake, Sapek, Adam, Eguro, Ken, Lo, Daniel, Forin, Alessandro, Humphrey, Matt, Xi, Jinwen, Cox, David, Karandikar, Rajas, Licht, Johannes de Fine, Babin, Evgeny, Caulfield, Adrian, Burger, Doug

Digital systems are growing in importance and computing hardware is growing more heterogeneous. Hardware design, however, remains laborious and expensive, in part due to the limitations of conventional hardware description languages (HDLs) like VHDL

Externí odkaz: http://arxiv.org/abs/2405.19514

Zobrazit plný text záznamu

Report

$\textit{Trans-LoRA}$: towards data-free Transferable Parameter Efficient Finetuning

Autor: Wang, Runqian, Ghosh, Soumya, Cox, David, Antognini, Diego, Oliva, Aude, Feris, Rogerio, Karlinsky, Leonid

Low-rank adapters (LoRA) and their variants are popular parameter-efficient fine-tuning (PEFT) techniques that closely match full model fine-tune performance while requiring only a small number of additional parameters. These additional LoRA paramete

Externí odkaz: http://arxiv.org/abs/2405.17258

Zobrazit plný text záznamu

Report

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based age

Externí odkaz: http://arxiv.org/abs/2405.04324

Zobrazit plný text záznamu

Report

LAB: Large-Scale Alignment for ChatBots

Autor: Sudalairaj, Shivchander, Bhandwaldar, Abhishek, Pareja, Aldo, Xu, Kai, Cox, David D., Srivastava, Akash

This work introduces LAB (Large-scale Alignment for chatBots), a novel methodology designed to overcome the scalability challenges in the instruction-tuning phase of large language model (LLM) training. Leveraging a taxonomy-guided synthetic data gen

Externí odkaz: http://arxiv.org/abs/2403.01081

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání