Výsledky vyhledávání - "Catanzaro, Bryan"

Report

Effective Large Language Model Debugging with Best-first Tree Search

Autor: Song, Jialin, Raiman, Jonathan, Catanzaro, Bryan

Large Language Models (LLMs) show promise in code generation tasks. However, their code-writing abilities are often limited in scope: while they can successfully implement simple functions, they struggle with more complex tasks. A fundamental differe

Externí odkaz: http://arxiv.org/abs/2407.19055

Zobrazit plný text záznamu

Report

Compact Language Models via Pruning and Knowledge Distillation

Autor: Muralidharan, Saurav, Sreenivas, Sharath Turuvekere, Joshi, Raviraj, Chochowski, Marcin, Patwary, Mostofa, Shoeybi, Mohammad, Catanzaro, Bryan, Kautz, Jan, Molchanov, Pavlo

Large language models (LLMs) targeting different deployment scales and sizes are currently produced by training each variant from scratch; this is extremely compute-intensive. In this paper, we investigate if pruning an existing LLM and then re-train

Externí odkaz: http://arxiv.org/abs/2407.14679

Zobrazit plný text záznamu

Report

ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities

Autor: Xu, Peng, Ping, Wei, Wu, Xianchao, Liu, Zihan, Shoeybi, Mohammad, Catanzaro, Bryan

In this work, we introduce ChatQA 2, a Llama3-based model designed to bridge the gap between open-access LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-context understanding and retrieval-augmented generation (RAG) capabilities. Thes

Externí odkaz: http://arxiv.org/abs/2407.14482

Zobrazit plný text záznamu

Report

Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models

Autor: Parmar, Jupinder, Satheesh, Sanjev, Patwary, Mostofa, Shoeybi, Mohammad, Catanzaro, Bryan

As language models have scaled both their number of parameters and pretraining dataset sizes, the computational cost for pretraining has become intractable except for the most well-resourced teams. This increasing cost makes it ever more important to

Externí odkaz: http://arxiv.org/abs/2407.07263

Zobrazit plný text záznamu

Report

Data, Data Everywhere: A Guide for Pretraining Dataset Construction

Autor: Parmar, Jupinder, Prabhumoye, Shrimai, Jennings, Joseph, Liu, Bo, Jhunjhunwala, Aastha, Wang, Zhilin, Patwary, Mostofa, Shoeybi, Mohammad, Catanzaro, Bryan

The impressive capabilities of recent language models can be largely attributed to the multi-trillion token pretraining datasets that they are trained on. However, model developers fail to disclose their construction methodology which has lead to a l

Externí odkaz: http://arxiv.org/abs/2407.06380

Zobrazit plný text záznamu

Report

RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs

Autor: Yu, Yue, Ping, Wei, Liu, Zihan, Wang, Boxin, You, Jiaxuan, Zhang, Chao, Shoeybi, Mohammad, Catanzaro, Bryan

Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG). In this work, we propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual

Externí odkaz: http://arxiv.org/abs/2407.02485

Zobrazit plný text záznamu

Report

Improving Text-To-Audio Models with Synthetic Captions

Autor: Kong, Zhifeng, Lee, Sang-gil, Ghosal, Deepanway, Majumder, Navonil, Mehrish, Ambuj, Valle, Rafael, Poria, Soujanya, Catanzaro, Bryan

It is an open challenge to obtain high quality training data, especially captions, for text-to-audio models. Although prior methods have leveraged \textit{text-only language models} to augment and improve captions, such methods have limitations relat

Externí odkaz: http://arxiv.org/abs/2406.15487

Zobrazit plný text záznamu

Report

Nemotron-4 340B Technical Report

Autor: Nvidia, Adler, Bo, Agarwal, Niket, Aithal, Ashwath, Anh, Dong H., Bhattacharya, Pallab, Brundyn, Annika, Casper, Jared, Catanzaro, Bryan, Clay, Sharon, Cohen, Jonathan, Das, Sirshak, Dattagupta, Ayush, Delalleau, Olivier, Derczynski, Leon, Dong, Yi, Egert, Daniel, Evans, Ellie, Ficek, Aleksander, Fridman, Denys, Ghosh, Shaona, Ginsburg, Boris, Gitman, Igor, Grzegorzek, Tomasz, Hero, Robert, Huang, Jining, Jawa, Vibhu, Jennings, Joseph, Jhunjhunwala, Aastha, Kamalu, John, Khan, Sadaf, Kuchaiev, Oleksii, LeGresley, Patrick, Li, Hui, Liu, Jiwei, Liu, Zihan, Long, Eileen, Mahabaleshwarkar, Ameya Sunil, Majumdar, Somshubra, Maki, James, Martinez, Miguel, de Melo, Maer Rodrigues, Moshkov, Ivan, Narayanan, Deepak, Narenthiran, Sean, Navarro, Jesus, Nguyen, Phong, Nitski, Osvald, Noroozi, Vahid, Nutheti, Guruprasad, Parisien, Christopher, Parmar, Jupinder, Patwary, Mostofa, Pawelec, Krzysztof, Ping, Wei, Prabhumoye, Shrimai, Roy, Rajarshi, Saar, Trisha, Sabavat, Vasanth Rao Naik, Satheesh, Sanjeev, Scowcroft, Jane Polak, Sewall, Jason, Shamis, Pavel, Shen, Gerald, Shoeybi, Mohammad, Sizer, Dave, Smelyanskiy, Misha, Soares, Felipe, Sreedhar, Makesh Narsimhan, Su, Dan, Subramanian, Sandeep, Sun, Shengyang, Toshniwal, Shubham, Wang, Hao, Wang, Zhilin, You, Jiaxuan, Zeng, Jiaqi, Zhang, Jimmy, Zhang, Jing, Zhang, Vivienne, Zhang, Yian, Zhu, Chen

We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distri

Externí odkaz: http://arxiv.org/abs/2406.11704

Zobrazit plný text záznamu

Report

CircuitVAE: Efficient and Scalable Latent Circuit Optimization

Autor: Song, Jialin, Swope, Aidan, Kirby, Robert, Roy, Rajarshi, Godil, Saad, Raiman, Jonathan, Catanzaro, Bryan

Automatically designing fast and space-efficient digital circuits is challenging because circuits are discrete, must exactly implement the desired logic, and are costly to simulate. We address these challenges with CircuitVAE, a search algorithm that

Externí odkaz: http://arxiv.org/abs/2406.09535

Zobrazit plný text záznamu

Report

An Empirical Study of Mamba-based Language Models

Autor: Waleffe, Roger, Byeon, Wonmin, Riach, Duncan, Norick, Brandon, Korthikanti, Vijay, Dao, Tri, Gu, Albert, Hatamizadeh, Ali, Singh, Sudhakar, Narayanan, Deepak, Kulshreshtha, Garvit, Singh, Vartika, Casper, Jared, Kautz, Jan, Shoeybi, Mohammad, Catanzaro, Bryan

Selective state-space models (SSMs) like Mamba overcome some of the shortcomings of Transformers, such as quadratic computational complexity with sequence length and large inference-time memory requirements from the key-value cache. Moreover, recent

Externí odkaz: http://arxiv.org/abs/2406.07887

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání