Showing 1 - 10 of 8,287 for search: '"A. Catanzaro"'
Author:
Messina, S., Catanzaro, G., Lanza, A. F., Gandolfi, D., Serrano, M. M., Deeg, H. J., Garcia-Alvarez, D.
RACE-OC (Rotation and ACtivity Evolution in Open Clusters) is a project aimed at characterising the rotational and magnetic activity properties of the late-type members of open clusters, stellar associations, and moving groups of different ages. As p…
External link:
http://arxiv.org/abs/2408.16328
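Rotation studies like the one above typically derive stellar rotation periods from photometric time series with a periodogram. Below is a minimal, self-contained sketch using a Lomb-Scargle periodogram on synthetic data; it illustrates the general technique only and is not RACE-OC's actual pipeline.

import numpy as np
from astropy.timeseries import LombScargle

# Synthetic light curve: a 4.3-day spot-modulation signal plus noise.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 60, 300))   # observation times in days
true_period = 4.3
y = 0.02 * np.sin(2 * np.pi * t / true_period)
y += 0.005 * rng.standard_normal(t.size)

# Lomb-Scargle periodogram over an automatically chosen frequency grid.
frequency, power = LombScargle(t, y).autopower()
best_period = 1 / frequency[np.argmax(power)]
print(f"recovered rotation period: {best_period:.2f} d")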
Author:
Shi, Min, Liu, Fuxiao, Wang, Shihao, Liao, Shijia, Radhakrishnan, Subhashree, Huang, De-An, Yin, Hongxu, Sapra, Karan, Yacoob, Yaser, Shi, Humphrey, Catanzaro, Bryan, Tao, Andrew, Kautz, Jan, Yu, Zhiding, Liu, Guilin
The ability to accurately interpret complex visual information is a crucial topic of multimodal large language models (MLLMs). Recent work indicates that enhanced visual perception significantly reduces hallucinations and improves performance on reso…
External link:
http://arxiv.org/abs/2408.15998
Author:
Sreenivas, Sharath Turuvekere, Muralidharan, Saurav, Joshi, Raviraj, Chochowski, Marcin, Patwary, Mostofa, Shoeybi, Mohammad, Catanzaro, Bryan, Kautz, Jan, Molchanov, Pavlo
We present a comprehensive report on compressing the Llama 3.1 8B and Mistral NeMo 12B models to 4B and 8B parameters, respectively, using pruning and distillation. We explore two distinct pruning strategies: (1) depth pruning and (2) joint hidden/at…
External link:
http://arxiv.org/abs/2408.11796
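Of the two strategies named in the abstract above, depth pruning is the simpler to picture: whole Transformer layers are removed, and the shallower model is then recovered with further training. A minimal sketch on a toy PyTorch encoder follows; the every-other-layer choice is an illustrative assumption, not the paper's layer-selection criterion.

import torch
import torch.nn as nn

# Toy 8-layer Transformer encoder standing in for a real LLM.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=8,
)

# Depth pruning: keep a subset of layers (here simply every other one).
kept = nn.ModuleList(layer for i, layer in enumerate(encoder.layers) if i % 2 == 0)
encoder.layers = kept
encoder.num_layers = len(kept)

x = torch.randn(2, 16, 64)      # (batch, sequence, d_model)
print(encoder(x).shape)         # same output shape, half the depth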
Large Language Models (LLMs) show promise in code generation tasks. However, their code-writing abilities are often limited in scope: while they can successfully implement simple functions, they struggle with more complex tasks. A fundamental differe…
External link:
http://arxiv.org/abs/2407.19055
Author:
Muralidharan, Saurav, Sreenivas, Sharath Turuvekere, Joshi, Raviraj, Chochowski, Marcin, Patwary, Mostofa, Shoeybi, Mohammad, Catanzaro, Bryan, Kautz, Jan, Molchanov, Pavlo
Large language models (LLMs) targeting different deployment scales and sizes are currently produced by training each variant from scratch; this is extremely compute-intensive. In this paper, we investigate if pruning an existing LLM and then re-train…
External link:
http://arxiv.org/abs/2407.14679
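Retraining a pruned model is commonly done with knowledge distillation from the original network. The sketch below shows a plain temperature-scaled KL loss between teacher and student logits; this is the generic recipe, given here as an assumption rather than the paper's exact objective.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target KL divergence with temperature T (scaled by T^2)."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

# Dummy logits: batch of 4 positions over a 32k-token vocabulary.
student = torch.randn(4, 32000)
teacher = torch.randn(4, 32000)
print(distillation_loss(student, teacher).item())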
In this work, we introduce ChatQA 2, a Llama3-based model designed to bridge the gap between open-access LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-context understanding and retrieval-augmented generation (RAG) capabilities. Thes…
External link:
http://arxiv.org/abs/2407.14482
As language models have scaled both their number of parameters and pretraining dataset sizes, the computational cost for pretraining has become intractable except for the most well-resourced teams. This increasing cost makes it ever more important to…
External link:
http://arxiv.org/abs/2407.07263
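The scale of the cost this abstract refers to is easy to estimate with the standard C ≈ 6·N·D FLOPs rule of thumb (N parameters, D training tokens). The numbers below are illustrative and not taken from the paper.

def pretraining_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute via the common 6*N*D estimate."""
    return 6.0 * n_params * n_tokens

n_params = 8e9     # an 8B-parameter model
n_tokens = 1e13    # trained on 10T tokens
print(f"~{pretraining_flops(n_params, n_tokens):.1e} FLOPs")  # ~4.8e+23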
Author:
Parmar, Jupinder, Prabhumoye, Shrimai, Jennings, Joseph, Liu, Bo, Jhunjhunwala, Aastha, Wang, Zhilin, Patwary, Mostofa, Shoeybi, Mohammad, Catanzaro, Bryan
The impressive capabilities of recent language models can be largely attributed to the multi-trillion token pretraining datasets that they are trained on. However, model developers fail to disclose their construction methodology, which has led to a l…
External link:
http://arxiv.org/abs/2407.06380
Author:
Yu, Yue, Ping, Wei, Liu, Zihan, Wang, Boxin, You, Jiaxuan, Zhang, Chao, Shoeybi, Mohammad, Catanzaro, Bryan
Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG). In this work, we propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual…
External link:
http://arxiv.org/abs/2407.02485
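For context, the baseline setup RankRAG builds on is: retrieve the top-k passages for a query, then hand them to the LLM as context. The toy sketch below scores passages by token overlap purely for illustration; a real system would use a dense retriever, and RankRAG's contribution is having the same instruction-tuned LLM also rerank the candidates.

def top_k_contexts(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Rank passages by naive token overlap with the query (toy retriever)."""
    q_tokens = set(query.lower().split())
    return sorted(passages, key=lambda p: -len(q_tokens & set(p.lower().split())))[:k]

passages = [
    "Retrieval-augmented generation feeds retrieved passages to an LLM.",
    "Open clusters are groups of stars with a common age.",
    "Reranking reorders retrieved candidates before generation.",
]
query = "How does retrieval-augmented generation use retrieved passages?"
context = "\n".join(top_k_contexts(query, passages))
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
print(prompt)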
Author:
Kong, Zhifeng, Lee, Sang-gil, Ghosal, Deepanway, Majumder, Navonil, Mehrish, Ambuj, Valle, Rafael, Poria, Soujanya, Catanzaro, Bryan
It is an open challenge to obtain high-quality training data, especially captions, for text-to-audio models. Although prior methods have leveraged text-only language models to augment and improve captions, such methods have limitations relat…
External link:
http://arxiv.org/abs/2406.15487
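The caption-augmentation idea this last abstract mentions amounts to passing a raw, terse audio caption through a text-only LM with a rewriting instruction. The template below is a hypothetical illustration of such a prompt, not the paper's method; it only builds the prompt string and leaves the choice of LM open.

def caption_rewrite_prompt(raw_caption: str) -> str:
    """Build a rewriting prompt for a text-only LM (hypothetical template)."""
    return (
        "Rewrite the following audio caption into one fluent sentence. "
        "Do not invent sounds that are not listed.\n"
        f"Caption: {raw_caption}\nRewritten:"
    )

print(caption_rewrite_prompt("dog barking, distant thunder, rain on window"))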