Showing 1 - 10 of 127 for search: "Molchanov, Pavlo"
Author:
Ranzinger, Mike, Barker, Jon, Heinrich, Greg, Molchanov, Pavlo, Catanzaro, Bryan, Tao, Andrew
Various visual foundation models have distinct strengths and weaknesses, both of which can be improved through heterogeneous multi-teacher knowledge distillation without labels, termed "agglomerative models." We build upon this body of work by studying…
External link:
http://arxiv.org/abs/2410.01680
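As a rough illustration of the setup this line of work builds on, the sketch below distills a student against several teachers at the feature level, without labels. All names are placeholders, and the per-teacher standardization is just one simple way to balance heterogeneous teacher statistics, not the paper's exact formulation:

    import torch
    import torch.nn.functional as F

    def multi_teacher_distill_loss(student_feats, teacher_feats, adaptors):
        # Label-free agglomerative distillation: a per-teacher adaptor head maps
        # student features into each teacher's space, and both sides are
        # standardized so no single teacher's activation scale dominates.
        total = 0.0
        for t, head in zip(teacher_feats, adaptors):
            p = head(student_feats)
            t = (t - t.mean()) / (t.std() + 1e-6)
            p = (p - p.mean()) / (p.std() + 1e-6)
            total = total + F.mse_loss(p, t)
        return total / len(adaptors)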
Author:
Fang, Gongfan, Yin, Hongxu, Muralidharan, Saurav, Heinrich, Greg, Pool, Jeff, Kautz, Jan, Molchanov, Pavlo, Wang, Xinchao
Large Language Models (LLMs) are distinguished by their massive parameter counts, which typically result in significant redundancy. This work introduces MaskLLM, a learnable pruning method that establishes Semi-structured (or "N:M") Sparsity in LLMs…
External link:
http://arxiv.org/abs/2409.17481
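For context, N:M sparsity keeps at most N nonzero weights in every group of M consecutive weights. MaskLLM learns these masks end to end; the sketch below shows only the simpler magnitude-based baseline, with all names illustrative:

    import torch

    def nm_magnitude_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
        # Keep the n largest-magnitude weights in each contiguous group of m
        # (the common "2:4" pattern by default). Assumes the number of input
        # features is divisible by m.
        rows, cols = weight.shape
        groups = weight.abs().reshape(-1, m)
        keep = groups.topk(n, dim=1).indices
        mask = torch.zeros_like(groups)
        mask.scatter_(1, keep, 1.0)
        return mask.reshape(rows, cols)

    w = torch.randn(8, 16)
    sparse_w = w * nm_magnitude_mask(w)  # exactly 2 nonzeros per group of 4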
Author:
Li, Jiefeng, Yuan, Ye, Rempe, Davis, Zhang, Haotian, Molchanov, Pavlo, Lu, Cewu, Kautz, Jan, Iqbal, Umar
Estimating global human motion from moving cameras is challenging due to the entanglement of human and camera motions. To mitigate the ambiguity, existing methods leverage learned human motion priors, which, however, often result in oversmoothed motions…
External link:
http://arxiv.org/abs/2408.16426
Author:
Sreenivas, Sharath Turuvekere, Muralidharan, Saurav, Joshi, Raviraj, Chochowski, Marcin, Patwary, Mostofa, Shoeybi, Mohammad, Catanzaro, Bryan, Kautz, Jan, Molchanov, Pavlo
We present a comprehensive report on compressing the Llama 3.1 8B and Mistral NeMo 12B models to 4B and 8B parameters, respectively, using pruning and distillation. We explore two distinct pruning strategies: (1) depth pruning and (2) joint hidden/attention…
External link:
http://arxiv.org/abs/2408.11796
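As a rough illustration of the depth-pruning side, the skeleton below drops the least important transformer blocks given externally computed importance scores. How those scores are obtained, and the distillation-based retraining that follows, is the substance of the report and is not shown here:

    import torch.nn as nn

    def depth_prune(blocks: nn.ModuleList, importance: list, keep: int) -> nn.ModuleList:
        # Keep the highest-scoring transformer blocks, preserving their
        # original order so the residual stream stays coherent.
        ranked = sorted(range(len(blocks)), key=lambda i: importance[i], reverse=True)
        survivors = sorted(ranked[:keep])
        return nn.ModuleList(blocks[i] for i in survivors)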
Author:
Xue, Fuzhao, Chen, Yukang, Li, Dacheng, Hu, Qinghao, Zhu, Ligeng, Li, Xiuyu, Fang, Yunhao, Tang, Haotian, Yang, Shang, Liu, Zhijian, He, Ethan, Yin, Hongxu, Molchanov, Pavlo, Kautz, Jan, Fan, Linxi, Zhu, Yuke, Lu, Yao, Han, Song
Long-context capability is critical for multi-modal foundation models, especially for long video understanding. We introduce LongVILA, a full-stack solution for long-context visual-language models by co-designing the algorithm and system. For model training…
External link:
http://arxiv.org/abs/2408.10188
Author:
Fang, Yunhao, Zhu, Ligeng, Lu, Yao, Wang, Yan, Molchanov, Pavlo, Cho, Jang Hyun, Pavone, Marco, Han, Song, Yin, Hongxu
Visual language models (VLMs) have rapidly progressed, driven by the success of large language models (LLMs). While model architectures and training infrastructures advance rapidly, data curation remains under-explored. When data quantity and quality…
External link:
http://arxiv.org/abs/2407.17453
Author:
Siddiqui, Shoaib Ahmed, Dong, Xin, Heinrich, Greg, Breuel, Thomas, Kautz, Jan, Krueger, David, Molchanov, Pavlo
Large Language Models (LLMs) are not only resource-intensive to train but even more costly to deploy in production. Therefore, recent work has attempted to prune blocks of LLMs based on cheap proxies for estimating block importance, effectively removing…
External link:
http://arxiv.org/abs/2407.16286
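One family of cheap proxies scores a block by how little it changes its input: if a block's output is nearly identical to its input, removing it should be nearly free. The sketch below illustrates that idea with cosine similarity over hidden states; it is a generic stand-in, not necessarily the proxy this paper examines:

    import torch

    @torch.no_grad()
    def block_redundancy(hidden_in: torch.Tensor, hidden_out: torch.Tensor) -> float:
        # Cosine similarity between a block's input and output activations,
        # averaged over tokens. Values near 1.0 mark the block as redundant.
        x = hidden_in.flatten(0, -2)   # (tokens, hidden_dim)
        y = hidden_out.flatten(0, -2)
        cos = torch.nn.functional.cosine_similarity(x, y, dim=-1)
        return cos.mean().item()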
Author:
Muralidharan, Saurav, Sreenivas, Sharath Turuvekere, Joshi, Raviraj, Chochowski, Marcin, Patwary, Mostofa, Shoeybi, Mohammad, Catanzaro, Bryan, Kautz, Jan, Molchanov, Pavlo
Large language models (LLMs) targeting different deployment scales and sizes are currently produced by training each variant from scratch; this is extremely compute-intensive. In this paper, we investigate if pruning an existing LLM and then re-training it…
External link:
http://arxiv.org/abs/2407.14679
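Complementing the depth-pruning skeleton above, width pruning removes individual neurons instead of whole blocks. A minimal sketch, assuming a small calibration batch of activations and using mean absolute activation as the importance score (the paper's actual criteria and retraining recipe differ):

    import torch
    import torch.nn as nn

    @torch.no_grad()
    def prune_linear_outputs(layer: nn.Linear, acts: torch.Tensor, keep: int) -> nn.Linear:
        # Score each output neuron by its mean absolute activation on a
        # calibration batch of shape (batch, out_features), then rebuild
        # the layer with only the top-scoring neurons.
        scores = acts.abs().mean(dim=0)
        idx = scores.topk(keep).indices.sort().values   # survivors, original order
        pruned = nn.Linear(layer.in_features, keep, bias=layer.bias is not None)
        pruned.weight.copy_(layer.weight[idx])
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[idx])
        return pruned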
Author:
Cai, Ruisi, Muralidharan, Saurav, Heinrich, Greg, Yin, Hongxu, Wang, Zhangyang, Kautz, Jan, Molchanov, Pavlo
Training modern LLMs is extremely resource-intensive, and customizing them for various deployment scenarios characterized by limited compute and memory resources through repeated training is impractical. In this paper, we introduce Flextron, a network…
External link:
http://arxiv.org/abs/2406.10260
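The core idea behind such elastic networks is that one set of weights can serve several model widths by activating only a leading slice at inference time. A toy sketch of that nested-weight idea follows; Flextron additionally learns routers that pick sub-networks per input, which is omitted here:

    import torch
    import torch.nn as nn

    class ElasticLinear(nn.Module):
        # One full-size weight matrix serves several deployment widths by
        # slicing its leading output rows at inference time.
        def __init__(self, in_features: int, out_features: int):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

        def forward(self, x: torch.Tensor, active_out: int) -> torch.Tensor:
            return x @ self.weight[:active_out].t()

    layer = ElasticLinear(64, 64)
    x = torch.randn(2, 64)
    y_full  = layer(x, active_out=64)   # full-capacity path
    y_small = layer(x, active_out=32)   # reduced width for tight memory budgets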
Data often arrives in sequence over time in real-world deep learning applications such as autonomous driving. When new training data is available, training the model from scratch undermines the benefit of leveraging the learned knowledge, leading to…
External link:
http://arxiv.org/abs/2406.04484
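Since this abstract is cut off before naming the method, the sketch below only illustrates the general setup it motivates: warm-starting from the existing model and mixing incoming data with a small replay buffer, rather than retraining from scratch. Names such as train_step are placeholders:

    import random

    def continual_update(model, new_data, replay_buffer, train_step, replay_ratio=0.5):
        # Fine-tune the existing model on incoming samples, rehearsing old
        # samples from the buffer to reduce forgetting of prior knowledge.
        for sample in new_data:
            batch = [sample]
            if replay_buffer and random.random() < replay_ratio:
                batch.append(random.choice(replay_buffer))
            train_step(model, batch)
            replay_buffer.append(sample)
        return model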