Showing 1 - 10 of 471 for search: '"Kautz, Jan"'
Author:
Waleffe, Roger, Byeon, Wonmin, Riach, Duncan, Norick, Brandon, Korthikanti, Vijay, Dao, Tri, Gu, Albert, Hatamizadeh, Ali, Singh, Sudhakar, Narayanan, Deepak, Kulshreshtha, Garvit, Singh, Vartika, Casper, Jared, Kautz, Jan, Shoeybi, Mohammad, Catanzaro, Bryan
Selective state-space models (SSMs) like Mamba overcome some of the shortcomings of Transformers, such as quadratic computational complexity with sequence length and large inference-time memory requirements from the key-value cache. Moreover, recent
External link:
http://arxiv.org/abs/2406.07887
Author:
Li, Zhenxin, Li, Kailin, Wang, Shihao, Lan, Shiyi, Yu, Zhiding, Ji, Yishen, Li, Zhiqi, Zhu, Ziyue, Kautz, Jan, Wu, Zuxuan, Jiang, Yu-Gang, Alvarez, Jose M.
We propose Hydra-MDP, a novel paradigm employing multiple teachers in a teacher-student model. This approach uses knowledge distillation from both human and rule-based teachers to train the student model, which features a multi-head decoder to learn
External link:
http://arxiv.org/abs/2406.06978
Author:
Cai, Ruisi, Muralidharan, Saurav, Heinrich, Greg, Yin, Hongxu, Wang, Zhangyang, Kautz, Jan, Molchanov, Pavlo
Training modern LLMs is extremely resource intensive, and customizing them for various deployment scenarios characterized by limited compute and memory resources through repeated training is impractical. In this paper, we introduce Flextron, a networ
External link:
http://arxiv.org/abs/2406.10260
Recently video diffusion models have emerged as expressive generative tools for high-quality video content creation readily available to general users. However, these models often do not offer precise control over camera poses for video generation, l
External link:
http://arxiv.org/abs/2406.02509
Author:
Cheng, An-Chieh, Yin, Hongxu, Fu, Yang, Guo, Qiushan, Yang, Ruihan, Kautz, Jan, Wang, Xiaolong, Liu, Sifei
Vision Language Models (VLMs) have demonstrated remarkable performance in 2D vision and language tasks. However, their ability to reason about spatial arrangements remains limited. In this work, we introduce Spatial Region GPT (SpatialRGPT) to enhanc
External link:
http://arxiv.org/abs/2406.01584
Author:
Ye, Hanrong, Huang, De-An, Lu, Yao, Yu, Zhiding, Ping, Wei, Tao, Andrew, Kautz, Jan, Han, Song, Xu, Dan, Molchanov, Pavlo, Yin, Hongxu
We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities. By aligning modality-specific encoders with LLM inputs and diffusion decoders with LL
External link:
http://arxiv.org/abs/2405.19335
Author:
Wang, Shihao, Yu, Zhiding, Jiang, Xiaohui, Lan, Shiyi, Shi, Min, Chang, Nadine, Kautz, Jan, Li, Ying, Alvarez, Jose M.
The advances in multimodal large language models (MLLMs) have led to growing interests in LLM-based autonomous driving agents to leverage their strong reasoning capabilities. However, capitalizing on MLLMs' strong reasoning capabilities for improved
External link:
http://arxiv.org/abs/2405.01533
Author:
Huang, De-An, Liao, Shijia, Radhakrishnan, Subhashree, Yin, Hongxu, Molchanov, Pavlo, Yu, Zhiding, Kautz, Jan
There has been tremendous progress in multimodal Large Language Models (LLMs). Recent works have extended these models to video input with promising instruction following capabilities. However, an important missing piece is temporal localization. The
External link:
http://arxiv.org/abs/2403.19046
Wide field-of-view (FoV) cameras efficiently capture large portions of the scene, which makes them attractive in multiple domains, such as automotive and robotics. For such applications, estimating depth from multiple images is a critical task, and t
External link:
http://arxiv.org/abs/2401.13786
Author:
Yuan, Ye, Li, Xueting, Huang, Yangyi, De Mello, Shalini, Nagano, Koki, Kautz, Jan, Iqbal, Umar
Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatar
External link:
http://arxiv.org/abs/2312.11461