Showing 1 - 10 of 80 for search: '"Krähenbühl, Philipp"'
As language models grow ever larger, so do their vocabularies. This has shifted the memory footprint of LLMs during training disproportionately to one single layer: the cross-entropy in the loss computation. Cross-entropy builds up a logit matrix wit
External link:
http://arxiv.org/abs/2411.09009
Not yet. We present SPACE, a benchmark that systematically evaluates spatial cognition in frontier models. Our benchmark builds on decades of research in cognitive science. It evaluates large-scale mapping abilities that are brought to bear when an o
External link:
http://arxiv.org/abs/2410.06468
Authors:
Tan, Shuhan, Ivanovic, Boris, Chen, Yuxiao, Li, Boyi, Weng, Xinshuo, Cao, Yulong, Krähenbühl, Philipp, Pavone, Marco
Simulation stands as a cornerstone for safe and efficient autonomous driving development. At its core a simulation system ought to produce realistic, reactive, and controllable traffic patterns. In this paper, we propose ProSim, a multimodal promptab
External link:
http://arxiv.org/abs/2409.05863
We propose a new transformer-based image and video tokenizer with Binary Spherical Quantization (BSQ). BSQ projects the high-dimensional visual embedding to a lower-dimensional hypersphere and then applies binary quantization. BSQ is (1) parameter-ef
External link:
http://arxiv.org/abs/2406.07548
Authors:
Cho, Jang Hyun, Ivanovic, Boris, Cao, Yulong, Schmerling, Edward, Wang, Yue, Weng, Xinshuo, Li, Boyi, You, Yurong, Krähenbühl, Philipp, Wang, Yan, Pavone, Marco
Multi-modal large language models (MLLMs) have shown incredible capabilities in a variety of 2D vision and language tasks. We extend MLLMs' perceptual capabilities to ground and reason about images in 3-dimensional space. To that end, we first develo
External link:
http://arxiv.org/abs/2405.03685
Authors:
Zhao, Yue, Zhao, Long, Zhou, Xingyi, Wu, Jialin, Chu, Chun-Te, Miao, Hui, Schroff, Florian, Adam, Hartwig, Liu, Ting, Gong, Boqing, Krähenbühl, Philipp, Yuan, Liangzhe
The recent advance in vision-language models is largely attributed to the abundance of image-text data. We aim to replicate this success for video-language models, but there simply is not enough human-curated video-text data available. We thus resort
External link:
http://arxiv.org/abs/2401.06129
Stabilizing proteins is a foundational step in protein engineering. However, the evolutionary pressure of all extant proteins makes identifying the scarce number of mutations that will improve thermodynamic stability challenging. Deep learning has re
External link:
http://arxiv.org/abs/2310.12979
Authors:
Zhao, Yue, Krähenbühl, Philipp
Videos are big, complex to pre-process, and slow to train on. State-of-the-art large-scale video models are trained on clusters of 32 or more GPUs for several days. As a consequence, academia largely ceded the training of large video models to indust
External link:
http://arxiv.org/abs/2309.16669
Simulation forms the backbone of modern self-driving development. Simulators help develop, test, and improve driving systems without putting humans, vehicles, or their environment at risk. However, simulators face a major challenge: They rely on real
External link:
http://arxiv.org/abs/2307.07947
Authors:
Cho, Jang Hyun, Krähenbühl, Philipp
Large-scale object detection and instance segmentation face a severe data imbalance. The finer-grained object classes become, the less frequent they appear in our datasets. However, at test-time, we expect a detector that performs well for all classe
External link:
http://arxiv.org/abs/2301.09724