Showing 1 - 1 of 1 for search: '"True, Nate"'
Authors:
Vasu, Pavan Kumar Anasosalu; Faghri, Fartash; Li, Chun-Liang; Koc, Cem; True, Nate; Antony, Albert; Santhanam, Gokul; Gabriel, James; Grasch, Peter; Tuzel, Oncel; Pouransari, Hadi
Scaling the input image resolution is essential for enhancing the performance of Vision Language Models (VLMs), particularly in text-rich image understanding tasks. However, popular visual encoders such as ViTs become inefficient at high resolutions
External link:
http://arxiv.org/abs/2412.13303