Showing 1 - 10 of 4,768 for search: '"P Vasu"'
Author:
Vasu, Pavan Kumar Anasosalu, Faghri, Fartash, Li, Chun-Liang, Koc, Cem, True, Nate, Antony, Albert, Santhanam, Gokul, Gabriel, James, Grasch, Peter, Tuzel, Oncel, Pouransari, Hadi
Scaling the input image resolution is essential for enhancing the performance of Vision Language Models (VLMs), particularly in text-rich image understanding tasks. However, popular visual encoders such as ViTs become inefficient at high resolutions
External link:
http://arxiv.org/abs/2412.13303
Social and economic networks are often multiplexed, meaning that people are connected by different types of relationships -- such as borrowing goods and giving advice. We make three contributions to the study of multiplexing. First, we document empir
External link:
http://arxiv.org/abs/2412.11957
3D Gaussian Splatting (3D-GS) is a recent 3D scene reconstruction technique that enables real-time rendering of novel views by modeling scenes as parametric point clouds of differentiable 3D Gaussians. However, its rendering speed and model size stil
External link:
http://arxiv.org/abs/2412.00578
Author:
Joshi, Ishika, Shahid, Simra, Venneti, Shreeya, Vasu, Manushree, Zheng, Yantao, Li, Yunyao, Krishnamurthy, Balaji, Chan, Gromit Yeuk-Yin
Ensuring large language models' (LLMs) responses align with prompt instructions is crucial for application development. Based on our formative study with industry professionals, the alignment requires heavy human involvement and tedious trial-and-err
External link:
http://arxiv.org/abs/2411.06099
We develop a quasisymmetric analogue of the theory of Schubert cycles, building off of our previous work on a quasisymmetric analogue of Schubert polynomials and divided differences. Our constructions result in a natural geometric interpretation for
External link:
http://arxiv.org/abs/2410.12643
Author:
Barman, Niyar R, Sharma, Krish, Aziz, Ashhar, Bajpai, Shashwat, Biswas, Shwetangshu, Sharma, Vasu, Jain, Vinija, Chadha, Aman, Sheth, Amit, Das, Amitava
The rapid advancement of text-to-image generation systems, exemplified by models like Stable Diffusion, Midjourney, Imagen, and DALL-E, has heightened concerns about their potential misuse. In response, companies like Meta and Google have intensified
External link:
http://arxiv.org/abs/2408.10446
Author:
Hsieh, Yu-Guan, Hsieh, Cheng-Yu, Yeh, Shih-Ying, Béthune, Louis, Ansari, Hadi Pour, Vasu, Pavan Kumar Anasosalu, Li, Chun-Liang, Krishna, Ranjay, Tuzel, Oncel, Cuturi, Marco
Humans describe complex scenes with compositionality, using simple text descriptions enriched with links and relationships. While vision-language research has aimed to develop models with compositional understanding capabilities, this is not reflecte
External link:
http://arxiv.org/abs/2407.06723
We give an elementary approach utilizing only the divided difference formalism for obtaining expansions of Schubert polynomials that are manifestly nonnegative, by studying solutions to the equation $\sum Y_i\partial_i=\mathrm{id}$ on polynomials wit
External link:
http://arxiv.org/abs/2407.02375
Controlling light propagation through complex media plays a significant role in a wide range of applications ranging from astronomical observations to microscopy. Although several advances have been made based on adaptive optics, optical phase conju
External link:
http://arxiv.org/abs/2406.10705
Author:
Singla, Vasu, Yue, Kaiyu, Paul, Sukriti, Shirkavand, Reza, Jayawardhana, Mayuka, Ganjdanesh, Alireza, Huang, Heng, Bhatele, Abhinav, Somepalli, Gowthami, Goldstein, Tom
Training large vision-language models requires extensive, high-quality image-text pairs. Existing web-scraped datasets, however, are noisy and lack detailed image descriptions. To bridge this gap, we introduce PixelProse, a comprehensive dataset of o
External link:
http://arxiv.org/abs/2406.10328