Search results

Report

FastVLM: Efficient Vision Encoding for Vision Language Models

Authors: Vasu, Pavan Kumar Anasosalu; Faghri, Fartash; Li, Chun-Liang; Koc, Cem; True, Nate; Antony, Albert; Santhanam, Gokul; Gabriel, James; Grasch, Peter; Tuzel, Oncel; Pouransari, Hadi

Scaling the input image resolution is essential for enhancing the performance of Vision Language Models (VLMs), particularly in text-rich image understanding tasks. However, popular visual encoders such as ViTs become inefficient at high resolutions

External link: http://arxiv.org/abs/2412.13303

Report

Multiplexing in Networks and Diffusion

Authors: Chandrasekhar, Arun G.; Chaudhary, Vasu; Golub, Benjamin; Jackson, Matthew O.

Social and economic networks are often multiplexed, meaning that people are connected by different types of relationships -- such as borrowing goods and giving advice. We make three contributions to the study of multiplexing. First, we document empir

External link: http://arxiv.org/abs/2412.11957

Report

Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives

Authors: Hanson, Alex; Tu, Allen; Lin, Geng; Singla, Vasu; Zwicker, Matthias; Goldstein, Tom

3D Gaussian Splatting (3D-GS) is a recent 3D scene reconstruction technique that enables real-time rendering of novel views by modeling scenes as parametric point clouds of differentiable 3D Gaussians. However, its rendering speed and model size stil

External link: http://arxiv.org/abs/2412.00578

Report

CoPrompter: User-Centric Evaluation of LLM Instruction Alignment for Improved Prompt Engineering

Authors: Joshi, Ishika; Shahid, Simra; Venneti, Shreeya; Vasu, Manushree; Zheng, Yantao; Li, Yunyao; Krishnamurthy, Balaji; Chan, Gromit Yeuk-Yin

Ensuring large language models' (LLMs) responses align with prompt instructions is crucial for application development. Based on our formative study with industry professionals, the alignment requires heavy human involvement and tedious trial-and-err

External link: http://arxiv.org/abs/2411.06099

Report

The geometry of quasisymmetric coinvariants

Authors: Nadeau, Philippe; Spink, Hunter; Tewari, Vasu

We develop a quasisymmetric analogue of the theory of Schubert cycles, building off of our previous work on a quasisymmetric analogue of Schubert polynomials and divided differences. Our constructions result in a natural geometric interpretation for

External link: http://arxiv.org/abs/2410.12643

Report

The Brittleness of AI-Generated Image Watermarking Techniques: Examining Their Robustness Against Visual Paraphrasing Attacks

Authors: Barman, Niyar R; Sharma, Krish; Aziz, Ashhar; Bajpai, Shashwat; Biswas, Shwetangshu; Sharma, Vasu; Jain, Vinija; Chadha, Aman; Sheth, Amit; Das, Amitava

The rapid advancement of text-to-image generation systems, exemplified by models like Stable Diffusion, Midjourney, Imagen, and DALL-E, has heightened concerns about their potential misuse. In response, companies like Meta and Google have intensified

External link: http://arxiv.org/abs/2408.10446

Report

Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions

Authors: Hsieh, Yu-Guan; Hsieh, Cheng-Yu; Yeh, Shih-Ying; Béthune, Louis; Ansari, Hadi Pour; Vasu, Pavan Kumar Anasosalu; Li, Chun-Liang; Krishna, Ranjay; Tuzel, Oncel; Cuturi, Marco

Humans describe complex scenes with compositionality, using simple text descriptions enriched with links and relationships. While vision-language research has aimed to develop models with compositional understanding capabilities, this is not reflecte

External link: http://arxiv.org/abs/2407.06723

Report

Schubert polynomial expansions revisited

Authors: Nadeau, Philippe; Spink, Hunter; Tewari, Vasu

We give an elementary approach utilizing only the divided difference formalism for obtaining expansions of Schubert polynomials that are manifestly nonnegative, by studying solutions to the equation $\sum Y_i\partial_i=\mathrm{id}$ on polynomials wit

External link: http://arxiv.org/abs/2407.02375

Report

Propagation of circular Airy derivative beams in complex media

Authors: Kumari, Anita; Dev, Vasu; Pal, Vishwa

Controlling light propagation through complex media plays a significant role in a wide range of applications ranging from astronomical observations to microscopy. Although several advances have been made based on adaptive optics, optical phase conju

External link: http://arxiv.org/abs/2406.10705

Report

From Pixels to Prose: A Large Dataset of Dense Image Captions

Authors: Singla, Vasu; Yue, Kaiyu; Paul, Sukriti; Shirkavand, Reza; Jayawardhana, Mayuka; Ganjdanesh, Alireza; Huang, Heng; Bhatele, Abhinav; Somepalli, Gowthami; Goldstein, Tom

Training large vision-language models requires extensive, high-quality image-text pairs. Existing web-scraped datasets, however, are noisy and lack detailed image descriptions. To bridge this gap, we introduce PixelProse, a comprehensive dataset of o

External link: http://arxiv.org/abs/2406.10328
