Zobrazeno 1 - 10
of 4 736
pro vyhledávání: '"A. Saenko"'
Low-rank adapters enable fine-tuning of large models with only a small number of parameters, thus reducing storage costs and minimizing the risk of catastrophic forgetting. However, they often pose optimization challenges, with poor convergence. To o
Externí odkaz:
http://arxiv.org/abs/2412.10362
Autor:
Ray, Arijit, Duan, Jiafei, Tan, Reuben, Bashkirova, Dina, Hendrix, Rose, Ehsani, Kiana, Kembhavi, Aniruddha, Plummer, Bryan A., Krishna, Ranjay, Zeng, Kuo-Hao, Saenko, Kate
Spatial perception is a fundamental component of intelligence. While many studies highlight that large multimodal language models (MLMs) struggle to reason about space, they only test for static spatial reasoning, such as categorizing the relative po
Externí odkaz:
http://arxiv.org/abs/2412.07755
Multi-Source Domain Generalization (DG) is the task of training on multiple source domains and achieving high classification performance on unseen target domains. Recent methods combine robust features from web-scale pretrained backbones with new fea
Externí odkaz:
http://arxiv.org/abs/2412.02856
Autor:
Yiu, Eunice, Qraitem, Maan, Wong, Charlie, Majhi, Anisa Noor, Bai, Yutong, Ginosar, Shiry, Gopnik, Alison, Saenko, Kate
This paper investigates visual analogical reasoning in large multimodal models (LMMs) compared to human adults and children. A "visual analogy" is an abstract rule inferred from one image and applied to another. While benchmarks exist for testing vis
Externí odkaz:
http://arxiv.org/abs/2407.17773
Mobile app user interfaces (UIs) are rich with action, text, structure, and image content that can be utilized to learn generic UI representations for tasks like automating user commands, summarizing content, and evaluating the accessibility of user
Externí odkaz:
http://arxiv.org/abs/2406.07822
Online content is filled with logos, from ads and social media posts to website branding and product placements. Consequently, these logos are prevalent in the extensive web-scraped datasets used to pretrain Vision-Language Models, which are used for
Externí odkaz:
http://arxiv.org/abs/2406.01449
Autor:
Bordes, Florian, Pang, Richard Yuanzhe, Ajay, Anurag, Li, Alexander C., Bardes, Adrien, Petryk, Suzanne, Mañas, Oscar, Lin, Zhiqiu, Mahmoud, Anas, Jayaraman, Bargav, Ibrahim, Mark, Hall, Melissa, Xiong, Yunyang, Lebensold, Jonathan, Ross, Candace, Jayakumar, Srihari, Guo, Chuan, Bouchacourt, Diane, Al-Tahan, Haider, Padthe, Karthik, Sharma, Vasu, Xu, Hu, Tan, Xiaoqing Ellen, Richards, Megan, Lavoie, Samuel, Astolfi, Pietro, Hemmat, Reyhane Askari, Chen, Jun, Tirumala, Kushal, Assouel, Rim, Moayeri, Mazda, Talattof, Arjang, Chaudhuri, Kamalika, Liu, Zechun, Chen, Xilun, Garrido, Quentin, Ullrich, Karen, Agrawal, Aishwarya, Saenko, Kate, Celikyilmaz, Asli, Chandra, Vikas
Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce
Externí odkaz:
http://arxiv.org/abs/2405.17247
Autor:
Petsiuk, Vitali, Saenko, Kate
Motivated by ethical and legal concerns, the scientific community is actively developing methods to limit the misuse of Text-to-Image diffusion models for reproducing copyrighted, violent, explicit, or personal information in the generated images. Si
Externí odkaz:
http://arxiv.org/abs/2404.13706
Autor:
Tan, Reuben, Sun, Ximeng, Hu, Ping, Wang, Jui-hsien, Deilamsalehy, Hanieh, Plummer, Bryan A., Russell, Bryan, Saenko, Kate
Long video question answering is a challenging task that involves recognizing short-term activities and reasoning about their fine-grained relationships. State-of-the-art video Large Language Models (vLLMs) hold promise as a viable solution due to th
Externí odkaz:
http://arxiv.org/abs/2404.04346
Typographic Attacks, which involve pasting misleading text onto an image, were noted to harm the performance of Vision-Language Models like CLIP. However, the susceptibility of recent Large Vision-Language Models to these attacks remains understudied
Externí odkaz:
http://arxiv.org/abs/2402.00626