Výsledky vyhledávání

Report

ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

Autor: Huang, Irene, Lin, Wei, Mirza, M. Jehanzeb, Hansen, Jacob A., Doveh, Sivan, Butoi, Victor Ion, Herzig, Roei, Arbelle, Assaf, Kuhene, Hilde, Darrel, Trevor, Gan, Chuang, Oliva, Aude, Feris, Rogerio, Karlinsky, Leonid

Compositional Reasoning (CR) entails grasping the significance of attributes, relations, and word order. Recent Vision-Language Models (VLMs), comprising a visual encoder and a Large Language Model (LLM) decoder, have demonstrated remarkable proficie

Externí odkaz: http://arxiv.org/abs/2406.08164

Zobrazit plný text záznamu

Report

$\textit{Trans-LoRA}$: towards data-free Transferable Parameter Efficient Finetuning

Autor: Wang, Runqian, Ghosh, Soumya, Cox, David, Antognini, Diego, Oliva, Aude, Feris, Rogerio, Karlinsky, Leonid

Low-rank adapters (LoRA) and their variants are popular parameter-efficient fine-tuning (PEFT) techniques that closely match full model fine-tune performance while requiring only a small number of additional parameters. These additional LoRA paramete

Externí odkaz: http://arxiv.org/abs/2405.17258

Zobrazit plný text záznamu

Report

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

Autor: Pan, Bowen, Shen, Yikang, Liu, Haokun, Mishra, Mayank, Zhang, Gaoyuan, Oliva, Aude, Raffel, Colin, Panda, Rameswar

Mixture-of-Experts (MoE) language models can reduce computational costs by 2-4$\times$ compared to dense models without sacrificing performance, making them more efficient in computation-bounded scenarios. However, MoE models generally require 2-4$\t

Externí odkaz: http://arxiv.org/abs/2404.05567

Zobrazit plný text záznamu

Report

Learning Human Action Recognition Representations Without Real Humans

Autor: Zhong, Howard, Mishra, Samarth, Kim, Donghyun, Jin, SouYoung, Panda, Rameswar, Kuehne, Hilde, Karlinsky, Leonid, Saligrama, Venkatesh, Oliva, Aude, Feris, Rogerio

Pre-training on massive video datasets has become essential to achieve high action recognition performance on smaller downstream datasets. However, most large-scale video datasets contain images of people and hence are accompanied with issues related

Externí odkaz: http://arxiv.org/abs/2311.06231

Zobrazit plný text záznamu

Report

LangNav: Language as a Perceptual Representation for Navigation

Autor: Pan, Bowen, Panda, Rameswar, Jin, SouYoung, Feris, Rogerio, Oliva, Aude, Isola, Phillip, Kim, Yoon

We explore the use of language as a perceptual representation for vision-and-language navigation (VLN), with a focus on low-data settings. Our approach uses off-the-shelf vision systems for image captioning and object detection to convert an agent's

Externí odkaz: http://arxiv.org/abs/2310.07889

Zobrazit plný text záznamu

Report

Artifact magnification on deepfake videos increases human detection and subjective confidence

Autor: Josephs, Emilie, Fosco, Camilo, Oliva, Aude

The development of technologies for easily and automatically falsifying video has raised practical questions about people's ability to detect false information online. How vulnerable are people to deepfake videos? What technologies can be applied to

Externí odkaz: http://arxiv.org/abs/2304.04733

Zobrazit plný text záznamu

Report

Going Beyond Nouns With Vision & Language Models Using Synthetic Data

Autor: Cascante-Bonilla, Paola, Shehada, Khaled, Smith, James Seale, Doveh, Sivan, Kim, Donghyun, Panda, Rameswar, Varol, Gül, Oliva, Aude, Ordonez, Vicente, Feris, Rogerio, Karlinsky, Leonid

Large-scale pre-trained Vision & Language (VL) models have shown remarkable performance in many applications, enabling replacing a fixed set of supported classes with zero-shot open vocabulary reasoning over (almost arbitrary) natural language prompt

Externí odkaz: http://arxiv.org/abs/2303.17590

Zobrazit plný text záznamu

Report

Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines

Autor: Fosco, Camilo, Josephs, Emilie, Andonian, Alex, Lee, Allen, Wang, Xi, Oliva, Aude

Deepfakes pose a serious threat to digital well-being by fueling misinformation. As deepfakes get harder to recognize with the naked eye, human users become increasingly reliant on deepfake detection models to decide if a video is real or fake. Curre

Externí odkaz: http://arxiv.org/abs/2206.00535

Zobrazit plný text záznamu

Report

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Autor: Grauman, Kristen, Westbury, Andrew, Byrne, Eugene, Chavis, Zachary, Furnari, Antonino, Girdhar, Rohit, Hamburger, Jackson, Jiang, Hao, Liu, Miao, Liu, Xingyu, Martin, Miguel, Nagarajan, Tushar, Radosavovic, Ilija, Ramakrishnan, Santhosh Kumar, Ryan, Fiona, Sharma, Jayant, Wray, Michael, Xu, Mengmeng, Xu, Eric Zhongcong, Zhao, Chen, Bansal, Siddhant, Batra, Dhruv, Cartillier, Vincent, Crane, Sean, Do, Tien, Doulaty, Morrie, Erapalli, Akshay, Feichtenhofer, Christoph, Fragomeni, Adriano, Fu, Qichen, Gebreselasie, Abrham, Gonzalez, Cristina, Hillis, James, Huang, Xuhua, Huang, Yifei, Jia, Wenqi, Khoo, Weslie, Kolar, Jachym, Kottur, Satwik, Kumar, Anurag, Landini, Federico, Li, Chao, Li, Yanghao, Li, Zhenqiang, Mangalam, Karttikeya, Modhugu, Raghava, Munro, Jonathan, Murrell, Tullie, Nishiyasu, Takumi, Price, Will, Puentes, Paola Ruiz, Ramazanova, Merey, Sari, Leda, Somasundaram, Kiran, Southerland, Audrey, Sugano, Yusuke, Tao, Ruijie, Vo, Minh, Wang, Yuchen, Wu, Xindi, Yagi, Takuma, Zhao, Ziwei, Zhu, Yunyi, Arbelaez, Pablo, Crandall, David, Damen, Dima, Farinella, Giovanni Maria, Fuegen, Christian, Ghanem, Bernard, Ithapu, Vamsi Krishna, Jawahar, C. V., Joo, Hanbyul, Kitani, Kris, Li, Haizhou, Newcombe, Richard, Oliva, Aude, Park, Hyun Soo, Rehg, James M., Sato, Yoichi, Shi, Jianbo, Shou, Mike Zheng, Torralba, Antonio, Torresani, Lorenzo, Yan, Mingfei, Malik, Jitendra

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers f

Externí odkaz: http://arxiv.org/abs/2110.07058

Zobrazit plný text záznamu

Report

Dynamic Network Quantization for Efficient Video Inference

Autor: Sun, Ximeng, Panda, Rameswar, Chen, Chun-Fu, Oliva, Aude, Feris, Rogerio, Saenko, Kate

Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition. Motivated by the eff

Externí odkaz: http://arxiv.org/abs/2108.10394

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání