Showing 1 - 9 of 9 for search: '"Yu, Shoubin"'
Recent advances in diffusion models have significantly enhanced their ability to generate high-quality images and videos, but they have also increased the risk of producing unsafe content. Existing unlearning/editing-based methods for safe generation
External link:
http://arxiv.org/abs/2410.12761
Author:
Wang, Ziyang, Yu, Shoubin, Stengel-Eskin, Elias, Yoon, Jaehong, Cheng, Feng, Bertasius, Gedas, Bansal, Mohit
Long-form video understanding has been a challenging task due to the high redundancy in video data and the abundance of query-irrelevant information. To tackle this challenge, we propose VideoTree, a training-free framework which builds a query-adapt
External link:
http://arxiv.org/abs/2405.19209
Recent video generative models primarily rely on carefully written text prompts for specific tasks, like inpainting or style editing. They require labor-intensive textual descriptions for input videos, hindering their flexibility to adapt personal/ra
External link:
http://arxiv.org/abs/2405.18406
Reasoning in the real world is not divorced from situations. How to capture the present knowledge from surrounding situations and perform reasoning accordingly is crucial and challenging for machine intelligence. This paper introduces a new benchmark
External link:
http://arxiv.org/abs/2405.09711
Despite impressive advancements in recent multimodal reasoning approaches, they are still limited in flexibility and efficiency, as these models typically process only a few fixed modality inputs and require updates to numerous parameters. This paper
External link:
http://arxiv.org/abs/2402.05889
Author:
Zhang, Ce, Lu, Taixi, Islam, Md Mohaiminul, Wang, Ziyang, Yu, Shoubin, Bansal, Mohit, Bertasius, Gedas
We present LLoVi, a language-based framework for long-range video question-answering (LVQA). Unlike prior long-range video understanding methods, which are often costly and require specialized long-range video modeling design (e.g., memory queues, st
External link:
http://arxiv.org/abs/2312.17235
Recent studies have shown promising results on utilizing pre-trained image-language models for video question answering. While these image-language models can efficiently bootstrap the representation learning of video-language models, they typically
External link:
http://arxiv.org/abs/2305.06988
Author:
Yu, Shoubin, Zhao, Zhongyin, Fang, Haoshu, Deng, Andong, Su, Haisheng, Wang, Dongliang, Gan, Weihao, Lu, Cewu, Wu, Wei
Anomaly detection in surveillance videos is challenging and important for ensuring public security. Different from pixel-based anomaly detection methods, pose-based methods utilize highly-structured skeleton data, which decreases the computational bu
External link:
http://arxiv.org/abs/2112.03649
Author:
Yu, Shoubin, Zhao, Zhongyin, Fang, Haoshu, Deng, Andong, Su, Haisheng, Wang, Dongliang, Gan, Weihao, Lu, Cewu, Wu, Wei
Published in:
IEEE Transactions on Circuits and Systems for Video Technology; August 2024, Vol. 34, Issue 8, pp. 6661-6673 (13 pages)