Showing 1 - 10 of 314 for search: '"Tang Yunlong"'
Text-guided diffusion models have revolutionized generative tasks by producing high-fidelity content from text descriptions. They have also enabled an editing paradigm where concepts can be replaced through text conditioning (e.g., a dog to a tiger).
External link:
http://arxiv.org/abs/2410.24151
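To make the editing paradigm concrete, here is a minimal sketch of prompt-driven concept replacement using the Hugging Face diffusers img2img pipeline; the checkpoint name, strength, and file names are illustrative, and this is a generic example rather than the paper's own method.

```python
# Prompt-driven concept replacement (e.g., dog -> tiger) via img2img.
# Illustrative sketch only; checkpoint and parameters are assumptions.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("dog.png").convert("RGB").resize((512, 512))
# `strength` controls how far the edit may deviate from the source image.
edited = pipe(prompt="a photo of a tiger", image=init_image,
              strength=0.6, guidance_scale=7.5).images[0]
edited.save("tiger.png")
```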
Author:
Wang, Tao, Zou, Mingjie, Zhang, Dehe, Ku, Yu-Chieh, Zheng, Yawen, Pan, Shen, Ren, Zhongqi, Xu, Zedong, Huang, Haoliang, Luo, Wei, Tang, Yunlong, Chen, Lang, Liu, Cheng-En, Chang, Chun-Fu, Das, Sujit, Bellaiche, Laurent, Yang, Yurong, Ma, Xiuliang, Kuo, Chang-Yang, Liu, Xingjun, Chen, Zuhuang
Published in:
Matter 8, 1-11, 2025
Efforts to combine the advantages of multiple systems to enhance functionalities through solid-solution design present a great challenge due to the constraint imposed by the classical Vegard's law. Here, we successfully navigate this trade-off by leveraging …
External link:
http://arxiv.org/abs/2410.12252
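For context, Vegard's law is the empirical rule referenced in the abstract: the lattice parameter of a solid solution interpolates linearly between its end members.

```latex
% Vegard's law: lattice parameter a of the solid solution A_{1-x}B_x
% as a linear mix of the end-member lattice parameters a_A and a_B.
a_{A_{1-x}B_{x}} = (1 - x)\, a_{A} + x\, a_{B}
```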
Author:
Hua, Hang, Tang, Yunlong, Zeng, Ziyun, Cao, Liangliang, Yang, Zhengyuan, He, Hangfeng, Xu, Chenliang, Luo, Jiebo
The advent of large Vision-Language Models (VLMs) has significantly advanced multimodal understanding, enabling more sophisticated and accurate integration of visual and textual information across various tasks, including image and video captioning, …
External link:
http://arxiv.org/abs/2410.09733
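As a concrete instance of one task the abstract names, here is a minimal image-captioning sketch with a pre-trained vision-language model (BLIP via Hugging Face transformers); the model choice and file name are illustrative, not the paper's own system.

```python
# Image captioning with a pre-trained VLM; illustrative sketch only.
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
caption_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(caption_ids[0], skip_special_tokens=True))
```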
The rapid evolution of egocentric video analysis brings new insights into understanding human activities and intentions from a first-person perspective. Despite this progress, the fragmentation in tasks like action recognition, procedure learning, and …
External link:
http://arxiv.org/abs/2409.17523
Author:
Moskalenko, Andrey, Bryncev, Alexey, Vatolin, Dmitry, Timofte, Radu, Zhan, Gen, Yang, Li, Tang, Yunlong, Liao, Yiting, Lin, Jiongzhi, Huang, Baitao, Moradi, Morteza, Moradi, Mohammad, Rundo, Francesco, Spampinato, Concetto, Borji, Ali, Palazzo, Simone, Zhu, Yuxin, Sun, Yinan, Duan, Huiyu, Cao, Yuqin, Jia, Ziheng, Hu, Qiang, Min, Xiongkuo, Zhai, Guangtao, Fang, Hao, Cong, Runmin, Lu, Xiankai, Zhou, Xiaofei, Zhang, Wei, Zhao, Chunyu, Mu, Wentao, Deng, Tao, Tavakoli, Hamed R.
This paper reviews the Challenge on Video Saliency Prediction at AIM 2024. The goal of the participants was to develop a method for predicting accurate saliency maps for the provided set of video sequences. Saliency maps are widely exploited in various …
External link:
http://arxiv.org/abs/2409.14827
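Predicted saliency maps in such challenges are typically scored against ground truth with metrics like the linear correlation coefficient (CC). Below is a minimal sketch of that metric; it is illustrative, not the challenge's official scoring code.

```python
# Linear correlation coefficient (CC) between two saliency maps.
import numpy as np

def saliency_cc(pred: np.ndarray, gt: np.ndarray) -> float:
    """Pearson correlation between equal-shape saliency maps."""
    pred = (pred - pred.mean()) / (pred.std() + 1e-8)
    gt = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((pred * gt).mean())

# Usage: maps are HxW arrays, e.g. per-frame predictions for a video.
pred = np.random.rand(224, 224)
gt = np.random.rand(224, 224)
print(f"CC = {saliency_cc(pred, gt):.3f}")
```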
Video saliency prediction aims to identify the regions in a video that attract human attention and gaze, driven by bottom-up features from the video and top-down processes like memory and cognition. Among these top-down influences, language plays a c…
External link:
http://arxiv.org/abs/2408.12009
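The ground-truth "attention and gaze" signal in this line of work is commonly built by blurring recorded fixation points into a continuous map. A minimal sketch, with resolution and Gaussian width as assumed illustrative values:

```python
# Build a ground-truth saliency map from gaze fixation points by
# Gaussian-blurring a binary fixation map; sigma is illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_saliency(points, h=224, w=224, sigma=12.0):
    fixation_map = np.zeros((h, w), dtype=np.float32)
    for y, x in points:  # gaze coordinates in pixels
        fixation_map[int(y), int(x)] = 1.0
    sal = gaussian_filter(fixation_map, sigma=sigma)
    return sal / (sal.max() + 1e-8)  # normalize to [0, 1]

sal = fixations_to_saliency([(50, 60), (120, 180)])
print(sal.shape, sal.max())
```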
Large Vision-Language Models (LVLMs) excel in integrating visual and linguistic contexts to produce detailed content, facilitating applications such as image captioning. However, using LVLMs to generate descriptions often faces the challenge of object …
External link:
http://arxiv.org/abs/2406.12663
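Object hallucination in captions is often quantified CHAIR-style: count objects a caption mentions that are absent from the image's ground-truth object list. A simplified sketch (the vocabulary and word matching are deliberately naive, for illustration only):

```python
# CHAIR-style hallucination check: caption objects not in the ground truth.
def hallucinated_objects(caption: str, gt_objects: set[str],
                         vocabulary: set[str]) -> set[str]:
    mentioned = {w.strip(".,").lower() for w in caption.split()} & vocabulary
    return mentioned - gt_objects

vocab = {"dog", "cat", "frisbee", "car", "tree"}
caption = "A dog catches a frisbee near a car."
gt = {"dog", "frisbee"}
print(hallucinated_objects(caption, gt, vocab))  # {'car'}
```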
Video summarization aims to create short, accurate, and cohesive summaries of longer videos. Despite the existence of various video summarization datasets, a notable limitation is their limited amount of source videos, which hampers the effective training …
External link:
http://arxiv.org/abs/2404.12353
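Many summarization pipelines reduce to scoring segments for importance and selecting a subset under a length budget. Here is a minimal greedy sketch of that selection step (a simple stand-in for the knapsack formulation often used; all numbers are illustrative):

```python
# Greedy score-based segment selection under a duration budget.
def summarize(scores: list[float], lengths: list[float], budget: float) -> list[int]:
    order = sorted(range(len(scores)), key=lambda i: scores[i] / lengths[i], reverse=True)
    chosen, used = [], 0.0
    for i in order:
        if used + lengths[i] <= budget:
            chosen.append(i)
            used += lengths[i]
    return sorted(chosen)

scores = [0.9, 0.2, 0.7, 0.4]   # per-segment importance
lengths = [5.0, 3.0, 4.0, 2.0]  # seconds
print(summarize(scores, lengths, budget=9.0))  # [0, 3]
```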
Source-Free Domain Generalization (SFDG) aims to develop a model that works for unseen target domains without relying on any source domain. Research in SFDG primarily builds upon the existing knowledge of large-scale vision-language models and utilizes …
External link:
http://arxiv.org/abs/2403.16697
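The "existing knowledge of large-scale vision-language models" that SFDG work builds on is the kind of prompt-based zero-shot transfer CLIP provides: classification from text prompts alone, with no source-domain training data. A minimal sketch using OpenAI's clip package; the model variant, classes, and prompts are illustrative:

```python
# CLIP zero-shot classification from text prompts only.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

classes = ["dog", "tiger"]
texts = clip.tokenize([f"a photo of a {c}" for c in classes]).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, texts)
    probs = logits_per_image.softmax(dim=-1)
print(dict(zip(classes, probs[0].tolist())))
```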
Large language models (LLMs) have demonstrated remarkable capabilities in natural language and multimodal domains. By fine-tuning multimodal LLMs with temporal annotations from well-annotated datasets, e.g., dense video captioning datasets, their temporal …
External link:
http://arxiv.org/abs/2403.16276
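To illustrate what "fine-tuning with temporal annotations" looks like in practice, here is a minimal sketch that converts a dense-video-captioning annotation into an instruction-tuning sample with explicit timestamps; the prompt template and field names are hypothetical, not the paper's format.

```python
# Convert dense-captioning events into an instruction-tuning sample.
# Field names and the prompt template are hypothetical.
def to_instruction_sample(video_id: str, events: list[dict]) -> dict:
    lines = [f"{e['start']:.1f}s - {e['end']:.1f}s: {e['caption']}" for e in events]
    return {
        "video": video_id,
        "instruction": "Describe each event in the video with its start and end time.",
        "response": "\n".join(lines),
    }

events = [
    {"start": 0.0, "end": 4.2, "caption": "A person opens the fridge."},
    {"start": 4.2, "end": 9.8, "caption": "They pour milk into a glass."},
]
print(to_instruction_sample("vid_0001", events)["response"])
```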