Showing 1 - 10 of 763 for search: '"He Yuhang"'
Despite significant advancements in Text-to-Audio (TTA) generation models achieving high-fidelity audio with fine-grained context understanding, they struggle to model the relations between audio events described in the input text. However, previous
External link:
http://arxiv.org/abs/2412.15922
Author:
Wan, Cong, Luo, Xiangyang, Cai, Zijian, Song, Yiren, Zhao, Yunlong, Bai, Yifan, He, Yuhang, Gong, Yihong
In this paper, we introduce GRID, a novel paradigm that reframes a broad range of visual generation tasks as the problem of arranging grids, akin to film strips. At its core, GRID transforms temporal sequences into grid layouts, enabling image genera
External link:
http://arxiv.org/abs/2412.10718
Autonomous driving requires a comprehensive understanding of 3D environments to facilitate high-level tasks such as motion prediction, planning, and mapping. In this paper, we introduce DriveMLLM, a benchmark specifically designed to evaluate the spa
External link:
http://arxiv.org/abs/2411.13112
Existing prompt learning methods in Vision-Language Models (VLM) have effectively enhanced the transfer capability of VLM to downstream tasks, but they suffer from a significant decline in generalization due to severe overfitting. To address this iss
External link:
http://arxiv.org/abs/2410.10247
Non-exemplar class Incremental Learning (NECIL) enables models to continuously acquire new classes without retraining from scratch and storing old task exemplars, addressing privacy and storage issues. However, the absence of data from earlier tasks
External link:
http://arxiv.org/abs/2409.14983
Diffusion models have revolutionized customized text-to-image generation, allowing for efficient synthesis of photos from personal data with textual descriptions. However, these advancements bring forth risks including privacy breaches and unauthoriz
External link:
http://arxiv.org/abs/2408.10571
Author:
Lin, Jiayu, Chen, Guanrong, Jin, Bojun, Li, Chenyang, Jia, Shutong, Lin, Wancong, Sun, Yang, He, Yuhang, Yang, Caihua, Bao, Jianzhu, Wu, Jipeng, Su, Wen, Chen, Jinglu, Li, Xinyi, Chen, Tianyu, Han, Mingjie, Du, Shuaiwen, Wang, Zijian, Li, Jiyin, Suo, Fuzhong, Wang, Hao, Lin, Nuanchen, Huang, Xuanjing, Jiang, Changjian, Xu, RuiFeng, Zhang, Long, Cao, Jiuxin, Jin, Ting, Wei, Zhongyu
In this paper we present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affect Computing (CCAC 2023), and introduce the related datasets. We organize two tracks to handle the argumentative generation tasks in different
External link:
http://arxiv.org/abs/2407.14829
The problem of Rehearsal-Free Continual Learning (RFCL) aims to continually learn new knowledge while preventing forgetting of the old knowledge, without storing any old samples and prototypes. The latest methods leverage large-scale pre-trained mode
External link:
http://arxiv.org/abs/2407.10281
This paper introduces the point-axis representation for oriented object detection, emphasizing its flexibility and geometrically intuitive nature with two key components: points and axes. 1) Points delineate the spatial extent and contours of objects
External link:
http://arxiv.org/abs/2407.08489
We present SPEAR, a continuous receiver-to-receiver acoustic neural warping field for spatial acoustic effects prediction in an acoustic 3D space with a single stationary audio source. Unlike traditional source-to-receiver modelling methods that requ
External link:
http://arxiv.org/abs/2406.11006