Showing 1 - 10 of 29 for search: "Dang, Ronghao"
Authors:
Li, Long; Xu, Weiwen; Guo, Jiayan; Zhao, Ruochen; Li, Xinxuan; Yuan, Yuqian; Zhang, Boqiang; Jiang, Yuming; Xin, Yifei; Dang, Ronghao; Zhao, Deli; Rong, Yu; Feng, Tian; Bing, Lidong
Effective research ideation is a critical step for scientific research. However, the exponential increase in scientific literature makes it challenging for researchers to stay current with recent advances and identify meaningful research directions.
External link:
http://arxiv.org/abs/2410.13185
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Authors:
Zhu, Minghao; Wang, Zhengpu; Hu, Mengxian; Dang, Ronghao; Lin, Xiao; Zhou, Xun; Liu, Chengju; Chen, Qijun
Transferring visual-language knowledge from large-scale foundation models for video recognition has proved to be effective. To bridge the domain gap, additional parametric modules are added to capture the temporal information. However, zero-shot gene
External link:
http://arxiv.org/abs/2410.10589
In the pursuit of robust and generalizable environment perception and language understanding, the ubiquitous challenge of dataset bias continues to plague vision-and-language navigation (VLN) agents, hindering their performance in unseen environments
External link:
http://arxiv.org/abs/2404.10241
Vision-and-Language Navigation (VLN) has gained significant research interest in recent years due to its potential applications in real-world scenarios. However, existing VLN methods struggle with the issue of spurious associations, resulting in poor
External link:
http://arxiv.org/abs/2403.03405
Authors:
Lin, Xiao; Zhu, Minghao; Dang, Ronghao; Zhou, Guangliang; Shu, Shaolong; Lin, Feng; Liu, Chengju; Chen, Qijun
Most existing category-level object pose estimation methods are devoted to learning the object category information from the point cloud modality. However, the scale of 3D datasets is limited due to the high cost of 3D data collection and annotation. Conse
External link:
http://arxiv.org/abs/2402.15726
Authors:
Dang, Ronghao; Feng, Jiangyan; Zhang, Haodong; Ge, Chongjian; Song, Lin; Gong, Lijun; Liu, Chengju; Chen, Qijun; Zhu, Feng; Zhao, Rui; Song, Yibing
We propose InstructDET, a data-centric method for referring object detection (ROD) that localizes target objects based on user instructions. While deriving from referring expressions (REC), the instructions we leverage are greatly diversified to enco
External link:
http://arxiv.org/abs/2310.05136
As the most essential property in a video, motion information is critical to a robust and generalized video representation. To inject motion dynamics, recent works have adopted frame difference as the source of motion information in video contrastive
External link:
http://arxiv.org/abs/2309.00297
Authors:
Wang, Liuyi; He, Zongtao; Tang, Jiagui; Dang, Ronghao; Wang, Naijia; Liu, Chengju; Chen, Qijun
Published in:
International Joint Conferences on Artificial Intelligence Organization 2023
Vision-and-Language Navigation (VLN) is a realistic but challenging task that requires an agent to locate the target region using verbal and visual cues. While significant advancements have been achieved recently, there are still two broad limitation
External link:
http://arxiv.org/abs/2305.03602
We propose a meta-ability decoupling (MAD) paradigm, which brings together various object navigation methods in an architecture system, allowing them to mutually enhance each other and evolve together. Based on the MAD paradigm, we design a multiple
External link:
http://arxiv.org/abs/2302.01520
"Search for" or "Navigate to"? When finding an object, the two choices always come up in our subconscious mind. Before seeing the target, we search for the target based on experience. After seeing the target, we remember the target location and navig
External link:
http://arxiv.org/abs/2208.00553