Showing 1 - 10 of 79 for search: '"Song, Zikai"'
Multi-Object Tracking (MOT) aims to associate multiple objects across video frames and is a challenging vision task due to inherent complexities in the tracking environment. Most existing approaches train and track within a single domain, resulting i
External link:
http://arxiv.org/abs/2410.23907
Point tracking is a challenging task in computer vision, aiming to establish point-wise correspondence across long video sequences. Recent advancements have primarily focused on temporal modeling techniques to improve local feature similarity, often
External link:
http://arxiv.org/abs/2407.20730
The essence of multi-modal fusion lies in exploiting the complementary information inherent in diverse modalities. However, prevalent fusion methods rely on traditional neural architectures and are inadequately equipped to capture the dynamics of int
External link:
http://arxiv.org/abs/2405.18014
Authors:
Luo, Run, Li, Yunshui, Chen, Longze, He, Wanwei, Lin, Ting-En, Liu, Ziqiang, Zhang, Lei, Song, Zikai, Xia, Xiaobo, Liu, Tongliang, Yang, Min, Hui, Binyuan
The development of large language models (LLMs) has significantly advanced the emergence of large multimodal models (LMMs). While LMMs have achieved tremendous success by promoting the synergy between multimodal comprehension and creation, they often
External link:
http://arxiv.org/abs/2405.15232
In the domain of 3D scene representation, 3D Gaussian Splatting (3DGS) has emerged as a pivotal technology. However, its application to large-scale, high-resolution scenes (exceeding 4k×4k pixels) is hindered by the excessive computational req
External link:
http://arxiv.org/abs/2404.12777
Generating realistic human motion sequences from text descriptions is a challenging task that requires capturing the rich expressiveness of both natural language and human motion. Recent advances in diffusion models have enabled significant progress i
External link:
http://arxiv.org/abs/2312.12763
Generating multi-view images from a single input view using image-conditioned diffusion models is a recent advancement and has shown considerable potential. However, issues such as the lack of consistency in synthesized views and over-smoothing in ex
External link:
http://arxiv.org/abs/2312.06198
Authors:
Ye, YuTeng, Cai, Jiale, Zhou, Hang, Li, Guanwen, Zhang, Youjia, Song, Zikai, Gao, Chenxing, Yu, Junqing, Yang, Wei
In spite of the rapidly evolving landscape of text-to-image generation, the synthesis and manipulation of multiple entities while adhering to specific relational constraints pose enduring challenges. This paper introduces an innovative progressive sy
External link:
http://arxiv.org/abs/2309.09466
Multi-object tracking (MOT) is a challenging vision task that aims to detect individual objects within a single frame and associate them across multiple frames. Recent MOT approaches can be categorized into two-stage tracking-by-detection (TBD) metho
External link:
http://arxiv.org/abs/2308.09905
The Transformer framework has shown superior performance in visual object tracking owing to its strength in aggregating information across the template and search images through the well-known attention mechanism. Most recent advances focus on explo
External link:
http://arxiv.org/abs/2301.10938