Showing 1 - 10 of 57 results for search: "Li, Siyuan"
Authors:
Li, Siyuan; Ke, Lei; Danelljan, Martin; Piccinelli, Luigi; Segu, Mattia; Van Gool, Luc; Yu, Fisher
The robust association of the same objects across video frames in complex scenes is crucial for many applications, especially Multiple Object Tracking (MOT). Current methods predominantly rely on labeled domain-specific video datasets, which limits t…
External link:
http://arxiv.org/abs/2406.04221
Authors:
Tan, Cheng; Wei, Jingxuan; Sun, Linzhuang; Gao, Zhangyang; Li, Siyuan; Yu, Bihui; Guo, Ruifeng; Li, Stan Z.
Large language models equipped with retrieval-augmented generation (RAG) represent a burgeoning field aimed at enhancing answering capabilities by leveraging external knowledge bases. Although the application of RAG with language-only models has been…
External link:
http://arxiv.org/abs/2405.20834
Authors:
Piccinelli, Luigi; Yang, Yung-Hsu; Sakaridis, Christos; Segu, Mattia; Li, Siyuan; Van Gool, Luc; Yu, Fisher
Accurate monocular metric depth estimation (MMDE) is crucial to solving downstream tasks in 3D perception and modeling. However, the remarkable accuracy of recent MMDE methods is confined to their training domains. These methods fail to generalize to…
External link:
http://arxiv.org/abs/2403.18913
Authors:
Xu, Chao; Liu, Yang; Xing, Jiazheng; Wang, Weida; Sun, Mingze; Dan, Jun; Huang, Tianxin; Li, Siyuan; Cheng, Zhi-Qi; Tai, Ying; Sun, Baigui
In this paper, we abstract the process of people hearing speech, extracting meaningful cues, and creating various dynamically audio-consistent talking faces, termed Listening and Imagining, into the task of high-fidelity diverse talking faces generat…
External link:
http://arxiv.org/abs/2403.01901
Authors:
Li, Siyuan; Liu, Zicheng; Tian, Juanxi; Wang, Ge; Wang, Zedong; Jin, Weiyang; Wu, Di; Tan, Cheng; Lin, Tao; Liu, Yang; Sun, Baigui; Li, Stan Z.
Exponential Moving Average (EMA) is a widely used weight averaging (WA) regularization to learn flat optima for better generalization without extra cost in deep neural network (DNN) optimization. Despite achieving better flatness, existing WA method…
External link:
http://arxiv.org/abs/2402.09240
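The entry above describes Exponential Moving Average (EMA) as a weight averaging regularizer. As a minimal sketch of generic EMA weight averaging only (not the method proposed in that paper), and assuming parameters are stored as a plain dict of floats with a hypothetical helper `ema_update`, the running average follows ema ← decay · ema + (1 − decay) · current after each optimizer step:

```python
from typing import Dict


def ema_update(
    ema_params: Dict[str, float],
    params: Dict[str, float],
    decay: float = 0.999,  # illustrative default, an assumption, not from the paper
) -> Dict[str, float]:
    """Hypothetical helper: blend current parameters into their running EMA.

    ema <- decay * ema + (1 - decay) * current, applied per parameter name.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    return {
        name: decay * ema_params[name] + (1.0 - decay) * value
        for name, value in params.items()
    }


if __name__ == "__main__":
    # Toy run: the "trained" weight stays at 1.0 while its EMA catches up.
    ema = {"w": 0.0}
    for _ in range(3):
        ema = ema_update(ema, {"w": 1.0}, decay=0.5)
    print(ema["w"])  # 0.875
```

In a real network the same update would be applied tensor-wise to a separate copy of the weights; a decay close to 1 makes the averaged weights change slowly, which is what gives the smoothing effect.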
Authors:
Li, Siyuan; Zhang, Luyuan; Wang, Zedong; Wu, Di; Wu, Lirong; Liu, Zicheng; Xia, Jun; Tan, Cheng; Liu, Yang; Sun, Baigui; Li, Stan Z.
As the deep learning revolution marches on, self-supervised learning has garnered increasing attention in recent years thanks to its remarkable representation learning ability and the low dependence on labeled data. Among these varied self-supervised…
External link:
http://arxiv.org/abs/2401.00897
Spatio-temporal predictive learning plays a crucial role in self-supervised learning, with wide-ranging applications across a diverse range of fields. Previous approaches for temporal modeling fall into two categories: recurrent-based and recurrent-f…
External link:
http://arxiv.org/abs/2310.05829
Authors:
Ye, Mingqiao; Ke, Lei; Li, Siyuan; Tai, Yu-Wing; Tang, Chi-Keung; Danelljan, Martin; Yu, Fisher
Object localization in general environments is a fundamental part of vision systems. While dominating on the COCO benchmark, recent Transformer-based detection methods are not competitive in diverse domains. Moreover, these methods still struggle to…
External link:
http://arxiv.org/abs/2307.11035
Authors:
Tan, Cheng; Li, Siyuan; Gao, Zhangyang; Guan, Wenfei; Wang, Zedong; Liu, Zicheng; Wu, Lirong; Li, Stan Z.
Spatio-temporal predictive learning is a learning paradigm that enables models to learn spatial and temporal patterns by predicting future frames from given past frames in an unsupervised manner. Despite remarkable progress in recent years, a lack of…
External link:
http://arxiv.org/abs/2306.11249
The ability to recognize, localize and track dynamic objects in a scene is fundamental to many real-world applications, such as self-driving and robotic systems. Yet, traditional multiple object tracking (MOT) benchmarks rely only on a few object cat…
External link:
http://arxiv.org/abs/2304.08408