Search results

Report

Matching Anything by Segmenting Anything

Authors: Li, Siyuan; Ke, Lei; Danelljan, Martin; Piccinelli, Luigi; Segu, Mattia; Van Gool, Luc; Yu, Fisher

The robust association of the same objects across video frames in complex scenes is crucial for many applications, especially Multiple Object Tracking (MOT). Current methods predominantly rely on labeled domain-specific video datasets, which limits t

External link: http://arxiv.org/abs/2406.04221

Report

Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning

Authors: Tan, Cheng; Wei, Jingxuan; Sun, Linzhuang; Gao, Zhangyang; Li, Siyuan; Yu, Bihui; Guo, Ruifeng; Li, Stan Z.

Large language models equipped with retrieval-augmented generation (RAG) represent a burgeoning field aimed at enhancing answering capabilities by leveraging external knowledge bases. Although the application of RAG with language-only models has been

External link: http://arxiv.org/abs/2405.20834

Report

UniDepth: Universal Monocular Metric Depth Estimation

Authors: Piccinelli, Luigi; Yang, Yung-Hsu; Sakaridis, Christos; Segu, Mattia; Li, Siyuan; Van Gool, Luc; Yu, Fisher

Accurate monocular metric depth estimation (MMDE) is crucial to solving downstream tasks in 3D perception and modeling. However, the remarkable accuracy of recent MMDE methods is confined to their training domains. These methods fail to generalize to

External link: http://arxiv.org/abs/2403.18913

Report

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio

Authors: Xu, Chao; Liu, Yang; Xing, Jiazheng; Wang, Weida; Sun, Mingze; Dan, Jun; Huang, Tianxin; Li, Siyuan; Cheng, Zhi-Qi; Tai, Ying; Sun, Baigui

In this paper, we abstract the process of people hearing speech, extracting meaningful cues, and creating various dynamically audio-consistent talking faces, termed Listening and Imagining, into the task of high-fidelity diverse talking faces generat

External link: http://arxiv.org/abs/2403.01901

Report

Switch EMA: A Free Lunch for Better Flatness and Sharpness

Authors: Li, Siyuan; Liu, Zicheng; Tian, Juanxi; Wang, Ge; Wang, Zedong; Jin, Weiyang; Wu, Di; Tan, Cheng; Lin, Tao; Liu, Yang; Sun, Baigui; Li, Stan Z.

Exponential Moving Average (EMA) is a widely used weight averaging (WA) regularization to learn flat optima for better generalizations without extra cost in deep neural network (DNN) optimization. Despite achieving better flatness, existing WA method

External link: http://arxiv.org/abs/2402.09240

Report

Masked Modeling for Self-supervised Representation Learning on Vision and Beyond

Authors: Li, Siyuan; Zhang, Luyuan; Wang, Zedong; Wu, Di; Wu, Lirong; Liu, Zicheng; Xia, Jun; Tan, Cheng; Liu, Yang; Sun, Baigui; Li, Stan Z.

As the deep learning revolution marches on, self-supervised learning has garnered increasing attention in recent years thanks to its remarkable representation learning ability and the low dependence on labeled data. Among these varied self-supervised

External link: http://arxiv.org/abs/2401.00897

Report

Revisiting the Temporal Modeling in Spatio-Temporal Predictive Learning under A Unified View

Authors: Tan, Cheng; Wang, Jue; Gao, Zhangyang; Li, Siyuan; Wu, Lirong; Xia, Jun; Li, Stan Z.

Spatio-temporal predictive learning plays a crucial role in self-supervised learning, with wide-ranging applications across a diverse range of fields. Previous approaches for temporal modeling fall into two categories: recurrent-based and recurrent-f

External link: http://arxiv.org/abs/2310.05829

Report

Cascade-DETR: Delving into High-Quality Universal Object Detection

Authors: Ye, Mingqiao; Ke, Lei; Li, Siyuan; Tai, Yu-Wing; Tang, Chi-Keung; Danelljan, Martin; Yu, Fisher

Object localization in general environments is a fundamental part of vision systems. While dominating on the COCO benchmark, recent Transformer-based detection methods are not competitive in diverse domains. Moreover, these methods still struggle to

External link: http://arxiv.org/abs/2307.11035

Report

OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning

Authors: Tan, Cheng; Li, Siyuan; Gao, Zhangyang; Guan, Wenfei; Wang, Zedong; Liu, Zicheng; Wu, Lirong; Li, Stan Z.

Spatio-temporal predictive learning is a learning paradigm that enables models to learn spatial and temporal patterns by predicting future frames from given past frames in an unsupervised manner. Despite remarkable progress in recent years, a lack of

External link: http://arxiv.org/abs/2306.11249

Report

OVTrack: Open-Vocabulary Multiple Object Tracking

Authors: Li, Siyuan; Fischer, Tobias; Ke, Lei; Ding, Henghui; Danelljan, Martin; Yu, Fisher

The ability to recognize, localize and track dynamic objects in a scene is fundamental to many real-world applications, such as self-driving and robotic systems. Yet, traditional multiple object tracking (MOT) benchmarks rely only on a few object cat

External link: http://arxiv.org/abs/2304.08408
