Showing 1 - 10 of 45 for search: '"Shi, Hengcan"'
Existing methods for long video understanding primarily focus on videos lasting only tens of seconds, with limited exploration of techniques for handling longer videos. The increased number of frames in longer videos presents two main challenges: dif
External link:
http://arxiv.org/abs/2406.12846
DifFUSER: Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation
Diffusion models have recently gained prominence as powerful deep generative models, demonstrating unmatched performance across various domains. However, their potential in multi-sensor fusion remains largely unexplored. In this work, we introduce Di
External link:
http://arxiv.org/abs/2404.04629
Author:
Le, Duy-Tho, Gou, Chenhui, Datta, Stavya, Shi, Hengcan, Reid, Ian, Cai, Jianfei, Rezatofighi, Hamid
Autonomous robot systems have attracted increasing research attention in recent years, where environment understanding is a crucial step for robot navigation, human-robot interaction, and decision. Real-world robot systems usually collect visual data
External link:
http://arxiv.org/abs/2404.01686
Scene text recognition is an important and challenging task in computer vision. However, most prior works focus on recognizing pre-defined words, while there are various out-of-vocabulary (OOV) words in real-world applications. In this paper, we prop
External link:
http://arxiv.org/abs/2403.07518
In recent years, open-vocabulary (OV) dense visual prediction (such as OV object detection and semantic, instance, and panoptic segmentation) has attracted increasing research attention. However, most existing approaches are task-specific and individ
External link:
http://arxiv.org/abs/2307.08238
New lesion segmentation is essential to estimate the disease progression and therapeutic effects during multiple sclerosis (MS) clinical treatments. However, the expensive data acquisition and expert annotation restrict the feasibility of applying la
External link:
http://arxiv.org/abs/2307.04513
In recent years, open-vocabulary (OV) object detection has attracted increasing research attention. Unlike traditional detection, which only recognizes fixed-category objects, OV detection aims to detect objects in an open category set. Previous work
External link:
http://arxiv.org/abs/2307.03339
Recent mask proposal models have significantly improved the performance of zero-shot semantic segmentation. However, the use of a 'background' embedding during training in these methods is problematic as the resulting model tends to over-learn and as
External link:
http://arxiv.org/abs/2301.07336
Effectively encoding multi-scale contextual information is crucial for accurate semantic segmentation. Existing transformer-based segmentation models combine features across scales without any selection, where features on sub-optimal scales may degra
External link:
http://arxiv.org/abs/2205.07056
Object proposal generation is an important and fundamental task in computer vision. In this paper, we propose ProposalCLIP, a method towards unsupervised open-category object proposal generation. Unlike previous works which require a large number of
External link:
http://arxiv.org/abs/2201.06696