Search results

OVTrack: Open-Vocabulary Multiple Object Tracking

Authors: Li, Siyuan; Fischer, Tobias; Ke, Lei; Ding, Henghui; Danelljan, Martin; Yu, Fisher

The ability to recognize, localize and track dynamic objects in a scene is fundamental to many real-world applications, such as self-driving and robotic systems. Yet, traditional multiple object tracking (MOT) benchmarks rely only on a few object cat

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::97edc625a5a1c4d0a9448d9ad04cb31e

Cascade-DETR: Delving into High-Quality Universal Object Detection

Authors: Ye, Mingqiao; Ke, Lei; Li, Siyuan; Tai, Yu-Wing; Tang, Chi-Keung; Danelljan, Martin; Yu, Fisher

Object localization in general environments is a fundamental part of vision systems. While dominating on the COCO benchmark, recent Transformer-based detection methods are not competitive in diverse domains. Moreover, these methods still struggle to

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::1d48b8490c6785dbe9d4f1bac03c4078

CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment

Authors: Zheng, Jiangbin; Wang, Yile; Tan, Cheng; Li, Siyuan; Wang, Ge; Xia, Jun; Chen, Yidong; Li, Stan Z.

Sign language recognition (SLR) is a weakly supervised task that annotates sign videos as textual glosses. Recent studies show that insufficient training caused by the lack of large-scale available sign datasets becomes the main bottleneck for SLR. M

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::cbd8f3595fe22f2460b1ee7dfcac787a

CLIP-ReID: Exploiting Vision-Language Model for Image Re-Identification without Concrete Text Labels

Authors: Li, Siyuan; Sun, Li; Li, Qingli

Pre-trained vision-language models like CLIP have recently shown superior performances on various downstream tasks, including image classification and segmentation. However, in fine-grained image re-identification (ReID), the labels are indexes, lack

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b7a90c58f17f229f6a7a1d7904ca1f7a
http://arxiv.org/abs/2211.13977

Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning

Authors: Zhang, Ziyi; Chen, Weikai; Cheng, Hui; Li, Zhen; Li, Siyuan; Lin, Liang; Li, Guanbin

We investigate a practical domain adaptation task, called source-free domain adaptation (SFUDA), where the source-pretrained model is adapted to the target domain without access to the source data. Existing techniques mainly leverage self-supervised

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b9250d980f37e884e313dffc375cea12
http://arxiv.org/abs/2211.06612

OpenMixup: Open Mixup Toolbox and Benchmark for Visual Representation Learning

Authors: Li, Siyuan; Wang, Zedong; Liu, Zicheng; Wu, Di; Li, Stan Z.

With the remarkable progress of deep neural networks in computer vision, data mixing augmentation techniques are widely studied to alleviate problems of degraded generalization when the amount of training data is limited. However, mixup strategies ha

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e9da604d1c3b8e235ac20a5ebb82f2e4
http://arxiv.org/abs/2209.04851

Tracking Every Thing in the Wild

Authors: Li, Siyuan; Danelljan, Martin; Ding, Henghui; Huang, Thomas E.; Yu, Fisher

Current multi-category Multiple Object Tracking (MOT) metrics use class labels to group tracking results for per-class evaluation. Similarly, MOT methods typically only associate objects with the same class predictions. These two prevalent strategies

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a92c39aee0b58e68ab610a864664f6ab
http://arxiv.org/abs/2207.12978

UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection

Authors: Liu, Ye; Li, Siyuan; Wu, Yang; Chen, Chang Wen; Shan, Ying; Qie, Xiaohu

Published in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

Finding relevant moments and highlights in videos according to natural language queries is a natural and highly valuable common need in the current video content explosion era. Nevertheless, jointly conducting moment retrieval and highlight detection

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::97f0500b359d347264a1fefa262fa67f
https://doi.org/10.1109/cvpr52688.2022.00305

Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN

Authors: Li, Siyuan; Wu, Di; Wu, Fang; Zang, Zelin; Li, Stan Z.

Masked image modeling, an emerging self-supervised pre-training method, has shown impressive success across numerous downstream vision tasks with Vision transformers. Its underlying idea is simple: a portion of the input image is masked out and then

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9e53604f14da86fbe9165f1332965390
http://arxiv.org/abs/2205.13943
