Showing 1 - 10 of 23 for search: '"Li, Siyuan"'
Author:
Tan, Cheng, Li, Siyuan, Gao, Zhangyang, Guan, Wenfei, Wang, Zedong, Liu, Zicheng, Wu, Lirong, Li, Stan Z.
Spatio-temporal predictive learning is a learning paradigm that enables models to learn spatial and temporal patterns by predicting future frames from given past frames in an unsupervised manner. Despite remarkable progress in recent years, a lack of
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::c4576b650051db5dcf72c7520d991eab
http://arxiv.org/abs/2306.11249
The ability to recognize, localize and track dynamic objects in a scene is fundamental to many real-world applications, such as self-driving and robotic systems. Yet, traditional multiple object tracking (MOT) benchmarks rely only on a few object cat
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::97edc625a5a1c4d0a9448d9ad04cb31e
Author:
Ye, Mingqiao, Ke, Lei, Li, Siyuan, Tai, Yu-Wing, Tang, Chi-Keung, Danelljan, Martin, Yu, Fisher
Object localization in general environments is a fundamental part of vision systems. While dominating on the COCO benchmark, recent Transformer-based detection methods are not competitive in diverse domains. Moreover, these methods still struggle to
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::1d48b8490c6785dbe9d4f1bac03c4078
Author:
Zheng, Jiangbin, Wang, Yile, Tan, Cheng, Li, Siyuan, Wang, Ge, Xia, Jun, Chen, Yidong, Li, Stan Z.
Sign language recognition (SLR) is a weakly supervised task that annotates sign videos as textual glosses. Recent studies show that insufficient training caused by the lack of large-scale available sign datasets becomes the main bottleneck for SLR. M
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::cbd8f3595fe22f2460b1ee7dfcac787a
CLIP-ReID: Exploiting Vision-Language Model for Image Re-Identification without Concrete Text Labels
Pre-trained vision-language models like CLIP have recently shown superior performances on various downstream tasks, including image classification and segmentation. However, in fine-grained image re-identification (ReID), the labels are indexes, lack
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b7a90c58f17f229f6a7a1d7904ca1f7a
http://arxiv.org/abs/2211.13977
We investigate a practical domain adaptation task, called source-free domain adaptation (SFUDA), where the source-pretrained model is adapted to the target domain without access to the source data. Existing techniques mainly leverage self-supervised
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b9250d980f37e884e313dffc375cea12
http://arxiv.org/abs/2211.06612
With the remarkable progress of deep neural networks in computer vision, data mixing augmentation techniques are widely studied to alleviate problems of degraded generalization when the amount of training data is limited. However, mixup strategies ha
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e9da604d1c3b8e235ac20a5ebb82f2e4
http://arxiv.org/abs/2209.04851
Current multi-category Multiple Object Tracking (MOT) metrics use class labels to group tracking results for per-class evaluation. Similarly, MOT methods typically only associate objects with the same class predictions. These two prevalent strategies
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a92c39aee0b58e68ab610a864664f6ab
http://arxiv.org/abs/2207.12978
Published in:
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Finding relevant moments and highlights in videos according to natural language queries is a natural and highly valuable common need in the current video content explosion era. Nevertheless, jointly conducting moment retrieval and highlight detection
Masked image modeling, an emerging self-supervised pre-training method, has shown impressive success across numerous downstream vision tasks with Vision transformers. Its underlying idea is simple: a portion of the input image is masked out and then
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9e53604f14da86fbe9165f1332965390
http://arxiv.org/abs/2205.13943