Showing 1 - 10 of 1,005 for search: '"li, Yaqian"'
Author:
Feng, Juexiao, Yang, Yuhong, Xie, Yanchun, Li, Yaqian, Guo, Yandong, Guo, Yuchen, He, Yuwei, Xiang, Liuyu, Ding, Guiguang
In recent years, object detection in deep learning has experienced rapid development. However, most existing object detection models perform well only on closed-set datasets, ignoring a large number of potential objects whose categories are not defined…
External link:
http://arxiv.org/abs/2402.18821
The advent of foundation models has revolutionized the fields of natural language processing and computer vision, paving the way for their application in autonomous driving (AD). This survey presents a comprehensive review of more than 40 research papers…
External link:
http://arxiv.org/abs/2402.01105
Author:
Xu, Jinjin, Xu, Liwu, Yang, Yuzhe, Li, Xiang, Wang, Fanyi, Xie, Yanchun, Huang, Yi-Jie, Li, Yaqian
Recent advancements in multi-modal large language models (MLLMs) have led to substantial improvements in visual understanding, primarily driven by sophisticated modality alignment strategies. However, predominant approaches prioritize global or regional…
External link:
http://arxiv.org/abs/2311.05348
Author:
Huang, Xinyu, Huang, Yi-Jie, Zhang, Youcai, Tian, Weiwei, Feng, Rui, Zhang, Yuejie, Xie, Yanchun, Li, Yaqian, Zhang, Lei
In this paper, we introduce the Recognize Anything Plus Model (RAM++), an open-set image tagging model effectively leveraging multi-grained text supervision. Previous approaches (e.g., CLIP) primarily utilize global text supervision paired with images…
External link:
http://arxiv.org/abs/2310.15200
Semi-supervised Learning (SSL) has been proven vulnerable to out-of-distribution (OOD) samples in realistic large-scale unsupervised datasets due to over-confident pseudo-labeling OODs as in-distribution (ID). A key underlying problem is class-wise…
External link:
http://arxiv.org/abs/2308.15575
The success of pre-training approaches on a variety of downstream tasks has revitalized the field of computer vision. Image aesthetics assessment (IAA) is one of the ideal application scenarios for such methods due to subjective and expensive labeling…
External link:
http://arxiv.org/abs/2307.15640
Author:
Zhang, Youcai, Huang, Xinyu, Ma, Jinyu, Li, Zhaoyang, Luo, Zhaochuan, Xie, Yanchun, Qin, Yuzhuo, Luo, Tong, Li, Yaqian, Liu, Shilong, Guo, Yandong, Zhang, Lei
We present the Recognize Anything Model (RAM): a strong foundation model for image tagging. RAM makes a substantial step for large models in computer vision, demonstrating the zero-shot ability to recognize any common category with high accuracy. RAM…
External link:
http://arxiv.org/abs/2306.03514
Author:
Lyu, Mengyao, Zhou, Jundong, Chen, Hui, Huang, Yijie, Yu, Dongdong, Li, Yaqian, Guo, Yandong, Guo, Yuchen, Xiang, Liuyu, Ding, Guiguang
Active learning selects informative samples for annotation within budget, which has proven efficient recently on object detection. However, the widely used active detection benchmarks conduct image-level evaluation, which is unrealistic in human workflow…
External link:
http://arxiv.org/abs/2303.13089
Knowledge distillation (KD) has been extensively studied in single-label image classification. However, its efficacy for multi-label classification remains relatively unexplored. In this study, we firstly investigate the effectiveness of classical KD…
External link:
http://arxiv.org/abs/2303.08360
Author:
Huang, Xinyu, Zhang, Youcai, Ma, Jinyu, Tian, Weiwei, Feng, Rui, Zhang, Yuejie, Li, Yaqian, Guo, Yandong, Zhang, Lei
This paper presents Tag2Text, a vision language pre-training (VLP) framework, which introduces image tagging into vision-language models to guide the learning of visual-linguistic features. In contrast to prior works which utilize object tags either…
External link:
http://arxiv.org/abs/2303.05657