Search results

Report

Debiased Novel Category Discovering and Localization

Authors: Feng, Juexiao; Yang, Yuhong; Xie, Yanchun; Li, Yaqian; Guo, Yandong; Guo, Yuchen; He, Yuwei; Xiang, Liuyu; Ding, Guiguang

In recent years, object detection in deep learning has experienced rapid development. However, most existing object detection models perform well only on closed-set datasets, ignoring a large number of potential objects whose categories are not defin…

External link: http://arxiv.org/abs/2402.18821

Report

A Survey for Foundation Models in Autonomous Driving

Authors: Gao, Haoxiang; Wang, Zhongruo; Li, Yaqian; Long, Kaiwen; Yang, Ming; Shen, Yiqing

The advent of foundation models has revolutionized the fields of natural language processing and computer vision, paving the way for their application in autonomous driving (AD). This survey presents a comprehensive review of more than 40 research pa…

External link: http://arxiv.org/abs/2402.01105

Report

u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model

Authors: Xu, Jinjin; Xu, Liwu; Yang, Yuzhe; Li, Xiang; Wang, Fanyi; Xie, Yanchun; Huang, Yi-Jie; Li, Yaqian

Recent advancements in multi-modal large language models (MLLMs) have led to substantial improvements in visual understanding, primarily driven by sophisticated modality alignment strategies. However, predominant approaches prioritize global or regio…

External link: http://arxiv.org/abs/2311.05348

Report

Open-Set Image Tagging with Multi-Grained Text Supervision

Authors: Huang, Xinyu; Huang, Yi-Jie; Zhang, Youcai; Tian, Weiwei; Feng, Rui; Zhang, Yuejie; Xie, Yanchun; Li, Yaqian; Zhang, Lei

In this paper, we introduce the Recognize Anything Plus Model (RAM++), an open-set image tagging model effectively leveraging multi-grained text supervision. Previous approaches (e.g., CLIP) primarily utilize global text supervision paired with image…

External link: http://arxiv.org/abs/2310.15200

Report

Prototype Fission: Closing Set for Robust Open-set Semi-supervised Learning

Authors: Tan, Xuwei; Huang, Yi-Jie; Li, Yaqian

Semi-supervised Learning (SSL) has been proven vulnerable to out-of-distribution (OOD) samples in realistic large-scale unsupervised datasets due to over-confident pseudo-labeling OODs as in-distribution (ID). A key underlying problem is class-wise l…

External link: http://arxiv.org/abs/2308.15575

Report

CLIP Brings Better Features to Visual Aesthetics Learners

Authors: Xu, Liwu; Xu, Jinjin; Yang, Yuzhe; Huang, Yijie; Xie, Yanchun; Li, Yaqian

The success of pre-training approaches on a variety of downstream tasks has revitalized the field of computer vision. Image aesthetics assessment (IAA) is one of the ideal application scenarios for such methods due to subjective and expensive labelin…

External link: http://arxiv.org/abs/2307.15640

Report

Recognize Anything: A Strong Image Tagging Model

Authors: Zhang, Youcai; Huang, Xinyu; Ma, Jinyu; Li, Zhaoyang; Luo, Zhaochuan; Xie, Yanchun; Qin, Yuzhuo; Luo, Tong; Li, Yaqian; Liu, Shilong; Guo, Yandong; Zhang, Lei

We present the Recognize Anything Model (RAM): a strong foundation model for image tagging. RAM makes a substantial step for large models in computer vision, demonstrating the zero-shot ability to recognize any common category with high accuracy. RAM…

External link: http://arxiv.org/abs/2306.03514

Report

Box-Level Active Detection

Authors: Lyu, Mengyao; Zhou, Jundong; Chen, Hui; Huang, Yijie; Yu, Dongdong; Li, Yaqian; Guo, Yandong; Guo, Yuchen; Xiang, Liuyu; Ding, Guiguang

Active learning selects informative samples for annotation within budget, which has proven efficient recently on object detection. However, the widely used active detection benchmarks conduct image-level evaluation, which is unrealistic in human work…

External link: http://arxiv.org/abs/2303.13089

Report

Knowledge Distillation from Single to Multi Labels: an Empirical Study

Authors: Zhang, Youcai; Qin, Yuzhuo; Liu, Hengwei; Zhang, Yanhao; Li, Yaqian; Gu, Xiaodong

Knowledge distillation (KD) has been extensively studied in single-label image classification. However, its efficacy for multi-label classification remains relatively unexplored. In this study, we firstly investigate the effectiveness of classical KD…

External link: http://arxiv.org/abs/2303.08360

Report

Tag2Text: Guiding Vision-Language Model via Image Tagging

Authors: Huang, Xinyu; Zhang, Youcai; Ma, Jinyu; Tian, Weiwei; Feng, Rui; Zhang, Yuejie; Li, Yaqian; Guo, Yandong; Zhang, Lei

This paper presents Tag2Text, a vision language pre-training (VLP) framework, which introduces image tagging into vision-language models to guide the learning of visual-linguistic features. In contrast to prior works which utilize object tags either…

External link: http://arxiv.org/abs/2303.05657
