Search results

Report

TsCA: On the Semantic Consistency Alignment via Conditional Transport for Compositional Zero-Shot Learning

Authors: Li, Miaoge; Guo, Jingcai; Da Xu, Richard Yi; Wang, Dongsheng; Cao, Xiaofeng; Guo, Song

Compositional Zero-Shot Learning (CZSL) aims to recognize novel state-object compositions by leveraging the shared knowledge of their primitive components. Despite considerable progress, effectively calibrating the bias between semantically

External link: http://arxiv.org/abs/2408.08703

Report

Instruction Tuning-free Visual Token Complement for Multimodal LLMs

Authors: Wang, Dongsheng; Cui, Jiequan; Li, Miaoge; Lin, Wang; Chen, Bo; Zhang, Hanwang

As the open community of large language models (LLMs) matures, multimodal LLMs (MLLMs) have promised an elegant bridge between vision and language. However, current research is inherently constrained by challenges such as the need for high-quality in

External link: http://arxiv.org/abs/2408.05019

Report

Tuning Multi-mode Token-level Prompt Alignment across Modalities

Authors: Wang, Dongsheng; Li, Miaoge; Liu, Xinyang; Xu, MingSheng; Chen, Bo; Zhang, Hanwang

Advancements in prompt tuning of vision-language models have underscored their potential in enhancing open-world visual concept comprehension. However, prior works only primarily focus on single-mode (only one prompt for each modality) and holistic l

External link: http://arxiv.org/abs/2309.13847

Report

PatchCT: Aligning Patch Set and Label Set with Conditional Transport for Multi-Label Image Classification

Authors: Li, Miaoge; Wang, Dongsheng; Liu, Xinyang; Zeng, Zequn; Lu, Ruiying; Chen, Bo; Zhou, Mingyuan

Multi-label image classification is a prediction task that aims to identify more than one label from a given image. This paper considers the semantic consistency of the latent space between the visual patch and linguistic label domains and introduces

External link: http://arxiv.org/abs/2307.09066

Report

Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models

Authors: Liu, Xinyang; Wang, Dongsheng; Fang, Bowei; Li, Miaoge; Duan, Zhibin; Xu, Yishi; Chen, Bo; Zhou, Mingyuan

For downstream applications of vision-language pre-trained models, there has been significant interest in constructing effective prompts. Existing works on prompt engineering, which either require laborious manual designs or optimize the prompt tunin

External link: http://arxiv.org/abs/2303.09100

Report

Knowledge-Aware Bayesian Deep Topic Model

Authors: Wang, Dongsheng; Xu, Yishi; Li, Miaoge; Duan, Zhibin; Wang, Chaojie; Chen, Bo; Zhou, Mingyuan

We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling. Although embedded topic models (ETMs) and its variants have gained promising performance in text analysis, they mainly focus on mining w

External link: http://arxiv.org/abs/2209.14228

Patch-Token Aligned Bayesian Prompt Learning for Vision-Language Models

Authors: Liu, Xinyang; Wang, Dongsheng; Li, Miaoge; Duan, Zhibin; Xu, Yishi; Chen, Bo; Zhou, Mingyuan

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::400c988f19cadccbccebee66745c50e9
http://arxiv.org/abs/2303.09100
