Showing 1 - 7 of 7 for search: '"Li, Miaoge"'
Compositional Zero-Shot Learning (CZSL) aims to recognize novel state-object compositions by leveraging the shared knowledge of their primitive components. Despite considerable progress, effectively calibrating the bias between semantically
External link:
http://arxiv.org/abs/2408.08703
As the open community of large language models (LLMs) matures, multimodal LLMs (MLLMs) have promised an elegant bridge between vision and language. However, current research is inherently constrained by challenges such as the need for high-quality in
External link:
http://arxiv.org/abs/2408.05019
Advancements in prompt tuning of vision-language models have underscored their potential in enhancing open-world visual concept comprehension. However, prior works primarily focus on single-mode (only one prompt for each modality) and holistic l
External link:
http://arxiv.org/abs/2309.13847
Author:
Li, Miaoge, Wang, Dongsheng, Liu, Xinyang, Zeng, Zequn, Lu, Ruiying, Chen, Bo, Zhou, Mingyuan
Multi-label image classification is a prediction task that aims to identify more than one label from a given image. This paper considers the semantic consistency of the latent space between the visual patch and linguistic label domains and introduces
External link:
http://arxiv.org/abs/2307.09066
Author:
Liu, Xinyang, Wang, Dongsheng, Fang, Bowei, Li, Miaoge, Duan, Zhibin, Xu, Yishi, Chen, Bo, Zhou, Mingyuan
For downstream applications of vision-language pre-trained models, there has been significant interest in constructing effective prompts. Existing works on prompt engineering, which either require laborious manual designs or optimize the prompt tunin
External link:
http://arxiv.org/abs/2303.09100
Author:
Wang, Dongsheng, Xu, Yishi, Li, Miaoge, Duan, Zhibin, Wang, Chaojie, Chen, Bo, Zhou, Mingyuan
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling. Although embedded topic models (ETMs) and their variants have achieved promising performance in text analysis, they mainly focus on mining w
External link:
http://arxiv.org/abs/2209.14228
For downstream applications of vision-language pre-trained models, there has been significant interest in constructing effective prompts. Existing works on prompt engineering, which either require laborious manual designs or optimize the prompt tunin
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::400c988f19cadccbccebee66745c50e9
http://arxiv.org/abs/2303.09100