Showing 1 - 10
of 630
for search: '"Jing, Liqiang"'
Hallucination is a common problem for Large Vision-Language Models (LVLMs) with long generations which is difficult to eradicate. The generation with hallucinations is partially inconsistent with the image content. To mitigate hallucination, current …
External link:
http://arxiv.org/abs/2409.16494
The rapid development of Large Vision-Language Models (LVLMs) often comes with widespread hallucination issues, making cost-effective and comprehensive assessments increasingly vital. Current approaches mainly rely on costly annotations and are not c…
External link:
http://arxiv.org/abs/2409.13612
Author:
Jing, Liqiang, Huang, Zhehui, Wang, Xiaoyang, Yao, Wenlin, Yu, Wenhao, Ma, Kaixin, Zhang, Hongming, Du, Xinya, Yu, Dong
Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) have demonstrated impressive language/vision reasoning abilities, igniting the recent trend of building agents for targeted applications such as shopping assistants or AI software …
External link:
http://arxiv.org/abs/2409.07703
Author:
Jing, Liqiang, Du, Xinya
Large Vision-Language Models (LVLMs) have demonstrated proficiency in tackling a variety of visual-language tasks. However, current LVLMs suffer from misalignment between text and image modalities which causes three kinds of hallucination problems, i…
External link:
http://arxiv.org/abs/2404.05046
In line with the latest research, the task of identifying helpful reviews from a vast pool of user-generated textual and visual data has become a prominent area of study. Effective modal representations are expected to possess two key attributes: con…
External link:
http://arxiv.org/abs/2402.18107
Multimodal summarization aims to generate a concise summary based on the input text and image. However, the existing methods potentially suffer from unfactual output. To evaluate the factuality of multimodal summarization models, we propose two fine-…
External link:
http://arxiv.org/abs/2402.11414
Sarcasm Explanation in Dialogue (SED) is a new yet challenging task, which aims to generate a natural language explanation for the given sarcastic dialogue that involves multiple modalities (i.e., utterance, video, and audio). Although existing studi…
External link:
http://arxiv.org/abs/2402.03658
Despite commendable achievements made by existing work, prevailing multimodal sarcasm detection studies rely more on textual content over visual information. It unavoidably induces spurious correlations between textual words and labels, thereby signi…
External link:
http://arxiv.org/abs/2312.10493
Existing sign language translation methods follow a two-stage pipeline: first converting the sign language video to a gloss sequence (i.e. Sign2Gloss) and then translating the generated gloss sequence into a spoken language sentence (i.e. Gloss2Text)…
External link:
http://arxiv.org/abs/2312.10210
We introduce FaithScore (Faithfulness to Atomic Image Facts Score), a reference-free and fine-grained evaluation metric that measures the faithfulness of the generated free-form answers from large vision-language models (LVLMs). The FaithScore evalua…
External link:
http://arxiv.org/abs/2311.01477