Showing 1 - 10 of 8,017 results for search: '"Chang, Kai"'
Author:
Li, Cheng-Yi, Chang, Kao-Jung, Yang, Cheng-Fu, Wu, Hsin-Yu, Chen, Wenting, Bansal, Hritik, Chen, Ling, Yang, Yi-Ping, Chen, Yu-Chun, Chen, Shih-Pin, Lirng, Jiing-Feng, Chang, Kai-Wei, Chiou, Shih-Hwa
Multi-modal large language models (MLLMs) have been given free rein to explore exciting medical applications with a primary focus on radiology report generation. Nevertheless, the preliminary success in 2D radiology captioning is incompetent to refle
External link:
http://arxiv.org/abs/2407.02235
Prompt-based "diversity interventions" are commonly adopted to improve the diversity of Text-to-Image (T2I) models depicting individuals with various racial or gender traits. However, will this strategy result in nonfactual demographic distribution,
External link:
http://arxiv.org/abs/2407.00377
Traditional keyphrase prediction methods predict a single set of keyphrases per document, failing to cater to the diverse needs of users and downstream applications. To bridge the gap, we introduce on-demand keyphrase generation, a novel paradigm tha
External link:
http://arxiv.org/abs/2407.00191
Large vision-language models (LVLMs), while proficient in following instructions and responding to diverse questions, invariably generate detailed responses even when questions are ambiguous or unanswerable, leading to hallucinations and bias issues.
External link:
http://arxiv.org/abs/2406.14137
Path planning is a fundamental scientific problem in robotics and autonomous navigation, requiring the derivation of efficient routes from starting to destination points while avoiding obstacles. Traditional algorithms like A* and its variants are ca
External link:
http://arxiv.org/abs/2407.02511
Visual programs are executable code generated by large language models to address visual reasoning problems. They decompose complex questions into multiple reasoning steps and invoke specialized models for each step to solve the problems. However, th
External link:
http://arxiv.org/abs/2406.13444
Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks. However, there are significant trustworthiness concerns as RALMs are prone to generating unfaithful outputs, including base
External link:
http://arxiv.org/abs/2406.13692
Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query, which is important to many downstream applications like fact checking and data cleaning. To retrieve contradiction
External link:
http://arxiv.org/abs/2406.10746
Author:
Wang, Fei, Fu, Xingyu, Huang, James Y., Li, Zekun, Liu, Qin, Liu, Xiaogeng, Ma, Mingyu Derek, Xu, Nan, Zhou, Wenxuan, Zhang, Kai, Yan, Tianyi Lorena, Mo, Wenjie Jacky, Liu, Hsiang-Hui, Lu, Pan, Li, Chunyuan, Xiao, Chaowei, Chang, Kai-Wei, Roth, Dan, Zhang, Sheng, Poon, Hoifung, Chen, Muhao
We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of
External link:
http://arxiv.org/abs/2406.09411
Author:
Liu, Hou-I, Tseng, Yu-Wen, Chang, Kai-Cheng, Wang, Pin-Jyun, Shuai, Hong-Han, Cheng, Wen-Huang
Despite notable advancements in the field of computer vision, the precise detection of tiny objects continues to pose a significant challenge, largely owing to the minuscule pixel representation allocated to these objects in imagery data. This challe
External link:
http://arxiv.org/abs/2406.05755