Search results

Report

NAVCON: A Cognitively Inspired and Linguistically Grounded Corpus for Vision and Language Navigation

Authors: Wanchoo, Karan; Zuo, Xiaoye; Gonzalez, Hannah; Dan, Soham; Georgakis, Georgios; Roth, Dan; Daniilidis, Kostas; Miltsakaki, Eleni

We present NAVCON, a large-scale annotated Vision-Language Navigation (VLN) corpus built on top of two popular datasets (R2R and RxR). The paper introduces four core, cognitively motivated and linguistically grounded, navigation concepts and an algor

External link: http://arxiv.org/abs/2412.13026

Report

DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction

Authors: Feng, Yu; Htut, Phu Mon; Qi, Zheng; Xiao, Wei; Mager, Manuel; Pappas, Nikolaos; Halder, Kishaloy; Li, Yang; Benajiba, Yassine; Roth, Dan

Quantifying the uncertainty in the factual parametric knowledge of Large Language Models (LLMs), especially in a black-box setting, poses a significant challenge. Existing methods, which gauge a model's uncertainty through evaluating self-consistency

External link: http://arxiv.org/abs/2412.09572

Report

Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations

Authors: Malaviya, Chaitanya; Chang, Joseph Chee; Roth, Dan; Iyyer, Mohit; Yatskar, Mark; Lo, Kyle

Language model users often issue queries that lack specification, where the context under which a query was issued -- such as the user's identity, the query's intent, and the criteria for a response to be useful -- is not explicit. For instance, a go

External link: http://arxiv.org/abs/2411.07237

Report

Benchmarking LLM Guardrails in Handling Multilingual Toxicity

Authors: Yang, Yahan; Dan, Soham; Roth, Dan; Lee, Insup

With the ubiquity of Large Language Models (LLMs), guardrails have become crucial to detect and defend against toxic content. However, with the increasing pervasiveness of LLMs in multilingual scenarios, their effectiveness in handling multilingual t

External link: http://arxiv.org/abs/2410.22153

Report

ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning

Authors: Yu, Xiaodong; Zhou, Ben; Cheng, Hao; Roth, Dan

Existing math datasets evaluate the reasoning abilities of large language models (LLMs) by either using the final answer or the intermediate reasoning steps derived from static examples. However, the former approach fails to surface model's uses of s

External link: http://arxiv.org/abs/2410.19056

Report

Open Domain Question Answering with Conflicting Contexts

Authors: Liu, Siyi; Ning, Qiang; Halder, Kishaloy; Xiao, Wei; Qi, Zheng; Htut, Phu Mon; Zhang, Yi; John, Neha Anna; Min, Bonan; Benajiba, Yassine; Roth, Dan

Open domain question answering systems frequently rely on information retrieved from large collections of text (such as the Web) to answer questions. However, such collections of text often contain conflicting information, and indiscriminately depend

External link: http://arxiv.org/abs/2410.12311

Report

GIVE: Structured Reasoning with Knowledge Graph Inspired Veracity Extrapolation

Authors: He, Jiashu; Ma, Mingyu Derek; Fan, Jinxuan; Roth, Dan; Wang, Wei; Ribeiro, Alejandro

Existing retrieval-based reasoning approaches for large language models (LLMs) heavily rely on the density and quality of the non-parametric knowledge source to provide domain knowledge and explicit reasoning chain. However, inclusive knowledge sourc

External link: http://arxiv.org/abs/2410.08475

Report

Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge

Authors: Elangovan, Aparna; Ko, Jongwoo; Xu, Lei; Elyasi, Mahsa; Liu, Ling; Bodapati, Sravan; Roth, Dan

The effectiveness of automatic evaluation of generative models is typically measured by comparing it to human evaluation using correlation metrics. However, metrics like Krippendorff's $\alpha$ and Randolph's $\kappa$, originally designed to measure

External link: http://arxiv.org/abs/2410.03775

Report

Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale

Authors: Ou, Tianyue; Xu, Frank F.; Madaan, Aman; Liu, Jiarui; Lo, Robert; Sridhar, Abishek; Sengupta, Sudipta; Roth, Dan; Neubig, Graham; Zhou, Shuyan

LLMs can now act as autonomous agents that interact with digital environments and complete specific objectives (e.g., arranging an online meeting). However, accuracy is still far from satisfactory, partly due to a lack of large-scale, direct demonstr

External link: http://arxiv.org/abs/2409.15637

Report

Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering

Authors: Zhang, Qingru; Yu, Xiaodong; Singh, Chandan; Liu, Xiaodong; Liu, Liyuan; Gao, Jianfeng; Zhao, Tuo; Roth, Dan; Cheng, Hao

Large language models (LLMs) have demonstrated remarkable performance across various real-world tasks. However, they often struggle to fully comprehend and effectively utilize their input contexts, resulting in responses that are unfaithful or halluc

External link: http://arxiv.org/abs/2409.10790
