Search results

Report

DeFine: Enhancing LLM Decision-Making with Factor Profiles and Analogical Reasoning

Authors: Hu, Yebowen, Wang, Xiaoyang, Yao, Wenlin, Lu, Yiming, Zhang, Daoan, Foroosh, Hassan, Yu, Dong, Liu, Fei

LLMs are ideal for decision-making due to their ability to reason over long contexts and identify critical factors. However, challenges arise when processing transcripts of spoken speech describing complex scenarios. These transcripts often contain u

External link: http://arxiv.org/abs/2410.01772

Report

IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation

Authors: Lin, Fan, Xie, Shuyi, Dai, Yong, Yao, Wenlin, Lang, Tianjiao, Xu, Zishan, Hu, Zhichao, Xiao, Xiao, Liu, Yuhong, Zhang, Yu

As Large Language Models (LLMs) grow increasingly adept at managing complex tasks, the evaluation set must keep pace with these advancements to ensure it remains sufficiently discriminative. Item Discrimination (ID) theory, which is widely used in ed

External link: http://arxiv.org/abs/2409.18892

Report

HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows

Authors: Yao, Wenlin, Mi, Haitao, Yu, Dong

Despite recent advancements in large language models (LLMs), their performance on complex reasoning problems requiring multi-step thinking and combining various skills is still limited. To address this, we propose a novel framework HDFlow for complex

External link: http://arxiv.org/abs/2409.17433

Report

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

Authors: Jing, Liqiang, Huang, Zhehui, Wang, Xiaoyang, Yao, Wenlin, Yu, Wenhao, Ma, Kaixin, Zhang, Hongming, Du, Xinya, Yu, Dong

Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) have demonstrated impressive language/vision reasoning abilities, igniting the recent trend of building agents for targeted applications such as shopping assistants or AI software

External link: http://arxiv.org/abs/2409.07703

Report

When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives

Authors: Hu, Yebowen, Song, Kaiqiang, Cho, Sangwoo, Wang, Xiaoyang, Yao, Wenlin, Foroosh, Hassan, Yu, Dong, Liu, Fei

Reasoning is most powerful when an LLM accurately aggregates relevant information. We examine the critical role of information aggregation in reasoning by requiring the LLM to analyze sports narratives. To succeed at this task, an LLM must infer poin

External link: http://arxiv.org/abs/2406.12084

Report

MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions

Authors: Liang, Zhenwen, Yu, Dian, Yu, Wenhao, Yao, Wenlin, Zhang, Zhihan, Zhang, Xiangliang, Yu, Dong

Large language models (LLMs) have demonstrated impressive capabilities in mathematical problem solving, particularly in single turn question answering formats. However, real world scenarios often involve mathematical question answering that requires

External link: http://arxiv.org/abs/2405.19444

Report

Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era

Authors: Wu, Xuansheng, Zhao, Haiyan, Zhu, Yaochen, Shi, Yucheng, Yang, Fan, Liu, Tianming, Zhai, Xiaoming, Yao, Wenlin, Li, Jundong, Du, Mengnan, Liu, Ninghao

Explainable AI (XAI) refers to techniques that provide human-understandable insights into the workings of AI models. Recently, the focus of XAI is being extended towards Large Language Models (LLMs) which are often criticized for their lack of transp

External link: http://arxiv.org/abs/2403.08946

Report

Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models

Authors: Zhao, Xinran, Zhang, Hongming, Pan, Xiaoman, Yao, Wenlin, Yu, Dong, Wu, Tongshuang, Chen, Jianshu

Published in: Findings of the Association for Computational Linguistics ACL 2024

For an LLM to be trustworthy, its confidence level should be well-calibrated with its actual performance. While it is now common sense that LLM performances are greatly impacted by prompts, the confidence calibration in prompting LLMs has yet to be th

External link: http://arxiv.org/abs/2402.17124

Report

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Authors: He, Hongliang, Yao, Wenlin, Ma, Kaixin, Yu, Wenhao, Dai, Yong, Zhang, Hongming, Lan, Zhenzhong, Yu, Dong

The rapid advancement of large language models (LLMs) has led to a new era marked by the development of autonomous applications in real-world scenarios, which drives innovation in creating advanced web agents. Existing web agents typically only handl

External link: http://arxiv.org/abs/2401.13919

Report

InFoBench: Evaluating Instruction Following Ability in Large Language Models

Authors: Qin, Yiwei, Song, Kaiqiang, Hu, Yebowen, Yao, Wenlin, Cho, Sangwoo, Wang, Xiaoyang, Wu, Xuansheng, Liu, Fei, Liu, Pengfei, Yu, Dong

This paper introduces the Decomposed Requirements Following Ratio (DRFR), a new metric for evaluating Large Language Models' (LLMs) ability to follow instructions. Addressing a gap in current methodologies, DRFR breaks down complex instructions into

External link: http://arxiv.org/abs/2401.03601
