Showing 1 - 10
of 19,114
for the search: '"Baral, A"'
Author:
Uddin, Md Nayem, Saeidi, Amir, Handa, Divij, Seth, Agastya, Son, Tran Cao, Blanco, Eduardo, Corman, Steven R., Baral, Chitta
This paper introduces UnSeenTimeQA, a novel time-sensitive question-answering (TSQA) benchmark that diverges from traditional TSQA benchmarks by avoiding factual and web-searchable queries. We present a series of time-sensitive event scenarios …
External link:
http://arxiv.org/abs/2407.03525
Author:
Schmidt, Stefano, Caudill, Sarah, Creighton, Jolien D. E., Tsukada, Leo, Ray, Anarya, Adhicary, Shomik, Baral, Pratyusava, Baylor, Amanda, Cannon, Kipp, Cousins, Bryce, Ewing, Becca, Fong, Heather, George, Richard N., Godwin, Patrick, Hanna, Chad, Harada, Reiko, Huang, Yun-Jing, Huxford, Rachael, Joshi, Prathamesh, Kennington, James, Kuwahara, Soichiro, Li, Alvin K. Y., Magee, Ryan, Meacher, Duncan, Messick, Cody, Morisaki, Soichiro, Mukherjee, Debnandini, Niu, Wanting, Pace, Alex, Posnansky, Cort, Sachdev, Surabhi, Sakon, Shio, Singh, Divya, Shah, Urja, Tapia, Ron, Tsutsui, Takuya, Ueno, Koh, Viets, Aaron, Wade, Leslie, Wade, Madeline
Leveraging the features of the GstLAL pipeline, we present the results of a matched filtering search for asymmetric binary black hole systems with heavily mis-aligned spins in LIGO and Virgo data taken during the third observing run. Our target …
External link:
http://arxiv.org/abs/2406.17832
Author:
Patel, Nisarg, Kulkarni, Mohith, Parmar, Mihir, Budhiraja, Aashna, Nakamura, Mutsumi, Varshney, Neeraj, Baral, Chitta
As Large Language Models (LLMs) continue to exhibit remarkable performance in natural language understanding tasks, there is a crucial need to measure their ability for human-like multi-step logical reasoning. Existing logical reasoning evaluation …
External link:
http://arxiv.org/abs/2406.17169
Author:
Varshney, Neeraj, Raj, Satyam, Mishra, Venkatesh, Chatterjee, Agneet, Sarkar, Ritika, Saeidi, Amir, Baral, Chitta
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks. However, they have been shown to suffer from a critical limitation pertinent to 'hallucination' in their output. Recent research has …
External link:
http://arxiv.org/abs/2406.05494
Reasoning about actions and change (RAC) has historically driven the development of many early AI challenges, such as the frame problem, and many AI disciplines, including non-monotonic and commonsense reasoning. The role of RAC remains important …
External link:
http://arxiv.org/abs/2406.04046
This study explores the sycophantic tendencies of Large Language Models (LLMs), where these models tend to provide answers that match what users want to hear, even if they are not entirely correct. The motivation behind this exploration stems from …
External link:
http://arxiv.org/abs/2406.03827
Author:
Anantheswaran, Ujjwala, Gupta, Himanshu, Scaria, Kevin, Verma, Shreyas, Baral, Chitta, Mishra, Swaroop
Large Language Models (LLMs) excel at various tasks, including solving math word problems (MWPs), but struggle with real-world problems containing irrelevant information. To address this, we propose a prompting framework that generates adversarial …
External link:
http://arxiv.org/abs/2406.15444
Author:
Wu, John F., Hyk, Alina, McCormick, Kiera, Ye, Christine, Astarita, Simone, Baral, Elina, Ciuca, Jo, Cranney, Jesse, Field, Anjalie, Iyer, Kartheik, Koehn, Philipp, Kotler, Jenn, Kruk, Sandor, Ntampaka, Michelle, O'Neill, Charles, Peek, Joshua E. G., Sharma, Sanjib, Yunus, Mikaeel
Large Language Models (LLMs) are shifting how scientific research is done. It is imperative to understand how researchers interact with these models and how scientific sub-communities like astronomy might benefit from them. However, there is …
External link:
http://arxiv.org/abs/2405.20389
Large Language Models (LLMs) perform well across diverse tasks, but aligning them with human demonstrations is challenging. Recently, Reinforcement Learning (RL)-free methods like Direct Preference Optimization (DPO) have emerged, offering improved …
External link:
http://arxiv.org/abs/2405.16681
Domain Generalization (DG) is a challenging task in machine learning that requires a coherent ability to comprehend shifts across various domains through extraction of domain-invariant features. DG performance is typically evaluated by performing …
External link:
http://arxiv.org/abs/2405.15961