Showing 1 - 10 of 47 for search: '"Laban, Philippe"'
Evaluating retrieval-augmented generation (RAG) systems remains challenging, particularly for open-ended questions that lack definitive answers and require coverage of multiple sub-topics. In this paper, we introduce a novel evaluation framework base
External link:
http://arxiv.org/abs/2410.15531
LLM-based applications are helping people write, and LLM-generated text is making its way into social media, journalism, and our classrooms. However, the differences between LLM-generated and human-written text remain unclear. To explore this, we hir
External link:
http://arxiv.org/abs/2409.14509
LLMs and RAG systems are now capable of handling millions of input tokens or more. However, evaluating the output quality of such systems on long-context tasks remains challenging, as tasks like Needle-in-a-Haystack lack complexity. In this work, we
External link:
http://arxiv.org/abs/2407.01370
Authors:
Agarwal, Divyansh; Fabbri, Alexander R.; Risher, Ben; Laban, Philippe; Joty, Shafiq; Wu, Chien-Sheng
Prompt leakage poses a compelling security and privacy threat in LLM applications. Leakage of system prompts may compromise intellectual property, and act as adversarial reconnaissance for an attacker. A systematic evaluation of prompt leakage threat
External link:
http://arxiv.org/abs/2404.16251
Recognizing if LLM output can be grounded in evidence is central to many tasks in NLP: retrieval-augmented generation, summarization, document-grounded dialogue, and more. Current approaches to this kind of fact-checking are based on verifying each p
External link:
http://arxiv.org/abs/2404.10774
The interactive nature of Large Language Models (LLMs) theoretically allows models to refine and improve their answers, yet systematic analysis of the multi-turn behavior of LLMs remains limited. In this paper, we propose the FlipFlop experiment: in
External link:
http://arxiv.org/abs/2311.08596
Making big purchases requires consumers to research or consult a salesperson to gain domain expertise. However, existing conversational recommender systems (CRS) often overlook users' lack of background knowledge, focusing solely on gathering prefere
External link:
http://arxiv.org/abs/2310.17749
In this tutorial, we focus on text-to-text generation, a class of natural language generation (NLG) tasks, that takes a piece of text as input and then generates a revision that is improved according to some specific criteria (e.g., readability or li
External link:
http://arxiv.org/abs/2310.03878
Conversational interfaces powered by Large Language Models (LLMs) have recently become a popular way to obtain feedback during document editing. However, standard chat-based conversational interfaces do not support transparency and verifiability of t
External link:
http://arxiv.org/abs/2309.15337
Researchers have argued that large language models (LLMs) exhibit high-quality writing capabilities from blogs to stories. However, evaluating objectively the creativity of a piece of writing is challenging. Inspired by the Torrance Test of Creative
External link:
http://arxiv.org/abs/2309.14556