Showing 1 - 10 of 63 for search: '"Fabbri, Alexander R."'
Author:
Liu, Yixin, Shi, Kejian, Fabbri, Alexander R., Zhao, Yilun, Wang, Peifeng, Wu, Chien-Sheng, Joty, Shafiq, Cohan, Arman
The automatic evaluation of instruction following typically involves using large language models (LLMs) to assess response quality. However, there is a lack of comprehensive evaluation of these LLM-based evaluators across two dimensions: the base LLM…
External link:
http://arxiv.org/abs/2410.07069
LLMs and RAG systems are now capable of handling millions of input tokens or more. However, evaluating the output quality of such systems on long-context tasks remains challenging, as tasks like Needle-in-a-Haystack lack complexity. In this work, we…
External link:
http://arxiv.org/abs/2407.01370
Author:
Agarwal, Divyansh, Fabbri, Alexander R., Risher, Ben, Laban, Philippe, Joty, Shafiq, Wu, Chien-Sheng
Prompt leakage poses a compelling security and privacy threat in LLM applications. Leakage of system prompts may compromise intellectual property and act as adversarial reconnaissance for an attacker. A systematic evaluation of prompt leakage threats…
External link:
http://arxiv.org/abs/2404.16251
Ideal summarization models should generalize to novel summary-worthy content without remembering reference training summaries by rote. However, a single average performance score on the entire test set is inadequate in determining such model competence…
External link:
http://arxiv.org/abs/2311.09458
Author:
Liu, Yixin, Fabbri, Alexander R., Chen, Jiawen, Zhao, Yilun, Han, Simeng, Joty, Shafiq, Liu, Pengfei, Radev, Dragomir, Wu, Chien-Sheng, Cohan, Arman
While large language models (LLMs) can already achieve strong performance on standard generic summarization benchmarks, their performance on more complex summarization task settings is less studied. Therefore, we benchmark LLMs on instruction controllable…
External link:
http://arxiv.org/abs/2311.09184
Author:
Huang, Kung-Hsiang, Laban, Philippe, Fabbri, Alexander R., Choubey, Prafulla Kumar, Joty, Shafiq, Xiong, Caiming, Wu, Chien-Sheng
Previous research in multi-document news summarization has typically concentrated on collating information that all sources agree upon. However, the summarization of diverse information dispersed across multiple articles about an event remains underexplored…
External link:
http://arxiv.org/abs/2309.09369
Two-step approaches, in which summary candidates are generated then reranked to return a single summary, can improve ROUGE scores over the standard single-step approach. Yet, standard decoding methods (i.e., beam search, nucleus sampling, and diverse beam search)…
External link:
http://arxiv.org/abs/2305.17779
Author:
Laban, Philippe, Kryściński, Wojciech, Agarwal, Divyansh, Fabbri, Alexander R., Xiong, Caiming, Joty, Shafiq, Wu, Chien-Sheng
With the recent appearance of LLMs in practical settings, having methods that can effectively detect factual inconsistencies is crucial to reduce the propagation of misinformation and improve trust in model outputs. When testing on existing factual consistency…
External link:
http://arxiv.org/abs/2305.14540
Author:
Liu, Yixin, Shi, Kejian, He, Katherine S, Ye, Longtian, Fabbri, Alexander R., Liu, Pengfei, Radev, Dragomir, Cohan, Arman
Recent studies have found that summaries generated by large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets. Therefore, we study an LLM-as-reference learning setting…
External link:
http://arxiv.org/abs/2305.14239
Author:
Liu, Yixin, Fabbri, Alexander R., Zhao, Yilun, Liu, Pengfei, Joty, Shafiq, Wu, Chien-Sheng, Xiong, Caiming, Radev, Dragomir
Interpretability and efficiency are two important considerations for the adoption of neural automatic metrics. In this work, we develop strong-performing automatic metrics for reference-based summarization evaluation, based on a two-stage evaluation pipeline…
External link:
http://arxiv.org/abs/2303.03608