Showing 1 - 10 of 63 for search: '"Fabbri, Alexander R."'
Author:
Liu, Yixin, Shi, Kejian, Fabbri, Alexander R., Zhao, Yilun, Wang, Peifeng, Wu, Chien-Sheng, Joty, Shafiq, Cohan, Arman
The automatic evaluation of instruction following typically involves using large language models (LLMs) to assess response quality. However, there is a lack of comprehensive evaluation of these LLM-based evaluators across two dimensions: the base LLM…
External link:
http://arxiv.org/abs/2410.07069
LLMs and RAG systems are now capable of handling millions of input tokens or more. However, evaluating the output quality of such systems on long-context tasks remains challenging, as tasks like Needle-in-a-Haystack lack complexity. In this work, we…
External link:
http://arxiv.org/abs/2407.01370
Author:
Agarwal, Divyansh, Fabbri, Alexander R., Risher, Ben, Laban, Philippe, Joty, Shafiq, Wu, Chien-Sheng
Prompt leakage poses a compelling security and privacy threat in LLM applications. Leakage of system prompts may compromise intellectual property and act as adversarial reconnaissance for an attacker. A systematic evaluation of prompt leakage threats…
External link:
http://arxiv.org/abs/2404.16251
Ideal summarization models should generalize to novel summary-worthy content without remembering reference training summaries by rote. However, a single average performance score on the entire test set is inadequate in determining such model competence…
External link:
http://arxiv.org/abs/2311.09458
Author:
Liu, Yixin, Fabbri, Alexander R., Chen, Jiawen, Zhao, Yilun, Han, Simeng, Joty, Shafiq, Liu, Pengfei, Radev, Dragomir, Wu, Chien-Sheng, Cohan, Arman
While large language models (LLMs) can already achieve strong performance on standard generic summarization benchmarks, their performance on more complex summarization task settings is less studied. Therefore, we benchmark LLMs on instruction controllable…
External link:
http://arxiv.org/abs/2311.09184
Author:
Huang, Kung-Hsiang, Laban, Philippe, Fabbri, Alexander R., Choubey, Prafulla Kumar, Joty, Shafiq, Xiong, Caiming, Wu, Chien-Sheng
Previous research in multi-document news summarization has typically concentrated on collating information that all sources agree upon. However, the summarization of diverse information dispersed across multiple articles about an event remains underexplored…
External link:
http://arxiv.org/abs/2309.09369
Two-step approaches, in which summary candidates are generated then reranked to return a single summary, can improve ROUGE scores over the standard single-step approach. Yet, standard decoding methods (i.e., beam search, nucleus sampling, and diverse beam search)…
External link:
http://arxiv.org/abs/2305.17779
Author:
Laban, Philippe, Kryściński, Wojciech, Agarwal, Divyansh, Fabbri, Alexander R., Xiong, Caiming, Joty, Shafiq, Wu, Chien-Sheng
With the recent appearance of LLMs in practical settings, having methods that can effectively detect factual inconsistencies is crucial to reduce the propagation of misinformation and improve trust in model outputs. When testing on existing factual consistency…
External link:
http://arxiv.org/abs/2305.14540
Author:
Liu, Yixin, Shi, Kejian, He, Katherine S, Ye, Longtian, Fabbri, Alexander R., Liu, Pengfei, Radev, Dragomir, Cohan, Arman
Recent studies have found that summaries generated by large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets. Therefore, we study an LLM-as-reference learning setting…
External link:
http://arxiv.org/abs/2305.14239
Author:
Liu, Yixin, Fabbri, Alexander R., Zhao, Yilun, Liu, Pengfei, Joty, Shafiq, Wu, Chien-Sheng, Xiong, Caiming, Radev, Dragomir
Interpretability and efficiency are two important considerations for the adoption of neural automatic metrics. In this work, we develop strong-performing automatic metrics for reference-based summarization evaluation, based on a two-stage evaluation pipeline…
External link:
http://arxiv.org/abs/2303.03608