Description: |
This paper introduces ContextChain, a new metric for automatically evaluating summaries. Based on an in-depth analysis of the TAC 2008 update summarization results, we show that previous automatic metrics such as ROUGE-2 and BE cannot reliably predict strong-performing systems. We introduce two new terms, Correlation Recall and Correlation Precision, and discuss how they shed more light on the coverage and the correctness of the respective metric. Our newly proposed metric, ContextChain, incorporates findings from Giannakopoulos et al. (2008) and Barzilay and Lapata (2008) [2]. We show that our metric correlates with responsiveness scores even for the top n systems that participated in the TAC 2008 update summarization task, whereas ROUGE-2 and BE do not show a correlation for the top 25 systems. |