Evolution of a text summarization system in an automatic evaluation framework

Author: Rigouste, Lois
Language: English
Year of publication: 2003
Subject:
Document type: Master's thesis
DOI: 10.20381/ruor-9679
Description: CALLISTO is a text summarizer that searches a space of possible configurations for the best one. Unlike other systems, this lets CALLISTO (1) choose components based on results obtained on the training data, and thus select a configuration better adapted to the problem, and (2) summarize different texts in different ways. The purpose of this thesis is to determine how the initial space that CALLISTO explores can be modified to improve the overall quality of the summaries produced. The thesis reviews and evaluates the initial, arbitrary design choices made in the system, using a fully automated framework based on a content measure proposed by Lin and Hovy. We tried several modifications to CALLISTO, such as replacing the internal evaluation measure, testing other discretization processes, changing the learning algorithm, and adding new features to characterize the input text. We found that Naive Bayes outperformed the current learner, C5.0, by identifying one configuration that works satisfactorily for all texts.
Database: Networked Digital Library of Theses & Dissertations