A Study on Automated Text Summarization and Its Application on Chinese Documents

Autor: Jen-Yuan Yeh, 葉鎮源
Rok vydání: 2002
Druh dokumentu: 學位論文 ; thesis
Popis: 90
In this thesis, two novel approaches are proposed to extract important sentences from a document to create its summary. The first is a corpus-based approach using feature analysis. It brings up three new ideas: 1) to employ ranked position to emphasize the significance of sentence position, 2) to reshape word unit to achieve higher accuracy of keyword importance, and 3) to train a score function by the genetic algorithm for obtaining a suitable combination of feature weights. The second approach combines the ideas of latent semantic analysis and text relationship maps to interpret conceptual structures of a document. Both approaches are applied to Chinese text summarization. The two approaches were evaluated by using a data corpus composed of 100 articles about politics from New Taiwan Weekly, and when the compression ratio was 30%, average recalls of 52.0% and 45.6% were achieved respectively.
Databáze: Networked Digital Library of Theses & Dissertations