Text Summarization on News
Author: | Hsiang-Pin Lee, 李祥賓 |
---|---|
Year of publication: | 2001 |
Document type: | Thesis |
Description: | The rapid development of information technology and the Internet has led to information overload, so users need a way to browse documents efficiently and effectively. Text summarization can help address this problem. Traditional summarization is usually done manually, which is labor-intensive and cannot keep up with real-time demand, so the process needs to be automated. This thesis presents three text summarization methods applied to the Reuters news corpus. First, Information Retrieval techniques are used to collect the important vocabulary of a document (the Important Vocabulary Extract Policy). Second, the significance of a sentence is determined by its position in the document (the Optimal Position Policy). Third, the vocabulary of the title is expanded (the Title Expand Policy). To capture the concept of a document, we extract its important vocabulary and analyze its structure to find the positions where the document's subject appears. Because we consider the title particularly significant, we expand it with related vocabulary from WordNet and use the expanded word set to select appropriate sentences for the summary. We design separate experiments for the three methods and evaluate the resulting summaries through text categorization. Experimental results indicate that all three methods achieve acceptable performance. Finally, the thesis proposes a method that combines two policies, Optimal Position and Title Expand. Compared with the 65.6% precision of the baseline, the combined method reaches 71.9% precision, a relative improvement of 9.6% (6.3 percentage points). |
Database: | Networked Digital Library of Theses & Dissertations |
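
The abstract names three sentence-scoring policies and a combination of two of them, but gives no formulas. The Python below is a minimal sketch of how such policies could work, not the thesis's own implementation. Several pieces are assumptions: TF-IDF stands in for the unspecified "Information Retrieval technique"; the position score is a fixed linear lead bias rather than positions learned from the corpus, as the thesis appears to do; a small hypothetical `SYNONYMS` dictionary replaces WordNet so the code runs without external data; and Optimal Position and Title Expand are combined by a weighted sum with a hypothetical weight `alpha`. All function names are illustrative.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL stand-in for WordNet: a tiny hand-written synonym map so the
# sketch runs without downloading a lexical database. The thesis uses WordNet.
SYNONYMS = {
    "oil": {"crude", "petroleum"},
    "prices": {"price", "cost"},
    "rise": {"increase", "climb", "gain"},
}

# Small illustrative stopword list (an assumption, not from the thesis).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "for",
             "is", "was", "by", "as", "at", "it", "its"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def vocabulary_scores(sentences, corpus):
    """Important-vocabulary score (assumed TF-IDF): mean weight of a sentence's words.

    TF is counted over the whole document (all sentences); IDF is smoothed and
    computed over `corpus`, a list of other documents such as other news stories.
    """
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    tf = Counter(w for s in sentences for w in tokenize(s))

    def weight(w):
        return tf[w] * (math.log((1 + n) / (1 + df[w])) + 1)

    scores = []
    for s in sentences:
        words = tokenize(s)
        scores.append(sum(weight(w) for w in words) / len(words) if words else 0.0)
    return scores


def position_scores(sentences):
    """Position score (assumed linear lead bias): 1.0 for the first sentence, decreasing."""
    n = len(sentences)
    return [1.0 - i / n for i in range(n)]


def expand_title(title):
    """Title expansion: title words plus their related words from SYNONYMS."""
    words = set(tokenize(title))
    for w in list(words):
        words |= SYNONYMS.get(w, set())
    return words


def title_scores(sentences, expanded_title):
    """Fraction of each sentence's words that appear in the expanded title set."""
    scores = []
    for s in sentences:
        words = tokenize(s)
        hits = sum(1 for w in words if w in expanded_title)
        scores.append(hits / len(words) if words else 0.0)
    return scores


def top_k(sentences, scores, k):
    """The k highest-scoring sentences, returned in original document order."""
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


def summarize(title, body, k=1, alpha=0.5):
    """Combine position and title-expansion scores with a hypothetical weight alpha."""
    sentences = split_sentences(body)
    pos = position_scores(sentences)
    tit = title_scores(sentences, expand_title(title))
    combined = [alpha * p + (1 - alpha) * t for p, t in zip(pos, tit)]
    return top_k(sentences, combined, k)
```

For example, with the title "Oil prices rise" and a four-sentence story whose last sentence reads "Petroleum costs may increase further this week.", `summarize(title, body, k=2, alpha=0.3)` keeps the lead sentence and that last sentence, which matches the title only through the expanded words "petroleum" and "increase". Lowering `alpha` shifts weight from position toward title overlap; the single-policy variants correspond to passing `position_scores`, `title_scores`, or `vocabulary_scores` output straight to `top_k`.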