Information Extraction for Ancient Chinese Corpora

Author: Jia-Yang Chang, 張嘉洋
Year of publication: 1999
Document type: 學位論文 ; thesis
Description: 87
As the volume of digitized content grows rapidly, automatically extracting information from large digital libraries has become essential for making effective use of that content. This thesis presents an information extraction utility developed for the ancient Chinese language. The corpus used as the test bed for the proposed scheme is the “Dan-Shin Files”. Because of the nature of the corpus, automatically segmenting words, parsing sentences, and determining the relationships between sentences are difficult. However, documents of the same type typically follow very similar patterns, which greatly facilitates information extraction. Based on these observed properties, the thesis proposes a mechanism consisting of clustering, template mining, and information extraction. The usefulness and effectiveness of the mechanism, and its role in NTUDLM, are also described.
Database: Networked Digital Library of Theses & Dissertations
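
The abstract names three stages (clustering, template mining, information extraction) without describing the algorithms used. The following is only a minimal sketch of how such a pipeline could look, not the thesis's method. It assumes character-level comparison (so no word segmentation is needed), uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure, and runs on invented English toy strings rather than text from the corpus. All function names, the 0.7 threshold, and the minimum literal length of 3 are hypothetical choices made for illustration.

```python
import re
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1]; no word segmentation required."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster(docs, threshold=0.7):
    """Greedy clustering: a document joins the first cluster whose
    first member is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for doc in docs:
        for group in clusters:
            if similarity(doc, group[0]) >= threshold:
                group.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters


def mine_template(a, b, min_len=3):
    """Build a regex template from two documents of the same type.
    Shared runs of at least min_len characters become literals;
    the text that differs between them becomes a capture slot."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    parts, pos_a, pos_b = [], 0, 0
    for block in matcher.get_matching_blocks():
        if block.size < min_len:
            continue  # ignore accidental short overlaps
        if block.a > pos_a or block.b > pos_b:
            parts.append("(.*?)")  # varying text before this literal
        parts.append(re.escape(a[block.a:block.a + block.size]))
        pos_a, pos_b = block.a + block.size, block.b + block.size
    if pos_a < len(a) or pos_b < len(b):
        parts.append("(.*?)")  # varying text after the last literal
    return re.compile("".join(parts), re.DOTALL)


def extract(template, doc):
    """Return the slot values if doc fits the template, else None."""
    match = template.fullmatch(doc)
    return list(match.groups()) if match else None


# Invented toy documents (assumption: not taken from the corpus).
doc_a = "Petition from Lin of Zhuqian about land tax"
doc_b = "Petition from Wu of Danshui about land tax"
doc_c = "Receipt for 30 dan of rice issued to Wu"

clusters = cluster([doc_a, doc_c, doc_b])
template = mine_template(clusters[0][0], clusters[0][1])
fields = extract(template, "Petition from Zheng of Houlong about land tax")
```

In this sketch the two petition-like strings fall into one cluster, the template mined from them has two slots, and a third string of the same shape yields its two field values. A real corpus would likely need templates mined from more than two documents per cluster and a more careful choice of thresholds.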