Machine-Aided Document Markup with SGML

Author: Liu, Chia-Cheng (劉嘉誠)
Document type: thesis
Description: 83
This thesis presents an interactive man-machine method for converting a paper document that has no markup into an electronic document with SGML markup, and describes a system named Mad Muse designed for that purpose. An image scanner converts the paper document to an image, which is normalized and then input to the Mad Muse system. During the training stage, Mad Muse uses an improved DTD graph to guide the user in segmenting the document image, which improves the quality of its learning. The system uses a Chinese OCR to recognize the character string in each image segment, extracts feature information from the string, and stores the features in a feature database. Mad Muse later uses this feature information to mark up documents of the same type automatically. Because no Chinese OCR currently meets the requirements of Mad Muse, this thesis uses simulation to demonstrate the system's feasibility. The results show that the system can correctly mark up any fixed part of a document. Although the system uses relative relationships to locate the contents of a variant part, it still cannot recognize those contents, because the user cannot formally define the ending features of the corresponding structural elements. This problem is left for future research.
Database: Networked Digital Library of Theses & Dissertations
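
The train-then-mark-up flow described in the abstract can be illustrated with a minimal Python sketch. It is not the Mad Muse implementation: the abstract does not say which features the system extracts, so the sketch assumes one hypothetical feature, a fixed label that begins each segment's recognized text. The names FeatureDatabase, train, match, mark_up, and the doc and unknown tags are all invented for illustration, and plain strings stand in for OCR output, much as the thesis substitutes simulation for a real Chinese OCR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FeatureDatabase:
    """Maps an SGML element name to the fixed label text learned for it."""
    labels: Dict[str, str] = field(default_factory=dict)

    def train(self, element: str, recognized_text: str, label_length: int) -> None:
        # Assumed feature: the first `label_length` characters of the
        # recognized text, treated as the fixed label introducing the element.
        label = recognized_text[:label_length]
        if not label:
            raise ValueError("a learned label must not be empty")
        self.labels[element] = label

    def match(self, recognized_text: str) -> Optional[str]:
        # Return the element whose learned label starts this segment, if any.
        for element, label in self.labels.items():
            if recognized_text.startswith(label):
                return element
        return None


def mark_up(segments: List[str], db: FeatureDatabase, root: str = "doc") -> str:
    """Wrap each recognized segment in the tag selected by its features."""
    lines = [f"<{root}>"]
    for text in segments:
        element = db.match(text)
        if element is None:
            # No stored feature matched; keep the text but flag it for review.
            lines.append(f"<unknown>{text}</unknown>")
        else:
            lines.append(f"<{element}>{text}</{element}>")
    lines.append(f"</{root}>")
    return "\n".join(lines)


# Training stage: the user labels segments of one sample document.
db = FeatureDatabase()
db.train("title", "Subject: Annual budget", label_length=8)
db.train("date", "Date: 1995-06-01", label_length=5)

# Later: mark up another document of the same type automatically.
print(mark_up(["Date: 1996-01-15", "Subject: Meeting notes", "Free text"], db))
```

The sketch only handles segments that begin with a fixed label, which corresponds to the fixed parts the abstract reports as marked up correctly. A segment with no such label falls through to the placeholder tag, a loose analogue of the variant-part limitation: without a formally defined ending feature, there is nothing to match against.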