Recognition and Postprocessing of Chinese Business Cards
Author: | Tai-Hung Chen, 陳泰宏 |
---|---|
Year of publication: | 2000 |
Document type: | 學位論文 ; thesis |
Description: | 88 Business cards carry important personal information. To use this information effectively, it must be extracted automatically to build an electronic business card database. A system that does this is called a business card recognition system. In general, such a system has three stages. First, a preprocessing stage performs image processing and extracts character images. The second stage is card layout analysis. The last stage, called postprocessing, usually applies linguistic knowledge to increase the recognition rate. The goal of this thesis is to study the recognition problems of business cards. We assume that characters have been recognized and that the card layout has been analyzed. Our aim is to improve the low OCR recognition rate on business cards, which arises because characters vary greatly in font type and are too small to be recognized reliably. In our approach, a hidden Markov model (HMM) is adopted to recognize characters on Chinese business cards. A left-right model outputs the top-10 candidates as its recognition result. A postprocessing stage follows to improve the recognition result. A Viterbi algorithm is proposed for the postprocessing stage. The algorithm uses bigrams as its linguistic information to search the top-10 candidates. An optimized character sequence is obtained as the improved result of postprocessing. Our experiments are built on the recognition of the address item and the company item in business cards. The bigram table and the hidden Markov models are trained with a telephony database. 100 address items and 30 company items are used for testing. Experimental results confirm the validity of the proposed method. |
Database: | Networked Digital Library of Theses & Dissertations |
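
The postprocessing step described in the abstract (a Viterbi search over the top-N candidates at each character position, guided by bigram statistics) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `viterbi_rerank`, the additive combination of log-probabilities, the `floor` value for unseen bigrams, and the `lm_weight` parameter are all assumptions made for illustration.

```python
import math


def viterbi_rerank(candidates, bigram_logp, floor=math.log(1e-6), lm_weight=1.0):
    """Pick the best character sequence from per-position candidate lists.

    candidates:  list over positions; each entry is a non-empty list of
                 (char, recognizer_log_prob) pairs (e.g. the top-10 outputs).
    bigram_logp: dict mapping (prev_char, char) -> log P(char | prev_char).
    floor:       assumed log-probability for bigrams missing from the table.
    lm_weight:   assumed weight of the bigram score against the recognizer score.
    """
    if not candidates:
        return []

    # history[pos][k] = (best score of any path ending in candidate k, backpointer)
    history = [[(score, None) for _, score in candidates[0]]]

    for pos in range(1, len(candidates)):
        column = []
        for ch, rec_score in candidates[pos]:
            best_score, best_prev = -math.inf, None
            for j, (prev_ch, _) in enumerate(candidates[pos - 1]):
                transition = lm_weight * bigram_logp.get((prev_ch, ch), floor)
                score = history[pos - 1][j][0] + transition
                if score > best_score:
                    best_score, best_prev = score, j
            column.append((best_score + rec_score, best_prev))
        history.append(column)

    # Backtrace from the highest-scoring candidate in the last column.
    k = max(range(len(history[-1])), key=lambda i: history[-1][i][0])
    path = []
    for pos in range(len(candidates) - 1, -1, -1):
        path.append(candidates[pos][k][0])
        k = history[pos][k][1]
    path.reverse()
    return path
```

With toy inputs, the recognizer's first choice at each position can be overridden when the bigram table strongly favors another path. For example, candidates 合/台, 北/比, 币/市 together with high-probability bigrams (台, 北) and (北, 市) yield 台北市 even when 合 and 币 score higher individually. The numbers used in such an example are made up and do not come from the thesis experiments.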