WordNet-based Semantic Classification for Auction Commodity Titles

Author: Wei-Jun Liu, 劉瑋竣
Year of publication: 2014
Document type: thesis
Description: 102
This research aims to automatically classify merchandise on an auction website according to the Chinese titles of the merchandise. Because of the shortcomings of traditional article classification, this thesis proposes four methods to improve the automatic classification of merchandise titles. The first method trains keywords for each merchandise class and then extracts keywords from each test title by exact matching; the resulting feature vector is classified by an SVM. The second method instead extracts keywords from each test title by loose matching. The third method uses machine translation of Chinese keywords into English, together with WordNet and its semantic structure. In the training phase, it derives the best semantic sense for the English translation of each Chinese keyword. In the testing phase, it uses the semantic information of the keywords and the titles to extract additional keywords for Chinese titles that still have no matched keywords after the second method has been applied. The fourth method uses similar keywords, that is, keywords separated by a short semantic distance. Building on the third method, it extends the matched keywords to their similar keywords, so each extracted feature vector may contain more keywords with similar meanings. Finally, a feature vector can be transformed by a feature extraction method, such as the FFT or DCT, into a new, smaller feature vector. This reduces the storage space and running time of the SVM while retaining recognition performance. Experimental results show the excellent performance of the proposed methods.
Database: Networked Digital Library of Theses & Dissertations
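
The keyword-extraction and feature-reduction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not define its exact and loose matching rules, so the sketch assumes that "exact" means a keyword equals a whole token of the segmented title and "loose" means a keyword occurs anywhere in the title as a substring. The keyword list, the sample title, the function names, and the choice to keep the first DCT coefficients are all hypothetical. The SVM, WordNet, and similar-keyword steps are left out.

```python
import math


def keyword_vector(title_tokens, keywords, loose=False):
    """Build a binary feature vector with one slot per trained keyword.

    Assumed matching rules (not taken from the thesis):
      exact: the keyword equals one whole token of the segmented title
      loose: the keyword occurs as a substring of the joined title
    """
    title = "".join(title_tokens)
    vector = []
    for keyword in keywords:
        hit = (keyword in title) if loose else (keyword in title_tokens)
        vector.append(1.0 if hit else 0.0)
    return vector


def dct2(values):
    """Unnormalized DCT-II of a sequence, computed directly from the definition."""
    n = len(values)
    return [
        sum(values[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def reduce_vector(vector, keep):
    """Shrink a feature vector by keeping only the first `keep` DCT coefficients."""
    return dct2(vector)[:keep]


# Hypothetical trained keywords and a hypothetical segmented auction title.
keywords = ["手機", "保護殼", "充電器", "耳機"]
title_tokens = ["蘋果", "手機殼", "充電器"]

exact = keyword_vector(title_tokens, keywords)               # [0.0, 0.0, 1.0, 0.0]
loose = keyword_vector(title_tokens, keywords, loose=True)   # [1.0, 0.0, 1.0, 0.0]
reduced = reduce_vector(loose, keep=2)                       # 2 values instead of 4
```

Under these assumptions, loose matching recovers the keyword "手機" from the token "手機殼", which exact matching misses. The shortened vector `reduced` is what would be passed to the classifier in place of the full keyword vector.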