A Cross-Training Approach for Bilingual Web News Classification

Author: Che-Min Chen, 陳哲民
Year of publication: 2006
Document type: Thesis
Description: 94
As the Internet develops rapidly, it has become an important information source. Many news sites provide online news stories, and many Web portals integrate different news sources, so users can browse more related news reports and understand news events in depth. However, to the best of our knowledge, most Web news portals provide only monolingual news integration services. This motivates us to study the core techniques of bilingual Web news classification, drawing on related information retrieval and machine translation techniques, to support more comprehensive bilingual news classification. Past studies have shown that translating first and then clustering performs worse than clustering first and translating afterwards. Building on this result, our study adopts the cross-training concept, originally proposed for catalog integration, to construct an SVM-based classifier for bilingual Web news classification. We conducted several experiments using news from Google News as the experimental data sets. The experimental results show that the proposed cross-training approach outperforms the traditional SVM classifier across the board.
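The abstract does not spell out how cross-training is wired into the SVM. The sketch below assumes the catalog-integration form of the idea: each news story's term vector is augmented with a one-hot encoding of the category it received in the other language's source, and a one-vs-rest linear SVM is trained on the augmented vectors. To keep the example dependency-free, the SVM is trained with a Pegasos-style stochastic sub-gradient method, which is swapped in here and is not necessarily what the thesis used. All function names, parameters (`lam`, `epochs`), categories, and the toy data are hypothetical illustrations, not the thesis's implementation.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def augment(terms: Vector, other_label: str, other_labels: Sequence[str]) -> Vector:
    """Hypothetical cross-training features: the story's own term weights
    followed by a one-hot block for the category it got in the other source."""
    return list(terms) + [1.0 if lab == other_label else 0.0 for lab in other_labels]


def _dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> Vector:
    """Binary linear SVM (labels +1 / -1) trained by Pegasos-style stochastic
    sub-gradient descent on the L2-regularized hinge loss. A constant 1.0
    feature is appended so the last weight acts as a bias term."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            violated = y[i] * _dot(w, data[i]) < 1.0   # margin check uses current w
            w = [(1.0 - eta * lam) * wj for wj in w]   # shrink: regularizer gradient
            if violated:                               # hinge-loss sub-gradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def train_one_vs_rest(X: List[Vector], labels: List[str],
                      classes: Sequence[str], **kw) -> Dict[str, Vector]:
    """One binary SVM per category (that category vs. all others)."""
    return {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kw)
            for c in classes}


def predict(models: Dict[str, Vector], x: Vector) -> str:
    """Pick the category whose SVM gives the highest score."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: _dot(models[c], xb))


# Toy data (hypothetical): two term weights per story, plus the category the
# same story received in the other-language source.
other_categories = ["Sports", "Business"]
docs = [
    ([1.0, 0.0], "Sports", "sports"),
    ([0.0, 1.0], "Business", "business"),
    ([0.5, 0.5], "Sports", "sports"),      # same terms as the next story;
    ([0.5, 0.5], "Business", "business"),  # only the other-source label differs
]
X = [augment(terms, other, other_categories) for terms, other, _ in docs]
labels = [lab for _, _, lab in docs]
models = train_one_vs_rest(X, labels, ["sports", "business"])
print([predict(models, x) for x in X])  # → ['sports', 'business', 'sports', 'business']
```

The last two toy stories have identical term vectors, so a classifier that sees only terms cannot separate them; the appended other-source category is what makes them separable here. This illustrates why borrowing the other catalog's labels as features can help, under the stated assumption about how the features are built.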
Database: Networked Digital Library of Theses & Dissertations