A Chinese unknown word recognition method for micro-blog short text based on improved FP-growth

Authors: Yalu Jia, Lei Liu, Hao Chen, Yinghong Sun
Year of publication: 2019
Subject:
Source: ICNC-FSKD
ISSN: 1433-755X, 1433-7541
Description: Unknown word recognition is an important research topic in natural language processing. However, recognizing unknown words in micro-blog short texts is still hampered by sparse data, corpus noise, and highly varied forms of expression. This paper proposes POS-FP (Frequent Pattern growth with part-of-speech), an unknown word recognition method for micro-blog short text. First, candidate unknown words are obtained by combining the N-gram model with frequent itemsets. Then the candidates are filtered and verified using improved mutual information, information entropy, and context dependence. Finally, an open verification method is used to obtain the final unknown words. Experiments show that the algorithm improves unknown word recognition for micro-blog short texts.
Database: OpenAIRE
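
Note: the filtering step named in the description can be illustrated with a minimal sketch. It uses the standard textbook forms of pointwise mutual information and left/right branching entropy, not the paper's improved variants, which the record does not specify. The function names (`ngram_counts`, `pmi`, `branching_entropy`) are hypothetical, and it assumes candidates are character bigrams scored against a small in-memory list of texts.

```python
import math
from collections import Counter


def ngram_counts(texts, max_n=2):
    """Count all character n-grams of length 1..max_n across the texts."""
    counts = Counter()
    for text in texts:
        for n in range(1, max_n + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def pmi(bigram, counts, total_chars):
    """Standard pointwise mutual information of a two-character candidate.

    Assumption: every probability is estimated as count / total_chars,
    a common simplification for short corpora.
    """
    p_xy = counts[bigram] / total_chars
    p_x = counts[bigram[0]] / total_chars
    p_y = counts[bigram[1]] / total_chars
    return math.log2(p_xy / (p_x * p_y))


def branching_entropy(candidate, texts):
    """Return (left, right) entropy of the characters adjacent to candidate."""
    left, right = Counter(), Counter()
    for text in texts:
        start = text.find(candidate)
        while start != -1:
            if start > 0:
                left[text[start - 1]] += 1
            end = start + len(candidate)
            if end < len(text):
                right[text[end]] += 1
            start = text.find(candidate, start + 1)

    def entropy(counter):
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return sum(c / total * math.log2(total / c) for c in counter.values())

    return entropy(left), entropy(right)
```

In a pipeline of this kind, a candidate with high mutual information (its characters co-occur far more often than chance) and high branching entropy on both sides (it appears in varied contexts) would typically be kept as a likely word; the thresholds are left to the user here and are not taken from the paper.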