A Chinese unknown word recognition method for micro-blog short text based on improved FP-growth
Author: | Yalu Jia, Lei Liu, Hao Chen, Yinghong Sun |
---|---|
Publication year: | 2019 |
Subject: |
Computer science
Speech recognition; Context (language use); 02 engineering and technology; Mutual information; Identification (information); Artificial Intelligence; 020204 information systems; Word recognition; Pattern recognition (psychology); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Noise (video); Word (computer architecture) |
Source: | ICNC-FSKD |
ISSN: | 1433-755X 1433-7541 |
Description: | Unknown word recognition is an important research topic in natural language processing. However, identifying unknown words in micro-blog short texts still suffers from sparse data, corpus noise, and varied forms of expression. This paper proposes POS-FP (Frequent Pattern growth with part-of-speech), an unknown word recognition method for micro-blog short text. First, candidate unknown words are obtained by combining the N-gram model with frequent itemsets. Then the candidates are filtered and verified using improved mutual information, information entropy, and context dependence. Finally, an open verification method is used to obtain the final unknown words. Experiments show that the algorithm improves unknown word recognition for micro-blog short texts. |
Database: | OpenAIRE |
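
The candidate-then-filter pipeline outlined in the description can be illustrated with a minimal Python sketch. It is not the authors' POS-FP implementation: it uses plain character n-gram counting in place of FP-growth and part-of-speech information, and textbook pointwise mutual information and neighbor entropy in place of the paper's improved measures. All function names, parameters, and thresholds below are hypothetical.

```python
import math
from collections import Counter


def count_ngrams(texts, max_n=3):
    """Count every character n-gram of length 1..max_n in the texts."""
    counts = Counter()
    for text in texts:
        for n in range(1, max_n + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def min_split_pmi(gram, counts, total_chars):
    """Lowest pointwise mutual information over all two-way splits of gram.

    Probabilities are approximated as count / total_chars (an assumption).
    """
    p_gram = counts[gram] / total_chars
    scores = []
    for k in range(1, len(gram)):
        p_left = counts[gram[:k]] / total_chars
        p_right = counts[gram[k:]] / total_chars
        scores.append(math.log2(p_gram / (p_left * p_right)))
    return min(scores)


def neighbor_entropy(gram, texts, side):
    """Shannon entropy of the characters adjacent to gram on the given side."""
    neighbors = Counter()
    for text in texts:
        start = text.find(gram)
        while start != -1:
            idx = start - 1 if side == "left" else start + len(gram)
            if 0 <= idx < len(text):
                neighbors[text[idx]] += 1
            start = text.find(gram, start + 1)
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in neighbors.values())


def candidate_words(texts, max_n=3, min_count=2, min_pmi=1.0, min_entropy=0.5):
    """Keep frequent n-grams that are internally cohesive and freely bounded."""
    counts = count_ngrams(texts, max_n)
    total_chars = sum(len(t) for t in texts)
    result = []
    for gram, c in counts.items():
        # Frequency filter: a stand-in for frequent-itemset mining.
        if len(gram) < 2 or c < min_count:
            continue
        # Cohesion filter: the parts must co-occur more often than chance.
        if min_split_pmi(gram, counts, total_chars) < min_pmi:
            continue
        # Boundary filter: varied neighbors suggest a free-standing unit.
        entropy = min(neighbor_entropy(gram, texts, "left"),
                      neighbor_entropy(gram, texts, "right"))
        if entropy >= min_entropy:
            result.append(gram)
    return sorted(result)
```

Calling `candidate_words(list_of_posts)` on a list of strings returns the n-grams that pass all three filters. The thresholds are placeholders and would need tuning on real micro-blog data, and the final verification step mentioned in the description is not modeled here.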