Mining the Frequent Patterns of Named Entities for Long Document Classification

Author:	Bohan Wang, Rui Qi, Jinhua Gao, Jianwei Zhang, Xiaoguang Yuan, Wenjun Ke
Language:	English
Year of publication:	2022
Subject:	long document classification; key feature mining; Naive Bayesian; Technology; Engineering (General). Civil engineering (General) TA1-2040; Biology (General) QH301-705.5; Physics QC1-999; Chemistry QD1-999
Source:	Applied Sciences, Vol 12, Iss 5, p 2544 (2022)
Document type:	article
ISSN:	2076-3417
DOI:	10.3390/app12052544
Description:	Nowadays, a large amount of information is stored as text, and numerous text mining techniques have been developed for applications such as event detection, news topic classification, public opinion detection, and sentiment analysis. Although significant progress has been made in short text classification, document-level text classification requires further exploration. Long documents typically contain irrelevant, noisy information that obscures the indicative features and limits the interpretability of classification results. To alleviate this problem, a model called MIPELD (mining the frequent patterns of named entities for long document classification) is presented, which mines the frequent patterns of named entities and uses them as features. The discovered patterns allow semantic generalization across documents and provide clues for verifying the results. Experiments on several datasets yielded good accuracy and macro-F1 values, meeting the requirements for practical application. Further analysis validated the effectiveness of MIPELD in mining interpretable information for text classification.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/86afc0c798e348a0b2c56e39d05fe97a
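
The description above names the general idea (frequent patterns of named entities used as classification features, with Naive Bayes listed among the subjects) without giving details. The following is a minimal sketch of that idea only, not the authors' MIPELD implementation. It assumes that named entities have already been extracted per document by some upstream tool; the toy entity lists, the labels, the function names, and the parameters min_support and max_size are all hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def mine_frequent_patterns(entity_sets, min_support=2, max_size=2):
    """Return entity combinations (as sorted tuples) that occur in at
    least `min_support` documents. Brute-force counting, fine for a toy."""
    counts = Counter()
    for entities in entity_sets:
        items = sorted(set(entities))  # count each entity once per document
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {p for p, c in counts.items() if c >= min_support}


def pattern_features(entities, patterns):
    """Patterns whose entities all appear in the given document."""
    present = set(entities)
    return {p for p in patterns if present.issuperset(p)}


def train_naive_bayes(docs, labels, patterns):
    """Count, per class, how many documents contain each pattern."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    for entities, label in zip(docs, labels):
        feature_counts[label].update(pattern_features(entities, patterns))
    return class_counts, feature_counts


def predict(entities, model, patterns):
    """Bernoulli Naive Bayes over pattern presence, Laplace-smoothed."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    active = pattern_features(entities, patterns)
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for p in patterns:
            prob = (feature_counts[label][p] + 1) / (n + 2)
            score += math.log(prob if p in active else 1 - prob)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: entity lists standing in for long documents.
docs = [
    ["NASA", "Mars", "SpaceX"],
    ["NASA", "Mars"],
    ["FIFA", "Messi"],
    ["FIFA", "Messi", "Barcelona"],
]
labels = ["science", "science", "sports", "sports"]

patterns = mine_frequent_patterns(docs, min_support=2, max_size=2)
model = train_naive_bayes(docs, labels, patterns)
print(predict(["NASA", "Mars", "ESA"], model, patterns))
```

Because each feature is a readable entity combination such as ("Mars", "NASA"), the patterns that fire for a document can be inspected directly, which is the kind of interpretability the description refers to. A real system would likely replace the brute-force counting with a dedicated frequent-pattern miner.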