Improving document classification using domain-specific vocabulary: hybridization of deep learning approach with TFIDF

Author: Kalra, Vandana; Kashyap, Indu; Kaur, Harmeet
Source: International Journal of Information Technology; 2022-01-01, Issue: Preprints, pp. 1-7, 7 pp.
Abstract: Extracting domain keywords from a corpus helps optimize document classification. Specialized vocabularies built only from semantically similar domain keywords are inadequate for understanding the concepts of a specific domain. The proposed approach shows that classification based on a domain-specific vocabulary, formed from the semantically significant, frequently occurring words of a specialized corpus, outperforms traditional classifiers. In addition, a novel weight-computation method assigns each word a rationalized weight that reflects its applicability to a specific domain. The high classification accuracy obtained demonstrates the importance of combining term-document frequency with semantics in word representation, together with rationalized weights, for classification.
Database: Supplemental Index
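
The record does not give the paper's weighting formula, so the following is only a minimal sketch of the standard TF-IDF weighting that the title refers to, not the authors' rationalized weight computation. The function name `tfidf_weights` and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses plain TF-IDF: (term count / document length) * log(N / document frequency).
    This is the textbook formula, assumed here for illustration only.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical toy corpus: two legal-domain documents and one sports document.
docs = [
    "the court ruled on the patent appeal".split(),
    "the patent office rejected the claim".split(),
    "the team won the football match".split(),
]

for doc_weights in tfidf_weights(docs):
    # Print terms from highest to lowest weight for each document.
    print(sorted(doc_weights.items(), key=lambda kv: kv[1], reverse=True))
```

In this sketch a word that occurs in every document, such as "the", receives a weight of zero, while words confined to fewer documents score higher. A domain-specific scheme like the one the abstract describes would additionally have to account for semantics and domain applicability, which plain TF-IDF does not.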