Authors: |
Kalra, Vandana, Kashyap, Indu, Kaur, Harmeet |
Source: |
International Journal of Information Technology; 20220101, Issue: Preprints p1-7, 7p |
Abstract: |
Extracting domain keywords from a corpus helps optimize document classification. Specialized vocabularies built only from semantically similar domain keywords are inadequate for understanding concepts in a specific domain. The proposed paradigm demonstrates that a domain-specific vocabulary formed from semantically significant, frequently occurring words of the specialized corpus outperforms traditional classifiers in classification. In addition, a novel weight-computation methodology assigns each word a rationalized weight that indicates its applicability to a specific domain. The high accuracy obtained in classifying documents demonstrates the importance of combining term-document frequency with semantics in word representation, using rationalized weights, for classification. |
Database: |
Supplemental Index |