Abstract: |
In information technology (IT) support and helpdesk transcripts, most of the data of interest, such as the issue of concern, its severity, the context, and the system status, is not provided in a structured form. Moreover, these transcripts have special traits, namely a product-issue orientation, implicit background knowledge, and off-topic dialogue, that call for a domain-specialized approach to knowledge extraction. Accordingly, this study analyzes the specific domain requirements and proposes a novel solution based on natural language processing (NLP). At its core, the approach uses an adapted term frequency-inverse document frequency (TF-IDF) algorithm that adds a new parameter reflecting the term’s priority in the text. Experimental results show that the proposed NLP-based solution performs reasonably well in topic categorization, with an accuracy of 92.8%. In keyword extraction, the proposed approach achieves an accuracy of 93.4%, outperforming the classic TF-IDF method and underscoring the importance of extracting and accommodating domain-specific knowledge. [ABSTRACT FROM AUTHOR] |
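
The abstract does not state the formula for the adapted TF-IDF, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the priority parameter enters as a multiplicative weight on the classic TF-IDF score. The function names `priority_tfidf` and `top_keywords`, the `priority` dictionary, and the smoothed IDF term are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def priority_tfidf(docs, priority=None, default_priority=1.0):
    """Score each term in each document as tf * idf * priority.

    docs: list of token lists (one list per transcript).
    priority: hypothetical mapping term -> weight; this multiplicative
        form is an assumption, not the formula from the paper.
    """
    priority = priority or {}
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        doc_scores = {}
        for term, count in counts.items():
            tf = count / total
            # The +1 keeps terms that occur in every document above zero.
            idf = math.log(n_docs / df[term]) + 1.0
            weight = priority.get(term, default_priority)
            doc_scores[term] = tf * idf * weight
        scores.append(doc_scores)
    return scores


def top_keywords(doc_scores, k=3):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(doc_scores, key=lambda t: (-doc_scores[t], t))[:k]


if __name__ == "__main__":
    transcripts = [
        ["printer", "offline", "after", "update", "printer"],
        ["vpn", "timeout", "after", "update"],
        ["thanks", "have", "a", "nice", "day"],
    ]
    # Assumed domain weights: boost product terms, damp small talk.
    weights = {"printer": 2.0, "vpn": 2.0, "thanks": 0.2}
    for doc_scores in priority_tfidf(transcripts, weights):
        print(top_keywords(doc_scores))
```

With all weights left at the default of 1.0, the score reduces to an ordinary (smoothed) TF-IDF, which makes it easy to compare the two rankings on the same transcripts.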