Description: |
E-mail is a widely used form of communication for both business and personal use. The growing volume of e-mail in recent years has made prioritizing and categorizing messages increasingly difficult. Reading through e-mails, categorizing them by topic, and replying is a laborious and time-consuming process. Additionally, assigning an e-mail to a task depends on the reader's subjective understanding, which introduces selection bias. To address these challenges, this paper introduces HTIE, a Hierarchical Task Identification Framework that uses Natural Language Processing (NLP) to analyze e-mails and identify the sales domain-specific entities, tasks, and sub-tasks they contain. The HTIE framework follows the microservices architecture pattern: an orchestrator coordinates the microservices to provide a technology-agnostic and scalable solution. The framework comprises three core groups of microservices: (a) E-mail Pre-processor, (b) Information Extraction, and (c) Task Identification. Finally, we evaluate the HTIE framework on the Enron dataset and show that the generated entity model is 55x smaller and identifies sales domain-specific entities 18% more accurately than the existing pre-trained language model. Among the classification models trained to identify tasks and sub-tasks in e-mails, the Stochastic Gradient Descent (SGD) classifier performs best, with an accuracy of 0.87 and a recall of 0.87. |