Description: |
Natural Language Processing (NLP) techniques are widely used in industry to solve a variety of business problems. In this work, we apply NLP techniques to understand call interactions between customers and customer service representatives and to extract useful insights from these conversations. We focus on understanding the transcripts of these calls, a task that falls under long-document understanding. Existing work on text encoding typically addresses short-form text. Deep learning models such as the vanilla Transformer, BERT, and DistilBERT have achieved state-of-the-art performance on a variety of short-form text tasks but perform poorly on long documents. To address this issue, modifications of the Transformer, such as Longformer and BigBird, have been released. However, all of these models require heavy computational resources, which are often unavailable to small companies operating under budget constraints. To address these concerns, we survey a variety of explainable and lightweight text encoders that can be trained easily in a resource-constrained environment. We also propose Hierarchical Self-Attention based models that outperform DistilBERT, Doc2Vec, and single-layer self-attention networks on downstream tasks such as text classification. The proposed architecture has been put into production at the local industry organization that sponsored this research (SafeAuto Inc.), where it helps the company monitor the performance of its customer service representatives.