Improving Retrieval-based Dialog System by Chat Log Disentanglement based on Message Pair Similarity Estimation

Author: Zhi-Xian Liu, 劉至咸
Year of publication: 2019
Document type: 學位論文 ; thesis
Description: 107
To build a retrieval-based chatbot, we generate question-answer pairs from the chat log. However, question-answer pairs do not appear in order in the chat log: pairs about different topics may interleave with one another. The task of separating mixed messages into distinct conversations is called conversation disentanglement. Most existing research approaches this task by calculating the similarity between two messages. In this work, we find that it is very difficult to predict whether two messages belong to the same conversation from their similarity alone, but that the problem can be solved if the similarity is instead used to predict the reply relation between messages. In addition, we point out that the models in past research cannot handle messages not seen during training and therefore cannot be used in real-world settings. We used the IRC and Reddit datasets for experiments and applied conversation disentanglement to the QNAP chat log. The synthetic Reddit dataset provides additional training data, and the BERT model performs well at predicting reply relationships on this dataset.
Database: Networked Digital Library of Theses & Dissertations
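
The abstract's central idea, predicting which earlier message each message replies to and grouping messages along those reply links, can be illustrated with a minimal sketch. This is not the thesis's implementation: the `jaccard` token-overlap scorer is a toy stand-in for the BERT reply-relation model, and the function names, the `threshold` value, and the sample messages are all assumptions made for illustration.

```python
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Toy stand-in scorer (assumption): token overlap between two messages.

    The thesis uses a BERT model to predict the reply relation; this simple
    measure is only here to keep the sketch self-contained.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def disentangle(
    messages: List[str],
    score: Callable[[str, str], float] = jaccard,
    threshold: float = 0.1,  # hypothetical cut-off for accepting a reply link
) -> List[int]:
    """Assign a conversation id to each message by following reply links.

    Each message is linked to the earlier message with the highest reply
    score. If no earlier message reaches the threshold, the message starts
    a new conversation.
    """
    conv_ids: List[int] = []
    next_id = 0
    for i, msg in enumerate(messages):
        best_j, best_s = -1, threshold
        for j in range(i):
            s = score(messages[j], msg)
            if s >= best_s:  # on ties, prefer the more recent message
                best_j, best_s = j, s
        if best_j < 0:
            conv_ids.append(next_id)  # no plausible parent: new conversation
            next_id += 1
        else:
            conv_ids.append(conv_ids[best_j])  # inherit the parent's conversation
    return conv_ids


# Two interleaved question-answer pairs (invented sample data).
log = [
    "how do I reset my NAS password",
    "is the firmware update out yet",
    "you can reset the password from the admin page",
    "yes the firmware update was released today",
]
print(disentangle(log))
```

Because each message inherits the conversation of its predicted parent, a single reply link is enough to place it, which is why reply prediction can succeed where a same-conversation judgment over arbitrary message pairs is hard.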