Efficient Corpus Creation Method for NLU Using Interview with Probing Questions

Authors: Hiroaki Kokubo, Jinhua She, Rintaro Ikeshita, Masataka Motohashi, Yasunari Obuchi, Takeshi Homma, Kazuaki Shima
Year of publication: 2019
Source: Journal of Advanced Computational Intelligence and Intelligent Informatics, 23:947-955
ISSN: 1883-8014, 1343-0130
DOI: 10.20965/jaciii.2019.p0947
Description: This paper presents an efficient method for building a corpus to train natural language understanding (NLU) modules. Conventional corpus creation methods share a common cycle: a subject is given a specific situation in which they operate a device by voice, and then speaks one utterance to execute the task. With these methods, building a large-scale corpus requires many subjects, which increases both lead time and financial cost. To solve this problem, we propose incorporating a "probing question" into the cycle. Specifically, after a subject speaks one utterance, they are asked to think of alternative utterances that execute the same task. In this way, we obtain many utterances from a small number of subjects. An evaluation of the proposed method applied to interview-based corpus creation shows that it reduces the number of subjects by 41% while maintaining both the morphological diversity of the corpus and its morphological coverage of user utterances spoken to commercial devices. It also reduces the total time spent interviewing subjects by 36% compared with the conventional method. We conclude that the proposed method can build a useful corpus while reducing lead time and financial cost.
Database: OpenAIRE
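The collection cycle described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: a "subject" is modelled as a hypothetical callable that returns a new phrasing for a task (or None when out of ideas), `max_probes=0` stands in for the conventional one-utterance cycle, and whitespace tokenization is an assumed stand-in for a real morphological analyzer. The helpers `interview`, `distinct_ratio`, `coverage`, and `scripted_subject` are illustrative names only, and the two metrics are crude proxies, not the paper's definitions of morphological diversity and coverage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical model of a subject: given a task and the utterances already
# produced for it, return a new utterance, or None if nothing else comes to mind.
Subject = Callable[[str, List[str]], Optional[str]]


@dataclass
class CorpusEntry:
    task: str
    utterance: str
    subject_id: int


def interview(subject_id: int, subject: Subject, tasks: List[str],
              max_probes: int) -> List[CorpusEntry]:
    """Collect utterances from one subject.

    max_probes == 0 mimics the conventional cycle (one utterance per task);
    max_probes > 0 adds probing questions asking for alternative utterances.
    """
    corpus: List[CorpusEntry] = []
    for task in tasks:
        given: List[str] = []
        first = subject(task, given)  # the subject speaks one utterance
        if first is None:
            continue
        given.append(first)
        for _ in range(max_probes):
            # Probing question: "How else could you say this to do the same task?"
            alt = subject(task, given)
            if alt is None or alt in given:
                break  # stop probing once the subject runs out of new phrasings
            given.append(alt)
        corpus.extend(CorpusEntry(task, u, subject_id) for u in given)
    return corpus


def tokens(text: str) -> List[str]:
    # Assumption: whitespace tokenization stands in for a morphological analyzer.
    return text.lower().split()


def distinct_ratio(corpus: List[CorpusEntry]) -> float:
    """Crude diversity proxy: distinct tokens / total tokens."""
    toks = [t for e in corpus for t in tokens(e.utterance)]
    return len(set(toks)) / len(toks) if toks else 0.0


def coverage(corpus: List[CorpusEntry], user_utterances: List[str]) -> float:
    """Fraction of tokens in real user utterances whose type occurs in the corpus."""
    vocab = {t for e in corpus for t in tokens(e.utterance)}
    ref = [t for u in user_utterances for t in tokens(u)]
    return sum(t in vocab for t in ref) / len(ref) if ref else 0.0


def scripted_subject(phrasings: Dict[str, List[str]]) -> Subject:
    """Toy subject that recalls a fixed list of phrasings per task, in order."""
    def subject(task: str, given: List[str]) -> Optional[str]:
        for p in phrasings.get(task, []):
            if p not in given:
                return p
        return None
    return subject


if __name__ == "__main__":
    s = scripted_subject({"raise volume": ["turn it up", "volume up", "louder please"]})
    conventional = interview(1, s, ["raise volume"], max_probes=0)
    probing = interview(1, s, ["raise volume"], max_probes=5)
    print(len(conventional), len(probing))  # 1 3
    users = ["turn the volume up"]
    print(coverage(conventional, users), coverage(probing, users))  # 0.5 0.75
```

In this toy run, the same single subject yields three utterances instead of one once probing questions are allowed, and the extra phrasings raise the (proxy) coverage of a held-out user utterance. This is the mechanism by which fewer subjects can suffice for a corpus of a given size, though the actual gains depend on real subjects and on the paper's own metrics.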