Active Learning and Multi-label Classification for Ellipsis and Coreference Detection in Conversational Question-Answering
Autor: | Brabant, Quentin, Rojas-Barahona, Lina Maria, Gardent, Claire |
---|---|
Přispěvatelé: | Orange Labs [Lannion], France Télécom, Natural Language Processing : representations, inference and semantics (SYNALP), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS) |
Jazyk: | angličtina |
Rok vydání: | 2021 |
Předmět: | |
Zdroj: | 12th International Workshop on Spoken Dialog System Technology (IWSDS 2021) 12th International Workshop on Spoken Dialog System Technology (IWSDS 2021), Nov 2021, Singapour/Virtual, Singapore |
Popis: | In human conversations, ellipsis and coreference are commonly occurring linguistic phenomena. Although these phenomena are a mean of making human-machine conversations more fluent and natural, only few dialogue corpora contain explicit indications on which turns contain ellipses and/or coreferences. In this paper we address the task of automatically detecting ellipsis and coreferences in conversational question answering. We propose to use a multi-label classifier based on DistilBERT. Multi-label classification and active learning are employed to compensate the limited amount of labeled data. We show that these methods greatly enhance the performance of the classifier for detecting these phenomena on a manually labeled dataset. Comment: Published in IWSDS 2021 |
Databáze: | OpenAIRE |
Externí odkaz: |