Have We Solved The Hard Problem? It’s Not Easy! Contextual Lexical Contrast as a Means to Probe Neural Coherence
Autor: | Wenqiang Lei, Yisong Miao, Runpeng Xie, Bonnie Webber, Meichun Liu, Tat-Seng Chua, Nancy F. Chen |
---|---|
Rok vydání: | 2021 |
Předmět: | |
Zdroj: | Proceedings of the AAAI Conference on Artificial Intelligence. 35:13208-13216 |
ISSN: | 2374-3468 2159-5399 |
DOI: | 10.1609/aaai.v35i15.17560 |
Popis: | Lexical cohesion is a fundamental mechanism for text which requires a pair of words to be interpreted as a certain type of lexical relation (e.g., similarity) to understand a coherent context; we refer to such relations as the contextual lexical relation. However, work on lexical cohesion has not modeled context comprehensively in considering lexical relations due to the lack of linguistic resources. In this paper, we take initial steps to address contextual lexical relations by focusing on the contrast relation, as it is a well-known relation though it is more subtle and relatively less resourced. We present a corpus named Cont 2 Lex to make Contextual Lexical Contrast Recognition a computationally feasible task. We benchmark this task with widely-adopted semantic representations; we discover that contextual embeddings (e.g. BERT) generally outperform static embeddings (e.g. Glove), but barely go beyond 70% in accuracy performance. In addition, we find that all embeddings perform better when CLC occurs within the same sentence, suggesting possible limitations of current computational coherence models. Another intriguing discovery is the improvement of BERT in CLC is largely attributed to its modeling of CLC word pairs co-occurring with other word repetitions. Such observations imply that the progress made in lexical coherence modeling remains relatively primitive even for semantic representations such as BERT that have been empowering numerous standard NLP tasks to approach human benchmarks. Through presenting our corpus and benchmark, we attempt to seed initial discussions and endeavors in advancing semantic representations from modeling syntactic and semantic levels to coherence and discourse levels. |
Databáze: | OpenAIRE |
Externí odkaz: |