Learning Sentence Embeddings for Coherence Modelling and Beyond
Authors: | Charles X. Ling, Jinhang Zhang, Tanner A. Bohn, Yining Hu |
---|---|
Publication year: | 2018 |
Subject: |
FOS: Computer and information sciences
Computer Science - Computation and Language; 021103 operations research; Computer science; 0211 other engineering and technologies; Window (computing); 02 engineering and technology; Coherence (statistics); 010501 environmental sciences; 01 natural sciences; Task (computing); Recurrent neural network; Document structuring; Embedding; Artificial intelligence; Heuristics; Computation and Language (cs.CL); Sentence; Natural language processing; 0105 earth and related environmental sciences |
Source: | RANLP |
DOI: | 10.48550/arxiv.1804.08053 |
Description: | We present a novel and effective technique for performing text coherence tasks while facilitating deeper insights into the data. Despite obtaining ever-increasing task performance, modern deep-learning approaches to NLP tasks often provide users with only the final network decision and no additional understanding of the data. In this work, we show that a new type of sentence embedding learned through self-supervision can be applied effectively to text coherence tasks while serving as a window through which deeper understanding of the data can be obtained. To produce these sentence embeddings, we train a recurrent neural network to take individual sentences and predict their location in a document in the form of a distribution over locations. We demonstrate that these embeddings, combined with simple visual heuristics, can be used to achieve performance competitive with the state of the art on multiple text coherence tasks, outperforming more complex and specialized approaches. Additionally, we demonstrate that these embeddings can provide insights useful to writers for improving writing quality and informing document structuring, and can assist readers in summarizing and locating information. Comment: Accepted for publication at RANLP 2019. 8 pages (10 with references), 4 figures in the main text. This version contains significant improvements in the algorithm and reports on a wider set of applications. |
Database: | OpenAIRE |
External link: |
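
The description states that the network predicts each sentence's location "in the form of a distribution over locations". As a minimal sketch of what such a target and a simple ordering heuristic might look like, the Python below maps a sentence index to a position bin, builds a one-hot target over bins, and orders sentences by the expected position of a predicted distribution. The record does not specify any of this: the equal-width binning, the one-hot target, the expected-position heuristic, and all function names are illustrative assumptions, not the authors' method.

```python
from typing import List


def position_bin(index: int, n_sentences: int, n_bins: int) -> int:
    """Map a 0-based sentence index to one of n_bins equal-width bins (assumed scheme)."""
    if not 0 <= index < n_sentences:
        raise ValueError("index out of range")
    return index * n_bins // n_sentences


def target_distribution(index: int, n_sentences: int, n_bins: int) -> List[float]:
    """One-hot training target over position bins (an assumed target form)."""
    target = [0.0] * n_bins
    target[position_bin(index, n_sentences, n_bins)] = 1.0
    return target


def expected_position(distribution: List[float]) -> float:
    """Mean bin index under a predicted distribution over locations."""
    return sum(i * p for i, p in enumerate(distribution))


def order_by_expected_position(distributions: List[List[float]]) -> List[int]:
    """Return sentence indices sorted by expected predicted position (hypothetical heuristic)."""
    return sorted(range(len(distributions)),
                  key=lambda i: expected_position(distributions[i]))
```

For example, under these assumptions `target_distribution(7, 8, 4)` places all mass on the last of four bins, and `order_by_expected_position` applied to the predicted distributions of shuffled sentences yields a candidate ordering.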