Extracting highlights of scientific articles: A supervised summarization approach
Author: | Luca Cagliero, Moreno La Quatra |
---|---|
Language: | English |
Publication year: | 2020 |
Subject: |
0209 industrial biotechnology; Process (engineering); Computer science; General Engineering; Extractive summarization; 02 engineering and technology; Highlight extraction; Regression models; Automatic summarization; Computer Science Applications; Variety (cybernetics); Text mining and analytics; 020901 industrial engineering & automation; Artificial intelligence; Benchmark (surveying); Similarity (psychology); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Natural language processing |
Description: | Scientific articles can be annotated with short sentences, called highlights, that give readers an at-a-glance overview of the main findings. Highlights are usually written manually by the authors. This paper presents a supervised approach, based on regression techniques, with a twofold aim: automatically extracting highlights for past articles with missing annotations, and simplifying the manual annotation of new articles. To this end, regression models are trained on a variety of features extracted from previously annotated articles. The proposed approach extends existing extractive approaches by predicting a similarity score, based on n-gram co-occurrences, between article sentences and highlights. The experimental results, obtained on a benchmark collection of articles covering heterogeneous topics, show that the proposed regression models perform better than existing methods, both supervised and unsupervised. |
Database: | OpenAIRE |
External link: |
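
The description mentions a similarity score, based on n-gram co-occurrences, between article sentences and highlights. As a rough illustration only, the sketch below computes one such score, assuming a ROUGE-N-style recall over whitespace tokens. The record does not specify the exact metric, the tokenization, or the features fed to the regression models, so the function names (`ngrams`, `ngram_recall`, `target_score`) and the sample sentences are hypothetical.

```python
from collections import Counter


def ngrams(text, n):
    """Return a multiset of lowercase word n-grams from whitespace-split text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_recall(sentence, highlight, n=1):
    """ROUGE-N-style recall: share of the highlight's n-grams found in the sentence."""
    ref = ngrams(highlight, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    cand = ngrams(sentence, n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    return overlap / total


def target_score(sentence, highlights, n=1):
    """Assumed regression target: best overlap with any of the article's highlights."""
    return max((ngram_recall(sentence, h, n) for h in highlights), default=0.0)


# Made-up example data, not taken from the paper.
highlights = ["regression models extract highlights"]
sentences = [
    "we train regression models to extract highlights from articles",
    "the dataset covers heterogeneous topics",
]
scores = [target_score(s, highlights) for s in sentences]
best = max(zip(scores, sentences))[1]
```

In a supervised setting of the kind described, scores like these could serve as training targets for a regressor on annotated articles, with the highest-scoring sentences of an unannotated article proposed as candidate highlights.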