Survey on Extractive Text Summarization Methods with Multi-Document Datasets

Author:	P N Varalakshmi K, Jagadish S Kallimani
Year of publication:	2018
Subject:	Artificial neural network, Redundancy (linguistics), Computer science, Discourse analysis, Automatic summarization, Multi-document summarization, Embedding, Artificial intelligence, Paragraph, Cluster analysis, Natural language processing, Sentence
Source:	ICACCI
DOI:	10.1109/icacci.2018.8554768
Description:	Text summarization has long been one of the key research areas in Natural Language Processing (NLP). Methods for summarizing one or more documents can be broadly classified as extractive or abstractive. Extractive summarization selects key parts of the documents and places them in the summary while balancing salience against redundancy. Abstractive summarization generates new sentences that summarize the documents. Extractive summarization can further be performed in a supervised manner, relying on human input, or in an unsupervised manner, without any human intervention. This paper surveys several current methods for extractive text summarization where the input is a set of multiple documents. Multi-document summarization can consider two types of document sets. A homogeneous set contains documents that share a common topic or theme. A heterogeneous set contains documents whose main topics are unrelated but which each contain some information relevant to the summary. The first method uses sentence regression, ranking sentences while also taking the relations between sentences into account, followed by a greedy selection process. The second is an unsupervised paragraph embedding method that uses density peaks clustering. The third proposes document-level reconstruction using a neural document model. The fourth is a query-focused joint neural network model with an attention mechanism. The fifth concentrates on coherence and provides a graph-based model that does not require discourse analysis as a prerequisite. We also look at a way to create a heterogeneous multi-document corpus and at the limitations of each of these methods.
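The description frames extractive summarization as selecting sentences while balancing salience against redundancy, and the first surveyed method ends with a greedy selection step. The following is a minimal Python sketch of that general idea. It uses a Maximal Marginal Relevance (MMR)-style greedy selector, and it is not the sentence-regression method covered in the survey. Several choices here are assumptions: salience is plain bag-of-words cosine similarity to the centroid of all sentences, sentence splitting is naive punctuation-based splitting, and the names `greedy_extract`, `marginal_score` and the weight `lam` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def greedy_extract(documents, max_sentences=3, lam=0.5):
    """Greedy MMR-style extraction over a multi-document set (hypothetical helper).

    salience   = cosine similarity of a sentence to the centroid of all sentences
    redundancy = highest cosine similarity to any already-selected sentence
    lam weighs salience against redundancy (assumed default 0.5).
    """
    # Split every document into sentences at ., ! or ? followed by whitespace.
    sentences = []
    for doc in documents:
        sentences.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip())

    vectors = [Counter(tokenize(s)) for s in sentences]
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)  # sum of all sentence vectors

    def marginal_score(i, selected):
        salience = cosine(vectors[i], centroid)
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
        return lam * salience - (1 - lam) * redundancy

    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < max_sentences:
        best = max(candidates, key=lambda i: marginal_score(i, selected))
        selected.append(best)
        candidates.remove(best)

    # Restore original order so the summary reads in document order.
    return [sentences[i] for i in sorted(selected)]
```

For example, `greedy_extract(["Cats sleep a lot. Dogs bark loudly.", "Cats sleep a lot. Birds sing."], max_sentences=2)` keeps only one copy of the sentence shared by both documents. Raising `lam` favors salience over redundancy, which makes it more likely that near-duplicates from different documents are selected together.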
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::a836cb1a17b6b4c5ed6378319c18c521 https://doi.org/10.1109/icacci.2018.8554768