Abstractive Summarization of Text Document in Malayalam Language: Enhancing Attention Model Using POS Tagging Feature

Author:	Sindhya K. Nambiar, David Peter S., Sumam Mary Idicula
Year of publication:	2023
Subject:	General Computer Science
Source:	ACM Transactions on Asian and Low-Resource Language Information Processing. 22:1-14
ISSN:	2375-4702 2375-4699
DOI:	10.1145/3561819
Description:	Over the past few years, researchers have shown great interest in sentiment analysis and document summarization, primarily because huge volumes of information are available in textual form and this data has proven helpful for real-world applications and challenges. Sentiment analysis of a document helps the user comprehend the content’s emotional intent. Abstractive summarization algorithms generate a condensed version of the text, which can then be used to determine the emotion represented in the text through sentiment analysis. Recent research in abstractive summarization concentrates on neural network-based models rather than conjunctions-based approaches, which might improve overall efficiency. Neural network models such as the attention mechanism have been applied to complex tasks with promising results. The proposed work presents a novel framework that incorporates a part-of-speech (POS) tagging feature into the word embedding layer, whose output is then used as the input to the attention mechanism. With the POS feature as part of the input layer, the framework can handle words that carry contextual and morphological information. POS tagging is relevant here because it relies strongly on the language’s syntactic, contextual, and morphological information. The three main elements of the work are pre-processing, the POS tagging feature in the embedding phase, and its incorporation into the attention mechanism. The word embedding provides the semantic concept of a word, while the POS tags indicate how significant the word is in the context of the content, which corresponds to the syntactic information. The work was carried out in Malayalam, one of the prominent Indian languages. A widely used and accepted English-language dataset was translated into Malayalam for the experiments. The proposed framework achieves a ROUGE score of 28, outperforming the baseline models.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::954bff103d321b4d8f7eabade0729976 https://doi.org/10.1145/3561819
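
The description above says that a POS tagging feature is added at the word embedding layer and that the result feeds the attention mechanism. A minimal sketch of that idea follows. It is not the authors' implementation: the toy vocabularies, the embedding sizes, the use of concatenation to combine the two vectors, the scaled dot-product form of attention, and the names `embed` and `attend` are all assumptions made for illustration.

```python
import math
import random

random.seed(0)

# Hypothetical toy vocabularies; the paper's real vocabulary and tag set
# are not given in this record.
WORD_VOCAB = {"<unk>": 0, "tok_a": 1, "tok_b": 2, "tok_c": 3}
POS_VOCAB = {"<unk>": 0, "NOUN": 1, "VERB": 2}

WORD_DIM, POS_DIM = 4, 2  # assumed sizes, chosen only to keep the example small


def random_table(rows, dim):
    """Stand-in for a trained embedding matrix (random values here)."""
    return [[random.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


word_table = random_table(len(WORD_VOCAB), WORD_DIM)
pos_table = random_table(len(POS_VOCAB), POS_DIM)


def embed(tokens, tags):
    """Concatenate each word vector with the vector of its POS tag."""
    vectors = []
    for token, tag in zip(tokens, tags):
        word_vec = word_table[WORD_VOCAB.get(token, 0)]
        pos_vec = pos_table[POS_VOCAB.get(tag, 0)]
        vectors.append(word_vec + pos_vec)  # list concatenation
    return vectors


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Scaled dot-product attention; returns (weights, context vector)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    context = [
        sum(w * key[i] for w, key in zip(weights, keys))
        for i in range(len(query))
    ]
    return weights, context


tokens = ["tok_a", "tok_b", "tok_c"]
tags = ["NOUN", "NOUN", "VERB"]
inputs = embed(tokens, tags)
# Use the last position as the query, attending over all positions.
weights, context = attend(inputs[-1], inputs)
```

Each input vector here has `WORD_DIM + POS_DIM` components, so the attention scores depend on both the word and its tag. In a trained model the two tables would be learned rather than random.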