An optimized hybrid deep learning model based on word embeddings and statistical features for extractive summarization

Authors: Yaser M. Wazery, Marwa E. Saleh, Abdelmgeid A. Ali
Language: English
Year of publication: 2023
Subject:
Source: Journal of King Saud University: Computer and Information Sciences, Vol 35, Iss 7, Pp 101614- (2023)
Document type: article
ISSN: 1319-1578
DOI: 10.1016/j.jksuci.2023.101614
Description: Extractive summarization has recently gained significant attention as a sentence-level classification problem. Most current summarization methods rely on only one way of representing the sentences in a document (i.e., extracted features, word embeddings, or BERT embeddings). However, classification performance and summary quality will improve if two sentence representations are combined. This paper presents a novel extractive text summarization method based on word embeddings and statistical features of a single document. Each sentence is encoded using a Convolutional Neural Network (CNN) and a Feed-Forward Neural Network (FFNN) based on word embeddings and statistical features. The CNN and FFNN outputs are concatenated, and a Multilayer Perceptron (MLP) classifies the sentence from the combined representation. In addition, the hybrid model parameters are optimized with KerasTuner to determine the most efficient hybrid model. The proposed method was evaluated on the standard Newsroom dataset. Experiments show that the proposed method effectively captures the document’s semantic and statistical information and outperforms deep learning, machine learning, and state-of-the-art approaches, with scores of 78.64, 74.05, and 72.08 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively.
Database: Directory of Open Access Journals
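
The hybrid encoder outlined in the description can be illustrated with a minimal forward-pass sketch in plain Python. It is not the authors' implementation. It assumes that the CNN branch reads the word embeddings and the FFNN branch reads the statistical features, and every size, weight, feature value, and function name below (conv1d_maxpool, dense, score_sentence, and so on) is hypothetical. The weights are random, so the output only demonstrates how the two branches are concatenated and classified, not a trained prediction.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def dense(vec, weights, bias, activation):
    """Fully connected layer: one output unit per row of `weights`."""
    return [
        activation(sum(w * v for w, v in zip(row, vec)) + b)
        for row, b in zip(weights, bias)
    ]


def conv1d_maxpool(embeddings, filters, width):
    """Apply each filter to every window of `width` consecutive word
    vectors, pass the result through ReLU, and max-pool over positions."""
    windows = [
        [x for vec in embeddings[i:i + width] for x in vec]
        for i in range(len(embeddings) - width + 1)
    ]
    return [
        max(relu(sum(w * x for w, x in zip(f, win))) for win in windows)
        for f in filters
    ]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def score_sentence(embeddings, stats, params):
    """Return a probability-like score that the sentence belongs in the summary."""
    cnn_out = conv1d_maxpool(embeddings, params["filters"], params["width"])
    ffnn_out = dense(stats, params["ffnn_w"], params["ffnn_b"], relu)
    merged = cnn_out + ffnn_out  # concatenate the two sentence encodings
    hidden = dense(merged, params["mlp_w"], params["mlp_b"], relu)
    logit = sum(w * h for w, h in zip(params["out_w"], hidden)) + params["out_b"]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid for binary classification


rng = random.Random(0)
EMB_DIM, WIDTH, N_FILTERS, N_STATS = 4, 2, 3, 5  # hypothetical sizes

params = {
    "width": WIDTH,
    "filters": rand_matrix(N_FILTERS, WIDTH * EMB_DIM, rng),
    "ffnn_w": rand_matrix(3, N_STATS, rng),
    "ffnn_b": [0.0] * 3,
    "mlp_w": rand_matrix(4, N_FILTERS + 3, rng),
    "mlp_b": [0.0] * 4,
    "out_w": [rng.uniform(-0.5, 0.5) for _ in range(4)],
    "out_b": 0.0,
}

sentence = rand_matrix(6, EMB_DIM, rng)  # 6 words, one embedding per word
stats = [0.1, 0.8, 0.3, 0.0, 1.0]        # made-up statistical features

prob = score_sentence(sentence, stats, params)
print(f"summary-membership score: {prob:.3f}")
```

In practice such a model would be built and trained in a deep learning framework, with a tuner such as KerasTuner searching over choices like the number of filters, the window width, and the layer sizes that are fixed by hand here.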