Text summarization of News Articles based on named entity recognition using Spacy library

Author: Ibrahim Alshibly, Sabreen Al-Shorfat, Mohammad Otair, Mohammad Shehab, Omar Tarawneh, Mohammad Sh. Daoud
Year of publication: 2023
DOI: 10.21203/rs.3.rs-2688915/v1
Description: The study discusses the importance of summarization for coping with the large amount of data available on the internet. It used a deep-learning algorithm built on functions from the spaCy library in Python to summarize news articles, and it evaluated the effect of named entity recognition (NER) on the summarization process. The method was tested on datasets from CNN-DailyMail and the BBC (entertainment articles), and the proposed NER-based method showed a significant improvement in recall, precision, and F-score over the word frequency method. The CNN-DailyMail articles were longer, averaging 551 words and 28 sentences, while the BBC entertainment articles averaged 190 words and 12 sentences. The NER-based method performed better on the shorter BBC articles, indicating that it is more effective at summarizing shorter texts. In summary, the study shows that named entity recognition can significantly improve the effectiveness of summarization, particularly for shorter texts.
Database: OpenAIRE
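The record does not give the paper's scoring rules, so the following is only a minimal sketch under stated assumptions. It contrasts extractive sentence ranking by normalized word frequency (the baseline the study compares against) with a hypothetical score that counts named-entity mentions per sentence. It also computes a ROUGE-1-style unigram precision, recall, and F-score; the record does not name its metric, so treating it as unigram overlap is an assumption too. The entity list is supplied by hand so the sketch runs without downloading a model. With spaCy, such a list could be taken from `[ent.text for ent in nlp(text).ents]` on a loaded pipeline. The stopword list and all function names are illustrative, not the authors' code.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use a fuller list).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "was", "on", "for", "at", "by", "with"}

def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def tokenize(text):
    """Lowercased word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_frequency_scores(sentences):
    """Baseline: sum of each content word's frequency, normalized by the top frequency."""
    freqs = Counter(w for s in sentences for w in tokenize(s) if w not in STOPWORDS)
    top = max(freqs.values(), default=1)
    return [sum(freqs[w] / top for w in tokenize(s) if w not in STOPWORDS) for s in sentences]

def entity_scores(sentences, entities):
    """Hypothetical NER-based score: occurrences of each entity string in the sentence."""
    return [sum(s.count(e) for e in entities) for s in sentences]

def summarize(text, k, entities=None):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = entity_scores(sentences, entities) if entities else word_frequency_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(ranked))

def unigram_prf(candidate, reference):
    """ROUGE-1-style precision, recall and F1 from clipped unigram overlap."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())
    p = overlap / max(sum(cand.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

text = "Apple opened a store in Paris. The weather was mild. Tim Cook visited Paris on Monday."
entities = ["Apple", "Paris", "Tim Cook", "Monday"]  # hand-listed; a spaCy pipeline would supply these
print(summarize(text, 2, entities))  # Apple opened a store in Paris. Tim Cook visited Paris on Monday.
```

Here the entity scorer drops the entity-free sentence about the weather. Because `str.count` matches substrings, an entity such as "Paris" would also match inside "Parisian"; a real pipeline would compare entity spans instead.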