Leveraging DistilBERT for Summarizing Arabic Text: An Extractive Dual-Stage Approach
Author: | Abdullah Alshanqiti, Sami S. Albouq, Aeshah Alsughayyir, Abdul Rehman Gilal, Aisha Mousa Mashraqi, Abdallah Namoun |
---|---|
Year of publication: | 2021 |
Subject: | General Computer Science; Computer science, Feature extraction, General Engineering, Punctuation, Automatic summarization, Task analysis, Redundancy (engineering), General Materials Science, Quality (business), The Internet, Artificial intelligence, Electrical and Electronic Engineering, Natural language processing, Coherence (linguistics) |
Source: | IEEE Access, vol. 9, pp. 135594-135607 |
ISSN: | 2169-3536 |
DOI: | 10.1109/access.2021.3113256 |
Description: | To tackle textual information overload, with an exponentially growing volume of redundant text on the Internet, this paper investigates a solution based on Automatic Text Summarization (ATS). ATS aims to help readers (e.g., online readers) obtain a condensed version of a text, saving the time and effort needed to skim a large body of text. However, ATS is considered one of the most complex NLP applications, particularly for Arabic, which has received less NLP development than Indo-European languages. We therefore present ArDBertSum, an extractive summarizer for Arabic text built on the DistilBERT model. In addition, we propose SCSAR, a domain-specific sentence-clause segmenter that helps ArDBertSum further shorten long or complex sentences. Our experiments show that ArDBertSum achieves the best performance among the non-heuristic Arabic summarizers compared, producing candidate summaries of acceptable quality. The experiments were conducted on the EASC dataset (along with our proposed dataset) and report (1) a statistical evaluation using ROUGE metrics and (2) a dedicated human evaluation. The human evaluation revealed promising perceptions; however, further work is needed to improve the coherence and punctuation of the automatic summaries. |
Database: | OpenAIRE |
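The abstract reports a statistical evaluation using ROUGE metrics. As a minimal illustration of what ROUGE-1 measures (clipped unigram overlap between a candidate summary and a reference summary), the sketch below computes recall, precision, and F1. It is not the authors' evaluation code: the helper name `rouge_1` and the plain whitespace tokenization are assumptions, and a real Arabic evaluation would typically also need text normalization (e.g., diacritics, clitic handling), which this sketch omits.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Hypothetical helper: ROUGE-1 recall, precision and F1 over whitespace tokens."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    # Clipped overlap: a unigram counts at most min(count in candidate, count in reference).
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    # F1 is the harmonic mean of precision and recall (0 when nothing overlaps).
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Toy English pair; five of the six unigrams in each text match.
    print(rouge_1("the cat sat on the mat", "the cat is on the mat"))
```

On the toy pair, recall, precision, and F1 all come out to 5/6 (about 0.833), because "the" (twice), "cat", "on", and "mat" match while "sat" and "is" do not.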