Abstract:
Introduction
In recent years, the rapid growth of textual data has made it necessary to extract concise, valuable summaries from this massive volume of data. Because of the sheer amount of data, manual text summarization is very time-consuming and practically impossible, so artificial intelligence for text summarization has become one of the essential branches of text mining and natural language processing. Among existing methods, those that select meaningful sentences based on their role in sparsely reconstructing the other sentences have obtained the best results. These methods have two main terms: a reconstruction term, which models the reconstruction of each sentence by the others with the L2 norm, and a regularization term, which controls the sparseness of the reconstruction coefficients with the group-sparse norm. This sparseness allows only a limited number of sentences to participate in the reconstruction of each sentence. A reconstruction function based on the L2 norm forces all keywords to play an equal role in sentence reconstruction, so outlier words may change the resulting summary. Therefore, to improve the quality of the summary, in this article we rewrite the penalty function with the L1 norm. This substitution yields a sparse reconstruction error: due to the sparseness property of the L1 norm, the reconstruction error for most words is close to zero, while for some (outlier) words the error is allowed to be large, which reduces the method's sensitivity to outlier words. Implementation results show that the proposed method provides faster and higher-quality summaries than the previous methods under the ROUGE and F-measure criteria.

Material and Methods
In this paper, we introduce a new loss function for text summarization from a sparse-representation viewpoint.
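The sensitivity argument above can be seen in a toy computation (a minimal sketch; the residual values below are invented purely for illustration): a single outlier word dominates the squared-L2 reconstruction loss, while the L1 loss grows only linearly in the outlier's magnitude.

```python
import numpy as np

# Hypothetical per-word reconstruction residuals for one sentence.
# The first three words are reconstructed almost exactly; the last
# entry is an "outlier" word with a large error.
residual = np.array([0.05, -0.02, 0.01, 10.0])

l2_sq = np.sum(residual ** 2)   # squared-L2 loss, as in the model of [5]
l1 = np.sum(np.abs(residual))   # L1 loss, as proposed here

# The outlier contributes 100.0 of the 100.003 squared-L2 loss, but
# only 10.0 of the 10.08 L1 loss, so it dominates the L2 objective
# while the L1 objective remains comparatively balanced.
print(l2_sq, l1)  # → 100.003 10.08
```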
Results and Discussion
The method in [5] used the following model for text summarization:

$\min_{W} \sum_{i=1}^{n} \|s_i - S w_i\|_2^2 + \lambda \|W\|_{2,1} \quad \text{s.t.} \quad \mathrm{diag}(W) = 0,\ W \ge 0,$

where $S \in \mathbb{R}^{m \times n}$ is the term-by-sentence matrix whose $i$-th column $s_i$ represents the $i$-th sentence, and $w_i$ is the $i$-th column of $W$. The first term is the sum of the errors of reconstructing each sentence from the others, and the second term controls the sparseness of the reconstruction coefficients. After solving this minimization problem, the sentences are ranked by the norms of the rows of the matrix $W$. A reconstruction function based on the L2 norm forces all keywords to play an equal role in sentence reconstruction, so outlier words may change the resulting summary. Therefore, to improve the quality of the summary, we rewrite the penalty function with the L1 norm as follows:

$\min_{w_i} \|s_i - S w_i\|_1 + \lambda \|w_i\|_1, \quad i = 1, \dots, n.$

Due to the sparseness property of the L1 norm, the reconstruction error for most words is close to zero, while for some (outlier) words the error is allowed to be large, which reduces the method's sensitivity to outlier words. To evaluate the performance of the proposed method, all 115 documents of the DUC 2002 dataset were summarized by both methods, and the results were compared under the F-measure criterion. The results show that the proposed method obtains better summaries than the technique based on the group-sparse method.

Conclusion
The main results of the paper can be summarized as follows:
• Using the L1 norm in the reconstruction term can filter anomalous data in text summarization.
• Experimental results confirm that the proposed method generally gives better results than the group-sparse-norm method.
• Examining the human-written summaries, we found that the sentences appearing in them are not only those with the most keywords; sometimes sentences with fewer keywords are also meaningful and are included in the human summaries.
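Because the objective $\|s_i - S w_i\|_1 + \lambda \|w_i\|_1$ with $w_i \ge 0$ and $(w_i)_i = 0$ is piecewise linear, each per-sentence problem can be posed as a small linear program. The sketch below shows one such formulation; the function name, the toy term-by-sentence matrix, and the value of $\lambda$ are our own illustrative choices and not from the paper, which does not specify a solver.

```python
import numpy as np
from scipy.optimize import linprog

def l1_sparse_weights(S, i, lam=0.1):
    """Sketch of the per-sentence problem
        min_w ||s_i - S w||_1 + lam * ||w||_1
        s.t.  w >= 0, w_i = 0,
    posed as a linear program. Since w >= 0, ||w||_1 = sum(w).
    S is m x n: columns are sentence term vectors (our assumption)."""
    m, n = S.shape
    s = S[:, i]
    # Variables x = [w (n), t (m)]; t bounds |s - S w| elementwise.
    c = np.concatenate([lam * np.ones(n), np.ones(m)])
    # Encode -t <= s - S w <= t as two sets of inequalities.
    A_ub = np.block([[-S, -np.eye(m)],
                     [ S, -np.eye(m)]])
    b_ub = np.concatenate([-s, s])
    # w >= 0, t >= 0; pin the sentence's own coefficient to zero.
    bounds = [(0, 0) if j == i else (0, None) for j in range(n)] \
             + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]

# Toy binary term-by-sentence matrix (4 terms, 3 sentences), invented
# for illustration only.
S = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
w = l1_sparse_weights(S, i=0, lam=0.05)
```

Solving this problem for every $i = 1, \dots, n$ yields the full coefficient matrix, after which sentences can be ranked as in [5], by the norms of the rows of $W$.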
A property of the group-sparse method in [5] is that it tries to cover all the keywords with the selected sentences: if a sentence contains a specific keyword that appears in few other sentences, it becomes a strong candidate for selection. The proposed method does not have this drawback.