Popis: |
With the growth of the internet, documents such as articles, news stories, and web pages are now produced and stored digitally. The rise of platforms where people can publish their own content, such as social media, Twitter, and blogs, has increased the amount of information on the internet enormously. As a result, determining whether a given document contains the information being sought is difficult and time-consuming. Automated document summarization systems reduce the length of a text while keeping its important parts, so a reader can quickly tell whether the text contains the desired information. This study examines graph-based document summarization methods. In addition to the LexRank method, the TextRank algorithm is used with four different similarity measures. Unlike other studies, Longest Common Subsequence (LCS) is used as the measure of similarity between nodes in the TextRank algorithm. Among the similarity measures tested, LCS performed best, achieving ROUGE-1 and ROUGE-2 scores of 0.510 and 0.266 on the English dataset. The same method also achieved ROUGE-1 and ROUGE-2 scores of 0.742 and 0.676 on the Turkish dataset, outperforming the other methods. |
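The idea of using LCS as the edge weight in TextRank can be illustrated with a minimal sketch. This is not the study's implementation: the whitespace tokenization, the normalization of the LCS length by the longer sentence, the damping factor of 0.85, the fixed iteration count, and the function names `lcs_length`, `lcs_similarity`, `textrank`, and `summarize` are all assumptions made for illustration.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(s1: str, s2: str) -> float:
    """Word-level LCS length divided by the longer sentence length (assumed normalization)."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    if not t1 or not t2:
        return 0.0
    return lcs_length(t1, t2) / max(len(t1), len(t2))


def textrank(sentences: List[str], damping: float = 0.85,
             iterations: int = 50) -> List[float]:
    """Score sentences by weighted PageRank over an LCS similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    # Edge weights between every pair of distinct sentences.
    weights = [[0.0 if i == j else lcs_similarity(sentences[i], sentences[j])
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to the weight of the edge j -> i.
            rank = sum(weights[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(sentences: List[str], k: int = 1) -> List[str]:
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k)` on a list of sentence strings returns the `k` sentences that share the most word-order overlap with the rest of the document. Because LCS respects word order, it rewards sentences that repeat phrases in sequence rather than merely sharing a bag of words.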