An Innovative Approach of Bangla Text Summarization by Introducing Pronoun Replacement and Improved Sentence Ranking
Author: | Zerina Begum, Suraiya Pervin, Md. Majharul Haque |
---|---|
Year of publication: | 2017 |
Subject: | Pronoun; Grammar; Computer science; Cosine similarity; Networking & telecommunications; Engineering and technology; Term (logic); Automatic summarization; Bengali; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Artificial intelligence; Hidden Markov model; Software; Natural language processing; Sentence; Information Systems |
Source: | Journal of Information Processing Systems. |
ISSN: | 2092-805X |
DOI: | 10.3745/jips.04.0038 |
Description: | This paper proposes an automatic method for summarizing Bangla news documents. In the proposed approach, pronoun replacement is applied for the first time to minimize dangling pronouns in the summary. After pronoun replacement, sentences are ranked using term frequency, sentence frequency, numerical figures, and title words. If two sentences have at least 60% cosine similarity, the frequency of the larger sentence is increased and the smaller sentence is removed to eliminate redundancy. Moreover, the first sentence is always included in the summary if it contains any title word. In Bangla text, numerical figures can be written in both words and digits, in a variety of forms; all of these forms are identified to assess the importance of sentences. The approach uses a rule-based system together with a hidden Markov model and a Markov chain model. To derive the rules, we analyzed 3,000 Bangla news documents and studied several Bangla grammar books. A series of experiments was performed on 200 Bangla news documents and 600 summaries (3 summaries per document). The evaluation results demonstrate the effectiveness of the proposed technique over the four latest methods. |
Database: | OpenAIRE |
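
The redundancy step described in the abstract (a 60% cosine-similarity threshold, keeping the larger sentence and removing the smaller one) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names `cosine_similarity` and `remove_redundant`, the whitespace tokenization, the use of raw term counts as vector weights, measuring sentence size by token count, and adding the removed sentence's frequency to the kept one are all assumptions; the abstract does not specify any of them.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def remove_redundant(sentences, threshold=0.6):
    """Drop the smaller sentence of every pair at or above the threshold.

    Returns (kept_indices, frequency), where frequency[i] starts at 1 and
    grows when sentence i absorbs a near-duplicate.
    """
    # Assumption: whitespace tokenization stands in for real Bangla tokenization.
    tokens = [s.split() for s in sentences]
    frequency = [1] * len(sentences)
    removed = set()
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if i in removed or j in removed:
                continue
            if cosine_similarity(tokens[i], tokens[j]) >= threshold:
                # Keep the longer sentence by token count; ties keep the earlier one.
                if len(tokens[i]) >= len(tokens[j]):
                    keep, drop = i, j
                else:
                    keep, drop = j, i
                # Assumption: the kept sentence inherits the dropped one's frequency.
                frequency[keep] += frequency[drop]
                removed.add(drop)
    kept = [k for k in range(len(sentences)) if k not in removed]
    return kept, frequency
```

In this sketch the returned frequency values would feed into whatever sentence-ranking score is used downstream; how the paper combines them with term frequency, numerical figures, and title words is not stated in the abstract.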