Dhaka University Institutional Repository

A new approach of bangla news document summarization

dc.contributor.author Haque, Md. Majharul
dc.date.accessioned 2019-10-30T07:07:22Z
dc.date.available 2019-10-30T07:07:22Z
dc.date.issued 2018-09-24
dc.identifier.uri http://localhost:8080/xmlui/handle/123456789/926
dc.description This thesis was submitted to the Department of Computer Science & Engineering of the Faculty of Engineering and Technology at the University of Dhaka in partial fulfillment of the requirements for the degree of Doctor of Philosophy. en_US
dc.description.abstract The objective of this research is to propose a new method for automatic Bangla news document summarization. Existing English text summarization systems may not be directly applicable to Bangla because of the complexities of the Bangla language in its grammatical rules, sentence structure, different placement of subject and object, etc. Moreover, research on Bangla language processing is difficult because there is hardly any automated tool to facilitate it. In this challenging situation, a new approach to Bangla news document summarization is presented here, introducing pronoun replacement and an improved version of sentence ranking. The major parts of this approach are (i) preprocessing the input document, (ii) word tagging, (iii) pronoun replacement, and (iv) sentence ranking. Pronoun replacement has been performed here for the first time to minimize dangling pronouns in the summary. After pronouns are replaced, sentences are ranked by considering (i) term frequency, (ii) sentence frequency, (iii) numerical figures (presented in words and digits), and (iv) title words. If two sentences have at least 60% cosine similarity, the frequency of the larger sentence is increased and the smaller sentence is removed, which eliminates redundancy. Moreover, the first sentence receives special consideration when it contains any title word. Numerical figures are identified from both words and digits to assess the importance of sentences despite the variety of forms a numerical figure can take in Bangla. To achieve the goal of the proposed method, 3000 news documents have been analyzed and some Bangla grammar books have been studied. The effect of each incorporated feature has been demonstrated with a step-by-step performance analysis.
From the evaluation results of the proposed method, the F-measure scores for ROUGE-1 and ROUGE-2 are 0.6003 and 0.5708 respectively, and the accuracy of pronoun replacement is 71.80%. The proposed method reduces dangling pronouns in the summary by 89.75% compared with the latest Bangla text summarization system. The text summarization performance of the proposed method is 9.39% (based on ROUGE-1 F-measure score) and 12.52% (based on ROUGE-2 F-measure score) better than the latest existing method. en_US
dc.language.iso en en_US
dc.publisher University of Dhaka en_US
dc.title A new approach of bangla news document summarization en_US
dc.type Thesis en_US
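
The redundancy step mentioned in the abstract (two sentences with at least 60% cosine similarity are merged by keeping the larger one and raising its sentence frequency) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the use of raw term counts as vectors, token count as the measure of sentence size, an increment of exactly 1, and the English stand-in tokens are all assumptions made for illustration. Real Bangla input would first need the preprocessing and tagging steps the abstract lists.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two token lists, using raw term counts (assumption)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_redundant(sentences, threshold=0.6):
    """Drop the shorter sentence of each similar pair; credit the longer one.

    sentences: list of token lists.
    Returns a list of (tokens, sentence_frequency) for the surviving sentences.
    The 0.6 threshold is the 60% figure from the abstract; incrementing the
    frequency by 1 is an assumption of this sketch.
    """
    freq = [1] * len(sentences)
    removed = set()
    for i in range(len(sentences)):
        if i in removed:
            continue
        for j in range(i + 1, len(sentences)):
            if j in removed:
                continue
            if cosine_similarity(sentences[i], sentences[j]) >= threshold:
                # Keep the sentence with more tokens; ties keep the earlier one.
                if len(sentences[j]) > len(sentences[i]):
                    keep, drop = j, i
                else:
                    keep, drop = i, j
                freq[keep] += 1
                removed.add(drop)
                if drop == i:
                    break  # sentence i is gone; stop comparing it
    return [(s, freq[k]) for k, s in enumerate(sentences) if k not in removed]


if __name__ == "__main__":
    docs = [
        "the bank raised rates today".split(),
        "the central bank raised rates today".split(),
        "rain is expected tomorrow".split(),
    ]
    for tokens, sf in merge_redundant(docs):
        print(sf, " ".join(tokens))
```

In this toy input the first two sentences share five of their tokens, so the shorter one is removed and the longer one ends with a sentence frequency of 2, which a ranking step could then use as one of its features.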

