Author: |
Saptarshi Paul, Bipul Shyam Purkhyastha |
Language: |
English |
Year of publication: |
2022 |
Source: |
Journal of King Saud University: Computer and Information Sciences, Vol 34, Iss 8, Pp 5030-5044 (2022) |
Document type: |
article |
ISSN: |
1319-1578 |
DOI: |
10.1016/j.jksuci.2021.01.016 |
Description: |
Capable machine translation (MT) systems built with statistical (SMT) and neural (NMT) approaches are in regular use for Bengali and other Indian languages. The performance of an MT system is governed by its domain knowledge, which is derived directly from the parallel corpus used to train the model. In the last few years, systems using various NMT models have achieved spectacular results, and organizations such as Google and Microsoft have shifted from SMT to NMT models. In this paper, we compare an implementation for the previously unexplored aviation domain with standard domains whose corpora were downloaded from TDIL (https://tdil.meity.gov.in/), and we also examine the impact of the post-processing tools on the TDIL Tourism corpus. The implementation was carried out using OpenNMT. An English-to-Bengali aviation parallel corpus was developed and used together with multiple pre-processing and post-processing tools to obtain the desired results. The post-processing tools developed for the aviation domain were later applied to the TDIL Tourism corpus to test their effectiveness on a non-aviation but similar corpus. The result analysis compares the BLEU scores of the aviation domain and of the Tourism domain before and after the application of the pre-processing and post-processing tools. |
Database: |
Directory of Open Access Journals |