Korean Historical Documents Analysis with Improved Dynamic Word Embedding
Author: | JeongA Wi, Kyohoon Jin, Kyeongpil Kang, Young-bin Kim |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
Word embedding; 010504 meteorology & atmospheric sciences; Machine translation; Computer science; named entity recognition; computer.software_genre; 01 natural sciences; deep-learning; lcsh:Technology; Task (project management); lcsh:Chemistry; Named-entity recognition; 0103 physical sciences; General Materials Science; 010303 astronomy & astrophysics; Instrumentation; lcsh:QH301-705.5; 0105 earth and related environmental sciences; Fluid Flow and Transfer Processes; dynamic word embedding; business.industry; lcsh:T; Process Chemistry and Technology; Deep learning; General Engineering; historical documents; Thesaurus; neural machine translation; lcsh:QC1-999; Computer Science Applications; Annals; lcsh:Biology (General); lcsh:QD1-999; lcsh:TA1-2040; Key (cryptography); transformer; Artificial intelligence; business; lcsh:Engineering (General). Civil engineering (General); computer; Natural language processing; lcsh:Physics |
Source: | Applied Sciences, Volume 10, Issue 21, p 7939 (2020) |
ISSN: | 2076-3417 |
Description: | Historical documents are records or books that provide textual information about the thoughts and consciousness of past civilisations and therefore have historical significance. These documents are key sources for historical studies because they provide information spanning several historical periods. Many studies have analysed historical documents using deep learning; however, studies that exploit changes in information over time are lacking. In this study, we propose a deep-learning approach using improved dynamic word embedding to determine the characteristics of the 27 kings mentioned in the Annals of the Joseon Dynasty, which contains a record of 500 years. The characteristics of words for each king were quantified based on dynamic word embedding; further, this information was applied to named entity recognition and neural machine translation. In experiments, we confirmed that the proposed method performed better than other methods. In the named entity recognition task, the F1-score was 0.68; in the neural machine translation task, the BLEU4 score was 0.34. We demonstrated that this approach can be used to extract information about diplomatic relationships with neighbouring countries and the economic conditions of the Joseon Dynasty. |
Database: | OpenAIRE |
External link: |