Towards Automatic Modelling of Thematic Domains of a National Literature: Technical Issues in the Case of Russian

Authors: Tatiana Sherstinova, Anna Moskvina, Margarita A. Kirina
Language: English
Year of publication: 2021
Subject:
Source: Proceedings of the XXth Conference of Open Innovations Association FRUCT, Vol 29, Iss 1, Pp 313-323 (2021)
Document type: article
ISSN: 2305-7254
2343-0737
DOI: 10.23919/FRUCT52173.2021.9435451
Description: A significant part of modern technologies associated with the development of artificial intelligence systems and digital analytics of diverse data relies on methods of computer text processing (NLP, speech technologies). However, NLP methods are applied primarily to specialized texts, such as scientific literature, technical documentation, and news, or to social media discourse; fiction is usually left out of the focus of NLP practitioners, as the fictional world seems to be of less significance or less informational value from a practical point of view. Moreover, due to the poetic and metaphorical nature of literary texts, applying some NLP methods (e.g., topic modeling) to fiction analysis has turned out to be more complicated. At the same time, the influence of literature both on the consciousness of individuals and on the formation of social values can hardly be overestimated. Besides, making computers understand fiction in a way similar to how humans do would be a real challenge for artificial intelligence. The article puts forward the idea of modeling thematic areas of literature on a national scale, which should reveal the main thematic domains of a national literature as a whole. This will allow a better understanding of the cultural traits of the national consciousness in a given historical period and contribute to both literary studies and practical tasks. Methodological approaches to determining and modeling themes of literary works are considered, technical difficulties arising in the process are described, and ways to solve them are suggested. The proposed methodology has been implemented in the design of the Russian short stories corpus (the first third of the 20th century) and can be applied in the development of artificial intelligence systems that process large volumes of literary texts in any language.
Database: Directory of Open Access Journals