A low-resource Lao text regularization task based on BiLSTM.

Authors: WANG Jian, JIANG Lin, WANG Lin-qin, YU Zheng-tao, ZHANG Song, GAO Sheng-xiang
Source: Computer Engineering & Science / Jisuanji Gongcheng yu Kexue; Jul 2023, Vol. 45, Issue 7, pp. 1292-1299, 8 pp.
Abstract: Text normalization (TN) is an indispensable step in the front-end text analysis of speech synthesis. Lao text normalization converts non-standard words (NSW) in Lao text into spoken-form words (SFW). Text normalization has not previously been carried out for Lao, and it faces three main problems: training data are difficult to acquire, the language has diverse forms of expression, and the normalization of some NSW is ambiguous. This work carries out a text normalization task for Lao. The task is framed as sequence tagging, and neural networks predict ambiguous NSW from their context. A corpus for the Lao text normalization task is constructed, the model's outputs are predicted by a neural network, a self-attention mechanism is added to strengthen the relationships between characters in the sequence, and different strategies for introducing a pre-trained language model are explored. An accuracy of 67.59% is achieved on the test set. [ABSTRACT FROM AUTHOR]
Database: Complementary Index