Designing Human-Robot Communication in the Indonesian Language Using the Deep Bidirectional Long Short-Term Memory Algorithm

Autor: Suci Dwijayanti, Ahmad Reinaldi Akbar, Bhakti Yudho Suprapto
Jazyk: angličtina
Rok vydání: 2024
Předmět:
Zdroj: Jurnal Elektronika dan Telekomunikasi, Vol 24, Iss 1, Pp 1-11 (2024)
Druh dokumentu: article
ISSN: 1411-8289
2527-9955
DOI: 10.55981/jet.595
Popis: Humanoid robots closely resemble humans and engage in various human-like activities while responding to queries from their users, facilitating two-way communication between humans and robots. This bidirectional interaction is enabled through the integration of speech-to-text and text-to-speech systems within the robot. However, research on two-way communication systems for humanoid robots utilizing speech-to-text and text-to-speech technologies has predominantly focused on the English language. This study aims to develop a real-time two-way communication system between humans and a robot, with data collected from ten respondents, including eight males and two females. The sentences used adhere to the standard rules of the Indonesian language. The speech-to-text system employs a deep bidirectional long short-term memory algorithm, coupled with feature extraction via the Mel frequency cepstral coefficients, to convert spoken language into text. Conversely, the text-to-speech system utilizes the Python pyttsx3 module to translate text into spoken responses delivered by the robot. The results indicate that the speech-to-text model achieves a high level of accuracy under quiet-room conditions, with noise levels ranging from 57.5 to 60 dB, boasting an average word error rate (WER) of 24.99% and 25.31% for speakers within and outside the dataset, respectively. In settings with engine noise and crowds, where noise levels range from 62.4 to 86 dB, the measured WER is 36.36% and 36.96% for speakers within and outside the dataset, respectively. This study demonstrates the feasibility of implementing a two-way communication system between humans and a robot, enabling the robot to respond to various vocal inputs effectively.
Databáze: Directory of Open Access Journals