Using a Bidirectional LSTM Neural Network for Czech Phonetic Transcription

Authors:	Markéta Jůzová, Jakub Vít
Language:	English
Year of publication:	2019
Subject:	Czech; Phrase; Artificial neural network; Computer science; Speech recognition; Phonetic transcription; Grapheme; grapheme-to-phoneme conversion; sequence-to-sequence neural networks; encoder-decoder model; Czech phonetic transcription; Pronunciation; Autoencoder; Transcription (linguistics); 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 020206 networking & telecommunications
Source:	Text, Speech, and Dialogue ISBN: 9783030279462 TSD
Description:	A crucial part of almost all current TTS systems is grapheme-to-phoneme (G2P) conversion, i.e. the transcription of any input grapheme sequence into the correct sequence of phonemes in the given language. Unfortunately, preparing transcription rules and pronunciation dictionaries for new languages in TTS systems is not easy. For that reason, this paper focuses on creating an automatic G2P model based on neural networks (NN). In contrast to most related work in the G2P field, which uses only separate words as input, we take a whole phrase as the input of our proposed NN model. In our opinion, this approach should lead to more precise phonetic transcription because the pronunciation of a word can depend on the surrounding words. The results of the trained G2P model are presented for Czech, where cross-word-boundary phenomena such as voicing assimilation occur quite often, and are compared with a rule-based approach.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::702f610b27940105973f8cbb465e9647 http://hdl.handle.net/11025/36610