Autor: |
Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai |
Jazyk: |
angličtina |
Rok vydání: |
2021 |
Předmět: |
|
Zdroj: |
IEEE Access, Vol 9, Pp 94923-94933 (2021) |
Druh dokumentu: |
article |
ISSN: |
2169-3536 |
DOI: |
10.1109/ACCESS.2021.3089565 |
Popis: |
This paper investigates a real-time neural speech synthesis system on CPUs that can synthesize high-fidelity 48 kHz speech waveforms to cover the entire frequency range audible by human beings. Although most previous studies on 48 kHz speech synthesis have used traditional source-filter vocoders or a WaveNet vocoder for waveform generation, they have some drawbacks regarding synthesis quality or inference speed. LPCNet was proposed as a real-time neural vocoder with a mobile CPU but its sampling frequency is still only 16 kHz. In this paper, we propose a Full-band LPCNet to synthesize high-fidelity 48 kHz speech waveforms with a CPU by introducing some simple but effective modifications to the conventional LPCNet. We then evaluate the synthesis quality using both normal speech and a singing voice. The results of these experiments demonstrate that the proposed Full-band LPCNet is the only neural vocoder that can synthesize high-quality 48 kHz speech waveforms while maintaining real-time capability with a CPU. |
Databáze: |
Directory of Open Access Journals |
Externí odkaz: |
|