Author: |
Miao, Xiaoxiao; McLoughlin, Ian; Song, Yan |
Subject: |
|
Source: |
Circuits, Systems & Signal Processing; Jul 2021, Vol. 40, Issue 7, pp. 3621-3638 (18 pages) |
Abstract: |
This paper proposes novel features for automated language and dialect identification that aim to improve discriminative power by ensuring that each element of the feature vector makes a normalised contribution to inter-class variance. The method first computes inter- and intra-class frequency variance statistics and then distributes the overall spectral variance across spectral regions that are sized to contain a near-equal variance difference. Spectral features are average-pooled within regions to obtain variance-normalised features (VNFs). The proposed VNFs are low-complexity drop-in replacements for MFCC, SDC, PLP or other input features used for speech-related tasks. In this paper, they are evaluated against MFCCs in three types of system, for two data-constrained language and dialect identification tasks. VNFs demonstrate good results, comfortably outperforming MFCCs at most dimension sizes and yielding particularly good performance for the most challenging data-constrained condition, the 3 s utterance length in the language identification (LID) task. [ABSTRACT FROM AUTHOR] |
Database: |
Complementary Index |
External link: |
|
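
Illustrative note: the abstract describes the VNF pipeline only in outline (per-frequency inter- and intra-class variance statistics, spectral regions sized to hold near-equal variance difference, average pooling within regions). The following is a minimal sketch of that outline under stated assumptions, not the authors' implementation. The function names `vnf_region_edges` and `vnf_pool`, the use of "inter minus intra" as the variance difference, and the shift that keeps the per-bin score positive are all assumptions made for illustration; the paper itself should be consulted for the actual definitions.

```python
import numpy as np

def vnf_region_edges(spectra, labels, n_regions):
    """Hypothetical helper: choose spectral region boundaries.

    spectra: (n_samples, n_bins) array, e.g. per-utterance mean log-spectra.
    labels:  (n_samples,) array of class ids (languages or dialects).
    Returns strictly increasing bin indices, starting at 0 and ending at n_bins.
    """
    n_samples, n_bins = spectra.shape
    global_mean = spectra.mean(axis=0)
    inter = np.zeros(n_bins)  # between-class variance per frequency bin
    intra = np.zeros(n_bins)  # within-class variance per frequency bin
    for c in np.unique(labels):
        x = spectra[labels == c]
        w = len(x) / n_samples
        inter += w * (x.mean(axis=0) - global_mean) ** 2
        intra += w * x.var(axis=0)

    # Assumption: "variance difference" is taken as inter - intra, shifted so
    # every bin has a strictly positive score.
    score = inter - intra
    score = score - score.min() + 1e-12

    # Place boundaries where the cumulative score crosses k / n_regions.
    cum = np.cumsum(score) / score.sum()
    targets = np.arange(1, n_regions) / n_regions
    inner = np.searchsorted(cum, targets) + 1
    # np.unique drops duplicate boundaries, so fewer regions may be returned
    # when a single bin dominates the score.
    return np.unique(np.concatenate(([0], inner, [n_bins])))

def vnf_pool(frames, edges):
    """Average-pool spectral frames (n_frames, n_bins) within each region."""
    sums = np.add.reduceat(frames, edges[:-1], axis=1)
    return sums / np.diff(edges)
```

In this sketch, the boundaries would be estimated once from labelled training data and then reused to pool every frame, so bins carrying more of the assumed variance difference end up in narrower regions.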