Non-linear speech representation based on local predictability exponents
Authors: | Oriol Pont, Khalid Daoudi, Antonio Turiel, Vahid Khanagha, Hussein Yahia |
---|---|
Contributors: | Geometry and Statistics in acquisition data (GeoStat), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria), Institute of Marine Sciences / Institut de Ciències del Mar [Barcelona] (ICM), Consejo Superior de Investigaciones Científicas [Madrid] (CSIC), Khanagha, Vahid |
Language: | English |
Year of publication: | 2013 |
Subject: | Theoretical computer science; Signal reconstruction; [INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing; Cognitive Neuroscience; Computation; Complex system; Multiscale signal processing; 16. Peace & justice; Computer Science Applications; Nonlinear speech processing; Complex dynamics; Nonlinear system; Cardinality; Artificial Intelligence; Complex signals and system; Predictability; Representation (mathematics); [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing; Mathematics |
Source: | Neurocomputing, Elsevier, 2013, Special issue on Non-Linear Speech Signal Processing; Digital.CSIC. Repositorio Institucional del CSIC |
ISSN: | 0925-2312 |
Description: | 6 pages, 3 figures. Looking for new perspectives on the non-linear dynamics of speech, this paper presents a novel approach based on a microcanonical multiscale formulation, which allows a geometric and statistical description of the multiscale properties of complex dynamics. Speech is a complex system whose dynamics can, to some extent, be accessed geometrically and statistically by computing Local Predictability Exponents (LPEs). These exponents make it possible to determine the most informative subset of the signal (the Most Singular Manifold, or MSM), which leads to an associated compact representation and reconstruction. However, the complex intertwining of different dynamics in speech (in addition to purely turbulent descriptions) suggests defining appropriate multiscale functionals that may influence the evaluation of LPEs and hence lead to a more compact MSM. Consequently, using the classical and generic Sauer/Allebach algorithm for signal reconstruction from irregularly spaced samples, we show that good-quality speech reconstruction can be achieved from an MSM of low cardinality. Moreover, to further demonstrate the potential of the new methodology, we develop a simple and efficient waveform coder that achieves almost the same perceptual quality as a standard coder at a lower bit rate. © 2013 Elsevier B.V. This work was funded by the INRIA CORDIS doctoral program. |
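The reconstruction step described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the per-sample LPE values are already available (the paper's LPE estimator is not reproduced), treats the MSM as a chosen fraction of samples with the lowest exponents (lower taken to mean more singular), and uses piecewise-linear interpolation plus an FFT low-pass projection as the two operators of the Sauer/Allebach iteration. The helper names `select_msm`, `lowpass` and `sauer_allebach`, as well as the cutoff and iteration count, are hypothetical.

```python
import numpy as np


def select_msm(exponents, fraction):
    """Hypothetical helper: return sorted indices of the `fraction` of samples
    with the lowest exponent values (assumed here to be the most singular ones)."""
    exponents = np.asarray(exponents, dtype=float)
    k = max(2, int(round(fraction * exponents.size)))
    return np.sort(np.argsort(exponents)[:k])


def lowpass(x, cutoff):
    """Orthogonal projection onto signals whose DFT is zero above `cutoff`
    (in cycles per sample), implemented by masking the real FFT."""
    spectrum = np.fft.rfft(x)
    spectrum[np.fft.rfftfreq(x.size) > cutoff] = 0.0
    return np.fft.irfft(spectrum, n=x.size)


def sauer_allebach(idx, samples, n, cutoff, n_iter=100):
    """Iterative reconstruction from irregularly spaced samples:
    x_{k+1} = x_k + P(A(f - x_k)), where A is piecewise-linear interpolation
    of the residual known at the sample positions and P is the low-pass projection.
    `idx` must be sorted, strictly increasing integer positions in [0, n)."""
    idx = np.asarray(idx)
    samples = np.asarray(samples, dtype=float)
    grid = np.arange(n)
    x = np.zeros(n)
    for _ in range(n_iter):
        residual = samples - x[idx]  # mismatch at the retained samples
        x = x + lowpass(np.interp(grid, idx, residual), cutoff)
    return x
```

For example, with `idx = select_msm(lpe, 0.2)` for a hypothetical array `lpe` of per-sample exponents, `sauer_allebach(idx, frame[idx], len(frame), cutoff)` returns a reconstructed frame. Piecewise-linear interpolation is only one possible choice of interpolation operator. Convergence is guaranteed when the gaps between retained samples are small relative to the band limit, which an MSM extracted from speech need not satisfy, so this sketch illustrates the mechanism rather than reproducing the paper's results.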
Database: | OpenAIRE |