Filterbank design for end-to-end speech separation
Authors: | Emmanuel Vincent, Manuel Pariente, Antoine Deleforge, Samuele Cornell |
---|---|
Contributors: | Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS); Università Politecnica delle Marche [Ancona] (UNIVPM); Grid5000 |
Language: | English |
Year of publication: | 2020 |
Subject: |
Signal Processing (eess.SP); FOS: Computer and information sciences; Masking (art); Sound (cs.SD); Computer Science - Machine Learning; Computer science; Speech recognition; 02 engineering and technology; Computer Science - Sound; Machine Learning (cs.LG); Set (abstract data type); 030507 speech-language pathology & audiology; 03 medical and health sciences; End-to-end principle; [STAT.ML] Statistics [stat]/Machine Learning [stat.ML]; Audio and Speech Processing (eess.AS); FOS: Electrical engineering, electronic engineering, information engineering; 0202 electrical engineering, electronic engineering, information engineering; Electrical Engineering and Systems Science - Signal Processing; Short-time Fourier transform; 020206 networking & telecommunications; Filter bank; Speaker recognition; [INFO.INFO-SD] Computer Science [cs]/Sound [cs.SD]; 0305 other medical science; Electrical Engineering and Systems Science - Audio and Speech Processing |
Source: | ICASSP 2020 - 45th International Conference on Acoustics, Speech, and Signal Processing, May 2020, Barcelona, Spain |
Description: | Single-channel speech separation has recently made great progress thanks to learned filterbanks such as the one used in ConvTasNet. In parallel, parameterized filterbanks, in which only the center frequencies and bandwidths are learned, have been proposed for speaker recognition. In this work, we extend real-valued learned and parameterized filterbanks into complex-valued analytic filterbanks and define a set of corresponding representations and masking strategies. We evaluate these filterbanks on a newly released noisy speech separation dataset (WHAM). The results show that the proposed analytic learned filterbank consistently outperforms the real-valued filterbank of ConvTasNet. We also validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions. Finally, we show that the STFT achieves its best performance with 2 ms windows. |
Database: | OpenAIRE |
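The description names three ingredients: analytic (complex-valued) filters, parameterized filters in which only the center frequency and bandwidth are learned, and the representations computed with them. The Python sketch below illustrates these ideas under stated assumptions and is not the authors' implementation. It assumes that the imaginary part of each filter is the discrete Hilbert transform of its real part (the FFT-based construction commonly used to build analytic signals). It also assumes that a parameterized filter follows a SincNet-style Hamming-windowed difference of two sinc low-pass filters, parameterized here by lower cutoff and bandwidth, which is equivalent to center frequency and bandwidth. The names `dft`, `analytic_filter`, `sinc_bandpass` and `encode` are hypothetical, and the filter length (65 taps), sample rate (8 kHz) and stride (16) are illustrative values only.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for the short filters used here."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def analytic_filter(h):
    """Hypothetical helper: return h + j*Hilbert{h} for a real filter h.

    FFT-style construction: keep DC (and Nyquist for even lengths), double the
    positive frequencies, zero the negative ones, then invert.
    """
    n = len(h)
    spec = dft([complex(v) for v in h])
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0
    return dft([s * w for s, w in zip(spec, weights)], inverse=True)


def _ideal_lowpass(fc, t):
    """Ideal low-pass impulse response 2*fc*sinc(2*fc*t), fc in cycles/sample."""
    if t == 0:
        return 2.0 * fc
    return math.sin(2.0 * math.pi * fc * t) / (math.pi * t)


def sinc_bandpass(f_low, bandwidth, length=65, fs=8000.0):
    """Hypothetical SincNet-style band-pass filter.

    Only f_low and bandwidth (Hz) would be learned; the shape is fixed as the
    difference of two ideal low-pass filters times a Hamming window.
    """
    f1 = f_low / fs
    f2 = (f_low + bandwidth) / fs
    mid = (length - 1) / 2
    taps = []
    for i in range(length):
        t = i - mid
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
        taps.append((_ideal_lowpass(f2, t) - _ideal_lowpass(f1, t)) * window)
    return taps


def encode(signal, filters, stride):
    """Strided correlation with (complex) filters: one coefficient list per frame."""
    length = len(filters[0])
    frames = []
    for start in range(0, len(signal) - length + 1, stride):
        chunk = signal[start:start + length]
        frames.append([sum(c * f for c, f in zip(chunk, filt)) for filt in filters])
    return frames


if __name__ == "__main__":
    real_bank = [sinc_bandpass(f, 200.0) for f in (300.0, 1000.0, 2500.0)]
    analytic_bank = [analytic_filter(h) for h in real_bank]
    tone = [math.sin(2.0 * math.pi * 1100.0 * t / 8000.0) for t in range(400)]
    frames = encode(tone, analytic_bank, stride=16)
    # Magnitude representation: |real + j*imag| per filter, first frame.
    magnitudes = [[abs(c) for c in frame] for frame in frames]
    print(magnitudes[0])
```

Taking `abs()` of each complex coefficient gives a magnitude representation, while the real and imaginary parts together retain phase information. A masking network could operate on either, but this record does not state which combinations the paper adopts. The naive DFT keeps the sketch dependency-free; a real system would use an FFT library and learn the filter parameters by gradient descent.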