Search results - "Bai, Mingsian R."

Report

A tunable binaural audio telepresence system capable of balancing immersive and enhanced modes

Binaural Audio Telepresence (BAT) aims to encode the acoustic scene at the far end into binaural signals for the user at the near end. BAT encompasses an immense range of applications that can vary between two extreme modes of Immersive BAT (I-BAT) a

External link: http://arxiv.org/abs/2405.08742

Report

Spatial-Temporal Activity-Informed Diarization and Separation

Authors: Hsu, Yicheng; Chen, Ssuhan; Bai, Mingsian R.

A robust multichannel speaker diarization and separation system is proposed by exploiting the spatio-temporal activity of the speakers. The system is realized in a hybrid architecture that combines the array signal processing units and the deep learn

External link: http://arxiv.org/abs/2401.16850

Report

Learning-based Array Configuration-Independent Binaural Audio Telepresence with Scalable Signal Enhancement and Ambience Preservation

Authors: Hsu, Yicheng; Bai, Mingsian R.

Audio Telepresence (AT) aims to create an immersive experience of the audio scene at the far end for the user(s) at the near end. The application of AT could encompass scenarios with varying degrees of emphasis on signal enhancement and ambience pres

External link: http://arxiv.org/abs/2311.12706

Report

Deep Beamforming for Speech Enhancement and Speaker Localization with an Array Response-Aware Loss Function

Authors: Chang, Hsinyu; Hsu, Yicheng; Bai, Mingsian R.

Recent research advances in deep neural network (DNN)-based beamformers have shown great promise for speech enhancement under adverse acoustic conditions. Different network architectures and input features have been explored in estimating beamforming

External link: http://arxiv.org/abs/2310.12837

Report

Array Configuration-Agnostic Personal Voice Activity Detection Based on Spatial Coherence

Authors: Hsu, Yicheng; Bai, Mingsian R.

Personal voice activity detection has received increased attention due to the growing popularity of personal mobile devices and smart speakers. PVAD is often an integral element to speech enhancement and recognition for these applications in which li

External link: http://arxiv.org/abs/2304.08887

Report

Array Configuration-Agnostic Personalized Speech Enhancement using Long-Short-Term Spatial Coherence

Authors: Hsu, Yicheng; Lee, Yonghan; Bai, Mingsian R.

Personalized speech enhancement has been a field of active research for suppression of speechlike interferers such as competing speakers or TV dialogues. Compared with single channel approaches, multichannel PSE systems can be more effective in adver

External link: http://arxiv.org/abs/2211.08748

Report

Model-matching Principle Applied to the Design of an Array-based All-neural Binaural Rendering System for Audio Telepresence

Authors: Hsu, Yicheng; Ma, Chenghumg; Bai, Mingsian R.

Telepresence aims to create an immersive but virtual experience of the audio and visual scene at the far end for users at the near end. In this contribution, we propose an array-based binaural rendering system that converts the array microphone signa

External link: http://arxiv.org/abs/2210.11123

Report

Multi-channel target speech enhancement based on ERB-scaled spatial coherence features

Authors: Hsu, Yicheng; Lee, Yonghan; Bai, Mingsian R.

Recently, speech enhancement technologies that are based on deep learning have received considerable research attention. If the spatial information in microphone signals is exploited, microphone arrays can be advantageous under some adverse acoustic

External link: http://arxiv.org/abs/2207.08126

Report

Multi-channel end-to-end neural network for speech enhancement, source localization, and voice activity detection

Authors: Chen, Yuan; Hsu, Yicheng; Bai, Mingsian R.

Speech enhancement and source localization has been active research for several decades with a wide range of real-world applications. Recently, the Deep Complex Convolution Recurrent network (DCCRN) has yielded impressive enhancement performance for

External link: http://arxiv.org/abs/2206.09728

Report

Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features

Authors: Hsu, Yicheng; Lee, Yonghan; Bai, Mingsian R.

Teleconferencing is becoming essential during the COVID-19 pandemic. However, in real-world applications, speech quality can deteriorate due to, for example, background interference, noise, or reverberation. To solve this problem, target speech extra

External link: http://arxiv.org/abs/2112.05686
