Showing 1 - 8 of 8 for search: '"Zexu Pan"'
Published in:
IEEE Signal Processing Letters. 30:110-114
The speaker extraction technique seeks to single out the voice of a target speaker from the interfering voices in a speech mixture. Typically, an auxiliary reference of the target speaker is used to form voluntary attention. Either a pre-recorded utterance …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::fc62c1f2234c7147ac7863106176422a
http://arxiv.org/abs/2211.00109
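To make the idea in the record above concrete, here is a minimal PyTorch sketch of speaker extraction conditioned on an auxiliary reference: a mask estimator is steered toward the target speaker by a FiLM-style modulation derived from the reference signal. The layer sizes, the FiLM conditioning, and all names are illustrative assumptions, not the architecture described in the paper.

```python
import torch
import torch.nn as nn

class ToySpeakerExtractor(nn.Module):
    """Minimal sketch: a mask estimator conditioned on a target-speaker reference."""
    def __init__(self, feat_dim=256, spk_dim=128):
        super().__init__()
        self.mix_encoder = nn.Conv1d(1, feat_dim, kernel_size=16, stride=8)   # waveform encoder
        self.spk_encoder = nn.GRU(feat_dim, spk_dim, batch_first=True)        # reference encoder
        self.film = nn.Linear(spk_dim, 2 * feat_dim)                          # scale/shift from the speaker cue
        self.mask_net = nn.Sequential(
            nn.Conv1d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv1d(feat_dim, feat_dim, 1), nn.Sigmoid())
        self.decoder = nn.ConvTranspose1d(feat_dim, 1, kernel_size=16, stride=8)

    def forward(self, mixture, reference):
        # mixture, reference: (batch, 1, samples)
        mix_feat = torch.relu(self.mix_encoder(mixture))            # (B, F, T)
        ref_feat = torch.relu(self.mix_encoder(reference))          # reuse the encoder on the reference
        _, spk_emb = self.spk_encoder(ref_feat.transpose(1, 2))     # (1, B, spk_dim)
        scale, shift = self.film(spk_emb[-1]).chunk(2, dim=-1)      # FiLM-style conditioning
        cond = mix_feat * scale.unsqueeze(-1) + shift.unsqueeze(-1)
        mask = self.mask_net(cond)                                  # attend to the target speaker
        return self.decoder(mix_feat * mask)                        # estimated target speech

# est = ToySpeakerExtractor()(torch.randn(2, 1, 16000), torch.randn(2, 1, 16000))
```

In this sketch, a pre-recorded utterance from the target speaker plays the role of `reference`.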
Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker speech mixture. Prior studies have used a pre-recorded speech sample or a face image of the target speaker as the speaker cue. In human communication, …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::f9db3bad2aaa3705964781f15ec4851e
A speaker extraction algorithm seeks to extract the target speaker's speech from a multi-talker speech mixture. Prior studies focus mostly on speaker extraction from highly overlapped multi-talker speech mixtures. However, the target-interference …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e4139bcaca11b265827bba82a4bae5b1
http://arxiv.org/abs/2109.14831
A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-talker speech mixture when given a cue that represents the target speaker, such as a pre-enrolled speech utterance or an accompanying video track. Visual cues …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::53f0c617e2ffe81e6c435fc2aff9697c
http://arxiv.org/abs/2106.07150
Published in:
ICASSP
Most of the prior studies in the spatial direction-of-arrival (DoA) domain focus on a single modality. However, humans use auditory and visual senses to detect the presence of sound sources. With this motivation, we propose to use neural networks with audio and visual …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::065f264fcbee317d34f6d335462d23a8
http://arxiv.org/abs/2105.06107
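As a toy illustration of audio-visual DoA estimation in the spirit of this abstract, the snippet below fuses an audio feature vector and a visual feature vector by concatenation and classifies the direction of arrival into discrete azimuth bins. The feature choices, dimensions, and concatenation fusion are assumptions for illustration, not the network from the paper.

```python
import torch
import torch.nn as nn

class ToyAudioVisualDoA(nn.Module):
    """Minimal sketch: fuse audio and visual features to classify DoA into azimuth bins."""
    def __init__(self, audio_dim=257, visual_dim=512, n_bins=36):
        super().__init__()
        self.audio_branch = nn.Sequential(nn.Linear(audio_dim, 256), nn.ReLU())
        self.visual_branch = nn.Sequential(nn.Linear(visual_dim, 256), nn.ReLU())
        self.classifier = nn.Sequential(nn.Linear(512, 256), nn.ReLU(),
                                        nn.Linear(256, n_bins))   # one logit per azimuth bin

    def forward(self, audio_feat, visual_feat):
        # audio_feat: (B, audio_dim), e.g. inter-channel phase features pooled over time
        # visual_feat: (B, visual_dim), e.g. a pooled embedding of the camera frame
        a = self.audio_branch(audio_feat)
        v = self.visual_branch(visual_feat)
        fused = torch.cat([a, v], dim=-1)        # simple concatenation fusion
        return self.classifier(fused)            # DoA posterior over azimuth bins

# logits = ToyAudioVisualDoA()(torch.randn(4, 257), torch.randn(4, 512))  # (4, 36)
```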
Published in:
ICASSP
A speaker extraction algorithm relies on a speech sample from the target speaker as the reference point to focus its attention. Such reference speech is typically pre-recorded. On the other hand, the temporal synchronization between speech and lip movements …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::f745a12dfc999ceb73baf63dc2166519
http://arxiv.org/abs/2010.07775
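The record above replaces the pre-recorded reference with a visual cue that is time-synchronized with the speech. Below is a minimal sketch of that idea, assuming a lip-embedding sequence from some upstream lip encoder and simple concatenation fusion; both choices, and all dimensions, are assumptions rather than the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLipCuedExtractor(nn.Module):
    """Minimal sketch: condition the mask estimator on time-aligned lip embeddings
    instead of a pre-recorded reference utterance."""
    def __init__(self, feat_dim=256, lip_dim=512):
        super().__init__()
        self.encoder = nn.Conv1d(1, feat_dim, kernel_size=16, stride=8)
        self.lip_proj = nn.Linear(lip_dim, feat_dim)
        self.mask_net = nn.Sequential(
            nn.Conv1d(2 * feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv1d(feat_dim, feat_dim, 1), nn.Sigmoid())
        self.decoder = nn.ConvTranspose1d(feat_dim, 1, kernel_size=16, stride=8)

    def forward(self, mixture, lip_emb):
        # mixture: (B, 1, samples); lip_emb: (B, frames, lip_dim) from a lip encoder
        mix_feat = torch.relu(self.encoder(mixture))                         # (B, F, T)
        lips = self.lip_proj(lip_emb).transpose(1, 2)                        # (B, F, frames)
        lips = F.interpolate(lips, size=mix_feat.shape[-1], mode='nearest')  # align time steps
        mask = self.mask_net(torch.cat([mix_feat, lips], dim=1))             # visual-cued mask
        return self.decoder(mix_feat * mask)                                 # estimated target speech
```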
Published in:
INTERSPEECH
Emotion represents an essential aspect of human speech that is manifested in speech prosody. Speech, visual, and textual cues are complementary in human communication. In this paper, we study a hybrid fusion method, referred to as multi-modal attention …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8c63d3cdb8ca724eae47819e38dd50a4
http://arxiv.org/abs/2009.04107
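A rough sketch of attention-based multi-modal fusion along the lines of this abstract: utterance-level speech, visual, and text embeddings are weighted by learned attention scores and summed before emotion classification. The embedding size, number of emotion classes, and scoring function are illustrative assumptions, not the fusion scheme from the paper.

```python
import torch
import torch.nn as nn

class ToyMultiModalAttentionFusion(nn.Module):
    """Minimal sketch: attention-weighted fusion of speech, visual, and text embeddings."""
    def __init__(self, dim=256, n_emotions=4):
        super().__init__()
        self.score = nn.Linear(dim, 1)               # one scalar attention score per modality
        self.classifier = nn.Linear(dim, n_emotions)

    def forward(self, speech_emb, visual_emb, text_emb):
        # each embedding: (B, dim), produced by modality-specific encoders
        stacked = torch.stack([speech_emb, visual_emb, text_emb], dim=1)   # (B, 3, dim)
        weights = torch.softmax(self.score(stacked), dim=1)                # (B, 3, 1)
        fused = (weights * stacked).sum(dim=1)                             # attention-weighted sum
        return self.classifier(fused)                                      # emotion logits

# logits = ToyMultiModalAttentionFusion()(*[torch.randn(8, 256) for _ in range(3)])
```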