Showing 1 - 10 of 29 for search: '"Zhao, Jinzheng"'
Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of specific classes, for example, 13 classe
External link:
http://arxiv.org/abs/2406.16058
Digital aquaculture leverages advanced technologies and data-driven methods, providing substantial benefits over traditional aquaculture practices. Fish tracking, counting, and behaviour analysis are crucial components of digital aquaculture, which a
External link:
http://arxiv.org/abs/2406.17800
Sound event localization and detection (SELD) combines two subtasks: sound event detection (SED) and direction of arrival (DOA) estimation. SELD is usually tackled as an audio-only problem, but visual information has been recently included. Few audio
External link:
http://arxiv.org/abs/2312.09034
Author:
Zhao, Jinzheng, Xu, Yong, Qian, Xinyuan, Berghi, Davide, Wu, Peipei, Cui, Meng, Sun, Jianyuan, Jackson, Philip J. B., Wang, Wenwu
Audio-visual speaker tracking has drawn increasing attention over the past few years due to its academic value and wide range of applications. Audio and visual modalities can provide complementary information for localization and tracking. With audio and visu
External link:
http://arxiv.org/abs/2310.14778
Author:
Liu, Jiachi, Wang, Liwen, Dong, Guanting, Song, Xiaoshuai, Wang, Zechen, Wang, Zhengyang, Lei, Shanglin, Zhao, Jinzheng, He, Keqing, Xiao, Bo, Xu, Weiran
In real dialogue scenarios, as there are unknown input noises in the utterances, existing supervised slot filling models often perform poorly in practical applications. Even though there are some studies on noise-robust models, these works are only e
External link:
http://arxiv.org/abs/2310.03518
The use of audio and visual modalities for speaker localization has been well studied in the literature by exploiting their complementary characteristics. However, most previous works employ the setting of static sensors mounted at fixed positions. Unl
External link:
http://arxiv.org/abs/2309.16308
Author:
Li, Xuefeng, Wang, Liwen, Dong, Guanting, He, Keqing, Zhao, Jinzheng, Lei, Hao, Liu, Jiachi, Xu, Weiran
Zero-shot cross-domain slot filling aims to transfer knowledge from the labeled source domain to the unlabeled target domain. Existing models either encode slot descriptions and examples or design handcrafted question templates using heuristic rules,
External link:
http://arxiv.org/abs/2307.02830
Author:
Dong, Guanting, Guo, Daichi, Wang, Liwen, Li, Xuefeng, Wang, Zechen, Zeng, Chen, He, Keqing, Zhao, Jinzheng, Lei, Hao, Cui, Xinyue, Huang, Yi, Feng, Junlan, Xu, Weiran
Most existing slot filling models tend to memorize inherent patterns of entities and corresponding contexts from training data. However, these models can lead to system failure or undesirable outputs when being exposed to spoken language perturbation
External link:
http://arxiv.org/abs/2208.11508
Author:
Li, Xuefeng, Lei, Hao, Wang, Liwen, Dong, Guanting, Zhao, Jinzheng, Liu, Jiachi, Xu, Weiran, Zhang, Chunyun
Multi-domain text classification can automatically classify texts in various scenarios. Due to the diversity of human languages, texts with the same label in different domains may differ greatly, which brings challenges to the multi-domain text class
External link:
http://arxiv.org/abs/2204.12125
Author:
Liu, Xubo, Liu, Haohe, Kong, Qiuqiang, Mei, Xinhao, Zhao, Jinzheng, Huang, Qiushi, Plumbley, Mark D., Wang, Wenwu
In this paper, we introduce the task of language-queried audio source separation (LASS), which aims to separate a target source from an audio mixture based on a natural language query of the target source (e.g., "a man tells a joke followed by people
External link:
http://arxiv.org/abs/2203.15147