Showing 1 - 10 of 182 for search: "Han, Jiqing"
Mutual Information-based Representations Disentanglement for Unaligned Multimodal Language Sequences
The key challenge in unaligned multimodal language sequences lies in effectively integrating information from various modalities to obtain a refined multimodal joint representation. Recently, disentangle-and-fuse methods have achieved promising …
External link:
http://arxiv.org/abs/2409.12408
Serialized Output Training (SOT) has showcased state-of-the-art performance in multi-talker speech recognition by sequentially decoding the speech of individual speakers. To address the challenging label-permutation issue, prior methods have relied on …
External link:
http://arxiv.org/abs/2407.03966
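The SOT idea summarized above can be illustrated with a minimal sketch: per-speaker transcripts are serialized into a single target sequence ordered by start time, separated by a speaker-change token, so the model never has to resolve label permutations explicitly. The `<sc>` token name and the helper below are illustrative assumptions, not the paper's exact implementation.

```python
SC_TOKEN = "<sc>"  # hypothetical speaker-change symbol

def build_sot_target(utterances):
    """utterances: list of (start_time, transcript), one entry per speaker.

    Returns a single serialized label in first-speaking, first-decoded order,
    which is the fixed ordering SOT uses to avoid permutation ambiguity."""
    ordered = sorted(utterances, key=lambda u: u[0])
    return f" {SC_TOKEN} ".join(text for _, text in ordered)

target = build_sot_target([(1.2, "how are you"), (0.3, "hello there")])
# "hello there <sc> how are you"
```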
Author:
Guan, Yadong, Han, Jiqing, Song, Hongwei, Song, Wenjie, Zheng, Guibin, Zheng, Tieran, He, Yongjun
Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively. A critical reason is that these methods represent overlapping events using shared …
External link:
http://arxiv.org/abs/2401.05850
This paper investigates the possibility of extracting a target sentence from multi-talker speech using only a keyword as input. For example, in social security applications, the keyword might be "help", and the goal is to identify what the person who …
External link:
http://arxiv.org/abs/2310.05352
Most existing keyword spotting research focuses on conditions with slight or moderate noise. In this paper, we try to tackle a more challenging task: detecting keywords buried under strong interfering speech (10 times higher than the keyword in amplitude) …
External link:
http://arxiv.org/abs/2305.17706
Time-weighted Frequency Domain Audio Representation with GMM Estimator for Anomalous Sound Detection
Although deep learning is the mainstream method in unsupervised anomalous sound detection, a Gaussian Mixture Model (GMM) with a statistical audio frequency representation as input can achieve comparable results with much lower model complexity and fewer …
External link:
http://arxiv.org/abs/2305.03328
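The GMM-based approach above can be sketched in miniature: fit a density model on frequency-domain features of normal sounds, then score test frames by negative log-likelihood. For brevity this toy version uses a single diagonal Gaussian (a one-component GMM) on synthetic stand-in "spectra"; the paper's actual time-weighted representation and full GMM estimator are not reproduced here.

```python
import numpy as np

class GaussianScorer:
    """Degenerate one-component diagonal GMM for anomaly scoring."""

    def fit(self, X):                      # X: (n_frames, n_freq_bins)
        self.mu = X.mean(axis=0)
        self.var = X.var(axis=0) + 1e-6    # variance floor avoids divide-by-zero
        return self

    def anomaly_score(self, X):
        # negative log-likelihood per frame, up to an additive constant;
        # higher score = less likely under the "normal" training data
        z = (X - self.mu) ** 2 / self.var
        return 0.5 * (z + np.log(self.var)).sum(axis=1)

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 8))   # stand-in normal-machine spectra
scorer = GaussianScorer().fit(normal)
outlier = np.full((1, 8), 6.0)                  # frame far from training data
assert scorer.anomaly_score(outlier)[0] > scorer.anomaly_score(normal).mean()
```

A full GMM would replace the single Gaussian with a mixture (e.g. via EM), but the scoring logic — thresholding the negative log-likelihood — stays the same.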
The lack of data and the difficulty of multimodal fusion have always been challenges for multimodal emotion recognition (MER). In this paper, we propose to use pretrained models as the upstream network: wav2vec 2.0 for the audio modality and BERT for the text modality …
External link:
http://arxiv.org/abs/2302.13661
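The upstream-plus-fusion setup described above can be sketched as a simple concatenation fusion over pooled utterance embeddings. The embedding dimensions, random weights, and class count below are illustrative assumptions; the real pipeline would extract the embeddings with wav2vec 2.0 and BERT, which are not loaded here.

```python
import numpy as np

# Hypothetical pooled utterance embeddings from the two upstream networks
# (768 matches the base wav2vec 2.0 / BERT hidden size, used only as a stand-in).
AUDIO_DIM, TEXT_DIM, N_EMOTIONS = 768, 768, 4

rng = np.random.default_rng(0)
audio_emb = rng.normal(size=AUDIO_DIM)   # would come from wav2vec 2.0
text_emb = rng.normal(size=TEXT_DIM)     # would come from BERT

# Concatenation fusion followed by an untrained linear classifier head.
W = rng.normal(size=(N_EMOTIONS, AUDIO_DIM + TEXT_DIM)) * 0.01
b = np.zeros(N_EMOTIONS)

fused = np.concatenate([audio_emb, text_emb])   # shape (1536,)
logits = W @ fused + b
probs = np.exp(logits - logits.max())
probs /= probs.sum()                            # softmax over emotion classes
pred = int(np.argmax(probs))
```

Concatenation is only the simplest fusion choice; model-level fusion methods instead combine intermediate representations inside a jointly trained network.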
Author:
Qian, Fan, Han, Jiqing
Speech emotion recognition is challenging and an important step towards more natural human-computer interaction (HCI). A popular approach is multimodal emotion recognition based on model-level fusion, meaning that the multimodal signals can be …
External link:
http://arxiv.org/abs/2211.10885
Most recent research on automatic music transcription (AMT) uses convolutional and recurrent neural networks to model the mapping from music signals to symbolic notation. Based on a high-resolution piano transcription system, we ex…
External link:
http://arxiv.org/abs/2204.03898
Published in:
Green Processing and Synthesis, Vol 12, Iss 1, Pp 47-51 (2023)
The effects of SiO2 and CO2 on the crystallization behavior of Ti-containing mixed molten slag (molten Ti-containing blast furnace slag and molten Ti slag) were investigated by thermodynamic calculation and dedicated experiments. The results of the thermodynamic …
External link:
https://doaj.org/article/4f7300bd478d4cba9e6930acbc853e0c