Showing 1 - 10 of 182 for search: "Han, Jiqing"
Mutual Information-based Representations Disentanglement for Unaligned Multimodal Language Sequences
The key challenge in unaligned multimodal language sequences lies in effectively integrating information from various modalities to obtain a refined multimodal joint representation. Recently, disentangle-and-fuse methods have achieved promising …
External link:
http://arxiv.org/abs/2409.12408
Serialized Output Training (SOT) has showcased state-of-the-art performance in multi-talker speech recognition by sequentially decoding the speech of individual speakers. To address the challenging label-permutation issue, prior methods have relied on …
External link:
http://arxiv.org/abs/2407.03966
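The SOT idea summarized above can be illustrated with a minimal sketch: per-speaker transcripts are serialized into a single target sequence ordered by start time, separated by a speaker-change token, so the model never has to resolve label permutations explicitly. The `<sc>` token name and the helper below are illustrative assumptions, not the paper's exact implementation.

```python
SC_TOKEN = "<sc>"  # hypothetical speaker-change symbol

def build_sot_target(utterances):
    """utterances: list of (start_time, transcript), one entry per speaker.

    Returns a single serialized label in first-speaking, first-decoded order,
    which is the fixed ordering SOT uses to avoid permutation ambiguity."""
    ordered = sorted(utterances, key=lambda u: u[0])
    return f" {SC_TOKEN} ".join(text for _, text in ordered)

target = build_sot_target([(1.2, "how are you"), (0.3, "hello there")])
# "hello there <sc> how are you"
```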
Author:
Guan, Yadong, Han, Jiqing, Song, Hongwei, Song, Wenjie, Zheng, Guibin, Zheng, Tieran, He, Yongjun
Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively. A critical reason is that these methods represent overlapping events using shared …
External link:
http://arxiv.org/abs/2401.05850
This paper investigates the possibility of extracting a target sentence from multi-talker speech using only a keyword as input. For example, in social security applications, the keyword might be "help", and the goal is to identify what the person who …
External link:
http://arxiv.org/abs/2310.05352
Most existing keyword spotting research focuses on conditions with slight or moderate noise. In this paper, we try to tackle a more challenging task: detecting keywords buried under strong interfering speech (10 times higher than the keyword in amplitude) …
External link:
http://arxiv.org/abs/2305.17706
Time-weighted Frequency Domain Audio Representation with GMM Estimator for Anomalous Sound Detection
Although deep learning is the mainstream method in unsupervised anomalous sound detection, a Gaussian Mixture Model (GMM) with a statistical audio frequency representation as input can achieve comparable results with much lower model complexity and fewer …
External link:
http://arxiv.org/abs/2305.03328
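The GMM-based approach above can be sketched in miniature: fit a density model on frequency-domain features of normal sounds, then score test frames by negative log-likelihood. For brevity this toy version uses a single diagonal Gaussian (a one-component GMM) on synthetic stand-in "spectra"; the paper's actual time-weighted representation and full GMM estimator are not reproduced here.

```python
import numpy as np

class GaussianScorer:
    """Degenerate one-component diagonal GMM for anomaly scoring."""

    def fit(self, X):                      # X: (n_frames, n_freq_bins)
        self.mu = X.mean(axis=0)
        self.var = X.var(axis=0) + 1e-6    # variance floor avoids divide-by-zero
        return self

    def anomaly_score(self, X):
        # negative log-likelihood per frame, up to an additive constant;
        # higher score = less likely under the "normal" training data
        z = (X - self.mu) ** 2 / self.var
        return 0.5 * (z + np.log(self.var)).sum(axis=1)

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 8))   # stand-in normal-machine spectra
scorer = GaussianScorer().fit(normal)
outlier = np.full((1, 8), 6.0)                  # frame far from training data
assert scorer.anomaly_score(outlier)[0] > scorer.anomaly_score(normal).mean()
```

A full GMM would replace the single Gaussian with a mixture (e.g. via EM), but the scoring logic — thresholding the negative log-likelihood — stays the same.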
The lack of data and the difficulty of multimodal fusion have always been challenges for multimodal emotion recognition (MER). In this paper, we propose to use pretrained models as the upstream network: wav2vec 2.0 for the audio modality and BERT for the text modality …
External link:
http://arxiv.org/abs/2302.13661
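The upstream-plus-fusion setup described above can be sketched as a simple concatenation fusion over pooled utterance embeddings. The embedding dimensions, random weights, and class count below are illustrative assumptions; the real pipeline would extract the embeddings with wav2vec 2.0 and BERT, which are not loaded here.

```python
import numpy as np

# Hypothetical pooled utterance embeddings from the two upstream networks
# (768 matches the base wav2vec 2.0 / BERT hidden size, used only as a stand-in).
AUDIO_DIM, TEXT_DIM, N_EMOTIONS = 768, 768, 4

rng = np.random.default_rng(0)
audio_emb = rng.normal(size=AUDIO_DIM)   # would come from wav2vec 2.0
text_emb = rng.normal(size=TEXT_DIM)     # would come from BERT

# Concatenation fusion followed by an untrained linear classifier head.
W = rng.normal(size=(N_EMOTIONS, AUDIO_DIM + TEXT_DIM)) * 0.01
b = np.zeros(N_EMOTIONS)

fused = np.concatenate([audio_emb, text_emb])   # shape (1536,)
logits = W @ fused + b
probs = np.exp(logits - logits.max())
probs /= probs.sum()                            # softmax over emotion classes
pred = int(np.argmax(probs))
```

Concatenation is only the simplest fusion choice; model-level fusion methods instead combine intermediate representations inside a jointly trained network.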
Author:
Qian, Fan, Han, Jiqing
Speech emotion recognition is challenging and an important step towards more natural human-computer interaction (HCI). A popular approach is multimodal emotion recognition based on model-level fusion, meaning that the multimodal signals can be …
External link:
http://arxiv.org/abs/2211.10885
Most recent research on automatic music transcription (AMT) uses convolutional and recurrent neural networks to model the mapping from music signals to symbolic notation. Based on a high-resolution piano transcription system, we ex…
External link:
http://arxiv.org/abs/2204.03898
Published in:
Green Processing and Synthesis, Vol 12, Iss 1, Pp 47-51 (2023)
The effects of SiO2 and CO2 on the crystallization behavior of Ti-containing mixed molten slag (molten Ti-containing blast furnace slag and molten Ti slag) were investigated by thermodynamic calculation and dedicated experiments. The results of the thermodynamic …
External link:
https://doaj.org/article/4f7300bd478d4cba9e6930acbc853e0c