Showing 1 - 10 of 30 for search: '"SONGLONG XING"'
Published in:
ACM Transactions on Multimedia Computing, Communications, and Applications. 19:1-24
Multimodal sequence analysis aims to draw inferences from visual, language, and acoustic sequences. A majority of existing works focus on the aligned fusion of three modalities to explore inter-modal interactions, which is impractical in real-world…
Published in:
IEEE Transactions on Affective Computing. 13:1426-1439
In this paper, we address Emotion Recognition in Conversation (ERC), where conversational data are presented in a multimodal setting. Psychological evidence shows that self- and inter-speaker influence are two central factors in emotion dynamics in conversations…
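The interplay of self- and inter-speaker influence lends itself to a small illustration. Below is a toy PyTorch sketch, entirely my own simplification rather than the paper's model: a global GRU carries inter-speaker context, while each speaker keeps a private GRU state for self-influence. All names and dimensions are made up.

# Toy sketch of self- and inter-speaker influence in ERC
# (my own simplification, not the paper's architecture).
import torch
import torch.nn as nn

dim = 32
global_gru = nn.GRUCell(dim, dim)     # inter-speaker context, shared by all
self_gru = nn.GRUCell(2 * dim, dim)   # self-influence, fed with that context

ctx = torch.zeros(1, dim)
speaker_state = {}                    # one private hidden state per speaker

# (speaker id, utterance feature) pairs standing in for a conversation
utterances = [("A", torch.randn(1, dim)), ("B", torch.randn(1, dim)),
              ("A", torch.randn(1, dim))]
for speaker, u in utterances:
    ctx = global_gru(u, ctx)          # every utterance updates shared context
    h = speaker_state.get(speaker, torch.zeros(1, dim))
    speaker_state[speaker] = self_gru(torch.cat([u, ctx], -1), h)

print(speaker_state["A"].shape)       # torch.Size([1, 32])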
Published in:
IEEE Transactions on Affective Computing. 13:320-334
Multimodal human sentiment comprehension refers to recognizing human affect from multiple modalities. Two key issues exist for this problem. Firstly, it is difficult to explore time-dependent interactions between modalities and focus on the…
Published in:
IEEE Transactions on Multimedia. 24:2488-2501
Learning a unified embedding for utterance-level video has attracted significant attention recently due to the rapid development of social media and its broad applications. An utterance normally contains not only spoken language but also nonverbal behaviors…
Published in:
IEEE/ACM Transactions on Audio, Speech, and Language Processing. 29:1424-1437
Human emotion is always expressed from a multimodal perspective. Analyzing multimodal human sentiment remains challenging due to the difficulty of interpreting inter-modality dynamics. Mainstream multimodal learning architectures tend…
Published in:
ACM Transactions on Multimedia Computing, Communications, and Applications. 16:1-18
Visual structure and syntactic structure are essential in images and texts, respectively. Visual structure depicts both the entities in an image and their interactions, whereas syntactic structure in texts can reflect the part-of-speech constraints between…
Published in:
IEEE Transactions on Multimedia. 22:122-137
In this paper, we propose a novel multimodal fusion framework, called the locally confined modality fusion network (LMFN), which contains a bidirectional multiconnected LSTM (BM-LSTM), to address the multimodal human affective computing problem. In the…
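For intuition about "locally confined" fusion, here is a minimal PyTorch sketch under my own assumptions; it is not the LMFN or BM-LSTM of the paper. Aligned modality streams are concatenated, pooled over short local segments, and a bidirectional LSTM then models cross-segment dynamics. The feature dimensions are arbitrary placeholders.

# Minimal sketch of locally confined multimodal fusion (not the paper's LMFN).
import torch
import torch.nn as nn

class LocalFusionSketch(nn.Module):
    def __init__(self, dims=(300, 74, 35), hidden=128, seg_len=5):
        super().__init__()
        self.seg_len = seg_len
        self.proj = nn.Linear(sum(dims), hidden)    # fuse one local segment
        self.bilstm = nn.LSTM(hidden, hidden, batch_first=True,
                              bidirectional=True)   # cross-segment dynamics

    def forward(self, text, audio, vision):
        # text/audio/vision: (batch, seq, dim), assumed time-aligned
        x = torch.cat([text, audio, vision], dim=-1)
        b, t, d = x.shape
        # average-pool each local segment before fusing it
        segs = x[:, : t - t % self.seg_len].reshape(b, -1, self.seg_len, d).mean(2)
        fused = torch.relu(self.proj(segs))
        out, _ = self.bilstm(fused)
        return out.mean(dim=1)                      # utterance-level embedding

emb = LocalFusionSketch()(torch.randn(2, 20, 300),
                          torch.randn(2, 20, 74),
                          torch.randn(2, 20, 35))
print(emb.shape)                                    # torch.Size([2, 256])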
Published in:
IEEE Transactions on Image Processing.
Recently, image prior learning has emerged as an effective tool for image denoising: it exploits prior knowledge to obtain sparse coding models and utilizes them to reconstruct the clean image from the noisy one. Albeit promising, these prior-learning…
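To make the sparse-coding prior concrete, here is a toy denoising sketch using scikit-learn's dictionary learning; it is not the paper's method. A patch dictionary is learned from the noisy image itself, and each patch is then reconstructed from only a few atoms, which suppresses unstructured noise. All parameter values are arbitrary.

# Toy prior-learning denoiser via sparse coding (not the paper's method).
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import (extract_patches_2d,
                                              reconstruct_from_patches_2d)

rng = np.random.default_rng(0)
clean = np.kron(rng.random((8, 8)), np.ones((8, 8)))   # blocky 64x64 "image"
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

patches = extract_patches_2d(noisy, (6, 6)).reshape(-1, 36)
mean = patches.mean(axis=1, keepdims=True)
patches -= mean                                        # remove DC component

dico = MiniBatchDictionaryLearning(n_components=50, alpha=1.0,
                                   transform_algorithm='omp',
                                   transform_n_nonzero_coefs=2,
                                   random_state=0).fit(patches)
code = dico.transform(patches)                         # sparse codes per patch
denoised_patches = (code @ dico.components_) + mean
denoised = reconstruct_from_patches_2d(
    denoised_patches.reshape(-1, 6, 6), noisy.shape)

print("noisy MSE   :", np.mean((noisy - clean) ** 2))
print("denoised MSE:", np.mean((denoised - clean) ** 2))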
Published in:
ACL (1)
We propose a general strategy named ‘divide, conquer and combine’ for multimodal fusion. Instead of directly fusing features at the holistic level, we conduct fusion hierarchically so that both local and global interactions are considered for a comprehensive…
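A hedged sketch of the ‘divide, conquer and combine’ idea, under my own simplifications rather than the paper's actual network: unimodal features (‘divide’) are fused pairwise (‘conquer’), and the pairwise results are combined with the unimodal features into a global trimodal representation (‘combine’). A single pairwise layer is shared across pairs purely for brevity.

# Hierarchical (local-then-global) fusion sketch, not the paper's model.
import torch
import torch.nn as nn

class HierarchicalFusionSketch(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.pair = nn.Linear(2 * dim, dim)        # 'conquer': bimodal fusion
        self.glob = nn.Linear(6 * dim, dim)        # 'combine': global fusion

    def forward(self, t, a, v):
        # t, a, v: (batch, dim) unimodal feature vectors ('divide')
        ta = torch.relu(self.pair(torch.cat([t, a], -1)))
        tv = torch.relu(self.pair(torch.cat([t, v], -1)))
        av = torch.relu(self.pair(torch.cat([a, v], -1)))
        # global step sees unimodal and bimodal views together
        return self.glob(torch.cat([t, a, v, ta, tv, av], -1))

out = HierarchicalFusionSketch()(torch.randn(2, 64),
                                 torch.randn(2, 64),
                                 torch.randn(2, 64))
print(out.shape)                                   # torch.Size([2, 64])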