Showing 1 - 10 of 381
for search: '"Jose, Joemon"'
Author:
Fu, Junchen, Ge, Xuri, Xin, Xin, Karatzoglou, Alexandros, Arapakis, Ioannis, Zheng, Kaiwen, Ni, Yongxin, Jose, Joemon M.
Multimodal foundation models (MFMs) have revolutionized sequential recommender systems through advanced representation learning. While Parameter-efficient Fine-tuning (PEFT) is commonly used to adapt these models, studies often prioritize parameter e…
External link:
http://arxiv.org/abs/2411.02992
Retrieval-augmented generation (RAG) has gained wide attention as the key component to improve generative models with external knowledge augmentation from information retrieval. It has shown great prominence in enhancing the functionality and perform…
External link:
http://arxiv.org/abs/2410.20598
Published in:
ACM Multimedia 2024
Facial action units (AUs), as defined in the Facial Action Coding System (FACS), have received significant research interest owing to their diverse range of applications in facial state analysis. Current mainstream FAU recognition models have a notab…
External link:
http://arxiv.org/abs/2408.00644
Image-text matching (ITM) is a fundamental problem in computer vision. The key issue lies in jointly learning the visual and textual representations to estimate their similarity accurately. Most existing methods focus on feature enhancement within mod…
External link:
http://arxiv.org/abs/2406.18579
Capturing complex temporal relationships between video and audio modalities is vital for Audio-Visual Emotion Recognition (AVER). However, existing methods lack attention to local details, such as facial state changes between video frames, which can…
External link:
http://arxiv.org/abs/2405.16701
Published in:
Information Processing & Management, Volume 61, Issue 4, July 2024, 103716
In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed 3SHNet) for high-precision, high-efficiency and high-generalization image-sentence retrieval. 3SHNet highlights the salient identification of prominent object…
External link:
http://arxiv.org/abs/2404.17273
Author:
Fu, Junchen, Ge, Xuri, Xin, Xin, Karatzoglou, Alexandros, Arapakis, Ioannis, Wang, Jie, Jose, Joemon M.
Multimodal foundation models are transformative in sequential recommender systems, leveraging powerful representation learning capabilities. While Parameter-efficient Fine-tuning (PEFT) is commonly used to adapt foundation models for recommendation t…
External link:
http://arxiv.org/abs/2404.02059
Reinforcement Learning (RL)-based recommender systems have demonstrated promising performance in meeting user expectations by learning to make accurate next-item recommendations from historical user-item interactions. However, existing offline RL-bas…
External link:
http://arxiv.org/abs/2403.16948
Text-to-image retrieval aims to find relevant images based on a text query, which is important in various use cases such as digital libraries, e-commerce, and multimedia databases. Although Multimodal Large Language Models (MLLMs) demonstrate st…
External link:
http://arxiv.org/abs/2402.15276