Showing 1 - 10 of 92 for search: '"Wei, Yinwei"'
Authors:
Li, Yongqi, Cai, Hongru, Wang, Wenjie, Qu, Leigang, Wei, Yinwei, Li, Wenjie, Nie, Liqiang, Chua, Tat-Seng
Text-to-image retrieval is a fundamental task in multimedia processing, aiming to retrieve semantically relevant cross-modal content. Traditional studies have typically approached this task as a discriminative problem, matching the text and image via
External link:
http://arxiv.org/abs/2407.17274
Product bundling provides clients with a strategic combination of individual items. It has gained significant attention in recent years as a fundamental prerequisite for online services. Recent methods utilize multimodal information through sophi
External link:
http://arxiv.org/abs/2407.11712
Authors:
Huynh, Tuan-Luc, Vu, Thuy-Trang, Wang, Weiqing, Wei, Yinwei, Le, Trung, Gasevic, Dragan, Li, Yuan-Fang, Do, Thanh-Toan
Differentiable Search Index (DSI) utilizes Pre-trained Language Models (PLMs) for efficient document retrieval without relying on external indexes. However, DSIs need full re-training to handle updates in dynamic corpora, causing significant computat
External link:
http://arxiv.org/abs/2406.12593
In the real world, multi-modal data often appears in a streaming fashion, and there is a growing demand for similarity retrieval from such non-stationary data, especially at a large scale. In response to this need, online multi-modal hashing has gain
External link:
http://arxiv.org/abs/2406.10776
Multimodal recommendation aims to recommend user-preferred candidates based on her/his historically interacted items and associated multimodal information. Previous studies commonly employ an embed-and-retrieve paradigm: learning user and item repres
External link:
http://arxiv.org/abs/2404.16555
Composed image retrieval (CIR) aims to retrieve the target image based on a multimodal query, i.e., a reference image paired with corresponding modification text. Recent CIR studies leverage vision-language pre-trained (VLP) methods as the feature ex
External link:
http://arxiv.org/abs/2404.15875
Authors:
Kang, Jingqi, Wu, Tongtong, Zhao, Jinming, Wang, Guitao, Wei, Yinwei, Yang, Hao, Qi, Guilin, Li, Yuan-Fang, Haffari, Gholamreza
Speech event detection is crucial for multimedia retrieval, involving the tagging of both semantic and acoustic events. Traditional ASR systems often overlook the interplay between these events, focusing solely on content, even though the interpretat
External link:
http://arxiv.org/abs/2404.13289
Leveraging Large Language Models (LLMs) for recommendation has recently garnered considerable attention, where fine-tuning plays a key role in LLMs' adaptation. However, the cost of fine-tuning LLMs on rapidly expanding recommendation data limits the
External link:
http://arxiv.org/abs/2401.17197
Recommendation systems (RS) have become indispensable tools for web services to address information overload, thus enhancing user experiences and bolstering platforms' revenues. However, with their increasing ubiquity, security concerns have also eme
External link:
http://arxiv.org/abs/2401.12578
This paper delves into the text-guided image editing task, focusing on modifying a reference image according to user-specified textual feedback to embody specific attributes. Despite recent advancements, a persistent challenge remains that the single
External link:
http://arxiv.org/abs/2401.08472