Showing 1 - 10 of 648 for search: '"Kankanhalli, Mohan"'
The creation of 4D avatars (i.e., animated 3D avatars) from text description typically uses text-to-image (T2I) diffusion models to synthesize 3D avatars in the canonical space and subsequently applies animation with target motions. However, such an
External link:
http://arxiv.org/abs/2406.04629
Visual Commonsense Reasoning (VCR) calls for explanatory reasoning behind question answering over visual scenes. To achieve this goal, a model is required to provide an acceptable rationale as the reason for the predicted answers. Progress on the ben
External link:
http://arxiv.org/abs/2405.16934
Unlearning methods for recommender systems (RS) have emerged to address privacy issues and concerns about legal compliance. However, evolving user preferences and content licensing issues still remain unaddressed. This is particularly true in case of
External link:
http://arxiv.org/abs/2405.15328
Recent advancements in image understanding have benefited from the extensive use of web image-text pairs. However, video understanding remains a challenge despite the availability of substantial web video-text data. This difficulty primarily arises f
External link:
http://arxiv.org/abs/2405.13911
Authors:
Cheng, Yi; Xu, Ziwei; Lin, Dongyun; Cheng, Harry; Wong, Yongkang; Sun, Ying; Lim, Joo Hwee; Kankanhalli, Mohan
For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem. This discrepancy arises from two main factors. First, user intentions are inherently complex, with subtle details not ful
External link:
http://arxiv.org/abs/2405.12538
Authors:
Wang, Nana; Kankanhalli, Mohan
The increasing use of GPS-enabled devices has generated a large amount of trajectory data. These data offer us vital insights to understand the movements of individuals and populations, benefiting a broad range of applications from transportation pla
External link:
http://arxiv.org/abs/2404.14106
Published in:
International Joint Conference on Artificial Intelligence 2024
Conditional human motion synthesis (HMS) aims to generate human motion sequences that conform to specific conditions. Text and audio represent the two predominant modalities employed as HMS control conditions. While existing research has primarily fo
External link:
http://arxiv.org/abs/2404.12886
Graph Convolution Networks (GCNs) have significantly succeeded in learning user and item representations for recommendation systems. The core of their efficacy is the ability to explicitly exploit the collaborative signals from both the first- and hi
External link:
http://arxiv.org/abs/2404.10321
Authors:
Wang, Guangzhi; Chen, Tianyi; Ghasedi, Kamran; Wu, HsiangTao; Ding, Tianyu; Nuesmeyer, Chris; Zharkov, Ilya; Kankanhalli, Mohan; Liang, Luming
Face attribute editing plays a pivotal role in various applications. However, existing methods encounter challenges in achieving high-quality results while preserving identity, editing faithfulness, and temporal consistency. These challenges are root
External link:
http://arxiv.org/abs/2404.08111
News captioning aims to describe an image with its news article body as input. It greatly relies on a set of detected named entities, including real-world people, organizations, and places. This paper exploits commonsense knowledge to understand name
External link:
http://arxiv.org/abs/2403.06520