Showing 1 - 9 of 9 for search: '"Ruan, Ludan"'
Multimodal processing has attracted much attention lately, especially with the success of pre-training. However, exploration has mainly focused on vision-language pre-training, as introducing more modalities can greatly complicate model design and…
External link:
http://arxiv.org/abs/2303.06591
Author:
Lin, Hongpeng, Ruan, Ludan, Xia, Wenke, Liu, Peiyu, Wen, Jingyuan, Xu, Yixin, Hu, Di, Song, Ruihua, Zhao, Wayne Xin, Jin, Qin, Lu, Zhiwu
To facilitate research on intelligent and human-like chatbots with multi-modal context, we introduce a new video-based multi-modal dialogue dataset, called TikTalk. We collect 38K videos from a popular video-sharing platform, along with 367K conversations…
External link:
http://arxiv.org/abs/2301.05880
Author:
Ruan, Ludan, Ma, Yiyang, Yang, Huan, He, Huiguo, Liu, Bei, Fu, Jianlong, Yuan, Nicholas Jing, Jin, Qin, Guo, Baining
We propose the first joint audio-video generation framework that brings engaging watching and listening experiences simultaneously, towards high-quality realistic videos. To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model…
External link:
http://arxiv.org/abs/2212.09478
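The abstract above names a Multi-Modal Diffusion model that generates audio and video jointly. As an illustration only, here is a minimal sketch of what a coupled reverse (denoising) step could look like: each branch predicts the noise for its own modality while conditioning on the other's current noisy state. All module names, dimensions, and the DDPM-style update are assumptions for illustration, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class JointDenoiser(nn.Module):
    """Toy two-branch denoiser: each branch predicts the noise added to
    its own modality, conditioned on the other modality's noisy state."""
    def __init__(self, audio_dim=128, video_dim=256, hidden=256):
        super().__init__()
        self.audio_net = nn.Sequential(
            nn.Linear(audio_dim + video_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, audio_dim))
        self.video_net = nn.Sequential(
            nn.Linear(video_dim + audio_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, video_dim))

    def forward(self, a_t, v_t, t):
        # timestep appended as one extra conditioning scalar per sample
        tcol = t.float().unsqueeze(-1)
        eps_a = self.audio_net(torch.cat([a_t, v_t, tcol], dim=-1))
        eps_v = self.video_net(torch.cat([v_t, a_t, tcol], dim=-1))
        return eps_a, eps_v

def reverse_step(model, a_t, v_t, t, betas):
    """One DDPM-style reverse step applied jointly to both modalities."""
    beta = betas[t]
    alpha = 1.0 - beta
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]
    eps_a, eps_v = model(a_t, v_t, torch.full((a_t.size(0),), t))
    coef = beta / torch.sqrt(1.0 - alpha_bar)
    a_prev = (a_t - coef * eps_a) / torch.sqrt(alpha)
    v_prev = (v_t - coef * eps_v) / torch.sqrt(alpha)
    if t > 0:  # add fresh noise except at the final step
        a_prev += torch.sqrt(beta) * torch.randn_like(a_prev)
        v_prev += torch.sqrt(beta) * torch.randn_like(v_prev)
    return a_prev, v_prev

# usage: run the joint reverse chain from pure noise in both modalities
model = JointDenoiser()
betas = torch.linspace(1e-4, 0.02, 1000)
a, v = torch.randn(4, 128), torch.randn(4, 256)
for t in reversed(range(1000)):
    a, v = reverse_step(model, a, v, t, betas)
```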
Author:
Ruan, Ludan, Jin, Qin
Inspired by the success of transformer-based pre-training methods on natural language tasks and, subsequently, computer vision tasks, researchers have begun to apply transformers to video processing. This survey aims to give a comprehensive overview of transformer-based…
External link:
http://arxiv.org/abs/2109.09920
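The survey's subject, applying transformers to video, rests on a standard recipe: cut frames into patches, embed the patches as tokens, and run a Transformer encoder over the resulting spatio-temporal sequence. The sketch below illustrates that general recipe with hypothetical dimensions; it is not any particular model covered by the survey:

```python
import torch
import torch.nn as nn

# hypothetical sizes: batch, frames, channels, height, width, patch size
B, T, C, H, W, P = 2, 8, 3, 32, 32, 8
video = torch.randn(B, T, C, H, W)

# split each frame into P x P patches -> (B, T, C, H/P, W/P, P, P)
patches = video.unfold(3, P, P).unfold(4, P, P)
# flatten into a spatio-temporal token sequence -> (B, tokens, C*P*P)
tokens = patches.permute(0, 1, 3, 4, 2, 5, 6).reshape(B, -1, C * P * P)

embed = nn.Linear(C * P * P, 192)                 # patch embedding
pos = nn.Parameter(torch.zeros(1, tokens.size(1), 192))  # learned positions
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=192, nhead=4, batch_first=True),
    num_layers=2)

features = encoder(embed(tokens) + pos)
print(features.shape)  # (B, T*(H/P)*(W/P), 192)
```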
Entities Object Localization (EOL) aims to evaluate how grounded or faithful a description is; it consists of caption generation and object grounding. Previous works tackle this problem by jointly training the two modules in one framework, which limits…
External link:
http://arxiv.org/abs/2106.06138
The goal of the YouMakeup VQA Challenge 2020 is to provide a common benchmark for fine-grained action understanding in domain-specific videos, e.g., makeup instructional videos. We propose two novel question-answering tasks to evaluate models' fine-grained…
External link:
http://arxiv.org/abs/2004.05573
Author:
Ruan, Ludan, Jin, Qin
Published in:
AI Open 2022, 3:1-13
Author:
Lin, Hongpeng, Ruan, Ludan, Xia, Wenke, Liu, Peiyu, Wen, Jingyuan, Xu, Yixin, Hu, Di, Song, Ruihua, Zhao, Wayne Xin, Jin, Qin, Lu, Zhiwu
We present a novel multi-modal chitchat dialogue dataset, TikTalk, aimed at facilitating research on intelligent chatbots. It consists of videos and the corresponding dialogues users generate on video social applications. In contrast to existing multi-modal…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3ca8ade9d31b6e34ec9ec4f65871f932
http://arxiv.org/abs/2301.05880
Published in:
Machine Intelligence Research, Vol. 20, Issue 2, April 2023, pp. 220-232