Showing 1 - 10
of 142
for search: '"Wang, Zhaokai"'
The rapid advancement of Large Language Models (LLMs) has led to an influx of efforts to extend their capabilities to multimodal tasks. Among them, growing attention has been focused on monolithic Multimodal Large Language Models (MLLMs) that integra
External link:
http://arxiv.org/abs/2410.08202
Author:
Zhu, Xizhou, Yang, Xue, Wang, Zhaokai, Li, Hao, Dou, Wenhan, Ge, Junqi, Lu, Lewei, Qiao, Yu, Dai, Jifeng
Image pyramids are commonly used in modern computer vision tasks to obtain multi-scale features for precise understanding of images. However, image pyramids process multiple resolutions of images using the same large-scale model, which requires signi
External link:
http://arxiv.org/abs/2406.04330
Author:
Tang, Yihong, Wang, Zhaokai, Qu, Ao, Yan, Yihao, Wu, Zhaofeng, Zhuang, Dingyi, Kai, Jushi, Hou, Kebing, Guo, Xiaotong, Zhao, Jinhua, Zhao, Zhan, Ma, Wei
Citywalk, a recently popular form of urban travel, requires genuine personalization and understanding of fine-grained requests compared to traditional itinerary planning. In this paper, we introduce the novel task of Open-domain Urban Itinerary Plann
External link:
http://arxiv.org/abs/2402.07204
Author:
Li, Hao, Yang, Xue, Wang, Zhaokai, Zhu, Xizhou, Zhou, Jie, Qiao, Yu, Wang, Xiaogang, Li, Hongsheng, Lu, Lewei, Dai, Jifeng
Many reinforcement learning environments (e.g., Minecraft) provide only sparse rewards that indicate task completion or failure with binary values. The challenge in exploration efficiency in such environments makes it difficult for reinforcement-lear
External link:
http://arxiv.org/abs/2312.09238
Author:
Zhuo, Le, Wang, Zhaokai, Wang, Baisen, Liao, Yue, Bao, Chenxi, Peng, Stanley, Han, Songhao, Zhang, Aixi, Fang, Fei, Liu, Si
Music is essential when editing videos, but selecting music manually is difficult and time-consuming. Thus, we seek to automatically generate background music tracks given video input. This is a challenging task since it requires music-video datasets
External link:
http://arxiv.org/abs/2211.11248
Author:
Tan, Jun, Wang, Zhaokai, Huang, Zhihong, Huang, Ai, Zhang, Huan, Huang, Lei, Song, Naicheng, Xin, Gaojie, Jiang, Ke, Sun, Xiangfu
Published in:
In Biochemical and Biophysical Research Communications 1 October 2024 727
Author:
Di, Shangzhe, Jiang, Zeren, Liu, Si, Wang, Zhaokai, Zhu, Leyan, He, Zexin, Liu, Hongming, Yan, Shuicheng
In this work, we address the task of video background music generation. Some previous works achieve effective music generation but are unable to generate melodious music tailored to a particular video, and none of them considers the video-music rhyth
External link:
http://arxiv.org/abs/2111.08380
Author:
Song, Naicheng, Wang, Zhaokai, Sun, Quanchao, Xin, Gaojie, Yao, Zuhuan, Huang, Ai, Xing, Shijie, Qu, Yue, Zhang, Huan, Huang, Zhihong, Liao, Yongde, Jiang, Ke
Published in:
In International Immunopharmacology 5 December 2024 142 Part A
Published in:
In Advances in Water Resources April 2024 186
When describing an image, reading text in the visual scene is crucial to understand the key information. Recent work explores the TextCaps task, i.e. image captioning with reading Optical Character Recognition (OCR) tokens, which requires models to r
External link:
http://arxiv.org/abs/2012.03662