Showing 1 - 10 of 29,066 for search: '"Zhou yang"'
Author:
Xing, Shuo, Qian, Chengyuan, Wang, Yuping, Hua, Hongyuan, Tian, Kexin, Zhou, Yang, Tu, Zhengzhong
Since the advent of Multimodal Large Language Models (MLLMs), they have made a significant impact across a wide range of real-world applications, particularly in Autonomous Driving (AD). Their ability to process complex visual data and reason about i
External link:
http://arxiv.org/abs/2412.15208
Author:
Xing, Shuo, Hua, Hongyuan, Gao, Xiangbo, Zhu, Shenzhe, Li, Renjie, Tian, Kexin, Li, Xiaopeng, Huang, Heng, Yang, Tianbao, Wang, Zhangyang, Zhou, Yang, Yao, Huaxiu, Tu, Zhengzhong
Recent advancements in large vision language models (VLMs) tailored for autonomous driving (AD) have shown strong scene understanding and reasoning capabilities, making them undeniable candidates for end-to-end driving systems. However, limited work
External link:
http://arxiv.org/abs/2412.15206
Author:
Xue, Wangyu, Qian, Chen, Wu, Jiayi, Zhou, Yang, Liu, Wentao, Ren, Ju, Fan, Siming, Zhang, Yaoxue
Existing works on human-centric video understanding typically focus on analyzing specific moments or entire videos. However, many applications require higher precision at the frame level. In this work, we propose a novel task, BestShot, which aims to
External link:
http://arxiv.org/abs/2412.12675
Author:
Tanveer, Maham, Zhou, Yang, Niklaus, Simon, Amiri, Ali Mahdavi, Zhang, Hao, Singh, Krishna Kumar, Zhao, Nanxuan
By generating plausible and smooth transitions between two image frames, video inbetweening is an essential tool for video editing and long video synthesis. Traditional works lack the capability to generate complex large motions. While recent video g
External link:
http://arxiv.org/abs/2412.13190
Author:
Huang, Hsin-Ping, Zhou, Yang, Wang, Jui-Hsien, Liu, Difan, Liu, Feng, Yang, Ming-Hsuan, Xu, Zhan
Generating realistic human videos remains a challenging task, with the most effective methods currently relying on a human motion sequence as a control signal. Existing approaches often use existing motion extracted from other videos, which restricts
External link:
http://arxiv.org/abs/2412.13185
Author:
Shao, Hao, Wang, Shulun, Zhou, Yang, Song, Guanglu, He, Dailan, Qin, Shuo, Zong, Zhuofan, Ma, Bingqi, Liu, Yu, Li, Hongsheng
Video face swapping is becoming increasingly popular across various applications, yet existing methods primarily focus on static images and struggle with video face swapping because of temporal consistency and complex scenarios. In this paper, we pre
External link:
http://arxiv.org/abs/2412.11279
We introduce a novel approach for high-resolution talking head generation from a single image and audio input. Prior methods using explicit face models, like 3D morphable models (3DMM) and facial landmarks, often fall short in generating high-fidelit
External link:
http://arxiv.org/abs/2412.04000
Author:
Zhao, Yilong, Yang, Shuo, Zhu, Kan, Zheng, Lianmin, Kasikci, Baris, Zhou, Yang, Xing, Jiarong, Stoica, Ion
Offline batch inference, which leverages the flexibility of request batching to achieve higher throughput and lower costs, is becoming more popular for latency-insensitive applications. Meanwhile, recent progress in model capability and modality make
External link:
http://arxiv.org/abs/2411.16102
This paper presents an analytical approximation framework to understand the dynamics of traffic wave propagation for Automated Vehicles (AVs) during traffic oscillations. The framework systematically unravels the intricate relationships between the l
External link:
http://arxiv.org/abs/2411.16937
This study presents an analytical solution for the vehicle state evolution of Adaptive Cruise Control (ACC) systems under cut-in scenarios, incorporating sensing delays and anticipation using the Lambert W function. The theoretical analysis demonstra
External link:
http://arxiv.org/abs/2411.13456