Showing 1 - 10 of 545 for search: '"Tang, Zhenyu"'
Author:
Chen, Lin, Wei, Xilin, Li, Jinsong, Dong, Xiaoyi, Zhang, Pan, Zang, Yuhang, Chen, Zehui, Duan, Haodong, Lin, Bin, Tang, Zhenyu, Yuan, Li, Qiao, Yu, Lin, Dahua, Zhao, Feng, Wang, Jiaqi
We present the ShareGPT4Video series, aiming to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) via dense and precise captions. The series comprises: 1) ShareGPT4Video…
External link:
http://arxiv.org/abs/2406.04325
Author:
Anastassiou, Philip, Tang, Zhenyu, Peng, Kainan, Jia, Dongya, Li, Jiaxin, Tu, Ming, Wang, Yuping, Wang, Yuxuan, Ma, Mingbo
We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre. Previous works have been cons…
External link:
http://arxiv.org/abs/2404.06674
Author:
Pang, Yatian, Jia, Tanghui, Shi, Yujun, Tang, Zhenyu, Zhang, Junwu, Cheng, Xinhua, Zhou, Xing, Tay, Francis E. H., Yuan, Li
We present Envision3D, a novel method for efficiently generating high-quality 3D content from a single image. Recent methods that extract 3D content from multi-view images generated by diffusion models show great potential. However, it is still chall…
External link:
http://arxiv.org/abs/2403.08902
Author:
Zhu, Bin, Ning, Munan, Jin, Peng, Lin, Bin, Huang, Jinfa, Song, Qi, Zhang, Junwu, Tang, Zhenyu, Pan, Mingjun, Zhou, Xing, Yuan, Li
In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress. To address this challenge, we introduce LLMBind, a novel framework designed to unify a diverse array of multi…
External link:
http://arxiv.org/abs/2402.14891
Author:
Lin, Bin, Tang, Zhenyu, Ye, Yang, Cui, Jiaxi, Zhu, Bin, Jin, Peng, Huang, Jinfa, Zhang, Junwu, Ning, Munan, Yuan, Li
Recent advances demonstrate that scaling Large Vision-Language Models (LVLMs) effectively improves downstream task performances. However, existing scaling methods enable all model parameters to be active for each token in the calculation, which bring…
External link:
http://arxiv.org/abs/2401.15947
Author:
Zhang, Junwu, Tang, Zhenyu, Pang, Yatian, Cheng, Xinhua, Jin, Peng, Wei, Yida, Ning, Munan, Yuan, Li
Recent one image to 3D generation methods commonly adopt Score Distillation Sampling (SDS). Despite the impressive results, there are multiple deficiencies including multi-view inconsistency, over-saturated and over-smoothed textures, as well as the…
External link:
http://arxiv.org/abs/2312.13271
Author:
Fang, Hao-Shu, Fang, Hongjie, Tang, Zhenyu, Liu, Jirong, Wang, Chenxi, Wang, Junbo, Zhu, Haoyi, Lu, Cewu
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots. Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrati…
External link:
http://arxiv.org/abs/2307.00595
Deep learning models often require large amounts of data for training, leading to increased costs. It is particularly challenging in medical imaging, i.e., gathering distributed data for centralized training, and meanwhile, obtaining quality labels r…
External link:
http://arxiv.org/abs/2306.14113
We present a novel approach to improve the performance of learning-based speech dereverberation using accurate synthetic datasets. Our approach is designed to recover the reverb-free signal from a reverberant speech signal. We show that accurately si…
External link:
http://arxiv.org/abs/2212.05360
We propose a mesh-based neural network (MESH2IR) to generate acoustic impulse responses (IRs) for indoor 3D scenes represented using a mesh. The IRs are used to create a high-quality sound experience in interactive applications and audio processing.
External link:
http://arxiv.org/abs/2205.09248