Showing 1 - 10 of 545 for search: '"Tang, Zhenyu"'
Author:
Chen, Lin, Wei, Xilin, Li, Jinsong, Dong, Xiaoyi, Zhang, Pan, Zang, Yuhang, Chen, Zehui, Duan, Haodong, Lin, Bin, Tang, Zhenyu, Yuan, Li, Qiao, Yu, Lin, Dahua, Zhao, Feng, Wang, Jiaqi
We present the ShareGPT4Video series, aiming to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) via dense and precise captions. The series comprises: 1) ShareGPT4Video…
External link:
http://arxiv.org/abs/2406.04325
Author:
Anastassiou, Philip, Tang, Zhenyu, Peng, Kainan, Jia, Dongya, Li, Jiaxin, Tu, Ming, Wang, Yuping, Wang, Yuxuan, Ma, Mingbo
We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre. Previous works have been cons…
External link:
http://arxiv.org/abs/2404.06674
Author:
Pang, Yatian, Jia, Tanghui, Shi, Yujun, Tang, Zhenyu, Zhang, Junwu, Cheng, Xinhua, Zhou, Xing, Tay, Francis E. H., Yuan, Li
We present Envision3D, a novel method for efficiently generating high-quality 3D content from a single image. Recent methods that extract 3D content from multi-view images generated by diffusion models show great potential. However, it is still chall…
External link:
http://arxiv.org/abs/2403.08902
Author:
Zhu, Bin, Ning, Munan, Jin, Peng, Lin, Bin, Huang, Jinfa, Song, Qi, Zhang, Junwu, Tang, Zhenyu, Pan, Mingjun, Zhou, Xing, Yuan, Li
In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress. To address this challenge, we introduce LLMBind, a novel framework designed to unify a diverse array of multi…
External link:
http://arxiv.org/abs/2402.14891
Author:
Lin, Bin, Tang, Zhenyu, Ye, Yang, Cui, Jiaxi, Zhu, Bin, Jin, Peng, Huang, Jinfa, Zhang, Junwu, Ning, Munan, Yuan, Li
Recent advances demonstrate that scaling Large Vision-Language Models (LVLMs) effectively improves downstream task performances. However, existing scaling methods enable all model parameters to be active for each token in the calculation, which bring…
External link:
http://arxiv.org/abs/2401.15947
Author:
Zhang, Junwu, Tang, Zhenyu, Pang, Yatian, Cheng, Xinhua, Jin, Peng, Wei, Yida, Ning, Munan, Yuan, Li
Recent one image to 3D generation methods commonly adopt Score Distillation Sampling (SDS). Despite the impressive results, there are multiple deficiencies including multi-view inconsistency, over-saturated and over-smoothed textures, as well as the…
External link:
http://arxiv.org/abs/2312.13271
Author:
Fang, Hao-Shu, Fang, Hongjie, Tang, Zhenyu, Liu, Jirong, Wang, Chenxi, Wang, Junbo, Zhu, Haoyi, Lu, Cewu
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots. Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrati…
External link:
http://arxiv.org/abs/2307.00595
Deep learning models often require large amounts of data for training, leading to increased costs. It is particularly challenging in medical imaging, i.e., gathering distributed data for centralized training, and meanwhile, obtaining quality labels r…
External link:
http://arxiv.org/abs/2306.14113
We present a novel approach to improve the performance of learning-based speech dereverberation using accurate synthetic datasets. Our approach is designed to recover the reverb-free signal from a reverberant speech signal. We show that accurately si…
External link:
http://arxiv.org/abs/2212.05360
We propose a mesh-based neural network (MESH2IR) to generate acoustic impulse responses (IRs) for indoor 3D scenes represented using a mesh. The IRs are used to create a high-quality sound experience in interactive applications and audio processing.
External link:
http://arxiv.org/abs/2205.09248