Search results

Report

The salient multimodal capabilities and interactive experience of GPT-4o highlight its critical role in practical applications, yet it lacks a high-performing open-source counterpart. In this paper, we introduce Baichuan-Omni, the first open-source 7

External link: http://arxiv.org/abs/2410.08565

Report

Language-Queried Target Sound Extraction Without Parallel Training Data

Authors: Ma, Hao; Peng, Zhiyuan; Li, Xu; Li, Yukai; Shao, Mingjie; Kong, Qiuqiang; Liu, Ju

Language-queried target sound extraction (TSE) aims to extract specific sounds from mixtures based on language queries. Traditional fully-supervised training schemes require extensively annotated parallel audio-text data, which are labor-intensive. W

External link: http://arxiv.org/abs/2409.09398

Report

Joint Semantic Knowledge Distillation and Masked Acoustic Modeling for Full-band Speech Restoration with Improved Intelligibility

Authors: Liu, Xiaoyu; Li, Xu; Serrà, Joan; Pascual, Santiago

Speech restoration aims at restoring full-band speech with high quality and intelligibility, considering a diverse set of distortions. MaskSR is a recently proposed generative model for this task. Like other models of its kind, MaskSR attains high qual

External link: http://arxiv.org/abs/2409.09357

Report

EA-VTR: Event-Aware Video-Text Retrieval

Authors: Ma, Zongyang; Zhang, Ziqi; Chen, Yuxin; Qi, Zhongang; Yuan, Chunfeng; Li, Bing; Luo, Yingmin; Li, Xu; Qi, Xiaojuan; Shan, Ying; Hu, Weiming

Understanding the content of events occurring in the video and their inherent temporal logic is crucial for video-text retrieval. However, web-crawled pre-training datasets often lack sufficient event information, and the widely adopted video-level c

External link: http://arxiv.org/abs/2407.07478

Report

MaskSR: Masked Language Model for Full-band Speech Restoration

Authors: Li, Xu; Wang, Qirui; Liu, Xiaoyu

Speech restoration aims at restoring high quality speech in the presence of a diverse set of distortions. Although several deep learning paradigms have been studied for this task, the power of the recently emerging language models has not been fully

External link: http://arxiv.org/abs/2406.02092

Report

SIGGesture: Generalized Co-Speech Gesture Synthesis via Semantic Injection with Large-Scale Pre-Training Diffusion Models

Authors: Cheng, Qingrong; Li, Xu; Fu, Xinghui; Xia, Fei; Sun, Zhongqian

The automated synthesis of high-quality 3D gestures from speech is of significant value in virtual humans and gaming. Previous methods focus on synthesizing gestures that are synchronized with speech rhythm, yet they frequently overlook the inclusion

External link: http://arxiv.org/abs/2405.13336

Report

Lam-Tung relation breaking in $Z$ boson production as a probe of SMEFT effects

Authors: Li, Xu; Yan, Bin; Yuan, C.-P.

The violation of Lam-Tung relation in the high-$p_T^{\ell\ell}$ region of the Drell-Yan process at the LHC presents a long-standing discrepancy with the standard model prediction at $\mathcal{O}(\alpha_s^3)$ accuracy. In this Letter, we employed a mo

External link: http://arxiv.org/abs/2405.04069

Report

Dynamic Resolution Guidance for Facial Expression Recognition

Authors: Wang, Songpan; Li, Xu; Jiang, Tianxiang; Xie, Yuanlun

Facial expression recognition (FER) is vital for human-computer interaction and emotion analysis, yet recognizing expressions in low-resolution images remains challenging. This paper introduces a practical method called Dynamic Resolution Guidance fo

External link: http://arxiv.org/abs/2404.06365

Report

CSST Strong Lensing Preparation: a Framework for Detecting Strong Lenses in the Multi-color Imaging Survey by the China Survey Space Telescope (CSST)

Strong gravitational lensing is a powerful tool for investigating dark matter and dark energy properties. With the advent of large-scale sky surveys, we can discover strong lensing systems on an unprecedented scale, which requires efficient tools to

External link: http://arxiv.org/abs/2404.01780

Report

Ultrafast switching of sliding ferroelectricity and dynamical magnetic field in van der Waals bilayer induced by light

Authors: Wang, Jian; Li, Xu; Ma, Xingyue; Chen, Lan; Liu, Jun-Ming; Duan, Chun-Gang; Íñiguez-González, Jorge; Wu, Di; Yang, Yurong

Sliding ferroelectricity is a unique type of polarity recently observed in a properly stacked van der Waals bilayer. However, electric-field control of sliding ferroelectricity is hard and could induce large coercive electric fields and serious leaka

External link: http://arxiv.org/abs/2403.06531
