Search results

Report

Improving Audio Generation with Visual Enhanced Caption

Authors: Yuan, Yi; Jia, Dongya; Zhuang, Xiaobin; Chen, Yuanzhe; Liu, Zhengxi; Chen, Zhuo; Wang, Yuping; Wang, Yuxuan; Liu, Xubo; Plumbley, Mark D.; Wang, Wenwu

Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the low qu

External link: http://arxiv.org/abs/2407.04416

Report

Multi-fidelity topology optimization of flow boiling heat transfer in microchannels

Authors: Yuan, Yi; Chen, Li; Yang, Qirui; Gu, Lingran; Tao, Wen-Quan

Topology optimization (TO) is a powerful method to design innovative structures with improved heat transfer performance. In the present study, a multi-fidelity TO method with a delicately defined objective function is developed for flow boiling heat

External link: http://arxiv.org/abs/2405.13519

Report

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

Authors: Liu, Haohe; Xu, Xuenan; Yuan, Yi; Wu, Mengyue; Wang, Wenwu; Plumbley, Mark D.

Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate

External link: http://arxiv.org/abs/2405.00233

Report

T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

Authors: Yuan, Yi; Chen, Zhuo; Liu, Xubo; Liu, Haohe; Xu, Xuenan; Jia, Dongya; Chen, Yuanzhe; Plumbley, Mark D.; Wang, Wenwu

Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal informati

External link: http://arxiv.org/abs/2404.17806

Report

Audio Simulation for Sound Source Localization in Virtual Evironment

Authors: Di Yuan, Yi; Wong, Swee Liang; Pan, Jonathan

Non-line-of-sight localization in signal-deprived environments is a challenging yet pertinent problem. Acoustic methods in such predominantly indoor scenarios encounter difficulty due to the reverberant nature. In this study, we aim to locate sound s

External link: http://arxiv.org/abs/2404.01611

Report

HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback

Authors: Li, Ang; Xiao, Qiugen; Cao, Peng; Tang, Jian; Yuan, Yi; Zhao, Zijie; Chen, Xiaoyuan; Zhang, Liang; Li, Xiangyang; Yang, Kaitong; Guo, Weidong; Gan, Yukang; Yu, Xu; Wang, Daniell; Shan, Ying

Reinforcement Learning from AI Feedback (RLAIF) has the advantages of shorter annotation cycles and lower costs over Reinforcement Learning from Human Feedback (RLHF), making it highly efficient during the rapid strategy iteration periods of large la

External link: http://arxiv.org/abs/2403.08309

Report

Novel 3D Geometry-Based Stochastic Models for Non-Isotropic MIMO Vehicle-to-Vehicle Channels

Authors: Yuan, Yi; Wang, Cheng-Xiang; Cheng, Xiang; Ai, Bo; Laurenson, David I.

This paper proposes a novel three-dimensional (3D) theoretical regular-shaped geometry-based stochastic model (RS-GBSM) and the corresponding sum-of-sinusoids (SoS) simulation model for non-isotropic multiple-input multiple-output (MIMO) vehicle-to-v

External link: http://arxiv.org/abs/2312.00550

Report

High-Quality 3D Face Reconstruction with Affine Convolutional Networks

Authors: Lin, Zhiqian; Lin, Jiangke; Li, Lincheng; Yuan, Yi; Zou, Zhengxia

Recent works based on convolutional encoder-decoder architecture and 3DMM parameterization have shown great potential for canonical view reconstruction from a single input image. Conventional CNN architectures benefit from exploiting the spatial corr

External link: http://arxiv.org/abs/2310.14237

Report

Demonstration of chronometric leveling using transportable optical clocks beyond laser coherence limit

Authors: Yuan, Yi; Cui, Kaifeng; Liu, Daoxin; Yuan, Jinbo; Cao, Jian; Wang, Dehao; Chao, Sijia; Shu, Hualin; Haung, Xueren

Optical clock network requires the establishment of optical frequency transmission link between multiple optical clocks, utilizing narrow linewidth lasers. Despite achieving link noise levels of $10^{-20}$, the final accuracy is limited by the phas

External link: http://arxiv.org/abs/2310.08835

Report

Retrieval-Augmented Text-to-Audio Generation

Authors: Yuan, Yi; Liu, Haohe; Liu, Xubo; Huang, Qiushi; Plumbley, Mark D.; Wang, Wenwu

Despite recent progress in text-to-audio (TTA) generation, we show that the state-of-the-art models, such as AudioLDM, trained on datasets with an imbalanced class distribution, such as AudioCaps, are biased in their generation performance. Specifica

External link: http://arxiv.org/abs/2309.08051
