Search results

Report

Advancing Speech Language Models by Scaling Supervised Fine-Tuning with Over 60,000 Hours of Synthetic Speech Dialogue Data

Authors: Zhao, Shuaijiang; Guo, Tingwei; Xiang, Bajian; Wan, Tongtang; Niu, Qiang; Zou, Wei; Li, Xiangang

GPT-4o represents a significant milestone in enabling real-time interaction with large language models (LLMs) through speech. Its remarkably low latency and high fluency not only capture attention but also stimulate research interest in the field

External link: http://arxiv.org/abs/2412.01078

Report

VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer

Authors: Zhang, Yongmao; Xue, Heyang; Li, Hanzhao; Xie, Lei; Guo, Tingwei; Zhang, Ruixiong; Gong, Caixia

End-to-end singing voice synthesis (SVS) model VISinger can achieve better performance than the typical two-stage model with fewer parameters. However, VISinger has several problems: text-to-phase problem, the end-to-end model learns the meaningless

External link: http://arxiv.org/abs/2211.02903

Report

Audio Deep Fake Detection System with Neural Stitching for ADD 2022

Authors: Yan, Rui; Wen, Cheng; Zhou, Shuran; Guo, Tingwei; Zou, Wei; Li, Xiangang

This paper describes our best system and methodology for ADD 2022: The First Audio Deep Synthesis Detection Challenge [Yi2022ADD]. The very same system was used for both rounds of evaluation in Track 3.2 with a similar training methodology. T

External link: http://arxiv.org/abs/2204.08720

Report

Time Domain Adversarial Voice Conversion for ADD 2022

Authors: Wen, Cheng; Guo, Tingwei; Tan, Xingjun; Yan, Rui; Zhou, Shuran; Xie, Chuandong; Zou, Wei; Li, Xiangang

In this paper, we describe our speech generation system for the first Audio Deep Synthesis Detection Challenge (ADD 2022). Firstly, we build an any-to-many voice conversion (VC) system to convert source speech with arbitrary language content into the

External link: http://arxiv.org/abs/2204.08692

Report

Audio-Visual Wake Word Spotting System For MISP Challenge 2021

Authors: Xu, Yanguang; Sun, Jianwei; Han, Yang; Zhao, Shuaijiang; Mei, Chaoyang; Guo, Tingwei; Zhou, Shuran; Xie, Chuandong; Zou, Wei; Li, Xiangang

This paper presents the details of our system designed for Task 1 of the Multimodal Information Based Speech Processing (MISP) Challenge 2021. The purpose of Task 1 is to leverage both audio and video information to improve the environmental robustne

External link: http://arxiv.org/abs/2204.08686

Academic article

Differential enrichment of middle-low maturity lacustrine shale oil in the late Eocene Shahejie Formation, Bohai Bay Basin

Authors: Li, Yuan; Chen, Di; Jiang, Fujie; Wang, Zhengjun; Cao, Liu; Zhao, Renjie; Guo, Tingwei; Fang, Zhou; Wang, Xiaohao

Published in: Marine and Petroleum Geology, February 2025, 172

Academic article

Carbon contamination of elemental beta-boron promoted a stable boron carbide by spark plasma sintering

Authors: Guo, Tingwei; Hu, Yixuan; Lahkar, Simanta; Joardar, Joydip; Chen, Mingwei; Reddy, Kolan Madhav

Published in: Journal of the European Ceramic Society, August 2024, 44(10):5590-5600

Academic article

Vascular architecture regulates mesenchymal stromal cell heterogeneity via P53-PDGF signaling in the mouse incisor

Authors: Guo, Tingwei; Pei, Fei; Zhang, Mingyi; Yamada, Takahiko; Feng, Jifan; Jing, Junjun; Ho, Thach-Vu; Chai, Yang

Published in: Cell Stem Cell, 6 June 2024, 31(6):904-920

Academic article

Amorphous shear band formation in elemental β-boron

Authors: Guo, Tingwei; Shen, Yidi; Zhang, Haibo; Lahkar, Simanta; Zhang, Zhifu; Song, Shuangxi; An, Qi; Reddy, Kolan Madhav

Published in: Materials Characterization, February 2024, 208

Report

DiDiSpeech: A Large Scale Mandarin Speech Corpus

Authors: Guo, Tingwei; Wen, Cheng; Jiang, Dongwei; Luo, Ne; Zhang, Ruixiong; Zhao, Shuaijiang; Li, Wubo; Gong, Cheng; Zou, Wei; Han, Kun; Li, Xiangang

This paper introduces a new open-sourced Mandarin speech corpus, called DiDiSpeech. It consists of about 800 hours of speech data at 48kHz sampling rate from 6000 speakers and the corresponding texts. All speech data in the corpus is recorded in quie

External link: http://arxiv.org/abs/2010.09275
