Search results

Report

Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference

Authors: Shen, Kai; Wu, Lingfei; Tang, Siliang; Xu, Fangli; Long, Bo; Zhuang, Yueting; Pei, Jian

Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

The visual question generation (VQG) task aims to generate human-like questions from an image and potentially other side information (e.g. answer type). Previous works on VQG fall in two aspects: i) They suffer from one image to many questions mappin…

External link: http://arxiv.org/abs/2407.05100

Report

T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text

Authors: Yin, Aoxiong; Li, Haoyuan; Shen, Kai; Tang, Siliang; Zhuang, Yueting

In this work, we propose a two-stage sign language production (SLP) paradigm that first encodes sign language sequences into discrete codes and then autoregressively generates sign language from text based on the learned codebook. However, existing v…

External link: http://arxiv.org/abs/2406.07119
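The two-stage paradigm in this abstract (encode sequences into discrete codes, then generate codes autoregressively) can be illustrated with a plain fixed-codebook vector quantizer and a greedy decoding loop. This is a minimal sketch, not T2S-GPT's dynamic vector quantization: the toy 2-D codebook, the frame vectors, and the hypothetical `step` callable that stands in for a text-conditioned model are all assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def nearest_code(frame: Vector, codebook: List[Vector]) -> int:
    """Index of the codebook entry with the smallest Euclidean distance to `frame`."""
    return min(range(len(codebook)), key=lambda k: math.dist(frame, codebook[k]))


def quantize(frames: List[Vector], codebook: List[Vector]) -> List[int]:
    """Stage 1 (sketch): replace each continuous frame by a discrete code index."""
    return [nearest_code(f, codebook) for f in frames]


def dequantize(codes: List[int], codebook: List[Vector]) -> List[List[float]]:
    """Map code indices back to codebook vectors (a real decoder would refine these)."""
    return [list(codebook[c]) for c in codes]


def generate(step: Callable[[List[int]], int], max_len: int, eos: int) -> List[int]:
    """Stage 2 (sketch): greedy autoregressive decoding over code indices.

    `step` is a hypothetical stand-in for a text-conditioned model that returns
    the next code index given the codes produced so far; `eos` ends decoding.
    """
    codes: List[int] = []
    while len(codes) < max_len:
        nxt = step(codes)
        if nxt == eos:
            break
        codes.append(nxt)
    return codes


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]  # hypothetical toy codebook
    frames = [[0.1, -0.2], [0.9, 1.2], [0.2, 1.7], [1.1, 0.8]]
    codes = quantize(frames, codebook)
    print(codes)  # [0, 1, 2, 1]
    print(dequantize(codes, codebook))
```

Nearest-neighbour assignment is the standard way codes are chosen in vector quantization; whatever makes the paper's variant "dynamic" is not modeled here.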

Report

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Authors: Xin, Detai; Tan, Xu; Shen, Kai; Ju, Zeqian; Yang, Dongchao; Wang, Yuancheng; Takamichi, Shinnosuke; Saruwatari, Hiroshi; Liu, Shujie; Li, Jinyu; Zhao, Sheng

We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as …

External link: http://arxiv.org/abs/2404.03204

Report

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Authors: Ju, Zeqian; Wang, Yuancheng; Shen, Kai; Tan, Xu; Xin, Detai; Yang, Dongchao; Liu, Yanqing; Leng, Yichong; Song, Kaitao; Tang, Siliang; Wu, Zhizheng; Qin, Tao; Li, Xiang-Yang; Ye, Wei; Zhang, Shikun; Bian, Jiang; He, Lei; Li, Jinyu; Zhao, Sheng

While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre, …

External link: http://arxiv.org/abs/2403.03100

Report

Lovelock: Towards Smart NIC-hosted Clusters

Authors: Park, Seo Jin; Govindan, Ramesh; Shen, Kai; Culler, David; Özcan, Fatma; Kim, Geon-Woo; Levy, Hank

Traditional cluster designs were originally server-centric, and have evolved recently to support hardware acceleration and storage disaggregation. In applications that leverage acceleration, the server CPU performs the role of orchestrating computati…

External link: http://arxiv.org/abs/2309.12665

Report

PromptTTS 2: Describing and Generating Voices with Text Prompt

Authors: Leng, Yichong; Guo, Zhifang; Shen, Kai; Tan, Xu; Ju, Zeqian; Liu, Yanqing; Liu, Yufei; Yang, Dongchao; Zhang, Leying; Song, Kaitao; He, Lei; Li, Xiang-Yang; Zhao, Sheng; Qin, Tao; Bian, Jiang

Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using …

External link: http://arxiv.org/abs/2309.02285

Report

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

Authors: Shen, Kai; Ju, Zeqian; Tan, Xu; Liu, Yanqing; Leng, Yichong; He, Lei; Qin, Tao; Zhao, Sheng; Bian, Jiang

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is important to capture the diversity in human speech such as speaker identities, prosodies, and styles (e.g., singing). Current large TTS systems usually quantize s…

External link: http://arxiv.org/abs/2304.09116

Report

Deep Learning Predicts Prevalent and Incident Parkinson's Disease From UK Biobank Fundus Imaging

Authors: Tran, Charlie; Shen, Kai; Liu, Kang; Ashok, Akshay; Ramirez-Zamora, Adolfo; Chen, Jinghua; Li, Yulin; Fang, Ruogu

Parkinson's disease is the world's fastest-growing neurological disorder. Research to elucidate the mechanisms of Parkinson's disease and automate diagnostics would greatly improve the treatment of patients with Parkinson's disease. Current diagnosti…

External link: http://arxiv.org/abs/2302.06727

Report

A Study on ReLU and Softmax in Transformer

Authors: Shen, Kai; Guo, Junliang; Tan, Xu; Tang, Siliang; Wang, Rui; Bian, Jiang

The Transformer architecture consists of self-attention and feed-forward networks (FFNs) which can be viewed as key-value memories according to previous works. However, FFN and traditional memory utilize different activation functions (i.e., ReLU and …

External link: http://arxiv.org/abs/2302.06461
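Under the key-value-memory view this abstract cites, a bias-free FFN computes f(xKᵀ)V, where the rows of K act as keys and the rows of V as values; the ReLU and softmax cases differ only in the activation f. The sketch below assumes bias-free layers and toy 2-D vectors; the helper names are hypothetical, and this illustrates that view rather than reproducing code from the paper.

```python
import math
from typing import Callable, List

Vec = List[float]


def key_scores(x: Vec, keys: List[Vec]) -> Vec:
    """Dot product of the input with every key (one row of K per memory slot)."""
    return [sum(xi * ki for xi, ki in zip(x, k)) for k in keys]


def relu(s: Vec) -> Vec:
    """FFN-style weighting: sparse and unnormalized."""
    return [max(0.0, v) for v in s]


def softmax(s: Vec) -> Vec:
    """Memory/attention-style weighting: dense, sums to 1 (max-shifted for stability)."""
    m = max(s)
    e = [math.exp(v - m) for v in s]
    z = sum(e)
    return [v / z for v in e]


def read_memory(x: Vec, keys: List[Vec], values: List[Vec],
                activation: Callable[[Vec], Vec]) -> Vec:
    """Compute activation(x K^T) V, i.e. a weighted sum of the value rows."""
    w = activation(key_scores(x, keys))
    dim = len(values[0])
    return [sum(w[i] * values[i][j] for i in range(len(values))) for j in range(dim)]


if __name__ == "__main__":
    x = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(read_memory(x, keys, values, relu))     # [1.0, 0.0]
    print(read_memory(x, keys, values, softmax))  # approx [0.755, 0.335]
```

With ReLU, only keys with a positive score contribute and the weights are not normalized; with softmax, every slot contributes and the weights sum to 1. That is the contrast between the two activation functions named in the title.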

Report

Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction

Authors: Shen, Kai; Leng, Yichong; Tan, Xu; Tang, Siliang; Zhang, Yuan; Liu, Wenjie; Lin, Edward

Text error correction aims to correct the errors in text sequences such as those typed by humans or generated by speech recognition models. Previous error correction methods usually take the source (incorrect) sentence as encoder input and generate t…

External link: http://arxiv.org/abs/2211.13252
