Showing 1 - 10 of 3,147 results for search: '"Lee, Hung"'
Author:
Yang, Chih-Kai, Fu, Yu-Kuan, Li, Chen-An, Lin, Yi-Cheng, Lin, Yu-Xiang, Chen, Wei-Chih, Chung, Ho Lam, Kuan, Chun-Yi, Huang, Wei-Ping, Lu, Ke-Han, Lin, Tzu-Quan, Wang, Hsiu-Hsuan, Hu, En-Pei, Hsu, Chan-Jan, Tseng, Liang-Hsuan, Chiu, I-Hsiang, Sanga, Ulin, Chen, Xuanjun, Hsu, Po-chun, Yang, Shu-wen, Lee, Hung-yi
This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations. Our end-to-end model incorpor
External link:
http://arxiv.org/abs/2411.07111
Author:
Huang, Chien-yu, Chen, Wei-Chih, Yang, Shu-wen, Liu, Andy T., Li, Chen-An, Lin, Yu-Xiang, Tseng, Wei-Cheng, Diwan, Anuj, Shih, Yi-Jen, Shi, Jiatong, Chen, William, Chen, Xuanjun, Hsiao, Chi-Yuan, Peng, Puyuan, Wang, Shih-Heng, Kuan, Chun-Yi, Lu, Ke-Han, Chang, Kai-Wei, Yang, Chih-Kai, Ritter-Gutierrez, Fabian, Chuang, Ming To, Huang, Kuan-Po, Arora, Siddhant, Lin, You-Kuan, Yeo, Eunjung, Chang, Kalvin, Chien, Chung-Ming, Choi, Kwanghee, Hsieh, Cheng-Hsiu, Lin, Yi-Cheng, Yu, Chee-En, Chiu, I-Hsiang, Guimarães, Heitor R., Han, Jionghao, Lin, Tzu-Quan, Lin, Tzu-Yuan, Chang, Homu, Chang, Ting-Wu, Chen, Chun Wei, Chen, Shou-Jen, Chen, Yu-Hua, Cheng, Hsi-Chun, Dhawan, Kunal, Fang, Jia-Lin, Fang, Shi-Xin, Chiang, Kuan-Yu Fang, Fu, Chi An, Hsiao, Hsien-Fu, Hsu, Ching Yu, Huang, Shao-Syuan, Wei, Lee Chen, Lin, Hsi-Che, Lin, Hsuan-Hao, Lin, Hsuan-Ting, Lin, Jian-Ren, Liu, Ting-Chun, Lu, Li-Chun, Pai, Tsung-Min, Pasad, Ankita, Kuan, Shih-Yun Shan, Shon, Suwon, Tang, Yuxun, Tsai, Yun-Shao, Wei, Jui-Chiang, Wei, Tzu-Chieh, Wu, Chengxi, Wu, Dien-Ruei, Yang, Chao-Han Huck, Yang, Chieh-Chi, Yip, Jia Qi, Yuan, Shao-Xiang, Noroozi, Vahid, Chen, Zhehuai, Wu, Haibin, Livescu, Karen, Harwath, David, Watanabe, Shinji, Lee, Hung-yi
Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-machine interactions by seamlessly integrating various forms of data. Developing a universal spoken language model that comprehends a wide range of natural language i
External link:
http://arxiv.org/abs/2411.05361
Author:
Lin, Guan-Ting, Shivakumar, Prashanth Gurunath, Gourav, Aditya, Gu, Yile, Gandhe, Ankur, Lee, Hung-yi, Bulyko, Ivan
While textless Spoken Language Models (SLMs) have shown potential in end-to-end speech-to-speech modeling, they still lag behind text-based Large Language Models (LLMs) in terms of semantic coherence and relevance. This work introduces the Align-SLM
External link:
http://arxiv.org/abs/2411.01834
Author:
Kuan, Chun-Yi, Lee, Hung-yi
Recent advancements in large audio-language models (LALMs) have shown impressive capabilities in understanding and reasoning about audio and speech information. However, these models still face challenges, including hallucinating non-existent sound e
External link:
http://arxiv.org/abs/2410.16130
In an era where cultural preservation is increasingly intertwined with technological innovation, this study introduces a groundbreaking approach to promoting and safeguarding the rich heritage of Taiwanese Hakka culture through the development of a R
External link:
http://arxiv.org/abs/2410.15572
Author:
Chuang, Yun-Yen, Hsu, Hung-Min, Lin, Kevin, Gu, Chen-Sheng, Li, Ling Zhen, Chang, Ray-I, Lee, Hung-yi
The diffusion model, a new generative modeling paradigm, has achieved significant success in generating images, audio, video, and text. It has been adapted for sequence-to-sequence text generation (Seq2Seq) through DiffuSeq, termed S2S Diffusion. Exi
External link:
http://arxiv.org/abs/2410.13201
Author:
Lu, Ke-Han, Chen, Zhehuai, Fu, Szu-Wei, Yang, Chao-Han Huck, Balam, Jagadeesh, Ginsburg, Boris, Wang, Yu-Chiang Frank, Lee, Hung-yi
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs) by incorporating pre-trained speech models. However, these SLMs often undergo extensive speech instruction-tuning to bridge the gap be
External link:
http://arxiv.org/abs/2409.20007
Author:
Su, Hung-Ting, Hsu, Ya-Ching, Lin, Xudong, Shi, Xiang-Qian, Niu, Yulei, Hsu, Han-Yuan, Lee, Hung-yi, Hsu, Winston H.
Large language models (LLMs) equipped with chain-of-thoughts (CoT) prompting have shown significant multi-step reasoning capabilities in factual content like mathematics, commonsense, and logic. However, their performance in narrative reasoning, whic
External link:
http://arxiv.org/abs/2409.14324
Author:
Wu, Haibin, Chen, Xuanjun, Lin, Yi-Cheng, Chang, Kaiwei, Du, Jiawei, Lu, Ke-Han, Liu, Alexander H., Chung, Ho-Lam, Wu, Yuan-Kuei, Yang, Dongchao, Liu, Songxiang, Wu, Yi-Chiao, Tan, Xu, Glass, James, Watanabe, Shinji, Lee, Hung-yi
Neural audio codec models are becoming increasingly important as they serve as tokenizers for audio, enabling efficient transmission or facilitating speech language modeling. The ideal neural audio codec should maintain content, paralinguistics, spea
External link:
http://arxiv.org/abs/2409.14085
Author:
Wang, Chien-Chun, Chen, Li-Wei, Chou, Cheng-Kang, Lee, Hung-Shin, Chen, Berlin, Wang, Hsin-Min
While pre-trained automatic speech recognition (ASR) systems demonstrate impressive performance on matched domains, their performance often degrades when confronted with channel mismatch stemming from unseen recording environments and conditions. To
External link:
http://arxiv.org/abs/2409.12386