Showing 1 - 10 of 20 457 results for search: '"Kim Nam"'
Authors:
Hyeon, Sieun, Jung, Kyudan, Won, Jaehee, Kim, Nam-Joon, Ryu, Hyun Gon, Lee, Hyuk-Jae, Do, Jaeyoung
In various academic and professional settings, such as mathematics lectures or research presentations, it is often necessary to convey mathematical expressions orally. However, reading mathematical expressions aloud without accompanying visuals can s
External link:
http://arxiv.org/abs/2412.15655
Transducer neural networks have emerged as the mainstream approach for streaming automatic speech recognition (ASR), offering state-of-the-art performance in balancing accuracy and latency. In the conventional framework, streaming transducer models a
External link:
http://arxiv.org/abs/2411.17537
Authors:
Chen, Deming, Youssef, Alaa, Pendse, Ruchi, Schleife, André, Clark, Bryan K., Hamann, Hendrik, He, Jingrui, Laino, Teodoro, Varshney, Lav, Wang, Yuxiong, Sil, Avirup, Jabbarvand, Reyhaneh, Xu, Tianyin, Kindratenko, Volodymyr, Costa, Carlos, Adve, Sarita, Mendis, Charith, Zhang, Minjia, Núñez-Corrales, Santiago, Ganti, Raghu, Srivatsa, Mudhakar, Kim, Nam Sung, Torrellas, Josep, Huang, Jian, Seelam, Seetharami, Nahrstedt, Klara, Abdelzaher, Tarek, Eilam, Tamar, Zhao, Huimin, Manica, Matteo, Iyer, Ravishankar, Hirzel, Martin, Adve, Vikram, Marinov, Darko, Franke, Hubertus, Tong, Hanghang, Ainsworth, Elizabeth, Zhao, Han, Vasisht, Deepak, Do, Minh, Oliveira, Fabio, Pacifici, Giovanni, Puri, Ruchir, Nagpurkar, Priya
This white paper, developed through close collaboration between IBM Research and UIUC researchers within the IIDAI Institute, envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads through innovative, full-stack co
External link:
http://arxiv.org/abs/2411.13239
New tactile interfaces such as swell form printing or refreshable tactile displays promise to allow visually impaired people to analyze data. However, it is possible that design guidelines and familiar encodings derived from experiments on the visual
External link:
http://arxiv.org/abs/2410.08438
We present SegINR, a novel approach to neural Text-to-Speech (TTS) that addresses sequence alignment without relying on an auxiliary duration predictor and complex autoregressive (AR) or non-autoregressive (NAR) frame-level sequence modeling. SegINR
External link:
http://arxiv.org/abs/2410.04690
LaTeX is suitable for creating specially formatted documents in science, technology, mathematics, and computer science. Although the use of mathematical expressions in LaTeX format along with language models is increasing, there are no proper evaluat
External link:
http://arxiv.org/abs/2409.06639
Authors:
Kim, Nam Gyun, Greenidge, Nikita J., Davy, Joshua, Park, Shinwoo, Chandler, James H., Ryu, Jee-Hwan, Valdastri, Pietro
This paper explores the concept of external magnetic control for vine robots to enable their high curvature steering and navigation for use in endoluminal applications. Vine robots, inspired by natural growth and locomotion strategies, present unique
External link:
http://arxiv.org/abs/2409.01319
Authors:
Jung, Kyudan, Hyeon, Sieun, Kwon, Jeong Youn, Kim, Nam-Joon, Ryu, Hyun Gon, Lee, Hyuk-Jae, Do, Jaeyoung
Improving the readability of mathematical expressions in text-based documents, such as subtitles of mathematical videos, is a significant task. To achieve this, mathematical expressions should be converted to compiled formulas. For instance, the spoken ex
External link:
http://arxiv.org/abs/2408.07081
We propose a novel two-stage text-to-speech (TTS) framework with two types of discrete tokens, i.e., semantic and acoustic tokens, for high-fidelity speech synthesis. It features two core components: the Interpreting module, which processes text and
External link:
http://arxiv.org/abs/2406.17310
In this paper, we propose MakeSinger, a semi-supervised training method for singing voice synthesis (SVS) via classifier-free diffusion guidance. The challenge in SVS lies in the costly process of gathering aligned sets of text, pitch, and audio data
External link:
http://arxiv.org/abs/2406.05965