Showing 1 - 10 of 761 for search: '"Peng, Yifan"'
Authors:
Wei, Yishu, Wang, Xindi, Ong, Hanley, Zhou, Yiliang, Flanders, Adam, Shih, George, Peng, Yifan
Despite significant progress in applying large language models (LLMs) to the medical domain, several limitations still prevent them from practical applications. Among these are the constraints on model size and the lack of cohort-specific labeled dat
External link:
http://arxiv.org/abs/2409.16563
Visual signals can enhance audiovisual speech recognition accuracy by providing additional contextual information. Given the complexity of visual signals, an audiovisual speech recognition model requires robust generalization capabilities across dive
External link:
http://arxiv.org/abs/2409.12370
Authors:
Someki, Masao, Choi, Kwanghee, Arora, Siddhant, Chen, William, Cornell, Samuele, Han, Jionghao, Peng, Yifan, Shi, Jiatong, Srivastav, Vaibhav, Watanabe, Shinji
We introduce ESPnet-EZ, an extension of the open-source speech processing toolkit ESPnet, aimed at quick and easy development of speech models. ESPnet-EZ focuses on two major aspects: (i) easy fine-tuning and inference of existing ESPnet models on va
External link:
http://arxiv.org/abs/2409.09506
Authors:
Liu, Xin, Tu, Shijie, Hu, Yiwen, Peng, Yifan, Han, Yubing, Kuang, Cuifang, Liu, Xu, Hao, Xiang
Tightly focused optical fields are essential in nano-optics, but their applications have been limited by the challenges of accurate yet efficient characterization. In this article, we develop an in situ method for reconstructing the fully vectorial i
External link:
http://arxiv.org/abs/2408.14852
Authors:
Zhang, Gongbo, Jin, Qiao, Zhou, Yiliang, Wang, Song, Idnay, Betina R., Luo, Yiming, Park, Elizabeth, Nestor, Jordan G., Spotnitz, Matthew E., Soroush, Ali, Campion, Thomas, Lu, Zhiyong, Weng, Chunhua, Peng, Yifan
Large language models (LLMs) hold great promise in summarizing medical evidence. Most recent studies focus on the application of proprietary LLMs. Using proprietary LLMs introduces multiple risk factors, including a lack of transparency and vendor de
External link:
http://arxiv.org/abs/2408.00588
Authors:
Consoli, Bernardo, Wu, Xizhi, Wang, Song, Zhao, Xinyu, Wang, Yanshan, Rousseau, Justin, Hartvigsen, Tom, Shen, Li, Wu, Huanmei, Peng, Yifan, Long, Qi, Chen, Tianlong, Ding, Ying
Extracting social determinants of health (SDoH) from unstructured medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. In this study we introduced SDoH-GPT, a sim
External link:
http://arxiv.org/abs/2407.17126
Convolutions have become essential in state-of-the-art end-to-end Automatic Speech Recognition (ASR) systems due to their efficient modelling of local context. Notably, its use in Conformers has led to superior performance compared to vanilla Transfo
External link:
http://arxiv.org/abs/2407.03718
Authors:
Chen, William, Zhang, Wangyou, Peng, Yifan, Li, Xinjian, Tian, Jinchuan, Shi, Jiatong, Chang, Xuankai, Maiti, Soumi, Livescu, Karen, Watanabe, Shinji
Self-supervised learning (SSL) has helped extend speech technologies to more languages by reducing the need for labeled data. However, models are still far from supporting the world's 7000+ languages. We propose XEUS, a Cross-lingual Encoder for Univ
External link:
http://arxiv.org/abs/2407.00837
Contextualized end-to-end automatic speech recognition has been an active research area, with recent efforts focusing on the implicit learning of contextual phrases based on the final loss objective. However, these approaches ignore the useful contex
External link:
http://arxiv.org/abs/2406.16120
The Open Whisper-style Speech Model (OWSM) series was introduced to achieve full transparency in building advanced speech-to-text (S2T) foundation models. To this end, OWSM models are trained on 25 public speech datasets, which are heterogeneous in m
External link:
http://arxiv.org/abs/2406.09282